Message ID | 1413985819-9553-1-git-send-email-klamm@yandex-team.ru |
---|---|
State | Awaiting Upstream, archived |
Delegated to: | David Miller |
On Wed, 2014-10-22 at 17:50 +0400, Roman Gushchin wrote:
> Incoming packet is dropped silently by sk_filter(), if the skb was
> allocated from pfmemalloc reserves and the corresponding socket is
> not marked with the SOCK_MEMALLOC flag.
>
> Igb driver allocates pages for DMA with __skb_alloc_page(), which
> calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, in case
> of OOM condition, igb can get pages with pfmemalloc flag set.
>
> If an incoming packet hits the pfmemalloc page and is large enough
> (small packets are copying into the memory, allocated with
> netdev_alloc_skb_ip_align(), so they are not affected), it will be
> dropped.
>
> This behavior is ok under high memory pressure, but the problem is
> that the igb driver reuses these mapped pages. So, packets are still
> dropping even if all memory issues are gone and there is a plenty
> of free memory.
>
> In my case, some TCP sessions hang on a small percentage (< 0.1%)
> of machines days after OOMs.
>
> Fix this by avoiding reuse of such pages.
>
> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
> ---

Interesting...

It seems we also need to clear skb->pfmemalloc in napi_reuse_skb()
On Wed, 2014-10-22 at 17:50 +0400, Roman Gushchin wrote:
> [...]
> Fix this by avoiding reuse of such pages.
>
> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Thanks Roman, I have added your patch to my queue.
Thank you!

Should we probably add it to the stable trees too?

--
Regards,
Roman

22.10.2014, 22:30, "Jeff Kirsher" <jeffrey.t.kirsher@intel.com>:
> On Wed, 2014-10-22 at 17:50 +0400, Roman Gushchin wrote:
>> [...]
>> Fix this by avoiding reuse of such pages.
>>
>> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
>> ---
>>  drivers/net/ethernet/intel/igb/igb_main.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> Thanks Roman, I have added your patch to my queue.
> Interesting...
>
> It seems we also need to clear skb->pfmemalloc in napi_reuse_skb()

Sounds reasonable, but are you sure that we can just drop the
skb->pfmemalloc flag in napi_reuse_skb()?
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 0d4c897..6586392 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6178,6 +6178,9 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
 	if (unlikely(page_to_nid(page) != numa_node_id()))
 		return false;
 
+	if (unlikely(page->pfmemalloc))
+		return false;
+
 #if (PAGE_SIZE < 8192)
 	/* if we are only owner of page we can reuse it */
 	if (unlikely(page_count(page) != 1))
@@ -6245,7 +6248,8 @@ static bool igb_add_rx_frag(struct igb_ring *rx_ring,
 		memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
 
 		/* we can reuse buffer as-is, just make sure it is local */
-		if (likely(page_to_nid(page) == numa_node_id()))
+		if (likely((page_to_nid(page) == numa_node_id()) &&
+			   !page->pfmemalloc))
 			return true;
 
 		/* this page cannot be reused so discard it */
An incoming packet is silently dropped by sk_filter() if the skb was
allocated from pfmemalloc reserves and the corresponding socket is
not marked with the SOCK_MEMALLOC flag.

The igb driver allocates pages for DMA with __skb_alloc_page(), which
calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, under an
OOM condition, igb can get pages with the pfmemalloc flag set.

If an incoming packet lands in a pfmemalloc page and is large enough,
it will be dropped. (Small packets are copied into memory allocated
with netdev_alloc_skb_ip_align(), so they are not affected.)

This behavior is OK under high memory pressure, but the problem is
that the igb driver reuses these mapped pages. So, packets are still
dropped even after all memory issues are gone and there is plenty
of free memory.

In my case, some TCP sessions hang on a small percentage (< 0.1%)
of machines days after OOMs.

Fix this by avoiding reuse of such pages.

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
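
For readers who want to see why the drops persist after memory pressure ends, here is a minimal userspace sketch of the behavior described in the changelog. It is not driver code: struct model_page and every model_* function are hypothetical stand-ins invented for illustration, and the model assumes a single RX buffer, a socket without SOCK_MEMALLOC, and that any page allocated after the OOM is a normal one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the two properties
 * that the reuse check in the patch looks at. */
struct model_page {
	int  nid;        /* NUMA node the page lives on */
	bool pfmemalloc; /* page came from the emergency reserves */
};

/* Model of the reuse decision; with_fix selects the patched behavior
 * (assumption: other conditions such as page_count are ignored). */
static bool model_can_reuse_rx_page(const struct model_page *page,
				    int local_nid, bool with_fix)
{
	if (page->nid != local_nid)
		return false;
	if (with_fix && page->pfmemalloc)
		return false;
	return true;
}

/* Model of the filtering rule from the changelog: data backed by a
 * pfmemalloc page is dropped unless the socket is SOCK_MEMALLOC. */
static bool model_delivered(const struct model_page *page, bool sock_memalloc)
{
	return !page->pfmemalloc || sock_memalloc;
}

/* Receive n_packets through one RX buffer whose page was allocated
 * during an OOM. Memory pressure is over, so a replacement page is
 * always a normal one. Returns the number of dropped packets. */
static int model_count_drops(int n_packets, bool with_fix)
{
	struct model_page page = { .nid = 0, .pfmemalloc = true };
	int drops = 0;

	assert(n_packets >= 0);
	for (int i = 0; i < n_packets; i++) {
		if (!model_delivered(&page, false))
			drops++;
		if (!model_can_reuse_rx_page(&page, 0, with_fix))
			page.pfmemalloc = false; /* discard, allocate afresh */
	}
	return drops;
}
```

In this model, model_count_drops(1000, false) reports every packet as dropped because the reserve page is recycled forever, while model_count_drops(1000, true) loses only the first packet before the page is replaced.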