Message ID | 05cb873f-3edf-f115-305c-81b5ace8d76e@gmail.com
---|---
State | Rejected, archived
Delegated to: | David Miller
Series | netdev: add netdev_pagefrag_enabled sysctl
From: Hongbo Li <herbert.tencent@gmail.com>
Date: Thu, 9 Nov 2017 16:12:27 +0800

> From: Hongbo Li <herberthbli@tencent.com>
>
> This patch solves a memory fragmentation issue when allocating skbs.
> I found this issue in a UDP scenario; here is my test model:
> 1. About five hundred UDP threads listen on the server,
>    and five hundred client threads send UDP packets to them.
>    Some threads send packets faster than others.
> 2. The user processes on the server do not have enough capacity
>    to receive these packets.
>
> Then I got the following result:
> 1. Some UDP sockets' recv-q reached the queue limit; others
>    did not because of the global rmem limit.
> 2. The "free" command showed more than 62GB of "used" memory,
>    but /proc/net/sockstat showed that UDP used only 12GB.
>
> This confuses the user: why does the system consume so much
> memory? It is caused by memory fragmentation in the netdev layer.
> __netdev_alloc_frag() allocates a page block of 8 pages.
>
> In this scenario, most skbs are freed when the recv-q
> is full, but if any skb in the same page block is queued to
> another recv-q which is not full, the whole page block cannot
> be freed.
>
> So from the kernel's point of view these pages are in use, but
> from the point of view of TCP/UDP, only the skbs sitting in a
> recv-q are in use.
>
> To avoid exhausting memory in such a scenario, I add a sysctl
> that lets the user disable allocating skbs from page frags.
>
> Signed-off-by: Hongbo Li <herberthbli@tencent.com>

When something like page fragments doesn't work properly, we fix it rather than providing a way to disable it.

Thank you.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2eaac7d..73540ee 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3319,6 +3319,7 @@ static __always_inline int ____dev_forward_skb(struct net_device *dev,
 
 extern int netdev_budget;
 extern unsigned int netdev_budget_usecs;
+extern int netdev_pagefrag_enabled;
 
 /* Called by rtnetlink.c:rtnl_unlock() */
 void netdev_run_todo(void);
diff --git a/net/core/dev.c b/net/core/dev.c
index 11596a3..2328ddb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3527,6 +3527,7 @@ int dev_queue_xmit_accel(struct sk_buff *skb, void *accel_priv)
 int netdev_tstamp_prequeue __read_mostly = 1;
 int netdev_budget __read_mostly = 300;
 unsigned int __read_mostly netdev_budget_usecs = 2000;
+int netdev_pagefrag_enabled __read_mostly = 1;
 int weight_p __read_mostly = 64;           /* old backlog weight */
 int dev_weight_rx_bias __read_mostly = 1;  /* bias for backlog weight */
 int dev_weight_tx_bias __read_mostly = 1;  /* bias for output_queue quota */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2465607..62a43fe 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -399,7 +399,8 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
 	len += NET_SKB_PAD;
 
 	if ((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) ||
-	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
+	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA)) ||
+	    !netdev_pagefrag_enabled) {
 		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
 		if (!skb)
 			goto skb_fail;
@@ -466,7 +467,8 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 	len += NET_SKB_PAD + NET_IP_ALIGN;
 
 	if ((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) ||
-	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
+	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA)) ||
+	    !netdev_pagefrag_enabled) {
 		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
 		if (!skb)
 			goto skb_fail;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index cbc3dde..c0078c5 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -461,6 +461,13 @@ static int proc_do_rss_key(struct ctl_table *table, int write,
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= &zero,
 	},
+	{
+		.procname	= "netdev_pagefrag_enabled",
+		.data		= &netdev_pagefrag_enabled,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 	{ }
 };
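Since the ctl_table entry is registered in net/core/sysctl_net_core.c, the knob would presumably have surfaced under net.core. The following is only a sketch of how it would have been used; the patch was rejected, so this sysctl does not exist in mainline kernels.

```shell
# Hypothetical usage -- requires a kernel carrying this (rejected) patch.
sysctl net.core.netdev_pagefrag_enabled        # default 1: skbs allocated from page frags
sysctl -w net.core.netdev_pagefrag_enabled=0   # fall back to plain __alloc_skb() allocations
```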