From patchwork Sat Mar 31 19:58:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Dumazet X-Patchwork-Id: 893868 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.b="fMt2d53j"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40D8Tc4673z9s3B for ; Sun, 1 Apr 2018 05:59:48 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752919AbeCaT7p (ORCPT ); Sat, 31 Mar 2018 15:59:45 -0400 Received: from mail-pf0-f194.google.com ([209.85.192.194]:34376 "EHLO mail-pf0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752677AbeCaT7n (ORCPT ); Sat, 31 Mar 2018 15:59:43 -0400 Received: by mail-pf0-f194.google.com with SMTP id q9so7481633pff.1 for ; Sat, 31 Mar 2018 12:59:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=AsHQXvePE4KudFS38/Lff3FF6asqr2/9MCo64M522wM=; b=fMt2d53jyd45Blq214ymb9MvX51Squ9GuEtLZ/S2ZBMRzdxEVp9FtMarzhPpGFHJKD FHFOUmQSN1V+uyfR9U9R7uCX5QD/3jLS78Up1pSJLH2kRNhCqgZDVWyR4iscfxwt/8dl AHKDtjnvcOWVbs6dRdZVFqc6UH+5260z6hrTMyV7NxKJUFyXGhRScbx3K1wHLREvgADW p6wBkHhOU1Ib1wTb6CZOOZiD9Z3hY0a2aEkJUS69b6hTUskHaYRRgKQ6FUSAsFq5PJ0l YKXm3OsinOM0VGdJrpHQ9WOHChD4l0pl/HUJAC8p8TjA8AUqsElDAhB1bR48mm0QJ1FU GAlw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=AsHQXvePE4KudFS38/Lff3FF6asqr2/9MCo64M522wM=; b=LgZQKgNHvgyYOIIh1DdkZebOAzmoeF0eosKgYpEBpcgfIRPomgUUHQuYfD2SXHPnjV 7ICW06ojz4sVF53MmAfXpOdPRPSA7X3p+iKdHiHN6QHlPTFIVTXQWxHPJXAIGD0I0Ypg ViX//3bXeOeEKikRtANJVIxW3tB/jQFRwZNyVQZcDHd3RU2VSW315EMFCgAKZwfyUs6Z Pv1Ln0Co1GLin4kLGMSemhR+Y3mqXBORbVmze3OxeK8t8Nh8eBYMdY+xrQN1YklMJ4gS 6S+iPi66lvUQp9umbmnVEGqWN4FGAfWo/593McJvBw2B8rft6U3TApQ9v48a5p+ya5p4 C/iA== X-Gm-Message-State: AElRT7GDQOv5zWh8hW7cFJIkfAyt3TwEM1DWj7FKZPK0lA1eUpJmY/Vd dU+YkZmREm1iSZxheuMSzW513A== X-Google-Smtp-Source: AIpwx4+tH7xQYYXG8pjI2C1sN4S82Y8McaJlc8z8TwLyTWEVG3Z/5YEdN8iV1Z2wnXtQLelv1gwilA== X-Received: by 2002:a17:902:8213:: with SMTP id x19-v6mr4071969pln.371.1522526382075; Sat, 31 Mar 2018 12:59:42 -0700 (PDT) Received: from localhost ([2620:15c:2c4:201:f5a:7eca:440a:3ead]) by smtp.gmail.com with ESMTPSA id r75sm23913066pfb.98.2018.03.31.12.59.40 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sat, 31 Mar 2018 12:59:40 -0700 (PDT) From: Eric Dumazet To: "David S . Miller" Cc: netdev , Florian Westphal , Herbert Xu , Thomas Graf , Jesper Dangaard Brouer , Alexander Aring , Stefan Schmidt , Kirill Tkhai , Eric Dumazet , Eric Dumazet Subject: [PATCH v4 net-next 12/19] inet: frags: break the 2GB limit for frags storage Date: Sat, 31 Mar 2018 12:58:53 -0700 Message-Id: <20180331195900.183604-13-edumazet@google.com> X-Mailer: git-send-email 2.17.0.rc1.321.gba9d0f2565-goog In-Reply-To: <20180331195900.183604-1-edumazet@google.com> References: <20180331195900.183604-1-edumazet@google.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Some users are willing to provision huge amounts of memory to be able to perform reassembly reasonnably well under pressure. Current memory tracking is using one atomic_t and integers. Switch to atomic_long_t so that 64bit arches can use more than 2GB, without any cost for 32bit arches. Note that this patch avoids an overflow error, if high_thresh was set to ~2GB, since this test in inet_frag_alloc() was never true : if (... || frag_mem_limit(nf) > nf->high_thresh) Tested: $ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh $ grep FRAG /proc/net/sockstat FRAG: inuse 14705885 memory 16000002880 $ nstat -n ; sleep 1 ; nstat | grep Reas IpReasmReqds 3317150 0.0 IpReasmFails 3317112 0.0 Signed-off-by: Eric Dumazet --- Documentation/networking/ip-sysctl.txt | 4 ++-- include/net/inet_frag.h | 20 ++++++++++---------- net/ieee802154/6lowpan/reassembly.c | 10 +++++----- net/ipv4/ip_fragment.c | 10 +++++----- net/ipv4/proc.c | 2 +- net/ipv6/netfilter/nf_conntrack_reasm.c | 10 +++++----- net/ipv6/proc.c | 2 +- net/ipv6/reassembly.c | 6 +++--- 8 files changed, 32 insertions(+), 32 deletions(-) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 6f2a3670e44b6662ce53c16cb7ca1e4f61274c15..5dc1a040a2f1db610873de26c2d79bc57ac5a1a2 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -133,10 +133,10 @@ min_adv_mss - INTEGER IP Fragmentation: -ipfrag_high_thresh - INTEGER +ipfrag_high_thresh - LONG INTEGER Maximum memory used to reassemble IP fragments. -ipfrag_low_thresh - INTEGER +ipfrag_low_thresh - LONG INTEGER (Obsolete since linux-4.17) Maximum memory used to reassemble IP fragments before the kernel begins to remove incomplete fragment queues to free up resources. diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h index 95e353e3305b43253084d972e32538138bcc5454..a52e7273e7a59bc8ce47b21d29235a740add8db0 100644 --- a/include/net/inet_frag.h +++ b/include/net/inet_frag.h @@ -8,11 +8,11 @@ struct netns_frags { struct rhashtable rhashtable ____cacheline_aligned_in_smp; /* Keep atomic mem on separate cachelines in structs that include it */ - atomic_t mem ____cacheline_aligned_in_smp; + atomic_long_t mem ____cacheline_aligned_in_smp; /* sysctls */ + long high_thresh; + long low_thresh; int timeout; - int high_thresh; - int low_thresh; int max_dist; struct inet_frags *f; }; @@ -102,7 +102,7 @@ void inet_frags_fini(struct inet_frags *); static inline int inet_frags_init_net(struct netns_frags *nf) { - atomic_set(&nf->mem, 0); + atomic_long_set(&nf->mem, 0); return rhashtable_init(&nf->rhashtable, &nf->f->rhash_params); } void inet_frags_exit_net(struct netns_frags *nf); @@ -119,19 +119,19 @@ static inline void inet_frag_put(struct inet_frag_queue *q) /* Memory Tracking Functions. */ -static inline int frag_mem_limit(struct netns_frags *nf) +static inline long frag_mem_limit(const struct netns_frags *nf) { - return atomic_read(&nf->mem); + return atomic_long_read(&nf->mem); } -static inline void sub_frag_mem_limit(struct netns_frags *nf, int i) +static inline void sub_frag_mem_limit(struct netns_frags *nf, long val) { - atomic_sub(i, &nf->mem); + atomic_long_sub(val, &nf->mem); } -static inline void add_frag_mem_limit(struct netns_frags *nf, int i) +static inline void add_frag_mem_limit(struct netns_frags *nf, long val) { - atomic_add(i, &nf->mem); + atomic_long_add(val, &nf->mem); } /* RFC 3168 support : diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c index 1aec71a3f90478a29065e6559a78aec2e930ceab..44f148a6bb57bb737c35a07e3f070472e209ea23 100644 --- a/net/ieee802154/6lowpan/reassembly.c +++ b/net/ieee802154/6lowpan/reassembly.c @@ -411,23 +411,23 @@ int lowpan_frag_rcv(struct sk_buff *skb, u8 frag_type) } #ifdef CONFIG_SYSCTL -static int zero; +static long zero; static struct ctl_table lowpan_frags_ns_ctl_table[] = { { .procname = "6lowpanfrag_high_thresh", .data = &init_net.ieee802154_lowpan.frags.high_thresh, - .maxlen = sizeof(int), + .maxlen = sizeof(unsigned long), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_doulongvec_minmax, .extra1 = &init_net.ieee802154_lowpan.frags.low_thresh }, { .procname = "6lowpanfrag_low_thresh", .data = &init_net.ieee802154_lowpan.frags.low_thresh, - .maxlen = sizeof(int), + .maxlen = sizeof(unsigned long), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_doulongvec_minmax, .extra1 = &zero, .extra2 = &init_net.ieee802154_lowpan.frags.high_thresh }, diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c index b0366224f314ae521d8c1f8fe04c53e419292b4c..053869f2c49b9fd7a87316f1ef1416568b1bf508 100644 --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -678,23 +678,23 @@ struct sk_buff *ip_check_defrag(struct net *net, struct sk_buff *skb, u32 user) EXPORT_SYMBOL(ip_check_defrag); #ifdef CONFIG_SYSCTL -static int zero; +static long zero; static struct ctl_table ip4_frags_ns_ctl_table[] = { { .procname = "ipfrag_high_thresh", .data = &init_net.ipv4.frags.high_thresh, - .maxlen = sizeof(int), + .maxlen = sizeof(unsigned long), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_doulongvec_minmax, .extra1 = &init_net.ipv4.frags.low_thresh }, { .procname = "ipfrag_low_thresh", .data = &init_net.ipv4.frags.low_thresh, - .maxlen = sizeof(int), + .maxlen = sizeof(unsigned long), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_doulongvec_minmax, .extra1 = &zero, .extra2 = &init_net.ipv4.frags.high_thresh }, diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c index aacfce0d7d82cf59269a69ef4d6ac8d9955b0bdc..a058de677e947846eb93020e0788148827c8f3cd 100644 --- a/net/ipv4/proc.c +++ b/net/ipv4/proc.c @@ -71,7 +71,7 @@ static int sockstat_seq_show(struct seq_file *seq, void *v) sock_prot_inuse_get(net, &udplite_prot)); seq_printf(seq, "RAW: inuse %d\n", sock_prot_inuse_get(net, &raw_prot)); - seq_printf(seq, "FRAG: inuse %u memory %u\n", + seq_printf(seq, "FRAG: inuse %u memory %lu\n", atomic_read(&net->ipv4.frags.rhashtable.nelems), frag_mem_limit(&net->ipv4.frags)); return 0; diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c index d866412b8f6c432f04c0968f08f820fdc561262b..603a395928593a071aff28105578d3bfdbf1b31e 100644 --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -63,7 +63,7 @@ struct nf_ct_frag6_skb_cb static struct inet_frags nf_frags; #ifdef CONFIG_SYSCTL -static int zero; +static long zero; static struct ctl_table nf_ct_frag6_sysctl_table[] = { { @@ -76,18 +76,18 @@ static struct ctl_table nf_ct_frag6_sysctl_table[] = { { .procname = "nf_conntrack_frag6_low_thresh", .data = &init_net.nf_frag.frags.low_thresh, - .maxlen = sizeof(unsigned int), + .maxlen = sizeof(unsigned long), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_doulongvec_minmax, .extra1 = &zero, .extra2 = &init_net.nf_frag.frags.high_thresh }, { .procname = "nf_conntrack_frag6_high_thresh", .data = &init_net.nf_frag.frags.high_thresh, - .maxlen = sizeof(unsigned int), + .maxlen = sizeof(unsigned long), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_doulongvec_minmax, .extra1 = &init_net.nf_frag.frags.low_thresh }, { } diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c index 8befeb91e0712ecc4d05c4c0a6ecca1808dcbcac..a85f7e0b14b10f59fdd2ea6901f8e9a95c13654f 100644 --- a/net/ipv6/proc.c +++ b/net/ipv6/proc.c @@ -47,7 +47,7 @@ static int sockstat6_seq_show(struct seq_file *seq, void *v) sock_prot_inuse_get(net, &udplitev6_prot)); seq_printf(seq, "RAW6: inuse %d\n", sock_prot_inuse_get(net, &rawv6_prot)); - seq_printf(seq, "FRAG6: inuse %u memory %u\n", + seq_printf(seq, "FRAG6: inuse %u memory %lu\n", atomic_read(&net->ipv6.frags.rhashtable.nelems), frag_mem_limit(&net->ipv6.frags)); return 0; diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c index 2a77fda5e3bca1b6ce8c24df11e741653a15c665..905a8aee2671fd76b47d9cc6b6b0c17c9691d224 100644 --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -552,15 +552,15 @@ static struct ctl_table ip6_frags_ns_ctl_table[] = { { .procname = "ip6frag_high_thresh", .data = &init_net.ipv6.frags.high_thresh, - .maxlen = sizeof(int), + .maxlen = sizeof(unsigned long), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_doulongvec_minmax, .extra1 = &init_net.ipv6.frags.low_thresh }, { .procname = "ip6frag_low_thresh", .data = &init_net.ipv6.frags.low_thresh, - .maxlen = sizeof(int), + .maxlen = sizeof(unsigned long), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = &zero,