{"id":815099,"url":"http://patchwork.ozlabs.org/api/1.2/patches/815099/?format=json","web_url":"http://patchwork.ozlabs.org/project/netdev/patch/20170918190733.26272-5-edumazet@google.com/","project":{"id":7,"url":"http://patchwork.ozlabs.org/api/1.2/projects/7/?format=json","name":"Linux network development","link_name":"netdev","list_id":"netdev.vger.kernel.org","list_email":"netdev@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null,"list_archive_url":"","list_archive_url_format":"","commit_url_format":""},"msgid":"<20170918190733.26272-5-edumazet@google.com>","list_archive_url":null,"date":"2017-09-18T19:07:30","name":"[net-next,4/7] ipv6: addrlabel: per netns list","commit_ref":null,"pull_url":null,"state":"changes-requested","archived":true,"hash":"18ce9071600963a6b55eac797d4661fe451d03e2","submitter":{"id":13357,"url":"http://patchwork.ozlabs.org/api/1.2/people/13357/?format=json","name":"Eric Dumazet","email":"edumazet@google.com"},"delegate":{"id":34,"url":"http://patchwork.ozlabs.org/api/1.2/users/34/?format=json","username":"davem","first_name":"David","last_name":"Miller","email":"davem@davemloft.net"},"mbox":"http://patchwork.ozlabs.org/project/netdev/patch/20170918190733.26272-5-edumazet@google.com/mbox/","series":[{"id":3713,"url":"http://patchwork.ozlabs.org/api/1.2/series/3713/?format=json","web_url":"http://patchwork.ozlabs.org/project/netdev/list/?series=3713","date":"2017-09-18T19:07:26","name":"net: speedup netns create/delete time","version":1,"mbox":"http://patchwork.ozlabs.org/series/3713/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/815099/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/815099/checks/","tags":{},"related":[],"headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) 
smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=google.com header.i=@google.com\n\theader.b=\"UzO10Ns+\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xwwWF5ZY9z9s83\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue, 19 Sep 2017 05:07:53 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751370AbdIRTHv (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tMon, 18 Sep 2017 15:07:51 -0400","from mail-pf0-f171.google.com ([209.85.192.171]:51213 \"EHLO\n\tmail-pf0-f171.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1751283AbdIRTHq (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Mon, 18 Sep 2017 15:07:46 -0400","by mail-pf0-f171.google.com with SMTP id b70so706585pfl.8\n\tfor <netdev@vger.kernel.org>; Mon, 18 Sep 2017 12:07:46 -0700 (PDT)","from localhost ([2620:15c:2cb:201:1d19:43e0:8828:6785])\n\tby smtp.gmail.com with ESMTPSA id\n\tb75sm194973pfc.29.2017.09.18.12.07.45\n\t(version=TLS1_2 cipher=AES128-SHA bits=128/128);\n\tMon, 18 Sep 2017 12:07:45 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=google.com; s=20161025;\n\th=from:to:cc:subject:date:message-id:in-reply-to:references;\n\tbh=Q8V0HemrONcyRycE9St2P4itgRolQHy64OghRVmsJLc=;\n\tb=UzO10Ns+uvNd8ZC6MZQCqjKEQ08HzjUbGgvolBbKSkhJiz3fWhIjUmJyTCe2FRUMwV\n\tbG9d4AS/FfytJ0dMX92gLEm6DxFgpEU8Hs1Nrh1y1xTBJ8HLjh4YobHI8GtIdajOVs8q\n\tBoBdtcSHRQOedbGvJdZn75UTnyR9D2eez/F5gDpK9UOHXcK58AJ6Mvq9uGJQB3JCxOcB\n\tVBHpGaHlgt/+2ahb0U1rQ9pG2qZnxbAmTcblxpvao2sq0D9S/f5MVwflzlrxqHEYUPHo\n\tFg4R0JffkqKANtz3TEVZRgbHetvPY16dnmsVPltV6CJSFrQs2sJeXJlc3TN7kFN+vnz4\n\tbrAg==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; 
s=20161025;\n\th=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to\n\t:references;\n\tbh=Q8V0HemrONcyRycE9St2P4itgRolQHy64OghRVmsJLc=;\n\tb=BayEmgAhWf/nXTursympS4kcJk9siFFqmhNQTPNHOvz1MqwdD0qaGjqwIzv7eIiC9n\n\t2S4GRukQni5HLellxU2BCW95E1aj0Chx2JnhiMu8dsP1mPOfh5a3GXZsIJb//5feB9Q7\n\thbFVG/B5qynFfWR4ZNe702sQYAumz/+okK2uPirOFXbPxBYJjUH8WlkzK/vyiDNSSy2L\n\tgkWFiSatHFdIhXRoirRViAE4Qmuq//tga4QzTgko57NXvgwhC6xGg5TCLyzi041rk9wi\n\ts2MB5hPJNedwNvQ/HHx0XUwrfWZdnraMw43GPF0GD63rCSRAbIdZCZ9iYZgx1BPbv8w3\n\tGE3A==","X-Gm-Message-State":"AHPjjUh5OAK6M4xzUBk9AfoWd769vJoauaJpG+k8s8evWpEsD1ObP1Jl\n\tI4LQKPCgg05bhzWe","X-Google-Smtp-Source":"ADKCNb6TIx1y39LZGGPSUahypGsOyfzKd947dZnKvSU3U/caIm/AA1ap8lvVg0XsrAYhUCI88I0cRQ==","X-Received":"by 10.99.126.84 with SMTP id o20mr32967817pgn.139.1505761666120; \n\tMon, 18 Sep 2017 12:07:46 -0700 (PDT)","From":"Eric Dumazet <edumazet@google.com>","To":"\"David S . Miller\" <davem@davemloft.net>","Cc":"netdev <netdev@vger.kernel.org>,\n\t\"Eric W . Biederman\" <ebiederm@xmission.com>,\n\tEric Dumazet <edumazet@google.com>, Eric Dumazet <eric.dumazet@gmail.com>","Subject":"[PATCH net-next 4/7] ipv6: addrlabel: per netns list","Date":"Mon, 18 Sep 2017 12:07:30 -0700","Message-Id":"<20170918190733.26272-5-edumazet@google.com>","X-Mailer":"git-send-email 2.14.1.690.gbb1197296e-goog","In-Reply-To":"<20170918190733.26272-1-edumazet@google.com>","References":"<20170918190733.26272-1-edumazet@google.com>","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"},"content":"Having a global list of labels does not scale to thousands of\nnetns in the cloud era. 
This causes quadratic behavior on\nnetns creation and deletion.\n\nIt is time to have a per netns list of ~10 labels.\n\nTested:\n\n$ time perf record (for f in `seq 1 3000` ; do ip netns add tast$f; done)\n[ perf record: Woken up 1 times to write data ]\n[ perf record: Captured and wrote 3.637 MB perf.data (~158898 samples) ]\n\nreal    0m20.837s # instead of 0m24.227s\nuser    0m0.328s\nsys     0m20.338s # instead of 0m23.753s\n\n    16.17%       ip  [kernel.kallsyms]  [k] netlink_broadcast_filtered\n    12.30%       ip  [kernel.kallsyms]  [k] netlink_has_listeners\n     6.76%       ip  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave\n     5.78%       ip  [kernel.kallsyms]  [k] memset_erms\n     5.77%       ip  [kernel.kallsyms]  [k] kobject_uevent_env\n     5.18%       ip  [kernel.kallsyms]  [k] refcount_sub_and_test\n     4.96%       ip  [kernel.kallsyms]  [k] _raw_read_lock\n     3.82%       ip  [kernel.kallsyms]  [k] refcount_inc_not_zero\n     3.33%       ip  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore\n     2.11%       ip  [kernel.kallsyms]  [k] unmap_page_range\n     1.77%       ip  [kernel.kallsyms]  [k] __wake_up\n     1.69%       ip  [kernel.kallsyms]  [k] strlen\n     1.17%       ip  [kernel.kallsyms]  [k] __wake_up_common\n     1.09%       ip  [kernel.kallsyms]  [k] insert_header\n     1.04%       ip  [kernel.kallsyms]  [k] page_remove_rmap\n     1.01%       ip  [kernel.kallsyms]  [k] consume_skb\n     0.98%       ip  [kernel.kallsyms]  [k] netlink_trim\n     0.51%       ip  [kernel.kallsyms]  [k] kernfs_link_sibling\n     0.51%       ip  [kernel.kallsyms]  [k] filemap_map_pages\n     0.46%       ip  [kernel.kallsyms]  [k] memcpy_erms\n\nSigned-off-by: Eric Dumazet <edumazet@google.com>\n---\n include/net/netns/ipv6.h |  5 +++\n net/ipv6/addrlabel.c     | 81 ++++++++++++++++++------------------------------\n 2 files changed, 35 insertions(+), 51 deletions(-)","diff":"diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h\nindex 
2544f9760a4263b7f1b8d622331ca63038586137..2ea1ed341ef81901b4fa271b0f7f4592e17c4f8a 100644\n--- a/include/net/netns/ipv6.h\n+++ b/include/net/netns/ipv6.h\n@@ -89,6 +89,11 @@ struct netns_ipv6 {\n \tatomic_t\t\tfib6_sernum;\n \tstruct seg6_pernet_data *seg6_data;\n \tstruct fib_notifier_ops\t*notifier_ops;\n+\tstruct {\n+\t\tstruct hlist_head head;\n+\t\tspinlock_t\tlock;\n+\t\tu32\t\tseq;\n+\t} ip6addrlbl_table;\n };\n \n #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)\ndiff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c\nindex b055bc79f56d555c89684116c1580984950f77a8..c6311d7108f651c7385cd6316752ba4a86667dcc 100644\n--- a/net/ipv6/addrlabel.c\n+++ b/net/ipv6/addrlabel.c\n@@ -30,7 +30,6 @@\n  * Policy Table\n  */\n struct ip6addrlbl_entry {\n-\tpossible_net_t lbl_net;\n \tstruct in6_addr prefix;\n \tint prefixlen;\n \tint ifindex;\n@@ -41,19 +40,6 @@ struct ip6addrlbl_entry {\n \tstruct rcu_head rcu;\n };\n \n-static struct ip6addrlbl_table\n-{\n-\tstruct hlist_head head;\n-\tspinlock_t lock;\n-\tu32 seq;\n-} ip6addrlbl_table;\n-\n-static inline\n-struct net *ip6addrlbl_net(const struct ip6addrlbl_entry *lbl)\n-{\n-\treturn read_pnet(&lbl->lbl_net);\n-}\n-\n /*\n  * Default policy table (RFC6724 + extensions)\n  *\n@@ -148,13 +134,10 @@ static inline void ip6addrlbl_put(struct ip6addrlbl_entry *p)\n }\n \n /* Find label */\n-static bool __ip6addrlbl_match(struct net *net,\n-\t\t\t       const struct ip6addrlbl_entry *p,\n+static bool __ip6addrlbl_match(const struct ip6addrlbl_entry *p,\n \t\t\t       const struct in6_addr *addr,\n \t\t\t       int addrtype, int ifindex)\n {\n-\tif (!net_eq(ip6addrlbl_net(p), net))\n-\t\treturn false;\n \tif (p->ifindex && p->ifindex != ifindex)\n \t\treturn false;\n \tif (p->addrtype && p->addrtype != addrtype)\n@@ -169,8 +152,9 @@ static struct ip6addrlbl_entry *__ipv6_addr_label(struct net *net,\n \t\t\t\t\t\t  int type, int ifindex)\n {\n \tstruct ip6addrlbl_entry *p;\n-\thlist_for_each_entry_rcu(p, &ip6addrlbl_table.head, list) 
{\n-\t\tif (__ip6addrlbl_match(net, p, addr, type, ifindex))\n+\n+\thlist_for_each_entry_rcu(p, &net->ipv6.ip6addrlbl_table.head, list) {\n+\t\tif (__ip6addrlbl_match(p, addr, type, ifindex))\n \t\t\treturn p;\n \t}\n \treturn NULL;\n@@ -196,8 +180,7 @@ u32 ipv6_addr_label(struct net *net,\n }\n \n /* allocate one entry */\n-static struct ip6addrlbl_entry *ip6addrlbl_alloc(struct net *net,\n-\t\t\t\t\t\t const struct in6_addr *prefix,\n+static struct ip6addrlbl_entry *ip6addrlbl_alloc(const struct in6_addr *prefix,\n \t\t\t\t\t\t int prefixlen, int ifindex,\n \t\t\t\t\t\t u32 label)\n {\n@@ -236,24 +219,23 @@ static struct ip6addrlbl_entry *ip6addrlbl_alloc(struct net *net,\n \tnewp->addrtype = addrtype;\n \tnewp->label = label;\n \tINIT_HLIST_NODE(&newp->list);\n-\twrite_pnet(&newp->lbl_net, net);\n \trefcount_set(&newp->refcnt, 1);\n \treturn newp;\n }\n \n /* add a label */\n-static int __ip6addrlbl_add(struct ip6addrlbl_entry *newp, int replace)\n+static int __ip6addrlbl_add(struct net *net, struct ip6addrlbl_entry *newp,\n+\t\t\t    int replace)\n {\n-\tstruct hlist_node *n;\n \tstruct ip6addrlbl_entry *last = NULL, *p = NULL;\n+\tstruct hlist_node *n;\n \tint ret = 0;\n \n \tADDRLABEL(KERN_DEBUG \"%s(newp=%p, replace=%d)\\n\", __func__, newp,\n \t\t  replace);\n \n-\thlist_for_each_entry_safe(p, n,\t&ip6addrlbl_table.head, list) {\n+\thlist_for_each_entry_safe(p, n,\t&net->ipv6.ip6addrlbl_table.head, list) {\n \t\tif (p->prefixlen == newp->prefixlen &&\n-\t\t    net_eq(ip6addrlbl_net(p), ip6addrlbl_net(newp)) &&\n \t\t    p->ifindex == newp->ifindex &&\n \t\t    ipv6_addr_equal(&p->prefix, &newp->prefix)) {\n \t\t\tif (!replace) {\n@@ -273,10 +255,10 @@ static int __ip6addrlbl_add(struct ip6addrlbl_entry *newp, int replace)\n \tif (last)\n \t\thlist_add_behind_rcu(&newp->list, &last->list);\n \telse\n-\t\thlist_add_head_rcu(&newp->list, &ip6addrlbl_table.head);\n+\t\thlist_add_head_rcu(&newp->list, &net->ipv6.ip6addrlbl_table.head);\n out:\n \tif 
(!ret)\n-\t\tip6addrlbl_table.seq++;\n+\t\tnet->ipv6.ip6addrlbl_table.seq++;\n \treturn ret;\n }\n \n@@ -292,12 +274,12 @@ static int ip6addrlbl_add(struct net *net,\n \t\t  __func__, prefix, prefixlen, ifindex, (unsigned int)label,\n \t\t  replace);\n \n-\tnewp = ip6addrlbl_alloc(net, prefix, prefixlen, ifindex, label);\n+\tnewp = ip6addrlbl_alloc(prefix, prefixlen, ifindex, label);\n \tif (IS_ERR(newp))\n \t\treturn PTR_ERR(newp);\n-\tspin_lock(&ip6addrlbl_table.lock);\n-\tret = __ip6addrlbl_add(newp, replace);\n-\tspin_unlock(&ip6addrlbl_table.lock);\n+\tspin_lock(&net->ipv6.ip6addrlbl_table.lock);\n+\tret = __ip6addrlbl_add(net, newp, replace);\n+\tspin_unlock(&net->ipv6.ip6addrlbl_table.lock);\n \tif (ret)\n \t\tip6addrlbl_free(newp);\n \treturn ret;\n@@ -315,9 +297,8 @@ static int __ip6addrlbl_del(struct net *net,\n \tADDRLABEL(KERN_DEBUG \"%s(prefix=%pI6, prefixlen=%d, ifindex=%d)\\n\",\n \t\t  __func__, prefix, prefixlen, ifindex);\n \n-\thlist_for_each_entry_safe(p, n, &ip6addrlbl_table.head, list) {\n+\thlist_for_each_entry_safe(p, n, &net->ipv6.ip6addrlbl_table.head, list) {\n \t\tif (p->prefixlen == prefixlen &&\n-\t\t    net_eq(ip6addrlbl_net(p), net) &&\n \t\t    p->ifindex == ifindex &&\n \t\t    ipv6_addr_equal(&p->prefix, prefix)) {\n \t\t\thlist_del_rcu(&p->list);\n@@ -340,9 +321,9 @@ static int ip6addrlbl_del(struct net *net,\n \t\t  __func__, prefix, prefixlen, ifindex);\n \n \tipv6_addr_prefix(&prefix_buf, prefix, prefixlen);\n-\tspin_lock(&ip6addrlbl_table.lock);\n+\tspin_lock(&net->ipv6.ip6addrlbl_table.lock);\n \tret = __ip6addrlbl_del(net, &prefix_buf, prefixlen, ifindex);\n-\tspin_unlock(&ip6addrlbl_table.lock);\n+\tspin_unlock(&net->ipv6.ip6addrlbl_table.lock);\n \treturn ret;\n }\n \n@@ -354,6 +335,9 @@ static int __net_init ip6addrlbl_net_init(struct net *net)\n \n \tADDRLABEL(KERN_DEBUG \"%s\\n\", __func__);\n \n+\tspin_lock_init(&net->ipv6.ip6addrlbl_table.lock);\n+\tINIT_HLIST_HEAD(&net->ipv6.ip6addrlbl_table.head);\n+\n \tfor (i = 
0; i < ARRAY_SIZE(ip6addrlbl_init_table); i++) {\n \t\tint ret = ip6addrlbl_add(net,\n \t\t\t\t\t ip6addrlbl_init_table[i].prefix,\n@@ -373,14 +357,12 @@ static void __net_exit ip6addrlbl_net_exit(struct net *net)\n \tstruct hlist_node *n;\n \n \t/* Remove all labels belonging to the exiting net */\n-\tspin_lock(&ip6addrlbl_table.lock);\n-\thlist_for_each_entry_safe(p, n, &ip6addrlbl_table.head, list) {\n-\t\tif (net_eq(ip6addrlbl_net(p), net)) {\n-\t\t\thlist_del_rcu(&p->list);\n-\t\t\tip6addrlbl_put(p);\n-\t\t}\n+\tspin_lock(&net->ipv6.ip6addrlbl_table.lock);\n+\thlist_for_each_entry_safe(p, n, &net->ipv6.ip6addrlbl_table.head, list) {\n+\t\thlist_del_rcu(&p->list);\n+\t\tip6addrlbl_put(p);\n \t}\n-\tspin_unlock(&ip6addrlbl_table.lock);\n+\tspin_unlock(&net->ipv6.ip6addrlbl_table.lock);\n }\n \n static struct pernet_operations ipv6_addr_label_ops = {\n@@ -390,8 +372,6 @@ static struct pernet_operations ipv6_addr_label_ops = {\n \n int __init ipv6_addr_label_init(void)\n {\n-\tspin_lock_init(&ip6addrlbl_table.lock);\n-\n \treturn register_pernet_subsys(&ipv6_addr_label_ops);\n }\n \n@@ -510,11 +490,10 @@ static int ip6addrlbl_dump(struct sk_buff *skb, struct netlink_callback *cb)\n \tint err;\n \n \trcu_read_lock();\n-\thlist_for_each_entry_rcu(p, &ip6addrlbl_table.head, list) {\n-\t\tif (idx >= s_idx &&\n-\t\t    net_eq(ip6addrlbl_net(p), net)) {\n+\thlist_for_each_entry_rcu(p, &net->ipv6.ip6addrlbl_table.head, list) {\n+\t\tif (idx >= s_idx) {\n \t\t\terr = ip6addrlbl_fill(skb, p,\n-\t\t\t\t\t      ip6addrlbl_table.seq,\n+\t\t\t\t\t      net->ipv6.ip6addrlbl_table.seq,\n \t\t\t\t\t      NETLINK_CB(cb->skb).portid,\n \t\t\t\t\t      cb->nlh->nlmsg_seq,\n \t\t\t\t\t      RTM_NEWADDRLABEL,\n@@ -571,7 +550,7 @@ static int ip6addrlbl_get(struct sk_buff *in_skb, struct nlmsghdr *nlh,\n \tp = __ipv6_addr_label(net, addr, ipv6_addr_type(addr), ifal->ifal_index);\n \tif (p && !ip6addrlbl_hold(p))\n \t\tp = NULL;\n-\tlseq = ip6addrlbl_table.seq;\n+\tlseq = 
net->ipv6.ip6addrlbl_table.seq;\n \trcu_read_unlock();\n \n \tif (!p) {\n","prefixes":["net-next","4/7"]}