Message ID | 1565136348-28595-1-git-send-email-pkusunyifeng@gmail.com |
---|---|
State | Superseded |
Headers | show |
Series | [ovs-dev,v2] datapath: compat: Backports bugfixes for nf_conncount | expand |
On Tue, Aug 6, 2019 at 5:06 PM Yifeng Sun <pkusunyifeng@gmail.com> wrote: > > This patch backports several critical bug fixes related to > locking and data consistency in nf_conncount code. > > This backport is based on the following upstream net-next upstream commits. > 4cd273b ("netfilter: nf_conncount: don't skip eviction when age is negative") > d4e7df1 ("netfilter: nf_conncount: use rb_link_node_rcu() instead of rb_link_node()") > 53ca0f2 ("netfilter: nf_conncount: remove wrong condition check routine") > 3c5cdb1 ("netfilter: nf_conncount: fix unexpected permanent node of list.") > 31568ec ("netfilter: nf_conncount: fix list_del corruption in conn_free") > fd3e71a ("netfilter: nf_conncount: use spin_lock_bh instead of spin_lock") > > This patch also added additional compat code so that it can build on > all supported kernel versions. > > Travis tests are at > https://travis-ci.org/yifsun/ovs-travis/builds/568603796 > > VMware-BZ: #2396471 > > CC: Taehee Yoo <ap420073@gmail.com> > Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> > --- Hi Yifeng, Thanks for the patch. This backport looks good in general. I found that there are couple of fixes on nf_conncount in upstream kernel. Since we would like to use df4a90250976 ("netfilter: nf_conncount: merge lookup and add functions") to determine if the kernel has all the fixes. It would be good to bring other commits on the same date to our datapath compat code base as well. That is to squash all the following commits. One more suggestion is to squash the other patch that you sent ("datapath: Apply bug fixes of nf_conncount for different kernel versions") into this one, since updating "HAVE_UPSTREAM_NF_CONNCOUNT" is only useful with this patch. a007232066f6 ("netfilter: nf_conncount: fix argument order to find_next_bit") c80f10bc973a ("netfilter: nf_conncount: speculative garbage collection on empty lists") 2f971a8f4255 ("netfilter: nf_conncount: move all list iterations under spinlock") df4a90250976 ("netfilter: nf_conncount: merge lookup and add functions") e8cfb372b38a ("netfilter: nf_conncount: restart search when nodes have been erased") f7fcc98dfc2d ("netfilter: nf_conncount: split gc in two phases") 4cd273bb91b3 ("netfilter: nf_conncount: don't skip eviction when age is negative") c78e7818f16f ("netfilter: nf_conncount: replace CONNCOUNT_LOCK_SLOTS with CONNCOUNT_SLOTS") d4e7df16567b ("netfilter: nf_conncount: use rb_link_node_rcu() instead of rb_link_node()") 53ca0f2fec39 ("netfilter: nf_conncount: remove wrong condition check routine") 3c5cdb17c3be ("netfilter: nf_conncount: fix unexpected permanent node of list.") 31568ec09ea0 ("netfilter: nf_conncount: fix list_del corruption in conn_free") fd3e71a9f71e ("netfilter: nf_conncount: use spin_lock_bh instead of spin_lock") Thanks, -Yi-Hung
Thanks YiHung for reviewing. I've sent out a new version. Best, Yifeng On Wed, Aug 7, 2019 at 10:24 AM Yi-Hung Wei <yihung.wei@gmail.com> wrote: > > On Tue, Aug 6, 2019 at 5:06 PM Yifeng Sun <pkusunyifeng@gmail.com> wrote: > > > > This patch backports several critical bug fixes related to > > locking and data consistency in nf_conncount code. > > > > This backport is based on the following upstream net-next upstream commits. > > 4cd273b ("netfilter: nf_conncount: don't skip eviction when age is negative") > > d4e7df1 ("netfilter: nf_conncount: use rb_link_node_rcu() instead of rb_link_node()") > > 53ca0f2 ("netfilter: nf_conncount: remove wrong condition check routine") > > 3c5cdb1 ("netfilter: nf_conncount: fix unexpected permanent node of list.") > > 31568ec ("netfilter: nf_conncount: fix list_del corruption in conn_free") > > fd3e71a ("netfilter: nf_conncount: use spin_lock_bh instead of spin_lock") > > > > This patch also added additional compat code so that it can build on > > all supported kernel versions. > > > > Travis tests are at > > https://travis-ci.org/yifsun/ovs-travis/builds/568603796 > > > > VMware-BZ: #2396471 > > > > CC: Taehee Yoo <ap420073@gmail.com> > > Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> > > --- > > Hi Yifeng, > > Thanks for the patch. This backport looks good in general. > > I found that there are couple of fixes on nf_conncount in upstream > kernel. Since we would like to use df4a90250976 ("netfilter: > nf_conncount: merge lookup and add functions") to determine if the > kernel has all the fixes. It would be good to bring other commits on > the same date to our datapath compat code base as well. That is to > squash all the following commits. > > One more suggestion is to squash the other patch that you sent > ("datapath: Apply bug fixes of nf_conncount for different kernel > versions") into this one, since updating "HAVE_UPSTREAM_NF_CONNCOUNT" > is only useful with this patch. > > > a007232066f6 ("netfilter: nf_conncount: fix argument order to find_next_bit") > c80f10bc973a ("netfilter: nf_conncount: speculative garbage collection > on empty lists") > 2f971a8f4255 ("netfilter: nf_conncount: move all list iterations under > spinlock") > df4a90250976 ("netfilter: nf_conncount: merge lookup and add functions") > e8cfb372b38a ("netfilter: nf_conncount: restart search when nodes have > been erased") > f7fcc98dfc2d ("netfilter: nf_conncount: split gc in two phases") > 4cd273bb91b3 ("netfilter: nf_conncount: don't skip eviction when age > is negative") > c78e7818f16f ("netfilter: nf_conncount: replace CONNCOUNT_LOCK_SLOTS > with CONNCOUNT_SLOTS") > d4e7df16567b ("netfilter: nf_conncount: use rb_link_node_rcu() instead > of rb_link_node()") > 53ca0f2fec39 ("netfilter: nf_conncount: remove wrong condition check routine") > 3c5cdb17c3be ("netfilter: nf_conncount: fix unexpected permanent node of list.") > 31568ec09ea0 ("netfilter: nf_conncount: fix list_del corruption in conn_free") > fd3e71a9f71e ("netfilter: nf_conncount: use spin_lock_bh instead of spin_lock") > > Thanks, > > -Yi-Hung
diff --git a/acinclude.m4 b/acinclude.m4 index 116ffcf9096d..f8e856d3303f 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -1012,6 +1012,8 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ [OVS_DEFINE([HAVE_GRE_CALC_HLEN])]) OVS_GREP_IFELSE([$KSRC/include/net/gre.h], [ip_gre_calc_hlen], [OVS_DEFINE([HAVE_IP_GRE_CALC_HLEN])]) + OVS_GREP_IFELSE([$KSRC/include/linux/rbtree.h], [rb_link_node_rcu], + [OVS_DEFINE([HAVE_RBTREE_RB_LINK_NODE_RCU])]) if cmp -s datapath/linux/kcompat.h.new \ datapath/linux/kcompat.h >/dev/null 2>&1; then diff --git a/datapath/linux/Modules.mk b/datapath/linux/Modules.mk index cbb29f1c69d0..69d7faeac414 100644 --- a/datapath/linux/Modules.mk +++ b/datapath/linux/Modules.mk @@ -116,5 +116,6 @@ openvswitch_headers += \ linux/compat/include/uapi/linux/netfilter.h \ linux/compat/include/linux/mm.h \ linux/compat/include/linux/netfilter.h \ - linux/compat/include/linux/overflow.h + linux/compat/include/linux/overflow.h \ + linux/compat/include/linux/rbtree.h EXTRA_DIST += linux/compat/build-aux/export-check-whitelist diff --git a/datapath/linux/compat/include/linux/rbtree.h b/datapath/linux/compat/include/linux/rbtree.h new file mode 100644 index 000000000000..dbf20ff0e0b8 --- /dev/null +++ b/datapath/linux/compat/include/linux/rbtree.h @@ -0,0 +1,19 @@ +#ifndef __LINUX_RBTREE_WRAPPER_H +#define __LINUX_RBTREE_WRAPPER_H 1 + +#include_next <linux/rbtree.h> + +#ifndef HAVE_RBTREE_RB_LINK_NODE_RCU +#include <linux/rcupdate.h> + +static inline void rb_link_node_rcu(struct rb_node *node, struct rb_node *parent, + struct rb_node **rb_link) +{ + node->__rb_parent_color = (unsigned long)parent; + node->rb_left = node->rb_right = NULL; + + rcu_assign_pointer(*rb_link, node); +} +#endif + +#endif /* __LINUX_RBTREE_WRAPPER_H */ diff --git a/datapath/linux/compat/nf_conncount.c b/datapath/linux/compat/nf_conncount.c index eeae440f872d..6a4d058e7fac 100644 --- a/datapath/linux/compat/nf_conncount.c +++ b/datapath/linux/compat/nf_conncount.c @@ -54,6 +54,7 @@ struct nf_conncount_tuple { struct nf_conntrack_zone zone; int cpu; u32 jiffies32; + bool dead; struct rcu_head rcu_head; }; @@ -111,15 +112,16 @@ nf_conncount_add(struct nf_conncount_list *list, conn->zone = *zone; conn->cpu = raw_smp_processor_id(); conn->jiffies32 = (u32)jiffies; - spin_lock(&list->list_lock); + conn->dead = false; + spin_lock_bh(&list->list_lock); if (list->dead == true) { kmem_cache_free(conncount_conn_cachep, conn); - spin_unlock(&list->list_lock); + spin_unlock_bh(&list->list_lock); return NF_CONNCOUNT_SKIP; } list_add_tail(&conn->node, &list->head); list->count++; - spin_unlock(&list->list_lock); + spin_unlock_bh(&list->list_lock); return NF_CONNCOUNT_ADDED; } @@ -136,19 +138,22 @@ static bool conn_free(struct nf_conncount_list *list, { bool free_entry = false; - spin_lock(&list->list_lock); + spin_lock_bh(&list->list_lock); - if (list->count == 0) { - spin_unlock(&list->list_lock); + if (conn->dead) { + spin_unlock_bh(&list->list_lock); return free_entry; } list->count--; + conn->dead = true; list_del_rcu(&conn->node); - if (list->count == 0) + if (list->count == 0) { + list->dead = true; free_entry = true; + } - spin_unlock(&list->list_lock); + spin_unlock_bh(&list->list_lock); call_rcu(&conn->rcu_head, __conn_free); return free_entry; } @@ -160,7 +165,7 @@ find_or_evict(struct net *net, struct nf_conncount_list *list, const struct nf_conntrack_tuple_hash *found; unsigned long a, b; int cpu = raw_smp_processor_id(); - __s32 age; + u32 age; found = nf_conntrack_find_get(net, &conn->zone, &conn->tuple); if (found) @@ -248,7 +253,7 @@ static void nf_conncount_list_init(struct nf_conncount_list *list) { spin_lock_init(&list->list_lock); INIT_LIST_HEAD(&list->head); - list->count = 1; + list->count = 0; list->dead = false; } @@ -261,6 +266,7 @@ static bool nf_conncount_gc_list(struct net *net, struct nf_conn *found_ct; unsigned int collected = 0; bool free_entry = false; + bool ret = false; list_for_each_entry_safe(conn, conn_n, &list->head, node) { found = find_or_evict(net, list, conn, &free_entry); @@ -290,7 +296,15 @@ static bool nf_conncount_gc_list(struct net *net, if (collected > CONNCOUNT_GC_MAX_NODES) return false; } - return false; + + spin_lock_bh(&list->list_lock); + if (!list->count) { + list->dead = true; + ret = true; + } + spin_unlock_bh(&list->list_lock); + + return ret; } static void __tree_nodes_free(struct rcu_head *h) @@ -310,11 +324,8 @@ static void tree_nodes_free(struct rb_root *root, while (gc_count) { rbconn = gc_nodes[--gc_count]; spin_lock(&rbconn->list.list_lock); - if (rbconn->list.count == 0 && rbconn->list.dead == false) { - rbconn->list.dead = true; - rb_erase(&rbconn->node, root); - call_rcu(&rbconn->rcu_head, __tree_nodes_free); - } + rb_erase(&rbconn->node, root); + call_rcu(&rbconn->rcu_head, __tree_nodes_free); spin_unlock(&rbconn->list.list_lock); } } @@ -415,8 +426,9 @@ insert_tree(struct net *net, nf_conncount_list_init(&rbconn->list); list_add(&conn->node, &rbconn->list.head); count = 1; + rbconn->list.count = count; - rb_link_node(&rbconn->node, parent, rbnode); + rb_link_node_rcu(&rbconn->node, parent, rbnode); rb_insert_color(&rbconn->node, root); out_unlock: spin_unlock_bh(&nf_conncount_locks[hash % CONNCOUNT_LOCK_SLOTS]);
This patch backports several critical bug fixes related to locking and data consistency in nf_conncount code. This backport is based on the following upstream net-next upstream commits. 4cd273b ("netfilter: nf_conncount: don't skip eviction when age is negative") d4e7df1 ("netfilter: nf_conncount: use rb_link_node_rcu() instead of rb_link_node()") 53ca0f2 ("netfilter: nf_conncount: remove wrong condition check routine") 3c5cdb1 ("netfilter: nf_conncount: fix unexpected permanent node of list.") 31568ec ("netfilter: nf_conncount: fix list_del corruption in conn_free") fd3e71a ("netfilter: nf_conncount: use spin_lock_bh instead of spin_lock") This patch also added additional compat code so that it can build on all supported kernel versions. Travis tests are at https://travis-ci.org/yifsun/ovs-travis/builds/568603796 VMware-BZ: #2396471 CC: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> --- v1->v2: Add fixes to support old kernel versions. Thanks YiHung for reviewing. acinclude.m4 | 2 ++ datapath/linux/Modules.mk | 3 +- datapath/linux/compat/include/linux/rbtree.h | 19 ++++++++++++ datapath/linux/compat/nf_conncount.c | 46 ++++++++++++++++++---------- 4 files changed, 52 insertions(+), 18 deletions(-) create mode 100644 datapath/linux/compat/include/linux/rbtree.h