From patchwork Wed Aug 7 00:05:48 2019
X-Patchwork-Submitter: Yifeng Sun
X-Patchwork-Id: 1143072
From: Yifeng Sun
To: dev@openvswitch.org
Date: Tue, 6 Aug 2019 17:05:48 -0700
Message-Id: <1565136348-28595-1-git-send-email-pkusunyifeng@gmail.com>
Subject: [ovs-dev] [PATCH v2] datapath: compat: Backports bugfixes for nf_conncount

This patch backports several critical bug fixes related to locking and
data consistency in the nf_conncount code.

The backport is based on the following upstream net-next commits:

4cd273b ("netfilter: nf_conncount: don't skip eviction when age is negative")
d4e7df1 ("netfilter: nf_conncount: use rb_link_node_rcu() instead of rb_link_node()")
53ca0f2 ("netfilter: nf_conncount: remove wrong condition check routine")
3c5cdb1 ("netfilter: nf_conncount: fix unexpected permanent node of list.")
31568ec ("netfilter: nf_conncount: fix list_del corruption in conn_free")
fd3e71a ("netfilter: nf_conncount: use spin_lock_bh instead of spin_lock")

This patch also adds compat code so that it builds on all supported
kernel versions.

Travis tests are at
https://travis-ci.org/yifsun/ovs-travis/builds/568603796

VMware-BZ: #2396471
CC: Taehee Yoo
Signed-off-by: Yifeng Sun
---
v1->v2: Add fixes to support old kernel versions.
        Thanks YiHung for reviewing.
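Note on the first commit in the list ("don't skip eviction when age is negative"): the following is a minimal userspace sketch, assuming 32-bit tick stamps, of why an unsigned age still flags an entry whose stamp appears to be ahead of the current time. The helper names and the threshold parameter are hypothetical and are not taken from nf_conncount.c. The signed variant relies on the usual two's-complement conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not from nf_conncount.c. `now` and `then` are
 * 32-bit tick stamps; `threshold` is an illustrative parameter. */

/* Signed age: a stamp ahead of `now` yields a negative age, so the
 * entry is never reported as stale. */
static int stale_signed(uint32_t now, uint32_t then, int32_t threshold)
{
	int32_t age = (int32_t)(now - then);

	return age >= threshold;
}

/* Unsigned age: the same difference wraps to a large value, so the
 * entry is reported as stale. */
static int stale_unsigned(uint32_t now, uint32_t then, uint32_t threshold)
{
	uint32_t age = now - then;

	return age >= threshold;
}

/* Small self-check that exercises both variants. */
static void age_selfcheck(void)
{
	assert(stale_signed(105, 100, 2));    /* age 5: stale either way */
	assert(stale_unsigned(105, 100, 2));
	assert(!stale_signed(100, 105, 2));   /* age -5: eviction skipped */
	assert(stale_unsigned(100, 105, 2));  /* age wraps: entry evicted */
}
```

Calling age_selfcheck() from a test program exercises both cases. Only the stamp-ahead case differs between the two variants, which is the case the `__s32 age` to `u32 age` change in find_or_evict() targets.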
 acinclude.m4                                 |  2 ++
 datapath/linux/Modules.mk                    |  3 +-
 datapath/linux/compat/include/linux/rbtree.h | 19 ++++++++++++
 datapath/linux/compat/nf_conncount.c         | 46 ++++++++++++++++++----------
 4 files changed, 52 insertions(+), 18 deletions(-)
 create mode 100644 datapath/linux/compat/include/linux/rbtree.h

diff --git a/acinclude.m4 b/acinclude.m4
index 116ffcf9096d..f8e856d3303f 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -1012,6 +1012,8 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
                   [OVS_DEFINE([HAVE_GRE_CALC_HLEN])])
   OVS_GREP_IFELSE([$KSRC/include/net/gre.h], [ip_gre_calc_hlen],
                   [OVS_DEFINE([HAVE_IP_GRE_CALC_HLEN])])
+  OVS_GREP_IFELSE([$KSRC/include/linux/rbtree.h], [rb_link_node_rcu],
+                  [OVS_DEFINE([HAVE_RBTREE_RB_LINK_NODE_RCU])])
 
   if cmp -s datapath/linux/kcompat.h.new \
             datapath/linux/kcompat.h >/dev/null 2>&1; then
diff --git a/datapath/linux/Modules.mk b/datapath/linux/Modules.mk
index cbb29f1c69d0..69d7faeac414 100644
--- a/datapath/linux/Modules.mk
+++ b/datapath/linux/Modules.mk
@@ -116,5 +116,6 @@ openvswitch_headers += \
 	linux/compat/include/uapi/linux/netfilter.h \
 	linux/compat/include/linux/mm.h \
 	linux/compat/include/linux/netfilter.h \
-	linux/compat/include/linux/overflow.h
+	linux/compat/include/linux/overflow.h \
+	linux/compat/include/linux/rbtree.h
 EXTRA_DIST += linux/compat/build-aux/export-check-whitelist
diff --git a/datapath/linux/compat/include/linux/rbtree.h b/datapath/linux/compat/include/linux/rbtree.h
new file mode 100644
index 000000000000..dbf20ff0e0b8
--- /dev/null
+++ b/datapath/linux/compat/include/linux/rbtree.h
@@ -0,0 +1,19 @@
+#ifndef __LINUX_RBTREE_WRAPPER_H
+#define __LINUX_RBTREE_WRAPPER_H 1
+
+#include_next
+
+#ifndef HAVE_RBTREE_RB_LINK_NODE_RCU
+#include
+
+static inline void rb_link_node_rcu(struct rb_node *node, struct rb_node *parent,
+				    struct rb_node **rb_link)
+{
+	node->__rb_parent_color = (unsigned long)parent;
+	node->rb_left = node->rb_right = NULL;
+
+	rcu_assign_pointer(*rb_link, node);
+}
+#endif
+
+#endif /* __LINUX_RBTREE_WRAPPER_H */
diff --git a/datapath/linux/compat/nf_conncount.c b/datapath/linux/compat/nf_conncount.c
index eeae440f872d..6a4d058e7fac 100644
--- a/datapath/linux/compat/nf_conncount.c
+++ b/datapath/linux/compat/nf_conncount.c
@@ -54,6 +54,7 @@ struct nf_conncount_tuple {
 	struct nf_conntrack_zone zone;
 	int cpu;
 	u32 jiffies32;
+	bool dead;
 	struct rcu_head rcu_head;
 };
 
@@ -111,15 +112,16 @@ nf_conncount_add(struct nf_conncount_list *list,
 	conn->zone = *zone;
 	conn->cpu = raw_smp_processor_id();
 	conn->jiffies32 = (u32)jiffies;
-	spin_lock(&list->list_lock);
+	conn->dead = false;
+	spin_lock_bh(&list->list_lock);
 	if (list->dead == true) {
 		kmem_cache_free(conncount_conn_cachep, conn);
-		spin_unlock(&list->list_lock);
+		spin_unlock_bh(&list->list_lock);
 		return NF_CONNCOUNT_SKIP;
 	}
 	list_add_tail(&conn->node, &list->head);
 	list->count++;
-	spin_unlock(&list->list_lock);
+	spin_unlock_bh(&list->list_lock);
 	return NF_CONNCOUNT_ADDED;
 }
 
@@ -136,19 +138,22 @@ static bool conn_free(struct nf_conncount_list *list,
 {
 	bool free_entry = false;
 
-	spin_lock(&list->list_lock);
+	spin_lock_bh(&list->list_lock);
 
-	if (list->count == 0) {
-		spin_unlock(&list->list_lock);
+	if (conn->dead) {
+		spin_unlock_bh(&list->list_lock);
 		return free_entry;
 	}
 
 	list->count--;
+	conn->dead = true;
 	list_del_rcu(&conn->node);
-	if (list->count == 0)
+	if (list->count == 0) {
+		list->dead = true;
 		free_entry = true;
+	}
 
-	spin_unlock(&list->list_lock);
+	spin_unlock_bh(&list->list_lock);
 	call_rcu(&conn->rcu_head, __conn_free);
 	return free_entry;
 }
@@ -160,7 +165,7 @@ find_or_evict(struct net *net, struct nf_conncount_list *list,
 	const struct nf_conntrack_tuple_hash *found;
 	unsigned long a, b;
 	int cpu = raw_smp_processor_id();
-	__s32 age;
+	u32 age;
 
 	found = nf_conntrack_find_get(net, &conn->zone, &conn->tuple);
 	if (found)
@@ -248,7 +253,7 @@ static void nf_conncount_list_init(struct nf_conncount_list *list)
 {
 	spin_lock_init(&list->list_lock);
 	INIT_LIST_HEAD(&list->head);
-	list->count = 1;
+	list->count = 0;
 	list->dead = false;
 }
 
@@ -261,6 +266,7 @@ static bool nf_conncount_gc_list(struct net *net,
 	struct nf_conn *found_ct;
 	unsigned int collected = 0;
 	bool free_entry = false;
+	bool ret = false;
 
 	list_for_each_entry_safe(conn, conn_n, &list->head, node) {
 		found = find_or_evict(net, list, conn, &free_entry);
@@ -290,7 +296,15 @@ static bool nf_conncount_gc_list(struct net *net,
 		if (collected > CONNCOUNT_GC_MAX_NODES)
 			return false;
 	}
-	return false;
+
+	spin_lock_bh(&list->list_lock);
+	if (!list->count) {
+		list->dead = true;
+		ret = true;
+	}
+	spin_unlock_bh(&list->list_lock);
+
+	return ret;
 }
 
 static void __tree_nodes_free(struct rcu_head *h)
@@ -310,11 +324,8 @@ static void tree_nodes_free(struct rb_root *root,
 	while (gc_count) {
 		rbconn = gc_nodes[--gc_count];
 		spin_lock(&rbconn->list.list_lock);
-		if (rbconn->list.count == 0 && rbconn->list.dead == false) {
-			rbconn->list.dead = true;
-			rb_erase(&rbconn->node, root);
-			call_rcu(&rbconn->rcu_head, __tree_nodes_free);
-		}
+		rb_erase(&rbconn->node, root);
+		call_rcu(&rbconn->rcu_head, __tree_nodes_free);
 		spin_unlock(&rbconn->list.list_lock);
 	}
 }
@@ -415,8 +426,9 @@ insert_tree(struct net *net,
 	nf_conncount_list_init(&rbconn->list);
 	list_add(&conn->node, &rbconn->list.head);
 	count = 1;
+	rbconn->list.count = count;
 
-	rb_link_node(&rbconn->node, parent, rbnode);
+	rb_link_node_rcu(&rbconn->node, parent, rbnode);
 	rb_insert_color(&rbconn->node, root);
 out_unlock:
 	spin_unlock_bh(&nf_conncount_locks[hash % CONNCOUNT_LOCK_SLOTS]);
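
Reviewer note: the effect of the conn->dead check in the conn_free() hunk can be modelled in userspace. This is a minimal sketch, assuming single-threaded callers. The names toy_list, toy_conn and toy_conn_free are hypothetical, and the spinlock, list unlinking and RCU calls from the patch are deliberately left out, so it only illustrates the bookkeeping and not the locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the list and entry structures; only the
 * fields needed for the bookkeeping are modelled. */
struct toy_list {
	unsigned int count;
	bool dead;
};

struct toy_conn {
	bool dead;
};

/* Same control flow as the patched conn_free(), minus locking and RCU.
 * Returns true when this call released the last live entry. */
static bool toy_conn_free(struct toy_list *list, struct toy_conn *conn)
{
	bool free_entry = false;

	/* A second free of the same entry is a no-op, so the count is
	 * not decremented twice. */
	if (conn->dead)
		return free_entry;

	assert(list->count > 0);
	list->count--;
	conn->dead = true;
	if (list->count == 0) {
		/* Mark the list dead so later adds are skipped. */
		list->dead = true;
		free_entry = true;
	}

	return free_entry;
}
```

With a list holding two entries, freeing the first entry twice leaves the count at 1, and freeing the second entry returns true and marks the list dead. The pre-patch check on list->count == 0 would not have caught the repeated free in this scenario, because the count was still non-zero.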