From patchwork Thu Jan 3 00:28:46 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alakesh Haloi
X-Patchwork-Id: 1020120
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming-netdev@ozlabs.org
Delivered-To: patchwork-incoming-netdev@ozlabs.org
Authentication-Results: ozlabs.org; spf=none (mailfrom)
	smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67;
	helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org;
	receiver=)
Authentication-Results: ozlabs.org;
	dmarc=pass (p=quarantine dis=none) header.from=amazon.com
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected)
	header.d=amazon.com header.i=@amazon.com header.b="MTppiuld";
	dkim-atps=neutral
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 43VTLR4zlNz9s2P
	for ; Thu, 3 Jan 2019 11:29:03 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1728383AbfACA26 (ORCPT ); Wed, 2 Jan 2019 19:28:58 -0500
Received: from smtp-fw-6002.amazon.com ([52.95.49.90]:61909 "EHLO
	smtp-fw-6002.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1728186AbfACA26 (ORCPT );
	Wed, 2 Jan 2019 19:28:58 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209;
	t=1546475337; x=1578011337;
	h=date:from:to:cc:subject:message-id:mime-version;
	bh=zxc1h6/9T0t/6MRwM1vKM1q1Yg8PkqYt1NdzWX3RFng=;
	b=MTppiuldmbs1iyPB8relCRilcEzTzAKQUWu3+UqLMqoX0Vzse1D7Zef3
	vqB/cyAyPsUtaEPyPuXEW+jszwZofQshmj1g/64DgsBp+fIXGDYQFFk8+
	ATkPTXhfBNvzTsIj6rQsnD8tNx0PeA1mSkCQDQBvT4i6uvVCBa/6aDt+3
	Q=;
X-IronPort-AV: E=Sophos;i="5.56,253,1539648000";
	d="scan'208";a="380031596"
Received: from iad6-co-svc-p1-lb1-vlan3.amazon.com (HELO
	email-inbound-relay-1a-67b371d8.us-east-1.amazon.com) ([10.124.125.6])
	by smtp-border-fw-out-6002.iad6.amazon.com with
	ESMTP/TLS/DHE-RSA-AES256-SHA; 03 Jan 2019 00:28:56 +0000
Received: from EX13MTAUWC001.ant.amazon.com
	(iad55-ws-svc-p15-lb9-vlan2.iad.amazon.com [10.40.159.162])
	by email-inbound-relay-1a-67b371d8.us-east-1.amazon.com (8.14.7/8.14.7)
	with ESMTP id x030Smnp057729
	(version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=FAIL);
	Thu, 3 Jan 2019 00:28:52 GMT
Received: from EX13D17UWC003.ant.amazon.com (10.43.162.206) by
	EX13MTAUWC001.ant.amazon.com (10.43.162.135) with Microsoft SMTP Server
	(TLS) id 15.0.1367.3; Thu, 3 Jan 2019 00:28:52 +0000
Received: from dev-dsk-alakeshh-2c-f8a3e6e0.us-west-2.amazon.com
	(10.43.161.166) by EX13D17UWC003.ant.amazon.com (10.43.162.206) with
	Microsoft SMTP Server (TLS) id 15.0.1367.3;
	Thu, 3 Jan 2019 00:28:52 +0000
Date: Thu, 3 Jan 2019 00:28:46 +0000
From: Alakesh Haloi
To:
CC: Pablo Neira Ayuso , Jozsef Kadlecsik , Florian Westphal ,
	"David S. Miller" , Dmitry Andrianov , Justin Pettit ,
	"Yi-Hung Wei" ,
Subject: [PATCH v3] netfilter: xt_connlimit: fix race in connection counting
Message-ID: <20190103002839.GA111435@dev-dsk-alakeshh-2c-f8a3e6e0.us-west-2.amazon.com>
MIME-Version: 1.0
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Originating-IP: [10.43.161.166]
X-ClientProxiedBy: EX13D25UWB001.ant.amazon.com (10.43.161.245) To
	EX13D17UWC003.ant.amazon.com (10.43.162.206)
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

commit b36e4523d4d5 ("netfilter: nf_conncount: fix garbage collection confirm race")

An iptables rule like the following on a multicore system will result in
accepting more connections than the limit set in the rule.

  iptables -A INPUT -p tcp -m tcp --syn --dport 7777 -m connlimit \
      --connlimit-above 2000 --connlimit-mask 0 -j DROP

In the check_hlist function, connections that are found in the saved
connections but not in netfilter conntrack are deleted, on the assumption
that those connections no longer exist.
But on multi-core systems there is a small time window in which a
connection has been added to the rb-tree maintained by xt_connlimit but
has not yet made it into the netfilter conntrack table. This causes
concurrent connections to get incorrect counts and go over the limit set
in the iptables rule.

The fix has been partially backported from the above-mentioned upstream
commit. Introduce a timestamp and the owning CPU for each saved connection.

Signed-off-by: Alakesh Haloi
Cc: Pablo Neira Ayuso
Cc: Jozsef Kadlecsik
Cc: Florian Westphal
Cc: "David S. Miller"
Cc: stable@vger.kernel.org # v4.15 and before
Cc: netdev@vger.kernel.org
Cc: Dmitry Andrianov
Cc: Justin Pettit
Cc: Yi-Hung Wei
---
 net/netfilter/xt_connlimit.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
index ffa8eec..e7b092b 100644
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -47,6 +47,8 @@ struct xt_connlimit_conn {
 	struct hlist_node		node;
 	struct nf_conntrack_tuple	tuple;
 	union nf_inet_addr		addr;
+	int				cpu;
+	u32				jiffies32;
 };
 
 struct xt_connlimit_rb {
@@ -126,6 +128,8 @@ static bool add_hlist(struct hlist_head *head,
 		return false;
 	conn->tuple = *tuple;
 	conn->addr = *addr;
+	conn->cpu = raw_smp_processor_id();
+	conn->jiffies32 = (u32)jiffies;
 	hlist_add_head(&conn->node, head);
 	return true;
 }
@@ -148,8 +152,26 @@ static unsigned int check_hlist(struct net *net,
 	hlist_for_each_entry_safe(conn, n, head, node) {
 		found = nf_conntrack_find_get(net, zone, &conn->tuple);
 		if (found == NULL) {
-			hlist_del(&conn->node);
-			kmem_cache_free(connlimit_conn_cachep, conn);
+			/* If connection is not found, it may be because
+			 * it has not made it into conntrack table yet. We
+			 * check if it is a recently created connection
+			 * on a different core and do not delete it in that
+			 * case.
+			 */
+
+			unsigned long a, b;
+			int cpu = raw_smp_processor_id();
+			__u32 age;
+
+			b = conn->jiffies32;
+			a = (u32)jiffies;
+			age = a - b;
+			if (conn->cpu != cpu && age <= 2) {
+				length++;
+			} else {
+				hlist_del(&conn->node);
+				kmem_cache_free(connlimit_conn_cachep, conn);
+			}
 			continue;
 		}
 
@@ -271,6 +293,8 @@ static void tree_nodes_free(struct rb_root *root,
 
 	conn->tuple = *tuple;
 	conn->addr = *addr;
+	conn->cpu = raw_smp_processor_id();
+	conn->jiffies32 = (u32)jiffies;
 	rbconn->addr = *addr;
 
 	INIT_HLIST_HEAD(&rbconn->hhead);
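
Note on the age check in check_hlist: the keep-or-delete decision depends
on a 32-bit tick difference, which should remain correct when the counter
wraps because unsigned subtraction is modulo 2^32. The following is a
minimal user-space sketch of that condition, not kernel code. The names
conn_stamp, stamp_age and keep_unconfirmed are hypothetical and exist only
for illustration; the threshold of 2 ticks and the CPU comparison are taken
from the hunk above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields the patch adds to
 * struct xt_connlimit_conn. */
struct conn_stamp {
	int cpu;            /* CPU that inserted the entry */
	uint32_t jiffies32; /* low 32 bits of the tick counter at insertion */
};

/* Age of the entry in ticks. Unsigned 32-bit subtraction wraps modulo
 * 2^32, so the result is still the elapsed tick count when the counter
 * has wrapped between insertion and now. */
static uint32_t stamp_age(uint32_t now, const struct conn_stamp *s)
{
	return now - s->jiffies32;
}

/* Mirrors the condition in the patch: an entry that is missing from
 * conntrack is kept (and counted) only if a different CPU inserted it
 * at most two ticks ago. Otherwise the caller would delete it. */
static bool keep_unconfirmed(const struct conn_stamp *s, int this_cpu,
			     uint32_t now)
{
	return s->cpu != this_cpu && stamp_age(now, s) <= 2;
}
```

For an entry stamped at 0xFFFFFFFF by CPU 1 and checked at tick 1 on CPU 0,
stamp_age gives 2 and the entry is kept; the same entry checked on CPU 1,
or one tick later on CPU 0, would be deleted.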