
[v3] netfilter: xt_connlimit: fix race in connection counting

Message ID 20190103002839.GA111435@dev-dsk-alakeshh-2c-f8a3e6e0.us-west-2.amazon.com
State Awaiting Upstream
Delegated to: David Miller
Series [v3] netfilter: xt_connlimit: fix race in connection counting

Commit Message

Alakesh Haloi Jan. 3, 2019, 12:28 a.m. UTC
commit b36e4523d4d5 ("netfilter: nf_conncount: fix garbage collection confirm
        race")

An iptables rule like the following on a multicore system will result in
accepting more connections than the limit set in the rule.

iptables  -A INPUT -p tcp -m tcp --syn --dport 7777 -m connlimit \
      --connlimit-above 2000 --connlimit-mask 0 -j DROP

In the check_hlist() function, connections that are found in the saved
connections but not in the netfilter conntrack table are deleted, on the
assumption that those connections no longer exist. On multicore systems,
however, there is a small time window in which a connection has been
added to the rb-tree maintained by xt_connlimit but has not yet made it
into the netfilter conntrack table. Deleting such an entry makes
concurrent connections see an incorrect count, so more connections than
the limit set in the iptables rule are accepted.

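This fix is a partial backport of the upstream commit mentioned above. It
records a timestamp (jiffies32) and the owning CPU for each saved
connection, and check_hlist() no longer deletes an entry that is missing
from conntrack if it was added by a different CPU no more than two
jiffies ago; such an entry is still counted.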

Signed-off-by: Alakesh Haloi <alakeshh@amazon.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Florian Westphal <fw@strlen.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: stable@vger.kernel.org # v4.15 and before
Cc: netdev@vger.kernel.org
Cc: Dmitry Andrianov <dmitry.andrianov@alertme.com>
Cc: Justin Pettit <jpettit@vmware.com>
Cc: Yi-Hung Wei <yihung.wei@gmail.com>
---
 net/netfilter/xt_connlimit.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

Comments

Greg KH Jan. 10, 2019, 7:19 p.m. UTC | #1
On Thu, Jan 03, 2019 at 12:28:46AM +0000, Alakesh Haloi wrote:
> Cc: stable@vger.kernel.org # v4.15 and before

But 4.14.92 already has b36e4523d4d5 ("netfilter: nf_conncount: fix garbage
collection confirm race") and 4cd273bb91b3 ("netfilter: nf_conncount:
don't skip eviction when age is negative") in it.  Are you sure you
still need this patch?

thanks,

greg k-h

Alakesh Haloi Jan. 10, 2019, 7:27 p.m. UTC | #2
On Thu, Jan 10, 2019 at 08:19:09PM +0100, Greg KH wrote:
> On Thu, Jan 03, 2019 at 12:28:46AM +0000, Alakesh Haloi wrote:
> > Cc: stable@vger.kernel.org # v4.15 and before
> 
> But 4.14.92 already has b36e4523d4d5 ("netfilter: nf_conncount: fix garbage
> collection confirm race") and 4cd273bb91b3 ("netfilter: nf_conncount:
> don't skip eviction when age is negative") in it.  Are you sure you
> still need this patch?
> 
> thanks,
> 
> greg k-h

Hi Greg,

We no longer need this patch, since the relevant patches are already
in 4.14.92.

Thanks
-Alakesh

Patch

diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
index ffa8eec..e7b092b 100644
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -47,6 +47,8 @@  struct xt_connlimit_conn {
 	struct hlist_node		node;
 	struct nf_conntrack_tuple	tuple;
 	union nf_inet_addr		addr;
+	int				cpu;
+	u32				jiffies32;
 };
 
 struct xt_connlimit_rb {
@@ -126,6 +128,8 @@  static bool add_hlist(struct hlist_head *head,
 		return false;
 	conn->tuple = *tuple;
 	conn->addr = *addr;
+	conn->cpu = raw_smp_processor_id();
+	conn->jiffies32 = (u32)jiffies;
 	hlist_add_head(&conn->node, head);
 	return true;
 }
@@ -148,8 +152,26 @@  static unsigned int check_hlist(struct net *net,
 	hlist_for_each_entry_safe(conn, n, head, node) {
 		found = nf_conntrack_find_get(net, zone, &conn->tuple);
 		if (found == NULL) {
-			hlist_del(&conn->node);
-			kmem_cache_free(connlimit_conn_cachep, conn);
+			/* If the connection is not found, it may be because
+			 * it has not made it into the conntrack table yet.
+			 * Check whether it is a recently created connection
+			 * that was added on a different CPU, and do not
+			 * delete it in that case.
+			 */
+
+			unsigned long a, b;
+			int cpu = raw_smp_processor_id();
+			__u32 age;
+
+			b = conn->jiffies32;
+			a = (u32)jiffies;
+			age = a - b;
+			if (conn->cpu != cpu && age <= 2) {
+				length++;
+			} else {
+				hlist_del(&conn->node);
+				kmem_cache_free(connlimit_conn_cachep, conn);
+			}
 			continue;
 		}
 
@@ -271,6 +293,8 @@  static void tree_nodes_free(struct rb_root *root,
 
 	conn->tuple = *tuple;
 	conn->addr = *addr;
+	conn->cpu = raw_smp_processor_id();
+	conn->jiffies32 = (u32)jiffies;
 	rbconn->addr = *addr;
 
 	INIT_HLIST_HEAD(&rbconn->hhead);