From patchwork Tue Mar 7 10:04:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 1753033 X-Patchwork-Delegate: pablo@netfilter.org Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=2620:137:e000::1:20; helo=out1.vger.email; envelope-from=netfilter-devel-owner@vger.kernel.org; receiver=) Received: from out1.vger.email (out1.vger.email [IPv6:2620:137:e000::1:20]) by legolas.ozlabs.org (Postfix) with ESMTP id 4PW9zC15RDz1yWx for ; Tue, 7 Mar 2023 21:04:39 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230514AbjCGKEf (ORCPT ); Tue, 7 Mar 2023 05:04:35 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40368 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230444AbjCGKEe (ORCPT ); Tue, 7 Mar 2023 05:04:34 -0500 Received: from mail.netfilter.org (mail.netfilter.org [217.70.188.207]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id F038A3CE2F; Tue, 7 Mar 2023 02:04:31 -0800 (PST) From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com Subject: [PATCH net 1/3] netfilter: ctnetlink: revert to dumping mark regardless of event type Date: Tue, 7 Mar 2023 11:04:22 +0100 Message-Id: <20230307100424.2037-2-pablo@netfilter.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20230307100424.2037-1-pablo@netfilter.org> References: <20230307100424.2037-1-pablo@netfilter.org> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: netfilter-devel@vger.kernel.org From: Ivan Delalande It seems that change was unintentional, we have userspace code that needs the mark while listening for events like REPLY, DESTROY, etc. Also include 0-marks in requested dumps, as they were before that fix. Fixes: 1feeae071507 ("netfilter: ctnetlink: fix compilation warning after data race fixes in ct mark") Signed-off-by: Ivan Delalande Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_netlink.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index c11dff91d52d..bfc3aaa2c872 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -328,11 +328,12 @@ ctnetlink_dump_timestamp(struct sk_buff *skb, const struct nf_conn *ct) } #ifdef CONFIG_NF_CONNTRACK_MARK -static int ctnetlink_dump_mark(struct sk_buff *skb, const struct nf_conn *ct) +static int ctnetlink_dump_mark(struct sk_buff *skb, const struct nf_conn *ct, + bool dump) { u32 mark = READ_ONCE(ct->mark); - if (!mark) + if (!mark && !dump) return 0; if (nla_put_be32(skb, CTA_MARK, htonl(mark))) @@ -343,7 +344,7 @@ static int ctnetlink_dump_mark(struct sk_buff *skb, const struct nf_conn *ct) return -1; } #else -#define ctnetlink_dump_mark(a, b) (0) +#define ctnetlink_dump_mark(a, b, c) (0) #endif #ifdef CONFIG_NF_CONNTRACK_SECMARK @@ -548,7 +549,7 @@ static int ctnetlink_dump_extinfo(struct sk_buff *skb, static int ctnetlink_dump_info(struct sk_buff *skb, struct nf_conn *ct) { if (ctnetlink_dump_status(skb, ct) < 0 || - ctnetlink_dump_mark(skb, ct) < 0 || + ctnetlink_dump_mark(skb, ct, true) < 0 || ctnetlink_dump_secctx(skb, ct) < 0 || ctnetlink_dump_id(skb, ct) < 0 || ctnetlink_dump_use(skb, ct) < 0 || @@ -831,8 +832,7 @@ ctnetlink_conntrack_event(unsigned int events, const struct nf_ct_event *item) } #ifdef CONFIG_NF_CONNTRACK_MARK - if (events & (1 << IPCT_MARK) && - ctnetlink_dump_mark(skb, ct) < 0) + if (ctnetlink_dump_mark(skb, ct, events & (1 << IPCT_MARK))) goto nla_put_failure; #endif nlmsg_end(skb, nlh); @@ -2735,7 +2735,7 @@ static int __ctnetlink_glue_build(struct sk_buff *skb, struct nf_conn *ct) goto nla_put_failure; #ifdef CONFIG_NF_CONNTRACK_MARK - if (ctnetlink_dump_mark(skb, ct) < 0) + if (ctnetlink_dump_mark(skb, ct, true) < 0) goto nla_put_failure; #endif if (ctnetlink_dump_labels(skb, ct) < 0) From patchwork Tue Mar 7 10:04:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 1753034 X-Patchwork-Delegate: pablo@netfilter.org Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=2620:137:e000::1:20; helo=out1.vger.email; envelope-from=netfilter-devel-owner@vger.kernel.org; receiver=) Received: from out1.vger.email (out1.vger.email [IPv6:2620:137:e000::1:20]) by legolas.ozlabs.org (Postfix) with ESMTP id 4PW9zC44jDz246k for ; Tue, 7 Mar 2023 21:04:39 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229865AbjCGKEh (ORCPT ); Tue, 7 Mar 2023 05:04:37 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40378 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230505AbjCGKEf (ORCPT ); Tue, 7 Mar 2023 05:04:35 -0500 Received: from mail.netfilter.org (mail.netfilter.org [217.70.188.207]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 94DFE30F0; Tue, 7 Mar 2023 02:04:32 -0800 (PST) From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com Subject: [PATCH net 2/3] netfilter: tproxy: fix deadlock due to missing BH disable Date: Tue, 7 Mar 2023 11:04:23 +0100 Message-Id: <20230307100424.2037-3-pablo@netfilter.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20230307100424.2037-1-pablo@netfilter.org> References: <20230307100424.2037-1-pablo@netfilter.org> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: netfilter-devel@vger.kernel.org From: Florian Westphal The xtables packet traverser performs an unconditional local_bh_disable(), but the nf_tables evaluation loop does not. Functions that are called from either xtables or nftables must assume that they can be called in process context. inet_twsk_deschedule_put() assumes that no softirq interrupt can occur. If tproxy is used from nf_tables its possible that we'll deadlock trying to aquire a lock already held in process context. Add a small helper that takes care of this and use it. Link: https://lore.kernel.org/netfilter-devel/401bd6ed-314a-a196-1cdc-e13c720cc8f2@balasys.hu/ Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support") Reported-and-tested-by: Major Dávid Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_tproxy.h | 7 +++++++ net/ipv4/netfilter/nf_tproxy_ipv4.c | 2 +- net/ipv6/netfilter/nf_tproxy_ipv6.c | 2 +- 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/include/net/netfilter/nf_tproxy.h b/include/net/netfilter/nf_tproxy.h index 82d0e41b76f2..faa108b1ba67 100644 --- a/include/net/netfilter/nf_tproxy.h +++ b/include/net/netfilter/nf_tproxy.h @@ -17,6 +17,13 @@ static inline bool nf_tproxy_sk_is_transparent(struct sock *sk) return false; } +static inline void nf_tproxy_twsk_deschedule_put(struct inet_timewait_sock *tw) +{ + local_bh_disable(); + inet_twsk_deschedule_put(tw); + local_bh_enable(); +} + /* assign a socket to the skb -- consumes sk */ static inline void nf_tproxy_assign_sock(struct sk_buff *skb, struct sock *sk) { diff --git a/net/ipv4/netfilter/nf_tproxy_ipv4.c b/net/ipv4/netfilter/nf_tproxy_ipv4.c index b22b2c745c76..69e331799604 100644 --- a/net/ipv4/netfilter/nf_tproxy_ipv4.c +++ b/net/ipv4/netfilter/nf_tproxy_ipv4.c @@ -38,7 +38,7 @@ nf_tproxy_handle_time_wait4(struct net *net, struct sk_buff *skb, hp->source, lport ? lport : hp->dest, skb->dev, NF_TPROXY_LOOKUP_LISTENER); if (sk2) { - inet_twsk_deschedule_put(inet_twsk(sk)); + nf_tproxy_twsk_deschedule_put(inet_twsk(sk)); sk = sk2; } } diff --git a/net/ipv6/netfilter/nf_tproxy_ipv6.c b/net/ipv6/netfilter/nf_tproxy_ipv6.c index 929502e51203..52f828bb5a83 100644 --- a/net/ipv6/netfilter/nf_tproxy_ipv6.c +++ b/net/ipv6/netfilter/nf_tproxy_ipv6.c @@ -63,7 +63,7 @@ nf_tproxy_handle_time_wait6(struct sk_buff *skb, int tproto, int thoff, lport ? lport : hp->dest, skb->dev, NF_TPROXY_LOOKUP_LISTENER); if (sk2) { - inet_twsk_deschedule_put(inet_twsk(sk)); + nf_tproxy_twsk_deschedule_put(inet_twsk(sk)); sk = sk2; } } From patchwork Tue Mar 7 10:04:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 1753035 X-Patchwork-Delegate: pablo@netfilter.org Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=2620:137:e000::1:20; helo=out1.vger.email; envelope-from=netfilter-devel-owner@vger.kernel.org; receiver=) Received: from out1.vger.email (out1.vger.email [IPv6:2620:137:e000::1:20]) by legolas.ozlabs.org (Postfix) with ESMTP id 4PW9zD1df7z1yWx for ; Tue, 7 Mar 2023 21:04:40 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231148AbjCGKEi (ORCPT ); Tue, 7 Mar 2023 05:04:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40406 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231128AbjCGKEf (ORCPT ); Tue, 7 Mar 2023 05:04:35 -0500 Received: from mail.netfilter.org (mail.netfilter.org [217.70.188.207]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 29073392B2; Tue, 7 Mar 2023 02:04:33 -0800 (PST) From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com Subject: [PATCH net 3/3] netfilter: conntrack: adopt safer max chain length Date: Tue, 7 Mar 2023 11:04:24 +0100 Message-Id: <20230307100424.2037-4-pablo@netfilter.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20230307100424.2037-1-pablo@netfilter.org> References: <20230307100424.2037-1-pablo@netfilter.org> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: netfilter-devel@vger.kernel.org From: Eric Dumazet Customers using GKE 1.25 and 1.26 are facing conntrack issues root caused to commit c9c3b6811f74 ("netfilter: conntrack: make max chain length random"). Even if we assume Uniform Hashing, a bucket often reachs 8 chained items while the load factor of the hash table is smaller than 0.5 With a limit of 16, we reach load factors of 3. With a limit of 32, we reach load factors of 11. With a limit of 40, we reach load factors of 15. With a limit of 50, we reach load factors of 24. This patch changes MIN_CHAINLEN to 50, to minimize risks. Ideally, we could in the future add a cushion based on expected load factor (2 * nf_conntrack_max / nf_conntrack_buckets), because some setups might expect unusual values. Fixes: c9c3b6811f74 ("netfilter: conntrack: make max chain length random") Signed-off-by: Eric Dumazet Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 7250082e7de5..c6a6a6099b4e 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -96,8 +96,8 @@ static DEFINE_MUTEX(nf_conntrack_mutex); #define GC_SCAN_MAX_DURATION msecs_to_jiffies(10) #define GC_SCAN_EXPIRED_MAX (64000u / HZ) -#define MIN_CHAINLEN 8u -#define MAX_CHAINLEN (32u - MIN_CHAINLEN) +#define MIN_CHAINLEN 50u +#define MAX_CHAINLEN (80u - MIN_CHAINLEN) static struct conntrack_gc_work conntrack_gc_work;