From patchwork Fri Jan 14 18:57:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bodong Wang X-Patchwork-Id: 1580253 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4Jb9Xs0zrKz9sXM for ; Sat, 15 Jan 2022 05:57:49 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1n8Rle-0005j2-Q7; Fri, 14 Jan 2022 18:57:42 +0000 Received: from mail-il-dmz.mellanox.com ([193.47.165.129] helo=mellanox.co.il) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1n8Rlc-0005hX-8G for kernel-team@lists.ubuntu.com; Fri, 14 Jan 2022 18:57:40 +0000 Received: from Internal Mail-Server by MTLPINE1 (envelope-from bodong@nvidia.com) with SMTP; 14 Jan 2022 20:57:35 +0200 Received: from sw-mtx-016.mtx.labs.mlnx. (sw-mtx-016.mtx.labs.mlnx [10.9.150.102]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 20EIvWWq015716; Fri, 14 Jan 2022 20:57:34 +0200 From: Bodong Wang To: kernel-team@lists.ubuntu.com Subject: [SRU][F:linux-bluefield][PATCH V2 1/5] net: zero-initialize tc skb extension on allocation Date: Fri, 14 Jan 2022 12:57:28 -0600 Message-Id: <1642186652-10799-2-git-send-email-bodong@nvidia.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642186652-10799-1-git-send-email-bodong@nvidia.com> References: <1642186652-10799-1-git-send-email-bodong@nvidia.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: vlad@nvidia.com, paulb@nvidia.com, majd@nvidia.com MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Vlad Buslov BugLink: https://bugs.launchpad.net/bugs/1957807 Function skb_ext_add() doesn't initialize created skb extension with any value and leaves it up to the user. However, since extension of type TC_SKB_EXT originally contained only single value tc_skb_ext->chain its users used to just assign the chain value without setting whole extension memory to zero first. This assumption changed when TC_SKB_EXT extension was extended with additional fields but not all users were updated to initialize the new fields which leads to use of uninitialized memory afterwards. UBSAN log: [ 778.299821] UBSAN: invalid-load in net/openvswitch/flow.c:899:28 [ 778.301495] load of value 107 is not a valid value for type '_Bool' [ 778.303215] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc7+ #2 [ 778.304933] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 778.307901] Call Trace: [ 778.308680] [ 778.309358] dump_stack+0xbb/0x107 [ 778.310307] ubsan_epilogue+0x5/0x40 [ 778.311167] __ubsan_handle_load_invalid_value.cold+0x43/0x48 [ 778.312454] ? memset+0x20/0x40 [ 778.313230] ovs_flow_key_extract.cold+0xf/0x14 [openvswitch] [ 778.314532] ovs_vport_receive+0x19e/0x2e0 [openvswitch] [ 778.315749] ? ovs_vport_find_upcall_portid+0x330/0x330 [openvswitch] [ 778.317188] ? create_prof_cpu_mask+0x20/0x20 [ 778.318220] ? arch_stack_walk+0x82/0xf0 [ 778.319153] ? secondary_startup_64_no_verify+0xb0/0xbb [ 778.320399] ? stack_trace_save+0x91/0xc0 [ 778.321362] ? stack_trace_consume_entry+0x160/0x160 [ 778.322517] ? lock_release+0x52e/0x760 [ 778.323444] netdev_frame_hook+0x323/0x610 [openvswitch] [ 778.324668] ? ovs_netdev_get_vport+0xe0/0xe0 [openvswitch] [ 778.325950] __netif_receive_skb_core+0x771/0x2db0 [ 778.327067] ? lock_downgrade+0x6e0/0x6f0 [ 778.328021] ? lock_acquire+0x565/0x720 [ 778.328940] ? generic_xdp_tx+0x4f0/0x4f0 [ 778.329902] ? inet_gro_receive+0x2a7/0x10a0 [ 778.330914] ? lock_downgrade+0x6f0/0x6f0 [ 778.331867] ? udp4_gro_receive+0x4c4/0x13e0 [ 778.332876] ? lock_release+0x52e/0x760 [ 778.333808] ? dev_gro_receive+0xcc8/0x2380 [ 778.334810] ? lock_downgrade+0x6f0/0x6f0 [ 778.335769] __netif_receive_skb_list_core+0x295/0x820 [ 778.336955] ? process_backlog+0x780/0x780 [ 778.337941] ? mlx5e_rep_tc_netdevice_event_unregister+0x20/0x20 [mlx5_core] [ 778.339613] ? seqcount_lockdep_reader_access.constprop.0+0xa7/0xc0 [ 778.341033] ? kvm_clock_get_cycles+0x14/0x20 [ 778.342072] netif_receive_skb_list_internal+0x5f5/0xcb0 [ 778.343288] ? __kasan_kmalloc+0x7a/0x90 [ 778.344234] ? mlx5e_handle_rx_cqe_mpwrq+0x9e0/0x9e0 [mlx5_core] [ 778.345676] ? mlx5e_xmit_xdp_frame_mpwqe+0x14d0/0x14d0 [mlx5_core] [ 778.347140] ? __netif_receive_skb_list_core+0x820/0x820 [ 778.348351] ? mlx5e_post_rx_mpwqes+0xa6/0x25d0 [mlx5_core] [ 778.349688] ? napi_gro_flush+0x26c/0x3c0 [ 778.350641] napi_complete_done+0x188/0x6b0 [ 778.351627] mlx5e_napi_poll+0x373/0x1b80 [mlx5_core] [ 778.352853] __napi_poll+0x9f/0x510 [ 778.353704] ? mlx5_flow_namespace_set_mode+0x260/0x260 [mlx5_core] [ 778.355158] net_rx_action+0x34c/0xa40 [ 778.356060] ? napi_threaded_poll+0x3d0/0x3d0 [ 778.357083] ? sched_clock_cpu+0x18/0x190 [ 778.358041] ? __common_interrupt+0x8e/0x1a0 [ 778.359045] __do_softirq+0x1ce/0x984 [ 778.359938] __irq_exit_rcu+0x137/0x1d0 [ 778.360865] irq_exit_rcu+0xa/0x20 [ 778.361708] common_interrupt+0x80/0xa0 [ 778.362640] [ 778.363212] asm_common_interrupt+0x1e/0x40 [ 778.364204] RIP: 0010:native_safe_halt+0xe/0x10 [ 778.365273] Code: 4f ff ff ff 4c 89 e7 e8 50 3f 40 fe e9 dc fe ff ff 48 89 df e8 43 3f 40 fe eb 90 cc e9 07 00 00 00 0f 00 2d 74 05 62 00 fb f4 90 e9 07 00 00 00 0f 00 2d 64 05 62 00 f4 c3 cc cc 0f 1f 44 00 [ 778.369355] RSP: 0018:ffffffff84407e48 EFLAGS: 00000246 [ 778.370570] RAX: ffff88842de46a80 RBX: ffffffff84425840 RCX: ffffffff83418468 [ 778.372143] RDX: 000000000026f1da RSI: 0000000000000004 RDI: ffffffff8343af5e [ 778.373722] RBP: fffffbfff0884b08 R08: 0000000000000000 R09: ffff88842de46bcb [ 778.375292] R10: ffffed1085bc8d79 R11: 0000000000000001 R12: 0000000000000000 [ 778.376860] R13: ffffffff851124a0 R14: 0000000000000000 R15: dffffc0000000000 [ 778.378491] ? rcu_eqs_enter.constprop.0+0xb8/0xe0 [ 778.379606] ? default_idle_call+0x5e/0xe0 [ 778.380578] default_idle+0xa/0x10 [ 778.381406] default_idle_call+0x96/0xe0 [ 778.382350] do_idle+0x3d4/0x550 [ 778.383153] ? arch_cpu_idle_exit+0x40/0x40 [ 778.384143] cpu_startup_entry+0x19/0x20 [ 778.385078] start_kernel+0x3c7/0x3e5 [ 778.385978] secondary_startup_64_no_verify+0xb0/0xbb Fix the issue by providing new function tc_skb_ext_alloc() that allocates tc skb extension and initializes its memory to 0 before returning it to the caller. Change all existing users to use new API instead of calling skb_ext_add() directly. Fixes: 038ebb1a713d ("net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct") Fixes: d29334c15d33 ("net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct") Signed-off-by: Vlad Buslov Acked-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Paul Blakey (backported from commit 9453d45ecb6c2199d72e73c993e9d98677a2801b) [Paul: Removed changes to non-existing drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c] Signed-off-by: Bodong Wang --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 2 +- include/net/pkt_cls.h | 11 +++++++++++ net/sched/cls_api.c | 2 +- 3 files changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index cedb519..1bfe4e2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -4737,7 +4737,7 @@ bool mlx5e_tc_rep_update_skb(struct mlx5_cqe64 *cqe, } if (chain) { - tc_skb_ext = skb_ext_add(skb, TC_SKB_EXT); + tc_skb_ext = tc_skb_ext_alloc(skb); if (!tc_skb_ext) { WARN_ON(1); return false; diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index 0511a80..0aac605 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -651,6 +651,17 @@ static inline bool tc_in_hw(u32 flags) cls_common->extack = extack; } +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) +static inline struct tc_skb_ext *tc_skb_ext_alloc(struct sk_buff *skb) +{ + struct tc_skb_ext *tc_skb_ext = skb_ext_add(skb, TC_SKB_EXT); + + if (tc_skb_ext) + memset(tc_skb_ext, 0, sizeof(*tc_skb_ext)); + return tc_skb_ext; +} +#endif + enum tc_matchall_command { TC_CLSMATCHALL_REPLACE, TC_CLSMATCHALL_DESTROY, diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 32c2ced..c07cd36 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -1675,7 +1675,7 @@ int tcf_classify_ingress(struct sk_buff *skb, /* If we missed on some chain */ if (ret == TC_ACT_UNSPEC && last_executed_chain) { - ext = skb_ext_add(skb, TC_SKB_EXT); + ext = tc_skb_ext_alloc(skb); if (WARN_ON_ONCE(!ext)) return TC_ACT_SHOT; ext->chain = last_executed_chain; From patchwork Fri Jan 14 18:57:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bodong Wang X-Patchwork-Id: 1580254 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4Jb9Xw29j1z9sXM for ; Sat, 15 Jan 2022 05:57:52 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1n8Rlh-0005kl-1V; Fri, 14 Jan 2022 18:57:45 +0000 Received: from mail-il-dmz.mellanox.com ([193.47.165.129] helo=mellanox.co.il) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1n8Rlc-0005hY-8G for kernel-team@lists.ubuntu.com; Fri, 14 Jan 2022 18:57:40 +0000 Received: from Internal Mail-Server by MTLPINE1 (envelope-from bodong@nvidia.com) with SMTP; 14 Jan 2022 20:57:36 +0200 Received: from sw-mtx-016.mtx.labs.mlnx. (sw-mtx-016.mtx.labs.mlnx [10.9.150.102]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 20EIvWWr015716; Fri, 14 Jan 2022 20:57:35 +0200 From: Bodong Wang To: kernel-team@lists.ubuntu.com Subject: [SRU][F:linux-bluefield][PATCH V2 2/5] net/sched: Extend qdisc control block with tc control block Date: Fri, 14 Jan 2022 12:57:29 -0600 Message-Id: <1642186652-10799-3-git-send-email-bodong@nvidia.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642186652-10799-1-git-send-email-bodong@nvidia.com> References: <1642186652-10799-1-git-send-email-bodong@nvidia.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: vlad@nvidia.com, paulb@nvidia.com, majd@nvidia.com MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Paul Blakey BugLink: https://bugs.launchpad.net/bugs/1957807 BPF layer extends the qdisc control block via struct bpf_skb_data_end and because of that there is no more room to add variables to the qdisc layer control block without going over the skb->cb size. Extend the qdisc control block with a tc control block, and move all tc related variables to there as a pre-step for extending the tc control block with additional members. Signed-off-by: Jakub Kicinski Signed-off-by: Paul Blakey (cherry picked from commit ec624fe740b416fb68d536b37fb8eef46f90b5c2) Signed-off-by: Bodong Wang --- include/net/pkt_sched.h | 15 +++++++++++++++ include/net/sch_generic.h | 2 -- net/core/dev.c | 8 ++++---- net/sched/act_ct.c | 14 +++++++------- net/sched/cls_api.c | 6 ++++-- net/sched/cls_flower.c | 3 ++- net/sched/sch_frag.c | 3 ++- 7 files changed, 34 insertions(+), 17 deletions(-) diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index d1585b5..555083f 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -174,4 +174,19 @@ struct tc_taprio_qopt_offload *taprio_offload_get(struct tc_taprio_qopt_offload *offload); void taprio_offload_free(struct tc_taprio_qopt_offload *offload); +struct tc_skb_cb { + struct qdisc_skb_cb qdisc_cb; + + u16 mru; + bool post_ct; +}; + +static inline struct tc_skb_cb *tc_skb_cb(const struct sk_buff *skb) +{ + struct tc_skb_cb *cb = (struct tc_skb_cb *)skb->cb; + + BUILD_BUG_ON(sizeof(*cb) > sizeof_field(struct sk_buff, cb)); + return cb; +} + #endif diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index d2111d3..4a29c1e 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -425,8 +425,6 @@ struct qdisc_skb_cb { }; #define QDISC_CB_PRIV_LEN 20 unsigned char data[QDISC_CB_PRIV_LEN]; - u16 mru; - bool post_ct; }; typedef void tcf_chain_head_change_t(struct tcf_proto *tp_head, void *priv); diff --git a/net/core/dev.c b/net/core/dev.c index b08add0..4e57161 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3506,8 +3506,8 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb) return skb; /* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */ - qdisc_skb_cb(skb)->mru = 0; - qdisc_skb_cb(skb)->post_ct = false; + tc_skb_cb(skb)->mru = 0; + tc_skb_cb(skb)->post_ct = false; mini_qdisc_bstats_cpu_update(miniq, skb); switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) { @@ -4597,8 +4597,8 @@ static __latent_entropy void net_tx_action(struct softirq_action *h) } qdisc_skb_cb(skb)->pkt_len = skb->len; - qdisc_skb_cb(skb)->mru = 0; - qdisc_skb_cb(skb)->post_ct = false; + tc_skb_cb(skb)->mru = 0; + tc_skb_cb(skb)->post_ct = false; skb->tc_at_ingress = 1; mini_qdisc_bstats_cpu_update(miniq, skb); diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index df6c4eb..d916c59 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -683,10 +683,10 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, u8 family, u16 zone, bool *defrag) { enum ip_conntrack_info ctinfo; - struct qdisc_skb_cb cb; struct nf_conn *ct; int err = 0; bool frag; + u16 mru; /* Previously seen (loopback)? Ignore. */ ct = nf_ct_get(skb, &ctinfo); @@ -701,7 +701,7 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, return err; skb_get(skb); - cb = *qdisc_skb_cb(skb); + mru = tc_skb_cb(skb)->mru; if (family == NFPROTO_IPV4) { enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone; @@ -715,7 +715,7 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, if (!err) { *defrag = true; - cb.mru = IPCB(skb)->frag_max_size; + mru = IPCB(skb)->frag_max_size; } } else { /* NFPROTO_IPV6 */ #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) @@ -728,7 +728,7 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, if (!err) { *defrag = true; - cb.mru = IP6CB(skb)->frag_max_size; + mru = IP6CB(skb)->frag_max_size; } #else err = -EOPNOTSUPP; @@ -737,7 +737,7 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, } if (err != -EINPROGRESS) - *qdisc_skb_cb(skb) = cb; + tc_skb_cb(skb)->mru = mru; skb_clear_hash(skb); skb->ignore_df = 1; return err; @@ -956,7 +956,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, tcf_action_update_bstats(&c->common, skb); if (clear) { - qdisc_skb_cb(skb)->post_ct = false; + tc_skb_cb(skb)->post_ct = false; ct = nf_ct_get(skb, &ctinfo); if (ct) { nf_conntrack_put(&ct->ct_general); @@ -1043,7 +1043,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, out_push: skb_push_rcsum(skb, nh_ofs); - qdisc_skb_cb(skb)->post_ct = true; + tc_skb_cb(skb)->post_ct = true; out_clear: if (defrag) qdisc_skb_cb(skb)->pkt_len = skb->len; diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index c07cd36..535d243 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -1675,12 +1675,14 @@ int tcf_classify_ingress(struct sk_buff *skb, /* If we missed on some chain */ if (ret == TC_ACT_UNSPEC && last_executed_chain) { + struct tc_skb_cb *cb = tc_skb_cb(skb); + ext = tc_skb_ext_alloc(skb); if (WARN_ON_ONCE(!ext)) return TC_ACT_SHOT; ext->chain = last_executed_chain; - ext->mru = qdisc_skb_cb(skb)->mru; - ext->post_ct = qdisc_skb_cb(skb)->post_ct; + ext->mru = cb->mru; + ext->post_ct = cb->post_ct; } return ret; diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index 6009f15..971b8d5 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -19,6 +19,7 @@ #include #include +#include #include #include #include @@ -305,7 +306,7 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp, { struct cls_fl_head *head = rcu_dereference_bh(tp->root); struct fl_flow_key skb_mkey; - bool post_ct = qdisc_skb_cb(skb)->post_ct; + bool post_ct = tc_skb_cb(skb)->post_ct; struct fl_flow_key skb_key; struct fl_flow_mask *mask; struct cls_fl_filter *f; diff --git a/net/sched/sch_frag.c b/net/sched/sch_frag.c index e1e77d3..6633530 100644 --- a/net/sched/sch_frag.c +++ b/net/sched/sch_frag.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB #include #include +#include #include #include #include @@ -137,7 +138,7 @@ static int sch_fragment(struct net *net, struct sk_buff *skb, int sch_frag_xmit_hook(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb)) { - u16 mru = qdisc_skb_cb(skb)->mru; + u16 mru = tc_skb_cb(skb)->mru; int err; if (mru && skb->len > mru + skb->dev->hard_header_len) From patchwork Fri Jan 14 18:57:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bodong Wang X-Patchwork-Id: 1580256 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4Jb9Xx6vhnz9sXM for ; Sat, 15 Jan 2022 05:57:53 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1n8Rli-0005nB-U7; Fri, 14 Jan 2022 18:57:46 +0000 Received: from mail-il-dmz.mellanox.com ([193.47.165.129] helo=mellanox.co.il) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1n8Rlc-0005ha-94 for kernel-team@lists.ubuntu.com; Fri, 14 Jan 2022 18:57:40 +0000 Received: from Internal Mail-Server by MTLPINE1 (envelope-from bodong@nvidia.com) with SMTP; 14 Jan 2022 20:57:38 +0200 Received: from sw-mtx-016.mtx.labs.mlnx. (sw-mtx-016.mtx.labs.mlnx [10.9.150.102]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 20EIvWWs015716; Fri, 14 Jan 2022 20:57:37 +0200 From: Bodong Wang To: kernel-team@lists.ubuntu.com Subject: [SRU][F:linux-bluefield][PATCH V2 3/5] net/sched: flow_dissector: Fix matching on zone id for invalid conns Date: Fri, 14 Jan 2022 12:57:30 -0600 Message-Id: <1642186652-10799-4-git-send-email-bodong@nvidia.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642186652-10799-1-git-send-email-bodong@nvidia.com> References: <1642186652-10799-1-git-send-email-bodong@nvidia.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: vlad@nvidia.com, paulb@nvidia.com, majd@nvidia.com MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Paul Blakey BugLink: https://bugs.launchpad.net/bugs/1957807 If ct rejects a flow, it removes the conntrack info from the skb. act_ct sets the post_ct variable so the dissector will see this case as an +tracked +invalid state, but the zone id is lost with the conntrack info. To restore the zone id on such cases, set the last executed zone, via the tc control block, when passing ct, and read it back in the dissector if there is no ct info on the skb (invalid connection). Fixes: 7baf2429a1a9 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support") Signed-off-by: Jakub Kicinski Signed-off-by: Paul Blakey (cherry picked from commit 3849595866166b23bf6a0cb9ff87e06423167f67) Signed-off-by: Bodong Wang --- include/linux/skbuff.h | 2 +- include/net/pkt_sched.h | 1 + net/core/flow_dissector.c | 3 ++- net/sched/act_ct.c | 1 + net/sched/cls_flower.c | 3 ++- 5 files changed, 7 insertions(+), 3 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 76b8caa..c9161c1 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1332,7 +1332,7 @@ void skb_flow_dissect_meta(const struct sk_buff *skb, struct flow_dissector *flow_dissector, void *target_container, u16 *ctinfo_map, size_t mapsize, - bool post_ct); + bool post_ct, u16 zone); void skb_flow_dissect_tunnel_info(const struct sk_buff *skb, struct flow_dissector *flow_dissector, diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index 555083f..b52882e 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -179,6 +179,7 @@ struct tc_skb_cb { u16 mru; bool post_ct; + u16 zone; /* Only valid if post_ct = true */ }; static inline struct tc_skb_cb *tc_skb_cb(const struct sk_buff *skb) diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 498ce01..e6555e6 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -255,7 +255,7 @@ void skb_flow_dissect_meta(const struct sk_buff *skb, skb_flow_dissect_ct(const struct sk_buff *skb, struct flow_dissector *flow_dissector, void *target_container, u16 *ctinfo_map, - size_t mapsize, bool post_ct) + size_t mapsize, bool post_ct, u16 zone) { #if IS_ENABLED(CONFIG_NF_CONNTRACK) struct flow_dissector_key_ct *key; @@ -277,6 +277,7 @@ void skb_flow_dissect_meta(const struct sk_buff *skb, if (!ct) { key->ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED | TCA_FLOWER_KEY_CT_FLAGS_INVALID; + key->ct_zone = zone; return; } diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index d916c59..796089c 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -1044,6 +1044,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, skb_push_rcsum(skb, nh_ofs); tc_skb_cb(skb)->post_ct = true; + tc_skb_cb(skb)->zone = p->zone; out_clear: if (defrag) qdisc_skb_cb(skb)->pkt_len = skb->len; diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index 971b8d5..5129b67 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -307,6 +307,7 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct cls_fl_head *head = rcu_dereference_bh(tp->root); struct fl_flow_key skb_mkey; bool post_ct = tc_skb_cb(skb)->post_ct; + u16 zone = tc_skb_cb(skb)->zone; struct fl_flow_key skb_key; struct fl_flow_mask *mask; struct cls_fl_filter *f; @@ -324,7 +325,7 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp, skb_flow_dissect_ct(skb, &mask->dissector, &skb_key, fl_ct_info_to_flower_map, ARRAY_SIZE(fl_ct_info_to_flower_map), - post_ct); + post_ct, zone); skb_flow_dissect(skb, &mask->dissector, &skb_key, 0); fl_set_masked_key(&skb_mkey, &skb_key, mask); From patchwork Fri Jan 14 18:57:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bodong Wang X-Patchwork-Id: 1580255 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4Jb9Xx0Ymgz9t0k for ; Sat, 15 Jan 2022 05:57:53 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1n8Rli-0005mW-Ha; Fri, 14 Jan 2022 18:57:46 +0000 Received: from mail-il-dmz.mellanox.com ([193.47.165.129] helo=mellanox.co.il) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1n8Rlc-0005hZ-95 for kernel-team@lists.ubuntu.com; Fri, 14 Jan 2022 18:57:40 +0000 Received: from Internal Mail-Server by MTLPINE1 (envelope-from bodong@nvidia.com) with SMTP; 14 Jan 2022 20:57:39 +0200 Received: from sw-mtx-016.mtx.labs.mlnx. (sw-mtx-016.mtx.labs.mlnx [10.9.150.102]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 20EIvWWt015716; Fri, 14 Jan 2022 20:57:38 +0200 From: Bodong Wang To: kernel-team@lists.ubuntu.com Subject: [SRU][F:linux-bluefield][PATCH V2 4/5] net: openvswitch: Fix matching zone id for invalid conns arriving from tc Date: Fri, 14 Jan 2022 12:57:31 -0600 Message-Id: <1642186652-10799-5-git-send-email-bodong@nvidia.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642186652-10799-1-git-send-email-bodong@nvidia.com> References: <1642186652-10799-1-git-send-email-bodong@nvidia.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: vlad@nvidia.com, paulb@nvidia.com, majd@nvidia.com MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Paul Blakey BugLink: https://bugs.launchpad.net/bugs/1957807 Zone id is not restored if we passed ct and ct rejected the connection, as there is no ct info on the skb. Save the zone from tc skb cb to tc skb extension and pass it on to ovs, use that info to restore the zone id for invalid connections. Fixes: d29334c15d33 ("net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct") Signed-off-by: Jakub Kicinski Signed-off-by: Paul Blakey (cherry picked from commit 635d448a1cce4b4ebee52b351052c70434fa90ea) Signed-off-by: Bodong Wang --- include/linux/skbuff.h | 1 + net/openvswitch/flow.c | 8 +++++++- net/sched/cls_api.c | 1 + 3 files changed, 9 insertions(+), 1 deletion(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index c9161c1..b53877d 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -284,6 +284,7 @@ struct nf_bridge_info { struct tc_skb_ext { __u32 chain; __u16 mru; + __u16 zone; bool post_ct; }; #endif diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c index 3d2e8fd..89e00eb 100644 --- a/net/openvswitch/flow.c +++ b/net/openvswitch/flow.c @@ -34,6 +34,7 @@ #include #include #include +#include #include "conntrack.h" #include "datapath.h" @@ -847,6 +848,7 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info, #endif bool post_ct = false; int res, err; + u16 zone = 0; /* Extract metadata from packet. */ if (tun_info) { @@ -885,6 +887,7 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info, key->recirc_id = tc_ext ? tc_ext->chain : 0; OVS_CB(skb)->mru = tc_ext ? tc_ext->mru : 0; post_ct = tc_ext ? tc_ext->post_ct : false; + zone = post_ct ? tc_ext->zone : 0; } else { key->recirc_id = 0; } @@ -893,8 +896,11 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info, #endif err = key_extract(skb, key); - if (!err) + if (!err) { ovs_ct_fill_key(skb, key, post_ct); /* Must be after key_extract(). */ + if (post_ct && !skb_get_nfct(skb)) + key->ct_zone = zone; + } return err; } diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 535d243..5df4155 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -1683,6 +1683,7 @@ int tcf_classify_ingress(struct sk_buff *skb, ext->chain = last_executed_chain; ext->mru = cb->mru; ext->post_ct = cb->post_ct; + ext->zone = cb->zone; } return ret; From patchwork Fri Jan 14 18:57:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bodong Wang X-Patchwork-Id: 1580257 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4Jb9Y159VZz9sXM for ; Sat, 15 Jan 2022 05:57:57 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1n8Rlo-0005vO-4D; Fri, 14 Jan 2022 18:57:52 +0000 Received: from mail-il-dmz.mellanox.com ([193.47.165.129] helo=mellanox.co.il) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1n8Rlh-0005km-BO for kernel-team@lists.ubuntu.com; Fri, 14 Jan 2022 18:57:45 +0000 Received: from Internal Mail-Server by MTLPINE1 (envelope-from bodong@nvidia.com) with SMTP; 14 Jan 2022 20:57:40 +0200 Received: from sw-mtx-016.mtx.labs.mlnx. (sw-mtx-016.mtx.labs.mlnx [10.9.150.102]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 20EIvWWu015716; Fri, 14 Jan 2022 20:57:39 +0200 From: Bodong Wang To: kernel-team@lists.ubuntu.com Subject: [SRU][F:linux-bluefield][PATCH V2 5/5] UBUNTU: SAUCE: net: openvswitch: Fix ct_state nat flags for conns arriving from tc Date: Fri, 14 Jan 2022 12:57:32 -0600 Message-Id: <1642186652-10799-6-git-send-email-bodong@nvidia.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642186652-10799-1-git-send-email-bodong@nvidia.com> References: <1642186652-10799-1-git-send-email-bodong@nvidia.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: vlad@nvidia.com, paulb@nvidia.com, majd@nvidia.com MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Paul Blakey BugLink: https://bugs.launchpad.net/bugs/1957807 Netfilter conntrack maintains NAT flags per connection indicating whether NAT was configured for the connection. Openvswitch maintains NAT flags on the per packet flow key ct_state field, indicating whether NAT was actually executed on the packet. When a packet misses from tc to ovs the conntrack NAT flags are set. However, NAT was not necessarily executed on the packet because the connection's state might still be in NEW state. As such, openvswitch wrongly assumes that NAT was executed and sets an incorrect flow key NAT flags. Fix this, by flagging to openvswitch which NAT was actually done in act_ct via tc_skb_ext and tc_skb_cb to the openvswitch module, so the packet flow key NAT flags will be correctly set. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Paul Blakey Signed-off-by: Bodong Wang --- include/linux/skbuff.h | 4 +++- include/net/pkt_sched.h | 4 +++- net/openvswitch/flow.c | 16 +++++++++++++--- net/sched/act_ct.c | 6 ++++++ net/sched/cls_api.c | 2 ++ 5 files changed, 27 insertions(+), 5 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index b53877d..9701643 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -285,7 +285,9 @@ struct tc_skb_ext { __u32 chain; __u16 mru; __u16 zone; - bool post_ct; + u8 post_ct:1; + u8 post_ct_snat:1; + u8 post_ct_dnat:1; }; #endif diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index b52882e..6250acb 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -178,7 +178,9 @@ struct tc_skb_cb { struct qdisc_skb_cb qdisc_cb; u16 mru; - bool post_ct; + u8 post_ct:1; + u8 post_ct_snat:1; + u8 post_ct_dnat:1; u16 zone; /* Only valid if post_ct = true */ }; diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c index 89e00eb..090dc2f 100644 --- a/net/openvswitch/flow.c +++ b/net/openvswitch/flow.c @@ -846,7 +846,7 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info, #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) struct tc_skb_ext *tc_ext; #endif - bool post_ct = false; + bool post_ct = false, post_ct_snat = false, post_ct_dnat = false; int res, err; u16 zone = 0; @@ -887,6 +887,8 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info, key->recirc_id = tc_ext ? tc_ext->chain : 0; OVS_CB(skb)->mru = tc_ext ? tc_ext->mru : 0; post_ct = tc_ext ? tc_ext->post_ct : false; + post_ct_snat = post_ct ? tc_ext->post_ct_snat : false; + post_ct_dnat = post_ct ? tc_ext->post_ct_dnat : false; zone = post_ct ? tc_ext->zone : 0; } else { key->recirc_id = 0; @@ -898,8 +900,16 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info, err = key_extract(skb, key); if (!err) { ovs_ct_fill_key(skb, key, post_ct); /* Must be after key_extract(). */ - if (post_ct && !skb_get_nfct(skb)) - key->ct_zone = zone; + if (post_ct) { + if (!skb_get_nfct(skb)) { + key->ct_zone = zone; + } else { + if (!post_ct_dnat) + key->ct_state &= ~OVS_CS_F_DST_NAT; + if (!post_ct_snat) + key->ct_state &= ~OVS_CS_F_SRC_NAT; + } + } } return err; } diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 796089c..8a78cbd 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -832,6 +832,12 @@ static int ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct, } err = nf_nat_packet(ct, ctinfo, hooknum, skb); + if (err == NF_ACCEPT) { + if (maniptype == NF_NAT_MANIP_SRC) + tc_skb_cb(skb)->post_ct_snat = 1; + if (maniptype == NF_NAT_MANIP_DST) + tc_skb_cb(skb)->post_ct_dnat = 1; + } out: return err; } diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 5df4155..aecc4f0 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -1683,6 +1683,8 @@ int tcf_classify_ingress(struct sk_buff *skb, ext->chain = last_executed_chain; ext->mru = cb->mru; ext->post_ct = cb->post_ct; + ext->post_ct_snat = cb->post_ct_snat; + ext->post_ct_dnat = cb->post_ct_dnat; ext->zone = cb->zone; }