
[SRU,F:linux-bluefield,v2,0/5] revert "Support hardware miss to tc action"

Message ID 20230424235107.607162-1-witu@nvidia.com

Message

William Tu April 24, 2023, 11:51 p.m. UTC
This series reverts the patches for:
BugLink: https://bugs.launchpad.net/bugs/2012571

While testing, we found the following kernel NULL pointer dereference:
[  299.084455] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  299.093268] Mem abort info:
[  299.096052]   ESR = 0x96000004
[  299.099157]   EC = 0x25: DABT (current EL), IL = 32 bits
[  299.104485]   SET = 0, FnV = 0
[  299.107529]   EA = 0, S1PTW = 0
[  299.110690] Data abort info:
[  299.113585]   ISV = 0, ISS = 0x00000004
[  299.117436]   CM = 0, WnR = 0
[  299.120420] user pgtable: 4k pages, 48-bit VAs, pgdp=000000083b887000
[  299.126882] [0000000000000000] pgd=0000000000000000
[  299.131778] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[  299.137342] Modules linked in: ipmb_host rdma_ucm(O) rdma_cm(O) iw_cm(O) ib_ipoib(O) ib_cm(O) ib_umad(O) mlx5_ib(O) mlx5_core(O) mlxfw(O) mlxdevm(O) auxiliary(O) ib_uverbs(O) psample ib_core(O) mlx_compat(O) sbsa_gwdt act_mirred act_skbedit ip6table_filter ip6table_nat ip6table_mangle ip6_tables xt_MASQUERADE xt_mark iptable_nat xt_conntrack xt_comment iptable_filter iptable_mangle bpfilter nfnetlink_cttimeout nfnetlink act_gact cls_flower sch_ingress xfrm_user xfrm_algo openvswitch nsh nf_conncount nf_nat mst_pciconf(O) ipmi_devintf ipmi_msghandler ipmb_dev_int 8021q garp stp mrp llc overlay dm_multipath uio_pdrv_genirq mlxbf_pmc uio pwr_mlxbf mlx_bootctl mlxbf_pka mlx_trio bluefield_edac sch_fq_codel efi_pstore ip_tables ipv6 crc_ccitt btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 crct10dif_ce i2c_mlxbf gpio_mlxbf2 mlxbf_gige aes_neon_bs aes_neon_blk [last unloaded: mlx_compat]
[  299.222443] CPU: 5 PID: 11902 Comm: revalidator15 Tainted: G           O      5.4.0-1061-bluefield #67-Ubuntu
[  299.232347] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.0.2.12660 Apr 15 2023
[  299.242253] pstate: a0000005 (NzCv daif -PAN -UAO)
[  299.247047] pc : fl_dump_key+0x34/0xa08 [cls_flower]
[  299.252002] lr : fl_dump+0xc4/0x300 [cls_flower]
[  299.256609] sp : ffff8000117935e0
[  299.259912] x29: ffff8000117935e0 x28: ffff4932144e6800
[  299.265214] x27: ffff493206128048 x26: ffff4931d77dd738
[  299.270516] x25: ffff493206128000 x24: ffff4932144e6800
[  299.275817] x23: ffff4932068bba00 x22: ffffcaf6d467d500
[  299.281118] x21: ffff493214581900 x20: 0000000000000000
[  299.286419] x19: ffff4932144e6a18 x18: 0000000000000000
[  299.291721] x17: 0000000000000000 x16: ffffcaf6d369ac70
[  299.297021] x15: 0000000000000000 x14: 0000000000000000
[  299.302323] x13: 0000000000000040 x12: 0000000000000030
[  299.307624] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
[  299.312925] x9 : fefefefefefefeff x8 : ffff493206128074
[  299.318226] x7 : 0000000000000000 x6 : ffff493206128074
[  299.323527] x5 : 0000000000007e50 x4 : 0000000000000000
[  299.328828] x3 : 0000000000000000 x2 : ffff4932144e6a18
[  299.334129] x1 : ffffcaf6d467d500 x0 : ffffcaf68d53198c

Git bisect identifies the first failing commit:
FAILED 0e6ec758d035 net/sched: flower: Move filter handle initialization earlier
PASS 636a658578ec net/sched: cls_api: Support hardware miss to tc action

Fixing the issue appears to be non-trivial, so we request that these patches be reverted.

v1->v2: Fix format errors.

William Tu (5):
  Revert "net/sched: flower: fix fl_change() error recovery path"
  Revert "net/sched: flower: Support hardware miss to tc action"
  Revert "net/sched: flower: Move filter handle initialization earlier"
  Revert "net/sched: cls_api: Support hardware miss to tc action"
  Revert "UBUNTU: SAUCE: net/sched: Provide act to offload action"

 include/linux/skbuff.h     |   6 +-
 include/net/flow_offload.h |   2 -
 include/net/pkt_cls.h      |  30 ++---
 include/net/sch_generic.h  |   2 -
 net/openvswitch/flow.c     |   3 +-
 net/sched/cls_api.c        | 217 ++-----------------------------------
 net/sched/cls_flower.c     |  79 +++++---------
 7 files changed, 51 insertions(+), 288 deletions(-)

Comments

Andrei Gherzan April 25, 2023, 9:42 a.m. UTC | #1
On 23/04/25 02:51AM, William Tu wrote:
> [...]

Acked-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Bartlomiej Zolnierkiewicz April 27, 2023, 10:10 a.m. UTC | #2
Acked-by: Bartlomiej Zolnierkiewicz <bartlomiej.zolnierkiewicz@canonical.com>

On Tue, Apr 25, 2023 at 1:52 AM William Tu <witu@nvidia.com> wrote:
> [...]
Bartlomiej Zolnierkiewicz April 27, 2023, 10:15 a.m. UTC | #3
Applied to focal:linux-bluefield/master-next. Thanks.

--
Best regards,
Bartlomiej

On Tue, Apr 25, 2023 at 1:52 AM William Tu <witu@nvidia.com> wrote:
> [...]