Message ID | 20231101144951.26198-1-witu@nvidia.com |
---|---|
Series | Devlink backport: Fix mlx5 driver hangs due to mlx5_sf_hw_table_init |
Acked-by: Bartlomiej Zolnierkiewicz <bartlomiej.zolnierkiewicz@canonical.com>

Please also include the BugLink in the cover letter in future submissions.

--
Best regards,
Bartlomiej

On Wed, Nov 1, 2023 at 3:51 PM William Tu <witu@nvidia.com> wrote:
>
> Summary:
> Machine hangs when loading OFED 2310 mlx5 driver at BlueField
>
> How to reproduce:
> # load the OFED driver
>
> Reason:
> BF got stuck and observed call trace "mlx5_sf_hw_table_init+0xf4/0x2d0 [mlx5_core]"
>
> dmesg from minicom:
> [ 726.569928] INFO: task systemd-udevd:297 blocked for more than 604 seconds.
> [ 726.576895] Tainted: G OE 5.15.0-1029-bluefield #31-Ubuntu
> [ 726.584101] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 726.591913] task:systemd-udevd state:D stack: 0 pid: 297 ppid: 280 flags:0x0000000d
> [ 726.600248] Call trace:
> [ 726.602680] __switch_to+0xf8/0x150
> [ 726.606159] __schedule+0x2b8/0x790
> [ 726.609634] schedule+0x64/0x140
> [ 726.612850] schedule_preempt_disabled+0x18/0x24
> [ 726.617453] __mutex_lock.constprop.0+0x1a0/0x680
> [ 726.622141] __mutex_lock_slowpath+0x40/0x90
> [ 726.626396] mutex_lock+0x64/0x70
> [ 726.629695] devlink_resource_register+0x50/0x1a0
> [ 726.634386] mlx5_sf_hw_table_init+0xf4/0x2d0 [mlx5_core]
> [ 726.639882] mlx5_init_one_devl_locked+0x1c8/0x784 [mlx5_core]
> [ 726.645791] probe_one+0x300/0x5f0 [mlx5_core]
> [ 726.650307] local_pci_probe+0x48/0xb4
> [ 726.654043] pci_device_probe+0x18c/0x200
> [ 726.658039] really_probe+0xd0/0x490
> [ 726.661600] __driver_probe_device+0x148/0x190
> [ 726.666029] driver_probe_device+0x48/0x180
> [ 726.670198] __driver_attach+0x104/0x240
> [ 726.674106] bus_for_each_dev+0x78/0xdc
> [ 726.677927] driver_attach+0x2c/0x40
> [ 726.681486] bus_add_driver+0x154/0x270
> [ 726.685307] driver_register+0x80/0x13c
> [ 726.689129] __pci_register_driver+0x4c/0x60
> [ 726.693386] __init_backport+0xf0/0x1000 [mlx5_core]
> [ 726.698425] do_one_initcall+0x4c/0x250
> [ 726.702248] do_init_module+0x50/0x260
> [ 726.705983] load_module+0x9fc/0xbe0
> [ 726.709543] __do_sys_finit_module+0xa8/0x114
>
> How to fix:
> This is related to
> https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/2039869
> and we need to backport/cherry-pick more patches from the series
>
> Patches are below
> Backport: f655dacb59ac net: devlink: remove unused locked functions
> Backport: 012ec02ae441 netdevsim: convert driver to use unlocked devlink API during init/fini
> Cherry-pick: eb0e9fa2c635 net: devlink: add unlocked variants of devlink_region_create/destroy() functions
> SKIP: 72a4c8c94efa mlxsw: convert driver to use unlocked devlink API during init/fini
> Backport: 70a2ff89369d net: devlink: add unlocked variants of devlink_dpipe*() functions
> Cherry-pick: 755cfa69c4ec net: devlink: add unlocked variants of devlink_sb*() functions
> Cherry-pick: c223d6a4bf6d net: devlink: add unlocked variants of devlink_resource*() functions
> Cherry-pick: 852e85a704c2 net: devlink: add unlocked variants of devling_trap*() functions
> Cherry-pick: e26fde2f5bef net: devlink: avoid false DEADLOCK warning reported by lockdep
>
> Thanks!
>
> Jiri Pirko (6):
>   net: devlink: add unlocked variants of devlink_resource*() functions
>   net: devlink: add unlocked variants of devlink_sb*() functions
>   net: devlink: add unlocked variants of devlink_dpipe*() functions
>   net: devlink: add unlocked variants of devlink_region_create/destroy()
>     functions
>   netdevsim: convert driver to use unlocked devlink API during init/fini
>   net: devlink: remove unused locked functions
>
> Moshe Shemesh (1):
>   net: devlink: avoid false DEADLOCK warning reported by lockdep
>
>  drivers/net/netdevsim/dev.c |  92 +++----
>  drivers/net/netdevsim/fib.c |  62 ++---
>  include/net/devlink.h       |  60 ++--
>  net/core/devlink.c          | 534 ++++++++++++++++++++----------------
>  4 files changed, 421 insertions(+), 327 deletions(-)
>
On 01/11/2023 15:49, William Tu wrote:
> Summary:
> Machine hangs when loading OFED 2310 mlx5 driver at BlueField
> [...]

Acked-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>

Applied to jammy:linux-bluefield/master-next. Thanks.

--
Best regards,
Bartlomiej

On Wed, Nov 1, 2023 at 3:51 PM William Tu <witu@nvidia.com> wrote:
>
> Summary:
> Machine hangs when loading OFED 2310 mlx5 driver at BlueField
> [...]