
[SRU,F,0/1] Page fault in RDMA ODP triggers BUG_ON during MMU notifier registration

Message ID 20231215100343.32480-1-chengen.du@canonical.com
Series Page fault in RDMA ODP triggers BUG_ON during MMU notifier registration

Message

Chengen Du Dec. 15, 2023, 10:03 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/2046534

SRU Justification:

[Impact]
When a page fault is triggered in RDMA ODP, an MMU notifier is registered as part of handling it.
A race condition exists where the mm can be released while the notifier is being registered, which trips a BUG_ON in mmu_notifier.c:
==========
Oct 14 23:38:32 bnode001 kernel: [1576115.901880] kernel BUG at mm/mmu_notifier.c:255!
Oct 14 23:38:32 bnode001 kernel: [1576115.909129] RSP: 0000:ffffbd3def843c90 EFLAGS: 00010246
Oct 14 23:38:32 bnode001 kernel: [1576115.912689] RAX: ffffa11635d20000 RBX: ffffa0f913ba5800 RCX: 0000000000000000
Oct 14 23:38:32 bnode001 kernel: [1576115.912691] RDX: ffffffffc0b666f0 RSI: ffffffffc0b601c7 RDI: ffffa0f913ba5850
Oct 14 23:38:32 bnode001 kernel: [1576115.913564] RAX: 0000000000000000 RBX: ffffffffc0b5a060 RCX: 0000000000000000
Oct 14 23:38:32 bnode001 kernel: [1576115.913565] RDX: 0000000000000007 RSI: ffffa1152ed3c400 RDI: ffffa1102dcd4300
Oct 14 23:38:32 bnode001 kernel: [1576115.914431] RBP: ffffbd3defcb7c88 R08: ffffa1163f4f50e0 R09: ffffa11638c072c0
Oct 14 23:38:32 bnode001 kernel: [1576115.914432] R10: ffffa0fd99a00000 R11: 0000000000000000 R12: ffffa1152c923b80
Oct 14 23:38:32 bnode001 kernel: [1576115.915263] RBP: ffffbd3def843cb0 R08: ffffa1163f7350e0 R09: ffffa11638c072c0
Oct 14 23:38:32 bnode001 kernel: [1576115.915265] R10: ffffa1088d000000 R11: 0000000000000000 R12: ffffa1102dcd4300
Oct 14 23:38:32 bnode001 kernel: [1576115.916079] R13: ffffa1152c923b80 R14: ffffa1152c923bf8 R15: ffffa114f8127800
Oct 14 23:38:32 bnode001 kernel: [1576115.916080] FS: 0000000000000000(0000) GS:ffffa1163f4c0000(0000) knlGS:0000000000000000
Oct 14 23:38:32 bnode001 kernel: [1576115.917705] R13: ffffa1152ed3c400 R14: ffffa1152ed3c478 R15: ffffa1101cbfbc00
Oct 14 23:38:32 bnode001 kernel: [1576115.917706] FS: 0000000000000000(0000) GS:ffffa1163f700000(0000) knlGS:0000000000000000
Oct 14 23:38:32 bnode001 kernel: [1576115.918506] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 14 23:38:32 bnode001 kernel: [1576115.918508] CR2: 00007f94146af5e0 CR3: 0000001722472004 CR4: 0000000000760ee0
Oct 14 23:38:32 bnode001 kernel: [1576115.919301] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 14 23:38:32 bnode001 kernel: [1576115.919302] CR2: 00007f32f0a2dc80 CR3: 0000001f9f1fc004 CR4: 0000000000760ee0
Oct 14 23:38:32 bnode001 kernel: [1576115.920082] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 14 23:38:32 bnode001 kernel: [1576115.920084] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
Oct 14 23:38:32 bnode001 kernel: [1576115.920850] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 14 23:38:32 bnode001 kernel: [1576115.921604] PKRU: 55555554
Oct 14 23:38:32 bnode001 kernel: [1576115.921605] Call Trace:
Oct 14 23:38:32 bnode001 kernel: [1576115.922354] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
Oct 14 23:38:32 bnode001 kernel: [1576115.922355] PKRU: 55555554
Oct 14 23:38:32 bnode001 kernel: [1576115.923112] mmu_notifier_get_locked+0x5f/0xe0
Oct 14 23:38:32 bnode001 kernel: [1576115.923867] Call Trace:
Oct 14 23:38:32 bnode001 kernel: [1576115.923870] ? mmu_notifier_get_locked+0x79/0xe0
Oct 14 23:38:32 bnode001 kernel: [1576115.924645] ib_umem_odp_alloc_child+0x15a/0x290 [ib_core]
Oct 14 23:38:32 bnode001 kernel: [1576115.925409] ib_umem_odp_alloc_child+0x15a/0x290 [ib_core]
Oct 14 23:38:32 bnode001 kernel: [1576115.926161] pagefault_mr+0x312/0x5d0 [mlx5_ib]
Oct 14 23:38:32 bnode001 kernel: [1576115.926906] pagefault_mr+0x312/0x5d0 [mlx5_ib]
Oct 14 23:38:32 bnode001 kernel: [1576115.927651] pagefault_single_data_segment.isra.0+0x284/0x490 [mlx5_ib]
Oct 14 23:38:32 bnode001 kernel: [1576115.928393] pagefault_single_data_segment.isra.0+0x284/0x490 [mlx5_ib]
Oct 14 23:38:32 bnode001 kernel: [1576115.929131] mlx5_ib_eqe_pf_action+0x7d5/0x990 [mlx5_ib]
Oct 14 23:38:32 bnode001 kernel: [1576115.929866] mlx5_ib_eqe_pf_action+0x7d5/0x990 [mlx5_ib]
Oct 14 23:38:32 bnode001 kernel: [1576115.930610] process_one_work+0x1eb/0x3b0
Oct 14 23:38:32 bnode001 kernel: [1576115.931351] process_one_work+0x1eb/0x3b0
Oct 14 23:38:32 bnode001 kernel: [1576115.932084] worker_thread+0x4d/0x400
Oct 14 23:38:32 bnode001 kernel: [1576115.932813] worker_thread+0x4d/0x400
Oct 14 23:38:32 bnode001 kernel: [1576115.933543] kthread+0x104/0x140
Oct 14 23:38:32 bnode001 kernel: [1576115.934272] kthread+0x104/0x140
Oct 14 23:38:32 bnode001 kernel: [1576115.934986] ? process_one_work+0x3b0/0x3b0
Oct 14 23:38:32 bnode001 kernel: [1576115.934988] ? kthread_park+0x90/0x90
Oct 14 23:38:32 bnode001 kernel: [1576115.935687] ? process_one_work+0x3b0/0x3b0
Oct 14 23:38:32 bnode001 kernel: [1576115.935689] ? kthread_park+0x90/0x90
Oct 14 23:38:32 bnode001 kernel: [1576115.936387] ret_from_fork+0x1f/0x40
Oct 14 23:38:32 bnode001 kernel: [1576115.936389] ---[ end trace 1823b59637af552f ]---
Oct 14 23:38:32 bnode001 kernel: [1576115.937077] ret_from_fork+0x1f/0x40
==========

[Fix]
There is an upstream patch that fixes this issue:
==========
commit a4e63bce1414df7ab6eb82ca9feb8494ce13e554
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date: Thu Feb 27 13:41:18 2020 +0200

    RDMA/odp: Ensure the mm is still alive before creating an implicit child
==========
The patch takes a reference on the mm (via mmget_not_zero()) around the notifier registration, ensuring the mm stays alive for the duration and closing the race.

[Test Plan]
This is a race condition and is difficult to reproduce reliably.
The test plan is to run an RDMA workload on a system with InfiniBand hardware, exercising the ODP page fault path, and verify that no kernel BUG is triggered.

[Where problems could occur]
The patch calls mmget_not_zero() before registering the MMU notifier and drops the reference once registration is done.
If the mm is already being torn down, the child allocation now fails gracefully instead of hitting the BUG_ON; otherwise behavior is unchanged.
The change is confined to the implicit ODP child allocation path in umem_odp.c, so the regression risk is low.

Jason Gunthorpe (1):
  RDMA/odp: Ensure the mm is still alive before creating an implicit
    child

 drivers/infiniband/core/umem_odp.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

Comments

Tim Gardner Jan. 3, 2024, 4:44 p.m. UTC | #1
On 12/15/23 3:03 AM, Chengen Du wrote:
> BugLink: https://bugs.launchpad.net/bugs/2046534
> 
> [...]
> 
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Manuel Diewald Jan. 5, 2024, 10:39 a.m. UTC | #2
On Fri, Dec 15, 2023 at 06:03:42PM +0800, Chengen Du wrote:
> BugLink: https://bugs.launchpad.net/bugs/2046534
> 
> [...]

Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Roxana Nicolescu Jan. 5, 2024, 11:37 a.m. UTC | #3
On 15/12/2023 11:03, Chengen Du wrote:
> BugLink: https://bugs.launchpad.net/bugs/2046534
>
> [...]
Applied to focal master-next branch. Thanks!