
[0/6] *** SUBJECT HERE ***

Message ID 20230606072229.3988976-1-gerald.yang@canonical.com

Message

Gerald Yang June 6, 2023, 7:22 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/2022318

SRU Justification:

[ Impact ]

When running fio on an NVMe device on an AWS test instance with the 5.19 kernel, IOs get stuck and fio never finishes.

fio command:
sudo fio --name=read_iops_test --filename=/dev/nvme1n1 --filesize=50G --time_based --ramp_time=2s --runtime=1m --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 --bs=16K --iodepth=256 --rw=randread
read_iops_test: (g=0): rw=randread, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=256
fio-3.28
Starting 1 process
Jobs: 1 (f=0): [/(1)][-.-%][eta 01m:02s]

IOs get completely stuck; after a while the kernel log shows:
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.230970] INFO: task fio:2545 blocked for more than 120 seconds.
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.232878] Not tainted 5.19.0-43-generic #44~22.04.1-Ubuntu
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.234738] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237053] task:fio state:D stack: 0 pid: 2545 ppid: 2495 flags:0x00000002
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237057] Call Trace:
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237058] <TASK>
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237061] __schedule+0x257/0x5d0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237066] schedule+0x68/0x110
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237068] io_schedule+0x46/0x80
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237069] blk_mq_get_tag+0x117/0x300
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237072] ? destroy_sched_domains_rcu+0x40/0x40
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237076] __blk_mq_alloc_requests+0xc4/0x1e0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237079] blk_mq_get_new_requests+0xf6/0x1a0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237080] blk_mq_submit_bio+0x1eb/0x440
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237082] __submit_bio+0x109/0x1a0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237085] submit_bio_noacct_nocheck+0xc2/0x120
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237087] submit_bio_noacct+0x209/0x590
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237088] submit_bio+0x40/0xf0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237090] __blkdev_direct_IO_async+0x146/0x1f0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237092] blkdev_direct_IO.part.0+0x40/0xa0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237093] blkdev_read_iter+0x9f/0x1a0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237094] aio_read+0xec/0x1d0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237097] ? __io_submit_one.constprop.0+0x113/0x200
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237099] __io_submit_one.constprop.0+0x113/0x200
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237100] ? __io_submit_one.constprop.0+0x113/0x200
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237101] io_submit_one+0xe8/0x3d0
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237103] __x64_sys_io_submit+0x84/0x190
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237104] ? do_syscall_64+0x69/0x90
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237106] ? do_syscall_64+0x69/0x90
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237107] do_syscall_64+0x59/0x90
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237108] ? syscall_exit_to_user_mode+0x2a/0x50
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237110] ? do_syscall_64+0x69/0x90
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237111] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237113] RIP: 0033:0x7f44f351ea3d
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237116] RSP: 002b:00007fff1dcfe558 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237117] RAX: ffffffffffffffda RBX: 00007f44f2272b68 RCX: 00007f44f351ea3d
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237118] RDX: 000056315d9ad828 RSI: 0000000000000001 RDI: 00007f44f224f000
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237119] RBP: 00007f44f224f000 R08: 00007f44e9430000 R09: 00000000000002d8
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237120] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237120] R13: 0000000000000000 R14: 000056315d9ad828 R15: 000056315d9e1830
Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237122] </TASK>

This issue cannot be reproduced on the 5.15 and 6.2 kernels.

From the call trace, the task was stuck for more than 120 seconds waiting for previous IOs to complete and free tags, so that new IO requests could obtain them.
In fact, not all previous IOs were stuck: at least some of them should have completed, but the waiters were never woken up.

This issue is fixed by the upstream commit below, which was merged in kernel 6.1:
commit 4acb83417cadfdcbe64215f9d0ddcf3132af808e
Author: Keith Busch <kbusch@kernel.org>
Date: Fri Sep 9 11:40:22 2022 -0700

    sbitmap: fix batched wait_cnt accounting

    Batched completions can clear multiple bits, but we're only decrementing
    the wait_cnt by one each time. This can cause waiters to never be woken,
    stalling IO. Use the batched count instead.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    Link: https://lore.kernel.org/r/20220909184022.1709476-1-kbusch@fb.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
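
As an illustration of the accounting problem the commit message describes, here is a minimal userspace sketch. It is a simplified model with made-up numbers, not the kernel's actual lib/sbitmap.c code: a waiter is supposed to be woken once wait_cnt tags have been freed, but if batched completions decrement the counter only once per batch, the wake threshold is never reached.

/* Minimal userspace sketch of the wait_cnt accounting issue above.
 * Simplified model, not the kernel's lib/sbitmap.c: 64 tags complete
 * in batches of 8, and the waiter should be woken once 32 tags
 * (wait_cnt) have been freed. */
#include <stdio.h>

int main(void)
{
	int total_tags = 64, batch = 8, wake_threshold = 32;

	/* Buggy accounting: one decrement per batched completion,
	 * losing batch - 1 frees each time. */
	int wait_cnt = wake_threshold;
	for (int done = 0; done < total_tags; done += batch)
		wait_cnt -= 1;
	printf("buggy: wait_cnt = %d (%s)\n", wait_cnt,
	       wait_cnt <= 0 ? "waiter woken" : "waiter never woken");

	/* Fixed accounting: subtract the batched count, as the
	 * commit does. */
	wait_cnt = wake_threshold;
	for (int done = 0; done < total_tags; done += batch)
		wait_cnt -= batch;
	printf("fixed: wait_cnt = %d (%s)\n", wait_cnt,
	       wait_cnt <= 0 ? "waiter woken" : "waiter never woken");

	return 0;
}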

This commit cannot be cherry-picked cleanly, so we also need to SRU all of its dependencies and one further fix, listed below:

30514bd2dd4e sbitmap: fix lockup while swapping
4acb83417cad sbitmap: fix batched wait_cnt accounting
c35227d4e8cb sbitmap: Use atomic_long_try_cmpxchg in __sbitmap_queue_get_batch
48c033314f37 sbitmap: Avoid leaving waitqueue in invalid state in __sbq_wake_up()
ddbfc34fcf5d sbitmap: remove unnecessary code in __sbitmap_queue_get_batch
040b83fcecfb sbitmap: fix possible io hung due to lost wakeup

[ Test Plan ]

This can be reproduced simply by launching an instance on AWS EC2
and running the fio command above on an NVMe device for a few hours to make sure IOs don't get stuck.

I've built a test kernel with the above commits on top of the 5.19.0-43 generic kernel here:
https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/361041-generic

With this test kernel, fio has been running for a few hours without any issues.
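
For reference, the soak run can be scripted roughly as below; this is a sketch only, and the device name, durations, and polling interval are assumptions to adjust for the instance under test.

#!/bin/sh
# Sketch of a soak run: repeat the reproducer for a few hours while
# watching for hung-task reports. Device and durations are assumptions.
DEV=/dev/nvme1n1
END=$(( $(date +%s) + 4*3600 ))   # soak for roughly 4 hours

while [ "$(date +%s)" -lt "$END" ]; do
    sudo fio --name=read_iops_test --filename="$DEV" --filesize=50G \
        --time_based --ramp_time=2s --runtime=1m --ioengine=libaio \
        --direct=1 --verify=0 --randrepeat=0 --bs=16K --iodepth=256 \
        --rw=randread &
    FIO_PID=$!
    # Each run should finish in about 1 minute; poll for up to 5.
    for i in $(seq 1 30); do
        kill -0 "$FIO_PID" 2>/dev/null || break
        sleep 10
    done
    # Note: grepping the whole dmesg buffer is a rough check and may
    # pick up older messages; it is good enough for a sketch.
    if kill -0 "$FIO_PID" 2>/dev/null ||
       sudo dmesg | grep -q 'blocked for more than 120 seconds'; then
        echo "IOs appear stuck" >&2
        exit 1
    fi
done
echo "soak test passed"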

[ Where problems could occur ]

sbitmap is mainly used by blk-mq in the block layer, by SCSI drivers, and by the Fungible Ethernet driver.
If an issue occurs in sbitmap, the expected symptom is hung IO, or packets getting stuck in the Fungible driver.


Hugh Dickins (1):
  sbitmap: fix lockup while swapping

Jan Kara (1):
  sbitmap: Avoid leaving waitqueue in invalid state in __sbq_wake_up()

Keith Busch (1):
  sbitmap: fix batched wait_cnt accounting

Liu Song (1):
  sbitmap: remove unnecessary code in __sbitmap_queue_get_batch

Uros Bizjak (1):
  sbitmap: Use atomic_long_try_cmpxchg in __sbitmap_queue_get_batch

Yu Kuai (1):
  sbitmap: fix possible io hung due to lost wakeup

 block/blk-mq-tag.c      |   2 +-
 include/linux/sbitmap.h |   3 +-
 lib/sbitmap.c           | 109 ++++++++++++++++++++++++++--------------
 3 files changed, 73 insertions(+), 41 deletions(-)

Comments

Roxana Nicolescu June 6, 2023, 7:24 a.m. UTC | #1
On 06/06/2023 09:22, Gerald Yang wrote:
> [... full cover letter quoted above, snipped ...]
>
No proper subject. I think a follow-up submission was sent in the meantime.
Gerald Yang June 6, 2023, 7:27 a.m. UTC | #2
Sorry, please ignore this email

On Tue, Jun 6, 2023 at 3:25 PM Roxana Nicolescu <
roxana.nicolescu@canonical.com> wrote:

> [... full cover letter quoted above, snipped ...]
>
> No proper subject. I think a follow-up submission was sent in the meantime.