mbox series

[SRU,J,0/3] bcache: fix deadlock

Message ID 20220729115804.2207073-1-nikhil.kshirsagar@canonical.com
Headers show
Series bcache: fix deadlock | expand

Message

Nikhil Kshirsagar July 29, 2022, 11:58 a.m. UTC
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1980925

[Impact]

When Random Read I/O is started with a test like -

fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0

or

random read-writes with a test like,

fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0

traces are seen in the kernel log,

[ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
[ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445
[ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000
[ 4474.731408] Call Trace:
[ 4474.731411] <TASK>
[ 4474.731413] __schedule+0x23d/0x5a0
[ 4474.731433] schedule+0x4e/0xb0
[ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0
[ 4474.731441] down_write+0x43/0x50
[ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache]
[ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache]
[ 4474.731487] kthread+0x12a/0x150
[ 4474.731491] ? set_kthread_struct+0x50/0x50
[ 4474.731494] ret_from_fork+0x22/0x30
[ 4474.731499] </TASK>

The bug exists till kernel 5.15.50-051550-generic

The reproducer is pasted below:

# uname -a
Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sdd 8:48 0 279.4G 0 disk
└─sdd1 8:49 0 60G 0 part
  └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount
nvme0n1 259:0 0 372.6G 0 disk
└─nvme0n1p1 259:2 0 15G 0 part
  └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount

fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1
read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
fio-3.28
Starting 1 process
read_iops: Laying out IO file (1 file / 12288MiB)

The test does not progress beyond a few minutes, and this trace is then seen in the kernel log,

[ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
[ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445
[ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000
[ 4474.731408] Call Trace:
[ 4474.731411] <TASK>
[ 4474.731413] __schedule+0x23d/0x5a0
[ 4474.731433] schedule+0x4e/0xb0
[ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0
[ 4474.731441] down_write+0x43/0x50
[ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache]
[ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache]
[ 4474.731487] kthread+0x12a/0x150
[ 4474.731491] ? set_kthread_struct+0x50/0x50
[ 4474.731494] ret_from_fork+0x22/0x30
[ 4474.731499] </TASK>

[Fix]
These 3 fixes are needed for the SRU.

dea3560e5f31965165bcf34ecf0b47af28bfd155, 6445ec3df23f24677064a327dce437ef3e02dc6,
dc60301fb408e06e0b718c0980cdd31d2b238bee

I have built these fixes into kernel 5.15.0-39-generic (jammy) and tested to verify the problem is fixed.

[Regression Potential]

Regression potential should be minimal. I have not seen any potential drawbacks or harmful effects of this fix in my testing.

Coly Li (3):
  bcache: improve multithreaded bch_btree_check()
  bcache: improve multithreaded bch_sectors_dirty_init()
  bcache: memset on stack variables in bch_btree_check() and
    bch_sectors_dirty_init()

 drivers/md/bcache/btree.c     | 59 +++++++++++++++-----------------
 drivers/md/bcache/btree.h     |  2 +-
 drivers/md/bcache/writeback.c | 63 +++++++++++++++--------------------
 drivers/md/bcache/writeback.h |  2 +-
 4 files changed, 55 insertions(+), 71 deletions(-)

Comments

Tim Gardner July 29, 2022, 1:26 p.m. UTC | #1
On 7/29/22 05:58, Nikhil Kshirsagar wrote:
> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1980925
> 
> [Impact]
> 
> When Random Read I/O is started with a test like -
> 
> fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0
> 
> or
> 
> random read-writes with a test like,
> 
> fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0
> 
> traces are seen in the kernel log,
> 
> [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
> [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445
> [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000
> [ 4474.731408] Call Trace:
> [ 4474.731411] <TASK>
> [ 4474.731413] __schedule+0x23d/0x5a0
> [ 4474.731433] schedule+0x4e/0xb0
> [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0
> [ 4474.731441] down_write+0x43/0x50
> [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache]
> [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache]
> [ 4474.731487] kthread+0x12a/0x150
> [ 4474.731491] ? set_kthread_struct+0x50/0x50
> [ 4474.731494] ret_from_fork+0x22/0x30
> [ 4474.731499] </TASK>
> 
> The bug exists till kernel 5.15.50-051550-generic
> 
> The reproducer is pasted below:
> 
> # uname -a
> Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
> 
> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
> sdd 8:48 0 279.4G 0 disk
> └─sdd1 8:49 0 60G 0 part
>    └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount
> nvme0n1 259:0 0 372.6G 0 disk
> └─nvme0n1p1 259:2 0 15G 0 part
>    └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount
> 
> fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1
> read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
> fio-3.28
> Starting 1 process
> read_iops: Laying out IO file (1 file / 12288MiB)
> 
> The test does not progress beyond a few minutes, and this trace is then seen in the kernel log,
> 
> [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
> [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445
> [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000
> [ 4474.731408] Call Trace:
> [ 4474.731411] <TASK>
> [ 4474.731413] __schedule+0x23d/0x5a0
> [ 4474.731433] schedule+0x4e/0xb0
> [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0
> [ 4474.731441] down_write+0x43/0x50
> [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache]
> [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache]
> [ 4474.731487] kthread+0x12a/0x150
> [ 4474.731491] ? set_kthread_struct+0x50/0x50
> [ 4474.731494] ret_from_fork+0x22/0x30
> [ 4474.731499] </TASK>
> 
> [Fix]
> These 3 fixes are needed for the SRU.
> 
> dea3560e5f31965165bcf34ecf0b47af28bfd155, 6445ec3df23f24677064a327dce437ef3e02dc6,
> dc60301fb408e06e0b718c0980cdd31d2b238bee
> 
> I have built these fixes into kernel 5.15.0-39-generic (jammy) and tested to verify the problem is fixed.
> 
> [Regression Potential]
> 
> Regression potential should be minimal. I have not seen any potential drawbacks or harmful effects of this fix in my testing.
> 
> Coly Li (3):
>    bcache: improve multithreaded bch_btree_check()
>    bcache: improve multithreaded bch_sectors_dirty_init()
>    bcache: memset on stack variables in bch_btree_check() and
>      bch_sectors_dirty_init()
> 
>   drivers/md/bcache/btree.c     | 59 +++++++++++++++-----------------
>   drivers/md/bcache/btree.h     |  2 +-
>   drivers/md/bcache/writeback.c | 63 +++++++++++++++--------------------
>   drivers/md/bcache/writeback.h |  2 +-
>   4 files changed, 55 insertions(+), 71 deletions(-)
> 
Acked-by: Tim Gardner <tim.gardner@canonical.com>

Need BugLink in the patches, but that can be done when applying.
Stefan Bader Aug. 1, 2022, 8:15 a.m. UTC | #2
On 29.07.22 13:58, Nikhil Kshirsagar wrote:
> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1980925

Beside this ^ being needed in all 3 patches (and it should use the standard 
format of https://bugs.launchpad.net/bugs/1980925) all 3 SHA1 references are 
incorrect. Those *should* be:

622536443b6731ec82c563aae7807165adbe9178
  "bcache: improve multithreaded bch_btree_check()"
4dc34ae1b45fe26e772a44379f936c72623dd407
  "bcache: improve multithreaded bch_sectors_dirty_init()"
7d6b902ea0e02b2a25c480edf471cbaa4ebe6b3c
  "bcache: memset on stack variables in bch_btree_check() and
   bch_sectors_dirty_init()"

Also this is for now submitted to jammy only, but the report has a nomination 
for Focal. But both patch #1 and #2 refer to v5.7 patches for introducing the 
issue they fix and they did not get backported into v5.4[.y] as of 5.4.208. The 
fixes are included in 5.19 so Kinetic is neither affected. I marked both as invalid.

For Jammy, provided the SHA1 and BugLink will be fixed when applying:

Acked-by: Stefan Bader <stefan.bader@canonical.com>

> 
> [Impact]
> 
> When Random Read I/O is started with a test like -
> 
> fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0
> 
> or
> 
> random read-writes with a test like,
> 
> fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0
> 
> traces are seen in the kernel log,
> 
> [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
> [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445
> [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000
> [ 4474.731408] Call Trace:
> [ 4474.731411] <TASK>
> [ 4474.731413] __schedule+0x23d/0x5a0
> [ 4474.731433] schedule+0x4e/0xb0
> [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0
> [ 4474.731441] down_write+0x43/0x50
> [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache]
> [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache]
> [ 4474.731487] kthread+0x12a/0x150
> [ 4474.731491] ? set_kthread_struct+0x50/0x50
> [ 4474.731494] ret_from_fork+0x22/0x30
> [ 4474.731499] </TASK>
> 
> The bug exists till kernel 5.15.50-051550-generic
> 
> The reproducer is pasted below:
> 
> # uname -a
> Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
> 
> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
> sdd 8:48 0 279.4G 0 disk
> └─sdd1 8:49 0 60G 0 part
>    └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount
> nvme0n1 259:0 0 372.6G 0 disk
> └─nvme0n1p1 259:2 0 15G 0 part
>    └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount
> 
> fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1
> read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
> fio-3.28
> Starting 1 process
> read_iops: Laying out IO file (1 file / 12288MiB)
> 
> The test does not progress beyond a few minutes, and this trace is then seen in the kernel log,
> 
> [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
> [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445
> [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000
> [ 4474.731408] Call Trace:
> [ 4474.731411] <TASK>
> [ 4474.731413] __schedule+0x23d/0x5a0
> [ 4474.731433] schedule+0x4e/0xb0
> [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0
> [ 4474.731441] down_write+0x43/0x50
> [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache]
> [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache]
> [ 4474.731487] kthread+0x12a/0x150
> [ 4474.731491] ? set_kthread_struct+0x50/0x50
> [ 4474.731494] ret_from_fork+0x22/0x30
> [ 4474.731499] </TASK>
> 
> [Fix]
> These 3 fixes are needed for the SRU.
> 
> dea3560e5f31965165bcf34ecf0b47af28bfd155, 6445ec3df23f24677064a327dce437ef3e02dc6,
> dc60301fb408e06e0b718c0980cdd31d2b238bee
> 
> I have built these fixes into kernel 5.15.0-39-generic (jammy) and tested to verify the problem is fixed.
> 
> [Regression Potential]
> 
> Regression potential should be minimal. I have not seen any potential drawbacks or harmful effects of this fix in my testing.
> 
> Coly Li (3):
>    bcache: improve multithreaded bch_btree_check()
>    bcache: improve multithreaded bch_sectors_dirty_init()
>    bcache: memset on stack variables in bch_btree_check() and
>      bch_sectors_dirty_init()
> 
>   drivers/md/bcache/btree.c     | 59 +++++++++++++++-----------------
>   drivers/md/bcache/btree.h     |  2 +-
>   drivers/md/bcache/writeback.c | 63 +++++++++++++++--------------------
>   drivers/md/bcache/writeback.h |  2 +-
>   4 files changed, 55 insertions(+), 71 deletions(-)
>
Stefan Bader Aug. 2, 2022, 12:33 p.m. UTC | #3
On 29.07.22 13:58, Nikhil Kshirsagar wrote:
> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1980925
> 
> [Impact]
> 
> When Random Read I/O is started with a test like -
> 
> fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0
> 
> or
> 
> random read-writes with a test like,
> 
> fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0
> 
> traces are seen in the kernel log,
> 
> [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
> [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445
> [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000
> [ 4474.731408] Call Trace:
> [ 4474.731411] <TASK>
> [ 4474.731413] __schedule+0x23d/0x5a0
> [ 4474.731433] schedule+0x4e/0xb0
> [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0
> [ 4474.731441] down_write+0x43/0x50
> [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache]
> [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache]
> [ 4474.731487] kthread+0x12a/0x150
> [ 4474.731491] ? set_kthread_struct+0x50/0x50
> [ 4474.731494] ret_from_fork+0x22/0x30
> [ 4474.731499] </TASK>
> 
> The bug exists till kernel 5.15.50-051550-generic
> 
> The reproducer is pasted below:
> 
> # uname -a
> Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
> 
> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
> sdd 8:48 0 279.4G 0 disk
> └─sdd1 8:49 0 60G 0 part
>    └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount
> nvme0n1 259:0 0 372.6G 0 disk
> └─nvme0n1p1 259:2 0 15G 0 part
>    └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount
> 
> fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1
> read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
> fio-3.28
> Starting 1 process
> read_iops: Laying out IO file (1 file / 12288MiB)
> 
> The test does not progress beyond a few minutes, and this trace is then seen in the kernel log,
> 
> [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
> [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445
> [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000
> [ 4474.731408] Call Trace:
> [ 4474.731411] <TASK>
> [ 4474.731413] __schedule+0x23d/0x5a0
> [ 4474.731433] schedule+0x4e/0xb0
> [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0
> [ 4474.731441] down_write+0x43/0x50
> [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache]
> [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache]
> [ 4474.731487] kthread+0x12a/0x150
> [ 4474.731491] ? set_kthread_struct+0x50/0x50
> [ 4474.731494] ret_from_fork+0x22/0x30
> [ 4474.731499] </TASK>
> 
> [Fix]
> These 3 fixes are needed for the SRU.
> 
> dea3560e5f31965165bcf34ecf0b47af28bfd155, 6445ec3df23f24677064a327dce437ef3e02dc6,
> dc60301fb408e06e0b718c0980cdd31d2b238bee
> 
> I have built these fixes into kernel 5.15.0-39-generic (jammy) and tested to verify the problem is fixed.
> 
> [Regression Potential]
> 
> Regression potential should be minimal. I have not seen any potential drawbacks or harmful effects of this fix in my testing.
> 
> Coly Li (3):
>    bcache: improve multithreaded bch_btree_check()
>    bcache: improve multithreaded bch_sectors_dirty_init()
>    bcache: memset on stack variables in bch_btree_check() and
>      bch_sectors_dirty_init()
> 
>   drivers/md/bcache/btree.c     | 59 +++++++++++++++-----------------
>   drivers/md/bcache/btree.h     |  2 +-
>   drivers/md/bcache/writeback.c | 63 +++++++++++++++--------------------
>   drivers/md/bcache/writeback.h |  2 +-
>   4 files changed, 55 insertions(+), 71 deletions(-)
> 

Applied patch #3 to jammy:linux/master-next and annotated the other 2 which had 
been included in upstream v5.15.46 to also refer to the additional bug report. 
Thanks.

-Stefan