Message ID | 20190807131824.GA31905@xps-13 |
---|---|
State | New |
Series | [SRU,X,PULL,v2] bcache: fix hung task timeout in bch_bucket_alloc() |
On 07.08.19 15:18, Andrea Righi wrote:
> BugLink: https://bugs.launchpad.net/bugs/1784665
>
> [Impact]
>
> bcache_allocator can call the following:
>
>  bch_allocator_thread()
>   -> bch_prio_write()
>      -> bch_bucket_alloc()
>         -> wait on &ca->set->bucket_wait
>
> But the wake up event on bucket_wait is supposed to come from
> bch_allocator_thread() itself causing a deadlock.
>
> [Test Case]
>
> This is a simple script that can easily trigger the deadlock condition:
> https://launchpadlibrarian.net/381282009/bcache-basic-repro.sh
>
> A better test case has been also provided in LP: #1796292:
> https://bugs.launchpad.net/curtin/+bug/1796292/+attachment/5280353/+files/curtin-nvme.sh
>
> [Fix]
>
> Fix by making the call to bch_prio_write() non-blocking, so that
> bch_allocator_thread() never waits on itself. Moreover, make sure to
> wake up the garbage collector thread when bch_prio_write() is failing to
> allocate buckets to increase the chance of freeing up more buckets.
>
> In addition to that it would be safer to also import other upstream
> bcache fixes (all clean cherry picks):
>
> ecb37ce9baac653cc09e2b631393dde3df82979f bcache: Move couple of functions to sysfs.c
> 20d3a518713e394efa5a899c84574b4b79ec5098 bcache: Reduce the number of sparse complaints about lock imbalances
> 42361469ae84c851e40cb1f94c8c9a14cdd94039 bcache: Suppress more warnings about set-but-not-used variables
> f0d3814090ac77de94c42b7124c37ece23629197 bcache: Remove an unused variable
> 47344e330eabc1515cbe6061eb337100a3ab6d37 bcache: Fix kernel-doc warnings
> 9dfbdec7b7fea1ff1b7b5d5d12980dbc7dca46c7 bcache: Annotate switch fall-through
> 4a4e443835a43a79113cc237c472c0d268eb1e1c bcache: Add __printf annotation to __bch_check_keys()
> fd01991d5c20098c5c1ffc4dca6c821cc60a2f74 bcache: Fix indentation
> ca71df31661a0518ed58a1a59cf1993962153ebb bcache: fix using of loop variable in memory shrink
> f3641c3abd1da978ee969b0203b71b86ec1bfa93 bcache: fix error return value in memory shrink
> 688892b3bc05e25da94866e32210e5f503f16f69 bcache: fix incorrect sysfs output value of strip size
> c4dc2497d50d9c6fb16aa0d07b6a14f3b2adb1e0 bcache: fix high CPU occupancy during journal
> a728eacbbdd229d1d903e46261c57d5206f87a4a bcache: add journal statistic
> eb8cbb6df38f6e5124a3d5f1f8a3dbf519537c60 bcache: improve bcache_reboot()
>
> [Regression Potential]
>
> The upstream fixes are all clean cherry picks from stable (most of them
> are small cleanups), so regression potential is minimal.
>
> The only special patch is "UBUNTU: SAUCE: bcache: fix deadlock in
> bcache_allocator" that is addressing the main deadlock bug. Even if this
> patch is not yet applied upstream (posted to the LKML), it seems to
> reliably fix/prevent the specific deadlock problem reported in this bug,
> so it should be considered safe to apply it as it is for now, to prevent
> potential hung task timeout conditions.
>
> Changes in v2:
>  - fix potential buckets leak in "UBUNTU: SAUCE: bcache: fix deadlock in
>    bcache_allocator"
>
> ----------------------------------------------------------------
> The following changes since commit 4a916e0ee66c44ecdf473dc96a5ff2e4ad4ae196:
>
>   floppy: fix out-of-bounds read in copy_buffer (2019-08-07 00:54:59 -0400)
>
> are available in the Git repository at:
>
>   git://git.launchpad.net/~arighi/+git/xenial-linux bcache-fix-v2
>
> for you to fetch changes up to 4f40d1b13ac6288a8a842c1fa6fcd49ccc8de0b9:
>
>   UBUNTU: SAUCE: bcache: fix deadlock in bcache_allocator (2019-08-07 14:19:12 +0200)
>
> ----------------------------------------------------------------
> Andrea Righi (1):
>       UBUNTU: SAUCE: bcache: fix deadlock in bcache_allocator
>
> Andy Shevchenko (1):
>       bcache: Move couple of functions to sysfs.c
>
> Bart Van Assche (7):
>       bcache: Fix indentation
>       bcache: Add __printf annotation to __bch_check_keys()
>       bcache: Annotate switch fall-through
>       bcache: Fix kernel-doc warnings
>       bcache: Remove an unused variable
>       bcache: Suppress more warnings about set-but-not-used variables
>       bcache: Reduce the number of sparse complaints about lock imbalances
>
> Coly Li (1):
>       bcache: improve bcache_reboot()
>
> Tang Junhui (5):
>       bcache: add journal statistic
>       bcache: fix high CPU occupancy during journal
>       bcache: fix incorrect sysfs output value of strip size
>       bcache: fix error return value in memory shrink
>       bcache: fix using of loop variable in memory shrink
>
>  drivers/md/bcache/alloc.c     |  5 +++-
>  drivers/md/bcache/bcache.h    |  8 +++++-
>  drivers/md/bcache/bset.c      |  4 +--
>  drivers/md/bcache/bset.h      |  5 ++--
>  drivers/md/bcache/btree.c     | 15 ++++++-----
>  drivers/md/bcache/closure.c   |  8 +++---
>  drivers/md/bcache/extents.c   |  2 --
>  drivers/md/bcache/journal.c   | 56 ++++++++++++++++++++++++++++------------
>  drivers/md/bcache/request.c   |  1 +
>  drivers/md/bcache/super.c     | 39 ++++++++++++++++++++++------
>  drivers/md/bcache/sysfs.c     | 52 ++++++++++++++++++++++++++++++++++++-
>  drivers/md/bcache/util.c      | 60 +++++++++++--------------------------------
>  drivers/md/bcache/util.h      |  7 ++---
>  drivers/md/bcache/writeback.h |  2 +-
>  14 files changed, 170 insertions(+), 94 deletions(-)
>

Same as for the Bionic pull request...

Acked-by: Stefan Bader <stefan.bader@canonical.com>
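
For readers following the [Fix] description in the pull request, the idea of a non-blocking bch_prio_write() can be sketched in plain userspace C. This is only a toy model under stated assumptions, not the SAUCE patch itself: the names toy_cache, toy_bucket_alloc and toy_prio_write_nowait are hypothetical, the "garbage collector wake-up" is reduced to a flag, and the early availability check is just one possible way to avoid leaking partially allocated buckets (the v2 changelog mentions a leak fix but does not say how it was done).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model. Every name here is hypothetical, not the kernel API. */
struct toy_cache {
	size_t free_buckets;	/* buckets available right now */
	bool gc_woken;		/* set when the caller asks GC to free buckets */
};

/* Take one bucket; returns its index, or -1 when none is free. */
static long toy_bucket_alloc(struct toy_cache *ca)
{
	if (ca->free_buckets == 0)
		return -1;	/* a blocking caller would sleep on bucket_wait here */
	return (long)--ca->free_buckets;
}

/*
 * Non-blocking priority write that needs `need` buckets.
 * It checks availability first and fails early, so nothing is allocated
 * on the failure path and nothing can leak. On failure it also flags GC,
 * which may reclaim buckets before the caller retries.
 * Returns 0 on success, -1 if the caller should retry later.
 */
static int toy_prio_write_nowait(struct toy_cache *ca, size_t need)
{
	size_t i;

	if (ca->free_buckets < need) {
		ca->gc_woken = true;
		return -1;
	}
	for (i = 0; i < need; i++) {
		long b = toy_bucket_alloc(ca);

		assert(b >= 0);	/* guaranteed by the availability check above */
		(void)b;
	}
	return 0;
}
```

In this model the allocator loop would call toy_prio_write_nowait() and, on -1, simply go around again instead of sleeping on a wait queue that only it can wake.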
On 8/7/19 3:18 PM, Andrea Righi wrote:
> [...]

Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
On 2019-08-07 15:18:24, Andrea Righi wrote:
> [...]
>
> --
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team