Message ID | 20190807124617.30224-1-andrea.righi@canonical.com |
---|---|
Series | bcache: fix hung task timeout in bch_bucket_alloc() |
On 07.08.19 14:46, Andrea Righi wrote:
> BugLink: https://bugs.launchpad.net/bugs/1784665
>
> [Impact]
>
> bcache_allocator can call the following:
>
>  bch_allocator_thread()
>   -> bch_prio_write()
>      -> bch_bucket_alloc()
>         -> wait on &ca->set->bucket_wait
>
> But the wake up event on bucket_wait is supposed to come from
> bch_allocator_thread() itself causing a deadlock.
>
> [Test Case]
>
> This is a simple script that can easily trigger the deadlock condition:
> https://launchpadlibrarian.net/381282009/bcache-basic-repro.sh
>
> A better test case has been also provided in LP: #1796292:
> https://bugs.launchpad.net/curtin/+bug/1796292/+attachment/5280353/+files/curtin-nvme.sh
>
> [Fix]
>
> Fix by making the call to bch_prio_write() non-blocking, so that
> bch_allocator_thread() never waits on itself. Moreover, make sure to
> wake up the garbage collector thread when bch_prio_write() is failing to
> allocate buckets to increase the chance of freeing up more buckets.
>
> In addition to that it would be safer to also import other upstream
> bcache fixes (all clean cherry picks):
>
> eb8cbb6df38f6e5124a3d5f1f8a3dbf519537c60 bcache: improve bcache_reboot()
> 9951379b0ca88c95876ad9778b9099e19a95d566 bcache: never writeback a discard operation
>
> [Regression Potential]
>
> The upstream fixes are all clean cherry picks from stable (most of them
> are small cleanups), so regression potential is minimal.
>
> The only special patch is "UBUNTU: SAUCE: bcache: fix deadlock in
> bcache_allocator" that is addressing the main deadlock bug (that seems
> to be a mainline bug - not fixed yet). We should spend more time trying
> to reproduce this deadlock with a mainline kernel and post the patch to
> the LKML for review / feedback.
>
> However, considering that this patch seems to fix/prevent the specific
> deadlock problem reported in this bug (tested on the affected platform),
> it should be considered safe to apply it as it is for now, to prevent
> potential hung task timeout conditions.
>
> Changes in v2:
>  - fix potential buckets leak in "UBUNTU: SAUCE: bcache: fix deadlock in
>    bcache_allocator"
>
> ----------------------------------------------------------------
> Andrea Righi (1):
>       UBUNTU: SAUCE: bcache: fix deadlock in bcache_allocator
>
> Coly Li (1):
>       bcache: improve bcache_reboot()
>
> Daniel Axtens (1):
>       bcache: never writeback a discard operation
>
>  drivers/md/bcache/alloc.c     |  5 ++++-
>  drivers/md/bcache/bcache.h    |  2 +-
>  drivers/md/bcache/super.c     | 39 +++++++++++++++++++++++++++++++--------
>  drivers/md/bcache/writeback.h |  3 +++
>  4 files changed, 39 insertions(+), 10 deletions(-)
>

Acked-by: Stefan Bader <stefan.bader@canonical.com>
On 8/7/19 2:46 PM, Andrea Righi wrote:
> BugLink: https://bugs.launchpad.net/bugs/1784665
>
> [...]

Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
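The self-wait described under [Impact] and the non-blocking approach described under [Fix] can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the patch itself: struct pool, bucket_alloc(), prio_write(), gc_run(), allocator_step() and the ALLOC_* codes are all hypothetical stand-ins, not the kernel's functions or signatures. A blocking request that would sleep is reported as ALLOC_WOULD_BLOCK instead of actually sleeping, and the give-back of partially taken buckets on failure is a simplification motivated by the v2 changelog note about bucket leaks, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes for this model only. */
enum { ALLOC_OK = 0, ALLOC_EMPTY = -1, ALLOC_WOULD_BLOCK = -2 };

/* Hypothetical, simplified stand-in for a cache's bucket accounting. */
struct pool {
    int free_buckets;  /* buckets ready to be handed out */
    int reclaimable;   /* buckets a garbage-collection pass could free */
    int gc_wakeups;    /* how often the allocator asked GC for help */
};

/*
 * Take one bucket. With wait == true and nothing free, a real
 * implementation would sleep on a wait queue; the model returns
 * ALLOC_WOULD_BLOCK instead. If the caller is the only thread that can
 * issue the wake-up, that sleep never ends.
 */
static int bucket_alloc(struct pool *p, bool wait)
{
    if (p->free_buckets > 0) {
        p->free_buckets--;
        return ALLOC_OK;
    }
    return wait ? ALLOC_WOULD_BLOCK : ALLOC_EMPTY;
}

/*
 * Model of a priority write that needs `need` buckets. On failure it
 * returns every bucket it already took, so nothing leaks (assumption).
 */
static int prio_write(struct pool *p, int need, bool wait)
{
    int taken = 0;

    assert(need >= 0);
    while (taken < need) {
        int ret = bucket_alloc(p, wait);

        if (ret != ALLOC_OK) {
            p->free_buckets += taken;
            return ret;
        }
        taken++;
    }
    return ALLOC_OK;
}

/* Model of a GC pass: everything reclaimable becomes free. */
static void gc_run(struct pool *p)
{
    p->free_buckets += p->reclaimable;
    p->reclaimable = 0;
}

/*
 * One iteration of the allocator loop: never block in prio_write();
 * on failure ask GC for help and let the caller retry later.
 */
static int allocator_step(struct pool *p, int need)
{
    int ret = prio_write(p, need, false);

    if (ret != ALLOC_OK) {
        p->gc_wakeups++;
        gc_run(p); /* in a kernel this would wake a separate GC thread */
    }
    return ret;
}
```

In this model, calling prio_write(&p, 2, true) on a pool with one free bucket reports ALLOC_WOULD_BLOCK, which corresponds to the allocator waiting on itself. Calling allocator_step(&p, 2) on the same pool fails once without blocking, triggers the GC stand-in, and succeeds on the next call.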