mbox series

[B,00/11] LP#1829563 bcache: risk of data loss on I/O errors in backing or caching devices

Message ID 20190708005038.13184-1-mfo@canonical.com
Headers show
Series LP#1829563 bcache: risk of data loss on I/O errors in backing or caching devices | expand

Message

Mauricio Faria de Oliveira July 8, 2019, 12:50 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1829563

Note: the patchset is relatively large because the support
for error detection/handling is mostly non-existent in 4.15.
All patches are in Cosmic/4.18 except PATCH 11 (other series).

[Impact]

 * The bcache code in Bionic lacks several fixes to handle
   I/O errors in both backing devices and caching devices.

 * Partial or permanent errors in backing or caching devices,
   specially in writeback mode, can lead to data loss and/or
   the application is not notified about failed I/O requests.

 * The bcache device might remain available for I/O requests
   even if backing device is offline, so writes are undefined.

[Test Case]

 * Detailed test cases/steps for the behavior of many patches
   with code logic changes are provided in bug comments.

 * The patchset has been tested for regressions on each cache
   mode (writethrough, writeback, writearound, none) with the
   xfstests test suite (on ext4) and fio (sequential & random
   read-write).

[Regression Potential]

 * The patchset is relatively large and touches several areas
   in bcache code, however, synthetic testing of the patches
   has been performed, and extensive regression/stress tests
   were run (as mentioned in Test Case section).

 * Many patches in the patchset are 'Fixes' patches to other
   patches, and no further 'Fixes' currently exist upstream.

[Other Info]

 * Canonical Field Eng. deploys bcache+writeback extensively
   (e.g., BootStack, UA cloud, except rare all-flash cases).

[Original Bug Description]

This is a request for a backport of the following upstream patch from 4.18:

"bcache: stop bcache device when backing device is offline"
https://github.com/torvalds/linux/commit/0f0709e6bfc3ce4e8e1c0e8573490c45f76cfeee

Field engineering uses bcache quite extensively and it would be good to have this in the GA/bionic kernel.


Coly Li (9):
  bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags
  bcache: add stop_when_cache_set_failed option to backing device
  bcache: add backing_request_endio() for bi_end_io
  bcache: add io_disable to struct cached_dev
  bcache: store disk name in struct cache and struct cached_dev
  bcache: count backing device I/O error for writeback I/O
  bcache: add wait_for_kthread_stop() in bch_allocator_thread()
  bcache: set dc->io_disable to true in conditional_stop_bcache_device()
  bcache: stop bcache device when backing device is offline

Tang Junhui (2):
  bcache: fix inaccurate io state for detached bcache devices
  bcache: fix ioctl in flash device

 drivers/md/bcache/alloc.c     |   8 +-
 drivers/md/bcache/bcache.h    |  54 +++++++++
 drivers/md/bcache/btree.c     |  11 +-
 drivers/md/bcache/debug.c     |   3 +-
 drivers/md/bcache/io.c        |  20 +++-
 drivers/md/bcache/journal.c   |   4 +-
 drivers/md/bcache/request.c   | 186 +++++++++++++++++++++++++-----
 drivers/md/bcache/super.c     | 207 +++++++++++++++++++++++++++++-----
 drivers/md/bcache/sysfs.c     |  50 +++++++-
 drivers/md/bcache/util.h      |   6 -
 drivers/md/bcache/writeback.c |  37 ++++--
 11 files changed, 497 insertions(+), 89 deletions(-)

Comments

Andrea Righi July 12, 2019, 6:25 a.m. UTC | #1
On Sun, Jul 07, 2019 at 09:50:27PM -0300, Mauricio Faria de Oliveira wrote:
> BugLink: https://bugs.launchpad.net/bugs/1829563
> 
> Note: the patchset is relatively large because the support
> for error detection/handling is mostly non-existent in 4.15.
> All patches are in Cosmic/4.18 except PATCH 11 (other series).
> 
> [Impact]
> 
>  * The bcache code in Bionic lacks several fixes to handle
>    I/O errors in both backing devices and caching devices.
> 
>  * Partial or permanent errors in backing or caching devices,
>    specially in writeback mode, can lead to data loss and/or
>    the application is not notified about failed I/O requests.
> 
>  * The bcache device might remain available for I/O requests
>    even if backing device is offline, so writes are undefined.
> 
> [Test Case]
> 
>  * Detailed test cases/steps for the behavior of many patches
>    with code logic changes are provided in bug comments.
> 
>  * The patchset has been tested for regressions on each cache
>    mode (writethrough, writeback, writearound, none) with the
>    xfstests test suite (on ext4) and fio (sequential & random
>    read-write).
> 
> [Regression Potential]
> 
>  * The patchset is relatively large and touches several areas
>    in bcache code, however, synthetic testing of the patches
>    has been performed, and extensive regression/stress tests
>    were run (as mentioned in Test Case section).
> 
>  * Many patches in the patchset are 'Fixes' patches to other
>    patches, and no further 'Fixes' currently exist upstream.
> 
> [Other Info]
> 
>  * Canonical Field Eng. deploys bcache+writeback extensively
>    (e.g., BootStack, UA cloud, except rare all-flash cases).
> 
> [Original Bug Description]
> 
> This is a request for a backport of the following upstream patch from 4.18:
> 
> "bcache: stop bcache device when backing device is offline"
> https://github.com/torvalds/linux/commit/0f0709e6bfc3ce4e8e1c0e8573490c45f76cfeee
> 
> Field engineering uses bcache quite extensively and it would be good to have this in the GA/bionic kernel.
> 
> 
> Coly Li (9):
>   bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags
>   bcache: add stop_when_cache_set_failed option to backing device
>   bcache: add backing_request_endio() for bi_end_io
>   bcache: add io_disable to struct cached_dev
>   bcache: store disk name in struct cache and struct cached_dev
>   bcache: count backing device I/O error for writeback I/O
>   bcache: add wait_for_kthread_stop() in bch_allocator_thread()
>   bcache: set dc->io_disable to true in conditional_stop_bcache_device()
>   bcache: stop bcache device when backing device is offline
> 
> Tang Junhui (2):
>   bcache: fix inaccurate io state for detached bcache devices
>   bcache: fix ioctl in flash device
> 
>  drivers/md/bcache/alloc.c     |   8 +-
>  drivers/md/bcache/bcache.h    |  54 +++++++++
>  drivers/md/bcache/btree.c     |  11 +-
>  drivers/md/bcache/debug.c     |   3 +-
>  drivers/md/bcache/io.c        |  20 +++-
>  drivers/md/bcache/journal.c   |   4 +-
>  drivers/md/bcache/request.c   | 186 +++++++++++++++++++++++++-----
>  drivers/md/bcache/super.c     | 207 +++++++++++++++++++++++++++++-----
>  drivers/md/bcache/sysfs.c     |  50 +++++++-
>  drivers/md/bcache/util.h      |   6 -
>  drivers/md/bcache/writeback.c |  37 ++++--
>  11 files changed, 497 insertions(+), 89 deletions(-)

I had to backport the same commits for a separate bug
(https://bugs.launchpad.net/bugs/1796292) and I produced the same
patches with positive test results, therefore:

Acked-by: Andrea Righi <andrea.righi@canonical.com>
Mauricio Faria de Oliveira July 16, 2019, 12:33 p.m. UTC | #2
Hi kernel team,

On Sun, Jul 7, 2019 at 9:51 PM Mauricio Faria de Oliveira
<mfo@canonical.com> wrote:
>
> BugLink: https://bugs.launchpad.net/bugs/1829563

Looking for an additional reviewer for this series as it's desired on next cycle
(which last day for commits is tomorrow/Jul 17) if at all possible, please.

Thank you,
Connor Kuehl July 16, 2019, 5:51 p.m. UTC | #3
On 7/7/19 5:50 PM, Mauricio Faria de Oliveira wrote:
> BugLink: https://bugs.launchpad.net/bugs/1829563
> 
> Note: the patchset is relatively large because the support
> for error detection/handling is mostly non-existent in 4.15.
> All patches are in Cosmic/4.18 except PATCH 11 (other series).
> 
> [Impact]
> 
>  * The bcache code in Bionic lacks several fixes to handle
>    I/O errors in both backing devices and caching devices.
> 
>  * Partial or permanent errors in backing or caching devices,
>    specially in writeback mode, can lead to data loss and/or
>    the application is not notified about failed I/O requests.
> 
>  * The bcache device might remain available for I/O requests
>    even if backing device is offline, so writes are undefined.
> 
> [Test Case]
> 
>  * Detailed test cases/steps for the behavior of many patches
>    with code logic changes are provided in bug comments.
> 
>  * The patchset has been tested for regressions on each cache
>    mode (writethrough, writeback, writearound, none) with the
>    xfstests test suite (on ext4) and fio (sequential & random
>    read-write).
> 
> [Regression Potential]
> 
>  * The patchset is relatively large and touches several areas
>    in bcache code, however, synthetic testing of the patches
>    has been performed, and extensive regression/stress tests
>    were run (as mentioned in Test Case section).
> 
>  * Many patches in the patchset are 'Fixes' patches to other
>    patches, and no further 'Fixes' currently exist upstream.
> 
> [Other Info]
> 
>  * Canonical Field Eng. deploys bcache+writeback extensively
>    (e.g., BootStack, UA cloud, except rare all-flash cases).
> 
> [Original Bug Description]
> 
> This is a request for a backport of the following upstream patch from 4.18:
> 
> "bcache: stop bcache device when backing device is offline"
> https://github.com/torvalds/linux/commit/0f0709e6bfc3ce4e8e1c0e8573490c45f76cfeee
> 
> Field engineering uses bcache quite extensively and it would be good to have this in the GA/bionic kernel.
> 
> 
> Coly Li (9):
>   bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags
>   bcache: add stop_when_cache_set_failed option to backing device
>   bcache: add backing_request_endio() for bi_end_io
>   bcache: add io_disable to struct cached_dev
>   bcache: store disk name in struct cache and struct cached_dev
>   bcache: count backing device I/O error for writeback I/O
>   bcache: add wait_for_kthread_stop() in bch_allocator_thread()
>   bcache: set dc->io_disable to true in conditional_stop_bcache_device()
>   bcache: stop bcache device when backing device is offline
> 
> Tang Junhui (2):
>   bcache: fix inaccurate io state for detached bcache devices
>   bcache: fix ioctl in flash device
> 
>  drivers/md/bcache/alloc.c     |   8 +-
>  drivers/md/bcache/bcache.h    |  54 +++++++++
>  drivers/md/bcache/btree.c     |  11 +-
>  drivers/md/bcache/debug.c     |   3 +-
>  drivers/md/bcache/io.c        |  20 +++-
>  drivers/md/bcache/journal.c   |   4 +-
>  drivers/md/bcache/request.c   | 186 +++++++++++++++++++++++++-----
>  drivers/md/bcache/super.c     | 207 +++++++++++++++++++++++++++++-----
>  drivers/md/bcache/sysfs.c     |  50 +++++++-
>  drivers/md/bcache/util.h      |   6 -
>  drivers/md/bcache/writeback.c |  37 ++++--
>  11 files changed, 497 insertions(+), 89 deletions(-)
> 

+ very thoroughly tested

Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Mauricio Faria de Oliveira July 16, 2019, 6:14 p.m. UTC | #4
On Tue, Jul 16, 2019 at 2:51 PM Connor Kuehl <connor.kuehl@canonical.com> wrote:
[snip]
> + very thoroughly tested
>
> Acked-by: Connor Kuehl <connor.kuehl@canonical.com>

Connor, thanks for reviewing!
Mauricio Faria de Oliveira July 16, 2019, 6:14 p.m. UTC | #5
On Fri, Jul 12, 2019 at 3:25 AM Andrea Righi <andrea.righi@canonical.com> wrote:
[snip]
> I had to backport the same commits for a separate bug
> (https://bugs.launchpad.net/bugs/1796292) and I produced the same
> patches with positive test results, therefore:
>
> Acked-by: Andrea Righi <andrea.righi@canonical.com>

Thanks for the review, Andrea!
Kleber Sacilotto de Souza July 17, 2019, 10:47 a.m. UTC | #6
On 7/8/19 2:50 AM, Mauricio Faria de Oliveira wrote:
> BugLink: https://bugs.launchpad.net/bugs/1829563
> 
> Note: the patchset is relatively large because the support
> for error detection/handling is mostly non-existent in 4.15.
> All patches are in Cosmic/4.18 except PATCH 11 (other series).
> 
> [Impact]
> 
>  * The bcache code in Bionic lacks several fixes to handle
>    I/O errors in both backing devices and caching devices.
> 
>  * Partial or permanent errors in backing or caching devices,
>    specially in writeback mode, can lead to data loss and/or
>    the application is not notified about failed I/O requests.
> 
>  * The bcache device might remain available for I/O requests
>    even if backing device is offline, so writes are undefined.
> 
> [Test Case]
> 
>  * Detailed test cases/steps for the behavior of many patches
>    with code logic changes are provided in bug comments.
> 
>  * The patchset has been tested for regressions on each cache
>    mode (writethrough, writeback, writearound, none) with the
>    xfstests test suite (on ext4) and fio (sequential & random
>    read-write).
> 
> [Regression Potential]
> 
>  * The patchset is relatively large and touches several areas
>    in bcache code, however, synthetic testing of the patches
>    has been performed, and extensive regression/stress tests
>    were run (as mentioned in Test Case section).
> 
>  * Many patches in the patchset are 'Fixes' patches to other
>    patches, and no further 'Fixes' currently exist upstream.
> 
> [Other Info]
> 
>  * Canonical Field Eng. deploys bcache+writeback extensively
>    (e.g., BootStack, UA cloud, except rare all-flash cases).
> 
> [Original Bug Description]
> 
> This is a request for a backport of the following upstream patch from 4.18:
> 
> "bcache: stop bcache device when backing device is offline"
> https://github.com/torvalds/linux/commit/0f0709e6bfc3ce4e8e1c0e8573490c45f76cfeee
> 
> Field engineering uses bcache quite extensively and it would be good to have this in the GA/bionic kernel.

Applied to bionic/master-next branch, with some context adjustments to

- bcache: add io_disable to struct cached_dev


Thanks,
Kleber

> 
> 
> Coly Li (9):
>   bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags
>   bcache: add stop_when_cache_set_failed option to backing device
>   bcache: add backing_request_endio() for bi_end_io
>   bcache: add io_disable to struct cached_dev
>   bcache: store disk name in struct cache and struct cached_dev
>   bcache: count backing device I/O error for writeback I/O
>   bcache: add wait_for_kthread_stop() in bch_allocator_thread()
>   bcache: set dc->io_disable to true in conditional_stop_bcache_device()
>   bcache: stop bcache device when backing device is offline
> 
> Tang Junhui (2):
>   bcache: fix inaccurate io state for detached bcache devices
>   bcache: fix ioctl in flash device
> 
>  drivers/md/bcache/alloc.c     |   8 +-
>  drivers/md/bcache/bcache.h    |  54 +++++++++
>  drivers/md/bcache/btree.c     |  11 +-
>  drivers/md/bcache/debug.c     |   3 +-
>  drivers/md/bcache/io.c        |  20 +++-
>  drivers/md/bcache/journal.c   |   4 +-
>  drivers/md/bcache/request.c   | 186 +++++++++++++++++++++++++-----
>  drivers/md/bcache/super.c     | 207 +++++++++++++++++++++++++++++-----
>  drivers/md/bcache/sysfs.c     |  50 +++++++-
>  drivers/md/bcache/util.h      |   6 -
>  drivers/md/bcache/writeback.c |  37 ++++--
>  11 files changed, 497 insertions(+), 89 deletions(-)
>