[SRU,D,0/2] Two crashes on raid0 error path (during a member device removal)
mbox series

Message ID 20190716221539.31891-1-gpiccoli@canonical.com
Headers show
Series
  • Two crashes on raid0 error path (during a member device removal)
Related show

Message

Guilherme G. Piccoli July 16, 2019, 10:15 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1836806

[Impact]

* During raid0 error path testing, by removing one member of the array, we've
noticed after kernel 4.18 we can trigger a crash depending if there's I/O in-flight
during the array removal. When debugging the issue, a second problem was found,
that could cause a different crash.

* For the first and more relevant problem, commit cd4a4ae4683d
("block: don't use blocking queue entered for recursive bio submits") introduced
the flag BIO_QUEUE_ENTERED in order BIOs that were split do bypass the blocking
queue entering routine and use the live non-blocking version. What happens with
md/raid0 though is that their BIOs have their underlying device changed to the
physical disk (array member). If we remove this physical disk (or if it fails),
we could have one BIO that had the flag changed to BIO_QUEUE_ENTERED and had the
device changed to the removed array member (before its removal); this bio then
skips a lot of checks in generic_make_request_checks(), triggering the following crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000155
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
RIP: 0010:blk_throtl_bio+0x45/0x970
[...]
Call Trace:
 generic_make_request_checks+0x1bf/0x690
 generic_make_request+0x64/0x3f0
 raid0_make_request+0x184/0x620 [raid0]
 ? raid0_make_request+0x184/0x620 [raid0]
 md_handle_request+0x126/0x1a0
 md_make_request+0x7b/0x180
 generic_make_request+0x19e/0x3f0
 submit_bio+0x73/0x140
[...]

* When debugging the above issue, by rebuilding the kernel with CONFIG_BLK_CGROUP=n
we've noticed a different crash. Commit 37f9579f4c31 ("blk-mq: Avoid that submitting
a bio concurrently with device removal triggers a crash") introduced a NULL pointer
dereference in generic_make_request(), that manifests as:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
RIP: 0010:generic_make_request+0x32b/0x400
Call Trace:
 submit_bio+0x73/0x140
 ext4_io_submit+0x4d/0x60
 ext4_writepages+0x626/0xe90
 do_writepages+0x4b/0xe0
[...]

* For both the issues, we have simple patches that are present in linux-stable
but not in Linus tree.
## For issue 1 (md removal crash):
869eec894663 ("md/raid0: Do not bypass blocking queue entered for raid0 bios")
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=869eec894663

## For issue 2 (generic_make_request() NULL dereference):
c9d8d3e9d7a0 ("block: Fix a NULL pointer dereference in generic_make_request()")
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c9d8d3e9d7a0

The reasoning for both patches not being present in Linus tree is explained in
the commit messages, but in summary Ming Lei submitted a major clean-up series
at the same time I've submitted both patches, it wouldn't make sense to accept
my patches to soon after remove the code paths with his clean-up. But Ming's
series rely on legacy I/O path removal, and so it's very hard to backport.
Hence maintainers suggested me to submit my small fixes to stable tree only.

[Test case]

For both cases, the test is the same, the only change being a kernel config option.
To reproduce issue 1 (md removal crash), a regular Ubuntu kernel config is enough.
For the issue 2, a kernel rebuild with CONFIG_BLK_CGROUP=n is necessary.

Steps to reproduce:

a) Create a raid0 md array with 2 NVMe devices as members, and mount it
with an ext4 filesystem.

b) Run the following oneliner (supposing the raid0 is mounted in /mnt):
(dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;\
echo 1 > /sys/block/nvme1n1/device/device/remove
(whereas nvme1n1 is the 2nd array member)

[Regression potential]

The fixes are self-contained and small, both validated by a great number of
subsystem maintainers (including block, raid and stable). Commit c9d8d3e9d7a0 was
also validated by the author of the offender patch it fixes, and has no functional
change. Commit 869eec894663 has only raid0 driver as scope, and fall-backs raid0
to a previous behavior before the introduction of BIO_QUEUE_ENTERED flag (which
indeed increases the amount of checks performed in BIOs), so the regression
potential is low and restricted to raid0.

Guilherme G. Piccoli (2):
  block: Fix a NULL pointer dereference in generic_make_request()
  md/raid0: Do not bypass blocking queue entered for raid0 bios

 block/blk-core.c   | 5 ++---
 drivers/md/raid0.c | 2 ++
 2 files changed, 4 insertions(+), 3 deletions(-)

Comments

Stefan Bader July 18, 2019, 8:15 a.m. UTC | #1
On 17.07.19 00:15, Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1836806
> 
> [Impact]
> 
> * During raid0 error path testing, by removing one member of the array, we've
> noticed after kernel 4.18 we can trigger a crash depending if there's I/O in-flight
> during the array removal. When debugging the issue, a second problem was found,
> that could cause a different crash.
> 
> * For the first and more relevant problem, commit cd4a4ae4683d
> ("block: don't use blocking queue entered for recursive bio submits") introduced
> the flag BIO_QUEUE_ENTERED in order BIOs that were split do bypass the blocking
> queue entering routine and use the live non-blocking version. What happens with
> md/raid0 though is that their BIOs have their underlying device changed to the
> physical disk (array member). If we remove this physical disk (or if it fails),
> we could have one BIO that had the flag changed to BIO_QUEUE_ENTERED and had the
> device changed to the removed array member (before its removal); this bio then
> skips a lot of checks in generic_make_request_checks(), triggering the following crash:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000155
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:blk_throtl_bio+0x45/0x970
> [...]
> Call Trace:
>  generic_make_request_checks+0x1bf/0x690
>  generic_make_request+0x64/0x3f0
>  raid0_make_request+0x184/0x620 [raid0]
>  ? raid0_make_request+0x184/0x620 [raid0]
>  md_handle_request+0x126/0x1a0
>  md_make_request+0x7b/0x180
>  generic_make_request+0x19e/0x3f0
>  submit_bio+0x73/0x140
> [...]
> 
> * When debugging the above issue, by rebuilding the kernel with CONFIG_BLK_CGROUP=n
> we've noticed a different crash. Commit 37f9579f4c31 ("blk-mq: Avoid that submitting
> a bio concurrently with device removal triggers a crash") introduced a NULL pointer
> dereference in generic_make_request(), that manifests as:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:generic_make_request+0x32b/0x400
> Call Trace:
>  submit_bio+0x73/0x140
>  ext4_io_submit+0x4d/0x60
>  ext4_writepages+0x626/0xe90
>  do_writepages+0x4b/0xe0
> [...]
> 
> * For both the issues, we have simple patches that are present in linux-stable
> but not in Linus tree.
> ## For issue 1 (md removal crash):
> 869eec894663 ("md/raid0: Do not bypass blocking queue entered for raid0 bios")
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=869eec894663
> 
> ## For issue 2 (generic_make_request() NULL dereference):
> c9d8d3e9d7a0 ("block: Fix a NULL pointer dereference in generic_make_request()")
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c9d8d3e9d7a0
> 
> The reasoning for both patches not being present in Linus tree is explained in
> the commit messages, but in summary Ming Lei submitted a major clean-up series
> at the same time I've submitted both patches, it wouldn't make sense to accept
> my patches to soon after remove the code paths with his clean-up. But Ming's
> series rely on legacy I/O path removal, and so it's very hard to backport.
> Hence maintainers suggested me to submit my small fixes to stable tree only.
> 
> [Test case]
> 
> For both cases, the test is the same, the only change being a kernel config option.
> To reproduce issue 1 (md removal crash), a regular Ubuntu kernel config is enough.
> For the issue 2, a kernel rebuild with CONFIG_BLK_CGROUP=n is necessary.
> 
> Steps to reproduce:
> 
> a) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
> 
> b) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;\
> echo 1 > /sys/block/nvme1n1/device/device/remove
> (whereas nvme1n1 is the 2nd array member)
> 
> [Regression potential]
> 
> The fixes are self-contained and small, both validated by a great number of
> subsystem maintainers (including block, raid and stable). Commit c9d8d3e9d7a0 was
> also validated by the author of the offender patch it fixes, and has no functional
> change. Commit 869eec894663 has only raid0 driver as scope, and fall-backs raid0
> to a previous behavior before the introduction of BIO_QUEUE_ENTERED flag (which
> indeed increases the amount of checks performed in BIOs), so the regression
> potential is low and restricted to raid0.
> 
> Guilherme G. Piccoli (2):
>   block: Fix a NULL pointer dereference in generic_make_request()
>   md/raid0: Do not bypass blocking queue entered for raid0 bios
> 
>  block/blk-core.c   | 5 ++---
>  drivers/md/raid0.c | 2 ++
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
If they are in that form in stable I guess they are preferable over the upstream
commits although it sounds like 5.0 could get the upstream changes. However it
has not yet (latest upstream 5.0.y) and will not as that support ended. So the
minimal variant is good. And this should be testable.


Acked-by: Stefan Bader <stefan.bader@canonical.com>
Connor Kuehl July 19, 2019, 3:50 p.m. UTC | #2
On 7/16/19 3:15 PM, Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1836806
> 
> [Impact]
> 
> * During raid0 error path testing, by removing one member of the array, we've
> noticed after kernel 4.18 we can trigger a crash depending if there's I/O in-flight
> during the array removal. When debugging the issue, a second problem was found,
> that could cause a different crash.
> 
> * For the first and more relevant problem, commit cd4a4ae4683d
> ("block: don't use blocking queue entered for recursive bio submits") introduced
> the flag BIO_QUEUE_ENTERED in order BIOs that were split do bypass the blocking
> queue entering routine and use the live non-blocking version. What happens with
> md/raid0 though is that their BIOs have their underlying device changed to the
> physical disk (array member). If we remove this physical disk (or if it fails),
> we could have one BIO that had the flag changed to BIO_QUEUE_ENTERED and had the
> device changed to the removed array member (before its removal); this bio then
> skips a lot of checks in generic_make_request_checks(), triggering the following crash:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000155
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:blk_throtl_bio+0x45/0x970
> [...]
> Call Trace:
>  generic_make_request_checks+0x1bf/0x690
>  generic_make_request+0x64/0x3f0
>  raid0_make_request+0x184/0x620 [raid0]
>  ? raid0_make_request+0x184/0x620 [raid0]
>  md_handle_request+0x126/0x1a0
>  md_make_request+0x7b/0x180
>  generic_make_request+0x19e/0x3f0
>  submit_bio+0x73/0x140
> [...]
> 
> * When debugging the above issue, by rebuilding the kernel with CONFIG_BLK_CGROUP=n
> we've noticed a different crash. Commit 37f9579f4c31 ("blk-mq: Avoid that submitting
> a bio concurrently with device removal triggers a crash") introduced a NULL pointer
> dereference in generic_make_request(), that manifests as:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:generic_make_request+0x32b/0x400
> Call Trace:
>  submit_bio+0x73/0x140
>  ext4_io_submit+0x4d/0x60
>  ext4_writepages+0x626/0xe90
>  do_writepages+0x4b/0xe0
> [...]
> 
> * For both the issues, we have simple patches that are present in linux-stable
> but not in Linus tree.
> ## For issue 1 (md removal crash):
> 869eec894663 ("md/raid0: Do not bypass blocking queue entered for raid0 bios")
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=869eec894663
> 
> ## For issue 2 (generic_make_request() NULL dereference):
> c9d8d3e9d7a0 ("block: Fix a NULL pointer dereference in generic_make_request()")
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c9d8d3e9d7a0
> 
> The reasoning for both patches not being present in Linus tree is explained in
> the commit messages, but in summary Ming Lei submitted a major clean-up series
> at the same time I've submitted both patches, it wouldn't make sense to accept
> my patches to soon after remove the code paths with his clean-up. But Ming's
> series rely on legacy I/O path removal, and so it's very hard to backport.
> Hence maintainers suggested me to submit my small fixes to stable tree only.
> 
> [Test case]
> 
> For both cases, the test is the same, the only change being a kernel config option.
> To reproduce issue 1 (md removal crash), a regular Ubuntu kernel config is enough.
> For the issue 2, a kernel rebuild with CONFIG_BLK_CGROUP=n is necessary.
> 
> Steps to reproduce:
> 
> a) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
> 
> b) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;\
> echo 1 > /sys/block/nvme1n1/device/device/remove
> (whereas nvme1n1 is the 2nd array member)
> 
> [Regression potential]
> 
> The fixes are self-contained and small, both validated by a great number of
> subsystem maintainers (including block, raid and stable). Commit c9d8d3e9d7a0 was
> also validated by the author of the offender patch it fixes, and has no functional
> change. Commit 869eec894663 has only raid0 driver as scope, and fall-backs raid0
> to a previous behavior before the introduction of BIO_QUEUE_ENTERED flag (which
> indeed increases the amount of checks performed in BIOs), so the regression
> potential is low and restricted to raid0.
> 
> Guilherme G. Piccoli (2):
>   block: Fix a NULL pointer dereference in generic_make_request()
>   md/raid0: Do not bypass blocking queue entered for raid0 bios
> 
>  block/blk-core.c   | 5 ++---
>  drivers/md/raid0.c | 2 ++
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 

Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Khaled Elmously July 23, 2019, 5:25 a.m. UTC | #3
On 2019-07-16 19:15:37 , Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1836806
> 
> [Impact]
> 
> * During raid0 error path testing, by removing one member of the array, we've
> noticed after kernel 4.18 we can trigger a crash depending if there's I/O in-flight
> during the array removal. When debugging the issue, a second problem was found,
> that could cause a different crash.
> 
> * For the first and more relevant problem, commit cd4a4ae4683d
> ("block: don't use blocking queue entered for recursive bio submits") introduced
> the flag BIO_QUEUE_ENTERED in order BIOs that were split do bypass the blocking
> queue entering routine and use the live non-blocking version. What happens with
> md/raid0 though is that their BIOs have their underlying device changed to the
> physical disk (array member). If we remove this physical disk (or if it fails),
> we could have one BIO that had the flag changed to BIO_QUEUE_ENTERED and had the
> device changed to the removed array member (before its removal); this bio then
> skips a lot of checks in generic_make_request_checks(), triggering the following crash:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000155
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:blk_throtl_bio+0x45/0x970
> [...]
> Call Trace:
>  generic_make_request_checks+0x1bf/0x690
>  generic_make_request+0x64/0x3f0
>  raid0_make_request+0x184/0x620 [raid0]
>  ? raid0_make_request+0x184/0x620 [raid0]
>  md_handle_request+0x126/0x1a0
>  md_make_request+0x7b/0x180
>  generic_make_request+0x19e/0x3f0
>  submit_bio+0x73/0x140
> [...]
> 
> * When debugging the above issue, by rebuilding the kernel with CONFIG_BLK_CGROUP=n
> we've noticed a different crash. Commit 37f9579f4c31 ("blk-mq: Avoid that submitting
> a bio concurrently with device removal triggers a crash") introduced a NULL pointer
> dereference in generic_make_request(), that manifests as:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:generic_make_request+0x32b/0x400
> Call Trace:
>  submit_bio+0x73/0x140
>  ext4_io_submit+0x4d/0x60
>  ext4_writepages+0x626/0xe90
>  do_writepages+0x4b/0xe0
> [...]
> 
> * For both the issues, we have simple patches that are present in linux-stable
> but not in Linus tree.
> ## For issue 1 (md removal crash):
> 869eec894663 ("md/raid0: Do not bypass blocking queue entered for raid0 bios")
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=869eec894663
> 
> ## For issue 2 (generic_make_request() NULL dereference):
> c9d8d3e9d7a0 ("block: Fix a NULL pointer dereference in generic_make_request()")
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c9d8d3e9d7a0
> 
> The reasoning for both patches not being present in Linus tree is explained in
> the commit messages, but in summary Ming Lei submitted a major clean-up series
> at the same time I've submitted both patches, it wouldn't make sense to accept
> my patches to soon after remove the code paths with his clean-up. But Ming's
> series rely on legacy I/O path removal, and so it's very hard to backport.
> Hence maintainers suggested me to submit my small fixes to stable tree only.
> 
> [Test case]
> 
> For both cases, the test is the same, the only change being a kernel config option.
> To reproduce issue 1 (md removal crash), a regular Ubuntu kernel config is enough.
> For the issue 2, a kernel rebuild with CONFIG_BLK_CGROUP=n is necessary.
> 
> Steps to reproduce:
> 
> a) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
> 
> b) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;\
> echo 1 > /sys/block/nvme1n1/device/device/remove
> (whereas nvme1n1 is the 2nd array member)
> 
> [Regression potential]
> 
> The fixes are self-contained and small, both validated by a great number of
> subsystem maintainers (including block, raid and stable). Commit c9d8d3e9d7a0 was
> also validated by the author of the offender patch it fixes, and has no functional
> change. Commit 869eec894663 has only raid0 driver as scope, and fall-backs raid0
> to a previous behavior before the introduction of BIO_QUEUE_ENTERED flag (which
> indeed increases the amount of checks performed in BIOs), so the regression
> potential is low and restricted to raid0.
> 
> Guilherme G. Piccoli (2):
>   block: Fix a NULL pointer dereference in generic_make_request()
>   md/raid0: Do not bypass blocking queue entered for raid0 bios
> 
>  block/blk-core.c   | 5 ++---
>  drivers/md/raid0.c | 2 ++
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> -- 
> 2.22.0
> 
> 
> -- 
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team