ext4: Check superblock mapped prior to committing

Message ID 1530300995-25583-1-git-send-email-jonathan.derrick@intel.com
State New

Commit Message

Jon Derrick June 29, 2018, 7:36 p.m.
This patch attempts to close a hole leading to a BUG seen with hot
removals during writes [1].

A block device (an NVMe namespace in this test case) is formatted as ext4
without partitions. It is mounted and write I/O is run to a file, then
the device is hot-removed from the slot. ext4 then attempts to write the
superblock to a drive that is no longer present.

The typical chain of events leading to the BUG:
ext4_commit_super()
  __sync_dirty_buffer()
    submit_bh()
      submit_bh_wbc()
        BUG_ON(!buffer_mapped(bh));

This fix checks for the superblock's buffer head being mapped prior to
syncing.
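
As a rough illustration of the guard, here is a userspace mock, not
kernel code. Everything in it is a hypothetical stand-in: struct
mock_bh, struct mock_sb, mock_submit_bh(), mock_commit_super(), the
with_fix flag and the MOCK_BUG sentinel are invented for this sketch,
and the sentinel return only models the point where the kernel would
hit BUG_ON(). It also assumes that "error" still holds zero at the
early return, as it would if nothing had failed before the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sentinel standing in for the kernel BUG_ON(); hypothetical value. */
#define MOCK_BUG (-1000)

/* Minimal stand-in for a buffer head; not the kernel's struct. */
struct mock_bh {
	bool mapped;	/* models buffer_mapped(bh) */
	int writes;	/* counts successful submissions */
};

/* Minimal stand-in for a super block; not the kernel's struct. */
struct mock_sb {
	struct mock_bh *sbh;	/* superblock buffer head */
	bool ejected;		/* models block_device_ejected(sb) */
};

/* Models the submission path: an unmapped buffer is fatal here. */
int mock_submit_bh(struct mock_bh *bh)
{
	if (!bh->mapped)
		return MOCK_BUG;
	bh->writes++;
	return 0;
}

/* Models the top of the commit path, with or without the new check. */
int mock_commit_super(struct mock_sb *sb, bool with_fix)
{
	int error = 0;	/* assumption: no error recorded before the checks */
	struct mock_bh *sbh = sb->sbh;

	if (!sbh || sb->ejected)
		return error;

	/* The added guard: bail out early instead of submitting. */
	if (with_fix && !sbh->mapped)
		return error;

	return mock_submit_bh(sbh);
}
```

Calling mock_commit_super() on an unmapped buffer with with_fix set to
false reaches the sentinel, while the same call with with_fix set to
true returns early without submitting anything.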

[1] https://www.spinics.net/lists/linux-ext4/msg56527.html

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
---
 fs/ext4/super.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Andreas Dilger June 30, 2018, 4:36 a.m. | #1
On Jun 29, 2018, at 1:36 PM, Jon Derrick <jonathan.derrick@intel.com> wrote:
> 
> This patch attempts to close a hole leading to a BUG seen with hot
> removals during writes [1].
> 
> A block device (NVME namespace in this test case) is formatted to EXT4
> without partitions. It's mounted and write I/O is run to a file, then
> the device is hot removed from the slot. The superblock attempts to be
> written to the drive which is no longer present.
> 
> The typical chain of events leading to the BUG:
> ext4_commit_super()
>  __sync_dirty_buffer()
>    submit_bh()
>      submit_bh_wbc()
>        BUG_ON(!buffer_mapped(bh));
> 
> This fix checks for the superblock's buffer head being mapped prior to
> syncing.
> 
> [1] https://www.spinics.net/lists/linux-ext4/msg56527.html
> 
> Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
> ---
> fs/ext4/super.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 0c4c220..ee33233 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4736,6 +4736,14 @@ static int ext4_commit_super(struct super_block *sb, int sync)
> 
> 	if (!sbh || block_device_ejected(sb))
> 		return error;
> +
> +	/*
> +	 * The superblock bh should be mapped, but it might not be if the
> +	 * device was hot-removed. Not much we can do but fail the I/O.
> +	 */
> +	if (!buffer_mapped(sbh))
> +		return error;

This still looks a bit racy, based on the stack trace you posted.
There is already a "block_device_ejected()" check a line above,
which makes me think that the PCI device removal should be handled
like an ejected device, so that it is also handled elsewhere.

Even so, the check here in ext4_commit_super() can pass, and the
PCI card can be removed on the next instruction and still trigger
the BUG_ON().
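
The window described above can be sketched deterministically in
userspace. This is a mock, not kernel code: struct race_bh,
race_hook_t, race_hot_remove(), race_submit(), race_commit() and the
RACE_BUG sentinel are all hypothetical names made up for the
illustration. The hook models a removal that lands between the check
and the submission; in a real system that timing is not controllable
like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sentinel standing in for the kernel BUG_ON(); hypothetical value. */
#define RACE_BUG (-1000)

/* Minimal stand-in for a buffer head; not the kernel's struct. */
struct race_bh {
	bool mapped;	/* models buffer_mapped(bh) */
	int writes;	/* counts successful submissions */
};

/* Hypothetical hook that runs between the check and the submit. */
typedef void (*race_hook_t)(struct race_bh *bh);

/* Models a hot removal that leaves the buffer unmapped. */
void race_hot_remove(struct race_bh *bh)
{
	bh->mapped = false;
}

/* Models the submission path: an unmapped buffer is fatal here. */
int race_submit(struct race_bh *bh)
{
	if (!bh->mapped)
		return RACE_BUG;
	bh->writes++;
	return 0;
}

/* Check first, then act: the state can change in between. */
int race_commit(struct race_bh *bh, race_hook_t between)
{
	if (!bh->mapped)
		return 0;	/* the mapped check catches this case */

	if (between)
		between(bh);	/* removal lands after the check */

	return race_submit(bh);
}
```

With no hook the write goes through. With race_hot_remove() passed as
the hook, the check passes and the submission still reaches the
sentinel, which is the case the check alone cannot cover.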

That said, this is probably still an improvement on the existing
situation.

Cheers, Andreas

Theodore Y. Ts'o July 2, 2018, 10:50 p.m. | #2
On Fri, Jun 29, 2018 at 01:36:35PM -0600, Jon Derrick wrote:
> This patch attempts to close a hole leading to a BUG seen with hot
> removals during writes [1].
> 
> A block device (NVME namespace in this test case) is formatted to EXT4
> without partitions. It's mounted and write I/O is run to a file, then
> the device is hot removed from the slot. The superblock attempts to be
> written to the drive which is no longer present.
> 
> The typical chain of events leading to the BUG:
> ext4_commit_super()
>   __sync_dirty_buffer()
>     submit_bh()
>       submit_bh_wbc()
>         BUG_ON(!buffer_mapped(bh));
> 
> This fix checks for the superblock's buffer head being mapped prior to
> syncing.
> 
> [1] https://www.spinics.net/lists/linux-ext4/msg56527.html
> 
> Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>

Thanks, applied.

				- Ted

Patch

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0c4c220..ee33233 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4736,6 +4736,14 @@  static int ext4_commit_super(struct super_block *sb, int sync)
 
 	if (!sbh || block_device_ejected(sb))
 		return error;
+
+	/*
+	 * The superblock bh should be mapped, but it might not be if the
+	 * device was hot-removed. Not much we can do but fail the I/O.
+	 */
+	if (!buffer_mapped(sbh))
+		return error;
+
 	/*
 	 * If the file system is mounted read-only, don't update the
 	 * superblock write time.  This avoids updating the superblock