diff mbox series

Revert "ext4: enforce buffer head state assertion in ext4_da_map_blocks"

Message ID 20211012171901.5352-1-enwlinux@gmail.com
State Awaiting Upstream
Headers show
Series Revert "ext4: enforce buffer head state assertion in ext4_da_map_blocks" | expand

Commit Message

Eric Whitney Oct. 12, 2021, 5:19 p.m. UTC
This reverts commit 948ca5f30e1df0c11eb5b0f410b9ceb97fa77ad9.

Two crash reports from users running variations on 5.15-rc4 kernels
suggest that it is premature to enforce the state assertion in the
original commit.  Both crashes were triggered by BUG calls in that
code, indicating that under some rare circumstance the buffer head
state did not match a delayed allocated block at the time the
block was written out.  No reproducer is available.  Resolving this
problem will require more time than remains in the current release
cycle, so reverting the original patch for the time being is necessary
to avoid any instability it may cause.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
---
 fs/ext4/inode.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

Comments

Theodore Ts'o Oct. 13, 2021, 8:46 p.m. UTC | #1
On Tue, 12 Oct 2021 13:19:01 -0400, Eric Whitney wrote:
> This reverts commit 948ca5f30e1df0c11eb5b0f410b9ceb97fa77ad9.
> 
> Two crash reports from users running variations on 5.15-rc4 kernels
> suggest that it is premature to enforce the state assertion in the
> original commit.  Both crashes were triggered by BUG calls in that
> code, indicating that under some rare circumstance the buffer head
> state did not match a delayed allocated block at the time the
> block was written out.  No reproducer is available.  Resolving this
> problem will require more time than remains in the current release
> cycle, so reverting the original patch for the time being is necessary
> to avoid any instability it may cause.
> 
> [...]

Applied, thanks!

[1/1] Revert "ext4: enforce buffer head state assertion in ext4_da_map_blocks"
      commit: 52264b162a51eadb0adcb55297cf91905c6ede98

Best regards,
Gao Xiang Oct. 26, 2021, 9:52 a.m. UTC | #2
Hi,

On Tue, Oct 12, 2021 at 01:19:01PM -0400, Eric Whitney wrote:
> This reverts commit 948ca5f30e1df0c11eb5b0f410b9ceb97fa77ad9.
> 
> Two crash reports from users running variations on 5.15-rc4 kernels
> suggest that it is premature to enforce the state assertion in the
> original commit.  Both crashes were triggered by BUG calls in that
> code, indicating that under some rare circumstance the buffer head
> state did not match a delayed allocated block at the time the
> block was written out.  No reproducer is available.  Resolving this
> problem will require more time than remains in the current release
> cycle, so reverting the original patch for the time being is necessary
> to avoid any instability it may cause.
> 
> Signed-off-by: Eric Whitney <enwlinux@gmail.com>

I noticed another problem related to commit 948ca5f30e1d
("ext4: enforce buffer head state assertion in ext4_da_map_blocks")
due to memory failure fault injection.

I think because buffer head and page can be removed due to
memory failure in generic_error_remove_page(). So it becomes
inconsistent with the extent status.

When such condition happens, it prints:

[   33.879505] Injecting memory failure for pfn 0x10f57e at process virtual address 0x20000000
[   33.881523] gx: mapping ffff995684f686f8 page->index 0 blocknr ffffffffffff0000 delay 1 new 0 dirty 1 mapped 1
[   33.881530] Memory failure: 0x10f57e: recovery action for dirty LRU page: Recovered

...

[   34.310151] ------------[ cut here ]------------
[   34.311972] mapping ffff995684f686f8 iblock 0 b_blocknr ffffffffffffffff invalid_block ffffffffffff0000 buffer_new 0 buffer_delay 0
[   34.315107] WARNING: CPU: 1 PID: 729 at fs/ext4/inode.c:1721 ext4_da_get_block_prep+0x30e/0x440

Anyway, revert the commit can avoid the memory fault injection
problem as well. I think if such patch re-lands, it needs to
consider memory failure condition too....

Thanks,
Gao Xiang


p.s. here is my debugging patch:
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0f06305167d5..d20553ac523a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1718,6 +1718,9 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
                 * initially delayed allocated
                 */
                if (ext4_es_is_delonly(&es)) {
+                       WARN(1, "mapping %px iblock %llu b_blocknr %llx buffer_new %u buffer_delay %u\n",
+                              inode->i_mapping, iblock, bh->b_blocknr, buffer_new(bh), buffer_delay(bh));
+
                        BUG_ON(bh->b_blocknr != invalid_block);
                        BUG_ON(!buffer_new(bh));
                        BUG_ON(!buffer_delay(bh));
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3e6449f2102a..8bf2affe03fd 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -33,6 +33,7 @@
  * are rare we hope to get away with this. This avoids impacting the core
  * VM.
  */
+#include <linux/buffer_head.h>
 #include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/page-flags.h>
@@ -779,7 +780,17 @@ static int truncate_error_page(struct page *p, unsigned long pfn,
        int ret = MF_FAILED;

        if (mapping->a_ops->error_remove_page) {
-               int err = mapping->a_ops->error_remove_page(mapping, p);
+               int err;
+
+               if (page_has_buffers(p)) {
+                       struct buffer_head *bh;
+
+                       bh = page_buffers(p);
+                       pr_err("gx: mapping %px page->index %lu blocknr %llx delay %u new %u dirty %u mapped %u", mapping, p->index,
+                              bh->b_blocknr, buffer_delay(bh), buffer_new(bh), buffer_dirty(bh), buffer_mapped(bh));
+               }
+
+               err = mapping->a_ops->error_remove_page(mapping, p);

                if (err != 0) {
                        pr_info("Memory failure: %#lx: Failed to punch page: %d\n",
Eric Whitney Nov. 3, 2021, 7:14 p.m. UTC | #3
* Theodore Ts'o <tytso@mit.edu>:
> On Tue, 12 Oct 2021 13:19:01 -0400, Eric Whitney wrote:
> > This reverts commit 948ca5f30e1df0c11eb5b0f410b9ceb97fa77ad9.
> > 
> > Two crash reports from users running variations on 5.15-rc4 kernels
> > suggest that it is premature to enforce the state assertion in the
> > original commit.  Both crashes were triggered by BUG calls in that
> > code, indicating that under some rare circumstance the buffer head
> > state did not match a delayed allocated block at the time the
> > block was written out.  No reproducer is available.  Resolving this
> > problem will require more time than remains in the current release
> > cycle, so reverting the original patch for the time being is necessary
> > to avoid any instability it may cause.
> > 
> > [...]
> 
> Applied, thanks!
> 
> [1/1] Revert "ext4: enforce buffer head state assertion in ext4_da_map_blocks"
>       commit: 52264b162a51eadb0adcb55297cf91905c6ede98
> 
> Best regards,
> -- 
> Theodore Ts'o <tytso@mit.edu>

Hi Ted:

This reversion isn't visible in the 5.15 kernel.  We'll want to get this in as
a bug fix to 5.15.1 if possible.  Please let me know if there's anything I can
do to expedite.

Thanks,
Eric
Theodore Ts'o Nov. 4, 2021, 2:36 p.m. UTC | #4
On Wed, Nov 03, 2021 at 03:14:57PM -0400, Eric Whitney wrote:
> 
> This reversion isn't visible in the 5.15 kernel.  We'll want to get this in as
> a bug fix to 5.15.1 if possible.  Please let me know if there's anything I can
> do to expedite.

Ack, I forgot to send a pull request to Linus before 5.15 went out,
sorry about that.  I've updated the commit description to include a
cc:stable and it will be going out with the other ext4 patches for
this merge window.

Cheers,

					- Ted
diff mbox series

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0f06305167d5..9097fccdc688 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1711,16 +1711,13 @@  static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 		}
 
 		/*
-		 * the buffer head associated with a delayed and not unwritten
-		 * block found in the extent status cache must contain an
-		 * invalid block number and have its BH_New and BH_Delay bits
-		 * set, reflecting the state assigned when the block was
-		 * initially delayed allocated
+		 * Delayed extent could be allocated by fallocate.
+		 * So we need to check it.
 		 */
-		if (ext4_es_is_delonly(&es)) {
-			BUG_ON(bh->b_blocknr != invalid_block);
-			BUG_ON(!buffer_new(bh));
-			BUG_ON(!buffer_delay(bh));
+		if (ext4_es_is_delayed(&es) && !ext4_es_is_unwritten(&es)) {
+			map_bh(bh, inode->i_sb, invalid_block);
+			set_buffer_new(bh);
+			set_buffer_delay(bh);
 			return 0;
 		}