diff mbox series

ext4: stop return ENOSPC from ext4_issue_zeroout

Message ID 20210804125044.2480435-1-yangerkun@huawei.com
State Superseded
Headers show
Series ext4: stop return ENOSPC from ext4_issue_zeroout | expand

Commit Message

yangerkun Aug. 4, 2021, 12:50 p.m. UTC
Our testcase(briefly described as fsstress on dm thin-provisioning which
ext4 see volume size with 100G but actual size 10G) trigger a hungtask
bug since ext4_writepages fall into a infinite loop:

static int ext4_writepages(xxx)
{
    ...
   while (!done && mpd.first_page <= mpd.last_page) {
       ...
       ret = mpage_prepare_extent_to_map(&mpd);
       if (!ret) {
           ...
           ret = mpage_map_and_submit_extent(handle,
&mpd,&give_up_on_write);
           <----- will return -ENOSPC
           ...
       }
       ...
       if (ret == -ENOSPC && sbi->s_journal) {
           <------ we cannot break since we will get ENOSPC forever
           jbd2_journal_force_commit_nested(sbi->s_journal);
           ret = 0;
           continue;
       }
       ...
   }
}

Got ENOSPC with follow stack:
...
ext4_ext_map_blocks
  ext4_ext_convert_to_initialized
    ext4_ext_zeroout
      ext4_issue_zeroout
        ...
        submit_bio_wait <-- bio to thinpool will return ENOSPC

'df22291ff0fd ("ext4: Retry block allocation if we have free blocks
left")' add the logic to retry block allcate since we may get free block
after we commit a transaction. But the ENOSPC from thin-provisioning
will confuse ext4, and lead the upper infinite loop. Fix it by convert
the err to EIO.

Signed-off-by: yangerkun <yangerkun@huawei.com>
---
 fs/ext4/inode.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Jan Kara Aug. 4, 2021, 1:35 p.m. UTC | #1
On Wed 04-08-21 20:50:44, yangerkun wrote:
> Our testcase(briefly described as fsstress on dm thin-provisioning which
> ext4 see volume size with 100G but actual size 10G) trigger a hungtask
> bug since ext4_writepages fall into a infinite loop:
> 
> static int ext4_writepages(xxx)
> {
>     ...
>    while (!done && mpd.first_page <= mpd.last_page) {
>        ...
>        ret = mpage_prepare_extent_to_map(&mpd);
>        if (!ret) {
>            ...
>            ret = mpage_map_and_submit_extent(handle,
> &mpd,&give_up_on_write);
>            <----- will return -ENOSPC
>            ...
>        }
>        ...
>        if (ret == -ENOSPC && sbi->s_journal) {
>            <------ we cannot break since we will get ENOSPC forever
>            jbd2_journal_force_commit_nested(sbi->s_journal);
>            ret = 0;
>            continue;
>        }
>        ...
>    }
> }
> 
> Got ENOSPC with follow stack:
> ...
> ext4_ext_map_blocks
>   ext4_ext_convert_to_initialized
>     ext4_ext_zeroout
>       ext4_issue_zeroout
>         ...
>         submit_bio_wait <-- bio to thinpool will return ENOSPC
> 
> 'df22291ff0fd ("ext4: Retry block allocation if we have free blocks
> left")' add the logic to retry block allcate since we may get free block
> after we commit a transaction. But the ENOSPC from thin-provisioning
> will confuse ext4, and lead the upper infinite loop. Fix it by convert
> the err to EIO.
> 
> Signed-off-by: yangerkun <yangerkun@huawei.com>

Thanks for the patch. As a quick fix for the problem this is probably fine.
But longer term we might need to implement a configurable behavior for this
because just dropping data on the floor (which is what would happen here)
need not be what sysadmin wants and blocking until space is provisioned may be
actually a preferable behavior. Anyway for now feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

									Honza

> ---
>  fs/ext4/inode.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 038aebd7eb2f..d9ded699a88c 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -428,6 +428,9 @@ int ext4_issue_zeroout(struct inode *inode, ext4_lblk_t lblk, ext4_fsblk_t pblk,
>  	if (ret > 0)
>  		ret = 0;
>  
> +	if (ret == -ENOSPC)
> +		ret = -EIO;
> +
>  	return ret;
>  }
>  
> -- 
> 2.31.1
>
Theodore Ts'o Aug. 13, 2021, 3:18 p.m. UTC | #2
On Wed, Aug 04, 2021 at 03:35:29PM +0200, Jan Kara wrote:
> On Wed 04-08-21 20:50:44, yangerkun wrote:
> > Our testcase(briefly described as fsstress on dm thin-provisioning which
> > ext4 see volume size with 100G but actual size 10G) trigger a hungtask
> > bug since ext4_writepages fall into a infinite loop:
> > 
> > Got ENOSPC with follow stack:
> > ...
> > ext4_ext_map_blocks
> >   ext4_ext_convert_to_initialized
> >     ext4_ext_zeroout
> >       ext4_issue_zeroout
> >         ...
> >         submit_bio_wait <-- bio to thinpool will return ENOSPC
> > 
> 
> Thanks for the patch. As a quick fix for the problem this is probably fine.
> But longer term we might need to implement a configurable behavior for this
> because just dropping data on the floor (which is what would happen here)
> need not be what sysadmin wants and blocking until space is provisioned may be
> actually a preferable behavior. Anyway for now feel free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>

Hmm, I wonder if this would be a better fix.  (Not yet tested, may fry
your file system, etc....)   What do folks think?

						- Ted

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 92ad64b89d9b..501516cadc1b 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3569,7 +3569,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
 				split_map.m_len - ee_block);
 			err = ext4_ext_zeroout(inode, &zero_ex1);
 			if (err)
-				goto out;
+				goto fallback;
 			split_map.m_len = allocated;
 		}
 		if (split_map.m_lblk - ee_block + split_map.m_len <
@@ -3583,7 +3583,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
 						      ext4_ext_pblock(ex));
 				err = ext4_ext_zeroout(inode, &zero_ex2);
 				if (err)
-					goto out;
+					goto fallback;
 			}
 
 			split_map.m_len += split_map.m_lblk - ee_block;
@@ -3592,6 +3592,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
 		}
 	}
 
+fallback:
 	err = ext4_split_extent(handle, inode, ppath, &split_map, split_flag,
 				flags);
 	if (err > 0)
Jan Kara Aug. 16, 2021, 10:05 a.m. UTC | #3
On Fri 13-08-21 11:18:01, Theodore Ts'o wrote:
> On Wed, Aug 04, 2021 at 03:35:29PM +0200, Jan Kara wrote:
> > On Wed 04-08-21 20:50:44, yangerkun wrote:
> > > Our testcase(briefly described as fsstress on dm thin-provisioning which
> > > ext4 see volume size with 100G but actual size 10G) trigger a hungtask
> > > bug since ext4_writepages fall into a infinite loop:
> > > 
> > > Got ENOSPC with follow stack:
> > > ...
> > > ext4_ext_map_blocks
> > >   ext4_ext_convert_to_initialized
> > >     ext4_ext_zeroout
> > >       ext4_issue_zeroout
> > >         ...
> > >         submit_bio_wait <-- bio to thinpool will return ENOSPC
> > > 
> > 
> > Thanks for the patch. As a quick fix for the problem this is probably fine.
> > But longer term we might need to implement a configurable behavior for this
> > because just dropping data on the floor (which is what would happen here)
> > need not be what sysadmin wants and blocking until space is provisioned may be
> > actually a preferable behavior. Anyway for now feel free to add:
> > 
> > Reviewed-by: Jan Kara <jack@suse.cz>
> 
> Hmm, I wonder if this would be a better fix.  (Not yet tested, may fry
> your file system, etc....)   What do folks think?

Yes, that looks indeed better. I'd note that even splitting extent may fail
due to ENOSPC on thin-provisioned storage but the chances are *much* lower.

								Honza

> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 92ad64b89d9b..501516cadc1b 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3569,7 +3569,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
>  				split_map.m_len - ee_block);
>  			err = ext4_ext_zeroout(inode, &zero_ex1);
>  			if (err)
> -				goto out;
> +				goto fallback;
>  			split_map.m_len = allocated;
>  		}
>  		if (split_map.m_lblk - ee_block + split_map.m_len <
> @@ -3583,7 +3583,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
>  						      ext4_ext_pblock(ex));
>  				err = ext4_ext_zeroout(inode, &zero_ex2);
>  				if (err)
> -					goto out;
> +					goto fallback;
>  			}
>  
>  			split_map.m_len += split_map.m_lblk - ee_block;
> @@ -3592,6 +3592,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
>  		}
>  	}
>  
> +fallback:
>  	err = ext4_split_extent(handle, inode, ppath, &split_map, split_flag,
>  				flags);
>  	if (err > 0)
Theodore Ts'o Aug. 16, 2021, 2:16 p.m. UTC | #4
On Mon, Aug 16, 2021 at 12:05:45PM +0200, Jan Kara wrote:
> 
> Yes, that looks indeed better. I'd note that even splitting extent may fail
> due to ENOSPC on thin-provisioned storage but the chances are *much* lower.

Indeed, any kind of metadata update (updating an inode atime, creating
a new inode, deleting a directory entry, etc.) can fail due to ENOSPC
on thin-provision storage, leading to a potentially corrupted file
system since some writes will succeed, while others won't --- and
that's not a scenario that's super well tested, nor is there much we
can do other than remounting the file system read/only or forcing a
reboot.

But at least we won't throw the kernel into an infinite loop, which is
what yangerkun was reporting...

					- Ted
diff mbox series

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 038aebd7eb2f..d9ded699a88c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -428,6 +428,9 @@  int ext4_issue_zeroout(struct inode *inode, ext4_lblk_t lblk, ext4_fsblk_t pblk,
 	if (ret > 0)
 		ret = 0;
 
+	if (ret == -ENOSPC)
+		ret = -EIO;
+
 	return ret;
 }