Message ID | 20210813212701.366447-1-tytso@mit.edu |
---|---|
State | Accepted |
Headers | show |
Series | ext4: if zeroout fails fall back to splitting the extent node | expand |
在 2021/8/14 5:27, Theodore Ts'o 写道: > If the underlying storage device is using thin-provisioning, it's > possible for a zeroout operation to return ENOSPC. > > Commit df22291ff0fd ("ext4: Retry block allocation if we have free blocks > left") added logic to retry block allocation since we might get free block > after we commit a transaction. But the ENOSPC from thin-provisioning > will confuse ext4, and lead to an infinite loop. > > Since using zeroout instead of splitting the extent node is an > optimization, if it fails, we might as well fall back to splitting the > extent node. > > Reported-by: yangerkun <yangerkun@huawei.com> > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > --- > > I've run this through my battery of tests, and it doesn't cause any > regressions. Yangerkun, can you test this and see if this works for > you? Will do it. > > fs/ext4/extents.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c > index 92ad64b89d9b..501516cadc1b 100644 > --- a/fs/ext4/extents.c > +++ b/fs/ext4/extents.c > @@ -3569,7 +3569,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, > split_map.m_len - ee_block); > err = ext4_ext_zeroout(inode, &zero_ex1); > if (err) > - goto out; > + goto fallback; > split_map.m_len = allocated; > } > if (split_map.m_lblk - ee_block + split_map.m_len < > @@ -3583,7 +3583,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, > ext4_ext_pblock(ex)); > err = ext4_ext_zeroout(inode, &zero_ex2); > if (err) > - goto out; > + goto fallback; > } > > split_map.m_len += split_map.m_lblk - ee_block; > @@ -3592,6 +3592,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, > } > } > > +fallback: > err = ext4_split_extent(handle, inode, ppath, &split_map, split_flag, > flags); > if (err > 0) >
在 2021/8/14 10:15, yangerkun 写道: > > > 在 2021/8/14 5:27, Theodore Ts'o 写道: >> If the underlying storage device is using thin-provisioning, it's >> possible for a zeroout operation to return ENOSPC. >> >> Commit df22291ff0fd ("ext4: Retry block allocation if we have free blocks >> left") added logic to retry block allocation since we might get free >> block >> after we commit a transaction. But the ENOSPC from thin-provisioning >> will confuse ext4, and lead to an infinite loop. >> >> Since using zeroout instead of splitting the extent node is an >> optimization, if it fails, we might as well fall back to splitting the >> extent node. >> >> Reported-by: yangerkun <yangerkun@huawei.com> >> Signed-off-by: Theodore Ts'o <tytso@mit.edu> >> --- >> >> I've run this through my battery of tests, and it doesn't cause any >> regressions. Yangerkun, can you test this and see if this works for >> you? > > Will do it. Thanks for the patch, it can help us to pass the testcase. And after some review, it's really a better fix for me. Thanks, Kun. > >> >> fs/ext4/extents.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c >> index 92ad64b89d9b..501516cadc1b 100644 >> --- a/fs/ext4/extents.c >> +++ b/fs/ext4/extents.c >> @@ -3569,7 +3569,7 @@ static int >> ext4_ext_convert_to_initialized(handle_t *handle, >> split_map.m_len - ee_block); >> err = ext4_ext_zeroout(inode, &zero_ex1); >> if (err) >> - goto out; >> + goto fallback; >> split_map.m_len = allocated; >> } >> if (split_map.m_lblk - ee_block + split_map.m_len < >> @@ -3583,7 +3583,7 @@ static int >> ext4_ext_convert_to_initialized(handle_t *handle, >> ext4_ext_pblock(ex)); >> err = ext4_ext_zeroout(inode, &zero_ex2); >> if (err) >> - goto out; >> + goto fallback; >> } >> split_map.m_len += split_map.m_lblk - ee_block; >> @@ -3592,6 +3592,7 @@ static int >> ext4_ext_convert_to_initialized(handle_t *handle, >> } >> } >> +fallback: >> err = ext4_split_extent(handle, inode, ppath, &split_map, >> split_flag, >> flags); >> if (err > 0) >> > .
Hi, Rethink about this problem. Should we consider other place which call ext4_issue_zeroout? Maybe it can trigger the problem too(in theory, not really happened)... How about include follow patch which not only transfer ENOSPC to EIO. But also stop to overwrite the error return by ext4_ext_insert_extent in ext4_split_extent_at. Besides, 308c57ccf431 ("ext4: if zeroout fails fall back to splitting the extent node") can work together with this patch. diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index c0de30f25185..66767ede235f 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3218,16 +3218,18 @@ static int ext4_split_extent_at(handle_t *handle, goto out; if (EXT4_EXT_MAY_ZEROOUT & split_flag) { + int ret = 0; + if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) { if (split_flag & EXT4_EXT_DATA_VALID1) { - err = ext4_ext_zeroout(inode, ex2); + ret = ext4_ext_zeroout(inode, ex2); zero_ex.ee_block = ex2->ee_block; zero_ex.ee_len = cpu_to_le16( ext4_ext_get_actual_len(ex2)); ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(ex2)); } else { - err = ext4_ext_zeroout(inode, ex); + ret = ext4_ext_zeroout(inode, ex); zero_ex.ee_block = ex->ee_block; zero_ex.ee_len = cpu_to_le16( ext4_ext_get_actual_len(ex)); @@ -3235,7 +3237,7 @@ static int ext4_split_extent_at(handle_t *handle, ext4_ext_pblock(ex)); } } else { - err = ext4_ext_zeroout(inode, &orig_ex); + ret = ext4_ext_zeroout(inode, &orig_ex); zero_ex.ee_block = orig_ex.ee_block; zero_ex.ee_len = cpu_to_le16( ext4_ext_get_actual_len(&orig_ex)); @@ -3243,7 +3245,7 @@ static int ext4_split_extent_at(handle_t *handle, ext4_ext_pblock(&orig_ex)); } - if (!err) { + if (!ret) { /* update the extent length and mark as initialized */ ex->ee_len = cpu_to_le16(ee_len); ext4_ext_try_to_merge(handle, inode, path, ex); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index d18852d6029c..95b970581864 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -427,6 +427,9 @@ int ext4_issue_zeroout(struct inode *inode, ext4_lblk_t lblk, ext4_fsblk_t pblk, if (ret > 0) ret = 0; + if (ret == -ENOSPC) + ret = -EIO; + return ret; } 在 2021/8/14 5:27, Theodore Ts'o 写道: > If the underlying storage device is using thin-provisioning, it's > possible for a zeroout operation to return ENOSPC. > > Commit df22291ff0fd ("ext4: Retry block allocation if we have free blocks > left") added logic to retry block allocation since we might get free block > after we commit a transaction. But the ENOSPC from thin-provisioning > will confuse ext4, and lead to an infinite loop. > > Since using zeroout instead of splitting the extent node is an > optimization, if it fails, we might as well fall back to splitting the > extent node. > > Reported-by: yangerkun <yangerkun@huawei.com> > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > --- > > I've run this through my battery of tests, and it doesn't cause any > regressions. Yangerkun, can you test this and see if this works for > you? > > fs/ext4/extents.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c > index 92ad64b89d9b..501516cadc1b 100644 > --- a/fs/ext4/extents.c > +++ b/fs/ext4/extents.c > @@ -3569,7 +3569,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, > split_map.m_len - ee_block); > err = ext4_ext_zeroout(inode, &zero_ex1); > if (err) > - goto out; > + goto fallback; > split_map.m_len = allocated; > } > if (split_map.m_lblk - ee_block + split_map.m_len < > @@ -3583,7 +3583,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, > ext4_ext_pblock(ex)); > err = ext4_ext_zeroout(inode, &zero_ex2); > if (err) > - goto out; > + goto fallback; > } > > split_map.m_len += split_map.m_lblk - ee_block; > @@ -3592,6 +3592,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, > } > } > > +fallback: > err = ext4_split_extent(handle, inode, ppath, &split_map, split_flag, > flags); > if (err > 0) >
Hello, On Sun 26-09-21 19:35:01, yangerkun wrote: > Rethink about this problem. Should we consider other place which call > ext4_issue_zeroout? Maybe it can trigger the problem too(in theory, not > really happened)... > > How about include follow patch which not only transfer ENOSPC to EIO. But > also stop to overwrite the error return by ext4_ext_insert_extent in > ext4_split_extent_at. > > Besides, 308c57ccf431 ("ext4: if zeroout fails fall back to splitting the > extent node") can work together with this patch. I've got back to this. The ext4_ext_zeroout() calls in ext4_split_extent_at() seem to be there as fallback when insertion of a new extent fails due to ENOSPC / EDQUOT. If even ext4_ext_zeroout(), then I think returning an error as the code does now is correct and we don't have much other option. Also we are really running out of disk space so I think returning ENOSPC is fine. What exact scenario are you afraid of? Honza > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c > index c0de30f25185..66767ede235f 100644 > --- a/fs/ext4/extents.c > +++ b/fs/ext4/extents.c > @@ -3218,16 +3218,18 @@ static int ext4_split_extent_at(handle_t *handle, > goto out; > > if (EXT4_EXT_MAY_ZEROOUT & split_flag) { > + int ret = 0; > + > if (split_flag & > (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) { > if (split_flag & EXT4_EXT_DATA_VALID1) { > - err = ext4_ext_zeroout(inode, ex2); > + ret = ext4_ext_zeroout(inode, ex2); > zero_ex.ee_block = ex2->ee_block; > zero_ex.ee_len = cpu_to_le16( > > ext4_ext_get_actual_len(ex2)); > ext4_ext_store_pblock(&zero_ex, > > ext4_ext_pblock(ex2)); > } else { > - err = ext4_ext_zeroout(inode, ex); > + ret = ext4_ext_zeroout(inode, ex); > zero_ex.ee_block = ex->ee_block; > zero_ex.ee_len = cpu_to_le16( > > ext4_ext_get_actual_len(ex)); > @@ -3235,7 +3237,7 @@ static int ext4_split_extent_at(handle_t *handle, > ext4_ext_pblock(ex)); > } > } else { > - err = ext4_ext_zeroout(inode, &orig_ex); > + ret = ext4_ext_zeroout(inode, &orig_ex); > zero_ex.ee_block = orig_ex.ee_block; > zero_ex.ee_len = cpu_to_le16( > > ext4_ext_get_actual_len(&orig_ex)); > @@ -3243,7 +3245,7 @@ static int ext4_split_extent_at(handle_t *handle, > ext4_ext_pblock(&orig_ex)); > } > > - if (!err) { > + if (!ret) { > /* update the extent length and mark as initialized > */ > ex->ee_len = cpu_to_le16(ee_len); > ext4_ext_try_to_merge(handle, inode, path, ex); > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index d18852d6029c..95b970581864 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -427,6 +427,9 @@ int ext4_issue_zeroout(struct inode *inode, ext4_lblk_t > lblk, ext4_fsblk_t pblk, > if (ret > 0) > ret = 0; > > + if (ret == -ENOSPC) > + ret = -EIO; > + > return ret; > } > > > > 在 2021/8/14 5:27, Theodore Ts'o 写道: > > If the underlying storage device is using thin-provisioning, it's > > possible for a zeroout operation to return ENOSPC. > > > > Commit df22291ff0fd ("ext4: Retry block allocation if we have free blocks > > left") added logic to retry block allocation since we might get free block > > after we commit a transaction. But the ENOSPC from thin-provisioning > > will confuse ext4, and lead to an infinite loop. > > > > Since using zeroout instead of splitting the extent node is an > > optimization, if it fails, we might as well fall back to splitting the > > extent node. > > > > Reported-by: yangerkun <yangerkun@huawei.com> > > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > > --- > > > > I've run this through my battery of tests, and it doesn't cause any > > regressions. Yangerkun, can you test this and see if this works for > > you? > > > > fs/ext4/extents.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c > > index 92ad64b89d9b..501516cadc1b 100644 > > --- a/fs/ext4/extents.c > > +++ b/fs/ext4/extents.c > > @@ -3569,7 +3569,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, > > split_map.m_len - ee_block); > > err = ext4_ext_zeroout(inode, &zero_ex1); > > if (err) > > - goto out; > > + goto fallback; > > split_map.m_len = allocated; > > } > > if (split_map.m_lblk - ee_block + split_map.m_len < > > @@ -3583,7 +3583,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, > > ext4_ext_pblock(ex)); > > err = ext4_ext_zeroout(inode, &zero_ex2); > > if (err) > > - goto out; > > + goto fallback; > > } > > split_map.m_len += split_map.m_lblk - ee_block; > > @@ -3592,6 +3592,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, > > } > > } > > +fallback: > > err = ext4_split_extent(handle, inode, ppath, &split_map, split_flag, > > flags); > > if (err > 0) > >
On 2021/11/23 17:27, Jan Kara wrote: > Hello, > > On Sun 26-09-21 19:35:01, yangerkun wrote: >> Rethink about this problem. Should we consider other place which call >> ext4_issue_zeroout? Maybe it can trigger the problem too(in theory, not >> really happened)... >> >> How about include follow patch which not only transfer ENOSPC to EIO. But >> also stop to overwrite the error return by ext4_ext_insert_extent in >> ext4_split_extent_at. >> >> Besides, 308c57ccf431 ("ext4: if zeroout fails fall back to splitting the >> extent node") can work together with this patch. > > I've got back to this. The ext4_ext_zeroout() calls in > ext4_split_extent_at() seem to be there as fallback when insertion of a new > extent fails due to ENOSPC / EDQUOT. If even ext4_ext_zeroout(), then I > think returning an error as the code does now is correct and we don't have > much other option. Also we are really running out of disk space so I think > returning ENOSPC is fine. What exact scenario are you afraid of? I am afraid about the EDQUOT from ext4_ext_insert_extent may be overwrite by ext4_ext_zeroout with ENOSPC. And this may lead to dead loop since ext4_writepages will retry once get ENOSPC? Maybe I am wrong... > > Honza > >> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c >> index c0de30f25185..66767ede235f 100644 >> --- a/fs/ext4/extents.c >> +++ b/fs/ext4/extents.c >> @@ -3218,16 +3218,18 @@ static int ext4_split_extent_at(handle_t *handle, >> goto out; >> >> if (EXT4_EXT_MAY_ZEROOUT & split_flag) { >> + int ret = 0; >> + >> if (split_flag & >> (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) { >> if (split_flag & EXT4_EXT_DATA_VALID1) { >> - err = ext4_ext_zeroout(inode, ex2); >> + ret = ext4_ext_zeroout(inode, ex2); >> zero_ex.ee_block = ex2->ee_block; >> zero_ex.ee_len = cpu_to_le16( >> >> ext4_ext_get_actual_len(ex2)); >> ext4_ext_store_pblock(&zero_ex, >> >> ext4_ext_pblock(ex2)); >> } else { >> - err = ext4_ext_zeroout(inode, ex); >> + ret = ext4_ext_zeroout(inode, ex); >> zero_ex.ee_block = ex->ee_block; >> zero_ex.ee_len = cpu_to_le16( >> >> ext4_ext_get_actual_len(ex)); >> @@ -3235,7 +3237,7 @@ static int ext4_split_extent_at(handle_t *handle, >> ext4_ext_pblock(ex)); >> } >> } else { >> - err = ext4_ext_zeroout(inode, &orig_ex); >> + ret = ext4_ext_zeroout(inode, &orig_ex); >> zero_ex.ee_block = orig_ex.ee_block; >> zero_ex.ee_len = cpu_to_le16( >> >> ext4_ext_get_actual_len(&orig_ex)); >> @@ -3243,7 +3245,7 @@ static int ext4_split_extent_at(handle_t *handle, >> ext4_ext_pblock(&orig_ex)); >> } >> >> - if (!err) { >> + if (!ret) { >> /* update the extent length and mark as initialized >> */ >> ex->ee_len = cpu_to_le16(ee_len); >> ext4_ext_try_to_merge(handle, inode, path, ex); >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >> index d18852d6029c..95b970581864 100644 >> --- a/fs/ext4/inode.c >> +++ b/fs/ext4/inode.c >> @@ -427,6 +427,9 @@ int ext4_issue_zeroout(struct inode *inode, ext4_lblk_t >> lblk, ext4_fsblk_t pblk, >> if (ret > 0) >> ret = 0; >> >> + if (ret == -ENOSPC) >> + ret = -EIO; >> + >> return ret; >> } >> >> >> >> 在 2021/8/14 5:27, Theodore Ts'o 写道: >>> If the underlying storage device is using thin-provisioning, it's >>> possible for a zeroout operation to return ENOSPC. >>> >>> Commit df22291ff0fd ("ext4: Retry block allocation if we have free blocks >>> left") added logic to retry block allocation since we might get free block >>> after we commit a transaction. But the ENOSPC from thin-provisioning >>> will confuse ext4, and lead to an infinite loop. >>> >>> Since using zeroout instead of splitting the extent node is an >>> optimization, if it fails, we might as well fall back to splitting the >>> extent node. >>> >>> Reported-by: yangerkun <yangerkun@huawei.com> >>> Signed-off-by: Theodore Ts'o <tytso@mit.edu> >>> --- >>> >>> I've run this through my battery of tests, and it doesn't cause any >>> regressions. Yangerkun, can you test this and see if this works for >>> you? >>> >>> fs/ext4/extents.c | 5 +++-- >>> 1 file changed, 3 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c >>> index 92ad64b89d9b..501516cadc1b 100644 >>> --- a/fs/ext4/extents.c >>> +++ b/fs/ext4/extents.c >>> @@ -3569,7 +3569,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, >>> split_map.m_len - ee_block); >>> err = ext4_ext_zeroout(inode, &zero_ex1); >>> if (err) >>> - goto out; >>> + goto fallback; >>> split_map.m_len = allocated; >>> } >>> if (split_map.m_lblk - ee_block + split_map.m_len < >>> @@ -3583,7 +3583,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, >>> ext4_ext_pblock(ex)); >>> err = ext4_ext_zeroout(inode, &zero_ex2); >>> if (err) >>> - goto out; >>> + goto fallback; >>> } >>> split_map.m_len += split_map.m_lblk - ee_block; >>> @@ -3592,6 +3592,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, >>> } >>> } >>> +fallback: >>> err = ext4_split_extent(handle, inode, ppath, &split_map, split_flag, >>> flags); >>> if (err > 0) >>>
On Wed 24-11-21 17:01:12, yangerkun wrote: > On 2021/11/23 17:27, Jan Kara wrote: > > Hello, > > > > On Sun 26-09-21 19:35:01, yangerkun wrote: > > > Rethink about this problem. Should we consider other place which call > > > ext4_issue_zeroout? Maybe it can trigger the problem too(in theory, not > > > really happened)... > > > > > > How about include follow patch which not only transfer ENOSPC to EIO. But > > > also stop to overwrite the error return by ext4_ext_insert_extent in > > > ext4_split_extent_at. > > > > > > Besides, 308c57ccf431 ("ext4: if zeroout fails fall back to splitting the > > > extent node") can work together with this patch. > > > > I've got back to this. The ext4_ext_zeroout() calls in > > ext4_split_extent_at() seem to be there as fallback when insertion of a new > > extent fails due to ENOSPC / EDQUOT. If even ext4_ext_zeroout(), then I > > think returning an error as the code does now is correct and we don't have > > much other option. Also we are really running out of disk space so I think > > returning ENOSPC is fine. What exact scenario are you afraid of? > > I am afraid about the EDQUOT from ext4_ext_insert_extent may be overwrite by > ext4_ext_zeroout with ENOSPC. And this may lead to dead loop since > ext4_writepages will retry once get ENOSPC? Maybe I am wrong... OK, so passing back original error instead of the error from ext4_ext_zeroout() makes sense. But I don't think doing much more is needed - firstly, ENOSPC or EDQUOT should not happen in ext4_split_extent_at() called from ext4_writepages() because we should have reserved enough space for extent splits when writing data. So hitting that is already unexpected. Committing transaction holding blocks that are expected to be free is the most likely reason for us seeing ENOSPC and returning EIO in that case would be bug. Secondly, returning EIO instead of ENOSPC is IMO a bit confusing for upper layers and makes it harder to analyze where the real problem is... Honza
On 2021/11/24 18:37, Jan Kara wrote: > On Wed 24-11-21 17:01:12, yangerkun wrote: >> On 2021/11/23 17:27, Jan Kara wrote: >>> Hello, >>> >>> On Sun 26-09-21 19:35:01, yangerkun wrote: >>>> Rethink about this problem. Should we consider other place which call >>>> ext4_issue_zeroout? Maybe it can trigger the problem too(in theory, not >>>> really happened)... >>>> >>>> How about include follow patch which not only transfer ENOSPC to EIO. But >>>> also stop to overwrite the error return by ext4_ext_insert_extent in >>>> ext4_split_extent_at. >>>> >>>> Besides, 308c57ccf431 ("ext4: if zeroout fails fall back to splitting the >>>> extent node") can work together with this patch. >>> >>> I've got back to this. The ext4_ext_zeroout() calls in >>> ext4_split_extent_at() seem to be there as fallback when insertion of a new >>> extent fails due to ENOSPC / EDQUOT. If even ext4_ext_zeroout(), then I >>> think returning an error as the code does now is correct and we don't have >>> much other option. Also we are really running out of disk space so I think >>> returning ENOSPC is fine. What exact scenario are you afraid of? >> >> I am afraid about the EDQUOT from ext4_ext_insert_extent may be overwrite by >> ext4_ext_zeroout with ENOSPC. And this may lead to dead loop since >> ext4_writepages will retry once get ENOSPC? Maybe I am wrong... > > OK, so passing back original error instead of the error from > ext4_ext_zeroout() makes sense. But I don't think doing much more is needed > - firstly, ENOSPC or EDQUOT should not happen in ext4_split_extent_at() > called from ext4_writepages() because we should have reserved enough > space for extent splits when writing data. So hitting that is already ext4_da_write_begin ext4_da_get_block_prep ext4_insert_delayed_block ext4_da_reserve_space It seems we will only reserve space for data, no for metadata... > unexpected. Committing transaction holding blocks that are expected to be > free is the most likely reason for us seeing ENOSPC and returning EIO in > that case would be bug. Agree. EIO from ext4_ext_zeroout that overwrite the ENOSPC from ext4_ext_insert_extent seems buggy too. Maybe we should ignore the error from ext4_ext_zeroout and return the error from ext4_ext_insert_extent once ext4_ext_zeroout in ext4_split_extent_at got a error. Something like this: diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 0ecf819bf189..56cc00ee42a1 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3185,6 +3185,7 @@ static int ext4_split_extent_at(handle_t *handle, struct ext4_extent *ex2 = NULL; unsigned int ee_len, depth; int err = 0; + int err1; BUG_ON((split_flag & (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2)) == (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2)); @@ -3255,7 +3256,7 @@ static int ext4_split_extent_at(handle_t *handle, if (EXT4_EXT_MAY_ZEROOUT & split_flag) { if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) { if (split_flag & EXT4_EXT_DATA_VALID1) { - err = ext4_ext_zeroout(inode, ex2); + err1 = ext4_ext_zeroout(inode, ex2); zero_ex.ee_block = ex2->ee_block; zero_ex.ee_len = cpu_to_le16( ext4_ext_get_actual_len(ex2)); @@ -3270,7 +3271,7 @@ static int ext4_split_extent_at(handle_t *handle, ext4_ext_pblock(ex)); } } else { - err = ext4_ext_zeroout(inode, &orig_ex); + err1 = ext4_ext_zeroout(inode, &orig_ex); zero_ex.ee_block = orig_ex.ee_block; zero_ex.ee_len = cpu_to_le16( ext4_ext_get_actual_len(&orig_ex)); @@ -3278,7 +3279,7 @@ static int ext4_split_extent_at(handle_t *handle, ext4_ext_pblock(&orig_ex)); } - if (!err) { + if (!err1) { /* update the extent length and mark as initialized */ ex->ee_len = cpu_to_le16(ee_len); ext4_ext_try_to_merge(handle, inode, path, ex); > Secondly, returning EIO instead of ENOSPC is IMO a > bit confusing for upper layers and makes it harder to analyze where the > real problem is... > > Honza >
On Wed 24-11-21 20:11:43, yangerkun wrote: > > > On 2021/11/24 18:37, Jan Kara wrote: > > On Wed 24-11-21 17:01:12, yangerkun wrote: > > > On 2021/11/23 17:27, Jan Kara wrote: > > > > Hello, > > > > > > > > On Sun 26-09-21 19:35:01, yangerkun wrote: > > > > > Rethink about this problem. Should we consider other place which call > > > > > ext4_issue_zeroout? Maybe it can trigger the problem too(in theory, not > > > > > really happened)... > > > > > > > > > > How about include follow patch which not only transfer ENOSPC to EIO. But > > > > > also stop to overwrite the error return by ext4_ext_insert_extent in > > > > > ext4_split_extent_at. > > > > > > > > > > Besides, 308c57ccf431 ("ext4: if zeroout fails fall back to splitting the > > > > > extent node") can work together with this patch. > > > > > > > > I've got back to this. The ext4_ext_zeroout() calls in > > > > ext4_split_extent_at() seem to be there as fallback when insertion of a new > > > > extent fails due to ENOSPC / EDQUOT. If even ext4_ext_zeroout(), then I > > > > think returning an error as the code does now is correct and we don't have > > > > much other option. Also we are really running out of disk space so I think > > > > returning ENOSPC is fine. What exact scenario are you afraid of? > > > > > > I am afraid about the EDQUOT from ext4_ext_insert_extent may be overwrite by > > > ext4_ext_zeroout with ENOSPC. And this may lead to dead loop since > > > ext4_writepages will retry once get ENOSPC? Maybe I am wrong... > > > > OK, so passing back original error instead of the error from > > ext4_ext_zeroout() makes sense. But I don't think doing much more is needed > > - firstly, ENOSPC or EDQUOT should not happen in ext4_split_extent_at() > > called from ext4_writepages() because we should have reserved enough > > space for extent splits when writing data. So hitting that is already > > ext4_da_write_begin > ext4_da_get_block_prep > ext4_insert_delayed_block > ext4_da_reserve_space > > It seems we will only reserve space for data, no for metadata... > > > > unexpected. Committing transaction holding blocks that are expected to be > > free is the most likely reason for us seeing ENOSPC and returning EIO in > > that case would be bug. > > Agree. EIO from ext4_ext_zeroout that overwrite the ENOSPC from > ext4_ext_insert_extent seems buggy too. Maybe we should ignore the error > from ext4_ext_zeroout and return the error from ext4_ext_insert_extent > once ext4_ext_zeroout in ext4_split_extent_at got a error. Something > like this: Yep, something like that looks good to me. Honza > > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c > index 0ecf819bf189..56cc00ee42a1 100644 > --- a/fs/ext4/extents.c > +++ b/fs/ext4/extents.c > @@ -3185,6 +3185,7 @@ static int ext4_split_extent_at(handle_t *handle, > struct ext4_extent *ex2 = NULL; > unsigned int ee_len, depth; > int err = 0; > + int err1; > > BUG_ON((split_flag & (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2)) > == > (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2)); > @@ -3255,7 +3256,7 @@ static int ext4_split_extent_at(handle_t *handle, > if (EXT4_EXT_MAY_ZEROOUT & split_flag) { > if (split_flag & > (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) { > if (split_flag & EXT4_EXT_DATA_VALID1) { > - err = ext4_ext_zeroout(inode, ex2); > + err1 = ext4_ext_zeroout(inode, ex2); > zero_ex.ee_block = ex2->ee_block; > zero_ex.ee_len = cpu_to_le16( > > ext4_ext_get_actual_len(ex2)); > @@ -3270,7 +3271,7 @@ static int ext4_split_extent_at(handle_t *handle, > ext4_ext_pblock(ex)); > } > } else { > - err = ext4_ext_zeroout(inode, &orig_ex); > + err1 = ext4_ext_zeroout(inode, &orig_ex); > zero_ex.ee_block = orig_ex.ee_block; > zero_ex.ee_len = cpu_to_le16( > > ext4_ext_get_actual_len(&orig_ex)); > @@ -3278,7 +3279,7 @@ static int ext4_split_extent_at(handle_t *handle, > ext4_ext_pblock(&orig_ex)); > } > > - if (!err) { > + if (!err1) { > /* update the extent length and mark as initialized > */ > ex->ee_len = cpu_to_le16(ee_len); > ext4_ext_try_to_merge(handle, inode, path, ex); > > > > > Secondly, returning EIO instead of ENOSPC is IMO a > > bit confusing for upper layers and makes it harder to analyze where the > > real problem is... > > > > Honza > >
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 92ad64b89d9b..501516cadc1b 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3569,7 +3569,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, split_map.m_len - ee_block); err = ext4_ext_zeroout(inode, &zero_ex1); if (err) - goto out; + goto fallback; split_map.m_len = allocated; } if (split_map.m_lblk - ee_block + split_map.m_len < @@ -3583,7 +3583,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, ext4_ext_pblock(ex)); err = ext4_ext_zeroout(inode, &zero_ex2); if (err) - goto out; + goto fallback; } split_map.m_len += split_map.m_lblk - ee_block; @@ -3592,6 +3592,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, } } +fallback: err = ext4_split_extent(handle, inode, ppath, &split_map, split_flag, flags); if (err > 0)
If the underlying storage device is using thin-provisioning, it's possible for a zeroout operation to return ENOSPC. Commit df22291ff0fd ("ext4: Retry block allocation if we have free blocks left") added logic to retry block allocation since we might get free block after we commit a transaction. But the ENOSPC from thin-provisioning will confuse ext4, and lead to an infinite loop. Since using zeroout instead of splitting the extent node is an optimization, if it fails, we might as well fall back to splitting the extent node. Reported-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> --- I've run this through my battery of tests, and it doesn't cause any regressions. Yangerkun, can you test this and see if this works for you? fs/ext4/extents.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)