Message ID | 20230412124126.2286716-4-libaokun1@huawei.com |
---|---|
State | Superseded |
Headers | show |
Series | ext4: fix WARNING in ext4_da_update_reserve_space | expand |
On Wed 12-04-23 20:41:21, Baokun Li wrote: > If extent status tree update fails, we have inconsistency between what is > stored in the extent status tree and what is stored on disk. And that can > cause even data corruption issues in some cases. > > For extents that cannot be dropped we use __GFP_NOFAIL to allocate memory. > And with the above logic, the undo operation in __es_remove_extent that > may cause inconsistency if the split extent fails is unnecessary, so we > remove it as well. > > Suggested-by: Jan Kara <jack@suse.cz> > Signed-off-by: Baokun Li <libaokun1@huawei.com> When I was looking through this patch, I've realized there's a problem with my plan :-|. See below... > static struct extent_status * > ext4_es_alloc_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len, > - ext4_fsblk_t pblk) > + ext4_fsblk_t pblk, int nofail) > { > struct extent_status *es; > - es = kmem_cache_alloc(ext4_es_cachep, GFP_ATOMIC); > + gfp_t gfp_flags = GFP_ATOMIC; > + > + if (nofail) > + gfp_flags |= __GFP_NOFAIL; > + > + es = kmem_cache_alloc(ext4_es_cachep, gfp_flags); > if (es == NULL) > return NULL; I have remembered that the combination of GFP_ATOMIC and GFP_NOFAIL is discouraged because the kernel has no sane way of refilling reserves for atomic allocations when in atomic context. So this combination can result in lockups. So what I think we'll have to do is that we'll just have to return error from __es_insert_extent() and __es_remove_extent() and in the callers we drop the i_es_lock, allocate needed status entries (one or two depending on the desired operation) with GFP_KERNEL | GFP_NOFAIL, get the lock again and pass the preallocated entries into __es_insert_extent / __es_remove_extent(). It's a bit ugly but we can at least remove those __es_shrink() calls which are not pretty either. Honza
On 2023/4/13 18:30, Jan Kara wrote: > On Wed 12-04-23 20:41:21, Baokun Li wrote: >> If extent status tree update fails, we have inconsistency between what is >> stored in the extent status tree and what is stored on disk. And that can >> cause even data corruption issues in some cases. >> >> For extents that cannot be dropped we use __GFP_NOFAIL to allocate memory. >> And with the above logic, the undo operation in __es_remove_extent that >> may cause inconsistency if the split extent fails is unnecessary, so we >> remove it as well. >> >> Suggested-by: Jan Kara<jack@suse.cz> >> Signed-off-by: Baokun Li<libaokun1@huawei.com> > When I was looking through this patch, I've realized there's a problem with > my plan :-|. See below... > >> static struct extent_status * >> ext4_es_alloc_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len, >> - ext4_fsblk_t pblk) >> + ext4_fsblk_t pblk, int nofail) >> { >> struct extent_status *es; >> - es = kmem_cache_alloc(ext4_es_cachep, GFP_ATOMIC); >> + gfp_t gfp_flags = GFP_ATOMIC; >> + >> + if (nofail) >> + gfp_flags |= __GFP_NOFAIL; >> + >> + es = kmem_cache_alloc(ext4_es_cachep, gfp_flags); >> if (es == NULL) >> return NULL; > I have remembered that the combination of GFP_ATOMIC and GFP_NOFAIL is > discouraged because the kernel has no sane way of refilling reserves for > atomic allocations when in atomic context. So this combination can result > in lockups. Indeed. GFP_NOFAIL is only applicable to sleepable allocations, GFP_ATOMIC will ignore it. I didn't notice that. > So what I think we'll have to do is that we'll just have to return error > from __es_insert_extent() and __es_remove_extent() and in the callers we > drop the i_es_lock, allocate needed status entries (one or two depending on > the desired operation) with GFP_KERNEL | GFP_NOFAIL, get the lock again and > pass the preallocated entries into __es_insert_extent / > __es_remove_extent(). It's a bit ugly but we can at least remove those > __es_shrink() calls which are not pretty either. > > Honza Yes, there's really no better way, thank you very much for your review! I've sent a patch for v4 as you suggested. Thanks again!
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c index f9dab2510bdc..a7c3200f9cbe 100644 --- a/fs/ext4/extents_status.c +++ b/fs/ext4/extents_status.c @@ -464,10 +464,15 @@ static inline int ext4_es_must_keep(struct extent_status *es) static struct extent_status * ext4_es_alloc_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len, - ext4_fsblk_t pblk) + ext4_fsblk_t pblk, int nofail) { struct extent_status *es; - es = kmem_cache_alloc(ext4_es_cachep, GFP_ATOMIC); + gfp_t gfp_flags = GFP_ATOMIC; + + if (nofail) + gfp_flags |= __GFP_NOFAIL; + + es = kmem_cache_alloc(ext4_es_cachep, gfp_flags); if (es == NULL) return NULL; es->es_lblk = lblk; @@ -804,7 +809,7 @@ static int __es_insert_extent(struct inode *inode, struct extent_status *newes) } es = ext4_es_alloc_extent(inode, newes->es_lblk, newes->es_len, - newes->es_pblk); + newes->es_pblk, ext4_es_must_keep(newes)); if (!es) return -ENOMEM; rb_link_node(&es->rb_node, parent, p); @@ -1361,8 +1366,6 @@ static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk, ext4_es_status(&orig_es)); err = __es_insert_extent(inode, &newes); if (err) { - es->es_lblk = orig_es.es_lblk; - es->es_len = orig_es.es_len; if ((err == -ENOMEM) && __es_shrink(EXT4_SB(inode->i_sb), 128, EXT4_I(inode)))
If extent status tree update fails, we have inconsistency between what is stored in the extent status tree and what is stored on disk. And that can cause even data corruption issues in some cases. For extents that cannot be dropped we use __GFP_NOFAIL to allocate memory. And with the above logic, the undo operation in __es_remove_extent that may cause inconsistency if the split extent fails is unnecessary, so we remove it as well. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> --- V2->V3: Define helper as a preparatory patch and simplify the code. fs/ext4/extents_status.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)