diff mbox series

[v3,3/8] ext4: use __GFP_NOFAIL if allocating extents_status cannot fail

Message ID 20230412124126.2286716-4-libaokun1@huawei.com
State Superseded
Headers show
Series ext4: fix WARNING in ext4_da_update_reserve_space | expand

Commit Message

Baokun Li April 12, 2023, 12:41 p.m. UTC
If extent status tree update fails, we have inconsistency between what is
stored in the extent status tree and what is stored on disk. And that can
cause even data corruption issues in some cases.

For extents that cannot be dropped we use __GFP_NOFAIL to allocate memory.
And with the above logic, the undo operation in __es_remove_extent that
may cause inconsistency if the split extent fails is unnecessary, so we
remove it as well.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
V2->V3:
	Define helper as a preparatory patch and simplify the code.

 fs/ext4/extents_status.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

Jan Kara April 13, 2023, 10:30 a.m. UTC | #1
On Wed 12-04-23 20:41:21, Baokun Li wrote:
> If extent status tree update fails, we have inconsistency between what is
> stored in the extent status tree and what is stored on disk. And that can
> cause even data corruption issues in some cases.
> 
> For extents that cannot be dropped we use __GFP_NOFAIL to allocate memory.
> And with the above logic, the undo operation in __es_remove_extent that
> may cause inconsistency if the split extent fails is unnecessary, so we
> remove it as well.
> 
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

When I was looking through this patch, I've realized there's a problem with
my plan :-|. See below...

>  static struct extent_status *
>  ext4_es_alloc_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len,
> -		     ext4_fsblk_t pblk)
> +		     ext4_fsblk_t pblk, int nofail)
>  {
>  	struct extent_status *es;
> -	es = kmem_cache_alloc(ext4_es_cachep, GFP_ATOMIC);
> +	gfp_t gfp_flags = GFP_ATOMIC;
> +
> +	if (nofail)
> +		gfp_flags |= __GFP_NOFAIL;
> +
> +	es = kmem_cache_alloc(ext4_es_cachep, gfp_flags);
>  	if (es == NULL)
>  		return NULL;

I have remembered that the combination of GFP_ATOMIC and GFP_NOFAIL is
discouraged because the kernel has no sane way of refilling reserves for
atomic allocations when in atomic context. So this combination can result
in lockups.

So what I think we'll have to do is that we'll just have to return error
from __es_insert_extent() and __es_remove_extent() and in the callers we
drop the i_es_lock, allocate needed status entries (one or two depending on
the desired operation) with GFP_KERNEL | GFP_NOFAIL, get the lock again and
pass the preallocated entries into __es_insert_extent /
__es_remove_extent(). It's a bit ugly but we can at least remove those
__es_shrink() calls which are not pretty either.

								Honza
Baokun Li April 24, 2023, 3:45 a.m. UTC | #2
On 2023/4/13 18:30, Jan Kara wrote:
> On Wed 12-04-23 20:41:21, Baokun Li wrote:
>> If extent status tree update fails, we have inconsistency between what is
>> stored in the extent status tree and what is stored on disk. And that can
>> cause even data corruption issues in some cases.
>>
>> For extents that cannot be dropped we use __GFP_NOFAIL to allocate memory.
>> And with the above logic, the undo operation in __es_remove_extent that
>> may cause inconsistency if the split extent fails is unnecessary, so we
>> remove it as well.
>>
>> Suggested-by: Jan Kara<jack@suse.cz>
>> Signed-off-by: Baokun Li<libaokun1@huawei.com>
> When I was looking through this patch, I've realized there's a problem with
> my plan :-|. See below...
>
>>   static struct extent_status *
>>   ext4_es_alloc_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len,
>> -		     ext4_fsblk_t pblk)
>> +		     ext4_fsblk_t pblk, int nofail)
>>   {
>>   	struct extent_status *es;
>> -	es = kmem_cache_alloc(ext4_es_cachep, GFP_ATOMIC);
>> +	gfp_t gfp_flags = GFP_ATOMIC;
>> +
>> +	if (nofail)
>> +		gfp_flags |= __GFP_NOFAIL;
>> +
>> +	es = kmem_cache_alloc(ext4_es_cachep, gfp_flags);
>>   	if (es == NULL)
>>   		return NULL;
> I have remembered that the combination of GFP_ATOMIC and GFP_NOFAIL is
> discouraged because the kernel has no sane way of refilling reserves for
> atomic allocations when in atomic context. So this combination can result
> in lockups.

Indeed. GFP_NOFAIL is only applicable to sleepable allocations,

GFP_ATOMIC will ignore it. I didn't notice that.

> So what I think we'll have to do is that we'll just have to return error
> from __es_insert_extent() and __es_remove_extent() and in the callers we
> drop the i_es_lock, allocate needed status entries (one or two depending on
> the desired operation) with GFP_KERNEL | GFP_NOFAIL, get the lock again and
> pass the preallocated entries into __es_insert_extent /
> __es_remove_extent(). It's a bit ugly but we can at least remove those
> __es_shrink() calls which are not pretty either.
>
> 								Honza

Yes, there's really no better way, thank you very much for your review!
I've sent a patch for v4 as you suggested.
Thanks again!
diff mbox series

Patch

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index f9dab2510bdc..a7c3200f9cbe 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -464,10 +464,15 @@  static inline int ext4_es_must_keep(struct extent_status *es)
 
 static struct extent_status *
 ext4_es_alloc_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len,
-		     ext4_fsblk_t pblk)
+		     ext4_fsblk_t pblk, int nofail)
 {
 	struct extent_status *es;
-	es = kmem_cache_alloc(ext4_es_cachep, GFP_ATOMIC);
+	gfp_t gfp_flags = GFP_ATOMIC;
+
+	if (nofail)
+		gfp_flags |= __GFP_NOFAIL;
+
+	es = kmem_cache_alloc(ext4_es_cachep, gfp_flags);
 	if (es == NULL)
 		return NULL;
 	es->es_lblk = lblk;
@@ -804,7 +809,7 @@  static int __es_insert_extent(struct inode *inode, struct extent_status *newes)
 	}
 
 	es = ext4_es_alloc_extent(inode, newes->es_lblk, newes->es_len,
-				  newes->es_pblk);
+				  newes->es_pblk, ext4_es_must_keep(newes));
 	if (!es)
 		return -ENOMEM;
 	rb_link_node(&es->rb_node, parent, p);
@@ -1361,8 +1366,6 @@  static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
 						    ext4_es_status(&orig_es));
 			err = __es_insert_extent(inode, &newes);
 			if (err) {
-				es->es_lblk = orig_es.es_lblk;
-				es->es_len = orig_es.es_len;
 				if ((err == -ENOMEM) &&
 				    __es_shrink(EXT4_SB(inode->i_sb),
 							128, EXT4_I(inode)))