diff mbox series

ext4: Fix calculation of credits for extent tree modification

Message ID 20250429175535.23125-2-jack@suse.cz
State Awaiting Upstream
Headers show
Series ext4: Fix calculation of credits for extent tree modification | expand

Commit Message

Jan Kara April 29, 2025, 5:55 p.m. UTC
Luis and David are reporting that after running generic/750 test for 90+
hours on 2k ext4 filesystem, they are able to trigger a warning in
jbd2_journal_dirty_metadata() complaining that there are not enough
credits in the running transaction started in ext4_do_writepages().

Indeed the code in ext4_do_writepages() is racy and the extent tree can
change between the time we compute credits necessary for extent tree
computation and the time we actually modify the extent tree. Thus it may
happen that the number of credits actually needed is higher. Modify
ext4_ext_index_trans_blocks() to count with the worst case of maximum
tree depth. This can reduce the possible number of writers that can
operate in the system in parallel (because the credit estimates now won't
fit in one transaction) but for reasonably sized journals this shouldn't
really be an issue. So just go with a safe and simple fix.

Link: https://lore.kernel.org/all/20250415013641.f2ppw6wov4kn4wq2@offworld
Reported-by: Davidlohr Bueso <dave@stgolabs.net>
Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: kdevops@lists.linux.dev
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/extents.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

Comments

Zhang Yi May 7, 2025, 11:57 a.m. UTC | #1
Hi, Jan!

On 2025/4/30 1:55, Jan Kara wrote:
> Luis and David are reporting that after running generic/750 test for 90+
> hours on 2k ext4 filesystem, they are able to trigger a warning in
> jbd2_journal_dirty_metadata() complaining that there are not enough
> credits in the running transaction started in ext4_do_writepages().
> 
> Indeed the code in ext4_do_writepages() is racy and the extent tree can
> change between the time we compute credits necessary for extent tree
> computation and the time we actually modify the extent tree. Thus it may
> happen that the number of credits actually needed is higher. Modify
> ext4_ext_index_trans_blocks() to count with the worst case of maximum
> tree depth. This can reduce the possible number of writers that can
> operate in the system in parallel (because the credit estimates now won't
> fit in one transaction) but for reasonably sized journals this shouldn't
> really be an issue. So just go with a safe and simple fix.
> 
> Link: https://lore.kernel.org/all/20250415013641.f2ppw6wov4kn4wq2@offworld
> Reported-by: Davidlohr Bueso <dave@stgolabs.net>
> Reported-by: Luis Chamberlain <mcgrof@kernel.org>
> Tested-by: kdevops@lists.linux.dev
> Signed-off-by: Jan Kara <jack@suse.cz>

This overall looks good to me now. However, the credit calculation in
ext4_ext_index_trans_blocks() seems still appears to be incorrect
because it does not include the leaf extent blocks. I discovered this
problem while attempting to enable large folios for ext4. It can easily
trigger problems when writing back a 2MB folio with a 1K block size,
and each block is discontinuous.

  https://lore.kernel.org/linux-ext4/20241125114419.903270-7-yi.zhang@huaweicloud.com/

Fortunately, this problem can only triggered after we support large
folio.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>

> ---
>  fs/ext4/extents.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index c616a16a9f36..43286632e650 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2396,18 +2396,19 @@ int ext4_ext_calc_credits_for_single_extent(struct inode *inode, int nrblocks,
>  int ext4_ext_index_trans_blocks(struct inode *inode, int extents)
>  {
>  	int index;
> -	int depth;
>  
>  	/* If we are converting the inline data, only one is needed here. */
>  	if (ext4_has_inline_data(inode))
>  		return 1;
>  
> -	depth = ext_depth(inode);
> -
> +	/*
> +	 * Extent tree can change between the time we estimate credits and
> +	 * the time we actually modify the tree. Assume the worst case.
> +	 */
>  	if (extents <= 1)
> -		index = depth * 2;
> +		index = EXT4_MAX_EXTENT_DEPTH * 2;
>  	else
> -		index = depth * 3;
> +		index = EXT4_MAX_EXTENT_DEPTH * 3;
>  
>  	return index;
>  }
diff mbox series

Patch

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index c616a16a9f36..43286632e650 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2396,18 +2396,19 @@  int ext4_ext_calc_credits_for_single_extent(struct inode *inode, int nrblocks,
 int ext4_ext_index_trans_blocks(struct inode *inode, int extents)
 {
 	int index;
-	int depth;
 
 	/* If we are converting the inline data, only one is needed here. */
 	if (ext4_has_inline_data(inode))
 		return 1;
 
-	depth = ext_depth(inode);
-
+	/*
+	 * Extent tree can change between the time we estimate credits and
+	 * the time we actually modify the tree. Assume the worst case.
+	 */
 	if (extents <= 1)
-		index = depth * 2;
+		index = EXT4_MAX_EXTENT_DEPTH * 2;
 	else
-		index = depth * 3;
+		index = EXT4_MAX_EXTENT_DEPTH * 3;
 
 	return index;
 }