[v2,24/24] ext4: enable block size larger than page size

Message ID 20251107144249.435029-25-libaokun@huaweicloud.com
State Superseded
Series ext4: enable block size larger than page size

Commit Message

Baokun Li Nov. 7, 2025, 2:42 p.m. UTC
From: Baokun Li <libaokun1@huawei.com>

Since block device (See commit 3c20917120ce ("block/bdev: enable large
folio support for large logical block sizes")) and page cache (See commit
ab95d23bab220ef8 ("filemap: allocate mapping_min_order folios in the page
cache")) both support a minimum order when allocating folios,
and ext4 has supported large folio in commit 7ac67301e82f ("ext4: enable
large folio for regular file"), now add support for block_size > PAGE_SIZE
in ext4.

set_blocksize() -> bdev_validate_blocksize() already validates the block
size, so ext4_load_super() does not need to perform additional checks.

Here we only need to add the FS_LBS bit to fs_flags.

In addition, allocation failures for large folios may trigger warn_alloc()
warnings. Therefore, as with XFS, mark this feature as experimental.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/super.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
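
The "minimum order" mentioned in the commit message can be illustrated with a small userspace sketch. It assumes the usual convention that an order-n folio spans 2^n contiguous pages, so a block larger than a page needs a folio order high enough to cover it. The helper name `min_folio_order` is hypothetical and is not a kernel function.

```c
#include <assert.h>

/*
 * Hypothetical helper (not kernel code): smallest folio order whose size
 * covers one filesystem block. Order 0 is a single page; order n is
 * 2^n contiguous pages.
 */
static unsigned int min_folio_order(unsigned long block_size,
				    unsigned long page_size)
{
	unsigned int order = 0;

	/* Both sizes are expected to be non-zero powers of two. */
	assert(block_size && (block_size & (block_size - 1)) == 0);
	assert(page_size && (page_size & (page_size - 1)) == 0);

	/* Double the folio size until it is at least one block. */
	while ((page_size << order) < block_size)
		order++;
	return order;
}
```

Under these assumptions a 64k block on a 4k-page system gives order 4 (16 pages per folio), while any block size at or below the page size gives order 0.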

Comments

Jan Kara Nov. 10, 2025, 10 a.m. UTC | #1
On Fri 07-11-25 22:42:49, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
> 
> Since block device (See commit 3c20917120ce ("block/bdev: enable large
> folio support for large logical block sizes")) and page cache (See commit
> ab95d23bab220ef8 ("filemap: allocate mapping_min_order folios in the page
> cache")) both support a minimum order when allocating folios,
> and ext4 has supported large folio in commit 7ac67301e82f ("ext4: enable
> large folio for regular file"), now add support for block_size > PAGE_SIZE
> in ext4.
> 
> set_blocksize() -> bdev_validate_blocksize() already validates the block
> size, so ext4_load_super() does not need to perform additional checks.
> 
> Here we only need to add the FS_LBS bit to fs_flags.
> 
> In addition, allocation failures for large folios may trigger warn_alloc()
> warnings. Therefore, as with XFS, mark this feature as experimental.
> 
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> Reviewed-by: Zhang Yi <yi.zhang@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/super.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 6735152dd219..1fbbae5a0426 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -5074,6 +5074,9 @@ static int ext4_check_large_folio(struct super_block *sb)
>  		return -EINVAL;
>  	}
>  
> +	if (sb->s_blocksize > PAGE_SIZE)
> +		ext4_msg(sb, KERN_NOTICE, "EXPERIMENTAL bs(%lu) > ps(%lu) enabled.",
> +			 sb->s_blocksize, PAGE_SIZE);
>  	return 0;
>  }
>  
> @@ -7453,7 +7456,8 @@ static struct file_system_type ext4_fs_type = {
>  	.init_fs_context	= ext4_init_fs_context,
>  	.parameters		= ext4_param_specs,
>  	.kill_sb		= ext4_kill_sb,
> -	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME,
> +	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME |
> +				  FS_LBS,
>  };
>  MODULE_ALIAS_FS("ext4");
>  
> -- 
> 2.46.1
>
Pankaj Raghav Nov. 10, 2025, 12:51 p.m. UTC | #2
On 11/7/25 15:42, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
> 
> Since block device (See commit 3c20917120ce ("block/bdev: enable large
> folio support for large logical block sizes")) and page cache (See commit
> ab95d23bab220ef8 ("filemap: allocate mapping_min_order folios in the page
> cache")) both support a minimum order when allocating folios,
> and ext4 has supported large folio in commit 7ac67301e82f ("ext4: enable
> large folio for regular file"), now add support for block_size > PAGE_SIZE
> in ext4.
> 
> set_blocksize() -> bdev_validate_blocksize() already validates the block
> size, so ext4_load_super() does not need to perform additional checks.
> 
> Here we only need to add the FS_LBS bit to fs_flags.
> 
> In addition, allocation failures for large folios may trigger warn_alloc()
> warnings. Therefore, as with XFS, mark this feature as experimental.
> 

Are you adding the experimental flag because allocation failures can occur with
LBS configuration or because it is a new feature (or both)?

In XFS we added this flag because this was a new feature and not because of the
allocation failure that might happen.

Is it even possible to get rid of these allocation failures in systems where the
memory is limited as the page cache works in > PAGE_SIZE allocations?

--
Pankaj
Theodore Tso Nov. 10, 2025, 3:16 p.m. UTC | #3
On Fri, Nov 07, 2025 at 10:42:49PM +0800, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
> 
> Since block device (See commit 3c20917120ce ("block/bdev: enable large
> folio support for large logical block sizes")) and page cache (See commit
> ab95d23bab220ef8 ("filemap: allocate mapping_min_order folios in the page
> cache")) both support a minimum order when allocating folios,
> and ext4 has supported large folio in commit 7ac67301e82f ("ext4: enable
> large folio for regular file"), now add support for block_size > PAGE_SIZE
> in ext4.
> 
> set_blocksize() -> bdev_validate_blocksize() already validates the block
> size, so ext4_load_super() does not need to perform additional checks.
> 
> Here we only need to add the FS_LBS bit to fs_flags.
> 
> In addition, allocation failures for large folios may trigger warn_alloc()
> warnings. Therefore, as with XFS, mark this feature as experimental.
> 
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> Reviewed-by: Zhang Yi <yi.zhang@huawei.com>

Could you add:

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
EXT4_ATTR_FEATURE(blocksize_gt_pagesize);
#endif

in fs/ext4/sysfs.c, so that userspace programs (like those in e2fsprogs
and xfstests) can test /sys/fs/ext4/features/... to determine whether
or not blocksize > pagesize is supported?  That way we can more easily
determine whether to test the 64k blocksize configurations in
xfstests, and so we can suppress the mke2fs warnings:

mke2fs: 65536-byte blocks too big for system (max 4096)
Proceed anyway? (y,N) y
Warning: 65536-byte blocks too big for system (max 4096), forced to continue

... if the feature flag file is present.

Thanks!!

	 	    	       	       - Ted
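
A userspace probe for such a feature file could look roughly like the sketch below. The path is built from the attribute name proposed in this message and is an assumption: the file only exists if a later version of the patch adds it under that name. The helper `feature_file_present` is hypothetical and is not taken from e2fsprogs or xfstests.

```c
#include <stdbool.h>
#include <unistd.h>

/*
 * Assumed path, derived from the attribute name proposed in this thread.
 * It is not guaranteed to exist on any given kernel.
 */
#define EXT4_LBS_FEATURE_PATH "/sys/fs/ext4/features/blocksize_gt_pagesize"

/* Hypothetical helper: report whether a feature file exists at 'path'. */
static bool feature_file_present(const char *path)
{
	/* F_OK only tests for existence, which is all a feature file conveys. */
	return access(path, F_OK) == 0;
}
```

A caller would pass `EXT4_LBS_FEATURE_PATH` and treat a false result as "block size > page size is not supported by this kernel".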
Theodore Tso Nov. 10, 2025, 3:23 p.m. UTC | #4
On Mon, Nov 10, 2025 at 01:51:28PM +0100, Pankaj Raghav wrote:
> 
> Are you adding the experimental flag because allocation failures can occur with
> LBS configuration or because it is a new feature (or both)?

I'm going to guess that it was added to mirror what XFS did.

I'll note that this is generally not the pattern for ext4, where we
tend to put these warnings in mke2fs/mkfs.ext4, and to not enable the
features by default.  We haven't historically put them in a warning printk
because I don't believe most users read dmesg output.  :-)

When we've wanted to put some kind of warning or disclaimer in the
kernel, my bias has been to add some kind of Kconfig feature flag,
say, "CONFIG_FS_LARGE_BLOCKSIZE" or "CONFIG_EXT4_LARGE_BLOCKSIZE"
which can either have a warning of its experimental nature in the
config description, or if it's *really* on the edge (not in this case,
in my opinion) by putting an explicit dependency on
CONFIG_EXPERIMENTAL.

I will admit that most users don't read the Kconfig help text, since
most users aren't even compiling their own kernels :-), but it does
allow for more description of why it might be considered
"experimental" for distribution engineers, and it's less disruptive
when we inevitably forget to remove the experimental warning.  :-)

That being said, this is a personal preference sort of thing, and
people of good will can disagree about what's the best way to approach
this sort of warning.

Cheers,

						- Ted

P.S.  I'm happy not having any kind of experimental warning for bs >
ps, since users would have to affirmatively request a 64k blocksize in
mkfs, and most users don't override the default when creating file
systems, so I assume that people who do so Know What They Are Doing.
Baokun Li Nov. 11, 2025, 3:31 a.m. UTC | #5
On 2025-11-10 20:51, Pankaj Raghav wrote:
> On 11/7/25 15:42, libaokun@huaweicloud.com wrote:
>> From: Baokun Li <libaokun1@huawei.com>
>>
>> Since block device (See commit 3c20917120ce ("block/bdev: enable large
>> folio support for large logical block sizes")) and page cache (See commit
>> ab95d23bab220ef8 ("filemap: allocate mapping_min_order folios in the page
>> cache")) both support a minimum order when allocating folios,
>> and ext4 has supported large folio in commit 7ac67301e82f ("ext4: enable
>> large folio for regular file"), now add support for block_size > PAGE_SIZE
>> in ext4.
>>
>> set_blocksize() -> bdev_validate_blocksize() already validates the block
>> size, so ext4_load_super() does not need to perform additional checks.
>>
>> Here we only need to add the FS_LBS bit to fs_flags.
>>
>> In addition, allocation failures for large folios may trigger warn_alloc()
>> warnings. Therefore, as with XFS, mark this feature as experimental.
>>
> Are you adding the experimental flag because allocation failures can occur with
> LBS configuration or because it is a new feature (or both)?
>
> In XFS we added this flag because this was a new feature and not because of the
> allocation failure that might happen.

Yeah, both. Large folios still have some problems (allocation failures,
fragmentation, memory overhead, etc.) to sort out, and LBS has to be
forced on.

> Is it even possible to get rid of these allocation failures in systems where the
> memory is limited as the page cache works in > PAGE_SIZE allocations?
>
> --
> Pankaj

The MM people are working in this direction, and how to avoid memory
allocation failures has also been a frequent topic of discussion recently.

I believe this issue will be resolved in the near future.


Regards,
Baokun
Baokun Li Nov. 11, 2025, 3:43 a.m. UTC | #6
On 2025-11-10 23:16, Theodore Ts'o wrote:
> On Fri, Nov 07, 2025 at 10:42:49PM +0800, libaokun@huaweicloud.com wrote:
>> From: Baokun Li <libaokun1@huawei.com>
>>
>> Since block device (See commit 3c20917120ce ("block/bdev: enable large
>> folio support for large logical block sizes")) and page cache (See commit
>> ab95d23bab220ef8 ("filemap: allocate mapping_min_order folios in the page
>> cache")) both support a minimum order when allocating folios,
>> and ext4 has supported large folio in commit 7ac67301e82f ("ext4: enable
>> large folio for regular file"), now add support for block_size > PAGE_SIZE
>> in ext4.
>>
>> set_blocksize() -> bdev_validate_blocksize() already validates the block
>> size, so ext4_load_super() does not need to perform additional checks.
>>
>> Here we only need to add the FS_LBS bit to fs_flags.
>>
>> In addition, allocation failures for large folios may trigger warn_alloc()
>> warnings. Therefore, as with XFS, mark this feature as experimental.
>>
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>> Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
> Could you add:
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> EXT4_ATTR_FEATURE(blocksize_gt_pagesize);
> #endif
>
> in fs/ext4/sysfs.c, so that userspace programs (like those in e2fsprogs
> and xfstests) can test /sys/fs/ext4/features/... to determine whether
> or not blocksize > pagesize is supported?  That way we can more easily
> determine whether to test the 64k blocksize configurations in
> xfstests, and so we can suppress the mke2fs warnings:
>
> mke2fs: 65536-byte blocks too big for system (max 4096)
> Proceed anyway? (y,N) y
> Warning: 65536-byte blocks too big for system (max 4096), forced to continue
>
> ... if the feature flag file is present.
>
Good idea, sure!

In my earlier tests I just dropped the warning in mke2fs. That’s a bit
clumsy though; adding an interface so mke2fs and the kernel can work
together is much nicer.

It also lets us do what was mentioned in another thread: warn in mke2fs
instead of in the kernel. I’ll take your suggestion in the next version
and drop the experimental tag.

Thank you for your suggestion!


Regards,
Baokun
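
The cooperation between mke2fs and the kernel described here might reduce to a decision like the following sketch. It is not actual mke2fs code: `needs_blocksize_prompt` is a hypothetical helper, and `kernel_has_lbs` stands for the result of checking the proposed feature file under /sys/fs/ext4/features/.

```c
#include <stdbool.h>

/*
 * Hypothetical policy helper (not from e2fsprogs): decide whether the
 * "blocks too big for system" prompt should still be shown.
 */
static bool needs_blocksize_prompt(unsigned long block_size,
				   unsigned long page_size,
				   bool kernel_has_lbs)
{
	/* Block sizes up to the page size have always been mountable. */
	if (block_size <= page_size)
		return false;

	/* Larger blocks are only a concern when the kernel lacks support. */
	return !kernel_has_lbs;
}
```

With this shape, a 64k block size on a 4k-page system prompts only when the kernel does not advertise the feature.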

Patch

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6735152dd219..1fbbae5a0426 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5074,6 +5074,9 @@  static int ext4_check_large_folio(struct super_block *sb)
 		return -EINVAL;
 	}
 
+	if (sb->s_blocksize > PAGE_SIZE)
+		ext4_msg(sb, KERN_NOTICE, "EXPERIMENTAL bs(%lu) > ps(%lu) enabled.",
+			 sb->s_blocksize, PAGE_SIZE);
 	return 0;
 }
 
@@ -7453,7 +7456,8 @@  static struct file_system_type ext4_fs_type = {
 	.init_fs_context	= ext4_init_fs_context,
 	.parameters		= ext4_param_specs,
 	.kill_sb		= ext4_kill_sb,
-	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME,
+	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME |
+				  FS_LBS,
 };
 MODULE_ALIAS_FS("ext4");