[2/2] ext4: avoid resizing to a partial cluster size

Message ID 9CDF7393-5645-4E8A-9D68-01CF7F4C4955@amazon.com
State Not Applicable
Series [1/2] ext4: reduce computation of overhead during resize

Commit Message

Kiselev, Oleg June 30, 2022, 2:17 a.m. UTC
This patch avoids an attempt to resize the filesystem to an
unaligned cluster boundary.  An online resize to a size that is not
an integral multiple of the cluster size results in the last iteration
attempting to grow the fs by a negative amount, which trips a BUG_ON
and leaves the fs with a corrupted in-memory superblock.

Signed-off-by: Oleg Kiselev <okiselev@amazon.com>
---
 fs/ext4/resize.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--
2.32.0

Comments

Jan Kara July 14, 2022, 1:52 p.m. UTC | #1
On Thu 30-06-22 02:17:22, Kiselev, Oleg wrote:
> This patch avoids an attempt to resize the filesystem to an
> unaligned cluster boundary.  An online resize to a size that is not
> integral to cluster size results in the last iteration attempting to
> grow the fs by a negative amount, which trips a BUG_ON and leaves the fs
> with a corrupted in-memory superblock.
> 
> Signed-off-by: Oleg Kiselev <okiselev@amazon.com>
> ---
...

> @@ -1624,7 +1624,8 @@ static int ext4_setup_next_flex_gd(struct super_block *sb,
> 
>  	o_blocks_count = ext4_blocks_count(es);
> 
> -	if (o_blocks_count == n_blocks_count)
> +	if ((o_blocks_count == n_blocks_count) ||
> +	    ((n_blocks_count - o_blocks_count) < sbi->s_cluster_ratio))
>  		return 0;

So why do you silently do nothing with unaligned size? I'd expect we should
catch this condition already in ext4_resize_fs() and return EINVAL in that
case...

Also this code does something else than what the commit log says. You
actually check whether there are less than one cluster worth of blocks
instead of checking whether n_blocks_count is properly aligned. Why is that
enough?

								Honza
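
To make the distinction raised above concrete, here is a minimal standalone sketch of the two predicates: the "less than one cluster of growth" test that the patch adds, and the "target size is cluster-aligned" test that the commit log describes. The type fsblk_t and both helper names are hypothetical stand-ins, not kernel code. The sketch assumes the cluster ratio is a power of two and that clusters start at block 0, and it does not model the surrounding resize loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an unsigned 64-bit filesystem block count. */
typedef uint64_t fsblk_t;

/*
 * The condition the patch adds: the remaining growth is smaller than one
 * cluster. The subtraction is unsigned, so callers must ensure
 * n_blocks >= o_blocks.
 */
static bool growth_below_one_cluster(fsblk_t o_blocks, fsblk_t n_blocks,
				     uint32_t cluster_ratio)
{
	return (n_blocks - o_blocks) < cluster_ratio;
}

/*
 * The condition the commit log describes: the requested size is a whole
 * number of clusters. Assumes cluster_ratio is a power of two.
 */
static bool size_is_cluster_aligned(fsblk_t n_blocks, uint32_t cluster_ratio)
{
	assert(cluster_ratio != 0 && (cluster_ratio & (cluster_ratio - 1)) == 0);
	return (n_blocks & (cluster_ratio - 1)) == 0;
}
```

With a ratio of 16 and an old size of 1024 blocks, a target of 1030 satisfies the first predicate and fails the second. A target of 1064 fails both, so in isolation the two tests are not equivalent: the first only fires once less than a cluster of growth remains.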
Kiselev, Oleg July 15, 2022, 1 a.m. UTC | #2
Thanks for the review, Jan

> On Jul 14, 2022, at 6:52 AM, Jan Kara <jack@suse.cz> wrote:
> 
> On Thu 30-06-22 02:17:22, Kiselev, Oleg wrote:
>> This patch avoids an attempt to resize the filesystem to an
>> unaligned cluster boundary.  An online resize to a size that is not
>> integral to cluster size results in the last iteration attempting to
>> grow the fs by a negative amount, which trips a BUG_ON and leaves the fs
>> with a corrupted in-memory superblock.
>> 
>> Signed-off-by: Oleg Kiselev <okiselev@amazon.com>
>> ---
> ...
> 
>> @@ -1624,7 +1624,8 @@ static int ext4_setup_next_flex_gd(struct super_block *sb,
>> 
>>      o_blocks_count = ext4_blocks_count(es);
>> 
>> -     if (o_blocks_count == n_blocks_count)
>> +     if ((o_blocks_count == n_blocks_count) ||
>> +         ((n_blocks_count - o_blocks_count) < sbi->s_cluster_ratio))
>>              return 0;
> 
> So why do you silently do nothing with unaligned size? I'd expect we should
> catch this condition already in ext4_resize_fs() and return EINVAL in that
> case...

Failing a resize with an error will be an unexpected behavior that will
break software that calls resize2fs without specifying the size.  We ran
into this issue because we make our filesystems on top of DRBD devices,
and DRBD aligns its metadata on 4K boundaries.  This results in space
available for the filesystem having an “odd” size.  Our preference is for
the utilities to silently fix the fs size down to the nearest “safe” size
rather than get sporadic errors.  I had submitted a patch for resize2fs
that rounds the fs target size down to the nearest cluster boundary.  In
principle it’s similar to the size-rounding that is done now for 4K
blocks.  Using updated e2fsprogs isn’t mandatory for using ext4 in the
newer kernels, so making the kernel safe(r) for bigalloc resizes seems
like a good idea.

> Also this code does something else than what the commit log says. You
> actually check whether there are less than one cluster worth of blocks
> instead of checking whether n_blocks_count is properly aligned. Why is that
> enough?

That’s a good point.  I put the fix as close as possible to the place in
the code where this misalignment causes a problem, but it would be better
to put a size alignment check in ext4_resize_fs() and trim the request
there instead.  I will make that change and resubmit the patch.

> 
>                                                                Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
Jan Kara July 15, 2022, 9:35 a.m. UTC | #3
On Fri 15-07-22 01:00:01, Kiselev, Oleg wrote:
> >> @@ -1624,7 +1624,8 @@ static int ext4_setup_next_flex_gd(struct super_block *sb,
> >> 
> >>      o_blocks_count = ext4_blocks_count(es);
> >> 
> >> -     if (o_blocks_count == n_blocks_count)
> >> +     if ((o_blocks_count == n_blocks_count) ||
> >> +         ((n_blocks_count - o_blocks_count) < sbi->s_cluster_ratio))
> >>              return 0;
> > 
> > So why do you silently do nothing with unaligned size? I'd expect we should
> > catch this condition already in ext4_resize_fs() and return EINVAL in that
> > case...
> 
> Failing a resize with an error will be an unexpected behavior that will
> break software that calls resize2fs without specifying the size.  We ran
> into this issue because we make our filesystems on top of DRBD devices,
> and DRBD aligns its metadata on 4K boundaries.  This results in space
> available for the filesystem having an “odd” size.  Our preference is for
> the utilities to silently fix the fs size down to the nearest “safe” size
> rather than get sporadic errors.   I had submitted a patch for resize2fs
> that rounds the fs target size down to the nearest cluster boundary.  In
> principle it’s similar to the size-rounding that is done now for 4K
> blocks.   Using updated e2fsprogs isn’t mandatory for using ext4 in the
> newer kernels, so making the kernel safe(r) for bigalloc resizes seems
> like a good idea.

I see. Honestly, doing automatic "fixups" of passed arguments to syscalls /
ioctls has bitten us more than once in the past. That's why I'm cautious
about that. It seems convenient initially but then when constraints change
(e.g. you'd want to be rounding to a different number) you suddenly find
you have no way to extend the API without breaking some userspace. That's
why I prefer to put these "rounding convenience" functions into userspace.

That being said I don't feel too strongly about this particular case so I
guess I'll defer the final decision about the policy to Ted.

								Honza
Theodore Ts'o July 15, 2022, 11:48 a.m. UTC | #4
On Fri, Jul 15, 2022 at 11:35:18AM +0200, Jan Kara wrote:
> > available for the filesystem having an “odd” size.  Our preference is for
> > the utilities to silently fix the fs size down to the nearest “safe” size
> > rather than get sporadic errors.   I had submitted a patch for resize2fs
> > that rounds the fs target size down to the nearest cluster boundary.  In
> > principle it’s similar to the size-rounding that is done now for 4K
> > blocks.   Using updated e2fsprogs isn’t mandatory for using ext4 in the
> > newer kernels, so making the kernel safe(r) for bigalloc resizes seems
> > like a good idea.
> 
> I see. Honestly, doing automatic "fixups" of passed arguments to syscalls /
> ioctls has bitten us more than once in the past. That's why I'm cautious
> about that. It seems convenient initially but then when constraints change
> (e.g. you'd want to be rounding to a different number) you suddenly find
> you have no way to extend the API without breaking some userspace. That's
> why I prefer to put these "rounding convenience" functions into userspace.
> 
> That being said I don't feel too strongly about this particular case so I
> guess I'll defer the final decision about the policy to Ted.

In this particular case, a file system whose size is not a multiple of
cluster size is never going to be valid, so having the resize ioctl
round down the requested size to largest valid size seems to be a safe
(and useful) thing to do.

						- Ted
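
The rounding discussed above can be sketched as a small standalone helper. This is an illustration only, not the code that was eventually merged: fsblk_t and round_down_to_cluster() are hypothetical names, and the sketch assumes the cluster ratio is a power of two and that clusters start at block 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an unsigned 64-bit filesystem block count. */
typedef uint64_t fsblk_t;

/*
 * Round a requested block count down to the largest size that is a whole
 * number of clusters. Assumes cluster_ratio is a power of two, so the
 * rounding reduces to clearing the low bits.
 */
static fsblk_t round_down_to_cluster(fsblk_t n_blocks, uint32_t cluster_ratio)
{
	assert(cluster_ratio != 0 && (cluster_ratio & (cluster_ratio - 1)) == 0);
	return n_blocks & ~((fsblk_t)cluster_ratio - 1);
}
```

With a ratio of 16, a request for 1030 blocks is trimmed to 1024 and a request for 1064 is trimmed to 1056, while an already aligned request such as 1040 is left unchanged. A ratio of 1 (no bigalloc) leaves every request as it is.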

Patch

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 2acc9fca99ea..8803905907de 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1624,7 +1624,8 @@  static int ext4_setup_next_flex_gd(struct super_block *sb,

 	o_blocks_count = ext4_blocks_count(es);

-	if (o_blocks_count == n_blocks_count)
+	if ((o_blocks_count == n_blocks_count) ||
+	    ((n_blocks_count - o_blocks_count) < sbi->s_cluster_ratio))
 		return 0;

 	ext4_get_group_no_and_offset(sb, o_blocks_count, &group, &last);