diff mbox series

[12/13] ext4: remove unnecessary check for avoiding multiple update_backups in ext4_flex_group_add

Message ID 20230629120044.1261968-13-shikemeng@huaweicloud.com
State Superseded
Headers show
Series fixes and cleanups to ext4 resize | expand

Commit Message

Kemeng Shi June 29, 2023, noon UTC
Commit 0acdb8876fead ("ext4: don't call update_backups() multiple times
for the same bg") add check in ext4_flex_group_add to avoid call
update_backups multiple times for block group descriptors in the same
group descriptor block. However, we have already call update_backups in
block step, so the added check is unnecessary.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/resize.c | 4 ----
 1 file changed, 4 deletions(-)

Comments

Theodore Ts'o Aug. 16, 2023, 3:47 a.m. UTC | #1
On Thu, Jun 29, 2023 at 08:00:43PM +0800, Kemeng Shi wrote:
> Commit 0acdb8876fead ("ext4: don't call update_backups() multiple times
> for the same bg") add check in ext4_flex_group_add to avoid call
> update_backups multiple times for block group descriptors in the same
> group descriptor block. However, we have already call update_backups in
> block step, so the added check is unnecessary.

I'm having trouble understaind this comit.  What do you mean by the
"block step" in the last paragraph?

					- Ted
Kemeng Shi Aug. 17, 2023, 3:53 a.m. UTC | #2
on 8/16/2023 11:47 AM, Theodore Ts'o wrote:
> On Thu, Jun 29, 2023 at 08:00:43PM +0800, Kemeng Shi wrote:
>> Commit 0acdb8876fead ("ext4: don't call update_backups() multiple times
>> for the same bg") add check in ext4_flex_group_add to avoid call
>> update_backups multiple times for block group descriptors in the same
>> group descriptor block. However, we have already call update_backups in
>> block step, so the added check is unnecessary.
> 
> I'm having trouble understaind this comit.  What do you mean by the
> "block step" in the last paragraph?
> 
Sorry for the confusing stuff. I mean we foreach in group descriptor block
step instead of foreach in group descriptor step to update backup.
So there is no chance to call update_backups for different descriptors
in the same bg.

> 					- Ted
>
Theodore Ts'o Aug. 17, 2023, 2:11 p.m. UTC | #3
On Thu, Aug 17, 2023 at 11:53:52AM +0800, Kemeng Shi wrote:
> 
> 
> on 8/16/2023 11:47 AM, Theodore Ts'o wrote:
> > On Thu, Jun 29, 2023 at 08:00:43PM +0800, Kemeng Shi wrote:
> >> Commit 0acdb8876fead ("ext4: don't call update_backups() multiple times
> >> for the same bg") add check in ext4_flex_group_add to avoid call
> >> update_backups multiple times for block group descriptors in the same
> >> group descriptor block. However, we have already call update_backups in
> >> block step, so the added check is unnecessary.
> > 
> > I'm having trouble understaind this comit.  What do you mean by the
> > "block step" in the last paragraph?
> > 
> Sorry for the confusing stuff. I mean we foreach in group descriptor block
> step instead of foreach in group descriptor step to update backup.
> So there is no chance to call update_backups for different descriptors
> in the same bg.

I'm still confused.  This commit is not so much removing an
unnecessary check as much as forcing update_backups() to be called for
every single block group.  Right?

So either we're doing extra work, or (b) we're fixing a problem
because we actually *need* to call update_backups() for every single
block group.

More generally, this whole patch series is making it clear that the
online resize code is hard to understand, because it's super
complicated, leading to potential bugs, and very clearly code which is
very hard to maintain.  So this may mean we need better comments to
make it clear *when* the backup block groups are going to be
accomplished, given the various different cases (e.g., no flex_bg or
meta_bg, with flex_bg, or with meat_bg).

							- Ted
Kemeng Shi Aug. 18, 2023, 3:16 a.m. UTC | #4
on 8/17/2023 10:11 PM, Theodore Ts'o wrote:
> On Thu, Aug 17, 2023 at 11:53:52AM +0800, Kemeng Shi wrote:
>>
>>
>> on 8/16/2023 11:47 AM, Theodore Ts'o wrote:
>>> On Thu, Jun 29, 2023 at 08:00:43PM +0800, Kemeng Shi wrote:
>>>> Commit 0acdb8876fead ("ext4: don't call update_backups() multiple times
>>>> for the same bg") add check in ext4_flex_group_add to avoid call
>>>> update_backups multiple times for block group descriptors in the same
>>>> group descriptor block. However, we have already call update_backups in
>>>> block step, so the added check is unnecessary.
>>>
>>> I'm having trouble understaind this comit.  What do you mean by the
>>> "block step" in the last paragraph?
>>>
>> Sorry for the confusing stuff. I mean we foreach in group descriptor block
>> step instead of foreach in group descriptor step to update backup.
>> So there is no chance to call update_backups for different descriptors
>> in the same bg.
> 
> I'm still confused.  This commit is not so much removing an
> unnecessary check as much as forcing update_backups() to be called for
> every single block group.  Right?
> 
Ah, I guess here is the thing I missed that make this confusing:
sbi->s_group_desc contains only primary block of each group. i.e.
sbi->s_group_desc['i'] is the primary gdb block of group 'i'. It's clear
that primary gdb blocks of different groups will not be the same. Then
all blocks in s_group_desc[*] should be different and the removed check
is unnecessary.
From add_new_gdb and add_new_gdb_meta_bg, we can find that we always
add primary gdb block of new group to s_group_desc. To be more specific:
add_new_gdb
	gdblock = EXT4_SB(sb)->s_sbh->b_blocknr + 1 + gdb_num;
	gdb_bh = ext4_sb_bread(sb, gdblock, 0);
	n_group_desc[gdb_num] = gdb_bh;

add_new_gdb_meta_bg
	gdblock = ext4_meta_bg_first_block_no(sb, group) +
		  ext4_bg_has_super(sb, group);
	gdb_bh = ext4_sb_bread(sb, gdblock, 0);
	n_group_desc[gdb_num] = gdb_bh;
	
> So either we're doing extra work, or (b) we're fixing a problem
> because we actually *need* to call update_backups() for every single
> block group.
> 
> More generally, this whole patch series is making it clear that the
> online resize code is hard to understand, because it's super
> complicated, leading to potential bugs, and very clearly code which is
> very hard to maintain.  So this may mean we need better comments to
> make it clear *when* the backup block groups are going to be
> accomplished, given the various different cases (e.g., no flex_bg or
> meta_bg, with flex_bg, or with meat_bg).
> 
Yes, I agree with this. I wonder if a series to add comments of some
common rules is good to you. Like the information mentioned above
that s_group_desc contains primary gdb block of each group.

> 							- Ted
>
Theodore Ts'o Aug. 18, 2023, 5 a.m. UTC | #5
On Fri, Aug 18, 2023 at 11:16:35AM +0800, Kemeng Shi wrote:
> Ah, I guess here is the thing I missed that make this confusing:
> sbi->s_group_desc contains only primary block of each group. i.e.
> sbi->s_group_desc['i'] is the primary gdb block of group 'i'.

Correct.  In fact, when we need to modify a block group descriptor for
a group, we call ext4_get_group_desc(), and it references
sbi->s_group_desc to find the appropriate buffer_head for the bg
descriptor that we want to modify.

I'm not sure "only" is the right adjective to use here, since the
whole *point* of s_group_desc[] is to keep the buffer_heads for the
block group descriptor blocks in memory, so we can modify them when we
allocate or free blocks, inodes, etc.  And we only modify the primary
block group descriptors.

The secondary, or backup block group descriptors are only by used
e2fsck when the primary block group descriptor has been overwritten,
so we can find the inode table, allocation bitmaps, and so on.  So we
do *not* modify them in the course of normal operations, and that's by
design.  The only time the kernel will update those block group
descriptors is when we are doing an online resize, and we need make
sure the backup descriptors are updated, so that if the primary
descriptors get completely smashed, we can still get critical
information such as the location of that block group's inode table.

> From add_new_gdb and add_new_gdb_meta_bg, we can find that we always
> add primary gdb block of new group to s_group_desc. To be more specific:
> add_new_gdb
> 	gdblock = EXT4_SB(sb)->s_sbh->b_blocknr + 1 + gdb_num;
> 	gdb_bh = ext4_sb_bread(sb, gdblock, 0);
> 	n_group_desc[gdb_num] = gdb_bh;
> 
> add_new_gdb_meta_bg
> 	gdblock = ext4_meta_bg_first_block_no(sb, group) +
> 		  ext4_bg_has_super(sb, group);
> 	gdb_bh = ext4_sb_bread(sb, gdblock, 0);
> 	n_group_desc[gdb_num] = gdb_bh;

Put another way, there are EXT4_DESC_PER_BLOCK(sb) bg descriptors in a
block.  For a file system with the 64-bit feature enabled, the size of
the block group descriptor is 128.  If the block size is 4096, then we
can fit 32 block group descriptors in a block.

When we add a new block group such that its block group descriptor
will spill into a new block, then we need to expand s_group_desc[]
array, and initialize the new block group descriptor block.  And
that's the job of add_new_gdb() and add_new_gdb_meta_bg().

> > More generally, this whole patch series is making it clear that the
> > online resize code is hard to understand, because it's super
> > complicated, leading to potential bugs, and very clearly code which is
> > very hard to maintain.  So this may mean we need better comments to
> > make it clear *when* the backup block groups are going to be
> > accomplished, given the various different cases (e.g., no flex_bg or
> > meta_bg, with flex_bg, or with meat_bg).
> > 
> Yes, I agree with this. I wonder if a series to add comments of some
> common rules is good to you. Like the information mentioned above
> that s_group_desc contains primary gdb block of each group.

Well, the meaning of s_group_desc[] was obvious to me, but that's why
it's useful to have somone with "new eyes" take a look at code, since
what may be obvious to old hands might not be obvious to someone
looking at the code for the first time --- and so yes, it's probably
worth documenting.  The question is where is the best place to put it,
since the primary place where s_group_desc[] is *not* online resize.

s_group_desc[] is initialized in ext4_group_desc_init() in
fs/ext4/super.c, and it is used in fs/ext4/balloc.c, and of course, it
is defined in fs/ext4.h.  

						- Ted
Kemeng Shi Aug. 18, 2023, 8:42 a.m. UTC | #6
on 8/18/2023 1:00 PM, Theodore Ts'o wrote:
> On Fri, Aug 18, 2023 at 11:16:35AM +0800, Kemeng Shi wrote:
>> Ah, I guess here is the thing I missed that make this confusing:
>> sbi->s_group_desc contains only primary block of each group. i.e.
>> sbi->s_group_desc['i'] is the primary gdb block of group 'i'.
> 
> Correct.  In fact, when we need to modify a block group descriptor for
> a group, we call ext4_get_group_desc(), and it references
> sbi->s_group_desc to find the appropriate buffer_head for the bg
> descriptor that we want to modify.
> 
> I'm not sure "only" is the right adjective to use here, since the
> whole *point* of s_group_desc[] is to keep the buffer_heads for the
> block group descriptor blocks in memory, so we can modify them when we
> allocate or free blocks, inodes, etc.  And we only modify the primary
> block group descriptors.
> 
> The secondary, or backup block group descriptors are only by used
> e2fsck when the primary block group descriptor has been overwritten,
> so we can find the inode table, allocation bitmaps, and so on.  So we
> do *not* modify them in the course of normal operations, and that's by
> design.  The only time the kernel will update those block group
> descriptors is when we are doing an online resize, and we need make
> sure the backup descriptors are updated, so that if the primary
> descriptors get completely smashed, we can still get critical
> information such as the location of that block group's inode table.
> >> From add_new_gdb and add_new_gdb_meta_bg, we can find that we always
>> add primary gdb block of new group to s_group_desc. To be more specific:
>> add_new_gdb
>> 	gdblock = EXT4_SB(sb)->s_sbh->b_blocknr + 1 + gdb_num;
>> 	gdb_bh = ext4_sb_bread(sb, gdblock, 0);
>> 	n_group_desc[gdb_num] = gdb_bh;
>>
>> add_new_gdb_meta_bg
>> 	gdblock = ext4_meta_bg_first_block_no(sb, group) +
>> 		  ext4_bg_has_super(sb, group);
>> 	gdb_bh = ext4_sb_bread(sb, gdblock, 0);
>> 	n_group_desc[gdb_num] = gdb_bh;
> 
> Put another way, there are EXT4_DESC_PER_BLOCK(sb) bg descriptors in a
> block.  For a file system with the 64-bit feature enabled, the size of
> the block group descriptor is 128.  If the block size is 4096, then we
> can fit 32 block group descriptors in a block.
> 
> When we add a new block group such that its block group descriptor
> will spill into a new block, then we need to expand s_group_desc[]
> array, and initialize the new block group descriptor block.  And
> that's the job of add_new_gdb() and add_new_gdb_meta_bg().
> 
Hi Ted, thanks for the explain. Here are more updates from what I found:
I find that descriptor_loc show layout of gdb blocks in s_group_desc[]
which is: block of s_group_desc[i] will be superblock + i + 1 for
non-meta_bg and 'first block of meta_bg' + has_super for meta_bg.
Although descriptor_loc is called to initialize s_group_desc[], the
expanded gdb block to s_group_desc from add_new_gdb obeys the same
layout.
Back to the original purpose of this patch which is to remove
unnecessary of equality check of s_group_desc[gdb_num - 1].b_blocknr and
s_group_desc[gdb_num].b_blocknr, we can see each s_group_desc has it's
own block from layout above and the check should be unnecessary.
But no insistant on this if you still have concern about it.
>>> More generally, this whole patch series is making it clear that the
>>> online resize code is hard to understand, because it's super
>>> complicated, leading to potential bugs, and very clearly code which is
>>> very hard to maintain.  So this may mean we need better comments to
>>> make it clear *when* the backup block groups are going to be
>>> accomplished, given the various different cases (e.g., no flex_bg or
>>> meta_bg, with flex_bg, or with meat_bg).
>>>
>> Yes, I agree with this. I wonder if a series to add comments of some
>> common rules is good to you. Like the information mentioned above
>> that s_group_desc contains primary gdb block of each group.
> 
> Well, the meaning of s_group_desc[] was obvious to me, but that's why
> it's useful to have somone with "new eyes" take a look at code, since
> what may be obvious to old hands might not be obvious to someone
> looking at the code for the first time --- and so yes, it's probably
Yes. this is just for anyone starting to read this code.
> worth documenting.  The question is where is the best place to put it,
> since the primary place where s_group_desc[] is *not* online resize.
> 
> s_group_desc[] is initialized in ext4_group_desc_init() in
> fs/ext4/super.c, and it is used in fs/ext4/balloc.c, and of course, it
> is defined in fs/ext4.h.  
I plan to add comment in fs/ext4.h as following:
struct ext4_sb_info {
	...
	struct buffer_head * __rcu *s_group_desc; /* Primary gdb blocks of online groups */
But I'm not sure it's proper now as you menthioned s_group_desc[] is to
keep the buffer_heads for the block group descriptor blocks in memory
and it contains primary gdb block is a coincidence that we only modify
primary block in kernel.

Besides, I plan to go through the resize code again in fulture and
add some comments to make it easy for anyone starting read this
or make it easy to maintain. Please let me if you disklike it.
> 
> 						- Ted
>
Theodore Ts'o Aug. 18, 2023, 4 p.m. UTC | #7
On Fri, Aug 18, 2023 at 04:42:31PM +0800, Kemeng Shi wrote:
> > s_group_desc[] is initialized in ext4_group_desc_init() in
> > fs/ext4/super.c, and it is used in fs/ext4/balloc.c, and of course, it
> > is defined in fs/ext4.h.  
> I plan to add comment in fs/ext4.h as following:
> struct ext4_sb_info {
> 	...
> 	struct buffer_head * __rcu *s_group_desc; /* Primary gdb blocks of online groups */
> But I'm not sure it's proper now as you menthioned s_group_desc[] is to
> keep the buffer_heads for the block group descriptor blocks in memory
> and it contains primary gdb block is a coincidence that we only modify
> primary block in kernel.

In general, the terminology that ext4 developers have used is "block
group descriptors" and "backup block group descriptors".  The kernel
never *uses* the backup block group users; and with the sparse_super2
feature, the "backup superblocks" and "backup block group descriptors"
are optional.

They are used by e2fsck if we need to recover a trashed superblock and
block group descriptors, which is why code that is resizing the file
system, or updating the label or the UUID need to update the backup
superblocks and/or backup block group descriptors so that we can
better recover disaster.

So I'd just suggest changing the comment above to "array of bh's for
the block group descriptors".

Cheers,

							- Ted

> Besides, I plan to go through the resize code again in fulture and
> add some comments to make it easy for anyone starting read this
> or make it easy to maintain. Please let me if you disklike it.

P.S.

BTW, a useful test program to add is one that checks to make sure that
the "static" parts of the superblock and block group descriptors
(i.e., the parts that don't get changed under normal operation while
the file system is mounted when the kernel *isn't* trying to do a
resize or change the label, UUID, or in the future, the new ioctl's to
update the parts of the superblock that can get modified by tune2fs),
and to make sure that all of the backup superblock and block group
descriptors have gotten updated.

Some of the bugs that you found may have resulted in some of the
backup bg descriptors not getting updataed, which we wouldn't
necessarily notice unless we had a test program that explicitly
checked for them.

And truth to tell, the only backup superblock and block group
descriptor that actually gets used to recover the file system is the
first one, since that's the one e2fsck will fall back to
automatically.  An expert might try to use one of the other backup
block groups as a desperate measure, and there might be some automated
programs that might be smart enough to use the backup block groups
when trying to recover the location of the partition table when the
file system and partition table is very badly damaged --- so that's
one of the reasons why with sparse_super2, the number of backup block
group descriptors can be limited to (for example) one located in the
first block group, and one located in the very last block group.
Kemeng Shi Aug. 22, 2023, 2:21 a.m. UTC | #8
on 8/19/2023 12:00 AM, Theodore Ts'o wrote:
> On Fri, Aug 18, 2023 at 04:42:31PM +0800, Kemeng Shi wrote:
>>> s_group_desc[] is initialized in ext4_group_desc_init() in
>>> fs/ext4/super.c, and it is used in fs/ext4/balloc.c, and of course, it
>>> is defined in fs/ext4.h.  
>> I plan to add comment in fs/ext4.h as following:
>> struct ext4_sb_info {
>> 	...
>> 	struct buffer_head * __rcu *s_group_desc; /* Primary gdb blocks of online groups */
>> But I'm not sure it's proper now as you menthioned s_group_desc[] is to
>> keep the buffer_heads for the block group descriptor blocks in memory
>> and it contains primary gdb block is a coincidence that we only modify
>> primary block in kernel.
> 
> In general, the terminology that ext4 developers have used is "block
> group descriptors" and "backup block group descriptors".  The kernel
> never *uses* the backup block group users; and with the sparse_super2
> feature, the "backup superblocks" and "backup block group descriptors"
> are optional.
> 
Sure.
> They are used by e2fsck if we need to recover a trashed superblock and
> block group descriptors, which is why code that is resizing the file
> system, or updating the label or the UUID need to update the backup
> superblocks and/or backup block group descriptors so that we can
> better recover disaster.
OK, kernel updates backup blocks and userspace tool recovers from backup
in case that blocks used by kernel is corrupted.
> 
> So I'd just suggest changing the comment above to "array of bh's for
> the block group descriptors".
> 
Sure, I will do this in next version.

Besides, what about the check this patch tries to remove. Would you
prefer to keep it or remove it with better git description. Both
ways are acceptable to me. Thanks.

> Cheers,
> 
> 							- Ted
> 
>> Besides, I plan to go through the resize code again in fulture and
>> add some comments to make it easy for anyone starting read this
>> or make it easy to maintain. Please let me if you disklike it.
> 
> P.S.
> 
> BTW, a useful test program to add is one that checks to make sure that
> the "static" parts of the superblock and block group descriptors
> (i.e., the parts that don't get changed under normal operation while
> the file system is mounted when the kernel *isn't* trying to do a
> resize or change the label, UUID, or in the future, the new ioctl's to
> update the parts of the superblock that can get modified by tune2fs),
> and to make sure that all of the backup superblock and block group
> descriptors have gotten updated>
> Some of the bugs that you found may have resulted in some of the
> backup bg descriptors not getting updataed, which we wouldn't
> necessarily notice unless we had a test program that explicitly
> checked for them.
> 
> And truth to tell, the only backup superblock and block group
> descriptor that actually gets used to recover the file system is the
> first one, since that's the one e2fsck will fall back to
> automatically.  An expert might try to use one of the other backup
> block groups as a desperate measure, and there might be some automated
> programs that might be smart enough to use the backup block groups
> when trying to recover the location of the partition table when the
> file system and partition table is very badly damaged --- so that's
> one of the reasons why with sparse_super2, the number of backup block
> group descriptors can be limited to (for example) one located in the
> first block group, and one located in the very last block group.
> 
> Thanks for let me know and I do have no knowldge about how backup is used
in usersapce.
diff mbox series

Patch

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index da832466ce74..d2b3ee50af31 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1589,7 +1589,6 @@  static int ext4_flex_group_add(struct super_block *sb,
 		int meta_bg = ext4_has_feature_meta_bg(sb);
 		sector_t padding_blocks = meta_bg ? 0 : sbi->s_sbh->b_blocknr -
 					 ext4_group_first_block_no(sb, 0);
-		sector_t old_gdb = 0;
 
 		update_backups(sb, ext4_group_first_block_no(sb, 0),
 			       (char *)es, sizeof(struct ext4_super_block), 0);
@@ -1598,11 +1597,8 @@  static int ext4_flex_group_add(struct super_block *sb,
 
 			gdb_bh = sbi_array_rcu_deref(sbi, s_group_desc,
 						     gdb_num);
-			if (old_gdb == gdb_bh->b_blocknr)
-				continue;
 			update_backups(sb, gdb_bh->b_blocknr - padding_blocks,
 				       gdb_bh->b_data, gdb_bh->b_size, meta_bg);
-			old_gdb = gdb_bh->b_blocknr;
 		}
 	}
 exit: