Patchwork [RFC] resize2fs and uninit_bg questions

Submitter Will Drewry
Date Sept. 16, 2009, 4:24 p.m.
Message ID <20090916162457.GA84213@freezingfog.local>
Permalink /patch/33724/
State New

Comments

Will Drewry - Sept. 16, 2009, 4:24 p.m.
Hi linux-ext4,

I have two questions, with an accompanying patch for clarification.

resize2fs is uninit_bg aware, but when it is expanding an ext4
filesystem, it will always zero the inode tables.  Is it safe to mimic
mke2fs's write_inode_table(.., lazy_flag=1) and leave the new block
groups' inode tables marked INODE_UNINIT, BLOCK_UNINIT, and _not_ zero
out the inode tables if uninit_bg is supported?

If it is okay, then it means offline resizing upwards can be just as
fast as mke2fs.  I've attached a patch which is probably incomplete.
I'd love feedback as to the feasibility of the change and/or patch
quality.

As a follow-on, would it be sane to add support like this for
online resizing?  From a cursory investigation, it looks like
setup_new_block_groups() could be modified to not zero itables
if uninit_bg is supported, and INODE_ZEROED could be replaced
with BG_*_UNINIT.  However, I'm not sure if that is a naive
view.  I'm happy to send along a patch illustrating this change
if that would be helpful or welcome.

Any and all feedback is appreciated -- even if it is just a pointer
to the archives/links/etc.

Thanks!


Signed-off-by: Will Drewry <redpig@dataspill.org>
---
resize/resize2fs.c |   28 ++++++++++++++++++++++------
1 files changed, 22 insertions(+), 6 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Dilger - Sept. 16, 2009, 7:08 p.m.
On Sep 16, 2009  11:24 -0500, Will Drewry wrote:
> I have two questions, with an accompanying patch for clarification.
> 
> resize2fs is uninit_bg aware, but when it is expanding an ext4
> filesystem, it will always zero the inode tables.  Is it safe to mimic
> mke2fs's write_inode_table(.., lazy_flag=1) and leave the new block
> groups' inode tables marked INODE_UNINIT, BLOCK_UNINIT, and _not_ zero
> out the inode tables if uninit_bg is supported?
> 
> If it is okay, then it means offline resizing upwards can be just as
> fast as mke2fs.  I've attached a patch which is probably incomplete.
> I'd love feedback as to the feasibility of the change and/or patch
> quality.
> 
> As a follow-on, would it be sane to add support like this for
> online resizing?  From a cursory investigation, it looks like
> setup_new_block_groups() could be modified to not zero itables
> if uninit_bg is supported, and INODE_ZEROED could be replaced
> with BG_*_UNINIT.  However, I'm not sure if that is a naive
> view.  I'm happy to send along a patch illustrating this change
> if that would be helpful or welcome.

The question is why you would want to risk disk corruption for the
sake of the (likely not performance-critical) online resize?

> Any and all feedback is appreciated -- even if it is just a pointer
> to the archives/links/etc.
> 
> diff --git a/resize/resize2fs.c b/resize/resize2fs.c
> index 1a5d910..9fcc3b9 100644
> --- a/resize/resize2fs.c
> +++ b/resize/resize2fs.c
> @@ -497,8 +497,7 @@ retry:
>  
>  		fs->group_desc[i].bg_flags = 0;
>  		if (csum_flag)
> -			fs->group_desc[i].bg_flags |= EXT2_BG_INODE_UNINIT |
> -				EXT2_BG_INODE_ZEROED;
> +			fs->group_desc[i].bg_flags |= EXT2_BG_INODE_UNINIT;

This shouldn't be unconditional, since most users will want the
safety of having zeroed inode tables.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


Patch

diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index 1a5d910..9fcc3b9 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -497,8 +497,7 @@  retry:
 
 		fs->group_desc[i].bg_flags = 0;
 		if (csum_flag)
-			fs->group_desc[i].bg_flags |= EXT2_BG_INODE_UNINIT |
-				EXT2_BG_INODE_ZEROED;
+			fs->group_desc[i].bg_flags |= EXT2_BG_INODE_UNINIT;
 		if (i == fs->group_desc_count-1) {
 			numblocks = (fs->super->s_blocks_count -
 				     fs->super->s_first_data_block) %
@@ -568,7 +567,7 @@  errout:
 static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size)
 {
 	ext2_filsys fs;
-	int		adj = 0;
+	int		adj = 0, csum_flag = 0, num = 0;
 	errcode_t	retval;
 	blk_t		group_block;
 	unsigned long	i;
@@ -624,6 +623,9 @@  static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size)
 				&rfs->itable_buf);
 	if (retval)
 		goto errout;
+	/* Track if we can get by with a lazy init */
+	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
+					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
 
 	memset(rfs->itable_buf, 0, fs->blocksize * fs->inode_blocks_per_group);
 	group_block = fs->super->s_first_data_block +
@@ -642,10 +644,24 @@  static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size)
 		/*
 		 * Write out the new inode table
 		 */
+		if (csum_flag) {
+			/* These are _new_ inode tables. No inodes should be in use.
+			 * (As per ext2fs_set_gdt_csum) */
+			fs->group_desc[i].bg_itable_unused = fs->super->s_inodes_per_group;
+			num = ((((fs->super->s_inodes_per_group -
+				  fs->group_desc[i].bg_itable_unused) *
+				 EXT2_INODE_SIZE(fs->super)) +
+				EXT2_BLOCK_SIZE(fs->super) - 1) /
+			       EXT2_BLOCK_SIZE(fs->super));
+		} else {
+			num = fs->inode_blocks_per_group;
+			/* The kernel doesn't need to zero the itable blocks. We will below */
+			fs->group_desc[i].bg_flags |= EXT2_BG_INODE_ZEROED;
+		}
 		retval = io_channel_write_blk(fs->io,
-					      fs->group_desc[i].bg_inode_table,
-					      fs->inode_blocks_per_group,
-					      rfs->itable_buf);
+					      fs->group_desc[i].bg_inode_table, /* blk */
+					      num,  /* count */
+					      rfs->itable_buf);  /* contents */
 		if (retval) goto errout;
 
 		io_channel_flush(fs->io);