From patchwork Wed Sep 16 20:42:25 2009 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Will Drewry X-Patchwork-Id: 33741 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.176.167]) by ozlabs.org (Postfix) with ESMTP id E0F8BB7B6D for ; Thu, 17 Sep 2009 06:47:49 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754817AbZIPUro (ORCPT ); Wed, 16 Sep 2009 16:47:44 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1760019AbZIPUro (ORCPT ); Wed, 16 Sep 2009 16:47:44 -0400 Received: from mail-qy0-f190.google.com ([209.85.221.190]:52960 "EHLO mail-qy0-f190.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754817AbZIPUrn (ORCPT ); Wed, 16 Sep 2009 16:47:43 -0400 X-Greylist: delayed 317 seconds by postgrey-1.27 at vger.kernel.org; Wed, 16 Sep 2009 16:47:42 EDT Received: by qyk28 with SMTP id 28so5407327qyk.28 for ; Wed, 16 Sep 2009 13:47:46 -0700 (PDT) Received: by 10.224.126.212 with SMTP id d20mr7879402qas.118.1253133748848; Wed, 16 Sep 2009 13:42:28 -0700 (PDT) Received: from freezingfog.local ([216.239.44.65]) by mx.google.com with ESMTPS id 20sm98365yxe.6.2009.09.16.13.42.27 (version=TLSv1/SSLv3 cipher=RC4-MD5); Wed, 16 Sep 2009 13:42:28 -0700 (PDT) Date: Wed, 16 Sep 2009 15:42:25 -0500 From: Will Drewry To: Andreas Dilger Cc: linux-ext4@vger.kernel.org Subject: Re: [PATCH][RFC] resize2fs and uninit_bg questions Message-ID: <20090916204225.GB84213@freezingfog.local> References: <20090916162457.GA84213@freezingfog.local> <20090916190831.GH2537@webber.adilger.int> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20090916190831.GH2537@webber.adilger.int> User-Agent: Mutt/1.5.19 (2009-01-05) Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Wed, Sep 16, 2009 at 01:08:31PM -0600, Andreas Dilger wrote: > On Sep 16, 2009 11:24 -0500, Will Drewry wrote: > > I have a two questions with an accompanying patch for clarification. > > > > resize2fs is uninit_bg aware, but when it is expanding an ext4 > > filesystem, it will always zero the inode tables. Is it safe to mimick > > mke2fs's write_inode_table(.., lazy_flag=1) and leave the new block > > groups' inode tables marked INODE_UNINIT, BLOCK_UNINIT and _not_ zero > > out the inode table if uninit_bg is supported? > > > > If it is okay, then it means offline resizing upwards can be just as > > fast as mke2fs. I've attached a patch which is probably incomplete. > > I'd love feedback as to the feasibility of the change and/or patch > > quality. > > > > As a follow-on, would it be sane to add support like this for > > online resizing. From a cursory investigation, it looks like > > setup_new_block_groups() could be modified to not zero itables > > if uninit_bg is supported, and INODE_ZEROED could be replaced > > with ΒG_*_UNINIT. However, I'm not sure if that is a naive > > view. I'm happy to send along a patch illustrating this change > > if that'd be helpful or welcome. > > The question is why you would want to risk disk corruption vs. > the (likely not performance critical) online resize? I'm interested in it for a few reasons: 1. it undermines the use of uninit_bg on large filesystems as e2fsck -f will go back to normal speed, even without those block groups being 'used'. In my local example, it goes from 14s to 2m24s. 2. it will spread the I/O cost out over time. Online resizing often means that you don't want to/can't unmount the fs. However, a large filesystem increase might result in gigabytes of 0s being written to the backing store which could result in I/O throttling that makes doing it online less useful. It'd be nice to be able to optionally amortize that cost as is done if the fs had been mke2fs -O lazy_itable_init=1 at full size initially. 3. dm/sparse-file-backed ext4 filesystems end up having the file size expanded quite early as init'ing the itables force extra non-sparse bytes to be written. Otherwise, ext4+uninit_bg is a really nice fs type for this purpose. Would it seriously raise the risk of corruption if uninit_bg is already in use? Alternately, would a patch to that effect stand a chance of ever making it upstream? > > Any and all feedback is appreciated -- even if it just for me > > to look at the archives/link/etc. > > > > diff --git a/resize/resize2fs.c b/resize/resize2fs.c > > index 1a5d910..9fcc3b9 100644 > > --- a/resize/resize2fs.c > > +++ b/resize/resize2fs.c > > @@ -497,8 +497,7 @@ retry: > > > > fs->group_desc[i].bg_flags = 0; > > if (csum_flag) > > - fs->group_desc[i].bg_flags |= EXT2_BG_INODE_UNINIT | > > - EXT2_BG_INODE_ZEROED; > > + fs->group_desc[i].bg_flags |= EXT2_BG_INODE_UNINIT; > > This shouldn't be unconditional, since most users will want the > safety of having zeroed inode tables. I've attached a version with it being flagged by a "-l" for lazy. Thanks for the quick response! will Signed-off-by: Will Drewry --- resize/main.c | 7 +++++-- resize/online.c | 2 +- resize/resize2fs.8.in | 3 +++ resize/resize2fs.c | 41 +++++++++++++++++++++++++++++------------ resize/resize2fs.h | 5 ++++- 5 files changed, 42 insertions(+), 16 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/resize/main.c b/resize/main.c index 220c192..f04a939 100644 --- a/resize/main.c +++ b/resize/main.c @@ -39,7 +39,7 @@ char *program_name, *device_name, *io_options; static void usage (char *prog) { - fprintf (stderr, _("Usage: %s [-d debug_flags] [-f] [-F] [-M] [-P] " + fprintf (stderr, _("Usage: %s [-d debug_flags] [-f] [-F] [-l] [-M] [-P] " "[-p] device [new_size]\n\n"), prog); exit (1); @@ -189,7 +189,7 @@ int main (int argc, char ** argv) if (argc && *argv) program_name = *argv; - while ((c = getopt (argc, argv, "d:fFhMPpS:")) != EOF) { + while ((c = getopt (argc, argv, "d:fFhlMPpS:")) != EOF) { switch (c) { case 'h': usage(program_name); @@ -209,6 +209,9 @@ int main (int argc, char ** argv) case 'd': flags |= atoi(optarg); break; + case 'l': + flags |= RESIZE_LAZY_INIT; + break; case 'p': flags |= RESIZE_PERCENT_COMPLETE; break; diff --git a/resize/online.c b/resize/online.c index 4bc5451..f0b0569 100644 --- a/resize/online.c +++ b/resize/online.c @@ -104,7 +104,7 @@ errcode_t online_resize_fs(ext2_filsys fs, const char *mtpt, * but at least it allows on-line resizing to function. */ new_fs->super->s_feature_incompat &= ~EXT4_FEATURE_INCOMPAT_FLEX_BG; - retval = adjust_fs_info(new_fs, fs, 0, *new_size); + retval = adjust_fs_info(new_fs, fs, 0, *new_size, flags); if (retval) return retval; diff --git a/resize/resize2fs.8.in b/resize/resize2fs.8.in index 3ea7a63..020393e 100644 --- a/resize/resize2fs.8.in +++ b/resize/resize2fs.8.in @@ -102,6 +102,9 @@ really useful for doing .B resize2fs time trials. .TP +.B \-l +Lazily initialize new inode tables if supported (uninit_bg). +.TP .B \-M Shrink the filesystem to the minimum size. .TP diff --git a/resize/resize2fs.c b/resize/resize2fs.c index 1a5d910..fee2e3f 100644 --- a/resize/resize2fs.c +++ b/resize/resize2fs.c @@ -41,7 +41,8 @@ #endif static void fix_uninit_block_bitmaps(ext2_filsys fs); -static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size); +static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size, + int flags); static errcode_t blocks_to_move(ext2_resize_t rfs); static errcode_t block_mover(ext2_resize_t rfs); static errcode_t inode_scan_and_fix(ext2_resize_t rfs); @@ -106,7 +107,7 @@ errcode_t resize_fs(ext2_filsys fs, blk_t *new_size, int flags, if (retval) goto errout; - retval = adjust_superblock(rfs, *new_size); + retval = adjust_superblock(rfs, *new_size, flags); if (retval) goto errout; @@ -292,7 +293,7 @@ static void free_gdp_blocks(ext2_filsys fs, * filesystem. */ errcode_t adjust_fs_info(ext2_filsys fs, ext2_filsys old_fs, - ext2fs_block_bitmap reserve_blocks, blk_t new_size) + ext2fs_block_bitmap reserve_blocks, blk_t new_size, int flags) { errcode_t retval; int overhead = 0; @@ -496,9 +497,11 @@ retry: adjblocks = 0; fs->group_desc[i].bg_flags = 0; - if (csum_flag) - fs->group_desc[i].bg_flags |= EXT2_BG_INODE_UNINIT | - EXT2_BG_INODE_ZEROED; + if (csum_flag) { + fs->group_desc[i].bg_flags |= EXT2_BG_INODE_UNINIT; + if (!(flags & RESIZE_LAZY_INIT)) + fs->group_desc[i].bg_flags |= EXT2_BG_INODE_ZEROED; + } if (i == fs->group_desc_count-1) { numblocks = (fs->super->s_blocks_count - fs->super->s_first_data_block) % @@ -565,10 +568,10 @@ errout: * This routine adjusts the superblock and other data structures, both * in disk as well as in memory... */ -static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size) +static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size, int flags) { ext2_filsys fs; - int adj = 0; + int adj = 0, csum_flag = 0, num = 0; errcode_t retval; blk_t group_block; unsigned long i; @@ -584,7 +587,7 @@ static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size) if (retval) return retval; - retval = adjust_fs_info(fs, rfs->old_fs, rfs->reserve_blocks, new_size); + retval = adjust_fs_info(fs, rfs->old_fs, rfs->reserve_blocks, new_size, flags); if (retval) goto errout; @@ -624,6 +627,9 @@ static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size) &rfs->itable_buf); if (retval) goto errout; + /* Track if we can get by with a lazy init */ + csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super, + EXT4_FEATURE_RO_COMPAT_GDT_CSUM); memset(rfs->itable_buf, 0, fs->blocksize * fs->inode_blocks_per_group); group_block = fs->super->s_first_data_block + @@ -642,10 +648,21 @@ static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size) /* * Write out the new inode table */ + if (csum_flag && (flags & RESIZE_LAZY_INIT)) { + /* These are _new_ inode tables. No inodes should be in use. */ + fs->group_desc[i].bg_itable_unused = fs->super->s_inodes_per_group; + num = ((((fs->super->s_inodes_per_group - + fs->group_desc[i].bg_itable_unused) * + EXT2_INODE_SIZE(fs->super)) + + EXT2_BLOCK_SIZE(fs->super) - 1) / + EXT2_BLOCK_SIZE(fs->super)); + } else { + num = fs->inode_blocks_per_group; + } retval = io_channel_write_blk(fs->io, - fs->group_desc[i].bg_inode_table, - fs->inode_blocks_per_group, - rfs->itable_buf); + fs->group_desc[i].bg_inode_table, /* blk */ + num, /* count */ + rfs->itable_buf); /* contents */ if (retval) goto errout; io_channel_flush(fs->io); diff --git a/resize/resize2fs.h b/resize/resize2fs.h index fab7290..d4c862c 100644 --- a/resize/resize2fs.h +++ b/resize/resize2fs.h @@ -77,6 +77,8 @@ typedef struct ext2_sim_progress *ext2_sim_progmeter; #define RESIZE_DEBUG_INODEMAP 0x0004 #define RESIZE_DEBUG_ITABLEMOVE 0x0008 +#define RESIZE_LAZY_INIT 0x0010 + #define RESIZE_PERCENT_COMPLETE 0x0100 #define RESIZE_VERBOSE 0x0200 @@ -129,7 +131,8 @@ extern errcode_t resize_fs(ext2_filsys fs, blk_t *new_size, int flags, extern errcode_t adjust_fs_info(ext2_filsys fs, ext2_filsys old_fs, ext2fs_block_bitmap reserve_blocks, - blk_t new_size); + blk_t new_size, + int flags); extern blk_t calculate_minimum_resize_size(ext2_filsys fs);