
ext4: add checksum calculation when clearing UNINIT flag

Message ID 1226053376.3542.8.camel@frecb007923.frec.bull.fr
State Accepted, archived

Commit Message

Frédéric Bohé Nov. 7, 2008, 10:22 a.m. UTC
From: Frederic Bohe <frederic.bohe@bull.net>

The block group's checksum needs to be recalculated during the
initialization of an UNINIT'd group. This fixes a race when several
threads try to allocate a new inode in an UNINIT'd group.

Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
---
 ialloc.c |    2 ++
 1 file changed, 2 insertions(+)



--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Theodore Ts'o Nov. 7, 2008, 1:52 p.m. UTC | #1
On Fri, Nov 07, 2008 at 11:22:56AM +0100, Frédéric Bohé wrote:
> From: Frederic Bohe <frederic.bohe@bull.net>
> 
> The block group's checksum needs to be recalculated during the
> initialization of an UNINIT'd group. This fixes a race when several
> threads try to allocate a new inode in an UNINIT'd group.

This patch looks sane, and so I'll accept it, but there's a higher-order
question hiding here --- why are we initializing the block bitmap in
ext4_new_inode()?  Sure, *most* of the time when we create a new
inode, we'll need to allocate a new block, but sometimes we
won't (e.g., when creating a symlink, device file, socket, or a
zero-length regular file).  More seriously, we don't account for the
potential need for an extra journal credit in all of the callers of
ext4_new_inode().  Obviously this doesn't get us in trouble, because we
generally massively overestimate the number of journal credits we need
--- but from the point of view of code simplification, maybe the code
block that initializes the block bitmap in ext4_new_inode() should be
dropped entirely.

We have to do the exact same check in mballoc.c when we actually
allocate blocks --- and in that case we know we'll be modifying the
block bitmap, so there's no need to first initialize the block bitmap
in ext4_new_inode(), only to have to redirty that same
block bitmap in mballoc.c when we are really allocating data for the
inode.

Does that make sense for a future cleanup?

						- Ted
Aneesh Kumar K.V Nov. 7, 2008, 2:27 p.m. UTC | #2
On Fri, Nov 07, 2008 at 08:52:22AM -0500, Theodore Tso wrote:
> On Fri, Nov 07, 2008 at 11:22:56AM +0100, Frédéric Bohé wrote:
> > From: Frederic Bohe <frederic.bohe@bull.net>
> > 
> > The block group's checksum needs to be recalculated during the
> > initialization of an UNINIT'd group. This fixes a race when several
> > threads try to allocate a new inode in an UNINIT'd group.
> 
> This patch looks sane, and so I'll accept it, but there's a higher-order
> question hiding here --- why are we initializing the block bitmap in
> ext4_new_inode()?  Sure, *most* of the time when we create a new
> inode, we'll need to allocate a new block, but sometimes we
> won't (e.g., when creating a symlink, device file, socket, or a
> zero-length regular file).

Because when we clear the uninit_bg flag, the kernel expects the block
bitmap to correctly mark the blocks containing the block
bitmap and inode bitmap as used. If mke2fs didn't do that, we would
need to do the same when we remove the uninit_bg flag.


-aneesh
Theodore Ts'o Nov. 7, 2008, 2:38 p.m. UTC | #3
On Fri, Nov 07, 2008 at 07:57:18PM +0530, Aneesh Kumar K.V wrote:
> On Fri, Nov 07, 2008 at 08:52:22AM -0500, Theodore Tso wrote:
> > On Fri, Nov 07, 2008 at 11:22:56AM +0100, Frédéric Bohé wrote:
> > > From: Frederic Bohe <frederic.bohe@bull.net>
> > > 
> > > The block group's checksum needs to be recalculated during the
> > > initialization of an UNINIT'd group. This fixes a race when several
> > > threads try to allocate a new inode in an UNINIT'd group.
> > 
> > This patch looks sane, and so I'll accept it, but there's a higher-order
> > question hiding here --- why are we initializing the block bitmap in
> > ext4_new_inode()?  Sure, *most* of the time when we create a new
> > inode, we'll need to allocate a new block, but sometimes we
> > won't (e.g., when creating a symlink, device file, socket, or a
> > zero-length regular file).
> 
> Because when we clear the uninit_bg flag, the kernel expects the block
> bitmap to correctly mark the blocks containing the block
> bitmap and inode bitmap as used. If mke2fs didn't do that, we would
> need to do the same when we remove the uninit_bg flag.

We have separate flags indicating whether the block allocation bitmap
and inode allocation bitmaps are initialized or not,
EXT4_BG_BLOCK_UNINIT, and EXT4_BG_INODE_UNINIT, respectively.  So what
I am proposing is to not initialize the block bitmap in
ext4_new_inode(), and not to clear the EXT4_BG_BLOCK_UNINIT flag, either.

		  		      	     	       - Ted
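
To make the proposal above concrete, here is a minimal userspace sketch, not kernel code. The macro values are assumed to mirror the kernel's bg_flags bits, toy_claim_inode_bitmap() is a hypothetical helper invented for illustration, and the on-disk little-endian conversion (cpu_to_le16) is deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror EXT4_BG_INODE_UNINIT / EXT4_BG_BLOCK_UNINIT. */
#define TOY_BG_INODE_UNINIT 0x0001u
#define TOY_BG_BLOCK_UNINIT 0x0002u

/*
 * Hypothetical helper: mark only the inode bitmap as initialized and
 * leave the block-bitmap flag exactly as it was.
 */
static uint16_t toy_claim_inode_bitmap(uint16_t bg_flags)
{
    return (uint16_t)(bg_flags & (uint16_t)~TOY_BG_INODE_UNINIT);
}

/* Small self-check showing the intended effect on a fresh group. */
static void toy_claim_demo(void)
{
    uint16_t fresh = TOY_BG_INODE_UNINIT | TOY_BG_BLOCK_UNINIT;
    uint16_t after = toy_claim_inode_bitmap(fresh);

    assert((after & TOY_BG_INODE_UNINIT) == 0); /* inode bitmap now live  */
    assert((after & TOY_BG_BLOCK_UNINIT) != 0); /* block bitmap untouched */
}
```

Under this scheme the block bitmap would be initialized only by the block allocator, when it first needs to modify it.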
Andreas Dilger Nov. 11, 2008, 1:23 a.m. UTC | #4
On Nov 07, 2008  09:38 -0500, Theodore Ts'o wrote:
> On Fri, Nov 07, 2008 at 07:57:18PM +0530, Aneesh Kumar K.V wrote:
> > Because when we clear the uninit_bg flag, the kernel expects the block
> > bitmap to correctly mark the blocks containing the block
> > bitmap and inode bitmap as used. If mke2fs didn't do that, we would
> > need to do the same when we remove the uninit_bg flag.
> 
> We have separate flags indicating whether the block allocation bitmap
> and inode allocation bitmaps are initialized or not,
> EXT4_BG_BLOCK_UNINIT, and EXT4_BG_INODE_UNINIT, respectively.  So what
> I am proposing is to not initialize the block bitmap in
> ext4_new_inode(), and not to clear the EXT4_BG_BLOCK_UNINIT flag, either.

That would be dangerous, because the block group _would_ be in use due
to the fact that one of the inode table blocks is in use.  That isn't
to say we couldn't adopt semantics as you suggest (e.g. that INODE_UNINIT
not being set implies that the inode table blocks are in use regardless
of whether or not BLOCK_UNINIT is set), but it needs careful consideration.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
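
The alternative semantics mentioned above could be expressed as a pair of predicates. This is only an illustrative userspace sketch: the function names are hypothetical, the flag values are assumed to mirror the kernel's, and nothing here reflects what the kernel actually implements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror EXT4_BG_INODE_UNINIT / EXT4_BG_BLOCK_UNINIT. */
#define TOY_BG_INODE_UNINIT 0x0001u
#define TOY_BG_BLOCK_UNINIT 0x0002u

/*
 * Hypothetical rule: the inode table blocks count as in use whenever
 * INODE_UNINIT is clear, regardless of the BLOCK_UNINIT bit.
 */
static bool toy_inode_table_in_use(uint16_t bg_flags)
{
    return (bg_flags & TOY_BG_INODE_UNINIT) == 0;
}

/* The on-disk block bitmap is trusted only once BLOCK_UNINIT is clear. */
static bool toy_block_bitmap_valid(uint16_t bg_flags)
{
    return (bg_flags & TOY_BG_BLOCK_UNINIT) == 0;
}

/* Self-check for the case that makes the proposal delicate. */
static void toy_semantics_demo(void)
{
    uint16_t flags = TOY_BG_BLOCK_UNINIT; /* inodes live, bitmap not */

    assert(toy_inode_table_in_use(flags));
    assert(!toy_block_bitmap_valid(flags));
}
```

The delicate case is the one in the self-check: the group is in use, yet its block bitmap cannot be read from disk, so every consumer of the bitmap would have to account for the inode table blocks on its own.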


Patch

Index: linux/fs/ext4/ialloc.c
===================================================================
--- linux.orig/fs/ext4/ialloc.c	2008-11-06 17:22:14.000000000 +0100
+++ linux/fs/ext4/ialloc.c	2008-11-07 10:43:41.000000000 +0100
@@ -718,6 +718,8 @@  got:
 			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
 			free = ext4_free_blocks_after_init(sb, group, gdp);
 			gdp->bg_free_blocks_count = cpu_to_le16(free);
+			gdp->bg_checksum = ext4_group_desc_csum(sbi, group,
+								gdp);
 		}
 		spin_unlock(sb_bgl_lock(sbi, group));
--
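
The thread does not spell out the race in detail, so the following is only a userspace toy model of why the added line matters: once the flags and the free-block count change, a checksum computed earlier no longer matches the descriptor. struct toy_gd and toy_csum() are hypothetical stand-ins; toy_csum() is not the algorithm used by ext4_group_desc_csum(), and locking is assumed rather than modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_BG_BLOCK_UNINIT 0x0002u

/* Toy group descriptor; field names echo the ones in the patch. */
struct toy_gd {
    uint16_t bg_flags;
    uint16_t bg_free_blocks_count;
    uint16_t bg_checksum;
};

/* Hypothetical checksum over every field except bg_checksum itself. */
static uint16_t toy_csum(uint32_t group, const struct toy_gd *gd)
{
    uint32_t h = group * 31u + gd->bg_flags;

    h = h * 31u + gd->bg_free_blocks_count;
    return (uint16_t)(h ^ (h >> 16));
}

static bool toy_csum_ok(uint32_t group, const struct toy_gd *gd)
{
    return gd->bg_checksum == toy_csum(group, gd);
}

/*
 * Models the patched hunk. The caller is assumed to hold the group
 * lock. With recompute == false this behaves like the pre-patch code.
 */
static void toy_init_block_bitmap(uint32_t group, struct toy_gd *gd,
                                  uint16_t free_after_init, bool recompute)
{
    gd->bg_flags &= (uint16_t)~TOY_BG_BLOCK_UNINIT;
    gd->bg_free_blocks_count = free_after_init;
    if (recompute)
        gd->bg_checksum = toy_csum(group, gd);
}
```

In this model, calling toy_init_block_bitmap() with recompute set to false leaves a descriptor that fails toy_csum_ok(), which is the inconsistent state another thread could observe after the lock is dropped.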