diff mbox

[-V2,3/5] ext4: Fix the race between read_block_bitmap and mark_diskspace_used

Message ID 20081124113323.GC8462@skywalker
State Superseded, archived
Headers show

Commit Message

Aneesh Kumar K.V Nov. 24, 2008, 11:33 a.m. UTC
On Mon, Nov 24, 2008 at 10:14:27AM +0300, Alex Zhuravlev wrote:
> Theodore Tso wrote:
>> My bigger concern is given that we are playing games like *this*:
>>
>> 		if ((cur & 31) == 0 && (len - cur) >= 32) {
>> 			/* fast path: set whole word at once */
>> 			addr = bm + (cur >> 3);
>> 			*addr = 0xffffffff;
>> 			cur += 32;
>> 			continue;
>> 		}
>
> this is to avoid expensive LOCK prefix in some cases.
>
>> without taking a lock, I'm a little surprised we haven't been
>> seriously burned by other race conditions.  What's the point of
>> calling mb_set_bit_atomic() and passing in a spinlock if we are doing
>> this kind of check without the protection of the same spinlock?!?
>
> why would we need a lock for a whole word bitop ?

Ok the changes was not done for this purpose. I need to make sure we
update bitmap and clear group_desc uninit flag after taking sb_bgl_lock
That means when we claim blocks we can't use mb_set_bits with
sb_bgl_lock because we would already be holding it. How about the below
change

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Alex Zhuravlev Nov. 24, 2008, 4:36 p.m. UTC | #1
Aneesh Kumar K.V wrote:
> Ok the changes was not done for this purpose. I need to make sure we
> update bitmap and clear group_desc uninit flag after taking sb_bgl_lock
> That means when we claim blocks we can't use mb_set_bits with
> sb_bgl_lock because we would already be holding it. How about the below
> change

may I have a look at the original patch?

thanks, Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aneesh Kumar K.V Nov. 24, 2008, 4:43 p.m. UTC | #2
On Mon, Nov 24, 2008 at 07:36:49PM +0300, Alex Zhuravlev wrote:
> Aneesh Kumar K.V wrote:
>> Ok the changes was not done for this purpose. I need to make sure we
>> update bitmap and clear group_desc uninit flag after taking sb_bgl_lock
>> That means when we claim blocks we can't use mb_set_bits with
>> sb_bgl_lock because we would already be holding it. How about the below
>> change
>
> may I have a look at the original patch?

http://patchwork.ozlabs.org/patch/10065/

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Zhuravlev Nov. 24, 2008, 6:03 p.m. UTC | #3
Aneesh Kumar K.V wrote:
> On Mon, Nov 24, 2008 at 07:36:49PM +0300, Alex Zhuravlev wrote:
>> Aneesh Kumar K.V wrote:
>>> Ok the changes was not done for this purpose. I need to make sure we
>>> update bitmap and clear group_desc uninit flag after taking sb_bgl_lock
>>> That means when we claim blocks we can't use mb_set_bits with
>>> sb_bgl_lock because we would already be holding it. How about the below
>>> change
>> may I have a look at the original patch?
> 
> http://patchwork.ozlabs.org/patch/10065/

I don't understand how a group can be "uninit" if we do some manipulations
inside. both allocation and preallocation initialize group first, see in
ext4_mb_init_cache()

thanks, Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aneesh Kumar K.V Nov. 24, 2008, 6:12 p.m. UTC | #4
On Mon, Nov 24, 2008 at 09:03:21PM +0300, Alex Zhuravlev wrote:
> Aneesh Kumar K.V wrote:
>> On Mon, Nov 24, 2008 at 07:36:49PM +0300, Alex Zhuravlev wrote:
>>> Aneesh Kumar K.V wrote:
>>>> Ok the changes was not done for this purpose. I need to make sure we
>>>> update bitmap and clear group_desc uninit flag after taking sb_bgl_lock
>>>> That means when we claim blocks we can't use mb_set_bits with
>>>> sb_bgl_lock because we would already be holding it. How about the below
>>>> change
>>> may I have a look at the original patch?
>>
>> http://patchwork.ozlabs.org/patch/10065/
>
> I don't understand how a group can be "uninit" if we do some manipulations
> inside. both allocation and preallocation initialize group first, see in
> ext4_mb_init_cache()
>

With commit c806e68f we do a init_bitmap every time we do a
read_block_bitmap.

To quote the update commit message that i have

    ext4: Fix race between read_block_bitmap() and mark_diskspace_used()
    
    We need to make sure we update the block bitmap and clear
    EXT4_BG_BLOCK_UNINIT flag with sb_bgl_lock held.  We look at
    EXT4_BG_BLOCK_UNINIT and reinit the block bitmap each time in
    ext4_read_block_bitmap (introduced by commit c806e68f), and this can
    race with block allocations in ext4_mb_mark_diskspace_used().
    
    ext4_read_block_bitmap does:
    
    spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
    if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
    	ext4_init_block_bitmap(sb, bh, block_group, desc);
    
    Now on the block allocation side we do
    
    mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
    			ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
    ....
    spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
    if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
    	gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
    
    ie on allocation we update the bitmap then we take the sb_bgl_lock
    and clear the EXT4_BG_BLOCK_UNINIT flag. What can happen is a
    parallel ext4_read_block_bitmap can zero out the bitmap in between
    the above mb_set_bits and spin_lock(sb_bg_lock..)
    
    The race results in below user visible errors
    EXT4-fs error (device sdb1): ext4_mb_release_inode_pa: free 100, pa_free 105
    EXT4-fs error (device sdb1): mb_free_blocks: double-free of inode 0's block 50(bit 100 in group 0)

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Zhuravlev Nov. 24, 2008, 6:17 p.m. UTC | #5
Aneesh Kumar K.V wrote:
> With commit c806e68f we do a init_bitmap every time we do a
> read_block_bitmap.

can you explain why do we need to init it every time?

thanks, Alex

> 
> To quote the update commit message that i have
> 
>     ext4: Fix race between read_block_bitmap() and mark_diskspace_used()
>     
>     We need to make sure we update the block bitmap and clear
>     EXT4_BG_BLOCK_UNINIT flag with sb_bgl_lock held.  We look at
>     EXT4_BG_BLOCK_UNINIT and reinit the block bitmap each time in
>     ext4_read_block_bitmap (introduced by commit c806e68f), and this can
>     race with block allocations in ext4_mb_mark_diskspace_used().
>     
>     ext4_read_block_bitmap does:
>     
>     spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
>     if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
>     	ext4_init_block_bitmap(sb, bh, block_group, desc);
>     
>     Now on the block allocation side we do
>     
>     mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
>     			ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
>     ....
>     spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
>     if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
>     	gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
>     
>     ie on allocation we update the bitmap then we take the sb_bgl_lock
>     and clear the EXT4_BG_BLOCK_UNINIT flag. What can happen is a
>     parallel ext4_read_block_bitmap can zero out the bitmap in between
>     the above mb_set_bits and spin_lock(sb_bg_lock..)
>     
>     The race results in below user visible errors
>     EXT4-fs error (device sdb1): ext4_mb_release_inode_pa: free 100, pa_free 105
>     EXT4-fs error (device sdb1): mb_free_blocks: double-free of inode 0's block 50(bit 100 in group 0)
> 
> -aneesh

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Frédéric Bohé Nov. 25, 2008, 2:29 p.m. UTC | #6
> Ok the changes was not done for this purpose. I need to make sure we
> update bitmap and clear group_desc uninit flag after taking sb_bgl_lock
> That means when we claim blocks we can't use mb_set_bits with
> sb_bgl_lock because we would already be holding it. How about the below
> change
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 444ad99..53180b1 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1031,7 +1031,10 @@ static void mb_clear_bits(spinlock_t *lock, void *bm, int cur, int len)
>  			cur += 32;
>  			continue;
>  		}
> -		mb_clear_bit_atomic(lock, cur, bm);
> +		if (lock)
> +			mb_clear_bit_atomic(lock, cur, bm);
> +		else
> +			mb_clear_bit(cur, bm);
>  		cur++;
>  	}
>  }
> @@ -1049,7 +1052,10 @@ static void mb_set_bits(spinlock_t *lock, void *bm, int cur, int len)
>  			cur += 32;
>  			continue;
>  		}
> -		mb_set_bit_atomic(lock, cur, bm);
> +		if (lock)
> +			mb_set_bit_atomic(lock, cur, bm);
> +		else
> +			mb_set_bit(cur, bm);
>  		cur++;
>  	}
>  }


Aneesh,

I've just tried your patch on top of the patch queue and I still have
this error when I resize the fs while allocating blocks concurrently
(with dd):

EXT4-fs error (device md0): ext4_mb_mark_diskspace_used: Allocating
block in system zone - block = 9117697
EXT4-fs error (device md0): ext4_mb_generate_buddy: EXT4-fs: group 1113:
6144 blocks in bitmap, 6014 in gd


Anyway I'm not sure if it is related to the race you try to fix.

FYI this error first appeared with the first set of patches concerning
resize and mballoc.

Regards,

Frederic







--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Zhuravlev Nov. 25, 2008, 4:38 p.m. UTC | #7
Frédéric Bohé wrote:
> I've just tried your patch on top of the patch queue and I still have
> this error when I resize the fs while allocating blocks concurrently
> (with dd):

I believe this is because original fix is wrong. instead of uptodate we
could use different flag.

thanks, Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 444ad99..53180b1 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1031,7 +1031,10 @@  static void mb_clear_bits(spinlock_t *lock, void *bm, int cur, int len)
 			cur += 32;
 			continue;
 		}
-		mb_clear_bit_atomic(lock, cur, bm);
+		if (lock)
+			mb_clear_bit_atomic(lock, cur, bm);
+		else
+			mb_clear_bit(cur, bm);
 		cur++;
 	}
 }
@@ -1049,7 +1052,10 @@  static void mb_set_bits(spinlock_t *lock, void *bm, int cur, int len)
 			cur += 32;
 			continue;
 		}
-		mb_set_bit_atomic(lock, cur, bm);
+		if (lock)
+			mb_set_bit_atomic(lock, cur, bm);
+		else
+			mb_set_bit(cur, bm);
 		cur++;
 	}
 }