Patchwork e2fsprogs: don't set stripe/stride to 1 block in mkfs

login
register
mail settings
Submitter Eric Sandeen
Date April 5, 2011, 4:56 p.m.
Message ID <4D9B49A6.7000709@redhat.com>
Download mbox | patch
Permalink /patch/89900/
State Accepted
Headers show

Comments

Eric Sandeen - April 5, 2011, 4:56 p.m.
On 4/5/11 9:39 AM, Eric Sandeen wrote:
> On 4/5/11 1:10 AM, Andreas Dilger wrote:
>> On 2011-04-04, at 9:11 AM, Eric Sandeen wrote:
>>> Block devices may set minimum or optimal IO hints equal to
>>> blocksize; in this case there is really nothing for ext4
>>> to do with this information (i.e. search for a block-aligned
>>> allocation?) so don't set fs geometry with single-block
>>> values.
>>>
>>> Zeev also reported that with a block-sized stripe, the
>>> ext4 allocator spends time spinning in ext4_mb_scan_aligned(),
>>> oddly enough.
>>>
>>> Reported-by: Zeev Tarantov <zeev.tarantov@gmail.com>
>>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>>> ---
>>>
>>> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
>>> index 9798b88..74b838c 100644
>>> --- a/misc/mke2fs.c
>>> +++ b/misc/mke2fs.c
>>> @@ -1135,8 +1135,11 @@ static int get_device_geometry(const char *file,
>>> 	if ((opt_io == 0) && (psector_size > blocksize))
>>> 		opt_io = psector_size;
>>>
>>> -	fs_param->s_raid_stride = min_io / blocksize;
>>> -	fs_param->s_raid_stripe_width = opt_io / blocksize;
>>> +	/* setting stripe/stride to blocksize is pointless */
>>> +	if (min_io > blocksize)
>>> +		fs_param->s_raid_stride = min_io / blocksize;
>>> +	if (opt_io > blocksize)
>>> +		fs_param->s_raid_stripe_width = opt_io / blocksize;
>>
>> I don't think it is harmful to specify an mballoc alignment that is
>> an even multiple of the underlying device IO size (e.g. at least
>> 256kB or 512kB).
>>
>> If the underlying device (e.g. zram) is reporting 16kB or 64kB opt_io
>> size because that is PAGE_SIZE, but blocksize is 4kB, then we will
>> have the same performance problem again.> 
>> Cheers, Andreas
> 
> I need to look into why ext4_mb_scan_aligned is so inefficient for a block-sized stripe.
> 
> In practice I don't think we've seen this problem with stripe size at 4 or 8 or 16 blocks; it may just be less apparent.  I think the function steps through by stripe-sized units, and if that is 1 block, it's a lot of stepping.  
> 
>         while (i < EXT4_BLOCKS_PER_GROUP(sb)) {
> ...
>                 if (!mb_test_bit(i, bitmap)) {

Offhand I think maybe mb_find_next_zero_bit would be more efficient.

totally untested, but I think we have better ways to step through the bitmap.

-Eric

> ...
>                 }
>                 i += sbi->s_stripe;
>         }
> 
> But in any case, setting stripe alignment to 1 block makes no sense to me, and I see no reason to do it at mkfs time...
> 
> -Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen - April 5, 2011, 10:21 p.m.
On 4/5/11 9:56 AM, Eric Sandeen wrote:
> On 4/5/11 9:39 AM, Eric Sandeen wrote:
>> On 4/5/11 1:10 AM, Andreas Dilger wrote:
>>> On 2011-04-04, at 9:11 AM, Eric Sandeen wrote:
>>>> Block devices may set minimum or optimal IO hints equal to
>>>> blocksize; in this case there is really nothing for ext4
>>>> to do with this information (i.e. search for a block-aligned
>>>> allocation?) so don't set fs geometry with single-block
>>>> values.
>>>>
>>>> Zeev also reported that with a block-sized stripe, the
>>>> ext4 allocator spends time spinning in ext4_mb_scan_aligned(),
>>>> oddly enough.
>>>>
>>>> Reported-by: Zeev Tarantov <zeev.tarantov@gmail.com>
>>>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>>>> ---
>>>>
>>>> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
>>>> index 9798b88..74b838c 100644
>>>> --- a/misc/mke2fs.c
>>>> +++ b/misc/mke2fs.c
>>>> @@ -1135,8 +1135,11 @@ static int get_device_geometry(const char *file,
>>>> 	if ((opt_io == 0) && (psector_size > blocksize))
>>>> 		opt_io = psector_size;
>>>>
>>>> -	fs_param->s_raid_stride = min_io / blocksize;
>>>> -	fs_param->s_raid_stripe_width = opt_io / blocksize;
>>>> +	/* setting stripe/stride to blocksize is pointless */
>>>> +	if (min_io > blocksize)
>>>> +		fs_param->s_raid_stride = min_io / blocksize;
>>>> +	if (opt_io > blocksize)
>>>> +		fs_param->s_raid_stripe_width = opt_io / blocksize;
>>>
>>> I don't think it is harmful to specify an mballoc alignment that is
>>> an even multiple of the underlying device IO size (e.g. at least
>>> 256kB or 512kB).
>>>
>>> If the underlying device (e.g. zram) is reporting 16kB or 64kB opt_io
>>> size because that is PAGE_SIZE, but blocksize is 4kB, then we will
>>> have the same performance problem again.> 
>>> Cheers, Andreas
>>
>> I need to look into why ext4_mb_scan_aligned is so inefficient for a block-sized stripe.
>>
>> In practice I don't think we've seen this problem with stripe size at 4 or 8 or 16 blocks; it may just be less apparent.  I think the function steps through by stripe-sized units, and if that is 1 block, it's a lot of stepping.  
>>
>>         while (i < EXT4_BLOCKS_PER_GROUP(sb)) {
>> ...
>>                 if (!mb_test_bit(i, bitmap)) {
> 
> Offhand I think maybe mb_find_next_zero_bit would be more efficient.
> 
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1939,16 +1939,14 @@ void ext4_mb_scan_aligned(struct ext4_allocation_context *ac,
>         i = (a * sbi->s_stripe) - first_group_block;
>  
>         while (i < EXT4_BLOCKS_PER_GROUP(sb)) {
> -               if (!mb_test_bit(i, bitmap)) {
> -                       max = mb_find_extent(e4b, 0, i, sbi->s_stripe, &ex);
> -                       if (max >= sbi->s_stripe) {
> -                               ac->ac_found++;
> -                               ac->ac_b_ex = ex;
> -                               ext4_mb_use_best_found(ac, e4b);
> -                               break;
> -                       }
> +               i = mb_find_next_zero_bit(bitmap, EXT4_BLOCKS_PER_GROUP(sb), i);
> +               max = mb_find_extent(e4b, 0, i, sbi->s_stripe, &ex);
> +               if (max >= sbi->s_stripe) {
> +                       ac->ac_found++;
> +                       ac->ac_b_ex = ex;
> +                       ext4_mb_use_best_found(ac, e4b);
> +                       break;
>                 }
> -               i += sbi->s_stripe;
>         }
>  }
> 
> totally untested, but I think we have better ways to step through the bitmap.

I tested it ;)

Seems to work fine, though I probably should see how things actually got allocated.

Creating an fs with 1-block stripes & widths as with the original report, and copying a (built) 2.6 linux kernel tree, it took about 7 minutes, and looped in the while() loop above 328215171 times.

With the patch above, it took 6m30s, and looped 25055 times.

For a filesystem with no stripe/stride set, it took 6m26s.

Hm, but subsequent tests w/ the tiny stripe set came around 6m30s as well.  So there's no obvious speedup.

Still, avoiding all that looping seems beneficial, I can send a patch after I make sure allocation is still happening as expected.

Zeev, if you'd like to test that patch above with your profiling, that'd be awesome.

Thanks,
-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zeev Tarantov - April 6, 2011, 6:51 a.m.
On Wed, Apr 6, 2011 at 01:21, Eric Sandeen <sandeen@redhat.com> wrote:
> Zeev, if you'd like to test that patch above with your profiling, that'd be awesome.

My wall-clock timings are strange and I disregard them.
On a desktop with X turned off and perf recording system-wide, I trust
the profile to show everything.

In the no-stride case, I don't think you've made it any worse.
In the stripe=stride=1, it went from:

[ perf record: Woken up 13 times to write data ]
[ perf record: Captured and wrote 3.163 MB perf.data (~138211 samples) ]
# Events: 14K cycles
#
# Overhead         Command          Shared Object
             Symbol
# ........  ..............  .....................
.......................................
#
    13.08%            gzip  gzip                   [.] zip
    12.65%     flush-253:0  [kernel.kallsyms]      [k] ext4_mb_scan_aligned
    12.46%            gzip  gzip                   [.] treat_file.part.4.2264
     9.34%     flush-253:0  [csnappy_compress]     [k] snappy_compress_fragment
     5.47%          md5sum  md5sum                 [.] digest_file.isra.2.2089
     4.70%          md5sum  [csnappy_decompress]   [k]
snappy_decompress_noheader
     4.29%          md5sum  md5sum                 [.] 0x2fe7
     2.16%            gzip  libc-2.13.so           [.] __memcpy_ssse3
     1.79%     flush-253:0  [kernel.kallsyms]      [k] memcpy
     1.52%             tar  [kernel.kallsyms]      [k] copy_user_generic_string
     1.08%             tar  [kernel.kallsyms]      [k] _raw_spin_lock
     1.00%            gzip  [kernel.kallsyms]      [k] copy_user_generic_string
     0.87%     flush-253:0  [kernel.kallsyms]      [k] __memset
     0.58%          md5sum  md5sum                 [.] __libc_csu_init
     0.54%          md5sum  [zram]                 [k] zram_make_request
     0.50%          md5sum  [kernel.kallsyms]      [k] copy_user_generic_string
     0.45%             tar  [kernel.kallsyms]      [k] ext4_mark_iloc_dirty
     0.44%             tar  [kernel.kallsyms]      [k] __memset
     0.39%            gzip  gzip                   [.] treat_stdin.2262
     0.35%            gzip  gzip                   [.] treat_file.2267
     0.33%         swapper  [kernel.kallsyms]      [k] mwait_idle
     0.28%              dd  [kernel.kallsyms]      [k] system_call
     0.26%          md5sum  [kernel.kallsyms]      [k] memcpy
     0.25%     flush-253:0  [zram]                 [k] zram_make_request
     0.25%     flush-253:0  [kernel.kallsyms]      [k] _raw_spin_lock
     0.23%             tar  [kernel.kallsyms]      [k] ext4_mb_scan_aligned
     0.22%            gzip  gzip                   [.] compress_block.2644.2190
     0.20%             tar  [kernel.kallsyms]      [k] __find_get_block

to:

[ perf record: Woken up 12 times to write data ]
[ perf record: Captured and wrote 3.091 MB perf.data (~135042 samples) ]
# Events: 13K cycles
#
# Overhead      Command          Shared Object
        Symbol
# ........  ...........  .....................
.....................................
#
    15.10%         gzip  gzip                   [.] zip
    14.70%         gzip  gzip                   [.] treat_file.part.4.2264
    11.10%  flush-253:0  [csnappy_compress]     [k] snappy_compress_fragment
     6.12%       md5sum  md5sum                 [.] digest_file.isra.2.2089
     5.44%       md5sum  [csnappy_decompress]   [k] snappy_decompress_noheader
     5.10%       md5sum  md5sum                 [.] 0x3262
     2.42%         gzip  libc-2.13.so           [.] __memcpy_ssse3
     2.03%  flush-253:0  [kernel.kallsyms]      [k] memcpy
     1.52%          tar  [kernel.kallsyms]      [k] copy_user_generic_string
     1.13%         gzip  [kernel.kallsyms]      [k] copy_user_generic_string
     0.80%  flush-253:0  [kernel.kallsyms]      [k] __memset
     0.62%       md5sum  md5sum                 [.] __libc_csu_init
     0.60%       md5sum  [zram]                 [k] zram_make_request
     0.57%         gzip  gzip                   [.] treat_stdin.2262
     0.56%       md5sum  [kernel.kallsyms]      [k] copy_user_generic_string
     0.55%          tar  [kernel.kallsyms]      [k] __memset
     0.49%          tar  [kernel.kallsyms]      [k] ext4_mark_iloc_dirty
     0.39%       md5sum  [kernel.kallsyms]      [k] memcpy
     0.37%         gzip  gzip                   [.] treat_file.2267
     0.37%      swapper  [kernel.kallsyms]      [k] mwait_idle
     0.28%           dd  [kernel.kallsyms]      [k] system_call
     0.28%  flush-253:0  [kernel.kallsyms]      [k] _raw_spin_lock
     0.25%         gzip  gzip                   [.] compress_block.2644.2190
     0.22%  flush-253:0  [kernel.kallsyms]      [k] ext4_mark_iloc_dirty
     0.22%          tar  [kernel.kallsyms]      [k] __ext4_get_inode_loc
     0.22%          tar  [kernel.kallsyms]      [k] _raw_spin_lock
     0.21%          tar  [kernel.kallsyms]      [k] mark_page_accessed
     0.20%          tar  [kernel.kallsyms]      [k] __find_get_block
     0.20%          tar  tar                    [.] contains_dot_dot

So that's great, but I still don't want mke2fs to set a stripe and
stride of 1, because that's silly - no code improvement in the
allocator is ever going to be able to use that information.

> Thanks,
> -Eric

Thank you!
-Z.T.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen - April 6, 2011, 5 p.m.
On 4/5/11 11:51 PM, Zeev Tarantov wrote:
> On Wed, Apr 6, 2011 at 01:21, Eric Sandeen <sandeen@redhat.com> wrote:
>> Zeev, if you'd like to test that patch above with your profiling, that'd be awesome.
> 
> My wall-clock timings are strange and I disregard them.
> On a desktop with X turned off and perf recording system-wide, I trust
> the profile to show everything.

Agreed, the wallclock time was just a very coarse sanity check while off at a conference :)

> In the no-stride case, I don't think you've made it any worse.

it shouldn't even go down this path so I hope not ;)

> In the stripe=stride=1, it went from:
> 
> [ perf record: Woken up 13 times to write data ]
> [ perf record: Captured and wrote 3.163 MB perf.data (~138211 samples) ]
> # Events: 14K cycles
> #
> # Overhead         Command          Shared Object
>              Symbol
> # ........  ..............  .....................
> .......................................
> #
>     13.08%            gzip  gzip                   [.] zip
>     12.65%     flush-253:0  [kernel.kallsyms]      [k] ext4_mb_scan_aligned

...

> to:
> 
> [ perf record: Woken up 12 times to write data ]
> [ perf record: Captured and wrote 3.091 MB perf.data (~135042 samples) ]
> # Events: 13K cycles
> #
> # Overhead      Command          Shared Object
>         Symbol
> # ........  ...........  .....................
> .....................................
> #
>     15.10%         gzip  gzip                   [.] zip
>     14.70%         gzip  gzip                   [.] treat_file.part.4.2264
>     11.10%  flush-253:0  [csnappy_compress]     [k] snappy_compress_fragment
>      6.12%       md5sum  md5sum                 [.] digest_file.isra.2.2089
...
>      0.23%             tar  [kernel.kallsyms]      [k] ext4_mb_scan_aligned
...

> So that's great, but I still don't want mke2fs to set a stripe and
> stride of 1, because that's silly - no code improvement in the
> allocator is ever going to be able to use that information.

I totally agree, and I'm sure the mkfs patch will go in as well.

Thanks for the verification of the kernel patch, too.

It should help a bit for sane stripe-sizes, too, with less impact as the stripe size gets bigger.

Thanks,
-Eric

>> Thanks,
>> -Eric
> 
> Thank you!
> -Z.T.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Dilger - April 8, 2011, 12:13 a.m.
On 2011-04-05, at 10:56 AM, Eric Sandeen wrote:

> On 4/5/11 9:39 AM, Eric Sandeen wrote:
>> Andreas Dilger wrote:
>>> I don't think it is harmful to specify an mballoc alignment that is
>>> an even multiple of the underlying device IO size (e.g. at least
>>> 256kB or 512kB).
>>> 
>>> If the underlying device (e.g. zram) is reporting 16kB or 64kB opt_io
>>> size because that is PAGE_SIZE, but blocksize is 4kB, then we will
>>> have the same performance problem again.> 
>>> Cheers, Andreas
>> 
>> I need to look into why ext4_mb_scan_aligned is so inefficient for a block-sized stripe.
>> 
>> In practice I don't think we've seen this problem with stripe size at 4 or 8 or 16 blocks; it may just be less apparent.  I think the function steps through by stripe-sized units, and if that is 1 block, it's a lot of stepping.  
>> 
>>        while (i < EXT4_BLOCKS_PER_GROUP(sb)) {
>> ...
>>                if (!mb_test_bit(i, bitmap)) {
> 
> Offhand I think maybe mb_find_next_zero_bit would be more efficient.
> 
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1939,16 +1939,14 @@ void ext4_mb_scan_aligned(struct ext4_allocation_context *ac,
>        i = (a * sbi->s_stripe) - first_group_block;
> 
>        while (i < EXT4_BLOCKS_PER_GROUP(sb)) {
> -               if (!mb_test_bit(i, bitmap)) {
> -                       max = mb_find_extent(e4b, 0, i, sbi->s_stripe, &ex);
> -                       if (max >= sbi->s_stripe) {
> -                               ac->ac_found++;
> -                               ac->ac_b_ex = ex;
> -                               ext4_mb_use_best_found(ac, e4b);
> -                               break;
> -                       }
> +               i = mb_find_next_zero_bit(bitmap, EXT4_BLOCKS_PER_GROUP(sb), i);
> +               max = mb_find_extent(e4b, 0, i, sbi->s_stripe, &ex);
> +               if (max >= sbi->s_stripe) {
> +                       ac->ac_found++;
> +                       ac->ac_b_ex = ex;
> +                       ext4_mb_use_best_found(ac, e4b);
> +                       break;
>                }
> -               i += sbi->s_stripe;
>        }
> }
> 
> totally untested, but I think we have better ways to step through the bitmap.

This changes the allocation completely, AFAICS.  Instead of doing checks for chunks of free space aligned on sbi->s_stripe boundaries, it is instead finding the first free space of size s_stripe regardless of alignment.  That is not good for RAID back-ends, and is the primary reason for ext4_mb_scan_aligned() to exist.

I think my original assertion holds - that regardless of what the "optimal IO" size reported by the underlying device, doing larger allocations at the mballoc level that are even multiples of this size isn't harmful.  That avoids not only the performance impact of 4kB-sized "optimal IO", but also the (lesser) impact of 8kB-64kB "optimal IO" allocations as well.

Cheers, Andreas





--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen - April 8, 2011, 12:24 a.m.
On 4/7/11 5:13 PM, Andreas Dilger wrote:
> 
> On 2011-04-05, at 10:56 AM, Eric Sandeen wrote:
> 
>> On 4/5/11 9:39 AM, Eric Sandeen wrote:
>>> Andreas Dilger wrote:
>>>> I don't think it is harmful to specify an mballoc alignment that is
>>>> an even multiple of the underlying device IO size (e.g. at least
>>>> 256kB or 512kB).
>>>>
>>>> If the underlying device (e.g. zram) is reporting 16kB or 64kB opt_io
>>>> size because that is PAGE_SIZE, but blocksize is 4kB, then we will
>>>> have the same performance problem again.> 
>>>> Cheers, Andreas
>>>
>>> I need to look into why ext4_mb_scan_aligned is so inefficient for a block-sized stripe.
>>>
>>> In practice I don't think we've seen this problem with stripe size at 4 or 8 or 16 blocks; it may just be less apparent.  I think the function steps through by stripe-sized units, and if that is 1 block, it's a lot of stepping.  
>>>
>>>        while (i < EXT4_BLOCKS_PER_GROUP(sb)) {
>>> ...
>>>                if (!mb_test_bit(i, bitmap)) {
>>
>> Offhand I think maybe mb_find_next_zero_bit would be more efficient.
>>
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -1939,16 +1939,14 @@ void ext4_mb_scan_aligned(struct ext4_allocation_context *ac,
>>        i = (a * sbi->s_stripe) - first_group_block;
>>
>>        while (i < EXT4_BLOCKS_PER_GROUP(sb)) {
>> -               if (!mb_test_bit(i, bitmap)) {
>> -                       max = mb_find_extent(e4b, 0, i, sbi->s_stripe, &ex);
>> -                       if (max >= sbi->s_stripe) {
>> -                               ac->ac_found++;
>> -                               ac->ac_b_ex = ex;
>> -                               ext4_mb_use_best_found(ac, e4b);
>> -                               break;
>> -                       }
>> +               i = mb_find_next_zero_bit(bitmap, EXT4_BLOCKS_PER_GROUP(sb), i);
>> +               max = mb_find_extent(e4b, 0, i, sbi->s_stripe, &ex);
>> +               if (max >= sbi->s_stripe) {
>> +                       ac->ac_found++;
>> +                       ac->ac_b_ex = ex;
>> +                       ext4_mb_use_best_found(ac, e4b);
>> +                       break;
>>                }
>> -               i += sbi->s_stripe;
>>        }
>> }
>>
>> totally untested, but I think we have better ways to step through the bitmap.
> 
> This changes the allocation completely, AFAICS. Instead of doing
> checks for chunks of free space aligned on sbi->s_stripe boundaries,
> it is instead finding the first free space of size s_stripe
> regardless of alignment. That is not good for RAID back-ends, and is
> the primary reason for ext4_mb_scan_aligned() to exist.

Oh, er, right.  It's what I get for coding-at-conference, sorry.

I do wonder if test-bit/advance/test-bit/advance can be made a bit more efficient with something like find_next_bit.  I just did it wrong. :(

I'll revisit it when I get back home.

> I think my original assertion holds - that regardless of what the
> "optimal IO" size reported by the underlying device, doing larger
> allocations at the mballoc level that are even multiples of this size
> isn't harmful. That avoids not only the performance impact of
> 4kB-sized "optimal IO", but also the (lesser) impact of 8kB-64kB
> "optimal IO" allocations as well.> 
> Cheers, Andreas

I'll give that some thought; really, the whole align-on-a-stripe mechanism needs work, at least outside of the Lustre workload :)

Thanks,
-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1939,16 +1939,14 @@  void ext4_mb_scan_aligned(struct ext4_allocation_context *ac,
        i = (a * sbi->s_stripe) - first_group_block;
 
        while (i < EXT4_BLOCKS_PER_GROUP(sb)) {
-               if (!mb_test_bit(i, bitmap)) {
-                       max = mb_find_extent(e4b, 0, i, sbi->s_stripe, &ex);
-                       if (max >= sbi->s_stripe) {
-                               ac->ac_found++;
-                               ac->ac_b_ex = ex;
-                               ext4_mb_use_best_found(ac, e4b);
-                               break;
-                       }
+               i = mb_find_next_zero_bit(bitmap, EXT4_BLOCKS_PER_GROUP(sb), i);
+               max = mb_find_extent(e4b, 0, i, sbi->s_stripe, &ex);
+               if (max >= sbi->s_stripe) {
+                       ac->ac_found++;
+                       ac->ac_b_ex = ex;
+                       ext4_mb_use_best_found(ac, e4b);
+                       break;
                }
-               i += sbi->s_stripe;
        }
 }