diff mbox

e2fsprogs: don't set stripe/stride to 1 block in mkfs

Message ID 4D9A17F8.4000406@redhat.com
State Superseded, archived
Headers show

Commit Message

Eric Sandeen April 4, 2011, 7:11 p.m. UTC
Block devices may set minimum or optimal IO hints equal to
blocksize; in this case there is really nothing for ext4
to do with this information (i.e. search for a block-aligned
allocation?) so don't set fs geometry with single-block
values.

Zeev also reported that with a block-sized stripe, the
ext4 allocator spends time spinning in ext4_mb_scan_aligned(),
oddly enough.

Reported-by: Zeev Tarantov <zeev.tarantov@gmail.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Andreas Dilger April 5, 2011, 8:10 a.m. UTC | #1
On 2011-04-04, at 9:11 AM, Eric Sandeen wrote:
> Block devices may set minimum or optimal IO hints equal to
> blocksize; in this case there is really nothing for ext4
> to do with this information (i.e. search for a block-aligned
> allocation?) so don't set fs geometry with single-block
> values.
> 
> Zeev also reported that with a block-sized stripe, the
> ext4 allocator spends time spinning in ext4_mb_scan_aligned(),
> oddly enough.
> 
> Reported-by: Zeev Tarantov <zeev.tarantov@gmail.com>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index 9798b88..74b838c 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -1135,8 +1135,11 @@ static int get_device_geometry(const char *file,
> 	if ((opt_io == 0) && (psector_size > blocksize))
> 		opt_io = psector_size;
> 
> -	fs_param->s_raid_stride = min_io / blocksize;
> -	fs_param->s_raid_stripe_width = opt_io / blocksize;
> +	/* setting stripe/stride to blocksize is pointless */
> +	if (min_io > blocksize)
> +		fs_param->s_raid_stride = min_io / blocksize;
> +	if (opt_io > blocksize)
> +		fs_param->s_raid_stripe_width = opt_io / blocksize;

I don't think it is harmful to specify an mballoc alignment that is an even multiple of the underlying device IO size (e.g. at least 256kB or 512kB).

If the underlying device (e.g. zram) is reporting 16kB or 64kB opt_io size because that is PAGE_SIZE, but blocksize is 4kB, then we will have the same performance problem again.

Cheers, Andreas





--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zeev Tarantov April 5, 2011, 8:43 a.m. UTC | #2
On Tue, Apr 5, 2011 at 11:10, Andreas Dilger <adilger@dilger.ca> wrote:
> On 2011-04-04, at 9:11 AM, Eric Sandeen wrote:
>> Block devices may set minimum or optimal IO hints equal to
>> blocksize; in this case there is really nothing for ext4
>> to do with this information (i.e. search for a block-aligned
>> allocation?) so don't set fs geometry with single-block
>> values.
>>
>> Zeev also reported that with a block-sized stripe, the
>> ext4 allocator spends time spinning in ext4_mb_scan_aligned(),
>> oddly enough.
>>
>> Reported-by: Zeev Tarantov <zeev.tarantov@gmail.com>
>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>> ---
>>
>> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
>> index 9798b88..74b838c 100644
>> --- a/misc/mke2fs.c
>> +++ b/misc/mke2fs.c
>> @@ -1135,8 +1135,11 @@ static int get_device_geometry(const char *file,
>>       if ((opt_io == 0) && (psector_size > blocksize))
>>               opt_io = psector_size;
>>
>> -     fs_param->s_raid_stride = min_io / blocksize;
>> -     fs_param->s_raid_stripe_width = opt_io / blocksize;
>> +     /* setting stripe/stride to blocksize is pointless */
>> +     if (min_io > blocksize)
>> +             fs_param->s_raid_stride = min_io / blocksize;
>> +     if (opt_io > blocksize)
>> +             fs_param->s_raid_stripe_width = opt_io / blocksize;
>
> I don't think it is harmful to specify an mballoc alignment that is an even multiple of the underlying device IO size (e.g. at least 256kB or 512kB).
>
> If the underlying device (e.g. zram) is reporting 16kB or 64kB opt_io size because that is PAGE_SIZE, but blocksize is 4kB, then we will have the same performance problem again.

I think I'm not understanding you. Are you objecting to the patch? If so, why?
If io block is an even multiple of fs blocks, then mballoc really does
need to use the fancier aligning code (though for a ratio that is a
power of two, it shouldn't need to be slow).
In your example, If PAGE_SIZE is 16k and zram reports min and opt io
size as 16k but fs block is 4k, then Eric Sandeen's new code will set
stride and stripe to 4, as it should. Then mballoc will really need to
align blocks in groups of 4 and spending time doing that is not a
performance problem.

This patch is only meant to _not_ set stripe and stride to 1, which
cannot help block allocation and does in fact cause the kernel to
spend more cpu time because it does not detect the degenerate case.
Making the allocator faster in the case of a ratio that is a power of
two is the work of a different patch someone may want to write.
Using larger fs blocks in case the underlying block device prefers
larger io blocks is the work of the admin, I think.

> Cheers, Andreas

-Z.T.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen April 5, 2011, 4:39 p.m. UTC | #3
On 4/5/11 1:10 AM, Andreas Dilger wrote:
> On 2011-04-04, at 9:11 AM, Eric Sandeen wrote:
>> Block devices may set minimum or optimal IO hints equal to
>> blocksize; in this case there is really nothing for ext4
>> to do with this information (i.e. search for a block-aligned
>> allocation?) so don't set fs geometry with single-block
>> values.
>>
>> Zeev also reported that with a block-sized stripe, the
>> ext4 allocator spends time spinning in ext4_mb_scan_aligned(),
>> oddly enough.
>>
>> Reported-by: Zeev Tarantov <zeev.tarantov@gmail.com>
>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>> ---
>>
>> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
>> index 9798b88..74b838c 100644
>> --- a/misc/mke2fs.c
>> +++ b/misc/mke2fs.c
>> @@ -1135,8 +1135,11 @@ static int get_device_geometry(const char *file,
>> 	if ((opt_io == 0) && (psector_size > blocksize))
>> 		opt_io = psector_size;
>>
>> -	fs_param->s_raid_stride = min_io / blocksize;
>> -	fs_param->s_raid_stripe_width = opt_io / blocksize;
>> +	/* setting stripe/stride to blocksize is pointless */
>> +	if (min_io > blocksize)
>> +		fs_param->s_raid_stride = min_io / blocksize;
>> +	if (opt_io > blocksize)
>> +		fs_param->s_raid_stripe_width = opt_io / blocksize;
> 
> I don't think it is harmful to specify an mballoc alignment that is
> an even multiple of the underlying device IO size (e.g. at least
> 256kB or 512kB).
> 
> If the underlying device (e.g. zram) is reporting 16kB or 64kB opt_io
> size because that is PAGE_SIZE, but blocksize is 4kB, then we will
> have the same performance problem again.> 
> Cheers, Andreas

I need to look into why ext4_mb_scan_aligned is so inefficient for a block-sized stripe.

In practice I don't think we've seen this problem with stripe size at 4 or 8 or 16 blocks; it may just be less apparent.  I think the function steps through by stripe-sized units, and if that is 1 block, it's a lot of stepping.  

        while (i < EXT4_BLOCKS_PER_GROUP(sb)) {
...
                if (!mb_test_bit(i, bitmap)) {
...
                }
                i += sbi->s_stripe;
        }

But in any case, setting stripe alignment to 1 block makes no sense to me, and I see no reason to do it at mkfs time...

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o May 18, 2011, 5:20 p.m. UTC | #4
On Mon, Apr 04, 2011 at 03:11:52PM -0400, Eric Sandeen wrote:
> Block devices may set minimum or optimal IO hints equal to
> blocksize; in this case there is really nothing for ext4
> to do with this information (i.e. search for a block-aligned
> allocation?) so don't set fs geometry with single-block
> values.
> 
> Zeev also reported that with a block-sized stripe, the
> ext4 allocator spends time spinning in ext4_mb_scan_aligned(),
> oddly enough.
> 
> Reported-by: Zeev Tarantov <zeev.tarantov@gmail.com>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>

Applied to the maint branch of e2fsprogs (which will be merged up to
the next/master branches), thanks!!

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 9798b88..74b838c 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -1135,8 +1135,11 @@  static int get_device_geometry(const char *file,
 	if ((opt_io == 0) && (psector_size > blocksize))
 		opt_io = psector_size;
 
-	fs_param->s_raid_stride = min_io / blocksize;
-	fs_param->s_raid_stripe_width = opt_io / blocksize;
+	/* setting stripe/stride to blocksize is pointless */
+	if (min_io > blocksize)
+		fs_param->s_raid_stride = min_io / blocksize;
+	if (opt_io > blocksize)
+		fs_param->s_raid_stripe_width = opt_io / blocksize;
 
 	rc = blkid_topology_get_alignment_offset(tp);
 out: