diff mbox

ext4: speed up group trim with the right free block count.

Message ID 1295577131-3778-1-git-send-email-tm@tao.ma
State New, archived
Headers show

Commit Message

Tao Ma Jan. 21, 2011, 2:32 a.m. UTC
From: Tao Ma <boyu.mt@taobao.com>

When we trim some free blocks in a group of ext4, we should
calculate the free blocks properly and check whether there are
enough freed blocks left for us to trim. Current solution will
only calculate free spaces if they are large for a trim which
isn't appropriate.

Let us see a small example:
a group has 1.5M free which are 300k, 300k, 300k, 300k, 300k.
And minblocks is 1M. With current solution, we have to iterate
the whole group since these 300k will never be subtracted from
1.5M. But actually we should exit after we find the first 2
free spaces since the left 3 chunks only sum up to 900K if we
subtract the first 600K although they can't be trimed.

Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/ext4/mballoc.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

Comments

Lukas Czerner Jan. 21, 2011, 10:49 a.m. UTC | #1
On Fri, 21 Jan 2011, Tao Ma wrote:

> From: Tao Ma <boyu.mt@taobao.com>
> 
> When we trim some free blocks in a group of ext4, we should
> calculate the free blocks properly and check whether there are
> enough freed blocks left for us to trim. Current solution will
> only calculate free spaces if they are large for a trim which
> isn't appropriate.
> 
> Let us see a small example:
> a group has 1.5M free which are 300k, 300k, 300k, 300k, 300k.
> And minblocks is 1M. With current solution, we have to iterate
> the whole group since these 300k will never be subtracted from
> 1.5M. But actually we should exit after we find the first 2
> free spaces since the left 3 chunks only sum up to 900K if we
> subtract the first 600K although they can't be trimed.
> 
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
> ---
>  fs/ext4/mballoc.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 851f49b..970e471 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4737,7 +4737,7 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>  		ext4_grpblk_t start, ext4_grpblk_t max, ext4_grpblk_t minblocks)
>  {
>  	void *bitmap;
> -	ext4_grpblk_t next, count = 0;
> +	ext4_grpblk_t next, count = 0, free_count = 0;
>  	ext4_group_t group;
>  	int ret = 0;
>  
> @@ -4762,6 +4762,7 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>  				break;
>  			count += next - start;
>  		}
> +		free_count += next - start;
>  		start = next + 1;
>  
>  		if (fatal_signal_pending(current)) {
> @@ -4775,7 +4776,7 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>  			ext4_lock_group(sb, group);
>  		}
>  
> -		if ((e4b->bd_info->bb_free - count) < minblocks)
> +		if ((e4b->bd_info->bb_free - free_count) < minblocks)
>  			break;
>  	}
>  	ext4_unlock_group(sb, group);
> 

Looks good, thanks.

-Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Dilger Jan. 21, 2011, 5:53 p.m. UTC | #2
Actually, I had another idea which might speed up trim operations significantly.  If the kernel keeps a bit in ext4_group_info->bb_state that indicates whether this group has any freed blocks since it last had a trim operation sent to it, then the kernel can completely avoid doing anything for that group.  This isn't just avoiding the need to scan the bitmap for free ranges, but more importantly it avoids sending the TRIM/UNMAP operation to the disk for free ranges that were previously trimmed in the backing storage.

Something like:

#define EXT4_GROUP_INFO_NEED_TRIM_BIT	1

/* Note that bit clear means a trim is needed, so that a newly mounted
* filesystem assumes that holes the group need to be trimmed. */
#define EXT4_MB_GRP_NEED_TRIM(grp)	\
	(!test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))


When calling the TRIM ioctl it can check EXT4_MB_GRP_NEED_TRIM(grp) and skip that group if it hasn't changed since last time.  Otherwise, it should call EXT4_MB_GRP_DONE_TRIM(grp) before doing the actual trim, so it is not racy with another process freeing blocks in that group.

In release_blocks_on_commit() it should call EXT4_MB_GRP_MUST_TRIM() to mark that the group needs to be trimmed again, since blocks were freed in the group.

This can potentially avoid a huge number of TRIMs to the disk, if this is run periodically (e.g. every day) and the filesystem is not remounted all the time, and does not undergo huge allocate/free/allocate cycles during daily use.

It would even be possible to store this bit on-disk ext4_group_desc->bg_flags to avoid the initial "assume every group needs to be trimmed" operation, if that ends up to be a significant factor.  However, that can be done later once some numbers are measured on how significant the initial-mount overhead is.  It is also not free, since it will cause disk IO to set/clear this bit.

Cheers, Andreas





--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tao Ma Jan. 22, 2011, 8:32 a.m. UTC | #3
Hi Andreas,
On 01/22/2011 01:53 AM, Andreas Dilger wrote:
> Actually, I had another idea which might speed up trim operations significantly.  If the kernel keeps a bit in ext4_group_info->bb_state that indicates whether this group has any freed blocks since it last had a trim operation sent to it, then the kernel can completely avoid doing anything for that group.  This isn't just avoiding the need to scan the bitmap for free ranges, but more importantly it avoids sending the TRIM/UNMAP operation to the disk for free ranges that were previously trimmed in the backing storage.
It looks good.
Just an extra point. we have to store 'minlen' passed in by 
fstrim_range. So if the user first try minlen=1mb, and then we give them 
the number we have trimmed. If he isn't satisfied, he can try 
minlen=512kb again. In this case we have to check with the old 'minlen' 
and retry again if minlen < old_minlen.

If there isn't any objection, I can put this to the patch.

Regards,
Tao
> Something like:
>
> #define EXT4_GROUP_INFO_NEED_TRIM_BIT	1
>
> /* Note that bit clear means a trim is needed, so that a newly mounted
> * filesystem assumes that holes the group need to be trimmed. */
> #define EXT4_MB_GRP_NEED_TRIM(grp)	\
> 	(!test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT,&((grp)->bb_state)))
>
>
> When calling the TRIM ioctl it can check EXT4_MB_GRP_NEED_TRIM(grp) and skip that group if it hasn't changed since last time.  Otherwise, it should call EXT4_MB_GRP_DONE_TRIM(grp) before doing the actual trim, so it is not racy with another process freeing blocks in that group.
>
> In release_blocks_on_commit() it should call EXT4_MB_GRP_MUST_TRIM() to mark that the group needs to be trimmed again, since blocks were freed in the group.
>
> This can potentially avoid a huge number of TRIMs to the disk, if this is run periodically (e.g. every day) and the filesystem is not remounted all the time, and does not undergo huge allocate/free/allocate cycles during daily use.
>
> It would even be possible to store this bit on-disk ext4_group_desc->bg_flags to avoid the initial "assume every group needs to be trimmed" operation, if that ends up to be a significant factor.  However, that can be done later once some numbers are measured on how significant the initial-mount overhead is.  It is also not free, since it will cause disk IO to set/clear this bit.
>
> Cheers, Andreas
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Dilger Jan. 24, 2011, 1:33 a.m. UTC | #4
On 2011-01-22, at 01:32, Tao Ma wrote:
> On 01/22/2011 01:53 AM, Andreas Dilger wrote:
>> Actually, I had another idea which might speed up trim operations significantly.  If the kernel keeps a bit in ext4_group_info->bb_state that indicates whether this group has any freed blocks since it last had a trim operation sent to it, then the kernel can completely avoid doing anything for that group.  This isn't just avoiding the need to scan the bitmap for free ranges, but more importantly it avoids sending the TRIM/UNMAP operation to the disk for free ranges that were previously trimmed in the backing storage.
> 
> It looks good.
> Just an extra point. we have to store 'minlen' passed in by fstrim_range. So if the user first try minlen=1mb, and then we give them the number we have trimmed. If he isn't satisfied, he can try minlen=512kb again. In this case we have to check with the old 'minlen' and retry again if minlen < old_minlen.

Maybe I missed something, but why would one run with minlen=1MB and then run again with minlen=512kB?  I can't see why running this command twice would be better than running it a single time with minlen=512kB, if the hardware actually supports that.

>> Something like:
>> 
>> #define EXT4_GROUP_INFO_NEED_TRIM_BIT	1
>> 
>> /* Note that bit clear means a trim is needed, so that a newly mounted
>> * filesystem assumes that holes the group need to be trimmed. */
>> #define EXT4_MB_GRP_NEED_TRIM(grp)	\
>> 	(!test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT,&((grp)->bb_state)))
>> 
>> 
>> When calling the TRIM ioctl it can check EXT4_MB_GRP_NEED_TRIM(grp) and skip that group if it hasn't changed since last time.  Otherwise, it should call EXT4_MB_GRP_DONE_TRIM(grp) before doing the actual trim, so it is not racy with another process freeing blocks in that group.
>> 
>> In release_blocks_on_commit() it should call EXT4_MB_GRP_MUST_TRIM() to mark that the group needs to be trimmed again, since blocks were freed in the group.
>> 
>> This can potentially avoid a huge number of TRIMs to the disk, if this is run periodically (e.g. every day) and the filesystem is not remounted all the time, and does not undergo huge allocate/free/allocate cycles during daily use.
>> 
>> It would even be possible to store this bit on-disk ext4_group_desc->bg_flags to avoid the initial "assume every group needs to be trimmed" operation, if that ends up to be a significant factor.  However, that can be done later once some numbers are measured on how significant the initial-mount overhead is.  It is also not free, since it will cause disk IO to set/clear this bit.
>> 
>> Cheers, Andreas
>> 
>> 
>> 
>> 
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


Cheers, Andreas





--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tao Ma Jan. 24, 2011, 2:07 a.m. UTC | #5
On 01/24/2011 09:33 AM, Andreas Dilger wrote:
> On 2011-01-22, at 01:32, Tao Ma wrote:
>> On 01/22/2011 01:53 AM, Andreas Dilger wrote:
>>> Actually, I had another idea which might speed up trim operations significantly.  If the kernel keeps a bit in ext4_group_info->bb_state that indicates whether this group has any freed blocks since it last had a trim operation sent to it, then the kernel can completely avoid doing anything for that group.  This isn't just avoiding the need to scan the bitmap for free ranges, but more importantly it avoids sending the TRIM/UNMAP operation to the disk for free ranges that were previously trimmed in the backing storage.
>>
>> It looks good.
>> Just an extra point. we have to store 'minlen' passed in by fstrim_range. So if the user first try minlen=1mb, and then we give them the number we have trimmed. If he isn't satisfied, he can try minlen=512kb again. In this case we have to check with the old 'minlen' and retry again if minlen<  old_minlen.
>
> Maybe I missed something, but why would one run with minlen=1MB and then run again with minlen=512kB?  I can't see why running this command twice would be better than running it a single time with minlen=512kB, if the hardware actually supports that.
>
I don't know either. But that is the user's choice of 'minlen' and we 
can't provent them from doing like that.

Here is a scenario:
1. run with minlen=1mb, he got that only 1G get trimmed. but the free 
space is more than 3gb actually because of the fragmentation.
2. So he decide to run with minlen=512kb or even smaller len to see 
whether more space can be trimmed.

Is it possible? I guess the answer is yes.

Regards,
Tao
>>> Something like:
>>>
>>> #define EXT4_GROUP_INFO_NEED_TRIM_BIT	1
>>>
>>> /* Note that bit clear means a trim is needed, so that a newly mounted
>>> * filesystem assumes that holes the group need to be trimmed. */
>>> #define EXT4_MB_GRP_NEED_TRIM(grp)	\
>>> 	(!test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT,&((grp)->bb_state)))
>>>
>>>
>>> When calling the TRIM ioctl it can check EXT4_MB_GRP_NEED_TRIM(grp) and skip that group if it hasn't changed since last time.  Otherwise, it should call EXT4_MB_GRP_DONE_TRIM(grp) before doing the actual trim, so it is not racy with another process freeing blocks in that group.
>>>
>>> In release_blocks_on_commit() it should call EXT4_MB_GRP_MUST_TRIM() to mark that the group needs to be trimmed again, since blocks were freed in the group.
>>>
>>> This can potentially avoid a huge number of TRIMs to the disk, if this is run periodically (e.g. every day) and the filesystem is not remounted all the time, and does not undergo huge allocate/free/allocate cycles during daily use.
>>>
>>> It would even be possible to store this bit on-disk ext4_group_desc->bg_flags to avoid the initial "assume every group needs to be trimmed" operation, if that ends up to be a significant factor.  However, that can be done later once some numbers are measured on how significant the initial-mount overhead is.  It is also not free, since it will cause disk IO to set/clear this bit.
>>>
>>> Cheers, Andreas
>>>
>>>
>>>
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
> Cheers, Andreas
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Czerner Jan. 24, 2011, 1:39 p.m. UTC | #6
On Mon, 24 Jan 2011, Tao Ma wrote:

> On 01/24/2011 09:33 AM, Andreas Dilger wrote:
> > On 2011-01-22, at 01:32, Tao Ma wrote:
> > > On 01/22/2011 01:53 AM, Andreas Dilger wrote:
> > > > Actually, I had another idea which might speed up trim operations
> > > > significantly.  If the kernel keeps a bit in ext4_group_info->bb_state
> > > > that indicates whether this group has any freed blocks since it last had
> > > > a trim operation sent to it, then the kernel can completely avoid doing
> > > > anything for that group.  This isn't just avoiding the need to scan the
> > > > bitmap for free ranges, but more importantly it avoids sending the
> > > > TRIM/UNMAP operation to the disk for free ranges that were previously
> > > > trimmed in the backing storage.
> > > 
> > > It looks good.
> > > Just an extra point. we have to store 'minlen' passed in by fstrim_range.
> > > So if the user first try minlen=1mb, and then we give them the number we
> > > have trimmed. If he isn't satisfied, he can try minlen=512kb again. In
> > > this case we have to check with the old 'minlen' and retry again if
> > > minlen<  old_minlen.
> > 
> > Maybe I missed something, but why would one run with minlen=1MB and then run
> > again with minlen=512kB?  I can't see why running this command twice would
> > be better than running it a single time with minlen=512kB, if the hardware
> > actually supports that.
> > 
> I don't know either. But that is the user's choice of 'minlen' and we can't
> provent them from doing like that.
> 
> Here is a scenario:
> 1. run with minlen=1mb, he got that only 1G get trimmed. but the free space is
> more than 3gb actually because of the fragmentation.
> 2. So he decide to run with minlen=512kb or even smaller len to see whether
> more space can be trimmed.
> 
> Is it possible? I guess the answer is yes.

Hi,

I think that this is actually quite useful *feature*. I can imagine that
people might want to run FITRIM with bigger minlen (megabytes or tens of
megabytes) weekly, as it is much faster, especially on fragmented
filesystem. Then, they might want to run FITRIM with smaller minlen (4kB)
monthly to reclaim even the smaller (or all of them) extents.

But I like Andreas' idea, it should improve FITRIM performance
significantly (since we are doing mkfs trim). Minlen can be stored in
high bits of bb_state as number of blocks.


Thanks!
-Lukas


> 
> Regards,
> Tao
> > > > Something like:
> > > > 
> > > > #define EXT4_GROUP_INFO_NEED_TRIM_BIT	1
> > > > 
> > > > /* Note that bit clear means a trim is needed, so that a newly mounted
> > > > * filesystem assumes that holes the group need to be trimmed. */
> > > > #define EXT4_MB_GRP_NEED_TRIM(grp)	\
> > > > 	(!test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT,&((grp)->bb_state)))
> > > > 
> > > > 
> > > > When calling the TRIM ioctl it can check EXT4_MB_GRP_NEED_TRIM(grp) and
> > > > skip that group if it hasn't changed since last time.  Otherwise, it
> > > > should call EXT4_MB_GRP_DONE_TRIM(grp) before doing the actual trim, so
> > > > it is not racy with another process freeing blocks in that group.
> > > > 
> > > > In release_blocks_on_commit() it should call EXT4_MB_GRP_MUST_TRIM() to
> > > > mark that the group needs to be trimmed again, since blocks were freed
> > > > in the group.
> > > > 
> > > > This can potentially avoid a huge number of TRIMs to the disk, if this
> > > > is run periodically (e.g. every day) and the filesystem is not remounted
> > > > all the time, and does not undergo huge allocate/free/allocate cycles
> > > > during daily use.
> > > > 
> > > > It would even be possible to store this bit on-disk
> > > > ext4_group_desc->bg_flags to avoid the initial "assume every group needs
> > > > to be trimmed" operation, if that ends up to be a significant factor.
> > > > However, that can be done later once some numbers are measured on how
> > > > significant the initial-mount overhead is.  It is also not free, since
> > > > it will cause disk IO to set/clear this bit.
> > > > 
> > > > Cheers, Andreas
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > 
> > 
> > Cheers, Andreas
> > 
> > 
> > 
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
>
Andreas Dilger Jan. 24, 2011, 6:51 p.m. UTC | #7
On 2011-01-24, at 06:39, Lukas Czerner wrote:
>> I don't know either. But that is the user's choice of 'minlen' and we can't
>> provent them from doing like that.
>> 
>> Here is a scenario:
>> 1. run with minlen=1mb, he got that only 1G get trimmed. but the free space is
>> more than 3gb actually because of the fragmentation.
>> 2. So he decide to run with minlen=512kb or even smaller len to see whether
>> more space can be trimmed.
>> 
>> Is it possible? I guess the answer is yes.
> 
> Hi,
> 
> I think that this is actually quite useful *feature*. I can imagine that
> people might want to run FITRIM with bigger minlen (megabytes or tens of
> megabytes) weekly, as it is much faster, especially on fragmented
> filesystem. Then, they might want to run FITRIM with smaller minlen (4kB)
> monthly to reclaim even the smaller (or all of them) extents.
> 
> But I like Andreas' idea, it should improve FITRIM performance
> significantly (since we are doing mkfs trim). Minlen can be stored in
> high bits of bb_state as number of blocks.

I'd rather just add a proper field in ext4_group_info to store the length.  I don't think this will change the actual memory usage, since this is already a fairly large and odd-sized structure.

>>>>> Something like:
>>>>> 
>>>>> #define EXT4_GROUP_INFO_NEED_TRIM_BIT	1
>>>>> 
>>>>> /* Note that bit clear means a trim is needed, so that a newly mounted
>>>>> * filesystem assumes that holes the group need to be trimmed. */
>>>>> #define EXT4_MB_GRP_NEED_TRIM(grp)	\
>>>>> 	(!test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT,&((grp)->bb_state)))
>>>>> 
>>>>> 
>>>>> When calling the TRIM ioctl it can check EXT4_MB_GRP_NEED_TRIM(grp) and
>>>>> skip that group if it hasn't changed since last time.  Otherwise, it
>>>>> should call EXT4_MB_GRP_DONE_TRIM(grp) before doing the actual trim, so
>>>>> it is not racy with another process freeing blocks in that group.
>>>>> 
>>>>> In release_blocks_on_commit() it should call EXT4_MB_GRP_MUST_TRIM() to
>>>>> mark that the group needs to be trimmed again, since blocks were freed
>>>>> in the group.
>>>>> 
>>>>> This can potentially avoid a huge number of TRIMs to the disk, if this
>>>>> is run periodically (e.g. every day) and the filesystem is not remounted
>>>>> all the time, and does not undergo huge allocate/free/allocate cycles
>>>>> during daily use.
>>>>> 
>>>>> It would even be possible to store this bit on-disk
>>>>> ext4_group_desc->bg_flags to avoid the initial "assume every group needs
>>>>> to be trimmed" operation, if that ends up to be a significant factor.
>>>>> However, that can be done later once some numbers are measured on how
>>>>> significant the initial-mount overhead is.  It is also not free, since
>>>>> it will cause disk IO to set/clear this bit.
>>>>> 
>>>>> Cheers, Andreas
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>> 
>>> 
>>> 
>>> Cheers, Andreas
>>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>> 
> 
> -- 


Cheers, Andreas





--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tao Ma Jan. 25, 2011, 1:53 a.m. UTC | #8
> On 2011-01-24, at 06:39, Lukas Czerner wrote:
>>> I don't know either. But that is the user's choice of 'minlen' and we
>>> can't
>>> provent them from doing like that.
>>>
>>> Here is a scenario:
>>> 1. run with minlen=1mb, he got that only 1G get trimmed. but the free
>>> space is
>>> more than 3gb actually because of the fragmentation.
>>> 2. So he decide to run with minlen=512kb or even smaller len to see
>>> whether
>>> more space can be trimmed.
>>>
>>> Is it possible? I guess the answer is yes.
>>
>> Hi,
>>
>> I think that this is actually quite useful *feature*. I can imagine that
>> people might want to run FITRIM with bigger minlen (megabytes or tens of
>> megabytes) weekly, as it is much faster, especially on fragmented
>> filesystem. Then, they might want to run FITRIM with smaller minlen
>> (4kB)
>> monthly to reclaim even the smaller (or all of them) extents.
>>
>> But I like Andreas' idea, it should improve FITRIM performance
>> significantly (since we are doing mkfs trim). Minlen can be stored in
>> high bits of bb_state as number of blocks.
>
> I'd rather just add a proper field in ext4_group_info to store the length.
>  I don't think this will change the actual memory usage, since this is
> already a fairly large and odd-sized structure.
yeah, a proper field should be fine.
First, we don't store it in the disk, so we don't care for the layout
compatibility problem.
Second, we don't have much group in the volume. So a new field wouldn't
cost much for us like inodes.

Regards,
Tao

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 851f49b..970e471 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4737,7 +4737,7 @@  ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 		ext4_grpblk_t start, ext4_grpblk_t max, ext4_grpblk_t minblocks)
 {
 	void *bitmap;
-	ext4_grpblk_t next, count = 0;
+	ext4_grpblk_t next, count = 0, free_count = 0;
 	ext4_group_t group;
 	int ret = 0;
 
@@ -4762,6 +4762,7 @@  ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 				break;
 			count += next - start;
 		}
+		free_count += next - start;
 		start = next + 1;
 
 		if (fatal_signal_pending(current)) {
@@ -4775,7 +4776,7 @@  ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 			ext4_lock_group(sb, group);
 		}
 
-		if ((e4b->bd_info->bb_free - count) < minblocks)
+		if ((e4b->bd_info->bb_free - free_count) < minblocks)
 			break;
 	}
 	ext4_unlock_group(sb, group);