diff mbox

Set the initial TRIM information as TRIMMED

Message ID 20111201070052.GA29708@july
State New, archived
Headers show

Commit Message

Kyungmin Park Dec. 1, 2011, 7 a.m. UTC
From: Kyungmin Park <kyungmin.park@samsung.com>

Now trim information doesn't stored at disk so every boot time. it's cleared.
and do the trim all disk groups.
But assume that it's already trimmed at previous time so don't need to trim it again. So set the intial state as trimmed.

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Tao Ma Dec. 1, 2011, 7:39 a.m. UTC | #1
Hi Kyungmin,
On 12/01/2011 03:00 PM, Kyungmin Park wrote:
> From: Kyungmin Park <kyungmin.park@samsung.com>
> 
> Now trim information doesn't stored at disk so every boot time. it's cleared.
> and do the trim all disk groups.
> But assume that it's already trimmed at previous time so don't need to trim it again. So set the intial state as trimmed.
sorry, I don't get your meaning here.
Why can we assume that the group is already trimmed since it isn't
stored in the disk?

Thanks
Tao
> 
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> ---
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index e2d8be8..97ef342 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1098,6 +1098,12 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
>  		goto err;
>  	}
>  	mark_page_accessed(page);
> +
> +	/*
> +	 * TRIM information is not stored at disk so set the initial
> +	 * state as trimmed. Since previous time it's already trimmed all
> +	 */
> +	EXT4_MB_GRP_SET_TRIMMED(this_grp);
>  err:
>  	ext4_mb_put_buddy_page_lock(&e4b);
>  	return ret;

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kyungmin Park Dec. 1, 2011, 8:19 a.m. UTC | #2
On 12/1/11, Tao Ma <tm@tao.ma> wrote:
> Hi Kyungmin,
> On 12/01/2011 03:00 PM, Kyungmin Park wrote:
>> From: Kyungmin Park <kyungmin.park@samsung.com>
>>
>> Now trim information doesn't stored at disk so every boot time. it's
>> cleared.
>> and do the trim all disk groups.
>> But assume that it's already trimmed at previous time so don't need to
>> trim it again. So set the intial state as trimmed.
> sorry, I don't get your meaning here.
> Why can we assume that the group is already trimmed since it isn't
> stored in the disk?
To avoid the first time trim operation.
Every boot time. run the fitrim then it trims all block groups again.
but it's already done at previous time. so don't need to trim it
again.
Doesn't make sense? I think it's not designed behavior.

In your patch at http://patchwork.ozlabs.org/patch/102918/

with the patch:
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.625s
user	0m0.000s
sys	0m1.269s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m0.002s
user	0m0.000s
sys	0m0.001s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m0.002s
user	0m0.000s
sys	0m0.001s

After reboot. it maybe become below

[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m0.002s
user	0m0.000s
sys	0m0.001s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m0.002s
user	0m0.000s
sys	0m0.001s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m0.002s
user	0m0.000s
sys	0m0.001s

Thank you,
Kyungmin Park
>
> Thanks
> Tao
>>
>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>> ---
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index e2d8be8..97ef342 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -1098,6 +1098,12 @@ int ext4_mb_init_group(struct super_block *sb,
>> ext4_group_t group)
>>  		goto err;
>>  	}
>>  	mark_page_accessed(page);
>> +
>> +	/*
>> +	 * TRIM information is not stored at disk so set the initial
>> +	 * state as trimmed. Since previous time it's already trimmed all
>> +	 */
>> +	EXT4_MB_GRP_SET_TRIMMED(this_grp);
>>  err:
>>  	ext4_mb_put_buddy_page_lock(&e4b);
>>  	return ret;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tao Ma Dec. 1, 2011, 8:30 a.m. UTC | #3
On 12/01/2011 04:19 PM, Kyungmin Park wrote:
> On 12/1/11, Tao Ma <tm@tao.ma> wrote:
>> Hi Kyungmin,
>> On 12/01/2011 03:00 PM, Kyungmin Park wrote:
>>> From: Kyungmin Park <kyungmin.park@samsung.com>
>>>
>>> Now trim information doesn't stored at disk so every boot time. it's
>>> cleared.
>>> and do the trim all disk groups.
>>> But assume that it's already trimmed at previous time so don't need to
>>> trim it again. So set the intial state as trimmed.
>> sorry, I don't get your meaning here.
>> Why can we assume that the group is already trimmed since it isn't
>> stored in the disk?
> To avoid the first time trim operation.
> Every boot time. run the fitrim then it trims all block groups again.
> but it's already done at previous time. so don't need to trim it
> again.
> Doesn't make sense? I think it's not designed behavior.
You make the assumption that we run the fitrim every time at boot time.
But what if the user don't run it at all? I guess "run fitrim during
boot" is your firmware's behaviour and it isn't an assumption for all
the other users.

Having said that, this flag will be cleared whenever some
blocks/clusters are freed. So maybe it doesn't matter for setting this
flag during the group initialization.

Thanks
Tao
> 
> In your patch at http://patchwork.ozlabs.org/patch/102918/
> 
> with the patch:
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m5.625s
> user	0m0.000s
> sys	0m1.269s
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m0.002s
> user	0m0.000s
> sys	0m0.001s
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m0.002s
> user	0m0.000s
> sys	0m0.001s
> 
> After reboot. it maybe become below
> 
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m0.002s
> user	0m0.000s
> sys	0m0.001s
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m0.002s
> user	0m0.000s
> sys	0m0.001s
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m0.002s
> user	0m0.000s
> sys	0m0.001s
> 
> Thank you,
> Kyungmin Park
>>
>> Thanks
>> Tao
>>>
>>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>>> ---
>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>> index e2d8be8..97ef342 100644
>>> --- a/fs/ext4/mballoc.c
>>> +++ b/fs/ext4/mballoc.c
>>> @@ -1098,6 +1098,12 @@ int ext4_mb_init_group(struct super_block *sb,
>>> ext4_group_t group)
>>>  		goto err;
>>>  	}
>>>  	mark_page_accessed(page);
>>> +
>>> +	/*
>>> +	 * TRIM information is not stored at disk so set the initial
>>> +	 * state as trimmed. Since previous time it's already trimmed all
>>> +	 */
>>> +	EXT4_MB_GRP_SET_TRIMMED(this_grp);
>>>  err:
>>>  	ext4_mb_put_buddy_page_lock(&e4b);
>>>  	return ret;
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kyungmin Park Dec. 1, 2011, 8:39 a.m. UTC | #4
On 12/1/11, Tao Ma <tm@tao.ma> wrote:
> On 12/01/2011 04:19 PM, Kyungmin Park wrote:
>> On 12/1/11, Tao Ma <tm@tao.ma> wrote:
>>> Hi Kyungmin,
>>> On 12/01/2011 03:00 PM, Kyungmin Park wrote:
>>>> From: Kyungmin Park <kyungmin.park@samsung.com>
>>>>
>>>> Now trim information doesn't stored at disk so every boot time. it's
>>>> cleared.
>>>> and do the trim all disk groups.
>>>> But assume that it's already trimmed at previous time so don't need to
>>>> trim it again. So set the intial state as trimmed.
>>> sorry, I don't get your meaning here.
>>> Why can we assume that the group is already trimmed since it isn't
>>> stored in the disk?
>> To avoid the first time trim operation.
>> Every boot time. run the fitrim then it trims all block groups again.
>> but it's already done at previous time. so don't need to trim it
>> again.
>> Doesn't make sense? I think it's not designed behavior.
> You make the assumption that we run the fitrim every time at boot time.
> But what if the user don't run it at all? I guess "run fitrim during
> boot" is your firmware's behaviour and it isn't an assumption for all
> the other users.
>
> Having said that, this flag will be cleared whenever some
> blocks/clusters are freed. So maybe it doesn't matter for setting this
> flag during the group initialization.
Right,

I hope trim the real updated blocks only.

Thank you,
Kyungmin Park
>
> Thanks
> Tao
>>
>> In your patch at http://patchwork.ozlabs.org/patch/102918/
>>
>> with the patch:
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m5.625s
>> user	0m0.000s
>> sys	0m1.269s
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m0.002s
>> user	0m0.000s
>> sys	0m0.001s
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m0.002s
>> user	0m0.000s
>> sys	0m0.001s
>>
>> After reboot. it maybe become below
>>
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m0.002s
>> user	0m0.000s
>> sys	0m0.001s
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m0.002s
>> user	0m0.000s
>> sys	0m0.001s
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m0.002s
>> user	0m0.000s
>> sys	0m0.001s
>>
>> Thank you,
>> Kyungmin Park
>>>
>>> Thanks
>>> Tao
>>>>
>>>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>>>> ---
>>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>>> index e2d8be8..97ef342 100644
>>>> --- a/fs/ext4/mballoc.c
>>>> +++ b/fs/ext4/mballoc.c
>>>> @@ -1098,6 +1098,12 @@ int ext4_mb_init_group(struct super_block *sb,
>>>> ext4_group_t group)
>>>>  		goto err;
>>>>  	}
>>>>  	mark_page_accessed(page);
>>>> +
>>>> +	/*
>>>> +	 * TRIM information is not stored at disk so set the initial
>>>> +	 * state as trimmed. Since previous time it's already trimmed all
>>>> +	 */
>>>> +	EXT4_MB_GRP_SET_TRIMMED(this_grp);
>>>>  err:
>>>>  	ext4_mb_put_buddy_page_lock(&e4b);
>>>>  	return ret;
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen Dec. 1, 2011, 10:06 p.m. UTC | #5
On 12/1/11 1:00 AM, Kyungmin Park wrote:
> From: Kyungmin Park <kyungmin.park@samsung.com>
> 
> Now trim information doesn't stored at disk so every boot time. it's cleared.
> and do the trim all disk groups.
> But assume that it's already trimmed at previous time so don't need to trim it again. So set the intial state as trimmed.
> 
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> ---
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index e2d8be8..97ef342 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1098,6 +1098,12 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
>  		goto err;
>  	}
>  	mark_page_accessed(page);
> +
> +	/*
> +	 * TRIM information is not stored at disk so set the initial
> +	 * state as trimmed. Since previous time it's already trimmed all
> +	 */
> +	EXT4_MB_GRP_SET_TRIMMED(this_grp);

Hm, so if there were freed but un-trimmed blocks at this point, we will
never trim them until we free _another_ block in the group, right?  That
might be a reasonable tradeoff, but it is somewhat surprising behavior.

i.e. say we do:

mount /mnt
rm -rf /mnt/very_big_file
umount /mnt

mount /mnt
fitrim /mnt

then we won't trim anything at all, right, despite there being many
new free blocks?  Which would be rather unexpected.

If we don't store the trimmed state on disk, I think we should
probably stick with the slower first-time trim, and the more obvious
behavior (all free blocks are always trimmed whenever a trim
command is issued).

-Eric

>  err:
>  	ext4_mb_put_buddy_page_lock(&e4b);
>  	return ret;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kyungmin Park Dec. 2, 2011, 12:01 a.m. UTC | #6
On 12/2/11, Eric Sandeen <sandeen@redhat.com> wrote:
> On 12/1/11 1:00 AM, Kyungmin Park wrote:
>> From: Kyungmin Park <kyungmin.park@samsung.com>
>>
>> Now trim information doesn't stored at disk so every boot time. it's
>> cleared.
>> and do the trim all disk groups.
>> But assume that it's already trimmed at previous time so don't need to
>> trim it again. So set the intial state as trimmed.
>>
>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>> ---
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index e2d8be8..97ef342 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -1098,6 +1098,12 @@ int ext4_mb_init_group(struct super_block *sb,
>> ext4_group_t group)
>>  		goto err;
>>  	}
>>  	mark_page_accessed(page);
>> +
>> +	/*
>> +	 * TRIM information is not stored at disk so set the initial
>> +	 * state as trimmed. Since previous time it's already trimmed all
>> +	 */
>> +	EXT4_MB_GRP_SET_TRIMMED(this_grp);
>
Hi,

> Hm, so if there were freed but un-trimmed blocks at this point, we will
> never trim them until we free _another_ block in the group, right?  That
> might be a reasonable tradeoff, but it is somewhat surprising behavior.
>
> i.e. say we do:
>
> mount /mnt
> rm -rf /mnt/very_big_file
> umount /mnt
>
> mount /mnt
> fitrim /mnt
another word, you can run fitrim after rm -rf
yes, it's trade-off.

In my case, phone scenario, no umount system and data partition. it's
burden to trim at boot time. it has still slower boot time.
some daemon or program run fitrm at filesystem. it consumes time and
hurt other boot processes.
>
> then we won't trim anything at all, right, despite there being many
> new free blocks?  Which would be rather unexpected.
>
> If we don't store the trimmed state on disk, I think we should
> probably stick with the slower first-time trim, and the more obvious
> behavior (all free blocks are always trimmed whenever a trim
> command is issued).

Umm how do you think, introduce the trim force command for this?

Thank you,
Kyungmin Park
>
> -Eric
>
>>  err:
>>  	ext4_mb_put_buddy_page_lock(&e4b);
>>  	return ret;
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Freemyer Dec. 2, 2011, 12:50 a.m. UTC | #7
On Thu, Dec 1, 2011 at 7:01 PM, Kyungmin Park <kmpark@infradead.org> wrote:
> On 12/2/11, Eric Sandeen <sandeen@redhat.com> wrote:
<snip>
> Hi,
>
>> Hm, so if there were freed but un-trimmed blocks at this point, we will
>> never trim them until we free _another_ block in the group, right?  That
>> might be a reasonable tradeoff, but it is somewhat surprising behavior.
>>
>> i.e. say we do:
>>
>> mount /mnt
>> rm -rf /mnt/very_big_file
>> umount /mnt

does umount need to force a fitrim if it's available?

>> mount /mnt
>> fitrim /mnt

That way if umount is clean, then the new logic could kick for the
next mount, but if there was not a clean umount, then in addition to
replaying the journal, the next mount could leave the fitrim info not
initialized.

I'm sure there are smarter ways to track it.  The biggest thing I'm
suggesting is for there to at least be a single boolean "fitrim'ed
state flag for the whole filesystem.  that could cleared on mount and
set on a clean umount.

Greg
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen Dec. 2, 2011, 3:30 p.m. UTC | #8
On 12/1/11 6:01 PM, Kyungmin Park wrote:
> On 12/2/11, Eric Sandeen <sandeen@redhat.com> wrote:
>> On 12/1/11 1:00 AM, Kyungmin Park wrote:
>>> From: Kyungmin Park <kyungmin.park@samsung.com>
>>>
>>> Now trim information doesn't stored at disk so every boot time. it's
>>> cleared.
>>> and do the trim all disk groups.
>>> But assume that it's already trimmed at previous time so don't need to
>>> trim it again. So set the intial state as trimmed.
>>>
>>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>>> ---
>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>> index e2d8be8..97ef342 100644
>>> --- a/fs/ext4/mballoc.c
>>> +++ b/fs/ext4/mballoc.c
>>> @@ -1098,6 +1098,12 @@ int ext4_mb_init_group(struct super_block *sb,
>>> ext4_group_t group)
>>>  		goto err;
>>>  	}
>>>  	mark_page_accessed(page);
>>> +
>>> +	/*
>>> +	 * TRIM information is not stored at disk so set the initial
>>> +	 * state as trimmed. Since previous time it's already trimmed all
>>> +	 */
>>> +	EXT4_MB_GRP_SET_TRIMMED(this_grp);
>>
> Hi,
> 
>> Hm, so if there were freed but un-trimmed blocks at this point, we will
>> never trim them until we free _another_ block in the group, right?  That
>> might be a reasonable tradeoff, but it is somewhat surprising behavior.
>>
>> i.e. say we do:
>>
>> mount /mnt
>> rm -rf /mnt/very_big_file
>> umount /mnt
>>
>> mount /mnt
>> fitrim /mnt
> another word, you can run fitrim after rm -rf
> yes, it's trade-off.
> 
> In my case, phone scenario, no umount system and data partition. it's
> burden to trim at boot time. it has still slower boot time.
> some daemon or program run fitrm at filesystem. it consumes time and
> hurt other boot processes.
>>
>> then we won't trim anything at all, right, despite there being many
>> new free blocks?  Which would be rather unexpected.
>>
>> If we don't store the trimmed state on disk, I think we should
>> probably stick with the slower first-time trim, and the more obvious
>> behavior (all free blocks are always trimmed whenever a trim
>> command is issued).
> 
> Umm how do you think, introduce the trim force command for this?

One that ignores the state flag?  Maybe, but IMHO that's getting even
kludgier, or at least more complicated.  I think it will be difficult
getting that past the vfs folks.  "Here is a new flag which says that
when we issue a trim command, we really should issue the trim command,
even if we are issuing it on ext4, and ext4 hasn't kept proper track,
and defaults to assuming that there is no work to do on a trim."

If we don't have space on-disk for the state, then we are kind of stuck.

But marking groups as "trimmed" when they are not, thereby ignoring
subsequent trim commands, strikes me as a very surprising behavior for
the user.

-Eric

> Thank you,
> Kyungmin Park
>>
>> -Eric
>>
>>>  err:
>>>  	ext4_mb_put_buddy_page_lock(&e4b);
>>>  	return ret;
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen Dec. 2, 2011, 3:45 p.m. UTC | #9
On 12/1/11 6:01 PM, Kyungmin Park wrote:
> On 12/2/11, Eric Sandeen <sandeen@redhat.com> wrote:
>> On 12/1/11 1:00 AM, Kyungmin Park wrote:
>>> From: Kyungmin Park <kyungmin.park@samsung.com>
>>>
>>> Now trim information doesn't stored at disk so every boot time. it's
>>> cleared.
>>> and do the trim all disk groups.
>>> But assume that it's already trimmed at previous time so don't need to
>>> trim it again. So set the intial state as trimmed.
>>>
>>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>>> ---
>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>> index e2d8be8..97ef342 100644
>>> --- a/fs/ext4/mballoc.c
>>> +++ b/fs/ext4/mballoc.c
>>> @@ -1098,6 +1098,12 @@ int ext4_mb_init_group(struct super_block *sb,
>>> ext4_group_t group)
>>>  		goto err;
>>>  	}
>>>  	mark_page_accessed(page);
>>> +
>>> +	/*
>>> +	 * TRIM information is not stored at disk so set the initial
>>> +	 * state as trimmed. Since previous time it's already trimmed all
>>> +	 */
>>> +	EXT4_MB_GRP_SET_TRIMMED(this_grp);
>>
> Hi,
> 
>> Hm, so if there were freed but un-trimmed blocks at this point, we will
>> never trim them until we free _another_ block in the group, right?  That
>> might be a reasonable tradeoff, but it is somewhat surprising behavior.
>>
>> i.e. say we do:
>>
>> mount /mnt
>> rm -rf /mnt/very_big_file
>> umount /mnt
>>
>> mount /mnt
>> fitrim /mnt
> another word, you can run fitrim after rm -rf
> yes, it's trade-off.
> 
> In my case, phone scenario, no umount system and data partition. it's
> burden to trim at boot time. it has still slower boot time.
> some daemon or program run fitrm at filesystem. it consumes time and
> hurt other boot processes.

Why not just do smaller FITRIM commands in the background while the phone
is running?  Why do you want to trim the whole fs at boot time
in the first place?

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen Dec. 2, 2011, 3:57 p.m. UTC | #10
On 12/1/11 6:01 PM, Kyungmin Park wrote:
> On 12/2/11, Eric Sandeen <sandeen@redhat.com> wrote:
>> On 12/1/11 1:00 AM, Kyungmin Park wrote:
>>> From: Kyungmin Park <kyungmin.park@samsung.com>
>>>
>>> Now trim information doesn't stored at disk so every boot time. it's
>>> cleared.
>>> and do the trim all disk groups.
>>> But assume that it's already trimmed at previous time so don't need to
>>> trim it again. So set the intial state as trimmed.
>>>
>>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>>> ---
>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>> index e2d8be8..97ef342 100644
>>> --- a/fs/ext4/mballoc.c
>>> +++ b/fs/ext4/mballoc.c
>>> @@ -1098,6 +1098,12 @@ int ext4_mb_init_group(struct super_block *sb,
>>> ext4_group_t group)
>>>  		goto err;
>>>  	}
>>>  	mark_page_accessed(page);
>>> +
>>> +	/*
>>> +	 * TRIM information is not stored at disk so set the initial
>>> +	 * state as trimmed. Since previous time it's already trimmed all
>>> +	 */
>>> +	EXT4_MB_GRP_SET_TRIMMED(this_grp);
>>
> Hi,
> 
>> Hm, so if there were freed but un-trimmed blocks at this point, we will
>> never trim them until we free _another_ block in the group, right?  That
>> might be a reasonable tradeoff, but it is somewhat surprising behavior.
>>
>> i.e. say we do:
>>
>> mount /mnt
>> rm -rf /mnt/very_big_file
>> umount /mnt
>>
>> mount /mnt
>> fitrim /mnt
> another word, you can run fitrim after rm -rf
> yes, it's trade-off.
> 
> In my case, phone scenario, no umount system and data partition. it's
> burden to trim at boot time. it has still slower boot time.
> some daemon or program run fitrm at filesystem. it consumes time and
> hurt other boot processes.
>>
>> then we won't trim anything at all, right, despite there being many
>> new free blocks?  Which would be rather unexpected.
>>
>> If we don't store the trimmed state on disk, I think we should
>> probably stick with the slower first-time trim, and the more obvious
>> behavior (all free blocks are always trimmed whenever a trim
>> command is issued).
> 
> Umm how do you think, introduce the trim force command for this?

Alternately, can we use a bit in bg_flags to keep a better view of this
state on disk, if this is critical?

-Eric

> Thank you,
> Kyungmin Park

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kyungmin Park Dec. 5, 2011, 9:55 a.m. UTC | #11
On 12/3/11, Eric Sandeen <sandeen@redhat.com> wrote:
> On 12/1/11 6:01 PM, Kyungmin Park wrote:
>> On 12/2/11, Eric Sandeen <sandeen@redhat.com> wrote:
>>> On 12/1/11 1:00 AM, Kyungmin Park wrote:
>>>> From: Kyungmin Park <kyungmin.park@samsung.com>
>>>>
>>>> Now trim information doesn't stored at disk so every boot time. it's
>>>> cleared.
>>>> and do the trim all disk groups.
>>>> But assume that it's already trimmed at previous time so don't need to
>>>> trim it again. So set the intial state as trimmed.
>>>>
>>>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>>>> ---
>>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>>> index e2d8be8..97ef342 100644
>>>> --- a/fs/ext4/mballoc.c
>>>> +++ b/fs/ext4/mballoc.c
>>>> @@ -1098,6 +1098,12 @@ int ext4_mb_init_group(struct super_block *sb,
>>>> ext4_group_t group)
>>>>  		goto err;
>>>>  	}
>>>>  	mark_page_accessed(page);
>>>> +
>>>> +	/*
>>>> +	 * TRIM information is not stored at disk so set the initial
>>>> +	 * state as trimmed. Since previous time it's already trimmed all
>>>> +	 */
>>>> +	EXT4_MB_GRP_SET_TRIMMED(this_grp);
>>>
>> Hi,
>>
>>> Hm, so if there were freed but un-trimmed blocks at this point, we will
>>> never trim them until we free _another_ block in the group, right?  That
>>> might be a reasonable tradeoff, but it is somewhat surprising behavior.
>>>
>>> i.e. say we do:
>>>
>>> mount /mnt
>>> rm -rf /mnt/very_big_file
>>> umount /mnt
>>>
>>> mount /mnt
>>> fitrim /mnt
>> another word, you can run fitrim after rm -rf
>> yes, it's trade-off.
>>
>> In my case, phone scenario, no umount system and data partition. it's
>> burden to trim at boot time. it has still slower boot time.
>> some daemon or program run fitrm at filesystem. it consumes time and
>> hurt other boot processes.
>>>
>>> then we won't trim anything at all, right, despite there being many
>>> new free blocks?  Which would be rather unexpected.
>>>
>>> If we don't store the trimmed state on disk, I think we should
>>> probably stick with the slower first-time trim, and the more obvious
>>> behavior (all free blocks are always trimmed whenever a trim
>>> command is issued).
>>
>> Umm how do you think, introduce the trim force command for this?
>
> Alternately, can we use a bit in bg_flags to keep a better view of this
> state on disk, if this is critical?

BTW, bg_flags in struct ext4_group_desc is same as ext4_group_info's bb_state?

Thank you,
Kyungmin Park
>
> -Eric
>
>> Thank you,
>> Kyungmin Park
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Czerner Dec. 5, 2011, 10:17 a.m. UTC | #12
On Thu, 1 Dec 2011, Kyungmin Park wrote:

> From: Kyungmin Park <kyungmin.park@samsung.com>
> 
> Now trim information doesn't stored at disk so every boot time. it's cleared.
> and do the trim all disk groups.
> But assume that it's already trimmed at previous time so don't need to trim it again. So set the intial state as trimmed.

Hi,

I am sorry, but from the code and comments I have seen, I do not think
it does make sense to change that behaviour. I agree that discarding the
whole file system after the mount might not be necessarily needed, but
on the other hand, you can never assume that the blocks were already
trimmed, since it most likely will not be true.

I think that the bigger problem is "running fitrim at boot time" which
does not make sense to me, because you'll be perfectly fine leaving this
up to the cron (or something similar). Also when you're booting you need
to be done ASAP so why to bother with other unnecessary operations ?

The funniest thing about this patch is (no offense, really) that you've
solved the problem of "fitrim slowing down the boot process" by changing
kernel logic to assume something which will most likely be false, instead
of just *not* doing boot time fitrim at the first place, because this is
exactly what will happen with this patch.

Thanks!
-Lukas

> 
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> ---
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index e2d8be8..97ef342 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1098,6 +1098,12 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
>  		goto err;
>  	}
>  	mark_page_accessed(page);
> +
> +	/*
> +	 * TRIM information is not stored at disk so set the initial
> +	 * state as trimmed. Since previous time it's already trimmed all
> +	 */
> +	EXT4_MB_GRP_SET_TRIMMED(this_grp);
>  err:
>  	ext4_mb_put_buddy_page_lock(&e4b);
>  	return ret;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Lukas Czerner Dec. 5, 2011, 10:25 a.m. UTC | #13
On Thu, 1 Dec 2011, Greg Freemyer wrote:

> On Thu, Dec 1, 2011 at 7:01 PM, Kyungmin Park <kmpark@infradead.org> wrote:
> > On 12/2/11, Eric Sandeen <sandeen@redhat.com> wrote:
> <snip>
> > Hi,
> >
> >> Hm, so if there were freed but un-trimmed blocks at this point, we will
> >> never trim them until we free _another_ block in the group, right?  That
> >> might be a reasonable tradeoff, but it is somewhat surprising behavior.
> >>
> >> i.e. say we do:
> >>
> >> mount /mnt
> >> rm -rf /mnt/very_big_file
> >> umount /mnt
> 
> does umount need to force a fitrim if it's available?

Please, do not. That will be weird, unexpected and also annoying,
because fitrim can take a bit longer depending on the hadrware. So I do
not thing doing fitrim at mount/umount time is a good idea.

> 
> >> mount /mnt
> >> fitrim /mnt
> 
> That way if umount is clean, then the new logic could kick for the
> next mount, but if there was not a clean umount, then in addition to
> replaying the journal, the next mount could leave the fitrim info not
> initialized.
> 
> I'm sure there are smarter ways to track it.  The biggest thing I'm
> suggesting is for there to at least be a single boolean "fitrim'ed
> state flag for the whole filesystem.  that could cleared on mount and
> set on a clean umount.

The flag makes some sense (if we really need some mechanism like that).
But rather than setting it on (mount/umount) time we should set it after
the fitrim and clear it after the unlink.

Thanks!
-Lukas

> 
> Greg
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
Kyungmin Park Dec. 5, 2011, 10:35 a.m. UTC | #14
On 12/5/11, Lukas Czerner <lczerner@redhat.com> wrote:
> On Thu, 1 Dec 2011, Kyungmin Park wrote:
>
>> From: Kyungmin Park <kyungmin.park@samsung.com>
>>
>> Now trim information doesn't stored at disk so every boot time. it's
>> cleared.
>> and do the trim all disk groups.
>> But assume that it's already trimmed at previous time so don't need to
>> trim it again. So set the intial state as trimmed.
>
> Hi,
>
> I am sorry, but from the code and comments I have seen, I do not think
> it does make sense to change that behaviour. I agree that discarding the
> whole file system after the mount might not be necessarily needed, but
> on the other hand, you can never assume that the blocks were already
> trimmed, since it most likely will not be true.
>
> I think that the bigger problem is "running fitrim at boot time" which
> does not make sense to me, because you'll be perfectly fine leaving this
> up to the cron (or something similar). Also when you're booting you need
> to be done ASAP so why to bother with other unnecessary operations ?

Okay please ignore the boot time, but still first trim operation needs
to be considered.
Unlike the pc world, the phone is turned off/on frequently. at that
time it trims all block again.
I know it's not hurt the flash life time, but want to avoid to useless
trim operation on flash.

Yes, the best solution is that trim information is also stored at disk.
Can you confirm the idea? store the trim information at bg_flags at
struct ext4_block_desc?

Thank you,
Kyungmin Park
>
> The funniest thing about this patch is (no offense, really) that you've
> solved the problem of "fitrim slowing down the boot process" by changing
> kernel logic to assume something which will most likely be false, instead
> of just *not* doing boot time fitrim at the first place, because this is
> exactly what will happen with this patch.
>
> Thanks!
> -Lukas
>
>>
>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>> ---
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index e2d8be8..97ef342 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -1098,6 +1098,12 @@ int ext4_mb_init_group(struct super_block *sb,
>> ext4_group_t group)
>>  		goto err;
>>  	}
>>  	mark_page_accessed(page);
>> +
>> +	/*
>> +	 * TRIM information is not stored at disk so set the initial
>> +	 * state as trimmed. Since previous time it's already trimmed all
>> +	 */
>> +	EXT4_MB_GRP_SET_TRIMMED(this_grp);
>>  err:
>>  	ext4_mb_put_buddy_page_lock(&e4b);
>>  	return ret;
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
> --
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Czerner Dec. 5, 2011, 11:09 a.m. UTC | #15
On Mon, 5 Dec 2011, Kyungmin Park wrote:

> On 12/5/11, Lukas Czerner <lczerner@redhat.com> wrote:
> > On Thu, 1 Dec 2011, Kyungmin Park wrote:
> >
> >> From: Kyungmin Park <kyungmin.park@samsung.com>
> >>
> >> Now trim information doesn't stored at disk so every boot time. it's
> >> cleared.
> >> and do the trim all disk groups.
> >> But assume that it's already trimmed at previous time so don't need to
> >> trim it again. So set the intial state as trimmed.
> >
> > Hi,
> >
> > I am sorry, but from the code and comments I have seen, I do not think
> > it does make sense to change that behaviour. I agree that discarding the
> > whole file system after the mount might not be necessarily needed, but
> > on the other hand, you can never assume that the blocks were already
> > trimmed, since it most likely will not be true.
> >
> > I think that the bigger problem is "running fitrim at boot time" which
> > does not make sense to me, because you'll be perfectly fine leaving this
> > up to the cron (or something similar). Also when you're booting you need
> > to be done ASAP so why to bother with other unnecessary operations ?
> 
> Okay please ignore the boot time, but still first trim operation needs
> to be considered.
> Unlike the pc world, the phone is turned off/on frequently. at that
> time it trims all block again.

I am not sure that this is entirely true. I am turning off my pc way to
more often than my phone. But it might be just me. Also some problem
might be mounting/umounting your external flash when you plug your
phone to the pc. So I can see than the mobile world probably requires
some solution. Not sure what it should be though.

> I know it's not hurt the flash life time, but want to avoid to useless
> trim operation on flash.

Actually, with those chap flash cards I think that it might hurt its
life time, when you're doing discard way too often. But it really
depends on the wear-leveling algorithms it uses internally.

> 
> Yes, the best solution is that trim information is also stored at disk.
> Can you confirm the idea? store the trim information at bg_flags at
> struct ext4_block_desc?

I do not like the idea very much. I would rather find some "smart" way
to overcome this problem, than adding more on disk format
incompatibilities.

We know that the performance problem starts to appear when the flash
does not have enough space to write new blocks without moving old
fragments of the erase block to the new position. It essentially means
that there is not enough "free" space which is the space which is free
and which has not been used since the last discard.

We know the size of the flash (from the user space), we also know the
number of kilobytes written to the file system in case of ext4
(/sys/fs/ext4/<device>/lifetime_write_kbytes) and if we can track the
change between fitrim calls we can more-or-less determine how much
"free" space is there on the flash available to efficiently write new
block. That said, I think that some user space heuristic can be applied
to determine when is the right time tu run fitrim again and it might
significantly help to solve the problem. Some experimenting has to be
done though, to get some numbers.

Also note that you do not have to run fitrim on the whole file system.
It can be run on the part of the file system as well. though the result
might differ between the file systems (some migh discard more blocks
than requested because it is efficient to do so, or really hard to do
differently).

The idea would require some more thoughts, but what is your opinion
about this approach ? Note that it can be done purely from user space.

Thanks!
-Lukas

> 
> Thank you,
> Kyungmin Park
> >
> > The funniest thing about this patch is (no offense, really) that you've
> > solved the problem of "fitrim slowing down the boot process" by changing
> > kernel logic to assume something which will most likely be false, instead
> > of just *not* doing boot time fitrim at the first place, because this is
> > exactly what will happen with this patch.
> >
> > Thanks!
> > -Lukas
> >
> >>
> >> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> >> ---
> >> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> >> index e2d8be8..97ef342 100644
> >> --- a/fs/ext4/mballoc.c
> >> +++ b/fs/ext4/mballoc.c
> >> @@ -1098,6 +1098,12 @@ int ext4_mb_init_group(struct super_block *sb,
> >> ext4_group_t group)
> >>  		goto err;
> >>  	}
> >>  	mark_page_accessed(page);
> >> +
> >> +	/*
> >> +	 * TRIM information is not stored at disk so set the initial
> >> +	 * state as trimmed. Since previous time it's already trimmed all
> >> +	 */
> >> +	EXT4_MB_GRP_SET_TRIMMED(this_grp);
> >>  err:
> >>  	ext4_mb_put_buddy_page_lock(&e4b);
> >>  	return ret;
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >
> > --
> >
>
diff mbox

Patch

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index e2d8be8..97ef342 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1098,6 +1098,12 @@  int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
 		goto err;
 	}
 	mark_page_accessed(page);
+
+	/*
+	 * TRIM information is not stored at disk so set the initial
+	 * state as trimmed. Since previous time it's already trimmed all
+	 */
+	EXT4_MB_GRP_SET_TRIMMED(this_grp);
 err:
 	ext4_mb_put_buddy_page_lock(&e4b);
 	return ret;