ext4: group cache is added in ext4_mb_discard_preallocations()

Message ID p2yac8f92701003300536kd52f7ad0p43755fcc2382423b@mail.gmail.com
State Rejected, archived
Commit Message

jing zhang March 30, 2010, 12:36 p.m. UTC
From: Jing Zhang <zj.barak@gmail.com>

Date: Tue Mar 30 20:35:22 2010

With the added cache, better group locality may be achieved when
allocating blocks.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Jing Zhang <zj.barak@gmail.com>

---

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Aneesh Kumar K.V March 30, 2010, 6:37 p.m. UTC | #1
On Tue, 30 Mar 2010 20:36:17 +0800, jing zhang <zj.barak@gmail.com> wrote:
> From: Jing Zhang <zj.barak@gmail.com>
> 
> Date: Tue Mar 30 20:35:22 2010
> 
> With the added cache, better group locality may be achieved when
> allocating blocks.
> 
> Cc: Theodore Ts'o <tytso@mit.edu>
> Cc: Andreas Dilger <adilger@sun.com>
> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
> Cc: "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com>
> Signed-off-by: Jing Zhang <zj.barak@gmail.com>
> 
> ---
> 
> --- linux-2.6.32/fs/ext4/mballoc.c	2009-12-03 11:51:22.000000000 +0800
> +++ ext4_mm_leak/mballoc-13.c	2010-03-30 20:28:08.000000000 +0800
> @@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
>  	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
>  	int ret;
>  	int freed = 0;
> +	static ext4_group_t grp_cache = 0;
> 
>  	trace_ext4_mb_discard_preallocations(sb, needed);
> -	for (i = 0; i < ngroups && needed > 0; i++) {
> -		ret = ext4_mb_discard_group_preallocations(sb, i, needed);
> +	if (needed <= 0)
> +		return freed;
> +	for (i = 0; i < ngroups; i++) {
> +		if (grp_cache >= ngroups)
> +			grp_cache -= ngroups;
> +		ret = ext4_mb_discard_group_preallocations(sb, grp_cache, needed);
>  		freed += ret;
>  		needed -= ret;
> +		if (needed <= 0)
> +			break;
> +		grp_cache++;
>  	}
> 
>  	return freed;

Can you explain this further?

-aneesh
Andreas Dilger March 31, 2010, 3:03 p.m. UTC | #2
On 2010-03-30, at 06:36, jing zhang wrote:
> --- linux-2.6.32/fs/ext4/mballoc.c	2009-12-03 11:51:22.000000000 +0800
> +++ ext4_mm_leak/mballoc-13.c	2010-03-30 20:28:08.000000000 +0800
> @@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
> 	trace_ext4_mb_discard_preallocations(sb, needed);
> -	for (i = 0; i < ngroups && needed > 0; i++) {
> -		ret = ext4_mb_discard_group_preallocations(sb, i, needed);
> +	if (needed <= 0)
> +		return freed;
> +	for (i = 0; i < ngroups; i++) {
> +		if (grp_cache >= ngroups)
> +			grp_cache -= ngroups;
> +		ret = ext4_mb_discard_group_preallocations(sb, grp_cache, needed);


Anything that is walking every group in the filesystem is going to hit  
problems on large filesystems.  This seems like something that needs  
to be fixed in a different way (e.g. keeping a list of preallocations).
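
A rough userspace sketch of what "keeping a list of preallocations" could look like. Everything here (struct pa_group, struct fs_model, discard_from_list) is hypothetical and exists only for illustration; it is not ext4 code and ignores locking entirely. The idea is that only groups holding preallocations are linked, so the discard path never walks every group:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: none of these names exist in ext4. */
struct pa_group {
	unsigned int group;	/* block group number */
	int prealloc_blocks;	/* blocks currently preallocated in this group */
	struct pa_group *next;	/* next group that still holds preallocations */
};

struct fs_model {
	struct pa_group *pa_list;	/* only groups with preallocations are linked */
	unsigned int visited;		/* groups touched by the last discard call */
};

/* Discard up to `needed` blocks, visiting only the groups on the list. */
static int discard_from_list(struct fs_model *fs, int needed)
{
	struct pa_group **link;
	int freed = 0;

	assert(fs != NULL);
	link = &fs->pa_list;
	fs->visited = 0;
	while (*link != NULL && needed > 0) {
		struct pa_group *g = *link;
		int take = g->prealloc_blocks < needed ? g->prealloc_blocks : needed;

		fs->visited++;
		g->prealloc_blocks -= take;
		freed += take;
		needed -= take;

		if (g->prealloc_blocks == 0)
			*link = g->next;	/* drained: unlink the group */
		else
			link = &g->next;	/* still has blocks: keep it listed */
	}
	return freed;
}
```

With two listed groups, the cost of a discard depends on the length of the list rather than on the total number of groups in the filesystem.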

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

jing zhang March 31, 2010, 3:10 p.m. UTC | #3
2010/3/31, Aneesh Kumar K. V <aneesh.kumar@linux.vnet.ibm.com>:
> On Tue, 30 Mar 2010 20:36:17 +0800, jing zhang <zj.barak@gmail.com> wrote:
>> From: Jing Zhang <zj.barak@gmail.com>
>>
>> Date: Tue Mar 30 20:35:22 2010
>>
>> With the added cache, better group locality may be achieved when
>> allocating blocks.
>>
>> Cc: Theodore Ts'o <tytso@mit.edu>
>> Cc: Andreas Dilger <adilger@sun.com>
>> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
>> Cc: "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com>
>> Signed-off-by: Jing Zhang <zj.barak@gmail.com>
>>
>> ---
>>
>> --- linux-2.6.32/fs/ext4/mballoc.c	2009-12-03 11:51:22.000000000 +0800
>> +++ ext4_mm_leak/mballoc-13.c	2010-03-30 20:28:08.000000000 +0800
>> @@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
>>  	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
>>  	int ret;
>>  	int freed = 0;
>> +	static ext4_group_t grp_cache = 0;
>>
>>  	trace_ext4_mb_discard_preallocations(sb, needed);
>> -	for (i = 0; i < ngroups && needed > 0; i++) {
>> -		ret = ext4_mb_discard_group_preallocations(sb, i, needed);
>> +	if (needed <= 0)
>> +		return freed;
>> +	for (i = 0; i < ngroups; i++) {
>> +		if (grp_cache >= ngroups)
>> +			grp_cache -= ngroups;
>> +		ret = ext4_mb_discard_group_preallocations(sb, grp_cache, needed);
>>  		freed += ret;
>>  		needed -= ret;
>> +		if (needed <= 0)
>> +			break;
>> +		grp_cache++;
>>  	}
>>
>>  	return freed;
>
> Can you explain this further?
>
> -aneesh
>

The added cache remembers the group where the previous discard stopped,
so the next call first checks whether blocks preallocated in that group
are still available. If so, they are discarded and used for the
allocation without changing groups, which gives better group locality.

What is more, ext4_mb_discard_group_preallocations() is allowed to
discard as much preallocation as possible, yielding when it has to.

     - zj
Theodore Ts'o April 6, 2010, 6:31 p.m. UTC | #4
On Tue, Mar 30, 2010 at 08:36:17PM +0800, jing zhang wrote:
> --- linux-2.6.32/fs/ext4/mballoc.c	2009-12-03 11:51:22.000000000 +0800
> +++ ext4_mm_leak/mballoc-13.c	2010-03-30 20:28:08.000000000 +0800
> @@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
>  	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
>  	int ret;
>  	int freed = 0;
> +	static ext4_group_t grp_cache = 0;

This is a problem right there.  Remember that there could be multiple
file systems mounted so a static variable is fundamentally flawed.

In fact, we could have one filesystem which has more than 3 times
the number of groups as another file system.  I'll leave it as an
exercise for the reader why your patch would be fundamentally flawed in
that case.
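
One possible way to work through that exercise is the userspace sketch below. It is a model of the patched loop, not kernel code, and it assumes that no blocks are ever freed so the loop always runs to completion. walk_shared() keeps a single static cursor as the patch does; walk_per_fs() and struct fs_model are hypothetical names showing the cursor stored per filesystem instead. Each function counts how often the loop would hand an out-of-range group number to the discard routine:

```c
#include <assert.h>

/* One cursor shared by every filesystem, as in the patch. */
static unsigned int grp_cache;

/* Count how many times the patched loop would use an invalid group. */
static unsigned int walk_shared(unsigned int ngroups)
{
	unsigned int i, bad = 0;

	for (i = 0; i < ngroups; i++) {
		if (grp_cache >= ngroups)
			grp_cache -= ngroups;	/* subtracts only once per iteration */
		if (grp_cache >= ngroups)
			bad++;			/* the patch would pass this group on */
		grp_cache++;			/* assumption: nothing freed, keep going */
	}
	return bad;
}

/* Hypothetical alternative: the resume point lives with the filesystem. */
struct fs_model {
	unsigned int ngroups;
	unsigned int cursor;
};

static unsigned int walk_per_fs(struct fs_model *fs)
{
	unsigned int i, bad = 0;

	assert(fs->ngroups > 0);
	for (i = 0; i < fs->ngroups; i++) {
		fs->cursor %= fs->ngroups;	/* always lands inside this filesystem */
		if (fs->cursor >= fs->ngroups)
			bad++;
		fs->cursor++;
	}
	return bad;
}
```

In this model, calling walk_shared(400) and then walk_shared(100) leaves the shared cursor at 400 when the smaller filesystem starts, and a single subtraction of 100 is not enough to bring it back into range. The per-filesystem variant never leaves its own range, whatever the other filesystems do.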

The other thing to note is that this case only gets hit if the file
system is so full that we need to empty preallocations.  So this means
hitting this case is rare, which raises two questions: (1) is it worth
it to optimize this case in the first place (is it really that
expensive to iterate over all the groups to discard the
preallocations); (2) can we test this case well?

						- Ted
jing zhang April 7, 2010, 12:50 p.m. UTC | #5
2010/4/7, tytso@mit.edu <tytso@mit.edu>:
> On Tue, Mar 30, 2010 at 08:36:17PM +0800, jing zhang wrote:
>> --- linux-2.6.32/fs/ext4/mballoc.c	2009-12-03 11:51:22.000000000 +0800
>> +++ ext4_mm_leak/mballoc-13.c	2010-03-30 20:28:08.000000000 +0800
>> @@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
>>  	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
>>  	int ret;
>>  	int freed = 0;
>> +	static ext4_group_t grp_cache = 0;
>
> This is a problem right there.  Remember that there could be multiple
> file systems mounted so a static variable is fundamentally flawed.
>

cool, the static in my patch is a fatal error.

          - zj

> In fact, we could have one filesystem which has more than 3 times
> the number of groups as another file system.  I'll leave it as an
> exercise for the reader why your patch would be fundamentally flawed in
> that case.
>
> The other thing to note is that this case only gets hit if the file
> system is so full that we need to empty preallocations.  So this means
> hitting this case is rare, which raises two questions: (1) is it worth
> it to optimize this case in the first place (is it really that
> expensive to iterate over all the groups to discard the
> preallocations); (2) can we test this case well?
>
> 						- Ted
>

Patch

--- linux-2.6.32/fs/ext4/mballoc.c	2009-12-03 11:51:22.000000000 +0800
+++ ext4_mm_leak/mballoc-13.c	2010-03-30 20:28:08.000000000 +0800
@@ -4183,12 +4183,20 @@  static int ext4_mb_discard_preallocation
 	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
 	int ret;
 	int freed = 0;
+	static ext4_group_t grp_cache = 0;

 	trace_ext4_mb_discard_preallocations(sb, needed);
-	for (i = 0; i < ngroups && needed > 0; i++) {
-		ret = ext4_mb_discard_group_preallocations(sb, i, needed);
+	if (needed <= 0)
+		return freed;
+	for (i = 0; i < ngroups; i++) {
+		if (grp_cache >= ngroups)
+			grp_cache -= ngroups;
+		ret = ext4_mb_discard_group_preallocations(sb, grp_cache, needed);
 		freed += ret;
 		needed -= ret;
+		if (needed <= 0)
+			break;
+		grp_cache++;
 	}

 	return freed;