Message ID | p2yac8f92701003300536kd52f7ad0p43755fcc2382423b@mail.gmail.com |
---|---|
State | Rejected, archived |
On Tue, 30 Mar 2010 20:36:17 +0800, jing zhang <zj.barak@gmail.com> wrote:
> From: Jing Zhang <zj.barak@gmail.com>
>
> Date: Tue Mar 30 20:35:22 2010
>
> With the added cache, better group locality may be earned when
> allocating blocks.
>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Cc: Andreas Dilger <adilger@sun.com>
> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
> Cc: "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com>
> Signed-off-by: Jing Zhang <zj.barak@gmail.com>
>
> ---
>
> --- linux-2.6.32/fs/ext4/mballoc.c	2009-12-03 11:51:22.000000000 +0800
> +++ ext4_mm_leak/mballoc-13.c	2010-03-30 20:28:08.000000000 +0800
> @@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
>  	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
>  	int ret;
>  	int freed = 0;
> +	static ext4_group_t grp_cache = 0;
>
>  	trace_ext4_mb_discard_preallocations(sb, needed);
> -	for (i = 0; i < ngroups && needed > 0; i++) {
> -		ret = ext4_mb_discard_group_preallocations(sb, i, needed);
> +	if (needed <= 0)
> +		return freed;
> +	for (i = 0; i < ngroups; i++) {
> +		if (grp_cache >= ngroups)
> +			grp_cache -= ngroups;
> +		ret = ext4_mb_discard_group_preallocations(sb, grp_cache, needed);
>  		freed += ret;
>  		needed -= ret;
> +		if (needed <= 0)
> +			break;
> +		grp_cache++;
>  	}
>
>  	return freed;

can you explain this further ?

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On 2010-03-30, at 06:36, jing zhang wrote:
> --- linux-2.6.32/fs/ext4/mballoc.c	2009-12-03 11:51:22.000000000 +0800
> +++ ext4_mm_leak/mballoc-13.c	2010-03-30 20:28:08.000000000 +0800
> @@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
>  	trace_ext4_mb_discard_preallocations(sb, needed);
> -	for (i = 0; i < ngroups && needed > 0; i++) {
> -		ret = ext4_mb_discard_group_preallocations(sb, i, needed);
> +	if (needed <= 0)
> +		return freed;
> +	for (i = 0; i < ngroups; i++) {
> +		if (grp_cache >= ngroups)
> +			grp_cache -= ngroups;
> +		ret = ext4_mb_discard_group_preallocations(sb, grp_cache, needed);

Anything that is walking every group in the filesystem is going to hit
problems on large filesystems. This seems like something that needs to
be fixed in a different way (e.g. keeping a list of preallocations).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

2010/3/31, Aneesh Kumar K. V <aneesh.kumar@linux.vnet.ibm.com>:
> On Tue, 30 Mar 2010 20:36:17 +0800, jing zhang <zj.barak@gmail.com> wrote:
>> From: Jing Zhang <zj.barak@gmail.com>
>>
>> Date: Tue Mar 30 20:35:22 2010
>>
>> With the added cache, better group locality may be earned when
>> allocating blocks.
>>
>> Cc: Theodore Ts'o <tytso@mit.edu>
>> Cc: Andreas Dilger <adilger@sun.com>
>> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
>> Cc: "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com>
>> Signed-off-by: Jing Zhang <zj.barak@gmail.com>
>>
>> ---
>>
>> --- linux-2.6.32/fs/ext4/mballoc.c	2009-12-03 11:51:22.000000000 +0800
>> +++ ext4_mm_leak/mballoc-13.c	2010-03-30 20:28:08.000000000 +0800
>> @@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
>>  	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
>>  	int ret;
>>  	int freed = 0;
>> +	static ext4_group_t grp_cache = 0;
>>
>>  	trace_ext4_mb_discard_preallocations(sb, needed);
>> -	for (i = 0; i < ngroups && needed > 0; i++) {
>> -		ret = ext4_mb_discard_group_preallocations(sb, i, needed);
>> +	if (needed <= 0)
>> +		return freed;
>> +	for (i = 0; i < ngroups; i++) {
>> +		if (grp_cache >= ngroups)
>> +			grp_cache -= ngroups;
>> +		ret = ext4_mb_discard_group_preallocations(sb, grp_cache, needed);
>>  		freed += ret;
>>  		needed -= ret;
>> +		if (needed <= 0)
>> +			break;
>> +		grp_cache++;
>>  	}
>>
>>  	return freed;
>
> can you explain this further ?
>
> -aneesh
>

The added cache checks whether blocks pre-allocated in group are still
available. If yes, they are discarded and used for allocation without
change of group. So more group locality can be earned.

What is more, in function, ext4_mb_discard_group_preallocations(),
pre-allocation is allowed to be discarded as much as possible by
yielding.

- zj

On Tue, Mar 30, 2010 at 08:36:17PM +0800, jing zhang wrote:
> --- linux-2.6.32/fs/ext4/mballoc.c	2009-12-03 11:51:22.000000000 +0800
> +++ ext4_mm_leak/mballoc-13.c	2010-03-30 20:28:08.000000000 +0800
> @@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
>  	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
>  	int ret;
>  	int freed = 0;
> +	static ext4_group_t grp_cache = 0;

This is a problem right there. Remember that there could be multiple
file systems mounted so a static variable is fundamentally flawed.

In fact, we could have a one filesystem which has more than 3 times
the number of groups as another file system. I'll leave it as an
exercise to a reader why your patch would be fundamentally flawed in
that case.

The other thing to note is that this case only gets hit if the file
system is so full that we need to empty preallocations. So this means
hitting this case is rare, which raises two questions: (1) is it worth
it to optimize this case in the first place (is it really that
expensive to iterate over all the groups to discard the
preallocations); (2) can we test this case well?

- Ted

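The exercise above can be worked through in user space. The following is only a sketch, not kernel code: it replays the group selection of the patched loop with a single shared cursor, models ext4_group_t as unsigned int, and replaces the real discard call with a counter. The function name count_bad_groups and its max_visits parameter are made up for this illustration.

```c
/* Stand-in for the patch's function-local static: one cursor that is
 * shared by every mounted filesystem. */
static unsigned int grp_cache;

/*
 * Replays the group selection of the patched loop for a filesystem with
 * ngroups groups. The walk stops after max_visits groups, as if "needed"
 * dropped to zero on that visit. Returns how many of the selected group
 * numbers were out of range (>= ngroups).
 */
unsigned int count_bad_groups(unsigned int ngroups, unsigned int max_visits)
{
	unsigned int i, bad = 0;

	for (i = 0; i < ngroups && i < max_visits; i++) {
		if (grp_cache >= ngroups)
			grp_cache -= ngroups;	/* single subtraction, as in the patch */
		if (grp_cache >= ngroups)
			bad++;			/* the patch would hand this group on */
		if (i + 1 == max_visits)
			break;			/* "needed <= 0": cursor stays put */
		grp_cache++;
	}
	return bad;
}
```

Calling count_bad_groups(400, 400) and then count_bad_groups(100, 3) leaves the cursor at 399 after the first filesystem, so the second filesystem, which has only groups 0 to 99, is asked about groups 299, 200 and 101. One subtraction of ngroups is not enough once the cursor was advanced by a much larger filesystem.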
2010/4/7, tytso@mit.edu <tytso@mit.edu>:
> On Tue, Mar 30, 2010 at 08:36:17PM +0800, jing zhang wrote:
>> --- linux-2.6.32/fs/ext4/mballoc.c	2009-12-03 11:51:22.000000000 +0800
>> +++ ext4_mm_leak/mballoc-13.c	2010-03-30 20:28:08.000000000 +0800
>> @@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
>>  	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
>>  	int ret;
>>  	int freed = 0;
>> +	static ext4_group_t grp_cache = 0;
>
> This is a problem right there. Remember that there could be multiple
> file systems mounted so a static variable is fundamentally flawed.
>

cool, the static in my patch is a fatal error.

- zj

> In fact, we could have a one filesystem which has more than 3 times
> the number of groups as another file system. I'll leave it as an
> exercise to a reader why your patch would be fundamentally flawed in
> that case.
>
> The other thing to note is that this case only gets hit if the file
> system is so full that we need to empty preallocations. So this means
> hitting this case is rare, which raises two questions: (1) is it worth
> it to optimize this case in the first place (is it really that
> expensive to iterate over all the groups to discard the
> preallocations); (2) can we test this case well?
>
> - Ted
>

--- linux-2.6.32/fs/ext4/mballoc.c	2009-12-03 11:51:22.000000000 +0800
+++ ext4_mm_leak/mballoc-13.c	2010-03-30 20:28:08.000000000 +0800
@@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
 	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
 	int ret;
 	int freed = 0;
+	static ext4_group_t grp_cache = 0;
 
 	trace_ext4_mb_discard_preallocations(sb, needed);
-	for (i = 0; i < ngroups && needed > 0; i++) {
-		ret = ext4_mb_discard_group_preallocations(sb, i, needed);
+	if (needed <= 0)
+		return freed;
+	for (i = 0; i < ngroups; i++) {
+		if (grp_cache >= ngroups)
+			grp_cache -= ngroups;
+		ret = ext4_mb_discard_group_preallocations(sb, grp_cache, needed);
 		freed += ret;
 		needed -= ret;
+		if (needed <= 0)
+			break;
+		grp_cache++;
 	}
 
 	return freed;
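
For comparison, here is a user-space sketch of how the same round-robin walk might look if the cursor were kept per filesystem instead of in a static. Nothing in it is ext4 API: struct fs_state, discard_fn, discard_round_robin and toy_discard are hypothetical names for this illustration, and locking is ignored entirely. It only addresses the static-variable objection raised in the thread, not the questions of whether the case is worth optimizing or whether walking every group scales.

```c
/* Hypothetical per-filesystem state; not an ext4 structure. */
struct fs_state {
	unsigned int ngroups;		/* block groups in this filesystem */
	unsigned int discard_cursor;	/* group where the next walk starts */
};

/* Hypothetical stand-in for a per-group discard routine; returns the
 * number of blocks freed in that group. */
typedef int (*discard_fn)(struct fs_state *fs, unsigned int group, int needed);

int discard_round_robin(struct fs_state *fs, int needed, discard_fn discard)
{
	unsigned int i, start, group;
	int ret, freed = 0;

	if (needed <= 0 || fs->ngroups == 0)
		return 0;

	/* Normalize once so a stale cursor can never leave the valid range. */
	start = fs->discard_cursor % fs->ngroups;

	for (i = 0; i < fs->ngroups; i++) {
		/* Wrap around without relying on start + i not overflowing. */
		if (i < fs->ngroups - start)
			group = start + i;
		else
			group = i - (fs->ngroups - start);

		ret = discard(fs, group, needed);
		freed += ret;
		needed -= ret;
		if (needed <= 0) {
			fs->discard_cursor = group;	/* resume here next time */
			break;
		}
	}
	return freed;
}

/* Toy callback for experiments: pretends every group frees one block. */
int toy_discard(struct fs_state *fs, unsigned int group, int needed)
{
	(void)fs;
	(void)group;
	(void)needed;
	return 1;
}
```

With two fs_state instances of 400 and 100 groups, a walk on the larger one no longer influences where the smaller one starts, and every group number passed to the callback stays below that filesystem's own ngroups.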