After unlinking a large file on ext4, the process stalls for a long time

Message ID 53C7B0B7.9030007@free.fr
State New

Commit Message

Mason July 17, 2014, 11:17 a.m. UTC
Lukáš Czerner wrote:

> So it really does not seem to be stalling in fallocate, nor unlink.
> Can you add close() before unlink, just to be sure what's happening
> there ?

Doh! Good catch! Unlinking was fast because the ref count didn't drop
to 0 on unlink; it did so on the implicit close done on exit, which
explains why the process stalled "at the end".

If I unlink a closed file, it is indeed unlink that stalls.
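This is standard POSIX behaviour: unlink() only removes the directory entry, and the inode and its blocks are released when the last open file descriptor goes away. A minimal C sketch that demonstrates it, assuming a filesystem where posix_fallocate() works (glibc emulates it otherwise); the function name blocks_survive_unlink() is made up for illustration:

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the unlinked-but-still-open file keeps its blocks,
 * 0 if it does not, and -1 if any system call fails. */
int blocks_survive_unlink(const char *path, off_t len)
{
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    /* Allocate len bytes, then drop the only directory entry. */
    if (posix_fallocate(fd, 0, len) != 0 || unlink(path) != 0) {
        close(fd);
        unlink(path);               /* best-effort cleanup */
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    /* The name is gone (link count 0) but the blocks are still charged. */
    int survived = (st.st_nlink == 0 && st.st_blocks > 0);

    /* Dropping the last reference is what triggers freeing the blocks,
     * which is where the cost lands when the file is still open at unlink. */
    close(fd);
    return survived;
}
```

On Linux, calling it on a fresh scratch path is expected to return 1, and the path no longer exists afterwards.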

[BTW, some of the e2fsprogs devs may be reading this. I suppose you
already know, but the cross-compile build was broken in 1.42.10.
I wrote a trivial patch to fix it (cf. the end of this message),
although I'm not sure I did it the canonical way.]


# time strace -T ./foo /mnt/hdd/xxx 300 2> strace.out
posix_fallocate(fd, 0, size_in_GiB << 30): 0 [412 ms]
close(fd): 0 [0 ms]
unlink(filename): 0 [111481 ms]

open("/mnt/hdd/xxx", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 3 <0.000456>
clock_gettime(CLOCK_MONOTONIC, {82152, 251657385}) = 0 <0.000085>
SYS_4320()                              = 0 <0.411628>
clock_gettime(CLOCK_MONOTONIC, {82152, 664179762}) = 0 <0.000089>
fstat64(1, {st_mode=S_IFCHR|0755, st_rdev=makedev(4, 64), ...}) = 0 <0.000094>
ioctl(1, TIOCNXCL, {B115200 opost isig icanon echo ...}) = 0 <0.000128>
old_mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x773e4000 <0.000195>
write(1, "posix_fallocate(fd, 0, size_in_G"..., 54) = 54 <0.000281>
clock_gettime(CLOCK_MONOTONIC, {82152, 668413115}) = 0 <0.000077>
close(3)                                = 0 <0.000119>
clock_gettime(CLOCK_MONOTONIC, {82152, 669249479}) = 0 <0.000129>
write(1, "close(fd): 0 [0 ms]\n", 20)   = 20 <0.000145>
clock_gettime(CLOCK_MONOTONIC, {82152, 670361133}) = 0 <0.000078>
unlink("/mnt/hdd/xxx")                  = 0 <111.479283>
clock_gettime(CLOCK_MONOTONIC, {82264, 150551496}) = 0 <0.000080>
write(1, "unlink(filename): 0 [111481 ms]\n", 32) = 32 <0.000225>
exit_group(0)                           = ?

0.01user 111.48system 1:51.99elapsed 99%CPU (0avgtext+0avgdata 772maxresident)k
0inputs+0outputs (0major+434minor)pagefaults 0swaps
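A rough per-block figure puts the 111 s of system time in perspective. This assumes the default 4 KiB ext4 block size and 32768 blocks per group (the thread does not state the block size) and that all 300 GiB were allocated:

```latex
\begin{aligned}
N_{\text{blocks}} &= \frac{300 \times 2^{30}\ \text{B}}{2^{12}\ \text{B/block}} = 300 \times 2^{18} = 78\,643\,200 \\
N_{\text{groups}} &= \frac{N_{\text{blocks}}}{2^{15}\ \text{blocks/group}} = 2400 \\
\frac{t_{\text{unlink}}}{N_{\text{blocks}}} &= \frac{111.48\ \text{s}}{78\,643\,200} \approx 1.42\ \mu\text{s per block}
\end{aligned}
```

So roughly a microsecond and a half of CPU work per freed block is enough to account for the whole stall.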


For reference, here's my minimal test case:

#define _FILE_OFFSET_BITS 64   /* 64-bit off_t on 32-bit targets */
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <time.h>

/* Time one call with CLOCK_MONOTONIC and print its return value
   and the elapsed wall-clock time in milliseconds. */
#define BENCH(op) do { \
  struct timespec t0; clock_gettime(CLOCK_MONOTONIC, &t0); \
  int err = op; \
  struct timespec t1; clock_gettime(CLOCK_MONOTONIC, &t1); \
  int ms = (t1.tv_sec-t0.tv_sec)*1000 + (t1.tv_nsec-t0.tv_nsec)/1000000; \
  printf("%s: %d [%d ms]\n", #op, err, ms); } while(0)

int main(int argc, char **argv)
{
  if (argc != 3) { puts("Usage: prog filename size_in_GiB"); return 42; }

  char *filename = argv[1];
  /* O_EXCL: refuse to reuse an existing file. */
  int fd = open(filename, O_CREAT | O_EXCL | O_WRONLY, 0600);
  if (fd < 0) { perror("open"); return 1; }

  long long size_in_GiB = atoi(argv[2]);
  /* Allocate the blocks without writing any data. */
  BENCH(posix_fallocate(fd, 0, size_in_GiB << 30));
  /* Close first, so unlink() drops the last reference and frees the blocks. */
  BENCH(close(fd));
  BENCH(unlink(filename));
  return 0;
}


$ cat e2fsprogs-1.42.10.patch

Comments

Theodore Y. Ts'o July 17, 2014, 1:37 p.m. UTC | #1
On Thu, Jul 17, 2014 at 01:17:11PM +0200, Mason wrote:
> unlink("/mnt/hdd/xxx")                  = 0 <111.479283>
> 
> 0.01user 111.48system 1:51.99elapsed 99%CPU (0avgtext+0avgdata 772maxresident)k
> 0inputs+0outputs (0major+434minor)pagefaults 0swaps

... and we're CPU bound inside the kernel.

Can you run perf so we can see exactly where we're spending the CPU?
You're not using a journal, so I'm pretty sure what you will find is
that we're spending all of our time in mb_free_blocks(), when it is
updating the internal mballoc buddy bitmaps.

With a journal, this work done by mb_free_blocks() is hidden in the
kjournal thread, and happens after the commit is completed, so it
won't block other file system operations (other than burning some
extra CPU on one of the multiple cores available on a typical x86
CPU).

Also, I suspect the CPU overhead is *much* less on an x86 CPU, which
has native bit test/set/clear instructions, whereas the MIPS
architecture was designed by Prof. Hennessy at Stanford, who was a
doctrinaire RISC fanatic, so there would be no bitop instructions.
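For comparison, the portable way to test, set or clear one bit in a bitmap of unsigned long words looks roughly like this, modelled loosely on the kernel's generic non-atomic bitops. What a given MIPS compiler actually emits for it was not measured in this thread, and the sketch_* names are hypothetical:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of bits in one bitmap word. */
#define BITS_PER_WORD (CHAR_BIT * sizeof(unsigned long))

/* Without a dedicated bit instruction, each helper becomes a word index
 * (a shift for a power of two), a bit mask, a load, an AND/OR and,
 * for set/clear, a store. */
static inline bool sketch_test_bit(size_t nr, const unsigned long *map)
{
    return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

static inline void sketch_clear_bit(size_t nr, unsigned long *map)
{
    map[nr / BITS_PER_WORD] &= ~(1UL << (nr % BITS_PER_WORD));
}

static inline void sketch_set_bit(size_t nr, unsigned long *map)
{
    map[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}
```

Freeing N blocks one bit at a time repeats this sequence N times, which is why the per-bit cost matters here.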

Even though I'm pretty sure what we'll find, knowing exactly *where*
in mb_free_blocks() or the function it calls would be helpful in
knowing what we need to optimize.  So if you could try using perf
(assuming that perf is supported on MIPS; I'm not sure it is), that
would be really helpful.

Thanks,

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mason July 17, 2014, 4:07 p.m. UTC | #2
Theodore Ts'o wrote:

> Even though I'm pretty sure what we'll find, knowing exactly *where*
> in mb_free_blocks() or the function it calls would be helpful in
> knowing what we need to optimize.  So if you could try using perf
> (assuming that perf is supported on MIPS; I'm not sure it is), that
> would be really helpful.

Is perf "better" than oprofile? (For some metric)

I have enabled:

CONFIG_PERF_EVENTS=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_OPROFILE=y
CONFIG_HAVE_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_KRETPROBES=y

What command-line do you suggest I run to get the output you expect?
(I'll try to get it done, but I might have to wait two weeks before
I can run these tests.)
Mason July 17, 2014, 4:32 p.m. UTC | #3
On 17/07/2014 18:07, Mason wrote:

> What command-line do you suggest I run to get the output you expect?
> (I'll try to get it done, but I might have to wait two weeks before
> I can run these tests.)

So much for oprofile...

  CC      arch/mips/oprofile/../../../drivers/oprofile/oprof.o
arch/mips/oprofile/../../../drivers/oprofile/oprof.c: In function 'oprofile_init':
arch/mips/oprofile/../../../drivers/oprofile/oprof.c:316: error: 'timer' undeclared (first use in this function)
arch/mips/oprofile/../../../drivers/oprofile/oprof.c:316: error: (Each undeclared identifier is reported only once
arch/mips/oprofile/../../../drivers/oprofile/oprof.c:316: error: for each function it appears in.)
arch/mips/oprofile/../../../drivers/oprofile/oprof.c: In function '__check_timer':
arch/mips/oprofile/../../../drivers/oprofile/oprof.c:373: error: 'timer' undeclared (first use in this function)
arch/mips/oprofile/../../../drivers/oprofile/oprof.c: At top level:
arch/mips/oprofile/../../../drivers/oprofile/oprof.c:373: error: 'timer' undeclared here (not in a function)
cc1: warnings being treated as errors
arch/mips/oprofile/../../../drivers/oprofile/oprof.c:373: error: type defaults to 'int' in declaration of 'type name'
make[1]: *** [arch/mips/oprofile/../../../drivers/oprofile/oprof.o] Error 1
make: *** [arch/mips/oprofile] Error 2

Dunno if this happens on vanilla kernels, or if the ODM messed
something up (again).

$ ll tools/perf/arch/
drwxrwxr-x 4 bob bob 4096 Mar 27 17:12 arm/
drwxrwxr-x 4 bob bob 4096 Mar 27 17:12 powerpc/
drwxrwxr-x 4 bob bob 4096 Mar 27 17:12 s390/
drwxrwxr-x 4 bob bob 4096 Mar 27 17:12 sh/
drwxrwxr-x 4 bob bob 4096 Mar 27 17:12 sparc/
drwxrwxr-x 4 bob bob 4096 Mar 27 17:12 x86/

I'm not sure perf supports MIPS...

Or maybe it does

$ g -rni mips .
./Makefile:45:				  -e s/ppc.*/powerpc/ -e s/mips.*/mips/ \
Binary file ./.Makefile.swp matches
./perf.h:76:#ifdef __mips__
./perf.h:77:#include "../../arch/mips/include/asm/unistd.h"
./perf.h:79:				".set	mips2\n\t"			\
./perf.h:81:				".set	mips0"				\
Lukas Czerner July 18, 2014, 9:29 a.m. UTC | #4
On Thu, 17 Jul 2014, Mason wrote:

> Date: Thu, 17 Jul 2014 18:07:30 +0200
> From: Mason <mpeg.blue@free.fr>
> To: Theodore Ts'o <tytso@mit.edu>
> Cc: Lukáš Czerner <lczerner@redhat.com>, Andreas Dilger <adilger@dilger.ca>,
>     Ext4 Developers List <linux-ext4@vger.kernel.org>,
>     linux-fsdevel <linux-fsdevel@vger.kernel.org>
> Subject: Re: After unlinking a large file on ext4,
>     the process stalls for a long time
> 
> What command-line do you suggest I run to get the output you expect?
> (I'll try to get it done, but I might have to wait two weeks before
> I can run these tests.)

If perf works on your system you can record data with

perf record -g ./test file <size>

and then report with

perf report --stdio

That should yield some interesting information about where we spend
the most time in the kernel.

Thanks!
-Lukas
Andreas Dilger Aug. 4, 2014, 10:55 p.m. UTC | #5
It would be possible to optimize mb_free_blocks() by having it
clear a whole word at a time instead of a series of bits.
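A userspace sketch of that idea: clear a run of bits with one masked update for the partial head and tail words and a plain store of 0 for each whole word in between, instead of one read-modify-write per bit. The function names are hypothetical, and Ted notes later in the thread that mballoc's mb_test_and_clear_bits() may already take this approach, so this illustrates the technique rather than ext4's code:

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits in one bitmap word. */
#define WORD_BITS (CHAR_BIT * sizeof(unsigned long))

/* Baseline: clear bits [first, first + count) one bit at a time. */
void clear_range_bitwise(unsigned long *map, size_t first, size_t count)
{
    for (size_t nr = first; nr < first + count; nr++)
        map[nr / WORD_BITS] &= ~(1UL << (nr % WORD_BITS));
}

/* Word-at-a-time version of the same operation. */
void clear_range_wordwise(unsigned long *map, size_t first, size_t count)
{
    if (count == 0)
        return;

    size_t last = first + count - 1;          /* inclusive last bit */
    size_t w = first / WORD_BITS;
    size_t last_w = last / WORD_BITS;

    /* Bits at or above 'first' within its word. */
    unsigned long head = ~0UL << (first % WORD_BITS);
    /* Bits at or below 'last' within its word. */
    unsigned long tail = ~0UL >> (WORD_BITS - 1 - last % WORD_BITS);

    if (w == last_w) {                        /* run fits in one word */
        map[w] &= ~(head & tail);
        return;
    }
    map[w++] &= ~head;                        /* partial head word */
    while (w < last_w)                        /* whole words: one store each */
        map[w++] = 0;
    map[last_w] &= ~tail;                     /* partial tail word */
}
```

With 64-bit words, an aligned run of 4096 bits costs 64 word stores this way instead of 4096 per-bit updates.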

I thought that was done already, but it doesn't appear to be the case.
Also, it isn't clear that the bit "normalization" is needed anymore.
This was done back in the ancient times when the buddy bitmaps were
stored on disk instead of being regenerated only at mount time.

Cheers, Andreas

> On Aug 4, 2014, at 16:30, Mason <mpeg.blue@free.fr> wrote:
> 
>> On 18/07/2014 11:29, Lukáš Czerner wrote:
>> 
>> Mason wrote:
>> 
>>> Theodore Ts'o wrote:
>>> 
>>>> Mason wrote:
>>>> 
>>>>> unlink("/mnt/hdd/xxx")                  = 0 <111.479283>
>>>>> 
>>>>> 0.01user 111.48system 1:51.99elapsed 99%CPU (0avgtext+0avgdata 772maxresident)k
>>>>> 0inputs+0outputs (0major+434minor)pagefaults 0swaps
>>>> 
>>>> ... and we're CPU bound inside the kernel.
>>>> 
>>>> Can you run perf so we can see exactly where we're spending the CPU?
>>>> You're not using a journal, so I'm pretty sure what you will find is
>>>> that we're spending all of our time in mb_free_blocks(), when it is
>>>> updating the internal mballoc buddy bitmaps.
>>>> 
>>>> With a journal, this work done by mb_free_blocks() is hidden in the
>>>> kjournal thread, and happens after the commit is completed, so it
>>>> won't block other file system operations (other than burning some
>>>> extra CPU on one of the multiple cores available on a typical x86
>>>> CPU).
>>>> 
>>>> Also, I suspect the CPU overhead is *much* less on an x86 CPU, which
>>>> has native bit test/set/clear instructions, whereas the MIPS
>>>> architecture was designed by Prof. Hennessy at Stanford, who was a
>>>> doctrinaire RISC fanatic, so there would be no bitop instructions.
> 
> I've attached the output of "mips-linux-gnu-objdump -xd mballoc.o"
> in case someone wants to peek at the generated code.
> 
>>>> Even though I'm pretty sure what we'll find, knowing exactly *where*
>>>> in mb_free_blocks() or the function it calls would be helpful in
>>>> knowing what we need to optimize.  So if you could try using perf
>>>> (assuming that perf is supported on MIPS; I'm not sure it is), that
>>>> would be really helpful.
> 
> How do you get perf to tell you where in mb_free_blocks we are spending
> the most time?
> 
>>> What command-line do you suggest I run to get the output you expect?
>> 
>> If perf works on your system you can record data with
>> 
>> perf record -g ./test file <size>
>> 
>> and then report with
>> 
>> perf report --stdio
>> 
>> That should yield some interesting information about where we spend
>> the most time in the kernel.
> 
> I've no idea why, but the unlink operation, which used to take
> 111 seconds to run, now only takes 53...
> 
> Anyway, here is the requested output.
> 
> # time perf record -g foo /mnt/hdd/xxx 300
> [ perf record: Woken up 8 times to write data ]
> [ perf record: Captured and wrote 1.909 MB perf.data (~83406 samples) ]
> 0.04user 0.08system 0:53.54elapsed 0%CPU (0avgtext+0avgdata 3616maxresident)k
> 0inputs+0outputs (0major+984minor)pagefaults 0swaps
> 
> # perf report --stdio > report.txt
> (Complete report attached as report.txt.xz)
> 
> What can I do to improve the latency of unlinking large files?
> Would sparse_super2 help at all?
> 
> 
> # Events: 14K cycles
> #
> # Overhead  Command      Shared Object                        Symbol
> # ........  .......  .................  ............................
> #
>    33.94%      foo  [kernel.kallsyms]  [k] mb_free_blocks
>               |
>               --- mb_free_blocks
>                   ext4_free_blocks
>                   ext4_ext_rm_leaf
>                   ext4_ext_truncate
>                   ext4_truncate
>                   ext4_evict_inode
>                   evict
>                   do_unlinkat
>                   stack_done
> 
>    21.11%      foo  [kernel.kallsyms]  [k] __find_get_block
>               |
>               --- __find_get_block
>                  |          
>                  |--99.94%-- ext4_free_blocks
>                  |          ext4_ext_rm_leaf
>                  |          ext4_ext_truncate
>                  |          ext4_truncate
>                  |          ext4_evict_inode
>                  |          evict
>                  |          do_unlinkat
>                  |          stack_done
>                   --0.06%-- [...]
> 
>     8.33%      foo  [kernel.kallsyms]  [k] radix_tree_lookup_slot
>               |
>               --- radix_tree_lookup_slot
>                   find_get_page
>                   __find_get_block_slow
>                   __find_get_block
>                   ext4_free_blocks
>                   ext4_ext_rm_leaf
>                   ext4_ext_truncate
>                   ext4_truncate
>                   ext4_evict_inode
>                   evict
>                   do_unlinkat
>                   stack_done
> 
>     6.99%      foo  [kernel.kallsyms]  [k] mb_find_buddy
>               |
>               --- mb_find_buddy
>                   mb_free_blocks
>                   ext4_free_blocks
>                   ext4_ext_rm_leaf
>                   ext4_ext_truncate
>                   ext4_truncate
>                   ext4_evict_inode
>                   evict
>                   do_unlinkat
>                   stack_done
> 
>     4.21%      foo  [kernel.kallsyms]  [k] trace_preempt_off
>               |
>               --- trace_preempt_off
>                  |          
>                  |--99.99%-- __find_get_block
>                  |          ext4_free_blocks
>                  |          ext4_ext_rm_leaf
>                  |          ext4_ext_truncate
>                  |          ext4_truncate
>                  |          ext4_evict_inode
>                  |          evict
>                  |          do_unlinkat
>                  |          stack_done
>                   --0.01%-- [...]
> 
>     4.19%      foo  [kernel.kallsyms]  [k] ext4_free_blocks
>               |
>               --- ext4_free_blocks
>                   ext4_ext_rm_leaf
>                   ext4_ext_truncate
>                   ext4_truncate
>                   ext4_evict_inode
>                   evict
>                   do_unlinkat
>                   stack_done
> 
>     4.14%      foo  [kernel.kallsyms]  [k] sub_preempt_count
>               |
>               --- sub_preempt_count
>                  |          
>                  |--99.69%-- __find_get_block
>                  |          ext4_free_blocks
>                  |          ext4_ext_rm_leaf
>                  |          ext4_ext_truncate
>                  |          ext4_truncate
>                  |          ext4_evict_inode
>                  |          evict
>                  |          do_unlinkat
>                  |          stack_done
>                   --0.31%-- [...]
> 
>     3.97%      foo  [kernel.kallsyms]  [k] __find_get_block_slow
>               |
>               --- __find_get_block_slow
>                   __find_get_block
>                   ext4_free_blocks
>                   ext4_ext_rm_leaf
>                   ext4_ext_truncate
>                   ext4_truncate
>                   ext4_evict_inode
>                   evict
>                   do_unlinkat
>                   stack_done
> 
>     3.53%      foo  [kernel.kallsyms]  [k] __rcu_read_unlock
>               |
>               --- __rcu_read_unlock
>                  |          
>                  |--100.00%-- find_get_page
>                  |          __find_get_block_slow
>                  |          __find_get_block
>                  |          ext4_free_blocks
>                  |          ext4_ext_rm_leaf
>                  |          ext4_ext_truncate
>                  |          ext4_truncate
>                  |          ext4_evict_inode
>                  |          evict
>                  |          do_unlinkat
>                  |          stack_done
>                   --0.00%-- [...]
> 
>     3.26%      foo  [kernel.kallsyms]  [k] trace_preempt_on
>               |
>               --- trace_preempt_on
>                   sub_preempt_count
>                  |          
>                  |--100.00%-- __find_get_block
>                  |          ext4_free_blocks
>                  |          ext4_ext_rm_leaf
>                  |          ext4_ext_truncate
>                  |          ext4_truncate
>                  |          ext4_evict_inode
>                  |          evict
>                  |          do_unlinkat
>                  |          stack_done
>                   --0.00%-- [...]
> 
>     2.06%      foo  [kernel.kallsyms]  [k] find_get_page
>               |
>               --- find_get_page
>                  |          
>                  |--100.00%-- __find_get_block_slow
>                  |          __find_get_block
>                  |          ext4_free_blocks
>                  |          ext4_ext_rm_leaf
>                  |          ext4_ext_truncate
>                  |          ext4_truncate
>                  |          ext4_evict_inode
>                  |          evict
>                  |          do_unlinkat
>                  |          stack_done
>                   --0.00%-- [...]
> 
>     1.39%      foo  [kernel.kallsyms]  [k] add_preempt_count
>               |
>               --- add_preempt_count
>                  |          
>                  |--99.99%-- __find_get_block
>                  |          ext4_free_blocks
>                  |          ext4_ext_rm_leaf
>                  |          ext4_ext_truncate
>                  |          ext4_truncate
>                  |          ext4_evict_inode
>                  |          evict
>                  |          do_unlinkat
>                  |          stack_done
>                   --0.01%-- [...]
> 
>     1.26%      foo  [kernel.kallsyms]  [k] __rcu_read_lock
>               |
>               --- __rcu_read_lock
>                   find_get_page
>                   __find_get_block_slow
>                   __find_get_block
>                   ext4_free_blocks
>                   ext4_ext_rm_leaf
>                   ext4_ext_truncate
>                   ext4_truncate
>                   ext4_evict_inode
>                   evict
>                   do_unlinkat
>                   stack_done
> 
> -- 
> Regards.
> 
> <report.txt.xz>
> <mballoc.dump.xz>
Theodore Y. Ts'o Aug. 5, 2014, 2:33 a.m. UTC | #6
On Tue, Aug 05, 2014 at 12:55:14AM +0200, Andreas Dilger wrote:
> It would be possible to optimize mb_free_blocks() by having it
> clear a whole word at a time instead of a series of bits.

It looks like we're doing this already in mb_test_and_clear_bits(),
aren't we?

> I thought that was done already, but it doesn't appear to be the case.
> Also, it isn't clear that the bit "normalization" is needed anymore.
> This was done back in the ancient times when the buddy bitmaps were
> stored on disk instead of being regenerated only at mount time.

I'm not sure what you mean by this; the only reference I can find to
normalization is with normalizing requests?

						- Ted
Mason Aug. 5, 2014, 12:06 p.m. UTC | #7
On 05/08/2014 00:55, Andreas Dilger wrote:

> It would be possible to optimize mb_free_blocks() by having it
> clear a whole word at a time instead of a series of bits.
> 
> I thought that was done already, but it doesn't appear to be the case.
> Also, it isn't clear that the bit "normalization" is needed anymore.
> This was done back in the ancient times when the buddy bitmaps were
> stored on disk instead of being regenerated only at mount time.

Are there any other tests you'd like me to run?
(I will be permanently losing access to this platform in a few days.)
Andreas Dilger Aug. 5, 2014, 9:54 p.m. UTC | #8
On Aug 5, 2014, at 4:33, Theodore Ts'o <tytso@mit.edu> wrote:
> 
>> On Tue, Aug 05, 2014 at 12:55:14AM +0200, Andreas Dilger wrote:
>> It would be possible to optimize mb_free_blocks() by having it
>> clear a whole word at a time instead of a series of bits.
> 
> It looks like we're doing this already in mb_test_and_clear_bits(),
> aren't we?

Sorry, I didn't see mb_test_and_clear_bits(); I was only looking at
mb_clear_bit() to see if it had the multi-bit optimization.

>> I thought that was done already, but it doesn't appear to be the case.
>> Also, it isn't clear that the bit "normalization" is needed anymore.
>> This was done back in the ancient times when the buddy bitmaps were
>> stored on disk instead of being regenerated only at mount time.
> 
> I'm not sure what you mean by this; the only reference I can find to
> normalization is with normalizing requests?

I meant mb_correct_addr_and_bit(). 
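As we understand it, mb_correct_addr_and_bit() rounds a bitmap pointer down to an unsigned long boundary and adds the skipped bytes (8 bits each) to the bit number, so that word-based bitops always see an aligned address. A userspace sketch of that normalization, with a hypothetical name; whether it is still needed is Andreas's open question and this does not settle it:

```c
#include <stdint.h>

/* Round addr down to an unsigned long boundary and fold the skipped
 * bytes into *bit, so (addr, bit) still names the same bit. */
void *correct_addr_and_bit(int *bit, void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    uintptr_t mask = sizeof(unsigned long) - 1;   /* 3 on 32-bit, 7 on 64-bit */

    *bit += (int)((a & mask) << 3);               /* 8 bits per skipped byte */
    return (void *)(a & ~mask);
}
```

For example, byte offset 3 with bit 2 becomes the aligned base address with bit 26.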

Cheers, Andreas

Patch

diff -ur a/util/Makefile.in b/util/Makefile.in
--- a/util/Makefile.in	2014-05-15 19:04:08.000000000 +0200
+++ b/util/Makefile.in	2014-07-10 15:31:04.819352596 +0200
@@ -15,7 +15,7 @@ 
 
 .c.o:
 	$(E) "	CC $<"
-	$(Q) $(BUILD_CC) -c $(BUILD_CFLAGS) $< -o $@
+	$(Q) $(BUILD_CC) $(CPPFLAGS) -c $(BUILD_CFLAGS) $< -o $@
 	$(Q) $(CHECK_CMD) $(ALL_CFLAGS) $<
 
 PROGS=		subst symlinks