diff mbox

[1/2] ext2: speed up file creates by optimizing rec_len functions

Message ID 4CFE7409.9090609@redhat.com
State Not Applicable, archived
Headers show

Commit Message

Eric Sandeen Dec. 7, 2010, 5:51 p.m. UTC
The addition of 64k block capability in the rec_len_from_disk
and rec_len_to_disk functions added a bit of math overhead which
slows down file create workloads needlessly when the architecture
cannot even support 64k blocks, thanks to page size limits.

The directory entry checking can also be optimized a bit
by sprinkling in some unlikely() conditions to move the
error handling out of line.

bonnie++ sequential file creates on a 512MB ramdisk speeds up
from about 2200/s to about 2500/s, about a 14% improvement.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Andrew Morton Dec. 7, 2010, 9:07 p.m. UTC | #1
On Tue, 07 Dec 2010 11:51:05 -0600
Eric Sandeen <sandeen@redhat.com> wrote:

> The addition of 64k block capability in the rec_len_from_disk
> and rec_len_to_disk functions added a bit of math overhead which
> slows down file create workloads needlessly when the architecture
> cannot even support 64k blocks, thanks to page size limits.
> 
> The directory entry checking can also be optimized a bit
> by sprinkling in some unlikely() conditions to move the
> error handling out of line.
> 
> bonnie++ sequential file creates on a 512MB ramdisk speeds up
> from about 2200/s to about 2500/s, about a 14% improvement.
> 

hrm, that's an improbably-large sounding improvement from eliminating
just a few test-n-branches from a pretty heavyweight operation.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen Dec. 7, 2010, 9:22 p.m. UTC | #2
On 12/7/10 3:07 PM, Andrew Morton wrote:
> On Tue, 07 Dec 2010 11:51:05 -0600
> Eric Sandeen <sandeen@redhat.com> wrote:
> 
>> The addition of 64k block capability in the rec_len_from_disk
>> and rec_len_to_disk functions added a bit of math overhead which
>> slows down file create workloads needlessly when the architecture
>> cannot even support 64k blocks, thanks to page size limits.
>>
>> The directory entry checking can also be optimized a bit
>> by sprinkling in some unlikely() conditions to move the
>> error handling out of line.
>>
>> bonnie++ sequential file creates on a 512MB ramdisk speeds up
>> from about 2200/s to about 2500/s, about a 14% improvement.
>>
> 
> hrm, that's an improbably-large sounding improvement from eliminating
> just a few test-n-branches from a pretty heavyweight operation.

And yet ...

Yeah, I dunno.  Part of it is that ext2_add_link does a linear
search, so when you do rec_len_from_disk 50,000 times on a dir,
that little bit adds up quite badly I suppose.

Retesting at a bunch of different number-of-files in bonnie
(with a small sample size so probably a little noise)

	|files per sec|
files	stock	patched	delta
10,000	12300	14700	+19%
20,000	 6300	 7600	+20%
30,000	 4200	 5000	+20%
40,000	 3150	 3700	+17%
50,000	 2500	 3000	+20%

(again all on a 512MB ramdisk)

*shrug* I'll believe my lyin' eyes, I guess. :)

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Morton Dec. 7, 2010, 9:33 p.m. UTC | #3
On Tue, 07 Dec 2010 15:22:55 -0600
Eric Sandeen <sandeen@redhat.com> wrote:

> On 12/7/10 3:07 PM, Andrew Morton wrote:
> > On Tue, 07 Dec 2010 11:51:05 -0600
> > Eric Sandeen <sandeen@redhat.com> wrote:
> > 
> >> The addition of 64k block capability in the rec_len_from_disk
> >> and rec_len_to_disk functions added a bit of math overhead which
> >> slows down file create workloads needlessly when the architecture
> >> cannot even support 64k blocks, thanks to page size limits.
> >>
> >> The directory entry checking can also be optimized a bit
> >> by sprinkling in some unlikely() conditions to move the
> >> error handling out of line.
> >>
> >> bonnie++ sequential file creates on a 512MB ramdisk speeds up
> >> from about 2200/s to about 2500/s, about a 14% improvement.
> >>
> > 
> > hrm, that's an improbably-large sounding improvement from eliminating
> > just a few test-n-branches from a pretty heavyweight operation.
> 
> And yet ...
> 
> Yeah, I dunno.  Part of it is that ext2_add_link does a linear
> search, so when you do rec_len_from_disk 50,000 times on a dir,
> that little bit adds up quite badly I suppose.

oh yeah, that would do it.

> Retesting at a bunch of different number-of-files in bonnie
> (with a small sample size so probably a little noise)
> 
> 	|files per sec|
> files	stock	patched	delta
> 10,000	12300	14700	+19%
> 20,000	 6300	 7600	+20%
> 30,000	 4200	 5000	+20%
> 40,000	 3150	 3700	+17%
> 50,000	 2500	 3000	+20%
> 
> (again all on a 512MB ramdisk)
> 
> *shrug* I'll believe my lyin' eyes, I guess. :)
> 

I bet other tweaks in there would yield similar goodliness.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen Dec. 7, 2010, 9:36 p.m. UTC | #4
On 12/7/10 3:33 PM, Andrew Morton wrote:
> I bet other tweaks in there would yield similar goodliness.

Yep, probably so ...

We could also speed up ext3/4 a bit by not doing the extN_check_dir_entry
call on every single access, but only when we read it from disk.

Given that I've talked about it for 4 years and never done it, maybe
I'll try to find a volunteer ;)

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Dilger Dec. 8, 2010, 7:01 p.m. UTC | #5
On 2010-12-07, at 14:33, Andrew Morton wrote:
> On Tue, 07 Dec 2010 15:22:55 -0600
> Eric Sandeen <sandeen@redhat.com> wrote:
>> Retesting at a bunch of different number-of-files in bonnie
>> (with a small sample size so probably a little noise)
>> 
>> 	|files per sec|
>> files	stock	patched	delta
>> 10,000	12300	14700	+19%
>> 20,000	 6300	 7600	+20%
>> 30,000	 4200	 5000	+20%
>> 40,000	 3150	 3700	+17%
>> 50,000	 2500	 3000	+20%
>> 
>> (again all on a 512MB ramdisk)
>> 
>> *shrug* I'll believe my lyin' eyes, I guess. :)
> 
> I bet other tweaks in there would yield similar goodliness.

I think an important factor here is that this is being tested on a ramdisk, and is likely CPU bound, so any CPU reduction will directly be measured as a performance improvement.  Probably oprofile is in order to see where other major CPU users are.

In the past, ext3_mark_inode_dirty() was a major offender, and I recall that we discussed on the ext4 concall a mechanism to only copy the inode once per transaction so that overhead could be removed.

Cheers, Andreas





--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen Dec. 8, 2010, 9:07 p.m. UTC | #6
On 12/08/2010 01:01 PM, Andreas Dilger wrote:
> On 2010-12-07, at 14:33, Andrew Morton wrote:
>> On Tue, 07 Dec 2010 15:22:55 -0600 Eric Sandeen
>> <sandeen@redhat.com> wrote:
>>> Retesting at a bunch of different number-of-files in bonnie (with
>>> a small sample size so probably a little noise)
>>> 
>>> |files per sec| files	stock	patched	delta 10,000	12300	14700
>>> +19% 20,000	 6300	 7600	+20% 30,000	 4200	 5000	+20% 40,000	 3150
>>> 3700	+17% 50,000	 2500	 3000	+20%
>>> 
>>> (again all on a 512MB ramdisk)
>>> 
>>> *shrug* I'll believe my lyin' eyes, I guess. :)
>> 
>> I bet other tweaks in there would yield similar goodliness.
> 
> I think an important factor here is that this is being tested on a
> ramdisk, and is likely CPU bound, so any CPU reduction will directly
> be measured as a performance improvement.  Probably oprofile is in
> order to see where other major CPU users are.

Yep, I ran oprofile.

samples  %        app name                 symbol name
1140046  41.8702  ext2.ko                  ext2_find_entry
1052117  38.6408  ext2.ko                  ext2_add_link
98424     3.6148  vmlinux                  native_safe_halt
40461     1.4860  vmlinux                  wait_on_page_read
29084     1.0682  vmlinux                  find_get_page

pretty slammed on those 2 ext2 functions!  I think it's pretty
overwhelmed by the linear search.

-Eric

> In the past, ext3_mark_inode_dirty() was a major offender, and I
> recall that we discussed on the ext4 concall a mechanism to only copy
> the inode once per transaction so that overhead could be removed.
> 
> Cheers, Andreas
> 
> 
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Dilger Dec. 8, 2010, 9:44 p.m. UTC | #7
On 2010-12-08, at 14:07, Eric Sandeen wrote:
> On 12/08/2010 01:01 PM, Andreas Dilger wrote:
>> I think an important factor here is that this is being tested on a
>> ramdisk, and is likely CPU bound, so any CPU reduction will directly
>> be measured as a performance improvement.  Probably oprofile is in
>> order to see where other major CPU users are.
> 
> Yep, I ran oprofile.
> 
> samples  %        app name                 symbol name
> 1140046  41.8702  ext2.ko                  ext2_find_entry
> 1052117  38.6408  ext2.ko                  ext2_add_link
> 98424     3.6148  vmlinux                  native_safe_halt
> 40461     1.4860  vmlinux                  wait_on_page_read
> 29084     1.0682  vmlinux                  find_get_page
> 
> pretty slammed on those 2 ext2 functions!  I think it's pretty
> overwhelmed by the linear search.

Can you test ext4 with nojournal mode, but with dir_index enabled?  I suspect that testing ext2 for directory performance is pointless.  My personal threshold for ext2 directories was 10k files before I considered it a lost cause, and all of your tests are with 10k+ files per directory.

Just another log on the fire beneath getting rid of ext2 (and eventually ext3) in favour of ext4, IMHO.  I'd be surprised if there are many benchmarks that ext2 can beat ext4 in nojournal mode, if allowed to enable "reversible" format changes like dir_index, uninit_bg, etc.

Cheers, Andreas





--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ric Wheeler Dec. 9, 2010, 1:06 a.m. UTC | #8
On 12/08/2010 04:44 PM, Andreas Dilger wrote:
> On 2010-12-08, at 14:07, Eric Sandeen wrote:
>> On 12/08/2010 01:01 PM, Andreas Dilger wrote:
>>> I think an important factor here is that this is being tested on a
>>> ramdisk, and is likely CPU bound, so any CPU reduction will directly
>>> be measured as a performance improvement.  Probably oprofile is in
>>> order to see where other major CPU users are.
>> Yep, I ran oprofile.
>>
>> samples  %        app name                 symbol name
>> 1140046  41.8702  ext2.ko                  ext2_find_entry
>> 1052117  38.6408  ext2.ko                  ext2_add_link
>> 98424     3.6148  vmlinux                  native_safe_halt
>> 40461     1.4860  vmlinux                  wait_on_page_read
>> 29084     1.0682  vmlinux                  find_get_page
>>
>> pretty slammed on those 2 ext2 functions!  I think it's pretty
>> overwhelmed by the linear search.
> Can you test ext4 with nojournal mode, but with dir_index enabled?  I suspect that testing ext2 for directory performance is pointless.  My personal threshold for ext2 directories was 10k files before I considered it a lost cause, and all of your tests are with 10k+ files per directory.
>
> Just another log on the fire beneath getting rid of ext2 (and eventually ext3) in favour of ext4, IMHO.  I'd be surprised if there are many benchmarks that ext2 can beat ext4 in nojournal mode, if allowed to enable "reversible" format changes like dir_index, uninit_bg, etc.
>
> Cheers, Andreas
>

If we could get rid of ext2 (and eventually ext3), it would actually help reduce 
the testing matrix and possibly let us invest even more in testing ext4.  Having 
to maintain three very similar code bases and test them all for correctness and 
performance is a real pain :)

ric

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen Dec. 9, 2010, 2:13 a.m. UTC | #9
On 12/8/10 3:44 PM, Andreas Dilger wrote:
> On 2010-12-08, at 14:07, Eric Sandeen wrote:
>> On 12/08/2010 01:01 PM, Andreas Dilger wrote:
>>> I think an important factor here is that this is being tested on
>>> a ramdisk, and is likely CPU bound, so any CPU reduction will
>>> directly be measured as a performance improvement.  Probably
>>> oprofile is in order to see where other major CPU users are.
>> 
>> Yep, I ran oprofile.
>> 
>> samples  %        app name                 symbol name 1140046
>> 41.8702  ext2.ko                  ext2_find_entry 1052117  38.6408
>> ext2.ko                  ext2_add_link 98424     3.6148  vmlinux
>> native_safe_halt 40461     1.4860  vmlinux
>> wait_on_page_read 29084     1.0682  vmlinux
>> find_get_page
>> 
>> pretty slammed on those 2 ext2 functions!  I think it's pretty 
>> overwhelmed by the linear search.
> 
> Can you test ext4 with nojournal mode, but with dir_index enabled?  I
> suspect that testing ext2 for directory performance is pointless.

Oh, I agree.  I just had a report of a regression in a certain distro,
which is frowned upon... :)  I'm positive ext4 nojournal would beat the
pants off it due to dir_index.

> My personal threshold for ext2 directories was 10k files before I
> considered it a lost cause, and all of your tests are with 10k+ files
> per directory.

Agreed, it's not very realistic but then it was a simple fix too.
Still took time though...

> Just another log on the fire beneath getting rid of ext2 (and
> eventually ext3) in favour of ext4, IMHO.  I'd be surprised if there
> are many benchmarks that ext2 can beat ext4 in nojournal mode, if
> allowed to enable "reversible" format changes like dir_index,
> uninit_bg, etc.

Hey, I wouldn't complain.

-Eric

> Cheers, Andreas
> 
> 
> 
> 
> 
> -- To unsubscribe from this list: send the line "unsubscribe
> linux-ext4" in the body of a message to majordomo@vger.kernel.org 
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Dec. 10, 2010, 5:55 p.m. UTC | #10
On Wed, Dec 08, 2010 at 08:06:25PM -0500, Ric Wheeler wrote:

> If we could get rid of ext2 (and eventually ext3), it would actually
> help reduce the testing matrix and possibly let us invest even more
> in testing ext4.  Having to maintain three very similar code bases
> and test them all for correctness and performance is a real pain :)
> 

A distribution can do that at any time, just by unconfiguring
CONFIG_EXT2 and/or CONFIG_EXT3 and configuring
CONFIG_EXT4_USE_FOR_EXT23.   :-)

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ric Wheeler Dec. 10, 2010, 6:05 p.m. UTC | #11
On 12/10/2010 12:55 PM, Ted Ts'o wrote:
> On Wed, Dec 08, 2010 at 08:06:25PM -0500, Ric Wheeler wrote:
>
>> If we could get rid of ext2 (and eventually ext3), it would actually
>> help reduce the testing matrix and possibly let us invest even more
>> in testing ext4.  Having to maintain three very similar code bases
>> and test them all for correctness and performance is a real pain :)
>>
> A distribution can do that at any time, just by unconfiguring
> CONFIG_EXT2 and/or CONFIG_EXT3 and configuring
> CONFIG_EXT4_USE_FOR_EXT23.   :-)
>
> 						- Ted

Way too easy :)

What makes users stop using things is google searches that show up Ted saying 
"We don't need ext2 any more, just use ext4 with no journal mode" ;)

Ric

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Dilger Dec. 10, 2010, 11:40 p.m. UTC | #12
On 2010-12-10, at 10:55, Ted Ts'o wrote:
> On Wed, Dec 08, 2010 at 08:06:25PM -0500, Ric Wheeler wrote:
>> If we could get rid of ext2 (and eventually ext3), it would actually
>> help reduce the testing matrix and possibly let us invest even more
>> in testing ext4.  Having to maintain three very similar code bases
>> and test them all for correctness and performance is a real pain :)
> 
> A distribution can do that at any time, just by unconfiguring
> CONFIG_EXT2 and/or CONFIG_EXT3 and configuring
> CONFIG_EXT4_USE_FOR_EXT23.   :-)

That doesn't get rid of the maintenance efforts, however.  There are a
constant stream of "patch for ext[234] needs to be ported to ext[342]".

Cheers, Andreas





--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen Dec. 11, 2010, 12:24 a.m. UTC | #13
On 12/10/10 5:40 PM, Andreas Dilger wrote:
> On 2010-12-10, at 10:55, Ted Ts'o wrote:
>> On Wed, Dec 08, 2010 at 08:06:25PM -0500, Ric Wheeler wrote:
>>> If we could get rid of ext2 (and eventually ext3), it would actually
>>> help reduce the testing matrix and possibly let us invest even more
>>> in testing ext4.  Having to maintain three very similar code bases
>>> and test them all for correctness and performance is a real pain :)
>>
>> A distribution can do that at any time, just by unconfiguring
>> CONFIG_EXT2 and/or CONFIG_EXT3 and configuring
>> CONFIG_EXT4_USE_FOR_EXT23.   :-)
> 
> That doesn't get rid of the maintenance efforts, however.  There are a
> constant stream of "patch for ext[234] needs to be ported to ext[342]".

Agreed....

our 3-way fork is getting cumbersome.
 
> Cheers, Andreas

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 2709b34..47cda41 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -28,21 +28,30 @@ 
 
 typedef struct ext2_dir_entry_2 ext2_dirent;
 
+/*
+ * Tests against MAX_REC_LEN etc were put in place for 64k block
+ * sizes; if that is not possible on this arch, we can skip
+ * those tests and speed things up.
+ */
 static inline unsigned ext2_rec_len_from_disk(__le16 dlen)
 {
 	unsigned len = le16_to_cpu(dlen);
 
+#if (PAGE_CACHE_SIZE >= 65536)
 	if (len == EXT2_MAX_REC_LEN)
 		return 1 << 16;
+#endif
 	return len;
 }
 
 static inline __le16 ext2_rec_len_to_disk(unsigned len)
 {
+#if (PAGE_CACHE_SIZE >= 65536)
 	if (len == (1 << 16))
 		return cpu_to_le16(EXT2_MAX_REC_LEN);
 	else
 		BUG_ON(len > (1 << 16));
+#endif
 	return cpu_to_le16(len);
 }
 
@@ -129,15 +138,15 @@  static void ext2_check_page(struct page *page, int quiet)
 		p = (ext2_dirent *)(kaddr + offs);
 		rec_len = ext2_rec_len_from_disk(p->rec_len);
 
-		if (rec_len < EXT2_DIR_REC_LEN(1))
+		if (unlikely(rec_len < EXT2_DIR_REC_LEN(1)))
 			goto Eshort;
-		if (rec_len & 3)
+		if (unlikely(rec_len & 3))
 			goto Ealign;
-		if (rec_len < EXT2_DIR_REC_LEN(p->name_len))
+		if (unlikely(rec_len < EXT2_DIR_REC_LEN(p->name_len)))
 			goto Enamelen;
-		if (((offs + rec_len - 1) ^ offs) & ~(chunk_size-1))
+		if (unlikely(((offs + rec_len - 1) ^ offs) & ~(chunk_size-1)))
 			goto Espan;
-		if (le32_to_cpu(p->inode) > max_inumber)
+		if (unlikely(le32_to_cpu(p->inode) > max_inumber))
 			goto Einumber;
 	}
 	if (offs != limit)