Patchwork [2/2] libext2fs: optimize rb_test_bit

login
register
mail settings
Submitter Lukas Czerner
Date Oct. 9, 2012, 7:18 a.m.
Message ID <alpine.LFD.2.00.1210090836370.11244@dhcp-1-104.brq.redhat.com>
Download mbox | patch
Permalink /patch/190223/
State Rejected
Headers show

Comments

Lukas Czerner - Oct. 9, 2012, 7:18 a.m.
On Mon, 8 Oct 2012, Theodore Ts'o wrote:

> Date: Mon, 8 Oct 2012 14:17:53 -0400
> From: Theodore Ts'o <tytso@mit.edu>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>
> Subject: Re: [PATCH 2/2] libext2fs: optimize rb_test_bit
> 
> On Mon, Oct 08, 2012 at 10:25:19AM +0200, Lukáš Czerner wrote:
> > > the patch and the idea behind it look fine, especially when we're
> > > walking the bitmap sequentially not modifying it simultaneously, but
> > > I have one question/suggestion below.
> > 
> > Also for this kind of usage it might actually make sense to have
> > something like:
> > 
> > get_next_zero_bit
> > get_next_set_bit
> > 
> > which would be much more effective than testing single bits, but it
> > would require actually using this in e2fsprogs tools.
> 
> Yes, I thought about that, and in fact we already have find_first_zero
> (which takes a starting offset, so works for both find_first and
> find_next).  When we introduced this, though, we accidentally
> introduced a bug at first.
> 
> At some point I agree it would be good to implement find_first_set(),
> and writing new unit tests, and then modify e2freefrag, e2fsck, and
> dumpe2fs to use it.  But in the applications is actually pretty
> tricky, and I didn't have the time I figured would be necessary to
> really do the changes right, and validate/test them properly.
> 
> So yes, I agree this would be much more effective, and ultimately
> would result in further speedups in e2fsck and e2freefrag in
> particular.  It would also allow us to take out the test_bit
> optimizations which do have a slight cost for random access reads ---
> and this is measurable when looking at the results of the CPU time for
> e2fsck pass 4 in particular.  It's just that the performance hit for
> the random access test_bit case is very tiny compared with the huge
> wins in the sequential scan case.

I agree, at some point it would be nice to have that implemented.

> 
> > > what about using the next_ext once we're holding it to check the bit
> > > ? On sequential walk this shout make sense to do so since we
> > > actually should hit this if we're not in rcursor nor between rcursor
> > > and next_ext.
> 
> Yes, I implemented that in the 2/3 commit in the follow-on patch
> series.

Right, but I think that the 2/3 commit is not necessary and the
whole solution is more complicated than it could be. Also we do not
really need the rcursor_next pointer in the ext2fs_rb_private
structure.

The solution I was suggesting is simpler and it is also one
condition shorter on sequential walk because we store the "next"
extent into the rcursor once we actually hit it. So we optimize a
little bit less when we're testing buts in between the extents (due
to ext2fs_rb_next()), but we avoid searching the tree entirely
(except the first time) on sequential walk, because we are moving
the bp->rcursor to the next node when we have match there.

So what about this ? It has to be tested to see if it really is as
effective. Let me know what do you think.

Thanks!
-Lukas


--
Theodore Ts'o - Oct. 9, 2012, 7:55 p.m.
On Tue, Oct 09, 2012 at 09:18:45AM +0200, Lukáš Czerner wrote:
> 
> So what about this ? It has to be tested to see if it really is as
> effective. Let me know what do you think.

Here is the results of my patches, running e2fsck on an empty 3T file system;

Pass 1: Memory used: 672k/32232k (423k/250k), time:  1.15/ 1.14/ 0.00
Pass 2: Memory used: 672k/63692k (423k/250k), time:  0.01/ 0.00/ 0.01
Peak memory: Memory used: 672k/63692k (424k/249k), time:  1.94/ 1.92/ 0.01
Pass 3A: Memory used: 672k/63692k (423k/250k), time:  0.00/ 0.00/ 0.00
Pass 3: Memory used: 672k/63692k (423k/250k), time:  0.00/ 0.00/ 0.00
Pass 4: Memory used: 672k/772k (422k/251k), time:  3.67/ 3.66/ 0.00
Pass 5: Memory used: 824k/772k (422k/403k), time:  3.83/ 3.82/ 0.00
/mnt/foo.img: 11/201326592 files (0.0% non-contiguous), 12687388/805306368 blocks
Memory used: 824k/772k (422k/403k), time:  9.49/ 9.41/ 0.01
9.41user 0.01system 0:09.59elapsed 98%CPU (0avgtext+0avgdata 65636maxresident)k
0inputs+1552outputs (0major+2198minor)pagefaults 0swaps

And here is running e2fsck on a 3T file system with your patches:

Pass 1: Memory used: 672k/32232k (423k/250k), time:  1.20/ 1.19/ 0.01
Pass 2: Memory used: 672k/63692k (423k/250k), time:  0.00/ 0.00/ 0.01
Peak memory: Memory used: 672k/63692k (423k/250k), time:  2.02/ 1.98/ 0.03
Pass 3A: Memory used: 672k/63692k (423k/250k), time:  0.00/ 0.00/ 0.00
Pass 3: Memory used: 672k/63692k (423k/250k), time:  0.00/ 0.00/ 0.00
Pass 4: Memory used: 672k/772k (422k/251k), time:  5.66/ 5.64/ 0.00
Pass 5: Memory used: 896k/772k (422k/475k), time:  4.04/ 4.02/ 0.01
/mnt/foo.img: 11/201326592 files (0.0% non-contiguous), 12687388/805306368 blocks
11.65user 0.04system 0:11.89elapsed 98%CPU (0avgtext+0avgdata 65640maxresident)k
0inputs+1552outputs (0major+1706minor)pagefaults 0swaps


I had tried this approach before, and saw a similar performance
characteristics, so here this didn't come as a huge surprise.

----------------

Here is the results of my patches running e2fsck on a roughly
half-empty file system that holds lots of kernel build trees:

Pass 1: Memory used: 2076k/2780k (1957k/120k), time:  2.78/ 0.95/ 0.15
Pass 2: Memory used: 3528k/2828k (2488k/1041k), time:  2.05/ 0.19/ 0.21
Peak memory: Memory used: 4276k/2828k (3129k/1148k), time:  4.92/ 1.16/ 0.36
Pass 3A: Memory used: 4276k/2828k (3128k/1149k), time:  0.00/ 0.00/ 0.00
Pass 3: Memory used: 4276k/2596k (2488k/1789k), time:  0.00/ 0.00/ 0.00
Pass 4: Memory used: 4276k/644k (645k/3632k), time:  0.12/ 0.12/ 0.00
Pass 5: Memory used: 4276k/0k (645k/3632k), time:  1.42/ 0.99/ 0.02
kbuild: 204690/5242880 files (1.9% non-contiguous), 9843555/20971520 blocks
Memory used: 4276k/0k (645k/3632k), time:  6.51/ 2.29/ 0.38
2.29user 0.45system 0:06.60elapsed 41%CPU (0avgtext+0avgdata 8664maxresident)k
329872inputs+56outputs (0major+2511minor)pagefaults 0swaps

Here are the results with your patch:

Pass 1: Memory used: 2080k/2780k (1957k/124k), time:  2.81/ 0.99/ 0.14
Pass 2: Memory used: 3532k/2828k (2487k/1045k), time:  2.10/ 0.21/ 0.22
Peak memory: Memory used: 4276k/2828k (3128k/1149k), time:  5.00/ 1.22/ 0.37
Pass 3: Memory used: 4276k/2596k (2487k/1790k), time:  0.00/ 0.00/ 0.00
Pass 4: Memory used: 4276k/644k (645k/3632k), time:  0.14/ 0.15/ 0.00
Pass 5: Memory used: 4276k/0k (645k/3632k), time:  1.64/ 1.18/ 0.04
Memory used: 4276k/0k (645k/3632k), time:  6.83/ 2.56/ 0.41
2.57user 0.46system 0:06.91elapsed 43%CPU (0avgtext+0avgdata 8668maxresident)k
329872inputs+56outputs (0major+2512minor)pagefaults 0swaps

----------------

Here are the timing results of e2freefrag on the empty 3T file system
image using my patch:

8.25user 0.01system 0:08.28elapsed 99%CPU (0avgtext+0avgdata 1708maxresident)k
0inputs+0outputs (0major+475minor)pagefaults 0swaps

and here are the results using your patch:

10.27user 0.00system 0:10.30elapsed 99%CPU (0avgtext+0avgdata 1708maxresident)k
0inputs+0outputs (0major+474minor)pagefaults 0swaps


So anyway, that's why I chose the approach I did.

							- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/lib/ext2fs/blkmap64_rb.c b/lib/ext2fs/blkmap64_rb.c
index 55d78e2..f76f790 100644
--- a/lib/ext2fs/blkmap64_rb.c
+++ b/lib/ext2fs/blkmap64_rb.c
@@ -304,8 +304,8 @@  static errcode_t rb_resize_bmap(ext2fs_generic_bitmap bmap,
 inline static int
 rb_test_bit(struct ext2fs_rb_private *bp, __u64 bit)
 {
-	struct bmap_rb_extent *rcursor;
-	struct rb_node *parent = NULL;
+	struct bmap_rb_extent *rcursor, *next_ext;
+	struct rb_node *parent = NULL, *next;
 	struct rb_node **n = &bp->root.rb_node;
 	struct bmap_rb_extent *ext;
 
@@ -320,6 +320,23 @@  rb_test_bit(struct ext2fs_rb_private *bp, __u64 bit)
 		return 1;
 	}
 
+	next = ext2fs_rb_next(&rcursor->node);
+	if (next && (bit >= rcursor->start + rcursor->count)) {
+		next_ext = ext2fs_rb_entry(next, struct bmap_rb_extent, node);
+		if (bit < next_ext->start) {
+#ifdef BMAP_STATS_OPS
+			bp->test_hit++;
+#endif
+			return 0;
+		} else if (bit < next_ext->start + next_ext->count) {
+#ifdef BMAP_STATS_OPS
+			bp->test_hit++;
+#endif
+			bp->rcursor = next_ext;
+			return 1;
+		}
+	}
+
 	rcursor = bp->wcursor;
 	if (!rcursor)
 		goto search_tree;