Patchwork [3/6,-v3] libext2fs: add ext2fs_bitcount() function

login
register
mail settings
Submitter Theodore Ts'o
Date Nov. 26, 2012, 4:39 p.m.
Message ID <1353947981-15219-4-git-send-email-tytso@mit.edu>
Download mbox | patch
Permalink /patch/201729/
State Superseded
Headers show

Comments

Theodore Ts'o - Nov. 26, 2012, 4:39 p.m.
This function efficiently counts the number of bits in a block of
memory.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
---
 lib/ext2fs/bitops.c         | 35 ++++++++++++++++
 lib/ext2fs/bitops.h         |  1 +
 lib/ext2fs/tst_bitmaps.c    |  1 +
 lib/ext2fs/tst_bitmaps_cmds | 42 ++++++++++++++++++++
 lib/ext2fs/tst_bitmaps_exp  | 97 +++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 176 insertions(+)
Zach Brown - Nov. 26, 2012, 11:17 p.m.
> This function efficiently counts the number of bits in a block of
> memory.

Would it be worth the annoying build- and run-time machinery to detect
and use the -msse4.2 __builtin_popcount() gcc intrinsic?

> +static unsigned int popcount32(unsigned int w)
> +{
> +	unsigned int res = w - ((w >> 1) & 0x55555555);
> +	res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
> +	res = (res + (res >> 4)) & 0x0F0F0F0F;
> +	res = res + (res >> 8);
> +	return (res + (res >> 16)) & 0x000000FF;
> +}

  40042b:       89 c2                   mov    %eax,%edx
  400432:       d1 ea                   shr    %edx
  400434:       81 e2 55 55 55 55       and    $0x55555555,%edx
  40043a:       29 d0                   sub    %edx,%eax
  40043c:       89 c2                   mov    %eax,%edx
  40043e:       c1 e8 02                shr    $0x2,%eax
  400441:       81 e2 33 33 33 33       and    $0x33333333,%edx
  400447:       25 33 33 33 33          and    $0x33333333,%eax
  40044c:       01 d0                   add    %edx,%eax
  40044e:       89 c2                   mov    %eax,%edx
  400450:       c1 ea 04                shr    $0x4,%edx
  400453:       01 d0                   add    %edx,%eax
  400455:       25 0f 0f 0f 0f          and    $0xf0f0f0f,%eax
  40045a:       89 c2                   mov    %eax,%edx
  40045c:       c1 ea 08                shr    $0x8,%edx
  40045f:       01 d0                   add    %edx,%eax
  400461:       89 c6                   mov    %eax,%esi
  400463:       c1 ee 10                shr    $0x10,%esi
  400468:       0f b6 f0                movzbl %al,%esi

__builtin_popcount():

  40047e:       f3 0f b8 f0             popcnt %eax,%esi

- z
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Nov. 27, 2012, 1:45 a.m.
On Mon, Nov 26, 2012 at 03:17:45PM -0800, Zach Brown wrote:
> > This function efficiently counts the number of bits in a block of
> > memory.
> 
> Would it be worth the annoying build- and run-time machinery to detect
> and use the -msse4.2 __builtin_popcount() gcc intrinsic?

I thought about doing it, but I was in a bit of a hurry implementing
this patch set, and I wasn't even sure how to correctly implement the
build- and run-time machinery (i.e., detecting whether the gcc you're
compiling with supports __builtin_popcount, and implementing a
run-time fallback is the CPU doesn't support popcount instruction ---
which by the way isn't properly part of SSE 4.2; it has its own
separate CPUID bit, IIRC).  Is there some userspace application
licensed under LGPLv2 which does this cleanly from which I could
borrow code?

I suppose I should first check and see how much difference it makes to
with a hard-coded use __builtin_popcnt().  If it makes a sufficiently
large improvement, it's probably worth the hair of implementing the
fallback machinery.

							- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Nov. 27, 2012, 5:16 a.m.
On Mon, Nov 26, 2012 at 08:45:05PM -0500, Theodore Ts'o wrote:
> I suppose I should first check and see how much difference it makes to
> with a hard-coded use __builtin_popcnt().  If it makes a sufficiently
> large improvement, it's probably worth the hair of implementing the
> fallback machinery.

I did some quick benchmarking, and the difference it makes when
checking 4TB's worth of bitmaps is negligble:

slow popcount: 0.2623
fast popcount: 0.0700

For a 128TB's worth of bitmaps, the time difference is:

slow popcount: 8.0185
fast popcount: 2.2066

I measured running e2fsck on an empty 128TB file system, and that took
202 CPU seconds (assuming all of the fs metadata blocks are in cache),
so with this optimization we would save at most 3%.  (For comparison,
using an unmodified 1.42.6 e2fsck, it burned 392.7 CPU seconds.)

My conclusion is that using __builtin_popcnt() is a nice-to-have, and
if someone sends me patches I'll probably take them as a optimization,
but it's not super high priority for me.

							- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zach Brown - Nov. 27, 2012, 5:50 p.m.
On Tue, Nov 27, 2012 at 12:16:17AM -0500, Theodore Ts'o wrote:
> I did some quick benchmarking, and the difference it makes when
> checking 4TB's worth of bitmaps is negligble:
> 
> slow popcount: 0.2623
> fast popcount: 0.0700
> 
> For a 128TB's worth of bitmaps, the time difference is:
> 
> slow popcount: 8.0185
> fast popcount: 2.2066
> 
> I measured running e2fsck on an empty 128TB file system, and that took
> 202 CPU seconds (assuming all of the fs metadata blocks are in cache),
> so with this optimization we would save at most 3%.  (For comparison,
> using an unmodified 1.42.6 e2fsck, it burned 392.7 CPU seconds.)

Nice, thanks for taking the time to get numbers.

> My conclusion is that using __builtin_popcnt() is a nice-to-have, and
> if someone sends me patches I'll probably take them as a optimization,
> but it's not super high priority for me.

Agreed.  I'll chuck it at the end of my fun-projects-some-day list as
well, but getting it right for all the platforms that e2fsprogs
supports.. meh :).

- z
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zach Brown - Nov. 27, 2012, 5:54 p.m.
On Mon, Nov 26, 2012 at 08:45:05PM -0500, Theodore Ts'o wrote:
> build- and run-time machinery (i.e., detecting whether the gcc you're
> compiling with supports __builtin_popcount, and implementing a
> run-time fallback is the CPU doesn't support popcount instruction ---
> which by the way isn't properly part of SSE 4.2; it has its own
> separate CPUID bit, IIRC).

*nod*

> Is there some userspace application licensed under LGPLv2 which does
> this cleanly from which I could borrow code?

Not that I know of, off the top of my head.  I think I'd first check the
usual crypto suspects :).

- z
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Nov. 27, 2012, 7:37 p.m.
On Tue, Nov 27, 2012 at 09:50:23AM -0800, Zach Brown wrote:
> 
> Agreed.  I'll chuck it at the end of my fun-projects-some-day list as
> well, but getting it right for all the platforms that e2fsprogs
> supports.. meh :).

It's not strictly necessary to get things right for all platforms; we
already have some accelerations using asm statements which only work
for one platform already --- although it's already the case that I
didn't bother to make 64-bit set/clear/test bit optimizations for x86,
mainly because I didn't think it was worth it, especially on modern
CPU's.  (And with the red/black tree backend for bitmaps, the asm
bitops are even less important.)

							- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/lib/ext2fs/bitops.c b/lib/ext2fs/bitops.c
index 9322a35..bd9e768 100644
--- a/lib/ext2fs/bitops.c
+++ b/lib/ext2fs/bitops.c
@@ -116,3 +116,38 @@  int ext2fs_test_bit64(__u64 nr, const void * addr)
 	return (mask & *ADDR);
 }
 
+static unsigned int popcount8(unsigned int w)
+{
+	unsigned int res = w - ((w >> 1) & 0x55);
+	res = (res & 0x33) + ((res >> 2) & 0x33);
+	return (res + (res >> 4)) & 0x0F;
+}
+
+static unsigned int popcount32(unsigned int w)
+{
+	unsigned int res = w - ((w >> 1) & 0x55555555);
+	res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
+	res = (res + (res >> 4)) & 0x0F0F0F0F;
+	res = res + (res >> 8);
+	return (res + (res >> 16)) & 0x000000FF;
+}
+
+unsigned int ext2fs_bitcount(const void *addr, unsigned int nbytes)
+{
+	const unsigned char *cp = addr;
+	const __u32 *p = addr;
+	unsigned int res = 0;
+
+	if ((((unsigned long) addr) & 3) == 0) {
+		while (nbytes > 4) {
+			res += popcount32(*p++);
+			nbytes -= 4;
+		}
+		cp = (const unsigned char *) p;
+	}
+	while (nbytes > 0) {
+		res += popcount8(*cp++);
+		nbytes--;
+	}
+	return res;
+}
diff --git a/lib/ext2fs/bitops.h b/lib/ext2fs/bitops.h
index 526870f..406999b 100644
--- a/lib/ext2fs/bitops.h
+++ b/lib/ext2fs/bitops.h
@@ -686,6 +686,7 @@  extern int ext2fs_test_bit(unsigned int nr, const void * addr);
 extern int ext2fs_set_bit64(__u64 nr,void * addr);
 extern int ext2fs_clear_bit64(__u64 nr, void * addr);
 extern int ext2fs_test_bit64(__u64 nr, const void * addr);
+extern unsigned int ext2fs_bitcount(const void *addr, unsigned int nbytes);
 
 #ifdef NO_INLINE_FUNCS
 extern void ext2fs_fast_set_bit(unsigned int nr,void * addr);
diff --git a/lib/ext2fs/tst_bitmaps.c b/lib/ext2fs/tst_bitmaps.c
index 5da3693..2a76292 100644
--- a/lib/ext2fs/tst_bitmaps.c
+++ b/lib/ext2fs/tst_bitmaps.c
@@ -270,6 +270,7 @@  void dump_bitmap(ext2fs_generic_bitmap bmap, unsigned int start, unsigned num)
 	for (i=0; i < len; i++)
 		printf("%02x", buf[i]);
 	printf("\n");
+	printf("bits set: %u\n", ext2fs_bitcount(buf, len));
 	free(buf);
 }
 
diff --git a/lib/ext2fs/tst_bitmaps_cmds b/lib/ext2fs/tst_bitmaps_cmds
index 9a4f3d0..31e2a60 100644
--- a/lib/ext2fs/tst_bitmaps_cmds
+++ b/lib/ext2fs/tst_bitmaps_cmds
@@ -53,5 +53,47 @@  cleari 5
 testi 17
 testi 6
 testi 4
+clearb 7 12
+dump_bb
+setb 1
+dump_bb
+setb 2
+dump_bb
+setb 3
+dump_bb
+setb 4
+dump_bb
+setb 5
+dump_bb
+setb 6
+dump_bb
+setb 7
+dump_bb
+setb 8
+dump_bb
+setb 10
+setb 12
+setb 14
+setb 17
+setb 19
+setb 24
+setb 26
+setb 27
+setb 30
+setb 31
+setb 32
+setb 35
+setb 39
+setb 40
+setb 44
+setb 46
+setb 47
+setb 49
+setb 51
+setb 52
+clearb 2
+clearb 3
+clearb 7
+dump_bb
 quit
 
diff --git a/lib/ext2fs/tst_bitmaps_exp b/lib/ext2fs/tst_bitmaps_exp
index 2d406ce..2d62b66 100644
--- a/lib/ext2fs/tst_bitmaps_exp
+++ b/lib/ext2fs/tst_bitmaps_exp
@@ -36,6 +36,7 @@  tst_bitmaps: testb 16
 Block 16 is set
 tst_bitmaps: dump_bb
 block bitmap: 00f80000000000000000000000000000
+bits set: 5
 tst_bitmaps: ffzb 11 16
 First unmarked block is 11
 tst_bitmaps: ffzb 12 16
@@ -64,6 +65,7 @@  tst_bitmaps: setb 12 7
 Marking blocks 12 to 18
 tst_bitmaps: dump_bb
 block bitmap: 00f80300000000000000000000000000
+bits set: 7
 tst_bitmaps: seti 2
 Setting inode 2, was clear before
 tst_bitmaps: seti 5
@@ -82,6 +84,7 @@  tst_bitmaps: testi 1
 Inode 1 is clear
 tst_bitmaps: dump_ib
 inode bitmap: 1e000000
+bits set: 4
 tst_bitmaps: ffzi 1 6
 First unmarked inode is 1
 tst_bitmaps: ffzi 2 5
@@ -110,5 +113,99 @@  tst_bitmaps: testi 6
 Inode 6 is clear
 tst_bitmaps: testi 4
 Inode 4 is clear
+tst_bitmaps: clearb 7 12
+Clearing blocks 7 to 18
+tst_bitmaps: dump_bb
+block bitmap: 00000000000000000000000000000000
+bits set: 0
+tst_bitmaps: setb 1
+Setting block 1, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 01000000000000000000000000000000
+bits set: 1
+tst_bitmaps: setb 2
+Setting block 2, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 03000000000000000000000000000000
+bits set: 2
+tst_bitmaps: setb 3
+Setting block 3, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 07000000000000000000000000000000
+bits set: 3
+tst_bitmaps: setb 4
+Setting block 4, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 0f000000000000000000000000000000
+bits set: 4
+tst_bitmaps: setb 5
+Setting block 5, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 1f000000000000000000000000000000
+bits set: 5
+tst_bitmaps: setb 6
+Setting block 6, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 3f000000000000000000000000000000
+bits set: 6
+tst_bitmaps: setb 7
+Setting block 7, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 7f000000000000000000000000000000
+bits set: 7
+tst_bitmaps: setb 8
+Setting block 8, was clear before
+tst_bitmaps: dump_bb
+block bitmap: ff000000000000000000000000000000
+bits set: 8
+tst_bitmaps: setb 10
+Setting block 10, was clear before
+tst_bitmaps: setb 12
+Setting block 12, was clear before
+tst_bitmaps: setb 14
+Setting block 14, was clear before
+tst_bitmaps: setb 17
+Setting block 17, was clear before
+tst_bitmaps: setb 19
+Setting block 19, was clear before
+tst_bitmaps: setb 24
+Setting block 24, was clear before
+tst_bitmaps: setb 26
+Setting block 26, was clear before
+tst_bitmaps: setb 27
+Setting block 27, was clear before
+tst_bitmaps: setb 30
+Setting block 30, was clear before
+tst_bitmaps: setb 31
+Setting block 31, was clear before
+tst_bitmaps: setb 32
+Setting block 32, was clear before
+tst_bitmaps: setb 35
+Setting block 35, was clear before
+tst_bitmaps: setb 39
+Setting block 39, was clear before
+tst_bitmaps: setb 40
+Setting block 40, was clear before
+tst_bitmaps: setb 44
+Setting block 44, was clear before
+tst_bitmaps: setb 46
+Setting block 46, was clear before
+tst_bitmaps: setb 47
+Setting block 47, was clear before
+tst_bitmaps: setb 49
+Setting block 49, was clear before
+tst_bitmaps: setb 51
+Setting block 51, was clear before
+tst_bitmaps: setb 52
+Setting block 52, was clear before
+tst_bitmaps: clearb 2
+Clearing block 2, was set before
+tst_bitmaps: clearb 3
+Clearing block 3, was set before
+tst_bitmaps: clearb 7
+Clearing block 7, was set before
+tst_bitmaps: dump_bb
+block bitmap: b92a85e6c4680d000000000000000000
+bits set: 25
 tst_bitmaps: quit
 tst_bitmaps: