Patchwork [RFC,7/9] bitops: use vector algorithm to optimize find_next_bit()

Submitter Peter Lieven
Date March 12, 2013, 3:52 p.m.
Message ID <513F4F4E.1060009@dlhnet.de>
Permalink /patch/227062/
State New

Comments

Peter Lieven - March 12, 2013, 3:52 p.m.
this patch adds the usage of buffer_find_nonzero_offset()
to skip large areas of zeroes.

compared to loop unrolling this adds another 50% performance
benefit for skipping large areas of zeroes.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
  util/bitops.c |   23 ++++++++++++++++++++---
  1 file changed, 20 insertions(+), 3 deletions(-)
Eric Blake - March 12, 2013, 4:04 p.m.
On 03/12/2013 09:52 AM, Peter Lieven wrote:
> this patch adds the usage of buffer_find_nonzero_offset()
> to skip large areas of zeroes.
> 
> compared to loop unrolling this adds another 50% performance
> benefit for skipping large areas of zeroes.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  util/bitops.c |   23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)

> +        if (((uintptr_t) p) % sizeof(VECTYPE) == 0
> +              && size >= BITS_PER_BYTE*8*sizeof(VECTYPE)) {

Spaces around binary operators.  CHAR_BIT instead of magic 8.

> +          unsigned long tmp2 =
> +              buffer_find_nonzero_offset(p, ((size/BITS_PER_BYTE) & ~(8*sizeof(VECTYPE)-1)));

Spaces around binary operators.
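
For illustration, the two quoted expressions could be restyled roughly as below. This is only a sketch: VEC_BYTES and BLOCK_VECS are hypothetical names, the 16-byte vector size is an assumption standing in for sizeof(VECTYPE), and reading the bare 8 as "vectors per scanned block" is likewise an assumption, not something this thread confirms.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for sizeof(VECTYPE); the real value depends on the build. */
#define VEC_BYTES 16
/* Assumption: the bare 8 in the patch counts vectors per scanned block. */
#define BLOCK_VECS 8

/* True if p is vector-aligned and at least one whole block of bits remains. */
static bool can_vector_scan(const unsigned long *p, unsigned long size_bits)
{
    return (uintptr_t)p % VEC_BYTES == 0
        && size_bits >= CHAR_BIT * BLOCK_VECS * VEC_BYTES;
}

/* Byte length to hand to the scanner: remaining bytes rounded down to a
 * whole number of blocks (the block size is a power of two). */
static size_t vector_scan_len(unsigned long size_bits)
{
    return (size_bits / CHAR_BIT) & ~((size_t)BLOCK_VECS * VEC_BYTES - 1);
}
```

Pulling the expressions into small helpers is one way to keep the loop body readable; the patch itself keeps them inline.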

Patch

diff --git a/util/bitops.c b/util/bitops.c
index e72237a..0a056ff 100644
--- a/util/bitops.c
+++ b/util/bitops.c
@@ -42,10 +42,27 @@  unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
          size -= BITS_PER_LONG;
          result += BITS_PER_LONG;
      }
-    while (size & ~(BITS_PER_LONG-1)) {
-        if ((tmp = *(p++))) {
-            goto found_middle;
+    while (size >= BITS_PER_LONG) {
+        if ((tmp = *p)) {
+             goto found_middle;
+         }
+        if (((uintptr_t) p) % sizeof(VECTYPE) == 0
+              && size >= BITS_PER_BYTE*8*sizeof(VECTYPE)) {
+          unsigned long tmp2 =
+              buffer_find_nonzero_offset(p, ((size/BITS_PER_BYTE) & ~(8*sizeof(VECTYPE)-1)));
+          result += tmp2 * BITS_PER_BYTE;
+          size -= tmp2 * BITS_PER_BYTE;
+          p += tmp2 / sizeof(unsigned long);
+          if (!size) {
+              return result;
+          }
+          if (tmp2) {
+             if ((tmp = *p)) {
+                 goto found_middle;
+             }
+          }
          }
+        p++;
          result += BITS_PER_LONG;
          size -= BITS_PER_LONG;
      }