Patchwork [PATCHv2,4/9] bitops: use vector algorithm to optimize find_next_bit()

Submitter Peter Lieven
Date March 15, 2013, 3:50 p.m.
Message ID <1363362619-3190-5-git-send-email-pl@kamp.de>
Permalink /patch/228153/
State New

Comments

Peter Lieven - March 15, 2013, 3:50 p.m.
this patch adds the usage of buffer_find_nonzero_offset()
to skip large areas of zeroes.

compared to loop unrolling presented in an earlier
patch this adds another 50% performance benefit for
skipping large areas of zeroes. loop unrolling alone
added close to 100% speedup.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 util/bitops.c |   26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)
Eric Blake - March 19, 2013, 4:49 p.m.
On 03/15/2013 09:50 AM, Peter Lieven wrote:
> this patch adds the usage of buffer_find_nonzero_offset()
> to skip large areas of zeroes.
> 
> compared to loop unrolling presented in an earlier
> patch this adds another 50% performance benefit for
> skipping large areas of zeroes. loop unrolling alone
> added close to 100% speedup.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  util/bitops.c |   26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)

> +    while (size >= BITS_PER_LONG) {
> +        if ((tmp = *p)) {
> +             goto found_middle;
> +        }
> +        if (((uintptr_t) p) % sizeof(VECTYPE) == 0 
> +                && size >= BITS_PER_BYTE * sizeof(VECTYPE)
> +                   * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {

Another instance where a helper function to check for alignment would be
nice.  Except this time you have a BITS_PER_BYTE factor, so you would be
calling something like buffer_can_use_vectors(buf, size / BITS_PER_BYTE)
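The helper Eric sketches could look roughly like this. This is a hypothetical illustration, not code from the series: `buffer_can_use_vectors` is Eric's suggested name, and `VECTYPE_SIZE` and the unroll factor are assumed stand-ins for the series' actual `VECTYPE` and `BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR` definitions, whose real values depend on the build's SIMD support.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for the series' definitions; the real values
 * depend on the build's SIMD support. */
#define VECTYPE_SIZE 16
#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR 8

/* Hypothetical helper along the lines Eric suggests: true when buf is
 * vector-aligned and len covers at least one unrolled vector pass. */
static bool buffer_can_use_vectors(const void *buf, size_t len)
{
    return (uintptr_t)buf % VECTYPE_SIZE == 0
        && len >= VECTYPE_SIZE * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR;
}
```

find_next_bit() would then call it as `buffer_can_use_vectors(p, size / BITS_PER_BYTE)`, matching the BITS_PER_BYTE factor noted above.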

> +            unsigned long tmp2 =
> +                buffer_find_nonzero_offset(p, ((size / BITS_PER_BYTE) & 
> +                           ~(BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * 
> +                             sizeof(VECTYPE) - 1)));

Type mismatch - buffer_find_nonzero_offset returns size_t, which isn't
necessarily the same size as unsigned long.  I'm not sure if it can bite
you.
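The mismatch is harmless on LP64 hosts, where unsigned long and size_t are both 64 bits wide, but on an LLP64 host (64-bit Windows) unsigned long is only 32 bits and the assignment silently truncates. A minimal sketch of the hazard, using uint64_t/uint32_t as portable stand-ins for a 64-bit size_t and a 32-bit unsigned long (`toy_offset` is illustrative, not the series' code):

```c
#include <stdint.h>

/* A toy offset wider than 32 bits, as buffer_find_nonzero_offset()
 * could in principle return for a very large buffer. uint64_t stands
 * in for a 64-bit size_t; casting to uint32_t models assigning it to
 * an LLP64 unsigned long. */
static uint64_t toy_offset(void)
{
    return (UINT64_C(1) << 32) + 24;
}
```

Declaring tmp2 as size_t, the callee's actual return type, sidesteps the issue entirely.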

> +            result += tmp2 * BITS_PER_BYTE;
> +            size -= tmp2 * BITS_PER_BYTE;
> +            p += tmp2 / sizeof(unsigned long);
> +            if (!size) {
> +                return result;
> +            }
> +            if (tmp2) {

Do you really need this condition, or would it suffice to just
'continue;' the loop?  Once buffer_find_nonzero_offset returns anything
that leaves size as non-zero, we are guaranteed that the loop will goto
found_middle without any further calls to buffer_find_nonzero_offset.

> +                if ((tmp = *p)) {
> +                    goto found_middle;
> +                }
> +            }
>          }
> +        p++;
>          result += BITS_PER_LONG;
>          size -= BITS_PER_LONG;
>      }
>
Peter Lieven - March 19, 2013, 7:40 p.m.
Am 19.03.2013 um 17:49 schrieb Eric Blake <eblake@redhat.com>:

> On 03/15/2013 09:50 AM, Peter Lieven wrote:
>> this patch adds the usage of buffer_find_nonzero_offset()
>> to skip large areas of zeroes.
>> 
>> compared to loop unrolling presented in an earlier
>> patch this adds another 50% performance benefit for
>> skipping large areas of zeroes. loop unrolling alone
>> added close to 100% speedup.
>> 
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> util/bitops.c |   26 +++++++++++++++++++++++---
>> 1 file changed, 23 insertions(+), 3 deletions(-)
> 
>> +    while (size >= BITS_PER_LONG) {
>> +        if ((tmp = *p)) {
>> +             goto found_middle;
>> +        }
>> +        if (((uintptr_t) p) % sizeof(VECTYPE) == 0 
>> +                && size >= BITS_PER_BYTE * sizeof(VECTYPE)
>> +                   * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
> 
> Another instance where a helper function to check for alignment would be
> nice.  Except this time you have a BITS_PER_BYTE factor, so you would be
> calling something like buffer_can_use_vectors(buf, size / BITS_PER_BYTE)
> 
>> +            unsigned long tmp2 =
>> +                buffer_find_nonzero_offset(p, ((size / BITS_PER_BYTE) & 
>> +                           ~(BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * 
>> +                             sizeof(VECTYPE) - 1)));
> 
> Type mismatch - buffer_find_nonzero_offset returns size_t, which isn't
> necessarily the same size as unsigned long.  I'm not sure if it can bite
> you.

I will look into it.

> 
>> +            result += tmp2 * BITS_PER_BYTE;
>> +            size -= tmp2 * BITS_PER_BYTE;
>> +            p += tmp2 / sizeof(unsigned long);
>> +            if (!size) {
>> +                return result;
>> +            }
>> +            if (tmp2) {
> 
> Do you really need this condition, or would it suffice to just
> 'continue;' the loop?  Once buffer_find_nonzero_offset returns anything
> that leaves size as non-zero, we are guaranteed that the loop will goto
> found_middle without any further calls to buffer_find_nonzero_offset.

Not in all cases. It will goto found_middle directly if the nonzero content is in the
first sizeof(unsigned long) bytes. If not, buffer_find_nonzero_offset() is called again,
and it will return 0, because there is a non-zero byte within the first
sizeof(VECTYPE) * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR bytes. To avoid this I placed this check.

Peter
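Peter's scenario can be modeled with a toy chunk-granular scanner: the returned offset identifies the first chunk containing a nonzero byte, not the exact word, so after advancing, *p may still read as zero while a second scanner call would immediately return 0. Everything below (`find_nonzero_chunk`, the 16-byte chunk, the demo buffer) is an illustrative sketch, not the series' actual code.

```c
#include <stddef.h>

/* Stands in for sizeof(VECTYPE) * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR. */
#define CHUNK 16

/* Toy model of buffer_find_nonzero_offset(): returns the byte offset
 * of the first CHUNK-sized block containing a nonzero byte, or len if
 * the whole buffer is zero (len assumed to be a multiple of CHUNK). */
static size_t find_nonzero_chunk(const unsigned char *buf, size_t len)
{
    for (size_t off = 0; off < len; off += CHUNK) {
        for (size_t i = 0; i < CHUNK; i++) {
            if (buf[off + i]) {
                return off;
            }
        }
    }
    return len;
}

/* First nonzero byte at offset 40: inside the third chunk, but past
 * the first 8-byte word of that chunk, so *p would still read zero. */
static unsigned char demo[64];
static size_t demo_init_and_scan(void)
{
    demo[40] = 1;
    return find_nonzero_chunk(demo, sizeof(demo));
}
```

After advancing by the returned 32 bytes, a repeated scan from the new position returns 0 straight away, which is the wasted call the `if (tmp2)` word test avoids.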


> 
>> +                if ((tmp = *p)) {
>> +                    goto found_middle;
>> +                }
>> +            }
>>         }
>> +        p++;
>>         result += BITS_PER_LONG;
>>         size -= BITS_PER_LONG;
>>     }
>> 
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>

Patch

diff --git a/util/bitops.c b/util/bitops.c
index e72237a..3c301fa 100644
--- a/util/bitops.c
+++ b/util/bitops.c
@@ -42,10 +42,30 @@  unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
         size -= BITS_PER_LONG;
         result += BITS_PER_LONG;
     }
-    while (size & ~(BITS_PER_LONG-1)) {
-        if ((tmp = *(p++))) {
-            goto found_middle;
+    while (size >= BITS_PER_LONG) {
+        if ((tmp = *p)) {
+             goto found_middle;
+        }
+        if (((uintptr_t) p) % sizeof(VECTYPE) == 0 
+                && size >= BITS_PER_BYTE * sizeof(VECTYPE)
+                   * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
+            unsigned long tmp2 =
+                buffer_find_nonzero_offset(p, ((size / BITS_PER_BYTE) & 
+                           ~(BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * 
+                             sizeof(VECTYPE) - 1)));
+            result += tmp2 * BITS_PER_BYTE;
+            size -= tmp2 * BITS_PER_BYTE;
+            p += tmp2 / sizeof(unsigned long);
+            if (!size) {
+                return result;
+            }
+            if (tmp2) {
+                if ((tmp = *p)) {
+                    goto found_middle;
+                }
+            }
         }
+        p++;
         result += BITS_PER_LONG;
         size -= BITS_PER_LONG;
     }
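For readers unfamiliar with the function being optimized, here is a simplified scalar sketch of find_next_bit()'s word-scanning loop, without the vector fast path the patch adds. It assumes a word-aligned offset and a size that is a multiple of BITS_PER_LONG, to keep the sketch short; the real function also handles the unaligned head and tail, and `toy_find_next_bit` is an illustrative name, not the series' code.

```c
#include <stddef.h>

#define BITS_PER_LONG (8 * (int)sizeof(unsigned long))

/* Simplified scalar find_next_bit(): returns the index of the first
 * set bit at or after `offset`, or `size` if none is found. Assumes
 * offset is word-aligned and size is a multiple of BITS_PER_LONG. */
static unsigned long toy_find_next_bit(const unsigned long *addr,
                                       unsigned long size,
                                       unsigned long offset)
{
    const unsigned long *p = addr + offset / BITS_PER_LONG;
    unsigned long result = offset;
    unsigned long tmp;

    while (size - result >= (unsigned long)BITS_PER_LONG) {
        if ((tmp = *p)) {
            /* Index of the lowest set bit in this word. */
            return result + __builtin_ctzl(tmp);
        }
        p++;
        result += BITS_PER_LONG;
    }
    return size;
}

/* Demo bitmap: only bit 4 of the third word is set. */
static const unsigned long demo_bits[4] = { 0, 0, 0x10, 0 };
```

The patch's fast path slots into the body of this loop: whenever p is vector-aligned and enough zero words remain, whole chunks are skipped via buffer_find_nonzero_offset() instead of testing one word per iteration.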