Patchwork [PATCHv4,2/9] cutils: add a function to find non-zero content in a buffer

login
register
mail settings
Submitter Peter Lieven
Date March 22, 2013, 12:46 p.m.
Message ID <1363956370-23681-3-git-send-email-pl@kamp.de>
Download mbox | patch
Permalink /patch/230005/
State New
Headers show

Comments

Peter Lieven - March 22, 2013, 12:46 p.m.
this adds buffer_find_nonzero_offset() which is a SSE2/Altivec
optimized function that searches for non-zero content in a
buffer.

due to the optimizations used in the function there are restrictions
on buffer address and search length. the function
can_use_buffer_find_nonzero_content() can be used to check if
the function can be used safely.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 include/qemu-common.h |   13 +++++++++++++
 util/cutils.c         |   45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)
Eric Blake - March 22, 2013, 7:37 p.m.
On 03/22/2013 06:46 AM, Peter Lieven wrote:
> this adds buffer_find_nonzero_offset() which is a SSE2/Altivec
> optimized function that searches for non-zero content in a
> buffer.
> 
> due to the optimizations used in the function there are restrictions
> on buffer address and search length. the function
> can_use_buffer_find_nonzero_content() can be used to check if
> the function can be used safely.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  include/qemu-common.h |   13 +++++++++++++
>  util/cutils.c         |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 58 insertions(+)

> +#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR 8
> +static inline bool
> +can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
> +{
> +    if (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
> +                * sizeof(VECTYPE)) == 0
> +            && ((uintptr_t) buf) % sizeof(VECTYPE) == 0) {

I know that emacs tends to indent the second line to the column after
the ( that it is associated with, as in:

+    if (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
+               * sizeof(VECTYPE)) == 0
+        && ((uintptr_t) buf) % sizeof(VECTYPE) == 0) {

But since checkpatch.pl didn't complain, and since I'm not sure if there
is a codified qemu indentation style, and since I _am_ sure that not
everyone uses emacs [hi, vi guys], it's not worth respinning.  A
maintainer can touch it up if desired.



> +
> +size_t buffer_find_nonzero_offset(const void *buf, size_t len)
> +{
> +    VECTYPE *p = (VECTYPE *)buf;
> +    VECTYPE zero = ZERO_SPLAT;
> +    size_t i;
> +
> +    assert(len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
> +        * sizeof(VECTYPE)) == 0);
> +    assert(((uintptr_t) buf) % sizeof(VECTYPE) == 0);

I would have written this:

assert(can_use_buffer_find_nonzero_offset(buf, len));

But that's cosmetic, and compiles to the same code, so it's not worth a
respin.

You've addressed my concerns on v3.

Reviewed-by: Eric Blake <eblake@redhat.com>
Peter Lieven - March 22, 2013, 8:03 p.m.
Am 22.03.2013 20:37, schrieb Eric Blake:

> On 03/22/2013 06:46 AM, Peter Lieven wrote:
>> this adds buffer_find_nonzero_offset() which is a SSE2/Altivec
>> optimized function that searches for non-zero content in a
>> buffer.
>>
>> due to the optimizations used in the function there are restrictions
>> on buffer address and search length. the function
>> can_use_buffer_find_nonzero_content() can be used to check if
>> the function can be used safely.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>  include/qemu-common.h |   13 +++++++++++++
>>  util/cutils.c         |   45 +++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 58 insertions(+)
>> +#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR 8
>> +static inline bool
>> +can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
>> +{
>> +    if (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
>> +                * sizeof(VECTYPE)) == 0
>> +            && ((uintptr_t) buf) % sizeof(VECTYPE) == 0) {
> I know that emacs tends to indent the second line to the column after
> the ( that it is associated with, as in:
>
> +    if (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
> +               * sizeof(VECTYPE)) == 0
> +        && ((uintptr_t) buf) % sizeof(VECTYPE) == 0) {
>
> But since checkpatch.pl didn't complain, and since I'm not sure if there
> is a codified qemu indentation style, and since I _am_ sure that not
> everyone uses emacs [hi, vi guys], it's not worth respinning.  A
> maintainer can touch it up if desired.

Actually, I was totally unsure how to indent this. Maybe just give
a hint what you would like to see. As I will replace patch 4 with
an earlier version that is not vector optimized, but uses loop unrolling,
I will have to do a v5 and so I can fix this.

>
>> +
>> +size_t buffer_find_nonzero_offset(const void *buf, size_t len)
>> +{
>> +    VECTYPE *p = (VECTYPE *)buf;
>> +    VECTYPE zero = ZERO_SPLAT;
>> +    size_t i;
>> +
>> +    assert(len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
>> +        * sizeof(VECTYPE)) == 0);
>> +    assert(((uintptr_t) buf) % sizeof(VECTYPE) == 0);
> I would have written this:
>
> assert(can_use_buffer_find_nonzero_offset(buf, len));

Good point. Will be changed in v5.

> But that's cosmetic, and compiles to the same code, so it's not worth a
> respin.
>
> You've addressed my concerns on v3.
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
Peter
Eric Blake - March 22, 2013, 8:22 p.m.
On 03/22/2013 02:03 PM, Peter Lieven wrote:

>>> +    if (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
>>> +                * sizeof(VECTYPE)) == 0
>>> +            && ((uintptr_t) buf) % sizeof(VECTYPE) == 0) {
>> I know that emacs tends to indent the second line to the column after
>> the ( that it is associated with, as in:
>>
>> +    if (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
>> +               * sizeof(VECTYPE)) == 0
>> +        && ((uintptr_t) buf) % sizeof(VECTYPE) == 0) {
>>

> 
> Actually, I was totally unsure how to indent this. Maybe just give
> a hint what you would like to see.

I thought I did just that, with my rewrite of your line (best if you
view the mail in fixed-width font) :)

> As I will replace patch 4 with
> an earlier version that is not vector optimized, but uses loop unrolling,
> I will have to do a v5 and so I can fix this.

Note that qemu.git already has a .exrc file that enables default vi
settings for vim users; I'm not sure if there is a counterpart
.dir-locals.el file to set up emacs settings, but someone probably has
one.  Ultimately, having instructions on how to set up your editor so
that 'TAB' just magically indents to the preferred style seems like a
tip worth having on the wiki page on contributing a patch, but I'm not
sure I'm the one to provide such a patch (since I focus most of my qemu
work on reviewing other's patches, and not writing my own, I don't
really have my own preferred editor set up to indent in a qemu style).
Peter Maydell - March 23, 2013, 11:18 a.m.
On 22 March 2013 20:22, Eric Blake <eblake@redhat.com> wrote:
> Note that qemu.git already has a .exrc file that enables default vi
> settings for vim users; I'm not sure if there is a counterpart
> .dir-locals.el file to set up emacs settings

My .emacs has the following config:
https://wiki.linaro.org/PeterMaydell/QemuEmacsStyle
(it doesn't get things perfect but it's pretty close.)


-- PMM
Orit Wasserman - March 25, 2013, 8:53 a.m.
On 03/22/2013 02:46 PM, Peter Lieven wrote:
> this adds buffer_find_nonzero_offset() which is a SSE2/Altivec
> optimized function that searches for non-zero content in a
> buffer.
> 
> due to the optimizations used in the function there are restrictions
> on buffer address and search length. the function
> can_use_buffer_find_nonzero_content() can be used to check if
> the function can be used safely.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  include/qemu-common.h |   13 +++++++++++++
>  util/cutils.c         |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 58 insertions(+)
> 
> diff --git a/include/qemu-common.h b/include/qemu-common.h
> index e76ade3..078e535 100644
> --- a/include/qemu-common.h
> +++ b/include/qemu-common.h
> @@ -472,4 +472,17 @@ void hexdump(const char *buf, FILE *fp, const char *prefix, size_t size);
>  #define ALL_EQ(v1, v2) ((v1) == (v2))
>  #endif
>  
> +#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR 8
> +static inline bool
> +can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
> +{
> +    if (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
> +                * sizeof(VECTYPE)) == 0
> +            && ((uintptr_t) buf) % sizeof(VECTYPE) == 0) {
> +        return true;
> +    }
> +    return false;
> +}
> +size_t buffer_find_nonzero_offset(const void *buf, size_t len);
> +
>  #endif
> diff --git a/util/cutils.c b/util/cutils.c
> index 1439da4..41c627e 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c
> @@ -143,6 +143,51 @@ int qemu_fdatasync(int fd)
>  }
>  
>  /*
> + * Searches for an area with non-zero content in a buffer
> + *
> + * Attention! The len must be a multiple of
> + * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * sizeof(VECTYPE)
> + * and addr must be a multiple of sizeof(VECTYPE) due to
> + * restriction of optimizations in this function.
> + *
> + * can_use_buffer_find_nonzero_offset() can be used to check
> + * these requirements.
> + *
> + * The return value is the offset of the non-zero area rounded
> + * down to BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * sizeof(VECTYPE).
> + * If the buffer is all zero the return value is equal to len.
> + */
> +
> +size_t buffer_find_nonzero_offset(const void *buf, size_t len)
> +{
> +    VECTYPE *p = (VECTYPE *)buf;
> +    VECTYPE zero = ZERO_SPLAT;
> +    size_t i;
> +
> +    assert(len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
> +        * sizeof(VECTYPE)) == 0);
> +    assert(((uintptr_t) buf) % sizeof(VECTYPE) == 0);
> +
> +    if (*((const long *) buf)) {
> +        return 0;
> +    }
> +
> +    for (i = 0; i < len / sizeof(VECTYPE);
Why not put len/sizeof(VECTYPE) in a variable?
Orit
> +            i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
> +        VECTYPE tmp0 = p[i + 0] | p[i + 1];
> +        VECTYPE tmp1 = p[i + 2] | p[i + 3];
> +        VECTYPE tmp2 = p[i + 4] | p[i + 5];
> +        VECTYPE tmp3 = p[i + 6] | p[i + 7];
> +        VECTYPE tmp01 = tmp0 | tmp1;
> +        VECTYPE tmp23 = tmp2 | tmp3;
> +        if (!ALL_EQ(tmp01 | tmp23, zero)) {
> +            break;
> +        }
> +    }
> +    return i * sizeof(VECTYPE);
> +}
> +
> +/*
>   * Checks if a buffer is all zeroes
>   *
>   * Attention! The len must be a multiple of 4 * sizeof(long) due to
>
Peter Lieven - March 25, 2013, 8:56 a.m.
Am 25.03.2013 um 09:53 schrieb Orit Wasserman <owasserm@redhat.com>:

> On 03/22/2013 02:46 PM, Peter Lieven wrote:
>> this adds buffer_find_nonzero_offset() which is a SSE2/Altivec
>> optimized function that searches for non-zero content in a
>> buffer.
>> 
>> due to the optimizations used in the function there are restrictions
>> on buffer address and search length. the function
>> can_use_buffer_find_nonzero_content() can be used to check if
>> the function can be used safely.
>> 
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> include/qemu-common.h |   13 +++++++++++++
>> util/cutils.c         |   45 +++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 58 insertions(+)
>> 
>> diff --git a/include/qemu-common.h b/include/qemu-common.h
>> index e76ade3..078e535 100644
>> --- a/include/qemu-common.h
>> +++ b/include/qemu-common.h
>> @@ -472,4 +472,17 @@ void hexdump(const char *buf, FILE *fp, const char *prefix, size_t size);
>> #define ALL_EQ(v1, v2) ((v1) == (v2))
>> #endif
>> 
>> +#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR 8
>> +static inline bool
>> +can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
>> +{
>> +    if (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
>> +                * sizeof(VECTYPE)) == 0
>> +            && ((uintptr_t) buf) % sizeof(VECTYPE) == 0) {
>> +        return true;
>> +    }
>> +    return false;
>> +}
>> +size_t buffer_find_nonzero_offset(const void *buf, size_t len);
>> +
>> #endif
>> diff --git a/util/cutils.c b/util/cutils.c
>> index 1439da4..41c627e 100644
>> --- a/util/cutils.c
>> +++ b/util/cutils.c
>> @@ -143,6 +143,51 @@ int qemu_fdatasync(int fd)
>> }
>> 
>> /*
>> + * Searches for an area with non-zero content in a buffer
>> + *
>> + * Attention! The len must be a multiple of
>> + * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * sizeof(VECTYPE)
>> + * and addr must be a multiple of sizeof(VECTYPE) due to
>> + * restriction of optimizations in this function.
>> + *
>> + * can_use_buffer_find_nonzero_offset() can be used to check
>> + * these requirements.
>> + *
>> + * The return value is the offset of the non-zero area rounded
>> + * down to BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * sizeof(VECTYPE).
>> + * If the buffer is all zero the return value is equal to len.
>> + */
>> +
>> +size_t buffer_find_nonzero_offset(const void *buf, size_t len)
>> +{
>> +    VECTYPE *p = (VECTYPE *)buf;
>> +    VECTYPE zero = ZERO_SPLAT;
>> +    size_t i;
>> +
>> +    assert(len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
>> +        * sizeof(VECTYPE)) == 0);
>> +    assert(((uintptr_t) buf) % sizeof(VECTYPE) == 0);
>> +
>> +    if (*((const long *) buf)) {
>> +        return 0;
>> +    }
>> +
>> +    for (i = 0; i < len / sizeof(VECTYPE);
> Why not put len/sizeof(VECTYPE) in a variable?

are you afraid that there is a division at each iteration?

sizeof(VECTYPE) is a power of 2 so i think the compiler will optimize it
to a >> at compile time.

I would also be ok with writing len /= sizeof(VECTYPE) before the loop.

Peter

> Orit
>> +            i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
>> +        VECTYPE tmp0 = p[i + 0] | p[i + 1];
>> +        VECTYPE tmp1 = p[i + 2] | p[i + 3];
>> +        VECTYPE tmp2 = p[i + 4] | p[i + 5];
>> +        VECTYPE tmp3 = p[i + 6] | p[i + 7];
>> +        VECTYPE tmp01 = tmp0 | tmp1;
>> +        VECTYPE tmp23 = tmp2 | tmp3;
>> +        if (!ALL_EQ(tmp01 | tmp23, zero)) {
>> +            break;
>> +        }
>> +    }
>> +    return i * sizeof(VECTYPE);
>> +}
>> +
>> +/*
>>  * Checks if a buffer is all zeroes
>>  *
>>  * Attention! The len must be a multiple of 4 * sizeof(long) due to
>> 
>
Orit Wasserman - March 25, 2013, 9:26 a.m.
On 03/25/2013 10:56 AM, Peter Lieven wrote:
> 
> Am 25.03.2013 um 09:53 schrieb Orit Wasserman <owasserm@redhat.com>:
> 
>> On 03/22/2013 02:46 PM, Peter Lieven wrote:
>>> this adds buffer_find_nonzero_offset() which is a SSE2/Altivec
>>> optimized function that searches for non-zero content in a
>>> buffer.
>>>
>>> due to the optimizations used in the function there are restrictions
>>> on buffer address and search length. the function
>>> can_use_buffer_find_nonzero_content() can be used to check if
>>> the function can be used safely.
>>>
>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>> ---
>>> include/qemu-common.h |   13 +++++++++++++
>>> util/cutils.c         |   45 +++++++++++++++++++++++++++++++++++++++++++++
>>> 2 files changed, 58 insertions(+)
>>>
>>> diff --git a/include/qemu-common.h b/include/qemu-common.h
>>> index e76ade3..078e535 100644
>>> --- a/include/qemu-common.h
>>> +++ b/include/qemu-common.h
>>> @@ -472,4 +472,17 @@ void hexdump(const char *buf, FILE *fp, const char *prefix, size_t size);
>>> #define ALL_EQ(v1, v2) ((v1) == (v2))
>>> #endif
>>>
>>> +#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR 8
>>> +static inline bool
>>> +can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
>>> +{
>>> +    if (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
>>> +                * sizeof(VECTYPE)) == 0
>>> +            && ((uintptr_t) buf) % sizeof(VECTYPE) == 0) {
>>> +        return true;
>>> +    }
>>> +    return false;
>>> +}
>>> +size_t buffer_find_nonzero_offset(const void *buf, size_t len);
>>> +
>>> #endif
>>> diff --git a/util/cutils.c b/util/cutils.c
>>> index 1439da4..41c627e 100644
>>> --- a/util/cutils.c
>>> +++ b/util/cutils.c
>>> @@ -143,6 +143,51 @@ int qemu_fdatasync(int fd)
>>> }
>>>
>>> /*
>>> + * Searches for an area with non-zero content in a buffer
>>> + *
>>> + * Attention! The len must be a multiple of
>>> + * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * sizeof(VECTYPE)
>>> + * and addr must be a multiple of sizeof(VECTYPE) due to
>>> + * restriction of optimizations in this function.
>>> + *
>>> + * can_use_buffer_find_nonzero_offset() can be used to check
>>> + * these requirements.
>>> + *
>>> + * The return value is the offset of the non-zero area rounded
>>> + * down to BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * sizeof(VECTYPE).
>>> + * If the buffer is all zero the return value is equal to len.
>>> + */
>>> +
>>> +size_t buffer_find_nonzero_offset(const void *buf, size_t len)
>>> +{
>>> +    VECTYPE *p = (VECTYPE *)buf;
>>> +    VECTYPE zero = ZERO_SPLAT;
>>> +    size_t i;
>>> +
>>> +    assert(len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
>>> +        * sizeof(VECTYPE)) == 0);
>>> +    assert(((uintptr_t) buf) % sizeof(VECTYPE) == 0);
>>> +
>>> +    if (*((const long *) buf)) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    for (i = 0; i < len / sizeof(VECTYPE);
>> Why not put len/sizeof(VECTYPE) in a variable?
> 
> are you afraid that there is a division at each iteration?
> 
> sizeof(VECTYPE) is a power of 2 so i think the compiler will optimize it
> to a >> at compile time.
true, but it still is done every iteration.
> 
> I would also be ok with writing len /= sizeof(VECTYPE) before the loop.
I would prefer it :)

Orit
> 
> Peter
> 
>> Orit
>>> +            i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
>>> +        VECTYPE tmp0 = p[i + 0] | p[i + 1];
>>> +        VECTYPE tmp1 = p[i + 2] | p[i + 3];
>>> +        VECTYPE tmp2 = p[i + 4] | p[i + 5];
>>> +        VECTYPE tmp3 = p[i + 6] | p[i + 7];
>>> +        VECTYPE tmp01 = tmp0 | tmp1;
>>> +        VECTYPE tmp23 = tmp2 | tmp3;
>>> +        if (!ALL_EQ(tmp01 | tmp23, zero)) {
>>> +            break;
>>> +        }
>>> +    }
>>> +    return i * sizeof(VECTYPE);
>>> +}
>>> +
>>> +/*
>>>  * Checks if a buffer is all zeroes
>>>  *
>>>  * Attention! The len must be a multiple of 4 * sizeof(long) due to
>>>
>>
>
Paolo Bonzini - March 25, 2013, 9:42 a.m.
> >>> +size_t buffer_find_nonzero_offset(const void *buf, size_t len)
> >>> +{
> >>> +    VECTYPE *p = (VECTYPE *)buf;
> >>> +    VECTYPE zero = ZERO_SPLAT;
> >>> +    size_t i;
> >>> +
> >>> +    assert(len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
> >>> +        * sizeof(VECTYPE)) == 0);
> >>> +    assert(((uintptr_t) buf) % sizeof(VECTYPE) == 0);
> >>> +
> >>> +    if (*((const long *) buf)) {
> >>> +        return 0;
> >>> +    }
> >>> +
> >>> +    for (i = 0; i < len / sizeof(VECTYPE);
> >> Why not put len/sizeof(VECTYPE) in a variable?
> > 
> > are you afraid that there is a division at each iteration?
> > 
> > sizeof(VECTYPE) is a power of 2 so i think the compiler will
> > optimize it
> > to a >> at compile time.
> true, but it still is done every iteration.

len is an invariant, the compiler will move it out of the loop
automatically.  Write readable code unless you have good clues
that it is also slow.

Paolo
Orit Wasserman - March 25, 2013, 10:03 a.m.
On 03/25/2013 11:42 AM, Paolo Bonzini wrote:
> 
>>>>> +size_t buffer_find_nonzero_offset(const void *buf, size_t len)
>>>>> +{
>>>>> +    VECTYPE *p = (VECTYPE *)buf;
>>>>> +    VECTYPE zero = ZERO_SPLAT;
>>>>> +    size_t i;
>>>>> +
>>>>> +    assert(len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
>>>>> +        * sizeof(VECTYPE)) == 0);
>>>>> +    assert(((uintptr_t) buf) % sizeof(VECTYPE) == 0);
>>>>> +
>>>>> +    if (*((const long *) buf)) {
>>>>> +        return 0;
>>>>> +    }
>>>>> +
>>>>> +    for (i = 0; i < len / sizeof(VECTYPE);
>>>> Why not put len/sizeof(VECTYPE) in a variable?
>>>
>>> are you afraid that there is a division at each iteration?
>>>
>>> sizeof(VECTYPE) is a power of 2 so i think the compiler will
>>> optimize it
>>> to a >> at compile time.
>> true, but it still is done every iteration.
> 
> len is an invariant, the compiler will move it out of the loop
> automatically.  Write readable code unless you have good clues
> that it is also slow.
> 
I know it does for x86 but I wasn't sure for other platforms.
I'm fine with as is.

Orit
> Paolo
>

Patch

diff --git a/include/qemu-common.h b/include/qemu-common.h
index e76ade3..078e535 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -472,4 +472,17 @@  void hexdump(const char *buf, FILE *fp, const char *prefix, size_t size);
 #define ALL_EQ(v1, v2) ((v1) == (v2))
 #endif
 
+#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR 8
+static inline bool
+can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
+{
+    if (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
+                * sizeof(VECTYPE)) == 0
+            && ((uintptr_t) buf) % sizeof(VECTYPE) == 0) {
+        return true;
+    }
+    return false;
+}
+size_t buffer_find_nonzero_offset(const void *buf, size_t len);
+
 #endif
diff --git a/util/cutils.c b/util/cutils.c
index 1439da4..41c627e 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -143,6 +143,51 @@  int qemu_fdatasync(int fd)
 }
 
 /*
+ * Searches for an area with non-zero content in a buffer
+ *
+ * Attention! The len must be a multiple of
+ * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * sizeof(VECTYPE)
+ * and addr must be a multiple of sizeof(VECTYPE) due to
+ * restriction of optimizations in this function.
+ *
+ * can_use_buffer_find_nonzero_offset() can be used to check
+ * these requirements.
+ *
+ * The return value is the offset of the non-zero area rounded
+ * down to BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * sizeof(VECTYPE).
+ * If the buffer is all zero the return value is equal to len.
+ */
+
+size_t buffer_find_nonzero_offset(const void *buf, size_t len)
+{
+    VECTYPE *p = (VECTYPE *)buf;
+    VECTYPE zero = ZERO_SPLAT;
+    size_t i;
+
+    assert(len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
+        * sizeof(VECTYPE)) == 0);
+    assert(((uintptr_t) buf) % sizeof(VECTYPE) == 0);
+
+    if (*((const long *) buf)) {
+        return 0;
+    }
+
+    for (i = 0; i < len / sizeof(VECTYPE);
+            i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
+        VECTYPE tmp0 = p[i + 0] | p[i + 1];
+        VECTYPE tmp1 = p[i + 2] | p[i + 3];
+        VECTYPE tmp2 = p[i + 4] | p[i + 5];
+        VECTYPE tmp3 = p[i + 6] | p[i + 7];
+        VECTYPE tmp01 = tmp0 | tmp1;
+        VECTYPE tmp23 = tmp2 | tmp3;
+        if (!ALL_EQ(tmp01 | tmp23, zero)) {
+            break;
+        }
+    }
+    return i * sizeof(VECTYPE);
+}
+
+/*
  * Checks if a buffer is all zeroes
  *
  * Attention! The len must be a multiple of 4 * sizeof(long) due to