Patchwork [PATCHv3,2/9] cutils: add a function to find non-zero content in a buffer

login
register
mail settings
Submitter Peter Lieven
Date March 21, 2013, 3:57 p.m.
Message ID <1363881457-14814-3-git-send-email-pl@kamp.de>
Download mbox | patch
Permalink /patch/229745/
State New
Headers show

Comments

Peter Lieven - March 21, 2013, 3:57 p.m.
this adds buffer_find_nonzero_offset() which is a SSE2/Altives
optimized function that searches for non-zero content in a
buffer.

due to the optimizations used in the function there are restrictions
on buffer address and search length. the function
can_use_buffer_find_nonzero_content() can be used to check if
the function can be used safely.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 include/qemu-common.h |    3 +++
 util/cutils.c         |   50 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)
Eric Blake - March 21, 2013, 6:12 p.m.
On 03/21/2013 09:57 AM, Peter Lieven wrote:
> this adds buffer_find_nonzero_offset() which is a SSE2/Altives

s/Altives/Altivec/

> optimized function that searches for non-zero content in a
> buffer.
> 
> due to the optimizations used in the function there are restrictions
> on buffer address and search length. the function
> can_use_buffer_find_nonzero_content() can be used to check if
> the function can be used safely.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  include/qemu-common.h |    3 +++
>  util/cutils.c         |   50 +++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 53 insertions(+)

> +inline bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len);
> +inline size_t buffer_find_nonzero_offset(const void *buf, size_t len);

Ouch.  It is okay to add a 'static inline' function, but then the
implementation must live in this header.  Otherwise, the function must
not be inline, or you risk linker errors.

> +++ b/util/cutils.c
> @@ -143,6 +143,56 @@ int qemu_fdatasync(int fd)
>  }
>  
>  /*
> + * Searches for an area with non-zero content in a buffer
> + *
> + * Attention! The len must be a multiple of 8 * sizeof(VECTYPE) 

Should we call out BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR instead of a
magic number here?  But I'm okay with leaving it as-is.

> + * and addr must be a multiple of sizeof(VECTYPE) due to 

Trailing whitespace (here, and on several other lines).  Please run your
series through scripts/checkpatch.pl before submitting v4.

> + * restriction of optimizations in this function.
> + * 
> + * can_use_buffer_find_nonzero_offset() can be used to check
> + * these requirements.
> + * 
> + * The return value is the offset of the non-zero area rounded
> + * down to 8 * sizeof(VECTYPE). If the buffer is all zero 

Same comment on this use of '8'.

> + * the return value is equal to len.
> + */
> +
> +inline size_t buffer_find_nonzero_offset(const void *buf, size_t len)

s/inline// (or move it to a 'static inline' definition in the .h)

> +{
> +    VECTYPE *p = (VECTYPE *)buf;
> +    VECTYPE zero = ZERO_SPLAT;
> +    size_t i;
> +    

You copied the 'Attention! ...' message from buffer_is_zero, which
currently asserts that its condition is held.  Therefore, consistency
would argue that you should assert your preconditions here, even if it
adds more to the code size.  But this is something where a maintainer
might have a better opinion on whether to keep the code robust with an
assert(), or whether the faster operation without sanity checking is
more appropriate (in which case a followup to remove the assert from
buffer_is_zero would make sense).

>   * Checks if a buffer is all zeroes
>   *
>   * Attention! The len must be a multiple of 4 * sizeof(long) due to
> 

Cleaning up whitespace is trivial; but the incorrect use of 'inline'
requires a v4.
Peter Lieven - March 21, 2013, 7:11 p.m.
Am 21.03.2013 um 19:12 schrieb Eric Blake <eblake@redhat.com>:

> On 03/21/2013 09:57 AM, Peter Lieven wrote:
>> this adds buffer_find_nonzero_offset() which is a SSE2/Altives
> 
> s/Altives/Altivec/
> 
>> optimized function that searches for non-zero content in a
>> buffer.
>> 
>> due to the optimizations used in the function there are restrictions
>> on buffer address and search length. the function
>> can_use_buffer_find_nonzero_content() can be used to check if
>> the function can be used safely.
>> 
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> include/qemu-common.h |    3 +++
>> util/cutils.c         |   50 +++++++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 53 insertions(+)
> 
>> +inline bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len);
>> +inline size_t buffer_find_nonzero_offset(const void *buf, size_t len);
> 
> Ouch.  It is okay to add a 'static inline' function, but then the
> implementation must live in this header.  Otherwise, the function must
> not be inline, or you risk linker errors.

Sorry, i was not aware. 

> 
>> +++ b/util/cutils.c
>> @@ -143,6 +143,56 @@ int qemu_fdatasync(int fd)
>> }
>> 
>> /*
>> + * Searches for an area with non-zero content in a buffer
>> + *
>> + * Attention! The len must be a multiple of 8 * sizeof(VECTYPE) 
> 
> Should we call out BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR instead of a
> magic number here?  But I'm okay with leaving it as-is.
> 
>> + * and addr must be a multiple of sizeof(VECTYPE) due to 
> 
> Trailing whitespace (here, and on several other lines).  Please run your
> series through scripts/checkpatch.pl before submitting v4.

thanks for the pointer.

> 
>> + * restriction of optimizations in this function.
>> + * 
>> + * can_use_buffer_find_nonzero_offset() can be used to check
>> + * these requirements.
>> + * 
>> + * The return value is the offset of the non-zero area rounded
>> + * down to 8 * sizeof(VECTYPE). If the buffer is all zero 
> 
> Same comment on this use of '8'.
> 
>> + * the return value is equal to len.
>> + */
>> +
>> +inline size_t buffer_find_nonzero_offset(const void *buf, size_t len)
> 
> s/inline// (or move it to a 'static inline' definition in the .h)

I would move at least the can_use_buffer_find_nonzero_offset() function
completely into the header and rely the compiler inlineing
buffer_find_nonzero_offset().

> 
>> +{
>> +    VECTYPE *p = (VECTYPE *)buf;
>> +    VECTYPE zero = ZERO_SPLAT;
>> +    size_t i;
>> +    
> 
> You copied the 'Attention! ...' message from buffer_is_zero, which
> currently asserts that its condition is held.  Therefore, consistency
> would argue that you should assert your preconditions here, even if it
> adds more to the code size.  But this is something where a maintainer
> might have a better opinion on whether to keep the code robust with an
> assert(), or whether the faster operation without sanity checking is
> more appropriate (in which case a followup to remove the assert from
> buffer_is_zero would make sense).

I would appreciate feedback on this from the maintainers. My concern
with buffer_find_nonzero_offset() is that it is used for checking millions
of 4KByte pages. buffer_is_zero is used for larger objects in the bdrv
context, afaik.



> 
>>  * Checks if a buffer is all zeroes
>>  *
>>  * Attention! The len must be a multiple of 4 * sizeof(long) due to
>> 
> 
> Cleaning up whitespace is trivial; but the incorrect use of 'inline'
> requires a v4.
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>

Patch

diff --git a/include/qemu-common.h b/include/qemu-common.h
index e76ade3..ebbaf71 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -362,6 +362,9 @@  size_t qemu_iovec_from_buf(QEMUIOVector *qiov, size_t offset,
 size_t qemu_iovec_memset(QEMUIOVector *qiov, size_t offset,
                          int fillc, size_t bytes);
 
+#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR 8
+inline bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len);
+inline size_t buffer_find_nonzero_offset(const void *buf, size_t len);
 bool buffer_is_zero(const void *buf, size_t len);
 
 void qemu_progress_init(int enabled, float min_skip);
diff --git a/util/cutils.c b/util/cutils.c
index 1439da4..6d079ac 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -143,6 +143,56 @@  int qemu_fdatasync(int fd)
 }
 
 /*
+ * Searches for an area with non-zero content in a buffer
+ *
+ * Attention! The len must be a multiple of 8 * sizeof(VECTYPE) 
+ * and addr must be a multiple of sizeof(VECTYPE) due to 
+ * restriction of optimizations in this function.
+ * 
+ * can_use_buffer_find_nonzero_offset() can be used to check
+ * these requirements.
+ * 
+ * The return value is the offset of the non-zero area rounded
+ * down to 8 * sizeof(VECTYPE). If the buffer is all zero 
+ * the return value is equal to len.
+ */
+
+inline size_t buffer_find_nonzero_offset(const void *buf, size_t len)
+{
+    VECTYPE *p = (VECTYPE *)buf;
+    VECTYPE zero = ZERO_SPLAT;
+    size_t i;
+    
+    if (*((const long *) buf)) {
+        return 0;
+    }
+
+    for (i = 0; i < len / sizeof(VECTYPE); 
+            i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
+        VECTYPE tmp0 = p[i + 0] | p[i + 1];
+        VECTYPE tmp1 = p[i + 2] | p[i + 3];
+        VECTYPE tmp2 = p[i + 4] | p[i + 5];
+        VECTYPE tmp3 = p[i + 6] | p[i + 7];
+        VECTYPE tmp01 = tmp0 | tmp1;
+        VECTYPE tmp23 = tmp2 | tmp3;
+        if (!ALL_EQ(tmp01 | tmp23, zero)) {
+            break;
+        }
+    }
+    return i * sizeof(VECTYPE);
+}
+
+inline bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
+{
+    if (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR 
+                * sizeof(VECTYPE)) == 0 
+            && ((uintptr_t) buf) % sizeof(VECTYPE) == 0) {
+        return true;
+    }
+    return false;
+}
+
+/*
  * Checks if a buffer is all zeroes
  *
  * Attention! The len must be a multiple of 4 * sizeof(long) due to