Patchwork [qemu-img] CPU consuming optimization

Submitter Dmitry Konishchev
Date May 18, 2011, 9:18 a.m.
Message ID <4DD38F03.7020209@gmail.com>
Permalink /patch/96139/
State New

Comments

Dmitry Konishchev - May 18, 2011, 9:18 a.m.
On 18.05.2011 11:57, Stefan Hajnoczi wrote:
> Yes, optimizing is_not_zero() is good.  The only additional thing I
> suggest is adding a comment before the function to document the length
> constraint.

OK, fixed.


On 18.05.2011 12:05, Kevin Wolf wrote:
> A future bdrv_is_allocated() patch must make sure that the conversion
> falls back to a simple is_not_zero() when a backing file is used.

Thanks, I'll take this into account.


Signed-off-by: Dmitry Konishchev <konishchev@gmail.com>
---
  qemu-img.c |   30 +++++++++++++++++++++++++++---
  1 files changed, 27 insertions(+), 3 deletions(-)
Kevin Wolf - May 18, 2011, 9:31 a.m.
On 18.05.2011 11:18, Dmitry Konishchev wrote:
> On 18.05.2011 11:57, Stefan Hajnoczi wrote:
>> Yes, optimizing is_not_zero() is good.  The only additional thing I
>> suggest is adding a comment before the function to document the length
>> constraint.
> 
> OK, fixed.
> 
> 
> On 18.05.2011 12:05, Kevin Wolf wrote:
>> A future bdrv_is_allocated() patch must make sure that the conversion
>> falls back to a simple is_not_zero() when a backing file is used.
> 
> Thanks, I'll take this into account.
> 
> 
> Signed-off-by: Dmitry Konishchev <konishchev@gmail.com>
> ---
>   qemu-img.c |   30 +++++++++++++++++++++++++++---
>   1 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index e825123..7665c2f 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -496,14 +496,38 @@ static int img_commit(int argc, char **argv)
>       return 0;
>   }
> 
> +/*
> + * Checks whether the sector is not a zero sector.
> + *
> + * Attention! The len must be a multiple of 4 * sizeof(long) due to
> + * restriction of optimizations in this function.
> + */
>   static int is_not_zero(const uint8_t *sector, int len)
>   {
> +    /*
> +     * Use long as the biggest available internal data type that fits 
> into the
> +     * CPU register and unroll the loop to smooth out the effect of memory
> +     * latency.
> +     */
> +
>       int i;
> -    len >>= 2;
> -    for(i = 0;i < len; i++) {
> -        if (((uint32_t *)sector)[i] != 0)
> +    len /= sizeof(long);
> +
> +    long d0;
> +    long d1;
> +    long d2;
> +    long d3;

Please move the declarations to the start of the function.

I also would use a single line like "long d0, d1, d2, d3;", but that's
up to you.

> +
> +    for(i = 0; i < len; i += 4) {
> +        d0 = ((const long*) sector)[i + 0];
> +        d1 = ((const long*) sector)[i + 1];
> +        d2 = ((const long*) sector)[i + 2];
> +        d3 = ((const long*) sector)[i + 3];

I would suggest declaring a const long* variable so that you don't have
to cast each time you use it, but that's probably a matter of taste.

> +
> +        if (d0 || d1 || d2 || d3)
>               return 1;

Coding style requires braces here.

>       }
> +
>       return 0;
>   }

Please make sure that your patch isn't line-wrapped when you send it for
inclusion. git send-email will do the right thing.

Kevin
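
For reference, the review comments above (declarations at the start of the function, a single const long* variable instead of repeated casts, braces around the if body) could be applied roughly as follows. This is only a sketch of how a revised function might look, not the version that was actually resubmitted; the variable name data is an assumption, and the code assumes that sector is suitably aligned for long and that len is a multiple of 4 * sizeof(long).

```c
#include <stdint.h>

/*
 * Sketch only: returns 1 if the buffer contains a non-zero byte, else 0.
 *
 * Assumptions (not checked here): len is a multiple of 4 * sizeof(long)
 * and sector is suitably aligned for access through a long pointer.
 */
static int is_not_zero(const uint8_t *sector, int len)
{
    /* Cast once instead of on every access. */
    const long *data = (const long *)sector;
    long d0, d1, d2, d3;
    int i;

    /* From here on, len counts long-sized words rather than bytes. */
    len /= sizeof(long);

    for (i = 0; i < len; i += 4) {
        d0 = data[i + 0];
        d1 = data[i + 1];
        d2 = data[i + 2];
        d3 = data[i + 3];

        if (d0 || d1 || d2 || d3) {
            return 1;
        }
    }

    return 0;
}
```

Called on an all-zero 512-byte sector this returns 0; setting any single byte in the sector makes it return 1.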
Peter Maydell - May 18, 2011, 9:40 a.m.
On 18 May 2011 10:18, Dmitry Konishchev <konishchev@gmail.com> wrote:

> + * Attention! The len must be a multiple of 4 * sizeof(long) due to
> + * restriction of optimizations in this function.

You could assert() this:
 assert(argc % (4 * sizeof(long)) == 0);

-- PMM
Peter Maydell - May 18, 2011, 9:40 a.m.
On 18 May 2011 10:40, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 18 May 2011 10:18, Dmitry Konishchev <konishchev@gmail.com> wrote:
>
>> + * Attention! The len must be a multiple of 4 * sizeof(long) due to
>> + * restriction of optimizations in this function.
>
> You could assert() this:
>  assert(argc % (4 * sizeof(long)) == 0);

s/argc/len/, obviously!

-- PMM
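
With the correction applied, the suggested check is on len rather than argc. A minimal sketch of where it would sit, assuming the same unrolled loop as in the patch; the function name is_not_zero_checked and the variable data are hypothetical and used only to keep the example self-contained:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch only: the unrolled zero check with the length constraint
 * enforced by assert(). Assumes sector is suitably aligned for long.
 */
static int is_not_zero_checked(const uint8_t *sector, int len)
{
    const long *data = (const long *)sector;
    int i;

    /* The check from the review, written against len (in bytes). */
    assert(len % (4 * sizeof(long)) == 0);

    /* From here on, len counts long-sized words rather than bytes. */
    len /= sizeof(long);

    for (i = 0; i < len; i += 4) {
        if (data[i + 0] || data[i + 1] || data[i + 2] || data[i + 3]) {
            return 1;
        }
    }

    return 0;
}
```

The assert has to come before len is divided by sizeof(long), because the documented constraint is expressed in bytes.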
Dmitry Konishchev - May 18, 2011, 10:27 a.m.
On Wed, May 18, 2011 at 1:40 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> You could assert() this:
>  assert(argc % (4 * sizeof(long)) == 0);

Yeah, but actually I don't really like the idea of including asserts in
small bottleneck functions when the configure script doesn't add -DNDEBUG
to the compiler cflags by default. In that case the assert adds extra
instructions for the CPU pipeline to stumble over, and it also decreases
the chances of the function being inlined by the compiler. Inlining can
give us an additional boost: where the function is inlined, the compiler
can see that it is always called with len == 512 and optimize the code
for that case, for example by unrolling the loop automatically.

Bottom line: I don't really mind including the assert, I just believe
that it's not really worth it.
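
One possible middle ground between the two positions, which nobody in this thread proposed, is to fix the length at compile time and check the constraint there, so the check costs no runtime instructions and the compiler sees a constant loop bound. The sketch below assumes a C11 compiler (for _Static_assert) and a fixed 512-byte sector as mentioned above; the names SECTOR_SIZE and sector_is_not_zero are hypothetical.

```c
#include <stdint.h>

/* Hypothetical name; 512 is the length mentioned in the discussion. */
#define SECTOR_SIZE 512

/* Compile-time check of the length constraint: no runtime cost. */
_Static_assert(SECTOR_SIZE % (4 * sizeof(long)) == 0,
               "SECTOR_SIZE must be a multiple of 4 * sizeof(long)");

/*
 * Sketch only: zero check for one fixed-size sector. The loop bound is
 * a compile-time constant, so the compiler is free to unroll further.
 * Assumes sector is suitably aligned for access through a long pointer.
 */
static inline int sector_is_not_zero(const uint8_t *sector)
{
    const long *data = (const long *)sector;
    int i;

    for (i = 0; i < (int)(SECTOR_SIZE / sizeof(long)); i += 4) {
        if (data[i + 0] || data[i + 1] || data[i + 2] || data[i + 3]) {
            return 1;
        }
    }

    return 0;
}
```

Whether this is actually faster than the patch as posted would need to be measured; the sketch only shows that the constraint can be verified without a runtime assert.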

Patch

diff --git a/qemu-img.c b/qemu-img.c
index e825123..7665c2f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -496,14 +496,38 @@  static int img_commit(int argc, char **argv)
      return 0;
  }

+/*
+ * Checks whether the sector is not a zero sector.
+ *
+ * Attention! The len must be a multiple of 4 * sizeof(long) due to
+ * restriction of optimizations in this function.
+ */
  static int is_not_zero(const uint8_t *sector, int len)
  {
+    /*
+     * Use long as the biggest available internal data type that fits 
into the
+     * CPU register and unroll the loop to smooth out the effect of memory
+     * latency.
+     */
+
      int i;
-    len >>= 2;
-    for(i = 0;i < len; i++) {
-        if (((uint32_t *)sector)[i] != 0)
+    len /= sizeof(long);
+
+    long d0;
+    long d1;
+    long d2;
+    long d3;
+
+    for(i = 0; i < len; i += 4) {
+        d0 = ((const long*) sector)[i + 0];
+        d1 = ((const long*) sector)[i + 1];
+        d2 = ((const long*) sector)[i + 2];
+        d3 = ((const long*) sector)[i + 3];
+
+        if (d0 || d1 || d2 || d3)
              return 1;
      }
+
      return 0;
  }