Message ID | 4DD38F03.7020209@gmail.com |
---|---|
State | New |
On 18.05.2011 11:18, Dmitry Konishchev wrote:
> On 18.05.2011 11:57, Stefan Hajnoczi wrote:
>> Yes, optimizing is_not_zero() is good. The only additional thing I
>> suggest is adding a comment before the function to document the length
>> constraint.
>
> OK, fixed.
>
>
> On 18.05.2011 12:05, Kevin Wolf wrote:
>> A future bdrv_is_allocated() patch must make sure that the conversion
>> falls back to a simple is_not_zero() when a backing file is used.
>
> Thanks, I'll take this into account.
>
>
> Signed-off-by: Dmitry Konishchev <konishchev@gmail.com>
> ---
>  qemu-img.c |   30 +++++++++++++++++++++++++++---
>  1 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/qemu-img.c b/qemu-img.c
> index e825123..7665c2f 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -496,14 +496,38 @@ static int img_commit(int argc, char **argv)
>      return 0;
>  }
>
> +/*
> + * Checks whether the sector is not a zero sector.
> + *
> + * Attention! The len must be a multiple of 4 * sizeof(long) due to
> + * restriction of optimizations in this function.
> + */
>  static int is_not_zero(const uint8_t *sector, int len)
>  {
> +    /*
> +     * Use long as the biggest available internal data type that fits
> into the
> +     * CPU register and unroll the loop to smooth out the effect of memory
> +     * latency.
> +     */
> +
>      int i;
> -    len >>= 2;
> -    for(i = 0;i < len; i++) {
> -        if (((uint32_t *)sector)[i] != 0)
> +    len /= sizeof(long);
> +
> +    long d0;
> +    long d1;
> +    long d2;
> +    long d3;

Please move the declarations to the start of the function. I also would
use a single line like "long d0, d1, d2, d3;", but that's up to you.

> +
> +    for(i = 0; i < len; i += 4) {
> +        d0 = ((const long*) sector)[i + 0];
> +        d1 = ((const long*) sector)[i + 1];
> +        d2 = ((const long*) sector)[i + 2];
> +        d3 = ((const long*) sector)[i + 3];

I would suggest declaring a const long* variable so that you don't have
to cast each time you use it, but that's probably a matter of taste.

> +
> +        if (d0 || d1 || d2 || d3)
>              return 1;

Coding style requires braces here.

>      }
> +
>      return 0;
>  }

Please make sure that your patch isn't line-wrapped when you send it for
inclusion. git send-email will do the right thing.

Kevin
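For reference, the three review comments above (declarations at the start of the function, a single const long* variable instead of repeated casts, braces around the if body) could be applied roughly as follows. This is only a sketch of one possible respin, not the version that was actually resubmitted; the pointer name data and the helper is_not_zero_demo() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Checks whether the sector is not a zero sector.
 *
 * len must be a multiple of 4 * sizeof(long).
 */
static int is_not_zero(const uint8_t *sector, int len)
{
    /* All declarations sit at the start of the function. */
    const long *data = (const long *) sector;  /* cast once, not per access */
    long d0, d1, d2, d3;
    int i;

    len /= sizeof(long);

    for (i = 0; i < len; i += 4) {
        d0 = data[i + 0];
        d1 = data[i + 1];
        d2 = data[i + 2];
        d3 = data[i + 3];

        if (d0 || d1 || d2 || d3) {  /* braces, as the coding style requires */
            return 1;
        }
    }

    return 0;
}

/* Hypothetical usage helper, not part of the patch. */
void is_not_zero_demo(void)
{
    /* Backing the sector with long storage keeps it suitably aligned. */
    long storage[512 / sizeof(long)] = { 0 };
    uint8_t *sector = (uint8_t *) storage;

    assert(is_not_zero(sector, 512) == 0);  /* all zero */
    sector[511] = 1;
    assert(is_not_zero(sector, 512) == 1);  /* last byte set */
}
```

The behaviour is meant to be identical to the posted patch; only the layout changes.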
On 18 May 2011 10:18, Dmitry Konishchev <konishchev@gmail.com> wrote:
> + * Attention! The len must be a multiple of 4 * sizeof(long) due to
> + * restriction of optimizations in this function.

You could assert() this:
  assert(argc % (4 * sizeof(long)) == 0);

-- PMM
On 18 May 2011 10:40, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 18 May 2011 10:18, Dmitry Konishchev <konishchev@gmail.com> wrote:
>
>> + * Attention! The len must be a multiple of 4 * sizeof(long) due to
>> + * restriction of optimizations in this function.
>
> You could assert() this:
>   assert(argc % (4 * sizeof(long)) == 0);

s/argc/len/, obviously!

-- PMM
On Wed, May 18, 2011 at 1:40 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> You could assert() this:
>   assert(argc % (4 * sizeof(long)) == 0);

Yeah, but actually I don't really like the idea of including asserts in
small bottleneck functions when the configure script doesn't add
-DNDEBUG to the compiler CFLAGS by default. In that case the assert is
just additional instructions for the CPU pipeline to stumble over, and
it also decreases the chances of this function being inlined by the
compiler. Inlining can give us an additional boost: the compiler will be
able to see that, at the place where the function is inlined, it is
always called with len == 512, and optimize the code for this case by
automatically unrolling the loop and so on.

But the bottom line is that I don't really mind including the assert; I
just believe that it's not really worth it.
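To make the trade-off discussed here concrete, this is roughly what the function would look like with the suggested assert on len. It is a sketch only: the name is_not_zero_checked() is invented for illustration, and the loop body is condensed compared to the patch. The standard assert() macro expands to nothing when the file is compiled with NDEBUG defined, so the extra check costs instructions only in builds without -DNDEBUG, which is the case the reply above is worried about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical variant of is_not_zero() with the length check added.
 * With -DNDEBUG the assert() below disappears and the function is
 * equivalent to the unchecked one.
 */
static int is_not_zero_checked(const uint8_t *sector, int len)
{
    const long *data = (const long *) sector;
    int i;

    /* The constraint from the function comment, checked on len. */
    assert(len % (4 * sizeof(long)) == 0);

    len /= sizeof(long);

    for (i = 0; i < len; i += 4) {
        if (data[i + 0] || data[i + 1] || data[i + 2] || data[i + 3]) {
            return 1;
        }
    }

    return 0;
}
```

Whether the check measurably hurts inlining or throughput would have to be benchmarked; nothing in this thread provides numbers either way.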
diff --git a/qemu-img.c b/qemu-img.c
index e825123..7665c2f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -496,14 +496,38 @@ static int img_commit(int argc, char **argv)
     return 0;
 }
 
+/*
+ * Checks whether the sector is not a zero sector.
+ *
+ * Attention! The len must be a multiple of 4 * sizeof(long) due to
+ * restriction of optimizations in this function.
+ */
 static int is_not_zero(const uint8_t *sector, int len)
 {
+    /*
+     * Use long as the biggest available internal data type that fits into the
+     * CPU register and unroll the loop to smooth out the effect of memory
+     * latency.
+     */
+
     int i;
-    len >>= 2;
-    for(i = 0;i < len; i++) {
-        if (((uint32_t *)sector)[i] != 0)
+    len /= sizeof(long);
+
+    long d0;
+    long d1;
+    long d2;
+    long d3;
+
+    for(i = 0; i < len; i += 4) {
+        d0 = ((const long*) sector)[i + 0];
+        d1 = ((const long*) sector)[i + 1];
+        d2 = ((const long*) sector)[i + 2];
+        d3 = ((const long*) sector)[i + 3];
+
+        if (d0 || d1 || d2 || d3)
             return 1;
     }
+
     return 0;
 }
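As a quick sanity check of the change, the removed loop (one uint32_t per iteration) and the added loop (four longs per iteration) can be compared on a 512-byte sector. The following is a standalone sketch, not part of the patch: the names is_not_zero_old(), is_not_zero_new() and old_and_new_agree() are invented here, the new loop is condensed, and the buffer comes from calloc() so that it is suitably aligned for long accesses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Loop removed by the patch: one 32-bit word per iteration. */
static int is_not_zero_old(const uint8_t *sector, int len)
{
    int i;

    len >>= 2;
    for (i = 0; i < len; i++) {
        if (((const uint32_t *) sector)[i] != 0) {
            return 1;
        }
    }
    return 0;
}

/* Loop added by the patch (condensed): four longs per iteration. */
static int is_not_zero_new(const uint8_t *sector, int len)
{
    const long *data = (const long *) sector;
    int i;

    len /= sizeof(long);
    for (i = 0; i < len; i += 4) {
        if (data[i + 0] || data[i + 1] || data[i + 2] || data[i + 3]) {
            return 1;
        }
    }
    return 0;
}

/*
 * Hypothetical check: returns 1 if both versions agree on an all-zero
 * sector and on every sector that has exactly one byte set.
 */
int old_and_new_agree(void)
{
    enum { SECTOR_SIZE = 512 };
    uint8_t *sector = calloc(1, SECTOR_SIZE);  /* zeroed and aligned */
    int i, ok;

    assert(sector != NULL);
    ok = is_not_zero_old(sector, SECTOR_SIZE) == 0 &&
         is_not_zero_new(sector, SECTOR_SIZE) == 0;

    for (i = 0; ok && i < SECTOR_SIZE; i++) {
        sector[i] = 1;
        ok = is_not_zero_old(sector, SECTOR_SIZE) == 1 &&
             is_not_zero_new(sector, SECTOR_SIZE) == 1;
        sector[i] = 0;
    }

    free(sector);
    return ok;
}
```

This only covers lengths that satisfy the documented constraint (a multiple of 4 * sizeof(long)); for other lengths the new loop is not expected to match the old one.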