Message ID | BANLkTinHhx5jTgcH-cytci=fNdk2V7zssg@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Tue, May 17, 2011 at 3:33 PM, Dmitry Konishchev <konishchev@gmail.com> wrote: > Hi! I was wondering why qemu-img consumes so much CPU when it converts > one partially allocated qcow2 image to another qcow2 image and I've > written a patch which improves the situation a little. Please see http://wiki.qemu.org/Contribute/SubmitAPatch, which asks that patches are sent inline (not as attachments) for easy review and that you follow the coding style (see the CODING_STYLE file). Patches also need a Signed-off-by: line. The unrolled loop makes the function rely on len being a multiple of sizeof(long) * 4, otherwise it accesses beyond the end of sector[]. So for this use case it's okay but the function is generic anymore. GNU cp(1) tries to detect holes in files and image formats could tell us about unallocated regions using bdrv_is_allocated(). So I think there are ways to avoid comparing so much data in the first place, if you are interested in looking into that. Stefan
On Tue, May 17, 2011 at 7:35 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote: > Please see http://wiki.qemu.org/Contribute/SubmitAPatch, which asks > that patches are sent inline (not as attachments) for easy review and > that you follow the coding style (see the CODING_STYLE file). Patches > also need a Signed-off-by: line. OK, thanks. I'll fix this in case you are willing to accept the patch. On Tue, May 17, 2011 at 7:35 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote: > The unrolled loop makes the function rely on len being a multiple of > sizeof(long) * 4, otherwise it accesses beyond the end of sector[]. > So for this use case it's okay but the function is generic anymore. Yeah, but this function is static and within the whole file it is used only for comparing clusters, so I think that we can sacrifice it's universality for the sake of the performance. On Tue, May 17, 2011 at 7:35 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote: > GNU cp(1) tries to detect holes in files and image formats could tell > us about unallocated regions using bdrv_is_allocated(). So I think > there are ways to avoid comparing so much data in the first place, if > you are interested in looking into that. OK, thanks, I'll look on this function (but sorry, only after 2 weeks, since I'm on my vacation). But actually I think, that it will be better to use the both ways, since they can give a boost in different usecases. So, if you are agreed with the said above, you can accept this patch and then I'll write an enchancement for it with bdrv_is_allocated() because it is going to include this patch. -- Dmitry Konishchev mailto:konishchev@gmail.com
On Wed, May 18, 2011 at 7:55 AM, Dmitry Konishchev <konishchev@gmail.com> wrote: > So, if you are agreed with the said above, you can accept this patch > and then I'll write an enchancement for it with bdrv_is_allocated() > because it is going to include this patch. Yes, optimizing is_not_zero() is good. The only additional thing I suggest is adding a comment before the function to document the length constraint. Kevin Wolf is CCed, he's the QEMU block layer maintainer and may have additional thoughts before accepting this patch. Stefan
Am 18.05.2011 09:57, schrieb Stefan Hajnoczi: > On Wed, May 18, 2011 at 7:55 AM, Dmitry Konishchev <konishchev@gmail.com> wrote: >> So, if you are agreed with the said above, you can accept this patch >> and then I'll write an enchancement for it with bdrv_is_allocated() >> because it is going to include this patch. > > Yes, optimizing is_not_zero() is good. The only additional thing I > suggest is adding a comment before the function to document the length > constraint. > > Kevin Wolf is CCed, he's the QEMU block layer maintainer and may have > additional thoughts before accepting this patch. For this one not really. Except for the coding style it looks good to me. A future bdrv_is_allocated() patch must make sure that the conversion falls back to a simple is_not_zero() when a backing file is used. Kevin
From 61d228c0ea0d518de48a08577cd6d282e2f97759 Mon Sep 17 00:00:00 2001 From: Dmitry Konishchev <konishchev@gmail.com> Date: Tue, 17 May 2011 16:29:48 +0400 Subject: [PATCH] is_not_zero() optimization in qemu-img --- qemu-img.c | 24 +++++++++++++++++++++--- 1 files changed, 21 insertions(+), 3 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index e825123..41b4e32 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -498,12 +498,30 @@ static int img_commit(int argc, char **argv) static int is_not_zero(const uint8_t *sector, int len) { + /* + * Use long as the biggest available internal data type that fits into the + * CPU register and unroll the loop to smooth out the effect of memory + * latency. + */ + int i; - len >>= 2; - for(i = 0;i < len; i++) { - if (((uint32_t *)sector)[i] != 0) + len /= sizeof(long); + + long d0; + long d1; + long d2; + long d3; + + for(i = 0; i < len; i += 4) { + d0 = ((const long*) sector)[i + 0]; + d1 = ((const long*) sector)[i + 1]; + d2 = ((const long*) sector)[i + 2]; + d3 = ((const long*) sector)[i + 3]; + + if (d0 || d1 || d2 || d3) return 1; } + return 0; } -- 1.7.4.1