Message ID | 20190221173758.24672-1-slp@redhat.com |
---|---|
State | New |
Headers | show |
Series | qemu-img: implement copy offload (-C) for dd | expand |
On 2/21/19 11:37 AM, Sergio Lopez wrote: > This parameter is analogous to convert's "-C", making use of > bdrv_co_copy_range(). The last time I tried to patch 'qemu-img dd', it was pointed out that it already has several bugs (where it is not on feature-parity with real dd), and that we REALLY want to make it a syntactic sugar wrapper around 'qemu-img convert', rather than duplicating code (which means that qemu-img convert needs to make it easier to do arbitrary offsets and subsets - although to some extent you can already do that with --image-opts and appropriate raw driver wrappers). https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg02618.html > > Signed-off-by: Sergio Lopez <slp@redhat.com> > --- > qemu-img-cmds.hx | 4 +- > qemu-img.c | 146 ++++++++++++++++++++++++++++++++++++----------- > qemu-img.texi | 2 +- > 3 files changed, 117 insertions(+), 35 deletions(-) In other words, this feels like a lot of code compared to being able to just forward on to an appropriate qemu-img convert command, and I think our time would be better spent on that.
On Thu, Feb 21, 2019 at 12:08:12PM -0600, Eric Blake wrote: > On 2/21/19 11:37 AM, Sergio Lopez wrote: > > This parameter is analogous to convert's "-C", making use of > > bdrv_co_copy_range(). > > The last time I tried to patch 'qemu-img dd', it was pointed out that it > already has several bugs (where it is not on feature-parity with real > dd), and that we REALLY want to make it a syntactic sugar wrapper around > 'qemu-img convert', rather than duplicating code (which means that > qemu-img convert needs to make it easier to do arbitrary offsets and > subsets - although to some extent you can already do that with > --image-opts and appropriate raw driver wrappers). > > https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg02618.html Interesting, I wasn't aware of that conversation. It might a little late to go again through it, but while I don't a strong opinion about it, I do have some reservations about the idea of making 'dd' a frontend for 'convert'. While I do see the functional similarity of both commands, to me they are quite different at a semantical level. For 'convert', I do expect it to do "the right thing" and use the optimal settings (i.e. choosing the best transfer size) by default, while 'dd' is more of "do whatever the user told you to do no matter how wrong it is". Due to this differences, I think turning 'convert' code into something able to deal with 'dd' semantics would imply adding a considerable number of conditionals, possibly making it harder to maintain than keeping it separate. Sergio (slp).
Am 21.02.2019 um 20:32 hat Sergio Lopez geschrieben: > On Thu, Feb 21, 2019 at 12:08:12PM -0600, Eric Blake wrote: > > On 2/21/19 11:37 AM, Sergio Lopez wrote: > > > This parameter is analogous to convert's "-C", making use of > > > bdrv_co_copy_range(). > > > > The last time I tried to patch 'qemu-img dd', it was pointed out that it > > already has several bugs (where it is not on feature-parity with real > > dd), and that we REALLY want to make it a syntactic sugar wrapper around > > 'qemu-img convert', rather than duplicating code (which means that > > qemu-img convert needs to make it easier to do arbitrary offsets and > > subsets - although to some extent you can already do that with > > --image-opts and appropriate raw driver wrappers). > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg02618.html > > Interesting, I wasn't aware of that conversation. It might a little > late to go again through it, but while I don't a strong opinion about > it, I do have some reservations about the idea of making 'dd' a > frontend for 'convert'. > > While I do see the functional similarity of both commands, to me they > are quite different at a semantical level. For 'convert', I do expect > it to do "the right thing" and use the optimal settings (i.e. choosing > the best transfer size) by default, while 'dd' is more of "do whatever > the user told you to do no matter how wrong it is". I think that's what defaults are for. It would make sense to allow the user to specify the buffer size even for 'qemu-img convert'. In fact, we already have a variable for this, we'd just have to add an option to set it explicitly instead of just relying on what the output block node tells us. > Due to this differences, I think turning 'convert' code into something > able to deal with 'dd' semantics would imply adding a considerable > number of conditionals, possibly making it harder to maintain than > keeping it separate. 'qemu-img dd' currently supports five options: * if and of. These are obviously supported for convert, too. * count and skip. We don't have these in convert yet. We could either add the functionality there or add a raw layer in the 'dd' implementation before we call into the common code. * bs. The buffer size is already configurable in ImgConvertState. So getting this functionality into 'convert' should be easy. There are more differences between 'convert' and 'dd' in how exactly the copy is done. I'm not sure whether there is actually a good use for the dumb 'dd' copying or whether it was just implemented this way because it was easier. Currently we have features like copying only a given range only in 'dd', and most other features like zero detection, dealing with backing files, compression, copy offloading or parallel out-of-order copy only in 'convert'. Actually, we have more than just these two places that copy images: We also have the mirror block job and for other special cases also the other block jobs that copy data, each with its own list of features and missing options. What I really want to see eventually is a way to have all features available in all of these instead of only a random selection where you're out of luck if you want to copy part of an image with compressed output while the VM is running because these three are features supported in three different copy implementations and you can't get more than one of the features at the same time. Kevin
On Fri, Feb 22, 2019 at 11:04:25AM +0100, Kevin Wolf wrote: > Am 21.02.2019 um 20:32 hat Sergio Lopez geschrieben: > > On Thu, Feb 21, 2019 at 12:08:12PM -0600, Eric Blake wrote: > > > On 2/21/19 11:37 AM, Sergio Lopez wrote: > > > > This parameter is analogous to convert's "-C", making use of > > > > bdrv_co_copy_range(). > > > > > > The last time I tried to patch 'qemu-img dd', it was pointed out that it > > > already has several bugs (where it is not on feature-parity with real > > > dd), and that we REALLY want to make it a syntactic sugar wrapper around > > > 'qemu-img convert', rather than duplicating code (which means that > > > qemu-img convert needs to make it easier to do arbitrary offsets and > > > subsets - although to some extent you can already do that with > > > --image-opts and appropriate raw driver wrappers). > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg02618.html > > > > Interesting, I wasn't aware of that conversation. It might a little > > late to go again through it, but while I don't a strong opinion about > > it, I do have some reservations about the idea of making 'dd' a > > frontend for 'convert'. > > > > While I do see the functional similarity of both commands, to me they > > are quite different at a semantical level. For 'convert', I do expect > > it to do "the right thing" and use the optimal settings (i.e. choosing > > the best transfer size) by default, while 'dd' is more of "do whatever > > the user told you to do no matter how wrong it is". > > I think that's what defaults are for. It would make sense to allow the > user to specify the buffer size even for 'qemu-img convert'. In fact, we > already have a variable for this, we'd just have to add an option to set > it explicitly instead of just relying on what the output block node > tells us. > > > Due to this differences, I think turning 'convert' code into something > > able to deal with 'dd' semantics would imply adding a considerable > > number of conditionals, possibly making it harder to maintain than > > keeping it separate. > > 'qemu-img dd' currently supports five options: > > * if and of. These are obviously supported for convert, too. > * count and skip. We don't have these in convert yet. We could either > add the functionality there or add a raw layer in the 'dd' > implementation before we call into the common code. > * bs. The buffer size is already configurable in ImgConvertState. > > So getting this functionality into 'convert' should be easy. > > There are more differences between 'convert' and 'dd' in how exactly the > copy is done. I'm not sure whether there is actually a good use for the > dumb 'dd' copying or whether it was just implemented this way because it > was easier. > > Currently we have features like copying only a given range only in 'dd', > and most other features like zero detection, dealing with backing files, > compression, copy offloading or parallel out-of-order copy only in > 'convert'. > > Actually, we have more than just these two places that copy images: We > also have the mirror block job and for other special cases also the > other block jobs that copy data, each with its own list of features and > missing options. > > What I really want to see eventually is a way to have all features > available in all of these instead of only a random selection where > you're out of luck if you want to copy part of an image with compressed > output while the VM is running because these three are features > supported in three different copy implementations and you can't get more > than one of the features at the same time. OK, makes sense. Is someone already working on this? Sergio (slp).
Am 25.02.2019 um 14:40 hat Sergio Lopez geschrieben: > On Fri, Feb 22, 2019 at 11:04:25AM +0100, Kevin Wolf wrote: > > Am 21.02.2019 um 20:32 hat Sergio Lopez geschrieben: > > > On Thu, Feb 21, 2019 at 12:08:12PM -0600, Eric Blake wrote: > > > > On 2/21/19 11:37 AM, Sergio Lopez wrote: > > > > > This parameter is analogous to convert's "-C", making use of > > > > > bdrv_co_copy_range(). > > > > > > > > The last time I tried to patch 'qemu-img dd', it was pointed out that it > > > > already has several bugs (where it is not on feature-parity with real > > > > dd), and that we REALLY want to make it a syntactic sugar wrapper around > > > > 'qemu-img convert', rather than duplicating code (which means that > > > > qemu-img convert needs to make it easier to do arbitrary offsets and > > > > subsets - although to some extent you can already do that with > > > > --image-opts and appropriate raw driver wrappers). > > > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg02618.html > > > > > > Interesting, I wasn't aware of that conversation. It might a little > > > late to go again through it, but while I don't a strong opinion about > > > it, I do have some reservations about the idea of making 'dd' a > > > frontend for 'convert'. > > > > > > While I do see the functional similarity of both commands, to me they > > > are quite different at a semantical level. For 'convert', I do expect > > > it to do "the right thing" and use the optimal settings (i.e. choosing > > > the best transfer size) by default, while 'dd' is more of "do whatever > > > the user told you to do no matter how wrong it is". > > > > I think that's what defaults are for. It would make sense to allow the > > user to specify the buffer size even for 'qemu-img convert'. In fact, we > > already have a variable for this, we'd just have to add an option to set > > it explicitly instead of just relying on what the output block node > > tells us. > > > > > Due to this differences, I think turning 'convert' code into something > > > able to deal with 'dd' semantics would imply adding a considerable > > > number of conditionals, possibly making it harder to maintain than > > > keeping it separate. > > > > 'qemu-img dd' currently supports five options: > > > > * if and of. These are obviously supported for convert, too. > > * count and skip. We don't have these in convert yet. We could either > > add the functionality there or add a raw layer in the 'dd' > > implementation before we call into the common code. > > * bs. The buffer size is already configurable in ImgConvertState. > > > > So getting this functionality into 'convert' should be easy. > > > > There are more differences between 'convert' and 'dd' in how exactly the > > copy is done. I'm not sure whether there is actually a good use for the > > dumb 'dd' copying or whether it was just implemented this way because it > > was easier. > > > > Currently we have features like copying only a given range only in 'dd', > > and most other features like zero detection, dealing with backing files, > > compression, copy offloading or parallel out-of-order copy only in > > 'convert'. > > > > Actually, we have more than just these two places that copy images: We > > also have the mirror block job and for other special cases also the > > other block jobs that copy data, each with its own list of features and > > missing options. > > > > What I really want to see eventually is a way to have all features > > available in all of these instead of only a random selection where > > you're out of luck if you want to copy part of an image with compressed > > output while the VM is running because these three are features > > supported in three different copy implementations and you can't get more > > than one of the features at the same time. > > OK, makes sense. Is someone already working on this? I don't think anyone has found the time yet. If you want to start work on it, maybe the best way to begin incrementally would be to factor out some common functionality from the block jobs into a new block/copy.c, from which we can later create an actual unified 'copy' block job. Or it might turn out that another way works better. I'm only talking theory here, I haven't actually tried how well it works. Kevin
On Mon, Feb 25, 2019 at 03:01:22PM +0100, Kevin Wolf wrote: > Am 25.02.2019 um 14:40 hat Sergio Lopez geschrieben: > > On Fri, Feb 22, 2019 at 11:04:25AM +0100, Kevin Wolf wrote: > > > Am 21.02.2019 um 20:32 hat Sergio Lopez geschrieben: > > > > On Thu, Feb 21, 2019 at 12:08:12PM -0600, Eric Blake wrote: > > > > > On 2/21/19 11:37 AM, Sergio Lopez wrote: > > > > > > This parameter is analogous to convert's "-C", making use of > > > > > > bdrv_co_copy_range(). > > > > > > > > > > The last time I tried to patch 'qemu-img dd', it was pointed out that it > > > > > already has several bugs (where it is not on feature-parity with real > > > > > dd), and that we REALLY want to make it a syntactic sugar wrapper around > > > > > 'qemu-img convert', rather than duplicating code (which means that > > > > > qemu-img convert needs to make it easier to do arbitrary offsets and > > > > > subsets - although to some extent you can already do that with > > > > > --image-opts and appropriate raw driver wrappers). > > > > > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg02618.html > > > > > > > > Interesting, I wasn't aware of that conversation. It might a little > > > > late to go again through it, but while I don't a strong opinion about > > > > it, I do have some reservations about the idea of making 'dd' a > > > > frontend for 'convert'. > > > > > > > > While I do see the functional similarity of both commands, to me they > > > > are quite different at a semantical level. For 'convert', I do expect > > > > it to do "the right thing" and use the optimal settings (i.e. choosing > > > > the best transfer size) by default, while 'dd' is more of "do whatever > > > > the user told you to do no matter how wrong it is". > > > > > > I think that's what defaults are for. It would make sense to allow the > > > user to specify the buffer size even for 'qemu-img convert'. In fact, we > > > already have a variable for this, we'd just have to add an option to set > > > it explicitly instead of just relying on what the output block node > > > tells us. > > > > > > > Due to this differences, I think turning 'convert' code into something > > > > able to deal with 'dd' semantics would imply adding a considerable > > > > number of conditionals, possibly making it harder to maintain than > > > > keeping it separate. > > > > > > 'qemu-img dd' currently supports five options: > > > > > > * if and of. These are obviously supported for convert, too. > > > * count and skip. We don't have these in convert yet. We could either > > > add the functionality there or add a raw layer in the 'dd' > > > implementation before we call into the common code. > > > * bs. The buffer size is already configurable in ImgConvertState. > > > > > > So getting this functionality into 'convert' should be easy. > > > > > > There are more differences between 'convert' and 'dd' in how exactly the > > > copy is done. I'm not sure whether there is actually a good use for the > > > dumb 'dd' copying or whether it was just implemented this way because it > > > was easier. > > > > > > Currently we have features like copying only a given range only in 'dd', > > > and most other features like zero detection, dealing with backing files, > > > compression, copy offloading or parallel out-of-order copy only in > > > 'convert'. > > > > > > Actually, we have more than just these two places that copy images: We > > > also have the mirror block job and for other special cases also the > > > other block jobs that copy data, each with its own list of features and > > > missing options. > > > > > > What I really want to see eventually is a way to have all features > > > available in all of these instead of only a random selection where > > > you're out of luck if you want to copy part of an image with compressed > > > output while the VM is running because these three are features > > > supported in three different copy implementations and you can't get more > > > than one of the features at the same time. > > > > OK, makes sense. Is someone already working on this? > > I don't think anyone has found the time yet. > > If you want to start work on it, maybe the best way to begin > incrementally would be to factor out some common functionality from the > block jobs into a new block/copy.c, from which we can later create an > actual unified 'copy' block job. Thanks for the hint. I'm going to give it a try and I'll send an early RFC patchset to put it in common and confirm we're moving in the right direction. Sergio.
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx index 1526f327a5..7b85d183e5 100644 --- a/qemu-img-cmds.hx +++ b/qemu-img-cmds.hx @@ -56,9 +56,9 @@ STEXI ETEXI DEF("dd", img_dd, - "dd [--image-opts] [-U] [-f fmt] [-O output_fmt] [bs=block_size] [count=blocks] [skip=blocks] if=input of=output") + "dd [--image-opts] [-U] [-C] [-f fmt] [-O output_fmt] [bs=block_size] [count=blocks] [skip=blocks] if=input of=output") STEXI -@item dd [--image-opts] [-U] [-f @var{fmt}] [-O @var{output_fmt}] [bs=@var{block_size}] [count=@var{blocks}] [skip=@var{blocks}] if=@var{input} of=@var{output} +@item dd [--image-opts] [-U] [-C] [-f @var{fmt}] [-O @var{output_fmt}] [bs=@var{block_size}] [count=@var{blocks}] [skip=@var{blocks}] if=@var{input} of=@var{output} ETEXI DEF("info", img_info, diff --git a/qemu-img.c b/qemu-img.c index 25288c4d18..75ac0318a1 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -4310,7 +4310,6 @@ struct DdInfo { struct DdIo { int bsz; /* Block size */ char *filename; - uint8_t *buf; int64_t offset; }; @@ -4320,6 +4319,18 @@ struct DdOpts { unsigned int flag; }; +typedef struct ImgDdState { + BlockBackend *src; + int64_t src_offset; + BlockBackend *dst; + int64_t dst_offset; + int bsz; + int64_t size; + bool copy_range; + int running_coroutines; + int ret; +} ImgDdState; + static int img_dd_bs(const char *arg, struct DdIo *in, struct DdIo *out, struct DdInfo *dd) @@ -4383,6 +4394,86 @@ static int img_dd_skip(const char *arg, return 0; } +static void coroutine_fn dd_co_do_copy(void *opaque) +{ + ImgDdState *s = opaque; + int64_t block_count, in_pos, out_pos; + uint8_t *buf; + int ret = 0; + + s->running_coroutines++; + + in_pos = s->src_offset; + out_pos = s->dst_offset; + + if (s->copy_range) { + /* + * We don't impose any kind of alignment requirements nor + * adjustments here. It's up to the user to set the best + * parameters for a particular storage/driver. + */ + for (; in_pos < s->size && ret == 0; block_count++) { + int bytes = s->bsz; + + if (in_pos + bytes > s->size) { + bytes = s->size - in_pos; + } + + ret = blk_co_copy_range(s->src, in_pos, + s->dst, out_pos, + bytes, 0, 0); + in_pos += bytes; + out_pos += bytes; + } + + /* + * If copy_range succeded, jump to the end. Otherwise, + * fall through to try with the usual bouncing buffers. + */ + if (ret == 0) { + goto out; + } else { + warn_report("copy_range failed, continuing with the conventional " + "read/write approach."); + } + } + + buf = g_new(uint8_t, s->bsz); + + for (; in_pos < s->size; block_count++) { + int in_ret, out_ret; + + if (in_pos + s->bsz > s->size) { + in_ret = blk_pread(s->src, in_pos, buf, s->size - in_pos); + } else { + in_ret = blk_pread(s->src, in_pos, buf, s->bsz); + } + if (in_ret < 0) { + error_report("error while reading from input image file: %s", + strerror(-in_ret)); + ret = -1; + goto free_out; + } + in_pos += in_ret; + + out_ret = blk_pwrite(s->dst, out_pos, buf, in_ret, 0); + + if (out_ret < 0) { + error_report("error while writing to output image file: %s", + strerror(-out_ret)); + ret = -1; + goto free_out; + } + out_pos += out_ret; + } + + free_out: + g_free(buf); + out: + s->running_coroutines--; + s->ret = ret; +} + static int img_dd(int argc, char **argv) { int ret = 0; @@ -4390,6 +4481,7 @@ static int img_dd(int argc, char **argv) char *tmp; BlockDriver *drv = NULL, *proto_drv = NULL; BlockBackend *blk1 = NULL, *blk2 = NULL; + Coroutine *co = NULL; QemuOpts *opts = NULL; QemuOptsList *create_opts = NULL; Error *local_err = NULL; @@ -4398,8 +4490,9 @@ static int img_dd(int argc, char **argv) const char *out_fmt = "raw"; const char *fmt = NULL; int64_t size = 0; - int64_t block_count = 0, out_pos, in_pos; + int64_t in_pos; bool force_share = false; + bool copy_range = false; struct DdInfo dd = { .flags = 0, .count = 0, @@ -4407,13 +4500,11 @@ static int img_dd(int argc, char **argv) struct DdIo in = { .bsz = 512, /* Block size is by default 512 bytes */ .filename = NULL, - .buf = NULL, .offset = 0 }; struct DdIo out = { .bsz = 512, .filename = NULL, - .buf = NULL, .offset = 0 }; @@ -4433,7 +4524,7 @@ static int img_dd(int argc, char **argv) { 0, 0, 0, 0 } }; - while ((c = getopt_long(argc, argv, ":hf:O:U", long_options, NULL))) { + while ((c = getopt_long(argc, argv, ":hf:O:UC", long_options, NULL))) { if (c == EOF) { break; } @@ -4456,6 +4547,9 @@ static int img_dd(int argc, char **argv) case 'U': force_share = true; break; + case 'C': + copy_range = true; + break; case OPTION_OBJECT: if (!qemu_opts_parse_noisily(&qemu_object_opts, optarg, true)) { ret = -1; @@ -4606,35 +4700,25 @@ static int img_dd(int argc, char **argv) in_pos = in.offset * in.bsz; } - in.buf = g_new(uint8_t, in.bsz); - - for (out_pos = 0; in_pos < size; block_count++) { - int in_ret, out_ret; - - if (in_pos + in.bsz > size) { - in_ret = blk_pread(blk1, in_pos, in.buf, size - in_pos); - } else { - in_ret = blk_pread(blk1, in_pos, in.buf, in.bsz); - } - if (in_ret < 0) { - error_report("error while reading from input image file: %s", - strerror(-in_ret)); - ret = -1; - goto out; - } - in_pos += in_ret; + ImgDdState s = (ImgDdState) { + .src = blk1, + .src_offset = in_pos, + .dst = blk2, + .dst_offset = 0, + .bsz = in.bsz, + .size = size, + .copy_range = copy_range, + }; - out_ret = blk_pwrite(blk2, out_pos, in.buf, in_ret, 0); + co = qemu_coroutine_create(dd_co_do_copy, &s); + qemu_coroutine_enter(co); - if (out_ret < 0) { - error_report("error while writing to output image file: %s", - strerror(-out_ret)); - ret = -1; - goto out; - } - out_pos += out_ret; + while (s.running_coroutines) { + main_loop_wait(false); } + ret = s.ret; + out: g_free(arg); qemu_opts_del(opts); @@ -4643,8 +4727,6 @@ out: blk_unref(blk2); g_free(in.filename); g_free(out.filename); - g_free(in.buf); - g_free(out.buf); if (ret) { return 1; diff --git a/qemu-img.texi b/qemu-img.texi index 3b6710a580..5c5079e3bc 100644 --- a/qemu-img.texi +++ b/qemu-img.texi @@ -390,7 +390,7 @@ way. The size can also be specified using the @var{size} option with @code{-o}, it doesn't need to be specified separately in this case. -@item dd [--image-opts] [-U] [-f @var{fmt}] [-O @var{output_fmt}] [bs=@var{block_size}] [count=@var{blocks}] [skip=@var{blocks}] if=@var{input} of=@var{output} +@item dd [--image-opts] [-U] [-C] [-f @var{fmt}] [-O @var{output_fmt}] [bs=@var{block_size}] [count=@var{blocks}] [skip=@var{blocks}] if=@var{input} of=@var{output} Dd copies from @var{input} file to @var{output} file converting it from @var{fmt} format to @var{output_fmt} format.
This parameter is analogous to convert's "-C", making use of bdrv_co_copy_range(). Signed-off-by: Sergio Lopez <slp@redhat.com> --- qemu-img-cmds.hx | 4 +- qemu-img.c | 146 ++++++++++++++++++++++++++++++++++++----------- qemu-img.texi | 2 +- 3 files changed, 117 insertions(+), 35 deletions(-)