Message ID | 1330102144-14491-2-git-send-email-fsimonce@redhat.com |
---|---|
State | New |
Headers | show |
On 02/24/2012 09:49 AM, Federico Simoncelli wrote: > Signed-off-by: Federico Simoncelli <fsimonce@redhat.com> Pretty sparse on the commit message. > +++ b/hmp-commands.hx > @@ -886,6 +886,44 @@ Snapshot device, using snapshot file as target if provided > ETEXI > > { > + .name = "drive_reopen", > + .args_type = "device:B,new-image-file:s?,format:s?", > + .params = "device [new-image-file] [format]", > + .help = "Assigns a new image file to a device.\n\t\t\t" > + "The image will be opened using the format\n\t\t\t" > + "specified or 'qcow2' by default.\n\t\t\t" > + "This command is deprecated.", > + .mhandler.cmd = hmp_drive_reopen, Libvirt will just use the QMP version, so I'm not reviewing the HMP version very closely. > +++ b/qapi-schema.json > @@ -1136,6 +1136,60 @@ > 'data': { 'device': 'str', 'snapshot-file': 'str', '*format': 'str' } } > > ## > +# @drive-reopen > +# > +# Assigns a new image file to a device. > +# > +# @device: the name of the device for which we are chainging the image file. s/chainging/changing/ > +# > +# @new-image-file: the target of the new image. If the file doesn't exists the > +# command will fail. I think you need to be more explicit that @new-image-file MUST have identical contents as the current image file, for this to be useful, and that qemu does not validate whether the new image met those conditions. Possible ways to achieve this: 1. Freeze all guest I/O, so that you know the current image file is stable, copy the contents to new-image-file, drive-reopen, then unfreeze guest I/O 2. Create an empty qcow2 file with backing image of the current image file. drive-reopen will then see the new image file as containing the same contents as the previous image file. 3. Create the new image file via mirroring (blockdev-migrate with incremental and new-image-file), and reopen to the mirror > +# > +# @format: #optional the format of the new image, default is 'qcow2'. > +# > +# Returns: nothing on success > +# If @device is not a valid block device, DeviceNotFound > +# If @new-image-file can't be opened, OpenFileFailed > +# If @format is invalid, InvalidBlockFormat > +# > +# Notes: This command is deprecated. Generally, a deprecation note should list the preferred replacement; but we don't have a replacement, so I don't see this note adding any information. > +# > +# Since 1.1 > +## > + > +{ 'command': 'drive-reopen', > + 'data': { 'device': 'str', 'new-image-file': 'str', '*format': 'str' } } > + > +## > +# @blockdev-migrate > +# > +# Migrates the device to a new destination. > +# > +# @device: the name of the block device to migrate. > +# > +# @incremental: if true only the new writes are sent to the destination. > +# This method is particularly useful if used in conjunction > +# with new-image-file allowing the current image to be > +# transferred to the destination by an external manager. > +# > +# @destination: the destination of the block migration. Where do you list what format the destination is? Shouldn't this have an optional format, defaulting to qcow2? Does qemu create the destination file, or must it already be existing? > +# > +# @new-image-file: #optional an existing image file that will replace > +# the current one in the device. Where do you list what format the new-image-file is? Shouldn't this have an optional format, defaulting to qcow2? Does qemu create the new-image-file, or can one already be existing? I know that this patch is only implementing the case where incremental is true and new-image-file is provided; but I'm not quite sure what semantics are intended if incremental is false. Is that still a case where this sets up mirroring (writes go to two images) but additionally the contents from the current image are (asynchronously) streamed into the destination? I guess I'm not sure what the intended semantics of the modes that we aren't implementing, or why we don't just go with a simpler command that only exposes the semantics that we are implementing. > +# > +# Returns: nothing on success > +# If @device is not a valid block device, DeviceNotFound > +# If @mode is not a valid migration mode, InvalidParameterValue > +# If @destination can't be opened, OpenFileFailed > +# If @new-image-file can't be opened, OpenFileFailed > +# > +# Since 1.1 > +## > +{ 'command': 'blockdev-migrate', > + 'data': { 'device': 'str', 'incremental' : 'bool', > + 'destination': 'str', '*new-image-file': 'str' } } > + > +## > # @human-monitor-command: > # > # Execute a command on the human monitor and return the output.
On 02/24/2012 06:46 PM, Eric Blake wrote: > I think you need to be more explicit that @new-image-file MUST have > identical contents as the current image file, for this to be useful, and > that qemu does not validate whether the new image met those conditions. > Possible ways to achieve this: Not necessarily, you could always do this on a paused machine. >> > +# @destination: the destination of the block migration. > Where do you list what format the destination is? Shouldn't this have > an optional format, defaulting to qcow2? Does qemu create the > destination file, or must it already be existing? I think no default, raw makes as much or more sense in the non-incremental case. Anyway either autodetect, or no default. >> > +# >> > +# @new-image-file: #optional an existing image file that will replace >> > +# the current one in the device. > Where do you list what format the new-image-file is? Shouldn't this > have an optional format, defaulting to qcow2? Does qemu create the > new-image-file, or can one already be existing? qemu does not create the file now, but in the future we may add a flag to create a snapshot. I think no default is better here too, or autodetect. > I know that this patch is only implementing the case where incremental > is true and new-image-file is provided; but I'm not quite sure what > semantics are intended if incremental is false. Is that still a case > where this sets up mirroring (writes go to two images) but additionally > the contents from the current image are (asynchronously) streamed into > the destination? Yes. The image should already be there also in this case, and new-image-file will usually be omitted. Paolo
On Fri, 24 Feb 2012 16:49:04 +0000 Federico Simoncelli <fsimonce@redhat.com> wrote: > Signed-off-by: Federico Simoncelli <fsimonce@redhat.com> How does this relate to migrate -b? Should it be deprecated? Btw, would be nice to have a 0/2 intro email describing the feature and changelog info. > --- > blockdev.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++++------ > hmp-commands.hx | 38 +++++++++++++++++++ > hmp.c | 24 ++++++++++++ > hmp.h | 2 + > qapi-schema.json | 54 +++++++++++++++++++++++++++ > 5 files changed, 213 insertions(+), 12 deletions(-) > > diff --git a/blockdev.c b/blockdev.c > index 2c132a3..f51bd6f 100644 > --- a/blockdev.c > +++ b/blockdev.c > @@ -646,9 +646,8 @@ void do_commit(Monitor *mon, const QDict *qdict) > } > } > > -void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file, > - bool has_format, const char *format, > - Error **errp) > +static void change_blockdev_image(const char *device, const char *new_image_file, > + const char *format, bool create, Error **errp) > { > BlockDriverState *bs; > BlockDriver *drv, *old_drv, *proto_drv; > @@ -671,7 +670,7 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file, > old_drv = bs->drv; > flags = bs->open_flags; > > - if (!has_format) { > + if (!format) { > format = "qcow2"; > } > > @@ -681,24 +680,26 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file, > return; > } > > - proto_drv = bdrv_find_protocol(snapshot_file); > + proto_drv = bdrv_find_protocol(new_image_file); > if (!proto_drv) { > error_set(errp, QERR_INVALID_BLOCK_FORMAT, format); > return; > } > > - ret = bdrv_img_create(snapshot_file, format, bs->filename, > - bs->drv->format_name, NULL, -1, flags); > - if (ret) { > - error_set(errp, QERR_UNDEFINED_ERROR); > - return; > + if (create) { > + ret = bdrv_img_create(new_image_file, format, bs->filename, > + bs->drv->format_name, NULL, -1, flags); > + if (ret) { > + error_set(errp, QERR_UNDEFINED_ERROR); > + return; > + } > } > > bdrv_drain_all(); > bdrv_flush(bs); > > bdrv_close(bs); > - ret = bdrv_open(bs, snapshot_file, flags, drv); > + ret = bdrv_open(bs, new_image_file, flags, drv); > /* > * If reopening the image file we just created fails, fall back > * and try to re-open the original image. If that fails too, we > @@ -709,11 +710,93 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file, > if (ret != 0) { > error_set(errp, QERR_OPEN_FILE_FAILED, old_filename); > } else { > - error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file); > + error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file); > } > } > } > > +void qmp_drive_reopen(const char *device, const char *new_image_file, > + bool has_format, const char *format, Error **errp) > +{ > + change_blockdev_image(device, new_image_file, > + has_format ? format : NULL, false, errp); > +} > + > +void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file, > + bool has_format, const char *format, > + Error **errp) > +{ > + change_blockdev_image(device, snapshot_file, > + has_format ? format : NULL, true, errp); > +} > + > +static void qmp_blockdev_mirror(const char *device, const char *destination, > + const char *new_image_file, Error **errp) > +{ Minor: this is a not a qmp function, please drop the qmp_ prefix then. > + BlockDriver *drv; > + int i, j, escape; > + char new_filename[2048], *filename; I'd use PATH_MAX for new_filename's size. > + > + drv = bdrv_find_format("blkmirror"); > + if (!drv) { > + error_set(errp, QERR_INVALID_BLOCK_FORMAT, "blkmirror"); > + return; > + } > + > + escape = 0; > + for (i = 0, j = 0; j < strlen(new_image_file); j++) { > + loop: > + if (!(i < sizeof(new_filename) - 2)) { > + error_set(errp, QERR_INVALID_PARAMETER_VALUE, > + "new-image-file", "shorter filename"); > + return; > + } > + > + if (new_image_file[j] == ':' || new_image_file[j] == '\\') { > + if (!escape) { > + new_filename[i++] = '\\', escape = 1; > + goto loop; > + } else { > + escape = 0; > + } > + } > + > + new_filename[i++] = new_image_file[j]; > + } IMO, you could require the filename arguments to be escaped already. > + > + filename = g_strdup_printf("blkmirror:%s:%s", new_filename, destination); > + change_blockdev_image(device, filename, "blkmirror", false, errp); > + g_free(filename); > +} > + > +void qmp_blockdev_migrate(const char *device, bool incremental, > + const char *destination, bool has_new_image_file, > + const char *new_image_file, Error **errp) > +{ > + BlockDriverState *bs; > + > + bs = bdrv_find(device); > + if (!bs) { > + error_set(errp, QERR_DEVICE_NOT_FOUND, device); > + return; > + } > + if (bdrv_in_use(bs)) { > + error_set(errp, QERR_DEVICE_IN_USE, device); > + return; > + } > + > + if (incremental) { > + if (!has_new_image_file) { > + error_set(errp, QERR_INVALID_PARAMETER_VALUE, > + "incremental", "a new image file"); > + } else { > + qmp_blockdev_mirror(device, destination, new_image_file, errp); > + } > + } else { > + error_set(errp, QERR_NOT_SUPPORTED); > + } The command documentation says that 'incremental' and 'new_image_file' are optionals, but the code makes them required. Why? > +} > + > static void eject_device(BlockDriverState *bs, int force, Error **errp) > { > if (bdrv_in_use(bs)) { > diff --git a/hmp-commands.hx b/hmp-commands.hx > index 573b823..4aacf2a 100644 > --- a/hmp-commands.hx > +++ b/hmp-commands.hx > @@ -886,6 +886,44 @@ Snapshot device, using snapshot file as target if provided > ETEXI > > { > + .name = "drive_reopen", > + .args_type = "device:B,new-image-file:s?,format:s?", > + .params = "device [new-image-file] [format]", > + .help = "Assigns a new image file to a device.\n\t\t\t" > + "The image will be opened using the format\n\t\t\t" > + "specified or 'qcow2' by default.\n\t\t\t" > + "This command is deprecated.", > + .mhandler.cmd = hmp_drive_reopen, > + }, > + > +STEXI > +@item drive_reopen > +@findex drive_reopen > +Assigns a new image file to a device. > +ETEXI > + > + { > + .name = "blkdev_migrate", > + .args_type = "incremental:-i,device:B,destination:s,new-image-file:s?", > + .params = "device mode destination [new-image-file]", > + .help = "Migrates the device to a new destination.\n\t\t\t" > + "If the incremental (-i) option is used then\n\t\t\t" > + "only the new writes are sent to the destination.\n\t\t\t" > + "This method is particularly useful if used in\n\t\t\t" > + "conjunction with new-image-file (a new image for\n\t\t\t" > + "the device) allowing the current image to be\n\t\t\t" > + "transferred to the destination by an external\n\t\t\t" > + "manager.", > + .mhandler.cmd = hmp_blkdev_migrate, > + }, > + > +STEXI > +@item blkdev_migrate > +@findex blkdev_migrate > +Migrates the device to a new destination. > +ETEXI > + > + { > .name = "drive_add", > .args_type = "pci_addr:s,opts:s", > .params = "[[<domain>:]<bus>:]<slot>\n" > diff --git a/hmp.c b/hmp.c > index 8ff8c94..282d3e5 100644 > --- a/hmp.c > +++ b/hmp.c > @@ -701,6 +701,30 @@ void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict) > hmp_handle_error(mon, &errp); > } > > +void hmp_drive_reopen(Monitor *mon, const QDict *qdict) > +{ > + const char *device = qdict_get_str(qdict, "device"); > + const char *filename = qdict_get_str(qdict, "new-image-file"); > + const char *format = qdict_get_try_str(qdict, "format"); > + Error *errp = NULL; > + > + qmp_drive_reopen(device, filename, !!format, format, &errp); > + hmp_handle_error(mon, &errp); > +} > + > +void hmp_blkdev_migrate(Monitor *mon, const QDict *qdict) > +{ > + bool incremental = qdict_get_try_bool(qdict, "incremental", 0); > + const char *device = qdict_get_str(qdict, "device"); > + const char *destination = qdict_get_str(qdict, "destination"); > + const char *new_image_file = qdict_get_try_str(qdict, "new-image-file"); > + Error *errp = NULL; > + > + qmp_blockdev_migrate(device, incremental, destination, > + !!new_image_file, new_image_file, &errp); > + hmp_handle_error(mon, &errp); > +} > + > void hmp_migrate_cancel(Monitor *mon, const QDict *qdict) > { > qmp_migrate_cancel(NULL); > diff --git a/hmp.h b/hmp.h > index 18eecbd..3c66257 100644 > --- a/hmp.h > +++ b/hmp.h > @@ -47,6 +47,8 @@ void hmp_block_passwd(Monitor *mon, const QDict *qdict); > void hmp_balloon(Monitor *mon, const QDict *qdict); > void hmp_block_resize(Monitor *mon, const QDict *qdict); > void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict); > +void hmp_drive_reopen(Monitor *mon, const QDict *qdict); > +void hmp_blkdev_migrate(Monitor *mon, const QDict *qdict); > void hmp_migrate_cancel(Monitor *mon, const QDict *qdict); > void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict); > void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict); > diff --git a/qapi-schema.json b/qapi-schema.json > index d02ee86..df1248c 100644 > --- a/qapi-schema.json > +++ b/qapi-schema.json > @@ -1136,6 +1136,60 @@ > 'data': { 'device': 'str', 'snapshot-file': 'str', '*format': 'str' } } > > ## > +# @drive-reopen > +# > +# Assigns a new image file to a device. > +# > +# @device: the name of the device for which we are chainging the image file. > +# > +# @new-image-file: the target of the new image. If the file doesn't exists the > +# command will fail. > +# > +# @format: #optional the format of the new image, default is 'qcow2'. > +# > +# Returns: nothing on success > +# If @device is not a valid block device, DeviceNotFound > +# If @new-image-file can't be opened, OpenFileFailed > +# If @format is invalid, InvalidBlockFormat > +# > +# Notes: This command is deprecated. As I said in the other thread, we need a replacement to deprecate something. > +# > +# Since 1.1 > +## > + > +{ 'command': 'drive-reopen', > + 'data': { 'device': 'str', 'new-image-file': 'str', '*format': 'str' } } > + > +## > +# @blockdev-migrate > +# > +# Migrates the device to a new destination. > +# > +# @device: the name of the block device to migrate. > +# > +# @incremental: if true only the new writes are sent to the destination. > +# This method is particularly useful if used in conjunction > +# with new-image-file allowing the current image to be > +# transferred to the destination by an external manager. > +# > +# @destination: the destination of the block migration. > +# > +# @new-image-file: #optional an existing image file that will replace > +# the current one in the device. > +# > +# Returns: nothing on success > +# If @device is not a valid block device, DeviceNotFound > +# If @mode is not a valid migration mode, InvalidParameterValue > +# If @destination can't be opened, OpenFileFailed > +# If @new-image-file can't be opened, OpenFileFailed > +# > +# Since 1.1 > +## > +{ 'command': 'blockdev-migrate', > + 'data': { 'device': 'str', 'incremental' : 'bool', > + 'destination': 'str', '*new-image-file': 'str' } } > + > +## > # @human-monitor-command: > # > # Execute a command on the human monitor and return the output.
On 02/24/2012 11:57 AM, Paolo Bonzini wrote: > On 02/24/2012 06:46 PM, Eric Blake wrote: >> I think you need to be more explicit that @new-image-file MUST have >> identical contents as the current image file, for this to be useful, and >> that qemu does not validate whether the new image met those conditions. >> Possible ways to achieve this: > > Not necessarily, you could always do this on a paused machine. You lost me; I think you snipped too much. Was it about this statement in my original? >> 1. Freeze all guest I/O, so that you know the current image file is >> stable, To which I have a counter-question: Is pausing a machine guaranteed to flush all I/O out to the image prior to returning control to the controlling app, or do we have to rely on the qemu-ga agent command to actually freeze I/O within the guest? Overall, I was arguing that from a usability perspective, if new-image-file does not have the same contents (from the guest's perspective) as the current image, then when you resume the machine, you will be injecting arbitrary disk changes into the guest's disk, probably with disastrous results. I'm not arguing that new-image-file has to be identical to the current image (in fact, converting between raw or qcow2 via drive-reopen seems like great use cases for this new command). So documenting this distinction (guest data must be consistent across the reopen), as part of the contract of the monitor command, would be useful. > >>>> +# @destination: the destination of the block migration. >> Where do you list what format the destination is? Shouldn't this have >> an optional format, defaulting to qcow2? Does qemu create the >> destination file, or must it already be existing? > > I think no default, raw makes as much or more sense in the > non-incremental case. Anyway either autodetect, or no default. So if there is no default, then how do I specify the format? Libvirt _cannot_ rely on autodetect, ever since we had to solve a CVE where relying on autodetect would allow the guest to cause libvirt to mislabel files on the host. >> I know that this patch is only implementing the case where incremental >> is true and new-image-file is provided; but I'm not quite sure what >> semantics are intended if incremental is false. Is that still a case >> where this sets up mirroring (writes go to two images) but additionally >> the contents from the current image are (asynchronously) streamed into >> the destination? > > Yes. The image should already be there also in this case, and > new-image-file will usually be omitted. If the image to be opened must already exist, then explicitly state that in the command documentation.
On 02/24/2012 12:01 PM, Luiz Capitulino wrote: >> + BlockDriver *drv; >> + int i, j, escape; >> + char new_filename[2048], *filename; > > I'd use PATH_MAX for new_filename's size. PATH_MAX need not be defined (and on Hurd, it intentionally is not defined); or might be so huge as to be useless. But 2048 is smaller than typical PATH_MAX of 4096, so you may still want to consider a compromise. >> + >> + new_filename[i++] = new_image_file[j]; >> + } > > IMO, you could require the filename arguments to be escaped already. And if you do that, then the escaping should be consistent for both arguments. Libvirt is okay with pre-escaping file names before passing them to qemu.
On Fri, 24 Feb 2012 12:40:17 -0700 Eric Blake <eblake@redhat.com> wrote: > On 02/24/2012 12:01 PM, Luiz Capitulino wrote: > > >> + BlockDriver *drv; > >> + int i, j, escape; > >> + char new_filename[2048], *filename; > > > > I'd use PATH_MAX for new_filename's size. > > PATH_MAX need not be defined (and on Hurd, it intentionally is not > defined); or might be so huge as to be useless. Aren't those extreme cases? PATH_MAX is a standard define and is used in QEMU in several places. If it's not good here, it shouldn't be good anywhere.
On 02/24/2012 08:01 PM, Luiz Capitulino wrote:
> IMO, you could require the filename arguments to be escaped already.
The monitor command was introduced exactly to avoid having to worry
about details such as escaping. JSON is already supposed to take care
of those.
That said, I think Eric is right that we need to specify the formats, so
we'll need another iteration, and we probably need to rename the command
to drive-migrate too.
Paolo
On Fri, 24 Feb 2012 21:32:30 +0100 Paolo Bonzini <pbonzini@redhat.com> wrote: > On 02/24/2012 08:01 PM, Luiz Capitulino wrote: > > IMO, you could require the filename arguments to be escaped already. > > The monitor command was introduced exactly to avoid having to worry > about details such as escaping. JSON is already supposed to take care > of those. Then the escaping should be done by the parser, not by the command implementation. > > That said, I think Eric is right that we need to specify the formats, so > we'll need another iteration, and we probably need to rename the command > to drive-migrate too. > > Paolo >
On 02/24/2012 09:36 PM, Luiz Capitulino wrote: > > > IMO, you could require the filename arguments to be escaped already. > > > > The monitor command was introduced exactly to avoid having to worry > > about details such as escaping. JSON is already supposed to take care > > of those. > > Then the escaping should be done by the parser, not by the command > implementation. This command is passing its arguments to the block layer, which needs its own escaping. It's a wart of the block layer and not supposed to percolate outside QEMU. Paolo
On 02/24/2012 12:01 PM, Luiz Capitulino wrote: > On Fri, 24 Feb 2012 16:49:04 +0000 > Federico Simoncelli <fsimonce@redhat.com> wrote: > >> Signed-off-by: Federico Simoncelli <fsimonce@redhat.com> > > How does this relate to migrate -b? Should it be deprecated? > > Btw, would be nice to have a 0/2 intro email describing the feature and changelog > info. Another question - how does this interact with Jeff Cody's work to add blockdev-group-snapshot-sync to add atomic group snapshots? Do you ever envision the need to migrate and/or reopen multiple disks at once, in which case, we would want to be setting things up to work on a group in the same way that Jeff's patches for live snapshot are headed?
On 02/24/2012 01:26 PM, Luiz Capitulino wrote: > On Fri, 24 Feb 2012 12:40:17 -0700 > Eric Blake <eblake@redhat.com> wrote: > >> On 02/24/2012 12:01 PM, Luiz Capitulino wrote: >> >>>> + BlockDriver *drv; >>>> + int i, j, escape; >>>> + char new_filename[2048], *filename; >>> >>> I'd use PATH_MAX for new_filename's size. >> >> PATH_MAX need not be defined (and on Hurd, it intentionally is not >> defined); or might be so huge as to be useless. > > Aren't those extreme cases? PATH_MAX is a standard define and is used in > QEMU in several places. If it's not good here, it shouldn't be good anywhere. PATH_MAX is specifically declared in POSIX to be defined if there is a limit, or undefined if there is no limit. There is no limit in GNU Hurd, so PATH_MAX is undefined there, and you will get a compile error (then again, no one has ported qemu to Hurd). Here's what gnulib has recommended: https://lists.gnu.org/archive/html/bug-gnulib/2011-06/msg00328.html > A package like coreutils can also do > #ifndef PATH_MAX > # define PATH_MAX 8192 > #endif > in its system.h. > > Looking at both uses of PATH_MAX in coreutils (src/pwd.c:88 and > src/remove.c:186) the value of PATH_MAX is capped by 8192 or 16384 anyway. > So, on systems like GNU/Hurd, where filenames can have arbitrary size, you > are calling pathconf for no real purpose. > > To me, this confirms that a generic pathmax.h (like the one in gnulib) > should only define PATH_MAX when it makes sense - like POSIX says -, > and that the handling of the GNU/Hurd case should be done on a case-by-case > basis: > - Either a package-wide handling, or a per-file handling. > - Either a fallback value of 8192, or a fallback value of > pathconf ("/", _PC_PATH_MAX), or just a #ifdef test. Other mails in that thread are also an interesting read. In short, use of PATH_MAX should only ever be used to optimize routines to the common case; in which case, you can pick your own cap for PATH_MAX if the implementation did not provide one or reduce the implementation's 8k PATH_MAX down to something like 2048 that you can safely fit on the stack for the common case before malloc'ing for the larger strings. But using it as a bounds to a statically-sized object is a recipe for artificially limiting software; if you are okay with introducing that artificial limit, then go for it; but if you want to be truly portable, it is best to never use PATH_MAX as an array bounds, and to write fallback code paths to handle the cases where user input exceeds PATH_MAX but can still be handled without error by the system you are running on.
On 02/24/2012 11:30 PM, Eric Blake wrote: > On 02/24/2012 12:01 PM, Luiz Capitulino wrote: >> On Fri, 24 Feb 2012 16:49:04 +0000 >> Federico Simoncelli <fsimonce@redhat.com> wrote: >> >>> Signed-off-by: Federico Simoncelli <fsimonce@redhat.com> >> >> How does this relate to migrate -b? Should it be deprecated? >> >> Btw, would be nice to have a 0/2 intro email describing the feature and changelog >> info. > > Another question - how does this interact with Jeff Cody's work to add > blockdev-group-snapshot-sync to add atomic group snapshots? Do you ever > envision the need to migrate and/or reopen multiple disks at once, in > which case, we would want to be setting things up to work on a group in > the same way that Jeff's patches for live snapshot are headed? I was thinking that perhaps we needed something like blockdev-group-start, blockdev-group-commit, blockdev-group-abort instead of Jeff's group snapshot, but I was afraid of proposing it. :) Paolo
----- Original Message ----- > From: "Luiz Capitulino" <lcapitulino@redhat.com> > To: "Federico Simoncelli" <fsimonce@redhat.com> > Cc: qemu-devel@nongnu.org, mtosatti@redhat.com, pbonzini@redhat.com, kwolf@redhat.com, armbru@redhat.com > Sent: Friday, February 24, 2012 8:01:43 PM > Subject: Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands > > On Fri, 24 Feb 2012 16:49:04 +0000 > Federico Simoncelli <fsimonce@redhat.com> wrote: > > > Signed-off-by: Federico Simoncelli <fsimonce@redhat.com> > > Btw, would be nice to have a 0/2 intro email describing the feature > and changelog > info. Yes the 0/2 (actually 0/3) was sent at the beginning of the thread so you might have missed it because you were added later on but I'm sure you can still find it in the archives. > > > + BlockDriver *drv; > > + int i, j, escape; > > + char new_filename[2048], *filename; > > I'd use PATH_MAX for new_filename's size. Maybe PATH_MAX * 2 + 1? (To handle the case where all the characters should be escaped). > > + escape = 0; > > + for (i = 0, j = 0; j < strlen(new_image_file); j++) { > > + loop: > > + if (!(i < sizeof(new_filename) - 2)) { > > + error_set(errp, QERR_INVALID_PARAMETER_VALUE, > > + "new-image-file", "shorter filename"); > > + return; > > + } > > + > > + if (new_image_file[j] == ':' || new_image_file[j] == '\\') > > { > > + if (!escape) { > > + new_filename[i++] = '\\', escape = 1; > > + goto loop; > > + } else { > > + escape = 0; > > + } > > + } > > + > > + new_filename[i++] = new_image_file[j]; > > + } > > IMO, you could require the filename arguments to be escaped already. Can you be more explicit, who should escape it? The only thing that comes to my mind right now is requiring the input of blockdev-migrate already escaped but that would expose an internal format. (I'd not recommend it). > > +void qmp_blockdev_migrate(const char *device, bool incremental, > > + const char *destination, bool > > has_new_image_file, > > + const char *new_image_file, Error > > **errp) > > +{ > > + BlockDriverState *bs; > > + > > + bs = bdrv_find(device); > > + if (!bs) { > > + error_set(errp, QERR_DEVICE_NOT_FOUND, device); > > + return; > > + } > > + if (bdrv_in_use(bs)) { > > + error_set(errp, QERR_DEVICE_IN_USE, device); > > + return; > > + } > > + > > + if (incremental) { > > + if (!has_new_image_file) { > > + error_set(errp, QERR_INVALID_PARAMETER_VALUE, > > + "incremental", "a new image file"); > > + } else { > > + qmp_blockdev_mirror(device, destination, > > new_image_file, errp); > > + } > > + } else { > > + error_set(errp, QERR_NOT_SUPPORTED); > > + } > > The command documentation says that 'incremental' and > 'new_image_file' are > optionals, but the code makes them required. Why? If I didn't make any mistake in the code I'm just enforcing that when you specify "incremental" you also need a new image. There are still other valid cases where they are optional.
On Mon, 27 Feb 2012 06:29:39 -0500 (EST) Federico Simoncelli <fsimonce@redhat.com> wrote: > ----- Original Message ----- > > From: "Luiz Capitulino" <lcapitulino@redhat.com> > > To: "Federico Simoncelli" <fsimonce@redhat.com> > > Cc: qemu-devel@nongnu.org, mtosatti@redhat.com, pbonzini@redhat.com, kwolf@redhat.com, armbru@redhat.com > > Sent: Friday, February 24, 2012 8:01:43 PM > > Subject: Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands > > > > On Fri, 24 Feb 2012 16:49:04 +0000 > > Federico Simoncelli <fsimonce@redhat.com> wrote: > > > > > Signed-off-by: Federico Simoncelli <fsimonce@redhat.com> > > > > Btw, would be nice to have a 0/2 intro email describing the feature > > and changelog > > info. > > Yes the 0/2 (actually 0/3) was sent at the beginning of the thread so you might > have missed it because you were added later on but I'm sure you can still find > it in the archives. I only found v1 iirc, but this is not important right now, as you're going to post v3 right? And that's going to have the intro :) > > > > > + BlockDriver *drv; > > > + int i, j, escape; > > > + char new_filename[2048], *filename; > > > > I'd use PATH_MAX for new_filename's size. > > Maybe PATH_MAX * 2 + 1? (To handle the case where all the characters should be > escaped). Well, I was discussing this with Eric and he thinks that a value of 4096 should be fine. I personally prefer using PATH_MAX because I don't believe I'm better at choosing a random value for this vs. using what the system provides me. Feel free to choose what you think is the best for this case. > > > + escape = 0; > > > + for (i = 0, j = 0; j < strlen(new_image_file); j++) { > > > + loop: > > > + if (!(i < sizeof(new_filename) - 2)) { > > > + error_set(errp, QERR_INVALID_PARAMETER_VALUE, > > > + "new-image-file", "shorter filename"); > > > + return; > > > + } > > > + > > > + if (new_image_file[j] == ':' || new_image_file[j] == '\\') > > > { > > > + if (!escape) { > > > + new_filename[i++] = '\\', escape = 1; > > > + goto loop; > > > + } else { > > > + escape = 0; > > > + } > > > + } > > > + > > > + new_filename[i++] = new_image_file[j]; > > > + } > > > > IMO, you could require the filename arguments to be escaped already. > > Can you be more explicit, who should escape it? Paolo thinks this should be done by the block layer, fine with me. > The only thing that comes to my mind right now is requiring the input of > blockdev-migrate already escaped but that would expose an internal format. > (I'd not recommend it). > > > > +void qmp_blockdev_migrate(const char *device, bool incremental, > > > + const char *destination, bool > > > has_new_image_file, > > > + const char *new_image_file, Error > > > **errp) > > > +{ > > > + BlockDriverState *bs; > > > + > > > + bs = bdrv_find(device); > > > + if (!bs) { > > > + error_set(errp, QERR_DEVICE_NOT_FOUND, device); > > > + return; > > > + } > > > + if (bdrv_in_use(bs)) { > > > + error_set(errp, QERR_DEVICE_IN_USE, device); > > > + return; > > > + } > > > + > > > + if (incremental) { > > > + if (!has_new_image_file) { > > > + error_set(errp, QERR_INVALID_PARAMETER_VALUE, > > > + "incremental", "a new image file"); > > > + } else { > > > + qmp_blockdev_mirror(device, destination, > > > new_image_file, errp); > > > + } > > > + } else { > > > + error_set(errp, QERR_NOT_SUPPORTED); > > > + } > > > > The command documentation says that 'incremental' and > > 'new_image_file' are > > optionals, but the code makes them required. Why? > > If I didn't make any mistake in the code I'm just enforcing that when > you specify "incremental" you also need a new image. > There are still other valid cases where they are optional. Which operation will be performed if 'incremental' is not passed? If it's passed, which operation will be performed if 'new_image_file' is not?
On 02/27/2012 01:12 PM, Luiz Capitulino wrote: > > If I didn't make any mistake in the code I'm just enforcing that when > > you specify "incremental" you also need a new image. > > There are still other valid cases where they are optional. > > Which operation will be performed if 'incremental' is not passed? If > it's passed, which operation will be performed if 'new_image_file' is not? There are four cases: 1) incremental=false, new_image_file not passed: right now fail; in the future: - create an image on dest with no backing file; - writes will be mirrored to current_image_file and dest - start streaming from current_image_file to dest 2) incremental=false, new_image_file passed: right now fail; in the future: - create an image on dest with no backing file; - live-snapshot based on current_image_file to new_image_file; - writes will be mirrored to new_image_file and dest - start streaming from current_image_file to dest 3) incremental=true, new_image_file not passed: - error 4) incremental=true, new_image_file passed: - no images will be created - writes will be mirrored to new_image_file and dest Paolo
On Mon, 27 Feb 2012 13:49:17 +0100 Paolo Bonzini <pbonzini@redhat.com> wrote: > On 02/27/2012 01:12 PM, Luiz Capitulino wrote: > > > If I didn't make any mistake in the code I'm just enforcing that when > > > you specify "incremental" you also need a new image. > > > There are still other valid cases where they are optional. > > > > Which operation will be performed if 'incremental' is not passed? If > > it's passed, which operation will be performed if 'new_image_file' is not? > > There are four cases: > > 1) incremental=false, new_image_file not passed: > right now fail; in the future: > - create an image on dest with no backing file; > - writes will be mirrored to current_image_file and dest > - start streaming from current_image_file to dest > > 2) incremental=false, new_image_file passed: > right now fail; in the future: > - create an image on dest with no backing file; > - live-snapshot based on current_image_file to new_image_file; > - writes will be mirrored to new_image_file and dest > - start streaming from current_image_file to dest > > > 3) incremental=true, new_image_file not passed: > - error > > 4) incremental=true, new_image_file passed: > - no images will be created > - writes will be mirrored to new_image_file and dest IMHO, this is asking for two commands, where cases 1 & 2 is one of them and cases 3 & 4 is the other one. Note how 'incremental' goes away and 'new_image_file' really becomes an optional.
On 02/27/2012 02:06 PM, Luiz Capitulino wrote: > IMHO, this is asking for two commands, where cases 1 & 2 is one of them > and cases 3 & 4 is the other one. Note how 'incremental' goes away and > 'new_image_file' really becomes an optional. I really would have no idea on naming except perhaps "drive_migrate" and "funny_drive_migrate_for_ovirt". But actually I must admit that it's a rat's nest. First, there's no reason why live-snapshotting to new_image_file shouldn't be handled within QEMU. That would make the above table much more orthogonal. However, oVirt likes to create its own snapshots. Perhaps we do need to rethink this together with group snapshots. There are four kinds of operations that we do on block devices: (a) create an image. This is part of what blockdev-snapshot does. (b) switch a block device to a new image. drive-reopen does this. (c) add mirroring to a new destination. (d) activate streaming from a base image drive-migrate does (b) and (c), will do (a) and (d) when we add pre-copy, and should do (a) right now if Federico wasn't an oVirt developer. :) Thinking more about it, the commands we _do_ need are: - start a transaction - switch to a new image - add mirroring to a new destination - commit a transaction - rollback a transaction - query transaction errors Creating an image can be done outside a transaction for now because we only support live external snapshots. Streaming can also be started outside a transaction, because it doesn't need to be started atomically. If we have the above elementary commands, blockdev-snapshot-sync becomes sugar for this: (create image) blockdev-start-transaction if no active transaction drive-reopen blockdev-commit-transaction if no active transaction Jeff's group snapshot can be realized as this: blockdev-begin-transaction blockdev-snapshot-sync ... blockdev-snapshot-sync blockdev-commit-transaction And for drive-migrate, let's look at the above 3 cases: > > 1) incremental=false, new_image_file not passed: > > right now fail; in the future: > > - create an image on dest with no backing file; > > - writes will be mirrored to current_image_file and dest > > - start streaming from current_image_file to dest This is a new command "drive-mirror device dest", which does: (create image for dest) blockdev-begin-transaction if no active transaction drive-mirror device dest blockdev-commit-transaction if no active transaction The command does this: - mirror writes to current_image_file and dest - start streaming from current_image_file to dest The second part can be suppressed with a boolean argument. > > 2) incremental=false, new_image_file passed: > > right now fail; in the future: > > - create an image on dest with no backing file; > > - live-snapshot based on current_image_file to new_image_file; > > - writes will be mirrored to new_image_file and dest > > - start streaming from current_image_file to dest Atomicity is not needed here, so the user can simply issue: blockdev-snapshot-sync device new-image-file drive-mirror device dest > > 4) incremental=true, new_image_file passed: > > - no images will be created > > - writes will be mirrored to new_image_file and dest No need to provide this from within QEMU, because libvirt/oVirt can do the dance using elementary operations: blockdev-begin-transaction drive-reopen device new-image-file drive-mirror streaming=false device dest blockdev-commit-transaction No strange optional arguments, no proliferation of commands, etc. The only downside is that if someone tries to do (4) without transactions (or without stopping the VM) they'll get corruption because atomicity is required. Paolo
On 02/27/2012 08:39 AM, Paolo Bonzini wrote: > On 02/27/2012 02:06 PM, Luiz Capitulino wrote: >> IMHO, this is asking for two commands, where cases 1& 2 is one of them >> and cases 3& 4 is the other one. Note how 'incremental' goes away and >> 'new_image_file' really becomes an optional. > > I really would have no idea on naming except perhaps "drive_migrate" and > "funny_drive_migrate_for_ovirt". But actually I must admit that it's a > rat's nest. > > First, there's no reason why live-snapshotting to new_image_file > shouldn't be handled within QEMU. That would make the above table much > more orthogonal. However, oVirt likes to create its own snapshots. > > Perhaps we do need to rethink this together with group snapshots. > > There are four kinds of operations that we do on block devices: > (a) create an image. This is part of what blockdev-snapshot does. > (b) switch a block device to a new image. drive-reopen does this. > (c) add mirroring to a new destination. > (d) activate streaming from a base image > > drive-migrate does (b) and (c), will do (a) and (d) when we add > pre-copy, and should do (a) right now if Federico wasn't an oVirt > developer. :) > > Thinking more about it, the commands we _do_ need are: > - start a transaction > - switch to a new image > - add mirroring to a new destination > - commit a transaction > - rollback a transaction > - query transaction errors I think a better way to think of this is as a batch submission. It would be relatively easy to model in QMP too (just have a batch-command that has a list of commands as it's argument). The difference between batch submission and a transaction is atomic rollback. But I don't think atomic rollback is really needed here. Regards, Anthony Liguori
On 02/27/2012 03:46 PM, Anthony Liguori wrote: > I think a better way to think of this is as a batch submission. It > would be relatively easy to model in QMP too (just have a batch-command > that has a list of commands as it's argument). > > The difference between batch submission and a transaction is atomic > rollback. But I don't think atomic rollback is really needed here. A transaction enforces atomicity at the block layer level. It's different from batch commands in two ways: * bdrv_drain_all/bdrv_flush needs to be called at the beginning of the commit. This may not be the case with batch commands. * with batch commands, atomicity happens by chance because VCPUs cannot send I/O while the monitor is holding the global mutex. Paolo
On 02/27/2012 08:54 AM, Paolo Bonzini wrote: > On 02/27/2012 03:46 PM, Anthony Liguori wrote: >> I think a better way to think of this is as a batch submission. It >> would be relatively easy to model in QMP too (just have a batch-command >> that has a list of commands as it's argument). >> >> The difference between batch submission and a transaction is atomic >> rollback. But I don't think atomic rollback is really needed here. > > A transaction enforces atomicity at the block layer level. It's > different from batch commands in two ways: > > * bdrv_drain_all/bdrv_flush needs to be called at the beginning of the > commit. This may not be the case with batch commands. > > * with batch commands, atomicity happens by chance because VCPUs cannot > send I/O while the monitor is holding the global mutex. If the commands are designed correctly, then this works well. The problem is that the current commands are not designed well. For instance, multi-snapshot could look like: block-freeze ide0-hd0 block-freeze ide1-hd1 block-reopen ide0-hd0 my-new-file0.qcow2 block-reopen ide1-hd1 my-new-file1.qcow2 block-unfreeze ide1-hd1 block-unfreeze ide1-hd0 This would work regardless of whether the commands were implemented asynchronously within QEMU too. Regards, Anthony Liguori > > Paolo
On 02/27/2012 03:59 PM, Anthony Liguori wrote: > The problem is that the current commands are not designed well. For > instance, multi-snapshot could look like: > > block-freeze ide0-hd0 > block-freeze ide1-hd1 > block-reopen ide0-hd0 my-new-file0.qcow2 > block-reopen ide1-hd1 my-new-file1.qcow2 > block-unfreeze ide1-hd1 > block-unfreeze ide1-hd0 > > This would work regardless of whether the commands were implemented > asynchronously within QEMU too. This looks good, too. Positive: maps well to fsfreeze/thaw with help from the guest agent. Negative: you have to specify the devices three times. Overall, I think I like it. However, you need to add freeze/unfreeze capabilities to the block layer. Not hard, but one more thing to do. Paolo
On 02/27/2012 09:03 AM, Paolo Bonzini wrote: > On 02/27/2012 03:59 PM, Anthony Liguori wrote: >> The problem is that the current commands are not designed well. For >> instance, multi-snapshot could look like: >> >> block-freeze ide0-hd0 >> block-freeze ide1-hd1 >> block-reopen ide0-hd0 my-new-file0.qcow2 >> block-reopen ide1-hd1 my-new-file1.qcow2 >> block-unfreeze ide1-hd1 >> block-unfreeze ide1-hd0 >> >> This would work regardless of whether the commands were implemented >> asynchronously within QEMU too. > > This looks good, too. Positive: maps well to fsfreeze/thaw with help > from the guest agent. Negative: you have to specify the devices three > times. Overall, I think I like it. > > However, you need to add freeze/unfreeze capabilities to the block > layer. Not hard, but one more thing to do. Right. But it also generalizes to other QMP operations which is potentially interesting. And providing mechanisms like this gives more flexibility to management tools to implement interesting features without constantly chancing new QMP commands. Regards, Anthony Liguori > > Paolo
Am 27.02.2012 15:59, schrieb Anthony Liguori: > On 02/27/2012 08:54 AM, Paolo Bonzini wrote: >> On 02/27/2012 03:46 PM, Anthony Liguori wrote: >>> I think a better way to think of this is as a batch submission. It >>> would be relatively easy to model in QMP too (just have a batch-command >>> that has a list of commands as it's argument). >>> >>> The difference between batch submission and a transaction is atomic >>> rollback. But I don't think atomic rollback is really needed here. >> >> A transaction enforces atomicity at the block layer level. It's >> different from batch commands in two ways: >> >> * bdrv_drain_all/bdrv_flush needs to be called at the beginning of the >> commit. This may not be the case with batch commands. >> >> * with batch commands, atomicity happens by chance because VCPUs cannot >> send I/O while the monitor is holding the global mutex. > > If the commands are designed correctly, then this works well. The problem is > that the current commands are not designed well. For instance, multi-snapshot > could look like: > > block-freeze ide0-hd0 > block-freeze ide1-hd1 > block-reopen ide0-hd0 my-new-file0.qcow2 > block-reopen ide1-hd1 my-new-file1.qcow2 > block-unfreeze ide1-hd1 > block-unfreeze ide1-hd0 > > This would work regardless of whether the commands were implemented > asynchronously within QEMU too. What if block-reopen ide1-hd1 fails? In case of failure, you want both drives to point to their old image, not the first one to the new image and the second one to the old image. Kevin
On 02/27/2012 09:17 AM, Kevin Wolf wrote: > Am 27.02.2012 15:59, schrieb Anthony Liguori: >> On 02/27/2012 08:54 AM, Paolo Bonzini wrote: >>> On 02/27/2012 03:46 PM, Anthony Liguori wrote: >>>> I think a better way to think of this is as a batch submission. It >>>> would be relatively easy to model in QMP too (just have a batch-command >>>> that has a list of commands as it's argument). >>>> >>>> The difference between batch submission and a transaction is atomic >>>> rollback. But I don't think atomic rollback is really needed here. >>> >>> A transaction enforces atomicity at the block layer level. It's >>> different from batch commands in two ways: >>> >>> * bdrv_drain_all/bdrv_flush needs to be called at the beginning of the >>> commit. This may not be the case with batch commands. >>> >>> * with batch commands, atomicity happens by chance because VCPUs cannot >>> send I/O while the monitor is holding the global mutex. >> >> If the commands are designed correctly, then this works well. The problem is >> that the current commands are not designed well. For instance, multi-snapshot >> could look like: >> >> block-freeze ide0-hd0 >> block-freeze ide1-hd1 >> block-reopen ide0-hd0 my-new-file0.qcow2 >> block-reopen ide1-hd1 my-new-file1.qcow2 >> block-unfreeze ide1-hd1 >> block-unfreeze ide1-hd0 >> >> This would work regardless of whether the commands were implemented >> asynchronously within QEMU too. > > What if block-reopen ide1-hd1 fails? In case of failure, you want both > drives to point to their old image, not the first one to the new image > and the second one to the old image. Then you get an error with the block devices still frozen. You can execute another command to reopen back to the old image to roll back the transaction. Pushing the rollback logic to the client does make the client interface a bit more complicated and adds latency to the error path but it's much easier than building a complex transaction infrastructure. And there are examples of this in the wild too. LDAP uses a similar mechanism. Regards, Anthony Liguori > > Kevin
----- Original Message ----- > From: "Paolo Bonzini" <pbonzini@redhat.com> > To: "Luiz Capitulino" <lcapitulino@redhat.com> > Cc: "Federico Simoncelli" <fsimonce@redhat.com>, kwolf@redhat.com, mtosatti@redhat.com, qemu-devel@nongnu.org, > armbru@redhat.com, "Jeff Cody" <jcody@redhat.com> > Sent: Monday, February 27, 2012 3:39:33 PM > Subject: drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands) > > > > 4) incremental=true, new_image_file passed: > > > - no images will be created > > > - writes will be mirrored to new_image_file and dest > > No need to provide this from within QEMU, because libvirt/oVirt can > do > the dance using elementary operations: > > blockdev-begin-transaction > drive-reopen device new-image-file > drive-mirror streaming=false device dest > blockdev-commit-transaction > > No strange optional arguments, no proliferation of commands, etc. > The > only downside is that if someone tries to do (4) without transactions > (or without stopping the VM) they'll get corruption because atomicity > is > required. I'm all for the modularity of the commands (I suggested it since the beginning), but all this infrastructure goes way beyond what I'd need for oVirt now. When I submitted my patches we knew that my work wasn't the definitive solution (eg: the future implementation of -blockdev). So I'd suggest to try and settle with something that is good enough and that is not preventing a future improvement. What about having a (temporary) flag in drive-mirror to accept also a new-image-file until we will have the optimal solution?
On 02/27/2012 05:33 PM, Federico Simoncelli wrote: >> > >> > blockdev-begin-transaction >> > drive-reopen device new-image-file >> > drive-mirror streaming=false device dest >> > blockdev-commit-transaction >> > >> > No strange optional arguments, no proliferation of commands, etc. >> > The >> > only downside is that if someone tries to do (4) without transactions >> > (or without stopping the VM) they'll get corruption because atomicity >> > is >> > required. > I'm all for the modularity of the commands (I suggested it since the beginning), > but all this infrastructure goes way beyond what I'd need for oVirt now. > > When I submitted my patches we knew that my work wasn't the definitive solution > (eg: the future implementation of -blockdev). So I'd suggest to try and settle > with something that is good enough and that is not preventing a future improvement. > > What about having a (temporary) flag in drive-mirror to accept also a new-image-file > until we will have the optimal solution? What if libvirt could simply try blockdev-freeze and, if it's not there, try passing a new-image-file argument. Add the new-image-file downstream if you have time constraints, and leave it out upstream. Too ugly? Paolo
On 02/27/2012 10:33 AM, Federico Simoncelli wrote: > I'm all for the modularity of the commands (I suggested it since the beginning), > but all this infrastructure goes way beyond what I'd need for oVirt now. > > When I submitted my patches we knew that my work wasn't the definitive solution > (eg: the future implementation of -blockdev). So I'd suggest to try and settle > with something that is good enough and that is not preventing a future improvement. > > What about having a (temporary) flag in drive-mirror to accept also a new-image-file > until we will have the optimal solution? Unless there are extenuating circumstances (like the absence of core infrastructure in QEMU), then we should not add commands that we know are not the right command. We have to support our commands forever. Regards, Anthony Liguori >
----- Original Message ----- > From: "Anthony Liguori" <anthony@codemonkey.ws> > To: "Federico Simoncelli" <fsimonce@redhat.com> > Cc: "Paolo Bonzini" <pbonzini@redhat.com>, kwolf@redhat.com, armbru@redhat.com, "Jeff Cody" <jcody@redhat.com>, > mtosatti@redhat.com, qemu-devel@nongnu.org, "Luiz Capitulino" <lcapitulino@redhat.com> > Sent: Monday, February 27, 2012 5:42:31 PM > Subject: Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate > commands) > > On 02/27/2012 10:33 AM, Federico Simoncelli wrote: > > I'm all for the modularity of the commands (I suggested it since > > the beginning), > > but all this infrastructure goes way beyond what I'd need for oVirt > > now. > > > > When I submitted my patches we knew that my work wasn't the > > definitive solution > > (eg: the future implementation of -blockdev). So I'd suggest to try > > and settle > > with something that is good enough and that is not preventing a > > future improvement. > > > > What about having a (temporary) flag in drive-mirror to accept also > > a new-image-file > > until we will have the optimal solution? > > Unless there are extenuating circumstances (like the absence of core > infrastructure in QEMU), then we should not add commands that we know > are not > the right command. So are you in favor or against my suggestion? It looks like this is exactly the case where the core infrastructure (transactions) is missing.
On 02/27/2012 04:24 PM, Anthony Liguori wrote: > > Then you get an error with the block devices still frozen. You can > execute another command to reopen back to the old image to roll back the > transaction. > > Pushing the rollback logic to the client does make the client interface > a bit more complicated and adds latency to the error path but it's much > easier than building a complex transaction infrastructure. > > And there are examples of this in the wild too. LDAP uses a similar > mechanism. Actually, have you seen Jeff's atomic snapshot patch? It really implements all that is needed to do transactions, and gets 100% ok error-recovery unlike the existing blockdev_snapshot_sync. It really looks like we can do better than client-side error management, and there is not that much complexity at all. Jeff could rework his patches to work with transaction begin/commit commands, and Federico can then add drive-reopen and drive-migrate on top. Paolo
>> On 02/27/2012 10:33 AM, Federico Simoncelli wrote: >>> I'm all for the modularity of the commands (I suggested it since >>> the beginning), >>> but all this infrastructure goes way beyond what I'd need for oVirt >>> now. >>> >>> When I submitted my patches we knew that my work wasn't the >>> definitive solution >>> (eg: the future implementation of -blockdev). So I'd suggest to try >>> and settle >>> with something that is good enough and that is not preventing a >>> future improvement. >>> >>> What about having a (temporary) flag in drive-mirror to accept also >>> a new-image-file >>> until we will have the optimal solution? >> >> Unless there are extenuating circumstances (like the absence of core >> infrastructure in QEMU), then we should not add commands that we know >> are not >> the right command. > > So are you in favor or against my suggestion? Against. I think we have everything we need. > It looks like this is exactly > the case where the core infrastructure (transactions) is missing. Batch requests are incredibly easy to add. I'm stuck in meetings for the next couple days but I'm sure Luiz throw it together in no time at all. Regards, Anthony Liguori >
On 02/27/2012 05:53 PM, Anthony Liguori wrote: >> It looks like this is exactly >> the case where the core infrastructure (transactions) is missing. > > Batch requests are incredibly easy to add. I'm stuck in meetings for > the next couple days but I'm sure Luiz throw it together in no time at all. Batch requests are unnecessary. They should be a convenience, not a correctness feature. Paolo
On 02/27/2012 10:51 AM, Paolo Bonzini wrote: > On 02/27/2012 04:24 PM, Anthony Liguori wrote: >> >> Then you get an error with the block devices still frozen. You can >> execute another command to reopen back to the old image to roll back the >> transaction. >> >> Pushing the rollback logic to the client does make the client interface >> a bit more complicated and adds latency to the error path but it's much >> easier than building a complex transaction infrastructure. >> >> And there are examples of this in the wild too. LDAP uses a similar >> mechanism. > > Actually, have you seen Jeff's atomic snapshot patch? It really > implements all that is needed to do transactions, and gets 100% ok > error-recovery unlike the existing blockdev_snapshot_sync. It really > looks like we can do better than client-side error management, and there > is not that much complexity at all. > > Jeff could rework his patches to work with transaction begin/commit > commands, and Federico can then add drive-reopen and drive-migrate on top. Yes, maybe I lack imagination but I fail to see how it generalizes easily/nicely. From what I can tell, all of the rollback logic is very specific to the commands being used, right? Regards, Anthony Liguori > > Paolo
On 02/27/2012 10:54 AM, Paolo Bonzini wrote: > On 02/27/2012 05:53 PM, Anthony Liguori wrote: >>> It looks like this is exactly >>> the case where the core infrastructure (transactions) is missing. >> >> Batch requests are incredibly easy to add. I'm stuck in meetings for >> the next couple days but I'm sure Luiz throw it together in no time at all. > > Batch requests are unnecessary. They should be a convenience, not a > correctness feature. I think this discussion has gotten a bit unwieldy as I'm having trouble keeping up with the various proposals. Can we move to something more formal and written on the wiki? Can we start with a clear description of what the various requirements are? Regards, Anthony Liguroi > > Paolo >
On 02/27/2012 05:58 PM, Anthony Liguori wrote: >> >> Jeff could rework his patches to work with transaction begin/commit >> commands, and Federico can then add drive-reopen and drive-migrate on >> top. > > Yes, maybe I lack imagination but I fail to see how it generalizes > easily/nicely. > From what I can tell, all of the rollback logic is very specific to the > commands being used, right? The rollback logic is just "close the new devices". The commit logic is specific to the commands being used, but reopen should be easier than snapshot (and basically the same except for backing_hd handling). Migrate is really syntactic sugar around reopen, so no surprises there. Paolo
On Mon, 27 Feb 2012 10:42:31 -0600 Anthony Liguori <anthony@codemonkey.ws> wrote: > On 02/27/2012 10:33 AM, Federico Simoncelli wrote: > > I'm all for the modularity of the commands (I suggested it since the beginning), > > but all this infrastructure goes way beyond what I'd need for oVirt now. > > > > When I submitted my patches we knew that my work wasn't the definitive solution > > (eg: the future implementation of -blockdev). So I'd suggest to try and settle > > with something that is good enough and that is not preventing a future improvement. > > > > What about having a (temporary) flag in drive-mirror to accept also a new-image-file > > until we will have the optimal solution? > > Unless there are extenuating circumstances (like the absence of core > infrastructure in QEMU), then we should not add commands that we know are not > the right command. > > We have to support our commands forever. Agreed. Worst case we have vendor extension support that downstream can use to add its own commands.
diff --git a/blockdev.c b/blockdev.c index 2c132a3..f51bd6f 100644 --- a/blockdev.c +++ b/blockdev.c @@ -646,9 +646,8 @@ void do_commit(Monitor *mon, const QDict *qdict) } } -void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file, - bool has_format, const char *format, - Error **errp) +static void change_blockdev_image(const char *device, const char *new_image_file, + const char *format, bool create, Error **errp) { BlockDriverState *bs; BlockDriver *drv, *old_drv, *proto_drv; @@ -671,7 +670,7 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file, old_drv = bs->drv; flags = bs->open_flags; - if (!has_format) { + if (!format) { format = "qcow2"; } @@ -681,24 +680,26 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file, return; } - proto_drv = bdrv_find_protocol(snapshot_file); + proto_drv = bdrv_find_protocol(new_image_file); if (!proto_drv) { error_set(errp, QERR_INVALID_BLOCK_FORMAT, format); return; } - ret = bdrv_img_create(snapshot_file, format, bs->filename, - bs->drv->format_name, NULL, -1, flags); - if (ret) { - error_set(errp, QERR_UNDEFINED_ERROR); - return; + if (create) { + ret = bdrv_img_create(new_image_file, format, bs->filename, + bs->drv->format_name, NULL, -1, flags); + if (ret) { + error_set(errp, QERR_UNDEFINED_ERROR); + return; + } } bdrv_drain_all(); bdrv_flush(bs); bdrv_close(bs); - ret = bdrv_open(bs, snapshot_file, flags, drv); + ret = bdrv_open(bs, new_image_file, flags, drv); /* * If reopening the image file we just created fails, fall back * and try to re-open the original image. If that fails too, we @@ -709,11 +710,93 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file, if (ret != 0) { error_set(errp, QERR_OPEN_FILE_FAILED, old_filename); } else { - error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file); + error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file); } } } +void qmp_drive_reopen(const char *device, const char *new_image_file, + bool has_format, const char *format, Error **errp) +{ + change_blockdev_image(device, new_image_file, + has_format ? format : NULL, false, errp); +} + +void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file, + bool has_format, const char *format, + Error **errp) +{ + change_blockdev_image(device, snapshot_file, + has_format ? format : NULL, true, errp); +} + +static void qmp_blockdev_mirror(const char *device, const char *destination, + const char *new_image_file, Error **errp) +{ + BlockDriver *drv; + int i, j, escape; + char new_filename[2048], *filename; + + drv = bdrv_find_format("blkmirror"); + if (!drv) { + error_set(errp, QERR_INVALID_BLOCK_FORMAT, "blkmirror"); + return; + } + + escape = 0; + for (i = 0, j = 0; j < strlen(new_image_file); j++) { + loop: + if (!(i < sizeof(new_filename) - 2)) { + error_set(errp, QERR_INVALID_PARAMETER_VALUE, + "new-image-file", "shorter filename"); + return; + } + + if (new_image_file[j] == ':' || new_image_file[j] == '\\') { + if (!escape) { + new_filename[i++] = '\\', escape = 1; + goto loop; + } else { + escape = 0; + } + } + + new_filename[i++] = new_image_file[j]; + } + + filename = g_strdup_printf("blkmirror:%s:%s", new_filename, destination); + change_blockdev_image(device, filename, "blkmirror", false, errp); + g_free(filename); +} + +void qmp_blockdev_migrate(const char *device, bool incremental, + const char *destination, bool has_new_image_file, + const char *new_image_file, Error **errp) +{ + BlockDriverState *bs; + + bs = bdrv_find(device); + if (!bs) { + error_set(errp, QERR_DEVICE_NOT_FOUND, device); + return; + } + if (bdrv_in_use(bs)) { + error_set(errp, QERR_DEVICE_IN_USE, device); + return; + } + + if (incremental) { + if (!has_new_image_file) { + error_set(errp, QERR_INVALID_PARAMETER_VALUE, + "incremental", "a new image file"); + } else { + qmp_blockdev_mirror(device, destination, new_image_file, errp); + } + } else { + error_set(errp, QERR_NOT_SUPPORTED); + } +} + static void eject_device(BlockDriverState *bs, int force, Error **errp) { if (bdrv_in_use(bs)) { diff --git a/hmp-commands.hx b/hmp-commands.hx index 573b823..4aacf2a 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -886,6 +886,44 @@ Snapshot device, using snapshot file as target if provided ETEXI { + .name = "drive_reopen", + .args_type = "device:B,new-image-file:s?,format:s?", + .params = "device [new-image-file] [format]", + .help = "Assigns a new image file to a device.\n\t\t\t" + "The image will be opened using the format\n\t\t\t" + "specified or 'qcow2' by default.\n\t\t\t" + "This command is deprecated.", + .mhandler.cmd = hmp_drive_reopen, + }, + +STEXI +@item drive_reopen +@findex drive_reopen +Assigns a new image file to a device. +ETEXI + + { + .name = "blkdev_migrate", + .args_type = "incremental:-i,device:B,destination:s,new-image-file:s?", + .params = "device mode destination [new-image-file]", + .help = "Migrates the device to a new destination.\n\t\t\t" + "If the incremental (-i) option is used then\n\t\t\t" + "only the new writes are sent to the destination.\n\t\t\t" + "This method is particularly useful if used in\n\t\t\t" + "conjunction with new-image-file (a new image for\n\t\t\t" + "the device) allowing the current image to be\n\t\t\t" + "transferred to the destination by an external\n\t\t\t" + "manager.", + .mhandler.cmd = hmp_blkdev_migrate, + }, + +STEXI +@item blkdev_migrate +@findex blkdev_migrate +Migrates the device to a new destination. +ETEXI + + { .name = "drive_add", .args_type = "pci_addr:s,opts:s", .params = "[[<domain>:]<bus>:]<slot>\n" diff --git a/hmp.c b/hmp.c index 8ff8c94..282d3e5 100644 --- a/hmp.c +++ b/hmp.c @@ -701,6 +701,30 @@ void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict) hmp_handle_error(mon, &errp); } +void hmp_drive_reopen(Monitor *mon, const QDict *qdict) +{ + const char *device = qdict_get_str(qdict, "device"); + const char *filename = qdict_get_str(qdict, "new-image-file"); + const char *format = qdict_get_try_str(qdict, "format"); + Error *errp = NULL; + + qmp_drive_reopen(device, filename, !!format, format, &errp); + hmp_handle_error(mon, &errp); +} + +void hmp_blkdev_migrate(Monitor *mon, const QDict *qdict) +{ + bool incremental = qdict_get_try_bool(qdict, "incremental", 0); + const char *device = qdict_get_str(qdict, "device"); + const char *destination = qdict_get_str(qdict, "destination"); + const char *new_image_file = qdict_get_try_str(qdict, "new-image-file"); + Error *errp = NULL; + + qmp_blockdev_migrate(device, incremental, destination, + !!new_image_file, new_image_file, &errp); + hmp_handle_error(mon, &errp); +} + void hmp_migrate_cancel(Monitor *mon, const QDict *qdict) { qmp_migrate_cancel(NULL); diff --git a/hmp.h b/hmp.h index 18eecbd..3c66257 100644 --- a/hmp.h +++ b/hmp.h @@ -47,6 +47,8 @@ void hmp_block_passwd(Monitor *mon, const QDict *qdict); void hmp_balloon(Monitor *mon, const QDict *qdict); void hmp_block_resize(Monitor *mon, const QDict *qdict); void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict); +void hmp_drive_reopen(Monitor *mon, const QDict *qdict); +void hmp_blkdev_migrate(Monitor *mon, const QDict *qdict); void hmp_migrate_cancel(Monitor *mon, const QDict *qdict); void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict); void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict); diff --git a/qapi-schema.json b/qapi-schema.json index d02ee86..df1248c 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1136,6 +1136,60 @@ 'data': { 'device': 'str', 'snapshot-file': 'str', '*format': 'str' } } ## +# @drive-reopen +# +# Assigns a new image file to a device. +# +# @device: the name of the device for which we are chainging the image file. +# +# @new-image-file: the target of the new image. If the file doesn't exists the +# command will fail. +# +# @format: #optional the format of the new image, default is 'qcow2'. +# +# Returns: nothing on success +# If @device is not a valid block device, DeviceNotFound +# If @new-image-file can't be opened, OpenFileFailed +# If @format is invalid, InvalidBlockFormat +# +# Notes: This command is deprecated. +# +# Since 1.1 +## + +{ 'command': 'drive-reopen', + 'data': { 'device': 'str', 'new-image-file': 'str', '*format': 'str' } } + +## +# @blockdev-migrate +# +# Migrates the device to a new destination. +# +# @device: the name of the block device to migrate. +# +# @incremental: if true only the new writes are sent to the destination. +# This method is particularly useful if used in conjunction +# with new-image-file allowing the current image to be +# transferred to the destination by an external manager. +# +# @destination: the destination of the block migration. +# +# @new-image-file: #optional an existing image file that will replace +# the current one in the device. +# +# Returns: nothing on success +# If @device is not a valid block device, DeviceNotFound +# If @mode is not a valid migration mode, InvalidParameterValue +# If @destination can't be opened, OpenFileFailed +# If @new-image-file can't be opened, OpenFileFailed +# +# Since 1.1 +## +{ 'command': 'blockdev-migrate', + 'data': { 'device': 'str', 'incremental' : 'bool', + 'destination': 'str', '*new-image-file': 'str' } } + +## # @human-monitor-command: # # Execute a command on the human monitor and return the output.
Signed-off-by: Federico Simoncelli <fsimonce@redhat.com> --- blockdev.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++++------ hmp-commands.hx | 38 +++++++++++++++++++ hmp.c | 24 ++++++++++++ hmp.h | 2 + qapi-schema.json | 54 +++++++++++++++++++++++++++ 5 files changed, 213 insertions(+), 12 deletions(-)