Patchwork [v2,1/1] Add QMP bits for blockdev-snapshot-sync.

login
register
mail settings
Submitter Jes Sorensen
Date April 18, 2011, 2:27 p.m.
Message ID <1303136821-13333-2-git-send-email-Jes.Sorensen@redhat.com>
Download mbox | patch
Permalink /patch/91764/
State New
Headers show

Comments

Jes Sorensen - April 18, 2011, 2:27 p.m.
From: Jes Sorensen <Jes.Sorensen@redhat.com>

This is quivalent to snapshot_blkdev in the human monitor, with _sync
added to the command name to make it explicit that the command is
synchronous and leave space for a future async version.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
---
 qmp-commands.hx |   26 ++++++++++++++++++++++++++
 1 files changed, 26 insertions(+), 0 deletions(-)
Luiz Capitulino - April 27, 2011, 3:05 p.m.
On Mon, 18 Apr 2011 16:27:01 +0200
Jes.Sorensen@redhat.com wrote:

> From: Jes Sorensen <Jes.Sorensen@redhat.com>
> 
> This is quivalent to snapshot_blkdev in the human monitor, with _sync
> added to the command name to make it explicit that the command is
> synchronous and leave space for a future async version.

I'm not sure appending "_sync" is such a good convention, most commands
are sync today and they don't have it. I'd prefer to call it snapshot_blkdev
and note in the documentation how it works.

On the other hand, I'm not sure how Anthony is going to model async
commands, so maybe he has a better suggestion.

> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
> ---
>  qmp-commands.hx |   26 ++++++++++++++++++++++++++
>  1 files changed, 26 insertions(+), 0 deletions(-)
> 
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index fbd98ee..b8f537c 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -667,6 +667,32 @@ Example:
>  EQMP
>  
>      {
> +        .name       = "blockdev-snapshot-sync",
> +        .args_type  = "device:B,snapshot_file:s?,format:s?",
> +        .params     = "device [new-image-file] [format]",
> +        .user_print = monitor_user_noop,
> +        .mhandler.cmd_new = do_snapshot_blkdev,
> +    },
> +
> +SQMP

This doesn't follow QMP doc convention, which is:

command name
------------

(Explain how the command works, like you do below)

Arguments

Example

> +Synchronous snapshot of block device, using snapshot file as target
> +if provided. 

It's not optional in HMP:

 (qemu) snapshot_blkdev ide0-hd0
 Parameter 'snapshot_file' is missing
 (qemu)

And the command argument is called "snapshot_file" not "new-image-file"
as written in the HMP help text.

> +
> +If a new image file is specified, the new image file will become the
> +new root image. If format is specified, the snapshot file will be
> +created in that format. Otherwise the snapshot will be internal!
> +(currently unsupported).

Sorry for the stupid question, but what's a "new root image"? Also, all
these assumptions seem human features to me, as it can save some typing
and I can poke around to see where the snapshots are stored.

All arguments should be mandatory in QMP, IMO.

Finally, what's the expect behavior when -snapshot is used? I'm getting
this:

 (qemu) snapshot_blkdev ide0-hd0 snap-test
 Could not open '/tmp/vl.6w8YXA'
 (qemu)

At first, I don't see why we shouldn't generate the live snapshot, but anyway,
any special behavior like this should be noted in the section called Notes
in the command's documentation.

> +
> +Errors:
> +If creating the new snapshot image fails, QEMU will continue running
> +on the original image. If switching to the newly created image fails,
> +it will be attempted to fall back to the original image and return
> +QERR_OPEN_FILE_FAILED with the snapshot filename. If re-opening
> +the original image fails, QERR_OPEN_FILE_FAILED will be returned with
> +the original image filename.
> +EQMP
> +
> +    {
>          .name       = "balloon",
>          .args_type  = "value:M",
>          .params     = "target",
Jes Sorensen - April 27, 2011, 3:05 p.m.
On 04/27/11 17:05, Luiz Capitulino wrote:
> On Mon, 18 Apr 2011 16:27:01 +0200
> Jes.Sorensen@redhat.com wrote:
> 
>> From: Jes Sorensen <Jes.Sorensen@redhat.com>
>>
>> This is quivalent to snapshot_blkdev in the human monitor, with _sync
>> added to the command name to make it explicit that the command is
>> synchronous and leave space for a future async version.
> 
> I'm not sure appending "_sync" is such a good convention, most commands
> are sync today and they don't have it. I'd prefer to call it snapshot_blkdev
> and note in the documentation how it works.
> 
> On the other hand, I'm not sure how Anthony is going to model async
> commands, so maybe he has a better suggestion.

The _sync prefix is on purpose to leave space for a possible async
implementation of the snapshot command in the future. This isn't related
to it being a sync vs async qmp command though.

Jes
Jes Sorensen - April 28, 2011, 12:45 p.m.
On 04/27/11 17:05, Jes Sorensen wrote:
> On 04/27/11 17:05, Luiz Capitulino wrote:
>> On Mon, 18 Apr 2011 16:27:01 +0200
>> Jes.Sorensen@redhat.com wrote:
>>
>>> From: Jes Sorensen <Jes.Sorensen@redhat.com>
>>>
>>> This is quivalent to snapshot_blkdev in the human monitor, with _sync
>>> added to the command name to make it explicit that the command is
>>> synchronous and leave space for a future async version.
>>
>> I'm not sure appending "_sync" is such a good convention, most commands
>> are sync today and they don't have it. I'd prefer to call it snapshot_blkdev
>> and note in the documentation how it works.
>>
>> On the other hand, I'm not sure how Anthony is going to model async
>> commands, so maybe he has a better suggestion.
> 
> The _sync prefix is on purpose to leave space for a possible async
> implementation of the snapshot command in the future. This isn't related
> to it being a sync vs async qmp command though.

If people are more comfortable with the QMP command being
"blockdev-snapshot" and then using {-,_}async for the async comment in
the monitor and QMP later, that is fine with me.

I'd just like to move on this and get it upstream.

Cheers,
Jes
Jes Sorensen - April 28, 2011, 1:14 p.m.
On 04/27/11 17:05, Luiz Capitulino wrote:
>> > +Synchronous snapshot of block device, using snapshot file as target
>> > +if provided. 
> It's not optional in HMP:
> 
>  (qemu) snapshot_blkdev ide0-hd0
>  Parameter 'snapshot_file' is missing
>  (qemu)
> 

The parameter is optional in HMP, however it will fail if there is no
snapshot filename or there is no flag for internal snapshots (which is
currently unsupported).

Jes
Jes Sorensen - April 28, 2011, 1:21 p.m.
On 04/27/11 17:05, Luiz Capitulino wrote:
>> +If a new image file is specified, the new image file will become the
>> > +new root image. If format is specified, the snapshot file will be
>> > +created in that format. Otherwise the snapshot will be internal!
>> > +(currently unsupported).
> Sorry for the stupid question, but what's a "new root image"? Also, all
> these assumptions seem human features to me, as it can save some typing
> and I can poke around to see where the snapshots are stored.
> 
> All arguments should be mandatory in QMP, IMO.

Sorry, but there is absolutely no reason to make all arguments
mandatory. Sure it can be done, but the only result is a separate
handling function for it, so we got more almost identical, but still
different code to maintain.

> Finally, what's the expect behavior when -snapshot is used? I'm getting
> this:
> 
>  (qemu) snapshot_blkdev ide0-hd0 snap-test
>  Could not open '/tmp/vl.6w8YXA'
>  (qemu)

What type of file system is your /tmp? You need to provide full path to
the snapshot file if you don't want it created next to where your qemu
binary is being executed.

> At first, I don't see why we shouldn't generate the live snapshot, but anyway,
> any special behavior like this should be noted in the section called Notes
> in the command's documentation.
> 

I don't follow this at all, please elaborate.

Jes
Kevin Wolf - April 28, 2011, 1:41 p.m.
Am 28.04.2011 15:21, schrieb Jes Sorensen:
> On 04/27/11 17:05, Luiz Capitulino wrote:
>>> +If a new image file is specified, the new image file will become the
>>>> +new root image. If format is specified, the snapshot file will be
>>>> +created in that format. Otherwise the snapshot will be internal!
>>>> +(currently unsupported).
>> Sorry for the stupid question, but what's a "new root image"? Also, all
>> these assumptions seem human features to me, as it can save some typing
>> and I can poke around to see where the snapshots are stored.
>>
>> All arguments should be mandatory in QMP, IMO.
> 
> Sorry, but there is absolutely no reason to make all arguments
> mandatory. Sure it can be done, but the only result is a separate
> handling function for it, so we got more almost identical, but still
> different code to maintain.
> 
>> Finally, what's the expect behavior when -snapshot is used? I'm getting
>> this:
>>
>>  (qemu) snapshot_blkdev ide0-hd0 snap-test
>>  Could not open '/tmp/vl.6w8YXA'
>>  (qemu)
> 
> What type of file system is your /tmp? You need to provide full path to
> the snapshot file if you don't want it created next to where your qemu
> binary is being executed.

I think the problem is that this is a temporary file, i.e. unlinked
directly after it has been opened. Trying to reopen a deleted file is a
bad idea.

Kevin
Jes Sorensen - April 28, 2011, 1:46 p.m.
On 04/28/11 15:41, Kevin Wolf wrote:
>>> Finally, what's the expect behavior when -snapshot is used? I'm getting
>>> >> this:
>>> >>
>>> >>  (qemu) snapshot_blkdev ide0-hd0 snap-test
>>> >>  Could not open '/tmp/vl.6w8YXA'
>>> >>  (qemu)
>> > 
>> > What type of file system is your /tmp? You need to provide full path to
>> > the snapshot file if you don't want it created next to where your qemu
>> > binary is being executed.
> I think the problem is that this is a temporary file, i.e. unlinked
> directly after it has been opened. Trying to reopen a deleted file is a
> bad idea.

True, but if /tmp is tmpfs, it isn't possible to open things O_DIRECT,
which could also be the source of the problem here.

Jes
Luiz Capitulino - April 28, 2011, 2:21 p.m.
On Thu, 28 Apr 2011 15:21:41 +0200
Jes Sorensen <Jes.Sorensen@redhat.com> wrote:

> On 04/27/11 17:05, Luiz Capitulino wrote:
> >> +If a new image file is specified, the new image file will become the
> >> > +new root image. If format is specified, the snapshot file will be
> >> > +created in that format. Otherwise the snapshot will be internal!
> >> > +(currently unsupported).
> > Sorry for the stupid question, but what's a "new root image"? Also, all
> > these assumptions seem human features to me, as it can save some typing
> > and I can poke around to see where the snapshots are stored.
> > 
> > All arguments should be mandatory in QMP, IMO.
> 
> Sorry, but there is absolutely no reason to make all arguments
> mandatory. Sure it can be done, but the only result is a separate
> handling function for it, so we got more almost identical, but still
> different code to maintain.

We shouldn't compromise our external interface quality because of
implementation details. What I'm really asking here is whether this is
a good command for our management tools.

For example, I've just realized that the new root image is going to be
automatically created after the first call to this command, and subsequent
calls w/o the snapshot file name will re-use that file. Is that correct?

Also note the optional format usage, the command (randomly) picks qcow2 if
the format is not given. What happens if I pass a raw image and don't specify
the format? Will it work as it works for qcow2?

I'm not exactly asking for mandatory arguments. For the format argument for
example, we could try to auto-detect the format (is it possible)? And then
we could fail with a meaningful error message.

And, I also would like to hear from Anthony, as he's picking up QMP maintenance.

> > Finally, what's the expect behavior when -snapshot is used? I'm getting
> > this:
> > 
> >  (qemu) snapshot_blkdev ide0-hd0 snap-test
> >  Could not open '/tmp/vl.6w8YXA'
> >  (qemu)
> 
> What type of file system is your /tmp?

ext4

> You need to provide full path to
> the snapshot file if you don't want it created next to where your qemu
> binary is being executed.

I'm not running in /tmp.

> > At first, I don't see why we shouldn't generate the live snapshot, but anyway,
> > any special behavior like this should be noted in the section called Notes
> > in the command's documentation.
> > 
> 
> I don't follow this at all, please elaborate.

Any kind of limitation should be noted in the documentation.
Jes Sorensen - April 28, 2011, 2:30 p.m.
On 04/28/11 16:21, Luiz Capitulino wrote:
> On Thu, 28 Apr 2011 15:21:41 +0200
> Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
> 
>> On 04/27/11 17:05, Luiz Capitulino wrote:
>>> All arguments should be mandatory in QMP, IMO.
>>
>> Sorry, but there is absolutely no reason to make all arguments
>> mandatory. Sure it can be done, but the only result is a separate
>> handling function for it, so we got more almost identical, but still
>> different code to maintain.
> 
> We shouldn't compromise our external interface quality because of
> implementation details. What I'm really asking here is whether this is
> a good command for our management tools.

It has been discussed repeatedly for months, so yes I will argue it is.

> For example, I've just realized that the new root image is going to be
> automatically created after the first call to this command, and subsequent
> calls w/o the snapshot file name will re-use that file. Is that correct?

No of course not. Every call to snapshot-blockdev will create a new
snapshot. If you don't specify a filename, the new snapshot will be
internal, except you will get an error as we don't currently support that.

snapshots can be chained, so you can end up with a snapshot pointing to
the previous snapshot, which points to the previous snapshot, which
points to the original image .....

> Also note the optional format usage, the command (randomly) picks qcow2 if
> the format is not given. What happens if I pass a raw image and don't specify
> the format? Will it work as it works for qcow2?

The command doesn't pick randomly, it picks the default cow format for
qemu. You cannot pass a raw image as it is not cow compatible.

> I'm not exactly asking for mandatory arguments. For the format argument for
> example, we could try to auto-detect the format (is it possible)? And then
> we could fail with a meaningful error message.

The code is in there now, and there hasn't been requests for this in the
past. The introduction of the qmp wrapper is not the place to discuss this.

> And, I also would like to hear from Anthony, as he's picking up QMP maintenance.

Anthony already stated to me that he was fairly happy with it. However
you are the QMP maintainer, so it needs to go in via the QMP tree.

>>> Finally, what's the expect behavior when -snapshot is used? I'm getting
>>> this:
>>>
>>>  (qemu) snapshot_blkdev ide0-hd0 snap-test
>>>  Could not open '/tmp/vl.6w8YXA'
>>>  (qemu)
>>
>> What type of file system is your /tmp?
> 
> ext4
> 
>> You need to provide full path to
>> the snapshot file if you don't want it created next to where your qemu
>> binary is being executed.
> 
> I'm not running in /tmp.

Well something is funny with your /tmp then, as the above isn't normal
behavior.

>>> At first, I don't see why we shouldn't generate the live snapshot, but anyway,
>>> any special behavior like this should be noted in the section called Notes
>>> in the command's documentation.
>>>
>>
>> I don't follow this at all, please elaborate.
> 
> Any kind of limitation should be noted in the documentation.

We cannot document a users choice of /tmp, when /tmp isn't part of what
the command does.

Jes
Anthony Liguori - April 28, 2011, 2:36 p.m.
On 04/27/2011 10:05 AM, Luiz Capitulino wrote:
> On Mon, 18 Apr 2011 16:27:01 +0200
> Jes.Sorensen@redhat.com wrote:
>
>> From: Jes Sorensen<Jes.Sorensen@redhat.com>
>>
>> This is quivalent to snapshot_blkdev in the human monitor, with _sync
>> added to the command name to make it explicit that the command is
>> synchronous and leave space for a future async version.
>
> I'm not sure appending "_sync" is such a good convention, most commands
> are sync today and they don't have it. I'd prefer to call it snapshot_blkdev
> and note in the documentation how it works.

It probably should be called snapshot_blkdev_broken because that's what 
it really is.

The '_sync' is there to indicate that this command doesn't properly 
implement asynchronous logic and can break a guest.

I'd actually prefer that we not expose this command through QMP at all 
and instead implement a proper snapshot command.

Regards,

Anthony Liguori

>
> On the other hand, I'm not sure how Anthony is going to model async
> commands, so maybe he has a better suggestion.
>
>> Signed-off-by: Jes Sorensen<Jes.Sorensen@redhat.com>
>> ---
>>   qmp-commands.hx |   26 ++++++++++++++++++++++++++
>>   1 files changed, 26 insertions(+), 0 deletions(-)
>>
>> diff --git a/qmp-commands.hx b/qmp-commands.hx
>> index fbd98ee..b8f537c 100644
>> --- a/qmp-commands.hx
>> +++ b/qmp-commands.hx
>> @@ -667,6 +667,32 @@ Example:
>>   EQMP
>>
>>       {
>> +        .name       = "blockdev-snapshot-sync",
>> +        .args_type  = "device:B,snapshot_file:s?,format:s?",
>> +        .params     = "device [new-image-file] [format]",
>> +        .user_print = monitor_user_noop,
>> +        .mhandler.cmd_new = do_snapshot_blkdev,
>> +    },
>> +
>> +SQMP
>
> This doesn't follow QMP doc convention, which is:
>
> command name
> ------------
>
> (Explain how the command works, like you do below)
>
> Arguments
>
> Example
>
>> +Synchronous snapshot of block device, using snapshot file as target
>> +if provided.
>
> It's not optional in HMP:
>
>   (qemu) snapshot_blkdev ide0-hd0
>   Parameter 'snapshot_file' is missing
>   (qemu)
>
> And the command argument is called "snapshot_file" not "new-image-file"
> as written in the HMP help text.
>
>> +
>> +If a new image file is specified, the new image file will become the
>> +new root image. If format is specified, the snapshot file will be
>> +created in that format. Otherwise the snapshot will be internal!
>> +(currently unsupported).
>
> Sorry for the stupid question, but what's a "new root image"? Also, all
> these assumptions seem human features to me, as it can save some typing
> and I can poke around to see where the snapshots are stored.
>
> All arguments should be mandatory in QMP, IMO.
>
> Finally, what's the expect behavior when -snapshot is used? I'm getting
> this:
>
>   (qemu) snapshot_blkdev ide0-hd0 snap-test
>   Could not open '/tmp/vl.6w8YXA'
>   (qemu)
>
> At first, I don't see why we shouldn't generate the live snapshot, but anyway,
> any special behavior like this should be noted in the section called Notes
> in the command's documentation.
>
>> +
>> +Errors:
>> +If creating the new snapshot image fails, QEMU will continue running
>> +on the original image. If switching to the newly created image fails,
>> +it will be attempted to fall back to the original image and return
>> +QERR_OPEN_FILE_FAILED with the snapshot filename. If re-opening
>> +the original image fails, QERR_OPEN_FILE_FAILED will be returned with
>> +the original image filename.
>> +EQMP
>> +
>> +    {
>>           .name       = "balloon",
>>           .args_type  = "value:M",
>>           .params     = "target",
>
>
Jes Sorensen - April 28, 2011, 2:38 p.m.
On 04/28/11 16:36, Anthony Liguori wrote:
> On 04/27/2011 10:05 AM, Luiz Capitulino wrote:
>> On Mon, 18 Apr 2011 16:27:01 +0200
>> Jes.Sorensen@redhat.com wrote:
>>
>>> From: Jes Sorensen<Jes.Sorensen@redhat.com>
>>>
>>> This is quivalent to snapshot_blkdev in the human monitor, with _sync
>>> added to the command name to make it explicit that the command is
>>> synchronous and leave space for a future async version.
>>
>> I'm not sure appending "_sync" is such a good convention, most commands
>> are sync today and they don't have it. I'd prefer to call it
>> snapshot_blkdev
>> and note in the documentation how it works.
> 
> It probably should be called snapshot_blkdev_broken because that's what
> it really is.
> 
> The '_sync' is there to indicate that this command doesn't properly
> implement asynchronous logic and can break a guest.
> 
> I'd actually prefer that we not expose this command through QMP at all
> and instead implement a proper snapshot command.

Sorry but this is utterly bogus.

The snapshot support as is works fine, and the command is in the
monitor. We should expose it in QMP as well.

If we eventually get a different implementation, then we can rename it
or replace it then.

Jes
Anthony Liguori - April 28, 2011, 2:38 p.m.
On 04/28/2011 09:21 AM, Luiz Capitulino wrote:
> On Thu, 28 Apr 2011 15:21:41 +0200
> Jes Sorensen<Jes.Sorensen@redhat.com>  wrote:
>
>> On 04/27/11 17:05, Luiz Capitulino wrote:
>>>> +If a new image file is specified, the new image file will become the
>>>>> +new root image. If format is specified, the snapshot file will be
>>>>> +created in that format. Otherwise the snapshot will be internal!
>>>>> +(currently unsupported).
>>> Sorry for the stupid question, but what's a "new root image"? Also, all
>>> these assumptions seem human features to me, as it can save some typing
>>> and I can poke around to see where the snapshots are stored.
>>>
>>> All arguments should be mandatory in QMP, IMO.
>>
>> Sorry, but there is absolutely no reason to make all arguments
>> mandatory. Sure it can be done, but the only result is a separate
>> handling function for it, so we got more almost identical, but still
>> different code to maintain.
>
> We shouldn't compromise our external interface quality because of
> implementation details. What I'm really asking here is whether this is
> a good command for our management tools.
>
> For example, I've just realized that the new root image is going to be
> automatically created after the first call to this command, and subsequent
> calls w/o the snapshot file name will re-use that file. Is that correct?
>
> Also note the optional format usage, the command (randomly) picks qcow2 if
> the format is not given. What happens if I pass a raw image and don't specify
> the format? Will it work as it works for qcow2?
>
> I'm not exactly asking for mandatory arguments. For the format argument for
> example, we could try to auto-detect the format (is it possible)? And then
> we could fail with a meaningful error message.
>
> And, I also would like to hear from Anthony, as he's picking up QMP maintenance.

I've been ignoring this interface because it's fundamentally broken.

Maybe we should not expose this via QMP and instead focus on making a 
proper interface for this operation.

Regards,

Anthony Liguori
Jes Sorensen - April 28, 2011, 2:41 p.m.
On 04/28/11 16:42, Kevin Wolf wrote:
>>>>> What type of file system is your /tmp? You need to provide full path to
>>>>> >>>> the snapshot file if you don't want it created next to where your qemu
>>>>> >>>> binary is being executed.
>>> >> I think the problem is that this is a temporary file, i.e. unlinked
>>> >> directly after it has been opened. Trying to reopen a deleted file is a
>>> >> bad idea.
>> > 
>> > True, but if /tmp is tmpfs, it isn't possible to open things O_DIRECT,
>> > which could also be the source of the problem here.
> Not really, -snapshot is very clearly the problem here. Note that what's
> failing here is not opening the new snapshot but the old temporary image
> created by -snapshot.

Sorry you are losing me here - why would reopening the old image fail?
And if it fails, why does it have such a bizarre name in Luiz's case?

Jes
Kevin Wolf - April 28, 2011, 2:42 p.m.
Am 28.04.2011 15:46, schrieb Jes Sorensen:
> On 04/28/11 15:41, Kevin Wolf wrote:
>>>> Finally, what's the expect behavior when -snapshot is used? I'm getting
>>>>>> this:
>>>>>>
>>>>>>  (qemu) snapshot_blkdev ide0-hd0 snap-test
>>>>>>  Could not open '/tmp/vl.6w8YXA'
>>>>>>  (qemu)
>>>>
>>>> What type of file system is your /tmp? You need to provide full path to
>>>> the snapshot file if you don't want it created next to where your qemu
>>>> binary is being executed.
>> I think the problem is that this is a temporary file, i.e. unlinked
>> directly after it has been opened. Trying to reopen a deleted file is a
>> bad idea.
> 
> True, but if /tmp is tmpfs, it isn't possible to open things O_DIRECT,
> which could also be the source of the problem here.

Not really, -snapshot is very clearly the problem here. Note that what's
failing here is not opening the new snapshot but the old temporary image
created by -snapshot.

Kevin
Anthony Liguori - April 28, 2011, 2:46 p.m.
On 04/28/2011 09:38 AM, Jes Sorensen wrote:
>
> Sorry but this is utterly bogus.
>
> The snapshot support as is works fine, and the command is in the
> monitor. We should expose it in QMP as well.

It went in for the monitor because it was considered an imperfect 
command so we held up the QMP side because we wanted a better interface.

The current command does:

1) Create new image backing to current image

2) Flush outstanding I/O to old image

3) Close current image

4) Reopen newly created image

5) Go

Operations (1) and (2) are very synchronous operations.  (4) can be too. 
  We really should have a bdrv_aio_snapshot() function that implements 
the logic for at least (2) in an asynchronous fashion.

That sort of interface is going to affect how we expose things in QMP. 
As from a QMP perspective, we're going to do something like:

a) start snapshot

b) query snapshot progress

c) receive notification of snapshot completion

d) flip over image

And of course, this needs to be carefully thought through for race 
conditions.  In the current command, what happens if you get a crash 
between (2) and (3)?  There's no way for the management tools to know 
that we didn't finish flushing writes.  How does the management tool 
know that (1) didn't fail mid way through resulting in a corrupted image?

Regards,

Anthony Liguori
Jes Sorensen - April 28, 2011, 2:57 p.m.
On 04/28/11 16:46, Anthony Liguori wrote:
> On 04/28/2011 09:38 AM, Jes Sorensen wrote:
>>
>> Sorry but this is utterly bogus.
>>
>> The snapshot support as is works fine, and the command is in the
>> monitor. We should expose it in QMP as well.
> 
> It went in for the monitor because it was considered an imperfect
> command so we held up the QMP side because we wanted a better interface.

I am not sure who is included in the 'we' here.....

> The current command does:
> 1) Create new image backing to current image
> 2) Flush outstanding I/O to old image
> 3) Close current image
> 4) Reopen newly created image
> 5) Go

> Operations (1) and (2) are very synchronous operations.  (4) can be too.
>  We really should have a bdrv_aio_snapshot() function that implements
> the logic for at least (2) in an asynchronous fashion.
> 
> That sort of interface is going to affect how we expose things in QMP.
> As from a QMP perspective, we're going to do something like:
> 
> a) start snapshot
> b) query snapshot progress
> c) receive notification of snapshot completion
> d) flip over image

Sorry this is inherently broken. The management tool should not be
keeping state in this process. I agree an async interface would be nice,
but the above process is plain wrong.

The async snapshot process needs to be doing the exact same as in the
current implementation, the main difference is that it would be running
asynchronously and that QMP would be able to query the state of it. You
definitely do *not* want the management tool to launch the snapshot
process, and then do the flip by the management tool later. It should
all happen as part of the original command. Being able to query it while
it is progressing is a different story.

The one reason why this isn't all that big an issue in the first place,
is that in the normal usage case, a user will run a guest agent which
freezes the guest file systems during the snapshot process, so the fact
that the existing command is synchronous pretty much disappears in the
noise.

> And of course, this needs to be carefully thought through for race
> conditions.  In the current command, what happens if you get a crash
> between (2) and (3)?  There's no way for the management tools to know
> that we didn't finish flushing writes.  How does the management tool
> know that (1) didn't fail mid way through resulting in a corrupted image?

There is no issue here, you have the exact same problem if you get a
crash during d) in your example. It is the same with the existing
command, the crash is only an issue if it happens right in the middle of
the switch over. Until then, only the original image remains valid.

Cheers,
Jes
Anthony Liguori - April 28, 2011, 3:10 p.m.
On 04/28/2011 09:57 AM, Jes Sorensen wrote:
> On 04/28/11 16:46, Anthony Liguori wrote:
>> On 04/28/2011 09:38 AM, Jes Sorensen wrote:
>>>
>>> Sorry but this is utterly bogus.
>>>
>>> The snapshot support as is works fine, and the command is in the
>>> monitor. We should expose it in QMP as well.
>>
>> It went in for the monitor because it was considered an imperfect
>> command so we held up the QMP side because we wanted a better interface.
>
> I am not sure who is included in the 'we' here.....
>
>> The current command does:
>> 1) Create new image backing to current image
>> 2) Flush outstanding I/O to old image
>> 3) Close current image
>> 4) Reopen newly created image
>> 5) Go
>
>> Operations (1) and (2) are very synchronous operations.  (4) can be too.
>>   We really should have a bdrv_aio_snapshot() function that implements
>> the logic for at least (2) in an asynchronous fashion.
>>
>> That sort of interface is going to affect how we expose things in QMP.
>> As from a QMP perspective, we're going to do something like:
>>
>> a) start snapshot
>> b) query snapshot progress
>> c) receive notification of snapshot completion
>> d) flip over image
>
> Sorry this is inherently broken. The management tool should not be
> keeping state in this process. I agree an async interface would be nice,
> but the above process is plain wrong.
>
> The async snapshot process needs to be doing the exact same as in the
> current implementation, the main difference is that it would be running
> asynchronously and that QMP would be able to query the state of it.


No, the command does too many things and as such, makes it impossible 
for a management tool to gracefully recover.

>> And of course, this needs to be carefully thought through for race
>> conditions.  In the current command, what happens if you get a crash
>> between (2) and (3)?  There's no way for the management tools to know
>> that we didn't finish flushing writes.  How does the management tool
>> know that (1) didn't fail mid way through resulting in a corrupted image?
>
> There is no issue here, you have the exact same problem if you get a
> crash during d) in your example. It is the same with the existing
> command, the crash is only an issue if it happens right in the middle of
> the switch over. Until then, only the original image remains valid.

But the new image is always valid once it's been created as pending 
writes are lost no matter what in a hard power failure.  That suggests 
the following flow:

1) Create new image with a backing file
2) Completion ACK

At this stage, the management tool should update it's internal state to 
point to the new image.

3) Begin switch over to new image
4) Switch over image.
5) Receive notification when it's complete

If (4) is happening asynchronously, things get kind of complicated.  You 
can either wait for things to quiesce on their own or you can queue 
pending requests.  It's unclear to me what the right strategy is or if 
there's a mixed strategy that's needed.  I think it makes sense to treat 
3/4 as a command with (5) being an event notification.

But combining 1-5 in a single interface creates a command that while 
convenient on the command line, is not usable for a robust management tool.

Regards,

Anthony Liguori


> Cheers,
> Jes
>
Jes Sorensen - April 29, 2011, 1:38 p.m.
On 04/28/11 17:10, Anthony Liguori wrote:
> On 04/28/2011 09:57 AM, Jes Sorensen wrote:
>> On 04/28/11 16:46, Anthony Liguori wrote:
>> Sorry this is inherently broken. The management tool should not be
>> keeping state in this process. I agree an async interface would be nice,
>> but the above process is plain wrong.
>>
>> The async snapshot process needs to be doing the exact same as in the
>> current implementation, the main difference is that it would be running
>> asynchronously and that QMP would be able to query the state of it.
> 
> No, the command does too many things and as such, makes it impossible
> for a management tool to gracefully recover.

It is exactly the same for the management tool:
- Creation of the new image either succeeds or fails
- Switchover either succeeds or fails

If you have a crash in the process, you still can only know by checking
the consistency of the new image after rebooting.

>> There is no issue here, you have the exact same problem if you get a
>> crash during d) in your example. It is the same with the existing
>> command, the crash is only an issue if it happens right in the middle of
>> the switch over. Until then, only the original image remains valid.
> 
> But the new image is always valid once it's been created as pending
> writes are lost no matter what in a hard power failure.  That suggests
> the following flow:
> 
> 1) Create new image with a backing file
> 2) Completion ACK
> 
> At this stage, the management tool should update it's internal state to
> point to the new image.

This can still be done with the existing code - it's not the ideal
setup, but it is possible due to evil things such as selinux labeling.

> 3) Begin switch over to new image
> 4) Switch over image.
> 5) Receive notification when it's complete

Sorry but this isn't an accurate description of the process. Once you
have a new image ready, you need to halt writes, flush buffers, and then
you can do the switch, which has to be atomic. The only thing that can
really be async in all of this is the buffer flushing. There isn't any
long switch-over process.

> If (4) is happening asynchronously, things get kind of complicated.  You
> can either wait for things to quiesce on their own or you can queue
> pending requests.  It's unclear to me what the right strategy is or if
> there's a mixed strategy that's needed.  I think it makes sense to treat
> 3/4 as a command with (5) being an event notification.

The actual guest application freeze (what some strange people use the
quiicannotspell word for) is done prior to the switchover, you cannot do
it in parallel with it.

> But combining 1-5 in a single interface creates a command that while
> convenient on the command line, is not usable for a robust management tool.

As I explained you can already use an externally created image with the
current interface, the only issue that may be worth doing async is the
buffer flushing. However once you do that, guest writes are going to
stall anyway, and eventually the guest applications will all stall.

The single command is both better from a consistency perspective, and it
will do the right job. Things are not going to get any easier from the
management tool's perspective than they are with the current interface.

Cheers,
Jes
Anthony Liguori - April 29, 2011, 1:45 p.m.
On 04/29/2011 08:38 AM, Jes Sorensen wrote:
> On 04/28/11 17:10, Anthony Liguori wrote:
>> No, the command does too many things and as such, makes it impossible
>> for a management tool to gracefully recover.
>
> It is exactly the same for the management tool:
> - Creation of the new image either succeeds or fails
> - Switchover either succeeds or fails


Creating an image can be treated as an atomic operation.  It either 
fully succeeds or you assume it failed and throw the image away since 
you can always try again.

When you combine creating an image with another operation, you create a 
a single operation that cannot be treated as atomic which makes recovery 
from failure much more difficult.

>> But the new image is always valid once it's been created as pending
>> writes are lost no matter what in a hard power failure.  That suggests
>> the following flow:
>>
>> 1) Create new image with a backing file
>> 2) Completion ACK
>>
>> At this stage, the management tool should update it's internal state to
>> point to the new image.
>
> This can still be done with the existing code - it's not the ideal
> setup, but it is possible due to evil things such as selinux labeling


I don't follow.


>> 3) Begin switch over to new image
>> 4) Switch over image.
>> 5) Receive notification when it's complete
>
> Sorry but this isn't an accurate description of the process. Once you
> have a new image ready, you need to halt writes, flush buffers, and then
> you can do the switch, which has to be atomic.


You need to queue writes, you can still let the guest run while the 
remaining writes are sent to disk.

But if this is a background task, then as a management tool, don't I 
want a notification when it's been completed?

Don't I want to have a little spinning box in my GUI or something like 
that while this is happening?

>> If (4) is happening asynchronously, things get kind of complicated.  You
>> can either wait for things to quiesce on their own or you can queue
>> pending requests.  It's unclear to me what the right strategy is or if
>> there's a mixed strategy that's needed.  I think it makes sense to treat
>> 3/4 as a command with (5) being an event notification.
>
> The actual guest application freeze (what some strange people use the
> quiicannotspell word for) is done prior to the switchover, you cannot do
> it in parallel with it.


I'm not even talking about application quiescing here.  And yeah, I have 
to frequently google that word because Thunderbird doesn't have it in 
it's dictionary :-)

>> But combining 1-5 in a single interface creates a command that while
>> convenient on the command line, is not usable for a robust management tool.
>
> As I explained you can already use an externally created image with the
> current interface, the only issue that may be worth doing async is the
> buffer flushing. However once you do that, guest writes are going to
> stall anyway, and eventually the guest applications will all stall.

The interface shouldn't have the option to create an image... because if 
you have that, the interface is difficult to use.

Why present an option to do something that we know is broken?  We can't 
blame management tools for not doing a good job managing KVM if we're 
giving them poor interfaces to work with.

Regards,

Anthony Liguori

> The single command is both better from a consistency perspective, and it
> will do the right job. Things are not going to get any easier from the
> management tool's perspective than they are with the current interface.
>
> Cheers,
> Jes
>
>
Jes Sorensen - May 3, 2011, 11:44 a.m.
On 04/29/11 15:45, Anthony Liguori wrote:
> On 04/29/2011 08:38 AM, Jes Sorensen wrote:
>> It is exactly the same for the management tool:
>> - Creation of the new image either succeeds or fails
>> - Switchover either succeeds or fails
> 
> Creating an image can be treated as an atomic operation.  It either
> fully succeeds or you assume it failed and throw the image away since
> you can always try again.
> 
> When you combine creating an image with another operation, you create a
> a single operation that cannot be treated as atomic which makes recovery
> from failure much more difficult.

As explained in previous emails, you still have the option to do this
with the current implementation, if you so desire. It's a bad idea, but
for those who insist on doing the wrong thing, sure go ahead.

Just create the image first, and then pass it in as the snapshot file
rather than specifying a snapshot file that doesn't exist already.

>>> 3) Begin switch over to new image
>>> 4) Switch over image.
>>> 5) Receive notification when it's complete
>>
>> Sorry but this isn't an accurate description of the process. Once you
>> have a new image ready, you need to halt writes, flush buffers, and then
>> you can do the switch, which has to be atomic.
> 
> You need to queue writes, you can still let the guest run while the
> remaining writes are sent to disk.
> 
> But if this is a background task, then as a management tool, don't I
> want a notification when it's been completed?
> 
> Don't I want to have a little spinning box in my GUI or something like
> that while this is happening?

I agree that having an interface that can run it in the background
asynchronously isn't a bad thing, and it would be a nice feature.
However it really doesn't qualify as anything but an enhancement in my
book. You already have the option to pre-create the snapshot image if
you so desire.

>>> But combining 1-5 in a single interface creates a command that while
>>> convenient on the command line, is not usable for a robust management
>>> tool.
>>
>> As I explained you can already use an externally created image with the
>> current interface, the only issue that may be worth doing async is the
>> buffer flushing. However once you do that, guest writes are going to
>> stall anyway, and eventually the guest applications will all stall.
> 
> The interface shouldn't have the option to create an image... because if
> you have that, the interface is difficult to use.
> 
> Why present an option to do something that we know is broken?  We can't
> blame management tools for not doing a good job managing KVM if we're
> giving them poor interfaces to work with.

Sorry this is utterly bogus. This is *not* broken, it is the right way
to do things. It is clearly a matter of taste whether you prefer it one
way or another, but it is not broken.

There is nothing wrong with it being an option, especially since you
don't have to do anything magic for creating an image in advance. I
realize it doesn't match your image of how you would like the command to
work, but that is not the same as it being broken.

I really see nothing in your above arguments that backs up the argument
that the current implementation is broken in any way. I am totally in
favor or having an async implementation as well, but I really don't see
it being as big an issue as you are making it.

Regards,
Jes

Patch

diff --git a/qmp-commands.hx b/qmp-commands.hx
index fbd98ee..b8f537c 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -667,6 +667,32 @@  Example:
 EQMP
 
     {
+        .name       = "blockdev-snapshot-sync",
+        .args_type  = "device:B,snapshot_file:s?,format:s?",
+        .params     = "device [new-image-file] [format]",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_snapshot_blkdev,
+    },
+
+SQMP
+Synchronous snapshot of block device, using snapshot file as target
+if provided. 
+
+If a new image file is specified, the new image file will become the
+new root image. If format is specified, the snapshot file will be
+created in that format. Otherwise the snapshot will be internal!
+(currently unsupported).
+
+Errors:
+If creating the new snapshot image fails, QEMU will continue running
+on the original image. If switching to the newly created image fails,
+it will be attempted to fall back to the original image and return
+QERR_OPEN_FILE_FAILED with the snapshot filename. If re-opening
+the original image fails, QERR_OPEN_FILE_FAILED will be returned with
+the original image filename.
+EQMP
+
+    {
         .name       = "balloon",
         .args_type  = "value:M",
         .params     = "target",