[1/1] blk: do not select PFLASH device for internal snapshot

Message ID 56952B5E.9080002@openvz.org
State New

Commit Message

Denis V. Lunev Jan. 12, 2016, 4:35 p.m. UTC
On 01/12/2016 06:47 PM, Denis V. Lunev wrote:
> On 01/12/2016 06:20 PM, Kevin Wolf wrote:
>> Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben:
>>>
>>> On 12/01/2016 15:16, Kevin Wolf wrote:
>>>>> Thus we should avoid selection of "pflash" drives for VM state 
>>>>> saving.
>>>>>
>>>>> For now "pflash" is a read-write raw image, as it is configured by
>>>>> libvirt. Thus there are no such images in the field and we could
>>>>> safely disable the ability to save state to those images inside QEMU.
>>>> This is obviously broken. If you write to the pflash, then it needs to
>>>> be snapshotted in order to keep a consistent state.
>>>>
>>>> If you want to avoid snapshotting the image, make it read-only and it
>>>> will be skipped even today.
>>> Sort of.  The point of having flash is to _not_ make it read-only, so
>>> that is not a solution.
>>>
>>> Flash is already being snapshotted as part of saving RAM state.  In
>>> fact, for this reason the device (at least the one used with OVMF; I
>>> haven't checked other pflash devices) can simply save it back to disk
>>> on the migration destination, without the need to use "migrate -b" or
>>> shared storage.
>>> [...]
>>> I don't like very much using IF_PFLASH this way, which is why I hadn't
>>> replied to the patch so far---I hadn't made up my mind about *what* to
>>> suggest instead, or whether to just accept it.  However, it does work.
>>>
>>> Perhaps a separate "I know what I am doing" skip-snapshot option?  Or
>>> a device callback saying "not snapshotting this is fine"?
>> Boy, is this ugly...
>>
>> What do you do with disk-only snapshots? The recovery only works as long
>> as you have VM state.
>>
>> Kevin
> actually I am in a bit of trouble :(
>
> I understand that this is ugly, but I would like to make
> 'virsh snapshot' work for OVMF VMs. This is necessary for us to make
> a release.
>
> Currently libvirt guys generate XML in the following way:
>
>   <os>
>     <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
>     <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader>
>     <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram>
>   </os>
>
> This results in:
>
> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1
>
> This obviously cannot pass the check in bdrv_all_can_snapshot()
> as 'pflash' is RW and raw, i.e. cannot be snapshotted.
>
> They have discussed the switch to the following command line:
>
> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1
>
> and say that in this case the VM state could fall into the PFLASH
> drive, which should not be big, as the location of the
> file is different. This means that I am doomed here.
>
> Either we should force libvirt people to forget about their
> opinion that pflash should be small which I am unable to
> do or I should invent a way to ban VM state saving into
> pflash.
>
> OK. There are 2 options.
>
> 1) Ban pflash as it was done.
> 2) Add 'no-vmstate' flag to -drive (invented just now).
>
something like this:

Comments

Kevin Wolf Jan. 12, 2016, 4:52 p.m. UTC | #1
Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben:
> On 01/12/2016 06:47 PM, Denis V. Lunev wrote:
> >On 01/12/2016 06:20 PM, Kevin Wolf wrote:
> >>Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben:
> >>>
> >>>On 12/01/2016 15:16, Kevin Wolf wrote:
> >>>>>Thus we should avoid selection of "pflash" drives for VM
> >>>>>state saving.
> >>>>>
> >>>>>For now "pflash" is a read-write raw image, as it is configured
> >>>>>by libvirt. Thus there are no such images in the field and we
> >>>>>could safely disable the ability to save state to those images
> >>>>>inside QEMU.
> >>>>This is obviously broken. If you write to the pflash, then it needs to
> >>>>be snapshotted in order to keep a consistent state.
> >>>>
> >>>>If you want to avoid snapshotting the image, make it read-only and it
> >>>>will be skipped even today.
> >>>Sort of.  The point of having flash is to _not_ make it read-only, so
> >>>that is not a solution.
> >>>
> >>>Flash is already being snapshotted as part of saving RAM state.  In
> >>>fact, for this reason the device (at least the one used with OVMF; I
> >>>haven't checked other pflash devices) can simply save it back to disk
> >>>on the migration destination, without the need to use "migrate -b" or
> >>>shared storage.
> >>>[...]
> >>>I don't like very much using IF_PFLASH this way, which is why I hadn't
> >>>replied to the patch so far---I hadn't made up my mind about *what* to
> >>>suggest instead, or whether to just accept it.  However, it does work.
> >>>
> >>>Perhaps a separate "I know what I am doing" skip-snapshot option?  Or
> >>>a device callback saying "not snapshotting this is fine"?
> >>Boy, is this ugly...
> >>
> >>What do you do with disk-only snapshots? The recovery only works as long
> >>as you have VM state.
> >>
> >>Kevin
> >actually I am in a bit of trouble :(
> >
> >I understand that this is ugly, but I would like to make
> >'virsh snapshot' work for OVMF VMs. This is necessary for us to make
> >a release.
> >
> >Currently libvirt guys generate XML in the following way:
> >
> >  <os>
> >    <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
> >    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader>
> >    <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram>
> >  </os>
> >
> >This results in:
> >
> >qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
> >     -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1
> >
> >This obviously cannot pass the check in bdrv_all_can_snapshot()
> >as 'pflash' is RW and raw, i.e. cannot be snapshotted.
> >
> >They have discussed the switch to the following command line:
> >
> >qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
> >     -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1
> >
> >and say that in this case the VM state could fall into the PFLASH
> >drive, which should not be big, as the location of the
> >file is different. This means that I am doomed here.
> >
> >Either we should force libvirt people to forget about their
> >opinion that pflash should be small which I am unable to
> >do or I should invent a way to ban VM state saving into
> >pflash.
> >
> >OK. There are 2 options.
> >
> >1) Ban pflash as it was done.
> >2) Add 'no-vmstate' flag to -drive (invented just now).
> >
> something like this:
> 
> diff --git a/block.c b/block.c
> index 3e1877d..8900589 100644
> --- a/block.c
> +++ b/block.c
> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = {
>              .help = "Block driver to use for the node",
>          },
>          {
> +            .name = "novmstate",
> +            .type = QEMU_OPT_BOOL,
> +            .help = "Ignore for selecting to save VM state",
> +        },
> +        {
>              .name = BDRV_OPT_CACHE_WB,
>              .type = QEMU_OPT_BOOL,
>              .help = "Enable writeback mode",
> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState
> *bs, BdrvChild *file,
>      bs->request_alignment = 512;
>      bs->zero_beyond_eof = true;
>      bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
> +    bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false);
> 
>      if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
>          error_setg(errp,
> diff --git a/block/snapshot.c b/block/snapshot.c
> index 2d86b88..33cdd86 100644
> --- a/block/snapshot.c
> +++ b/block/snapshot.c
> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void)
>      while (not_found && (bs = bdrv_next(bs))) {
>          AioContext *ctx = bdrv_get_aio_context(bs);
> 
> +        if (bs->disable_vmstate_save) {
> +            continue;
> +        }
> +
>          aio_context_acquire(ctx);
>          not_found = !bdrv_can_snapshot(bs);
>          aio_context_release(ctx);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 256609d..855a209 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -438,6 +438,9 @@ struct BlockDriverState {
>      /* do we need to tell the quest if we have a volatile write cache? */
>      int enable_write_cache;
> 
> +    /* skip this BDS searching for one to save VM state */
> +    bool disable_vmstate_save;
> +
>      /* the following member gives a name to every node on the bs graph. */
>      char node_name[32];
>      /* element of the list of named nodes building the graph */

That sounds like an option. (No pun intended.)

We can discuss the option name (perhaps "vmstate" defaulting to "on" is
better?) and variable names (I'd prefer them to match the option name);
also you'll need to extend the QAPI schema for blockdev-add. But all of
these are minor points and the idea seems sane.

Kevin
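
For concreteness, the blockdev-add extension Kevin alludes to could look
roughly like the sketch below against qapi/block-core.json. This is only
an illustration, not a merged schema: the member name 'vmstate'
(defaulting to true) follows Kevin's naming suggestion above, and the
other existing members of BlockdevOptionsBase are omitted.

{ 'struct': 'BlockdevOptionsBase',
  'data': { 'driver': 'BlockdevDriver',
            '*node-name': 'str',
            '*vmstate': 'bool' } }   # hypothetical; other members omitted
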
Denis V. Lunev Jan. 12, 2016, 4:58 p.m. UTC | #2
On 01/12/2016 07:52 PM, Kevin Wolf wrote:
> Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben:
>> On 01/12/2016 06:47 PM, Denis V. Lunev wrote:
>>> On 01/12/2016 06:20 PM, Kevin Wolf wrote:
>>>> Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben:
>>>>> On 12/01/2016 15:16, Kevin Wolf wrote:
>>>>>>> Thus we should avoid selection of "pflash" drives for VM
>>>>>>> state saving.
>>>>>>>
>>>>>>> For now "pflash" is a read-write raw image, as it is configured
>>>>>>> by libvirt. Thus there are no such images in the field and we
>>>>>>> could safely disable the ability to save state to those images
>>>>>>> inside QEMU.
>>>>>> This is obviously broken. If you write to the pflash, then it needs to
>>>>>> be snapshotted in order to keep a consistent state.
>>>>>>
>>>>>> If you want to avoid snapshotting the image, make it read-only and it
>>>>>> will be skipped even today.
>>>>> Sort of.  The point of having flash is to _not_ make it read-only, so
>>>>> that is not a solution.
>>>>>
>>>>> Flash is already being snapshotted as part of saving RAM state.  In
>>>>> fact, for this reason the device (at least the one used with OVMF; I
>>>>> haven't checked other pflash devices) can simply save it back to disk
>>>>> on the migration destination, without the need to use "migrate -b" or
>>>>> shared storage.
>>>>> [...]
>>>>> I don't like very much using IF_PFLASH this way, which is why I hadn't
>>>>> replied to the patch so far---I hadn't made up my mind about *what* to
>>>>> suggest instead, or whether to just accept it.  However, it does work.
>>>>>
>>>>> Perhaps a separate "I know what I am doing" skip-snapshot option?  Or
>>>>> a device callback saying "not snapshotting this is fine"?
>>>> Boy, is this ugly...
>>>>
>>>> What do you do with disk-only snapshots? The recovery only works as long
>>>> as you have VM state.
>>>>
>>>> Kevin
>>> actually I am in a bit of trouble :(
>>>
>>> I understand that this is ugly, but I would like to make
>>> 'virsh snapshot' work for OVMF VMs. This is necessary for us to make
>>> a release.
>>>
>>> Currently libvirt guys generate XML in the following way:
>>>
>>>   <os>
>>>     <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
>>>     <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader>
>>>     <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram>
>>>   </os>
>>>
>>> This results in:
>>>
>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1
>>>
>>> This obviously cannot pass the check in bdrv_all_can_snapshot()
>>> as 'pflash' is RW and raw, i.e. cannot be snapshotted.
>>>
>>> They have discussed the switch to the following command line:
>>>
>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1
>>>
>>> and say that in this case the VM state could fall into the PFLASH
>>> drive, which should not be big, as the location of the
>>> file is different. This means that I am doomed here.
>>>
>>> Either we should force libvirt people to forget about their
>>> opinion that pflash should be small which I am unable to
>>> do or I should invent a way to ban VM state saving into
>>> pflash.
>>>
>>> OK. There are 2 options.
>>>
>>> 1) Ban pflash as it was done.
>>> 2) Add 'no-vmstate' flag to -drive (invented just now).
>>>
>> something like this:
>>
>> diff --git a/block.c b/block.c
>> index 3e1877d..8900589 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = {
>>               .help = "Block driver to use for the node",
>>           },
>>           {
>> +            .name = "novmstate",
>> +            .type = QEMU_OPT_BOOL,
>> +            .help = "Ignore for selecting to save VM state",
>> +        },
>> +        {
>>               .name = BDRV_OPT_CACHE_WB,
>>               .type = QEMU_OPT_BOOL,
>>               .help = "Enable writeback mode",
>> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState
>> *bs, BdrvChild *file,
>>       bs->request_alignment = 512;
>>       bs->zero_beyond_eof = true;
>>       bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
>> +    bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false);
>>
>>       if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
>>           error_setg(errp,
>> diff --git a/block/snapshot.c b/block/snapshot.c
>> index 2d86b88..33cdd86 100644
>> --- a/block/snapshot.c
>> +++ b/block/snapshot.c
>> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void)
>>       while (not_found && (bs = bdrv_next(bs))) {
>>           AioContext *ctx = bdrv_get_aio_context(bs);
>>
>> +        if (bs->disable_vmstate_save) {
>> +            continue;
>> +        }
>> +
>>           aio_context_acquire(ctx);
>>           not_found = !bdrv_can_snapshot(bs);
>>           aio_context_release(ctx);
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 256609d..855a209 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -438,6 +438,9 @@ struct BlockDriverState {
>>       /* do we need to tell the quest if we have a volatile write cache? */
>>       int enable_write_cache;
>>
>> +    /* skip this BDS searching for one to save VM state */
>> +    bool disable_vmstate_save;
>> +
>>       /* the following member gives a name to every node on the bs graph. */
>>       char node_name[32];
>>       /* element of the list of named nodes building the graph */
> That sounds like an option. (No pun intended.)
>
> We can discuss the option name (perhaps "vmstate" defaulting to "on" is
> better?) and variable names (I'd prefer them to match the option name);
> also you'll need to extend the QAPI schema for blockdev-add. But all of
> these are minor points and the idea seems sane.
>
> Kevin
Perfect!

Thanks all for the discussion :)

Den
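
For illustration, with the draft 'novmstate' option above (and assuming
it gets plumbed through to -drive at all), the libvirt command line from
the discussion would only need to grow the new flag on the NVRAM drive;
the option name is the draft's spelling and may still change per Kevin's
comment:

qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
     -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1,novmstate=on
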
Markus Armbruster Jan. 12, 2016, 5:40 p.m. UTC | #3
Kevin Wolf <kwolf@redhat.com> writes:

> Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben:
>> On 01/12/2016 06:47 PM, Denis V. Lunev wrote:
>> >On 01/12/2016 06:20 PM, Kevin Wolf wrote:
>> >>Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben:
>> >>>
>> >>>On 12/01/2016 15:16, Kevin Wolf wrote:
>> >>>>>Thus we should avoid selection of "pflash" drives for VM
>> >>>>>state saving.
>> >>>>>
>> >>>>>For now "pflash" is a read-write raw image, as it is configured
>> >>>>>by libvirt. Thus there are no such images in the field and we
>> >>>>>could safely disable the ability to save state to those images
>> >>>>>inside QEMU.
>> >>>>This is obviously broken. If you write to the pflash, then it needs to
>> >>>>be snapshotted in order to keep a consistent state.
>> >>>>
>> >>>>If you want to avoid snapshotting the image, make it read-only and it
>> >>>>will be skipped even today.
>> >>>Sort of.  The point of having flash is to _not_ make it read-only, so
>> >>>that is not a solution.
>> >>>
>> >>>Flash is already being snapshotted as part of saving RAM state.  In
>> >>>fact, for this reason the device (at least the one used with OVMF; I
>> >>>haven't checked other pflash devices) can simply save it back to disk
>> >>>on the migration destination, without the need to use "migrate -b" or
>> >>>shared storage.
>> >>>[...]
>> >>>I don't like very much using IF_PFLASH this way, which is why I hadn't
>> >>>replied to the patch so far---I hadn't made up my mind about *what* to
>> >>>suggest instead, or whether to just accept it.  However, it does work.
>> >>>
>> >>>Perhaps a separate "I know what I am doing" skip-snapshot option?  Or
>> >>>a device callback saying "not snapshotting this is fine"?
>> >>Boy, is this ugly...
>> >>
>> >>What do you do with disk-only snapshots? The recovery only works as long
>> >>as you have VM state.
>> >>
>> >>Kevin
>> >actually I am in a bit of trouble :(
>> >
>> >I understand that this is ugly, but I would like to make
>> >'virsh snapshot' work for OVMF VMs. This is necessary for us to make
>> >a release.
>> >
>> >Currently libvirt guys generate XML in the following way:
>> >
>> >  <os>
>> >    <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
>> >    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader>
>> >    <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram>
>> >  </os>
>> >
>> >This results in:
>> >
>> >qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>> >     -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1
>> >
>> >This obviously cannot pass the check in bdrv_all_can_snapshot()
>> >as 'pflash' is RW and raw, i.e. cannot be snapshotted.
>> >
>> >They have discussed the switch to the following command line:
>> >
>> >qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>> >     -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1
>> >
>> >and say that in this case the VM state could fall into the PFLASH
>> >drive, which should not be big, as the location of the
>> >file is different. This means that I am doomed here.
>> >
>> >Either we should force libvirt people to forget about their
>> >opinion that pflash should be small which I am unable to
>> >do or I should invent a way to ban VM state saving into
>> >pflash.
>> >
>> >OK. There are 2 options.
>> >
>> >1) Ban pflash as it was done.
>> >2) Add 'no-vmstate' flag to -drive (invented just now).
>> >
>> something like this:
>> 
>> diff --git a/block.c b/block.c
>> index 3e1877d..8900589 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = {
>>              .help = "Block driver to use for the node",
>>          },
>>          {
>> +            .name = "novmstate",
>> +            .type = QEMU_OPT_BOOL,
>> +            .help = "Ignore for selecting to save VM state",
>> +        },
>> +        {
>>              .name = BDRV_OPT_CACHE_WB,
>>              .type = QEMU_OPT_BOOL,
>>              .help = "Enable writeback mode",
>> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState
>> *bs, BdrvChild *file,
>>      bs->request_alignment = 512;
>>      bs->zero_beyond_eof = true;
>>      bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
>> +    bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false);
>> 
>>      if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
>>          error_setg(errp,
>> diff --git a/block/snapshot.c b/block/snapshot.c
>> index 2d86b88..33cdd86 100644
>> --- a/block/snapshot.c
>> +++ b/block/snapshot.c
>> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void)
>>      while (not_found && (bs = bdrv_next(bs))) {
>>          AioContext *ctx = bdrv_get_aio_context(bs);
>> 
>> +        if (bs->disable_vmstate_save) {
>> +            continue;
>> +        }
>> +
>>          aio_context_acquire(ctx);
>>          not_found = !bdrv_can_snapshot(bs);
>>          aio_context_release(ctx);
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 256609d..855a209 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -438,6 +438,9 @@ struct BlockDriverState {
>>      /* do we need to tell the quest if we have a volatile write cache? */
>>      int enable_write_cache;
>> 
>> +    /* skip this BDS searching for one to save VM state */
>> +    bool disable_vmstate_save;
>> +
>>      /* the following member gives a name to every node on the bs graph. */
>>      char node_name[32];
>>      /* element of the list of named nodes building the graph */
>
> That sounds like an option. (No pun intended.)
>
> We can discuss the option name (perhaps "vmstate" defaulting to "on" is
> better?) and variable names (I'd prefer them to match the option name);
> also you'll need to extend the QAPI schema for blockdev-add. But all of
> these are minor points and the idea seems sane.

I've always thought that QEMU picking the image to take the VM state is
backwards.  Adding means to guide that pick like "don't pick this one,
please" may help ease the pain, but it's still backwards.

The *user* should pick it.
Kevin Wolf Jan. 12, 2016, 5:50 p.m. UTC | #4
Am 12.01.2016 um 18:40 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben:
> >> On 01/12/2016 06:47 PM, Denis V. Lunev wrote:
> >> >On 01/12/2016 06:20 PM, Kevin Wolf wrote:
> >> >>Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben:
> >> >>>
> >> >>>On 12/01/2016 15:16, Kevin Wolf wrote:
> >> >>>>>Thus we should avoid selection of "pflash" drives for VM
> >> >>>>>state saving.
> >> >>>>>
> >> >>>>>For now "pflash" is a read-write raw image, as it is
> >> >>>>>configured by libvirt. Thus there are no such images in the
> >> >>>>>field and we could safely disable the ability to save state
> >> >>>>>to those images inside QEMU.
> >> >>>>This is obviously broken. If you write to the pflash, then it needs to
> >> >>>>be snapshotted in order to keep a consistent state.
> >> >>>>
> >> >>>>If you want to avoid snapshotting the image, make it read-only and it
> >> >>>>will be skipped even today.
> >> >>>Sort of.  The point of having flash is to _not_ make it read-only, so
> >> >>>that is not a solution.
> >> >>>
> >> >>>Flash is already being snapshotted as part of saving RAM state.  In
> >> >>>fact, for this reason the device (at least the one used with OVMF; I
> >> >>>haven't checked other pflash devices) can simply save it back to disk
> >> >>>on the migration destination, without the need to use "migrate -b" or
> >> >>>shared storage.
> >> >>>[...]
> >> >>>I don't like very much using IF_PFLASH this way, which is why I hadn't
> >> >>>replied to the patch so far---I hadn't made up my mind about *what* to
> >> >>>suggest instead, or whether to just accept it.  However, it does work.
> >> >>>
> >> >>>Perhaps a separate "I know what I am doing" skip-snapshot option?  Or
> >> >>>a device callback saying "not snapshotting this is fine"?
> >> >>Boy, is this ugly...
> >> >>
> >> >>What do you do with disk-only snapshots? The recovery only works as long
> >> >>as you have VM state.
> >> >>
> >> >>Kevin
> >> >actually I am in a bit of trouble :(
> >> >
> >> >I understand that this is ugly, but I would like to make
> >> >'virsh snapshot' work for OVMF VMs. This is necessary for us to
> >> >make a release.
> >> >
> >> >Currently libvirt guys generate XML in the following way:
> >> >
> >> >  <os>
> >> >    <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
> >> >    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader>
> >> >    <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram>
> >> >  </os>
> >> >
> >> >This results in:
> >> >
> >> >qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
> >> >     -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1
> >> >
> >> >This obviously cannot pass the check in bdrv_all_can_snapshot()
> >> >as 'pflash' is RW and raw, i.e. cannot be snapshotted.
> >> >
> >> >They have discussed the switch to the following command line:
> >> >
> >> >qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
> >> >     -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1
> >> >
> >> >and say that in this case the VM state could fall into the PFLASH
> >> >drive, which should not be big, as the location of the
> >> >file is different. This means that I am doomed here.
> >> >
> >> >Either we should force libvirt people to forget about their
> >> >opinion that pflash should be small which I am unable to
> >> >do or I should invent a way to ban VM state saving into
> >> >pflash.
> >> >
> >> >OK. There are 2 options.
> >> >
> >> >1) Ban pflash as it was done.
> >> >2) Add 'no-vmstate' flag to -drive (invented just now).
> >> >
> >> something like this:
> >> 
> >> diff --git a/block.c b/block.c
> >> index 3e1877d..8900589 100644
> >> --- a/block.c
> >> +++ b/block.c
> >> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = {
> >>              .help = "Block driver to use for the node",
> >>          },
> >>          {
> >> +            .name = "novmstate",
> >> +            .type = QEMU_OPT_BOOL,
> >> +            .help = "Ignore for selecting to save VM state",
> >> +        },
> >> +        {
> >>              .name = BDRV_OPT_CACHE_WB,
> >>              .type = QEMU_OPT_BOOL,
> >>              .help = "Enable writeback mode",
> >> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState
> >> *bs, BdrvChild *file,
> >>      bs->request_alignment = 512;
> >>      bs->zero_beyond_eof = true;
> >>      bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
> >> +    bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false);
> >> 
> >>      if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
> >>          error_setg(errp,
> >> diff --git a/block/snapshot.c b/block/snapshot.c
> >> index 2d86b88..33cdd86 100644
> >> --- a/block/snapshot.c
> >> +++ b/block/snapshot.c
> >> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void)
> >>      while (not_found && (bs = bdrv_next(bs))) {
> >>          AioContext *ctx = bdrv_get_aio_context(bs);
> >> 
> >> +        if (bs->disable_vmstate_save) {
> >> +            continue;
> >> +        }
> >> +
> >>          aio_context_acquire(ctx);
> >>          not_found = !bdrv_can_snapshot(bs);
> >>          aio_context_release(ctx);
> >> diff --git a/include/block/block_int.h b/include/block/block_int.h
> >> index 256609d..855a209 100644
> >> --- a/include/block/block_int.h
> >> +++ b/include/block/block_int.h
> >> @@ -438,6 +438,9 @@ struct BlockDriverState {
> >>      /* do we need to tell the quest if we have a volatile write cache? */
> >>      int enable_write_cache;
> >> 
> >> +    /* skip this BDS searching for one to save VM state */
> >> +    bool disable_vmstate_save;
> >> +
> >>      /* the following member gives a name to every node on the bs graph. */
> >>      char node_name[32];
> >>      /* element of the list of named nodes building the graph */
> >
> > That sounds like an option. (No pun intended.)
> >
> > We can discuss the option name (perhaps "vmstate" defaulting to "on" is
> > better?) and variable names (I'd prefer them to match the option name);
> > also you'll need to extend the QAPI schema for blockdev-add. But all of
> > these are minor points and the idea seems sane.
> 
> I've always thought that QEMU picking the image to take the VM state is
> backwards.  Adding means to guide that pick like "don't pick this one,
> please" may help ease the pain, but it's still backwards.
> 
> The *user* should pick it.

Designing the API now when it has been in use for ten years is
backwards, too. We have to take it as is and make the best of it.

We could add an optional argument to savevm that tells which image to
save the VM state to. But if it's missing, we still need to make a pick.
Of course, libvirt should then always use that option and then we don't
need a separate vmstate=[on|off] option.

If we go that way, we need to improve loadvm to get VM state from any of
the images of a VM, because the user could have saved the state to any.
(Making that improvement is probably a good idea anyway.)

Kevin
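
A rough sketch (untested, not part of any posted patch) of the loadvm
improvement Kevin describes, modelled on the bdrv_all_find_vmstate_bs()
loop quoted above; it assumes QEMUSnapshotInfo.vm_state_size can serve
as the "this image holds the VM state" indicator:

BlockDriverState *bdrv_all_find_loadvm_bs(const char *name)
{
    BlockDriverState *bs = NULL;

    while ((bs = bdrv_next(bs))) {
        AioContext *ctx = bdrv_get_aio_context(bs);
        QEMUSnapshotInfo sn;
        bool found;

        aio_context_acquire(ctx);
        /* the snapshot must exist on this image and carry a VM state */
        found = bdrv_can_snapshot(bs) &&
                bdrv_snapshot_find(bs, &sn, name) >= 0 &&
                sn.vm_state_size > 0;
        aio_context_release(ctx);

        if (found) {
            return bs;
        }
    }
    return NULL;
}
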
Denis V. Lunev Jan. 12, 2016, 5:53 p.m. UTC | #5
On 01/12/2016 08:40 PM, Markus Armbruster wrote:
> Kevin Wolf <kwolf@redhat.com> writes:
>
>> Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben:
>>> On 01/12/2016 06:47 PM, Denis V. Lunev wrote:
>>>> On 01/12/2016 06:20 PM, Kevin Wolf wrote:
>>>>> Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben:
>>>>>> On 12/01/2016 15:16, Kevin Wolf wrote:
>>>>>>>> Thus we should avoid selection of "pflash" drives for VM
>>>>>>>> state saving.
>>>>>>>>
>>>>>>>> For now "pflash" is a read-write raw image, as it is configured
>>>>>>>> by libvirt. Thus there are no such images in the field and we
>>>>>>>> could safely disable the ability to save state to those images
>>>>>>>> inside QEMU.
>>>>>>> This is obviously broken. If you write to the pflash, then it needs to
>>>>>>> be snapshotted in order to keep a consistent state.
>>>>>>>
>>>>>>> If you want to avoid snapshotting the image, make it read-only and it
>>>>>>> will be skipped even today.
>>>>>> Sort of.  The point of having flash is to _not_ make it read-only, so
>>>>>> that is not a solution.
>>>>>>
>>>>>> Flash is already being snapshotted as part of saving RAM state.  In
>>>>>> fact, for this reason the device (at least the one used with OVMF; I
>>>>>> haven't checked other pflash devices) can simply save it back to disk
>>>>>> on the migration destination, without the need to use "migrate -b" or
>>>>>> shared storage.
>>>>>> [...]
>>>>>> I don't like very much using IF_PFLASH this way, which is why I hadn't
>>>>>> replied to the patch so far---I hadn't made up my mind about *what* to
>>>>>> suggest instead, or whether to just accept it.  However, it does work.
>>>>>>
>>>>>> Perhaps a separate "I know what I am doing" skip-snapshot option?  Or
>>>>>> a device callback saying "not snapshotting this is fine"?
>>>>> Boy, is this ugly...
>>>>>
>>>>> What do you do with disk-only snapshots? The recovery only works as long
>>>>> as you have VM state.
>>>>>
>>>>> Kevin
>>>> actually I am in a bit of trouble :(
>>>>
>>>> I understand that this is ugly, but I would like to make
>>>> 'virsh snapshot' work for OVMF VMs. This is necessary for us to make
>>>> a release.
>>>>
>>>> Currently libvirt guys generate XML in the following way:
>>>>
>>>>   <os>
>>>>     <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
>>>>     <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader>
>>>>     <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram>
>>>>   </os>
>>>>
>>>> This results in:
>>>>
>>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1
>>>>
>>>> This obviously cannot pass the check in bdrv_all_can_snapshot()
>>>> as 'pflash' is RW and raw, i.e. cannot be snapshotted.
>>>>
>>>> They have discussed the switch to the following command line:
>>>>
>>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1
>>>>
>>>> and say that in this case the VM state could fall into the PFLASH
>>>> drive, which should not be big, as the location of the
>>>> file is different. This means that I am doomed here.
>>>>
>>>> Either we should force libvirt people to forget about their
>>>> opinion that pflash should be small which I am unable to
>>>> do or I should invent a way to ban VM state saving into
>>>> pflash.
>>>>
>>>> OK. There are 2 options.
>>>>
>>>> 1) Ban pflash as it was done.
>>>> 2) Add 'no-vmstate' flag to -drive (invented just now).
>>>>
>>> something like this:
>>>
>>> diff --git a/block.c b/block.c
>>> index 3e1877d..8900589 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = {
>>>               .help = "Block driver to use for the node",
>>>           },
>>>           {
>>> +            .name = "novmstate",
>>> +            .type = QEMU_OPT_BOOL,
>>> +            .help = "Ignore for selecting to save VM state",
>>> +        },
>>> +        {
>>>               .name = BDRV_OPT_CACHE_WB,
>>>               .type = QEMU_OPT_BOOL,
>>>               .help = "Enable writeback mode",
>>> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState
>>> *bs, BdrvChild *file,
>>>       bs->request_alignment = 512;
>>>       bs->zero_beyond_eof = true;
>>>       bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
>>> +    bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false);
>>>
>>>       if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
>>>           error_setg(errp,
>>> diff --git a/block/snapshot.c b/block/snapshot.c
>>> index 2d86b88..33cdd86 100644
>>> --- a/block/snapshot.c
>>> +++ b/block/snapshot.c
>>> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void)
>>>       while (not_found && (bs = bdrv_next(bs))) {
>>>           AioContext *ctx = bdrv_get_aio_context(bs);
>>>
>>> +        if (bs->disable_vmstate_save) {
>>> +            continue;
>>> +        }
>>> +
>>>           aio_context_acquire(ctx);
>>>           not_found = !bdrv_can_snapshot(bs);
>>>           aio_context_release(ctx);
>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>> index 256609d..855a209 100644
>>> --- a/include/block/block_int.h
>>> +++ b/include/block/block_int.h
>>> @@ -438,6 +438,9 @@ struct BlockDriverState {
>>>       /* do we need to tell the quest if we have a volatile write cache? */
>>>       int enable_write_cache;
>>>
>>> +    /* skip this BDS searching for one to save VM state */
>>> +    bool disable_vmstate_save;
>>> +
>>>       /* the following member gives a name to every node on the bs graph. */
>>>       char node_name[32];
>>>       /* element of the list of named nodes building the graph */
>> That sounds like an option. (No pun intended.)
>>
>> We can discuss the option name (perhaps "vmstate" defaulting to "on" is
>> better?) and variable names (I'd prefer them to match the option name);
>> also you'll need to extend the QAPI schema for blockdev-add. But all of
>> these are minor points and the idea seems sane.
> I've always thought that QEMU picking the image to take the VM state is
> backwards.  Adding means to guide that pick like "don't pick this one,
> please" may help ease the pain, but it's still backwards.
>
> The *user* should pick it.
The user here can pass a hint covering all devices by setting
'vmstate=on' on a single drive.

Though there is an option to select the drive explicitly, like this:

##
# @savevm
#
# Save a VM snapshot. An old snapshot with the same name will be deleted
# if it exists.
#
# @name: identifier of a snapshot to be created
#
# Returns: Nothing on success
#
# Since 2.6
##
{ 'command': 'savevm', 'data': {'name': 'str', 'vmstate-drive': 'str'} }
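
A hypothetical QMP invocation of that command (the drive id is invented
for illustration) would then look like:

{ "execute": "savevm",
  "arguments": { "name": "checkpoint1",
                 "vmstate-drive": "drive-virtio-disk0" } }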

But for loadvm it would be better to really scan all drives trying to
locate the vmstate. This is IMHO better, but it is unclear to me how to
do this just now. I'll need to think about it.

Actually we can do both things: code the option in QMP and also
provide protection with "vmstate=off" for certain BDSes.

Den
Denis V. Lunev Jan. 12, 2016, 5:54 p.m. UTC | #6
On 01/12/2016 08:50 PM, Kevin Wolf wrote:
> Am 12.01.2016 um 18:40 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kwolf@redhat.com> writes:
>>
>>> Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben:
>>>> On 01/12/2016 06:47 PM, Denis V. Lunev wrote:
>>>>> On 01/12/2016 06:20 PM, Kevin Wolf wrote:
>>>>>> Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben:
>>>>>>> On 12/01/2016 15:16, Kevin Wolf wrote:
>>>>>>>>> Thus we should avoid selection of "pflash" drives for VM
>>>>>>>>> state saving.
>>>>>>>>>
>>>>>>>>> For now "pflash" is a read-write raw image, as it is
>>>>>>>>> configured by libvirt. Thus there are no such images in the
>>>>>>>>> field and we could safely disable the ability to save state
>>>>>>>>> to those images inside QEMU.
>>>>>>>> This is obviously broken. If you write to the pflash, then it needs to
>>>>>>>> be snapshotted in order to keep a consistent state.
>>>>>>>>
>>>>>>>> If you want to avoid snapshotting the image, make it read-only and it
>>>>>>>> will be skipped even today.
>>>>>>> Sort of.  The point of having flash is to _not_ make it read-only, so
>>>>>>> that is not a solution.
>>>>>>>
>>>>>>> Flash is already being snapshotted as part of saving RAM state.  In
>>>>>>> fact, for this reason the device (at least the one used with OVMF; I
>>>>>>> haven't checked other pflash devices) can simply save it back to disk
>>>>>>> on the migration destination, without the need to use "migrate -b" or
>>>>>>> shared storage.
>>>>>>> [...]
>>>>>>> I don't like very much using IF_PFLASH this way, which is why I hadn't
>>>>>>> replied to the patch so far---I hadn't made up my mind about *what* to
>>>>>>> suggest instead, or whether to just accept it.  However, it does work.
>>>>>>>
>>>>>>> Perhaps a separate "I know what I am doing" skip-snapshot option?  Or
>>>>>>> a device callback saying "not snapshotting this is fine"?
>>>>>> Boy, is this ugly...
>>>>>>
>>>>>> What do you do with disk-only snapshots? The recovery only works as long
>>>>>> as you have VM state.
>>>>>>
>>>>>> Kevin
>>>>> actually I am in a bit of trouble :(
>>>>>
>>>>> I understand that this is ugly, but I would like to make
>>>>> 'virsh snapshot' work for OVMF VMs. This is necessary for us to
>>>>> make a release.
>>>>>
>>>>> Currently libvirt guys generate XML in the following way:
>>>>>
>>>>>   <os>
>>>>>     <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
>>>>>     <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader>
>>>>>     <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram>
>>>>>   </os>
>>>>>
>>>>> This results in:
>>>>>
>>>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>>>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1
>>>>>
>>>>> This obviously cannot pass the check in bdrv_all_can_snapshot()
>>>>> as 'pflash' is RW and raw, i.e. cannot be snapshotted.
>>>>>
>>>>> They have discussed the switch to the following command line:
>>>>>
>>>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>>>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1
>>>>>
>>>>> and say that in this case the VM state could fall into the PFLASH
>>>>> drive, which should not be big, as the location of the
>>>>> file is different. This means that I am doomed here.
>>>>>
>>>>> Either we should force libvirt people to forget about their
>>>>> opinion that pflash should be small which I am unable to
>>>>> do or I should invent a way to ban VM state saving into
>>>>> pflash.
>>>>>
>>>>> OK. There are 2 options.
>>>>>
>>>>> 1) Ban pflash as it was done.
>>>>> 2) Add 'no-vmstate' flag to -drive (invented just now).
>>>>>
>>>> something like this:
>>>>
>>>> diff --git a/block.c b/block.c
>>>> index 3e1877d..8900589 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = {
>>>>               .help = "Block driver to use for the node",
>>>>           },
>>>>           {
>>>> +            .name = "novmstate",
>>>> +            .type = QEMU_OPT_BOOL,
>>>> +            .help = "Ignore for selecting to save VM state",
>>>> +        },
>>>> +        {
>>>>               .name = BDRV_OPT_CACHE_WB,
>>>>               .type = QEMU_OPT_BOOL,
>>>>               .help = "Enable writeback mode",
>>>> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState
>>>> *bs, BdrvChild *file,
>>>>       bs->request_alignment = 512;
>>>>       bs->zero_beyond_eof = true;
>>>>       bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
>>>> +    bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false);
>>>>
>>>>       if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
>>>>           error_setg(errp,
>>>> diff --git a/block/snapshot.c b/block/snapshot.c
>>>> index 2d86b88..33cdd86 100644
>>>> --- a/block/snapshot.c
>>>> +++ b/block/snapshot.c
>>>> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void)
>>>>       while (not_found && (bs = bdrv_next(bs))) {
>>>>           AioContext *ctx = bdrv_get_aio_context(bs);
>>>>
>>>> +        if (bs->disable_vmstate_save) {
>>>> +            continue;
>>>> +        }
>>>> +
>>>>           aio_context_acquire(ctx);
>>>>           not_found = !bdrv_can_snapshot(bs);
>>>>           aio_context_release(ctx);
>>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>>> index 256609d..855a209 100644
>>>> --- a/include/block/block_int.h
>>>> +++ b/include/block/block_int.h
>>>> @@ -438,6 +438,9 @@ struct BlockDriverState {
>>>>       /* do we need to tell the quest if we have a volatile write cache? */
>>>>       int enable_write_cache;
>>>>
>>>> +    /* skip this BDS searching for one to save VM state */
>>>> +    bool disable_vmstate_save;
>>>> +
>>>>       /* the following member gives a name to every node on the bs graph. */
>>>>       char node_name[32];
>>>>       /* element of the list of named nodes building the graph */
>>> That sounds like an option. (No pun intended.)
>>>
>>> We can discuss the option name (perhaps "vmstate" defaulting to "on" is
>>> better?) and variable names (I'd prefer them to match the option name);
>>> also you'll need to extend the QAPI schema for blockdev-add. But all of
>>> these are minor points and the idea seems sane.
>> I've always thought that QEMU picking the image to take the VM state is
>> backwards.  Adding means to guide that pick like "don't pick this one,
>> please" may help ease the pain, but it's still backwards.
>>
>> The *user* should pick it.
> Designing the API now when it has been in use for ten years is
> backwards, too. We have to take it as is and make the best of it.
>
> We could add an optional argument to savevm that tells which image to
> save the VM state to. But if it's missing, we still need to make a pick.
> Of course, libvirt should then always use that option and then we don't
> need a separate vmstate=[on|off] option.
>
> If we go that way, we need to improve loadvm to get VM state from any of
> the images of a VM, because the user could have saved the state to any.
> (Making that improvement is probably a good idea anyway.)
>
> Kevin
no. There is a window now.

savevm at the moment is implemented in HMP only. There is my pending
patchset to switch it to QMP. We can require something additional to be
done during that switch.

Den
Markus Armbruster Jan. 13, 2016, 8:09 a.m. UTC | #7
Kevin Wolf <kwolf@redhat.com> writes:

> Am 12.01.2016 um 18:40 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben:
>> >> On 01/12/2016 06:47 PM, Denis V. Lunev wrote:
>> >> >On 01/12/2016 06:20 PM, Kevin Wolf wrote:
>> >> >>Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben:
>> >> >>>
>> >> >>>On 12/01/2016 15:16, Kevin Wolf wrote:
>> >> >>>>>Thus we should avoid selection of "pflash" drives for VM
>> >> >>>>>state saving.
>> >> >>>>>
>> >> >>>>>For now "pflash" is a read-write raw image, as it is
>> >> >>>>>configured by libvirt. Thus there are no such images in the
>> >> >>>>>field and we could safely disable the ability to save state
>> >> >>>>>to those images inside QEMU.
>> >> >>>>This is obviously broken. If you write to the pflash, then it needs to
>> >> >>>>be snapshotted in order to keep a consistent state.
>> >> >>>>
>> >> >>>>If you want to avoid snapshotting the image, make it read-only and it
>> >> >>>>will be skipped even today.
>> >> >>>Sort of.  The point of having flash is to _not_ make it read-only, so
>> >> >>>that is not a solution.
>> >> >>>
>> >> >>>Flash is already being snapshotted as part of saving RAM state.  In
>> >> >>>fact, for this reason the device (at least the one used with OVMF; I
>> >> >>>haven't checked other pflash devices) can simply save it back to disk
>> >> >>>on the migration destination, without the need to use "migrate -b" or
>> >> >>>shared storage.
>> >> >>>[...]
>> >> >>>I don't like very much using IF_PFLASH this way, which is why I hadn't
>> >> >>>replied to the patch so far---I hadn't made up my mind about *what* to
>> >> >>>suggest instead, or whether to just accept it.  However, it does work.
>> >> >>>
>> >> >>>Perhaps a separate "I know what I am doing" skip-snapshot option?  Or
>> >> >>>a device callback saying "not snapshotting this is fine"?
>> >> >>Boy, is this ugly...
>> >> >>
>> >> >>What do you do with disk-only snapshots? The recovery only works as long
>> >> >>as you have VM state.
>> >> >>
>> >> >>Kevin
>> >> >actually I am in a bit of trouble :(
>> >> >
>> >> >I understand that this is ugly, but I would like to make
>> >> >'virsh snapshot' work for OVMF VMs. This is necessary for us to
>> >> >make a release.
>> >> >
>> >> >Currently libvirt guys generate XML in the following way:
>> >> >
>> >> >  <os>
>> >> >    <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
>> >> >    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader>
>> >> >    <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram>
>> >> >  </os>
>> >> >
>> >> >This results in:
>> >> >
>> >> >qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>> >> >     -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1
>> >> >
>> >> >This obviously cannot pass the check in bdrv_all_can_snapshot()
>> >> >as 'pflash' is RW and raw, i.e. cannot be snapshotted.
>> >> >
>> >> >They have discussed the switch to the following command line:
>> >> >
>> >> >qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>> >> >     -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1
>> >> >
>> >> >and say that in this case the VM state could fall into the
>> >> >PFLASH drive, which should not be big, as the location of the
>> >> >file is different. This means that I am doomed here.
>> >> >
>> >> >Either we should force libvirt people to forget about their
>> >> >opinion that pflash should be small which I am unable to
>> >> >do or I should invent a way to ban VM state saving into
>> >> >pflash.
>> >> >
>> >> >OK. There are 2 options.
>> >> >
>> >> >1) Ban pflash as it was done.
>> >> >2) Add 'no-vmstate' flag to -drive (invented just now).
>> >> >
>> >> something like this:
>> >> 
>> >> diff --git a/block.c b/block.c
>> >> index 3e1877d..8900589 100644
>> >> --- a/block.c
>> >> +++ b/block.c
>> >> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = {
>> >>              .help = "Block driver to use for the node",
>> >>          },
>> >>          {
>> >> +            .name = "novmstate",
>> >> +            .type = QEMU_OPT_BOOL,
>> >> +            .help = "Ignore for selecting to save VM state",
>> >> +        },
>> >> +        {
>> >>              .name = BDRV_OPT_CACHE_WB,
>> >>              .type = QEMU_OPT_BOOL,
>> >>              .help = "Enable writeback mode",
>> >> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState
>> >> *bs, BdrvChild *file,
>> >>      bs->request_alignment = 512;
>> >>      bs->zero_beyond_eof = true;
>> >>      bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
>> >> +    bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false);
>> >> 
>> >>      if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
>> >>          error_setg(errp,
>> >> diff --git a/block/snapshot.c b/block/snapshot.c
>> >> index 2d86b88..33cdd86 100644
>> >> --- a/block/snapshot.c
>> >> +++ b/block/snapshot.c
>> >> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void)
>> >>      while (not_found && (bs = bdrv_next(bs))) {
>> >>          AioContext *ctx = bdrv_get_aio_context(bs);
>> >> 
>> >> +        if (bs->disable_vmstate_save) {
>> >> +            continue;
>> >> +        }
>> >> +
>> >>          aio_context_acquire(ctx);
>> >>          not_found = !bdrv_can_snapshot(bs);
>> >>          aio_context_release(ctx);
>> >> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> >> index 256609d..855a209 100644
>> >> --- a/include/block/block_int.h
>> >> +++ b/include/block/block_int.h
>> >> @@ -438,6 +438,9 @@ struct BlockDriverState {
>> >>      /* do we need to tell the quest if we have a volatile write cache? */
>> >>      int enable_write_cache;
>> >> 
>> >> +    /* skip this BDS searching for one to save VM state */
>> >> +    bool disable_vmstate_save;
>> >> +
>> >>      /* the following member gives a name to every node on the bs graph. */
>> >>      char node_name[32];
>> >>      /* element of the list of named nodes building the graph */
>> >
>> > That sounds like an option. (No pun intended.)
>> >
>> > We can discuss the option name (perhaps "vmstate" defaulting to "on" is
>> > better?) and variable names (I'd prefer them to match the option name);
>> > also you'll need to extend the QAPI schema for blockdev-add. But all of
>> > these are minor points and the idea seems sane.
>> 
>> I've always thought that QEMU picking the image to take the VM state is
>> backwards.  Adding means to guide that pick like "don't pick this one,
>> please" may help ease the pain, but it's still backwards.
>> 
>> The *user* should pick it.
>
> Designing the API now when it has been in use for ten years is
> backwards, too. We have to take it as is and make the best of it.

As Den pointed out, the *QMP* interface doesn't exist yet, and there's
no excuse for creating it backwards now.

HMP is not a stable interface, but we commonly make a reasonable effort
to keep it muscle-memory-compatible.

> We could add an optional argument to savevm that tells which image to
> save the VM state to. But if it's missing, we still need to make a pick.
> Of course, libvirt should then always use that option and then we don't
> need a separate vmstate=[on|off] option.
>
> If we go that way, we need to improve loadvm to get VM state from any of
> the images of a VM, because the user could have saved the state to any.
> (Making that improvement is probably a good idea anyway.)

I want none of the "QEMU picks the image to receive the VM state"
baggage in QMP.  I want none of the other baggage, either: overloaded
"ID or name" parameter, non-atomically deleting old snapshots with the
same ID or name.  Instead, let's have simple, non-magical commands that
do one thing.

Speaking of "do one thing": coupling "save VM state" to "snapshot all
storage internally" is problematic.  What if you want to combine "save
VM state" with external snapshots, or some combination of external and
internal snapshots?  Let me explain.  We have "external" and "internal"
solutions both for snapshotting storage and VM state: internal
vs. external snapshot, migrate to file vs. internal VM snapshot.  Do we
want to expose the building blocks, or do we want to expose only certain
combinations?

Back to HMP.  In general, HMP commands should be built on top of QMP
commands.  Adding convenience features is fine.  Defaulting savevm's
destination could be such a convenience feature.  Very low complexity
when the default is unambiguous, i.e. if there's just one possible
destination.  But if there's more than one, it adds significant
complexity to the interface.  If we want it anyway for muscle-memory-
compatibility, I'd spit out a warning when it's ambiguous.

Like you, I can't see a need for a separate vmstate=[on|off] knob.

Note that my opinions on HMP interfaces are less strong than on QMP
interfaces.
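
To make the "building blocks" idea concrete: per-device internal disk
snapshots already have their own QMP command, and the VM state could get
a separate one. The first command below exists today (its real schema
wraps the members in the BlockdevSnapshotInternal struct); the second is
purely hypothetical, sketched here only to illustrate the decomposition:

{ 'command': 'blockdev-snapshot-internal-sync',
  'data': { 'device': 'str', 'name': 'str' } }    # exists since QEMU 1.7

{ 'command': 'vmstate-save',
  'data': { 'node': 'str', 'tag': 'str' } }       # hypothetical
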
Laszlo Ersek Jan. 13, 2016, 10:41 a.m. UTC | #8
On 01/12/16 18:40, Markus Armbruster wrote:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
>> Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben:
>>> On 01/12/2016 06:47 PM, Denis V. Lunev wrote:
>>>> On 01/12/2016 06:20 PM, Kevin Wolf wrote:
>>>>> Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben:
>>>>>>
>>>>>> On 12/01/2016 15:16, Kevin Wolf wrote:
>>>>>>>> Thus we should avoid selection of "pflash" drives for VM
>>>>>>>> state saving.
>>>>>>>>
>>>>>>>> For now "pflash" is a read-write raw image, as it is configured
>>>>>>>> by libvirt. Thus there are no such images in the field and we
>>>>>>>> could safely disable the ability to save state to those images
>>>>>>>> inside QEMU.
>>>>>>> This is obviously broken. If you write to the pflash, then it needs to
>>>>>>> be snapshotted in order to keep a consistent state.
>>>>>>>
>>>>>>> If you want to avoid snapshotting the image, make it read-only and it
>>>>>>> will be skipped even today.
>>>>>> Sort of.  The point of having flash is to _not_ make it read-only, so
>>>>>> that is not a solution.
>>>>>>
>>>>>> Flash is already being snapshotted as part of saving RAM state.  In
>>>>>> fact, for this reason the device (at least the one used with OVMF; I
>>>>>> haven't checked other pflash devices) can simply save it back to disk
>>>>>> on the migration destination, without the need to use "migrate -b" or
>>>>>> shared storage.
>>>>>> [...]
>>>>>> I don't like very much using IF_PFLASH this way, which is why I hadn't
>>>>>> replied to the patch so far---I hadn't made up my mind about *what* to
>>>>>> suggest instead, or whether to just accept it.  However, it does work.
>>>>>>
>>>>>> Perhaps a separate "I know what I am doing" skip-snapshot option?  Or
>>>>>> a device callback saying "not snapshotting this is fine"?
>>>>> Boy, is this ugly...
>>>>>
>>>>> What do you do with disk-only snapshots? The recovery only works as long
>>>>> as you have VM state.
>>>>>
>>>>> Kevin
>>>> actually I am in a bit of trouble :(
>>>>
>>>> I understand that this is ugly, but I would like to make
>>>> 'virsh snapshot' work for OVMF VMs. This is necessary for us to make
>>>> a release.
>>>>
>>>> Currently libvirt guys generate XML in the following way:
>>>>
>>>>  <os>
>>>>    <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
>>>>    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader>
>>>>    <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram>
>>>>  </os>
>>>>
>>>> This results in:
>>>>
>>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>>     -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1
>>>>
>>>> This obviously cannot pass the check in bdrv_all_can_snapshot()
>>>> as 'pflash' is RW and raw, i.e. cannot be snapshotted.
>>>>
>>>> They have discussed the switch to the following command line:
>>>>
>>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>>     -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1
>>>>
>>>> and say that in this case the VM state could fall into the PFLASH
>>>> drive, which should not be big, as the location of the
>>>> file is different. This means that I am doomed here.
>>>>
>>>> Either we should force libvirt people to forget about their
>>>> opinion that pflash should be small which I am unable to
>>>> do or I should invent a way to ban VM state saving into
>>>> pflash.
>>>>
>>>> OK. There are 2 options.
>>>>
>>>> 1) Ban pflash as it was done.
>>>> 2) Add 'no-vmstate' flag to -drive (invented just now).
>>>>
>>> something like this:
>>>
>>> diff --git a/block.c b/block.c
>>> index 3e1877d..8900589 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = {
>>>              .help = "Block driver to use for the node",
>>>          },
>>>          {
>>> +            .name = "novmstate",
>>> +            .type = QEMU_OPT_BOOL,
>>> +            .help = "Ignore for selecting to save VM state",
>>> +        },
>>> +        {
>>>              .name = BDRV_OPT_CACHE_WB,
>>>              .type = QEMU_OPT_BOOL,
>>>              .help = "Enable writeback mode",
>>> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
>>>      bs->request_alignment = 512;
>>>      bs->zero_beyond_eof = true;
>>>      bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
>>> +    bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false);
>>>
>>>      if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
>>>          error_setg(errp,
>>> diff --git a/block/snapshot.c b/block/snapshot.c
>>> index 2d86b88..33cdd86 100644
>>> --- a/block/snapshot.c
>>> +++ b/block/snapshot.c
>>> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void)
>>>      while (not_found && (bs = bdrv_next(bs))) {
>>>          AioContext *ctx = bdrv_get_aio_context(bs);
>>>
>>> +        if (bs->disable_vmstate_save) {
>>> +            continue;
>>> +        }
>>> +
>>>          aio_context_acquire(ctx);
>>>          not_found = !bdrv_can_snapshot(bs);
>>>          aio_context_release(ctx);
>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>> index 256609d..855a209 100644
>>> --- a/include/block/block_int.h
>>> +++ b/include/block/block_int.h
>>> @@ -438,6 +438,9 @@ struct BlockDriverState {
>>>      /* do we need to tell the guest if we have a volatile write cache? */
>>>      int enable_write_cache;
>>>
>>> +    /* skip this BDS when searching for one to save VM state to */
>>> +    bool disable_vmstate_save;
>>> +
>>>      /* the following member gives a name to every node on the bs graph. */
>>>      char node_name[32];
>>>      /* element of the list of named nodes building the graph */
>>
>> That sounds like an option. (No pun intended.)
>>
>> We can discuss the option name (perhaps "vmstate" defaulting to "on" is
>> better?) and variable names (I'd prefer them to match the option name);
>> also you'll need to extend the QAPI schema for blockdev-add. But all of
>> these are minor points and the idea seems sane.
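For illustration, the blockdev-add side might then look roughly like this
in qapi/block-core.json (a sketch only: the neighbouring BlockdevOptionsBase
fields are abridged here, and the name follows the vmstate-defaulting-to-on
suggestion above):

{ 'struct': 'BlockdevOptionsBase',
  'data': { 'driver': 'BlockdevDriver',
            '*node-name': 'str',
            '*read-only': 'bool',
            # new: when false, never pick this node for saving VM state
            '*vmstate': 'bool' } }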
> 
> I've always thought that QEMU picking the image to take the VM state is
> backwards.  Adding means to guide that pick like "don't pick this one,
> please" may help ease the pain, but it's still backwards.

I agree. This is the gist of the argument that you made last time too,
and I agreed then as well.

One good illustration of why this is brittle is your observation
that just changing the order of drives (without any pflash
being present) breaks this as well. (Libvirt only gets away with that
because it also stashes the domain XML when a snapshot is made, and the
domain XML dictates the order of drive options.)

> The *user* should pick it.

I guess so.

Thanks!
Laszlo
Laszlo Ersek Jan. 13, 2016, 10:43 a.m. UTC | #9
On 01/12/16 18:50, Kevin Wolf wrote:
> Am 12.01.2016 um 18:40 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kwolf@redhat.com> writes:
>>
>>> Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben:
>>>> [...]
>>>
>>> That sounds like an option. (No pun intended.)
>>>
>>> We can discuss the option name (perhaps "vmstate" defaulting to "on" is
>>> better?) and variable names (I'd prefer them to match the option name);
>>> also you'll need to extend the QAPI schema for blockdev-add. But all of
>>> these are minor points and the idea seems sane.
>>
>> I've always thought that QEMU picking the image to take the VM state is
>> backwards.  Adding means to guide that pick like "don't pick this one,
>> please" may help ease the pain, but it's still backwards.
>>
>> The *user* should pick it.
> 
> Designing the API now when it has been in use for ten years is
> backwards, too. We have to take it as is and make the best of it.
> 
> We could add an optional argument to savevm that tells which image to
> save the VM state to. But if it's missing, we still need to make a pick.
> Of course, libvirt should then always use that option and then we don't
> need a separate vmstate=[on|off] option.

Sounds great!
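For concreteness, the monitor side might then look something like this
(hypothetical syntax; the flag and the device name are invented for
illustration only):

(qemu) savevm snap1                           <- QEMU picks the image, as today
(qemu) savevm -d drive-virtio-disk0 snap1     <- the user picks the vmstate image

Libvirt would then always use the second, explicit form.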

> If we go that way, we need to improve loadvm to get VM state from any of
> the images of a VM, because the user could have saved the state to any.
> (Making that improvement is probably a good idea anyway.)
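Concretely, that improved lookup might be sketched like this, modeled on the
existing loop in bdrv_all_find_vmstate_bs() (untested; the function name is
invented, and whether bdrv_snapshot_find() is the right probe here is an
assumption):

/* Sketch: find a BDS whose snapshot 'name' actually carries VM state,
 * instead of assuming the first snapshottable image has it. */
BlockDriverState *bdrv_all_find_loadvm_bs(const char *name)
{
    BlockDriverState *bs = NULL;

    while ((bs = bdrv_next(bs))) {
        AioContext *ctx = bdrv_get_aio_context(bs);
        QEMUSnapshotInfo sn;
        bool found;

        aio_context_acquire(ctx);
        found = bdrv_can_snapshot(bs) &&
                bdrv_snapshot_find(bs, &sn, name) >= 0 &&
                sn.vm_state_size > 0;
        aio_context_release(ctx);

        if (found) {
            return bs;
        }
    }

    return NULL;
}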

What if there are several images with vmstate in them?

(OTOH that doesn't seem to be a well-defined case even now, so nothing
would amount to a regression.)

Thanks
Laszlo

> 
> Kevin
>
diff mbox

Patch

diff --git a/block.c b/block.c
index 3e1877d..8900589 100644
--- a/block.c
+++ b/block.c
@@ -881,6 +881,11 @@  static QemuOptsList bdrv_runtime_opts = {
              .help = "Block driver to use for the node",
          },
          {
+            .name = "novmstate",
+            .type = QEMU_OPT_BOOL,
+            .help = "Do not select this drive for saving VM state",
+        },
+        {
              .name = BDRV_OPT_CACHE_WB,
              .type = QEMU_OPT_BOOL,
              .help = "Enable writeback mode",
@@ -957,6 +962,7 @@  static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
      bs->request_alignment = 512;
      bs->zero_beyond_eof = true;
      bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
+    bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false);

      if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
          error_setg(errp,
diff --git a/block/snapshot.c b/block/snapshot.c
index 2d86b88..33cdd86 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -483,6 +483,10 @@  BlockDriverState *bdrv_all_find_vmstate_bs(void)
      while (not_found && (bs = bdrv_next(bs))) {
          AioContext *ctx = bdrv_get_aio_context(bs);

+        if (bs->disable_vmstate_save) {
+            continue;
+        }
+
          aio_context_acquire(ctx);
          not_found = !bdrv_can_snapshot(bs);
          aio_context_release(ctx);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 256609d..855a209 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -438,6 +438,9 @@  struct BlockDriverState {
      /* do we need to tell the guest if we have a volatile write cache? */
      int enable_write_cache;

+    /* skip this BDS when searching for one to save VM state to */
+    bool disable_vmstate_save;
+
      /* the following member gives a name to every node on the bs graph. */
      char node_name[32];
      /* element of the list of named nodes building the graph */
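For reference, with this patch the libvirt command line from the discussion
above would presumably become (assuming -drive forwards the new flag to the
block layer, which is not verified here):

qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
     -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1,novmstate=on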