Patchwork [2/6] loadvm: improve tests before bdrv_snapshot_goto()

login
register
mail settings
Submitter Kevin Wolf
Date Aug. 3, 2010, 2:44 p.m.
Message ID <1280846670-27063-3-git-send-email-kwolf@redhat.com>
Download mbox | patch
Permalink /patch/60758/
State New
Headers show

Comments

Kevin Wolf - Aug. 3, 2010, 2:44 p.m.
From: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>

This patch improves the resilience of the load_vmstate() function, doing
further and better ordered tests.

In load_vmstate(), if there is any error on bdrv_snapshot_goto(), except if the
error is on VM state device, load_vmstate() will return zero and the VM will be
started with major corruption chances.

The current process:
- test if there is any writable device without snapshot support
    - if exists return -error
- get the device that saves the VM state, possible return -error but unlikely
because it was tested earlier
- flush I/O
- run bdrv_snapshot_goto() on devices
    - if fails, give an warning and goes to the next (not good!)
    - if fails on the VM state device, return zero (not good!)
- check if the requested snapshot exists on the device that saves the VM state
and the state is not zero
    - if fails return -error
- open the file with the VM state
    - if fails return -error
- load the VM state
    - if fails return -error
- return zero

New behavior:
- get the device that saves the VM state
    - if fails return -error
- check if the requested snapshot exists on the device that saves the VM state
and the state is not zero
    - if fails return -error
- test if there is any writable device without snapshot support
    - if exists return -error
- test if the devices with snapshot support have the requested snapshot
    - if anyone fails, return -error
- flush I/O
- run snapshot_goto() on devices
    - if anyone fails, return -error
- open the file with the VM state
    - if fails return -error
- load the VM state
    - if fails return -error
- return zero

do_loadvm must not call vm_start if any error has occurred in load_vmstate.

Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 monitor.c |    3 +-
 savevm.c  |   71 +++++++++++++++++++++++++++++-------------------------------
 2 files changed, 36 insertions(+), 38 deletions(-)
Philipp Hahn - April 14, 2011, 9:10 a.m.
Hello,

Am Dienstag 03 August 2010 06:44:26 schrieb Kevin Wolf:
> From: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
>
> This patch improves the resilience of the load_vmstate() function, doing
> further and better ordered tests.

This patch broke restoring not-running VMs using libvirt-0.8.7 with qemu-0.14: 
When the domain is not running while taking a snpshot, the sn.vm_state_size 
== 0:

2021        } else if (sn.vm_state_size == 0) {
(gdb) print sn
$6 = {id_str = "1", '\0' <repeats 126 times>, name = "pre-update-flash", '\0' 
<repeats 239 times>, vm_state_size = 0, date_sec = 1302698007, date_nsec = 
711909000, 
  vm_clock_nsec = 0}

> The [old] process:
...
> - run bdrv_snapshot_goto() on devices
>     - if fails, give an warning and goes to the next (not good!)
>     - if fails on the VM state device, return zero (not good!)
> - check if the requested snapshot exists on the device that saves the VM
> state and the state is not zero
>     - if fails return -error

Previously the qcow2 image was still reverted to the old state, so on the next 
start of the domain the qcow2 image would be in the state of the snapshot

> New behavior:
...
> - check if the requested snapshot exists on the device that saves the VM
> state and the state is not zero
>     - if fails return -error
...
> - run snapshot_goto() on devices

Now the qcow2 image is not reverted and when the domain is started, it is NOT 
in the state of the snapshot.

I can't decide if this regression is an Qemu bug or libvirt should be adapted 
to this new behavior.

I found the Bug also reported with Ubuntu and created a Bug in our own German 
bugtracker:
<https://bugs.launchpad.net/qemu/+bug/726619>
<https://forge.univention.org/bugzilla/show_bug.cgi?id=22221>

Sincerely
Philipp Hahn
Kevin Wolf - April 14, 2011, 9:29 a.m.
Am 14.04.2011 11:10, schrieb Philipp Hahn:
> Hello,
> 
> Am Dienstag 03 August 2010 06:44:26 schrieb Kevin Wolf:
>> From: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
>>
>> This patch improves the resilience of the load_vmstate() function, doing
>> further and better ordered tests.
> 
> This patch broke restoring not-running VMs using libvirt-0.8.7 with qemu-0.14: 
> When the domain is not running while taking a snpshot, the sn.vm_state_size 
> == 0:
> 
> [...]
> 
> Previously the qcow2 image was still reverted to the old state, so on the next 
> start of the domain the qcow2 image would be in the state of the snapshot
> 
> [...]
> 
> Now the qcow2 image is not reverted and when the domain is started, it is NOT 
> in the state of the snapshot.
> 
> I can't decide if this regression is an Qemu bug or libvirt should be adapted 
> to this new behavior.

Ouch. I wouldn't have expected that libvirt relies on this qemu bug.
When libvirt doesn't use the VM state but boots a fresh VM, it should
call qemu-img snapshot -a for the disks rather than using the loadvm
monitor command.

Kevin
Eric Blake - Aug. 9, 2011, 10:35 p.m.
On 04/14/2011 03:29 AM, Kevin Wolf wrote:
> Am 14.04.2011 11:10, schrieb Philipp Hahn:

Reviving an old thread...

>> Hello,
>>
>> Am Dienstag 03 August 2010 06:44:26 schrieb Kevin Wolf:
>>> From: Miguel Di Ciurcio Filho<miguel.filho@gmail.com>
>>>
>>> This patch improves the resilience of the load_vmstate() function, doing
>>> further and better ordered tests.
>>
>> This patch broke restoring not-running VMs using libvirt-0.8.7 with qemu-0.14:
>> When the domain is not running while taking a snpshot, the sn.vm_state_size
>> == 0:
>>
>> [...]
>>
>> Previously the qcow2 image was still reverted to the old state, so on the next
>> start of the domain the qcow2 image would be in the state of the snapshot
>>
>> [...]
>>
>> Now the qcow2 image is not reverted and when the domain is started, it is NOT
>> in the state of the snapshot.
>>
>> I can't decide if this regression is an Qemu bug or libvirt should be adapted
>> to this new behavior.
>
> Ouch. I wouldn't have expected that libvirt relies on this qemu bug.
> When libvirt doesn't use the VM state but boots a fresh VM, it should
> call qemu-img snapshot -a for the disks rather than using the loadvm
> monitor command.

Libvirt should be using 'qemu-img snapshot -a' before reverting to a 
snapshot made via 'qemu-img snapshot -c'; I'm writing the patch now.

Patch

diff --git a/monitor.c b/monitor.c
index 5366c36..c313d5a 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2274,8 +2274,9 @@  static void do_loadvm(Monitor *mon, const QDict *qdict)
 
     vm_stop(0);
 
-    if (load_vmstate(name) >= 0 && saved_vm_running)
+    if (load_vmstate(name) == 0 && saved_vm_running) {
         vm_start();
+    }
 }
 
 int monitor_get_fd(Monitor *mon, const char *fdname)
diff --git a/savevm.c b/savevm.c
index 4c0e5d3..53d47a6 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1894,12 +1894,27 @@  void do_savevm(Monitor *mon, const QDict *qdict)
 
 int load_vmstate(const char *name)
 {
-    BlockDriverState *bs, *bs1;
+    BlockDriverState *bs, *bs_vm_state;
     QEMUSnapshotInfo sn;
     QEMUFile *f;
     int ret;
 
-    /* Verify if there is a device that doesn't support snapshots and is writable */
+    bs_vm_state = bdrv_snapshots();
+    if (!bs_vm_state) {
+        error_report("No block device supports snapshots");
+        return -ENOTSUP;
+    }
+
+    /* Don't even try to load empty VM states */
+    ret = bdrv_snapshot_find(bs_vm_state, &sn, name);
+    if (ret < 0) {
+        return ret;
+    } else if (sn.vm_state_size == 0) {
+        return -EINVAL;
+    }
+
+    /* Verify if there is any device that doesn't support snapshots and is
+    writable and check if the requested snapshot is available too. */
     bs = NULL;
     while ((bs = bdrv_next(bs))) {
 
@@ -1912,63 +1927,45 @@  int load_vmstate(const char *name)
                                bdrv_get_device_name(bs));
             return -ENOTSUP;
         }
-    }
 
-    bs = bdrv_snapshots();
-    if (!bs) {
-        error_report("No block device supports snapshots");
-        return -EINVAL;
+        ret = bdrv_snapshot_find(bs, &sn, name);
+        if (ret < 0) {
+            error_report("Device '%s' does not have the requested snapshot '%s'",
+                           bdrv_get_device_name(bs), name);
+            return ret;
+        }
     }
 
     /* Flush all IO requests so they don't interfere with the new state.  */
     qemu_aio_flush();
 
-    bs1 = NULL;
-    while ((bs1 = bdrv_next(bs1))) {
-        if (bdrv_can_snapshot(bs1)) {
-            ret = bdrv_snapshot_goto(bs1, name);
+    bs = NULL;
+    while ((bs = bdrv_next(bs))) {
+        if (bdrv_can_snapshot(bs)) {
+            ret = bdrv_snapshot_goto(bs, name);
             if (ret < 0) {
-                switch(ret) {
-                case -ENOTSUP:
-                    error_report("%sSnapshots not supported on device '%s'",
-                                 bs != bs1 ? "Warning: " : "",
-                                 bdrv_get_device_name(bs1));
-                    break;
-                case -ENOENT:
-                    error_report("%sCould not find snapshot '%s' on device '%s'",
-                                 bs != bs1 ? "Warning: " : "",
-                                 name, bdrv_get_device_name(bs1));
-                    break;
-                default:
-                    error_report("%sError %d while activating snapshot on '%s'",
-                                 bs != bs1 ? "Warning: " : "",
-                                 ret, bdrv_get_device_name(bs1));
-                    break;
-                }
-                /* fatal on snapshot block device */
-                if (bs == bs1)
-                    return 0;
+                error_report("Error %d while activating snapshot '%s' on '%s'",
+                             ret, name, bdrv_get_device_name(bs));
+                return ret;
             }
         }
     }
 
-    /* Don't even try to load empty VM states */
-    ret = bdrv_snapshot_find(bs, &sn, name);
-    if ((ret >= 0) && (sn.vm_state_size == 0))
-        return -EINVAL;
-
     /* restore the VM state */
-    f = qemu_fopen_bdrv(bs, 0);
+    f = qemu_fopen_bdrv(bs_vm_state, 0);
     if (!f) {
         error_report("Could not open VM state file");
         return -EINVAL;
     }
+
     ret = qemu_loadvm_state(f);
+
     qemu_fclose(f);
     if (ret < 0) {
         error_report("Error %d while loading VM state", ret);
         return ret;
     }
+
     return 0;
 }