Patchwork [3/4] virtio: modify savevm to have a stable wire format

Submitter Anthony Liguori
Date Dec. 10, 2012, 2:29 p.m.
Message ID <1355149790-8125-4-git-send-email-aliguori@us.ibm.com>
Permalink /patch/204925/
State New

Comments

Anthony Liguori - Dec. 10, 2012, 2:29 p.m.
We were memcpy()'ing a structure to the wire :-/  Since savevm really
only works on x86 today, let's just declare that this element is sent
over the wire as a little-endian value in order to fix the endianness.

Unfortunately, we also send raw pointers and size_t which are going
to be different values on a 32-bit vs. 64-bit QEMU so we need to also
deal with that case.

A lot of values that should have been previously ignored are now sent
as 0 and ignored on the receive side too.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
 hw/virtio.c | 114 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 112 insertions(+), 2 deletions(-)
Rusty Russell - Dec. 11, 2012, 12:32 a.m.
Anthony Liguori <aliguori@us.ibm.com> writes:
> We were memcpy()'ing a structure to the wire :-/  Since savevm really
> only works on x86 today, lets just declare that this element is sent
> over the wire as a little endian value in order to fix the bitness.
>
> Unfortunately, we also send raw pointers and size_t which are going
> to be different values on a 32-bit vs. 64-bit QEMU so we need to also
> deal with that case.
>
> A lot of values that should have been previously ignored are now sent
> as 0 and ignored on the receive side too.

Don't we want to transition to vmstate anyway?  Can we just do that, and
relegate the existing, slightly broken code to legacy?

Cheers,
Rusty.
Anthony Liguori - Dec. 11, 2012, 12:54 a.m.
Rusty Russell <rusty@rustcorp.com.au> writes:

> Anthony Liguori <aliguori@us.ibm.com> writes:
>> We were memcpy()'ing a structure to the wire :-/  Since savevm really
>> only works on x86 today, lets just declare that this element is sent
>> over the wire as a little endian value in order to fix the bitness.
>>
>> Unfortunately, we also send raw pointers and size_t which are going
>> to be different values on a 32-bit vs. 64-bit QEMU so we need to also
>> deal with that case.
>>
>> A lot of values that should have been previously ignored are now sent
>> as 0 and ignored on the receive side too.
>
> Don't we want to transition to vmstate anyway?  Can we just do that, and
> relegate the existing slightly broken code, to legacy?

What worries me is that if someone changes VirtQueueElement, then all of
a sudden migration breaks.  By transitioning to what I've sent, we at
least have a better documented protocol that isn't prone to subtle
breakage anymore.  Plus, we resolve the endian issue before it becomes a
bigger problem when David actually gets live migration working reliably
on PPC...

I'm certainly in favor of cleaning up the savevm format and probably
leaving the existing load/save functions as-is for legacy purposes.
I'll leave that as an exercise for someone else though :-)

Regards,

Anthony Liguori

>
> Cheers,
> Rusty.
Rusty Russell - Dec. 14, 2012, 12:57 a.m.
Anthony Liguori <anthony@codemonkey.ws> writes:
> Rusty Russell <rusty@rustcorp.com.au> writes:
>
>> Anthony Liguori <aliguori@us.ibm.com> writes:
>>> We were memcpy()'ing a structure to the wire :-/  Since savevm really
>>> only works on x86 today, lets just declare that this element is sent
>>> over the wire as a little endian value in order to fix the bitness.
>>>
>>> Unfortunately, we also send raw pointers and size_t which are going
>>> to be different values on a 32-bit vs. 64-bit QEMU so we need to also
>>> deal with that case.
>>>
>>> A lot of values that should have been previously ignored are now sent
>>> as 0 and ignored on the receive side too.
>>
>> Don't we want to transition to vmstate anyway?  Can we just do that, and
>> relegate the existing slightly broken code, to legacy?
>
> What worries me is if someone changes VirtQueueElement, then all the
> sudden migration breaks.  By transitioning to what I've sent, we at
> least have a better documented protocol that isn't prone to subtle
> breakage anymore.  Plus, we resolve the endian issue before it becomes a
> bigger problem when David actually gets live migration working reliably
> on PPC...

My transition was to copy that structure to VirtQueueSavedElement, and
it's only used for loading old versions.

With the new code we only need the head from that structure.

> I'm certainly in favor of cleaning up the savevm format and probably
> leaving the existing load/save functions as-is for legacy purposes.
> I'll leave that as an exercise for someone else though :-)

What is the rule about new versions?  Can we introduce a new save
version at any time, or only at major qemu version changes?

Thanks,
Rusty.
Paolo Bonzini - Dec. 14, 2012, 11:50 a.m.
On 14/12/2012 01:57, Rusty Russell wrote:
> With the new code we only need the head from that structure.

We also need to redo all the validation of the elements if we fetch them
back from the data.  Sometimes the parsed data is saved elsewhere (e.g.
in a SCSIRequest struct that is serialized by the SCSI subsystem) and
that data may be inconsistent with whatever you read from guest memory.
It's a can of worms.

>> I'm certainly in favor of cleaning up the savevm format and probably
>> leaving the existing load/save functions as-is for legacy purposes.
>> I'll leave that as an exercise for someone else though :-)
> 
> What is the rule about new versions?  Can we introduce a new save
> version at any time, or only at major qemu version changes?

Any time, but we provide a backwards-compatible loader for older versions.

Paolo
Anthony Liguori - Dec. 14, 2012, 1:59 p.m.
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 14/12/2012 01:57, Rusty Russell wrote:
>> With the new code we only need the head from that structure.
>
> We also need to do again all validation of the elements if we fetch it
> back from the data.  Sometimes the parsed data is saved elsewhere (e.g.
> in a SCSIRequest struct that is serialized by the SCSI subsystem) and
> that data may be inconsistent with whatever you read from guest memory.
>  It's a can of worms.
>
>>> I'm certainly in favor of cleaning up the savevm format and probably
>>> leaving the existing load/save functions as-is for legacy purposes.
>>> I'll leave that as an exercise for someone else though :-)
>> 
>> What is the rule about new versions?  Can we introduce a new save
>> version at any time, or only at major qemu version changes?
>
> Any time, but we provide a backwards-compatible loader for older
> versions.

Well..  if we can avoid bumping the version, we should.  Bumping the
version breaks migration from new -> old, so it's preferable that we
don't do it unless it's absolutely required.

Regards,

Anthony Liguori

>
> Paolo

Patch

diff --git a/hw/virtio.c b/hw/virtio.c
index 8eb8f69..0108d62 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -875,14 +875,124 @@  int virtio_load(VirtIODevice *vdev, QEMUFile *f)
     return 0;
 }
 
+/* We used to memcpy the structure to the wire, so that's the reason for all of
+ * this ugliness */
+
+#if HOST_LONG_BITS == 32
+static uint32_t virtio_get_long(QEMUFile *f)
+{
+    return qemu_get_le32(f);
+}
+
+static void virtio_put_long(QEMUFile *f, uint32_t value)
+{
+    qemu_put_le32(f, value);
+}
+#elif HOST_LONG_BITS == 64
+static uint64_t virtio_get_long(QEMUFile *f)
+{
+    return qemu_get_le64(f);
+}
+
+static void virtio_put_long(QEMUFile *f, uint64_t value)
+{
+    qemu_put_le64(f, value);
+}
+#else
+#error Invalid HOST_LONG_BITS
+#endif
+
 void virtio_put_virt_queue_element(QEMUFile *f, const VirtQueueElement *elem)
 {
-    qemu_put_buffer(f, (unsigned char*)elem, sizeof(*elem));
+    int i;
+
+    qemu_put_le32(f, elem->index);
+    qemu_put_le32(f, elem->out_num);
+    qemu_put_le32(f, elem->in_num);
+
+    qemu_put_le32(f, 0); /* padding for structure */
+
+    for (i = 0; i < VIRTQUEUE_MAX_SIZE; i++) {
+        if (i < elem->in_num) {
+            qemu_put_le64(f, elem->in_addr[i]);
+        } else {
+            qemu_put_le64(f, 0);
+        }
+    }
+
+    for (i = 0; i < VIRTQUEUE_MAX_SIZE; i++) {
+        if (i < elem->out_num) {
+            qemu_put_le64(f, elem->out_addr[i]);
+        } else {
+            qemu_put_le64(f, 0);
+        }
+    }
+
+    for (i = 0; i < VIRTQUEUE_MAX_SIZE; i++) {
+        virtio_put_long(f, 0);
+        if (i < elem->in_num) {
+            virtio_put_long(f, elem->in_sg[i].iov_len);
+        } else {
+            virtio_put_long(f, 0);
+        }
+    }
+
+    for (i = 0; i < VIRTQUEUE_MAX_SIZE; i++) {
+        virtio_put_long(f, 0);
+        if (i < elem->out_num) {
+            virtio_put_long(f, elem->out_sg[i].iov_len);
+        } else {
+            virtio_put_long(f, 0);
+        }
+    }
 }
 
 void virtio_get_virt_queue_element(QEMUFile *f, VirtQueueElement *elem)
 {
-    qemu_get_buffer(f, (unsigned char *)elem, sizeof(*elem));
+    int i;
+
+    elem->index = qemu_get_le32(f);
+    elem->out_num = qemu_get_le32(f);
+    elem->in_num = qemu_get_le32(f);
+
+    assert(elem->out_num <= VIRTQUEUE_MAX_SIZE &&
+           elem->in_num <= VIRTQUEUE_MAX_SIZE);
+
+    qemu_get_le32(f); /* padding for structure */
+
+    for (i = 0; i < VIRTQUEUE_MAX_SIZE; i++) {
+        if (i < elem->in_num) {
+            elem->in_addr[i] = qemu_get_le64(f);
+        } else {
+            qemu_get_le64(f);
+        }
+    }
+
+    for (i = 0; i < VIRTQUEUE_MAX_SIZE; i++) {
+        if (i < elem->out_num) {
+            elem->out_addr[i] = qemu_get_le64(f);
+        } else {
+            qemu_get_le64(f);
+        }
+    }
+
+    for (i = 0; i < VIRTQUEUE_MAX_SIZE; i++) {
+        virtio_get_long(f);
+        if (i < elem->in_num) {
+            elem->in_sg[i].iov_len = virtio_get_long(f);
+        } else {
+            virtio_get_long(f);
+        }
+    }
+
+    for (i = 0; i < VIRTQUEUE_MAX_SIZE; i++) {
+        virtio_get_long(f);
+        if (i < elem->out_num) {
+            elem->out_sg[i].iov_len = virtio_get_long(f);
+        } else {
+            virtio_get_long(f);
+        }
+    }
 
     virtqueue_map_sg(elem->in_sg, elem->in_addr, elem->in_num, 1);
     virtqueue_map_sg(elem->out_sg, elem->out_addr, elem->out_num, 0);