diff mbox series

[PULL,5/5] virtio-gpu: fix scanout migration post-load

Message ID 20240312140216.313618-6-marcandre.lureau@redhat.com
State New
Headers show
Series [PULL,1/5] ui/vnc: Respect bound console | expand

Commit Message

Marc-André Lureau March 12, 2024, 2:02 p.m. UTC
From: Marc-André Lureau <marcandre.lureau@redhat.com>

The current post-loading code for scanout has a FIXME: it doesn't take
the resource region/rect into account. But there is more, when adding
blob migration support in commit f66767f75c9, I didn't realize that blob
resources could be used for scanouts. This situationn leads to a crash
during post-load, as they don't have an associated res->image.

virtio_gpu_do_set_scanout() handle all cases, but requires the
associated virtio_gpu_framebuffer, which is currently not saved during
migration.

Add a v2 of "virtio-gpu-one-scanout" with the framebuffer fields, so we
can restore blob scanouts, as well as fixing the existing FIXME.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
---
 include/hw/virtio/virtio-gpu.h |  1 +
 hw/display/virtio-gpu.c        | 51 +++++++++++++++++++++++++++-------
 2 files changed, 42 insertions(+), 10 deletions(-)

Comments

Fiona Ebner April 30, 2024, 12:30 p.m. UTC | #1
Am 12.03.24 um 15:02 schrieb marcandre.lureau@redhat.com:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> The current post-loading code for scanout has a FIXME: it doesn't take
> the resource region/rect into account. But there is more, when adding
> blob migration support in commit f66767f75c9, I didn't realize that blob
> resources could be used for scanouts. This situationn leads to a crash
> during post-load, as they don't have an associated res->image.
> 
> virtio_gpu_do_set_scanout() handle all cases, but requires the
> associated virtio_gpu_framebuffer, which is currently not saved during
> migration.
> 
> Add a v2 of "virtio-gpu-one-scanout" with the framebuffer fields, so we
> can restore blob scanouts, as well as fixing the existing FIXME.
> 
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> Reviewed-by: Sebastian Ott <sebott@redhat.com>

Hi,
unfortunately, this broke migration from pre-9.0 to 9.0:

> vmstate_load_state_field virtio-gpu:virtio-gpu
> vmstate_load_state_field virtio-gpu-scanouts:parent_obj.enable
> vmstate_load_state_field virtio-gpu-scanouts:parent_obj.conf.max_outputs
> vmstate_load_state_field virtio-gpu-scanouts:parent_obj.scanout
> vmstate_load_state_field virtio-gpu-one-scanout:resource_id
> vmstate_load_state_field virtio-gpu-one-scanout:width
> vmstate_load_state_field virtio-gpu-one-scanout:height
> vmstate_load_state_field virtio-gpu-one-scanout:x
> vmstate_load_state_field virtio-gpu-one-scanout:y
> vmstate_load_state_field virtio-gpu-one-scanout:cursor.resource_id
> vmstate_load_state_field virtio-gpu-one-scanout:cursor.hot_x
> vmstate_load_state_field virtio-gpu-one-scanout:cursor.hot_y
> vmstate_load_state_field virtio-gpu-one-scanout:cursor.pos.x
> vmstate_load_state_field virtio-gpu-one-scanout:cursor.pos.y
> vmstate_load_state_field virtio-gpu-one-scanout:fb.format
> vmstate_load_state_field virtio-gpu-one-scanout:fb.bytes_pp
> vmstate_load_state_field virtio-gpu-one-scanout:fb.width
> vmstate_load_state_field virtio-gpu-one-scanout:fb.height
> vmstate_load_state_field virtio-gpu-one-scanout:fb.stride
> vmstate_load_state_field virtio-gpu-one-scanout:fb.offset
> qemu-system-x86_64: Missing section footer for 0000:00:02.0/virtio-gpu
> qemu-system-x86_64: Error -22 while loading VM state

It wrongly tries to load the fb fields even though they should be
guarded by version 2.

Looking at it with GDB, in vmstate_load_state(), when we come to
field->name == parent_obj.scanout, the

>                 } else if (field->flags & VMS_STRUCT) {
>                     ret = vmstate_load_state(f, field->vmsd, curr_elem,
>                                              field->vmsd->version_id);

branch will be taken and suddenly we'll have a call to
vmstate_load_state() for vmsd==vmstate_virtio_gpu_scanout with
version_id==2 rather than version_id==1, because that is
field->vmsd->version_id (i.e. the .version_id in VMStateDescription
vmstate_virtio_gpu_scanout).

Would it have been necessary to version the VMStateDescription
vmstate_virtio_gpu_scanouts too using VMS_VSTRUCT (or am I
misinterpreting the use case for that)?

Best Regards,
Fiona
Peter Xu May 1, 2024, 2:55 p.m. UTC | #2
On Tue, Apr 30, 2024 at 02:30:19PM +0200, Fiona Ebner wrote:
> Am 12.03.24 um 15:02 schrieb marcandre.lureau@redhat.com:
> > From: Marc-André Lureau <marcandre.lureau@redhat.com>
> > 
> > The current post-loading code for scanout has a FIXME: it doesn't take
> > the resource region/rect into account. But there is more, when adding
> > blob migration support in commit f66767f75c9, I didn't realize that blob
> > resources could be used for scanouts. This situationn leads to a crash
> > during post-load, as they don't have an associated res->image.
> > 
> > virtio_gpu_do_set_scanout() handle all cases, but requires the
> > associated virtio_gpu_framebuffer, which is currently not saved during
> > migration.
> > 
> > Add a v2 of "virtio-gpu-one-scanout" with the framebuffer fields, so we
> > can restore blob scanouts, as well as fixing the existing FIXME.
> > 
> > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> > Reviewed-by: Sebastian Ott <sebott@redhat.com>
> 
> Hi,
> unfortunately, this broke migration from pre-9.0 to 9.0:
> 
> > vmstate_load_state_field virtio-gpu:virtio-gpu
> > vmstate_load_state_field virtio-gpu-scanouts:parent_obj.enable
> > vmstate_load_state_field virtio-gpu-scanouts:parent_obj.conf.max_outputs
> > vmstate_load_state_field virtio-gpu-scanouts:parent_obj.scanout
> > vmstate_load_state_field virtio-gpu-one-scanout:resource_id
> > vmstate_load_state_field virtio-gpu-one-scanout:width
> > vmstate_load_state_field virtio-gpu-one-scanout:height
> > vmstate_load_state_field virtio-gpu-one-scanout:x
> > vmstate_load_state_field virtio-gpu-one-scanout:y
> > vmstate_load_state_field virtio-gpu-one-scanout:cursor.resource_id
> > vmstate_load_state_field virtio-gpu-one-scanout:cursor.hot_x
> > vmstate_load_state_field virtio-gpu-one-scanout:cursor.hot_y
> > vmstate_load_state_field virtio-gpu-one-scanout:cursor.pos.x
> > vmstate_load_state_field virtio-gpu-one-scanout:cursor.pos.y
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.format
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.bytes_pp
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.width
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.height
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.stride
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.offset
> > qemu-system-x86_64: Missing section footer for 0000:00:02.0/virtio-gpu
> > qemu-system-x86_64: Error -22 while loading VM state
> 
> It wrongly tries to load the fb fields even though they should be
> guarded by version 2.
> 
> Looking at it with GDB, in vmstate_load_state(), when we come to
> field->name == parent_obj.scanout, the
> 
> >                 } else if (field->flags & VMS_STRUCT) {
> >                     ret = vmstate_load_state(f, field->vmsd, curr_elem,
> >                                              field->vmsd->version_id);
> 
> branch will be taken and suddenly we'll have a call to
> vmstate_load_state() for vmsd==vmstate_virtio_gpu_scanout with
> version_id==2 rather than version_id==1, because that is
> field->vmsd->version_id (i.e. the .version_id in VMStateDescription
> vmstate_virtio_gpu_scanout).
> 
> Would it have been necessary to version the VMStateDescription
> vmstate_virtio_gpu_scanouts too using VMS_VSTRUCT (or am I
> misinterpreting the use case for that)?

Looks right.  And there's only one such user which is when it's introduced
in 2018.  It's sad we can't simply already use vmsd subsections even if
that was there before this VSTRUCT thing, and that should work with
internal versioning.  Maybe we introduced that because we can't replace a
VMS_STRUCT to subsections?

https://lore.kernel.org/qemu-devel/1524670052-28373-1-git-send-email-minyard@acm.org/#t

OTOH, I don't think vmsd versioning would work for ping-pong migrations.
Migrating backwards should fail with 'not supported' with vmsd versionings.
Depending on the requirement (in this virtio-gpu case, it looks like
applicable to be used in a cluster and doing back-and-forth moves?), we may
want to support bi-directional migrations which should be superior.  That
will need to stick with machine type check (compat fields in hw_compat_*,
then conditionally save/load fields) to guarantee migration can work back
and forth.

Thanks,
Marc-André Lureau May 7, 2024, 10:14 a.m. UTC | #3
Hi

On Tue, Apr 30, 2024 at 4:31 PM Fiona Ebner <f.ebner@proxmox.com> wrote:
>
> Am 12.03.24 um 15:02 schrieb marcandre.lureau@redhat.com:
> > From: Marc-André Lureau <marcandre.lureau@redhat.com>
> >
> > The current post-loading code for scanout has a FIXME: it doesn't take
> > the resource region/rect into account. But there is more, when adding
> > blob migration support in commit f66767f75c9, I didn't realize that blob
> > resources could be used for scanouts. This situationn leads to a crash
> > during post-load, as they don't have an associated res->image.
> >
> > virtio_gpu_do_set_scanout() handle all cases, but requires the
> > associated virtio_gpu_framebuffer, which is currently not saved during
> > migration.
> >
> > Add a v2 of "virtio-gpu-one-scanout" with the framebuffer fields, so we
> > can restore blob scanouts, as well as fixing the existing FIXME.
> >
> > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> > Reviewed-by: Sebastian Ott <sebott@redhat.com>
>
> Hi,
> unfortunately, this broke migration from pre-9.0 to 9.0:
>
> > vmstate_load_state_field virtio-gpu:virtio-gpu
> > vmstate_load_state_field virtio-gpu-scanouts:parent_obj.enable
> > vmstate_load_state_field virtio-gpu-scanouts:parent_obj.conf.max_outputs
> > vmstate_load_state_field virtio-gpu-scanouts:parent_obj.scanout
> > vmstate_load_state_field virtio-gpu-one-scanout:resource_id
> > vmstate_load_state_field virtio-gpu-one-scanout:width
> > vmstate_load_state_field virtio-gpu-one-scanout:height
> > vmstate_load_state_field virtio-gpu-one-scanout:x
> > vmstate_load_state_field virtio-gpu-one-scanout:y
> > vmstate_load_state_field virtio-gpu-one-scanout:cursor.resource_id
> > vmstate_load_state_field virtio-gpu-one-scanout:cursor.hot_x
> > vmstate_load_state_field virtio-gpu-one-scanout:cursor.hot_y
> > vmstate_load_state_field virtio-gpu-one-scanout:cursor.pos.x
> > vmstate_load_state_field virtio-gpu-one-scanout:cursor.pos.y
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.format
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.bytes_pp
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.width
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.height
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.stride
> > vmstate_load_state_field virtio-gpu-one-scanout:fb.offset
> > qemu-system-x86_64: Missing section footer for 0000:00:02.0/virtio-gpu
> > qemu-system-x86_64: Error -22 while loading VM state
>
> It wrongly tries to load the fb fields even though they should be
> guarded by version 2.
>
> Looking at it with GDB, in vmstate_load_state(), when we come to
> field->name == parent_obj.scanout, the
>
> >                 } else if (field->flags & VMS_STRUCT) {
> >                     ret = vmstate_load_state(f, field->vmsd, curr_elem,
> >                                              field->vmsd->version_id);
>
> branch will be taken and suddenly we'll have a call to
> vmstate_load_state() for vmsd==vmstate_virtio_gpu_scanout with
> version_id==2 rather than version_id==1, because that is
> field->vmsd->version_id (i.e. the .version_id in VMStateDescription
> vmstate_virtio_gpu_scanout).
>
> Would it have been necessary to version the VMStateDescription
> vmstate_virtio_gpu_scanouts too using VMS_VSTRUCT (or am I
> misinterpreting the use case for that)?
>

Sigh.. this is embarrassing. This patch didn't receive enough testing
and breaks pre-9.0/9.0 forward and backward migrations.

I think it would be reasonable to revert the patch, but it's already
in 9.0. Is that acceptable?

There are various issues about versioning:
- VMStateDescription (vmsd) version_id are not in the stream (only
start & full section headers), so any nested version bump should
somehow be bubbled up/down..
- machine versions don't have specific vmsd version_id associations
(that I can see, or is there any way to do that?)
- virtio-gpu uses even lower-level vmstate API and hardcodes version
to 1 in general

Ideally, I wished adding versionized fields the way done in this patch
would simply work, but this is really, sadly, not supported atm.

And it would be great to have some testing for back/forward migrations.

Any thoughts, corrections, encouragement perhaps? :)

thanks


--
Marc-André Lureau
diff mbox series

Patch

diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index b28e7ef0d2..ed44cdad6b 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -81,6 +81,7 @@  struct virtio_gpu_scanout {
     uint32_t resource_id;
     struct virtio_gpu_update_cursor cursor;
     QEMUCursor *current_cursor;
+    struct virtio_gpu_framebuffer fb;
 };
 
 struct virtio_gpu_requested_state {
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index ccbe31d759..78d5a4f164 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -600,6 +600,7 @@  static void virtio_unref_resource(pixman_image_t *image, void *data)
 static void virtio_gpu_update_scanout(VirtIOGPU *g,
                                       uint32_t scanout_id,
                                       struct virtio_gpu_simple_resource *res,
+                                      struct virtio_gpu_framebuffer *fb,
                                       struct virtio_gpu_rect *r)
 {
     struct virtio_gpu_simple_resource *ores;
@@ -617,9 +618,10 @@  static void virtio_gpu_update_scanout(VirtIOGPU *g,
     scanout->y = r->y;
     scanout->width = r->width;
     scanout->height = r->height;
+    scanout->fb = *fb;
 }
 
-static void virtio_gpu_do_set_scanout(VirtIOGPU *g,
+static bool virtio_gpu_do_set_scanout(VirtIOGPU *g,
                                       uint32_t scanout_id,
                                       struct virtio_gpu_framebuffer *fb,
                                       struct virtio_gpu_simple_resource *res,
@@ -645,7 +647,7 @@  static void virtio_gpu_do_set_scanout(VirtIOGPU *g,
                       r->x, r->y, r->width, r->height,
                       fb->width, fb->height);
         *error = VIRTIO_GPU_RESP_ERR_INVALID_PARAMETER;
-        return;
+        return false;
     }
 
     g->parent_obj.enable = 1;
@@ -653,11 +655,12 @@  static void virtio_gpu_do_set_scanout(VirtIOGPU *g,
     if (res->blob) {
         if (console_has_gl(scanout->con)) {
             if (!virtio_gpu_update_dmabuf(g, scanout_id, res, fb, r)) {
-                virtio_gpu_update_scanout(g, scanout_id, res, r);
+                virtio_gpu_update_scanout(g, scanout_id, res, fb, r);
             } else {
                 *error = VIRTIO_GPU_RESP_ERR_OUT_OF_MEMORY;
+                return false;
             }
-            return;
+            return true;
         }
 
         data = res->blob;
@@ -693,7 +696,8 @@  static void virtio_gpu_do_set_scanout(VirtIOGPU *g,
                                 scanout->ds);
     }
 
-    virtio_gpu_update_scanout(g, scanout_id, res, r);
+    virtio_gpu_update_scanout(g, scanout_id, res, fb, r);
+    return true;
 }
 
 static void virtio_gpu_set_scanout(VirtIOGPU *g,
@@ -1164,7 +1168,8 @@  static void virtio_gpu_cursor_bh(void *opaque)
 
 static const VMStateDescription vmstate_virtio_gpu_scanout = {
     .name = "virtio-gpu-one-scanout",
-    .version_id = 1,
+    .version_id = 2,
+    .minimum_version_id = 1,
     .fields = (const VMStateField[]) {
         VMSTATE_UINT32(resource_id, struct virtio_gpu_scanout),
         VMSTATE_UINT32(width, struct virtio_gpu_scanout),
@@ -1176,6 +1181,12 @@  static const VMStateDescription vmstate_virtio_gpu_scanout = {
         VMSTATE_UINT32(cursor.hot_y, struct virtio_gpu_scanout),
         VMSTATE_UINT32(cursor.pos.x, struct virtio_gpu_scanout),
         VMSTATE_UINT32(cursor.pos.y, struct virtio_gpu_scanout),
+        VMSTATE_UINT32_V(fb.format, struct virtio_gpu_scanout, 2),
+        VMSTATE_UINT32_V(fb.bytes_pp, struct virtio_gpu_scanout, 2),
+        VMSTATE_UINT32_V(fb.width, struct virtio_gpu_scanout, 2),
+        VMSTATE_UINT32_V(fb.height, struct virtio_gpu_scanout, 2),
+        VMSTATE_UINT32_V(fb.stride, struct virtio_gpu_scanout, 2),
+        VMSTATE_UINT32_V(fb.offset, struct virtio_gpu_scanout, 2),
         VMSTATE_END_OF_LIST()
     },
 };
@@ -1347,6 +1358,7 @@  static int virtio_gpu_blob_save(QEMUFile *f, void *opaque, size_t size,
         if (!res->blob_size) {
             continue;
         }
+        assert(!res->image);
         qemu_put_be32(f, res->resource_id);
         qemu_put_be32(f, res->blob_size);
         qemu_put_be32(f, res->iov_cnt);
@@ -1409,21 +1421,40 @@  static int virtio_gpu_post_load(void *opaque, int version_id)
     int i;
 
     for (i = 0; i < g->parent_obj.conf.max_outputs; i++) {
-        /* FIXME: should take scanout.r.{x,y} into account */
         scanout = &g->parent_obj.scanout[i];
         if (!scanout->resource_id) {
             continue;
         }
+
         res = virtio_gpu_find_resource(g, scanout->resource_id);
         if (!res) {
             return -EINVAL;
         }
-        scanout->ds = qemu_create_displaysurface_pixman(res->image);
+
+        if (scanout->fb.format != 0) {
+            uint32_t error = 0;
+            struct virtio_gpu_rect r = {
+                .x = scanout->x,
+                .y = scanout->y,
+                .width = scanout->width,
+                .height = scanout->height
+            };
+
+            if (!virtio_gpu_do_set_scanout(g, i, &scanout->fb, res, &r, &error)) {
+                return -EINVAL;
+            }
+        } else {
+            /* legacy v1 migration support */
+            if (!res->image) {
+                return -EINVAL;
+            }
+            scanout->ds = qemu_create_displaysurface_pixman(res->image);
 #ifdef WIN32
-        qemu_displaysurface_win32_set_handle(scanout->ds, res->handle, 0);
+            qemu_displaysurface_win32_set_handle(scanout->ds, res->handle, 0);
 #endif
+            dpy_gfx_replace_surface(scanout->con, scanout->ds);
+        }
 
-        dpy_gfx_replace_surface(scanout->con, scanout->ds);
         dpy_gfx_update_full(scanout->con);
         if (scanout->cursor.resource_id) {
             update_cursor(g, &scanout->cursor);