
[RFC,v2,11/12] Add virtio-gpu vhost-user backend

Message ID 20180601162749.27406-12-marcandre.lureau@redhat.com
State New
Series vhost-user for input & GPU

Commit Message

Marc-André Lureau June 1, 2018, 4:27 p.m. UTC
Add to virtio-gpu devices a "vhost-user" property. When set, the
associated vhost-user backend is used to handle the virtio rings.

For now, a socketpair is created for the backend to share the rendering
results with qemu via a simple VHOST_GPU protocol.

Example usage:
-object vhost-user-backend,id=vug,cmd="./vhost-user-gpu"
-device virtio-vga,virgl=true,vhost-user=vug

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 include/hw/virtio/virtio-gpu.h |   9 +
 include/ui/console.h           |   1 +
 hw/display/vhost-gpu.c         | 290 +++++++++++++++++++++++++++++++++
 hw/display/virtio-gpu-3d.c     |   8 +-
 hw/display/virtio-gpu-pci.c    |   5 +
 hw/display/virtio-gpu.c        |  77 ++++++++-
 hw/display/virtio-vga.c        |   5 +
 ui/spice-display.c             |   3 +-
 hw/display/Makefile.objs       |   2 +-
 9 files changed, 393 insertions(+), 7 deletions(-)
 create mode 100644 hw/display/vhost-gpu.c
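The VHOST_GPU protocol mentioned in the commit message is a simple framed stream: a 32-bit request code and a 32-bit payload size, followed by the payload bytes (see the VhostGpuMsg structs in the patch below). As a rough illustration only — the frame_update() helper and the in-memory buffer are hypothetical, not part of the patch — the backend side of a VHOST_GPU_UPDATE message could be serialized like this:

```c
#include <stdint.h>
#include <string.h>

/* Standalone copies of the wire format from the patch: a 4-byte request,
 * a 4-byte payload size, then the payload bytes (QEMU_PACKED in the patch). */
typedef struct __attribute__((packed)) {
    uint32_t request;   /* VhostGpuRequest */
    uint32_t size;      /* size of the payload that follows */
} VhostGpuHdr;

typedef struct __attribute__((packed)) {
    uint32_t scanout_id;
    uint32_t x, y, width, height;
    /* uint8_t data[] follows on the wire */
} VhostGpuUpdateHdr;

enum { VHOST_GPU_UPDATE = 5 }; /* fifth entry of the patch's enum */

/* Hypothetical helper: serialize one VHOST_GPU_UPDATE frame into buf,
 * returning the total frame length.  A real backend would write this
 * straight to the socketpair instead. */
static size_t frame_update(uint8_t *buf, uint32_t scanout,
                           uint32_t x, uint32_t y, uint32_t w, uint32_t h,
                           const uint8_t *pixels, size_t npixels)
{
    VhostGpuHdr hdr = {
        .request = VHOST_GPU_UPDATE,
        .size = (uint32_t)(sizeof(VhostGpuUpdateHdr) + npixels),
    };
    VhostGpuUpdateHdr up = { scanout, x, y, w, h };

    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), &up, sizeof(up));
    memcpy(buf + sizeof(hdr) + sizeof(up), pixels, npixels);
    return sizeof(hdr) + sizeof(up) + npixels;
}
```

This matches the reader in the patch: vhost_gpu_chr_read() first reads the request and size words, then allocates and reads exactly `size` payload bytes.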

Comments

Gerd Hoffmann June 4, 2018, 9:37 a.m. UTC | #1
On Fri, Jun 01, 2018 at 06:27:48PM +0200, Marc-André Lureau wrote:
> Add to virtio-gpu devices a "vhost-user" property. When set, the
> associated vhost-user backend is used to handle the virtio rings.
> 
> For now, a socketpair is created for the backend to share the rendering
> results with qemu via a simple VHOST_GPU protocol.

Why isn't this a separate device, like vhost-user-input-pci?

> +typedef struct VhostGpuUpdate {
> +    uint32_t scanout_id;
> +    uint32_t x;
> +    uint32_t y;
> +    uint32_t width;
> +    uint32_t height;
> +    uint8_t data[];
> +} QEMU_PACKED VhostGpuUpdate;

Hmm, when designing a new protocol I think we can do better than just
squeezing the pixels into a TCP stream.  Use shared memory instead?  Due
to vhost we are limited to linux anyway, so we might even consider stuff
like dmabufs here.

cheers,
  Gerd
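Gerd's shared-memory suggestion could look roughly like the sketch below: the backend allocates the framebuffer in a memfd and passes the fd over the socket (via SCM_RIGHTS), so only damage rectangles, not pixel data, cross the stream. Everything here — create_shared_fb() and the "vhost-gpu-fb" name — is hypothetical, not part of the patch:

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical: allocate a shareable framebuffer of the given size in an
 * anonymous memfd.  The returned fd could be sent to qemu with SCM_RIGHTS;
 * both sides then mmap() the same pages, so an update message only needs
 * to carry the damage rectangle, not the pixels themselves. */
static int create_shared_fb(size_t size)
{
    int fd = memfd_create("vhost-gpu-fb", MFD_CLOEXEC); /* Linux-only, as is vhost */

    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A dmabuf would follow the same fd-passing pattern, which is why the patch's VHOST_GPU_GL_SCANOUT handler already retrieves an fd with qemu_chr_fe_get_msgfd().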
Marc-André Lureau June 8, 2018, 5:25 p.m. UTC | #2
Hi

On Mon, Jun 4, 2018 at 11:37 AM, Gerd Hoffmann <kraxel@redhat.com> wrote:
> On Fri, Jun 01, 2018 at 06:27:48PM +0200, Marc-André Lureau wrote:
>> Add to virtio-gpu devices a "vhost-user" property. When set, the
>> associated vhost-user backend is used to handle the virtio rings.
>>
>> For now, a socketpair is created for the backend to share the rendering
>> results with qemu via a simple VHOST_GPU protocol.
>
> Why isn't this a separate device, like vhost-user-input-pci?

Ok, let's have vhost-user-gpu-pci and vhost-user-vga, inheriting from
existing devices.

>> +typedef struct VhostGpuUpdate {
>> +    uint32_t scanout_id;
>> +    uint32_t x;
>> +    uint32_t y;
>> +    uint32_t width;
>> +    uint32_t height;
>> +    uint8_t data[];
>> +} QEMU_PACKED VhostGpuUpdate;
>
> Hmm, when designing a new protocol I think we can do better than just
> squeezing the pixels into a TCP stream.  Use shared memory instead?  Due
> to vhost we are limited to linux anyway, so we might even consider stuff
> like dmabufs here.

Well, my goal is not to invent a new spice or wayland protocol :) I
don't care much about 2d performance at this point, more about 3d. Can
we leave 2d improvements for another day? Besides, what would dmabuf
bring us for 2d compared to shmem?

There seems to be a lot of overhead with the roundtrip vhost-user ->
qemu -> spice worker -> spice client -> wayland/x11 -> gpu already
(but this isn't necessarily so bad at 60fps or less).
Ideally, I would like to bypass qemu & spice for local rendering, but
I don't think wayland supports that kind of nested window composition
(at least, tracking the messages of weston --nested doesn't show that
kind of optimization).

FWIW, here are some Unigine Heaven 4.0 benchmarks (probably within +-10%):

qemu-gtk/egl+virtio-gpu: fps:2.6 / score: 64
qemu-gtk/egl+vhost-user-gpu: fps:12.9 / score: 329

spice+virtio-gpu: fps:2.8 / score: 70
spice+vhost-user-gpu: fps:12.1 / score: 304

That should give an extra motivation :)
Marc-André Lureau June 9, 2018, 1:02 a.m. UTC | #3
On Fri, Jun 8, 2018 at 7:25 PM, Marc-André Lureau
<marcandre.lureau@gmail.com> wrote:
> Hi
>
> On Mon, Jun 4, 2018 at 11:37 AM, Gerd Hoffmann <kraxel@redhat.com> wrote:
>> On Fri, Jun 01, 2018 at 06:27:48PM +0200, Marc-André Lureau wrote:
>>> Add to virtio-gpu devices a "vhost-user" property. When set, the
>>> associated vhost-user backend is used to handle the virtio rings.
>>>
>>> For now, a socketpair is created for the backend to share the rendering
>>> results with qemu via a simple VHOST_GPU protocol.
>>
>> Why isn't this a separate device, like vhost-user-input-pci?
>
> Ok, let's have vhost-user-gpu-pci and vhost-user-vga, inheriting from
> existing devices.
>
>>> +typedef struct VhostGpuUpdate {
>>> +    uint32_t scanout_id;
>>> +    uint32_t x;
>>> +    uint32_t y;
>>> +    uint32_t width;
>>> +    uint32_t height;
>>> +    uint8_t data[];
>>> +} QEMU_PACKED VhostGpuUpdate;
>>
>> Hmm, when designing a new protocol I think we can do better than just
>> squeezing the pixels into a TCP stream.  Use shared memory instead?  Due
>> to vhost we are limited to linux anyway, so we might even consider stuff
>> like dmabufs here.
>
> Well, my goal is not to invent a new spice or wayland protocol :) I
> don't care much about 2d performance at this point, more about 3d. Can
> we leave 2d improvements for another day? Besides, what would dmabuf
> bring us for 2d compared to shmem?
>
> There seems to be a lot of overhead with the roundtrip vhost-user ->
> qemu -> spice worker -> spice client -> wayland/x11 -> gpu already
> (but this isn't necessarily so bad at 60fps or less).
> Ideally, I would like to bypass qemu & spice for local rendering, but
> I don't think wayland supports that kind of nested window composition
> (at least, tracking the messages of weston --nested doesn't show that
> kind of optimization).
>
> FWIW, here are some Unigine Heaven 4.0 benchmarks (probably within +-10%):
>
> qemu-gtk/egl+virtio-gpu: fps:2.6 / score: 64
> qemu-gtk/egl+vhost-user-gpu: fps:12.9 / score: 329
>
> spice+virtio-gpu: fps:2.8 / score: 70
> spice+vhost-user-gpu: fps:12.1 / score: 304
>
> That should give an extra motivation :)
>

(host is fps:31.1 / score:784)
Gerd Hoffmann June 11, 2018, 6:49 a.m. UTC | #4
> >> For now, a socketpair is created for the backend to share the rendering
> >> results with qemu via a simple VHOST_GPU protocol.
> >
> > Why isn't this a separate device, like vhost-user-input-pci?
> 
> Ok, let's have vhost-user-gpu-pci and vhost-user-vga, inheriting from
> existing devices.

I'd tend to create separate devices instead of inheriting from the
existing devices.  Aren't the code paths more or less completely
different?  What code is shared between builtin and vhost-user versions
of the devices?

> >> +typedef struct VhostGpuUpdate {
> >> +    uint32_t scanout_id;
> >> +    uint32_t x;
> >> +    uint32_t y;
> >> +    uint32_t width;
> >> +    uint32_t height;
> >> +    uint8_t data[];
> >> +} QEMU_PACKED VhostGpuUpdate;
> >
> > Hmm, when designing a new protocol I think we can do better than just
> > squeezing the pixels into a TCP stream.  Use shared memory instead?  Due
> > to vhost we are limited to linux anyway, so we might even consider stuff
> > like dmabufs here.
> 
> Well, my goal is not to invent a new spice or wayland protocol :) I
> don't care much about 2d performance at this point, more about 3d. Can
> we leave 2d improvements for another day? Besides, what would dmabuf
> bring us for 2d compared to shmem?

Well, you need dma-bufs for 3d anyway, so why not use them for 2d too?
I don't think we need separate code paths for 2d vs. 3d updates.

> There seems to be a lot of overhead with the roundtrip vhost-user ->
> qemu -> spice worker -> spice client -> wayland/x11 -> gpu already
> (but this isn't necessarily so bad at 60fps or less).
> Ideally, I would like to bypass qemu & spice for local rendering, but
> I don't think wayland supports that kind of nested window composition
> (at least, tracking the messages of weston --nested doesn't show that
> kind of optimization).

Yep, a direct vhost-user -> wayland path makes sense.  Using dma-bufs
for both 2d and 3d should simplify that too (again: one code path
instead of two).

What do you mean by nested window composition?

cheers,
  Gerd

Patch

diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 79bb3fb3dd..7cd514175a 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -19,6 +19,7 @@ 
 #include "ui/console.h"
 #include "hw/virtio/virtio.h"
 #include "qemu/log.h"
+#include "sysemu/vhost-user-backend.h"
 
 #include "standard-headers/linux/virtio_gpu.h"
 #define TYPE_VIRTIO_GPU "virtio-gpu-device"
@@ -88,6 +89,9 @@  struct virtio_gpu_ctrl_command {
 typedef struct VirtIOGPU {
     VirtIODevice parent_obj;
 
+    VhostUserBackend *vhost;
+    CharBackend vhost_chr;
+
     QEMUBH *ctrl_bh;
     QEMUBH *cursor_bh;
     VirtQueue *ctrl_vq;
@@ -103,6 +107,7 @@  typedef struct VirtIOGPU {
     QTAILQ_HEAD(, virtio_gpu_ctrl_command) fenceq;
 
     struct virtio_gpu_scanout scanout[VIRTIO_GPU_MAX_SCANOUTS];
+    QemuDmaBuf dmabuf[VIRTIO_GPU_MAX_SCANOUTS];
     struct virtio_gpu_requested_state req_state[VIRTIO_GPU_MAX_SCANOUTS];
 
     struct virtio_gpu_conf conf;
@@ -171,4 +176,8 @@  void virtio_gpu_virgl_reset(VirtIOGPU *g);
 void virtio_gpu_gl_block(void *opaque, bool block);
 int virtio_gpu_virgl_init(VirtIOGPU *g);
 int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g);
+
+/* vhost-gpu.c */
+int vhost_gpu_init(VirtIOGPU *g, Error **errp);
+
 #endif
diff --git a/include/ui/console.h b/include/ui/console.h
index 981b519dde..fb969caf70 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -186,6 +186,7 @@  struct QemuDmaBuf {
     uint32_t  stride;
     uint32_t  fourcc;
     uint32_t  texture;
+    bool      y0_top;
 };
 
 typedef struct DisplayChangeListenerOps {
diff --git a/hw/display/vhost-gpu.c b/hw/display/vhost-gpu.c
new file mode 100644
index 0000000000..42d9143d3d
--- /dev/null
+++ b/hw/display/vhost-gpu.c
@@ -0,0 +1,290 @@ 
+/*
+ * Virtio vhost GPU Device
+ *
+ * Copyright Red Hat, Inc. 2016
+ *
+ * Authors:
+ *     Marc-André Lureau <marcandre.lureau@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/virtio/virtio-gpu.h"
+#include "chardev/char-fe.h"
+#include "qapi/error.h"
+
+typedef enum VhostGpuRequest {
+    VHOST_GPU_NONE = 0,
+    VHOST_GPU_CURSOR_POS,
+    VHOST_GPU_CURSOR_POS_HIDE,
+    VHOST_GPU_CURSOR_UPDATE,
+    VHOST_GPU_SCANOUT,
+    VHOST_GPU_UPDATE,
+    VHOST_GPU_GL_SCANOUT,
+    VHOST_GPU_GL_UPDATE,
+} VhostGpuRequest;
+
+typedef struct VhostGpuCursorPos {
+    uint32_t scanout_id;
+    uint32_t x;
+    uint32_t y;
+} QEMU_PACKED VhostGpuCursorPos;
+
+typedef struct VhostGpuCursorUpdate {
+    VhostGpuCursorPos pos;
+    uint32_t hot_x;
+    uint32_t hot_y;
+    uint32_t data[64 * 64];
+} QEMU_PACKED VhostGpuCursorUpdate;
+
+typedef struct VhostGpuScanout {
+    uint32_t scanout_id;
+    uint32_t width;
+    uint32_t height;
+} QEMU_PACKED VhostGpuScanout;
+
+typedef struct VhostGpuGlScanout {
+    uint32_t scanout_id;
+    uint32_t x;
+    uint32_t y;
+    uint32_t width;
+    uint32_t height;
+    uint32_t fd_width;
+    uint32_t fd_height;
+    uint32_t fd_stride;
+    uint32_t fd_flags;
+    int fd_drm_fourcc;
+} QEMU_PACKED VhostGpuGlScanout;
+
+typedef struct VhostGpuUpdate {
+    uint32_t scanout_id;
+    uint32_t x;
+    uint32_t y;
+    uint32_t width;
+    uint32_t height;
+    uint8_t data[];
+} QEMU_PACKED VhostGpuUpdate;
+
+typedef struct VhostGpuMsg {
+    VhostGpuRequest request;
+    uint32_t size; /* the following payload size */
+    union {
+        VhostGpuCursorPos cursor_pos;
+        VhostGpuCursorUpdate cursor_update;
+        VhostGpuScanout scanout;
+        VhostGpuUpdate update;
+        VhostGpuGlScanout gl_scanout;
+    } payload;
+} QEMU_PACKED VhostGpuMsg;
+
+static VhostGpuMsg m __attribute__ ((unused));
+#define VHOST_GPU_HDR_SIZE (sizeof(m.request) + sizeof(m.size))
+
+static void vhost_gpu_handle_cursor(VirtIOGPU *g, VhostGpuMsg *msg)
+{
+    VhostGpuCursorPos *pos = &msg->payload.cursor_pos;
+    struct virtio_gpu_scanout *s;
+
+    if (pos->scanout_id >= g->conf.max_outputs) {
+        return;
+    }
+    s = &g->scanout[pos->scanout_id];
+
+    if (msg->request == VHOST_GPU_CURSOR_UPDATE) {
+        VhostGpuCursorUpdate *up = &msg->payload.cursor_update;
+        if (!s->current_cursor) {
+            s->current_cursor = cursor_alloc(64, 64);
+        }
+
+        s->current_cursor->hot_x = up->hot_x;
+        s->current_cursor->hot_y = up->hot_y;
+
+        memcpy(s->current_cursor->data, up->data,
+               64 * 64 * sizeof(uint32_t));
+
+        dpy_cursor_define(s->con, s->current_cursor);
+    }
+
+    dpy_mouse_set(s->con, pos->x, pos->y,
+                  msg->request != VHOST_GPU_CURSOR_POS_HIDE);
+}
+
+static void vhost_gpu_handle_display(VirtIOGPU *g, VhostGpuMsg *msg)
+{
+    struct virtio_gpu_scanout *s;
+
+    switch (msg->request) {
+    case VHOST_GPU_SCANOUT: {
+        VhostGpuScanout *m = &msg->payload.scanout;
+
+        if (m->scanout_id >= g->conf.max_outputs) {
+            return;
+        }
+        s = &g->scanout[m->scanout_id];
+
+        s->ds = qemu_create_displaysurface(m->width, m->height);
+        if (!s->ds) {
+            return;
+        }
+
+        dpy_gfx_replace_surface(s->con, s->ds);
+        break;
+    }
+    case VHOST_GPU_GL_SCANOUT: {
+        VhostGpuGlScanout *m = &msg->payload.gl_scanout;
+        int fd = qemu_chr_fe_get_msgfd(&g->vhost_chr);
+        QemuDmaBuf *dmabuf;
+
+        if (m->scanout_id >= g->conf.max_outputs) {
+            if (fd >= 0) {
+                close(fd);
+            }
+            break;
+        }
+
+        dmabuf = &g->dmabuf[m->scanout_id];
+        if (g->enable && dmabuf->fd >= 0) {
+            close(dmabuf->fd);
+            dmabuf->fd = -1;
+        }
+        if (fd == -1) {
+            dpy_gl_release_dmabuf(g->scanout[m->scanout_id].con, dmabuf);
+            break;
+        }
+        *dmabuf = (QemuDmaBuf) {
+            .fd = fd,
+            .width = m->fd_width,
+            .height = m->fd_height,
+            .stride = m->fd_stride,
+            .fourcc = m->fd_drm_fourcc,
+            .y0_top = m->fd_flags & VIRTIO_GPU_RESOURCE_FLAG_Y_0_TOP,
+        };
+        g->enable = 1;
+        dpy_gl_scanout_dmabuf(g->scanout[m->scanout_id].con, dmabuf);
+        break;
+    }
+    case VHOST_GPU_GL_UPDATE: {
+        VhostGpuUpdate *m = &msg->payload.update;
+        QemuConsole *con;
+        /* validate scanout_id before indexing g->scanout[] */
+        if (m->scanout_id >= g->conf.max_outputs ||
+            !(con = g->scanout[m->scanout_id].con)) {
+            break;
+        }
+
+        graphic_hw_gl_block(con, true);
+        dpy_gl_update(con, m->x, m->y, m->width, m->height);
+        graphic_hw_gl_block(con, false);
+        break;
+    }
+    case VHOST_GPU_UPDATE: {
+        VhostGpuUpdate *m = &msg->payload.update;
+
+        if (m->scanout_id >= g->conf.max_outputs) {
+            break;
+        }
+        s = &g->scanout[m->scanout_id];
+
+        pixman_image_t *image =
+            pixman_image_create_bits(PIXMAN_x8r8g8b8,
+                                     m->width,
+                                     m->height,
+                                     (uint32_t *)m->data,
+                                     m->width * 4);
+
+        pixman_image_composite(PIXMAN_OP_SRC,
+                               image, NULL, s->ds->image,
+                               0, 0, 0, 0, m->x, m->y, m->width, m->height);
+
+        pixman_image_unref(image);
+        dpy_gfx_update(s->con, m->x, m->y, m->width, m->height);
+        break;
+    }
+    default:
+        g_warning("unhandled message %d %d", msg->request, msg->size);
+    }
+}
+
+static void vhost_gpu_chr_read(void *opaque)
+{
+    VirtIOGPU *g = opaque;
+    VhostGpuMsg *msg = NULL;
+    VhostGpuRequest request;
+    uint32_t size;
+    int r;
+
+    r = qemu_chr_fe_read_all(&g->vhost_chr,
+                             (uint8_t *)&request, sizeof(uint32_t));
+    if (r != sizeof(uint32_t)) {
+        error_report("failed to read msg header: %d, %d", r, errno);
+        goto end;
+    }
+
+    r = qemu_chr_fe_read_all(&g->vhost_chr,
+                             (uint8_t *)&size, sizeof(uint32_t));
+    if (r != sizeof(uint32_t)) {
+        error_report("failed to read msg size");
+        goto end;
+    }
+
+    msg = g_malloc(VHOST_GPU_HDR_SIZE + size);
+    g_return_if_fail(msg != NULL);
+
+    r = qemu_chr_fe_read_all(&g->vhost_chr,
+                             (uint8_t *)&msg->payload, size);
+    if (r != size) {
+        error_report("failed to read msg payload %d != %d", r, size);
+        goto end;
+    }
+
+    msg->request = request;
+    msg->size = size;
+
+    if (request == VHOST_GPU_CURSOR_UPDATE ||
+        request == VHOST_GPU_CURSOR_POS ||
+        request == VHOST_GPU_CURSOR_POS_HIDE) {
+        vhost_gpu_handle_cursor(g, msg);
+    } else {
+        vhost_gpu_handle_display(g, msg);
+    }
+
+end:
+    g_free(msg);
+}
+
+int vhost_gpu_init(VirtIOGPU *g, Error **errp)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(g);
+    Chardev *chr;
+    int sv[2];
+
+    if (vhost_user_backend_dev_init(g->vhost, vdev, 2, errp) < 0) {
+        return -1;
+    }
+
+    if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
+        error_setg_errno(errp, errno, "socketpair() failed");
+        return -1;
+    }
+
+    chr = CHARDEV(object_new(TYPE_CHARDEV_SOCKET));
+    if (!chr || qemu_chr_add_client(chr, sv[0]) == -1) {
+        error_setg(errp, "Failed to make socket chardev");
+        object_unref(OBJECT(chr));
+        return -1;
+    }
+    if (!qemu_chr_fe_init(&g->vhost_chr, chr, errp)) {
+        object_unref(OBJECT(chr));
+        return -1;
+    }
+
+    qemu_set_fd_handler(sv[0], vhost_gpu_chr_read, NULL, g);
+
+    vhost_user_gpu_set_socket(&g->vhost->dev, sv[1]);
+
+    close(sv[1]);
+
+    return 0;
+}
diff --git a/hw/display/virtio-gpu-3d.c b/hw/display/virtio-gpu-3d.c
index 3558f38fe8..90984ea1bb 100644
--- a/hw/display/virtio-gpu-3d.c
+++ b/hw/display/virtio-gpu-3d.c
@@ -614,9 +614,13 @@  void virtio_gpu_gl_block(void *opaque, bool block)
         g->renderer_blocked--;
     }
     assert(g->renderer_blocked >= 0);
-
     if (g->renderer_blocked == 0) {
-        virtio_gpu_process_cmdq(g);
+        if (g->vhost && g->enable) {
+            uint32_t ok = 0; /* unblock renderer */
+            qemu_chr_fe_write(&g->vhost_chr, (uint8_t *)&ok, sizeof(ok));
+        } else {
+            virtio_gpu_process_cmdq(g);
+        }
     }
 }
 
diff --git a/hw/display/virtio-gpu-pci.c b/hw/display/virtio-gpu-pci.c
index cece4aa495..470bf0608c 100644
--- a/hw/display/virtio-gpu-pci.c
+++ b/hw/display/virtio-gpu-pci.c
@@ -18,6 +18,7 @@ 
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-pci.h"
 #include "hw/virtio/virtio-gpu.h"
+#include "qapi/error.h"
 
 static Property virtio_gpu_pci_properties[] = {
     DEFINE_VIRTIO_GPU_PCI_PROPERTIES(VirtIOPCIProxy),
@@ -67,6 +68,10 @@  static void virtio_gpu_initfn(Object *obj)
 
     virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
                                 TYPE_VIRTIO_GPU);
+
+    object_property_add_alias(obj, "vhost-user",
+                              OBJECT(&dev->vdev), "vhost-user",
+                              &error_abort);
 }
 
 static const TypeInfo virtio_gpu_pci_info = {
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 2dd3c3481a..0dfa08a173 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -22,6 +22,7 @@ 
 #include "migration/blocker.h"
 #include "qemu/log.h"
 #include "qapi/error.h"
+#include "qemu/error-report.h"
 
 #define VIRTIO_GPU_VM_VERSION 1
 
@@ -1184,11 +1185,16 @@  static void virtio_gpu_device_realize(DeviceState *qdev, Error **errp)
     }
 
     g->use_virgl_renderer = false;
+    if (g->vhost) {
+        /* default to backend virgl=on */
+        have_virgl = true;
+    } else {
 #if !defined(CONFIG_VIRGL) || defined(HOST_WORDS_BIGENDIAN)
-    have_virgl = false;
+        have_virgl = false;
 #else
-    have_virgl = display_opengl;
+        have_virgl = display_opengl;
 #endif
+    }
     if (!have_virgl) {
         g->conf.flags &= ~(1 << VIRTIO_GPU_FLAG_VIRGL_ENABLED);
     }
@@ -1226,6 +1232,10 @@  static void virtio_gpu_device_realize(DeviceState *qdev, Error **errp)
         g->cursor_vq = virtio_add_queue(vdev, 16, virtio_gpu_handle_cursor_cb);
     }
 
+    if (g->vhost && vhost_gpu_init(g, errp) < 0) {
+        return;
+    }
+
     g->ctrl_bh = qemu_bh_new(virtio_gpu_ctrl_bh, g);
     g->cursor_bh = qemu_bh_new(virtio_gpu_cursor_bh, g);
     QTAILQ_INIT(&g->reslist);
@@ -1253,8 +1263,27 @@  static void virtio_gpu_device_unrealize(DeviceState *qdev, Error **errp)
     }
 }
 
+static void virtio_gpu_host_user_is_busy(const Object *obj, const char *name,
+                                         Object *val, Error **errp)
+{
+    VirtIOGPU *g = VIRTIO_GPU(obj);
+
+    if (g->vhost) {
+        error_setg(errp, "can't use already busy vhost-user");
+    } else {
+        qdev_prop_allow_set_link_before_realize(obj, name, val, errp);
+    }
+}
+
 static void virtio_gpu_instance_init(Object *obj)
 {
+    VirtIOGPU *g = VIRTIO_GPU(obj);
+
+    object_property_add_link(obj, "vhost-user", TYPE_VHOST_USER_BACKEND,
+                             (Object **)&g->vhost,
+                             virtio_gpu_host_user_is_busy,
+                             OBJ_PROP_LINK_UNREF_ON_RELEASE,
+                             &error_abort);
 }
 
 static void virtio_gpu_reset(VirtIODevice *vdev)
@@ -1264,7 +1293,9 @@  static void virtio_gpu_reset(VirtIODevice *vdev)
     int i;
 
     g->enable = 0;
-
+    if (g->vhost) {
+        vhost_user_backend_stop(g->vhost);
+    }
     QTAILQ_FOREACH_SAFE(res, &g->reslist, next, tmp) {
         virtio_gpu_resource_destroy(g, res);
     }
@@ -1312,6 +1343,43 @@  static const VMStateDescription vmstate_virtio_gpu = {
     },
 };
 
+static void virtio_gpu_set_status(VirtIODevice *vdev, uint8_t val)
+{
+    VirtIOGPU *g = VIRTIO_GPU(vdev);
+
+    if (g->vhost) {
+        if (val & VIRTIO_CONFIG_S_DRIVER_OK && vdev->vm_running) {
+            vhost_user_backend_start(g->vhost);
+        } else {
+            /* TODO: forcefully gl unblock ? */
+            vhost_user_backend_stop(g->vhost);
+        }
+    }
+}
+
+static bool virtio_gpu_guest_notifier_pending(VirtIODevice *vdev, int idx)
+{
+    VirtIOGPU *g = VIRTIO_GPU(vdev);
+
+    if (!g->vhost) {
+        return false;
+    }
+
+    return vhost_virtqueue_pending(&g->vhost->dev, idx);
+}
+
+static void virtio_gpu_guest_notifier_mask(VirtIODevice *vdev, int idx,
+                                           bool mask)
+{
+    VirtIOGPU *g = VIRTIO_GPU(vdev);
+
+    if (!g->vhost) {
+        return;
+    }
+
+    vhost_virtqueue_mask(&g->vhost->dev, vdev, idx, mask);
+}
+
 static Property virtio_gpu_properties[] = {
     DEFINE_PROP_UINT32("max_outputs", VirtIOGPU, conf.max_outputs, 1),
     DEFINE_PROP_SIZE("max_hostmem", VirtIOGPU, conf.max_hostmem,
@@ -1338,6 +1406,9 @@  static void virtio_gpu_class_init(ObjectClass *klass, void *data)
     vdc->set_config = virtio_gpu_set_config;
     vdc->get_features = virtio_gpu_get_features;
     vdc->set_features = virtio_gpu_set_features;
+    vdc->set_status   = virtio_gpu_set_status;
+    vdc->guest_notifier_mask = virtio_gpu_guest_notifier_mask;
+    vdc->guest_notifier_pending = virtio_gpu_guest_notifier_pending;
 
     vdc->reset = virtio_gpu_reset;
 
diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c
index baa74ba82c..b16edd9e6d 100644
--- a/hw/display/virtio-vga.c
+++ b/hw/display/virtio-vga.c
@@ -201,6 +201,11 @@  static void virtio_vga_inst_initfn(Object *obj)
 
     virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
                                 TYPE_VIRTIO_GPU);
+
+    /* could eventually be included in qdev_alias_all_properties? */
+    object_property_add_alias(obj, "vhost-user",
+                              OBJECT(&dev->vdev), "vhost-user",
+                              &error_abort);
 }
 
 static TypeInfo virtio_vga_info = {
diff --git a/ui/spice-display.c b/ui/spice-display.c
index fe734821dd..81f08a85bc 100644
--- a/ui/spice-display.c
+++ b/ui/spice-display.c
@@ -1048,7 +1048,8 @@  static void qemu_spice_gl_update(DisplayChangeListener *dcl,
             /* note: spice server will close the fd, so hand over a dup */
             spice_qxl_gl_scanout(&ssd->qxl, dup(dmabuf->fd),
                                  dmabuf->width, dmabuf->height,
-                                 dmabuf->stride, dmabuf->fourcc, false);
+                                 dmabuf->stride, dmabuf->fourcc,
+                                 dmabuf->y0_top);
         }
         qemu_spice_gl_monitor_config(ssd, 0, 0, dmabuf->width, dmabuf->height);
         ssd->guest_dmabuf_refresh = false;
diff --git a/hw/display/Makefile.objs b/hw/display/Makefile.objs
index d907b381ae..1c6f3c6bc1 100644
--- a/hw/display/Makefile.objs
+++ b/hw/display/Makefile.objs
@@ -36,7 +36,7 @@  obj-$(CONFIG_VGA) += vga.o
 
 common-obj-$(CONFIG_QXL) += qxl.o qxl-logger.o qxl-render.o
 
-obj-$(CONFIG_VIRTIO) += virtio-gpu.o virtio-gpu-3d.o
+obj-$(CONFIG_VIRTIO) += virtio-gpu.o virtio-gpu-3d.o vhost-gpu.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio-gpu-pci.o
 obj-$(CONFIG_VIRTIO_VGA) += virtio-vga.o
 virtio-gpu.o-cflags := $(VIRGL_CFLAGS)