diff mbox

[RFC,4/7] vhost: set vring endianness for legacy virtio

Message ID 20150506120801.8607.64278.stgit@bahia.huguette.org
State New
Headers show

Commit Message

Greg Kurz May 6, 2015, 12:08 p.m. UTC
Legacy virtio is native endian: if the guest and host endianness differ,
we have to tell vhost so it can swap bytes where appropriate. This is
done through a vhost ring ioctl.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
---
 hw/virtio/vhost.c |   50 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

Comments

Cornelia Huck May 12, 2015, 1:25 p.m. UTC | #1
On Wed, 06 May 2015 14:08:02 +0200
Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:

> Legacy virtio is native endian: if the guest and host endianness differ,
> we have to tell vhost so it can swap bytes where appropriate. This is
> done through a vhost ring ioctl.
> 
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> ---
>  hw/virtio/vhost.c |   50 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 54851b7..1d7b939 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
(...)
> @@ -677,6 +700,16 @@ static int vhost_virtqueue_start(struct vhost_dev *dev,
>          return -errno;
>      }
> 
> +    if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) &&

I think this should either go in after the virtio-1 base support (more
feature bits etc.) or get a big fat comment and be touched up later.
I'd prefer the first solution so it does not get forgotten, but I'm not
sure when Michael plans to proceed with the virtio-1 patches (I think
they're mostly fine already).

> +        virtio_legacy_is_cross_endian(vdev)) {
> +        r = vhost_virtqueue_set_vring_endian_legacy(dev,
> +                                                    virtio_is_big_endian(vdev),
> +                                                    vhost_vq_index);
> +        if (r) {
> +            return -errno;
> +        }
> +    }
> +
>      s = l = virtio_queue_get_desc_size(vdev, idx);
>      a = virtio_queue_get_desc_addr(vdev, idx);
>      vq->desc = cpu_physical_memory_map(a, &l, 0);
Michael S. Tsirkin May 12, 2015, 3:15 p.m. UTC | #2
On Tue, May 12, 2015 at 03:25:30PM +0200, Cornelia Huck wrote:
> On Wed, 06 May 2015 14:08:02 +0200
> Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:
> 
> > Legacy virtio is native endian: if the guest and host endianness differ,
> > we have to tell vhost so it can swap bytes where appropriate. This is
> > done through a vhost ring ioctl.
> > 
> > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > ---
> >  hw/virtio/vhost.c |   50 +++++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 49 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index 54851b7..1d7b939 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> (...)
> > @@ -677,6 +700,16 @@ static int vhost_virtqueue_start(struct vhost_dev *dev,
> >          return -errno;
> >      }
> > 
> > +    if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) &&
> 
> I think this should either go in after the virtio-1 base support (more
> feature bits etc.) or get a big fat comment and be touched up later.
> I'd prefer the first solution so it does not get forgotten, but I'm not
> sure when Michael plans to proceed with the virtio-1 patches (I think
> they're mostly fine already).

There are three main issues with virtio 1 patches that I am aware of.

One issue with virtio 1 patches as they are is with how features are
handled ATM.  There are 3 types of features

	a. virtio 1 only features
	b. virtio 0 only features
	c. shared features

and 3 types of devices
	a. legacy device: has b+c features
	b. modern device: has a+c features
	c. transitional device: has a+c features but exposes
	   only c through the legacy interface


So I think a callback that gets features depending on guest
version isn't a good way to model it because fundamentally device
has one set of features.
A better way to model this is really just a single
host_features bitmask, and for transitional devices, a mask
hiding a features - which are so far all bits > 31, so maybe
for now we can just have a global mask.

We need to validate features at initialization time and make
sure they make sense, fail if not (sometimes we need to mask
features if they don't make sense - this is unfortunate
but might be needed for compatibility).

Moving host_features to virtio core would make all of the above
easier.


Second issue is migration, some of it is with migrating the new
features, so that's tied to the first one.


Third issue is fixing devices so they don't try to
access guest memory until DRIVER_OK is set.
This is surprisingly hard to do generally given need to support old
drivers which don't set DRIVER_OK or set it very late, and the fact that
we tied work-arounds for even older drivers which dont' set pci bus
master to the DRIVER_OK bit. I tried, and I'm close to giving up and
just checking guest ack for virtio 1, and ignoring DRIVER_OK requirement
if not there.



> > +        virtio_legacy_is_cross_endian(vdev)) {
> > +        r = vhost_virtqueue_set_vring_endian_legacy(dev,
> > +                                                    virtio_is_big_endian(vdev),
> > +                                                    vhost_vq_index);
> > +        if (r) {
> > +            return -errno;
> > +        }
> > +    }
> > +
> >      s = l = virtio_queue_get_desc_size(vdev, idx);
> >      a = virtio_queue_get_desc_addr(vdev, idx);
> >      vq->desc = cpu_physical_memory_map(a, &l, 0);
Cornelia Huck May 12, 2015, 4:25 p.m. UTC | #3
On Tue, 12 May 2015 17:15:53 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, May 12, 2015 at 03:25:30PM +0200, Cornelia Huck wrote:
> > On Wed, 06 May 2015 14:08:02 +0200
> > Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:
> > 
> > > Legacy virtio is native endian: if the guest and host endianness differ,
> > > we have to tell vhost so it can swap bytes where appropriate. This is
> > > done through a vhost ring ioctl.
> > > 
> > > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > > ---
> > >  hw/virtio/vhost.c |   50 +++++++++++++++++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 49 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > index 54851b7..1d7b939 100644
> > > --- a/hw/virtio/vhost.c
> > > +++ b/hw/virtio/vhost.c
> > (...)
> > > @@ -677,6 +700,16 @@ static int vhost_virtqueue_start(struct vhost_dev *dev,
> > >          return -errno;
> > >      }
> > > 
> > > +    if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) &&
> > 
> > I think this should either go in after the virtio-1 base support (more
> > feature bits etc.) or get a big fat comment and be touched up later.
> > I'd prefer the first solution so it does not get forgotten, but I'm not
> > sure when Michael plans to proceed with the virtio-1 patches (I think
> > they're mostly fine already).
> 
> There are three main issues with virtio 1 patches that I am aware of.
> 
> One issue with virtio 1 patches as they are is with how features are
> handled ATM.  There are 3 types of features
> 
> 	a. virtio 1 only features
> 	b. virtio 0 only features
> 	c. shared features
> 
> and 3 types of devices
> 	a. legacy device: has b+c features
> 	b. modern device: has a+c features
> 	c. transitional device: has a+c features but exposes
> 	   only c through the legacy interface

Wouldn't a transitional device be able to expose b as well?

> 
> 
> So I think a callback that gets features depending on guest
> version isn't a good way to model it because fundamentally device
> has one set of features.
> A better way to model this is really just a single
> host_features bitmask, and for transitional devices, a mask
> hiding a features - which are so far all bits > 31, so maybe
> for now we can just have a global mask.

How would this work for transitional presenting a modern device - would
you have a superset of bits and masks for legacy and modern?

> 
> We need to validate features at initialization time and make
> sure they make sense, fail if not (sometimes we need to mask
> features if they don't make sense - this is unfortunate
> but might be needed for compatibility).
> 
> Moving host_features to virtio core would make all of the above
> easier.

I have started hacking up code that moves host_features, but I'm quite
lost with all the different virtio versions floating around. Currently
trying against master, but that of course ignores the virtio-1 issues.

> 
> 
> Second issue is migration, some of it is with migrating the new
> features, so that's tied to the first one.

There's also the used and avail addresses, but that kind of follows
from virtio-1 support.

> 
> 
> Third issue is fixing devices so they don't try to
> access guest memory until DRIVER_OK is set.
> This is surprisingly hard to do generally given need to support old
> drivers which don't set DRIVER_OK or set it very late, and the fact that
> we tied work-arounds for even older drivers which dont' set pci bus
> master to the DRIVER_OK bit. I tried, and I'm close to giving up and
> just checking guest ack for virtio 1, and ignoring DRIVER_OK requirement
> if not there.

If legacy survived like it is until now, it might be best to focus on
modern devices for this.
Michael S. Tsirkin May 12, 2015, 4:40 p.m. UTC | #4
On Tue, May 12, 2015 at 06:25:32PM +0200, Cornelia Huck wrote:
> On Tue, 12 May 2015 17:15:53 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Tue, May 12, 2015 at 03:25:30PM +0200, Cornelia Huck wrote:
> > > On Wed, 06 May 2015 14:08:02 +0200
> > > Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:
> > > 
> > > > Legacy virtio is native endian: if the guest and host endianness differ,
> > > > we have to tell vhost so it can swap bytes where appropriate. This is
> > > > done through a vhost ring ioctl.
> > > > 
> > > > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > > > ---
> > > >  hw/virtio/vhost.c |   50 +++++++++++++++++++++++++++++++++++++++++++++++++-
> > > >  1 file changed, 49 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > > index 54851b7..1d7b939 100644
> > > > --- a/hw/virtio/vhost.c
> > > > +++ b/hw/virtio/vhost.c
> > > (...)
> > > > @@ -677,6 +700,16 @@ static int vhost_virtqueue_start(struct vhost_dev *dev,
> > > >          return -errno;
> > > >      }
> > > > 
> > > > +    if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) &&
> > > 
> > > I think this should either go in after the virtio-1 base support (more
> > > feature bits etc.) or get a big fat comment and be touched up later.
> > > I'd prefer the first solution so it does not get forgotten, but I'm not
> > > sure when Michael plans to proceed with the virtio-1 patches (I think
> > > they're mostly fine already).
> > 
> > There are three main issues with virtio 1 patches that I am aware of.
> > 
> > One issue with virtio 1 patches as they are is with how features are
> > handled ATM.  There are 3 types of features
> > 
> > 	a. virtio 1 only features
> > 	b. virtio 0 only features
> > 	c. shared features
> > 
> > and 3 types of devices
> > 	a. legacy device: has b+c features
> > 	b. modern device: has a+c features
> > 	c. transitional device: has a+c features but exposes
> > 	   only c through the legacy interface
> 
> Wouldn't a transitional device be able to expose b as well?

No because the virtio 1 spec says it shouldn't.


> > 
> > 
> > So I think a callback that gets features depending on guest
> > version isn't a good way to model it because fundamentally device
> > has one set of features.
> > A better way to model this is really just a single
> > host_features bitmask, and for transitional devices, a mask
> > hiding a features - which are so far all bits > 31, so maybe
> > for now we can just have a global mask.
> 
> How would this work for transitional presenting a modern device - would
> you have a superset of bits and masks for legacy and modern?

Basically we expose through modern interface a superset
of bits exposed through legacy.
F_BAD for pci is probably the only exception.


> > 
> > We need to validate features at initialization time and make
> > sure they make sense, fail if not (sometimes we need to mask
> > features if they don't make sense - this is unfortunate
> > but might be needed for compatibility).
> > 
> > Moving host_features to virtio core would make all of the above
> > easier.
> 
> I have started hacking up code that moves host_features, but I'm quite
> lost with all the different virtio versions floating around. Currently
> trying against master, but that of course ignores the virtio-1 issues.

Yes, I think we should focus on infrastructure cleanups in master first.

> > 
> > 
> > Second issue is migration, some of it is with migrating the new
> > features, so that's tied to the first one.
> 
> There's also the used and avail addresses, but that kind of follows
> from virtio-1 support.
> 
> > 
> > 
> > Third issue is fixing devices so they don't try to
> > access guest memory until DRIVER_OK is set.
> > This is surprisingly hard to do generally given need to support old
> > drivers which don't set DRIVER_OK or set it very late, and the fact that
> > we tied work-arounds for even older drivers which dont' set pci bus
> > master to the DRIVER_OK bit. I tried, and I'm close to giving up and
> > just checking guest ack for virtio 1, and ignoring DRIVER_OK requirement
> > if not there.
> 
> If legacy survived like it is until now, it might be best to focus on
> modern devices for this.

I'm kind of unhappy that it's up to guest though as that controls
whether we run in modern mode. But yea.
Cornelia Huck May 13, 2015, 10:39 a.m. UTC | #5
On Tue, 12 May 2015 18:40:35 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, May 12, 2015 at 06:25:32PM +0200, Cornelia Huck wrote:
> > On Tue, 12 May 2015 17:15:53 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:

> > > One issue with virtio 1 patches as they are is with how features are
> > > handled ATM.  There are 3 types of features
> > > 
> > > 	a. virtio 1 only features
> > > 	b. virtio 0 only features
> > > 	c. shared features
> > > 
> > > and 3 types of devices
> > > 	a. legacy device: has b+c features
> > > 	b. modern device: has a+c features
> > > 	c. transitional device: has a+c features but exposes
> > > 	   only c through the legacy interface
> > 
> > Wouldn't a transitional device be able to expose b as well?
> 
> No because the virtio 1 spec says it shouldn't.

Can you point me to the part where it says that? Can't seem to find it.

> > > So I think a callback that gets features depending on guest
> > > version isn't a good way to model it because fundamentally device
> > > has one set of features.
> > > A better way to model this is really just a single
> > > host_features bitmask, and for transitional devices, a mask
> > > hiding a features - which are so far all bits > 31, so maybe
> > > for now we can just have a global mask.
> > 
> > How would this work for transitional presenting a modern device - would
> > you have a superset of bits and masks for legacy and modern?
> 
> Basically we expose through modern interface a superset
> of bits exposed through legacy.
> F_BAD for pci is probably the only exception.

ccw has F_BAD as well.

NOTIFY_ON_EMPTY is in the MAY be offered category.

I'm still wondering about the old legacy bits. Transitional devices not
offering them is easiest to implement from the host side, but I'm
wondering if all legacy devices will be able to function with missing
bits? Is breaking some legacy drivers OK?
Michael S. Tsirkin May 13, 2015, 11:22 a.m. UTC | #6
On Wed, May 13, 2015 at 12:39:07PM +0200, Cornelia Huck wrote:
> On Tue, 12 May 2015 18:40:35 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Tue, May 12, 2015 at 06:25:32PM +0200, Cornelia Huck wrote:
> > > On Tue, 12 May 2015 17:15:53 +0200
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > > > One issue with virtio 1 patches as they are is with how features are
> > > > handled ATM.  There are 3 types of features
> > > > 
> > > > 	a. virtio 1 only features
> > > > 	b. virtio 0 only features
> > > > 	c. shared features
> > > > 
> > > > and 3 types of devices
> > > > 	a. legacy device: has b+c features
> > > > 	b. modern device: has a+c features
> > > > 	c. transitional device: has a+c features but exposes
> > > > 	   only c through the legacy interface
> > > 
> > > Wouldn't a transitional device be able to expose b as well?
> > 
> > No because the virtio 1 spec says it shouldn't.
> 
> Can you point me to the part where it says that? Can't seem to find it.


What I mean is that these features can't be operated through the modern
interface.
Consider the BLK SCSI command feature. If you set it you really
expect guest to be able to e.g. use your device as a cd writer.
So, a legacy guest is able to do this, but a modern guest isn't?
Doesn't sound like an expected outcome to me.


So the principle of the least surprise dictates that we
- disable the feature by default for everyone
- add flag to enable the feature, available for legacy devices only


> > > > So I think a callback that gets features depending on guest
> > > > version isn't a good way to model it because fundamentally device
> > > > has one set of features.
> > > > A better way to model this is really just a single
> > > > host_features bitmask, and for transitional devices, a mask
> > > > hiding a features - which are so far all bits > 31, so maybe
> > > > for now we can just have a global mask.
> > > 
> > > How would this work for transitional presenting a modern device - would
> > > you have a superset of bits and masks for legacy and modern?
> > 
> > Basically we expose through modern interface a superset
> > of bits exposed through legacy.
> > F_BAD for pci is probably the only exception.
> 
> ccw has F_BAD as well.

It does but it ignores it, so I think setting that bit
in host features is a bug. PCI uses it to detect very old
guests that just acked all features without looking
at them. ccw never had such guests.

> NOTIFY_ON_EMPTY is in the MAY be offered category.

Right. Current guests simply ignore this feature bit,
so I don't much care what we do with this bit as long as
- pci exposes this on the legacy interface
- we don't spend many lines of code working around that special case

> I'm still wondering about the old legacy bits. Transitional devices not
> offering them is easiest to implement from the host side, but I'm
> wondering if all legacy devices will be able to function with missing
> bits? Is breaking some legacy drivers OK?

We should add flags to re-enable them, for legacy devices only.
diff mbox

Patch

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 54851b7..1d7b939 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -17,9 +17,11 @@ 
 #include "hw/hw.h"
 #include "qemu/atomic.h"
 #include "qemu/range.h"
+#include "qemu/error-report.h"
 #include <linux/vhost.h>
 #include "exec/address-spaces.h"
 #include "hw/virtio/virtio-bus.h"
+#include "hw/virtio/virtio-access.h"
 #include "migration/migration.h"
 
 static void vhost_dev_sync_region(struct vhost_dev *dev,
@@ -647,6 +649,27 @@  static void vhost_log_stop(MemoryListener *listener,
     /* FIXME: implement */
 }
 
+static int vhost_virtqueue_set_vring_endian_legacy(struct vhost_dev *dev,
+                                                   bool is_big_endian,
+                                                   int vhost_vq_index)
+{
+    struct vhost_vring_state s = {
+        .index = vhost_vq_index,
+        .num = is_big_endian
+    };
+
+    if (!dev->vhost_ops->vhost_call(dev, VHOST_SET_VRING_ENDIAN, &s)) {
+        return 0;
+    }
+
+    if (errno == ENOTTY) {
+        error_report("vhost does not support cross-endian");
+        return -ENOSYS;
+    }
+
+    return -errno;
+}
+
 static int vhost_virtqueue_start(struct vhost_dev *dev,
                                 struct VirtIODevice *vdev,
                                 struct vhost_virtqueue *vq,
@@ -677,6 +700,16 @@  static int vhost_virtqueue_start(struct vhost_dev *dev,
         return -errno;
     }
 
+    if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) &&
+        virtio_legacy_is_cross_endian(vdev)) {
+        r = vhost_virtqueue_set_vring_endian_legacy(dev,
+                                                    virtio_is_big_endian(vdev),
+                                                    vhost_vq_index);
+        if (r) {
+            return -errno;
+        }
+    }
+
     s = l = virtio_queue_get_desc_size(vdev, idx);
     a = virtio_queue_get_desc_addr(vdev, idx);
     vq->desc = cpu_physical_memory_map(a, &l, 0);
@@ -747,8 +780,9 @@  static void vhost_virtqueue_stop(struct vhost_dev *dev,
                                     struct vhost_virtqueue *vq,
                                     unsigned idx)
 {
+    int vhost_vq_index = idx - dev->vq_index;
     struct vhost_vring_state state = {
-        .index = idx - dev->vq_index
+        .index = vhost_vq_index,
     };
     int r;
     assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
@@ -759,6 +793,20 @@  static void vhost_virtqueue_stop(struct vhost_dev *dev,
     }
     virtio_queue_set_last_avail_idx(vdev, idx, state.num);
     virtio_queue_invalidate_signalled_used(vdev, idx);
+
+    /* In the cross-endian case, we need to reset the vring endianness to
+     * native as legacy devices expect so by default.
+     */
+    if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) &&
+        virtio_legacy_is_cross_endian(vdev)) {
+        r = vhost_virtqueue_set_vring_endian_legacy(dev,
+                                                    !virtio_is_big_endian(vdev),
+                                                    vhost_vq_index);
+        if (r < 0) {
+            error_report("failed to reset vring endianness");
+        }
+    }
+
     assert (r >= 0);
     cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
                               0, virtio_queue_get_ring_size(vdev, idx));