Message ID | 20150311180212.8119.14734.stgit@bahia.local |
---|---|
State | New |
Headers | show |
On Wed, Mar 11, 2015 at 07:04:38PM +0100, Greg Kurz wrote: > vhost is seriously broken with ppc64le guests, even in the supposedly > supported case where the host is ppc64le and we don't need cross-endian > support. > > The TX virtqueue fails to be handled by vhost and falls back to QEMU. > Despite this unexpected scenario where RX is vhost and TX is QEMU, the > guest runs well with reduced upload performances... until you reboot, > migrate, managed save or in fact any operation that causes vhost_net > to be re-started. Network connectivity is then permanantly lost for > the guest. > > TX falling back to QEMU is the result of a failed MMIO store emulation > in KVM. Debugging shows that: > > kvmppc_emulate_mmio() > | > +-> kvmppc_handle_store() > | > +-> kvm_io_bus_write() > | > +-> __kvm_io_bus_write() returns -EOPNOTSUPP > > This happens because no matching device was found: > > __kvm_io_bus_write() > | > +->kvm_iodevice_write() > | > +->ioeventfd_write() > | > +->ioeventfd_in_range() returns false for all registered vrings > > Extra debugging shows that the TX vring number (16-bit) is supposed to > be 0x0100 but QEMU passes 0x0001 to KVM... This happens *again* because > QEMU still assumes powerpc is big endian (TARGET_WORDS_BIGENDIAN) by > default. > > This patch adds an extra swap in virtio_pci_set_host_notifier_internal() > to negate the one that is done in adjust_endianness(). Since this is not > a hot path and we want to keep virtio-pci.o in common-obj, we don't care > whether the guest is bi-endian or not. > > Reported-by: Cédric Le Goater <clg@fr.ibm.com> > Suggested-by: Michael Roth <mdroth@linux.vnet.ibm.com> > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> I am confused. The value that notifications use is always LE. Can't we avoid multiple swaps? They make my head spin. > --- > > I guess it is also a fix for virtio-1 but I didn't check. > > hw/virtio/virtio-pci.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c > index e7baf7b..62b04c9 100644 > --- a/hw/virtio/virtio-pci.c > +++ b/hw/virtio/virtio-pci.c > @@ -133,6 +133,11 @@ static int virtio_pci_load_queue(DeviceState *d, int n, QEMUFile *f) > return 0; > } > > +static uint16_t cpu_to_host_notifier16(VirtIODevice *vdev, uint16_t val) > +{ > + return virtio_is_big_endian(vdev) ? val : bswap16(val); > +} > + > static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, > int n, bool assign, bool set_handler) > { > @@ -150,10 +155,12 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, > } > virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler); > memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, > - true, n, notifier); > + true, cpu_to_host_notifier16(vdev, n), > + notifier); > } else { > memory_region_del_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, > - true, n, notifier); > + true, cpu_to_host_notifier16(vdev, n), > + notifier); > virtio_queue_set_host_notifier_fd_handler(vq, false, false); > event_notifier_cleanup(notifier); > }
On Wed, 11 Mar 2015 21:06:05 +0100 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Wed, Mar 11, 2015 at 07:04:38PM +0100, Greg Kurz wrote: > > vhost is seriously broken with ppc64le guests, even in the supposedly > > supported case where the host is ppc64le and we don't need cross-endian > > support. > > > > The TX virtqueue fails to be handled by vhost and falls back to QEMU. > > Despite this unexpected scenario where RX is vhost and TX is QEMU, the > > guest runs well with reduced upload performances... until you reboot, > > migrate, managed save or in fact any operation that causes vhost_net > > to be re-started. Network connectivity is then permanantly lost for > > the guest. > > > > TX falling back to QEMU is the result of a failed MMIO store emulation > > in KVM. Debugging shows that: > > > > kvmppc_emulate_mmio() > > | > > +-> kvmppc_handle_store() > > | > > +-> kvm_io_bus_write() > > | > > +-> __kvm_io_bus_write() returns -EOPNOTSUPP > > > > This happens because no matching device was found: > > > > __kvm_io_bus_write() > > | > > +->kvm_iodevice_write() > > | > > +->ioeventfd_write() > > | > > +->ioeventfd_in_range() returns false for all registered vrings > > > > Extra debugging shows that the TX vring number (16-bit) is supposed to > > be 0x0100 but QEMU passes 0x0001 to KVM... This happens *again* because > > QEMU still assumes powerpc is big endian (TARGET_WORDS_BIGENDIAN) by > > default. > > > > This patch adds an extra swap in virtio_pci_set_host_notifier_internal() > > to negate the one that is done in adjust_endianness(). Since this is not > > a hot path and we want to keep virtio-pci.o in common-obj, we don't care > > whether the guest is bi-endian or not. > > > > Reported-by: Cédric Le Goater <clg@fr.ibm.com> > > Suggested-by: Michael Roth <mdroth@linux.vnet.ibm.com> > > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> > > I am confused. > The value that notifications use is always LE. True but adjust_endianness() does swap unconditionally for ppc64 because of TARGET_WORDS_BIGENDIAN. > Can't we avoid multiple swaps? That would mean adding an extra endianness argument down to memory_region_wrong_endianness()... not sure we want to do that. > They make my head spin. > I understand that the current fixed target endianness paradigm is best suited for most architectures. Extra swaps in specific non-critical locations allows to support odd beasts like ppc64le and arm64be without trashing more common paths. Maybe I can add a comment for better clarity (see below). > > --- > > > > I guess it is also a fix for virtio-1 but I didn't check. > > > > hw/virtio/virtio-pci.c | 11 +++++++++-- > > 1 file changed, 9 insertions(+), 2 deletions(-) > > > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c > > index e7baf7b..62b04c9 100644 > > --- a/hw/virtio/virtio-pci.c > > +++ b/hw/virtio/virtio-pci.c > > @@ -133,6 +133,11 @@ static int virtio_pci_load_queue(DeviceState *d, int n, QEMUFile *f) > > return 0; > > } > > /* The host notifier will be swapped in adjust_endianness() according to the * target default endianness. We need to negate this swap if the device uses * an endianness that is not the default (ppc64le for example). */ > > +static uint16_t cpu_to_host_notifier16(VirtIODevice *vdev, uint16_t val) > > +{ > > + return virtio_is_big_endian(vdev) ? val : bswap16(val); > > +} > > + > > static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, > > int n, bool assign, bool set_handler) > > { > > @@ -150,10 +155,12 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, > > } > > virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler); > > memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, > > - true, n, notifier); > > + true, cpu_to_host_notifier16(vdev, n), > > + notifier); > > } else { > > memory_region_del_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, > > - true, n, notifier); > > + true, cpu_to_host_notifier16(vdev, n), > > + notifier); > > virtio_queue_set_host_notifier_fd_handler(vq, false, false); > > event_notifier_cleanup(notifier); > > } >
On Wed, 2015-03-11 at 23:03 +0100, Greg Kurz wrote: > /* The host notifier will be swapped in adjust_endianness() according to the > * target default endianness. We need to negate this swap if the device uses > * an endianness that is not the default (ppc64le for example). > */ > > > > +static uint16_t cpu_to_host_notifier16(VirtIODevice *vdev, uint16_t val) > > > +{ > > > + return virtio_is_big_endian(vdev) ? val : bswap16(val); > > > +} But but but ... The above ... won't it do things like break x86 ? Ie, shouldn't we swap only if TARGET_BIG_ENDIAN and !virtio_is_big_endian ? Or better, "fixed target endian" ^ "virtio endian" to cover all cases ? Cheers, Ben. > > > static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, > > > int n, bool assign, bool set_handler) > > > { > > > @@ -150,10 +155,12 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, > > > } > > > virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler); > > > memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, > > > - true, n, notifier); > > > + true, cpu_to_host_notifier16(vdev, n), > > > + notifier); > > > } else { > > > memory_region_del_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, > > > - true, n, notifier); > > > + true, cpu_to_host_notifier16(vdev, n), > > > + notifier); > > > virtio_queue_set_host_notifier_fd_handler(vq, false, false); > > > event_notifier_cleanup(notifier); > > > } > > >
On Thu, 12 Mar 2015 09:18:38 +1100 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > On Wed, 2015-03-11 at 23:03 +0100, Greg Kurz wrote: > > > /* The host notifier will be swapped in adjust_endianness() according to the > > * target default endianness. We need to negate this swap if the device uses > > * an endianness that is not the default (ppc64le for example). > > */ > > > > > > +static uint16_t cpu_to_host_notifier16(VirtIODevice *vdev, uint16_t val) > > > > +{ > > > > + return virtio_is_big_endian(vdev) ? val : bswap16(val); > > > > +} > > But but but ... The above ... won't it do things like break x86 ? Ie, > shouldn't we swap only if TARGET_BIG_ENDIAN and !virtio_is_big_endian ? > Or better, "fixed target endian" ^ "virtio endian" to cover all cases ? > Yeah you're right, it's a mess :) To avoid virtio-pci.o being built per target, we can use virtio_default_endian() instead (to be exported from virtio.c): return vdev->device_endian() != virtio_default_endian() ? val : bswap16(val); I shall test on x86 and post a v2. Thanks. -- G > Cheers, > Ben. > > > > > static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, > > > > int n, bool assign, bool set_handler) > > > > { > > > > @@ -150,10 +155,12 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, > > > > } > > > > virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler); > > > > memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, > > > > - true, n, notifier); > > > > + true, cpu_to_host_notifier16(vdev, n), > > > > + notifier); > > > > } else { > > > > memory_region_del_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, > > > > - true, n, notifier); > > > > + true, cpu_to_host_notifier16(vdev, n), > > > > + notifier); > > > > virtio_queue_set_host_notifier_fd_handler(vq, false, false); > > > > event_notifier_cleanup(notifier); > > > > } > > > > > > >
On Wed, Mar 11, 2015 at 11:03:14PM +0100, Greg Kurz wrote: > On Wed, 11 Mar 2015 21:06:05 +0100 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Wed, Mar 11, 2015 at 07:04:38PM +0100, Greg Kurz wrote: > > > vhost is seriously broken with ppc64le guests, even in the supposedly > > > supported case where the host is ppc64le and we don't need cross-endian > > > support. > > > > > > The TX virtqueue fails to be handled by vhost and falls back to QEMU. > > > Despite this unexpected scenario where RX is vhost and TX is QEMU, the > > > guest runs well with reduced upload performances... until you reboot, > > > migrate, managed save or in fact any operation that causes vhost_net > > > to be re-started. Network connectivity is then permanantly lost for > > > the guest. > > > > > > TX falling back to QEMU is the result of a failed MMIO store emulation > > > in KVM. Debugging shows that: > > > > > > kvmppc_emulate_mmio() > > > | > > > +-> kvmppc_handle_store() > > > | > > > +-> kvm_io_bus_write() > > > | > > > +-> __kvm_io_bus_write() returns -EOPNOTSUPP > > > > > > This happens because no matching device was found: > > > > > > __kvm_io_bus_write() > > > | > > > +->kvm_iodevice_write() > > > | > > > +->ioeventfd_write() > > > | > > > +->ioeventfd_in_range() returns false for all registered vrings > > > > > > Extra debugging shows that the TX vring number (16-bit) is supposed to > > > be 0x0100 but QEMU passes 0x0001 to KVM... This happens *again* because > > > QEMU still assumes powerpc is big endian (TARGET_WORDS_BIGENDIAN) by > > > default. > > > > > > This patch adds an extra swap in virtio_pci_set_host_notifier_internal() > > > to negate the one that is done in adjust_endianness(). Since this is not > > > a hot path and we want to keep virtio-pci.o in common-obj, we don't care > > > whether the guest is bi-endian or not. > > > > > > Reported-by: Cédric Le Goater <clg@fr.ibm.com> > > > Suggested-by: Michael Roth <mdroth@linux.vnet.ibm.com> > > > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> > > > > I am confused. > > The value that notifications use is always LE. > > True but adjust_endianness() does swap unconditionally for ppc64 > because of TARGET_WORDS_BIGENDIAN. > > > Can't we avoid multiple swaps? > > That would mean adding an extra endianness argument down to > memory_region_wrong_endianness()... not sure we want to do that. > > > They make my head spin. > > > > I understand that the current fixed target endianness paradigm > is best suited for most architectures. Extra swaps in specific > non-critical locations allows to support odd beasts like ppc64le > and arm64be without trashing more common paths. Maybe I can add a > comment for better clarity (see below). But common header format is simple, it's always LE. It does not depend on target. To me this looks like a bug in memory_region_add_eventfd, it should do the right thing depending on device endian-ness. > > > --- > > > > > > I guess it is also a fix for virtio-1 but I didn't check. > > > > > > hw/virtio/virtio-pci.c | 11 +++++++++-- > > > 1 file changed, 9 insertions(+), 2 deletions(-) > > > > > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c > > > index e7baf7b..62b04c9 100644 > > > --- a/hw/virtio/virtio-pci.c > > > +++ b/hw/virtio/virtio-pci.c > > > @@ -133,6 +133,11 @@ static int virtio_pci_load_queue(DeviceState *d, int n, QEMUFile *f) > > > return 0; > > > } > > > > > /* The host notifier will be swapped in adjust_endianness() according to the > * target default endianness. We need to negate this swap if the device uses > * an endianness that is not the default (ppc64le for example). > */ > > > > +static uint16_t cpu_to_host_notifier16(VirtIODevice *vdev, uint16_t val) > > > +{ > > > + return virtio_is_big_endian(vdev) ? val : bswap16(val); > > > +} > > > + > > > static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, > > > int n, bool assign, bool set_handler) > > > { > > > @@ -150,10 +155,12 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, > > > } > > > virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler); > > > memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, > > > - true, n, notifier); > > > + true, cpu_to_host_notifier16(vdev, n), > > > + notifier); > > > } else { > > > memory_region_del_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, > > > - true, n, notifier); > > > + true, cpu_to_host_notifier16(vdev, n), > > > + notifier); > > > virtio_queue_set_host_notifier_fd_handler(vq, false, false); > > > event_notifier_cleanup(notifier); > > > } > >
On Wed, Mar 11, 2015 at 11:52:13PM +0100, Greg Kurz wrote: > On Thu, 12 Mar 2015 09:18:38 +1100 > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > > On Wed, 2015-03-11 at 23:03 +0100, Greg Kurz wrote: > > > > > /* The host notifier will be swapped in adjust_endianness() according to the > > > * target default endianness. We need to negate this swap if the device uses > > > * an endianness that is not the default (ppc64le for example). > > > */ > > > > > > > > +static uint16_t cpu_to_host_notifier16(VirtIODevice *vdev, uint16_t val) > > > > > +{ > > > > > + return virtio_is_big_endian(vdev) ? val : bswap16(val); > > > > > +} > > > > But but but ... The above ... won't it do things like break x86 ? Ie, > > shouldn't we swap only if TARGET_BIG_ENDIAN and !virtio_is_big_endian ? > > Or better, "fixed target endian" ^ "virtio endian" to cover all cases ? > > > > Yeah you're right, it's a mess :) > > To avoid virtio-pci.o being built per target, we can use virtio_default_endian() > instead (to be exported from virtio.c): > > return vdev->device_endian() != virtio_default_endian() ? val : bswap16(val); > > I shall test on x86 and post a v2. > > Thanks. > > -- > G It's a mess because the issue is not in virtio I think. virtio common region is just a simple device, using fixed LE format. It's some bug in ioeventfd support. Need to understand whether it's in kernel (and maybe do a work-around) or in qemu memory core. > > Cheers, > > Ben. > > > > > > > static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, > > > > > int n, bool assign, bool set_handler) > > > > > { > > > > > @@ -150,10 +155,12 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, > > > > > } > > > > > virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler); > > > > > memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, > > > > > - true, n, notifier); > > > > > + true, cpu_to_host_notifier16(vdev, n), > > > > > + notifier); > > > > > } else { > > > > > memory_region_del_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, > > > > > - true, n, notifier); > > > > > + true, cpu_to_host_notifier16(vdev, n), > > > > > + notifier); > > > > > virtio_queue_set_host_notifier_fd_handler(vq, false, false); > > > > > event_notifier_cleanup(notifier); > > > > > } > > > > > > > > > > >
On 12/03/2015 08:08, Michael S. Tsirkin wrote: > But common header format is simple, it's always LE. > It does not depend on target. > To me this looks like a bug in memory_region_add_eventfd, > it should do the right thing depending on device > endian-ness. I agree it seems to be a QEMU bug. Paolo
On Thu, 12 Mar 2015 17:25:15 +0100 Paolo Bonzini <pbonzini@redhat.com> wrote: > > > On 12/03/2015 08:08, Michael S. Tsirkin wrote: > > But common header format is simple, it's always LE. > > It does not depend on target. > > To me this looks like a bug in memory_region_add_eventfd, > > it should do the right thing depending on device > > endian-ness. > > I agree it seems to be a QEMU bug. > > Paolo > Yes you're right ! QEMU swaps the virtqueue id (adjust_endianness) according to TARGET_WORDS, like it was coming from the guest but in fact it comes from the host. The id should be fixed according to HOST_WORDS instead. Of course this went unnoticed until TARGET_WORDS_BIGENDIAN != HOST_WORDS_BIGENDIAN, which we have now with ppc64le hosts. Patches to follow. Thanks. -- Greg
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index e7baf7b..62b04c9 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -133,6 +133,11 @@ static int virtio_pci_load_queue(DeviceState *d, int n, QEMUFile *f) return 0; } +static uint16_t cpu_to_host_notifier16(VirtIODevice *vdev, uint16_t val) +{ + return virtio_is_big_endian(vdev) ? val : bswap16(val); +} + static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, int n, bool assign, bool set_handler) { @@ -150,10 +155,12 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, } virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler); memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, - true, n, notifier); + true, cpu_to_host_notifier16(vdev, n), + notifier); } else { memory_region_del_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, - true, n, notifier); + true, cpu_to_host_notifier16(vdev, n), + notifier); virtio_queue_set_host_notifier_fd_handler(vq, false, false); event_notifier_cleanup(notifier); }
vhost is seriously broken with ppc64le guests, even in the supposedly supported case where the host is ppc64le and we don't need cross-endian support. The TX virtqueue fails to be handled by vhost and falls back to QEMU. Despite this unexpected scenario where RX is vhost and TX is QEMU, the guest runs well with reduced upload performances... until you reboot, migrate, managed save or in fact any operation that causes vhost_net to be re-started. Network connectivity is then permanantly lost for the guest. TX falling back to QEMU is the result of a failed MMIO store emulation in KVM. Debugging shows that: kvmppc_emulate_mmio() | +-> kvmppc_handle_store() | +-> kvm_io_bus_write() | +-> __kvm_io_bus_write() returns -EOPNOTSUPP This happens because no matching device was found: __kvm_io_bus_write() | +->kvm_iodevice_write() | +->ioeventfd_write() | +->ioeventfd_in_range() returns false for all registered vrings Extra debugging shows that the TX vring number (16-bit) is supposed to be 0x0100 but QEMU passes 0x0001 to KVM... This happens *again* because QEMU still assumes powerpc is big endian (TARGET_WORDS_BIGENDIAN) by default. This patch adds an extra swap in virtio_pci_set_host_notifier_internal() to negate the one that is done in adjust_endianness(). Since this is not a hot path and we want to keep virtio-pci.o in common-obj, we don't care whether the guest is bi-endian or not. Reported-by: Cédric Le Goater <clg@fr.ibm.com> Suggested-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> --- I guess it is also a fix for virtio-1 but I didn't check. hw/virtio/virtio-pci.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)