Message ID | 20140903061015.GA5449@redhat.com |
---|---|
State | New |
On Wed, Sep 3, 2014 at 10:10 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Wed, Sep 03, 2014 at 02:17:02AM +0400, Andrey Korolyov wrote:
>> On Wed, Sep 3, 2014 at 2:09 AM, Andrey Korolyov <andrey@xdel.ru> wrote:
>> > On Wed, Sep 3, 2014 at 1:51 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> On Wed, Sep 03, 2014 at 01:29:29AM +0400, Andrey Korolyov wrote:
>> >>> On Wed, Sep 3, 2014 at 1:03 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >>> >> bad one is the
>> >>> >>
>> >>> >> Author: Jason Wang <jasowang@redhat.com>
>> >>> >> Date: Tue Sep 2 18:07:46 2014 +0300
>> >>> >>
>> >>> >> vhost_net: start/stop guest notifiers properly
>> >>> >
>> >>> >
>> >>> >
>> >>> > upstream has this (pull request sent today):
>> >>> > vhost_net: cleanup start/stop condition
>> >>> >
>> >>> > Could you apply it and see if it helps please?
>> >>> >
>> >>> > Michael, if it helps it should be before start/stop guest notifiers
>> >>> > ideally to avoid bisect problems.
>> >>>
>> >>> It is already applied as shown from the list in the previous message
>> >>> (there are some aio fixes too on top of 2.1 I picked before but they
>> >>> should not impact vhost-net interaction in any mean). The symptoms are
>> >>> a bit interesting - VM crashes only at PCI device initialization (e.g.
>> >>> grub stage after reset and initrd unpacking are passing well, but then
>> >>> things getting ugly). I am running 3.14 guest i686-pae kernel from
>> >>> debian backports in guest, so it may be version-specific after all. If
>> >>> it'll be hard to reproduce, I can try 64bit, expecting same behavior.
>> >>> Please find args in attached file.
>> >>
>> >>
>> >>
>> >> ok just to make sure - which tree do I clone exactly?
>> >>
>> >
>> > https://github.com/mdroth/qemu.git stable-2.1-staging showing same
>> > behavior for me with those patches
>>
>> Forgot to mention important detail - I am playing with -mq now, so
>> actually virtio-net working in a bit different way than it may
>> expected (it also shown in args list from above, but someone may miss
>> it):
>> ...
>> qemu-system-x86_64: unable to start vhost net: 95: falling back on
>> userspace virtio
>> qemu-system-x86_64: unable to start vhost net: 95: falling back on
>> userspace virtio
>> ...
>
>
> OK I see at least one obvious bug there: does the following fix the
> crash for you?
> Separately, we need to debug why mq vhost is broken for you.
> Is this a regression?
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index ba5d544..1fe18c7 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -289,7 +289,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>      BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
>      VirtioBusState *vbus = VIRTIO_BUS(qbus);
>      VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
> -    int r, i = 0;
> +    int r, i;
>
>      if (!vhost_net_device_endian_ok(dev)) {
>          error_report("vhost-net does not support cross-endian");
> @@ -317,16 +317,22 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>          r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev);
>
>          if (r < 0) {
> -            goto err;
> +            goto err_start;
>          }
>      }
>
>      return 0;
>
> -err:
> +err_start:
>      while (--i >= 0) {
>          vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
>      }
> +err:
> +    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
> +    if (r < 0) {
> +        fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
> +        fflush(stderr);
> +    }
>      return r;
>  }
>

Some more bits of information:
- the userspace fallback is not specific to mq (very unfortunately
for me, because I didn't check this exact regression the week before
when I saw it for mq, and it is not specific to the queued patches
for 2.1.1),
- the bug itself is not specific to mq; it reproduces every time, even
with a more generic interface config without queues,
- the patch from above does not fix the issue.

Strace output for all threads is available at
http://xdel.ru/downloads/qemu.out.gz, attached just before reset.

On Wed, Sep 03, 2014 at 11:43:54AM +0400, Andrey Korolyov wrote:
> Some more bits of information:
> - the userspace fallback is not specific to mq (very unfortunately
> for me, because I didn't check this exact regression the week before
> when I saw it for mq, and it is not specific to the queued patches
> for 2.1.1),
> - the bug itself is not specific to mq; it reproduces every time, even
> with a more generic interface config without queues,
> - the patch from above does not fix the issue.
>
> Strace output for all threads is available at
> http://xdel.ru/downloads/qemu.out.gz, attached just before reset.

OK does my patch help?

Jason sent patches to fix the fallback to virtio bug -
does that work for you?

On Wed, Sep 3, 2014 at 12:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> OK does my patch help?
>
> Jason sent patches to fix the fallback to virtio bug -
> does that work for you?
>

Whoops, I missed the patch from Jason and meant yours above. The
acceleration is fixed, thanks! Jason's patch alone fixes both the crash
and the accel initialization, while yours fixed the initialization
(although it was intended to fix the assert), with the crash still in
place.

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index ba5d544..1fe18c7 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -289,7 +289,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
     VirtioBusState *vbus = VIRTIO_BUS(qbus);
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
-    int r, i = 0;
+    int r, i;
 
     if (!vhost_net_device_endian_ok(dev)) {
         error_report("vhost-net does not support cross-endian");
@@ -317,16 +317,22 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev);
 
         if (r < 0) {
-            goto err;
+            goto err_start;
         }
     }
 
     return 0;
 
-err:
+err_start:
     while (--i >= 0) {
         vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
     }
+err:
+    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
+    if (r < 0) {
+        fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
+        fflush(stderr);
+    }
     return r;
 }
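
For readers following the error path in this patch, here is a rough standalone sketch of the same unwind shape: stop only the queues that were actually started, then undo the notifier setup. It is not QEMU code. start_one, stop_one, set_notifiers, start_all and MAX_QUEUES are hypothetical stand-ins, and -95 only mirrors the error number in the log quoted in the thread. The sketch keeps the cleanup status in a separate variable e, on the assumption that the caller wants the original start error back; in the patch as posted, r appears to be reassigned at err: just before it is returned.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_QUEUES 8   /* hypothetical limit, for the sketch only */

/* Hypothetical stand-ins for per-queue and notifier state. */
static int started[MAX_QUEUES];
static int notifiers_enabled;

/* Pretend to start queue i; fails with -95 when i == fail_at. */
static int start_one(int i, int fail_at)
{
    if (i == fail_at) {
        return -95;
    }
    started[i] = 1;
    return 0;
}

/* Undo start_one for queue i. */
static void stop_one(int i)
{
    started[i] = 0;
}

/* Enable or disable notifiers; always succeeds in this sketch. */
static int set_notifiers(int enable)
{
    notifiers_enabled = enable;
    return 0;
}

/* Start n queues; on failure undo everything and return the start error. */
static int start_all(int n, int fail_at)
{
    int r, e, i;

    assert(n >= 0 && n <= MAX_QUEUES);

    r = set_notifiers(1);
    if (r < 0) {
        goto err;
    }

    for (i = 0; i < n; i++) {
        r = start_one(i, fail_at);
        if (r < 0) {
            goto err_start;
        }
    }
    return 0;

err_start:
    /* i is the index that failed, so stop only queues 0..i-1. */
    while (--i >= 0) {
        stop_one(i);
    }
    /* Separate variable: the cleanup status must not overwrite r. */
    e = set_notifiers(0);
    if (e < 0) {
        fprintf(stderr, "notifier cleanup failed: %d\n", e);
    }
err:
    return r;
}
```

With this shape, a failure at queue 2 of 4 leaves no queue started, leaves notifiers disabled, and still reports the start error to the caller.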