[Qemu-stable] Patch Round-up for stable 2.1.1, freeze on 2014-09-03

Message ID 20140903061015.GA5449@redhat.com
State New

Commit Message

Michael S. Tsirkin Sept. 3, 2014, 6:10 a.m. UTC
On Wed, Sep 03, 2014 at 02:17:02AM +0400, Andrey Korolyov wrote:
> On Wed, Sep 3, 2014 at 2:09 AM, Andrey Korolyov <andrey@xdel.ru> wrote:
> > On Wed, Sep 3, 2014 at 1:51 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> On Wed, Sep 03, 2014 at 01:29:29AM +0400, Andrey Korolyov wrote:
> >>> On Wed, Sep 3, 2014 at 1:03 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>> >> bad one is the
> >>> >>
> >>> >> Author: Jason Wang <jasowang@redhat.com>
> >>> >> Date:   Tue Sep 2 18:07:46 2014 +0300
> >>> >>
> >>> >>     vhost_net: start/stop guest notifiers properly
> >>> >
> >>> >
> >>> >
> >>> > upstream has this (pull request sent today):
> >>> > vhost_net: cleanup start/stop condition
> >>> >
> >>> > Could you apply it and see if it helps please?
> >>> >
> >>> > Michael, if it helps it should be before start/stop guest notifiers
> >>> > ideally to avoid bisect problems.
> >>>
> >>> It is already applied, as shown in the list in the previous message
> >>> (there are also some aio fixes on top of 2.1 that I picked earlier, but
> >>> they should not affect vhost-net interaction in any way). The symptoms are
> >>> a bit interesting: the VM crashes only at PCI device initialization (e.g.
> >>> the grub stage after reset and the initrd unpacking pass fine, but then
> >>> things get ugly). I am running a 3.14 i686-pae kernel from
> >>> debian backports in the guest, so it may be version-specific after all. If
> >>> it is hard to reproduce, I can try 64-bit, expecting the same behavior.
> >>> Please find the args in the attached file.
> >>
> >>
> >>
> >> ok just to make sure - which tree do I clone exactly?
> >>
> >
> > https://github.com/mdroth/qemu.git stable-2.1-staging shows the same
> > behavior for me with those patches
> 
> Forgot to mention an important detail: I am playing with -mq now, so
> virtio-net actually works in a slightly different way than might be
> expected (it is also shown in the args list above, but someone may miss
> it):
> ...
> qemu-system-x86_64: unable to start vhost net: 95: falling back on
> userspace virtio
> qemu-system-x86_64: unable to start vhost net: 95: falling back on
> userspace virtio
> ...


OK I see at least one obvious bug there: does the following fix the
crash for you?
Separately, we need to debug why mq vhost is broken for you.
Is this a regression?

Comments

Andrey Korolyov Sept. 3, 2014, 7:43 a.m. UTC | #1
On Wed, Sep 3, 2014 at 10:10 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> OK I see at least one obvious bug there: does the following fix the
> crash for you?
> Separately, we need to debug why mq vhost is broken for you.
> Is this a regression?
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index ba5d544..1fe18c7 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -289,7 +289,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>      BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
>      VirtioBusState *vbus = VIRTIO_BUS(qbus);
>      VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
> -    int r, i = 0;
> +    int r, i;
>
>      if (!vhost_net_device_endian_ok(dev)) {
>          error_report("vhost-net does not support cross-endian");
> @@ -317,16 +317,22 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>          r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev);
>
>          if (r < 0) {
> -            goto err;
> +            goto err_start;
>          }
>      }
>
>      return 0;
>
> -err:
> +err_start:
>      while (--i >= 0) {
>          vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
>      }
> +err:
> +    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
> +    if (r < 0) {
> +        fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
> +        fflush(stderr);
> +    }
>      return r;
>  }
>


Some more bits of information:
 - the userspace fallback is not specific to mq (very unfortunately
for me, because I didn't check this exact regression a week ago when
I saw it for mq, and it is not specific to the patches queued for 2.1.1),
 - the bug itself is not specific to mq; it reproduces every time, even with
a more generic interface config without queues,
 - the patch above does not fix the issue.

Strace output for all threads is available at
http://xdel.ru/downloads/qemu.out.gz, attached just before reset.
Michael S. Tsirkin Sept. 3, 2014, 8:13 a.m. UTC | #2
On Wed, Sep 03, 2014 at 11:43:54AM +0400, Andrey Korolyov wrote:
> Some more bits of information:
>  - the userspace fallback is not specific to mq (very unfortunately
> for me, because I didn't check this exact regression a week ago when
> I saw it for mq, and it is not specific to the patches queued for 2.1.1),
>  - the bug itself is not specific to mq; it reproduces every time, even with
> a more generic interface config without queues,
>  - the patch above does not fix the issue.
> 
> Strace output for all threads is available at
> http://xdel.ru/downloads/qemu.out.gz, attached just before reset.



OK does my patch help?

Jason sent patches to fix the fallback to virtio bug -
does that work for you?
Andrey Korolyov Sept. 3, 2014, 8:36 a.m. UTC | #3
On Wed, Sep 3, 2014 at 12:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> OK does my patch help?
>
> Jason sent patches to fix the fallback to virtio bug -
> does that work for you?
>

Whoops, I missed the patch from Jason; I meant yours above. The acceleration
is fixed, thanks! Jason's patch alone fixes both the crash and the
accel initialization, while yours fixes the initialization (although it was
intended to fix the assert), with the crash still in place.

Patch

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index ba5d544..1fe18c7 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -289,7 +289,7 @@  int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
     VirtioBusState *vbus = VIRTIO_BUS(qbus);
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
-    int r, i = 0;
+    int r, i;
 
     if (!vhost_net_device_endian_ok(dev)) {
         error_report("vhost-net does not support cross-endian");
@@ -317,16 +317,22 @@  int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev);
 
         if (r < 0) {
-            goto err;
+            goto err_start;
         }
     }
 
     return 0;
 
-err:
+err_start:
     while (--i >= 0) {
         vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
     }
+err:
+    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
+    if (r < 0) {
+        fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
+        fflush(stderr);
+    }
     return r;
 }
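
The error path in the patch above follows the usual layered goto unwind: stop whatever was started, in reverse order, then disable the guest notifiers. One detail worth noting is that the patch reuses r for the result of the cleanup call, so a successful cleanup would replace the original error code with 0. The following is a minimal standalone sketch of the same pattern that keeps the two results apart. It is an illustration only, not the code that was posted or merged: set_notifiers(), start_one(), stop_one(), start_all(), fail_at and the -EIO error value are hypothetical stand-ins, not QEMU APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define MAX_QUEUES 8

/* Hypothetical stand-ins for the device state; none of this is QEMU code. */
static int started[MAX_QUEUES];   /* 1 while stand-in queue i is running */
static int notifiers_enabled;     /* 1 while notifiers are switched on */
static int fail_at = -1;          /* queue index whose start fails, -1 = none */

static int set_notifiers(int enable)
{
    notifiers_enabled = enable;
    return 0;
}

static int start_one(int i)
{
    if (i == fail_at) {
        return -EIO;              /* simulated start failure */
    }
    started[i] = 1;
    return 0;
}

static void stop_one(int i)
{
    started[i] = 0;
}

static int start_all(int total_queues)
{
    int r, e, i;

    assert(total_queues <= MAX_QUEUES);

    r = set_notifiers(1);
    if (r < 0) {
        goto err;                 /* nothing was set up, nothing to undo */
    }

    for (i = 0; i < total_queues; i++) {
        r = start_one(i);
        if (r < 0) {
            goto err_start;
        }
    }
    return 0;

err_start:
    /* Undo only the queues that did start, in reverse order. */
    while (--i >= 0) {
        stop_one(i);
    }
    /* Use a separate variable so the cleanup result cannot overwrite r. */
    e = set_notifiers(0);
    if (e < 0) {
        fprintf(stderr, "notifier cleanup failed: %d\n", e);
    }
err:
    return r;
}
```

With fail_at set to 2, start_all(4) stops queues 1 and 0, switches the notifiers off again, and still returns the original -EIO to the caller.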