diff mbox series

[PULL,52/53] vhost_net: add an assertion for TAP client backends

Message ID 0af710813dcde638379e3bece8f9b1bde31af2f6.1687782442.git.mst@redhat.com
State New
Headers show
Series [PULL,01/53] bswap: Add the ability to store to an unaligned 24 bit field | expand

Commit Message

Michael S. Tsirkin June 26, 2023, 12:30 p.m. UTC
From: Ani Sinha <anisinha@redhat.com>

An assertion was missing for tap vhost backends that enforces a non-null
reference from get_vhost_net(). Both vhost-net-user and vhost-net-vdpa
enforces this. Enforce the same for tap. Unit tests pass with this change.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20230619041501.111655-1-anisinha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
---
 hw/net/vhost_net.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Cédric Le Goater June 28, 2023, 6:28 a.m. UTC | #1
Hello,

On 6/26/23 14:30, Michael S. Tsirkin wrote:
> From: Ani Sinha <anisinha@redhat.com>
> 
> An assertion was missing for tap vhost backends that enforces a non-null
> reference from get_vhost_net(). Both vhost-net-user and vhost-net-vdpa
> enforces this. Enforce the same for tap. Unit tests pass with this change.
> 
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> Message-Id: <20230619041501.111655-1-anisinha@redhat.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
> ---
>   hw/net/vhost_net.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index c4eecc6f36..6db23ca323 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -507,6 +507,7 @@ VHostNetState *get_vhost_net(NetClientState *nc)
>       switch (nc->info->type) {
>       case NET_CLIENT_DRIVER_TAP:
>           vhost_net = tap_get_vhost_net(nc);
> +        assert(vhost_net);
>           break;
>   #ifdef CONFIG_VHOST_NET_USER
>       case NET_CLIENT_DRIVER_VHOST_USER:

A system of mine without vhost_net (old host kernel) is reaching this assert
and works perfectly fine without. Should it be considered as a regression ?

Thanks,

C.
Ani Sinha June 28, 2023, 6:45 a.m. UTC | #2
> On 28-Jun-2023, at 11:58 AM, Cédric Le Goater <clg@redhat.com> wrote:
> 
> Hello,
> 
> On 6/26/23 14:30, Michael S. Tsirkin wrote:
>> From: Ani Sinha <anisinha@redhat.com>
>> An assertion was missing for tap vhost backends that enforces a non-null
>> reference from get_vhost_net(). Both vhost-net-user and vhost-net-vdpa
>> enforces this. Enforce the same for tap. Unit tests pass with this change.
>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>> Message-Id: <20230619041501.111655-1-anisinha@redhat.com>
>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
>> ---
>>  hw/net/vhost_net.c | 1 +
>>  1 file changed, 1 insertion(+)
>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>> index c4eecc6f36..6db23ca323 100644
>> --- a/hw/net/vhost_net.c
>> +++ b/hw/net/vhost_net.c
>> @@ -507,6 +507,7 @@ VHostNetState *get_vhost_net(NetClientState *nc)
>>      switch (nc->info->type) {
>>      case NET_CLIENT_DRIVER_TAP:
>>          vhost_net = tap_get_vhost_net(nc);
>> +        assert(vhost_net);
>>          break;
>>  #ifdef CONFIG_VHOST_NET_USER
>>      case NET_CLIENT_DRIVER_VHOST_USER:
> 
> A system of mine without vhost_net (old host kernel) is reaching this assert

We need to understand why this assertion is being hit. It could be a bug somewhere else.
What is the backtrace? What is the repro case?

> and works perfectly fine without. Should it be considered as a regression ?
> 
> Thanks,
> 
> C.
>
Cédric Le Goater June 28, 2023, 7:30 a.m. UTC | #3
On 6/28/23 08:45, Ani Sinha wrote:
> 
> 
>> On 28-Jun-2023, at 11:58 AM, Cédric Le Goater <clg@redhat.com> wrote:
>>
>> Hello,
>>
>> On 6/26/23 14:30, Michael S. Tsirkin wrote:
>>> From: Ani Sinha <anisinha@redhat.com>
>>> An assertion was missing for tap vhost backends that enforces a non-null
>>> reference from get_vhost_net(). Both vhost-net-user and vhost-net-vdpa
>>> enforces this. Enforce the same for tap. Unit tests pass with this change.
>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>> Message-Id: <20230619041501.111655-1-anisinha@redhat.com>
>>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
>>> ---
>>>   hw/net/vhost_net.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>> index c4eecc6f36..6db23ca323 100644
>>> --- a/hw/net/vhost_net.c
>>> +++ b/hw/net/vhost_net.c
>>> @@ -507,6 +507,7 @@ VHostNetState *get_vhost_net(NetClientState *nc)
>>>       switch (nc->info->type) {
>>>       case NET_CLIENT_DRIVER_TAP:
>>>           vhost_net = tap_get_vhost_net(nc);
>>> +        assert(vhost_net);
>>>           break;
>>>   #ifdef CONFIG_VHOST_NET_USER
>>>       case NET_CLIENT_DRIVER_VHOST_USER:
>>
>> A system of mine without vhost_net (old host kernel) is reaching this assert
> 
> We need to understand why this assertion is being hit. It could be a bug somewhere else.

sure.

> What is the backtrace? 


#0  __pthread_kill_implementation (threadid=549621125152, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x0000007ff71df254 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x0000007ff719a67c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x0000007ff7187130 in __GI_abort () at ./stdlib/abort.c:79
#4  0x0000007ff7193fd0 in __assert_fail_base
     (fmt=0x7ff72ad3f8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5556208c00 "vhost_net", file=file@entry=0x5556208ae0 "../hw/net/vhost_net.c", line=line@entry=510, function=function@entry=0x5556208de8 <__PRETTY_FUNCTION__.2> "get_vhost_net") at ./assert/assert.c:92
#5  0x0000007ff7194040 in __GI___assert_fail
     (assertion=assertion@entry=0x5556208c00 "vhost_net", file=file@entry=0x5556208ae0 "../hw/net/vhost_net.c", line=line@entry=510, function=function@entry=0x5556208de8 <__PRETTY_FUNCTION__.2> "get_vhost_net") at ./assert/assert.c:101
#6  0x0000005555a767ac in get_vhost_net (nc=<optimized out>) at ../hw/net/vhost_net.c:510
#7  0x0000005555e6aea8 in virtio_net_get_features (vdev=0x55578cc770, features=1105849221095, errp=<optimized out>) at ../hw/net/virtio-net.c:807
#8  0x0000005555b69da4 in virtio_bus_device_plugged (vdev=vdev@entry=0x55578cc770, errp=errp@entry=0x7fffffe640) at ../hw/virtio/virtio-bus.c:66
#9  0x0000005555e8b6a0 in virtio_device_realize (dev=0x55578cc770, errp=0x7fffffe6c0) at ../hw/virtio/virtio.c:3609
#10 0x0000005555f1ead8 in device_set_realized (obj=0x55578cc770, value=<optimized out>, errp=0x7fffffe898) at ../hw/core/qdev.c:510
#11 0x0000005555f22a14 in property_set_bool (obj=0x55578cc770, v=<optimized out>, name=<optimized out>, opaque=0x5556be9ad0, errp=0x7fffffe898) at ../qom/object.c:2285
#12 0x0000005555f260fc in object_property_set (obj=obj@entry=0x55578cc770, name=name@entry=0x555621c280 "realized", v=v@entry=0x55578d9310, errp=errp@entry=0x7fffffe898)
     at ../qom/object.c:1420
#13 0x0000005555f2a85c in object_property_set_qobject
     (obj=obj@entry=0x55578cc770, name=name@entry=0x555621c280 "realized", value=value@entry=0x55578d9250, errp=errp@entry=0x7fffffe898) at ../qom/qom-qobject.c:28
#14 0x0000005555f26820 in object_property_set_bool (obj=0x55578cc770, name=0x555621c280 "realized", value=<optimized out>, errp=0x7fffffe898) at ../qom/object.c:1489
#15 0x0000005555aafd24 in pci_qdev_realize (qdev=<optimized out>, errp=<optimized out>) at ../hw/pci/pci.c:2116
#16 0x0000005555f1ead8 in device_set_realized (obj=0x55578c43a0, value=<optimized out>, errp=0x7fffffeae8) at ../hw/core/qdev.c:510
#17 0x0000005555f22a14 in property_set_bool (obj=0x55578c43a0, v=<optimized out>, name=<optimized out>, opaque=0x5556be9ad0, errp=0x7fffffeae8) at ../qom/object.c:2285
#18 0x0000005555f260fc in object_property_set (obj=obj@entry=0x55578c43a0, name=name@entry=0x555621c280 "realized", v=v@entry=0x55578d34c0, errp=errp@entry=0x7fffffeae8)
     at ../qom/object.c:1420
#19 0x0000005555f2a85c in object_property_set_qobject
     (obj=obj@entry=0x55578c43a0, name=name@entry=0x555621c280 "realized", value=value@entry=0x55578d3450, errp=errp@entry=0x7fffffeae8) at ../qom/qom-qobject.c:28
#20 0x0000005555f26820 in object_property_set_bool (obj=0x55578c43a0, name=0x555621c280 "realized", value=<optimized out>, errp=0x7fffffeae8) at ../qom/object.c:1489
#21 0x0000005555ba4f7c in qdev_device_add_from_qdict (opts=opts@entry=0x55578c3110, from_json=from_json@entry=false, errp=0x7fffffeae8, errp@entry=0x5556b24a28 <error_fatal>)
     at ../softmmu/qdev-monitor.c:714
#22 0x0000005555ba5170 in qdev_device_add (opts=0x5556be6060, errp=errp@entry=0x5556b24a28 <error_fatal>) at ../softmmu/qdev-monitor.c:733
#23 0x0000005555baa09c in device_init_func (opaque=<optimized out>, opts=<optimized out>, errp=0x5556b24a28 <error_fatal>) at ../softmmu/vl.c:1152
#24 0x00000055560b2124 in qemu_opts_foreach
     (list=<optimized out>, func=func@entry=0x5555baa080 <device_init_func>, opaque=opaque@entry=0x0, errp=errp@entry=0x5556b24a28 <error_fatal>) at ../util/qemu-option.c:1135
#25 0x0000005555bac88c in qemu_create_cli_devices () at ../softmmu/vl.c:2573
#26 qmp_x_exit_preconfig (errp=<optimized out>) at ../softmmu/vl.c:2641
#27 0x0000005555bafee4 in qmp_x_exit_preconfig (errp=<optimized out>) at ../softmmu/vl.c:2635
#28 qemu_init (argc=<optimized out>, argv=<optimized out>) at ../softmmu/vl.c:3648
#29 0x00000055558e839c in main (argc=<optimized out>, argv=<optimized out>) at ../softmmu/main.c:47

> What is the repro case?

Nothing special :

qemu-system-aarch64 -M virt -accel kvm -cpu host -nographic -m 2G \
  -drive if=pflash,format=raw,file=./EFI/efi.img,readonly=on \
  -drive if=pflash,format=raw,file=fedora-varstore.img \
  -device virtio-net,netdev=net0,mac=C0:FF:EE:00:00:12,bus=pcie.0,addr=0x3 \
  -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper,br=virbr0,vhost=off \
  -drive file=fedora-arm64.qcow2,if=none,id=disk,format=qcow2,cache=none \
  -device virtio-blk-device,drive=disk \
  -serial mon:stdio
qemu-system-aarch64: ../hw/net/vhost_net.c:510: get_vhost_net: Assertion `vhost_net' failed.

Thanks,

C.
Ani Sinha June 28, 2023, 10:33 a.m. UTC | #4
> On 28-Jun-2023, at 1:00 PM, Cédric Le Goater <clg@redhat.com> wrote:
> 
> On 6/28/23 08:45, Ani Sinha wrote:
>>> On 28-Jun-2023, at 11:58 AM, Cédric Le Goater <clg@redhat.com> wrote:
>>> 
>>> Hello,
>>> 
>>> On 6/26/23 14:30, Michael S. Tsirkin wrote:
>>>> From: Ani Sinha <anisinha@redhat.com>
>>>> An assertion was missing for tap vhost backends that enforces a non-null
>>>> reference from get_vhost_net(). Both vhost-net-user and vhost-net-vdpa
>>>> enforces this. Enforce the same for tap. Unit tests pass with this change.
>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>> Message-Id: <20230619041501.111655-1-anisinha@redhat.com>
>>>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
>>>> ---
>>>>  hw/net/vhost_net.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>> index c4eecc6f36..6db23ca323 100644
>>>> --- a/hw/net/vhost_net.c
>>>> +++ b/hw/net/vhost_net.c
>>>> @@ -507,6 +507,7 @@ VHostNetState *get_vhost_net(NetClientState *nc)
>>>>      switch (nc->info->type) {
>>>>      case NET_CLIENT_DRIVER_TAP:
>>>>          vhost_net = tap_get_vhost_net(nc);
>>>> +        assert(vhost_net);
>>>>          break;
>>>>  #ifdef CONFIG_VHOST_NET_USER
>>>>      case NET_CLIENT_DRIVER_VHOST_USER:
>>> 
>>> A system of mine without vhost_net (old host kernel) is reaching this assert
>> We need to understand why this assertion is being hit. It could be a bug somewhere else.
> 
> sure.
> 
>> What is the backtrace? 
> 
> 
> #0  __pthread_kill_implementation (threadid=549621125152, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
> #1  0x0000007ff71df254 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
> #2  0x0000007ff719a67c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
> #3  0x0000007ff7187130 in __GI_abort () at ./stdlib/abort.c:79
> #4  0x0000007ff7193fd0 in __assert_fail_base
>    (fmt=0x7ff72ad3f8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5556208c00 "vhost_net", file=file@entry=0x5556208ae0 "../hw/net/vhost_net.c", line=line@entry=510, function=function@entry=0x5556208de8 <__PRETTY_FUNCTION__.2> "get_vhost_net") at ./assert/assert.c:92
> #5  0x0000007ff7194040 in __GI___assert_fail
>    (assertion=assertion@entry=0x5556208c00 "vhost_net", file=file@entry=0x5556208ae0 "../hw/net/vhost_net.c", line=line@entry=510, function=function@entry=0x5556208de8 <__PRETTY_FUNCTION__.2> "get_vhost_net") at ./assert/assert.c:101
> #6  0x0000005555a767ac in get_vhost_net (nc=<optimized out>) at ../hw/net/vhost_net.c:510
> #7  0x0000005555e6aea8 in virtio_net_get_features (vdev=0x55578cc770, features=1105849221095, errp=<optimized out>) at ../hw/net/virtio-net.c:807
> #8  0x0000005555b69da4 in virtio_bus_device_plugged (vdev=vdev@entry=0x55578cc770, errp=errp@entry=0x7fffffe640) at ../hw/virtio/virtio-bus.c:66
> #9  0x0000005555e8b6a0 in virtio_device_realize (dev=0x55578cc770, errp=0x7fffffe6c0) at ../hw/virtio/virtio.c:3609
> #10 0x0000005555f1ead8 in device_set_realized (obj=0x55578cc770, value=<optimized out>, errp=0x7fffffe898) at ../hw/core/qdev.c:510
> #11 0x0000005555f22a14 in property_set_bool (obj=0x55578cc770, v=<optimized out>, name=<optimized out>, opaque=0x5556be9ad0, errp=0x7fffffe898) at ../qom/object.c:2285
> #12 0x0000005555f260fc in object_property_set (obj=obj@entry=0x55578cc770, name=name@entry=0x555621c280 "realized", v=v@entry=0x55578d9310, errp=errp@entry=0x7fffffe898)
>    at ../qom/object.c:1420
> #13 0x0000005555f2a85c in object_property_set_qobject
>    (obj=obj@entry=0x55578cc770, name=name@entry=0x555621c280 "realized", value=value@entry=0x55578d9250, errp=errp@entry=0x7fffffe898) at ../qom/qom-qobject.c:28
> #14 0x0000005555f26820 in object_property_set_bool (obj=0x55578cc770, name=0x555621c280 "realized", value=<optimized out>, errp=0x7fffffe898) at ../qom/object.c:1489
> #15 0x0000005555aafd24 in pci_qdev_realize (qdev=<optimized out>, errp=<optimized out>) at ../hw/pci/pci.c:2116
> #16 0x0000005555f1ead8 in device_set_realized (obj=0x55578c43a0, value=<optimized out>, errp=0x7fffffeae8) at ../hw/core/qdev.c:510
> #17 0x0000005555f22a14 in property_set_bool (obj=0x55578c43a0, v=<optimized out>, name=<optimized out>, opaque=0x5556be9ad0, errp=0x7fffffeae8) at ../qom/object.c:2285
> #18 0x0000005555f260fc in object_property_set (obj=obj@entry=0x55578c43a0, name=name@entry=0x555621c280 "realized", v=v@entry=0x55578d34c0, errp=errp@entry=0x7fffffeae8)
>    at ../qom/object.c:1420
> #19 0x0000005555f2a85c in object_property_set_qobject
>    (obj=obj@entry=0x55578c43a0, name=name@entry=0x555621c280 "realized", value=value@entry=0x55578d3450, errp=errp@entry=0x7fffffeae8) at ../qom/qom-qobject.c:28
> #20 0x0000005555f26820 in object_property_set_bool (obj=0x55578c43a0, name=0x555621c280 "realized", value=<optimized out>, errp=0x7fffffeae8) at ../qom/object.c:1489
> #21 0x0000005555ba4f7c in qdev_device_add_from_qdict (opts=opts@entry=0x55578c3110, from_json=from_json@entry=false, errp=0x7fffffeae8, errp@entry=0x5556b24a28 <error_fatal>)
>    at ../softmmu/qdev-monitor.c:714
> #22 0x0000005555ba5170 in qdev_device_add (opts=0x5556be6060, errp=errp@entry=0x5556b24a28 <error_fatal>) at ../softmmu/qdev-monitor.c:733
> #23 0x0000005555baa09c in device_init_func (opaque=<optimized out>, opts=<optimized out>, errp=0x5556b24a28 <error_fatal>) at ../softmmu/vl.c:1152
> #24 0x00000055560b2124 in qemu_opts_foreach
>    (list=<optimized out>, func=func@entry=0x5555baa080 <device_init_func>, opaque=opaque@entry=0x0, errp=errp@entry=0x5556b24a28 <error_fatal>) at ../util/qemu-option.c:1135
> #25 0x0000005555bac88c in qemu_create_cli_devices () at ../softmmu/vl.c:2573
> #26 qmp_x_exit_preconfig (errp=<optimized out>) at ../softmmu/vl.c:2641
> #27 0x0000005555bafee4 in qmp_x_exit_preconfig (errp=<optimized out>) at ../softmmu/vl.c:2635
> #28 qemu_init (argc=<optimized out>, argv=<optimized out>) at ../softmmu/vl.c:3648
> #29 0x00000055558e839c in main (argc=<optimized out>, argv=<optimized out>) at ../softmmu/main.c:47
> 
>> What is the repro case?
> 

I have been able to repro this even on x86, using the following command line:

./qemu-system-x86_64 \
-accel kvm \
-machine pc-q35-8.0 \
-m 8192 \
-nodefaults \
-boot strict=on \
-nographic \
-device virtio-net,netdev=net0,mac=C0:FF:EE:00:00:12,bus=pcie.0,addr=0x3 \
-netdev tap,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,vhost=off \
-monitor stdio \
-qmp tcp:0:5555,server,nowait \
-vnc :0


> Nothing special :
> 
> qemu-system-aarch64 -M virt -accel kvm -cpu host -nographic -m 2G \
> -drive if=pflash,format=raw,file=./EFI/efi.img,readonly=on \
> -drive if=pflash,format=raw,file=fedora-varstore.img \
> -device virtio-net,netdev=net0,mac=C0:FF:EE:00:00:12,bus=pcie.0,addr=0x3 \
> -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper,br=virbr0,vhost=off \
                                                                       ^^^^^^^^^^^
This is what is causing the crash. With vhost=off or without that option (and no vhostfd provided or forced) , within net_init_tap_one(), tap->vhost is False (when vhost=off) or tap->has_vhost is False (when no vhost option is provided). Thus vhost_net_init() will never be called since the following conditional evaluates to false :

    if (tap->has_vhost ? tap->vhost :
        vhostfdname || (tap->has_vhostforce && tap->vhostforce)) {

and s->vhost_net pointer will remain NULL.

The crash does not happen when passing vhost=on in the commandline.

Its sad we did not have a test for it and did not catch this scenario.

I will replace the assert() with a comment as to why the assert() is absent in tap network case.
 
> -drive file=fedora-arm64.qcow2,if=none,id=disk,format=qcow2,cache=none \
> -device virtio-blk-device,drive=disk \
> -serial mon:stdio
> qemu-system-aarch64: ../hw/net/vhost_net.c:510: get_vhost_net: Assertion `vhost_net' failed.
Michael S. Tsirkin June 28, 2023, 10:50 a.m. UTC | #5
On Wed, Jun 28, 2023 at 04:03:44PM +0530, Ani Sinha wrote:
> 
> 
> > On 28-Jun-2023, at 1:00 PM, Cédric Le Goater <clg@redhat.com> wrote:
> > 
> > On 6/28/23 08:45, Ani Sinha wrote:
> >>> On 28-Jun-2023, at 11:58 AM, Cédric Le Goater <clg@redhat.com> wrote:
> >>> 
> >>> Hello,
> >>> 
> >>> On 6/26/23 14:30, Michael S. Tsirkin wrote:
> >>>> From: Ani Sinha <anisinha@redhat.com>
> >>>> An assertion was missing for tap vhost backends that enforces a non-null
> >>>> reference from get_vhost_net(). Both vhost-net-user and vhost-net-vdpa
> >>>> enforces this. Enforce the same for tap. Unit tests pass with this change.
> >>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> >>>> Message-Id: <20230619041501.111655-1-anisinha@redhat.com>
> >>>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> >>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >>>> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
> >>>> ---
> >>>>  hw/net/vhost_net.c | 1 +
> >>>>  1 file changed, 1 insertion(+)
> >>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >>>> index c4eecc6f36..6db23ca323 100644
> >>>> --- a/hw/net/vhost_net.c
> >>>> +++ b/hw/net/vhost_net.c
> >>>> @@ -507,6 +507,7 @@ VHostNetState *get_vhost_net(NetClientState *nc)
> >>>>      switch (nc->info->type) {
> >>>>      case NET_CLIENT_DRIVER_TAP:
> >>>>          vhost_net = tap_get_vhost_net(nc);
> >>>> +        assert(vhost_net);
> >>>>          break;
> >>>>  #ifdef CONFIG_VHOST_NET_USER
> >>>>      case NET_CLIENT_DRIVER_VHOST_USER:
> >>> 
> >>> A system of mine without vhost_net (old host kernel) is reaching this assert
> >> We need to understand why this assertion is being hit. It could be a bug somewhere else.
> > 
> > sure.
> > 
> >> What is the backtrace? 
> > 
> > 
> > #0  __pthread_kill_implementation (threadid=549621125152, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
> > #1  0x0000007ff71df254 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
> > #2  0x0000007ff719a67c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
> > #3  0x0000007ff7187130 in __GI_abort () at ./stdlib/abort.c:79
> > #4  0x0000007ff7193fd0 in __assert_fail_base
> >    (fmt=0x7ff72ad3f8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5556208c00 "vhost_net", file=file@entry=0x5556208ae0 "../hw/net/vhost_net.c", line=line@entry=510, function=function@entry=0x5556208de8 <__PRETTY_FUNCTION__.2> "get_vhost_net") at ./assert/assert.c:92
> > #5  0x0000007ff7194040 in __GI___assert_fail
> >    (assertion=assertion@entry=0x5556208c00 "vhost_net", file=file@entry=0x5556208ae0 "../hw/net/vhost_net.c", line=line@entry=510, function=function@entry=0x5556208de8 <__PRETTY_FUNCTION__.2> "get_vhost_net") at ./assert/assert.c:101
> > #6  0x0000005555a767ac in get_vhost_net (nc=<optimized out>) at ../hw/net/vhost_net.c:510
> > #7  0x0000005555e6aea8 in virtio_net_get_features (vdev=0x55578cc770, features=1105849221095, errp=<optimized out>) at ../hw/net/virtio-net.c:807
> > #8  0x0000005555b69da4 in virtio_bus_device_plugged (vdev=vdev@entry=0x55578cc770, errp=errp@entry=0x7fffffe640) at ../hw/virtio/virtio-bus.c:66
> > #9  0x0000005555e8b6a0 in virtio_device_realize (dev=0x55578cc770, errp=0x7fffffe6c0) at ../hw/virtio/virtio.c:3609
> > #10 0x0000005555f1ead8 in device_set_realized (obj=0x55578cc770, value=<optimized out>, errp=0x7fffffe898) at ../hw/core/qdev.c:510
> > #11 0x0000005555f22a14 in property_set_bool (obj=0x55578cc770, v=<optimized out>, name=<optimized out>, opaque=0x5556be9ad0, errp=0x7fffffe898) at ../qom/object.c:2285
> > #12 0x0000005555f260fc in object_property_set (obj=obj@entry=0x55578cc770, name=name@entry=0x555621c280 "realized", v=v@entry=0x55578d9310, errp=errp@entry=0x7fffffe898)
> >    at ../qom/object.c:1420
> > #13 0x0000005555f2a85c in object_property_set_qobject
> >    (obj=obj@entry=0x55578cc770, name=name@entry=0x555621c280 "realized", value=value@entry=0x55578d9250, errp=errp@entry=0x7fffffe898) at ../qom/qom-qobject.c:28
> > #14 0x0000005555f26820 in object_property_set_bool (obj=0x55578cc770, name=0x555621c280 "realized", value=<optimized out>, errp=0x7fffffe898) at ../qom/object.c:1489
> > #15 0x0000005555aafd24 in pci_qdev_realize (qdev=<optimized out>, errp=<optimized out>) at ../hw/pci/pci.c:2116
> > #16 0x0000005555f1ead8 in device_set_realized (obj=0x55578c43a0, value=<optimized out>, errp=0x7fffffeae8) at ../hw/core/qdev.c:510
> > #17 0x0000005555f22a14 in property_set_bool (obj=0x55578c43a0, v=<optimized out>, name=<optimized out>, opaque=0x5556be9ad0, errp=0x7fffffeae8) at ../qom/object.c:2285
> > #18 0x0000005555f260fc in object_property_set (obj=obj@entry=0x55578c43a0, name=name@entry=0x555621c280 "realized", v=v@entry=0x55578d34c0, errp=errp@entry=0x7fffffeae8)
> >    at ../qom/object.c:1420
> > #19 0x0000005555f2a85c in object_property_set_qobject
> >    (obj=obj@entry=0x55578c43a0, name=name@entry=0x555621c280 "realized", value=value@entry=0x55578d3450, errp=errp@entry=0x7fffffeae8) at ../qom/qom-qobject.c:28
> > #20 0x0000005555f26820 in object_property_set_bool (obj=0x55578c43a0, name=0x555621c280 "realized", value=<optimized out>, errp=0x7fffffeae8) at ../qom/object.c:1489
> > #21 0x0000005555ba4f7c in qdev_device_add_from_qdict (opts=opts@entry=0x55578c3110, from_json=from_json@entry=false, errp=0x7fffffeae8, errp@entry=0x5556b24a28 <error_fatal>)
> >    at ../softmmu/qdev-monitor.c:714
> > #22 0x0000005555ba5170 in qdev_device_add (opts=0x5556be6060, errp=errp@entry=0x5556b24a28 <error_fatal>) at ../softmmu/qdev-monitor.c:733
> > #23 0x0000005555baa09c in device_init_func (opaque=<optimized out>, opts=<optimized out>, errp=0x5556b24a28 <error_fatal>) at ../softmmu/vl.c:1152
> > #24 0x00000055560b2124 in qemu_opts_foreach
> >    (list=<optimized out>, func=func@entry=0x5555baa080 <device_init_func>, opaque=opaque@entry=0x0, errp=errp@entry=0x5556b24a28 <error_fatal>) at ../util/qemu-option.c:1135
> > #25 0x0000005555bac88c in qemu_create_cli_devices () at ../softmmu/vl.c:2573
> > #26 qmp_x_exit_preconfig (errp=<optimized out>) at ../softmmu/vl.c:2641
> > #27 0x0000005555bafee4 in qmp_x_exit_preconfig (errp=<optimized out>) at ../softmmu/vl.c:2635
> > #28 qemu_init (argc=<optimized out>, argv=<optimized out>) at ../softmmu/vl.c:3648
> > #29 0x00000055558e839c in main (argc=<optimized out>, argv=<optimized out>) at ../softmmu/main.c:47
> > 
> >> What is the repro case?
> > 
> 
> I have been able to repro this even on x86, using the following command line:
> 
> ./qemu-system-x86_64 \
> -accel kvm \
> -machine pc-q35-8.0 \
> -m 8192 \
> -nodefaults \
> -boot strict=on \
> -nographic \
> -device virtio-net,netdev=net0,mac=C0:FF:EE:00:00:12,bus=pcie.0,addr=0x3 \
> -netdev tap,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,vhost=off \
> -monitor stdio \
> -qmp tcp:0:5555,server,nowait \
> -vnc :0
> 
> 
> > Nothing special :
> > 
> > qemu-system-aarch64 -M virt -accel kvm -cpu host -nographic -m 2G \
> > -drive if=pflash,format=raw,file=./EFI/efi.img,readonly=on \
> > -drive if=pflash,format=raw,file=fedora-varstore.img \
> > -device virtio-net,netdev=net0,mac=C0:FF:EE:00:00:12,bus=pcie.0,addr=0x3 \
> > -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper,br=virbr0,vhost=off \
>                                                                        ^^^^^^^^^^^
> This is what is causing the crash. With vhost=off or without that option (and no vhostfd provided or forced) , within net_init_tap_one(), tap->vhost is False (when vhost=off) or tap->has_vhost is False (when no vhost option is provided). Thus vhost_net_init() will never be called since the following conditional evaluates to false :
> 
>     if (tap->has_vhost ? tap->vhost :
>         vhostfdname || (tap->has_vhostforce && tap->vhostforce)) {
> 
> and s->vhost_net pointer will remain NULL.
> 
> The crash does not happen when passing vhost=on in the commandline.
> 
> Its sad we did not have a test for it and did not catch this scenario.
> 
> I will replace the assert() with a comment as to why the assert() is absent in tap network case.
>  
> > -drive file=fedora-arm64.qcow2,if=none,id=disk,format=qcow2,cache=none \
> > -device virtio-blk-device,drive=disk \
> > -serial mon:stdio
> > qemu-system-aarch64: ../hw/net/vhost_net.c:510: get_vhost_net: Assertion `vhost_net' failed.
> 

Thanks! Pls post quickly and I'll do a pull with just this fixup, so
people don't suffer from the regression.
diff mbox series

Patch

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index c4eecc6f36..6db23ca323 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -507,6 +507,7 @@  VHostNetState *get_vhost_net(NetClientState *nc)
     switch (nc->info->type) {
     case NET_CLIENT_DRIVER_TAP:
         vhost_net = tap_get_vhost_net(nc);
+        assert(vhost_net);
         break;
 #ifdef CONFIG_VHOST_NET_USER
     case NET_CLIENT_DRIVER_VHOST_USER: