virtio_net: flush uncompleted TX on reset

Message ID 152044905701.22965.8924520463227226351.stgit@bahia.lan
State New
Series virtio_net: flush uncompleted TX on reset

Commit Message

Greg Kurz March 7, 2018, 6:57 p.m. UTC
If the backend could not transmit a packet right away for some reason,
the packet is queued for asynchronous sending. The corresponding vq
element is tracked in the async_tx.elem field of the VirtIONetQueue,
for later freeing when the transmission is complete.

If a reset happens before completion, virtio_net_tx_complete() will push
async_tx.elem back to the guest anyway, and we end up with the inuse flag
of the vq being equal to -1. The next call to virtqueue_pop() is then
likely to fail with "Virtqueue size exceeded".

This can be reproduced easily by starting a guest without a net backend,
doing a system reset when it is booted, and finally snapshotting it.

The appropriate fix is to ensure that such an asynchronous transmission
cannot survive a device reset. So for all queues, we first try to send
the packet again, and eventually we purge it if the backend still could
not deliver it.

Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Buglink: https://github.com/open-power-host-os/qemu/issues/37
Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/net/virtio-net.c |   11 +++++++++++
 1 file changed, 11 insertions(+)
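
For context, a minimal sketch of the async TX path described above,
paraphrased from hw/net/virtio-net.c of this era (error handling and
vnet header handling elided; an abridged excerpt, not the exact code):

/* In virtio_net_flush_tx(): if the peer cannot take the packet right
 * away, qemu_sendv_packet_async() queues it and returns 0, and the vq
 * element is parked in async_tx.elem until the completion callback
 * fires. */
ret = qemu_sendv_packet_async(qemu_get_subqueue(n->nic, queue_index),
                              out_sg, out_num, virtio_net_tx_complete);
if (ret == 0) {
    virtio_queue_set_notification(q->tx_vq, 0);
    q->async_tx.elem = elem;
    return -EBUSY;
}

/* In virtio_net_tx_complete(): the parked element is pushed back to
 * the guest unconditionally.  If a device reset already cleared the
 * vq, virtqueue_push() takes vq->inuse from 0 to -1, and the next
 * virtqueue_pop() fails with "Virtqueue size exceeded". */
static void virtio_net_tx_complete(NetClientState *nc, ssize_t len)
{
    VirtIONetQueue *q = virtio_net_get_subqueue(nc);

    virtqueue_push(q->tx_vq, q->async_tx.elem, 0);
    virtio_notify(VIRTIO_DEVICE(qemu_get_nic_opaque(nc)), q->tx_vq);
    g_free(q->async_tx.elem);
    q->async_tx.elem = NULL;
    virtio_queue_set_notification(q->tx_vq, 1);
    virtio_net_flush_tx(q);
}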

Comments

Nageswara R Sastry March 8, 2018, 5:08 a.m. UTC | #1
Greg Kurz <groug@kaod.org> wrote on 08/03/2018 12:27:37 AM:

> From: Greg Kurz <groug@kaod.org>
> To: qemu-devel@nongnu.org
> Cc: "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang 
> <jasowang@redhat.com>, R Nageswara Sastry <nasastry@in.ibm.com>
> Date: 08/03/2018 12:27 AM
> Subject: [PATCH] virtio_net: flush uncompleted TX on reset
> 
> If the backend could not transmit a packet right away for some reason,
> the packet is queued for asynchronous sending. The corresponding vq
> element is tracked in the async_tx.elem field of the VirtIONetQueue,
> for later freeing when the transmission is complete.
> 
> If a reset happens before completion, virtio_net_tx_complete() will push
> async_tx.elem back to the guest anyway, and we end up with the inuse flag
> of the vq being equal to -1. The next call to virtqueue_pop() is then
> likely to fail with "Virtqueue size exceeded".
> 
> This can be reproduced easily by starting a guest without a net backend,
> doing a system reset when it is booted, and finally snapshotting it.
> 
> The appropriate fix is to ensure that such an asynchronous transmission
> cannot survive a device reset. So for all queues, we first try to send
> the packet again, and eventually we purge it if the backend still could
> not deliver it.
> 
> Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>

Tested-by: R. Nageswara Sastry <nasastry@in.ibm.com>

> Buglink: https://github.com/open-power-host-os/qemu/issues/37
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  hw/net/virtio-net.c |   11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 188744e17d57..eea3cdb2c700 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -422,6 +422,7 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
>  static void virtio_net_reset(VirtIODevice *vdev)
>  {
>      VirtIONet *n = VIRTIO_NET(vdev);
> +    int i;
> 
>      /* Reset back to compatibility mode */
>      n->promisc = 1;
> @@ -445,6 +446,16 @@ static void virtio_net_reset(VirtIODevice *vdev)
>      memcpy(&n->mac[0], &n->nic->conf->macaddr, sizeof(n->mac));
>      qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
>      memset(n->vlans, 0, MAX_VLAN >> 3);
> +
> +    /* Flush any async TX */
> +    for (i = 0;  i < n->max_queues; i++) {
> +        NetClientState *nc = qemu_get_subqueue(n->nic, i);
> +
> +        if (!qemu_net_queue_flush(nc->peer->incoming_queue)) {
> +            qemu_net_queue_purge(nc->peer->incoming_queue, nc);
> +        }
> +        assert(!virtio_net_get_subqueue(nc)->async_tx.elem);
> +    }
>  }
> 
>  static void peer_test_vnet_hdr(VirtIONet *n)
> 


Without patch:
(qemu) system_reset
(qemu) savevm 1
Virtqueue size exceeded
(qemu) loadvm 1
VQ 1 size 0x100 < last_avail_idx 0x0 - used_idx 0x1
Failed to load virtio-net:virtio
error while loading state for instance 0x0 of device
'pci@800000020000000:00.0/virtio-net'
Error -1 while loading VM state

With patch:
(qemu) system_reset
(qemu) savevm 1
(qemu) loadvm 1
Greg Kurz March 15, 2018, 9:45 a.m. UTC | #2
Ping?

On Thu, 8 Mar 2018 10:38:56 +0530
"Nageswara R Sastry" <nasastry@in.ibm.com> wrote:

> Greg Kurz <groug@kaod.org> wrote on 08/03/2018 12:27:37 AM:
> 
> > From: Greg Kurz <groug@kaod.org>
> > To: qemu-devel@nongnu.org
> > Cc: "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang 
> > <jasowang@redhat.com>, R Nageswara Sastry <nasastry@in.ibm.com>
> > Date: 08/03/2018 12:27 AM
> > Subject: [PATCH] virtio_net: flush uncompleted TX on reset
> > 
> > If the backend could not transmit a packet right away for some reason,
> > the packet is queued for asynchronous sending. The corresponding vq
> > element is tracked in the async_tx.elem field of the VirtIONetQueue,
> > for later freeing when the transmission is complete.
> > 
> > If a reset happens before completion, virtio_net_tx_complete() will push
> > async_tx.elem back to the guest anyway, and we end up with the inuse flag
> > of the vq being equal to -1. The next call to virtqueue_pop() is then
> > likely to fail with "Virtqueue size exceeded".
> > 
> > This can be reproduced easily by starting a guest without a net backend,
> > doing a system reset when it is booted, and finally snapshotting it.
> > 
> > The appropriate fix is to ensure that such an asynchronous transmission
> > cannot survive a device reset. So for all queues, we first try to send
> > the packet again, and eventually we purge it if the backend still could
> > not deliver it.
> > 
> > Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>  
> 
> Tested-by: R. Nageswara Sastry <nasastry@in.ibm.com>
> 
> > Buglink: https://github.com/open-power-host-os/qemu/issues/37
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> > ---
> >  hw/net/virtio-net.c |   11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index 188744e17d57..eea3cdb2c700 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -422,6 +422,7 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
> >  static void virtio_net_reset(VirtIODevice *vdev)
> >  {
> >      VirtIONet *n = VIRTIO_NET(vdev);
> > +    int i;
> > 
> >      /* Reset back to compatibility mode */
> >      n->promisc = 1;
> > @@ -445,6 +446,16 @@ static void virtio_net_reset(VirtIODevice *vdev)
> >      memcpy(&n->mac[0], &n->nic->conf->macaddr, sizeof(n->mac));
> >      qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
> >      memset(n->vlans, 0, MAX_VLAN >> 3);
> > +
> > +    /* Flush any async TX */
> > +    for (i = 0;  i < n->max_queues; i++) {
> > +        NetClientState *nc = qemu_get_subqueue(n->nic, i);
> > +
> > +        if (!qemu_net_queue_flush(nc->peer->incoming_queue)) {
> > +            qemu_net_queue_purge(nc->peer->incoming_queue, nc);
> > +        }
> > +        assert(!virtio_net_get_subqueue(nc)->async_tx.elem);
> > +    }
> >  }
> > 
> >  static void peer_test_vnet_hdr(VirtIONet *n)
> >   
> 
> 
> Without patch:
> (qemu) system_reset
> (qemu) savevm 1
> Virtqueue size exceeded
> (qemu) loadvm 1
> VQ 1 size 0x100 < last_avail_idx 0x0 - used_idx 0x1
> Failed to load virtio-net:virtio
> error while loading state for instance 0x0 of device
> 'pci@800000020000000:00.0/virtio-net'
> Error -1 while loading VM state
> 
> With patch:
> (qemu) system_reset
> (qemu) savevm 1
> (qemu) loadvm 1
Jason Wang March 16, 2018, 8:05 a.m. UTC | #3
On 2018-03-08 02:57, Greg Kurz wrote:
> If the backend could not transmit a packet right away for some reason,
> the packet is queued for asynchronous sending. The corresponding vq
> element is tracked in the async_tx.elem field of the VirtIONetQueue,
> for later freeing when the transmission is complete.
>
> If a reset happens before completion, virtio_net_tx_complete() will push
> async_tx.elem back to the guest anyway, and we end up with the inuse flag
> of the vq being equal to -1. The next call to virtqueue_pop() is then
> likely to fail with "Virtqueue size exceeded".
>
> This can be reproduced easily by starting a guest without a net backend,
> doing a system reset when it is booted, and finally snapshotting it.
>
> The appropriate fix is to ensure that such an asynchronous transmission
> cannot survive a device reset. So for all queues, we first try to send
> the packet again, and eventually we purge it if the backend still could
> not deliver it.
>
> Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
> Buglink: https://github.com/open-power-host-os/qemu/issues/37
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>   hw/net/virtio-net.c |   11 +++++++++++
>   1 file changed, 11 insertions(+)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 188744e17d57..eea3cdb2c700 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -422,6 +422,7 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
>   static void virtio_net_reset(VirtIODevice *vdev)
>   {
>       VirtIONet *n = VIRTIO_NET(vdev);
> +    int i;
>   
>       /* Reset back to compatibility mode */
>       n->promisc = 1;
> @@ -445,6 +446,16 @@ static void virtio_net_reset(VirtIODevice *vdev)
>       memcpy(&n->mac[0], &n->nic->conf->macaddr, sizeof(n->mac));
>       qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
>       memset(n->vlans, 0, MAX_VLAN >> 3);
> +
> +    /* Flush any async TX */
> +    for (i = 0;  i < n->max_queues; i++) {
> +        NetClientState *nc = qemu_get_subqueue(n->nic, i);
> +
> +        if (!qemu_net_queue_flush(nc->peer->incoming_queue)) {
> +            qemu_net_queue_purge(nc->peer->incoming_queue, nc);
> +        }

Looks like we can use qemu_flush_or_purge_queued_packets(nc->peer) here.

But one question: you said it could be reproduced without a backend; in
that case nc->peer should be NULL, I believe, or we won't even get here
since qemu_sendv_packet_async() won't return zero?

Thanks

> +        assert(!virtio_net_get_subqueue(nc)->async_tx.elem);
> +    }
>   }
>   
>   static void peer_test_vnet_hdr(VirtIONet *n)
Greg Kurz March 16, 2018, 9:58 a.m. UTC | #4
On Fri, 16 Mar 2018 16:05:17 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2018-03-08 02:57, Greg Kurz wrote:
> > If the backend could not transmit a packet right away for some reason,
> > the packet is queued for asynchronous sending. The corresponding vq
> > element is tracked in the async_tx.elem field of the VirtIONetQueue,
> > for later freeing when the transmission is complete.
> >
> > If a reset happens before completion, virtio_net_tx_complete() will push
> > async_tx.elem back to the guest anyway, and we end up with the inuse flag
> > of the vq being equal to -1. The next call to virtqueue_pop() is then
> > likely to fail with "Virtqueue size exceeded".
> >
> > This can be reproduced easily by starting a guest without a net backend,
> > doing a system reset when it is booted, and finally snapshotting it.
> >
> > The appropriate fix is to ensure that such an asynchronous transmission
> > cannot survive a device reset. So for all queues, we first try to send
> > the packet again, and eventually we purge it if the backend still could
> > not deliver it.
> >
> > Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
> > Buglink: https://github.com/open-power-host-os/qemu/issues/37
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> > ---
> >   hw/net/virtio-net.c |   11 +++++++++++
> >   1 file changed, 11 insertions(+)
> >
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index 188744e17d57..eea3cdb2c700 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -422,6 +422,7 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
> >   static void virtio_net_reset(VirtIODevice *vdev)
> >   {
> >       VirtIONet *n = VIRTIO_NET(vdev);
> > +    int i;
> >   
> >       /* Reset back to compatibility mode */
> >       n->promisc = 1;
> > @@ -445,6 +446,16 @@ static void virtio_net_reset(VirtIODevice *vdev)
> >       memcpy(&n->mac[0], &n->nic->conf->macaddr, sizeof(n->mac));
> >       qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
> >       memset(n->vlans, 0, MAX_VLAN >> 3);
> > +
> > +    /* Flush any async TX */
> > +    for (i = 0;  i < n->max_queues; i++) {
> > +        NetClientState *nc = qemu_get_subqueue(n->nic, i);
> > +
> > +        if (!qemu_net_queue_flush(nc->peer->incoming_queue)) {
> > +            qemu_net_queue_purge(nc->peer->incoming_queue, nc);
> > +        }  
> 
> Looks like we can use qemu_flush_or_purge_queued_packets(nc->peer) here.
> 

It should be made extern first but we can use it indeed.
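
A minimal sketch of what the hunk could become, assuming the helper
keeps its current (NetClientState *nc, bool purge) signature once
exported (a hypothetical v2, not the code as posted):

/* Flush any async TX; purge whatever the peer still cannot deliver.
 * Checking nc->peer also covers NICs that have no peer at all. */
for (i = 0; i < n->max_queues; i++) {
    NetClientState *nc = qemu_get_subqueue(n->nic, i);

    if (nc->peer) {
        qemu_flush_or_purge_queued_packets(nc->peer, true);
        assert(!virtio_net_get_subqueue(nc)->async_tx.elem);
    }
}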

> But one question: you said it could be reproduced without a backend; in
> that case nc->peer should be NULL, I believe, or we won't even get here
> since qemu_sendv_packet_async() won't return zero?
> 

My bad, I didn't use appropriate wording. The issue is always reproducible
if you only pass '-net nic,model=virtio', without any other -net option that
would provide a functional network.

ie,

-device virtio-net-pci,netdev=netdev0 -netdev hubport,id=netdev0,hubid=0

So, we do have a hubport peer, but since it isn't connected to the host network,
net_hub_port_can_receive() returns 0, and so does qemu_sendv_packet_async().
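
For reference, the hub logic in question, lightly paraphrased from
net/hub.c: a hub port only accepts a packet if at least one *other*
port on the same hub can send it onward, so a hub with a single port
(the NIC's peer) never accepts anything and the packet gets queued.

static int net_hub_port_can_receive(NetClientState *nc)
{
    NetHubPort *src_port = DO_UPCAST(NetHubPort, nc, nc);
    NetHub *hub = src_port->hub;
    NetHubPort *port;

    QLIST_FOREACH(port, &hub->ports, next) {
        if (port == src_port) {
            continue;
        }
        if (qemu_can_send_packet(&port->nc)) {
            return 1;
        }
    }
    return 0;
}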

What about the following change in the changelog?

"This can be reproduced easily by starting a guest with an hubport
 backend that is not connected to a functional network, eg,

 -device virtio-net-pci,netdev=netdev0 -netdev hubport,id=netdev0,hubid=0"

Cheers,

--
Greg

> Thanks
> 
> > +        assert(!virtio_net_get_subqueue(nc)->async_tx.elem);
> > +    }
> >   }
> >   
> >   static void peer_test_vnet_hdr(VirtIONet *n)
> >
> >  
>
Jason Wang March 16, 2018, 11:28 a.m. UTC | #5
On 2018-03-16 17:58, Greg Kurz wrote:
> On Fri, 16 Mar 2018 16:05:17 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 2018-03-08 02:57, Greg Kurz wrote:
>>> If the backend could not transmit a packet right away for some reason,
>>> the packet is queued for asynchronous sending. The corresponding vq
>>> element is tracked in the async_tx.elem field of the VirtIONetQueue,
>>> for later freeing when the transmission is complete.
>>>
>>> If a reset happens before completion, virtio_net_tx_complete() will push
>>> async_tx.elem back to the guest anyway, and we end up with the inuse flag
>>> of the vq being equal to -1. The next call to virtqueue_pop() is then
>>> likely to fail with "Virtqueue size exceeded".
>>>
>>> This can be reproduced easily by starting a guest without a net backend,
>>> doing a system reset when it is booted, and finally snapshotting it.
>>>
>>> The appropriate fix is to ensure that such an asynchronous transmission
>>> cannot survive a device reset. So for all queues, we first try to send
>>> the packet again, and eventually we purge it if the backend still could
>>> not deliver it.
>>>
>>> Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
>>> Buglink: https://github.com/open-power-host-os/qemu/issues/37
>>> Signed-off-by: Greg Kurz <groug@kaod.org>
>>> ---
>>>    hw/net/virtio-net.c |   11 +++++++++++
>>>    1 file changed, 11 insertions(+)
>>>
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index 188744e17d57..eea3cdb2c700 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -422,6 +422,7 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
>>>    static void virtio_net_reset(VirtIODevice *vdev)
>>>    {
>>>        VirtIONet *n = VIRTIO_NET(vdev);
>>> +    int i;
>>>    
>>>        /* Reset back to compatibility mode */
>>>        n->promisc = 1;
>>> @@ -445,6 +446,16 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>>        memcpy(&n->mac[0], &n->nic->conf->macaddr, sizeof(n->mac));
>>>        qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
>>>        memset(n->vlans, 0, MAX_VLAN >> 3);
>>> +
>>> +    /* Flush any async TX */
>>> +    for (i = 0;  i < n->max_queues; i++) {
>>> +        NetClientState *nc = qemu_get_subqueue(n->nic, i);
>>> +
>>> +        if (!qemu_net_queue_flush(nc->peer->incoming_queue)) {
>>> +            qemu_net_queue_purge(nc->peer->incoming_queue, nc);
>>> +        }
>> Looks like we can use qemu_flush_or_purge_queued_packets(nc->peer) here.
>>
> It should be made extern first but we can use it indeed.
>
>> But one question: you said it could be reproduced without a backend; in
>> that case nc->peer should be NULL, I believe, or we won't even get here
>> since qemu_sendv_packet_async() won't return zero?
>>
> My bad, I didn't use appropriate wording. The issue is always reproducible
> if you only pass '-net nic,model=virtio', without any other -net option that
> would provide a functional network.
>
> ie,
>
> -device virtio-net-pci,netdev=netdev0 -netdev hubport,id=netdev0,hubid=0
>
> So, we do have a hubport peer, but since it isn't connected to the host network,
> net_hub_port_can_receive() returns 0, and so does qemu_sendv_packet_async().
>
> What about the following change in the changelog?
>
> "This can be reproduced easily by starting a guest with an hubport
>   backend that is not connected to a functional network, eg,
>
>   -device virtio-net-pci,netdev=netdev0 -netdev hubport,id=netdev0,hubid=0"
>
> Cheers,

Looks good to me.

Thanks

> --
> Greg
>
>> Thanks
>>
>>> +        assert(!virtio_net_get_subqueue(nc)->async_tx.elem);
>>> +    }
>>>    }
>>>    
>>>    static void peer_test_vnet_hdr(VirtIONet *n)

Patch

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 188744e17d57..eea3cdb2c700 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -422,6 +422,7 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
 static void virtio_net_reset(VirtIODevice *vdev)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
+    int i;
 
     /* Reset back to compatibility mode */
     n->promisc = 1;
@@ -445,6 +446,16 @@ static void virtio_net_reset(VirtIODevice *vdev)
     memcpy(&n->mac[0], &n->nic->conf->macaddr, sizeof(n->mac));
     qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
     memset(n->vlans, 0, MAX_VLAN >> 3);
+
+    /* Flush any async TX */
+    for (i = 0;  i < n->max_queues; i++) {
+        NetClientState *nc = qemu_get_subqueue(n->nic, i);
+
+        if (!qemu_net_queue_flush(nc->peer->incoming_queue)) {
+            qemu_net_queue_purge(nc->peer->incoming_queue, nc);
+        }
+        assert(!virtio_net_get_subqueue(nc)->async_tx.elem);
+    }
 }
 
 static void peer_test_vnet_hdr(VirtIONet *n)