
virtio-net: Drop net_virtio_info.can_receive

Message ID 1435633611-14023-1-git-send-email-famz@redhat.com
State New

Commit Message

Fam Zheng June 30, 2015, 3:06 a.m. UTC
virtio_net_receive still does the check by calling
virtio_net_can_receive: if the device or driver is not ready, the packet
is dropped.

This is necessary because returning false from can_receive complicates
things: the peer would disable sending until we explicitly flush the
queue.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 hw/net/virtio-net.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Jason Wang June 30, 2015, 8:35 a.m. UTC | #1
On 06/30/2015 11:06 AM, Fam Zheng wrote:
> virtio_net_receive still does the check by calling
> virtio_net_can_receive, if the device or driver is not ready, the packet
> is dropped.
>
> This is necessary because returning false from can_receive complicates
> things: the peer would disable sending until we explicitly flush the
> queue.
>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  hw/net/virtio-net.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index d728233..dbef0d0 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
>  static NetClientInfo net_virtio_info = {
>      .type = NET_CLIENT_OPTIONS_KIND_NIC,
>      .size = sizeof(NICState),
> -    .can_receive = virtio_net_can_receive,
>      .receive = virtio_net_receive,
>      .link_status_changed = virtio_net_set_link_status,
>      .query_rx_filter = virtio_net_query_rxfilter,

A side effect of this patch is that it will read and then drop the packet
if the guest driver is not ok.
Stefan Hajnoczi July 2, 2015, 12:46 p.m. UTC | #2
On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
> On 06/30/2015 11:06 AM, Fam Zheng wrote:
> > virtio_net_receive still does the check by calling
> > virtio_net_can_receive, if the device or driver is not ready, the packet
> > is dropped.
> >
> > This is necessary because returning false from can_receive complicates
> > things: the peer would disable sending until we explicitly flush the
> > queue.
> >
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> >  hw/net/virtio-net.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index d728233..dbef0d0 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
> >  static NetClientInfo net_virtio_info = {
> >      .type = NET_CLIENT_OPTIONS_KIND_NIC,
> >      .size = sizeof(NICState),
> > -    .can_receive = virtio_net_can_receive,
> >      .receive = virtio_net_receive,
> >      .link_status_changed = virtio_net_set_link_status,
> >      .query_rx_filter = virtio_net_query_rxfilter,
> 
> A side effect of this patch is that it will read and then drop the packet
> if the guest driver is not ok.

I think that the semantics of .can_receive() and .receive() return
values are currently incorrect in many NICs.  They have .can_receive()
functions that return false for conditions where .receive() would
discard the packet.  So what happens is that packets get queued when
they should actually be discarded.

The purpose of the flow control (queuing) mechanism is to tell the
sender to hold off until the receiver has more rx buffers available.
It's a short-term thing that doesn't include link down, rx disable, or
NIC reset states.

Therefore, I think this patch will not introduce a regression.  It is
adjusting the code to stop queuing packets when they should actually be
dropped.

Thoughts?

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Michael S. Tsirkin July 2, 2015, 4:46 p.m. UTC | #3
On Thu, Jul 02, 2015 at 01:46:26PM +0100, Stefan Hajnoczi wrote:
> On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
> > On 06/30/2015 11:06 AM, Fam Zheng wrote:
> > > virtio_net_receive still does the check by calling
> > > virtio_net_can_receive, if the device or driver is not ready, the packet
> > > is dropped.
> > >
> > > This is necessary because returning false from can_receive complicates
> > > things: the peer would disable sending until we explicitly flush the
> > > queue.
> > >
> > > Signed-off-by: Fam Zheng <famz@redhat.com>
> > > ---
> > >  hw/net/virtio-net.c | 1 -
> > >  1 file changed, 1 deletion(-)
> > >
> > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > index d728233..dbef0d0 100644
> > > --- a/hw/net/virtio-net.c
> > > +++ b/hw/net/virtio-net.c
> > > @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
> > >  static NetClientInfo net_virtio_info = {
> > >      .type = NET_CLIENT_OPTIONS_KIND_NIC,
> > >      .size = sizeof(NICState),
> > > -    .can_receive = virtio_net_can_receive,
> > >      .receive = virtio_net_receive,
> > >      .link_status_changed = virtio_net_set_link_status,
> > >      .query_rx_filter = virtio_net_query_rxfilter,
> > 
> > A side effect of this patch is that it will read and then drop the packet
> > if the guest driver is not ok.
> 
> I think that the semantics of .can_receive() and .receive() return
> values are currently incorrect in many NICs.  They have .can_receive()
> functions that return false for conditions where .receive() would
> discard the packet.  So what happens is that packets get queued when
> they should actually be discarded.
> 
> The purpose of the flow control (queuing) mechanism is to tell the
> sender to hold off until the receiver has more rx buffers available.
> It's a short-term thing that doesn't include link down, rx disable, or
> NIC reset states.
> 
> Therefore, I think this patch will not introduce a regression.  It is
> adjusting the code to stop queuing packets when they should actually be
> dropped.
> 
> Thoughts?
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

OK, but we do have transient out-of-buffers states too, and
virtio_net_can_receive checks these as well.

To me, it looks like we should change virtio_net_can_receive to
avoid checking link down state, etc.
But I don't see why we should remove it completely.
Fam Zheng July 3, 2015, 1:12 a.m. UTC | #4
On Thu, 07/02 18:46, Michael S. Tsirkin wrote:
> On Thu, Jul 02, 2015 at 01:46:26PM +0100, Stefan Hajnoczi wrote:
> > On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
> > > On 06/30/2015 11:06 AM, Fam Zheng wrote:
> > > > virtio_net_receive still does the check by calling
> > > > virtio_net_can_receive, if the device or driver is not ready, the packet
> > > > is dropped.
> > > >
> > > > This is necessary because returning false from can_receive complicates
> > > > things: the peer would disable sending until we explicitly flush the
> > > > queue.
> > > >
> > > > Signed-off-by: Fam Zheng <famz@redhat.com>
> > > > ---
> > > >  hw/net/virtio-net.c | 1 -
> > > >  1 file changed, 1 deletion(-)
> > > >
> > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > > index d728233..dbef0d0 100644
> > > > --- a/hw/net/virtio-net.c
> > > > +++ b/hw/net/virtio-net.c
> > > > @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
> > > >  static NetClientInfo net_virtio_info = {
> > > >      .type = NET_CLIENT_OPTIONS_KIND_NIC,
> > > >      .size = sizeof(NICState),
> > > > -    .can_receive = virtio_net_can_receive,
> > > >      .receive = virtio_net_receive,
> > > >      .link_status_changed = virtio_net_set_link_status,
> > > >      .query_rx_filter = virtio_net_query_rxfilter,
> > > 
> > > A side effect of this patch is that it will read and then drop the packet
> > > if the guest driver is not ok.
> > 
> > I think that the semantics of .can_receive() and .receive() return
> > values are currently incorrect in many NICs.  They have .can_receive()
> > functions that return false for conditions where .receive() would
> > discard the packet.  So what happens is that packets get queued when
> > they should actually be discarded.
> > 
> > The purpose of the flow control (queuing) mechanism is to tell the
> > sender to hold off until the receiver has more rx buffers available.
> > It's a short-term thing that doesn't include link down, rx disable, or
> > NIC reset states.
> > 
> > Therefore, I think this patch will not introduce a regression.  It is
> > adjusting the code to stop queuing packets when they should actually be
> > dropped.
> > 
> > Thoughts?
> > 
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> 
> OK, but we do have transient out of buffers states too, and
> virtio_net_can_receive checks these as well.
> 
> To me, it looks like we should change virtio_net_can_receive to
> avoid checking link down state, etc.
> But I don't see why we should remove it completely.

OK, it makes sense to check the buffer state.

Fam
Fam Zheng July 3, 2015, 4:17 a.m. UTC | #5
On Fri, 07/03 09:12, Fam Zheng wrote:
> On Thu, 07/02 18:46, Michael S. Tsirkin wrote:
> > On Thu, Jul 02, 2015 at 01:46:26PM +0100, Stefan Hajnoczi wrote:
> > > On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
> > > > On 06/30/2015 11:06 AM, Fam Zheng wrote:
> > > > > virtio_net_receive still does the check by calling
> > > > > virtio_net_can_receive, if the device or driver is not ready, the packet
> > > > > is dropped.
> > > > >
> > > > > This is necessary because returning false from can_receive complicates
> > > > > things: the peer would disable sending until we explicitly flush the
> > > > > queue.
> > > > >
> > > > > Signed-off-by: Fam Zheng <famz@redhat.com>
> > > > > ---
> > > > >  hw/net/virtio-net.c | 1 -
> > > > >  1 file changed, 1 deletion(-)
> > > > >
> > > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > > > index d728233..dbef0d0 100644
> > > > > --- a/hw/net/virtio-net.c
> > > > > +++ b/hw/net/virtio-net.c
> > > > > @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
> > > > >  static NetClientInfo net_virtio_info = {
> > > > >      .type = NET_CLIENT_OPTIONS_KIND_NIC,
> > > > >      .size = sizeof(NICState),
> > > > > -    .can_receive = virtio_net_can_receive,
> > > > >      .receive = virtio_net_receive,
> > > > >      .link_status_changed = virtio_net_set_link_status,
> > > > >      .query_rx_filter = virtio_net_query_rxfilter,
> > > > 
> > > > A side effect of this patch is that it will read and then drop the packet
> > > > if the guest driver is not ok.
> > > 
> > > I think that the semantics of .can_receive() and .receive() return
> > > values are currently incorrect in many NICs.  They have .can_receive()
> > > functions that return false for conditions where .receive() would
> > > discard the packet.  So what happens is that packets get queued when
> > > they should actually be discarded.
> > > 
> > > The purpose of the flow control (queuing) mechanism is to tell the
> > > sender to hold off until the receiver has more rx buffers available.
> > > It's a short-term thing that doesn't include link down, rx disable, or
> > > NIC reset states.
> > > 
> > > Therefore, I think this patch will not introduce a regression.  It is
> > > adjusting the code to stop queuing packets when they should actually be
> > > dropped.
> > > 
> > > Thoughts?
> > > 
> > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > 
> > OK, but we do have transient out of buffers states too, and
> > virtio_net_can_receive checks these as well.

I'm a bit confused, do you mean virtio_queue_ready()? That is a buffer presence
check, and the free-buffer check virtio_net_has_buffers() is only in
virtio_net_receive(). Am I missing something?

Fam

> > 
> > To me, it looks like we should change virtio_net_can_receive to
> > avoid checking link down state, etc.
> > But I don't see why we should remove it completely.
> 
> OK, it makes sense to check the buffer state.
> 
> Fam
>
Michael S. Tsirkin July 4, 2015, 6:48 p.m. UTC | #6
On Fri, Jul 03, 2015 at 12:17:38PM +0800, Fam Zheng wrote:
> On Fri, 07/03 09:12, Fam Zheng wrote:
> > On Thu, 07/02 18:46, Michael S. Tsirkin wrote:
> > > On Thu, Jul 02, 2015 at 01:46:26PM +0100, Stefan Hajnoczi wrote:
> > > > On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
> > > > > On 06/30/2015 11:06 AM, Fam Zheng wrote:
> > > > > > virtio_net_receive still does the check by calling
> > > > > > virtio_net_can_receive, if the device or driver is not ready, the packet
> > > > > > is dropped.
> > > > > >
> > > > > > This is necessary because returning false from can_receive complicates
> > > > > > things: the peer would disable sending until we explicitly flush the
> > > > > > queue.
> > > > > >
> > > > > > Signed-off-by: Fam Zheng <famz@redhat.com>
> > > > > > ---
> > > > > >  hw/net/virtio-net.c | 1 -
> > > > > >  1 file changed, 1 deletion(-)
> > > > > >
> > > > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > > > > index d728233..dbef0d0 100644
> > > > > > --- a/hw/net/virtio-net.c
> > > > > > +++ b/hw/net/virtio-net.c
> > > > > > @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
> > > > > >  static NetClientInfo net_virtio_info = {
> > > > > >      .type = NET_CLIENT_OPTIONS_KIND_NIC,
> > > > > >      .size = sizeof(NICState),
> > > > > > -    .can_receive = virtio_net_can_receive,
> > > > > >      .receive = virtio_net_receive,
> > > > > >      .link_status_changed = virtio_net_set_link_status,
> > > > > >      .query_rx_filter = virtio_net_query_rxfilter,
> > > > > 
> > > > > A side effect of this patch is that it will read and then drop the packet
> > > > > if the guest driver is not ok.
> > > > 
> > > > I think that the semantics of .can_receive() and .receive() return
> > > > values are currently incorrect in many NICs.  They have .can_receive()
> > > > functions that return false for conditions where .receive() would
> > > > discard the packet.  So what happens is that packets get queued when
> > > > they should actually be discarded.
> > > > 
> > > > The purpose of the flow control (queuing) mechanism is to tell the
> > > > sender to hold off until the receiver has more rx buffers available.
> > > > It's a short-term thing that doesn't include link down, rx disable, or
> > > > NIC reset states.
> > > > 
> > > > Therefore, I think this patch will not introduce a regression.  It is
> > > > adjusting the code to stop queuing packets when they should actually be
> > > > dropped.
> > > > 
> > > > Thoughts?
> > > > 
> > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > 
> > > OK, but we do have transient out of buffers states too, and
> > > virtio_net_can_receive checks these as well.
> 
> I'm a bit confused, do you mean virtio_queue_ready()? That is a buffer presence
> check, and the free-buffer check virtio_net_has_buffers() is only in
> virtio_net_receive(). Am I missing something?
> 
> Fam

Interesting. You are right.

See also commit cdd5cc12ba8cf0c068da319370bdd3ba45eaf7ac.


> > > 
> > > To me, it looks like we should change virtio_net_can_receive to
> > > avoid checking link down state, etc.
> > > But I don't see why we should remove it completely.
> > 
> > OK, it makes sense to check the buffer state.
> > 
> > Fam
> >
Jason Wang July 6, 2015, 3:32 a.m. UTC | #7
On 07/02/2015 08:46 PM, Stefan Hajnoczi wrote:
> On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
>> On 06/30/2015 11:06 AM, Fam Zheng wrote:
>>> virtio_net_receive still does the check by calling
>>> virtio_net_can_receive, if the device or driver is not ready, the packet
>>> is dropped.
>>>
>>> This is necessary because returning false from can_receive complicates
>>> things: the peer would disable sending until we explicitly flush the
>>> queue.
>>>
>>> Signed-off-by: Fam Zheng <famz@redhat.com>
>>> ---
>>>  hw/net/virtio-net.c | 1 -
>>>  1 file changed, 1 deletion(-)
>>>
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index d728233..dbef0d0 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
>>>  static NetClientInfo net_virtio_info = {
>>>      .type = NET_CLIENT_OPTIONS_KIND_NIC,
>>>      .size = sizeof(NICState),
>>> -    .can_receive = virtio_net_can_receive,
>>>      .receive = virtio_net_receive,
>>>      .link_status_changed = virtio_net_set_link_status,
>>>      .query_rx_filter = virtio_net_query_rxfilter,
>> A side effect of this patch is that it will read and then drop the packet
>> if the guest driver is not ok.
> I think that the semantics of .can_receive() and .receive() return
> values are currently incorrect in many NICs.  They have .can_receive()
> functions that return false for conditions where .receive() would
> discard the packet.  So what happens is that packets get queued when
> they should actually be discarded.

Yes, but they are bugs more or less.

>
> The purpose of the flow control (queuing) mechanism is to tell the
> sender to hold off until the receiver has more rx buffers available.
> It's a short-term thing that doesn't include link down, rx disable, or
> NIC reset states.
>
> Therefore, I think this patch will not introduce a regression.  It is
> adjusting the code to stop queuing packets when they should actually be
> dropped.
>
> Thoughts?

I agree there's no functional issue. But it wastes CPU cycles
(consider a guest being flooded). Sometimes it may even be dangerous. For
tap, we're probably ok since we have 756ae78b, but for other backends, we
don't.

>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi July 6, 2015, 3:21 p.m. UTC | #8
On Mon, Jul 06, 2015 at 11:32:25AM +0800, Jason Wang wrote:
> 
> 
> On 07/02/2015 08:46 PM, Stefan Hajnoczi wrote:
> > On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
> >> On 06/30/2015 11:06 AM, Fam Zheng wrote:
> >>> virtio_net_receive still does the check by calling
> >>> virtio_net_can_receive, if the device or driver is not ready, the packet
> >>> is dropped.
> >>>
> >>> This is necessary because returning false from can_receive complicates
> >>> things: the peer would disable sending until we explicitly flush the
> >>> queue.
> >>>
> >>> Signed-off-by: Fam Zheng <famz@redhat.com>
> >>> ---
> >>>  hw/net/virtio-net.c | 1 -
> >>>  1 file changed, 1 deletion(-)
> >>>
> >>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>> index d728233..dbef0d0 100644
> >>> --- a/hw/net/virtio-net.c
> >>> +++ b/hw/net/virtio-net.c
> >>> @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
> >>>  static NetClientInfo net_virtio_info = {
> >>>      .type = NET_CLIENT_OPTIONS_KIND_NIC,
> >>>      .size = sizeof(NICState),
> >>> -    .can_receive = virtio_net_can_receive,
> >>>      .receive = virtio_net_receive,
> >>>      .link_status_changed = virtio_net_set_link_status,
> >>>      .query_rx_filter = virtio_net_query_rxfilter,
> >> A side effect of this patch is that it will read and then drop the packet
> >> if the guest driver is not ok.
> > I think that the semantics of .can_receive() and .receive() return
> > values are currently incorrect in many NICs.  They have .can_receive()
> > functions that return false for conditions where .receive() would
> > discard the packet.  So what happens is that packets get queued when
> > they should actually be discarded.
> 
> Yes, but they are bugs more or less.
> 
> >
> > The purpose of the flow control (queuing) mechanism is to tell the
> > sender to hold off until the receiver has more rx buffers available.
> > It's a short-term thing that doesn't include link down, rx disable, or
> > NIC reset states.
> >
> > Therefore, I think this patch will not introduce a regression.  It is
> > adjusting the code to stop queuing packets when they should actually be
> > dropped.
> >
> > Thoughts?
> 
> I agree there's no functional issue. But it wastes CPU cycles
> (consider a guest being flooded). Sometimes it may even be dangerous. For
> tap, we're probably ok since we have 756ae78b, but for other backends, we
> don't.

If the guest uses iptables rules or other mechanisms to drop bogus
packets, the cost is even higher than discarding them at the QEMU layer.

What's more, if you're using link down as a DoS mitigation
strategy, then you might as well hot-unplug the NIC.

Stefan
Michael S. Tsirkin July 6, 2015, 5:09 p.m. UTC | #9
On Mon, Jul 06, 2015 at 04:21:16PM +0100, Stefan Hajnoczi wrote:
> On Mon, Jul 06, 2015 at 11:32:25AM +0800, Jason Wang wrote:
> > 
> > 
> > On 07/02/2015 08:46 PM, Stefan Hajnoczi wrote:
> > > On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
> > >> On 06/30/2015 11:06 AM, Fam Zheng wrote:
> > >>> virtio_net_receive still does the check by calling
> > >>> virtio_net_can_receive, if the device or driver is not ready, the packet
> > >>> is dropped.
> > >>>
> > >>> This is necessary because returning false from can_receive complicates
> > >>> things: the peer would disable sending until we explicitly flush the
> > >>> queue.
> > >>>
> > >>> Signed-off-by: Fam Zheng <famz@redhat.com>
> > >>> ---
> > >>>  hw/net/virtio-net.c | 1 -
> > >>>  1 file changed, 1 deletion(-)
> > >>>
> > >>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > >>> index d728233..dbef0d0 100644
> > >>> --- a/hw/net/virtio-net.c
> > >>> +++ b/hw/net/virtio-net.c
> > >>> @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
> > >>>  static NetClientInfo net_virtio_info = {
> > >>>      .type = NET_CLIENT_OPTIONS_KIND_NIC,
> > >>>      .size = sizeof(NICState),
> > >>> -    .can_receive = virtio_net_can_receive,
> > >>>      .receive = virtio_net_receive,
> > >>>      .link_status_changed = virtio_net_set_link_status,
> > >>>      .query_rx_filter = virtio_net_query_rxfilter,
> > >> A side effect of this patch is that it will read and then drop the packet
> > >> if the guest driver is not ok.
> > > I think that the semantics of .can_receive() and .receive() return
> > > values are currently incorrect in many NICs.  They have .can_receive()
> > > functions that return false for conditions where .receive() would
> > > discard the packet.  So what happens is that packets get queued when
> > > they should actually be discarded.
> > 
> > Yes, but they are bugs more or less.
> > 
> > >
> > > The purpose of the flow control (queuing) mechanism is to tell the
> > > sender to hold off until the receiver has more rx buffers available.
> > > It's a short-term thing that doesn't include link down, rx disable, or
> > > NIC reset states.
> > >
> > > Therefore, I think this patch will not introduce a regression.  It is
> > > adjusting the code to stop queuing packets when they should actually be
> > > dropped.
> > >
> > > Thoughts?
> > 
> > I agree there's no functional issue. But it wastes CPU cycles
> > (consider a guest being flooded). Sometimes it may even be dangerous. For
> > tap, we're probably ok since we have 756ae78b, but for other backends, we
> > don't.
> 
> If the guest uses iptables rules or other mechanisms to drop bogus
> packets the cost is even higher than discarding them at the QEMU layer.
> 
> What's more is that if you're using link down as a DoS mitigation
> strategy then you might as well hot unplug the NIC.
> 
> Stefan



Frankly, I don't see the point of the patch.  Is this supposed to be a
bugfix? If so, there should be a description of how to trigger the
bug.  Is this an optimization? If so, there should be some numbers
showing a gain.
Fam Zheng July 7, 2015, 12:53 a.m. UTC | #10
On Mon, 07/06 20:09, Michael S. Tsirkin wrote:
> On Mon, Jul 06, 2015 at 04:21:16PM +0100, Stefan Hajnoczi wrote:
> > On Mon, Jul 06, 2015 at 11:32:25AM +0800, Jason Wang wrote:
> > > 
> > > 
> > > On 07/02/2015 08:46 PM, Stefan Hajnoczi wrote:
> > > > On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
> > > >> On 06/30/2015 11:06 AM, Fam Zheng wrote:
> > > >>> virtio_net_receive still does the check by calling
> > > >>> virtio_net_can_receive, if the device or driver is not ready, the packet
> > > >>> is dropped.
> > > >>>
> > > >>> This is necessary because returning false from can_receive complicates
> > > >>> things: the peer would disable sending until we explicitly flush the
> > > >>> queue.
> > > >>>
> > > >>> Signed-off-by: Fam Zheng <famz@redhat.com>
> > > >>> ---
> > > >>>  hw/net/virtio-net.c | 1 -
> > > >>>  1 file changed, 1 deletion(-)
> > > >>>
> > > >>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > >>> index d728233..dbef0d0 100644
> > > >>> --- a/hw/net/virtio-net.c
> > > >>> +++ b/hw/net/virtio-net.c
> > > >>> @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
> > > >>>  static NetClientInfo net_virtio_info = {
> > > >>>      .type = NET_CLIENT_OPTIONS_KIND_NIC,
> > > >>>      .size = sizeof(NICState),
> > > >>> -    .can_receive = virtio_net_can_receive,
> > > >>>      .receive = virtio_net_receive,
> > > >>>      .link_status_changed = virtio_net_set_link_status,
> > > >>>      .query_rx_filter = virtio_net_query_rxfilter,
> > > >> A side effect of this patch is that it will read and then drop the packet
> > > >> if the guest driver is not ok.
> > > > I think that the semantics of .can_receive() and .receive() return
> > > > values are currently incorrect in many NICs.  They have .can_receive()
> > > > functions that return false for conditions where .receive() would
> > > > discard the packet.  So what happens is that packets get queued when
> > > > they should actually be discarded.
> > > 
> > > Yes, but they are bugs more or less.
> > > 
> > > >
> > > > The purpose of the flow control (queuing) mechanism is to tell the
> > > > sender to hold off until the receiver has more rx buffers available.
> > > > It's a short-term thing that doesn't include link down, rx disable, or
> > > > NIC reset states.
> > > >
> > > > Therefore, I think this patch will not introduce a regression.  It is
> > > > adjusting the code to stop queuing packets when they should actually be
> > > > dropped.
> > > >
> > > > Thoughts?
> > > 
> > > I agree there's no functional issue. But it wastes CPU cycles
> > > (consider a guest being flooded). Sometimes it may even be dangerous. For
> > > tap, we're probably ok since we have 756ae78b, but for other backends, we
> > > don't.
> > 
> > If the guest uses iptables rules or other mechanisms to drop bogus
> > packets the cost is even higher than discarding them at the QEMU layer.
> > 
> > What's more is that if you're using link down as a DoS mitigation
> > strategy then you might as well hot unplug the NIC.
> > 
> > Stefan
> 
> 
> 
> Frankly, I don't see the point of the patch.  Is this supposed to be a
> bugfix? If so, there should be a description of how to trigger the
> bug.  Is this an optimization? If so, there should be some numbers
> showing a gain.

It's a bug fix: we are not flushing the queue when DRIVER_OK is being set or
when buffers become available (the virtio_net_can_receive conditions). This was
not an issue before a90a7425cf, but since then the semantics are enforced.

Fam
Michael S. Tsirkin July 7, 2015, 8:10 a.m. UTC | #11
On Tue, Jul 07, 2015 at 08:53:59AM +0800, Fam Zheng wrote:
> On Mon, 07/06 20:09, Michael S. Tsirkin wrote:
> > On Mon, Jul 06, 2015 at 04:21:16PM +0100, Stefan Hajnoczi wrote:
> > > On Mon, Jul 06, 2015 at 11:32:25AM +0800, Jason Wang wrote:
> > > > 
> > > > 
> > > > On 07/02/2015 08:46 PM, Stefan Hajnoczi wrote:
> > > > > On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
> > > > >> On 06/30/2015 11:06 AM, Fam Zheng wrote:
> > > > >>> virtio_net_receive still does the check by calling
> > > > >>> virtio_net_can_receive, if the device or driver is not ready, the packet
> > > > >>> is dropped.
> > > > >>>
> > > > >>> This is necessary because returning false from can_receive complicates
> > > > >>> things: the peer would disable sending until we explicitly flush the
> > > > >>> queue.
> > > > >>>
> > > > >>> Signed-off-by: Fam Zheng <famz@redhat.com>
> > > > >>> ---
> > > > >>>  hw/net/virtio-net.c | 1 -
> > > > >>>  1 file changed, 1 deletion(-)
> > > > >>>
> > > > >>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > > >>> index d728233..dbef0d0 100644
> > > > >>> --- a/hw/net/virtio-net.c
> > > > >>> +++ b/hw/net/virtio-net.c
> > > > >>> @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
> > > > >>>  static NetClientInfo net_virtio_info = {
> > > > >>>      .type = NET_CLIENT_OPTIONS_KIND_NIC,
> > > > >>>      .size = sizeof(NICState),
> > > > >>> -    .can_receive = virtio_net_can_receive,
> > > > >>>      .receive = virtio_net_receive,
> > > > >>>      .link_status_changed = virtio_net_set_link_status,
> > > > >>>      .query_rx_filter = virtio_net_query_rxfilter,
> > > > >> A side effect of this patch is that it will read and then drop the packet
> > > > >> if the guest driver is not ok.
> > > > > I think that the semantics of .can_receive() and .receive() return
> > > > > values are currently incorrect in many NICs.  They have .can_receive()
> > > > > functions that return false for conditions where .receive() would
> > > > > discard the packet.  So what happens is that packets get queued when
> > > > > they should actually be discarded.
> > > > 
> > > > Yes, but they are bugs more or less.
> > > > 
> > > > >
> > > > > The purpose of the flow control (queuing) mechanism is to tell the
> > > > > sender to hold off until the receiver has more rx buffers available.
> > > > > It's a short-term thing that doesn't include link down, rx disable, or
> > > > > NIC reset states.
> > > > >
> > > > > Therefore, I think this patch will not introduce a regression.  It is
> > > > > adjusting the code to stop queuing packets when they should actually be
> > > > > dropped.
> > > > >
> > > > > Thoughts?
> > > > 
> > > > I agree there's no functional issue. But it wastes CPU cycles
> > > > (consider a guest being flooded). Sometimes it may even be dangerous. For
> > > > tap, we're probably ok since we have 756ae78b, but for other backends, we
> > > > don't.
> > > 
> > > If the guest uses iptables rules or other mechanisms to drop bogus
> > > packets the cost is even higher than discarding them at the QEMU layer.
> > > 
> > > What's more is that if you're using link down as a DoS mitigation
> > > strategy then you might as well hot unplug the NIC.
> > > 
> > > Stefan
> > 
> > 
> > 
> > Frankly, I don't see the point of the patch.  Is this supposed to be a
> > bugfix? If so, there should be a description of how to trigger the
> > bug.  Is this an optimization? If so, there should be some numbers
> > showing a gain.
> 
> It's a bug fix: we are not flushing the queue when DRIVER_OK is being set or
> when buffers become available (the virtio_net_can_receive conditions). This was
> not an issue before a90a7425cf, but since then the semantics are enforced.
> 
> Fam

I think the safest and most obvious fix is then to flush on DRIVER_OK
(unless vhost has started). That might be 2.4 material.
Jason Wang July 7, 2015, 8:45 a.m. UTC | #12
On 07/06/2015 11:21 PM, Stefan Hajnoczi wrote:
> On Mon, Jul 06, 2015 at 11:32:25AM +0800, Jason Wang wrote:
>>
>> On 07/02/2015 08:46 PM, Stefan Hajnoczi wrote:
>>> On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
>>>> On 06/30/2015 11:06 AM, Fam Zheng wrote:
>>>>> virtio_net_receive still does the check by calling
>>>>> virtio_net_can_receive, if the device or driver is not ready, the packet
>>>>> is dropped.
>>>>>
>>>>> This is necessary because returning false from can_receive complicates
>>>>> things: the peer would disable sending until we explicitly flush the
>>>>> queue.
>>>>>
>>>>> Signed-off-by: Fam Zheng <famz@redhat.com>
>>>>> ---
>>>>>  hw/net/virtio-net.c | 1 -
>>>>>  1 file changed, 1 deletion(-)
>>>>>
>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>> index d728233..dbef0d0 100644
>>>>> --- a/hw/net/virtio-net.c
>>>>> +++ b/hw/net/virtio-net.c
>>>>> @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
>>>>>  static NetClientInfo net_virtio_info = {
>>>>>      .type = NET_CLIENT_OPTIONS_KIND_NIC,
>>>>>      .size = sizeof(NICState),
>>>>> -    .can_receive = virtio_net_can_receive,
>>>>>      .receive = virtio_net_receive,
>>>>>      .link_status_changed = virtio_net_set_link_status,
>>>>>      .query_rx_filter = virtio_net_query_rxfilter,
>>>> A side effect of this patch is that it will read and then drop packets if
>>>> the guest driver is not ok.
>>> I think that the semantics of .can_receive() and .receive() return
>>> values are currently incorrect in many NICs.  They have .can_receive()
>>> functions that return false for conditions where .receive() would
>>> discard the packet.  So what happens is that packets get queued when
>>> they should actually be discarded.
>> Yes, but they are bugs more or less.
>>
>>> The purpose of the flow control (queuing) mechanism is to tell the
>>> sender to hold off until the receiver has more rx buffers available.
>>> It's a short-term thing that doesn't include link down, rx disable, or
>>> NIC reset states.
>>>
>>> Therefore, I think this patch will not introduce a regression.  It is
>>> adjusting the code to stop queuing packets when they should actually be
>>> dropped.
>>>
>>> Thoughts?
>> I agree there's no functional issue. But it causes wasted CPU cycles
>> (consider a guest being flooded). Sometimes it may even be dangerous. For
>> tap, we're probably ok since we have 756ae78b, but for other backends, we
>> don't.
> If the guest uses iptables rules or other mechanisms to drop bogus
> packets the cost is even higher than discarding them at the QEMU layer.

But it was the guest's choice.

>
> What's more is that if you're using link down as a DoS mitigation
> strategy then you might as well hot unplug the NIC.
>
> Stefan

I think there're two problems for virtio-net:

1) mitigation when the guest driver is ok. For tx we have either a timer or
a bh; for rx, and only for tap, we have 756ae78b. We probably need fixes
for other backends.

2) when the driver is not ok, the point is we should not poll the backend
at all (I believe this is one of the main objectives of the main loop).
Something like tap_can_send() and the commit that drops tap_can_send()
both follow this rule. But this patch does not, so we end up with:

- driver is not ok or no driver: QEMU keeps reading and dropping packets.
- driver is ok but not enough rx buffers: QEMU will disable tap read poll.

Which looks contradictory.

We need to fix this either in 2.4 or later, and also for other NICs.
Stefan Hajnoczi July 8, 2015, 10:50 a.m. UTC | #13
On Tue, Jul 07, 2015 at 04:45:41PM +0800, Jason Wang wrote:
> 
> 
> On 07/06/2015 11:21 PM, Stefan Hajnoczi wrote:
> > On Mon, Jul 06, 2015 at 11:32:25AM +0800, Jason Wang wrote:
> >>
> >> On 07/02/2015 08:46 PM, Stefan Hajnoczi wrote:
> >>> On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
> >>>> On 06/30/2015 11:06 AM, Fam Zheng wrote:
> >>>>> virtio_net_receive still does the check by calling
> >>>>> virtio_net_can_receive, if the device or driver is not ready, the packet
> >>>>> is dropped.
> >>>>>
> >>>>> This is necessary because returning false from can_receive complicates
> >>>>> things: the peer would disable sending until we explicitly flush the
> >>>>> queue.
> >>>>>
> >>>>> Signed-off-by: Fam Zheng <famz@redhat.com>
> >>>>> ---
> >>>>>  hw/net/virtio-net.c | 1 -
> >>>>>  1 file changed, 1 deletion(-)
> >>>>>
> >>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>>>> index d728233..dbef0d0 100644
> >>>>> --- a/hw/net/virtio-net.c
> >>>>> +++ b/hw/net/virtio-net.c
> >>>>> @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
> >>>>>  static NetClientInfo net_virtio_info = {
> >>>>>      .type = NET_CLIENT_OPTIONS_KIND_NIC,
> >>>>>      .size = sizeof(NICState),
> >>>>> -    .can_receive = virtio_net_can_receive,
> >>>>>      .receive = virtio_net_receive,
> >>>>>      .link_status_changed = virtio_net_set_link_status,
> >>>>>      .query_rx_filter = virtio_net_query_rxfilter,
> >>>> A side effect of this patch is that it will read and then drop packets if
> >>>> the guest driver is not ok.
> >>> I think that the semantics of .can_receive() and .receive() return
> >>> values are currently incorrect in many NICs.  They have .can_receive()
> >>> functions that return false for conditions where .receive() would
> >>> discard the packet.  So what happens is that packets get queued when
> >>> they should actually be discarded.
> >> Yes, but they are bugs more or less.
> >>
> >>> The purpose of the flow control (queuing) mechanism is to tell the
> >>> sender to hold off until the receiver has more rx buffers available.
> >>> It's a short-term thing that doesn't include link down, rx disable, or
> >>> NIC reset states.
> >>>
> >>> Therefore, I think this patch will not introduce a regression.  It is
> >>> adjusting the code to stop queuing packets when they should actually be
> >>> dropped.
> >>>
> >>> Thoughts?
> >> I agree there's no functional issue. But it causes wasted CPU cycles
> >> (consider a guest being flooded). Sometimes it may even be dangerous. For
> >> tap, we're probably ok since we have 756ae78b, but for other backends, we
> >> don't.
> > If the guest uses iptables rules or other mechanisms to drop bogus
> > packets the cost is even higher than discarding them at the QEMU layer.
> 
> But it was the guest's choice.
> 
> >
> > What's more is that if you're using link down as a DoS mitigation
> > strategy then you might as well hot unplug the NIC.
> >
> > Stefan
> 
> I think there're two problems for virtio-net:
> 
> 1) mitigation when the guest driver is ok. For tx we have either a timer or
> a bh; for rx, and only for tap, we have 756ae78b. We probably need fixes
> for other backends.

I agree.  It's not directly related to the change Fam is proposing
though.

> 2) when the driver is not ok, the point is we should not poll the backend
> at all (I believe this is one of the main objectives of the main loop).
> Something like tap_can_send() and the commit that drops tap_can_send()
> both follow this rule. But this patch does not, so we end up with:
> 
> - driver is not ok or no driver: QEMU keeps reading and dropping packets.

It's correct from a functionality perspective.  When rx is disabled or
the link is down, packets should be dropped.

From a performance perspective I agree it would be better to drop
packets earlier in the stack.

If you want tap to drop packets efficiently during link down/receive
disabled/etc:

What's the best way to disable tap rx so the host kernel drops packets
instead of queuing them?

> - driver is ok but not enough rx buffers: QEMU will disable tap read poll.

QEMU relies on the kernel's socket receive buffer or netif receive
queue to hold incoming packets.  When QEMU enables tap read poll again
it receives the queued packets.

Queuing is necessary so USB net emulation (the rx buffer is only 1
packet!) and maybe slirp work.

I think real NICs drop packets when the rx ring is exhausted, but we
need to keep the queuing behavior unless someone proposes a way to
eliminate it.
Jason Wang July 13, 2015, 4:52 a.m. UTC | #14
On 07/08/2015 06:50 PM, Stefan Hajnoczi wrote:
> On Tue, Jul 07, 2015 at 04:45:41PM +0800, Jason Wang wrote:
>>
>> On 07/06/2015 11:21 PM, Stefan Hajnoczi wrote:
>>> On Mon, Jul 06, 2015 at 11:32:25AM +0800, Jason Wang wrote:
>>>> On 07/02/2015 08:46 PM, Stefan Hajnoczi wrote:
>>>>> On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
>>>>>> On 06/30/2015 11:06 AM, Fam Zheng wrote:
>>>>>>> virtio_net_receive still does the check by calling
>>>>>>> virtio_net_can_receive, if the device or driver is not ready, the packet
>>>>>>> is dropped.
>>>>>>>
>>>>>>> This is necessary because returning false from can_receive complicates
>>>>>>> things: the peer would disable sending until we explicitly flush the
>>>>>>> queue.
>>>>>>>
>>>>>>> Signed-off-by: Fam Zheng <famz@redhat.com>
>>>>>>> ---
>>>>>>>  hw/net/virtio-net.c | 1 -
>>>>>>>  1 file changed, 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>>> index d728233..dbef0d0 100644
>>>>>>> --- a/hw/net/virtio-net.c
>>>>>>> +++ b/hw/net/virtio-net.c
>>>>>>> @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
>>>>>>>  static NetClientInfo net_virtio_info = {
>>>>>>>      .type = NET_CLIENT_OPTIONS_KIND_NIC,
>>>>>>>      .size = sizeof(NICState),
>>>>>>> -    .can_receive = virtio_net_can_receive,
>>>>>>>      .receive = virtio_net_receive,
>>>>>>>      .link_status_changed = virtio_net_set_link_status,
>>>>>>>      .query_rx_filter = virtio_net_query_rxfilter,
>>>>>> A side effect of this patch is that it will read and then drop packets if
>>>>>> the guest driver is not ok.
>>>>> I think that the semantics of .can_receive() and .receive() return
>>>>> values are currently incorrect in many NICs.  They have .can_receive()
>>>>> functions that return false for conditions where .receive() would
>>>>> discard the packet.  So what happens is that packets get queued when
>>>>> they should actually be discarded.
>>>> Yes, but they are bugs more or less.
>>>>
>>>>> The purpose of the flow control (queuing) mechanism is to tell the
>>>>> sender to hold off until the receiver has more rx buffers available.
>>>>> It's a short-term thing that doesn't include link down, rx disable, or
>>>>> NIC reset states.
>>>>>
>>>>> Therefore, I think this patch will not introduce a regression.  It is
>>>>> adjusting the code to stop queuing packets when they should actually be
>>>>> dropped.
>>>>>
>>>>> Thoughts?
>>>> I agree there's no functional issue. But it causes wasted CPU cycles
>>>> (consider a guest being flooded). Sometimes it may even be dangerous. For
>>>> tap, we're probably ok since we have 756ae78b, but for other backends, we
>>>> don't.
>>> If the guest uses iptables rules or other mechanisms to drop bogus
>>> packets the cost is even higher than discarding them at the QEMU layer.
>> But it was the guest's choice.
>>
>>> What's more is that if you're using link down as a DoS mitigation
>>> strategy then you might as well hot unplug the NIC.
>>>
>>> Stefan
>> I think there're two problems for virtio-net:
>>
>> 1) mitigation when the guest driver is ok. For tx we have either a timer or
>> a bh; for rx, and only for tap, we have 756ae78b. We probably need fixes
>> for other backends.
> I agree.  It's not directly related to the change Fam is proposing
> though.
>
>> 2) when the driver is not ok, the point is we should not poll the backend
>> at all (I believe this is one of the main objectives of the main loop).
>> Something like tap_can_send() and the commit that drops tap_can_send()
>> both follow this rule. But this patch does not, so we end up with:
>>
>> - driver is not ok or no driver: QEMU keeps reading and dropping packets.
> It's correct from a functionality perspective.  When rx is disabled or
> the link is down, packets should be dropped.
>
> From a performance perspective I agree it would be better to drop
> packets earlier in the stack.
>
> If you want tap to drop packets efficiently during link down/receive
> disabled/etc:
>
> What's the best way to disable tap rx so the host kernel drops packets
> instead of queuing them?

There's no such API currently, especially for an unprivileged QEMU. But
tuntap will drop the packet if it exceeds tx_queue_length. That still looks
much better than the current behaviour.

>
>> - driver is ok but not enough rx buffers: QEMU will disable tap read poll.
> QEMU relies on the kernel's socket receive buffer or netif receive
> queue to hold incoming packets.  When QEMU enables tap read poll again
> it receives the queued packets.
>
> Queuing is necessary so USB net emulation (the rx buffer is only 1
> packet!) and maybe slirp work.

Yes.

>
> I think real NICs drop packets when the rx ring is exhausted, but we
> need to keep the queuing behavior unless someone proposes a way to
> eliminate it.

Yes, queueing seems useless if the backend can do it.

So I'm ok with the patch, and we can optimize in the future.

Thanks

Patch

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index d728233..dbef0d0 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1503,7 +1503,6 @@  static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
 static NetClientInfo net_virtio_info = {
     .type = NET_CLIENT_OPTIONS_KIND_NIC,
     .size = sizeof(NICState),
-    .can_receive = virtio_net_can_receive,
     .receive = virtio_net_receive,
     .link_status_changed = virtio_net_set_link_status,
     .query_rx_filter = virtio_net_query_rxfilter,