Message ID | 20141220002327.GA31975@gondor.apana.org.au |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
On 12/20/2014 08:23 AM, Herbert Xu wrote: > David Vrabel <david.vrabel@citrix.com> wrote: >> After d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt >> masking in NAPI) the napi instance is removed from the per-cpu list >> prior to calling the n->poll(), and is only requeued if all of the >> budget was used. This inadvertently broke netfront because netfront >> does not use NAPI correctly. > A similar bug exists in virtio_net. > > -- >8 -- > The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less > interrupt masking in NAPI) breaks virtio_net in an insidious way. > > It is now required that if the entire budget is consumed when poll > returns, the napi poll_list must remain empty. However, like some > other drivers virtio_net tries to do a last-ditch check and if > there is more work it will call napi_schedule and then immediately > process some of this new work. Should the entire budget be consumed > while processing such new work then we will violate the new caller > contract. > > This patch fixes this by not touching any work when we reschedule > in virtio_net. > > The worst part of this bug is that the list corruption causes other > napi users to be moved off-list. In my case I was chasing a stall > in IPsec (IPsec uses netif_rx) and I only belatedly realised that it > was virtio_net which caused the stall even though the virtio_net > poll was still functioning perfectly after IPsec stalled. > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index b8bd719..5ca9771 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -760,7 +760,6 @@ static int virtnet_poll(struct napi_struct *napi, int budget) > container_of(napi, struct receive_queue, napi); > unsigned int r, received = 0; > > -again: > received += virtnet_receive(rq, budget - received); > > /* Out of packets? */ > @@ -771,7 +770,6 @@ again: > napi_schedule_prep(napi)) { > virtqueue_disable_cb(rq->vq); > __napi_schedule(napi); > - goto again; > } > } > > Cheers, Acked-by: Jason Wang <jasowang@redhat.com> btw, looks like at least caif_virtio has the same issue. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 19-12-2014 22:23, Herbert Xu wrote: > David Vrabel <david.vrabel@citrix.com> wrote: >> After d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt >> masking in NAPI) the napi instance is removed from the per-cpu list >> prior to calling the n->poll(), and is only requeued if all of the >> budget was used. This inadvertently broke netfront because netfront >> does not use NAPI correctly. > > A similar bug exists in virtio_net. > > -- >8 -- > The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less > interrupt masking in NAPI) breaks virtio_net in an insidious way. > > It is now required that if the entire budget is consumed when poll > returns, the napi poll_list must remain empty. However, like some > other drivers virtio_net tries to do a last-ditch check and if > there is more work it will call napi_schedule and then immediately > process some of this new work. Should the entire budget be consumed > while processing such new work then we will violate the new caller > contract. > > This patch fixes this by not touching any work when we reschedule > in virtio_net. > > The worst part of this bug is that the list corruption causes other > napi users to be moved off-list. In my case I was chasing a stall > in IPsec (IPsec uses netif_rx) and I only belatedly realised that it > was virtio_net which caused the stall even though the virtio_net > poll was still functioning perfectly after IPsec stalled. Thanks for finding/fixing this, Herbert. I was debugging this one too. In my case, vxlan interface was getting stuck. Marcelo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Herbert Xu <herbert@gondor.apana.org.au> Date: Sat, 20 Dec 2014 11:23:27 +1100 > The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less > interrupt masking in NAPI) breaks virtio_net in an insidious way. > > It is now required that if the entire budget is consumed when poll > returns, the napi poll_list must remain empty. However, like some > other drivers virtio_net tries to do a last-ditch check and if > there is more work it will call napi_schedule and then immediately > process some of this new work. Should the entire budget be consumed > while processing such new work then we will violate the new caller > contract. > > This patch fixes this by not touching any work when we reschedule > in virtio_net. > > The worst part of this bug is that the list corruption causes other > napi users to be moved off-list. In my case I was chasing a stall > in IPsec (IPsec uses netif_rx) and I only belatedly realised that it > was virtio_net which caused the stall even though the virtio_net > poll was still functioning perfectly after IPsec stalled. > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Applied, thanks Herbert. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index b8bd719..5ca9771 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -760,7 +760,6 @@ static int virtnet_poll(struct napi_struct *napi, int budget) container_of(napi, struct receive_queue, napi); unsigned int r, received = 0; -again: received += virtnet_receive(rq, budget - received); /* Out of packets? */ @@ -771,7 +770,6 @@ again: napi_schedule_prep(napi)) { virtqueue_disable_cb(rq->vq); __napi_schedule(napi); - goto again; } }