GRO with non napi driver: BUG in __napi_complete

Submitted by Herbert Xu on March 17, 2009, 10:22 a.m.

Details

Message ID 20090317102244.GB23396@gondor.apana.org.au
State Accepted
Delegated to: David Miller

Commit Message

Herbert Xu March 17, 2009, 10:22 a.m.
On Tue, Mar 17, 2009 at 09:35:21AM +0100, Frank Blaschka wrote:
>
> What is the intention behind process_backlog calling __napi_complete() instead
> of napi_complete()? This looks suspicious to me. Can anybody help?

You're absolutely right.  Dave, we need this fix for both net
and net-next.

gro: Fix legacy path napi_complete crash

On the legacy netif_rx path, I incorrectly tried to optimise
the napi_complete call by using __napi_complete before we reenable
IRQs.  This simply doesn't work since we need to flush the held
GRO packets first.

This patch fixes it by doing the obvious thing of reenabling
IRQs first and then calling napi_complete.

Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,

Comments

David Miller March 17, 2009, 8:11 p.m.
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 17 Mar 2009 18:22:44 +0800

> On Tue, Mar 17, 2009 at 09:35:21AM +0100, Frank Blaschka wrote:
> >
> > What is the intention behind process_backlog calling __napi_complete() instead
> > of napi_complete()? This looks suspicious to me. Can anybody help?
> 
> You're absolutely right.  Dave, we need this fix for both net
> and net-next.
> 
> gro: Fix legacy path napi_complete crash
> 
> On the legacy netif_rx path, I incorrectly tried to optimise
> the napi_complete call by using __napi_complete before we reenable
> IRQs.  This simply doesn't work since we need to flush the held
> GRO packets first.
> 
> This patch fixes it by doing the obvious thing of reenabling
> IRQs first and then calling napi_complete.
> 
> Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Herbert March 27, 2009, 7:05 p.m.
> On the legacy netif_rx path, I incorrectly tried to optimise
>
> the napi_complete call by using __napi_complete before we reenable
> IRQs.  This simply doesn't work since we need to flush the held
> GRO packets first.
>
> This patch fixes it by doing the obvious thing of reenabling
> IRQs first and then calling napi_complete.

Does this fix generate a race condition for a non-NAPI device?  If
netif_rx runs immediately after local_irq_enable it would queue a
packet on the backlog queue and try to schedule napi (the latter has
no effect because napi has not completed).  On return from interrupt,
napi_complete is done leaving a packet in the input queue but napi is
not scheduled to process it.

Thanks,
Tom

>
> Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index f112970..2565f6d 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2588,9 +2588,9 @@ static int process_backlog(struct napi_struct *napi, int quota)
>                local_irq_disable();
>                skb = __skb_dequeue(&queue->input_pkt_queue);
>                if (!skb) {
> -                       __napi_complete(napi);
>                        local_irq_enable();
> -                       break;
> +                       napi_complete(napi);
> +                       goto out;
>                }
>                local_irq_enable();
>
> @@ -2599,6 +2599,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
>
>        napi_gro_flush(napi);
>
> +out:
>        return work;
>  }
>
> Thanks,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
David Miller March 27, 2009, 10:50 p.m.
From: Tom Herbert <therbert@google.com>
Date: Fri, 27 Mar 2009 12:05:30 -0700

> > On the legacy netif_rx path, I incorrectly tried to optimise
> >
> > the napi_complete call by using __napi_complete before we reenable
> > IRQs.  This simply doesn't work since we need to flush the held
> > GRO packets first.
> >
> > This patch fixes it by doing the obvious thing of reenabling
> > IRQs first and then calling napi_complete.
> 
> Does this fix generate a race condition for a non-NAPI device?  If
> netif_rx runs immediately after local_irq_enable it would queue a
> packet on the backlog queue and try to schedule napi (the latter has
> no effect because napi has not completed).  On return from interrupt,
> napi_complete is done leaving a packet in the input queue but napi is
> not scheduled to process it.

Yes we know this version of Herbert's patch has that problem,
read the rest of the thread and subsequent versions of the fix.

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index f112970..2565f6d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2588,9 +2588,9 @@ static int process_backlog(struct napi_struct *napi, int quota)
 		local_irq_disable();
 		skb = __skb_dequeue(&queue->input_pkt_queue);
 		if (!skb) {
-			__napi_complete(napi);
 			local_irq_enable();
-			break;
+			napi_complete(napi);
+			goto out;
 		}
 		local_irq_enable();
 
@@ -2599,6 +2599,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
 
 	napi_gro_flush(napi);
 
+out:
 	return work;
 }