GRO with non-NAPI driver: BUG in __napi_complete

Submitter Herbert Xu
Date March 17, 2009, 10:22 a.m.
Message ID <20090317102244.GB23396@gondor.apana.org.au>
Permalink /patch/24555/
State Accepted
Delegated to: David Miller

Comments

Herbert Xu - March 17, 2009, 10:22 a.m.
On Tue, Mar 17, 2009 at 09:35:21AM +0100, Frank Blaschka wrote:
>
> Why does process_backlog call __napi_complete() instead of
> napi_complete()? This looks suspicious to me. Can anybody help?

You're absolutely right.  Dave, we need this fix for both net
and net-next.

gro: Fix legacy path napi_complete crash

On the legacy netif_rx path, I incorrectly tried to optimise
the napi_complete call by using __napi_complete before we reenable
IRQs.  This simply doesn't work since we need to flush the held
GRO packets first.

This patch fixes it by doing the obvious thing of reenabling
IRQs first and then calling napi_complete.

Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
David Miller - March 17, 2009, 8:11 p.m.
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 17 Mar 2009 18:22:44 +0800

> On Tue, Mar 17, 2009 at 09:35:21AM +0100, Frank Blaschka wrote:
> >
> > Why does process_backlog call __napi_complete() instead of
> > napi_complete()? This looks suspicious to me. Can anybody help?
> 
> You're absolutely right.  Dave, we need this fix for both net
> and net-next.
> 
> gro: Fix legacy path napi_complete crash
> 
> On the legacy netif_rx path, I incorrectly tried to optimise
> the napi_complete call by using __napi_complete before we reenable
> IRQs.  This simply doesn't work since we need to flush the held
> GRO packets first.
> 
> This patch fixes it by doing the obvious thing of reenabling
> IRQs first and then calling napi_complete.
> 
> Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Herbert - March 27, 2009, 7:05 p.m.
> On the legacy netif_rx path, I incorrectly tried to optimise
>
> the napi_complete call by using __napi_complete before we reenable
> IRQs.  This simply doesn't work since we need to flush the held
> GRO packets first.
>
> This patch fixes it by doing the obvious thing of reenabling
> IRQs first and then calling napi_complete.

Does this fix generate a race condition for a non-NAPI device?  If
netif_rx runs immediately after local_irq_enable it would queue a
packet on the backlog queue and try to schedule napi (the latter has
no effect because napi has not completed).  On return from interrupt,
napi_complete is done leaving a packet in the input queue but napi is
not scheduled to process it.

Thanks,
Tom

>
> Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index f112970..2565f6d 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2588,9 +2588,9 @@ static int process_backlog(struct napi_struct *napi, int quota)
>                local_irq_disable();
>                skb = __skb_dequeue(&queue->input_pkt_queue);
>                if (!skb) {
> -                       __napi_complete(napi);
>                        local_irq_enable();
> -                       break;
> +                       napi_complete(napi);
> +                       goto out;
>                }
>                local_irq_enable();
>
> @@ -2599,6 +2599,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
>
>        napi_gro_flush(napi);
>
> +out:
>        return work;
>  }
>
> Thanks,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
David Miller - March 27, 2009, 10:50 p.m.
From: Tom Herbert <therbert@google.com>
Date: Fri, 27 Mar 2009 12:05:30 -0700

> > On the legacy netif_rx path, I incorrectly tried to optimise
> >
> > the napi_complete call by using __napi_complete before we reenable
> > IRQs.  This simply doesn't work since we need to flush the held
> > GRO packets first.
> >
> > This patch fixes it by doing the obvious thing of reenabling
> > IRQs first and then calling napi_complete.
> 
> Does this fix generate a race condition for a non-NAPI device?  If
> netif_rx runs immediately after local_irq_enable it would queue a
> packet on the backlog queue and try to schedule napi (the latter has
> no effect because napi has not completed).  On return from interrupt,
> napi_complete is done leaving a packet in the input queue but napi is
> not scheduled to process it.

Yes we know this version of Herbert's patch has that problem,
read the rest of the thread and subsequent versions of the fix.

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index f112970..2565f6d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2588,9 +2588,9 @@  static int process_backlog(struct napi_struct *napi, int quota)
 		local_irq_disable();
 		skb = __skb_dequeue(&queue->input_pkt_queue);
 		if (!skb) {
-			__napi_complete(napi);
 			local_irq_enable();
-			break;
+			napi_complete(napi);
+			goto out;
 		}
 		local_irq_enable();
 
@@ -2599,6 +2599,7 @@  static int process_backlog(struct napi_struct *napi, int quota)
 
 	napi_gro_flush(napi);
 
+out:
 	return work;
 }