diff mbox

netpoll: fix race on poll_list resulting in garbage entry

Message ID 20081211181528.GB10558@hmsendeavour.rdu.redhat.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Neil Horman Dec. 11, 2008, 6:15 p.m. UTC
On Thu, Dec 11, 2008 at 09:01:04AM -0800, Stephen Hemminger wrote:
> On Thu, 11 Dec 2008 13:07:28 +0000
> Jarek Poplawski <jarkao2@gmail.com> wrote:
> 
> > On 09-12-2008 22:06, Neil Horman wrote:
> > ...
> > > When executing napi->poll from the netpoll_path, this bit will
> > > be set. When a driver calls netif_rx_complete, if that bit is set, it will not
> > > remove the napi_struct from the poll_list.  That work will be saved for the next
> > > iteration of net_rx_action.
> > 
> > This could be not enough: some drivers, e.g. sky2, call napi_complete()
> > directly.
> > 
> 
> There is good reason for this. Although most drivers only have one NAPI
> instance per device, and multiqueue drivers have several NAPI structures
> per device, a few devices like sky2 need to support multiple devices
> running off one NAPI receive. The Marvell hardware has a common receive
> interrupt for both ports on a dual port card.
> 
> This kind of hardware limits usage of netpoll. Only one port can be
> used with netpoll because netpoll makes assumptions about NAPI
> association.
> 

There was previously good cause to use __netif_rx_complete instead of
netif_rx_complete some time ago when multiqueue rx was implemented using a set
of dummy netdevices.  But with the separation of the napi code, there is no
longer any reason for this to be done.

I just took a quick look, and it appears that sky2 is the last remaining driver
to use the underlying napi routines.

This patch maintains exactly the same functionality that it previously had, but
allows for the netpoll patch to be safe with respect to the per-cpu poll_lists
used by net_rx_action.

Regards
Neil


Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


 sky2.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

stephen hemminger Dec. 12, 2008, 12:03 a.m. UTC | #1
On Thu, 11 Dec 2008 13:15:28 -0500
Neil Horman <nhorman@tuxdriver.com> wrote:

> On Thu, Dec 11, 2008 at 09:01:04AM -0800, Stephen Hemminger wrote:
> > On Thu, 11 Dec 2008 13:07:28 +0000
> > Jarek Poplawski <jarkao2@gmail.com> wrote:
> > 
> > > On 09-12-2008 22:06, Neil Horman wrote:
> > > ...
> > > > When executing napi->poll from the netpoll_path, this bit will
> > > > be set. When a driver calls netif_rx_complete, if that bit is set, it will not
> > > > remove the napi_struct from the poll_list.  That work will be saved for the next
> > > > iteration of net_rx_action.
> > > 
> > > This could be not enough: some drivers, e.g. sky2, call napi_complete()
> > > directly.
> > > 
> > 
> > There is good reason for this. Although most drivers only have one NAPI
> > instance per device, and multiqueue drivers have several NAPI structures
> > per device, a few devices like sky2 need to support multiple devices
> > running off one NAPI receive. The Marvell hardware has a common receive
> > interrupt for both ports on a dual port card.
> > 
> > This kind of hardware limits usage of netpoll. Only one port can be
> > used with netpoll because netpoll makes assumptions about NAPI
> > association.
> > 
> 
> There was previously good cause to use __netif_rx_complete instead of
> netif_rx_complete some time ago when multiqueue rx was implemented using a set
> of dummy netdevices.  But with the separation of the napi code, there is no
> longer any reason for this to be done.
> 
> I just took a quick look, and it appears that sky2 is the last remaining driver
> to use the underlying napi routines.
> 
> This patch maintains exactly the same functionality that it previously had, but
> allows for the netpoll patch to be safe with respect to the per-cpu poll_lists
> used by net_rx_action.
> 
> Regards
> Neil
> 
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> 
> 
>  sky2.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> index 3813d15..84bdc3c 100644
> --- a/drivers/net/sky2.c
> +++ b/drivers/net/sky2.c
> @@ -2694,7 +2694,7 @@ static int sky2_poll(struct napi_struct *napi, int work_limit)
>  		sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP);
>  		sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START);
>  	}
> -	napi_complete(napi);
> +	netif_rx_complete(napi->dev, napi);
>  	sky2_read32(hw, B0_Y2_SP_LISR);
>  done:

I would ask it the other way. Why is interface an argument to netif_rx_complete
if it is never used?  
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Dec. 12, 2008, 7:07 a.m. UTC | #2
On Thu, Dec 11, 2008 at 01:15:28PM -0500, Neil Horman wrote:
> On Thu, Dec 11, 2008 at 09:01:04AM -0800, Stephen Hemminger wrote:
> > On Thu, 11 Dec 2008 13:07:28 +0000
> > Jarek Poplawski <jarkao2@gmail.com> wrote:
> > 
> > > On 09-12-2008 22:06, Neil Horman wrote:
> > > ...
> > > > When executing napi->poll from the netpoll_path, this bit will
> > > > be set. When a driver calls netif_rx_complete, if that bit is set, it will not
> > > > remove the napi_struct from the poll_list.  That work will be saved for the next
> > > > iteration of net_rx_action.
> > > 
> > > This could be not enough: some drivers, e.g. sky2, call napi_complete()
> > > directly.
> > > 
> > 
> > There is good reason for this. Although most drivers only have one NAPI
> > instance per device, and multiqueue drivers have several NAPI structures
> > per device, a few devices like sky2 need to support multiple devices
> > running off one NAPI receive. The Marvell hardware has a common receive
> > interrupt for both ports on a dual port card.
> > 
> > This kind of hardware limits usage of netpoll. Only one port can be
> > used with netpoll because netpoll makes assumptions about NAPI
> > association.
> > 
> 
> There was previously good cause to use __netif_rx_complete instead of
> netif_rx_complete some time ago when multiqueue rx was implemented using a set
> of dummy netdevices.  But with the separation of the napi code, there is no
> longer any reason for this to be done.
> 
> I just took a quick look, and it appears that sky2 is the last remaining driver
> to use the underlying napi routines.

Hmm... My grep shows a bit more (mv643xx_eth etc.), plus some
__netif_rx_complete().

BTW, I don't know these things, but I wonder if it's always safe to
do one more ->poll() after such _complete? (I mean enabled interrupts
and/or some locking problems.)

Jarek P.

> 
> This patch maintains exactly the same functionality that it previously had, but
> allows for the netpoll patch to be safe with respect to the per-cpu poll_lists
> used by net_rx_action.
> 
> Regards
> Neil
> 
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> 
> 
>  sky2.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> index 3813d15..84bdc3c 100644
> --- a/drivers/net/sky2.c
> +++ b/drivers/net/sky2.c
> @@ -2694,7 +2694,7 @@ static int sky2_poll(struct napi_struct *napi, int work_limit)
>  		sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP);
>  		sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START);
>  	}
> -	napi_complete(napi);
> +	netif_rx_complete(napi->dev, napi);
>  	sky2_read32(hw, B0_Y2_SP_LISR);
>  done:
>  
> -- 
> /***************************************************
>  *Neil Horman
>  *nhorman@tuxdriver.com
>  *gpg keyid: 1024D / 0x92A74FA1
>  *http://pgp.mit.edu
>  ***************************************************/
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman Dec. 12, 2008, 12:18 p.m. UTC | #3
On Thu, Dec 11, 2008 at 04:03:07PM -0800, Stephen Hemminger wrote:
> On Thu, 11 Dec 2008 13:15:28 -0500
> Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > On Thu, Dec 11, 2008 at 09:01:04AM -0800, Stephen Hemminger wrote:
> > > On Thu, 11 Dec 2008 13:07:28 +0000
> > > Jarek Poplawski <jarkao2@gmail.com> wrote:
> > > 
> > > > On 09-12-2008 22:06, Neil Horman wrote:
> > > > ...
> > > > > When executing napi->poll from the netpoll_path, this bit will
> > > > > be set. When a driver calls netif_rx_complete, if that bit is set, it will not
> > > > > remove the napi_struct from the poll_list.  That work will be saved for the next
> > > > > iteration of net_rx_action.
> > > > 
> > > > This could be not enough: some drivers, e.g. sky2, call napi_complete()
> > > > directly.
> > > > 
> > > 
> > > There is good reason for this. Although most drivers only have one NAPI
> > > instance per device, and multiqueue drivers have several NAPI structures
> > > per device, a few devices like sky2 need to support multiple devices
> > > running off one NAPI receive. The Marvell hardware has a common receive
> > > interrupt for both ports on a dual port card.
> > > 
> > > This kind of hardware limits usage of netpoll. Only one port can be
> > > used with netpoll because netpoll makes assumptions about NAPI
> > > association.
> > > 
> > 
> > There was previously good cause to use __netif_rx_complete instead of
> > netif_rx_complete some time ago when multiqueue rx was implemented using a set
> > of dummy netdevices.  But with the separation of the napi code, there is no
> > longer any reason for this to be done.
> > 
> > I just took a quick look, and it appears that sky2 is the last remaining driver
> > to use the underlying napi routines.
> > 
> > This patch maintains exactly the same functionality that it previously had, but
> > allows for the netpoll patch to be safe with respect to the per-cpu poll_lists
> > used by net_rx_action.
> > 
> > Regards
> > Neil
> > 
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > 
> > 
> >  sky2.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > 
> > diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> > index 3813d15..84bdc3c 100644
> > --- a/drivers/net/sky2.c
> > +++ b/drivers/net/sky2.c
> > @@ -2694,7 +2694,7 @@ static int sky2_poll(struct napi_struct *napi, int work_limit)
> >  		sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP);
> >  		sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START);
> >  	}
> > -	napi_complete(napi);
> > +	netif_rx_complete(napi->dev, napi);
> >  	sky2_read32(hw, B0_Y2_SP_LISR);
> >  done:
> 
> I would ask it the other way. Why is interface an argument to netif_rx_complete
> if it is never used?  
> 
Thats a fair question, and I don't know the answer, Dave?
Neil
Neil Horman Dec. 12, 2008, 1:31 p.m. UTC | #4
On Fri, Dec 12, 2008 at 07:07:50AM +0000, Jarek Poplawski wrote:
> On Thu, Dec 11, 2008 at 01:15:28PM -0500, Neil Horman wrote:
> > On Thu, Dec 11, 2008 at 09:01:04AM -0800, Stephen Hemminger wrote:
> > > On Thu, 11 Dec 2008 13:07:28 +0000
> > > Jarek Poplawski <jarkao2@gmail.com> wrote:
> > > 
> > > > On 09-12-2008 22:06, Neil Horman wrote:
> > > > ...
> > > > > When executing napi->poll from the netpoll_path, this bit will
> > > > > be set. When a driver calls netif_rx_complete, if that bit is set, it will not
> > > > > remove the napi_struct from the poll_list.  That work will be saved for the next
> > > > > iteration of net_rx_action.
> > > > 
> > > > This could be not enough: some drivers, e.g. sky2, call napi_complete()
> > > > directly.
> > > > 
> > > 
> > > There is good reason for this. Although most drivers only have one NAPI
> > > instance per device, and multiqueue drivers have several NAPI structures
> > > per device, a few devices like sky2 need to support multiple devices
> > > running off one NAPI receive. The Marvell hardware has a common receive
> > > interrupt for both ports on a dual port card.
> > > 
> > > This kind of hardware limits usage of netpoll. Only one port can be
> > > used with netpoll because netpoll makes assumptions about NAPI
> > > association.
> > > 
> > 
> > There was previously good cause to use __netif_rx_complete instead of
> > netif_rx_complete some time ago when multiqueue rx was implemented using a set
> > of dummy netdevices.  But with the separation of the napi code, there is no
> > longer any reason for this to be done.
> > 
> > I just took a quick look, and it appears that sky2 is the last remaining driver
> > to use the underlying napi routines.
> 
> Hmm... My grep shows a bit more (mv643xx_eth etc.), plus some
> __netif_rx_complete().
> 
didn't check __netif_rx_complete, but IMHO that falls into the same category.
Should be pretty straightforward to fix up.

> BTW, I don't know these things, but I wonder if it's always safe to
> do one more ->poll() after such _complete? (I mean enabled interrupts
> and/or some locking problems.)
> 
There are cases in which doing so may trigger a bug, yes, but in those cases the
error would again be in the driver.  They scheduled a poll, they should be able
to handle the trivial case in which there is 0 work to do, calling
netif_rx_complete when appropriate.

Neil

> Jarek P.
> 
> > 
> > This patch maintains exactly the same functionality that it previously had, but
> > allows for the netpoll patch to be safe with respect to the per-cpu poll_lists
> > used by net_rx_action.
> > 
> > Regards
> > Neil
> > 
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > 
> > 
> >  sky2.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > 
> > diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> > index 3813d15..84bdc3c 100644
> > --- a/drivers/net/sky2.c
> > +++ b/drivers/net/sky2.c
> > @@ -2694,7 +2694,7 @@ static int sky2_poll(struct napi_struct *napi, int work_limit)
> >  		sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP);
> >  		sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START);
> >  	}
> > -	napi_complete(napi);
> > +	netif_rx_complete(napi->dev, napi);
> >  	sky2_read32(hw, B0_Y2_SP_LISR);
> >  done:
> >  
> > -- 
> > /***************************************************
> >  *Neil Horman
> >  *nhorman@tuxdriver.com
> >  *gpg keyid: 1024D / 0x92A74FA1
> >  *http://pgp.mit.edu
> >  ***************************************************/
> 
>
David Miller Dec. 16, 2008, 11:55 p.m. UTC | #5
From: Neil Horman <nhorman@tuxdriver.com>
Date: Fri, 12 Dec 2008 07:18:35 -0500

> On Thu, Dec 11, 2008 at 04:03:07PM -0800, Stephen Hemminger wrote:
> > I would ask it the other way. Why is interface an argument to netif_rx_complete
> > if it is never used?  
> > 
> Thats a fair question, and I don't know the answer, Dave?

That's just what the old code uses, since the NAPI context
sat inside of the device.  I just never removed it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman Dec. 17, 2008, 9:16 p.m. UTC | #6
On Tue, Dec 16, 2008 at 03:55:40PM -0800, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Fri, 12 Dec 2008 07:18:35 -0500
> 
> > On Thu, Dec 11, 2008 at 04:03:07PM -0800, Stephen Hemminger wrote:
> > > I would ask it the other way. Why is interface an argument to netif_rx_complete
> > > if it is never used?  
> > > 
> > Thats a fair question, and I don't know the answer, Dave?
> 
> That's just what the old code uses, since the NAPI context
> sat inside of the device.  I just never removed it.


To that end, Dave, I've got a tree with that api fixup complete.  Its on the
napi_api_fixup branch of the tree here:
http://git.infradead.org/users/nhorman/net-2.6.git?a=shortlog;h=refs/heads/napi_api_fixup

If you'd like to pull it, please go ahead, or I can submit individual patches
for it, if you prefer

Regards
Neil

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
stephen hemminger Dec. 17, 2008, 9:31 p.m. UTC | #7
On Wed, 17 Dec 2008 16:16:28 -0500
Neil Horman <nhorman@tuxdriver.com> wrote:

> On Tue, Dec 16, 2008 at 03:55:40PM -0800, David Miller wrote:
> > From: Neil Horman <nhorman@tuxdriver.com>
> > Date: Fri, 12 Dec 2008 07:18:35 -0500
> > 
> > > On Thu, Dec 11, 2008 at 04:03:07PM -0800, Stephen Hemminger wrote:
> > > > I would ask it the other way. Why is interface an argument to netif_rx_complete
> > > > if it is never used?  
> > > > 
> > > Thats a fair question, and I don't know the answer, Dave?
> > 
> > That's just what the old code uses, since the NAPI context
> > sat inside of the device.  I just never removed it.
> 
> 
> To that end, Dave, I've got a tree with that api fixup complete.  Its on the
> napi_api_fixup branch of the tree here:
> http://git.infradead.org/users/nhorman/net-2.6.git?a=shortlog;h=refs/heads/napi_api_fixup
> 
> If you'd like to pull it, please go ahead, or I can submit individual patches
> for it, if you prefer
> 
> Regards
> Neil
> 
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 

Since this a kernel API change, you have to do it as a one big patch
otherwise the kernel source won't build for the intermediate steps,
which makes 'git bisect' impossible.  Please merge the patches together.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman Dec. 17, 2008, 11:44 p.m. UTC | #8
On Wed, Dec 17, 2008 at 01:31:31PM -0800, Stephen Hemminger wrote:
> On Wed, 17 Dec 2008 16:16:28 -0500
> Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > On Tue, Dec 16, 2008 at 03:55:40PM -0800, David Miller wrote:
> > > From: Neil Horman <nhorman@tuxdriver.com>
> > > Date: Fri, 12 Dec 2008 07:18:35 -0500
> > > 
> > > > On Thu, Dec 11, 2008 at 04:03:07PM -0800, Stephen Hemminger wrote:
> > > > > I would ask it the other way. Why is interface an argument to netif_rx_complete
> > > > > if it is never used?  
> > > > > 
> > > > Thats a fair question, and I don't know the answer, Dave?
> > > 
> > > That's just what the old code uses, since the NAPI context
> > > sat inside of the device.  I just never removed it.
> > 
> > 
> > To that end, Dave, I've got a tree with that api fixup complete.  Its on the
> > napi_api_fixup branch of the tree here:
> > http://git.infradead.org/users/nhorman/net-2.6.git?a=shortlog;h=refs/heads/napi_api_fixup
> > 
> > If you'd like to pull it, please go ahead, or I can submit individual patches
> > for it, if you prefer
> > 
> > Regards
> > Neil
> > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > 
> 
> Since this a kernel API change, you have to do it as a one big patch
> otherwise the kernel source won't build for the intermediate steps,
> which makes 'git bisect' impossible.  Please merge the patches together.
> 

Sure, no problem.  I'll rediff it all and post an omnibus patch in the morning
Thanks!
Neil
diff mbox

Patch

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 3813d15..84bdc3c 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -2694,7 +2694,7 @@  static int sky2_poll(struct napi_struct *napi, int work_limit)
 		sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP);
 		sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START);
 	}
-	napi_complete(napi);
+	netif_rx_complete(napi->dev, napi);
 	sky2_read32(hw, B0_Y2_SP_LISR);
 done: