diff mbox

[1/1] atl1c: fix issue of transmit queue 0 timed out

Message ID 1340724786-3819-1-git-send-email-cjren@qca.qualcomm.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Cloud Ren June 26, 2012, 3:33 p.m. UTC
From: xiong <xiong@qca.qualcomm.com>

some people report atl1c could cause system hang with following
kernel trace info:
---------------------------------------
WARNING: at.../net/sched/sch_generic.c:258
dev_watchdog+0x1db/0x1d0()
...
NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
...
---------------------------------------
This is caused by netif_stop_queue calling when cable Link is down
but netif_wake_queue isn't called when cable Link is resume.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Signed-off-by: Cloud Ren <cjren@qca.qualcomm.com>
---
 drivers/net/ethernet/atheros/atl1c/atl1c_main.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

Comments

Luis R. Rodriguez June 26, 2012, 6:03 p.m. UTC | #1
On Tue, Jun 26, 2012 at 12:33:06PM -0300, Ren, Cloud wrote:
> From: xiong <xiong@qca.qualcomm.com>
> 
> some people report atl1c could cause system hang with following
> kernel trace info:
> ---------------------------------------
> WARNING: at.../net/sched/sch_generic.c:258
> dev_watchdog+0x1db/0x1d0()
> ...
> NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
> ...
> ---------------------------------------
> This is caused by netif_stop_queue calling when cable Link is down
> but netif_wake_queue isn't called when cable Link is resume.
> 
> Signed-off-by: xiong <xiong@qca.qualcomm.com>
> Signed-off-by: Cloud Ren <cjren@qca.qualcomm.com>

If this fixes  a system hang then this could be a stable
fix -- that is, this should be propagated to older stable
kernels, no?

If so then please add to the commit log a line like this:

Cc: stable@vger.kernel.org [3.4]

Note: this is for the commit log! Right above the line below
that has "---"

  Luis
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings June 26, 2012, 8:23 p.m. UTC | #2
On Tue, 2012-06-26 at 12:33 -0300, Ren, Cloud wrote:
> From: xiong <xiong@qca.qualcomm.com>
> 
> some people report atl1c could cause system hang with following
> kernel trace info:
> ---------------------------------------
> WARNING: at.../net/sched/sch_generic.c:258
> dev_watchdog+0x1db/0x1d0()
> ...
> NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
> ...
> ---------------------------------------
> This is caused by netif_stop_queue calling when cable Link is down
> but netif_wake_queue isn't called when cable Link is resume.
> 
> Signed-off-by: xiong <xiong@qca.qualcomm.com>
> Signed-off-by: Cloud Ren <cjren@qca.qualcomm.com>
> ---
>  drivers/net/ethernet/atheros/atl1c/atl1c_main.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index 85717cb..c2736c4 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -351,6 +351,8 @@ static void atl1c_common_task(struct work_struct *work)
>  		atl1c_irq_disable(adapter);
>  		atl1c_check_link_status(adapter);
>  		atl1c_irq_enable(adapter);
> +		if (netif_queue_stopped(netdev) && netif_carrier_ok(netdev))
> +			netif_wake_queue(netdev);
>  	}
>  }
>  

Why explicitly stop/start the queue when the link changes?  That's what
link_watch is for.

Ben.
Huang, Xiong June 26, 2012, 8:25 p.m. UTC | #3
Yes, another fix to remove netif_stop_queue when cable link is down.

-Xiong

> -----Original Message-----

> From: Ben Hutchings [mailto:bhutchings@solarflare.com]

> Sent: Wednesday, June 27, 2012 4:24

> To: Ren, Cloud

> Cc: davem@davemloft.net; netdev@vger.kernel.org; linux-

> kernel@vger.kernel.org; qca-linux-team; nic-devel; Huang, Xiong

> Subject: Re: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out

> 

> On Tue, 2012-06-26 at 12:33 -0300, Ren, Cloud wrote:

> > From: xiong <xiong@qca.qualcomm.com>

> >

> > some people report atl1c could cause system hang with following kernel

> > trace info:

> > ---------------------------------------

> > WARNING: at.../net/sched/sch_generic.c:258

> > dev_watchdog+0x1db/0x1d0()

> > ...

> > NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out ...

> > ---------------------------------------

> > This is caused by netif_stop_queue calling when cable Link is down but

> > netif_wake_queue isn't called when cable Link is resume.

> >

> > Signed-off-by: xiong <xiong@qca.qualcomm.com>

> > Signed-off-by: Cloud Ren <cjren@qca.qualcomm.com>

> > ---

> >  drivers/net/ethernet/atheros/atl1c/atl1c_main.c |    2 ++

> >  1 files changed, 2 insertions(+), 0 deletions(-)

> >

> > diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c

> > b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c

> > index 85717cb..c2736c4 100644

> > --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c

> > +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c

> > @@ -351,6 +351,8 @@ static void atl1c_common_task(struct work_struct

> *work)

> >  		atl1c_irq_disable(adapter);

> >  		atl1c_check_link_status(adapter);

> >  		atl1c_irq_enable(adapter);

> > +		if (netif_queue_stopped(netdev) && netif_carrier_ok(netdev))

> > +			netif_wake_queue(netdev);

> >  	}

> >  }

> >

> 

> Why explicitly stop/start the queue when the link changes?  That's what

> link_watch is for.

> 

> Ben.

> 

> --

> Ben Hutchings, Staff Engineer, Solarflare Not speaking for my employer; that's

> the marketing department's job.

> They asked us to note that Solarflare product names are trademarked.
Huang, Xiong June 26, 2012, 8:26 p.m. UTC | #4
Sorry, my mean , another fix is to remove netif_stop_queue when cable link is down.

Thanks
Xiong
> -----Original Message-----

> From: Huang, Xiong

> Sent: Wednesday, June 27, 2012 4:25

> To: Ben Hutchings; Ren, Cloud

> Cc: davem@davemloft.net; netdev@vger.kernel.org; linux-

> kernel@vger.kernel.org; qca-linux-team; nic-devel

> Subject: RE: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out

> 

> Yes, another fix to remove netif_stop_queue when cable link is down.

> 

> -Xiong

> 

> > -----Original Message-----

> > From: Ben Hutchings [mailto:bhutchings@solarflare.com]

> > Sent: Wednesday, June 27, 2012 4:24

> > To: Ren, Cloud

> > Cc: davem@davemloft.net; netdev@vger.kernel.org; linux-

> > kernel@vger.kernel.org; qca-linux-team; nic-devel; Huang, Xiong

> > Subject: Re: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed

> > out

> >

> > On Tue, 2012-06-26 at 12:33 -0300, Ren, Cloud wrote:

> > > From: xiong <xiong@qca.qualcomm.com>

> > >

> > > some people report atl1c could cause system hang with following

> > > kernel trace info:

> > > ---------------------------------------

> > > WARNING: at.../net/sched/sch_generic.c:258

> > > dev_watchdog+0x1db/0x1d0()

> > > ...

> > > NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out ...

> > > ---------------------------------------

> > > This is caused by netif_stop_queue calling when cable Link is down

> > > but netif_wake_queue isn't called when cable Link is resume.

> > >

> > > Signed-off-by: xiong <xiong@qca.qualcomm.com>

> > > Signed-off-by: Cloud Ren <cjren@qca.qualcomm.com>

> > > ---

> > >  drivers/net/ethernet/atheros/atl1c/atl1c_main.c |    2 ++

> > >  1 files changed, 2 insertions(+), 0 deletions(-)

> > >

> > > diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c

> > > b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c

> > > index 85717cb..c2736c4 100644

> > > --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c

> > > +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c

> > > @@ -351,6 +351,8 @@ static void atl1c_common_task(struct work_struct

> > *work)

> > >  		atl1c_irq_disable(adapter);

> > >  		atl1c_check_link_status(adapter);

> > >  		atl1c_irq_enable(adapter);

> > > +		if (netif_queue_stopped(netdev) && netif_carrier_ok(netdev))

> > > +			netif_wake_queue(netdev);

> > >  	}

> > >  }

> > >

> >

> > Why explicitly stop/start the queue when the link changes?  That's

> > what link_watch is for.

> >

> > Ben.

> >

> > --

> > Ben Hutchings, Staff Engineer, Solarflare Not speaking for my

> > employer; that's the marketing department's job.

> > They asked us to note that Solarflare product names are trademarked.
Huang, Xiong June 26, 2012, 8:41 p.m. UTC | #5
Luis
   It should be a stable fix, but as Ben Hutchings mentioned in another mail, 
Maybe, removing netif_stop_queue when cable link down is a better choice.

   Do you mean we need add 'cc:stable@vger.kernel.org'  just before 'some people report  ...' ?

Thanks
Xiong

> -----Original Message-----
> From: Rodriguez, Luis
> Sent: Wednesday, June 27, 2012 2:03
> To: Ren, Cloud
> Cc: davem@davemloft.net; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; qca-linux-team; nic-devel; Huang, Xiong
> Subject: Re: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out
> 
> On Tue, Jun 26, 2012 at 12:33:06PM -0300, Ren, Cloud wrote:
> > From: xiong <xiong@qca.qualcomm.com>
> >
> > some people report atl1c could cause system hang with following kernel
> > trace info:
> > ---------------------------------------
> > WARNING: at.../net/sched/sch_generic.c:258
> > dev_watchdog+0x1db/0x1d0()
> > ...
> > NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out ...
> > ---------------------------------------
> > This is caused by netif_stop_queue calling when cable Link is down but
> > netif_wake_queue isn't called when cable Link is resume.
> >
> > Signed-off-by: xiong <xiong@qca.qualcomm.com>
> > Signed-off-by: Cloud Ren <cjren@qca.qualcomm.com>
> 
> If this fixes  a system hang then this could be a stable fix -- that is, this should
> be propagated to older stable kernels, no?
> 
> If so then please add to the commit log a line like this:
> 
> Cc: stable@vger.kernel.org [3.4]
> 
> Note: this is for the commit log! Right above the line below that has "---"
> 
>   Luis
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis R. Rodriguez June 26, 2012, 8:54 p.m. UTC | #6
On Tue, Jun 26, 2012 at 01:41:11PM -0700, Huang, Xiong wrote:
> Luis
>    It should be a stable fix, but as Ben Hutchings mentioned in another mail, 
> Maybe, removing netif_stop_queue when cable link down is a better choice.
> 
>    Do you mean we need add 'cc:stable@vger.kernel.org'  just before 'some people report  ...' ?

Nope, see commit 4f7a67e2dd49fbfba002c453bc24bf00e701cc71
as an example of how to do this. This is a random commit
that has been marked as stable.

commit 4f7a67e2dd49fbfba002c453bc24bf00e701cc71
Author: Ricardo Martins <rasm@fe.up.pt>
Date:   Tue May 22 18:02:03 2012 +0100

    USB: fix PS3 EHCI systems
    
    After commit aaa0ef289afe9186f81e2340114ea413eef0492a "PS3 EHCI QH
    read work-around", Terratec Grabby (em28xx) stopped working with AMD
    Geode LX 800 (USB controller AMD CS5536). Since this is a PS3 only
    fix, the following patch adds a conditional block around it.
    
    Signed-off-by: Ricardo Martins <rasm@fe.up.pt>
    Acked-by: Alan Stern <stern@rowland.harvard.edu>
    Cc: stable <stable@vger.kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Sometimes it helps if you specify the oldest stable kernel
to apply patches to, so for example:

commit 80b08a8d8829a58b5db14b1417151094cc28face
Author: Felix Fietkau <nbd@openwrt.org>
Date:   Fri Jun 15 03:04:53 2012 +0200

    ath9k: fix invalid pointer access in the tx path
    
    After setup_frame_info has been called, only info->control.rates is still
    valid, other control fields have been overwritten by the ath_frame_info
    data. Move the access to info->control.vif for checking short preamble
    to setup_frame_info before it gets overwritten.
    
    This regression was introduced in commit d47a61aa
    "ath9k: Fix multi-VIF BSS handling"
    
    Signed-off-by: Felix Fietkau <nbd@openwrt.org>
    Reported-by: Thomas Hühn <thomas@net.t-labs.tu-berlin.de>
    Acked-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
    Cc: stable@vger.kernel.org [3.4]
    Signed-off-by: John W. Linville <linville@tuxdriver.com>

To be clear, this is not a Cc: in the e-mail but instead a
Cc line in the commit log entry.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Huang, Xiong June 26, 2012, 8:55 p.m. UTC | #7
Understand, thank you !

> -----Original Message-----
> From: Rodriguez, Luis
> Sent: Wednesday, June 27, 2012 4:55
> To: Huang, Xiong
> Cc: Ren, Cloud; davem@davemloft.net; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; qca-linux-team; nic-devel
> Subject: Re: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out
> 
> On Tue, Jun 26, 2012 at 01:41:11PM -0700, Huang, Xiong wrote:
> > Luis
> >    It should be a stable fix, but as Ben Hutchings mentioned in
> > another mail, Maybe, removing netif_stop_queue when cable link down is a
> better choice.
> >
> >    Do you mean we need add 'cc:stable@vger.kernel.org'  just before 'some
> people report  ...' ?
> 
> Nope, see commit 4f7a67e2dd49fbfba002c453bc24bf00e701cc71
> as an example of how to do this. This is a random commit that has been
> marked as stable.
> 
> commit 4f7a67e2dd49fbfba002c453bc24bf00e701cc71
> Author: Ricardo Martins <rasm@fe.up.pt>
> Date:   Tue May 22 18:02:03 2012 +0100
> 
>     USB: fix PS3 EHCI systems
> 
>     After commit aaa0ef289afe9186f81e2340114ea413eef0492a "PS3 EHCI
> QH
>     read work-around", Terratec Grabby (em28xx) stopped working with AMD
>     Geode LX 800 (USB controller AMD CS5536). Since this is a PS3 only
>     fix, the following patch adds a conditional block around it.
> 
>     Signed-off-by: Ricardo Martins <rasm@fe.up.pt>
>     Acked-by: Alan Stern <stern@rowland.harvard.edu>
>     Cc: stable <stable@vger.kernel.org>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> Sometimes it helps if you specify the oldest stable kernel to apply patches to,
> so for example:
> 
> commit 80b08a8d8829a58b5db14b1417151094cc28face
> Author: Felix Fietkau <nbd@openwrt.org>
> Date:   Fri Jun 15 03:04:53 2012 +0200
> 
>     ath9k: fix invalid pointer access in the tx path
> 
>     After setup_frame_info has been called, only info->control.rates is still
>     valid, other control fields have been overwritten by the ath_frame_info
>     data. Move the access to info->control.vif for checking short preamble
>     to setup_frame_info before it gets overwritten.
> 
>     This regression was introduced in commit d47a61aa
>     "ath9k: Fix multi-VIF BSS handling"
> 
>     Signed-off-by: Felix Fietkau <nbd@openwrt.org>
>     Reported-by: Thomas Hühn <thomas@net.t-labs.tu-berlin.de>
>     Acked-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
>     Cc: stable@vger.kernel.org [3.4]
>     Signed-off-by: John W. Linville <linville@tuxdriver.com>
> 
> To be clear, this is not a Cc: in the e-mail but instead a Cc line in the commit
> log entry.
> 
>   Luis
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index 85717cb..c2736c4 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -351,6 +351,8 @@  static void atl1c_common_task(struct work_struct *work)
 		atl1c_irq_disable(adapter);
 		atl1c_check_link_status(adapter);
 		atl1c_irq_enable(adapter);
+		if (netif_queue_stopped(netdev) && netif_carrier_ok(netdev))
+			netif_wake_queue(netdev);
 	}
 }