Patchwork skge: fix occasional BUG during MTU change

Submitter Michal Schmidt
Date April 7, 2009, 4:36 p.m.
Message ID <20090407183623.7545bb0b@leela>
Permalink /patch/25688/
State Accepted
Delegated to: David Miller

Comments

Michal Schmidt - April 7, 2009, 4:36 p.m.
The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
was sometimes observed when setting MTU.

skge_down() disables the TX queue, but then reenables it by mistake via
skge_tx_clean().
Fix it by moving the waking of the queue from skge_tx_clean() to the
other caller. And to make sure start_xmit is not in progress on another
CPU, skge_down() should call netif_tx_disable().

The bug was reported to me by Jiri Jilek whose Debian system sometimes
failed to boot. He tested the patch and the bug did not happen anymore.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
---
 drivers/net/skge.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)
David Miller - April 8, 2009, 11:01 p.m.
From: Michal Schmidt <mschmidt@redhat.com>
Date: Tue, 7 Apr 2009 18:36:23 +0200

> The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
> was sometimes observed when setting MTU.
> 
> skge_down() disables the TX queue, but then reenables it by mistake via
> skge_tx_clean().
> Fix it by moving the waking of the queue from skge_tx_clean() to the
> other caller. And to make sure start_xmit is not in progress on another
> CPU, skge_down() should call netif_tx_disable().
> 
> The bug was reported to me by Jiri Jilek whose Debian system sometimes
> failed to boot. He tested the patch and the bug did not happen anymore.
> 
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

Stephen, an ACK possibly?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger - April 8, 2009, 11:06 p.m.
On Wed, 08 Apr 2009 16:01:52 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Michal Schmidt <mschmidt@redhat.com>
> Date: Tue, 7 Apr 2009 18:36:23 +0200
> 
> > The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
> > was sometimes observed when setting MTU.
> > 
> > skge_down() disables the TX queue, but then reenables it by mistake via
> > skge_tx_clean().
> > Fix it by moving the waking of the queue from skge_tx_clean() to the
> > other caller. And to make sure start_xmit is not in progress on another
> > CPU, skge_down() should call netif_tx_disable().
> > 
> > The bug was reported to me by Jiri Jilek whose Debian system sometimes
> > failed to boot. He tested the patch and the bug did not happen anymore.
> > 
> > Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
> 
> Stephen, an ACK possibly?

I wanted to test on real hardware, and am offsite this week.
David Miller - April 8, 2009, 11:08 p.m.
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 8 Apr 2009 16:06:21 -0700

> On Wed, 08 Apr 2009 16:01:52 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
> 
>> From: Michal Schmidt <mschmidt@redhat.com>
>> Date: Tue, 7 Apr 2009 18:36:23 +0200
>> 
>> > The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
>> > was sometimes observed when setting MTU.
>> > 
>> > skge_down() disables the TX queue, but then reenables it by mistake via
>> > skge_tx_clean().
>> > Fix it by moving the waking of the queue from skge_tx_clean() to the
>> > other caller. And to make sure start_xmit is not in progress on another
>> > CPU, skge_down() should call netif_tx_disable().
>> > 
>> > The bug was reported to me by Jiri Jilek whose Debian system sometimes
>> > failed to boot. He tested the patch and the bug did not happen anymore.
>> > 
>> > Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
>> 
>> Stephen, an ACK possibly?
> 
> I wanted to test on real hardware, and am offsite this week.

Ok, I'll wait for that, thanks!
Andrew Morton - April 10, 2009, 4:59 a.m.
On Tue, 7 Apr 2009 18:36:23 +0200 Michal Schmidt <mschmidt@redhat.com> wrote:

> The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
> was sometimes observed when setting MTU.
> 
> skge_down() disables the TX queue, but then reenables it by mistake via
> skge_tx_clean().
> Fix it by moving the waking of the queue from skge_tx_clean() to the
> other caller. And to make sure start_xmit is not in progress on another
> CPU, skge_down() should call netif_tx_disable().
> 
> The bug was reported to me by Jiri Jilek whose Debian system sometimes
> failed to boot. He tested the patch and the bug did not happen anymore.

It's conventional to add the reporter's "Reported-by:" tag to the
changelog in this situation.

> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

As the bug is present in 2.6.29 (and possibly earlier?) it's
appropriate to add a Cc: <stable@kernel.org> too.  This makes davem go
mad at you, but I prefer getting madded at over possibly losing bugfixes ;)

> 
> diff --git a/drivers/net/skge.c b/drivers/net/skge.c
> index 952d37f..b2a05af 100644
> --- a/drivers/net/skge.c
> +++ b/drivers/net/skge.c
> @@ -2674,7 +2674,7 @@ static int skge_down(struct net_device *dev)
>  	if (netif_msg_ifdown(skge))
>  		printk(KERN_INFO PFX "%s: disabling interface\n", dev->name);
>  
> -	netif_stop_queue(dev);
> +	netif_tx_disable(dev);
>  
>  	if (hw->chip_id == CHIP_ID_GENESIS && hw->phy_type == SK_PHY_XMAC)
>  		del_timer_sync(&skge->link_timer);
> @@ -2881,7 +2881,6 @@ static void skge_tx_clean(struct net_device *dev)
>  	}
>  
>  	skge->tx_ring.to_clean = e;
> -	netif_wake_queue(dev);
>  }
>  
>  static void skge_tx_timeout(struct net_device *dev)
> @@ -2893,6 +2892,7 @@ static void skge_tx_timeout(struct net_device *dev)
>  
>  	skge_write8(skge->hw, Q_ADDR(txqaddr[skge->port], Q_CSR), CSR_STOP);
>  	skge_tx_clean(dev);
> +	netif_wake_queue(dev);
>  }
>  
>  static int skge_change_mtu(struct net_device *dev, int new_mtu)

David Miller - April 13, 2009, 11:23 p.m.
From: Michal Schmidt <mschmidt@redhat.com>
Date: Tue, 7 Apr 2009 18:36:23 +0200

> The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
> was sometimes observed when setting MTU.
> 
> skge_down() disables the TX queue, but then reenables it by mistake via
> skge_tx_clean().
> Fix it by moving the waking of the queue from skge_tx_clean() to the
> other caller. And to make sure start_xmit is not in progress on another
> CPU, skge_down() should call netif_tx_disable().
> 
> The bug was reported to me by Jiri Jilek whose Debian system sometimes
> failed to boot. He tested the patch and the bug did not happen anymore.
> 
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

Stephen, have you had a chance to test this yet?

Thanks.
stephen hemminger - April 14, 2009, 5:55 p.m.
On Tue, 7 Apr 2009 18:36:23 +0200
Michal Schmidt <mschmidt@redhat.com> wrote:

> The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
> was sometimes observed when setting MTU.
> 
> skge_down() disables the TX queue, but then reenables it by mistake via
> skge_tx_clean().
> Fix it by moving the waking of the queue from skge_tx_clean() to the
> other caller. And to make sure start_xmit is not in progress on another
> CPU, skge_down() should call netif_tx_disable().
> 
> The bug was reported to me by Jiri Jilek whose Debian system sometimes
> failed to boot. He tested the patch and the bug did not happen anymore.
> 
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
> ---
>  drivers/net/skge.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)

Tested fine. This should go to stable as well.

Acked-by: Stephen Hemminger <shemminger@vyatta.com>
David Miller - April 14, 2009, 10:17 p.m.
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 14 Apr 2009 10:55:39 -0700

> On Tue, 7 Apr 2009 18:36:23 +0200
> Michal Schmidt <mschmidt@redhat.com> wrote:
> 
>> The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
>> was sometimes observed when setting MTU.
>> 
>> skge_down() disables the TX queue, but then reenables it by mistake via
>> skge_tx_clean().
>> Fix it by moving the waking of the queue from skge_tx_clean() to the
>> other caller. And to make sure start_xmit is not in progress on another
>> CPU, skge_down() should call netif_tx_disable().
>> 
>> The bug was reported to me by Jiri Jilek whose Debian system sometimes
>> failed to boot. He tested the patch and the bug did not happen anymore.
>> 
>> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
>> ---
>>  drivers/net/skge.c |    4 ++--
>>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> Tested fine. This should go to stable as well.
> 
> Acked-by: Stephen Hemminger <shemminger@vyatta.com>

Applied, thanks everyone.

Patch

diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index 952d37f..b2a05af 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -2674,7 +2674,7 @@  static int skge_down(struct net_device *dev)
 	if (netif_msg_ifdown(skge))
 		printk(KERN_INFO PFX "%s: disabling interface\n", dev->name);
 
-	netif_stop_queue(dev);
+	netif_tx_disable(dev);
 
 	if (hw->chip_id == CHIP_ID_GENESIS && hw->phy_type == SK_PHY_XMAC)
 		del_timer_sync(&skge->link_timer);
@@ -2881,7 +2881,6 @@  static void skge_tx_clean(struct net_device *dev)
 	}
 
 	skge->tx_ring.to_clean = e;
-	netif_wake_queue(dev);
 }
 
 static void skge_tx_timeout(struct net_device *dev)
@@ -2893,6 +2892,7 @@  static void skge_tx_timeout(struct net_device *dev)
 
 	skge_write8(skge->hw, Q_ADDR(txqaddr[skge->port], Q_CSR), CSR_STOP);
 	skge_tx_clean(dev);
+	netif_wake_queue(dev);
 }
 
 static int skge_change_mtu(struct net_device *dev, int new_mtu)
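
Illustration

The changelog's reasoning can be followed with a small user-space model. This is only a sketch, not driver code: struct toy_port and every toy_* function are hypothetical stand-ins invented for this example, and the model is single-threaded, so it shows only the "queue re-enabled by skge_tx_clean()" half of the fix. It does not model the cross-CPU start_xmit race that netif_tx_disable() addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the port state; NOT the real struct skge_port. */
struct toy_port {
	bool queue_stopped;	/* models the TX queue on/off state */
	int to_use;		/* next ring slot a transmit would fill */
	int to_clean;		/* next ring slot to reclaim */
};

/* Models start_xmit: a frame is queued only while the queue is running. */
static bool toy_xmit(struct toy_port *p)
{
	assert(p != NULL);
	if (p->queue_stopped)
		return false;
	p->to_use++;
	return true;
}

/*
 * Models skge_tx_clean(): reclaim every outstanding slot.
 * wake_in_clean == true reproduces the pre-patch netif_wake_queue() call.
 */
static void toy_tx_clean(struct toy_port *p, bool wake_in_clean)
{
	p->to_clean = p->to_use;
	if (wake_in_clean)
		p->queue_stopped = false;
}

/* Models skge_down(): stop the queue, then clean the ring. */
static void toy_down(struct toy_port *p, bool wake_in_clean)
{
	p->queue_stopped = true;
	toy_tx_clean(p, wake_in_clean);
}

/* Models the condition that BUG_ON() in skge_up() insists on. */
static bool toy_ring_idle(const struct toy_port *p)
{
	return p->to_use == p->to_clean;
}

/*
 * One MTU-change style "down", with a transmit attempt arriving before
 * the matching "up". Returns true if "up" would find the ring idle.
 */
static bool toy_mtu_change_is_safe(bool wake_in_clean)
{
	struct toy_port p = { .queue_stopped = false, .to_use = 0, .to_clean = 0 };

	toy_xmit(&p);			/* traffic before the MTU change */
	toy_down(&p, wake_in_clean);
	toy_xmit(&p);			/* send attempt while the port is down */
	return toy_ring_idle(&p);
}
```

With wake_in_clean set to true (the old behaviour), the late transmit is accepted and to_use moves ahead of to_clean, which is the state the BUG_ON() catches. With it set to false (the patched behaviour), the queue stays stopped, the late transmit is rejected, and the ring is idle when the port comes back up.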