diff mbox

r8169 hard-freezes the system on big network loads

Message ID 20110913081126.GA20022@electric-eye.fr.zoreil.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Francois Romieu Sept. 13, 2011, 8:11 a.m. UTC
Michael Brade <brade@informatik.uni-muenchen.de> :
[...]
> Does it have to be 3.1.0-rc3 or is 3.0.1 ok as well ?

:o(

Almost any release may exhibit the bug. The attached patch (#0003)
should be a better candidate as an official fix though.

> If so, I have another bad news: 3.0.1 still crashes with this patch.
> It took me a lot longer to crash it but eventually it did happen.
> Not sure why it took longer, I guess I didn't generate enough throughput.

It sure sucks from a user experience viewpoint but it is not _that_ bad.

Are the symptoms in any way different or do you still notice more-or-less
periodic link-up messages and no real network traffic ?

> If you want me to use 3.1.0 then we'll have to wait until git.kernel.org is 
> back...

https://github.com/torvalds/linux.git is available in the meantime.

You will want the patch below as well if you try 3.1-rc6.

[PATCH] r8169: don't reset software ring indexes after disabling hardware Rx.

Bad things happen when the driver resets ring indexes after disabling
hardware Rx (and Tx) in the RxFIFOOver event recovery path of the irq
handler while it races with the NAPI Rx processing method.

Ring indexes init is now done before enabling hardware Rx / Tx.

NB: this is not a straight candidate for -stable since it is coupled
with commit 92fc43b4159b518f5baae57301f26d770b0834c9 (July 11).

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayes <hayeswang@realtek.com>
---
 drivers/net/r8169.c |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)

Comments

Michael Brade Sept. 14, 2011, 9:36 p.m. UTC | #1
On Tuesday 13 September 2011 10:11:26 you wrote:
> Michael Brade <brade@informatik.uni-muenchen.de> :
> [...]
> 
> > Does it have to be 3.1.0-rc3 or is 3.0.1 ok as well ?
> :
> :o(
> 
> Almost any release may exhibit the bug. The attached patch (#0003)
> should be a better candidate as an official fix though.

ok, good news: I did not experience any freeze anymore even though I 
transfered 60 GB. And I applied both of your patches and 

-                   if (status & RxFOVF) {
-                           rtl8169_schedule_work(dev, rtl8169_reset_task);
-                           dev->stats.rx_fifo_errors++;
-                   }

 
> > If so, I have another bad news: 3.0.1 still crashes with this patch.
> > It took me a lot longer to crash it but eventually it did happen.
> > Not sure why it took longer, I guess I didn't generate enough throughput.
> 
> It sure sucks from a user experience viewpoint but it is not _that_ bad.

I disagree - I actually lose data because I mount my data and backups with 
iSCSI and exactly then it crashes.

> Are the symptoms in any way different or do you still notice more-or-less
> periodic link-up messages and no real network traffic ?

dmesg looks like this:

[ 1611.380420] r8169 0000:13:00.0: eth0: link up
[ 1611.995417] r8169 0000:13:00.0: eth0: link up
[ 1612.323050] r8169 0000:13:00.0: eth0: link up
[ 1612.574016] r8169 0000:13:00.0: eth0: link up
[ 1613.450630] r8169 0000:13:00.0: eth0: link up
[ 1613.929383] r8169 0000:13:00.0: eth0: link up
[ 1614.950939] r8169 0000:13:00.0: eth0: link up
[ 1615.699660] r8169 0000:13:00.0: eth0: link up
[ 1616.005507] r8169 0000:13:00.0: eth0: link up
[ 1616.746199] r8169 0000:13:00.0: eth0: link up
[ 1617.879670] r8169 0000:13:00.0: eth0: link up
[ 1618.461433] r8169 0000:13:00.0: eth0: link up

so yes but what do you mean with "no real network traffic"? I still get
100 MB/s.

cheers,
  Michael
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Francois Romieu Sept. 15, 2011, 12:03 a.m. UTC | #2
Michael Brade <brade@informatik.uni-muenchen.de> :
[...]
> ok, good news: I did not experience any freeze anymore even though I 
> transfered 60 GB. And I applied both of your patches and 
> 
> -                   if (status & RxFOVF) {
> -                           rtl8169_schedule_work(dev, rtl8169_reset_task);
> -                           dev->stats.rx_fifo_errors++;
> -                   }

It should not be necessary to remove this part : the status mask is
supposed to take care of it. One of my patches is wrong if this part
needs to go away.

[...]
> > Are the symptoms in any way different or do you still notice more-or-less
> > periodic link-up messages and no real network traffic ?
> 
> dmesg looks like this:
> 
> [ 1611.380420] r8169 0000:13:00.0: eth0: link up
> [ 1611.995417] r8169 0000:13:00.0: eth0: link up
> [ 1612.323050] r8169 0000:13:00.0: eth0: link up
> [ 1612.574016] r8169 0000:13:00.0: eth0: link up

I will have to figure why there are so much of theses messages.

[...]
> so yes but what do you mean with "no real network traffic"? I still get
> 100 MB/s.

100 MB/s as 100 Mbyte/s on a gigabit link or 100 Mbit/s on a {gigabit / fast}
ethernet link ?

Thanks.
Michael Brade Sept. 15, 2011, 10:26 a.m. UTC | #3
On Thursday 15 September 2011 02:03:32 Francois Romieu wrote:
> Michael Brade <brade@informatik.uni-muenchen.de> :
> [...]
>
> > ok, good news: I did not experience any freeze anymore even though I
> > transfered 60 GB. And I applied both of your patches and
> >
> > -                   if (status & RxFOVF) {
> > -                           rtl8169_schedule_work(dev,
> > rtl8169_reset_task); -                          
> > dev->stats.rx_fifo_errors++;
> > -                   }
>
> It should not be necessary to remove this part : the status mask is
> supposed to take care of it. One of my patches is wrong if this part
> needs to go away.

ok, I only removed it because you told me so the first time.

> [...]
>
> > so yes but what do you mean with "no real network traffic"? I still get
> > 100 MB/s.
>
> 100 MB/s as 100 Mbyte/s on a gigabit link or 100 Mbit/s on a {gigabit /
> fast} ethernet link ?

100 Mbytes on a gigabit link, so almost 100% usage (with ups and downs, of course; maybe 
between 90 MB/s and 112 MB/s).

thanks,
  Michael
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 05566b1..22b9c7a 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -717,7 +717,7 @@  static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 				      struct net_device *dev);
 static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance);
 static int rtl8169_init_ring(struct net_device *dev);
-static void rtl_hw_start(struct net_device *dev);
+static void rtl_start(struct net_device *dev);
 static int rtl8169_close(struct net_device *dev);
 static void rtl_set_rx_mode(struct net_device *dev);
 static void rtl8169_tx_timeout(struct net_device *dev);
@@ -3589,8 +3589,6 @@  static void rtl_hw_reset(struct rtl8169_private *tp)
 			break;
 		udelay(100);
 	}
-
-	rtl8169_init_ring_indexes(tp);
 }
 
 static int __devinit
@@ -3948,7 +3946,7 @@  static int rtl8169_open(struct net_device *dev)
 
 	rtl_pll_power_up(tp);
 
-	rtl_hw_start(dev);
+	rtl_start(dev);
 
 	tp->saved_wolopts = 0;
 	pm_runtime_put_noidle(&pdev->dev);
@@ -4014,10 +4012,14 @@  static void rtl_set_rx_tx_config_registers(struct rtl8169_private *tp)
 		(InterFrameGap << TxInterFrameGapShift));
 }
 
-static void rtl_hw_start(struct net_device *dev)
+static void rtl_start(struct net_device *dev)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
 
+	rtl8169_init_ring_indexes(tp);
+
+	smp_mb();
+
 	tp->hw_start(dev);
 
 	netif_start_queue(dev);
@@ -4997,7 +4999,7 @@  static void rtl8169_reset_task(struct work_struct *work)
 	rtl8169_tx_clear(tp);
 
 	rtl8169_hw_reset(tp);
-	rtl_hw_start(dev);
+	rtl_start(dev);
 	netif_wake_queue(dev);
 	rtl8169_check_link_status(dev, tp, tp->mmio_addr);