Patchwork 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too

login
register
mail settings
Submitter David Dillow
Date May 14, 2009, 2:38 a.m.
Message ID <1242268709.4979.7.camel@obelisk.thedillows.org>
Download mbox | patch
Permalink /patch/27187/
State RFC
Delegated to: David Miller
Headers show

Comments

David Dillow - May 14, 2009, 2:38 a.m.
On Tue, 2009-05-12 at 22:29 +0200, Michael Riepe wrote:
> Hi!
> 
> David Dillow wrote:
> 
> > I was saying that I don't think the timeouts are necessarily the NIC
> > chipset -- or the bridge chip for that matter --  having issues with
> > MSI. There were some substantial IRQ handling changes in 2.6.28 and my
> > bisection of the problem seem to lead into that code. I'll try this
> > later tonight hopefully, but can you try to run 2.6.27 with the current
> > r8169 driver and see if it is solid for you? That way it is using the
> > same driver code, but avoids the IRQ changes.
> 
> Unfortunately, 2.6.27 won't build with r8169.c copied from 2.6.29.

You are correct, and I should have thought about that. The following
patch reverts the following commits:

288379 net: Remove redundant NAPI functions
908a7a net: Remove unused netdev arg from some NAPI interfaces.
008298 netdev: add more functions to netdevice ops
8b4ab2 r8169: convert to net_device_ops
babcda drivers/net: Kill now superfluous ->last_rx stores.

The patched driver runs on 2.6.27 and survives my 5 minutes 'dd
if=/dev/zero bs=1024k | nc target 9000' test which usually dies in less
than 90 seconds on 2.6.28+.

To use the patch, just copy r8169.c from HEAD and use patch -p1.




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Riepe - May 14, 2009, 6:37 p.m.
David Dillow wrote:
> On Tue, 2009-05-12 at 22:29 +0200, Michael Riepe wrote:
> 
>>Hi!
>>
>>David Dillow wrote:
>>
>>
>>>I was saying that I don't think the timeouts are necessarily the NIC
>>>chipset -- or the bridge chip for that matter --  having issues with
>>>MSI. There were some substantial IRQ handling changes in 2.6.28 and my
>>>bisection of the problem seem to lead into that code. I'll try this
>>>later tonight hopefully, but can you try to run 2.6.27 with the current
>>>r8169 driver and see if it is solid for you? That way it is using the
>>>same driver code, but avoids the IRQ changes.
>>
>>Unfortunately, 2.6.27 won't build with r8169.c copied from 2.6.29.
> 
> 
> You are correct, and I should have thought about that. The following
> patch reverts the following commits:
> 
> 288379 net: Remove redundant NAPI functions
> 908a7a net: Remove unused netdev arg from some NAPI interfaces.
> 008298 netdev: add more functions to netdevice ops
> 8b4ab2 r8169: convert to net_device_ops
> babcda drivers/net: Kill now superfluous ->last_rx stores.
> 
> The patched driver runs on 2.6.27 and survives my 5 minutes 'dd
> if=/dev/zero bs=1024k | nc target 9000' test which usually dies in less
> than 90 seconds on 2.6.28+.

Not on my system:

WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0x258/0x270()
NETDEV WATCHDOG: eth0 (r8169): transmit timed out
Modules linked in: nfsd lockd nfs_acl sunrpc exportfs autofs4 deflate
zlib_deflate ctr twofish twofish_common camellia serpent blowfish
des_generic cbc aes_x86_64 aes_generic xcbc rmd160 sha256_generic
sha1_generic crypto_null crypto_blkcipher af_key ipt_REJECT
nf_conntrack_ipv6 ip6table_filter ip6_tables xt_tcpudp xt_conntrack
iptable_filter ip_tables x_tables sg nf_conntrack_irc nf_conntrack_ftp
nf_conntrack_ipv4 nf_conntrack rfcomm l2cap bluetooth dm_mod tun eeprom
smsc47m192 hwmon_vid smsc47m1 hwmon cpufreq_stats cpufreq_powersave
video backlight output fan container battery ac usbhid usb_storage
i2c_dev hid evdev intelfb fb i2c_algo_bit ff_memless parport_pc
cfbcopyarea r8169 snd_hda_intel i2c_i801 thermal cfbimgblt serio_raw
ehci_hcd mii button iTCO_wdt snd_pcm processor parport i2c_core
cfbfillrect intel_agp snd_timer snd_page_alloc snd_hwdep snd uhci_hcd
soundcore
Pid: 0, comm: swapper Not tainted 2.6.27-ai-x64-r8169 #1

Call Trace:
 <IRQ>  [<ffffffff802498f7>] warn_slowpath+0xb7/0xf0
 [<ffffffff804b98e0>] ? ip_output+0x90/0xf0
 [<ffffffff804b858f>] ? __ip_local_out+0x9f/0xb0
 [<ffffffff804b85c0>] ? ip_local_out+0x20/0x30
 [<ffffffff804b8e9c>] ? ip_queue_xmit+0x21c/0x3f0
 [<ffffffff80488ddc>] ? pskb_copy+0x1c/0x1a0
 [<ffffffff8048884e>] ? __alloc_skb+0x6e/0x150
 [<ffffffff80512972>] ? fib6_clean_node+0x42/0xc0
 [<ffffffff8054ad04>] ? _write_unlock_bh+0x24/0x30
 [<ffffffff8054aa0f>] ? _spin_lock_irqsave+0x3f/0x50
 [<ffffffff8037baca>] ? strlcpy+0x4a/0x60
 [<ffffffff8049fe78>] dev_watchdog+0x258/0x270
 [<ffffffff80512360>] ? fib6_gc_timer_cb+0x0/0x10
 [<ffffffff8054ad63>] ? _spin_unlock_bh+0x23/0x30
 [<ffffffff8049fc20>] ? dev_watchdog+0x0/0x270
 [<ffffffff80254650>] run_timer_softirq+0x170/0x250
 [<ffffffff8026adff>] ? clockevents_program_event+0x4f/0x90
 [<ffffffff8024fcc4>] __do_softirq+0x84/0x100
 [<ffffffff80213fdc>] call_softirq+0x1c/0x30
 [<ffffffff802161ad>] do_softirq+0x5d/0xa0
 [<ffffffff8024f92d>] irq_exit+0x9d/0xb0
 [<ffffffff80224b84>] smp_apic_timer_interrupt+0x84/0xc0
 [<ffffffff802138b3>] apic_timer_interrupt+0x83/0x90
 <EOI>  [<ffffffff8021b250>] ? mwait_idle+0x40/0x60
 [<ffffffff80211662>] ? enter_idle+0x22/0x30
 [<ffffffff802116dd>] ? cpu_idle+0x6d/0x120
 [<ffffffff80538038>] ? rest_init+0x88/0x90

This happened less than half a minute after the transfer had started.
And it's going to happen earlier if I increase the load. With four
connections to two other hosts, the transmission usually pauses after
less than ten seconds. Sometimes it lasts for only two or three seconds.
David Dillow - May 14, 2009, 7:14 p.m.
On Thu, 2009-05-14 at 20:37 +0200, Michael Riepe wrote:
> 
> David Dillow wrote:
> > On Tue, 2009-05-12 at 22:29 +0200, Michael Riepe wrote:
> > The patched driver runs on 2.6.27 and survives my 5 minutes 'dd
> > if=/dev/zero bs=1024k | nc target 9000' test which usually dies in less
> > than 90 seconds on 2.6.28+.
> 
> Not on my system:

> This happened less than half a minute after the transfer had started.
> And it's going to happen earlier if I increase the load. With four
> connections to two other hosts, the transmission usually pauses after
> less than ten seconds. Sometimes it lasts for only two or three seconds.

Bummer, but a good data point; thanks for testing.

I added some code to print the irq status when it hangs, and it shows
0x0085, which is RxOK | TxOK | TxDescUnavail, which makes me think we've
lost an MSI-edge interrupt somehow. You being able to reproduce it on
2.6.27 where I cannot leads me to think that the bisection down into the
genirq tree just changed the timing and made it easier to hit after it
was merged.

So, I suppose a good review of the IRQ handling of r8169.c is in order,
though my SATA disks (AHCI w/ MSI irqs) also seem to have similar issues
with delays, though that is entirely unqualified and unmeasured.

Dave
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Riepe - May 14, 2009, 7:42 p.m.
David Dillow wrote:
> On Thu, 2009-05-14 at 20:37 +0200, Michael Riepe wrote:
> 
>>David Dillow wrote:
>>
>>>On Tue, 2009-05-12 at 22:29 +0200, Michael Riepe wrote:
>>>The patched driver runs on 2.6.27 and survives my 5 minutes 'dd
>>>if=/dev/zero bs=1024k | nc target 9000' test which usually dies in less
>>>than 90 seconds on 2.6.28+.
>>
>>Not on my system:
> 
> 
>>This happened less than half a minute after the transfer had started.
>>And it's going to happen earlier if I increase the load. With four
>>connections to two other hosts, the transmission usually pauses after
>>less than ten seconds. Sometimes it lasts for only two or three seconds.
> 
> 
> Bummer, but a good data point; thanks for testing.
> 
> I added some code to print the irq status when it hangs, and it shows
> 0x0085, which is RxOK | TxOK | TxDescUnavail, which makes me think we've
> lost an MSI-edge interrupt somehow. You being able to reproduce it on
> 2.6.27 where I cannot leads me to think that the bisection down into the
> genirq tree just changed the timing and made it easier to hit after it
> was merged.

Maybe. With a single connection, 2.6.27 with the 2.6.29 driver seemed to
be a little more stable (i.e. the transfers lasted a little longer under
low and medium loads) than 2.6.29, but that's nothing I could actually
quantify.

> So, I suppose a good review of the IRQ handling of r8169.c is in order,
> though my SATA disks (AHCI w/ MSI irqs) also seem to have similar issues
> with delays, though that is entirely unqualified and unmeasured.

Hey, MSI isn't bad in general. The e1000e driver on my Lenovo T60 uses
it as well, and it's as reliable as a rock.

Patch

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 0b6e8c8..2d751bf 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -490,7 +490,6 @@  struct rtl8169_private {
 	void (*hw_start)(struct net_device *);
 	unsigned int (*phy_reset_pending)(void __iomem *);
 	unsigned int (*link_ok)(void __iomem *);
-	int (*do_ioctl)(struct rtl8169_private *tp, struct mii_ioctl_data *data, int cmd);
 	int pcie_cap;
 	struct delayed_work task;
 	unsigned features;
@@ -1850,11 +1849,9 @@  static int rtl8169_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	struct rtl8169_private *tp = netdev_priv(dev);
 	struct mii_ioctl_data *data = if_mii(ifr);
 
-	return netif_running(dev) ? tp->do_ioctl(tp, data, cmd) : -ENODEV;
-}
+	if (!netif_running(dev))
+		return -ENODEV;
 
-static int rtl_xmii_ioctl(struct rtl8169_private *tp, struct mii_ioctl_data *data, int cmd)
-{
 	switch (cmd) {
 	case SIOCGMIIPHY:
 		data->phy_id = 32; /* Internal PHY */
@@ -1873,11 +1870,6 @@  static int rtl_xmii_ioctl(struct rtl8169_private *tp, struct mii_ioctl_data *dat
 	return -EOPNOTSUPP;
 }
 
-static int rtl_tbi_ioctl(struct rtl8169_private *tp, struct mii_ioctl_data *data, int cmd)
-{
-	return -EOPNOTSUPP;
-}
-
 static const struct rtl_cfg_info {
 	void (*hw_start)(struct net_device *);
 	unsigned int region;
@@ -1943,26 +1935,6 @@  static void rtl_disable_msi(struct pci_dev *pdev, struct rtl8169_private *tp)
 	}
 }
 
-static const struct net_device_ops rtl8169_netdev_ops = {
-	.ndo_open		= rtl8169_open,
-	.ndo_stop		= rtl8169_close,
-	.ndo_get_stats		= rtl8169_get_stats,
-	.ndo_start_xmit		= rtl8169_start_xmit,
-	.ndo_tx_timeout		= rtl8169_tx_timeout,
-	.ndo_validate_addr	= eth_validate_addr,
-	.ndo_change_mtu		= rtl8169_change_mtu,
-	.ndo_set_mac_address	= rtl_set_mac_address,
-	.ndo_do_ioctl		= rtl8169_ioctl,
-	.ndo_set_multicast_list	= rtl_set_rx_mode,
-#ifdef CONFIG_R8169_VLAN
-	.ndo_vlan_rx_register	= rtl8169_vlan_rx_register,
-#endif
-#ifdef CONFIG_NET_POLL_CONTROLLER
-	.ndo_poll_controller	= rtl8169_netpoll,
-#endif
-
-};
-
 static int __devinit
 rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
@@ -1989,7 +1961,6 @@  rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	}
 
 	SET_NETDEV_DEV(dev, &pdev->dev);
-	dev->netdev_ops = &rtl8169_netdev_ops;
 	tp = netdev_priv(dev);
 	tp->dev = dev;
 	tp->pci_dev = pdev;
@@ -2126,7 +2097,6 @@  rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		tp->phy_reset_enable = rtl8169_tbi_reset_enable;
 		tp->phy_reset_pending = rtl8169_tbi_reset_pending;
 		tp->link_ok = rtl8169_tbi_link_ok;
-		tp->do_ioctl = rtl_tbi_ioctl;
 
 		tp->phy_1000_ctrl_reg = ADVERTISE_1000FULL; /* Implied by TBI */
 	} else {
@@ -2135,7 +2105,8 @@  rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		tp->phy_reset_enable = rtl8169_xmii_reset_enable;
 		tp->phy_reset_pending = rtl8169_xmii_reset_pending;
 		tp->link_ok = rtl8169_xmii_link_ok;
-		tp->do_ioctl = rtl_xmii_ioctl;
+
+		dev->do_ioctl = rtl8169_ioctl;
 	}
 
 	spin_lock_init(&tp->lock);
@@ -2147,15 +2118,28 @@  rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		dev->dev_addr[i] = RTL_R8(MAC0 + i);
 	memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len);
 
+	dev->open = rtl8169_open;
+	dev->hard_start_xmit = rtl8169_start_xmit;
+	dev->get_stats = rtl8169_get_stats;
 	SET_ETHTOOL_OPS(dev, &rtl8169_ethtool_ops);
+	dev->stop = rtl8169_close;
+	dev->tx_timeout = rtl8169_tx_timeout;
+	dev->set_multicast_list = rtl_set_rx_mode;
 	dev->watchdog_timeo = RTL8169_TX_TIMEOUT;
 	dev->irq = pdev->irq;
 	dev->base_addr = (unsigned long) ioaddr;
+	dev->change_mtu = rtl8169_change_mtu;
+	dev->set_mac_address = rtl_set_mac_address;
 
 	netif_napi_add(dev, &tp->napi, rtl8169_poll, R8169_NAPI_WEIGHT);
 
 #ifdef CONFIG_R8169_VLAN
 	dev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
+	dev->vlan_rx_register = rtl8169_vlan_rx_register;
+#endif
+
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	dev->poll_controller = rtl8169_netpoll;
 #endif
 
 	tp->intr_mask = 0xffff;
@@ -3513,6 +3497,7 @@  static int rtl8169_rx_interrupt(struct net_device *dev,
 			if (rtl8169_rx_vlan_skb(tp, desc, skb) < 0)
 				netif_receive_skb(skb);
 
+			dev->last_rx = jiffies;
 			dev->stats.rx_bytes += pkt_size;
 			dev->stats.rx_packets++;
 		}
@@ -3594,8 +3579,8 @@  static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 		RTL_W16(IntrMask, tp->intr_event & ~tp->napi_event);
 		tp->intr_mask = ~tp->napi_event;
 
-		if (likely(napi_schedule_prep(&tp->napi)))
-			__napi_schedule(&tp->napi);
+		if (likely(netif_rx_schedule_prep(dev, &tp->napi)))
+			__netif_rx_schedule(dev, &tp->napi);
 		else if (netif_msg_intr(tp)) {
 			printk(KERN_INFO "%s: interrupt %04x in poll\n",
 			       dev->name, status);
@@ -3616,7 +3601,7 @@  static int rtl8169_poll(struct napi_struct *napi, int budget)
 	rtl8169_tx_interrupt(dev, tp, ioaddr);
 
 	if (work_done < budget) {
-		napi_complete(napi);
+		netif_rx_complete(dev, napi);
 		tp->intr_mask = 0xffff;
 		/*
 		 * 20040426: the barrier is not strictly required but the