Patchwork Mass udp flow reboot linux with RealTek RTL-8169 Gigabit

login
register
mail settings
Submitter françois romieu
Date Feb. 18, 2011, 9:30 a.m.
Message ID <20110218093000.GA12106@electric-eye.fr.zoreil.com>
Download mbox | patch
Permalink /patch/83549/
State RFC
Delegated to: David Miller
Headers show

Comments

françois romieu - Feb. 18, 2011, 9:30 a.m.
Seblu <seblu@seblu.net> :
[...]
> I've applyed your patch on 2.6.38-rc5. Host have rebooted 2mn after udp start.
> After this reboot, host is still on after 2 hour under a 1Gbit/s udp flow.

Thanks for testing.

> I attached a dmesg output before reboot. Do you need anything else?

Mostly :
1. .config
2. the size of the udp packets and the mtu

As an option :
3. a few seconds of 'vmstat 1' from the host under test
4. an 'ethtool -s eth0' from the host under test
5. /proc/interrupts from the host under test
6. lspci -tv 

Can you apply the two attached patches on top of the previous ones and
give it a try ? The debug should not be too verbose if things are stationary
enough.

Do you have a serial cable and a second computer at hand by chance (don't
go for netconsole with the r8169 driver) ?
Seblu - March 6, 2011, 12:29 a.m.
On Fri, Feb 18, 2011 at 10:30 AM, Francois Romieu <romieu@fr.zoreil.com> wrote:
> Seblu <seblu@seblu.net> :
> [...]
>> I've applyed your patch on 2.6.38-rc5. Host have rebooted 2mn after udp start.
>> After this reboot, host is still on after 2 hour under a 1Gbit/s udp flow.
>
> Thanks for testing.

Thanks for debugging! Sorry for the late reply.

I catched the following trace during my previous torture session.
Maybe it can help.

[ 1448.189961] ------------[ cut here ]------------
[ 1448.189968] WARNING: at net/sched/sch_generic.c:256
dev_watchdog+0x104/0x1ad()
[ 1448.189970] Hardware name: H55N-USB3
[ 1448.189971] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[ 1448.189972] Modules linked in: ext2 it87 hwmon_vid ds1621 i915
drm_kms_helper drm i2c_algo_bit video dm_crypt raid456
async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy
async_tx md_mod option usb_wwan r8169 snd_pcm snd_timer snd soundcore
usbserial uhci_hcd xhci_hcd ehci_hcd usbcore tpm_tis evdev i2c_i801
tpm processor snd_page_alloc mii sata_mv pcspkr serio_raw i2c_core
tpm_bios button thermal_sys nls_base ext4 mbcache jbd2 crc16 dm_mod
btrfs zlib_deflate crc32c libcrc32c sd_mod crc_t10dif ahci libahci
libata scsi_mod
[ 1448.190006] Pid: 0, comm: kworker/0:1 Not tainted 2.6.38-rc5-seblu #2
[ 1448.190007] Call Trace:
[ 1448.190009]  <IRQ>  [<ffffffff8103a198>] ? warn_slowpath_common+0x78/0x8c
[ 1448.190016]  [<ffffffff8103a24b>] ? warn_slowpath_fmt+0x45/0x4a
[ 1448.190018]  [<ffffffff8126a718>] ? netif_tx_lock+0x43/0x74
[ 1448.190019]  [<ffffffff8126a84d>] ? dev_watchdog+0x104/0x1ad
[ 1448.190023]  [<ffffffff8104e80f>] ? __queue_work+0x30c/0x334
[ 1448.190026]  [<ffffffff81045ff3>] ? run_timer_softirq+0x1b8/0x27b
[ 1448.190028]  [<ffffffff8126a749>] ? dev_watchdog+0x0/0x1ad
[ 1448.190031]  [<ffffffff8105a726>] ? ktime_get+0x5f/0xb8
[ 1448.190034]  [<ffffffff8103fa09>] ? __do_softirq+0xbd/0x194
[ 1448.190037]  [<ffffffff810037cc>] ? call_softirq+0x1c/0x28
[ 1448.190040]  [<ffffffff81004bb5>] ? do_softirq+0x31/0x63
[ 1448.190042]  [<ffffffff8103f8b1>] ? irq_exit+0x36/0x7b
[ 1448.190045]  [<ffffffff8101843c>] ? smp_apic_timer_interrupt+0x87/0x94
[ 1448.190047]  [<ffffffff81003293>] ? apic_timer_interrupt+0x13/0x20
[ 1448.190048]  <EOI>  [<ffffffff81056250>] ? hrtimer_get_next_event+0x8a/0xa2
[ 1448.190056]  [<ffffffffa0128d11>] ? acpi_idle_enter_c1+0x7b/0x95 [processor]
[ 1448.190059]  [<ffffffffa0128cf1>] ? acpi_idle_enter_c1+0x5b/0x95 [processor]
[ 1448.190063]  [<ffffffff8123c097>] ? cpuidle_idle_call+0x119/0x1c6
[ 1448.190065]  [<ffffffff81001c9c>] ? cpu_idle+0xa5/0xdb
[ 1448.190068]  [<ffffffff812ffd4a>] ? start_secondary+0x1ad/0x1b3
[ 1448.190070] ---[ end trace dd5a8baf361d8b47 ]---

>> I attached a dmesg output before reboot. Do you need anything else?
ok. next informations is about test with patch
r8169-TX-stop-queue-race-fix and r8169-debuginator on 2.6.38-rc7.

> Mostly :
> 1. .config
see attachment

> 2. the size of the udp packets and the mtu
eth0 mtu is 1500

iperf output:
Client connecting to bitume, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size:   122 KByte (default)

and icmp echo have size of 65500.

> As an option :
> 3. a few seconds of 'vmstat 1' from the host under test
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 3720532  11292  43292    0    0    15     2 2356 4160  0  1 98  1
 0  0      0 3720524  11300  43316    0    0     0    20 21459 41637  0  2 97  2
 1  0      0 3720420  11300  43316    0    0     0     0 20565 39521  1  1 98  0
 0  0      0 3720588  11300  43316    0    0     0     0 21451 41569  0  1 99  0
 0  0      0 3720492  11300  43316    0    0     0     0 21492 41441  0  1 98  0
 0  0      0 3720564  11300  43316    0    0     0     0 21475 41435  0  2 98  0
 1  0      0 3720500  11308  43320    0    0     0    16 21485 41675  1  1 97  1
 1  0      0 3720556  11308  43320    0    0     0     0 17940 33773  0  1 99  0
 0  0      0 3720588  11308  43320    0    0     0     0 9472 16384  0  1 100  0
 0  0      0 3720660  11308  43320    0    0     0     0 8769 15217  1  1 99  0
 0  0      0 3720484  11308  43320    0    0     0     0 10137 17766  0  1 99  0
 2  0      0 3720524  11316  43324    0    0     0    16 9760 16907  0  1 98  2
 0  0      0 3720492  11316  43324    0    0     0     0 9956 17138  0  1 99  0
 0  0      0 3720492  11316  43324    0    0     0     0 9927 17220  0  0 99  0
 1  0      0 3720524  11316  43324    0    0     0     0 10543 18502
0  0 100  0
 1  0      0 3720596  11316  43324    0    0     0     0 9600 16409  0  1 99  0
 0  0      0 3720564  11324  43324    0    0     0    60 9953 17339  0  1 98  1
 0  0      0 3720460  11324  43324    0    0     0     0 9503 16583  0  0 99  0
 0  0      0 3720684  11324  43324    0    0     0     0 9106 15637  0  1 99  0
 0  0      0 3720460  11324  43324    0    0     0     0 9800 17019  0  1 99  0
 0  0      0 3720492  11324  43324    0    0     0     0 9522 16559  0  0 99  0
 0  0      0 3720428  11332  43328    0    0     0    28 9789 17106  0  1 97  2
 0  0      0 3720460  11332  43328    0    0     0     0 10157 17717  0  1 99  0
 0  0      0 3720564  11332  43328    0    0     0     0 9882 16940  0  0 100  0
 0  0      0 3720492  11332  43328    0    0     0     0 9360 15991  0  0 100  0
 0  0      0 3720492  11332  43328    0    0     0     0 10096 17436  0  0 99  0
 0  0      0 3720556  11340  43328    0    0     0    28 9224 16028  0  1 97  2
 0  0      0 3720500  11340  43328    0    0     0     0 9154 15942  0  1 99  0
 1  0      0 3720524  11340  43328    0    0     0     0 10309 17781  0  1 99  0


> 4. an 'ethtool -s eth0' from the host under test
# ethtool -S eth0
NIC statistics:
     tx_packets: 2598554
     rx_packets: 35146015
     tx_errors: 0
     rx_errors: 0
     rx_missed: 27571
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 35145983
     broadcast: 24
     multicast: 8
     tx_aborted: 0
     tx_underrun: 0

> 5. /proc/interrupts from the host under test
            CPU0       CPU1       CPU2       CPU3
  0:         40          1          0          2   IO-APIC-edge      timer
  1:          1          0          0          1   IO-APIC-edge      i8042
  8:          0          0          1          0   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 16:        135        135        130        124   IO-APIC-fasteoi
sata_mv, uhci_hcd:usb4
 18:          0          0          0          0   IO-APIC-fasteoi
ehci_hcd:usb1, uhci_hcd:usb6, uhci_hcd:usb9
 19:          0          0          0          0   IO-APIC-fasteoi
uhci_hcd:usb8
 21:          0          0          0          0   IO-APIC-fasteoi
uhci_hcd:usb5
 23:          0          0          0          0   IO-APIC-fasteoi
ehci_hcd:usb3, uhci_hcd:usb7
 41:       1196       1164       1206       1197   PCI-MSI-edge      ahci
 42:          0          0          0          1   PCI-MSI-edge      xhci_hcd
 43:          0          0          0          0   PCI-MSI-edge      xhci_hcd
 44:          0          0          0          0   PCI-MSI-edge      xhci_hcd
 45:          0          0          0          0   PCI-MSI-edge      xhci_hcd
 46:          0          0          0          0   PCI-MSI-edge      xhci_hcd
 47:    2517359    2517695    2517327    2517689   PCI-MSI-edge      eth0
 48:          0          0          0          0   PCI-MSI-edge      i915
NMI:         64         64         63         61   Non-maskable interrupts
LOC:     189057     173286     183692     172647   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:         64         64         63         61   Performance
monitoring interrupts
IWI:          0          0          0          0   IRQ work interrupts
RES:         75         82        102         65   Rescheduling interrupts
CAL:         93        129        107        125   Function call interrupts
TLB:        105        111        238        322   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:          5          5          5          5   Machine check polls
ERR:          3
MIS:          0


> 6. lspci -tv
-[0000:00]-+-00.0  Intel Corporation Core Processor DRAM Controller
           +-01.0-[01]----00.0  Adaptec Serial ATA II RAID 1430SA
           +-02.0  Intel Corporation Core Processor Integrated
Graphics Controller
           +-16.0  Intel Corporation 5 Series/3400 Series Chipset HECI
Controller
           +-1a.0  Intel Corporation 5 Series/3400 Series Chipset USB
Universal Host Controller
           +-1a.1  Intel Corporation 5 Series/3400 Series Chipset USB
Universal Host Controller
           +-1a.2  Intel Corporation 5 Series/3400 Series Chipset USB
Universal Host Controller
           +-1a.7  Intel Corporation 5 Series/3400 Series Chipset USB2
Enhanced Host Controller
           +-1c.0-[02]--
           +-1c.3-[03]----00.0  NEC Corporation uPD720200 USB 3.0 Host
Controller
           +-1c.5-[04]----00.0  Realtek Semiconductor Co., Ltd.
RTL8111/8168B PCI Express Gigabit Ethernet controller
           +-1d.0  Intel Corporation 5 Series/3400 Series Chipset USB
Universal Host Controller
           +-1d.1  Intel Corporation 5 Series/3400 Series Chipset USB
Universal Host Controller
           +-1d.2  Intel Corporation 5 Series/3400 Series Chipset USB
Universal Host Controller
           +-1d.7  Intel Corporation 5 Series/3400 Series Chipset USB2
Enhanced Host Controller
           +-1e.0-[05]--
           +-1f.0  Intel Corporation 5 Series Chipset LPC Interface Controller
           +-1f.2  Intel Corporation 5 Series/3400 Series Chipset 6
port SATA AHCI Controller
           \-1f.3  Intel Corporation 5 Series/3400 Series Chipset
SMBus Controller

>
> Can you apply the two attached patches on top of the previous ones and
> give it a try ? The debug should not be too verbose if things are stationary
> enough.
2.6.38-rc7 with your 2 previous patch change the game. No reboot. No
strange message in dmesg.

But somes sended packet are lost from some host. Example:

thunder ~ 1 # ping  bitume
PING bitume.spn (192.168.242.4) 56(84) bytes of data.
64 bytes from bitume.spn (192.168.242.4): icmp_req=9 ttl=64 time=1005 ms
64 bytes from bitume.spn (192.168.242.4): icmp_req=10 ttl=64 time=0.538 ms
64 bytes from bitume.spn (192.168.242.4): icmp_req=12 ttl=64 time=0.817 ms
64 bytes from bitume.spn (192.168.242.4): icmp_req=15 ttl=64 time=0.639 ms
64 bytes from bitume.spn (192.168.242.4): icmp_req=20 ttl=64 time=5.32 ms
64 bytes from bitume.spn (192.168.242.4): icmp_req=22 ttl=64 time=0.866 ms
64 bytes from bitume.spn (192.168.242.4): icmp_req=26 ttl=64 time=0.607 ms

bitume ~ 0 # tcpdump -lni eth0 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
00:23:45.713828 IP 192.168.242.5 > 192.168.242.4: ICMP echo request,
id 19486, seq 9, length 64
00:23:46.718479 IP 192.168.242.4 > 192.168.242.5: ICMP echo reply, id
19486, seq 9, length 64
00:23:46.718482 IP 192.168.242.4 > 192.168.242.5: ICMP ip reassembly
time exceeded, length 556
00:23:46.720291 IP 192.168.242.5 > 192.168.242.4: ICMP echo request,
id 19486, seq 10, length 64
00:23:46.720297 IP 192.168.242.4 > 192.168.242.5: ICMP echo reply, id
19486, seq 10, length 64
00:23:48.141013 IP 192.168.242.4 > 192.168.242.5: ICMP ip reassembly
time exceeded, length 556
00:23:48.722329 IP 192.168.242.5 > 192.168.242.4: ICMP echo request,
id 19486, seq 12, length 64
00:23:48.722335 IP 192.168.242.4 > 192.168.242.5: ICMP echo reply, id
19486, seq 12, length 64
00:23:51.741686 IP 192.168.242.5 > 192.168.242.4: ICMP echo request,
id 19486, seq 15, length 64
00:23:51.741692 IP 192.168.242.4 > 192.168.242.5: ICMP echo reply, id
19486, seq 15, length 64
00:23:56.778367 IP 192.168.242.5 > 192.168.242.4: ICMP echo request,
id 19486, seq 20, length 64
00:23:56.778373 IP 192.168.242.4 > 192.168.242.5: ICMP echo reply, id
19486, seq 20, length 64
00:23:57.988239 IP 192.168.242.2 > 192.168.242.4: ICMP ip reassembly
time exceeded, length 556
00:23:57.988243 IP 192.168.242.2 > 192.168.242.4: ICMP ip reassembly
time exceeded, length 556
00:23:57.988327 IP 192.168.242.2 > 192.168.242.4: ICMP ip reassembly
time exceeded, length 556
00:23:58.775425 IP 192.168.242.5 > 192.168.242.4: ICMP echo request,
id 19486, seq 22, length 64
00:23:58.775431 IP 192.168.242.4 > 192.168.242.5: ICMP echo reply, id
19486, seq 22, length 64
^C
17 packets captured
82 packets received by filter
0 packets dropped by kernel

This is maybe normal under stress, card discard packet after all.

> Do you have a serial cable and a second computer at hand by chance (don't
> go for netconsole with the r8169 driver) ?
I've a serial cable and a second computer, but my first computer
doesn't have a com port. Is it then possible?

Do you need more test?
françois romieu - March 10, 2011, 12:08 p.m.
Seblu <seblu@seblu.net> :
[...]
> I catched the following trace during my previous torture session.
> Maybe it can help.

It's the usual r8169 TX timeout watchdog.

[...]
> > Can you apply the two attached patches on top of the previous ones and
> > give it a try ? The debug should not be too verbose if things are stationary
> > enough.
> 2.6.38-rc7 with your 2 previous patch change the game. No reboot. No
> strange message in dmesg.

?

"strange message" as :
[ ] netdev watchdog messages
[ ] 0001 0001 0001 0001 (or similar) message
[ ] net_ratelimit message

> But some sent packets are lost from some host. Example:
[...]
> This is maybe normal under stress, card discard packet after all.

It seems so. 0.08% packet loss. 10 ~ 20kpps (right ?). Sample at 0.1 Hz (ping).

[...]
> I've a serial cable and a second computer, but my first computer
> doesn't have a com port. Is it then possible?

Hardly. Forget it for now.

> Do you need more test?

1. 2.6.38-rc7 without the patches
2. 2.6.38-rc7 with the r8169.c driver of 2.6.38-rc5, without the patches
3. current setup + pktgen. Lower the packet size as long as it increases
   the sender's pps.

I do not understand why the bug would be gone if it was in the r8169
proper.

Patch

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 59ccf0c..712231f 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -4361,13 +4361,13 @@  static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	tp->cur_tx += frags + 1;
 
-	wmb();
-
 	RTL_W8(TxPoll, NPQ);	/* set polling bit */
 
+	mmiowb();
+
 	if (TX_BUFFS_AVAIL(tp) < MAX_SKB_FRAGS) {
 		netif_stop_queue(dev);
-		smp_rmb();
+		smp_mb();
 		if (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)
 			netif_wake_queue(dev);
 	}
@@ -4468,10 +4468,14 @@  static void rtl8169_tx_interrupt(struct net_device *dev,
 
 	if (tp->dirty_tx != dirty_tx) {
 		tp->dirty_tx = dirty_tx;
-		smp_wmb();
-		if (netif_queue_stopped(dev) &&
-		    (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)) {
-			netif_wake_queue(dev);
+		smp_mb();
+		if (unlikely(netif_queue_stopped(dev) &&
+		    (TX_BUFFS_AVAIL(tp) >= (NUM_TX_DESC / 4)))) {
+			netif_tx_lock(dev);
+			if (netif_queue_stopped(dev) &&
+			    (TX_BUFFS_AVAIL(tp) >= (NUM_TX_DESC / 4)))
+				netif_wake_queue(dev);
+			netif_tx_unlock(dev);
 		}
 		/*
 		 * 8168 hack: TxPoll requests are lost when the Tx packets are