Message ID | 4A2D2906.6090002@gmail.com |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
Headers | show |
Eric Dumazet wrote: > Michael Tokarev a écrit : [] >>>> The situation is very simple: with an RTL8169 (probably >>>> onboard) GigE card which, by default, is configured to >>>> have MTU (maximal transmission unit) to be 1500 bytes, >>>> it's *trivial* to instantly crash the machine by sending >>>> it a *single* packet of size >1500 bytes (provided the >>>> network switch can handle jumbo frames). [] > OK, 2nd try then :) > diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c > index e94316b..9080b08 100644 > --- a/drivers/net/r8169.c > +++ b/drivers/net/r8169.c > @@ -3495,7 +3495,8 @@ static int rtl8169_rx_interrupt(struct net_device *dev, > * frames. They are seen as a symptom of over-mtu > * sized frames. > */ > - if (unlikely(rtl8169_fragmented_frame(status))) { > + if (unlikely(rtl8169_fragmented_frame(status) || > + (unsigned int)pkt_size > tp->rx_buf_sz)) { > dev->stats.rx_dropped++; > dev->stats.rx_length_errors++; > rtl8169_mark_to_asic(desc, tp->rx_buf_sz); This one behaves much better. There's no instant crash anymore, and the 'dropped' and 'frame' stats in ifconfig gets incremented with each ping. It fails down the line however. I wasn't able to reply to this email after doing the ping test with the above change (no more large packets were sent). With OOPSes like this one: general protection fault: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:05.0/drm/card0/dev CPU 0 Modules linked in: radeon drm r8169 powernow_k8 autofs4 nfsd nfs lockd nfs_acl auth_rpcgss sunrpc quota_v2 Pid: 10917, comm: icedove-bin Not tainted 2.6.29-x86-64 #2.6.29.4 System Product Name RIP: 0010:[<ffffffff8029889b>] [<ffffffff8029889b>] put_page+0x1b/0x170 RSP: 0018:ffff8800cd8fdb88 EFLAGS: 00210296 RAX: 0000000000000020 RBX: 6d6c6b6a69686766 RCX: 0000000000000760 RDX: ffff88011d9f1680 RSI: ffff88011d9f139b RDI: 6d6c6b6a69686766 RBP: ffff88011c936ac0 R08: 0000000000000001 R09: 0000000000000000 R10: ffffffff80552840 R11: 0000000000200293 R12: ffff88011d03e080 R13: 0000000000000030 R14: ffff88011d03e4bc R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffffffff80608000(0063) knlGS:00000000f220bb90 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 000000000820302c CR3: 0000000116c57000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process icedove-bin (pid: 10917, threadinfo ffff8800cd8fc000, task ffff8801158d8820) Stack: 0000000000000000 0000000000000001 ffff88011c936ac0 ffff88011d03e080 0000000000000030 ffffffff803dbc7f 0000000000000319 ffff88011c936ac0 0000000000000000 ffffffff803db911 ffff88011c936ac0 ffffffff80418a88 Call Trace: [<ffffffff803dbc7f>] ? skb_release_data+0xaf/0xe0 [<ffffffff803db911>] ? __kfree_skb+0x11/0xa0 [<ffffffff80418a88>] ? tcp_recvmsg+0x6d8/0x950 [<ffffffff8046f91e>] ? _spin_lock_irqsave+0x2e/0x40 [<ffffffff803d61b0>] ? sock_common_recvmsg+0x30/0x50 [<ffffffff803d4365>] ? sock_recvmsg+0xd5/0x110 [<ffffffff80244640>] ? default_wake_function+0x0/0x10 [<ffffffff802d5019>] ? file_update_time+0x59/0x140 [<ffffffff80261e90>] ? autoremove_wake_function+0x0/0x30 [<ffffffff8046fa25>] ? _spin_lock+0x5/0x10 [<ffffffff8026f109>] ? futex_wake+0x129/0x140 [<ffffffff803d3ab2>] ? sockfd_lookup_light+0x22/0x90 [<ffffffff803d56e9>] ? sys_recvfrom+0xe9/0x180 [<ffffffff80261e90>] ? autoremove_wake_function+0x0/0x30 [<ffffffff8046d8c5>] ? thread_return+0x3d/0x6d8 [<ffffffff803f6c86>] ? compat_sys_socketcall+0x136/0x1f0 [<ffffffff80238c47>] ? cstar_dispatch+0x7/0x4a Code: 2c fd ff ff eb db 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 28 48 89 5c 24 08 48 89 6c 24 10 48 89 fb 4c RIP [<ffffffff8029889b>] put_page+0x1b/0x170 RSP <ffff8800cd8fdb88> ---[ end trace c2d84c667e0d946d ]--- (it probably has nothing to do with radeon drm sysfs file (it is NOT the binary fglrx module by the way)). Looks like some memory corruption. And most probably it is in that error path in r8169 driver - it is the only new codepath which were executed here. The problem is quite repeatable - after sending a single large ping system starts behaving like the above at random. So we're on a right way it seems, but there's more than one issue here. By the way, is there anything else we can do here but drop the packet? Or is there any REASON to do something else? Thanks! /mjt -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Michael Tokarev a écrit : > Eric Dumazet wrote: >> Michael Tokarev a écrit : > [] >>>>> The situation is very simple: with an RTL8169 (probably >>>>> onboard) GigE card which, by default, is configured to >>>>> have MTU (maximal transmission unit) to be 1500 bytes, >>>>> it's *trivial* to instantly crash the machine by sending >>>>> it a *single* packet of size >1500 bytes (provided the >>>>> network switch can handle jumbo frames). > [] >> OK, 2nd try then :) > >> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c >> index e94316b..9080b08 100644 >> --- a/drivers/net/r8169.c >> +++ b/drivers/net/r8169.c >> @@ -3495,7 +3495,8 @@ static int rtl8169_rx_interrupt(struct >> net_device *dev, >> * frames. They are seen as a symptom of over-mtu >> * sized frames. >> */ >> - if (unlikely(rtl8169_fragmented_frame(status))) { >> + if (unlikely(rtl8169_fragmented_frame(status) || >> + (unsigned int)pkt_size > tp->rx_buf_sz)) { >> dev->stats.rx_dropped++; >> dev->stats.rx_length_errors++; >> rtl8169_mark_to_asic(desc, tp->rx_buf_sz); > > This one behaves much better. There's no instant crash anymore, and the > 'dropped' and 'frame' stats in ifconfig gets incremented with each ping. > > It fails down the line however. I wasn't able to reply to this email after > doing the ping test with the above change (no more large packets were > sent). > With OOPSes like this one: > > general protection fault: 0000 [#1] SMP > last sysfs file: > /sys/devices/pci0000:00/0000:00:01.0/0000:01:05.0/drm/card0/dev > CPU 0 > Modules linked in: radeon drm r8169 powernow_k8 autofs4 nfsd nfs lockd > nfs_acl auth_rpcgss sunrpc quota_v2 > Pid: 10917, comm: icedove-bin Not tainted 2.6.29-x86-64 #2.6.29.4 > System Product Name > RIP: 0010:[<ffffffff8029889b>] [<ffffffff8029889b>] put_page+0x1b/0x170 > RSP: 0018:ffff8800cd8fdb88 EFLAGS: 00210296 > RAX: 0000000000000020 RBX: 6d6c6b6a69686766 RCX: 0000000000000760 > RDX: ffff88011d9f1680 RSI: ffff88011d9f139b RDI: 6d6c6b6a69686766 > RBP: ffff88011c936ac0 R08: 0000000000000001 R09: 0000000000000000 > R10: ffffffff80552840 R11: 0000000000200293 R12: ffff88011d03e080 > R13: 0000000000000030 R14: ffff88011d03e4bc R15: 0000000000000000 > FS: 0000000000000000(0000) GS:ffffffff80608000(0063) > knlGS:00000000f220bb90 > CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 > CR2: 000000000820302c CR3: 0000000116c57000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process icedove-bin (pid: 10917, threadinfo ffff8800cd8fc000, task > ffff8801158d8820) > Stack: > 0000000000000000 0000000000000001 ffff88011c936ac0 ffff88011d03e080 > 0000000000000030 ffffffff803dbc7f 0000000000000319 ffff88011c936ac0 > 0000000000000000 ffffffff803db911 ffff88011c936ac0 ffffffff80418a88 > Call Trace: > [<ffffffff803dbc7f>] ? skb_release_data+0xaf/0xe0 > [<ffffffff803db911>] ? __kfree_skb+0x11/0xa0 > [<ffffffff80418a88>] ? tcp_recvmsg+0x6d8/0x950 > [<ffffffff8046f91e>] ? _spin_lock_irqsave+0x2e/0x40 > [<ffffffff803d61b0>] ? sock_common_recvmsg+0x30/0x50 > [<ffffffff803d4365>] ? sock_recvmsg+0xd5/0x110 > [<ffffffff80244640>] ? default_wake_function+0x0/0x10 > [<ffffffff802d5019>] ? file_update_time+0x59/0x140 > [<ffffffff80261e90>] ? autoremove_wake_function+0x0/0x30 > [<ffffffff8046fa25>] ? _spin_lock+0x5/0x10 > [<ffffffff8026f109>] ? futex_wake+0x129/0x140 > [<ffffffff803d3ab2>] ? sockfd_lookup_light+0x22/0x90 > [<ffffffff803d56e9>] ? sys_recvfrom+0xe9/0x180 > [<ffffffff80261e90>] ? autoremove_wake_function+0x0/0x30 > [<ffffffff8046d8c5>] ? thread_return+0x3d/0x6d8 > [<ffffffff803f6c86>] ? compat_sys_socketcall+0x136/0x1f0 > [<ffffffff80238c47>] ? cstar_dispatch+0x7/0x4a > Code: 2c fd ff ff eb db 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 28 48 89 > 5c 24 08 48 89 6c 24 10 48 89 fb 4c > RIP [<ffffffff8029889b>] put_page+0x1b/0x170 > RSP <ffff8800cd8fdb88> > ---[ end trace c2d84c667e0d946d ]--- > > (it probably has nothing to do with radeon drm sysfs file > (it is NOT the binary fglrx module by the way)). > > Looks like some memory corruption. And most probably it is in > that error path in r8169 driver - it is the only new codepath > which were executed here. The problem is quite repeatable - > after sending a single large ping system starts behaving like > the above at random. > > So we're on a right way it seems, but there's more than one > issue here. > > By the way, is there anything else we can do here but drop the > packet? Or is there any REASON to do something else? > Hmm... this code path is not new, I believe your adapter is buggy, because it is overwriting part of memory it should not touch at all. When this driver queues a skb in rx queue, it tells NIC the max size of the skb, and apparently NIC happily delivers packets with larger sizes, so probably DMA wrote data past end of skb data. Try to change static void rtl_set_rx_max_size(void __iomem *ioaddr) RTL_W16(RxMaxSize, 16383); to -> RTL_W16(RxMaxSize, RX_BUF_SIZE); (But it will probably break jumbo frames rx as well) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet wrote: > Michael Tokarev a écrit : >> Eric Dumazet wrote: >>> Michael Tokarev a écrit : >> [] >>>>>> The situation is very simple: with an RTL8169 (probably >>>>>> onboard) GigE card which, by default, is configured to >>>>>> have MTU (maximal transmission unit) to be 1500 bytes, >>>>>> it's *trivial* to instantly crash the machine by sending >>>>>> it a *single* packet of size >1500 bytes (provided the >>>>>> network switch can handle jumbo frames). >> [] >>> OK, 2nd try then :) >>> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c >>> index e94316b..9080b08 100644 >>> --- a/drivers/net/r8169.c >>> +++ b/drivers/net/r8169.c >>> @@ -3495,7 +3495,8 @@ static int rtl8169_rx_interrupt(struct >>> net_device *dev, >>> * frames. They are seen as a symptom of over-mtu >>> * sized frames. >>> */ >>> - if (unlikely(rtl8169_fragmented_frame(status))) { >>> + if (unlikely(rtl8169_fragmented_frame(status) || >>> + (unsigned int)pkt_size > tp->rx_buf_sz)) { >>> dev->stats.rx_dropped++; >>> dev->stats.rx_length_errors++; >>> rtl8169_mark_to_asic(desc, tp->rx_buf_sz); >> This one behaves much better. There's no instant crash anymore, and the >> 'dropped' and 'frame' stats in ifconfig gets incremented with each ping. >> >> It fails down the line however. I wasn't able to reply to this email after >> doing the ping test with the above change (no more large packets were >> sent). >> With OOPSes like this one: >> >> general protection fault: 0000 [#1] SMP [] >> [<ffffffff803dbc7f>] ? skb_release_data+0xaf/0xe0 >> [<ffffffff803db911>] ? __kfree_skb+0x11/0xa0 >> [<ffffffff80418a88>] ? tcp_recvmsg+0x6d8/0x950 [] >> Looks like some memory corruption. And most probably it is in >> that error path in r8169 driver - it is the only new codepath >> which were executed here. The problem is quite repeatable - >> after sending a single large ping system starts behaving like >> the above at random. [] > Hmm... this code path is not new, I believe your adapter is buggy, because it > is overwriting part of memory it should not touch at all. > > When this driver queues a skb in rx queue, it tells NIC the max size of the skb, > and apparently NIC happily delivers packets with larger sizes, so probably DMA > wrote data past end of skb data. That's a very likely situation. > Try to change > > static void rtl_set_rx_max_size(void __iomem *ioaddr) > RTL_W16(RxMaxSize, 16383); > > to -> > > RTL_W16(RxMaxSize, RX_BUF_SIZE); > > (But it will probably break jumbo frames rx as well) (RX_BUF_SIZE is defined as 1536). Aha, so it should set some flags instead (as were tested in your first try), for packets larger than that. Makes sense. But if we told the NIC that we can receive 16K buffers, and it delivered 3K packet to us, we've got some memory corruption... I.e., the problem is that we told the driver that we can handle 16k buffers but actually we had only 1500, no? Lemme check this all... Setting RxMaxSize to RX_BUF_SIZE indeed solved the problem, -- I don't see random corruptions like the last one above. But after setting RxMaxSize to 2500, I can trigger your 2nd check/patch condition (for pkt_size > tp->rx_buf_sz) for packets <2500 in size, and your *first* check/patch condition (RxRES | RxRWT | RxRUNT | RxCRC | RxFOVF) for packets >2500 in size. So to me (who has no knowledge about hardware at all), it looks like the card behaves quite correctly. Also note that I've seen this behavior on several different machines. Here @home where I'm doing this all testing I've Asus M3A78-EM motherboard (AMD780), and the second one is Gigabyte GA-MA74GM-S2H (AMD740) - both behaves very similarly. Both are AMD7xx, but I've seen the same problem on Intel-based machines too. I'll try out some more tests later today. (And there's another issue with these NICs -- the famous, quite frequent under load "NETDEV WATCHDOG: eth0 (r8169): transmit timed out" errors, which are quite annoying... Also shown by both the above-mentioned mobos and by other machines). Thanks! /mjt -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Michael Tokarev <mjt@tls.msk.ru> : [...] > I'll try out some more tests later today. (And there's another > issue with these NICs -- the famous, quite frequent under load > "NETDEV WATCHDOG: eth0 (r8169): transmit timed out" errors, which > are quite annoying... Also shown by both the above-mentioned mobos > and by other machines). This one should be fixed by commit f11a377b3f4e897d11f0e8d1fc688667e2f19708 in mainline.
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c index e94316b..9080b08 100644 --- a/drivers/net/r8169.c +++ b/drivers/net/r8169.c @@ -3495,7 +3495,8 @@ static int rtl8169_rx_interrupt(struct net_device *dev, * frames. They are seen as a symptom of over-mtu * sized frames. */ - if (unlikely(rtl8169_fragmented_frame(status))) { + if (unlikely(rtl8169_fragmented_frame(status) || + (unsigned int)pkt_size > tp->rx_buf_sz)) { dev->stats.rx_dropped++; dev->stats.rx_length_errors++; rtl8169_mark_to_asic(desc, tp->rx_buf_sz);