Message ID | 20140615.193312.2155181077359902619.davem@davemloft.net |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
On Sun, Jun 15, 2014 at 07:33:12PM -0700, David Miller wrote:
> 1) Fix checksumming regressions, from Tom Herbert.
Something still not right for me here.
After about 5 minutes, I get an oops and then instant reboot/lock up.
I haven't managed to get a trace over usb-serial because it seems to
crash before it completes. Hand transcribed one looks like..
rbp: ffff880236403970 r08: 0000000000000000 r09: 0000000000000000
r10: 000000000000005a r11: 00000000000002d7 r12: ffff880233000d80
r13: ffff8800aa1a6fc2 r14: ffff880233001d40 r15: 00000000ffffac82
fs: 0 gs: ffff880236400000 knlGS: 0
CS: 10 DS: 0 ES: 0 CR0: 80050033
CR2: ffff8800aa1a8000 CR3: 1a0d000 CR4: 407f0
Stack:
ffff880236403988 ffffffff81298bbc 00000000000016c0 ffff8802364039e8
ffffffff814ca05a ffff880233001d40 000005a80000e397 ffff880233001680
0000000000000000 0d420685ffffac82 000000000000012a 000000000000004e
Call Trace:
<IRQ>
csum_partial
tcp_gso_segment
inet_gso_segment
? update_dl_migration
skb_mac_gso_segment
__skb_gso_segment
dev_hard_start_xmit
sch_direct_xmit
__dev_queue_xmit
? dev_hard_start_xmit
dev_queue_xmit
ip_finish_output
? ip_output
ip_output
ip_forward_finish
ip_forward
ip_rcv_finish
ip_rcv
__netif_receive_skb_core
? __netif_receive_skb_core
? trace_hardirqs_on
__netif_receive_skb
netif_receive_skb_internal
napi_gro_complete
? napi_gro_complete
dev_gro_receive
? dev_gro_receive
napi_gro_receive
rtl8169_poll
net_rx_action
__do_softirq
irq_exit
do_IRQ
common_interrupt
<EOI>
cpuidle_enter_state
cpuidle_enter
cpu_startup_entry
rest_init
? csum_partial_copy_generic
start_kernel
RIP: do_csum+0x83/0x180
Code: 41 89 d2 74 45 89 d1 45 31 c0 48 89 fa 0f 1f 00 48 03 02 48 13 42
08 48 13 42 10 48 13 42 20 48 13 42 28 48 13 42 30 <48> 13 42 38 4c 11
c0 48 83 c2 40 83 e9 01 75 d5 41 83 ea 01 49
All code
========
0: 41 89 d2 mov %edx,%r10d
3: 74 45 je 0x4a
5: 89 d1 mov %edx,%ecx
7: 45 31 c0 xor %r8d,%r8d
a: 48 89 fa mov %rdi,%rdx
d: 0f 1f 00 nopl (%rax)
10: 48 03 02 add (%rdx),%rax
13: 48 13 42 08 adc 0x8(%rdx),%rax
17: 48 13 42 10 adc 0x10(%rdx),%rax
1b: 48 13 42 20 adc 0x20(%rdx),%rax
1f: 48 13 42 28 adc 0x28(%rdx),%rax
23: 48 13 42 30 adc 0x30(%rdx),%rax
27:* 48 13 42 38 adc 0x38(%rdx),%rax <-- trapping instruction
2b: 4c 11 c0 adc %r8,%rax
2e: 48 83 c2 40 add $0x40,%rdx
32: 83 e9 01 sub $0x1,%ecx
35: 75 d5 jne 0xc
37: 41 83 ea 01 sub $0x1,%r10d
3b: 49 rex.WB
Typical, rdx and rax had scrolled off the screen.
I'll hobble some of the dump_stack output and see if I can get something useful.
It's a two nic (both 8169's) box doing ip-masq and firewall duties.
Dave
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
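Editorial note: the disassembly above shows a loop that adds eight 64-bit words per iteration with add/adc, advances the pointer by 0x40, and counts down in %ecx. A minimal C sketch of that style of summation follows. It is an illustration only, not the kernel's do_csum(); the helper names add64_carry, sum_blocks64 and fold16 are hypothetical, and the buffer is assumed to be a whole number of 64-byte blocks.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One's-complement style addition: add b to a and wrap the carry back in. */
static uint64_t add64_carry(uint64_t a, uint64_t b)
{
    uint64_t s = a + b;
    return s + (s < a);   /* s < a means the 64-bit add overflowed */
}

/* Sum nblocks blocks of 64 bytes each, eight 64-bit words per block. */
static uint64_t sum_blocks64(const unsigned char *buf, size_t nblocks)
{
    uint64_t sum = 0;

    while (nblocks--) {
        for (int i = 0; i < 8; i++) {
            uint64_t w;
            memcpy(&w, buf + 8 * i, sizeof(w));  /* alignment-safe load */
            sum = add64_carry(sum, w);
        }
        buf += 64;  /* corresponds to "add $0x40,%rdx" in the dump */
    }
    return sum;
}

/* Fold the 64-bit running sum down to 16 bits, carrying at each step. */
static uint16_t fold16(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

Every load in the loop touches memory at the running pointer, so a length that extends past the end of the mapped buffer faults as soon as the pointer crosses into an unmapped page.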
On Mon, Jun 16, 2014 at 07:04:50PM -0400, Dave Jones wrote:
> Something still not right for me here.
> After about 5 minutes, I get an oops and then instant reboot/lock up.
> [...]
> Typical, rdx and rax had scrolled off the screen.

After removing the dump_stack invocations, I noticed that the reason
this is rebooting is probably because right after the initial oops
we hit the WARN_ON at arch/x86/kernel/smp.c:124

	if (unlikely(cpu_is_offline(cpu))) {
		WARN_ON(1);
		return;
	}

lol.

Anyway, before all that nonsense, I now have the top of the oops..

BUG: unable to handle kernel paging request at ffff880218c18000
IP: do_csum+0x68
PGD: 2c6a067 PUD: 2c6d067 PMD 23fd1c067 PTE: 80000000218c18060
RAX: 2090539bbf7b28f2 RBX: 00000000acb23d4e RCX: 000000000000000b
RDX: ffff880218c18000 RSI: 0000000000001c62 RDI: ffff880218c16680

Maybe also notable here is that the kernel is built with DEBUG_PAGEALLOC on.

Dave
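Editorial note: the register values in the oops above can be cross-checked with a little arithmetic. The sketch below assumes that RDI holds the buffer start and RSI the length (the usual x86-64 argument registers), that RDX is the running pointer at the time of the fault, and that pages are 4 KiB. The struct and function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_BYTES  4096ULL  /* assumption: 4 KiB pages */
#define BLOCK_BYTES 64ULL    /* bytes consumed per loop iteration */

struct csum_progress {
    uint64_t end;           /* one past the last byte the sum would read */
    uint64_t bytes_done;    /* bytes read before the faulting address */
    uint64_t blocks_total;  /* whole 64-byte blocks in the requested length */
    uint64_t blocks_done;   /* whole blocks read before the fault */
    uint64_t blocks_left;   /* blocks still to go when the fault hit */
    int fault_at_page_start;
};

static struct csum_progress analyze(uint64_t start, uint64_t len,
                                    uint64_t fault)
{
    struct csum_progress p;

    p.end = start + len;
    p.bytes_done = fault - start;
    p.blocks_total = len / BLOCK_BYTES;
    p.blocks_done = p.bytes_done / BLOCK_BYTES;
    p.blocks_left = p.blocks_total - p.blocks_done;
    p.fault_at_page_start = (fault % PAGE_BYTES) == 0;
    return p;
}
```

Calling analyze(0xffff880218c16680, 0x1c62, 0xffff880218c18000) gives 113 blocks in total, 102 already read, and 11 left, which agrees with RCX = 0xb. The requested range would end at ffff880218c182e2, and the fault lands exactly on the first byte of the page at ffff880218c18000.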
On Mon, Jun 16, 2014 at 07:42:54PM -0400, Dave Jones wrote:
> [...]
> BUG: unable to handle kernel paging request at ffff880218c18000
> IP: do_csum+0x68
> PGD: 2c6a067 PUD: 2c6d067 PMD 23fd1c067 PTE: 80000000218c18060
> RAX: 2090539bbf7b28f2 RBX: 00000000acb23d4e RCX: 000000000000000b
> RDX: ffff880218c18000 RSI: 0000000000001c62 RDI: ffff880218c16680
>
> Maybe also notable here is that the kernel is built with DEBUG_PAGEALLOC on.

This is still a problem in -rc2.
Lasts about 5 minutes, then reboots.

Dave
Ping?

This is all related to the new checksumming code by Tom Herbert.

The oops seems to be "gso_make_checksum()" taking a checksum of
something that isn't mapped. Either the math for 'plen' is simply
wrong (maybe "csum_start" is not properly initialized), or maybe there
is a missing skb_pull() or similar, or the skb is fragmented and/or
needs kmapping.

It's not a NULL pointer dereference, the faulting address is
ffff8800aa1a8000, so it's some kind of invalid pointer arithmetic
found by DEBUG_PAGEALLOC.

The register information all looks reasonably sane (ie we have 11
64-byte blocks to go - so it looks like the length of the csum is
reasonable), and the starting address was clearly ok too, so this is
the copying just traversing into a page that isn't allocated. That
really smells like a skb with multiple fragments to me. Can that
happen for the GSO code?

Comparing the newly added "gso_make_checksum()" function with our
venerable "skb_checksum()", I get the feeling that there are a few
corners that have been cut.

Linus

On Mon, Jun 23, 2014 at 4:47 PM, Dave Jones <davej@redhat.com> wrote:
> [...]
> This is still a problem in -rc2.
> Lasts about 5 minutes, then reboots.
>
> Dave
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue, 24 Jun 2014 17:04:41 -0700

> Ping?

Tom please help look at this.

> This is all related to the new checksumming code by Tom Herbert.
> [...]
> The register information all looks reasonably sane (ie we have 11
> 64-byte blocks to go - so it looks like the length of the csum is
> reasonable), and the starting address was clearly ok too, so this is
> the copying just traversing into a page that isn't allocated. That
> really smells like a skb with multiple fragments to me. Can that
> happen for the GSO code?

This is the forwarding path and what's happening is:

1) r8169 is allocating linear packets for rx and passing those into
   the stack

2) those rx packets are being accumulated by the GRO layer into a GRO
   packet, likely the GRO skb has segments composed of the data areas
   of the second and subsequent accumulated rx frames

3) The gro packet passes through IP forwarding then back out for TX

4) The destination device doesn't support TSO, so the GSO layer
   starts segmenting it back into MTU sized frames

And this is where the csum crash is happening.

tcp_gso_segment() seems to call skb_segment before doing checksumming
stuff such as gso_make_checksum, so SKB_GSO_CB()->csum_start should be
initialized properly.

tcp_gso_segment() makes sure that the headers are reachable in the linear
area with the pskb_may_pull(skb, sizeof(*th)) call, and gso_make_checksum()
is only working with the area up to SKB_GSO_CB()->csum_start which should
be within this area for sure.

Well, that's the precondition we seem to be relying upon, I suppose an
assert is in order.
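Editorial note: the precondition described above (the checksummed range, from the transport header up to csum_start, must sit inside the linear area) can be modelled as a simple range check. This is a userspace sketch, not kernel code: struct pkt_model and csum_range_is_linear() are hypothetical stand-ins, and csum_start is assumed to be measured from the start of the buffer, headroom included.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the fields of interest. */
struct pkt_model {
    size_t headroom;          /* bytes between buffer start and packet data */
    size_t linear_len;        /* bytes of packet data in the linear area */
    size_t transport_offset;  /* transport header offset from packet data */
    size_t csum_start;        /* end of the range to checksum, from buffer start */
};

/* True when [transport header, csum_start) lies inside the linear area. */
static bool csum_range_is_linear(const struct pkt_model *p)
{
    size_t begin = p->headroom + p->transport_offset;
    size_t linear_end = p->headroom + p->linear_len;

    return p->csum_start >= begin && p->csum_start <= linear_end;
}
```

In this model a csum_start of 0 fails the check, as does one that points beyond the linear data, so a single assertion of this shape would catch either kind of bad value before the checksum is taken.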
> tcp_gso_segment() makes sure that the headers are reachable in the linear
> area with the pskb_may_pull(skb, sizeof(*th)) call, and gso_make_checksum()
> is only working with the area up to SKB_GSO_CB()->csum_start which should
> be within this area for sure.
>
Seems likely that csum_start is not properly initialized in this path.
I am thinking that this may have happened in GRO path on a checksum
error where CHECKSUM_PARTIAL (and hence csum) is not set. That would
explain the infrequency of the occurrence, and also previously not
setting csum would have just resulted in sending a corrupted packet
not a crash.

> Well, that's the precondition we seem to be relying upon, I suppose an
> assert is in order.

Assert on SKB_GSO_CB()->csum_start == 0 would confirm my suspicion.
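Editorial note: to see why an uninitialized csum_start could turn into a crash rather than just a bad checksum, here is a small model of the length arithmetic. The formula (csum_start minus headroom minus transport offset) is an assumption made for illustration, and model_plen() and as_unsigned_len() are hypothetical names, not kernel functions.

```c
#include <stdint.h>

/*
 * Assumed length formula for illustration: the number of bytes between
 * the transport header and csum_start, with csum_start measured from
 * the start of the buffer.
 */
static int model_plen(int csum_start, int headroom, int transport_offset)
{
    return csum_start - headroom - transport_offset;
}

/* What a negative length becomes once it is treated as an unsigned count. */
static uint32_t as_unsigned_len(int plen)
{
    return (uint32_t)plen;
}
```

With headroom 64 and transport offset 34, a csum_start of 118 yields a length of 20, while a csum_start of 0 yields -98, which reads as 4294967198 when interpreted as an unsigned byte count. The length seen in the oops (0x1c62) is not of that magnitude, so this only illustrates the general failure mode and is not a reconstruction of the reported crash.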
I believe that in the no scatter-gather case of skb_segment, csum_start
is not set correctly. Will post a patch momentarily.

On Tue, Jun 24, 2014 at 8:05 PM, Tom Herbert <therbert@google.com> wrote:
> Seems likely that csum_start is not properly initialized in this path.
> [...]
>> Well, that's the precondition we seem to be relying upon, I suppose an
>> assert is in order.
>
> Assert on SKB_GSO_CB()->csum_start == 0 would confirm my suspicion.