Message ID | 50FF4BC9.1060206@candelatech.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Tue, 2013-01-22 at 18:32 -0800, Ben Greear wrote: > diff --git a/net/core/dst.c b/net/core/dst.c > index ee6153e..234b168 100644 > --- a/net/core/dst.c > +++ b/net/core/dst.c > @@ -245,6 +245,7 @@ again: > dst->ops->destroy(dst); > if (dst->dev) > dev_put(dst->dev); > + dst->input = dst->output = 0xdeadbeef; > kmem_cache_free(dst->ops->kmem_cachep, dst); Great ! You could comment the kmem_cache_free() as well to get better chances to hit the bug, and maybe start a bisection to find the faulty commit ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/22/2013 10:11 PM, Eric Dumazet wrote: > On Tue, 2013-01-22 at 18:32 -0800, Ben Greear wrote: > >> diff --git a/net/core/dst.c b/net/core/dst.c >> index ee6153e..234b168 100644 >> --- a/net/core/dst.c >> +++ b/net/core/dst.c >> @@ -245,6 +245,7 @@ again: >> dst->ops->destroy(dst); >> if (dst->dev) >> dev_put(dst->dev); >> + dst->input = dst->output = 0xdeadbeef; >> kmem_cache_free(dst->ops->kmem_cachep, dst); > > Great ! > > You could comment the kmem_cache_free() as well to get better chances to > hit the bug, and maybe start a bisection to find the faulty commit ? I suspect the bug goes back at least as far as 3.3. And since I need the NFS patches for this test case, bisecting will be pure hell. I'll work on some more code instrumentation tomorrow. One thing that came to mind while I was looking at the code today: How are the non-ref-counted dst objects used safely? Any chance that tearing down the IP protocol on a device (or deleting a device) could delete a dst that is referenced by an skb (and thus crashes as I see)? Thanks, Ben
On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote: > How are the non-ref-counted dst objects used safely? Any chance > that tearing down the IP protocol on a device (or deleting a device) > could delete a dst that is referenced by an skb (and thus crashes as > I see)? There is probably a bug. Normally it should be RCU protected. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote: > I suspect the bug goes back at least as far as 3.3. And since > I need the NFS patches for this test case, bisecting will be pure hell. Make sure the bug is still present in net-next. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/23/2013 05:35 AM, Eric Dumazet wrote: > On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote: > >> How are the non-ref-counted dst objects used safely? Any chance >> that tearing down the IP protocol on a device (or deleting a device) >> could delete a dst that is referenced by an skb (and thus crashes as >> I see)? > > There is probably a bug. Normally it should be RCU protected. I'm building net-next, and will do some testing on it later today if all goes well. In the meantime, a few questions about RCU. If the non-ref-counted dst is protected by RCU, does that mean that from the time we acquire the pointer (ie, set it in the skb), to the time we clear the pointer from the skb, we must be under RCU read lock? If so, that means that ip_rcv_finish must be called under RCU read lock, etc? Btw, this seems non related to NFS, so I'm dropping them from CC after this email unless I hear a request otherwise. Thanks, Ben
On Wed, 2013-01-23 at 10:15 -0800, Ben Greear wrote: > On 01/23/2013 05:35 AM, Eric Dumazet wrote: > > On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote: > > > >> How are the non-ref-counted dst objects used safely? Any chance > >> that tearing down the IP protocol on a device (or deleting a device) > >> could delete a dst that is referenced by an skb (and thus crashes as > >> I see)? > > > > There is probably a bug. Normally it should be RCU protected. > > I'm building net-next, and will do some testing on it later today > if all goes well. > > In the meantime, a few questions about RCU. > > If the non-ref-counted dst is protected by RCU, does that mean > that from the time we acquire the pointer (ie, set it in the skb), > to the time we clear the pointer from the skb, we must be under > RCU read lock? > > If so, that means that ip_rcv_finish must be called under RCU > read lock, etc? Obviously yes. Check skb_dst_force() for the rare cases we want to escape the RCU section. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/23/2013 06:42 AM, Eric Dumazet wrote: > On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote: > >> I suspect the bug goes back at least as far as 3.3. And since >> I need the NFS patches for this test case, bisecting will be pure hell. > > Make sure the bug is still present in net-next. > My test case isn't running well on net-next. Seems most of the file-io processes are hung trying to open a file: [root@lf1011-12060006 lanforge]# cat /proc/10612/stack [<ffffffff8115fadb>] do_last+0x1db/0xa4e [<ffffffff8116090b>] path_openat+0xcb/0x363 [<ffffffff81160ca8>] do_filp_open+0x38/0x84 [<ffffffff81153bfc>] do_sys_open+0x6d/0xff [<ffffffff81153cbb>] sys_open+0x1c/0x1e [<ffffffff8154f629>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff I'm going back to 3.7.3 and will try to learn some more about what it's doing there. Thanks, Ben
On 01/22/2013 06:32 PM, Ben Greear wrote: So, I'm slowly making some progress. I've verified that the skb has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish method. I'm trying to track it backwards and figure out which device it belongs to, etc....takes a while to reproduce though. One thing about this stack trace below...the dev_seq_stop() does a rcu read-unlock. Now, I can't figure out exactly how ip_rcv() can cause dev_seq_stop() to run, but if this stack is legit, then maybe by the time we enter the ip_rcv_finish() code we are running without rcu_readlock() held? If so, that would probably explain the bug. > Call Trace: > [<ffffffff814a8b02>] ? ip_rcv_finish+0x2f0/0x308 > [<ffffffff814a8812>] ? skb_dst+0x5a/0x5a > [<ffffffff814a8eb5>] NF_HOOK.clone.1+0x4c/0x54 > [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb > [<ffffffff814a9142>] ip_rcv+0x237/0x269 > [<ffffffff81473def>] __netif_receive_skb+0x487/0x530 > [<ffffffff81473f91>] process_backlog+0xf9/0x1da > [<ffffffff8147639a>] net_rx_action+0xad/0x218 > [<ffffffff8108d50a>] __do_softirq+0x9c/0x161 > [<ffffffff8108d5f2>] run_ksoftirqd+0x23/0x42 > [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259 > [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11 > [<ffffffff810a0a6d>] kthread+0xc2/0xca > [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56 > [<ffffffff81537b7c>] ret_from_fork+0x7c/0xb0 > [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56 ## This is from a slightly different kernel image...but probably this part is legit. 0xffffffff814a92b3 is in ip_rcv (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:466). 461 /* Our transport medium may have padded the buffer out. Now we know it 462 * is IP we can trim to the true length of the frame. 463 * Note this now means skb->len holds ntohs(iph->tot_len). 464 */ 465 if (pskb_trim_rcsum(skb, len)) { 466 IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS); 467 goto drop; 468 } 469 470 /* Remove any debris in the socket control block */
On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote: > On 01/22/2013 06:32 PM, Ben Greear wrote: > > So, I'm slowly making some progress. I've verified that the skb > has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish > method. I'm trying to track it backwards and figure out which > device it belongs to, etc....takes a while to reproduce though. > > One thing about this stack trace below...the dev_seq_stop() does > a rcu read-unlock. Now, I can't figure out exactly how ip_rcv() > can cause dev_seq_stop() to run, but if this stack is legit, > then maybe by the time we enter the ip_rcv_finish() code we are > running without rcu_readlock() held? > > If so, that would probably explain the bug. > The whole thing is run under rcu_read_lock() done in __netif_receive_skb() My suspicion was that we called netif_rx() from macvlan leaving a not refcounted skb dst. But the patch I sent to you didnt solve the bug, so its something else. You could trace at which point the dst was released. (where you set dst->input/output to deadbeef) > > Call Trace: > > [<ffffffff814a8b02>] ? ip_rcv_finish+0x2f0/0x308 > > [<ffffffff814a8812>] ? skb_dst+0x5a/0x5a > > [<ffffffff814a8eb5>] NF_HOOK.clone.1+0x4c/0x54 > > [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb > > [<ffffffff814a9142>] ip_rcv+0x237/0x269 > > [<ffffffff81473def>] __netif_receive_skb+0x487/0x530 > > [<ffffffff81473f91>] process_backlog+0xf9/0x1da > > [<ffffffff8147639a>] net_rx_action+0xad/0x218 > > [<ffffffff8108d50a>] __do_softirq+0x9c/0x161 > > [<ffffffff8108d5f2>] run_ksoftirqd+0x23/0x42 > > [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259 > > [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11 > > [<ffffffff810a0a6d>] kthread+0xc2/0xca > > [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56 > > [<ffffffff81537b7c>] ret_from_fork+0x7c/0xb0 > > [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56 > > > ## This is from a slightly different kernel image...but probably this part is legit. > > 0xffffffff814a92b3 is in ip_rcv (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:466). > 461 /* Our transport medium may have padded the buffer out. Now we know it > 462 * is IP we can trim to the true length of the frame. > 463 * Note this now means skb->len holds ntohs(iph->tot_len). > 464 */ > 465 if (pskb_trim_rcsum(skb, len)) { > 466 IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS); > 467 goto drop; > 468 } > 469 > 470 /* Remove any debris in the socket control block */ > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/23/2013 04:01 PM, Eric Dumazet wrote: > On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote: >> On 01/22/2013 06:32 PM, Ben Greear wrote: >> >> So, I'm slowly making some progress. I've verified that the skb >> has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish >> method. I'm trying to track it backwards and figure out which >> device it belongs to, etc....takes a while to reproduce though. >> >> One thing about this stack trace below...the dev_seq_stop() does >> a rcu read-unlock. Now, I can't figure out exactly how ip_rcv() >> can cause dev_seq_stop() to run, but if this stack is legit, >> then maybe by the time we enter the ip_rcv_finish() code we are >> running without rcu_readlock() held? >> >> If so, that would probably explain the bug. >> > > The whole thing is run under rcu_read_lock() done in > __netif_receive_skb() I was worried that the dev_seq_stop might be called incorrectly causing an asymetric unlock. I have no idea how that might happened, but several crashes have that dev_seq_stop method listed, so it got me suspicious. > > My suspicion was that we called netif_rx() from macvlan leaving a > not refcounted skb dst. > > But the patch I sent to you didnt solve the bug, so its something else. > > You could trace at which point the dst was released. (where you set > dst->input/output to deadbeef) My current code is in some garbage collector timer code, but I can work on saving the call-site that first pokes the dst into the garbage collection list... Thanks, Ben
On Wed, 2013-01-23 at 16:13 -0800, Ben Greear wrote: > On 01/23/2013 04:01 PM, Eric Dumazet wrote: > > On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote: > >> On 01/22/2013 06:32 PM, Ben Greear wrote: > >> > >> So, I'm slowly making some progress. I've verified that the skb > >> has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish > >> method. I'm trying to track it backwards and figure out which > >> device it belongs to, etc....takes a while to reproduce though. > >> > >> One thing about this stack trace below...the dev_seq_stop() does > >> a rcu read-unlock. Now, I can't figure out exactly how ip_rcv() > >> can cause dev_seq_stop() to run, but if this stack is legit, > >> then maybe by the time we enter the ip_rcv_finish() code we are > >> running without rcu_readlock() held? > >> > >> If so, that would probably explain the bug. > >> > > > > The whole thing is run under rcu_read_lock() done in > > __netif_receive_skb() > > I was worried that the dev_seq_stop might be called > incorrectly causing an asymetric unlock. I have no > idea how that might happened, but several crashes > have that dev_seq_stop method listed, so it got me suspicious. dev_seq_stop() is some word in the kernel stack, result of a prior system call. Stack is not cleanup. Each function reserves an amount of stack but not always write on all reserved space (some automatic variables might be not set) Note the "? " before the name : linux printed the symbol but this was not a call site for this particular call graph. Its only an extra indication, that can be useful sometimes. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/23/2013 04:23 PM, Eric Dumazet wrote: > On Wed, 2013-01-23 at 16:13 -0800, Ben Greear wrote: >> On 01/23/2013 04:01 PM, Eric Dumazet wrote: >> I was worried that the dev_seq_stop might be called >> incorrectly causing an asymetric unlock. I have no >> idea how that might happened, but several crashes >> have that dev_seq_stop method listed, so it got me suspicious. > > dev_seq_stop() is some word in the kernel stack, result of a prior > system call. Stack is not cleanup. > > Each function reserves an amount of stack but not always write on all > reserved space (some automatic variables might be not set) > > Note the "? " before the name : linux printed the symbol but this was > not a call site for this particular call graph. Its only an extra > indication, that can be useful sometimes. Ahh, thanks for that info...I'd never quite pieced that together before. Here's another crash. Interestingly, the dst is bad before the rcu-read-lock() (the bug is from the first of the 'deadbeef' debugging code below) Perhaps other useful info: The skb->dev claims to be 'lo'. The dst 'pointer' in the skb has 0x1 set, so it is the 'noref' variant. static int __netif_receive_skb(struct sk_buff *skb) { struct packet_type *ptype, *pt_prev; rx_handler_func_t *rx_handler; struct net_device *orig_dev; struct net_device *null_or_dev; bool deliver_exact = false; int ret = NET_RX_DROP; __be16 type; unsigned long pflags = current->flags; net_timestamp_check(!netdev_tstamp_prequeue, skb); trace_netif_receive_skb(skb); /* * PFMEMALLOC skbs are special, they should * - be delivered to SOCK_MEMALLOC sockets only * - stay away from userspace * - have bounded memory usage * * Use PF_MEMALLOC as this saves us from propagating the allocation * context down to all allocation sites. */ if (sk_memalloc_socks() && skb_pfmemalloc(skb)) current->flags |= PF_MEMALLOC; /* if we've gotten here through NAPI, check netpoll */ if (netpoll_receive_skb(skb)) goto out; orig_dev = skb->dev; skb_reset_network_header(skb); skb_reset_transport_header(skb); skb_reset_mac_len(skb); pt_prev = NULL; if (skb_dst(skb)) { if (skb_dst(skb)->input == 0xdeadbeef) { printk("bad dst: %lu, skb->dev: %s len: %i\n", skb->_skb_refdst, skb->dev->name, skb->len); BUG_ON(1); } } rcu_read_lock(); if (skb_dst(skb)) { if (skb_dst(skb)->input == 0xdeadbeef) { printk("bad dst: %lu, skb->dev: %s len: %i\n", skb->_skb_refdst, skb->dev->name, skb->len); BUG_ON(1); } } another_round: skb->skb_iif = skb->dev->ifindex; __this_cpu_inc(softnet_data.processed); ... [root@lf1011-12060006 ~]# bad dst: 18446612148864241601, skb->dev: lo len: 3232 ------------[ cut here ]------------ kernel BUG at /home/greearb/git/linux-3.7.dev.y/net/core/dev.c:3266! invalid opcode: 0000 [#1] PREEMPT SMP Modules linked in: macvlan pktgen lockd sunrpc uinput coretemp hwmon kvm_intel kvm microcode iTCO_wdt iTe CPU 4 Pid: 35, comm: ksoftirqd/4 Tainted: G C O 3.7.3+ #50 Iron Systems Inc. EE2610R/X8ST3 RIP: 0010:[<ffffffff81473d22>] [<ffffffff81473d22>] __netif_receive_skb+0x101/0x5b8 RSP: 0018:ffff88040d711c58 EFLAGS: 00010296 RAX: 0000000000000036 RBX: ffff88041fc93e80 RCX: 000000000000a6a5 RDX: ffffffff810883a6 RSI: 00000000000005fc RDI: 0000000000000246 RBP: ffff88040d711cb8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000004 R11: 0000000000000000 R12: ffff88041fc93fd0 R13: 0000000000000040 R14: ffff88040d3f8000 R15: ffff88041fc93f80 FS: 0000000000000000(0000) GS:ffff88041fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000262c118 CR3: 00000003da651000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process ksoftirqd/4 (pid: 35, threadinfo ffff88040d710000, task ffff88040d701f50) Stack: 0000000000000046 0420804000000100 ffffffff81aaf0a0 ffff8803da901200 000000000d711cb8 ffffffff81aaf0a0 ffff8803ffa90428 ffff88041fc93e80 ffff88041fc93fd0 0000000000000040 0000000000000024 ffff88041fc93f80 Call Trace: [<ffffffff814742d2>] process_backlog+0xf9/0x1da [<ffffffff814766db>] net_rx_action+0xad/0x218 [<ffffffff8108d50a>] __do_softirq+0x9c/0x161 [<ffffffff8108d5f2>] run_ksoftirqd+0x23/0x42 [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259 [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11 [<ffffffff810a0a6d>] kthread+0xc2/0xca [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56 [<ffffffff81537dbc>] ret_from_fork+0x7c/0xb0 [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56 Code: fc ff ff ba ef be ad de 48 39 50 50 75 21 48 8b 45 b8 48 c7 c7 50 ea 82 81 8b 48 68 48 8b 50 20 48 RIP [<ffffffff81473d22>] __netif_receive_skb+0x101/0x5b8 RSP <ffff88040d711c58> ---[ end trace e5f94dc78f5e5277 ]--- Kernel panic - not syncing: Fatal exception in interrupt > >
On Wed, 2013-01-23 at 16:38 -0800, Ben Greear wrote: > On 01/23/2013 04:23 PM, Eric Dumazet wrote: > > On Wed, 2013-01-23 at 16:13 -0800, Ben Greear wrote: > >> On 01/23/2013 04:01 PM, Eric Dumazet wrote: > > >> I was worried that the dev_seq_stop might be called > >> incorrectly causing an asymetric unlock. I have no > >> idea how that might happened, but several crashes > >> have that dev_seq_stop method listed, so it got me suspicious. > > > > dev_seq_stop() is some word in the kernel stack, result of a prior > > system call. Stack is not cleanup. > > > > Each function reserves an amount of stack but not always write on all > > reserved space (some automatic variables might be not set) > > > > Note the "? " before the name : linux printed the symbol but this was > > not a call site for this particular call graph. Its only an extra > > indication, that can be useful sometimes. > > Ahh, thanks for that info...I'd never quite pieced that together > before. > > Here's another crash. Interestingly, the dst is bad before the rcu-read-lock() > (the bug is from the first of the 'deadbeef' debugging code below) > > Perhaps other useful info: The skb->dev claims to be 'lo'. The dst 'pointer' > in the skb has 0x1 set, so it is the 'noref' variant. > > > static int __netif_receive_skb(struct sk_buff *skb) > { > struct packet_type *ptype, *pt_prev; > rx_handler_func_t *rx_handler; > struct net_device *orig_dev; > struct net_device *null_or_dev; > bool deliver_exact = false; > int ret = NET_RX_DROP; > __be16 type; > unsigned long pflags = current->flags; > > net_timestamp_check(!netdev_tstamp_prequeue, skb); > > trace_netif_receive_skb(skb); > > /* > * PFMEMALLOC skbs are special, they should > * - be delivered to SOCK_MEMALLOC sockets only > * - stay away from userspace > * - have bounded memory usage > * > * Use PF_MEMALLOC as this saves us from propagating the allocation > * context down to all allocation sites. > */ > if (sk_memalloc_socks() && skb_pfmemalloc(skb)) > current->flags |= PF_MEMALLOC; > > /* if we've gotten here through NAPI, check netpoll */ > if (netpoll_receive_skb(skb)) > goto out; > > orig_dev = skb->dev; > > skb_reset_network_header(skb); > skb_reset_transport_header(skb); > skb_reset_mac_len(skb); > > pt_prev = NULL; > > if (skb_dst(skb)) { > if (skb_dst(skb)->input == 0xdeadbeef) { > printk("bad dst: %lu, skb->dev: %s len: %i\n", > skb->_skb_refdst, skb->dev->name, skb->len); > BUG_ON(1); > } > } > You should add your debuging code in netif_rx() so that we know the caller by the way you could only add BUG_ON(skb->_skb_refdst & SKB_DST_NOREF) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/23/2013 04:45 PM, Eric Dumazet wrote: > On Wed, 2013-01-23 at 16:38 -0800, Ben Greear wrote: >> On 01/23/2013 04:23 PM, Eric Dumazet wrote: >>> On Wed, 2013-01-23 at 16:13 -0800, Ben Greear wrote: >>>> On 01/23/2013 04:01 PM, Eric Dumazet wrote: >> >>>> I was worried that the dev_seq_stop might be called >>>> incorrectly causing an asymetric unlock. I have no >>>> idea how that might happened, but several crashes >>>> have that dev_seq_stop method listed, so it got me suspicious. >>> >>> dev_seq_stop() is some word in the kernel stack, result of a prior >>> system call. Stack is not cleanup. >>> >>> Each function reserves an amount of stack but not always write on all >>> reserved space (some automatic variables might be not set) >>> >>> Note the "? " before the name : linux printed the symbol but this was >>> not a call site for this particular call graph. Its only an extra >>> indication, that can be useful sometimes. >> >> Ahh, thanks for that info...I'd never quite pieced that together >> before. >> >> Here's another crash. Interestingly, the dst is bad before the rcu-read-lock() >> (the bug is from the first of the 'deadbeef' debugging code below) >> >> Perhaps other useful info: The skb->dev claims to be 'lo'. The dst 'pointer' >> in the skb has 0x1 set, so it is the 'noref' variant. >> >> >> static int __netif_receive_skb(struct sk_buff *skb) >> { >> struct packet_type *ptype, *pt_prev; >> rx_handler_func_t *rx_handler; >> struct net_device *orig_dev; >> struct net_device *null_or_dev; >> bool deliver_exact = false; >> int ret = NET_RX_DROP; >> __be16 type; >> unsigned long pflags = current->flags; >> >> net_timestamp_check(!netdev_tstamp_prequeue, skb); >> >> trace_netif_receive_skb(skb); >> >> /* >> * PFMEMALLOC skbs are special, they should >> * - be delivered to SOCK_MEMALLOC sockets only >> * - stay away from userspace >> * - have bounded memory usage >> * >> * Use PF_MEMALLOC as this saves us from propagating the allocation >> * context down to all allocation sites. >> */ >> if (sk_memalloc_socks() && skb_pfmemalloc(skb)) >> current->flags |= PF_MEMALLOC; >> >> /* if we've gotten here through NAPI, check netpoll */ >> if (netpoll_receive_skb(skb)) >> goto out; >> >> orig_dev = skb->dev; >> >> skb_reset_network_header(skb); >> skb_reset_transport_header(skb); >> skb_reset_mac_len(skb); >> >> pt_prev = NULL; >> >> if (skb_dst(skb)) { >> if (skb_dst(skb)->input == 0xdeadbeef) { >> printk("bad dst: %lu, skb->dev: %s len: %i\n", >> skb->_skb_refdst, skb->dev->name, skb->len); >> BUG_ON(1); >> } >> } >> > > You should add your debuging code in netif_rx() so that we know the > caller > > by the way you could only add > > BUG_ON(skb->_skb_refdst & SKB_DST_NOREF) Ok, will add that. I was poking around in drivers/net/loopback.c. Maybe it needs to clean up the skb_dst() before calling the rx logic in the loopback_xmit method? Thanks, ben > >
On Wed, 2013-01-23 at 16:51 -0800, Ben Greear wrote: > > I was poking around in drivers/net/loopback.c. Maybe it needs > to clean up the skb_dst() before calling the rx logic in the > loopback_xmit method? Nope. Its ok there. We need a dst for loopback -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/23/2013 05:00 PM, Eric Dumazet wrote: > On Wed, 2013-01-23 at 16:51 -0800, Ben Greear wrote: > >> >> I was poking around in drivers/net/loopback.c. Maybe it needs >> to clean up the skb_dst() before calling the rx logic in the >> loopback_xmit method? > > Nope. Its ok there. We need a dst for loopback Here's the crash with the BUG_ON(). Did I understand your suggestion properly? This hit immediately when booting. int netif_rx(struct sk_buff *skb) { int ret; BUG_ON(skb->_skb_refdst & SKB_DST_NOREF); /* if netpoll wants it, pretend we never saw it */ if (netpoll_rx(skb)) return NET_RX_DROP; kernel BUG at /home/greearb/git/linux-3.7.dev.y/net/core/dev.c:2982! invalid opcode: 0000 [#1] PREEMPT SMP Modules linked in: lockd sunrpc macvlan pktgen uinput coretemp hwmon kvm_intel kvm iTCO_wdt iTCO_vendor_e CPU 0 Pid: 1554, comm: btserver Tainted: G C O 3.7.3+ #51 Iron Systems Inc. EE2610R/X8ST3 RIP: 0010:[<ffffffff8147685a>] [<ffffffff8147685a>] netif_rx+0x14/0x109 RSP: 0018:ffff8804030359a8 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff8803fff3a2f0 RCX: ffff88040d3b4490 RDX: ffff880403035a08 RSI: ffff88040d3f8000 RDI: ffff8803fff3a2f0 RBP: ffff8804030359d8 R08: 0000000000000001 R09: 0000000000000000 R10: ffffffff81472e61 R11: ffff8803fff62cc0 R12: 0000000000016ff0 R13: 000000000000003c R14: ffff88041fc00000 R15: ffffffff81670560 FS: 00007fcbd5c01740(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000439004 CR3: 00000003ff68d000 CR4: 00000000000007f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process btserver (pid: 1554, threadinfo ffff880403034000, task ffff880407e09f50) Stack: ffff8804030359d8 ffffffff814c0290 ffff880403035a08 ffff8803fff3a2f0 ffff8803fff3a2f0 0000000000016ff0 ffff880403035a08 ffffffff813a3c4d ffff8803fff3a2f0 ffff88040d3f8000 00000000a01b7c08 ffff88040d0b7400 Call Trace: [<ffffffff814c0290>] ? tcp_wfree+0xc0/0xc8 [<ffffffff813a3c4d>] loopback_xmit+0x64/0x83 [<ffffffff81477364>] dev_hard_start_xmit+0x26c/0x35e [<ffffffff8147771a>] dev_queue_xmit+0x2c4/0x37c [<ffffffff81477456>] ? dev_hard_start_xmit+0x35e/0x35e [<ffffffff8148cfa6>] ? eth_header+0x28/0xb6 [<ffffffff81480f09>] neigh_resolve_output+0x176/0x1a7 [<ffffffff814ad835>] ip_finish_output2+0x297/0x30d [<ffffffff814ad6d5>] ? ip_finish_output2+0x137/0x30d [<ffffffff814ad90e>] ip_finish_output+0x63/0x68 [<ffffffff814ae412>] ip_output+0x61/0x67 [<ffffffff814ab904>] dst_output+0x17/0x1b [<ffffffff814adb6d>] ip_local_out+0x1e/0x23 [<ffffffff814ae1c4>] ip_queue_xmit+0x315/0x353 [<ffffffff814adeaf>] ? ip_send_unicast_reply+0x2cc/0x2cc [<ffffffff814c018f>] tcp_transmit_skb+0x7ca/0x80b [<ffffffff814c3571>] tcp_connect+0x53c/0x587 [<ffffffff810c2f0c>] ? getnstimeofday+0x44/0x7d [<ffffffff810c2f56>] ? ktime_get_real+0x11/0x3e [<ffffffff814c6f9b>] tcp_v4_connect+0x3c2/0x431 [<ffffffff814d6913>] __inet_stream_connect+0x84/0x287 [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49 [<ffffffff8108d695>] ? _local_bh_enable_ip+0x84/0x9f [<ffffffff8108d6c8>] ? local_bh_enable+0xd/0x11 [<ffffffff8146763c>] ? lock_sock_nested+0x6e/0x79 [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49 [<ffffffff814d6b49>] inet_stream_connect+0x33/0x49 [<ffffffff814632c6>] sys_connect+0x75/0x98 [<ffffffff811551fd>] ? path_put+0x1d/0x21 [<ffffffff810e8d06>] ? __audit_syscall_entry+0x11c/0x148 [<ffffffff8128a509>] ? lockdep_sys_exit_thunk+0x35/0x67 [<ffffffff81162f8b>] ? __fd_install+0x26/0x52 [<ffffffff81537e69>] system_call_fastpath+0x16/0x1b Code: 49 8b 5c 24 10 48 8b 43 68 48 85 c0 0f 85 5e fe ff ff e9 79 fe ff ff 55 48 89 e5 41 54 53 48 89 fb RIP [<ffffffff8147685a>] netif_rx+0x14/0x109 RSP <ffff8804030359a8> ... DONE > > > >
On Wed, 2013-01-23 at 17:06 -0800, Ben Greear wrote: > On 01/23/2013 05:00 PM, Eric Dumazet wrote: > > On Wed, 2013-01-23 at 16:51 -0800, Ben Greear wrote: > > > >> > >> I was poking around in drivers/net/loopback.c. Maybe it needs > >> to clean up the skb_dst() before calling the rx logic in the > >> loopback_xmit method? > > > > Nope. Its ok there. We need a dst for loopback > > Here's the crash with the BUG_ON(). Did I understand your > suggestion properly? This hit immediately when booting. > > > int netif_rx(struct sk_buff *skb) > { > int ret; > > BUG_ON(skb->_skb_refdst & SKB_DST_NOREF); > > /* if netpoll wants it, pretend we never saw it */ > if (netpoll_rx(skb)) > return NET_RX_DROP; > > > kernel BUG at /home/greearb/git/linux-3.7.dev.y/net/core/dev.c:2982! > invalid opcode: 0000 [#1] PREEMPT SMP > Modules linked in: lockd sunrpc macvlan pktgen uinput coretemp hwmon kvm_intel kvm iTCO_wdt iTCO_vendor_e > CPU 0 > Pid: 1554, comm: btserver Tainted: G C O 3.7.3+ #51 Iron Systems Inc. EE2610R/X8ST3 > RIP: 0010:[<ffffffff8147685a>] [<ffffffff8147685a>] netif_rx+0x14/0x109 > RSP: 0018:ffff8804030359a8 EFLAGS: 00010202 > RAX: 0000000000000000 RBX: ffff8803fff3a2f0 RCX: ffff88040d3b4490 > RDX: ffff880403035a08 RSI: ffff88040d3f8000 RDI: ffff8803fff3a2f0 > RBP: ffff8804030359d8 R08: 0000000000000001 R09: 0000000000000000 > R10: ffffffff81472e61 R11: ffff8803fff62cc0 R12: 0000000000016ff0 > R13: 000000000000003c R14: ffff88041fc00000 R15: ffffffff81670560 > FS: 00007fcbd5c01740(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000439004 CR3: 00000003ff68d000 CR4: 00000000000007f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process btserver (pid: 1554, threadinfo ffff880403034000, task ffff880407e09f50) > Stack: > ffff8804030359d8 ffffffff814c0290 ffff880403035a08 ffff8803fff3a2f0 > ffff8803fff3a2f0 0000000000016ff0 ffff880403035a08 ffffffff813a3c4d > ffff8803fff3a2f0 ffff88040d3f8000 00000000a01b7c08 ffff88040d0b7400 > Call Trace: > [<ffffffff814c0290>] ? tcp_wfree+0xc0/0xc8 > [<ffffffff813a3c4d>] loopback_xmit+0x64/0x83 > [<ffffffff81477364>] dev_hard_start_xmit+0x26c/0x35e > [<ffffffff8147771a>] dev_queue_xmit+0x2c4/0x37c > [<ffffffff81477456>] ? dev_hard_start_xmit+0x35e/0x35e > [<ffffffff8148cfa6>] ? eth_header+0x28/0xb6 > [<ffffffff81480f09>] neigh_resolve_output+0x176/0x1a7 > [<ffffffff814ad835>] ip_finish_output2+0x297/0x30d > [<ffffffff814ad6d5>] ? ip_finish_output2+0x137/0x30d > [<ffffffff814ad90e>] ip_finish_output+0x63/0x68 > [<ffffffff814ae412>] ip_output+0x61/0x67 > [<ffffffff814ab904>] dst_output+0x17/0x1b > [<ffffffff814adb6d>] ip_local_out+0x1e/0x23 > [<ffffffff814ae1c4>] ip_queue_xmit+0x315/0x353 > [<ffffffff814adeaf>] ? ip_send_unicast_reply+0x2cc/0x2cc > [<ffffffff814c018f>] tcp_transmit_skb+0x7ca/0x80b > [<ffffffff814c3571>] tcp_connect+0x53c/0x587 > [<ffffffff810c2f0c>] ? getnstimeofday+0x44/0x7d > [<ffffffff810c2f56>] ? ktime_get_real+0x11/0x3e > [<ffffffff814c6f9b>] tcp_v4_connect+0x3c2/0x431 > [<ffffffff814d6913>] __inet_stream_connect+0x84/0x287 > [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49 > [<ffffffff8108d695>] ? _local_bh_enable_ip+0x84/0x9f > [<ffffffff8108d6c8>] ? local_bh_enable+0xd/0x11 > [<ffffffff8146763c>] ? lock_sock_nested+0x6e/0x79 > [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49 > [<ffffffff814d6b49>] inet_stream_connect+0x33/0x49 > [<ffffffff814632c6>] sys_connect+0x75/0x98 > [<ffffffff811551fd>] ? path_put+0x1d/0x21 > [<ffffffff810e8d06>] ? __audit_syscall_entry+0x11c/0x148 > [<ffffffff8128a509>] ? lockdep_sys_exit_thunk+0x35/0x67 > [<ffffffff81162f8b>] ? __fd_install+0x26/0x52 > [<ffffffff81537e69>] system_call_fastpath+0x16/0x1b > Code: 49 8b 5c 24 10 48 8b 43 68 48 85 c0 0f 85 5e fe ff ff e9 79 fe ff ff 55 48 89 e5 41 54 53 48 89 fb > RIP [<ffffffff8147685a>] netif_rx+0x14/0x109 > RSP <ffff8804030359a8> > ... DONE Excellent, thats the bug. I'll send a fix asap, thanks ! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/core/dst.c b/net/core/dst.c index ee6153e..234b168 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -245,6 +245,7 @@ again: dst->ops->destroy(dst); if (dst->dev) dev_put(dst->dev); + dst->input = dst->output = 0xdeadbeef; kmem_cache_free(dst->ops->kmem_cachep, dst); dst = child;