From patchwork Tue Mar 27 16:47:38 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Greear
X-Patchwork-Id: 148996
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 21B6AB6EE6
	for ; Wed, 28 Mar 2012 03:53:11 +1100 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754644Ab2C0QxI (ORCPT );
	Tue, 27 Mar 2012 12:53:08 -0400
Received: from mail.candelatech.com ([208.74.158.172]:52228 "EHLO
	ns3.lanforge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753181Ab2C0QxD (ORCPT );
	Tue, 27 Mar 2012 12:53:03 -0400
Received: from [192.168.100.111] (firewall.candelatech.com [70.89.124.249])
	(authenticated bits=0)
	by ns3.lanforge.com (8.14.2/8.14.2) with ESMTP id q2RGldi5009579
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 27 Mar 2012 09:47:39 -0700
Message-ID: <4F71EF2A.8020507@candelatech.com>
Date: Tue, 27 Mar 2012 09:47:38 -0700
From: Ben Greear
Organization: Candela Technologies
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120209 Thunderbird/10.0.1
MIME-Version: 1.0
To: Eric Dumazet
CC: David Miller, netdev@vger.kernel.org, gregkh@linuxfoundation.org,
	"Paul E. McKenney"
Subject: Re: RCU lock bug in 3.0.21 (bisected to: 682cb56a, fix NULL
	dereferences in check_peer_redir)
References: <4F70E308.7070908@candelatech.com>
	<20120326.174945.1186427809261872546.davem@davemloft.net>
	<4F70E560.3020102@candelatech.com>
	<4F70F688.6050108@candelatech.com>
	<1332805148.3547.14.camel@edumazet-glaptop>
In-Reply-To: <1332805148.3547.14.camel@edumazet-glaptop>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

On 03/26/2012 04:39 PM, Eric Dumazet wrote:
> On Mon, 2012-03-26 at 16:06 -0700, Ben Greear wrote:
>> On 03/26/2012 02:53 PM, Ben Greear wrote:
>>> On 03/26/2012 02:49 PM, David Miller wrote:
>>>>
>>>> Looks like all of those strange undiagnosable reported Dave Jones
>>>> has been feeding us. Something in one part of the kernel leaves
>>>> a lock held, and this shows up as a warning elsewhere.
>>>
>>> Every (initial) bug printout fingers ipv6 and the 'ip' tool on my system.
>>
>> I added a patch to convert rcu_read_lock/unlock to macros so
>> that I could automatically grab the call site (_THIS_IP_)
>> and pass it into the lockdep framework instead of the (useless)
>> _THIS_IP_ in the old rcu_read_lock method which at best seems to
>> only indicate which module the issue relates to...
>
> Hi Ben
>
> Is this problem also appears with current tree ?
> (This could be a problem with the backport, as it was full of
> dependencies)
>
> Also, if you use a patch to better track rcu_read_lock()/unlock(), you
> could add new macros as well to track that a particular unlock() matches
> one given lock(). (maybe returning the rcu_preempt_depth at
> rcu_read_lock() time , but maybe a more absolute ref would be better)
>
> So we could have a warning if an unlock() doesnt match the lock()
>
> inet6_dump_fib () was already a suspect but we could not find why.

Ok, I tried the patch below, and got the result farther down. Is this
what you were thinking of?
(The lockdep warning about the RCU lock still being held happened
immediately after this, so it appears the depth mismatch does represent
this problem.)

[greearb@fs3 linux-3.0.dev.y]$ git diff

------------[ cut here ]------------
WARNING: at /home/greearb/git/linux-3.0.dev.y/net/ipv6/ip6_fib.c:415 inet6_dump_fib+0x25c/0x292 [ipv6]()
Hardware name: To be filled by O.E.M.
depth: 1 lockdep-depth: 2
Modules linked in: 8021q garp stp llc fuse macvlan pktgen coretemp hwmon sunrpc ipv6 uinput arc4 ath9k snd_hda_codec_realtek mac80211 snd_hda_intel snd_hda_codec snd_hwdep snd_seq ath9k_common ath9k_hw snd_seq_device snd_pcm ath snd_timer e1000e cfg80211 snd mei(C) ppdev microcode i2c_i801 iTCO_wdt soundcore serio_raw pcspkr snd_page_alloc iTCO_vendor_support parport_pc parport i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Pid: 6563, comm: ip Tainted: G C 3.0.25+ #16
Call Trace:
 [] warn_slowpath_common+0x80/0x98
 [] warn_slowpath_fmt+0x41/0x43
 [] inet6_dump_fib+0x25c/0x292 [ipv6]
 [] netlink_dump+0x5b/0x19b
 [] ? consume_skb+0x28/0x2a
 [] netlink_recvmsg+0x1c7/0x2f8
 [] __sock_recvmsg_nosec+0x65/0x6e
 [] __sock_recvmsg+0x49/0x54
 [] sock_recvmsg+0xa6/0xbf
 [] ? lock_release_non_nested+0x9d/0x227
 [] ? might_fault+0x4e/0x9e
 [] ? might_fault+0x97/0x9e
 [] ? copy_from_user+0x2a/0x2c
 [] ? might_fault+0x4e/0x9e
 [] ? verify_iovec+0x4f/0xa3
 [] __sys_recvmsg+0x147/0x21e
 [] ? up_read+0x1e/0x36
 [] ? fcheck_files+0xb7/0xee
 [] ? fget_light+0x3b/0xbc
 [] sys_recvmsg+0x3d/0x5b
 [] system_call_fastpath+0x16/0x1b
---[ end trace 5232c09c4fb31d15 ]---

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 0f9b37a..ae3c7c9 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -366,6 +366,7 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	struct hlist_node *node;
 	struct hlist_head *head;
 	int res = 0;
+	int depth = current->lockdep_depth;
 
 	s_h = cb->args[0];
 	s_e = cb->args[1];
@@ -410,6 +411,8 @@ next:
 	}
 out:
 	rcu_read_unlock();
+	WARN(depth != current->lockdep_depth, "depth: %i lockdep-depth: %i\n",
+	     depth, current->lockdep_depth);
 	cb->args[1] = e;
 	cb->args[0] = h;
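
The check in the diff above follows a simple pattern: record the lock
nesting depth on entry, then compare it again once the function has
released its own locks. A minimal userspace sketch of that pattern is
given here for illustration only. It assumes nothing about lockdep
internals: held_depth, model_lock(), model_unlock() and
checked_section() are hypothetical names that merely stand in for
current->lockdep_depth, rcu_read_lock(), rcu_read_unlock() and
inet6_dump_fib(). It does not reproduce or diagnose the actual bug.

```c
#include <stdio.h>

/* Hypothetical stand-in for current->lockdep_depth (not a kernel API). */
static int held_depth;

/* Hypothetical stand-ins for a lock/unlock pair that lockdep would track. */
static void model_lock(void)   { held_depth++; }
static void model_unlock(void) { held_depth--; }

/*
 * Snapshot the depth on entry and compare it on exit, as the patch does.
 * When 'leak' is non-zero, one extra lock is taken and never released,
 * simulating an unbalanced path somewhere inside the section.
 * Returns the imbalance: 0 if balanced, >0 if locks were left held.
 */
static int checked_section(int leak)
{
	int depth = held_depth;		/* snapshot at function entry */

	model_lock();
	if (leak)
		model_lock();		/* simulated bug: no matching unlock */
	model_unlock();

	/* Userspace analogue of the WARN() added in the patch. */
	if (depth != held_depth)
		fprintf(stderr, "depth: %i lockdep-depth: %i\n",
			depth, held_depth);

	return held_depth - depth;
}
```

Calling checked_section(0) returns 0 and prints nothing, while
checked_section(1) returns 1 and prints a mismatch line to stderr,
analogous to the "depth: 1 lockdep-depth: 2" line in the warning above.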