From patchwork Fri Oct 24 21:49:27 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Paul E. McKenney"
X-Patchwork-Id: 402965
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
 by ozlabs.org (Postfix) with ESMTP id 4CA96140077
 for ; Sat, 25 Oct 2014 08:53:37 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1755101AbaJXVxV (ORCPT );
 Fri, 24 Oct 2014 17:53:21 -0400
Received: from e9.ny.us.ibm.com ([32.97.182.139]:58332 "EHLO
 e9.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751573AbaJXVxT (ORCPT );
 Fri, 24 Oct 2014 17:53:19 -0400
Received: from /spool/local by e9.ny.us.ibm.com with IBM ESMTP SMTP Gateway:
 Authorized Use Only! Violators will be prosecuted
 for from ; Fri, 24 Oct 2014 17:53:18 -0400
Received: from d01dlp02.pok.ibm.com (9.56.250.167) by e9.ny.us.ibm.com
 (192.168.1.109) with IBM ESMTP SMTP Gateway: Authorized Use Only!
 Violators will be prosecuted; Fri, 24 Oct 2014 17:53:15 -0400
Received: from b03cxnp08028.gho.boulder.ibm.com
 (b03cxnp08028.gho.boulder.ibm.com [9.17.130.20])
 by d01dlp02.pok.ibm.com (Postfix) with ESMTP id 23A466E8054;
 Fri, 24 Oct 2014 17:41:59 -0400 (EDT)
Received: from d03av06.boulder.ibm.com (d03av06.boulder.ibm.com [9.17.195.245])
 by b03cxnp08028.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0)
 with ESMTP id s9OLnTi137880014; Fri, 24 Oct 2014 23:49:29 +0200
Received: from d03av06.boulder.ibm.com (loopback [127.0.0.1])
 by d03av06.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout)
 with ESMTP id s9OLsCKd013619; Fri, 24 Oct 2014 15:54:13 -0600
Received: from paulmck-ThinkPad-W500 ([9.70.82.148])
 by d03av06.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin)
 with ESMTP id s9OLsBYm013613; Fri, 24 Oct 2014 15:54:11 -0600
Received: by paulmck-ThinkPad-W500 (Postfix, from userid 1000)
 id 6B56E383BA2; Fri, 24 Oct 2014 14:49:27 -0700 (PDT)
Date: Fri, 24 Oct 2014 14:49:27 -0700
From: "Paul E. McKenney"
To: Yanko Kaneti
Cc: Josh Boyer , "Eric W. Biederman" , Cong Wang , Kevin Fenzi ,
 netdev , "Linux-Kernel@Vger. Kernel. Org" ,
 jay.vosburgh@canonical.com, mroos@linux.ee, tj@kernel.org
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?
Message-ID: <20141024214927.GA4977@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20141023220406.GJ4977@linux.vnet.ibm.com>
 <20141024090857.GA4083@declera.com>
 <20141024154006.GP4977@linux.vnet.ibm.com>
 <20141024162943.GA16621@declera.com>
 <20141024165454.GS4977@linux.vnet.ibm.com>
 <20141024170931.GA21849@declera.com>
 <20141024172009.GV4977@linux.vnet.ibm.com>
 <20141024173526.GA26058@declera.com>
 <20141024183226.GW4977@linux.vnet.ibm.com>
 <20141024212557.GA15537@declera.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20141024212557.GA15537@declera.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 14102421-0033-0000-0000-000000D66580
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

On Sat, Oct 25, 2014 at 12:25:57AM +0300, Yanko Kaneti wrote:
> On Fri-10/24/14-2014 11:32, Paul E. McKenney wrote:
> > On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
> > > On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:

[ . . . ]

> > > > Well, if you are feeling aggressive, give the following patch a spin.
> > > > I am doing sanity tests on it in the meantime.
> > >
> > > Doesn't seem to make a difference here
> >
> > OK, inspection isn't cutting it, so time for tracing.  Does the system
> > respond to user input?  If so, please enable rcu:rcu_barrier ftrace before
> > the problem occurs, then dump the trace buffer after the problem occurs.
>
> Sorry for being unresponsive here, but I know next to nothing about tracing
> or most things about the kernel, so I have some catching up to do.
>
> In the meantime some layman observations while I tried to find what exactly
> triggers the problem.
> - Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
> - libvirtd seems to be very active in using all sorts of kernel facilities
>   that are modules on fedora so it seems to cause many simultaneous kworker
>   calls to modprobe
> - there are 8 kworker/u16 from 0 to 7
> - one of these kworkers always deadlocks, while there appear to be two
>   kworker/u16:6 - the seventh

Adding Tejun on CC in case this duplication of kworker/u16:6 is important.

>   6 vs 8 as in 6 rcuos where before they were always 8
>
> Just observations from someone who still doesn't know what the u16
> kworkers are..

Could you please run the following diagnostic patch?  This will help
me see if I have managed to miswire the rcuo kthreads.  It should
print some information at task-hang time.

							Thanx, Paul

------------------------------------------------------------------------

rcu: Dump no-CBs CPU state at task-hung time

Strictly diagnostic commit for rcu_barrier() hang.  Not for inclusion.

Signed-off-by: Paul E. McKenney

---
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 0e5366200154..34048140577b 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -157,4 +157,8 @@ static inline bool rcu_is_watching(void)
 
 #endif /* #else defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */
 
+static inline void rcu_show_nocb_setup(void)
+{
+}
+
 #endif /* __LINUX_RCUTINY_H */
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 52953790dcca..0b813bdb971b 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -97,4 +97,6 @@ extern int rcu_scheduler_active __read_mostly;
 
 bool rcu_is_watching(void);
 
+void rcu_show_nocb_setup(void);
+
 #endif /* __LINUX_RCUTREE_H */
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 06db12434d72..e6e4d0f6b063 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -118,6 +118,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 		" disables this message.\n");
 	sched_show_task(t);
 	debug_show_held_locks(t);
+	rcu_show_nocb_setup();
 
 	touch_nmi_watchdog();
 
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 240fa9094f83..6b373e79ce0e 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1513,6 +1513,7 @@ rcu_torture_cleanup(void)
 {
 	int i;
 
+	rcu_show_nocb_setup();
 	rcutorture_record_test_transition();
 	if (torture_cleanup_begin()) {
 		if (cur_ops->cb_barrier != NULL)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 927c17b081c7..285b3f6fb229 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2699,6 +2699,31 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
 
 #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
 
+void rcu_show_nocb_setup(void)
+{
+#ifdef CONFIG_RCU_NOCB_CPU
+	int cpu;
+	struct rcu_data *rdp;
+	struct rcu_state *rsp;
+
+	for_each_rcu_flavor(rsp) {
+		pr_alert("rcu_show_nocb_setup(): %s nocb state:\n", rsp->name);
+		for_each_possible_cpu(cpu) {
+			if (!rcu_is_nocb_cpu(cpu))
+				continue;
+			rdp = per_cpu_ptr(rsp->rda, cpu);
+			pr_alert("%3d: %p l:%p n:%p %c%c%c\n",
+				 cpu,
+				 rdp, rdp->nocb_leader, rdp->nocb_next_follower,
+				 ".N"[!!rdp->nocb_head],
+				 ".G"[!!rdp->nocb_gp_head],
+				 ".F"[!!rdp->nocb_follower_head]);
+		}
+	}
+#endif /* #ifdef CONFIG_RCU_NOCB_CPU */
+}
+EXPORT_SYMBOL_GPL(rcu_show_nocb_setup);
+
 /*
  * An adaptive-ticks CPU can potentially execute in kernel mode for an
  * arbitrarily long period of time with the scheduling-clock tick turned