Message ID | 20141024172009.GV4977@linux.vnet.ibm.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 08:09:31PM +0300, Yanko Kaneti wrote:
> > On Fri-10/24/14-2014 09:54, Paul E. McKenney wrote:
> > > On Fri, Oct 24, 2014 at 07:29:43PM +0300, Yanko Kaneti wrote:
> > > > On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
> > > > > On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
> > > > > > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> > > > > > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> > > > > > > >
> > > > > > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > > > > > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> > >
> > > [ . . . ]
> > >
> > > > > > Ok, unless I've messed up something major, bisecting points to:
> > > > > >
> > > > > >    35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> > > > > >
> > > > > > Makes any sense ?
> > > > >
> > > > > Good question. ;-)
> > > > >
> > > > > Are any of your online CPUs missing rcuo kthreads? There should be
> > > > > kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
> > > >
> > > > It's a Phenom II X6. With 3.17 and linux-tip with 35ce7f29a44a reverted, the rcuos are 8
> > > > and the modprobe ppp_generic testcase reliably works, libvirt also manages
> > > > to set up its bridge.
> > > >
> > > > Just with linux-tip, the rcuos are 6 but the failure is as reliable as
> > > > before.
> >
> > > Thank you, very interesting. Which 6 of the rcuos are present?
> >
> > Well, the rcuos are 0 to 5. Which sounds right for a 6 core CPU like this
> > Phenom II.
>
> Ah, you get 8 without the patch because it creates them for potential
> CPUs as well as real ones. OK, got it.
>
> > > > Awaiting instructions: :)
> > >
> > > Well, I thought I understood the problem until you found that only 6 of
> > > the expected 8 rcuos are present with linux-tip without the revert. ;-)
> > >
> > > I am putting together a patch for the part of the problem that I think
> > > I understand, of course, but it would help a lot to know which two of
> > > the rcuos are missing. ;-)
> >
> > Ready to test
>
> Well, if you are feeling aggressive, give the following patch a spin.
> I am doing sanity tests on it in the meantime.

Doesn't seem to make a difference here

> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 29fb23f33c18..927c17b081c7 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
>  			rdp->nocb_leader = rdp_spawn;
>  			if (rdp_last && rdp != rdp_spawn)
>  				rdp_last->nocb_next_follower = rdp;
> -			rdp_last = rdp;
> -			rdp = rdp->nocb_next_follower;
> -			rdp_last->nocb_next_follower = NULL;
> +			if (rdp == rdp_spawn) {
> +				rdp = rdp->nocb_next_follower;
> +			} else {
> +				rdp_last = rdp;
> +				rdp = rdp->nocb_next_follower;
> +				rdp_last->nocb_next_follower = NULL;
> +			}
>  		} while (rdp);
>  		rdp_spawn->nocb_next_follower = rdp_old_leader;
>  	}
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
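The 8-versus-6 difference discussed above is attributed to possible ("potential") CPUs versus online ones. A minimal sketch for comparing the two counts, assuming they are exposed through the sysfs files `/sys/devices/system/cpu/possible` and `/sys/devices/system/cpu/online`; the function names `count_cpu_list` and `count_cpus_in_file` are hypothetical helpers, not anything from the thread:

```c
#include <assert.h>
#include <stdio.h>

/* Count the CPUs named by a sysfs-style list such as "0-5" or "0-3,6".
 * Returns -1 if the string cannot be parsed. */
int count_cpu_list(const char *list)
{
	int total = 0;

	assert(list);
	while (*list && *list != '\n') {
		int lo, hi, used;

		if (sscanf(list, "%d-%d%n", &lo, &hi, &used) == 2) {
			/* A range "lo-hi"; lo, hi and used are all set. */
		} else if (sscanf(list, "%d%n", &lo, &used) == 1) {
			hi = lo;	/* a single CPU number */
		} else {
			return -1;
		}
		if (hi < lo)
			return -1;
		total += hi - lo + 1;
		list += used;
		if (*list == ',')
			list++;
	}
	return total;
}

/* Read a one-line list file and count the CPUs in it; -1 on any error. */
int count_cpus_in_file(const char *path)
{
	char line[256];
	FILE *f = fopen(path, "r");
	int n;

	if (!f)
		return -1;
	n = fgets(line, sizeof(line), f) ? count_cpu_list(line) : -1;
	fclose(f);
	return n;
}
```

Calling `count_cpus_in_file()` on each of the two paths and comparing the results with the number of rcuos kthreads would show whether the kthreads track possible or online CPUs on a given kernel.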
On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
> On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:

[ . . . ]

> > Well, if you are feeling aggressive, give the following patch a spin.
> > I am doing sanity tests on it in the meantime.
>
> Doesn't seem to make a difference here

OK, inspection isn't cutting it, so time for tracing. Does the system
respond to user input? If so, please enable rcu:rcu_barrier ftrace before
the problem occurs, then dump the trace buffer after the problem occurs.

							Thanx, Paul
Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

[ . . . ]

>OK, inspection isn't cutting it, so time for tracing. Does the system
>respond to user input? If so, please enable rcu:rcu_barrier ftrace before
>the problem occurs, then dump the trace buffer after the problem occurs.

	My system is up and responsive when the problem occurs, so this
shouldn't be a problem.

	Do you want the ftrace with your patch below, or unmodified tip
of tree?

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
On Fri, Oct 24, 2014 at 11:49:48AM -0700, Jay Vosburgh wrote:
> Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

[ . . . ]

> >OK, inspection isn't cutting it, so time for tracing. Does the system
> >respond to user input? If so, please enable rcu:rcu_barrier ftrace before
> >the problem occurs, then dump the trace buffer after the problem occurs.
>
> 	My system is up and responsive when the problem occurs, so this
> shouldn't be a problem.

Nice! ;-)

> 	Do you want the ftrace with your patch below, or unmodified tip
> of tree?

Let's please start with the patch.

							Thanx, Paul
On Fri, Oct 24, 2014 at 11:57:53AM -0700, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 11:49:48AM -0700, Jay Vosburgh wrote:

[ . . . ]

> > 	Do you want the ftrace with your patch below, or unmodified tip
> > of tree?
>
> Let's please start with the patch.

And I should hasten to add that you need to set CONFIG_RCU_TRACE=y
for these tracepoints to be enabled.

							Thanx, Paul
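The tracing steps requested above (enable the rcu:rcu_barrier event before the hang, dump the trace buffer afterwards) can be sketched roughly as follows. This assumes a kernel built with CONFIG_RCU_TRACE=y and debugfs mounted at `/sys/kernel/debug`, so that the tracing directory is `/sys/kernel/debug/tracing`; the helper names (`trace_path`, `write_ctl`, `enable_rcu_barrier`, `dump_trace`) are hypothetical and exist only for this illustration:

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: debugfs is mounted at /sys/kernel/debug. */
#define TRACING_DIR "/sys/kernel/debug/tracing"

/* Build "<dir>/<leaf>" in buf; returns 0 on success, -1 if it does not fit. */
int trace_path(char *buf, size_t len, const char *dir, const char *leaf)
{
	int n;

	assert(buf && dir && leaf);
	n = snprintf(buf, len, "%s/%s", dir, leaf);
	return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/* Write a short string to a control file; returns 0 on success, -1 on error. */
int write_ctl(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	int rc;

	if (!f)
		return -1;
	rc = (fputs(val, f) == EOF) ? -1 : 0;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}

/* Enable the rcu:rcu_barrier event under dir (needs root on a real system). */
int enable_rcu_barrier(const char *dir)
{
	char path[256];

	if (trace_path(path, sizeof(path), dir, "events/rcu/rcu_barrier/enable"))
		return -1;
	return write_ctl(path, "1");
}

/* Copy the trace buffer (<dir>/trace) to out; returns bytes copied or -1. */
long dump_trace(const char *dir, FILE *out)
{
	char path[256], chunk[4096];
	size_t n;
	long total = 0;
	FILE *in;

	if (trace_path(path, sizeof(path), dir, "trace"))
		return -1;
	in = fopen(path, "r");
	if (!in)
		return -1;
	while ((n = fread(chunk, 1, sizeof(chunk), in)) > 0) {
		fwrite(chunk, 1, n, out);
		total += (long)n;
	}
	fclose(in);
	return total;
}
```

In use, one would call `enable_rcu_barrier(TRACING_DIR)` as root before reproducing the problem and `dump_trace(TRACING_DIR, stdout)` afterwards. Writing `1` to the same `enable` file from a shell achieves the same thing; C is used here only to match the language of the patch.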
On Fri-10/24/14-2014 11:32, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:

[ . . . ]

> OK, inspection isn't cutting it, so time for tracing. Does the system
> respond to user input? If so, please enable rcu:rcu_barrier ftrace before
> the problem occurs, then dump the trace buffer after the problem occurs.

Sorry for being unresponsive here, but I know next to nothing about tracing
or most things about the kernel, so I have some catching up to do.

In the meantime some layman observations while I tried to find what exactly
triggers the problem.

- Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
- libvirtd seems to be very active in using all sorts of kernel facilities
  that are modules on Fedora so it seems to cause many simultaneous kworker
  calls to modprobe
- there are 8 kworker/u16 from 0 to 7
- one of these kworkers always deadlocks, while there appear to be two
  kworker/u16:6 - the seventh

  6 vs 8 as in 6 rcuos where before they were always 8

Just observations from someone who still doesn't know what the u16
kworkers are..

-- Yanko
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 29fb23f33c18..927c17b081c7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
 			rdp->nocb_leader = rdp_spawn;
 			if (rdp_last && rdp != rdp_spawn)
 				rdp_last->nocb_next_follower = rdp;
-			rdp_last = rdp;
-			rdp = rdp->nocb_next_follower;
-			rdp_last->nocb_next_follower = NULL;
+			if (rdp == rdp_spawn) {
+				rdp = rdp->nocb_next_follower;
+			} else {
+				rdp_last = rdp;
+				rdp = rdp->nocb_next_follower;
+				rdp_last->nocb_next_follower = NULL;
+			}
 		} while (rdp);
 		rdp_spawn->nocb_next_follower = rdp_old_leader;
 	}
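To see what the changed loop body does to the follower list, here is a small userspace model of it. This is a sketch under stated assumptions, not kernel code: `struct rdp_model`, `relink_group`, `group_size` and `demo_reachable` are hypothetical names, only the two fields named in the diff are modeled, and the loop's setup (starting at the old leader with `rdp_last` set to NULL, and the new leader being a follower of the old one) is assumed, since the diff context does not show it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-CPU RCU data; only the
 * two fields that appear in the diff are modeled. */
struct rdp_model {
	struct rdp_model *nocb_leader;
	struct rdp_model *nocb_next_follower;
};

/* Re-link old_leader's group so that spawn becomes its leader.
 * Assumes spawn is a follower somewhere in old_leader's chain.
 * patched == 0 runs the loop body the diff removes,
 * patched != 0 runs the loop body the diff adds. */
void relink_group(struct rdp_model *old_leader, struct rdp_model *spawn,
		  int patched)
{
	struct rdp_model *rdp = old_leader;
	struct rdp_model *rdp_last = NULL;

	assert(old_leader && spawn && old_leader != spawn);
	do {
		rdp->nocb_leader = spawn;
		if (rdp_last && rdp != spawn)
			rdp_last->nocb_next_follower = rdp;
		if (patched && rdp == spawn) {
			/* Step past the new leader without recording it
			 * as rdp_last or cutting its link. */
			rdp = rdp->nocb_next_follower;
		} else {
			rdp_last = rdp;
			rdp = rdp->nocb_next_follower;
			rdp_last->nocb_next_follower = NULL;
		}
	} while (rdp);
	spawn->nocb_next_follower = old_leader;
}

/* Count the entries reachable from leader, the leader included. */
int group_size(const struct rdp_model *leader)
{
	int n = 0;

	for (; leader; leader = leader->nocb_next_follower)
		n++;
	return n;
}

/* Build the chain cpu[0] -> cpu[1] -> cpu[2], promote cpu[1] to leader,
 * and return how many CPUs remain reachable from the new leader. */
int demo_reachable(int patched)
{
	struct rdp_model cpu[3];
	int i;

	for (i = 0; i < 3; i++) {
		cpu[i].nocb_leader = &cpu[0];
		cpu[i].nocb_next_follower = (i < 2) ? &cpu[i + 1] : NULL;
	}
	relink_group(&cpu[0], &cpu[1], patched);
	return group_size(&cpu[1]);
}
```

In this toy model the removed loop body leaves only two of the three CPUs reachable from the new leader, because the link written after the new leader is overwritten by the final assignment, while the added loop body keeps all three. That is only a reading of the diff in isolation; the thread itself reports that the patch made no difference to the observed hang.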