
localed stuck in recent 3.18 git in copy_net_ns?

Message ID 20141024172009.GV4977@linux.vnet.ibm.com
State RFC, archived
Delegated to: David Miller

Commit Message

Paul E. McKenney Oct. 24, 2014, 5:20 p.m. UTC
On Fri, Oct 24, 2014 at 08:09:31PM +0300, Yanko Kaneti wrote:
> On Fri-10/24/14-2014 09:54, Paul E. McKenney wrote:
> > On Fri, Oct 24, 2014 at 07:29:43PM +0300, Yanko Kaneti wrote:
> > > On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
> > > > On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
> > > > > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> > > > > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> > > > > > > 
> > > > > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > > > > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> > 
> > [ . . . ]
> > 
> > > > > Ok, unless I've messed up something major, bisecting points to:
> > > > > 
> > > > > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> > > > > 
> > > > > Does that make any sense?
> > > > 
> > > > Good question.  ;-)
> > > > 
> > > > Are any of your online CPUs missing rcuo kthreads?  There should be
> > > > kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
> > > 
> > > It's a Phenom II X6. With 3.17, and with linux-tip with 35ce7f29a44a reverted, there are 8 rcuos,
> > > and the modprobe ppp_generic testcase works reliably; libvirt also manages
> > > to set up its bridge.
> > > 
> > > With plain linux-tip, there are 6 rcuos, but the failure is as reliable as
> > > before.
> 
> > Thank you, very interesting.  Which 6 of the rcuos are present?
> 
> Well, the rcuos are 0 to 5, which sounds right for a 6-core CPU like this
> Phenom II.

Ah, you get 8 without the patch because it creates them for potential
CPUs as well as real ones.  OK, got it.

> > > Awaiting instructions. :)
> > 
> > Well, I thought I understood the problem until you found that only 6 of
> > the expected 8 rcuos are present with linux-tip without the revert.  ;-)
> > 
> > I am putting together a patch for the part of the problem that I think
> > I understand, of course, but it would help a lot to know which two of
> > the rcuos are missing.  ;-)
> 
> Ready to test

Well, if you are feeling aggressive, give the following patch a spin.
I am doing sanity tests on it in the meantime.

							Thanx, Paul

------------------------------------------------------------------------


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Yanko Kaneti Oct. 24, 2014, 5:35 p.m. UTC | #1
On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 08:09:31PM +0300, Yanko Kaneti wrote:
> > On Fri-10/24/14-2014 09:54, Paul E. McKenney wrote:
> > > On Fri, Oct 24, 2014 at 07:29:43PM +0300, Yanko Kaneti wrote:
> > > > On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
> > > > > On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
> > > > > > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> > > > > > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> > > > > > > > 
> > > > > > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > > > > > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> > > 
> > > [ . . . ]
> > > 
> > > > > > Ok, unless I've messed up something major, bisecting points to:
> > > > > > 
> > > > > > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> > > > > > 
> > > > > > Does that make any sense?
> > > > > 
> > > > > Good question.  ;-)
> > > > > 
> > > > > Are any of your online CPUs missing rcuo kthreads?  There should be
> > > > > kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
> > > > 
> > > > It's a Phenom II X6. With 3.17, and with linux-tip with 35ce7f29a44a reverted, there are 8 rcuos,
> > > > and the modprobe ppp_generic testcase works reliably; libvirt also manages
> > > > to set up its bridge.
> > > > 
> > > > With plain linux-tip, there are 6 rcuos, but the failure is as reliable as
> > > > before.
> > 
> > > Thank you, very interesting.  Which 6 of the rcuos are present?
> > 
> > Well, the rcuos are 0 to 5, which sounds right for a 6-core CPU like this
> > Phenom II.
> 
> Ah, you get 8 without the patch because it creates them for potential
> CPUs as well as real ones.  OK, got it.
> 
> > > > Awaiting instructions. :)
> > > 
> > > Well, I thought I understood the problem until you found that only 6 of
> > > the expected 8 rcuos are present with linux-tip without the revert.  ;-)
> > > 
> > > I am putting together a patch for the part of the problem that I think
> > > I understand, of course, but it would help a lot to know which two of
> > > the rcuos are missing.  ;-)
> > 
> > Ready to test
> 
> Well, if you are feeling aggressive, give the following patch a spin.
> I am doing sanity tests on it in the meantime.

Doesn't seem to make a difference here

 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 29fb23f33c18..927c17b081c7 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
>  			rdp->nocb_leader = rdp_spawn;
>  			if (rdp_last && rdp != rdp_spawn)
>  				rdp_last->nocb_next_follower = rdp;
> -			rdp_last = rdp;
> -			rdp = rdp->nocb_next_follower;
> -			rdp_last->nocb_next_follower = NULL;
> +			if (rdp == rdp_spawn) {
> +				rdp = rdp->nocb_next_follower;
> +			} else {
> +				rdp_last = rdp;
> +				rdp = rdp->nocb_next_follower;
> +				rdp_last->nocb_next_follower = NULL;
> +			}
>  		} while (rdp);
>  		rdp_spawn->nocb_next_follower = rdp_old_leader;
>  	}
> 
Paul E. McKenney Oct. 24, 2014, 6:32 p.m. UTC | #2
On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
> On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
> > On Fri, Oct 24, 2014 at 08:09:31PM +0300, Yanko Kaneti wrote:
> > > On Fri-10/24/14-2014 09:54, Paul E. McKenney wrote:
> > > > On Fri, Oct 24, 2014 at 07:29:43PM +0300, Yanko Kaneti wrote:
> > > > > On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
> > > > > > On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
> > > > > > > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> > > > > > > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> > > > > > > > > 
> > > > > > > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > > > > > > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> > > > 
> > > > [ . . . ]
> > > > 
> > > > > > > Ok, unless I've messed up something major, bisecting points to:
> > > > > > > 
> > > > > > > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> > > > > > > 
> > > > > > > Does that make any sense?
> > > > > > 
> > > > > > Good question.  ;-)
> > > > > > 
> > > > > > Are any of your online CPUs missing rcuo kthreads?  There should be
> > > > > > kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
> > > > > 
> > > > > It's a Phenom II X6. With 3.17, and with linux-tip with 35ce7f29a44a reverted, there are 8 rcuos,
> > > > > and the modprobe ppp_generic testcase works reliably; libvirt also manages
> > > > > to set up its bridge.
> > > > > 
> > > > > With plain linux-tip, there are 6 rcuos, but the failure is as reliable as
> > > > > before.
> > > 
> > > > Thank you, very interesting.  Which 6 of the rcuos are present?
> > > 
> > > Well, the rcuos are 0 to 5, which sounds right for a 6-core CPU like this
> > > Phenom II.
> > 
> > Ah, you get 8 without the patch because it creates them for potential
> > CPUs as well as real ones.  OK, got it.
> > 
> > > > > Awaiting instructions. :)
> > > > 
> > > > Well, I thought I understood the problem until you found that only 6 of
> > > > the expected 8 rcuos are present with linux-tip without the revert.  ;-)
> > > > 
> > > > I am putting together a patch for the part of the problem that I think
> > > > I understand, of course, but it would help a lot to know which two of
> > > > the rcuos are missing.  ;-)
> > > 
> > > Ready to test
> > 
> > Well, if you are feeling aggressive, give the following patch a spin.
> > I am doing sanity tests on it in the meantime.
> 
> Doesn't seem to make a difference here

OK, inspection isn't cutting it, so time for tracing.  Does the system
respond to user input?  If so, please enable rcu:rcu_barrier ftrace before
the problem occurs, then dump the trace buffer after the problem occurs.

							Thanx, Paul


Jay Vosburgh Oct. 24, 2014, 6:49 p.m. UTC | #3
Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

>On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
>> On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
>> > On Fri, Oct 24, 2014 at 08:09:31PM +0300, Yanko Kaneti wrote:
>> > > On Fri-10/24/14-2014 09:54, Paul E. McKenney wrote:
>> > > > On Fri, Oct 24, 2014 at 07:29:43PM +0300, Yanko Kaneti wrote:
>> > > > > On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
>> > > > > > On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
>> > > > > > > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
>> > > > > > > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
>> > > > > > > > > 
>> > > > > > > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
>> > > > > > > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
>> > > > 
>> > > > [ . . . ]
>> > > > 
>> > > > > > > Ok, unless I've messed up something major, bisecting points to:
>> > > > > > > 
>> > > > > > > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
>> > > > > > > 
>> > > > > > > Does that make any sense?
>> > > > > > 
>> > > > > > Good question.  ;-)
>> > > > > > 
>> > > > > > Are any of your online CPUs missing rcuo kthreads?  There should be
>> > > > > > kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
>> > > > > 
>> > > > > It's a Phenom II X6. With 3.17, and with linux-tip with 35ce7f29a44a reverted, there are 8 rcuos,
>> > > > > and the modprobe ppp_generic testcase works reliably; libvirt also manages
>> > > > > to set up its bridge.
>> > > > > 
>> > > > > With plain linux-tip, there are 6 rcuos, but the failure is as reliable as
>> > > > > before.
>> > > 
>> > > > Thank you, very interesting.  Which 6 of the rcuos are present?
>> > > 
>> > > Well, the rcuos are 0 to 5, which sounds right for a 6-core CPU like this
>> > > Phenom II.
>> > 
>> > Ah, you get 8 without the patch because it creates them for potential
>> > CPUs as well as real ones.  OK, got it.
>> > 
>> > > > > Awaiting instructions. :)
>> > > > 
>> > > > Well, I thought I understood the problem until you found that only 6 of
>> > > > the expected 8 rcuos are present with linux-tip without the revert.  ;-)
>> > > > 
>> > > > I am putting together a patch for the part of the problem that I think
>> > > > I understand, of course, but it would help a lot to know which two of
>> > > > the rcuos are missing.  ;-)
>> > > 
>> > > Ready to test
>> > 
>> > Well, if you are feeling aggressive, give the following patch a spin.
>> > I am doing sanity tests on it in the meantime.
>> 
>> Doesn't seem to make a difference here
>
>OK, inspection isn't cutting it, so time for tracing.  Does the system
>respond to user input?  If so, please enable rcu:rcu_barrier ftrace before
>the problem occurs, then dump the trace buffer after the problem occurs.

	My system is up and responsive when the problem occurs, so this
shouldn't be a problem.

	Do you want the ftrace with your patch below, or unmodified tip
of tree?

	-J


>							Thanx, Paul
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
Paul E. McKenney Oct. 24, 2014, 6:57 p.m. UTC | #4
On Fri, Oct 24, 2014 at 11:49:48AM -0700, Jay Vosburgh wrote:
> Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> >On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
> >> On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
> >> > On Fri, Oct 24, 2014 at 08:09:31PM +0300, Yanko Kaneti wrote:
> >> > > On Fri-10/24/14-2014 09:54, Paul E. McKenney wrote:
> >> > > > On Fri, Oct 24, 2014 at 07:29:43PM +0300, Yanko Kaneti wrote:
> >> > > > > On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
> >> > > > > > On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
> >> > > > > > > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> >> > > > > > > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> >> > > > > > > > > 
> >> > > > > > > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> >> > > > > > > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> >> > > > 
> >> > > > [ . . . ]
> >> > > > 
> >> > > > > > > Ok, unless I've messed up something major, bisecting points to:
> >> > > > > > > 
> >> > > > > > > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> >> > > > > > > 
> >> > > > > > > Does that make any sense?
> >> > > > > > 
> >> > > > > > Good question.  ;-)
> >> > > > > > 
> >> > > > > > Are any of your online CPUs missing rcuo kthreads?  There should be
> >> > > > > > kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
> >> > > > > 
> >> > > > > It's a Phenom II X6. With 3.17, and with linux-tip with 35ce7f29a44a reverted, there are 8 rcuos,
> >> > > > > and the modprobe ppp_generic testcase works reliably; libvirt also manages
> >> > > > > to set up its bridge.
> >> > > > > 
> >> > > > > With plain linux-tip, there are 6 rcuos, but the failure is as reliable as
> >> > > > > before.
> >> > > 
> >> > > > Thank you, very interesting.  Which 6 of the rcuos are present?
> >> > > 
> >> > > Well, the rcuos are 0 to 5, which sounds right for a 6-core CPU like this
> >> > > Phenom II.
> >> > 
> >> > Ah, you get 8 without the patch because it creates them for potential
> >> > CPUs as well as real ones.  OK, got it.
> >> > 
> >> > > > > Awaiting instructions. :)
> >> > > > 
> >> > > > Well, I thought I understood the problem until you found that only 6 of
> >> > > > the expected 8 rcuos are present with linux-tip without the revert.  ;-)
> >> > > > 
> >> > > > I am putting together a patch for the part of the problem that I think
> >> > > > I understand, of course, but it would help a lot to know which two of
> >> > > > the rcuos are missing.  ;-)
> >> > > 
> >> > > Ready to test
> >> > 
> >> > Well, if you are feeling aggressive, give the following patch a spin.
> >> > I am doing sanity tests on it in the meantime.
> >> 
> >> Doesn't seem to make a difference here
> >
> >OK, inspection isn't cutting it, so time for tracing.  Does the system
> >respond to user input?  If so, please enable rcu:rcu_barrier ftrace before
> >the problem occurs, then dump the trace buffer after the problem occurs.
> 
> 	My system is up and responsive when the problem occurs, so this
> shouldn't be a problem.

Nice!  ;-)

> 	Do you want the ftrace with your patch below, or unmodified tip
> of tree?

Let's please start with the patch.

							Thanx, Paul

> 	-J
> 
> 
> >							Thanx, Paul
> >
> 
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com
> 

Paul E. McKenney Oct. 24, 2014, 8:15 p.m. UTC | #5
On Fri, Oct 24, 2014 at 11:57:53AM -0700, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 11:49:48AM -0700, Jay Vosburgh wrote:
> > Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > >On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
> > >> On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
> > >> > On Fri, Oct 24, 2014 at 08:09:31PM +0300, Yanko Kaneti wrote:
> > >> > > On Fri-10/24/14-2014 09:54, Paul E. McKenney wrote:
> > >> > > > On Fri, Oct 24, 2014 at 07:29:43PM +0300, Yanko Kaneti wrote:
> > >> > > > > On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
> > >> > > > > > On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
> > >> > > > > > > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> > >> > > > > > > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> > >> > > > > > > > > 
> > >> > > > > > > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > >> > > > > > > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> > >> > > > 
> > >> > > > [ . . . ]
> > >> > > > 
> > >> > > > > > > Ok, unless I've messed up something major, bisecting points to:
> > >> > > > > > > 
> > >> > > > > > > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> > >> > > > > > > 
> > >> > > > > > > Does that make any sense?
> > >> > > > > > 
> > >> > > > > > Good question.  ;-)
> > >> > > > > > 
> > >> > > > > > Are any of your online CPUs missing rcuo kthreads?  There should be
> > >> > > > > > kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
> > >> > > > > 
> > >> > > > > It's a Phenom II X6. With 3.17, and with linux-tip with 35ce7f29a44a reverted, there are 8 rcuos,
> > >> > > > > and the modprobe ppp_generic testcase works reliably; libvirt also manages
> > >> > > > > to set up its bridge.
> > >> > > > > 
> > >> > > > > With plain linux-tip, there are 6 rcuos, but the failure is as reliable as
> > >> > > > > before.
> > >> > > 
> > >> > > > Thank you, very interesting.  Which 6 of the rcuos are present?
> > >> > > 
> > >> > > Well, the rcuos are 0 to 5, which sounds right for a 6-core CPU like this
> > >> > > Phenom II.
> > >> > 
> > >> > Ah, you get 8 without the patch because it creates them for potential
> > >> > CPUs as well as real ones.  OK, got it.
> > >> > 
> > >> > > > > Awaiting instructions. :)
> > >> > > > 
> > >> > > > Well, I thought I understood the problem until you found that only 6 of
> > >> > > > the expected 8 rcuos are present with linux-tip without the revert.  ;-)
> > >> > > > 
> > >> > > > I am putting together a patch for the part of the problem that I think
> > >> > > > I understand, of course, but it would help a lot to know which two of
> > >> > > > the rcuos are missing.  ;-)
> > >> > > 
> > >> > > Ready to test
> > >> > 
> > >> > Well, if you are feeling aggressive, give the following patch a spin.
> > >> > I am doing sanity tests on it in the meantime.
> > >> 
> > >> Doesn't seem to make a difference here
> > >
> > >OK, inspection isn't cutting it, so time for tracing.  Does the system
> > >respond to user input?  If so, please enable rcu:rcu_barrier ftrace before
> > >the problem occurs, then dump the trace buffer after the problem occurs.
> > 
> > 	My system is up and responsive when the problem occurs, so this
> > shouldn't be a problem.
> 
> Nice!  ;-)
> 
> > 	Do you want the ftrace with your patch below, or unmodified tip
> > of tree?
> 
> Let's please start with the patch.

And I should hasten to add that you need to set CONFIG_RCU_TRACE=y
for these tracepoints to be enabled.

							Thanx, Paul

> > 	-J
> > 
> > 
> > >							Thanx, Paul
> > >
> > 
> > ---
> > 	-Jay Vosburgh, jay.vosburgh@canonical.com
> > 

Yanko Kaneti Oct. 24, 2014, 9:25 p.m. UTC | #6
On Fri-10/24/14-2014 11:32, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
> > On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
> > > On Fri, Oct 24, 2014 at 08:09:31PM +0300, Yanko Kaneti wrote:
> > > > On Fri-10/24/14-2014 09:54, Paul E. McKenney wrote:
> > > > > On Fri, Oct 24, 2014 at 07:29:43PM +0300, Yanko Kaneti wrote:
> > > > > > On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
> > > > > > > On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
> > > > > > > > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> > > > > > > > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> > > > > > > > > > 
> > > > > > > > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > > > > > > > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> > > > > 
> > > > > [ . . . ]
> > > > > 
> > > > > > > > Ok, unless I've messed up something major, bisecting points to:
> > > > > > > > 
> > > > > > > > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> > > > > > > > 
> > > > > > > > Does that make any sense?
> > > > > > > 
> > > > > > > Good question.  ;-)
> > > > > > > 
> > > > > > > Are any of your online CPUs missing rcuo kthreads?  There should be
> > > > > > > kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
> > > > > > 
> > > > > > It's a Phenom II X6. With 3.17, and with linux-tip with 35ce7f29a44a reverted, there are 8 rcuos,
> > > > > > and the modprobe ppp_generic testcase works reliably; libvirt also manages
> > > > > > to set up its bridge.
> > > > > > 
> > > > > > With plain linux-tip, there are 6 rcuos, but the failure is as reliable as
> > > > > > before.
> > > > 
> > > > > Thank you, very interesting.  Which 6 of the rcuos are present?
> > > > 
> > > > Well, the rcuos are 0 to 5, which sounds right for a 6-core CPU like this
> > > > Phenom II.
> > > 
> > > Ah, you get 8 without the patch because it creates them for potential
> > > CPUs as well as real ones.  OK, got it.
> > > 
> > > > > > Awaiting instructions. :)
> > > > > 
> > > > > Well, I thought I understood the problem until you found that only 6 of
> > > > > the expected 8 rcuos are present with linux-tip without the revert.  ;-)
> > > > > 
> > > > > I am putting together a patch for the part of the problem that I think
> > > > > I understand, of course, but it would help a lot to know which two of
> > > > > the rcuos are missing.  ;-)
> > > > 
> > > > Ready to test
> > > 
> > > Well, if you are feeling aggressive, give the following patch a spin.
> > > I am doing sanity tests on it in the meantime.
> > 
> > Doesn't seem to make a difference here
> 
> OK, inspection isn't cutting it, so time for tracing.  Does the system
> respond to user input?  If so, please enable rcu:rcu_barrier ftrace before
> the problem occurs, then dump the trace buffer after the problem occurs.

Sorry for being unresponsive here, but I know next to nothing about tracing,
or about most things in the kernel, so I have some catching up to do.

In the meantime, here are some layman's observations from trying to find what
exactly triggers the problem:
- Even in runlevel 1, I can reliably trigger the problem by starting libvirtd.
- libvirtd seems to be very active in using all sorts of kernel facilities
  that are modules on Fedora, so it seems to cause many simultaneous kworker
  calls to modprobe.
- There are 8 kworker/u16 threads, numbered 0 to 7.
- One of these kworkers always deadlocks, while there appear to be two
  kworker/u16:6 threads (the seventh).

  6 vs. 8, as in 6 rcuos where before there were always 8.

Just observations from someone who still doesn't know what the u16
kworkers are.

-- Yanko



 
> 							Thanx, Paul
> 

Patch

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 29fb23f33c18..927c17b081c7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2546,9 +2546,13 @@  static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
 			rdp->nocb_leader = rdp_spawn;
 			if (rdp_last && rdp != rdp_spawn)
 				rdp_last->nocb_next_follower = rdp;
-			rdp_last = rdp;
-			rdp = rdp->nocb_next_follower;
-			rdp_last->nocb_next_follower = NULL;
+			if (rdp == rdp_spawn) {
+				rdp = rdp->nocb_next_follower;
+			} else {
+				rdp_last = rdp;
+				rdp = rdp->nocb_next_follower;
+				rdp_last->nocb_next_follower = NULL;
+			}
 		} while (rdp);
 		rdp_spawn->nocb_next_follower = rdp_old_leader;
 	}