Patchwork 2.6.35-stable/ppc64/p7: suspicious rcu_dereference_check() usage detected during 2.6.35-stable boot

login
register
mail settings
Submitter Peter Zijlstra
Date Sept. 16, 2010, 3:50 p.m.
Message ID <1284652231.2275.569.camel@laptop>
Download mbox | patch
Permalink /patch/64981/
State Not Applicable
Headers show

Comments

Peter Zijlstra - Sept. 16, 2010, 3:50 p.m.
On Mon, 2010-08-09 at 09:12 -0700, Paul E. McKenney wrote:

> > [    0.051203] CPU0: AMD QEMU Virtual CPU version 0.12.4 stepping 03
> > [    0.052999] lockdep: fixing up alternatives.
> > [    0.054105]
> > [    0.054106] ===================================================
> > [    0.054999] [ INFO: suspicious rcu_dereference_check() usage. ]
> > [    0.054999] ---------------------------------------------------
> > [    0.054999] kernel/sched.c:616 invoked rcu_dereference_check() without protection!
> > [    0.054999]
> > [    0.054999] other info that might help us debug this:
> > [    0.054999]
> > [    0.054999]
> > [    0.054999] rcu_scheduler_active = 1, debug_locks = 1
> > [    0.054999] 3 locks held by swapper/1:
> > [    0.054999]  #0:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff814be933>] cpu_up+0x42/0x6a
> > [    0.054999]  #1:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810400d8>] cpu_hotplug_begin+0x2a/0x51
> > [    0.054999]  #2:  (&rq->lock){-.-...}, at: [<ffffffff814be2f7>] init_idle+0x2f/0x113
> > [    0.054999]
> > [    0.054999] stack backtrace:
> > [    0.054999] Pid: 1, comm: swapper Not tainted 2.6.35 #1
> > [    0.054999] Call Trace:
> > [    0.054999]  [<ffffffff81068054>] lockdep_rcu_dereference+0x9b/0xa3
> > [    0.054999]  [<ffffffff810325c3>] task_group+0x7b/0x8a
> > [    0.054999]  [<ffffffff810325e5>] set_task_rq+0x13/0x40
> > [    0.054999]  [<ffffffff814be39a>] init_idle+0xd2/0x113
> > [    0.054999]  [<ffffffff814be78a>] fork_idle+0xb8/0xc7
> > [    0.054999]  [<ffffffff81068717>] ? mark_held_locks+0x4d/0x6b
> > [    0.054999]  [<ffffffff814bcebd>] do_fork_idle+0x17/0x2b
> > [    0.054999]  [<ffffffff814bc89b>] native_cpu_up+0x1c1/0x724
> > [    0.054999]  [<ffffffff814bcea6>] ? do_fork_idle+0x0/0x2b
> > [    0.054999]  [<ffffffff814be876>] _cpu_up+0xac/0x127
> > [    0.054999]  [<ffffffff814be946>] cpu_up+0x55/0x6a
> > [    0.054999]  [<ffffffff81ab562a>] kernel_init+0xe1/0x1ff
> > [    0.054999]  [<ffffffff81003854>] kernel_thread_helper+0x4/0x10
> > [    0.054999]  [<ffffffff814c353c>] ? restore_args+0x0/0x30
> > [    0.054999]  [<ffffffff81ab5549>] ? kernel_init+0x0/0x1ff
> > [    0.054999]  [<ffffffff81003850>] ? kernel_thread_helper+0x0/0x10
> > [    0.056074] Booting Node   0, Processors  #1lockdep: fixing up alternatives.
> > [    0.130045]  #2lockdep: fixing up alternatives.
> > [    0.203089]  #3 Ok.
> > [    0.275286] Brought up 4 CPUs
> > [    0.276005] Total of 4 processors activated (16017.17 BogoMIPS).
> 
> This does look like a new one, thank you for reporting it!
> 
> Here is my analysis, which should at least provide some humor value to
> those who understand the code better than I do.  ;-)
> 
> So the corresponding rcu_dereference_check() is in
> task_subsys_state_check(), and is fetching the cpu_cgroup_subsys_id
> element of the newly created task's task->cgroups->subsys[] array.
> The "git grep" command finds only three uses of cpu_cgroup_subsys_id,
> but no definition.
> 
> Now, fork_idle() invokes copy_process(), which invokes cgroup_fork(),
> which sets the child process's ->cgroups pointer to that of the parent,
> also invoking get_css_set(), which increments the corresponding reference
> count, doing both operations under task_lock() protection (->alloc_lock).
> Because fork_idle() does not specify any of CLONE_NEWNS, CLONE_NEWUTS,
> CLONE_NEWIPC, CLONE_NEWPID, or CLONE_NEWNET, copy_namespaces() should
> not create a new namespace, and so there should be no ns_cgroup_clone().
> We should thus retain the parent's ->cgroups pointer.  And copy_process()
> installs the new task in the various lists, so that the task is externally
> accessible upon return.
> 
> After a non-error return from copy_process(), fork_init() invokes
> init_idle_pid(), which does not appear to affect the task's cgroup
> state.  Next fork_init() invokes init_idle(), which in turn invokes
> __set_task_cpu(), which invokes set_task_rq(), which calls task_group()
> several times, which calls task_subsys_state_check(), which calls the
> rcu_dereference_check() that complained above.
> 
> However, the result returns by rcu_dereference_check() is stored into
> the task structure:
> 
> 	p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
> 	p->se.parent = task_group(p)->se[cpu];
> 
> This means that the corresponding structure must have been tied down with
> a reference count or some such.  If such a reference has been taken, then
> this complaint is a false positive, and could be suppressed by putting
> rcu_read_lock() and rcu_read_unlock() around the call to init_idle()
> from fork_idle().  However, although, reference to the enclosing ->cgroups
> struct css_set is held, it is not clear to me that this reference applies
> to the structures pointed to by the ->subsys[] array, especially given
> that the cgroup_subsys_state structures referenced by this array have
> their own reference count, which does not appear to me to be acquired
> by this code path.
> 
> Or are the cgroup_subsys_state structures referenced by idle tasks
> never freed or some such?

I would hope so!, the idle tasks should be part of the root cgroup,
which is not removable.

The problem is that while we do in-fact hold rq->lock, the newly spawned
idle thread's cpu is not yet set to the correct cpu so the lockdep check
in task_group():

  lockdep_is_held(&task_rq(p)->lock)

will fail.

But of a chicken and egg problem. Setting the cpu needs to have the cpu
set ;-)

Ingo, why do we have rq->lock there at all? The CPU isn't up and running
yet, nothing should be touching it.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)
Paul E. McKenney - Sept. 16, 2010, 4:12 p.m.
On Thu, Sep 16, 2010 at 05:50:31PM +0200, Peter Zijlstra wrote:
> On Mon, 2010-08-09 at 09:12 -0700, Paul E. McKenney wrote:
> 
> > > [    0.051203] CPU0: AMD QEMU Virtual CPU version 0.12.4 stepping 03
> > > [    0.052999] lockdep: fixing up alternatives.
> > > [    0.054105]
> > > [    0.054106] ===================================================
> > > [    0.054999] [ INFO: suspicious rcu_dereference_check() usage. ]
> > > [    0.054999] ---------------------------------------------------
> > > [    0.054999] kernel/sched.c:616 invoked rcu_dereference_check() without protection!
> > > [    0.054999]
> > > [    0.054999] other info that might help us debug this:
> > > [    0.054999]
> > > [    0.054999]
> > > [    0.054999] rcu_scheduler_active = 1, debug_locks = 1
> > > [    0.054999] 3 locks held by swapper/1:
> > > [    0.054999]  #0:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff814be933>] cpu_up+0x42/0x6a
> > > [    0.054999]  #1:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810400d8>] cpu_hotplug_begin+0x2a/0x51
> > > [    0.054999]  #2:  (&rq->lock){-.-...}, at: [<ffffffff814be2f7>] init_idle+0x2f/0x113
> > > [    0.054999]
> > > [    0.054999] stack backtrace:
> > > [    0.054999] Pid: 1, comm: swapper Not tainted 2.6.35 #1
> > > [    0.054999] Call Trace:
> > > [    0.054999]  [<ffffffff81068054>] lockdep_rcu_dereference+0x9b/0xa3
> > > [    0.054999]  [<ffffffff810325c3>] task_group+0x7b/0x8a
> > > [    0.054999]  [<ffffffff810325e5>] set_task_rq+0x13/0x40
> > > [    0.054999]  [<ffffffff814be39a>] init_idle+0xd2/0x113
> > > [    0.054999]  [<ffffffff814be78a>] fork_idle+0xb8/0xc7
> > > [    0.054999]  [<ffffffff81068717>] ? mark_held_locks+0x4d/0x6b
> > > [    0.054999]  [<ffffffff814bcebd>] do_fork_idle+0x17/0x2b
> > > [    0.054999]  [<ffffffff814bc89b>] native_cpu_up+0x1c1/0x724
> > > [    0.054999]  [<ffffffff814bcea6>] ? do_fork_idle+0x0/0x2b
> > > [    0.054999]  [<ffffffff814be876>] _cpu_up+0xac/0x127
> > > [    0.054999]  [<ffffffff814be946>] cpu_up+0x55/0x6a
> > > [    0.054999]  [<ffffffff81ab562a>] kernel_init+0xe1/0x1ff
> > > [    0.054999]  [<ffffffff81003854>] kernel_thread_helper+0x4/0x10
> > > [    0.054999]  [<ffffffff814c353c>] ? restore_args+0x0/0x30
> > > [    0.054999]  [<ffffffff81ab5549>] ? kernel_init+0x0/0x1ff
> > > [    0.054999]  [<ffffffff81003850>] ? kernel_thread_helper+0x0/0x10
> > > [    0.056074] Booting Node   0, Processors  #1lockdep: fixing up alternatives.
> > > [    0.130045]  #2lockdep: fixing up alternatives.
> > > [    0.203089]  #3 Ok.
> > > [    0.275286] Brought up 4 CPUs
> > > [    0.276005] Total of 4 processors activated (16017.17 BogoMIPS).
> > 
> > This does look like a new one, thank you for reporting it!
> > 
> > Here is my analysis, which should at least provide some humor value to
> > those who understand the code better than I do.  ;-)
> > 
> > So the corresponding rcu_dereference_check() is in
> > task_subsys_state_check(), and is fetching the cpu_cgroup_subsys_id
> > element of the newly created task's task->cgroups->subsys[] array.
> > The "git grep" command finds only three uses of cpu_cgroup_subsys_id,
> > but no definition.
> > 
> > Now, fork_idle() invokes copy_process(), which invokes cgroup_fork(),
> > which sets the child process's ->cgroups pointer to that of the parent,
> > also invoking get_css_set(), which increments the corresponding reference
> > count, doing both operations under task_lock() protection (->alloc_lock).
> > Because fork_idle() does not specify any of CLONE_NEWNS, CLONE_NEWUTS,
> > CLONE_NEWIPC, CLONE_NEWPID, or CLONE_NEWNET, copy_namespaces() should
> > not create a new namespace, and so there should be no ns_cgroup_clone().
> > We should thus retain the parent's ->cgroups pointer.  And copy_process()
> > installs the new task in the various lists, so that the task is externally
> > accessible upon return.
> > 
> > After a non-error return from copy_process(), fork_init() invokes
> > init_idle_pid(), which does not appear to affect the task's cgroup
> > state.  Next fork_init() invokes init_idle(), which in turn invokes
> > __set_task_cpu(), which invokes set_task_rq(), which calls task_group()
> > several times, which calls task_subsys_state_check(), which calls the
> > rcu_dereference_check() that complained above.
> > 
> > However, the result returns by rcu_dereference_check() is stored into
> > the task structure:
> > 
> > 	p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
> > 	p->se.parent = task_group(p)->se[cpu];
> > 
> > This means that the corresponding structure must have been tied down with
> > a reference count or some such.  If such a reference has been taken, then
> > this complaint is a false positive, and could be suppressed by putting
> > rcu_read_lock() and rcu_read_unlock() around the call to init_idle()
> > from fork_idle().  However, although, reference to the enclosing ->cgroups
> > struct css_set is held, it is not clear to me that this reference applies
> > to the structures pointed to by the ->subsys[] array, especially given
> > that the cgroup_subsys_state structures referenced by this array have
> > their own reference count, which does not appear to me to be acquired
> > by this code path.
> > 
> > Or are the cgroup_subsys_state structures referenced by idle tasks
> > never freed or some such?
> 
> I would hope so!, the idle tasks should be part of the root cgroup,
> which is not removable.
> 
> The problem is that while we do in-fact hold rq->lock, the newly spawned
> idle thread's cpu is not yet set to the correct cpu so the lockdep check
> in task_group():
> 
>   lockdep_is_held(&task_rq(p)->lock)
> 
> will fail.
> 
> But of a chicken and egg problem. Setting the cpu needs to have the cpu
> set ;-)

OK, makes sense to me.

> Ingo, why do we have rq->lock there at all? The CPU isn't up and running
> yet, nothing should be touching it.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  kernel/sched.c |   12 ++++++++++++
>  1 files changed, 12 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index bd8b487..6241049 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -5332,7 +5332,19 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
>  	idle->se.exec_start = sched_clock();
>  
>  	cpumask_copy(&idle->cpus_allowed, cpumask_of(cpu));
> +	/*
> +	 * We're having a chicken and egg problem, even though we are
> +	 * holding rq->lock, the cpu isn't yet set to this cpu so the
> +	 * lockdep check in task_group() will fail.
> +	 *
> +	 * Similar case to sched_fork(). / Alternatively we could
> +	 * use task_rq_lock() here and obtain the other rq->lock.
> +	 *
> +	 * Silence PROVE_RCU
> +	 */
> +	rcu_read_lock();
>  	__set_task_cpu(idle, cpu);
> +	rcu_read_unlock();
>  
>  	rq->curr = rq->idle = idle;
>  #if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
>
Subrata Modak - Sept. 16, 2010, 6:19 p.m.
On Thu, 2010-09-16 at 09:12 -0700, Paul E. McKenney wrote:
> On Thu, Sep 16, 2010 at 05:50:31PM +0200, Peter Zijlstra wrote:
> > On Mon, 2010-08-09 at 09:12 -0700, Paul E. McKenney wrote:
> > 
> > > > [    0.051203] CPU0: AMD QEMU Virtual CPU version 0.12.4 stepping 03
> > > > [    0.052999] lockdep: fixing up alternatives.
> > > > [    0.054105]
> > > > [    0.054106] ===================================================
> > > > [    0.054999] [ INFO: suspicious rcu_dereference_check() usage. ]
> > > > [    0.054999] ---------------------------------------------------
> > > > [    0.054999] kernel/sched.c:616 invoked rcu_dereference_check() without protection!
> > > > [    0.054999]
> > > > [    0.054999] other info that might help us debug this:
> > > > [    0.054999]
> > > > [    0.054999]
> > > > [    0.054999] rcu_scheduler_active = 1, debug_locks = 1
> > > > [    0.054999] 3 locks held by swapper/1:
> > > > [    0.054999]  #0:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff814be933>] cpu_up+0x42/0x6a
> > > > [    0.054999]  #1:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810400d8>] cpu_hotplug_begin+0x2a/0x51
> > > > [    0.054999]  #2:  (&rq->lock){-.-...}, at: [<ffffffff814be2f7>] init_idle+0x2f/0x113
> > > > [    0.054999]
> > > > [    0.054999] stack backtrace:
> > > > [    0.054999] Pid: 1, comm: swapper Not tainted 2.6.35 #1
> > > > [    0.054999] Call Trace:
> > > > [    0.054999]  [<ffffffff81068054>] lockdep_rcu_dereference+0x9b/0xa3
> > > > [    0.054999]  [<ffffffff810325c3>] task_group+0x7b/0x8a
> > > > [    0.054999]  [<ffffffff810325e5>] set_task_rq+0x13/0x40
> > > > [    0.054999]  [<ffffffff814be39a>] init_idle+0xd2/0x113
> > > > [    0.054999]  [<ffffffff814be78a>] fork_idle+0xb8/0xc7
> > > > [    0.054999]  [<ffffffff81068717>] ? mark_held_locks+0x4d/0x6b
> > > > [    0.054999]  [<ffffffff814bcebd>] do_fork_idle+0x17/0x2b
> > > > [    0.054999]  [<ffffffff814bc89b>] native_cpu_up+0x1c1/0x724
> > > > [    0.054999]  [<ffffffff814bcea6>] ? do_fork_idle+0x0/0x2b
> > > > [    0.054999]  [<ffffffff814be876>] _cpu_up+0xac/0x127
> > > > [    0.054999]  [<ffffffff814be946>] cpu_up+0x55/0x6a
> > > > [    0.054999]  [<ffffffff81ab562a>] kernel_init+0xe1/0x1ff
> > > > [    0.054999]  [<ffffffff81003854>] kernel_thread_helper+0x4/0x10
> > > > [    0.054999]  [<ffffffff814c353c>] ? restore_args+0x0/0x30
> > > > [    0.054999]  [<ffffffff81ab5549>] ? kernel_init+0x0/0x1ff
> > > > [    0.054999]  [<ffffffff81003850>] ? kernel_thread_helper+0x0/0x10
> > > > [    0.056074] Booting Node   0, Processors  #1lockdep: fixing up alternatives.
> > > > [    0.130045]  #2lockdep: fixing up alternatives.
> > > > [    0.203089]  #3 Ok.
> > > > [    0.275286] Brought up 4 CPUs
> > > > [    0.276005] Total of 4 processors activated (16017.17 BogoMIPS).
> > > 
> > > This does look like a new one, thank you for reporting it!
> > > 
> > > Here is my analysis, which should at least provide some humor value to
> > > those who understand the code better than I do.  ;-)
> > > 
> > > So the corresponding rcu_dereference_check() is in
> > > task_subsys_state_check(), and is fetching the cpu_cgroup_subsys_id
> > > element of the newly created task's task->cgroups->subsys[] array.
> > > The "git grep" command finds only three uses of cpu_cgroup_subsys_id,
> > > but no definition.
> > > 
> > > Now, fork_idle() invokes copy_process(), which invokes cgroup_fork(),
> > > which sets the child process's ->cgroups pointer to that of the parent,
> > > also invoking get_css_set(), which increments the corresponding reference
> > > count, doing both operations under task_lock() protection (->alloc_lock).
> > > Because fork_idle() does not specify any of CLONE_NEWNS, CLONE_NEWUTS,
> > > CLONE_NEWIPC, CLONE_NEWPID, or CLONE_NEWNET, copy_namespaces() should
> > > not create a new namespace, and so there should be no ns_cgroup_clone().
> > > We should thus retain the parent's ->cgroups pointer.  And copy_process()
> > > installs the new task in the various lists, so that the task is externally
> > > accessible upon return.
> > > 
> > > After a non-error return from copy_process(), fork_init() invokes
> > > init_idle_pid(), which does not appear to affect the task's cgroup
> > > state.  Next fork_init() invokes init_idle(), which in turn invokes
> > > __set_task_cpu(), which invokes set_task_rq(), which calls task_group()
> > > several times, which calls task_subsys_state_check(), which calls the
> > > rcu_dereference_check() that complained above.
> > > 
> > > However, the result returns by rcu_dereference_check() is stored into
> > > the task structure:
> > > 
> > > 	p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
> > > 	p->se.parent = task_group(p)->se[cpu];
> > > 
> > > This means that the corresponding structure must have been tied down with
> > > a reference count or some such.  If such a reference has been taken, then
> > > this complaint is a false positive, and could be suppressed by putting
> > > rcu_read_lock() and rcu_read_unlock() around the call to init_idle()
> > > from fork_idle().  However, although, reference to the enclosing ->cgroups
> > > struct css_set is held, it is not clear to me that this reference applies
> > > to the structures pointed to by the ->subsys[] array, especially given
> > > that the cgroup_subsys_state structures referenced by this array have
> > > their own reference count, which does not appear to me to be acquired
> > > by this code path.
> > > 
> > > Or are the cgroup_subsys_state structures referenced by idle tasks
> > > never freed or some such?
> > 
> > I would hope so!, the idle tasks should be part of the root cgroup,
> > which is not removable.
> > 
> > The problem is that while we do in-fact hold rq->lock, the newly spawned
> > idle thread's cpu is not yet set to the correct cpu so the lockdep check
> > in task_group():
> > 
> >   lockdep_is_held(&task_rq(p)->lock)
> > 
> > will fail.
> > 
> > But of a chicken and egg problem. Setting the cpu needs to have the cpu
> > set ;-)
> 
> OK, makes sense to me.
> 
> > Ingo, why do we have rq->lock there at all? The CPU isn't up and running
> > yet, nothing should be touching it.
> 
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

Thanks Peter for the fix.

Regards--
Subrata

> > ---
> >  kernel/sched.c |   12 ++++++++++++
> >  1 files changed, 12 insertions(+), 0 deletions(-)
> > 
> > diff --git a/kernel/sched.c b/kernel/sched.c
> > index bd8b487..6241049 100644
> > --- a/kernel/sched.c
> > +++ b/kernel/sched.c
> > @@ -5332,7 +5332,19 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
> >  	idle->se.exec_start = sched_clock();
> >  
> >  	cpumask_copy(&idle->cpus_allowed, cpumask_of(cpu));
> > +	/*
> > +	 * We're having a chicken and egg problem, even though we are
> > +	 * holding rq->lock, the cpu isn't yet set to this cpu so the
> > +	 * lockdep check in task_group() will fail.
> > +	 *
> > +	 * Similar case to sched_fork(). / Alternatively we could
> > +	 * use task_rq_lock() here and obtain the other rq->lock.
> > +	 *
> > +	 * Silence PROVE_RCU
> > +	 */
> > +	rcu_read_lock();
> >  	__set_task_cpu(idle, cpu);
> > +	rcu_read_unlock();
> >  
> >  	rq->curr = rq->idle = idle;
> >  #if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
> >

Patch

diff --git a/kernel/sched.c b/kernel/sched.c
index bd8b487..6241049 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5332,7 +5332,19 @@  void __cpuinit init_idle(struct task_struct *idle, int cpu)
 	idle->se.exec_start = sched_clock();
 
 	cpumask_copy(&idle->cpus_allowed, cpumask_of(cpu));
+	/*
+	 * We're having a chicken and egg problem, even though we are
+	 * holding rq->lock, the cpu isn't yet set to this cpu so the
+	 * lockdep check in task_group() will fail.
+	 *
+	 * Similar case to sched_fork(). / Alternatively we could
+	 * use task_rq_lock() here and obtain the other rq->lock.
+	 *
+	 * Silence PROVE_RCU
+	 */
+	rcu_read_lock();
 	__set_task_cpu(idle, cpu);
+	rcu_read_unlock();
 
 	rq->curr = rq->idle = idle;
 #if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)