diff mbox

Weird rcu lockdep warning

Message ID 20100414014930.GI2538@linux.vnet.ibm.com
State RFC
Delegated to: David Miller
Headers show

Commit Message

Paul E. McKenney April 14, 2010, 1:49 a.m. UTC
On Tue, Apr 13, 2010 at 05:13:06PM -0700, David Miller wrote:
> From: Frederic Weisbecker <fweisbec@gmail.com>
> Date: Wed, 14 Apr 2010 02:02:27 +0200
> 
> > I just have a guess though....
> > This seems to always happen from NMI path, and lockdep is disabled on NMI.
> > I suspect the lock_acquire() performed by rcu_read_lock() is just ignored
> > and then the rcu_read_lock_held() check has the wrong result...
> 
> Yeah, I bet that's it too.
> 
> lock_is_held() can't return anything meaningful while lockdep is
> disabled, which it is during NMIs.

Ah!  So I just need to add a "current->lockdep_recursion"
check to debug_lockdep_rcu_enabled().  And move the function to
kernel/rcutree_plugin.h to avoid #include hell.

See below for (untested) patch.

						Thanx, Paul

------------------------------------------------------------------------

 include/linux/rcupdate.h |    5 +----
 kernel/rcutree_plugin.h  |   11 +++++++++++
 2 files changed, 12 insertions(+), 4 deletions(-)

commit 304d8da6cd791a81ce3164f867e1b3ef4f9af1d1
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Apr 13 18:45:51 2010 -0700

    rcu: Make RCU lockdep check the lockdep_recursion variable
    
    The lockdep facility temporarily disables lockdep checking by incrementing
    the current->lockdep_recursion variable.  Such disabling happens in NMIs
    and in other situations where lockdep might expect to recurse on itself.
    This patch therefore checks current->lockdep_recursion, disabling RCU
    lockdep splats when this variable is non-zero.
    
    Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
    Reported-by: David Miller <davem@davemloft.net>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller April 14, 2010, 1:51 a.m. UTC | #1
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Tue, 13 Apr 2010 18:49:30 -0700

> On Tue, Apr 13, 2010 at 05:13:06PM -0700, David Miller wrote:
>> From: Frederic Weisbecker <fweisbec@gmail.com>
>> Date: Wed, 14 Apr 2010 02:02:27 +0200
>> 
>> > I just have a guess though....
>> > This seems to always happen from NMI path, and lockdep is disabled on NMI.
>> > I suspect the lock_acquire() performed by rcu_read_lock() is just ignored
>> > and then the rcu_read_lock_held() check has the wrong result...
>> 
>> Yeah, I bet that's it too.
>> 
>> lock_is_held() can't return anything meaningful while lockdep is
>> disabled, which it is during NMIs.
> 
> Ah!  So I just need to add a "current->lockdep_recursion"
> check to debug_lockdep_rcu_enabled().  And move the function to
> kernel/rcutree_plugin.h to avoid #include hell.
> 
> See below for (untested) patch.

Looks good to me, Frederic please test :-)
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lai Jiangshan April 14, 2010, 3:34 a.m. UTC | #2
Paul E. McKenney wrote:
> On Tue, Apr 13, 2010 at 05:13:06PM -0700, David Miller wrote:
>> From: Frederic Weisbecker <fweisbec@gmail.com>
>> Date: Wed, 14 Apr 2010 02:02:27 +0200
>>
>>> I just have a guess though....
>>> This seems to always happen from NMI path, and lockdep is disabled on NMI.
>>> I suspect the lock_acquire() performed by rcu_read_lock() is just ignored
>>> and then the rcu_read_lock_held() check has the wrong result...
>> Yeah, I bet that's it too.
>>
>> lock_is_held() can't return anything meaningful while lockdep is
>> disabled, which it is during NMIs.
> 
> Ah!  So I just need to add a "current->lockdep_recursion"
> check to debug_lockdep_rcu_enabled().  And move the function to
> kernel/rcutree_plugin.h to avoid #include hell.
> 
> See below for (untested) patch.
> 
> 						Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
>  include/linux/rcupdate.h |    5 +----
>  kernel/rcutree_plugin.h  |   11 +++++++++++
>  2 files changed, 12 insertions(+), 4 deletions(-)
> 
> commit 304d8da6cd791a81ce3164f867e1b3ef4f9af1d1
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Tue Apr 13 18:45:51 2010 -0700
> 
>     rcu: Make RCU lockdep check the lockdep_recursion variable
>     
>     The lockdep facility temporarily disables lockdep checking by incrementing
>     the current->lockdep_recursion variable.  Such disabling happens in NMIs
>     and in other situations where lockdep might expect to recurse on itself.
>     This patch therefore checks current->lockdep_recursion, disabling RCU
>     lockdep splats when this variable is non-zero.
>     
>     Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
>     Reported-by: David Miller <davem@davemloft.net>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 9f1ddfe..07db2fe 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -101,10 +101,7 @@ extern struct lockdep_map rcu_sched_lock_map;
>  # define rcu_read_release_sched() \
>  		lock_release(&rcu_sched_lock_map, 1, _THIS_IP_)
>  
> -static inline int debug_lockdep_rcu_enabled(void)
> -{
> -	return likely(rcu_scheduler_active && debug_locks);
> -}
> +extern int debug_lockdep_rcu_enabled(void);
>  
>  /**
>   * rcu_read_lock_held - might we be in RCU read-side critical section?
> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> index 79b53bd..2169abe 100644
> --- a/kernel/rcutree_plugin.h
> +++ b/kernel/rcutree_plugin.h
> @@ -1067,3 +1067,14 @@ static void rcu_needs_cpu_flush(void)
>  }
>  
>  #endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
> +
> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +
> +int debug_lockdep_rcu_enabled(void)
> +{
> +	return likely(rcu_scheduler_active &&
> +		      debug_locks &&
> +		      current->lockdep_recursion == 0);
> +}
> +

Looks good to me too, but I think
'likely' is needless since the function is not inline.

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney April 14, 2010, 3:43 p.m. UTC | #3
On Wed, Apr 14, 2010 at 11:34:33AM +0800, Lai Jiangshan wrote:
> Paul E. McKenney wrote:
> > On Tue, Apr 13, 2010 at 05:13:06PM -0700, David Miller wrote:
> >> From: Frederic Weisbecker <fweisbec@gmail.com>
> >> Date: Wed, 14 Apr 2010 02:02:27 +0200
> >>
> >>> I just have a guess though....
> >>> This seems to always happen from NMI path, and lockdep is disabled on NMI.
> >>> I suspect the lock_acquire() performed by rcu_read_lock() is just ignored
> >>> and then the rcu_read_lock_held() check has the wrong result...
> >> Yeah, I bet that's it too.
> >>
> >> lock_is_held() can't return anything meaningful while lockdep is
> >> disabled, which it is during NMIs.
> > 
> > Ah!  So I just need to add a "current->lockdep_recursion"
> > check to debug_lockdep_rcu_enabled().  And move the function to
> > kernel/rcutree_plugin.h to avoid #include hell.
> > 
> > See below for (untested) patch.
> > 
> > 						Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> >  include/linux/rcupdate.h |    5 +----
> >  kernel/rcutree_plugin.h  |   11 +++++++++++
> >  2 files changed, 12 insertions(+), 4 deletions(-)
> > 
> > commit 304d8da6cd791a81ce3164f867e1b3ef4f9af1d1
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date:   Tue Apr 13 18:45:51 2010 -0700
> > 
> >     rcu: Make RCU lockdep check the lockdep_recursion variable
> >     
> >     The lockdep facility temporarily disables lockdep checking by incrementing
> >     the current->lockdep_recursion variable.  Such disabling happens in NMIs
> >     and in other situations where lockdep might expect to recurse on itself.
> >     This patch therefore checks current->lockdep_recursion, disabling RCU
> >     lockdep splats when this variable is non-zero.
> >     
> >     Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
> >     Reported-by: David Miller <davem@davemloft.net>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 9f1ddfe..07db2fe 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -101,10 +101,7 @@ extern struct lockdep_map rcu_sched_lock_map;
> >  # define rcu_read_release_sched() \
> >  		lock_release(&rcu_sched_lock_map, 1, _THIS_IP_)
> >  
> > -static inline int debug_lockdep_rcu_enabled(void)
> > -{
> > -	return likely(rcu_scheduler_active && debug_locks);
> > -}
> > +extern int debug_lockdep_rcu_enabled(void);
> >  
> >  /**
> >   * rcu_read_lock_held - might we be in RCU read-side critical section?
> > diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> > index 79b53bd..2169abe 100644
> > --- a/kernel/rcutree_plugin.h
> > +++ b/kernel/rcutree_plugin.h
> > @@ -1067,3 +1067,14 @@ static void rcu_needs_cpu_flush(void)
> >  }
> >  
> >  #endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
> > +
> > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > +
> > +int debug_lockdep_rcu_enabled(void)
> > +{
> > +	return likely(rcu_scheduler_active &&
> > +		      debug_locks &&
> > +		      current->lockdep_recursion == 0);
> > +}
> > +
> 
> Looks good to me too, but I think
> 'likely' is needless since the function is not inline.

Good point.  And to add injury to insult, I forgot EXPORT_SYMBOL_GPL().

Updated patch in the works.

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Frédéric Weisbecker April 14, 2010, 3:51 p.m. UTC | #4
On Wed, Apr 14, 2010 at 08:43:02AM -0700, Paul E. McKenney wrote:
> On Wed, Apr 14, 2010 at 11:34:33AM +0800, Lai Jiangshan wrote:
> > Paul E. McKenney wrote:
> > > On Tue, Apr 13, 2010 at 05:13:06PM -0700, David Miller wrote:
> > >> From: Frederic Weisbecker <fweisbec@gmail.com>
> > >> Date: Wed, 14 Apr 2010 02:02:27 +0200
> > >>
> > >>> I just have a guess though....
> > >>> This seems to always happen from NMI path, and lockdep is disabled on NMI.
> > >>> I suspect the lock_acquire() performed by rcu_read_lock() is just ignored
> > >>> and then the rcu_read_lock_held() check has the wrong result...
> > >> Yeah, I bet that's it too.
> > >>
> > >> lock_is_held() can't return anything meaningful while lockdep is
> > >> disabled, which it is during NMIs.
> > > 
> > > Ah!  So I just need to add a "current->lockdep_recursion"
> > > check to debug_lockdep_rcu_enabled().  And move the function to
> > > kernel/rcutree_plugin.h to avoid #include hell.
> > > 
> > > See below for (untested) patch.
> > > 
> > > 						Thanx, Paul
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > >  include/linux/rcupdate.h |    5 +----
> > >  kernel/rcutree_plugin.h  |   11 +++++++++++
> > >  2 files changed, 12 insertions(+), 4 deletions(-)
> > > 
> > > commit 304d8da6cd791a81ce3164f867e1b3ef4f9af1d1
> > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > Date:   Tue Apr 13 18:45:51 2010 -0700
> > > 
> > >     rcu: Make RCU lockdep check the lockdep_recursion variable
> > >     
> > >     The lockdep facility temporarily disables lockdep checking by incrementing
> > >     the current->lockdep_recursion variable.  Such disabling happens in NMIs
> > >     and in other situations where lockdep might expect to recurse on itself.
> > >     This patch therefore checks current->lockdep_recursion, disabling RCU
> > >     lockdep splats when this variable is non-zero.
> > >     
> > >     Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
> > >     Reported-by: David Miller <davem@davemloft.net>
> > >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > 
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index 9f1ddfe..07db2fe 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -101,10 +101,7 @@ extern struct lockdep_map rcu_sched_lock_map;
> > >  # define rcu_read_release_sched() \
> > >  		lock_release(&rcu_sched_lock_map, 1, _THIS_IP_)
> > >  
> > > -static inline int debug_lockdep_rcu_enabled(void)
> > > -{
> > > -	return likely(rcu_scheduler_active && debug_locks);
> > > -}
> > > +extern int debug_lockdep_rcu_enabled(void);
> > >  
> > >  /**
> > >   * rcu_read_lock_held - might we be in RCU read-side critical section?
> > > diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> > > index 79b53bd..2169abe 100644
> > > --- a/kernel/rcutree_plugin.h
> > > +++ b/kernel/rcutree_plugin.h
> > > @@ -1067,3 +1067,14 @@ static void rcu_needs_cpu_flush(void)
> > >  }
> > >  
> > >  #endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
> > > +
> > > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > > +
> > > +int debug_lockdep_rcu_enabled(void)
> > > +{
> > > +	return likely(rcu_scheduler_active &&
> > > +		      debug_locks &&
> > > +		      current->lockdep_recursion == 0);
> > > +}
> > > +
> > 
> > Looks good to me too, but I think
> > 'likely' is needless since the function is not inline.
> 
> Good point.  And to add injury to insult, I forgot EXPORT_SYMBOL_GPL().
> 
> Updated patch in the works.


Note I just tested the patch the previous one and it looks fine now.
You can then safely consider the "general idea" fixes the problem :)

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney April 14, 2010, 4 p.m. UTC | #5
On Wed, Apr 14, 2010 at 05:51:11PM +0200, Frederic Weisbecker wrote:
> On Wed, Apr 14, 2010 at 08:43:02AM -0700, Paul E. McKenney wrote:
> > On Wed, Apr 14, 2010 at 11:34:33AM +0800, Lai Jiangshan wrote:
> > > Paul E. McKenney wrote:
> > > > On Tue, Apr 13, 2010 at 05:13:06PM -0700, David Miller wrote:
> > > >> From: Frederic Weisbecker <fweisbec@gmail.com>
> > > >> Date: Wed, 14 Apr 2010 02:02:27 +0200
> > > >>
> > > >>> I just have a guess though....
> > > >>> This seems to always happen from NMI path, and lockdep is disabled on NMI.
> > > >>> I suspect the lock_acquire() performed by rcu_read_lock() is just ignored
> > > >>> and then the rcu_read_lock_held() check has the wrong result...
> > > >> Yeah, I bet that's it too.
> > > >>
> > > >> lock_is_held() can't return anything meaningful while lockdep is
> > > >> disabled, which it is during NMIs.
> > > > 
> > > > Ah!  So I just need to add a "current->lockdep_recursion"
> > > > check to debug_lockdep_rcu_enabled().  And move the function to
> > > > kernel/rcutree_plugin.h to avoid #include hell.
> > > > 
> > > > See below for (untested) patch.
> > > > 
> > > > 						Thanx, Paul
> > > > 
> > > > ------------------------------------------------------------------------
> > > > 
> > > >  include/linux/rcupdate.h |    5 +----
> > > >  kernel/rcutree_plugin.h  |   11 +++++++++++
> > > >  2 files changed, 12 insertions(+), 4 deletions(-)
> > > > 
> > > > commit 304d8da6cd791a81ce3164f867e1b3ef4f9af1d1
> > > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > Date:   Tue Apr 13 18:45:51 2010 -0700
> > > > 
> > > >     rcu: Make RCU lockdep check the lockdep_recursion variable
> > > >     
> > > >     The lockdep facility temporarily disables lockdep checking by incrementing
> > > >     the current->lockdep_recursion variable.  Such disabling happens in NMIs
> > > >     and in other situations where lockdep might expect to recurse on itself.
> > > >     This patch therefore checks current->lockdep_recursion, disabling RCU
> > > >     lockdep splats when this variable is non-zero.
> > > >     
> > > >     Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > >     Reported-by: David Miller <davem@davemloft.net>
> > > >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > 
> > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > > index 9f1ddfe..07db2fe 100644
> > > > --- a/include/linux/rcupdate.h
> > > > +++ b/include/linux/rcupdate.h
> > > > @@ -101,10 +101,7 @@ extern struct lockdep_map rcu_sched_lock_map;
> > > >  # define rcu_read_release_sched() \
> > > >  		lock_release(&rcu_sched_lock_map, 1, _THIS_IP_)
> > > >  
> > > > -static inline int debug_lockdep_rcu_enabled(void)
> > > > -{
> > > > -	return likely(rcu_scheduler_active && debug_locks);
> > > > -}
> > > > +extern int debug_lockdep_rcu_enabled(void);
> > > >  
> > > >  /**
> > > >   * rcu_read_lock_held - might we be in RCU read-side critical section?
> > > > diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> > > > index 79b53bd..2169abe 100644
> > > > --- a/kernel/rcutree_plugin.h
> > > > +++ b/kernel/rcutree_plugin.h
> > > > @@ -1067,3 +1067,14 @@ static void rcu_needs_cpu_flush(void)
> > > >  }
> > > >  
> > > >  #endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
> > > > +
> > > > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > > > +
> > > > +int debug_lockdep_rcu_enabled(void)
> > > > +{
> > > > +	return likely(rcu_scheduler_active &&
> > > > +		      debug_locks &&
> > > > +		      current->lockdep_recursion == 0);
> > > > +}
> > > > +
> > > 
> > > Looks good to me too, but I think
> > > 'likely' is needless since the function is not inline.
> > 
> > Good point.  And to add injury to insult, I forgot EXPORT_SYMBOL_GPL().
> > 
> > Updated patch in the works.
> 
> 
> Note I just tested the patch the previous one and it looks fine now.
> You can then safely consider the "general idea" fixes the problem :)

Thank you, Frederic!!!

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 9f1ddfe..07db2fe 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -101,10 +101,7 @@  extern struct lockdep_map rcu_sched_lock_map;
 # define rcu_read_release_sched() \
 		lock_release(&rcu_sched_lock_map, 1, _THIS_IP_)
 
-static inline int debug_lockdep_rcu_enabled(void)
-{
-	return likely(rcu_scheduler_active && debug_locks);
-}
+extern int debug_lockdep_rcu_enabled(void);
 
 /**
  * rcu_read_lock_held - might we be in RCU read-side critical section?
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 79b53bd..2169abe 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1067,3 +1067,14 @@  static void rcu_needs_cpu_flush(void)
 }
 
 #endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+
+int debug_lockdep_rcu_enabled(void)
+{
+	return likely(rcu_scheduler_active &&
+		      debug_locks &&
+		      current->lockdep_recursion == 0);
+}
+
+#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */