diff mbox

rebalance locks by converting write_lock_bh to write_lock+local_bh_disable

Message ID 20131121235402.GA11774@opentech.at
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Nicholas Mc Guire Nov. 21, 2013, 11:54 p.m. UTC
From 2c8e669b691b825c0ed2a02bd7a698d8ed5c6d29 Mon Sep 17 00:00:00 2001
From: Nicholas Mc Guire <der.herr@hofr.at>
Date: Thu, 21 Nov 2013 18:22:55 -0500
Subject: [PATCH] rebalance locks by converting write_lock_bh to write_lock+local_bh_disable
 

 in __neigh_event_send write_lock_bh(&neigh->lock) is implicitly balanced by
 write_unlock(&neigh->lock)+local_bh_disable() - while this is equivalent with
 respect to the effective low level locking primitives it breaks balancing
 in the locking api. This makes automatic lock-checking trigger false 
 positives, creates an implicit dependency between *_lock_bh and *_lock 
 functions as well as making the extremly simply locking of net core even
 easier to understand.

 The api inbalance was introduced in:
 commit cd28ca0a3dd17c68d24b839602a0e6268ad28b5d
 Author: Eric Dumazet <eric.dumazet@gmail.com>
 This patch just rebalances the lock api

 No change of functionality

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
---
 net/core/neighbour.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

Comments

David Miller Nov. 23, 2013, 10:39 p.m. UTC | #1
From: Nicholas Mc Guire <der.herr@hofr.at>
Date: Fri, 22 Nov 2013 00:54:02 +0100

> From 2c8e669b691b825c0ed2a02bd7a698d8ed5c6d29 Mon Sep 17 00:00:00 2001
> From: Nicholas Mc Guire <der.herr@hofr.at>
> Date: Thu, 21 Nov 2013 18:22:55 -0500
> Subject: [PATCH] rebalance locks by converting write_lock_bh to write_lock+local_bh_disable
>  
> 
>  in __neigh_event_send write_lock_bh(&neigh->lock) is implicitly balanced by
>  write_unlock(&neigh->lock)+local_bh_disable() - while this is equivalent with
>  respect to the effective low level locking primitives it breaks balancing
>  in the locking api. This makes automatic lock-checking trigger false 
>  positives, creates an implicit dependency between *_lock_bh and *_lock 
>  functions as well as making the extremly simply locking of net core even
>  easier to understand.
> 
>  The api inbalance was introduced in:
>  commit cd28ca0a3dd17c68d24b839602a0e6268ad28b5d
>  Author: Eric Dumazet <eric.dumazet@gmail.com>
>  This patch just rebalances the lock api
> 
>  No change of functionality
> 
> Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>

This is a valid locking idiom, fix the lock checking.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas Mc Guire Nov. 23, 2013, 11:58 p.m. UTC | #2
On Sat, 23 Nov 2013, David Miller wrote:

> From: Nicholas Mc Guire <der.herr@hofr.at>
> Date: Fri, 22 Nov 2013 00:54:02 +0100
> 
> > From 2c8e669b691b825c0ed2a02bd7a698d8ed5c6d29 Mon Sep 17 00:00:00 2001
> > From: Nicholas Mc Guire <der.herr@hofr.at>
> > Date: Thu, 21 Nov 2013 18:22:55 -0500
> > Subject: [PATCH] rebalance locks by converting write_lock_bh to write_lock+local_bh_disable
> >  
> > 
> >  in __neigh_event_send write_lock_bh(&neigh->lock) is implicitly balanced by
> >  write_unlock(&neigh->lock)+local_bh_disable() - while this is equivalent with
> >  respect to the effective low level locking primitives it breaks balancing
> >  in the locking api. This makes automatic lock-checking trigger false 
> >  positives, creates an implicit dependency between *_lock_bh and *_lock 
> >  functions as well as making the extremly simply locking of net core even
> >  easier to understand.
> > 
> >  The api inbalance was introduced in:
> >  commit cd28ca0a3dd17c68d24b839602a0e6268ad28b5d
> >  Author: Eric Dumazet <eric.dumazet@gmail.com>
> >  This patch just rebalances the lock api
> > 
> >  No change of functionality
> > 
> > Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
> 
> This is a valid locking idiom, fix the lock checking.

for lock checking that is doable but what is with the api coupling and 
readability ?

any change you do to the spin_lock_bh/spin_unlock_bh side would need to also
take care of the spin_lock/spin_unlock variance and keep them functionally 
equivalent - currently there is a very small number of such inbalances in
place it seems (scan of 3.12.1 found 1 write_lock/write_lock_bh, 
2 spin_lock/spin_lock_bh, 0 in read_lock/read_lock_bh) so is this idiomatic extension sensible given that it introduces implicit api-coupling ?

in one of the cases I do not understand the intent behind the split:
in net/core/sock.c:lock_sock_fast

	spin_lock_bh(&sk->sk_lock.slock);
	...
        spin_unlock(&sk->sk_lock.slock);
        /*
         * The sk_lock has mutex_lock() semantics here:
         */
        mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);
        local_bh_enable();

 I think that 

	spin_lock_bh(&sk->sk_lock.slock);
	...
        /*
         * The sk_lock has mutex_lock() semantics here:
         */
        mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);
        spin_unlock_bh(&sk->sk_lock.slock);

 should be equivalent ?

thx!
hofrat
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight Nov. 25, 2013, 9:40 a.m. UTC | #3
> in one of the cases I do not understand the intent behind the split:
> in net/core/sock.c:lock_sock_fast
> 
> 	spin_lock_bh(&sk->sk_lock.slock);
> 	...
>         spin_unlock(&sk->sk_lock.slock);
>         /*
>          * The sk_lock has mutex_lock() semantics here:
>          */
>         mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);
>         local_bh_enable();
> 
>  I think that
> 
> 	spin_lock_bh(&sk->sk_lock.slock);
> 	...
>         /*
>          * The sk_lock has mutex_lock() semantics here:
>          */
>         mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);
>         spin_unlock_bh(&sk->sk_lock.slock);
> 
>  should be equivalent ?

You've added a lock ordering that wasn't there before.
Also I suspect that mutex_acquire() might be allowed to sleep,
whereas you shouldn't sleep with a spin lock held.

	David



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas Mc Guire Nov. 25, 2013, 10:37 a.m. UTC | #4
On Mon, 25 Nov 2013, David Laight wrote:

> > in one of the cases I do not understand the intent behind the split:
> > in net/core/sock.c:lock_sock_fast
> > 
> > 	spin_lock_bh(&sk->sk_lock.slock);
> > 	...
> >         spin_unlock(&sk->sk_lock.slock);
> >         /*
> >          * The sk_lock has mutex_lock() semantics here:
> >          */
> >         mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);
> >         local_bh_enable();
> > 
> >  I think that
> > 
> > 	spin_lock_bh(&sk->sk_lock.slock);
> > 	...
> >         /*
> >          * The sk_lock has mutex_lock() semantics here:
> >          */
> >         mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);
> >         spin_unlock_bh(&sk->sk_lock.slock);
> > 
> >  should be equivalent ?
> 
> You've added a lock ordering that wasn't there before.
> Also I suspect that mutex_acquire() might be allowed to sleep,
> whereas you shouldn't sleep with a spin lock held.
>
mutex_acquire is not a lock but a lockdep entry. 
As far as I understand it it nither can sleep nor is there a change of 
order here. Am I missing something ?

thx!
hofrat
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index ca15f32..d681c75 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -966,7 +966,8 @@  int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
 	int rc;
 	bool immediate_probe = false;
 
-	write_lock_bh(&neigh->lock);
+	local_bh_disable();
+	write_lock(&neigh->lock);
 
 	rc = 0;
 	if (neigh->nud_state & (NUD_CONNECTED | NUD_DELAY | NUD_PROBE))