[net-next] rhashtable: Lower/upper bucket may map to same lock while shrinking

Message ID 20150112235821.GB16617@casper.infradead.org
State Accepted, archived
Delegated to: David Miller

Commit Message

Thomas Graf Jan. 12, 2015, 11:58 p.m. UTC
Each per bucket lock covers a configurable number of buckets. While
shrinking, two buckets in the old table contain entries for a single
bucket in the new table. We need to lock down both while linking.
Check if they are protected by different locks to avoid a recursive
lock.

Fixes: 97defe1e ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 lib/rhashtable.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

David Miller Jan. 13, 2015, 5:25 a.m. UTC | #1
From: Thomas Graf <tgraf@suug.ch>
Date: Mon, 12 Jan 2015 23:58:21 +0000

> Each per bucket lock covers a configurable number of buckets. While
> shrinking, two buckets in the old table contain entries for a single
> bucket in the new table. We need to lock down both while linking.
> Check if they are protected by different locks to avoid a recursive
> lock.
> 
> Fixes: 97defe1e ("rhashtable: Per bucket locks & deferred expansion/shrinking")
> Reported-by: Fengguang Wu <fengguang.wu@intel.com>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>

Applied.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight Jan. 13, 2015, 9:49 a.m. UTC | #2
From: Thomas Graf
> Each per bucket lock covers a configurable number of buckets. While
> shrinking, two buckets in the old table contain entries for a single
> bucket in the new table. We need to lock down both while linking.
> Check if they are protected by different locks to avoid a recursive
> lock.

Thought, could the shrunk table use the same locks as the lower half
of the old table?

I also wonder whether shrinking hash tables is ever actually worth
the effort. Most likely they'll need to grow again very quickly.

>  		spin_lock_bh(old_bucket_lock1);
> -		spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
> -		spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
> +
> +		/* Depending on the lock per buckets mapping, the bucket in
> +		 * the lower and upper region may map to the same lock.
> +		 */
> +		if (old_bucket_lock1 != old_bucket_lock2) {
> +			spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
> +			spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
> +		} else {
> +			spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED);
> +		}

Acquiring 3 locks of much the same type looks like a locking hierarchy
violation just waiting to happen.

	David

Thomas Graf Jan. 13, 2015, 11:25 a.m. UTC | #3
On 01/13/15 at 09:49am, David Laight wrote:
> From: Thomas Graf
> > Each per bucket lock covers a configurable number of buckets. While
> > shrinking, two buckets in the old table contain entries for a single
> > bucket in the new table. We need to lock down both while linking.
> > Check if they are protected by different locks to avoid a recursive
> > lock.
> 
> Thought, could the shrunk table use the same locks as the lower half
> of the old table?

No. A new bucket table and thus a new set of locks is allocated when the
table is shrunk or grown. We only have to check for overlapping locks
when holding multiple locks for the same table at the same time.

> I also wonder whether shrinking hash tables is ever actually worth
> the effort. Most likely they'll need to grow again very quickly.

Specifying a .shrink_decision function is optional so every rhashtable
user can decide whether it wants shrinking or not. The need for it was
expressed in past threads.

Also, the case of multiple buckets mapping to the same lock is present
in the expanding logic as well, so removing the shrinking logic would
not remove the need for these checks.

> >  		spin_lock_bh(old_bucket_lock1);
> > -		spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
> > -		spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
> > +
> > +		/* Depending on the lock per buckets mapping, the bucket in
> > +		 * the lower and upper region may map to the same lock.
> > +		 */
> > +		if (old_bucket_lock1 != old_bucket_lock2) {
> > +			spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
> > +			spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
> > +		} else {
> > +			spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED);
> > +		}
> 
> Acquiring 3 locks of much the same type looks like a locking hierarchy
> violation just waiting to happen.

I'm not claiming it's extremely pretty; lockless lookup with deferred
resizing doesn't come for free ;-) If you have a suggestion on how to
implement this differently I'm all ears. That said, it's well isolated
and the user of rhashtable does not have to deal with it. All code paths
which take multiple locks are mutually exclusive to each other (ht->mutex).
David Laight Jan. 13, 2015, 3 p.m. UTC | #4
From: Thomas Graf
...
> > Thought, could the shrunk table use the same locks as the lower half
> > of the old table?
> 
> No. A new bucket table and thus a new set of locks is allocated when the
> table is shrunk or grown. We only have to check for overlapping locks
> when holding multiple locks for the same table at the same time.

I was guessing that when locks are shared, buckets k and 2^n+k use the
same lock.
Under those conditions, if 'grow' decided not to allocate extra locks,
it could save work by reusing exactly the same locks as the old table.
Similarly, 'shrink' could do the reverse.

It was only a thought.

	David

David Laight Jan. 13, 2015, 3:06 p.m. UTC | #5
From: Thomas Graf
...
> > >  		spin_lock_bh(old_bucket_lock1);
> > > -		spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
> > > -		spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
> > > +
> > > +		/* Depending on the lock per buckets mapping, the bucket in
> > > +		 * the lower and upper region may map to the same lock.
> > > +		 */
> > > +		if (old_bucket_lock1 != old_bucket_lock2) {
> > > +			spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
> > > +			spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
> > > +		} else {
> > > +			spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED);
> > > +		}
> >
> > Acquiring 3 locks of much the same type looks like a locking hierarchy
> > violation just waiting to happen.
> 
> I'm not claiming it's extremely pretty; lockless lookup with deferred
> resizing doesn't come for free ;-) If you have a suggestion on how to
> implement this differently I'm all ears.

runs away....

> That said, it's well isolated
> and the user of rhashtable does not have to deal with it. All code paths
> which take multiple locks are mutually exclusive to each other (ht->mutex).

OK, ht->mutex saves the day.
Might be worth a comment to save people looking at the code in isolation
from worrying and doing a bit of searching.
OTOH it might be obvious from a slightly larger fragment than the diff.

	David

Thomas Graf Jan. 13, 2015, 3:56 p.m. UTC | #6
On 01/13/15 at 03:06pm, David Laight wrote:
> OK, ht->mutex saves the day.
> Might be worth a comment to save people looking at the code in isolation
> from worrying and doing a bit of searching.
> OTOH it might be obvious from a slightly larger fragment than the diff.

Good idea. Will do this. Also, thanks for the suggestion in the other
email. I now understand what you meant.
Patch

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 8023b55..45477f7 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -443,8 +443,16 @@  int rhashtable_shrink(struct rhashtable *ht)
 		new_bucket_lock = bucket_lock(new_tbl, new_hash);
 
 		spin_lock_bh(old_bucket_lock1);
-		spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
-		spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
+
+		/* Depending on the lock per buckets mapping, the bucket in
+		 * the lower and upper region may map to the same lock.
+		 */
+		if (old_bucket_lock1 != old_bucket_lock2) {
+			spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
+			spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
+		} else {
+			spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED);
+		}
 
 		rcu_assign_pointer(*bucket_tail(new_tbl, new_hash),
 				   tbl->buckets[new_hash]);
@@ -452,7 +460,8 @@  int rhashtable_shrink(struct rhashtable *ht)
 				   tbl->buckets[new_hash + new_tbl->size]);
 
 		spin_unlock_bh(new_bucket_lock);
-		spin_unlock_bh(old_bucket_lock2);
+		if (old_bucket_lock1 != old_bucket_lock2)
+			spin_unlock_bh(old_bucket_lock2);
 		spin_unlock_bh(old_bucket_lock1);
 	}