From patchwork Mon Mar 23 13:50:26 2015
From: Herbert Xu
Subject: [v3 PATCH 7/9] rhashtable: Add multiple rehash support
To: "David S. Miller", Thomas Graf, Eric Dumazet, Patrick McHardy,
    Josh Triplett, "Paul E. McKenney", netdev@vger.kernel.org
Date: Tue, 24 Mar 2015 00:50:26 +1100
References: <20150323134955.GA16328@gondor.apana.org.au>
X-Patchwork-Id: 453479
X-Patchwork-Delegate: davem@davemloft.net

This patch adds the missing bits to allow multiple rehashes. The
read side as well as removal already handle this correctly, so only
the rehasher and insertion need to be modified to cope with it. Note
that this patch doesn't actually enable multiple rehashes; for now
rehashing is still only performed by the worker thread.

This patch also disables the explicit expand/shrink interface because
the table is meant to expand and shrink automatically, and continuing
to export these interfaces unnecessarily complicates the life of the
rehasher since the rehash process is now composed of two parts.
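To illustrate, the new insertion rule boils down to the following
walk, shown here standalone (this is just the loop from the
__rhashtable_insert_fast hunk below): tbl->rehash records the next
bucket the rehasher will move, so a bucket whose hash is at or above
it still lives in this table, and insertion must take the bucket
lock in the oldest such table.

	/* Illustrative sketch only -- see the hunk in
	 * __rhashtable_insert_fast below for the real thing.
	 * Walk the chain of tables, oldest first, until we hold
	 * the bucket lock in a table whose bucket has not yet
	 * been rehashed.
	 */
	tbl = rht_dereference_rcu(ht->tbl, ht);
	for (;;) {
		hash = rht_head_hashfn(ht, tbl, obj, params);
		lock = rht_bucket_lock(tbl, hash);
		spin_lock_bh(lock);

		if (tbl->rehash <= hash)
			break;		/* bucket not rehashed yet */

		spin_unlock_bh(lock);
		tbl = rht_dereference_rcu(tbl->future_tbl, ht);
	}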
Signed-off-by: Herbert Xu
Acked-by: Thomas Graf
---
 include/linux/rhashtable.h |   26 +++++++------
 lib/rhashtable.c           |   87 +++++++++++++++++++++++++++++++++++++--------
 lib/test_rhashtable.c      |   24 ------------
 3 files changed, 86 insertions(+), 51 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 53465dc..97fa904 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -308,9 +308,6 @@ int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 			   struct rhash_head *obj,
 			   struct bucket_table *old_tbl);
 
-int rhashtable_expand(struct rhashtable *ht);
-int rhashtable_shrink(struct rhashtable *ht);
-
 int rhashtable_walk_init(struct rhashtable *ht, struct rhashtable_iter *iter);
 void rhashtable_walk_exit(struct rhashtable_iter *iter);
 int rhashtable_walk_start(struct rhashtable_iter *iter) __acquires(RCU);
@@ -541,17 +538,22 @@ static inline int __rhashtable_insert_fast(
 	rcu_read_lock();
 
 	tbl = rht_dereference_rcu(ht->tbl, ht);
-	hash = rht_head_hashfn(ht, tbl, obj, params);
-	lock = rht_bucket_lock(tbl, hash);
-
-	spin_lock_bh(lock);
 
-	/* Because we have already taken the bucket lock in tbl,
-	 * if we find that future_tbl is not yet visible then
-	 * that guarantees all other insertions of the same entry
-	 * will also grab the bucket lock in tbl because until
-	 * the rehash completes ht->tbl won't be changed.
+	/* All insertions must grab the oldest table containing
+	 * the hashed bucket that is yet to be rehashed.
 	 */
+	for (;;) {
+		hash = rht_head_hashfn(ht, tbl, obj, params);
+		lock = rht_bucket_lock(tbl, hash);
+		spin_lock_bh(lock);
+
+		if (tbl->rehash <= hash)
+			break;
+
+		spin_unlock_bh(lock);
+		tbl = rht_dereference_rcu(tbl->future_tbl, ht);
+	}
+
 	new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
 	if (unlikely(new_tbl)) {
 		err = rhashtable_insert_slow(ht, key, obj, new_tbl);
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 9623be3..5e04403 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -136,11 +136,24 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 	return tbl;
 }
 
+static struct bucket_table *rhashtable_last_table(struct rhashtable *ht,
+						  struct bucket_table *tbl)
+{
+	struct bucket_table *new_tbl;
+
+	do {
+		new_tbl = tbl;
+		tbl = rht_dereference_rcu(tbl->future_tbl, ht);
+	} while (tbl);
+
+	return new_tbl;
+}
+
 static int rhashtable_rehash_one(struct rhashtable *ht, unsigned old_hash)
 {
 	struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
-	struct bucket_table *new_tbl =
-		rht_dereference(old_tbl->future_tbl, ht) ?: old_tbl;
+	struct bucket_table *new_tbl = rhashtable_last_table(ht,
+		rht_dereference_rcu(old_tbl->future_tbl, ht));
 	struct rhash_head __rcu **pprev = &old_tbl->buckets[old_hash];
 	int err = -ENOENT;
 	struct rhash_head *head, *next, *entry;
@@ -196,12 +209,18 @@ static void rhashtable_rehash_chain(struct rhashtable *ht, unsigned old_hash)
 	spin_unlock_bh(old_bucket_lock);
 }
 
-static void rhashtable_rehash(struct rhashtable *ht,
-			      struct bucket_table *new_tbl)
+static int rhashtable_rehash_attach(struct rhashtable *ht,
+				    struct bucket_table *old_tbl,
+				    struct bucket_table *new_tbl)
 {
-	struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
-	struct rhashtable_walker *walker;
-	unsigned old_hash;
+	/* Protect future_tbl using the first bucket lock. */
+	spin_lock_bh(old_tbl->locks);
+
+	/* Did somebody beat us to it? */
+	if (rcu_access_pointer(old_tbl->future_tbl)) {
+		spin_unlock_bh(old_tbl->locks);
+		return -EEXIST;
+	}
 
 	/* Make insertions go into the new, empty table right away. Deletions
 	 * and lookups will be attempted in both tables until we synchronize.
@@ -211,6 +230,22 @@ static void rhashtable_rehash(struct rhashtable *ht,
 	/* Ensure the new table is visible to readers. */
 	smp_wmb();
 
+	spin_unlock_bh(old_tbl->locks);
+
+	return 0;
+}
+
+static int rhashtable_rehash_table(struct rhashtable *ht)
+{
+	struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
+	struct bucket_table *new_tbl;
+	struct rhashtable_walker *walker;
+	unsigned old_hash;
+
+	new_tbl = rht_dereference(old_tbl->future_tbl, ht);
+	if (!new_tbl)
+		return 0;
+
 	for (old_hash = 0; old_hash < old_tbl->size; old_hash++)
 		rhashtable_rehash_chain(ht, old_hash);
 
@@ -225,6 +260,8 @@ static void rhashtable_rehash(struct rhashtable *ht,
 	 * remain.
 	 */
 	call_rcu(&old_tbl->rcu, bucket_table_free_rcu);
+
+	return rht_dereference(new_tbl->future_tbl, ht) ? -EAGAIN : 0;
 }
 
 /**
@@ -242,20 +279,25 @@ static void rhashtable_rehash(struct rhashtable *ht,
  * It is valid to have concurrent insertions and deletions protected by per
  * bucket locks or concurrent RCU protected lookups and traversals.
  */
-int rhashtable_expand(struct rhashtable *ht)
+static int rhashtable_expand(struct rhashtable *ht)
 {
 	struct bucket_table *new_tbl, *old_tbl = rht_dereference(ht->tbl, ht);
+	int err;
 
 	ASSERT_RHT_MUTEX(ht);
 
+	old_tbl = rhashtable_last_table(ht, old_tbl);
+
 	new_tbl = bucket_table_alloc(ht, old_tbl->size * 2);
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
-	rhashtable_rehash(ht, new_tbl);
-	return 0;
+	err = rhashtable_rehash_attach(ht, old_tbl, new_tbl);
+	if (err)
+		bucket_table_free(new_tbl);
+
+	return err;
 }
-EXPORT_SYMBOL_GPL(rhashtable_expand);
 
 /**
  * rhashtable_shrink - Shrink hash table while allowing concurrent lookups
@@ -273,10 +315,11 @@ EXPORT_SYMBOL_GPL(rhashtable_expand);
  * It is valid to have concurrent insertions and deletions protected by per
  * bucket locks or concurrent RCU protected lookups and traversals.
  */
-int rhashtable_shrink(struct rhashtable *ht)
+static int rhashtable_shrink(struct rhashtable *ht)
 {
 	struct bucket_table *new_tbl, *old_tbl = rht_dereference(ht->tbl, ht);
 	unsigned size = roundup_pow_of_two(atomic_read(&ht->nelems) * 3 / 2);
+	int err;
 
 	ASSERT_RHT_MUTEX(ht);
 
@@ -286,19 +329,25 @@ int rhashtable_shrink(struct rhashtable *ht)
 	if (old_tbl->size <= size)
 		return 0;
 
+	if (rht_dereference(old_tbl->future_tbl, ht))
+		return -EEXIST;
+
 	new_tbl = bucket_table_alloc(ht, size);
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
-	rhashtable_rehash(ht, new_tbl);
-	return 0;
+	err = rhashtable_rehash_attach(ht, old_tbl, new_tbl);
+	if (err)
+		bucket_table_free(new_tbl);
+
+	return err;
 }
-EXPORT_SYMBOL_GPL(rhashtable_shrink);
 
 static void rht_deferred_worker(struct work_struct *work)
 {
 	struct rhashtable *ht;
 	struct bucket_table *tbl;
+	int err = 0;
 
 	ht = container_of(work, struct rhashtable, run_work);
 	mutex_lock(&ht->mutex);
@@ -306,13 +355,20 @@ static void rht_deferred_worker(struct work_struct *work)
 		goto unlock;
 
 	tbl = rht_dereference(ht->tbl, ht);
+	tbl = rhashtable_last_table(ht, tbl);
 
 	if (rht_grow_above_75(ht, tbl))
 		rhashtable_expand(ht);
 	else if (rht_shrink_below_30(ht, tbl))
 		rhashtable_shrink(ht);
+
+	err = rhashtable_rehash_table(ht);
+
 unlock:
 	mutex_unlock(&ht->mutex);
+
+	if (err)
+		schedule_work(&ht->run_work);
 }
 
 int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
@@ -323,6 +379,7 @@ int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 	unsigned hash;
 	int err = -EEXIST;
 
+	tbl = rhashtable_last_table(ht, tbl);
 	hash = head_hashfn(ht, tbl, obj);
 	spin_lock_nested(rht_bucket_lock(tbl, hash), SINGLE_DEPTH_NESTING);
 
diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index a2ba6ad..a42a0d4 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -155,30 +155,6 @@ static int __init test_rhashtable(struct rhashtable *ht)
 	test_rht_lookup(ht);
 	rcu_read_unlock();
 
-	for (i = 0; i < TEST_NEXPANDS; i++) {
-		pr_info("  Table expansion iteration %u...\n", i);
-		mutex_lock(&ht->mutex);
-		rhashtable_expand(ht);
-		mutex_unlock(&ht->mutex);
-
-		rcu_read_lock();
-		pr_info("  Verifying lookups...\n");
-		test_rht_lookup(ht);
-		rcu_read_unlock();
-	}
-
-	for (i = 0; i < TEST_NEXPANDS; i++) {
-		pr_info("  Table shrinkage iteration %u...\n", i);
-		mutex_lock(&ht->mutex);
-		rhashtable_shrink(ht);
-		mutex_unlock(&ht->mutex);
-
-		rcu_read_lock();
-		pr_info("  Verifying lookups...\n");
-		test_rht_lookup(ht);
-		rcu_read_unlock();
-	}
-
 	rcu_read_lock();
 	test_bucket_stats(ht, true);
 	rcu_read_unlock();
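
With rhashtable_expand/rhashtable_shrink now static, callers rely
entirely on the deferred worker for resizing and rehashing. For
reference, a minimal usage sketch against the inlined fast interface
from earlier in this series; the object, params, and function below
are hypothetical (not from this patch), and .hashfn must still be
supplied explicitly at this point in the series:

#include <linux/jhash.h>
#include <linux/rhashtable.h>
#include <linux/slab.h>

/* Hypothetical example object -- not from the patch. */
struct test_obj {
	int			value;
	struct rhash_head	node;
};

static const struct rhashtable_params test_params = {
	.head_offset	= offsetof(struct test_obj, node),
	.key_offset	= offsetof(struct test_obj, value),
	.key_len	= sizeof(int),
	.hashfn		= jhash,
};

static int example(void)
{
	struct rhashtable ht;
	struct test_obj *obj, *found;
	int key = 1;
	int err;

	err = rhashtable_init(&ht, &test_params);
	if (err)
		return err;

	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
	if (!obj) {
		err = -ENOMEM;
		goto out;
	}
	obj->value = key;

	/* May schedule the deferred worker, which now also carries
	 * out any pending (possibly chained) rehashes.
	 */
	err = rhashtable_insert_fast(&ht, &obj->node, test_params);
	if (err) {
		kfree(obj);
		goto out;
	}

	rcu_read_lock();
	found = rhashtable_lookup_fast(&ht, &key, test_params);
	rcu_read_unlock();
	pr_info("lookup %s\n", found ? "hit" : "miss");

	rhashtable_remove_fast(&ht, &obj->node, test_params);
	kfree(obj);
out:
	rhashtable_destroy(&ht);
	return err;
}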