diff mbox

rhashtable: Fix reader/rehash race

Message ID 20150312110749.GA19285@gondor.apana.org.au
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Herbert Xu March 12, 2015, 11:07 a.m. UTC
There is a potential race condition between readers and the rehasher.
In particular, the rehasher could have started a rehash while the
reader finishes a scan of the old table but fails to see the new
table pointer.

This patch closes this window by adding smp_wmb/smp_rmb.
    
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Comments

Thomas Graf March 12, 2015, 1:37 p.m. UTC | #1
On 03/12/15 at 10:07pm, Herbert Xu wrote:
> There is a potential race condition between readers and the rehasher.
> In particular, the rehasher could have started a rehash while the
> reader finishes a scan of the old table but fails to see the new
> table pointer.
> 
> This patch closes this window by adding smp_wmb/smp_rmb.
>     
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Are you sure this is sufficient? I think it is still possible
for a rehash to preempt a reader in between:

        tbl = rht_dereference_rcu(ht->future_tbl, ht);
        if (unlikely(tbl != old_tbl))

in which case old entries will be moved to the new table
without the reader seeing them. I don't see how ensuring order
solves this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu March 12, 2015, 9:49 p.m. UTC | #2
On Thu, Mar 12, 2015 at 01:37:49PM +0000, Thomas Graf wrote:
> On 03/12/15 at 10:07pm, Herbert Xu wrote:
> > There is a potential race condition between readers and the rehasher.
> > In particular, the rehasher could have started a rehash while the
> > reader finishes a scan of the old table but fails to see the new
> > table pointer.
> > 
> > This patch closes this window by adding smp_wmb/smp_rmb.
> >     
> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> Are you sure this is sufficient? I think it is still possible
> for a rehash to preempt a reader in between:
> 
>         tbl = rht_dereference_rcu(ht->future_tbl, ht);
>         if (unlikely(tbl != old_tbl))
> 
> in which case old entries will be moved to the new table
> without the reader seeing them. I don't see how ensuring order
> solves this.

It doesn't matter.  The wmb/smb guarauntees that if the reader
cannot find the element in the old table then it must see the
new table pointer.  Vice versa if it cannot see the new table
pointer then the element (if it existed at all) must be in the
old table.

Cheers,
David Miller March 13, 2015, 3:01 a.m. UTC | #3
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 12 Mar 2015 22:07:49 +1100

> There is a potential race condition between readers and the rehasher.
> In particular, the rehasher could have started a rehash while the
> reader finishes a scan of the old table but fails to see the new
> table pointer.
> 
> This patch closes this window by adding smp_wmb/smp_rmb.
>     
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Graf March 13, 2015, 10:32 a.m. UTC | #4
On 03/13/15 at 08:49am, Herbert Xu wrote:
> It doesn't matter.  The wmb/smb guarauntees that if the reader
> cannot find the element in the old table then it must see the
> new table pointer.  Vice versa if it cannot see the new table
> pointer then the element (if it existed at all) must be in the
> old table.

I understand what you are doing now. I was still thinking
of entries being on both lists in parallel.

One last question though. What about rhashtable_remove()?
The spin_unlock_bh() in __rhashtable_remove() only guarantees
for loads before the release to be completed. The future_tbl
load could still be reordered before the traversal is complete.
I think it needs an smp_rmb() as well.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu March 13, 2015, 10:36 a.m. UTC | #5
On Fri, Mar 13, 2015 at 10:32:59AM +0000, Thomas Graf wrote:
>
> One last question though. What about rhashtable_remove()?
> The spin_unlock_bh() in __rhashtable_remove() only guarantees
> for loads before the release to be completed. The future_tbl
> load could still be reordered before the traversal is complete.
> I think it needs an smp_rmb() as well.

rhashtable_remove is fine because the rehasher has to take the
same lock to move things over.  That's what guarantees the new
future_tbl to be visible if it moved the to-be-removed object
over to the new table.

IOW if rhashtable_remove couldn't see the future_tbl then that
can only mean that the rehasher has yet to take the lock on that
bucket which implies that the object if it existed at all is still
in that bucket.

Cheers,
Thomas Graf March 13, 2015, 10:54 a.m. UTC | #6
On 03/13/15 at 09:36pm, Herbert Xu wrote:
> rhashtable_remove is fine because the rehasher has to take the
> same lock to move things over.  That's what guarantees the new
> future_tbl to be visible if it moved the to-be-removed object
> over to the new table.
> 
> IOW if rhashtable_remove couldn't see the future_tbl then that
> can only mean that the rehasher has yet to take the lock on that
> bucket which implies that the object if it existed at all is still
> in that bucket.

Ah, it works because the future_tbl load can't be reordered in
front of the acquire. Thanks for the explanation.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 6ffc793..68210cc 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -271,6 +271,9 @@  static void rhashtable_rehash(struct rhashtable *ht,
 	 */
 	rcu_assign_pointer(ht->future_tbl, new_tbl);
 
+	/* Ensure the new table is visible to readers. */
+	smp_wmb();
+
 	for (old_hash = 0; old_hash < old_tbl->size; old_hash++)
 		rhashtable_rehash_chain(ht, old_hash);
 
@@ -618,6 +621,9 @@  restart:
 		return rht_obj(ht, he);
 	}
 
+	/* Ensure we see any new tables. */
+	smp_rmb();
+
 	old_tbl = tbl;
 	tbl = rht_dereference_rcu(ht->future_tbl, ht);
 	if (unlikely(tbl != old_tbl))