Message ID | 1442847108.29850.56.camel@edumazet-glaptop2.roam.corp.google.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
On Mon, Sep 21, 2015 at 4:51 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2015-09-21 at 06:31 -0700, Eric Dumazet wrote:
>> On Mon, 2015-09-21 at 10:08 +0200, Dmitry Vyukov wrote:
>> > rhashtable_rehash_one() uses plain writes to update entry->next,
>> > while it is being concurrently accessed by readers.
>> > Unfortunately, the compiler is within its rights to (for example) use
>> > byte-at-a-time writes to update the pointer, which would fatally confuse
>> > concurrent readers.
>> >
>> This is bogus.
>>
>> 1) Linux is certainly not working if some arch or compiler is not doing
>> single word writes. WRITE_ONCE() would not help at all to enforce this.
>>
>> 2) If new node is not yet visible, we don't care if we write
>> entry->next using any kind of operation.
>>
>> So the WRITE_ONCE() is not needed at all.
>>
>>
>>
>> > +	WRITE_ONCE(entry->next, head);
>>
>>
>> The rcu_assign_pointer() immediately following is enough in this case.
>>
>> We have hundred of similar cases in the kernel.
>>
>>
>
> The changelog and comment are totally confusing.
>
> Please remove the bogus parts in them, and/or rephrase.
>
> The important part here is that we rehash an item, so we need to make
> sure to maintain consistent ->next field, and need to prevent compiler
> from using ->next as a temporary variable.
>
> ptr->next = 1UL | ((base + offset) << 1);
>
> Is dangerous because compiler could issue :
>
> ptr->next = (base + offset);
>
> ptr->next <<= 1;
>
> ptr->next += 1UL;
>
> Frankly, all this looks like an oversight in this code.
>
> Not sure why the NULLS value is even recomputed.

I have not looked in detail yet, but the NULLS recomputation uses
new_hash, which obviously wasn't available when the value was
previously computed. Don't know yet whether it is important or not.

>
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index cc0c69710dcf..0a29f07ba45a 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -187,10 +187,7 @@ static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
>  	head = rht_dereference_bucket(new_tbl->buckets[new_hash],
>  				      new_tbl, new_hash);
>
> -	if (rht_is_a_nulls(head))
> -		INIT_RHT_NULLS_HEAD(entry->next, ht, new_hash);
> -	else
> -		RCU_INIT_POINTER(entry->next, head);
> +	RCU_INIT_POINTER(entry->next, head);
>
>  	rcu_assign_pointer(new_tbl->buckets[new_hash], entry);
>  	spin_unlock(new_bucket_lock);
>
>
> --
> You received this message because you are subscribed to the Google Groups "ktsan" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to ktsan+unsubscribe@googlegroups.com.
> To post to this group, send email to ktsan@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/ktsan/1442847108.29850.56.camel%40edumazet-glaptop2.roam.corp.google.com.
> For more options, visit https://groups.google.com/d/optout.

On Mon, 2015-09-21 at 17:10 +0200, Dmitry Vyukov wrote:
> On Mon, Sep 21, 2015 at 4:51 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Mon, 2015-09-21 at 06:31 -0700, Eric Dumazet wrote:
> >> On Mon, 2015-09-21 at 10:08 +0200, Dmitry Vyukov wrote:
> >> > rhashtable_rehash_one() uses plain writes to update entry->next,
> >> > while it is being concurrently accessed by readers.
> >> > Unfortunately, the compiler is within its rights to (for example) use
> >> > byte-at-a-time writes to update the pointer, which would fatally confuse
> >> > concurrent readers.
> >> >
> >> This is bogus.
> >>
> >> 1) Linux is certainly not working if some arch or compiler is not doing
> >> single word writes. WRITE_ONCE() would not help at all to enforce this.
> >>
> >> 2) If new node is not yet visible, we don't care if we write
> >> entry->next using any kind of operation.
> >>
> >> So the WRITE_ONCE() is not needed at all.
> >>
> >>
> >>
> >> > +	WRITE_ONCE(entry->next, head);
> >>
> >>
> >> The rcu_assign_pointer() immediately following is enough in this case.
> >>
> >> We have hundred of similar cases in the kernel.
> >>
> >>
> >
> > The changelog and comment are totally confusing.
> >
> > Please remove the bogus parts in them, and/or rephrase.
> >
> > The important part here is that we rehash an item, so we need to make
> > sure to maintain consistent ->next field, and need to prevent compiler
> > from using ->next as a temporary variable.
> >
> > ptr->next = 1UL | ((base + offset) << 1);
> >
> > Is dangerous because compiler could issue :
> >
> > ptr->next = (base + offset);
> >
> > ptr->next <<= 1;
> >
> > ptr->next += 1UL;
> >
> > Frankly, all this looks like an oversight in this code.
> >
> > Not sure why the NULLS value is even recomputed.
>
> I have not looked in detail yet, but the NULLS recomputation uses
> new_hash, which obviously wasn't available when the value was
> previously computed. Don't know yet whether it is important or not.

Well, head already contains the right value, set in bucket_table_alloc()

	for (i = 0; i < nbuckets; i++)
		INIT_RHT_NULLS_HEAD(tbl->buckets[i], ht, i);

Think of this nulls value as a special NULL pointer.

If hash table is properly allocated/initialized, all the chains are
correctly ending with a proper NULL pointer.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

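The point about head already carrying the right marker can be illustrated with a small userspace model. This is only a sketch: struct entry, nulls_marker(), init_buckets() and insert_head() are hypothetical names, the table size and a base of 0 are assumptions, and the encoding simply follows the `1UL | ((base + offset) << 1)` form quoted in this thread. It shows that an entry inserted into an empty bucket inherits a marker that already encodes that bucket's index, so nothing needs to be recomputed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 8	/* hypothetical table size */

/* Hypothetical stand-in for a chain entry. */
struct entry {
	struct entry *next;
};

/* Encode (base + hash) as an odd value that can never be a real pointer. */
static struct entry *nulls_marker(uintptr_t base, unsigned int hash)
{
	return (struct entry *)((uintptr_t)1 | ((base + hash) << 1));
}

static int is_a_nulls(const struct entry *p)
{
	return ((uintptr_t)p & 1) != 0;
}

static uintptr_t nulls_value(const struct entry *p)
{
	return (uintptr_t)p >> 1;
}

/* Mirrors the loop quoted above: every empty bucket holds its own marker. */
static void init_buckets(struct entry *buckets[], size_t n, uintptr_t base)
{
	for (size_t i = 0; i < n; i++)
		buckets[i] = nulls_marker(base, (unsigned int)i);
}

/* Head insertion: the new entry inherits whatever the bucket held. */
static void insert_head(struct entry *buckets[], unsigned int hash,
			struct entry *e)
{
	e->next = buckets[hash];
	buckets[hash] = e;
}

/* Usage: the first entry in bucket 5 ends its chain with marker value 5. */
static void demo_nulls(void)
{
	struct entry *buckets[NBUCKETS];
	struct entry e;

	init_buckets(buckets, NBUCKETS, 0);
	insert_head(buckets, 5, &e);
	assert(is_a_nulls(e.next));
	assert(nulls_value(e.next) == 5);
}
```

In this model every chain keeps ending in the marker of its own bucket without any special casing at insertion time.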
On 09/21/15 at 07:51am, Eric Dumazet wrote:
> The important part here is that we rehash an item, so we need to make
> sure to maintain consistent ->next field, and need to prevent compiler
> from using ->next as a temporary variable.
>
> ptr->next = 1UL | ((base + offset) << 1);
>
> Is dangerous because compiler could issue :
>
> ptr->next = (base + offset);
>
> ptr->next <<= 1;
>
> ptr->next += 1UL;
>
> Frankly, all this looks like an oversight in this code.
>
> Not sure why the NULLS value is even recomputed.

The hash of the chain is part of the NULLS value. Since the entry might
have been moved to a different chain, the NULLS value must be
recalculated to contain the proper hash.

However, nobody is using the hash today as far as I can see so we could
as well just remove it and use the base value only for the nulls marker.

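The quoted concern about the compiler using ->next as a temporary can be sketched in userspace C. The names below (struct node, WRITE_ONCE_WORD, make_marker, set_next_marker) are hypothetical, and the volatile cast is only an approximation of what the kernel's WRITE_ONCE() does for word-sized values. The idea is to compute the full value in a local variable and then issue one store, so a concurrent reader never observes an intermediate result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a chain entry; only ->next matters here. */
struct node {
	uintptr_t next;
};

/* Userspace approximation of a WRITE_ONCE()-style store for one word. */
#define WRITE_ONCE_WORD(x, val) (*(volatile uintptr_t *)&(x) = (val))

/* Marker in the form quoted above: low bit set, payload shifted by one. */
static uintptr_t make_marker(uintptr_t base, uintptr_t offset)
{
	return (uintptr_t)1 | ((base + offset) << 1);
}

/*
 * Build the value in a local first, then write it with a single volatile
 * store, so ->next is not used as scratch space for partial results.
 */
static void set_next_marker(struct node *n, uintptr_t base, uintptr_t offset)
{
	uintptr_t val = make_marker(base, offset);

	WRITE_ONCE_WORD(n->next, val);
}

/* Usage: (4 + 3) << 1 is 14, and setting the low bit gives 15. */
static void demo_marker(void)
{
	struct node n = { 0 };

	set_next_marker(&n, 4, 3);
	assert(n.next == 15);
}
```

Whether such an annotation is required at a given call site depends on whether readers can already reach the entry, which is the question debated in this thread.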
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index cc0c69710dcf..0a29f07ba45a 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -187,10 +187,7 @@ static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
 	head = rht_dereference_bucket(new_tbl->buckets[new_hash],
 				      new_tbl, new_hash);
 
-	if (rht_is_a_nulls(head))
-		INIT_RHT_NULLS_HEAD(entry->next, ht, new_hash);
-	else
-		RCU_INIT_POINTER(entry->next, head);
+	RCU_INIT_POINTER(entry->next, head);
 
 	rcu_assign_pointer(new_tbl->buckets[new_hash], entry);
 	spin_unlock(new_bucket_lock);
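The ordering the patch relies on (set entry->next first, then publish the entry through the bucket head) can be modelled in userspace with C11 atomics. This is a rough analogue only, not the kernel's RCU implementation: struct item, publish_at_head() and lookup() are hypothetical names, the release store stands in for rcu_assign_pointer(), the acquire loads stand in for rcu_dereference(), and chains end in NULL rather than a nulls marker to keep the sketch short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical chain entry. */
struct item {
	int key;
	struct item *_Atomic next;
};

/* Writer side; the caller is assumed to hold the bucket lock. */
static void publish_at_head(struct item *_Atomic *bucket, struct item *e)
{
	struct item *head = atomic_load_explicit(bucket, memory_order_relaxed);

	/* Write ->next as one whole-word store before the entry is linked. */
	atomic_store_explicit(&e->next, head, memory_order_relaxed);

	/* Release store: readers that see e also see its ->next. */
	atomic_store_explicit(bucket, e, memory_order_release);
}

/* Reader side: lock-free walk of the chain. */
static struct item *lookup(struct item *_Atomic *bucket, int key)
{
	struct item *p = atomic_load_explicit(bucket, memory_order_acquire);

	while (p && p->key != key)
		p = atomic_load_explicit(&p->next, memory_order_acquire);
	return p;
}

/* Usage: two entries published at the head of the same bucket. */
static void demo_publish(void)
{
	struct item *_Atomic bucket = NULL;
	struct item a = { .key = 1 }, b = { .key = 2 };

	publish_at_head(&bucket, &a);
	publish_at_head(&bucket, &b);
	assert(lookup(&bucket, 1) == &a);
	assert(lookup(&bucket, 2) == &b);
}
```

The sketch does not model grace periods or the old-table chain that a rehashed entry may still be reachable from; it only shows the store ordering between ->next and the bucket head.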