Message ID: 20190529054026.fwcyhzt33dshma4h@gondor.apana.org.au
State: Accepted
Delegated to: David Miller
Series: inet: frags: Remove unnecessary smp_store_release/READ_ONCE
On Wed, May 29, 2019 at 7:40 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Tue, May 28, 2019 at 06:31:00AM -0700, Eric Dumazet wrote:
> >
> > This smp_store_release() is a left over of the first version of the patch, where
> > there was no rcu grace period enforcement.
> >
> > I do not believe there is harm letting this, but if you disagree
> > please send a patch ;)
>
> I see now that it is actually relying on the barrier/locking
> semantics of call_rcu vs. rcu_read_lock.  So the smp_store_release
> and READ_ONCE are simply unnecessary and could be confusing to
> future readers.
>
> ---8<---
> The smp_store_release call in fqdir_exit cannot protect the setting
> of fqdir->dead as claimed because its memory barrier is only
> guaranteed to be one-way and the barrier precedes the setting of
> fqdir->dead.
>
> IOW it doesn't provide any barriers between fq->dir and the following
> hash table destruction.
>
> In fact, the code is safe anyway because call_rcu does provide both
> the memory barrier as well as a guarantee that when the destruction
> work starts executing all RCU readers will see the updated value for
> fqdir->dead.
>
> Therefore this patch removes the unnecessary smp_store_release call
> as well as the corresponding READ_ONCE on the read-side in order to
> not confuse future readers of this code.  Comments have been added
> in their places.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> index 2b816f1ebbb4..35e9784fab4e 100644
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -193,10 +193,12 @@ void fqdir_exit(struct fqdir *fqdir)
>  {
>  	fqdir->high_thresh = 0; /* prevent creation of new frags */
>
> -	/* paired with READ_ONCE() in inet_frag_kill() :
> -	 * We want to prevent rhashtable_remove_fast() calls
> +	fqdir->dead = true;
> +
> +	/* call_rcu is supposed to provide memory barrier semantics,
> +	 * separating the setting of fqdir->dead with the destruction
> +	 * work.  This implicit barrier is paired with inet_frag_kill().
>  	 */
> -	smp_store_release(&fqdir->dead, true);
>
>  	INIT_RCU_WORK(&fqdir->destroy_rwork, fqdir_rwork_fn);
>  	queue_rcu_work(system_wq, &fqdir->destroy_rwork);
> @@ -214,10 +216,12 @@ void inet_frag_kill(struct inet_frag_queue *fq)
>
>  	fq->flags |= INET_FRAG_COMPLETE;
>  	rcu_read_lock();
> -	/* This READ_ONCE() is paired with smp_store_release()
> -	 * in inet_frags_exit_net().
> +	/* The RCU read lock provides a memory barrier
> +	 * guaranteeing that if fqdir->dead is false then
> +	 * the hash table destruction will not start until
> +	 * after we unlock.  Paired with inet_frags_exit_net().
>  	 */
> -	if (!READ_ONCE(fqdir->dead)) {
> +	if (!fqdir->dead) {

If fqdir->dead read/write are concurrent, then this still needs to be
READ_ONCE/WRITE_ONCE.  Ordering is orthogonal to atomicity.

>  		rhashtable_remove_fast(&fqdir->rhashtable, &fq->node,
>  				       fqdir->f->rhash_params);
>  		refcount_dec(&fq->refcnt);
On Wed, May 29, 2019 at 07:43:51AM +0200, Dmitry Vyukov wrote:
>
> If fqdir->dead read/write are concurrent, then this still needs to be
> READ_ONCE/WRITE_ONCE.  Ordering is orthogonal to atomicity.

No they do not.  READ_ONCE/WRITE_ONCE are basically a more fine-tuned
version of barrier().  In this case we already have an implicit
barrier() call due to the memory barrier semantics so this is simply
unnecessary.

It's the same reason you don't need READ_ONCE/WRITE_ONCE when you do:

	CPU1			CPU2
	----			----
	spin_lock
	shared_var = 1		spin_lock
	spin_unlock		if (shared_var == 1)
					...
				spin_unlock

Cheers,
On 5/28/19 10:40 PM, Herbert Xu wrote:
> I see now that it is actually relying on the barrier/locking
> semantics of call_rcu vs. rcu_read_lock.  So the smp_store_release
> and READ_ONCE are simply unnecessary and could be confusing to
> future readers.
>
> ---8<---
> The smp_store_release call in fqdir_exit cannot protect the setting
> of fqdir->dead as claimed because its memory barrier is only
> guaranteed to be one-way and the barrier precedes the setting of
> fqdir->dead.
>
> IOW it doesn't provide any barriers between fq->dir and the following
> hash table destruction.
>
> In fact, the code is safe anyway because call_rcu does provide both
> the memory barrier as well as a guarantee that when the destruction
> work starts executing all RCU readers will see the updated value for
> fqdir->dead.
>
> Therefore this patch removes the unnecessary smp_store_release call
> as well as the corresponding READ_ONCE on the read-side in order to
> not confuse future readers of this code.  Comments have been added
> in their places.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>

SGTM, thanks.

Reviewed-by: Eric Dumazet <edumazet@google.com>

David, this targets net-next tree :)
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 29 May 2019 13:40:26 +0800

> The smp_store_release call in fqdir_exit cannot protect the setting
> of fqdir->dead as claimed because its memory barrier is only
> guaranteed to be one-way and the barrier precedes the setting of
> fqdir->dead.
>
> IOW it doesn't provide any barriers between fq->dir and the following
> hash table destruction.
>
> In fact, the code is safe anyway because call_rcu does provide both
> the memory barrier as well as a guarantee that when the destruction
> work starts executing all RCU readers will see the updated value for
> fqdir->dead.
>
> Therefore this patch removes the unnecessary smp_store_release call
> as well as the corresponding READ_ONCE on the read-side in order to
> not confuse future readers of this code.  Comments have been added
> in their places.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied.
On Wed, May 29, 2019 at 7:48 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Wed, May 29, 2019 at 07:43:51AM +0200, Dmitry Vyukov wrote:
> >
> > If fqdir->dead read/write are concurrent, then this still needs to be
> > READ_ONCE/WRITE_ONCE.  Ordering is orthogonal to atomicity.
>
> No they do not.  READ_ONCE/WRITE_ONCE are basically a more fine-tuned
> version of barrier().  In this case we already have an implicit
> barrier() call due to the memory barrier semantics so this is simply
> unnecessary.
>
> It's the same reason you don't need READ_ONCE/WRITE_ONCE when you do:
>
> 	CPU1			CPU2
> 	----			----
> 	spin_lock
> 	shared_var = 1		spin_lock
> 	spin_unlock		if (shared_var == 1)
> 					...
> 				spin_unlock

+Paul, Andrea, Alan

OK, let's call it a barrier.  But we need more than a barrier here then.

1. The C standard is very clear here -- this new code causes undefined
behavior in the kernel.  Regardless of what one might think about the C
standard, it's still the current contract between us and compiler
writers, and nobody has created any better replacement.

2. Then Documentation/memory-barriers.txt (which one can't say is not
relevant here) effectively says the same:

 (*) It _must_not_ be assumed that the compiler will do what you want
     with memory references that are not protected by READ_ONCE() and
     WRITE_ONCE().  Without them, the compiler is within its rights to
     do all sorts of "creative" transformations, which are covered in
     the COMPILER BARRIER section.

3. The code is only correct under naive execution (all code is executed
literally as written).  But we don't want the compiler to execute code
that way; we want it to employ all possible optimization tricks to make
the code faster.  As a result, the compiler can break this code.  It
can reload the condition multiple times, including within the same
branch, it can use variables as scratch storage, and it can do other
things that you and me can't think of now.

Some of these optimizations are reasonable; some are not reasonable but
still legal, just the result of complex compiler logic giving a
surprising outcome in a corner case.  Also, if the current
implementation details of rhashtable_remove_fast change, surprising
things can happen, e.g. executing just refcount_dec but not
rhashtable_remove_fast.  "Proving" the impossibility of any unexpected
transformation is just not what we should be spending time on again and
again.  E.g. a hundred times people will skim through this code in the
future while chasing some bug.

4. Then READ_ONCE()/WRITE_ONCE() improve code self-documentation.
There is a huge difference between a plain load and inter-thread
synchronization algorithms.  This needs to be clearly visible in the
code, especially when one hunts a racy memory corruption in a large
code base (which happens with the kernel all the time).

5. Marking all shared accesses is required to enable data race
detection tooling, which is an absolute must for the kernel given the
amount and complexity of tricky multi-threaded code.  We need this.

6. We need marking of all shared memory accesses for the kernel memory
model effort.  There is no way the model can be defined without that.

7. This is very different from the spin_lock example.  spin_lock is a
synchronization primitive that effectively makes code inside the
critical section single-threaded.  But in this case the read and write
of dead execute concurrently.

So what are the reasons to give up all these nice things and go down
the rabbit hole of "proving" that we don't see how the compiler could
screw things up?  If there are no significant reasons that outweigh all
of these points, please use READ_ONCE()/WRITE_ONCE() for all shared
accesses.  Many will say thank you.
On Fri, May 31, 2019 at 10:24:08AM +0200, Dmitry Vyukov wrote:
>
> OK, let's call it barrier. But we need more than a barrier here then.

READ_ONCE/WRITE_ONCE is not some magical dust that you sprinkle
around in your code to make it work without locks.  You need to
understand exactly why you need them and why the code would be
buggy if you don't use them.

In this case the code doesn't need them because an implicit
barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
exists in both places.

Cheers,
On 5/31/19 7:45 AM, Herbert Xu wrote:
> On Fri, May 31, 2019 at 10:24:08AM +0200, Dmitry Vyukov wrote:
>>
>> OK, let's call it barrier. But we need more than a barrier here then.
>
> READ_ONCE/WRITE_ONCE is not some magical dust that you sprinkle
> around in your code to make it work without locks.  You need to
> understand exactly why you need them and why the code would be
> buggy if you don't use them.
>
> In this case the code doesn't need them because an implicit
> barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> exists in both places.
>

Moreover, adding READ_ONCE() while not really needed prevents some
compiler optimizations.

(Not in this particular case, since fqdir->dead is read exactly once,
but we could have had a loop.)

I have already explained that the READ_ONCE() was a leftover of the
first version of the patch, which I refined later, adding correct (and
slightly more complex) RCU barriers and rules.

Dmitry, the self-documentation argument is perfectly good, but Herbert
put in much nicer ad hoc comments.
On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote:
> On 5/31/19 7:45 AM, Herbert Xu wrote:
> >
> > In this case the code doesn't need them because an implicit
> > barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> > exists in both places.
>
> I have already explained that the READ_ONCE() was a leftover of the first version
> of the patch, that I refined later, adding correct (and slightly more complex) RCU
> barriers and rules.

AFAICT, neither barrier() nor RCU synchronization can be used as
a replacement for {READ,WRITE}_ONCE() here (and in tons of other
different situations).  IOW, you might want to try harder.  ;-)

  Thanks,
  Andrea
On Fri, May 31, 2019 at 9:29 AM Andrea Parri
<andrea.parri@amarulasolutions.com> wrote:
>
> On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote:
> > On 5/31/19 7:45 AM, Herbert Xu wrote:
> > >
> > > In this case the code doesn't need them because an implicit
> > > barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> > > exists in both places.
> >
> > I have already explained that the READ_ONCE() was a leftover of the first version
> > of the patch, that I refined later, adding correct (and slightly more complex) RCU
> > barriers and rules.
>
> AFAICT, neither barrier() nor RCU synchronization can be used as
> a replacement for {READ,WRITE}_ONCE() here (and in tons of other
> different situations).  IOW, you might want to try harder.  ;-)

At least the writer side is using queue_rcu_work(), which implies many
full memory barriers; it is not equivalent to a compiler barrier() :/

David, Herbert, I really do not care, I want to move on fixing real
bugs, not arguing with memory barriers experts.

Let's add back the READ_ONCE() and be happy.
On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote:
>
> On 5/31/19 7:45 AM, Herbert Xu wrote:
> > On Fri, May 31, 2019 at 10:24:08AM +0200, Dmitry Vyukov wrote:
> >>
> >> OK, let's call it barrier. But we need more than a barrier here then.
> >
> > READ_ONCE/WRITE_ONCE is not some magical dust that you sprinkle
> > around in your code to make it work without locks.  You need to
> > understand exactly why you need them and why the code would be
> > buggy if you don't use them.
> >
> > In this case the code doesn't need them because an implicit
> > barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> > exists in both places.
>
> More over, adding READ_ONCE() while not really needed prevents some compiler
> optimizations.
>
> ( Not in this particular case, since fqdir->dead is read exactly once, but we could
> have had a loop )
>
> I have already explained that the READ_ONCE() was a leftover of the first version
> of the patch, that I refined later, adding correct (and slightly more complex) RCU
> barriers and rules.
>
> Dmitry, the self-documentation argument is perfectly good, but Herbert
> put much nicer ad hoc comments.

I don't see all the code, but let me see if I understand based on the
pieces that I do see...

o	fqdir_exit() does a store-release to ->dead, then arranges
	for fqdir_rwork_fn() to be called from workqueue context
	after a grace period has elapsed.

o	If inet_frag_kill() is invoked only from fqdir_rwork_fn(),
	and if they are using the same fqdir, then inet_frag_kill()
	would always see fqdir->dead==true.  But then it would not
	be necessary to check it, so this seems unlikely.

o	If fqdir_exit() does store-releases to a number of ->dead
	fields under rcu_read_lock(), and if the next fqdir_exit()
	won't happen until after all the callbacks complete
	(combination of flushing workqueues and rcu_barrier(), for
	example), then ->dead would be stable when inet_frag_kill()
	is invoked, and might be true or not.  (This again requires
	inet_frag_kill() be only invoked from fqdir_rwork_fn().)

So I can imagine cases where this would in fact work.  But did I get
it right or is something else happening?

							Thanx, Paul
On Fri, May 31, 2019 at 10:11 AM Paul E. McKenney <paulmck@linux.ibm.com> wrote:
>
> On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote:
> >
> > On 5/31/19 7:45 AM, Herbert Xu wrote:
> > > On Fri, May 31, 2019 at 10:24:08AM +0200, Dmitry Vyukov wrote:
> > >>
> > >> OK, let's call it barrier. But we need more than a barrier here then.
> > >
> > > READ_ONCE/WRITE_ONCE is not some magical dust that you sprinkle
> > > around in your code to make it work without locks.  You need to
> > > understand exactly why you need them and why the code would be
> > > buggy if you don't use them.
> > >
> > > In this case the code doesn't need them because an implicit
> > > barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> > > exists in both places.
> >
> > More over, adding READ_ONCE() while not really needed prevents some compiler
> > optimizations.
> >
> > ( Not in this particular case, since fqdir->dead is read exactly once, but we could
> > have had a loop )
> >
> > I have already explained that the READ_ONCE() was a leftover of the first version
> > of the patch, that I refined later, adding correct (and slightly more complex) RCU
> > barriers and rules.
> >
> > Dmitry, the self-documentation argument is perfectly good, but Herbert
> > put much nicer ad hoc comments.
>
> I don't see all the code, but let me see if I understand based on the
> pieces that I do see...
>
> o	fqdir_exit() does a store-release to ->dead, then arranges
> 	for fqdir_rwork_fn() to be called from workqueue context
> 	after a grace period has elapsed.
>
> o	If inet_frag_kill() is invoked only from fqdir_rwork_fn(),
> 	and if they are using the same fqdir, then inet_frag_kill()
> 	would always see fqdir->dead==true.
>
> 	But then it would not be necessary to check it, so this seems
> 	unlikely.

Nope, inet_frag_kill() can be called from a timer handler, and there is
already an existing barrier (spinlock) before we call it (also under
rcu_read_lock()):

ip_expire(struct timer_list *t)
	rcu_read_lock();
	spin_lock(&qp->q.lock);
	...
	ipq_kill(qp); -> inet_frag_kill()

> o	If fqdir_exit() does store-releases to a number of ->dead
> 	fields under rcu_read_lock(), and if the next fqdir_exit()
> 	won't happen until after all the callbacks complete
> 	(combination of flushing workqueues and rcu_barrier(), for
> 	example), then ->dead would be stable when inet_frag_kill()
> 	is invoked, and might be true or not.  (This again requires
> 	inet_frag_kill() be only invoked from fqdir_rwork_fn().)
>
> So I can imagine cases where this would in fact work.  But did I get
> it right or is something else happening?
>
> 							Thanx, Paul
On Fri, May 31, 2019 at 7:11 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, May 31, 2019 at 9:29 AM Andrea Parri
> <andrea.parri@amarulasolutions.com> wrote:
> >
> > On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote:
> > > On 5/31/19 7:45 AM, Herbert Xu wrote:
> > > >
> > > > In this case the code doesn't need them because an implicit
> > > > barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> > > > exists in both places.
> > >
> > > I have already explained that the READ_ONCE() was a leftover of the first version
> > > of the patch, that I refined later, adding correct (and slightly more complex) RCU
> > > barriers and rules.
> >
> > AFAICT, neither barrier() nor RCU synchronization can be used as
> > a replacement for {READ,WRITE}_ONCE() here (and in tons of other
> > different situations).  IOW, you might want to try harder.  ;-)
>
> At least the writer side is using queue_rcu_work() which implies many
> full memory barriers, it is not equivalent to a compiler barrier() :/
>
> David, Herbert, I really do not care, I want to move on fixing real
> bugs, not arguing with memory barriers experts.
>
> Lets add back the READ_ONCE() and be happy.

We will have to get back to this later with LKMM and KTSAN.
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 2b816f1ebbb4..35e9784fab4e 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -193,10 +193,12 @@ void fqdir_exit(struct fqdir *fqdir)
 {
 	fqdir->high_thresh = 0; /* prevent creation of new frags */
 
-	/* paired with READ_ONCE() in inet_frag_kill() :
-	 * We want to prevent rhashtable_remove_fast() calls
+	fqdir->dead = true;
+
+	/* call_rcu is supposed to provide memory barrier semantics,
+	 * separating the setting of fqdir->dead with the destruction
+	 * work.  This implicit barrier is paired with inet_frag_kill().
 	 */
-	smp_store_release(&fqdir->dead, true);
 
 	INIT_RCU_WORK(&fqdir->destroy_rwork, fqdir_rwork_fn);
 	queue_rcu_work(system_wq, &fqdir->destroy_rwork);
@@ -214,10 +216,12 @@ void inet_frag_kill(struct inet_frag_queue *fq)
 
 	fq->flags |= INET_FRAG_COMPLETE;
 	rcu_read_lock();
-	/* This READ_ONCE() is paired with smp_store_release()
-	 * in inet_frags_exit_net().
+	/* The RCU read lock provides a memory barrier
+	 * guaranteeing that if fqdir->dead is false then
+	 * the hash table destruction will not start until
+	 * after we unlock.  Paired with inet_frags_exit_net().
 	 */
-	if (!READ_ONCE(fqdir->dead)) {
+	if (!fqdir->dead) {
 		rhashtable_remove_fast(&fqdir->rhashtable, &fq->node,
 				       fqdir->f->rhash_params);
 		refcount_dec(&fq->refcnt);