
inet: frags: Remove unnecessary smp_store_release/READ_ONCE

Message ID 20190529054026.fwcyhzt33dshma4h@gondor.apana.org.au
State Accepted
Delegated to: David Miller
Series inet: frags: Remove unnecessary smp_store_release/READ_ONCE

Commit Message

Herbert Xu May 29, 2019, 5:40 a.m. UTC
On Tue, May 28, 2019 at 06:31:00AM -0700, Eric Dumazet wrote:
>
> This smp_store_release() is a leftover from the first version of the patch,
> where there was no RCU grace period enforcement.
> 
> I do not believe there is any harm in leaving it, but if you disagree
> please send a patch ;)

I see now that it is actually relying on the barrier/locking
semantics of call_rcu vs. rcu_read_lock.  So the smp_store_release
and READ_ONCE are simply unnecessary and could be confusing to
future readers.

---8<---
The smp_store_release call in fqdir_exit cannot protect the setting
of fqdir->dead as claimed because its memory barrier is only
guaranteed to be one-way and the barrier precedes the setting of
fqdir->dead.

IOW it doesn't provide any barriers between fqdir->dead and the following
hash table destruction.

In fact, the code is safe anyway because call_rcu does provide both
the memory barrier as well as a guarantee that when the destruction
work starts executing all RCU readers will see the updated value for
fqdir->dead.

Therefore this patch removes the unnecessary smp_store_release call
as well as the corresponding READ_ONCE on the read-side in order to
not confuse future readers of this code.  Comments have been added
in their places.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Comments

Dmitry Vyukov May 29, 2019, 5:43 a.m. UTC | #1
On Wed, May 29, 2019 at 7:40 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Tue, May 28, 2019 at 06:31:00AM -0700, Eric Dumazet wrote:
> >
> > This smp_store_release() is a leftover from the first version of the patch,
> > where there was no RCU grace period enforcement.
> >
> > I do not believe there is any harm in leaving it, but if you disagree
> > please send a patch ;)
>
> I see now that it is actually relying on the barrier/locking
> semantics of call_rcu vs. rcu_read_lock.  So the smp_store_release
> and READ_ONCE are simply unnecessary and could be confusing to
> future readers.
>
> ---8<---
> The smp_store_release call in fqdir_exit cannot protect the setting
> of fqdir->dead as claimed because its memory barrier is only
> guaranteed to be one-way and the barrier precedes the setting of
> fqdir->dead.
>
> IOW it doesn't provide any barriers between fqdir->dead and the following
> hash table destruction.
>
> In fact, the code is safe anyway because call_rcu does provide both
> the memory barrier as well as a guarantee that when the destruction
> work starts executing all RCU readers will see the updated value for
> fqdir->dead.
>
> Therefore this patch removes the unnecessary smp_store_release call
> as well as the corresponding READ_ONCE on the read-side in order to
> not confuse future readers of this code.  Comments have been added
> in their places.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> index 2b816f1ebbb4..35e9784fab4e 100644
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -193,10 +193,12 @@ void fqdir_exit(struct fqdir *fqdir)
>  {
>         fqdir->high_thresh = 0; /* prevent creation of new frags */
>
> -       /* paired with READ_ONCE() in inet_frag_kill() :
> -        * We want to prevent rhashtable_remove_fast() calls
> +       fqdir->dead = true;
> +
> +       /* call_rcu is supposed to provide memory barrier semantics,
> +        * separating the setting of fqdir->dead with the destruction
> +        * work.  This implicit barrier is paired with inet_frag_kill().
>          */
> -       smp_store_release(&fqdir->dead, true);
>
>         INIT_RCU_WORK(&fqdir->destroy_rwork, fqdir_rwork_fn);
>         queue_rcu_work(system_wq, &fqdir->destroy_rwork);
> @@ -214,10 +216,12 @@ void inet_frag_kill(struct inet_frag_queue *fq)
>
>                 fq->flags |= INET_FRAG_COMPLETE;
>                 rcu_read_lock();
> -               /* This READ_ONCE() is paired with smp_store_release()
> -                * in inet_frags_exit_net().
> +               /* The RCU read lock provides a memory barrier
> +                * guaranteeing that if fqdir->dead is false then
> +                * the hash table destruction will not start until
> +                * after we unlock.  Paired with inet_frags_exit_net().
>                  */
> -               if (!READ_ONCE(fqdir->dead)) {
> +               if (!fqdir->dead) {

If fqdir->dead read/write are concurrent, then this still needs to be
READ_ONCE/WRITE_ONCE. Ordering is orthogonal to atomicity.

>                         rhashtable_remove_fast(&fqdir->rhashtable, &fq->node,
>                                                fqdir->f->rhash_params);
>                         refcount_dec(&fq->refcnt);
Herbert Xu May 29, 2019, 5:47 a.m. UTC | #2
On Wed, May 29, 2019 at 07:43:51AM +0200, Dmitry Vyukov wrote:
>
> If fqdir->dead read/write are concurrent, then this still needs to be
> READ_ONCE/WRITE_ONCE. Ordering is orthogonal to atomicity.

No they do not.  READ_ONCE/WRITE_ONCE are basically a more fine-tuned
version of barrier().  In this case we already have an implicit
barrier() call due to the memory barrier semantics so this is simply
unnecessary.

It's the same reason you don't need READ_ONCE/WRITE_ONCE when you do:

CPU1				CPU2
----				----
spin_lock
shared_var = 1			spin_lock
spin_unlock			if (shared_var == 1)
					...
				spin_unlock
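
The same point as a minimal kernel-style C sketch (the names
example_lock, shared_var and do_something() are made up for
illustration): both accesses sit inside the same critical section, so
the lock's acquire/release semantics already give all the ordering and
atomicity you would otherwise need READ_ONCE/WRITE_ONCE for.

static DEFINE_SPINLOCK(example_lock);	/* hypothetical lock */
static int shared_var;			/* hypothetical shared state */

extern void do_something(void);		/* hypothetical helper */

void cpu1_path(void)
{
	spin_lock(&example_lock);
	shared_var = 1;			/* plain store, safe under the lock */
	spin_unlock(&example_lock);
}

void cpu2_path(void)
{
	spin_lock(&example_lock);
	if (shared_var == 1)		/* plain load, safe under the lock */
		do_something();
	spin_unlock(&example_lock);
}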

Cheers,
Eric Dumazet May 29, 2019, 9:30 p.m. UTC | #3
On 5/28/19 10:40 PM, Herbert Xu wrote:

> I see now that it is actually relying on the barrier/locking
> semantics of call_rcu vs. rcu_read_lock.  So the smp_store_release
> and READ_ONCE are simply unnecessary and could be confusing to
> future readers.
> 
> ---8<---
> The smp_store_release call in fqdir_exit cannot protect the setting
> of fqdir->dead as claimed because its memory barrier is only
> guaranteed to be one-way and the barrier precedes the setting of
> fqdir->dead.
> 
> IOW it doesn't provide any barriers between fqdir->dead and the following
> hash table destruction.
> 
> In fact, the code is safe anyway because call_rcu does provide both
> the memory barrier as well as a guarantee that when the destruction
> work starts executing all RCU readers will see the updated value for
> fqdir->dead.
> 
> Therefore this patch removes the unnecessary smp_store_release call
> as well as the corresponding READ_ONCE on the read-side in order to
> not confuse future readers of this code.  Comments have been added
> in their places.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 

SGTM, thanks.

Reviewed-by: Eric Dumazet <edumazet@google.com>

David, this targets the net-next tree :)
David Miller May 30, 2019, 6:51 p.m. UTC | #4
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 29 May 2019 13:40:26 +0800

> The smp_store_release call in fqdir_exit cannot protect the setting
> of fqdir->dead as claimed because its memory barrier is only
> guaranteed to be one-way and the barrier precedes the setting of
> fqdir->dead.
> 
> IOW it doesn't provide any barriers between fqdir->dead and the following
> hash table destruction.
> 
> In fact, the code is safe anyway because call_rcu does provide both
> the memory barrier as well as a guarantee that when the destruction
> work starts executing all RCU readers will see the updated value for
> fqdir->dead.
> 
> Therefore this patch removes the unnecessary smp_store_release call
> as well as the corresponding READ_ONCE on the read-side in order to
> not confuse future readers of this code.  Comments have been added
> in their places.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied.
Dmitry Vyukov May 31, 2019, 8:24 a.m. UTC | #5
On Wed, May 29, 2019 at 7:48 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Wed, May 29, 2019 at 07:43:51AM +0200, Dmitry Vyukov wrote:
> >
> > If fqdir->dead read/write are concurrent, then this still needs to be
> > READ_ONCE/WRITE_ONCE. Ordering is orthogonal to atomicity.
>
> No they do not.  READ_ONCE/WRITE_ONCE are basically a more fine-tuned
> version of barrier().  In this case we already have an implicit
> barrier() call due to the memory barrier semantics so this is simply
> unnecessary.
>
> It's the same reason you don't need READ_ONCE/WRITE_ONCE when you do:
>
> CPU1                            CPU2
> ----                            ----
> spin_lock
> shared_var = 1                  spin_lock
> spin_unlock                     if (shared_var == 1)
>                                         ...
>                                 spin_unlock

+Paul, Andrea, Alan

OK, let's call it barrier. But we need more than a barrier here then.

1. The C standard is very clear here -- this new code causes undefined
behavior in the kernel. Regardless of what one might think about the C
standard, it is still the current contract between us and compiler
writers, and nobody has created a better replacement.

2. Then Documentation/memory-barriers.txt (which one can't say is not
relevant here) effectively says the same:

 (*) It _must_not_ be assumed that the compiler will do what you want
     with memory references that are not protected by READ_ONCE() and
     WRITE_ONCE().  Without them, the compiler is within its rights to
     do all sorts of "creative" transformations, which are covered in
     the COMPILER BARRIER section.

3. The code is only correct under naive execution (all code executed
literally as written). But we don't want the compiler to execute code
that way; we want it to employ every possible optimization trick to
make the code faster. As a result, the compiler can break this code.
It can reload the condition multiple times, including within the same
branch, it can use variables as scratch storage, and it can do other
things that you and I can't think of now. Some of these optimizations
are reasonable, some are not reasonable but still legal -- just the
result of complex compiler logic producing a surprising outcome in a
corner case. Also, if the current implementation details of
rhashtable_remove_fast change, surprising things can happen, e.g.
executing just refcount_dec but not rhashtable_remove_fast.
"Proving" the impossibility of every unexpected transformation is just
not something we should be spending time on again and again, e.g. the
hundreds of times people will skim through this code in the future
chasing some bug.

4. READ_ONCE()/WRITE_ONCE() also improve code self-documentation.
There is a huge difference between a plain load and an inter-thread
synchronization algorithm, and that needs to be clearly visible in the
code, especially when one is hunting a racy memory corruption in a
large code base (which happens with the kernel all the time).

5. Marking all shared accesses is required to enable data-race
detection tooling, which is an absolute must for the kernel given the
amount and complexity of its tricky multi-threaded code. We need this.

6. We need marking of all shared memory accesses for the kernel memory
model effort. There is no way the model can be defined without that.

7. This is very different from the spin_lock example. spin_lock is a
synchronization primitive that effectively makes the code inside the
critical section single-threaded. But in this case the read and write
of ->dead execute concurrently.

So what are the reasons to give up all these nice things and go down
the rabbit hole of "proving" that we don't see how the compiler could
screw things up?
If there are no significant reasons that outweigh all of these points,
please use READ_ONCE()/WRITE_ONCE() for all shared accesses (sketched
below). Many will say thank you.
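
For concreteness, a sketch (based on the hunks quoted earlier in the
thread, not on the applied patch) of what the marked version would look
like; the only change is that the plain accesses to fqdir->dead become
WRITE_ONCE()/READ_ONCE(), making the concurrent access visible to both
the compiler and race-detection tools:

	/* writer side, in fqdir_exit() */
	WRITE_ONCE(fqdir->dead, true);

	INIT_RCU_WORK(&fqdir->destroy_rwork, fqdir_rwork_fn);
	queue_rcu_work(system_wq, &fqdir->destroy_rwork);

	/* reader side, in inet_frag_kill(), under rcu_read_lock() */
	if (!READ_ONCE(fqdir->dead)) {
		rhashtable_remove_fast(&fqdir->rhashtable, &fq->node,
				       fqdir->f->rhash_params);
		refcount_dec(&fq->refcnt);
	}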
Herbert Xu May 31, 2019, 2:45 p.m. UTC | #6
On Fri, May 31, 2019 at 10:24:08AM +0200, Dmitry Vyukov wrote:
>
> OK, let's call it barrier. But we need more than a barrier here then.

READ_ONCE/WRITE_ONCE is not some magical dust that you sprinkle
around in your code to make it work without locks.  You need to
understand exactly why you need them and why the code would be
buggy if you don't use them.

In this case the code doesn't need them because an implicit
barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
exists in both places.

Cheers,
Eric Dumazet May 31, 2019, 3:45 p.m. UTC | #7
On 5/31/19 7:45 AM, Herbert Xu wrote:
> On Fri, May 31, 2019 at 10:24:08AM +0200, Dmitry Vyukov wrote:
>>
>> OK, let's call it barrier. But we need more than a barrier here then.
> 
> READ_ONCE/WRITE_ONCE is not some magical dust that you sprinkle
> around in your code to make it work without locks.  You need to
> understand exactly why you need them and why the code would be
> buggy if you don't use them.
> 
> In this case the code doesn't need them because an implicit
> barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> exists in both places.
>

Moreover, adding READ_ONCE() while not really needed prevents some compiler
optimizations.

( Not in this particular case, since fqdir->dead is read exactly once, but we could
have had a loop )
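
The loop case is where the marking stops being optional.  A kernel-style
sketch with a hypothetical flag ('stop', written by another CPU):

static bool stop;	/* hypothetical flag written by another CPU */

void wait_plain(void)
{
	/* Plain load: the compiler may read 'stop' once and spin forever,
	 * i.e. legally compile this as if (!stop) for (;;);
	 */
	while (!stop)
		;
}

void wait_marked(void)
{
	/* READ_ONCE() forces a fresh load on every iteration. */
	while (!READ_ONCE(stop))
		;
}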

I have already explained that the READ_ONCE() was a leftover of the first version
of the patch, that I refined later, adding correct (and slightly more complex) RCU
barriers and rules.

Dmitry, the self-documentation argument is perfectly good, but Herbert
put much nicer ad hoc comments.
Andrea Parri May 31, 2019, 4:29 p.m. UTC | #8
On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote:
> On 5/31/19 7:45 AM, Herbert Xu wrote:

> > In this case the code doesn't need them because an implicit
> > barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> > exists in both places.


> I have already explained that the READ_ONCE() was a leftover of the first version
> of the patch, that I refined later, adding correct (and slightly more complex) RCU
> barriers and rules.

AFAICT, neither barrier() nor RCU synchronization can be used as
a replacement for {READ,WRITE}_ONCE() here (and in tons of other
different situations).  IOW, you might want to try harder.  ;-)

Thanks,
  Andrea
Eric Dumazet May 31, 2019, 5:11 p.m. UTC | #9
On Fri, May 31, 2019 at 9:29 AM Andrea Parri
<andrea.parri@amarulasolutions.com> wrote:
>
> On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote:
> > On 5/31/19 7:45 AM, Herbert Xu wrote:
>
> > > In this case the code doesn't need them because an implicit
> > > barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> > > exists in both places.
>
>
> > I have already explained that the READ_ONCE() was a leftover of the first version
> > of the patch, that I refined later, adding correct (and slightly more complex) RCU
> > barriers and rules.
>
> AFAICT, neither barrier() nor RCU synchronization can be used as
> a replacement for {READ,WRITE}_ONCE() here (and in tons of other
> different situations).  IOW, you might want to try harder.  ;-)

At least the writer side is using queue_rcu_work(), which implies many
full memory barriers; it is not equivalent to a compiler barrier() :/

David, Herbert, I really do not care, I want to move on to fixing real
bugs, not arguing with memory barrier experts.

Let's add back the READ_ONCE() and be happy.
Paul E. McKenney May 31, 2019, 5:11 p.m. UTC | #10
On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote:
> 
> 
> On 5/31/19 7:45 AM, Herbert Xu wrote:
> > On Fri, May 31, 2019 at 10:24:08AM +0200, Dmitry Vyukov wrote:
> >>
> >> OK, let's call it barrier. But we need more than a barrier here then.
> > 
> > READ_ONCE/WRITE_ONCE is not some magical dust that you sprinkle
> > around in your code to make it work without locks.  You need to
> > understand exactly why you need them and why the code would be
> > buggy if you don't use them.
> > 
> > In this case the code doesn't need them because an implicit
> > barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> > exists in both places.
> >
> 
> Moreover, adding READ_ONCE() while not really needed prevents some compiler
> optimizations.
> 
> ( Not in this particular case, since fqdir->dead is read exactly once, but we could
> have had a loop )
> 
> I have already explained that the READ_ONCE() was a leftover of the first version
> of the patch, that I refined later, adding correct (and slightly more complex) RCU
> barriers and rules.
> 
> Dmitry, the self-documentation argument is perfectly good, but Herbert
> put much nicer ad hoc comments.

I don't see all the code, but let me see if I understand based on the
pieces that I do see...

o	fqdir_exit() does a store-release to ->dead, then arranges
	for fqdir_rwork_fn() to be called from workqueue context
	after a grace period has elapsed.

o	If inet_frag_kill() is invoked only from fqdir_rwork_fn(),
	and if they are using the same fqdir, then inet_frag_kill()
	would always see fqdir->dead==true.

	But then it would not be necessary to check it, so this seems
	unlikely.

o	If fqdir_exit() does store-releases to a number of ->dead
	fields under rcu_read_lock(), and if the next fqdir_exit()
	won't happen until after all the callbacks complete
	(combination of flushing workqueues and rcu_barrier(), for
	example), then ->dead would be stable when inet_frag_kill()
	is invoked, and might be true or not.  (This again requires
	inet_frag_kill() be only invoked from fqdir_rwork_fn().)

So I can imagine cases where this would in fact work.  But did I get
it right or is something else happening?

							Thanx, Paul
Eric Dumazet May 31, 2019, 5:18 p.m. UTC | #11
On Fri, May 31, 2019 at 10:11 AM Paul E. McKenney <paulmck@linux.ibm.com> wrote:
>
> On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote:
> >
> >
> > On 5/31/19 7:45 AM, Herbert Xu wrote:
> > > On Fri, May 31, 2019 at 10:24:08AM +0200, Dmitry Vyukov wrote:
> > >>
> > >> OK, let's call it barrier. But we need more than a barrier here then.
> > >
> > > READ_ONCE/WRITE_ONCE is not some magical dust that you sprinkle
> > > around in your code to make it work without locks.  You need to
> > > understand exactly why you need them and why the code would be
> > > buggy if you don't use them.
> > >
> > > In this case the code doesn't need them because an implicit
> > > barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> > > exists in both places.
> > >
> >
> > Moreover, adding READ_ONCE() while not really needed prevents some compiler
> > optimizations.
> >
> > ( Not in this particular case, since fqdir->dead is read exactly once, but we could
> > have had a loop )
> >
> > I have already explained that the READ_ONCE() was a leftover of the first version
> > of the patch, that I refined later, adding correct (and slightly more complex) RCU
> > barriers and rules.
> >
> > Dmitry, the self-documentation argument is perfectly good, but Herbert
> > put much nicer ad hoc comments.
>
> I don't see all the code, but let me see if I understand based on the
> pieces that I do see...
>
> o       fqdir_exit() does a store-release to ->dead, then arranges
>         for fqdir_rwork_fn() to be called from workqueue context
>         after a grace period has elapsed.
>
> o       If inet_frag_kill() is invoked only from fqdir_rwork_fn(),
>         and if they are using the same fqdir, then inet_frag_kill()
>         would always see fqdir->dead==true.
>
>         But then it would not be necessary to check it, so this seems
>         unlikely
>

Nope, inet_frag_kill() can be called from a timer handler, and there is
already an existing barrier (the spinlock) taken before we call it (also
under rcu_read_lock()):

ip_expire(struct timer_list *t)

rcu_read_lock();
spin_lock(&qp->q.lock);
 ... ipq_kill(qp);   -> inet_frag_kill()





> o       If fqdir_exit() does store-releases to a number of ->dead
>         fields under rcu_read_lock(), and if the next fqdir_exit()
>         won't happen until after all the callbacks complete
>         (combination of flushing workqueues and rcu_barrier(), for
>         example), then ->dead would be stable when inet_frag_kill()
>         is invoked, and might be true or not.  (This again requires
>         inet_frag_kill() be only invoked from fqdir_rwork_fn().)
>
> So I can imagine cases where this would in fact work.  But did I get
> it right or is something else happening?
>
>                                                         Thanx, Paul
>
Dmitry Vyukov June 7, 2019, 12:06 p.m. UTC | #12
On Fri, May 31, 2019 at 7:11 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, May 31, 2019 at 9:29 AM Andrea Parri
> <andrea.parri@amarulasolutions.com> wrote:
> >
> > On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote:
> > > On 5/31/19 7:45 AM, Herbert Xu wrote:
> >
> > > > In this case the code doesn't need them because an implicit
> > > > barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> > > > exists in both places.
> >
> >
> > > I have already explained that the READ_ONCE() was a leftover of the first version
> > > of the patch, that I refined later, adding correct (and slightly more complex) RCU
> > > barriers and rules.
> >
> > AFAICT, neither barrier() nor RCU synchronization can be used as
> > a replacement for {READ,WRITE}_ONCE() here (and in tons of other
> > different situations).  IOW, you might want to try harder.  ;-)
>
> At least the writer side is using queue_rcu_work(), which implies many
> full memory barriers; it is not equivalent to a compiler barrier() :/
>
> David, Herbert, I really do not care, I want to move on to fixing real
> bugs, not arguing with memory barrier experts.
>
> Let's add back the READ_ONCE() and be happy.

We will have to get back to this later with LKMM and KTSAN.

Patch

diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 2b816f1ebbb4..35e9784fab4e 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -193,10 +193,12 @@  void fqdir_exit(struct fqdir *fqdir)
 {
 	fqdir->high_thresh = 0; /* prevent creation of new frags */
 
-	/* paired with READ_ONCE() in inet_frag_kill() :
-	 * We want to prevent rhashtable_remove_fast() calls
+	fqdir->dead = true;
+
+	/* call_rcu is supposed to provide memory barrier semantics,
+	 * separating the setting of fqdir->dead with the destruction
+	 * work.  This implicit barrier is paired with inet_frag_kill().
 	 */
-	smp_store_release(&fqdir->dead, true);
 
 	INIT_RCU_WORK(&fqdir->destroy_rwork, fqdir_rwork_fn);
 	queue_rcu_work(system_wq, &fqdir->destroy_rwork);
@@ -214,10 +216,12 @@  void inet_frag_kill(struct inet_frag_queue *fq)
 
 		fq->flags |= INET_FRAG_COMPLETE;
 		rcu_read_lock();
-		/* This READ_ONCE() is paired with smp_store_release()
-		 * in inet_frags_exit_net().
+		/* The RCU read lock provides a memory barrier
+		 * guaranteeing that if fqdir->dead is false then
+		 * the hash table destruction will not start until
+		 * after we unlock.  Paired with inet_frags_exit_net().
 		 */
-		if (!READ_ONCE(fqdir->dead)) {
+		if (!fqdir->dead) {
 			rhashtable_remove_fast(&fqdir->rhashtable, &fq->node,
 					       fqdir->f->rhash_params);
 			refcount_dec(&fq->refcnt);