diff mbox

[RFC,2/3] locking: Annotate spin_unlock_wait() users

Message ID 20160524143649.608476390@infradead.org
State RFC
Delegated to: Pablo Neira
Headers show

Commit Message

Peter Zijlstra May 24, 2016, 2:27 p.m. UTC
spin_unlock_wait() has an unintuitive 'feature' in that it doesn't
fully serialize against the spin_unlock() we've waited on.

In particular, spin_unlock_wait() only provides a control dependency,
which is a LOAD->STORE order. This means subsequent loads can creep up
and observe state prior to the waited-for unlock. This means we don't
necessarily observe the full critical section.

We must employ smp_acquire__after_ctrl_dep() to upgrade the
LOAD->STORE to LOAD->{LOAD,STORE} aka. load-AQUIRE and thereby ensure
we observe the full critical section we've waited on.

Many spin_unlock_wait() users were unaware of this issue and need
help.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/ata/libata-eh.c   |    4 +++-
 kernel/exit.c             |   14 ++++++++++++--
 kernel/sched/completion.c |    7 +++++++
 kernel/task_work.c        |    2 +-
 4 files changed, 23 insertions(+), 4 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Linus Torvalds May 24, 2016, 4:17 p.m. UTC | #1
On Tue, May 24, 2016 at 7:27 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> spin_unlock_wait() has an unintuitive 'feature' in that it doesn't
> fully serialize against the spin_unlock() we've waited on.

NAK.

We don't start adding more of this "after_ctrl_dep" crap.

It's completely impossible to understand, and even people who have
been locking experts have gotten it wrong.

So it is *completely* unacceptable to have it in drivers.

This needs to be either hidden inside the basic spinlock functions,
_or_ it needs to be a clear and unambiguous interface. Anything that
starts talking about control dependencies is not it.

Note that this really is about naming and use, not about
implementation. So something like "spin_sync_after_unlock_wait()" is
acceptable, even if the actual _implementation_ were to be exactly the
same as the "after_ctrl_dep()" crap.

The difference is that one talks about incomprehensible implementation
details that nobody outside of the person who *implemented* the
spinlock code is supposed to understand (and seriously, I have my
doubts even the spinlock implementer understands it, judging by the
last time this happened), and the other is a much simpler semantic
guarantee.

So don't talk about "acquire". And most certainly don't talk about
"control dependencies". Not if we end up having things like *drivers*
using this like in this example libata.

                Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo May 24, 2016, 4:22 p.m. UTC | #2
Hello,

On Tue, May 24, 2016 at 09:17:13AM -0700, Linus Torvalds wrote:
> So don't talk about "acquire". And most certainly don't talk about
> "control dependencies". Not if we end up having things like *drivers*
> using this like in this example libata.

A delta but that particular libata usage is probably not needed now.
The path was used while libata was gradually adding error handlers to
the low level drivers.  I don't think we don't have any left w/o one
at this point.  I'll verify and get rid of that usage.

Thanks.
Peter Zijlstra May 24, 2016, 4:57 p.m. UTC | #3
On Tue, May 24, 2016 at 09:17:13AM -0700, Linus Torvalds wrote:

> This needs to be either hidden inside the basic spinlock functions,
> _or_ it needs to be a clear and unambiguous interface. Anything that
> starts talking about control dependencies is not it.
> 
> Note that this really is about naming and use, not about
> implementation. So something like "spin_sync_after_unlock_wait()" is
> acceptable, even if the actual _implementation_ were to be exactly the
> same as the "after_ctrl_dep()" crap.

OK; so I would prefer to keep the smp_acquire__after_ctrl_dep() crap for
common use in smp_cond_acquire() and such, but I'd be more than happy to
just stuff it unconditionally into spin_unlock_wait().

Most users really need it, and its restores intuitive semantics to the
primitive.

I'm assuming the explicit use then left in ipc/sem.c (as paired with the
spin_is_locked) is fine with you; that's certainly not driver code.

Todays series was really more about auditing all the spin_unlock_wait()
usage sites.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra May 24, 2016, 4:58 p.m. UTC | #4
On Tue, May 24, 2016 at 12:22:07PM -0400, Tejun Heo wrote:
> A delta but that particular libata usage is probably not needed now.
> The path was used while libata was gradually adding error handlers to
> the low level drivers.  I don't think we don't have any left w/o one
> at this point.  I'll verify and get rid of that usage.

OK, that would be great; I was sorta lost in there, but it looked like
if you need the spin_unlock_wait() you also need the extra barrier
thing.

If you can remove it, better still.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo May 25, 2016, 7:28 p.m. UTC | #5
On Tue, May 24, 2016 at 06:58:36PM +0200, Peter Zijlstra wrote:
> On Tue, May 24, 2016 at 12:22:07PM -0400, Tejun Heo wrote:
> > A delta but that particular libata usage is probably not needed now.
> > The path was used while libata was gradually adding error handlers to
> > the low level drivers.  I don't think we don't have any left w/o one
> > at this point.  I'll verify and get rid of that usage.
> 
> OK, that would be great; I was sorta lost in there, but it looked like
> if you need the spin_unlock_wait() you also need the extra barrier
> thing.
> 
> If you can remove it, better still.

Unfortunately, ipr SAS driver is still using the old error handling
path, so please include libata in the conversion for now.  I'll see if
ipr can be converted.

Thanks.
diff mbox

Patch

--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -703,8 +703,10 @@  void ata_scsi_cmd_error_handler(struct S
 
 		/* initialize eh_tries */
 		ap->eh_tries = ATA_EH_MAX_TRIES;
-	} else
+	} else {
 		spin_unlock_wait(ap->lock);
+		smp_acquire__after_ctrl_dep();
+	}
 
 }
 EXPORT_SYMBOL(ata_scsi_cmd_error_handler);
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -776,11 +776,16 @@  void do_exit(long code)
 
 	exit_signals(tsk);  /* sets PF_EXITING */
 	/*
-	 * tsk->flags are checked in the futex code to protect against
-	 * an exiting task cleaning up the robust pi futexes.
+	 * Ensure that all new tsk->pi_lock acquisitions must observe
+	 * PF_EXITING. Serializes against futex.c:attach_to_pi_owner().
 	 */
 	smp_mb();
 	raw_spin_unlock_wait(&tsk->pi_lock);
+	/*
+	 * Ensure that we must observe the pi_state in exit_mm() ->
+	 * mm_release() -> exit_pi_state_list().
+	 */
+	smp_acquire__after_ctrl_dep();
 
 	if (unlikely(in_atomic())) {
 		pr_info("note: %s[%d] exited with preempt_count %d\n",
@@ -897,6 +902,11 @@  void do_exit(long code)
 	 */
 	smp_mb();
 	raw_spin_unlock_wait(&tsk->pi_lock);
+	/*
+	 * Since there are no following loads the LOAD->LOAD order
+	 * provided by smp_acquire__after_ctrl_dep() is not
+	 * strictly required.
+	 */
 
 	/* causes final put_task_struct in finish_task_switch(). */
 	tsk->state = TASK_DEAD;
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -312,6 +312,13 @@  bool completion_done(struct completion *
 	 */
 	smp_rmb();
 	spin_unlock_wait(&x->wait.lock);
+	/*
+	 * Even though we've observed 'done', this doesn't mean we can observe
+	 * all stores prior to complete(), as the only RELEASE barrier on that
+	 * path is provided by the spin_unlock().
+	 */
+	smp_acquire__after_ctrl_dep();
+
 	return true;
 }
 EXPORT_SYMBOL(completion_done);
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -108,7 +108,7 @@  void task_work_run(void)
 		 * fail, but it can play with *work and other entries.
 		 */
 		raw_spin_unlock_wait(&task->pi_lock);
-		smp_mb();
+		smp_acquire__after_ctrl_dep();
 
 		do {
 			next = work->next;