Message ID | 20160524143649.608476390@infradead.org |
---|---|
State | RFC |
Delegated to: | Pablo Neira |
Headers | show |
On Tue, May 24, 2016 at 7:27 AM, Peter Zijlstra <peterz@infradead.org> wrote: > spin_unlock_wait() has an unintuitive 'feature' in that it doesn't > fully serialize against the spin_unlock() we've waited on. NAK. We don't start adding more of this "after_ctrl_dep" crap. It's completely impossible to understand, and even people who have been locking experts have gotten it wrong. So it is *completely* unacceptable to have it in drivers. This needs to be either hidden inside the basic spinlock functions, _or_ it needs to be a clear and unambiguous interface. Anything that starts talking about control dependencies is not it. Note that this really is about naming and use, not about implementation. So something like "spin_sync_after_unlock_wait()" is acceptable, even if the actual _implementation_ were to be exactly the same as the "after_ctrl_dep()" crap. The difference is that one talks about incomprehensible implementation details that nobody outside of the person who *implemented* the spinlock code is supposed to understand (and seriously, I have my doubts even the spinlock implementer understands it, judging by the last time this happened), and the other is a much simpler semantic guarantee. So don't talk about "acquire". And most certainly don't talk about "control dependencies". Not if we end up having things like *drivers* using this like in this example libata. Linus -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Tue, May 24, 2016 at 09:17:13AM -0700, Linus Torvalds wrote: > So don't talk about "acquire". And most certainly don't talk about > "control dependencies". Not if we end up having things like *drivers* > using this like in this example libata. A delta but that particular libata usage is probably not needed now. The path was used while libata was gradually adding error handlers to the low level drivers. I don't think we don't have any left w/o one at this point. I'll verify and get rid of that usage. Thanks.
On Tue, May 24, 2016 at 09:17:13AM -0700, Linus Torvalds wrote: > This needs to be either hidden inside the basic spinlock functions, > _or_ it needs to be a clear and unambiguous interface. Anything that > starts talking about control dependencies is not it. > > Note that this really is about naming and use, not about > implementation. So something like "spin_sync_after_unlock_wait()" is > acceptable, even if the actual _implementation_ were to be exactly the > same as the "after_ctrl_dep()" crap. OK; so I would prefer to keep the smp_acquire__after_ctrl_dep() crap for common use in smp_cond_acquire() and such, but I'd be more than happy to just stuff it unconditionally into spin_unlock_wait(). Most users really need it, and its restores intuitive semantics to the primitive. I'm assuming the explicit use then left in ipc/sem.c (as paired with the spin_is_locked) is fine with you; that's certainly not driver code. Todays series was really more about auditing all the spin_unlock_wait() usage sites. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 24, 2016 at 12:22:07PM -0400, Tejun Heo wrote: > A delta but that particular libata usage is probably not needed now. > The path was used while libata was gradually adding error handlers to > the low level drivers. I don't think we don't have any left w/o one > at this point. I'll verify and get rid of that usage. OK, that would be great; I was sorta lost in there, but it looked like if you need the spin_unlock_wait() you also need the extra barrier thing. If you can remove it, better still. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 24, 2016 at 06:58:36PM +0200, Peter Zijlstra wrote: > On Tue, May 24, 2016 at 12:22:07PM -0400, Tejun Heo wrote: > > A delta but that particular libata usage is probably not needed now. > > The path was used while libata was gradually adding error handlers to > > the low level drivers. I don't think we don't have any left w/o one > > at this point. I'll verify and get rid of that usage. > > OK, that would be great; I was sorta lost in there, but it looked like > if you need the spin_unlock_wait() you also need the extra barrier > thing. > > If you can remove it, better still. Unfortunately, ipr SAS driver is still using the old error handling path, so please include libata in the conversion for now. I'll see if ipr can be converted. Thanks.
--- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -703,8 +703,10 @@ void ata_scsi_cmd_error_handler(struct S /* initialize eh_tries */ ap->eh_tries = ATA_EH_MAX_TRIES; - } else + } else { spin_unlock_wait(ap->lock); + smp_acquire__after_ctrl_dep(); + } } EXPORT_SYMBOL(ata_scsi_cmd_error_handler); --- a/kernel/exit.c +++ b/kernel/exit.c @@ -776,11 +776,16 @@ void do_exit(long code) exit_signals(tsk); /* sets PF_EXITING */ /* - * tsk->flags are checked in the futex code to protect against - * an exiting task cleaning up the robust pi futexes. + * Ensure that all new tsk->pi_lock acquisitions must observe + * PF_EXITING. Serializes against futex.c:attach_to_pi_owner(). */ smp_mb(); raw_spin_unlock_wait(&tsk->pi_lock); + /* + * Ensure that we must observe the pi_state in exit_mm() -> + * mm_release() -> exit_pi_state_list(). + */ + smp_acquire__after_ctrl_dep(); if (unlikely(in_atomic())) { pr_info("note: %s[%d] exited with preempt_count %d\n", @@ -897,6 +902,11 @@ void do_exit(long code) */ smp_mb(); raw_spin_unlock_wait(&tsk->pi_lock); + /* + * Since there are no following loads the LOAD->LOAD order + * provided by smp_acquire__after_ctrl_dep() is not + * strictly required. + */ /* causes final put_task_struct in finish_task_switch(). */ tsk->state = TASK_DEAD; --- a/kernel/sched/completion.c +++ b/kernel/sched/completion.c @@ -312,6 +312,13 @@ bool completion_done(struct completion * */ smp_rmb(); spin_unlock_wait(&x->wait.lock); + /* + * Even though we've observed 'done', this doesn't mean we can observe + * all stores prior to complete(), as the only RELEASE barrier on that + * path is provided by the spin_unlock(). + */ + smp_acquire__after_ctrl_dep(); + return true; } EXPORT_SYMBOL(completion_done); --- a/kernel/task_work.c +++ b/kernel/task_work.c @@ -108,7 +108,7 @@ void task_work_run(void) * fail, but it can play with *work and other entries. */ raw_spin_unlock_wait(&task->pi_lock); - smp_mb(); + smp_acquire__after_ctrl_dep(); do { next = work->next;
spin_unlock_wait() has an unintuitive 'feature' in that it doesn't fully serialize against the spin_unlock() we've waited on. In particular, spin_unlock_wait() only provides a control dependency, which is a LOAD->STORE order. This means subsequent loads can creep up and observe state prior to the waited-for unlock. This means we don't necessarily observe the full critical section. We must employ smp_acquire__after_ctrl_dep() to upgrade the LOAD->STORE to LOAD->{LOAD,STORE} aka. load-AQUIRE and thereby ensure we observe the full critical section we've waited on. Many spin_unlock_wait() users were unaware of this issue and need help. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- drivers/ata/libata-eh.c | 4 +++- kernel/exit.c | 14 ++++++++++++-- kernel/sched/completion.c | 7 +++++++ kernel/task_work.c | 2 +- 4 files changed, 23 insertions(+), 4 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html