diff mbox series

[X,4/4] stop_machine: Atomically queue and wake stopper threads

Message ID 20190321234412.11113-5-mfo@canonical.com
State New
Headers show
Series LP#1821259 Fix for deadlock in cpu_stopper | expand

Commit Message

Mauricio Faria de Oliveira March 21, 2019, 11:44 p.m. UTC
From: Prasad Sodagudi <psodagud@codeaurora.org>

BugLink: https://bugs.launchpad.net/bugs/1821259

When cpu_stop_queue_work() releases the lock for the stopper
thread that was queued into its wake queue, preemption is
enabled, which leads to the following deadlock:

CPU0                              CPU1
sched_setaffinity(0, ...)
__set_cpus_allowed_ptr()
stop_one_cpu(0, ...)              stop_two_cpus(0, 1, ...)
cpu_stop_queue_work(0, ...)       cpu_stop_queue_two_works(0, ..., 1, ...)

-grabs lock for migration/0-
                                  -spins with preemption disabled,
                                   waiting for migration/0's lock to be
                                   released-

-adds work items for migration/0
and queues migration/0 to its
wake_q-

-releases lock for migration/0
 and preemption is enabled-

-current thread is preempted,
and __set_cpus_allowed_ptr
has changed the thread's
cpu allowed mask to CPU1 only-

                                  -acquires migration/0 and migration/1's
                                   locks-

                                  -adds work for migration/0 but does not
                                   add migration/0 to wake_q, since it is
                                   already in a wake_q-

                                  -adds work for migration/1 and adds
                                   migration/1 to its wake_q-

                                  -releases migration/0 and migration/1's
                                   locks, wakes migration/1, and enables
                                   preemption-

                                  -since migration/1 is requested to run,
                                   migration/1 begins to run and waits on
                                   migration/0, but migration/0 will never
                                   be able to run, since the thread that
                                   can wake it is affine to CPU1-

Disable preemption in cpu_stop_queue_work() before queueing works for
stopper threads, and queueing the stopper thread in the wake queue, to
ensure that the operation of queueing the works and waking the stopper
threads is atomic.

Fixes: 0b26351b910f ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock")
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: matt@codeblueprint.co.uk
Cc: bigeasy@linutronix.de
Cc: gregkh@linuxfoundation.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1533329766-4856-1-git-send-email-isaacm@codeaurora.org

Co-Developed-by: Isaac J. Manjarres <isaacm@codeaurora.org>
(backported from commit cfd355145c32bb7ccb65fccbe2d67280dc2119e1)
[mfo: backport: refresh context lines]
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
---
 kernel/stop_machine.c | 2 ++
 1 file changed, 2 insertions(+)
diff mbox series

Patch

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 7f39951f30bc..7f5579fc44ab 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -89,6 +89,7 @@  static void cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
 	WAKE_Q(wakeq);
 	unsigned long flags;
 
+	preempt_disable();
 	spin_lock_irqsave(&stopper->lock, flags);
 	if (stopper->enabled)
 		__cpu_stop_queue_work(stopper, work, &wakeq);
@@ -97,6 +98,7 @@  static void cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
 	spin_unlock_irqrestore(&stopper->lock, flags);
 
 	wake_up_q(&wakeq);
+	preempt_enable();
 }
 
 /**