From patchwork Thu Mar 21 23:48:35 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mauricio Faria de Oliveira
X-Patchwork-Id: 1060606
From: Mauricio Faria de Oliveira
To: kernel-team@lists.ubuntu.com
Subject: [B][PATCH 1/2] stop_machine: Disable preemption after queueing stopper threads
Date: Thu, 21 Mar 2019 20:48:35 -0300
Message-Id: <20190321234836.11774-2-mfo@canonical.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190321234836.11774-1-mfo@canonical.com>
References: <20190321234836.11774-1-mfo@canonical.com>

From: "Isaac J. Manjarres"

BugLink: https://bugs.launchpad.net/bugs/1821259

This commit:

  9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads")

does not fully address the race condition that can occur
as follows:

On one CPU, call it CPU 3, thread 1 invokes
cpu_stop_queue_two_works(2, 3,...), and the execution is such
that thread 1 queues the works for migration/2 and migration/3,
and is preempted after releasing the locks for migration/2 and
migration/3, but before waking the threads.

Then, on CPU 2, a kworker, call it thread 2, is running,
and it invokes cpu_stop_queue_two_works(1, 2,...), such that
thread 2 queues the works for migration/1 and migration/2.
Meanwhile, on CPU 3, thread 1 resumes execution, and wakes
migration/2 and migration/3. This means that when CPU 2
releases the locks for migration/1 and migration/2, but before
it wakes those threads, it can be preempted by migration/2.

If thread 2 is preempted by migration/2, then migration/2 will
execute the first work item successfully, since migration/3
was woken up by CPU 3, but when it goes to execute the second
work item, it disables preemption, calls multi_cpu_stop(),
and thus, CPU 2 will wait forever for migration/1, which should
have been woken up by thread 2. However, migration/1 cannot be
woken up by thread 2, since it is a kworker, so it is affine to
CPU 2, but CPU 2 is running migration/2 with preemption
disabled, so thread 2 will never run.

Disable preemption after queueing works for stopper threads
to ensure that the operation of queueing the works and waking
the stopper threads is atomic.

Co-Developed-by: Prasad Sodagudi
Co-Developed-by: Pavankumar Kondeti
Signed-off-by: Isaac J. Manjarres
Signed-off-by: Prasad Sodagudi
Signed-off-by: Pavankumar Kondeti
Signed-off-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: bigeasy@linutronix.de
Cc: gregkh@linuxfoundation.org
Cc: matt@codeblueprint.co.uk
Fixes: 9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads")
Link: http://lkml.kernel.org/r/1531856129-9871-1-git-send-email-isaacm@codeaurora.org
Signed-off-by: Ingo Molnar
(cherry picked from commit 2610e88946632afb78aa58e61f11368ac4c0af7b)
Signed-off-by: Mauricio Faria de Oliveira
---
 kernel/stop_machine.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 1ff523dae6e2..e190d1ef3a23 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -260,6 +260,15 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	err = 0;
 	__cpu_stop_queue_work(stopper1, work1, &wakeq);
 	__cpu_stop_queue_work(stopper2, work2, &wakeq);
+	/*
+	 * The waking up of stopper threads has to happen
+	 * in the same scheduling context as the queueing.
+	 * Otherwise, there is a possibility of one of the
+	 * above stoppers being woken up by another CPU,
+	 * and preempting us. This will cause us to not
+	 * wake up the other stopper forever.
+	 */
+	preempt_disable();
 unlock:
 	raw_spin_unlock(&stopper2->lock);
 	raw_spin_unlock_irq(&stopper1->lock);
@@ -271,7 +280,6 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	}
 
 	if (!err) {
-		preempt_disable();
 		wake_up_q(&wakeq);
 		preempt_enable();
 	}

From patchwork Thu Mar 21 23:48:36 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mauricio Faria de Oliveira
X-Patchwork-Id: 1060605
From: Mauricio Faria de Oliveira
To: kernel-team@lists.ubuntu.com
Subject: [B][PATCH 2/2] stop_machine: Atomically queue and wake stopper threads
Date: Thu, 21 Mar 2019 20:48:36 -0300
Message-Id: <20190321234836.11774-3-mfo@canonical.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190321234836.11774-1-mfo@canonical.com>
References: <20190321234836.11774-1-mfo@canonical.com>

From: Prasad Sodagudi

BugLink: https://bugs.launchpad.net/bugs/1821259

When cpu_stop_queue_work() releases the lock for the stopper
thread that was queued into its wake queue, preemption is
enabled, which leads to the following deadlock:

CPU0                              CPU1
sched_setaffinity(0, ...)
__set_cpus_allowed_ptr()
stop_one_cpu(0, ...)              stop_two_cpus(0, 1, ...)
cpu_stop_queue_work(0, ...)       cpu_stop_queue_two_works(0, ..., 1, ...)

-grabs lock for migration/0-
                                  -spins with preemption disabled,
                                   waiting for migration/0's lock to be
                                   released-

-adds work items for migration/0
 and queues migration/0 to its
 wake_q-

-releases lock for migration/0
 and preemption is enabled-

-current thread is preempted,
 and __set_cpus_allowed_ptr
 has changed the thread's
 cpu allowed mask to CPU1 only-

                                  -acquires migration/0 and migration/1's
                                   locks-

                                  -adds work for migration/0 but does not
                                   add migration/0 to wake_q, since it is
                                   already in a wake_q-

                                  -adds work for migration/1 and adds
                                   migration/1 to its wake_q-

                                  -releases migration/0 and migration/1's
                                   locks, wakes migration/1, and enables
                                   preemption-

                                  -since migration/1 is requested to run,
                                   migration/1 begins to run and waits on
                                   migration/0, but migration/0 will never
                                   be able to run, since the thread that
                                   can wake it is affine to CPU1-

Disable preemption in cpu_stop_queue_work() before queueing works for
stopper threads, and queueing the stopper thread in the wake queue, to
ensure that the operation of queueing the works and waking the stopper
threads is atomic.

Fixes: 0b26351b910f ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock")
Signed-off-by: Prasad Sodagudi
Signed-off-by: Isaac J. Manjarres
Signed-off-by: Thomas Gleixner
Cc: peterz@infradead.org
Cc: matt@codeblueprint.co.uk
Cc: bigeasy@linutronix.de
Cc: gregkh@linuxfoundation.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1533329766-4856-1-git-send-email-isaacm@codeaurora.org
Co-Developed-by: Isaac J. Manjarres
(cherry picked from commit cfd355145c32bb7ccb65fccbe2d67280dc2119e1)
Signed-off-by: Mauricio Faria de Oliveira
---
 kernel/stop_machine.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index e190d1ef3a23..69eb76daed34 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -81,6 +81,7 @@ static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
 	unsigned long flags;
 	bool enabled;
 
+	preempt_disable();
 	raw_spin_lock_irqsave(&stopper->lock, flags);
 	enabled = stopper->enabled;
 	if (enabled)
@@ -90,6 +91,7 @@ static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
 	raw_spin_unlock_irqrestore(&stopper->lock, flags);
 
 	wake_up_q(&wakeq);
+	preempt_enable();
 
 	return enabled;
 }
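
For readers who want to check the ordering this patch establishes in
cpu_stop_queue_work(), the following is a minimal userspace sketch, not
kernel code. Every model_* name is a hypothetical stand-in invented for
illustration: the preemption counter, the lock flag and the wake step
only record state so that the call order (disable preemption, lock,
queue, unlock, wake, enable preemption) can be asserted.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. Every model_* name is a hypothetical stand-in
 * and none of this is the kernel's implementation.
 */
static int model_preempt_count;               /* > 0 means "preemption disabled" */
static bool model_lock_held;                  /* stands in for stopper->lock */
static bool model_work_queued;                /* stands in for the stopper's work list */
static int model_preempt_count_at_wake = -1;  /* recorded by the wake step */

static void model_preempt_disable(void) { model_preempt_count++; }
static void model_preempt_enable(void)  { model_preempt_count--; }

static void model_lock(void)
{
	assert(!model_lock_held);
	model_lock_held = true;
}

static void model_unlock(void)
{
	assert(model_lock_held);
	model_lock_held = false;
}

/* Stands in for wake_up_q(): records the preemption count at wake time. */
static void model_wake_up_q(void)
{
	assert(!model_lock_held);  /* the wakeup happens after the lock is dropped */
	model_preempt_count_at_wake = model_preempt_count;
}

/* Mirrors the call ordering of cpu_stop_queue_work() with this patch applied. */
bool model_cpu_stop_queue_work(bool stopper_enabled)
{
	model_preempt_disable();           /* added by the patch */
	model_lock();
	if (stopper_enabled)
		model_work_queued = true;  /* queue the work under the lock */
	model_unlock();                    /* preemption stays disabled past the unlock */

	model_wake_up_q();
	model_preempt_enable();            /* added by the patch */

	return stopper_enabled;
}
```

Calling model_cpu_stop_queue_work(true) from a small harness and then
checking that model_preempt_count_at_wake is 1 and model_preempt_count
is back to 0 shows that, in this model, the wake step runs inside the
preemption-disabled region and the counter is balanced on return.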