From patchwork Fri Jul 28 12:54:16 2017
X-Patchwork-Submitter: Boqun Feng
X-Patchwork-Id: 794859
Date: Fri, 28 Jul 2017 20:54:16 +0800
From: Boqun Feng
To: Jonathan Cameron
Subject: Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one
 else seeing this?
Message-ID: <20170728125416.j7gcgvnxgv2gq73u@tardis>
In-Reply-To: <20170728084411.00001ddb@huawei.com>
Cc: dzickus@redhat.com, sfr@canb.auug.org.au, linuxarm@huawei.com,
 Nicholas Piggin, abdhalee@linux.vnet.ibm.com, sparclinux@vger.kernel.org,
 akpm@linux-foundation.org, "Paul E. McKenney", linuxppc-dev@lists.ozlabs.org,
 David Miller, linux-arm-kernel@lists.infradead.org

Hi Jonathan,

FWIW, there is a missed-wakeup issue in swake_up() and swake_up_all():

	https://marc.info/?l=linux-kernel&m=149750022019663

and RCU began to use swait/wake last year, so I thought this could be
relevant.

Could you try the following patch and see if it works? Thanks.

Regards,
Boqun

------------------>8
Subject: [PATCH] swait: Remove the lockless swait_active() check in
 swake_up*()

Steven Rostedt reported a potential race in the RCU core because of
swake_up():

        CPU0                                CPU1
        ----                                ----
__call_rcu_core() {

 spin_lock(rnp_root)
 need_wake = __rcu_start_gp() {
  rcu_start_gp_advanced() {
   gp_flags = FLAG_INIT
  }
 }

                                    rcu_gp_kthread() {
                                      swait_event_interruptible(wq,
                                              gp_flags & FLAG_INIT) {
                                        spin_lock(q->lock)

                                        *fetch wq->task_list here! *

                                        list_add(wq->task_list, q->task_list)
                                        spin_unlock(q->lock);

                                        *fetch old value of gp_flags here *

 spin_unlock(rnp_root)

 rcu_gp_kthread_wake() {
  swake_up(wq) {
   swait_active(wq) {
    list_empty(wq->task_list)
   } * return false *

                                        if (condition) * false *
                                          schedule();

In this case a wakeup is missed, which can leave rcu_gp_kthread waiting
for a long time.

The reason for this is the lockless swait_active() check in swake_up().
To fix it, we can either 1) add an smp_mb() in swake_up() before the
swait_active() check to provide the proper ordering (sketched below), or
2) simply remove the swait_active() check from swake_up().
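For reference, option 1 would look roughly like the sketch below. This
is illustrative only, not what the patch does; the barrier placement is
one reading of option 1:

	/* Option 1 (not adopted): keep the lockless fast path, but
	 * order the waker's condition store before the queue check. */
	void swake_up(struct swait_queue_head *q)
	{
		unsigned long flags;

		smp_mb(); /* order condition store before swait_active() */
		if (!swait_active(q))
			return;

		raw_spin_lock_irqsave(&q->lock, flags);
		swake_up_locked(q);
		raw_spin_unlock_irqrestore(&q->lock, flags);
	}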
Solution 2 not only fixes this problem but also keeps the swait and wait
APIs as close as possible, since wake_up() neither provides a full
barrier nor does a lockless check of the wait queue. Moreover, there are
already users calling swait_active() to do their own quick checks of the
wait queues, so it makes little sense for swake_up() and swake_up_all()
to repeat that check internally.

This patch therefore removes the lockless swait_active() check in
swake_up() and swake_up_all().

Reported-by: Steven Rostedt
Signed-off-by: Boqun Feng
Acked-by: Paul E. McKenney
Tested-by: Paul E. McKenney
---
 kernel/sched/swait.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c
index 3d5610dcce11..2227e183e202 100644
--- a/kernel/sched/swait.c
+++ b/kernel/sched/swait.c
@@ -33,9 +33,6 @@ void swake_up(struct swait_queue_head *q)
 {
 	unsigned long flags;
 
-	if (!swait_active(q))
-		return;
-
 	raw_spin_lock_irqsave(&q->lock, flags);
 	swake_up_locked(q);
 	raw_spin_unlock_irqrestore(&q->lock, flags);
@@ -51,9 +48,6 @@ void swake_up_all(struct swait_queue_head *q)
 {
 	struct swait_queue *curr;
 	LIST_HEAD(tmp);
-	if (!swait_active(q))
-		return;
-
 	raw_spin_lock_irq(&q->lock);
 	list_splice_init(&q->task_list, &tmp);
 	while (!list_empty(&tmp)) {
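After this patch, a caller that wants the lockless fast path makes it
explicit and supplies whatever ordering it needs; a sketch, with a
hypothetical helper name:

	/* Hypothetical caller-side fast path: the caller, not
	 * swake_up(), does the lockless check, with its own barrier. */
	static void my_wake_waiter(struct swait_queue_head *wq)
	{
		smp_mb(); /* order our condition store before the check */
		if (swait_active(wq))
			swake_up(wq);
	}

Whether the explicit smp_mb() is needed depends on what ordering the
caller already has, e.g. from surrounding locking.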