From patchwork Wed Jan 27 00:13:02 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kamal Mostafa X-Patchwork-Id: 573680 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id C81D21402C9; Wed, 27 Jan 2016 11:19:34 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1aODpa-0004rt-OA; Wed, 27 Jan 2016 00:19:30 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.76) (envelope-from ) id 1aODjQ-0001hw-0o for kernel-team@lists.ubuntu.com; Wed, 27 Jan 2016 00:13:08 +0000 Received: from 1.general.kamal.us.vpn ([10.172.68.52] helo=fourier) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.76) (envelope-from ) id 1aODjN-0007bv-RH; Wed, 27 Jan 2016 00:13:06 +0000 Received: from kamal by fourier with local (Exim 4.82) (envelope-from ) id 1aODjL-0005XS-3B; Tue, 26 Jan 2016 16:13:03 -0800 From: Kamal Mostafa To: Bart Van Assche Subject: [4.2.y-ckt stable] Patch "IB/cm: Fix a recently introduced deadlock" has been added to the 4.2.y-ckt tree Date: Tue, 26 Jan 2016 16:13:02 -0800 Message-Id: <1453853582-21260-1-git-send-email-kamal@canonical.com> X-Mailer: git-send-email 1.9.1 X-Extended-Stable: 4.2 Cc: kernel-team@lists.ubuntu.com, Doug Ledford , Kamal Mostafa , Erez Shitrit X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.14 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: kernel-team-bounces@lists.ubuntu.com This is a note to let you know that I have just added a patch titled IB/cm: Fix a recently introduced deadlock to the linux-4.2.y-queue branch of the 4.2.y-ckt extended stable tree which can be found at: http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-4.2.y-queue This patch is scheduled to be released in version 4.2.8-ckt3. If you, or anyone else, feels it should not be added to this tree, please reply to this email. For more information about the 4.2.y-ckt tree, see https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable Thanks. -Kamal ---8<------------------------------------------------------------ From 15236ba57e9137838965854ed7dd263740fa03e9 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Fri, 1 Jan 2016 13:17:46 +0100 Subject: IB/cm: Fix a recently introduced deadlock commit 4bfdf635c668869c69fd18ece37ec66fb6f38fcf upstream. ib_send_cm_drep() calls cm_enter_timewait() while holding a spinlock that can be locked from inside an interrupt handler. Hence do not enable interrupts inside cm_enter_timewait() if called with interrupts disabled. This patch fixes e.g. the following deadlock: Acked-by: Erez Shitrit --- 1.9.1 ================================= [ INFO: inconsistent lock state ] 4.4.0-rc7+ #1 Tainted: G E --------------------------------- inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes: (&(&cm_id_priv->lock)->rlock){?.+...}, at: [] cm_establish+0x 74/0x1b0 [ib_cm] {HARDIRQ-ON-W} state was registered at: [] mark_held_locks+0x71/0x90 [] trace_hardirqs_on_caller+0xa7/0x1c0 [] trace_hardirqs_on+0xd/0x10 [] _raw_spin_unlock_irq+0x2b/0x40 [] cm_enter_timewait+0xae/0x100 [ib_cm] [] ib_send_cm_drep+0xb6/0x190 [ib_cm] [] srp_cm_handler+0x128/0x1a0 [ib_srp] [] cm_process_work+0x20/0xf0 [ib_cm] [] cm_dreq_handler+0x135/0x2c0 [ib_cm] [] cm_work_handler+0x75/0xd0 [ib_cm] [] process_one_work+0x1bd/0x460 [] worker_thread+0x118/0x420 [] kthread+0xe4/0x100 [] ret_from_fork+0x3f/0x70 irq event stamp: 1672286 hardirqs last enabled at (1672283): [] poll_idle+0x10/0x80 hardirqs last disabled at (1672284): [] common_interrupt+0x84/0x89 softirqs last enabled at (1672286): [] _local_bh_enable+0x1c/0x50 softirqs last disabled at (1672285): [] irq_enter+0x47/0x70 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&cm_id_priv->lock)->rlock); lock(&(&cm_id_priv->lock)->rlock); *** DEADLOCK *** no locks held by swapper/8/0. stack backtrace: CPU: 8 PID: 0 Comm: swapper/8 Tainted: G E 4.4.0-rc7+ #1 Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014 ffff88045af5e950 ffff88046e503a88 ffffffff81251c1b 0000000000000007 0000000000000006 0000000000000003 ffff88045af5ddc0 ffff88046e503ad8 ffffffff810a32f4 0000000000000000 0000000000000000 0000000000000001 Call Trace: [] dump_stack+0x4f/0x74 [] print_usage_bug+0x184/0x190 [] mark_lock_irq+0xf2/0x290 [] mark_lock+0x115/0x1b0 [] mark_irqflags+0x15c/0x170 [] __lock_acquire+0x1ef/0x560 [] lock_acquire+0x62/0x80 [] _raw_spin_lock_irqsave+0x43/0x60 [] cm_establish+0x74/0x1b0 [ib_cm] [] ib_cm_notify+0x31/0x100 [ib_cm] [] srpt_qp_event+0x54/0xd0 [ib_srpt] [] mlx4_ib_qp_event+0x72/0xc0 [mlx4_ib] [] mlx4_qp_event+0x69/0xd0 [mlx4_core] [] mlx4_eq_int+0x51e/0xd50 [mlx4_core] [] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core] [] handle_irq_event_percpu+0x40/0x110 [] handle_irq_event+0x3f/0x70 [] handle_edge_irq+0x79/0x120 [] handle_irq+0x5d/0x130 [] do_IRQ+0x6d/0x130 [] common_interrupt+0x89/0x89 [] cpuidle_enter_state+0xcf/0x200 [] cpuidle_enter+0x12/0x20 [] call_cpuidle+0x36/0x60 [] cpuidle_idle_call+0x63/0x110 [] cpu_idle_loop+0xfa/0x130 [] cpu_startup_entry+0xe/0x10 [] start_secondary+0x83/0x90 Fixes: commit be4b499323bf ("IB/cm: Do not queue work to a device that's going away") Signed-off-by: Bart Van Assche Cc: Erez Shitrit Signed-off-by: Doug Ledford Signed-off-by: Kamal Mostafa --- drivers/infiniband/core/cm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 8be7352..27919bb 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -826,11 +826,11 @@ static void cm_enter_timewait(struct cm_id_private *cm_id_priv) wait_time = cm_convert_to_ms(cm_id_priv->av.timeout); /* Check if the device started its remove_one */ - spin_lock_irq(&cm.lock); + spin_lock_irqsave(&cm.lock, flags); if (!cm_dev->going_down) queue_delayed_work(cm.wq, &cm_id_priv->timewait_info->work.work, msecs_to_jiffies(wait_time)); - spin_unlock_irq(&cm.lock); + spin_unlock_irqrestore(&cm.lock, flags); cm_id_priv->timewait_info = NULL; }