From patchwork Thu Nov 12 23:15:06 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kamal Mostafa X-Patchwork-Id: 543816 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 8D20B141430; Fri, 13 Nov 2015 10:19:06 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1Zx18x-0005CE-EU; Thu, 12 Nov 2015 23:19:03 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1Zx15B-0001o4-0J for kernel-team@lists.ubuntu.com; Thu, 12 Nov 2015 23:15:09 +0000 Received: from 1.general.kamal.us.vpn ([10.172.68.52] helo=fourier) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.76) (envelope-from ) id 1Zx15A-0006JD-QD; Thu, 12 Nov 2015 23:15:08 +0000 Received: from kamal by fourier with local (Exim 4.82) (envelope-from ) id 1Zx158-0008NU-Ji; Thu, 12 Nov 2015 15:15:06 -0800 From: Kamal Mostafa To: Roman Gushchin Subject: [3.19.y-ckt stable] Patch "md/raid5: fix locking in handle_stripe_clean_event()" has been added to staging queue Date: Thu, 12 Nov 2015 15:15:06 -0800 Message-Id: <1447370106-32174-1-git-send-email-kamal@canonical.com> X-Mailer: git-send-email 1.9.1 X-Extended-Stable: 3.19 Cc: Kamal Mostafa , Shaohua Li , NeilBrown , kernel-team@lists.ubuntu.com X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.14 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: kernel-team-bounces@lists.ubuntu.com This is a note to let you know that I have just added a patch titled md/raid5: fix locking in handle_stripe_clean_event() to the linux-3.19.y-queue branch of the 3.19.y-ckt extended stable tree which can be found at: http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.19.y-queue This patch is scheduled to be released in version 3.19.8-ckt10. If you, or anyone else, feels it should not be added to this tree, please reply to this email. For more information about the 3.19.y-ckt tree, see https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable Thanks. -Kamal ------ From 4df46646db3c2c81a5b7441947ef47a6a07cbd2b Mon Sep 17 00:00:00 2001 From: Roman Gushchin Date: Fri, 6 Nov 2015 12:54:47 +0300 Subject: [PATCH 115/120] md/raid5: fix locking in handle_stripe_clean_event() commit b8a9d66d043ffac116100775a469f05f5158c16f upstream. After commit 566c09c53455 ("raid5: relieve lock contention in get_active_stripe()") __find_stripe() is called under conf->hash_locks + hash. But handle_stripe_clean_event() calls remove_hash() under conf->device_lock. Under some cirscumstances the hash chain can be circuited, and we get an infinite loop with disabled interrupts and locked hash lock in __find_stripe(). This leads to hard lockup on multiple CPUs and following system crash. I was able to reproduce this behavior on raid6 over 6 ssd disks. The devices_handle_discard_safely option should be set to enable trim support. The following script was used: for i in `seq 1 32`; do dd if=/dev/zero of=large$i bs=10M count=100 & done Signed-off-by: Roman Gushchin Fixes: 566c09c53455 ("raid5: relieve lock contention in get_active_stripe()") Signed-off-by: NeilBrown Cc: Shaohua Li [ luis: backported to 3.16: used Roman's backport to 3.14 ] Signed-off-by: Luis Henriques Signed-off-by: Kamal Mostafa --- drivers/md/raid5.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) -- 1.9.1 diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index e421016..5fa7549 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -3060,6 +3060,8 @@ static void handle_stripe_clean_event(struct r5conf *conf, } if (!discard_pending && test_bit(R5_Discard, &sh->dev[sh->pd_idx].flags)) { + int hash = sh->hash_lock_index; + clear_bit(R5_Discard, &sh->dev[sh->pd_idx].flags); clear_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags); if (sh->qd_idx >= 0) { @@ -3073,9 +3075,9 @@ static void handle_stripe_clean_event(struct r5conf *conf, * no updated data, so remove it from hash list and the stripe * will be reinitialized */ - spin_lock_irq(&conf->device_lock); + spin_lock_irq(conf->hash_locks + hash); remove_hash(sh); - spin_unlock_irq(&conf->device_lock); + spin_unlock_irq(conf->hash_locks + hash); if (test_bit(STRIPE_SYNC_REQUESTED, &sh->state)) set_bit(STRIPE_HANDLE, &sh->state);