From patchwork Sat Apr 2 00:51:18 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kamal Mostafa X-Patchwork-Id: 605482 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 3qcpfr2FvBz9sDC; Sun, 3 Apr 2016 05:47:00 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1amRVX-0003Gg-KP; Sat, 02 Apr 2016 19:46:55 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.76) (envelope-from ) id 1am9md-0001BU-RL for kernel-team@lists.ubuntu.com; Sat, 02 Apr 2016 00:51:23 +0000 Received: from 1.general.kamal.us.vpn ([10.172.68.52] helo=fourier) by youngberry.canonical.com with esmtpsa (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.76) (envelope-from ) id 1am9mc-0005zz-AN; Sat, 02 Apr 2016 00:51:22 +0000 Received: from kamal by fourier with local (Exim 4.86_2) (envelope-from ) id 1am9mZ-0005dR-L7; Fri, 01 Apr 2016 17:51:19 -0700 From: Kamal Mostafa To: Joseph Qi Subject: [3.19.y-ckt stable] Patch "ocfs2/dlm: fix race between convert and recovery" has been added to the 3.19.y-ckt tree Date: Fri, 1 Apr 2016 17:51:18 -0700 Message-Id: <1459558278-21626-1-git-send-email-kamal@canonical.com> X-Mailer: git-send-email 2.7.4 X-Extended-Stable: 3.19 X-Mailman-Approved-At: Sat, 02 Apr 2016 19:46:34 +0000 Cc: Mark Fasheh , Kamal Mostafa , kernel-team@lists.ubuntu.com, Junxiao Bi , Yiwen Jiang , Tariq Saeed , Joel Becker , Andrew Morton , Linus Torvalds X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.14 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: kernel-team-bounces@lists.ubuntu.com This is a note to let you know that I have just added a patch titled ocfs2/dlm: fix race between convert and recovery to the linux-3.19.y-queue branch of the 3.19.y-ckt extended stable tree which can be found at: http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.19.y-queue This patch is scheduled to be released in version 3.19.8-ckt18. If you, or anyone else, feels it should not be added to this tree, please reply to this email. For more information about the 3.19.y-ckt tree, see https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable Thanks. -Kamal ---8<------------------------------------------------------------ From 12e9cea4d36690d97167c417e40c5d7d8e537ab2 Mon Sep 17 00:00:00 2001 From: Joseph Qi Date: Fri, 25 Mar 2016 14:21:26 -0700 Subject: ocfs2/dlm: fix race between convert and recovery commit ac7cf246dfdbec3d8fed296c7bf30e16f5099dac upstream. There is a race window between dlmconvert_remote and dlm_move_lockres_to_recovery_list, which will cause a lock with OCFS2_LOCK_BUSY in grant list, thus system hangs. dlmconvert_remote { spin_lock(&res->spinlock); list_move_tail(&lock->list, &res->converting); lock->convert_pending = 1; spin_unlock(&res->spinlock); status = dlm_send_remote_convert_request(); >>>>>> race window, master has queued ast and return DLM_NORMAL, and then down before sending ast. this node detects master down and calls dlm_move_lockres_to_recovery_list, which will revert the lock to grant list. Then OCFS2_LOCK_BUSY won't be cleared as new master won't send ast any more because it thinks already be authorized. spin_lock(&res->spinlock); lock->convert_pending = 0; if (status != DLM_NORMAL) dlm_revert_pending_convert(res, lock); spin_unlock(&res->spinlock); } In this case, check if res->state has DLM_LOCK_RES_RECOVERING bit set (res is still in recovering) or res master changed (new master has finished recovery), reset the status to DLM_RECOVERING, then it will retry convert. Signed-off-by: Joseph Qi Reported-by: Yiwen Jiang Reviewed-by: Junxiao Bi Cc: Mark Fasheh Cc: Joel Becker Cc: Tariq Saeed Cc: Junxiao Bi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Kamal Mostafa --- fs/ocfs2/dlm/dlmconvert.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) -- 2.7.4 diff --git a/fs/ocfs2/dlm/dlmconvert.c b/fs/ocfs2/dlm/dlmconvert.c index e36d63f..84de55e 100644 --- a/fs/ocfs2/dlm/dlmconvert.c +++ b/fs/ocfs2/dlm/dlmconvert.c @@ -262,6 +262,7 @@ enum dlm_status dlmconvert_remote(struct dlm_ctxt *dlm, struct dlm_lock *lock, int flags, int type) { enum dlm_status status; + u8 old_owner = res->owner; mlog(0, "type=%d, convert_type=%d, busy=%d\n", lock->ml.type, lock->ml.convert_type, res->state & DLM_LOCK_RES_IN_PROGRESS); @@ -316,11 +317,19 @@ enum dlm_status dlmconvert_remote(struct dlm_ctxt *dlm, spin_lock(&res->spinlock); res->state &= ~DLM_LOCK_RES_IN_PROGRESS; lock->convert_pending = 0; - /* if it failed, move it back to granted queue */ + /* if it failed, move it back to granted queue. + * if master returns DLM_NORMAL and then down before sending ast, + * it may have already been moved to granted queue, reset to + * DLM_RECOVERING and retry convert */ if (status != DLM_NORMAL) { if (status != DLM_NOTQUEUED) dlm_error(status); dlm_revert_pending_convert(res, lock); + } else if ((res->state & DLM_LOCK_RES_RECOVERING) || + (old_owner != res->owner)) { + mlog(0, "res %.*s is in recovering or has been recovered.\n", + res->lockname.len, res->lockname.name); + status = DLM_RECOVERING; } bail: spin_unlock(&res->spinlock);