From patchwork Wed Feb 3 13:59:22 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Henriques X-Patchwork-Id: 578040 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id B43CE140C57; Thu, 4 Feb 2016 01:02:43 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1aQy11-0006ff-Tz; Wed, 03 Feb 2016 14:02:39 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.76) (envelope-from ) id 1aQxxs-00059J-US for kernel-team@lists.ubuntu.com; Wed, 03 Feb 2016 13:59:24 +0000 Received: from av-217-129-142-138.netvisao.pt ([217.129.142.138] helo=localhost) by youngberry.canonical.com with esmtpsa (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.76) (envelope-from ) id 1aQxxr-000593-DI; Wed, 03 Feb 2016 13:59:24 +0000 From: Luis Henriques To: Dave Chinner Subject: [3.16.y-ckt stable] Patch "xfs: inode recovery readahead can race with inode buffer creation" has been added to the 3.16.y-ckt tree Date: Wed, 3 Feb 2016 13:59:22 +0000 Message-Id: <1454507962-29915-1-git-send-email-luis.henriques@canonical.com> X-Extended-Stable: 3.16 Cc: Brian Foster , Dave Chinner , kernel-team@lists.ubuntu.com X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.14 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: kernel-team-bounces@lists.ubuntu.com This is a note to let you know that I have just added a patch titled xfs: inode recovery readahead can race with inode buffer creation to the linux-3.16.y-queue branch of the 3.16.y-ckt extended stable tree which can be found at: http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.16.y-queue This patch is scheduled to be released in version 3.16.7-ckt24. If you, or anyone else, feels it should not be added to this tree, please reply to this email. For more information about the 3.16.y-ckt tree, see https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable Thanks. -Luis ---8<------------------------------------------------------------ From 15814082717cba02b98bea32abfe706de60b319c Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Tue, 12 Jan 2016 07:03:44 +1100 Subject: xfs: inode recovery readahead can race with inode buffer creation commit b79f4a1c68bb99152d0785ee4ea3ab4396cdacc6 upstream. When we do inode readahead in log recovery, we do can do the readahead before we've replayed the icreate transaction that stamps the buffer with inode cores. The inode readahead verifier catches this and marks the buffer as !done to indicate that it doesn't yet contain valid inodes. In adding buffer error notification (i.e. setting b_error = -EIO at the same time as as we clear the done flag) to such a readahead verifier failure, we can then get subsequent inode recovery failing with this error: XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32 This occurs when readahead completion races with icreate item replay such as: inode readahead find buffer lock buffer submit RA io .... icreate recovery xfs_trans_get_buffer find buffer lock buffer ..... fails verifier clear XBF_DONE set bp->b_error = -EIO release and unlock buffer icreate initialises buffer marks buffer as done adds buffer to delayed write queue releases buffer At this point, we have an initialised inode buffer that is up to date but has an -EIO state registered against it. When we finally get to recovering an inode in that buffer: inode item recovery xfs_trans_read_buffer find buffer lock buffer sees XBF_DONE is set, returns buffer sees bp->b_error is set fail log recovery! Essentially, we need xfs_trans_get_buf_map() to clear the error status of the buffer when doing a lookup. This function returns uninitialised buffers, so the buffer returned can not be in an error state and none of the code that uses this function expects b_error to be set on return. Indeed, there is an ASSERT(!bp->b_error); in the transaction case in xfs_trans_get_buf_map() that would have caught this if log recovery used transactions.... This patch firstly changes the inode readahead failure to set -EIO on the buffer, and secondly changes xfs_buf_get_map() to never return a buffer with an error state set so this first change doesn't cause unexpected log recovery failures. Signed-off-by: Dave Chinner Reviewed-by: Brian Foster Signed-off-by: Dave Chinner [ luis: backported to 3.16: - file rename: fs/xfs/libxfs/xfs_inode_buf.c -> fs/xfs/xfs_inode_buf.c - adjusted context ] Signed-off-by: Luis Henriques --- fs/xfs/xfs_buf.c | 7 +++++++ fs/xfs/xfs_inode_buf.c | 12 +++++++----- 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 7a34a1ae6552..7d988a50c353 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -607,6 +607,13 @@ found: } } + /* + * Clear b_error if this is a lookup from a caller that doesn't expect + * valid data to be found in the buffer. + */ + if (!(flags & XBF_READ)) + xfs_buf_ioerror(bp, 0); + XFS_STATS_INC(xb_get); trace_xfs_buf_get(bp, flags, _RET_IP_); return bp; diff --git a/fs/xfs/xfs_inode_buf.c b/fs/xfs/xfs_inode_buf.c index cb35ae41d4a1..fdc04bd2fdf7 100644 --- a/fs/xfs/xfs_inode_buf.c +++ b/fs/xfs/xfs_inode_buf.c @@ -66,11 +66,12 @@ xfs_inobp_check( * has not had the inode cores stamped into it. Hence for readahead, the buffer * may be potentially invalid. * - * If the readahead buffer is invalid, we don't want to mark it with an error, - * but we do want to clear the DONE status of the buffer so that a followup read - * will re-read it from disk. This will ensure that we don't get an unnecessary - * warnings during log recovery and we don't get unnecssary panics on debug - * kernels. + * If the readahead buffer is invalid, we need to mark it with an error and + * clear the DONE status of the buffer so that a followup read will re-read it + * from disk. We don't report the error otherwise to avoid warnings during log + * recovery and we don't get unnecssary panics on debug kernels. We use EIO here + * because all we want to do is say readahead failed; there is no-one to report + * the error to, so this will distinguish it from a non-ra verifier failure. */ static void xfs_inode_buf_verify( @@ -98,6 +99,7 @@ xfs_inode_buf_verify( XFS_RANDOM_ITOBP_INOTOBP))) { if (readahead) { bp->b_flags &= ~XBF_DONE; + xfs_buf_ioerror(bp, -EIO); return; }