From patchwork Mon Jun 11 21:03:49 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kiran Kumar Modukuri X-Patchwork-Id: 928333 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=nvidia.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 414s9y1XSJz9s1b; Wed, 13 Jun 2018 00:06:14 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1fSjvv-0000NR-MY; Tue, 12 Jun 2018 14:06:03 +0000 Received: from hqemgate16.nvidia.com ([216.228.121.65]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1fSTym-0000MU-GB for kernel-team@lists.ubuntu.com; Mon, 11 Jun 2018 21:03:56 +0000 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1, AES128-SHA) id ; Mon, 11 Jun 2018 14:04:00 -0700 Received: from HQMAIL103.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Mon, 11 Jun 2018 14:04:01 -0700 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Mon, 11 Jun 2018 14:04:01 -0700 Received: from HQMAIL108.nvidia.com (172.18.146.13) by HQMAIL103.nvidia.com (172.20.187.11) with Microsoft SMTP Server (TLS) id 15.0.1347.2; Mon, 11 Jun 2018 21:03:50 +0000 Received: from HQMAIL108.nvidia.com ([::1]) by HQMAIL108.nvidia.com ([fe80::5cec:1718:2c53:6e93%19]) with mapi id 15.00.1347.000; Mon, 11 Jun 2018 21:03:50 +0000 From: Kiran Kumar Modukuri To: "kernel-team@lists.ubuntu.com" Subject: [SAUCE][XENIAL][PATCH 1/1] [CacheFiles] Fix to handle Oops in cachefiles module during new object lookup while old object is being cleaned up Thread-Topic: [SAUCE][XENIAL][PATCH 1/1] [CacheFiles] Fix to handle Oops in cachefiles module during new object lookup while old object is being cleaned up Thread-Index: AdQBx7F8icFmJczaSYeSphHTcpFgSw== Date: Mon, 11 Jun 2018 21:03:49 +0000 Message-ID: <10be0b16e3aa4d2792ffe9853a8f9653@HQMAIL108.nvidia.com> Accept-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: msip_labels: MSIP_Label_6b558183-044c-4105-8d9c-cea02a2a3d86_Enabled=True; MSIP_Label_6b558183-044c-4105-8d9c-cea02a2a3d86_SiteId=43083d15-7273-40c1-b7db-39efd9ccc17a; MSIP_Label_6b558183-044c-4105-8d9c-cea02a2a3d86_Owner=kmodukuri@nvidia.com; MSIP_Label_6b558183-044c-4105-8d9c-cea02a2a3d86_SetDate=2018-06-11T21:03:48.9792437Z; MSIP_Label_6b558183-044c-4105-8d9c-cea02a2a3d86_Name=Unrestricted; MSIP_Label_6b558183-044c-4105-8d9c-cea02a2a3d86_Application=Microsoft Azure Information Protection; MSIP_Label_6b558183-044c-4105-8d9c-cea02a2a3d86_Extended_MSFT_Method=Automatic; Sensitivity=Unrestricted x-ms-exchange-transport-fromentityheader: Hosted x-originating-ip: [10.2.167.214] MIME-Version: 1.0 Content-Language: en-US X-Mailman-Approved-At: Tue, 12 Jun 2018 14:06:02 +0000 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776254 [Impact] Oops during heavy NFS + FSCache + Cachefiles CacheFiles: Error: Overlong wait for old active object to go away. BUG: unable to handle kernel NULL pointer dereference at 0000000000000002 CacheFiles: Error: Object already active kernel BUG at fs/cachefiles/namei.c:163! [Cause] In a heavily loaded system with big files being read and truncated, an fscache object for a cookie is being dropped and a new object being looked. The new object being looked for has to wait for the old object to go away before the new object is moved to active state. [Fix] Clear the flag 'CACHEFILES_OBJECT_ACTIVE' for the new object when retrying the object lookup. Remove the BUG() for the case where the old object is still being dropped and convert to WARN() [Testcase] A user has run ~100 hours of NFS stress tests and not seen this bug recur. [Regression Potential] - Limited to fscache/cachefiles. Signed-off-by: kmodukuri --- fs/cachefiles/namei.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- 2.7.4 ----------------------------------------------------------------------------------- This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. ----------------------------------------------------------------------------------- diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index c4b8934..b08286d 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -194,7 +194,7 @@ wait_for_old_object: pr_err("\n"); pr_err("Error: Unexpected object collision\n"); cachefiles_printk_object(object, xobject); - BUG(); + WARN(true, "Unexpected object collision\n"); } atomic_inc(&xobject->usage); write_unlock(&cache->active_lock); @@ -247,6 +247,7 @@ wait_for_old_object: ASSERT(!test_bit(CACHEFILES_OBJECT_ACTIVE, &xobject->flags)); + clear_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags); cache->cache.ops->put_object(&xobject->fscache); goto try_again;