From patchwork Tue Jun 12 18:44:23 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kiran Kumar Modukuri
X-Patchwork-Id: 928454
From: Kiran Kumar Modukuri
To: "kernel-team@lists.ubuntu.com"
Subject: [SAUCE][XENIAL][PATCH 1/1] [CacheFiles] Fix to handle Oops in
 cachefiles module during new object lookup while old object is being
 cleaned up
Date: Tue, 12 Jun 2018 18:44:23 +0000
Message-ID: <1528829063537.99986@nvidia.com>
List-Id: Kernel team discussions

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776254

[Impact]
Oops during heavy NFS + FSCache + Cachefiles

    CacheFiles: Error: Overlong wait for old active object to go away.
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
    CacheFiles: Error: Object already active
    kernel BUG at fs/cachefiles/namei.c:163!

[Cause]
In a heavily loaded system with big files being read and truncated, an
fscache object for a cookie is being dropped while a new object is being
looked up. The new object has to wait for the old object to go away
before it is moved to the active state.

[Fix]
Clear the flag 'CACHEFILES_OBJECT_ACTIVE' for the new object when
retrying the object lookup.
Convert the BUG() for the case where the old object is still being
dropped into a WARN().

[Testcase]
A user has run ~100 hours of NFS stress tests and not seen this bug recur.

[Regression Potential]
 - Limited to fscache/cachefiles.

Signed-off-by: kmodukuri
---
 fs/cachefiles/namei.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index c4b8934..b08286d 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -194,7 +194,7 @@ wait_for_old_object:
 		pr_err("\n");
 		pr_err("Error: Unexpected object collision\n");
 		cachefiles_printk_object(object, xobject);
-		BUG();
+		WARN(true, "Unexpected object collision\n");
 	}
 	atomic_inc(&xobject->usage);
 	write_unlock(&cache->active_lock);
@@ -247,6 +247,7 @@ wait_for_old_object:
 
 	ASSERT(!test_bit(CACHEFILES_OBJECT_ACTIVE, &xobject->flags));
 
+	clear_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags);
 	cache->cache.ops->put_object(&xobject->fscache);
 	goto try_again;
 
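
Note for reviewers: the reasoning in [Fix] can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not the
kernel code: OBJ_ACTIVE, struct toy_object, toy_test_and_set_active() and
toy_mark_active() are hypothetical names invented here, and the model is
single-threaded with no locking. It assumes the lookup marks the new object
active with a test-and-set at the top of each attempt, so a retry that leaves
the flag set would presumably trip the "Object already active" check seen in
[Impact].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CACHEFILES_OBJECT_ACTIVE (not the kernel value). */
#define OBJ_ACTIVE (1u << 0)

/* Hypothetical, heavily reduced stand-in for a cache object. */
struct toy_object {
	unsigned int flags;
};

/* Models the test-and-set idiom: set the bit, report whether it was set. */
static bool toy_test_and_set_active(struct toy_object *obj)
{
	bool was_set = (obj->flags & OBJ_ACTIVE) != 0;

	obj->flags |= OBJ_ACTIVE;
	return was_set;
}

/*
 * Models the retry loop. 'collisions' is how many times an old object is
 * found in the way; each collision is followed by a (simulated) wait and a
 * retry. Returns the number of attempts on success, or -1 if a retry finds
 * the new object already marked active.
 */
static int toy_mark_active(struct toy_object *obj, int collisions,
			   bool clear_on_retry)
{
	int attempts = 0;

	assert(obj != NULL);

	for (;;) {
		attempts++;

		if (toy_test_and_set_active(obj))
			return -1;	/* the "Object already active" case */

		if (collisions == 0)
			return attempts;	/* no old object in the way */

		collisions--;	/* pretend the old object has now gone away */

		/* The step this patch adds: drop the flag before retrying. */
		if (clear_on_retry)
			obj->flags &= ~OBJ_ACTIVE;
	}
}
```

With clear_on_retry set, one collision resolves on the second attempt;
without it, the second attempt finds the flag still set from the first and
fails. The real code uses atomic bitops under cache->active_lock, which this
model deliberately omits.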