From patchwork Mon Jun 11 21:04:20 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kiran Kumar Modukuri
X-Patchwork-Id: 928334
From: Kiran Kumar Modukuri
To: "kernel-team@lists.ubuntu.com"
Subject: [SAUCE][XENIAL][PATCH 1/1] [FSCACHE] Fix to handle Oops in fscache module during cookie cleanup
Date: Mon, 11 Jun 2018 21:04:20 +0000
Message-ID: <226543b51601404594ae19bf2ecc877a@HQMAIL108.nvidia.com>
List-Id: Kernel team discussions

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277

The fscache cookie reference count is updated incorrectly during fscache
object allocation, resulting in the following Oops:

kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321!
kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!

[Cause]
1) Two threads operate on one cookie and two objects.

2a) One thread unmounts the filesystem and, in the process, walks a large
    list of objects, marking them dead and deleting them. cookie->usage is
    also decremented in the following path:

      nfs_fscache_release_super_cookie
       -> __fscache_relinquish_cookie
        -> __fscache_cookie_put
         -> BUG_ON(atomic_read(&cookie->usage) <= 0);

2b) A second thread looks up an object in order to read data, in the
    following path:

    fscache_alloc_object
      1) cachefiles_alloc_object
          -> fscache_object_init
             -> assigns the cookie, but does not bump usage.
      2) fscache_attach_object
          -> fails in cant_attach_object because the cookie's backing
             object or the cookie's parent object is going away.
      3) fscache_put_object
          -> cachefiles_put_object
            -> fscache_object_destroy
              -> fscache_cookie_put
                 -> BUG_ON(atomic_read(&cookie->usage) <= 0);

[Fix]
Bump the cookie usage in fscache_object_init, where the object is first
assigned a cookie. The increment uses atomic_inc_not_zero(), so the cookie
is assigned and its refcount bumped only if the refcount is not zero.
Remove the assignment from fscache_attach_object.

[Testcase]
A user has run ~100 hours of NFS stress tests and not seen this bug recur.

[Regression Potential]
 - Limited to fscache/cachefiles.

Signed-off-by: kmodukuri
---
 fs/fscache/cookie.c | 6 ++----
 fs/fscache/object.c | 6 +++++-
 2 files changed, 7 insertions(+), 5 deletions(-)

--
2.7.4

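The [Fix] section relies on an increment-if-not-zero operation. The following is a minimal userspace sketch of that idea using C11 atomics. It is not the kernel implementation of atomic_inc_not_zero(), and the names cookie_model, cookie_get_not_zero and cookie_put are hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct fscache_cookie. */
struct cookie_model {
	atomic_int usage;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns true when a reference was taken, false when the
 * count had already dropped to zero.
 */
static bool cookie_get_not_zero(struct cookie_model *c)
{
	int old = atomic_load(&c->usage);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&c->usage, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Drop a reference. Returns true when the caller dropped the last one.
 * The assert loosely mirrors the BUG_ON() in the put path: dropping a
 * reference that was never taken is a bug.
 */
static bool cookie_put(struct cookie_model *c)
{
	int old = atomic_fetch_sub(&c->usage, 1);

	assert(old > 0);
	return old == 1;
}
```

In this model, a thread that loses the race with the final put gets false back from cookie_get_not_zero() and must not call cookie_put() later, which is the imbalance the [Cause] section describes.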
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 40d6107..8e2c9ad 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -302,6 +302,7 @@ static int fscache_alloc_object(struct fscache_cache *cache,
 		goto error;
 	}
 
+	ASSERTCMP(object->cookie, ==, cookie);
 	fscache_stat(&fscache_n_object_alloc);
 
 	object->debug_id = atomic_inc_return(&fscache_object_debug_id);
@@ -358,7 +359,7 @@ static int fscache_attach_object(struct fscache_cookie *cookie,
 	_enter("{%s},{OBJ%x}", cookie->def->name, object->debug_id);
 
 	spin_lock(&cookie->lock);
-
+	ASSERTCMP(object->cookie, ==, cookie);
 	/* there may be multiple initial creations of this object, but we only
 	 * want one */
 	ret = -EEXIST;
@@ -396,9 +397,6 @@ static int fscache_attach_object(struct fscache_cookie *cookie,
 		spin_unlock(&cache->object_list_lock);
 	}
 
-	/* attach to the cookie */
-	object->cookie = cookie;
-	atomic_inc(&cookie->usage);
 	hlist_add_head(&object->cookie_link, &cookie->backing_objects);
 
 	fscache_objlist_add(object);
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index 7a182c8..cfc437d 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -317,7 +317,11 @@ void fscache_object_init(struct fscache_object *object,
 	object->store_limit = 0;
 	object->store_limit_l = 0;
 	object->cache = cache;
-	object->cookie = cookie;
+	if (cookie) {
+		if (atomic_inc_not_zero(&cookie->usage)) {
+			object->cookie = cookie;
+		}
+	}
 	object->parent = NULL;
 #ifdef CONFIG_FSCACHE_OBJECT_LIST
 	RB_CLEAR_NODE(&object->objlist_link);
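The effect of moving the reference from fscache_attach_object to fscache_object_init can be modelled in userspace. The sketch below is a simplified model under stated assumptions, not kernel code: model_cookie, model_object, model_object_init and model_object_destroy are hypothetical names. The model also sets the cookie pointer to NULL before trying to take the reference, which is a choice made for this sketch only; the patch does not say what object->cookie holds when the increment fails.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the fscache cookie and object. */
struct model_cookie {
	atomic_int usage;
};

struct model_object {
	struct model_cookie *cookie;
};

/*
 * Models the patched init path: the object pins the cookie at the
 * moment the pointer is assigned, and only if the cookie is still live.
 */
static void model_object_init(struct model_object *obj,
			      struct model_cookie *ck)
{
	obj->cookie = NULL;	/* sketch assumption: start from a known value */
	if (ck) {
		int old = atomic_load(&ck->usage);

		while (old != 0) {
			/* On failure, 'old' is refreshed and the loop retries. */
			if (atomic_compare_exchange_weak(&ck->usage, &old,
							 old + 1)) {
				obj->cookie = ck;
				break;
			}
		}
	}
}

/*
 * Models the destroy path: a reference is dropped only if init took
 * one, so a failed attach followed by destroy leaves the count balanced.
 */
static void model_object_destroy(struct model_object *obj)
{
	if (obj->cookie) {
		int old = atomic_fetch_sub(&obj->cookie->usage, 1);

		assert(old > 0);	/* would indicate an unbalanced put */
		obj->cookie = NULL;
	}
}
```

With the reference taken at init, the sequence init, failed attach, destroy returns the count to its starting value. In the pre-patch flow the destroy step dropped a reference that had never been taken, which is how the BUG_ON in the put path was reached.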