From patchwork Fri Mar 22 20:05:32 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mauricio Faria de Oliveira
X-Patchwork-Id: 1061621
From: Mauricio Faria de Oliveira
To: kernel-team@lists.ubuntu.com
Subject: [B/C][PATCH 1/1] fscache: fix race between enablement and dropping of object
Date: Fri, 22 Mar 2019 17:05:32 -0300
Message-Id: <20190322200532.22424-2-mfo@canonical.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190322200532.22424-1-mfo@canonical.com>
References: <20190322200532.22424-1-mfo@canonical.com>
List-Id: Kernel team discussions

From: NeilBrown

BugLink: https://bugs.launchpad.net/bugs/1821395

It was observed that a process blocked indefinitely in
__fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP
to be cleared via fscache_wait_for_deferred_lookup().

At this time, ->backing_objects was empty, which would normally prevent
__fscache_read_or_alloc_page() from getting to the point of waiting.
This implies that ->backing_objects was cleared *after*
__fscache_read_or_alloc_page() was entered.

When an object is "killed" and then "dropped",
FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then
KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is
->backing_objects cleared. This leaves a window where
something else can set FSCACHE_COOKIE_LOOKING_UP and
__fscache_read_or_alloc_page() can start waiting, before
->backing_objects is cleared.

There is some uncertainty in this analysis, but it seems to fit the
observations. Adding the wake in this patch will be handled correctly
by __fscache_read_or_alloc_page(), as it checks if ->backing_objects
is empty again, after waiting.

The customer who reported the hang also reports that the hang cannot
be reproduced with this fix.

The backtrace for the blocked process looked like:

PID: 29360  TASK: ffff881ff2ac0f80  CPU: 3  COMMAND: "zsh"
 #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1
 #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed
 #2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8
 #3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e
 #4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache]
 #5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache]
 #6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs]
 #7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs]
 #8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73
 #9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs]

Signed-off-by: NeilBrown
Signed-off-by: David Howells
(cherry picked from commit c5a94f434c82529afda290df3235e4d85873c5b4)
Signed-off-by: Mauricio Faria de Oliveira
---
 fs/fscache/object.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index 4ae441f65af2..fc1ba5bfb5c3 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -716,6 +716,9 @@ static const struct fscache_state *fscache_drop_object(struct fscache_object *ob
 
 	if (awaken)
 		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING);
+	if (test_and_clear_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags))
+		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP);
+
 
 	/* Prevent a race with our last child, which has to signal EV_CLEARED
 	 * before dropping our spinlock.
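
Note for reviewers: the interleaving described in the commit message can be replayed with a small user-space model. This is only a sketch under stated assumptions, not kernel code and not part of the patch. Every name in it (model_cookie, model_drop_object, model_replay, and so on) is hypothetical, a plain flag word stands in for FSCACHE_COOKIE_LOOKING_UP, and a counter stands in for ->backing_objects. Sleeping and waking are not modelled; a set bit simply means "the waiter cannot make progress".

```c
/* User-space model of the race; NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_LOOKING_UP 0x1u	/* stands in for FSCACHE_COOKIE_LOOKING_UP */

struct model_cookie {
	unsigned int flags;
	int nr_backing_objects;	/* stands in for ->backing_objects */
};

enum reader_result {
	READER_STILL_WAITING,	/* bit still set: the reader stays blocked */
	READER_GIVES_UP,	/* bit clear, list empty: the reader bails out */
	READER_PROCEEDS		/* bit clear, list non-empty: the reader continues */
};

/*
 * Models the relevant part of fscache_drop_object(): the list of backing
 * objects is emptied. With the fix, the bit is also cleared if it was set;
 * in the kernel that is where wake_up_bit() would follow.
 */
static void model_drop_object(struct model_cookie *c, bool with_fix)
{
	assert(c != NULL);
	c->nr_backing_objects = 0;
	if (with_fix && (c->flags & MODEL_LOOKING_UP))
		c->flags &= ~MODEL_LOOKING_UP;
}

/*
 * Models a reader that already passed the initial "is the list empty?"
 * check and is now waiting on the bit. It only makes progress once the
 * bit is clear, and then rechecks the list.
 */
static enum reader_result model_reader_after_drop(const struct model_cookie *c)
{
	assert(c != NULL);
	if (c->flags & MODEL_LOOKING_UP)
		return READER_STILL_WAITING;
	return c->nr_backing_objects == 0 ? READER_GIVES_UP : READER_PROCEEDS;
}

/* Replays the interleaving from the commit message, with or without the fix. */
static enum reader_result model_replay(bool with_fix)
{
	/* Step 1: the lookup failure has already cleared the bit. */
	struct model_cookie c = { .flags = 0, .nr_backing_objects = 1 };

	/* Step 2: something else sets the bit again. */
	c.flags |= MODEL_LOOKING_UP;

	/* Step 3: the reader sees a non-empty list and starts waiting. */

	/* Step 4: the object is dropped and the list is emptied. */
	model_drop_object(&c, with_fix);

	return model_reader_after_drop(&c);
}
```

In this model, model_replay(false) leaves the reader waiting on a bit that nothing will clear, which matches the reported hang, while model_replay(true) lets the reader recheck the now-empty list and give up.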