diff mbox series

[SAUCE,XENIAL,1/1,FSCACHE] Fix to handle Oops in fscache module during cookie cleanup

Message ID 226543b51601404594ae19bf2ecc877a@HQMAIL108.nvidia.com
State New
Headers show
Series [SAUCE,XENIAL,1/1,FSCACHE] Fix to handle Oops in fscache module during cookie cleanup | expand

Commit Message

Kiran Kumar Modukuri June 11, 2018, 9:04 p.m. UTC
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277

fscache cookie ref count updated incorrectly during fscache object allocation resulting in following Oops.

kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321!
kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!

[Cause]
1)Two threads are trying to do operate on a cookie and two objects.
2a)One thread tries to unmount the filesystem and in process goes over
   a huge list of objects marking them dead and deleting the objects.
   cookie->usage is also decremented in following path
      nfs_fscache_release_super_cookie
       -> __fscache_relinquish_cookie
        ->__fscache_cookie_put
        ->BUG_ON(atomic_read(&cookie->usage) <= 0);

2b)second thread tries to lookup an object for reading data in
   following path

   fscache_alloc_object
   1) cachefiles_alloc_object
       -> fscache_object_init
          -> assign cookie, but usage not bumped.
   2) fscache_attach_object -> fails in cant_attach_object because the
        cookie's backing object or cookie's->parent object are going away
   3)fscache_put_object
       -> cachefiles_put_object
         ->fscache_object_destroy
           ->fscache_cookie_put
              ->BUG_ON(atomic_read(&cookie->usage) <= 0);
[Fix]
 Bump up the cookie usage in fscache_object_init,
 when it is first being assigned a cookie atomically such that the cookie
 is added and bumped up if its refcount is not zero.
 remove the assignment in the attach_object.

[Testcase]
A user has run ~100 hours of NFS stress tests and not seen this bug recur.

[Regression Potential]
 - Limited to fscache/cachefiles.

Signed-off-by: kmodukuri <kmodukuri@nvidia.com>
---
 fs/fscache/cookie.c | 6 ++----
 fs/fscache/object.c | 6 +++++-
 2 files changed, 7 insertions(+), 5 deletions(-)

--
2.7.4
-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may contain
confidential information.  Any unauthorized review, use, disclosure or distribution
is prohibited.  If you are not the intended recipient, please contact the sender by
reply email and destroy all copies of the original message.
-----------------------------------------------------------------------------------

Comments

Stefan Bader June 13, 2018, 7:19 a.m. UTC | #1
On 11.06.2018 23:04, Kiran Kumar Modukuri wrote:
> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277
> 
> fscache cookie ref count updated incorrectly during fscache object allocation resulting in following Oops.
> 
> kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321!
> kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
> 

Any reason this is not sent upstream? Looking at today's tip it seems to me the
exact same problem still exists in current kernel versions. We rather prefer bug
fixes through upstream/stable because they then will be reviewed by upstream
subsystem maintainer(s). But more importantly this will then more likely come
back to *all* currently supported series (like this one to at least 18.04/bionic
and also into the current development series (cosmic).

-Stefan

> [Cause]
> 1)Two threads are trying to do operate on a cookie and two objects.
> 2a)One thread tries to unmount the filesystem and in process goes over
>    a huge list of objects marking them dead and deleting the objects.
>    cookie->usage is also decremented in following path
>       nfs_fscache_release_super_cookie
>        -> __fscache_relinquish_cookie
>         ->__fscache_cookie_put
>         ->BUG_ON(atomic_read(&cookie->usage) <= 0);
> 
> 2b)second thread tries to lookup an object for reading data in
>    following path
> 
>    fscache_alloc_object
>    1) cachefiles_alloc_object
>        -> fscache_object_init
>           -> assign cookie, but usage not bumped.
>    2) fscache_attach_object -> fails in cant_attach_object because the
>         cookie's backing object or cookie's->parent object are going away
>    3)fscache_put_object
>        -> cachefiles_put_object
>          ->fscache_object_destroy
>            ->fscache_cookie_put
>               ->BUG_ON(atomic_read(&cookie->usage) <= 0);
> [Fix]
>  Bump up the cookie usage in fscache_object_init,
>  when it is first being assigned a cookie atomically such that the cookie
>  is added and bumped up if its refcount is not zero.
>  remove the assignment in the attach_object.
> 
> [Testcase]
> A user has run ~100 hours of NFS stress tests and not seen this bug recur.
> 
> [Regression Potential]
>  - Limited to fscache/cachefiles.
> 
> Signed-off-by: kmodukuri <kmodukuri@nvidia.com>
> ---
>  fs/fscache/cookie.c | 6 ++----
>  fs/fscache/object.c | 6 +++++-
>  2 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
> index 40d6107..8e2c9ad 100644
> --- a/fs/fscache/cookie.c
> +++ b/fs/fscache/cookie.c
> @@ -302,6 +302,7 @@ static int fscache_alloc_object(struct fscache_cache *cache,
>                 goto error;
>         }
> 
> +       ASSERTCMP(object->cookie, ==, cookie);
>         fscache_stat(&fscache_n_object_alloc);
> 
>         object->debug_id = atomic_inc_return(&fscache_object_debug_id);
> @@ -358,7 +359,7 @@ static int fscache_attach_object(struct fscache_cookie *cookie,
>         _enter("{%s},{OBJ%x}", cookie->def->name, object->debug_id);
> 
>         spin_lock(&cookie->lock);
> -
> +       ASSERTCMP(object->cookie, ==, cookie);
>         /* there may be multiple initial creations of this object, but we only
>          * want one */
>         ret = -EEXIST;
> @@ -396,9 +397,6 @@ static int fscache_attach_object(struct fscache_cookie *cookie,
>                 spin_unlock(&cache->object_list_lock);
>         }
> 
> -       /* attach to the cookie */
> -       object->cookie = cookie;
> -       atomic_inc(&cookie->usage);
>         hlist_add_head(&object->cookie_link, &cookie->backing_objects);
> 
>         fscache_objlist_add(object);
> diff --git a/fs/fscache/object.c b/fs/fscache/object.c
> index 7a182c8..cfc437d 100644
> --- a/fs/fscache/object.c
> +++ b/fs/fscache/object.c
> @@ -317,7 +317,11 @@ void fscache_object_init(struct fscache_object *object,
>         object->store_limit = 0;
>         object->store_limit_l = 0;
>         object->cache = cache;
> -       object->cookie = cookie;
> +       if (cookie) {
> +               if (atomic_inc_not_zero(&cookie->usage)) {
> +                       object->cookie = cookie;
> +               }
> +       }
>         object->parent = NULL;
>  #ifdef CONFIG_FSCACHE_OBJECT_LIST
>         RB_CLEAR_NODE(&object->objlist_link);
> --
> 2.7.4
> -----------------------------------------------------------------------------------
> This email message is for the sole use of the intended recipient(s) and may contain
> confidential information.  Any unauthorized review, use, disclosure or distribution
> is prohibited.  If you are not the intended recipient, please contact the sender by
> reply email and destroy all copies of the original message.
> -----------------------------------------------------------------------------------
>
Kiran Kumar Modukuri June 15, 2018, 5:03 p.m. UTC | #2
Thanks for replying.  Yes I will be sending the exact patch to the upstream mailing list linux-cachefs@redhat.com  as well.  The request for SAUCE patch is to save calendar time.
Also looks like the formatting from email client has changed the tabs to spaces. I will resend the patch with indentation corrected.

Thanks
Kiran

-----Original Message-----
From: Stefan Bader <stefan.bader@canonical.com> 
Sent: Wednesday, June 13, 2018 12:19 AM
To: Kiran Kumar Modukuri <kmodukuri@nvidia.com>; kernel-team@lists.ubuntu.com
Subject: Re: [SAUCE][XENIAL][PATCH 1/1] [FSCACHE] Fix to handle Oops in fscache module during cookie cleanup

On 11.06.2018 23:04, Kiran Kumar Modukuri wrote:
> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277
> 
> fscache cookie ref count updated incorrectly during fscache object allocation resulting in following Oops.
> 
> kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321!
> kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
> 

Any reason this is not sent upstream? Looking at today's tip it seems to me the exact same problem still exists in current kernel versions. We rather prefer bug fixes through upstream/stable because they then will be reviewed by upstream subsystem maintainer(s). But more importantly this will then more likely come back to *all* currently supported series (like this one to at least 18.04/bionic and also into the current development series (cosmic).

-Stefan

> [Cause]
> 1)Two threads are trying to do operate on a cookie and two objects.
> 2a)One thread tries to unmount the filesystem and in process goes over
>    a huge list of objects marking them dead and deleting the objects.
>    cookie->usage is also decremented in following path
>       nfs_fscache_release_super_cookie
>        -> __fscache_relinquish_cookie
>         ->__fscache_cookie_put
>         ->BUG_ON(atomic_read(&cookie->usage) <= 0);
> 
> 2b)second thread tries to lookup an object for reading data in
>    following path
> 
>    fscache_alloc_object
>    1) cachefiles_alloc_object
>        -> fscache_object_init
>           -> assign cookie, but usage not bumped.
>    2) fscache_attach_object -> fails in cant_attach_object because the
>         cookie's backing object or cookie's->parent object are going away
>    3)fscache_put_object
>        -> cachefiles_put_object
>          ->fscache_object_destroy
>            ->fscache_cookie_put
>               ->BUG_ON(atomic_read(&cookie->usage) <= 0); [Fix]  Bump 
> up the cookie usage in fscache_object_init,  when it is first being 
> assigned a cookie atomically such that the cookie  is added and bumped 
> up if its refcount is not zero.
>  remove the assignment in the attach_object.
> 
> [Testcase]
> A user has run ~100 hours of NFS stress tests and not seen this bug recur.
> 
> [Regression Potential]
>  - Limited to fscache/cachefiles.
> 
> Signed-off-by: kmodukuri <kmodukuri@nvidia.com>
> ---
>  fs/fscache/cookie.c | 6 ++----
>  fs/fscache/object.c | 6 +++++-
>  2 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index 
> 40d6107..8e2c9ad 100644
> --- a/fs/fscache/cookie.c
> +++ b/fs/fscache/cookie.c
> @@ -302,6 +302,7 @@ static int fscache_alloc_object(struct fscache_cache *cache,
>                 goto error;
>         }
> 
> +       ASSERTCMP(object->cookie, ==, cookie);
>         fscache_stat(&fscache_n_object_alloc);
> 
>         object->debug_id = 
> atomic_inc_return(&fscache_object_debug_id);
> @@ -358,7 +359,7 @@ static int fscache_attach_object(struct fscache_cookie *cookie,
>         _enter("{%s},{OBJ%x}", cookie->def->name, object->debug_id);
> 
>         spin_lock(&cookie->lock);
> -
> +       ASSERTCMP(object->cookie, ==, cookie);
>         /* there may be multiple initial creations of this object, but we only
>          * want one */
>         ret = -EEXIST;
> @@ -396,9 +397,6 @@ static int fscache_attach_object(struct fscache_cookie *cookie,
>                 spin_unlock(&cache->object_list_lock);
>         }
> 
> -       /* attach to the cookie */
> -       object->cookie = cookie;
> -       atomic_inc(&cookie->usage);
>         hlist_add_head(&object->cookie_link, 
> &cookie->backing_objects);
> 
>         fscache_objlist_add(object);
> diff --git a/fs/fscache/object.c b/fs/fscache/object.c index 
> 7a182c8..cfc437d 100644
> --- a/fs/fscache/object.c
> +++ b/fs/fscache/object.c
> @@ -317,7 +317,11 @@ void fscache_object_init(struct fscache_object *object,
>         object->store_limit = 0;
>         object->store_limit_l = 0;
>         object->cache = cache;
> -       object->cookie = cookie;
> +       if (cookie) {
> +               if (atomic_inc_not_zero(&cookie->usage)) {
> +                       object->cookie = cookie;
> +               }
> +       }
>         object->parent = NULL;
>  #ifdef CONFIG_FSCACHE_OBJECT_LIST
>         RB_CLEAR_NODE(&object->objlist_link);
> --
> 2.7.4
> ----------------------------------------------------------------------
> ------------- This email message is for the sole use of the intended 
> recipient(s) and may contain confidential information.  Any 
> unauthorized review, use, disclosure or distribution is prohibited.  
> If you are not the intended recipient, please contact the sender by 
> reply email and destroy all copies of the original message.
> ----------------------------------------------------------------------
> -------------
>
diff mbox series

Patch

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 40d6107..8e2c9ad 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -302,6 +302,7 @@  static int fscache_alloc_object(struct fscache_cache *cache,
                goto error;
        }

+       ASSERTCMP(object->cookie, ==, cookie);
        fscache_stat(&fscache_n_object_alloc);

        object->debug_id = atomic_inc_return(&fscache_object_debug_id);
@@ -358,7 +359,7 @@  static int fscache_attach_object(struct fscache_cookie *cookie,
        _enter("{%s},{OBJ%x}", cookie->def->name, object->debug_id);

        spin_lock(&cookie->lock);
-
+       ASSERTCMP(object->cookie, ==, cookie);
        /* there may be multiple initial creations of this object, but we only
         * want one */
        ret = -EEXIST;
@@ -396,9 +397,6 @@  static int fscache_attach_object(struct fscache_cookie *cookie,
                spin_unlock(&cache->object_list_lock);
        }

-       /* attach to the cookie */
-       object->cookie = cookie;
-       atomic_inc(&cookie->usage);
        hlist_add_head(&object->cookie_link, &cookie->backing_objects);

        fscache_objlist_add(object);
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index 7a182c8..cfc437d 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -317,7 +317,11 @@  void fscache_object_init(struct fscache_object *object,
        object->store_limit = 0;
        object->store_limit_l = 0;
        object->cache = cache;
-       object->cookie = cookie;
+       if (cookie) {
+               if (atomic_inc_not_zero(&cookie->usage)) {
+                       object->cookie = cookie;
+               }
+       }
        object->parent = NULL;
 #ifdef CONFIG_FSCACHE_OBJECT_LIST
        RB_CLEAR_NODE(&object->objlist_link);