[SAUCE,XENIAL,1/1,CacheFiles] Fix to handle Oops in cachefiles module during new object lookup while old object is being cleaned up

Message ID 10be0b16e3aa4d2792ffe9853a8f9653@HQMAIL108.nvidia.com
State New

Commit Message

Kiran Kumar Modukuri June 11, 2018, 9:03 p.m.
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776254

[Impact]
Kernel Oops under heavy NFS + FS-Cache + CacheFiles load.

CacheFiles: Error: Overlong wait for old active object to go away.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000002

CacheFiles: Error: Object already active
kernel BUG at fs/cachefiles/namei.c:163!

[Cause]
On a heavily loaded system where big files are being read and truncated, the fscache object for a cookie can be in the process of
being dropped while a new object is being looked up. The new object has to wait for the old object to go away before it can be
moved to the active state.
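
The waiting step can be pictured with a small userspace model. This is a sketch under stated assumptions, not the module's code: `struct old_object_model`, `model_wait_for_old_object()` and the `max_waits` bound are hypothetical, each loop iteration stands for one wait period during which cleanup of the old object makes progress, and returning -1 only marks the point where the "Overlong wait" error would be reported. The model does not try to reproduce what the real module does after that.

```c
#include <stdbool.h>

/* Hypothetical model of an old object that still occupies the active set.
 * These fields are illustrative; they are not CacheFiles' real structures. */
struct old_object_model {
    bool in_active_set;   /* old object still registered as active */
    int steps_until_gone; /* wait periods until its cleanup completes */
};

/* Hold off the new object's activation until the old object has gone away.
 * Returns the number of wait periods needed, or -1 if the old object is
 * still present after max_waits periods (roughly where the real module
 * reports "Overlong wait for old active object to go away"). */
static int model_wait_for_old_object(struct old_object_model *old, int max_waits)
{
    int waits = 0;

    while (old->in_active_set) {
        if (waits == max_waits)
            return -1;

        /* One wait period elapses; the old object's cleanup makes progress. */
        waits++;
        if (--old->steps_until_gone <= 0)
            old->in_active_set = false; /* cleanup finished, slot is free */
    }

    /* Only now may the new object be moved to the active state. */
    return waits;
}
```

For example, an old object that needs three wait periods to disappear lets the new object proceed after exactly three waits, while a `max_waits` smaller than the cleanup time yields -1.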

[Fix]
Clear the CACHEFILES_OBJECT_ACTIVE flag on the new object before retrying the object lookup.
Replace the BUG() for the case where the old object is still being dropped with a WARN().
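
Why the missing clear_bit() matters can be shown with a minimal C11 sketch. It assumes, as the "Object already active" BUG in the [Impact] log suggests, that marking an object active begins with an atomic test-and-set of the ACTIVE bit. `MODEL_OBJECT_ACTIVE`, `try_mark_active()` and `collide_then_retry()` are hypothetical stand-ins, with `atomic_fetch_or()`/`atomic_fetch_and()` playing the roles of the kernel's `test_and_set_bit()`/`clear_bit()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CACHEFILES_OBJECT_ACTIVE flag bit. */
#define MODEL_OBJECT_ACTIVE 0x1u

enum mark_result {
    MARK_OK,             /* the object was marked active */
    MARK_ALREADY_ACTIVE, /* the case that hit the "Object already active" BUG() */
};

/* Atomically set the ACTIVE bit and report whether it was already set,
 * mirroring what test_and_set_bit() does for a single bit. */
static enum mark_result try_mark_active(atomic_uint *flags)
{
    unsigned int prev = atomic_fetch_or(flags, MODEL_OBJECT_ACTIVE);

    return (prev & MODEL_OBJECT_ACTIVE) ? MARK_ALREADY_ACTIVE : MARK_OK;
}

/* Model one lookup that collides with an old object and is then retried.
 * clear_before_retry == false models the code before this patch;
 * clear_before_retry == true models the added clear_bit() call. */
static enum mark_result collide_then_retry(bool clear_before_retry)
{
    atomic_uint flags = 0;

    /* First attempt sets the bit, then finds the old object still active. */
    (void)try_mark_active(&flags);

    /* ... wait for the old object to go away, drop the reference ... */

    if (clear_before_retry)
        (void)atomic_fetch_and(&flags, ~MODEL_OBJECT_ACTIVE); /* like clear_bit() */

    /* The retry (goto try_again) tests and sets the bit once more. */
    return try_mark_active(&flags);
}
```

With these definitions, `collide_then_retry(false)` returns `MARK_ALREADY_ACTIVE`, the model's counterpart of the BUG at namei.c:163, while `collide_then_retry(true)` returns `MARK_OK`.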

[Testcase]
A user has run ~100 hours of NFS stress tests and not seen this bug recur.

[Regression Potential]
 - Limited to fscache/cachefiles.

Signed-off-by: kmodukuri <kmodukuri@nvidia.com>
---
 fs/cachefiles/namei.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--
2.7.4

Patch

diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index c4b8934..b08286d 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -194,7 +194,7 @@ wait_for_old_object:
                pr_err("\n");
                pr_err("Error: Unexpected object collision\n");
                cachefiles_printk_object(object, xobject);
-               BUG();
+               WARN(true, "Unexpected object collision\n");
        }
        atomic_inc(&xobject->usage);
        write_unlock(&cache->active_lock);
@@ -247,6 +247,7 @@ wait_for_old_object:

        ASSERT(!test_bit(CACHEFILES_OBJECT_ACTIVE, &xobject->flags));

+       clear_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags);
        cache->cache.ops->put_object(&xobject->fscache);
        goto try_again;