[SAUCE,XENIAL,1/1,CacheFiles] Fix to handle Oops in cachefiles module during new object lookup while old object is being cleaned up

Message ID 1528829063537.99986@nvidia.com
State New

Commit Message

Kiran Kumar Modukuri June 12, 2018, 6:44 p.m.
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776254

[Impact]
Oops under heavy NFS + FSCache + CacheFiles load:

CacheFiles: Error: Overlong wait for old active object to go away.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000002

CacheFiles: Error: Object already active
kernel BUG at fs/cachefiles/namei.c:163!

[Cause]
On a heavily loaded system where big files are being read and truncated, the fscache object for a cookie can be
dropped while a new object is being looked up. The new object has to wait for the old object to go away before
it can be moved to the active state.

[Fix]
Clear the 'CACHEFILES_OBJECT_ACTIVE' flag on the new object before retrying the object lookup.
Convert the BUG() for the case where the old object is still being dropped into a WARN().

[Testcase]
A user has run ~100 hours of NFS stress tests and not seen this bug recur.

[Regression Potential]
 - Limited to fscache/cachefiles.

Signed-off-by: kmodukuri <kmodukuri@nvidia.com>
---
 fs/cachefiles/namei.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Patch

diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index c4b8934..b08286d 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -194,7 +194,7 @@  wait_for_old_object:
 		pr_err("\n");
 		pr_err("Error: Unexpected object collision\n");
 		cachefiles_printk_object(object, xobject);
-		BUG();
+		WARN(true, "Unexpected object collision\n");
 	}
 	atomic_inc(&xobject->usage);
 	write_unlock(&cache->active_lock);
@@ -247,6 +247,7 @@  wait_for_old_object:
 
 	ASSERT(!test_bit(CACHEFILES_OBJECT_ACTIVE, &xobject->flags));
 
+	clear_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags);
 	cache->cache.ops->put_object(&xobject->fscache);
 	goto try_again;