diff mbox

[Natty,SRU] Kernel Oops : Dentry still in use (1) [unmount of nfs4 0:1d]

Message ID 4ED68017.5030504@canonical.com
State New
Headers show

Commit Message

Chris J Arges Nov. 30, 2011, 7:12 p.m. UTC
BugLink:  http://bugs.launchpad.net/bugs/769927

SRU Justification:

Impact:
If two processes attempt to cause automounting on the same mountpoint at
the same time, the vfsmount holding the mountpoint will be left with one
too few references on it, causing a BUG when the kernel tries to clean up.

Fix:
Fix has been cherry picked from 8aef18845266f5c05904c610088f2d1ed58f6be3t.
It has been accepted in 3.0 and is present in ubuntu-oneiric.
It applies cleanly to ubuntu-natty.

Testcase:
The procedure for replicating this bug is outlined in the patch
description. Essentially mounting an nfs mount with another mount in it.
Running a test program, and unmount the submount and original mount.

Patch is attached.

Comments

Herton Ronaldo Krzesinski Nov. 30, 2011, 8:06 p.m. UTC | #1
On Wed, Nov 30, 2011 at 01:12:23PM -0600, Chris J Arges wrote:
> BugLink:  http://bugs.launchpad.net/bugs/769927
> 
> SRU Justification:
> 
> Impact:
> If two processes attempt to cause automounting on the same mountpoint at
> the same time, the vfsmount holding the mountpoint will be left with one
> too few references on it, causing a BUG when the kernel tries to clean up.
> 
> Fix:
> Fix has been cherry picked from 8aef18845266f5c05904c610088f2d1ed58f6be3t.
> It has been accepted in 3.0 and is present in ubuntu-oneiric.
> It applies cleanly to ubuntu-natty.
> 
> Testcase:
> The procedure for replicating this bug is outlined in the patch
> description. Essentially mounting an nfs mount with another mount in it.
> Running a test program, and unmount the submount and original mount.
> 
> Patch is attached.

> From a8d02e97b6a23ac9648a7a550feb0ed7c7109ec2 Mon Sep 17 00:00:00 2001
> From: Al Viro <viro@ZenIV.linux.org.uk>
> Date: Thu, 16 Jun 2011 15:10:06 +0100
> Subject: [PATCH] VFS: Fix vfsmount overput on simultaneous automount
> 
> BugLink: http://bugs.launchpad.net/bugs/769927
> 
> [Kudos to dhowells for tracking that crap down]
> 
> If two processes attempt to cause automounting on the same mountpoint at the
> same time, the vfsmount holding the mountpoint will be left with one too few
> references on it, causing a BUG when the kernel tries to clean up.
> 
> The problem is that lock_mount() drops the caller's reference to the
> mountpoint's vfsmount in the case where it finds something already mounted on
> the mountpoint as it transits to the mounted filesystem and replaces path->mnt
> with the new mountpoint vfsmount.
> 
> During a pathwalk, however, we don't take a reference on the vfsmount if it is
> the same as the one in the nameidata struct, but do_add_mount() doesn't know
> this.
> 
> The fix is to make sure we have a ref on the vfsmount of the mountpoint before
> calling do_add_mount().  However, if lock_mount() doesn't transit, we're then
> left with an extra ref on the mountpoint vfsmount which needs releasing.
> We can handle that in follow_managed() by not making assumptions about what
> we can and what we cannot get from lookup_mnt() as the current code does.
> 
> The callers of follow_managed() expect that reference to path->mnt will be
> grabbed iff path->mnt has been changed.  follow_managed() and follow_automount()
> keep track of whether such reference has been grabbed and assume that it'll
> happen in those and only those cases that'll have us return with changed
> path->mnt.  That assumption is almost correct - it breaks in case of
> racing automounts and in even harder to hit race between following a mountpoint
> and a couple of mount --move.  The thing is, we don't need to make that
> assumption at all - after the end of loop in follow_manage() we can check
> if path->mnt has ended up unchanged and do mntput() if needed.
> 
> The BUG can be reproduced with the following test program:
> 
> 	#include <stdio.h>
> 	#include <sys/types.h>
> 	#include <sys/stat.h>
> 	#include <unistd.h>
> 	#include <sys/wait.h>
> 	int main(int argc, char **argv)
> 	{
> 		int pid, ws;
> 		struct stat buf;
> 		pid = fork();
> 		stat(argv[1], &buf);
> 		if (pid > 0) wait(&ws);
> 		return 0;
> 	}
> 
> and the following procedure:
> 
>  (1) Mount an NFS volume that on the server has something else mounted on a
>      subdirectory.  For instance, I can mount / from my server:
> 
> 	mount warthog:/ /mnt -t nfs4 -r
> 
>      On the server /data has another filesystem mounted on it, so NFS will see
>      a change in FSID as it walks down the path, and will mark /mnt/data as
>      being a mountpoint.  This will cause the automount code to be triggered.
> 
>      !!! Do not look inside the mounted fs at this point !!!
> 
>  (2) Run the above program on a file within the submount to generate two
>      simultaneous automount requests:
> 
> 	/tmp/forkstat /mnt/data/testfile
> 
>  (3) Unmount the automounted submount:
> 
> 	umount /mnt/data
> 
>  (4) Unmount the original mount:
> 
> 	umount /mnt
> 
>      At this point the kernel should throw a BUG with something like the
>      following:
> 
> 	BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]
> 
> Note that the bug appears on the root dentry of the original mount, not the
> mountpoint and not the submount because sys_umount() hasn't got to its final
> mntput_no_expire() yet, but this isn't so obvious from the call trace:
> 
>  [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82
>  [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b
>  [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
>  [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e
>  [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs]
>  [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83
>  [<ffffffff811629ff>] deactivate_super+0x6f/0x7b
>  [<ffffffff81186261>] mntput_no_expire+0x18d/0x199
>  [<ffffffff811862a8>] mntput+0x3b/0x44
>  [<ffffffff81186d87>] release_mounts+0xa2/0xbf
>  [<ffffffff811876af>] sys_umount+0x47a/0x4ba
>  [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f
>  [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b
> 
> as do_umount() is inlined.  However, you can see release_mounts() in there.
> 
> Note also that it may be necessary to have multiple CPU cores to be able to
> trigger this bug.
> 
> Tested-by: Jeff Layton <jlayton@redhat.com>
> Tested-by: Ian Kent <raven@themaw.net>
> Signed-off-by: David Howells <dhowells@redhat.com>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> (cherry picked from commit 8aef18845266f5c05904c610088f2d1ed58f6be3)
> 
> Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
> ---
>  fs/namei.c |   24 ++++++++++++++++--------
>  1 files changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 44890c6..7bf6968 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -923,6 +923,11 @@ static int follow_automount(struct path *path, unsigned flags,
>  	if (!mnt) /* mount collision */
>  		return 0;
>  
> +	if (!*need_mntput) {
> +		/* lock_mount() may release path->mnt on error */
> +		mntget(path->mnt);
> +		*need_mntput = true;
> +	}
>  	err = finish_automount(mnt, path);
>  
>  	switch (err) {
> @@ -930,12 +935,9 @@ static int follow_automount(struct path *path, unsigned flags,
>  		/* Someone else made a mount here whilst we were busy */
>  		return 0;
>  	case 0:
> -		dput(path->dentry);
> -		if (*need_mntput)
> -			mntput(path->mnt);
> +		path_put(path);
>  		path->mnt = mnt;
>  		path->dentry = dget(mnt->mnt_root);
> -		*need_mntput = true;
>  		return 0;
>  	default:
>  		return err;
> @@ -955,9 +957,10 @@ static int follow_automount(struct path *path, unsigned flags,
>   */
>  static int follow_managed(struct path *path, unsigned flags)
>  {
> +	struct vfsmount *mnt = path->mnt; /* held by caller, must be left alone */
>  	unsigned managed;
>  	bool need_mntput = false;
> -	int ret;
> +	int ret = 0;
>  
>  	/* Given that we're not holding a lock here, we retain the value in a
>  	 * local variable for each dentry as we look at it so that we don't see
> @@ -973,7 +976,7 @@ static int follow_managed(struct path *path, unsigned flags)
>  			ret = path->dentry->d_op->d_manage(path->dentry,
>  							   false, false);
>  			if (ret < 0)
> -				return ret == -EISDIR ? 0 : ret;
> +				break;
>  		}
>  
>  		/* Transit to a mounted filesystem. */
> @@ -999,14 +1002,19 @@ static int follow_managed(struct path *path, unsigned flags)
>  		if (managed & DCACHE_NEED_AUTOMOUNT) {
>  			ret = follow_automount(path, flags, &need_mntput);
>  			if (ret < 0)
> -				return ret == -EISDIR ? 0 : ret;
> +				break;
>  			continue;
>  		}
>  
>  		/* We didn't change the current path point */
>  		break;
>  	}
> -	return 0;
> +
> +	if (need_mntput && path->mnt == mnt)
> +		mntput(path->mnt);
> +	if (ret == -EISDIR)
> +		ret = 0;
> +	return ret;
>  }
>  
>  int follow_down_one(struct path *path)
> -- 
> 1.7.5.4
> 

> -- 
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
Tim Gardner Dec. 1, 2011, 2:51 a.m. UTC | #2
On 11/30/2011 12:12 PM, Chris J Arges wrote:
> BugLink:  http://bugs.launchpad.net/bugs/769927
>
> SRU Justification:
>
> Impact:
> If two processes attempt to cause automounting on the same mountpoint at
> the same time, the vfsmount holding the mountpoint will be left with one
> too few references on it, causing a BUG when the kernel tries to clean up.
>
> Fix:
> Fix has been cherry picked from 8aef18845266f5c05904c610088f2d1ed58f6be3t.
> It has been accepted in 3.0 and is present in ubuntu-oneiric.
> It applies cleanly to ubuntu-natty.
>
> Testcase:
> The procedure for replicating this bug is outlined in the patch
> description. Essentially mounting an nfs mount with another mount in it.
> Running a test program, and unmount the submount and original mount.
>
> Patch is attached.
>
diff mbox

Patch

From a8d02e97b6a23ac9648a7a550feb0ed7c7109ec2 Mon Sep 17 00:00:00 2001
From: Al Viro <viro@ZenIV.linux.org.uk>
Date: Thu, 16 Jun 2011 15:10:06 +0100
Subject: [PATCH] VFS: Fix vfsmount overput on simultaneous automount

BugLink: http://bugs.launchpad.net/bugs/769927

[Kudos to dhowells for tracking that crap down]

If two processes attempt to cause automounting on the same mountpoint at the
same time, the vfsmount holding the mountpoint will be left with one too few
references on it, causing a BUG when the kernel tries to clean up.

The problem is that lock_mount() drops the caller's reference to the
mountpoint's vfsmount in the case where it finds something already mounted on
the mountpoint as it transits to the mounted filesystem and replaces path->mnt
with the new mountpoint vfsmount.

During a pathwalk, however, we don't take a reference on the vfsmount if it is
the same as the one in the nameidata struct, but do_add_mount() doesn't know
this.

The fix is to make sure we have a ref on the vfsmount of the mountpoint before
calling do_add_mount().  However, if lock_mount() doesn't transit, we're then
left with an extra ref on the mountpoint vfsmount which needs releasing.
We can handle that in follow_managed() by not making assumptions about what
we can and what we cannot get from lookup_mnt() as the current code does.

The callers of follow_managed() expect that reference to path->mnt will be
grabbed iff path->mnt has been changed.  follow_managed() and follow_automount()
keep track of whether such reference has been grabbed and assume that it'll
happen in those and only those cases that'll have us return with changed
path->mnt.  That assumption is almost correct - it breaks in case of
racing automounts and in even harder to hit race between following a mountpoint
and a couple of mount --move.  The thing is, we don't need to make that
assumption at all - after the end of loop in follow_manage() we can check
if path->mnt has ended up unchanged and do mntput() if needed.

The BUG can be reproduced with the following test program:

	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/stat.h>
	#include <unistd.h>
	#include <sys/wait.h>
	int main(int argc, char **argv)
	{
		int pid, ws;
		struct stat buf;
		pid = fork();
		stat(argv[1], &buf);
		if (pid > 0) wait(&ws);
		return 0;
	}

and the following procedure:

 (1) Mount an NFS volume that on the server has something else mounted on a
     subdirectory.  For instance, I can mount / from my server:

	mount warthog:/ /mnt -t nfs4 -r

     On the server /data has another filesystem mounted on it, so NFS will see
     a change in FSID as it walks down the path, and will mark /mnt/data as
     being a mountpoint.  This will cause the automount code to be triggered.

     !!! Do not look inside the mounted fs at this point !!!

 (2) Run the above program on a file within the submount to generate two
     simultaneous automount requests:

	/tmp/forkstat /mnt/data/testfile

 (3) Unmount the automounted submount:

	umount /mnt/data

 (4) Unmount the original mount:

	umount /mnt

     At this point the kernel should throw a BUG with something like the
     following:

	BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]

Note that the bug appears on the root dentry of the original mount, not the
mountpoint and not the submount because sys_umount() hasn't got to its final
mntput_no_expire() yet, but this isn't so obvious from the call trace:

 [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82
 [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b
 [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
 [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e
 [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs]
 [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83
 [<ffffffff811629ff>] deactivate_super+0x6f/0x7b
 [<ffffffff81186261>] mntput_no_expire+0x18d/0x199
 [<ffffffff811862a8>] mntput+0x3b/0x44
 [<ffffffff81186d87>] release_mounts+0xa2/0xbf
 [<ffffffff811876af>] sys_umount+0x47a/0x4ba
 [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f
 [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b

as do_umount() is inlined.  However, you can see release_mounts() in there.

Note also that it may be necessary to have multiple CPU cores to be able to
trigger this bug.

Tested-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit 8aef18845266f5c05904c610088f2d1ed58f6be3)

Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
---
 fs/namei.c |   24 ++++++++++++++++--------
 1 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 44890c6..7bf6968 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -923,6 +923,11 @@  static int follow_automount(struct path *path, unsigned flags,
 	if (!mnt) /* mount collision */
 		return 0;
 
+	if (!*need_mntput) {
+		/* lock_mount() may release path->mnt on error */
+		mntget(path->mnt);
+		*need_mntput = true;
+	}
 	err = finish_automount(mnt, path);
 
 	switch (err) {
@@ -930,12 +935,9 @@  static int follow_automount(struct path *path, unsigned flags,
 		/* Someone else made a mount here whilst we were busy */
 		return 0;
 	case 0:
-		dput(path->dentry);
-		if (*need_mntput)
-			mntput(path->mnt);
+		path_put(path);
 		path->mnt = mnt;
 		path->dentry = dget(mnt->mnt_root);
-		*need_mntput = true;
 		return 0;
 	default:
 		return err;
@@ -955,9 +957,10 @@  static int follow_automount(struct path *path, unsigned flags,
  */
 static int follow_managed(struct path *path, unsigned flags)
 {
+	struct vfsmount *mnt = path->mnt; /* held by caller, must be left alone */
 	unsigned managed;
 	bool need_mntput = false;
-	int ret;
+	int ret = 0;
 
 	/* Given that we're not holding a lock here, we retain the value in a
 	 * local variable for each dentry as we look at it so that we don't see
@@ -973,7 +976,7 @@  static int follow_managed(struct path *path, unsigned flags)
 			ret = path->dentry->d_op->d_manage(path->dentry,
 							   false, false);
 			if (ret < 0)
-				return ret == -EISDIR ? 0 : ret;
+				break;
 		}
 
 		/* Transit to a mounted filesystem. */
@@ -999,14 +1002,19 @@  static int follow_managed(struct path *path, unsigned flags)
 		if (managed & DCACHE_NEED_AUTOMOUNT) {
 			ret = follow_automount(path, flags, &need_mntput);
 			if (ret < 0)
-				return ret == -EISDIR ? 0 : ret;
+				break;
 			continue;
 		}
 
 		/* We didn't change the current path point */
 		break;
 	}
-	return 0;
+
+	if (need_mntput && path->mnt == mnt)
+		mntput(path->mnt);
+	if (ret == -EISDIR)
+		ret = 0;
+	return ret;
 }
 
 int follow_down_one(struct path *path)
-- 
1.7.5.4