Message ID | 1407418337-9139-1-git-send-email-luis.henriques@canonical.com |
---|---|
State | New |
Headers | show |
On Thu, 7 Aug 2014 14:32:17 +0100 Luis Henriques <luis.henriques@canonical.com> wrote: > This is a note to let you know that I have just added a patch titled > > vfs: allow umount to handle mountpoints without revalidating them > > to the linux-3.11.y-queue branch of the 3.11.y.z extended stable tree > which can be found at: > > http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.11.y-queue > > If you, or anyone else, feels it should not be added to this tree, please > reply to this email. > > For more information about the 3.11.y.z tree, see > https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable > > Thanks. > -Luis > > ------ > > >From 41ee9e50729fa43da90a04691dec3cabbb3171fe Mon Sep 17 00:00:00 2001 > From: Jeff Layton <jlayton@redhat.com> > Date: Fri, 26 Jul 2013 06:23:25 -0400 > Subject: vfs: allow umount to handle mountpoints without revalidating them > > commit 8033426e6bdb2690d302872ac1e1fadaec1a5581 upstream. > > Christopher reported a regression where he was unable to unmount a NFS > filesystem where the root had gone stale. The problem is that > d_revalidate handles the root of the filesystem differently from other > dentries, but d_weak_revalidate does not. We could simply fix this by > making d_weak_revalidate return success on IS_ROOT dentries, but there > are cases where we do want to revalidate the root of the fs. > > A umount is really a special case. We generally aren't interested in > anything but the dentry and vfsmount that's attached at that point. If > the inode turns out to be stale we just don't care since the intent is > to stop using it anyway. > > Try to handle this situation better by treating umount as a special > case in the lookup code. Have it resolve the parent using normal > means, and then do a lookup of the final dentry without revalidating > it. In most cases, the final lookup will come out of the dcache, but > the case where there's a trailing symlink or !LAST_NORM entry on the > end complicates things a bit. > > Cc: Neil Brown <neilb@suse.de> > Reported-by: Christopher T Vogan <cvogan@us.ibm.com> > Signed-off-by: Jeff Layton <jlayton@redhat.com> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> > Cc: Chris Dunlop <chris@onthe.net.au> > Signed-off-by: Luis Henriques <luis.henriques@canonical.com> > --- > fs/namei.c | 182 ++++++++++++++++++++++++++++++++++++++++++++++++++ > fs/namespace.c | 2 +- > include/linux/namei.h | 1 + > 3 files changed, 184 insertions(+), 1 deletion(-) > > diff --git a/fs/namei.c b/fs/namei.c > index 2a2d0236f82a..a10bd2f8b66b 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -2185,6 +2185,188 @@ user_path_parent(int dfd, const char __user *path, struct nameidata *nd, > return s; > } > > +/** > + * umount_lookup_last - look up last component for umount > + * @nd: pathwalk nameidata - currently pointing at parent directory of "last" > + * @path: pointer to container for result > + * > + * This is a special lookup_last function just for umount. In this case, we > + * need to resolve the path without doing any revalidation. > + * > + * The nameidata should be the result of doing a LOOKUP_PARENT pathwalk. Since > + * mountpoints are always pinned in the dcache, their ancestors are too. Thus, > + * in almost all cases, this lookup will be served out of the dcache. The only > + * cases where it won't are if nd->last refers to a symlink or the path is > + * bogus and it doesn't exist. > + * > + * Returns: > + * -error: if there was an error during lookup. This includes -ENOENT if the > + * lookup found a negative dentry. The nd->path reference will also be > + * put in this case. > + * > + * 0: if we successfully resolved nd->path and found it to not to be a > + * symlink that needs to be followed. "path" will also be populated. > + * The nd->path reference will also be put. > + * > + * 1: if we successfully resolved nd->last and found it to be a symlink > + * that needs to be followed. "path" will be populated with the path > + * to the link, and nd->path will *not* be put. > + */ > +static int > +umount_lookup_last(struct nameidata *nd, struct path *path) > +{ > + int error = 0; > + struct dentry *dentry; > + struct dentry *dir = nd->path.dentry; > + > + if (unlikely(nd->flags & LOOKUP_RCU)) { > + WARN_ON_ONCE(1); > + error = -ECHILD; > + goto error_check; > + } > + > + nd->flags &= ~LOOKUP_PARENT; > + > + if (unlikely(nd->last_type != LAST_NORM)) { > + error = handle_dots(nd, nd->last_type); > + if (!error) > + dentry = dget(nd->path.dentry); > + goto error_check; > + } > + > + mutex_lock(&dir->d_inode->i_mutex); > + dentry = d_lookup(dir, &nd->last); > + if (!dentry) { > + /* > + * No cached dentry. Mounted dentries are pinned in the cache, > + * so that means that this dentry is probably a symlink or the > + * path doesn't actually point to a mounted dentry. > + */ > + dentry = d_alloc(dir, &nd->last); > + if (!dentry) { > + error = -ENOMEM; > + } else { > + dentry = lookup_real(dir->d_inode, dentry, nd->flags); > + if (IS_ERR(dentry)) > + error = PTR_ERR(dentry); > + } > + } > + mutex_unlock(&dir->d_inode->i_mutex); > + > +error_check: > + if (!error) { > + if (!dentry->d_inode) { > + error = -ENOENT; > + dput(dentry); > + } else { > + path->dentry = dentry; > + path->mnt = mntget(nd->path.mnt); > + if (should_follow_link(dentry->d_inode, > + nd->flags & LOOKUP_FOLLOW)) > + return 1; > + follow_mount(path); > + } Above code is buggy and fixed by commit 295dc39d941dc2ae53d5c170365af4c9d5c16212 Author: Vasily Averin <vvs@parallels.com> Date: Mon Jul 21 12:30:23 2014 +0400 fs: umount on symlink leaks mnt count so be sure to include that patch too. NeilBrown > + } > + terminate_walk(nd); > + return error; > +} > + > +/** > + * path_umountat - look up a path to be umounted > + * @dfd: directory file descriptor to start walk from > + * @name: full pathname to walk > + * @flags: lookup flags > + * @nd: pathwalk nameidata > + * > + * Look up the given name, but don't attempt to revalidate the last component. > + * Returns 0 and "path" will be valid on success; Retuns error otherwise. > + */ > +static int > +path_umountat(int dfd, const char *name, struct path *path, unsigned int flags) > +{ > + struct file *base = NULL; > + struct nameidata nd; > + int err; > + > + err = path_init(dfd, name, flags | LOOKUP_PARENT, &nd, &base); > + if (unlikely(err)) > + return err; > + > + current->total_link_count = 0; > + err = link_path_walk(name, &nd); > + if (err) > + goto out; > + > + /* If we're in rcuwalk, drop out of it to handle last component */ > + if (nd.flags & LOOKUP_RCU) { > + err = unlazy_walk(&nd, NULL); > + if (err) { > + terminate_walk(&nd); > + goto out; > + } > + } > + > + err = umount_lookup_last(&nd, path); > + while (err > 0) { > + void *cookie; > + struct path link = *path; > + err = may_follow_link(&link, &nd); > + if (unlikely(err)) > + break; > + nd.flags |= LOOKUP_PARENT; > + err = follow_link(&link, &nd, &cookie); > + if (err) > + break; > + err = umount_lookup_last(&nd, path); > + put_link(&nd, &link, cookie); > + } > +out: > + if (base) > + fput(base); > + > + if (nd.root.mnt && !(nd.flags & LOOKUP_ROOT)) > + path_put(&nd.root); > + > + return err; > +} > + > +/** > + * user_path_umountat - lookup a path from userland in order to umount it > + * @dfd: directory file descriptor > + * @name: pathname from userland > + * @flags: lookup flags > + * @path: pointer to container to hold result > + * > + * A umount is a special case for path walking. We're not actually interested > + * in the inode in this situation, and ESTALE errors can be a problem. We > + * simply want track down the dentry and vfsmount attached at the mountpoint > + * and avoid revalidating the last component. > + * > + * Returns 0 and populates "path" on success. > + */ > +int > +user_path_umountat(int dfd, const char __user *name, unsigned int flags, > + struct path *path) > +{ > + struct filename *s = getname(name); > + int error; > + > + if (IS_ERR(s)) > + return PTR_ERR(s); > + > + error = path_umountat(dfd, s->name, path, flags | LOOKUP_RCU); > + if (unlikely(error == -ECHILD)) > + error = path_umountat(dfd, s->name, path, flags); > + if (unlikely(error == -ESTALE)) > + error = path_umountat(dfd, s->name, path, flags | LOOKUP_REVAL); > + > + if (likely(!error)) > + audit_inode(s, path->dentry, 0); > + > + putname(s); > + return error; > +} > + > /* > * It's inline, so penalty for filesystems that don't use sticky bit is > * minimal. > diff --git a/fs/namespace.c b/fs/namespace.c > index a45ba4f267fe..ad8ea9bc2518 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -1318,7 +1318,7 @@ SYSCALL_DEFINE2(umount, char __user *, name, int, flags) > if (!(flags & UMOUNT_NOFOLLOW)) > lookup_flags |= LOOKUP_FOLLOW; > > - retval = user_path_at(AT_FDCWD, name, lookup_flags, &path); > + retval = user_path_umountat(AT_FDCWD, name, lookup_flags, &path); > if (retval) > goto out; > mnt = real_mount(path.mnt); > diff --git a/include/linux/namei.h b/include/linux/namei.h > index 5a5ff57ceed4..cd09751c71a0 100644 > --- a/include/linux/namei.h > +++ b/include/linux/namei.h > @@ -58,6 +58,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND}; > > extern int user_path_at(int, const char __user *, unsigned, struct path *); > extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty); > +extern int user_path_umountat(int, const char __user *, unsigned int, struct path *); > > #define user_path(name, path) user_path_at(AT_FDCWD, name, LOOKUP_FOLLOW, path) > #define user_lpath(name, path) user_path_at(AT_FDCWD, name, 0, path) > -- > 1.9.1
On Fri, Aug 08, 2014 at 08:59:16AM +1000, NeilBrown wrote: <...> > > +error_check: > > + if (!error) { > > + if (!dentry->d_inode) { > > + error = -ENOENT; > > + dput(dentry); > > + } else { > > + path->dentry = dentry; > > + path->mnt = mntget(nd->path.mnt); > > + if (should_follow_link(dentry->d_inode, > > + nd->flags & LOOKUP_FOLLOW)) > > + return 1; > > + follow_mount(path); > > + } > > Above code is buggy and fixed by > > commit 295dc39d941dc2ae53d5c170365af4c9d5c16212 > Author: Vasily Averin <vvs@parallels.com> > Date: Mon Jul 21 12:30:23 2014 +0400 > > fs: umount on symlink leaks mnt count > > > so be sure to include that patch too. > > NeilBrown > > Ah! Thanks a lot for catching that. I'm queuing a backport of the fix. Cheers, -- Luís
diff --git a/fs/namei.c b/fs/namei.c index 2a2d0236f82a..a10bd2f8b66b 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2185,6 +2185,188 @@ user_path_parent(int dfd, const char __user *path, struct nameidata *nd, return s; } +/** + * umount_lookup_last - look up last component for umount + * @nd: pathwalk nameidata - currently pointing at parent directory of "last" + * @path: pointer to container for result + * + * This is a special lookup_last function just for umount. In this case, we + * need to resolve the path without doing any revalidation. + * + * The nameidata should be the result of doing a LOOKUP_PARENT pathwalk. Since + * mountpoints are always pinned in the dcache, their ancestors are too. Thus, + * in almost all cases, this lookup will be served out of the dcache. The only + * cases where it won't are if nd->last refers to a symlink or the path is + * bogus and it doesn't exist. + * + * Returns: + * -error: if there was an error during lookup. This includes -ENOENT if the + * lookup found a negative dentry. The nd->path reference will also be + * put in this case. + * + * 0: if we successfully resolved nd->path and found it to not to be a + * symlink that needs to be followed. "path" will also be populated. + * The nd->path reference will also be put. + * + * 1: if we successfully resolved nd->last and found it to be a symlink + * that needs to be followed. "path" will be populated with the path + * to the link, and nd->path will *not* be put. + */ +static int +umount_lookup_last(struct nameidata *nd, struct path *path) +{ + int error = 0; + struct dentry *dentry; + struct dentry *dir = nd->path.dentry; + + if (unlikely(nd->flags & LOOKUP_RCU)) { + WARN_ON_ONCE(1); + error = -ECHILD; + goto error_check; + } + + nd->flags &= ~LOOKUP_PARENT; + + if (unlikely(nd->last_type != LAST_NORM)) { + error = handle_dots(nd, nd->last_type); + if (!error) + dentry = dget(nd->path.dentry); + goto error_check; + } + + mutex_lock(&dir->d_inode->i_mutex); + dentry = d_lookup(dir, &nd->last); + if (!dentry) { + /* + * No cached dentry. Mounted dentries are pinned in the cache, + * so that means that this dentry is probably a symlink or the + * path doesn't actually point to a mounted dentry. + */ + dentry = d_alloc(dir, &nd->last); + if (!dentry) { + error = -ENOMEM; + } else { + dentry = lookup_real(dir->d_inode, dentry, nd->flags); + if (IS_ERR(dentry)) + error = PTR_ERR(dentry); + } + } + mutex_unlock(&dir->d_inode->i_mutex); + +error_check: + if (!error) { + if (!dentry->d_inode) { + error = -ENOENT; + dput(dentry); + } else { + path->dentry = dentry; + path->mnt = mntget(nd->path.mnt); + if (should_follow_link(dentry->d_inode, + nd->flags & LOOKUP_FOLLOW)) + return 1; + follow_mount(path); + } + } + terminate_walk(nd); + return error; +} + +/** + * path_umountat - look up a path to be umounted + * @dfd: directory file descriptor to start walk from + * @name: full pathname to walk + * @flags: lookup flags + * @nd: pathwalk nameidata + * + * Look up the given name, but don't attempt to revalidate the last component. + * Returns 0 and "path" will be valid on success; Retuns error otherwise. + */ +static int +path_umountat(int dfd, const char *name, struct path *path, unsigned int flags) +{ + struct file *base = NULL; + struct nameidata nd; + int err; + + err = path_init(dfd, name, flags | LOOKUP_PARENT, &nd, &base); + if (unlikely(err)) + return err; + + current->total_link_count = 0; + err = link_path_walk(name, &nd); + if (err) + goto out; + + /* If we're in rcuwalk, drop out of it to handle last component */ + if (nd.flags & LOOKUP_RCU) { + err = unlazy_walk(&nd, NULL); + if (err) { + terminate_walk(&nd); + goto out; + } + } + + err = umount_lookup_last(&nd, path); + while (err > 0) { + void *cookie; + struct path link = *path; + err = may_follow_link(&link, &nd); + if (unlikely(err)) + break; + nd.flags |= LOOKUP_PARENT; + err = follow_link(&link, &nd, &cookie); + if (err) + break; + err = umount_lookup_last(&nd, path); + put_link(&nd, &link, cookie); + } +out: + if (base) + fput(base); + + if (nd.root.mnt && !(nd.flags & LOOKUP_ROOT)) + path_put(&nd.root); + + return err; +} + +/** + * user_path_umountat - lookup a path from userland in order to umount it + * @dfd: directory file descriptor + * @name: pathname from userland + * @flags: lookup flags + * @path: pointer to container to hold result + * + * A umount is a special case for path walking. We're not actually interested + * in the inode in this situation, and ESTALE errors can be a problem. We + * simply want track down the dentry and vfsmount attached at the mountpoint + * and avoid revalidating the last component. + * + * Returns 0 and populates "path" on success. + */ +int +user_path_umountat(int dfd, const char __user *name, unsigned int flags, + struct path *path) +{ + struct filename *s = getname(name); + int error; + + if (IS_ERR(s)) + return PTR_ERR(s); + + error = path_umountat(dfd, s->name, path, flags | LOOKUP_RCU); + if (unlikely(error == -ECHILD)) + error = path_umountat(dfd, s->name, path, flags); + if (unlikely(error == -ESTALE)) + error = path_umountat(dfd, s->name, path, flags | LOOKUP_REVAL); + + if (likely(!error)) + audit_inode(s, path->dentry, 0); + + putname(s); + return error; +} + /* * It's inline, so penalty for filesystems that don't use sticky bit is * minimal. diff --git a/fs/namespace.c b/fs/namespace.c index a45ba4f267fe..ad8ea9bc2518 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1318,7 +1318,7 @@ SYSCALL_DEFINE2(umount, char __user *, name, int, flags) if (!(flags & UMOUNT_NOFOLLOW)) lookup_flags |= LOOKUP_FOLLOW; - retval = user_path_at(AT_FDCWD, name, lookup_flags, &path); + retval = user_path_umountat(AT_FDCWD, name, lookup_flags, &path); if (retval) goto out; mnt = real_mount(path.mnt); diff --git a/include/linux/namei.h b/include/linux/namei.h index 5a5ff57ceed4..cd09751c71a0 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -58,6 +58,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND}; extern int user_path_at(int, const char __user *, unsigned, struct path *); extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty); +extern int user_path_umountat(int, const char __user *, unsigned int, struct path *); #define user_path(name, path) user_path_at(AT_FDCWD, name, LOOKUP_FOLLOW, path) #define user_lpath(name, path) user_path_at(AT_FDCWD, name, 0, path)