[v17,08/13] namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
diff mbox series

Message ID 20191117011713.13032-9-cyphar@cyphar.com
State New
Headers show
Series
  • open: introduce openat2(2) syscall
Related show

Commit Message

Aleksa Sarai Nov. 17, 2019, 1:17 a.m. UTC
/* Background. */
There are many circumstances when userspace wants to resolve a path and
ensure that it doesn't go outside of a particular root directory during
resolution. Obvious examples include archive extraction tools, as well as
other security-conscious userspace programs. FreeBSD spun out O_BENEATH
from their Capsicum project[1,2], so it also seems reasonable to
implement similar functionality for Linux.

This is part of a refresh of Al's AT_NO_JUMPS patchset[3] (which was a
variation on David Drysdale's O_BENEATH patchset[4], which in turn was
based on the Capsicum project[5]).

/* Userspace API. */
LOOKUP_BENEATH will be exposed to userspace through openat2(2).

/* Semantics. */
Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW),
LOOKUP_BENEATH applies to all components of the path.

With LOOKUP_BENEATH, any path component which attempts to "escape" the
starting point of the filesystem lookup (the dirfd passed to openat)
will yield -EXDEV. Thus, all absolute paths and symlinks are disallowed.

Due to a security concern brought up by Jann[6], any ".." path
components are also blocked. This restriction will be lifted in a future
patch, but requires more work to ensure that permitting ".." is done
safely.

Magic-link jumps are also blocked, because they can beam the path lookup
across the starting point. It would be possible to detect and block
only the "bad" crossings with path_is_under() checks, but it's unclear
whether it makes sense to permit magic-links at all. However, userspace
is recommended to pass LOOKUP_NO_MAGICLINKS if they want to ensure that
magic-link crossing is entirely disabled.

/* Testing. */
LOOKUP_BENEATH is tested as part of the openat2(2) selftests.

[1]: https://reviews.freebsd.org/D2808
[2]: https://reviews.freebsd.org/D17547
[3]: https://lore.kernel.org/lkml/20170429220414.GT29622@ZenIV.linux.org.uk/
[4]: https://lore.kernel.org/lkml/1415094884-18349-1-git-send-email-drysdale@google.com/
[5]: https://lore.kernel.org/lkml/1404124096-21445-1-git-send-email-drysdale@google.com/
[6]: https://lore.kernel.org/lkml/CAG48ez1jzNvxB+bfOBnERFGp=oMM0vHWuLD6EULmne3R6xa53w@mail.gmail.com/

Cc: Christian Brauner <christian.brauner@ubuntu.com>
Suggested-by: David Drysdale <drysdale@google.com>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
 fs/namei.c            | 70 +++++++++++++++++++++++++++++++++++++++----
 include/linux/namei.h |  4 +++
 2 files changed, 68 insertions(+), 6 deletions(-)

Comments

Aleksa Sarai Nov. 25, 2019, 6:03 a.m. UTC | #1
On 2019-11-25, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Sun, Nov 17, 2019 at 12:17:08PM +1100, Aleksa Sarai wrote:
> 
> > +	if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
> > +		/*
> > +		 * Do a final check to ensure that the path didn't escape. Note
> > +		 * that this should already be guaranteed by all of the other
> > +		 * LOOKUP_IS_SCOPED checks (and delaying this check this late
> > +		 * does open the door to some possible timing-based attacks).
> > +		 */
> > +		if (WARN_ON(!path_is_under(&nd->path, &nd->root)))
> > +			return -EXDEV;
> 
> I don't like that.  What it gives is an ability to race that with
> rename(), with user-triggered WARN_ON.  You *can't* promise that result of
> lookup is in a subtree, simply because it can get moved just as you've
> declared it to be in the clear.
> 
> 	Anyone who relies upon that is delusional; it really can't be done.
> What warranties LOOKUP_IS_SCOPED is really supposed to provide?  That we do not
> attempt to walk out of the subtree rooted at the start point?  Fine, but this
> is not what this test does.  What are you trying to achieve there?  If it's
> "what we'd got was at one point in our subtree", the test is more or less
> right, but WARN_ON isn't.

You're right that (looking at this again) this chunk doesn't make too
much sense -- though I still think it wouldn't be a bad idea to include
it without the WARN_ON.

The reason I added it was as an attempt to have a last-chance check to
make sure we don't hand out a file descriptor to userspace that is
outside nd->root as a result of some yet-unknown namei bug (hence the
WARN_ON). But you're quite right that I overlooked that user-space could
trigger this maliciously.

Regarding the warranties LOOKUP_IS_SCOPED is supposed to provide --
arguably the guarantee is meant to be "you never step outside the root
during lookup" but that should already be implemented with the
handle_dots() checks -- and it's not something you could easily check at
the end of a lookup anyway. The idea was that (if for some reason those
checks were bypassed), at the very least you wouldn't silently get a
file descriptor which is completely outside the root.

Looking at this again, I can see the argument that check this shouldn't
be done at all -- but I will admit that I feel more comfortable with the
guarantees of LOOKUP_IS_SCOPED if we had some kind of last-chance check
to avoid a privileged process opening /etc/shadow on the host. Then
again, libpathrs (which I assume will be the primary consumer of
LOOKUP_IN_ROOT) already does a double-check in userspace after getting
the file descriptor from openat2().

All of that being said, I'd be happy to drop it entirely if you feel
it's unnecessary.

Patch
diff mbox series

diff --git a/fs/namei.c b/fs/namei.c
index 321c8ad5d6b3..3f7bb22c375d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -641,6 +641,14 @@  static bool legitimize_links(struct nameidata *nd)
 
 static bool legitimize_root(struct nameidata *nd)
 {
+	/*
+	 * For scoped-lookups (where nd->root has been zeroed), we need to
+	 * restart the whole lookup from scratch -- because set_root() is wrong
+	 * for these lookups (nd->dfd is the root, not the filesystem root).
+	 */
+	if (!nd->root.mnt && (nd->flags & LOOKUP_IS_SCOPED))
+		return false;
+	/* Nothing to do if nd->root is zero or is managed by the VFS user. */
 	if (!nd->root.mnt || (nd->flags & LOOKUP_ROOT))
 		return true;
 	nd->flags |= LOOKUP_ROOT_GRABBED;
@@ -776,12 +784,27 @@  static int complete_walk(struct nameidata *nd)
 	int status;
 
 	if (nd->flags & LOOKUP_RCU) {
-		if (!(nd->flags & LOOKUP_ROOT))
+		/*
+		 * We don't want to zero nd->root for scoped-lookups or
+		 * externally-managed nd->root.
+		 */
+		if (!(nd->flags & (LOOKUP_ROOT | LOOKUP_IS_SCOPED)))
 			nd->root.mnt = NULL;
 		if (unlikely(unlazy_walk(nd)))
 			return -ECHILD;
 	}
 
+	if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
+		/*
+		 * Do a final check to ensure that the path didn't escape. Note
+		 * that this should already be guaranteed by all of the other
+		 * LOOKUP_IS_SCOPED checks (and delaying this check this late
+		 * does open the door to some possible timing-based attacks).
+		 */
+		if (WARN_ON(!path_is_under(&nd->path, &nd->root)))
+			return -EXDEV;
+	}
+
 	if (likely(!(nd->flags & LOOKUP_JUMPED)))
 		return 0;
 
@@ -802,6 +825,14 @@  static int set_root(struct nameidata *nd)
 {
 	struct fs_struct *fs = current->fs;
 
+	/*
+	 * Jumping to the real root in a scoped-lookup is a BUG in namei, but we
+	 * still have to ensure it doesn't happen because it will cause a breakout
+	 * from the dirfd.
+	 */
+	if (WARN_ON(nd->flags & LOOKUP_IS_SCOPED))
+		return -ENOTRECOVERABLE;
+
 	if (nd->flags & LOOKUP_RCU) {
 		unsigned seq;
 
@@ -838,6 +869,8 @@  static inline void path_to_nameidata(const struct path *path,
 
 static int nd_jump_root(struct nameidata *nd)
 {
+	if (unlikely(nd->flags & LOOKUP_BENEATH))
+		return -EXDEV;
 	if (unlikely(nd->flags & LOOKUP_NO_XDEV)) {
 		/* Absolute path arguments to path_init() are allowed. */
 		if (nd->path.mnt != NULL && nd->path.mnt != nd->root.mnt)
@@ -883,6 +916,9 @@  int nd_jump_link(struct path *path)
 		if (nd->path.mnt != path->mnt)
 			goto err;
 	}
+	/* Not currently safe for scoped-lookups. */
+	if (unlikely(nd->flags & LOOKUP_IS_SCOPED))
+		goto err;
 
 	path_put(&nd->path);
 	nd->path = *path;
@@ -1379,8 +1415,11 @@  static int follow_dotdot_rcu(struct nameidata *nd)
 	struct inode *inode = nd->inode;
 
 	while (1) {
-		if (path_equal(&nd->path, &nd->root))
+		if (path_equal(&nd->path, &nd->root)) {
+			if (unlikely(nd->flags & LOOKUP_BENEATH))
+				return -ECHILD;
 			break;
+		}
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
 			struct dentry *old = nd->path.dentry;
 			struct dentry *parent = old->d_parent;
@@ -1510,9 +1549,12 @@  static int path_parent_directory(struct path *path)
 
 static int follow_dotdot(struct nameidata *nd)
 {
-	while(1) {
-		if (path_equal(&nd->path, &nd->root))
+	while (1) {
+		if (path_equal(&nd->path, &nd->root)) {
+			if (unlikely(nd->flags & LOOKUP_BENEATH))
+				return -EXDEV;
 			break;
+		}
 		if (nd->path.dentry != nd->path.mnt->mnt_root) {
 			int ret = path_parent_directory(&nd->path);
 			if (ret)
@@ -1739,6 +1781,13 @@  static inline int handle_dots(struct nameidata *nd, int type)
 	if (type == LAST_DOTDOT) {
 		int error = 0;
 
+		/*
+		 * Scoped-lookup flags resolving ".." is not currently safe --
+		 * races can cause our parent to have moved outside of the root
+		 * and us to skip over it.
+		 */
+		if (unlikely(nd->flags & LOOKUP_IS_SCOPED))
+			return -EXDEV;
 		if (!nd->root.mnt) {
 			error = set_root(nd);
 			if (error)
@@ -2261,7 +2310,6 @@  static const char *path_init(struct nameidata *nd, unsigned flags)
 			get_fs_pwd(current->fs, &nd->path);
 			nd->inode = nd->path.dentry->d_inode;
 		}
-		return s;
 	} else {
 		/* Caller must check execute permissions on the starting path component */
 		struct fd f = fdget_raw(nd->dfd);
@@ -2286,8 +2334,18 @@  static const char *path_init(struct nameidata *nd, unsigned flags)
 			nd->inode = nd->path.dentry->d_inode;
 		}
 		fdput(f);
-		return s;
 	}
+	/* For scoped-lookups we need to set the root to the dirfd as well. */
+	if (flags & LOOKUP_IS_SCOPED) {
+		nd->root = nd->path;
+		if (flags & LOOKUP_RCU) {
+			nd->root_seq = nd->seq;
+		} else {
+			path_get(&nd->root);
+			nd->flags |= LOOKUP_ROOT_GRABBED;
+		}
+	}
+	return s;
 }
 
 static const char *trailing_symlink(struct nameidata *nd)
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 25ee88c4acb1..93dad378f1e8 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -2,6 +2,7 @@ 
 #ifndef _LINUX_NAMEI_H
 #define _LINUX_NAMEI_H
 
+#include <linux/fs.h>
 #include <linux/kernel.h>
 #include <linux/path.h>
 #include <linux/fcntl.h>
@@ -43,6 +44,9 @@  enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
 #define LOOKUP_NO_SYMLINKS	0x010000 /* No symlink crossing. */
 #define LOOKUP_NO_MAGICLINKS	0x020000 /* No nd_jump_link() crossing. */
 #define LOOKUP_NO_XDEV		0x040000 /* No mountpoint crossing. */
+#define LOOKUP_BENEATH		0x080000 /* No escaping from starting point. */
+/* LOOKUP_* flags which do scope-related checks based on the dirfd. */
+#define LOOKUP_IS_SCOPED LOOKUP_BENEATH
 
 extern int path_pts(struct path *path);