[07/29] 9pfs: local: introduce symlink-attack safe xattr helpers

Message ID 148760161575.31154.505252736798591155.stgit@bahia.lan
State New

Commit Message

Greg Kurz Feb. 20, 2017, 2:40 p.m. UTC
All operations dealing with extended attributes are vulnerable to symlink
attacks because they use path-based syscalls which can traverse symbolic
links while walking through the dirname part of the path.

The solution is to introduce helpers based on opendir_nofollow(). This
calls for "at" versions of the extended attribute syscalls, which don't
exist unfortunately. This patch implements them by simulating the "at"
behavior with fchdir(). Since the current working directory is process
wide, and we don't want to confuse another thread in QEMU, all the work
is done in a separate process.
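
For illustration only: opendir_nofollow() is not part of this patch. A minimal sketch of the idea, assuming a hypothetical helper named opendir_nofollow_sketch() that walks the path one component at a time with openat() and O_NOFOLLOW, might look like the following. The name, the rejection of ".." and the error codes are assumptions of this sketch, not QEMU's actual implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not QEMU's implementation: open the directory at
 * "path" relative to "rootfd", one component at a time, refusing symlinks.
 * Returns a directory fd, or -1 with errno set.
 */
static int opendir_nofollow_sketch(int rootfd, const char *path)
{
    char *copy = strdup(path);
    char *saveptr = NULL;
    char *name;
    int fd, serrno;

    if (!copy) {
        return -1;
    }
    fd = dup(rootfd);
    for (name = strtok_r(copy, "/", &saveptr); fd >= 0 && name;
         name = strtok_r(NULL, "/", &saveptr)) {
        int next;

        if (strcmp(name, "..") == 0) {
            /* ".." could step outside rootfd: rejected in this sketch. */
            next = -1;
            errno = EACCES;
        } else {
            /* O_NOFOLLOW makes openat() fail if this component is a symlink. */
            next = openat(fd, name, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
        }
        serrno = errno;
        close(fd);
        errno = serrno;
        fd = next;
    }
    serrno = errno;
    free(copy);
    errno = serrno;
    return fd;
}
```

Because every component is opened relative to the previous descriptor, a symlink planted anywhere in the dirname part makes the walk fail instead of being traversed.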

The extended attributes code spreads over several files: all helpers
are hence declared with external linkage in 9p-xattr.h.

Note that the listxattr-based code is fully contained in 9p-xattr.c: the
flistxattrat_nofollow() helper is added in a subsequent patch.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/9pfs/9p-xattr.c |  158 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/9pfs/9p-xattr.h |   13 ++++
 2 files changed, 171 insertions(+)

Comments

Stefan Hajnoczi Feb. 23, 2017, 1:44 p.m. UTC | #1
On Mon, Feb 20, 2017 at 03:40:15PM +0100, Greg Kurz wrote:
> +static ssize_t do_xattrat_op(int op_type, int dirfd, const char *path,
> +                             const char *name, void *value, size_t size,
> +                             int flags)
> +{
> +    struct xattrat_data *data;
> +    pid_t pid;
> +    ssize_t ret = -1;
> +    int wstatus;
> +
> +    data = mmap(NULL, sizeof(*data) + size, PROT_READ | PROT_WRITE,
> +                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> +    if (data == MAP_FAILED) {
> +        return -1;
> +    }
> +    data->ret = -1;
> +
> +    pid = fork();
> +    if (pid < 0) {
> +        goto err_out;
> +    } else if (pid == 0) {
> +        if (fchdir(dirfd) == 0) {
> +            switch (op_type) {
> +            case XATTRAT_OP_GET:
> +                data->ret = lgetxattr(path, name, data->value, size);
> +                break;
> +            case XATTRAT_OP_LIST:
> +                data->ret = llistxattr(path, data->value, size);
> +                break;
> +            case XATTRAT_OP_SET:
> +                data->ret = lsetxattr(path, name, value, size, flags);
> +                break;
> +            case XATTRAT_OP_REMOVE:
> +                data->ret = lremovexattr(path, name);
> +                break;
> +            default:
> +                g_assert_not_reached();
> +            }
> +        }
> +        data->serrno = errno;
> +        _exit(0);
> +    }
> +    assert(waitpid(pid, &wstatus, 0) == pid && WIFEXITED(wstatus));
> +
> +    ret = data->ret;
> +    if (ret < 0) {
> +        errno = data->serrno;
> +        goto err_out;
> +    }
> +    if (value) {
> +        memcpy(value, data->value, data->ret);
> +    }
> +err_out:
> +    munmap_preserver_errno(data, sizeof(*data) + size);
> +    return ret;
> +}

Forking is ugly since QEMU is a multi-threaded program.  We brainstormed
alternatives on IRC like using /proc/self/fd/$fd to work around the
missing getxattrat() API.

Stefan

Eric Blake Feb. 23, 2017, 3:02 p.m. UTC | #2
On 02/20/2017 08:40 AM, Greg Kurz wrote:
> All operations dealing with extended attributes are vulnerable to symlink
> attacks because they use path-based syscalls which can traverse symbolic
> links while walking through the dirname part of the path.
> 
> The solution is to introduce helpers based on opendir_nofollow(). This
> calls for "at" versions of the extended attribute syscalls, which don't
> exist unfortunately. This patch implements them by simulating the "at"
> behavior with fchdir(). Since the current working directory is process
> wide, and we don't want to confuse another thread in QEMU, all the work
> is done in a separate process.

Can you emulate *at using /proc/fd/nnn/xyz?  Coreutils was one of the
early adopters of the power of *at functions, and found that emulation
of *at via procfs was a LOT more efficient than emulation via fchdir
(although both emulations still exist in gnulib, since procfs is not
universal).

Jann Horn Feb. 23, 2017, 3:05 p.m. UTC | #3
On Thu, Feb 23, 2017 at 4:02 PM, Eric Blake <eblake@redhat.com> wrote:
> On 02/20/2017 08:40 AM, Greg Kurz wrote:
>> All operations dealing with extended attributes are vulnerable to symlink
>> attacks because they use path-based syscalls which can traverse symbolic
>> links while walking through the dirname part of the path.
>>
>> The solution is to introduce helpers based on opendir_nofollow(). This
>> calls for "at" versions of the extended attribute syscalls, which don't
>> exist unfortunately. This patch implements them by simulating the "at"
>> behavior with fchdir(). Since the current working directory is process
>> wide, and we don't want to confuse another thread in QEMU, all the work
>> is done in a separate process.
>
> Can you emulate *at using /proc/fd/nnn/xyz?

I don't know much about QEMU internals, but QEMU supports running in a
chroot using the -chroot option, right? Does that already require procfs to be
mounted inside the chroot?

Greg Kurz Feb. 23, 2017, 8:31 p.m. UTC | #4
On Thu, 23 Feb 2017 16:05:02 +0100
Jann Horn <jannh@google.com> wrote:

> On Thu, Feb 23, 2017 at 4:02 PM, Eric Blake <eblake@redhat.com> wrote:
> > On 02/20/2017 08:40 AM, Greg Kurz wrote:  
> >> All operations dealing with extended attributes are vulnerable to symlink
> >> attacks because they use path-based syscalls which can traverse symbolic
> >> links while walking through the dirname part of the path.
> >>
> >> The solution is to introduce helpers based on opendir_nofollow(). This
> >> calls for "at" versions of the extended attribute syscalls, which don't
> >> exist unfortunately. This patch implements them by simulating the "at"
> >> behavior with fchdir(). Since the current working directory is process
> >> wide, and we don't want to confuse another thread in QEMU, all the work
> >> is done in a separate process.  
> >
> > Can you emulate *at using /proc/fd/nnn/xyz?  
> 
> I don't know much about QEMU internals, but QEMU supports running in a
> chroot using the -chroot option, right? Does that already require procfs to be
> mounted inside the chroot?

Calling chroot() requires CAP_SYS_CHROOT and QEMU shouldn't rely on that to
provide a secure and isolated environment to run VMs.

Greg Kurz Feb. 23, 2017, 8:54 p.m. UTC | #5
On Thu, 23 Feb 2017 13:44:41 +0000
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Mon, Feb 20, 2017 at 03:40:15PM +0100, Greg Kurz wrote:
> > +static ssize_t do_xattrat_op(int op_type, int dirfd, const char *path,
> > +                             const char *name, void *value, size_t size,
> > +                             int flags)
> > +{
> > +    struct xattrat_data *data;
> > +    pid_t pid;
> > +    ssize_t ret = -1;
> > +    int wstatus;
> > +
> > +    data = mmap(NULL, sizeof(*data) + size, PROT_READ | PROT_WRITE,
> > +                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> > +    if (data == MAP_FAILED) {
> > +        return -1;
> > +    }
> > +    data->ret = -1;
> > +
> > +    pid = fork();
> > +    if (pid < 0) {
> > +        goto err_out;
> > +    } else if (pid == 0) {
> > +        if (fchdir(dirfd) == 0) {
> > +            switch (op_type) {
> > +            case XATTRAT_OP_GET:
> > +                data->ret = lgetxattr(path, name, data->value, size);
> > +                break;
> > +            case XATTRAT_OP_LIST:
> > +                data->ret = llistxattr(path, data->value, size);
> > +                break;
> > +            case XATTRAT_OP_SET:
> > +                data->ret = lsetxattr(path, name, value, size, flags);
> > +                break;
> > +            case XATTRAT_OP_REMOVE:
> > +                data->ret = lremovexattr(path, name);
> > +                break;
> > +            default:
> > +                g_assert_not_reached();
> > +            }
> > +        }
> > +        data->serrno = errno;
> > +        _exit(0);
> > +    }
> > +    assert(waitpid(pid, &wstatus, 0) == pid && WIFEXITED(wstatus));
> > +
> > +    ret = data->ret;
> > +    if (ret < 0) {
> > +        errno = data->serrno;
> > +        goto err_out;
> > +    }
> > +    if (value) {
> > +        memcpy(value, data->value, data->ret);
> > +    }
> > +err_out:
> > +    munmap_preserver_errno(data, sizeof(*data) + size);
> > +    return ret;
> > +}  
> 
> Forking is ugly since QEMU is a multi-threaded program.  We brainstormed

Yeah, forking is ugly and it completely ruins metadata performance (x30
slower in passthrough mode and x300 slower in mapped-xattr mode).

> alternatives on IRC like using /proc/self/fd/$fd to work around the
> missing getxattrat() API.
> 

This should do the trick indeed. If we have to call getxattr() on some
untrusted $path that may be modified by the guest, we can do:

dirfd = openat_nofollow($mount_fd, dirname($path))
filename = basename($path)

and then we can safely call:

lgetxattr("/proc/self/fd/$dirfd/$filename")

since "/proc/self/fd/$dirfd" is trusted.

> Stefan

Greg Kurz Feb. 23, 2017, 9:01 p.m. UTC | #6
On Thu, 23 Feb 2017 09:02:39 -0600
Eric Blake <eblake@redhat.com> wrote:

> On 02/20/2017 08:40 AM, Greg Kurz wrote:
> > All operations dealing with extended attributes are vulnerable to symlink
> > attacks because they use path-based syscalls which can traverse symbolic
> > links while walking through the dirname part of the path.
> > 
> > The solution is to introduce helpers based on opendir_nofollow(). This
> > calls for "at" versions of the extended attribute syscalls, which don't
> > exist unfortunately. This patch implements them by simulating the "at"
> > behavior with fchdir(). Since the current working directory is process
> > wide, and we don't want to confuse another thread in QEMU, all the work
> > is done in a separate process.  
> 
> Can you emulate *at using /proc/fd/nnn/xyz?  Coreutils was one of the
> early adopters of the power of *at functions, and found that emulation
> of *at via procfs was a LOT more efficient than emulation via fchdir
> (although both emulations still exist in gnulib, since procfs is not
> universal).
> 

Yeah, Stefan suggested this on IRC. I had also found a tentative patchset to
implement genuine f*xattrat() calls in the kernel three years ago, which never
got merged. The author, Florian Weimer, also told me /proc was the way to go.

It looks like we have a consensus :)

Patch

diff --git a/hw/9pfs/9p-xattr.c b/hw/9pfs/9p-xattr.c
index 19a2daf02f5c..62993624ff64 100644
--- a/hw/9pfs/9p-xattr.c
+++ b/hw/9pfs/9p-xattr.c
@@ -15,7 +15,165 @@ 
 #include "9p.h"
 #include "fsdev/file-op-9p.h"
 #include "9p-xattr.h"
+#include "9p-util.h"
 
+enum {
+    XATTRAT_OP_GET = 0,
+    XATTRAT_OP_LIST,
+    XATTRAT_OP_SET,
+    XATTRAT_OP_REMOVE
+};
+
+struct xattrat_data {
+    ssize_t ret;
+    int serrno;
+    char value[0];
+};
+
+static void munmap_preserver_errno(void *addr, size_t length)
+{
+    int serrno = errno;
+    munmap(addr, length);
+    errno = serrno;
+}
+
+static ssize_t do_xattrat_op(int op_type, int dirfd, const char *path,
+                             const char *name, void *value, size_t size,
+                             int flags)
+{
+    struct xattrat_data *data;
+    pid_t pid;
+    ssize_t ret = -1;
+    int wstatus;
+
+    data = mmap(NULL, sizeof(*data) + size, PROT_READ | PROT_WRITE,
+                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+    if (data == MAP_FAILED) {
+        return -1;
+    }
+    data->ret = -1;
+
+    pid = fork();
+    if (pid < 0) {
+        goto err_out;
+    } else if (pid == 0) {
+        if (fchdir(dirfd) == 0) {
+            switch (op_type) {
+            case XATTRAT_OP_GET:
+                data->ret = lgetxattr(path, name, data->value, size);
+                break;
+            case XATTRAT_OP_LIST:
+                data->ret = llistxattr(path, data->value, size);
+                break;
+            case XATTRAT_OP_SET:
+                data->ret = lsetxattr(path, name, value, size, flags);
+                break;
+            case XATTRAT_OP_REMOVE:
+                data->ret = lremovexattr(path, name);
+                break;
+            default:
+                g_assert_not_reached();
+            }
+        }
+        data->serrno = errno;
+        _exit(0);
+    }
+    assert(waitpid(pid, &wstatus, 0) == pid && WIFEXITED(wstatus));
+
+    ret = data->ret;
+    if (ret < 0) {
+        errno = data->serrno;
+        goto err_out;
+    }
+    if (value) {
+        memcpy(value, data->value, data->ret);
+    }
+err_out:
+    munmap_preserver_errno(data, sizeof(*data) + size);
+    return ret;
+}
+
+ssize_t fgetxattrat_nofollow(int dirfd, const char *path, const char *name,
+                             void *value, size_t size)
+{
+    return do_xattrat_op(XATTRAT_OP_GET, dirfd, path, name, value, size, 0);
+}
+
+ssize_t local_getxattr_nofollow(FsContext *ctx, const char *path,
+                                const char *name, void *value, size_t size)
+{
+    char *dirpath = g_path_get_dirname(path);
+    char *filename = g_path_get_basename(path);
+    int dirfd;
+    ssize_t ret = -1;
+
+    dirfd = local_opendir_nofollow(ctx, dirpath);
+    if (dirfd == -1) {
+        goto out;
+    }
+
+    ret = fgetxattrat_nofollow(dirfd, filename, name, value, size);
+    close_preserve_errno(dirfd);
+out:
+    g_free(dirpath);
+    g_free(filename);
+    return ret;
+}
+
+int fsetxattrat_nofollow(int dirfd, const char *path, const char *name,
+                         void *value, size_t size, int flags)
+{
+    return do_xattrat_op(XATTRAT_OP_SET, dirfd, path, name, value, size, flags);
+}
+
+ssize_t local_setxattr_nofollow(FsContext *ctx, const char *path,
+                                const char *name, void *value, size_t size,
+                                int flags)
+{
+    char *dirpath = g_path_get_dirname(path);
+    char *filename = g_path_get_basename(path);
+    int dirfd;
+    ssize_t ret = -1;
+
+    dirfd = local_opendir_nofollow(ctx, dirpath);
+    if (dirfd == -1) {
+        goto out;
+    }
+
+    ret = fsetxattrat_nofollow(dirfd, filename, name, value, size, flags);
+    close_preserve_errno(dirfd);
+out:
+    g_free(dirpath);
+    g_free(filename);
+    return ret;
+}
+
+static ssize_t fremovexattrat_nofollow(int dirfd, const char *path,
+                                       const char *name)
+{
+    return do_xattrat_op(XATTRAT_OP_REMOVE, dirfd, path, name, NULL, 0, 0);
+}
+
+ssize_t local_removexattr_nofollow(FsContext *ctx, const char *path,
+                                   const char *name)
+{
+    char *dirpath = g_path_get_dirname(path);
+    char *filename = g_path_get_basename(path);
+    int dirfd;
+    ssize_t ret = -1;
+
+    dirfd = local_opendir_nofollow(ctx, dirpath);
+    if (dirfd == -1) {
+        goto out;
+    }
+
+    ret = fremovexattrat_nofollow(dirfd, filename, name);
+    close_preserve_errno(dirfd);
+out:
+    g_free(dirpath);
+    g_free(filename);
+    return ret;
+}
 
 static XattrOperations *get_xattr_operations(XattrOperations **h,
                                              const char *name)
diff --git a/hw/9pfs/9p-xattr.h b/hw/9pfs/9p-xattr.h
index 3f43f5153f3c..986cb59b67f2 100644
--- a/hw/9pfs/9p-xattr.h
+++ b/hw/9pfs/9p-xattr.h
@@ -15,6 +15,7 @@ 
 #define QEMU_9P_XATTR_H
 
 #include "qemu/xattr.h"
+#include "9p-local.h"
 
 typedef struct xattr_operations
 {
@@ -29,6 +30,18 @@  typedef struct xattr_operations
                        const char *path, const char *name);
 } XattrOperations;
 
+ssize_t fgetxattrat_nofollow(int dirfd, const char *path, const char *name,
+                             void *value, size_t size);
+int fsetxattrat_nofollow(int dirfd, const char *path, const char *name,
+                         void *value, size_t size, int flags);
+
+ssize_t local_getxattr_nofollow(FsContext *ctx, const char *path,
+                                const char *name, void *value, size_t size);
+ssize_t local_setxattr_nofollow(FsContext *ctx, const char *path,
+                                const char *name, void *value, size_t size,
+                                int flags);
+ssize_t local_removexattr_nofollow(FsContext *ctx, const char *path,
+                                   const char *name);
 
 extern XattrOperations mapped_user_xattr;
 extern XattrOperations passthrough_user_xattr;