Message ID | 148760161575.31154.505252736798591155.stgit@bahia.lan |
---|---|
State | New |
Headers | show |
On Mon, Feb 20, 2017 at 03:40:15PM +0100, Greg Kurz wrote: > +static ssize_t do_xattrat_op(int op_type, int dirfd, const char *path, > + const char *name, void *value, size_t size, > + int flags) > +{ > + struct xattrat_data *data; > + pid_t pid; > + ssize_t ret = -1; > + int wstatus; > + > + data = mmap(NULL, sizeof(*data) + size, PROT_READ | PROT_WRITE, > + MAP_SHARED | MAP_ANONYMOUS, -1, 0); > + if (data == MAP_FAILED) { > + return -1; > + } > + data->ret = -1; > + > + pid = fork(); > + if (pid < 0) { > + goto err_out; > + } else if (pid == 0) { > + if (fchdir(dirfd) == 0) { > + switch (op_type) { > + case XATTRAT_OP_GET: > + data->ret = lgetxattr(path, name, data->value, size); > + break; > + case XATTRAT_OP_LIST: > + data->ret = llistxattr(path, data->value, size); > + break; > + case XATTRAT_OP_SET: > + data->ret = lsetxattr(path, name, value, size, flags); > + break; > + case XATTRAT_OP_REMOVE: > + data->ret = lremovexattr(path, name); > + break; > + default: > + g_assert_not_reached(); > + } > + } > + data->serrno = errno; > + _exit(0); > + } > + assert(waitpid(pid, &wstatus, 0) == pid && WIFEXITED(wstatus)); > + > + ret = data->ret; > + if (ret < 0) { > + errno = data->serrno; > + goto err_out; > + } > + if (value) { > + memcpy(value, data->value, data->ret); > + } > +err_out: > + munmap_preserver_errno(data, sizeof(*data) + size); > + return ret; > +} Forking is ugly since QEMU is a multi-threaded program. We brainstormed alternatives on IRC like using /proc/self/fd/$fd to work around the missing getxattrat() API. Stefan
On 02/20/2017 08:40 AM, Greg Kurz wrote: > All operations dealing with extended attributes are vulnerable to symlink > attacks because they use path-based syscalls which can traverse symbolic > links while walking through the dirname part of the path. > > The solution is to introduce helpers based on opendir_nofollow(). This > calls for "at" versions of the extended attribute syscalls, which don't > exist unfortunately. This patch implement them by simulating the "at" > behavior with fchdir(). Since the current working directory is process > wide, and we don't want to confuse another thread in QEMU, all the work > is done in a separate process. Can you emulate *at using /proc/fd/nnn/xyz? Coreutils was one of the early adopters of the power of *at functions, and found that emulation of *at via procfs was a LOT more efficient than emulation via fchdir (although both emulations still exist in gnulib, since procfs is not universal).
On Thu, Feb 23, 2017 at 4:02 PM, Eric Blake <eblake@redhat.com> wrote: > On 02/20/2017 08:40 AM, Greg Kurz wrote: >> All operations dealing with extended attributes are vulnerable to symlink >> attacks because they use path-based syscalls which can traverse symbolic >> links while walking through the dirname part of the path. >> >> The solution is to introduce helpers based on opendir_nofollow(). This >> calls for "at" versions of the extended attribute syscalls, which don't >> exist unfortunately. This patch implement them by simulating the "at" >> behavior with fchdir(). Since the current working directory is process >> wide, and we don't want to confuse another thread in QEMU, all the work >> is done in a separate process. > > Can you emulate *at using /proc/fd/nnn/xyz? I don't know much about QEMU internals, but QEMU supports running in a chroot using the -chroot option, right? Does that already require procfs to be mounted inside the chroot?
On Thu, 23 Feb 2017 16:05:02 +0100 Jann Horn <jannh@google.com> wrote: > On Thu, Feb 23, 2017 at 4:02 PM, Eric Blake <eblake@redhat.com> wrote: > > On 02/20/2017 08:40 AM, Greg Kurz wrote: > >> All operations dealing with extended attributes are vulnerable to symlink > >> attacks because they use path-based syscalls which can traverse symbolic > >> links while walking through the dirname part of the path. > >> > >> The solution is to introduce helpers based on opendir_nofollow(). This > >> calls for "at" versions of the extended attribute syscalls, which don't > >> exist unfortunately. This patch implement them by simulating the "at" > >> behavior with fchdir(). Since the current working directory is process > >> wide, and we don't want to confuse another thread in QEMU, all the work > >> is done in a separate process. > > > > Can you emulate *at using /proc/fd/nnn/xyz? > > I don't know much about QEMU internals, but QEMU supports running in a > chroot using the -chroot option, right? Does that already require procfs to be > mounted inside the chroot? Calling chroot() requires CAP_SYS_CHROOT and QEMU shouldn't rely on that to provide a secure and isolated environment to run VMs.
On Thu, 23 Feb 2017 13:44:41 +0000 Stefan Hajnoczi <stefanha@gmail.com> wrote: > On Mon, Feb 20, 2017 at 03:40:15PM +0100, Greg Kurz wrote: > > +static ssize_t do_xattrat_op(int op_type, int dirfd, const char *path, > > + const char *name, void *value, size_t size, > > + int flags) > > +{ > > + struct xattrat_data *data; > > + pid_t pid; > > + ssize_t ret = -1; > > + int wstatus; > > + > > + data = mmap(NULL, sizeof(*data) + size, PROT_READ | PROT_WRITE, > > + MAP_SHARED | MAP_ANONYMOUS, -1, 0); > > + if (data == MAP_FAILED) { > > + return -1; > > + } > > + data->ret = -1; > > + > > + pid = fork(); > > + if (pid < 0) { > > + goto err_out; > > + } else if (pid == 0) { > > + if (fchdir(dirfd) == 0) { > > + switch (op_type) { > > + case XATTRAT_OP_GET: > > + data->ret = lgetxattr(path, name, data->value, size); > > + break; > > + case XATTRAT_OP_LIST: > > + data->ret = llistxattr(path, data->value, size); > > + break; > > + case XATTRAT_OP_SET: > > + data->ret = lsetxattr(path, name, value, size, flags); > > + break; > > + case XATTRAT_OP_REMOVE: > > + data->ret = lremovexattr(path, name); > > + break; > > + default: > > + g_assert_not_reached(); > > + } > > + } > > + data->serrno = errno; > > + _exit(0); > > + } > > + assert(waitpid(pid, &wstatus, 0) == pid && WIFEXITED(wstatus)); > > + > > + ret = data->ret; > > + if (ret < 0) { > > + errno = data->serrno; > > + goto err_out; > > + } > > + if (value) { > > + memcpy(value, data->value, data->ret); > > + } > > +err_out: > > + munmap_preserver_errno(data, sizeof(*data) + size); > > + return ret; > > +} > > Forking is ugly since QEMU is a multi-threaded program. We brainstormed Yeah, forking is ugly and it completely ruins metadata performance (x30 slower in passthrough mode and x300 slower in mapped-xattr mode). > alternatives on IRC like using /proc/self/fd/$fd to work around the > missing getxattrat() API. > This should do the trick indeed. If we have to call getxattr() on some untrusted $path that may be modified by the guest. We can do: dirfd = openat_nofollow($mount_fd, dirname($path)) filename = basename($path) and then we can safely call: lgetxattr("/proc/self/fd/$dirfd/$filename") since "/proc/self/fd/$dirfd" is trusted. > Stefan
On Thu, 23 Feb 2017 09:02:39 -0600 Eric Blake <eblake@redhat.com> wrote: > On 02/20/2017 08:40 AM, Greg Kurz wrote: > > All operations dealing with extended attributes are vulnerable to symlink > > attacks because they use path-based syscalls which can traverse symbolic > > links while walking through the dirname part of the path. > > > > The solution is to introduce helpers based on opendir_nofollow(). This > > calls for "at" versions of the extended attribute syscalls, which don't > > exist unfortunately. This patch implement them by simulating the "at" > > behavior with fchdir(). Since the current working directory is process > > wide, and we don't want to confuse another thread in QEMU, all the work > > is done in a separate process. > > Can you emulate *at using /proc/fd/nnn/xyz? Coreutils was one of the > early adopters of the power of *at functions, and found that emulation > of *at via procfs was a LOT more efficient than emulation via fchdir > (although both emulations still exist in gnulib, since procfs is not > universal). > Yeah, Stefan suggested this on irc. I had also found a tentative patchset to implement genuine f*xattrat() calls in the kernel 3 yrs ago, that never got merged. The author, Florian Weimer, also told me /proc was the way to go. It looks like we have a consensus :)
diff --git a/hw/9pfs/9p-xattr.c b/hw/9pfs/9p-xattr.c index 19a2daf02f5c..62993624ff64 100644 --- a/hw/9pfs/9p-xattr.c +++ b/hw/9pfs/9p-xattr.c @@ -15,7 +15,165 @@ #include "9p.h" #include "fsdev/file-op-9p.h" #include "9p-xattr.h" +#include "9p-util.h" +enum { + XATTRAT_OP_GET = 0, + XATTRAT_OP_LIST, + XATTRAT_OP_SET, + XATTRAT_OP_REMOVE +}; + +struct xattrat_data { + ssize_t ret; + int serrno; + char value[0]; +}; + +static void munmap_preserver_errno(void *addr, size_t length) +{ + int serrno = errno; + munmap(addr, length); + errno = serrno; +} + +static ssize_t do_xattrat_op(int op_type, int dirfd, const char *path, + const char *name, void *value, size_t size, + int flags) +{ + struct xattrat_data *data; + pid_t pid; + ssize_t ret = -1; + int wstatus; + + data = mmap(NULL, sizeof(*data) + size, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_ANONYMOUS, -1, 0); + if (data == MAP_FAILED) { + return -1; + } + data->ret = -1; + + pid = fork(); + if (pid < 0) { + goto err_out; + } else if (pid == 0) { + if (fchdir(dirfd) == 0) { + switch (op_type) { + case XATTRAT_OP_GET: + data->ret = lgetxattr(path, name, data->value, size); + break; + case XATTRAT_OP_LIST: + data->ret = llistxattr(path, data->value, size); + break; + case XATTRAT_OP_SET: + data->ret = lsetxattr(path, name, value, size, flags); + break; + case XATTRAT_OP_REMOVE: + data->ret = lremovexattr(path, name); + break; + default: + g_assert_not_reached(); + } + } + data->serrno = errno; + _exit(0); + } + assert(waitpid(pid, &wstatus, 0) == pid && WIFEXITED(wstatus)); + + ret = data->ret; + if (ret < 0) { + errno = data->serrno; + goto err_out; + } + if (value) { + memcpy(value, data->value, data->ret); + } +err_out: + munmap_preserver_errno(data, sizeof(*data) + size); + return ret; +} + +ssize_t fgetxattrat_nofollow(int dirfd, const char *path, const char *name, + void *value, size_t size) +{ + return do_xattrat_op(XATTRAT_OP_GET, dirfd, path, name, value, size, 0); +} + +ssize_t local_getxattr_nofollow(FsContext *ctx, const char *path, + const char *name, void *value, size_t size) +{ + char *dirpath = g_path_get_dirname(path); + char *filename = g_path_get_basename(path); + int dirfd; + ssize_t ret = -1; + + dirfd = local_opendir_nofollow(ctx, dirpath); + if (dirfd == -1) { + goto out; + } + + ret = fgetxattrat_nofollow(dirfd, filename, name, value, size); + close_preserve_errno(dirfd); +out: + g_free(dirpath); + g_free(filename); + return ret; +} + +int fsetxattrat_nofollow(int dirfd, const char *path, const char *name, + void *value, size_t size, int flags) +{ + return do_xattrat_op(XATTRAT_OP_SET, dirfd, path, name, value, size, flags); +} + +ssize_t local_setxattr_nofollow(FsContext *ctx, const char *path, + const char *name, void *value, size_t size, + int flags) +{ + char *dirpath = g_path_get_dirname(path); + char *filename = g_path_get_basename(path); + int dirfd; + ssize_t ret = -1; + + dirfd = local_opendir_nofollow(ctx, dirpath); + if (dirfd == -1) { + goto out; + } + + ret = fsetxattrat_nofollow(dirfd, filename, name, value, size, flags); + close_preserve_errno(dirfd); +out: + g_free(dirpath); + g_free(filename); + return ret; +} + +static ssize_t fremovexattrat_nofollow(int dirfd, const char *path, + const char *name) +{ + return do_xattrat_op(XATTRAT_OP_GET, dirfd, path, name, NULL, 0, 0); +} + +ssize_t local_removexattr_nofollow(FsContext *ctx, const char *path, + const char *name) +{ + char *dirpath = g_path_get_dirname(path); + char *filename = g_path_get_basename(path); + int dirfd; + ssize_t ret = -1; + + dirfd = local_opendir_nofollow(ctx, dirpath); + if (dirfd == -1) { + goto out; + } + + ret = fremovexattrat_nofollow(dirfd, filename, name); + close_preserve_errno(dirfd); +out: + g_free(dirpath); + g_free(filename); + return ret; +} static XattrOperations *get_xattr_operations(XattrOperations **h, const char *name) diff --git a/hw/9pfs/9p-xattr.h b/hw/9pfs/9p-xattr.h index 3f43f5153f3c..986cb59b67f2 100644 --- a/hw/9pfs/9p-xattr.h +++ b/hw/9pfs/9p-xattr.h @@ -15,6 +15,7 @@ #define QEMU_9P_XATTR_H #include "qemu/xattr.h" +#include "9p-local.h" typedef struct xattr_operations { @@ -29,6 +30,18 @@ typedef struct xattr_operations const char *path, const char *name); } XattrOperations; +ssize_t fgetxattrat_nofollow(int dirfd, const char *path, const char *name, + void *value, size_t size); +int fsetxattrat_nofollow(int dirfd, const char *path, const char *name, + void *value, size_t size, int flags); + +ssize_t local_getxattr_nofollow(FsContext *ctx, const char *path, + const char *name, void *value, size_t size); +ssize_t local_setxattr_nofollow(FsContext *ctx, const char *path, + const char *name, void *value, size_t size, + int flags); +ssize_t local_removexattr_nofollow(FsContext *ctx, const char *path, + const char *name); extern XattrOperations mapped_user_xattr; extern XattrOperations passthrough_user_xattr;
All operations dealing with extended attributes are vulnerable to symlink attacks because they use path-based syscalls which can traverse symbolic links while walking through the dirname part of the path. The solution is to introduce helpers based on opendir_nofollow(). This calls for "at" versions of the extended attribute syscalls, which don't exist unfortunately. This patch implement them by simulating the "at" behavior with fchdir(). Since the current working directory is process wide, and we don't want to confuse another thread in QEMU, all the work is done in a separate process. The extended attributes code spreads over several files: all helpers are hence declared with external linkage in 9p-xattr.h. Note that the listxattr-based code is fully contained in 9p-xattr.c: the flistxattrat_nofollow() helper is added in a subsequent patch. Signed-off-by: Greg Kurz <groug@kaod.org> --- hw/9pfs/9p-xattr.c | 158 ++++++++++++++++++++++++++++++++++++++++++++++++++++ hw/9pfs/9p-xattr.h | 13 ++++ 2 files changed, 171 insertions(+)