Message ID | 20210121144429.58885-1-stefanha@redhat.com |
---|---|
State | New |
Headers | show |
Series | virtiofsd: prevent opening of special files (CVE-2020-35517) | expand |
On Thu, Jan 21, 2021 at 02:44:29PM +0000, Stefan Hajnoczi wrote: > A well-behaved FUSE client does not attempt to open special files with > FUSE_OPEN because they are handled on the client side (e.g. device nodes > are handled by client-side device drivers). > > The check to prevent virtiofsd from opening special files is missing in > a few cases, most notably FUSE_OPEN. A malicious client can cause > virtiofsd to open a device node, potentially allowing the guest to > escape. This can be exploited by a modified guest device driver. It is > not exploitable from guest userspace since the guest kernel will handle > special files inside the guest instead of sending FUSE requests. > > This patch adds the missing checks to virtiofsd. This is a short-term > solution because it does not prevent a compromised virtiofsd process > from opening device nodes on the host. > > Reported-by: Alex Xu <alex@alxu.ca> > Fixes: CVE-2020-35517 > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > --- > This issue was diagnosed on public IRC and is therefore already known > and not embargoed. > > A stronger fix, and the long-term solution, is for users to mount the > shared directory and any sub-mounts with nodev, as well as nosuid and > noexec. Unfortunately virtiofsd cannot do this automatically because > bind mounts added by the user after virtiofsd has launched would not be > detected. I suggest the following: > > 1. Modify libvirt and Kata Containers to explicitly set these mount > options. > 2. Then modify virtiofsd to check that the shared directory has the > necessary options at startup. Refuse to start if the options are > missing so that the user is aware of the security requirements. > > As a bonus this also increases the likelihood that other host processes > besides virtiofsd will be protected by nosuid/noexec/nodev so that a > malicious guest cannot drop these files in place and then arrange for a > host process to come across them. > > Additionally, user namespaces have been discussed. They seem like a > worthwhile addition as an unprivileged or privilege-separated mode > although there are limitations with respect to security xattrs and the > actual uid/gid stored on the host file system not corresponding to the > guest uid/gid. > --- > tools/virtiofsd/passthrough_ll.c | 84 +++++++++++++++++++++----------- > 1 file changed, 56 insertions(+), 28 deletions(-) > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > index 5fb36d9407..e08352d649 100644 > --- a/tools/virtiofsd/passthrough_ll.c > +++ b/tools/virtiofsd/passthrough_ll.c > @@ -555,6 +555,29 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino) > return fd; > } > > +/* > + * Open a file descriptor for an inode. Returns -EBADF if the inode is not a > + * regular file or a directory. Use this helper function instead of raw > + * openat(2) to prevent security issues when a malicious client opens special > + * files such as block device nodes. > + */ > +static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode, > + int open_flags) > +{ > + g_autofree char *fd_str = g_strdup_printf("%d", inode->fd); > + int fd; > + > + if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) { > + return -EBADF; > + } > + > + fd = openat(lo->proc_self_fd, fd_str, open_flags); Whats the intended behaviour with symlinks ? Do we need to allow S_ISLNK, or are we assuming the symlink has already been expanded to the target file by this point ? If the latter adding a comment about this would be useful. > + if (fd < 0) { > + return -errno; > + } > + return fd; > +} > + Regards, Daniel
On 01/21/21 15:44, Stefan Hajnoczi wrote: > A well-behaved FUSE client does not attempt to open special files with > FUSE_OPEN because they are handled on the client side (e.g. device nodes > are handled by client-side device drivers). > > The check to prevent virtiofsd from opening special files is missing in > a few cases, most notably FUSE_OPEN. A malicious client can cause > virtiofsd to open a device node, potentially allowing the guest to > escape. This can be exploited by a modified guest device driver. It is > not exploitable from guest userspace since the guest kernel will handle > special files inside the guest instead of sending FUSE requests. > > This patch adds the missing checks to virtiofsd. This is a short-term > solution because it does not prevent a compromised virtiofsd process > from opening device nodes on the host. > > Reported-by: Alex Xu <alex@alxu.ca> > Fixes: CVE-2020-35517 > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > --- > This issue was diagnosed on public IRC and is therefore already known > and not embargoed. > > A stronger fix, and the long-term solution, is for users to mount the > shared directory and any sub-mounts with nodev, as well as nosuid and > noexec. Unfortunately virtiofsd cannot do this automatically because > bind mounts added by the user after virtiofsd has launched would not be > detected. I suggest the following: > > 1. Modify libvirt and Kata Containers to explicitly set these mount > options. > 2. Then modify virtiofsd to check that the shared directory has the > necessary options at startup. Refuse to start if the options are > missing so that the user is aware of the security requirements. Assuming a benign / trusted guest, is there going to be an override for this? Asked differently -- if we don't want to set up a separate block device on the host, to contain the filesystem that is mounted as the shared directory, can unionfs (?) / overlayfs be used to re-mount an existent host-side directory as the shared directory, but with "noexec,nosuid,nodev" *bolted-on*? If people have to create separate block devices (on the host side) for innocent use cases such as running tests in a trusted guest, that's not going to qualify as "usability progress" relative to having a qcow2 (or raw) image file. "nodev,nosuid" is kind of a no-brainer for any host-side *data* volume anyway (such as the one underlying "/home", even), so I don't see those options as a challenge. But "noexec" is different. Thanks, Laszlo > > As a bonus this also increases the likelihood that other host processes > besides virtiofsd will be protected by nosuid/noexec/nodev so that a > malicious guest cannot drop these files in place and then arrange for a > host process to come across them. > > Additionally, user namespaces have been discussed. They seem like a > worthwhile addition as an unprivileged or privilege-separated mode > although there are limitations with respect to security xattrs and the > actual uid/gid stored on the host file system not corresponding to the > guest uid/gid. > --- > tools/virtiofsd/passthrough_ll.c | 84 +++++++++++++++++++++----------- > 1 file changed, 56 insertions(+), 28 deletions(-) > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > index 5fb36d9407..e08352d649 100644 > --- a/tools/virtiofsd/passthrough_ll.c > +++ b/tools/virtiofsd/passthrough_ll.c > @@ -555,6 +555,29 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino) > return fd; > } > > +/* > + * Open a file descriptor for an inode. Returns -EBADF if the inode is not a > + * regular file or a directory. Use this helper function instead of raw > + * openat(2) to prevent security issues when a malicious client opens special > + * files such as block device nodes. > + */ > +static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode, > + int open_flags) > +{ > + g_autofree char *fd_str = g_strdup_printf("%d", inode->fd); > + int fd; > + > + if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) { > + return -EBADF; > + } > + > + fd = openat(lo->proc_self_fd, fd_str, open_flags); > + if (fd < 0) { > + return -errno; > + } > + return fd; > +} > + > static void lo_init(void *userdata, struct fuse_conn_info *conn) > { > struct lo_data *lo = (struct lo_data *)userdata; > @@ -684,8 +707,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr, > if (fi) { > truncfd = fd; > } else { > - sprintf(procname, "%i", ifd); > - truncfd = openat(lo->proc_self_fd, procname, O_RDWR); > + truncfd = lo_inode_open(lo, inode, O_RDWR); > if (truncfd < 0) { > goto out_err; > } > @@ -1725,7 +1747,6 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo, > pid_t pid, int *err) > { > struct lo_inode_plock *plock; > - char procname[64]; > int fd; > > plock = > @@ -1742,12 +1763,10 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo, > } > > /* Open another instance of file which can be used for ofd locks. */ > - sprintf(procname, "%i", inode->fd); > - > /* TODO: What if file is not writable? */ > - fd = openat(lo->proc_self_fd, procname, O_RDWR); > - if (fd == -1) { > - *err = errno; > + fd = lo_inode_open(lo, inode, O_RDWR); > + if (fd < 0) { > + *err = -fd; > free(plock); > return NULL; > } > @@ -1894,18 +1913,24 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi) > { > int fd; > ssize_t fh; > - char buf[64]; > struct lo_data *lo = lo_data(req); > + struct lo_inode *inode = lo_inode(req, ino); > > fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n", ino, > fi->flags); > > + if (!inode) { > + fuse_reply_err(req, EBADF); > + return; > + } > + > update_open_flags(lo->writeback, lo->allow_direct_io, fi); > > - sprintf(buf, "%i", lo_fd(req, ino)); > - fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW); > - if (fd == -1) { > - return (void)fuse_reply_err(req, errno); > + fd = lo_inode_open(lo, inode, fi->flags & ~O_NOFOLLOW); > + if (fd < 0) { > + lo_inode_put(lo, &inode); > + fuse_reply_err(req, -fd); > + return; > } > > pthread_mutex_lock(&lo->mutex); > @@ -1913,6 +1938,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi) > pthread_mutex_unlock(&lo->mutex); > if (fh == -1) { > close(fd); > + lo_inode_put(lo, &inode); > fuse_reply_err(req, ENOMEM); > return; > } > @@ -1923,6 +1949,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi) > } else if (lo->cache == CACHE_ALWAYS) { > fi->keep_cache = 1; > } > + lo_inode_put(lo, &inode); > fuse_reply_open(req, fi); > } > > @@ -1982,39 +2009,40 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi) > static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync, > struct fuse_file_info *fi) > { > + struct lo_inode *inode = lo_inode(req, ino); > + struct lo_data *lo = lo_data(req); > int res; > int fd; > - char *buf; > > fuse_log(FUSE_LOG_DEBUG, "lo_fsync(ino=%" PRIu64 ", fi=0x%p)\n", ino, > (void *)fi); > > + if (!inode) { > + fuse_reply_err(req, EBADF); > + return; > + } > + > if (!fi) { > - struct lo_data *lo = lo_data(req); > - > - res = asprintf(&buf, "%i", lo_fd(req, ino)); > - if (res == -1) { > - return (void)fuse_reply_err(req, errno); > - } > - > - fd = openat(lo->proc_self_fd, buf, O_RDWR); > - free(buf); > - if (fd == -1) { > - return (void)fuse_reply_err(req, errno); > + fd = lo_inode_open(lo, inode, O_RDWR); > + if (fd < 0) { > + res = -fd; > + goto out; > } > } else { > fd = lo_fi_fd(req, fi); > } > > if (datasync) { > - res = fdatasync(fd); > + res = fdatasync(fd) == -1 ? errno : 0; > } else { > - res = fsync(fd); > + res = fsync(fd) == -1 ? errno : 0; > } > if (!fi) { > close(fd); > } > - fuse_reply_err(req, res == -1 ? errno : 0); > +out: > + lo_inode_put(lo, &inode); > + fuse_reply_err(req, res); > } > > static void lo_read(fuse_req_t req, fuse_ino_t ino, size_t size, off_t offset, >
On 01/21/21 16:52, Alex Xu wrote: > Excerpts from Laszlo Ersek's message of January 21, 2021 10:32 am: >> Assuming a benign / trusted guest, is there going to be an override for >> this? >> >> Asked differently -- if we don't want to set up a separate block device >> on the host, to contain the filesystem that is mounted as the shared >> directory, can unionfs (?) / overlayfs be used to re-mount an existent >> host-side directory as the shared directory, but with >> "noexec,nosuid,nodev" *bolted-on*? >> >> If people have to create separate block devices (on the host side) for >> innocent use cases such as running tests in a trusted guest, that's not >> going to qualify as "usability progress" relative to having a qcow2 (or >> raw) image file. >> >> "nodev,nosuid" is kind of a no-brainer for any host-side *data* volume >> anyway (such as the one underlying "/home", even), so I don't see those >> options as a challenge. But "noexec" is different. >> >> Thanks, >> Laszlo > > On Linux, there are two types of mount options: per-superblock, and > per-point. Most filesystem-specific options are per-superblock, and > apply to all mounts of that device. noexec, nosuid, and nodev are > per-mount options, and apply individually to each mount, bind or > otherwise, of a given device. Bind mounts copy the parent per-mount > options, but can be individually altered. Note also that it is not > required to create a new location for a bind mount. > > For example, invoking: > > mount --bind -o noexec,nosuid,nodev /var/lib/vms/source1 /var/lib/vms/source1 > > would effectively secure a source directory. This can also be inserted > in /etc/fstab, such as: > > /var/lib/vms/source1 /var/lib/vms/source1 none bind,noexec,nosuid,nodev 0 0 > > Note that, as explained in Stefan's initial email, this hides any > submounts below source. Each of those must be individually protected, > either by initially mounting with the security options or by creating a > new bind mount as above. Although these cases should be infrequent, they > are common enough that Stefan and I think that they should be supported, > or at the very least not silently behave in unexpected or insecure ways. > > Additionally, while it's possible to use overlayfs for this purpose, > it's overkill, and as far as I understand, doesn't solve the problem of > hiding sub-mounts. Awesome, this is the best answer I could possibly get. Thank you! Laszlo
* Stefan Hajnoczi (stefanha@redhat.com) wrote: > A well-behaved FUSE client does not attempt to open special files with > FUSE_OPEN because they are handled on the client side (e.g. device nodes > are handled by client-side device drivers). > > The check to prevent virtiofsd from opening special files is missing in > a few cases, most notably FUSE_OPEN. A malicious client can cause > virtiofsd to open a device node, potentially allowing the guest to > escape. This can be exploited by a modified guest device driver. It is > not exploitable from guest userspace since the guest kernel will handle > special files inside the guest instead of sending FUSE requests. > > This patch adds the missing checks to virtiofsd. This is a short-term > solution because it does not prevent a compromised virtiofsd process > from opening device nodes on the host. > > Reported-by: Alex Xu <alex@alxu.ca> > Fixes: CVE-2020-35517 > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> (With a response to Dan's question about Symlinks) Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > --- > This issue was diagnosed on public IRC and is therefore already known > and not embargoed. > > A stronger fix, and the long-term solution, is for users to mount the > shared directory and any sub-mounts with nodev, as well as nosuid and > noexec. Unfortunately virtiofsd cannot do this automatically because > bind mounts added by the user after virtiofsd has launched would not be > detected. I suggest the following: > > 1. Modify libvirt and Kata Containers to explicitly set these mount > options. > 2. Then modify virtiofsd to check that the shared directory has the > necessary options at startup. Refuse to start if the options are > missing so that the user is aware of the security requirements. > > As a bonus this also increases the likelihood that other host processes > besides virtiofsd will be protected by nosuid/noexec/nodev so that a > malicious guest cannot drop these files in place and then arrange for a > host process to come across them. > > Additionally, user namespaces have been discussed. They seem like a > worthwhile addition as an unprivileged or privilege-separated mode > although there are limitations with respect to security xattrs and the > actual uid/gid stored on the host file system not corresponding to the > guest uid/gid. > --- > tools/virtiofsd/passthrough_ll.c | 84 +++++++++++++++++++++----------- > 1 file changed, 56 insertions(+), 28 deletions(-) > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > index 5fb36d9407..e08352d649 100644 > --- a/tools/virtiofsd/passthrough_ll.c > +++ b/tools/virtiofsd/passthrough_ll.c > @@ -555,6 +555,29 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino) > return fd; > } > > +/* > + * Open a file descriptor for an inode. Returns -EBADF if the inode is not a > + * regular file or a directory. Use this helper function instead of raw > + * openat(2) to prevent security issues when a malicious client opens special > + * files such as block device nodes. > + */ > +static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode, > + int open_flags) > +{ > + g_autofree char *fd_str = g_strdup_printf("%d", inode->fd); > + int fd; > + > + if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) { > + return -EBADF; > + } > + > + fd = openat(lo->proc_self_fd, fd_str, open_flags); > + if (fd < 0) { > + return -errno; > + } > + return fd; > +} > + > static void lo_init(void *userdata, struct fuse_conn_info *conn) > { > struct lo_data *lo = (struct lo_data *)userdata; > @@ -684,8 +707,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr, > if (fi) { > truncfd = fd; > } else { > - sprintf(procname, "%i", ifd); > - truncfd = openat(lo->proc_self_fd, procname, O_RDWR); > + truncfd = lo_inode_open(lo, inode, O_RDWR); > if (truncfd < 0) { > goto out_err; > } > @@ -1725,7 +1747,6 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo, > pid_t pid, int *err) > { > struct lo_inode_plock *plock; > - char procname[64]; > int fd; > > plock = > @@ -1742,12 +1763,10 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo, > } > > /* Open another instance of file which can be used for ofd locks. */ > - sprintf(procname, "%i", inode->fd); > - > /* TODO: What if file is not writable? */ > - fd = openat(lo->proc_self_fd, procname, O_RDWR); > - if (fd == -1) { > - *err = errno; > + fd = lo_inode_open(lo, inode, O_RDWR); > + if (fd < 0) { > + *err = -fd; > free(plock); > return NULL; > } > @@ -1894,18 +1913,24 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi) > { > int fd; > ssize_t fh; > - char buf[64]; > struct lo_data *lo = lo_data(req); > + struct lo_inode *inode = lo_inode(req, ino); > > fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n", ino, > fi->flags); > > + if (!inode) { > + fuse_reply_err(req, EBADF); > + return; > + } > + > update_open_flags(lo->writeback, lo->allow_direct_io, fi); > > - sprintf(buf, "%i", lo_fd(req, ino)); > - fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW); > - if (fd == -1) { > - return (void)fuse_reply_err(req, errno); > + fd = lo_inode_open(lo, inode, fi->flags & ~O_NOFOLLOW); > + if (fd < 0) { > + lo_inode_put(lo, &inode); > + fuse_reply_err(req, -fd); > + return; > } > > pthread_mutex_lock(&lo->mutex); > @@ -1913,6 +1938,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi) > pthread_mutex_unlock(&lo->mutex); > if (fh == -1) { > close(fd); > + lo_inode_put(lo, &inode); > fuse_reply_err(req, ENOMEM); > return; > } > @@ -1923,6 +1949,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi) > } else if (lo->cache == CACHE_ALWAYS) { > fi->keep_cache = 1; > } > + lo_inode_put(lo, &inode); > fuse_reply_open(req, fi); > } > > @@ -1982,39 +2009,40 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi) > static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync, > struct fuse_file_info *fi) > { > + struct lo_inode *inode = lo_inode(req, ino); > + struct lo_data *lo = lo_data(req); > int res; > int fd; > - char *buf; > > fuse_log(FUSE_LOG_DEBUG, "lo_fsync(ino=%" PRIu64 ", fi=0x%p)\n", ino, > (void *)fi); > > + if (!inode) { > + fuse_reply_err(req, EBADF); > + return; > + } > + > if (!fi) { > - struct lo_data *lo = lo_data(req); > - > - res = asprintf(&buf, "%i", lo_fd(req, ino)); > - if (res == -1) { > - return (void)fuse_reply_err(req, errno); > - } > - > - fd = openat(lo->proc_self_fd, buf, O_RDWR); > - free(buf); > - if (fd == -1) { > - return (void)fuse_reply_err(req, errno); > + fd = lo_inode_open(lo, inode, O_RDWR); > + if (fd < 0) { > + res = -fd; > + goto out; > } > } else { > fd = lo_fi_fd(req, fi); > } > > if (datasync) { > - res = fdatasync(fd); > + res = fdatasync(fd) == -1 ? errno : 0; > } else { > - res = fsync(fd); > + res = fsync(fd) == -1 ? errno : 0; > } > if (!fi) { > close(fd); > } > - fuse_reply_err(req, res == -1 ? errno : 0); > +out: > + lo_inode_put(lo, &inode); > + fuse_reply_err(req, res); > } > > static void lo_read(fuse_req_t req, fuse_ino_t ino, size_t size, off_t offset, > -- > 2.29.2 >
On Thu, Jan 21, 2021 at 02:44:29PM +0000, Stefan Hajnoczi wrote: > A well-behaved FUSE client does not attempt to open special files with > FUSE_OPEN because they are handled on the client side (e.g. device nodes > are handled by client-side device drivers). > > The check to prevent virtiofsd from opening special files is missing in > a few cases, most notably FUSE_OPEN. A malicious client can cause > virtiofsd to open a device node, potentially allowing the guest to > escape. This can be exploited by a modified guest device driver. It is > not exploitable from guest userspace since the guest kernel will handle > special files inside the guest instead of sending FUSE requests. > > This patch adds the missing checks to virtiofsd. This is a short-term > solution because it does not prevent a compromised virtiofsd process > from opening device nodes on the host. > > Reported-by: Alex Xu <alex@alxu.ca> > Fixes: CVE-2020-35517 > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> It looks good to me. I see there is another openat() instance in lo_opendir(). May be we can convert it to use lo_inode_open() as well? Initially I thought that remove posix locks will be broken on special files like named pipes (fifo). But looks like in that case posix lock request does not come to virtiofsd at all (despite the fact remote posix locks are enabled). So it probably is fine to modify openat() there. Ran blogbench and pjdfstests and they run fine for me, So.. Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Vivek > --- > This issue was diagnosed on public IRC and is therefore already known > and not embargoed. > > A stronger fix, and the long-term solution, is for users to mount the > shared directory and any sub-mounts with nodev, as well as nosuid and > noexec. Unfortunately virtiofsd cannot do this automatically because > bind mounts added by the user after virtiofsd has launched would not be > detected. I suggest the following: > > 1. Modify libvirt and Kata Containers to explicitly set these mount > options. > 2. Then modify virtiofsd to check that the shared directory has the > necessary options at startup. Refuse to start if the options are > missing so that the user is aware of the security requirements. > > As a bonus this also increases the likelihood that other host processes > besides virtiofsd will be protected by nosuid/noexec/nodev so that a > malicious guest cannot drop these files in place and then arrange for a > host process to come across them. > > Additionally, user namespaces have been discussed. They seem like a > worthwhile addition as an unprivileged or privilege-separated mode > although there are limitations with respect to security xattrs and the > actual uid/gid stored on the host file system not corresponding to the > guest uid/gid. > --- > tools/virtiofsd/passthrough_ll.c | 84 +++++++++++++++++++++----------- > 1 file changed, 56 insertions(+), 28 deletions(-) > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > index 5fb36d9407..e08352d649 100644 > --- a/tools/virtiofsd/passthrough_ll.c > +++ b/tools/virtiofsd/passthrough_ll.c > @@ -555,6 +555,29 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino) > return fd; > } > > +/* > + * Open a file descriptor for an inode. Returns -EBADF if the inode is not a > + * regular file or a directory. Use this helper function instead of raw > + * openat(2) to prevent security issues when a malicious client opens special > + * files such as block device nodes. > + */ > +static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode, > + int open_flags) > +{ > + g_autofree char *fd_str = g_strdup_printf("%d", inode->fd); > + int fd; > + > + if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) { > + return -EBADF; > + } > + > + fd = openat(lo->proc_self_fd, fd_str, open_flags); > + if (fd < 0) { > + return -errno; > + } > + return fd; > +} > + > static void lo_init(void *userdata, struct fuse_conn_info *conn) > { > struct lo_data *lo = (struct lo_data *)userdata; > @@ -684,8 +707,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr, > if (fi) { > truncfd = fd; > } else { > - sprintf(procname, "%i", ifd); > - truncfd = openat(lo->proc_self_fd, procname, O_RDWR); > + truncfd = lo_inode_open(lo, inode, O_RDWR); > if (truncfd < 0) { > goto out_err; > } > @@ -1725,7 +1747,6 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo, > pid_t pid, int *err) > { > struct lo_inode_plock *plock; > - char procname[64]; > int fd; > > plock = > @@ -1742,12 +1763,10 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo, > } > > /* Open another instance of file which can be used for ofd locks. */ > - sprintf(procname, "%i", inode->fd); > - > /* TODO: What if file is not writable? */ > - fd = openat(lo->proc_self_fd, procname, O_RDWR); > - if (fd == -1) { > - *err = errno; > + fd = lo_inode_open(lo, inode, O_RDWR); > + if (fd < 0) { > + *err = -fd; > free(plock); > return NULL; > } > @@ -1894,18 +1913,24 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi) > { > int fd; > ssize_t fh; > - char buf[64]; > struct lo_data *lo = lo_data(req); > + struct lo_inode *inode = lo_inode(req, ino); > > fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n", ino, > fi->flags); > > + if (!inode) { > + fuse_reply_err(req, EBADF); > + return; > + } > + > update_open_flags(lo->writeback, lo->allow_direct_io, fi); > > - sprintf(buf, "%i", lo_fd(req, ino)); > - fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW); > - if (fd == -1) { > - return (void)fuse_reply_err(req, errno); > + fd = lo_inode_open(lo, inode, fi->flags & ~O_NOFOLLOW); > + if (fd < 0) { > + lo_inode_put(lo, &inode); > + fuse_reply_err(req, -fd); > + return; > } > > pthread_mutex_lock(&lo->mutex); > @@ -1913,6 +1938,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi) > pthread_mutex_unlock(&lo->mutex); > if (fh == -1) { > close(fd); > + lo_inode_put(lo, &inode); > fuse_reply_err(req, ENOMEM); > return; > } > @@ -1923,6 +1949,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi) > } else if (lo->cache == CACHE_ALWAYS) { > fi->keep_cache = 1; > } > + lo_inode_put(lo, &inode); > fuse_reply_open(req, fi); > } > > @@ -1982,39 +2009,40 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi) > static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync, > struct fuse_file_info *fi) > { > + struct lo_inode *inode = lo_inode(req, ino); > + struct lo_data *lo = lo_data(req); > int res; > int fd; > - char *buf; > > fuse_log(FUSE_LOG_DEBUG, "lo_fsync(ino=%" PRIu64 ", fi=0x%p)\n", ino, > (void *)fi); > > + if (!inode) { > + fuse_reply_err(req, EBADF); > + return; > + } > + > if (!fi) { > - struct lo_data *lo = lo_data(req); > - > - res = asprintf(&buf, "%i", lo_fd(req, ino)); > - if (res == -1) { > - return (void)fuse_reply_err(req, errno); > - } > - > - fd = openat(lo->proc_self_fd, buf, O_RDWR); > - free(buf); > - if (fd == -1) { > - return (void)fuse_reply_err(req, errno); > + fd = lo_inode_open(lo, inode, O_RDWR); > + if (fd < 0) { > + res = -fd; > + goto out; > } > } else { > fd = lo_fi_fd(req, fi); > } > > if (datasync) { > - res = fdatasync(fd); > + res = fdatasync(fd) == -1 ? errno : 0; > } else { > - res = fsync(fd); > + res = fsync(fd) == -1 ? errno : 0; > } > if (!fi) { > close(fd); > } > - fuse_reply_err(req, res == -1 ? errno : 0); > +out: > + lo_inode_put(lo, &inode); > + fuse_reply_err(req, res); > } > > static void lo_read(fuse_req_t req, fuse_ino_t ino, size_t size, off_t offset, > -- > 2.29.2 >
On Thu, Jan 21, 2021 at 02:48:03PM +0000, Daniel P. Berrangé wrote: > On Thu, Jan 21, 2021 at 02:44:29PM +0000, Stefan Hajnoczi wrote: > > A well-behaved FUSE client does not attempt to open special files with > > FUSE_OPEN because they are handled on the client side (e.g. device nodes > > are handled by client-side device drivers). > > > > The check to prevent virtiofsd from opening special files is missing in > > a few cases, most notably FUSE_OPEN. A malicious client can cause > > virtiofsd to open a device node, potentially allowing the guest to > > escape. This can be exploited by a modified guest device driver. It is > > not exploitable from guest userspace since the guest kernel will handle > > special files inside the guest instead of sending FUSE requests. > > > > This patch adds the missing checks to virtiofsd. This is a short-term > > solution because it does not prevent a compromised virtiofsd process > > from opening device nodes on the host. > > > > Reported-by: Alex Xu <alex@alxu.ca> > > Fixes: CVE-2020-35517 > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > --- > > This issue was diagnosed on public IRC and is therefore already known > > and not embargoed. > > > > A stronger fix, and the long-term solution, is for users to mount the > > shared directory and any sub-mounts with nodev, as well as nosuid and > > noexec. Unfortunately virtiofsd cannot do this automatically because > > bind mounts added by the user after virtiofsd has launched would not be > > detected. I suggest the following: > > > > 1. Modify libvirt and Kata Containers to explicitly set these mount > > options. > > 2. Then modify virtiofsd to check that the shared directory has the > > necessary options at startup. Refuse to start if the options are > > missing so that the user is aware of the security requirements. > > > > As a bonus this also increases the likelihood that other host processes > > besides virtiofsd will be protected by nosuid/noexec/nodev so that a > > malicious guest cannot drop these files in place and then arrange for a > > host process to come across them. > > > > Additionally, user namespaces have been discussed. They seem like a > > worthwhile addition as an unprivileged or privilege-separated mode > > although there are limitations with respect to security xattrs and the > > actual uid/gid stored on the host file system not corresponding to the > > guest uid/gid. > > --- > > tools/virtiofsd/passthrough_ll.c | 84 +++++++++++++++++++++----------- > > 1 file changed, 56 insertions(+), 28 deletions(-) > > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > > index 5fb36d9407..e08352d649 100644 > > --- a/tools/virtiofsd/passthrough_ll.c > > +++ b/tools/virtiofsd/passthrough_ll.c > > @@ -555,6 +555,29 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino) > > return fd; > > } > > > > +/* > > + * Open a file descriptor for an inode. Returns -EBADF if the inode is not a > > + * regular file or a directory. Use this helper function instead of raw > > + * openat(2) to prevent security issues when a malicious client opens special > > + * files such as block device nodes. > > + */ > > +static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode, > > + int open_flags) > > +{ > > + g_autofree char *fd_str = g_strdup_printf("%d", inode->fd); > > + int fd; > > + > > + if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) { > > + return -EBADF; > > + } > > + > > + fd = openat(lo->proc_self_fd, fd_str, open_flags); > > Whats the intended behaviour with symlinks ? Do we need to > allow S_ISLNK, or are we assuming the symlink has already > been expanded to the target file by this point ? If the latter > adding a comment about this would be useful. Given the current places lo_inode_open() is being used, I think it is fine and we probably don't have to worry about symlink in these paths as it has been resolved already by now. In truncation path, we will expect that we are working with target file (and not symlink fd). Same should be true for remote posix locks, lo_open() and lo_fsync(). Other paths where we might be working with symlinks like listxattr, getxattr, setxattr, already have mechanism to deal with symlink. Thanks Vivek > > > + if (fd < 0) { > > + return -errno; > > + } > > + return fd; > > +} > > + > > Regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Thu, Jan 21, 2021 at 02:48:03PM +0000, Daniel P. Berrangé wrote: > On Thu, Jan 21, 2021 at 02:44:29PM +0000, Stefan Hajnoczi wrote: > > A well-behaved FUSE client does not attempt to open special files with > > FUSE_OPEN because they are handled on the client side (e.g. device nodes > > are handled by client-side device drivers). > > > > The check to prevent virtiofsd from opening special files is missing in > > a few cases, most notably FUSE_OPEN. A malicious client can cause > > virtiofsd to open a device node, potentially allowing the guest to > > escape. This can be exploited by a modified guest device driver. It is > > not exploitable from guest userspace since the guest kernel will handle > > special files inside the guest instead of sending FUSE requests. > > > > This patch adds the missing checks to virtiofsd. This is a short-term > > solution because it does not prevent a compromised virtiofsd process > > from opening device nodes on the host. > > > > Reported-by: Alex Xu <alex@alxu.ca> > > Fixes: CVE-2020-35517 > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > --- > > This issue was diagnosed on public IRC and is therefore already known > > and not embargoed. > > > > A stronger fix, and the long-term solution, is for users to mount the > > shared directory and any sub-mounts with nodev, as well as nosuid and > > noexec. Unfortunately virtiofsd cannot do this automatically because > > bind mounts added by the user after virtiofsd has launched would not be > > detected. I suggest the following: > > > > 1. Modify libvirt and Kata Containers to explicitly set these mount > > options. > > 2. Then modify virtiofsd to check that the shared directory has the > > necessary options at startup. Refuse to start if the options are > > missing so that the user is aware of the security requirements. > > > > As a bonus this also increases the likelihood that other host processes > > besides virtiofsd will be protected by nosuid/noexec/nodev so that a > > malicious guest cannot drop these files in place and then arrange for a > > host process to come across them. > > > > Additionally, user namespaces have been discussed. They seem like a > > worthwhile addition as an unprivileged or privilege-separated mode > > although there are limitations with respect to security xattrs and the > > actual uid/gid stored on the host file system not corresponding to the > > guest uid/gid. > > --- > > tools/virtiofsd/passthrough_ll.c | 84 +++++++++++++++++++++----------- > > 1 file changed, 56 insertions(+), 28 deletions(-) > > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > > index 5fb36d9407..e08352d649 100644 > > --- a/tools/virtiofsd/passthrough_ll.c > > +++ b/tools/virtiofsd/passthrough_ll.c > > @@ -555,6 +555,29 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino) > > return fd; > > } > > > > +/* > > + * Open a file descriptor for an inode. Returns -EBADF if the inode is not a > > + * regular file or a directory. Use this helper function instead of raw > > + * openat(2) to prevent security issues when a malicious client opens special > > + * files such as block device nodes. > > + */ > > +static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode, > > + int open_flags) > > +{ > > + g_autofree char *fd_str = g_strdup_printf("%d", inode->fd); > > + int fd; > > + > > + if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) { > > + return -EBADF; > > + } > > + > > + fd = openat(lo->proc_self_fd, fd_str, open_flags); > > Whats the intended behaviour with symlinks ? Do we need to > allow S_ISLNK, or are we assuming the symlink has already > been expanded to the target file by this point ? If the latter > adding a comment about this would be useful. I will add a comment. The FUSE client is expected to resolve symlinks on the client side. In other words, the client does not open the symlink inode in order to access the target of the symlink. It must open the target directly. Stefan
On Thu, Jan 21, 2021 at 3:44 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: > This patch adds the missing checks to virtiofsd. This is a short-term > solution because it does not prevent a compromised virtiofsd process > from opening device nodes on the host. I think the proper solution is adding support to the host in order to restrict opens on filesystems that virtiofsd has access to. My idea was to add a "force_nodev" mount option that cannot be disabled and will make propagated mounts also be marked "force_nodev,nodev". A possibly simpler solution is to extend seccomp to restrict the process itself from being able to open special files. Not sure if that's within the scope of seccomp though. Thanks, Miklos
On Fri, Jan 22, 2021 at 10:40:54AM -0500, Vivek Goyal wrote: > On Thu, Jan 21, 2021 at 02:44:29PM +0000, Stefan Hajnoczi wrote: > > A well-behaved FUSE client does not attempt to open special files with > > FUSE_OPEN because they are handled on the client side (e.g. device nodes > > are handled by client-side device drivers). > > > > The check to prevent virtiofsd from opening special files is missing in > > a few cases, most notably FUSE_OPEN. A malicious client can cause > > virtiofsd to open a device node, potentially allowing the guest to > > escape. This can be exploited by a modified guest device driver. It is > > not exploitable from guest userspace since the guest kernel will handle > > special files inside the guest instead of sending FUSE requests. > > > > This patch adds the missing checks to virtiofsd. This is a short-term > > solution because it does not prevent a compromised virtiofsd process > > from opening device nodes on the host. > > > > Reported-by: Alex Xu <alex@alxu.ca> > > Fixes: CVE-2020-35517 > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > It looks good to me. I see there is another openat() instance in > lo_opendir(). May be we can convert it to use lo_inode_open() as well? I thought so too, but this one is interesting: fd = openat(lo_fd(req, ino), ".", O_RDONLY); Using "." here is basically equivalent to O_DIRECTORY! Therefore this always fails on special files. That's why I ended up leaving it unchanged. $ cat a.c #define _GNU_SOURCE #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #include <stdio.h> #include <stdlib.h> int main(int argc, char **argv) { int fd; int pfd = open(argv[1], O_PATH); if (pfd < 0) { perror("open"); return EXIT_FAILURE; } fd = openat(pfd, ".", O_RDONLY); if (fd < 0) { perror("openat"); return EXIT_FAILURE; } close(fd); close(pfd); return EXIT_SUCCESS; } $ gcc -o a a.c $ ./a /tmp <--- no error $ ./a /tmp/a.c openat: Not a directory $ ./a /dev/stdin openat: Not a directory Stefan
On Mon, Jan 25, 2021 at 05:12:23PM +0100, Miklos Szeredi wrote: > On Thu, Jan 21, 2021 at 3:44 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: > > > This patch adds the missing checks to virtiofsd. This is a short-term > > solution because it does not prevent a compromised virtiofsd process > > from opening device nodes on the host. > > I think the proper solution is adding support to the host in order to > restrict opens on filesystems that virtiofsd has access to. > > My idea was to add a "force_nodev" mount option that cannot be > disabled and will make propagated mounts also be marked > "force_nodev,nodev". Interesting idea! Mount options that are relevant: * noexec * nosuid * nodev * nosymfollow Do you have time to work on the force_* mount options? > A possibly simpler solution is to extend seccomp to restrict the > process itself from being able to open special files. Not sure if > that's within the scope of seccomp though. I don't think seccomp can provide that restriction since it's unrelated to the syscall or its arguments. Stefan
On Tue, Jan 26, 2021 at 11:18 AM Stefan Hajnoczi <stefanha@redhat.com> wrote: > > On Mon, Jan 25, 2021 at 05:12:23PM +0100, Miklos Szeredi wrote: > > On Thu, Jan 21, 2021 at 3:44 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: > > > > > This patch adds the missing checks to virtiofsd. This is a short-term > > > solution because it does not prevent a compromised virtiofsd process > > > from opening device nodes on the host. > > > > I think the proper solution is adding support to the host in order to > > restrict opens on filesystems that virtiofsd has access to. > > > > My idea was to add a "force_nodev" mount option that cannot be > > disabled and will make propagated mounts also be marked > > "force_nodev,nodev". > > Interesting idea! Mount options that are relevant: > * noexec > * nosuid > * nodev > * nosymfollow > > Do you have time to work on the force_* mount options? Not at the moment, but first we need to probe Al to see if this idea sticks... > > A possibly simpler solution is to extend seccomp to restrict the > > process itself from being able to open special files. Not sure if > > that's within the scope of seccomp though. > > I don't think seccomp can provide that restriction since it's unrelated > to the syscall or its arguments. How about selinux, then? Thanks, Miklos
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c index 5fb36d9407..e08352d649 100644 --- a/tools/virtiofsd/passthrough_ll.c +++ b/tools/virtiofsd/passthrough_ll.c @@ -555,6 +555,29 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino) return fd; } +/* + * Open a file descriptor for an inode. Returns -EBADF if the inode is not a + * regular file or a directory. Use this helper function instead of raw + * openat(2) to prevent security issues when a malicious client opens special + * files such as block device nodes. + */ +static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode, + int open_flags) +{ + g_autofree char *fd_str = g_strdup_printf("%d", inode->fd); + int fd; + + if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) { + return -EBADF; + } + + fd = openat(lo->proc_self_fd, fd_str, open_flags); + if (fd < 0) { + return -errno; + } + return fd; +} + static void lo_init(void *userdata, struct fuse_conn_info *conn) { struct lo_data *lo = (struct lo_data *)userdata; @@ -684,8 +707,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr, if (fi) { truncfd = fd; } else { - sprintf(procname, "%i", ifd); - truncfd = openat(lo->proc_self_fd, procname, O_RDWR); + truncfd = lo_inode_open(lo, inode, O_RDWR); if (truncfd < 0) { goto out_err; } @@ -1725,7 +1747,6 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo, pid_t pid, int *err) { struct lo_inode_plock *plock; - char procname[64]; int fd; plock = @@ -1742,12 +1763,10 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo, } /* Open another instance of file which can be used for ofd locks. */ - sprintf(procname, "%i", inode->fd); - /* TODO: What if file is not writable? */ - fd = openat(lo->proc_self_fd, procname, O_RDWR); - if (fd == -1) { - *err = errno; + fd = lo_inode_open(lo, inode, O_RDWR); + if (fd < 0) { + *err = -fd; free(plock); return NULL; } @@ -1894,18 +1913,24 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi) { int fd; ssize_t fh; - char buf[64]; struct lo_data *lo = lo_data(req); + struct lo_inode *inode = lo_inode(req, ino); fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n", ino, fi->flags); + if (!inode) { + fuse_reply_err(req, EBADF); + return; + } + update_open_flags(lo->writeback, lo->allow_direct_io, fi); - sprintf(buf, "%i", lo_fd(req, ino)); - fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW); - if (fd == -1) { - return (void)fuse_reply_err(req, errno); + fd = lo_inode_open(lo, inode, fi->flags & ~O_NOFOLLOW); + if (fd < 0) { + lo_inode_put(lo, &inode); + fuse_reply_err(req, -fd); + return; } pthread_mutex_lock(&lo->mutex); @@ -1913,6 +1938,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi) pthread_mutex_unlock(&lo->mutex); if (fh == -1) { close(fd); + lo_inode_put(lo, &inode); fuse_reply_err(req, ENOMEM); return; } @@ -1923,6 +1949,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi) } else if (lo->cache == CACHE_ALWAYS) { fi->keep_cache = 1; } + lo_inode_put(lo, &inode); fuse_reply_open(req, fi); } @@ -1982,39 +2009,40 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi) static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync, struct fuse_file_info *fi) { + struct lo_inode *inode = lo_inode(req, ino); + struct lo_data *lo = lo_data(req); int res; int fd; - char *buf; fuse_log(FUSE_LOG_DEBUG, "lo_fsync(ino=%" PRIu64 ", fi=0x%p)\n", ino, (void *)fi); + if (!inode) { + fuse_reply_err(req, EBADF); + return; + } + if (!fi) { - struct lo_data *lo = lo_data(req); - - res = asprintf(&buf, "%i", lo_fd(req, ino)); - if (res == -1) { - return (void)fuse_reply_err(req, errno); - } - - fd = openat(lo->proc_self_fd, buf, O_RDWR); - free(buf); - if (fd == -1) { - return (void)fuse_reply_err(req, errno); + fd = lo_inode_open(lo, inode, O_RDWR); + if (fd < 0) { + res = -fd; + goto out; } } else { fd = lo_fi_fd(req, fi); } if (datasync) { - res = fdatasync(fd); + res = fdatasync(fd) == -1 ? errno : 0; } else { - res = fsync(fd); + res = fsync(fd) == -1 ? errno : 0; } if (!fi) { close(fd); } - fuse_reply_err(req, res == -1 ? errno : 0); +out: + lo_inode_put(lo, &inode); + fuse_reply_err(req, res); } static void lo_read(fuse_req_t req, fuse_ino_t ino, size_t size, off_t offset,
A well-behaved FUSE client does not attempt to open special files with FUSE_OPEN because they are handled on the client side (e.g. device nodes are handled by client-side device drivers). The check to prevent virtiofsd from opening special files is missing in a few cases, most notably FUSE_OPEN. A malicious client can cause virtiofsd to open a device node, potentially allowing the guest to escape. This can be exploited by a modified guest device driver. It is not exploitable from guest userspace since the guest kernel will handle special files inside the guest instead of sending FUSE requests. This patch adds the missing checks to virtiofsd. This is a short-term solution because it does not prevent a compromised virtiofsd process from opening device nodes on the host. Reported-by: Alex Xu <alex@alxu.ca> Fixes: CVE-2020-35517 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> --- This issue was diagnosed on public IRC and is therefore already known and not embargoed. A stronger fix, and the long-term solution, is for users to mount the shared directory and any sub-mounts with nodev, as well as nosuid and noexec. Unfortunately virtiofsd cannot do this automatically because bind mounts added by the user after virtiofsd has launched would not be detected. I suggest the following: 1. Modify libvirt and Kata Containers to explicitly set these mount options. 2. Then modify virtiofsd to check that the shared directory has the necessary options at startup. Refuse to start if the options are missing so that the user is aware of the security requirements. As a bonus this also increases the likelihood that other host processes besides virtiofsd will be protected by nosuid/noexec/nodev so that a malicious guest cannot drop these files in place and then arrange for a host process to come across them. Additionally, user namespaces have been discussed. They seem like a worthwhile addition as an unprivileged or privilege-separated mode although there are limitations with respect to security xattrs and the actual uid/gid stored on the host file system not corresponding to the guest uid/gid. --- tools/virtiofsd/passthrough_ll.c | 84 +++++++++++++++++++++----------- 1 file changed, 56 insertions(+), 28 deletions(-)