diff mbox series

virtiofsd: prevent opening of special files (CVE-2020-35517)

Message ID 20210121144429.58885-1-stefanha@redhat.com
State New
Headers show
Series virtiofsd: prevent opening of special files (CVE-2020-35517) | expand

Commit Message

Stefan Hajnoczi Jan. 21, 2021, 2:44 p.m. UTC
A well-behaved FUSE client does not attempt to open special files with
FUSE_OPEN because they are handled on the client side (e.g. device nodes
are handled by client-side device drivers).

The check to prevent virtiofsd from opening special files is missing in
a few cases, most notably FUSE_OPEN. A malicious client can cause
virtiofsd to open a device node, potentially allowing the guest to
escape. This can be exploited by a modified guest device driver. It is
not exploitable from guest userspace since the guest kernel will handle
special files inside the guest instead of sending FUSE requests.

This patch adds the missing checks to virtiofsd. This is a short-term
solution because it does not prevent a compromised virtiofsd process
from opening device nodes on the host.

Reported-by: Alex Xu <alex@alxu.ca>
Fixes: CVE-2020-35517
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
This issue was diagnosed on public IRC and is therefore already known
and not embargoed.

A stronger fix, and the long-term solution, is for users to mount the
shared directory and any sub-mounts with nodev, as well as nosuid and
noexec. Unfortunately virtiofsd cannot do this automatically because
bind mounts added by the user after virtiofsd has launched would not be
detected. I suggest the following:

1. Modify libvirt and Kata Containers to explicitly set these mount
   options.
2. Then modify virtiofsd to check that the shared directory has the
   necessary options at startup. Refuse to start if the options are
   missing so that the user is aware of the security requirements.

As a bonus this also increases the likelihood that other host processes
besides virtiofsd will be protected by nosuid/noexec/nodev so that a
malicious guest cannot drop these files in place and then arrange for a
host process to come across them.

Additionally, user namespaces have been discussed. They seem like a
worthwhile addition as an unprivileged or privilege-separated mode
although there are limitations with respect to security xattrs and the
actual uid/gid stored on the host file system not corresponding to the
guest uid/gid.
---
 tools/virtiofsd/passthrough_ll.c | 84 +++++++++++++++++++++-----------
 1 file changed, 56 insertions(+), 28 deletions(-)

Comments

Daniel P. Berrangé Jan. 21, 2021, 2:48 p.m. UTC | #1
On Thu, Jan 21, 2021 at 02:44:29PM +0000, Stefan Hajnoczi wrote:
> A well-behaved FUSE client does not attempt to open special files with
> FUSE_OPEN because they are handled on the client side (e.g. device nodes
> are handled by client-side device drivers).
> 
> The check to prevent virtiofsd from opening special files is missing in
> a few cases, most notably FUSE_OPEN. A malicious client can cause
> virtiofsd to open a device node, potentially allowing the guest to
> escape. This can be exploited by a modified guest device driver. It is
> not exploitable from guest userspace since the guest kernel will handle
> special files inside the guest instead of sending FUSE requests.
> 
> This patch adds the missing checks to virtiofsd. This is a short-term
> solution because it does not prevent a compromised virtiofsd process
> from opening device nodes on the host.
> 
> Reported-by: Alex Xu <alex@alxu.ca>
> Fixes: CVE-2020-35517
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> This issue was diagnosed on public IRC and is therefore already known
> and not embargoed.
> 
> A stronger fix, and the long-term solution, is for users to mount the
> shared directory and any sub-mounts with nodev, as well as nosuid and
> noexec. Unfortunately virtiofsd cannot do this automatically because
> bind mounts added by the user after virtiofsd has launched would not be
> detected. I suggest the following:
> 
> 1. Modify libvirt and Kata Containers to explicitly set these mount
>    options.
> 2. Then modify virtiofsd to check that the shared directory has the
>    necessary options at startup. Refuse to start if the options are
>    missing so that the user is aware of the security requirements.
> 
> As a bonus this also increases the likelihood that other host processes
> besides virtiofsd will be protected by nosuid/noexec/nodev so that a
> malicious guest cannot drop these files in place and then arrange for a
> host process to come across them.
> 
> Additionally, user namespaces have been discussed. They seem like a
> worthwhile addition as an unprivileged or privilege-separated mode
> although there are limitations with respect to security xattrs and the
> actual uid/gid stored on the host file system not corresponding to the
> guest uid/gid.
> ---
>  tools/virtiofsd/passthrough_ll.c | 84 +++++++++++++++++++++-----------
>  1 file changed, 56 insertions(+), 28 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 5fb36d9407..e08352d649 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -555,6 +555,29 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino)
>      return fd;
>  }
>  
> +/*
> + * Open a file descriptor for an inode. Returns -EBADF if the inode is not a
> + * regular file or a directory. Use this helper function instead of raw
> + * openat(2) to prevent security issues when a malicious client opens special
> + * files such as block device nodes.
> + */
> +static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
> +                         int open_flags)
> +{
> +    g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
> +    int fd;
> +
> +    if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) {
> +        return -EBADF;
> +    }
> +
> +    fd = openat(lo->proc_self_fd, fd_str, open_flags);

Whats the intended behaviour with symlinks ?  Do we need to
allow S_ISLNK, or are we assuming the symlink has already
been expanded to the target file by this point ? If the latter
adding a comment about this would be useful.

> +    if (fd < 0) {
> +        return -errno;
> +    }
> +    return fd;
> +}
> +

Regards,
Daniel
Laszlo Ersek Jan. 21, 2021, 3:32 p.m. UTC | #2
On 01/21/21 15:44, Stefan Hajnoczi wrote:
> A well-behaved FUSE client does not attempt to open special files with
> FUSE_OPEN because they are handled on the client side (e.g. device nodes
> are handled by client-side device drivers).
> 
> The check to prevent virtiofsd from opening special files is missing in
> a few cases, most notably FUSE_OPEN. A malicious client can cause
> virtiofsd to open a device node, potentially allowing the guest to
> escape. This can be exploited by a modified guest device driver. It is
> not exploitable from guest userspace since the guest kernel will handle
> special files inside the guest instead of sending FUSE requests.
> 
> This patch adds the missing checks to virtiofsd. This is a short-term
> solution because it does not prevent a compromised virtiofsd process
> from opening device nodes on the host.
> 
> Reported-by: Alex Xu <alex@alxu.ca>
> Fixes: CVE-2020-35517
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> This issue was diagnosed on public IRC and is therefore already known
> and not embargoed.
> 
> A stronger fix, and the long-term solution, is for users to mount the
> shared directory and any sub-mounts with nodev, as well as nosuid and
> noexec. Unfortunately virtiofsd cannot do this automatically because
> bind mounts added by the user after virtiofsd has launched would not be
> detected. I suggest the following:
> 
> 1. Modify libvirt and Kata Containers to explicitly set these mount
>    options.
> 2. Then modify virtiofsd to check that the shared directory has the
>    necessary options at startup. Refuse to start if the options are
>    missing so that the user is aware of the security requirements.

Assuming a benign / trusted guest, is there going to be an override for
this?

Asked differently -- if we don't want to set up a separate block device
on the host, to contain the filesystem that is mounted as the shared
directory, can unionfs (?) / overlayfs be used to re-mount an existent
host-side directory as the shared directory, but with
"noexec,nosuid,nodev" *bolted-on*?

If people have to create separate block devices (on the host side) for
innocent use cases such as running tests in a trusted guest, that's not
going to qualify as "usability progress" relative to having a qcow2 (or
raw) image file.

"nodev,nosuid" is kind of a no-brainer for any host-side *data* volume
anyway (such as the one underlying "/home", even), so I don't see those
options as a challenge. But "noexec" is different.

Thanks,
Laszlo

> 
> As a bonus this also increases the likelihood that other host processes
> besides virtiofsd will be protected by nosuid/noexec/nodev so that a
> malicious guest cannot drop these files in place and then arrange for a
> host process to come across them.
> 
> Additionally, user namespaces have been discussed. They seem like a
> worthwhile addition as an unprivileged or privilege-separated mode
> although there are limitations with respect to security xattrs and the
> actual uid/gid stored on the host file system not corresponding to the
> guest uid/gid.
> ---
>  tools/virtiofsd/passthrough_ll.c | 84 +++++++++++++++++++++-----------
>  1 file changed, 56 insertions(+), 28 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 5fb36d9407..e08352d649 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -555,6 +555,29 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino)
>      return fd;
>  }
>  
> +/*
> + * Open a file descriptor for an inode. Returns -EBADF if the inode is not a
> + * regular file or a directory. Use this helper function instead of raw
> + * openat(2) to prevent security issues when a malicious client opens special
> + * files such as block device nodes.
> + */
> +static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
> +                         int open_flags)
> +{
> +    g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
> +    int fd;
> +
> +    if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) {
> +        return -EBADF;
> +    }
> +
> +    fd = openat(lo->proc_self_fd, fd_str, open_flags);
> +    if (fd < 0) {
> +        return -errno;
> +    }
> +    return fd;
> +}
> +
>  static void lo_init(void *userdata, struct fuse_conn_info *conn)
>  {
>      struct lo_data *lo = (struct lo_data *)userdata;
> @@ -684,8 +707,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>          if (fi) {
>              truncfd = fd;
>          } else {
> -            sprintf(procname, "%i", ifd);
> -            truncfd = openat(lo->proc_self_fd, procname, O_RDWR);
> +            truncfd = lo_inode_open(lo, inode, O_RDWR);
>              if (truncfd < 0) {
>                  goto out_err;
>              }
> @@ -1725,7 +1747,6 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
>                                                        pid_t pid, int *err)
>  {
>      struct lo_inode_plock *plock;
> -    char procname[64];
>      int fd;
>  
>      plock =
> @@ -1742,12 +1763,10 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
>      }
>  
>      /* Open another instance of file which can be used for ofd locks. */
> -    sprintf(procname, "%i", inode->fd);
> -
>      /* TODO: What if file is not writable? */
> -    fd = openat(lo->proc_self_fd, procname, O_RDWR);
> -    if (fd == -1) {
> -        *err = errno;
> +    fd = lo_inode_open(lo, inode, O_RDWR);
> +    if (fd < 0) {
> +        *err = -fd;
>          free(plock);
>          return NULL;
>      }
> @@ -1894,18 +1913,24 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>  {
>      int fd;
>      ssize_t fh;
> -    char buf[64];
>      struct lo_data *lo = lo_data(req);
> +    struct lo_inode *inode = lo_inode(req, ino);
>  
>      fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n", ino,
>               fi->flags);
>  
> +    if (!inode) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
>      update_open_flags(lo->writeback, lo->allow_direct_io, fi);
>  
> -    sprintf(buf, "%i", lo_fd(req, ino));
> -    fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW);
> -    if (fd == -1) {
> -        return (void)fuse_reply_err(req, errno);
> +    fd = lo_inode_open(lo, inode, fi->flags & ~O_NOFOLLOW);
> +    if (fd < 0) {
> +        lo_inode_put(lo, &inode);
> +        fuse_reply_err(req, -fd);
> +        return;
>      }
>  
>      pthread_mutex_lock(&lo->mutex);
> @@ -1913,6 +1938,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>      pthread_mutex_unlock(&lo->mutex);
>      if (fh == -1) {
>          close(fd);
> +        lo_inode_put(lo, &inode);
>          fuse_reply_err(req, ENOMEM);
>          return;
>      }
> @@ -1923,6 +1949,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>      } else if (lo->cache == CACHE_ALWAYS) {
>          fi->keep_cache = 1;
>      }
> +    lo_inode_put(lo, &inode);
>      fuse_reply_open(req, fi);
>  }
>  
> @@ -1982,39 +2009,40 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>  static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
>                       struct fuse_file_info *fi)
>  {
> +    struct lo_inode *inode = lo_inode(req, ino);
> +    struct lo_data *lo = lo_data(req);
>      int res;
>      int fd;
> -    char *buf;
>  
>      fuse_log(FUSE_LOG_DEBUG, "lo_fsync(ino=%" PRIu64 ", fi=0x%p)\n", ino,
>               (void *)fi);
>  
> +    if (!inode) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
>      if (!fi) {
> -        struct lo_data *lo = lo_data(req);
> -
> -        res = asprintf(&buf, "%i", lo_fd(req, ino));
> -        if (res == -1) {
> -            return (void)fuse_reply_err(req, errno);
> -        }
> -
> -        fd = openat(lo->proc_self_fd, buf, O_RDWR);
> -        free(buf);
> -        if (fd == -1) {
> -            return (void)fuse_reply_err(req, errno);
> +        fd = lo_inode_open(lo, inode, O_RDWR);
> +        if (fd < 0) {
> +            res = -fd;
> +            goto out;
>          }
>      } else {
>          fd = lo_fi_fd(req, fi);
>      }
>  
>      if (datasync) {
> -        res = fdatasync(fd);
> +        res = fdatasync(fd) == -1 ? errno : 0;
>      } else {
> -        res = fsync(fd);
> +        res = fsync(fd) == -1 ? errno : 0;
>      }
>      if (!fi) {
>          close(fd);
>      }
> -    fuse_reply_err(req, res == -1 ? errno : 0);
> +out:
> +    lo_inode_put(lo, &inode);
> +    fuse_reply_err(req, res);
>  }
>  
>  static void lo_read(fuse_req_t req, fuse_ino_t ino, size_t size, off_t offset,
>
Laszlo Ersek Jan. 21, 2021, 5:07 p.m. UTC | #3
On 01/21/21 16:52, Alex Xu wrote:
> Excerpts from Laszlo Ersek's message of January 21, 2021 10:32 am:
>> Assuming a benign / trusted guest, is there going to be an override for
>> this?
>>
>> Asked differently -- if we don't want to set up a separate block device
>> on the host, to contain the filesystem that is mounted as the shared
>> directory, can unionfs (?) / overlayfs be used to re-mount an existent
>> host-side directory as the shared directory, but with
>> "noexec,nosuid,nodev" *bolted-on*?
>>
>> If people have to create separate block devices (on the host side) for
>> innocent use cases such as running tests in a trusted guest, that's not
>> going to qualify as "usability progress" relative to having a qcow2 (or
>> raw) image file.
>>
>> "nodev,nosuid" is kind of a no-brainer for any host-side *data* volume
>> anyway (such as the one underlying "/home", even), so I don't see those
>> options as a challenge. But "noexec" is different.
>>
>> Thanks,
>> Laszlo
> 
> On Linux, there are two types of mount options: per-superblock, and 
> per-point. Most filesystem-specific options are per-superblock, and 
> apply to all mounts of that device. noexec, nosuid, and nodev are 
> per-mount options, and apply individually to each mount, bind or 
> otherwise, of a given device. Bind mounts copy the parent per-mount 
> options, but can be individually altered. Note also that it is not 
> required to create a new location for a bind mount.
> 
> For example, invoking:
> 
> mount --bind -o noexec,nosuid,nodev /var/lib/vms/source1 /var/lib/vms/source1
> 
> would effectively secure a source directory. This can also be inserted 
> in /etc/fstab, such as:
> 
> /var/lib/vms/source1 /var/lib/vms/source1 none bind,noexec,nosuid,nodev 0 0
> 
> Note that, as explained in Stefan's initial email, this hides any 
> submounts below source. Each of those must be individually protected, 
> either by initially mounting with the security options or by creating a 
> new bind mount as above. Although these cases should be infrequent, they 
> are common enough that Stefan and I think that they should be supported, 
> or at the very least not silently behave in unexpected or insecure ways.
> 
> Additionally, while it's possible to use overlayfs for this purpose, 
> it's overkill, and as far as I understand, doesn't solve the problem of 
> hiding sub-mounts.

Awesome, this is the best answer I could possibly get. Thank you!
Laszlo
Dr. David Alan Gilbert Jan. 21, 2021, 7 p.m. UTC | #4
* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> A well-behaved FUSE client does not attempt to open special files with
> FUSE_OPEN because they are handled on the client side (e.g. device nodes
> are handled by client-side device drivers).
> 
> The check to prevent virtiofsd from opening special files is missing in
> a few cases, most notably FUSE_OPEN. A malicious client can cause
> virtiofsd to open a device node, potentially allowing the guest to
> escape. This can be exploited by a modified guest device driver. It is
> not exploitable from guest userspace since the guest kernel will handle
> special files inside the guest instead of sending FUSE requests.
> 
> This patch adds the missing checks to virtiofsd. This is a short-term
> solution because it does not prevent a compromised virtiofsd process
> from opening device nodes on the host.
> 
> Reported-by: Alex Xu <alex@alxu.ca>
> Fixes: CVE-2020-35517
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

(With a response to Dan's question about Symlinks)


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
> This issue was diagnosed on public IRC and is therefore already known
> and not embargoed.
> 
> A stronger fix, and the long-term solution, is for users to mount the
> shared directory and any sub-mounts with nodev, as well as nosuid and
> noexec. Unfortunately virtiofsd cannot do this automatically because
> bind mounts added by the user after virtiofsd has launched would not be
> detected. I suggest the following:
> 
> 1. Modify libvirt and Kata Containers to explicitly set these mount
>    options.
> 2. Then modify virtiofsd to check that the shared directory has the
>    necessary options at startup. Refuse to start if the options are
>    missing so that the user is aware of the security requirements.
> 
> As a bonus this also increases the likelihood that other host processes
> besides virtiofsd will be protected by nosuid/noexec/nodev so that a
> malicious guest cannot drop these files in place and then arrange for a
> host process to come across them.
> 
> Additionally, user namespaces have been discussed. They seem like a
> worthwhile addition as an unprivileged or privilege-separated mode
> although there are limitations with respect to security xattrs and the
> actual uid/gid stored on the host file system not corresponding to the
> guest uid/gid.
> ---
>  tools/virtiofsd/passthrough_ll.c | 84 +++++++++++++++++++++-----------
>  1 file changed, 56 insertions(+), 28 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 5fb36d9407..e08352d649 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -555,6 +555,29 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino)
>      return fd;
>  }
>  
> +/*
> + * Open a file descriptor for an inode. Returns -EBADF if the inode is not a
> + * regular file or a directory. Use this helper function instead of raw
> + * openat(2) to prevent security issues when a malicious client opens special
> + * files such as block device nodes.
> + */
> +static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
> +                         int open_flags)
> +{
> +    g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
> +    int fd;
> +
> +    if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) {
> +        return -EBADF;
> +    }
> +
> +    fd = openat(lo->proc_self_fd, fd_str, open_flags);
> +    if (fd < 0) {
> +        return -errno;
> +    }
> +    return fd;
> +}
> +
>  static void lo_init(void *userdata, struct fuse_conn_info *conn)
>  {
>      struct lo_data *lo = (struct lo_data *)userdata;
> @@ -684,8 +707,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>          if (fi) {
>              truncfd = fd;
>          } else {
> -            sprintf(procname, "%i", ifd);
> -            truncfd = openat(lo->proc_self_fd, procname, O_RDWR);
> +            truncfd = lo_inode_open(lo, inode, O_RDWR);
>              if (truncfd < 0) {
>                  goto out_err;
>              }
> @@ -1725,7 +1747,6 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
>                                                        pid_t pid, int *err)
>  {
>      struct lo_inode_plock *plock;
> -    char procname[64];
>      int fd;
>  
>      plock =
> @@ -1742,12 +1763,10 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
>      }
>  
>      /* Open another instance of file which can be used for ofd locks. */
> -    sprintf(procname, "%i", inode->fd);
> -
>      /* TODO: What if file is not writable? */
> -    fd = openat(lo->proc_self_fd, procname, O_RDWR);
> -    if (fd == -1) {
> -        *err = errno;
> +    fd = lo_inode_open(lo, inode, O_RDWR);
> +    if (fd < 0) {
> +        *err = -fd;
>          free(plock);
>          return NULL;
>      }
> @@ -1894,18 +1913,24 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>  {
>      int fd;
>      ssize_t fh;
> -    char buf[64];
>      struct lo_data *lo = lo_data(req);
> +    struct lo_inode *inode = lo_inode(req, ino);
>  
>      fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n", ino,
>               fi->flags);
>  
> +    if (!inode) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
>      update_open_flags(lo->writeback, lo->allow_direct_io, fi);
>  
> -    sprintf(buf, "%i", lo_fd(req, ino));
> -    fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW);
> -    if (fd == -1) {
> -        return (void)fuse_reply_err(req, errno);
> +    fd = lo_inode_open(lo, inode, fi->flags & ~O_NOFOLLOW);
> +    if (fd < 0) {
> +        lo_inode_put(lo, &inode);
> +        fuse_reply_err(req, -fd);
> +        return;
>      }
>  
>      pthread_mutex_lock(&lo->mutex);
> @@ -1913,6 +1938,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>      pthread_mutex_unlock(&lo->mutex);
>      if (fh == -1) {
>          close(fd);
> +        lo_inode_put(lo, &inode);
>          fuse_reply_err(req, ENOMEM);
>          return;
>      }
> @@ -1923,6 +1949,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>      } else if (lo->cache == CACHE_ALWAYS) {
>          fi->keep_cache = 1;
>      }
> +    lo_inode_put(lo, &inode);
>      fuse_reply_open(req, fi);
>  }
>  
> @@ -1982,39 +2009,40 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>  static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
>                       struct fuse_file_info *fi)
>  {
> +    struct lo_inode *inode = lo_inode(req, ino);
> +    struct lo_data *lo = lo_data(req);
>      int res;
>      int fd;
> -    char *buf;
>  
>      fuse_log(FUSE_LOG_DEBUG, "lo_fsync(ino=%" PRIu64 ", fi=0x%p)\n", ino,
>               (void *)fi);
>  
> +    if (!inode) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
>      if (!fi) {
> -        struct lo_data *lo = lo_data(req);
> -
> -        res = asprintf(&buf, "%i", lo_fd(req, ino));
> -        if (res == -1) {
> -            return (void)fuse_reply_err(req, errno);
> -        }
> -
> -        fd = openat(lo->proc_self_fd, buf, O_RDWR);
> -        free(buf);
> -        if (fd == -1) {
> -            return (void)fuse_reply_err(req, errno);
> +        fd = lo_inode_open(lo, inode, O_RDWR);
> +        if (fd < 0) {
> +            res = -fd;
> +            goto out;
>          }
>      } else {
>          fd = lo_fi_fd(req, fi);
>      }
>  
>      if (datasync) {
> -        res = fdatasync(fd);
> +        res = fdatasync(fd) == -1 ? errno : 0;
>      } else {
> -        res = fsync(fd);
> +        res = fsync(fd) == -1 ? errno : 0;
>      }
>      if (!fi) {
>          close(fd);
>      }
> -    fuse_reply_err(req, res == -1 ? errno : 0);
> +out:
> +    lo_inode_put(lo, &inode);
> +    fuse_reply_err(req, res);
>  }
>  
>  static void lo_read(fuse_req_t req, fuse_ino_t ino, size_t size, off_t offset,
> -- 
> 2.29.2
>
Vivek Goyal Jan. 22, 2021, 3:40 p.m. UTC | #5
On Thu, Jan 21, 2021 at 02:44:29PM +0000, Stefan Hajnoczi wrote:
> A well-behaved FUSE client does not attempt to open special files with
> FUSE_OPEN because they are handled on the client side (e.g. device nodes
> are handled by client-side device drivers).
> 
> The check to prevent virtiofsd from opening special files is missing in
> a few cases, most notably FUSE_OPEN. A malicious client can cause
> virtiofsd to open a device node, potentially allowing the guest to
> escape. This can be exploited by a modified guest device driver. It is
> not exploitable from guest userspace since the guest kernel will handle
> special files inside the guest instead of sending FUSE requests.
> 
> This patch adds the missing checks to virtiofsd. This is a short-term
> solution because it does not prevent a compromised virtiofsd process
> from opening device nodes on the host.
> 
> Reported-by: Alex Xu <alex@alxu.ca>
> Fixes: CVE-2020-35517
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

It looks good to me. I see there is another openat() instance in
lo_opendir(). May be we can convert it to use lo_inode_open() as well?

Initially I thought that remove posix locks will be broken on special
files like named pipes (fifo). But looks like in that case posix lock
request does not come to virtiofsd at all (despite the fact remote
posix locks are enabled). So it probably is fine to modify openat()
there.

Ran blogbench and pjdfstests and they run fine for me, So..

Reviewed-by: Vivek Goyal <vgoyal@redhat.com>

Vivek

> ---
> This issue was diagnosed on public IRC and is therefore already known
> and not embargoed.
> 
> A stronger fix, and the long-term solution, is for users to mount the
> shared directory and any sub-mounts with nodev, as well as nosuid and
> noexec. Unfortunately virtiofsd cannot do this automatically because
> bind mounts added by the user after virtiofsd has launched would not be
> detected. I suggest the following:
> 
> 1. Modify libvirt and Kata Containers to explicitly set these mount
>    options.
> 2. Then modify virtiofsd to check that the shared directory has the
>    necessary options at startup. Refuse to start if the options are
>    missing so that the user is aware of the security requirements.
> 
> As a bonus this also increases the likelihood that other host processes
> besides virtiofsd will be protected by nosuid/noexec/nodev so that a
> malicious guest cannot drop these files in place and then arrange for a
> host process to come across them.
> 
> Additionally, user namespaces have been discussed. They seem like a
> worthwhile addition as an unprivileged or privilege-separated mode
> although there are limitations with respect to security xattrs and the
> actual uid/gid stored on the host file system not corresponding to the
> guest uid/gid.
> ---
>  tools/virtiofsd/passthrough_ll.c | 84 +++++++++++++++++++++-----------
>  1 file changed, 56 insertions(+), 28 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 5fb36d9407..e08352d649 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -555,6 +555,29 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino)
>      return fd;
>  }
>  
> +/*
> + * Open a file descriptor for an inode. Returns -EBADF if the inode is not a
> + * regular file or a directory. Use this helper function instead of raw
> + * openat(2) to prevent security issues when a malicious client opens special
> + * files such as block device nodes.
> + */
> +static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
> +                         int open_flags)
> +{
> +    g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
> +    int fd;
> +
> +    if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) {
> +        return -EBADF;
> +    }
> +
> +    fd = openat(lo->proc_self_fd, fd_str, open_flags);
> +    if (fd < 0) {
> +        return -errno;
> +    }
> +    return fd;
> +}
> +
>  static void lo_init(void *userdata, struct fuse_conn_info *conn)
>  {
>      struct lo_data *lo = (struct lo_data *)userdata;
> @@ -684,8 +707,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>          if (fi) {
>              truncfd = fd;
>          } else {
> -            sprintf(procname, "%i", ifd);
> -            truncfd = openat(lo->proc_self_fd, procname, O_RDWR);
> +            truncfd = lo_inode_open(lo, inode, O_RDWR);
>              if (truncfd < 0) {
>                  goto out_err;
>              }
> @@ -1725,7 +1747,6 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
>                                                        pid_t pid, int *err)
>  {
>      struct lo_inode_plock *plock;
> -    char procname[64];
>      int fd;
>  
>      plock =
> @@ -1742,12 +1763,10 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
>      }
>  
>      /* Open another instance of file which can be used for ofd locks. */
> -    sprintf(procname, "%i", inode->fd);
> -
>      /* TODO: What if file is not writable? */
> -    fd = openat(lo->proc_self_fd, procname, O_RDWR);
> -    if (fd == -1) {
> -        *err = errno;
> +    fd = lo_inode_open(lo, inode, O_RDWR);
> +    if (fd < 0) {
> +        *err = -fd;
>          free(plock);
>          return NULL;
>      }
> @@ -1894,18 +1913,24 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>  {
>      int fd;
>      ssize_t fh;
> -    char buf[64];
>      struct lo_data *lo = lo_data(req);
> +    struct lo_inode *inode = lo_inode(req, ino);
>  
>      fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n", ino,
>               fi->flags);
>  
> +    if (!inode) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
>      update_open_flags(lo->writeback, lo->allow_direct_io, fi);
>  
> -    sprintf(buf, "%i", lo_fd(req, ino));
> -    fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW);
> -    if (fd == -1) {
> -        return (void)fuse_reply_err(req, errno);
> +    fd = lo_inode_open(lo, inode, fi->flags & ~O_NOFOLLOW);
> +    if (fd < 0) {
> +        lo_inode_put(lo, &inode);
> +        fuse_reply_err(req, -fd);
> +        return;
>      }
>  
>      pthread_mutex_lock(&lo->mutex);
> @@ -1913,6 +1938,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>      pthread_mutex_unlock(&lo->mutex);
>      if (fh == -1) {
>          close(fd);
> +        lo_inode_put(lo, &inode);
>          fuse_reply_err(req, ENOMEM);
>          return;
>      }
> @@ -1923,6 +1949,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>      } else if (lo->cache == CACHE_ALWAYS) {
>          fi->keep_cache = 1;
>      }
> +    lo_inode_put(lo, &inode);
>      fuse_reply_open(req, fi);
>  }
>  
> @@ -1982,39 +2009,40 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>  static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
>                       struct fuse_file_info *fi)
>  {
> +    struct lo_inode *inode = lo_inode(req, ino);
> +    struct lo_data *lo = lo_data(req);
>      int res;
>      int fd;
> -    char *buf;
>  
>      fuse_log(FUSE_LOG_DEBUG, "lo_fsync(ino=%" PRIu64 ", fi=0x%p)\n", ino,
>               (void *)fi);
>  
> +    if (!inode) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
>      if (!fi) {
> -        struct lo_data *lo = lo_data(req);
> -
> -        res = asprintf(&buf, "%i", lo_fd(req, ino));
> -        if (res == -1) {
> -            return (void)fuse_reply_err(req, errno);
> -        }
> -
> -        fd = openat(lo->proc_self_fd, buf, O_RDWR);
> -        free(buf);
> -        if (fd == -1) {
> -            return (void)fuse_reply_err(req, errno);
> +        fd = lo_inode_open(lo, inode, O_RDWR);
> +        if (fd < 0) {
> +            res = -fd;
> +            goto out;
>          }
>      } else {
>          fd = lo_fi_fd(req, fi);
>      }
>  
>      if (datasync) {
> -        res = fdatasync(fd);
> +        res = fdatasync(fd) == -1 ? errno : 0;
>      } else {
> -        res = fsync(fd);
> +        res = fsync(fd) == -1 ? errno : 0;
>      }
>      if (!fi) {
>          close(fd);
>      }
> -    fuse_reply_err(req, res == -1 ? errno : 0);
> +out:
> +    lo_inode_put(lo, &inode);
> +    fuse_reply_err(req, res);
>  }
>  
>  static void lo_read(fuse_req_t req, fuse_ino_t ino, size_t size, off_t offset,
> -- 
> 2.29.2
>
Vivek Goyal Jan. 22, 2021, 3:49 p.m. UTC | #6
On Thu, Jan 21, 2021 at 02:48:03PM +0000, Daniel P. Berrangé wrote:
> On Thu, Jan 21, 2021 at 02:44:29PM +0000, Stefan Hajnoczi wrote:
> > A well-behaved FUSE client does not attempt to open special files with
> > FUSE_OPEN because they are handled on the client side (e.g. device nodes
> > are handled by client-side device drivers).
> > 
> > The check to prevent virtiofsd from opening special files is missing in
> > a few cases, most notably FUSE_OPEN. A malicious client can cause
> > virtiofsd to open a device node, potentially allowing the guest to
> > escape. This can be exploited by a modified guest device driver. It is
> > not exploitable from guest userspace since the guest kernel will handle
> > special files inside the guest instead of sending FUSE requests.
> > 
> > This patch adds the missing checks to virtiofsd. This is a short-term
> > solution because it does not prevent a compromised virtiofsd process
> > from opening device nodes on the host.
> > 
> > Reported-by: Alex Xu <alex@alxu.ca>
> > Fixes: CVE-2020-35517
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > This issue was diagnosed on public IRC and is therefore already known
> > and not embargoed.
> > 
> > A stronger fix, and the long-term solution, is for users to mount the
> > shared directory and any sub-mounts with nodev, as well as nosuid and
> > noexec. Unfortunately virtiofsd cannot do this automatically because
> > bind mounts added by the user after virtiofsd has launched would not be
> > detected. I suggest the following:
> > 
> > 1. Modify libvirt and Kata Containers to explicitly set these mount
> >    options.
> > 2. Then modify virtiofsd to check that the shared directory has the
> >    necessary options at startup. Refuse to start if the options are
> >    missing so that the user is aware of the security requirements.
> > 
> > As a bonus this also increases the likelihood that other host processes
> > besides virtiofsd will be protected by nosuid/noexec/nodev so that a
> > malicious guest cannot drop these files in place and then arrange for a
> > host process to come across them.
> > 
> > Additionally, user namespaces have been discussed. They seem like a
> > worthwhile addition as an unprivileged or privilege-separated mode
> > although there are limitations with respect to security xattrs and the
> > actual uid/gid stored on the host file system not corresponding to the
> > guest uid/gid.
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 84 +++++++++++++++++++++-----------
> >  1 file changed, 56 insertions(+), 28 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 5fb36d9407..e08352d649 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -555,6 +555,29 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino)
> >      return fd;
> >  }
> >  
> > +/*
> > + * Open a file descriptor for an inode. Returns -EBADF if the inode is not a
> > + * regular file or a directory. Use this helper function instead of raw
> > + * openat(2) to prevent security issues when a malicious client opens special
> > + * files such as block device nodes.
> > + */
> > +static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
> > +                         int open_flags)
> > +{
> > +    g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
> > +    int fd;
> > +
> > +    if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) {
> > +        return -EBADF;
> > +    }
> > +
> > +    fd = openat(lo->proc_self_fd, fd_str, open_flags);
> 
> Whats the intended behaviour with symlinks ?  Do we need to
> allow S_ISLNK, or are we assuming the symlink has already
> been expanded to the target file by this point ? If the latter
> adding a comment about this would be useful.

Given the current places lo_inode_open() is being used, I think it is
fine and we probably don't have to worry about symlink in these paths
as it has been resolved already by now.

In truncation path, we will expect that we are working with target
file (and not symlink fd). Same should be true for remote posix locks,
lo_open() and lo_fsync().

Other paths where we might be working with symlinks like listxattr,
getxattr, setxattr, already have mechanism to deal with symlink.

Thanks
Vivek



> 
> > +    if (fd < 0) {
> > +        return -errno;
> > +    }
> > +    return fd;
> > +}
> > +
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Stefan Hajnoczi Jan. 25, 2021, 2:53 p.m. UTC | #7
On Thu, Jan 21, 2021 at 02:48:03PM +0000, Daniel P. Berrangé wrote:
> On Thu, Jan 21, 2021 at 02:44:29PM +0000, Stefan Hajnoczi wrote:
> > A well-behaved FUSE client does not attempt to open special files with
> > FUSE_OPEN because they are handled on the client side (e.g. device nodes
> > are handled by client-side device drivers).
> > 
> > The check to prevent virtiofsd from opening special files is missing in
> > a few cases, most notably FUSE_OPEN. A malicious client can cause
> > virtiofsd to open a device node, potentially allowing the guest to
> > escape. This can be exploited by a modified guest device driver. It is
> > not exploitable from guest userspace since the guest kernel will handle
> > special files inside the guest instead of sending FUSE requests.
> > 
> > This patch adds the missing checks to virtiofsd. This is a short-term
> > solution because it does not prevent a compromised virtiofsd process
> > from opening device nodes on the host.
> > 
> > Reported-by: Alex Xu <alex@alxu.ca>
> > Fixes: CVE-2020-35517
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > This issue was diagnosed on public IRC and is therefore already known
> > and not embargoed.
> > 
> > A stronger fix, and the long-term solution, is for users to mount the
> > shared directory and any sub-mounts with nodev, as well as nosuid and
> > noexec. Unfortunately virtiofsd cannot do this automatically because
> > bind mounts added by the user after virtiofsd has launched would not be
> > detected. I suggest the following:
> > 
> > 1. Modify libvirt and Kata Containers to explicitly set these mount
> >    options.
> > 2. Then modify virtiofsd to check that the shared directory has the
> >    necessary options at startup. Refuse to start if the options are
> >    missing so that the user is aware of the security requirements.
> > 
> > As a bonus this also increases the likelihood that other host processes
> > besides virtiofsd will be protected by nosuid/noexec/nodev so that a
> > malicious guest cannot drop these files in place and then arrange for a
> > host process to come across them.
> > 
> > Additionally, user namespaces have been discussed. They seem like a
> > worthwhile addition as an unprivileged or privilege-separated mode
> > although there are limitations with respect to security xattrs and the
> > actual uid/gid stored on the host file system not corresponding to the
> > guest uid/gid.
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 84 +++++++++++++++++++++-----------
> >  1 file changed, 56 insertions(+), 28 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 5fb36d9407..e08352d649 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -555,6 +555,29 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino)
> >      return fd;
> >  }
> >  
> > +/*
> > + * Open a file descriptor for an inode. Returns -EBADF if the inode is not a
> > + * regular file or a directory. Use this helper function instead of raw
> > + * openat(2) to prevent security issues when a malicious client opens special
> > + * files such as block device nodes.
> > + */
> > +static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
> > +                         int open_flags)
> > +{
> > +    g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
> > +    int fd;
> > +
> > +    if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) {
> > +        return -EBADF;
> > +    }
> > +
> > +    fd = openat(lo->proc_self_fd, fd_str, open_flags);
> 
> Whats the intended behaviour with symlinks ?  Do we need to
> allow S_ISLNK, or are we assuming the symlink has already
> been expanded to the target file by this point ? If the latter
> adding a comment about this would be useful.

I will add a comment. The FUSE client is expected to resolve symlinks
on the client side.

In other words, the client does not open the symlink inode in order to
access the target of the symlink. It must open the target directly.

Stefan
Miklos Szeredi Jan. 25, 2021, 4:12 p.m. UTC | #8
On Thu, Jan 21, 2021 at 3:44 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:

> This patch adds the missing checks to virtiofsd. This is a short-term
> solution because it does not prevent a compromised virtiofsd process
> from opening device nodes on the host.

I think the proper solution is adding support to the host in order to
restrict opens on filesystems that virtiofsd has access to.

My idea was to add a "force_nodev" mount option that cannot be
disabled and will make propagated mounts  also be marked
"force_nodev,nodev".

A possibly simpler solution is to extend seccomp to restrict the
process itself from being able to open special files.  Not sure if
that's within the scope of seccomp though.

Thanks,
Miklos
Stefan Hajnoczi Jan. 26, 2021, 10:10 a.m. UTC | #9
On Fri, Jan 22, 2021 at 10:40:54AM -0500, Vivek Goyal wrote:
> On Thu, Jan 21, 2021 at 02:44:29PM +0000, Stefan Hajnoczi wrote:
> > A well-behaved FUSE client does not attempt to open special files with
> > FUSE_OPEN because they are handled on the client side (e.g. device nodes
> > are handled by client-side device drivers).
> > 
> > The check to prevent virtiofsd from opening special files is missing in
> > a few cases, most notably FUSE_OPEN. A malicious client can cause
> > virtiofsd to open a device node, potentially allowing the guest to
> > escape. This can be exploited by a modified guest device driver. It is
> > not exploitable from guest userspace since the guest kernel will handle
> > special files inside the guest instead of sending FUSE requests.
> > 
> > This patch adds the missing checks to virtiofsd. This is a short-term
> > solution because it does not prevent a compromised virtiofsd process
> > from opening device nodes on the host.
> > 
> > Reported-by: Alex Xu <alex@alxu.ca>
> > Fixes: CVE-2020-35517
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> 
> It looks good to me. I see there is another openat() instance in
> lo_opendir(). May be we can convert it to use lo_inode_open() as well?

I thought so too, but this one is interesting:

  fd = openat(lo_fd(req, ino), ".", O_RDONLY);

Using "." here is basically equivalent to O_DIRECTORY!

Therefore this always fails on special files. That's why I ended up
leaving it unchanged.

  $ cat a.c
  #define _GNU_SOURCE
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      int fd;
      int pfd = open(argv[1], O_PATH);

      if (pfd < 0) {
          perror("open");
          return EXIT_FAILURE;
      }

      fd = openat(pfd, ".", O_RDONLY);
      if (fd < 0) {
          perror("openat");
          return EXIT_FAILURE;
      }

      close(fd);
      close(pfd);
      return EXIT_SUCCESS;
  }
  $ gcc -o a a.c
  $ ./a /tmp
  <--- no error
  $ ./a /tmp/a.c
  openat: Not a directory
  $ ./a /dev/stdin
  openat: Not a directory

Stefan
Stefan Hajnoczi Jan. 26, 2021, 10:18 a.m. UTC | #10
On Mon, Jan 25, 2021 at 05:12:23PM +0100, Miklos Szeredi wrote:
> On Thu, Jan 21, 2021 at 3:44 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> > This patch adds the missing checks to virtiofsd. This is a short-term
> > solution because it does not prevent a compromised virtiofsd process
> > from opening device nodes on the host.
> 
> I think the proper solution is adding support to the host in order to
> restrict opens on filesystems that virtiofsd has access to.
> 
> My idea was to add a "force_nodev" mount option that cannot be
> disabled and will make propagated mounts  also be marked
> "force_nodev,nodev".

Interesting idea! Mount options that are relevant:
 * noexec
 * nosuid
 * nodev
 * nosymfollow

Do you have time to work on the force_* mount options?

> A possibly simpler solution is to extend seccomp to restrict the
> process itself from being able to open special files.  Not sure if
> that's within the scope of seccomp though.

I don't think seccomp can provide that restriction since it's unrelated
to the syscall or its arguments.

Stefan
Miklos Szeredi Jan. 26, 2021, 10:27 a.m. UTC | #11
On Tue, Jan 26, 2021 at 11:18 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Mon, Jan 25, 2021 at 05:12:23PM +0100, Miklos Szeredi wrote:
> > On Thu, Jan 21, 2021 at 3:44 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > > This patch adds the missing checks to virtiofsd. This is a short-term
> > > solution because it does not prevent a compromised virtiofsd process
> > > from opening device nodes on the host.
> >
> > I think the proper solution is adding support to the host in order to
> > restrict opens on filesystems that virtiofsd has access to.
> >
> > My idea was to add a "force_nodev" mount option that cannot be
> > disabled and will make propagated mounts  also be marked
> > "force_nodev,nodev".
>
> Interesting idea! Mount options that are relevant:
>  * noexec
>  * nosuid
>  * nodev
>  * nosymfollow
>
> Do you have time to work on the force_* mount options?

Not at the moment, but first we need to probe Al to see if this idea sticks...

> > A possibly simpler solution is to extend seccomp to restrict the
> > process itself from being able to open special files.  Not sure if
> > that's within the scope of seccomp though.
>
> I don't think seccomp can provide that restriction since it's unrelated
> to the syscall or its arguments.

How about selinux, then?

Thanks,
Miklos
diff mbox series

Patch

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 5fb36d9407..e08352d649 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -555,6 +555,29 @@  static int lo_fd(fuse_req_t req, fuse_ino_t ino)
     return fd;
 }
 
+/*
+ * Open a file descriptor for an inode. Returns -EBADF if the inode is not a
+ * regular file or a directory. Use this helper function instead of raw
+ * openat(2) to prevent security issues when a malicious client opens special
+ * files such as block device nodes.
+ */
+static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
+                         int open_flags)
+{
+    g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
+    int fd;
+
+    if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) {
+        return -EBADF;
+    }
+
+    fd = openat(lo->proc_self_fd, fd_str, open_flags);
+    if (fd < 0) {
+        return -errno;
+    }
+    return fd;
+}
+
 static void lo_init(void *userdata, struct fuse_conn_info *conn)
 {
     struct lo_data *lo = (struct lo_data *)userdata;
@@ -684,8 +707,7 @@  static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         if (fi) {
             truncfd = fd;
         } else {
-            sprintf(procname, "%i", ifd);
-            truncfd = openat(lo->proc_self_fd, procname, O_RDWR);
+            truncfd = lo_inode_open(lo, inode, O_RDWR);
             if (truncfd < 0) {
                 goto out_err;
             }
@@ -1725,7 +1747,6 @@  static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
                                                       pid_t pid, int *err)
 {
     struct lo_inode_plock *plock;
-    char procname[64];
     int fd;
 
     plock =
@@ -1742,12 +1763,10 @@  static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
     }
 
     /* Open another instance of file which can be used for ofd locks. */
-    sprintf(procname, "%i", inode->fd);
-
     /* TODO: What if file is not writable? */
-    fd = openat(lo->proc_self_fd, procname, O_RDWR);
-    if (fd == -1) {
-        *err = errno;
+    fd = lo_inode_open(lo, inode, O_RDWR);
+    if (fd < 0) {
+        *err = -fd;
         free(plock);
         return NULL;
     }
@@ -1894,18 +1913,24 @@  static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 {
     int fd;
     ssize_t fh;
-    char buf[64];
     struct lo_data *lo = lo_data(req);
+    struct lo_inode *inode = lo_inode(req, ino);
 
     fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n", ino,
              fi->flags);
 
+    if (!inode) {
+        fuse_reply_err(req, EBADF);
+        return;
+    }
+
     update_open_flags(lo->writeback, lo->allow_direct_io, fi);
 
-    sprintf(buf, "%i", lo_fd(req, ino));
-    fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW);
-    if (fd == -1) {
-        return (void)fuse_reply_err(req, errno);
+    fd = lo_inode_open(lo, inode, fi->flags & ~O_NOFOLLOW);
+    if (fd < 0) {
+        lo_inode_put(lo, &inode);
+        fuse_reply_err(req, -fd);
+        return;
     }
 
     pthread_mutex_lock(&lo->mutex);
@@ -1913,6 +1938,7 @@  static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
     pthread_mutex_unlock(&lo->mutex);
     if (fh == -1) {
         close(fd);
+        lo_inode_put(lo, &inode);
         fuse_reply_err(req, ENOMEM);
         return;
     }
@@ -1923,6 +1949,7 @@  static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
     } else if (lo->cache == CACHE_ALWAYS) {
         fi->keep_cache = 1;
     }
+    lo_inode_put(lo, &inode);
     fuse_reply_open(req, fi);
 }
 
@@ -1982,39 +2009,40 @@  static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
                      struct fuse_file_info *fi)
 {
+    struct lo_inode *inode = lo_inode(req, ino);
+    struct lo_data *lo = lo_data(req);
     int res;
     int fd;
-    char *buf;
 
     fuse_log(FUSE_LOG_DEBUG, "lo_fsync(ino=%" PRIu64 ", fi=0x%p)\n", ino,
              (void *)fi);
 
+    if (!inode) {
+        fuse_reply_err(req, EBADF);
+        return;
+    }
+
     if (!fi) {
-        struct lo_data *lo = lo_data(req);
-
-        res = asprintf(&buf, "%i", lo_fd(req, ino));
-        if (res == -1) {
-            return (void)fuse_reply_err(req, errno);
-        }
-
-        fd = openat(lo->proc_self_fd, buf, O_RDWR);
-        free(buf);
-        if (fd == -1) {
-            return (void)fuse_reply_err(req, errno);
+        fd = lo_inode_open(lo, inode, O_RDWR);
+        if (fd < 0) {
+            res = -fd;
+            goto out;
         }
     } else {
         fd = lo_fi_fd(req, fi);
     }
 
     if (datasync) {
-        res = fdatasync(fd);
+        res = fdatasync(fd) == -1 ? errno : 0;
     } else {
-        res = fsync(fd);
+        res = fsync(fd) == -1 ? errno : 0;
     }
     if (!fi) {
         close(fd);
     }
-    fuse_reply_err(req, res == -1 ? errno : 0);
+out:
+    lo_inode_put(lo, &inode);
+    fuse_reply_err(req, res);
 }
 
 static void lo_read(fuse_req_t req, fuse_ino_t ino, size_t size, off_t offset,