Message ID | 20191016142006.24975-1-christian@brauner.io |
---|---|
Series | UBUNTU: SAUCE: seccomp: backport SECCOMP_USER_NOTIF_FLAG_CONTINUE |
On Wed, Oct 16, 2019 at 04:20:04PM +0200, Christian Brauner wrote:
> [...]
> Please note that these patches are up for the v5.5 merge window and will
> not be carried as Ubuntu specific patches indefinitely!

Applied to unstable/master, thanks!

From: Christian Brauner <christian.brauner@ubuntu.com>

Hey everyone,

BugLink: https://bugs.launchpad.net/bugs/1847744

Recently we landed seccomp support for SECCOMP_RET_USER_NOTIF (cf. [4]),
which enables a process (watchee) to retrieve an fd for its seccomp
filter. This fd can then be handed to another (usually more privileged)
process (watcher). The watcher will then be able to receive seccomp
messages about the syscalls performed by the watchee.

This feature is heavily used by LXD, but currently with limited
usability, which is why we urgently need this series.
For example, it is currently used to intercept mknod() syscalls in
unprivileged containers. The mknod() syscall can be easily filtered
based on dev_t. This allows us to intercept only a very specific subset
of mknod() syscalls. Furthermore, mknod() is not possible in user
namespaces at all, so accidentally intercepting and denying syscalls
that are not in the whitelist is not a big deal. The watchee won't
notice a difference.

In contrast to mknod(), many of the other syscalls we intercept (e.g.
setxattr(), and soon mount()) cannot be easily filtered like mknod()
because they have pointer arguments. Additionally, some of them might
actually succeed in user namespaces (e.g. setxattr() for all "user.*"
xattrs). Since we currently cannot tell seccomp to continue from a user
notifier, we are stuck with performing all of the syscalls on behalf of
the container. This is a huge security liability since it is extremely
difficult to correctly assume all of the necessary privileges of the
calling task such that the syscall can be successfully emulated without
escaping other additional security restrictions (think missing CAP_MKNOD
for mknod(), or MS_NODEV on a filesystem etc.). This can
be solved by telling seccomp to resume the syscall.

Until we have backported this patch we are blocked on intercepting the
mount() syscall. It would be excellent if we could backport this patch.

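To illustrate what "telling seccomp to resume the syscall" could look like
from the watcher's side, here is a minimal sketch, not the LXD
implementation. The helper name decide_setxattr() and the policy it applies
are hypothetical, and it assumes the xattr name has already been copied out
of the watchee's memory. The #ifndef fallback covers headers that predate
the flag.

```c
#include <errno.h>
#include <string.h>
#include <linux/seccomp.h>

/* Fallback for uapi headers that predate the flag (same value as upstream). */
#ifndef SECCOMP_USER_NOTIF_FLAG_CONTINUE
#define SECCOMP_USER_NOTIF_FLAG_CONTINUE (1UL << 0)
#endif

/*
 * Hypothetical helper: fill in the response for an intercepted setxattr().
 * xattr_name is assumed to be a NUL-terminated copy of the name argument
 * that the watcher has already read from the watchee's memory.
 */
static void decide_setxattr(const struct seccomp_notif *req,
                            const char *xattr_name,
                            struct seccomp_notif_resp *resp)
{
    memset(resp, 0, sizeof(*resp));
    resp->id = req->id; /* must echo the id of the notification */

    if (strncmp(xattr_name, "user.", 5) == 0) {
        /*
         * Let the kernel run the syscall in the watchee's own context.
         * error and val must both stay 0 when this flag is set.
         */
        resp->flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE;
        return;
    }

    /* Everything else is refused without being executed. */
    resp->error = -EPERM;
}
```

A watcher would then pass the filled-in response to the
SECCOMP_IOCTL_NOTIF_SEND ioctl on the notifier fd. Note that the watchee
can rewrite memory behind pointer arguments after the watcher has inspected
them, so a check like the one above should be treated as a convenience and
not as a security boundary.
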
I've also backported the selftests since they are worth running!
Please note that these patches are up for the v5.5 merge window and will
not be carried as Ubuntu specific patches indefinitely!

Thanks!
Christian

Christian Brauner (2):
  UBUNTU: SAUCE: seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE
  UBUNTU: SAUCE: seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE

 include/uapi/linux/seccomp.h                  |  29 +++++
 kernel/seccomp.c                              |  28 ++++-
 tools/testing/selftests/seccomp/seccomp_bpf.c | 107 ++++++++++++++++++
 3 files changed, 158 insertions(+), 6 deletions(-)