[0/2,SRU,EOAN] UBUNTU: SAUCE: seccomp: backport SECCOMP_USER_NOTIF_FLAG_CONTINUE

Message ID 20191016140432.20421-1-christian.brauner@ubuntu.com
Series UBUNTU: SAUCE: seccomp: backport SECCOMP_USER_NOTIF_FLAG_CONTINUE

Message

Christian Brauner Oct. 16, 2019, 2:04 p.m. UTC
Hey everyone,

BugLink: https://bugs.launchpad.net/bugs/1847744

Recently we landed seccomp support for SECCOMP_RET_USER_NOTIF (cf. [4]),
which enables a process (the watchee) to retrieve an fd for its seccomp
filter. This fd can then be handed to another (usually more privileged)
process (the watcher). The watcher can then receive seccomp
notifications about the syscalls made by the watchee.
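
As a rough illustration of the flow described above (not taken from the
patches), the two sides could look like the sketch below. The helper names
watchee_install_listener() and watcher_recv() are made up for this example;
the seccomp() flag and the ioctl are the upstream UAPI ones and assume
<linux/seccomp.h> from a 5.0 or newer kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/*
 * Watchee side (hypothetical helper): install the filter and get back the
 * notification fd, which is then passed to the watcher (e.g. via SCM_RIGHTS).
 * Returns the fd, or -1 with errno set.
 */
int watchee_install_listener(const struct sock_fprog *prog)
{
    assert(prog != NULL);
    /* Needed for unprivileged callers to install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
        return -1;
    return (int)syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER,
                        SECCOMP_FILTER_FLAG_NEW_LISTENER, prog);
}

/*
 * Watcher side (hypothetical helper): block until the watchee hits a
 * SECCOMP_RET_USER_NOTIF rule. On success req->data describes the syscall.
 * Returns 0, or -1 with errno set.
 */
int watcher_recv(int notify_fd, struct seccomp_notif *req)
{
    assert(req != NULL);
    /* Hand the kernel a zeroed buffer to fill in. */
    memset(req, 0, sizeof(*req));
    return ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_RECV, req);
}
```

After a successful watcher_recv(), req.id identifies the pending syscall and
has to be echoed back in the watcher's reply.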

This feature is heavily used by LXD, but currently with limited
usability, which is why we urgently need this series.
For example, it is currently used to intercept mknod() syscalls in
unprivileged containers. The mknod() syscall can be easily filtered
based on dev_t. This allows us to intercept only a very specific subset
of mknod() syscalls. Furthermore, mknod() is not possible in user
namespaces at all, so accidentally intercepting and denying syscalls
that are not in the whitelist is not a big deal. The watchee won't
notice a difference.
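
To make the dev_t filtering concrete, here is a minimal sketch of such a
classic BPF filter. It is not LXD's actual filter: it assumes mknodat()
(where dev is the fourth argument), a little-endian machine, and picks
/dev/null (1:3, encoded as 0x103) as the only whitelisted device. A real
filter would also check seccomp_data.arch and the file type bits in mode.
The names mknodat_filter and mknodat_filter_prog() are invented for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Assumption for this sketch: only char device 1:3 is forwarded. */
#define WHITELISTED_DEV 0x103

struct sock_filter mknodat_filter[] = {
    /* [0] Load the syscall number. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* [1] Not mknodat(): skip ahead to the ALLOW at [5]. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mknodat, 0, 3),
    /* [2] Load the low 32 bits of the dev argument (little-endian layout). */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[3])),
    /* [3] Whitelisted device: fall through to [4], otherwise ALLOW at [5]. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, WHITELISTED_DEV, 0, 1),
    /* [4] Forward this call to the watcher. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
    /* [5] Everything else runs normally (and fails in a user namespace). */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Wrap the instruction array in the struct that seccomp() expects. */
void mknodat_filter_prog(struct sock_fprog *prog)
{
    assert(prog != NULL);
    prog->len = (unsigned short)(sizeof(mknodat_filter) /
                                 sizeof(mknodat_filter[0]));
    prog->filter = mknodat_filter;
}
```

Because the comparison is on a plain integer argument, the filter can decide
entirely inside the kernel which calls reach the watcher.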

In contrast to mknod(), a lot of the other syscalls we intercept (e.g.
setxattr(), and soon mount()) cannot be easily filtered like mknod()
because they have pointer arguments, which a seccomp filter cannot
dereference. Additionally, some of them might
actually succeed in user namespaces (e.g. setxattr() for all "user.*"
xattrs). Since we currently cannot tell seccomp to continue from a user
notifier, we are stuck with performing all of the syscalls on behalf of
the container. This is a huge security liability since it is extremely
difficult to correctly assume all of the necessary privileges of the
calling task such that the syscall can be successfully emulated without
escaping other additional security restrictions (think missing CAP_MKNOD
for mknod(), or MS_NODEV on a filesystem etc.). This can
be solved by telling seccomp to resume the syscall.
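
For illustration, a watcher using the new flag might build its replies
roughly like this. This is only a sketch: the helper names are invented for
this example, and the fallback define assumes the flag value 1 used
upstream. Upstream rejects a reply that sets the flag together with a
non-zero error or val, hence the memset().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

/* Older UAPI headers lack the flag; 1 is the value used upstream. */
#ifndef SECCOMP_USER_NOTIF_FLAG_CONTINUE
#define SECCOMP_USER_NOTIF_FLAG_CONTINUE (1UL << 0)
#endif

/* Reply that lets the kernel resume the intercepted syscall as usual. */
void notif_resp_continue(const struct seccomp_notif *req,
                         struct seccomp_notif_resp *resp)
{
    assert(req != NULL && resp != NULL);
    /* error and val must stay 0 when the continue flag is set. */
    memset(resp, 0, sizeof(*resp));
    resp->id = req->id;
    resp->flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE;
}

/* Reply that makes the syscall fail in the watchee with errno == err. */
void notif_resp_deny(const struct seccomp_notif *req,
                     struct seccomp_notif_resp *resp, int err)
{
    assert(req != NULL && resp != NULL);
    memset(resp, 0, sizeof(*resp));
    resp->id = req->id;
    resp->error = -err;
}

/* Send the reply on the notification fd; 0 on success, -1 with errno set. */
int notif_send(int notify_fd, struct seccomp_notif_resp *resp)
{
    return ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_SEND, resp);
}
```

With this, the watcher only has to emulate the calls it really wants to
handle (e.g. a whitelisted mount()) and can hand everything else back to
the kernel, which then applies the watchee's own privileges and
restrictions.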

Until we have backported this patch we are blocked on intercepting the
mount() syscall. It would be excellent if we could backport this patch.

I've also backported the selftests since they are worth running!
Please note that these patches are up for the v5.5 merge window and will
not be carried as Ubuntu-specific patches indefinitely!

Thanks!
Christian

Christian Brauner (2):
  UBUNTU: SAUCE: seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE
  UBUNTU: SAUCE: seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE

 include/uapi/linux/seccomp.h                  |  29 +++++
 kernel/seccomp.c                              |  28 ++++-
 tools/testing/selftests/seccomp/seccomp_bpf.c | 107 ++++++++++++++++++
 3 files changed, 158 insertions(+), 6 deletions(-)

Comments

Stefan Bader Oct. 18, 2019, 1:21 p.m. UTC | #1
On 16.10.19 16:04, Christian Brauner wrote:
> [...]
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Khalid Elmously Oct. 21, 2019, 3:35 a.m. UTC | #2
On 2019-10-16 16:04:30 , Christian Brauner wrote:
> [...]
Khalid Elmously Oct. 21, 2019, 3:37 a.m. UTC | #3
On 2019-10-16 16:04:30 , Christian Brauner wrote:
> [...]