Message ID | N1-wwKG8XWUST@Safe-mail.net |
---|---|
State | New |
Headers | show |
(Adding Paul Moore to CC since he's done a lot of work on syscall whitelisting in QEMU) On Wed, Sep 09, 2015 at 09:55:33PM -0400, namnamc@Safe-mail.net wrote: > This patch here adds argument filtering for three syscalls: > madvise(), shmget(), and shmctl(). > > The madvise() flags may need a few additions, but I couldn't find any common > cases where the extra flags were used. The only additions were ones I found by > grepping through the source code for all madvise-related flags: The current intention of the seccomp filter in QEMU, is that /all/ existing QEMU features continue to work unchanged. So even if a flag is used in a seemingly uncommon code path, we still need to allow that in a seccomp filter. Soo all these are needed: > $ grep -hro MADV_[A-Z]* qemu-2.3.0 | sort -u > MADV_DODUMP > MADV_DONTDUMP > MADV_DONTFORK > MADV_DONTNEED > MADV_HUGEPAGE > MADV_INVALID This one isn't a real constant as the syscall level > MADV_MERGEABLE > MADV_UNMERGEABLE > MADV_WILLNEED One thing to be wary off when checking for used syscalls in the source, is that a large amount of QEMU functionality is provided via 3rd party libraries, especially the crypto/TLS code, SPICE server, and various storage drivers for RBD, gluster, iSCSI and GTK/SDL for the UI. Probably not an issue for madvise, but this is in fact where the usage of shmget comes from - QEMU never directly uses it but (at least) GTK does call it. So in this case, we need to check at least the GTK source to ensure we covered all args. I wouldn't be surprised if SDL uses it too. This gets kind of tedious quite quickly.... > diff --git a/qemu-seccomp.c b/qemu-seccomp.c > index f9de0d3..0c11580 100644 > --- a/qemu-seccomp.c > +++ b/qemu-seccomp.c > @@ -264,6 +263,49 @@ int seccomp_start(void) > } > } > > + /* madvise */ > + static const int madvise_flags[] = { > + MADV_DONTFORK, > + MADV_DONTNEED, > + MADV_HUGEPAGE, > + MADV_MERGEABLE, So we need to add DODUMP, DONTDUMP, UNMERGABLE and WILLNEED here. That is still stricter than the previous allow-everything rule, so a net win. > + }; > + for (i = 0; i < ARRAY_SIZE(madvise_flags); i++) { > + rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(madvise), 1, > + SCMP_A2(SCMP_CMP_EQ, madvise_flags[i])); > + if (rc < 0) { > + goto seccomp_return; > + } > + } > + rc = seccomp_syscall_priority(ctx, SCMP_SYS(madvise), 245); > + if (rc < 0) { > + goto seccomp_return; > + } > + > + /* shmget */ > + rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmget), 2, > + SCMP_A0(SCMP_CMP_EQ, IPC_PRIVATE), > + SCMP_A2(SCMP_CMP_EQ, IP_CREAT|0777)); I'm not familiar with semantics of these seccomp rules, but is this saying that the second arg must be exactly equal to IP_CREAT|0777 ? If the app passes IP_CREAT|0600, would that be permitted instead ? The latter is what I see gtk2 source code passing for mode. > + if (rc < 0) { > + goto seccomp_return; > + } > + rc = seccomp_syscall_priority(ctx, SCMP_SYS(shmget), 240); > + if (rc < 0) { > + goto seccomp_return; > + } > + > + /* shmctl */ > + rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmctl), 2, > + SCMP_A1(SCMP_CMP_EQ, IPC_RMID), > + SCMP_A2(SCMP_CMP_EQ, 0)); > + if (rc < 0) { > + goto seccomp_return; > + } > + rc = seccomp_syscall_priority(ctx, SCMP_SYS(shmctl), 240); > + if (rc < 0) { > + goto seccomp_return; > + } > + Regards, Daniel
On Thursday, September 10, 2015 03:48:52 PM Daniel P. Berrange wrote: > On Wed, Sep 09, 2015 at 09:55:33PM -0400, namnamc@Safe-mail.net wrote: > > + > > + /* shmget */ > > + rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmget), 2, > > + SCMP_A0(SCMP_CMP_EQ, IPC_PRIVATE), > > + SCMP_A2(SCMP_CMP_EQ, IP_CREAT|0777)); > > I'm not familiar with semantics of these seccomp rules, but is this > saying that the second arg must be exactly equal to IP_CREAT|0777 ? Yes. > If the app passes IP_CREAT|0600, would that be permitted instead ? > The latter is what I see gtk2 source code passing for mode. It wouldn't match the rule as written above, if it doesn't match any other configured rules it would hit the default filter action.
diff --git a/qemu-seccomp.c b/qemu-seccomp.c index f9de0d3..0c11580 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -14,6 +14,8 @@ */ #include <stdio.h> #include <seccomp.h> +#include <linux/ipc.h> +#include <asm-generic/mman-common.h> #include "sysemu/seccomp.h" struct QemuSeccompSyscall { @@ -105,7 +107,6 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(rt_sigreturn), 245 }, { SCMP_SYS(sync), 245 }, { SCMP_SYS(pread64), 245 }, - { SCMP_SYS(madvise), 245 }, { SCMP_SYS(set_robust_list), 245 }, { SCMP_SYS(lseek), 245 }, { SCMP_SYS(pselect6), 245 }, @@ -224,11 +225,9 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(arch_prctl), 240 }, { SCMP_SYS(mkdir), 240 }, { SCMP_SYS(fchmod), 240 }, - { SCMP_SYS(shmget), 240 }, { SCMP_SYS(shmat), 240 }, { SCMP_SYS(shmdt), 240 }, { SCMP_SYS(timerfd_create), 240 }, - { SCMP_SYS(shmctl), 240 }, { SCMP_SYS(mlockall), 240 }, { SCMP_SYS(mlock), 240 }, { SCMP_SYS(munlock), 240 }, @@ -264,6 +263,49 @@ int seccomp_start(void) } } + /* madvise */ + static const int madvise_flags[] = { + MADV_DONTFORK, + MADV_DONTNEED, + MADV_HUGEPAGE, + MADV_MERGEABLE, + }; + for (i = 0; i < ARRAY_SIZE(madvise_flags); i++) { + rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(madvise), 1, + SCMP_A2(SCMP_CMP_EQ, madvise_flags[i])); + if (rc < 0) { + goto seccomp_return; + } + } + rc = seccomp_syscall_priority(ctx, SCMP_SYS(madvise), 245); + if (rc < 0) { + goto seccomp_return; + } + + /* shmget */ + rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmget), 2, + SCMP_A0(SCMP_CMP_EQ, IPC_PRIVATE), + SCMP_A2(SCMP_CMP_EQ, IP_CREAT|0777)); + if (rc < 0) { + goto seccomp_return; + } + rc = seccomp_syscall_priority(ctx, SCMP_SYS(shmget), 240); + if (rc < 0) { + goto seccomp_return; + } + + /* shmctl */ + rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmctl), 2, + SCMP_A1(SCMP_CMP_EQ, IPC_RMID), + SCMP_A2(SCMP_CMP_EQ, 0)); + if (rc < 0) { + goto seccomp_return; + } + rc = seccomp_syscall_priority(ctx, SCMP_SYS(shmctl), 240); + if (rc < 0) { + goto seccomp_return; + } + rc = seccomp_load(ctx); seccomp_return:
This patch here adds argument filtering for three syscalls: madvise(), shmget(), and shmctl(). The madvise() flags may need a few additions, but I couldn't find any common cases where the extra flags were used. The only additions were ones I found by grepping through the source code for all madvise-related flags: $ grep -hro MADV_[A-Z]* qemu-2.3.0 | sort -u MADV_DODUMP MADV_DONTDUMP MADV_DONTFORK MADV_DONTNEED MADV_HUGEPAGE MADV_INVALID MADV_MERGEABLE MADV_UNMERGEABLE MADV_WILLNEED The rest of the flags were found by stracing Qemu during some common uses cases. The tests were done on an x86_64 hardened Gentoo machine for the host, and the guests were a variety of Debian-based Linux distros. For the purpose of these syscalls, I don't think the guest OS matters much. Signed-off-by: Namsun Ch'o <namnamc@safe-mail.net> --- This is my first patch to Qemu. I have many more syscalls filtered, but was told on IRC to send in patches broken into smaller chunks with just a few new filters each time to make review easier. This is the first one. The next one will be for ioctl() which is possibly the most important considering its rich history of abuse.