seccomp: add mkdir() and fchmod() to the whitelist

Submitter Paul Moore
Date Jan. 3, 2014, 7:58 p.m.
Message ID <20140103195827.7268.69658.stgit@localhost>
Permalink /patch/306672/
State New

Comments

Paul Moore - Jan. 3, 2014, 7:58 p.m.
The PulseAudio library attempts to do a mkdir(2) and fchmod(2) on
"/run/user/<UID>/pulse" which is currently blocked by the syscall
filter; this patch adds the two missing syscalls to the whitelist.
You can reproduce this problem with the following command:

 # qemu -monitor stdio -device intel-hda -device hda-duplex

If watched under strace the following syscalls are shown:

 mkdir("/run/user/0/pulse", 0700)
 fchmod(11, 0700) [NOTE: 11 is the fd for /run/user/0/pulse]

Reported-by: xuhan@redhat.com
Signed-off-by: Paul Moore <pmoore@redhat.com>
---
 qemu-seccomp.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
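
The two calls in the strace output above amount to the sequence sketched below:
create the directory, then tighten its mode through a file descriptor. This is
only an illustration, not PulseAudio's actual code; the helper name
make_private_dir() is hypothetical, and opening the directory with
O_RDONLY | O_DIRECTORY is an assumption about how the fd is obtained.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` if it does not exist yet, then force
 * its permissions to 0700 through an open file descriptor.
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 */
int make_private_dir(const char *path)
{
    assert(path != NULL);

    /* mkdir(2): an already existing directory is not treated as an error */
    if (mkdir(path, 0700) < 0 && errno != EEXIST) {
        return -1;
    }

    /* obtain an fd for the directory (assumed step, see lead-in) */
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0) {
        return -1;
    }

    /* fchmod(2): set the mode on the fd, independent of the umask */
    int rc = fchmod(fd, 0700);
    close(fd);
    return rc;
}
```

With a default-deny filter active and neither syscall whitelisted, the first
mkdir() call would already trip the filter, which matches the failure
described above.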
Paolo Bonzini - Jan. 3, 2014, 8:24 p.m.
On 03/01/2014 20:58, Paul Moore wrote:
> The PulseAudio library attempts to do a mkdir(2) and fchmod(2) on
> "/run/user/<UID>/pulse" which is currently blocked by the syscall
> filter; this patch adds the two missing syscalls to the whitelist.
> You can reproduce this problem with the following command:
> 
>  # qemu -monitor stdio -device intel-hda -device hda-duplex
> 
> If watched under strace the following syscalls are shown:
> 
>  mkdir("/run/user/0/pulse", 0700)
>  fchmod(11, 0700) [NOTE: 11 is the fd for /run/user/0/pulse]

Can fchmod be exploited to violate the sandbox (e.g. to let data escape
from a VM that ought not to have any way to communicate with the outside
world)?

Paolo
Paul Moore - Jan. 3, 2014, 8:46 p.m.
On Friday, January 03, 2014 09:24:57 PM Paolo Bonzini wrote:
> On 03/01/2014 20:58, Paul Moore wrote:
> > The PulseAudio library attempts to do a mkdir(2) and fchmod(2) on
> > "/run/user/<UID>/pulse" which is currently blocked by the syscall
> > filter; this patch adds the two missing syscalls to the whitelist.
> > 
> > You can reproduce this problem with the following command:
> >  # qemu -monitor stdio -device intel-hda -device hda-duplex
> > 
> > If watched under strace the following syscalls are shown:
> >  mkdir("/run/user/0/pulse", 0700)
> >  fchmod(11, 0700) [NOTE: 11 is the fd for /run/user/0/pulse]
> 
> Can fchmod be exploited to violate the sandbox (e.g. to let data escape
> from a VM that ought not to have any way to communicate with the outside
> world)?

Technically, there is the potential for any syscall to be exploited in such a 
way that a malicious guest could gain greater access than desired and do 
something evil with that access.  After all, that was the motivation behind 
seccomp: disable unused syscalls to reduce the chance of an attacker 
exploiting a syscall bug.

The important thing to remember here is that the seccomp code in QEMU is not 
enabling syscalls, it is disabling them.  In other words, a QEMU instance with 
the seccomp functionality enabled, e.g. '-sandbox on', only reduces the number 
of syscalls available to the QEMU process; it never adds syscalls, vulnerable 
or otherwise, to what the process could already use.

Granted, yes, there are syscalls in the current whitelist that I wish we could 
disable, but we are still trying to arrive at a whitelist that is 
all-encompassing (or close to it) with respect to QEMU functionality.  Once we 
have that list in hand (each fix like the one I posted gets us closer) we can 
start looking at selectively shrinking the whitelist*.

* We've talked about this on-list previously and there are several approaches 
here.  Some include conditionally adding/removing syscalls based on the QEMU 
functionality requested (e.g. the command line), different sandbox "profiles" 
(e.g. standalone vs libvirt), and staged seccomp filters (e.g. a whitelist 
followed by progressively tighter blacklists).
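
To make the "conditional" approach from the footnote a little more concrete,
here is a rough sketch of a whitelist whose entries are tagged with the
functionality that needs them. Everything in it is hypothetical: the FEAT_*
bits, struct profile_entry and syscall_allowed() do not exist in QEMU, and the
table holds plain names rather than real libseccomp rules. It only models the
decision, it does not install a filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bits, invented for this sketch. */
enum {
    FEAT_CORE  = 1u << 0,   /* needed by every QEMU instance */
    FEAT_AUDIO = 1u << 1    /* needed only when an audio backend is used */
};

/* One whitelist entry: a syscall name plus the features that require it. */
struct profile_entry {
    const char *syscall;
    unsigned    features;
};

static const struct profile_entry profile[] = {
    { "read",   FEAT_CORE  },
    { "write",  FEAT_CORE  },
    { "mkdir",  FEAT_AUDIO },
    { "fchmod", FEAT_AUDIO }
};

/*
 * Return true if `name` is whitelisted for the set of enabled features.
 * Anything not listed in the table is denied (default deny).
 */
bool syscall_allowed(const char *name, unsigned enabled)
{
    assert(name != NULL);

    for (size_t i = 0; i < sizeof(profile) / sizeof(profile[0]); i++) {
        if (strcmp(profile[i].syscall, name) == 0) {
            return (profile[i].features & enabled) != 0;
        }
    }
    return false;
}
```

Under this model an instance started without audio would be checked with
FEAT_CORE only, so mkdir and fchmod would stay blocked; adding FEAT_AUDIO
opens exactly those two entries.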

Patch

diff --git a/qemu-seccomp.c b/qemu-seccomp.c
index cf07869..bb19306 100644
--- a/qemu-seccomp.c
+++ b/qemu-seccomp.c
@@ -220,7 +220,9 @@  static const struct QemuSeccompSyscall seccomp_whitelist[] = {
     { SCMP_SYS(io_cancel), 241 },
     { SCMP_SYS(io_setup), 241 },
     { SCMP_SYS(io_destroy), 241 },
-    { SCMP_SYS(arch_prctl), 240 }
+    { SCMP_SYS(arch_prctl), 240 },
+    { SCMP_SYS(mkdir), 240 },
+    { SCMP_SYS(fchmod), 240 }
 };
 
 int seccomp_start(void)