diff mbox

sandbox: Report error on forbidden system call

Message ID c72ecf1bce6c98235a76444ca7df38fdc095fbb8.1360062165.git.mprivozn@redhat.com
State New
Headers show

Commit Message

Michal Prívozník Feb. 5, 2013, 11:02 a.m. UTC
Currently, it we call a not white listed system call, we get killed
immediately without reporting any error. It would be far more useful,
if we can at least shout something on stderr just before dying, so
users know it is because of sandbox, not just random quit.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 os-posix.c     | 8 ++++++++
 qemu-seccomp.c | 4 +++-
 2 files changed, 11 insertions(+), 1 deletion(-)

Comments

Corey Bryant Feb. 5, 2013, 2:28 p.m. UTC | #1
On 02/05/2013 06:02 AM, Michal Privoznik wrote:
> Currently, it we call a not white listed system call, we get killed
> immediately without reporting any error. It would be far more useful,
> if we can at least shout something on stderr just before dying, so
> users know it is because of sandbox, not just random quit.
>
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
>   os-posix.c     | 8 ++++++++
>   qemu-seccomp.c | 4 +++-
>   2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/os-posix.c b/os-posix.c
> index 5c64518..1d52306 100644
> --- a/os-posix.c
> +++ b/os-posix.c
> @@ -62,6 +62,12 @@ void os_setup_early_signal_handling(void)
>       sigaction(SIGPIPE, &act, NULL);
>   }
>
> +static void syssig_handler(int signal, siginfo_t *info, void *c)
> +{
> +    fprintf(stderr, "Bad system call\n");
> +    exit(1);
> +}
> +
>   static void termsig_handler(int signal, siginfo_t *info, void *c)
>   {
>       qemu_system_killed(info->si_signo, info->si_pid);
> @@ -77,6 +83,8 @@ void os_setup_signal_handling(void)
>       sigaction(SIGINT,  &act, NULL);
>       sigaction(SIGHUP,  &act, NULL);
>       sigaction(SIGTERM, &act, NULL);
> +    act.sa_sigaction = syssig_handler;
> +    sigaction(SIGSYS,  &act, NULL);
>   }
>
>   /* Find a likely location for support files using the location of the binary.
> diff --git a/qemu-seccomp.c b/qemu-seccomp.c
> index 031da1d..897d9b3 100644
> --- a/qemu-seccomp.c
> +++ b/qemu-seccomp.c
> @@ -2,9 +2,11 @@
>    * QEMU seccomp mode 2 support with libseccomp
>    *
>    * Copyright IBM, Corp. 2012
> + * Copyright (C) 2013 Red Hat, Inc.
>    *
>    * Authors:
>    *  Eduardo Otubo    <eotubo@br.ibm.com>
> + *  Michal Privoznik <mprivozn@redhat.com>
>    *
>    * This work is licensed under the terms of the GNU GPL, version 2.  See
>    * the COPYING file in the top-level directory.
> @@ -238,7 +240,7 @@ int seccomp_start(void)
>       unsigned int i = 0;
>       scmp_filter_ctx ctx;
>
> -    ctx = seccomp_init(SCMP_ACT_KILL);
> +    ctx = seccomp_init(SCMP_ACT_TRAP);
>       if (ctx == NULL) {
>           goto seccomp_return;
>       }
>

I think this is going to be better solved in the kernel.  I have a 
kernel patch sitting out there at: https://lkml.org/lkml/2013/1/7/313
Any public support of this patch could be useful to help get it in.

Something is definitely needed to learn the syscall that is killing 
QEMU.  But I don't think the signal handler approach is going to work. 
We tried that and ran into too many situations where signals were being 
blocked by libraries (spice is one example).  And we didn't want to get 
in the business of patching third party libraries to allow SIGSYS.
Michal Prívozník Feb. 6, 2013, 11:13 a.m. UTC | #2
On 05.02.2013 15:28, Corey Bryant wrote:
> 
> On 02/05/2013 06:02 AM, Michal Privoznik wrote:
>> Currently, it we call a not white listed system call, we get killed
>> immediately without reporting any error. It would be far more useful,
>> if we can at least shout something on stderr just before dying, so
>> users know it is because of sandbox, not just random quit.
>>
>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>> ---
>>   os-posix.c     | 8 ++++++++
>>   qemu-seccomp.c | 4 +++-
>>   2 files changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/os-posix.c b/os-posix.c
>> index 5c64518..1d52306 100644
>> --- a/os-posix.c
>> +++ b/os-posix.c
>> @@ -62,6 +62,12 @@ void os_setup_early_signal_handling(void)
>>       sigaction(SIGPIPE, &act, NULL);
>>   }
>>
>> +static void syssig_handler(int signal, siginfo_t *info, void *c)
>> +{
>> +    fprintf(stderr, "Bad system call\n");
>> +    exit(1);
>> +}
>> +
>>   static void termsig_handler(int signal, siginfo_t *info, void *c)
>>   {
>>       qemu_system_killed(info->si_signo, info->si_pid);
>> @@ -77,6 +83,8 @@ void os_setup_signal_handling(void)
>>       sigaction(SIGINT,  &act, NULL);
>>       sigaction(SIGHUP,  &act, NULL);
>>       sigaction(SIGTERM, &act, NULL);
>> +    act.sa_sigaction = syssig_handler;
>> +    sigaction(SIGSYS,  &act, NULL);
>>   }
>>
>>   /* Find a likely location for support files using the location of
>> the binary.
>> diff --git a/qemu-seccomp.c b/qemu-seccomp.c
>> index 031da1d..897d9b3 100644
>> --- a/qemu-seccomp.c
>> +++ b/qemu-seccomp.c
>> @@ -2,9 +2,11 @@
>>    * QEMU seccomp mode 2 support with libseccomp
>>    *
>>    * Copyright IBM, Corp. 2012
>> + * Copyright (C) 2013 Red Hat, Inc.
>>    *
>>    * Authors:
>>    *  Eduardo Otubo    <eotubo@br.ibm.com>
>> + *  Michal Privoznik <mprivozn@redhat.com>
>>    *
>>    * This work is licensed under the terms of the GNU GPL, version 2. 
>> See
>>    * the COPYING file in the top-level directory.
>> @@ -238,7 +240,7 @@ int seccomp_start(void)
>>       unsigned int i = 0;
>>       scmp_filter_ctx ctx;
>>
>> -    ctx = seccomp_init(SCMP_ACT_KILL);
>> +    ctx = seccomp_init(SCMP_ACT_TRAP);
>>       if (ctx == NULL) {
>>           goto seccomp_return;
>>       }
>>
> 
> I think this is going to be better solved in the kernel.  I have a
> kernel patch sitting out there at: https://lkml.org/lkml/2013/1/7/313
> Any public support of this patch could be useful to help get it in.
> 
> Something is definitely needed to learn the syscall that is killing
> QEMU.  But I don't think the signal handler approach is going to work.
> We tried that and ran into too many situations where signals were being
> blocked by libraries (spice is one example).  And we didn't want to get
> in the business of patching third party libraries to allow SIGSYS.
> 

My approach is slightly different here. While I agree that pr_info() is
useful to find which syscalls are not whitelisted yet, it doesn't solve
the different issue. That is - when qemu calls a not whitelisted
syscall, currently it gets killed without any error being printed
anywhere. So libvirt just sees the machine has gone away and it can be
really hard to figure out the root cause is actually sandboxing.

Re patch itself. Okay, there may be some situations where signal is not
delivered or missed in which case we don't print any error message. But
in these situations qemu should check / is checking return value of a
syscall so it will die itself anyway.

Michal
Daniel P. Berrangé Feb. 6, 2013, 11:25 a.m. UTC | #3
On Tue, Feb 05, 2013 at 09:28:51AM -0500, Corey Bryant wrote:
> 
> On 02/05/2013 06:02 AM, Michal Privoznik wrote:
> >Currently, it we call a not white listed system call, we get killed
> >immediately without reporting any error. It would be far more useful,
> >if we can at least shout something on stderr just before dying, so
> >users know it is because of sandbox, not just random quit.
> >
> >Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> >---
> >  os-posix.c     | 8 ++++++++
> >  qemu-seccomp.c | 4 +++-
> >  2 files changed, 11 insertions(+), 1 deletion(-)
> >
> >diff --git a/os-posix.c b/os-posix.c
> >index 5c64518..1d52306 100644
> >--- a/os-posix.c
> >+++ b/os-posix.c
> >@@ -62,6 +62,12 @@ void os_setup_early_signal_handling(void)
> >      sigaction(SIGPIPE, &act, NULL);
> >  }
> >
> >+static void syssig_handler(int signal, siginfo_t *info, void *c)
> >+{
> >+    fprintf(stderr, "Bad system call\n");
> >+    exit(1);
> >+}
> >+
> >  static void termsig_handler(int signal, siginfo_t *info, void *c)
> >  {
> >      qemu_system_killed(info->si_signo, info->si_pid);
> >@@ -77,6 +83,8 @@ void os_setup_signal_handling(void)
> >      sigaction(SIGINT,  &act, NULL);
> >      sigaction(SIGHUP,  &act, NULL);
> >      sigaction(SIGTERM, &act, NULL);
> >+    act.sa_sigaction = syssig_handler;
> >+    sigaction(SIGSYS,  &act, NULL);
> >  }
> >
> >  /* Find a likely location for support files using the location of the binary.
> >diff --git a/qemu-seccomp.c b/qemu-seccomp.c
> >index 031da1d..897d9b3 100644
> >--- a/qemu-seccomp.c
> >+++ b/qemu-seccomp.c
> >@@ -2,9 +2,11 @@
> >   * QEMU seccomp mode 2 support with libseccomp
> >   *
> >   * Copyright IBM, Corp. 2012
> >+ * Copyright (C) 2013 Red Hat, Inc.
> >   *
> >   * Authors:
> >   *  Eduardo Otubo    <eotubo@br.ibm.com>
> >+ *  Michal Privoznik <mprivozn@redhat.com>
> >   *
> >   * This work is licensed under the terms of the GNU GPL, version 2.  See
> >   * the COPYING file in the top-level directory.
> >@@ -238,7 +240,7 @@ int seccomp_start(void)
> >      unsigned int i = 0;
> >      scmp_filter_ctx ctx;
> >
> >-    ctx = seccomp_init(SCMP_ACT_KILL);
> >+    ctx = seccomp_init(SCMP_ACT_TRAP);
> >      if (ctx == NULL) {
> >          goto seccomp_return;
> >      }
> >
> 
> I think this is going to be better solved in the kernel.  I have a
> kernel patch sitting out there at:
> https://lkml.org/lkml/2013/1/7/313
> Any public support of this patch could be useful to help get it in.

While that "learning mode" is useful, I don't think that really
solves the problem that Michael is looking at. We're running a QEMU
guest in production mode and it gets killed with no indication of
why. It is definitely desirable that we get a log message in QEMU's
stderr for this scenario.

> Something is definitely needed to learn the syscall that is killing
> QEMU.  But I don't think the signal handler approach is going to
> work. We tried that and ran into too many situations where signals
> were being blocked by libraries (spice is one example).  And we
> didn't want to get in the business of patching third party libraries
> to allow SIGSYS.

We absolutely should be fixing libraries not to screw up signal handling
in applications. I see that the SPICE worker thread blocks every single
signal except SEGV/FPE/ILL, which is just completely bogus. There's no
acceptable reason for SPICE to block so many signals.

Regards,
Daniel
Corey Bryant Feb. 6, 2013, 2:08 p.m. UTC | #4
On 02/06/2013 06:13 AM, Michal Privoznik wrote:
> On 05.02.2013 15:28, Corey Bryant wrote:
>>
>> On 02/05/2013 06:02 AM, Michal Privoznik wrote:
>>> Currently, it we call a not white listed system call, we get killed
>>> immediately without reporting any error. It would be far more useful,
>>> if we can at least shout something on stderr just before dying, so
>>> users know it is because of sandbox, not just random quit.
>>>
>>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>>> ---
>>>    os-posix.c     | 8 ++++++++
>>>    qemu-seccomp.c | 4 +++-
>>>    2 files changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/os-posix.c b/os-posix.c
>>> index 5c64518..1d52306 100644
>>> --- a/os-posix.c
>>> +++ b/os-posix.c
>>> @@ -62,6 +62,12 @@ void os_setup_early_signal_handling(void)
>>>        sigaction(SIGPIPE, &act, NULL);
>>>    }
>>>
>>> +static void syssig_handler(int signal, siginfo_t *info, void *c)
>>> +{
>>> +    fprintf(stderr, "Bad system call\n");
>>> +    exit(1);
>>> +}
>>> +
>>>    static void termsig_handler(int signal, siginfo_t *info, void *c)
>>>    {
>>>        qemu_system_killed(info->si_signo, info->si_pid);
>>> @@ -77,6 +83,8 @@ void os_setup_signal_handling(void)
>>>        sigaction(SIGINT,  &act, NULL);
>>>        sigaction(SIGHUP,  &act, NULL);
>>>        sigaction(SIGTERM, &act, NULL);
>>> +    act.sa_sigaction = syssig_handler;
>>> +    sigaction(SIGSYS,  &act, NULL);
>>>    }
>>>
>>>    /* Find a likely location for support files using the location of
>>> the binary.
>>> diff --git a/qemu-seccomp.c b/qemu-seccomp.c
>>> index 031da1d..897d9b3 100644
>>> --- a/qemu-seccomp.c
>>> +++ b/qemu-seccomp.c
>>> @@ -2,9 +2,11 @@
>>>     * QEMU seccomp mode 2 support with libseccomp
>>>     *
>>>     * Copyright IBM, Corp. 2012
>>> + * Copyright (C) 2013 Red Hat, Inc.
>>>     *
>>>     * Authors:
>>>     *  Eduardo Otubo    <eotubo@br.ibm.com>
>>> + *  Michal Privoznik <mprivozn@redhat.com>
>>>     *
>>>     * This work is licensed under the terms of the GNU GPL, version 2.
>>> See
>>>     * the COPYING file in the top-level directory.
>>> @@ -238,7 +240,7 @@ int seccomp_start(void)
>>>        unsigned int i = 0;
>>>        scmp_filter_ctx ctx;
>>>
>>> -    ctx = seccomp_init(SCMP_ACT_KILL);
>>> +    ctx = seccomp_init(SCMP_ACT_TRAP);
>>>        if (ctx == NULL) {
>>>            goto seccomp_return;
>>>        }
>>>
>>
>> I think this is going to be better solved in the kernel.  I have a
>> kernel patch sitting out there at: https://lkml.org/lkml/2013/1/7/313
>> Any public support of this patch could be useful to help get it in.
>>
>> Something is definitely needed to learn the syscall that is killing
>> QEMU.  But I don't think the signal handler approach is going to work.
>> We tried that and ran into too many situations where signals were being
>> blocked by libraries (spice is one example).  And we didn't want to get
>> in the business of patching third party libraries to allow SIGSYS.
>>
>
> My approach is slightly different here. While I agree that pr_info() is
> useful to find which syscalls are not whitelisted yet, it doesn't solve
> the different issue. That is - when qemu calls a not whitelisted
> syscall, currently it gets killed without any error being printed
> anywhere. So libvirt just sees the machine has gone away and it can be
> really hard to figure out the root cause is actually sandboxing.

Have you looked at the audit_seccomp() call in the kernel's 
__secure_computing() function?  It may already be doing what you're 
looking for.

>
> Re patch itself. Okay, there may be some situations where signal is not
> delivered or missed in which case we don't print any error message. But
> in these situations qemu should check / is checking return value of a
> syscall so it will die itself anyway.

This just sounds like it is defeating the purpose of seccomp, and a lot 
of things could go wrong with signal handling.  The entire point of 
SCMP_ACT_KILL is that the process does not get control back after an 
non-whitelisted syscall is called.  Now we'd be giving back control to 
what could be a malicious guest.

>
> Michal
>
Corey Bryant Feb. 6, 2013, 2:51 p.m. UTC | #5
On 02/06/2013 09:08 AM, Corey Bryant wrote:
>
>
> On 02/06/2013 06:13 AM, Michal Privoznik wrote:
>> On 05.02.2013 15:28, Corey Bryant wrote:
>>>
>>> On 02/05/2013 06:02 AM, Michal Privoznik wrote:
>>>> Currently, it we call a not white listed system call, we get killed
>>>> immediately without reporting any error. It would be far more useful,
>>>> if we can at least shout something on stderr just before dying, so
>>>> users know it is because of sandbox, not just random quit.
>>>>
>>>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>>>> ---
>>>>    os-posix.c     | 8 ++++++++
>>>>    qemu-seccomp.c | 4 +++-
>>>>    2 files changed, 11 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/os-posix.c b/os-posix.c
>>>> index 5c64518..1d52306 100644
>>>> --- a/os-posix.c
>>>> +++ b/os-posix.c
>>>> @@ -62,6 +62,12 @@ void os_setup_early_signal_handling(void)
>>>>        sigaction(SIGPIPE, &act, NULL);
>>>>    }
>>>>
>>>> +static void syssig_handler(int signal, siginfo_t *info, void *c)
>>>> +{
>>>> +    fprintf(stderr, "Bad system call\n");
>>>> +    exit(1);
>>>> +}
>>>> +
>>>>    static void termsig_handler(int signal, siginfo_t *info, void *c)
>>>>    {
>>>>        qemu_system_killed(info->si_signo, info->si_pid);
>>>> @@ -77,6 +83,8 @@ void os_setup_signal_handling(void)
>>>>        sigaction(SIGINT,  &act, NULL);
>>>>        sigaction(SIGHUP,  &act, NULL);
>>>>        sigaction(SIGTERM, &act, NULL);
>>>> +    act.sa_sigaction = syssig_handler;
>>>> +    sigaction(SIGSYS,  &act, NULL);
>>>>    }
>>>>
>>>>    /* Find a likely location for support files using the location of
>>>> the binary.
>>>> diff --git a/qemu-seccomp.c b/qemu-seccomp.c
>>>> index 031da1d..897d9b3 100644
>>>> --- a/qemu-seccomp.c
>>>> +++ b/qemu-seccomp.c
>>>> @@ -2,9 +2,11 @@
>>>>     * QEMU seccomp mode 2 support with libseccomp
>>>>     *
>>>>     * Copyright IBM, Corp. 2012
>>>> + * Copyright (C) 2013 Red Hat, Inc.
>>>>     *
>>>>     * Authors:
>>>>     *  Eduardo Otubo    <eotubo@br.ibm.com>
>>>> + *  Michal Privoznik <mprivozn@redhat.com>
>>>>     *
>>>>     * This work is licensed under the terms of the GNU GPL, version 2.
>>>> See
>>>>     * the COPYING file in the top-level directory.
>>>> @@ -238,7 +240,7 @@ int seccomp_start(void)
>>>>        unsigned int i = 0;
>>>>        scmp_filter_ctx ctx;
>>>>
>>>> -    ctx = seccomp_init(SCMP_ACT_KILL);
>>>> +    ctx = seccomp_init(SCMP_ACT_TRAP);
>>>>        if (ctx == NULL) {
>>>>            goto seccomp_return;
>>>>        }
>>>>
>>>
>>> I think this is going to be better solved in the kernel.  I have a
>>> kernel patch sitting out there at: https://lkml.org/lkml/2013/1/7/313
>>> Any public support of this patch could be useful to help get it in.
>>>
>>> Something is definitely needed to learn the syscall that is killing
>>> QEMU.  But I don't think the signal handler approach is going to work.
>>> We tried that and ran into too many situations where signals were being
>>> blocked by libraries (spice is one example).  And we didn't want to get
>>> in the business of patching third party libraries to allow SIGSYS.
>>>
>>
>> My approach is slightly different here. While I agree that pr_info() is
>> useful to find which syscalls are not whitelisted yet, it doesn't solve
>> the different issue. That is - when qemu calls a not whitelisted
>> syscall, currently it gets killed without any error being printed
>> anywhere. So libvirt just sees the machine has gone away and it can be
>> really hard to figure out the root cause is actually sandboxing.
>
> Have you looked at the audit_seccomp() call in the kernel's
> __secure_computing() function?  It may already be doing what you're
> looking for.
>

Fyi, here's an example of an audit_seccomp() audit log:

type=SECCOMP msg=audit(1360162117.139:415): auid=1000 uid=1000 gid=1000 
ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 
pid=1808 comm="bpf-direct" sig=31 syscall=1 compat=0 ip=0x35dc8ede49 
code=0x0
Paolo Bonzini Feb. 6, 2013, 3:56 p.m. UTC | #6
Il 06/02/2013 12:25, Daniel P. Berrange ha scritto:
>> > Something is definitely needed to learn the syscall that is killing
>> > QEMU.  But I don't think the signal handler approach is going to
>> > work. We tried that and ran into too many situations where signals
>> > were being blocked by libraries (spice is one example).  And we
>> > didn't want to get in the business of patching third party libraries
>> > to allow SIGSYS.
> We absolutely should be fixing libraries not to screw up signal handling
> in applications. I see that the SPICE worker thread blocks every single
> signal except SEGV/FPE/ILL, which is just completely bogus. There's no
> acceptable reason for SPICE to block so many signals.

There definitely is.  It is blocking the signals only in the worker
thread, so that the library does not get in the way of applications that
do not expect other threads to exist.

By keeping the signals blocked, the application is free to handle them:
1) knowing that a signal will always be delivered to the thread running
the event loop, and will interrupt select/poll; 2) without having to
enforce _both_ thread-safety and async-signal-safety.

The problem is that SIGSYS is a per-thread signal, like SIGSEGV/FPE/ILL.
 So it is impossible to block it, what the kernel does if you block it
is treat it as SIG_DFL.  Unfortunately, SIGSYS is obscure enough that
nobody knows about it and/or realizes it is a companion to SIGSEGV/FPE/ILL.

So _yes_, we should indeed get in the business of patching third-party
libraries (and QEMU itself, see util/qemu-thread-posix.c) to allow
SIGSIGV/FPE/ILL/SYS, but that's where the scope of the problem ends.

Paolo
Corey Bryant Feb. 8, 2013, 2:44 p.m. UTC | #7
On 02/05/2013 06:02 AM, Michal Privoznik wrote:
> Currently, it we call a not white listed system call, we get killed
> immediately without reporting any error. It would be far more useful,
> if we can at least shout something on stderr just before dying, so
> users know it is because of sandbox, not just random quit.
>
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
>   os-posix.c     | 8 ++++++++
>   qemu-seccomp.c | 4 +++-
>   2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/os-posix.c b/os-posix.c
> index 5c64518..1d52306 100644
> --- a/os-posix.c
> +++ b/os-posix.c
> @@ -62,6 +62,12 @@ void os_setup_early_signal_handling(void)
>       sigaction(SIGPIPE, &act, NULL);
>   }
>
> +static void syssig_handler(int signal, siginfo_t *info, void *c)
> +{
> +    fprintf(stderr, "Bad system call\n");
> +    exit(1);
> +}
> +
>   static void termsig_handler(int signal, siginfo_t *info, void *c)
>   {
>       qemu_system_killed(info->si_signo, info->si_pid);
> @@ -77,6 +83,8 @@ void os_setup_signal_handling(void)
>       sigaction(SIGINT,  &act, NULL);
>       sigaction(SIGHUP,  &act, NULL);
>       sigaction(SIGTERM, &act, NULL);
> +    act.sa_sigaction = syssig_handler;
> +    sigaction(SIGSYS,  &act, NULL);
>   }
>
>   /* Find a likely location for support files using the location of the binary.
> diff --git a/qemu-seccomp.c b/qemu-seccomp.c
> index 031da1d..897d9b3 100644
> --- a/qemu-seccomp.c
> +++ b/qemu-seccomp.c
> @@ -2,9 +2,11 @@
>    * QEMU seccomp mode 2 support with libseccomp
>    *
>    * Copyright IBM, Corp. 2012
> + * Copyright (C) 2013 Red Hat, Inc.
>    *
>    * Authors:
>    *  Eduardo Otubo    <eotubo@br.ibm.com>
> + *  Michal Privoznik <mprivozn@redhat.com>
>    *
>    * This work is licensed under the terms of the GNU GPL, version 2.  See
>    * the COPYING file in the top-level directory.
> @@ -238,7 +240,7 @@ int seccomp_start(void)
>       unsigned int i = 0;
>       scmp_filter_ctx ctx;
>
> -    ctx = seccomp_init(SCMP_ACT_KILL);
> +    ctx = seccomp_init(SCMP_ACT_TRAP);
>       if (ctx == NULL) {
>           goto seccomp_return;
>       }
>

Another thought.. When seccomp kills a task the exit status of the task 
will be SIGSYS.  Here's the kernel documentation:

SECCOMP_RET_KILL:
         Results in the task exiting immediately without executing the
         system call.  The exit status of the task (status & 0x7f) will
         be SIGSYS, not SIGKILL.

Maybe the right solution is for libvirt to check qemu's exit status and 
issue a message based on it?
Daniel P. Berrangé Feb. 8, 2013, 2:51 p.m. UTC | #8
On Fri, Feb 08, 2013 at 09:44:10AM -0500, Corey Bryant wrote:
> 
> 
> On 02/05/2013 06:02 AM, Michal Privoznik wrote:
> >Currently, it we call a not white listed system call, we get killed
> >immediately without reporting any error. It would be far more useful,
> >if we can at least shout something on stderr just before dying, so
> >users know it is because of sandbox, not just random quit.
> >
> >Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> >---
> >  os-posix.c     | 8 ++++++++
> >  qemu-seccomp.c | 4 +++-
> >  2 files changed, 11 insertions(+), 1 deletion(-)
> >
> >diff --git a/os-posix.c b/os-posix.c
> >index 5c64518..1d52306 100644
> >--- a/os-posix.c
> >+++ b/os-posix.c
> >@@ -62,6 +62,12 @@ void os_setup_early_signal_handling(void)
> >      sigaction(SIGPIPE, &act, NULL);
> >  }
> >
> >+static void syssig_handler(int signal, siginfo_t *info, void *c)
> >+{
> >+    fprintf(stderr, "Bad system call\n");
> >+    exit(1);
> >+}
> >+
> >  static void termsig_handler(int signal, siginfo_t *info, void *c)
> >  {
> >      qemu_system_killed(info->si_signo, info->si_pid);
> >@@ -77,6 +83,8 @@ void os_setup_signal_handling(void)
> >      sigaction(SIGINT,  &act, NULL);
> >      sigaction(SIGHUP,  &act, NULL);
> >      sigaction(SIGTERM, &act, NULL);
> >+    act.sa_sigaction = syssig_handler;
> >+    sigaction(SIGSYS,  &act, NULL);
> >  }
> >
> >  /* Find a likely location for support files using the location of the binary.
> >diff --git a/qemu-seccomp.c b/qemu-seccomp.c
> >index 031da1d..897d9b3 100644
> >--- a/qemu-seccomp.c
> >+++ b/qemu-seccomp.c
> >@@ -2,9 +2,11 @@
> >   * QEMU seccomp mode 2 support with libseccomp
> >   *
> >   * Copyright IBM, Corp. 2012
> >+ * Copyright (C) 2013 Red Hat, Inc.
> >   *
> >   * Authors:
> >   *  Eduardo Otubo    <eotubo@br.ibm.com>
> >+ *  Michal Privoznik <mprivozn@redhat.com>
> >   *
> >   * This work is licensed under the terms of the GNU GPL, version 2.  See
> >   * the COPYING file in the top-level directory.
> >@@ -238,7 +240,7 @@ int seccomp_start(void)
> >      unsigned int i = 0;
> >      scmp_filter_ctx ctx;
> >
> >-    ctx = seccomp_init(SCMP_ACT_KILL);
> >+    ctx = seccomp_init(SCMP_ACT_TRAP);
> >      if (ctx == NULL) {
> >          goto seccomp_return;
> >      }
> >
> 
> Another thought.. When seccomp kills a task the exit status of the
> task will be SIGSYS.  Here's the kernel documentation:
> 
> SECCOMP_RET_KILL:
>         Results in the task exiting immediately without executing the
>         system call.  The exit status of the task (status & 0x7f) will
>         be SIGSYS, not SIGKILL.
> 
> Maybe the right solution is for libvirt to check qemu's exit status
> and issue a message based on it?

QEMU is daemonized, so libvirt doesn't get to see the exit status at
all.

Daniel
diff mbox

Patch

diff --git a/os-posix.c b/os-posix.c
index 5c64518..1d52306 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -62,6 +62,12 @@  void os_setup_early_signal_handling(void)
     sigaction(SIGPIPE, &act, NULL);
 }
 
+static void syssig_handler(int signal, siginfo_t *info, void *c)
+{
+    fprintf(stderr, "Bad system call\n");
+    exit(1);
+}
+
 static void termsig_handler(int signal, siginfo_t *info, void *c)
 {
     qemu_system_killed(info->si_signo, info->si_pid);
@@ -77,6 +83,8 @@  void os_setup_signal_handling(void)
     sigaction(SIGINT,  &act, NULL);
     sigaction(SIGHUP,  &act, NULL);
     sigaction(SIGTERM, &act, NULL);
+    act.sa_sigaction = syssig_handler;
+    sigaction(SIGSYS,  &act, NULL);
 }
 
 /* Find a likely location for support files using the location of the binary.
diff --git a/qemu-seccomp.c b/qemu-seccomp.c
index 031da1d..897d9b3 100644
--- a/qemu-seccomp.c
+++ b/qemu-seccomp.c
@@ -2,9 +2,11 @@ 
  * QEMU seccomp mode 2 support with libseccomp
  *
  * Copyright IBM, Corp. 2012
+ * Copyright (C) 2013 Red Hat, Inc.
  *
  * Authors:
  *  Eduardo Otubo    <eotubo@br.ibm.com>
+ *  Michal Privoznik <mprivozn@redhat.com>
  *
  * This work is licensed under the terms of the GNU GPL, version 2.  See
  * the COPYING file in the top-level directory.
@@ -238,7 +240,7 @@  int seccomp_start(void)
     unsigned int i = 0;
     scmp_filter_ctx ctx;
 
-    ctx = seccomp_init(SCMP_ACT_KILL);
+    ctx = seccomp_init(SCMP_ACT_TRAP);
     if (ctx == NULL) {
         goto seccomp_return;
     }