coroutine-sigaltstack: Add SIGUSR2 mutex

Message ID 20210125120305.19520-1-mreitz@redhat.com
State New

Commit Message

Max Reitz Jan. 25, 2021, 12:03 p.m. UTC
Disposition (action) for any given signal is global for the process.
When two threads run coroutine-sigaltstack's qemu_coroutine_new()
concurrently, they may interfere with each other: One of them may revert
the SIGUSR2 handler to SIG_DFL, between the other thread (a) setting up
coroutine_trampoline() as the handler and (b) raising SIGUSR2.  That
SIGUSR2 will then terminate the QEMU process abnormally.

We have to ensure that only one thread at a time can modify the
process-global SIGUSR2 handler.  To do so, wrap the whole section where
that is done in a mutex.

Alternatively, we could for example have the SIGUSR2 handler always be
coroutine_trampoline(), so there would be no need to invoke sigaction()
in qemu_coroutine_new().  Laszlo has posted a patch to do so here:

  https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg05962.html

However, given that coroutine-sigaltstack is more of a fallback
implementation for platforms that do not support ucontext, that change
may be a bit too invasive to be comfortable with it.  The mutex proposed
here may negatively impact performance, but the change is much simpler.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 util/coroutine-sigaltstack.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Laszlo Ersek Jan. 25, 2021, 9:55 p.m. UTC | #1
On 01/25/21 13:03, Max Reitz wrote:
> Disposition (action) for any given signal is global for the process.
> When two threads run coroutine-sigaltstack's qemu_coroutine_new()
> concurrently, they may interfere with each other: One of them may revert
> the SIGUSR2 handler to SIG_DFL, between the other thread (a) setting up
> coroutine_trampoline() as the handler and (b) raising SIGUSR2.  That
> SIGUSR2 will then terminate the QEMU process abnormally.
> 
> We have to ensure that only one thread at a time can modify the
> process-global SIGUSR2 handler.  To do so, wrap the whole section where
> that is done in a mutex.
> 
> Alternatively, we could for example have the SIGUSR2 handler always be
> coroutine_trampoline(), so there would be no need to invoke sigaction()
> in qemu_coroutine_new().  Laszlo has posted a patch to do so here:
> 
>   https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg05962.html
> 
> However, given that coroutine-sigaltstack is more of a fallback
> implementation for platforms that do not support ucontext, that change
> may be a bit too invasive to be comfortable with it.  The mutex proposed
> here may negatively impact performance, but the change is much simpler.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  util/coroutine-sigaltstack.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
> index aade82afb8..e99b8a4f9c 100644
> --- a/util/coroutine-sigaltstack.c
> +++ b/util/coroutine-sigaltstack.c
> @@ -157,6 +157,7 @@ Coroutine *qemu_coroutine_new(void)
>      sigset_t sigs;
>      sigset_t osigs;
>      sigjmp_buf old_env;
> +    static pthread_mutex_t sigusr2_mutex = PTHREAD_MUTEX_INITIALIZER;
>  
>      /* The way to manipulate stack is with the sigaltstack function. We
>       * prepare a stack, with it delivering a signal to ourselves and then
> @@ -186,6 +187,12 @@ Coroutine *qemu_coroutine_new(void)
>      sa.sa_handler = coroutine_trampoline;
>      sigfillset(&sa.sa_mask);
>      sa.sa_flags = SA_ONSTACK;
> +
> +    /*
> +     * sigaction() is a process-global operation.  We must not run
> +     * this code in multiple threads at once.
> +     */
> +    pthread_mutex_lock(&sigusr2_mutex);
>      if (sigaction(SIGUSR2, &sa, &osa) != 0) {
>          abort();
>      }
> @@ -234,6 +241,8 @@ Coroutine *qemu_coroutine_new(void)
>       * Restore the old SIGUSR2 signal handler and mask
>       */
>      sigaction(SIGUSR2, &osa, NULL);
> +    pthread_mutex_unlock(&sigusr2_mutex);
> +
>      pthread_sigmask(SIG_SETMASK, &osigs, NULL);
>  
>      /*
> 

Reviewed-by: Laszlo Ersek <lersek@redhat.com>

Thanks!
Laszlo
Vladimir Sementsov-Ogievskiy Jan. 26, 2021, 12:44 p.m. UTC | #2
25.01.2021 15:03, Max Reitz wrote:
> Disposition (action) for any given signal is global for the process.
> When two threads run coroutine-sigaltstack's qemu_coroutine_new()
> concurrently, they may interfere with each other: One of them may revert
> the SIGUSR2 handler to SIG_DFL, between the other thread (a) setting up
> coroutine_trampoline() as the handler and (b) raising SIGUSR2.  That
> SIGUSR2 will then terminate the QEMU process abnormally.
> 
> We have to ensure that only one thread at a time can modify the
> process-global SIGUSR2 handler.  To do so, wrap the whole section where
> that is done in a mutex.
> 
> Alternatively, we could for example have the SIGUSR2 handler always be
> coroutine_trampoline(), so there would be no need to invoke sigaction()
> in qemu_coroutine_new().  Laszlo has posted a patch to do so here:
> 
>    https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg05962.html
> 
> However, given that coroutine-sigaltstack is more of a fallback
> implementation for platforms that do not support ucontext, that change
> may be a bit too invasive to be comfortable with it.  The mutex proposed
> here may negatively impact performance, but the change is much simpler.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   util/coroutine-sigaltstack.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
> index aade82afb8..e99b8a4f9c 100644
> --- a/util/coroutine-sigaltstack.c
> +++ b/util/coroutine-sigaltstack.c
> @@ -157,6 +157,7 @@ Coroutine *qemu_coroutine_new(void)
>       sigset_t sigs;
>       sigset_t osigs;
>       sigjmp_buf old_env;
> +    static pthread_mutex_t sigusr2_mutex = PTHREAD_MUTEX_INITIALIZER;
>   
>       /* The way to manipulate stack is with the sigaltstack function. We
>        * prepare a stack, with it delivering a signal to ourselves and then
> @@ -186,6 +187,12 @@ Coroutine *qemu_coroutine_new(void)
>       sa.sa_handler = coroutine_trampoline;
>       sigfillset(&sa.sa_mask);
>       sa.sa_flags = SA_ONSTACK;
> +
> +    /*
> +     * sigaction() is a process-global operation.  We must not run
> +     * this code in multiple threads at once.
> +     */
> +    pthread_mutex_lock(&sigusr2_mutex);
>       if (sigaction(SIGUSR2, &sa, &osa) != 0) {
>           abort();
>       }
> @@ -234,6 +241,8 @@ Coroutine *qemu_coroutine_new(void)
>        * Restore the old SIGUSR2 signal handler and mask
>        */
>       sigaction(SIGUSR2, &osa, NULL);
> +    pthread_mutex_unlock(&sigusr2_mutex);
> +
>       pthread_sigmask(SIG_SETMASK, &osigs, NULL);
>   
>       /*
> 

weak:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Side thought: so the sigaltstack coroutine implementation is not
thread-safe. Is that the only bug? Or should the whole implementation
be revisited to check whether it can be used with iothreads at all?
Shouldn't we just state that the sigaltstack coroutine implementation
doesn't support iothreads, and error out on iothread creation if
sigaltstack coroutines are in use?
Max Reitz Jan. 26, 2021, 1:16 p.m. UTC | #3
On 26.01.21 13:44, Vladimir Sementsov-Ogievskiy wrote:
> 25.01.2021 15:03, Max Reitz wrote:
>> Disposition (action) for any given signal is global for the process.
>> When two threads run coroutine-sigaltstack's qemu_coroutine_new()
>> concurrently, they may interfere with each other: One of them may revert
>> the SIGUSR2 handler to SIG_DFL, between the other thread (a) setting up
>> coroutine_trampoline() as the handler and (b) raising SIGUSR2.  That
>> SIGUSR2 will then terminate the QEMU process abnormally.
>>
>> We have to ensure that only one thread at a time can modify the
>> process-global SIGUSR2 handler.  To do so, wrap the whole section where
>> that is done in a mutex.
>>
>> Alternatively, we could for example have the SIGUSR2 handler always be
>> coroutine_trampoline(), so there would be no need to invoke sigaction()
>> in qemu_coroutine_new().  Laszlo has posted a patch to do so here:
>>
>>    https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg05962.html
>>
>> However, given that coroutine-sigaltstack is more of a fallback
>> implementation for platforms that do not support ucontext, that change
>> may be a bit too invasive to be comfortable with it.  The mutex proposed
>> here may negatively impact performance, but the change is much simpler.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   util/coroutine-sigaltstack.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
>> index aade82afb8..e99b8a4f9c 100644
>> --- a/util/coroutine-sigaltstack.c
>> +++ b/util/coroutine-sigaltstack.c
>> @@ -157,6 +157,7 @@ Coroutine *qemu_coroutine_new(void)
>>       sigset_t sigs;
>>       sigset_t osigs;
>>       sigjmp_buf old_env;
>> +    static pthread_mutex_t sigusr2_mutex = PTHREAD_MUTEX_INITIALIZER;
>>       /* The way to manipulate stack is with the sigaltstack function. We
>>        * prepare a stack, with it delivering a signal to ourselves and then
>> @@ -186,6 +187,12 @@ Coroutine *qemu_coroutine_new(void)
>>       sa.sa_handler = coroutine_trampoline;
>>       sigfillset(&sa.sa_mask);
>>       sa.sa_flags = SA_ONSTACK;
>> +
>> +    /*
>> +     * sigaction() is a process-global operation.  We must not run
>> +     * this code in multiple threads at once.
>> +     */
>> +    pthread_mutex_lock(&sigusr2_mutex);
>>       if (sigaction(SIGUSR2, &sa, &osa) != 0) {
>>           abort();
>>       }
>> @@ -234,6 +241,8 @@ Coroutine *qemu_coroutine_new(void)
>>        * Restore the old SIGUSR2 signal handler and mask
>>        */
>>       sigaction(SIGUSR2, &osa, NULL);
>> +    pthread_mutex_unlock(&sigusr2_mutex);
>> +
>>       pthread_sigmask(SIG_SETMASK, &osigs, NULL);
>>       /*
>>
> 
> weak:
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
> Side thought: so, sigaltstack coroutine implementation is not 
> thread-safe. Is that the only bug?

It would be great if I could tell you for sure whether there’s no bug in 
some piece of code. :)

> Or actually, the whole implementation 
> should be revisited to check, could it be used with iothreads or not?

Judging from the discussion I had with Laszlo, I’m definitely not the 
right person to do so, because for example I don’t know the ins and outs 
of signal handling.

I can only tell you it’s the only issue I’ve seen, and that there’s just 
not much more code in coroutine-sigaltstack.c than the code around 
qemu_coroutine_new().

> Shouldn't we just state that sigaltstack coroutine implementation 
> doesn't support iothreads? And do error out on iothread creation if 
> sigaltstack coroutines is in use?

I’m not sure whether that would be better than potentially having a bug
in it.  What you’re proposing would effectively break all iothread
usage on macOS.  If I were a macOS user, I’d rather risk encountering
bugs than that.

(And it isn’t like we know it’s unstable with iothreads; I haven’t seen 
it breaking with this patch applied yet, and I don’t think there’s 
reason to believe it would be.  qemu_coroutine_new() together with 
coroutine_trampoline() sets up a coroutine environment, and the rest of 
the code just consists of sigsetjmp() and siglongjmp().  I believe 
Laszlo had some open questions about signal masking done by those
functions, but I don’t think that has anything to do with multithreading.)

Max
Stefan Hajnoczi Jan. 26, 2021, 4:11 p.m. UTC | #4
On Mon, Jan 25, 2021 at 01:03:05PM +0100, Max Reitz wrote:
> Disposition (action) for any given signal is global for the process.
> When two threads run coroutine-sigaltstack's qemu_coroutine_new()
> concurrently, they may interfere with each other: One of them may revert
> the SIGUSR2 handler to SIG_DFL, between the other thread (a) setting up
> coroutine_trampoline() as the handler and (b) raising SIGUSR2.  That
> SIGUSR2 will then terminate the QEMU process abnormally.
> 
> We have to ensure that only one thread at a time can modify the
> process-global SIGUSR2 handler.  To do so, wrap the whole section where
> that is done in a mutex.
> 
> Alternatively, we could for example have the SIGUSR2 handler always be
> coroutine_trampoline(), so there would be no need to invoke sigaction()
> in qemu_coroutine_new().  Laszlo has posted a patch to do so here:
> 
>   https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg05962.html
> 
> However, given that coroutine-sigaltstack is more of a fallback
> implementation for platforms that do not support ucontext, that change
> may be a bit too invasive to be comfortable with it.  The mutex proposed
> here may negatively impact performance, but the change is much simpler.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  util/coroutine-sigaltstack.c | 9 +++++++++
>  1 file changed, 9 insertions(+)

I slightly prefer Laszlo's patch: since the signal disposition is
process-wide it's cleaner to set it up globally and once only. That
said, this patch is okay too.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Laszlo Ersek Jan. 26, 2021, 8:06 p.m. UTC | #5
On 01/26/21 14:16, Max Reitz wrote:
> On 26.01.21 13:44, Vladimir Sementsov-Ogievskiy wrote:
>> 25.01.2021 15:03, Max Reitz wrote:
>>> Disposition (action) for any given signal is global for the process.
>>> When two threads run coroutine-sigaltstack's qemu_coroutine_new()
>>> concurrently, they may interfere with each other: One of them may revert
>>> the SIGUSR2 handler to SIG_DFL, between the other thread (a) setting up
>>> coroutine_trampoline() as the handler and (b) raising SIGUSR2.  That
>>> SIGUSR2 will then terminate the QEMU process abnormally.
>>>
>>> We have to ensure that only one thread at a time can modify the
>>> process-global SIGUSR2 handler.  To do so, wrap the whole section where
>>> that is done in a mutex.
>>>
>>> Alternatively, we could for example have the SIGUSR2 handler always be
>>> coroutine_trampoline(), so there would be no need to invoke sigaction()
>>> in qemu_coroutine_new().  Laszlo has posted a patch to do so here:
>>>
>>>   https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg05962.html
>>>
>>> However, given that coroutine-sigaltstack is more of a fallback
>>> implementation for platforms that do not support ucontext, that change
>>> may be a bit too invasive to be comfortable with it.  The mutex proposed
>>> here may negatively impact performance, but the change is much simpler.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>   util/coroutine-sigaltstack.c | 9 +++++++++
>>>   1 file changed, 9 insertions(+)
>>>
>>> diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
>>> index aade82afb8..e99b8a4f9c 100644
>>> --- a/util/coroutine-sigaltstack.c
>>> +++ b/util/coroutine-sigaltstack.c
>>> @@ -157,6 +157,7 @@ Coroutine *qemu_coroutine_new(void)
>>>       sigset_t sigs;
>>>       sigset_t osigs;
>>>       sigjmp_buf old_env;
>>> +    static pthread_mutex_t sigusr2_mutex = PTHREAD_MUTEX_INITIALIZER;
>>>       /* The way to manipulate stack is with the sigaltstack function. We
>>>        * prepare a stack, with it delivering a signal to ourselves and then
>>> @@ -186,6 +187,12 @@ Coroutine *qemu_coroutine_new(void)
>>>       sa.sa_handler = coroutine_trampoline;
>>>       sigfillset(&sa.sa_mask);
>>>       sa.sa_flags = SA_ONSTACK;
>>> +
>>> +    /*
>>> +     * sigaction() is a process-global operation.  We must not run
>>> +     * this code in multiple threads at once.
>>> +     */
>>> +    pthread_mutex_lock(&sigusr2_mutex);
>>>       if (sigaction(SIGUSR2, &sa, &osa) != 0) {
>>>           abort();
>>>       }
>>> @@ -234,6 +241,8 @@ Coroutine *qemu_coroutine_new(void)
>>>        * Restore the old SIGUSR2 signal handler and mask
>>>        */
>>>       sigaction(SIGUSR2, &osa, NULL);
>>> +    pthread_mutex_unlock(&sigusr2_mutex);
>>> +
>>>       pthread_sigmask(SIG_SETMASK, &osigs, NULL);
>>>       /*
>>>
>>
>> weak:
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>
>> Side thought: so, sigaltstack coroutine implementation is not
>> thread-safe. Is that the only bug?
> 
> It would be great if I could tell you for sure whether there’s no bug in
> some piece of code. :)
> 
>> Or actually, the whole implementation should be revisited to check,
>> could it be used with iothreads or not?
> 
> Judging from the discussion I had with Laszlo, I’m definitely not the
> right person to do so, because for example I don’t know the ins and outs
> of signal handling.
> 
> I can only tell you it’s the only issue I’ve seen, and that there’s just
> not much more code in coroutine-sigaltstack.c than the code around
> qemu_coroutine_new().
> 
>> Shouldn't we just state that sigaltstack coroutine implementation
>> doesn't support iothreads? And do error out on iothread creation if
>> sigaltstack coroutines is in use?
> 
> I’m not sure whether that would be better than potentially having a bug
> in it.  What you’re proposing is effectively breaking all iothreads
> usage on MacOS.  If I were a MacOS user, I’d rather risk encountering
> bugs than that.
> 
> (And it isn’t like we know it’s unstable with iothreads; I haven’t seen
> it breaking with this patch applied yet, and I don’t think there’s
> reason to believe it would be.  qemu_coroutine_new() together with
> coroutine_trampoline() sets up a coroutine environment, and the rest of
> the code just consists of sigsetjmp() and siglongjmp().  I believe
> Laszlo had some open questions about signal masking done by those
> functions, but I don’t think that has anything to do with multithreading.)

I've no open questions regarding the signal masking done by sigsetjmp()
and siglongjmp(). I was briefly confused by sigsetjmp() potentially
saving the signal mask into the "env" buffer even if "savemask" were
zero (POSIX allows this behavior), but then I re-learned that
siglongjmp() is *required to ignore* that potentially-saved mask in
"env" if "savemask" was 0 in the first place.

So the end result is as expected, it's just that the distribution of
responsibilities is potentially non-intuitive (i.e., why permit the
"save" function to stash some crap, under some circumstances, if the
"load" function is required to ignore said crap under the same
circumstances?) Of course the answer is that POSIX codifies existent
practice, and some system does this. I guess I would have appreciated a
hint right in sigsetjmp().

Anyway: no open questions on my end.

Thanks
Laszlo
Eric Blake Jan. 27, 2021, 7:13 p.m. UTC | #6
On 1/25/21 6:03 AM, Max Reitz wrote:
> Disposition (action) for any given signal is global for the process.
> When two threads run coroutine-sigaltstack's qemu_coroutine_new()
> concurrently, they may interfere with each other: One of them may revert
> the SIGUSR2 handler to SIG_DFL, between the other thread (a) setting up
> coroutine_trampoline() as the handler and (b) raising SIGUSR2.  That
> SIGUSR2 will then terminate the QEMU process abnormally.
> 
> We have to ensure that only one thread at a time can modify the
> process-global SIGUSR2 handler.  To do so, wrap the whole section where
> that is done in a mutex.
> 
> Alternatively, we could for example have the SIGUSR2 handler always be
> coroutine_trampoline(), so there would be no need to invoke sigaction()
> in qemu_coroutine_new().  Laszlo has posted a patch to do so here:
> 
>   https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg05962.html

I indeed like that one, but also concur that simplicity trumps the
uncertainty of a larger patch.  Let's get things unbroken before we
worry about optimizing things to avoid the mutex.

> 
> However, given that coroutine-sigaltstack is more of a fallback
> implementation for platforms that do not support ucontext, that change
> may be a bit too invasive to be comfortable with it.  The mutex proposed
> here may negatively impact performance, but the change is much simpler.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  util/coroutine-sigaltstack.c | 9 +++++++++
>  1 file changed, 9 insertions(+)

Reviewed-by: Eric Blake <eblake@redhat.com>

Patch

diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
index aade82afb8..e99b8a4f9c 100644
--- a/util/coroutine-sigaltstack.c
+++ b/util/coroutine-sigaltstack.c
@@ -157,6 +157,7 @@  Coroutine *qemu_coroutine_new(void)
     sigset_t sigs;
     sigset_t osigs;
     sigjmp_buf old_env;
+    static pthread_mutex_t sigusr2_mutex = PTHREAD_MUTEX_INITIALIZER;
 
     /* The way to manipulate stack is with the sigaltstack function. We
      * prepare a stack, with it delivering a signal to ourselves and then
@@ -186,6 +187,12 @@  Coroutine *qemu_coroutine_new(void)
     sa.sa_handler = coroutine_trampoline;
     sigfillset(&sa.sa_mask);
     sa.sa_flags = SA_ONSTACK;
+
+    /*
+     * sigaction() is a process-global operation.  We must not run
+     * this code in multiple threads at once.
+     */
+    pthread_mutex_lock(&sigusr2_mutex);
     if (sigaction(SIGUSR2, &sa, &osa) != 0) {
         abort();
     }
@@ -234,6 +241,8 @@  Coroutine *qemu_coroutine_new(void)
      * Restore the old SIGUSR2 signal handler and mask
      */
     sigaction(SIGUSR2, &osa, NULL);
+    pthread_mutex_unlock(&sigusr2_mutex);
+
     pthread_sigmask(SIG_SETMASK, &osigs, NULL);
 
     /*