diff mbox

[1.1] coroutine: Avoid ucontext usage on i386 Linux host

Message ID 4FAAC3A3.5040503@siemens.com
State New
Headers show

Commit Message

Jan Kiszka May 9, 2012, 7:21 p.m. UTC
On i386, glibc only saves/restores the signal mask via sigprocmask,
excluding RT signal. A Linux bug in the compat version of this syscall
corrupts the RT signal state, which will cause lockups of QEMU's VCPU
threads. Therefore, fall back to gthread coroutines on this host
platform.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

I'm not sure where to fall back to. The existing code uses gthread,
likely because it is the safer harbor. So I picked it as well.

This is also stable material.

 configure |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

Comments

Michael Tokarev May 9, 2012, 7:27 p.m. UTC | #1
On 09.05.2012 23:21, Jan Kiszka wrote:
> On i386, glibc only saves/restores the signal mask via sigprocmask,
> excluding RT signal. A Linux bug in the compat version of this syscall
> corrupts the RT signal state, which will cause lockups of QEMU's VCPU
> threads.

This should obviously be fixed in kernel, for benefit of all (not only
qemu), do you have any details here?

> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
> 
> I'm not sure where to fall back to. The existing code uses gthread,
> likely because it is the safer harbor. So I picked it as well.

Can't we resort to the SIGUSR1 workaround for the time being, while
no RT signals are in actual use, and just have the time to let the
kernel side to fix the things up before some actual RTsig user will
emerge in qemu?  I think it is a bit more conservative approach,
especially having in mind the minority of users this issue affects
(only 32/64 mixed environment).  I'd favor for this variant, and
it looks like I'm the "main" 32/64bit user of qemu in this world :)

Thanks,

/mjt
Jan Kiszka May 9, 2012, 7:34 p.m. UTC | #2
On 2012-05-09 16:27, Michael Tokarev wrote:
> On 09.05.2012 23:21, Jan Kiszka wrote:
>> On i386, glibc only saves/restores the signal mask via sigprocmask,
>> excluding RT signal. A Linux bug in the compat version of this syscall
>> corrupts the RT signal state, which will cause lockups of QEMU's VCPU
>> threads.
> 
> This should obviously be fixed in kernel, for benefit of all (not only
> qemu), do you have any details here?

compat_sys_sigprocmask reads 32-bit sigmask from user space, i.e.
excluding RT signal, but calls sys_sigprocmask that takes a 64-bit
sigset. So the RT signals are unblocked. I'm testing a simple patch ATM,
will post it to LKML once this works.

> 
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>
>> I'm not sure where to fall back to. The existing code uses gthread,
>> likely because it is the safer harbor. So I picked it as well.
> 
> Can't we resort to the SIGUSR1 workaround for the time being, while
> no RT signals are in actual use, and just have the time to let the
> kernel side to fix the things up before some actual RTsig user will
> emerge in qemu?  I think it is a bit more conservative approach,
> especially having in mind the minority of users this issue affects
> (only 32/64 mixed environment).  I'd favor for this variant, and
> it looks like I'm the "main" 32/64bit user of qemu in this world :)

Most conservative is definitely this patch, not switching to SIGUSR1,
hoping that no other RT signal user shows up until current kernel are no
longer in use.

Jan
Anthony Liguori May 9, 2012, 7:48 p.m. UTC | #3
On 05/09/2012 02:34 PM, Jan Kiszka wrote:
> On 2012-05-09 16:27, Michael Tokarev wrote:
>> On 09.05.2012 23:21, Jan Kiszka wrote:
>>> On i386, glibc only saves/restores the signal mask via sigprocmask,
>>> excluding RT signal. A Linux bug in the compat version of this syscall
>>> corrupts the RT signal state, which will cause lockups of QEMU's VCPU
>>> threads.
>>
>> This should obviously be fixed in kernel, for benefit of all (not only
>> qemu), do you have any details here?
>
> compat_sys_sigprocmask reads 32-bit sigmask from user space, i.e.
> excluding RT signal, but calls sys_sigprocmask that takes a 64-bit
> sigset. So the RT signals are unblocked. I'm testing a simple patch ATM,
> will post it to LKML once this works.
>
>>
>>> Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
>>> ---
>>>
>>> I'm not sure where to fall back to. The existing code uses gthread,
>>> likely because it is the safer harbor. So I picked it as well.
>>
>> Can't we resort to the SIGUSR1 workaround for the time being, while
>> no RT signals are in actual use, and just have the time to let the
>> kernel side to fix the things up before some actual RTsig user will
>> emerge in qemu?  I think it is a bit more conservative approach,
>> especially having in mind the minority of users this issue affects
>> (only 32/64 mixed environment).  I'd favor for this variant, and
>> it looks like I'm the "main" 32/64bit user of qemu in this world :)
>
> Most conservative is definitely this patch, not switching to SIGUSR1,
> hoping that no other RT signal user shows up until current kernel are no
> longer in use.

Sorry, how is using a totally different code path more conservative than using a 
different signal number?

Why would we even use an RT signal in the future?

Regards,

Anthony Liguori

>
> Jan
>
Jan Kiszka May 9, 2012, 7:57 p.m. UTC | #4
On 2012-05-09 16:48, Anthony Liguori wrote:
> On 05/09/2012 02:34 PM, Jan Kiszka wrote:
>> On 2012-05-09 16:27, Michael Tokarev wrote:
>>> On 09.05.2012 23:21, Jan Kiszka wrote:
>>>> On i386, glibc only saves/restores the signal mask via sigprocmask,
>>>> excluding RT signal. A Linux bug in the compat version of this syscall
>>>> corrupts the RT signal state, which will cause lockups of QEMU's VCPU
>>>> threads.
>>>
>>> This should obviously be fixed in kernel, for benefit of all (not only
>>> qemu), do you have any details here?
>>
>> compat_sys_sigprocmask reads 32-bit sigmask from user space, i.e.
>> excluding RT signal, but calls sys_sigprocmask that takes a 64-bit
>> sigset. So the RT signals are unblocked. I'm testing a simple patch ATM,
>> will post it to LKML once this works.
>>
>>>
>>>> Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
>>>> ---
>>>>
>>>> I'm not sure where to fall back to. The existing code uses gthread,
>>>> likely because it is the safer harbor. So I picked it as well.
>>>
>>> Can't we resort to the SIGUSR1 workaround for the time being, while
>>> no RT signals are in actual use, and just have the time to let the
>>> kernel side to fix the things up before some actual RTsig user will
>>> emerge in qemu?  I think it is a bit more conservative approach,
>>> especially having in mind the minority of users this issue affects
>>> (only 32/64 mixed environment).  I'd favor for this variant, and
>>> it looks like I'm the "main" 32/64bit user of qemu in this world :)
>>
>> Most conservative is definitely this patch, not switching to SIGUSR1,
>> hoping that no other RT signal user shows up until current kernel are no
>> longer in use.
> 
> Sorry, how is using a totally different code path more conservative than using a 
> different signal number?

If the gthread version is not safe to use, why do we fall back to it?

> 
> Why would we even use an RT signal in the future?

As both SIGUSR1 and 2 are now in use?

Jan
Anthony Liguori May 9, 2012, 8:01 p.m. UTC | #5
On 05/09/2012 02:57 PM, Jan Kiszka wrote:
> On 2012-05-09 16:48, Anthony Liguori wrote:
>> On 05/09/2012 02:34 PM, Jan Kiszka wrote:
>>>> Can't we resort to the SIGUSR1 workaround for the time being, while
>>>> no RT signals are in actual use, and just have the time to let the
>>>> kernel side to fix the things up before some actual RTsig user will
>>>> emerge in qemu?  I think it is a bit more conservative approach,
>>>> especially having in mind the minority of users this issue affects
>>>> (only 32/64 mixed environment).  I'd favor for this variant, and
>>>> it looks like I'm the "main" 32/64bit user of qemu in this world :)
>>>
>>> Most conservative is definitely this patch, not switching to SIGUSR1,
>>> hoping that no other RT signal user shows up until current kernel are no
>>> longer in use.
>>
>> Sorry, how is using a totally different code path more conservative than using a
>> different signal number?
>
> If the gthread version is not safe to use, why do we fall back to it?

It's safe, but it's significantly slower.

Regards,

Anthony Liguori
Michael Tokarev May 9, 2012, 8:04 p.m. UTC | #6
On 09.05.2012 23:21, Jan Kiszka wrote:
[]
> --- a/configure
> +++ b/configure
> @@ -2777,17 +2777,22 @@ fi
>  # windows autodetected by make
>  if test "$coroutine" = "" -o "$coroutine" = "ucontext"; then
>    if test "$darwin" != "yes"; then
> -    cat > $TMPC << EOF
> +    if test "$linux" = "yes" -a "$cpu" = "i386"; then
> +      # RT signal mask corruption for 32-on-64 bit prevents ucontext usage
> +      coroutine_backend=gthread
> +    else

I'd say this is unacceptable.  We're making 32bit x86 code to use
different implementation of one of core subsystems, and 32bit code
already receives much less testing because everyone is using 64bit
code and many don't care about 32bits (this is general 32bit case,
not a more exotic 32-on-64bit case).  I think it is a sure way to
have more subtle "works for me" bugs which happens only on 32bits..

Besides, gthread is quite a bit slower than ucontext (but at least
it works even in the exotic case).

Thanks,

/mjt
Jan Kiszka May 9, 2012, 8:11 p.m. UTC | #7
On 2012-05-09 17:01, Anthony Liguori wrote:
> On 05/09/2012 02:57 PM, Jan Kiszka wrote:
>> On 2012-05-09 16:48, Anthony Liguori wrote:
>>> On 05/09/2012 02:34 PM, Jan Kiszka wrote:
>>>>> Can't we resort to the SIGUSR1 workaround for the time being, while
>>>>> no RT signals are in actual use, and just have the time to let the
>>>>> kernel side to fix the things up before some actual RTsig user will
>>>>> emerge in qemu?  I think it is a bit more conservative approach,
>>>>> especially having in mind the minority of users this issue affects
>>>>> (only 32/64 mixed environment).  I'd favor for this variant, and
>>>>> it looks like I'm the "main" 32/64bit user of qemu in this world :)
>>>>
>>>> Most conservative is definitely this patch, not switching to SIGUSR1,
>>>> hoping that no other RT signal user shows up until current kernel are no
>>>> longer in use.
>>>
>>> Sorry, how is using a totally different code path more conservative than using a
>>> different signal number?
>>
>> If the gthread version is not safe to use, why do we fall back to it?
> 
> It's safe, but it's significantly slower.

OK. Then what about sigaltstack (once fixed)? Is it also slower? If not,
can we converge over it? I would really hate staying with this time bomb
of broken RT signals unless someone tells me we will kick out all these
coroutines rather sooner than later.

Jan
Peter Maydell May 9, 2012, 8:46 p.m. UTC | #8
On 9 May 2012 21:11, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> OK. Then what about sigaltstack (once fixed)? Is it also slower? If not,
> can we converge over it?

sigaltstack is going to be significantly faster than the gthread
implementation (about the same speed as ucontext for coroutine
switch, a bit slower for coroutine creation).

It suffers from the same problem of being a code path which nobody
has seriously stress tested, though. And it does at least one thing
which is "not permitted by the standards but works anyway".

> I would really hate staying with this time bomb
> of broken RT signals unless someone tells me we will kick out all these
> coroutines rather sooner than later.

I think that regardless of the long term choice it would be
a pretty risky decision to switch coroutine implementation
this close to release when we have the option of just using
a different signal that avoids the bug.

Longer term (ie post 1.1) I'm strongly in favour of kicking
out coroutines, because I think there clearly is no single
solid portable implementation possible. C just isn't designed
to allow them; better not to try to swim against the current.

-- PMM
Anthony Liguori May 9, 2012, 8:56 p.m. UTC | #9
On 05/09/2012 03:11 PM, Jan Kiszka wrote:
> On 2012-05-09 17:01, Anthony Liguori wrote:
>> On 05/09/2012 02:57 PM, Jan Kiszka wrote:
>>> On 2012-05-09 16:48, Anthony Liguori wrote:
>>>> On 05/09/2012 02:34 PM, Jan Kiszka wrote:
>>>>>> Can't we resort to the SIGUSR1 workaround for the time being, while
>>>>>> no RT signals are in actual use, and just have the time to let the
>>>>>> kernel side to fix the things up before some actual RTsig user will
>>>>>> emerge in qemu?  I think it is a bit more conservative approach,
>>>>>> especially having in mind the minority of users this issue affects
>>>>>> (only 32/64 mixed environment).  I'd favor for this variant, and
>>>>>> it looks like I'm the "main" 32/64bit user of qemu in this world :)
>>>>>
>>>>> Most conservative is definitely this patch, not switching to SIGUSR1,
>>>>> hoping that no other RT signal user shows up until current kernel are no
>>>>> longer in use.
>>>>
>>>> Sorry, how is using a totally different code path more conservative than using a
>>>> different signal number?
>>>
>>> If the gthread version is not safe to use, why do we fall back to it?
>>
>> It's safe, but it's significantly slower.
>
> OK. Then what about sigaltstack (once fixed)? Is it also slower?

I don't know, performance testing would need to be done.

> If not,
> can we converge over it? I would really hate staying with this time bomb
> of broken RT signals unless someone tells me we will kick out all these
> coroutines rather sooner than later.

AFAICT, neither SIGUSR1 or SIGUSR2 are used today.  We only use SIG_IPI today. 
We could easily #define SIG_IPI to SIGUSR1 unconditionally today.  We used to 
use SIGUSR2 for posix-aio but that was ages ago.  AFAICT, it's only used for 
sigaltstack now.

I don't see where this "time bomb" comes from.  I think it's perfect reasonable 
that if we end up needing more signals (after exhausting SIGUSR1/SIGUSR2) we 
simply require a fixed kernel for 32bit on 64bit.

Regards,

Anthony Liguori

>
> Jan
>
Anthony Liguori May 9, 2012, 8:59 p.m. UTC | #10
On 05/09/2012 03:46 PM, Peter Maydell wrote:
> On 9 May 2012 21:11, Jan Kiszka<jan.kiszka@siemens.com>  wrote:
>> OK. Then what about sigaltstack (once fixed)? Is it also slower? If not,
>> can we converge over it?
>
> sigaltstack is going to be significantly faster than the gthread
> implementation (about the same speed as ucontext for coroutine
> switch, a bit slower for coroutine creation).
>
> It suffers from the same problem of being a code path which nobody
> has seriously stress tested, though. And it does at least one thing
> which is "not permitted by the standards but works anyway".
>
>> I would really hate staying with this time bomb
>> of broken RT signals unless someone tells me we will kick out all these
>> coroutines rather sooner than later.
>
> I think that regardless of the long term choice it would be
> a pretty risky decision to switch coroutine implementation
> this close to release when we have the option of just using
> a different signal that avoids the bug.
>
> Longer term (ie post 1.1) I'm strongly in favour of kicking
> out coroutines, because I think there clearly is no single
> solid portable implementation possible. C just isn't designed
> to allow them; better not to try to swim against the current.

Unfortunately, voting for code to be different doesn't actually make it different.

If you're volunteering to rewrite the block layer to not require coroutines 
(either by using a state machine or by using re-entrant threads and fixing any 
locking issues associated with that) that's wonderful.

But we decided to not do synchronous I/O years ago and still haven't removed it 
all from the tree.  Coroutines got us much closer to getting rid of synchronous I/O.

Regards,

Anthony Liguori

>
> -- PMM
>
Peter Maydell May 9, 2012, 9:27 p.m. UTC | #11
On 9 May 2012 21:59, Anthony Liguori <aliguori@us.ibm.com> wrote:
> On 05/09/2012 03:46 PM, Peter Maydell wrote:
>> Longer term (ie post 1.1) I'm strongly in favour of kicking
>> out coroutines, because I think there clearly is no single
>> solid portable implementation possible. C just isn't designed
>> to allow them; better not to try to swim against the current.

> Unfortunately, voting for code to be different doesn't actually make it
> different.

Yeah, I agree with this sentiment...

> If you're volunteering to rewrite the block layer to not require coroutines
> (either by using a state machine or by using re-entrant threads and fixing
> any locking issues associated with that) that's wonderful.
>
> But we decided to not do synchronous I/O years ago and still haven't removed
> it all from the tree.  Coroutines got us much closer to getting rid of
> synchronous I/O.

...but I would at least like us to take the position that we don't
introduce *more* users of coroutines.

-- PMM
Anthony Liguori May 9, 2012, 9:36 p.m. UTC | #12
On 05/09/2012 04:27 PM, Peter Maydell wrote:
> On 9 May 2012 21:59, Anthony Liguori<aliguori@us.ibm.com>  wrote:
>> On 05/09/2012 03:46 PM, Peter Maydell wrote:
>>> Longer term (ie post 1.1) I'm strongly in favour of kicking
>>> out coroutines, because I think there clearly is no single
>>> solid portable implementation possible. C just isn't designed
>>> to allow them; better not to try to swim against the current.
>
>> Unfortunately, voting for code to be different doesn't actually make it
>> different.
>
> Yeah, I agree with this sentiment...
>
>> If you're volunteering to rewrite the block layer to not require coroutines
>> (either by using a state machine or by using re-entrant threads and fixing
>> any locking issues associated with that) that's wonderful.
>>
>> But we decided to not do synchronous I/O years ago and still haven't removed
>> it all from the tree.  Coroutines got us much closer to getting rid of
>> synchronous I/O.
>
> ...but I would at least like us to take the position that we don't
> introduce *more* users of coroutines.

I think the long term plan has been:

1) replace synchronous I/O users with coroutines + async I/O

2) promote coroutines to threads by introducing fine grain locking.

I don't think avoiding coroutines helps us along this route nor does it help 
eliminate immediate users of coroutines.

I think our best strategy forward is to get rid of async I/O in the blocker 
layer and in devices.  Then I think we should promote coroutines as much as 
possible.

Regards,

Anthony Liguori

>
> -- PMM
>
diff mbox

Patch

diff --git a/configure b/configure
index 491109d..62dcdb2 100755
--- a/configure
+++ b/configure
@@ -2777,17 +2777,22 @@  fi
 # windows autodetected by make
 if test "$coroutine" = "" -o "$coroutine" = "ucontext"; then
   if test "$darwin" != "yes"; then
-    cat > $TMPC << EOF
+    if test "$linux" = "yes" -a "$cpu" = "i386"; then
+      # RT signal mask corruption for 32-on-64 bit prevents ucontext usage
+      coroutine_backend=gthread
+    else
+      cat > $TMPC << EOF
 #include <ucontext.h>
 #ifdef __stub_makecontext
 #error Ignoring glibc stub makecontext which will always fail
 #endif
 int main(void) { makecontext(0, 0, 0); return 0; }
 EOF
-    if compile_prog "" "" ; then
+      if compile_prog "" "" ; then
         coroutine_backend=ucontext
-    else
-	coroutine_backend=gthread
+      else
+        coroutine_backend=gthread
+      fi
     fi
   else
     echo "Silently falling back into gthread backend under darwin"