diff mbox

tcg: Reload local variables after return from longjmp

Message ID 4E0ECDDB.9030001@web.de
State New
Headers show

Commit Message

Jan Kiszka July 2, 2011, 7:50 a.m. UTC
From: Jan Kiszka <jan.kiszka@siemens.com>

Recent compilers look deep into cpu_exec, find longjmp as a noreturn
function and decide to smash some stack variables as they won't be used
again. This may lead to env becoming invalid after return from setjmp,
causing crashes. Fix it by reloading env from cpu_single_env in that
case.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 cpu-exec.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

Comments

Blue Swirl July 12, 2011, 8:56 p.m. UTC | #1
Thanks, applied.

On Sat, Jul 2, 2011 at 10:50 AM, Jan Kiszka <jan.kiszka@web.de> wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
>
> Recent compilers look deep into cpu_exec, find longjmp as a noreturn
> function and decide to smash some stack variables as they won't be used
> again. This may lead to env becoming invalid after return from setjmp,
> causing crashes. Fix it by reloading env from cpu_single_env in that
> case.
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  cpu-exec.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 20e3ec4..de0d716 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -587,6 +587,10 @@ int cpu_exec(CPUState *env)
>                 /* reset soft MMU for next block (it can currently
>                    only be set by a memory fault) */
>             } /* for(;;) */
> +        } else {
> +            /* Reload env after longjmp - the compiler may have smashed all
> +             * local variables as longjmp is marked 'noreturn'. */
> +            env = cpu_single_env;
>         }
>     } /* for(;;) */
>
>
Peter Maydell Aug. 11, 2011, 11:30 a.m. UTC | #2
On 2 July 2011 08:50, Jan Kiszka <jan.kiszka@web.de> wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
>
> Recent compilers look deep into cpu_exec, find longjmp as a noreturn
> function and decide to smash some stack variables as they won't be used
> again. This may lead to env becoming invalid after return from setjmp,
> causing crashes. Fix it by reloading env from cpu_single_env in that
> case.

Can you give more details of what compiler/platform this was
a problem for? My reading of the C standard is that the compiler
isn't allowed to trash env across this longjmp, because it's
a variable of automatic scope which isn't modified between the
setjmp and the longjmp...

(We've been looking at this because reloading from cpu_single_env is
the wrong fix in the case of user-mode where there are multiple-threads.)

Thanks
-- PMM
Paolo Bonzini Aug. 11, 2011, 12:16 p.m. UTC | #3
On 08/11/2011 01:30 PM, Peter Maydell wrote:
>> >  Recent compilers look deep into cpu_exec, find longjmp as a noreturn
>> >  function and decide to smash some stack variables as they won't be used
>> >  again. This may lead to env becoming invalid after return from setjmp,
>> >  causing crashes. Fix it by reloading env from cpu_single_env in that
>> >  case.
> Can you give more details of what compiler/platform this was
> a problem for? My reading of the C standard is that the compiler
> isn't allowed to trash env across this longjmp, because it's
> a variable of automatic scope which isn't modified between the
> setjmp and the longjmp...

longjmp can destroy any non-volatile variable (-Wclobbered warns about 
this).

Paolo
Peter Maydell Aug. 11, 2011, 12:40 p.m. UTC | #4
On 11 August 2011 13:16, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/11/2011 01:30 PM, Peter Maydell wrote:
>> Can you give more details of what compiler/platform this was
>> a problem for? My reading of the C standard is that the compiler
>> isn't allowed to trash env across this longjmp, because it's
>> a variable of automatic scope which isn't modified between the
>> setjmp and the longjmp...
>
> longjmp can destroy any non-volatile variable (-Wclobbered warns about
> this).

"All accessible objects have values [...] as of the time the longjmp
function was called, except that the values of objects of automatic
storage duration that are local to the function containing the
invocation of the corresponding setjmp macro that do not have
volatile-qualified type and have been changed between the setjmp
invocation and longjmp call are indeterminate."
 -- C99 section 7.13.2.1 para 3.

So variables may only be destroyed if they are all of:
 * local to the function calling setjmp
 * not volatile
 * changed between setjmp and longjmp

We don't change env between the setjmp and longjmp so the compiler
should not trash it. (Indeed according to Jan in
http://lists.gnu.org/archive/html/qemu-devel/2011-07/msg00144.html
-Wclobbered doesn't complain about this code.)

-- PMM
Paolo Bonzini Aug. 11, 2011, 1:13 p.m. UTC | #5
On 08/11/2011 02:40 PM, Peter Maydell wrote:
> "All accessible objects have values [...] as of the time the longjmp
> function was called, except that the values of objects of automatic
> storage duration that are local to the function containing the
> invocation of the corresponding setjmp macro that do not have
> volatile-qualified type and have been changed between the setjmp
> invocation and longjmp call are indeterminate."
>   -- C99 section 7.13.2.1 para 3.
>
> So variables may only be destroyed if they are all of:
>   * local to the function calling setjmp
>   * not volatile
>   * changed between setjmp and longjmp

I didn't remember this third part.  Thanks for bringing up facts. :)

> We don't change env between the setjmp and longjmp so the compiler
> should not trash it. (Indeed according to Jan in
> http://lists.gnu.org/archive/html/qemu-devel/2011-07/msg00144.html
> -Wclobbered doesn't complain about this code.)

Then it's a compiler bug, not smartness.  Making env volatile (or making 
a volatile copy if there is a performance impact) should still be enough 
to work around it.

Paolo
Peter Maydell Aug. 11, 2011, 1:31 p.m. UTC | #6
On 11 August 2011 14:13, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/11/2011 02:40 PM, Peter Maydell wrote:
>> We don't change env between the setjmp and longjmp so the compiler
>> should not trash it. (Indeed according to Jan in
>> http://lists.gnu.org/archive/html/qemu-devel/2011-07/msg00144.html
>> -Wclobbered doesn't complain about this code.)
>
> Then it's a compiler bug, not smartness.  Making env volatile (or making a
> volatile copy if there is a performance impact) should still be enough to
> work around it.

Yes. (It would have to be a volatile copy, I think, env is a function
parameter and I don't think you can make those volatile.)
https://bugs.launchpad.net/qemu/+bug/823902 includes some discussion
of the effects on the test of adding the volatile copy.

-- PMM
Paolo Bonzini Aug. 11, 2011, 2:10 p.m. UTC | #7
On 08/11/2011 03:31 PM, Peter Maydell wrote:
>>>
>>> Then it's a compiler bug, not smartness.  Making env volatile
>>> (or making a volatile copy if there is a performance impact)
>>> should still be enough to work around it.
> Yes. (It would have to be a volatile copy, I think, env is a function
> parameter and I don't think you can make those volatile.)
> https://bugs.launchpad.net/qemu/+bug/823902  includes some discussion
> of the effects on the test of adding the volatile copy.

I'm not sure about what to read from there:

> If I make cpu_single_env thread local with __thread and leave
> 0d101... in, then again it works reliably on 32bit Lucid, and is
> flaky on 64 bit Oneiric (5/10 2 hangs, 3 segs)
>
> I've also tried using a volatile local variable in cpu_exec to hold
> a copy of env and restore that rather than cpu_single_env. With this
> it's solid on 32bit lucid and flaky on 64bit Oneirc; these failures
> on 64bit OO look like it running off the end of the code buffer (all
> 0 code), jumping to non-existent code addresses and a seg in
> tb_reset_jump_recursive2.

It looks like neither a thread-local cpu_single_env nor a volatile copy 
fix the bug?!?

I cannot think off-hand of a reason why thread-local cpu_single_env 
should not work for iothread under Unix, BTW.  Since cpu_single_env is 
only set/used by a thread at a time (under the global lock), its users 
cannot distinguish between a thread-local variable and a global.

The only problem would be Windows, which runs cpu_signal in a thread 
different than the CPU thread.  But that can be fixed easily in 
qemu_cpu_kick_thread.

Paolo
David Gilbert Aug. 11, 2011, 2:12 p.m. UTC | #8
On 11 August 2011 15:10, Paolo Bonzini <pbonzini@redhat.com> wrote:

> I'm not sure about what to read from there:
>
>> If I make cpu_single_env thread local with __thread and leave
>> 0d101... in, then again it works reliably on 32bit Lucid, and is
>> flaky on 64 bit Oneiric (5/10 2 hangs, 3 segs)
>>
>> I've also tried using a volatile local variable in cpu_exec to hold
>> a copy of env and restore that rather than cpu_single_env. With this
>> it's solid on 32bit lucid and flaky on 64bit Oneirc; these failures
>> on 64bit OO look like it running off the end of the code buffer (all
>> 0 code), jumping to non-existent code addresses and a seg in
>> tb_reset_jump_recursive2.
>
> It looks like neither a thread-local cpu_single_env nor a volatile copy fix
> the bug?!?

As I say at the bottom of that bug I'm assuming I'm hitting multiple bugs.
Although it's not clear to me why I don't hit them on 32bit lucid.

Dave
Peter Maydell Aug. 11, 2011, 2:24 p.m. UTC | #9
On 11 August 2011 15:10, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/11/2011 03:31 PM, Peter Maydell wrote:
>>>>
>>>> Then it's a compiler bug, not smartness.  Making env volatile
>>>> (or making a volatile copy if there is a performance impact)
>>>> should still be enough to work around it.
>>
>> Yes. (It would have to be a volatile copy, I think, env is a function
>> parameter and I don't think you can make those volatile.)
>> https://bugs.launchpad.net/qemu/+bug/823902  includes some discussion
>> of the effects on the test of adding the volatile copy.
>
> I'm not sure about what to read from there:
>
>> If I make cpu_single_env thread local with __thread and leave
>> 0d101... in, then again it works reliably on 32bit Lucid, and is
>> flaky on 64 bit Oneiric (5/10 2 hangs, 3 segs)
>>
>> I've also tried using a volatile local variable in cpu_exec to hold
>> a copy of env and restore that rather than cpu_single_env. With this
>> it's solid on 32bit lucid and flaky on 64bit Oneirc; these failures
>> on 64bit OO look like it running off the end of the code buffer (all
>> 0 code), jumping to non-existent code addresses and a seg in
>> tb_reset_jump_recursive2.
>
> It looks like neither a thread-local cpu_single_env nor a volatile copy fix
> the bug?!?

My guess is that we're running into the various other problems qemu has
with significantly multithreaded programs in user-mode, which makes it
a bit hard to disentangle this specific regression. (In particular
segfaults in tb_reset_jump_recursive2 are almost certainly bug 668799.)

> I cannot think off-hand of a reason why thread-local cpu_single_env should
> not work for iothread under Unix, BTW.  Since cpu_single_env is only
> set/used by a thread at a time (under the global lock), its users cannot
> distinguish between a thread-local variable and a global.

Thanks for the clarification. As you say, as long as we don't ever
try to access it from another thread we're fine...

> The only problem would be Windows, which runs cpu_signal in a thread
> different than the CPU thread.  But that can be fixed easily in
> qemu_cpu_kick_thread.

...and we just need to fix this.

-- PMM
diff mbox

Patch

diff --git a/cpu-exec.c b/cpu-exec.c
index 20e3ec4..de0d716 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -587,6 +587,10 @@  int cpu_exec(CPUState *env)
                 /* reset soft MMU for next block (it can currently
                    only be set by a memory fault) */
             } /* for(;;) */
+        } else {
+            /* Reload env after longjmp - the compiler may have smashed all
+             * local variables as longjmp is marked 'noreturn'. */
+            env = cpu_single_env;
         }
     } /* for(;;) */