diff mbox

sh4-linux-user: fix multi-threading regression.

Message ID 1332767238-22730-1-git-send-email-cedric.vincent@st.com
State New
Headers show

Commit Message

vincent March 26, 2012, 1:07 p.m. UTC
This reverts commit fd4bab10 "target-sh4: optimize exceptions": the
function cpu_restore_state() isn't expected to be called in user-mode,
as a consequence it isn't protected from race conditions.  For
information, syscalls are exceptions on Linux/SH4.

There were two possible fixes: either "tb_lock" is acquired/released
around the call to cpu_restore_state() [1] or the commit that
introduced this regression is reverted [2].  The performance impact of
these two approaches were compared with two different tests:

    +-------------------------+--------+-----------------+-------------------+
    |                         | v1.0.1 | [1]      v1.0.1 | [2]        v1.0.1 |
    | Test                    |        | + tb_lock patch | + reverted commit |
    +-------------------------+--------+-----------------+-------------------+
    | bzip2 17MB.mp3 (v1.0.5) |   ~46s |            ~46s |              ~46s |
    | df_dok/X11 (v1.4.12)    |  crash |           ~257s |             ~256s |
    +-------------------------+--------+-----------------+-------------------+

Since I didn't see any performance impact, I prefered not to add extra
calls to spin_lock/spin_unlock that were specific to the user-mode.

Signed-off-by: Cédric VINCENT <cedric.vincent@st.com>
---
 target-sh4/op_helper.c |   22 ++++++++++------------
 target-sh4/translate.c |    5 +++++
 2 files changed, 15 insertions(+), 12 deletions(-)

Comments

Peter Maydell March 26, 2012, 5:23 p.m. UTC | #1
2012/3/26 Cédric VINCENT <cedric.vincent@st.com>:
> This reverts commit fd4bab10 "target-sh4: optimize exceptions":

[cc'ing Aurelien as the author of that commit]

> the function cpu_restore_state() isn't expected to be called in user-mode,

Is this really true? host_signal_handler() calls cpu_signal_handler()
calls handle_cpu_signala) calls cpu_restore_state() so hopefully
it's OK to be called in at least *some* situations...

> as a consequence it isn't protected from race conditions.  For
> information, syscalls are exceptions on Linux/SH4.
>
> There were two possible fixes: either "tb_lock" is acquired/released
> around the call to cpu_restore_state() [1] or the commit that
> introduced this regression is reverted [2].

Can you explain a bit further what the race condition is that occurs
here?

NB the whole tb_lock thing is broken anyway.

thanks
-- PMM
vincent March 27, 2012, 8:15 a.m. UTC | #2
On Mon, Mar 26, 2012 at 07:23:58PM +0200, Peter Maydell wrote:
> 2012/3/26 Cédric VINCENT <cedric.vincent@st.com>:
> > This reverts commit fd4bab10 "target-sh4: optimize exceptions":
> 
> [cc'ing Aurelien as the author of that commit]
> 
> > the function cpu_restore_state() isn't expected to be called in user-mode,
> 
> Is this really true? host_signal_handler() calls cpu_signal_handler()
> calls handle_cpu_signala) calls cpu_restore_state() so hopefully
> it's OK to be called in at least *some* situations...
> 
> > as a consequence it isn't protected from race conditions.  For
> > information, syscalls are exceptions on Linux/SH4.
> >
> > There were two possible fixes: either "tb_lock" is acquired/released
> > around the call to cpu_restore_state() [1] or the commit that
> > introduced this regression is reverted [2].
> 
> Can you explain a bit further what the race condition is that occurs
> here?

Here is what I observed on a "real" application:

thread2: "jump to a new block"       | thread1: "do a syscall"
  cpu_exec [tb_lock acquired]        |   helper_trapa
    tb_find_fast                     |     raise_exception
      tb_find_slow                   |       cpu_restore_state_from_retaddr
        tb_gen_code                  |         cpu_translate_state
          cpu_gen_code               |            gen_intermediate_code_pc
            gen_intermediate_code_pc | /!\ thread1 and thread2 modify
                                     |     gen_opc_buf[] at the same time
                                     |     since thread1 didn't acquire tb_lock.


Actually you can reproduce easily this race-condition with the source
code below, where both threads do syscalls repeatedly:

8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    void *routine(void *arg)
    {
        int thread_number = (int)arg;

        while (1) {
            printf("hello, I'm %d\n", thread_number);
            sleep(1);
       }
    }

    int main(void)
    {
        int status;
        pthread_t thread;

        status = pthread_create(&thread, NULL, routine, (void *)2);
        if (status) {
            puts("can't create a thread, bye!");
            return 1;
        }

        routine((void *)1);
        return 0; /* Never reached.  */
    }

8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----

Before the fix:

    $ qemu-sh4 ./a.out                                   
    hello, I'm 1
    hello, I'm 2
    hello, I'm 1
    hello, I'm 2
    hello, I'm 1
    qemu: uncaught target signal 11 (Segmentation fault) - core dumped
    Segmentation fault

    $ qemu-sh4 ./a.out
    qemu/tcg/tcg.c:1369: tcg fatal error
    qemu: uncaught target signal 11 (Segmentation fault) - core dumped
    Segmentation fault
vincent March 30, 2012, 1:36 p.m. UTC | #3
On Mon, Mar 26, 2012 at 03:07:18PM +0200, Cedric VINCENT wrote:
> This reverts commit fd4bab10 "target-sh4: optimize exceptions": the
> function cpu_restore_state() isn't expected to be called in user-mode,
> as a consequence it isn't protected from race conditions.  For
> information, syscalls are exceptions on Linux/SH4.
> 
> There were two possible fixes: either "tb_lock" is acquired/released
> around the call to cpu_restore_state() [1] or the commit that
> introduced this regression is reverted [2].

I will submit a new version of this patch where the commit fd4bab10
isn't reverted and the call to cpu_restore_state() is protected by
tb_lock instead.  Actually most of the SH4 FPU helpers may call
cpu_restore_state() indirectly, this is the same problem as with
helper_trapa().

Regards,
Cédric.
vincent April 3, 2012, 12:28 p.m. UTC | #4
On Mon, Mar 26, 2012 at 07:23:58PM +0200, Peter Maydell wrote:
> 2012/3/26 Cédric VINCENT <cedric.vincent@st.com>:
> > the function cpu_restore_state() isn't expected to be called in user-mode,
> 
> Is this really true? host_signal_handler() calls cpu_signal_handler()
> calls handle_cpu_signala) calls cpu_restore_state() so hopefully
> it's OK to be called in at least *some* situations...

Actually it's not that OK, the code below [1] exposes a threading
issue in this specific part of QEMU:

    * the first thread makes invalid memory references, that way QEMU
      reaches the given situation (host_signal_handler ->
      cpu_signal_handler -> handle_cpu_signal -> cpu_restore_state ->
      gen_intermediate_code) without acquiring "tb_lock".

    * at the same time, the second thread executes code that wasn't
      translated before, that way the TCG is invoked with "tb_lock"
      acquired.  Note that in this example I used a self modifying
      block just to simulate a not-yet-translated code.

This example triggers an invalid memory reference and/or an assertion
in the TCG part of QEMU.


> NB the whole tb_lock thing is broken anyway.

Because of such bugs or are there other reasons?  Maybe we could add a
simple sanity check in gen_intermediate_code() functions to be sure
that "tb_lock" is acquired when they are called.

Regards,
Cédric.

[1] I wrote this example for ARM and SH4, but this problem appears for
    any guest CPU:

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <signal.h>
#include <sys/syscall.h>
#include <signal.h>
#include <stdlib.h>
#include <strings.h>
#include <stdint.h>

pid_t gettid(void)
{
	return syscall(SYS_gettid);
}

void handler(int signo)
{
	printf("Thread %d received signal %d\n", gettid(), signo);
}

void *routine(void *arg)
{
	pid_t tid = (pid_t)arg;

	while (1) {
            *(int *)NULL = 1;
            sleep(1);
        }
}

int main(void)
{
    struct sigaction action;
    pthread_t thread;
    int status;

    bzero(&action, sizeof(action));
    action.sa_handler = handler;
    sigfillset(&action.sa_mask);

    status = sigaction(SIGSEGV, &action, NULL);
    if (status < 0) {
        puts("can't regsiter sighandler, bye!");
        exit(EXIT_FAILURE);
    }

    status = pthread_create(&thread, NULL, routine, (void *)gettid());
    if (status) {
	    puts("can't create a thread, bye!");
	    exit(EXIT_FAILURE);
    }

#if defined(__SH4__)
    uint8_t self_modifying_code[] = {
        // _start
        0x00, 0xc7,  // mova    @(4, pc), r0
        0x01, 0x61,  // mov.w   @r0, r1
        0x11, 0x20,  // mov.w   r1, @r0
        0xfb, 0xaf,  // bra     _start
        0x09, 0x00,  // nop
    };

    asm volatile ("jmp @%0 \n nop" : : "r" (self_modifying_code));
#elif defined(__arm__)
    uint32_t self_modifying_code[] = {
        // _start:
        0xe1a0000f,  // mov     r0, pc
        0xe5901000,  // ldr     r1, [r0]
        0xe5801000,  // str     r1, [r0]
        0xeafffffb,  // b       _start
    };

    asm volatile ("mov pc, %0" : : "r" (self_modifying_code));
#else
    #error "unsupported architecture."
#endif

    puts("should'nt be reached, bye!");
    exit(EXIT_FAILURE);
}
Peter Maydell April 3, 2012, 12:48 p.m. UTC | #5
On 3 April 2012 13:28,  <cedric.vincent@st.com> wrote:
> On Mon, Mar 26, 2012 at 07:23:58PM +0200, Peter Maydell wrote:
>> 2012/3/26 Cédric VINCENT <cedric.vincent@st.com>:
>> > the function cpu_restore_state() isn't expected to be called in user-mode,
>>
>> Is this really true? host_signal_handler() calls cpu_signal_handler()
>> calls handle_cpu_signala) calls cpu_restore_state() so hopefully
>> it's OK to be called in at least *some* situations...
>
> Actually it's not that OK, the code below [1] exposes a threading
> issue in this specific part of QEMU:
>
>    * the first thread makes invalid memory references, that way QEMU
>      reaches the given situation (host_signal_handler ->
>      cpu_signal_handler -> handle_cpu_signal -> cpu_restore_state ->
>      gen_intermediate_code) without acquiring "tb_lock".
>
>    * at the same time, the second thread executes code that wasn't
>      translated before, that way the TCG is invoked with "tb_lock"
>      acquired.  Note that in this example I used a self modifying
>      block just to simulate a not-yet-translated code.
>
> This example triggers an invalid memory reference and/or an assertion
> in the TCG part of QEMU.

Mmm, I rather suspected you had a larger problem than the one
you first encountered.

>> NB the whole tb_lock thing is broken anyway.
>
> Because of such bugs or are there other reasons?  Maybe we could add a
> simple sanity check in gen_intermediate_code() functions to be sure
> that "tb_lock" is acquired when they are called.

Eg https://bugs.launchpad.net/qemu/+bug/668799
Basically at the moment there are places where we try to update
the TB graph in functions called from signal handlers. This isn't
safe because we can't take a lock in the signal handler (we'd deadlock).
Instead we do no locking at all and if you make heavy use of threads
and signals we crash.

-- PMM
diff mbox

Patch

diff --git a/target-sh4/op_helper.c b/target-sh4/op_helper.c
index b299576..b8b6e4a 100644
--- a/target-sh4/op_helper.c
+++ b/target-sh4/op_helper.c
@@ -84,31 +84,28 @@  void helper_ldtlb(void)
 #endif
 }
 
-static inline void raise_exception(int index, void *retaddr)
-{
-    env->exception_index = index;
-    cpu_restore_state_from_retaddr(retaddr);
-    cpu_loop_exit(env);
-}
-
 void helper_raise_illegal_instruction(void)
 {
-    raise_exception(0x180, GETPC());
+    env->exception_index = 0x180;
+    cpu_loop_exit(env);
 }
 
 void helper_raise_slot_illegal_instruction(void)
 {
-    raise_exception(0x1a0, GETPC());
+    env->exception_index = 0x1a0;
+    cpu_loop_exit(env);
 }
 
 void helper_raise_fpu_disable(void)
 {
-    raise_exception(0x800, GETPC());
+  env->exception_index = 0x800;
+  cpu_loop_exit(env);
 }
 
 void helper_raise_slot_fpu_disable(void)
 {
-    raise_exception(0x820, GETPC());
+  env->exception_index = 0x820;
+  cpu_loop_exit(env);
 }
 
 void helper_debug(void)
@@ -129,7 +126,8 @@  void helper_sleep(uint32_t next_pc)
 void helper_trapa(uint32_t tra)
 {
     env->tra = tra << 2;
-    raise_exception(0x160, GETPC());
+    env->exception_index = 0x160;
+    cpu_loop_exit(env);
 }
 
 void helper_movcal(uint32_t address, uint32_t value)
diff --git a/target-sh4/translate.c b/target-sh4/translate.c
index e04a6e0..5f4ad0e 100644
--- a/target-sh4/translate.c
+++ b/target-sh4/translate.c
@@ -466,6 +466,7 @@  static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
 #define CHECK_NOT_DELAY_SLOT \
   if (ctx->flags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL))     \
   {                                                           \
+      tcg_gen_movi_i32(cpu_pc, ctx->pc);                      \
       gen_helper_raise_slot_illegal_instruction();            \
       ctx->bstate = BS_EXCP;                                  \
       return;                                                 \
@@ -473,6 +474,7 @@  static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
 
 #define CHECK_PRIVILEGED                                        \
   if (IS_USER(ctx)) {                                           \
+      tcg_gen_movi_i32(cpu_pc, ctx->pc);                        \
       if (ctx->flags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) { \
          gen_helper_raise_slot_illegal_instruction();           \
       } else {                                                  \
@@ -484,6 +486,7 @@  static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
 
 #define CHECK_FPU_ENABLED                                       \
   if (ctx->flags & SR_FD) {                                     \
+      tcg_gen_movi_i32(cpu_pc, ctx->pc);                        \
       if (ctx->flags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) { \
           gen_helper_raise_slot_fpu_disable();                  \
       } else {                                                  \
@@ -1384,6 +1387,7 @@  static void _decode_opc(DisasContext * ctx)
 	{
 	    TCGv imm;
 	    CHECK_NOT_DELAY_SLOT
+	    tcg_gen_movi_i32(cpu_pc, ctx->pc);
 	    imm = tcg_const_i32(B7_0);
 	    gen_helper_trapa(imm);
 	    tcg_temp_free(imm);
@@ -1881,6 +1885,7 @@  static void _decode_opc(DisasContext * ctx)
 	    ctx->opcode, ctx->pc);
     fflush(stderr);
 #endif
+    tcg_gen_movi_i32(cpu_pc, ctx->pc);
     if (ctx->flags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) {
        gen_helper_raise_slot_illegal_instruction();
     } else {