Message ID | 1332767238-22730-1-git-send-email-cedric.vincent@st.com |
---|---|
State | New |
Headers | show |
2012/3/26 Cédric VINCENT <cedric.vincent@st.com>: > This reverts commit fd4bab10 "target-sh4: optimize exceptions": [cc'ing Aurelien as the author of that commit] > the function cpu_restore_state() isn't expected to be called in user-mode, Is this really true? host_signal_handler() calls cpu_signal_handler() calls handle_cpu_signala) calls cpu_restore_state() so hopefully it's OK to be called in at least *some* situations... > as a consequence it isn't protected from race conditions. For > information, syscalls are exceptions on Linux/SH4. > > There were two possible fixes: either "tb_lock" is acquired/released > around the call to cpu_restore_state() [1] or the commit that > introduced this regression is reverted [2]. Can you explain a bit further what the race condition is that occurs here? NB the whole tb_lock thing is broken anyway. thanks -- PMM
On Mon, Mar 26, 2012 at 07:23:58PM +0200, Peter Maydell wrote: > 2012/3/26 Cédric VINCENT <cedric.vincent@st.com>: > > This reverts commit fd4bab10 "target-sh4: optimize exceptions": > > [cc'ing Aurelien as the author of that commit] > > > the function cpu_restore_state() isn't expected to be called in user-mode, > > Is this really true? host_signal_handler() calls cpu_signal_handler() > calls handle_cpu_signala) calls cpu_restore_state() so hopefully > it's OK to be called in at least *some* situations... > > > as a consequence it isn't protected from race conditions. For > > information, syscalls are exceptions on Linux/SH4. > > > > There were two possible fixes: either "tb_lock" is acquired/released > > around the call to cpu_restore_state() [1] or the commit that > > introduced this regression is reverted [2]. > > Can you explain a bit further what the race condition is that occurs > here? Here is what I observed on a "real" application: thread2: "jump to a new block" | thread1: "do a syscall" cpu_exec [tb_lock acquired] | helper_trapa tb_find_fast | raise_exception tb_find_slow | cpu_restore_state_from_retaddr tb_gen_code | cpu_translate_state cpu_gen_code | gen_intermediate_code_pc gen_intermediate_code_pc | /!\ thread1 and thread2 modify | gen_opc_buf[] at the same time | since thread1 didn't acquire tb_lock. Actually you can reproduce easily this race-condition with the source code below, where both threads do syscalls repeatedly: 8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<---- #include <pthread.h> #include <stdio.h> #include <unistd.h> void *routine(void *arg) { int thread_number = (int)arg; while (1) { printf("hello, I'm %d\n", thread_number); sleep(1); } } int main(void) { int status; pthread_t thread; status = pthread_create(&thread, NULL, routine, (void *)2); if (status) { puts("can't create a thread, bye!"); return 1; } routine((void *)1); return 0; /* Never reached. */ } 8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<---- Before the fix: $ qemu-sh4 ./a.out hello, I'm 1 hello, I'm 2 hello, I'm 1 hello, I'm 2 hello, I'm 1 qemu: uncaught target signal 11 (Segmentation fault) - core dumped Segmentation fault $ qemu-sh4 ./a.out qemu/tcg/tcg.c:1369: tcg fatal error qemu: uncaught target signal 11 (Segmentation fault) - core dumped Segmentation fault
On Mon, Mar 26, 2012 at 03:07:18PM +0200, Cedric VINCENT wrote: > This reverts commit fd4bab10 "target-sh4: optimize exceptions": the > function cpu_restore_state() isn't expected to be called in user-mode, > as a consequence it isn't protected from race conditions. For > information, syscalls are exceptions on Linux/SH4. > > There were two possible fixes: either "tb_lock" is acquired/released > around the call to cpu_restore_state() [1] or the commit that > introduced this regression is reverted [2]. I will submit a new version of this patch where the commit fd4bab10 isn't reverted and the call to cpu_restore_state() is protected by tb_lock instead. Actually most of the SH4 FPU helpers may call cpu_restore_state() indirectly, this is the same problem as with helper_trapa(). Regards, Cédric.
On Mon, Mar 26, 2012 at 07:23:58PM +0200, Peter Maydell wrote: > 2012/3/26 Cédric VINCENT <cedric.vincent@st.com>: > > the function cpu_restore_state() isn't expected to be called in user-mode, > > Is this really true? host_signal_handler() calls cpu_signal_handler() > calls handle_cpu_signala) calls cpu_restore_state() so hopefully > it's OK to be called in at least *some* situations... Actually it's not that OK, the code below [1] exposes a threading issue in this specific part of QEMU: * the first thread makes invalid memory references, that way QEMU reaches the given situation (host_signal_handler -> cpu_signal_handler -> handle_cpu_signal -> cpu_restore_state -> gen_intermediate_code) without acquiring "tb_lock". * at the same time, the second thread executes code that wasn't translated before, that way the TCG is invoked with "tb_lock" acquired. Note that in this example I used a self modifying block just to simulate a not-yet-translated code. This example triggers an invalid memory reference and/or an assertion in the TCG part of QEMU. > NB the whole tb_lock thing is broken anyway. Because of such bugs or are there other reasons? Maybe we could add a simple sanity check in gen_intermediate_code() functions to be sure that "tb_lock" is acquired when they are called. Regards, Cédric. [1] I wrote this example for ARM and SH4, but this problem appears for any guest CPU: #define _GNU_SOURCE #include <pthread.h> #include <stdio.h> #include <unistd.h> #include <sys/types.h> #include <signal.h> #include <sys/syscall.h> #include <signal.h> #include <stdlib.h> #include <strings.h> #include <stdint.h> pid_t gettid(void) { return syscall(SYS_gettid); } void handler(int signo) { printf("Thread %d received signal %d\n", gettid(), signo); } void *routine(void *arg) { pid_t tid = (pid_t)arg; while (1) { *(int *)NULL = 1; sleep(1); } } int main(void) { struct sigaction action; pthread_t thread; int status; bzero(&action, sizeof(action)); action.sa_handler = handler; sigfillset(&action.sa_mask); status = sigaction(SIGSEGV, &action, NULL); if (status < 0) { puts("can't regsiter sighandler, bye!"); exit(EXIT_FAILURE); } status = pthread_create(&thread, NULL, routine, (void *)gettid()); if (status) { puts("can't create a thread, bye!"); exit(EXIT_FAILURE); } #if defined(__SH4__) uint8_t self_modifying_code[] = { // _start 0x00, 0xc7, // mova @(4, pc), r0 0x01, 0x61, // mov.w @r0, r1 0x11, 0x20, // mov.w r1, @r0 0xfb, 0xaf, // bra _start 0x09, 0x00, // nop }; asm volatile ("jmp @%0 \n nop" : : "r" (self_modifying_code)); #elif defined(__arm__) uint32_t self_modifying_code[] = { // _start: 0xe1a0000f, // mov r0, pc 0xe5901000, // ldr r1, [r0] 0xe5801000, // str r1, [r0] 0xeafffffb, // b _start }; asm volatile ("mov pc, %0" : : "r" (self_modifying_code)); #else #error "unsupported architecture." #endif puts("should'nt be reached, bye!"); exit(EXIT_FAILURE); }
On 3 April 2012 13:28, <cedric.vincent@st.com> wrote: > On Mon, Mar 26, 2012 at 07:23:58PM +0200, Peter Maydell wrote: >> 2012/3/26 Cédric VINCENT <cedric.vincent@st.com>: >> > the function cpu_restore_state() isn't expected to be called in user-mode, >> >> Is this really true? host_signal_handler() calls cpu_signal_handler() >> calls handle_cpu_signala) calls cpu_restore_state() so hopefully >> it's OK to be called in at least *some* situations... > > Actually it's not that OK, the code below [1] exposes a threading > issue in this specific part of QEMU: > > * the first thread makes invalid memory references, that way QEMU > reaches the given situation (host_signal_handler -> > cpu_signal_handler -> handle_cpu_signal -> cpu_restore_state -> > gen_intermediate_code) without acquiring "tb_lock". > > * at the same time, the second thread executes code that wasn't > translated before, that way the TCG is invoked with "tb_lock" > acquired. Note that in this example I used a self modifying > block just to simulate a not-yet-translated code. > > This example triggers an invalid memory reference and/or an assertion > in the TCG part of QEMU. Mmm, I rather suspected you had a larger problem than the one you first encountered. >> NB the whole tb_lock thing is broken anyway. > > Because of such bugs or are there other reasons? Maybe we could add a > simple sanity check in gen_intermediate_code() functions to be sure > that "tb_lock" is acquired when they are called. Eg https://bugs.launchpad.net/qemu/+bug/668799 Basically at the moment there are places where we try to update the TB graph in functions called from signal handlers. This isn't safe because we can't take a lock in the signal handler (we'd deadlock). Instead we do no locking at all and if you make heavy use of threads and signals we crash. -- PMM
diff --git a/target-sh4/op_helper.c b/target-sh4/op_helper.c index b299576..b8b6e4a 100644 --- a/target-sh4/op_helper.c +++ b/target-sh4/op_helper.c @@ -84,31 +84,28 @@ void helper_ldtlb(void) #endif } -static inline void raise_exception(int index, void *retaddr) -{ - env->exception_index = index; - cpu_restore_state_from_retaddr(retaddr); - cpu_loop_exit(env); -} - void helper_raise_illegal_instruction(void) { - raise_exception(0x180, GETPC()); + env->exception_index = 0x180; + cpu_loop_exit(env); } void helper_raise_slot_illegal_instruction(void) { - raise_exception(0x1a0, GETPC()); + env->exception_index = 0x1a0; + cpu_loop_exit(env); } void helper_raise_fpu_disable(void) { - raise_exception(0x800, GETPC()); + env->exception_index = 0x800; + cpu_loop_exit(env); } void helper_raise_slot_fpu_disable(void) { - raise_exception(0x820, GETPC()); + env->exception_index = 0x820; + cpu_loop_exit(env); } void helper_debug(void) @@ -129,7 +126,8 @@ void helper_sleep(uint32_t next_pc) void helper_trapa(uint32_t tra) { env->tra = tra << 2; - raise_exception(0x160, GETPC()); + env->exception_index = 0x160; + cpu_loop_exit(env); } void helper_movcal(uint32_t address, uint32_t value) diff --git a/target-sh4/translate.c b/target-sh4/translate.c index e04a6e0..5f4ad0e 100644 --- a/target-sh4/translate.c +++ b/target-sh4/translate.c @@ -466,6 +466,7 @@ static inline void gen_store_fpr64 (TCGv_i64 t, int reg) #define CHECK_NOT_DELAY_SLOT \ if (ctx->flags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) \ { \ + tcg_gen_movi_i32(cpu_pc, ctx->pc); \ gen_helper_raise_slot_illegal_instruction(); \ ctx->bstate = BS_EXCP; \ return; \ @@ -473,6 +474,7 @@ static inline void gen_store_fpr64 (TCGv_i64 t, int reg) #define CHECK_PRIVILEGED \ if (IS_USER(ctx)) { \ + tcg_gen_movi_i32(cpu_pc, ctx->pc); \ if (ctx->flags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) { \ gen_helper_raise_slot_illegal_instruction(); \ } else { \ @@ -484,6 +486,7 @@ static inline void gen_store_fpr64 (TCGv_i64 t, int reg) #define CHECK_FPU_ENABLED \ if (ctx->flags & SR_FD) { \ + tcg_gen_movi_i32(cpu_pc, ctx->pc); \ if (ctx->flags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) { \ gen_helper_raise_slot_fpu_disable(); \ } else { \ @@ -1384,6 +1387,7 @@ static void _decode_opc(DisasContext * ctx) { TCGv imm; CHECK_NOT_DELAY_SLOT + tcg_gen_movi_i32(cpu_pc, ctx->pc); imm = tcg_const_i32(B7_0); gen_helper_trapa(imm); tcg_temp_free(imm); @@ -1881,6 +1885,7 @@ static void _decode_opc(DisasContext * ctx) ctx->opcode, ctx->pc); fflush(stderr); #endif + tcg_gen_movi_i32(cpu_pc, ctx->pc); if (ctx->flags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) { gen_helper_raise_slot_illegal_instruction(); } else {
This reverts commit fd4bab10 "target-sh4: optimize exceptions": the function cpu_restore_state() isn't expected to be called in user-mode, as a consequence it isn't protected from race conditions. For information, syscalls are exceptions on Linux/SH4. There were two possible fixes: either "tb_lock" is acquired/released around the call to cpu_restore_state() [1] or the commit that introduced this regression is reverted [2]. The performance impact of these two approaches were compared with two different tests: +-------------------------+--------+-----------------+-------------------+ | | v1.0.1 | [1] v1.0.1 | [2] v1.0.1 | | Test | | + tb_lock patch | + reverted commit | +-------------------------+--------+-----------------+-------------------+ | bzip2 17MB.mp3 (v1.0.5) | ~46s | ~46s | ~46s | | df_dok/X11 (v1.4.12) | crash | ~257s | ~256s | +-------------------------+--------+-----------------+-------------------+ Since I didn't see any performance impact, I prefered not to add extra calls to spin_lock/spin_unlock that were specific to the user-mode. Signed-off-by: Cédric VINCENT <cedric.vincent@st.com> --- target-sh4/op_helper.c | 22 ++++++++++------------ target-sh4/translate.c | 5 +++++ 2 files changed, 15 insertions(+), 12 deletions(-)