
[TestDays] s390x emulation error

Message ID 4EBE4596.6010009@suse.de
State New

Commit Message

Andreas Färber Nov. 12, 2011, 10:08 a.m. UTC
On 10.11.2011 12:29, Andreas Färber wrote:
> On 10.11.2011 11:32, Alexander Graf wrote:
>>
>> On 10.11.2011, at 10:53, Andreas Färber <afaerber@suse.de> wrote:
>>
>>> Is there a known issue with running multiple instances of
>>> qemu-system-s390x? I got a hang on openSUSE 12.1 RC2 x86_64 host:
>>>
>>> 0x00007f0de7f698d4 in __lll_lock_wait () from /lib64/libpthread.so.0
>>> (gdb) bt
>>> #0  0x00007f0de7f698d4 in __lll_lock_wait () from /lib64/libpthread.so.0
>>> #1  0x00007f0de7f651c5 in _L_lock_883 () from /lib64/libpthread.so.0
>>> #2  0x00007f0de7f6501a in pthread_mutex_lock () from /lib64/libpthread.so.0
>>> #3  0x000000000048b4a9 in qemu_mutex_lock (mutex=<optimized out>)
>>>    at /home/andreas/QEMU/qemu/qemu-thread-posix.c:54
>>> #4  0x00000000004cd3df in qemu_mutex_lock_iothread ()
>>>    at /home/andreas/QEMU/qemu/cpus.c:843
>>> #5  0x0000000000467580 in main_loop_wait (nonblocking=<optimized out>)
>>>    at /home/andreas/QEMU/qemu/main-loop.c:459
>>> #6  0x0000000000408984 in main_loop () at /home/andreas/QEMU/qemu/vl.c:1481
>>> #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
>>>    at /home/andreas/QEMU/qemu/vl.c:3474
>>>
>>> Key presses didn't work, SDL window doesn't close, at 99.9% CPU.
>>
>> Huh? This is all generic code O_o. And no, it's not a known issue.
> 
> Hm, reproducible by running
> 
> $ s390x-softmmu/qemu-system-s390x
> 
> (without arguments) on s390-next branch.
> 
> I get compat_monitor0 console, but monitor or switching to real console
> don't work and neither does closing. Same backtrace.

Same in my WIP qemu-system-rl78.

I found that the following main-loop change works around it for s390x
and rl78 but breaks x86_64 SeaBIOS boot. Paolo, any ideas?


A deadlock between iothread and main?

Andreas

Comments

Stefan Weil Nov. 12, 2011, 10:40 a.m. UTC | #1
On 12.11.2011 11:08, Andreas Färber wrote:
> On 10.11.2011 12:29, Andreas Färber wrote:
> I found that the following main-loop change works around it for s390x
> and rl78 but breaks x86_64 SeaBIOS boot. Paolo, any ideas?
>
> diff --git a/main-loop.c b/main-loop.c
> index 60e9748..2ab5023 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -460,7 +460,7 @@ int main_loop_wait(int nonblocking)
>      }
>
>      glib_select_poll(&rfds, &wfds, &xfds, (ret < 0));
> -    qemu_iohandler_poll(&rfds, &wfds, &xfds, ret);
> +    qemu_iohandler_poll(&rfds, &wfds, &xfds, (ret < 0));
>  #ifdef CONFIG_SLIRP
>      slirp_select_poll(&rfds, &wfds, &xfds, (ret < 0));
>  #endif
>
> A deadlock between iothread and main?
>
> Andreas

I just tried s390x on a 386 host (32 bit!) and got a different crash
(modulo operation / division by 0).

Are 32 bit hosts supported?

Stefan

(gdb) r
Starting program: /home/stefan/src/qemu/qemu.org/qemu/bin/debug/386/s390x-softmmu/qemu-system-s390x
[Thread debugging using libthread_db enabled]
[New Thread 0xae9d0b70 (LWP 6841)]

Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0xae9d0b70 (LWP 6841)]
0x08199f6b in __umoddi3 ()
(gdb) i s
#0  0x08199f6b in __umoddi3 ()
#1  0x08168a48 in helper_dlg (r1=2, v2=0) at /home/stefan/src/qemu/qemu.org/qemu/target-s390x/op_helper.c:369
#2  0x00eb5a88 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) up
#1  0x08168a48 in helper_dlg (r1=2, v2=0) at /home/stefan/src/qemu/qemu.org/qemu/target-s390x/op_helper.c:369
369            env->regs[r1] = env->regs[r1+1] % divisor;
(gdb) l
364    {
365        uint64_t divisor = v2;
366
367        if (!env->regs[r1]) {
368            /* 64 -> 64/64 case */
369            env->regs[r1] = env->regs[r1+1] % divisor;
370            env->regs[r1+1] = env->regs[r1+1] / divisor;
371            return;
372        } else {
373
(gdb) p divisor
$1 = 0
(gdb) p v2
$2 = 0
Alexander Graf Nov. 12, 2011, 12:50 p.m. UTC | #2
On 12.11.2011, at 11:40, Stefan Weil <sw@weilnetz.de> wrote:

> On 12.11.2011 11:08, Andreas Färber wrote:
>> On 10.11.2011 12:29, Andreas Färber wrote:
>> I found that the following main-loop change works around it for s390x
>> and rl78 but breaks x86_64 SeaBIOS boot. Paolo, any ideas?
>> 
>> diff --git a/main-loop.c b/main-loop.c
>> index 60e9748..2ab5023 100644
>> --- a/main-loop.c
>> +++ b/main-loop.c
>> @@ -460,7 +460,7 @@ int main_loop_wait(int nonblocking)
>>      }
>>
>>      glib_select_poll(&rfds, &wfds, &xfds, (ret < 0));
>> -    qemu_iohandler_poll(&rfds, &wfds, &xfds, ret);
>> +    qemu_iohandler_poll(&rfds, &wfds, &xfds, (ret < 0));
>>  #ifdef CONFIG_SLIRP
>>      slirp_select_poll(&rfds, &wfds, &xfds, (ret < 0));
>>  #endif
>> 
>> A deadlock between iothread and main?
>> 
>> Andreas
> 
> I just tried s390x on a 386 host (32 bit!) and got a different crash
> (modulo operation / division by 0).
> 
> Are 32 bit hosts supported?
> 
> Stefan
> 
> (gdb) r
> Starting program: /home/stefan/src/qemu/qemu.org/qemu/bin/debug/386/s390x-softmmu/qemu-system-s390x 
> [Thread debugging using libthread_db enabled]
> [New Thread 0xae9d0b70 (LWP 6841)]
> 
> Program received signal SIGFPE, Arithmetic exception.
> [Switching to Thread 0xae9d0b70 (LWP 6841)]
> 0x08199f6b in __umoddi3 ()
> (gdb) i s
> #0  0x08199f6b in __umoddi3 ()
> #1  0x08168a48 in helper_dlg (r1=2, v2=0) at /home/stefan/src/qemu/qemu.org/qemu/target-s390x/op_helper.c:369
> #2  0x00eb5a88 in ?? ()
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> (gdb) up
> #1  0x08168a48 in helper_dlg (r1=2, v2=0) at /home/stefan/src/qemu/qemu.org/qemu/target-s390x/op_helper.c:369
> 369            env->regs[r1] = env->regs[r1+1] % divisor;
> (gdb) l
> 364    {
> 365        uint64_t divisor = v2;
> 366
> 367        if (!env->regs[r1]) {
> 368            /* 64 -> 64/64 case */
> 369            env->regs[r1] = env->regs[r1+1] % divisor;
> 370            env->regs[r1+1] = env->regs[r1+1] / divisor;
> 371            return;
> 372        } else {
> 373
> (gdb) p divisor
> $1 = 0
> (gdb) p v2
> $2 = 0
> 

No, that's the expected result. I don't special-case division by 0, and my
small virtio zipl boot ROM triggers a division by 0 when no hard disk is
attached :)

Alex
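
For illustration only, a minimal sketch of what special-casing the zero
divisor could look like. checked_divmod_u64() is a hypothetical name, not a
QEMU helper, and how a real helper_dlg() would report the condition to the
guest is deliberately left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not QEMU code: compute quotient and remainder of a
 * 64-bit division, but report a zero divisor to the caller instead of
 * letting the host raise SIGFPE. */
static bool checked_divmod_u64(uint64_t dividend, uint64_t divisor,
                               uint64_t *quotient, uint64_t *remainder)
{
    if (divisor == 0) {
        /* Outputs are left untouched; the caller decides how to signal
         * the condition (e.g. by raising a guest exception). */
        return false;
    }
    *quotient = dividend / divisor;
    *remainder = dividend % divisor;
    return true;
}
```

A caller would check the return value before writing the results back to the
register pair, so that a guest dividing by zero cannot crash the emulator
itself.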
Paolo Bonzini Nov. 13, 2011, 8:48 a.m. UTC | #3
On 11/12/2011 11:08 AM, Andreas Färber wrote:
>
> diff --git a/main-loop.c b/main-loop.c
> index 60e9748..2ab5023 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -460,7 +460,7 @@ int main_loop_wait(int nonblocking)
>      }
>
>      glib_select_poll(&rfds, &wfds, &xfds, (ret < 0));
> -    qemu_iohandler_poll(&rfds, &wfds, &xfds, ret);
> +    qemu_iohandler_poll(&rfds, &wfds, &xfds, (ret < 0));
>  #ifdef CONFIG_SLIRP
>      slirp_select_poll(&rfds, &wfds, &xfds, (ret < 0));
>  #endif

No, this is definitely wrong. :)  It will break all iohandlers.

Paolo
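
To spell out why the change breaks iohandlers, here is a simplified model. It
assumes that qemu_iohandler_poll() takes select()'s raw return value and
dispatches handlers only when that value is positive; the function below is a
stand-in written for this note, not the QEMU implementation.

```c
#include <stdbool.h>

/* Stand-in for a poll function that receives select()'s return value.
 * Assumption: handlers are dispatched only when at least one descriptor
 * is ready, i.e. when ret > 0. */
static bool dispatches_handlers(int ret)
{
    return ret > 0;
}
```

Under that assumption, passing (ret < 0) inverts the behaviour: a successful
select() with ready descriptors passes 0, so no handler runs, while a failed
select() passes 1, so handlers run against fd sets whose contents are not
meaningful.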
Andreas Färber Nov. 14, 2011, 2:37 p.m. UTC | #4
On 13.11.2011 09:48, Paolo Bonzini wrote:
> On 11/12/2011 11:08 AM, Andreas Färber wrote:
>> I found that the following main-loop change works around it for s390x
>> and rl78 but breaks x86_64 SeaBIOS boot. Paolo, any ideas?
>>
>> diff --git a/main-loop.c b/main-loop.c
>> index 60e9748..2ab5023 100644
>> --- a/main-loop.c
>> +++ b/main-loop.c
>> @@ -460,7 +460,7 @@ int main_loop_wait(int nonblocking)
>>      }
>>
>>      glib_select_poll(&rfds, &wfds, &xfds, (ret < 0));
>> -    qemu_iohandler_poll(&rfds, &wfds, &xfds, ret);
>> +    qemu_iohandler_poll(&rfds, &wfds, &xfds, (ret < 0));
>>  #ifdef CONFIG_SLIRP
>>      slirp_select_poll(&rfds, &wfds, &xfds, (ret < 0));
>>  #endif
> 
> No, this is definitely wrong. :)  It will break all iohandlers.

Yeah, I noticed that myself in the part you had snipped above.
The question is why some use cases break "iff" it is used correctly. :)
Any suggestions on where or how to debug this?

Andreas
Andreas Färber Nov. 18, 2011, 3:16 p.m. UTC | #5
On 12.11.2011 11:08, Andreas Färber wrote:
> On 10.11.2011 12:29, Andreas Färber wrote:
>> On 10.11.2011 11:32, Alexander Graf wrote:
>>>
>>> On 10.11.2011, at 10:53, Andreas Färber <afaerber@suse.de> wrote:
>>>
>>>> Is there a known issue with running multiple instances of
>>>> qemu-system-s390x? I got a hang on openSUSE 12.1 RC2 x86_64 host:
>>>>
>>>> 0x00007f0de7f698d4 in __lll_lock_wait () from /lib64/libpthread.so.0
>>>> (gdb) bt
>>>> #0  0x00007f0de7f698d4 in __lll_lock_wait () from /lib64/libpthread.so.0
>>>> #1  0x00007f0de7f651c5 in _L_lock_883 () from /lib64/libpthread.so.0
>>>> #2  0x00007f0de7f6501a in pthread_mutex_lock () from /lib64/libpthread.so.0
>>>> #3  0x000000000048b4a9 in qemu_mutex_lock (mutex=<optimized out>)
>>>>    at /home/andreas/QEMU/qemu/qemu-thread-posix.c:54
>>>> #4  0x00000000004cd3df in qemu_mutex_lock_iothread ()
>>>>    at /home/andreas/QEMU/qemu/cpus.c:843
>>>> #5  0x0000000000467580 in main_loop_wait (nonblocking=<optimized out>)
>>>>    at /home/andreas/QEMU/qemu/main-loop.c:459
>>>> #6  0x0000000000408984 in main_loop () at /home/andreas/QEMU/qemu/vl.c:1481
>>>> #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
>>>>    at /home/andreas/QEMU/qemu/vl.c:3474
>>>>
>>>> Key presses didn't work, SDL window doesn't close, at 99.9% CPU.
>>>
>>> Huh? This is all generic code O_o. And no, it's not a known issue.
>>
>> Hm, reproducible by running
>>
>> $ s390x-softmmu/qemu-system-s390x
>>
>> (without arguments) on s390-next branch.
>>
>> I get compat_monitor0 console, but monitor or switching to real console
>> don't work and neither does closing. Same backtrace.
> 
> Same in my WIP qemu-system-rl78.
> 
> I found that the following main-loop change works around it for s390x
> and rl78 but breaks x86_64 SeaBIOS boot. [...]
> 
> diff --git a/main-loop.c b/main-loop.c
> index 60e9748..2ab5023 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -460,7 +460,7 @@ int main_loop_wait(int nonblocking)
>      }
> 
>      glib_select_poll(&rfds, &wfds, &xfds, (ret < 0));
> -    qemu_iohandler_poll(&rfds, &wfds, &xfds, ret);
> +    qemu_iohandler_poll(&rfds, &wfds, &xfds, (ret < 0));
>  #ifdef CONFIG_SLIRP
>      slirp_select_poll(&rfds, &wfds, &xfds, (ret < 0));
>  #endif

For the record: While s390x and rl78 showed an identical backtrace in
gdb, the actual causes of the lockups turned out to be different:

For rl78, tlb_fill() was not yet fully/correctly implemented, so no TB was
being generated for execution. In that case we should probably error out
rather than go into an infinite loop.

For s390x, the hang could be worked around by memset()'ing not just the
virtio memory region but the whole of RAM. Still investigating.

Andreas

Patch

diff --git a/main-loop.c b/main-loop.c
index 60e9748..2ab5023 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -460,7 +460,7 @@  int main_loop_wait(int nonblocking)
     }

     glib_select_poll(&rfds, &wfds, &xfds, (ret < 0));
-    qemu_iohandler_poll(&rfds, &wfds, &xfds, ret);
+    qemu_iohandler_poll(&rfds, &wfds, &xfds, (ret < 0));
 #ifdef CONFIG_SLIRP
     slirp_select_poll(&rfds, &wfds, &xfds, (ret < 0));
 #endif