Patchwork [v2] block: avoid SIGUSR2

Submitter Kevin Wolf
Date Oct. 28, 2011, 11:33 a.m.
Message ID <4EAA9310.2030705@redhat.com>
Permalink /patch/122396/
State New

Comments

Kevin Wolf - Oct. 28, 2011, 11:33 a.m.
On 27.10.2011 16:32, Kevin Wolf wrote:
> On 27.10.2011 16:15, Kevin Wolf wrote:
>> On 27.10.2011 15:57, Stefan Hajnoczi wrote:
>>> On Thu, Oct 27, 2011 at 03:26:23PM +0200, Kevin Wolf wrote:
>>>> On 19.09.2011 16:37, Frediano Ziglio wrote:
>>>>> Now that iothread is always compiled, sending a signal seems like just an
>>>>> additional step. This patch also avoids writing to two pipes (one from the
>>>>> signal handler and one in qemu_service_io).
>>>>>
>>>>> Works with kvm enabled or disabled. strace output is more readable (fewer syscalls).
>>>>>
>>>>> Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
>>>>
>>>> Something in this change has bad effects, in the sense that it seems to
>>>> break bdrv_read_em.
>>>
>>> How does it break bdrv_read_em?  Are you seeing QEMU hung with 100% CPU
>>> utilization or deadlocked?
>>
>> Sorry, I should have been more detailed here.
>>
>> No, it's nothing obvious, it must be some subtle side effect. The result
>> of bdrv_read_em itself seems to be correct (return value and checksum of
>> the read buffer).
>>
>> However instead of booting into the DOS setup I only get an error
>> message "Kein System oder Laufwerksfehler" (don't know how it reads in
>> English DOS versions), which seems to be produced by the boot sector.
>>
>> I excluded all of the minor changes, so I'm sure that it's caused by the
>> switch from kill() to a direct call of the function that writes into the
>> pipe.
>>
>>> One interesting thing is that qemu_aio_wait() does not release the QEMU
>>> mutex, so we cannot write to a pipe with the mutex held and then spin
>>> waiting for the iothread to do work for us.
>>>
>>> Exactly how kill and qemu_notify_event() were different I'm not sure
>>> right now but it could be a factor.
>>
>> This would cause a hang, right? Then it isn't what I'm seeing.
> 
> While trying out some more things, I added some fprintfs to
> posix_aio_process_queue() and suddenly it also fails with the kill()
> version. So what has changed might really just be the timing, and it
> could be a race somewhere that has always (?) existed.

Replying to myself again... It looks like there is a problem with
reentrancy in fdctrl_transfer_handler. I think this would have been
guarded by the AsyncContexts before, but we don't have them any more.

qemu-system-x86_64: /root/upstream/qemu/hw/fdc.c:1253:
fdctrl_transfer_handler: Assertion `reentrancy == 0' failed.

Program received signal SIGABRT, Aborted.

(gdb) bt
#0  0x0000003ccd2329a5 in raise () from /lib64/libc.so.6
#1  0x0000003ccd234185 in abort () from /lib64/libc.so.6
#2  0x0000003ccd22b935 in __assert_fail () from /lib64/libc.so.6
#3  0x000000000046ff09 in fdctrl_transfer_handler (opaque=<value
optimized out>, nchan=<value optimized out>, dma_pos=<value optimized out>,
    dma_len=<value optimized out>) at /root/upstream/qemu/hw/fdc.c:1253
#4  0x000000000046702c in channel_run () at /root/upstream/qemu/hw/dma.c:348
#5  DMA_run () at /root/upstream/qemu/hw/dma.c:378
#6  0x000000000040b0e1 in qemu_bh_poll () at async.c:70
#7  0x000000000040aa19 in qemu_aio_wait () at aio.c:147
#8  0x000000000041c355 in bdrv_read_em (bs=0x131fd80, sector_num=19,
buf=<value optimized out>, nb_sectors=1) at block.c:2896
#9  0x000000000041b3d2 in bdrv_read (bs=0x131fd80, sector_num=19,
buf=0x1785a00 "IO      SYS!", nb_sectors=1) at block.c:1062
#10 0x000000000041b3d2 in bdrv_read (bs=0x131f430, sector_num=19,
buf=0x1785a00 "IO      SYS!", nb_sectors=1) at block.c:1062
#11 0x000000000046fbb8 in do_fdctrl_transfer_handler (opaque=0x1785788,
nchan=2, dma_pos=<value optimized out>, dma_len=512)
    at /root/upstream/qemu/hw/fdc.c:1178
#12 0x000000000046fecf in fdctrl_transfer_handler (opaque=<value
optimized out>, nchan=<value optimized out>, dma_pos=<value optimized out>,
    dma_len=<value optimized out>) at /root/upstream/qemu/hw/fdc.c:1255
#13 0x000000000046702c in channel_run () at /root/upstream/qemu/hw/dma.c:348
#14 DMA_run () at /root/upstream/qemu/hw/dma.c:378
#15 0x000000000046e456 in fdctrl_start_transfer (fdctrl=0x1785788,
direction=1) at /root/upstream/qemu/hw/fdc.c:1107
#16 0x0000000000558a41 in kvm_handle_io (env=0x1323ff0) at
/root/upstream/qemu/kvm-all.c:834
#17 kvm_cpu_exec (env=0x1323ff0) at /root/upstream/qemu/kvm-all.c:976
#18 0x000000000053686a in qemu_kvm_cpu_thread_fn (arg=0x1323ff0) at
/root/upstream/qemu/cpus.c:661
#19 0x0000003ccda077e1 in start_thread () from /lib64/libpthread.so.0
#20 0x0000003ccd2e151d in clone () from /lib64/libc.so.6

I'm afraid that we can only avoid things like this reliably if we
convert all devices to be direct users of AIO/coroutines. The current
block layer infrastructure doesn't emulate the behaviour of bdrv_read
accurately as bottom halves can be run in the nested main loop.

For floppy, the following seems to be a quick fix (Lucas, Cleber, does
this solve your problems?), though it's not very satisfying. And I'm not
quite sure yet why it doesn't always happen with kill() in
posix-aio-compat.c.


Kevin
Kevin Wolf - Oct. 28, 2011, 11:35 a.m.
On 28.10.2011 13:33, Kevin Wolf wrote:
> On 27.10.2011 16:32, Kevin Wolf wrote:
>> On 27.10.2011 16:15, Kevin Wolf wrote:
>>> On 27.10.2011 15:57, Stefan Hajnoczi wrote:
>>>> On Thu, Oct 27, 2011 at 03:26:23PM +0200, Kevin Wolf wrote:
>>>>> On 19.09.2011 16:37, Frediano Ziglio wrote:
>>>>>> Now that iothread is always compiled, sending a signal seems like just an
>>>>>> additional step. This patch also avoids writing to two pipes (one from the
>>>>>> signal handler and one in qemu_service_io).
>>>>>>
>>>>>> Works with kvm enabled or disabled. strace output is more readable (fewer syscalls).
>>>>>>
>>>>>> Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
>>>>>
>>>>> Something in this change has bad effects, in the sense that it seems to
>>>>> break bdrv_read_em.
>>>>
>>>> How does it break bdrv_read_em?  Are you seeing QEMU hung with 100% CPU
>>>> utilization or deadlocked?
>>>
>>> Sorry, I should have been more detailed here.
>>>
>>> No, it's nothing obvious, it must be some subtle side effect. The result
>>> of bdrv_read_em itself seems to be correct (return value and checksum of
>>> the read buffer).
>>>
>>> However instead of booting into the DOS setup I only get an error
>>> message "Kein System oder Laufwerksfehler" (don't know how it reads in
>>> English DOS versions), which seems to be produced by the boot sector.
>>>
>>> I excluded all of the minor changes, so I'm sure that it's caused by the
>>> switch from kill() to a direct call of the function that writes into the
>>> pipe.
>>>
>>>> One interesting thing is that qemu_aio_wait() does not release the QEMU
>>>> mutex, so we cannot write to a pipe with the mutex held and then spin
>>>> waiting for the iothread to do work for us.
>>>>
>>>> Exactly how kill and qemu_notify_event() were different I'm not sure
>>>> right now but it could be a factor.
>>>
>>> This would cause a hang, right? Then it isn't what I'm seeing.
>>
>> While trying out some more things, I added some fprintfs to
>> posix_aio_process_queue() and suddenly it also fails with the kill()
>> version. So what has changed might really just be the timing, and it
>> could be a race somewhere that has always (?) existed.
> 
> Replying to myself again... It looks like there is a problem with
> reentrancy in fdctrl_transfer_handler. I think this would have been
> guarded by the AsyncContexts before, but we don't have them any more.
> 
> qemu-system-x86_64: /root/upstream/qemu/hw/fdc.c:1253:
> fdctrl_transfer_handler: Assertion `reentrancy == 0' failed.
> 
> Program received signal SIGABRT, Aborted.
> 
> (gdb) bt
> #0  0x0000003ccd2329a5 in raise () from /lib64/libc.so.6
> #1  0x0000003ccd234185 in abort () from /lib64/libc.so.6
> #2  0x0000003ccd22b935 in __assert_fail () from /lib64/libc.so.6
> #3  0x000000000046ff09 in fdctrl_transfer_handler (opaque=<value
> optimized out>, nchan=<value optimized out>, dma_pos=<value optimized out>,
>     dma_len=<value optimized out>) at /root/upstream/qemu/hw/fdc.c:1253
> #4  0x000000000046702c in channel_run () at /root/upstream/qemu/hw/dma.c:348
> #5  DMA_run () at /root/upstream/qemu/hw/dma.c:378
> #6  0x000000000040b0e1 in qemu_bh_poll () at async.c:70
> #7  0x000000000040aa19 in qemu_aio_wait () at aio.c:147
> #8  0x000000000041c355 in bdrv_read_em (bs=0x131fd80, sector_num=19,
> buf=<value optimized out>, nb_sectors=1) at block.c:2896
> #9  0x000000000041b3d2 in bdrv_read (bs=0x131fd80, sector_num=19,
> buf=0x1785a00 "IO      SYS!", nb_sectors=1) at block.c:1062
> #10 0x000000000041b3d2 in bdrv_read (bs=0x131f430, sector_num=19,
> buf=0x1785a00 "IO      SYS!", nb_sectors=1) at block.c:1062
> #11 0x000000000046fbb8 in do_fdctrl_transfer_handler (opaque=0x1785788,
> nchan=2, dma_pos=<value optimized out>, dma_len=512)
>     at /root/upstream/qemu/hw/fdc.c:1178
> #12 0x000000000046fecf in fdctrl_transfer_handler (opaque=<value
> optimized out>, nchan=<value optimized out>, dma_pos=<value optimized out>,
>     dma_len=<value optimized out>) at /root/upstream/qemu/hw/fdc.c:1255
> #13 0x000000000046702c in channel_run () at /root/upstream/qemu/hw/dma.c:348
> #14 DMA_run () at /root/upstream/qemu/hw/dma.c:378
> #15 0x000000000046e456 in fdctrl_start_transfer (fdctrl=0x1785788,
> direction=1) at /root/upstream/qemu/hw/fdc.c:1107
> #16 0x0000000000558a41 in kvm_handle_io (env=0x1323ff0) at
> /root/upstream/qemu/kvm-all.c:834
> #17 kvm_cpu_exec (env=0x1323ff0) at /root/upstream/qemu/kvm-all.c:976
> #18 0x000000000053686a in qemu_kvm_cpu_thread_fn (arg=0x1323ff0) at
> /root/upstream/qemu/cpus.c:661
> #19 0x0000003ccda077e1 in start_thread () from /lib64/libpthread.so.0
> #20 0x0000003ccd2e151d in clone () from /lib64/libc.so.6
> 
> I'm afraid that we can only avoid things like this reliably if we
> convert all devices to be direct users of AIO/coroutines. The current
> block layer infrastructure doesn't emulate the behaviour of bdrv_read
> accurately as bottom halves can be run in the nested main loop.
> 
> For floppy, the following seems to be a quick fix (Lucas, Cleber, does
> this solve your problems?), though it's not very satisfying. And I'm not
> quite sure yet why it doesn't always happen with kill() in
> posix-aio-compat.c.
> 
> diff --git a/hw/dma.c b/hw/dma.c
> index 8a7302a..1d3b6f1 100644
> --- a/hw/dma.c
> +++ b/hw/dma.c
> @@ -358,6 +358,13 @@ static void DMA_run (void)
>      struct dma_cont *d;
>      int icont, ichan;
>      int rearm = 0;
> +    static int running = 0;
> +
> +    if (running) {
> +        goto out;
> +    } else {
> +        running = 0;

running = 1, obviously. I had the fix disabled for testing something.

> +    }
> 
>      d = dma_controllers;
> 
> @@ -374,6 +381,8 @@ static void DMA_run (void)
>          }
>      }
> 
> +out:
> +    running = 0;
>      if (rearm)
>          qemu_bh_schedule_idle(dma_bh);
>  }
> 
> Kevin
>
Paolo Bonzini - Oct. 28, 2011, 11:50 a.m.
On 10/28/2011 01:33 PM, Kevin Wolf wrote:
> I'm afraid that we can only avoid things like this reliably if we
> convert all devices to be direct users of AIO/coroutines. The current
> block layer infrastructure doesn't emulate the behaviour of bdrv_read
> accurately as bottom halves can be run in the nested main loop.
>
> For floppy, the following seems to be a quick fix (Lucas, Cleber, does
> this solve your problems?), though it's not very satisfying. And I'm not
> quite sure yet why it doesn't always happen with kill() in
> posix-aio-compat.c.

Another "fix" is to change idle bottom halves (at least the one in 
hw/dma.c) to 10ms timers.

Paolo
Cleber Rosa - Oct. 28, 2011, 12:20 p.m.
On 10/28/2011 08:33 AM, Kevin Wolf wrote:
> On 27.10.2011 16:32, Kevin Wolf wrote:
>> On 27.10.2011 16:15, Kevin Wolf wrote:
>>> On 27.10.2011 15:57, Stefan Hajnoczi wrote:
>>>> On Thu, Oct 27, 2011 at 03:26:23PM +0200, Kevin Wolf wrote:
>>>>> On 19.09.2011 16:37, Frediano Ziglio wrote:
>>>>>> Now that iothread is always compiled, sending a signal seems like just an
>>>>>> additional step. This patch also avoids writing to two pipes (one from the
>>>>>> signal handler and one in qemu_service_io).
>>>>>>
>>>>>> Works with kvm enabled or disabled. strace output is more readable (fewer syscalls).
>>>>>>
>>>>>> Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
>>>>> Something in this change has bad effects, in the sense that it seems to
>>>>> break bdrv_read_em.
>>>> How does it break bdrv_read_em?  Are you seeing QEMU hung with 100% CPU
>>>> utilization or deadlocked?
>>> Sorry, I should have been more detailed here.
>>>
>>> No, it's nothing obvious, it must be some subtle side effect. The result
>>> of bdrv_read_em itself seems to be correct (return value and checksum of
>>> the read buffer).
>>>
>>> However instead of booting into the DOS setup I only get an error
>>> message "Kein System oder Laufwerksfehler" (don't know how it reads in
>>> English DOS versions), which seems to be produced by the boot sector.
>>>
>>> I excluded all of the minor changes, so I'm sure that it's caused by the
>>> switch from kill() to a direct call of the function that writes into the
>>> pipe.
>>>
>>>> One interesting thing is that qemu_aio_wait() does not release the QEMU
>>>> mutex, so we cannot write to a pipe with the mutex held and then spin
>>>> waiting for the iothread to do work for us.
>>>>
>>>> Exactly how kill and qemu_notify_event() were different I'm not sure
>>>> right now but it could be a factor.
>>> This would cause a hang, right? Then it isn't what I'm seeing.
>> While trying out some more things, I added some fprintfs to
>> posix_aio_process_queue() and suddenly it also fails with the kill()
>> version. So what has changed might really just be the timing, and it
>> could be a race somewhere that has always (?) existed.
> Replying to myself again... It looks like there is a problem with
> reentrancy in fdctrl_transfer_handler. I think this would have been
> guarded by the AsyncContexts before, but we don't have them any more.
>
> qemu-system-x86_64: /root/upstream/qemu/hw/fdc.c:1253:
> fdctrl_transfer_handler: Assertion `reentrancy == 0' failed.
>
> Program received signal SIGABRT, Aborted.
>
> (gdb) bt
> #0  0x0000003ccd2329a5 in raise () from /lib64/libc.so.6
> #1  0x0000003ccd234185 in abort () from /lib64/libc.so.6
> #2  0x0000003ccd22b935 in __assert_fail () from /lib64/libc.so.6
> #3  0x000000000046ff09 in fdctrl_transfer_handler (opaque=<value
> optimized out>, nchan=<value optimized out>, dma_pos=<value optimized out>,
>      dma_len=<value optimized out>) at /root/upstream/qemu/hw/fdc.c:1253
> #4  0x000000000046702c in channel_run () at /root/upstream/qemu/hw/dma.c:348
> #5  DMA_run () at /root/upstream/qemu/hw/dma.c:378
> #6  0x000000000040b0e1 in qemu_bh_poll () at async.c:70
> #7  0x000000000040aa19 in qemu_aio_wait () at aio.c:147
> #8  0x000000000041c355 in bdrv_read_em (bs=0x131fd80, sector_num=19,
> buf=<value optimized out>, nb_sectors=1) at block.c:2896
> #9  0x000000000041b3d2 in bdrv_read (bs=0x131fd80, sector_num=19,
> buf=0x1785a00 "IO      SYS!", nb_sectors=1) at block.c:1062
> #10 0x000000000041b3d2 in bdrv_read (bs=0x131f430, sector_num=19,
> buf=0x1785a00 "IO      SYS!", nb_sectors=1) at block.c:1062
> #11 0x000000000046fbb8 in do_fdctrl_transfer_handler (opaque=0x1785788,
> nchan=2, dma_pos=<value optimized out>, dma_len=512)
>      at /root/upstream/qemu/hw/fdc.c:1178
> #12 0x000000000046fecf in fdctrl_transfer_handler (opaque=<value
> optimized out>, nchan=<value optimized out>, dma_pos=<value optimized out>,
>      dma_len=<value optimized out>) at /root/upstream/qemu/hw/fdc.c:1255
> #13 0x000000000046702c in channel_run () at /root/upstream/qemu/hw/dma.c:348
> #14 DMA_run () at /root/upstream/qemu/hw/dma.c:378
> #15 0x000000000046e456 in fdctrl_start_transfer (fdctrl=0x1785788,
> direction=1) at /root/upstream/qemu/hw/fdc.c:1107
> #16 0x0000000000558a41 in kvm_handle_io (env=0x1323ff0) at
> /root/upstream/qemu/kvm-all.c:834
> #17 kvm_cpu_exec (env=0x1323ff0) at /root/upstream/qemu/kvm-all.c:976
> #18 0x000000000053686a in qemu_kvm_cpu_thread_fn (arg=0x1323ff0) at
> /root/upstream/qemu/cpus.c:661
> #19 0x0000003ccda077e1 in start_thread () from /lib64/libpthread.so.0
> #20 0x0000003ccd2e151d in clone () from /lib64/libc.so.6
>
> I'm afraid that we can only avoid things like this reliably if we
> convert all devices to be direct users of AIO/coroutines. The current
> block layer infrastructure doesn't emulate the behaviour of bdrv_read
> accurately as bottom halves can be run in the nested main loop.
>
> For floppy, the following seems to be a quick fix (Lucas, Cleber, does
> this solve your problems?), though it's not very satisfying. And I'm not
> quite sure yet why it doesn't always happen with kill() in
> posix-aio-compat.c.
>
> diff --git a/hw/dma.c b/hw/dma.c
> index 8a7302a..1d3b6f1 100644
> --- a/hw/dma.c
> +++ b/hw/dma.c
> @@ -358,6 +358,13 @@ static void DMA_run (void)
>       struct dma_cont *d;
>       int icont, ichan;
>       int rearm = 0;
> +    static int running = 0;
> +
> +    if (running) {
> +        goto out;
> +    } else {
> +        running = 0;
> +    }
>
>       d = dma_controllers;
>
> @@ -374,6 +381,8 @@ static void DMA_run (void)
>           }
>       }
>
> +out:
> +    running = 0;
>       if (rearm)
>           qemu_bh_schedule_idle(dma_bh);
>   }
>
> Kevin

Kevin,

In my quick test (compiling qemu.git master + your dma patch, and 
running a FreeDOS floppy image) it does not make any visible difference.

The boot is still stuck after printing "FreeDOS" at the console.

PS: We will trigger a full-blown test, with a Windows installation using 
a floppy, but the results with the FreeDOS floppy have been very 
consistent with the full-blown test.
Kevin Wolf - Oct. 28, 2011, 12:29 p.m.
On 28.10.2011 13:50, Paolo Bonzini wrote:
> On 10/28/2011 01:33 PM, Kevin Wolf wrote:
>> I'm afraid that we can only avoid things like this reliably if we
>> convert all devices to be direct users of AIO/coroutines. The current
>> block layer infrastructure doesn't emulate the behaviour of bdrv_read
>> accurately as bottom halves can be run in the nested main loop.
>>
>> For floppy, the following seems to be a quick fix (Lucas, Cleber, does
>> this solve your problems?), though it's not very satisfying. And I'm not
>> quite sure yet why it doesn't always happen with kill() in
>> posix-aio-compat.c.
> 
> Another "fix" is to change idle bottom halves (at least the one in 
> hw/dma.c) to 10ms timers.

Which would be using the fact that timers are only executed in the real
main loop. Which makes me wonder if it would be enough for floppy if we
changed qemu_bh_poll() to take a bool run_idle_bhs that would be true in
the main loop and false in qemu_aio_wait().

Still this wouldn't be a general solution as normal BHs have the very
same problem if they are scheduled before a bdrv_read/write call. To
solve that I guess we'd have to reintroduce AsyncContext, but it has its
own problems and was removed for a reason.

Or we make some serious effort now to convert devices to AIO.

Kevin
Stefan Hajnoczi - Oct. 28, 2011, 12:31 p.m.
On Fri, Oct 28, 2011 at 1:29 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 28.10.2011 13:50, Paolo Bonzini wrote:
>> On 10/28/2011 01:33 PM, Kevin Wolf wrote:
>>> I'm afraid that we can only avoid things like this reliably if we
>>> convert all devices to be direct users of AIO/coroutines. The current
>>> block layer infrastructure doesn't emulate the behaviour of bdrv_read
>>> accurately as bottom halves can be run in the nested main loop.
>>>
>>> For floppy, the following seems to be a quick fix (Lucas, Cleber, does
>>> this solve your problems?), though it's not very satisfying. And I'm not
>>> quite sure yet why it doesn't always happen with kill() in
>>> posix-aio-compat.c.
>>
>> Another "fix" is to change idle bottom halves (at least the one in
>> hw/dma.c) to 10ms timers.
>
> Which would be using the fact that timers are only executed in the real
> main loop. Which makes me wonder if it would be enough for floppy if we
> changed qemu_bh_poll() to take a bool run_idle_bhs that would be true in
> the main loop and false in qemu_aio_wait().
>
> Still this wouldn't be a general solution as normal BHs have the very
> same problem if they are scheduled before a bdrv_read/write call. To
> solve that I guess we'd have to reintroduce AsyncContext, but it has its
> own problems and was removed for a reason.
>
> Or we make some serious effort now to convert devices to AIO.

Zhi Yong: We were just talking about converting devices to aio.  If
you have time to do that for fdc, sd, or any other synchronous API
users in hw/ that would be helpful.  Please let us know which device
you are refactoring so we don't duplicate work.

Stefan
Paolo Bonzini - Oct. 28, 2011, 3:58 p.m.
On 10/28/2011 02:31 PM, Stefan Hajnoczi wrote:
> Zhi Yong: We were just talking about converting devices to aio.  If
> you have time to do that for fdc, sd, or any other synchronous API
> users in hw/ that would be helpful.  Please let us know which device
> you are refactoring so we don't duplicate work.

The problem is not really fdc or sd themselves, but whoever uses the 
result of the synchronous reads---respectively DMA and the SD clients.

Some SD clients talk to the SD card in a relatively confined way and 
have interrupts that they set when the operation is done, so these 
confined parts that talk to the card could be changed to a coroutine and 
locked with a CoMutex.  However, not even all of these can do it (in 
particular, I'm not sure that ssi-sd.c can).

I'm thinking that the problem with the floppy is really that it mixes 
synchronous and asynchronous parts.  As long as you're entirely 
synchronous you should not have any problem, but as soon as you add 
asynchronicity (via bottom halves) you now have to deal with reentrancy.

"git grep _bh hw/" suggests that this should not be a huge problem; most 
if not all occurrences are related to ptimers, or are in entirely 
asynchronous code (IDE, SCSI, virtio).  Floppy+DMA seems to be the only 
problematic occurrence, and any fix (switch to timers, drop idle BH in 
qemu_aio_wait, reschedule if DMA reenters during I/O, drop BH completely 
and just loop) is as good as the others.

(Actually, another one worth checking is ATAPI, but I don't know the 
code and the standards well enough).

Paolo
Zhi Yong Wu - Oct. 31, 2011, 2:10 a.m.
On Fri, Oct 28, 2011 at 01:31:20PM +0100, Stefan Hajnoczi wrote:
>On Fri, Oct 28, 2011 at 1:29 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> On 28.10.2011 13:50, Paolo Bonzini wrote:
>>> On 10/28/2011 01:33 PM, Kevin Wolf wrote:
>>>> I'm afraid that we can only avoid things like this reliably if we
>>>> convert all devices to be direct users of AIO/coroutines. The current
>>>> block layer infrastructure doesn't emulate the behaviour of bdrv_read
>>>> accurately as bottom halves can be run in the nested main loop.
>>>>
>>>> For floppy, the following seems to be a quick fix (Lucas, Cleber, does
>>>> this solve your problems?), though it's not very satisfying. And I'm not
>>>> quite sure yet why it doesn't always happen with kill() in
>>>> posix-aio-compat.c.
>>>
>>> Another "fix" is to change idle bottom halves (at least the one in
>>> hw/dma.c) to 10ms timers.
>>
>> Which would be using the fact that timers are only executed in the real
>> main loop. Which makes me wonder if it would be enough for floppy if we
>> changed qemu_bh_poll() to take a bool run_idle_bhs that would be true in
>> the main loop and false in qemu_aio_wait().
>>
>> Still this wouldn't be a general solution as normal BHs have the very
>> same problem if they are scheduled before a bdrv_read/write call. To
>> solve that I guess we'd have to reintroduce AsyncContext, but it has its
>> own problems and was removed for a reason.
>>
>> Or we make some serious effort now to convert devices to AIO.
>
>Zhi Yong: We were just talking about converting devices to aio.  If
>you have time to do that for fdc, sd, or any other synchronous API
>users in hw/ that would be helpful.  Please let us know which device
>you are refactoring so we don't duplicate work.
Stefan,
I am working on flash (OneNAND, CFI), cdrom, sd, fdc, etc. If anyone has good thoughts, please let me know. :)


Regards,

Zhi Yong Wu
>
>Stefan
>

Patch

diff --git a/hw/dma.c b/hw/dma.c
index 8a7302a..1d3b6f1 100644
--- a/hw/dma.c
+++ b/hw/dma.c
@@ -358,6 +358,13 @@  static void DMA_run (void)
     struct dma_cont *d;
     int icont, ichan;
     int rearm = 0;
+    static int running = 0;
+
+    if (running) {
+        goto out;
+    } else {
+        running = 0;
+    }

     d = dma_controllers;

@@ -374,6 +381,8 @@  static void DMA_run (void)
         }
     }

+out:
+    running = 0;
     if (rearm)
         qemu_bh_schedule_idle(dma_bh);
 }