Patchwork block: add the support to drain throttled requests

login
register
mail settings
Submitter Zhiyong Wu
Date March 13, 2012, 1:53 a.m.
Message ID <1331603611-9103-1-git-send-email-zwu.kernel@gmail.com>
Download mbox | patch
Permalink /patch/146325/
State New
Headers show

Comments

Zhiyong Wu - March 13, 2012, 1:53 a.m.
From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
[ Iterate until all block devices have processed all requests,
 add comments. - Paolo ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c |   24 ++++++++++++++++++++++--
 1 files changed, 22 insertions(+), 2 deletions(-)
Zhiyong Wu - March 20, 2012, 9:40 a.m.
HI, Kevin,

We hope that I/O throttling can be shipped without known issue in QEMU
1.1, so if you are available, can you give this patch some love?

On Tue, Mar 13, 2012 at 9:53 AM,  <zwu.kernel@gmail.com> wrote:
> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>
> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> [ Iterate until all block devices have processed all requests,
>  add comments. - Paolo ]
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block.c |   24 ++++++++++++++++++++++--
>  1 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/block.c b/block.c
> index 52ffe14..52dabd0 100644
> --- a/block.c
> +++ b/block.c
> @@ -858,12 +858,32 @@ void bdrv_close_all(void)
>  *
>  * This function does not flush data to disk, use bdrv_flush_all() for that
>  * after calling this function.
> - */
> + *
> + * Note that completion of an asynchronous I/O operation can trigger any
> + * number of other I/O operations on other devices---for example a coroutine
> + * can be arbitrarily complex and a constant flow of I/O to multiple devices
> + * can come until the coroutine is complete.  Because of this, it is not
> + * possible to drain a single device's I/O queue.
> +*/
>  void bdrv_drain_all(void)
>  {
>     BlockDriverState *bs;
> +    bool busy;
>
> -    qemu_aio_flush();
> +    do {
> +        busy = false;
> +        qemu_aio_flush();
> +
> +        /* FIXME: We do not have timer support here, so this is effectively
> +         * a busy wait.
> +         */
> +        QTAILQ_FOREACH(bs, &bdrv_states, list) {
> +            if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
> +                qemu_co_queue_restart_all(&bs->throttled_reqs);
> +                busy = true;
> +            }
> +        }
> +    } while (busy);
>
>     /* If requests are still pending there is a bug somewhere */
>     QTAILQ_FOREACH(bs, &bdrv_states, list) {
> --
> 1.7.6
>
Paolo Bonzini - March 20, 2012, 9:47 a.m.
Il 20/03/2012 10:40, Zhi Yong Wu ha scritto:
> HI, Kevin,
> 
> We hope that I/O throttling can be shipped without known issue in QEMU
> 1.1, so if you are available, can you give this patch some love?

I'm sorry to say this, but I think I/O throttling is impossible to save.
 As it is implemented now, it just cannot work in the presence of
synchronous I/O, except at the cost of busy waiting with the global
mutex taken.  See the message from Stefan yesterday.

Unfortunately I don't have any solution for this, except perhaps
disabling throttling around synchronous I/O.

Paolo
Kevin Wolf - March 20, 2012, 9:56 a.m.
Am 20.03.2012 10:40, schrieb Zhi Yong Wu:
> HI, Kevin,
> 
> We hope that I/O throttling can be shipped without known issue in QEMU
> 1.1, so if you are available, can you give this patch some love?

Sorry, haven't had the time to follow the discussion closely. Are all
review comments addressed now? Paolo?

Kevin
Kevin Wolf - March 20, 2012, 9:58 a.m.
Am 20.03.2012 10:47, schrieb Paolo Bonzini:
> Il 20/03/2012 10:40, Zhi Yong Wu ha scritto:
>> HI, Kevin,
>>
>> We hope that I/O throttling can be shipped without known issue in QEMU
>> 1.1, so if you are available, can you give this patch some love?
> 
> I'm sorry to say this, but I think I/O throttling is impossible to save.
>  As it is implemented now, it just cannot work in the presence of
> synchronous I/O, except at the cost of busy waiting with the global
> mutex taken.  See the message from Stefan yesterday.

qemu_aio_flush() is busy waiting with the global mutex taken anyway, so
it doesn't change that much.

Kevin
Stefan Hajnoczi - March 20, 2012, 11:44 a.m.
On Tue, Mar 20, 2012 at 10:58:10AM +0100, Kevin Wolf wrote:
> Am 20.03.2012 10:47, schrieb Paolo Bonzini:
> > Il 20/03/2012 10:40, Zhi Yong Wu ha scritto:
> >> HI, Kevin,
> >>
> >> We hope that I/O throttling can be shipped without known issue in QEMU
> >> 1.1, so if you are available, can you give this patch some love?
> > 
> > I'm sorry to say this, but I think I/O throttling is impossible to save.
> >  As it is implemented now, it just cannot work in the presence of
> > synchronous I/O, except at the cost of busy waiting with the global
> > mutex taken.  See the message from Stefan yesterday.
> 
> qemu_aio_flush() is busy waiting with the global mutex taken anyway, so
> it doesn't change that much.

Yesterday I only posted an analysis of the bug but here are some
thoughts on how to move forward.  Throttling itself is not the problem.
We've known that synchronous operations in the vcpu thread are a problem
long before throttling.  This is just another reason to convert device
emulation to use asynchronous interfaces.

Here is the list of device models that perform synchronous block I/O:
hw/fdc.c
hw/ide/atapi.c
hw/ide/core.c
hw/nand.c
hw/onenand.c
hw/pflash_cfi01.c
hw/pflash_cfi02.c
hw/sd.c

Zhi Hui Li is working on hw/fdc.c and recently sent a patch.

I think it's too close to QEMU 1.1 to convert all the remaining devices
and test them properly before the soft-freeze.  But it's probably
possible to convert IDE before the soft-freeze.

In the meantime we could add this to bdrv_rw_co():

if (bs->io_limits_enabled) {
    fprintf(stderr, "Disabling I/O throttling on '%s' due "
            "to synchronous I/O\n", bdrv_get_device_name(bs));
    bdrv_io_limits_disable(bs);
}

It's not pretty but tells the user there is an issue and avoids
deadlocking.

Stefan
Chris Webb - March 22, 2012, 7:07 p.m.
Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> writes:

> Yesterday I only posted an analysis of the bug but here are some
> thoughts on how to move forward.  Throttling itself is not the problem.
> We've known that synchronous operations in the vcpu thread are a problem
> long before throttling.  This is just another reason to convert device
> emulation to use asynchronous interfaces.
> 
> Here is the list of device models that perform synchronous block I/O:
> hw/fdc.c
> hw/ide/atapi.c
> hw/ide/core.c
> hw/nand.c
> hw/onenand.c
> hw/pflash_cfi01.c
> hw/pflash_cfi02.c
> hw/sd.c
> 
> Zhi Hui Li is working on hw/fdc.c and recently sent a patch.
> 
> I think it's too close to QEMU 1.1 to convert all the remaining devices
> and test them properly before the soft-freeze.  But it's probably
> possible to convert IDE before the soft-freeze.

IDE is the only of these that would affect us as a typical user of
throttling. The others aren't really the kind of devices which you'd be
using in a hosting setting in any case.

However, being able to throttle ide and virtio drives to limit the harm one
VM can cause to another on the same host would make an enormous difference
to us, and for what it's worth, we're desperately keen to see this problem
and the assert() issue fixed so we can deploy this feature, back-porting it
if necessary.

Just to reiterate a previous offer: we'd be very happy to pay for someone's
contracting time to work on this if that would help. The feature is great
and works well apart from the risk of these ide deadlocks and asserts.
Unfortunately, ide represents the bulk of our VMs at the moment.

Best wishes,

Chris.
Stefan Hajnoczi - March 23, 2012, 10:38 a.m.
On Thu, Mar 22, 2012 at 07:07:52PM +0000, Chris Webb wrote:
> Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> writes:
> 
> > Yesterday I only posted an analysis of the bug but here are some
> > thoughts on how to move forward.  Throttling itself is not the problem.
> > We've known that synchronous operations in the vcpu thread are a problem
> > long before throttling.  This is just another reason to convert device
> > emulation to use asynchronous interfaces.
> > 
> > Here is the list of device models that perform synchronous block I/O:
> > hw/fdc.c
> > hw/ide/atapi.c
> > hw/ide/core.c
> > hw/nand.c
> > hw/onenand.c
> > hw/pflash_cfi01.c
> > hw/pflash_cfi02.c
> > hw/sd.c
> > 
> > Zhi Hui Li is working on hw/fdc.c and recently sent a patch.
> > 
> > I think it's too close to QEMU 1.1 to convert all the remaining devices
> > and test them properly before the soft-freeze.  But it's probably
> > possible to convert IDE before the soft-freeze.
> 
> IDE is the only of these that would affect us as a typical user of
> throttling. The others aren't really the kind of devices which you'd be
> using in a hosting setting in any case.

Can you check whether your Windows guest has DMA or PIO mode enabled?

http://msdn.microsoft.com/en-us/windows/hardware/gg463526

Stefan
Chris Webb - March 23, 2012, 10:43 a.m.
Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> writes:

> On Thu, Mar 22, 2012 at 07:07:52PM +0000, Chris Webb wrote:
> > Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> writes:
> > 
> > > Yesterday I only posted an analysis of the bug but here are some
> > > thoughts on how to move forward.  Throttling itself is not the problem.
> > > We've known that synchronous operations in the vcpu thread are a problem
> > > long before throttling.  This is just another reason to convert device
> > > emulation to use asynchronous interfaces.
> > > 
> > > Here is the list of device models that perform synchronous block I/O:
> > > hw/fdc.c
> > > hw/ide/atapi.c
> > > hw/ide/core.c
> > > hw/nand.c
> > > hw/onenand.c
> > > hw/pflash_cfi01.c
> > > hw/pflash_cfi02.c
> > > hw/sd.c
> > > 
> > > Zhi Hui Li is working on hw/fdc.c and recently sent a patch.
> > > 
> > > I think it's too close to QEMU 1.1 to convert all the remaining devices
> > > and test them properly before the soft-freeze.  But it's probably
> > > possible to convert IDE before the soft-freeze.
> > 
> > IDE is the only of these that would affect us as a typical user of
> > throttling. The others aren't really the kind of devices which you'd be
> > using in a hosting setting in any case.
> 
> Can you check whether your Windows guest has DMA or PIO mode enabled?
> 
> http://msdn.microsoft.com/en-us/windows/hardware/gg463526

Hi. We were producing the IDE assert()s and deadlocks with linux kernels.
Although I believe the same symptoms exist on windows, I haven't actually
tested it myself. Typically they would show up in the 16-bit bootloader
code, even before the 32-bit OS has started.

Cheers,

Chris.
Stefan Hajnoczi - March 23, 2012, 10:50 a.m.
On Fri, Mar 23, 2012 at 10:43 AM, Chris Webb <chris@arachsys.com> wrote:
> Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> writes:
>
>> On Thu, Mar 22, 2012 at 07:07:52PM +0000, Chris Webb wrote:
>> > Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> writes:
>> >
>> > > Yesterday I only posted an analysis of the bug but here are some
>> > > thoughts on how to move forward.  Throttling itself is not the problem.
>> > > We've known that synchronous operations in the vcpu thread are a problem
>> > > long before throttling.  This is just another reason to convert device
>> > > emulation to use asynchronous interfaces.
>> > >
>> > > Here is the list of device models that perform synchronous block I/O:
>> > > hw/fdc.c
>> > > hw/ide/atapi.c
>> > > hw/ide/core.c
>> > > hw/nand.c
>> > > hw/onenand.c
>> > > hw/pflash_cfi01.c
>> > > hw/pflash_cfi02.c
>> > > hw/sd.c
>> > >
>> > > Zhi Hui Li is working on hw/fdc.c and recently sent a patch.
>> > >
>> > > I think it's too close to QEMU 1.1 to convert all the remaining devices
>> > > and test them properly before the soft-freeze.  But it's probably
>> > > possible to convert IDE before the soft-freeze.
>> >
>> > IDE is the only of these that would affect us as a typical user of
>> > throttling. The others aren't really the kind of devices which you'd be
>> > using in a hosting setting in any case.
>>
>> Can you check whether your Windows guest has DMA or PIO mode enabled?
>>
>> http://msdn.microsoft.com/en-us/windows/hardware/gg463526
>
> Hi. We were producing the IDE assert()s and deadlocks with linux kernels.
> Although I believe the same symptoms exist on windows, I haven't actually
> tested it myself. Typically they would show up in the 16-bit bootloader
> code, even before the 32-bit OS has started.

Okay, that makes sense.  Bootloaders and the BIOS may use the simplest
driver interface - which may be PIO in the case.  I asked because the
IDE DMA code path should work with I/O throttling and Windows is known
for sometimes falling back to the PIO code path when some heuristics
trigger.

Stefan
Richard Davies - March 23, 2012, 11:02 a.m.
Stefan Hajnoczi wrote:
> > Hi. We were producing the IDE assert()s and deadlocks with linux kernels.
> > Although I believe the same symptoms exist on windows, I haven't actually
> > tested it myself. Typically they would show up in the 16-bit bootloader
> > code, even before the 32-bit OS has started.
>
> Okay, that makes sense.  Bootloaders and the BIOS may use the simplest
> driver interface - which may be PIO in the case.  I asked because the
> IDE DMA code path should work with I/O throttling and Windows is known
> for sometimes falling back to the PIO code path when some heuristics
> trigger.

Whilst the bootloader was the easiest place for us to replicate this
deadlock, we have also seen it with running Linux VMs.

Older Linux kernels (e.g. CentOS 5) will fall back to PIO mode on IDE
devices if they experience storage timeouts (e.g. due to heavy contention of
underlying storage from other VMs).

Hence, the IO limits deadlock can lead to running Linux VMs locking up as
well as just Windows and Linux bootloaders.

Cheers,

Richard.
Stefan Hajnoczi - March 23, 2012, 11:32 a.m.
On Fri, Mar 23, 2012 at 11:02 AM, Richard Davies <richard@arachsys.com> wrote:
> Stefan Hajnoczi wrote:
>> > Hi. We were producing the IDE assert()s and deadlocks with linux kernels.
>> > Although I believe the same symptoms exist on windows, I haven't actually
>> > tested it myself. Typically they would show up in the 16-bit bootloader
>> > code, even before the 32-bit OS has started.
>>
>> Okay, that makes sense.  Bootloaders and the BIOS may use the simplest
>> driver interface - which may be PIO in the case.  I asked because the
>> IDE DMA code path should work with I/O throttling and Windows is known
>> for sometimes falling back to the PIO code path when some heuristics
>> trigger.
>
> Whilst the bootloader was the easiest place for us to replicate this
> deadlock, we have also seen it with running Linux VMs.
>
> Older Linux kernels (e.g. CentOS 5) will fall back to PIO mode on IDE
> devices if they experience storage timeouts (e.g. due to heavy contention of
> underlying storage from other VMs).
>
> Hence, the IO limits deadlock can lead to running Linux VMs locking up as
> well as just Windows and Linux bootloaders.

Thanks for pointing out that Linux falls back too.

Stefan
Stefan Hajnoczi - March 23, 2012, 4:56 p.m.
On Fri, Mar 23, 2012 at 11:32 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Fri, Mar 23, 2012 at 11:02 AM, Richard Davies <richard@arachsys.com> wrote:
>> Stefan Hajnoczi wrote:
>>> > Hi. We were producing the IDE assert()s and deadlocks with linux kernels.
>>> > Although I believe the same symptoms exist on windows, I haven't actually
>>> > tested it myself. Typically they would show up in the 16-bit bootloader
>>> > code, even before the 32-bit OS has started.
>>>
>>> Okay, that makes sense.  Bootloaders and the BIOS may use the simplest
>>> driver interface - which may be PIO in the case.  I asked because the
>>> IDE DMA code path should work with I/O throttling and Windows is known
>>> for sometimes falling back to the PIO code path when some heuristics
>>> trigger.
>>
>> Whilst the bootloader was the easiest place for us to replicate this
>> deadlock, we have also seen it with running Linux VMs.
>>
>> Older Linux kernels (e.g. CentOS 5) will fall back to PIO mode on IDE
>> devices if they experience storage timeouts (e.g. due to heavy contention of
>> underlying storage from other VMs).
>>
>> Hence, the IO limits deadlock can lead to running Linux VMs locking up as
>> well as just Windows and Linux bootloaders.
>
> Thanks for pointing out that Linux falls back too.

I have a prototype that converts hw/ide/core.c to asynchronous I/O
functions.  It still needs to be cleaned up and tested against Windows
(Linux with libata.dma=0 is happy).  Patches will be sent next week.

Stefan
Stefan Hajnoczi - March 26, 2012, 2:21 p.m.
On Tue, Mar 20, 2012 at 11:44 AM, Stefan Hajnoczi
<stefanha@linux.vnet.ibm.com> wrote:
> On Tue, Mar 20, 2012 at 10:58:10AM +0100, Kevin Wolf wrote:
>> Am 20.03.2012 10:47, schrieb Paolo Bonzini:
>> > Il 20/03/2012 10:40, Zhi Yong Wu ha scritto:
>> >> HI, Kevin,
>> >>
>> >> We hope that I/O throttling can be shipped without known issue in QEMU
>> >> 1.1, so if you are available, can you give this patch some love?
>> >
>> > I'm sorry to say this, but I think I/O throttling is impossible to save.
>> >  As it is implemented now, it just cannot work in the presence of
>> > synchronous I/O, except at the cost of busy waiting with the global
>> > mutex taken.  See the message from Stefan yesterday.
>>
>> qemu_aio_flush() is busy waiting with the global mutex taken anyway, so
>> it doesn't change that much.
>
> Yesterday I only posted an analysis of the bug but here are some
> thoughts on how to move forward.  Throttling itself is not the problem.
> We've known that synchronous operations in the vcpu thread are a problem
> long before throttling.  This is just another reason to convert device
> emulation to use asynchronous interfaces.
>
> Here is the list of device models that perform synchronous block I/O:
> hw/fdc.c
> hw/ide/atapi.c
> hw/ide/core.c
> hw/nand.c
> hw/onenand.c
> hw/pflash_cfi01.c
> hw/pflash_cfi02.c
> hw/sd.c
>
> Zhi Hui Li is working on hw/fdc.c and recently sent a patch.
>
> I think it's too close to QEMU 1.1 to convert all the remaining devices
> and test them properly before the soft-freeze.  But it's probably
> possible to convert IDE before the soft-freeze.
>
> In the meantime we could add this to bdrv_rw_co():
>
> if (bs->io_limits_enabled) {
>    fprintf(stderr, "Disabling I/O throttling on '%s' due "
>            "to synchronous I/O\n", bdrv_get_device_name(bs));
>    bdrv_io_limits_disable(bs);
> }
>
> It's not pretty but tells the user there is an issue and avoids
> deadlocking.

No one has commented on this suggestion.  I think leaving a known hang
in QEMU 1.1 is undesirable.  Better to have this warning and disable
throttling in the case we cannot support right now.

Kevin: Would you accept a patch like this?  Or do you have another
solution in mind?

Stefan
Kevin Wolf - March 26, 2012, 2:31 p.m.
Am 26.03.2012 16:21, schrieb Stefan Hajnoczi:
> On Tue, Mar 20, 2012 at 11:44 AM, Stefan Hajnoczi
> <stefanha@linux.vnet.ibm.com> wrote:
>> On Tue, Mar 20, 2012 at 10:58:10AM +0100, Kevin Wolf wrote:
>>> Am 20.03.2012 10:47, schrieb Paolo Bonzini:
>>>> Il 20/03/2012 10:40, Zhi Yong Wu ha scritto:
>>>>> HI, Kevin,
>>>>>
>>>>> We hope that I/O throttling can be shipped without known issue in QEMU
>>>>> 1.1, so if you are available, can you give this patch some love?
>>>>
>>>> I'm sorry to say this, but I think I/O throttling is impossible to save.
>>>>  As it is implemented now, it just cannot work in the presence of
>>>> synchronous I/O, except at the cost of busy waiting with the global
>>>> mutex taken.  See the message from Stefan yesterday.
>>>
>>> qemu_aio_flush() is busy waiting with the global mutex taken anyway, so
>>> it doesn't change that much.
>>
>> Yesterday I only posted an analysis of the bug but here are some
>> thoughts on how to move forward.  Throttling itself is not the problem.
>> We've known that synchronous operations in the vcpu thread are a problem
>> long before throttling.  This is just another reason to convert device
>> emulation to use asynchronous interfaces.
>>
>> Here is the list of device models that perform synchronous block I/O:
>> hw/fdc.c
>> hw/ide/atapi.c
>> hw/ide/core.c
>> hw/nand.c
>> hw/onenand.c
>> hw/pflash_cfi01.c
>> hw/pflash_cfi02.c
>> hw/sd.c
>>
>> Zhi Hui Li is working on hw/fdc.c and recently sent a patch.
>>
>> I think it's too close to QEMU 1.1 to convert all the remaining devices
>> and test them properly before the soft-freeze.  But it's probably
>> possible to convert IDE before the soft-freeze.
>>
>> In the meantime we could add this to bdrv_rw_co():
>>
>> if (bs->io_limits_enabled) {
>>    fprintf(stderr, "Disabling I/O throttling on '%s' due "
>>            "to synchronous I/O\n", bdrv_get_device_name(bs));
>>    bdrv_io_limits_disable(bs);
>> }
>>
>> It's not pretty but tells the user there is an issue and avoids
>> deadlocking.
> 
> No one has commented on this suggestion.  I think leaving a known hang
> in QEMU 1.1 is undesirable.  Better to have this warning and disable
> throttling in the case we cannot support right now.
> 
> Kevin: Would you accept a patch like this?  Or do you have another
> solution in mind?

No, I don't have any clever short-term solution to offer. Put in
whatever hack you think works best for 1.1.

Kevin
Zhiyong Wu - March 27, 2012, 4:29 a.m.
On Mon, Mar 26, 2012 at 10:21 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Tue, Mar 20, 2012 at 11:44 AM, Stefan Hajnoczi
> <stefanha@linux.vnet.ibm.com> wrote:
>> On Tue, Mar 20, 2012 at 10:58:10AM +0100, Kevin Wolf wrote:
>>> Am 20.03.2012 10:47, schrieb Paolo Bonzini:
>>> > Il 20/03/2012 10:40, Zhi Yong Wu ha scritto:
>>> >> HI, Kevin,
>>> >>
>>> >> We hope that I/O throttling can be shipped without known issue in QEMU
>>> >> 1.1, so if you are available, can you give this patch some love?
>>> >
>>> > I'm sorry to say this, but I think I/O throttling is impossible to save.
>>> >  As it is implemented now, it just cannot work in the presence of
>>> > synchronous I/O, except at the cost of busy waiting with the global
>>> > mutex taken.  See the message from Stefan yesterday.
>>>
>>> qemu_aio_flush() is busy waiting with the global mutex taken anyway, so
>>> it doesn't change that much.
>>
>> Yesterday I only posted an analysis of the bug but here are some
>> thoughts on how to move forward.  Throttling itself is not the problem.
>> We've known that synchronous operations in the vcpu thread are a problem
>> long before throttling.  This is just another reason to convert device
>> emulation to use asynchronous interfaces.
>>
>> Here is the list of device models that perform synchronous block I/O:
>> hw/fdc.c
>> hw/ide/atapi.c
>> hw/ide/core.c
>> hw/nand.c
>> hw/onenand.c
>> hw/pflash_cfi01.c
>> hw/pflash_cfi02.c
>> hw/sd.c
>>
>> Zhi Hui Li is working on hw/fdc.c and recently sent a patch.
>>
>> I think it's too close to QEMU 1.1 to convert all the remaining devices
>> and test them properly before the soft-freeze.  But it's probably
>> possible to convert IDE before the soft-freeze.
>>
>> In the meantime we could add this to bdrv_rw_co():
>>
>> if (bs->io_limits_enabled) {
>>    fprintf(stderr, "Disabling I/O throttling on '%s' due "
>>            "to synchronous I/O\n", bdrv_get_device_name(bs));
>>    bdrv_io_limits_disable(bs);
>> }
>>
>> It's not pretty but tells the user there is an issue and avoids
>> deadlocking.
>
> No one has commented on this suggestion.  I think leaving a known hang
> in QEMU 1.1 is undesirable.  Better to have this warning and disable
> throttling in the case we cannot support right now.
IDE is using both sync API and async API when guest boot.
>
> Kevin: Would you accept a patch like this?  Or do you have another
> solution in mind?
>
> Stefan
Stefan Hajnoczi - March 27, 2012, 6:52 a.m.
On Tue, Mar 27, 2012 at 12:29:09PM +0800, Zhi Yong Wu wrote:
> On Mon, Mar 26, 2012 at 10:21 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Tue, Mar 20, 2012 at 11:44 AM, Stefan Hajnoczi
> > <stefanha@linux.vnet.ibm.com> wrote:
> >> On Tue, Mar 20, 2012 at 10:58:10AM +0100, Kevin Wolf wrote:
> >>> Am 20.03.2012 10:47, schrieb Paolo Bonzini:
> >>> > Il 20/03/2012 10:40, Zhi Yong Wu ha scritto:
> >>> >> HI, Kevin,
> >>> >>
> >>> >> We hope that I/O throttling can be shipped without known issue in QEMU
> >>> >> 1.1, so if you are available, can you give this patch some love?
> >>> >
> >>> > I'm sorry to say this, but I think I/O throttling is impossible to save.
> >>> >  As it is implemented now, it just cannot work in the presence of
> >>> > synchronous I/O, except at the cost of busy waiting with the global
> >>> > mutex taken.  See the message from Stefan yesterday.
> >>>
> >>> qemu_aio_flush() is busy waiting with the global mutex taken anyway, so
> >>> it doesn't change that much.
> >>
> >> Yesterday I only posted an analysis of the bug but here are some
> >> thoughts on how to move forward.  Throttling itself is not the problem.
> >> We've known that synchronous operations in the vcpu thread are a problem
> >> long before throttling.  This is just another reason to convert device
> >> emulation to use asynchronous interfaces.
> >>
> >> Here is the list of device models that perform synchronous block I/O:
> >> hw/fdc.c
> >> hw/ide/atapi.c
> >> hw/ide/core.c
> >> hw/nand.c
> >> hw/onenand.c
> >> hw/pflash_cfi01.c
> >> hw/pflash_cfi02.c
> >> hw/sd.c
> >>
> >> Zhi Hui Li is working on hw/fdc.c and recently sent a patch.
> >>
> >> I think it's too close to QEMU 1.1 to convert all the remaining devices
> >> and test them properly before the soft-freeze.  But it's probably
> >> possible to convert IDE before the soft-freeze.
> >>
> >> In the meantime we could add this to bdrv_rw_co():
> >>
> >> if (bs->io_limits_enabled) {
> >>    fprintf(stderr, "Disabling I/O throttling on '%s' due "
> >>            "to synchronous I/O\n", bdrv_get_device_name(bs));
> >>    bdrv_io_limits_disable(bs);
> >> }
> >>
> >> It's not pretty but tells the user there is an issue and avoids
> >> deadlocking.
> >
> > No one has commented on this suggestion.  I think leaving a known hang
> > in QEMU 1.1 is undesirable.  Better to have this warning and disable
> > throttling in the case we cannot support right now.
> IDE is using both sync API and async API when guest boot.

Yes, a warning patch must handle the synchronous I/O case during startup
(guessing disk geometry), which I think happens even for virtio-blk.

Stefan

Patch

diff --git a/block.c b/block.c
index 52ffe14..52dabd0 100644
--- a/block.c
+++ b/block.c
@@ -858,12 +858,32 @@  void bdrv_close_all(void)
  *
  * This function does not flush data to disk, use bdrv_flush_all() for that
  * after calling this function.
- */
+ *
+ * Note that completion of an asynchronous I/O operation can trigger any
+ * number of other I/O operations on other devices---for example a coroutine
+ * can be arbitrarily complex and a constant flow of I/O to multiple devices
+ * can come until the coroutine is complete.  Because of this, it is not
+ * possible to drain a single device's I/O queue.
+*/
 void bdrv_drain_all(void)
 {
     BlockDriverState *bs;
+    bool busy;
 
-    qemu_aio_flush();
+    do {
+        busy = false;
+        qemu_aio_flush();
+
+        /* FIXME: We do not have timer support here, so this is effectively
+         * a busy wait.
+         */
+        QTAILQ_FOREACH(bs, &bdrv_states, list) {
+            if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
+                qemu_co_queue_restart_all(&bs->throttled_reqs);
+                busy = true;
+            }
+        }
+    } while (busy);
 
     /* If requests are still pending there is a bug somewhere */
     QTAILQ_FOREACH(bs, &bdrv_states, list) {