[v2,1/3] block: introduce bdrv_io_plug() and bdrv_io_unplug()

Message ID 1404201119-6646-2-git-send-email-ming.lei@canonical.com
State New

Commit Message

Ming Lei July 1, 2014, 7:51 a.m. UTC
This patch introduces these two APIs so that the following
patches can queue I/O requests and submit them in batches,
improving I/O performance.
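
A caller is expected to use them roughly like this (just a sketch; the
request array and its fields below are made up for illustration):

    int i;

    bdrv_io_plug(bs);
    for (i = 0; i < num_reqs; i++) {
        /* while plugged, the driver only queues these requests */
        bdrv_aio_writev(bs, reqs[i].sector, reqs[i].qiov,
                        reqs[i].nb_sectors, reqs[i].cb, reqs[i].opaque);
    }
    bdrv_io_unplug(bs);    /* submit everything queued above in one batch */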

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block.c                   |   21 +++++++++++++++++++++
 include/block/block.h     |    3 +++
 include/block/block_int.h |    4 ++++
 3 files changed, 28 insertions(+)

Comments

Kevin Wolf July 1, 2014, 11:18 a.m. UTC | #1
Am 01.07.2014 um 09:51 hat Ming Lei geschrieben:
> This patch introduces these two APIs so that following
> patches can support queuing I/O requests and submitting them
> at batch for improving I/O performance.
> 
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>  block.c                   |   21 +++++++++++++++++++++
>  include/block/block.h     |    3 +++
>  include/block/block_int.h |    4 ++++
>  3 files changed, 28 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 217f523..fea9e43 100644
> --- a/block.c
> +++ b/block.c
> @@ -1910,6 +1910,7 @@ void bdrv_drain_all(void)
>              bool bs_busy;
>  
>              aio_context_acquire(aio_context);
> +            bdrv_io_unplug(bs);
>              bdrv_start_throttled_reqs(bs);
>              bs_busy = bdrv_requests_pending(bs);
>              bs_busy |= aio_poll(aio_context, bs_busy);

This means that bdrv_io_plug/unplug() are not paired as I would have
expected. I find the name not very descriptive anyway (I probably
wouldn't have guessed what it does from its name if it weren't in this
series), so maybe we should consider renaming it?

Perhaps something like bdrv_req_batch_start() and
bdrv_req_batch_submit(), but I'm open for different suggestions.

> @@ -5774,3 +5775,23 @@ bool bdrv_is_first_non_filter(BlockDriverState *candidate)
>  
>      return false;
>  }
> +
> +void bdrv_io_plug(BlockDriverState *bs)
> +{
> +    BlockDriver *drv = bs->drv;
> +    if (drv && drv->bdrv_io_plug) {
> +        drv->bdrv_io_plug(bs);
> +    } else if (bs->file) {
> +        bdrv_io_plug(bs->file);
> +    }
> +}

Does this bs->file forwarding work for more than the raw driver? For
example, if drv is an image format driver that needs to read some
metadata from the image before it can submit the payload, does this
still do what you were intending?

Kevin
Ming Lei July 1, 2014, 1:31 p.m. UTC | #2
On Tue, Jul 1, 2014 at 7:18 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 01.07.2014 um 09:51 hat Ming Lei geschrieben:
>> This patch introduces these two APIs so that following
>> patches can support queuing I/O requests and submitting them
>> at batch for improving I/O performance.
>>
>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>> ---
>>  block.c                   |   21 +++++++++++++++++++++
>>  include/block/block.h     |    3 +++
>>  include/block/block_int.h |    4 ++++
>>  3 files changed, 28 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index 217f523..fea9e43 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -1910,6 +1910,7 @@ void bdrv_drain_all(void)
>>              bool bs_busy;
>>
>>              aio_context_acquire(aio_context);
>> +            bdrv_io_unplug(bs);
>>              bdrv_start_throttled_reqs(bs);
>>              bs_busy = bdrv_requests_pending(bs);
>>              bs_busy |= aio_poll(aio_context, bs_busy);
>
> This means that bdrv_io_plug/unplug() are not paired as I would have
> expected. I find the name not very descriptive anyway (I probably
> wouldn't have guessed what it does from its name if it weren't in this
> series), so maybe we should consider renaming it?
>
> Perhaps something like bdrv_req_batch_start() and
> bdrv_req_batch_submit(), but I'm open for different suggestions.

The term plug/unplug has been used in the block subsystem of the
Linux kernel for a long time, just like a pipe with a faucet, :-)
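
For readers who don't know the kernel code, the kernel-side pattern
looks roughly like this (simplified):

    struct blk_plug plug;

    blk_start_plug(&plug);
    /* bios submitted here are held on a per-task plug list ... */
    submit_bio(WRITE, bio);
    blk_finish_plug(&plug);    /* ... and are flushed to the driver here */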

>> @@ -5774,3 +5775,23 @@ bool bdrv_is_first_non_filter(BlockDriverState *candidate)
>>
>>      return false;
>>  }
>> +
>> +void bdrv_io_plug(BlockDriverState *bs)
>> +{
>> +    BlockDriver *drv = bs->drv;
>> +    if (drv && drv->bdrv_io_plug) {
>> +        drv->bdrv_io_plug(bs);
>> +    } else if (bs->file) {
>> +        bdrv_io_plug(bs->file);
>> +    }
>> +}
>
> Does this bs->file forwarding work for more than the raw driver? For
> example, if drv is an image format driver that needs to read some
> metadata from the image before it can submit the payload, does this
> still do what you were intending?

Yes, I think so. Even though the current implementation is just for
"fixing" dataplane and is kept as simple as possible, reads of both
metadata and payload are forwarded to bs->file, and bs->file->drv
should already support plug & unplug in the linux-aio case.


Thanks,
--
Ming Lei
Ming Lei July 1, 2014, 2:39 p.m. UTC | #3
On Tue, Jul 1, 2014 at 9:31 PM, Ming Lei <ming.lei@canonical.com> wrote:
> On Tue, Jul 1, 2014 at 7:18 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> Am 01.07.2014 um 09:51 hat Ming Lei geschrieben:
>>> This patch introduces these two APIs so that following
>>> patches can support queuing I/O requests and submitting them
>>> at batch for improving I/O performance.
>>>
>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>>> ---
>>>  block.c                   |   21 +++++++++++++++++++++
>>>  include/block/block.h     |    3 +++
>>>  include/block/block_int.h |    4 ++++
>>>  3 files changed, 28 insertions(+)
>>>
>>> diff --git a/block.c b/block.c
>>> index 217f523..fea9e43 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -1910,6 +1910,7 @@ void bdrv_drain_all(void)
>>>              bool bs_busy;
>>>
>>>              aio_context_acquire(aio_context);
>>> +            bdrv_io_unplug(bs);
>>>              bdrv_start_throttled_reqs(bs);
>>>              bs_busy = bdrv_requests_pending(bs);
>>>              bs_busy |= aio_poll(aio_context, bs_busy);
>>
>> This means that bdrv_io_plug/unplug() are not paired as I would have

Maybe a new interface like bdrv_io_flush() or bdrv_io_commit() would
be better for the above situation.
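
Something along these lines (only a sketch, name and callback not
final), which submits whatever has been queued without toggling the
plugged state:

    void bdrv_io_flush(BlockDriverState *bs)
    {
        BlockDriver *drv = bs->drv;

        if (drv && drv->bdrv_io_flush) {
            drv->bdrv_io_flush(bs);
        } else if (bs->file) {
            bdrv_io_flush(bs->file);
        }
    }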

>> expected. I find the name not very descriptive anyway (I probably
>> wouldn't have guessed what it does from its name if it weren't in this
>> series), so maybe we should consider renaming it?
>>
>> Perhaps something like bdrv_req_batch_start() and
>> bdrv_req_batch_submit(), but I'm open for different suggestions.
>
> The term of plug/unplug have been used in block subsystem of
> linux kernel for long time, just like pipe with faucet, :-)
>
>>> @@ -5774,3 +5775,23 @@ bool bdrv_is_first_non_filter(BlockDriverState *candidate)
>>>
>>>      return false;
>>>  }
>>> +
>>> +void bdrv_io_plug(BlockDriverState *bs)
>>> +{
>>> +    BlockDriver *drv = bs->drv;
>>> +    if (drv && drv->bdrv_io_plug) {
>>> +        drv->bdrv_io_plug(bs);
>>> +    } else if (bs->file) {
>>> +        bdrv_io_plug(bs->file);
>>> +    }
>>> +}
>>
>> Does this bs->file forwarding work for more than the raw driver? For
>> example, if drv is an image format driver that needs to read some
>> metadata from the image before it can submit the payload, does this
>> still do what you were intending?

Sorry for not understanding the problem earlier, and you are right:
these patches can't support other formats, and solving that dependency
would need changes to the image format drivers.


Thanks,
--
Ming Lei
Kevin Wolf July 1, 2014, 3:21 p.m. UTC | #4
Am 01.07.2014 um 16:39 hat Ming Lei geschrieben:
> On Tue, Jul 1, 2014 at 9:31 PM, Ming Lei <ming.lei@canonical.com> wrote:
> > On Tue, Jul 1, 2014 at 7:18 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> >> Am 01.07.2014 um 09:51 hat Ming Lei geschrieben:
> >>> This patch introduces these two APIs so that following
> >>> patches can support queuing I/O requests and submitting them
> >>> at batch for improving I/O performance.
> >>>
> >>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> >>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> >>> ---
> >>>  block.c                   |   21 +++++++++++++++++++++
> >>>  include/block/block.h     |    3 +++
> >>>  include/block/block_int.h |    4 ++++
> >>>  3 files changed, 28 insertions(+)
> >>>
> >>> diff --git a/block.c b/block.c
> >>> index 217f523..fea9e43 100644
> >>> --- a/block.c
> >>> +++ b/block.c
> >>> @@ -1910,6 +1910,7 @@ void bdrv_drain_all(void)
> >>>              bool bs_busy;
> >>>
> >>>              aio_context_acquire(aio_context);
> >>> +            bdrv_io_unplug(bs);
> >>>              bdrv_start_throttled_reqs(bs);
> >>>              bs_busy = bdrv_requests_pending(bs);
> >>>              bs_busy |= aio_poll(aio_context, bs_busy);
> >>
> >> This means that bdrv_io_plug/unplug() are not paired as I would have
> 
> Maybe new interface of bdrv_io_flush() or bdrv_io_commit() is better
> for the above situation.
> 
> >> expected. I find the name not very descriptive anyway (I probably
> >> wouldn't have guessed what it does from its name if it weren't in this
> >> series), so maybe we should consider renaming it?
> >>
> >> Perhaps something like bdrv_req_batch_start() and
> >> bdrv_req_batch_submit(), but I'm open for different suggestions.
> >
> > The term of plug/unplug have been used in block subsystem of
> > linux kernel for long time, just like pipe with faucet, :-)

Fair enough. I don't think it's obvious when you don't know the kernel
code, but consistency might be more important.

> >>> @@ -5774,3 +5775,23 @@ bool bdrv_is_first_non_filter(BlockDriverState *candidate)
> >>>
> >>>      return false;
> >>>  }
> >>> +
> >>> +void bdrv_io_plug(BlockDriverState *bs)
> >>> +{
> >>> +    BlockDriver *drv = bs->drv;
> >>> +    if (drv && drv->bdrv_io_plug) {
> >>> +        drv->bdrv_io_plug(bs);
> >>> +    } else if (bs->file) {
> >>> +        bdrv_io_plug(bs->file);
> >>> +    }
> >>> +}
> >>
> >> Does this bs->file forwarding work for more than the raw driver? For
> >> example, if drv is an image format driver that needs to read some
> >> metadata from the image before it can submit the payload, does this
> >> still do what you were intending?
> 
> Sorry for not understanding the problem, and you are right, these
> patches can't support other formats, and for solving the dependency,
> changes to image format driver should be needed.

Then let's drop the bs->file recursion here and add an explicit
.bdrv_io_plug/unplug callback to the raw driver.
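
Something like this in the raw format driver (sketch only, assuming
the generic bdrv_io_plug()/bdrv_io_unplug() then only look at drv):

    static void raw_io_plug(BlockDriverState *bs)
    {
        bdrv_io_plug(bs->file);
    }

    static void raw_io_unplug(BlockDriverState *bs)
    {
        bdrv_io_unplug(bs->file);
    }

plus the corresponding .bdrv_io_plug/.bdrv_io_unplug entries in bdrv_raw.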

Kevin
Paolo Bonzini July 1, 2014, 4:56 p.m. UTC | #5
Il 01/07/2014 17:21, Kevin Wolf ha scritto:
>>>> Does this bs->file forwarding work for more than the raw driver? For
>>>> example, if drv is an image format driver that needs to read some
>>>> metadata from the image before it can submit the payload, does this
>>>> still do what you were intending?
>>
>> Sorry for not understanding the problem, and you are right, these
>> patches can't support other formats, and for solving the dependency,
>> changes to image format driver should be needed.
>
> Then let's drop the bs->file recursion here and add an explicit
> .bdrv_io_plug/unplug callback to the raw driver.

Actually I thought about this in my review, and there's no reason for 
this not to work for image formats.

While bs->file is plugged, image formats will start executing their 
bdrv_co_readv/bdrv_co_writev callbacks, and issue reads or writes as 
necessary.  The reads and writes will accumulate in bs->file until it is 
unplugged, which is exactly the effect we want.

The change in bdrv_drain_all is ugly though.  I don't have a better 
idea, but I would like to understand better why it is needed.  Ming Lei, 
did you see a deadlock without it?

Paolo
Ming Lei July 2, 2014, 12:46 a.m. UTC | #6
On Wed, Jul 2, 2014 at 12:56 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 01/07/2014 17:21, Kevin Wolf ha scritto:
>
>>>>> Does this bs->file forwarding work for more than the raw driver? For
>>>>> example, if drv is an image format driver that needs to read some
>>>>> metadata from the image before it can submit the payload, does this
>>>>> still do what you were intending?
>>>
>>>
>>> Sorry for not understanding the problem, and you are right, these
>>> patches can't support other formats, and for solving the dependency,
>>> changes to image format driver should be needed.
>>
>>
>> Then let's drop the bs->file recursion here and add an explicit
>> .bdrv_io_plug/unplug callback to the raw driver.
>
>
> Actually I thought about this in my review, and there's no reason for this
> not to work for image formats.
>
> While bs->file is plugged, image formats will start executing their
> bdrv_co_readv/bdrv_co_writev callbacks, and issue reads or writes as
> necessary.  The reads and writes will accumulate in bs->file until it is
> unplugged, which is exactly the effect we want.

For some image formats, metadata needs to be read first before the
payload can be read, since how/what to read for the payload might
depend on the content of the metadata.

>
> The change in bdrv_drain_all is ugly though.  I don't have a better idea,
> but I would like to understand better why it is needed.  Ming Lei, did you
> see a deadlock without it?

Not yet, it is just there to be safe, to make sure all queued requests
get a chance to be flushed. If you think it isn't necessary, I can remove it.

Thanks,
--
Ming Lei
Ming Lei July 2, 2014, 2:35 a.m. UTC | #7
On Wed, Jul 2, 2014 at 8:46 AM, Ming Lei <ming.lei@canonical.com> wrote:
> On Wed, Jul 2, 2014 at 12:56 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 01/07/2014 17:21, Kevin Wolf ha scritto:
>>
>>>>>> Does this bs->file forwarding work for more than the raw driver? For
>>>>>> example, if drv is an image format driver that needs to read some
>>>>>> metadata from the image before it can submit the payload, does this
>>>>>> still do what you were intending?
>>>>
>>>>
>>>> Sorry for not understanding the problem, and you are right, these
>>>> patches can't support other formats, and for solving the dependency,
>>>> changes to image format driver should be needed.
>>>
>>>
>>> Then let's drop the bs->file recursion here and add an explicit
>>> .bdrv_io_plug/unplug callback to the raw driver.
>>
>>
>> Actually I thought about this in my review, and there's no reason for this
>> not to work for image formats.
>>
>> While bs->file is plugged, image formats will start executing their
>> bdrv_co_readv/bdrv_co_writev callbacks, and issue reads or writes as
>> necessary.  The reads and writes will accumulate in bs->file until it is
>> unplugged, which is exactly the effect we want.
>
> For some image formats, meta data need to be read first before
> the payload can be read since how/what to read payload might
> depend on content of meta data.
>
>>
>> The change in bdrv_drain_all is ugly though.  I don't have a better idea,
>> but I would like to understand better why it is needed.  Ming Lei, did you
>> see a deadlock without it?
>
> Not yet, just for safe reason to make sure all queued data has chance
> to be flushed. If you think it isn't necessary, I can remove it.

Another interface like bdrv_io_flush() may be needed for that purpose,
since the accumulated I/O requests really do need to be submitted
before doing a flush.

I will add the interface in v4.

Thanks,
--
Ming Lei
Paolo Bonzini July 2, 2014, 8:18 a.m. UTC | #8
Il 02/07/2014 02:46, Ming Lei ha scritto:
> On Wed, Jul 2, 2014 at 12:56 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 01/07/2014 17:21, Kevin Wolf ha scritto:
>>>>>> Does this bs->file forwarding work for more than the raw driver? For
>>>>>> example, if drv is an image format driver that needs to read some
>>>>>> metadata from the image before it can submit the payload, does this
>>>>>> still do what you were intending?
>>>>
>>>>
>>>> Sorry for not understanding the problem, and you are right, these
>>>> patches can't support other formats, and for solving the dependency,
>>>> changes to image format driver should be needed.
>>>
>>>
>>> Then let's drop the bs->file recursion here and add an explicit
>>> .bdrv_io_plug/unplug callback to the raw driver.
>>
>>
>> Actually I thought about this in my review, and there's no reason for this
>> not to work for image formats.
>>
>> While bs->file is plugged, image formats will start executing their
>> bdrv_co_readv/bdrv_co_writev callbacks, and issue reads or writes as
>> necessary.  The reads and writes will accumulate in bs->file until it is
>> unplugged, which is exactly the effect we want.
>
> For some image formats, meta data need to be read first before
> the payload can be read since how/what to read payload might
> depend on content of meta data.

Yes, but everything is done asynchronously so it should just work.  You have

     plug(bs)
       plug(bs->file)
     read
       start coroutine for image format read
         issue metadata read on bs->file
           bdrv_read ultimately calls bdrv_aio_readv on bs->file
             I/O not submitted because bs->file is plugged
         coroutine yields
     unplug(bs)
       unplug(bs->file)
         bs->file now submits metadata read
     ...
     metadata read completes
       coroutine re-entered
         image format read restarts, unaware of plug/unplug

If things do _not_ work as above I would like to understand where my 
reasoning is wrong.

On top of this, there _could_ be reasons for formats to implement 
plug/unplug themselves.  They could coalesce metadata reads or 
copy-on-write operations, for example.  This however is independent from 
the default behavior, which IMO is "plugging should propagate along 
bs->file" (and possibly bs->backing_hd too, but not now because that 
opens more cans of worms).

>> The change in bdrv_drain_all is ugly though.  I don't have a better idea,
>> but I would like to understand better why it is needed.  Ming Lei, did you
>> see a deadlock without it?
>
> Not yet, just for safe reason to make sure all queued data has chance
> to be flushed. If you think it isn't necessary, I can remove it.

If you could make it an assertion, that would also be great.  (BTW, it's 
probably best to add a nesting count to plug/unplug).
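
Roughly like this (only a sketch, the io_plugged field is made up
here); bdrv_drain_all() could then assert that nothing is plugged
instead of unplugging behind the caller's back:

    void bdrv_io_plug(BlockDriverState *bs)
    {
        BlockDriver *drv = bs->drv;

        if (bs->io_plugged++ == 0) {
            if (drv && drv->bdrv_io_plug) {
                drv->bdrv_io_plug(bs);
            } else if (bs->file) {
                bdrv_io_plug(bs->file);
            }
        }
    }

    void bdrv_io_unplug(BlockDriverState *bs)
    {
        BlockDriver *drv = bs->drv;

        assert(bs->io_plugged > 0);
        if (--bs->io_plugged == 0) {
            if (drv && drv->bdrv_io_unplug) {
                drv->bdrv_io_unplug(bs);
            } else if (bs->file) {
                bdrv_io_unplug(bs->file);
            }
        }
    }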

Paolo
Kevin Wolf July 2, 2014, 8:38 a.m. UTC | #9
Am 02.07.2014 um 10:18 hat Paolo Bonzini geschrieben:
> Il 02/07/2014 02:46, Ming Lei ha scritto:
> >On Wed, Jul 2, 2014 at 12:56 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>Il 01/07/2014 17:21, Kevin Wolf ha scritto:
> >>>>>>Does this bs->file forwarding work for more than the raw driver? For
> >>>>>>example, if drv is an image format driver that needs to read some
> >>>>>>metadata from the image before it can submit the payload, does this
> >>>>>>still do what you were intending?
> >>>>
> >>>>
> >>>>Sorry for not understanding the problem, and you are right, these
> >>>>patches can't support other formats, and for solving the dependency,
> >>>>changes to image format driver should be needed.
> >>>
> >>>
> >>>Then let's drop the bs->file recursion here and add an explicit
> >>>.bdrv_io_plug/unplug callback to the raw driver.
> >>
> >>
> >>Actually I thought about this in my review, and there's no reason for this
> >>not to work for image formats.
> >>
> >>While bs->file is plugged, image formats will start executing their
> >>bdrv_co_readv/bdrv_co_writev callbacks, and issue reads or writes as
> >>necessary.  The reads and writes will accumulate in bs->file until it is
> >>unplugged, which is exactly the effect we want.
> >
> >For some image formats, meta data need to be read first before
> >the payload can be read since how/what to read payload might
> >depend on content of meta data.
> 
> Yes, but everything is done asynchronously so it should just work.  You have
> 
>     plug(bs)
>       plug(bs->file)
>     read
>       start coroutine for image format read
>         issue metadata read on bs->file
>           bdrv_read ultimately calls bdrv_aio_readv on bs->file
>             I/O not submitted because bs->file is plugged
>         coroutine yields
>     unplug(bs)
>       unplug(bs->file)
>         bs->file now submits metadata read
>     ...
>     metadata read completes
>       coroutine re-entered
>         image format read restarts, unaware of plug/unplug
> 
> If things do _not_ work as above I would like to understand where my
> reasoning is wrong.

This is my understanding of how things work, yes. It means that
plug applies only to the very first I/O that is made against the lower
layer. This means that we may queue L2 table reads, but the actual
payload requests are sent one by one.

In fact, currently, only one request would do the metadata read and hold
a coroutine lock for it. The request would be delayed until the unplug,
but we wouldn't get any batching in this case. We may actually benefit
from it when we get metadata cache hits and the first thing to happen is
the payload request.

Okay, looks safe to enable it even for image formats.

> On top of this, there _could_ be reasons for formats to implement
> plug/unplug themselves.  They could coalesce metadata reads or
> copy-on-write operations, for example.  This however is independent
> from the default behavior, which IMO is "plugging should propagate
> along bs->file" (and possibly bs->backing_hd too, but not now
> because that opens more cans of worms).

Yes, enabling it for bs->backing_hd was my alternative option for the
case that we keep bs->file. What can of worms do you see there? From the
reasoning above, I can't see why bs->backing_hd should be any different
from bs->file.

> >>The change in bdrv_drain_all is ugly though.  I don't have a better idea,
> >>but I would like to understand better why it is needed.  Ming Lei, did you
> >>see a deadlock without it?
> >
> >Not yet, just for safe reason to make sure all queued data has chance
> >to be flushed. If you think it isn't necessary, I can remove it.
> 
> If you could make it an assertion, that would also be great.  (BTW,
> it's probably best to add a nesting count to plug/unplug).

Agreed.

Another point I was thinking about (not for this series, but in a
follow-up) is what to do with bdrv_aio_multiwrite(). We could either
leave the two different interfaces there, just that multiwrite should
probably call plug/unplug before it sends its requests (this would apply
the improvements to virtio-blk without dataplane; could be done easily
even now), or we try to implement request merging with the plug/unplug
interface and convert the callers. Not sure how we would do the latter,
though, because we don't have a list of requests any more, except in the
lowest layer that actually queues.

Or should we have a default plug/unplug implementation in block.c that
queues any reads and writes, and on unplug merges requests, then plugs
bs->file and sends the requests to the driver?
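
For the first option the change would be small, roughly (sketch only;
the loop stands in for however multiwrite dispatches each merged
request today):

    bdrv_io_plug(bs);
    for (i = 0; i < num_reqs; i++) {
        bdrv_aio_writev(bs, reqs[i].sector, reqs[i].qiov,
                        reqs[i].nb_sectors, reqs[i].cb, reqs[i].opaque);
    }
    bdrv_io_unplug(bs);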

Kevin
Ming Lei July 2, 2014, 8:39 a.m. UTC | #10
On Wed, Jul 2, 2014 at 4:18 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 02/07/2014 02:46, Ming Lei ha scritto:
>
>> On Wed, Jul 2, 2014 at 12:56 AM, Paolo Bonzini <pbonzini@redhat.com>
>> wrote:
>>>
>>> Il 01/07/2014 17:21, Kevin Wolf ha scritto:
>>>>>>>
>>>>>>> Does this bs->file forwarding work for more than the raw driver? For
>>>>>>> example, if drv is an image format driver that needs to read some
>>>>>>> metadata from the image before it can submit the payload, does this
>>>>>>> still do what you were intending?
>>>>>
>>>>>
>>>>>
>>>>> Sorry for not understanding the problem, and you are right, these
>>>>> patches can't support other formats, and for solving the dependency,
>>>>> changes to image format driver should be needed.
>>>>
>>>>
>>>>
>>>> Then let's drop the bs->file recursion here and add an explicit
>>>> .bdrv_io_plug/unplug callback to the raw driver.
>>>
>>>
>>>
>>> Actually I thought about this in my review, and there's no reason for
>>> this
>>> not to work for image formats.
>>>
>>> While bs->file is plugged, image formats will start executing their
>>> bdrv_co_readv/bdrv_co_writev callbacks, and issue reads or writes as
>>> necessary.  The reads and writes will accumulate in bs->file until it is
>>> unplugged, which is exactly the effect we want.
>>
>>
>> For some image formats, meta data need to be read first before
>> the payload can be read since how/what to read payload might
>> depend on content of meta data.
>
>
> Yes, but everything is done asynchronously so it should just work.  You have
>
>     plug(bs)
>       plug(bs->file)
>     read
>       start coroutine for image format read
>         issue metadata read on bs->file
>           bdrv_read ultimately calls bdrv_aio_readv on bs->file
>             I/O not submitted because bs->file is plugged
>         coroutine yields
>     unplug(bs)
>       unplug(bs->file)
>         bs->file now submits metadata read
>     ...
>     metadata read completes
>       coroutine re-entered
>         image format read restarts, unaware of plug/unplug

Then the payload read starts on the original path, but with no
plug/unplug any more. Also, another request may follow, and another
plug & unplug comes with it, which makes things more complicated, so I
suggest enabling plug & unplug only for the raw driver now.

And in the future we can use an explicit bdrv_io_flush() to handle the
dependency in non-raw format drivers.

>
> If things do _not_ work as above I would like to understand where my
> reasoning is wrong.
>
> On top of this, there _could_ be reasons for formats to implement
> plug/unplug themselves.  They could coalesce metadata reads or copy-on-write
> operations, for example.  This however is independent from the default
> behavior, which IMO is "plugging should propagate along bs->file" (and
> possibly bs->backing_hd too, but not now because that opens more cans of
> worms).
>
>
>>> The change in bdrv_drain_all is ugly though.  I don't have a better idea,
>>> but I would like to understand better why it is needed.  Ming Lei, did
>>> you
>>> see a deadlock without it?
>>
>>
>> Not yet, just for safe reason to make sure all queued data has chance
>> to be flushed. If you think it isn't necessary, I can remove it.
>
>
> If you could make it an assertion, that would also be great.  (BTW, it's
> probably best to add a nesting count to plug/unplug).

I'd rather add it in the future if we really need it.


Thanks,
--
Ming Lei
Paolo Bonzini July 2, 2014, 8:49 a.m. UTC | #11
Il 02/07/2014 10:38, Kevin Wolf ha scritto:
> > On top of this, there _could_ be reasons for formats to implement
> > plug/unplug themselves.  They could coalesce metadata reads or
> > copy-on-write operations, for example.  This however is independent
> > from the default behavior, which IMO is "plugging should propagate
> > along bs->file" (and possibly bs->backing_hd too, but not now
> > because that opens more cans of worms).
>
> Yes, enabling it for bs->backing_hd was my alternative option for the
> case that we keep bs->file. What can of worms do you see there? From the
> reasoning above, I can't see why bs->backing_hd should be any different
> from bs->file.

The only one I thought of is someone changing the backing file chain
while the backing file is plugged.  IIRC bs->file doesn't change when
you do bdrv_reopen.

Though changing the backing file chain probably would call 
bdrv_drain_all too.

> Another point I was thinking about (not for this series, but in a
> follow-up) is what to do with bdrv_aio_multiwrite(). We could either
> leave the two different interfaces there, just that multiwrite should
> probably call plug/unplug before it sends its requests (this would apply
> the improvements to virtio-blk without dataplane; could be done easily
> even now), or we try to implement request merging with the plug/unplug
> interface and convert the callers. Not sure how we would do the latter,
> though, because we don't have a list of requests any more, except in the
> lowest layer that actually queues.
>
> Or should we have a default plug/unplug implementation in block.c that
> queues any reads and writes, and on unplug merges requests, then plugs
> bs->file and sends the requests to the driver?

Longer term this is a good point.  Plug/unplug certainly provides many 
optimization opportunities, but as long as we expose the right API to 
the device model we need not care about it now.

In fact, there are plenty of other places where we can plug/unplug: 
mirroring, I/O throttling, image formats could also use aio to submit 
requests to a fragmented bs->file together and plug/unplug around the 
loop.  Benchmarking is needed of course to see if this actually improves 
performance (unlikely except in the last case) or at least reduces CPU 
usage.
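
For the last case, think of something like this around an image
format's submission loop for a fragmented request (sketch only, all
names made up):

    bdrv_io_plug(bs->file);
    for (i = 0; i < acb->num_fragments; i++) {
        bdrv_aio_readv(bs->file, acb->frag[i].sector, &acb->frag[i].qiov,
                       acb->frag[i].nb_sectors, fragment_read_cb, acb);
    }
    bdrv_io_unplug(bs->file);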

Paolo
Paolo Bonzini July 2, 2014, 8:56 a.m. UTC | #12
Il 02/07/2014 10:39, Ming Lei ha scritto:
> Then start to read payload in original path, but no plug/unplug any
> more. Also another request may follows, and another plug&unplug
> comes too, which makes thing more complicated, so I suggest to
> enable plug&unplug only for raw driver now.

That's just a performance issue (and actually one that wasn't in 2.0 
because qcow2 on dataplane wasn't supported there).  In many cases the 
cache hit of the qcow2 metadata cache can be very high, and avoiding 
plug/unplug would prevent an easy performance bonus.

I don't especially like plug/unplug as an API (I think it's better to 
extend aio_multiwrite to include other kind of requests), but:

- either we have qualms on the correctness of it, and then we should 
live with the regressions

- or if the patches are not messy and reverting them is easy, we should 
go for it.  This is what we did for dataplane in the first place, and we 
can keep doing it in the 2.1 dataplane code.

> > If you could make it an assertion, that would also be great.  (BTW, it's
> > probably best to add a nesting count to plug/unplug).
>
> I'd rather to add it in future if we really need that.

I think this is a matter of correctness as Kevin's review pointed out.

Paolo
Kevin Wolf July 2, 2014, 9:12 a.m. UTC | #13
Am 02.07.2014 um 10:56 hat Paolo Bonzini geschrieben:
> Il 02/07/2014 10:39, Ming Lei ha scritto:
> >Then start to read payload in original path, but no plug/unplug any
> >more. Also another request may follows, and another plug&unplug
> >comes too, which makes thing more complicated, so I suggest to
> >enable plug&unplug only for raw driver now.
> 
> That's just a performance issue (and actually one that wasn't in 2.0
> because qcow2 on dataplane wasn't supported there).  In many cases
> the cache hit of the qcow2 metadata cache can be very high, and
> avoiding plug/unplug would prevent an easy performance bonus.
> 
> I don't especially like plug/unplug as an API (I think it's better
> to extend aio_multiwrite to include other kind of requests), but:
> 
> - either we have qualms on the correctness of it, and then we should
> live with the regressions
> 
> - or if the patches are not messy and reverting them is easy, we
> should go for it.  This is what we did for dataplane in the first
> place, and we can keep doing it in the 2.1 dataplane code.

Fully agree. This series is small enough and obviously fixes a
dataplane problem, so at least for 2.1 we should go for it.

My thoughts in the other mail were more about where to go in the long
term. We need to have a decision about what API we commit to - something
multiwrite-like or something plug/unplug-like - before we want to start
converting everything to that interface.

This is why I think we should be thinking about how to implement certain
optimisations (like the request merging with plug/unplug, as I mentioned,
or mixing reads and writes in one batch with multiwrite) in both models.
Only when we have a reasonably good idea of what the result would look
like in either case can we make an informed decision.

Kevin
Ming Lei July 2, 2014, 9:26 a.m. UTC | #14
On Wed, Jul 2, 2014 at 4:56 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 02/07/2014 10:39, Ming Lei ha scritto:
>
>> Then start to read payload in original path, but no plug/unplug any
>> more. Also another request may follows, and another plug&unplug
>> comes too, which makes thing more complicated, so I suggest to
>> enable plug&unplug only for raw driver now.
>
>
> That's just a performance issue (and actually one that wasn't in 2.0 because
> qcow2 on dataplane wasn't supported there).  In many cases the cache hit of
> the qcow2 metadata cache can be very high, and avoiding plug/unplug would
> prevent an easy performance bonus.

I just gave it a test on a qcow2 image with dataplane on, and it looks
like it works, so I will take the original approach to implement the API.

But making full use of plug/unplug for non-raw formats will need
changes in the format drivers in the future.

>
> I don't especially like plug/unplug as an API (I think it's better to extend
> aio_multiwrite to include other kind of requests), but:
>
> - either we have qualms on the correctness of it, and then we should live
> with the regressions
>
> - or if the patches are not messy and reverting them is easy, we should go
> for it.  This is what we did for dataplane in the first place, and we can
> keep doing it in the 2.1 dataplane code.

OK, let's keep the APIs as they are now.

Actually I have thoughts in mind for further plug & unplug changes too,
but for now it is better to keep things a bit simple.

>
>> > If you could make it an assertion, that would also be great.  (BTW, it's
>> > probably best to add a nesting count to plug/unplug).
>>
>> I'd rather to add it in future if we really need that.
>
>
> I think this is a matter of correctness as Kevin's review pointed out.

OK, specifically it is needed for non-raw format drivers.

Thanks,
--
Ming Lei
Ming Lei July 2, 2014, 9:29 a.m. UTC | #15
On Wed, Jul 2, 2014 at 5:12 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 02.07.2014 um 10:56 hat Paolo Bonzini geschrieben:
>> Il 02/07/2014 10:39, Ming Lei ha scritto:
>> >Then start to read payload in original path, but no plug/unplug any
>> >more. Also another request may follows, and another plug&unplug
>> >comes too, which makes thing more complicated, so I suggest to
>> >enable plug&unplug only for raw driver now.
>>
>> That's just a performance issue (and actually one that wasn't in 2.0
>> because qcow2 on dataplane wasn't supported there).  In many cases
>> the cache hit of the qcow2 metadata cache can be very high, and
>> avoiding plug/unplug would prevent an easy performance bonus.
>>
>> I don't especially like plug/unplug as an API (I think it's better
>> to extend aio_multiwrite to include other kind of requests), but:
>>
>> - either we have qualms on the correctness of it, and then we should
>> live with the regressions
>>
>> - or if the patches are not messy and reverting them is easy, we
>> should go for it.  This is what we did for dataplane in the first
>> place, and we can keep doing it in the 2.1 dataplane code.
>
> Fully agree. This series is small enough and obviously fixes a
> dataplane problem, so at least for 2.1 we should go for it.
>
> My thoughts in the other mail were more about where to go in the long
> term. We need to have a decision about what API we commit to - something
> multiwrite-like or something plug/unplug-like - before we want to start
> converting everything to that interface.
>
> This is why I think we should be thinking about how to implement certain
> optimisations (like the request merging with plug/unplug, as I mentioned;
> or mixing read and writes in one batch with multiwrite) in both models.
> Only when we have a reasonbly good idea of what the result would look
> like in either case we can make an informed decision.

Actually linux-aio can submit reads/writes to multiple files, and
virtio-scsi does have that use case, so in the future the io queue
should be per AioContext, as I posted the first time. And I am wondering
whether multiwrite-like APIs can fit that situation.


Thanks,
--
Ming Lei
Kevin Wolf July 2, 2014, 9:49 a.m. UTC | #16
Am 02.07.2014 um 11:29 hat Ming Lei geschrieben:
> On Wed, Jul 2, 2014 at 5:12 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> > Am 02.07.2014 um 10:56 hat Paolo Bonzini geschrieben:
> >> Il 02/07/2014 10:39, Ming Lei ha scritto:
> >> >Then start to read payload in original path, but no plug/unplug any
> >> >more. Also another request may follows, and another plug&unplug
> >> >comes too, which makes thing more complicated, so I suggest to
> >> >enable plug&unplug only for raw driver now.
> >>
> >> That's just a performance issue (and actually one that wasn't in 2.0
> >> because qcow2 on dataplane wasn't supported there).  In many cases
> >> the cache hit of the qcow2 metadata cache can be very high, and
> >> avoiding plug/unplug would prevent an easy performance bonus.
> >>
> >> I don't especially like plug/unplug as an API (I think it's better
> >> to extend aio_multiwrite to include other kind of requests), but:
> >>
> >> - either we have qualms on the correctness of it, and then we should
> >> live with the regressions
> >>
> >> - or if the patches are not messy and reverting them is easy, we
> >> should go for it.  This is what we did for dataplane in the first
> >> place, and we can keep doing it in the 2.1 dataplane code.
> >
> > Fully agree. This series is small enough and obviously fixes a
> > dataplane problem, so at least for 2.1 we should go for it.
> >
> > My thoughts in the other mail were more about where to go in the long
> > term. We need to have a decision about what API we commit to - something
> > multiwrite-like or something plug/unplug-like - before we want to start
> > converting everything to that interface.
> >
> > This is why I think we should be thinking about how to implement certain
> > optimisations (like the request merging with plug/unplug, as I mentioned;
> > or mixing read and writes in one batch with multiwrite) in both models.
> > Only when we have a reasonbly good idea of what the result would look
> > like in either case we can make an informed decision.
> 
> Actually linux-aio can support to submit read/write to multi files, and
> virtio-scsi does have the use case, so in future io queue should be
> per aio-context as I posted 1st time.  And I am wondering if multiwrite-like
> APIs can fit in this situation.

Though where would you get the requests for two different files from,
within the same bdrv_plug/unplug block?

Hm, okay, I guess backing files might be a valid point. Anything else?

Kevin
Ming Lei July 2, 2014, 10:02 a.m. UTC | #17
On Wed, Jul 2, 2014 at 5:49 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 02.07.2014 um 11:29 hat Ming Lei geschrieben:
>> On Wed, Jul 2, 2014 at 5:12 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> > Am 02.07.2014 um 10:56 hat Paolo Bonzini geschrieben:
>> >> Il 02/07/2014 10:39, Ming Lei ha scritto:
>> >> >Then start to read payload in original path, but no plug/unplug any
>> >> >more. Also another request may follows, and another plug&unplug
>> >> >comes too, which makes thing more complicated, so I suggest to
>> >> >enable plug&unplug only for raw driver now.
>> >>
>> >> That's just a performance issue (and actually one that wasn't in 2.0
>> >> because qcow2 on dataplane wasn't supported there).  In many cases
>> >> the cache hit of the qcow2 metadata cache can be very high, and
>> >> avoiding plug/unplug would prevent an easy performance bonus.
>> >>
>> >> I don't especially like plug/unplug as an API (I think it's better
>> >> to extend aio_multiwrite to include other kind of requests), but:
>> >>
>> >> - either we have qualms on the correctness of it, and then we should
>> >> live with the regressions
>> >>
>> >> - or if the patches are not messy and reverting them is easy, we
>> >> should go for it.  This is what we did for dataplane in the first
>> >> place, and we can keep doing it in the 2.1 dataplane code.
>> >
>> > Fully agree. This series is small enough and obviously fixes a
>> > dataplane problem, so at least for 2.1 we should go for it.
>> >
>> > My thoughts in the other mail were more about where to go in the long
>> > term. We need to have a decision about what API we commit to - something
>> > multiwrite-like or something plug/unplug-like - before we want to start
>> > converting everything to that interface.
>> >
>> > This is why I think we should be thinking about how to implement certain
>> > optimisations (like the request merging with plug/unplug, as I mentioned;
>> > or mixing read and writes in one batch with multiwrite) in both models.
>> > Only when we have a reasonbly good idea of what the result would look
>> > like in either case we can make an informed decision.
>>
>> Actually linux-aio can support to submit read/write to multi files, and
>> virtio-scsi does have the use case, so in future io queue should be
>> per aio-context as I posted 1st time.  And I am wondering if multiwrite-like
>> APIs can fit in this situation.
>
> Though where would you get the requests for two different files from,
> within the same bdrv_plug/unplug block?

I think it is doable if the io queue is per AioContext.

The requests can be sent to different LUNs (files); see virtio_scsi_handle_cmd().
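
Roughly (sketch only, with an entirely hypothetical per-AioContext plug
API; handle_one_cmd() stands in for the existing request handling):

    aio_context_io_plug(ctx);                 /* hypothetical */
    while ((req = virtio_scsi_pop_req(s, vq))) {
        /* each request may target a different LUN, i.e. a different
         * BlockDriverState, but they all share this AioContext */
        handle_one_cmd(s, req);
    }
    aio_context_io_unplug(ctx);               /* hypothetical */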

Thanks,
--
Ming Lei
Paolo Bonzini July 2, 2014, 10:24 a.m. UTC | #18
Il 02/07/2014 12:02, Ming Lei ha scritto:
>>> Actually linux-aio can support to submit read/write to multi files, and
>>> virtio-scsi does have the use case, so in future io queue should be
>>> per aio-context as I posted 1st time.  And I am wondering if multiwrite-like
>>> APIs can fit in this situation.
>>
>> Though where would you get the requests for two different files from,
>> within the same bdrv_plug/unplug block?
>
> I think it is doable if io queue is per aio_context.

Yes, but that would basically mean moving linux-aio functionality to 
AioContext.  Effectively it's a rewrite of linux-aio.c.

Paolo
Ming Lei July 2, 2014, 10:28 a.m. UTC | #19
On Wed, Jul 2, 2014 at 6:24 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 02/07/2014 12:02, Ming Lei ha scritto:
>
>>>> Actually linux-aio can support to submit read/write to multi files, and
>>>> virtio-scsi does have the use case, so in future io queue should be
>>>> per aio-context as I posted 1st time.  And I am wondering if
>>>> multiwrite-like
>>>> APIs can fit in this situation.
>>>
>>>
>>> Though where would you get the requests for two different files from,
>>> within the same bdrv_plug/unplug block?
>>
>>
>> I think it is doable if io queue is per aio_context.
>
>
> Yes, but that would basically mean moving linux-aio functionality to
> AioContext.  Effectively it's a rewrite of linux-aio.c.

IMO, we just need to move io_queue, plug and unplug out of linux-aio.c,
not a rewrite.

Thanks,
--
Ming Lei

Patch

diff --git a/block.c b/block.c
index 217f523..fea9e43 100644
--- a/block.c
+++ b/block.c
@@ -1910,6 +1910,7 @@  void bdrv_drain_all(void)
             bool bs_busy;
 
             aio_context_acquire(aio_context);
+            bdrv_io_unplug(bs);
             bdrv_start_throttled_reqs(bs);
             bs_busy = bdrv_requests_pending(bs);
             bs_busy |= aio_poll(aio_context, bs_busy);
@@ -5774,3 +5775,23 @@  bool bdrv_is_first_non_filter(BlockDriverState *candidate)
 
     return false;
 }
+
+void bdrv_io_plug(BlockDriverState *bs)
+{
+    BlockDriver *drv = bs->drv;
+    if (drv && drv->bdrv_io_plug) {
+        drv->bdrv_io_plug(bs);
+    } else if (bs->file) {
+        bdrv_io_plug(bs->file);
+    }
+}
+
+void bdrv_io_unplug(BlockDriverState *bs)
+{
+    BlockDriver *drv = bs->drv;
+    if (drv && drv->bdrv_io_unplug) {
+        drv->bdrv_io_unplug(bs);
+    } else if (bs->file) {
+        bdrv_io_unplug(bs->file);
+    }
+}
diff --git a/include/block/block.h b/include/block/block.h
index d0baf4f..ea627d2 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -578,4 +578,7 @@  AioContext *bdrv_get_aio_context(BlockDriverState *bs);
  */
 void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context);
 
+void bdrv_io_plug(BlockDriverState *bs);
+void bdrv_io_unplug(BlockDriverState *bs);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 715c761..0d75ca6 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -257,6 +257,10 @@  struct BlockDriver {
     void (*bdrv_attach_aio_context)(BlockDriverState *bs,
                                     AioContext *new_context);
 
+    /* io queue for linux-aio */
+    void (*bdrv_io_plug)(BlockDriverState *bs);
+    void (*bdrv_io_unplug)(BlockDriverState *bs);
+
     QLIST_ENTRY(BlockDriver) list;
 };