
[v3,3/6] block: Introduce bdrv_dma_map and bdrv_dma_unmap

Message ID 20170705133635.11850-4-famz@redhat.com
State New

Commit Message

Fam Zheng July 5, 2017, 1:36 p.m. UTC
Allow block drivers to map and unmap a buffer for later I/O, as a performance
hint.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/block-backend.c          | 10 ++++++++++
 block/io.c                     | 24 ++++++++++++++++++++++++
 include/block/block.h          |  2 ++
 include/block/block_int.h      |  4 ++++
 include/sysemu/block-backend.h |  3 +++
 5 files changed, 43 insertions(+)

Comments

Stefan Hajnoczi July 10, 2017, 2:57 p.m. UTC | #1
On Wed, Jul 05, 2017 at 09:36:32PM +0800, Fam Zheng wrote:
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 15fa602..4092669 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -381,6 +381,10 @@ struct BlockDriver {
>                               uint64_t parent_perm, uint64_t parent_shared,
>                               uint64_t *nperm, uint64_t *nshared);
>  
> +    /* Map and unmap a buffer for I/O, as a performance hint to the
> +     * driver. */
> +    void (*bdrv_dma_map)(BlockDriverState *bs, void *host, size_t size);
> +    void (*bdrv_dma_unmap)(BlockDriverState *bs, void *host);

It's unclear what this API does and how to use it correctly.  Please
flesh out the doc comments a bit to explain its use.
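
The kind of comment being asked for might look something like this (the
wording is only a sketch of the intended semantics, not text from the
series):

    /* Hint that the buffer [@host, @host + @size) will be used by I/O
     * requests submitted later.  A driver may react by setting up a
     * long-lived mapping for the region (for example a persistent
     * HVA->IOVA mapping) so that per-request mapping costs are avoided.
     * The hint is purely an optimization: correctness must not depend on
     * it, and drivers that cannot use it simply leave the callbacks NULL.
     * bdrv_dma_unmap() is passed the same @host pointer that was mapped. */
    void (*bdrv_dma_map)(BlockDriverState *bs, void *host, size_t size);
    void (*bdrv_dma_unmap)(BlockDriverState *bs, void *host);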
Stefan Hajnoczi July 10, 2017, 3:07 p.m. UTC | #2
On Wed, Jul 05, 2017 at 09:36:32PM +0800, Fam Zheng wrote:
> Allow block driver to map and unmap a buffer for later I/O, as a performance
> hint.

The name blk_dma_map() is confusing since other "dma" APIs like
dma_addr_t and dma_blk_io() deal with guest physical addresses instead
of host addresses.  They are about DMA to/from guest RAM.

Have you considered hiding this cached mapping in block/nvme.c so that
it isn't exposed?  block/nvme.c could keep the last buffer mapped and
callers would get the performance benefit without a new blk_dma_map()
API.
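
The shape of such a driver-internal cache might be roughly as follows;
every name here (the struct and the helpers) is hypothetical and only
illustrates the idea of keeping the last mapping alive:

    #include "qemu/osdep.h"

    /* Hypothetical per-driver state caching the most recently mapped buffer. */
    typedef struct NVMeMapCache {
        void *host;     /* NULL while nothing is cached */
        size_t size;
    } NVMeMapCache;

    /* Hypothetical driver-internal helpers that perform the real (un)mapping. */
    void nvme_map_region(void *host, size_t size);
    void nvme_unmap_region(void *host);

    static void nvme_map_for_io(NVMeMapCache *c, void *host, size_t size)
    {
        if (c->host == host && c->size >= size) {
            return;                       /* reuse the cached mapping */
        }
        if (c->host) {
            nvme_unmap_region(c->host);   /* drop the previous mapping */
        }
        nvme_map_region(host, size);
        c->host = host;
        c->size = size;
    }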
Paolo Bonzini July 10, 2017, 3:08 p.m. UTC | #3
On 10/07/2017 17:07, Stefan Hajnoczi wrote:
> On Wed, Jul 05, 2017 at 09:36:32PM +0800, Fam Zheng wrote:
>> Allow block driver to map and unmap a buffer for later I/O, as a performance
>> hint.
> The name blk_dma_map() is confusing since other "dma" APIs like
> dma_addr_t and dma_blk_io() deal with guest physical addresses instead
> of host addresses.  They are about DMA to/from guest RAM.
> 
> Have you considered hiding this cached mapping in block/nvme.c so that
> it isn't exposed?  block/nvme.c could keep the last buffer mapped and
> callers would get the performance benefit without a new blk_dma_map()
> API.

One buffer is enough for qemu-img bench, but not for more complex cases
(e.g. fio).

Paolo
Stefan Hajnoczi July 11, 2017, 10:05 a.m. UTC | #4
On Mon, Jul 10, 2017 at 05:08:56PM +0200, Paolo Bonzini wrote:
> On 10/07/2017 17:07, Stefan Hajnoczi wrote:
> > On Wed, Jul 05, 2017 at 09:36:32PM +0800, Fam Zheng wrote:
> >> Allow block driver to map and unmap a buffer for later I/O, as a performance
> >> hint.
> > The name blk_dma_map() is confusing since other "dma" APIs like
> > dma_addr_t and dma_blk_io() deal with guest physical addresses instead
> > of host addresses.  They are about DMA to/from guest RAM.
> > 
> > Have you considered hiding this cached mapping in block/nvme.c so that
> > it isn't exposed?  block/nvme.c could keep the last buffer mapped and
> > callers would get the performance benefit without a new blk_dma_map()
> > API.
> 
> One buffer is enough for qemu-img bench, but not for more complex cases
> (e.g. fio).

I don't see any other blk_dma_map() callers.

Stefan
Paolo Bonzini July 11, 2017, 10:28 a.m. UTC | #5
On 11/07/2017 12:05, Stefan Hajnoczi wrote:
> On Mon, Jul 10, 2017 at 05:08:56PM +0200, Paolo Bonzini wrote:
>> On 10/07/2017 17:07, Stefan Hajnoczi wrote:
>>> On Wed, Jul 05, 2017 at 09:36:32PM +0800, Fam Zheng wrote:
>>>> Allow block driver to map and unmap a buffer for later I/O, as a performance
>>>> hint.
>>> The name blk_dma_map() is confusing since other "dma" APIs like
>>> dma_addr_t and dma_blk_io() deal with guest physical addresses instead
>>> of host addresses.  They are about DMA to/from guest RAM.
>>>
>>> Have you considered hiding this cached mapping in block/nvme.c so that
>>> it isn't exposed?  block/nvme.c could keep the last buffer mapped and
>>> callers would get the performance benefit without a new blk_dma_map()
>>> API.
>>
>> One buffer is enough for qemu-img bench, but not for more complex cases
>> (e.g. fio).
> 
> I don't see any other blk_dma_map() callers.

Indeed, the fio plugin is not part of this series, but it also used
blk_dma_map.  Without it, performance is awful.

Paolo
Fam Zheng July 12, 2017, 1:07 a.m. UTC | #6
On Tue, 07/11 12:28, Paolo Bonzini wrote:
> On 11/07/2017 12:05, Stefan Hajnoczi wrote:
> > On Mon, Jul 10, 2017 at 05:08:56PM +0200, Paolo Bonzini wrote:
> >> On 10/07/2017 17:07, Stefan Hajnoczi wrote:
> >>> On Wed, Jul 05, 2017 at 09:36:32PM +0800, Fam Zheng wrote:
> >>>> Allow block driver to map and unmap a buffer for later I/O, as a performance
> >>>> hint.
> >>> The name blk_dma_map() is confusing since other "dma" APIs like
> >>> dma_addr_t and dma_blk_io() deal with guest physical addresses instead
> >>> of host addresses.  They are about DMA to/from guest RAM.
> >>>
> >>> Have you considered hiding this cached mapping in block/nvme.c so that
> >>> it isn't exposed?  block/nvme.c could keep the last buffer mapped and
> >>> callers would get the performance benefit without a new blk_dma_map()
> >>> API.
> >>
> >> One buffer is enough for qemu-img bench, but not for more complex cases
> >> (e.g. fio).
> > 
> > I don't see any other blk_dma_map() callers.
> 
> Indeed, the fio plugin is not part of this series, but it also used
> blk_dma_map.  Without it, performance is awful.

How many buffers does fio use, typically? If it's not too many, block/nvme.c can
cache the last N buffers. I'm with Stefan that hiding the mapping logic from
block layer callers makes a nicer API, especially because qemu-img can
then maintain good performance across subcommands much more easily.

Fam
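
Generalizing the driver-side cache to "the last N buffers", as suggested
above, could look something like the sketch below (N, the helper names,
and the round-robin eviction are illustrative assumptions, not part of
the series):

    #include "qemu/osdep.h"

    #define MAP_CACHE_ENTRIES 8           /* arbitrary N for the sketch */

    typedef struct {
        void *host;                       /* NULL for an unused slot */
        size_t size;
    } MapCacheEntry;

    static MapCacheEntry map_cache[MAP_CACHE_ENTRIES];
    static unsigned map_cache_next;       /* round-robin eviction cursor */

    /* Hypothetical driver-internal helpers, as in the earlier sketch. */
    void nvme_map_region(void *host, size_t size);
    void nvme_unmap_region(void *host);

    static void nvme_map_cached(void *host, size_t size)
    {
        unsigned i;
        MapCacheEntry *e;

        for (i = 0; i < MAP_CACHE_ENTRIES; i++) {
            e = &map_cache[i];
            if (e->host == host && e->size >= size) {
                return;                   /* hit: buffer is already mapped */
            }
        }
        e = &map_cache[map_cache_next];
        map_cache_next = (map_cache_next + 1) % MAP_CACHE_ENTRIES;
        if (e->host) {
            nvme_unmap_region(e->host);   /* miss: evict one slot */
        }
        nvme_map_region(host, size);
        e->host = host;
        e->size = size;
    }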
Paolo Bonzini July 12, 2017, 2:03 p.m. UTC | #7
On 12/07/2017 03:07, Fam Zheng wrote:
> On Tue, 07/11 12:28, Paolo Bonzini wrote:
>> On 11/07/2017 12:05, Stefan Hajnoczi wrote:
>>> On Mon, Jul 10, 2017 at 05:08:56PM +0200, Paolo Bonzini wrote:
>>>> On 10/07/2017 17:07, Stefan Hajnoczi wrote:
>>>>> On Wed, Jul 05, 2017 at 09:36:32PM +0800, Fam Zheng wrote:
>>>>>> Allow block driver to map and unmap a buffer for later I/O, as a performance
>>>>>> hint.
>>>>> The name blk_dma_map() is confusing since other "dma" APIs like
>>>>> dma_addr_t and dma_blk_io() deal with guest physical addresses instead
>>>>> of host addresses.  They are about DMA to/from guest RAM.
>>>>>
>>>>> Have you considered hiding this cached mapping in block/nvme.c so that
>>>>> it isn't exposed?  block/nvme.c could keep the last buffer mapped and
>>>>> callers would get the performance benefit without a new blk_dma_map()
>>>>> API.
>>>>
>>>> One buffer is enough for qemu-img bench, but not for more complex cases
>>>> (e.g. fio).
>>>
>>> I don't see any other blk_dma_map() callers.
>>
>> Indeed, the fio plugin is not part of this series, but it also used
>> blk_dma_map.  Without it, performance is awful.
> 
> How many buffers does fio use, typically? If it's not too many, block/nvme.c can
> cache the last N buffers. I'm with Stefan that hiding the mapping logic from
> block layer callers makes a nicer API, especially because qemu-img can
> then maintain good performance across subcommands much more easily.

It depends on the queue depth.

I think the API addition is necessary, otherwise we wouldn't have added
the RAMBlockNotifier which is a layering violation that does the same
thing (create permanent HVA->IOVA mappings).  In fact, the
RAMBlockNotifier could be moved out of nvme.c and made to use
blk_dma_map/unmap, though I'm not proposing to do it now.

I don't think qemu-img convert and dd are impacted by IOMMU map/unmap as
heavily as bench, because they operate with queue depth 1.  But adding
map/unmap there would not be hard.

Paolo
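
For reference, the caller-side pattern the series is aiming at looks
roughly like this (a sketch built on the API added by this patch; the
actual request loop of qemu-img bench is elided):

    #include "qemu/osdep.h"
    #include "sysemu/block-backend.h"

    static void bench_with_premapped_buffer(BlockBackend *blk)
    {
        size_t buf_size = 1024 * 1024;               /* example buffer size */
        void *buf = blk_blockalign(blk, buf_size);   /* aligned I/O buffer */

        blk_dma_map(blk, buf, buf_size);   /* hint: map the buffer up front */

        /*
         * ... submit many blk_aio_preadv()/blk_aio_pwritev() requests whose
         * iovecs point into buf, as qemu-img bench does ...
         */

        blk_dma_unmap(blk, buf);           /* withdraw the hint ... */
        qemu_vfree(buf);                   /* ... before freeing the buffer */
    }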
Stefan Hajnoczi July 14, 2017, 1:37 p.m. UTC | #8
On Wed, Jul 12, 2017 at 04:03:57PM +0200, Paolo Bonzini wrote:
> On 12/07/2017 03:07, Fam Zheng wrote:
> > On Tue, 07/11 12:28, Paolo Bonzini wrote:
> >> On 11/07/2017 12:05, Stefan Hajnoczi wrote:
> >>> On Mon, Jul 10, 2017 at 05:08:56PM +0200, Paolo Bonzini wrote:
> >>>> On 10/07/2017 17:07, Stefan Hajnoczi wrote:
> >>>>> On Wed, Jul 05, 2017 at 09:36:32PM +0800, Fam Zheng wrote:
> >>>>>> Allow block driver to map and unmap a buffer for later I/O, as a performance
> >>>>>> hint.
> >>>>> The name blk_dma_map() is confusing since other "dma" APIs like
> >>>>> dma_addr_t and dma_blk_io() deal with guest physical addresses instead
> >>>>> of host addresses.  They are about DMA to/from guest RAM.
> >>>>>
> >>>>> Have you considered hiding this cached mapping in block/nvme.c so that
> >>>>> it isn't exposed?  block/nvme.c could keep the last buffer mapped and
> >>>>> callers would get the performance benefit without a new blk_dma_map()
> >>>>> API.
> >>>>
> >>>> One buffer is enough for qemu-img bench, but not for more complex cases
> >>>> (e.g. fio).
> >>>
> >>> I don't see any other blk_dma_map() callers.
> >>
> >> Indeed, the fio plugin is not part of this series, but it also used
> >> blk_dma_map.  Without it, performance is awful.
> > 
> > How many buffers does fio use, typically? If it's not too many, block/nvme.c can
> > cache the last N buffers. I'm with Stefan that hiding the mapping logic from
> > block layer callers makes a nicer API, especially because qemu-img can
> > then maintain good performance across subcommands much more easily.
> 
> It depends on the queue depth.
> 
> I think the API addition is necessary, otherwise we wouldn't have added
> the RAMBlockNotifier which is a layering violation that does the same
> thing (create permanent HVA->IOVA mappings).  In fact, the
> RAMBlockNotifier could be moved out of nvme.c and made to use
> blk_dma_map/unmap, though I'm not proposing to do it now.
> 
> I don't think qemu-img convert and dd are impacted by IOMMU map/unmap as
> heavily as bench, because they operate with queue depth 1.  But adding
> map/unmap there would not be hard.

I'm not against an API existing for this.  I would just ask:

1. It's documented so the purpose and semantics are clear.
2. The name cannot be confused with dma-helpers.c APIs.

Maybe blk_register_buf() or blk_add_buf_hint()?
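
To make the two requests concrete, the renamed and documented prototypes
might look something like this (only a sketch of the suggestion, not code
from the series):

    /**
     * blk_register_buf:
     * @blk: block backend the buffer will be used with
     * @host: start of the buffer (a host virtual address)
     * @size: length of the buffer in bytes
     *
     * Hint that @host will be used as an I/O buffer for future requests on
     * @blk, so that drivers may set up a long-lived mapping for it.  This
     * is purely an optimization; correctness must not depend on it.
     */
    void blk_register_buf(BlockBackend *blk, void *host, size_t size);

    /**
     * blk_unregister_buf:
     * @blk: block backend the buffer was registered with
     * @host: buffer previously passed to blk_register_buf()
     *
     * Withdraw the hint; call this before the buffer is freed.
     */
    void blk_unregister_buf(BlockBackend *blk, void *host);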
Paolo Bonzini July 14, 2017, 1:46 p.m. UTC | #9
On 14/07/2017 15:37, Stefan Hajnoczi wrote:
> On Wed, Jul 12, 2017 at 04:03:57PM +0200, Paolo Bonzini wrote:
>> On 12/07/2017 03:07, Fam Zheng wrote:
>>> On Tue, 07/11 12:28, Paolo Bonzini wrote:
>>>> On 11/07/2017 12:05, Stefan Hajnoczi wrote:
>>>>> On Mon, Jul 10, 2017 at 05:08:56PM +0200, Paolo Bonzini wrote:
>>>>>> On 10/07/2017 17:07, Stefan Hajnoczi wrote:
>>>>>>> On Wed, Jul 05, 2017 at 09:36:32PM +0800, Fam Zheng wrote:
>>>>>>>> Allow block driver to map and unmap a buffer for later I/O, as a performance
>>>>>>>> hint.
>>>>>>> The name blk_dma_map() is confusing since other "dma" APIs like
>>>>>>> dma_addr_t and dma_blk_io() deal with guest physical addresses instead
>>>>>>> of host addresses.  They are about DMA to/from guest RAM.
>>>>>>>
>>>>>>> Have you considered hiding this cached mapping in block/nvme.c so that
>>>>>>> it isn't exposed?  block/nvme.c could keep the last buffer mapped and
>>>>>>> callers would get the performance benefit without a new blk_dma_map()
>>>>>>> API.
>>>>>>
>>>>>> One buffer is enough for qemu-img bench, but not for more complex cases
>>>>>> (e.g. fio).
>>>>>
>>>>> I don't see any other blk_dma_map() callers.
>>>>
>>>> Indeed, the fio plugin is not part of this series, but it also used
>>>> blk_dma_map.  Without it, performance is awful.
>>>
>>> How many buffers does fio use, typically? If it's not too many, block/nvme.c can
>>> cache the last N buffers. I'm with Stefan that hiding the mapping logic from
> >>> block layer callers makes a nicer API, especially because qemu-img can
> >>> then maintain good performance across subcommands much more easily.
>>
>> It depends on the queue depth.
>>
>> I think the API addition is necessary, otherwise we wouldn't have added
>> the RAMBlockNotifier which is a layering violation that does the same
>> thing (create permanent HVA->IOVA mappings).  In fact, the
>> RAMBlockNotifier could be moved out of nvme.c and made to use
>> blk_dma_map/unmap, though I'm not proposing to do it now.
>>
>> I don't think qemu-img convert and dd are impacted by IOMMU map/unmap as
>> heavily as bench, because they operate with queue depth 1.  But adding
>> map/unmap there would not be hard.
> 
> I'm not against an API existing for this.  I would just ask:
> 
> 1. It's documented so the purpose and semantics are clear.
> 2. The name cannot be confused with dma-helpers.c APIs.

Yes, I agree completely.

> Maybe blk_register_buf() or blk_add_buf_hint()?

blk_(un)register_buf, or perhaps iobuf or io_buffer, sounds good to me.

Paolo

Patch

diff --git a/block/block-backend.c b/block/block-backend.c
index 0df3457..784b936 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1974,3 +1974,13 @@  static void blk_root_drained_end(BdrvChild *child)
         }
     }
 }
+
+void blk_dma_map(BlockBackend *blk, void *host, size_t size)
+{
+    bdrv_dma_map(blk_bs(blk), host, size);
+}
+
+void blk_dma_unmap(BlockBackend *blk, void *host)
+{
+    bdrv_dma_unmap(blk_bs(blk), host);
+}
diff --git a/block/io.c b/block/io.c
index 2de7c77..988e4db 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2537,3 +2537,27 @@  void bdrv_io_unplug(BlockDriverState *bs)
         bdrv_io_unplug(child->bs);
     }
 }
+
+void bdrv_dma_map(BlockDriverState *bs, void *host, size_t size)
+{
+    BdrvChild *child;
+
+    if (bs->drv && bs->drv->bdrv_dma_map) {
+        bs->drv->bdrv_dma_map(bs, host, size);
+    }
+    QLIST_FOREACH(child, &bs->children, next) {
+        bdrv_dma_map(child->bs, host, size);
+    }
+}
+
+void bdrv_dma_unmap(BlockDriverState *bs, void *host)
+{
+    BdrvChild *child;
+
+    if (bs->drv && bs->drv->bdrv_dma_unmap) {
+        bs->drv->bdrv_dma_unmap(bs, host);
+    }
+    QLIST_FOREACH(child, &bs->children, next) {
+        bdrv_dma_unmap(child->bs, host);
+    }
+}
diff --git a/include/block/block.h b/include/block/block.h
index 4c149ad..f59b50a 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -624,4 +624,6 @@  void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
                     Error **errp);
 void bdrv_del_child(BlockDriverState *parent, BdrvChild *child, Error **errp);
 
+void bdrv_dma_map(BlockDriverState *bs, void *host, size_t size);
+void bdrv_dma_unmap(BlockDriverState *bs, void *host);
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 15fa602..4092669 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -381,6 +381,10 @@  struct BlockDriver {
                              uint64_t parent_perm, uint64_t parent_shared,
                              uint64_t *nperm, uint64_t *nshared);
 
+    /* Map and unmap a buffer for I/O, as a performance hint to the
+     * driver. */
+    void (*bdrv_dma_map)(BlockDriverState *bs, void *host, size_t size);
+    void (*bdrv_dma_unmap)(BlockDriverState *bs, void *host);
     QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 1e05281..5f7ccdb 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -239,4 +239,7 @@  void blk_io_limits_disable(BlockBackend *blk);
 void blk_io_limits_enable(BlockBackend *blk, const char *group);
 void blk_io_limits_update_group(BlockBackend *blk, const char *group);
 
+void blk_dma_map(BlockBackend *blk, void *host, size_t size);
+void blk_dma_unmap(BlockBackend *blk, void *host);
+
 #endif