diff mbox series

[v5,11/12] blkio: implement BDRV_REQ_REGISTERED_BUF optimization

Message ID 20220927193431.22302-12-stefanha@redhat.com
State New
Headers show
Series blkio: add libblkio BlockDriver | expand

Commit Message

Stefan Hajnoczi Sept. 27, 2022, 7:34 p.m. UTC
Avoid bounce buffers when QEMUIOVector elements are within previously
registered bdrv_register_buf() buffers.

The idea is that emulated storage controllers will register guest RAM
using bdrv_register_buf() and set the BDRV_REQ_REGISTERED_BUF on I/O
requests. Therefore no blkio_map_mem_region() calls are necessary in the
performance-critical I/O code path.

This optimization doesn't apply if the I/O buffer is internally
allocated by QEMU (e.g. qcow2 metadata). There we still take the slow
path because BDRV_REQ_REGISTERED_BUF is not set.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/blkio.c | 174 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 171 insertions(+), 3 deletions(-)

Comments

Stefan Hajnoczi Sept. 28, 2022, 7:21 p.m. UTC | #1
On Tue, Sep 27, 2022 at 03:34:30PM -0400, Stefan Hajnoczi wrote:
> +    ret = blkio_get_bool(s->blkio,
> +                         "mem-regions-pinned",
> +                         &s->mem_regions_pinned);
> +    if (ret < 0) {
> +        /* Be conservative (assume pinning) if the property is not supported */
> +        s->mem_regions_pinned = true;

This is too conservative :). It can be changed to:

  s->mem_regions_pinned = s->needs_mem_regions;

That way we avoid ram_block_discard_disable() for libblkio drivers (like
io_uring in libblkio 1.0) that don't use memory regions and don't
support the "mem-regions-pinned" property yet.

Stefan
Alberto Faria Sept. 28, 2022, 8:12 p.m. UTC | #2
On Wed, Sep 28, 2022 at 8:21 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> On Tue, Sep 27, 2022 at 03:34:30PM -0400, Stefan Hajnoczi wrote:
> > +    ret = blkio_get_bool(s->blkio,
> > +                         "mem-regions-pinned",
> > +                         &s->mem_regions_pinned);
> > +    if (ret < 0) {
> > +        /* Be conservative (assume pinning) if the property is not supported */
> > +        s->mem_regions_pinned = true;
>
> This is too conservative :). It can be changed to:
>
>   s->mem_regions_pinned = s->needs_mem_regions;
>
> That way we avoid ram_block_discard_disable() for libblkio drivers (like
> io_uring in libblkio 1.0) that don't use memory regions and don't
> support the "mem-regions-pinned" property yet.

Even if a driver doesn't _need_ memory regions to be mapped before
use, it may still do something special with the ones that _are_
mapped, so we may have no choice but to set s->mem_regions_pinned =
true.

(Unless we are assuming that all future libblkio versions will either
not have such drivers, or will provide a "mem-regions-pinned"
property, but that feels brittle.)

Alberto
Stefan Hajnoczi Oct. 6, 2022, 6 p.m. UTC | #3
On Wed, Sep 28, 2022 at 09:12:36PM +0100, Alberto Campinho Faria wrote:
> On Wed, Sep 28, 2022 at 8:21 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > On Tue, Sep 27, 2022 at 03:34:30PM -0400, Stefan Hajnoczi wrote:
> > > +    ret = blkio_get_bool(s->blkio,
> > > +                         "mem-regions-pinned",
> > > +                         &s->mem_regions_pinned);
> > > +    if (ret < 0) {
> > > +        /* Be conservative (assume pinning) if the property is not supported */
> > > +        s->mem_regions_pinned = true;
> >
> > This is too conservative :). It can be changed to:
> >
> >   s->mem_regions_pinned = s->needs_mem_regions;
> >
> > That way we avoid ram_block_discard_disable() for libblkio drivers (like
> > io_uring in libblkio 1.0) that don't use memory regions and don't
> > support the "mem-regions-pinned" property yet.
> 
> Even if a driver doesn't _need_ memory regions to be mapped before
> use, it may still do something special with the ones that _are_
> mapped, so we may have no choice but to set s->mem_regions_pinned =
> true.
> 
> (Unless we are assuming that all future libblkio versions will either
> not have such drivers, or will provide a "mem-regions-pinned"
> property, but that feels brittle.)

s->needs_mem_regions determines if we'll use libblkio memory regions at
all. When it's false we skip blkio_map_mem_region() and therefore it's
safe to set s->mem_regions_pinned to false.

Stefan
Alberto Faria Oct. 6, 2022, 6:09 p.m. UTC | #4
On Thu, Oct 6, 2022 at 7:00 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> s->needs_mem_regions determines if we'll use libblkio memory regions at
> all. When it's false we skip blkio_map_mem_region() and therefore it's
> safe to set s->mem_regions_pinned to false.

blkio_register_buf() calls blkio_map_mem_region(). Is the
bdrv_register_buf callback maybe skipped somehow when
needs_mem_regions is false?

Regardless, I'd say we want to map memory regions even if we don't
strictly need to (in cases where we can do so at no additional cost),
since that may improve performance for some drivers.

Alberto
Stefan Hajnoczi Oct. 6, 2022, 6:46 p.m. UTC | #5
On Thu, Oct 06, 2022 at 07:09:36PM +0100, Alberto Faria wrote:
> On Thu, Oct 6, 2022 at 7:00 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > s->needs_mem_regions determines if we'll use libblkio memory regions at
> > all. When it's false we skip blkio_map_mem_region() and therefore it's
> > safe to set s->mem_regions_pinned to false.
> 
> blkio_register_buf() calls blkio_map_mem_region(). Is the
> bdrv_register_buf callback maybe skipped somehow when
> needs_mem_regions is false?

You're right. blkio_register_buf() will be called by virtio-blk's ram
block registrar even when s->needs_mem_regions is false.

s->mem_regions_pinned therefore has to default to true even when
s->needs_mem_regions is false.

> Regardless, I'd say we want to map memory regions even if we don't
> strictly need to (in cases where we can do so at no additional cost),
> since that may improve performance for some drivers.

The downside is that when s->mem_regions_pinned is true, virtio-mem and
anything else that calls ram discard cannot be used.

I think we can try that for now and see if the policy needs to be
refined.

Stefan
Alberto Faria Oct. 6, 2022, 6:54 p.m. UTC | #6
On Thu, Oct 6, 2022 at 7:46 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > Regardless, I'd say we want to map memory regions even if we don't
> > strictly need to (in cases where we can do so at no additional cost),
> > since that may improve performance for some drivers.
>
> The downside is that when s->mem_regions_pinned is true, virtio-mem and
> anything else that calls ram discard cannot be used.

Hmm right, losing that functionality would probably be worse than
potentially losing some performance for some drivers. Maybe a good
middle point would be to call blkio_map_mem_region() in
blkio_register_buf() iff s->needs_mem_regions ||
!s->mem_regions_pinned.

Alberto
diff mbox series

Patch

diff --git a/block/blkio.c b/block/blkio.c
index 9244a653ef..ed6ec7f167 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -11,9 +11,13 @@ 
 #include "qemu/osdep.h"
 #include <blkio.h>
 #include "block/block_int.h"
+#include "exec/memory.h"
+#include "exec/cpu-common.h" /* for qemu_ram_get_fd() */
 #include "qapi/error.h"
+#include "qemu/error-report.h"
 #include "qapi/qmp/qdict.h"
 #include "qemu/module.h"
+#include "exec/memory.h" /* for ram_block_discard_disable() */
 
 /*
  * Keep the QEMU BlockDriver names identical to the libblkio driver names.
@@ -72,6 +76,12 @@  typedef struct {
 
     /* Can we skip adding/deleting blkio_mem_regions? */
     bool needs_mem_regions;
+
+    /* Are file descriptors necessary for blkio_mem_regions? */
+    bool needs_mem_region_fd;
+
+    /* Are madvise(MADV_DONTNEED)-style operations unavailable? */
+    bool mem_regions_pinned;
 } BDRVBlkioState;
 
 /* Called with s->bounce_lock held */
@@ -346,7 +356,8 @@  blkio_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
         .coroutine = qemu_coroutine_self(),
     };
     BDRVBlkioState *s = bs->opaque;
-    bool use_bounce_buffer = s->needs_mem_regions;
+    bool use_bounce_buffer =
+        s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
     BlkioBounceBuf bounce;
     struct iovec *iov = qiov->iov;
     int iovcnt = qiov->niov;
@@ -389,7 +400,8 @@  static int coroutine_fn blkio_co_pwritev(BlockDriverState *bs, int64_t offset,
         .coroutine = qemu_coroutine_self(),
     };
     BDRVBlkioState *s = bs->opaque;
-    bool use_bounce_buffer = s->needs_mem_regions;
+    bool use_bounce_buffer =
+        s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
     BlkioBounceBuf bounce;
     struct iovec *iov = qiov->iov;
     int iovcnt = qiov->niov;
@@ -472,6 +484,117 @@  static void blkio_io_unplug(BlockDriverState *bs)
     }
 }
 
+typedef enum {
+    BMRR_OK,
+    BMRR_SKIP,
+    BMRR_FAIL,
+} BlkioMemRegionResult;
+
+/*
+ * Produce a struct blkio_mem_region for a given address and size.
+ *
+ * This function produces identical results when called multiple times with the
+ * same arguments. This property is necessary because blkio_unmap_mem_region()
+ * must receive the same struct blkio_mem_region field values that were passed
+ * to blkio_map_mem_region().
+ */
+static BlkioMemRegionResult
+blkio_mem_region_from_host(BlockDriverState *bs,
+                           void *host, size_t size,
+                           struct blkio_mem_region *region,
+                           Error **errp)
+{
+    BDRVBlkioState *s = bs->opaque;
+    int fd = -1;
+    ram_addr_t fd_offset = 0;
+
+    if (((uintptr_t)host | size) % s->mem_region_alignment) {
+        error_setg(errp, "unaligned buf %p with size %zu", host, size);
+        return BMRR_FAIL;
+    }
+
+    /* Attempt to find the fd for the underlying memory */
+    if (s->needs_mem_region_fd) {
+        RAMBlock *ram_block;
+        RAMBlock *end_block;
+        ram_addr_t offset;
+
+        /*
+         * bdrv_register_buf() is called with the BQL held so mr lives at least
+         * until this function returns.
+         */
+        ram_block = qemu_ram_block_from_host(host, false, &fd_offset);
+        if (ram_block) {
+            fd = qemu_ram_get_fd(ram_block);
+        }
+        if (fd == -1) {
+            /*
+             * Ideally every RAMBlock would have an fd. pc-bios and other
+             * things don't. Luckily they are usually not I/O buffers and we
+             * can just ignore them.
+             */
+            return BMRR_SKIP;
+        }
+
+        /* Make sure the fd covers the entire range */
+        end_block = qemu_ram_block_from_host(host + size - 1, false, &offset);
+        if (ram_block != end_block) {
+            error_setg(errp, "registered buffer at %p with size %zu extends "
+                       "beyond RAMBlock", host, size);
+            return BMRR_FAIL;
+        }
+    }
+
+    *region = (struct blkio_mem_region){
+        .addr = host,
+        .len = size,
+        .fd = fd,
+        .fd_offset = fd_offset,
+    };
+    return BMRR_OK;
+}
+
+static bool blkio_register_buf(BlockDriverState *bs, void *host, size_t size,
+                               Error **errp)
+{
+    BDRVBlkioState *s = bs->opaque;
+    struct blkio_mem_region region;
+    BlkioMemRegionResult region_result;
+    int ret;
+
+    region_result = blkio_mem_region_from_host(bs, host, size, &region, errp);
+    if (region_result == BMRR_SKIP) {
+        return true;
+    } else if (region_result != BMRR_OK) {
+        return false;
+    }
+
+    WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
+        ret = blkio_map_mem_region(s->blkio, &region);
+    }
+
+    if (ret < 0) {
+        error_setg(errp, "Failed to add blkio mem region %p with size %zu: %s",
+                   host, size, blkio_get_error_msg());
+        return false;
+    }
+    return true;
+}
+
+static void blkio_unregister_buf(BlockDriverState *bs, void *host, size_t size)
+{
+    BDRVBlkioState *s = bs->opaque;
+    struct blkio_mem_region region;
+
+    if (blkio_mem_region_from_host(bs, host, size, &region, NULL) != BMRR_OK) {
+        return;
+    }
+
+    WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
+        blkio_unmap_mem_region(s->blkio, &region);
+    }
+}
+
 static int blkio_io_uring_open(BlockDriverState *bs, QDict *options, int flags,
                                Error **errp)
 {
@@ -610,6 +733,17 @@  static int blkio_file_open(BlockDriverState *bs, QDict *options, int flags,
         return ret;
     }
 
+    ret = blkio_get_bool(s->blkio,
+                         "needs-mem-region-fd",
+                         &s->needs_mem_region_fd);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "failed to get needs-mem-region-fd: %s",
+                         blkio_get_error_msg());
+        blkio_destroy(&s->blkio);
+        return ret;
+    }
+
     ret = blkio_get_uint64(s->blkio,
                            "mem-region-alignment",
                            &s->mem_region_alignment);
@@ -621,15 +755,39 @@  static int blkio_file_open(BlockDriverState *bs, QDict *options, int flags,
         return ret;
     }
 
+    ret = blkio_get_bool(s->blkio,
+                         "mem-regions-pinned",
+                         &s->mem_regions_pinned);
+    if (ret < 0) {
+        /* Be conservative (assume pinning) if the property is not supported */
+        s->mem_regions_pinned = true;
+    }
+
+    /*
+     * Notify if libblkio drivers pin memory and prevent features like
+     * virtio-mem from working.
+     */
+    if (s->mem_regions_pinned) {
+        ret = ram_block_discard_disable(true);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "ram_block_discard_disable() failed");
+            blkio_destroy(&s->blkio);
+            return ret;
+        }
+    }
+
     ret = blkio_start(s->blkio);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "blkio_start failed: %s",
                          blkio_get_error_msg());
         blkio_destroy(&s->blkio);
+        if (s->mem_regions_pinned) {
+            ram_block_discard_disable(false);
+        }
         return ret;
     }
 
-    bs->supported_write_flags = BDRV_REQ_FUA;
+    bs->supported_write_flags = BDRV_REQ_FUA | BDRV_REQ_REGISTERED_BUF;
     bs->supported_zero_flags = BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP |
                                BDRV_REQ_NO_FALLBACK;
 
@@ -653,6 +811,10 @@  static void blkio_close(BlockDriverState *bs)
     qemu_mutex_destroy(&s->blkio_lock);
     blkio_detach_aio_context(bs);
     blkio_destroy(&s->blkio);
+
+    if (s->mem_regions_pinned) {
+        ram_block_discard_disable(false);
+    }
 }
 
 static int64_t blkio_getlength(BlockDriverState *bs)
@@ -799,6 +961,8 @@  static BlockDriver bdrv_io_uring = {
     .bdrv_co_pwrite_zeroes      = blkio_co_pwrite_zeroes,
     .bdrv_io_unplug             = blkio_io_unplug,
     .bdrv_refresh_limits        = blkio_refresh_limits,
+    .bdrv_register_buf          = blkio_register_buf,
+    .bdrv_unregister_buf        = blkio_unregister_buf,
 };
 
 static BlockDriver bdrv_virtio_blk_vhost_user = {
@@ -818,6 +982,8 @@  static BlockDriver bdrv_virtio_blk_vhost_user = {
     .bdrv_co_pwrite_zeroes      = blkio_co_pwrite_zeroes,
     .bdrv_io_unplug             = blkio_io_unplug,
     .bdrv_refresh_limits        = blkio_refresh_limits,
+    .bdrv_register_buf          = blkio_register_buf,
+    .bdrv_unregister_buf        = blkio_unregister_buf,
 };
 
 static BlockDriver bdrv_virtio_blk_vhost_vdpa = {
@@ -837,6 +1003,8 @@  static BlockDriver bdrv_virtio_blk_vhost_vdpa = {
     .bdrv_co_pwrite_zeroes      = blkio_co_pwrite_zeroes,
     .bdrv_io_unplug             = blkio_io_unplug,
     .bdrv_refresh_limits        = blkio_refresh_limits,
+    .bdrv_register_buf          = blkio_register_buf,
+    .bdrv_unregister_buf        = blkio_unregister_buf,
 };
 
 static void bdrv_blkio_init(void)