diff mbox

[7/7] qcow2: Merge the writing of the COW regions with the guest data

Message ID 4086b9ad1d1c912298f7c981f2c94a9f9a7026a3.1495536228.git.berto@igalia.com
State New
Headers show

Commit Message

Alberto Garcia May 23, 2017, 11:23 a.m. UTC
If the guest tries to write data that results on the allocation of a
new cluster, instead of writing the guest data first and then the data
from the COW regions, write everything together using one single I/O
operation.

This can improve the write performance by 25% or more, depending on
several factors such as the media type, the cluster size and the I/O
request size.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 34 ++++++++++++++++++++++--------
 block/qcow2.c         | 58 ++++++++++++++++++++++++++++++++++++++++++---------
 block/qcow2.h         |  7 +++++++
 3 files changed, 80 insertions(+), 19 deletions(-)

Comments

Anton Nefedov May 24, 2017, 4:43 p.m. UTC | #1
> If the guest tries to write data that results on the allocation of a
> new cluster, instead of writing the guest data first and then the data
> from the COW regions, write everything together using one single I/O
> operation.
>
> This can improve the write performance by 25% or more, depending on
> several factors such as the media type, the cluster size and the I/O
> request size.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 34 ++++++++++++++++++++++--------
>  block/qcow2.c         | 58
> ++++++++++++++++++++++++++++++++++++++++++---------
>  block/qcow2.h         |  7 +++++++
>  3 files changed, 80 insertions(+), 19 deletions(-)
>
> -    /* And now we can write everything */
> -    qemu_iovec_reset(&qiov);
> -    qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
> -    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
> -    if (ret < 0) {
> -        goto fail;
> +    /* And now we can write everything. If we have the guest data we
> +     * can write everything in one single operation */
> +    if (m->data_qiov) {
> +        qemu_iovec_reset(&qiov);
> +        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
> +        qemu_iovec_concat(&qiov, m->data_qiov, 0, data_bytes);
> +        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);

Can it be a problem if (m->data_qiov->niov == IOV_MAX)?
We had to add merge-iovecs code for the case (maybe there's better
solution?)

Also, will this work if allocation is split into several l2metas?
e.g. one has cow_start.nb_bytes and another has cow_end.nb_bytes

/Anton
Alberto Garcia May 24, 2017, 7:05 p.m. UTC | #2
On Wed 24 May 2017 06:43:31 PM CEST, Anton Nefedov wrote:
>> +    if (m->data_qiov) {
>> +        qemu_iovec_reset(&qiov);
>> +        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
>> +        qemu_iovec_concat(&qiov, m->data_qiov, 0, data_bytes);
>> +        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
>
> Can it be a problem if (m->data_qiov->niov == IOV_MAX)?
> We had to add merge-iovecs code for the case (maybe there's better
> solution?)

You're right, good catch! I'll add a check for that. To be honest I
don't think that's likely to happen in practice, so if it does we can
simply fall back to the old behavior (separate writes).

> Also, will this work if allocation is split into several l2metas?
> e.g. one has cow_start.nb_bytes and another has cow_end.nb_bytes

The guest request will be merged with the first l2meta that has to copy
at least one of the two regions. It doesn't matter if the other one has
nb_bytes == 0.

Berto
Kevin Wolf May 26, 2017, 10:12 a.m. UTC | #3
Am 24.05.2017 um 21:05 hat Alberto Garcia geschrieben:
> On Wed 24 May 2017 06:43:31 PM CEST, Anton Nefedov wrote:
> >> +    if (m->data_qiov) {
> >> +        qemu_iovec_reset(&qiov);
> >> +        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
> >> +        qemu_iovec_concat(&qiov, m->data_qiov, 0, data_bytes);
> >> +        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
> >
> > Can it be a problem if (m->data_qiov->niov == IOV_MAX)?
> > We had to add merge-iovecs code for the case (maybe there's better
> > solution?)
> 
> You're right, good catch! I'll add a check for that. To be honest I
> don't think that's likely to happen in practice, so if it does we can
> simply fall back to the old behavior (separate writes).
> 
> > Also, will this work if allocation is split into several l2metas?
> > e.g. one has cow_start.nb_bytes and another has cow_end.nb_bytes
> 
> The guest request will be merged with the first l2meta that has to copy
> at least one of the two regions. It doesn't matter if the other one has
> nb_bytes == 0.

I think we can improve this later so that we merge everything together
(multiple l2metas and guest data) into a single request, but what this
seris implements is still a very good first step, so we shouldn't let
this stop us from taking the good rather than waiting for the perfect.

Kevin
Alberto Garcia May 26, 2017, 2:09 p.m. UTC | #4
On Tue 23 May 2017 01:23:02 PM CEST, Alberto Garcia wrote:

You can still review this now if you want, but here's a couple of minor
things I'll correct in the next revision:

> +    if (m->data_qiov) {
> +        qemu_iovec_reset(&qiov);
> +        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
> +        qemu_iovec_concat(&qiov, m->data_qiov, 0, data_bytes);
> +        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);

I'll check start->nb_bytes and end->nb_bytes for zero before calling
qemu_iovec_add() (no practical difference, but I find it a bit cleaner).

> +/* Check if it's possible to merge a write request with the writing of
> + * the data from the COW regions */
> +static bool can_merge_cow(uint64_t offset, unsigned bytes,
> +                          QEMUIOVector *hd_qiov, QCowL2Meta *l2meta)
> +{
> +    QCowL2Meta *m;
> +
> +    for (m = l2meta; m != NULL; m = m->next) {
> +        /* If both COW regions are empty then there's nothing to merge */
> +        if (m->cow_start.nb_bytes == 0 && m->cow_end.nb_bytes == 0) {
> +            continue;
> +        }
> +
> +        /* The data (middle) region must be immediately after the
> +         * start region */
> +        if (l2meta_cow_start(m) + m->cow_start.nb_bytes != offset) {
> +            continue;
> +        }

I realized that I'm using guest offsets in this function. Since this
checks if we can _write_ everything together I think I should be using
host offsets instead.

I also don't think this makes any difference in practice in this case,
but I'll update it.

Berto
diff mbox

Patch

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ae737adf88..e926dad6bb 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -776,6 +776,7 @@  static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
     assert(buffer_size <= UINT_MAX - data_bytes);
     assert(start->offset + start->nb_bytes <= end->offset);
+    assert(!m->data_qiov || m->data_qiov->size == data_bytes);
 
     if (start->nb_bytes == 0 && end->nb_bytes == 0) {
         return 0;
@@ -833,17 +834,32 @@  static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
         }
     }
 
-    /* And now we can write everything */
-    qemu_iovec_reset(&qiov);
-    qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
-    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
-    if (ret < 0) {
-        goto fail;
+    /* And now we can write everything. If we have the guest data we
+     * can write everything in one single operation */
+    if (m->data_qiov) {
+        qemu_iovec_reset(&qiov);
+        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
+        qemu_iovec_concat(&qiov, m->data_qiov, 0, data_bytes);
+        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
+        /* NOTE: we have a write_aio blkdebug event here followed by
+         * a cow_write one in do_perform_cow_write(), but there's only
+         * one single I/O operation */
+        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
+        ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
+    } else {
+        /* If there's no guest data then write both COW regions separately */
+        qemu_iovec_reset(&qiov);
+        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
+        ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
+        if (ret < 0) {
+            goto fail;
+        }
+
+        qemu_iovec_reset(&qiov);
+        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
+        ret = do_perform_cow_write(bs, m->alloc_offset, end->offset, &qiov);
     }
 
-    qemu_iovec_reset(&qiov);
-    qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
-    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset, &qiov);
 fail:
     qemu_co_mutex_lock(&s->lock);
 
diff --git a/block/qcow2.c b/block/qcow2.c
index a8d61f0981..71023ab773 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1575,6 +1575,38 @@  fail:
     return ret;
 }
 
+/* Check if it's possible to merge a write request with the writing of
+ * the data from the COW regions */
+static bool can_merge_cow(uint64_t offset, unsigned bytes,
+                          QEMUIOVector *hd_qiov, QCowL2Meta *l2meta)
+{
+    QCowL2Meta *m;
+
+    for (m = l2meta; m != NULL; m = m->next) {
+        /* If both COW regions are empty then there's nothing to merge */
+        if (m->cow_start.nb_bytes == 0 && m->cow_end.nb_bytes == 0) {
+            continue;
+        }
+
+        /* The data (middle) region must be immediately after the
+         * start region */
+        if (l2meta_cow_start(m) + m->cow_start.nb_bytes != offset) {
+            continue;
+        }
+
+        /* The end region must be immediately after the data (middle)
+         * region */
+        if (m->offset + m->cow_end.offset != offset + bytes) {
+            continue;
+        }
+
+        m->data_qiov = hd_qiov;
+        return true;
+    }
+
+    return false;
+}
+
 static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
                                          uint64_t bytes, QEMUIOVector *qiov,
                                          int flags)
@@ -1657,16 +1689,22 @@  static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
             goto fail;
         }
 
-        qemu_co_mutex_unlock(&s->lock);
-        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
-        trace_qcow2_writev_data(qemu_coroutine_self(),
-                                cluster_offset + offset_in_cluster);
-        ret = bdrv_co_pwritev(bs->file,
-                              cluster_offset + offset_in_cluster,
-                              cur_bytes, &hd_qiov, 0);
-        qemu_co_mutex_lock(&s->lock);
-        if (ret < 0) {
-            goto fail;
+        /* If we need to do COW, check if it's possible to merge the
+         * writing of the guest data together with that of the COW regions.
+         * If it's not possible (or not necessary) then write the
+         * guest data now. */
+        if (!can_merge_cow(offset, cur_bytes, &hd_qiov, l2meta)) {
+            qemu_co_mutex_unlock(&s->lock);
+            BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
+            trace_qcow2_writev_data(qemu_coroutine_self(),
+                                    cluster_offset + offset_in_cluster);
+            ret = bdrv_co_pwritev(bs->file,
+                                  cluster_offset + offset_in_cluster,
+                                  cur_bytes, &hd_qiov, 0);
+            qemu_co_mutex_lock(&s->lock);
+            if (ret < 0) {
+                goto fail;
+            }
         }
 
         while (l2meta != NULL) {
diff --git a/block/qcow2.h b/block/qcow2.h
index c26ee0a33d..87b15eb4aa 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -343,6 +343,13 @@  typedef struct QCowL2Meta
      */
     Qcow2COWRegion cow_end;
 
+    /**
+     * The I/O vector with the data from the actual guest write request.
+     * If non-NULL, this is meant to be merged together with the data
+     * from @cow_start and @cow_end into one single write operation.
+     */
+    QEMUIOVector *data_qiov;
+
     /** Pointer to next L2Meta of the same write request */
     struct QCowL2Meta *next;