diff mbox

[v2,4/7] qcow2: Split do_perform_cow() into _read(), _encrypt() and _write()

Message ID 643c9f0a83c8f5c913a55a556dcddd13754d1275.1496844254.git.berto@igalia.com
State New
Headers show

Commit Message

Alberto Garcia June 7, 2017, 2:08 p.m. UTC
This patch splits do_perform_cow() into three separate functions to
read, encrypt and write the COW regions.

perform_cow() can now read both regions first, then encrypt them and
finally write them to disk. The memory allocation is also done in
this function now, using one single buffer large enough to hold both
regions.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 114 +++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 84 insertions(+), 30 deletions(-)

Comments

Eric Blake June 9, 2017, 2:53 p.m. UTC | #1
On 06/07/2017 09:08 AM, Alberto Garcia wrote:
> This patch splits do_perform_cow() into three separate functions to
> read, encrypt and write the COW regions.
> 
> perform_cow() can now read both regions first, then encrypt them and
> finally write them to disk. The memory allocation is also done in
> this function now, using one single buffer large enough to hold both
> regions.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 114 +++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 84 insertions(+), 30 deletions(-)
> 

Let's suppose we have a guest issuing 512-byte aligned requests and a
host that requires 4k alignment; and the guest does an operation that
needs a COW with one sector at both the front and end of the cluster.

> @@ -760,22 +776,59 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
>      BDRVQcow2State *s = bs->opaque;
>      Qcow2COWRegion *start = &m->cow_start;
>      Qcow2COWRegion *end = &m->cow_end;
> +    unsigned buffer_size = start->nb_bytes + end->nb_bytes;

This sets buffer_size to 1024 initially.

> +    uint8_t *start_buffer, *end_buffer;
>      int ret;
>  
> +    assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
> +
>      if (start->nb_bytes == 0 && end->nb_bytes == 0) {
>          return 0;
>      }
>  
> +    /* Reserve a buffer large enough to store the data from both the
> +     * start and end COW regions */
> +    start_buffer = qemu_try_blockalign(bs, buffer_size);

This is going to allocate a bigger buffer, namely one that is at least
4k in size (at least, that's my understanding - the block device is able
to track its preferred IO size/alignment).

> +    if (start_buffer == NULL) {
> +        return -ENOMEM;
> +    }
> +    /* The part of the buffer where the end region is located */
> +    end_buffer = start_buffer + start->nb_bytes;

But now end_buffer does not have optimal alignment.  In the old code, we
called qemu_try_blockalign() twice, so that both read()s were called on
a 4k boundary; but now, the end read() is called unaligned to a 4k
boundary.  Of course, since we're only reading 512 bytes, instead of 4k,
it MIGHT not matter, but I don't know if we are going to cause a bounce
buffer to come into play that we could otherwise avoid if we were
smarter with our alignments.  Is that something we need to analyze
further, or even possibly intentionally over-allocate our buffer to
ensure optimal read alignments?

> +    /* And now we can write everything */
> +    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset,
> +                               start_buffer, start->nb_bytes);
> +    if (ret < 0) {
> +        goto fail;
> +    }
>  
> +    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset,
> +                               end_buffer, end->nb_bytes);

At any rate, other than the potential of a pessimization due to poor
alignments, your split looks sane, and it makes it more obvious that if
we set up the write iov, a later patch could then call a single
do_perform_cow_write using pwritev() over both chunks in one syscall.
Alberto Garcia June 12, 2017, 1 p.m. UTC | #2
On Fri 09 Jun 2017 04:53:05 PM CEST, Eric Blake wrote:
> Let's suppose we have a guest issuing 512-byte aligned requests and a
> host that requires 4k alignment; and the guest does an operation that
> needs a COW with one sector at both the front and end of the cluster.
>
>> @@ -760,22 +776,59 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
>>      BDRVQcow2State *s = bs->opaque;
>>      Qcow2COWRegion *start = &m->cow_start;
>>      Qcow2COWRegion *end = &m->cow_end;
>> +    unsigned buffer_size = start->nb_bytes + end->nb_bytes;
>
> This sets buffer_size to 1024 initially.
>
>> +    uint8_t *start_buffer, *end_buffer;
>>      int ret;
>>  
>> +    assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
>> +
>>      if (start->nb_bytes == 0 && end->nb_bytes == 0) {
>>          return 0;
>>      }
>>  
>> +    /* Reserve a buffer large enough to store the data from both the
>> +     * start and end COW regions */
>> +    start_buffer = qemu_try_blockalign(bs, buffer_size);
>
> This is going to allocate a bigger buffer, namely one that is at least
> 4k in size (at least, that's my understanding - the block device is
> able to track its preferred IO size/alignment).
>
>> +    if (start_buffer == NULL) {
>> +        return -ENOMEM;
>> +    }
>> +    /* The part of the buffer where the end region is located */
>> +    end_buffer = start_buffer + start->nb_bytes;
>
> But now end_buffer does not have optimal alignment.  In the old code,
> we called qemu_try_blockalign() twice, so that both read()s were
> called on a 4k boundary; but now, the end read() is called unaligned
> to a 4k boundary.  Of course, since we're only reading 512 bytes,
> instead of 4k, it MIGHT not matter, but I don't know if we are going
> to cause a bounce buffer to come into play that we could otherwise
> avoid if we were smarter with our alignments.  Is that something we
> need to analyze further, or even possibly intentionally over-allocate
> our buffer to ensure optimal read alignments?

I think you're right, I actually thought about this but forgot it when
preparing the series for publishing.

If I'm not wrong it should be simply a matter of replacing

   buffer_size = start->nb_bytes + end->nb_bytes;

with

   buffer_size = QEMU_ALIGN_UP(start->nb_bytes, bdrv_opt_mem_align(bs))
                 + end->nb_bytes;

and then

    end_buffer = start_buffer + buffer_size - end->nb_bytes;

    (this one we do anyway in patch 5)

Berto
diff mbox

Patch

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 4c03639a72..af43e6a34f 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -403,34 +403,26 @@  int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
     return 0;
 }
 
-static int coroutine_fn do_perform_cow(BlockDriverState *bs,
-                                       uint64_t src_cluster_offset,
-                                       uint64_t cluster_offset,
-                                       unsigned offset_in_cluster,
-                                       unsigned bytes)
+static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
+                                            uint64_t src_cluster_offset,
+                                            unsigned offset_in_cluster,
+                                            uint8_t *buffer,
+                                            unsigned bytes)
 {
-    BDRVQcow2State *s = bs->opaque;
     QEMUIOVector qiov;
-    struct iovec iov;
+    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
     int ret;
 
     if (bytes == 0) {
         return 0;
     }
 
-    iov.iov_len = bytes;
-    iov.iov_base = qemu_try_blockalign(bs, iov.iov_len);
-    if (iov.iov_base == NULL) {
-        return -ENOMEM;
-    }
-
     qemu_iovec_init_external(&qiov, &iov, 1);
 
     BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
 
     if (!bs->drv) {
-        ret = -ENOMEDIUM;
-        goto out;
+        return -ENOMEDIUM;
     }
 
     /* Call .bdrv_co_readv() directly instead of using the public block-layer
@@ -440,39 +432,63 @@  static int coroutine_fn do_perform_cow(BlockDriverState *bs,
     ret = bs->drv->bdrv_co_preadv(bs, src_cluster_offset + offset_in_cluster,
                                   bytes, &qiov, 0);
     if (ret < 0) {
-        goto out;
+        return ret;
     }
 
-    if (bs->encrypted) {
+    return 0;
+}
+
+static bool coroutine_fn do_perform_cow_encrypt(BlockDriverState *bs,
+                                                uint64_t src_cluster_offset,
+                                                unsigned offset_in_cluster,
+                                                uint8_t *buffer,
+                                                unsigned bytes)
+{
+    if (bytes && bs->encrypted) {
+        BDRVQcow2State *s = bs->opaque;
         int64_t sector = (src_cluster_offset + offset_in_cluster)
                          >> BDRV_SECTOR_BITS;
         assert(s->cipher);
         assert((offset_in_cluster & ~BDRV_SECTOR_MASK) == 0);
         assert((bytes & ~BDRV_SECTOR_MASK) == 0);
-        if (qcow2_encrypt_sectors(s, sector, iov.iov_base, iov.iov_base,
+        if (qcow2_encrypt_sectors(s, sector, buffer, buffer,
                                   bytes >> BDRV_SECTOR_BITS, true, NULL) < 0) {
-            ret = -EIO;
-            goto out;
+            return false;
         }
     }
+    return true;
+}
+
+static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
+                                             uint64_t cluster_offset,
+                                             unsigned offset_in_cluster,
+                                             uint8_t *buffer,
+                                             unsigned bytes)
+{
+    QEMUIOVector qiov;
+    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
+    int ret;
+
+    if (bytes == 0) {
+        return 0;
+    }
+
+    qemu_iovec_init_external(&qiov, &iov, 1);
 
     ret = qcow2_pre_write_overlap_check(bs, 0,
             cluster_offset + offset_in_cluster, bytes);
     if (ret < 0) {
-        goto out;
+        return ret;
     }
 
     BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
     ret = bdrv_co_pwritev(bs->file, cluster_offset + offset_in_cluster,
                           bytes, &qiov, 0);
     if (ret < 0) {
-        goto out;
+        return ret;
     }
 
-    ret = 0;
-out:
-    qemu_vfree(iov.iov_base);
-    return ret;
+    return 0;
 }
 
 
@@ -760,22 +776,59 @@  static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     BDRVQcow2State *s = bs->opaque;
     Qcow2COWRegion *start = &m->cow_start;
     Qcow2COWRegion *end = &m->cow_end;
+    unsigned buffer_size = start->nb_bytes + end->nb_bytes;
+    uint8_t *start_buffer, *end_buffer;
     int ret;
 
+    assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
+
     if (start->nb_bytes == 0 && end->nb_bytes == 0) {
         return 0;
     }
 
+    /* Reserve a buffer large enough to store the data from both the
+     * start and end COW regions */
+    start_buffer = qemu_try_blockalign(bs, buffer_size);
+    if (start_buffer == NULL) {
+        return -ENOMEM;
+    }
+    /* The part of the buffer where the end region is located */
+    end_buffer = start_buffer + start->nb_bytes;
+
     qemu_co_mutex_unlock(&s->lock);
-    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
-                         start->offset, start->nb_bytes);
+    /* First we read the existing data from both COW regions */
+    ret = do_perform_cow_read(bs, m->offset, start->offset,
+                              start_buffer, start->nb_bytes);
     if (ret < 0) {
         goto fail;
     }
 
-    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
-                         end->offset, end->nb_bytes);
+    ret = do_perform_cow_read(bs, m->offset, end->offset,
+                              end_buffer, end->nb_bytes);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    /* Encrypt the data if necessary before writing it */
+    if (bs->encrypted) {
+        if (!do_perform_cow_encrypt(bs, m->offset, start->offset,
+                                    start_buffer, start->nb_bytes) ||
+            !do_perform_cow_encrypt(bs, m->offset, end->offset,
+                                    end_buffer, end->nb_bytes)) {
+            ret = -EIO;
+            goto fail;
+        }
+    }
+
+    /* And now we can write everything */
+    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset,
+                               start_buffer, start->nb_bytes);
+    if (ret < 0) {
+        goto fail;
+    }
 
+    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset,
+                               end_buffer, end->nb_bytes);
 fail:
     qemu_co_mutex_lock(&s->lock);
 
@@ -788,6 +841,7 @@  fail:
         qcow2_cache_depends_on_flush(s->l2_table_cache);
     }
 
+    qemu_vfree(start_buffer);
     return ret;
 }