Patchwork [PATCHv3] block-migration: efficiently encode zero blocks

Submitter Peter Lieven
Date July 15, 2013, 10:55 a.m.
Message ID <1373885705-13722-1-git-send-email-pl@kamp.de>
Permalink /patch/258996/
State New

Comments

Peter Lieven - July 15, 2013, 10:55 a.m.
this patch adds an efficient encoding for zero blocks by
adding a new flag indiciating a block is completly zero.

additionally bdrv_write_zeros() is used at the destination
to efficiently write these zeroes.

v2->v3:
 - changed type of flags in blk_send() from int to uint64_t
 - added migration capability setting to enable sending
   of zero blocks.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block-migration.c             |   29 +++++++++++++++++++++++------
 include/migration/migration.h |    1 +
 migration.c                   |    9 +++++++++
 qapi-schema.json              |    7 ++++++-
 4 files changed, 39 insertions(+), 7 deletions(-)
Eric Blake - July 15, 2013, 9:27 p.m.
On 07/15/2013 04:55 AM, Peter Lieven wrote:
> this patch adds an efficient encoding for zero blocks by
> adding a new flag indiciating a block is completly zero.

s/indiciating/indicating/
s/completly/completely/

> 
> additionally bdrv_write_zeros() is used at the destination
> to efficiently write these zeroes.
> 
> v2->v3:

patch revision history belongs outside of the commit message proper...

>  - changed type of flags in blk_send() from int to uint64_t
>  - added migration capability setting to enable sending
>    of zero blocks.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---

...here, after the --- separator.

>  block-migration.c             |   29 +++++++++++++++++++++++------
>  include/migration/migration.h |    1 +
>  migration.c                   |    9 +++++++++
>  qapi-schema.json              |    7 ++++++-
>  4 files changed, 39 insertions(+), 7 deletions(-)

> 
> +++ b/qapi-schema.json
> @@ -613,10 +613,15 @@
>  #          Disabled by default. Experimental: may (or may not) be renamed after
>  #          further testing is complete. (since 1.6)
>  #
> +# @zero-blocks: During storage migration encode blocks of zeroes efficiently. This
> +#          essentially saves 1MB of zeroes per block on the wire. Enabling requires
> +#          source and target VM to support this feature. Disabled by default. (since 1.6)

Does this capability have to be explicitly set on the receiving end, or
can it be automatic?  I'd prefer automatic - where only the sending end
has to explicitly turn on the optimization.

Are there any downsides to unconditionally using this when it is
supported on both sides?  With xbzrle, there are workloads where
enabling the feature can actually penalize performance, so libvirt
exposed the choice of using the feature through its API to the end user.
 But if I understand this feature, there are no downsides to using it,
other than generating a migration stream that the destination will not
understand.  Thus, I don't think libvirt public-facing APIs need to
change, and that it is just a matter of implementing this in libvirt
internals that coordinate the start of a migration:

old -> old: feature not present on source, not enabled
old -> new: feature not present on source, not enabled
new -> old: feature present on source, so query destination; feature not
present on destination, so not enabled
new -> new: feature present on source, so query destination; feature
present on destination, so use it
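The four cases above reduce to a simple predicate. A minimal sketch, assuming a hypothetical helper on the management side (the function name and the idea that each side's capability comes from a query of that end are illustrative, not libvirt's actual API):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decision helper mirroring the matrix above: enable the
 * zero-blocks optimization only when both ends advertise the
 * capability. The two booleans stand in for the result of querying
 * migration capabilities on each side. */
static bool should_enable_zero_blocks(bool src_has_cap, bool dst_has_cap)
{
    /* old -> old and old -> new: source lacks the feature, not enabled */
    if (!src_has_cap) {
        return false;
    }
    /* new -> old: destination cannot decode the flag, not enabled;
     * new -> new: safe to turn on at the sending side */
    return dst_has_cap;
}
```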

At any rate, the interface looks sane;
Reviewed-by: Eric Blake <eblake@redhat.com>
Peter Lieven - July 16, 2013, 7:10 a.m.
On 15.07.2013 23:27, Eric Blake wrote:
> On 07/15/2013 04:55 AM, Peter Lieven wrote:
>> this patch adds an efficient encoding for zero blocks by
>> adding a new flag indiciating a block is completly zero.
> s/indiciating/indicating/
> s/completly/completely/
>
>> additionally bdrv_write_zeros() is used at the destination
>> to efficiently write these zeroes.
>>
>> v2->v3:
> patch revision history belongs outside of the commit message proper...
>
>>   - changed type of flags in blk_send() from int to uint64_t
>>   - added migration capability setting to enable sending
>>     of zero blocks.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
> ...here, after the --- separator.
>
>>   block-migration.c             |   29 +++++++++++++++++++++++------
>>   include/migration/migration.h |    1 +
>>   migration.c                   |    9 +++++++++
>>   qapi-schema.json              |    7 ++++++-
>>   4 files changed, 39 insertions(+), 7 deletions(-)
>> +++ b/qapi-schema.json
>> @@ -613,10 +613,15 @@
>>   #          Disabled by default. Experimental: may (or may not) be renamed after
>>   #          further testing is complete. (since 1.6)
>>   #
>> +# @zero-blocks: During storage migration encode blocks of zeroes efficiently. This
>> +#          essentially saves 1MB of zeroes per block on the wire. Enabling requires
>> +#          source and target VM to support this feature. Disabled by default. (since 1.6)
> Does this capability have to be explicitly set on the receiving end, or
> can it be automatic?  I'd prefer automatic - where only the sending end
> has to explicitly turn on the optimization.
Only on the sending end. But you have to check that the receiver supports it as
you figured out. I can update the comments if you like.
>
> Are there any downsides to unconditionally using this when it is
> supported on both sides?  With xbzrle, there are workloads where

Downsides, not that I know of. The problem with xbzrle is that it is
very complex, and memory and network speed may be so high that
not using XBZRLE can be better than enabling it. Here the only
penalty is the zero block check, which is lightning fast compared to
disk access, and the data is in memory anyway.

The benefit is that you gain a lot:

a) you save network bandwidth (which might be low).
b) you can explicitly write zeroes at the receiving end. With the
write-zero optimizations that exist for various drivers this can be a
huge benefit, in that it keeps the target thin-provisioned.
Otherwise a block migration would always mean the target ends up
fully allocated.
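The zero check mentioned above can be sketched as a single linear pass over the 1MB block. This is a simplified, word-at-a-time version; QEMU's actual buffer_is_zero() is more heavily optimized (vectorized), but the cost model is the same:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified zero check in the spirit of buffer_is_zero(): scan the
 * buffer one 64-bit word at a time. Assumes buf is suitably aligned
 * and len is a multiple of sizeof(uint64_t), which holds for the
 * 1MB migration block size. */
static bool block_is_zero(const uint8_t *buf, size_t len)
{
    const uint64_t *p = (const uint64_t *)buf;
    size_t i;

    for (i = 0; i < len / sizeof(uint64_t); i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}
```

Since the data has just been read from disk and is already resident in memory, this pass is cheap compared to the disk access that produced it.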

Peter
Stefan Hajnoczi - July 18, 2013, 5:03 a.m.
On Mon, Jul 15, 2013 at 12:55:05PM +0200, Peter Lieven wrote:
> @@ -114,16 +115,29 @@ static void blk_mig_unlock(void)
>  static void blk_send(QEMUFile *f, BlkMigBlock * blk)
>  {
>      int len;
> +    uint64_t flags = BLK_MIG_FLAG_DEVICE_BLOCK;
> +
> +    if (migrate_zero_blocks() && buffer_is_zero(blk->buf, BLOCK_SIZE)) {
[...]
> +bool migrate_zero_blocks(void)
> +{
> +    MigrationState *s;
> +
> +    s = migrate_get_current();
> +
> +    return s->enabled_capabilities[MIGRATION_CAPABILITY_ZERO_BLOCKS];
> +}

blk_send() is called without locks held.  It would be safer and cleaner
to stash bool migrate_zero_blocks in BlkMigBlock in init_blk_migration()
instead of accessing migrate_get_current() without locks held.

This eliminates the assumption that accessing migrate_get_current() is
safe without locks.
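The suggested restructuring can be sketched with mocked-up types (the field and function names here are illustrative, not the final QEMU code):

```c
#include <stdbool.h>

/* Sketch of the suggestion: capture the zero-blocks capability once,
 * while init_blk_migration() runs in a context where accessing
 * migration state is safe, instead of calling migrate_get_current()
 * from blk_send() without locks held. */
typedef struct {
    bool zero_blocks_enabled;   /* stashed capability value */
    /* ... sector, buf, etc. in the real BlkMigBlock ... */
} MockBlkMigBlock;

/* stands in for the global migration capability */
static bool g_zero_blocks_cap;

static void mock_init_blk_migration(MockBlkMigBlock *blk)
{
    /* done once, at setup time */
    blk->zero_blocks_enabled = g_zero_blocks_cap;
}

static bool mock_blk_send_uses_zero(const MockBlkMigBlock *blk,
                                    bool buf_is_zero)
{
    /* no shared migration state touched here, so no lock needed */
    return blk->zero_blocks_enabled && buf_is_zero;
}
```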

Besides this locking issue I'm happy with the code.

Stefan
Peter Lieven - July 18, 2013, 6:02 a.m.
On 18.07.2013 07:03, Stefan Hajnoczi wrote:
> On Mon, Jul 15, 2013 at 12:55:05PM +0200, Peter Lieven wrote:
>> @@ -114,16 +115,29 @@ static void blk_mig_unlock(void)
>>   static void blk_send(QEMUFile *f, BlkMigBlock * blk)
>>   {
>>       int len;
>> +    uint64_t flags = BLK_MIG_FLAG_DEVICE_BLOCK;
>> +
>> +    if (migrate_zero_blocks() && buffer_is_zero(blk->buf, BLOCK_SIZE)) {
> [...]
>> +bool migrate_zero_blocks(void)
>> +{
>> +    MigrationState *s;
>> +
>> +    s = migrate_get_current();
>> +
>> +    return s->enabled_capabilities[MIGRATION_CAPABILITY_ZERO_BLOCKS];
>> +}
> blk_send() is called without locks held.  It would be safer and cleaner
> to stash bool migrate_zero_blocks in BlkMigBlock in init_blk_migration()
> instead of accessing migrate_get_current() without locks held.
>
> This eliminates the assumption that accessing migrate_get_current() is
> safe without locks.
>
> Besides this locking issue I'm happy with the code.
Thank you,

I will send a v4 shortly. I have to rebase it anyway.

Peter

Patch

diff --git a/block-migration.c b/block-migration.c
index 2fd7699..29b48bd 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -29,6 +29,7 @@ 
 #define BLK_MIG_FLAG_DEVICE_BLOCK       0x01
 #define BLK_MIG_FLAG_EOS                0x02
 #define BLK_MIG_FLAG_PROGRESS           0x04
+#define BLK_MIG_FLAG_ZERO_BLOCK         0x08
 
 #define MAX_IS_ALLOCATED_SEARCH 65536
 
@@ -114,16 +115,29 @@  static void blk_mig_unlock(void)
 static void blk_send(QEMUFile *f, BlkMigBlock * blk)
 {
     int len;
+    uint64_t flags = BLK_MIG_FLAG_DEVICE_BLOCK;
+
+    if (migrate_zero_blocks() && buffer_is_zero(blk->buf, BLOCK_SIZE)) {
+        flags |= BLK_MIG_FLAG_ZERO_BLOCK;
+    }
 
     /* sector number and flags */
     qemu_put_be64(f, (blk->sector << BDRV_SECTOR_BITS)
-                     | BLK_MIG_FLAG_DEVICE_BLOCK);
+                     | flags);
 
     /* device name */
     len = strlen(blk->bmds->bs->device_name);
     qemu_put_byte(f, len);
     qemu_put_buffer(f, (uint8_t *)blk->bmds->bs->device_name, len);
 
+    /* if a block is zero we need to flush here since the network
+     * bandwidth is now a lot higher than the storage device bandwidth.
+     * thus if we queue zero blocks we slow down the migration */
+    if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
+        qemu_fflush(f);
+        return;
+    }
+
     qemu_put_buffer(f, blk->buf, BLOCK_SIZE);
 }
 
@@ -762,12 +776,15 @@  static int block_load(QEMUFile *f, void *opaque, int version_id)
                 nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
             }
 
-            buf = g_malloc(BLOCK_SIZE);
-
-            qemu_get_buffer(f, buf, BLOCK_SIZE);
-            ret = bdrv_write(bs, addr, buf, nr_sectors);
+            if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
+                ret = bdrv_write_zeroes(bs, addr, nr_sectors);
+            } else {
+                buf = g_malloc(BLOCK_SIZE);
+                qemu_get_buffer(f, buf, BLOCK_SIZE);
+                ret = bdrv_write(bs, addr, buf, nr_sectors);
+                g_free(buf);
+            }
 
-            g_free(buf);
             if (ret < 0) {
                 return ret;
             }
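The combined sector/flags word written by qemu_put_be64() in blk_send() works because chunk start addresses are sector-aligned: shifting the sector number left by BDRV_SECTOR_BITS (9) leaves the low 9 bits zero, so flag bits up to 0x1ff fit alongside it. A standalone sketch of the packing (the pack/unpack helper names are illustrative):

```c
#include <stdint.h>

#define BDRV_SECTOR_BITS          9
#define BLK_MIG_FLAG_DEVICE_BLOCK 0x01
#define BLK_MIG_FLAG_ZERO_BLOCK   0x08

/* Pack a sector number and flag bits into one 64-bit word, as
 * blk_send() does: flags occupy the low 9 bits, which are always
 * zero in (sector << BDRV_SECTOR_BITS). */
static uint64_t pack_sector_flags(uint64_t sector, uint64_t flags)
{
    return (sector << BDRV_SECTOR_BITS) | flags;
}

/* Recover the pieces on the receiving side, as block_load() does. */
static uint64_t unpack_sector(uint64_t word)
{
    return word >> BDRV_SECTOR_BITS;
}

static uint64_t unpack_flags(uint64_t word)
{
    return word & ((1 << BDRV_SECTOR_BITS) - 1);
}
```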
diff --git a/include/migration/migration.h b/include/migration/migration.h
index f0640e0..d2b02f3 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -124,6 +124,7 @@  void migrate_add_blocker(Error *reason);
 void migrate_del_blocker(Error *reason);
 
 bool migrate_rdma_pin_all(void);
+bool migrate_zero_blocks(void);
 
 int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
                          uint8_t *dst, int dlen);
diff --git a/migration.c b/migration.c
index 635a7e7..83559d6 100644
--- a/migration.c
+++ b/migration.c
@@ -484,6 +484,15 @@  bool migrate_rdma_pin_all(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_X_RDMA_PIN_ALL];
 }
 
+bool migrate_zero_blocks(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_ZERO_BLOCKS];
+}
+
 int migrate_use_xbzrle(void)
 {
     MigrationState *s;
diff --git a/qapi-schema.json b/qapi-schema.json
index b251d28..6bcfabd 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -613,10 +613,15 @@ 
 #          Disabled by default. Experimental: may (or may not) be renamed after
 #          further testing is complete. (since 1.6)
 #
+# @zero-blocks: During storage migration encode blocks of zeroes efficiently. This
+#          essentially saves 1MB of zeroes per block on the wire. Enabling requires
+#          source and target VM to support this feature. Disabled by default. (since 1.6)
+#
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle', 'x-rdma-pin-all'] }
+  'data': ['xbzrle', 'x-rdma-pin-all', 'zero-blocks'] }
 
 ##
 # @MigrationCapabilityStatus