Patchwork [43/47] mirror: allow customizing the granularity

login
register
mail settings
Submitter Paolo Bonzini
Date July 24, 2012, 11:04 a.m.
Message ID <1343127865-16608-44-git-send-email-pbonzini@redhat.com>
Download mbox | patch
Permalink /patch/172875/
State New
Headers show

Comments

Paolo Bonzini - July 24, 2012, 11:04 a.m.
The desired granularity may be very different depending on the kind of
operation (e.g. continous replication vs. collapse-to-raw) and whether
the VM is expected to perform lots of I/O while mirroring is in progress.
Allow the user to customize it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/mirror.c   |   38 ++++++++++++++++++++------------------
 block_int.h      |    3 ++-
 blockdev.c       |   15 ++++++++++++++-
 hmp.c            |    2 +-
 qapi-schema.json |    7 ++++++-
 qmp-commands.hx  |    5 ++++-
 6 files changed, 47 insertions(+), 23 deletions(-)
Eric Blake - July 28, 2012, 1:43 p.m.
On 07/24/2012 05:04 AM, Paolo Bonzini wrote:
> The desired granularity may be very different depending on the kind of
> operation (e.g. continous replication vs. collapse-to-raw) and whether

s/continous/continuous/

> the VM is expected to perform lots of I/O while mirroring is in progress.
> Allow the user to customize it.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

> @@ -857,6 +858,17 @@ void qmp_drive_mirror(const char *device, const char *target,
>      if (!has_mode) {
>          mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
>      }
> +    if (!has_granularity) {
> +        granularity = 65536;
> +    }
> +    if (granularity < 512 || granularity > 1048576 * 64) {
> +        error_set(errp, QERR_INVALID_PARAMETER, device);
> +        return;
> +    }
> +    if (granularity & (granularity - 1)) {
> +        error_set(errp, QERR_INVALID_PARAMETER, device);
> +        return;

In the XBZLRE migration series, we decided to round the users input down
to a power of two instead of reject it.  Should we do that here?  Also,
there is already an explicit QERR_PROPERY_VALUE_NOT_POWER_OF_2 that
might fit better here (depending on how the error cleanup series goes).
Paolo Bonzini - July 30, 2012, 1:40 p.m.
Il 28/07/2012 15:43, Eric Blake ha scritto:
>> > +    if (granularity < 512 || granularity > 1048576 * 64) {
>> > +        error_set(errp, QERR_INVALID_PARAMETER, device);
>> > +        return;
>> > +    }
>> > +    if (granularity & (granularity - 1)) {
>> > +        error_set(errp, QERR_INVALID_PARAMETER, device);
>> > +        return;
> In the XBZLRE migration series, we decided to round the users input down
> to a power of two instead of reject it.  Should we do that here?

I can certainly do that, but do you have a pointer to the discussion so
that I can understand the rationale?

> Also,
> there is already an explicit QERR_PROPERY_VALUE_NOT_POWER_OF_2 that
> might fit better here (depending on how the error cleanup series goes).

It's not a property value though.

Paolo
Eric Blake - July 30, 2012, 1:53 p.m.
On 07/30/2012 07:40 AM, Paolo Bonzini wrote:
> Il 28/07/2012 15:43, Eric Blake ha scritto:
>>>> +    if (granularity < 512 || granularity > 1048576 * 64) {
>>>> +        error_set(errp, QERR_INVALID_PARAMETER, device);
>>>> +        return;
>>>> +    }
>>>> +    if (granularity & (granularity - 1)) {
>>>> +        error_set(errp, QERR_INVALID_PARAMETER, device);
>>>> +        return;
>> In the XBZLRE migration series, we decided to round the users input down
>> to a power of two instead of reject it.  Should we do that here?
> 
> I can certainly do that, but do you have a pointer to the discussion so
> that I can understand the rationale?

https://lists.gnu.org/archive/html/qemu-devel/2012-05/msg02421.html
Paolo Bonzini - July 30, 2012, 2:03 p.m.
Il 30/07/2012 15:53, Eric Blake ha scritto:
> On 07/30/2012 07:40 AM, Paolo Bonzini wrote:
>> Il 28/07/2012 15:43, Eric Blake ha scritto:
>>>>> +    if (granularity < 512 || granularity > 1048576 * 64) {
>>>>> +        error_set(errp, QERR_INVALID_PARAMETER, device);
>>>>> +        return;
>>>>> +    }
>>>>> +    if (granularity & (granularity - 1)) {
>>>>> +        error_set(errp, QERR_INVALID_PARAMETER, device);
>>>>> +        return;
>>> In the XBZLRE migration series, we decided to round the users input down
>>> to a power of two instead of reject it.  Should we do that here?
>>
>> I can certainly do that, but do you have a pointer to the discussion so
>> that I can understand the rationale?
> 
> https://lists.gnu.org/archive/html/qemu-devel/2012-05/msg02421.html

Hmm, a buffer size (the cache size in the case of XBZRLE) is different
however.  There is no reason in principle why it needs to be a power of
two (except if you know how the hash table is implemented, or something
like that).  For example, the buf_size argument in this series does
indeed support a non-power-of-two size.

Requesting a granularity to be a power-of-two shouldn't be surprising to
anybody who has an idea of what a bit shift is and how it is used...

Paolo

Patch

diff --git a/block/mirror.c b/block/mirror.c
index 48ee963..81a600b 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -17,9 +17,6 @@ 
 #include "qemu/ratelimit.h"
 #include "bitmap.h"
 
-#define BLOCK_SIZE                       (1 << 20)
-#define BDRV_SECTORS_PER_DIRTY_CHUNK     (BLOCK_SIZE >> BDRV_SECTOR_BITS)
-
 #define SLICE_TIME 100000000ULL /* ns */
 
 typedef struct MirrorBlockJob {
@@ -31,6 +28,7 @@  typedef struct MirrorBlockJob {
     bool synced;
     bool complete;
     int64_t sector_num;
+    int64_t granularity;
     size_t buf_size;
     unsigned long *cow_bitmap;
     HBitmapIter hbi;
@@ -43,7 +41,7 @@  static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
     BlockDriverState *source = s->common.bs;
     BlockDriverState *target = s->target;
     QEMUIOVector qiov;
-    int ret, nb_sectors;
+    int ret, nb_sectors, nb_sectors_chunk;
     int64_t end, sector_num, cluster_num;
     struct iovec iov;
 
@@ -59,19 +57,19 @@  static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
      * is very large, we need to do COW ourselves.  The first time a cluster is
      * copied, copy it entirely.
      *
-     * Because both BDRV_SECTORS_PER_DIRTY_CHUNK and the cluster size are
-     * powers of two, the number of sectors to copy cannot exceed one cluster.
+     * Because both the granularity and the cluster size are powers of two, the
+     * number of sectors to copy cannot exceed one cluster.
      */
     sector_num = s->sector_num;
-    nb_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
-    cluster_num = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK;
+    nb_sectors_chunk = nb_sectors = s->granularity >> BDRV_SECTOR_BITS;
+    cluster_num = sector_num / nb_sectors_chunk;
     if (s->cow_bitmap && !test_bit(cluster_num, s->cow_bitmap)) {
         trace_mirror_cow(s, sector_num);
         bdrv_round_to_clusters(s->target,
-                               sector_num, BDRV_SECTORS_PER_DIRTY_CHUNK,
+                               sector_num, nb_sectors_chunk,
                                &sector_num, &nb_sectors);
-        bitmap_set(s->cow_bitmap, sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK,
-                   nb_sectors / BDRV_SECTORS_PER_DIRTY_CHUNK);
+        bitmap_set(s->cow_bitmap, sector_num / nb_sectors_chunk,
+                   nb_sectors / nb_sectors_chunk);
     }
 
     end = s->common.len >> BDRV_SECTOR_BITS;
@@ -110,7 +108,7 @@  static void coroutine_fn mirror_run(void *opaque)
 {
     MirrorBlockJob *s = opaque;
     BlockDriverState *bs = s->common.bs;
-    int64_t sector_num, end, length;
+    int64_t sector_num, end, nb_sectors_chunk, length;
     BlockDriverInfo bdi;
     char backing_filename[1024];
     int ret = 0;
@@ -136,20 +134,21 @@  static void coroutine_fn mirror_run(void *opaque)
         bdrv_get_info(s->target, &bdi);
         if (s->buf_size < bdi.cluster_size) {
             s->buf_size = bdi.cluster_size;
-            length = (bdrv_getlength(bs) + BLOCK_SIZE - 1) / BLOCK_SIZE;
+            length = (bdrv_getlength(bs) + s->granularity - 1) / s->granularity;
             s->cow_bitmap = bitmap_new(length);
         }
     }
 
     end = s->common.len >> BDRV_SECTOR_BITS;
     s->buf = qemu_blockalign(bs, s->buf_size);
+    nb_sectors_chunk = s->granularity >> BDRV_SECTOR_BITS;
 
     if (s->mode == MIRROR_SYNC_MODE_FULL || s->mode == MIRROR_SYNC_MODE_TOP) {
         /* First part, loop on the sectors and initialize the dirty bitmap.  */
         BlockDriverState *base;
         base = s->mode == MIRROR_SYNC_MODE_FULL ? NULL : bs->backing_hd;
         for (sector_num = 0; sector_num < end; ) {
-            int64_t next = (sector_num | (BDRV_SECTORS_PER_DIRTY_CHUNK - 1)) + 1;
+            int64_t next = (sector_num | (nb_sectors_chunk - 1)) + 1;
             ret = bdrv_co_is_allocated_above(bs, base,
                                              sector_num, next - sector_num, &n);
 
@@ -224,7 +223,7 @@  static void coroutine_fn mirror_run(void *opaque)
             s->common.offset = (end - cnt) * BDRV_SECTOR_SIZE;
 
             if (s->common.speed) {
-                delay_ns = ratelimit_calculate_delay(&s->limit, BDRV_SECTORS_PER_DIRTY_CHUNK);
+                delay_ns = ratelimit_calculate_delay(&s->limit, nb_sectors_chunk);
             } else {
                 delay_ns = 0;
             }
@@ -323,7 +322,7 @@  static BlockJobType mirror_job_type = {
 };
 
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
-                  int64_t speed, MirrorSyncMode mode,
+                  int64_t speed, int64_t granularity, MirrorSyncMode mode,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
                   BlockDriverCompletionFunc *cb,
@@ -331,6 +330,8 @@  void mirror_start(BlockDriverState *bs, BlockDriverState *target,
 {
     MirrorBlockJob *s;
 
+    assert((granularity & (granularity - 1)) == 0);
+
     if ((on_source_error == BLOCKDEV_ON_ERROR_STOP ||
          on_source_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
         !bdrv_iostatus_is_enabled(bs)) {
@@ -347,9 +348,10 @@  void mirror_start(BlockDriverState *bs, BlockDriverState *target,
     s->on_target_error = on_target_error;
     s->target = target;
     s->mode = mode;
-    s->buf_size = BLOCK_SIZE;
+    s->granularity = granularity;
+    s->buf_size = granularity;
 
-    bdrv_set_dirty_tracking(bs, BDRV_SECTORS_PER_DIRTY_CHUNK);
+    bdrv_set_dirty_tracking(bs, granularity >> BDRV_SECTOR_BITS);
     bdrv_set_on_error(s->target, on_target_error, on_target_error);
     bdrv_iostatus_enable(s->target);
     s->common.co = qemu_coroutine_create(mirror_run);
diff --git a/block_int.h b/block_int.h
index 4cb3c5b..69bb769 100644
--- a/block_int.h
+++ b/block_int.h
@@ -310,6 +310,7 @@  void stream_start(BlockDriverState *bs, BlockDriverState *base,
  * @bs: Block device to operate on.
  * @target: Block device to write to.
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @granularity: The chosen granularity for the dirty bitmap.
  * @mode: Whether to collapse all images in the chain to the target.
  * @on_source_error: The action to take upon error reading from the source.
  * @on_target_error: The action to take upon error writing to the target.
@@ -323,7 +324,7 @@  void stream_start(BlockDriverState *bs, BlockDriverState *base,
  * @bs will be switched to read from @target.
  */
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
-                  int64_t speed, MirrorSyncMode mode,
+                  int64_t speed, int64_t granularity, MirrorSyncMode mode,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
                   BlockDriverCompletionFunc *cb,
diff --git a/blockdev.c b/blockdev.c
index e160610..13b6217 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -832,6 +832,7 @@  void qmp_drive_mirror(const char *device, const char *target,
                       enum MirrorSyncMode sync,
                       bool has_mode, enum NewImageMode mode,
                       bool has_speed, int64_t speed,
+                      bool has_granularity, int64_t granularity,
                       bool has_on_source_error, BlockdevOnError on_source_error,
                       bool has_on_target_error, BlockdevOnError on_target_error,
                       Error **errp)
@@ -857,6 +858,17 @@  void qmp_drive_mirror(const char *device, const char *target,
     if (!has_mode) {
         mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
     }
+    if (!has_granularity) {
+        granularity = 65536;
+    }
+    if (granularity < 512 || granularity > 1048576 * 64) {
+        error_set(errp, QERR_INVALID_PARAMETER, device);
+        return;
+    }
+    if (granularity & (granularity - 1)) {
+        error_set(errp, QERR_INVALID_PARAMETER, device);
+        return;
+    }
 
     bs = bdrv_find(device);
     if (!bs) {
@@ -936,7 +948,8 @@  void qmp_drive_mirror(const char *device, const char *target,
         return;
     }
 
-    mirror_start(bs, target_bs, speed, sync, on_source_error, on_target_error,
+    mirror_start(bs, target_bs, speed, granularity, sync,
+                 on_source_error, on_target_error,
                  block_job_cb, bs, &local_err);
     if (local_err != NULL) {
         bdrv_delete(target_bs);
diff --git a/hmp.c b/hmp.c
index b6bc263..4f95096 100644
--- a/hmp.c
+++ b/hmp.c
@@ -719,7 +719,7 @@  void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
 
     qmp_drive_mirror(device, filename, !!format, format,
                      full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
-                     true, mode, false, 0,
+                     true, mode, false, 0, false, 0,
                      false, 0, false, 0, &errp);
     hmp_handle_error(mon, &errp);
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index 7a97fad..fb0ccc7 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1389,6 +1389,9 @@ 
 #        (all the disk, only the sectors allocated in the topmost image, or
 #        only new I/O).
 #
+# @granularity: #optional granularity of the dirty bitmap, default is 64K.
+#               Must be a power of 2 between 512 and 64M.
+#
 # @on-source-error: #optional the action to take on an error on the source,
 #                   default 'report'.  'stop' and 'enospc' can only be used
 #                   if the block device supports io-status (see BlockInfo).
@@ -1401,6 +1404,7 @@ 
 #          If @device is not a valid block device, DeviceNotFound
 #          If @target can't be opened, OpenFileFailed
 #          If @format is invalid, InvalidBlockFormat
+#          If @granularity is not a power of 2, InvalidParameter
 #          If @on_source_error is not supported, InvalidParameter
 #
 # Since 1.2
@@ -1408,7 +1412,8 @@ 
 { 'command': 'drive-mirror',
   'data': { 'device': 'str', 'target': 'str', '*format': 'str',
             'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
-            '*speed': 'int', '*on-source-error': 'BlockdevOnError',
+            '*speed': 'int', '*granularity': 'int',
+            '*on-source-error': 'BlockdevOnError',
             '*on-target-error': 'BlockdevOnError' } }
 
 ##
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 5081b01..89e12f5 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -839,7 +839,8 @@  EQMP
     {
         .name       = "drive-mirror",
         .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?,"
-                      "on-source-error:s?,on-target-error:s?",
+                      "on-source-error:s?,on-target-error:s?,"
+                      "granularity:i?",
         .mhandler.cmd_new = qmp_marshal_input_drive_mirror,
     },
 
@@ -863,6 +864,8 @@  Arguments:
   file/device (NewImageMode, optional, default 'absolute-paths')
 - "speed": maximum speed of the streaming job, in bytes per second
   (json-int)
+- "granularity": granularity of the dirty bitmap (json-int, default 64k,
+  must be a power of two between 512 and 64M.
 - "sync": what parts of the disk image should be copied to the destination;
   possibilities include "full" for all the disk, "top" for only the sectors
   allocated in the topmost image, or "none" to only replicate new I/O