Patchwork [v2,31/45] mirror: add support for on-source-error/on-target-error

Submitter Paolo Bonzini
Date Sept. 26, 2012, 3:56 p.m.
Message ID <1348675011-8794-32-git-send-email-pbonzini@redhat.com>
Permalink /patch/187151/
State New

Comments

Paolo Bonzini - Sept. 26, 2012, 3:56 p.m.
Error management is important for mirroring; otherwise, an error on the
target (even something as "innocent" as ENOSPC) requires starting again
with a full copy.  Similar to on_read_error/on_write_error, two separate
knobs are provided for on_source_error (reads) and on_target_error (writes).
The default is 'report' for both.

The 'ignore' policy will leave the sector dirty, so that it will be
retried later.  Thus, it will not cause corruption.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: error handling for bdrv_flush, introduce mirror_error_action

 block/mirror.c   | 95 +++++++++++++++++++++++++++++++++++++++++++-------------
 block_int.h      |  4 +++
 blockdev.c       | 14 +++++++--
 hmp.c            |  3 +-
 qapi-schema.json | 11 ++++++-
 qmp-commands.hx  |  8 ++++-
 6 files changed, 109 insertions(+), 26 deletions(-)
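
The behaviour described in the commit message (separate policies for reads and writes, and a failed chunk being left dirty so that it is retried later) can be illustrated with a small standalone sketch. This is not the QEMU code: every `toy_` name is hypothetical, the "I/O" is simulated by passing in errno-style return values, and only the control flow mirrors the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the policies and resulting actions. */
enum toy_policy { TOY_REPORT, TOY_IGNORE, TOY_STOP };
enum toy_action { TOY_ACT_REPORT, TOY_ACT_IGNORE, TOY_ACT_STOP };

#define TOY_CHUNKS 8

struct toy_mirror {
    bool dirty[TOY_CHUNKS];            /* chunks still to be copied */
    enum toy_policy on_source_error;   /* applies to reads */
    enum toy_policy on_target_error;   /* applies to writes */
};

/* Pick the action from the knob that matches the failing direction. */
static enum toy_action toy_error_action(const struct toy_mirror *m,
                                        bool is_read)
{
    enum toy_policy p = is_read ? m->on_source_error : m->on_target_error;

    switch (p) {
    case TOY_IGNORE:
        return TOY_ACT_IGNORE;
    case TOY_STOP:
        return TOY_ACT_STOP;
    default:
        return TOY_ACT_REPORT;
    }
}

/* Copy one chunk.  read_ret and write_ret simulate the results of the
 * read and the write (0 on success, negative on failure). */
static int toy_iteration(struct toy_mirror *m, size_t chunk,
                         int read_ret, int write_ret,
                         enum toy_action *p_action)
{
    int ret;

    m->dirty[chunk] = false;           /* chunk picked up for copying */

    ret = read_ret;
    if (ret < 0) {
        *p_action = toy_error_action(m, true);
        goto fail;
    }
    ret = write_ret;
    if (ret < 0) {
        *p_action = toy_error_action(m, false);
        goto fail;
    }
    return 0;

fail:
    /* Try again later: the chunk stays dirty whatever the policy is. */
    m->dirty[chunk] = true;
    return ret;
}

/* Number of chunks that still need copying. */
static size_t toy_dirty_count(const struct toy_mirror *m)
{
    size_t i, n = 0;

    for (i = 0; i < TOY_CHUNKS; i++) {
        n += m->dirty[i];
    }
    return n;
}
```

In this sketch a caller would leave its loop only when the iteration fails and the action is "report"; with "ignore" the chunk is simply seen as dirty again on a later pass.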
Kevin Wolf - Oct. 18, 2012, 1:07 p.m.
On 26.09.2012 17:56, Paolo Bonzini wrote:
> Error management is important for mirroring; otherwise, an error on the
> target (even something as "innocent" as ENOSPC) requires starting again
> with a full copy.  Similar to on_read_error/on_write_error, two separate
> knobs are provided for on_source_error (reads) and on_target_error (writes).
> The default is 'report' for both.
> 
> The 'ignore' policy will leave the sector dirty, so that it will be
> retried later.  Thus, it will not cause corruption.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: error handling for bdrv_flush, introduce mirror_error_action
> 
>  block/mirror.c   | 95 +++++++++++++++++++++++++++++++++++++++++++-------------
>  block_int.h      |  4 +++
>  blockdev.c       | 14 +++++++--
>  hmp.c            |  3 +-
>  qapi-schema.json | 11 ++++++-
>  qmp-commands.hx  |  8 ++++-
>  6 files changed, 109 insertions(+), 26 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 939834d..caec272 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -32,13 +32,28 @@ typedef struct MirrorBlockJob {
>      RateLimit limit;
>      BlockDriverState *target;
>      MirrorSyncMode mode;
> +    BlockdevOnError on_source_error, on_target_error;
>      bool synced;
>      bool complete;
>      int64_t sector_num;
>      uint8_t *buf;
>  } MirrorBlockJob;
>  
> -static int coroutine_fn mirror_iteration(MirrorBlockJob *s)
> +static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
> +                                            int error)
> +{
> +    s->synced = false;
> +    if (read) {
> +        return block_job_error_action(&s->common, s->common.bs,
> +                                      s->on_source_error, true, error);
> +    } else {
> +        return block_job_error_action(&s->common, s->target,
> +                                      s->on_target_error, false, error);

Here we produce an event that reports an error on s->bs, i.e. on the
source, even though the error was on the target. This makes some sense
today, while the target doesn't have a name, but once it has one, we had
better use the target name here.

Can we change this later on? If not, what's the way forward?

Kevin
Paolo Bonzini - Oct. 18, 2012, 1:10 p.m.
On 18/10/2012 15:07, Kevin Wolf wrote:
>> > +    s->synced = false;
>> > +    if (read) {
>> > +        return block_job_error_action(&s->common, s->common.bs,
>> > +                                      s->on_source_error, true, error);
>> > +    } else {
>> > +        return block_job_error_action(&s->common, s->target,
>> > +                                      s->on_target_error, false, error);
> Here we produce an event that reports an error on s->bs, i.e. on the
> source, even though the error was on the target.

More precisely, this is an event that reports an error on s->bs's job.
In principle there is no reason why asynchronous long-running operations
are tied to a block device (in fact migration fits the definition quite
well, with the only twist that the VM is stopped at the end), but that's
the API we're stuck with.

> This makes some sense
> today, while the target doesn't have a name, but once it has one, we had
> better use the target name here.
> 
> Can we change this later on? If not, what's the way forward?

Yes, we can change it to one of these:

1) produce both a BLOCK_JOB_ERROR event on the source and a
BLOCK_IO_ERROR event on the target;

2) add a "device" argument to the BLOCK_JOB_ERROR and fill it.

I think I prefer the latter, but it can be discussed separately.

Paolo
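
As an illustration of option 2 above, the event could carry both the device that owns the job and the device on which the error actually happened. The sketch below is purely hypothetical: the struct, its field names and the helper are invented for illustration, do not describe any existing QEMU event layout, and assume that the target has a name by then.

```c
#include <stdbool.h>

/* Hypothetical event payload; not an existing QEMU structure. */
struct toy_job_error_event {
    const char *job_device;    /* device that owns the job (the source) */
    const char *error_device;  /* assumed extra "device" argument */
    bool is_read;              /* true for a read, false for a write */
};

/* Fill in the event: reads fail on the source, writes on the target. */
static struct toy_job_error_event toy_job_error(const char *source,
                                                const char *target,
                                                bool is_read)
{
    struct toy_job_error_event ev;

    ev.job_device = source;
    ev.error_device = is_read ? source : target;
    ev.is_read = is_read;
    return ev;
}
```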
Kevin Wolf - Oct. 18, 2012, 1:56 p.m.
On 18.10.2012 15:10, Paolo Bonzini wrote:
> On 18/10/2012 15:07, Kevin Wolf wrote:
>>>> +    s->synced = false;
>>>> +    if (read) {
>>>> +        return block_job_error_action(&s->common, s->common.bs,
>>>> +                                      s->on_source_error, true, error);
>>>> +    } else {
>>>> +        return block_job_error_action(&s->common, s->target,
>>>> +                                      s->on_target_error, false, error);
>> Here we produce an event that reports an error on s->bs, i.e. on the
>> source, even though the error was on the target.
> 
> More precisely, this is an event that reports an error on s->bs's job.
> In principle there is no reason why asynchronous long-running operations
> are tied to a block device (in fact migration fits the definition quite
> well, with the only twist that the VM is stopped at the end), but that's
> the API we're stuck with.

Yes, I think I mentioned already more than once that it shouldn't be a
block job, but a background job without a reference to a (single)
BlockDriverState. What we have just doesn't make any sense - even for
block jobs, because block jobs working on a single BDS are the
exception, not the rule.

I should probably have tried to fix this when I first mentioned it, but
too many incoming patches prevent me from making any changes myself...

>> This makes some sense
>> today, while the target doesn't have a name, but once it has one, we had
>> better use the target name here.
>>
>> Can we change this later on? If not, what's the way forward?
> 
> Yes, we can change it to one of these:
> 
> 1) produce both a BLOCK_JOB_ERROR event on the source and a
> BLOCK_IO_ERROR event on the target;
> 
> 2) add a "device" argument to the BLOCK_JOB_ERROR and fill it.
> 
> I think I prefer the latter, but it can be discussed separately.

I already hate it again. But yeah, we can muddle through somehow, not a
blocker at this moment.

Kevin
Paolo Bonzini - Oct. 18, 2012, 2:52 p.m.
On 18/10/2012 15:56, Kevin Wolf wrote:
> On 18.10.2012 15:10, Paolo Bonzini wrote:
>> On 18/10/2012 15:07, Kevin Wolf wrote:
>>>>> +    s->synced = false;
>>>>> +    if (read) {
>>>>> +        return block_job_error_action(&s->common, s->common.bs,
>>>>> +                                      s->on_source_error, true, error);
>>>>> +    } else {
>>>>> +        return block_job_error_action(&s->common, s->target,
>>>>> +                                      s->on_target_error, false, error);
>>> Here we produce an event that reports an error on s->bs, i.e. on the
>>> source, even though the error was on the target.
>>
>> More precisely, this is an event that reports an error on s->bs's job.
>> In principle there is no reason why asynchronous long-running operations
>> are tied to a block device (in fact migration fits the definition quite
>> well, with the only twist that the VM is stopped at the end), but that's
>> the API we're stuck with.
> 
> Yes, I think I mentioned already more than once that it shouldn't be a
> block job, but a background job without a reference to a (single)
> BlockDriverState. What we have just doesn't make any sense - even for
> block jobs, because block jobs working on a single BDS are the
> exception, not the rule.

I'm quite at a loss with how to change this without breaking the API. :/

Unfortunately this came up after the first release with streaming.

Paolo
Kevin Wolf - Oct. 19, 2012, 8:04 a.m.
On 18.10.2012 16:52, Paolo Bonzini wrote:
> On 18/10/2012 15:56, Kevin Wolf wrote:
>> On 18.10.2012 15:10, Paolo Bonzini wrote:
>>> On 18/10/2012 15:07, Kevin Wolf wrote:
>>>>>> +    s->synced = false;
>>>>>> +    if (read) {
>>>>>> +        return block_job_error_action(&s->common, s->common.bs,
>>>>>> +                                      s->on_source_error, true, error);
>>>>>> +    } else {
>>>>>> +        return block_job_error_action(&s->common, s->target,
>>>>>> +                                      s->on_target_error, false, error);
>>>> Here we produce an event that reports an error on s->bs, i.e. on the
>>>> source, even though the error was on the target.
>>>
>>> More precisely, this is an event that reports an error on s->bs's job.
>>> In principle there is no reason why asynchronous long-running operations
>>> are tied to a block device (in fact migration fits the definition quite
>>> well, with the only twist that the VM is stopped at the end), but that's
>>> the API we're stuck with.
>>
>> Yes, I think I mentioned already more than once that it shouldn't be a
>> block job, but a background job without a reference to a (single)
>> BlockDriverState. What we have just doesn't make any sense - even for
>> block jobs, because block jobs working on a single BDS are the
>> exception, not the rule.
> 
> I'm quite at a loss with how to change this without breaking the API. :/
> 
> Unfortunately this came up after the first release with streaming.

Then let's break the API. Not immediately, I think we can keep some
useless compatibility fields in the implementation of background jobs
that would only be needed to allow the block job commands to be a
wrapper (mostly 'bool is_block_job' and 'BlockDriverState bs', I think;
maybe even just char* bs_name would be enough). Then deprecate block
jobs and at 1.6 or so remove them.

Kevin
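
A rough sketch of the compatibility idea above: a generic background job keeps two otherwise useless fields so that the existing block job commands, which look jobs up by device, can remain thin wrappers. Apart from the `is_block_job` and `bs_name` fields named in the message, everything here (the `toy_` names, the `id` field, the lookup helper) is an assumption made for illustration and not a proposed implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical generic background job with compatibility fields. */
struct toy_bg_job {
    const char *id;        /* assumed job identifier, not tied to a device */
    bool is_block_job;     /* compat: visible to the block job commands? */
    const char *bs_name;   /* compat: device name used for legacy lookup */
};

/* Legacy-style lookup: find the block job attached to a device name. */
static struct toy_bg_job *toy_find_block_job(struct toy_bg_job *jobs,
                                             size_t n, const char *device)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (jobs[i].is_block_job && jobs[i].bs_name != NULL &&
            strcmp(jobs[i].bs_name, device) == 0) {
            return &jobs[i];
        }
    }
    return NULL;   /* no block job on that device */
}
```

Jobs that are not block jobs are simply invisible to the legacy lookup, which is what would allow the old commands to be deprecated and removed later without touching the generic job type.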
Paolo Bonzini - Oct. 19, 2012, 9:30 a.m.
> Then let's break the API. Not immediately, I think we can keep some
> useless compatibility fields in the implementation of background jobs
> that would only be needed to allow the block job commands to be a
> wrapper (mostly 'bool is_block_job' and 'BlockDriverState bs', I think;
> maybe even just char* bs_name would be enough). Then deprecate block
> jobs and at 1.6 or so remove them.

That's a plan.  I promise to send fewer patches starting at the next release
cycle!

Paolo

Patch

diff --git a/block/mirror.c b/block/mirror.c
index 939834d..caec272 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -32,13 +32,28 @@  typedef struct MirrorBlockJob {
     RateLimit limit;
     BlockDriverState *target;
     MirrorSyncMode mode;
+    BlockdevOnError on_source_error, on_target_error;
     bool synced;
     bool complete;
     int64_t sector_num;
     uint8_t *buf;
 } MirrorBlockJob;
 
-static int coroutine_fn mirror_iteration(MirrorBlockJob *s)
+static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
+                                            int error)
+{
+    s->synced = false;
+    if (read) {
+        return block_job_error_action(&s->common, s->common.bs,
+                                      s->on_source_error, true, error);
+    } else {
+        return block_job_error_action(&s->common, s->target,
+                                      s->on_target_error, false, error);
+    }
+}
+
+static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
+                                         BlockErrorAction *p_action)
 {
     BlockDriverState *source = s->common.bs;
     BlockDriverState *target = s->target;
@@ -60,9 +75,21 @@  static int coroutine_fn mirror_iteration(MirrorBlockJob *s)
     trace_mirror_one_iteration(s, s->sector_num, nb_sectors);
     ret = bdrv_co_readv(source, s->sector_num, nb_sectors, &qiov);
     if (ret < 0) {
-        return ret;
+        *p_action = mirror_error_action(s, true, -ret);
+        goto fail;
+    }
+    ret = bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov);
+    if (ret < 0) {
+        *p_action = mirror_error_action(s, false, -ret);
+        s->synced = false;
+        goto fail;
     }
-    return bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov);
+    return 0;
+
+fail:
+    /* Try again later.  */
+    bdrv_set_dirty(source, s->sector_num, nb_sectors);
+    return ret;
 }
 
 static void coroutine_fn mirror_run(void *opaque)
@@ -117,8 +144,9 @@  static void coroutine_fn mirror_run(void *opaque)
 
         cnt = bdrv_get_dirty_count(bs);
         if (cnt != 0) {
-            ret = mirror_iteration(s);
-            if (ret < 0) {
+            BlockErrorAction action = BDRV_ACTION_REPORT;
+            ret = mirror_iteration(s, &action);
+            if (ret < 0 && action == BDRV_ACTION_REPORT) {
                 goto immediate_exit;
             }
             cnt = bdrv_get_dirty_count(bs);
@@ -127,23 +155,26 @@  static void coroutine_fn mirror_run(void *opaque)
         should_complete = false;
         if (cnt == 0) {
             trace_mirror_before_flush(s);
-            if (bdrv_flush(s->target) < 0) {
-                goto immediate_exit;
-            }
-
-            /* We're out of the streaming phase.  From now on, if the job
-             * is cancelled we will actually complete all pending I/O and
-             * report completion.  This way, block-job-cancel will leave
-             * the target in a consistent state.
-             */
-            s->common.offset = end * BDRV_SECTOR_SIZE;
-            if (!s->synced) {
-                block_job_ready(&s->common);
-                s->synced = true;
+            ret = bdrv_flush(s->target);
+            if (ret < 0) {
+                if (mirror_error_action(s, false, -ret) == BDRV_ACTION_REPORT) {
+                    goto immediate_exit;
+                }
+            } else {
+                /* We're out of the streaming phase.  From now on, if the job
+                 * is cancelled we will actually complete all pending I/O and
+                 * report completion.  This way, block-job-cancel will leave
+                 * the target in a consistent state.
+                 */
+                s->common.offset = end * BDRV_SECTOR_SIZE;
+                if (!s->synced) {
+                    block_job_ready(&s->common);
+                    s->synced = true;
+                }
+
+                should_complete = block_job_is_cancelled(&s->common) || s->complete;
+                cnt = bdrv_get_dirty_count(bs);
             }
-
-            should_complete = block_job_is_cancelled(&s->common) || s->complete;
-            cnt = bdrv_get_dirty_count(bs);
         }
 
         if (cnt == 0 && should_complete) {
@@ -195,6 +226,7 @@  static void coroutine_fn mirror_run(void *opaque)
 immediate_exit:
     g_free(s->buf);
     bdrv_set_dirty_tracking(bs, false);
+    bdrv_iostatus_disable(s->target);
     if (s->complete && ret == 0) {
         bdrv_swap(s->target, s->common.bs);
     }
@@ -214,6 +246,13 @@  static void mirror_set_speed(BlockJob *job, int64_t speed, Error **errp)
     ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
 }
 
+static void mirror_iostatus_reset(BlockJob *job)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+
+    bdrv_iostatus_reset(s->target);
+}
+
 static void mirror_complete(BlockJob *job, Error **errp)
 {
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
@@ -240,25 +279,39 @@  static BlockJobType mirror_job_type = {
     .instance_size = sizeof(MirrorBlockJob),
     .job_type      = "mirror",
     .set_speed     = mirror_set_speed,
+    .iostatus_reset= mirror_iostatus_reset,
     .complete      = mirror_complete,
 };
 
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
                   int64_t speed, MirrorSyncMode mode,
+                  BlockdevOnError on_source_error,
+                  BlockdevOnError on_target_error,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp)
 {
     MirrorBlockJob *s;
 
+    if ((on_source_error == BLOCKDEV_ON_ERROR_STOP ||
+         on_source_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
+        !bdrv_iostatus_is_enabled(bs)) {
+        error_set(errp, QERR_INVALID_PARAMETER, "on-source-error");
+        return;
+    }
+
     s = block_job_create(&mirror_job_type, bs, speed, cb, opaque, errp);
     if (!s) {
         return;
     }
 
+    s->on_source_error = on_source_error;
+    s->on_target_error = on_target_error;
     s->target = target;
     s->mode = mode;
     bdrv_set_dirty_tracking(bs, true);
     bdrv_set_enable_write_cache(s->target, true);
+    bdrv_set_on_error(s->target, on_target_error, on_target_error);
+    bdrv_iostatus_enable(s->target);
     s->common.co = qemu_coroutine_create(mirror_run);
     trace_mirror_start(bs, s, s->common.co, opaque);
     qemu_coroutine_enter(s->common.co, s);
diff --git a/block_int.h b/block_int.h
index 62525cf..a533c7b 100644
--- a/block_int.h
+++ b/block_int.h
@@ -321,6 +321,8 @@  void stream_start(BlockDriverState *bs, BlockDriverState *base,
  * @target: Block device to write to.
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
  * @mode: Whether to collapse all images in the chain to the target.
+ * @on_source_error: The action to take upon error reading from the source.
+ * @on_target_error: The action to take upon error writing to the target.
  * @cb: Completion function for the job.
  * @opaque: Opaque pointer value passed to @cb.
  * @errp: Error object.
@@ -332,6 +334,8 @@  void stream_start(BlockDriverState *bs, BlockDriverState *base,
  */
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
                   int64_t speed, MirrorSyncMode mode,
+                  BlockdevOnError on_source_error,
+                  BlockdevOnError on_target_error,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp);
 
diff --git a/blockdev.c b/blockdev.c
index 722aab5..84fee2f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1121,7 +1121,10 @@  void qmp_drive_mirror(const char *device, const char *target,
                       bool has_format, const char *format,
                       enum MirrorSyncMode sync,
                       bool has_mode, enum NewImageMode mode,
-                      bool has_speed, int64_t speed, Error **errp)
+                      bool has_speed, int64_t speed,
+                      bool has_on_source_error, BlockdevOnError on_source_error,
+                      bool has_on_target_error, BlockdevOnError on_target_error,
+                      Error **errp)
 {
     BlockDriverInfo bdi;
     BlockDriverState *bs;
@@ -1136,6 +1139,12 @@  void qmp_drive_mirror(const char *device, const char *target,
     if (!has_speed) {
         speed = 0;
     }
+    if (!has_on_source_error) {
+        on_source_error = BLOCKDEV_ON_ERROR_REPORT;
+    }
+    if (!has_on_target_error) {
+        on_target_error = BLOCKDEV_ON_ERROR_REPORT;
+    }
     if (!has_mode) {
         mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
     }
@@ -1228,7 +1237,8 @@  void qmp_drive_mirror(const char *device, const char *target,
         }
     }
 
-    mirror_start(bs, target_bs, speed, sync, block_job_cb, bs, &local_err);
+    mirror_start(bs, target_bs, speed, sync, on_source_error, on_target_error,
+                 block_job_cb, bs, &local_err);
     if (local_err != NULL) {
         bdrv_delete(target_bs);
         error_propagate(errp, local_err);
diff --git a/hmp.c b/hmp.c
index 94d4d41..b4d2736 100644
--- a/hmp.c
+++ b/hmp.c
@@ -783,7 +783,8 @@  void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
 
     qmp_drive_mirror(device, filename, !!format, format,
                      full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
-                     true, mode, false, 0, &errp);
+                     true, mode, false, 0,
+                     false, 0, false, 0, &errp);
     hmp_handle_error(mon, &errp);
 }
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 4827ed3..2947206 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1551,6 +1551,14 @@ 
 #        (all the disk, only the sectors allocated in the topmost image, or
 #        only new I/O).
 #
+# @on-source-error: #optional the action to take on an error on the source,
+#                   default 'report'.  'stop' and 'enospc' can only be used
+#                   if the block device supports io-status (see BlockInfo).
+#
+# @on-target-error: #optional the action to take on an error on the target,
+#                   default 'report' (no limitations, since this applies to
+#                   a different block device than @device).
+#
 # Returns: nothing on success
 #          If @device is not a valid block device, DeviceNotFound
 #
@@ -1559,7 +1567,8 @@ 
 { 'command': 'drive-mirror',
   'data': { 'device': 'str', 'target': 'str', '*format': 'str',
             'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
-            '*speed': 'int' } }
+            '*speed': 'int', '*on-source-error': 'BlockdevOnError',
+            '*on-target-error': 'BlockdevOnError' } }
 
 ##
 # @migrate_cancel
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 25800a8..ec97eaa 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -907,7 +907,8 @@  EQMP
 
     {
         .name       = "drive-mirror",
-        .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?",
+        .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?,"
+                      "on-source-error:s?,on-target-error:s?",
         .mhandler.cmd_new = qmp_marshal_input_drive_mirror,
     },
 
@@ -935,6 +936,11 @@  Arguments:
   possibilities include "full" for all the disk, "top" for only the sectors
   allocated in the topmost image, or "none" to only replicate new I/O
   (MirrorSyncMode).
+- "on-source-error": the action to take on an error on the source
+  (BlockdevOnError, default 'report')
+- "on-target-error": the action to take on an error on the target
+  (BlockdevOnError, default 'report')
+
 
 
 Example: