Patchwork [v2,06/45] block: add support for job pause/resume

login
register
mail settings
Submitter Paolo Bonzini
Date Sept. 26, 2012, 3:56 p.m.
Message ID <1348675011-8794-7-git-send-email-pbonzini@redhat.com>
Download mbox | patch
Permalink /patch/187152/
State New
Headers show

Comments

Paolo Bonzini - Sept. 26, 2012, 3:56 p.m.
Job pausing reuses the existing support for cancellable sleeps.  A pause
happens at the next sleeping point and lasts until the coroutine is
re-entered explicitly.  Cancellation was already doing a forced resume,
so implement it explicitly in terms of resume.

Paused jobs cannot be canceled without first resuming them.  This ensures
that I/O errors are never missed by management.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: rebased for Error changes

 blockdev.c       |  4 ++++
 blockjob.c       | 35 ++++++++++++++++++++++++++++++-----
 blockjob.h       | 30 ++++++++++++++++++++++++++++++
 qapi-schema.json |  4 +++-
 qerror.h         |  3 +++
 5 file modificati, 70 inserzioni(+), 6 rimozioni(-)
Eric Blake - Sept. 26, 2012, 5:31 p.m.
On 09/26/2012 09:56 AM, Paolo Bonzini wrote:
> Job pausing reuses the existing support for cancellable sleeps.  A pause
> happens at the next sleeping point and lasts until the coroutine is
> re-entered explicitly.  Cancellation was already doing a forced resume,
> so implement it explicitly in terms of resume.
> 
> Paused jobs cannot be canceled without first resuming them.  This ensures
> that I/O errors are never missed by management.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

> +++ b/qapi-schema.json
> @@ -1098,6 +1098,8 @@
>  #
>  # @len: the maximum progress value
>  #
> +# @paused: whether the job is paused (since 1.2)

1.3
Kevin Wolf - Sept. 27, 2012, 12:18 p.m.
Am 26.09.2012 17:56, schrieb Paolo Bonzini:
> Job pausing reuses the existing support for cancellable sleeps.  A pause
> happens at the next sleeping point and lasts until the coroutine is
> re-entered explicitly.  Cancellation was already doing a forced resume,
> so implement it explicitly in terms of resume.
> 
> Paused jobs cannot be canceled without first resuming them.  This ensures
> that I/O errors are never missed by management.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

I think there's a problem with terminology at least. What does "paused"
really mean? Is it that the job has been requested to pause, or that it
has actually yielded and is inactive?

The commit message seems to use the latter semantics (which I would
consider the intuitive one), the QMP documentation leaves it unclear,
but the code actually implements the former semantics.

Kevin
Paolo Bonzini - Sept. 27, 2012, 12:27 p.m.
Il 27/09/2012 14:18, Kevin Wolf ha scritto:
>> > 
>> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> I think there's a problem with terminology at least. What does "paused"
> really mean? Is it that the job has been requested to pause, or that it
> has actually yielded and is inactive?
> 
> The commit message seems to use the latter semantics (which I would
> consider the intuitive one),

You mean this: "Paused jobs cannot be canceled without first resuming
them".  I can add a specification, like "(even if the job actually has
not reached the sleeping point and thus is still running)".

> the QMP documentation leaves it unclear,
> but the code actually implements the former semantics.

This code comment is clear:

    /**
     * Set to true if the job is either paused, or will pause itself
     * as soon as possible (if busy == true).
     */
    bool paused;

but this one can indeed use some improvement.

/**
 * block_job_is_paused:
 * @job: The job being queried.
 *
 * Returns whether the job is currently paused.
 */
bool block_job_is_paused(BlockJob *job);


From the QMP client's point of view it doesn't really matter, does it?

- even after a job that writes to disk X has "really" paused, you cannot
read or write disk X.  It's still owned by QEMU, it hasn't been flushed,
it may play games like lazy refcounts.

- what matters is that a resume undoes a pause, even if it is still
pending (which it does).

Paolo
Kevin Wolf - Sept. 27, 2012, 12:45 p.m.
Am 27.09.2012 14:27, schrieb Paolo Bonzini:
> Il 27/09/2012 14:18, Kevin Wolf ha scritto:
>>>>
>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> I think there's a problem with terminology at least. What does "paused"
>> really mean? Is it that the job has been requested to pause, or that it
>> has actually yielded and is inactive?
>>
>> The commit message seems to use the latter semantics (which I would
>> consider the intuitive one),
> 
> You mean this: "Paused jobs cannot be canceled without first resuming
> them".  I can add a specification, like "(even if the job actually has
> not reached the sleeping point and thus is still running)".

I actually meant "pause happens at the next sleeping point", which isn't
unspecific at all.

>> the QMP documentation leaves it unclear,
>> but the code actually implements the former semantics.
> 
> This code comment is clear:
> 
>     /**
>      * Set to true if the job is either paused, or will pause itself
>      * as soon as possible (if busy == true).
>      */
>     bool paused;

Yes, this one is a good and clear comment (and possibly I wouldn't even
have noticed without this comment)

> but this one can indeed use some improvement.
> 
> /**
>  * block_job_is_paused:
>  * @job: The job being queried.
>  *
>  * Returns whether the job is currently paused.
>  */
> bool block_job_is_paused(BlockJob *job);
> 
> 
> From the QMP client's point of view it doesn't really matter, does it?
> 
> - even after a job that writes to disk X has "really" paused, you cannot
> read or write disk X.  It's still owned by QEMU, it hasn't been flushed,
> it may play games like lazy refcounts.

I'm not sure about this one. Consider things like a built-in NBD server.
Probably we'll find more cases in the future, where some monitor command
might seem to be safe while a job is paused.

It makes me nervous that clients could make assumptions based on the
paused state despite having no way to make sure that a job is actually
stopped - the documentation doesn't even tell them about the fact that
"paused" doesn't really mean what they think it means.

> - what matters is that a resume undoes a pause, even if it is still
> pending (which it does).

Agreed, this part looks okay.

Kevin
Paolo Bonzini - Sept. 27, 2012, 12:57 p.m.
Il 27/09/2012 14:45, Kevin Wolf ha scritto:
> Am 27.09.2012 14:27, schrieb Paolo Bonzini:
>> Il 27/09/2012 14:18, Kevin Wolf ha scritto:
>>>>>
>>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> I think there's a problem with terminology at least. What does "paused"
>>> really mean? Is it that the job has been requested to pause, or that it
>>> has actually yielded and is inactive?
>>>
>>> The commit message seems to use the latter semantics (which I would
>>> consider the intuitive one),
>>
>> You mean this: "Paused jobs cannot be canceled without first resuming
>> them".  I can add a specification, like "(even if the job actually has
>> not reached the sleeping point and thus is still running)".
> 
> I actually meant "pause happens at the next sleeping point", which isn't
> unspecific at all.

Hmm, there are two aspects: 1) when things stop running; 2) when the job
reports itself to be paused.  The commit message describes (1)
precisely, and doesn't say anything about (2).  That's too specific for
a commit message, but the header file describes it precisely.

However, in the QMP documentation, the good comment for "bool paused;"
must be replicated in BlockJobInfo's "paused" member.

>> From the QMP client's point of view it doesn't really matter, does it?
>>
>> - even after a job that writes to disk X has "really" paused, you cannot
>> read or write disk X.  It's still owned by QEMU, it hasn't been flushed,
>> it may play games like lazy refcounts.
> 
> I'm not sure about this one. Consider things like a built-in NBD server.
> Probably we'll find more cases in the future, where some monitor command
> might seem to be safe while a job is paused.

Ok, that's a good point.  I'll add a "busy" member to BlockJobInfo.

Paolo
Kevin Wolf - Sept. 27, 2012, 1:51 p.m.
Am 27.09.2012 14:57, schrieb Paolo Bonzini:
> Il 27/09/2012 14:45, Kevin Wolf ha scritto:
>> Am 27.09.2012 14:27, schrieb Paolo Bonzini:
>>> Il 27/09/2012 14:18, Kevin Wolf ha scritto:
>>>>>>
>>>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>> I think there's a problem with terminology at least. What does "paused"
>>>> really mean? Is it that the job has been requested to pause, or that it
>>>> has actually yielded and is inactive?
>>>>
>>>> The commit message seems to use the latter semantics (which I would
>>>> consider the intuitive one),
>>>
>>> You mean this: "Paused jobs cannot be canceled without first resuming
>>> them".  I can add a specification, like "(even if the job actually has
>>> not reached the sleeping point and thus is still running)".
>>
>> I actually meant "pause happens at the next sleeping point", which isn't
>> unspecific at all.
> 
> Hmm, there are two aspects: 1) when things stop running; 2) when the job
> reports itself to be paused.  The commit message describes (1)
> precisely, and doesn't say anything about (2).  That's too specific for
> a commit message, but the header file describes it precisely.

Yes, I understood that, I just found it confusing that both were called
"paused" in different contexts.

> However, in the QMP documentation, the good comment for "bool paused;"
> must be replicated in BlockJobInfo's "paused" member.
> 
>>> From the QMP client's point of view it doesn't really matter, does it?
>>>
>>> - even after a job that writes to disk X has "really" paused, you cannot
>>> read or write disk X.  It's still owned by QEMU, it hasn't been flushed,
>>> it may play games like lazy refcounts.
>>
>> I'm not sure about this one. Consider things like a built-in NBD server.
>> Probably we'll find more cases in the future, where some monitor command
>> might seem to be safe while a job is paused.
> 
> Ok, that's a good point.  I'll add a "busy" member to BlockJobInfo.

Ok, thanks. Together with the comment from the bool paused field this
should make pretty clear what clients would have to check for.

Kevin

Patch

diff --git a/blockdev.c b/blockdev.c
index 5772c11..df0c449 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1158,6 +1158,10 @@  void qmp_block_job_cancel(const char *device, Error **errp)
         error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
         return;
     }
+    if (job->paused) {
+        error_set(errp, QERR_BLOCK_JOB_PAUSED, device);
+        return;
+    }
 
     trace_qmp_block_job_cancel(job);
     block_job_cancel(job);
diff --git a/blockjob.c b/blockjob.c
index dea63f8..6c65521 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -99,14 +99,30 @@  void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
     job->speed = speed;
 }
 
-void block_job_cancel(BlockJob *job)
+void block_job_pause(BlockJob *job)
 {
-    job->cancelled = true;
+    job->paused = true;
+}
+
+bool block_job_is_paused(BlockJob *job)
+{
+    return job->paused;
+}
+
+void block_job_resume(BlockJob *job)
+{
+    job->paused = false;
     if (job->co && !job->busy) {
         qemu_coroutine_enter(job->co, NULL);
     }
 }
 
+void block_job_cancel(BlockJob *job)
+{
+    job->cancelled = true;
+    block_job_resume(job);
+}
+
 bool block_job_is_cancelled(BlockJob *job)
 {
     return job->cancelled;
@@ -154,12 +170,20 @@  int block_job_cancel_sync(BlockJob *job)
 
 void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
 {
+    assert(job->busy);
+
     /* Check cancellation *before* setting busy = false, too!  */
-    if (!block_job_is_cancelled(job)) {
-        job->busy = false;
+    if (block_job_is_cancelled(job)) {
+        return;
+    }
+
+    job->busy = false;
+    if (block_job_is_paused(job)) {
+        qemu_coroutine_yield();
+    } else {
         co_sleep_ns(clock, ns);
-        job->busy = true;
     }
+    job->busy = true;
 }
 
 BlockJobInfo *block_job_query(BlockJob *job)
@@ -168,6 +192,7 @@  BlockJobInfo *block_job_query(BlockJob *job)
     info->type   = g_strdup(job->job_type->job_type);
     info->device = g_strdup(bdrv_get_device_name(job->bs));
     info->len    = job->len;
+    info->paused = job->paused;
     info->offset = job->offset;
     info->speed  = job->speed;
     return info;
diff --git a/blockjob.h b/blockjob.h
index c6af0fb..a2bacba 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -70,6 +70,12 @@  struct BlockJob {
     bool cancelled;
 
     /**
+     * Set to true if the job is either paused, or will pause itself
+     * as soon as possible (if busy == true).
+     */
+    bool paused;
+
+    /**
      * Set to false by the job while it is in a quiescent state, where
      * no I/O is pending and the job has yielded on any condition
      * that is not detected by #qemu_aio_wait, such as a timer.
@@ -171,6 +177,30 @@  bool block_job_is_cancelled(BlockJob *job);
 BlockJobInfo *block_job_query(BlockJob *job);
 
 /**
+ * block_job_pause:
+ * @job: The job to be paused.
+ *
+ * Asynchronously pause the specified job.
+ */
+void block_job_pause(BlockJob *job);
+
+/**
+ * block_job_resume:
+ * @job: The job to be resumed.
+ *
+ * Resume the specified job.
+ */
+void block_job_resume(BlockJob *job);
+
+/**
+ * block_job_is_paused:
+ * @job: The job being queried.
+ *
+ * Returns whether the job is currently paused.
+ */
+bool block_job_is_paused(BlockJob *job);
+
+/**
  * block_job_cancel_sync:
  * @job: The job to be canceled.
  *
diff --git a/qapi-schema.json b/qapi-schema.json
index 14e4419..f8a67ae 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1098,6 +1098,8 @@ 
 #
 # @len: the maximum progress value
 #
+# @paused: whether the job is paused (since 1.2)
+#
 # @offset: the current progress value
 #
 # @speed: the rate limit, bytes per second
@@ -1106,7 +1108,7 @@ 
 ##
 { 'type': 'BlockJobInfo',
   'data': {'type': 'str', 'device': 'str', 'len': 'int',
-           'offset': 'int', 'speed': 'int'} }
+           'offset': 'int', 'paused': 'bool', 'speed': 'int'} }
 
 ##
 # @query-block-jobs:
diff --git a/qerror.h b/qerror.h
index 485c773..c91708c 100644
--- a/qerror.h
+++ b/qerror.h
@@ -51,6 +51,9 @@  void assert_no_error(Error *err);
 #define QERR_BLOCK_JOB_NOT_ACTIVE \
     ERROR_CLASS_DEVICE_NOT_ACTIVE, "No active block job on device '%s'"
 
+#define QERR_BLOCK_JOB_PAUSED \
+    ERROR_CLASS_GENERIC_ERROR, "The block job for device '%s' is currently paused"
+
 #define QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED \
     ERROR_CLASS_GENERIC_ERROR, "Block format '%s' used by device '%s' does not support feature '%s'"