docs: update bitmaps.md

Message ID 1447196417-26081-1-git-send-email-jsnow@redhat.com
State New

Commit Message

John Snow Nov. 10, 2015, 11 p.m. UTC
Include new error handling scenarios for 2.5.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 docs/bitmaps.md | 157 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 157 insertions(+)

Comments

Eric Blake Nov. 10, 2015, 11:09 p.m. UTC | #1
On 11/10/2015 04:00 PM, John Snow wrote:
> Include new error handling scenarios for 2.5.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  docs/bitmaps.md | 157 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 157 insertions(+)
> 
> diff --git a/docs/bitmaps.md b/docs/bitmaps.md
> index 9fd8ea6..a2e8d51 100644
> --- a/docs/bitmaps.md
> +++ b/docs/bitmaps.md
> @@ -19,12 +19,20 @@ which is included at the end of this document.
>  * A dirty bitmap's name is unique to the node, but bitmaps attached to different
>    nodes can share the same name.
>  
> +* Dirty bitmaps created for internal use by QEMU may be anonymous and have no
> +  name, but any user-created bitmaps may not be. There can be any number of
> +  anonymous bitmaps per node.

may not be what?  Maybe:

Dirty bitmaps ... have no name, but any user-created bitmaps will have a
name.  There can be...


> +
> +### Grouped Completion Mode
> +

> +    * Later, QEMU sends notice that the second job has errored out,
> +      but that the first job was also cancelled:
> +        ```json
> +        { "timestamp": { "seconds": 1447193702, "microseconds": 632377 },
> +          "data": { "device": "drive1", "action": "report",
> +                    "operation": "read" },
> +          "event": "BLOCK_JOB_ERROR" }
> +        ```
> +
> +        ```json
> +        { "timestamp": { "seconds": 1447193702, "microseconds": 640074 },
> +          "data": { "speed": 0, "offset": 0, "len": 67108864,
> +                    "error": "Input/output error",
> +                    "device": "drive1", "type": "backup" },
> +          "event": "BLOCK_JOB_COMPLETED" }
> +        ```

So we get both an error and a completion notice on failed jobs?  I guess
it's because you can configure jobs to report errors but continue on, so
the error notification alone doesn't say whether the job ends.

> +
> +        ```json
> +        { "timestamp": { "seconds": 1447193702, "microseconds": 640163 },
> +          "data": { "device": "drive0", "type": "backup", "speed": 0,
> +                    "len": 67108864, "offset": 16777216 },
> +          "event": "BLOCK_JOB_CANCELLED" }
> +        ```
> +

Thanks; these examples are very useful.

Reviewed-by: Eric Blake <eblake@redhat.com>
John Snow Nov. 10, 2015, 11:18 p.m. UTC | #2
On 11/10/2015 06:09 PM, Eric Blake wrote:
> On 11/10/2015 04:00 PM, John Snow wrote:
>> Include new error handling scenarios for 2.5.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  docs/bitmaps.md | 157 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 157 insertions(+)
>>
>> diff --git a/docs/bitmaps.md b/docs/bitmaps.md
>> index 9fd8ea6..a2e8d51 100644
>> --- a/docs/bitmaps.md
>> +++ b/docs/bitmaps.md
>> @@ -19,12 +19,20 @@ which is included at the end of this document.
>>  * A dirty bitmap's name is unique to the node, but bitmaps attached to different
>>    nodes can share the same name.
>>  
>> +* Dirty bitmaps created for internal use by QEMU may be anonymous and have no
>> +  name, but any user-created bitmaps may not be. There can be any number of
>> +  anonymous bitmaps per node.
> 
> may not be what?  Maybe:
> 
> Dirty bitmaps ... have no name, but any user-created bitmaps will have a
> name.  There can be...
> 
> 
>> +
>> +### Grouped Completion Mode
>> +
> 
>> +    * Later, QEMU sends notice that the second job has errored out,
>> +      but that the first job was also cancelled:
>> +        ```json
>> +        { "timestamp": { "seconds": 1447193702, "microseconds": 632377 },
>> +          "data": { "device": "drive1", "action": "report",
>> +                    "operation": "read" },
>> +          "event": "BLOCK_JOB_ERROR" }
>> +        ```
>> +
>> +        ```json
>> +        { "timestamp": { "seconds": 1447193702, "microseconds": 640074 },
>> +          "data": { "speed": 0, "offset": 0, "len": 67108864,
>> +                    "error": "Input/output error",
>> +                    "device": "drive1", "type": "backup" },
>> +          "event": "BLOCK_JOB_COMPLETED" }
>> +        ```
> 
> So we get both an error and a completion notice on failed jobs?  I guess
> it's because you can configure jobs to report errors but continue on, so
> the error notification alone doesn't say whether the job ends.
> 

Not a design choice of mine; that's just what already happens when a
block job fails. You get the error notice *AND* the "completion" notice
with the error field set.

>> +
>> +        ```json
>> +        { "timestamp": { "seconds": 1447193702, "microseconds": 640163 },
>> +          "data": { "device": "drive0", "type": "backup", "speed": 0,
>> +                    "len": 67108864, "offset": 16777216 },
>> +          "event": "BLOCK_JOB_CANCELLED" }
>> +        ```
>> +
> 
> Thanks; these examples are very useful.
> 

I'm glad.

> Reviewed-by: Eric Blake <eblake@redhat.com>
> 

Thanks,
--js
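The event sequence described above (a BLOCK_JOB_ERROR, then a BLOCK_JOB_COMPLETED with the "error" field set) suggests how a management application might decide which incremental images to keep after a transaction. The following is a minimal Python sketch. It assumes QMP events have already been read and parsed into dicts shaped like the examples in the patch. `job_outcomes`, `images_to_keep` and the `TARGETS` device-to-image mapping are hypothetical names, not part of QEMU or any QMP client library.

```python
from typing import Dict, Iterable, List, Mapping

# Event streams copied from the patch's examples (timestamps omitted).
PARTIAL_FAILURE_EVENTS = [
    {"event": "BLOCK_JOB_COMPLETED",
     "data": {"device": "drive0", "type": "backup",
              "speed": 0, "len": 67108864, "offset": 67108864}},
    {"event": "BLOCK_JOB_ERROR",
     "data": {"device": "drive1", "action": "report", "operation": "read"}},
    {"event": "BLOCK_JOB_COMPLETED",
     "data": {"speed": 0, "offset": 0, "len": 67108864,
              "error": "Input/output error",
              "device": "drive1", "type": "backup"}},
]

GROUPED_FAILURE_EVENTS = [
    {"event": "BLOCK_JOB_ERROR",
     "data": {"device": "drive1", "action": "report", "operation": "read"}},
    {"event": "BLOCK_JOB_COMPLETED",
     "data": {"speed": 0, "offset": 0, "len": 67108864,
              "error": "Input/output error",
              "device": "drive1", "type": "backup"}},
    {"event": "BLOCK_JOB_CANCELLED",
     "data": {"device": "drive0", "type": "backup", "speed": 0,
              "len": 67108864, "offset": 16777216}},
]

# Hypothetical mapping from device to the target image used in the examples.
TARGETS = {"drive0": "d0-incr-1.qcow2", "drive1": "d1-incr-1.qcow2"}


def job_outcomes(events: Iterable[Mapping]) -> Dict[str, str]:
    """Reduce QMP events to "success", "failed" or "cancelled" per device."""
    outcomes: Dict[str, str] = {}
    for event in events:
        data = event.get("data", {})
        device = data.get("device")
        if device is None:
            continue
        if event.get("event") == "BLOCK_JOB_COMPLETED":
            # A completion event carrying "error" marks a failed job.
            outcomes[device] = "failed" if "error" in data else "success"
        elif event.get("event") == "BLOCK_JOB_CANCELLED":
            outcomes[device] = "cancelled"
        # BLOCK_JOB_ERROR is not terminal on its own, so it is skipped here.
    return outcomes


def images_to_keep(outcomes: Mapping[str, str],
                   targets: Mapping[str, str]) -> List[str]:
    """Return the target images that hold a valid incremental backup.

    Per the patch text, only jobs that completed without error cleared their
    dirty bitmap; failed jobs, and jobs cancelled in grouped mode, keep it,
    so their images can be discarded and the backup retried.
    """
    return [targets[device]
            for device, result in sorted(outcomes.items())
            if result == "success"]


if __name__ == "__main__":
    for mode, events in (("individual", PARTIAL_FAILURE_EVENTS),
                         ("grouped", GROUPED_FAILURE_EVENTS)):
        outcomes = job_outcomes(events)
        print(mode, outcomes, images_to_keep(outcomes, TARGETS))
```

BLOCK_JOB_ERROR is deliberately ignored for the final verdict because, as Eric points out, the error notice alone does not say whether the job ended.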
Stefan Hajnoczi Nov. 16, 2015, 3:30 a.m. UTC | #3
On Tue, Nov 10, 2015 at 06:00:17PM -0500, John Snow wrote:
> Include new error handling scenarios for 2.5.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  docs/bitmaps.md | 157 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 157 insertions(+)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan
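The transaction request documented in the patch below can also be generated rather than written by hand, which rules out slips such as a trailing comma after the last action (JSON does not permit one). A minimal Python sketch using only the standard `json` module follows. `incremental_backup_action` and `backup_transaction` are hypothetical helper names, the argument values are copied from the patch's examples, and sending the command over a QMP socket is left out.

```python
import json
from typing import Dict, List, Optional


def incremental_backup_action(device: str, bitmap: str, target: str) -> Dict:
    """One drive-backup action, using the arguments from the patch's examples."""
    return {
        "type": "drive-backup",
        "data": {
            "device": device,
            "bitmap": bitmap,
            "format": "qcow2",
            "mode": "existing",
            "sync": "incremental",
            "target": target,
        },
    }


def backup_transaction(actions: List[Dict],
                       completion_mode: Optional[str] = None) -> Dict:
    """Wrap actions in a QMP "transaction" command.

    completion_mode may be "individual" or "grouped"; the "properties"
    structure is only emitted when a mode is explicitly requested.
    """
    if completion_mode not in (None, "individual", "grouped"):
        raise ValueError("unknown completion-mode: %r" % completion_mode)
    arguments: Dict = {"actions": actions}
    if completion_mode is not None:
        arguments["properties"] = {"completion-mode": completion_mode}
    return {"execute": "transaction", "arguments": arguments}


if __name__ == "__main__":
    cmd = backup_transaction(
        [incremental_backup_action("drive0", "bitmap0", "d0-incr-1.qcow2"),
         incremental_backup_action("drive1", "bitmap1", "d1-incr-1.qcow2")],
        completion_mode="grouped",
    )
    print(json.dumps(cmd, indent=2))
```

Omitting "properties" unless a mode is requested keeps the request in its legacy form, which per the patch means "individual" completion.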

Patch

diff --git a/docs/bitmaps.md b/docs/bitmaps.md
index 9fd8ea6..a2e8d51 100644
--- a/docs/bitmaps.md
+++ b/docs/bitmaps.md
@@ -19,12 +19,20 @@  which is included at the end of this document.
 * A dirty bitmap's name is unique to the node, but bitmaps attached to different
   nodes can share the same name.
 
+* Dirty bitmaps created for internal use by QEMU may be anonymous and have no
+  name, but any user-created bitmaps must have a name. There can be any number
+  of anonymous bitmaps per node.
+
+* The name of a user-created bitmap must not be empty ("").
+
 ## Bitmap Modes
 
 * A Bitmap can be "frozen," which means that it is currently in-use by a backup
   operation and cannot be deleted, renamed, written to, reset,
   etc.
 
+* The normal operating mode for a bitmap is "active."
+
 ## Basic QMP Usage
 
 ### Supported Commands ###
@@ -319,6 +327,155 @@  full backup as a backing image.
       "event": "BLOCK_JOB_COMPLETED" }
     ```
 
+### Partial Transactional Failures
+
+* Sometimes, a transaction will succeed in launching and return success,
+  but the backup jobs themselves may fail later. A management application
+  must therefore be prepared to handle a partial backup failure even
+  after a successful transaction.
+
+* If multiple backup jobs are specified in a single transaction, the failure
+  of one job does not affect the other backup jobs in any way.
+
+* The job(s) that succeeded will clear the dirty bitmap associated with the
+  operation, but the job(s) that failed will not. Because those bitmaps have
+  been cleared, it is not safe to delete the incremental backups that were
+  created successfully in this scenario, even though others failed.
+
+#### Example
+
+* QMP example highlighting two backup jobs:
+
+    ```json
+    { "execute": "transaction",
+      "arguments": {
+        "actions": [
+          { "type": "drive-backup",
+            "data": { "device": "drive0", "bitmap": "bitmap0",
+                      "format": "qcow2", "mode": "existing",
+                      "sync": "incremental", "target": "d0-incr-1.qcow2" } },
+          { "type": "drive-backup",
+            "data": { "device": "drive1", "bitmap": "bitmap1",
+                      "format": "qcow2", "mode": "existing",
+                      "sync": "incremental", "target": "d1-incr-1.qcow2" } }
+        ]
+      }
+    }
+    ```
+
+* QMP example response, highlighting one success and one failure:
+    * Acknowledgement that the transaction was accepted and jobs were launched:
+        ```json
+        { "return": {} }
+        ```
+
+    * Later, QEMU sends notice that the first job was completed:
+        ```json
+        { "timestamp": { "seconds": 1447192343, "microseconds": 615698 },
+          "data": { "device": "drive0", "type": "backup",
+                    "speed": 0, "len": 67108864, "offset": 67108864 },
+          "event": "BLOCK_JOB_COMPLETED"
+        }
+        ```
+
+    * Later yet, QEMU sends notice that the second job has failed:
+        ```json
+        { "timestamp": { "seconds": 1447192399, "microseconds": 683015 },
+          "data": { "device": "drive1", "action": "report",
+                    "operation": "read" },
+          "event": "BLOCK_JOB_ERROR" }
+        ```
+
+        ```json
+        { "timestamp": { "seconds": 1447192399, "microseconds": 685853 },
+          "data": { "speed": 0, "offset": 0, "len": 67108864,
+                    "error": "Input/output error",
+                    "device": "drive1", "type": "backup" },
+          "event": "BLOCK_JOB_COMPLETED" }
+        ```
+* In the above example, "d0-incr-1.qcow2" is valid and must be kept,
+  but "d1-incr-1.qcow2" is invalid and should be deleted. If a VM-wide
+  incremental backup of all drives at a point in time is required,
+  new backups for both drives will need to be made, taking into account
+  that a new incremental backup for drive0 needs to be based on top of
+  "d0-incr-1.qcow2".
+
+### Grouped Completion Mode
+
+* While jobs launched by transactions normally complete or fail on their own,
+  it is possible to instruct them to complete or fail together as a group.
+
+* QMP transactions take an optional properties structure that can affect
+  the semantics of the transaction.
+
+* The "completion-mode" transaction property can be either "individual",
+  which is the default, legacy behavior described above, or "grouped",
+  a new behavior detailed below.
+
+* Delayed completion: In grouped completion mode, no jobs will report
+  success until all jobs are ready to report success.
+
+* Grouped failure: If any job fails in grouped completion mode, all remaining
+  jobs will be cancelled. Any incremental backups will restore their dirty
+  bitmap objects as if no backup command was ever issued.
+
+    * Regardless of whether QEMU reports a particular incremental backup job
+      as CANCELLED or as an ERROR, the in-memory bitmap will be restored.
+
+#### Example
+
+* Here's the same example scenario from above with the new property:
+
+    ```json
+    { "execute": "transaction",
+      "arguments": {
+        "actions": [
+          { "type": "drive-backup",
+            "data": { "device": "drive0", "bitmap": "bitmap0",
+                      "format": "qcow2", "mode": "existing",
+                      "sync": "incremental", "target": "d0-incr-1.qcow2" } },
+          { "type": "drive-backup",
+            "data": { "device": "drive1", "bitmap": "bitmap1",
+                      "format": "qcow2", "mode": "existing",
+                      "sync": "incremental", "target": "d1-incr-1.qcow2" } }
+        ],
+        "properties": {
+          "completion-mode": "grouped"
+        }
+      }
+    }
+    ```
+
+* QMP example response, highlighting a failure for drive1:
+    * Acknowledgement that the transaction was accepted and jobs were launched:
+        ```json
+        { "return": {} }
+        ```
+
+    * Later, QEMU sends notice that the second job has errored out,
+      but that the first job was also cancelled:
+        ```json
+        { "timestamp": { "seconds": 1447193702, "microseconds": 632377 },
+          "data": { "device": "drive1", "action": "report",
+                    "operation": "read" },
+          "event": "BLOCK_JOB_ERROR" }
+        ```
+
+        ```json
+        { "timestamp": { "seconds": 1447193702, "microseconds": 640074 },
+          "data": { "speed": 0, "offset": 0, "len": 67108864,
+                    "error": "Input/output error",
+                    "device": "drive1", "type": "backup" },
+          "event": "BLOCK_JOB_COMPLETED" }
+        ```
+
+        ```json
+        { "timestamp": { "seconds": 1447193702, "microseconds": 640163 },
+          "data": { "device": "drive0", "type": "backup", "speed": 0,
+                    "len": 67108864, "offset": 16777216 },
+          "event": "BLOCK_JOB_CANCELLED" }
+        ```
+
 <!--
 The FreeBSD Documentation License