[COLO-Frame,v10,23/38] qmp event: Add event notification for COLO error

Message ID 1446551816-15768-24-git-send-email-zhang.zhanghailiang@huawei.com
State New

Commit Message

Zhanghailiang Nov. 3, 2015, 11:56 a.m. UTC
If some errors happen during VM's COLO FT stage, it's important to notify the users
of this event. Together with 'colo_lost_heartbeat', users can intervene in COLO's
failover work immediately.
If users don't want to get involved in COLO's failover verdict,
it is still necessary to notify users that we exit COLO mode.

Cc: Markus Armbruster <armbru@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 docs/qmp-events.txt | 17 +++++++++++++++++
 migration/colo.c    | 13 +++++++++++++
 qapi-schema.json    | 16 ++++++++++++++++
 qapi/event.json     | 17 +++++++++++++++++
 4 files changed, 63 insertions(+)

Comments

Eric Blake Nov. 20, 2015, 9:50 p.m. UTC | #1
On 11/03/2015 04:56 AM, zhanghailiang wrote:
> If some errors happen during VM's COLO FT stage, it's important to notify the users
> of this event. Together with 'colo_lost_heartbeat', users can intervene in COLO's
> failover work immediately.
> If users don't want to get involved in COLO's failover verdict,
> it is still necessary to notify users that we exit COLO mode.

s/exit/exited/

> 
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  docs/qmp-events.txt | 17 +++++++++++++++++
>  migration/colo.c    | 13 +++++++++++++
>  qapi-schema.json    | 16 ++++++++++++++++
>  qapi/event.json     | 17 +++++++++++++++++
>  4 files changed, 63 insertions(+)
> 
> diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
> index d2f1ce4..165dd76 100644
> --- a/docs/qmp-events.txt
> +++ b/docs/qmp-events.txt
> @@ -184,6 +184,23 @@ Example:
>  Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
>  event.
>  
> +COLO_EXIT
> +---------
> +
> +Emitted when VM finishes COLO mode due to some errors happening or
> +the request of users.

s/the/at the/


> +++ b/qapi-schema.json
> @@ -751,6 +751,22 @@
>    'data': [ 'unknown', 'primary', 'secondary'] }
>  
>  ##
> +# @COLOExitReason
> +#
> +# The reason of COLO exit

s/of/for a/

> +#
> +# @unknow: unknown reason

s/unknow/unknown/

> +#
> +# @request: COLO exit is due to an external request
> +#
> +# @error: COLO exit is due to an internal error
> +#
> +# Since: 2.5

2.6 (but you already know that throughout the series, so I'll quit
pointing it out)


> +++ b/qapi/event.json
> @@ -255,6 +255,23 @@
>    'data': {'status': 'MigrationStatus'}}
>  
>  ##
> +# @COLO_EXIT
> +#
> +# Emitted when VM finishes COLO mode due to some errors happening or
> +# the request of users.

s/the/at the/

> +#
> +# @mode: @COLOMode describing which side of VM is exit.

Maybe:

@mode: Which COLO mode the VM was in when it exited.

> +#
> +# @reason: @COLOExitReason describing the reason of colo exit.

@reason: describes the reason for the COLO exit.

> +#
> +# @error: #optional, error message. Only present on error happening.
> +#
> +# Since: 2.5
> +##
> +{ 'event': 'COLO_EXIT',
> +  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason', '*error': 'str' } }

Other than typos, the interface seems okay.

Zhanghailiang Nov. 23, 2015, 6:01 a.m. UTC | #2
On 2015/11/21 5:50, Eric Blake wrote:
> On 11/03/2015 04:56 AM, zhanghailiang wrote:
>> If some errors happen during VM's COLO FT stage, it's important to notify the users
>> of this event. Together with 'colo_lost_heartbeat', users can intervene in COLO's
>> failover work immediately.
>> If users don't want to get involved in COLO's failover verdict,
>> it is still necessary to notify users that we exit COLO mode.
>
> s/exit/exited/
>
>>
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   docs/qmp-events.txt | 17 +++++++++++++++++
>>   migration/colo.c    | 13 +++++++++++++
>>   qapi-schema.json    | 16 ++++++++++++++++
>>   qapi/event.json     | 17 +++++++++++++++++
>>   4 files changed, 63 insertions(+)
>>
>> diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
>> index d2f1ce4..165dd76 100644
>> --- a/docs/qmp-events.txt
>> +++ b/docs/qmp-events.txt
>> @@ -184,6 +184,23 @@ Example:
>>   Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
>>   event.
>>
>> +COLO_EXIT
>> +---------
>> +
>> +Emitted when VM finishes COLO mode due to some errors happening or
>> +the request of users.
>
> s/the/at the/
>
>
>> +++ b/qapi-schema.json
>> @@ -751,6 +751,22 @@
>>     'data': [ 'unknown', 'primary', 'secondary'] }
>>
>>   ##
>> +# @COLOExitReason
>> +#
>> +# The reason of COLO exit
>
> s/of/for a/
>
>> +#
>> +# @unknow: unknown reason
>
> s/unknow/unknown/
>
>> +#
>> +# @request: COLO exit is due to an external request
>> +#
>> +# @error: COLO exit is due to an internal error
>> +#
>> +# Since: 2.5
>
> 2.6 (but you already know that throughout the series, so I'll quit
> pointing it out)
>
>
>> +++ b/qapi/event.json
>> @@ -255,6 +255,23 @@
>>     'data': {'status': 'MigrationStatus'}}
>>
>>   ##
>> +# @COLO_EXIT
>> +#
>> +# Emitted when VM finishes COLO mode due to some errors happening or
>> +# the request of users.
>
> s/the/at the/
>
>> +#
>> +# @mode: @COLOMode describing which side of VM is exit.
>
> Maybe:
>
> @mode: Which COLO mode the VM was in when it exited.
>
>> +#
>> +# @reason: @COLOExitReason describing the reason of colo exit.
>
> @reason: describes the reason for the COLO exit.
>
>> +#
>> +# @error: #optional, error message. Only present on error happening.
>> +#
>> +# Since: 2.5
>> +##
>> +{ 'event': 'COLO_EXIT',
>> +  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason', '*error': 'str' } }
>
> Other than typos, the interface seems okay.
>

OK, I will fix them in the next version, thanks.
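
A minimal sketch of how a management client might consume the COLO_EXIT
event as proposed in this version of the patch, assuming the event arrives
as a single JSON object on the QMP stream. The helper name
describe_colo_exit is hypothetical, and the field names and enum values are
taken from this revision only, so they may change in later versions.

```python
import json

# Enum values proposed by this patch for COLOMode and COLOExitReason.
COLO_MODES = {"unknown", "primary", "secondary"}
COLO_EXIT_REASONS = {"unknown", "request", "error"}


def describe_colo_exit(line):
    """Summarize a COLO_EXIT event; return None for any other QMP message.

    Hypothetical helper, not part of QEMU or of this patch.
    """
    msg = json.loads(line)
    if msg.get("event") != "COLO_EXIT":
        return None
    data = msg.get("data", {})
    mode = data.get("mode")
    reason = data.get("reason")
    if mode not in COLO_MODES or reason not in COLO_EXIT_REASONS:
        raise ValueError("unexpected COLO_EXIT payload: %r" % (data,))
    summary = "COLO exited on %s side, reason: %s" % (mode, reason)
    # "error" is optional; the patch only sends it when reason is "error".
    if "error" in data:
        summary += " (%s)" % data["error"]
    return summary


# The example event from docs/qmp-events.txt in this patch.
example = ('{"timestamp": {"seconds": 2032141960, "microseconds": 417172}, '
           '"event": "COLO_EXIT", '
           '"data": {"mode": "primary", "reason": "request"}}')
print(describe_colo_exit(example))
```

A client that wants to intervene in the failover verdict would presumably
react to the "error" case before the default failover delay expires; the
sketch only covers decoding the event.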

Patch

diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
index d2f1ce4..165dd76 100644
--- a/docs/qmp-events.txt
+++ b/docs/qmp-events.txt
@@ -184,6 +184,23 @@  Example:
 Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
 event.
 
+COLO_EXIT
+---------
+
+Emitted when VM finishes COLO mode due to some errors happening or
+the request of users.
+
+Data:
+
+ - "mode": COLO mode, primary or secondary side (json-string)
+ - "reason":  the exit reason, internal error or external request. (json-string)
+ - "error": error message (json-string, optional)
+
+Example:
+
+{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
+ "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
+
 DEVICE_DELETED
 --------------
 
diff --git a/migration/colo.c b/migration/colo.c
index de6265e..247b40f 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -18,6 +18,7 @@ 
 #include "qemu/error-report.h"
 #include "qemu/sockets.h"
 #include "migration/failover.h"
+#include "qapi-event.h"
 
 /*
  * checkpoint interval: unit ms
@@ -343,6 +344,9 @@  out:
     current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     if (ret < 0) {
         error_report("%s: %s", __func__, strerror(-ret));
+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
+                                  true, strerror(-ret), NULL);
+
         /* Give users time to get involved in this verdict */
         while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
             if (failover_request_is_active()) {
@@ -359,6 +363,9 @@  out:
             failover_request_active(NULL);
         }
         qemu_mutex_unlock_iothread();
+    } else {
+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_REQUEST,
+                                  false, NULL, NULL);
     }
 
     qsb_free(buffer);
@@ -530,6 +537,9 @@  out:
     if (ret < 0) {
         error_report("colo incoming thread will exit, detect error: %s",
                      strerror(-ret));
+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
+                                  true, strerror(-ret), NULL);
+
         /* Give users time to get involved in this verdict */
         while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
             if (failover_request_is_active()) {
@@ -548,6 +558,9 @@  out:
             error_report("SVM is going to exit in default!");
             exit(1);
         }
+    } else {
+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_REQUEST,
+                                  false, NULL, NULL);
     }
 
     if (fb) {
diff --git a/qapi-schema.json b/qapi-schema.json
index ff0e941..8cc1f60 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -751,6 +751,22 @@ 
   'data': [ 'unknown', 'primary', 'secondary'] }
 
 ##
+# @COLOExitReason
+#
+# The reason of COLO exit
+#
+# @unknow: unknown reason
+#
+# @request: COLO exit is due to an external request
+#
+# @error: COLO exit is due to an internal error
+#
+# Since: 2.5
+##
+{ 'enum': 'COLOExitReason',
+  'data': [ 'unknown', 'request', 'error'] }
+
+##
 # @x-colo-lost-heartbeat
 #
 # Tell qemu that heartbeat is lost, request it to do takeover procedures.
diff --git a/qapi/event.json b/qapi/event.json
index f0cef01..6158ab5 100644
--- a/qapi/event.json
+++ b/qapi/event.json
@@ -255,6 +255,23 @@ 
   'data': {'status': 'MigrationStatus'}}
 
 ##
+# @COLO_EXIT
+#
+# Emitted when VM finishes COLO mode due to some errors happening or
+# the request of users.
+#
+# @mode: @COLOMode describing which side of VM is exit.
+#
+# @reason: @COLOExitReason describing the reason of colo exit.
+#
+# @error: #optional, error message. Only present on error happening.
+#
+# Since: 2.5
+##
+{ 'event': 'COLO_EXIT',
+  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason', '*error': 'str' } }
+
+##
 # @ACPI_DEVICE_OST
 #
 # Emitted when guest executes ACPI _OST method.