diff mbox

[COLO-Frame,v8,19/34] qmp event: Add event notification for COLO error

Message ID 1438159544-6224-20-git-send-email-zhang.zhanghailiang@huawei.com
State New
Headers show

Commit Message

Zhanghailiang July 29, 2015, 8:45 a.m. UTC
If some errors happen during VM's COLO FT stage, it's import to notify the users
this event, Togehter with 'colo_lost_heartbeat', users can intervene in COLO's
failover work immediately.
If users don't want to get involved in COLO's failover verdict,
it is still necessary to notify users that we exit COLO mode.

Cc: Markus Armbruster <armbru@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 docs/qmp/qmp-events.txt | 16 ++++++++++++++++
 migration/colo.c        | 11 ++++++++++-
 qapi/event.json         | 15 +++++++++++++++
 3 files changed, 41 insertions(+), 1 deletion(-)

Comments

Eric Blake Aug. 28, 2015, 10:13 p.m. UTC | #1
On 07/29/2015 02:45 AM, zhanghailiang wrote:
> If some errors happen during VM's COLO FT stage, it's import to notify the users

s/import/important/

> this event, Togehter with 'colo_lost_heartbeat', users can intervene in COLO's

s/this event,/of this event./
s/Togehter/Together/

> failover work immediately.
> If users don't want to get involved in COLO's failover verdict,
> it is still necessary to notify users that we exit COLO mode.
> 
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  docs/qmp/qmp-events.txt | 16 ++++++++++++++++
>  migration/colo.c        | 11 ++++++++++-
>  qapi/event.json         | 15 +++++++++++++++
>  3 files changed, 41 insertions(+), 1 deletion(-)
> 

Interface review:

> +++ b/docs/qmp/qmp-events.txt
> @@ -488,6 +488,22 @@ Example:
>  {"timestamp": {"seconds": 1432121972, "microseconds": 744001},
>   "event": "MIGRATION", "data": {"status": "completed"}}
>  
> +COLO_EXIT
> +---------

This file is alphabetical prior to your patch; please insert your event
prior to DEVICE_DELETED.

> +
> +Emitted when VM finish COLO mode due to some errors happening or

s/finish/finishes/

> +the request of users.
> +
> +Data: None.
> +
> + - "mode": COLO mode, 'primary' or 'secondary'
> + - "error": Error message (json-string, optional)
> +
> +Example:
> +
> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
> + "event": "COLO_EXIT", "data": {"mode": "primary"}}

It might also be useful to provide a machine-parseable parameter on
whether the exit was due to an internal error vs. an external request.
Maybe by adding 'flag':'bool', since presence or absence of 'error' is
not as nice.

> +++ b/migration/colo.c

> @@ -581,7 +590,7 @@ out:
>              error_report("Secondary VM will take over work");
>              break;
>          }
> -        usleep(200*1000);
> +        usleep(200 * 1000);
>      }

Unrelated whitespace tweak. Please squash this hunk into the patch that
introduced the problem.

> +++ b/qapi/event.json
> @@ -255,6 +255,21 @@
>    'data': {'status': 'MigrationStatus'}}
>  
>  ##
> +# @COLO_EXIT
> +#
> +# Emitted when VM finish COLO mode due to some errors happening or

s/finish/finishes/

> +# the request of users.
> +#
> +# @mode: 'primary' or 'secondeary'.

s/secondeary/secondary/

> +#
> +# @error:  #optional, error message. Only present on error happening.
> +#
> +# Since: 2.4

2.5

> +##
> +{ 'event': 'COLO_EXIT',
> +  'data': {'mode': 'str', '*error':'str'}}

Please use 'mode':'COLOMode' (we already have an enum; for a finite set
of strings, it's nicer to use the enum than the open-coded 'str').
Zhanghailiang Aug. 31, 2015, 9:27 a.m. UTC | #2
On 2015/8/29 6:13, Eric Blake wrote:
> On 07/29/2015 02:45 AM, zhanghailiang wrote:
>> If some errors happen during VM's COLO FT stage, it's import to notify the users
>
> s/import/important/
>
>> this event, Togehter with 'colo_lost_heartbeat', users can intervene in COLO's
>
> s/this event,/of this event./
> s/Togehter/Together/
>
>> failover work immediately.
>> If users don't want to get involved in COLO's failover verdict,
>> it is still necessary to notify users that we exit COLO mode.
>>
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   docs/qmp/qmp-events.txt | 16 ++++++++++++++++
>>   migration/colo.c        | 11 ++++++++++-
>>   qapi/event.json         | 15 +++++++++++++++
>>   3 files changed, 41 insertions(+), 1 deletion(-)
>>
>
> Interface review:
>
>> +++ b/docs/qmp/qmp-events.txt
>> @@ -488,6 +488,22 @@ Example:
>>   {"timestamp": {"seconds": 1432121972, "microseconds": 744001},
>>    "event": "MIGRATION", "data": {"status": "completed"}}
>>
>> +COLO_EXIT
>> +---------
>
> This file is alphabetical prior to your patch; please insert your event
> prior to DEVICE_DELETED.
>
>> +
>> +Emitted when VM finish COLO mode due to some errors happening or
>
> s/finish/finishes/
>
>> +the request of users.
>> +
>> +Data: None.
>> +
>> + - "mode": COLO mode, 'primary' or 'secondary'
>> + - "error": Error message (json-string, optional)
>> +
>> +Example:
>> +
>> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
>> + "event": "COLO_EXIT", "data": {"mode": "primary"}}
>
> It might also be useful to provide a machine-parseable parameter on
> whether the exit was due to an internal error vs. an external request.
> Maybe by adding 'flag':'bool', since presence or absence of 'error' is
> not as nice.
>

Good idea, but i think the type of 'flag' changes to 'str' maybe better, which can be 'error' or
'request', 'bool' is not so self-explanatory.
The original reason for adding 'error' member is to help
users to know what exactly happen, but it seems to be difficult to  exactly class all errors,
so i will remove it.

>> +++ b/migration/colo.c
>
>> @@ -581,7 +590,7 @@ out:
>>               error_report("Secondary VM will take over work");
>>               break;
>>           }
>> -        usleep(200*1000);
>> +        usleep(200 * 1000);
>>       }
>
> Unrelated whitespace tweak. Please squash this hunk into the patch that
> introduced the problem.
>

OK, will fix in next version.

>> +++ b/qapi/event.json
>> @@ -255,6 +255,21 @@
>>     'data': {'status': 'MigrationStatus'}}
>>
>>   ##
>> +# @COLO_EXIT
>> +#
>> +# Emitted when VM finish COLO mode due to some errors happening or
>
> s/finish/finishes/
>
>> +# the request of users.
>> +#
>> +# @mode: 'primary' or 'secondeary'.
>
> s/secondeary/secondary/
>
>> +#
>> +# @error:  #optional, error message. Only present on error happening.
>> +#
>> +# Since: 2.4
>
> 2.5
>
>> +##
>> +{ 'event': 'COLO_EXIT',
>> +  'data': {'mode': 'str', '*error':'str'}}
>
> Please use 'mode':'COLOMode' (we already have an enum; for a finite set
> of strings, it's nicer to use the enum than the open-coded 'str').
>

I will fix all the above typos and other problems in next version, thanks very much.
Eric Blake Aug. 31, 2015, 3:07 p.m. UTC | #3
On 08/31/2015 03:27 AM, zhanghailiang wrote:

>>> +Data: None.
>>> +
>>> + - "mode": COLO mode, 'primary' or 'secondary'
>>> + - "error": Error message (json-string, optional)

The "Data: None" designation is inconsistent with the fact that you do
provide data.

>>> +
>>> +Example:
>>> +
>>> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
>>> + "event": "COLO_EXIT", "data": {"mode": "primary"}}
>>
>> It might also be useful to provide a machine-parseable parameter on
>> whether the exit was due to an internal error vs. an external request.
>> Maybe by adding 'flag':'bool', since presence or absence of 'error' is
>> not as nice.
>>
> 
> Good idea, but i think the type of 'flag' changes to 'str' maybe better,
> which can be 'error' or
> 'request', 'bool' is not so self-explanatory.
> The original reason for adding 'error' member is to help
> users to know what exactly happen, but it seems to be difficult to 
> exactly class all errors,
> so i will remove it.

Keeping an optional 'error' for human consumption is not bad; it's just
that it is also nice to provide something for machine consumption (and
we typically document that machines should NOT parse 'error' messages).
Zhanghailiang Sept. 1, 2015, 1:08 a.m. UTC | #4
On 2015/8/31 23:07, Eric Blake wrote:
> On 08/31/2015 03:27 AM, zhanghailiang wrote:
>
>>>> +Data: None.
>>>> +
>>>> + - "mode": COLO mode, 'primary' or 'secondary'
>>>> + - "error": Error message (json-string, optional)
>
> The "Data: None" designation is inconsistent with the fact that you do
> provide data.
>

Will fix in next version.

>>>> +
>>>> +Example:
>>>> +
>>>> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
>>>> + "event": "COLO_EXIT", "data": {"mode": "primary"}}
>>>
>>> It might also be useful to provide a machine-parseable parameter on
>>> whether the exit was due to an internal error vs. an external request.
>>> Maybe by adding 'flag':'bool', since presence or absence of 'error' is
>>> not as nice.
>>>
>>
>> Good idea, but i think the type of 'flag' changes to 'str' maybe better,
>> which can be 'error' or
>> 'request', 'bool' is not so self-explanatory.
>> The original reason for adding 'error' member is to help
>> users to know what exactly happen, but it seems to be difficult to
>> exactly class all errors,
>> so i will remove it.
>
> Keeping an optional 'error' for human consumption is not bad; it's just
> that it is also nice to provide something for machine consumption (and
> we typically document that machines should NOT parse 'error' messages).
>

OK, then i will keep it, and manage all the error places and error messages carefully.

Thanks,
zhanghailiang
diff mbox

Patch

diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt
index d92cc48..c7b4581 100644
--- a/docs/qmp/qmp-events.txt
+++ b/docs/qmp/qmp-events.txt
@@ -488,6 +488,22 @@  Example:
 {"timestamp": {"seconds": 1432121972, "microseconds": 744001},
  "event": "MIGRATION", "data": {"status": "completed"}}
 
+COLO_EXIT
+---------
+
+Emitted when VM finish COLO mode due to some errors happening or
+the request of users.
+
+Data: None.
+
+ - "mode": COLO mode, 'primary' or 'secondary'
+ - "error": Error message (json-string, optional)
+
+Example:
+
+{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
+ "event": "COLO_EXIT", "data": {"mode": "primary"}}
+
 STOP
 ----
 
diff --git a/migration/colo.c b/migration/colo.c
index a03ba0d..afddb52 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -16,6 +16,7 @@ 
 #include "qemu/error-report.h"
 #include "qemu/sockets.h"
 #include "migration/failover.h"
+#include "qapi-event.h"
 
 /* Fix me: Convert to use QAPI */
 typedef enum COLOCommand {
@@ -373,6 +374,7 @@  out:
         qemu_fclose(colo_control);
     }
 
+    qapi_event_send_colo_exit("primary", true, "unknown", NULL);;
     /* Give users time (2s) to get involved in this verdict */
     for (i = 0; i < 10; i++) {
         if (failover_request_is_active()) {
@@ -566,6 +568,13 @@  void *colo_process_incoming_checkpoints(void *opaque)
 
 out:
     error_report("Detect some error or get a failover request");
+    /*
+    * Here, we raise a qmp event to the user,
+    * It can help user to know what happens, and help deciding whether to
+    * do failover.
+    */
+    qapi_event_send_colo_exit("secondary", true, "unknown", NULL);
+
     if (fb) {
         qemu_fclose(fb);
     }
@@ -581,7 +590,7 @@  out:
             error_report("Secondary VM will take over work");
             break;
         }
-        usleep(200*1000);
+        usleep(200 * 1000);
     }
     /* check flag again*/
     if (!failover_request_is_active()) {
diff --git a/qapi/event.json b/qapi/event.json
index f0cef01..21dfe3a 100644
--- a/qapi/event.json
+++ b/qapi/event.json
@@ -255,6 +255,21 @@ 
   'data': {'status': 'MigrationStatus'}}
 
 ##
+# @COLO_EXIT
+#
+# Emitted when VM finish COLO mode due to some errors happening or
+# the request of users.
+#
+# @mode: 'primary' or 'secondeary'.
+#
+# @error:  #optional, error message. Only present on error happening.
+#
+# Since: 2.4
+##
+{ 'event': 'COLO_EXIT',
+  'data': {'mode': 'str', '*error':'str'}}
+
+##
 # @ACPI_DEVICE_OST
 #
 # Emitted when guest executes ACPI _OST method.