Message ID | 1438159544-6224-20-git-send-email-zhang.zhanghailiang@huawei.com |
---|---|
State | New |
Headers | show |
On 07/29/2015 02:45 AM, zhanghailiang wrote: > If some errors happen during VM's COLO FT stage, it's import to notify the users s/import/important/ > this event, Togehter with 'colo_lost_heartbeat', users can intervene in COLO's s/this event,/of this event./ s/Togehter/Together/ > failover work immediately. > If users don't want to get involved in COLO's failover verdict, > it is still necessary to notify users that we exit COLO mode. > > Cc: Markus Armbruster <armbru@redhat.com> > Cc: Michael Roth <mdroth@linux.vnet.ibm.com> > Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> > Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> > --- > docs/qmp/qmp-events.txt | 16 ++++++++++++++++ > migration/colo.c | 11 ++++++++++- > qapi/event.json | 15 +++++++++++++++ > 3 files changed, 41 insertions(+), 1 deletion(-) > Interface review: > +++ b/docs/qmp/qmp-events.txt > @@ -488,6 +488,22 @@ Example: > {"timestamp": {"seconds": 1432121972, "microseconds": 744001}, > "event": "MIGRATION", "data": {"status": "completed"}} > > +COLO_EXIT > +--------- This file is alphabetical prior to your patch; please insert your event prior to DEVICE_DELETED. > + > +Emitted when VM finish COLO mode due to some errors happening or s/finish/finishes/ > +the request of users. > + > +Data: None. > + > + - "mode": COLO mode, 'primary' or 'secondary' > + - "error": Error message (json-string, optional) > + > +Example: > + > +{"timestamp": {"seconds": 2032141960, "microseconds": 417172}, > + "event": "COLO_EXIT", "data": {"mode": "primary"}} It might also be useful to provide a machine-parseable parameter on whether the exit was due to an internal error vs. an external request. Maybe by adding 'flag':'bool', since presence or absence of 'error' is not as nice. > +++ b/migration/colo.c > @@ -581,7 +590,7 @@ out: > error_report("Secondary VM will take over work"); > break; > } > - usleep(200*1000); > + usleep(200 * 1000); > } Unrelated whitespace tweak. Please squash this hunk into the patch that introduced the problem. > +++ b/qapi/event.json > @@ -255,6 +255,21 @@ > 'data': {'status': 'MigrationStatus'}} > > ## > +# @COLO_EXIT > +# > +# Emitted when VM finish COLO mode due to some errors happening or s/finish/finishes/ > +# the request of users. > +# > +# @mode: 'primary' or 'secondeary'. s/secondeary/secondary/ > +# > +# @error: #optional, error message. Only present on error happening. > +# > +# Since: 2.4 2.5 > +## > +{ 'event': 'COLO_EXIT', > + 'data': {'mode': 'str', '*error':'str'}} Please use 'mode':'COLOMode' (we already have an enum; for a finite set of strings, it's nicer to use the enum than the open-coded 'str').
On 2015/8/29 6:13, Eric Blake wrote: > On 07/29/2015 02:45 AM, zhanghailiang wrote: >> If some errors happen during VM's COLO FT stage, it's import to notify the users > > s/import/important/ > >> this event, Togehter with 'colo_lost_heartbeat', users can intervene in COLO's > > s/this event,/of this event./ > s/Togehter/Together/ > >> failover work immediately. >> If users don't want to get involved in COLO's failover verdict, >> it is still necessary to notify users that we exit COLO mode. >> >> Cc: Markus Armbruster <armbru@redhat.com> >> Cc: Michael Roth <mdroth@linux.vnet.ibm.com> >> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> >> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> >> --- >> docs/qmp/qmp-events.txt | 16 ++++++++++++++++ >> migration/colo.c | 11 ++++++++++- >> qapi/event.json | 15 +++++++++++++++ >> 3 files changed, 41 insertions(+), 1 deletion(-) >> > > Interface review: > >> +++ b/docs/qmp/qmp-events.txt >> @@ -488,6 +488,22 @@ Example: >> {"timestamp": {"seconds": 1432121972, "microseconds": 744001}, >> "event": "MIGRATION", "data": {"status": "completed"}} >> >> +COLO_EXIT >> +--------- > > This file is alphabetical prior to your patch; please insert your event > prior to DEVICE_DELETED. > >> + >> +Emitted when VM finish COLO mode due to some errors happening or > > s/finish/finishes/ > >> +the request of users. >> + >> +Data: None. >> + >> + - "mode": COLO mode, 'primary' or 'secondary' >> + - "error": Error message (json-string, optional) >> + >> +Example: >> + >> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172}, >> + "event": "COLO_EXIT", "data": {"mode": "primary"}} > > It might also be useful to provide a machine-parseable parameter on > whether the exit was due to an internal error vs. an external request. > Maybe by adding 'flag':'bool', since presence or absence of 'error' is > not as nice. > Good idea, but i think the type of 'flag' changes to 'str' maybe better, which can be 'error' or 'request', 'bool' is not so self-explanatory. The original reason for adding 'error' member is to help users to know what exactly happen, but it seems to be difficult to exactly class all errors, so i will remove it. >> +++ b/migration/colo.c > >> @@ -581,7 +590,7 @@ out: >> error_report("Secondary VM will take over work"); >> break; >> } >> - usleep(200*1000); >> + usleep(200 * 1000); >> } > > Unrelated whitespace tweak. Please squash this hunk into the patch that > introduced the problem. > OK, will fix in next version. >> +++ b/qapi/event.json >> @@ -255,6 +255,21 @@ >> 'data': {'status': 'MigrationStatus'}} >> >> ## >> +# @COLO_EXIT >> +# >> +# Emitted when VM finish COLO mode due to some errors happening or > > s/finish/finishes/ > >> +# the request of users. >> +# >> +# @mode: 'primary' or 'secondeary'. > > s/secondeary/secondary/ > >> +# >> +# @error: #optional, error message. Only present on error happening. >> +# >> +# Since: 2.4 > > 2.5 > >> +## >> +{ 'event': 'COLO_EXIT', >> + 'data': {'mode': 'str', '*error':'str'}} > > Please use 'mode':'COLOMode' (we already have an enum; for a finite set > of strings, it's nicer to use the enum than the open-coded 'str'). > I will fix all the above typos and other problems in next version, thanks very much.
On 08/31/2015 03:27 AM, zhanghailiang wrote: >>> +Data: None. >>> + >>> + - "mode": COLO mode, 'primary' or 'secondary' >>> + - "error": Error message (json-string, optional) The "Data: None" designation is inconsistent with the fact that you do provide data. >>> + >>> +Example: >>> + >>> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172}, >>> + "event": "COLO_EXIT", "data": {"mode": "primary"}} >> >> It might also be useful to provide a machine-parseable parameter on >> whether the exit was due to an internal error vs. an external request. >> Maybe by adding 'flag':'bool', since presence or absence of 'error' is >> not as nice. >> > > Good idea, but i think the type of 'flag' changes to 'str' maybe better, > which can be 'error' or > 'request', 'bool' is not so self-explanatory. > The original reason for adding 'error' member is to help > users to know what exactly happen, but it seems to be difficult to > exactly class all errors, > so i will remove it. Keeping an optional 'error' for human consumption is not bad; it's just that it is also nice to provide something for machine consumption (and we typically document that machines should NOT parse 'error' messages).
On 2015/8/31 23:07, Eric Blake wrote: > On 08/31/2015 03:27 AM, zhanghailiang wrote: > >>>> +Data: None. >>>> + >>>> + - "mode": COLO mode, 'primary' or 'secondary' >>>> + - "error": Error message (json-string, optional) > > The "Data: None" designation is inconsistent with the fact that you do > provide data. > Will fix in next version. >>>> + >>>> +Example: >>>> + >>>> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172}, >>>> + "event": "COLO_EXIT", "data": {"mode": "primary"}} >>> >>> It might also be useful to provide a machine-parseable parameter on >>> whether the exit was due to an internal error vs. an external request. >>> Maybe by adding 'flag':'bool', since presence or absence of 'error' is >>> not as nice. >>> >> >> Good idea, but i think the type of 'flag' changes to 'str' maybe better, >> which can be 'error' or >> 'request', 'bool' is not so self-explanatory. >> The original reason for adding 'error' member is to help >> users to know what exactly happen, but it seems to be difficult to >> exactly class all errors, >> so i will remove it. > > Keeping an optional 'error' for human consumption is not bad; it's just > that it is also nice to provide something for machine consumption (and > we typically document that machines should NOT parse 'error' messages). > OK, then i will keep it, and manage all the error places and error messages carefully. Thanks, zhanghailiang
diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt index d92cc48..c7b4581 100644 --- a/docs/qmp/qmp-events.txt +++ b/docs/qmp/qmp-events.txt @@ -488,6 +488,22 @@ Example: {"timestamp": {"seconds": 1432121972, "microseconds": 744001}, "event": "MIGRATION", "data": {"status": "completed"}} +COLO_EXIT +--------- + +Emitted when VM finish COLO mode due to some errors happening or +the request of users. + +Data: None. + + - "mode": COLO mode, 'primary' or 'secondary' + - "error": Error message (json-string, optional) + +Example: + +{"timestamp": {"seconds": 2032141960, "microseconds": 417172}, + "event": "COLO_EXIT", "data": {"mode": "primary"}} + STOP ---- diff --git a/migration/colo.c b/migration/colo.c index a03ba0d..afddb52 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -16,6 +16,7 @@ #include "qemu/error-report.h" #include "qemu/sockets.h" #include "migration/failover.h" +#include "qapi-event.h" /* Fix me: Convert to use QAPI */ typedef enum COLOCommand { @@ -373,6 +374,7 @@ out: qemu_fclose(colo_control); } + qapi_event_send_colo_exit("primary", true, "unknown", NULL);; /* Give users time (2s) to get involved in this verdict */ for (i = 0; i < 10; i++) { if (failover_request_is_active()) { @@ -566,6 +568,13 @@ void *colo_process_incoming_checkpoints(void *opaque) out: error_report("Detect some error or get a failover request"); + /* + * Here, we raise a qmp event to the user, + * It can help user to know what happens, and help deciding whether to + * do failover. + */ + qapi_event_send_colo_exit("secondary", true, "unknown", NULL); + if (fb) { qemu_fclose(fb); } @@ -581,7 +590,7 @@ out: error_report("Secondary VM will take over work"); break; } - usleep(200*1000); + usleep(200 * 1000); } /* check flag again*/ if (!failover_request_is_active()) { diff --git a/qapi/event.json b/qapi/event.json index f0cef01..21dfe3a 100644 --- a/qapi/event.json +++ b/qapi/event.json @@ -255,6 +255,21 @@ 'data': {'status': 'MigrationStatus'}} ## +# @COLO_EXIT +# +# Emitted when VM finish COLO mode due to some errors happening or +# the request of users. +# +# @mode: 'primary' or 'secondeary'. +# +# @error: #optional, error message. Only present on error happening. +# +# Since: 2.4 +## +{ 'event': 'COLO_EXIT', + 'data': {'mode': 'str', '*error':'str'}} + +## # @ACPI_DEVICE_OST # # Emitted when guest executes ACPI _OST method.