diff mbox

[COLO-Frame,v12,10/38] COLO: Implement colo checkpoint protocol

Message ID 1450167779-9960-11-git-send-email-zhang.zhanghailiang@huawei.com
State New
Headers show

Commit Message

Zhanghailiang Dec. 15, 2015, 8:22 a.m. UTC
We need communications protocol of user-defined to control the checkpoint
process.

The new checkpoint request is started by Primary VM, and the interactive process
like below:
Checkpoint synchronizing points,

                       Primary                         Secondary
                                                       initial work
'checkpoint-ready'     <------------------------------ @

'checkpoint-request'   @ ----------------------------->
                                                       Suspend (Only in hybrid mode)
'checkpoint-reply'     <------------------------------ @
                       Suspend&Save state
'vmstate-send'         @ ----------------------------->
                       Send state                      Receive state
'vmstate-received'     <------------------------------ @
                       Release packets                 Load state
'vmstate-load'         <------------------------------ @
                       Resume                          Resume (Only in hybrid mode)

                       Start Comparing (Only in hybrid mode)
NOTE:
 1) '@' who sends the message
 2) Every sync-point is synchronized by two sides with only
    one handshake(single direction) for low-latency.
    If more strict synchronization is required, a opposite direction
    sync-point should be added.
 3) Since sync-points are single direction, the remote side may
    go forward a lot when this side just receives the sync-point.
 4) For now, we only support 'periodic' checkpoint, for which
   the Secondary VM is not running, later we will support 'hybrid' mode.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Eric Blake <eblake@redhat.com>
---
v12:
- Rename colo_ctl_put() to colo_put_cmd()
- Rename colo_ctl_get() to colo_get_check_cmd() and drop
  the third parameter
- Rename colo_ctl_get_cmd() to colo_get_cmd()
- Remove useless 'invalid' member for COLOcommand enum.
v11:
- Add missing 'checkpoint-ready' communication in comment.
- Use parameter to return 'value' for colo_ctl_get() (Dave's suggestion)
- Fix trace for colo_ctl_get() to trace command and value both
v10:
- Rename enum COLOCmd to COLOCommand (Eric's suggestion).
- Remove unused 'ram-steal'

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 migration/colo.c | 183 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 qapi-schema.json |  25 ++++++++
 trace-events     |   2 +
 3 files changed, 208 insertions(+), 2 deletions(-)

Comments

Dr. David Alan Gilbert Dec. 18, 2015, 2:52 p.m. UTC | #1
* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> We need communications protocol of user-defined to control the checkpoint
> process.
> 
> The new checkpoint request is started by Primary VM, and the interactive process
> like below:
> Checkpoint synchronizing points,
> 
>                        Primary                         Secondary
>                                                        initial work
> 'checkpoint-ready'     <------------------------------ @
> 
> 'checkpoint-request'   @ ----------------------------->
>                                                        Suspend (Only in hybrid mode)
> 'checkpoint-reply'     <------------------------------ @
>                        Suspend&Save state
> 'vmstate-send'         @ ----------------------------->
>                        Send state                      Receive state
> 'vmstate-received'     <------------------------------ @
>                        Release packets                 Load state
> 'vmstate-load'         <------------------------------ @
>                        Resume                          Resume (Only in hybrid mode)
> 
>                        Start Comparing (Only in hybrid mode)
> NOTE:
>  1) '@' who sends the message
>  2) Every sync-point is synchronized by two sides with only
>     one handshake(single direction) for low-latency.
>     If more strict synchronization is required, a opposite direction
>     sync-point should be added.
>  3) Since sync-points are single direction, the remote side may
>     go forward a lot when this side just receives the sync-point.
>  4) For now, we only support 'periodic' checkpoint, for which
>    the Secondary VM is not running, later we will support 'hybrid' mode.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Cc: Eric Blake <eblake@redhat.com>
> ---
> v12:
> - Rename colo_ctl_put() to colo_put_cmd()
> - Rename colo_ctl_get() to colo_get_check_cmd() and drop
>   the third parameter
> - Rename colo_ctl_get_cmd() to colo_get_cmd()
> - Remove useless 'invalid' member for COLOcommand enum.
> v11:
> - Add missing 'checkpoint-ready' communication in comment.
> - Use parameter to return 'value' for colo_ctl_get() (Dave's suggestion)
> - Fix trace for colo_ctl_get() to trace command and value both
> v10:
> - Rename enum COLOCmd to COLOCommand (Eric's suggestion).
> - Remove unused 'ram-steal'
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  migration/colo.c | 183 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  qapi-schema.json |  25 ++++++++
>  trace-events     |   2 +
>  3 files changed, 208 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 0ab9618..0ce2a6e 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -10,10 +10,12 @@
>   * later.  See the COPYING file in the top-level directory.
>   */
>  
> +#include <unistd.h>
>  #include "sysemu/sysemu.h"
>  #include "migration/colo.h"
>  #include "trace.h"
>  #include "qemu/error-report.h"
> +#include "qemu/sockets.h"
>  
>  bool colo_supported(void)
>  {
> @@ -34,6 +36,100 @@ bool migration_incoming_in_colo_state(void)
>      return mis && (mis->state == MIGRATION_STATUS_COLO);
>  }
>  
> +static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
> +{
> +    int ret;
> +
> +    if (cmd >= COLO_COMMAND_MAX) {
> +        error_report("%s: Invalid cmd", __func__);
> +        return -EINVAL;
> +    }
> +    qemu_put_be32(f, cmd);
> +    qemu_fflush(f);
> +
> +    ret = qemu_file_get_error(f);
> +    trace_colo_put_cmd(COLOCommand_lookup[cmd]);
> +
> +    return ret;
> +}
> +
> +static int colo_get_cmd(QEMUFile *f, uint32_t *cmd)
> +{
> +    int ret;
> +
> +    *cmd = qemu_get_be32(f);
> +    ret = qemu_file_get_error(f);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    if (*cmd >= COLO_COMMAND_MAX) {
> +        error_report("%s: Invalid cmd", __func__);
> +        return -EINVAL;
> +    }
> +    trace_colo_get_cmd(COLOCommand_lookup[*cmd]);
> +    return 0;
> +}
> +
> +static int colo_get_check_cmd(QEMUFile *f, uint32_t expect_cmd)
> +{
> +    int ret;
> +    uint32_t cmd;
> +
> +    ret = colo_get_cmd(f, &cmd);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    if (cmd != expect_cmd) {
> +        error_report("Unexpect colo command, expect:%d, but got cmd:%d",
> +                     expect_cmd, cmd);

Those still need to be PRIu32

But other than that,

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}
> +
> +static int colo_do_checkpoint_transaction(MigrationState *s)
> +{
> +    int ret;
> +
> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
> +                             COLO_COMMAND_CHECKPOINT_REPLY);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    /* TODO: suspend and save vm state to colo buffer */
> +
> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    /* TODO: send vmstate to Secondary */
> +
> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
> +                             COLO_COMMAND_VMSTATE_RECEIVED);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
> +                             COLO_COMMAND_VMSTATE_LOADED);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    /* TODO: resume Primary */
> +
> +out:
> +    return ret;
> +}
> +
>  static void colo_process_checkpoint(MigrationState *s)
>  {
>      int ret = 0;
> @@ -45,12 +141,28 @@ static void colo_process_checkpoint(MigrationState *s)
>          goto out;
>      }
>  
> +    /*
> +     * Wait for Secondary finish loading vm states and enter COLO
> +     * restore.
> +     */
> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
> +                             COLO_COMMAND_CHECKPOINT_READY);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
>      qemu_mutex_lock_iothread();
>      vm_start();
>      qemu_mutex_unlock_iothread();
>      trace_colo_vm_state_change("stop", "run");
>  
> -    /*TODO: COLO checkpoint savevm loop*/
> +    while (s->state == MIGRATION_STATUS_COLO) {
> +        /* start a colo checkpoint */
> +        ret = colo_do_checkpoint_transaction(s);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +    }
>  
>  out:
>      if (ret < 0) {
> @@ -73,6 +185,31 @@ void migrate_start_colo_process(MigrationState *s)
>      qemu_mutex_lock_iothread();
>  }
>  
> +/*
> + * return:
> + * 0: start a checkpoint
> + * -1: some error happened, exit colo restore
> + */
> +static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
> +{
> +    int ret;
> +    uint32_t cmd;
> +
> +    ret = colo_get_cmd(f, &cmd);
> +    if (ret < 0) {
> +        /* do failover ? */
> +        return ret;
> +    }
> +
> +    switch (cmd) {
> +    case COLO_COMMAND_CHECKPOINT_REQUEST:
> +        *checkpoint_request = 1;
> +        return 0;
> +    default:
> +        return -EINVAL;
> +    }
> +}
> +
>  void *colo_process_incoming_thread(void *opaque)
>  {
>      MigrationIncomingState *mis = opaque;
> @@ -93,7 +230,49 @@ void *colo_process_incoming_thread(void *opaque)
>      */
>      qemu_set_block(qemu_get_fd(mis->from_src_file));
>  
> -    /* TODO: COLO checkpoint restore loop */
> +
> +    ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    while (mis->state == MIGRATION_STATUS_COLO) {
> +        int request = 0;
> +        int ret = colo_wait_handle_cmd(mis->from_src_file, &request);
> +
> +        if (ret < 0) {
> +            break;
> +        } else {
> +            if (!request) {
> +                continue;
> +            }
> +        }
> +        /* FIXME: This is unnecessary for periodic checkpoint mode */
> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +
> +        ret = colo_get_check_cmd(mis->from_src_file,
> +                                 COLO_COMMAND_VMSTATE_SEND);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +
> +        /* TODO: read migration data into colo buffer */
> +
> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +
> +        /* TODO: load vm state */
> +
> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +    }
>  
>  out:
>      if (ret < 0) {
> diff --git a/qapi-schema.json b/qapi-schema.json
> index c9ff34e..85f7800 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -720,6 +720,31 @@
>  { 'command': 'migrate-start-postcopy' }
>  
>  ##
> +# @COLOCommand
> +#
> +# The commands for COLO fault tolerance
> +#
> +# @checkpoint-ready: SVM is ready for checkpointing
> +#
> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
> +#
> +# @checkpoint-reply: SVM gets PVM's checkpoint request
> +#
> +# @vmstate-send: VM's state will be sent by PVM.
> +#
> +# @vmstate-size: The total size of VMstate.
> +#
> +# @vmstate-received: VM's state has been received by SVM.
> +#
> +# @vmstate-loaded: VM's state has been loaded by SVM.
> +#
> +# Since: 2.6
> +##
> +{ 'enum': 'COLOCommand',
> +  'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
> +            'vmstate-send', 'vmstate-size','vmstate-received',
> +            'vmstate-loaded' ] }
> +
>  # @MouseInfo:
>  #
>  # Information about a mouse device.
> diff --git a/trace-events b/trace-events
> index 5565e79..39fdd8d 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1579,6 +1579,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
>  
>  # migration/colo.c
>  colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
> +colo_put_cmd(const char *msg) "Send '%s' cmd"
> +colo_get_cmd(const char *msg) "Receive '%s' cmd"
>  
>  # kvm-all.c
>  kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Markus Armbruster Dec. 19, 2015, 8:54 a.m. UTC | #2
Jumping in at v12 for a bit of QAPI review (and whatever else catched my
eye nearby), please pardon my ignorance of COLO in general, and previous
review of this series in particular.

zhanghailiang <zhang.zhanghailiang@huawei.com> writes:

> We need communications protocol of user-defined to control the checkpoint
> process.
>
> The new checkpoint request is started by Primary VM, and the interactive process
> like below:
> Checkpoint synchronizing points,
>
>                        Primary                         Secondary
>                                                        initial work
> 'checkpoint-ready'     <------------------------------ @
>
> 'checkpoint-request'   @ ----------------------------->
>                                                        Suspend (Only in hybrid mode)
> 'checkpoint-reply'     <------------------------------ @
>                        Suspend&Save state
> 'vmstate-send'         @ ----------------------------->
>                        Send state                      Receive state
> 'vmstate-received'     <------------------------------ @
>                        Release packets                 Load state
> 'vmstate-load'         <------------------------------ @
>                        Resume                          Resume (Only in hybrid mode)

Long lines.  Easy to fix: shorten your arrows.

>                        Start Comparing (Only in hybrid mode)
> NOTE:
>  1) '@' who sends the message
>  2) Every sync-point is synchronized by two sides with only
>     one handshake(single direction) for low-latency.
>     If more strict synchronization is required, a opposite direction
>     sync-point should be added.
>  3) Since sync-points are single direction, the remote side may
>     go forward a lot when this side just receives the sync-point.
>  4) For now, we only support 'periodic' checkpoint, for which
>    the Secondary VM is not running, later we will support 'hybrid' mode.

Useful commit message, but shouldn't this explanation (also) be in the
source?

> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Cc: Eric Blake <eblake@redhat.com>
> ---
> v12:
> - Rename colo_ctl_put() to colo_put_cmd()
> - Rename colo_ctl_get() to colo_get_check_cmd() and drop
>   the third parameter
> - Rename colo_ctl_get_cmd() to colo_get_cmd()
> - Remove useless 'invalid' member for COLOcommand enum.
> v11:
> - Add missing 'checkpoint-ready' communication in comment.
> - Use parameter to return 'value' for colo_ctl_get() (Dave's suggestion)
> - Fix trace for colo_ctl_get() to trace command and value both
> v10:
> - Rename enum COLOCmd to COLOCommand (Eric's suggestion).
> - Remove unused 'ram-steal'
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  migration/colo.c | 183 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  qapi-schema.json |  25 ++++++++
>  trace-events     |   2 +
>  3 files changed, 208 insertions(+), 2 deletions(-)
>
> diff --git a/migration/colo.c b/migration/colo.c
> index 0ab9618..0ce2a6e 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -10,10 +10,12 @@
>   * later.  See the COPYING file in the top-level directory.
>   */
>  
> +#include <unistd.h>
>  #include "sysemu/sysemu.h"
>  #include "migration/colo.h"
>  #include "trace.h"
>  #include "qemu/error-report.h"
> +#include "qemu/sockets.h"
>  
>  bool colo_supported(void)
>  {
> @@ -34,6 +36,100 @@ bool migration_incoming_in_colo_state(void)
>      return mis && (mis->state == MIGRATION_STATUS_COLO);
>  }
>  
> +static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
> +{
> +    int ret;
> +
> +    if (cmd >= COLO_COMMAND_MAX) {

Needs a trivial rebase due to commit 7fb1cf1.

> +        error_report("%s: Invalid cmd", __func__);
> +        return -EINVAL;

Can this run in a context with different error handling needs?

Or asked differently: who may ultimately handle this error?  Whoever
that may be, how does it need to report errors?

Peeking ahead: the immediate callers don't handle this error, they just
pass it on their callers.

I'm asking because I'm trying to understand whether error_report() is
appropriate here, or whether you need to use error_setg(), and leave the
actual reporting to the spot that ultimately handles this error.

> +    }
> +    qemu_put_be32(f, cmd);
> +    qemu_fflush(f);
> +
> +    ret = qemu_file_get_error(f);
> +    trace_colo_put_cmd(COLOCommand_lookup[cmd]);
> +
> +    return ret;
> +}

Looks like @cmd is a COLOCommand.  Why is the parameter type uint32_t?

> +
> +static int colo_get_cmd(QEMUFile *f, uint32_t *cmd)
> +{
> +    int ret;
> +
> +    *cmd = qemu_get_be32(f);
> +    ret = qemu_file_get_error(f);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    if (*cmd >= COLO_COMMAND_MAX) {
> +        error_report("%s: Invalid cmd", __func__);
> +        return -EINVAL;
> +    }
> +    trace_colo_get_cmd(COLOCommand_lookup[*cmd]);
> +    return 0;
> +}

Same question.

The "get" in the name suggests the function returns the value gotten,
like similarly named function elsewhere in migration/ do.

> +
> +static int colo_get_check_cmd(QEMUFile *f, uint32_t expect_cmd)
> +{
> +    int ret;
> +    uint32_t cmd;
> +
> +    ret = colo_get_cmd(f, &cmd);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    if (cmd != expect_cmd) {
> +        error_report("Unexpect colo command, expect:%d, but got cmd:%d",

Grammar nit: "Unexpected".  Suggest: "Unexpected COLO command %d,
expected %d".

> +                     expect_cmd, cmd);
> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}
> +
> +static int colo_do_checkpoint_transaction(MigrationState *s)
> +{
> +    int ret;
> +
> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
> +                             COLO_COMMAND_CHECKPOINT_REPLY);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    /* TODO: suspend and save vm state to colo buffer */
> +
> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    /* TODO: send vmstate to Secondary */
> +
> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
> +                             COLO_COMMAND_VMSTATE_RECEIVED);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
> +                             COLO_COMMAND_VMSTATE_LOADED);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    /* TODO: resume Primary */
> +
> +out:
> +    return ret;
> +}
> +
>  static void colo_process_checkpoint(MigrationState *s)
>  {
>      int ret = 0;
> @@ -45,12 +141,28 @@ static void colo_process_checkpoint(MigrationState *s)
>          goto out;
>      }
>  
> +    /*
> +     * Wait for Secondary finish loading vm states and enter COLO
> +     * restore.
> +     */
> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
> +                             COLO_COMMAND_CHECKPOINT_READY);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
>      qemu_mutex_lock_iothread();
>      vm_start();
>      qemu_mutex_unlock_iothread();
>      trace_colo_vm_state_change("stop", "run");
>  
> -    /*TODO: COLO checkpoint savevm loop*/
> +    while (s->state == MIGRATION_STATUS_COLO) {
> +        /* start a colo checkpoint */
> +        ret = colo_do_checkpoint_transaction(s);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +    }
>  
>  out:
>      if (ret < 0) {
> @@ -73,6 +185,31 @@ void migrate_start_colo_process(MigrationState *s)
>      qemu_mutex_lock_iothread();
>  }
>  
> +/*
> + * return:
> + * 0: start a checkpoint
> + * -1: some error happened, exit colo restore
> + */

Suggest to make this a proper function comment, i.e.

/*
 * One line describing purpose
 * As many additional lines as it takes to further explain what it does,
 * preconditions, side effects, return values, error conditions.  Use
 * @name to refer to parameters.
 */

> +static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
> +{
> +    int ret;
> +    uint32_t cmd;
> +
> +    ret = colo_get_cmd(f, &cmd);
> +    if (ret < 0) {
> +        /* do failover ? */
> +        return ret;
> +    }
> +
> +    switch (cmd) {
> +    case COLO_COMMAND_CHECKPOINT_REQUEST:
> +        *checkpoint_request = 1;
> +        return 0;
> +    default:
> +        return -EINVAL;
> +    }

switch makes sense only if you're going to add cases.

Suggest to set *checkpoint_request = 0 on error, for robustness.

> +}
> +
>  void *colo_process_incoming_thread(void *opaque)
>  {
>      MigrationIncomingState *mis = opaque;
> @@ -93,7 +230,49 @@ void *colo_process_incoming_thread(void *opaque)
>      */
>      qemu_set_block(qemu_get_fd(mis->from_src_file));
>  
> -    /* TODO: COLO checkpoint restore loop */
> +
> +    ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    while (mis->state == MIGRATION_STATUS_COLO) {
> +        int request = 0;

Dead initialization.

> +        int ret = colo_wait_handle_cmd(mis->from_src_file, &request);
> +
> +        if (ret < 0) {
> +            break;
> +        } else {
> +            if (!request) {
> +                continue;
> +            }
> +        }

Convoluted nesting.  Suggest

        if (ret < 0) {
            break;
        }
        if (!request) {
            continue;
        }

Actually, !request can't happen, so I'd make it.

        if (ret < 0) {
            break;
        }
        assert(request);

until it can happen.

> +        /* FIXME: This is unnecessary for periodic checkpoint mode */

When you add a FIXME, you should probably point to it in your commit
message.  May not be necessary when the FIXME goes away later in this
series.

Pretty much the same for TODO.

> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY);
> +        if (ret < 0) {
> +            goto out;

Above, you used break to "break" the loop on error.  Here, you use "goto
out".  Suggest to pick one and stick to it.

> +        }
> +
> +        ret = colo_get_check_cmd(mis->from_src_file,
> +                                 COLO_COMMAND_VMSTATE_SEND);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +
> +        /* TODO: read migration data into colo buffer */
> +
> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +
> +        /* TODO: load vm state */
> +
> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +    }
>  
>  out:
>      if (ret < 0) {
> diff --git a/qapi-schema.json b/qapi-schema.json
> index c9ff34e..85f7800 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -720,6 +720,31 @@
>  { 'command': 'migrate-start-postcopy' }
>  
>  ##
> +# @COLOCommand
> +#
> +# The commands for COLO fault tolerance
> +#
> +# @checkpoint-ready: SVM is ready for checkpointing
> +#
> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
> +#
> +# @checkpoint-reply: SVM gets PVM's checkpoint request
> +#
> +# @vmstate-send: VM's state will be sent by PVM.
> +#
> +# @vmstate-size: The total size of VMstate.
> +#
> +# @vmstate-received: VM's state has been received by SVM.
> +#
> +# @vmstate-loaded: VM's state has been loaded by SVM.
> +#
> +# Since: 2.6
> +##
> +{ 'enum': 'COLOCommand',
> +  'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
> +            'vmstate-send', 'vmstate-size','vmstate-received',
> +            'vmstate-loaded' ] }
> +

Space after 'vmstate-size', please.

'vmstate-size' is not used in this patch.  You may want to add it with
its first use instead.

Should this enum really be named "COLOCommand"?  'checkpoint-ready',
'checkpoint-request', 'vmstate-send' look like commands to me, but the
others look like replies.


>  # @MouseInfo:
>  #
>  # Information about a mouse device.
> diff --git a/trace-events b/trace-events
> index 5565e79..39fdd8d 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1579,6 +1579,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
>  
>  # migration/colo.c
>  colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
> +colo_put_cmd(const char *msg) "Send '%s' cmd"
> +colo_get_cmd(const char *msg) "Receive '%s' cmd"
>  
>  # kvm-all.c
>  kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"

I like how this commit creates just the two state machines, and leaves
filling in their actions to later commits.  Helps ignorant rewiewers
like me :)
Zhanghailiang Dec. 22, 2015, 7 a.m. UTC | #3
Hi Markus,

On 2015/12/19 16:54, Markus Armbruster wrote:
> Jumping in at v12 for a bit of QAPI review (and whatever else catched my
> eye nearby), please pardon my ignorance of COLO in general, and previous
> review of this series in particular.
>

Thanks all the same :)

> zhanghailiang <zhang.zhanghailiang@huawei.com> writes:
>
>> We need communications protocol of user-defined to control the checkpoint
>> process.
>>
>> The new checkpoint request is started by Primary VM, and the interactive process
>> like below:
>> Checkpoint synchronizing points,
>>
>>                         Primary                         Secondary
>>                                                         initial work
>> 'checkpoint-ready'     <------------------------------ @
>>
>> 'checkpoint-request'   @ ----------------------------->
>>                                                         Suspend (Only in hybrid mode)
>> 'checkpoint-reply'     <------------------------------ @
>>                         Suspend&Save state
>> 'vmstate-send'         @ ----------------------------->
>>                         Send state                      Receive state
>> 'vmstate-received'     <------------------------------ @
>>                         Release packets                 Load state
>> 'vmstate-load'         <------------------------------ @
>>                         Resume                          Resume (Only in hybrid mode)
>
> Long lines.  Easy to fix: shorten your arrows.
>
>>                         Start Comparing (Only in hybrid mode)

OK.

>> NOTE:
>>   1) '@' who sends the message
>>   2) Every sync-point is synchronized by two sides with only
>>      one handshake(single direction) for low-latency.
>>      If more strict synchronization is required, a opposite direction
>>      sync-point should be added.
>>   3) Since sync-points are single direction, the remote side may
>>      go forward a lot when this side just receives the sync-point.
>>   4) For now, we only support 'periodic' checkpoint, for which
>>     the Secondary VM is not running, later we will support 'hybrid' mode.
>
> Useful commit message, but shouldn't this explanation (also) be in the
> source?
>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Cc: Eric Blake <eblake@redhat.com>
>> ---
>> v12:
>> - Rename colo_ctl_put() to colo_put_cmd()
>> - Rename colo_ctl_get() to colo_get_check_cmd() and drop
>>    the third parameter
>> - Rename colo_ctl_get_cmd() to colo_get_cmd()
>> - Remove useless 'invalid' member for COLOcommand enum.
>> v11:
>> - Add missing 'checkpoint-ready' communication in comment.
>> - Use parameter to return 'value' for colo_ctl_get() (Dave's suggestion)
>> - Fix trace for colo_ctl_get() to trace command and value both
>> v10:
>> - Rename enum COLOCmd to COLOCommand (Eric's suggestion).
>> - Remove unused 'ram-steal'
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   migration/colo.c | 183 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   qapi-schema.json |  25 ++++++++
>>   trace-events     |   2 +
>>   3 files changed, 208 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 0ab9618..0ce2a6e 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -10,10 +10,12 @@
>>    * later.  See the COPYING file in the top-level directory.
>>    */
>>
>> +#include <unistd.h>
>>   #include "sysemu/sysemu.h"
>>   #include "migration/colo.h"
>>   #include "trace.h"
>>   #include "qemu/error-report.h"
>> +#include "qemu/sockets.h"
>>
>>   bool colo_supported(void)
>>   {
>> @@ -34,6 +36,100 @@ bool migration_incoming_in_colo_state(void)
>>       return mis && (mis->state == MIGRATION_STATUS_COLO);
>>   }
>>
>> +static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
>> +{
>> +    int ret;
>> +
>> +    if (cmd >= COLO_COMMAND_MAX) {
>
> Needs a trivial rebase due to commit 7fb1cf1.
>

>> +        error_report("%s: Invalid cmd", __func__);
>> +        return -EINVAL;
>
> Can this run in a context with different error handling needs?
>
> Or asked differently: who may ultimately handle this error?  Whoever
> that may be, how does it need to report errors?
>
> Peeking ahead: the immediate callers don't handle this error, they just
> pass it on their callers.
>
> I'm asking because I'm trying to understand whether error_report() is
> appropriate here, or whether you need to use error_setg(), and leave the
> actual reporting to the spot that ultimately handles this error.
>

Hmm, i know what you mean, we handled them all together after exit from the colo process loop,
Use error_setg() seems to be a good idea, with this modification, we can also drop the return
value. I will fix it in next version.


>> +    }
>> +    qemu_put_be32(f, cmd);
>> +    qemu_fflush(f);
>> +
>> +    ret = qemu_file_get_error(f);
>> +    trace_colo_put_cmd(COLOCommand_lookup[cmd]);
>> +
>> +    return ret;
>> +}
>
> Looks like @cmd is a COLOCommand.  Why is the parameter type uint32_t?
>

OK, i will change it to use enum COLOCommand.

>> +
>> +static int colo_get_cmd(QEMUFile *f, uint32_t *cmd)
>> +{
>> +    int ret;
>> +
>> +    *cmd = qemu_get_be32(f);
>> +    ret = qemu_file_get_error(f);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    if (*cmd >= COLO_COMMAND_MAX) {
>> +        error_report("%s: Invalid cmd", __func__);
>> +        return -EINVAL;
>> +    }
>> +    trace_colo_get_cmd(COLOCommand_lookup[*cmd]);
>> +    return 0;
>> +}
>
> Same question.
>
> The "get" in the name suggests the function returns the value gotten,
> like similarly named function elsewhere in migration/ do.
>
Do you mean it should return the cmd value directly, not though parameter way ?
After we convert it to use error_setg() to indicate success or not, we can do like that.
I will fix it.

>> +
>> +static int colo_get_check_cmd(QEMUFile *f, uint32_t expect_cmd)
>> +{
>> +    int ret;
>> +    uint32_t cmd;
>> +
>> +    ret = colo_get_cmd(f, &cmd);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    if (cmd != expect_cmd) {
>> +        error_report("Unexpect colo command, expect:%d, but got cmd:%d",
>
> Grammar nit: "Unexpected".  Suggest: "Unexpected COLO command %d,
> expected %d".
>

Will fix it.

>> +                     expect_cmd, cmd);
>> +        return -EINVAL;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int colo_do_checkpoint_transaction(MigrationState *s)
>> +{
>> +    int ret;
>> +
>> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
>> +                             COLO_COMMAND_CHECKPOINT_REPLY);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    /* TODO: suspend and save vm state to colo buffer */
>> +
>> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    /* TODO: send vmstate to Secondary */
>> +
>> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
>> +                             COLO_COMMAND_VMSTATE_RECEIVED);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
>> +                             COLO_COMMAND_VMSTATE_LOADED);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    /* TODO: resume Primary */
>> +
>> +out:
>> +    return ret;
>> +}
>> +
>>   static void colo_process_checkpoint(MigrationState *s)
>>   {
>>       int ret = 0;
>> @@ -45,12 +141,28 @@ static void colo_process_checkpoint(MigrationState *s)
>>           goto out;
>>       }
>>
>> +    /*
>> +     * Wait for Secondary finish loading vm states and enter COLO
>> +     * restore.
>> +     */
>> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
>> +                             COLO_COMMAND_CHECKPOINT_READY);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>>       qemu_mutex_lock_iothread();
>>       vm_start();
>>       qemu_mutex_unlock_iothread();
>>       trace_colo_vm_state_change("stop", "run");
>>
>> -    /*TODO: COLO checkpoint savevm loop*/
>> +    while (s->state == MIGRATION_STATUS_COLO) {
>> +        /* start a colo checkpoint */
>> +        ret = colo_do_checkpoint_transaction(s);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>>
>>   out:
>>       if (ret < 0) {
>> @@ -73,6 +185,31 @@ void migrate_start_colo_process(MigrationState *s)
>>       qemu_mutex_lock_iothread();
>>   }
>>
>> +/*
>> + * return:
>> + * 0: start a checkpoint
>> + * -1: some error happened, exit colo restore
>> + */
>
> Suggest to make this a proper function comment, i.e.
>

Good catch, i will fix it as your suggestion.

> /*
>   * One line describing purpose
>   * As many additional lines as it takes to further explain what it does,
>   * preconditions, side effects, return values, error conditions.  Use
>   * @name to refer to parameters.
>   */
>
>> +static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
>> +{
>> +    int ret;
>> +    uint32_t cmd;
>> +
>> +    ret = colo_get_cmd(f, &cmd);
>> +    if (ret < 0) {
>> +        /* do failover ? */
>> +        return ret;
>> +    }
>> +
>> +    switch (cmd) {
>> +    case COLO_COMMAND_CHECKPOINT_REQUEST:
>> +        *checkpoint_request = 1;
>> +        return 0;
>> +    default:
>> +        return -EINVAL;
>> +    }
>
> switch makes sense only if you're going to add cases.
>

Yes, we will add COLO_COMMAND_GUEST_SHUTDOWN in the later patch,
and maybe all more cases in future.

> Suggest to set *checkpoint_request = 0 on error, for robustness.
>

OK.

>> +}
>> +
>>   void *colo_process_incoming_thread(void *opaque)
>>   {
>>       MigrationIncomingState *mis = opaque;
>> @@ -93,7 +230,49 @@ void *colo_process_incoming_thread(void *opaque)
>>       */
>>       qemu_set_block(qemu_get_fd(mis->from_src_file));
>>
>> -    /* TODO: COLO checkpoint restore loop */
>> +
>> +    ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    while (mis->state == MIGRATION_STATUS_COLO) {
>> +        int request = 0;
>
> Dead initialization.
>

>> +        int ret = colo_wait_handle_cmd(mis->from_src_file, &request);
>> +
>> +        if (ret < 0) {
>> +            break;
>> +        } else {
>> +            if (!request) {
>> +                continue;
>> +            }
>> +        }
>
> Convoluted nesting.  Suggest
>
>          if (ret < 0) {
>              break;
>          }
>          if (!request) {
>              continue;
>          }
>
> Actually, !request can't happen, so I'd make it.
>
>          if (ret < 0) {
>              break;
>          }
>          assert(request);
>
> until it can happen.
>

Yes, you are right, it should never happen.

>> +        /* FIXME: This is unnecessary for periodic checkpoint mode */
>
> When you add a FIXME, you should probably point to it in your commit
> message.  May not be necessary when the FIXME goes away later in this
> series.
>
> Pretty much the same for TODO.
>
>> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY);
>> +        if (ret < 0) {
>> +            goto out;
>
> Above, you used break to "break" the loop on error.  Here, you use "goto
> out".  Suggest to pick one and stick to it.
>
>> +        }
>> +
>> +        ret = colo_get_check_cmd(mis->from_src_file,
>> +                                 COLO_COMMAND_VMSTATE_SEND);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +
>> +        /* TODO: read migration data into colo buffer */
>> +
>> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +
>> +        /* TODO: load vm state */
>> +
>> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>>
>>   out:
>>       if (ret < 0) {
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index c9ff34e..85f7800 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -720,6 +720,31 @@
>>   { 'command': 'migrate-start-postcopy' }
>>
>>   ##
>> +# @COLOCommand
>> +#
>> +# The commands for COLO fault tolerance
>> +#
>> +# @checkpoint-ready: SVM is ready for checkpointing
>> +#
>> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
>> +#
>> +# @checkpoint-reply: SVM gets PVM's checkpoint request
>> +#
>> +# @vmstate-send: VM's state will be sent by PVM.
>> +#
>> +# @vmstate-size: The total size of VMstate.
>> +#
>> +# @vmstate-received: VM's state has been received by SVM.
>> +#
>> +# @vmstate-loaded: VM's state has been loaded by SVM.
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'enum': 'COLOCommand',
>> +  'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
>> +            'vmstate-send', 'vmstate-size','vmstate-received',
>> +            'vmstate-loaded' ] }
>> +
>
> Space after 'vmstate-size', please.
>

> 'vmstate-size' is not used in this patch.  You may want to add it with
> its first use instead.
>

OK, i will move it to the corresponding patch.

> Should this enum really be named "COLOCommand"?  'checkpoint-ready',
> 'checkpoint-request', 'vmstate-send' look like commands to me, but the
> others look like replies.
>

Yes, COLOCommand is not so exact. what about name it COLOProtocol?

>
>>   # @MouseInfo:
>>   #
>>   # Information about a mouse device.
>> diff --git a/trace-events b/trace-events
>> index 5565e79..39fdd8d 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -1579,6 +1579,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
>>
>>   # migration/colo.c
>>   colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
>> +colo_put_cmd(const char *msg) "Send '%s' cmd"
>> +colo_get_cmd(const char *msg) "Receive '%s' cmd"
>>
>>   # kvm-all.c
>>   kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
>
> I like how this commit creates just the two state machines, and leaves
> filling in their actions to later commits.  Helps ignorant rewiewers
> like me :)
>
>

Do you mean i should split this patch ? Leave this patch with the simplest colo process,
maybe just 'ready, request, reply', and add the other states in later patch?

Thanks,
Hailiang

> .
>
Zhanghailiang Dec. 28, 2015, 7:34 a.m. UTC | #4
On 2015/12/18 22:52, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> We need communications protocol of user-defined to control the checkpoint
>> process.
>>
>> The new checkpoint request is started by Primary VM, and the interactive process
>> like below:
>> Checkpoint synchronizing points,
>>
>>                         Primary                         Secondary
>>                                                         initial work
>> 'checkpoint-ready'     <------------------------------ @
>>
>> 'checkpoint-request'   @ ----------------------------->
>>                                                         Suspend (Only in hybrid mode)
>> 'checkpoint-reply'     <------------------------------ @
>>                         Suspend&Save state
>> 'vmstate-send'         @ ----------------------------->
>>                         Send state                      Receive state
>> 'vmstate-received'     <------------------------------ @
>>                         Release packets                 Load state
>> 'vmstate-load'         <------------------------------ @
>>                         Resume                          Resume (Only in hybrid mode)
>>
>>                         Start Comparing (Only in hybrid mode)
>> NOTE:
>>   1) '@' who sends the message
>>   2) Every sync-point is synchronized by two sides with only
>>      one handshake(single direction) for low-latency.
>>      If more strict synchronization is required, a opposite direction
>>      sync-point should be added.
>>   3) Since sync-points are single direction, the remote side may
>>      go forward a lot when this side just receives the sync-point.
>>   4) For now, we only support 'periodic' checkpoint, for which
>>     the Secondary VM is not running, later we will support 'hybrid' mode.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Cc: Eric Blake <eblake@redhat.com>
>> ---
>> v12:
>> - Rename colo_ctl_put() to colo_put_cmd()
>> - Rename colo_ctl_get() to colo_get_check_cmd() and drop
>>    the third parameter
>> - Rename colo_ctl_get_cmd() to colo_get_cmd()
>> - Remove useless 'invalid' member for COLOcommand enum.
>> v11:
>> - Add missing 'checkpoint-ready' communication in comment.
>> - Use parameter to return 'value' for colo_ctl_get() (Dave's suggestion)
>> - Fix trace for colo_ctl_get() to trace command and value both
>> v10:
>> - Rename enum COLOCmd to COLOCommand (Eric's suggestion).
>> - Remove unused 'ram-steal'
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   migration/colo.c | 183 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   qapi-schema.json |  25 ++++++++
>>   trace-events     |   2 +
>>   3 files changed, 208 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 0ab9618..0ce2a6e 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -10,10 +10,12 @@
>>    * later.  See the COPYING file in the top-level directory.
>>    */
>>
>> +#include <unistd.h>
>>   #include "sysemu/sysemu.h"
>>   #include "migration/colo.h"
>>   #include "trace.h"
>>   #include "qemu/error-report.h"
>> +#include "qemu/sockets.h"
>>
>>   bool colo_supported(void)
>>   {
>> @@ -34,6 +36,100 @@ bool migration_incoming_in_colo_state(void)
>>       return mis && (mis->state == MIGRATION_STATUS_COLO);
>>   }
>>
>> +static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
>> +{
>> +    int ret;
>> +
>> +    if (cmd >= COLO_COMMAND_MAX) {
>> +        error_report("%s: Invalid cmd", __func__);
>> +        return -EINVAL;
>> +    }
>> +    qemu_put_be32(f, cmd);
>> +    qemu_fflush(f);
>> +
>> +    ret = qemu_file_get_error(f);
>> +    trace_colo_put_cmd(COLOCommand_lookup[cmd]);
>> +
>> +    return ret;
>> +}
>> +
>> +static int colo_get_cmd(QEMUFile *f, uint32_t *cmd)
>> +{
>> +    int ret;
>> +
>> +    *cmd = qemu_get_be32(f);
>> +    ret = qemu_file_get_error(f);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    if (*cmd >= COLO_COMMAND_MAX) {
>> +        error_report("%s: Invalid cmd", __func__);
>> +        return -EINVAL;
>> +    }
>> +    trace_colo_get_cmd(COLOCommand_lookup[*cmd]);
>> +    return 0;
>> +}
>> +
>> +static int colo_get_check_cmd(QEMUFile *f, uint32_t expect_cmd)
>> +{
>> +    int ret;
>> +    uint32_t cmd;
>> +
>> +    ret = colo_get_cmd(f, &cmd);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    if (cmd != expect_cmd) {
>> +        error_report("Unexpect colo command, expect:%d, but got cmd:%d",
>> +                     expect_cmd, cmd);
>
> Those still need to be PRIu32
>

Er, i have change the type to 'COLOCommand' according to Markus's comment.

> But other than that,
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks.
Hailiang

>
>> +        return -EINVAL;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int colo_do_checkpoint_transaction(MigrationState *s)
>> +{
>> +    int ret;
>> +
>> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
>> +                             COLO_COMMAND_CHECKPOINT_REPLY);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    /* TODO: suspend and save vm state to colo buffer */
>> +
>> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    /* TODO: send vmstate to Secondary */
>> +
>> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
>> +                             COLO_COMMAND_VMSTATE_RECEIVED);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
>> +                             COLO_COMMAND_VMSTATE_LOADED);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    /* TODO: resume Primary */
>> +
>> +out:
>> +    return ret;
>> +}
>> +
>>   static void colo_process_checkpoint(MigrationState *s)
>>   {
>>       int ret = 0;
>> @@ -45,12 +141,28 @@ static void colo_process_checkpoint(MigrationState *s)
>>           goto out;
>>       }
>>
>> +    /*
>> +     * Wait for Secondary finish loading vm states and enter COLO
>> +     * restore.
>> +     */
>> +    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
>> +                             COLO_COMMAND_CHECKPOINT_READY);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>>       qemu_mutex_lock_iothread();
>>       vm_start();
>>       qemu_mutex_unlock_iothread();
>>       trace_colo_vm_state_change("stop", "run");
>>
>> -    /*TODO: COLO checkpoint savevm loop*/
>> +    while (s->state == MIGRATION_STATUS_COLO) {
>> +        /* start a colo checkpoint */
>> +        ret = colo_do_checkpoint_transaction(s);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>>
>>   out:
>>       if (ret < 0) {
>> @@ -73,6 +185,31 @@ void migrate_start_colo_process(MigrationState *s)
>>       qemu_mutex_lock_iothread();
>>   }
>>
>> +/*
>> + * return:
>> + * 0: start a checkpoint
>> + * -1: some error happened, exit colo restore
>> + */
>> +static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
>> +{
>> +    int ret;
>> +    uint32_t cmd;
>> +
>> +    ret = colo_get_cmd(f, &cmd);
>> +    if (ret < 0) {
>> +        /* do failover ? */
>> +        return ret;
>> +    }
>> +
>> +    switch (cmd) {
>> +    case COLO_COMMAND_CHECKPOINT_REQUEST:
>> +        *checkpoint_request = 1;
>> +        return 0;
>> +    default:
>> +        return -EINVAL;
>> +    }
>> +}
>> +
>>   void *colo_process_incoming_thread(void *opaque)
>>   {
>>       MigrationIncomingState *mis = opaque;
>> @@ -93,7 +230,49 @@ void *colo_process_incoming_thread(void *opaque)
>>       */
>>       qemu_set_block(qemu_get_fd(mis->from_src_file));
>>
>> -    /* TODO: COLO checkpoint restore loop */
>> +
>> +    ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    while (mis->state == MIGRATION_STATUS_COLO) {
>> +        int request = 0;
>> +        int ret = colo_wait_handle_cmd(mis->from_src_file, &request);
>> +
>> +        if (ret < 0) {
>> +            break;
>> +        } else {
>> +            if (!request) {
>> +                continue;
>> +            }
>> +        }
>> +        /* FIXME: This is unnecessary for periodic checkpoint mode */
>> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +
>> +        ret = colo_get_check_cmd(mis->from_src_file,
>> +                                 COLO_COMMAND_VMSTATE_SEND);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +
>> +        /* TODO: read migration data into colo buffer */
>> +
>> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +
>> +        /* TODO: load vm state */
>> +
>> +        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>>
>>   out:
>>       if (ret < 0) {
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index c9ff34e..85f7800 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -720,6 +720,31 @@
>>   { 'command': 'migrate-start-postcopy' }
>>
>>   ##
>> +# @COLOCommand
>> +#
>> +# The commands for COLO fault tolerance
>> +#
>> +# @checkpoint-ready: SVM is ready for checkpointing
>> +#
>> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
>> +#
>> +# @checkpoint-reply: SVM gets PVM's checkpoint request
>> +#
>> +# @vmstate-send: VM's state will be sent by PVM.
>> +#
>> +# @vmstate-size: The total size of VMstate.
>> +#
>> +# @vmstate-received: VM's state has been received by SVM.
>> +#
>> +# @vmstate-loaded: VM's state has been loaded by SVM.
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'enum': 'COLOCommand',
>> +  'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
>> +            'vmstate-send', 'vmstate-size','vmstate-received',
>> +            'vmstate-loaded' ] }
>> +
>>   # @MouseInfo:
>>   #
>>   # Information about a mouse device.
>> diff --git a/trace-events b/trace-events
>> index 5565e79..39fdd8d 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -1579,6 +1579,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
>>
>>   # migration/colo.c
>>   colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
>> +colo_put_cmd(const char *msg) "Send '%s' cmd"
>> +colo_get_cmd(const char *msg) "Receive '%s' cmd"
>>
>>   # kvm-all.c
>>   kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>
Markus Armbruster Jan. 11, 2016, 12:47 p.m. UTC | #5
Hailiang Zhang <zhang.zhanghailiang@huawei.com> writes:

> Hi Markus,
>
> On 2015/12/19 16:54, Markus Armbruster wrote:
>> Jumping in at v12 for a bit of QAPI review (and whatever else catched my
>> eye nearby), please pardon my ignorance of COLO in general, and previous
>> review of this series in particular.
>>
>
> Thanks all the same :)
[...]
>>> diff --git a/migration/colo.c b/migration/colo.c
>>> index 0ab9618..0ce2a6e 100644
>>> --- a/migration/colo.c
>>> +++ b/migration/colo.c
>>> @@ -10,10 +10,12 @@
>>>    * later.  See the COPYING file in the top-level directory.
>>>    */
>>>
>>> +#include <unistd.h>
>>>   #include "sysemu/sysemu.h"
>>>   #include "migration/colo.h"
>>>   #include "trace.h"
>>>   #include "qemu/error-report.h"
>>> +#include "qemu/sockets.h"
>>>
>>>   bool colo_supported(void)
>>>   {
>>> @@ -34,6 +36,100 @@ bool migration_incoming_in_colo_state(void)
>>>       return mis && (mis->state == MIGRATION_STATUS_COLO);
>>>   }
>>>
>>> +static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
>>> +{
>>> +    int ret;
>>> +
>>> +    if (cmd >= COLO_COMMAND_MAX) {
>>
>> Needs a trivial rebase due to commit 7fb1cf1.
>>
>
>>> +        error_report("%s: Invalid cmd", __func__);
>>> +        return -EINVAL;
>>
>> Can this run in a context with different error handling needs?
>>
>> Or asked differently: who may ultimately handle this error?  Whoever
>> that may be, how does it need to report errors?
>>
>> Peeking ahead: the immediate callers don't handle this error, they just
>> pass it on their callers.
>>
>> I'm asking because I'm trying to understand whether error_report() is
>> appropriate here, or whether you need to use error_setg(), and leave the
>> actual reporting to the spot that ultimately handles this error.
>>
>
> Hmm, i know what you mean, we handled them all together after exit
> from the colo process loop,
> Use error_setg() seems to be a good idea, with this modification, we
> can also drop the return
> value. I will fix it in next version.
>
>
>>> +    }
>>> +    qemu_put_be32(f, cmd);
>>> +    qemu_fflush(f);
>>> +
>>> +    ret = qemu_file_get_error(f);
>>> +    trace_colo_put_cmd(COLOCommand_lookup[cmd]);
>>> +
>>> +    return ret;
>>> +}
>>
>> Looks like @cmd is a COLOCommand.  Why is the parameter type uint32_t?
>>
>
> OK, i will change it to use enum COLOCommand.
>
>>> +
>>> +static int colo_get_cmd(QEMUFile *f, uint32_t *cmd)
>>> +{
>>> +    int ret;
>>> +
>>> +    *cmd = qemu_get_be32(f);
>>> +    ret = qemu_file_get_error(f);
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>> +    if (*cmd >= COLO_COMMAND_MAX) {
>>> +        error_report("%s: Invalid cmd", __func__);
>>> +        return -EINVAL;
>>> +    }
>>> +    trace_colo_get_cmd(COLOCommand_lookup[*cmd]);
>>> +    return 0;
>>> +}
>>
>> Same question.
>>
>> The "get" in the name suggests the function returns the value gotten,
>> like similarly named function elsewhere in migration/ do.
>>
> Do you mean it should return the cmd value directly, not though parameter way ?
> After we convert it to use error_setg() to indicate success or not, we
> can do like that.
> I will fix it.

Sounds good to me.

[...]
>>> diff --git a/qapi-schema.json b/qapi-schema.json
>>> index c9ff34e..85f7800 100644
>>> --- a/qapi-schema.json
>>> +++ b/qapi-schema.json
>>> @@ -720,6 +720,31 @@
>>>   { 'command': 'migrate-start-postcopy' }
>>>
>>>   ##
>>> +# @COLOCommand
>>> +#
>>> +# The commands for COLO fault tolerance
>>> +#
>>> +# @checkpoint-ready: SVM is ready for checkpointing
>>> +#
>>> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
>>> +#
>>> +# @checkpoint-reply: SVM gets PVM's checkpoint request
>>> +#
>>> +# @vmstate-send: VM's state will be sent by PVM.
>>> +#
>>> +# @vmstate-size: The total size of VMstate.
>>> +#
>>> +# @vmstate-received: VM's state has been received by SVM.
>>> +#
>>> +# @vmstate-loaded: VM's state has been loaded by SVM.
>>> +#
>>> +# Since: 2.6
>>> +##
>>> +{ 'enum': 'COLOCommand',
>>> +  'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
>>> +            'vmstate-send', 'vmstate-size','vmstate-received',
>>> +            'vmstate-loaded' ] }
>>> +
>>
>> Space after 'vmstate-size', please.
>>
>
>> 'vmstate-size' is not used in this patch.  You may want to add it with
>> its first use instead.
>>
>
> OK, i will move it to the corresponding patch.
>
>> Should this enum really be named "COLOCommand"?  'checkpoint-ready',
>> 'checkpoint-request', 'vmstate-send' look like commands to me, but the
>> others look like replies.
>>
>
> Yes, COLOCommand is not so exact. what about name it COLOProtocol?

A protocol specifies valid sequences of messages, and what they mean.
This isn't a protocol, it's a message within a protocol.  COLOMessage?

>>
>>>   # @MouseInfo:
>>>   #
>>>   # Information about a mouse device.
>>> diff --git a/trace-events b/trace-events
>>> index 5565e79..39fdd8d 100644
>>> --- a/trace-events
>>> +++ b/trace-events
>>> @@ -1579,6 +1579,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
>>>
>>>   # migration/colo.c
>>>   colo_vm_state_change(const char *old, const char *new) "Change
>>> '%s' => '%s'"
>>> +colo_put_cmd(const char *msg) "Send '%s' cmd"
>>> +colo_get_cmd(const char *msg) "Receive '%s' cmd"
>>>
>>>   # kvm-all.c
>>>   kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
>>
>> I like how this commit creates just the two state machines, and leaves
>> filling in their actions to later commits.  Helps ignorant rewiewers
>> like me :)
>>
>>
>
> Do you mean i should split this patch ? Leave this patch with the
> simplest colo process,
> maybe just 'ready, request, reply', and add the other states in later patch?

No, I *like* how you split up the work.
Zhanghailiang Jan. 12, 2016, 12:57 p.m. UTC | #6
On 2016/1/11 20:47, Markus Armbruster wrote:
> Hailiang Zhang <zhang.zhanghailiang@huawei.com> writes:
>
>> Hi Markus,
>>
>> On 2015/12/19 16:54, Markus Armbruster wrote:
>>> Jumping in at v12 for a bit of QAPI review (and whatever else catched my
>>> eye nearby), please pardon my ignorance of COLO in general, and previous
>>> review of this series in particular.
>>>
>>
>> Thanks all the same :)
> [...]
>>>> diff --git a/migration/colo.c b/migration/colo.c
>>>> index 0ab9618..0ce2a6e 100644
>>>> --- a/migration/colo.c
>>>> +++ b/migration/colo.c
>>>> @@ -10,10 +10,12 @@
>>>>     * later.  See the COPYING file in the top-level directory.
>>>>     */
>>>>
>>>> +#include <unistd.h>
>>>>    #include "sysemu/sysemu.h"
>>>>    #include "migration/colo.h"
>>>>    #include "trace.h"
>>>>    #include "qemu/error-report.h"
>>>> +#include "qemu/sockets.h"
>>>>
>>>>    bool colo_supported(void)
>>>>    {
>>>> @@ -34,6 +36,100 @@ bool migration_incoming_in_colo_state(void)
>>>>        return mis && (mis->state == MIGRATION_STATUS_COLO);
>>>>    }
>>>>
>>>> +static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    if (cmd >= COLO_COMMAND_MAX) {
>>>
>>> Needs a trivial rebase due to commit 7fb1cf1.
>>>
>>
>>>> +        error_report("%s: Invalid cmd", __func__);
>>>> +        return -EINVAL;
>>>
>>> Can this run in a context with different error handling needs?
>>>
>>> Or asked differently: who may ultimately handle this error?  Whoever
>>> that may be, how does it need to report errors?
>>>
>>> Peeking ahead: the immediate callers don't handle this error, they just
>>> pass it on their callers.
>>>
>>> I'm asking because I'm trying to understand whether error_report() is
>>> appropriate here, or whether you need to use error_setg(), and leave the
>>> actual reporting to the spot that ultimately handles this error.
>>>
>>
>> Hmm, i know what you mean, we handled them all together after exit
>> from the colo process loop,
>> Use error_setg() seems to be a good idea, with this modification, we
>> can also drop the return
>> value. I will fix it in next version.
>>
>>
>>>> +    }
>>>> +    qemu_put_be32(f, cmd);
>>>> +    qemu_fflush(f);
>>>> +
>>>> +    ret = qemu_file_get_error(f);
>>>> +    trace_colo_put_cmd(COLOCommand_lookup[cmd]);
>>>> +
>>>> +    return ret;
>>>> +}
>>>
>>> Looks like @cmd is a COLOCommand.  Why is the parameter type uint32_t?
>>>
>>
>> OK, i will change it to use enum COLOCommand.
>>
>>>> +
>>>> +static int colo_get_cmd(QEMUFile *f, uint32_t *cmd)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    *cmd = qemu_get_be32(f);
>>>> +    ret = qemu_file_get_error(f);
>>>> +    if (ret < 0) {
>>>> +        return ret;
>>>> +    }
>>>> +    if (*cmd >= COLO_COMMAND_MAX) {
>>>> +        error_report("%s: Invalid cmd", __func__);
>>>> +        return -EINVAL;
>>>> +    }
>>>> +    trace_colo_get_cmd(COLOCommand_lookup[*cmd]);
>>>> +    return 0;
>>>> +}
>>>
>>> Same question.
>>>
>>> The "get" in the name suggests the function returns the value gotten,
>>> like similarly named function elsewhere in migration/ do.
>>>
>> Do you mean it should return the cmd value directly, not though parameter way ?
>> After we convert it to use error_setg() to indicate success or not, we
>> can do like that.
>> I will fix it.
>
> Sounds good to me.
>

I have fixed them in v13 version :)

> [...]
>>>> diff --git a/qapi-schema.json b/qapi-schema.json
>>>> index c9ff34e..85f7800 100644
>>>> --- a/qapi-schema.json
>>>> +++ b/qapi-schema.json
>>>> @@ -720,6 +720,31 @@
>>>>    { 'command': 'migrate-start-postcopy' }
>>>>
>>>>    ##
>>>> +# @COLOCommand
>>>> +#
>>>> +# The commands for COLO fault tolerance
>>>> +#
>>>> +# @checkpoint-ready: SVM is ready for checkpointing
>>>> +#
>>>> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
>>>> +#
>>>> +# @checkpoint-reply: SVM gets PVM's checkpoint request
>>>> +#
>>>> +# @vmstate-send: VM's state will be sent by PVM.
>>>> +#
>>>> +# @vmstate-size: The total size of VMstate.
>>>> +#
>>>> +# @vmstate-received: VM's state has been received by SVM.
>>>> +#
>>>> +# @vmstate-loaded: VM's state has been loaded by SVM.
>>>> +#
>>>> +# Since: 2.6
>>>> +##
>>>> +{ 'enum': 'COLOCommand',
>>>> +  'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
>>>> +            'vmstate-send', 'vmstate-size','vmstate-received',
>>>> +            'vmstate-loaded' ] }
>>>> +
>>>
>>> Space after 'vmstate-size', please.
>>>
>>
>>> 'vmstate-size' is not used in this patch.  You may want to add it with
>>> its first use instead.
>>>
>>
>> OK, i will move it to the corresponding patch.
>>
>>> Should this enum really be named "COLOCommand"?  'checkpoint-ready',
>>> 'checkpoint-request', 'vmstate-send' look like commands to me, but the
>>> others look like replies.
>>>
>>
>> Yes, COLOCommand is not so exact. what about name it COLOProtocol?
>
> A protocol specifies valid sequences of messages, and what they mean.
> This isn't a protocol, it's a message within a protocol.  COLOMessage?
>

Yes, COLOMessage is more precise, i will fix it in next version.

>>>
>>>>    # @MouseInfo:
>>>>    #
>>>>    # Information about a mouse device.
>>>> diff --git a/trace-events b/trace-events
>>>> index 5565e79..39fdd8d 100644
>>>> --- a/trace-events
>>>> +++ b/trace-events
>>>> @@ -1579,6 +1579,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
>>>>
>>>>    # migration/colo.c
>>>>    colo_vm_state_change(const char *old, const char *new) "Change
>>>> '%s' => '%s'"
>>>> +colo_put_cmd(const char *msg) "Send '%s' cmd"
>>>> +colo_get_cmd(const char *msg) "Receive '%s' cmd"
>>>>
>>>>    # kvm-all.c
>>>>    kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
>>>
>>> I like how this commit creates just the two state machines, and leaves
>>> filling in their actions to later commits.  Helps ignorant rewiewers
>>> like me :)
>>>
>>>
>>
>> Do you mean i should split this patch ? Leave this patch with the
>> simplest colo process,
>> maybe just 'ready, request, reply', and add the other states in later patch?
>
> No, I *like* how you split up the work.
>

OK.

Thanks.
Hailiang

> .
>
diff mbox

Patch

diff --git a/migration/colo.c b/migration/colo.c
index 0ab9618..0ce2a6e 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,10 +10,12 @@ 
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include <unistd.h>
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
 #include "trace.h"
 #include "qemu/error-report.h"
+#include "qemu/sockets.h"
 
 bool colo_supported(void)
 {
@@ -34,6 +36,100 @@  bool migration_incoming_in_colo_state(void)
     return mis && (mis->state == MIGRATION_STATUS_COLO);
 }
 
+static int colo_put_cmd(QEMUFile *f, uint32_t cmd)
+{
+    int ret;
+
+    if (cmd >= COLO_COMMAND_MAX) {
+        error_report("%s: Invalid cmd", __func__);
+        return -EINVAL;
+    }
+    qemu_put_be32(f, cmd);
+    qemu_fflush(f);
+
+    ret = qemu_file_get_error(f);
+    trace_colo_put_cmd(COLOCommand_lookup[cmd]);
+
+    return ret;
+}
+
+static int colo_get_cmd(QEMUFile *f, uint32_t *cmd)
+{
+    int ret;
+
+    *cmd = qemu_get_be32(f);
+    ret = qemu_file_get_error(f);
+    if (ret < 0) {
+        return ret;
+    }
+    if (*cmd >= COLO_COMMAND_MAX) {
+        error_report("%s: Invalid cmd", __func__);
+        return -EINVAL;
+    }
+    trace_colo_get_cmd(COLOCommand_lookup[*cmd]);
+    return 0;
+}
+
+static int colo_get_check_cmd(QEMUFile *f, uint32_t expect_cmd)
+{
+    int ret;
+    uint32_t cmd;
+
+    ret = colo_get_cmd(f, &cmd);
+    if (ret < 0) {
+        return ret;
+    }
+    if (cmd != expect_cmd) {
+        error_report("Unexpect colo command, expect:%d, but got cmd:%d",
+                     expect_cmd, cmd);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static int colo_do_checkpoint_transaction(MigrationState *s)
+{
+    int ret;
+
+    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST);
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
+                             COLO_COMMAND_CHECKPOINT_REPLY);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* TODO: suspend and save vm state to colo buffer */
+
+    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* TODO: send vmstate to Secondary */
+
+    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
+                             COLO_COMMAND_VMSTATE_RECEIVED);
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
+                             COLO_COMMAND_VMSTATE_LOADED);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* TODO: resume Primary */
+
+out:
+    return ret;
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
     int ret = 0;
@@ -45,12 +141,28 @@  static void colo_process_checkpoint(MigrationState *s)
         goto out;
     }
 
+    /*
+     * Wait for Secondary finish loading vm states and enter COLO
+     * restore.
+     */
+    ret = colo_get_check_cmd(s->rp_state.from_dst_file,
+                             COLO_COMMAND_CHECKPOINT_READY);
+    if (ret < 0) {
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("stop", "run");
 
-    /*TODO: COLO checkpoint savevm loop*/
+    while (s->state == MIGRATION_STATUS_COLO) {
+        /* start a colo checkpoint */
+        ret = colo_do_checkpoint_transaction(s);
+        if (ret < 0) {
+            goto out;
+        }
+    }
 
 out:
     if (ret < 0) {
@@ -73,6 +185,31 @@  void migrate_start_colo_process(MigrationState *s)
     qemu_mutex_lock_iothread();
 }
 
+/*
+ * return:
+ * 0: start a checkpoint
+ * -1: some error happened, exit colo restore
+ */
+static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
+{
+    int ret;
+    uint32_t cmd;
+
+    ret = colo_get_cmd(f, &cmd);
+    if (ret < 0) {
+        /* do failover ? */
+        return ret;
+    }
+
+    switch (cmd) {
+    case COLO_COMMAND_CHECKPOINT_REQUEST:
+        *checkpoint_request = 1;
+        return 0;
+    default:
+        return -EINVAL;
+    }
+}
+
 void *colo_process_incoming_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
@@ -93,7 +230,49 @@  void *colo_process_incoming_thread(void *opaque)
     */
     qemu_set_block(qemu_get_fd(mis->from_src_file));
 
-    /* TODO: COLO checkpoint restore loop */
+
+    ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY);
+    if (ret < 0) {
+        goto out;
+    }
+
+    while (mis->state == MIGRATION_STATUS_COLO) {
+        int request = 0;
+        int ret = colo_wait_handle_cmd(mis->from_src_file, &request);
+
+        if (ret < 0) {
+            break;
+        } else {
+            if (!request) {
+                continue;
+            }
+        }
+        /* FIXME: This is unnecessary for periodic checkpoint mode */
+        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY);
+        if (ret < 0) {
+            goto out;
+        }
+
+        ret = colo_get_check_cmd(mis->from_src_file,
+                                 COLO_COMMAND_VMSTATE_SEND);
+        if (ret < 0) {
+            goto out;
+        }
+
+        /* TODO: read migration data into colo buffer */
+
+        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED);
+        if (ret < 0) {
+            goto out;
+        }
+
+        /* TODO: load vm state */
+
+        ret = colo_put_cmd(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED);
+        if (ret < 0) {
+            goto out;
+        }
+    }
 
 out:
     if (ret < 0) {
diff --git a/qapi-schema.json b/qapi-schema.json
index c9ff34e..85f7800 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -720,6 +720,31 @@ 
 { 'command': 'migrate-start-postcopy' }
 
 ##
+# @COLOCommand
+#
+# The commands for COLO fault tolerance
+#
+# @checkpoint-ready: SVM is ready for checkpointing
+#
+# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
+#
+# @checkpoint-reply: SVM gets PVM's checkpoint request
+#
+# @vmstate-send: VM's state will be sent by PVM.
+#
+# @vmstate-size: The total size of VMstate.
+#
+# @vmstate-received: VM's state has been received by SVM.
+#
+# @vmstate-loaded: VM's state has been loaded by SVM.
+#
+# Since: 2.6
+##
+{ 'enum': 'COLOCommand',
+  'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
+            'vmstate-send', 'vmstate-size','vmstate-received',
+            'vmstate-loaded' ] }
+
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/trace-events b/trace-events
index 5565e79..39fdd8d 100644
--- a/trace-events
+++ b/trace-events
@@ -1579,6 +1579,8 @@  postcopy_ram_incoming_cleanup_join(void) ""
 
 # migration/colo.c
 colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
+colo_put_cmd(const char *msg) "Send '%s' cmd"
+colo_get_cmd(const char *msg) "Receive '%s' cmd"
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"