From patchwork Thu May 21 08:13:07 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhanghailiang
X-Patchwork-Id: 474858
From: zhanghailiang
Date: Thu, 21 May 2015 16:13:07 +0800
Message-ID: <1432196001-10352-16-git-send-email-zhang.zhanghailiang@huawei.com>
X-Mailer: git-send-email 1.9.2.msysgit.0
In-Reply-To: <1432196001-10352-1-git-send-email-zhang.zhanghailiang@huawei.com>
References: <1432196001-10352-1-git-send-email-zhang.zhanghailiang@huawei.com>
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com,
 eddie.dong@intel.com, peter.huangpeng@huawei.com, dgilbert@redhat.com,
 zhanghailiang, arei.gonglei@huawei.com, amit.shah@redhat.com,
 Lai Jiangshan, david@gibson.dropbear.id.au
Subject: [Qemu-devel] [PATCH COLO-Frame v5 15/29] COLO failover: Implement
 COLO master/slave failover work

If failover is requested, the PVM or SVM does some cleanup work, exits
COLO mode, and resumes normal execution.

Signed-off-by: zhanghailiang
Signed-off-by: Li Zhijian
Signed-off-by: Lai Jiangshan
---
 include/migration/migration-colo.h     |  14 ++++
 include/migration/migration-failover.h |   2 +
 migration/colo-comm.c                  |  11 +++
 migration/colo-failover.c              |  12 +++-
 migration/colo.c                       | 126 ++++++++++++++++++++++++++++++++-
 stubs/migration-colo.c                 |   5 ++
 trace-events                           |   1 +
 7 files changed, 169 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index d6eac07..63f8b45 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -22,6 +22,13 @@
 bool colo_supported(void);
 void colo_info_mig_init(void);
 
+/* Checkpoint control, called in migration/checkpoint thread */
+enum {
+    COLO_UNPROTECTED_MODE = 0,
+    COLO_PRIMARY_MODE,
+    COLO_SECONDARY_MODE,
+};
+
 struct colo_incoming {
     QEMUFile *file;
     QemuThread thread;
@@ -36,8 +43,15 @@ bool loadvm_enable_colo(void);
 void loadvm_exit_colo(void);
 void *colo_process_incoming_checkpoints(void *opaque);
 bool loadvm_in_colo_state(void);
+
+int get_colo_mode(void);
+
 /* ram cache */
 int create_and_init_ram_cache(void);
 void colo_flush_ram_cache(void);
 void release_ram_cache(void);
+
+/* failover */
+void colo_do_failover(MigrationState *s);
+
 #endif
diff --git a/include/migration/migration-failover.h b/include/migration/migration-failover.h
index a8767fc..5e59b1d 100644
--- a/include/migration/migration-failover.h
+++ b/include/migration/migration-failover.h
@@ -16,5 +16,7 @@
 #include "qemu-common.h"
 
 void failover_request_set(void);
+void failover_request_clear(void);
+bool failover_request_is_set(void);
 
 #endif
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index f8be027..16bd184 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -16,6 +16,17 @@
 
 static bool colo_requested;
 
+int get_colo_mode(void)
+{
+    if (migrate_in_colo_state()) {
+        return COLO_PRIMARY_MODE;
+    } else if (loadvm_in_colo_state()) {
+        return COLO_SECONDARY_MODE;
+    } else {
+        return COLO_UNPROTECTED_MODE;
+    }
+}
+
 /* save */
 static void colo_info_save(QEMUFile *f, void *opaque)
 {
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
index 2bd2e16..97f5d24 100644
--- a/migration/colo-failover.c
+++ b/migration/colo-failover.c
@@ -22,7 +22,7 @@ static void colo_failover_bh(void *opaque)
 {
     qemu_bh_delete(failover_bh);
     failover_bh = NULL;
-    /*TODO: Do failover work */
+    colo_do_failover(NULL);
 }
 
 void failover_request_set(void)
@@ -32,6 +32,16 @@ void failover_request_set(void)
     qemu_bh_schedule(failover_bh);
 }
 
+void failover_request_clear(void)
+{
+    failover_request = false;
+}
+
+bool failover_request_is_set(void)
+{
+    return failover_request;
+}
+
 void qmp_colo_lost_heartbeat(Error **errp)
 {
     failover_request_set();
diff --git a/migration/colo.c b/migration/colo.c
index ad44569..1c8cdfe 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -68,6 +68,67 @@ bool migrate_in_colo_state(void)
     return (s->state == MIGRATION_STATUS_COLO);
 }
 
+static bool colo_runstate_is_stopped(void)
+{
+    return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
+}
+
+/*
+ * There are two ways to enter this function:
+ * 1. From the COLO checkpoint incoming thread; in this case
+ *    it must be protected by the iothread lock.
+ * 2. From a user command; hmp/qmp commands run in the main
+ *    loop, so taking the iothread lock there would cause
+ *    a deadlock.
+ */
+static void slave_do_failover(void)
+{
+    colo = NULL;
+
+    if (!autostart) {
+        error_report("\"-S\" qemu option will be ignored in colo slave side");
+        /* recover runstate to normal migration finish state */
+        autostart = true;
+    }
+
+    /* On slave side, jump to incoming co */
+    if (migration_incoming_co) {
+        qemu_coroutine_enter(migration_incoming_co, NULL);
+    }
+}
+
+static void master_do_failover(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    if (!colo_runstate_is_stopped()) {
+        vm_stop_force_state(RUN_STATE_COLO);
+    }
+
+    if (s->state != MIGRATION_STATUS_FAILED) {
+        migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED);
+    }
+
+    vm_start();
+}
+
+static bool failover_completed;
+void colo_do_failover(MigrationState *s)
+{
+    /* Make sure the VM is stopped during failover */
+    if (!colo_runstate_is_stopped()) {
+        vm_stop_force_state(RUN_STATE_COLO);
+    }
+
+    trace_colo_do_failover();
+    if (get_colo_mode() == COLO_SECONDARY_MODE) {
+        slave_do_failover();
+    } else {
+        master_do_failover();
+    }
+    failover_completed = true;
+}
+
 /* colo checkpoint control helper */
 static int colo_ctl_put(QEMUFile *f, uint64_t request)
 {
@@ -139,11 +200,23 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
         goto out;
     }
 
+    if (failover_request_is_set()) {
+        ret = -1;
+        goto out;
+    }
     /* suspend and save vm state to colo buffer */
     qemu_mutex_lock_iothread();
     vm_stop_force_state(RUN_STATE_COLO);
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("run", "stop");
+    /*
+     * The failover request BH could be called after
+     * vm_stop_force_state(), so check failover_request_is_set() again.
+     */
+    if (failover_request_is_set()) {
+        ret = -1;
+        goto out;
+    }
 
     /* Disable block migration */
     s->params.blk = 0;
@@ -233,6 +306,11 @@ static void *colo_thread(void *opaque)
     trace_colo_vm_state_change("stop", "run");
 
     while (s->state == MIGRATION_STATUS_COLO) {
+        if (failover_request_is_set()) {
+            error_report("failover request");
+            goto out;
+        }
+
         /* start a colo checkpoint */
         if (colo_do_checkpoint_transaction(s, colo_control)) {
             goto out;
@@ -240,7 +318,18 @@ static void *colo_thread(void *opaque)
     }
 
 out:
-    migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED);
+    error_report("colo: some error happens in colo_thread");
+    qemu_mutex_lock_iothread();
+    if (!failover_request_is_set()) {
+        error_report("master takeover from checkpoint channel");
+        failover_request_set();
+    }
+    qemu_mutex_unlock_iothread();
+
+    while (!failover_completed) {
+        ;
+    }
+    failover_request_clear();
 
     qsb_free(colo_buffer);
     colo_buffer = NULL;
@@ -281,6 +370,11 @@ void colo_init_checkpointer(MigrationState *s)
     qemu_bh_schedule(colo_bh);
 }
 
+bool loadvm_in_colo_state(void)
+{
+    return colo != NULL;
+}
+
 /*
  * return:
  * 0: start a checkpoint
@@ -356,6 +450,10 @@ void *colo_process_incoming_checkpoints(void *opaque)
                 continue;
             }
         }
+        if (failover_request_is_set()) {
+            error_report("failover request");
+            goto out;
+        }
 
         /* suspend guest */
         qemu_mutex_lock_iothread();
@@ -425,6 +523,32 @@ void *colo_process_incoming_checkpoints(void *opaque)
     }
 
 out:
+    error_report("Detect some error or get a failover request");
+    /* determine whether we need to failover */
+    if (!failover_request_is_set()) {
+        /*
+         * TODO: Maybe we should raise a QMP event to the user here.
+         * It would help the user understand what happened and decide
+         * whether to do failover.
+         */
+        usleep(2000 * 1000);
+    }
+    /* check flag again */
+    if (!failover_request_is_set()) {
+        /*
+         * We assume that the master is still alive according to the
+         * heartbeat, so just kill the slave
+         */
+        error_report("SVM is going to exit!");
+        exit(1);
+    } else {
+        /* if we get here, the master may be dead, so we are doing failover */
+        while (!failover_completed) {
+            ;
+        }
+        failover_request_clear();
+    }
+
     colo = NULL;
 
     if (fb) {
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 7d1fd9f..9ec0c07 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -32,6 +32,11 @@ void *colo_process_incoming_checkpoints(void *opaque)
     return NULL;
 }
 
+bool loadvm_in_colo_state(void)
+{
+    return false;
+}
+
 void qmp_colo_lost_heartbeat(Error **errp)
 {
     error_setg(errp, "COLO is not supported, please rerun configure"
diff --git a/trace-events b/trace-events
index 2b95743..1ce7bba 100644
--- a/trace-events
+++ b/trace-events
@@ -1449,6 +1449,7 @@ colo_info_load(const char *msg) "%s"
 # migration/colo.c
 colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
 colo_receive_message(const char *msg) "Receive '%s'"
+colo_do_failover(void) ""
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
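
For readers following the control flow, the failover dispatch in this patch can be sketched as a small standalone C model. This is only an illustration, not QEMU code: `struct colo_model` and the `model_*` functions are hypothetical names, the model is single-threaded, and the bottom half is replaced by a direct call. It also simplifies two details of the patch: the secondary really resumes by re-entering migration_incoming_co, and the primary only changes the migration state when it is not MIGRATION_STATUS_FAILED.

```c
#include <assert.h>
#include <stdbool.h>

/* The three modes mirror the enum this patch adds to migration-colo.h. */
enum colo_mode {
    COLO_UNPROTECTED_MODE = 0,
    COLO_PRIMARY_MODE,
    COLO_SECONDARY_MODE,
};

/* Hypothetical stand-in for the QEMU state the patch consults. */
struct colo_model {
    bool migrate_in_colo;    /* models migrate_in_colo_state() */
    bool loadvm_in_colo;     /* models loadvm_in_colo_state(), i.e. colo != NULL */
    bool vm_running;
    bool failover_request;
    bool failover_completed;
};

/* Same precedence as get_colo_mode(): the primary check comes first. */
static enum colo_mode model_colo_mode(const struct colo_model *m)
{
    if (m->migrate_in_colo) {
        return COLO_PRIMARY_MODE;
    } else if (m->loadvm_in_colo) {
        return COLO_SECONDARY_MODE;
    }
    return COLO_UNPROTECTED_MODE;
}

/*
 * Models colo_do_failover(): stop the VM, dispatch on the current mode,
 * leave COLO mode, resume, and flag completion. Returns the mode that
 * was handled.
 */
static enum colo_mode model_do_failover(struct colo_model *m)
{
    enum colo_mode mode = model_colo_mode(m);

    m->vm_running = false;             /* vm_stop_force_state(RUN_STATE_COLO) */
    if (mode == COLO_SECONDARY_MODE) {
        m->loadvm_in_colo = false;     /* slave_do_failover(): colo = NULL */
    } else {
        m->migrate_in_colo = false;    /* master_do_failover(): COLO -> COMPLETED */
    }
    m->vm_running = true;              /* the guest resumes outside COLO */
    m->failover_completed = true;
    return mode;
}

/*
 * Models the "out:" path of colo_thread() on the primary: request
 * failover if nobody has yet, let the failover run, then clear the
 * request once completion has been observed.
 */
static void model_primary_thread_exit(struct colo_model *m)
{
    if (!m->failover_request) {
        m->failover_request = true;    /* failover_request_set() */
    }
    if (!m->failover_completed) {
        model_do_failover(m);          /* in QEMU this runs later, in a BH */
    }
    assert(m->failover_completed);     /* the patch spins until this is set */
    m->failover_request = false;       /* failover_request_clear() */
}
```

Starting from a model with `migrate_in_colo` set, calling `model_primary_thread_exit()` leaves the model in COLO_UNPROTECTED_MODE with the VM running and the request flag cleared, which is the "resume normal execution" outcome the commit message describes.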