From patchwork Tue Nov 3 11:56:40 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhanghailiang
X-Patchwork-Id: 539345
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])
 (using TLSv1 with cipher AES256-SHA (256/256 bits))
 (No client certificate requested)
 by ozlabs.org (Postfix) with ESMTPS id 28AAB14090A
 for ; Tue, 3 Nov 2015 23:05:27 +1100 (AEDT)
Received: from localhost ([::1]:47435 helo=lists.gnu.org)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1ZtaL6-0006Jc-VW
 for incoming@patchwork.ozlabs.org; Tue, 03 Nov 2015 07:05:25 -0500
Received: from eggs.gnu.org ([2001:4830:134:3::10]:58833)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1ZtaGQ-0006Rl-KW
 for qemu-devel@nongnu.org; Tue, 03 Nov 2015 07:00:38 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1ZtaGP-0002Zx-H5
 for qemu-devel@nongnu.org; Tue, 03 Nov 2015 07:00:34 -0500
Received: from szxga03-in.huawei.com ([119.145.14.66]:61816)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1ZtaGO-0002ZC-L0
 for qemu-devel@nongnu.org; Tue, 03 Nov 2015 07:00:33 -0500
Received: from 172.24.1.51 (EHLO SZXEML424-HUB.china.huawei.com)
 ([172.24.1.51]) by szxrg03-dlp.huawei.com (MOS 4.4.3-GA FastPath queued)
 with ESMTP id BQI67006; Tue, 03 Nov 2015 19:57:39 +0800 (CST)
Received: from localhost (10.177.25.63) by SZXEML424-HUB.china.huawei.com
 (10.82.67.153) with Microsoft SMTP Server id 14.3.235.1;
 Tue, 3 Nov 2015 19:57:31 +0800
From: zhanghailiang
To:
Date: Tue, 3 Nov 2015 19:56:40 +0800
Message-ID: <1446551816-15768-23-git-send-email-zhang.zhanghailiang@huawei.com>
X-Mailer: git-send-email 1.9.5.msysgit.1
In-Reply-To: <1446551816-15768-1-git-send-email-zhang.zhanghailiang@huawei.com>
References:
 <1446551816-15768-1-git-send-email-zhang.zhanghailiang@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.177.25.63]
X-CFilter-Loop: Reflected
X-Mirapoint-Virus-RAPID-Raw: score=unknown(0),
 refid=str=0001.0A090204.5638A1CF.0032, ss=1, re=0.000, recu=0.000,
 reip=0.000, cl=1, cld=1, fgs=0, ip=0.0.0.0,
 so=2013-05-26 15:14:31, dmn=2013-03-21 17:37:32
X-Mirapoint-Loop-Id: a7cb5681ae5a8aecdebce8401bfa3211
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.4.x-2.6.x [generic]
X-Received-From: 119.145.14.66
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com,
 eddie.dong@intel.com, peter.huangpeng@huawei.com, dgilbert@redhat.com,
 arei.gonglei@huawei.com, stefanha@redhat.com, amit.shah@redhat.com,
 zhanghailiang
Subject: [Qemu-devel] [PATCH COLO-Frame v10 22/38] COLO: implement default
 failover treatment
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org

If we detect an error in COLO, we wait for some time, hoping that users
also detect it. If users don't issue a failover command within that time,
we go into the default failover procedure, in which the PVM takes over
the work and the SVM exits.

Signed-off-by: zhanghailiang
Signed-off-by: Li Zhijian
---
 migration/colo.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 925a694..de6265e 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -25,6 +25,14 @@
  */
 #define CHECKPOINT_MAX_PEROID 200
 
+/*
+ * The delay before QEMU begins the default failover procedure.
+ * Unit: ms
+ * FIXME: This value should be changeable through the
+ * 'migrate-set-parameters' command
+ */
+#define DEFAULT_FAILOVER_DELAY 2000
+
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
@@ -273,6 +281,7 @@ static void colo_process_checkpoint(MigrationState *s)
 {
     QEMUSizedBuffer *buffer = NULL;
     int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+    int64_t error_time;
     int fd, ret = 0;
 
     failover_init_state();
@@ -331,8 +340,25 @@ static void colo_process_checkpoint(MigrationState *s)
     }
 
 out:
+    current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     if (ret < 0) {
         error_report("%s: %s", __func__, strerror(-ret));
+        /* Give users time to intervene before the default failover */
+        while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
+            if (failover_request_is_active()) {
+                error_report("Primary VM will take over work");
+                break;
+            }
+            usleep(100 * 1000);
+            current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+        }
+
+        qemu_mutex_lock_iothread();
+        if (!failover_request_is_active()) {
+            error_report("Primary VM will take over work in default");
+            failover_request_active(NULL);
+        }
+        qemu_mutex_unlock_iothread();
     }
 
     qsb_free(buffer);
@@ -391,6 +417,7 @@ void *colo_process_incoming_thread(void *opaque)
     QEMUFile *fb = NULL;
     QEMUSizedBuffer *buffer = NULL; /* Cache incoming device state */
     int total_size;
+    int64_t error_time, current_time;
     int fd, ret = 0;
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
@@ -499,9 +526,28 @@ void *colo_process_incoming_thread(void *opaque)
     }
 
 out:
+    current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     if (ret < 0) {
         error_report("colo incoming thread will exit, detect error: %s",
                      strerror(-ret));
+        /* Give users time to intervene before the default failover */
+        while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
+            if (failover_request_is_active()) {
+                error_report("Secondary VM will take over work");
+                break;
+            }
+            usleep(100 * 1000);
+            current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+        }
+        /* Check the flag again */
+        if (!failover_request_is_active()) {
+            /*
+             * We assume that the Primary VM is still alive according to the
+             * heartbeat, so just kill the Secondary VM
+             */
+            error_report("SVM is going to exit in default!");
+            exit(1);
+        }
     }
 
     if (fb) {
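
Both hunks above repeat the same pattern: poll for a user-issued failover
request during a grace period, then fall back to a default action. As a
rough illustration only, the wait could be factored into one helper along
the following lines. This is a standalone sketch, not QEMU code: the names
wait_for_failover_request, request_check_fn, request_after_countdown,
now_ms and sleep_ms are hypothetical, and it assumes a POSIX monotonic
clock in place of the patch's qemu_clock_get_ms(QEMU_CLOCK_HOST).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback type: returns true once a failover was requested. */
typedef bool (*request_check_fn)(void *opaque);

/* Current monotonic time in milliseconds (assumed substitute for the host clock). */
static int64_t now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Sleep for the given number of milliseconds. */
static void sleep_ms(long ms)
{
    struct timespec ts = {
        .tv_sec = ms / 1000,
        .tv_nsec = (ms % 1000) * 1000000L,
    };

    nanosleep(&ts, NULL);
}

/*
 * Poll is_requested() every poll_ms until it reports a request or
 * grace_ms has elapsed. Returns true if a request was seen; false means
 * the caller should run its default failover action.
 */
static bool wait_for_failover_request(request_check_fn is_requested,
                                      void *opaque, int64_t grace_ms,
                                      long poll_ms)
{
    int64_t start = now_ms();

    assert(grace_ms >= 0 && poll_ms > 0);
    while (now_ms() - start <= grace_ms) {
        if (is_requested(opaque)) {
            return true;
        }
        sleep_ms(poll_ms);
    }
    /* Check once more, as the patch does after its loop. */
    return is_requested(opaque);
}

/*
 * Example check for demonstration: reports a request once the counter
 * pointed to by opaque has been decremented to zero.
 */
static bool request_after_countdown(void *opaque)
{
    int *remaining = opaque;

    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

With a helper of this shape, the primary side would request failover
itself when the wait returns false, and the secondary side would exit,
matching the two branches in the diff. Whether such a refactoring fits
the series is a design question for the patch authors.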