From patchwork Wed Sep 2 08:23:10 2015
X-Patchwork-Submitter: Zhanghailiang
X-Patchwork-Id: 513384
From: zhanghailiang
Date: Wed, 2 Sep 2015 16:23:10 +0800
Message-ID: <1441182199-8328-24-git-send-email-zhang.zhanghailiang@huawei.com>
In-Reply-To: <1441182199-8328-1-git-send-email-zhang.zhanghailiang@huawei.com>
References: <1441182199-8328-1-git-send-email-zhang.zhanghailiang@huawei.com>
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, peter.huangpeng@huawei.com, dgilbert@redhat.com, arei.gonglei@huawei.com, stefanha@redhat.com, amit.shah@redhat.com, yanghy@cn.fujitsu.com, zhanghailiang
Subject: [Qemu-devel] [PATCH COLO-Frame v9 23/32] COLO failover: Shutdown related socket fd when do failover

If the network connection between COLO's two sides breaks while the colo
thread or the colo incoming thread is blocked in a 'read' or 'write' on the
socket fd, the thread does not detect the error until the connection times
out, which can take a long time.

Shut down all the related socket file descriptors in the failover BH to wake
up the blocked operation. In addition, the corresponding file descriptors must
be closed only after the failover BH has shut them down; otherwise there will
be an error.

Signed-off-by: zhanghailiang
Signed-off-by: Li Zhijian
---
 migration/colo.c | 35 +++++++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 29b5334..e443dfa 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -75,6 +75,13 @@ static void secondary_vm_do_failover(void)
         /* recover runstate to normal migration finish state */
         autostart = true;
     }
+    /* Make sure colo incoming thread not block in recv */
+    if (mis->from_src_file) {
+        qemu_file_shutdown(mis->from_src_file);
+    }
+    if (mis->to_src_file) {
+        qemu_file_shutdown(mis->to_src_file);
+    }
 
     old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
                                    FAILOVER_STATUS_COMPLETED);
@@ -98,6 +105,14 @@ static void primary_vm_do_failover(void)
         migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                           MIGRATION_STATUS_COMPLETED);
     }
+
+    if (s->from_dst_file) { /* Make sure colo thread no block in recv */
+        qemu_file_shutdown(s->from_dst_file);
+    }
+    if (s->to_dst_file) {
+        qemu_file_shutdown(s->to_dst_file);
+    }
+
     qemu_bh_schedule(s->cleanup_bh);
     vm_start();
 
@@ -207,7 +222,6 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         goto out;
     }
 
-    /* suspend and save vm state to colo buffer */
     qemu_mutex_lock_iothread();
     if (failover_request_is_active()) {
         qemu_mutex_unlock_iothread();
@@ -346,7 +360,7 @@ static void *colo_thread(void *opaque)
 
 out:
     current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
-    if (ret < 0) {
+    if (ret < 0 || (!ret && !failover_request_is_active())) {
         error_report("Detect some error: %s", strerror(-ret));
         qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
                                   true, strerror(-ret), NULL);
@@ -375,6 +389,15 @@ out:
     qsb_free(buffer);
     buffer = NULL;
 
+    /* Hope this not to be too long to loop here */
+    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
+        ;
+    }
+    /* Must be called after failover BH is completed */
+    if (s->from_dst_file) {
+        qemu_fclose(s->from_dst_file);
+    }
+
     return NULL;
 }
 
@@ -552,7 +575,7 @@ void *colo_process_incoming_thread(void *opaque)
 
 out:
     current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
-    if (ret < 0) {
+    if (ret < 0 || (!ret && !failover_request_is_active())) {
         error_report("colo incoming thread will exit, detect error: %s",
                      strerror(-ret));
         qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
@@ -591,11 +614,15 @@ out:
      */
     colo_release_ram_cache();
 
+    /* Hope this not to be too long to loop here */
+    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
+        ;
+    }
+    /* Must be called after failover BH is completed */
     if (mis->to_src_file) {
         qemu_fclose(mis->to_src_file);
     }
     migration_incoming_exit_colo();
-
     return NULL;
 }
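
For readers who want to try the wake-up behaviour outside QEMU, here is a minimal standalone sketch of the pattern this patch relies on: one thread blocks in recv(), another thread calls shutdown() on the same socket to wake it, and the blocked thread closes the fd only after the "failover" side reports completion. This is not QEMU code. It assumes plain POSIX sockets and pthreads in place of qemu_file_shutdown()/qemu_fclose(), and every demo_* name is hypothetical. The wait loop also calls sched_yield(), which the patch's bare spin loop does not. The wake-up of a blocked recv() by shutdown() is the behaviour observed on Linux; other systems may differ.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-ins for the FAILOVER_STATUS_* values in the patch. */
enum { DEMO_FAILOVER_NONE, DEMO_FAILOVER_COMPLETED };

struct demo_ctx {
    int fd;            /* socket the worker blocks on */
    atomic_int state;  /* stand-in for failover_get_state() */
    ssize_t recv_ret;  /* value recv() returned once it was woken up */
};

/* Plays the role of the colo (incoming) thread. */
static void *demo_worker(void *opaque)
{
    struct demo_ctx *ctx = opaque;
    char byte;

    assert(ctx != NULL);

    /* Blocks here: the peer never sends any data. */
    ctx->recv_ret = recv(ctx->fd, &byte, 1, 0);

    /* Close the fd only after the "failover" side has finished with it. */
    while (atomic_load(&ctx->state) != DEMO_FAILOVER_COMPLETED) {
        sched_yield();
    }
    close(ctx->fd);
    return NULL;
}

/* Plays the role of the failover BH. Returns 0 if the worker was woken
 * up with an end-of-stream result from recv(), -1 otherwise. */
int demo_failover_shutdown(void)
{
    int sv[2];
    pthread_t tid;
    struct demo_ctx ctx;
    struct timespec delay = { 0, 100 * 1000 * 1000 }; /* 100 ms */

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    ctx.fd = sv[0];
    ctx.recv_ret = -2;
    atomic_init(&ctx.state, DEMO_FAILOVER_NONE);

    if (pthread_create(&tid, NULL, demo_worker, &ctx) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Give the worker time to block in recv(). */
    nanosleep(&delay, NULL);

    /* Wake the blocked recv() without closing the fd. */
    shutdown(sv[0], SHUT_RDWR);
    atomic_store(&ctx.state, DEMO_FAILOVER_COMPLETED);

    pthread_join(tid, NULL);
    close(sv[1]);
    return ctx.recv_ret == 0 ? 0 : -1;
}
```

Calling demo_failover_shutdown() from a main() and building with -pthread should return 0 on Linux once the worker has been woken. The ordering mirrors the patch: shutdown() happens in the "failover" path, and close() happens in the worker only after the state reaches the completed value.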