From patchwork Tue Dec 15 08:22:47 2015
From: zhanghailiang
Date: Tue, 15 Dec 2015 16:22:47 +0800
Subject: [Qemu-devel] [PATCH COLO-Frame v12 26/38] COLO failover: Shutdown related socket fd when do failover
Message-ID: <1450167779-9960-27-git-send-email-zhang.zhanghailiang@huawei.com>
In-Reply-To: <1450167779-9960-1-git-send-email-zhang.zhanghailiang@huawei.com>
References: <1450167779-9960-1-git-send-email-zhang.zhanghailiang@huawei.com>
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, peter.huangpeng@huawei.com, dgilbert@redhat.com, zhanghailiang , arei.gonglei@huawei.com, stefanha@redhat.com, amit.shah@redhat.com, hongyang.yang@easystack.cn
X-Patchwork-Submitter: Zhanghailiang
X-Patchwork-Id: 556854

If the network connection between the two COLO sides breaks while the
COLO thread or the COLO incoming thread is blocked in a read or write on
the socket fd, the error is not detected until the connection times out,
which can take a long time.

To wake up the blocked operations promptly, shut down all the related
socket file descriptors in the failover BH.

In addition, the corresponding file descriptors must be closed only after
the failover BH has shut them down; otherwise the failover BH may shut
down an fd that has already been released and reused by another thread.

Signed-off-by: zhanghailiang
Signed-off-by: Li Zhijian
Reviewed-by: Dr. David Alan Gilbert
---
v12:
- Shut down both QEMUFiles' fds, even though they may use the same fd.
  (Dave's suggestion)
v11:
- Only shut down the fd once

Signed-off-by: zhanghailiang
---
 migration/colo.c | 42 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index d06c14f..58531e7 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -60,6 +60,18 @@ static void secondary_vm_do_failover(void)
         /* recover runstate to normal migration finish state */
         autostart = true;
     }
+    /*
+     * Make sure the COLO incoming thread does not block in recv or send.
+     * If mis->from_src_file and mis->to_src_file use the same fd,
+     * the second shutdown() may return -1; we ignore this value
+     * because it is harmless.
+     */
+    if (mis->from_src_file) {
+        qemu_file_shutdown(mis->from_src_file);
+    }
+    if (mis->to_src_file) {
+        qemu_file_shutdown(mis->to_src_file);
+    }
 
     old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
                                    FAILOVER_STATUS_COMPLETED);
@@ -82,6 +94,18 @@ static void primary_vm_do_failover(void)
     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
+    /*
+     * Make sure the COLO thread does not block in recv or send.
+     * s->rp_state.from_dst_file and s->to_dst_file may use the
+     * same fd; shutting that fd down twice is harmless.
+     */
+    if (s->to_dst_file) {
+        qemu_file_shutdown(s->to_dst_file);
+    }
+    if (s->rp_state.from_dst_file) {
+        qemu_file_shutdown(s->rp_state.from_dst_file);
+    }
+
     old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
                                    FAILOVER_STATUS_COMPLETED);
     if (old_state != FAILOVER_STATUS_HANDLING) {
@@ -348,7 +372,7 @@ static void colo_process_checkpoint(MigrationState *s)
     }
 
 out:
-    if (ret < 0) {
+    if (ret < 0 || (!ret && !failover_request_is_active())) {
         error_report("%s: %s", __func__, strerror(-ret));
         qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
                                   true, strerror(-ret), NULL);
@@ -360,6 +384,15 @@ out:
     qsb_free(buffer);
     buffer = NULL;
 
+    /* Hopefully this loop does not spin for too long */
+    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
+        ;
+    }
+    /*
+     * Must be called after the failover BH has completed;
+     * otherwise the failover BH may shut down the wrong fd, one
+     * that was re-used by another thread after we released it here.
+     */
     if (s->rp_state.from_dst_file) {
         qemu_fclose(s->rp_state.from_dst_file);
     }
@@ -519,7 +552,7 @@ void *colo_process_incoming_thread(void *opaque)
     }
 
 out:
-    if (ret < 0) {
+    if (ret < 0 || (!ret && !failover_request_is_active())) {
         error_report("colo incoming thread will exit, detect error: %s",
                      strerror(-ret));
         qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
@@ -539,6 +572,11 @@ out:
      */
     colo_release_ram_cache();
 
+    /* Hopefully this loop does not spin for too long */
+    while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
+        ;
+    }
+    /* Must be called after the failover BH has completed */
     if (mis->to_src_file) {
         qemu_fclose(mis->to_src_file);
     }
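The wake-up mechanism this patch relies on can be illustrated outside QEMU. The following is a minimal standalone sketch, not QEMU code: it assumes an AF_UNIX socketpair can stand in for the migration socket, and that qemu_file_shutdown() ultimately issues shutdown(2) on the underlying socket fd. A reader thread blocks in recv(); calling shutdown(SHUT_RDWR) on the same fd from another thread makes recv() return 0 (end of file) immediately instead of waiting for a timeout. The helper names (blocking_reader, demo_shutdown_wakes_reader) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

struct reader_arg {
    int fd;          /* socket the reader blocks on */
    ssize_t result;  /* return value of the blocking recv() */
};

/* Plays a thread stuck in recv(): nobody ever writes to the peer. */
static void *blocking_reader(void *opaque)
{
    struct reader_arg *arg = opaque;
    char buf[64];

    arg->result = recv(arg->fd, buf, sizeof(buf), 0);
    return NULL;
}

/* Returns the woken recv()'s result (0 = EOF), or -2 on setup failure. */
static ssize_t demo_shutdown_wakes_reader(void)
{
    int sv[2];
    pthread_t tid;
    struct reader_arg arg = { .fd = -1, .result = -2 };
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -2;
    }
    arg.fd = sv[0];
    if (pthread_create(&tid, NULL, blocking_reader, &arg) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -2;
    }

    nanosleep(&delay, NULL);      /* give the reader time to block in recv() */
    shutdown(sv[0], SHUT_RDWR);   /* wakes it; recv() returns 0 */
    pthread_join(tid, NULL);

    /* Close only once no thread can still be using the fd. */
    close(sv[0]);
    close(sv[1]);
    return arg.result;
}
```

Call demo_shutdown_wakes_reader() from a small main() and link with -pthread. Even if shutdown() happens to run before the reader reaches recv(), the call still returns 0 rather than blocking, so the result does not depend on the sleep.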
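The "wrong fd" hazard described in the new comments comes from descriptor reuse: POSIX specifies that newly allocated file descriptors take the lowest-numbered free slot, so a number released by close() is usually handed out again by the very next socket() or open(). This hypothetical sketch (not QEMU code; demo_fd_number_reuse is an illustrative name) shows the effect in a single thread: after one end of a socketpair is closed, the next socketpair() reuses that number, so a late shutdown() on the stale number would tear down an unrelated socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 if a socket created right after close() reuses the closed
 * descriptor number, 0 if not, -1 on setup failure.
 */
static int demo_fd_number_reuse(void)
{
    int a[2], b[2];
    int stale_fd, reused;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, a) < 0) {
        return -1;
    }
    stale_fd = a[0];
    close(a[0]);  /* released too early: another thread may still hold stale_fd */

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, b) < 0) {
        close(a[1]);
        return -1;
    }
    /* Lowest-free allocation hands stale_fd to the new, unrelated socket. */
    reused = (b[0] == stale_fd || b[1] == stale_fd);

    /* A late shutdown(stale_fd) at this point would hit b, not a. */
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return reused;
}
```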
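The ordering the patch enforces (the failover side shuts the fd down and then marks the failover COMPLETED; the COLO-side thread waits for COMPLETED before closing) can be sketched with a C11 atomic standing in for the failover state. This is an illustration under stated assumptions, not QEMU code: the SKETCH_* enum values, struct colo_sketch and the function names are hypothetical stand-ins for FAILOVER_STATUS_*, the QEMUFile pointers, the failover BH and the COLO thread. Like the patch, the waiting side spins; here it calls sched_yield() on each iteration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins for FAILOVER_STATUS_HANDLING / _COMPLETED. */
enum { SKETCH_HANDLING = 1, SKETCH_COMPLETED = 2 };

struct colo_sketch {
    int fd;                     /* socket the COLO-side thread blocks on */
    atomic_int failover_state;  /* published by the failover side */
    ssize_t worker_recv;        /* what the blocked recv() returned */
};

/* Plays the COLO thread: block in recv(), then release the fd safely. */
static void *colo_worker(void *opaque)
{
    struct colo_sketch *c = opaque;
    char buf[64];

    c->worker_recv = recv(c->fd, buf, sizeof(buf), 0);

    /* As in the patch: do not close until the failover side is done. */
    while (atomic_load(&c->failover_state) != SKETCH_COMPLETED) {
        sched_yield();
    }
    close(c->fd);
    return NULL;
}

/* Plays the failover BH: shut the fd down first, then publish completion. */
static void failover_handler(struct colo_sketch *c)
{
    shutdown(c->fd, SHUT_RDWR);
    atomic_store(&c->failover_state, SKETCH_COMPLETED);
}

/* Returns the worker's recv() result (0 after shutdown), or -2 on setup failure. */
static ssize_t demo_failover_then_close(void)
{
    int sv[2];
    pthread_t tid;
    struct colo_sketch c;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -2;
    }
    c.fd = sv[0];
    c.worker_recv = -2;
    atomic_init(&c.failover_state, SKETCH_HANDLING);

    if (pthread_create(&tid, NULL, colo_worker, &c) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -2;
    }
    failover_handler(&c);
    pthread_join(tid, NULL);  /* the worker has closed sv[0] by now */
    close(sv[1]);
    return c.worker_recv;
}
```

Because the COMPLETED store happens only after shutdown() returns, close() can never run while the failover side may still touch the fd. A condition variable would avoid spinning altogether; the spin loop here only mirrors the patch's while loop.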