From: fangying
To: "qemu-devel@nongnu.org", Paolo Bonzini
Message-ID: <212a9227-5a56-7a0e-1149-2d1bc884b4f0@huawei.com>
Date: Tue, 14 Nov 2017 15:09:20 +0800
Subject: [Qemu-devel] QEMU abort when network service is restarted during live migration with vhost-user as the network backend
Cc: "hangaohuai@huawei.com", arei.gonglei@huawei.com, king.wang@huawei.com

Hi all,

We have a VM undergoing live migration with vhost-user as the network backend, and we noticed that QEMU aborts if openvswitch is restarted while MEMORY_LISTENER_CALL_GLOBAL(log_global_start, Forward) is being called. The reason is clear: vhost_dev_set_log() returns -1 because the network connection is temporarily lost while the openvswitch service restarts. Below is the call stack trace.
#0 0x00007f868ed971d7 in raise() from /usr/lib64/libc.so.6
#1 0x00007f868ed988c8 in abort() from /usr/lib64/libc.so.6
#2 0x00000000004d0d35 in vhost_log_global_start (listener=) at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost.c:794
#2 0x0000000000486bd2 in memory_global_dirty_log_start at /usr/src/debug/qemu-kvm-2.8.1/memory.c:2304
#3 0x0000000000486dcd in ram_save_init_globals at /usr/src/debug/qemu-kvm-2.8.1/migration/ram.c:2072
#4 0x000000000048c185 in ram_save_setup (f=0x25e6ac0, opaque=) at /usr/src/debug/qemu-kvm-2.8.1/migration/ram.c:2093
#5 0x00000000004fbee2 in qemu_savevm_state_begin at /usr/src/debug/qemu-kvm-2.8.1/migration/savevm.c:956
#6 0x000000000083d8f8 in migration_thread at migration/migration.c:2198

static void vhost_log_global_start(MemoryListener *listener)
{
    int r;

    r = vhost_migration_log(listener, true);
    if (r < 0) {
        abort();  /* branch taken */
    }
}

What confuses me is:
1. Do we really need to abort here?
2. All MemoryListener callbacks return void, so no function higher up the call stack can tell that the callback failed.

Can we just cancel the migration here instead of calling abort()? For example:

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ddc42f0..27ae4a2 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -27,6 +27,7 @@
 #include "hw/virtio/virtio-access.h"
 #include "migration/blocker.h"
 #include "sysemu/dma.h"
+#include "qmp-commands.h"
 
 /* enabled until disconnected backend stabilizes */
 #define _VHOST_DEBUG 1
@@ -885,7 +886,7 @@ static void vhost_log_global_start(MemoryListener *listener)
 
     r = vhost_migration_log(listener, true);
     if (r < 0) {
-        abort();
+        qmp_migrate_cancel(NULL);
     }
 }
 
@@ -895,7 +896,7 @@ static void vhost_log_global_stop(MemoryListener *listener)
 
     r = vhost_migration_log(listener, false);
     if (r < 0) {
-        abort();
+        qmp_migrate_cancel(NULL);
     }
 }
 
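To make question 2 concrete, here is a minimal sketch of one alternative to abort() when callbacks must return void: the callback records its error in listener state, and the code that triggered the callbacks checks that state afterwards and fails the migration instead of the whole process. Everything here (Listener, backend_set_log, migration_setup, run_demo, last_error) is hypothetical and simplified; it is not QEMU's MemoryListener API and does not model QEMU's locking or threading.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a listener whose callbacks return void,
 * mirroring the MemoryListener limitation. */
typedef struct Listener Listener;
struct Listener {
    void (*log_global_start)(Listener *l);
    int last_error;            /* hypothetical: 0 = ok, < 0 = failure */
};

/* Hypothetical backend state: false while the vhost-user peer
 * (e.g. openvswitch) is restarting. */
static bool backend_connected;

/* Hypothetical stand-in for vhost_dev_set_log(): fails without a peer. */
static int backend_set_log(bool enable)
{
    (void)enable;
    return backend_connected ? 0 : -1;
}

/* Void callback: instead of abort(), record the failure for the caller. */
static void sketch_log_global_start(Listener *l)
{
    int r = backend_set_log(true);
    if (r < 0) {
        l->last_error = r;
    }
}

/* Hypothetical migration setup: invoke every callback, then report the
 * first recorded error so only the migration fails, not the process. */
static int migration_setup(Listener **listeners, size_t n)
{
    assert(listeners != NULL);
    for (size_t i = 0; i < n; i++) {
        listeners[i]->last_error = 0;
        listeners[i]->log_global_start(listeners[i]);
    }
    for (size_t i = 0; i < n; i++) {
        if (listeners[i]->last_error < 0) {
            return listeners[i]->last_error;
        }
    }
    return 0;
}

/* Runs one setup attempt; returns 0 on success, < 0 if it must be cancelled. */
int run_demo(bool connected)
{
    Listener vhost = { .log_global_start = sketch_log_global_start,
                       .last_error = 0 };
    Listener *all[] = { &vhost };

    backend_connected = connected;
    int r = migration_setup(all, 1);
    printf("%s\n", r < 0 ? "migration cancelled" : "migration started");
    return r;
}
```

With the backend disconnected, run_demo(false) returns -1 and the caller can cancel the migration; with it connected, run_demo(true) returns 0. Keeping the cancel decision in the caller avoids having a low-level listener reach up into migration control; whether that fits QEMU's own threading rules is an open question this sketch does not address.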