From patchwork Wed Oct 18 17:01:31 2017
X-Patchwork-Submitter: "Dr. David Alan Gilbert"
X-Patchwork-Id: 827760
From: "Dr. David Alan Gilbert (git)"
To: qemu-devel@nongnu.org, kwolf@redhat.com, jdenemar@redhat.com,
 wangjie88@huawei.com, quintela@redhat.com, peterx@redhat.com,
 mreitz@redhat.com
Cc: fuweiwei2@huawei.com
Date: Wed, 18 Oct 2017 18:01:31 +0100
Message-Id: <20171018170138.19078-1-dgilbert@redhat.com>
Subject: [Qemu-devel] [PATCH v2 0/7] migration: pause-before-switchover

From: "Dr. David Alan Gilbert"

Hi,
  This set attempts to make a race condition between migration and
drive-mirror (and other block users) soluble by allowing the migration
to be paused after the source qemu releases the block devices but
before the serialisation of the device state.

The symptom of this failure, as reported by Wangjie, is a:

   _co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed

and the source qemu dying; so the problem is pretty nasty.
This has only been seen on 2.9 onwards, but the theory is that
prior to 2.9 it might have been happening anyway and we were
perhaps getting unreported corruptions (lost writes); so this
really needs fixing.

This flow came from discussions between Kevin and me, and we can't
see a way of fixing it without exposing a new state to the management
layer.

The flow is now:

(qemu) migrate_set_capability pause-before-switchover on
(qemu) migrate -d ...
(qemu) info migrate
...
Migration status: pre-switchover
...
<< issue commands to clean up any block jobs>>
(qemu) migrate_continue pre-switchover
(qemu) info migrate
...
Migration status: completed

This set has been _very_ lightly tested, and only against the normal
migration code, without the addition of the drive mirror; so this is
a first cut. I'd appreciate some feedback from libvirt on whether the
interface is OK, and ideally a hack to test it in a full libvirt setup
to see if we hit any other issues.

The precopy flow is:
active->pre-switchover->device->completed

The postcopy flow is:
active->pre-switchover->postcopy-active->completed

Although the behaviour with postcopy only gets interesting when
we add something like Max's active-sync.

Dave

---
v2
  Pause *before* block inactivation (thanks Peter)
  Rename state and capability to Dan+KWolf's combined suggestion

Dr. David Alan Gilbert (7):
  migration: Add 'pause-before-switchover' capability
  migration: Add 'pre-switchover' and 'device' statuses
  migration: Wait for semaphore before completing migration
  migration: migrate-continue
  migrate: HMP migate_continue
  migration: allow cancel to unpause
  migration: pause-before-switchover for postcopy

 hmp-commands.hx       | 12 +++++++
 hmp.c                 | 13 ++++++++
 hmp.h                 |  1 +
 migration/migration.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++--
 migration/migration.h |  4 +++
 qapi/migration.json   | 30 ++++++++++++++++--
 6 files changed, 144 insertions(+), 4 deletions(-)
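
For anyone wiring this into a management layer, the two status flows given in the cover letter (precopy and postcopy) can be modelled as a small transition table. The following is only a rough sketch and not QEMU code: the function and constant names are hypothetical, and it deliberately leaves out the cancel and failure paths, which the cover letter does not spell out.

```python
# Hypothetical model of the status flows listed in the cover letter.
# None of these names exist in QEMU; they are for illustration only.

PRECOPY_FLOW = ["active", "pre-switchover", "device", "completed"]
POSTCOPY_FLOW = ["active", "pre-switchover", "postcopy-active", "completed"]


def allowed_transitions(flow):
    """Return the set of (old, new) status pairs permitted by a linear flow."""
    return set(zip(flow, flow[1:]))


def is_valid_sequence(observed, flow):
    """Check that every consecutive pair of observed statuses follows the flow."""
    allowed = allowed_transitions(flow)
    return all(pair in allowed for pair in zip(observed, observed[1:]))


def may_continue(status):
    """Assumption: migrate_continue is only meaningful while paused."""
    return status == "pre-switchover"
```

A client polling the migration status could use something like may_continue() to decide when to clean up block jobs and then issue migrate_continue, and is_valid_sequence() to sanity-check the statuses it has seen so far.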