From patchwork Fri Nov 4 16:59:36 2016
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 691339
Date: Fri, 4 Nov 2016 14:59:36 -0200
From: Marcelo Tosatti
To: kvm@vger.kernel.org, qemu-devel
Cc: Paolo Bonzini, Juan Quintela, "Dr. David Alan Gilbert", Eduardo Habkost, Radim Krčmář
Message-ID: <20161104165933.GA3027@amt.cnet>
In-Reply-To: <20161104094322.GA16930@amt.cnet>
References: <20161104094322.GA16930@amt.cnet>
Subject: [Qemu-devel] [QEMU PATCH v2] kvmclock: advance clock by time window between vm_stop and pre_save

For the pre-copy migration code path, this patch measures the time
between vm_stop() and pre_save(), a window that includes copying the
remaining RAM to the destination, and advances the guest clock by that
amount.

In a VM with 5 seconds of downtime, this reduces the guest clock
difference on the destination from 5s to 0.2s.

Tested with Linux and Windows 2012 R2 guests with -cpu XXX,+hv-time.
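The timing arithmetic described above can be tried outside QEMU. What follows is a minimal standalone sketch, not code from this patch: delta_ns(), capped_advance(), measure_window() and apply_advance() are hypothetical helper names, and the sketch assumes normalized timespec values and a guest clock counted in nanoseconds (the unit of struct kvm_clock_data used by KVM_GET_CLOCK and KVM_SET_CLOCK). The 10-minute cap and the wraparound guard are taken from kvmclock_pre_save() and kvmclock_vm_state_change() in the diff.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical standalone helpers modelled on the patch; not QEMU API. */

/* Same 10-minute cap as kvmclock_pre_save(), in nanoseconds. */
#define ADVANCE_CAP_NS 600000000000ULL

/* Elapsed nanoseconds from *before to *after. Both values are assumed to be
 * normalized (0 <= tv_nsec < 1e9); a negative interval is treated as a bug,
 * much as the patch's clock_delta() aborts on it. */
static uint64_t delta_ns(const struct timespec *before,
                         const struct timespec *after)
{
    assert(after->tv_sec > before->tv_sec ||
           (after->tv_sec == before->tv_sec &&
            after->tv_nsec >= before->tv_nsec));
    return (uint64_t)(after->tv_sec - before->tv_sec) * 1000000000ULL +
           (uint64_t)after->tv_nsec - (uint64_t)before->tv_nsec;
}

/* Source side: clamp the measured stop-to-save window to the cap. */
static uint64_t capped_advance(uint64_t window_ns)
{
    return window_ns < ADVANCE_CAP_NS ? window_ns : ADVANCE_CAP_NS;
}

/* Source side: window between the stop timestamp and "now", both read from
 * CLOCK_MONOTONIC so host wall-clock steps cannot distort it. */
static uint64_t measure_window(const struct timespec *stopped_at)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return capped_advance(delta_ns(stopped_at, &now));
}

/* Destination side: add the advance to the migrated clock value unless the
 * sum would wrap around (the patch's `clock + advance > clock` guard). */
static uint64_t apply_advance(uint64_t clock_ns, uint64_t advance_ns)
{
    return clock_ns + advance_ns > clock_ns ? clock_ns + advance_ns : clock_ns;
}
```

In the patch's flow, the stop timestamp corresponds to the RUN_STATE_FINISH_MIGRATE transition on the source, measure_window() to the work done in pre_save, and apply_advance() to the adjustment made on the destination just before KVM_SET_CLOCK. Reading CLOCK_MONOTONIC rather than CLOCK_REALTIME keeps host wall-clock adjustments (for example from NTP) out of the measured window.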
Signed-off-by: Marcelo Tosatti
Reviewed-by: Juan Quintela
---
v2: use subsection (Juan Quintela)
    fix older machine types support

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 0f75dd3..a2a02ac 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -22,9 +22,11 @@
 #include "kvm_i386.h"
 #include "hw/sysbus.h"
 #include "hw/kvm/clock.h"
+#include "migration/migration.h"
 
 #include
 #include
+#include
 
 #define TYPE_KVM_CLOCK "kvmclock"
 #define KVM_CLOCK(obj) OBJECT_CHECK(KVMClockState, (obj), TYPE_KVM_CLOCK)
@@ -35,7 +37,13 @@ typedef struct KVMClockState {
     /*< public >*/
 
     uint64_t clock;
+    uint64_t ns;
     bool clock_valid;
+
+    uint64_t advance_clock;
+    struct timespec t_aftervmstop;
+
+    bool adv_clock_enabled;
 } KVMClockState;
 
 struct pvclock_vcpu_time_info {
@@ -100,6 +108,11 @@ static void kvmclock_vm_state_change(void *opaque, int running,
             s->clock = time_at_migration;
         }
 
+        if (s->advance_clock && s->clock + s->advance_clock > s->clock) {
+            s->clock += s->advance_clock;
+            s->advance_clock = 0;
+        }
+
         data.clock = s->clock;
         ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
         if (ret < 0) {
@@ -135,6 +148,18 @@ static void kvmclock_vm_state_change(void *opaque, int running,
             abort();
         }
         s->clock = data.clock;
+        /*
+         * Transition from VM-running to VM-stopped via migration?
+         * Record when the VM was stopped.
+         */
+
+        if (state == RUN_STATE_FINISH_MIGRATE &&
+            !migration_in_postcopy(migrate_get_current())) {
+            clock_gettime(CLOCK_MONOTONIC, &s->t_aftervmstop);
+        } else {
+            s->t_aftervmstop.tv_sec = 0;
+            s->t_aftervmstop.tv_nsec = 0;
+        }
 
         /*
          * If the VM is stopped, declare the clock state valid to
@@ -152,6 +177,77 @@ static void kvmclock_realize(DeviceState *dev, Error **errp)
     qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
 }
 
+static uint64_t clock_delta(struct timespec *before, struct timespec *after)
+{
+    if (before->tv_sec > after->tv_sec ||
+        (before->tv_sec == after->tv_sec &&
+         before->tv_nsec > after->tv_nsec)) {
+        fprintf(stderr, "clock_delta failed: before=(%ld sec, %ld nsec),"
+                        "after=(%ld sec, %ld nsec)\n", before->tv_sec,
+                        before->tv_nsec, after->tv_sec, after->tv_nsec);
+        abort();
+    }
+
+    return (after->tv_sec - before->tv_sec) * 1000000000ULL +
+           after->tv_nsec - before->tv_nsec;
+}
+
+static void kvmclock_pre_save(void *opaque)
+{
+    KVMClockState *s = opaque;
+    struct timespec now;
+    uint64_t ns;
+
+    if (s->t_aftervmstop.tv_sec == 0) {
+        return;
+    }
+
+    clock_gettime(CLOCK_MONOTONIC, &now);
+
+    ns = clock_delta(&s->t_aftervmstop, &now);
+
+    /*
+     * Linux guests can overflow if time jumps
+     * forward in large increments.
+     * Cap maximum adjustment to 10 minutes.
+     */
+    ns = MIN(ns, 600000000000ULL);
+
+    if (s->clock + ns > s->clock) {
+        s->ns = ns;
+    }
+}
+
+static int kvmclock_post_load(void *opaque, int version_id)
+{
+    KVMClockState *s = opaque;
+
+    /* save the value from incoming migration */
+    s->advance_clock = s->ns;
+
+    return 0;
+}
+
+static bool kvmclock_ns_needed(void *opaque)
+{
+    KVMClockState *s = opaque;
+
+    return s->adv_clock_enabled;
+}
+
+static const VMStateDescription kvmclock_advance_ns = {
+    .name = "kvmclock/advance_ns",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = kvmclock_ns_needed,
+    .pre_save = kvmclock_pre_save,
+    .post_load = kvmclock_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64(ns, KVMClockState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription kvmclock_vmsd = {
     .name = "kvmclock",
     .version_id = 1,
@@ -159,15 +255,25 @@ static const VMStateDescription kvmclock_vmsd = {
     .fields = (VMStateField[]) {
         VMSTATE_UINT64(clock, KVMClockState),
         VMSTATE_END_OF_LIST()
+    },
+    .subsections = (const VMStateDescription * []) {
+        &kvmclock_advance_ns,
+        NULL
     }
 };
 
+static Property kvmclock_properties[] = {
+    DEFINE_PROP_BOOL("advance_clock", KVMClockState, adv_clock_enabled, true),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static void kvmclock_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
 
     dc->realize = kvmclock_realize;
     dc->vmsd = &kvmclock_vmsd;
+    dc->props = kvmclock_properties;
 }
 
 static const TypeInfo kvmclock_info = {
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 98dc772..243352e 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -370,6 +370,11 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
 #define PC_COMPAT_2_7 \
     HW_COMPAT_2_7 \
     {\
+        .driver = "kvmclock",\
+        .property = "advance_clock",\
+        .value = "off",\
+    },\
+    {\
         .driver = TYPE_X86_CPU,\
         .property = "l3-cache",\
         .value = "off",\