Patchwork [v3] qemu: kvm: Enable XSAVE live migration support

Submitter Sheng Yang
Date June 11, 2010, 4:36 a.m.
Message ID <1276231009-6060-1-git-send-email-sheng@linux.intel.com>
Permalink /patch/55291/
State New

Comments

Sheng Yang - June 11, 2010, 4:36 a.m.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 qemu-kvm-x86.c        |  109 ++++++++++++++++++++++++++++++++++++++++---------
 qemu-kvm.c            |   24 +++++++++++
 qemu-kvm.h            |   28 +++++++++++++
 target-i386/cpu.h     |    5 ++
 target-i386/kvm.c     |    2 +
 target-i386/machine.c |   20 +++++++++
 6 files changed, 169 insertions(+), 19 deletions(-)
Marcelo Tosatti - June 14, 2010, 8:39 p.m.
On Fri, Jun 11, 2010 at 12:36:49PM +0800, Sheng Yang wrote:
> 
> Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> ---
>  qemu-kvm-x86.c        |  109 ++++++++++++++++++++++++++++++++++++++++---------
>  qemu-kvm.c            |   24 +++++++++++
>  qemu-kvm.h            |   28 +++++++++++++
>  target-i386/cpu.h     |    5 ++
>  target-i386/kvm.c     |    2 +
>  target-i386/machine.c |   20 +++++++++
>  6 files changed, 169 insertions(+), 19 deletions(-)

Applied, thanks.
Jan Kiszka - June 16, 2010, 3:48 p.m.
Marcelo Tosatti wrote:
> On Fri, Jun 11, 2010 at 12:36:49PM +0800, Sheng Yang wrote:
>> Signed-off-by: Sheng Yang <sheng@linux.intel.com>
>> ---
>>  qemu-kvm-x86.c        |  109 ++++++++++++++++++++++++++++++++++++++++---------
>>  qemu-kvm.c            |   24 +++++++++++
>>  qemu-kvm.h            |   28 +++++++++++++
>>  target-i386/cpu.h     |    5 ++
>>  target-i386/kvm.c     |    2 +
>>  target-i386/machine.c |   20 +++++++++
>>  6 files changed, 169 insertions(+), 19 deletions(-)
> 
> Applied, thanks.

Oops, late remark: Why introduce this feature against qemu-kvm instead
of upstream? Doesn't this just generate additional conversion work and
the risk of divergence from upstream in the migration protocol?

Jan
Marcelo Tosatti - June 16, 2010, 4:05 p.m.
On Wed, Jun 16, 2010 at 05:48:46PM +0200, Jan Kiszka wrote:
> Marcelo Tosatti wrote:
> > On Fri, Jun 11, 2010 at 12:36:49PM +0800, Sheng Yang wrote:
> >> Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> >> ---
> >>  qemu-kvm-x86.c        |  109 ++++++++++++++++++++++++++++++++++++++++---------
> >>  qemu-kvm.c            |   24 +++++++++++
> >>  qemu-kvm.h            |   28 +++++++++++++
> >>  target-i386/cpu.h     |    5 ++
> >>  target-i386/kvm.c     |    2 +
> >>  target-i386/machine.c |   20 +++++++++
> >>  6 files changed, 169 insertions(+), 19 deletions(-)
> > 
> > Applied, thanks.
> 
> Oops, late remark: Why introduce this feature against qemu-kvm instead
> of upstream? Doesn't this just generate additional conversion work and
> the risk of divergence from upstream in the migration protocol?

That's true. Sheng, can you add save/restore support to uq/master to
avoid these problems?

Then the cpuid bits can also be merged upstream.
Sheng Yang - June 17, 2010, 2:01 a.m.
On Thursday 17 June 2010 00:05:44 Marcelo Tosatti wrote:
> On Wed, Jun 16, 2010 at 05:48:46PM +0200, Jan Kiszka wrote:
> > Marcelo Tosatti wrote:
> > > On Fri, Jun 11, 2010 at 12:36:49PM +0800, Sheng Yang wrote:
> > >> Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > >> ---
> > >> 
> > >>  qemu-kvm-x86.c        |  109 ++++++++++++++++++++++++++++++++++++++++---------
> > >>  qemu-kvm.c            |   24 +++++++++++
> > >>  qemu-kvm.h            |   28 +++++++++++++
> > >>  target-i386/cpu.h     |    5 ++
> > >>  target-i386/kvm.c     |    2 +
> > >>  target-i386/machine.c |   20 +++++++++
> > >>  6 files changed, 169 insertions(+), 19 deletions(-)
> > > 
> > > Applied, thanks.
> > 
> > Oops, late remark: Why introduce this feature against qemu-kvm instead
> > of upstream? Doesn't this just generate additional conversion work and
> > the risk of divergence from upstream in the migration protocol?

Hi Jan

You're late... I hope you can raise such comments earlier next time so we can
work together more efficiently.
> 
> That's true. Sheng, can you add save/restore support to uq/master to
> avoid these problems?

Yes, there is a divergence risk; I will send an upstream version as well.

But I think as long as qemu-kvm and qemu upstream use different live migration
(LM) paths, the duplicated code/work can't be avoided.
 
> Then the cpuid bits can also be merged upstream.

--
regards
Yang, Sheng
Jan Kiszka - June 17, 2010, 7:26 a.m.
Sheng Yang wrote:
> On Thursday 17 June 2010 00:05:44 Marcelo Tosatti wrote:
>> On Wed, Jun 16, 2010 at 05:48:46PM +0200, Jan Kiszka wrote:
>>> Marcelo Tosatti wrote:
>>>> On Fri, Jun 11, 2010 at 12:36:49PM +0800, Sheng Yang wrote:
>>>>> Signed-off-by: Sheng Yang <sheng@linux.intel.com>
>>>>> ---
>>>>>
>>>>>  qemu-kvm-x86.c        |  109 ++++++++++++++++++++++++++++++++++++++++---------
>>>>>  qemu-kvm.c            |   24 +++++++++++
>>>>>  qemu-kvm.h            |   28 +++++++++++++
>>>>>  target-i386/cpu.h     |    5 ++
>>>>>  target-i386/kvm.c     |    2 +
>>>>>  target-i386/machine.c |   20 +++++++++
>>>>>  6 files changed, 169 insertions(+), 19 deletions(-)
>>>> Applied, thanks.
>>> Oops, late remark: Why introduce this feature against qemu-kvm instead
>>> of upstream? Doesn't this just generate additional conversion work and
>>> the risk of divergence from upstream in the migration protocol?
> 
> Hi Jan
> 
> You're late... I hope you can raise such comments earlier next time so we can
> work together more efficiently.

This case is "lost", and probably already was when you first posted the
patch. But I hope this at least raises awareness of the issue again.

>> That's true. Sheng, can you add save/restore support to uq/master to
>> avoid these problems?
> 
> Yes, there is a divergence risk; I will send an upstream version as well.
> 
> But I think as long as qemu-kvm and qemu upstream use different live migration
> (LM) paths, the duplicated code/work can't be avoided.

Probably. The vision is that one day you can write a KVM feature and
apply it to qemu-kvm as a staging tree for a later unmodified merge into
qemu upstream. qemu-kvm[-arch].[ch] is still in our way, but it already
uses many bits from upstream. So I would recommend designing new
features against upstream first and then providing the few bits needed to
also make use of them in qemu-kvm once the latter has merged the required
bits (which may actually happen before upstream, but that doesn't matter).

Jan

Patch

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 3c33e64..4f0b1d0 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -772,10 +772,20 @@  static void get_seg(SegmentCache *lhs, const struct kvm_segment *rhs)
 	| (rhs->avl * DESC_AVL_MASK);
 }
 
+#define XSAVE_CWD_RIP     2
+#define XSAVE_CWD_RDP     4
+#define XSAVE_MXCSR       6
+#define XSAVE_ST_SPACE    8
+#define XSAVE_XMM_SPACE   40
+#define XSAVE_XSTATE_BV   128
+#define XSAVE_YMMH_SPACE  144
+
 void kvm_arch_load_regs(CPUState *env, int level)
 {
     struct kvm_regs regs;
     struct kvm_fpu fpu;
+    struct kvm_xsave* xsave;
+    struct kvm_xcrs xcrs;
     struct kvm_sregs sregs;
     struct kvm_msr_entry msrs[100];
     int rc, n, i;
@@ -806,16 +816,47 @@  void kvm_arch_load_regs(CPUState *env, int level)
 
     kvm_set_regs(env, &regs);
 
-    memset(&fpu, 0, sizeof fpu);
-    fpu.fsw = env->fpus & ~(7 << 11);
-    fpu.fsw |= (env->fpstt & 7) << 11;
-    fpu.fcw = env->fpuc;
-    for (i = 0; i < 8; ++i)
-	fpu.ftwx |= (!env->fptags[i]) << i;
-    memcpy(fpu.fpr, env->fpregs, sizeof env->fpregs);
-    memcpy(fpu.xmm, env->xmm_regs, sizeof env->xmm_regs);
-    fpu.mxcsr = env->mxcsr;
-    kvm_set_fpu(env, &fpu);
+    if (kvm_check_extension(kvm_state, KVM_CAP_XSAVE)) {
+        uint16_t cwd, swd, twd, fop;
+
+        xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+        memset(xsave, 0, sizeof(struct kvm_xsave));
+        cwd = swd = twd = fop = 0;
+        swd = env->fpus & ~(7 << 11);
+        swd |= (env->fpstt & 7) << 11;
+        cwd = env->fpuc;
+        for (i = 0; i < 8; ++i)
+            twd |= (!env->fptags[i]) << i;
+        xsave->region[0] = (uint32_t)(swd << 16) + cwd;
+        xsave->region[1] = (uint32_t)(fop << 16) + twd;
+        memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
+                sizeof env->fpregs);
+        memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
+                sizeof env->xmm_regs);
+        xsave->region[XSAVE_MXCSR] = env->mxcsr;
+        *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
+        memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
+                sizeof env->ymmh_regs);
+        kvm_set_xsave(env, xsave);
+        if (kvm_check_extension(kvm_state, KVM_CAP_XCRS)) {
+            xcrs.nr_xcrs = 1;
+            xcrs.flags = 0;
+            xcrs.xcrs[0].xcr = 0;
+            xcrs.xcrs[0].value = env->xcr0;
+            kvm_set_xcrs(env, &xcrs);
+        }
+    } else {
+        memset(&fpu, 0, sizeof fpu);
+        fpu.fsw = env->fpus & ~(7 << 11);
+        fpu.fsw |= (env->fpstt & 7) << 11;
+        fpu.fcw = env->fpuc;
+        for (i = 0; i < 8; ++i)
+            fpu.ftwx |= (!env->fptags[i]) << i;
+        memcpy(fpu.fpr, env->fpregs, sizeof env->fpregs);
+        memcpy(fpu.xmm, env->xmm_regs, sizeof env->xmm_regs);
+        fpu.mxcsr = env->mxcsr;
+        kvm_set_fpu(env, &fpu);
+    }
 
     memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap));
     if (env->interrupt_injected >= 0) {
@@ -934,6 +975,8 @@  void kvm_arch_save_regs(CPUState *env)
 {
     struct kvm_regs regs;
     struct kvm_fpu fpu;
+    struct kvm_xsave* xsave;
+    struct kvm_xcrs xcrs;
     struct kvm_sregs sregs;
     struct kvm_msr_entry msrs[100];
     uint32_t hflags;
@@ -965,15 +1008,43 @@  void kvm_arch_save_regs(CPUState *env)
     env->eflags = regs.rflags;
     env->eip = regs.rip;
 
-    kvm_get_fpu(env, &fpu);
-    env->fpstt = (fpu.fsw >> 11) & 7;
-    env->fpus = fpu.fsw;
-    env->fpuc = fpu.fcw;
-    for (i = 0; i < 8; ++i)
-	env->fptags[i] = !((fpu.ftwx >> i) & 1);
-    memcpy(env->fpregs, fpu.fpr, sizeof env->fpregs);
-    memcpy(env->xmm_regs, fpu.xmm, sizeof env->xmm_regs);
-    env->mxcsr = fpu.mxcsr;
+    if (kvm_check_extension(kvm_state, KVM_CAP_XSAVE)) {
+        uint16_t cwd, swd, twd, fop;
+        xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+        kvm_get_xsave(env, xsave);
+        cwd = (uint16_t)xsave->region[0];
+        swd = (uint16_t)(xsave->region[0] >> 16);
+        twd = (uint16_t)xsave->region[1];
+        fop = (uint16_t)(xsave->region[1] >> 16);
+        env->fpstt = (swd >> 11) & 7;
+        env->fpus = swd;
+        env->fpuc = cwd;
+        for (i = 0; i < 8; ++i)
+            env->fptags[i] = !((twd >> i) & 1);
+        env->mxcsr = xsave->region[XSAVE_MXCSR];
+        memcpy(env->fpregs, &xsave->region[XSAVE_ST_SPACE],
+                sizeof env->fpregs);
+        memcpy(env->xmm_regs, &xsave->region[XSAVE_XMM_SPACE],
+                sizeof env->xmm_regs);
+        env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV];
+        memcpy(env->ymmh_regs, &xsave->region[XSAVE_YMMH_SPACE],
+                sizeof env->ymmh_regs);
+        if (kvm_check_extension(kvm_state, KVM_CAP_XCRS)) {
+            kvm_get_xcrs(env, &xcrs);
+            if (xcrs.xcrs[0].xcr == 0)
+                env->xcr0 = xcrs.xcrs[0].value;
+        }
+    } else {
+        kvm_get_fpu(env, &fpu);
+        env->fpstt = (fpu.fsw >> 11) & 7;
+        env->fpus = fpu.fsw;
+        env->fpuc = fpu.fcw;
+        for (i = 0; i < 8; ++i)
+            env->fptags[i] = !((fpu.ftwx >> i) & 1);
+        memcpy(env->fpregs, fpu.fpr, sizeof env->fpregs);
+        memcpy(env->xmm_regs, fpu.xmm, sizeof env->xmm_regs);
+        env->mxcsr = fpu.mxcsr;
+    }
 
     kvm_get_sregs(env, &sregs);
 
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 96d458c..be1dac2 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -503,6 +503,30 @@  int kvm_set_mpstate(CPUState *env, struct kvm_mp_state *mp_state)
 }
 #endif
 
+#ifdef KVM_CAP_XSAVE
+int kvm_get_xsave(CPUState *env, struct kvm_xsave *xsave)
+{
+    return kvm_vcpu_ioctl(env, KVM_GET_XSAVE, xsave);
+}
+
+int kvm_set_xsave(CPUState *env, struct kvm_xsave *xsave)
+{
+    return kvm_vcpu_ioctl(env, KVM_SET_XSAVE, xsave);
+}
+#endif
+
+#ifdef KVM_CAP_XCRS
+int kvm_get_xcrs(CPUState *env, struct kvm_xcrs *xcrs)
+{
+    return kvm_vcpu_ioctl(env, KVM_GET_XCRS, xcrs);
+}
+
+int kvm_set_xcrs(CPUState *env, struct kvm_xcrs *xcrs)
+{
+    return kvm_vcpu_ioctl(env, KVM_SET_XCRS, xcrs);
+}
+#endif
+
 static int handle_mmio(CPUState *env)
 {
     unsigned long addr = env->kvm_run->mmio.phys_addr;
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 6f6c6d8..3ace503 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -300,6 +300,34 @@  int kvm_get_mpstate(CPUState *env, struct kvm_mp_state *mp_state);
 int kvm_set_mpstate(CPUState *env, struct kvm_mp_state *mp_state);
 #endif
 
+#ifdef KVM_CAP_XSAVE
+/*!
+ * \brief Read VCPU xsave state
+ *
+ */
+int kvm_get_xsave(CPUState *env, struct kvm_xsave *xsave);
+
+/*!
+ * \brief Write VCPU xsave state
+ *
+ */
+int kvm_set_xsave(CPUState *env, struct kvm_xsave *xsave);
+#endif
+
+#ifdef KVM_CAP_XCRS
+/*!
+ * \brief Read VCPU XCRs
+ *
+ */
+int kvm_get_xcrs(CPUState *env, struct kvm_xcrs *xcrs);
+
+/*!
+ * \brief Write VCPU XCRs
+ *
+ */
+int kvm_set_xcrs(CPUState *env, struct kvm_xcrs *xcrs);
+#endif
+
 /*!
  * \brief Simulate an external vectored interrupt
  *
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index e989040..c32f854 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -736,6 +736,11 @@  typedef struct CPUX86State {
     uint16_t fpregs_format_vmstate;
 
     int kvm_vcpu_update_vapic;
+
+    uint64_t xstate_bv;
+    XMMReg ymmh_regs[CPU_NB_REGS];
+
+    uint64_t xcr0;
 } CPUX86State;
 
 CPUX86State *cpu_x86_init(const char *cpu_model);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index f0f3252..57327f5 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -294,6 +294,8 @@  void kvm_arch_reset_vcpu(CPUState *env)
     env->interrupt_injected = -1;
     env->nmi_injected = 0;
     env->nmi_pending = 0;
+    /* Legal xcr0 for loading */
+    env->xcr0 = 1;
 }
 #ifdef KVM_UPSTREAM
 
diff --git a/target-i386/machine.c b/target-i386/machine.c
index b547e2a..6c89a08 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -47,6 +47,22 @@  static const VMStateDescription vmstate_xmm_reg = {
 #define VMSTATE_XMM_REGS(_field, _state, _n)                         \
     VMSTATE_STRUCT_ARRAY(_field, _state, _n, 0, vmstate_xmm_reg, XMMReg)
 
+/* YMMH format is the same as XMM */
+static const VMStateDescription vmstate_ymmh_reg = {
+    .name = "ymmh_reg",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT64(XMM_Q(0), XMMReg),
+        VMSTATE_UINT64(XMM_Q(1), XMMReg),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+#define VMSTATE_YMMH_REGS_VARS(_field, _state, _n, _v)                         \
+    VMSTATE_STRUCT_ARRAY(_field, _state, _n, _v, vmstate_ymmh_reg, XMMReg)
+
 static const VMStateDescription vmstate_mtrr_var = {
     .name = "mtrr_var",
     .version_id = 1,
@@ -453,6 +469,10 @@  static const VMStateDescription vmstate_cpu = {
         /* KVM pvclock msr */
         VMSTATE_UINT64_V(system_time_msr, CPUState, 11),
         VMSTATE_UINT64_V(wall_clock_msr, CPUState, 11),
+
+        VMSTATE_UINT64_V(xcr0, CPUState, 12),
+        VMSTATE_UINT64_V(xstate_bv, CPUState, 12),
+        VMSTATE_YMMH_REGS_VARS(ymmh_regs, CPUState, CPU_NB_REGS, 12),
         VMSTATE_END_OF_LIST()
         /* The above list is not sorted /wrt version numbers, watch out! */
     }