
[uq/master] kvm: x86: Save/restore FPU OP, IP and DP

Message ID 4DF33413.9070605@web.de

Commit Message

Jan Kiszka June 11, 2011, 9:23 a.m. UTC
From: Jan Kiszka <jan.kiszka@siemens.com>

These FPU states are properly maintained by KVM but not yet by TCG. So
far we unconditionally set them to 0 in the guest, which may cause
state corruption - not only during migration.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 target-i386/cpu.h     |    6 +++++-
 target-i386/kvm.c     |   20 +++++++++++++++-----
 target-i386/machine.c |    4 ++++
 3 files changed, 24 insertions(+), 6 deletions(-)

Comments

Avi Kivity June 13, 2011, 8:45 a.m. UTC | #1
On 06/11/2011 12:23 PM, Jan Kiszka wrote:
> From: Jan Kiszka<jan.kiszka@siemens.com>
>
> These FPU states are properly maintained by KVM but not yet by TCG. So
> far we unconditionally set them to 0 in the guest which may cause
> state corruptions - not only during migration.
>
>
> -#define CPU_SAVE_VERSION 12
> +#define CPU_SAVE_VERSION 13
>

Incrementing the version number seems excessive - I can't imagine that a 
real-life guest would break due to fp pointer corruption.

However, I don't think we have a mechanism for optional state.  We 
discussed this during the 18th VMState Subsection Symposium and IIRC 
agreed to re-raise the issue when we encountered it, which appears to be 
now.
Jan Kiszka June 14, 2011, 6:10 a.m. UTC | #2
On 2011-06-13 10:45, Avi Kivity wrote:
> On 06/11/2011 12:23 PM, Jan Kiszka wrote:
>> From: Jan Kiszka<jan.kiszka@siemens.com>
>>
>> These FPU states are properly maintained by KVM but not yet by TCG. So
>> far we unconditionally set them to 0 in the guest which may cause
>> state corruptions - not only during migration.
>>
>>
>> -#define CPU_SAVE_VERSION 12
>> +#define CPU_SAVE_VERSION 13
>>
> 
> Incrementing the version number seems excessive - I can't imagine a
> real-life guest will break due to fp pointer corruption
> 
> However, I don't think we have a mechanism for optional state.  We
> discussed this during the 18th VMState Subsection Symposium and IIRC
> agreed to re-raise the issue when we encountered it, which appears to be
> now.
> 

Whatever we invent, it has to be backported as well to allow that
infamous traveling back in time, i.e. migrating VMs from newer to older versions.

Would that backporting be simpler if we used an unconditional subsection
for the additional states?

Jan
Avi Kivity June 14, 2011, 8:23 a.m. UTC | #3
On 06/14/2011 09:10 AM, Jan Kiszka wrote:
> On 2011-06-13 10:45, Avi Kivity wrote:
> >  On 06/11/2011 12:23 PM, Jan Kiszka wrote:
> >>  From: Jan Kiszka<jan.kiszka@siemens.com>
> >>
> >>  These FPU states are properly maintained by KVM but not yet by TCG. So
> >>  far we unconditionally set them to 0 in the guest which may cause
> >>  state corruptions - not only during migration.
> >>
> >>
> >>  -#define CPU_SAVE_VERSION 12
> >>  +#define CPU_SAVE_VERSION 13
> >>
> >
> >  Incrementing the version number seems excessive - I can't imagine a
> >  real-life guest will break due to fp pointer corruption
> >
> >  However, I don't think we have a mechanism for optional state.  We
> >  discussed this during the 18th VMState Subsection Symposium and IIRC
> >  agreed to re-raise the issue when we encountered it, which appears to be
> >  now.
> >
>
> Whatever we invent, it has to be backported as well to allow that
> infamous traveling back in time, migrating VMs from newer to older versions.
>
> Would that backporting be simpler if we used an unconditional subsection
> for the additional states?

Most likely.  It depends on what mechanism we use.

Let's spend some time thinking about what it would look like.  This patch 
is not urgent, is it? (i.e. it was discovered by code inspection, not by a 
live migration that caught the cpu between an instruction that caused a 
math exception and the exception handler).
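
The situation Avi describes - a vCPU caught between an x87 instruction that
raised an unmasked exception and the handler that will inspect it - can be
provoked from inside a Linux guest with a small test program along these
lines, which also gives a way to exercise the new fields under migration.
A sketch only; feenableexcept() is a glibc extension, so build with -lm:

/* Provoke an unmasked x87 divide-by-zero so that FOP/FIP/FDP describe a
 * faulting instruction by the time the signal handler runs. */
#define _GNU_SOURCE
#include <fenv.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static void fpe_handler(int sig)
{
    /* Between the faulting fdiv and this point, the FPU's last-opcode and
     * last-instruction/data pointers are live guest state. */
    printf("caught SIGFPE (%d)\n", sig);
    _Exit(0);
}

int main(void)
{
    volatile long double num = 1.0L, den = 0.0L;

    signal(SIGFPE, fpe_handler);
    feenableexcept(FE_DIVBYZERO);       /* unmask the divide-by-zero exception */

    volatile long double r = num / den; /* long double -> x87 fdiv on x86 */
    __asm__ volatile ("fwait");         /* x87 faults surface at the next waiting FP insn */

    (void)r;
    return 1;
}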
Jan Kiszka June 14, 2011, 8:28 a.m. UTC | #4
On 2011-06-14 10:23, Avi Kivity wrote:
> On 06/14/2011 09:10 AM, Jan Kiszka wrote:
>> On 2011-06-13 10:45, Avi Kivity wrote:
>> >  On 06/11/2011 12:23 PM, Jan Kiszka wrote:
>> >>  From: Jan Kiszka<jan.kiszka@siemens.com>
>> >>
>> >>  These FPU states are properly maintained by KVM but not yet by
>> TCG. So
>> >>  far we unconditionally set them to 0 in the guest which may cause
>> >>  state corruptions - not only during migration.
>> >>
>> >>
>> >>  -#define CPU_SAVE_VERSION 12
>> >>  +#define CPU_SAVE_VERSION 13
>> >>
>> >
>> >  Incrementing the version number seems excessive - I can't imagine a
>> >  real-life guest will break due to fp pointer corruption
>> >
>> >  However, I don't think we have a mechanism for optional state.  We
>> >  discussed this during the 18th VMState Subsection Symposium and IIRC
>> >  agreed to re-raise the issue when we encountered it, which appears
>> to be
>> >  now.
>> >
>>
>> Whatever we invent, it has to be backported as well to allow that
>> infamous traveling back in time, migrating VMs from newer to older
>> versions.
>>
>> Would that backporting be simpler if we used an unconditional subsection
>> for the additional states?
> 
> Most likely.  It depends on what mechanism we use.
> 
> Let's spend some time to think about what it would be like.  This patch
> is not urgent, is it? (i.e. it was discovered by code inspection, not
> live migration that caught the cpu between an instruction that caused a
> math exception and the exception handler).

Right, not urgent, should just make it into 0.15 in the end.

Jan
Avi Kivity June 15, 2011, 9:10 a.m. UTC | #5
On 06/14/2011 09:10 AM, Jan Kiszka wrote:
> On 2011-06-13 10:45, Avi Kivity wrote:
> >  On 06/11/2011 12:23 PM, Jan Kiszka wrote:
> >>  From: Jan Kiszka<jan.kiszka@siemens.com>
> >>
> >>  These FPU states are properly maintained by KVM but not yet by TCG. So
> >>  far we unconditionally set them to 0 in the guest which may cause
> >>  state corruptions - not only during migration.
> >>
> >>
> >>  -#define CPU_SAVE_VERSION 12
> >>  +#define CPU_SAVE_VERSION 13
> >>
> >
> >  Incrementing the version number seems excessive - I can't imagine a
> >  real-life guest will break due to fp pointer corruption
> >
> >  However, I don't think we have a mechanism for optional state.  We
> >  discussed this during the 18th VMState Subsection Symposium and IIRC
> >  agreed to re-raise the issue when we encountered it, which appears to be
> >  now.
> >
>
> Whatever we invent, it has to be backported as well to allow that
> infamous traveling back in time, migrating VMs from newer to older versions.
>
> Would that backporting be simpler if we used an unconditional subsection
> for the additional states?

Thinking about it, a conditional subsection would work fine.  Most 
threads will never see an fpu error, and they all start out with a clean 
slate.

SDM 1 8.1.9.1 says:

> 8.1.9.1 Fopcode Compatibility Sub-mode
> Beginning with the Pentium 4 and Intel Xeon processors, the IA-32
> architecture provides program control over the storing of the last
> instruction opcode (sometimes referred to as the fopcode). Here, bit 2 of
> the IA32_MISC_ENABLE MSR enables (set) or disables (clear) the fopcode
> compatibility mode.
> If FOP code compatibility mode is enabled, the FOP is defined as it has
> always been in previous IA32 implementations (always defined as the FOP of
> the last non-transparent FP instruction executed before a
> FSAVE/FSTENV/FXSAVE). If FOP code compatibility mode is disabled (default),
> FOP is only valid if the last non-transparent FP instruction executed
> before a FSAVE/FSTENV/FXSAVE had an unmasked exception.

So fopcode will usually be clear.
Jan Kiszka June 15, 2011, 10:20 a.m. UTC | #6
On 2011-06-15 11:10, Avi Kivity wrote:
> On 06/14/2011 09:10 AM, Jan Kiszka wrote:
>> On 2011-06-13 10:45, Avi Kivity wrote:
>> >  On 06/11/2011 12:23 PM, Jan Kiszka wrote:
>> >>  From: Jan Kiszka<jan.kiszka@siemens.com>
>> >>
>> >>  These FPU states are properly maintained by KVM but not yet by
>> TCG. So
>> >>  far we unconditionally set them to 0 in the guest which may cause
>> >>  state corruptions - not only during migration.
>> >>
>> >>
>> >>  -#define CPU_SAVE_VERSION 12
>> >>  +#define CPU_SAVE_VERSION 13
>> >>
>> >
>> >  Incrementing the version number seems excessive - I can't imagine a
>> >  real-life guest will break due to fp pointer corruption
>> >
>> >  However, I don't think we have a mechanism for optional state.  We
>> >  discussed this during the 18th VMState Subsection Symposium and IIRC
>> >  agreed to re-raise the issue when we encountered it, which appears
>> to be
>> >  now.
>> >
>>
>> Whatever we invent, it has to be backported as well to allow that
>> infamous traveling back in time, migrating VMs from newer to older
>> versions.
>>
>> Would that backporting be simpler if we used an unconditional subsection
>> for the additional states?
> 
> Thinking about it, a conditional subsection would work fine.  Most
> threads will never see an fpu error, and are all initialized to a clean
> slate.
> 
> SDM 1 8.1.9.1 says:
> 
>> 8.1.9.1 Fopcode Compatibility Sub-mode
>> Beginning with the Pentium 4 and Intel Xeon processors, the IA-32
>> architecture
>> provides program control over the storing of the last instruction
>> opcode (sometimes
>> referred to as the fopcode). Here, bit 2 of the IA32_MISC_ENABLE MSR
>> enables (set)
>> or disables (clear) the fopcode compatibility mode.
>> If FOP code compatibility mode is enabled, the FOP is defined as it
>> has always been
>> in previous IA32 implementations (always defined as the FOP of the
>> last non-trans-
>> parent FP instruction executed before a FSAVE/FSTENV/FXSAVE). If FOP code
>> compatibility mode is disabled (default), FOP is only valid if the
>> last non-transparent
>> FP instruction executed before a FSAVE/FSTENV/FXSAVE had an unmasked
>> exception.
> 
> So fopcode will usually be clear.
> 

OK. So if bit 2 of the IA32_MISC_ENABLE MSR is set, we must save those
fields. But if it's off, how do we test for the other condition, "last
non-transparent FP instruction ... had an unmasked exception", from the host?

Jan
Jan Kiszka June 15, 2011, 10:39 a.m. UTC | #7
On 2011-06-15 12:20, Jan Kiszka wrote:
> On 2011-06-15 11:10, Avi Kivity wrote:
>> On 06/14/2011 09:10 AM, Jan Kiszka wrote:
>>> On 2011-06-13 10:45, Avi Kivity wrote:
>>>>  On 06/11/2011 12:23 PM, Jan Kiszka wrote:
>>>>>  From: Jan Kiszka<jan.kiszka@siemens.com>
>>>>>
>>>>>  These FPU states are properly maintained by KVM but not yet by
>>> TCG. So
>>>>>  far we unconditionally set them to 0 in the guest which may cause
>>>>>  state corruptions - not only during migration.
>>>>>
>>>>>
>>>>>  -#define CPU_SAVE_VERSION 12
>>>>>  +#define CPU_SAVE_VERSION 13
>>>>>
>>>>
>>>>  Incrementing the version number seems excessive - I can't imagine a
>>>>  real-life guest will break due to fp pointer corruption
>>>>
>>>>  However, I don't think we have a mechanism for optional state.  We
>>>>  discussed this during the 18th VMState Subsection Symposium and IIRC
>>>>  agreed to re-raise the issue when we encountered it, which appears
>>> to be
>>>>  now.
>>>>
>>>
>>> Whatever we invent, it has to be backported as well to allow that
>>> infamous traveling back in time, migrating VMs from newer to older
>>> versions.
>>>
>>> Would that backporting be simpler if we used an unconditional subsection
>>> for the additional states?
>>
>> Thinking about it, a conditional subsection would work fine.  Most
>> threads will never see an fpu error, and are all initialized to a clean
>> slate.
>>
>> SDM 1 8.1.9.1 says:
>>
>>> 8.1.9.1 Fopcode Compatibility Sub-mode
>>> Beginning with the Pentium 4 and Intel Xeon processors, the IA-32
>>> architecture
>>> provides program control over the storing of the last instruction
>>> opcode (sometimes
>>> referred to as the fopcode). Here, bit 2 of the IA32_MISC_ENABLE MSR
>>> enables (set)
>>> or disables (clear) the fopcode compatibility mode.
>>> If FOP code compatibility mode is enabled, the FOP is defined as it
>>> has always been
>>> in previous IA32 implementations (always defined as the FOP of the
>>> last non-trans-
>>> parent FP instruction executed before a FSAVE/FSTENV/FXSAVE). If FOP code
>>> compatibility mode is disabled (default), FOP is only valid if the
>>> last non-transparent
>>> FP instruction executed before a FSAVE/FSTENV/FXSAVE had an unmasked
>>> exception.
>>
>> So fopcode will usually be clear.
>>
> 
> OK. So if bit 2 of IA32_MISC_ENABLE MSR, we must save that fields. But
> if it's off, how to test for that other condition "last non-transparent
> FP instruction ... had an unmasked exception" from the host?

I briefly thought about status.ES == 1. But the guest may clear that flag
in its exception handler before reading the opcode etc.

Jan
Avi Kivity June 15, 2011, 11:26 a.m. UTC | #8
On 06/15/2011 01:20 PM, Jan Kiszka wrote:
> >
> >  So fopcode will usually be clear.
> >
>
> OK. So if bit 2 of IA32_MISC_ENABLE MSR, we must save that fields. But
> if it's off, how to test for that other condition "last non-transparent
> FP instruction ... had an unmasked exception" from the host?
>

We save fopcode unconditionally.  But if IA32_MISC_ENABLE_MSR[2]=0, then 
fopcode will be zero, and we can skip the subsection (if the data and 
instruction pointers are also zero, which they will be).

If the bit isn't zero, there's still a good chance fopcode will be zero 
(64-bit userspace, a thread that hasn't used the fpu since the last 
context switch, last opcode happened to be zero).
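
In QEMU terms, the skip test Avi describes maps onto a subsection .needed
callback along these lines (a sketch only - the field names follow the patch
above, the function name is made up):

/* Only transfer the extra FPU state when it actually carries information,
 * i.e. when any of the new fields is non-zero. */
static bool fpop_ip_dp_needed(void *opaque)
{
    CPUState *env = opaque;

    return env->fpop != 0 || env->fpip != 0 || env->fpdp != 0;
}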
Jan Kiszka June 15, 2011, 11:32 a.m. UTC | #9
On 2011-06-15 13:26, Avi Kivity wrote:
> On 06/15/2011 01:20 PM, Jan Kiszka wrote:
>> >
>> >  So fopcode will usually be clear.
>> >
>>
>> OK. So if bit 2 of IA32_MISC_ENABLE MSR, we must save that fields. But
>> if it's off, how to test for that other condition "last non-transparent
>> FP instruction ... had an unmasked exception" from the host?
>>
> 
> We save fopcode unconditionally.  But if IA32_MISC_ENABLE_MSR[2]=0, then
> fopcode will be zero, and we can skip the subsection (if the data and
> instruction pointers are also zero, which they will be).
> 
> If it isn't zero, there's still a good chance fopcode will be zero
> (64-bit userspace, thread that hasn't used the fpu since the last
> context switch, last opcode happened to be zero).

I haven't yet found "if fopcode is invalid, it is zero, just like IP and DP"
in the spec. What clears them reliably?

Jan
Avi Kivity June 15, 2011, 11:33 a.m. UTC | #10
On 06/15/2011 02:32 PM, Jan Kiszka wrote:
> >
> >  If it isn't zero, there's still a good chance fopcode will be zero
> >  (64-bit userspace, thread that hasn't used the fpu since the last
> >  context switch, last opcode happened to be zero).
>
> I do not yet find "if fopcode is invalid, it is zero, just as IP and DP"
> in the spec. What clears them reliably?

FNINIT
Jan Kiszka June 15, 2011, 11:45 a.m. UTC | #11
On 2011-06-15 13:33, Avi Kivity wrote:
> On 06/15/2011 02:32 PM, Jan Kiszka wrote:
>>>
>>>  If it isn't zero, there's still a good chance fopcode will be zero
>>>  (64-bit userspace, thread that hasn't used the fpu since the last
>>>  context switch, last opcode happened to be zero).
>>
>> I do not yet find "if fopcode is invalid, it is zero, just as IP and DP"
>> in the spec. What clears them reliably?
> 
> FNINIT

OK, I see. So we simply check for all fields being zero and skip the
subsection in that case. The MSR doesn't actually matter to us here.

Will write v2.

Jan
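
A v2 built around such a conditional subsection could then look roughly like
this in target-i386/machine.c, reusing a predicate like the one sketched
earlier (names and the exact VMStateSubsection layout are illustrative for
the QEMU code base of that time, not the final patch):

/* The three new fields move out of vmstate_cpu's versioned field list and
 * into a subsection that is only sent when fpop_ip_dp_needed() returns true,
 * so migration to older QEMU versions keeps working in the all-zero case. */
static const VMStateDescription vmstate_fpop_ip_dp = {
    .name = "cpu/fpop_ip_dp",
    .version_id = 1,
    .minimum_version_id = 1,
    .fields = (VMStateField[]) {
        VMSTATE_UINT16(fpop, CPUState),
        VMSTATE_UINT64(fpip, CPUState),
        VMSTATE_UINT64(fpdp, CPUState),
        VMSTATE_END_OF_LIST()
    }
};

/* ... wired into vmstate_cpu via: */
    .subsections = (VMStateSubsection[]) {
        {
            .vmsd = &vmstate_fpop_ip_dp,
            .needed = fpop_ip_dp_needed,
        },
        { /* end of list */ }
    }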
Christophe Fergeau June 16, 2011, 9:35 a.m. UTC | #12
Hi Jan,

On Sat, Jun 11, 2011 at 11:23:31AM +0200, Jan Kiszka wrote:
> These FPU states are properly maintained by KVM but not yet by TCG. So
> far we unconditionally set them to 0 in the guest which may cause
> state corruptions - not only during migration.

I can't judge whether the patch is correct or not, but I can confirm it
fixes my compilation problem. Feel free to add an Acked-by-me if that makes
sense.

Christophe

Patch

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 9c3340d..3c2dab9 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -641,6 +641,10 @@  typedef struct CPUX86State {
     uint16_t fpuc;
     uint8_t fptags[8];   /* 0 = valid, 1 = empty */
     FPReg fpregs[8];
+    /* KVM-only so far */
+    uint16_t fpop;
+    uint64_t fpip;
+    uint64_t fpdp;
 
     /* emulator internal variables */
     float_status fp_status;
@@ -942,7 +946,7 @@  uint64_t cpu_get_tsc(CPUX86State *env);
 #define cpu_list_id x86_cpu_list
 #define cpudef_setup	x86_cpudef_setup
 
-#define CPU_SAVE_VERSION 12
+#define CPU_SAVE_VERSION 13
 
 /* MMU modes definitions */
 #define MMU_MODE0_SUFFIX _kernel
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5ebb054..938e0a3 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -718,6 +718,9 @@  static int kvm_put_fpu(CPUState *env)
     fpu.fsw = env->fpus & ~(7 << 11);
     fpu.fsw |= (env->fpstt & 7) << 11;
     fpu.fcw = env->fpuc;
+    fpu.last_opcode = env->fpop;
+    fpu.last_ip = env->fpip;
+    fpu.last_dp = env->fpdp;
     for (i = 0; i < 8; ++i) {
         fpu.ftwx |= (!env->fptags[i]) << i;
     }
@@ -740,7 +743,7 @@  static int kvm_put_xsave(CPUState *env)
 {
     int i, r;
     struct kvm_xsave* xsave;
-    uint16_t cwd, swd, twd, fop;
+    uint16_t cwd, swd, twd;
 
     if (!kvm_has_xsave()) {
         return kvm_put_fpu(env);
@@ -748,7 +751,7 @@  static int kvm_put_xsave(CPUState *env)
 
     xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
     memset(xsave, 0, sizeof(struct kvm_xsave));
-    cwd = swd = twd = fop = 0;
+    cwd = swd = twd = 0;
     swd = env->fpus & ~(7 << 11);
     swd |= (env->fpstt & 7) << 11;
     cwd = env->fpuc;
@@ -756,7 +759,9 @@  static int kvm_put_xsave(CPUState *env)
         twd |= (!env->fptags[i]) << i;
     }
     xsave->region[0] = (uint32_t)(swd << 16) + cwd;
-    xsave->region[1] = (uint32_t)(fop << 16) + twd;
+    xsave->region[1] = (uint32_t)(env->fpop << 16) + twd;
+    memcpy(&xsave->region[XSAVE_CWD_RIP], &env->fpip, sizeof(env->fpip));
+    memcpy(&xsave->region[XSAVE_CWD_RDP], &env->fpdp, sizeof(env->fpdp));
     memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
             sizeof env->fpregs);
     memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
@@ -921,6 +926,9 @@  static int kvm_get_fpu(CPUState *env)
     env->fpstt = (fpu.fsw >> 11) & 7;
     env->fpus = fpu.fsw;
     env->fpuc = fpu.fcw;
+    env->fpop = fpu.last_opcode;
+    env->fpip = fpu.last_ip;
+    env->fpdp = fpu.last_dp;
     for (i = 0; i < 8; ++i) {
         env->fptags[i] = !((fpu.ftwx >> i) & 1);
     }
@@ -935,7 +943,7 @@  static int kvm_get_xsave(CPUState *env)
 {
     struct kvm_xsave* xsave;
     int ret, i;
-    uint16_t cwd, swd, twd, fop;
+    uint16_t cwd, swd, twd;
 
     if (!kvm_has_xsave()) {
         return kvm_get_fpu(env);
@@ -951,13 +959,15 @@  static int kvm_get_xsave(CPUState *env)
     cwd = (uint16_t)xsave->region[0];
     swd = (uint16_t)(xsave->region[0] >> 16);
     twd = (uint16_t)xsave->region[1];
-    fop = (uint16_t)(xsave->region[1] >> 16);
+    env->fpop = (uint16_t)(xsave->region[1] >> 16);
     env->fpstt = (swd >> 11) & 7;
     env->fpus = swd;
     env->fpuc = cwd;
     for (i = 0; i < 8; ++i) {
         env->fptags[i] = !((twd >> i) & 1);
     }
+    memcpy(&env->fpip, &xsave->region[XSAVE_CWD_RIP], sizeof(env->fpip));
+    memcpy(&env->fpdp, &xsave->region[XSAVE_CWD_RDP], sizeof(env->fpdp));
     env->mxcsr = xsave->region[XSAVE_MXCSR];
     memcpy(env->fpregs, &xsave->region[XSAVE_ST_SPACE],
             sizeof env->fpregs);
diff --git a/target-i386/machine.c b/target-i386/machine.c
index bbeae88..e02c2a3 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -390,6 +390,10 @@  static const VMStateDescription vmstate_cpu = {
         VMSTATE_UINT64_V(xcr0, CPUState, 12),
         VMSTATE_UINT64_V(xstate_bv, CPUState, 12),
         VMSTATE_YMMH_REGS_VARS(ymmh_regs, CPUState, CPU_NB_REGS, 12),
+        /* Further FPU states */
+        VMSTATE_UINT16_V(fpop, CPUState, 13),
+        VMSTATE_UINT64_V(fpip, CPUState, 13),
+        VMSTATE_UINT64_V(fpdp, CPUState, 13),
         VMSTATE_END_OF_LIST()
         /* The above list is not sorted /wrt version numbers, watch out! */
     },