SMI handler should set the CPL to zero and save and restore it on rsm.

Message ID: 5372636F.8080101@redhat.com
State: New

Commit Message

Paolo Bonzini May 13, 2014, 6:24 p.m. UTC
On 27/04/2014 19:25, Kevin O'Connor wrote:
> I was wondering about that as well.  The Intel docs state that the CPL
> is bits 0-1 of the CS.selector register, and that protected mode
> starts immediately after setting the PE bit.  The CS.selector field
> should be the value of %cs in real mode, which is the value added to
> eip (after shifting right by 4).
> 
> I guess that means that the real mode code that enables the PE bit
> must run with a code segment aligned to a value of 4.  (Which
> effectively means code alignment of 64 bytes because of the segment
> shift.)

It turns out that this is not a requirement, which means that the
protected mode transition is exactly the only place where CPL is not 
redundant.  The CPL remains zero until you reload CS with a long jump.

Your patch gets it right because after a CR0 write it doesn't attempt 
to recompute the CPL, but you need the following partial revert in 
order to satisfy virtualization extensions (SVM).  Without it, the
guest will triple fault after setting CR0.PE=1, unless CS's low 2 bits
are 00.  The hypervisor gets a CR0_WRITE vmexit, but then the processor
fails to execute guest code from a non-conforming ring-0 code segment
at CPL>0.
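
To make the scenario concrete, here is a minimal sketch in QEMU terms of
the state being described (the function name is made up for illustration;
it assumes only the existing CPUX86State segment cache and HF_CPL_MASK,
and uses the %cs=0xf003 value tested later in this thread):

    #include <assert.h>
    #include "cpu.h"            /* CPUX86State, R_CS, HF_CPL_MASK */

    /* Right after the guest sets CR0.PE=1 while running with
     * %cs = 0xf003: the selector's low two bits are 3, yet the
     * architectural CPL is still 0 until a far jump reloads CS,
     * so the CPL must not be recomputed as CS.selector & 3 when
     * the CR0_WRITE vmexit is handled. */
    static void check_pe_transition_state(CPUX86State *env)
    {
        assert(env->segs[R_CS].selector == 0xf003);
        assert((env->segs[R_CS].selector & 3) == 3);
        assert((env->hflags & HF_CPL_MASK) == 0);
    }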

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Comments

Kevin O'Connor May 13, 2014, 6:39 p.m. UTC | #1
On Tue, May 13, 2014 at 08:24:47PM +0200, Paolo Bonzini wrote:
> On 27/04/2014 19:25, Kevin O'Connor wrote:
> > I was wondering about that as well.  The Intel docs state that the CPL
> > is bits 0-1 of the CS.selector register, and that protected mode
> > starts immediately after setting the PE bit.  The CS.selector field
> > should be the value of %cs in real mode, which is the value added to
> > eip (after shifting right by 4).
> > 
> > I guess that means that the real mode code that enables the PE bit
> > must run with a code segment aligned to a value of 4.  (Which
> > effectively means code alignment of 64 bytes because of the segment
> > shift.)
> 
> It turns out that this is not a requirement, which means that the
> protected mode transition is exactly the only place where CPL is not 
> redundant.  The CPL remains zero until you reload CS with a long jump.

That doesn't sound right.  What happens if the processor takes an NMI,
SMI, or VMEXIT after it enables protected mode but before
it long jumps?  The processor would have to save and restore the CPL
somewhere for all of these situations.

If you are right, I think the whole series needs to be reworked.

-Kevin
Paolo Bonzini May 13, 2014, 6:57 p.m. UTC | #2
On 13/05/2014 20:39, Kevin O'Connor wrote:
> On Tue, May 13, 2014 at 08:24:47PM +0200, Paolo Bonzini wrote:
>> On 27/04/2014 19:25, Kevin O'Connor wrote:
>>> I was wondering about that as well.  The Intel docs state that the CPL
>>> is bits 0-1 of the CS.selector register, and that protected mode
>>> starts immediately after setting the PE bit.  The CS.selector field
>>> should be the value of %cs in real mode, which is the value added to
>>> eip (after shifting right by 4).
>>>
>>> I guess that means that the real mode code that enables the PE bit
>>> must run with a code segment aligned to a value of 4.  (Which
>>> effectively means code alignment of 64 bytes because of the segment
>>> shift.)
>>
>> It turns out that this is not a requirement, which means that the
>> protected mode transition is exactly the only place where CPL is not
>> redundant.  The CPL remains zero until you reload CS with a long jump.
>
> That doesn't sound right.  What happens if the processor takes an NMI,
> SMI, or VMEXIT after it enables protected mode but before
> it long jumps?  The processor would have to save and restore the CPL
> somewhere for all of these situations.

For VMEXITs it's up to the hypervisor to make it work properly.  I just 
posted fixes for KVM today.

I guess the answer for NMIs is "good luck".  But in the case of NMIs, 
wouldn't it be broken anyway, because the IDT format is different 
between real mode and protected mode?

For SMIs, http://www.sandpile.org/x86/smm.htm says that the CPL is 
stored somewhere in SMRAM.  I think your patches are an improvement 
anyway; we can build a more complete fix on top of them.
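
As a rough illustration of what the subject line asks for, in QEMU terms:
the SMRAM offset below is a made-up placeholder, not the save-state layout
documented at sandpile.org, and only the existing stb_phys/ldub_phys
helpers, HF_CPL_MASK and the cpu_x86_set_cpl() wrapper from the patch are
assumed.

    /* Hypothetical sketch: save the CPL on SMI entry and force it to 0,
     * then restore it on RSM.  SMRAM_CPL_OFFSET is an assumed
     * placeholder, not the layout of real processors. */
    #define SMRAM_CPL_OFFSET 0x7f00

    static void smm_enter_save_cpl(CPUState *cs, CPUX86State *env)
    {
        stb_phys(cs->as, env->smbase + 0x8000 + SMRAM_CPL_OFFSET,
                 env->hflags & HF_CPL_MASK);
        cpu_x86_set_cpl(env, 0);      /* the SMI handler runs at CPL 0 */
    }

    static void rsm_restore_cpl(CPUState *cs, CPUX86State *env)
    {
        cpu_x86_set_cpl(env, ldub_phys(cs->as, env->smbase + 0x8000 +
                                       SMRAM_CPL_OFFSET) & 3);
    }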

Paolo

> If you are right, I think the whole series needs to be reworked.
Paolo Bonzini May 13, 2014, 7:42 p.m. UTC | #3
On 13/05/2014 20:57, Paolo Bonzini wrote:
> On 13/05/2014 20:39, Kevin O'Connor wrote:
>> That doesn't sound right.  What happens if the processor takes an NMI,
>> SMI, or VMEXIT after it enables protected mode but before
>> it long jumps?  The processor would have to save and restore the CPL
>> somewhere for all of these situations.
>
> For VMEXITs it's up to the hypervisor to make it work properly.  I just
> posted fixes for KVM today.
>
> I guess the answer for NMIs is "good luck".  But in the case of NMIs,
> wouldn't it be broken anyway, because the IDT format is different
> between real mode and protected mode?
>
> For SMIs, http://www.sandpile.org/x86/smm.htm says that the CPL is
> stored somewhere in SMRAM.  I think your patches are an improvement
> anyway; we can build a more complete fix on top of them.

On second thought, the CPL should always be equal to SS.DPL, even during 
real mode transitions.  Unlike CS.RPL, SS.DPL is hidden in the internal 
segment descriptor registers and is always zero.

I say *should*, because of course there is an exception. :)  CPL is 
forced to 3 by SYSRET, even if it loads SS with a selector whose RPL is 
not 3.  So we're better off saving CPL in SMRAM (and perhaps making the
SMRAM map equal to that of real processors).

In any case, I think this is an example of how your patches are an 
improvement; it would be trivial to make CPL=SS.DPL on top of them.
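
For what it's worth, a minimal sketch of the CPL=SS.DPL idea on top of the
cpu_x86_set_cpl() wrapper added in the patch (the helper name is made up;
it assumes only QEMU's existing segment cache and DESC_DPL_SHIFT, and
SYSRET would still need to be special-cased since it forces CPL to 3):

    /* Derive the CPL from the hidden SS.DPL in the segment cache; in
     * real mode and across the PE transition this is 0, matching the
     * discussion above. */
    static inline int cpl_from_ss_dpl(CPUX86State *env)
    {
        return (env->segs[R_SS].flags >> DESC_DPL_SHIFT) & 3;
    }

    /* e.g. whenever SS is (re)loaded:
     *     cpu_x86_set_cpl(env, cpl_from_ss_dpl(env));          */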

Paolo
Kevin O'Connor May 13, 2014, 10:07 p.m. UTC | #4
On Tue, May 13, 2014 at 02:39:20PM -0400, Kevin O'Connor wrote:
> On Tue, May 13, 2014 at 08:24:47PM +0200, Paolo Bonzini wrote:
> > On 27/04/2014 19:25, Kevin O'Connor wrote:
> > > I was wondering about that as well.  The Intel docs state that the CPL
> > > is bits 0-1 of the CS.selector register, and that protected mode
> > > starts immediately after setting the PE bit.  The CS.selector field
> > > should be the value of %cs in real mode, which is the value added to
> > > eip (after shifting right by 4).
> > > 
> > > I guess that means that the real mode code that enables the PE bit
> > > must run with a code segment aligned to a value of 4.  (Which
> > > effectively means code alignment of 64 bytes because of the segment
> > > shift.)
> > 
> > It turns out that this is not a requirement, which means that the
> > protected mode transition is exactly the only place where CPL is not 
> > redundant.  The CPL remains zero until you reload CS with a long jump.
> 
> That doesn't sound right.

FYI, I ran a couple of tests on a real machine where I enabled protected
mode while %cs=0xf003, and I can confirm that it doesn't cause faults.
So, you are correct: the CPL appears to be stored separately from
%cs[1:0] and isn't altered until CS is reloaded.

-Kevin

Patch

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index e9cbdab..478f356 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -124,9 +124,9 @@ 
 #define ID_MASK                 0x00200000
 
 /* hidden flags - used internally by qemu to represent additional cpu
-   states. Only the INHIBIT_IRQ, SMM and SVMI are not redundant. We
-   avoid using the IOPL_MASK, TF_MASK, VM_MASK and AC_MASK bit
-   positions to ease oring with eflags. */
+   states. Only the CPL, INHIBIT_IRQ, SMM and SVMI are not
+   redundant. We avoid using the IOPL_MASK, TF_MASK, VM_MASK and AC_MASK
+   bit positions to ease oring with eflags. */
 /* current cpl */
 #define HF_CPL_SHIFT         0
 /* true if soft mmu is being used */
@@ -1052,6 +1052,16 @@  int cpu_x86_get_descr_debug(CPUX86State *env, unsigned int selector,
                             target_ulong *base, unsigned int *limit,
                             unsigned int *flags);
 
+/* wrapper, just in case memory mappings must be changed */
+static inline void cpu_x86_set_cpl(CPUX86State *s, int cpl)
+{
+#if HF_CPL_MASK == 3
+    s->hflags = (s->hflags & ~HF_CPL_MASK) | cpl;
+#else
+#error HF_CPL_MASK is hardcoded
+#endif
+}
+
 /* op_helper.c */
 /* used for debug or cpu save/restore */
 void cpu_get_fp80(uint64_t *pmant, uint16_t *pexp, floatx80 f);
diff --git a/target-i386/svm_helper.c b/target-i386/svm_helper.c
index 846eaa5..29ca012 100644
--- a/target-i386/svm_helper.c
+++ b/target-i386/svm_helper.c
@@ -282,6 +282,9 @@  void helper_vmrun(CPUX86State *env, int aflag, int next_eip_addend)
                           env->vm_vmcb + offsetof(struct vmcb, save.dr7));
     env->dr[6] = ldq_phys(cs->as,
                           env->vm_vmcb + offsetof(struct vmcb, save.dr6));
+    cpu_x86_set_cpl(env, ldub_phys(cs->as,
+                                   env->vm_vmcb + offsetof(struct vmcb,
+                                                           save.cpl)));
 
     /* FIXME: guest state consistency checks */