diff mbox

Fix bug in implementation of SYSRET instruction for x86-64

Message ID 201503040948.05637.wpaul@windriver.com
State New
Headers show

Commit Message

Bill Paul March 4, 2015, 5:48 p.m. UTC
Hi guys. I seem to have found a bug in the helper_systet() function in
target-i386/seg_helper.c. I downloaded the Intel architecture manual from 
here:

http://www.intel.com/content/www/us/en/processors/architectures-software-
developer-manuals.html

And it describes the behavior of SYSRET with regards to the stack selector 
(SS) as follows:

SS.Selector <-- (IA32_STAR[63:48]+8) OR 3; (* RPL forced to 3 *)

In other words, the value of the SS register is supposed to be loaded from 
bits 63:48 of the IA32_STAR model-specific register (MSR), incremented by 8, 
_AND_ ORed with 3. ORing in the 3 sets the privilege level to 3 (user). This 
is done since SYSRET returns to user mode after a system call.

The code in helper_sysret() performs the "increment by 8" step, but _NOT_ the 
"OR with 3" step.

Here's another description of the SYSRET instruction which says basically the 
same thing, though in slightly different format:

http://tptp.cc/mirrors/siyobik.info/instruction/SYSRET.html

[...]
        SS(SEL) = IA32_STAR[63:48] + 8;
        SS(PL) = 0x3;
[...]

The effect of this bug is that when x86-64 code uses the SYSCALL instruction 
to enter kernel mode, the SYSRET instruction will return the CPU to user mode 
with the SS register incorrectly set (the privilege level bits will be 0). In 
my case, the original SS value for user mode was 0x2B, but after the return 
from SYSRET, it was changed to 0x28. It took me quite some time to realize 
this was due to a bug in QEMU rather than my own code.

This caused a problem later when the user-mode code was preempted by an 
interrupt. At the end of the interrupt handling, an IRET instruction was used 
to exit the interrupt context, and the IRET instruction would generate a 
general protection fault because it would attempt to validate the stack 
selector value and notice that it was set for PL 0 (supervisor) instead of PL 
3 (user).

From my reading, the code is quite clearly in error with respect to the Intel 
documentation, and fixing the bug in my local sources results in my test code 
(which runs correctly on real hardware) working correctly in QEMU. It seems 
this bug has been there for a long time -- I happened to have the sources for 
QEMU 0.10.5 and the bug is there too (in target-i386/op_helper.c). I'm puzzled 
as to how it's gone unnoticed for so long, which has me thinking that maybe 
I'm missing something. I'm pretty sure this is wrong though.

I'm including a patch to fix this below. It seems to fix the problem quite 
nicely on my QEMU 2.2.0 installation. I'm also attaching a separate copy in 
case my mail client mangles the formatting on the inline copy.

-Bill

Comments

John Snow March 5, 2015, 5:21 p.m. UTC | #1
CC'ing X86 maintainers.

On 03/04/2015 12:48 PM, Bill Paul wrote:
> Hi guys. I seem to have found a bug in the helper_systet() function in
>
> target-i386/seg_helper.c. I downloaded the Intel architecture manual
> from here:
>
> http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
>
> And it describes the behavior of SYSRET with regards to the stack
> selector (SS) as follows:
>
> SS.Selector <-- (IA32_STAR[63:48]+8) OR 3; (* RPL forced to 3 *)
>
> In other words, the value of the SS register is supposed to be loaded
> from bits 63:48 of the IA32_STAR model-specific register (MSR),
> incremented by 8, _AND_ ORed with 3. ORing in the 3 sets the privilege
> level to 3 (user). This is done since SYSRET returns to user mode after
> a system call.
>
> The code in helper_sysret() performs the "increment by 8" step, but
> _NOT_ the "OR with 3" step.
>
> Here's another description of the SYSRET instruction which says
> basically the same thing, though in slightly different format:
>
> http://tptp.cc/mirrors/siyobik.info/instruction/SYSRET.html
>
> [...]
>
> SS(SEL) = IA32_STAR[63:48] + 8;
>
> SS(PL) = 0x3;
>
> [...]
>
> The effect of this bug is that when x86-64 code uses the SYSCALL
> instruction to enter kernel mode, the SYSRET instruction will return the
> CPU to user mode with the SS register incorrectly set (the privilege
> level bits will be 0). In my case, the original SS value for user mode
> was 0x2B, but after the return from SYSRET, it was changed to 0x28. It
> took me quite some time to realize this was due to a bug in QEMU rather
> than my own code.
>
> This caused a problem later when the user-mode code was preempted by an
> interrupt. At the end of the interrupt handling, an IRET instruction was
> used to exit the interrupt context, and the IRET instruction would
> generate a general protection fault because it would attempt to validate
> the stack selector value and notice that it was set for PL 0
> (supervisor) instead of PL 3 (user).
>
>  From my reading, the code is quite clearly in error with respect to the
> Intel documentation, and fixing the bug in my local sources results in
> my test code (which runs correctly on real hardware) working correctly
> in QEMU. It seems this bug has been there for a long time -- I happened
> to have the sources for QEMU 0.10.5 and the bug is there too (in
> target-i386/op_helper.c). I'm puzzled as to how it's gone unnoticed for
> so long, which has me thinking that maybe I'm missing something. I'm
> pretty sure this is wrong though.
>
> I'm including a patch to fix this below. It seems to fix the problem
> quite nicely on my QEMU 2.2.0 installation. I'm also attaching a
> separate copy in case my mail client mangles the formatting on the
> inline copy.
>
> -Bill
>
> --
>
> =============================================================================
>
> -Bill Paul (510) 749-2329 | Senior Member of Technical Staff,
>
> wpaul@windriver.com | Master of Unix-Fu - Wind River Systems
>
> =============================================================================
>
> "I put a dollar in a change machine. Nothing changed." - George Carlin
>
> =============================================================================
>
> Signed-off-by: Bill Paul <wpaul@windriver.com>
>
> ---
>
> target-i386/seg_helper.c | 4 ++--
>
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target-i386/seg_helper.c b/target-i386/seg_helper.c
>
> index fa374d0..2bc757a 100644
>
> --- a/target-i386/seg_helper.c
>
> +++ b/target-i386/seg_helper.c
>
> @@ -1043,7 +1043,7 @@ void helper_sysret(CPUX86State *env, int dflag)
>
> DESC_CS_MASK | DESC_R_MASK | DESC_A_MASK);
>
> env->eip = (uint32_t)env->regs[R_ECX];
>
> }
>
> - cpu_x86_load_seg_cache(env, R_SS, selector + 8,
>
> + cpu_x86_load_seg_cache(env, R_SS, (selector + 8) | 3,
>
> 0, 0xffffffff,
>
> DESC_G_MASK | DESC_B_MASK | DESC_P_MASK |
>
> DESC_S_MASK | (3 << DESC_DPL_SHIFT) |
>
> @@ -1056,7 +1056,7 @@ void helper_sysret(CPUX86State *env, int dflag)
>
> DESC_S_MASK | (3 << DESC_DPL_SHIFT) |
>
> DESC_CS_MASK | DESC_R_MASK | DESC_A_MASK);
>
> env->eip = (uint32_t)env->regs[R_ECX];
>
> - cpu_x86_load_seg_cache(env, R_SS, selector + 8,
>
> + cpu_x86_load_seg_cache(env, R_SS, (selector + 8) | 3,
>
> 0, 0xffffffff,
>
> DESC_G_MASK | DESC_B_MASK | DESC_P_MASK |
>
> DESC_S_MASK | (3 << DESC_DPL_SHIFT) |
>
> --
>
> 1.8.0
>
diff mbox

Patch

From e85d3d374594f6c31e1bf72f8912334194be8654 Mon Sep 17 00:00:00 2001
From: Bill Paul <wpaul@windriver.com>
Date: Wed, 4 Mar 2015 09:19:03 -0800
Subject: [PATCH] This checkin corrects a bug in the implementation of the
 sysret instruction. Per the Intel architecture manual, the
 stack selector (SS) is to be loaded from the IA32_STAR
 register, incremented by 8, _AND_ ORed with 3 (this forces
 the privilege level to 3). The latter step (ORing in the 3)
 is missing. This breaks the machine behavior and can lead
 to correct code malfunctioning in some circumstances.

Signed-off-by: Bill Paul <wpaul@windriver.com>

---
 target-i386/seg_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-i386/seg_helper.c b/target-i386/seg_helper.c
index fa374d0..2bc757a 100644
--- a/target-i386/seg_helper.c
+++ b/target-i386/seg_helper.c
@@ -1043,7 +1043,7 @@  void helper_sysret(CPUX86State *env, int dflag)
                                    DESC_CS_MASK | DESC_R_MASK | DESC_A_MASK);
             env->eip = (uint32_t)env->regs[R_ECX];
         }
-        cpu_x86_load_seg_cache(env, R_SS, selector + 8,
+        cpu_x86_load_seg_cache(env, R_SS, (selector + 8) | 3,
                                0, 0xffffffff,
                                DESC_G_MASK | DESC_B_MASK | DESC_P_MASK |
                                DESC_S_MASK | (3 << DESC_DPL_SHIFT) |
@@ -1056,7 +1056,7 @@  void helper_sysret(CPUX86State *env, int dflag)
                                DESC_S_MASK | (3 << DESC_DPL_SHIFT) |
                                DESC_CS_MASK | DESC_R_MASK | DESC_A_MASK);
         env->eip = (uint32_t)env->regs[R_ECX];
-        cpu_x86_load_seg_cache(env, R_SS, selector + 8,
+        cpu_x86_load_seg_cache(env, R_SS, (selector + 8) | 3,
                                0, 0xffffffff,
                                DESC_G_MASK | DESC_B_MASK | DESC_P_MASK |
                                DESC_S_MASK | (3 << DESC_DPL_SHIFT) |
-- 
1.8.0