diff mbox

[SeaBIOS,v3] fw/msr_feature_control: add support to set MSR_IA32_FEATURE_CONTROL

Message ID 2af5d48b-4bcc-d25d-e41e-5c41cf74b631@redhat.com
State New
Headers show

Commit Message

Paolo Bonzini July 6, 2016, 12:18 p.m. UTC
On 06/07/2016 13:04, Laszlo Ersek wrote:
> Actually, I think there is a bug in KVM at the moment. I ran the
> following test:
> 
> - modified OVMF to set the MSR to value 0x5 on just the BSP
> - booted an i440fx and a Q35 (SMM-enabled) OVMF guest
> - checked "rdmsr -a 0x3a" in both
> - ran "pm-suspend" in both guests, woke them
> - repeated the rdmsr command
> 
> The result is that the BSP had the 0x5 MSR value both after cold boot
> and after S3 resume. So, KVM does not seem to implement clearing of the MSR.

I suspect that the KVM module is getting in the way.  Try removing it
or go lower-level.  I did the following test:

- Applied the following patch to x86/s3.c from kvm-unit-tests:


- Compiled the latest SeaBIOS

- Applied the QEMU patches for LMCE

- ran the following:

../qemu/+build/x86_64-softmmu/qemu-system-x86_64 -display none \
	--enable-kvm -m 512 -serial mon:stdio \
	-cpu host,+vmx -kernel ./x86/s3.flat \
	-global PIIX4_PM.disable_s3=0 -device isa-debug-exit,iobase=0xf4

and my output is:

enabling apic
FACS is at 0x1ffe0000
resume vector addr is 0x1ffe000c
copy resume code from 0x400340
PM1a event registers at 600
Current value of MSR is 0x5
Value after S3 resume is 0


I also tried using BITS from https://biosbits.org/downloads/bits-2073.zip
with OVMF:

>>> import bits
>>> cpu = bits.cpus()[0]
>>> bits.rdmsr(cpu, 0x3a)
0L
>>> bits.wrmsr(cpu, 0x3a, 5)
True
>>> bits.rdmsr(cpu, 0x3a)   
5L

After a full system reset (I don't know if BITS can do S3! :)) the rdmsr gave
zero again.  Tracing KVM confirmed that the OVMF I used doesn't touch the MSR.

> I checked kvm/next (currently at
> 196f20ca52e8c7281932663c348fa54b82d03914), and vmx_vcpu_reset() does not
> seem to zero vmx->msr_ia32_feature_control.

This is true, but QEMU does zero it.

> Now, what I absolutely can't tell you is whether this zeroing should
> happen regardless of "init_event", or just for a specific value of
> "init_event". Whenever I look at "init_event", I have to track it down
> to the commit that added it, then locate all the commits that fixed it,
> then guess whether the SDM language "logical processor reset" implies a
> specific "init_event" value or not. So, I have no idea about "init_event".

It should be preserved by INIT, but not by reset or S3.

> So I need to know whether those INIT-SIPI-SIPI sequences are supposed to
> clear the MSR -- in other words, whether I have to write patches that
> explicitly sustain the MSR across these IPIs.

No, INIT hardly changes any MSR.

Paolo

Comments

Laszlo Ersek July 6, 2016, 3:43 p.m. UTC | #1
On 07/06/16 14:18, Paolo Bonzini wrote:
> 
> 
> On 06/07/2016 13:04, Laszlo Ersek wrote:

>> I checked kvm/next (currently at
>> 196f20ca52e8c7281932663c348fa54b82d03914), and vmx_vcpu_reset() does not
>> seem to zero vmx->msr_ia32_feature_control.
> 
> This is true, but QEMU does zero it.

Indeed. Sorry for wasting your time.

Although, I am very happy about the following:

>> Now, what I absolutely can't tell you is whether this zeroing should
>> happen regardless of "init_event", or just for a specific value of
>> "init_event". Whenever I look at "init_event", I have to track it down
>> to the commit that added it, then locate all the commits that fixed it,
>> then guess whether the SDM language "logical processor reset" implies a
>> specific "init_event" value or not. So, I have no idea about "init_event".
> 
> It should be preserved by INIT, but not by reset or S3.

This is great (I hope!) because this way I have a chance to stay out of
the implementations of EFI_MP_SERVICES_PROTOCOL and
EFI_PEI_MP_SERVICES_PPI. (They live under UefiCpuPkg.)

I've worried that if I only *call* these interfaces to set the MSR, then
the next (independent) use of the same interfaces would clear the MSR
through the INIT-SIPI-SIPI. That would have forced me to modify the
protocol / PPI implementations so that any use of them would reprogram
the MSR every time, after the INIT-SIPI-SIPI.

This way however (hopefully) it should suffice to call the PPI only --
the results should survive from PEI to DXE to the runtime OS on the
normal boot path, and from PEI to the runtime OS on the S3 resume path.

>> So I need to know whether those INIT-SIPI-SIPI sequences are supposed to
>> clear the MSR -- in other words, whether I have to write patches that
>> explicitly sustain the MSR across these IPIs.
> 
> No, INIT hardly changes any MSR.

Thank you for your help!
Laszlo
Paolo Bonzini July 7, 2016, 12:12 p.m. UTC | #2
> I've worried that if I only *call* these interfaces to set the MSR, then
> the next (independent) use of the same interfaces would clear the MSR
> through the INIT-SIPI-SIPI. That would have forced me to modify the
> protocol / PPI implementations so that any use of them would reprogram
> the MSR every time, after the INIT-SIPI-SIPI.
> 
> This way however (hopefully) it should suffice to call the PPI only --
> the results should survive from PEI to DXE to the runtime OS on the
> normal boot path, and from PEI to the runtime OS on the S3 resume path.

This is correct (for what I understand).  Out of curiosity, why is
it not enough to just add the MSR to the ACPI_CPU_DATA?  Or is it
what you're doing, but OVMF was not restoring MTRRs at S3 resume
time either?

Paolo
Laszlo Ersek July 7, 2016, 12:44 p.m. UTC | #3
On 07/07/16 14:12, Paolo Bonzini wrote:
> 
>> I've worried that if I only *call* these interfaces to set the MSR, then
>> the next (independent) use of the same interfaces would clear the MSR
>> through the INIT-SIPI-SIPI. That would have forced me to modify the
>> protocol / PPI implementations so that any use of them would reprogram
>> the MSR every time, after the INIT-SIPI-SIPI.
>>
>> This way however (hopefully) it should suffice to call the PPI only --
>> the results should survive from PEI to DXE to the runtime OS on the
>> normal boot path, and from PEI to the runtime OS on the S3 resume path.
> 
> This is correct (for what I understand).  Out of curiosity, why is
> it not enough to just add the MSR to the ACPI_CPU_DATA?  Or is it
> what you're doing, but OVMF was not restoring MTRRs at S3 resume
> time either?

At the time of writing the above emails, I had not done anything in
particular yet. I was just focusing on an idea that would hopefully
allow me to keep things simple (and under OvmfPkg).

The idea is that I set the feature control MSR once in PlatformPei (on
all the CPUs), using EFI_PEI_MP_SERVICES_PPI. This would happen
regardless of the boot path (S3 vs. normal boot path). Then the MSR set
thus would survive into the OS (through DXE on the nromal boot path, and
directly on the S3 boot path).

Until you confirmed that INIT IPIs would not clear the MSR, the above
idea was not feasible, for the following reason:

- on the S3 resume path, any other PEIM clients of
  EFI_PEI_MP_SERVICES_PPI might invoke EFI_PEI_MP_SERVICES_PPI, raising
  an INIT IPI (for utilizing the APs), thereby clearing the MSR;

- on the normal boot path, the same, *plus*: in DXE, any client of
  EFI_MP_SERVICES_PROTOCOL could cause the same. (In fact just the
  startup of the EFI_MP_SERVICES_PROTOCOL implementation in CpuDxe
  involves INIT-SIPI-SIPI.)

Ultimately, the idea I'm working on is pretty simple, same as any other
chipset register configuration we do in PlatformPei. The difference is
"only" that I need to pull in the EFI_PEI_MP_SERVICES_PPI implementation
(CpuMpPei), and then call that from PlatformPei to write the MSR on all
APs as well.

The risk was that any other INIT IPIs (involved in further use of
EFI_PEI_MP_SERVICES_PPI through PEI, and EFI_MP_SERVICES_PROTOCOL
through DXE) would undo that work. But, you excluded this risk, so the
idea should be fine. (I'm about to test it soon...)

The ACPI_CPU_DATA field would be related to PiSmmCpuDxeSmm and
CpuS3DataDxe. With the above idea, I don't expect to have to touch those
at all (I could be proved wrong, of course -- this is why it is
important for QEMU to clear the MSR whenever it is architecturally
required). PlatformPei should program the MSR independently of the SMM
driver stack.

So this is something I have to test in four situations: { cold boot, S3
resume } x { SMM driver stack absent, SMM driver stack present }.

Regarding MTRRs... that's a bit messy. PlatformPei only progams the
MTRRs only on the BSP. For the normal boot path, this is no problem,
because when EFI_MP_SERVICES_PROTOCOL starts up (in CpuDxe), the MTRR
settings are broad-cast to all APs. It is also not a problem for the S3
resume path, when the SMM driver stack is used, because CpuS3DataDxe
saves the MTRRs at End-of-DXE, and at S3 resume, PiSmmCpuDxeSmm restores
them.

There is one path where the firmware does not restore MTRRs on APs: S3
resume without the SMM driver stack. In practice this doesn't seem to
cause problems. Maybe Linux restores those MTRRs anyway, when the APs
are onlined after resume -- even at cold boot, Linux checks the MTRR
config, and if it's inconsistent between BSP and APs, the APs are adapted.

(If I understand correctly, on S3 resume, SeaBIOS doesn't reprogram the
MTRRs even on the BSP, and historically this has caused no problems. So
in that sense OVMF is "no worse". :))

Thanks
Laszlo
Paolo Bonzini July 7, 2016, 1:19 p.m. UTC | #4
On 07/07/2016 14:44, Laszlo Ersek wrote:
> Regarding MTRRs... that's a bit messy. PlatformPei only progams the
> MTRRs only on the BSP. For the normal boot path, this is no problem,
> because when EFI_MP_SERVICES_PROTOCOL starts up (in CpuDxe), the MTRR
> settings are broad-cast to all APs. It is also not a problem for the S3
> resume path, when the SMM driver stack is used, because CpuS3DataDxe
> saves the MTRRs at End-of-DXE, and at S3 resume, PiSmmCpuDxeSmm restores
> them.
> 
> There is one path where the firmware does not restore MTRRs on APs: S3
> resume without the SMM driver stack. In practice this doesn't seem to
> cause problems. Maybe Linux restores those MTRRs anyway, when the APs
> are onlined after resume -- even at cold boot, Linux checks the MTRR
> config, and if it's inconsistent between BSP and APs, the APs are adapted.

Oh, that helps indeed.

> (If I understand correctly, on S3 resume, SeaBIOS doesn't reprogram the
> MTRRs even on the BSP, and historically this has caused no problems. So
> in that sense OVMF is "no worse". :))

MTRRs hardly have any effect on virt platforms, which kinda explains
that.  We need to fix that now that smp_setup is used for
MSR_IA32_FEATURE_CONTROL.

Paolo
diff mbox

Patch

diff --git a/x86/s3.c b/x86/s3.c
index cef956e..0f2b48f 100644
--- a/x86/s3.c
+++ b/x86/s3.c
@@ -1,6 +1,7 @@ 
 #include "libcflat.h"
 #include "x86/acpi.h"
 #include "asm/io.h"
+#include "x86/processor.h"
 
 u32* find_resume_vector_addr(void)
 {
@@ -62,6 +63,7 @@  int main(int argc, char **argv)
 	rtc_out(RTC_HOURS_ALARM, RTC_ALARM_DONT_CARE);
 	rtc_out(RTC_REG_B, rtc_in(RTC_REG_B) | REG_B_AIE);
 
+	printf("Current value of MSR is 0x%x\nValue after S3 resume is ", (int)rdmsr(0x3a));
 	*(volatile int*)0 = 0;
 	asm volatile("outw %0, %1" :: "a"((short)0x2400), "d"((short)fadt->pm1a_cnt_blk):"memory");
 	while(1)
@@ -75,6 +77,13 @@  asm (
 	".global resume_end\n"
 	".code16\n"
 	"resume_start:\n"
+	"mov $0x3a, %ecx\n"
+	"rdmsr\n"
+	"mov $0x3f8, %dx\n"
+	"add $0x30, %al\n"
+	"out %al, %dx\n"
+	"mov $0xa, %al\n"
+	"out %al, %dx\n"
 	"mov 0x0, %eax\n"
 	"mov $0xf4, %dx\n"
 	"out %eax, %dx\n"