Message ID | 1585656658-1838-3-git-send-email-ego@linux.vnet.ibm.com |
---|---|
State | RFC |
Headers | show |
Series | Add support for stop instruction inside KVM guest | expand |
Gautham R. Shenoy's on March 31, 2020 10:10 pm: > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> > > ISA v3.0 allows the guest to execute a stop instruction. For this, the > PSSCR[ESL|EC] bits need to be cleared by the hypervisor before > scheduling in the guest vCPU. > > Currently we always schedule in a vCPU with PSSCR[ESL|EC] bits > set. This patch changes the behaviour to enter the guest with > PSSCR[ESL|EC] bits cleared. This is a RFC patch where we > unconditionally clear these bits. Ideally this should be done > conditionally on platforms where the guest stop instruction has no > Bugs (starting POWER9 DD2.3). How will guests know that they can use this facility safely after your series? You need both DD2.3 and a patched KVM. > > Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> > --- > arch/powerpc/kvm/book3s_hv.c | 2 +- > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 25 +++++++++++++------------ > 2 files changed, 14 insertions(+), 13 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index cdb7224..36d059a 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -3424,7 +3424,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, > mtspr(SPRN_IC, vcpu->arch.ic); > mtspr(SPRN_PID, vcpu->arch.pid); > > - mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC | > + mtspr(SPRN_PSSCR, (vcpu->arch.psscr & ~(PSSCR_EC | PSSCR_ESL)) | > (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG)); > > mtspr(SPRN_HFSCR, vcpu->arch.hfscr); > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > index dbc2fec..c2daec3 100644 > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > @@ -823,6 +823,18 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) > mtspr SPRN_PID, r7 > mtspr SPRN_WORT, r8 > BEGIN_FTR_SECTION > + /* POWER9-only registers */ > + ld r5, VCPU_TID(r4) > + ld r6, VCPU_PSSCR(r4) > + lbz r8, HSTATE_FAKE_SUSPEND(r13) > + lis r7, (PSSCR_EC | PSSCR_ESL)@h /* Allow guest to call stop */ > + andc r6, r6, r7 > + rldimi r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG > + ld r7, VCPU_HFSCR(r4) > + mtspr SPRN_TIDR, r5 > + mtspr SPRN_PSSCR, r6 > + mtspr SPRN_HFSCR, r7 > +FTR_SECTION_ELSE Why did you move these around? Just because the POWER9 section became larger than the other? That's a real wart in the instruction patching implementation, I think we can fix it by padding with nops in the macros. Can you just add the additional required nops to the top branch without changing them around for this patch, so it's easier to see what's going on? The end result will be the same after patching. Actually changing these around can have a slight unintended consequence in that code that runs before features were patched will execute the IF code. Not a problem here, but another reason why the instruction patching restriction is annoying. Thanks, Nick > /* POWER8-only registers */ > ld r5, VCPU_TCSCR(r4) > ld r6, VCPU_ACOP(r4) > @@ -833,18 +845,7 @@ BEGIN_FTR_SECTION > mtspr SPRN_CSIGR, r7 > mtspr SPRN_TACR, r8 > nop > -FTR_SECTION_ELSE > - /* POWER9-only registers */ > - ld r5, VCPU_TID(r4) > - ld r6, VCPU_PSSCR(r4) > - lbz r8, HSTATE_FAKE_SUSPEND(r13) > - oris r6, r6, PSSCR_EC@h /* This makes stop trap to HV */ > - rldimi r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG > - ld r7, VCPU_HFSCR(r4) > - mtspr SPRN_TIDR, r5 > - mtspr SPRN_PSSCR, r6 > - mtspr SPRN_HFSCR, r7 > -ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300) > +ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300) > 8: > > ld r5, VCPU_SPRG0(r4) > -- > 1.9.4 > >
On Fri, Apr 03, 2020 at 12:20:26PM +1000, Nicholas Piggin wrote: > Gautham R. Shenoy's on March 31, 2020 10:10 pm: > > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> > > > > ISA v3.0 allows the guest to execute a stop instruction. For this, the > > PSSCR[ESL|EC] bits need to be cleared by the hypervisor before > > scheduling in the guest vCPU. > > > > Currently we always schedule in a vCPU with PSSCR[ESL|EC] bits > > set. This patch changes the behaviour to enter the guest with > > PSSCR[ESL|EC] bits cleared. This is a RFC patch where we > > unconditionally clear these bits. Ideally this should be done > > conditionally on platforms where the guest stop instruction has no > > Bugs (starting POWER9 DD2.3). > > How will guests know that they can use this facility safely after your > series? You need both DD2.3 and a patched KVM. Yes, this is something that isn't addressed in this series (mentioned in the cover letter), which is a POC demonstrating that the stop0lite state in guest works. However, to answer your question, this is the scheme that I had in mind : OPAL: On Procs >= DD2.3 : we publish a dt-cpu-feature "idle-stop-guest" Hypervisor Kernel: 1. If "idle-stop-guest" dt-cpu-feature is discovered, then we set bool enable_guest_stop = true; 2. During KVM guest entry, clear PSSCR[ESL|EC] iff enable_guest_stop == true. 3. In kvm_vm_ioctl_check_extension(), for a new capability KVM_CAP_STOP, return true iff enable_guest_top == true. QEMU: Check with the hypervisor if KVM_CAP_STOP is present. If so, indicate the presence to the guest via device tree. Guest Kernel: Check for the presence of guest stop state support in device-tree. If available, enable the stop0lite in the cpuidle driver. We still have a challenge of migrating a guest which started on a hypervisor supporting guest stop state to a hypervisor without it. The target hypervisor should atleast have Patch 1 of this series, so that we don't crash the guest. > > > > > Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> > > --- > > arch/powerpc/kvm/book3s_hv.c | 2 +- > > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 25 +++++++++++++------------ > > 2 files changed, 14 insertions(+), 13 deletions(-) > > > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > > index cdb7224..36d059a 100644 > > --- a/arch/powerpc/kvm/book3s_hv.c > > +++ b/arch/powerpc/kvm/book3s_hv.c > > @@ -3424,7 +3424,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, > > mtspr(SPRN_IC, vcpu->arch.ic); > > mtspr(SPRN_PID, vcpu->arch.pid); > > > > - mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC | > > + mtspr(SPRN_PSSCR, (vcpu->arch.psscr & ~(PSSCR_EC | PSSCR_ESL)) | > > (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG)); > > > > mtspr(SPRN_HFSCR, vcpu->arch.hfscr); > > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > > index dbc2fec..c2daec3 100644 > > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > > @@ -823,6 +823,18 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) > > mtspr SPRN_PID, r7 > > mtspr SPRN_WORT, r8 > > BEGIN_FTR_SECTION > > + /* POWER9-only registers */ > > + ld r5, VCPU_TID(r4) > > + ld r6, VCPU_PSSCR(r4) > > + lbz r8, HSTATE_FAKE_SUSPEND(r13) > > + lis r7, (PSSCR_EC | PSSCR_ESL)@h /* Allow guest to call stop */ > > + andc r6, r6, r7 > > + rldimi r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG > > + ld r7, VCPU_HFSCR(r4) > > + mtspr SPRN_TIDR, r5 > > + mtspr SPRN_PSSCR, r6 > > + mtspr SPRN_HFSCR, r7 > > +FTR_SECTION_ELSE > > Why did you move these around? Just because the POWER9 section became > larger than the other? Yes. > > That's a real wart in the instruction patching implementation, I think > we can fix it by padding with nops in the macros. > > Can you just add the additional required nops to the top branch without > changing them around for this patch, so it's easier to see what's going > on? The end result will be the same after patching. Actually changing > these around can have a slight unintended consequence in that code that > runs before features were patched will execute the IF code. Not a > problem here, but another reason why the instruction patching > restriction is annoying. Sure, I will repost this patch with additional nops instead of moving them around. > > Thanks, > Nick > > > /* POWER8-only registers */ > > ld r5, VCPU_TCSCR(r4) > > ld r6, VCPU_ACOP(r4) > > @@ -833,18 +845,7 @@ BEGIN_FTR_SECTION > > mtspr SPRN_CSIGR, r7 > > mtspr SPRN_TACR, r8 > > nop > > -FTR_SECTION_ELSE > > - /* POWER9-only registers */ > > - ld r5, VCPU_TID(r4) > > - ld r6, VCPU_PSSCR(r4) > > - lbz r8, HSTATE_FAKE_SUSPEND(r13) > > - oris r6, r6, PSSCR_EC@h /* This makes stop trap to HV */ > > - rldimi r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG > > - ld r7, VCPU_HFSCR(r4) > > - mtspr SPRN_TIDR, r5 > > - mtspr SPRN_PSSCR, r6 > > - mtspr SPRN_HFSCR, r7 > > -ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300) > > +ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300) > > 8: > > > > ld r5, VCPU_SPRG0(r4) > > -- > > 1.9.4 > > > > -- Thanks and Regards gautham.
On Fri, Apr 03, 2020 at 03:01:03PM +0530, Gautham R Shenoy wrote: > On Fri, Apr 03, 2020 at 12:20:26PM +1000, Nicholas Piggin wrote: > > Gautham R. Shenoy's on March 31, 2020 10:10 pm: > > > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> > > > > > > ISA v3.0 allows the guest to execute a stop instruction. For this, the > > > PSSCR[ESL|EC] bits need to be cleared by the hypervisor before > > > scheduling in the guest vCPU. > > > > > > Currently we always schedule in a vCPU with PSSCR[ESL|EC] bits > > > set. This patch changes the behaviour to enter the guest with > > > PSSCR[ESL|EC] bits cleared. This is a RFC patch where we > > > unconditionally clear these bits. Ideally this should be done > > > conditionally on platforms where the guest stop instruction has no > > > Bugs (starting POWER9 DD2.3). > > > > How will guests know that they can use this facility safely after your > > series? You need both DD2.3 and a patched KVM. > > > Yes, this is something that isn't addressed in this series (mentioned > in the cover letter), which is a POC demonstrating that the stop0lite > state in guest works. > > However, to answer your question, this is the scheme that I had in > mind : > > OPAL: > On Procs >= DD2.3 : we publish a dt-cpu-feature "idle-stop-guest" > > Hypervisor Kernel: > 1. If "idle-stop-guest" dt-cpu-feature is discovered, then > we set bool enable_guest_stop = true; > > 2. During KVM guest entry, clear PSSCR[ESL|EC] iff > enable_guest_stop == true. > > 3. In kvm_vm_ioctl_check_extension(), for a new capability > KVM_CAP_STOP, return true iff enable_guest_top == true. > > QEMU: > Check with the hypervisor if KVM_CAP_STOP is present. If so, > indicate the presence to the guest via device tree. Nack. Presenting different capabilities to the guest depending on host capabilities (rather than explicit options) is never ok. It means that depending on the system you start on you may or may not be able to migrate to other systems that you're supposed to be able to, > Guest Kernel: > Check for the presence of guest stop state support in > device-tree. If available, enable the stop0lite in the cpuidle > driver. > > > We still have a challenge of migrating a guest which started on a > hypervisor supporting guest stop state to a hypervisor without it. > The target hypervisor should atleast have Patch 1 of this series, so > that we don't crash the guest. > > > > > > > > > Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> > > > --- > > > arch/powerpc/kvm/book3s_hv.c | 2 +- > > > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 25 +++++++++++++------------ > > > 2 files changed, 14 insertions(+), 13 deletions(-) > > > > > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > > > index cdb7224..36d059a 100644 > > > --- a/arch/powerpc/kvm/book3s_hv.c > > > +++ b/arch/powerpc/kvm/book3s_hv.c > > > @@ -3424,7 +3424,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, > > > mtspr(SPRN_IC, vcpu->arch.ic); > > > mtspr(SPRN_PID, vcpu->arch.pid); > > > > > > - mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC | > > > + mtspr(SPRN_PSSCR, (vcpu->arch.psscr & ~(PSSCR_EC | PSSCR_ESL)) | > > > (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG)); > > > > > > mtspr(SPRN_HFSCR, vcpu->arch.hfscr); > > > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > > > index dbc2fec..c2daec3 100644 > > > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > > > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > > > @@ -823,6 +823,18 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) > > > mtspr SPRN_PID, r7 > > > mtspr SPRN_WORT, r8 > > > BEGIN_FTR_SECTION > > > + /* POWER9-only registers */ > > > + ld r5, VCPU_TID(r4) > > > + ld r6, VCPU_PSSCR(r4) > > > + lbz r8, HSTATE_FAKE_SUSPEND(r13) > > > + lis r7, (PSSCR_EC | PSSCR_ESL)@h /* Allow guest to call stop */ > > > + andc r6, r6, r7 > > > + rldimi r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG > > > + ld r7, VCPU_HFSCR(r4) > > > + mtspr SPRN_TIDR, r5 > > > + mtspr SPRN_PSSCR, r6 > > > + mtspr SPRN_HFSCR, r7 > > > +FTR_SECTION_ELSE > > > > Why did you move these around? Just because the POWER9 section became > > larger than the other? > > Yes. > > > > > That's a real wart in the instruction patching implementation, I think > > we can fix it by padding with nops in the macros. > > > > Can you just add the additional required nops to the top branch without > > changing them around for this patch, so it's easier to see what's going > > on? The end result will be the same after patching. Actually changing > > these around can have a slight unintended consequence in that code that > > runs before features were patched will execute the IF code. Not a > > problem here, but another reason why the instruction patching > > restriction is annoying. > > Sure, I will repost this patch with additional nops instead of > moving them around. > > > > > Thanks, > > Nick > > > > > /* POWER8-only registers */ > > > ld r5, VCPU_TCSCR(r4) > > > ld r6, VCPU_ACOP(r4) > > > @@ -833,18 +845,7 @@ BEGIN_FTR_SECTION > > > mtspr SPRN_CSIGR, r7 > > > mtspr SPRN_TACR, r8 > > > nop > > > -FTR_SECTION_ELSE > > > - /* POWER9-only registers */ > > > - ld r5, VCPU_TID(r4) > > > - ld r6, VCPU_PSSCR(r4) > > > - lbz r8, HSTATE_FAKE_SUSPEND(r13) > > > - oris r6, r6, PSSCR_EC@h /* This makes stop trap to HV */ > > > - rldimi r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG > > > - ld r7, VCPU_HFSCR(r4) > > > - mtspr SPRN_TIDR, r5 > > > - mtspr SPRN_PSSCR, r6 > > > - mtspr SPRN_HFSCR, r7 > > > -ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300) > > > +ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300) > > > 8: > > > > > > ld r5, VCPU_SPRG0(r4) >
Hello Nicholas, On Fri, Apr 03, 2020 at 03:01:03PM +0530, Gautham R Shenoy wrote: > On Fri, Apr 03, 2020 at 12:20:26PM +1000, Nicholas Piggin wrote: [..snip..] > > > > > > Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> > > > --- > > > arch/powerpc/kvm/book3s_hv.c | 2 +- > > > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 25 +++++++++++++------------ > > > 2 files changed, 14 insertions(+), 13 deletions(-) > > > > > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > > > index cdb7224..36d059a 100644 > > > --- a/arch/powerpc/kvm/book3s_hv.c > > > +++ b/arch/powerpc/kvm/book3s_hv.c > > > @@ -3424,7 +3424,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, > > > mtspr(SPRN_IC, vcpu->arch.ic); > > > mtspr(SPRN_PID, vcpu->arch.pid); > > > > > > - mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC | > > > + mtspr(SPRN_PSSCR, (vcpu->arch.psscr & ~(PSSCR_EC | PSSCR_ESL)) | > > > (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG)); > > > > > > mtspr(SPRN_HFSCR, vcpu->arch.hfscr); > > > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > > > index dbc2fec..c2daec3 100644 > > > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > > > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > > > @@ -823,6 +823,18 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) > > > mtspr SPRN_PID, r7 > > > mtspr SPRN_WORT, r8 > > > BEGIN_FTR_SECTION > > > + /* POWER9-only registers */ > > > + ld r5, VCPU_TID(r4) > > > + ld r6, VCPU_PSSCR(r4) > > > + lbz r8, HSTATE_FAKE_SUSPEND(r13) > > > + lis r7, (PSSCR_EC | PSSCR_ESL)@h /* Allow guest to call stop */ > > > + andc r6, r6, r7 > > > + rldimi r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG > > > + ld r7, VCPU_HFSCR(r4) > > > + mtspr SPRN_TIDR, r5 > > > + mtspr SPRN_PSSCR, r6 > > > + mtspr SPRN_HFSCR, r7 > > > +FTR_SECTION_ELSE > > > > Why did you move these around? Just because the POWER9 section became > > larger than the other? > > Yes. > > > > > That's a real wart in the instruction patching implementation, I think > > we can fix it by padding with nops in the macros. > > > > Can you just add the additional required nops to the top branch without > > changing them around for this patch, so it's easier to see what's going > > on? The end result will be the same after patching. Actually changing > > these around can have a slight unintended consequence in that code that > > runs before features were patched will execute the IF code. Not a > > problem here, but another reason why the instruction patching > > restriction is annoying. > > Sure, I will repost this patch with additional nops instead of > moving them around. > Below is the same patch without rearranging the FTR_SECTION blocks, but with an extra nop. --- arch/powerpc/kvm/book3s_hv.c | 2 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index c52871c..efa7d3e 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -3433,7 +3433,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, mtspr(SPRN_IC, vcpu->arch.ic); mtspr(SPRN_PID, vcpu->arch.pid); - mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC | + mtspr(SPRN_PSSCR, (vcpu->arch.psscr & ~(PSSCR_EC | PSSCR_ESL)) | (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG)); mtspr(SPRN_HFSCR, vcpu->arch.hfscr); diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 780a499..83a69dc 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -833,12 +833,14 @@ BEGIN_FTR_SECTION mtspr SPRN_CSIGR, r7 mtspr SPRN_TACR, r8 nop + nop FTR_SECTION_ELSE /* POWER9-only registers */ ld r5, VCPU_TID(r4) ld r6, VCPU_PSSCR(r4) lbz r8, HSTATE_FAKE_SUSPEND(r13) - oris r6, r6, PSSCR_EC@h /* This makes stop trap to HV */ + lis r7, (PSSCR_EC | PSSCR_ESL)@h /* Allow guest to call stop */ + andc r6, r6, r7 rldimi r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG ld r7, VCPU_HFSCR(r4) mtspr SPRN_TIDR, r5
Hello David, On Mon, Apr 06, 2020 at 07:58:19PM +1000, David Gibson wrote: > On Fri, Apr 03, 2020 at 03:01:03PM +0530, Gautham R Shenoy wrote: > > On Fri, Apr 03, 2020 at 12:20:26PM +1000, Nicholas Piggin wrote: > > > Gautham R. Shenoy's on March 31, 2020 10:10 pm: > > > > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> > > > > > > > > ISA v3.0 allows the guest to execute a stop instruction. For this, the > > > > PSSCR[ESL|EC] bits need to be cleared by the hypervisor before > > > > scheduling in the guest vCPU. > > > > > > > > Currently we always schedule in a vCPU with PSSCR[ESL|EC] bits > > > > set. This patch changes the behaviour to enter the guest with > > > > PSSCR[ESL|EC] bits cleared. This is a RFC patch where we > > > > unconditionally clear these bits. Ideally this should be done > > > > conditionally on platforms where the guest stop instruction has no > > > > Bugs (starting POWER9 DD2.3). > > > > > > How will guests know that they can use this facility safely after your > > > series? You need both DD2.3 and a patched KVM. > > > > > > Yes, this is something that isn't addressed in this series (mentioned > > in the cover letter), which is a POC demonstrating that the stop0lite > > state in guest works. > > > > However, to answer your question, this is the scheme that I had in > > mind : > > > > OPAL: > > On Procs >= DD2.3 : we publish a dt-cpu-feature "idle-stop-guest" > > > > Hypervisor Kernel: > > 1. If "idle-stop-guest" dt-cpu-feature is discovered, then > > we set bool enable_guest_stop = true; > > > > 2. During KVM guest entry, clear PSSCR[ESL|EC] iff > > enable_guest_stop == true. > > > > 3. In kvm_vm_ioctl_check_extension(), for a new capability > > KVM_CAP_STOP, return true iff enable_guest_top == true. > > > > QEMU: > > Check with the hypervisor if KVM_CAP_STOP is present. If so, > > indicate the presence to the guest via device tree. > > Nack. Presenting different capabilities to the guest depending on > host capabilities (rather than explicit options) is never ok. It > means that depending on the system you start on you may or may not be > able to migrate to other systems that you're supposed to be able to, I agree that blocking migration for the unavailability of this feature is not desirable. Could you point me to some other capabilities in KVM which have been implemented via explicit options? The ISA 3.0 allows the guest to execute the "stop" instruction. If the Hypervisor hasn't cleared the PSSCR[ESL|EC] then, guest executing the "stop" instruction in the causes a Hypervisor Facility Unavailable Exception, thus giving the hypervisor a chance to emulate the instruction. However, in the current code, when the hypervisor receives this exception, it sends a PROGKILL to the guest which results in crashing the guest. Patch 1 of this series emulates wakeup from the "stop" instruction. Would the following scheme be ok? OPAL: On Procs >= DD2.3 : we publish a dt-cpu-feature "idle-stop-guest" Hypervisor Kernel: If "idle-stop-guest" dt feature is available, then, before entering the guest, the hypervisor clears the PSSCR[EC|ESL] bits allowing the guest to safely execute stop instruction. If "idle-stop-guest" dt feature is not available, then, the Hypervisor sets the PSSCR[ESL|EC] bits, thereby causing a guest "stop" instruction execution to trap back into the hypervisor. We then emulate a wakeup from the stop instruction (Patch 1 of this series). Guest Kernel: If (cpu_has_feature(CPU_FTR_ARCH_300)) only then use the stop0lite cpuidle state. This allows us to migrate the KVM guest across any POWER9 Hypervisor. The minimal addition that the Hypervisor would need is Patch 1 of this series. -- Thanks and Regards gautham.
On Tue, Apr 07, 2020 at 06:55:26PM +0530, Gautham R Shenoy wrote: > Hello David, > > On Mon, Apr 06, 2020 at 07:58:19PM +1000, David Gibson wrote: > > On Fri, Apr 03, 2020 at 03:01:03PM +0530, Gautham R Shenoy wrote: > > > On Fri, Apr 03, 2020 at 12:20:26PM +1000, Nicholas Piggin wrote: > > > > Gautham R. Shenoy's on March 31, 2020 10:10 pm: > > > > > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> > > > > > > > > > > ISA v3.0 allows the guest to execute a stop instruction. For this, the > > > > > PSSCR[ESL|EC] bits need to be cleared by the hypervisor before > > > > > scheduling in the guest vCPU. > > > > > > > > > > Currently we always schedule in a vCPU with PSSCR[ESL|EC] bits > > > > > set. This patch changes the behaviour to enter the guest with > > > > > PSSCR[ESL|EC] bits cleared. This is a RFC patch where we > > > > > unconditionally clear these bits. Ideally this should be done > > > > > conditionally on platforms where the guest stop instruction has no > > > > > Bugs (starting POWER9 DD2.3). > > > > > > > > How will guests know that they can use this facility safely after your > > > > series? You need both DD2.3 and a patched KVM. > > > > > > > > > Yes, this is something that isn't addressed in this series (mentioned > > > in the cover letter), which is a POC demonstrating that the stop0lite > > > state in guest works. > > > > > > However, to answer your question, this is the scheme that I had in > > > mind : > > > > > > OPAL: > > > On Procs >= DD2.3 : we publish a dt-cpu-feature "idle-stop-guest" > > > > > > Hypervisor Kernel: > > > 1. If "idle-stop-guest" dt-cpu-feature is discovered, then > > > we set bool enable_guest_stop = true; > > > > > > 2. During KVM guest entry, clear PSSCR[ESL|EC] iff > > > enable_guest_stop == true. > > > > > > 3. In kvm_vm_ioctl_check_extension(), for a new capability > > > KVM_CAP_STOP, return true iff enable_guest_top == true. > > > > > > QEMU: > > > Check with the hypervisor if KVM_CAP_STOP is present. If so, > > > indicate the presence to the guest via device tree. > > > > Nack. Presenting different capabilities to the guest depending on > > host capabilities (rather than explicit options) is never ok. It > > means that depending on the system you start on you may or may not be > > able to migrate to other systems that you're supposed to be able to, > > I agree that blocking migration for the unavailability of this feature > is not desirable. Could you point me to some other capabilities in KVM > which have been implemented via explicit options? TBH, most of the options for the 'pseries' machine type are in this category: cap-vsx, cap-dfp, cap-htm, a bunch related to various Spectre mitigations, cap-hpt-max-page-size (maximum page size for hash guests), cap-nested-hv, cap-large-decr, cap-fwnmi, resize-hpt (HPT resizing extension), ic-mode (which irq controllers are available to the guest). > The ISA 3.0 allows the guest to execute the "stop" instruction. So, this was a bug in DD2.2's implementation of the architecture? > If the > Hypervisor hasn't cleared the PSSCR[ESL|EC] then, guest executing the > "stop" instruction in the causes a Hypervisor Facility Unavailable > Exception, thus giving the hypervisor a chance to emulate the > instruction. However, in the current code, when the hypervisor > receives this exception, it sends a PROGKILL to the guest which > results in crashing the guest. > > Patch 1 of this series emulates wakeup from the "stop" > instruction. Would the following scheme be ok? > > OPAL: > On Procs >= DD2.3 : we publish a dt-cpu-feature "idle-stop-guest" > > Hypervisor Kernel: > > If "idle-stop-guest" dt feature is available, then, before > entering the guest, the hypervisor clears the PSSCR[EC|ESL] > bits allowing the guest to safely execute stop instruction. > > If "idle-stop-guest" dt feature is not available, then, the > Hypervisor sets the PSSCR[ESL|EC] bits, thereby causing a > guest "stop" instruction execution to trap back into the > hypervisor. We then emulate a wakeup from the stop > instruction (Patch 1 of this series). > > Guest Kernel: > If (cpu_has_feature(CPU_FTR_ARCH_300)) only then use the > stop0lite cpuidle state. > > This allows us to migrate the KVM guest across any POWER9 > Hypervisor. The minimal addition that the Hypervisor would need is > Patch 1 of this series. That could be workable. Some caveats, though: * How does the latency of the trap-and-emulate compare to the guest using H_CEDE in the first place? i.e. how big a negative impact will this have for guests running on DD2.2 hosts? * We'll only be able to enable this in a new qemu machine type version (say, pseries-5.1.0). Otherwise a guest could start thinking it can use stop states, then be migrated to an older qemu or host kernel without the support and crash.
Hello David, On Wed, Apr 08, 2020 at 12:29:57PM +1000, David Gibson wrote: > On Tue, Apr 07, 2020 at 06:55:26PM +0530, Gautham R Shenoy wrote: > > Hello David, > > > > On Mon, Apr 06, 2020 at 07:58:19PM +1000, David Gibson wrote: > > > On Fri, Apr 03, 2020 at 03:01:03PM +0530, Gautham R Shenoy wrote: > > > > On Fri, Apr 03, 2020 at 12:20:26PM +1000, Nicholas Piggin wrote: > > > > > Gautham R. Shenoy's on March 31, 2020 10:10 pm: > > > > > > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> > > > > > > > > > > > > ISA v3.0 allows the guest to execute a stop instruction. For this, the > > > > > > PSSCR[ESL|EC] bits need to be cleared by the hypervisor before > > > > > > scheduling in the guest vCPU. > > > > > > > > > > > > Currently we always schedule in a vCPU with PSSCR[ESL|EC] bits > > > > > > set. This patch changes the behaviour to enter the guest with > > > > > > PSSCR[ESL|EC] bits cleared. This is a RFC patch where we > > > > > > unconditionally clear these bits. Ideally this should be done > > > > > > conditionally on platforms where the guest stop instruction has no > > > > > > Bugs (starting POWER9 DD2.3). > > > > > > > > > > How will guests know that they can use this facility safely after your > > > > > series? You need both DD2.3 and a patched KVM. > > > > > > > > > > > > Yes, this is something that isn't addressed in this series (mentioned > > > > in the cover letter), which is a POC demonstrating that the stop0lite > > > > state in guest works. > > > > > > > > However, to answer your question, this is the scheme that I had in > > > > mind : > > > > > > > > OPAL: > > > > On Procs >= DD2.3 : we publish a dt-cpu-feature "idle-stop-guest" > > > > > > > > Hypervisor Kernel: > > > > 1. If "idle-stop-guest" dt-cpu-feature is discovered, then > > > > we set bool enable_guest_stop = true; > > > > > > > > 2. During KVM guest entry, clear PSSCR[ESL|EC] iff > > > > enable_guest_stop == true. > > > > > > > > 3. In kvm_vm_ioctl_check_extension(), for a new capability > > > > KVM_CAP_STOP, return true iff enable_guest_top == true. > > > > > > > > QEMU: > > > > Check with the hypervisor if KVM_CAP_STOP is present. If so, > > > > indicate the presence to the guest via device tree. > > > > > > Nack. Presenting different capabilities to the guest depending on > > > host capabilities (rather than explicit options) is never ok. It > > > means that depending on the system you start on you may or may not be > > > able to migrate to other systems that you're supposed to be able to, > > > > I agree that blocking migration for the unavailability of this feature > > is not desirable. Could you point me to some other capabilities in KVM > > which have been implemented via explicit options? > > TBH, most of the options for the 'pseries' machine type are in this > category: cap-vsx, cap-dfp, cap-htm, a bunch related to various > Spectre mitigations, cap-hpt-max-page-size (maximum page size for hash > guests), cap-nested-hv, cap-large-decr, cap-fwnmi, resize-hpt (HPT > resizing extension), ic-mode (which irq controllers are available to > the guest). Thanks. I will follow this suit. > > > The ISA 3.0 allows the guest to execute the "stop" instruction. > > So, this was a bug in DD2.2's implementation of the architecture? Yes, the previous versions could miss wakeup events when stop was executed in HV=0,PR=0 mode. So, the hypervisor had to block that. > > > If the > > Hypervisor hasn't cleared the PSSCR[ESL|EC] then, guest executing the > > "stop" instruction in the causes a Hypervisor Facility Unavailable > > Exception, thus giving the hypervisor a chance to emulate the > > instruction. However, in the current code, when the hypervisor > > receives this exception, it sends a PROGKILL to the guest which > > results in crashing the guest. > > > > Patch 1 of this series emulates wakeup from the "stop" > > instruction. Would the following scheme be ok? > > > > OPAL: > > On Procs >= DD2.3 : we publish a dt-cpu-feature "idle-stop-guest" > > > > Hypervisor Kernel: > > > > If "idle-stop-guest" dt feature is available, then, before > > entering the guest, the hypervisor clears the PSSCR[EC|ESL] > > bits allowing the guest to safely execute stop instruction. > > > > If "idle-stop-guest" dt feature is not available, then, the > > Hypervisor sets the PSSCR[ESL|EC] bits, thereby causing a > > guest "stop" instruction execution to trap back into the > > hypervisor. We then emulate a wakeup from the stop > > instruction (Patch 1 of this series). > > > > Guest Kernel: > > If (cpu_has_feature(CPU_FTR_ARCH_300)) only then use the > > stop0lite cpuidle state. > > > > This allows us to migrate the KVM guest across any POWER9 > > Hypervisor. The minimal addition that the Hypervisor would need is > > Patch 1 of this series. > > That could be workable. Some caveats, though: > > * How does the latency of the trap-and-emulate compare to the guest > using H_CEDE in the first place? i.e. how big a negative impact > will this have for guests running on DD2.2 hosts? The wakeup latency of trap-and-emulated stop0lite (referred to as "stop0lite Emulated" in the tables below) the compares favorably compared to H_CEDE. It is in the order of 5-6us while the wakeup latency of H_CEDE is ~25-30us. ====================================================================== Wakeup Latency measured using a timer (in ns) [Lower is better] ====================================================================== Idle state | Nr samples | Min | Max | Median | Avg | Stddev| ---------------------------------------------------------------------- snooze | 60 | 787 | 1059 | 938 | 937.4 | 42.27 | ---------------------------------------------------------------------- stop0lite | 60 | 770 | 1182 | 948 | 946.4 | 67.41 | ---------------------------------------------------------------------- stop0lite | 60 | 2378 | 7659 | 5006 |5093.6 |1578.7 | Emulated | | | | | | | ---------------------------------------------------------------------- Shared CEDE| 60 | 9550 | 36694 | 29219 |28564.1|3545.9 | ====================================================================== ====================================================================== Wakeup latency measured using an IPI (in ns) [Lower is better] ====================================================================== Idle state | Nr | Min | Max | Median | Avg | Stddev | |samples | | | | | | ---------------------------------------------------------------------- snooze | 60 | 2089| 4228| 2259| 2342.31| 316.56| ---------------------------------------------------------------------- stop0lite | 60 | 1947| 3674| 2653| 2610.57| 266.73| ---------------------------------------------------------------------- stop0lite | 60 | 3580| 8154| 5596| 5644.95| 1368.44| Emulated | | | | | | | ---------------------------------------------------------------------- Shared CEDE| 60 | 20147| 36305| 21827| 26762.65| 6875.01| ====================================================================== > > * We'll only be able to enable this in a new qemu machine type > version (say, pseries-5.1.0). Otherwise a guest could start > thinking it can use stop states, then be migrated to an older qemu > or host kernel without the support and crash. That makese sense. In fact in the case of not being able to backport Patch 1 to all the older hypervisor kernels, we will need a way of gating the guest from using stop-states and then migrating onto an older hypervisor kernel. Associating this with a new qemu machine type version should solve this problem, assuming that all the newer qemus will also be running on newer hypervisor kernels. > > -- > David Gibson | I'll have my music baroque, and my code > david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ > | _way_ _around_! > http://www.ozlabs.org/~dgibson -- Thanks and Regards gautham.
On Mon, Apr 13, 2020 at 03:55:49PM +0530, Gautham R Shenoy wrote: > Hello David, > > On Wed, Apr 08, 2020 at 12:29:57PM +1000, David Gibson wrote: > > On Tue, Apr 07, 2020 at 06:55:26PM +0530, Gautham R Shenoy wrote: > > > Hello David, > > > > > > On Mon, Apr 06, 2020 at 07:58:19PM +1000, David Gibson wrote: > > > > On Fri, Apr 03, 2020 at 03:01:03PM +0530, Gautham R Shenoy wrote: > > > > > On Fri, Apr 03, 2020 at 12:20:26PM +1000, Nicholas Piggin wrote: > > > > > > Gautham R. Shenoy's on March 31, 2020 10:10 pm: > > > > > > > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> > > > > > > > > > > > > > > ISA v3.0 allows the guest to execute a stop instruction. For this, the > > > > > > > PSSCR[ESL|EC] bits need to be cleared by the hypervisor before > > > > > > > scheduling in the guest vCPU. > > > > > > > > > > > > > > Currently we always schedule in a vCPU with PSSCR[ESL|EC] bits > > > > > > > set. This patch changes the behaviour to enter the guest with > > > > > > > PSSCR[ESL|EC] bits cleared. This is a RFC patch where we > > > > > > > unconditionally clear these bits. Ideally this should be done > > > > > > > conditionally on platforms where the guest stop instruction has no > > > > > > > Bugs (starting POWER9 DD2.3). > > > > > > > > > > > > How will guests know that they can use this facility safely after your > > > > > > series? You need both DD2.3 and a patched KVM. > > > > > > > > > > > > > > > Yes, this is something that isn't addressed in this series (mentioned > > > > > in the cover letter), which is a POC demonstrating that the stop0lite > > > > > state in guest works. > > > > > > > > > > However, to answer your question, this is the scheme that I had in > > > > > mind : > > > > > > > > > > OPAL: > > > > > On Procs >= DD2.3 : we publish a dt-cpu-feature "idle-stop-guest" > > > > > > > > > > Hypervisor Kernel: > > > > > 1. If "idle-stop-guest" dt-cpu-feature is discovered, then > > > > > we set bool enable_guest_stop = true; > > > > > > > > > > 2. During KVM guest entry, clear PSSCR[ESL|EC] iff > > > > > enable_guest_stop == true. > > > > > > > > > > 3. In kvm_vm_ioctl_check_extension(), for a new capability > > > > > KVM_CAP_STOP, return true iff enable_guest_top == true. > > > > > > > > > > QEMU: > > > > > Check with the hypervisor if KVM_CAP_STOP is present. If so, > > > > > indicate the presence to the guest via device tree. > > > > > > > > Nack. Presenting different capabilities to the guest depending on > > > > host capabilities (rather than explicit options) is never ok. It > > > > means that depending on the system you start on you may or may not be > > > > able to migrate to other systems that you're supposed to be able to, > > > > > > I agree that blocking migration for the unavailability of this feature > > > is not desirable. Could you point me to some other capabilities in KVM > > > which have been implemented via explicit options? > > > > TBH, most of the options for the 'pseries' machine type are in this > > category: cap-vsx, cap-dfp, cap-htm, a bunch related to various > > Spectre mitigations, cap-hpt-max-page-size (maximum page size for hash > > guests), cap-nested-hv, cap-large-decr, cap-fwnmi, resize-hpt (HPT > > resizing extension), ic-mode (which irq controllers are available to > > the guest). > > > Thanks. I will follow this suit. > > > > > > The ISA 3.0 allows the guest to execute the "stop" instruction. > > > > So, this was a bug in DD2.2's implementation of the architecture? > > Yes, the previous versions could miss wakeup events when stop was > executed in HV=0,PR=0 mode. So, the hypervisor had to block that. > > > > > > > If the > > > Hypervisor hasn't cleared the PSSCR[ESL|EC] then, guest executing the > > > "stop" instruction in the causes a Hypervisor Facility Unavailable > > > Exception, thus giving the hypervisor a chance to emulate the > > > instruction. However, in the current code, when the hypervisor > > > receives this exception, it sends a PROGKILL to the guest which > > > results in crashing the guest. > > > > > > Patch 1 of this series emulates wakeup from the "stop" > > > instruction. Would the following scheme be ok? > > > > > > OPAL: > > > On Procs >= DD2.3 : we publish a dt-cpu-feature "idle-stop-guest" > > > > > > Hypervisor Kernel: > > > > > > If "idle-stop-guest" dt feature is available, then, before > > > entering the guest, the hypervisor clears the PSSCR[EC|ESL] > > > bits allowing the guest to safely execute stop instruction. > > > > > > If "idle-stop-guest" dt feature is not available, then, the > > > Hypervisor sets the PSSCR[ESL|EC] bits, thereby causing a > > > guest "stop" instruction execution to trap back into the > > > hypervisor. We then emulate a wakeup from the stop > > > instruction (Patch 1 of this series). > > > > > > Guest Kernel: > > > If (cpu_has_feature(CPU_FTR_ARCH_300)) only then use the > > > stop0lite cpuidle state. > > > > > > This allows us to migrate the KVM guest across any POWER9 > > > Hypervisor. The minimal addition that the Hypervisor would need is > > > Patch 1 of this series. > > > > That could be workable. Some caveats, though: > > > > * How does the latency of the trap-and-emulate compare to the guest > > using H_CEDE in the first place? i.e. how big a negative impact > > will this have for guests running on DD2.2 hosts? > > > The wakeup latency of trap-and-emulated stop0lite (referred to as > "stop0lite Emulated" in the tables below) the compares favorably > compared to H_CEDE. It is in the order of 5-6us while the wakeup > latency of H_CEDE is ~25-30us. Ok. So allowing the guest to use stop0lite everywhere, but having it emulated on the older CPUs should work reasonably well. > > ====================================================================== > Wakeup Latency measured using a timer (in ns) [Lower is better] > ====================================================================== > Idle state | Nr samples | Min | Max | Median | Avg | Stddev| > ---------------------------------------------------------------------- > snooze | 60 | 787 | 1059 | 938 | 937.4 | 42.27 | > ---------------------------------------------------------------------- > stop0lite | 60 | 770 | 1182 | 948 | 946.4 | 67.41 | > ---------------------------------------------------------------------- > stop0lite | 60 | 2378 | 7659 | 5006 |5093.6 |1578.7 | > Emulated | | | | | | | > ---------------------------------------------------------------------- > Shared CEDE| 60 | 9550 | 36694 | 29219 |28564.1|3545.9 | > ====================================================================== > > > ====================================================================== > Wakeup latency measured using an IPI (in ns) [Lower is better] > ====================================================================== > Idle state | Nr | Min | Max | Median | Avg | Stddev | > |samples | | | | | | > ---------------------------------------------------------------------- > snooze | 60 | 2089| 4228| 2259| 2342.31| 316.56| > ---------------------------------------------------------------------- > stop0lite | 60 | 1947| 3674| 2653| 2610.57| 266.73| > ---------------------------------------------------------------------- > stop0lite | 60 | 3580| 8154| 5596| 5644.95| 1368.44| > Emulated | | | | | | | > ---------------------------------------------------------------------- > Shared CEDE| 60 | 20147| 36305| 21827| 26762.65| 6875.01| > ====================================================================== > > > > > * We'll only be able to enable this in a new qemu machine type > > version (say, pseries-5.1.0). Otherwise a guest could start > > thinking it can use stop states, then be migrated to an older qemu > > or host kernel without the support and crash. > > That makese sense. In fact in the case of not being able to backport > Patch 1 to all the older hypervisor kernels, we will need a way of > gating the guest from using stop-states and then migrating onto an > older hypervisor kernel. Associating this with a new qemu machine type > version should solve this problem, assuming that all the newer qemus > will also be running on newer hypervisor kernels. We can't assume that automatically, but we can enforce it with a pretty standard mechanism. The way to do it is this: * Make a new spapr capability flag that enables guest use of the lite stop instructions * In the capability's '.apply' hook, verify that the host kernel can support this, and if not fail * Enable the new capability by default in the new machine type So, running the new machine type with default options on an old kernel will fail with a meaningful error, but existing setups with an old qemu, or old machine type.
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index cdb7224..36d059a 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -3424,7 +3424,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, mtspr(SPRN_IC, vcpu->arch.ic); mtspr(SPRN_PID, vcpu->arch.pid); - mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC | + mtspr(SPRN_PSSCR, (vcpu->arch.psscr & ~(PSSCR_EC | PSSCR_ESL)) | (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG)); mtspr(SPRN_HFSCR, vcpu->arch.hfscr); diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index dbc2fec..c2daec3 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -823,6 +823,18 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) mtspr SPRN_PID, r7 mtspr SPRN_WORT, r8 BEGIN_FTR_SECTION + /* POWER9-only registers */ + ld r5, VCPU_TID(r4) + ld r6, VCPU_PSSCR(r4) + lbz r8, HSTATE_FAKE_SUSPEND(r13) + lis r7, (PSSCR_EC | PSSCR_ESL)@h /* Allow guest to call stop */ + andc r6, r6, r7 + rldimi r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG + ld r7, VCPU_HFSCR(r4) + mtspr SPRN_TIDR, r5 + mtspr SPRN_PSSCR, r6 + mtspr SPRN_HFSCR, r7 +FTR_SECTION_ELSE /* POWER8-only registers */ ld r5, VCPU_TCSCR(r4) ld r6, VCPU_ACOP(r4) @@ -833,18 +845,7 @@ BEGIN_FTR_SECTION mtspr SPRN_CSIGR, r7 mtspr SPRN_TACR, r8 nop -FTR_SECTION_ELSE - /* POWER9-only registers */ - ld r5, VCPU_TID(r4) - ld r6, VCPU_PSSCR(r4) - lbz r8, HSTATE_FAKE_SUSPEND(r13) - oris r6, r6, PSSCR_EC@h /* This makes stop trap to HV */ - rldimi r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG - ld r7, VCPU_HFSCR(r4) - mtspr SPRN_TIDR, r5 - mtspr SPRN_PSSCR, r6 - mtspr SPRN_HFSCR, r7 -ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300) +ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300) 8: ld r5, VCPU_SPRG0(r4)