diff mbox

[v2,1/4] KVM: PPC: e500mc: Revert "add load inst fixup"

Message ID 1398905152-18091-2-git-send-email-mihai.caraman@freescale.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Mihai Caraman May 1, 2014, 12:45 a.m. UTC
The commit 1d628af7 "add load inst fixup" made an attempt to handle
failures generated by reading the guest current instruction. The fixup
code that was added works by chance hiding the real issue.

Load external pid (lwepx) instruction, used by KVM to read guest
instructions, is executed in a subsituted guest translation context
(EPLC[EGS] = 1). In consequence lwepx's TLB error and data storage
interrupts need to be handled by KVM, even though these interrupts
are generated from host context (MSR[GS] = 0).

Currently, KVM hooks only interrupts generated from guest context
(MSR[GS] = 1), doing minimal checks on the fast path to avoid host
performance degradation. As a result, the host kernel handles lwepx
faults searching the faulting guest data address (loaded in DEAR) in
its own Logical Partition ID (LPID) 0 context. In case a host translation
is found the execution returns to the lwepx instruction instead of the
fixup, the host ending up in an infinite loop.

Revert the commit "add load inst fixup". lwepx issue will be addressed
in a subsequent patch without needing fixup code.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
---
v2:
 - reworked patch description

 arch/powerpc/kvm/bookehv_interrupts.S | 25 +------------------------
 1 file changed, 1 insertion(+), 24 deletions(-)

Comments

Alexander Graf May 2, 2014, 9:24 a.m. UTC | #1
On 05/01/2014 02:45 AM, Mihai Caraman wrote:
> The commit 1d628af7 "add load inst fixup" made an attempt to handle
> failures generated by reading the guest current instruction. The fixup
> code that was added works by chance hiding the real issue.
>
> Load external pid (lwepx) instruction, used by KVM to read guest
> instructions, is executed in a subsituted guest translation context
> (EPLC[EGS] = 1). In consequence lwepx's TLB error and data storage
> interrupts need to be handled by KVM, even though these interrupts
> are generated from host context (MSR[GS] = 0).
>
> Currently, KVM hooks only interrupts generated from guest context
> (MSR[GS] = 1), doing minimal checks on the fast path to avoid host
> performance degradation. As a result, the host kernel handles lwepx
> faults searching the faulting guest data address (loaded in DEAR) in
> its own Logical Partition ID (LPID) 0 context. In case a host translation
> is found the execution returns to the lwepx instruction instead of the
> fixup, the host ending up in an infinite loop.
>
> Revert the commit "add load inst fixup". lwepx issue will be addressed
> in a subsequent patch without needing fixup code.
>
> Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>

Just a random idea: Could we just switch IVOR2 during the critical lwepx 
phase? In fact, we could even do that later when we're already in C code 
and try to recover the last instruction. The code IVOR2 would point to 
would simply set the register we're trying to read to as LAST_INST_FAIL 
and rfi.


Alex
Mihai Caraman May 2, 2014, 11:14 p.m. UTC | #2
> From: Alexander Graf <agraf@suse.de>
> Sent: Friday, May 2, 2014 12:24 PM
> To: Caraman Mihai Claudiu-B02008
> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH v2 1/4] KVM: PPC: e500mc: Revert "add load inst fixup"
> 
> On 05/01/2014 02:45 AM, Mihai Caraman wrote:
> > The commit 1d628af7 "add load inst fixup" made an attempt to handle
> > failures generated by reading the guest current instruction. The fixup
> > code that was added works by chance hiding the real issue.
> >
> > Load external pid (lwepx) instruction, used by KVM to read guest
> > instructions, is executed in a subsituted guest translation context
> > (EPLC[EGS] = 1). In consequence lwepx's TLB error and data storage
> > interrupts need to be handled by KVM, even though these interrupts
> > are generated from host context (MSR[GS] = 0).
> >
> > Currently, KVM hooks only interrupts generated from guest context
> > (MSR[GS] = 1), doing minimal checks on the fast path to avoid host
> > performance degradation. As a result, the host kernel handles lwepx
> > faults searching the faulting guest data address (loaded in DEAR) in
> > its own Logical Partition ID (LPID) 0 context. In case a host translation
> > is found the execution returns to the lwepx instruction instead of the
> > fixup, the host ending up in an infinite loop.
> >
> > Revert the commit "add load inst fixup". lwepx issue will be addressed
> > in a subsequent patch without needing fixup code.
> >
> > Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
> 
> Just a random idea: Could we just switch IVOR2 during the critical lwepx
> phase? In fact, we could even do that later when we're already in C code
> and try to recover the last instruction. The code IVOR2 would point to
> would simply set the register we're trying to read to as LAST_INST_FAIL
> and rfi.

This was the first idea that sprang to my mind inspired from how DO_KVM
is hooked on PR. I actually did a simple POC for e500mc/e5500, but this will
not work on e6500 which has shared IVORs between HW threads.

-Mike
Alexander Graf May 3, 2014, 10:14 p.m. UTC | #3
Am 03.05.2014 um 01:14 schrieb "mihai.caraman@freescale.com" <mihai.caraman@freescale.com>:

>> From: Alexander Graf <agraf@suse.de>
>> Sent: Friday, May 2, 2014 12:24 PM
>> To: Caraman Mihai Claudiu-B02008
>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
>> Subject: Re: [PATCH v2 1/4] KVM: PPC: e500mc: Revert "add load inst fixup"
>> 
>>> On 05/01/2014 02:45 AM, Mihai Caraman wrote:
>>> The commit 1d628af7 "add load inst fixup" made an attempt to handle
>>> failures generated by reading the guest current instruction. The fixup
>>> code that was added works by chance hiding the real issue.
>>> 
>>> Load external pid (lwepx) instruction, used by KVM to read guest
>>> instructions, is executed in a subsituted guest translation context
>>> (EPLC[EGS] = 1). In consequence lwepx's TLB error and data storage
>>> interrupts need to be handled by KVM, even though these interrupts
>>> are generated from host context (MSR[GS] = 0).
>>> 
>>> Currently, KVM hooks only interrupts generated from guest context
>>> (MSR[GS] = 1), doing minimal checks on the fast path to avoid host
>>> performance degradation. As a result, the host kernel handles lwepx
>>> faults searching the faulting guest data address (loaded in DEAR) in
>>> its own Logical Partition ID (LPID) 0 context. In case a host translation
>>> is found the execution returns to the lwepx instruction instead of the
>>> fixup, the host ending up in an infinite loop.
>>> 
>>> Revert the commit "add load inst fixup". lwepx issue will be addressed
>>> in a subsequent patch without needing fixup code.
>>> 
>>> Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
>> 
>> Just a random idea: Could we just switch IVOR2 during the critical lwepx
>> phase? In fact, we could even do that later when we're already in C code
>> and try to recover the last instruction. The code IVOR2 would point to
>> would simply set the register we're trying to read to as LAST_INST_FAIL
>> and rfi.
> 
> This was the first idea that sprang to my mind inspired from how DO_KVM
> is hooked on PR. I actually did a simple POC for e500mc/e5500, but this will
> not work on e6500 which has shared IVORs between HW threads.

What if we combine the ideas? On read we flip the IVOR to a separate handler that checks for a field in the PACA. Only if that field is set, we treat the fault as kvm fault, otherwise we jump into the normal handler.

I suppose we'd have to also take a lock to make sure we don't race with the other thread when it wants to also read a guest instruction, but you get the idea.

I have no idea whether this would be any faster, it's more of a brainstorming thing really. But regardless this patch set would be a move into the right direction.

Btw, do we have any guarantees that we don't get scheduled away before we run kvmppc_get_last_inst()? If we run on a different core we can't read the inst anymore. Hrm.


Alex
Mihai Caraman May 6, 2014, 3:48 p.m. UTC | #4
> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Sunday, May 04, 2014 1:14 AM
> To: Caraman Mihai Claudiu-B02008
> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
> dev@lists.ozlabs.org
> Subject: Re: [PATCH v2 1/4] KVM: PPC: e500mc: Revert "add load inst
> fixup"
> 
> 
> 
> Am 03.05.2014 um 01:14 schrieb "mihai.caraman@freescale.com"
> <mihai.caraman@freescale.com>:
> 
> >> From: Alexander Graf <agraf@suse.de>
> >> Sent: Friday, May 2, 2014 12:24 PM

> > This was the first idea that sprang to my mind inspired from how DO_KVM
> > is hooked on PR. I actually did a simple POC for e500mc/e5500, but this
> will
> > not work on e6500 which has shared IVORs between HW threads.
> 
> What if we combine the ideas? On read we flip the IVOR to a separate
> handler that checks for a field in the PACA. Only if that field is set,
> we treat the fault as kvm fault, otherwise we jump into the normal
> handler.
> 
> I suppose we'd have to also take a lock to make sure we don't race with
> the other thread when it wants to also read a guest instruction, but you
> get the idea.

This might be a solution for TLB eviction but not for execute-but-not-read
entries which requires access from host context.

> 
> I have no idea whether this would be any faster, it's more of a
> brainstorming thing really. But regardless this patch set would be a move
> into the right direction.
> 
> Btw, do we have any guarantees that we don't get scheduled away before we
> run kvmppc_get_last_inst()? If we run on a different core we can't read
> the inst anymore. Hrm.

It was your suggestion to move the logic from kvmppc_handle_exit() irq
disabled area to kvmppc_get_last_inst():

http://git.freescale.com/git/cgit.cgi/ppc/sdk/linux.git/tree/arch/powerpc/kvm/booke.c

Still, what is wrong if we get scheduled on another core? We will emulate
again and the guest will populate the TLB on the new core.

-Mike
Alexander Graf May 6, 2014, 3:54 p.m. UTC | #5
On 05/06/2014 05:48 PM, mihai.caraman@freescale.com wrote:
>> -----Original Message-----
>> From: Alexander Graf [mailto:agraf@suse.de]
>> Sent: Sunday, May 04, 2014 1:14 AM
>> To: Caraman Mihai Claudiu-B02008
>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
>> dev@lists.ozlabs.org
>> Subject: Re: [PATCH v2 1/4] KVM: PPC: e500mc: Revert "add load inst
>> fixup"
>>
>>
>>
>> Am 03.05.2014 um 01:14 schrieb "mihai.caraman@freescale.com"
>> <mihai.caraman@freescale.com>:
>>
>>>> From: Alexander Graf <agraf@suse.de>
>>>> Sent: Friday, May 2, 2014 12:24 PM
>>> This was the first idea that sprang to my mind inspired from how DO_KVM
>>> is hooked on PR. I actually did a simple POC for e500mc/e5500, but this
>> will
>>> not work on e6500 which has shared IVORs between HW threads.
>> What if we combine the ideas? On read we flip the IVOR to a separate
>> handler that checks for a field in the PACA. Only if that field is set,
>> we treat the fault as kvm fault, otherwise we jump into the normal
>> handler.
>>
>> I suppose we'd have to also take a lock to make sure we don't race with
>> the other thread when it wants to also read a guest instruction, but you
>> get the idea.
> This might be a solution for TLB eviction but not for execute-but-not-read
> entries which requires access from host context.

Good point :).

>
>> I have no idea whether this would be any faster, it's more of a
>> brainstorming thing really. But regardless this patch set would be a move
>> into the right direction.
>>
>> Btw, do we have any guarantees that we don't get scheduled away before we
>> run kvmppc_get_last_inst()? If we run on a different core we can't read
>> the inst anymore. Hrm.
> It was your suggestion to move the logic from kvmppc_handle_exit() irq
> disabled area to kvmppc_get_last_inst():
>
> http://git.freescale.com/git/cgit.cgi/ppc/sdk/linux.git/tree/arch/powerpc/kvm/booke.c
>
> Still, what is wrong if we get scheduled on another core? We will emulate
> again and the guest will populate the TLB on the new core.

Yes, it means we have to get the EMULATE_AGAIN code paths correct :). It 
also means we lose some performance with preemptive kernel configurations.


Alex
diff mbox

Patch

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index a1712b8..925da71 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -164,32 +164,9 @@ 
 	PPC_STL	r30, VCPU_GPR(R30)(r4)
 	PPC_STL	r31, VCPU_GPR(R31)(r4)
 	mtspr	SPRN_EPLC, r8
-
-	/* disable preemption, so we are sure we hit the fixup handler */
-	CURRENT_THREAD_INFO(r8, r1)
-	li	r7, 1
-	stw	r7, TI_PREEMPT(r8)
-
 	isync
-
-	/*
-	 * In case the read goes wrong, we catch it and write an invalid value
-	 * in LAST_INST instead.
-	 */
-1:	lwepx	r9, 0, r5
-2:
-.section .fixup, "ax"
-3:	li	r9, KVM_INST_FETCH_FAILED
-	b	2b
-.previous
-.section __ex_table,"a"
-	PPC_LONG_ALIGN
-	PPC_LONG 1b,3b
-.previous
-
+	lwepx   r9, 0, r5
 	mtspr	SPRN_EPLC, r3
-	li	r7, 0
-	stw	r7, TI_PREEMPT(r8)
 	stw	r9, VCPU_LAST_INST(r4)
 	.endif