[2/2] KVM: PPC: Book3S: Call into C interrupt handlers

Submitted by Alexander Graf on April 27, 2012, 2:19 p.m.


Message ID CC095701-3BB5-42C5-AF3A-C569275FE69F@suse.de
State New
Headers show

Commit Message

Alexander Graf April 27, 2012, 2:19 p.m.
On 27.04.2012, at 13:23, Alexander Graf wrote:

> On 27.04.2012, at 07:48, Paul Mackerras wrote:
>> On Thu, Apr 26, 2012 at 12:19:03PM +0200, Alexander Graf wrote:
>>> So switch the code over to call into the Linux C handlers from C code,
>>> speeding up everything along the way.
>> I have to say this patch makes me pretty uneasy.  There are a few
>> things that look wrong to me, but more than that, it seems to me that
>> there would be a lot of careful thought needed to make sure that the
>> approach is bullet-proof.
> Yay, finally some review on it :). This method is currently used identically in booke hv, so everything we find broken here also applies there!
>> The first thing is that you are filling in the registers, and in
>> particular r1, in a subroutine, so you are potentially making regs.r1
>> point to a stack frame that no longer exists by the time we look at it
>> inside do_IRQ or timer_interrupt.  So, for example, a stack trace
>> could go completely off the rails at that point.  Quite possibly gcc
>> will inline the kvmppc_fill_pt_regs function, which would probably
>> save you, but I don't think you should rely on that.
> Ugh. Right.
>> The second thing is, why do you save just r1, ip, msr, and lr?  Why
>> those ones and no others?  A performance monitor interrupt might well
>> decide to save all the registers away plus a stack trace, and to see
>> all the GPRs as 0 could be very confusing.
> Well, any other state at that point is pretty useless, since we've long deferred from the original IP the interrupt arrived at. So if a perfmon module reads out other GPRs there, they are basically guaranteed to be useless anyway, no?
>> Thirdly, if preemption is enabled, it could well be that the interrupt
>> will wake up a higher priority task which should run before we
>> continue.  On 64-bit you are probably saved by the soft_irq_enable
>> calls, which will (I think) call schedule() if a reschedule is
>> pending, but on 32-bit soft_irq_enable does nothing.
> At that point preemption is disabled.
>> Fourthly, as Ben said, you should be setting regs->trap.
> Yup :). Very good catch that one.
>> Have you measured a performance improvement with this patch?  If so
>> how big was it?
> Yeah, I tried things on 970 in an mfsprg/mtsprg busy loop. I measured 3 different variants:
> C irq handling:		1004944 exits/sec
> asm irq handling:		1001774 exits/sec
> asm + hsrr patch:		994719 exits/sec
> So as you can see, that code change does have quite an impact. But maybe the added complexity isn't worth it? Either way, we should try and find a solution that works the same way for booke and book3s - I don't want such an integral part to differ all that much.

How about this patch? This at least gets rid of the hsrr performance penalty:

To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch hide | download patch | download mbox

diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S
index 6bae0a9..8b2fc66 100644
--- a/arch/powerpc/kvm/book3s_segment.S
+++ b/arch/powerpc/kvm/book3s_segment.S
@@ -198,6 +198,7 @@  kvmppc_interrupt:
 	/* Save guest PC and MSR */
 #ifdef CONFIG_PPC64
+	mr	r10, r12
 	andi.	r0,r12,0x2
 	beq	1f
 	mfspr	r3,SPRN_HSRR0
@@ -317,23 +318,17 @@  no_dcbz32_off:
 	 * Having set up SRR0/1 with the address where we want
 	 * to continue with relocation on (potentially in module
 	 * space), we either just go straight there with rfi[d],
-	 * or we jump to an interrupt handler with bctr if there
-	 * is an interrupt to be handled first.  In the latter
-	 * case, the rfi[d] at the end of the interrupt handler
-	 * will get us back to where we want to continue.
+	 * or we jump to an interrupt handler if there is an
+	 * interrupt to be handled first.  In the latter case,
+	 * the rfi[d] at the end of the interrupt handler will
+	 * get us back to where we want to continue.
-	beq	1f
-	beq	1f
-1:	mtctr	r12
 	/* Register usage at this point:
 	 * R1       = host R1
 	 * R2       = host R2
+	 * R10      = raw exit handler id
 	 * R12      = exit handler id
 	 * R13      = shadow vcpu (32-bit) or PACA (64-bit)
 	 * SVCPU.*  = guest *
@@ -343,12 +338,26 @@  no_dcbz32_off:
-	/* Restore host msr -> SRR1 */
+#ifdef CONFIG_PPC64
+	andi.	r0,r10,0x2
+	beq	1f
+	mtspr	SPRN_HSRR1, r6
+	mtspr	SPRN_HSRR0, r8
+1:	/* Restore host msr -> SRR1 */
 	mtsrr1	r6
 	/* Load highmem handler address */
 	mtsrr0	r8
 	/* RFI into the highmem handler, or jump to interrupt handler */
-	beqctr