Patchwork [2/2] KVM: PPC: Book3S: Call into C interrupt handlers

login
register
mail settings
Submitter Alexander Graf
Date April 26, 2012, 10:19 a.m.
Message ID <1335435543-19690-2-git-send-email-agraf@suse.de>
Download mbox | patch
Permalink /patch/155229/
State New
Headers show

Comments

Alexander Graf - April 26, 2012, 10:19 a.m.
Some interrupts only should be deferred when intercepted by KVM code,
not completely handled by KVM. Thus we need to call into Linux' interrupt
handling code for a few vectors.

So far, this has been done by calling right back into the original
interrupt vector once we're far enough in the guest exit code path.

However, this adds more code to be executed, because we need to save
and restore more state during the full interrupt cycle. We also lose
out on compiler optimizations, since it's all hand written asm.

So switch the code over to call into the Linux C handlers from C code,
speeding up everything along the way.

Along the way, this fixes a bug where we would have to set HSSR instead
of SSR SPRs for the interrupt vector, making hv capable hosts work again
properly.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/book3s_pr.c      |   58 +++++++++++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_segment.S |   25 +---------------
 2 files changed, 59 insertions(+), 24 deletions(-)
Benjamin Herrenschmidt - April 26, 2012, 9:45 p.m.
> +static void kvmppc_fill_pt_regs(struct pt_regs *regs)
> +{
> +	ulong r1, ip, msr, lr;
> +
> +	asm("mr %0, 1" : "=r"(r1));
> +	asm("mflr %0" : "=r"(lr));
> +	asm("mfmsr %0" : "=r"(msr));
> +	asm("bl 1f; 1: mflr %0" : "=r"(ip));
> +
> +	memset(regs, 0, sizeof(*regs));
> +	regs->gpr[1] = r1;
> +	regs->nip = ip;
> +	regs->msr = msr;
> +	regs->link = lr;
> +}

That is -very- gross ... I suppose it works but yuck :-)

> +static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu,
> +				     unsigned int exit_nr)
> +{
> +	struct pt_regs regs;
> +
> +	switch (exit_nr) {
> +	case BOOK3S_INTERRUPT_EXTERNAL:
> +	case BOOK3S_INTERRUPT_EXTERNAL_LEVEL:
> +	case BOOK3S_INTERRUPT_EXTERNAL_HV:
> +		kvmppc_fill_pt_regs(&regs);
> +		soft_irq_disable();
> +		do_IRQ(&regs);
> +		soft_irq_enable();

What are those soft_irq_disable/enable ? They look like sometimes
local_irq_disable/enable and sometimes something else ?

If you are hard disabled already (which you should be) and want to
"mark" things as soft disabled, I suppose that will work except that
you'll be missing PACA_IRQ_HARD_DIS in irq_happened, so
local_irq_enable() will not hard-enable.

> +		break;
> +	case BOOK3S_INTERRUPT_DECREMENTER:
> +	case BOOK3S_INTERRUPT_HV_DECREMENTER:
> +		kvmppc_fill_pt_regs(&regs);
> +		soft_irq_disable();
> +		timer_interrupt(&regs);
> +		soft_irq_enable();
> +		break;

Same.

> +	case BOOK3S_INTERRUPT_MACHINE_CHECK:
> +		/* FIXME */
> +		break;
> +	case BOOK3S_INTERRUPT_PERFMON:
> +		kvmppc_fill_pt_regs(&regs);
> +		soft_irq_disable();
> +		performance_monitor_exception(&regs);
> +		soft_irq_enable();
> +		break;

Same.

> +	}
> +}
> +
>  int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>                         unsigned int exit_nr)
>  {
> @@ -548,6 +602,10 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  	run->exit_reason = KVM_EXIT_UNKNOWN;
>  	run->ready_for_interrupt_injection = 1;
>  
> +	/* restart interrupts if they were meant for the host */
> +	kvmppc_restart_interrupt(vcpu, exit_nr);
> +	__hard_irq_enable();
> +

I suppose that's to work around the above comment about missing
PACA_IRQ_HARD_DIS ?

>  	trace_kvm_book3s_exit(exit_nr, vcpu);
>  	preempt_enable();
>  	kvm_resched(vcpu);
> diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S
> index 6bae0a9..3a78b46 100644
> --- a/arch/powerpc/kvm/book3s_segment.S
> +++ b/arch/powerpc/kvm/book3s_segment.S
> @@ -308,28 +308,6 @@ no_dcbz32_off:
>  
>  #endif /* CONFIG_PPC_BOOK3S_64 */
>  
> -	/*
> -	 * For some interrupts, we need to call the real Linux
> -	 * handler, so it can do work for us. This has to happen
> -	 * as if the interrupt arrived from the kernel though,
> -	 * so let's fake it here where most state is restored.
> -	 *
> -	 * Having set up SRR0/1 with the address where we want
> -	 * to continue with relocation on (potentially in module
> -	 * space), we either just go straight there with rfi[d],
> -	 * or we jump to an interrupt handler with bctr if there
> -	 * is an interrupt to be handled first.  In the latter
> -	 * case, the rfi[d] at the end of the interrupt handler
> -	 * will get us back to where we want to continue.
> -	 */
> -
> -	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
> -	beq	1f
> -	cmpwi	r12, BOOK3S_INTERRUPT_DECREMENTER
> -	beq	1f
> -	cmpwi	r12, BOOK3S_INTERRUPT_PERFMON
> -1:	mtctr	r12
> -
>  	/* Register usage at this point:
>  	 *
>  	 * R1       = host R1
> @@ -348,7 +326,6 @@ no_dcbz32_off:
>  	/* Load highmem handler address */
>  	mtsrr0	r8
>  
> -	/* RFI into the highmem handler, or jump to interrupt handler */
> -	beqctr
> +	/* RFI into the highmem handler */
>  	RFI
>  kvmppc_handler_trampoline_exit_end:

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood - April 26, 2012, 9:54 p.m.
On 04/26/2012 05:19 AM, Alexander Graf wrote:
> +	asm("mfmsr %0" : "=r"(msr));

Why not just mfmsr()?

> +	asm("bl 1f; 1: mflr %0" : "=r"(ip));

You'll want to tell the compiler that you're trashing LR.

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - April 26, 2012, 10:24 p.m.
On 26.04.2012, at 23:45, Benjamin Herrenschmidt wrote:

> 
>> +static void kvmppc_fill_pt_regs(struct pt_regs *regs)
>> +{
>> +	ulong r1, ip, msr, lr;
>> +
>> +	asm("mr %0, 1" : "=r"(r1));
>> +	asm("mflr %0" : "=r"(lr));
>> +	asm("mfmsr %0" : "=r"(msr));
>> +	asm("bl 1f; 1: mflr %0" : "=r"(ip));
>> +
>> +	memset(regs, 0, sizeof(*regs));
>> +	regs->gpr[1] = r1;
>> +	regs->nip = ip;
>> +	regs->msr = msr;
>> +	regs->link = lr;
>> +}
> 
> That is -very- gross ... I suppose it works but yuck :-)
> 
>> +static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu,
>> +				     unsigned int exit_nr)
>> +{
>> +	struct pt_regs regs;
>> +
>> +	switch (exit_nr) {
>> +	case BOOK3S_INTERRUPT_EXTERNAL:
>> +	case BOOK3S_INTERRUPT_EXTERNAL_LEVEL:
>> +	case BOOK3S_INTERRUPT_EXTERNAL_HV:
>> +		kvmppc_fill_pt_regs(&regs);
>> +		soft_irq_disable();
>> +		do_IRQ(&regs);
>> +		soft_irq_enable();
> 
> What are those soft_irq_disable/enable ? They look like sometimes
> local_irq_disable/enable and sometimes something else ?

Yeah, on ppc64 these are local_irq_en/disable. On ppc32, they are nops, because there we don't distinguish between soft and hard irq enabled.

> 
> If you are hard disabled already (which you should be) and want to
> "mark" things as soft disabled, I suppose that will work except that
> you'll be missing PACA_IRQ_HARD_DIS in irq_happened, so
> local_irq_enable() will not hard-enable.

I jump through this hoop because in kernel/posix-cpu-timers.c:1288 the BUG_ON triggers, since irqs_disabled() is false, because they are hard, but not soft disabled (which is a combination that iiuc doesn't happen in normal Linux).

> 
>> +		break;
>> +	case BOOK3S_INTERRUPT_DECREMENTER:
>> +	case BOOK3S_INTERRUPT_HV_DECREMENTER:
>> +		kvmppc_fill_pt_regs(&regs);
>> +		soft_irq_disable();
>> +		timer_interrupt(&regs);
>> +		soft_irq_enable();
>> +		break;
> 
> Same.
> 
>> +	case BOOK3S_INTERRUPT_MACHINE_CHECK:
>> +		/* FIXME */
>> +		break;
>> +	case BOOK3S_INTERRUPT_PERFMON:
>> +		kvmppc_fill_pt_regs(&regs);
>> +		soft_irq_disable();
>> +		performance_monitor_exception(&regs);
>> +		soft_irq_enable();
>> +		break;
> 
> Same.
> 
>> +	}
>> +}
>> +
>> int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>>                        unsigned int exit_nr)
>> {
>> @@ -548,6 +602,10 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>> 	run->exit_reason = KVM_EXIT_UNKNOWN;
>> 	run->ready_for_interrupt_injection = 1;
>> 
>> +	/* restart interrupts if they were meant for the host */
>> +	kvmppc_restart_interrupt(vcpu, exit_nr);
>> +	__hard_irq_enable();
>> +
> 
> I suppose that's to work around the above comment about missing
> PACA_IRQ_HARD_DIS ?

This is to actually enable interrupts for real, regardless of ppc64 and ppc32. In fact, the previous code was pretty buggy - it was running the handlers with interrupts disabled ;).


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - April 26, 2012, 10:26 p.m.
On 26.04.2012, at 23:54, Scott Wood wrote:

> On 04/26/2012 05:19 AM, Alexander Graf wrote:
>> +	asm("mfmsr %0" : "=r"(msr));
> 
> Why not just mfmsr()?

Good question :).

> 
>> +	asm("bl 1f; 1: mflr %0" : "=r"(ip));
> 
> You'll want to tell the compiler that you're trashing LR.

Yikes. Yeah.

Keep in mind that this is a 1:1 copy from the booke code (read: I should merge those 2). So it's just as broken there.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt - April 26, 2012, 10:58 p.m.
On Thu, 2012-04-26 at 16:54 -0500, Scott Wood wrote:
> On 04/26/2012 05:19 AM, Alexander Graf wrote:
> > +     asm("mfmsr %0" : "=r"(msr));
> 
> Why not just mfmsr()?
> 
> > +     asm("bl 1f; 1: mflr %0" : "=r"(ip));
> 
> You'll want to tell the compiler that you're trashing LR. 

We probably also want to make sure TRAP is set properly. Some interrupts
(like perf) will test it.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt - April 26, 2012, 11:12 p.m.
On Fri, 2012-04-27 at 00:24 +0200, Alexander Graf wrote:
> 
> This is to actually enable interrupts for real, regardless of ppc64
> and ppc32. In fact, the previous code was pretty buggy - it was
> running the handlers with interrupts disabled ;). 

They should be run with interrupts disabled.. tho both soft & hard.

You probably do want to call local_irq_disable() unconditionally anyway,
because on ppc32, that will give you the proper accounting vs. lockdep.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - April 26, 2012, 11:30 p.m.
On 27.04.2012, at 01:12, Benjamin Herrenschmidt wrote:

> On Fri, 2012-04-27 at 00:24 +0200, Alexander Graf wrote:
>> 
>> This is to actually enable interrupts for real, regardless of ppc64
>> and ppc32. In fact, the previous code was pretty buggy - it was
>> running the handlers with interrupts disabled ;). 
> 
> They should be run with interrupts disabled.. tho both soft & hard.

The kvm_resched()? No, that one should be run with interrupts enabled - hard and soft :).

> You probably do want to call local_irq_disable() unconditionally anyway,
> because on ppc32, that will give you the proper accounting vs. lockdep.

We already do __hard_irq_disable (which maps to local_irq_disable on ppc32) when entering the guest context and when leaving the intercept handler :). So that should be fine, no?


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt - April 26, 2012, 11:37 p.m.
On Fri, 2012-04-27 at 01:30 +0200, Alexander Graf wrote:
> On 27.04.2012, at 01:12, Benjamin Herrenschmidt wrote:
> 
> > On Fri, 2012-04-27 at 00:24 +0200, Alexander Graf wrote:
> >> 
> >> This is to actually enable interrupts for real, regardless of ppc64
> >> and ppc32. In fact, the previous code was pretty buggy - it was
> >> running the handlers with interrupts disabled ;). 
> > 
> > They should be run with interrupts disabled.. tho both soft & hard.
> 
> The kvm_resched()? No, that one should be run with interrupts enabled - hard and soft :).

Ok, when you said "the handler" I thought you mean do_IRQ & co... those
must be run with IRQs off (and never enabled since taking the actual
exception).

> > You probably do want to call local_irq_disable() unconditionally anyway,
> > because on ppc32, that will give you the proper accounting vs. lockdep.
> 
> We already do __hard_irq_disable (which maps to local_irq_disable on ppc32)
> when entering the guest context and when leaving the intercept handler :).
> So that should be fine, no?

Well, __hard_irq_disable() isn't defined on ppc32 in hw_irq.h so if you
redefine it locally that's really gross :-) Also that means that from a
lockdep perspective you are running the entire guest with IRQs off ?
that doesn't sound right...

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - April 26, 2012, 11:50 p.m.
On 27.04.2012, at 01:37, Benjamin Herrenschmidt wrote:

> On Fri, 2012-04-27 at 01:30 +0200, Alexander Graf wrote:
>> On 27.04.2012, at 01:12, Benjamin Herrenschmidt wrote:
>> 
>>> On Fri, 2012-04-27 at 00:24 +0200, Alexander Graf wrote:
>>>> 
>>>> This is to actually enable interrupts for real, regardless of ppc64
>>>> and ppc32. In fact, the previous code was pretty buggy - it was
>>>> running the handlers with interrupts disabled ;). 
>>> 
>>> They should be run with interrupts disabled.. tho both soft & hard.
>> 
>> The kvm_resched()? No, that one should be run with interrupts enabled - hard and soft :).
> 
> Ok, when you said "the handler" I thought you mean do_IRQ & co... those
> must be run with IRQs off (and never enabled since taking the actual
> exception).

Ah, yeah, "the handler" in this context meant the kvm intercept handlers. Those were running with irq hard disabled. Pretty bad, eh? :)

> 
>>> You probably do want to call local_irq_disable() unconditionally anyway,
>>> because on ppc32, that will give you the proper accounting vs. lockdep.
>> 
>> We already do __hard_irq_disable (which maps to local_irq_disable on ppc32)
>> when entering the guest context and when leaving the intercept handler :).
>> So that should be fine, no?
> 
> Well, __hard_irq_disable() isn't defined on ppc32 in hw_irq.h so if you
> redefine it locally that's really gross :-) Also that means that from a
> lockdep perspective you are running the entire guest with IRQs off ?
> that doesn't sound right...

Yup. We're running the entire guest with IRQs off. Do you have any better idea how to make sure that we're atomic wrt signal delivery?


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt - April 27, 2012, midnight
On Fri, 2012-04-27 at 01:50 +0200, Alexander Graf wrote:

> Yup. We're running the entire guest with IRQs off. Do you have any better idea how to make sure that we're atomic wrt signal delivery?

Something like this entry path:

 - local_irq_disable
 - check TIF_NEED_RESCHED, TIF_SIGPENDING,... (probably
_TIF_USER_WORK_MASK in fact)
 - if any set, re-enable then -> back to qemu
 - ppc64: hard disable
 - ppc64: check irq_pending, something set (other than hard disable) ->
local_irq_enable() and try again the whole sequence
 - call trace_irq_enable() (tell lockdep/irqtrace we are re-enabling)
 - enter the guest, last rfi will turn EE on

And exit:

 - irqs are hard off from the interrupt handler
 - call local_irq_disable() to tell lockdep about it and mark
soft-disabled on ppc64
 - ppc64: maybe set PACA_HARD_IRQ_DIS (tbd)
 - do things like do_IRQ() etc... if needed
 - local_irq_enable() (will hard enable if PACA_HARD_IRQ_DIS was set)


Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Mackerras - April 27, 2012, 5:48 a.m.
On Thu, Apr 26, 2012 at 12:19:03PM +0200, Alexander Graf wrote:

> So switch the code over to call into the Linux C handlers from C code,
> speeding up everything along the way.

I have to say this patch makes me pretty uneasy.  There are a few
things that look wrong to me, but more than that, it seems to me that
there would be a lot of careful thought needed to make sure that the
approach is bullet-proof.

The first thing is that you are filling in the registers, and in
particular r1, in a subroutine, so you are potentially making regs.r1
point to a stack frame that no longer exists by the time we look at it
inside do_IRQ or timer_interrupt.  So, for example, a stack trace
could go completely off the rails at that point.  Quite possibly gcc
will inline the kvmppc_fill_pt_regs function, which would probably
save you, but I don't think you should rely on that.

The second thing is, why do you save just r1, ip, msr, and lr?  Why
those ones and no others?  A performance monitor interrupt might well
decide to save all the registers away plus a stack trace, and to see
all the GPRs as 0 could be very confusing.

Thirdly, if preemption is enabled, it could well be that the interrupt
will wake up a higher priority task which should run before we
continue.  On 64-bit you are probably saved by the soft_irq_enable
calls, which will (I think) call schedule() if a reschedule is
pending, but on 32-bit soft_irq_enable does nothing.

Fourthly, as Ben said, you should be setting regs->trap.

Have you measured a performance improvement with this patch?  If so
how big was it?

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - April 27, 2012, 11:23 a.m.
On 27.04.2012, at 07:48, Paul Mackerras wrote:

> On Thu, Apr 26, 2012 at 12:19:03PM +0200, Alexander Graf wrote:
> 
>> So switch the code over to call into the Linux C handlers from C code,
>> speeding up everything along the way.
> 
> I have to say this patch makes me pretty uneasy.  There are a few
> things that look wrong to me, but more than that, it seems to me that
> there would be a lot of careful thought needed to make sure that the
> approach is bullet-proof.

Yay, finally some review on it :). This method is currently used identically in booke hv, so everything we find broken here also applies there!

> The first thing is that you are filling in the registers, and in
> particular r1, in a subroutine, so you are potentially making regs.r1
> point to a stack frame that no longer exists by the time we look at it
> inside do_IRQ or timer_interrupt.  So, for example, a stack trace
> could go completely off the rails at that point.  Quite possibly gcc
> will inline the kvmppc_fill_pt_regs function, which would probably
> save you, but I don't think you should rely on that.

Ugh. Right.

> The second thing is, why do you save just r1, ip, msr, and lr?  Why
> those ones and no others?  A performance monitor interrupt might well
> decide to save all the registers away plus a stack trace, and to see
> all the GPRs as 0 could be very confusing.

Well, any other state at that point is pretty useless, since we've long deferred from the original IP the interrupt arrived at. So if a perfmon module reads out other GPRs there, they are basically guaranteed to be useless anyway, no?

> Thirdly, if preemption is enabled, it could well be that the interrupt
> will wake up a higher priority task which should run before we
> continue.  On 64-bit you are probably saved by the soft_irq_enable
> calls, which will (I think) call schedule() if a reschedule is
> pending, but on 32-bit soft_irq_enable does nothing.

At that point preemption is disabled.

> Fourthly, as Ben said, you should be setting regs->trap.

Yup :). Very good catch that one.

> Have you measured a performance improvement with this patch?  If so
> how big was it?

Yeah, I tried things on 970 in an mfsprg/mtsprg busy loop. I measured 3 different variants:

C irq handling:		1004944 exits/sec
asm irq handling:		1001774 exits/sec
asm + hsrr patch:		994719 exits/sec

So as you can see, that code change does have quite an impact. But maybe the added complexity isn't worth it? Either way, we should try and find a solution that works the same way for booke and book3s - I don't want such an integral part to differ all that much.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood - April 27, 2012, 4:37 p.m.
On 04/27/2012 06:23 AM, Alexander Graf wrote:
> 
> On 27.04.2012, at 07:48, Paul Mackerras wrote:
> 
>> Have you measured a performance improvement with this patch?  If so
>> how big was it?
> 
> Yeah, I tried things on 970 in an mfsprg/mtsprg busy loop. I measured 3 different variants:
> 
> C irq handling:		1004944 exits/sec
> asm irq handling:		1001774 exits/sec
> asm + hsrr patch:		994719 exits/sec
> 
> So as you can see, that code change does have quite an impact. But
> maybe the added complexity isn't worth it? Either way, we should try
> and find a solution that works the same way for booke and book3s - I
> don't want such an integral part to differ all that much.

Is it really added complexity, considering what you can remove from the
asm?  I went with C handling on bookehv because it seemed simpler (the
original internal version had asm handling).

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - April 27, 2012, 4:54 p.m.
On 27.04.2012, at 18:37, Scott Wood <scottwood@freescale.com> wrote:

> On 04/27/2012 06:23 AM, Alexander Graf wrote:
>> 
>> On 27.04.2012, at 07:48, Paul Mackerras wrote:
>> 
>>> Have you measured a performance improvement with this patch?  If so
>>> how big was it?
>> 
>> Yeah, I tried things on 970 in an mfsprg/mtsprg busy loop. I measured 3 different variants:
>> 
>> C irq handling:        1004944 exits/sec
>> asm irq handling:        1001774 exits/sec
>> asm + hsrr patch:        994719 exits/sec
>> 
>> So as you can see, that code change does have quite an impact. But
>> maybe the added complexity isn't worth it? Either way, we should try
>> and find a solution that works the same way for booke and book3s - I
>> don't want such an integral part to differ all that much.
> 
> Is it really added complexity, considering what you can remove from the
> asm?  I went with C handling on bookehv because it seemed simpler (the
> original internal version had asm handling).

Well, the addition in complexity is that you have to start thinking about more things than before :). I'm not sure really which way is better in the long run.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt - April 27, 2012, 10:20 p.m.
On Fri, 2012-04-27 at 13:23 +0200, Alexander Graf wrote:
> On 27.04.2012, at 07:48, Paul Mackerras wrote:
> 
> > On Thu, Apr 26, 2012 at 12:19:03PM +0200, Alexander Graf wrote:
> > 
> >> So switch the code over to call into the Linux C handlers from C code,
> >> speeding up everything along the way.
> > 
> > I have to say this patch makes me pretty uneasy.  There are a few
> > things that look wrong to me, but more than that, it seems to me that
> > there would be a lot of careful thought needed to make sure that the
> > approach is bullet-proof.
> 
> Yay, finally some review on it :). This method is currently used identically
> in booke hv, so everything we find broken here also applies there!
> 
> > The first thing is that you are filling in the registers, and in
> > particular r1, in a subroutine, so you are potentially making regs.r1
> > point to a stack frame that no longer exists by the time we look at it
> > inside do_IRQ or timer_interrupt.  So, for example, a stack trace
> > could go completely off the rails at that point.  Quite possibly gcc
> > will inline the kvmppc_fill_pt_regs function, which would probably
> > save you, but I don't think you should rely on that.
> 
> Ugh. Right.
> 
> > The second thing is, why do you save just r1, ip, msr, and lr?  Why
> > those ones and no others?  A performance monitor interrupt might well
> > decide to save all the registers away plus a stack trace, and to see
> > all the GPRs as 0 could be very confusing.
> 
> Well, any other state at that point is pretty useless, since we've long
> deferred from the original IP the interrupt arrived at. So if a perfmon
> module reads out other GPRs there, they are basically guaranteed to be
> useless anyway, no?
> 
> > Thirdly, if preemption is enabled, it could well be that the interrupt
> > will wake up a higher priority task which should run before we
> > continue.  On 64-bit you are probably saved by the soft_irq_enable
> > calls, which will (I think) call schedule() if a reschedule is
> > pending, but on 32-bit soft_irq_enable does nothing.
> 
> At that point preemption is disabled.
> 
> > Fourthly, as Ben said, you should be setting regs->trap.
> 
> Yup :). Very good catch that one.
> 
> > Have you measured a performance improvement with this patch?  If so
> > how big was it?
> 
> Yeah, I tried things on 970 in an mfsprg/mtsprg busy loop. I measured 3 different variants:
> 
> C irq handling:		1004944 exits/sec
> asm irq handling:		1001774 exits/sec
> asm + hsrr patch:		994719 exits/sec
> 
> So as you can see, that code change does have quite an impact. But maybe the added
> complexity isn't worth it? Either way, we should try and find a solution that works
> the same way for booke and book3s - I don't want such an integral part to differ
> all that much.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index dba282e..f84b7ba 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -54,6 +54,11 @@  static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr,
 #define HW_PAGE_SIZE PAGE_SIZE
 #define __hard_irq_disable local_irq_disable
 #define __hard_irq_enable local_irq_enable
+static void soft_irq_disable(void) { }
+static void soft_irq_enable(void) { }
+#else
+#define soft_irq_disable local_irq_disable
+#define soft_irq_enable local_irq_enable
 #endif
 
 void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
@@ -538,6 +543,55 @@  static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr,
 	return RESUME_GUEST;
 }
 
+static void kvmppc_fill_pt_regs(struct pt_regs *regs)
+{
+	ulong r1, ip, msr, lr;
+
+	asm("mr %0, 1" : "=r"(r1));
+	asm("mflr %0" : "=r"(lr));
+	asm("mfmsr %0" : "=r"(msr));
+	asm("bl 1f; 1: mflr %0" : "=r"(ip));
+
+	memset(regs, 0, sizeof(*regs));
+	regs->gpr[1] = r1;
+	regs->nip = ip;
+	regs->msr = msr;
+	regs->link = lr;
+}
+
+static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu,
+				     unsigned int exit_nr)
+{
+	struct pt_regs regs;
+
+	switch (exit_nr) {
+	case BOOK3S_INTERRUPT_EXTERNAL:
+	case BOOK3S_INTERRUPT_EXTERNAL_LEVEL:
+	case BOOK3S_INTERRUPT_EXTERNAL_HV:
+		kvmppc_fill_pt_regs(&regs);
+		soft_irq_disable();
+		do_IRQ(&regs);
+		soft_irq_enable();
+		break;
+	case BOOK3S_INTERRUPT_DECREMENTER:
+	case BOOK3S_INTERRUPT_HV_DECREMENTER:
+		kvmppc_fill_pt_regs(&regs);
+		soft_irq_disable();
+		timer_interrupt(&regs);
+		soft_irq_enable();
+		break;
+	case BOOK3S_INTERRUPT_MACHINE_CHECK:
+		/* FIXME */
+		break;
+	case BOOK3S_INTERRUPT_PERFMON:
+		kvmppc_fill_pt_regs(&regs);
+		soft_irq_disable();
+		performance_monitor_exception(&regs);
+		soft_irq_enable();
+		break;
+	}
+}
+
 int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
                        unsigned int exit_nr)
 {
@@ -548,6 +602,10 @@  int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	run->exit_reason = KVM_EXIT_UNKNOWN;
 	run->ready_for_interrupt_injection = 1;
 
+	/* restart interrupts if they were meant for the host */
+	kvmppc_restart_interrupt(vcpu, exit_nr);
+	__hard_irq_enable();
+
 	trace_kvm_book3s_exit(exit_nr, vcpu);
 	preempt_enable();
 	kvm_resched(vcpu);
diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S
index 6bae0a9..3a78b46 100644
--- a/arch/powerpc/kvm/book3s_segment.S
+++ b/arch/powerpc/kvm/book3s_segment.S
@@ -308,28 +308,6 @@  no_dcbz32_off:
 
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
-	/*
-	 * For some interrupts, we need to call the real Linux
-	 * handler, so it can do work for us. This has to happen
-	 * as if the interrupt arrived from the kernel though,
-	 * so let's fake it here where most state is restored.
-	 *
-	 * Having set up SRR0/1 with the address where we want
-	 * to continue with relocation on (potentially in module
-	 * space), we either just go straight there with rfi[d],
-	 * or we jump to an interrupt handler with bctr if there
-	 * is an interrupt to be handled first.  In the latter
-	 * case, the rfi[d] at the end of the interrupt handler
-	 * will get us back to where we want to continue.
-	 */
-
-	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
-	beq	1f
-	cmpwi	r12, BOOK3S_INTERRUPT_DECREMENTER
-	beq	1f
-	cmpwi	r12, BOOK3S_INTERRUPT_PERFMON
-1:	mtctr	r12
-
 	/* Register usage at this point:
 	 *
 	 * R1       = host R1
@@ -348,7 +326,6 @@  no_dcbz32_off:
 	/* Load highmem handler address */
 	mtsrr0	r8
 
-	/* RFI into the highmem handler, or jump to interrupt handler */
-	beqctr
+	/* RFI into the highmem handler */
 	RFI
 kvmppc_handler_trampoline_exit_end: