Patchwork powerpc: kvm: optimize "sc 0" as fast return

Submitter Liu Ping Fan
Date Nov. 8, 2013, 2:44 a.m.
Message ID <1383878656-4196-1-git-send-email-pingfank@linux.vnet.ibm.com>
Permalink /patch/289663/
State New

Comments

Liu Ping Fan - Nov. 8, 2013, 2:44 a.m.
syscall is a very common operation inside a guest, and this patch
optimizes the path for the emulation of BOOK3S_INTERRUPT_SYSCALL,
so the hypervisor can return to the guest without a heavy exit, i.e. no
need to swap the TLB, HTAB, etc.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
Compiled, but for lack of bare metal I have not tested it yet.
---
 arch/powerpc/kvm/book3s_hv.c            |  6 ------
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 13 ++++++++++++-
 2 files changed, 12 insertions(+), 7 deletions(-)
Alexander Graf - Nov. 8, 2013, 3:10 a.m.
On 08.11.2013, at 03:44, Liu Ping Fan <kernelfans@gmail.com> wrote:

> syscall is a very common behavior inside guest, and this patch
> optimizes the path for the emulation of BOOK3S_INTERRUPT_SYSCALL,
> so hypervisor can return to guest without heavy exit, i.e, no need
> to swap TLB, HTAB,.. etc

The syscall exit you touch here only happens when you do an sc > 0 with MSR_PR set inside the guest. The only case you realistically see this is when you run PR KVM inside of an HV KVM guest.

I don't think we should optimize for that case. Instead, we should rather try to not bounce to the 1st hypervisor in the first place in that scenario :).


Alex

> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
> Compiled, but lack of bare metal, I have not tested it yet.
> ---
> arch/powerpc/kvm/book3s_hv.c            |  6 ------
> arch/powerpc/kvm/book3s_hv_rmhandlers.S | 13 ++++++++++++-
> 2 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 62a2b5a..73dc852 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -628,12 +628,6 @@ static int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
> 		/* hcall - punt to userspace */
> 		int i;
> 
> -		if (vcpu->arch.shregs.msr & MSR_PR) {
> -			/* sc 1 from userspace - reflect to guest syscall */
> -			kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_SYSCALL);
> -			r = RESUME_GUEST;
> -			break;
> -		}
> 		run->papr_hcall.nr = kvmppc_get_gpr(vcpu, 3);
> 		for (i = 0; i < 9; ++i)
> 			run->papr_hcall.args[i] = kvmppc_get_gpr(vcpu, 4 + i);
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index c71103b..9f626c3 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1388,7 +1388,8 @@ kvmppc_hisi:
> hcall_try_real_mode:
> 	ld	r3,VCPU_GPR(R3)(r9)
> 	andi.	r0,r11,MSR_PR
> -	bne	guest_exit_cont
> +	/* sc 1 from userspace - reflect to guest syscall */
> +	bne	sc_0_fast_return
> 	clrrdi	r3,r3,2
> 	cmpldi	r3,hcall_real_table_end - hcall_real_table
> 	bge	guest_exit_cont
> @@ -1409,6 +1410,16 @@ hcall_try_real_mode:
> 	ld	r11,VCPU_MSR(r4)
> 	b	fast_guest_return
> 
> +sc_0_fast_return:
> +	ld	r10,VCPU_PC(r9)
> +	ld	r11,VCPU_MSR(r9)
> +	mtspr	SPRN_SRR0,r10
> +	mtspr	SPRN_SRR1,r11
> +	li	r10, BOOK3S_INTERRUPT_SYSCALL
> +	LOAD_REG_IMMEDIATE(r3,0xffffffff87a0ffff)	/* zero 33:36,42:47 */
> +	and	r11,r11,r3
> +	b	fast_guest_return
> +
> 	/* We've attempted a real mode hcall, but it's punted it back
> 	 * to userspace.  We need to restore some clobbered volatiles
> 	 * before resuming the pass-it-to-qemu path */
> -- 
> 1.8.1.4
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt - Nov. 8, 2013, 4:05 a.m.
On Fri, 2013-11-08 at 04:10 +0100, Alexander Graf wrote:
> On 08.11.2013, at 03:44, Liu Ping Fan <kernelfans@gmail.com> wrote:
> 
> > syscall is a very common behavior inside guest, and this patch
> > optimizes the path for the emulation of BOOK3S_INTERRUPT_SYSCALL,
> > so hypervisor can return to guest without heavy exit, i.e, no need
> > to swap TLB, HTAB,.. etc
> 
> The syscall exit you touch here only happens when you do an sc > 0
> with MSR_PR set inside the guest. The only case you realistically see
> this is when you run PR KVM inside of an HV KVM guest.
> 
> I don't think we should optimize for that case. Instead, we should
> rather try to not bounce to the 1st hypervisor in the first place in
> that scenario :).

Well, so unfortunately openstack CI uses PR inside HV pretty
heavily .... it *might* be worthwhile optimizing that path if the patch
is simple enough... I'd make that Paul's call.

Cheers,
Ben.

> 
> Alex
> 
> > 
> > Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> > ---
> > Compiled, but lack of bare metal, I have not tested it yet.
> > ---
> > arch/powerpc/kvm/book3s_hv.c            |  6 ------
> > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 13 ++++++++++++-
> > 2 files changed, 12 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/powerpc/kvm/book3s_hv.c
> b/arch/powerpc/kvm/book3s_hv.c
> > index 62a2b5a..73dc852 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -628,12 +628,6 @@ static int kvmppc_handle_exit(struct kvm_run
> *run, struct kvm_vcpu *vcpu,
> > 		/* hcall - punt to userspace */
> > 		int i;
> > 
> > -		if (vcpu->arch.shregs.msr & MSR_PR) {
> > -			/* sc 1 from userspace - reflect to guest syscall */
> > -			kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_SYSCALL);
> > -			r = RESUME_GUEST;
> > -			break;
> > -		}
> > 		run->papr_hcall.nr = kvmppc_get_gpr(vcpu, 3);
> > 		for (i = 0; i < 9; ++i)
> > 			run->papr_hcall.args[i] = kvmppc_get_gpr(vcpu, 4 + i);
> > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > index c71103b..9f626c3 100644
> > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > @@ -1388,7 +1388,8 @@ kvmppc_hisi:
> > hcall_try_real_mode:
> > 	ld	r3,VCPU_GPR(R3)(r9)
> > 	andi.	r0,r11,MSR_PR
> > -	bne	guest_exit_cont
> > +	/* sc 1 from userspace - reflect to guest syscall */
> > +	bne	sc_0_fast_return
> > 	clrrdi	r3,r3,2
> > 	cmpldi	r3,hcall_real_table_end - hcall_real_table
> > 	bge	guest_exit_cont
> > @@ -1409,6 +1410,16 @@ hcall_try_real_mode:
> > 	ld	r11,VCPU_MSR(r4)
> > 	b	fast_guest_return
> > 
> > +sc_0_fast_return:
> > +	ld	r10,VCPU_PC(r9)
> > +	ld	r11,VCPU_MSR(r9)
> > +	mtspr	SPRN_SRR0,r10
> > +	mtspr	SPRN_SRR1,r11
> > +	li	r10, BOOK3S_INTERRUPT_SYSCALL
> > +	LOAD_REG_IMMEDIATE(r3,0xffffffff87a0ffff)	/* zero 33:36,42:47 */
> > +	and	r11,r11,r3
> > +	b	fast_guest_return
> > +
> > 	/* We've attempted a real mode hcall, but it's punted it back
> > 	 * to userspace.  We need to restore some clobbered volatiles
> > 	 * before resuming the pass-it-to-qemu path */
> > -- 
> > 1.8.1.4
> > 


Benjamin Herrenschmidt - Nov. 8, 2013, 4:11 a.m.
On Fri, 2013-11-08 at 15:05 +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2013-11-08 at 04:10 +0100, Alexander Graf wrote:
> > On 08.11.2013, at 03:44, Liu Ping Fan <kernelfans@gmail.com> wrote:
> > 
> > > syscall is a very common behavior inside guest, and this patch
> > > optimizes the path for the emulation of BOOK3S_INTERRUPT_SYSCALL,
> > > so hypervisor can return to guest without heavy exit, i.e, no need
> > > to swap TLB, HTAB,.. etc
> > 
> > The syscall exit you touch here only happens when you do an sc > 0
> > with MSR_PR set inside the guest. The only case you realistically see
> > this is when you run PR KVM inside of an HV KVM guest.
> > 
> > I don't think we should optimize for that case. Instead, we should
> > rather try to not bounce to the 1st hypervisor in the first place in
> > that scenario :).
> 
> Well, so unfortunately openstack CI uses PR inside HV pretty
> heavily .... it *might* be worthwhile optimizing that path if the patch
> is simple enough... I'd make that Paul's call.

Note that this is a statement of value for the idea ... not the
implementation ;-) From a quick look with Paulus, the patch is quite
broken. I'll let Paul comment in details.

Cheers,
Ben.

> Cheers,
> Ben.
> 
> > 
> > Alex
> > 
> > > 
> > > Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> > > ---
> > > Compiled, but lack of bare metal, I have not tested it yet.
> > > ---
> > > arch/powerpc/kvm/book3s_hv.c            |  6 ------
> > > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 13 ++++++++++++-
> > > 2 files changed, 12 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/kvm/book3s_hv.c
> > b/arch/powerpc/kvm/book3s_hv.c
> > > index 62a2b5a..73dc852 100644
> > > --- a/arch/powerpc/kvm/book3s_hv.c
> > > +++ b/arch/powerpc/kvm/book3s_hv.c
> > > @@ -628,12 +628,6 @@ static int kvmppc_handle_exit(struct kvm_run
> > *run, struct kvm_vcpu *vcpu,
> > > 		/* hcall - punt to userspace */
> > > 		int i;
> > > 
> > > -		if (vcpu->arch.shregs.msr & MSR_PR) {
> > > -			/* sc 1 from userspace - reflect to guest syscall */
> > > -			kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_SYSCALL);
> > > -			r = RESUME_GUEST;
> > > -			break;
> > > -		}
> > > 		run->papr_hcall.nr = kvmppc_get_gpr(vcpu, 3);
> > > 		for (i = 0; i < 9; ++i)
> > > 			run->papr_hcall.args[i] = kvmppc_get_gpr(vcpu, 4 + i);
> > > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > > index c71103b..9f626c3 100644
> > > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > > @@ -1388,7 +1388,8 @@ kvmppc_hisi:
> > > hcall_try_real_mode:
> > > 	ld	r3,VCPU_GPR(R3)(r9)
> > > 	andi.	r0,r11,MSR_PR
> > > -	bne	guest_exit_cont
> > > +	/* sc 1 from userspace - reflect to guest syscall */
> > > +	bne	sc_0_fast_return
> > > 	clrrdi	r3,r3,2
> > > 	cmpldi	r3,hcall_real_table_end - hcall_real_table
> > > 	bge	guest_exit_cont
> > > @@ -1409,6 +1410,16 @@ hcall_try_real_mode:
> > > 	ld	r11,VCPU_MSR(r4)
> > > 	b	fast_guest_return
> > > 
> > > +sc_0_fast_return:
> > > +	ld	r10,VCPU_PC(r9)
> > > +	ld	r11,VCPU_MSR(r9)
> > > +	mtspr	SPRN_SRR0,r10
> > > +	mtspr	SPRN_SRR1,r11
> > > +	li	r10, BOOK3S_INTERRUPT_SYSCALL
> > > +	LOAD_REG_IMMEDIATE(r3,0xffffffff87a0ffff)	/* zero 33:36,42:47 */
> > > +	and	r11,r11,r3
> > > +	b	fast_guest_return
> > > +
> > > 	/* We've attempted a real mode hcall, but it's punted it back
> > > 	 * to userspace.  We need to restore some clobbered volatiles
> > > 	 * before resuming the pass-it-to-qemu path */
> > > -- 
> > > 1.8.1.4
> > > 
> 


Liu Ping Fan - Nov. 8, 2013, 4:19 a.m.
On Fri, Nov 8, 2013 at 11:10 AM, Alexander Graf <agraf@suse.de> wrote:
>
> On 08.11.2013, at 03:44, Liu Ping Fan <kernelfans@gmail.com> wrote:
>
>> syscall is a very common behavior inside guest, and this patch
>> optimizes the path for the emulation of BOOK3S_INTERRUPT_SYSCALL,
>> so hypervisor can return to guest without heavy exit, i.e, no need
>> to swap TLB, HTAB,.. etc
>
> The syscall exit you touch here only happens when you do an sc > 0 with MSR_PR set inside the guest. The only case you realistically see this is when you run PR KVM inside of an HV KVM guest.
>
Maybe I misunderstood the ISA spec, but referring to "6.5.14 System Call
Interrupt", there is no mention of MSR_PR when sc triggers a syscall
interrupt. So I think a guest application's "sc 0" will also fall into
the kernel that owns hypervisor mode.  Am I right?

> I don't think we should optimize for that case. Instead, we should rather try to not bounce to the 1st hypervisor in the first place in that scenario :).
>
Sorry, I just want to be clear about the idiom: does "0" mean the
kernel running natively (NV), and "1st" the kernel running on HV KVM
and providing PR KVM to the layer above? Right?

When you say "try to not bounce to the 1st hypervisor", what exactly
does that mean, and how can we achieve it?  I am quite new to powerpc,
and hope I can get a clearer picture of it  :)

Thanks

Pingfan
>
> Alex
>
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>> Compiled, but lack of bare metal, I have not tested it yet.
>> ---
>> arch/powerpc/kvm/book3s_hv.c            |  6 ------
>> arch/powerpc/kvm/book3s_hv_rmhandlers.S | 13 ++++++++++++-
>> 2 files changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 62a2b5a..73dc852 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -628,12 +628,6 @@ static int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>>               /* hcall - punt to userspace */
>>               int i;
>>
>> -             if (vcpu->arch.shregs.msr & MSR_PR) {
>> -                     /* sc 1 from userspace - reflect to guest syscall */
>> -                     kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_SYSCALL);
>> -                     r = RESUME_GUEST;
>> -                     break;
>> -             }
>>               run->papr_hcall.nr = kvmppc_get_gpr(vcpu, 3);
>>               for (i = 0; i < 9; ++i)
>>                       run->papr_hcall.args[i] = kvmppc_get_gpr(vcpu, 4 + i);
>> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> index c71103b..9f626c3 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> @@ -1388,7 +1388,8 @@ kvmppc_hisi:
>> hcall_try_real_mode:
>>       ld      r3,VCPU_GPR(R3)(r9)
>>       andi.   r0,r11,MSR_PR
>> -     bne     guest_exit_cont
>> +     /* sc 1 from userspace - reflect to guest syscall */
>> +     bne     sc_0_fast_return
>>       clrrdi  r3,r3,2
>>       cmpldi  r3,hcall_real_table_end - hcall_real_table
>>       bge     guest_exit_cont
>> @@ -1409,6 +1410,16 @@ hcall_try_real_mode:
>>       ld      r11,VCPU_MSR(r4)
>>       b       fast_guest_return
>>
>> +sc_0_fast_return:
>> +     ld      r10,VCPU_PC(r9)
>> +     ld      r11,VCPU_MSR(r9)
>> +     mtspr   SPRN_SRR0,r10
>> +     mtspr   SPRN_SRR1,r11
>> +     li      r10, BOOK3S_INTERRUPT_SYSCALL
>> +     LOAD_REG_IMMEDIATE(r3,0xffffffff87a0ffff)       /* zero 33:36,42:47 */
>> +     and     r11,r11,r3
>> +     b       fast_guest_return
>> +
>>       /* We've attempted a real mode hcall, but it's punted it back
>>        * to userspace.  We need to restore some clobbered volatiles
>>        * before resuming the pass-it-to-qemu path */
>> --
>> 1.8.1.4
>>
>
Liu Ping Fan - Nov. 8, 2013, 4:20 a.m.
On Fri, Nov 8, 2013 at 12:11 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Fri, 2013-11-08 at 15:05 +1100, Benjamin Herrenschmidt wrote:
>> On Fri, 2013-11-08 at 04:10 +0100, Alexander Graf wrote:
>> > On 08.11.2013, at 03:44, Liu Ping Fan <kernelfans@gmail.com> wrote:
>> >
>> > > syscall is a very common behavior inside guest, and this patch
>> > > optimizes the path for the emulation of BOOK3S_INTERRUPT_SYSCALL,
>> > > so hypervisor can return to guest without heavy exit, i.e, no need
>> > > to swap TLB, HTAB,.. etc
>> >
>> > The syscall exit you touch here only happens when you do an sc > 0
>> > with MSR_PR set inside the guest. The only case you realistically see
>> > this is when you run PR KVM inside of an HV KVM guest.
>> >
>> > I don't think we should optimize for that case. Instead, we should
>> > rather try to not bounce to the 1st hypervisor in the first place in
>> > that scenario :).
>>
>> Well, so unfortunately openstack CI uses PR inside HV pretty
>> heavily .... it *might* be worthwhile optimizing that path if the patch
>> is simple enough... I'd make that Paul's call.
>
> Note that this is a statement of value for the idea ... not the
> implementation ;-) From a quick look with Paulus, the patch is quite
> broken. I'll let Paul comment in details.
>
Thank you very much,

Regards,
Pingfan
Liu Ping Fan - Nov. 8, 2013, 8:38 a.m.
On Fri, Nov 8, 2013 at 12:19 PM, Liu ping fan <kernelfans@gmail.com> wrote:
> On Fri, Nov 8, 2013 at 11:10 AM, Alexander Graf <agraf@suse.de> wrote:
>>
>> On 08.11.2013, at 03:44, Liu Ping Fan <kernelfans@gmail.com> wrote:
>>
>>> syscall is a very common behavior inside guest, and this patch
>>> optimizes the path for the emulation of BOOK3S_INTERRUPT_SYSCALL,
>>> so hypervisor can return to guest without heavy exit, i.e, no need
>>> to swap TLB, HTAB,.. etc
>>
>> The syscall exit you touch here only happens when you do an sc > 0 with MSR_PR set inside the guest. The only case you realistically see this is when you run PR KVM inside of an HV KVM guest.
>>
> Maybe I misunderstood the ISA spec, but refer for "6.5.14 System Call
> Interrupt", no description about the MSR_PR when sc trigger a syscall
> interrupt. So I think, guest application "sc 0" will also fall to the
> kernel who owns hypervisor mode.  Am I right?
>
Some further comment: I think the essence of the problem is whether
we switch the RMA from guest to HV when an interrupt is raised.
DSI/ISI will be redirected to HDSI with an RMA switch.  But what about
SYSCALL, DEC, external interrupts, etc.?

>> I don't think we should optimize for that case. Instead, we should rather try to not bounce to the 1st hypervisor in the first place in that scenario :).
>>
> Sorry, but just want to make clear about the idiom:  0 -> kernel run
> with NV, and 1st -> kernel run on HV-KVM and provide PR-KVM to up
> layer? Right?
>
> When you say "try to not bounce to the 1st hypervisor ", what is the
> exact meaning and how can we achieve this?  I am a quite newer on
> powerpc, and hope that I can get more clear figure about it  :)
>

Thanks
Pingfan
Paul Mackerras - Nov. 8, 2013, 11:12 a.m.
On Fri, Nov 08, 2013 at 10:44:16AM +0800, Liu Ping Fan wrote:
> syscall is a very common behavior inside guest, and this patch
> optimizes the path for the emulation of BOOK3S_INTERRUPT_SYSCALL,
> so hypervisor can return to guest without heavy exit, i.e, no need
> to swap TLB, HTAB,.. etc

Many interrupts that are caused by guest code go directly to the guest
and don't come to the hypervisor at all.  That includes system call
(sc 0), alignment interrupts, program interrupts, SLB miss interrupts,
etc.  See section 6.5 of Book 3S of the Power ISA specification; all
the interrupts with '-' in the 'HV' column of the table there get
delivered directly to the guest when they occur inside a guest.

> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1388,7 +1388,8 @@ kvmppc_hisi:
>  hcall_try_real_mode:
>  	ld	r3,VCPU_GPR(R3)(r9)
>  	andi.	r0,r11,MSR_PR
> -	bne	guest_exit_cont
> +	/* sc 1 from userspace - reflect to guest syscall */
> +	bne	sc_0_fast_return

Discrepancy between comment and code here.  In fact we would only take
the branch for a sc 1 instruction in userspace, which occurs when a PR
KVM guest nested inside a HV KVM guest does a hypercall (i.e., not for
normal system calls).  It is probably worthwhile to speed those up.

> +sc_0_fast_return:
> +	ld	r10,VCPU_PC(r9)
> +	ld	r11,VCPU_MSR(r9)

r11 must already contain this since you just did andi. r0,r11,MSR_PR.
In fact r10 already contains VCPU_PC(r9) at this point also, though
that is not so obvious.

> +	mtspr	SPRN_SRR0,r10
> +	mtspr	SPRN_SRR1,r11
> +	li	r10, BOOK3S_INTERRUPT_SYSCALL
> +	LOAD_REG_IMMEDIATE(r3,0xffffffff87a0ffff)	/* zero 33:36,42:47 */
> +	and	r11,r11,r3

This is not correct, since you don't even clear PR.  In fact what you
need is to load up MSR_SF | MSR_ME, though that value changes with
little-endian support and changes again with transactional memory
support for POWER8.  There is an idiom for loading that MSR value,
which is:

	li	r11, (MSR_ME << 1) | 1	/* synthesize MSR_SF | MSR_ME */
	rotldi	r11, r11, 63

which you could use for now, but it will need to be changed when
Anton's LE patch gets accepted.

Paul.
Liu Ping Fan - Nov. 11, 2013, 1:02 a.m.
On Fri, Nov 8, 2013 at 7:12 PM, Paul Mackerras <paulus@samba.org> wrote:
> On Fri, Nov 08, 2013 at 10:44:16AM +0800, Liu Ping Fan wrote:
>> syscall is a very common behavior inside guest, and this patch
>> optimizes the path for the emulation of BOOK3S_INTERRUPT_SYSCALL,
>> so hypervisor can return to guest without heavy exit, i.e, no need
>> to swap TLB, HTAB,.. etc
>
> Many interrupts that are caused by guest code go directly to the guest
> and don't come to the hypervisor at all.  That includes system call
> (sc 0), alignment interrupts, program interrupts, SLB miss interrupts,
> etc.  See section 6.5 of Book 3S of the Power ISA specification; all
> the interrupts with '-' in the 'HV' column of the table there get
> delivered directly to the guest when they occur inside a guest.
>
Oh, got it, thanks! That is an important thing I had been trying to
find out but missed all these days.

>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> @@ -1388,7 +1388,8 @@ kvmppc_hisi:
>>  hcall_try_real_mode:
>>       ld      r3,VCPU_GPR(R3)(r9)
>>       andi.   r0,r11,MSR_PR
>> -     bne     guest_exit_cont
>> +     /* sc 1 from userspace - reflect to guest syscall */
>> +     bne     sc_0_fast_return
>
> Discrepancy between comment and code here.  In fact we would only take
> the branch for a sc 1 instruction in userspace, which occurs when a PR
> KVM guest nested inside a HV KVM guest does a hypercall (i.e., not for

I made a big mistake from the beginning, and now have a clearer
understanding of the scenario. Thanks!

> normal system calls).  It is probably worthwhile to speed those up.
>
>> +sc_0_fast_return:
>> +     ld      r10,VCPU_PC(r9)
>> +     ld      r11,VCPU_MSR(r9)
>
> r11 must already contain this since you just did andi. r0,r11,MSR_PR.
> In fact r10 already contains VCPU_PC(r9) at this point also, though
> that is not so obvious.
>
>> +     mtspr   SPRN_SRR0,r10
>> +     mtspr   SPRN_SRR1,r11
>> +     li      r10, BOOK3S_INTERRUPT_SYSCALL
>> +     LOAD_REG_IMMEDIATE(r3,0xffffffff87a0ffff)       /* zero 33:36,42:47 */
>> +     and     r11,r11,r3
>
> This is not correct, since you don't even clear PR.  In fact what you

Yes.
> need is to load up MSR_SF | MSR_ME, though that value changes with

Is it enough to set only "MSR_SF | MSR_ME"? Should the HV guest (PR KVM)
fake the MSR, so that the PR guest believes "sc 1" was trapped by PR
KVM directly? I.e., according to the ISA's "Figure 51. MSR setting due to
interrupt", for "System Call" we need to keep MSR_IR/MSR_DR unchanged.
If that is true, then the HV needs to help the HV guest make this fake. Right?

> little-endian support and changes again with transactional memory
> support for POWER8.  There is an idiom for loading that MSR value,
> which is:
>
>         li      r11, (MSR_ME << 1) | 1  /* synthesize MSR_SF | MSR_ME */
>         rotldi  r11, r11, 63
>
Why do we define MSR_SF_LG as bit 63, rather than bit 0 as the ISA
numbers SF? And could you briefly enlighten me on how that MSR value
changes when LE and transactional memory support are introduced?
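(For what it's worth, the numbering difference in this question can be sketched as follows; the mapping is an assumption based on the kernel's LSB-first MSR_*_LG convention versus the ISA's MSB-first bit numbering:)

```python
# The Power ISA numbers MSR bits from the most significant end: ISA
# bit 0 is the MSB of the 64-bit register.  Linux's MSR_*_LG constants
# count from the least significant end, so ISA bit n is Linux bit 63 - n.
def isa_to_linux_bit(isa_bit, width=64):
    return width - 1 - isa_bit

assert isa_to_linux_bit(0) == 63    # ISA SF (bit 0)  -> MSR_SF_LG = 63
assert isa_to_linux_bit(49) == 14   # ISA PR (bit 49) -> MSR_PR_LG = 14
```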

Thanks and best regards,
Pingfan

> which you could use for now, but it will need to be changed when
> Anton's LE patch gets accepted.
>
> Paul.

Patch

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 62a2b5a..73dc852 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -628,12 +628,6 @@  static int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		/* hcall - punt to userspace */
 		int i;
 
-		if (vcpu->arch.shregs.msr & MSR_PR) {
-			/* sc 1 from userspace - reflect to guest syscall */
-			kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_SYSCALL);
-			r = RESUME_GUEST;
-			break;
-		}
 		run->papr_hcall.nr = kvmppc_get_gpr(vcpu, 3);
 		for (i = 0; i < 9; ++i)
 			run->papr_hcall.args[i] = kvmppc_get_gpr(vcpu, 4 + i);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index c71103b..9f626c3 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1388,7 +1388,8 @@  kvmppc_hisi:
 hcall_try_real_mode:
 	ld	r3,VCPU_GPR(R3)(r9)
 	andi.	r0,r11,MSR_PR
-	bne	guest_exit_cont
+	/* sc 1 from userspace - reflect to guest syscall */
+	bne	sc_0_fast_return
 	clrrdi	r3,r3,2
 	cmpldi	r3,hcall_real_table_end - hcall_real_table
 	bge	guest_exit_cont
@@ -1409,6 +1410,16 @@  hcall_try_real_mode:
 	ld	r11,VCPU_MSR(r4)
 	b	fast_guest_return
 
+sc_0_fast_return:
+	ld	r10,VCPU_PC(r9)
+	ld	r11,VCPU_MSR(r9)
+	mtspr	SPRN_SRR0,r10
+	mtspr	SPRN_SRR1,r11
+	li	r10, BOOK3S_INTERRUPT_SYSCALL
+	LOAD_REG_IMMEDIATE(r3,0xffffffff87a0ffff)	/* zero 33:36,42:47 */
+	and	r11,r11,r3
+	b	fast_guest_return
+
 	/* We've attempted a real mode hcall, but it's punted it back
 	 * to userspace.  We need to restore some clobbered volatiles
 	 * before resuming the pass-it-to-qemu path */