[RFC,07/11] powerpc: kvm: the stopper func to cease secondary hwthread
diff mbox

Message ID 1413487800-7162-8-git-send-email-kernelfans@gmail.com
State RFC
Headers show

Commit Message

Pingfan Liu Oct. 16, 2014, 7:29 p.m. UTC
To enter guest, primary hwtherad schedules the stopper func on
secondary threads and force them into NAP mode.
When exit to host,secondary threads hardcode to restore the stack,
then switch back to the stopper func, i.e host.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c            | 15 +++++++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 34 +++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

Comments

Preeti U Murthy Oct. 22, 2014, 7:12 a.m. UTC | #1
Hi Liu,

On 10/17/2014 12:59 AM, kernelfans@gmail.com wrote:
> To enter guest, primary hwtherad schedules the stopper func on
> secondary threads and force them into NAP mode.
> When exit to host,secondary threads hardcode to restore the stack,
> then switch back to the stopper func, i.e host.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kvm/book3s_hv.c            | 15 +++++++++++++++
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 34 +++++++++++++++++++++++++++++++++
>  2 files changed, 49 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index ba258c8..4348abd 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1486,6 +1486,21 @@ static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
>  	list_del(&vcpu->arch.run_list);
>  }
>  
> +#ifdef KVMPPC_ENABLE_SECONDARY
> +
> +extern void kvmppc_secondary_stopper_enter();
> +
> +static int kvmppc_secondary_stopper(void *data)
> +{
> +	int cpu =smp_processor_id();
> +	struct paca_struct *lpaca = get_paca();
> +	BUG_ON(!(cpu%thread_per_core));
> +
> +	kvmppc_secondary_stopper_enter();
> +}
> +
> +#endif
> +
>  static int kvmppc_grab_hwthread(int cpu)
>  {
>  	struct paca_struct *tpaca;
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index d5594b0..254038b 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -349,7 +349,41 @@ kvm_do_nap:
>  
>  #ifdef PPCKVM_ENABLE_SECONDARY
>  kvm_secondary_exit_trampoline:
> +
> +	/* all register is free to use, later kvmppc_secondary_stopper_exit set up them*/
> +	//loop-wait for the primary to signal that host env is ready
> +
> +	LOAD_REG_ADDR(r5, kvmppc_secondary_stopper_exit)
> +	/* fixme, load msr from lpaca stack */
> +	li	r6, MSR_IR | MSR_DR
> +	mtsrr0	r5
> +	mtsrr1	r6
> +	RFI
> +
> +_GLOBAL_TOC(kvmppc_secondary_stopper_enter)
> +	mflr	r0
> +	std	r0, PPC_LR_STKOFF(r1)
> +	stdu	r1, -112(r1)
> +
> +	/* fixme: store other register such as msr */
> +
> +	/* prevent us to enter kernel */
> +	li	r0, 1
> +	stb	r0, HSTATE_HWTHREAD_REQ(r13)
> +	/* tell the primary that we are ready */
> +        li      r0,KVM_HWTHREAD_IN_KERNEL
> +        stb     r0,HSTATE_HWTHREAD_STATE(r13)
> +	nap
>  	b	.

This does not look right. Let me explain.

We can get hypervisor decrementer interrupts on the secondary threads
since they are online. And we would ideally want to handle them. They
could be scheduler tick interrupts or some timer queued on that thread
could have fired.

If it is a scheduler tick, we would want to exit to host to schedule the
next task if the timeslice of the vcpu running on the secondary thread
is over.

To the best of my understanding, this is what I can tell happens when we
get a hypervisor decrementer interrupt in the guest. We call all threads
in the core to exit to the kernel. The secondaries now return to the
code in idle_power7.S where the wakeup from idle path takes effect. Then
we do an rfid which lands after the nap code above. And we spin.

From your above code you seem to be indicating that you do not want the
secondary threads to exit to the kernel once they are made to run guest
for the first time. Is this right? Or am I missing something ?

Moreover the changelog in [patch 10/11] says that you cease the
scheduler. This pushes me to think that you do not expect the scheduler
to run on the secondary threads ever again after you have launched the
guest. This does not look right to me.

Regards
Preeti U Murthy
Preeti U Murthy Oct. 27, 2014, 6:07 a.m. UTC | #2
On 10/17/2014 12:59 AM, kernelfans@gmail.com wrote:
> To enter guest, primary hwtherad schedules the stopper func on
> secondary threads and force them into NAP mode.
> When exit to host,secondary threads hardcode to restore the stack,
> then switch back to the stopper func, i.e host.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kvm/book3s_hv.c            | 15 +++++++++++++++
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 34 +++++++++++++++++++++++++++++++++
>  2 files changed, 49 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index ba258c8..4348abd 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1486,6 +1486,21 @@ static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
>  	list_del(&vcpu->arch.run_list);
>  }
>  
> +#ifdef KVMPPC_ENABLE_SECONDARY
> +
> +extern void kvmppc_secondary_stopper_enter();
> +
> +static int kvmppc_secondary_stopper(void *data)
> +{
> +	int cpu =smp_processor_id();
> +	struct paca_struct *lpaca = get_paca();
> +	BUG_ON(!(cpu%thread_per_core));
> +
> +	kvmppc_secondary_stopper_enter();
> +}
> +
> +#endif
> +
>  static int kvmppc_grab_hwthread(int cpu)
>  {
>  	struct paca_struct *tpaca;
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index d5594b0..254038b 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -349,7 +349,41 @@ kvm_do_nap:
>  
>  #ifdef PPCKVM_ENABLE_SECONDARY
>  kvm_secondary_exit_trampoline:
> +
> +	/* all register is free to use, later kvmppc_secondary_stopper_exit set up them*/
> +	//loop-wait for the primary to signal that host env is ready
> +
> +	LOAD_REG_ADDR(r5, kvmppc_secondary_stopper_exit)
> +	/* fixme, load msr from lpaca stack */
> +	li	r6, MSR_IR | MSR_DR
> +	mtsrr0	r5
> +	mtsrr1	r6
> +	RFI
> +
> +_GLOBAL_TOC(kvmppc_secondary_stopper_enter)
> +	mflr	r0
> +	std	r0, PPC_LR_STKOFF(r1)
> +	stdu	r1, -112(r1)
> +
> +	/* fixme: store other register such as msr */
> +
> +	/* prevent us to enter kernel */
> +	li	r0, 1
> +	stb	r0, HSTATE_HWTHREAD_REQ(r13)
> +	/* tell the primary that we are ready */
> +        li      r0,KVM_HWTHREAD_IN_KERNEL
> +        stb     r0,HSTATE_HWTHREAD_STATE(r13)
> +	nap
>  	b	.

I see a fundamental issues with this design. Note that the secondaries
are still capable of getting IPIs, timer interrupts and scheduler
interrupts in nap. Scheduling ticks do not get stopped just because we
nap. So the primary thread cannot assume that the secondaries will wait
in nap until it has switched to guest context. There will definitely be
race conditions here between primary switching context and the secondary
handling some interrupt in virtual mode.

So your answer to above could be that we are preventing the secondaries
from entering the kernel by setting the HSTATE_HWTHREAD_REQ above. But
we cannot do this because the secondary thread is as alive and kicking
as the primary thread on the host. There can be hrtimers queued, there
may be an IPI from a thread in another core. We cannot ignore handling
them while we wait for the primary thread to switch context.

There is also a bug as far as I can see with having a b . after nap. I
mentioned that in one of my earlier replies to this patch.


Regards
Preeti U Murthy
> +
> +/* enter with vmode */
> +kvmppc_secondary_stopper_exit:
> +	/* fixme, restore the stack which we store on lpaca */
> +
> +	ld	r0, 112+PPC_LR_STKOFF(r1)
> +	addi	r1, r1, 112
> +	mtlr	r0
> +	blr
>  #endif
>  
>  /******************************************************************************
>

Patch
diff mbox

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ba258c8..4348abd 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1486,6 +1486,21 @@  static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
 	list_del(&vcpu->arch.run_list);
 }
 
+#ifdef KVMPPC_ENABLE_SECONDARY
+
+extern void kvmppc_secondary_stopper_enter();
+
+static int kvmppc_secondary_stopper(void *data)
+{
+	int cpu =smp_processor_id();
+	struct paca_struct *lpaca = get_paca();
+	BUG_ON(!(cpu%thread_per_core));
+
+	kvmppc_secondary_stopper_enter();
+}
+
+#endif
+
 static int kvmppc_grab_hwthread(int cpu)
 {
 	struct paca_struct *tpaca;
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index d5594b0..254038b 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -349,7 +349,41 @@  kvm_do_nap:
 
 #ifdef PPCKVM_ENABLE_SECONDARY
 kvm_secondary_exit_trampoline:
+
+	/* all register is free to use, later kvmppc_secondary_stopper_exit set up them*/
+	//loop-wait for the primary to signal that host env is ready
+
+	LOAD_REG_ADDR(r5, kvmppc_secondary_stopper_exit)
+	/* fixme, load msr from lpaca stack */
+	li	r6, MSR_IR | MSR_DR
+	mtsrr0	r5
+	mtsrr1	r6
+	RFI
+
+_GLOBAL_TOC(kvmppc_secondary_stopper_enter)
+	mflr	r0
+	std	r0, PPC_LR_STKOFF(r1)
+	stdu	r1, -112(r1)
+
+	/* fixme: store other register such as msr */
+
+	/* prevent us to enter kernel */
+	li	r0, 1
+	stb	r0, HSTATE_HWTHREAD_REQ(r13)
+	/* tell the primary that we are ready */
+        li      r0,KVM_HWTHREAD_IN_KERNEL
+        stb     r0,HSTATE_HWTHREAD_STATE(r13)
+	nap
 	b	.
+
+/* enter with vmode */
+kvmppc_secondary_stopper_exit:
+	/* fixme, restore the stack which we store on lpaca */
+
+	ld	r0, 112+PPC_LR_STKOFF(r1)
+	addi	r1, r1, 112
+	mtlr	r0
+	blr
 #endif
 
 /******************************************************************************