[RFC,09/11] powerpc: kvm: handle time base on secondary hwthread
diff mbox

Message ID 1413487800-7162-10-git-send-email-kernelfans@gmail.com
State RFC
Headers show

Commit Message

Pingfan Liu Oct. 16, 2014, 7:29 p.m. UTC
(This is a place holder patch.)
We need to store the time base for host on secondary hwthread.
Later when switching back, we need to reprogram it with elapse
time.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Preeti U Murthy Oct. 27, 2014, 6:40 a.m. UTC | #1
On 10/17/2014 12:59 AM, kernelfans@gmail.com wrote:
> (This is a place holder patch.)
> We need to store the time base for host on secondary hwthread.
> Later when switching back, we need to reprogram it with elapse
> time.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 89ea16c..a817ba6 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -371,6 +371,8 @@ _GLOBAL_TOC(kvmppc_secondary_stopper_enter)
>  
>  	/* fixme: store other register such as msr */
>  
> +	/* fixme: store the tb, and set it as MAX, so we cease the tick on secondary */

This can lead to serious consequences. First of all, even if we set the
decrementer(not tb) to MAX, it is bound to fire after 4s. That is the
maximum time till which you can keep off the decrementer from firing.

In the hotplug path, the offline cpus nap and their decrementers are
programmed to fire at MAX too. But the difference is that we clear the
LPCR_PECE1 wakeup bit which prevents cpus from waking up on a
decrementer interrupt.

We cannot afford to do this here though because there are other tasks on
the secondary threads' runqueue. They need to be scheduled in.
There are also timers besides the tick_sched one, which can be queued on
these secondary threads. These patches have not taken care to migrate
timers or tasks before entering guest as far as I observed. Hence we
cannot just turn off time base like this and expect to handle the above
mentioned events the next time the primary thread decides to exit to the
host.

Regards
Preeti U Murthy
> +
>  	/* prevent us to enter kernel */
>  	li	r0, 1
>  	stb	r0, HSTATE_HWTHREAD_REQ(r13)
> @@ -382,6 +384,10 @@ _GLOBAL_TOC(kvmppc_secondary_stopper_enter)
>  
>  /* enter with vmode */
>  kvmppc_secondary_stopper_exit:
> +	/* fixme: restore the tb, with the orig val plus time elapse
> +         * so we can fire the hrtimer as soon as possible
> +         */
> +
>  	/* fixme, restore the stack which we store on lpaca */
>  
>  	ld	r0, 112+PPC_LR_STKOFF(r1)
>
Pingfan Liu Nov. 18, 2014, 5:43 a.m. UTC | #2
On Mon, Oct 27, 2014 at 2:40 PM, Preeti U Murthy
<preeti@linux.vnet.ibm.com> wrote:
> On 10/17/2014 12:59 AM, kernelfans@gmail.com wrote:
>> (This is a place holder patch.)
>> We need to store the time base for host on secondary hwthread.
>> Later when switching back, we need to reprogram it with elapse
>> time.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> index 89ea16c..a817ba6 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> @@ -371,6 +371,8 @@ _GLOBAL_TOC(kvmppc_secondary_stopper_enter)
>>
>>       /* fixme: store other register such as msr */
>>
>> +     /* fixme: store the tb, and set it as MAX, so we cease the tick on secondary */
>
> This can lead to serious consequences. First of all, even if we set the
> decrementer(not tb) to MAX, it is bound to fire after 4s. That is the
> maximum time till which you can keep off the decrementer from firing.
>
> In the hotplug path, the offline cpus nap and their decrementers are
> programmed to fire at MAX too. But the difference is that we clear the
> LPCR_PECE1 wakeup bit which prevents cpus from waking up on a
> decrementer interrupt.
>
> We cannot afford to do this here though because there are other tasks on
> the secondary threads' runqueue. They need to be scheduled in.
> There are also timers besides the tick_sched one, which can be queued on
> these secondary threads. These patches have not taken care to migrate
> timers or tasks before entering guest as far as I observed. Hence we
> cannot just turn off time base like this and expect to handle the above
> mentioned events the next time the primary thread decides to exit to the
> host.
>
Yes, that is the nut in this series. My plan is to compensate the
hrtimer when the secondary exit.
But as to the scheduler on secondary, if it is ceased, what is side-effect?

Thx,
Fan

> Regards
> Preeti U Murthy
>> +
>>       /* prevent us to enter kernel */
>>       li      r0, 1
>>       stb     r0, HSTATE_HWTHREAD_REQ(r13)
>> @@ -382,6 +384,10 @@ _GLOBAL_TOC(kvmppc_secondary_stopper_enter)
>>
>>  /* enter with vmode */
>>  kvmppc_secondary_stopper_exit:
>> +     /* fixme: restore the tb, with the orig val plus time elapse
>> +         * so we can fire the hrtimer as soon as possible
>> +         */
>> +
>>       /* fixme, restore the stack which we store on lpaca */
>>
>>       ld      r0, 112+PPC_LR_STKOFF(r1)
>>
>

Patch
diff mbox

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 89ea16c..a817ba6 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -371,6 +371,8 @@  _GLOBAL_TOC(kvmppc_secondary_stopper_enter)
 
 	/* fixme: store other register such as msr */
 
+	/* fixme: store the tb, and set it as MAX, so we cease the tick on secondary */
+
 	/* prevent us to enter kernel */
 	li	r0, 1
 	stb	r0, HSTATE_HWTHREAD_REQ(r13)
@@ -382,6 +384,10 @@  _GLOBAL_TOC(kvmppc_secondary_stopper_enter)
 
 /* enter with vmode */
 kvmppc_secondary_stopper_exit:
+	/* fixme: restore the tb, with the orig val plus time elapse
+         * so we can fire the hrtimer as soon as possible
+         */
+
 	/* fixme, restore the stack which we store on lpaca */
 
 	ld	r0, 112+PPC_LR_STKOFF(r1)