[RFC,06/11] powerpc/tm: Refactor the __switch_to_tm code

Message ID 1536781219-13938-7-git-send-email-leitao@debian.org
State RFC
Headers show
Series
  • New TM Model
Related show

Checks

Context Check Description
snowpatch_ozlabs/checkpatch warning Test checkpatch on branch next
snowpatch_ozlabs/apply_patch success next/apply_patch Successfully applied

Commit Message

Breno Leitao Sept. 12, 2018, 7:40 p.m.
__switch_to_tm is the function that switches between two tasks which might
have TM enabled. This function is clearly split in two parts, the task that
is leaving the CPU, known as 'prev' and the task that is being scheduled,
known as new.

It starts checking if the previous task had TM enable, if so, it increases
the load_tm (this is the only place we increment load_tm). It also saves
the TM SPRs here.

If the previous task was scheduled out with a transaction active, the
failure cause needs to be updated, since it might contain the failure cause
that caused the exception, as TM_CAUSE_MISC. In this case, since there was
a context switch, overwrite the failure cause.

If the previous task has overflowed load_tm, disable TM, putting the
facility save/restore lazy mechanism at lazy mode.

Regarding the new task, when loading it, it basically restore the SPRs, and
TIF_RESTORE_TM (already set by tm_reclaim_current if the transaction was
active) would invoke the recheckpoint process later in restore_tm_state()
if recheckpoint is somehow required.

On top of that, both tm_reclaim_task() and tm_recheckpoint_new_task()
functions are not used anymore, removing them.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 arch/powerpc/kernel/process.c | 163 +++++++++++++++-------------------
 1 file changed, 74 insertions(+), 89 deletions(-)

Comments

Michael Neuling Sept. 18, 2018, 4:04 a.m. | #1
On Wed, 2018-09-12 at 16:40 -0300, Breno Leitao wrote:
> __switch_to_tm is the function that switches between two tasks which might
> have TM enabled. This function is clearly split in two parts, the task that
> is leaving the CPU, known as 'prev' and the task that is being scheduled,
> known as new.
> 
> It starts checking if the previous task had TM enable, if so, it increases
> the load_tm (this is the only place we increment load_tm). It also saves
> the TM SPRs here.
> 
> If the previous task was scheduled out with a transaction active, the
> failure cause needs to be updated, since it might contain the failure cause
> that caused the exception, as TM_CAUSE_MISC. In this case, since there was
> a context switch, overwrite the failure cause.
> 
> If the previous task has overflowed load_tm, disable TM, putting the
> facility save/restore lazy mechanism at lazy mode.
> 
> Regarding the new task, when loading it, it basically restore the SPRs, and
> TIF_RESTORE_TM (already set by tm_reclaim_current if the transaction was
> active) would invoke the recheckpoint process later in restore_tm_state()
> if recheckpoint is somehow required.

This paragraph is a little awkwardly worded.  Can you rewrite?

> On top of that, both tm_reclaim_task() and tm_recheckpoint_new_task()
> functions are not used anymore, removing them.

What about tm_reclaim_current().  This is being used in places like signals
which I would have thought we could avoid with this series

> 
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  arch/powerpc/kernel/process.c | 163 +++++++++++++++-------------------
>  1 file changed, 74 insertions(+), 89 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index fe063c0142e3..5cace1b744b1 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -921,48 +921,6 @@ void tm_reclaim_current(uint8_t cause)
>  	tm_reclaim_thread(&current->thread, cause);
>  }
>  
> -static inline void tm_reclaim_task(struct task_struct *tsk)
> -{
> -	/* We have to work out if we're switching from/to a task that's in the
> -	 * middle of a transaction.
> -	 *
> -	 * In switching we need to maintain a 2nd register state as
> -	 * oldtask->thread.ckpt_regs.  We tm_reclaim(oldproc); this saves the
> -	 * checkpointed (tbegin) state in ckpt_regs, ckfp_state and
> -	 * ckvr_state
> -	 *
> -	 * We also context switch (save) TFHAR/TEXASR/TFIAR in here.
> -	 */
> -	struct thread_struct *thr = &tsk->thread;
> -
> -	if (!thr->regs)
> -		return;
> -
> -	if (!MSR_TM_ACTIVE(thr->regs->msr))
> -		goto out_and_saveregs;
> -
> -	WARN_ON(tm_suspend_disabled);
> -
> -	TM_DEBUG("--- tm_reclaim on pid %d (NIP=%lx, "
> -		 "ccr=%lx, msr=%lx, trap=%lx)\n",
> -		 tsk->pid, thr->regs->nip,
> -		 thr->regs->ccr, thr->regs->msr,
> -		 thr->regs->trap);
> -
> -	tm_reclaim_thread(thr, TM_CAUSE_RESCHED);
> -
> -	TM_DEBUG("--- tm_reclaim on pid %d complete\n",
> -		 tsk->pid);
> -
> -out_and_saveregs:
> -	/* Always save the regs here, even if a transaction's not active.
> -	 * This context-switches a thread's TM info SPRs.  We do it here to
> -	 * be consistent with the restore path (in recheckpoint) which
> -	 * cannot happen later in _switch().
> -	 */
> -	tm_save_sprs(thr);
> -}
> -
>  extern void __tm_recheckpoint(struct thread_struct *thread);
>  
>  void tm_recheckpoint(struct thread_struct *thread)
> @@ -997,59 +955,87 @@ static void tm_fix_failure_cause(struct task_struct
> *task, uint8_t cause)
>  	task->thread.tm_texasr |= (unsigned long) cause << 56;
>  }
>  
> -static inline void tm_recheckpoint_new_task(struct task_struct *new)
> +static inline void __switch_to_tm(struct task_struct *prev,

Can we just drop the __ ?

> +		struct task_struct *new)
>  {
>  	if (!cpu_has_feature(CPU_FTR_TM))
>  		return;
>  
> -	/* Recheckpoint the registers of the thread we're about to switch to.
> -	 *
> -	 * If the task was using FP, we non-lazily reload both the original and
> -	 * the speculative FP register states.  This is because the kernel
> -	 * doesn't see if/when a TM rollback occurs, so if we take an FP
> -	 * unavailable later, we are unable to determine which set of FP regs
> -	 * need to be restored.
> -	 */
> -	if (!tm_enabled(new))
> -		return;
> -
> -	if (!MSR_TM_ACTIVE(new->thread.regs->msr)){
> -		tm_restore_sprs(&new->thread);
> -		return;
> -	}
> -	/* Recheckpoint to restore original checkpointed register state. */
> -	TM_DEBUG("*** tm_recheckpoint of pid %d (new->msr 0x%lx)\n",
> -		 new->pid, new->thread.regs->msr);
> -
> -	tm_recheckpoint(&new->thread);
> -
> -	/*
> -	 * The checkpointed state has been restored but the live state has
> -	 * not, ensure all the math functionality is turned off to trigger
> -	 * restore_math() to reload.
> -	 */
> -	new->thread.regs->msr &= ~(MSR_FP | MSR_VEC | MSR_VSX);
> +	/* The task leaving the CPU was using TM, let's handle it */
> +	if (tm_enabled(prev)) {
> +		/*
> +		 * Load_tm is incremented only when the task is scheduled out
> +		 */
> +		prev->thread.load_tm++;
>  
> -	TM_DEBUG("*** tm_recheckpoint of pid %d complete "
> -		 "(kernel msr 0x%lx)\n",
> -		 new->pid, mfmsr());
> -}
> +		/*
> +		 * If TM is enabled for the thread, it needs to, at least,
> +		 * save the SPRs
> +		 */
> +		tm_enable();
> +		tm_save_sprs(&prev->thread);
>  
> -static inline void __switch_to_tm(struct task_struct *prev,
> -		struct task_struct *new)
> -{
> -	if (cpu_has_feature(CPU_FTR_TM)) {
> -		if (tm_enabled(prev) || tm_enabled(new))
> -			tm_enable();
> +		/*
> +		 * If we got here with an active transaction, then, it was
> +		 * aborted by TM_KERNEL_ENTRY and the fix the failure case
> +		 * needs to be fixed, so, indepedently how we arrived here, the
> +		 * new TM abort case will be TM_CAUSE_RESCHED now.

What does "fix the failure case needs to be fixed" mean?

also s/indepedently/independently/

> +		 */
> +		if (MSR_TM_ACTIVE(prev->thread.regs->msr)) {
> +			/*
> +			 * If there was an IRQ during trecheckpoint, it will
> +			 * cause an IRQ to be replayed. This replayed IRQ can
> +			 * invoke SCHEDULE_USER, thus, we arrive here with a TM
> +			 * active transaction.

I don't think this can happen. trecheckpoint (and treclaim) are called with IRQs
hard off (since they change r1).

I think something else is going on here. I think this code and comment needs to
go but I assume it's here because you are seeing something.

> +			 * I.e, the task was leaving kernelspace to userspace,
> +			 * already trecheckpointed, but there was a IRQ during
> +			 * the trecheckpoint process (soft irq disabled), and
> +			 * on the IRQ replay, the process was de-scheduled, so,
> +			 * SCHEDULE_USER was called and here we are.
> +			 *
> +			 */
> +			if (MSR_TM_ACTIVE(mfmsr())) {
> +				/*
> +				 * This is the only other case other than
> +				 * TM_KERNEL_ENTRY that does a TM reclaim
> +				 */
> +				tm_reclaim_current(TM_CAUSE_RESCHED);
> +			}
>  
> -		if (tm_enabled(prev)) {
> -			prev->thread.load_tm++;
> -			tm_reclaim_task(prev);
> -			if (!MSR_TM_ACTIVE(prev->thread.regs->msr) && prev-
> >thread.load_tm == 0)
> +			/*
> +			 * If rescheduled with TM active, update the
> +			 * failure cause
> +			 */
> +			tm_fix_failure_cause(prev, TM_CAUSE_RESCHED);
> +		} else {
> +			/*
> +			 * TM enabled but not transactional. Just disable TM
> +			 * if load_tm overflows. This should be the only place
> +			 * that disables the TM and reenables the laziness
> +			 * save/restore
> +			 */
> +			if (prev->thread.load_tm == 0)
>  				prev->thread.regs->msr &= ~MSR_TM; 		
> }
> +	}
>  
> -		tm_recheckpoint_new_task(new);
> +	/*
> +	 * It is a *bug* if we arrived so late with a transaction active
> +	 * (more precisely suspended)
> +	 */
> +	if (WARN_ON(MSR_TM_ACTIVE(mfmsr()))) {
> +		/* Recovery path. 0x99 shouldn't be exported to UAPI */
> +		tm_reclaim_current(0x99);
> +	}
> +
> +	/*
> +	 * If the next task has TM enabled, restore the SPRs. Do not need to
> +	 * care about recheckpoint at this time. It will be done later if
> +	 * TIF_RESTORE_TM was set when the task was scheduled out
> +	 */
> +	if (tm_enabled(new)) {
> +		tm_enable();
> +		tm_restore_sprs(&new->thread);
>  	}
>  }
>  
> @@ -1101,7 +1087,6 @@ void restore_tm_state(struct pt_regs *regs)
>  }
>  
>  #else
> -#define tm_recheckpoint_new_task(new)
>  #define __switch_to_tm(prev, new)
>  #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
>  
> @@ -1588,9 +1573,9 @@ int arch_dup_task_struct(struct task_struct *dst, struct
> task_struct *src)
>  	/*
>  	 * Flush TM state out so we can copy it.  __switch_to_tm() does this
>  	 * flush but it removes the checkpointed state from the current CPU and
> -	 * transitions the CPU out of TM mode.  Hence we need to call
> -	 * tm_recheckpoint_new_task() (on the same task) to restore the
> -	 * checkpointed state back and the TM mode.
> +	 * transitions the CPU out of TM mode.  Hence we need to make sure
> +	 * TIF_RESTORE_TM is set so restore_tm_state is called to restore the
> +	 * checkpointed state and back to TM mode.
>  	 *
>  	 * Can't pass dst because it isn't ready. Doesn't matter, passing
>  	 * dst is only important for __switch_to()
Breno Leitao Sept. 27, 2018, 8:48 p.m. | #2
hi Mikey

On 09/18/2018 01:04 AM, Michael Neuling wrote:
>> On top of that, both tm_reclaim_task() and tm_recheckpoint_new_task()
>> functions are not used anymore, removing them.
> 
> What about tm_reclaim_current().  This is being used in places like signals
> which I would have thought we could avoid with this series

In fact tm_reclaim_current() is the default function to reclaim. It is the
function that is being called by TM_KERNEL_ENTRY, other than that, it should
never be called on the sane path.

>> +		 * If we got here with an active transaction, then, it was
>> +		 * aborted by TM_KERNEL_ENTRY and the fix the failure case
>> +		 * needs to be fixed, so, indepedently how we arrived here, the
>> +		 * new TM abort case will be TM_CAUSE_RESCHED now.
> 
> What does "fix the failure case needs to be fixed" mean?
> 
> also s/indepedently/independently/

In fact, I rewrote this paragraph. I try to say that the "TEXASR Failure
code" needs to be updated/fixed, but it became that messy and crazy
paragraph. :-(

> 
>> +		 */
>> +		if (MSR_TM_ACTIVE(prev->thread.regs->msr)) {
>> +			/*
>> +			 * If there was an IRQ during trecheckpoint, it will
>> +			 * cause an IRQ to be replayed. This replayed IRQ can
>> +			 * invoke SCHEDULE_USER, thus, we arrive here with a TM
>> +			 * active transaction.
> 
> I don't think this can happen. trecheckpoint (and treclaim) are called with IRQs
> hard off (since they change r1).

There will be an IRQ check and replay later. An IRQ being replayed can cause
a process to be reschedule and arrive here after a trecheckpoint.

> I think something else is going on here. I think this code and comment needs to
> go but I assume it's here because you are seeing something.

Right, and it was my major concern about this whole review.

The tm_recheckpoint() was being called with IRQ hard off, as you said, but
the IRQ is being re-enabled later, after "trecheckpoint", when it calls
local_irq_restore().

Since the IRQ was re-enabled, there is a mechanism to check for IRQs that
were pending and execute them (after recheckpoint - hence with MSR[TS]
active). Some of these pending IRQ might cause a task reschedule, bringing us
to __switch_to with MSR[TS] active.

I also checked that there were cases where a pending IRQ was waiting to be
replayed even before  tm_recheckpoint() is called. I suspect we were reaching
tm_recheckpoint soft-disabled.

On the v2 patchset, I am following your suggestion, which is calling
restore_tm_state() much later (at the beginning of fast_exception_return),
after the _TIF_USER_WORK_MASK section, and after the IRQ replay section also.
So, after this point, there is no way to rollback the exit to userspace after
trecheckpoint. Therefore, if there is a recheckpoint, a rfid will always come
directly.

In order to do so, I need to remove _TIF_RESTORE_TM as part of
_TIF_USER_WORK_MASK and handle _TIF_RESTORE_TM individually later.

This seems to solve this problem, and I can remove this reclaim in the middle
of __switch_to_tm().

Thank you

Patch

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index fe063c0142e3..5cace1b744b1 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -921,48 +921,6 @@  void tm_reclaim_current(uint8_t cause)
 	tm_reclaim_thread(&current->thread, cause);
 }
 
-static inline void tm_reclaim_task(struct task_struct *tsk)
-{
-	/* We have to work out if we're switching from/to a task that's in the
-	 * middle of a transaction.
-	 *
-	 * In switching we need to maintain a 2nd register state as
-	 * oldtask->thread.ckpt_regs.  We tm_reclaim(oldproc); this saves the
-	 * checkpointed (tbegin) state in ckpt_regs, ckfp_state and
-	 * ckvr_state
-	 *
-	 * We also context switch (save) TFHAR/TEXASR/TFIAR in here.
-	 */
-	struct thread_struct *thr = &tsk->thread;
-
-	if (!thr->regs)
-		return;
-
-	if (!MSR_TM_ACTIVE(thr->regs->msr))
-		goto out_and_saveregs;
-
-	WARN_ON(tm_suspend_disabled);
-
-	TM_DEBUG("--- tm_reclaim on pid %d (NIP=%lx, "
-		 "ccr=%lx, msr=%lx, trap=%lx)\n",
-		 tsk->pid, thr->regs->nip,
-		 thr->regs->ccr, thr->regs->msr,
-		 thr->regs->trap);
-
-	tm_reclaim_thread(thr, TM_CAUSE_RESCHED);
-
-	TM_DEBUG("--- tm_reclaim on pid %d complete\n",
-		 tsk->pid);
-
-out_and_saveregs:
-	/* Always save the regs here, even if a transaction's not active.
-	 * This context-switches a thread's TM info SPRs.  We do it here to
-	 * be consistent with the restore path (in recheckpoint) which
-	 * cannot happen later in _switch().
-	 */
-	tm_save_sprs(thr);
-}
-
 extern void __tm_recheckpoint(struct thread_struct *thread);
 
 void tm_recheckpoint(struct thread_struct *thread)
@@ -997,59 +955,87 @@  static void tm_fix_failure_cause(struct task_struct *task, uint8_t cause)
 	task->thread.tm_texasr |= (unsigned long) cause << 56;
 }
 
-static inline void tm_recheckpoint_new_task(struct task_struct *new)
+static inline void __switch_to_tm(struct task_struct *prev,
+		struct task_struct *new)
 {
 	if (!cpu_has_feature(CPU_FTR_TM))
 		return;
 
-	/* Recheckpoint the registers of the thread we're about to switch to.
-	 *
-	 * If the task was using FP, we non-lazily reload both the original and
-	 * the speculative FP register states.  This is because the kernel
-	 * doesn't see if/when a TM rollback occurs, so if we take an FP
-	 * unavailable later, we are unable to determine which set of FP regs
-	 * need to be restored.
-	 */
-	if (!tm_enabled(new))
-		return;
-
-	if (!MSR_TM_ACTIVE(new->thread.regs->msr)){
-		tm_restore_sprs(&new->thread);
-		return;
-	}
-	/* Recheckpoint to restore original checkpointed register state. */
-	TM_DEBUG("*** tm_recheckpoint of pid %d (new->msr 0x%lx)\n",
-		 new->pid, new->thread.regs->msr);
-
-	tm_recheckpoint(&new->thread);
-
-	/*
-	 * The checkpointed state has been restored but the live state has
-	 * not, ensure all the math functionality is turned off to trigger
-	 * restore_math() to reload.
-	 */
-	new->thread.regs->msr &= ~(MSR_FP | MSR_VEC | MSR_VSX);
+	/* The task leaving the CPU was using TM, let's handle it */
+	if (tm_enabled(prev)) {
+		/*
+		 * Load_tm is incremented only when the task is scheduled out
+		 */
+		prev->thread.load_tm++;
 
-	TM_DEBUG("*** tm_recheckpoint of pid %d complete "
-		 "(kernel msr 0x%lx)\n",
-		 new->pid, mfmsr());
-}
+		/*
+		 * If TM is enabled for the thread, it needs to, at least,
+		 * save the SPRs
+		 */
+		tm_enable();
+		tm_save_sprs(&prev->thread);
 
-static inline void __switch_to_tm(struct task_struct *prev,
-		struct task_struct *new)
-{
-	if (cpu_has_feature(CPU_FTR_TM)) {
-		if (tm_enabled(prev) || tm_enabled(new))
-			tm_enable();
+		/*
+		 * If we got here with an active transaction, then, it was
+		 * aborted by TM_KERNEL_ENTRY and the fix the failure case
+		 * needs to be fixed, so, indepedently how we arrived here, the
+		 * new TM abort case will be TM_CAUSE_RESCHED now.
+		 */
+		if (MSR_TM_ACTIVE(prev->thread.regs->msr)) {
+			/*
+			 * If there was an IRQ during trecheckpoint, it will
+			 * cause an IRQ to be replayed. This replayed IRQ can
+			 * invoke SCHEDULE_USER, thus, we arrive here with a TM
+			 * active transaction.
+			 * I.e, the task was leaving kernelspace to userspace,
+			 * already trecheckpointed, but there was a IRQ during
+			 * the trecheckpoint process (soft irq disabled), and
+			 * on the IRQ replay, the process was de-scheduled, so,
+			 * SCHEDULE_USER was called and here we are.
+			 *
+			 */
+			if (MSR_TM_ACTIVE(mfmsr())) {
+				/*
+				 * This is the only other case other than
+				 * TM_KERNEL_ENTRY that does a TM reclaim
+				 */
+				tm_reclaim_current(TM_CAUSE_RESCHED);
+			}
 
-		if (tm_enabled(prev)) {
-			prev->thread.load_tm++;
-			tm_reclaim_task(prev);
-			if (!MSR_TM_ACTIVE(prev->thread.regs->msr) && prev->thread.load_tm == 0)
+			/*
+			 * If rescheduled with TM active, update the
+			 * failure cause
+			 */
+			tm_fix_failure_cause(prev, TM_CAUSE_RESCHED);
+		} else {
+			/*
+			 * TM enabled but not transactional. Just disable TM
+			 * if load_tm overflows. This should be the only place
+			 * that disables the TM and reenables the laziness
+			 * save/restore
+			 */
+			if (prev->thread.load_tm == 0)
 				prev->thread.regs->msr &= ~MSR_TM;
 		}
+	}
 
-		tm_recheckpoint_new_task(new);
+	/*
+	 * It is a *bug* if we arrived so late with a transaction active
+	 * (more precisely suspended)
+	 */
+	if (WARN_ON(MSR_TM_ACTIVE(mfmsr()))) {
+		/* Recovery path. 0x99 shouldn't be exported to UAPI */
+		tm_reclaim_current(0x99);
+	}
+
+	/*
+	 * If the next task has TM enabled, restore the SPRs. Do not need to
+	 * care about recheckpoint at this time. It will be done later if
+	 * TIF_RESTORE_TM was set when the task was scheduled out
+	 */
+	if (tm_enabled(new)) {
+		tm_enable();
+		tm_restore_sprs(&new->thread);
 	}
 }
 
@@ -1101,7 +1087,6 @@  void restore_tm_state(struct pt_regs *regs)
 }
 
 #else
-#define tm_recheckpoint_new_task(new)
 #define __switch_to_tm(prev, new)
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
@@ -1588,9 +1573,9 @@  int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	/*
 	 * Flush TM state out so we can copy it.  __switch_to_tm() does this
 	 * flush but it removes the checkpointed state from the current CPU and
-	 * transitions the CPU out of TM mode.  Hence we need to call
-	 * tm_recheckpoint_new_task() (on the same task) to restore the
-	 * checkpointed state back and the TM mode.
+	 * transitions the CPU out of TM mode.  Hence we need to make sure
+	 * TIF_RESTORE_TM is set so restore_tm_state is called to restore the
+	 * checkpointed state and back to TM mode.
 	 *
 	 * Can't pass dst because it isn't ready. Doesn't matter, passing
 	 * dst is only important for __switch_to()