diff mbox

powerpc: icswx: fix race condition where threads do not get their ACOP register updated in time.

Message ID 1329948466-325-1-git-send-email-jimix@pobox.com (mailing list archive)
State Changes Requested
Delegated to: Benjamin Herrenschmidt
Headers show

Commit Message

Jimi Xenidis Feb. 22, 2012, 10:07 p.m. UTC
There is a race where a thread causes a coprocessor type to be valid
in its own ACOP _and_ in the current context, but it does not
propagate to the ACOP register of other threads in time for them to
use it.  The original code tries to solve this by sending an IPI to
all threads on the system, which is heavy handed, but unfortunately
still provides a window where the icswx is issued by other threads and
the ACOP is not up to date.

This patch detects that the ACOP DSI fault was a "false positive" and
syncs the ACOP and causes the icswx to be replayed.

Signed-off-by: Jimi Xenidis <jimix@pobox.com>
Cc: Anton Blanchard <anton@samba.org>
---
 arch/powerpc/mm/icswx.c |   29 +++++++++++++++++++++++++++--
 arch/powerpc/mm/icswx.h |    6 ++++++
 2 files changed, 33 insertions(+), 2 deletions(-)

Comments

Benjamin Herrenschmidt Feb. 26, 2012, 11:47 p.m. UTC | #1
>  
> +	/*
> +	 * We could be here because another thread has enabled acop
> +	 * but the ACOP register has yet to be updated.
> +	 *
> +	 * This should have been taken care of by the IPI to sync all
> +	 * the threads (see smp_call_function(sync_cop, mm, 1)), but
> +	 * that could take forever if there are a significant amount
> +	 * of threads.
> +	 *
> +	 * Given the number of threads on some of these systems,
> +	 * perhaps this is the best way to sync ACOP rather than whack
> +	 * every thread with an IPI.
> +	 */

This is actually pretty standard stuff... If it was me I would make it
all lazy and avoid the IPI completely but it doesn't necessarily hurt
that much. In any case the "recovery" is indeed needed and you should
probably also remove the pr_debug, it's really just spam.
 
> +	if (acop_copro_type_bit(ct) && current->active_mm->context.acop) {

Shouldn't that be "&" ? In fact, gcc would even warn so either make
it acop_check_copro(acop, ct) or do a (x & y) != 0

Cheers,
Ben.

> +		pr_debug("%s[%d]: Spurrious ACOP Fault, CT: %d, bit: 0x%llx "
> +			"SPR: 0x%lx, mm->acop: 0x%lx\n",
> +			current->comm, current->pid,
> +			ct, acop_copro_type_bit(ct), mfspr(SPRN_ACOP),
> +			current->active_mm->context.acop);
> +
> +		sync_cop(current->active_mm);
> +		return 0;
> +	}
> +
> +	/* check for alternate policy */
>  	if (!acop_use_cop(ct))
>  		return 0;
>  
>  	/* at this point the CT is unknown to the system */
> -	pr_warn("%s[%d]: Coprocessor %d is unavailable",
> +	pr_warn("%s[%d]: Coprocessor %d is unavailable\n",
>  		current->comm, current->pid, ct);
>  
>  	/* get inst if we don't already have it */
> diff --git a/arch/powerpc/mm/icswx.h b/arch/powerpc/mm/icswx.h
> index 42176bd..6dedc08 100644
> --- a/arch/powerpc/mm/icswx.h
> +++ b/arch/powerpc/mm/icswx.h
> @@ -59,4 +59,10 @@ extern void free_cop_pid(int free_pid);
>  
>  extern int acop_handle_fault(struct pt_regs *regs, unsigned long address,
>  			     unsigned long error_code);
> +
> +static inline u64 acop_copro_type_bit(unsigned int type)
> +{
> +	return 1ULL << (63 - type);
> +}
> +
>  #endif /* !_ARCH_POWERPC_MM_ICSWX_H_ */
Jimi Xenidis Feb. 27, 2012, 5:56 p.m. UTC | #2
On Feb 26, 2012, at 5:47 PM, Benjamin Herrenschmidt wrote:

> 
>> 
>> +	/*
>> +	 * We could be here because another thread has enabled acop
>> +	 * but the ACOP register has yet to be updated.
>> +	 *
>> +	 * This should have been taken care of by the IPI to sync all
>> +	 * the threads (see smp_call_function(sync_cop, mm, 1)), but
>> +	 * that could take forever if there are a significant amount
>> +	 * of threads.
>> +	 *
>> +	 * Given the number of threads on some of these systems,
>> +	 * perhaps this is the best way to sync ACOP rather than whack
>> +	 * every thread with an IPI.
>> +	 */
> 
> This is actually pretty standard stuff... If it was me I would make it
> all lazy and avoid the IPI completely but it doesn't necessarily hurt
> that much.

I'm happy to get rid of the IPI completely if Anton (or someone else) agrees to test on his end.
If not, do you want me to reduce the comment?


> In any case the "recovery" is indeed needed and you should
> probably also remove the pr_debug, it's really just spam.

ack

> 
>> +	if (acop_copro_type_bit(ct) && current->active_mm->context.acop) {
> 
> Shouldn't that be "&" ? In fact, gcc would even warn so either make
> it acop_check_copro(acop, ct) or do a (x & y) != 0

Ach!! nice catch!
-jx

> 
> Cheers,
> Ben.
> 
>> +		pr_debug("%s[%d]: Spurrious ACOP Fault, CT: %d, bit: 0x%llx "
>> +			"SPR: 0x%lx, mm->acop: 0x%lx\n",
>> +			current->comm, current->pid,
>> +			ct, acop_copro_type_bit(ct), mfspr(SPRN_ACOP),
>> +			current->active_mm->context.acop);
>> +
>> +		sync_cop(current->active_mm);
>> +		return 0;
>> +	}
>> +
>> +	/* check for alternate policy */
>> 	if (!acop_use_cop(ct))
>> 		return 0;
>> 
>> 	/* at this point the CT is unknown to the system */
>> -	pr_warn("%s[%d]: Coprocessor %d is unavailable",
>> +	pr_warn("%s[%d]: Coprocessor %d is unavailable\n",
>> 		current->comm, current->pid, ct);
>> 
>> 	/* get inst if we don't already have it */
>> diff --git a/arch/powerpc/mm/icswx.h b/arch/powerpc/mm/icswx.h
>> index 42176bd..6dedc08 100644
>> --- a/arch/powerpc/mm/icswx.h
>> +++ b/arch/powerpc/mm/icswx.h
>> @@ -59,4 +59,10 @@ extern void free_cop_pid(int free_pid);
>> 
>> extern int acop_handle_fault(struct pt_regs *regs, unsigned long address,
>> 			     unsigned long error_code);
>> +
>> +static inline u64 acop_copro_type_bit(unsigned int type)
>> +{
>> +	return 1ULL << (63 - type);
>> +}
>> +
>> #endif /* !_ARCH_POWERPC_MM_ICSWX_H_ */
> 
>
diff mbox

Patch

diff --git a/arch/powerpc/mm/icswx.c b/arch/powerpc/mm/icswx.c
index 5d9a59e..4db5b36 100644
--- a/arch/powerpc/mm/icswx.c
+++ b/arch/powerpc/mm/icswx.c
@@ -163,7 +163,7 @@  EXPORT_SYMBOL_GPL(drop_cop);
 
 static int acop_use_cop(int ct)
 {
-	/* todo */
+	/* There is no alternate policy, yet */
 	return -1;
 }
 
@@ -227,11 +227,36 @@  int acop_handle_fault(struct pt_regs *regs, unsigned long address,
 		ct = (ccw >> 16) & 0x3f;
 	}
 
+	/*
+	 * We could be here because another thread has enabled acop
+	 * but the ACOP register has yet to be updated.
+	 *
+	 * This should have been taken care of by the IPI to sync all
+	 * the threads (see smp_call_function(sync_cop, mm, 1)), but
+	 * that could take forever if there are a significant amount
+	 * of threads.
+	 *
+	 * Given the number of threads on some of these systems,
+	 * perhaps this is the best way to sync ACOP rather than whack
+	 * every thread with an IPI.
+	 */
+	if (acop_copro_type_bit(ct) && current->active_mm->context.acop) {
+		pr_debug("%s[%d]: Spurrious ACOP Fault, CT: %d, bit: 0x%llx "
+			"SPR: 0x%lx, mm->acop: 0x%lx\n",
+			current->comm, current->pid,
+			ct, acop_copro_type_bit(ct), mfspr(SPRN_ACOP),
+			current->active_mm->context.acop);
+
+		sync_cop(current->active_mm);
+		return 0;
+	}
+
+	/* check for alternate policy */
 	if (!acop_use_cop(ct))
 		return 0;
 
 	/* at this point the CT is unknown to the system */
-	pr_warn("%s[%d]: Coprocessor %d is unavailable",
+	pr_warn("%s[%d]: Coprocessor %d is unavailable\n",
 		current->comm, current->pid, ct);
 
 	/* get inst if we don't already have it */
diff --git a/arch/powerpc/mm/icswx.h b/arch/powerpc/mm/icswx.h
index 42176bd..6dedc08 100644
--- a/arch/powerpc/mm/icswx.h
+++ b/arch/powerpc/mm/icswx.h
@@ -59,4 +59,10 @@  extern void free_cop_pid(int free_pid);
 
 extern int acop_handle_fault(struct pt_regs *regs, unsigned long address,
 			     unsigned long error_code);
+
+static inline u64 acop_copro_type_bit(unsigned int type)
+{
+	return 1ULL << (63 - type);
+}
+
 #endif /* !_ARCH_POWERPC_MM_ICSWX_H_ */