[v2,16/19] ARC: [plat-eznps] Use dedicated cpu_relax()

Message ID 1446893557-29748-17-git-send-email-noamc@ezchip.com
State Superseded

Commit Message

Noam Camus Nov. 7, 2015, 10:52 a.m. UTC
From: Tal Zilcer <talz@ezchip.com>

Since the CTOP is SMT hardware multi-threaded, we need to hint
the HW that now would be a good time to do a hardware
thread context switch. This is done by issuing the sched.rw
instruction (binary coded here so as not to require a specific
revision of GCC to build the kernel).
sched.rw means that the thread becomes eligible for execution by
the thread scheduler only after all its pending read/write
transactions have completed.

cpu_relax_lowlatency() is implemented with barrier(),
since with the above semantics of cpu_relax() it may take a
while until the yielding hardware thread gets scheduled again.

Signed-off-by: Noam Camus <noamc@ezchip.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/processor.h |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

Comments

Peter Zijlstra Nov. 9, 2015, 10:05 a.m. UTC | #1
On Sat, Nov 07, 2015 at 12:52:34PM +0200, Noam Camus wrote:
> diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
> index 7266ede..50f9bae 100644
> --- a/arch/arc/include/asm/processor.h
> +++ b/arch/arc/include/asm/processor.h
> @@ -58,12 +58,21 @@ struct task_struct;
>   * get optimised away by gcc
>   */
>  #ifdef CONFIG_SMP
> +#ifndef CONFIG_EZNPS_MTM_EXT
>  #define cpu_relax()	__asm__ __volatile__ ("" : : : "memory")
>  #else
> +#define cpu_relax()     \
> +	__asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
> +#endif
> +#else
>  #define cpu_relax()	do { } while (0)

I'm fairly sure this is incorrect. Even on UP we expect cpu_relax() to
be a compiler barrier.

>  #endif
>  
> +#ifndef CONFIG_EZNPS_MTM_EXT
>  #define cpu_relax_lowlatency() cpu_relax()
> +#else
> +#define cpu_relax_lowlatency() barrier()
> +#endif

At which point you can unconditionally use that definition.

>  
>  #define copy_segments(tsk, mm)      do { } while (0)
>  #define release_segments(mm)        do { } while (0)
> -- 
> 1.7.1
>
Vineet Gupta Nov. 9, 2015, 10:22 a.m. UTC | #2
On Monday 09 November 2015 03:35 PM, Peter Zijlstra wrote:
> On Sat, Nov 07, 2015 at 12:52:34PM +0200, Noam Camus wrote:
>> diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
>> index 7266ede..50f9bae 100644
>> --- a/arch/arc/include/asm/processor.h
>> +++ b/arch/arc/include/asm/processor.h
>> @@ -58,12 +58,21 @@ struct task_struct;
>>   * get optimised away by gcc
>>   */
>>  #ifdef CONFIG_SMP
>> +#ifndef CONFIG_EZNPS_MTM_EXT
>>  #define cpu_relax()	__asm__ __volatile__ ("" : : : "memory")
>>  #else
>> +#define cpu_relax()     \
>> +	__asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
>> +#endif
>> +#else
>>  #define cpu_relax()	do { } while (0)
> I'm fairly sure this is incorrect. Even on UP we expect cpu_relax() to
> be a compiler barrier.

We discussed this a while back (why do https://lkml.org/lkml/<year>/.... links work
pseudo-randomly)

http://marc.info/?l=linux-kernel&m=140350765530113


>
>>  #endif
>>  
>> +#ifndef CONFIG_EZNPS_MTM_EXT
>>  #define cpu_relax_lowlatency() cpu_relax()
>> +#else
>> +#define cpu_relax_lowlatency() barrier()
>> +#endif
> At which point you can unconditionally use that definition.
>
>>  
>>  #define copy_segments(tsk, mm)      do { } while (0)
>>  #define release_segments(mm)        do { } while (0)
>> -- 
>> 1.7.1
>>
>
Peter Zijlstra Nov. 9, 2015, 10:45 a.m. UTC | #3
On Mon, Nov 09, 2015 at 10:22:27AM +0000, Vineet Gupta wrote:
> On Monday 09 November 2015 03:35 PM, Peter Zijlstra wrote:
> > On Sat, Nov 07, 2015 at 12:52:34PM +0200, Noam Camus wrote:
> >> diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
> >> index 7266ede..50f9bae 100644
> >> --- a/arch/arc/include/asm/processor.h
> >> +++ b/arch/arc/include/asm/processor.h
> >> @@ -58,12 +58,21 @@ struct task_struct;
> >>   * get optimised away by gcc
> >>   */
> >>  #ifdef CONFIG_SMP
> >> +#ifndef CONFIG_EZNPS_MTM_EXT
> >>  #define cpu_relax()	__asm__ __volatile__ ("" : : : "memory")
> >>  #else
> >> +#define cpu_relax()     \
> >> +	__asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
> >> +#endif
> >> +#else
> >>  #define cpu_relax()	do { } while (0)
> > I'm fairly sure this is incorrect. Even on UP we expect cpu_relax() to
> > be a compiler barrier.
> 
> We discussed this a while back (why do https://lkml.org/lkml/<year>/.... links work
> pseudo-randomly)
> 
> http://marc.info/?l=linux-kernel&m=140350765530113

Hurm.. you have a better memory than me ;-)

So in general we assume cpu_relax() implies a barrier() and I think we
have loops like:

	while (!var)
		cpu_relax();

where var isn't volatile (or cast using READ_ONCE etc).

See for instance: kernel/time/timer.c:lock_timer_base() which has:

	for (;;) {
		u32 tf = timer->flags;

		if (!(tf & TIMER_MIGRATING)) {
		 ...
		}

		cpu_relax();
	}

So while TIMER_MIGRATING is set, it will only ever do regular loads,
which GCC is permitted to lift out if cpu_relax() is not a barrier.

Patch

diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index 7266ede..50f9bae 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -58,12 +58,21 @@  struct task_struct;
  * get optimised away by gcc
  */
 #ifdef CONFIG_SMP
+#ifndef CONFIG_EZNPS_MTM_EXT
 #define cpu_relax()	__asm__ __volatile__ ("" : : : "memory")
 #else
+#define cpu_relax()     \
+	__asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
+#endif
+#else
 #define cpu_relax()	do { } while (0)
 #endif
 
+#ifndef CONFIG_EZNPS_MTM_EXT
 #define cpu_relax_lowlatency() cpu_relax()
+#else
+#define cpu_relax_lowlatency() barrier()
+#endif
 
 #define copy_segments(tsk, mm)      do { } while (0)
 #define release_segments(mm)        do { } while (0)