[U-Boot,08/24] armv8: add simple sdelay implementation

Message ID 1479653838-3574-9-git-send-email-andre.przywara@arm.com
State Superseded
Delegated to: Jagannadha Sutradharudu Teki

Commit Message

Andre Przywara Nov. 20, 2016, 2:57 p.m. UTC
The sunxi DRAM setup code needs an sdelay() implementation, which
wasn't defined for armv8 so far.
Shamelessly copy the armv7 version and adjust it to work in AArch64.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 arch/arm/cpu/armv8/cpu.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Alexander Graf Nov. 21, 2016, 3:52 p.m. UTC | #1
On 20/11/2016 15:57, Andre Przywara wrote:
> The sunxi DRAM setup code needs an sdelay() implementation, which
> wasn't defined for armv8 so far.
> Shamelessly copy the armv7 version and adjust it to work in AArch64.
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>

I don't think it hurts to write this in C - and I also doubt that 
inlining has any negative effect.

Something along the lines of

static inline void sdelay(...) {
   for (; loops; loops--)
     asm volatile("");
}

inside a header should do the trick as well and is much more readable.


Alex

> ---
>  arch/arm/cpu/armv8/cpu.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/arch/arm/cpu/armv8/cpu.c b/arch/arm/cpu/armv8/cpu.c
> index e06c3cc..e82e9cf 100644
> --- a/arch/arm/cpu/armv8/cpu.c
> +++ b/arch/arm/cpu/armv8/cpu.c
> @@ -16,6 +16,19 @@
>  #include <asm/system.h>
>  #include <linux/compiler.h>
>
> +/************************************************************
> + * sdelay() - simple spin loop.  Will be constant time as
> + *  its generally used in bypass conditions only.  This
> + *  is necessary until timers are accessible.
> + *
> + *  not inline to increase chances its in cache when called
> + *************************************************************/
> +void sdelay(unsigned long loops)
> +{
> +	__asm__ volatile ("1:\n" "subs %0, %1, #1\n"
> +			  "b.ne 1b":"=r" (loops):"0"(loops));
> +}
> +
>  int cleanup_before_linux(void)
>  {
>  	/*
>
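Spelled out, the header-only C variant suggested above might look like
the sketch below. The parameter name and type are assumptions (the
original mail elides them); the empty volatile asm statement only keeps
the compiler from optimising the loop away.

static inline void sdelay(unsigned long loops)
{
	/* one empty volatile asm per iteration, so the loop cannot be elided */
	for (; loops; loops--)
		asm volatile("");
}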
Siarhei Siamashka Nov. 24, 2016, 1:25 a.m. UTC | #2
On Sun, 20 Nov 2016 14:57:02 +0000
Andre Przywara <andre.przywara@arm.com> wrote:

> The sunxi DRAM setup code needs an sdelay() implementation, which
> wasn't defined for armv8 so far.
> Shamelessly copy the armv7 version and adjust it to work in AArch64.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  arch/arm/cpu/armv8/cpu.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/arm/cpu/armv8/cpu.c b/arch/arm/cpu/armv8/cpu.c
> index e06c3cc..e82e9cf 100644
> --- a/arch/arm/cpu/armv8/cpu.c
> +++ b/arch/arm/cpu/armv8/cpu.c
> @@ -16,6 +16,19 @@
>  #include <asm/system.h>
>  #include <linux/compiler.h>
>  
> +/************************************************************
> + * sdelay() - simple spin loop.  Will be constant time as
> + *  its generally used in bypass conditions only.  This
> + *  is necessary until timers are accessible.
> + *
> + *  not inline to increase chances its in cache when called
> + *************************************************************/
> +void sdelay(unsigned long loops)
> +{
> +	__asm__ volatile ("1:\n" "subs %0, %1, #1\n"
> +			  "b.ne 1b":"=r" (loops):"0"(loops));

This inline assembly needs "cc" in the clobber list. Also don't we
want to just use a single register for the counter ("subs %0, %0, #1")
rather than trying to construct something excessively complicated
and possibly fragile?

The https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html page provides
some information.

> +}
> +
>  int cleanup_before_linux(void)
>  {
>  	/*
Andre Przywara Nov. 24, 2016, 1:29 a.m. UTC | #3
On 24/11/16 01:25, Siarhei Siamashka wrote:

Hi Siarhei,

> On Sun, 20 Nov 2016 14:57:02 +0000
> Andre Przywara <andre.przywara@arm.com> wrote:
> 
>> The sunxi DRAM setup code needs an sdelay() implementation, which
>> wasn't defined for armv8 so far.
>> Shamelessly copy the armv7 version and adjust it to work in AArch64.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>  arch/arm/cpu/armv8/cpu.c | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/arch/arm/cpu/armv8/cpu.c b/arch/arm/cpu/armv8/cpu.c
>> index e06c3cc..e82e9cf 100644
>> --- a/arch/arm/cpu/armv8/cpu.c
>> +++ b/arch/arm/cpu/armv8/cpu.c
>> @@ -16,6 +16,19 @@
>>  #include <asm/system.h>
>>  #include <linux/compiler.h>
>>  
>> +/************************************************************
>> + * sdelay() - simple spin loop.  Will be constant time as
>> + *  its generally used in bypass conditions only.  This
>> + *  is necessary until timers are accessible.
>> + *
>> + *  not inline to increase chances its in cache when called
>> + *************************************************************/
>> +void sdelay(unsigned long loops)
>> +{
>> +	__asm__ volatile ("1:\n" "subs %0, %1, #1\n"
>> +			  "b.ne 1b":"=r" (loops):"0"(loops));
> 
> This inline assembly needs "cc" in the clobber list. Also don't we
> want to just use a single register for the counter ("subs %0, %0, #1")
> rather than trying to construct something excessively complicated
> and possibly fragile?

Please don't shoot the messenger; this is the version copied from ARMv7.
I noticed the redundant register as well, but didn't dare to touch it
(assuming some higher wisdom behind it).
And good catch on the "cc" clobber!

Cheers,
Andre.

> The https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html page provides
> some information.
> 
>> +}
>> +
>>  int cleanup_before_linux(void)
>>  {
>>  	/*
> 
>
Siarhei Siamashka Nov. 24, 2016, 1:33 a.m. UTC | #4
On Mon, 21 Nov 2016 16:52:47 +0100
Alexander Graf <agraf@suse.de> wrote:

> On 20/11/2016 15:57, Andre Przywara wrote:
> > The sunxi DRAM setup code needs an sdelay() implementation, which
> > wasn't defined for armv8 so far.
> > Shamelessly copy the armv7 version and adjust it to work in AArch64.
> >
> > Signed-off-by: Andre Przywara <andre.przywara@arm.com>  
> 
> I don't think it hurts to write this in C - and I also doubt that 
> inlining has any negative effect.
> 
> Something along the lines of
> 
> static inline void sdelay(...) {
>    for (; loops; loops--)
>      asm volatile("");
> }
> 
> inside a header should do the trick as well and is much more readable.

Unfortunately, the performance of the generated C code is very
unpredictable. Depending on the optimization settings, it may place
the counter variable in a register or keep it on the stack.

It would be much nicer to have more predictable timings for
these delays. So I like the assembly version a lot better.
Naturally, when it is implemented correctly.

> 
> 
> Alex
> 
> > ---
> >  arch/arm/cpu/armv8/cpu.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/arch/arm/cpu/armv8/cpu.c b/arch/arm/cpu/armv8/cpu.c
> > index e06c3cc..e82e9cf 100644
> > --- a/arch/arm/cpu/armv8/cpu.c
> > +++ b/arch/arm/cpu/armv8/cpu.c
> > @@ -16,6 +16,19 @@
> >  #include <asm/system.h>
> >  #include <linux/compiler.h>
> >
> > +/************************************************************
> > + * sdelay() - simple spin loop.  Will be constant time as
> > + *  its generally used in bypass conditions only.  This
> > + *  is necessary until timers are accessible.
> > + *
> > + *  not inline to increase chances its in cache when called
> > + *************************************************************/
> > +void sdelay(unsigned long loops)
> > +{
> > +	__asm__ volatile ("1:\n" "subs %0, %1, #1\n"
> > +			  "b.ne 1b":"=r" (loops):"0"(loops));
> > +}
> > +
> >  int cleanup_before_linux(void)
> >  {
> >  	/*
> >
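One way to reconcile the two positions, sketched here purely for
illustration and not proposed in the thread, is to keep the assembly
loop body (so the timing does not depend on the optimization level)
while placing it as a static inline function in a header, which also
addresses the readability point.

static inline void sdelay(unsigned long loops)
{
	/* register-only countdown; subs sets the flags, hence the "cc" clobber */
	__asm__ volatile("1:\n"
			 "subs %0, %0, #1\n"
			 "b.ne 1b"
			 : "+r" (loops) : : "cc");
}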

Patch

diff --git a/arch/arm/cpu/armv8/cpu.c b/arch/arm/cpu/armv8/cpu.c
index e06c3cc..e82e9cf 100644
--- a/arch/arm/cpu/armv8/cpu.c
+++ b/arch/arm/cpu/armv8/cpu.c
@@ -16,6 +16,19 @@ 
 #include <asm/system.h>
 #include <linux/compiler.h>
 
+/************************************************************
+ * sdelay() - simple spin loop.  Will be constant time as
+ *  its generally used in bypass conditions only.  This
+ *  is necessary until timers are accessible.
+ *
+ *  not inline to increase chances its in cache when called
+ *************************************************************/
+void sdelay(unsigned long loops)
+{
+	__asm__ volatile ("1:\n" "subs %0, %1, #1\n"
+			  "b.ne 1b":"=r" (loops):"0"(loops));
+}
+
 int cleanup_before_linux(void)
 {
 	/*