arc: make sure __delay() never gets executed with 0 loops
diff mbox

Message ID 1455554267-23886-1-git-send-email-abrodkin@synopsys.com
State New
Headers show

Commit Message

Alexey Brodkin Feb. 15, 2016, 4:37 p.m. UTC
Current implementation of __delay() function uses so-called
zero-delay loops. And the only condition to exit that loop is
LP_COUNT (loop count register) = 1 (but not 0 as it might be easily
imagined).

So if our calculation of "loops" gives 0 (and that is pretty possible
given result of multiplication being >> 32) then zero-delay loop
mechanism starts with LP_COUNT=0 and it ends up decrementing LP_COUNT
while staying in the loop effectively producing close to infinite delay
instead of very short one.

I bumped into it with AXS101 + external DDR controller and caches
disabled. In that case I've got very small
loops_per_jiffy=0xf00:
------------------------>8--------------------
Calibrating delay loop... 0.77 BogoMIPS (lpj=3862)
------------------------>8--------------------

And on console output delays were way too long.

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
 Please enter the commit message for your changes. Lines starting
---
 arch/arc/include/asm/delay.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Vineet Gupta Feb. 24, 2016, 5:05 a.m. UTC | #1
On Monday 15 February 2016 10:07 PM, Alexey Brodkin wrote:
> Current implementation of __delay() function uses so-called
> zero-delay loops. And the only condition to exit that loop is
> LP_COUNT (loop count register) = 1 (but not 0 as it might be easily
> imagined).

So u can fix this better by doing a lp.nz, but....

> So if our calculation of "loops" gives 0 (and that is pretty possible
> given result of multiplication being >> 32) then zero-delay loop
> mechanism starts with LP_COUNT=0 and it ends up decrementing LP_COUNT
> while staying in the loop effectively producing close to infinite delay
> instead of very short one.
>
> I bumped into it with AXS101 + external DDR controller and caches
> disabled. In that case I've got very small
> loops_per_jiffy=0xf00:

I understand this gives you grief, but the code is doing exactly what it is asked to.
Since the system is slow, You are getting only 0xf00 (3840) loop iterations in 10ms.
So if you want say a delay of 1 micro-sec, you will need to loop for 3840 / 10000
~ 0 loops

This all assumes our lpj computation is correct - otherwise that needs fixing too.

Anyways I think for genuine cases where the number of loops is indeed computed to
0 because caller was passing too small a value, it is better to wait for looong
time to catch the bugger rather than silently returning. This is one of the cases
where disease is better than the cure !

> ------------------------>8--------------------
> Calibrating delay loop... 0.77 BogoMIPS (lpj=3862)
> ------------------------>8--------------------
>
> And on console output delays were way too long.
>
> Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
> Cc: Vineet Gupta <vgupta@synopsys.com>
>  Please enter the commit message for your changes. Lines starting
> ---
>  arch/arc/include/asm/delay.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arc/include/asm/delay.h b/arch/arc/include/asm/delay.h
> index 08e7e2a..1a7a1dc 100644
> --- a/arch/arc/include/asm/delay.h
> +++ b/arch/arc/include/asm/delay.h
> @@ -57,7 +57,8 @@ static inline void __udelay(unsigned long usecs)
>  	 */
>  	loops = ((u64) usecs * 4295 * HZ * loops_per_jiffy) >> 32;
>  
> -	__delay(loops);
> +	if (loops)
> +		__delay(loops);
>  }
>  
>  #define udelay(n) (__builtin_constant_p(n) ? ((n) > 20000 ? __bad_udelay() \
Alexey Brodkin Feb. 24, 2016, 2:10 p.m. UTC | #2
Hi Vineet,

On Wed, 2016-02-24 at 05:05 +0000, Vineet Gupta wrote:
> On Monday 15 February 2016 10:07 PM, Alexey Brodkin wrote:
> > Current implementation of __delay() function uses so-called
> > zero-delay loops. And the only condition to exit that loop is
> > LP_COUNT (loop count register) = 1 (but not 0 as it might be easily
> > imagined).
> 
> So u can fix this better by doing a lp.nz, but....
> 
> > So if our calculation of "loops" gives 0 (and that is pretty possible
> > given result of multiplication being >> 32) then zero-delay loop
> > mechanism starts with LP_COUNT=0 and it ends up decrementing LP_COUNT
> > while staying in the loop effectively producing close to infinite delay
> > instead of very short one.
> > 
> > I bumped into it with AXS101 + external DDR controller and caches
> > disabled. In that case I've got very small
> > loops_per_jiffy=0xf00:
> 
> I understand this gives you grief, but the code is doing exactly what it is asked to.
> Since the system is slow, You are getting only 0xf00 (3840) loop iterations in 10ms.
> So if you want say a delay of 1 micro-sec, you will need to loop for 3840 / 10000
> ~ 0 loops
> 
> This all assumes our lpj computation is correct - otherwise that needs fixing too.
> 
> Anyways I think for genuine cases where the number of loops is indeed computed to
> 0 because caller was passing too small a value, it is better to wait for looong
> time to catch the bugger rather than silently returning. This is one of the cases
> where disease is better than the cure !

Ok, but see delays (even those for just a few usecs) might be justified by
hardware requirements (i.e. some peripheral need at least 10 usecs between this
and tat operations). So one driver will attempt to use completely correct value
for delay but on slow hardware (or especially in simulation) we'll get kernel
virtually stuck for no obvious reason.

So frankly I don't like proposal to keep existing implementation.
And solution with "lp.nz" looks much better to me.

-Alexey

Patch
diff mbox

diff --git a/arch/arc/include/asm/delay.h b/arch/arc/include/asm/delay.h
index 08e7e2a..1a7a1dc 100644
--- a/arch/arc/include/asm/delay.h
+++ b/arch/arc/include/asm/delay.h
@@ -57,7 +57,8 @@  static inline void __udelay(unsigned long usecs)
 	 */
 	loops = ((u64) usecs * 4295 * HZ * loops_per_jiffy) >> 32;
 
-	__delay(loops);
+	if (loops)
+		__delay(loops);
 }
 
 #define udelay(n) (__builtin_constant_p(n) ? ((n) > 20000 ? __bad_udelay() \