
[U-Boot,v3] fsl_ddr: Don't use full 64-bit divides on 32-bit PowerPC

Message ID 1300202627-7245-1-git-send-email-Kyle.D.Moffett@boeing.com
State Accepted
Commit e820a131f4084397a212c6ffe3ed8ea0ce43631f
Delegated to: Kumar Gala

Commit Message

Kyle Moffett March 15, 2011, 3:23 p.m. UTC
The current FreeScale MPC-8xxx DDR SPD interpreter is using full 64-bit
integer divide operations to convert between nanoseconds and DDR clock
cycles given arbitrary DDR clock frequencies.

Since all of the inputs to this are 32-bit (nanoseconds, clock cycles,
and DDR frequencies), we can easily restructure the computation to use
the "do_div()" function to perform 64-bit/32-bit divide operations.

On 64-bit this change is basically a no-op, because do_div is
implemented as a literal 64-bit divide operation and the instruction
scheduling works out almost the same.

On 32-bit PowerPC a fully accurate 64/64 divide (__udivdi3 in libgcc) is
over 1.1kB of code and thousands of heavily dependent cycles to compute,
all of which is linked from libgcc.  Another 1.2kB of code comes in for
the function __umoddi3.

It should be noted that nothing else in U-Boot or the Linux kernel seems
to require a full 64-bit divide on my 32-bit PowerPC.

Build-and-boot-tested on the HWW-1U-1A board using DDR2 SPD detection.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Acked-by: York Sun <yorksun@freescale.com>
Cc: Andy Fleming <afleming@gmail.com>
Cc: Kumar Gala <kumar.gala@freescale.com>

--
Changelog:
v2: Resubmitted separately from the other HWW-1U-1A patches
v3: Rebased on the 'next' branch of git://git.denx.de/u-boot-mpc85xx.git

 arch/powerpc/cpu/mpc8xxx/ddr/util.c |   56 +++++++++++++++++++++++++----------
 1 files changed, 40 insertions(+), 16 deletions(-)
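
The restructuring described above rests on factoring the constant divisor: 2e12 = 5^12 * 2^13, so a 64/32 divide by 5^12 followed by a 13-bit shift replaces the full 64/64 divide. The following is a minimal host-side sketch of that idea, not the patch itself. The helper div64_32() is a hypothetical stand-in for do_div(), and the function names, the explicit data_rate parameter (the patch calls get_ddr_freq(0) instead), and the self-check are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for do_div(): divides *n in place by a 32-bit
 * divisor and returns the remainder. It uses plain C operators, so it is
 * only suitable for checking the arithmetic on a host machine.
 */
static uint32_t div64_32(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/* Reference version: round-up division by 2e12 using a full 64/64 divide. */
static uint32_t picos_to_mclk_ref(uint32_t picos, uint32_t data_rate)
{
	uint64_t prod = (uint64_t)picos * data_rate;
	uint64_t clks = prod / 2000000000000ULL;

	if (prod % 2000000000000ULL)
		clks++;
	return clks > 0xFFFFFFFFULL ? 0xFFFFFFFFu : (uint32_t)clks;
}

/* Factored version: divide by 5^12 (fits in 32 bits), then shift by 13. */
static uint32_t picos_to_mclk_factored(uint32_t picos, uint32_t data_rate)
{
	uint64_t clks = (uint64_t)picos * data_rate;
	uint64_t rem = div64_32(&clks, 244140625u);	/* 5^12 */

	/* Fold the bits about to be shifted out into the remainder. */
	rem <<= 13;
	rem |= clks & ((1u << 13) - 1);
	clks >>= 13;

	/* Any nonzero remainder means the exact quotient was fractional. */
	if (rem)
		clks++;
	return clks > 0xFFFFFFFFULL ? 0xFFFFFFFFu : (uint32_t)clks;
}

/* Illustrative self-check at an assumed data rate of 800 MT/s. */
static void check_factored_divide(void)
{
	uint32_t picos;

	for (picos = 0; picos < 100000; picos++)
		assert(picos_to_mclk_factored(picos, 800000000u) ==
		       picos_to_mclk_ref(picos, 800000000u));
}
```

At 800 MT/s the clock period is 2500 ps, so 2500 ps maps to exactly one clock and 2501 ps rounds up to two. The remainder only needs to be tested for zero, which is why OR-ing the shifted-out bits into it is sufficient.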

Comments

Kumar Gala March 31, 2011, 8:25 a.m. UTC | #1
On Mar 15, 2011, at 10:23 AM, Kyle Moffett wrote:

> The current FreeScale MPC-8xxx DDR SPD interpreter is using full 64-bit
> integer divide operations to convert between nanoseconds and DDR clock
> cycles given arbitrary DDR clock frequencies.
> 
> Since all of the inputs to this are 32-bit (nanoseconds, clock cycles,
> and DDR frequencies), we can easily restructure the computation to use
> the "do_div()" function to perform 64-bit/32-bit divide operations.
> 
> On 64-bit this change is basically a no-op, because do_div is
> implemented as a literal 64-bit divide operation and the instruction
> scheduling works out almost the same.
> 
> On 32-bit PowerPC a fully accurate 64/64 divide (__udivdi3 in libgcc) is
> over 1.1kB of code and thousands of heavily dependent cycles to compute,
> all of which is linked from libgcc.  Another 1.2kB of code comes in for
> the function __umoddi3.
> 
> It should be noted that nothing else in U-Boot or the Linux kernel seems
> to require a full 64-bit divide on my 32-bit PowerPC.
> 
> Build-and-boot-tested on the HWW-1U-1A board using DDR2 SPD detection.
> 
> Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
> Acked-by: York Sun <yorksun@freescale.com>
> Cc: Andy Fleming <afleming@gmail.com>
> Cc: Kumar Gala <kumar.gala@freescale.com>
> 
> --
> Changelog:
> v2: Resubmitted separately from the other HWW-1U-1A patches
> v3: Rebased on the 'next' branch of git://git.denx.de/u-boot-mpc85xx.git
> 
> arch/powerpc/cpu/mpc8xxx/ddr/util.c |   56 +++++++++++++++++++++++++----------
> 1 files changed, 40 insertions(+), 16 deletions(-)

applied to 85xx next

- k
Wolfgang Denk April 13, 2011, 8:51 p.m. UTC | #2
Dear Kyle Moffett,

In message <1300202627-7245-1-git-send-email-Kyle.D.Moffett@boeing.com> you wrote:
> The current FreeScale MPC-8xxx DDR SPD interpreter is using full 64-bit
> integer divide operations to convert between nanoseconds and DDR clock
> cycles given arbitrary DDR clock frequencies.
...
> +/* To avoid 64-bit full-divides, we factor this here */
> +#define ULL_2e12 2000000000000ULL
> +#define UL_5pow12 244140625UL
> +#define UL_2pow13 (1UL << 13)
> +
> +#define ULL_8Fs 0xFFFFFFFFULL

Unfortunately this is already in mainline.  Can you please send a
cleanup patch to fix the CamelCaps macro names?

Thanks.

Best regards,

Wolfgang Denk

Patch

diff --git a/arch/powerpc/cpu/mpc8xxx/ddr/util.c b/arch/powerpc/cpu/mpc8xxx/ddr/util.c
index b9a5a69..02908b4 100644
--- a/arch/powerpc/cpu/mpc8xxx/ddr/util.c
+++ b/arch/powerpc/cpu/mpc8xxx/ddr/util.c
@@ -8,9 +8,17 @@ 
 
 #include <common.h>
 #include <asm/fsl_law.h>
+#include <div64.h>
 
 #include "ddr.h"
 
+/* To avoid 64-bit full-divides, we factor this here */
+#define ULL_2e12 2000000000000ULL
+#define UL_5pow12 244140625UL
+#define UL_2pow13 (1UL << 13)
+
+#define ULL_8Fs 0xFFFFFFFFULL
+
 /*
  * Round mclk_ps to nearest 10 ps in memory controller code.
  *
@@ -20,35 +28,51 @@ 
  */
 unsigned int get_memory_clk_period_ps(void)
 {
-	unsigned int mclk_ps;
+	unsigned int data_rate = get_ddr_freq(0);
+	unsigned int result;
+
+	/* Round to nearest 10ps, being careful about 64-bit multiply/divide */
+	unsigned long long mclk_ps = ULL_2e12;
 
-	mclk_ps = 2000000000000ULL / get_ddr_freq(0);
-	/* round to nearest 10 ps */
-	return 10 * ((mclk_ps + 5) / 10);
+	/* Add 5*data_rate, for rounding */
+	mclk_ps += 5*(unsigned long long)data_rate;
+
+	/* Now perform the big divide, the result fits in 32-bits */
+	do_div(mclk_ps, data_rate);
+	result = mclk_ps;
+
+	/* We still need to round to 10ps */
+	return 10 * (result/10);
 }
 
 /* Convert picoseconds into DRAM clock cycles (rounding up if needed). */
 unsigned int picos_to_mclk(unsigned int picos)
 {
-	const unsigned long long ULL_2e12 = 2000000000000ULL;
-	const unsigned long long ULL_8Fs = 0xFFFFFFFFULL;
-	unsigned long long clks;
-	unsigned long long clks_temp;
+	unsigned long long clks, clks_rem;
 
+	/* Short circuit for zero picos */
 	if (!picos)
 		return 0;
 
-	clks = get_ddr_freq(0) * (unsigned long long) picos;
-	clks_temp = clks;
-	clks = clks / ULL_2e12;
-	if (clks_temp % ULL_2e12) {
+	/* First multiply the time by the data rate (32x32 => 64) */
+	clks = picos * (unsigned long long)get_ddr_freq(0);
+
+	/*
+	 * Now divide by 5^12 and track the 32-bit remainder, then divide
+	 * by 2*(2^12) using shifts (and updating the remainder).
+	 */
+	clks_rem = do_div(clks, UL_5pow12);
+	clks_rem <<= 13;
+	clks_rem |= clks & (UL_2pow13-1);
+	clks >>= 13;
+
+	/* If we had a remainder, then round up */
+	if (clks_rem)
 		clks++;
-	}
 
-	if (clks > ULL_8Fs) {
+	/* Clamp to the maximum representable value */
+	if (clks > ULL_8Fs)
 		clks = ULL_8Fs;
-	}
-
 	return (unsigned int) clks;
 }
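
The rounding change in get_memory_clk_period_ps() can be checked in isolation. Adding 5*data_rate to the dividend before the divide is the same as adding 5 to the truncated quotient afterwards, so both forms should agree. The sketch below compares them on a host machine; the function names and the explicit data_rate parameter are assumptions for illustration, and a plain 64-bit divide stands in for do_div(). It assumes a nonzero data_rate that is large enough for the quotient to fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Form used before the patch: divide first, then add 5 and round to 10 ps. */
static uint32_t clk_period_ps_old(uint32_t data_rate)
{
	uint32_t mclk_ps = (uint32_t)(2000000000000ULL / data_rate);

	return 10 * ((mclk_ps + 5) / 10);
}

/* Form used by the patch: fold the +5 into the dividend, divide once. */
static uint32_t clk_period_ps_new(uint32_t data_rate)
{
	uint64_t mclk_ps = 2000000000000ULL;
	uint32_t result;

	/* Adding 5*data_rate here equals adding 5 after the divide. */
	mclk_ps += 5 * (uint64_t)data_rate;

	/* 64/32 divide; the patch performs this step with do_div(). */
	mclk_ps /= data_rate;
	result = (uint32_t)mclk_ps;

	return 10 * (result / 10);
}

/* Illustrative self-check over a few assumed DDR data rates. */
static void check_rounding_equivalence(void)
{
	static const uint32_t rates[] = {
		400000000u, 533333333u, 666666666u, 800000000u, 1066666666u,
	};
	unsigned int i;

	for (i = 0; i < sizeof(rates) / sizeof(rates[0]); i++)
		assert(clk_period_ps_old(rates[i]) == clk_period_ps_new(rates[i]));
}
```

For example, a data rate of 800000000 gives 2500 ps in both forms, and 1066666666 gives a truncated quotient of 1875 ps that rounds to 1880 ps.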