Patchwork [U-Boot,4/7] fsl_ddr: Don't use full 64-bit divides on 32-bit PowerPC

Submitter Kyle Moffett
Date Feb. 21, 2011, 5:59 p.m.
Message ID <1298311199-18775-5-git-send-email-Kyle.D.Moffett@boeing.com>
Permalink /patch/83916/
State Deferred

Comments

Kyle Moffett - Feb. 21, 2011, 5:59 p.m.
The current Freescale MPC-8xxx DDR SPD interpreter is using full 64-bit
integer divide operations to convert between nanoseconds and DDR clock
cycles given arbitrary DDR clock frequencies.

Since all of the inputs to this are 32-bit (nanoseconds, clock cycles,
and DDR frequencies), we can easily restructure the computation to use
the "do_div()" function to perform 64-bit/32-bit divide operations.
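
As a rough illustration of the restructuring (a userspace sketch, not the
patch itself): 2e12 factors as 5^12 * 2^13, so the divide by 2e12 can be done
as one 64-bit/32-bit divide by 5^12 followed by a 13-bit shift.  The helper
div64_32() and the names picos_to_clks() and picos_to_clks_ref() are
hypothetical stand-ins chosen for this example; div64_32() only mimics the
calling convention of do_div() and is not the U-Boot implementation.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for do_div(): divide *n in place by a 32-bit
 * base and return the 32-bit remainder.  (Assumption: plain C operators
 * are used here only so the sketch compiles outside U-Boot.)
 */
static uint32_t div64_32(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

#define UL_5POW12	244140625UL	/* 5^12; 5^12 * 2^13 == 2e12 */
#define SHIFT_2POW13	13

/* Picoseconds to clock cycles, rounding up, using the factored divide. */
static unsigned int picos_to_clks(unsigned int picos, unsigned int data_rate)
{
	uint64_t clks = (uint64_t)picos * data_rate;	/* 32x32 => 64 */
	uint64_t lost;

	/* Divide by 5^12, keeping the remainder. */
	lost = div64_32(&clks, UL_5POW12);

	/* Divide by 2^13; only "was anything discarded?" matters. */
	lost |= clks & ((1ULL << SHIFT_2POW13) - 1);
	clks >>= SHIFT_2POW13;

	if (lost)
		clks++;		/* round up */
	return (unsigned int)clks;
}

/* Reference version using one full 64-bit/64-bit divide. */
static unsigned int picos_to_clks_ref(unsigned int picos, unsigned int data_rate)
{
	uint64_t prod = (uint64_t)picos * data_rate;
	uint64_t clks = prod / 2000000000000ULL;

	if (prod % 2000000000000ULL)
		clks++;
	return (unsigned int)clks;
}
```

With a data rate of 800000000 (a 2500 ps clock period), both versions should
map 2500 ps to 1 cycle and 2501 ps to 2 cycles.  Note that the combined
"lost" value is not the arithmetic remainder of the divide by 2e12; it is
only zero or non-zero in the same cases, which is all the round-up needs.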

This decreases compute time rather significantly for each conversion and
avoids bringing in a very complicated function from libgcc.
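
The clock-period computation can be handled the same way.  The sketch below
is an illustration under the same assumptions as before (div64_32() is a
hypothetical userspace stand-in for do_div(), and the function names are made
up for this example): adding 5*data_rate to the dividend before the divide
yields floor(2e12/data_rate) + 5 exactly, so a single 64-bit/32-bit divide
covers the rounding term as well.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for do_div(): divide *n in place by a 32-bit
 * base and return the 32-bit remainder.
 */
static uint32_t div64_32(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/* Clock period in ps, rounded to the nearest 10 ps, one 64/32 divide. */
static unsigned int clk_period_ps(unsigned int data_rate)
{
	/* Fold the +5 rounding term into the dividend as 5*data_rate. */
	uint64_t ps = 2000000000000ULL + 5 * (uint64_t)data_rate;

	div64_32(&ps, data_rate);	/* ps = floor(2e12/rate) + 5 */

	/* Truncate to a multiple of 10 ps. */
	return 10 * ((unsigned int)ps / 10);
}

/* Reference version: divide first, then add 5 and truncate. */
static unsigned int clk_period_ps_ref(unsigned int data_rate)
{
	unsigned int ps = (unsigned int)(2000000000000ULL / data_rate);

	return 10 * ((ps + 5) / 10);
}
```

Both versions assume a non-zero data rate that is large enough for the
quotient to fit in 32 bits, which holds for realistic DDR data rates.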

It should be noted that nothing else in U-Boot or the Linux kernel seems
to require a full 64-bit divide on any 32-bit PowerPC.

Build-and-boot-tested on the HWW-1U-1A board using DDR2 SPD detection.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
---

Author's note:  This patch really needs a bunch more review and testing, but
I only have access to a very limited selection of hardware.  Please let me
know about any questions or concerns.

 arch/powerpc/cpu/mpc8xxx/ddr/util.c |   58 ++++++++++++++++++++++++----------
 1 files changed, 41 insertions(+), 17 deletions(-)
Wolfgang Denk - Feb. 21, 2011, 9:16 p.m.
Dear Kyle Moffett,

In message <1298311199-18775-5-git-send-email-Kyle.D.Moffett@boeing.com> you wrote:
> The current Freescale MPC-8xxx DDR SPD interpreter is using full 64-bit
> integer divide operations to convert between nanoseconds and DDR clock
> cycles given arbitrary DDR clock frequencies.
> 
> Since all of the inputs to this are 32-bit (nanoseconds, clock cycles,
> and DDR frequencies), we can easily restructure the computation to use
> the "do_div()" function to perform 64-bit/32-bit divide operations.
> 
> This decreases compute time rather significantly for each conversion and
> avoids bringing in a very complicated function from libgcc.
> 
> It should be noted that nothing else in U-Boot or the Linux kernel seems
> to require a full 64-bit divide on any 32-bit PowerPC.
> 
> Build-and-boot-tested on the HWW-1U-1A board using DDR2 SPD detection.
> 
> Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
> ---
> 
> Author's note:  This patch really needs a bunch more review and testing, but
> I only have access to a very limited selection of hardware.  Please let me
> know about any questions or concerns.

This patch should be split off the patch series and submitted
separately.

Best regards,

Wolfgang Denk

Patch

diff --git a/arch/powerpc/cpu/mpc8xxx/ddr/util.c b/arch/powerpc/cpu/mpc8xxx/ddr/util.c
index 1e2d921..c545d59 100644
--- a/arch/powerpc/cpu/mpc8xxx/ddr/util.c
+++ b/arch/powerpc/cpu/mpc8xxx/ddr/util.c
@@ -8,11 +8,19 @@ 
 
 #include <common.h>
 #include <asm/fsl_law.h>
+#include <div64.h>
 
 #include "ddr.h"
 
 unsigned int fsl_ddr_get_mem_data_rate(void);
 
+/* To avoid 64-bit full-divides, we factor this here */
+#define ULL_2e12 2000000000000ULL
+#define UL_5pow12 244140625UL
+#define UL_2pow13 (1UL << 13)
+
+#define ULL_8Fs 0xFFFFFFFFULL
+
 /*
  * Round mclk_ps to nearest 10 ps in memory controller code.
  *
@@ -22,36 +30,52 @@  unsigned int fsl_ddr_get_mem_data_rate(void);
  */
 unsigned int get_memory_clk_period_ps(void)
 {
-	unsigned int mclk_ps;
+	unsigned int data_rate = fsl_ddr_get_mem_data_rate();
+	unsigned int result;
+
+	/* Round to nearest 10ps, being careful about 64-bit multiply/divide */
+	unsigned long long mclk_ps = ULL_2e12;
 
-	mclk_ps = 2000000000000ULL / fsl_ddr_get_mem_data_rate();
-	/* round to nearest 10 ps */
-	return 10 * ((mclk_ps + 5) / 10);
+	/* Add 5*data_rate, for rounding */
+	mclk_ps += 5*(unsigned long long)data_rate;
+
+	/* Now perform the big divide, the result fits in 32-bits */
+	do_div(mclk_ps, data_rate);
+	result = mclk_ps;
+
+	/* We still need to round to 10ps */
+	return 10 * (result/10);
 }
 
 /* Convert picoseconds into DRAM clock cycles (rounding up if needed). */
 unsigned int picos_to_mclk(unsigned int picos)
 {
-	const unsigned long long ULL_2e12 = 2000000000000ULL;
-	const unsigned long long ULL_8Fs = 0xFFFFFFFFULL;
-	unsigned long long clks;
-	unsigned long long clks_temp;
+	unsigned long long clks, clks_rem;
 
+	/* Short circuit for zero picos */
 	if (!picos)
 		return 0;
 
-	clks = fsl_ddr_get_mem_data_rate() * (unsigned long long) picos;
-	clks_temp = clks;
-	clks = clks / ULL_2e12;
-	if (clks_temp % ULL_2e12) {
+	/* First multiply the time by the data rate (32x32 => 64) */
+	clks = picos * (unsigned long long)fsl_ddr_get_mem_data_rate();
+
+	/*
+	 * Now divide by 5^12 and track the 32-bit remainder, then divide
+	 * by 2*(2^12) using shifts (and updating the remainder).
+	 */
+	clks_rem = do_div(clks, UL_5pow12);
+	clks_rem <<= 13;
+	clks_rem |= clks & (UL_2pow13-1);
+	clks >>= 13;
+
+	/* If we had a remainder, then round up */
+	if (clks_rem)
 		clks++;
-	}
 
-	if (clks > ULL_8Fs) {
+	/* Clamp to the maximum representable value */
+	if (clks > ULL_8Fs)
 		clks = ULL_8Fs;
-	}
-
-	return (unsigned int) clks;
+	return (unsigned int)clks;
 }
 
 unsigned int mclk_to_picos(unsigned int mclk)