Message ID | 1300202627-7245-1-git-send-email-Kyle.D.Moffett@boeing.com |
---|---|
State | Accepted |
Commit | e820a131f4084397a212c6ffe3ed8ea0ce43631f |
Delegated to: | Kumar Gala |
Headers | show |
On Mar 15, 2011, at 10:23 AM, Kyle Moffett wrote: > The current FreeScale MPC-8xxx DDR SPD interpreter is using full 64-bit > integer divide operations to convert between nanoseconds and DDR clock > cycles given arbitrary DDR clock frequencies. > > Since all of the inputs to this are 32-bit (nanoseconds, clock cycles, > and DDR frequencies), we can easily restructure the computation to use > the "do_div()" function to perform 64-bit/32-bit divide operations. > > On 64-bit this change is basically a no-op, because do_div is > implemented as a literal 64-bit divide operation and the instruction > scheduling works out almost the same. > > On 32-bit PowerPC a fully accurate 64/64 divide (__udivdi3 in libgcc) is > over 1.1kB of code and thousands of heavily dependent cycles to compute, > all of which is linked from libgcc. Another 1.2kB of code comes in for > the function __umoddi3. > > It should be noted that nothing else in U-Boot or the Linux kernel seems > to require a full 64-bit divide on my 32-bit PowerPC. > > Build-and-boot-tested on the HWW-1U-1A board using DDR2 SPD detection. > > Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com> > Acked-by: York Sun <yorksun@freescale.com> > Cc: Andy Fleming <afleming@gmail.com> > Cc: Kumar Gala <kumar.gala@freescale.com> > > -- > Changelog: > v2: Resubmitted separately from the other HWW-1U-1A patches > v3: Rebased on the 'next' branch of git://git.denx.de/u-boot-mpc85xx.git > > arch/powerpc/cpu/mpc8xxx/ddr/util.c | 56 +++++++++++++++++++++++++---------- > 1 files changed, 40 insertions(+), 16 deletions(-) applied to 85xx next - k
Dear Kyle Moffett, In message <1300202627-7245-1-git-send-email-Kyle.D.Moffett@boeing.com> you wrote: > The current FreeScale MPC-8xxx DDR SPD interpreter is using full 64-bit > integer divide operations to convert between nanoseconds and DDR clock > cycles given arbitrary DDR clock frequencies. ... > +/* To avoid 64-bit full-divides, we factor this here */ > +#define ULL_2e12 2000000000000ULL > +#define UL_5pow12 244140625UL > +#define UL_2pow13 (1UL << 13) > + > +#define ULL_8Fs 0xFFFFFFFFULL Unfortunately this is already in mainline. Can you please send a cleanup patch to fix the CamelCaps macro names? Thanks. Best regards, Wolfgang Denk
diff --git a/arch/powerpc/cpu/mpc8xxx/ddr/util.c b/arch/powerpc/cpu/mpc8xxx/ddr/util.c index b9a5a69..02908b4 100644 --- a/arch/powerpc/cpu/mpc8xxx/ddr/util.c +++ b/arch/powerpc/cpu/mpc8xxx/ddr/util.c @@ -8,9 +8,17 @@ #include <common.h> #include <asm/fsl_law.h> +#include <div64.h> #include "ddr.h" +/* To avoid 64-bit full-divides, we factor this here */ +#define ULL_2e12 2000000000000ULL +#define UL_5pow12 244140625UL +#define UL_2pow13 (1UL << 13) + +#define ULL_8Fs 0xFFFFFFFFULL + /* * Round mclk_ps to nearest 10 ps in memory controller code. * @@ -20,35 +28,51 @@ */ unsigned int get_memory_clk_period_ps(void) { - unsigned int mclk_ps; + unsigned int data_rate = get_ddr_freq(0); + unsigned int result; + + /* Round to nearest 10ps, being careful about 64-bit multiply/divide */ + unsigned long long mclk_ps = ULL_2e12; - mclk_ps = 2000000000000ULL / get_ddr_freq(0); - /* round to nearest 10 ps */ - return 10 * ((mclk_ps + 5) / 10); + /* Add 5*data_rate, for rounding */ + mclk_ps += 5*(unsigned long long)data_rate; + + /* Now perform the big divide, the result fits in 32-bits */ + do_div(mclk_ps, data_rate); + result = mclk_ps; + + /* We still need to round to 10ps */ + return 10 * (result/10); } /* Convert picoseconds into DRAM clock cycles (rounding up if needed). */ unsigned int picos_to_mclk(unsigned int picos) { - const unsigned long long ULL_2e12 = 2000000000000ULL; - const unsigned long long ULL_8Fs = 0xFFFFFFFFULL; - unsigned long long clks; - unsigned long long clks_temp; + unsigned long long clks, clks_rem; + /* Short circuit for zero picos */ if (!picos) return 0; - clks = get_ddr_freq(0) * (unsigned long long) picos; - clks_temp = clks; - clks = clks / ULL_2e12; - if (clks_temp % ULL_2e12) { + /* First multiply the time by the data rate (32x32 => 64) */ + clks = picos * (unsigned long long)get_ddr_freq(0); + + /* + * Now divide by 5^12 and track the 32-bit remainder, then divide + * by 2*(2^12) using shifts (and updating the remainder). + */ + clks_rem = do_div(clks, UL_5pow12); + clks_rem <<= 13; + clks_rem |= clks & (UL_2pow13-1); + clks >>= 13; + + /* If we had a remainder, then round up */ + if (clks_rem) clks++; - } - if (clks > ULL_8Fs) { + /* Clamp to the maximum representable value */ + if (clks > ULL_8Fs) clks = ULL_8Fs; - } - return (unsigned int) clks; }