Message ID | 1314787130-1043-1-git-send-email-clchiou@chromium.org |
---|---|
State | Changes Requested |
On Wednesday, August 31, 2011 12:38:50 PM Che-Liang Chiou wrote:
> This patch adds a 64-64 bit divider that supports ARMv4 and above.
>
> Because clz (count leading zero) instruction is added until ARMv5, the
> divider implements a clz function for ARMv4 targets.
>
> The divider was tested with the following test driver code ran by
> qemu-arm:
>
> int main(void)
> {
> 	uint64_t a, b, q, r;
> 	while (scanf("%llx %llx %llx %llx", &a, &b, &q, &r) > 0)
> 		printf("%016llx %016llx %016llx %016llx\n", a, b, a / b, a % b);
> 	return 0;
> }
>
> Signed-off-by: Che-Liang Chiou <clchiou@chromium.org>
> Cc: Albert Aribaud <albert.u.boot@aribaud.net>
> ---

Hi,

do you see any kind of a performance hit so you can't use the default "C" version?

Cheers
On Wednesday, August 31, 2011 06:38:50 Che-Liang Chiou wrote:
> This patch adds a 64-64 bit divider that supports ARMv4 and above.
why ? if you're doing 64 bit divides, chances are you're doing something
fundamentally wrong. perhaps you should fix that instead.
this is also why we have the do_div() helper macro.
so until your changelog documents the actual *reason* for this patch: NAK
-mike
On Wednesday, August 31, 2011 04:32:52 PM Mike Frysinger wrote:
> On Wednesday, August 31, 2011 06:38:50 Che-Liang Chiou wrote:
> > This patch adds a 64-64 bit divider that supports ARMv4 and above.
>
> why ? if you're doing 64 bit divides, chances are you're doing something
> fundamentally wrong. perhaps you should fix that instead.

Oh come on Mike, what about too big NAND memories ?

> this is also why we have the do_div() helper macro.
>
> so until your changelog documents the actual *reason* for this patch: NAK

The reason is likely it's faster. But I don't think it matters, that's why I commented on this already. If he's fixing something by this (like I mistakenly did some time ago), there's really something wrong.

> -mike
On Wednesday, August 31, 2011 11:11:00 Marek Vasut wrote:
> On Wednesday, August 31, 2011 04:32:52 PM Mike Frysinger wrote:
> > On Wednesday, August 31, 2011 06:38:50 Che-Liang Chiou wrote:
> > > This patch adds a 64-64 bit divider that supports ARMv4 and above.
> >
> > why ? if you're doing 64 bit divides, chances are you're doing something
> > fundamentally wrong. perhaps you should fix that instead.
>
> Oh come on Mike, what about too big NAND memories ?

Linux hasnt had a problem supporting large NAND without a 64bit divide routine. why are we special ?

> > this is also why we have the do_div() helper macro.
> >
> > so until your changelog documents the actual *reason* for this patch: NAK
>
> The reason is likely it's faster.

let's see actual #'s
-mike
On Wednesday, August 31, 2011 05:27:46 PM Mike Frysinger wrote:
> On Wednesday, August 31, 2011 11:11:00 Marek Vasut wrote:
> > On Wednesday, August 31, 2011 04:32:52 PM Mike Frysinger wrote:
> > > On Wednesday, August 31, 2011 06:38:50 Che-Liang Chiou wrote:
> > > > This patch adds a 64-64 bit divider that supports ARMv4 and above.
> > >
> > > why ? if you're doing 64 bit divides, chances are you're doing
> > > something fundamentally wrong. perhaps you should fix that instead.
> >
> > Oh come on Mike, what about too big NAND memories ?
>
> Linux hasnt had a problem supporting large NAND without a 64bit divide
> routine. why are we special ?

Because someone (?) has to fix the code that uses do_div() ;-)

> > > this is also why we have the do_div() helper macro.
> > >
> > > so until your changelog documents the actual *reason* for this patch:
> > > NAK
> >
> > The reason is likely it's faster.
>
> let's see actual #'s

True, will you make the measurements? ;-) Still, I'd stick with the plain-C version, it doesn't matter I guess.

Cheers

> -mike
On Wednesday, August 31, 2011 11:33:59 Marek Vasut wrote:
> On Wednesday, August 31, 2011 05:27:46 PM Mike Frysinger wrote:
> > On Wednesday, August 31, 2011 11:11:00 Marek Vasut wrote:
> > > On Wednesday, August 31, 2011 04:32:52 PM Mike Frysinger wrote:
> > > > On Wednesday, August 31, 2011 06:38:50 Che-Liang Chiou wrote:
> > > > > This patch adds a 64-64 bit divider that supports ARMv4 and above.
> > > >
> > > > why ? if you're doing 64 bit divides, chances are you're doing
> > > > something fundamentally wrong. perhaps you should fix that instead.
> > >
> > > Oh come on Mike, what about too big NAND memories ?
> >
> > Linux hasnt had a problem supporting large NAND without a 64bit divide
> > routine. why are we special ?
>
> Because someone (?) has to fix the code that uses do_div() ;-)

sure ... the guy with the problem gets to post the fix :)
-mike
On Wednesday, August 31, 2011 06:05:29 PM Mike Frysinger wrote:
> On Wednesday, August 31, 2011 11:33:59 Marek Vasut wrote:
> > On Wednesday, August 31, 2011 05:27:46 PM Mike Frysinger wrote:
> > > On Wednesday, August 31, 2011 11:11:00 Marek Vasut wrote:
> > > > On Wednesday, August 31, 2011 04:32:52 PM Mike Frysinger wrote:
> > > > > On Wednesday, August 31, 2011 06:38:50 Che-Liang Chiou wrote:
> > > > > > This patch adds a 64-64 bit divider that supports ARMv4 and
> > > > > > above.
> > > > >
> > > > > why ? if you're doing 64 bit divides, chances are you're doing
> > > > > something fundamentally wrong. perhaps you should fix that
> > > > > instead.
> > > >
> > > > Oh come on Mike, what about too big NAND memories ?
> > >
> > > Linux hasnt had a problem supporting large NAND without a 64bit divide
> > > routine. why are we special ?
> >
> > Because someone (?) has to fix the code that uses do_div() ;-)
>
> sure ... the guy with the problem gets to post the fix :)

Cool, would that be ... you ? ;-)

Cheers
On Wednesday, August 31, 2011 12:30:25 Marek Vasut wrote:
> On Wednesday, August 31, 2011 06:05:29 PM Mike Frysinger wrote:
> > On Wednesday, August 31, 2011 11:33:59 Marek Vasut wrote:
> > > On Wednesday, August 31, 2011 05:27:46 PM Mike Frysinger wrote:
> > > > On Wednesday, August 31, 2011 11:11:00 Marek Vasut wrote:
> > > > > On Wednesday, August 31, 2011 04:32:52 PM Mike Frysinger wrote:
> > > > > > On Wednesday, August 31, 2011 06:38:50 Che-Liang Chiou wrote:
> > > > > > > This patch adds a 64-64 bit divider that supports ARMv4 and
> > > > > > > above.
> > > > > >
> > > > > > why ? if you're doing 64 bit divides, chances are you're doing
> > > > > > something fundamentally wrong. perhaps you should fix that
> > > > > > instead.
> > > > >
> > > > > Oh come on Mike, what about too big NAND memories ?
> > > >
> > > > Linux hasnt had a problem supporting large NAND without a 64bit
> > > > divide routine. why are we special ?
> > >
> > > Because someone (?) has to fix the code that uses do_div() ;-)
> >
> > sure ... the guy with the problem gets to post the fix :)
>
> Cool, would that be ... you ? ;-)

no, because it's building fine for me, thus i dont have a problem
-mike
Dear Che-Liang Chiou,
In message <1314787130-1043-1-git-send-email-clchiou@chromium.org> you wrote:
> This patch adds a 64-64 bit divider that supports ARMv4 and above.
To summarize the misc feedback: Please explain in detail which
problem you are trying to fix. We see no need for this patch so far.
Best regards,
Wolfgang Denk
Hi,

Thanks for the insightful comments. Here are my responses:

* Why don't I implement the divider in C?
It is not because I think it's performance critical (I haven't benchmarked it yet), but because I have a probably wrong impression that the divider has to be written in assembly --- all dividers in arch/arm/lib/ are written in ARM assembly. What is the policy here for using assembly or C?

* When do we need a 64-bit divider?
In kernel code do_div() is used for various purposes. So I think it should be quite often that we would need a 64-bit divider in U-Boot.

* Do we need a 64-64 bit divider?
do_div() defines 64-32 bit division semantics (dividend is 64-bit and divisor is 32-bit), and this patch implements a 64-64 bit divider (both dividend and divisor are 64-bit). I have to admit that I can't think of scenarios or reasons to justify a 64-64 bit divider instead of a 64-32 bit divider, except that a 64-64 bit divider is more generic than a 64-32 bit one.

So I guess we can agree that a 64-bit divider is a feature that is nice to have, and we should decide:
* Do we need a 64-64 bit divider or a 64-32 bit one?
* Do we write it in C or assembly?

Depending on our decisions, I will rewrite (or abandon) this patch accordingly.

Regards,
Che-Liang

On Thu, Sep 1, 2011 at 4:03 AM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Che-Liang Chiou,
>
> In message <1314787130-1043-1-git-send-email-clchiou@chromium.org> you wrote:
>> This patch adds a 64-64 bit divider that supports ARMv4 and above.
>
> To summarize the misc feedback: Please explain in detail which
> problem you are trying to fix. We see no need for this patch so far.
>
> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> "Success covers a multitude of blunders." - George Bernard Shaw
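For readers unfamiliar with the 64-32 semantics mentioned in the message above, here is a minimal C sketch of a divide that takes a 64-bit dividend and a 32-bit divisor, writes the quotient back in place and returns the remainder. The name `div64_32_sketch` is hypothetical, and the bit-at-a-time loop is only an illustration; it is not the code in U-Boot's or Linux's lib/div64.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: divide the 64-bit value
 * at *n by the 32-bit value base, store the quotient back into *n and
 * return the remainder (the calling convention do_div() is described
 * as having in this thread).
 */
static uint32_t div64_32_sketch(uint64_t *n, uint32_t base)
{
	uint64_t rem = 0;
	uint64_t quot = 0;
	int i;

	assert(base != 0);	/* the caller must rule out division by zero */

	/* Restoring long division: consume one dividend bit per step. */
	for (i = 63; i >= 0; i--) {
		rem = (rem << 1) | ((*n >> i) & 1);
		if (rem >= base) {
			rem -= base;
			quot |= (uint64_t)1 << i;
		}
	}

	*n = quot;
	return (uint32_t)rem;	/* always smaller than base, so it fits */
}
```

Because the remainder is always smaller than the 32-bit divisor, the running remainder never needs more than 33 bits, which is what keeps a 64-32 divide cheaper than a full 64-64 one.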
On Thursday, September 01, 2011 12:09:18 PM Che-liang Chiou wrote:
> Hi,
>
> Thanks for the insightful comments. Here are my responses:
>
> * Why don't I implement the divider in C?
> It is not because I think it's performance critical (I haven't
> benchmarked it yet), but because I have a probably wrong impression
> that the divider has to be written in assembly --- all dividers in
> arch/arm/lib/ are written in ARM assembly. What is the policy here for
> using assembly or C?

No, C is just fine and is more generic. Those assembler versions are just optimized things, you don't need to be bothered by those.

> * When do we need a 64-bit divider?
> In kernel code do_div() is used for various purposes. So I think it
> should be quite often that we would need a 64-bit divider in U-Boot.

Not much really ... and for the rare cases, we can do with do_div() as is.

> * Do we need a 64-64 bit divider?
> do_div() defines 64-32 bit division semantics (dividend is 64-bit and
> divisor is 32-bit), and this patch implements a 64-64 bit divider
> (both dividend and divisor are 64-bit). I have to admit that I can't
> think of scenarios or reasons to justify a 64-64 bit divider instead
> of a 64-32 bit divider, except that a 64-64 bit divider is more
> generic than a 64-32 bit one.

So we don't need 64/64 divide at all.

> So I guess we can agree that a 64-bit divider is feature that is nice
> to have, and we should decide:
> * Do we need a 64-64 bit divider or a 64-32 bit one?

64-32 is do_div()

> * Do we write it in C or assembly?

C is OK.

> Depending on our decisions, I will rewrite (or abandon) this patch
> accordingly.

Look, I don't mean to be rough, but honestly. I see no use for this code. Adding code to anywhere so it'd just sit there is bad.

Cheers

> Regards,
> Che-Liang
>
> On Thu, Sep 1, 2011 at 4:03 AM, Wolfgang Denk <wd@denx.de> wrote:
> > Dear Che-Liang Chiou,
> >
> > In message <1314787130-1043-1-git-send-email-clchiou@chromium.org> you wrote:
> >> This patch adds a 64-64 bit divider that supports ARMv4 and above.
> >
> > To summarize the misc feedback: Please explain in detail which
> > problem you are trying to fix. We see no need for this patch so far.
> >
> > Best regards,
> >
> > Wolfgang Denk
> >
> > --
> > DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
> > HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> > Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> > "Success covers a multitude of blunders." - George Bernard Shaw
Hi Marek,

I will abandon this patch and submit a new patch that is adapted from do_div() and lib64.c of the Linux kernel. Does this sound okay to you?

Regards,
Che-Liang

On Thu, Sep 1, 2011 at 6:16 PM, Marek Vasut <marek.vasut@gmail.com> wrote:
> On Thursday, September 01, 2011 12:09:18 PM Che-liang Chiou wrote:
>> Hi,
>>
>> Thanks for the insightful comments. Here are my responses:
>>
>> * Why don't I implement the divider in C?
>> It is not because I think it's performance critical (I haven't
>> benchmarked it yet), but because I have a probably wrong impression
>> that the divider has to be written in assembly --- all dividers in
>> arch/arm/lib/ are written in ARM assembly. What is the policy here for
>> using assembly or C?
>
> No, C is just fine and is more generic. Those assembler versions are just
> optimized things, you don't need to be bothered by those.
>
>> * When do we need a 64-bit divider?
>> In kernel code do_div() is used for various purposes. So I think it
>> should be quite often that we would need a 64-bit divider in U-Boot.
>
> Not much really ... and for the rare cases, we can do with do_div() as is.
>
>> * Do we need a 64-64 bit divider?
>> do_div() defines 64-32 bit division semantics (dividend is 64-bit and
>> divisor is 32-bit), and this patch implements a 64-64 bit divider
>> (both dividend and divisor are 64-bit). I have to admit that I can't
>> think of scenarios or reasons to justify a 64-64 bit divider instead
>> of a 64-32 bit divider, except that a 64-64 bit divider is more
>> generic than a 64-32 bit one.
>
> So we don't need 64/64 divide at all.
>
>> So I guess we can agree that a 64-bit divider is feature that is nice
>> to have, and we should decide:
>> * Do we need a 64-64 bit divider or a 64-32 bit one?
>
> 64-32 is do_div()
>
>> * Do we write it in C or assembly?
>
> C is OK.
>
>> Depending on our decisions, I will rewrite (or abandon) this patch
>> accordingly.
>
> Look, I don't mean to be rough, but honestly. I see no use for this code. Adding
> code to anywhere so it'd just sit there is bad.
>
> Cheers
>
>> Regards,
>> Che-Liang
>>
>> On Thu, Sep 1, 2011 at 4:03 AM, Wolfgang Denk <wd@denx.de> wrote:
>> > Dear Che-Liang Chiou,
>> >
>> > In message <1314787130-1043-1-git-send-email-clchiou@chromium.org> you wrote:
>> >> This patch adds a 64-64 bit divider that supports ARMv4 and above.
>> >
>> > To summarize the misc feedback: Please explain in detail which
>> > problem you are trying to fix. We see no need for this patch so far.
>> >
>> > Best regards,
>> >
>> > Wolfgang Denk
>> >
>> > --
>> > DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
>> > HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
>> > Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
>> > "Success covers a multitude of blunders." - George Bernard Shaw
On Thursday, September 01, 2011 12:30:47 PM Che-liang Chiou wrote:
> Hi Marek,
>
> I will abandon this patch and submit a new patch that is adapted from
> do_div() and lib64.c of the Linux kernel. Does this sound okay to you?

I'm not against it, but is it worth the effort? Like ... why do we need it ?

> Regards,
> Che-Liang

[...]
Hi Marek,

do_div() and lib/div64.c of the Linux kernel have been in U-Boot since October 2006 (this date is the earliest record that I can find; see commit 7b64fef3).

Regards,
Che-Liang

On Thu, Sep 1, 2011 at 6:42 PM, Marek Vasut <marek.vasut@gmail.com> wrote:
> On Thursday, September 01, 2011 12:30:47 PM Che-liang Chiou wrote:
>> Hi Marek,
>>
>> I will abandon this patch and submit a new patch that is adapted from
>> do_div() and lib64.c of the Linux kernel. Does this sound okay to you?
>
> I'm not against it, but is it worth the effort? Like ... why do we need it ?
>>
>> Regards,
>> Che-Liang
> [...]
Dear Che-liang Chiou,

In message <CANJuy2K9uWdxT6T=mMj0yLiV3cAJUHLNC4LGDv21sP-DMGVzUg@mail.gmail.com> you wrote:
>
> do_div() and lib/div64.c of linux kernel has been ported to U-Boot
> since Oct, 2006 (this date is the earliest record that I can find; see
> commit 7b64fef3).

Indeed, and so far nobody ever needed the patch you submitted, so please explain in detail why you need it now?

Best regards,

Wolfgang Denk
Dear Wolfgang,

I am convinced that a 64-64 bit divider (this patch) is not needed. Is there any way that we could mark a patch "abandon"?

Regards,
Che-Liang

On Thu, Sep 1, 2011 at 9:07 PM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Che-liang Chiou,
>
> In message <CANJuy2K9uWdxT6T=mMj0yLiV3cAJUHLNC4LGDv21sP-DMGVzUg@mail.gmail.com> you wrote:
>>
>> do_div() and lib/div64.c of linux kernel has been ported to U-Boot
>> since Oct, 2006 (this date is the earliest record that I can find; see
>> commit 7b64fef3).
>
> Indeed, and so far nobody ever needed the patch you submitted, so
> please explain in detail why you need it now?
>
> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> "More software projects have gone awry for lack of calendar time than
> for all other causes combined."
> - Fred Brooks, Jr., _The Mythical Man Month_
Dear Che-liang Chiou,

In message <CANJuy2+BB7tA70vHoTq3LJA-o4ymGCSdP9PnFzrx-uWret7nqQ@mail.gmail.com> you wrote:
>
> So I guess we can agree that a 64-bit divider is feature that is nice
> to have, and we should decide:
> * Do we need a 64-64 bit divider or a 64-32 bit one?
> * Do we write it in C or assembly?

The situation is simple: there is no code in U-Boot that needs this feature, and we try to avoid adding dead code.

If you don't have a use case at hand that actually requires this, then please let's drop it. Thanks.

Best regards,

Wolfgang Denk
Hi Wolfgang,

On 08/09/11 07:14, Wolfgang Denk wrote:
> Dear Che-liang Chiou,
>
> In message <CANJuy2+BB7tA70vHoTq3LJA-o4ymGCSdP9PnFzrx-uWret7nqQ@mail.gmail.com> you wrote:
>>
>> So I guess we can agree that a 64-bit divider is feature that is nice
>> to have, and we should decide:
>> * Do we need a 64-64 bit divider or a 64-32 bit one?
>> * Do we write it in C or assembly?
>
> The situation is simple: there is no code in U-Boot that needs this
> feature, and we try to avoid adding dead code.
>
> If you don;t have a use case at hand that actually requires this, then
> please let's drop it.

You'll laugh at this - the Intel High Performance Event Timers (HPET) are defined to a resolution of femto-seconds and you end up with code in get_timer() like:

	u32 count_low;
	u32 count_high;
	u32 fs_per_tick;
	u64 ticks;
	u64 fs;
	u32 ms;

	count_low = readl(&hpet_registers->main_count_low);
	count_high = readl(&hpet_registers->main_count_high);
	fs_per_tick = readl(&hpet_registers->counter_clk_period);

	ticks = ((u64)count_high << 32) | ((u64)count_low);
	fs = fs_per_tick * ticks;
	/* 10^12 femto-seconds per milli-second; the divisor needs more than 32 bits */
	ms = (u32)lldiv(fs, 1000000000000);

But I can right shift both divisor and dividend by 12 bits without losing any significant precision which turns it into:

	ms = (u32)lldiv(fs >> 12, 244140625);

So I almost needed a 64 bit divisor.

Regards,

Graeme
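The shift trick in the message above works because 10^12 = 2^12 * 5^12 and 5^12 = 244140625 fits in 32 bits. A small host-side C sketch of that arithmetic follows. The names `hpet_fs_to_ms` and `div_u64_u32` are hypothetical, and `div_u64_u32` merely stands in for a 64-by-32 divide such as lldiv(); it is an assumption for illustration, not U-Boot code.

```c
#include <assert.h>
#include <stdint.h>

#define FS_PER_MS	1000000000000ULL	/* 10^12 femtoseconds per millisecond */
#define FS_PER_MS_SHIFT	12			/* 10^12 = 2^12 * 5^12 */
#define FS_PER_MS_ODD	244140625U		/* 5^12, fits in 32 bits */

/* Hypothetical stand-in for a 64-bit by 32-bit divide such as lldiv(). */
static uint64_t div_u64_u32(uint64_t dividend, uint32_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/* Convert a femtosecond count to whole milliseconds with a 32-bit divisor. */
static uint32_t hpet_fs_to_ms(uint64_t fs)
{
	/*
	 * Dropping the low 12 bits first and then dividing by 5^12 gives
	 * the same integer result as dividing by 10^12 directly.
	 */
	return (uint32_t)div_u64_u32(fs >> FS_PER_MS_SHIFT, FS_PER_MS_ODD);
}
```

Since floor(floor(x / 4096) / 244140625) equals floor(x / 10^12) for unsigned integers, the shifted form returns the same millisecond count as the direct division.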
Dear Graeme Russ,

In message <4E786EBA.5040805@gmail.com> you wrote:
>
> You'll laugh at this - the Intel High Performance Event Timers (HPET) are
> defined to a resolution of femto-seconds and you end up with code in
> get_timer() like:

I have to admit that I have never been able to laugh about x86 design issues. But then, Intel told us the Pentium would have "RISK" features...

Best regards,

Wolfgang Denk
On 20/09/11 21:28, Wolfgang Denk wrote:
> Dear Graeme Russ,
>
> In message <4E786EBA.5040805@gmail.com> you wrote:
>>
>> You'll laugh at this - the Intel High Performance Event Timers (HPET) are
>> defined to a resolution of femto-seconds and you end up with code in
>> get_timer() like:
>
> I have to admit that I have never been able to laugh about x86 design
> issues. But then, Intel told us the Pentium would have "RISK"
> features...

*ROFL*

Well actually, it's not really an x86 thing - any architecture could implement HPET. Using femto-seconds as the time-base and defining a 'tick' as a number of femto-seconds makes a lot of sense - it allows preservation of timer accuracy through the comparators, so interrupts can be generated with extreme precision while actually allowing the source clock to be pretty much any frequency. They were, after all, designed for multi-media applications to solve the horrendous sub-ms accuracy issue with the older programmable timers.

Regards,

Graeme
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 300c8fa..31770dd 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -33,6 +33,7 @@ GLSOBJS += _divsi3.o
 GLSOBJS += _lshrdi3.o
 GLSOBJS += _modsi3.o
 GLSOBJS += _udivsi3.o
+GLSOBJS += _uldivmod.o
 GLSOBJS += _umodsi3.o
 
 GLCOBJS += div0.o
diff --git a/arch/arm/lib/_uldivmod.S b/arch/arm/lib/_uldivmod.S
new file mode 100644
index 0000000..9e3a5e6
--- /dev/null
+++ b/arch/arm/lib/_uldivmod.S
@@ -0,0 +1,266 @@
+/*
+ * Copyright (c) 2011 The Chromium OS Authors.
+ * See file CREDITS for list of people who contributed to this
+ * project.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston,
+ * MA 02111-1307 USA
+ */
+
+/*
+ * A, Q = r0 + (r1 << 32)
+ * B, R = r2 + (r3 << 32)
+ * A / B = Q ... R
+ */
+
+	.text
+	.global	__aeabi_uldivmod
+	.type	__aeabi_uldivmod, function
+	.align	0
+
+/* armv4 does not support clz (count leading zero) instruction. */
+#if __LINUX_ARM_ARCH__ <= 4
+# define CLZ(dst, src)	bl L_clz_ ## dst ## _ ## src
+# define CLZEQ(dst, src)	bleq L_clz_ ## dst ## _ ## src
+#else
+# define CLZ(dst, src)	clz dst, src
+# define CLZEQ(dst, src)	clzeq dst, src
+#endif
+
+A_0	.req	r0
+A_1	.req	r1
+B_0	.req	r2
+B_1	.req	r3
+C_0	.req	r4
+C_1	.req	r5
+D_0	.req	r6
+D_1	.req	r7
+
+Q_0	.req	r0
+Q_1	.req	r1
+R_0	.req	r2
+R_1	.req	r3
+
+__aeabi_uldivmod:
+	stmfd	sp!, {r4, r5, r6, r7, lr}
+	@ Test if B == 0
+	orrs	ip, B_0, B_1		@ Z set -> B == 0
+	beq	L_div_by_0
+	@ Test if B is power of 2: (B & (B - 1)) == 0
+	subs	C_0, B_0, #1
+	sbc	C_1, B_1, #0
+	tst	C_0, B_0
+	tsteq	B_1, C_1
+	beq	L_pow2
+	@ Test if A_1 == B_1 == 0
+	orrs	ip, A_1, B_1
+	beq	L_div_32_32
+
+L_div_64_64:
+	mov	C_0, #1
+	mov	C_1, #0
+	@ D_0 = clz A
+	CLZ(D_0, A_1)
+	teq	A_1, #0
+	CLZEQ(ip, A_0)
+	teq	A_1, #0
+	addeq	D_0, D_0, ip
+	@ D_1 = clz B
+	CLZ(D_1, B_1)
+	teq	B_1, #0
+	CLZEQ(ip, B_0)
+	teq	B_1, #0
+	addeq	D_1, D_1, ip
+	@ if clz B - clz A <= 0: goto L_done_shift
+	subs	D_0, D_1, D_0
+	bls	L_done_shift
+	subs	D_1, D_0, #32
+	rsb	ip, D_0, #32
+	@ B <<= (clz B - clz A)
+	movmi	B_1, B_1, lsl D_0
+	orrmi	B_1, B_1, B_0, lsr ip
+	movpl	B_1, B_0, lsl D_1
+	mov	B_0, B_0, lsl D_0
+	@ C = 1 << (clz B - clz A)
+	movmi	C_1, C_1, lsl D_0
+	orrmi	C_1, C_1, C_0, lsr ip
+	movpl	C_1, C_0, lsl D_1
+	mov	C_0, C_0, lsl D_0
+L_done_shift:
+	mov	D_0, #0
+	mov	D_1, #0
+	@ C: current bit; D: result
+L_subtract:
+	@ if A >= B
+	cmp	A_1, B_1
+	cmpeq	A_0, B_0
+	bcc	L_update
+	@ A -= B
+	subs	A_0, A_0, B_0
+	sbc	A_1, A_1, B_1
+	@ D |= C
+	orr	D_0, D_0, C_0
+	orr	D_1, D_1, C_1
+L_update:
+	@ if A == 0: break
+	orrs	ip, A_1, A_0
+	beq	L_exit
+	@ C >>= 1
+	movs	C_1, C_1, lsr #1
+	movs	C_0, C_0, rrx
+	@ if C == 0: break
+	orrs	ip, C_1, C_0
+	beq	L_exit
+	@ B >>= 1
+	movs	B_1, B_1, lsr #1
+	mov	B_0, B_0, rrx
+	b	L_subtract
+L_exit:
+	@ Note: A, B & Q, R are aliases
+	mov	R_0, A_0
+	mov	R_1, A_1
+	mov	Q_0, D_0
+	mov	Q_1, D_1
+	ldmfd	sp!, {r4, r5, r6, r7, pc}
+
+L_div_32_32:
+	@ Note: A_0 & r0 are aliases
+	@ Q_1 r1
+	mov	r1, B_0
+	bl	__aeabi_uidivmod
+	mov	R_0, r1
+	mov	R_1, #0
+	mov	Q_1, #0
+	ldmfd	sp!, {r4, r5, r6, r7, pc}
+
+L_pow2:
+	@ Note: A, B and Q, R are aliases
+	@ R = A & (B - 1)
+	and	C_0, A_0, C_0
+	and	C_1, A_1, C_1
+	@ Q = A >> log2(B)
+	@ Note: B must not be 0 here!
+	CLZ(D_0, B_0)
+	add	D_1, D_0, #1
+	rsbs	D_0, D_0, #31
+	movpl	A_0, A_0, lsr D_0
+	orrpl	A_0, A_0, A_1, lsl D_1
+	bpl	L_1
+	CLZ(D_0, B_1)
+	rsb	D_0, D_0, #31
+	mov	A_0, A_1, lsr D_0
+	add	D_0, D_0, #32
+L_1:
+	mov	A_1, A_1, lsr D_0
+	@ Mov back C to R
+	mov	R_0, C_0
+	mov	R_1, C_1
+	ldmfd	sp!, {r4, r5, r6, r7, pc}
+
+L_div_by_0:
+	bl	__div0
+	@ As wrong as it could be
+	mov	Q_0, #0
+	mov	Q_1, #0
+	mov	R_0, #0
+	mov	R_1, #0
+	ldmfd	sp!, {r4, r5, r6, r7, pc}
+
+#if __LINUX_ARM_ARCH__ <= 4
+/*
+ * count leading zero
+ *
+ * input : r0
+ * output : r0
+ * destroy : r1, r2, r3, r4, r5
+ */
+L_clz:
+	mov	r1, #0			// clz result
+	mov	r2, #0xf0000000		// mask
+	mov	r3, #28			// shift amount
+	adr	r4, L_clz_table
+L_clz_loop:
+	teq	r2, #0
+	beq	L_clz_loop_done
+	ands	r5, r0, r2
+	mov	r5, r5, lsr r3
+	ldrsb	r5, [r4, r5]
+	add	r1, r1, r5
+	mov	r2, r2, lsr #4
+	add	r3, r3, #-4
+	beq	L_clz_loop
+L_clz_loop_done:
+	mov	r0, r1
+	mov	pc, lr
+L_clz_table:
+	.byte	4
+	.byte	3
+	.byte	2
+	.byte	2
+	.byte	1
+	.byte	1
+	.byte	1
+	.byte	1
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+
+L_clz_D_0_A_1:
+	stmfd	sp!, {r0, r1, r2, r3, r4, r5, lr}
+	mov	r0, A_1
+	bl	L_clz
+	mov	D_0, r0
+	ldmfd	sp!, {r0, r1, r2, r3, r4, r5, pc}
+
+L_clz_ip_A_0:
+	stmfd	sp!, {r0, r1, r2, r3, r4, r5, lr}
+	mov	r0, A_0
+	bl	L_clz
+	mov	ip, r0
+	ldmfd	sp!, {r0, r1, r2, r3, r4, r5, pc}
+
+L_clz_D_1_B_1:
+	stmfd	sp!, {r0, r1, r2, r3, r4, r5, lr}
+	mov	r0, B_1
+	bl	L_clz
+	mov	D_1, r0
+	ldmfd	sp!, {r0, r1, r2, r3, r4, r5, pc}
+
+L_clz_ip_B_0:
+	stmfd	sp!, {r0, r1, r2, r3, r4, r5, lr}
+	mov	r0, B_0
+	bl	L_clz
+	mov	ip, r0
+	ldmfd	sp!, {r0, r1, r2, r3, r4, r5, pc}
+
+L_clz_D_0_B_0:
+	stmfd	sp!, {r0, r1, r2, r3, r4, r5, lr}
+	mov	r0, B_0
+	bl	L_clz
+	mov	D_0, r0
+	ldmfd	sp!, {r0, r1, r2, r3, r4, r5, pc}
+
+L_clz_D_0_B_1:
+	stmfd	sp!, {r0, r1, r2, r3, r4, r5, lr}
+	mov	r0, B_1
+	bl	L_clz
+	mov	D_0, r0
+	ldmfd	sp!, {r0, r1, r2, r3, r4, r5, pc}
+#endif /* __LINUX_ARM_ARCH__ */
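As a reading aid for the L_div_64_64 path of the patch, here is a rough C sketch of the same shift-and-subtract idea: align the divisor with the dividend using the difference of their leading-zero counts, then subtract while shifting right one bit at a time. The names `udivmod64_sketch`, `clz64` and `struct udiv64_result` are hypothetical. The sketch leaves out the power-of-two and 32/32 fast paths as well as the early exit when the remainder reaches zero, so it is an approximation of the assembly and not a translation of it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: quotient and remainder of a / b. */
struct udiv64_result {
	uint64_t quot;
	uint64_t rem;
};

/* Portable count-leading-zeros for a non-zero 64-bit value. */
static int clz64(uint64_t x)
{
	int n = 0;

	assert(x != 0);
	while (!(x >> 63)) {
		x <<= 1;
		n++;
	}
	return n;
}

static struct udiv64_result udivmod64_sketch(uint64_t a, uint64_t b)
{
	struct udiv64_result res = { 0, a };
	uint64_t bit = 1;	/* quotient bit being tried, "C" in the assembly */
	int shift;

	assert(b != 0);		/* the patch calls __div0 for this case */

	if (a < b)
		return res;	/* quotient 0, remainder a */

	/* a >= b > 0, so clz64(a) <= clz64(b) and shift is in 0..63. */
	shift = clz64(b) - clz64(a);
	b <<= shift;
	bit <<= shift;

	while (bit) {
		if (res.rem >= b) {
			res.rem -= b;
			res.quot |= bit;
		}
		b >>= 1;
		bit >>= 1;
	}
	return res;
}
```

For example, `udivmod64_sketch(100, 7)` aligns 7 to 112, then tries the quotient bits 16, 8, 4, 2 and 1 in turn, ending with quotient 14 and remainder 2.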
This patch adds a 64-64 bit divider that supports ARMv4 and above.

Because the clz (count leading zero) instruction was not added until ARMv5, the divider implements a clz function for ARMv4 targets.

The divider was tested with the following test driver code, run under qemu-arm:

	int main(void)
	{
		uint64_t a, b, q, r;
		while (scanf("%llx %llx %llx %llx", &a, &b, &q, &r) > 0)
			printf("%016llx %016llx %016llx %016llx\n", a, b, a / b, a % b);
		return 0;
	}

Signed-off-by: Che-Liang Chiou <clchiou@chromium.org>
Cc: Albert Aribaud <albert.u.boot@aribaud.net>
---
This patch is also tested with `MAKEALL -a arm`

 arch/arm/lib/Makefile    |    1 +
 arch/arm/lib/_uldivmod.S |  266 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 267 insertions(+), 0 deletions(-)
 create mode 100644 arch/arm/lib/_uldivmod.S
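The ARMv4 clz fallback described in the commit message walks the word four bits at a time and looks each nibble up in a 16-entry table. A minimal C sketch of that approach is shown below; `clz32_sketch` and `clz_nibble` are hypothetical names, and the loop is written for readability, not as a line-by-line rendering of L_clz.

```c
#include <assert.h>
#include <stdint.h>

/* Leading zeros within a 4-bit value, same contents as L_clz_table. */
static const uint8_t clz_nibble[16] = {
	4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0
};

/* Count leading zeros of a 32-bit word; returns 32 for an input of 0. */
static unsigned int clz32_sketch(uint32_t x)
{
	unsigned int count = 0;
	int shift;

	/* Scan from the most significant nibble downwards. */
	for (shift = 28; shift >= 0; shift -= 4) {
		unsigned int nibble = (x >> shift) & 0xf;

		count += clz_nibble[nibble];
		if (nibble != 0)
			break;	/* first non-zero nibble ends the scan */
	}

	assert(count <= 32);
	return count;
}
```

The table costs 16 bytes and the scan takes at most eight iterations, which is the trade-off for targets that lack a single-instruction clz.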