Message ID | 20160428164856.10120.qmail@ns.horizon.com |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
> __ffs on the available architectures:
> Alpha: sometimes (CONFIG_ALPHA_EV6, CONFIG_ALPHA_EV67)
> ARC: sometimes (!CONFIG_ISA_ARCOMPACT)
> ARM: sometimes (V5+)
> ARM64: NO, could be written using RBIT and CLZ
> AVR: yes
> Blackfin: NO, could be written using hweight()
> C6x: yes
> CRIS: NO
> FR-V: yes
> H8300: NO
> Hexagon: yes
> IA64: yes
> M32R: NO
> M68k: sometimes
> MetaG: NO
> Microblaze: NO
> MIPS: sometimes
> MN10300: yes
> OpenRISC: NO
> PA-RISC: NO? Interesting code, but I think it's a net loss.
> PowerPC: yes
> S390: sometimes (CONFIG_HAVE_MARCH_Z9_109_FEATURES)
> Score: NO
> SH: NO
> SPARC: NO

SPARC: sparc64: YES, sparc32: NO

Patch needs to be updated to reflect this.

	Sam
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[Obviously non-SPARC recipients dropped, but I probably missed some.]

>> __ffs on the available architectures:
> SPARC: sparc64: YES, sparc32: NO
>
> Patch needs to be updated to reflect this.

I didn't see the sparc64 implementation, but on looking again, I have to
say that no, it doesn't have a fast one.

arch/sparc/lib/ffs.S is a custom implementation of __ffs, but it's a
function call and 33 instructions/18 cycles long. There are several
similar custom implementations that I also considered "NO".

"Fast" in this context means a handful of inline instructions, usually
one of:
1. A direct count_trailing_zeros instruction,
2. (BITS_PER_LONG - 1) - count_leading_zeros(x ^ (x-1)),
3. count_leading_zeros(bit_reverse(x)), or
4. popcount(~x & (x-1)).

The question is whether __ffs plus a variable shift is faster than three
instructions plus an unpredictable branch.
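The four "fast" formulations can be checked against each other in plain C. This is a userspace sketch, not kernel code: GCC/Clang's __builtin_ctzl, __builtin_clzl and __builtin_popcountl stand in for the hardware instructions, bit_reverse() is a hypothetical portable stand-in for an instruction such as ARM64's RBIT, and all function names are made up for illustration. In form 2 the count is taken from the top of the word, so it is subtracted from BITS_PER_LONG - 1 to yield a bit index. Like the kernel's __ffs(), none of them is defined for x == 0.

```c
#include <assert.h>
#include <limits.h>

/* Number of bits in an unsigned long (64 on LP64 targets). */
#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Portable bit reversal; a hypothetical stand-in for an RBIT-style instruction. */
static inline unsigned long bit_reverse(unsigned long x)
{
	unsigned long r = 0;

	for (int i = 0; i < BITS_PER_LONG; i++) {
		r = (r << 1) | (x & 1);
		x >>= 1;
	}
	return r;
}

/* Each form returns the 0-based index of the lowest set bit of a nonzero x. */

/* 1. A direct count-trailing-zeros instruction. */
static inline unsigned long ffs_ctz(unsigned long x)
{
	return __builtin_ctzl(x);
}

/* 2. x ^ (x-1) sets bits 0..k (k = lowest set bit), so clz() == BITS-1-k. */
static inline unsigned long ffs_clz_xor(unsigned long x)
{
	return BITS_PER_LONG - 1 - __builtin_clzl(x ^ (x - 1));
}

/* 3. Reversing the word turns trailing zeros into leading zeros. */
static inline unsigned long ffs_clz_rev(unsigned long x)
{
	return __builtin_clzl(bit_reverse(x));
}

/* 4. ~x & (x-1) is a mask of exactly the trailing zero bits. */
static inline unsigned long ffs_popcount(unsigned long x)
{
	return __builtin_popcountl(~x & (x - 1));
}

/* Assert that forms 2-4 agree with form 1 for a nonzero x. */
static inline void check_ffs_forms(unsigned long x)
{
	assert(x != 0);	/* __ffs(0) is undefined */
	unsigned long want = ffs_ctz(x);

	assert(ffs_clz_xor(x) == want);
	assert(ffs_clz_rev(x) == want);
	assert(ffs_popcount(x) == want);
}
```

For example, ffs_popcount(0x50UL) is 4: 0x50 is binary 1010000, whose lowest set bit is bit 4.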
From: "George Spelvin" <linux@horizon.com>
Date: 28 Apr 2016 17:21:26 -0400

> arch/sparc/lib/ffs.S is a custom implementation of __ffs, but it's a
> function call and 33 instructions/18 cycles long. There are several
> similar custom implementations that I also considered "NO".

Read it again: it is patched if the cpu supports the necessary instructions,
in which case it's a very short sequence.

The actual code used when the cpu supports said instruction is simply:

	.section	.popc_6insn_patch, "ax"
	.word		ffs
	brz,pn		%o0, 98f
	 neg		%o0, %g1
	xnor		%o0, %g1, %o1
	popc		%o1, %o0
98:	retl
	 nop
	.word		__ffs
	neg		%o0, %g1
	xnor		%o0, %g1, %o1
	popc		%o1, %o0
	retl
	 sub		%o0, 1, %o0
	nop
	.previous
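What the patched-in popc sequence computes can be modelled in C. This is a hedged sketch: popc_ffs() and popc___ffs() are hypothetical names, __builtin_popcountl stands in for the popc instruction, and the model works on unsigned long rather than reproducing the exact C prototypes of the sparc64 ffs()/__ffs(). The key identity is that neg followed by xnor yields ~(x ^ -x), a mask of bits 0..k where k is the lowest set bit, so its population count is k + 1.

```c
#include <assert.h>

/*
 * Model of the shared core of the patched sequence:
 *   neg  %o0, %g1        ->  g1 = -x
 *   xnor %o0, %g1, %o1   ->  o1 = ~(x ^ g1)
 *   popc %o1, %o0        ->  number of set bits in o1
 */
static inline unsigned long popc_core(unsigned long x)
{
	unsigned long g1 = -x;		/* two's complement negation */
	unsigned long o1 = ~(x ^ g1);	/* bits 0..k set, k = lowest set bit */

	return __builtin_popcountl(o1);	/* k + 1 (all bits set if x == 0) */
}

/* ffs(): 1-based index of the lowest set bit; the brz,pn branch returns 0 for 0. */
static inline unsigned long popc_ffs(unsigned long x)
{
	if (x == 0)
		return 0;
	return popc_core(x);
}

/* __ffs(): 0-based index, from the final "sub %o0, 1, %o0"; undefined for 0. */
static inline unsigned long popc___ffs(unsigned long x)
{
	assert(x != 0);
	return popc_core(x) - 1;
}
```

The explicit zero check in ffs() is needed because ~(0 ^ -0) is all ones, which would otherwise return BITS_PER_LONG instead of 0.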
Dave Miller spake truth from on high:
> Read it again, it is patched if the cpu supports the necessary instructions
> in which case it's a very short sequence.
>
> The actual code used when the cpu supports said instruction is simply:
>
> 	neg		%o0, %g1
> 	xnor		%o0, %g1, %o1
> 	popc		%o1, %o0
> 	retl
> 	 sub		%o0, 1, %o0

D'oh! Missed that run-time patching. (Done by popc_patch() in
arch/sparc/kernel/setup_64.c, which is executed based on boot-time
capability detection.)

I'm surprised you don't use the shorter:

	sub	%o0, 1, %g1
	andn	%g1, %o0, %g1
	retl
	 popc	%g1, %o0

This kind of blows a hole in having a compile-time flag. Which code path
to use? It's absolutely not worth choosing gcd() implementations at
boot time.

The race is between that sequence (plus the call and the right shift)
and the loop

1:	andcc	<src>, 1, %g0
	be,a	1b
	 srl	<src>, 1, <src>

... where the branch is taken 50% of the time, so on average it will be
taken once before being not taken.
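The two contenders in that race can be sketched in C, again as a userspace model with hypothetical names. shorter_popc___ffs() models the proposed sub/andn/popc sequence, computing popc((x - 1) & ~x), which masks exactly the trailing zeros and so needs no final subtraction. The two strip functions contrast one __ffs plus a variable shift against a loop that shifts right one bit at a time while the low bit is clear. Both require a nonzero argument, since the loop would never terminate on 0.

```c
#include <assert.h>

/* sub %o0, 1, %g1; andn %g1, %o0, %g1; popc %g1, %o0 */
static inline unsigned long shorter_popc___ffs(unsigned long x)
{
	assert(x != 0);				/* __ffs(0) is undefined */
	return __builtin_popcountl((x - 1) & ~x); /* mask of trailing zeros */
}

/* Strip all trailing zeros with one __ffs and one variable shift. */
static inline unsigned long strip_zeros_ffs(unsigned long x)
{
	return x >> shorter_popc___ffs(x);
}

/* Strip trailing zeros one bit at a time; the branch is data-dependent. */
static inline unsigned long strip_zeros_loop(unsigned long x)
{
	assert(x != 0);		/* the loop would spin forever on 0 */
	while (!(x & 1))
		x >>= 1;
	return x;
}
```

Both return the same odd value (for instance 5 for 0x50). The trade-off in the thread is purely about cost: a fixed short sequence plus a call, versus a loop whose branch is unpredictable but usually exits after about one iteration.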
diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index 4bdfbd44..c9c307a8 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -333,6 +333,7 @@ static inline unsigned long ffz(unsigned long word)
 static inline unsigned long __ffs(unsigned long word)
 {
 #if defined(CONFIG_ALPHA_EV6) && defined(CONFIG_ALPHA_EV67)
+#define ARCH_HAS_FAST_FFS 1
 	/* Whee. EV67 can calculate it directly.  */
 	return __kernel_cttz(word);
 #else
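The alpha hunk only defines ARCH_HAS_FAST_FFS. The consumer would be generic code such as gcd(). The following is a hypothetical userspace illustration of how such code might choose, at compile time, between one __ffs() plus a shift and the bit-at-a-time loop. It is not the kernel's lib/gcd.c: gcd_sketch(), strip_trailing_zeros() and sketch_ffs() are made-up names, and __builtin_ctzl stands in for __ffs(). A compile-time switch like this is exactly what the sparc64 discussion calls into question, because there the fast path is only known at boot.

```c
#include <assert.h>

/* Hypothetical stand-ins: in the kernel these would come from <asm/bitops.h>. */
#define ARCH_HAS_FAST_FFS 1
#define sketch_ffs(x) ((unsigned long)__builtin_ctzl(x))

/* Remove all trailing zero bits from a nonzero word. */
static inline unsigned long strip_trailing_zeros(unsigned long x)
{
	assert(x != 0);
#ifdef ARCH_HAS_FAST_FFS
	return x >> sketch_ffs(x);	/* one __ffs plus one variable shift */
#else
	while (!(x & 1))		/* branchy loop, about one iteration on average */
		x >>= 1;
	return x;
#endif
}

/* Binary (Stein's) GCD built on strip_trailing_zeros(). */
static inline unsigned long gcd_sketch(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;

	if (!a || !b)
		return r;		/* gcd(x, 0) == x */
	r &= -r;			/* common power of two: lowest set bit of a|b */

	a = strip_trailing_zeros(a);
	b = strip_trailing_zeros(b);
	while (a != b) {		/* both odd from here on */
		if (a < b) {
			unsigned long t = a;
			a = b;
			b = t;
		}
		a = strip_trailing_zeros(a - b);	/* odd - odd: even, nonzero */
	}
	return a * r;
}
```

For example, gcd_sketch(48, 180) first extracts the common factor 4, reduces the odd parts 3 and 45 to 3, and returns 12.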