From patchwork Fri Apr 21 11:18:46 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christophe Leroy X-Patchwork-Id: 753294 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3w8YG73BGcz9s2P for ; Fri, 21 Apr 2017 21:21:19 +1000 (AEST) Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 3w8YG72R7mzDqM1 for ; Fri, 21 Apr 2017 21:21:19 +1000 (AEST) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Received: from pegase1.c-s.fr (pegase1.c-s.fr [93.17.236.30]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 3w8YCH5kMhzDq66 for ; Fri, 21 Apr 2017 21:18:51 +1000 (AEST) Received: from localhost (mailhub1-int [192.168.12.234]) by localhost (Postfix) with ESMTP id 3w8YBt6kSxz9ttBw; Fri, 21 Apr 2017 13:18:30 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at c-s.fr Received: from pegase1.c-s.fr ([192.168.12.234]) by localhost (pegase1.c-s.fr [192.168.12.234]) (amavisd-new, port 10024) with ESMTP id lvAgT-3wcSQ8; Fri, 21 Apr 2017 13:18:30 +0200 (CEST) Received: from messagerie.si.c-s.fr (messagerie.si.c-s.fr [192.168.25.192]) by pegase1.c-s.fr (Postfix) with ESMTP id 3w8YBt6C03z9ttBP; Fri, 21 Apr 2017 13:18:30 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by messagerie.si.c-s.fr (Postfix) with ESMTP id 709CF8B830; Fri, 21 Apr 2017 13:18:46 +0200 (CEST) X-Virus-Scanned: amavisd-new at c-s.fr Received: from messagerie.si.c-s.fr ([127.0.0.1]) by localhost (messagerie.si.c-s.fr [127.0.0.1]) (amavisd-new, port 10023) with ESMTP id LRXsCPWDiJmm; Fri, 21 Apr 2017 13:18:46 +0200 (CEST) Received: from PO15451.localdomain (po15451.idsi0.si.c-s.fr [172.25.231.16]) by messagerie.si.c-s.fr (Postfix) with ESMTP id 43D128B74B; Fri, 21 Apr 2017 13:18:46 +0200 (CEST) Received: by localhost.localdomain (Postfix, from userid 0) id 3C6176EA23; Fri, 21 Apr 2017 13:18:46 +0200 (CEST) Message-Id: In-Reply-To: References: From: Christophe Leroy Subject: [PATCH 1/4] powerpc: Discard ffs()/__ffs() function and use builtin functions instead To: Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman , Scott Wood Date: Fri, 21 Apr 2017 13:18:46 +0200 (CEST) X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" With the ffs() function as defined in arch/powerpc/include/asm/bitops.h GCC will not optimise the code in case of constant parameter, as shown by the small exemple below. int ffs_test(void) { return 4 << ffs(31); } c0012334 : c0012334: 39 20 00 01 li r9,1 c0012338: 38 60 00 04 li r3,4 c001233c: 7d 29 00 34 cntlzw r9,r9 c0012340: 21 29 00 20 subfic r9,r9,32 c0012344: 7c 63 48 30 slw r3,r3,r9 c0012348: 4e 80 00 20 blr With this patch, the same function will compile as follows: c0012334 : c0012334: 38 60 00 08 li r3,8 c0012338: 4e 80 00 20 blr The same happens with __ffs() For non constant calls, the generated code is doing the same, allthought it is slightly different on 64 bits for ffs(): unsigned long test__ffs(unsigned long x) { return __ffs(x); } int testffs(int x) { return ffs(x); } On PPC32, before the patch: 0000003c : 3c: 7d 23 00 d0 neg r9,r3 40: 7d 23 18 38 and r3,r9,r3 44: 7c 63 00 34 cntlzw r3,r3 48: 20 63 00 1f subfic r3,r3,31 4c: 4e 80 00 20 blr 00000050 : 50: 7d 23 00 d0 neg r9,r3 54: 7d 23 18 38 and r3,r9,r3 58: 7c 63 00 34 cntlzw r3,r3 5c: 20 63 00 20 subfic r3,r3,32 60: 4e 80 00 20 blr On PPC32, after the patch: 0000002c : 2c: 7d 23 00 d0 neg r9,r3 30: 7d 23 18 38 and r3,r9,r3 34: 7c 63 00 34 cntlzw r3,r3 38: 20 63 00 1f subfic r3,r3,31 3c: 4e 80 00 20 blr 00000040 : 40: 7d 23 00 d0 neg r9,r3 44: 7d 23 18 38 and r3,r9,r3 48: 7c 63 00 34 cntlzw r3,r3 4c: 20 63 00 20 subfic r3,r3,32 50: 4e 80 00 20 blr On PPC64, before the patch: 0000000000000060 <.test__ffs>: 60: 7c 03 00 d0 neg r0,r3 64: 7c 03 18 38 and r3,r0,r3 68: 7c 63 00 74 cntlzd r3,r3 6c: 20 63 00 3f subfic r3,r3,63 70: 7c 63 07 b4 extsw r3,r3 74: 4e 80 00 20 blr 0000000000000080 <.testffs>: 80: 7c 03 00 d0 neg r0,r3 84: 7c 03 18 38 and r3,r0,r3 88: 7c 63 00 74 cntlzd r3,r3 8c: 20 63 00 40 subfic r3,r3,64 90: 7c 63 07 b4 extsw r3,r3 94: 4e 80 00 20 blr On PPC64, after the patch: 0000000000000050 <.test__ffs>: 50: 7c 03 00 d0 neg r0,r3 54: 7c 03 18 38 and r3,r0,r3 58: 7c 63 00 74 cntlzd r3,r3 5c: 20 63 00 3f subfic r3,r3,63 60: 4e 80 00 20 blr 0000000000000070 <.testffs>: 70: 7c 03 00 d0 neg r0,r3 74: 7c 03 18 38 and r3,r0,r3 78: 7c 63 00 34 cntlzw r3,r3 7c: 20 63 00 20 subfic r3,r3,32 80: 7c 63 07 b4 extsw r3,r3 84: 4e 80 00 20 blr (ffs() operates on an int so cntlzw is equivalent to cntlzd) In addition, when reading the generated vmlinux, we can observe that with the builtin functions, GCC sometimes efficiently spreads the instructions within the generated functions while the inline assembly force them to remain grouped together. __builtin_ffs() is already used in arch/powerpc/include/asm/page_32.h Those builtins have been in GCC since at least 3.4.6 (see https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Other-Builtins.html ) Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/bitops.h | 16 ++-------------- 1 file changed, 2 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/include/asm/bitops.h b/arch/powerpc/include/asm/bitops.h index 33a24fdd7958..71b05685f3a7 100644 --- a/arch/powerpc/include/asm/bitops.h +++ b/arch/powerpc/include/asm/bitops.h @@ -253,21 +253,9 @@ static __inline__ unsigned long ffz(unsigned long x) return __ilog2(x & -x); } -static __inline__ unsigned long __ffs(unsigned long x) -{ - return __ilog2(x & -x); -} +#include -/* - * ffs: find first bit set. This is defined the same way as - * the libc and compiler builtin ffs routines, therefore - * differs in spirit from the above ffz (man ffs). - */ -static __inline__ int ffs(int x) -{ - unsigned long i = (unsigned long)x; - return __ilog2(i & -i) + 1; -} +#include /* * fls: find last (most-significant) bit set.