From patchwork Fri Feb 9 12:42:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roger Sayle X-Patchwork-Id: 871381 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-472940-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="hxYEtQI5"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zdF820gSFz9s72 for ; Fri, 9 Feb 2018 23:42:25 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; q=dns; s= default; b=dW1qXhWgsWnqy82wpyQ1yswl/EAmlsvOQcnNUBeD/6U+zBH7yAwLo b0c7mho7A1lAZKYIFYC0SKxALSGUcfkgxl2PaAbY+Z3xPbuV0yDEIlFzYF/mwlqS 9vMx3dIPODwnX/zvao4jbLixD264vVwkDKPfgOUoSVLc6p9PFcnx/M= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; s= default; bh=d/U6dezmWDJh/1iUCytAOQUPEYo=; b=hxYEtQI5V5z5bMIOOTfM z69OBFROoGzip22UKBq9D9AiFS0gbJsBVhzi3kOsUO2HHQ+4L81L7ELBLd97qTBE o/XVYIUPXNMzUKiIiLijnxeAqHM8zqIjcmxnrpI25pIfSIEWpll5kFgLFaKseJsa Glca0yYrWTNDCVwZ6wNW/b4= Received: (qmail 85297 invoked by alias); 9 Feb 2018 12:42:18 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 85284 invoked by uid 89); 9 Feb 2018 12:42:17 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-9.8 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, SPF_PASS, T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=Roger, roger, Cambridge, cambridge X-HELO: server.nextmovesoftware.com Received: from server.nextmovesoftware.com (HELO server.nextmovesoftware.com) (162.254.253.69) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 09 Feb 2018 12:42:15 +0000 Received: from 212.44.19.45.ip.redstone-isp.net ([212.44.19.45]:52426 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.89) (envelope-from ) id 1ek80G-0002uA-Fw for gcc-patches@gcc.gnu.org; Fri, 09 Feb 2018 07:42:08 -0500 From: "Roger Sayle" To: Subject: [PATCH] POPCOUNT folding optimizations Date: Fri, 9 Feb 2018 12:42:08 -0000 Message-ID: <003f01d3a1a3$66b893b0$3429bb10$@nextmovesoftware.com> MIME-Version: 1.0 X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com The following patch implements a number of __builtin_popcount related optimizations. (i) popcount(x) == 0 can be simplified to x==0, and popcount(x) != 0 to x!=0. (ii) popcount(x&1) can be simplified to x&1, and for unsigned x, popcount(x>>31) to x>>31. (iii) popcount (x&6) + popcount(y&16) can be simplified to popcount((x&6)|(y&16)) These may seem obscure transformations, but performing these types of POPCOUNT operations are often the performance critical steps in some cheminformatics applications. To implement the above transformations I've introduced the tree_nonzero_bits function, which is a tree-level version of rtlanal's nonzero_bits used by the RTL optimizers. The following patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap" and "make check" with no regressions, and passes for the four new gcc.dg test cases. Many thanks In advance. Best regards, Roger --- Roger Sayle, PhD. NextMove Software Limited Innovation Centre (Unit 23), Cambridge Science Park, Cambridge, CB4 0EY 2018-02-09 Roger Sayle * fold-const.c (tree_nonzero_bits): New function. * fold-const.h (tree_nonzero_bits): Likewise. * match.pd (POPCOUNT): New patterns to fold BUILTIN_POPCOUNT and friends. POPCOUNT(x&1) => x&1, POPCOUNT(x)==0 => x==0, etc. 2018-02-09 Roger Sayle * gcc.dg/fold-popcount-1.c: New testcase. * gcc.dg/fold-popcount-2.c: New testcase. * gcc.dg/fold-popcount-3.c: New testcase. * gcc.dg/fold-popcount-4.c: New testcase. /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-original" } */ int test_eqzero(unsigned int a) { return __builtin_popcount(a) == 0; } int test_eqzerol(unsigned long b) { return __builtin_popcountl(b) == 0; } int test_eqzeroll(unsigned long long c) { return __builtin_popcountll(c) == 0; } int test_nezero(unsigned int d) { return __builtin_popcount(d) != 0; } int test_nezerol(unsigned long e) { return __builtin_popcountl(e) != 0; } int test_nezeroll(unsigned long long f) { return __builtin_popcountll(f) != 0; } /* { dg-final { scan-tree-dump-times "popcount" 0 "original" } } */ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-cddce1" } */ int test_andone(unsigned int a) { return __builtin_popcount(a&1); } int test_andonel(unsigned long b) { return __builtin_popcountl(b&1); } int test_andonell(unsigned long long c) { return __builtin_popcountll(c&1); } int test_oneand(unsigned int d) { return __builtin_popcount(1&d); } int test_oneandl(unsigned long e) { return __builtin_popcountl(1&e); } int test_oneandll(unsigned long long f) { return __builtin_popcountll(1&f); } /* { dg-final { scan-tree-dump-times "popcount" 0 "cddce1" } } */ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-cddce1" } */ int test_combine(unsigned int a, unsigned int b) { return __builtin_popcount(a&8) + __builtin_popcount(b&2); } /* { dg-final { scan-tree-dump-times "popcount" 1 "cddce1" } } */ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-cddce1" } */ int test_shiftmax(unsigned int a) { return __builtin_popcount(a>>(8*sizeof(a)-1)); } int test_shiftmaxl(unsigned long b) { return __builtin_popcountl(b>>(8*sizeof(b)-1)); } int test_shiftmaxll(unsigned long long c) { return __builtin_popcountll(c>>(8*sizeof(c)-1)); } int test_shift7(unsigned char d) { return __builtin_popcount(d>>7); } int test_shift7l(unsigned char e) { return __builtin_popcountl(e>>7); } int test_shift7ll(unsigned char f) { return __builtin_popcountll(f>>7); } int test_shift15(unsigned short g) { return __builtin_popcount(g>>15); } int test_shift15l(unsigned short h) { return __builtin_popcountl(h>>15); } int test_shift15ll(unsigned short i) { return __builtin_popcountll(i>>15); } /* { dg-final { scan-tree-dump-times "popcount" 0 "cddce1" } } */ Index: gcc/fold-const.c =================================================================== --- gcc/fold-const.c (revision 257227) +++ gcc/fold-const.c (working copy) @@ -14580,6 +14580,75 @@ return string + offset; } +/* Given a tree T, compute which bits in T may be nonzero. */ + +wide_int +tree_nonzero_bits (const_tree t) +{ + switch (TREE_CODE (t)) + { + case INTEGER_CST: + return wi::to_wide (t); + case SSA_NAME: + return get_nonzero_bits (t); + case NON_LVALUE_EXPR: + case SAVE_EXPR: + return tree_nonzero_bits (TREE_OPERAND (t, 0)); + case BIT_AND_EXPR: + return wi::bit_and (tree_nonzero_bits (TREE_OPERAND (t, 0)), + tree_nonzero_bits (TREE_OPERAND (t, 1))); + case BIT_IOR_EXPR: + case BIT_XOR_EXPR: + return wi::bit_or (tree_nonzero_bits (TREE_OPERAND (t, 0)), + tree_nonzero_bits (TREE_OPERAND (t, 1))); + case COND_EXPR: + return wi::bit_or (tree_nonzero_bits (TREE_OPERAND (t, 1)), + tree_nonzero_bits (TREE_OPERAND (t, 2))); + case NOP_EXPR: + case CONVERT_EXPR: + return wide_int::from (tree_nonzero_bits (TREE_OPERAND (t, 0)), + TYPE_PRECISION (TREE_TYPE (t)), + TYPE_SIGN (TREE_TYPE (TREE_OPERAND (t, 0)))); + case PLUS_EXPR: + if (INTEGRAL_TYPE_P (TREE_TYPE (t))) + { + wide_int nzbits1 = tree_nonzero_bits (TREE_OPERAND (t, 0)); + wide_int nzbits2 = tree_nonzero_bits (TREE_OPERAND (t, 1)); + if (wi::bit_and (nzbits1, nzbits2) == 0) + return wi::bit_or (nzbits1, nzbits2); + } + break; + case LSHIFT_EXPR: + if (TREE_CODE (TREE_OPERAND (t, 1)) == INTEGER_CST) + { + tree type = TREE_TYPE (t); + wide_int nzbits = tree_nonzero_bits (TREE_OPERAND (t, 0)); + wide_int arg1 = wi::to_wide (TREE_OPERAND (t, 1), + TYPE_PRECISION (type)); + return wi::neg_p (arg1) + ? wi::rshift (nzbits, -arg1, TYPE_SIGN (type)) + : wi::lshift (nzbits, arg1); + } + break; + case RSHIFT_EXPR: + if (TREE_CODE (TREE_OPERAND (t, 1)) == INTEGER_CST) + { + tree type = TREE_TYPE (t); + wide_int nzbits = tree_nonzero_bits (TREE_OPERAND (t, 0)); + wide_int arg1 = wi::to_wide (TREE_OPERAND (t, 1), + TYPE_PRECISION (type)); + return wi::neg_p (arg1) + ? wi::lshift (nzbits, -arg1) + : wi::rshift (nzbits, arg1, TYPE_SIGN (type)); + } + break; + default: + break; + } + + return wi::shwi (-1, TYPE_PRECISION (TREE_TYPE (t))); +} + #if CHECKING_P namespace selftest { Index: gcc/fold-const.h =================================================================== --- gcc/fold-const.h (revision 257227) +++ gcc/fold-const.h (working copy) @@ -181,6 +181,7 @@ extern tree const_binop (enum tree_code, tree, tree, tree); extern bool negate_mathfn_p (combined_fn); extern const char *c_getstr (tree, unsigned HOST_WIDE_INT *strlen = NULL); +extern wide_int tree_nonzero_bits (const_tree); /* Return OFF converted to a pointer offset type suitable as offset for POINTER_PLUS_EXPR. Use location LOC for this conversion. */ Index: gcc/match.pd =================================================================== --- gcc/match.pd (revision 257227) +++ gcc/match.pd (working copy) @@ -4648,3 +4648,24 @@ || wi::geu_p (wi::to_wide (@rpos), wi::to_wide (@ipos) + isize)) (BIT_FIELD_REF @0 @rsize @rpos))))) + +/* POPCOUNT simplifications. */ +(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL + BUILT_IN_POPCOUNTIMAX) + /* popcount(X&1) is nop_expr(X&1). */ + (simplify + (popcount @0) + (if (tree_nonzero_bits (@0) == 1) + (convert @0))) + /* popcount(X) + popcount(Y) is popcount(X|Y) when X&Y must be zero. */ + (simplify + (plus (popcount @0) (popcount @1)) + (if (wi::bit_and (tree_nonzero_bits (@0), tree_nonzero_bits (@1)) == 0) + (popcount (bit_ior @0 @1)))) + /* pocount(X) == 0 is X == 0, and related (in)equalities. */ + (for cmp (le eq ne gt) + rep (eq eq ne ne) + (simplify + (cmp (popcount @0) zerop) + (rep @0 { build_zero_cst (TREE_TYPE (@0)); })))) +