From patchwork Wed Jan 24 21:56:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kugan Vivekanandarajah X-Patchwork-Id: 865492 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-471991-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="JUZulyTk"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zRfCv1H31z9sNr for ; Thu, 25 Jan 2018 08:57:29 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:from:date:message-id:subject:to:content-type; q= dns; s=default; b=XoKeJMZ3UsRSensqs+9HEYso7VDWQuO8U74p15tlRvJquW qMin2tK194bYM/xdNuKpSVnLmkB94JMaF71uV1RMDFwoUN8zosimEhP7naph5KiZ AiJ3Rj0kZVmK5wZ3GQIr2UEtRvPV1o+AyiFcJCserUf1zo3KvsxbfHkFe3M4w= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:from:date:message-id:subject:to:content-type; s= default; bh=oOzKiLc1rgStUv/4DUAcdF7UEJE=; b=JUZulyTkR26REzupc/96 xEir4CQBmlQWeoXTCqJsV8bu+khjiyVgVyZV++DC7EBeZRzbiromgxZ1USS0gd8b 6xKFU2bIrp86IvXT0s9dygD5PYIzvqJOPvgTJeAh594mxHjdu9N3LuiaosOD2jGl koIIQiDgdgadeHM2yurEt2M= Received: (qmail 59716 invoked by alias); 24 Jan 2018 21:57:22 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 59707 invoked by uid 89); 24 Jan 2018 21:57:22 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-25.7 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mail-qt0-f181.google.com Received: from mail-qt0-f181.google.com (HELO mail-qt0-f181.google.com) (209.85.216.181) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 24 Jan 2018 21:57:20 +0000 Received: by mail-qt0-f181.google.com with SMTP id l20so14323881qtj.11 for ; Wed, 24 Jan 2018 13:57:19 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=oUQxphvXwF3dMYQzYOkrUAq/Fad64QjfPX1HcbGoNtQ=; b=hGPzfwI28ObChKSN0aul/zSajo8JirmNHubIrfUtUy9uZw7CpNhlyTScFsRqVCEsVu iysTuqeSIWQKYPjZrrzjZpsuWKyR6ukxqVZ8r8sN3SzJkBSAAm9t7Rvtj+Qow5+MHPVF DaYCr85863jCNoTwbiyKFPfK4oV4ukmIIYdMuSyi17EZc1KSJmuFjVVja6LtX05lFs++ Cb2L6xKC5JsGSSKOvfcBv8AhKhkalUW1XZiMK51E4nlzSH649gtIa2LSyaQsWSMvW6LK RH5PbQgcGETd2zzrQ9Zz+Al7N31AYrdz3OFdcKQkeCJYTRaBEWE+Wp4q7LfLJq+woMhT jyrQ== X-Gm-Message-State: AKwxyteIMffxuD6egIbCUjOCkuIBLCzXJjwCLgUO3JIyms8dr/3uN9F6 Nwtm5BIhCz3ibu9bTMGh+ByCm7SxX4D0kXpfRqh8FSFUkf4= X-Google-Smtp-Source: AH8x224868c6Tjoka6V8MBhgjt1VGO7OOZPx/9PGACyUONvHRjx6l4BsOiNkw6SQ3cJBOVj6z98roo64it5Bqvj3Ozc= X-Received: by 10.55.96.4 with SMTP id u4mr11900361qkb.323.1516831038106; Wed, 24 Jan 2018 13:57:18 -0800 (PST) MIME-Version: 1.0 Received: by 10.200.36.185 with HTTP; Wed, 24 Jan 2018 13:56:37 -0800 (PST) From: Kugan Vivekanandarajah Date: Thu, 25 Jan 2018 08:56:37 +1100 Message-ID: Subject: [RFC][PR82479] missing popcount builtin detection To: gcc-patches@gcc.gnu.org X-IsSubscribed: yes Hi All, Here is a patch for popcount builtin detection similar to LLVM. I would like to queue this for review for next stage 1. 1. This is done part of loop-distribution and effective for -O3 and above. 2. This does not distribute loop to detect popcount (like memcpy/memmove). I dont think that happens in practice. Please correct me if I am wrong. Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions. Thanks, Kugan gcc/ChangeLog: 2018-01-25 Kugan Vivekanandarajah PR middle-end/82479 * tree-loop-distribution.c (handle_popcount): New. (pass_loop_distribution::execute): Use handle_popcount. gcc/testsuite/ChangeLog: 2018-01-25 Kugan Vivekanandarajah PR middle-end/82479 * gcc.dg/tree-ssa/popcount.c: New test. From 9fa09af4b7013c6207e59a4920c82f089bfe45c2 Mon Sep 17 00:00:00 2001 From: Kugan Vivekanandarajah Date: Wed, 24 Jan 2018 08:50:08 +1100 Subject: [PATCH] pocount builtin detection Change-Id: Ic6e175f9cc9a69bd417936a4845c2c046fd446b4 Change-Id: I680eb107445660c60a5d38f5d7300ab1a3243bf5 Change-Id: Ia9f0df89e05520091dc7797195098118768c7ac2 --- gcc/testsuite/gcc.dg/tree-ssa/popcount.c | 41 +++++++++ gcc/tree-loop-distribution.c | 145 +++++++++++++++++++++++++++++++ 2 files changed, 186 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c new file mode 100644 index 0000000..86a66cb --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +extern int foo (int); + +int PopCount (long b) { + int c = 0; + b++; + + while (b) { + b &= b - 1; + c++; + } + return c; +} +int PopCount2 (long b) { + int c = 0; + + while (b) { + b &= b - 1; + c++; + } + foo (c); + return foo (c); +} + +void PopCount3 (long b1) { + + for (long i = 0; i < b1; ++i) + { + long b = i; + int c = 0; + while (b) { + b &= b - 1; + c++; + } + foo (c); + } +} + +/* { dg-final { scan-tree-dump-times "__builtin_popcount" 3 "optimized" } } */ diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c index a3d76e4..1060700 100644 --- a/gcc/tree-loop-distribution.c +++ b/gcc/tree-loop-distribution.c @@ -1585,6 +1585,148 @@ classify_builtin_ldst (loop_p loop, struct graph *rdg, partition *partition, return; } +/* See if loop is a popcout implementation of the form + + int c = 0; + while (b) { + b = b & (b - 1); + c++; + } + + If so, convert this into c = __builtin_popcount (b) + return true if we did, false otherwise. */ + + +static bool +handle_popcount (loop_p loop) +{ + tree lhs, rhs; + tree dest, src; + gimple *and_minus_one; + int count = 0; + gphi *count_phi; + gimple *fn_call; + gimple *use_stmt; + use_operand_p use_p; + imm_use_iterator iter; + gimple_stmt_iterator gsi; + + /* Check loop terminating branch is like + if (b != 0). */ + gimple *stmt = last_stmt (loop->header); + if (!stmt + || gimple_code (stmt) != GIMPLE_COND + || !zerop (gimple_cond_rhs (stmt))) + return false; + + /* Cheeck "b = b & (b - 1)" is calculated. */ + lhs = gimple_cond_lhs (stmt); + gimple *and_stmt = SSA_NAME_DEF_STMT (lhs); + if (gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR) + return false; + lhs = gimple_assign_rhs1 (and_stmt); + rhs = gimple_assign_rhs2 (and_stmt); + if (TREE_CODE (lhs) == SSA_NAME + && (and_minus_one = SSA_NAME_DEF_STMT (lhs)) + && is_gimple_assign (and_minus_one) + && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR) + && integer_minus_onep (gimple_assign_rhs2 (and_minus_one))) + lhs = rhs; + else if (TREE_CODE (rhs) == SSA_NAME + && (and_minus_one = SSA_NAME_DEF_STMT (rhs)) + && is_gimple_assign (and_minus_one) + && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR) + && integer_minus_onep (gimple_assign_rhs2 (and_minus_one))) + ; + else + return false; + if ((gimple_assign_rhs1 (and_stmt) != gimple_assign_rhs1 (and_minus_one)) + && (gimple_assign_rhs2 (and_stmt) != gimple_assign_rhs1 (and_minus_one))) + return false; + + /* Check the recurrence. */ + gimple *phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one)); + gimple *src_phi = SSA_NAME_DEF_STMT (lhs); + if (gimple_code (phi) != GIMPLE_PHI + || gimple_code (src_phi) != GIMPLE_PHI) + return false; + + /* Check the loop closed SSA definition for just the variable c defined in + loop. */ + src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx); + basic_block bb = single_exit (loop)->dest; + for (gphi_iterator gpi = gsi_start_phis (bb); + !gsi_end_p (gpi); gsi_next (&gpi)) + { + count_phi = gpi.phi (); + count++; + } + if (count != 1) + return false; + + /* Make sure we have a count by one and it starts from zero. */ + rhs = gimple_phi_arg_def (count_phi, 0); + stmt = SSA_NAME_DEF_STMT (rhs); + if (!is_gimple_assign (stmt) + || (gimple_assign_rhs_code (stmt) != PLUS_EXPR) + || !integer_onep (gimple_assign_rhs2 (stmt))) + return false; + rhs = gimple_assign_rhs1 (stmt); + stmt = SSA_NAME_DEF_STMT (rhs); + if (gimple_code (stmt) != GIMPLE_PHI + || !integer_zerop (gimple_phi_arg_def (stmt, loop_preheader_edge (loop)->dest_idx))) + return false; + + /* Create a var for builtin_popcount result and update the uses. */ + dest = gimple_phi_result (count_phi); + tree var = make_ssa_name (TREE_TYPE (dest), NULL); + FOR_EACH_IMM_USE_STMT (use_stmt, iter, dest) + { + FOR_EACH_IMM_USE_ON_STMT (use_p, iter) + SET_USE (use_p, var); + } + dest = var; + + /* Remove the loop closed PHI stmt. */ + gsi = gsi_after_labels (gimple_bb (count_phi)); + gphi_iterator gsi_phi = gsi_for_phi (count_phi); + remove_phi_node (&gsi_phi, true); + + /* Insert the builtin. */ + gsi = gsi_after_labels (loop_preheader_edge (loop)->src); + src = force_gimple_operand_gsi (&gsi, src, true, NULL_TREE, + false, GSI_CONTINUE_LINKING); + tree fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_POPCOUNT)); + fn_call = gimple_build_call (fn, 1, src); + gimple_call_set_lhs (fn_call, dest); + gsi_insert_before (&gsi, fn_call, GSI_CONTINUE_LINKING); + + /* In most cases, we will have if (b != 0) preceeding the loop in + the form + if (b != 0) + { + loop; + } + + Since __builtin_popcount (0) is defined, we can remove this condition + by making it always true. */ + + stmt = last_stmt (EDGE_PRED (loop_preheader_edge (loop)->src, 0)->src); + if (stmt + && gimple_code (stmt) == GIMPLE_COND + && (gimple_cond_lhs (stmt) == src) + && zerop (gimple_cond_rhs (stmt))) + { + gimple_cond_set_lhs (as_a (stmt), + build_int_cst (TREE_TYPE (src), 1)); + update_stmt (stmt); + } + + /* Finaly remove the loop. */ + destroy_loop (loop); + return true; +} + /* Classifies the builtin kind we can generate for PARTITION of RDG and LOOP. For the moment we detect memset, memcpy and memmove patterns. Bitmap STMT_IN_ALL_PARTITIONS contains statements belonging to all partitions. */ @@ -3032,6 +3174,9 @@ pass_loop_distribution::execute (function *fun) || !optimize_loop_for_speed_p (loop)) continue; + if (handle_popcount (loop)) + continue; + /* Don't distribute loop if niters is unknown. */ tree niters = number_of_latch_executions (loop); if (niters == NULL_TREE || niters == chrec_dont_know) -- 2.7.4