From: liuhongt
To: gcc-patches@gcc.gnu.org, richard.guenther@gmail.com
Date: Thu, 17 Jun 2021 14:29:12 +0800
Subject: [PATCH] Add vect_recog_popcount_pattern to handle mismatch between the vectorized popcount IFN and scalar popcount builtin.

The scalar __builtin_popcount{,l,ll} builtins always return int, whereas the vectorized .POPCOUNT IFN returns the same type as its operand. As a result, a popcount of a char, short or long long element reaches the vectorizer wrapped in type conversions. This patch removes those promotions and demotions when the backend supports the direct optab.
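A minimal C sketch of the mismatch described above (not part of the patch). It uses the scalar loop body of popcountb_128 from the test below. __builtin_popcount takes an unsigned int and returns an int, so each unsigned char element is widened before the call and the result is narrowed afterwards. The names popcount_via_builtin, popcount8 and check_byte_equivalence are hypothetical. popcount8 only stands in for a byte-wide .POPCOUNT whose input and output share the same 8-bit type. For every unsigned char value the two agree, and that is the property which lets the pattern drop the conversions.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar form of the loop body in popcountb_128: the element is
   zero-extended to unsigned int for the builtin (promotion), and the
   int result is converted back to unsigned char (demotion).  */
static unsigned char
popcount_via_builtin (unsigned char x)
{
  unsigned int temp_in = (unsigned int) x;
  int temp_out = __builtin_popcount (temp_in);
  return (unsigned char) temp_out;
}

/* Hypothetical byte-wide popcount standing in for a QImode .POPCOUNT:
   input and output are both 8 bits wide.  Uses Kernighan's trick of
   clearing the lowest set bit on each iteration.  */
static unsigned char
popcount8 (unsigned char x)
{
  unsigned char count = 0;
  for (; x != 0; x &= (unsigned char) (x - 1))
    count++;
  return count;
}

/* Check that dropping the promotion/demotion pair gives the same
   result for every unsigned char value.  */
static void
check_byte_equivalence (void)
{
  for (unsigned int v = 0; v <= UINT8_MAX; v++)
    assert (popcount_via_builtin ((unsigned char) v)
            == popcount8 ((unsigned char) v));
}
```

Calling check_byte_equivalence () from a test driver exercises all 256 inputs. The sketch relies on the GCC/Clang __builtin_popcount builtin.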
For i386, it enables vectorization with vpopcntb/vpopcntw and generates the expected vpopcntq code for DImode popcount.

gcc/ChangeLog:

	PR tree-optimization/97770
	* tree-vect-patterns.c (vect_recog_popcount_pattern): New.
	(vect_recog_func vect_vect_recog_func_ptrs): Add new pattern.

gcc/testsuite/ChangeLog:

	PR tree-optimization/97770
	* gcc.target/i386/avx512bitalg-pr97770-1.c: Remove xfail.
	* gcc.target/i386/avx512vpopcntdq-pr97770-1.c: Remove xfail.
---
 .../gcc.target/i386/avx512bitalg-pr97770-1.c  |  27 +++--
 .../i386/avx512vpopcntdq-pr97770-1.c          |   9 +-
 gcc/tree-vect-patterns.c                      | 110 ++++++++++++++++++
 3 files changed, 127 insertions(+), 19 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c b/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
index c83a477045c..d1beec4cdb4 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
@@ -1,19 +1,18 @@
 /* PR target/97770 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -mavx512bitalg -mavx512vl -mprefer-vector-width=512" } */
-/* Add xfail since no IFN for QI/HImode popcount */
-/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*xmm" 1 {xfail *-*-*} } } */
-/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*xmm" 1 {xfail *-*-*} } } */
-/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*ymm" 1 {xfail *-*-*} } } */
-/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*ymm" 1 {xfail *-*-*} } } */
-/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*zmm" 1 {xfail *-*-*} } } */
-/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*zmm" 1 {xfail *-*-*} } } */
+/* { dg-options "-O2 -march=icelake-server -mprefer-vector-width=512" } */
+/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*xmm" 1 } } */
+/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*xmm" 1 } } */
+/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*ymm" 1 } } */
+/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*ymm" 1 } } */
+/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*zmm" 1 } } */
+/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*zmm" 1 } } */
 
 #include
 
 void
 __attribute__ ((noipa, optimize("-O3")))
-popcountb_128 (char * __restrict dest, char* src)
+popcountb_128 (unsigned char * __restrict dest, unsigned char* src)
 {
   for (int i = 0; i != 16; i++)
     dest[i] = __builtin_popcount (src[i]);
@@ -21,7 +20,7 @@ popcountb_128 (char * __restrict dest, char* src)
 
 void
 __attribute__ ((noipa, optimize("-O3")))
-popcountw_128 (short* __restrict dest, short* src)
+popcountw_128 (unsigned short* __restrict dest, unsigned short* src)
 {
   for (int i = 0; i != 8; i++)
     dest[i] = __builtin_popcount (src[i]);
@@ -29,7 +28,7 @@ popcountw_128 (short* __restrict dest, short* src)
 
 void
 __attribute__ ((noipa, optimize("-O3")))
-popcountb_256 (char * __restrict dest, char* src)
+popcountb_256 (unsigned char * __restrict dest, unsigned char* src)
 {
   for (int i = 0; i != 32; i++)
     dest[i] = __builtin_popcount (src[i]);
@@ -37,7 +36,7 @@ popcountb_256 (char * __restrict dest, char* src)
 
 void
 __attribute__ ((noipa, optimize("-O3")))
-popcountw_256 (short* __restrict dest, short* src)
+popcountw_256 (unsigned short* __restrict dest, unsigned short* src)
 {
   for (int i = 0; i != 16; i++)
     dest[i] = __builtin_popcount (src[i]);
@@ -45,7 +44,7 @@ popcountw_256 (short* __restrict dest, short* src)
 
 void
 __attribute__ ((noipa, optimize("-O3")))
-popcountb_512 (char * __restrict dest, char* src)
+popcountb_512 (unsigned char * __restrict dest, unsigned char* src)
 {
   for (int i = 0; i != 64; i++)
     dest[i] = __builtin_popcount (src[i]);
@@ -53,7 +52,7 @@ popcountb_512 (char * __restrict dest, char* src)
 
 void
 __attribute__ ((noipa, optimize("-O3")))
-popcountw_512 (short* __restrict dest, short* src)
+popcountw_512 (unsigned short* __restrict dest, unsigned short* src)
 {
   for (int i = 0; i != 32; i++)
     dest[i] = __builtin_popcount (src[i]);
diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-pr97770-1.c b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-pr97770-1.c
index 63bb00d9b4a..dedd2e4c3d6 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-pr97770-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-pr97770-1.c
@@ -1,13 +1,12 @@
 /* PR target/97770 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -mavx512vpopcntdq -mavx512vl -mprefer-vector-width=512" } */
+/* { dg-options "-O2 -march=icelake-server -mprefer-vector-width=512" } */
 /* { dg-final { scan-assembler-times "vpopcntd\[ \\t\]+\[^\\n\\r\]*xmm" 1 } } */
 /* { dg-final { scan-assembler-times "vpopcntd\[ \\t\]+\[^\\n\\r\]*ymm" 1 } } */
 /* { dg-final { scan-assembler-times "vpopcntd\[ \\t\]+\[^\\n\\r\]*zmm" 1 } } */
-/* Add xfail since current vectorizor cannot generate expected code for DImode popcount */
-/* { dg-final { scan-assembler-times "vpopcntq\[ \\t\]+\[^\\n\\r\]*xmm" 1 { xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times "vpopcntq\[ \\t\]+\[^\\n\\r\]*ymm" 1 { xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times "vpopcntq\[ \\t\]+\[^\\n\\r\]*zmm" 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times "vpopcntq\[ \\t\]+\[^\\n\\r\]*xmm" 1 } } */
+/* { dg-final { scan-assembler-times "vpopcntq\[ \\t\]+\[^\\n\\r\]*ymm" 1 } } */
+/* { dg-final { scan-assembler-times "vpopcntq\[ \\t\]+\[^\\n\\r\]*zmm" 1 } } */
 
 #ifndef AVX512VPOPCNTQ_H_INCLUDED
 #define AVX512VPOPCNTQ_H_INCLUDED
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 177d44ebb5e..5c80800efbb 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -1292,6 +1292,115 @@ vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
                                       "vect_recog_widen_minus_pattern");
 }
 
+/* Function vect_recog_popcount_pattern
+
+   Try to find the following pattern:
+
+   UTYPE1 A;
+   TYPE1 B;
+   UTYPE2 temp_in;
+   TYPE3 temp_out;
+   temp_in = (TYPE2)A;
+
+   temp_out = __builtin_popcount{,l,ll} (temp_in);
+   B = (TYPE1) temp_out;
+
+   TYPE2 may or may not have the same precision as TYPE3:
+   e.g. they match for __builtin_popcount (unsigned int vs. int),
+   but differ for __builtin_popcountll (unsigned long long vs. int).
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+   Here it starts with B = (TYPE1) temp_out;
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern.  In this case it will be:
+   B = .POPCOUNT (A);
+*/
+
+static gimple *
+vect_recog_popcount_pattern (vec_info *vinfo,
+                             stmt_vec_info stmt_vinfo, tree *type_out)
+{
+  gassign *last_stmt = dyn_cast <gassign *> (stmt_vinfo->stmt);
+  gimple *popcount_stmt, *pattern_stmt;
+  tree rhs_oprnd, rhs_origin, lhs_oprnd, lhs_type, vec_type, new_var;
+  auto_vec<tree> vargs;
+
+  /* Find B = (TYPE1) temp_out.  */
+  if (!last_stmt)
+    return NULL;
+  tree_code code = gimple_assign_rhs_code (last_stmt);
+  if (!CONVERT_EXPR_CODE_P (code))
+    return NULL;
+
+  lhs_oprnd = gimple_assign_lhs (last_stmt);
+  lhs_type = TREE_TYPE (lhs_oprnd);
+  if (TREE_CODE (lhs_type) != INTEGER_TYPE)
+    return NULL;
+
+  rhs_oprnd = gimple_assign_rhs1 (last_stmt);
+  if (TREE_CODE (rhs_oprnd) != SSA_NAME
+      || !has_single_use (rhs_oprnd))
+    return NULL;
+  popcount_stmt = SSA_NAME_DEF_STMT (rhs_oprnd);
+
+  /* Find temp_out = __builtin_popcount{,l,ll} (temp_in);  */
+  if (!is_gimple_call (popcount_stmt)
+      || !gimple_call_lhs (popcount_stmt))
+    return NULL;
+  switch (gimple_call_combined_fn (popcount_stmt))
+    {
+    CASE_CFN_POPCOUNT:
+      break;
+    default:
+      return NULL;
+    }
+
+  rhs_oprnd = gimple_call_arg (popcount_stmt, 0);
+  vect_unpromoted_value unprom_diff;
+  rhs_origin = vect_look_through_possible_promotion (vinfo, rhs_oprnd,
+                                                     &unprom_diff);
+
+  if (!rhs_origin)
+    return NULL;
+
+  /* Input and output of .POPCOUNT should be same-precision integers.
+     Also A should be unsigned or have the same precision as temp_in,
+     otherwise there would be a sign extension from A to temp_in.  */
+  if (TYPE_PRECISION (unprom_diff.type) != TYPE_PRECISION (lhs_type)
+      || !(TYPE_UNSIGNED (unprom_diff.type)
+           || (TYPE_PRECISION (unprom_diff.type)
+               == TYPE_PRECISION (TREE_TYPE (rhs_oprnd)))))
+    return NULL;
+  vargs.safe_push (unprom_diff.op);
+
+  vect_pattern_detected ("vect_recog_popcount_pattern", popcount_stmt);
+  vec_type = get_vectype_for_scalar_type (vinfo, lhs_type);
+  /* Only do this if the backend provides popcount<mode>2 for VEC_TYPE.  */
+  if (!vec_type
+      || !direct_internal_fn_supported_p (IFN_POPCOUNT, vec_type,
+                                          OPTIMIZE_FOR_SPEED))
+    return NULL;
+
+  /* Create B = .POPCOUNT (A).  */
+  new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
+  pattern_stmt = gimple_build_call_internal_vec (IFN_POPCOUNT, vargs);
+  gimple_call_set_lhs (pattern_stmt, new_var);
+  gimple_set_location (pattern_stmt, gimple_location (last_stmt));
+  *type_out = vec_type;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+                     "created pattern stmt: %G", pattern_stmt);
+  return pattern_stmt;
+}
+
 /* Function vect_recog_pow_pattern
 
    Try to find the following pattern:
@@ -5283,6 +5392,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_sad_pattern, "sad" },
   { vect_recog_widen_sum_pattern, "widen_sum" },
   { vect_recog_pow_pattern, "pow" },
+  { vect_recog_popcount_pattern, "popcount" },
   { vect_recog_widen_shift_pattern, "widen_shift" },
   { vect_recog_rotate_pattern, "rotate" },
   { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
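The applicability check in vect_recog_popcount_pattern can be illustrated with another hedged C sketch. The helper names are hypothetical, and the sketch assumes a 32-bit unsigned int. With a signed char source, the conversion to unsigned int sign-extends negative values, so the builtin also counts the copied sign bits: -1 yields 32 instead of the 8 that an 8-bit popcount would return. This is why A must be unsigned or already have the precision of temp_in. The 64-bit case shows a harmless TYPE2/TYPE3 mismatch: __builtin_popcountll returns int, but widening the count back to a 64-bit element loses nothing. popcount_pattern_applies restates the two precision conditions as a plain predicate over bit widths. It is a model for illustration, not GCC code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar form with a signed char source: the conversion to
   unsigned int sign-extends negative values before the builtin runs,
   so the copied sign bits are counted too.  */
static int
popcount_signed_byte (signed char x)
{
  return __builtin_popcount ((unsigned int) x);
}

/* Scalar form of a 64-bit popcount: __builtin_popcountll returns int
   (TYPE3), and the result is widened back to the 64-bit element type.
   The count never exceeds 64, so the widening is lossless.  */
static uint64_t
popcount_u64 (uint64_t x)
{
  return (uint64_t) __builtin_popcountll (x);
}

/* Hypothetical model of the pattern's checks, with precisions in bits:
   A and B must have the same precision, and A must be unsigned unless
   it already has the precision of temp_in (no sign extension).  */
static bool
popcount_pattern_applies (unsigned prec_a, bool unsigned_a,
                          unsigned prec_temp_in, unsigned prec_b)
{
  return prec_a == prec_b && (unsigned_a || prec_a == prec_temp_in);
}
```

In terms of the test files, the unsigned char loops correspond to popcount_pattern_applies (8, true, 32, 8). A plain signed char source would correspond to popcount_pattern_applies (8, false, 32, 8), which the model rejects.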