Date: Wed, 21 Jun 2017 15:59:29 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
cc: alan.hayward@arm.com
Subject: [PATCH] Implement cond and induction cond reduction w/o REDUC_MAX_EXPR

During my attempt to refactor reduction vectorization I ran across the
special casing of initial values for INTEGER_INDUC_COND_REDUCTION and
tried to see what it is about.  So I ended up implementing cond
reduction support for targets w/o REDUC_MAX_EXPR by simply doing the
reduction in scalar code.  While that results in an expensive epilogue,
the vector loop should be reasonably fast.

I still didn't run into any exec FAILs in vect.exp after removing the
INTEGER_INDUC_COND_REDUCTION special case, hence the following patch.

Alan, is there a testcase (maybe a full bootstrap & regtest will
uncover one) that shows why this is necessary?

Bootstrap and regtest running on x86_64-unknown-linux-gnu, testing
on arm appreciated.

Thanks,
Richard.

2017-06-21  Richard Biener

	* tree-vect-loop.c (vect_model_reduction_cost): Handle
	COND_REDUCTION and INTEGER_INDUC_COND_REDUCTION without
	REDUC_MAX_EXPR support.
	(vectorizable_reduction): Likewise.
	(vect_create_epilog_for_reduction): Remove special case of
	INTEGER_INDUC_COND_REDUCTION initial value.
	(vect_create_epilog_for_reduction): Handle COND_REDUCTION
	and INTEGER_INDUC_COND_REDUCTION without REDUC_MAX_EXPR support.
	Remove compensation code for initial value special handling
	of INTEGER_INDUC_COND_REDUCTION.

	* gcc.dg/vect/pr65947-1.c: Remove xfail.
	* gcc.dg/vect/pr65947-2.c: Likewise.
	* gcc.dg/vect/pr65947-3.c: Likewise.
	* gcc.dg/vect/pr65947-4.c: Likewise.
	* gcc.dg/vect/pr65947-5.c: Likewise.
	* gcc.dg/vect/pr65947-6.c: Likewise.
	* gcc.dg/vect/pr65947-8.c: Likewise.
	* gcc.dg/vect/pr65947-9.c: Likewise.

Index: gcc/testsuite/gcc.dg/vect/pr65947-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr65947-1.c	(revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-1.c	(working copy)
@@ -40,5 +40,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
-/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 4 "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 4 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr65947-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr65947-2.c	(revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-2.c	(working copy)
@@ -41,5 +41,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
 /* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr65947-3.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr65947-3.c	(revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-3.c	(working copy)
@@ -51,5 +51,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
 /* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr65947-4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr65947-4.c	(revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-4.c	(working copy)
@@ -40,6 +40,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
-/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 4 "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 4 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr65947-5.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr65947-5.c	(revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-5.c	(working copy)
@@ -41,6 +41,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" { xfail { ! vect_max_reduc } } } } */
-/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */
+/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" } } */
 /* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr65947-6.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr65947-6.c	(revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-6.c	(working copy)
@@ -40,5 +40,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
 /* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr65947-8.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr65947-8.c	(revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-8.c	(working copy)
@@ -42,4 +42,4 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
-/* { dg-final { scan-tree-dump "multiple types in double reduction or condition reduction" "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump "multiple types in double reduction or condition reduction" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr65947-9.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr65947-9.c	(revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-9.c	(working copy)
@@ -46,4 +46,4 @@ main ()
 }
 
 /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
-/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" } } */
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	(revision 249446)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -3772,6 +3772,31 @@ vect_model_reduction_cost (stmt_vec_info
                                           vect_epilogue);
         }
     }
+  else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+    {
+      unsigned nunits = TYPE_VECTOR_SUBPARTS (vectype);
+      /* Extraction of scalar elements.  */
+      epilogue_cost += add_stmt_cost (target_cost_data, 2 * nunits,
+                                      vec_to_scalar, stmt_info, 0,
+                                      vect_epilogue);
+      /* Scalar max reductions via COND_EXPR / MAX_EXPR.  */
+      epilogue_cost += add_stmt_cost (target_cost_data, 2 * nunits - 3,
+                                      scalar_stmt, stmt_info, 0,
+                                      vect_epilogue);
+    }
+  else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+           == INTEGER_INDUC_COND_REDUCTION)
+    {
+      unsigned nunits = TYPE_VECTOR_SUBPARTS (vectype);
+      /* Extraction of scalar elements.  */
+      epilogue_cost += add_stmt_cost (target_cost_data, nunits,
+                                      vec_to_scalar, stmt_info, 0,
+                                      vect_epilogue);
+      /* Scalar max reductions via MAX_EXPRs.  */
+      epilogue_cost += add_stmt_cost (target_cost_data, nunits - 1,
+                                      scalar_stmt, stmt_info, 0,
+                                      vect_epilogue);
+    }
   else
     {
       int vec_size_in_bits = tree_to_uhwi (TYPE_SIZE (vectype));
@@ -3783,7 +3808,8 @@ vect_model_reduction_cost (stmt_vec_info
       optab = optab_for_tree_code (code, vectype, optab_default);
 
       /* We have a whole vector shift available.  */
-      if (VECTOR_MODE_P (mode)
+      if (optab != unknown_optab
+          && VECTOR_MODE_P (mode)
           && optab_handler (optab, mode) != CODE_FOR_nothing
           && have_whole_vector_shift (mode))
         {
@@ -4212,24 +4238,8 @@ vect_create_epilog_for_reduction (vec (phi), zero_vec,
-                         loop_preheader_edge (loop), UNKNOWN_LOCATION);
-            }
-          else
-            add_phi_arg (as_a (phi), vec_init_def,
-                         loop_preheader_edge (loop), UNKNOWN_LOCATION);
+          add_phi_arg (as_a (phi), vec_init_def,
+                       loop_preheader_edge (loop), UNKNOWN_LOCATION);
           /* Set the loop-latch arg for the reduction-phi.  */
           if (j > 0)
@@ -4424,7 +4434,8 @@ vect_create_epilog_for_reduction (vec idx_val)
+                 val = data_reduc[i], idx_val = induction_index[i];
+               return val;  */
+
+      tree data_eltype = NULL_TREE;
+      if (!induction_index)
+        std::swap (induction_index, new_phi_result);
+      else
+        data_eltype = TREE_TYPE (TREE_TYPE (new_phi_result));
+      tree idx_eltype = TREE_TYPE (TREE_TYPE (induction_index));
+      unsigned HOST_WIDE_INT el_size = tree_to_uhwi (TYPE_SIZE (idx_eltype));
+      unsigned HOST_WIDE_INT v_size
+        = el_size * TYPE_VECTOR_SUBPARTS (TREE_TYPE (induction_index));
+      tree idx_val = NULL_TREE, val = NULL_TREE;
+      for (unsigned HOST_WIDE_INT off = 0; off < v_size; off += el_size)
+        {
+          tree old_idx_val = idx_val;
+          tree old_val = val;
+          idx_val = make_ssa_name (idx_eltype);
+          epilog_stmt = gimple_build_assign (idx_val, BIT_FIELD_REF,
+                                             build3 (BIT_FIELD_REF, idx_eltype,
+                                                     induction_index,
+                                                     bitsize_int (el_size),
+                                                     bitsize_int (off)));
+          gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+          if (new_phi_result)
+            {
+              val = make_ssa_name (data_eltype);
+              epilog_stmt = gimple_build_assign (val, BIT_FIELD_REF,
+                                                 build3 (BIT_FIELD_REF,
+                                                         data_eltype,
+                                                         new_phi_result,
+                                                         bitsize_int (el_size),
+                                                         bitsize_int (off)));
+              gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+            }
+          if (off != 0)
+            {
+              tree new_idx_val = idx_val;
+              tree new_val = val;
+              if (! new_phi_result
+                  || off != v_size - el_size)
+                {
+                  new_idx_val = make_ssa_name (idx_eltype);
+                  epilog_stmt = gimple_build_assign (new_idx_val,
+                                                     MAX_EXPR, idx_val,
+                                                     old_idx_val);
+                  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+                }
+              if (new_phi_result)
+                {
+                  new_val = make_ssa_name (data_eltype);
+                  epilog_stmt = gimple_build_assign (new_val,
+                                                     COND_EXPR,
+                                                     build2 (GT_EXPR,
+                                                             boolean_type_node,
+                                                             idx_val,
+                                                             old_idx_val),
+                                                     val, old_val);
+                  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+                }
+              idx_val = new_idx_val;
+              val = new_val;
+            }
+        }
+      if (new_phi_result)
+        scalar_results.safe_push (val);
+      else
+        {
+          scalar_results.safe_push (idx_val);
+          std::swap (induction_index, new_phi_result);
+        }
+    }
 
   /* 2.3 Create the reduction code, using one of the three schemes described
          above. In SLP we simply need to extract all the elements from the
@@ -4572,23 +4665,6 @@ vect_create_epilog_for_reduction (vec
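To make the new scalar epilogue easier to follow, here is a minimal C sketch that emulates a COND_REDUCTION on plain arrays instead of GIMPLE. It is illustrative only: the lane count NUNITS, the function names, and the lane bookkeeping are assumptions of the sketch, not code from the patch. In particular, it assumes each lane records the 1-based iteration index of its last update (0 meaning "never updated") and that the data lanes start out holding the initial value. The epilogue mirrors the pattern the patch emits: extract each lane, keep a running MAX_EXPR over the index lanes, and use a COND_EXPR to pick the data lane whose index is strictly greater (GT_EXPR) than the maximum seen so far.

```c
#include <assert.h>

#define NUNITS 4 /* hypothetical number of vector lanes */

/* Scalar reference: the value from the last iteration where the
   condition held, or INIT if it never held.  */
static int
cond_reduc_scalar (const int *a, int n, int thresh, int init)
{
  int last = init;
  for (int i = 0; i < n; i++)
    if (a[i] < thresh)
      last = a[i];
  return last;
}

/* Emulated vector loop followed by a scalar epilogue (sketch).  */
static int
cond_reduc_vectorized (const int *a, int n, int thresh, int init)
{
  int data[NUNITS], idx[NUNITS];
  assert (n % NUNITS == 0); /* no peeling or epilogue loop in this sketch */

  for (int l = 0; l < NUNITS; l++)
    {
      data[l] = init; /* assumed: data lanes start as the initial value */
      idx[l] = 0;     /* assumed: 0 means the lane was never updated */
    }

  /* "Vector" loop: lane L of the iteration starting at I handles
     element I + L and, when the condition holds, records the value
     and its 1-based index.  */
  for (int i = 0; i < n; i += NUNITS)
    for (int l = 0; l < NUNITS; l++)
      if (a[i + l] < thresh)
        {
          data[l] = a[i + l];
          idx[l] = i + l + 1;
        }

  /* Scalar epilogue: running maximum over the index lanes, selecting
     the data lane whose index is strictly greater than the maximum so far.  */
  int idx_val = idx[0];
  int val = data[0];
  for (int l = 1; l < NUNITS; l++)
    {
      int old_idx_val = idx_val;
      int old_val = val;
      idx_val = idx[l] > old_idx_val ? idx[l] : old_idx_val; /* MAX_EXPR */
      val = idx[l] > old_idx_val ? data[l] : old_val;        /* COND_EXPR */
    }
  return val;
}
```

Because the comparison is strict, a tie keeps the earlier lane. Under this bookkeeping, ties can only occur between lanes that were never updated (all holding index 0), so when no iteration matched, lane 0 still holds the initial value and that value is returned.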
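The constants in the new vect_model_reduction_cost branches can be cross-checked against the epilogue loop in vect_create_epilog_for_reduction. For COND_REDUCTION, each of the nunits lanes needs two extractions (index and data), giving 2 * nunits. Every lane after the first needs a COND_EXPR (nunits - 1 in total), and every lane after the first except the last needs a MAX_EXPR (nunits - 2 in total), giving 2 * nunits - 3 scalar statements. Without a data vector (INTEGER_INDUC_COND_REDUCTION), the loop emits nunits extractions and nunits - 1 MAX_EXPRs. The sketch below replays the loop's control flow over lanes and counts statements instead of building them. The struct and function names are hypothetical, and the check assumes nunits >= 2.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tally of the statements the epilogue would emit.  */
struct epilogue_counts
{
  unsigned extracts;   /* BIT_FIELD_REF lane extractions (vec_to_scalar) */
  unsigned scalar_ops; /* MAX_EXPR and COND_EXPR statements (scalar_stmt) */
};

/* Mirror the lane loop of the epilogue.  HAS_DATA is true for
   COND_REDUCTION (a data vector is present) and false for
   INTEGER_INDUC_COND_REDUCTION (only the index vector is reduced).  */
static struct epilogue_counts
count_epilogue_stmts (unsigned nunits, bool has_data)
{
  struct epilogue_counts c = { 0, 0 };
  for (unsigned lane = 0; lane < nunits; lane++)
    {
      c.extracts++;   /* extract the index lane */
      if (has_data)
        c.extracts++; /* extract the data lane */
      if (lane != 0)
        {
          /* The running MAX is not needed after the last lane
             when only the data value is the result.  */
          if (!has_data || lane != nunits - 1)
            c.scalar_ops++; /* MAX_EXPR */
          if (has_data)
            c.scalar_ops++; /* COND_EXPR */
        }
    }
  return c;
}

/* Compare the counts with the closed forms used in the cost model.
   Assumes nunits >= 2 so that 2 * nunits - 3 does not wrap.  */
static void
check_against_cost_model (unsigned nunits)
{
  struct epilogue_counts cond = count_epilogue_stmts (nunits, true);
  assert (cond.extracts == 2 * nunits);
  assert (cond.scalar_ops == 2 * nunits - 3);

  struct epilogue_counts induc = count_epilogue_stmts (nunits, false);
  assert (induc.extracts == nunits);
  assert (induc.scalar_ops == nunits - 1);
}
```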