From patchwork Wed Jun 21 13:59:29 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Biener <rguenther@suse.de>
X-Patchwork-Id: 778897
Date: Wed, 21 Jun 2017 15:59:29 +0200 (CEST)
From: Richard Biener <rguenther@suse.de>
To: gcc-patches@gcc.gnu.org
cc: alan.hayward@arm.com
Subject: [PATCH] Implement cond and induction cond reduction w/o
	REDUC_MAX_EXPR
Message-ID: <alpine.LSU.2.20.1706211547590.22867@zhemvz.fhfr.qr>
User-Agent: Alpine 2.20 (LSU 67 2015-01-07)
MIME-Version: 1.0

During my attempt to refactor reduction vectorization I ran across
the special casing of initial values for INTEGER_INDUC_COND_REDUCTION
and tried to see what it is about.  So I ended up implementing
cond reduction support for targets w/o REDUC_MAX_EXPR by simply
doing the reduction in scalar code -- while that results in an
expensive epilogue, the vector loop should be reasonably fast.
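
As a rough illustration of what "doing the reduction in scalar code"
means, here is a minimal C model of the epilogue.  It is a sketch, not
the GIMPLE the patch emits: the function names, the element types and
the plain arrays standing in for vector lanes are all assumptions made
for the example.

```c
#include <stddef.h>

/* Hypothetical scalar model of the COND_REDUCTION epilogue.  IDX holds,
   per lane, the induction index recorded at the last match (larger means
   later); DATA holds the value selected at that match.  Returns the data
   belonging to the largest index.  */
static int
cond_reduc_epilogue (const unsigned *idx, const int *data, size_t nelts)
{
  unsigned idx_val = idx[0];
  int val = data[0];
  for (size_t i = 1; i < nelts; ++i)
    if (idx[i] > idx_val)
      {
        val = data[i];
        idx_val = idx[i];
      }
  return val;
}

/* Hypothetical model of the INTEGER_INDUC_COND_REDUCTION variant, where
   only the maximum over the lanes is needed.  */
static unsigned
induc_cond_reduc_epilogue (const unsigned *idx, size_t nelts)
{
  unsigned max = idx[0];
  for (size_t i = 1; i < nelts; ++i)
    if (idx[i] > max)
      max = idx[i];
  return max;
}
```

With nunits lanes the first form needs 2 * nunits element extractions
plus 2 * nunits - 3 scalar statements, the second nunits extractions
plus nunits - 1 MAX_EXPRs, which is what the cost model in the patch
charges for the epilogue.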

I still didn't run into any execution FAILs in vect.exp after removing
the INTEGER_INDUC_COND_REDUCTION special case, hence the following
patch.
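
For reference, the special case being removed can be modelled in
scalar C roughly as below.  This only mirrors what the comments in the
removed code describe (reduction phi initialised to zero, a zero result
replaced by the original initial value); the function name and the
array standing in for the vector lanes are assumptions.

```c
#include <stddef.h>

/* Hypothetical model of the removed INTEGER_INDUC_COND_REDUCTION
   handling.  LANES is the reduction vector after the loop, having
   started out as all zeros; INITIAL_DEF is the scalar initial value.  */
static int
old_induc_epilogue (const int *lanes, size_t nelts, int initial_def)
{
  /* Stand-in for the REDUC_MAX_EXPR over the vector.  */
  int max = lanes[0];
  for (size_t i = 1; i < nelts; ++i)
    if (lanes[i] > max)
      max = lanes[i];

  /* A zero result is replaced with the original initial value.  */
  return max == 0 ? initial_def : max;
}
```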

Alan -- is there a testcase (maybe full bootstrap & regtest will
uncover one) that shows why this special case is necessary?

Bootstrap and regtest running on x86_64-unknown-linux-gnu, testing
on arm appreciated.

Thanks,
Richard.

2017-06-21  Richard Biener  <rguenther@suse.de>

	* tree-vect-loop.c (vect_model_reduction_cost): Handle
	COND_REDUCTION and INTEGER_INDUC_COND_REDUCTION without
	REDUC_MAX_EXPR support.
	(vectorizable_reduction): Likewise.
	(vect_create_epilog_for_reduction): Remove special case of
	INTEGER_INDUC_COND_REDUCTION initial value.
	(vect_create_epilog_for_reduction): Handle COND_REDUCTION
	and INTEGER_INDUC_COND_REDUCTION without REDUC_MAX_EXPR support.
	Remove compensation code for initial value special handling
	of INTEGER_INDUC_COND_REDUCTION.

	* gcc.dg/vect/pr65947-1.c: Remove xfail.
	* gcc.dg/vect/pr65947-2.c: Likewise.
	* gcc.dg/vect/pr65947-3.c: Likewise.
	* gcc.dg/vect/pr65947-4.c: Likewise.
	* gcc.dg/vect/pr65947-5.c: Likewise.
	* gcc.dg/vect/pr65947-6.c: Likewise.
	* gcc.dg/vect/pr65947-8.c: Likewise.
	* gcc.dg/vect/pr65947-9.c: Likewise.

Index: gcc/testsuite/gcc.dg/vect/pr65947-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr65947-1.c	(revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-1.c	(working copy)
@@ -40,5 +40,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
-/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 4 "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 4 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr65947-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr65947-2.c	(revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-2.c	(working copy)
@@ -41,5 +41,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
 /* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr65947-3.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr65947-3.c	(revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-3.c	(working copy)
@@ -51,5 +51,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
 /* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr65947-4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr65947-4.c	(revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-4.c	(working copy)
@@ -40,6 +40,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
-/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 4 "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 4 "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/pr65947-5.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr65947-5.c	(revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-5.c	(working copy)
@@ -41,6 +41,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" { xfail { ! vect_max_reduc } } } } */
-/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */
+/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" } } */
 /* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr65947-6.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr65947-6.c	(revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-6.c	(working copy)
@@ -40,5 +40,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
 /* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr65947-8.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr65947-8.c	(revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-8.c	(working copy)
@@ -42,4 +42,4 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
-/* { dg-final { scan-tree-dump "multiple types in double reduction or condition reduction" "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump "multiple types in double reduction or condition reduction" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr65947-9.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr65947-9.c	(revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-9.c	(working copy)
@@ -46,4 +46,4 @@ main ()
 }
 
 /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
-/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" } } */
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	(revision 249446)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -3772,6 +3772,31 @@ vect_model_reduction_cost (stmt_vec_info
 					      vect_epilogue);
 	    }
 	}
+      else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+	{
+	  unsigned nunits = TYPE_VECTOR_SUBPARTS (vectype);
+	  /* Extraction of scalar elements.  */
+	  epilogue_cost += add_stmt_cost (target_cost_data, 2 * nunits,
+					  vec_to_scalar, stmt_info, 0,
+					  vect_epilogue);
+	  /* Scalar max reductions via COND_EXPR / MAX_EXPR.  */
+	  epilogue_cost += add_stmt_cost (target_cost_data, 2 * nunits - 3,
+					  scalar_stmt, stmt_info, 0,
+					  vect_epilogue);
+	}
+      else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+	       == INTEGER_INDUC_COND_REDUCTION)
+	{
+	  unsigned nunits = TYPE_VECTOR_SUBPARTS (vectype);
+	  /* Extraction of scalar elements.  */
+	  epilogue_cost += add_stmt_cost (target_cost_data, nunits,
+					  vec_to_scalar, stmt_info, 0,
+					  vect_epilogue);
+	  /* Scalar max reductions via MAX_EXPRs.  */
+	  epilogue_cost += add_stmt_cost (target_cost_data, nunits - 1,
+					  scalar_stmt, stmt_info, 0,
+					  vect_epilogue);
+	}
       else
 	{
 	  int vec_size_in_bits = tree_to_uhwi (TYPE_SIZE (vectype));
@@ -3783,7 +3808,8 @@ vect_model_reduction_cost (stmt_vec_info
 	  optab = optab_for_tree_code (code, vectype, optab_default);
 
 	  /* We have a whole vector shift available.  */
-	  if (VECTOR_MODE_P (mode)
+	  if (optab != unknown_optab
+	      && VECTOR_MODE_P (mode)
 	      && optab_handler (optab, mode) != CODE_FOR_nothing
 	      && have_whole_vector_shift (mode))
 	    {
@@ -4212,24 +4238,8 @@ vect_create_epilog_for_reduction (vec<tr
 	    }
 
 	  /* Set the loop-entry arg of the reduction-phi.  */
-
-	  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
-	      == INTEGER_INDUC_COND_REDUCTION)
-	    {
-	      /* Initialise the reduction phi to zero.  This prevents initial
-		 values of non-zero interferring with the reduction op.  */
-	      gcc_assert (ncopies == 1);
-	      gcc_assert (i == 0);
-
-	      tree vec_init_def_type = TREE_TYPE (vec_init_def);
-	      tree zero_vec = build_zero_cst (vec_init_def_type);
-
-	      add_phi_arg (as_a <gphi *> (phi), zero_vec,
-			   loop_preheader_edge (loop), UNKNOWN_LOCATION);
-	    }
-	  else
-	    add_phi_arg (as_a <gphi *> (phi), vec_init_def,
-			 loop_preheader_edge (loop), UNKNOWN_LOCATION);
+	  add_phi_arg (as_a <gphi *> (phi), vec_init_def,
+		       loop_preheader_edge (loop), UNKNOWN_LOCATION);
 
           /* Set the loop-latch arg for the reduction-phi.  */
           if (j > 0)
@@ -4424,7 +4434,8 @@ vect_create_epilog_for_reduction (vec<tr
   else
     new_phi_result = PHI_RESULT (new_phis[0]);
 
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+      && reduc_code != ERROR_MARK)
     {
       /* For condition reductions, we have a vector (NEW_PHI_RESULT) containing
 	 various data values where the condition matched and another vector
@@ -4536,6 +4547,88 @@ vect_create_epilog_for_reduction (vec<tr
       gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
       scalar_results.safe_push (new_temp);
     }
+  else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+	   || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+	       == INTEGER_INDUC_COND_REDUCTION))
+    {
+      /* Condition reduction without supported REDUC_MAX_EXPR.  Generate
+	 the scalar equivalent of
+	   idx_val = induction_index[0];
+	   val = data_reduc[0];
+	   for (i = 1; i < nelts; ++i)
+	     if (induction_index[i] > idx_val)
+	       val = data_reduc[i], idx_val = induction_index[i];
+	   return val;  */
+
+      tree data_eltype = NULL_TREE;
+      if (!induction_index)
+	std::swap (induction_index, new_phi_result);
+      else
+	data_eltype = TREE_TYPE (TREE_TYPE (new_phi_result));
+      tree idx_eltype = TREE_TYPE (TREE_TYPE (induction_index));
+      unsigned HOST_WIDE_INT el_size = tree_to_uhwi (TYPE_SIZE (idx_eltype));
+      unsigned HOST_WIDE_INT v_size
+	= el_size * TYPE_VECTOR_SUBPARTS (TREE_TYPE (induction_index));
+      tree idx_val = NULL_TREE, val = NULL_TREE;
+      for (unsigned HOST_WIDE_INT off = 0; off < v_size; off += el_size)
+	{
+	  tree old_idx_val = idx_val;
+	  tree old_val = val;
+	  idx_val = make_ssa_name (idx_eltype);
+	  epilog_stmt = gimple_build_assign (idx_val, BIT_FIELD_REF,
+					     build3 (BIT_FIELD_REF, idx_eltype,
+						     induction_index,
+						     bitsize_int (el_size),
+						     bitsize_int (off)));
+	  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+	  if (new_phi_result)
+	    {
+	      val = make_ssa_name (data_eltype);
+	      epilog_stmt = gimple_build_assign (val, BIT_FIELD_REF,
+						 build3 (BIT_FIELD_REF,
+							 data_eltype,
+							 new_phi_result,
+							 bitsize_int (el_size),
+							 bitsize_int (off)));
+	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+	    }
+	  if (off != 0)
+	    {
+	      tree new_idx_val = idx_val;
+	      tree new_val = val;
+	      if (! new_phi_result
+		  || off != v_size - el_size)
+		{
+		  new_idx_val = make_ssa_name (idx_eltype);
+		  epilog_stmt = gimple_build_assign (new_idx_val,
+						     MAX_EXPR, idx_val,
+						     old_idx_val);
+		  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+		}
+	      if (new_phi_result)
+		{
+		  new_val = make_ssa_name (data_eltype);
+		  epilog_stmt = gimple_build_assign (new_val,
+						     COND_EXPR,
+						     build2 (GT_EXPR,
+							     boolean_type_node,
+							     idx_val,
+							     old_idx_val),
+						     val, old_val);
+		  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+		}
+	      idx_val = new_idx_val;
+	      val = new_val;
+	    }
+	}
+      if (new_phi_result)
+	scalar_results.safe_push (val);
+      else
+	{
+	  scalar_results.safe_push (idx_val);
+	  std::swap (induction_index, new_phi_result);
+	}
+    }
 
   /* 2.3 Create the reduction code, using one of the three schemes described
          above. In SLP we simply need to extract all the elements from the 
@@ -4572,23 +4665,6 @@ vect_create_epilog_for_reduction (vec<tr
       new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
       gimple_assign_set_lhs (epilog_stmt, new_temp);
       gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-
-      if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
-	  == INTEGER_INDUC_COND_REDUCTION)
-	{
-	  /* Earlier we set the initial value to be zero.  Check the result
-	     and if it is zero then replace with the original initial
-	     value.  */
-	  tree zero = build_zero_cst (scalar_type);
-	  tree zcompare = build2 (EQ_EXPR, boolean_type_node, new_temp, zero);
-
-	  tmp = make_ssa_name (new_scalar_dest);
-	  epilog_stmt = gimple_build_assign (tmp, COND_EXPR, zcompare,
-					     initial_def, new_temp);
-	  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-	  new_temp = tmp;
-	}
-
       scalar_results.safe_push (new_temp);
     }
   else
@@ -5639,21 +5715,6 @@ vectorizable_reduction (gimple *stmt, gi
 
 	      epilog_reduc_code = ERROR_MARK;
 	    }
-
-	  /* When epilog_reduc_code is ERROR_MARK then a reduction will be
-	     generated in the epilog using multiple expressions.  This does not
-	     work for condition reductions.  */
-	  if (epilog_reduc_code == ERROR_MARK
-	      && (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
-			== INTEGER_INDUC_COND_REDUCTION
-		  || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
-			== CONST_COND_REDUCTION))
-	    {
-	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				 "no reduc code for scalar code.\n");
-	      return false;
-	    }
 	}
       else
 	{
@@ -5674,17 +5735,11 @@ vectorizable_reduction (gimple *stmt, gi
       cr_index_vector_type = build_vector_type
 	(cr_index_scalar_type, TYPE_VECTOR_SUBPARTS (vectype_out));
 
-      epilog_reduc_code = REDUC_MAX_EXPR;
       optab = optab_for_tree_code (REDUC_MAX_EXPR, cr_index_vector_type,
 				   optab_default);
       if (optab_handler (optab, TYPE_MODE (cr_index_vector_type))
-	  == CODE_FOR_nothing)
-	{
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "reduc max op not supported by target.\n");
-	  return false;
-	}
+	  != CODE_FOR_nothing)
+	epilog_reduc_code = REDUC_MAX_EXPR;
     }
 
   if ((double_reduc