diff mbox series

Fix result for conditional reductions matching at index 0

Message ID 87d14brhj6.fsf@uclouvain.be
State New
Headers show
Series Fix result for conditional reductions matching at index 0 | expand

Commit Message

Kilian Verhetsel Nov. 21, 2017, 11:35 a.m. UTC
Hi,

When translating conditional reductions based on integer induction, the
compiler uses the value zero to indicate the absence of any matches: if
the index of the last match is still zero at the end of the loop, the
default value is returned. The problem with this approach is that this
default value is returned not only when there were no matches at all,
but also when the last match occurred at index 0. This causes the test
gcc.dg/vect/pr65947-14.c to fail.

This patch corrects this by reusing the vector of indices used for
COND_REDUCTION, which starts at 1. If the 1-based index of the last
match is non-zero, 1 is subtracted from it, otherwise the initial value
is returned.

I tested this patch on x86_64-pc-linux-gnu (both with SSE and AVX2,
causing both paths through the reduc_code != ERROR_MARK branch being
taken).

2017-11-21  Kilian Verhetsel <kilian.verhetsel@uclouvain.be>

	* tree-vect-loop.c
        (vect_create_epilog_for_reduction): Fix the returned value for
	INTEGER_INDUC_COND_REDUCTION whose last match occurred at
	index 0.	
	(vectorizable_reduction): For INTEGER_INDUC_COND_REDUCTION,
	pass the PHI statement that sets the induction variable to the
	code generating the epilogue.

Comments

Richard Biener Nov. 21, 2017, 1:58 p.m. UTC | #1
On Tue, Nov 21, 2017 at 12:35 PM, Kilian Verhetsel
<kilian.verhetsel@uclouvain.be> wrote:
>
> Hi,
>
> When translating conditional reductions based on integer induction, the
> compiler uses the value zero to indicate the absence of any matches: if
> the index of the last match is still zero at the end of the loop, the
> default value is returned. The problem with this approach is that this
> default value is returned not only when there were no matches at all,
> but also when the last match occurred at index 0. This causes the test
> gcc.dg/vect/pr65947-14.c to fail.
>
> This patch corrects this by reusing the vector of indices used for
> COND_REDUCTION, which starts at 1. If the 1-based index of the last
> match is non-zero, 1 is subtracted from it, otherwise the initial value
> is returned.
>
> I tested this patch on x86_64-pc-linux-gnu (both with SSE and AVX2,
> causing both paths through the reduc_code != ERROR_MARK branch being
> taken).

This is PR81179 I think, please mention that in the changelog.

This unconditionally pessimizes code even if there is no valid index
zero, right?

The issue with the COND_REDUCITION index vector is overflow IIRC.

Alan, can you please comment on the patch?

Thanks,
Richard.

> 2017-11-21  Kilian Verhetsel <kilian.verhetsel@uclouvain.be>
>
>         * tree-vect-loop.c
>         (vect_create_epilog_for_reduction): Fix the returned value for
>         INTEGER_INDUC_COND_REDUCTION whose last match occurred at
>         index 0.
>         (vectorizable_reduction): For INTEGER_INDUC_COND_REDUCTION,
>         pass the PHI statement that sets the induction variable to the
>         code generating the epilogue.
>
Kilian Verhetsel Nov. 21, 2017, 4:43 p.m. UTC | #2
> This is PR81179 I think, please mention that in the changelog.

Correct, my bad for missing that.

> This unconditionally pessimizes code even if there is no valid index
> zero, right?

Almost, since for a loop such as:

  #define OFFSET 1
  unsigned int find(const unsigned int *a, unsigned int v) {
    unsigned int result = 120;
    for (unsigned int i = OFFSET; i < 32+OFFSET; i++) {
      if (a[i-OFFSET] == v) result = i;
    }
    return result;
  }

the index i will match the contents of the index vector used here ---
but this does indeed pessimize the code generated for, say, OFFSET
= 2. It is probably more sensible to use the existing code path in those
situations.

> The issue with the COND_REDUCITION index vector is overflow IIRC.

Does that mean such overflows can already manifest themselves for
regular COND_REDUCTION? I had assumed sufficient checks were already in
place because of the presence of the is_nonwrapping_integer_induction
test.
Alan Hayward Nov. 21, 2017, 5:01 p.m. UTC | #3
> On 21 Nov 2017, at 16:43, Kilian Verhetsel <kilian.verhetsel@uclouvain.be> wrote:

> 

> 

>> This is PR81179 I think, please mention that in the changelog.

> 

> Correct, my bad for missing that.

> 

>> This unconditionally pessimizes code even if there is no valid index

>> zero, right?

> 

> Almost, since for a loop such as:

> 

>  #define OFFSET 1

>  unsigned int find(const unsigned int *a, unsigned int v) {

>    unsigned int result = 120;

>    for (unsigned int i = OFFSET; i < 32+OFFSET; i++) {

>      if (a[i-OFFSET] == v) result = i;

>    }

>    return result;

>  }

> 

> the index i will match the contents of the index vector used here ---

> but this does indeed pessimize the code generated for, say, OFFSET

> = 2. It is probably more sensible to use the existing code path in those

> situations.

> 


Looking at the final asm on aarch64 for -14.c, the code has only grown
By a single instruction in the epilogue. Which is good, given the vector
pass dump for this patch is quite a bit longer.

In the vector dump, there is a vector of 0’s
_57 = { 0, 0, 0, 0 };
created which is then never used (and later gets optimised away).
Be nice if that could avoid getting created.
I’ve not had chance to scrutinise the patch yet to see where that is created.


>> The issue with the COND_REDUCITION index vector is overflow IIRC.

> 

> Does that mean such overflows can already manifest themselves for

> regular COND_REDUCTION? I had assumed sufficient checks were already in

> place because of the presence of the is_nonwrapping_integer_induction

> test.


As an aside, it’s possibly worth mentioning that this issue will go away for
aarch64-sve when Richard Sandiford’s SVE patch goes in, as that’ll support
CLASTB reduction.
Richard Biener Nov. 22, 2017, 9:14 a.m. UTC | #4
On Tue, Nov 21, 2017 at 5:43 PM, Kilian Verhetsel
<kilian.verhetsel@uclouvain.be> wrote:
>
>> This is PR81179 I think, please mention that in the changelog.
>
> Correct, my bad for missing that.
>
>> This unconditionally pessimizes code even if there is no valid index
>> zero, right?
>
> Almost, since for a loop such as:
>
>   #define OFFSET 1
>   unsigned int find(const unsigned int *a, unsigned int v) {
>     unsigned int result = 120;
>     for (unsigned int i = OFFSET; i < 32+OFFSET; i++) {
>       if (a[i-OFFSET] == v) result = i;
>     }
>     return result;
>   }
>
> the index i will match the contents of the index vector used here ---
> but this does indeed pessimize the code generated for, say, OFFSET
> = 2. It is probably more sensible to use the existing code path in those
> situations.
>
>> The issue with the COND_REDUCITION index vector is overflow IIRC.
>
> Does that mean such overflows can already manifest themselves for
> regular COND_REDUCTION? I had assumed sufficient checks were already in
> place because of the presence of the is_nonwrapping_integer_induction
> test.

But only if we need the index vector?  The patch looked like you're changing
how other modes are handled (in my look I didn't make myself familiar with
the various modes again...).  Anyway, Alan hopefully remembers what he
coded so I'll defer to him here.

If Alan is happy with the patch consider it approved.

Thanks,
Richard.
Alan Hayward Nov. 22, 2017, 11:01 a.m. UTC | #5
> On 22 Nov 2017, at 09:14, Richard Biener <richard.guenther@gmail.com> wrote:

> 

> On Tue, Nov 21, 2017 at 5:43 PM, Kilian Verhetsel

> <kilian.verhetsel@uclouvain.be> wrote:

>> 

>>> This is PR81179 I think, please mention that in the changelog.

>> 

>> Correct, my bad for missing that.

>> 

>>> This unconditionally pessimizes code even if there is no valid index

>>> zero, right?

>> 

>> Almost, since for a loop such as:

>> 

>>  #define OFFSET 1

>>  unsigned int find(const unsigned int *a, unsigned int v) {

>>    unsigned int result = 120;

>>    for (unsigned int i = OFFSET; i < 32+OFFSET; i++) {

>>      if (a[i-OFFSET] == v) result = i;

>>    }

>>    return result;

>>  }

>> 

>> the index i will match the contents of the index vector used here ---

>> but this does indeed pessimize the code generated for, say, OFFSET

>> = 2. It is probably more sensible to use the existing code path in those

>> situations.

>> 

>>> The issue with the COND_REDUCITION index vector is overflow IIRC.

>> 

>> Does that mean such overflows can already manifest themselves for

>> regular COND_REDUCTION? I had assumed sufficient checks were already in

>> place because of the presence of the is_nonwrapping_integer_induction

>> test.

> 

> But only if we need the index vector?  The patch looked like you're changing

> how other modes are handled (in my look I didn't make myself familiar with

> the various modes again...).  Anyway, Alan hopefully remembers what he

> coded so I'll defer to him here.

> 

> If Alan is happy with the patch consider it approved.

> 


Richard’s right with his question.

The optimisation needs to fail if the number of interactions of the loop + 1 doesn’t
fit into the data type used for the result.

I took the test pr65947-14.c
First I set N to 0xffffffff-1. This compiled and vectorised. That’s ok.
Now if I set N to 0xffffffff it still vectorises, but this should fail.

Compare to pr65947-14.c where we set  last = a[I]; inside the if.
When set N to 0xffffffff-1, it compiled and vectorised. That’s ok.
When set N to 0xffffffff it fails to vectorise with the message
"loop size is greater than data size”.

Looks like you might just need to add the one check.

Also see pr65947-9.c which uses the slightly more useful char indexes.


Alan.
Richard Biener Nov. 22, 2017, 2:55 p.m. UTC | #6
On Wed, Nov 22, 2017 at 12:01 PM, Alan Hayward <Alan.Hayward@arm.com> wrote:
>
>> On 22 Nov 2017, at 09:14, Richard Biener <richard.guenther@gmail.com> wrote:
>>
>> On Tue, Nov 21, 2017 at 5:43 PM, Kilian Verhetsel
>> <kilian.verhetsel@uclouvain.be> wrote:
>>>
>>>> This is PR81179 I think, please mention that in the changelog.
>>>
>>> Correct, my bad for missing that.
>>>
>>>> This unconditionally pessimizes code even if there is no valid index
>>>> zero, right?
>>>
>>> Almost, since for a loop such as:
>>>
>>>  #define OFFSET 1
>>>  unsigned int find(const unsigned int *a, unsigned int v) {
>>>    unsigned int result = 120;
>>>    for (unsigned int i = OFFSET; i < 32+OFFSET; i++) {
>>>      if (a[i-OFFSET] == v) result = i;
>>>    }
>>>    return result;
>>>  }
>>>
>>> the index i will match the contents of the index vector used here ---
>>> but this does indeed pessimize the code generated for, say, OFFSET
>>> = 2. It is probably more sensible to use the existing code path in those
>>> situations.
>>>
>>>> The issue with the COND_REDUCITION index vector is overflow IIRC.
>>>
>>> Does that mean such overflows can already manifest themselves for
>>> regular COND_REDUCTION? I had assumed sufficient checks were already in
>>> place because of the presence of the is_nonwrapping_integer_induction
>>> test.
>>
>> But only if we need the index vector?  The patch looked like you're changing
>> how other modes are handled (in my look I didn't make myself familiar with
>> the various modes again...).  Anyway, Alan hopefully remembers what he
>> coded so I'll defer to him here.
>>
>> If Alan is happy with the patch consider it approved.
>>
>
> Richard’s right with his question.
>
> The optimisation needs to fail if the number of interactions of the loop + 1 doesn’t
> fit into the data type used for the result.
>
> I took the test pr65947-14.c
> First I set N to 0xffffffff-1. This compiled and vectorised. That’s ok.
> Now if I set N to 0xffffffff it still vectorises, but this should fail.
>
> Compare to pr65947-14.c where we set  last = a[I]; inside the if.
> When set N to 0xffffffff-1, it compiled and vectorised. That’s ok.
> When set N to 0xffffffff it fails to vectorise with the message
> "loop size is greater than data size”.
>
> Looks like you might just need to add the one check.
>
> Also see pr65947-9.c which uses the slightly more useful char indexes.

Some extra test variants might be indeed useful to really cover all corner
cases.  The PR had some "trivial" patch for the issue by me that I considered
a too big hammer.

I wonder if other compilers pull another trick to deal with the situation.

Richard.

>
> Alan.
>
>
>
Kilian Verhetsel Nov. 22, 2017, 4:57 p.m. UTC | #7
Thank you both for your comments.

I have added the check to ensure the index vector won't cause an
overflow. I also added tests to the testsuite in order to check that the
loop is vectorized for UINT_MAX - 1 iterations but not UINT_MAX
iterations. I was not able to write code that triggers
INTEGER_INDUC_COND_REDUCTION when using char or other smaller types
(changing the types of last, min_v and a to something else causes
COND_REDUCTION to be used instead), so these tests are only compiled and
not executed.

I also moved an instruction that generates a vector of zeroes (used for
COND_REDUCTION) in the branch of code run only for COND_REDUCTION, this
should remove the unused vector that Alan noticed.

2017-11-21  Kilian Verhetsel <kilian.verhetsel@uclouvain.be>

gcc/ChangeLog:
	PR testsuite/81179
	* tree-vect-loop.c
        (vect_create_epilog_for_reduction): Fix the returned value for
	INTEGER_INDUC_COND_REDUCTION whose last match occurred at
	index 0.
	(vectorizable_reduction): For INTEGER_INDUC_COND_REDUCTION,
	pass the PHI statement that sets the induction variable to the
	code generating the epilogue and check that no overflow will
	occur.

gcc/testsuite/ChangeLog:
	* gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c: New test for
        overflows when compiling conditional reductions.
        * gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c: Likewise.
Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c	(working copy)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+
+#include "tree-vect.h"
+#include <limits.h>
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N UINT_MAX
+
+/* Condition reduction with maximum possible loop size.  Will fail to vectorize
+   because values in the index vector will overflow.  */
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v)
+{
+  unsigned int last = -72;
+
+  for (unsigned int i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = i;
+
+  return last;
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */
+/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c	(working copy)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+
+#include "tree-vect.h"
+#include <limits.h>
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N (UINT_MAX - 1)
+
+/* Condition reduction with maximum possible loop size, minus one.  This should
+   still be vectorized correctly.  */
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v)
+{
+  unsigned int last = -72;
+
+  for (unsigned int i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = i;
+
+  return last;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	(revision 254996)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -4316,7 +4316,7 @@ get_initial_defs_for_reduction (slp_tree slp_node,
 
 static void
 vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple *stmt,
-				  gimple *reduc_def_stmt,
+				  gimple *reduc_def_stmt, gimple *induct_stmt,
 				  int ncopies, enum tree_code reduc_code,
 				  vec<gimple *> reduction_phis,
                                   bool double_reduc, 
@@ -4477,7 +4477,9 @@ vect_create_epilog_for_reduction (vec<tree> vect_d
      The first match will be a 1 to allow 0 to be used for non-matching
      indexes.  If there are no matches at all then the vector will be all
      zeroes.  */
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+      || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+      == INTEGER_INDUC_COND_REDUCTION)
     {
       tree indx_before_incr, indx_after_incr;
       int nunits_out = TYPE_VECTOR_SUBPARTS (vectype);
@@ -4754,7 +4756,9 @@ vect_create_epilog_for_reduction (vec<tree> vect_d
   else
     new_phi_result = PHI_RESULT (new_phis[0]);
 
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+  if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+       || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+       == INTEGER_INDUC_COND_REDUCTION)
       && reduc_code != ERROR_MARK)
     {
       /* For condition reductions, we have a vector (NEW_PHI_RESULT) containing
@@ -4779,18 +4783,6 @@ vect_create_epilog_for_reduction (vec<tree> vect_d
       tree vectype_unsigned = build_vector_type
 	(scalar_type_unsigned, TYPE_VECTOR_SUBPARTS (vectype));
 
-      /* First we need to create a vector (ZERO_VEC) of zeros and another
-	 vector (MAX_INDEX_VEC) filled with the last matching index, which we
-	 can create using a MAX reduction and then expanding.
-	 In the case where the loop never made any matches, the max index will
-	 be zero.  */
-
-      /* Vector of {0, 0, 0,...}.  */
-      tree zero_vec = make_ssa_name (vectype);
-      tree zero_vec_rhs = build_zero_cst (vectype);
-      gimple *zero_vec_stmt = gimple_build_assign (zero_vec, zero_vec_rhs);
-      gsi_insert_before (&exit_gsi, zero_vec_stmt, GSI_SAME_STMT);
-
       /* Find maximum value from the vector of found indexes.  */
       tree max_index = make_ssa_name (index_scalar_type);
       gimple *max_index_stmt = gimple_build_assign (max_index, REDUC_MAX_EXPR,
@@ -4797,76 +4789,130 @@ vect_create_epilog_for_reduction (vec<tree> vect_d
 						    induction_index);
       gsi_insert_before (&exit_gsi, max_index_stmt, GSI_SAME_STMT);
 
-      /* Vector of {max_index, max_index, max_index,...}.  */
-      tree max_index_vec = make_ssa_name (index_vec_type);
-      tree max_index_vec_rhs = build_vector_from_val (index_vec_type,
-						      max_index);
-      gimple *max_index_vec_stmt = gimple_build_assign (max_index_vec,
-							max_index_vec_rhs);
-      gsi_insert_before (&exit_gsi, max_index_vec_stmt, GSI_SAME_STMT);
+      if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+	{
+	  /* First we need to create a vector (ZERO_VEC) of zeros and another
+	     vector (MAX_INDEX_VEC) filled with the last matching index, which
+	     we can create using a MAX reduction and then expanding.  In the
+	     case where the loop never made any matches, the max index will be
+	     zero.  */
 
-      /* Next we compare the new vector (MAX_INDEX_VEC) full of max indexes
-	 with the vector (INDUCTION_INDEX) of found indexes, choosing values
-	 from the data vector (NEW_PHI_RESULT) for matches, 0 (ZERO_VEC)
-	 otherwise.  Only one value should match, resulting in a vector
-	 (VEC_COND) with one data value and the rest zeros.
-	 In the case where the loop never made any matches, every index will
-	 match, resulting in a vector with all data values (which will all be
-	 the default value).  */
+	  /* Vector of {0, 0, 0,...}.  */
+	  tree zero_vec = make_ssa_name (vectype);
+	  tree zero_vec_rhs = build_zero_cst (vectype);
+	  gimple *zero_vec_stmt = gimple_build_assign (zero_vec, zero_vec_rhs);
+	  gsi_insert_before (&exit_gsi, zero_vec_stmt, GSI_SAME_STMT);
 
-      /* Compare the max index vector to the vector of found indexes to find
-	 the position of the max value.  */
-      tree vec_compare = make_ssa_name (index_vec_cmp_type);
-      gimple *vec_compare_stmt = gimple_build_assign (vec_compare, EQ_EXPR,
-						      induction_index,
-						      max_index_vec);
-      gsi_insert_before (&exit_gsi, vec_compare_stmt, GSI_SAME_STMT);
+	  /* Vector of {max_index, max_index, max_index,...}.  */
+	  tree max_index_vec = make_ssa_name (index_vec_type);
+	  tree max_index_vec_rhs = build_vector_from_val (index_vec_type,
+							  max_index);
+	  gimple *max_index_vec_stmt = gimple_build_assign (max_index_vec,
+							    max_index_vec_rhs);
+	  gsi_insert_before (&exit_gsi, max_index_vec_stmt, GSI_SAME_STMT);
 
-      /* Use the compare to choose either values from the data vector or
-	 zero.  */
-      tree vec_cond = make_ssa_name (vectype);
-      gimple *vec_cond_stmt = gimple_build_assign (vec_cond, VEC_COND_EXPR,
-						   vec_compare, new_phi_result,
-						   zero_vec);
-      gsi_insert_before (&exit_gsi, vec_cond_stmt, GSI_SAME_STMT);
+	  /* Next we compare the new vector (MAX_INDEX_VEC) full of max indexes
+	     with the vector (INDUCTION_INDEX) of found indexes, choosing values
+	     from the data vector (NEW_PHI_RESULT) for matches, 0 (ZERO_VEC)
+	     otherwise.  Only one value should match, resulting in a vector
+	     (VEC_COND) with one data value and the rest zeros.  In the case
+	     where the loop never made any matches, every index will match,
+	     resulting in a vector with all data values (which will all be the
+	     default value).  */
 
-      /* Finally we need to extract the data value from the vector (VEC_COND)
-	 into a scalar (MATCHED_DATA_REDUC).  Logically we want to do a OR
-	 reduction, but because this doesn't exist, we can use a MAX reduction
-	 instead.  The data value might be signed or a float so we need to cast
-	 it first.
-	 In the case where the loop never made any matches, the data values are
-	 all identical, and so will reduce down correctly.  */
+	  /* Compare the max index vector to the vector of found indexes to find
+	     the position of the max value.  */
+	  tree vec_compare = make_ssa_name (index_vec_cmp_type);
+	  gimple *vec_compare_stmt = gimple_build_assign (vec_compare, EQ_EXPR,
+							  induction_index,
+							  max_index_vec);
+	  gsi_insert_before (&exit_gsi, vec_compare_stmt, GSI_SAME_STMT);
 
-      /* Make the matched data values unsigned.  */
-      tree vec_cond_cast = make_ssa_name (vectype_unsigned);
-      tree vec_cond_cast_rhs = build1 (VIEW_CONVERT_EXPR, vectype_unsigned,
-				       vec_cond);
-      gimple *vec_cond_cast_stmt = gimple_build_assign (vec_cond_cast,
-							VIEW_CONVERT_EXPR,
-							vec_cond_cast_rhs);
-      gsi_insert_before (&exit_gsi, vec_cond_cast_stmt, GSI_SAME_STMT);
+	  /* Use the compare to choose either values from the data vector or
+	     zero.  */
+	  tree vec_cond = make_ssa_name (vectype);
+	  gimple *vec_cond_stmt = gimple_build_assign (vec_cond, VEC_COND_EXPR,
+						       vec_compare,
+						       new_phi_result,
+						       zero_vec);
+	  gsi_insert_before (&exit_gsi, vec_cond_stmt, GSI_SAME_STMT);
 
-      /* Reduce down to a scalar value.  */
-      tree data_reduc = make_ssa_name (scalar_type_unsigned);
-      optab ot = optab_for_tree_code (REDUC_MAX_EXPR, vectype_unsigned,
-				      optab_default);
-      gcc_assert (optab_handler (ot, TYPE_MODE (vectype_unsigned))
-		  != CODE_FOR_nothing);
-      gimple *data_reduc_stmt = gimple_build_assign (data_reduc,
-						     REDUC_MAX_EXPR,
-						     vec_cond_cast);
-      gsi_insert_before (&exit_gsi, data_reduc_stmt, GSI_SAME_STMT);
+	  /* Finally we need to extract the data value from the vector
+	     (VEC_COND) into a scalar (MATCHED_DATA_REDUC).  Logically we want
+	     to do a OR reduction, but because this doesn't exist, we can use a
+	     MAX reduction instead.  The data value might be signed or a float
+	     so we need to cast it first.  In the case where the loop never made
+	     any matches, the data values are all identical, and so will reduce
+	     down correctly.  */
 
-      /* Convert the reduced value back to the result type and set as the
-	 result.  */
-      gimple_seq stmts = NULL;
-      new_temp = gimple_build (&stmts, VIEW_CONVERT_EXPR, scalar_type,
-			       data_reduc);
-      gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+	  /* Make the matched data values unsigned.  */
+	  tree vec_cond_cast = make_ssa_name (vectype_unsigned);
+	  tree vec_cond_cast_rhs = build1 (VIEW_CONVERT_EXPR, vectype_unsigned,
+					   vec_cond);
+	  gimple *vec_cond_cast_stmt = gimple_build_assign (vec_cond_cast,
+							    VIEW_CONVERT_EXPR,
+							    vec_cond_cast_rhs);
+	  gsi_insert_before (&exit_gsi, vec_cond_cast_stmt, GSI_SAME_STMT);
+
+	  /* Reduce down to a scalar value.  */
+	  tree data_reduc = make_ssa_name (scalar_type_unsigned);
+	  optab ot = optab_for_tree_code (REDUC_MAX_EXPR, vectype_unsigned,
+					  optab_default);
+	  gcc_assert (optab_handler (ot, TYPE_MODE (vectype_unsigned))
+		      != CODE_FOR_nothing);
+	  gimple *data_reduc_stmt = gimple_build_assign (data_reduc,
+							 REDUC_MAX_EXPR,
+							 vec_cond_cast);
+	  gsi_insert_before (&exit_gsi, data_reduc_stmt, GSI_SAME_STMT);
+
+	  /* Convert the reduced value back to the result type and set as the
+	     result.  */
+	  gimple_seq stmts = NULL;
+	  new_temp = gimple_build (&stmts, VIEW_CONVERT_EXPR, scalar_type,
+				   data_reduc);
+	  gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+	}
+      else
+	{
+	  /* Retrieve the index of the last match, and adjust the result value
+	     if there were no matches.  */
+	  gimple_seq stmts = NULL;
+
+	  stmt_vec_info induct_stmt_info = vinfo_for_stmt (induct_stmt);
+
+	  tree base = PHI_ARG_DEF_FROM_EDGE (induct_stmt,
+					     loop_preheader_edge (loop));
+	  tree step = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (induct_stmt_info);
+
+	  tree one = build_one_cst (TREE_TYPE (max_index));
+	  tree index = gimple_build (&stmts, MINUS_EXPR, TREE_TYPE (max_index),
+				     max_index, one);
+
+	  index = gimple_convert (&stmts, scalar_type, index);
+
+	  tree offset = gimple_build (&stmts, MULT_EXPR, scalar_type,
+				      step, index);
+	  tree result = gimple_build (&stmts, PLUS_EXPR, scalar_type,
+				      base, offset);
+
+	  tree zero = build_zero_cst (TREE_TYPE (max_index));
+	  tree zcompare = build2 (EQ_EXPR, boolean_type_node,
+				  max_index, zero);
+
+	  gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+
+	  new_temp = make_ssa_name (new_scalar_dest);
+	  epilog_stmt = gimple_build_assign (new_temp, COND_EXPR, zcompare,
+					     initial_def, result);
+
+	  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+	}
+
       scalar_results.safe_push (new_temp);
     }
-  else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+  else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+	    || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+	    == INTEGER_INDUC_COND_REDUCTION)
 	   && reduc_code == ERROR_MARK)
     {
       /* Condition redution without supported REDUC_MAX_EXPR.  Generate
@@ -4907,8 +4953,12 @@ vect_create_epilog_for_reduction (vec<tree> vect_d
 	    {
 	      tree new_idx_val = idx_val;
 	      tree new_val = val;
-	      if (off != v_size - el_size)
+	      if (off != v_size - el_size
+		  || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+		  == INTEGER_INDUC_COND_REDUCTION)
 		{
+		  /* For integer induction, the index of the last match
+		     must always be known.  */
 		  new_idx_val = make_ssa_name (idx_eltype);
 		  epilog_stmt = gimple_build_assign (new_idx_val,
 						     MAX_EXPR, idx_val,
@@ -4928,12 +4978,52 @@ vect_create_epilog_for_reduction (vec<tree> vect_d
 	      val = new_val;
 	    }
 	}
-      /* Convert the reduced value back to the result type and set as the
-	 result.  */
+
       gimple_seq stmts = NULL;
-      val = gimple_convert (&stmts, scalar_type, val);
-      gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
-      scalar_results.safe_push (val);
+
+      if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+	{
+	  /* Convert the reduced value back to the result type and set as the
+	     result.  */
+	  val = gimple_convert (&stmts, scalar_type, val);
+	  gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+	  scalar_results.safe_push (val);
+	}
+      else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+	       == INTEGER_INDUC_COND_REDUCTION)
+	{
+	  /* Retrieve the index of the last match, and adjust the result value
+	     if there were no matches.  */
+	  stmt_vec_info induct_stmt_info = vinfo_for_stmt (induct_stmt);
+
+	  tree base = PHI_ARG_DEF_FROM_EDGE (induct_stmt,
+					     loop_preheader_edge (loop));
+	  tree step = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (induct_stmt_info);
+
+	  tree one = build_one_cst (idx_eltype);
+	  tree index = gimple_build (&stmts, MINUS_EXPR, idx_eltype,
+				     idx_val, one);
+
+	  index = gimple_convert (&stmts, scalar_type, index);
+
+	  tree offset = gimple_build (&stmts, MULT_EXPR, scalar_type,
+				      step, index);
+	  tree result = gimple_build (&stmts, PLUS_EXPR, scalar_type,
+				      base, offset);
+
+	  gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+
+	  tree zero = build_zero_cst (TREE_TYPE (idx_val));
+	  tree zcompare = build2 (EQ_EXPR, boolean_type_node,
+				  idx_val, zero);
+
+	  new_temp = make_ssa_name (new_scalar_dest);
+	  epilog_stmt = gimple_build_assign (new_temp, COND_EXPR, zcompare,
+					     initial_def, result);
+	  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+
+	  scalar_results.safe_push (new_temp);
+	}
     }
 
   /* 2.3 Create the reduction code, using one of the three schemes described
@@ -5625,6 +5715,7 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_
   bool first_p = true;
   tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type = NULL_TREE;
   tree cond_reduc_val = NULL_TREE;
+  gimple *induct_stmt = NULL;
 
   /* Make sure it was already recognized as a reduction computation.  */
   if (STMT_VINFO_DEF_TYPE (vinfo_for_stmt (stmt)) != vect_reduction_def
@@ -5869,7 +5960,10 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_
 	    }
 	  if (dt == vect_induction_def && def_stmt != NULL
 	      && is_nonwrapping_integer_induction (def_stmt, loop))
-	    cond_reduc_dt = dt;
+	    {
+	      cond_reduc_dt = dt;
+	      induct_stmt = def_stmt;
+	    }
 	}
     }
 
@@ -6137,7 +6231,9 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_
 
   epilog_reduc_code = ERROR_MARK;
 
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) != COND_REDUCTION)
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) != COND_REDUCTION
+      && STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+      != INTEGER_INDUC_COND_REDUCTION)
     {
       if (reduction_code_for_scalar_code (orig_code, &epilog_reduc_code))
 	{
@@ -6218,7 +6314,9 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_
         }
     }
 
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+      || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+      == INTEGER_INDUC_COND_REDUCTION)
     {
       widest_int ni;
 
@@ -6232,7 +6330,6 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_
 	}
       /* Convert backedges to iterations.  */
       ni += 1;
-
       /* The additional index will be the same type as the condition.  Check
 	 that the loop can fit into this less one (because we'll use up the
 	 zero slot for when there are no matches).  */
@@ -6461,7 +6558,7 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_
     vect_defs[0] = gimple_assign_lhs (*vec_stmt);
 
   vect_create_epilog_for_reduction (vect_defs, stmt, reduc_def_stmt,
-				    epilog_copies,
+				    induct_stmt, epilog_copies,
                                     epilog_reduc_code, phis,
 				    double_reduc, slp_node, slp_node_instance);
Alan Hayward Nov. 23, 2017, 9:51 a.m. UTC | #8
> On 22 Nov 2017, at 16:57, Kilian Verhetsel <kilian.verhetsel@uclouvain.be> wrote:

> 

> 

> Thank you both for your comments.

> 

> I have added the check to ensure the index vector won't cause an

> overflow. I also added tests to the testsuite in order to check that the

> loop is vectorized for UINT_MAX - 1 iterations but not UINT_MAX

> iterations. I was not able to write code that triggers

> INTEGER_INDUC_COND_REDUCTION when using char or other smaller types

> (changing the types of last, min_v and a to something else causes

> COND_REDUCTION to be used instead), so these tests are only compiled and

> not executed.


I had similar problems when playing around with -14.c, but didn’t have chance to
investigate why. Possibly worth raising a bug to mentioning there is a missed
optimisation. It’d be nice to figure out why.

> 

> I also moved an instruction that generates a vector of zeroes (used for

> COND_REDUCTION) in the branch of code run only for COND_REDUCTION, this

> should remove the unused vector that Alan noticed.


Patch is ok for me.


Alan.
Richard Biener Nov. 23, 2017, 12:16 p.m. UTC | #9
On Thu, Nov 23, 2017 at 10:51 AM, Alan Hayward <Alan.Hayward@arm.com> wrote:
>
>> On 22 Nov 2017, at 16:57, Kilian Verhetsel <kilian.verhetsel@uclouvain.be> wrote:
>>
>>
>> Thank you both for your comments.
>>
>> I have added the check to ensure the index vector won't cause an
>> overflow. I also added tests to the testsuite in order to check that the
>> loop is vectorized for UINT_MAX - 1 iterations but not UINT_MAX
>> iterations. I was not able to write code that triggers
>> INTEGER_INDUC_COND_REDUCTION when using char or other smaller types
>> (changing the types of last, min_v and a to something else causes
>> COND_REDUCTION to be used instead), so these tests are only compiled and
>> not executed.
>
> I had similar problems when playing around with -14.c, but didn’t have chance to
> investigate why. Possibly worth raising a bug to mentioning there is a missed
> optimisation. It’d be nice to figure out why.
>
>>
>> I also moved an instruction that generates a vector of zeroes (used for
>> COND_REDUCTION) in the branch of code run only for COND_REDUCTION, this
>> should remove the unused vector that Alan noticed.
>
> Patch is ok for me.

Fine with me as well.

Thanks for taking care of this bug.

Richard.

>
> Alan.
>
Jakub Jelinek Dec. 8, 2017, 6:15 p.m. UTC | #10
Hi!

What is going on with this patch, will anybody commit it?

This is also http://gcc.gnu.org/PR80631 apparently.

The patch doesn't apply cleanly to current trunk due to the
reduc_code -> reduc_fn changes.

As it doesn't apply, I can't easily check what the patch generates
on the PR80631 testcases vs. my thoughts on that; though, if it emits
something more complicated for the simple cases, perhaps we could improve
those (not handle it like COND_REDUCTION, but instead as former
INTEGER_INDUC_COND_REDUCTION and just use a different constant instead of 0
if 0 isn't usable for the condition never matched value.  We could
check the value from the loop preheader edge of the induc_stmt phi
and if it is constant and for REDUC_MAX (does that imply always positive
step?) larger than minimum value of the type, we can use the minimum of
the type as the value instead of zero.  If REDUC_MIN and the initial constant
is smaller than the maximum value of the type, we could use the maximum.
Otherwise punt to what you're doing.  Or instead of minimum or maximum check
the value of initial_def, if it is constant, if that value is usable, we
could even avoid the final COND_EXPR.

Another thing is that is_nonwrapping_integer_induction has:
  /* Make sure the loop is integer based.  */
  if (TREE_CODE (base) != INTEGER_CST
      || TREE_CODE (step) != INTEGER_CST)
    return false;

  /* Check that the induction increments.  */
  if (tree_int_cst_sgn (step) == -1)
    return false;

  if (TYPE_OVERFLOW_UNDEFINED (lhs_type))
    return true;

First of all, I fail to see why we don't handle negative step,
that can be done with REDUC_MIN instead of REDUC_MAX.

And, I also fail to see why we need to require INTEGER_CST base if
TYPE_OVERFLOW_UNDEFINED (lhs_type), in that case we know it will not wrap,
so all we care about is if step * ni won't cover all iterations and we can
use some value (type minimum or maximum) to denote the condition never
true case.

On Wed, Nov 22, 2017 at 05:57:08PM +0100, Kilian Verhetsel wrote:
> 2017-11-21  Kilian Verhetsel <kilian.verhetsel@uclouvain.be>

Missing space before <.

> gcc/ChangeLog:
> 	PR testsuite/81179
> 	* tree-vect-loop.c
>         (vect_create_epilog_for_reduction): Fix the returned value for

8 spaces instead of a tab, note the (vect_create_epilog_for_reduction):
should just go after space right after the filename.

> 	INTEGER_INDUC_COND_REDUCTION whose last match occurred at
> 	index 0.
> 	(vectorizable_reduction): For INTEGER_INDUC_COND_REDUCTION,
> 	pass the PHI statement that sets the induction variable to the
> 	code generating the epilogue and check that no overflow will
> 	occur.
> 
> gcc/testsuite/ChangeLog:
> 	* gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c: New test for
>         overflows when compiling conditional reductions.
>         * gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c: Likewise.

Again, spaces instead of tab twice.

> -  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
> +  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
> +      || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> +      == INTEGER_INDUC_COND_REDUCTION)

Formatting, == should be below STMT_VINFO on the previous line, for emacs
users perhaps even better surrounded by ()s, like:


  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
      || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
	  == INTEGER_INDUC_COND_REDUCTION))

(happens many times in the patch).

	Jakub
Kilian Verhetsel Dec. 11, 2017, 10:56 a.m. UTC | #11
Hello,

Jakub Jelinek <jakub@redhat.com> writes:
> As it doesn't apply, I can't easily check what the patch generates
> on the PR80631 testcases vs. my thoughts on that; though, if it emits
> something more complicated for the simple cases, perhaps we could improve
> those (not handle it like COND_REDUCTION, but instead as former
> INTEGER_INDUC_COND_REDUCTION and just use a different constant instead of 0
> if 0 isn't usable for the condition never matched value.

While you could use values different from 0, I'm not sure this can be
done as efficiently.  0 is convenient because a single bitwise-and
between the index vector and the condition will set lanes that do not
contain a match to 0.

Jakub Jelinek <jakub@redhat.com> writes:
> First of all, I fail to see why we don't handle negative step,
> that can be done with REDUC_MIN instead of REDUC_MAX.

That would not work without first using values different from 0 to
indicate the absence of matches (except in cases where all indices are
negative), which I assume is why the test was there in the first place.

Below is the patch with fixed formatting and changes to make it apply
cleanly to r255537 (as far as I can tell this was simply caused by some
variables and constants having been renamed).

2017-11-21  Kilian Verhetsel  <kilian.verhetsel@uclouvain.be>

gcc/ChangeLog:
	PR testsuite/81179
	* tree-vect-loop.c (vect_create_epilog_for_reduction): Fix the
	returned value for INTEGER_INDUC_COND_REDUCTION whose last match
	occurred at index 0.
	(vectorizable_reduction): For
	INTEGER_INDUC_COND_REDUCTION, pass the PHI statement that sets
	the induction variable to the code generating the epilogue and
	check that no overflow will occur.

gcc/testsuite/ChangeLog:
	* gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c: New test for
	overflows when compiling conditional reductions.
	* gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c: Likewise.
Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c	(working copy)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+
+#include "tree-vect.h"
+#include <limits.h>
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N UINT_MAX
+
+/* Condition reduction with maximum possible loop size.  Will fail to vectorize
+   because values in the index vector will overflow.  */
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v)
+{
+  unsigned int last = -72;
+
+  for (unsigned int i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = i;
+
+  return last;
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */
+/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c	(working copy)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+
+#include "tree-vect.h"
+#include <limits.h>
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N (UINT_MAX - 1)
+
+/* Condition reduction with maximum possible loop size, minus one.  This should
+   still be vectorized correctly.  */
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v)
+{
+  unsigned int last = -72;
+
+  for (unsigned int i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = i;
+
+  return last;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	(revision 255537)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -4325,7 +4325,7 @@ get_initial_defs_for_reduction (slp_tree slp_node,
 
 static void
 vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple *stmt,
-				  gimple *reduc_def_stmt,
+				  gimple *reduc_def_stmt, gimple *induct_stmt,
 				  int ncopies, internal_fn reduc_fn,
 				  vec<gimple *> reduction_phis,
                                   bool double_reduc, 
@@ -4486,7 +4486,9 @@ vect_create_epilog_for_reduction (vec<tree> vect_d
      The first match will be a 1 to allow 0 to be used for non-matching
      indexes.  If there are no matches at all then the vector will be all
      zeroes.  */
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+      || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+	  == INTEGER_INDUC_COND_REDUCTION))
     {
       tree indx_before_incr, indx_after_incr;
       int nunits_out = TYPE_VECTOR_SUBPARTS (vectype);
@@ -4763,7 +4765,9 @@ vect_create_epilog_for_reduction (vec<tree> vect_d
   else
     new_phi_result = PHI_RESULT (new_phis[0]);
 
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+  if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+       || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+	   == INTEGER_INDUC_COND_REDUCTION))
       && reduc_fn != IFN_LAST)
     {
       /* For condition reductions, we have a vector (NEW_PHI_RESULT) containing
@@ -4788,18 +4792,6 @@ vect_create_epilog_for_reduction (vec<tree> vect_d
       tree vectype_unsigned = build_vector_type
 	(scalar_type_unsigned, TYPE_VECTOR_SUBPARTS (vectype));
 
-      /* First we need to create a vector (ZERO_VEC) of zeros and another
-	 vector (MAX_INDEX_VEC) filled with the last matching index, which we
-	 can create using a MAX reduction and then expanding.
-	 In the case where the loop never made any matches, the max index will
-	 be zero.  */
-
-      /* Vector of {0, 0, 0,...}.  */
-      tree zero_vec = make_ssa_name (vectype);
-      tree zero_vec_rhs = build_zero_cst (vectype);
-      gimple *zero_vec_stmt = gimple_build_assign (zero_vec, zero_vec_rhs);
-      gsi_insert_before (&exit_gsi, zero_vec_stmt, GSI_SAME_STMT);
-
       /* Find maximum value from the vector of found indexes.  */
       tree max_index = make_ssa_name (index_scalar_type);
       gcall *max_index_stmt = gimple_build_call_internal (IFN_REDUC_MAX,
@@ -4815,65 +4807,114 @@ vect_create_epilog_for_reduction (vec<tree> vect_d
 							max_index_vec_rhs);
       gsi_insert_before (&exit_gsi, max_index_vec_stmt, GSI_SAME_STMT);
 
-      /* Next we compare the new vector (MAX_INDEX_VEC) full of max indexes
-	 with the vector (INDUCTION_INDEX) of found indexes, choosing values
-	 from the data vector (NEW_PHI_RESULT) for matches, 0 (ZERO_VEC)
-	 otherwise.  Only one value should match, resulting in a vector
-	 (VEC_COND) with one data value and the rest zeros.
-	 In the case where the loop never made any matches, every index will
-	 match, resulting in a vector with all data values (which will all be
-	 the default value).  */
+      if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+	{
+	  /* Next we compare the new vector (MAX_INDEX_VEC) full of max indexes
+	     with the vector (INDUCTION_INDEX) of found indexes, choosing values
+	     from the data vector (NEW_PHI_RESULT) for matches, 0 (ZERO_VEC)
+	     otherwise.  Only one value should match, resulting in a vector
+	     (VEC_COND) with one data value and the rest zeros.  In the case
+	     where the loop never made any matches, every index will match,
+	     resulting in a vector with all data values (which will all be the
+	     default value).  */
 
-      /* Compare the max index vector to the vector of found indexes to find
-	 the position of the max value.  */
-      tree vec_compare = make_ssa_name (index_vec_cmp_type);
-      gimple *vec_compare_stmt = gimple_build_assign (vec_compare, EQ_EXPR,
-						      induction_index,
-						      max_index_vec);
-      gsi_insert_before (&exit_gsi, vec_compare_stmt, GSI_SAME_STMT);
+	  /* Vector of {0, 0, 0,...}.  */
+	  tree zero_vec = make_ssa_name (vectype);
+	  tree zero_vec_rhs = build_zero_cst (vectype);
+	  gimple *zero_vec_stmt = gimple_build_assign (zero_vec, zero_vec_rhs);
+	  gsi_insert_before (&exit_gsi, zero_vec_stmt, GSI_SAME_STMT);
 
-      /* Use the compare to choose either values from the data vector or
-	 zero.  */
-      tree vec_cond = make_ssa_name (vectype);
-      gimple *vec_cond_stmt = gimple_build_assign (vec_cond, VEC_COND_EXPR,
-						   vec_compare, new_phi_result,
-						   zero_vec);
-      gsi_insert_before (&exit_gsi, vec_cond_stmt, GSI_SAME_STMT);
+	  /* Compare the max index vector to the vector of found indexes to find
+	     the position of the max value.  */
+	  tree vec_compare = make_ssa_name (index_vec_cmp_type);
+	  gimple *vec_compare_stmt = gimple_build_assign (vec_compare, EQ_EXPR,
+							  induction_index,
+							  max_index_vec);
+	  gsi_insert_before (&exit_gsi, vec_compare_stmt, GSI_SAME_STMT);
 
-      /* Finally we need to extract the data value from the vector (VEC_COND)
-	 into a scalar (MATCHED_DATA_REDUC).  Logically we want to do a OR
-	 reduction, but because this doesn't exist, we can use a MAX reduction
-	 instead.  The data value might be signed or a float so we need to cast
-	 it first.
-	 In the case where the loop never made any matches, the data values are
-	 all identical, and so will reduce down correctly.  */
+	  /* Use the compare to choose either values from the data vector or
+	     zero.  */
+	  tree vec_cond = make_ssa_name (vectype);
+	  gimple *vec_cond_stmt = gimple_build_assign (vec_cond, VEC_COND_EXPR,
+						       vec_compare,
+						       new_phi_result,
+						       zero_vec);
+	  gsi_insert_before (&exit_gsi, vec_cond_stmt, GSI_SAME_STMT);
 
-      /* Make the matched data values unsigned.  */
-      tree vec_cond_cast = make_ssa_name (vectype_unsigned);
-      tree vec_cond_cast_rhs = build1 (VIEW_CONVERT_EXPR, vectype_unsigned,
-				       vec_cond);
-      gimple *vec_cond_cast_stmt = gimple_build_assign (vec_cond_cast,
-							VIEW_CONVERT_EXPR,
-							vec_cond_cast_rhs);
-      gsi_insert_before (&exit_gsi, vec_cond_cast_stmt, GSI_SAME_STMT);
+	  /* Finally we need to extract the data value from the vector
+	     (VEC_COND) into a scalar (MATCHED_DATA_REDUC).  Logically we want
+	     to do a OR reduction, but because this doesn't exist, we can use a
+	     MAX reduction instead.  The data value might be signed or a float
+	     so we need to cast it first.  In the case where the loop never made
+	     any matches, the data values are all identical, and so will reduce
+	     down correctly.  */
 
-      /* Reduce down to a scalar value.  */
-      tree data_reduc = make_ssa_name (scalar_type_unsigned);
-      gcall *data_reduc_stmt = gimple_build_call_internal (IFN_REDUC_MAX,
-							   1, vec_cond_cast);
-      gimple_call_set_lhs (data_reduc_stmt, data_reduc);
-      gsi_insert_before (&exit_gsi, data_reduc_stmt, GSI_SAME_STMT);
+	  /* Make the matched data values unsigned.  */
+	  tree vec_cond_cast = make_ssa_name (vectype_unsigned);
+	  tree vec_cond_cast_rhs = build1 (VIEW_CONVERT_EXPR, vectype_unsigned,
+					   vec_cond);
+	  gimple *vec_cond_cast_stmt = gimple_build_assign (vec_cond_cast,
+							    VIEW_CONVERT_EXPR,
+							    vec_cond_cast_rhs);
+	  gsi_insert_before (&exit_gsi, vec_cond_cast_stmt, GSI_SAME_STMT);
 
-      /* Convert the reduced value back to the result type and set as the
-	 result.  */
-      gimple_seq stmts = NULL;
-      new_temp = gimple_build (&stmts, VIEW_CONVERT_EXPR, scalar_type,
-			       data_reduc);
-      gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+	  /* Reduce down to a scalar value.  */
+	  tree data_reduc = make_ssa_name (scalar_type_unsigned);
+	  gcall *data_reduc_stmt = gimple_build_call_internal (IFN_REDUC_MAX,
+							       1,
+							       vec_cond_cast);
+	  gimple_call_set_lhs (data_reduc_stmt, data_reduc);
+	  gsi_insert_before (&exit_gsi, data_reduc_stmt, GSI_SAME_STMT);
+
+	  /* Convert the reduced value back to the result type and set as the
+	     result.  */
+	  gimple_seq stmts = NULL;
+	  new_temp = gimple_build (&stmts, VIEW_CONVERT_EXPR, scalar_type,
+				   data_reduc);
+	  gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+	}
+      else
+	{
+	  /* Retrieve the index of the last match, and adjust the result value
+	     if there were no matches.  */
+	  gimple_seq stmts = NULL;
+
+	  stmt_vec_info induct_stmt_info = vinfo_for_stmt (induct_stmt);
+
+	  tree base = PHI_ARG_DEF_FROM_EDGE (induct_stmt,
+					     loop_preheader_edge (loop));
+	  tree step = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (induct_stmt_info);
+
+	  tree one = build_one_cst (TREE_TYPE (max_index));
+	  tree index = gimple_build (&stmts, MINUS_EXPR, TREE_TYPE (max_index),
+				     max_index, one);
+
+	  index = gimple_convert (&stmts, scalar_type, index);
+
+	  tree offset = gimple_build (&stmts, MULT_EXPR, scalar_type,
+				      step, index);
+	  tree result = gimple_build (&stmts, PLUS_EXPR, scalar_type,
+				      base, offset);
+
+	  tree zero = build_zero_cst (TREE_TYPE (max_index));
+	  tree zcompare = build2 (EQ_EXPR, boolean_type_node,
+				  max_index, zero);
+
+	  gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+
+	  new_temp = make_ssa_name (new_scalar_dest);
+	  epilog_stmt = gimple_build_assign (new_temp, COND_EXPR, zcompare,
+					     initial_def, result);
+
+	  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+	}
+
       scalar_results.safe_push (new_temp);
     }
-  else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
-	   && reduc_fn == IFN_LAST)
+  else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+	    || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+		== INTEGER_INDUC_COND_REDUCTION))
+	   && reduc_fn == IFN_LAST)»
     {
       /* Condition reduction without supported IFN_REDUC_MAX.  Generate
 	 idx = 0;
@@ -4913,8 +4954,12 @@ vect_create_epilog_for_reduction (vec<tree> vect_d
 	    {
 	      tree new_idx_val = idx_val;
 	      tree new_val = val;
-	      if (off != v_size - el_size)
+	      if (off != v_size - el_size
+		  || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+		      == INTEGER_INDUC_COND_REDUCTION))
 		{
+		  /* For integer induction, the index of the last match
+		     must always be known.  */
 		  new_idx_val = make_ssa_name (idx_eltype);
 		  epilog_stmt = gimple_build_assign (new_idx_val,
 						     MAX_EXPR, idx_val,
@@ -4934,12 +4979,52 @@ vect_create_epilog_for_reduction (vec<tree> vect_d
 	      val = new_val;
 	    }
 	}
-      /* Convert the reduced value back to the result type and set as the
-	 result.  */
+
       gimple_seq stmts = NULL;
-      val = gimple_convert (&stmts, scalar_type, val);
-      gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
-      scalar_results.safe_push (val);
+
+      if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+	{
+	  /* Convert the reduced value back to the result type and set as the
+	     result.  */
+	  val = gimple_convert (&stmts, scalar_type, val);
+	  gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+	  scalar_results.safe_push (val);
+	}
+      else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+	       == INTEGER_INDUC_COND_REDUCTION)
+	{
+	  /* Retrieve the index of the last match, and adjust the result value
+	     if there were no matches.  */
+	  stmt_vec_info induct_stmt_info = vinfo_for_stmt (induct_stmt);
+
+	  tree base = PHI_ARG_DEF_FROM_EDGE (induct_stmt,
+					     loop_preheader_edge (loop));
+	  tree step = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (induct_stmt_info);
+
+	  tree one = build_one_cst (idx_eltype);
+	  tree index = gimple_build (&stmts, MINUS_EXPR, idx_eltype,
+				     idx_val, one);
+
+	  index = gimple_convert (&stmts, scalar_type, index);
+
+	  tree offset = gimple_build (&stmts, MULT_EXPR, scalar_type,
+				      step, index);
+	  tree result = gimple_build (&stmts, PLUS_EXPR, scalar_type,
+				      base, offset);
+
+	  gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+
+	  tree zero = build_zero_cst (TREE_TYPE (idx_val));
+	  tree zcompare = build2 (EQ_EXPR, boolean_type_node,
+				  idx_val, zero);
+
+	  new_temp = make_ssa_name (new_scalar_dest);
+	  epilog_stmt = gimple_build_assign (new_temp, COND_EXPR, zcompare,
+					     initial_def, result);
+	  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+
+	  scalar_results.safe_push (new_temp);
+	}
     }
 
   /* 2.3 Create the reduction code, using one of the three schemes described
@@ -5637,6 +5722,7 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_
   bool first_p = true;
   tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type = NULL_TREE;
   tree cond_reduc_val = NULL_TREE;
+  gimple *induct_stmt = NULL;
 
   /* Make sure it was already recognized as a reduction computation.  */
   if (STMT_VINFO_DEF_TYPE (vinfo_for_stmt (stmt)) != vect_reduction_def
@@ -5881,7 +5967,10 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_
 	    }
 	  if (dt == vect_induction_def && def_stmt != NULL
 	      && is_nonwrapping_integer_induction (def_stmt, loop))
-	    cond_reduc_dt = dt;
+	    {
+	      cond_reduc_dt = dt;
+	      induct_stmt = def_stmt;
+	    }
 	}
     }
 
@@ -6149,7 +6238,9 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_
 
   reduc_fn = IFN_LAST;
 
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) != COND_REDUCTION)
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) != COND_REDUCTION
+      && (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+	  != INTEGER_INDUC_COND_REDUCTION))
     {
       if (reduction_fn_for_scalar_code (orig_code, &reduc_fn))
 	{
@@ -6220,7 +6311,9 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_
         }
     }
 
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+      || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+	  == INTEGER_INDUC_COND_REDUCTION))
     {
       widest_int ni;
 
@@ -6463,8 +6556,9 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_
     vect_defs[0] = gimple_assign_lhs (*vec_stmt);
 
   vect_create_epilog_for_reduction (vect_defs, stmt, reduc_def_stmt,
-				    epilog_copies, reduc_fn, phis,
-				    double_reduc, slp_node, slp_node_instance);
+				    induct_stmt, epilog_copies, reduc_fn,
+				    phis, double_reduc, slp_node,
+				    slp_node_instance);
 
   return true;
 }
Jakub Jelinek Dec. 11, 2017, 1:11 p.m. UTC | #12
On Mon, Dec 11, 2017 at 11:56:55AM +0100, Kilian Verhetsel wrote:
> Jakub Jelinek <jakub@redhat.com> writes:
> > As it doesn't apply, I can't easily check what the patch generates
> > on the PR80631 testcases vs. my thoughts on that; though, if it emits
> > something more complicated for the simple cases, perhaps we could improve
> > those (not handle it like COND_REDUCTION, but instead as former
> > INTEGER_INDUC_COND_REDUCTION and just use a different constant instead of 0
> > if 0 isn't usable for the condition never matched value.
> 
> While you could use values different from 0, I'm not sure this can be
> done as efficiently.  0 is convenient because a single bitwise-and
> between the index vector and the condition will set lanes that do not
> contain a match to 0.

Of course it can be done efficiently, what we care most is that the body of
the vectorized loop is efficient.  Whether we choose -1, 0 or 124 as the
COND_EXPR not ever meant value matters only before that loop (when we need
to load that into a register holding vector of all those constants) and
then a scalar comparison on the REDUC_* result.  Load of -1 vector on some targets
is as expensive as load of 0, for arbitrary value worst case it is one
memory load compared to a specialized zero register (or set all bits)
instruction.  On the other side, by not using any offsetted iteration var,
one can reuse the vector register that holds the IV, which can be used in
some loops too and thus decrease register pressure.
And while comparison against 0 is sometimes one scalar insn
cheaper than comparison against other value, if the insn producing it
already sets the flags, I doubt it is the case here, so it is exactly the
same cost.  Not to mention that in your patch you are instead subtracting
one in the scalar code.

> Jakub Jelinek <jakub@redhat.com> writes:
> > First of all, I fail to see why we don't handle negative step,
> > that can be done with REDUC_MIN instead of REDUC_MAX.
> 
> That would not work without first using values different from 0 to
> indicate the absence of matches (except in cases where all indices are
> negative), which I assume is why the test was there in the first place.
> 
> Below is the patch with fixed formatting and changes to make it apply
> cleanly to r255537 (as far as I can tell this was simply caused by some
> variables and constants having been renamed).

Thanks, it applies cleanly now
> +  else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
> +	    || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> +		== INTEGER_INDUC_COND_REDUCTION))
> +	   && reduc_fn == IFN_LAST)»

contains a character at the end of line that makes it not to compile.

Trying to understand your patch, here is the difference with your patch
between additional:
--- tree-vect-loop.c	2017-12-11 13:39:35.619122907 +0100
+++ tree-vect-loop.c	2017-12-11 13:35:27.000000000 +0100
@@ -6021,8 +6021,8 @@ vectorizable_reduction (gimple *stmt, gi
 	    dump_printf_loc (MSG_NOTE, vect_location,
 			     "condition expression based on "
 			     "integer induction.\n");
-	  STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
-	    = INTEGER_INDUC_COND_REDUCTION;
+/*	  STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+	    = INTEGER_INDUC_COND_REDUCTION; */
 	}
so that COND_REDUCTION is used, and the case with
INTEGER_INDUC_COND_REDUCTION with your patch on:
int v[256] = { 77, 1, 79, 3, 4, 5, 6, 7 };

__attribute__((noipa)) void
foo ()
{
  int k, r = -1;
  for (k = 0; k < 256; k++)
    if (v[k] == 77)
      r = k;
  if (r != 0)
    __builtin_abort ();
}

   vect_cst__21 = { 8, 8, 8, 8, 8, 8, 8, 8 };
   vect_cst__28 = { 77, 77, 77, 77, 77, 77, 77, 77 };
+  vect_cst__30 = { -1, -1, -1, -1, -1, -1, -1, -1 };
 
   <bb 3> [local count: 139586436]:
   # k_12 = PHI <k_8(9), 0(2)>
   # r_13 = PHI <r_3(9), -1(2)>
   # ivtmp_11 = PHI <ivtmp_2(9), 256(2)>
   # vect_vec_iv_.0_22 = PHI <vect_vec_iv_.0_23(9), { 0, 1, 2, 3, 4, 5, 6, 7 }(2)>
-  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { -1, -1, -1, -1, -1, -1, -1, -1 }(2)>
+  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
   # vectp_v.2_25 = PHI <vectp_v.2_26(9), &v(2)>
-  # ivtmp_30 = PHI <ivtmp_31(9), { 1, 2, 3, 4, 5, 6, 7, 8 }(2)>
-  # _32 = PHI <_33(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
-  # ivtmp_43 = PHI <ivtmp_44(9), 0(2)>
+  # ivtmp_31 = PHI <ivtmp_32(9), { 1, 2, 3, 4, 5, 6, 7, 8 }(2)>
+  # _33 = PHI <_34(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
+  # ivtmp_41 = PHI <ivtmp_42(9), 0(2)>
   vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21;
   vect__1.4_27 = MEM[(int *)vectp_v.2_25];
   _1 = v[k_12];
   vect_r_3.6_29 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, vect_vec_iv_.0_22, vect_r_3.1_24>;
   r_3 = _1 == 77 ? k_12 : r_13;
   k_8 = k_12 + 1;
   ivtmp_2 = ivtmp_11 - 1;
   vectp_v.2_26 = vectp_v.2_25 + 32;
-  _33 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, ivtmp_30, _32>;
-  ivtmp_31 = ivtmp_30 + { 8, 8, 8, 8, 8, 8, 8, 8 };
-  ivtmp_44 = ivtmp_43 + 1;
-  if (ivtmp_44 < 32)
+  _34 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, ivtmp_31, _33>;
+  ivtmp_32 = ivtmp_31 + { 8, 8, 8, 8, 8, 8, 8, 8 };
+  ivtmp_42 = ivtmp_41 + 1;
+  if (ivtmp_42 < 32)
     goto <bb 9>; [92.31%]
   else
     goto <bb 18>; [7.69%]

...
   <bb 18> [local count: 10737418]:
   # r_19 = PHI <r_3(3)>
-  # vect_r_3.6_34 = PHI <vect_r_3.6_29(3)>
-  # _45 = PHI <_33(3)>
-  _35 = REDUC_MAX (_45);
-  _36 = {_35, _35, _35, _35, _35, _35, _35, _35};
-  _37 = { 0, 0, 0, 0, 0, 0, 0, 0 };
-  _38 = _45 == _36;
-  _39 = VEC_COND_EXPR <_38, vect_r_3.6_34, _37>;
-  _40 = VIEW_CONVERT_EXPR<vector(8) unsigned int>(_39);
-  _41 = REDUC_MAX (_40);
-  _42 = (int) _41;
-  if (_42 != 0)
+  # vect_r_3.6_35 = PHI <vect_r_3.6_29(3)>
+  # _43 = PHI <_34(3)>
+  _36 = REDUC_MAX (_43);
+  _37 = {_36, _36, _36, _36, _36, _36, _36, _36};
+  _38 = _36 + 4294967295;
+  _39 = (int) _38;
+  stmp_r_3.7_40 = _36 == 0 ? -1 : _39;
+  if (stmp_r_3.7_40 != 0)

Nothing uses vect_cst__30, the only real change is using 0 instead of -1
as the starting value for vect_r_3.1_24, and the REDUC_MAX stuff, where
the most important is that vect_r_3.6_35 is completely ignored with
INTEGER_INDUC_COND_EXPR and thus we rely on DCE to remove it.

So, my preference would be your patch goes in without the »
character and then we incrementally add SIMPLE_INTEGER_INDUC_COND_EXPR
which will be used if vectorizable_reduction determines possibilities
to use some value smaller than all iterations, and then tweak the cases
that can be handled with your INTEGER_INDUC_COND_EXPR or
SIMPLE_INTEGER_INDUC_COND_EXPR instead of COND_EXPR (negative step,
variable base).

	Jakub
Jakub Jelinek Dec. 11, 2017, 1:51 p.m. UTC | #13
On Mon, Dec 11, 2017 at 02:11:34PM +0100, Jakub Jelinek wrote:
> Thanks, it applies cleanly now
> > +  else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
> > +	    || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> > +		== INTEGER_INDUC_COND_REDUCTION))
> > +	   && reduc_fn == IFN_LAST)»
> 
> contains a character at the end of line that makes it not to compile.

Another thing is, as your patch is quite large, we need a copyright
assignment for the changes before we can accept it, see
https://gcc.gnu.org/contribute.html for details.

If you are already covered by an assignment of some company, please tell
us which one it is, otherwise contact us and we'll get you the needed
forms.

	Jakub
Kilian Verhetsel Dec. 11, 2017, 5 p.m. UTC | #14
Jakub Jelinek <jakub@redhat.com> writes:
> Of course it can be done efficiently, what we care most is that the body of
> the vectorized loop is efficient.

That's fair, I was looking at the x86 assembly being generated when a single
vectorized iteration was enough (because that is the context in which I
first encountered this bug):

    int f(unsigned int *x, unsigned int k) {
      unsigned int result = 8;
      for (unsigned int i = 0; i < 8; i++) {
        if (x[i] == k) result = i;
      }
      return result;
    }

where the vpand instruction this generates would have to be replaced
with a variable blend if the default value weren't 0 — although I had
not realized even SSE4.1 on x86 includes such an instruction, making
this point less relevant.

> Thanks, it applies cleanly now
> > +  else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
> > +	    || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> > +		== INTEGER_INDUC_COND_REDUCTION))
> > +	   && reduc_fn == IFN_LAST)»
>
> contains a character at the end of line that makes it not to compile.

My bad, I must have added this when I opened the patch file itself to
inspect it...

> Another thing is, as your patch is quite large, we need a copyright
> assignment for the changes before we can accept it, see
> https://gcc.gnu.org/contribute.html for details.
>
> If you are already covered by an assignment of some company, please tell
> us which one it is, otherwise contact us and we'll get you the needed
> forms.

I am not covered by any copyright assignment yet. Do I need to send you
any additional information?
Jakub Jelinek Dec. 11, 2017, 5:06 p.m. UTC | #15
On Mon, Dec 11, 2017 at 06:00:11PM +0100, Kilian Verhetsel wrote:
> Jakub Jelinek <jakub@redhat.com> writes:
> > Of course it can be done efficiently, what we care most is that the body of
> > the vectorized loop is efficient.
> 
> That's fair, I was looking at the x86 assembly being generated when a single
> vectorized iteration was enough (because that is the context in which I
> first encountered this bug):
> 
>     int f(unsigned int *x, unsigned int k) {
>       unsigned int result = 8;
>       for (unsigned int i = 0; i < 8; i++) {
>         if (x[i] == k) result = i;
>       }
>       return result;
>     }
> 
> where the vpand instruction this generates would have to be replaced
> with a variable blend if the default value weren't 0 — although I had
> not realized even SSE4.1 on x86 includes such an instruction, making
> this point less relevant.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80631#c6 where I've
attached so far untested prototype.  If it is added before your patch
makes it in, your patch would start by introducing another kind
(say SIMPLE_INTEGER_INDUC_COND_REDUCTION) and would use that for the
spots that are handled by the PR80631 patch as INTEGER_INDUC_COND_REDUCTION
right now and your code for the rest.  E.g. the above testcase with my
patch, because i is unsigned and base is the minimum of the type is emitted as
COND_REDUCTION, which is what your patch would improve.

> > Another thing is, as your patch is quite large, we need a copyright
> > assignment for the changes before we can accept it, see
> > https://gcc.gnu.org/contribute.html for details.
> >
> > If you are already covered by an assignment of some company, please tell
> > us which one it is, otherwise contact us and we'll get you the needed
> > forms.
> 
> I am not covered by any copyright assignment yet. Do I need to send you
> any additional information?

I'll send it offlist.

	Jakub
diff mbox series

Patch

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	(revision 254913)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -4316,7 +4316,7 @@  get_initial_defs_for_reduction (slp_tree slp_node,
 
 static void
 vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple *stmt,
-				  gimple *reduc_def_stmt,
+				  gimple *reduc_def_stmt, gimple *induct_stmt,
 				  int ncopies, enum tree_code reduc_code,
 				  vec<gimple *> reduction_phis,
                                   bool double_reduc, 
@@ -4477,7 +4477,9 @@  vect_create_epilog_for_reduction (vec<tree> vect_d
      The first match will be a 1 to allow 0 to be used for non-matching
      indexes.  If there are no matches at all then the vector will be all
      zeroes.  */
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+      || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+      == INTEGER_INDUC_COND_REDUCTION)
     {
       tree indx_before_incr, indx_after_incr;
       int nunits_out = TYPE_VECTOR_SUBPARTS (vectype);
@@ -4754,7 +4756,9 @@  vect_create_epilog_for_reduction (vec<tree> vect_d
   else
     new_phi_result = PHI_RESULT (new_phis[0]);
 
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+  if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+       || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+       == INTEGER_INDUC_COND_REDUCTION)
       && reduc_code != ERROR_MARK)
     {
       /* For condition reductions, we have a vector (NEW_PHI_RESULT) containing
@@ -4797,76 +4801,118 @@  vect_create_epilog_for_reduction (vec<tree> vect_d
 						    induction_index);
       gsi_insert_before (&exit_gsi, max_index_stmt, GSI_SAME_STMT);
 
-      /* Vector of {max_index, max_index, max_index,...}.  */
-      tree max_index_vec = make_ssa_name (index_vec_type);
-      tree max_index_vec_rhs = build_vector_from_val (index_vec_type,
-						      max_index);
-      gimple *max_index_vec_stmt = gimple_build_assign (max_index_vec,
-							max_index_vec_rhs);
-      gsi_insert_before (&exit_gsi, max_index_vec_stmt, GSI_SAME_STMT);
+      if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+	{
+	  /* Vector of {max_index, max_index, max_index,...}.  */
+	  tree max_index_vec = make_ssa_name (index_vec_type);
+	  tree max_index_vec_rhs = build_vector_from_val (index_vec_type,
+							  max_index);
+	  gimple *max_index_vec_stmt = gimple_build_assign (max_index_vec,
+							    max_index_vec_rhs);
+	  gsi_insert_before (&exit_gsi, max_index_vec_stmt, GSI_SAME_STMT);
 
-      /* Next we compare the new vector (MAX_INDEX_VEC) full of max indexes
-	 with the vector (INDUCTION_INDEX) of found indexes, choosing values
-	 from the data vector (NEW_PHI_RESULT) for matches, 0 (ZERO_VEC)
-	 otherwise.  Only one value should match, resulting in a vector
-	 (VEC_COND) with one data value and the rest zeros.
-	 In the case where the loop never made any matches, every index will
-	 match, resulting in a vector with all data values (which will all be
-	 the default value).  */
+	  /* Next we compare the new vector (MAX_INDEX_VEC) full of max indexes
+	     with the vector (INDUCTION_INDEX) of found indexes, choosing values
+	     from the data vector (NEW_PHI_RESULT) for matches, 0 (ZERO_VEC)
+	     otherwise.  Only one value should match, resulting in a vector
+	     (VEC_COND) with one data value and the rest zeros.  In the case
+	     where the loop never made any matches, every index will match,
+	     resulting in a vector with all data values (which will all be the
+	     default value).  */
 
-      /* Compare the max index vector to the vector of found indexes to find
-	 the position of the max value.  */
-      tree vec_compare = make_ssa_name (index_vec_cmp_type);
-      gimple *vec_compare_stmt = gimple_build_assign (vec_compare, EQ_EXPR,
-						      induction_index,
-						      max_index_vec);
-      gsi_insert_before (&exit_gsi, vec_compare_stmt, GSI_SAME_STMT);
+	  /* Compare the max index vector to the vector of found indexes to find
+	     the position of the max value.  */
+	  tree vec_compare = make_ssa_name (index_vec_cmp_type);
+	  gimple *vec_compare_stmt = gimple_build_assign (vec_compare, EQ_EXPR,
+							  induction_index,
+							  max_index_vec);
+	  gsi_insert_before (&exit_gsi, vec_compare_stmt, GSI_SAME_STMT);
 
-      /* Use the compare to choose either values from the data vector or
-	 zero.  */
-      tree vec_cond = make_ssa_name (vectype);
-      gimple *vec_cond_stmt = gimple_build_assign (vec_cond, VEC_COND_EXPR,
-						   vec_compare, new_phi_result,
-						   zero_vec);
-      gsi_insert_before (&exit_gsi, vec_cond_stmt, GSI_SAME_STMT);
+	  /* Use the compare to choose either values from the data vector or
+	     zero.  */
+	  tree vec_cond = make_ssa_name (vectype);
+	  gimple *vec_cond_stmt = gimple_build_assign (vec_cond, VEC_COND_EXPR,
+						       vec_compare,
+						       new_phi_result,
+						       zero_vec);
+	  gsi_insert_before (&exit_gsi, vec_cond_stmt, GSI_SAME_STMT);
 
-      /* Finally we need to extract the data value from the vector (VEC_COND)
-	 into a scalar (MATCHED_DATA_REDUC).  Logically we want to do a OR
-	 reduction, but because this doesn't exist, we can use a MAX reduction
-	 instead.  The data value might be signed or a float so we need to cast
-	 it first.
-	 In the case where the loop never made any matches, the data values are
-	 all identical, and so will reduce down correctly.  */
+	  /* Finally we need to extract the data value from the vector
+	     (VEC_COND) into a scalar (MATCHED_DATA_REDUC).  Logically we want
+	     to do a OR reduction, but because this doesn't exist, we can use a
+	     MAX reduction instead.  The data value might be signed or a float
+	     so we need to cast it first.  In the case where the loop never made
+	     any matches, the data values are all identical, and so will reduce
+	     down correctly.  */
 
-      /* Make the matched data values unsigned.  */
-      tree vec_cond_cast = make_ssa_name (vectype_unsigned);
-      tree vec_cond_cast_rhs = build1 (VIEW_CONVERT_EXPR, vectype_unsigned,
-				       vec_cond);
-      gimple *vec_cond_cast_stmt = gimple_build_assign (vec_cond_cast,
-							VIEW_CONVERT_EXPR,
-							vec_cond_cast_rhs);
-      gsi_insert_before (&exit_gsi, vec_cond_cast_stmt, GSI_SAME_STMT);
+	  /* Make the matched data values unsigned.  */
+	  tree vec_cond_cast = make_ssa_name (vectype_unsigned);
+	  tree vec_cond_cast_rhs = build1 (VIEW_CONVERT_EXPR, vectype_unsigned,
+					   vec_cond);
+	  gimple *vec_cond_cast_stmt = gimple_build_assign (vec_cond_cast,
+							    VIEW_CONVERT_EXPR,
+							    vec_cond_cast_rhs);
+	  gsi_insert_before (&exit_gsi, vec_cond_cast_stmt, GSI_SAME_STMT);
 
-      /* Reduce down to a scalar value.  */
-      tree data_reduc = make_ssa_name (scalar_type_unsigned);
-      optab ot = optab_for_tree_code (REDUC_MAX_EXPR, vectype_unsigned,
-				      optab_default);
-      gcc_assert (optab_handler (ot, TYPE_MODE (vectype_unsigned))
-		  != CODE_FOR_nothing);
-      gimple *data_reduc_stmt = gimple_build_assign (data_reduc,
-						     REDUC_MAX_EXPR,
-						     vec_cond_cast);
-      gsi_insert_before (&exit_gsi, data_reduc_stmt, GSI_SAME_STMT);
+	  /* Reduce down to a scalar value.  */
+	  tree data_reduc = make_ssa_name (scalar_type_unsigned);
+	  optab ot = optab_for_tree_code (REDUC_MAX_EXPR, vectype_unsigned,
+					  optab_default);
+	  gcc_assert (optab_handler (ot, TYPE_MODE (vectype_unsigned))
+		      != CODE_FOR_nothing);
+	  gimple *data_reduc_stmt = gimple_build_assign (data_reduc,
+							 REDUC_MAX_EXPR,
+							 vec_cond_cast);
+	  gsi_insert_before (&exit_gsi, data_reduc_stmt, GSI_SAME_STMT);
 
-      /* Convert the reduced value back to the result type and set as the
-	 result.  */
-      gimple_seq stmts = NULL;
-      new_temp = gimple_build (&stmts, VIEW_CONVERT_EXPR, scalar_type,
-			       data_reduc);
-      gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+	  /* Convert the reduced value back to the result type and set as the
+	     result.  */
+	  gimple_seq stmts = NULL;
+	  new_temp = gimple_build (&stmts, VIEW_CONVERT_EXPR, scalar_type,
+				   data_reduc);
+	  gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+	}
+      else
+	{
+	  /* Retrieve the index of the last match, and adjust the result value
+	     if there were no matches.  */
+	  gimple_seq stmts = NULL;
+
+	  stmt_vec_info induct_stmt_info = vinfo_for_stmt (induct_stmt);
+
+	  tree base = PHI_ARG_DEF_FROM_EDGE (induct_stmt,
+					     loop_preheader_edge (loop));
+	  tree step = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (induct_stmt_info);
+
+	  tree one = build_one_cst (TREE_TYPE (max_index));
+	  tree index = gimple_build (&stmts, MINUS_EXPR, TREE_TYPE (max_index),
+				     max_index, one);
+
+	  index = gimple_convert (&stmts, scalar_type, index);
+
+	  tree offset = gimple_build (&stmts, MULT_EXPR, scalar_type,
+				      step, index);
+	  tree result = gimple_build (&stmts, PLUS_EXPR, scalar_type,
+				      base, offset);
+
+	  tree zero = build_zero_cst (TREE_TYPE (max_index));
+	  tree zcompare = build2 (EQ_EXPR, boolean_type_node,
+				  max_index, zero);
+
+	  gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+
+	  new_temp = make_ssa_name (new_scalar_dest);
+	  epilog_stmt = gimple_build_assign (new_temp, COND_EXPR, zcompare,
+					     initial_def, result);
+
+	  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+	}
+
       scalar_results.safe_push (new_temp);
     }
-  else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+  else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+	    || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+	    == INTEGER_INDUC_COND_REDUCTION)
 	   && reduc_code == ERROR_MARK)
     {
       /* Condition redution without supported REDUC_MAX_EXPR.  Generate
@@ -4907,8 +4953,12 @@  vect_create_epilog_for_reduction (vec<tree> vect_d
 	    {
 	      tree new_idx_val = idx_val;
 	      tree new_val = val;
-	      if (off != v_size - el_size)
+	      if (off != v_size - el_size
+		  || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+		  == INTEGER_INDUC_COND_REDUCTION)
 		{
+		  /* For integer induction, the index of the last match
+		     must always be known.  */
 		  new_idx_val = make_ssa_name (idx_eltype);
 		  epilog_stmt = gimple_build_assign (new_idx_val,
 						     MAX_EXPR, idx_val,
@@ -4928,12 +4978,52 @@  vect_create_epilog_for_reduction (vec<tree> vect_d
 	      val = new_val;
 	    }
 	}
-      /* Convert the reduced value back to the result type and set as the
-	 result.  */
+
       gimple_seq stmts = NULL;
-      val = gimple_convert (&stmts, scalar_type, val);
-      gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
-      scalar_results.safe_push (val);
+
+      if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+	{
+	  /* Convert the reduced value back to the result type and set as the
+	     result.  */
+	  val = gimple_convert (&stmts, scalar_type, val);
+	  gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+	  scalar_results.safe_push (val);
+	}
+      else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+	       == INTEGER_INDUC_COND_REDUCTION)
+	{
+	  /* Retrieve the index of the last match, and adjust the result value
+	     if there were no matches.  */
+	  stmt_vec_info induct_stmt_info = vinfo_for_stmt (induct_stmt);
+
+	  tree base = PHI_ARG_DEF_FROM_EDGE (induct_stmt,
+					     loop_preheader_edge (loop));
+	  tree step = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (induct_stmt_info);
+
+	  tree one = build_one_cst (idx_eltype);
+	  tree index = gimple_build (&stmts, MINUS_EXPR, idx_eltype,
+				     idx_val, one);
+
+	  index = gimple_convert (&stmts, scalar_type, index);
+
+	  tree offset = gimple_build (&stmts, MULT_EXPR, scalar_type,
+				      step, index);
+	  tree result = gimple_build (&stmts, PLUS_EXPR, scalar_type,
+				      base, offset);
+
+	  gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+
+	  tree zero = build_zero_cst (TREE_TYPE (idx_val));
+	  tree zcompare = build2 (EQ_EXPR, boolean_type_node,
+				  idx_val, zero);
+
+	  new_temp = make_ssa_name (new_scalar_dest);
+	  epilog_stmt = gimple_build_assign (new_temp, COND_EXPR, zcompare,
+					     initial_def, result);
+	  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+
+	  scalar_results.safe_push (new_temp);
+	}
     }
 
   /* 2.3 Create the reduction code, using one of the three schemes described
@@ -5625,6 +5715,7 @@  vectorizable_reduction (gimple *stmt, gimple_stmt_
   bool first_p = true;
   tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type = NULL_TREE;
   tree cond_reduc_val = NULL_TREE;
+  gimple *induct_stmt = NULL;
 
   /* Make sure it was already recognized as a reduction computation.  */
   if (STMT_VINFO_DEF_TYPE (vinfo_for_stmt (stmt)) != vect_reduction_def
@@ -5869,7 +5960,10 @@  vectorizable_reduction (gimple *stmt, gimple_stmt_
 	    }
 	  if (dt == vect_induction_def && def_stmt != NULL
 	      && is_nonwrapping_integer_induction (def_stmt, loop))
-	    cond_reduc_dt = dt;
+	    {
+	      cond_reduc_dt = dt;
+	      induct_stmt = def_stmt;
+	    }
 	}
     }
 
@@ -6461,7 +6555,7 @@  vectorizable_reduction (gimple *stmt, gimple_stmt_
     vect_defs[0] = gimple_assign_lhs (*vec_stmt);
 
   vect_create_epilog_for_reduction (vect_defs, stmt, reduc_def_stmt,
-				    epilog_copies,
+				    induct_stmt, epilog_copies,
                                     epilog_reduc_code, phis,
 				    double_reduc, slp_node, slp_node_instance);