diff mbox

Small ubsan vector arith optimization to fix one testcase from PR sanitizer/79904

Message ID 20170307185802.GQ22703@tucnak
State New
Headers show

Commit Message

Jakub Jelinek March 7, 2017, 6:58 p.m. UTC
Hi!

If any of the operands of the UBSAN_{ADD,SUB,MUL}_OVERFLOW ifn with
vector operand is a uniform vector, expanding it as VCE on the VECTOR_CST
followed by ARRAY_REF with variable index in the loop is unnecessarily
expensive and nothing fixes it afterwards.
This works around ICE on s390 at least for the uniform vectors.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-03-07  Jakub Jelinek  <jakub@redhat.com>

	PR sanitizer/79904
	* internal-fn.c (expand_vector_ubsan_overflow): If arg0 or arg1
	is a uniform vector, use uniform_vector_p return value instead of
	building ARRAY_REF on folded VIEW_CONVERT_EXPR to array type.

	* gcc.dg/ubsan/pr79904.c: New test.


	Jakub

Comments

Richard Biener March 8, 2017, 8:15 a.m. UTC | #1
On Tue, 7 Mar 2017, Jakub Jelinek wrote:

> Hi!
> 
> If any of the operands of the UBSAN_{ADD,SUB,MUL}_OVERFLOW ifn with
> vector operand is a uniform vector, expanding it as VCE on the VECTOR_CST
> followed by ARRAY_REF with variable index in the loop is unnecessarily
> expensive and nothing fixes it afterwards.
> This works around ICE on s390 at least for the uniform vectors.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.  Note that another option for the loopy case is to do

  for (;;)
    {
      vec >> by-one-elt;
      elt = BIT_FIELD_REF <vec, index-zero>;
    }

when whole-vector shifts are available (they are constructed by
VEC_PERM_EXPR if vec_perm_const_ok for that mask).  If you end up
doing variable-index array accesses you can as well spill the
vector to memory and use memory accesses on that.  Not sure how
to arrange that from this part of the expander.

Unsure what the ICE you see with s390 is about...

Thanks,
Richard.



> 2017-03-07  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR sanitizer/79904
> 	* internal-fn.c (expand_vector_ubsan_overflow): If arg0 or arg1
> 	is a uniform vector, use uniform_vector_p return value instead of
> 	building ARRAY_REF on folded VIEW_CONVERT_EXPR to array type.
> 
> 	* gcc.dg/ubsan/pr79904.c: New test.
> 
> --- gcc/internal-fn.c.jj	2017-02-23 08:48:40.000000000 +0100
> +++ gcc/internal-fn.c	2017-03-07 11:55:56.261465702 +0100
> @@ -1869,12 +1869,20 @@ expand_vector_ubsan_overflow (location_t
>        if (cnt > 4)
>  	{
>  	  tree atype = build_array_type_nelts (eltype, cnt);
> -	  op0 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg0);
> -	  op0 = build4_loc (loc, ARRAY_REF, eltype, op0, cntv,
> -			    NULL_TREE, NULL_TREE);
> -	  op1 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg1);
> -	  op1 = build4_loc (loc, ARRAY_REF, eltype, op1, cntv,
> -			    NULL_TREE, NULL_TREE);
> +	  op0 = uniform_vector_p (arg0);
> +	  if (op0 == NULL_TREE)
> +	    {
> +	      op0 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg0);
> +	      op0 = build4_loc (loc, ARRAY_REF, eltype, op0, cntv,
> +				NULL_TREE, NULL_TREE);
> +	    }
> +	  op1 = uniform_vector_p (arg1);
> +	  if (op1 == NULL_TREE)
> +	    {
> +	      op1 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg1);
> +	      op1 = build4_loc (loc, ARRAY_REF, eltype, op1, cntv,
> +				NULL_TREE, NULL_TREE);
> +	    }
>  	  if (resv)
>  	    {
>  	      res = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, resv);
> --- gcc/testsuite/gcc.dg/ubsan/pr79904.c.jj	2017-03-07 11:58:53.266120958 +0100
> +++ gcc/testsuite/gcc.dg/ubsan/pr79904.c	2017-03-07 11:58:46.000000000 +0100
> @@ -0,0 +1,11 @@
> +/* PR sanitizer/79904 */
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize=signed-integer-overflow -Wno-psabi" } */
> +
> +typedef signed char V __attribute__((vector_size (8))); 
> +
> +void
> +foo (V *a) 
> +{ 
> +  *a = *a * 3; 
> +}
> 
> 	Jakub
> 
>
Jakub Jelinek March 8, 2017, 9:10 a.m. UTC | #2
On Wed, Mar 08, 2017 at 09:15:05AM +0100, Richard Biener wrote:
> Ok.  Note that another option for the loopy case is to do
> 
>   for (;;)
>     {
>       vec >> by-one-elt;
>       elt = BIT_FIELD_REF <vec, index-zero>;
>     }

Indeed, that is a possibility, but I guess I'd need to construct
the result similarly if resv is non-NULL.  Also, not sure about big endian
vectors and whether BIT_FIELD_REF with zero or size - elt_size is
more efficient there.

In any case, the PR was about s390 without vectors enabled, so this wouldn't
apply.

> when whole-vector shifts are available (they are constructed by
> VEC_PERM_EXPR if vec_perm_const_ok for that mask).  If you end up
> doing variable-index array accesses you can as well spill the
> vector to memory and use memory accesses on that.  Not sure how
> to arrange that from this part of the expander.

Shouldn't something else during the expansion force it into memory if it is
more efficient to expand it that way?  Apparently it is forced into memory
on s390 and the ICE is that the backend doesn't like something on it.

	Jakub
Richard Biener March 8, 2017, 10:03 a.m. UTC | #3
On Wed, 8 Mar 2017, Jakub Jelinek wrote:

> On Wed, Mar 08, 2017 at 09:15:05AM +0100, Richard Biener wrote:
> > Ok.  Note that another option for the loopy case is to do
> > 
> >   for (;;)
> >     {
> >       vec >> by-one-elt;
> >       elt = BIT_FIELD_REF <vec, index-zero>;
> >     }
> 
> Indeed, that is a possibility, but I guess I'd need to construct
> the result similarly if resv is non-NULL.  Also, not sure about big endian
> vectors and whether BIT_FIELD_REF with zero or size - elt_size is
> more efficient there.
> 
> In any case, the PR was about s390 without vectors enabled, so this wouldn't
> apply.
> 
> > when whole-vector shifts are available (they are constructed by
> > VEC_PERM_EXPR if vec_perm_const_ok for that mask).  If you end up
> > doing variable-index array accesses you can as well spill the
> > vector to memory and use memory accesses on that.  Not sure how
> > to arrange that from this part of the expander.
> 
> Shouldn't something else during the expansion force it into memory if it is
> more efficient to expand it that way?  Apparently it is forced into memory

Possibly - but it might end up spilling in the loop itself and thus be
rather inefficient?

> on s390 and the ICE is that the backend doesn't like something on it.

Could be - as I said I didn't look into what the ICE actually is.

Richard.
Jakub Jelinek March 8, 2017, 10:06 a.m. UTC | #4
On Wed, Mar 08, 2017 at 11:03:53AM +0100, Richard Biener wrote:
> On Wed, 8 Mar 2017, Jakub Jelinek wrote:
> 
> > On Wed, Mar 08, 2017 at 09:15:05AM +0100, Richard Biener wrote:
> > > Ok.  Note that another option for the loopy case is to do
> > > 
> > >   for (;;)
> > >     {
> > >       vec >> by-one-elt;
> > >       elt = BIT_FIELD_REF <vec, index-zero>;
> > >     }
> > 
> > Indeed, that is a possibility, but I guess I'd need to construct
> > the result similarly if resv is non-NULL.  Also, not sure about big endian
> > vectors and whether BIT_FIELD_REF with zero or size - elt_size is
> > more efficient there.
> > 
> > In any case, the PR was about s390 without vectors enabled, so this wouldn't
> > apply.
> > 
> > > when whole-vector shifts are available (they are constructed by
> > > VEC_PERM_EXPR if vec_perm_const_ok for that mask).  If you end up
> > > doing variable-index array accesses you can as well spill the
> > > vector to memory and use memory accesses on that.  Not sure how
> > > to arrange that from this part of the expander.
> > 
> > Shouldn't something else during the expansion force it into memory if it is
> > more efficient to expand it that way?  Apparently it is forced into memory
> 
> Possibly - but it might end up spilling in the loop itself and thus be
> rather inefficient?

Ok.  That said, this is -fsanitize=undefined which slows down code anyway,
so making it more efficient can be done in GCC8 IMNSHO.

	Jakub
Marek Polacek March 8, 2017, 10:11 a.m. UTC | #5
On Wed, Mar 08, 2017 at 11:06:38AM +0100, Jakub Jelinek wrote:
> On Wed, Mar 08, 2017 at 11:03:53AM +0100, Richard Biener wrote:
> > On Wed, 8 Mar 2017, Jakub Jelinek wrote:
> > 
> > > On Wed, Mar 08, 2017 at 09:15:05AM +0100, Richard Biener wrote:
> > > > Ok.  Note that another option for the loopy case is to do
> > > > 
> > > >   for (;;)
> > > >     {
> > > >       vec >> by-one-elt;
> > > >       elt = BIT_FIELD_REF <vec, index-zero>;
> > > >     }
> > > 
> > > Indeed, that is a possibility, but I guess I'd need to construct
> > > the result similarly if resv is non-NULL.  Also, not sure about big endian
> > > vectors and whether BIT_FIELD_REF with zero or size - elt_size is
> > > more efficient there.
> > > 
> > > In any case, the PR was about s390 without vectors enabled, so this wouldn't
> > > apply.
> > > 
> > > > when whole-vector shifts are available (they are constructed by
> > > > VEC_PERM_EXPR if vec_perm_const_ok for that mask).  If you end up
> > > > doing variable-index array accesses you can as well spill the
> > > > vector to memory and use memory accesses on that.  Not sure how
> > > > to arrange that from this part of the expander.
> > > 
> > > Shouldn't something else during the expansion force it into memory if it is
> > > more efficient to expand it that way?  Apparently it is forced into memory
> > 
> > Possibly - but it might end up spilling in the loop itself and thus be
> > rather inefficient?
> 
> Ok.  That said, this is -fsanitize=undefined which slows down code anyway,
> so making it more efficient can be done in GCC8 IMNSHO.

Indeed -- I'd push out that to 8.

	Marek
diff mbox

Patch

--- gcc/internal-fn.c.jj	2017-02-23 08:48:40.000000000 +0100
+++ gcc/internal-fn.c	2017-03-07 11:55:56.261465702 +0100
@@ -1869,12 +1869,20 @@  expand_vector_ubsan_overflow (location_t
       if (cnt > 4)
 	{
 	  tree atype = build_array_type_nelts (eltype, cnt);
-	  op0 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg0);
-	  op0 = build4_loc (loc, ARRAY_REF, eltype, op0, cntv,
-			    NULL_TREE, NULL_TREE);
-	  op1 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg1);
-	  op1 = build4_loc (loc, ARRAY_REF, eltype, op1, cntv,
-			    NULL_TREE, NULL_TREE);
+	  op0 = uniform_vector_p (arg0);
+	  if (op0 == NULL_TREE)
+	    {
+	      op0 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg0);
+	      op0 = build4_loc (loc, ARRAY_REF, eltype, op0, cntv,
+				NULL_TREE, NULL_TREE);
+	    }
+	  op1 = uniform_vector_p (arg1);
+	  if (op1 == NULL_TREE)
+	    {
+	      op1 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg1);
+	      op1 = build4_loc (loc, ARRAY_REF, eltype, op1, cntv,
+				NULL_TREE, NULL_TREE);
+	    }
 	  if (resv)
 	    {
 	      res = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, resv);
--- gcc/testsuite/gcc.dg/ubsan/pr79904.c.jj	2017-03-07 11:58:53.266120958 +0100
+++ gcc/testsuite/gcc.dg/ubsan/pr79904.c	2017-03-07 11:58:46.000000000 +0100
@@ -0,0 +1,11 @@ 
+/* PR sanitizer/79904 */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=signed-integer-overflow -Wno-psabi" } */
+
+typedef signed char V __attribute__((vector_size (8))); 
+
+void
+foo (V *a) 
+{ 
+  *a = *a * 3; 
+}