Message ID | 20170307185802.GQ22703@tucnak |
---|---|
State | New |
Headers | show |
On Tue, 7 Mar 2017, Jakub Jelinek wrote: > Hi! > > If any of the operands of the UBSAN_{ADD,SUB,MUL}_OVERFLOW ifn with > vector operand is a uniform vector, expanding it as VCE on the VECTOR_CST > followed by ARRAY_REF with variable index in the loop is unnecessarily > expensive and nothing fixes it afterwards. > This works around ICE on s390 at least for the uniform vectors. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Note that another option for the loopy case is to do for (;;) { vec >> by-one-elt; elt = BIT_FIELD_REF <vec, index-zero>; } when whole-vector shifts are available (they are constructed by VEC_PERM_EXPR if vec_perm_const_ok for that mask). If you end up doing variable-index array accesses you can as well spill the vector to memory and use memory accesses on that. Not sure how to arrange that from this part of the expander. Unsure what the ICE you see with s390 is about... Thanks, Richard. > 2017-03-07 Jakub Jelinek <jakub@redhat.com> > > PR sanitizer/79904 > * internal-fn.c (expand_vector_ubsan_overflow): If arg0 or arg1 > is a uniform vector, use uniform_vector_p return value instead of > building ARRAY_REF on folded VIEW_CONVERT_EXPR to array type. > > * gcc.dg/ubsan/pr79904.c: New test. > > --- gcc/internal-fn.c.jj 2017-02-23 08:48:40.000000000 +0100 > +++ gcc/internal-fn.c 2017-03-07 11:55:56.261465702 +0100 > @@ -1869,12 +1869,20 @@ expand_vector_ubsan_overflow (location_t > if (cnt > 4) > { > tree atype = build_array_type_nelts (eltype, cnt); > - op0 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg0); > - op0 = build4_loc (loc, ARRAY_REF, eltype, op0, cntv, > - NULL_TREE, NULL_TREE); > - op1 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg1); > - op1 = build4_loc (loc, ARRAY_REF, eltype, op1, cntv, > - NULL_TREE, NULL_TREE); > + op0 = uniform_vector_p (arg0); > + if (op0 == NULL_TREE) > + { > + op0 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg0); > + op0 = build4_loc (loc, ARRAY_REF, eltype, op0, cntv, > + NULL_TREE, NULL_TREE); > + } > + op1 = uniform_vector_p (arg1); > + if (op1 == NULL_TREE) > + { > + op1 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg1); > + op1 = build4_loc (loc, ARRAY_REF, eltype, op1, cntv, > + NULL_TREE, NULL_TREE); > + } > if (resv) > { > res = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, resv); > --- gcc/testsuite/gcc.dg/ubsan/pr79904.c.jj 2017-03-07 11:58:53.266120958 +0100 > +++ gcc/testsuite/gcc.dg/ubsan/pr79904.c 2017-03-07 11:58:46.000000000 +0100 > @@ -0,0 +1,11 @@ > +/* PR sanitizer/79904 */ > +/* { dg-do compile } */ > +/* { dg-options "-fsanitize=signed-integer-overflow -Wno-psabi" } */ > + > +typedef signed char V __attribute__((vector_size (8))); > + > +void > +foo (V *a) > +{ > + *a = *a * 3; > +} > > Jakub > >
On Wed, Mar 08, 2017 at 09:15:05AM +0100, Richard Biener wrote: > Ok. Note that another option for the loopy case is to do > > for (;;) > { > vec >> by-one-elt; > elt = BIT_FIELD_REF <vec, index-zero>; > } Indeed, that is a possibility, but I guess I'd need to construct the result similarly if resv is non-NULL. Also, not sure about big endian vectors and whether BIT_FIELD_REF with zero or size - elt_size is more efficient there. In any case, the PR was about s390 without vectors enabled, so this wouldn't apply. > when whole-vector shifts are available (they are constructed by > VEC_PERM_EXPR if vec_perm_const_ok for that mask). If you end up > doing variable-index array accesses you can as well spill the > vector to memory and use memory accesses on that. Not sure how > to arrange that from this part of the expander. Shouldn't something else during the expansion force it into memory if it is more efficient to expand it that way? Apparently it is forced into memory on s390 and the ICE is that the backend doesn't like something on it. Jakub
On Wed, 8 Mar 2017, Jakub Jelinek wrote: > On Wed, Mar 08, 2017 at 09:15:05AM +0100, Richard Biener wrote: > > Ok. Note that another option for the loopy case is to do > > > > for (;;) > > { > > vec >> by-one-elt; > > elt = BIT_FIELD_REF <vec, index-zero>; > > } > > Indeed, that is a possibility, but I guess I'd need to construct > the result similarly if resv is non-NULL. Also, not sure about big endian > vectors and whether BIT_FIELD_REF with zero or size - elt_size is > more efficient there. > > In any case, the PR was about s390 without vectors enabled, so this wouldn't > apply. > > > when whole-vector shifts are available (they are constructed by > > VEC_PERM_EXPR if vec_perm_const_ok for that mask). If you end up > > doing variable-index array accesses you can as well spill the > > vector to memory and use memory accesses on that. Not sure how > > to arrange that from this part of the expander. > > Shouldn't something else during the expansion force it into memory if it is > more efficient to expand it that way? Apparently it is forced into memory Possibly - but it might end up spilling in the loop itself and thus be rather inefficient? > on s390 and the ICE is that the backend doesn't like something on it. Could be - as I said I didn't look into what the ICE actually is. Richard.
On Wed, Mar 08, 2017 at 11:03:53AM +0100, Richard Biener wrote: > On Wed, 8 Mar 2017, Jakub Jelinek wrote: > > > On Wed, Mar 08, 2017 at 09:15:05AM +0100, Richard Biener wrote: > > > Ok. Note that another option for the loopy case is to do > > > > > > for (;;) > > > { > > > vec >> by-one-elt; > > > elt = BIT_FIELD_REF <vec, index-zero>; > > > } > > > > Indeed, that is a possibility, but I guess I'd need to construct > > the result similarly if resv is non-NULL. Also, not sure about big endian > > vectors and whether BIT_FIELD_REF with zero or size - elt_size is > > more efficient there. > > > > In any case, the PR was about s390 without vectors enabled, so this wouldn't > > apply. > > > > > when whole-vector shifts are available (they are constructed by > > > VEC_PERM_EXPR if vec_perm_const_ok for that mask). If you end up > > > doing variable-index array accesses you can as well spill the > > > vector to memory and use memory accesses on that. Not sure how > > > to arrange that from this part of the expander. > > > > Shouldn't something else during the expansion force it into memory if it is > > more efficient to expand it that way? Apparently it is forced into memory > > Possibly - but it might end up spilling in the loop itself and thus be > rather inefficient? Ok. That said, this is -fsanitize=undefined which slows down code anyway, so making it more efficient can be done in GCC8 IMNSHO. Jakub
On Wed, Mar 08, 2017 at 11:06:38AM +0100, Jakub Jelinek wrote: > On Wed, Mar 08, 2017 at 11:03:53AM +0100, Richard Biener wrote: > > On Wed, 8 Mar 2017, Jakub Jelinek wrote: > > > > > On Wed, Mar 08, 2017 at 09:15:05AM +0100, Richard Biener wrote: > > > > Ok. Note that another option for the loopy case is to do > > > > > > > > for (;;) > > > > { > > > > vec >> by-one-elt; > > > > elt = BIT_FIELD_REF <vec, index-zero>; > > > > } > > > > > > Indeed, that is a possibility, but I guess I'd need to construct > > > the result similarly if resv is non-NULL. Also, not sure about big endian > > > vectors and whether BIT_FIELD_REF with zero or size - elt_size is > > > more efficient there. > > > > > > In any case, the PR was about s390 without vectors enabled, so this wouldn't > > > apply. > > > > > > > when whole-vector shifts are available (they are constructed by > > > > VEC_PERM_EXPR if vec_perm_const_ok for that mask). If you end up > > > > doing variable-index array accesses you can as well spill the > > > > vector to memory and use memory accesses on that. Not sure how > > > > to arrange that from this part of the expander. > > > > > > Shouldn't something else during the expansion force it into memory if it is > > > more efficient to expand it that way? Apparently it is forced into memory > > > > Possibly - but it might end up spilling in the loop itself and thus be > > rather inefficient? > > Ok. That said, this is -fsanitize=undefined which slows down code anyway, > so making it more efficient can be done in GCC8 IMNSHO. Indeed -- I'd push out that to 8. Marek
--- gcc/internal-fn.c.jj 2017-02-23 08:48:40.000000000 +0100 +++ gcc/internal-fn.c 2017-03-07 11:55:56.261465702 +0100 @@ -1869,12 +1869,20 @@ expand_vector_ubsan_overflow (location_t if (cnt > 4) { tree atype = build_array_type_nelts (eltype, cnt); - op0 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg0); - op0 = build4_loc (loc, ARRAY_REF, eltype, op0, cntv, - NULL_TREE, NULL_TREE); - op1 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg1); - op1 = build4_loc (loc, ARRAY_REF, eltype, op1, cntv, - NULL_TREE, NULL_TREE); + op0 = uniform_vector_p (arg0); + if (op0 == NULL_TREE) + { + op0 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg0); + op0 = build4_loc (loc, ARRAY_REF, eltype, op0, cntv, + NULL_TREE, NULL_TREE); + } + op1 = uniform_vector_p (arg1); + if (op1 == NULL_TREE) + { + op1 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg1); + op1 = build4_loc (loc, ARRAY_REF, eltype, op1, cntv, + NULL_TREE, NULL_TREE); + } if (resv) { res = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, resv); --- gcc/testsuite/gcc.dg/ubsan/pr79904.c.jj 2017-03-07 11:58:53.266120958 +0100 +++ gcc/testsuite/gcc.dg/ubsan/pr79904.c 2017-03-07 11:58:46.000000000 +0100 @@ -0,0 +1,11 @@ +/* PR sanitizer/79904 */ +/* { dg-do compile } */ +/* { dg-options "-fsanitize=signed-integer-overflow -Wno-psabi" } */ + +typedef signed char V __attribute__((vector_size (8))); + +void +foo (V *a) +{ + *a = *a * 3; +}