Message ID | alpine.DEB.2.20.2201251820040.11348@tpp.orcam.me.uk |
---|---|
State | New |
Headers | show |
Series | [GCC13] Don't force side effects for hardware vector element broadcast | expand |
On Thu, Jan 27, 2022 at 1:16 PM Maciej W. Rozycki <macro@embecosm.com> wrote: > > Do not mark vector element broadcast resulting from OpenCL operations as > having side effects for targets that have a suitable hardware operation, > so that the `vec_duplicateM' standard RTL pattern can be directly used > for them. This does not happen currently, because any RTL pattern named > `vec_duplicateM' cannot express the side effects presumed, and therefore > a `!TREE_SIDE_EFFECTS' condition placed in `store_constructor' correctly > prevents any `vec_duplicateM' pattern from being used. > > This side-effect annotation was introduced for the C frontend with: > <https://gcc.gnu.org/ml/gcc-patches/2010-08/msg01075.html>, and then: > <https://gcc.gnu.org/ml/gcc-patches/2011-08/msg00891.html> ("Scalar > vector binary operation"), and finally landed with commit 0e3a99ae913c > ("c-typeck.c (scalar_to_vector): New function."), along with original > support for scalar-to-vector OpenCL operations. > > Support for these operations was then gradually added to C++ with: > <https://gcc.gnu.org/ml/gcc-patches/2012-09/msg01557.html>, and then: > <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg00778.html> ("[C++] Mixed > scalar-vector operations"), landing with commit a212e43fca22 ("re PR > c++/54427 (Expose more vector extensions)"), followed by: > <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01665.html>, and then: > <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg02253.html> ("[C++] > Handle ?: for vectors"), landing with commit 93100c6b5b45 ("re PR > c++/54427 (Expose more vector extensions)"). > > And side-effect annotation for C++ was retrofitted, again gradually, > with: <https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00254.html> ("[C++] > Missing save_expr in vector-scalar ops"), landing with commit > 6698175d1591 ("typeck.c (cp_build_binary_op): Call save_expr before > build_vector_from_val."), and then: > <https://gcc.gnu.org/ml/gcc-patches/2013-07/msg00233.html> ("vector > conditional expression with scalar arguments"), landing with commit > 07298ffd6f7c ("call.c (build_conditional_expr_1): Handle the case with > 1 vector and 2 scalars.") > > This was all long before we gained support for the `vec_duplicateM' > standard RTL pattern, with commit be4c1d4a42c5 ("Add VEC_DUPLICATE_EXPR > and associated optab") and it may have had sense for where the element > broadcast operation was always open-coded. However where provided by > hardware the element broadcast operation is just like any other vector > operation, presumably cheap, and whether to repeat it or not should be > based on the cost of the operation and not hardwired. > > So there is no point in artificially marking the operation as having a > side effect for the lone purpose of preventing repetition. This does > not currently prevent targets like the Aarch64 and MIPS ones from using > the hardware element broadcast operation, but it requires the backend to > wire it explicitly via the `vec_initMN' standard RTL pattern rather than > relying on the middle end to pick it up via the `vec_duplicateM' pattern > automatically. This change lifts this limitation. > > gcc/ > * c/c-typeck.cc (build_binary_op) Do not annotate vector element > broadcast operations as having a side effect if there is such an > operation provided by the backend. > * cp/call.cc (build_conditional_expr): Likewise. > * cp/typeck.cc (cp_build_binary_op): Likewise. > --- > Hi, > > I've come across this while working on an out-of-tree microarchitecture, > which does not require a `vec_initMN' RTL pattern, however has hardware > support suitable for cheap `vec_duplicateM'. I could not persuade the > middle end to make use of the new `vec_duplicateM' pattern however until I > removed the side-effect anbnotation. So I think this is a move in the > right direction even if we do not have support for said microarchitecture > at the moment. > > For some reason I could not persuade our current Aarch64 code to use the > `vec_duplicateM' pattern directly either. This is because at the time > `store_constructor' is called the vector mode for the element broadcast > operation is say `V4SFmode' (for a vector of floats), however there is no > corresponding `vec_duplicatev4sf' pattern. Code in the `vec_initv4sfsf' > pattern does the right thing though, referring to `aarch64_simd_dupv4sf' > by hand instead. There is a `vec_duplicatevnx4sf' pattern corresponding > to `VNx4SFmode', but I was unable to figure out how to activate it. > > This change has passed regression-testing with the `aarch64-linux-gnu' > target (with `-march=armv9-a') and QEMU in the user emulation mode across > all the GCC languages and libraries. > > NB there are indentation issues with code surrounding places I have > modified, but I have chosen not to address them so as not to obfuscate > this change. Likewise there are unnecessary compound statements used in > the switch statements in the C++ frontend parts modified. > > Any questions or comments? Otherwise OK once GCC 13 has opened? > > Maciej > --- > gcc/c/c-typeck.cc | 9 +++++++-- > gcc/cp/call.cc | 19 +++++++++++++++---- > gcc/cp/typeck.cc | 9 +++++++-- > 3 files changed, 29 insertions(+), 8 deletions(-) > > gcc-scalar-to-vector-no-side-effect.diff > Index: gcc/gcc/c/c-typeck.cc > =================================================================== > --- gcc.orig/gcc/c/c-typeck.cc > +++ gcc/gcc/c/c-typeck.cc > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. > #include "gomp-constants.h" > #include "spellcheck-tree.h" > #include "gcc-rich-location.h" > +#include "optabs-query.h" > #include "stringpool.h" > #include "attribs.h" > #include "asan.h" > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en > bool maybe_const = true; > tree sc; > sc = c_fully_fold (op0, false, &maybe_const); > - sc = save_expr (sc); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (type1)) == CODE_FOR_nothing) > + sc = save_expr (sc); This doesn't make much sense - I suppose the CONSTRUCTOR retains TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE and thus should have been cleared during gimplification or in the end ignored by RTL expansion. You do not add a testcase so it's hard to see what goes wrong where exactly. Richard. > sc = convert (TREE_TYPE (type1), sc); > op0 = build_vector_from_val (type1, sc); > if (!maybe_const) > @@ -11938,7 +11941,9 @@ build_binary_op (location_t location, en > bool maybe_const = true; > tree sc; > sc = c_fully_fold (op1, false, &maybe_const); > - sc = save_expr (sc); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (type0)) == CODE_FOR_nothing) > + sc = save_expr (sc); > sc = convert (TREE_TYPE (type0), sc); > op1 = build_vector_from_val (type0, sc); > if (!maybe_const) > Index: gcc/gcc/cp/call.cc > =================================================================== > --- gcc.orig/gcc/cp/call.cc > +++ gcc/gcc/cp/call.cc > @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. > #include "langhooks.h" > #include "c-family/c-objc.h" > #include "internal-fn.h" > +#include "optabs-query.h" > #include "stringpool.h" > #include "attribs.h" > #include "gcc-rich-location.h" > @@ -5435,12 +5436,18 @@ build_conditional_expr (const op_locatio > return error_mark_node; > } > > + bool have_vec_duplicate > + = optab_handler (vec_duplicate_optab, > + TYPE_MODE (vtype)) != CODE_FOR_nothing; > + > arg2 = cp_convert (stype, arg2, complain); > - arg2 = save_expr (arg2); > + if (!have_vec_duplicate) > + arg2 = save_expr (arg2); > arg2 = build_vector_from_val (vtype, arg2); > arg2_type = vtype; > arg3 = cp_convert (stype, arg3, complain); > - arg3 = save_expr (arg3); > + if (!have_vec_duplicate) > + arg3 = save_expr (arg3); > arg3 = build_vector_from_val (vtype, arg3); > arg3_type = vtype; > } > @@ -5458,7 +5465,9 @@ build_conditional_expr (const op_locatio > return error_mark_node; > case stv_firstarg: > { > - arg2 = save_expr (arg2); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (arg3_type)) == CODE_FOR_nothing) > + arg2 = save_expr (arg2); > arg2 = convert (TREE_TYPE (arg3_type), arg2); > arg2 = build_vector_from_val (arg3_type, arg2); > arg2_type = TREE_TYPE (arg2); > @@ -5466,7 +5475,9 @@ build_conditional_expr (const op_locatio > } > case stv_secondarg: > { > - arg3 = save_expr (arg3); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (arg2_type)) == CODE_FOR_nothing) > + arg3 = save_expr (arg3); > arg3 = convert (TREE_TYPE (arg2_type), arg3); > arg3 = build_vector_from_val (arg2_type, arg3); > arg3_type = TREE_TYPE (arg3); > Index: gcc/gcc/cp/typeck.cc > =================================================================== > --- gcc.orig/gcc/cp/typeck.cc > +++ gcc/gcc/cp/typeck.cc > @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3. > #include "c-family/c-objc.h" > #include "c-family/c-ubsan.h" > #include "gcc-rich-location.h" > +#include "optabs-query.h" > #include "stringpool.h" > #include "attribs.h" > #include "asan.h" > @@ -5066,7 +5067,9 @@ cp_build_binary_op (const op_location_t > case stv_firstarg: > { > op0 = convert (TREE_TYPE (type1), op0); > - op0 = save_expr (op0); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (type1)) == CODE_FOR_nothing) > + op0 = save_expr (op0); > op0 = build_vector_from_val (type1, op0); > type0 = TREE_TYPE (op0); > code0 = TREE_CODE (type0); > @@ -5076,7 +5079,9 @@ cp_build_binary_op (const op_location_t > case stv_secondarg: > { > op1 = convert (TREE_TYPE (type0), op1); > - op1 = save_expr (op1); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (type0)) == CODE_FOR_nothing) > + op1 = save_expr (op1); > op1 = build_vector_from_val (type0, op1); > type1 = TREE_TYPE (op1); > code1 = TREE_CODE (type1);
On Thu, 27 Jan 2022, Richard Biener wrote: > > Index: gcc/gcc/c/c-typeck.cc > > =================================================================== > > --- gcc.orig/gcc/c/c-typeck.cc > > +++ gcc/gcc/c/c-typeck.cc > > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. > > #include "gomp-constants.h" > > #include "spellcheck-tree.h" > > #include "gcc-rich-location.h" > > +#include "optabs-query.h" > > #include "stringpool.h" > > #include "attribs.h" > > #include "asan.h" > > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en > > bool maybe_const = true; > > tree sc; > > sc = c_fully_fold (op0, false, &maybe_const); > > - sc = save_expr (sc); > > + if (optab_handler (vec_duplicate_optab, > > + TYPE_MODE (type1)) == CODE_FOR_nothing) > > + sc = save_expr (sc); > > This doesn't make much sense - I suppose the CONSTRUCTOR retains > TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE > and thus should have been cleared during gimplification or in the end > ignored by RTL expansion. This is how the expression built here eventually looks in `store_constructor': (gdb) print exp $41 = <constructor 0x7ffff5c06ba0> (gdb) pt <constructor 0x7ffff5c06ba0 type <vector_type 0x7ffff5e7ea48 v4sf type <real_type 0x7ffff5cf1260 float sizes-gimplified SF size <integer_cst 0x7ffff5c00f78 constant 32> unit-size <integer_cst 0x7ffff5c00f90 constant 4> align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5cf1260 precision:32 pointer_to_this <pointer_type 0x7ffff5cf1848>> sizes-gimplified V4SF size <integer_cst 0x7ffff5c00d80 constant 128> unit-size <integer_cst 0x7ffff5c00d98 constant 16> align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5d19648 nunits:4 context <translation_unit_decl 0x7ffff5ec0bb8 v4sf-dup.c>> side-effects length:4 val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> visited var <parm_decl 0x7ffff5f00080 y> def_stmt GIMPLE_NOP version:2> val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> visited var <parm_decl 0x7ffff5f00080 y> def_stmt GIMPLE_NOP version:2> val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> visited var <parm_decl 0x7ffff5f00080 y> def_stmt GIMPLE_NOP version:2> val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> visited var <parm_decl 0x7ffff5f00080 y> def_stmt GIMPLE_NOP version:2>> (gdb) The `side-effects' flag prevents this conditional from executing: /* Try using vec_duplicate_optab for uniform vectors. */ if (!TREE_SIDE_EFFECTS (exp) && VECTOR_MODE_P (mode) && eltmode == GET_MODE_INNER (mode) && ((icode = optab_handler (vec_duplicate_optab, mode)) != CODE_FOR_nothing) && (elt = uniform_vector_p (exp)) && !VECTOR_TYPE_P (TREE_TYPE (elt))) { class expand_operand ops[2]; create_output_operand (&ops[0], target, mode); create_input_operand (&ops[1], expand_normal (elt), eltmode); expand_insn (icode, 2, ops); if (!rtx_equal_p (target, ops[0].value)) emit_move_insn (target, ops[0].value); break; } I don't know what's supposed to clear the flag (and what the purpose of setting it in the first place would be then). > You do not add a testcase so it's hard to see what goes wrong where exactly. This only happens with an out-of-tree microarchitecture and I was unable to find an in-tree target that ever uses the `vec_duplicateM' standard RTL pattern, so I wasn't able to produce a test case that would trigger with upstream code. Code is essentially like: typedef float v4sf __attribute__ ((vector_size (16))); v4sf odd_even (v4sf x, float y) { return x + f; } I considered the Aarch64 one the most likely candidate as it is the one commit be4c1d4a42c5 has been for, but as I noted in the description it does not appear to use it either. It uses `vec_initMN' instead and does the broadcast by hand in the backend. Maybe I'm missing something. Maciej
On Thu, Jan 27, 2022 at 2:59 PM Maciej W. Rozycki <macro@embecosm.com> wrote: > > On Thu, 27 Jan 2022, Richard Biener wrote: > > > > Index: gcc/gcc/c/c-typeck.cc > > > =================================================================== > > > --- gcc.orig/gcc/c/c-typeck.cc > > > +++ gcc/gcc/c/c-typeck.cc > > > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. > > > #include "gomp-constants.h" > > > #include "spellcheck-tree.h" > > > #include "gcc-rich-location.h" > > > +#include "optabs-query.h" > > > #include "stringpool.h" > > > #include "attribs.h" > > > #include "asan.h" > > > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en > > > bool maybe_const = true; > > > tree sc; > > > sc = c_fully_fold (op0, false, &maybe_const); > > > - sc = save_expr (sc); > > > + if (optab_handler (vec_duplicate_optab, > > > + TYPE_MODE (type1)) == CODE_FOR_nothing) > > > + sc = save_expr (sc); > > > > This doesn't make much sense - I suppose the CONSTRUCTOR retains > > TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE > > and thus should have been cleared during gimplification or in the end > > ignored by RTL expansion. > > This is how the expression built here eventually looks in > `store_constructor': > > (gdb) print exp > $41 = <constructor 0x7ffff5c06ba0> > (gdb) pt > <constructor 0x7ffff5c06ba0 > type <vector_type 0x7ffff5e7ea48 v4sf > type <real_type 0x7ffff5cf1260 float sizes-gimplified SF > size <integer_cst 0x7ffff5c00f78 constant 32> > unit-size <integer_cst 0x7ffff5c00f90 constant 4> > align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5cf1260 precision:32 > pointer_to_this <pointer_type 0x7ffff5cf1848>> > sizes-gimplified V4SF > size <integer_cst 0x7ffff5c00d80 constant 128> > unit-size <integer_cst 0x7ffff5c00d98 constant 16> > align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5d19648 nunits:4 context <translation_unit_decl 0x7ffff5ec0bb8 v4sf-dup.c>> > side-effects length:4 > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > visited var <parm_decl 0x7ffff5f00080 y> > def_stmt GIMPLE_NOP > version:2> > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > visited var <parm_decl 0x7ffff5f00080 y> > def_stmt GIMPLE_NOP > version:2> > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > visited var <parm_decl 0x7ffff5f00080 y> > def_stmt GIMPLE_NOP > version:2> > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > visited var <parm_decl 0x7ffff5f00080 y> > def_stmt GIMPLE_NOP > version:2>> > (gdb) > > The `side-effects' flag prevents this conditional from executing: > > /* Try using vec_duplicate_optab for uniform vectors. */ > if (!TREE_SIDE_EFFECTS (exp) > && VECTOR_MODE_P (mode) > && eltmode == GET_MODE_INNER (mode) > && ((icode = optab_handler (vec_duplicate_optab, mode)) > != CODE_FOR_nothing) > && (elt = uniform_vector_p (exp)) > && !VECTOR_TYPE_P (TREE_TYPE (elt))) > { > class expand_operand ops[2]; > create_output_operand (&ops[0], target, mode); > create_input_operand (&ops[1], expand_normal (elt), eltmode); > expand_insn (icode, 2, ops); > if (!rtx_equal_p (target, ops[0].value)) > emit_move_insn (target, ops[0].value); > break; > } > > I don't know what's supposed to clear the flag (and what the purpose of > setting it in the first place would be then). It's probably safe to remove the !TREE_SIDE_EFFECTS check above but already gimplification should have made sure all side-effects are pushed to separate stmts. gimplifiation usually calls recompute_side_effects but that doesn't seem to touch CONSTRUCTORs. But I do remember fixing some spurious TREE_SIDE_EFFECTS on CTORs before. Might be worth verifying in verify_gimple_assign_single that CTORs do not have TREE_SIDE_EFFECTS set (unless this is a clobber). > > > You do not add a testcase so it's hard to see what goes wrong where exactly. > > This only happens with an out-of-tree microarchitecture and I was unable > to find an in-tree target that ever uses the `vec_duplicateM' standard RTL > pattern, so I wasn't able to produce a test case that would trigger with > upstream code. Code is essentially like: > > typedef float v4sf __attribute__ ((vector_size (16))); > > v4sf > odd_even (v4sf x, float y) > { > return x + f; > } > > I considered the Aarch64 one the most likely candidate as it is the one > commit be4c1d4a42c5 has been for, but as I noted in the description it > does not appear to use it either. It uses `vec_initMN' instead and does > the broadcast by hand in the backend. > > Maybe I'm missing something. > > Maciej
On Thu, 27 Jan 2022, Richard Biener wrote: > > > > Index: gcc/gcc/c/c-typeck.cc > > > > =================================================================== > > > > --- gcc.orig/gcc/c/c-typeck.cc > > > > +++ gcc/gcc/c/c-typeck.cc > > > > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. > > > > #include "gomp-constants.h" > > > > #include "spellcheck-tree.h" > > > > #include "gcc-rich-location.h" > > > > +#include "optabs-query.h" > > > > #include "stringpool.h" > > > > #include "attribs.h" > > > > #include "asan.h" > > > > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en > > > > bool maybe_const = true; > > > > tree sc; > > > > sc = c_fully_fold (op0, false, &maybe_const); > > > > - sc = save_expr (sc); > > > > + if (optab_handler (vec_duplicate_optab, > > > > + TYPE_MODE (type1)) == CODE_FOR_nothing) > > > > + sc = save_expr (sc); > > > > > > This doesn't make much sense - I suppose the CONSTRUCTOR retains > > > TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE > > > and thus should have been cleared during gimplification or in the end > > > ignored by RTL expansion. > > > > This is how the expression built here eventually looks in > > `store_constructor': > > > > (gdb) print exp > > $41 = <constructor 0x7ffff5c06ba0> > > (gdb) pt > > <constructor 0x7ffff5c06ba0 > > type <vector_type 0x7ffff5e7ea48 v4sf > > type <real_type 0x7ffff5cf1260 float sizes-gimplified SF > > size <integer_cst 0x7ffff5c00f78 constant 32> > > unit-size <integer_cst 0x7ffff5c00f90 constant 4> > > align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5cf1260 precision:32 > > pointer_to_this <pointer_type 0x7ffff5cf1848>> > > sizes-gimplified V4SF > > size <integer_cst 0x7ffff5c00d80 constant 128> > > unit-size <integer_cst 0x7ffff5c00d98 constant 16> > > align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5d19648 nunits:4 context <translation_unit_decl 0x7ffff5ec0bb8 v4sf-dup.c>> > > side-effects length:4 > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > visited var <parm_decl 0x7ffff5f00080 y> > > def_stmt GIMPLE_NOP > > version:2> > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > visited var <parm_decl 0x7ffff5f00080 y> > > def_stmt GIMPLE_NOP > > version:2> > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > visited var <parm_decl 0x7ffff5f00080 y> > > def_stmt GIMPLE_NOP > > version:2> > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > visited var <parm_decl 0x7ffff5f00080 y> > > def_stmt GIMPLE_NOP > > version:2>> > > (gdb) > > > > The `side-effects' flag prevents this conditional from executing: > > > > /* Try using vec_duplicate_optab for uniform vectors. */ > > if (!TREE_SIDE_EFFECTS (exp) > > && VECTOR_MODE_P (mode) > > && eltmode == GET_MODE_INNER (mode) > > && ((icode = optab_handler (vec_duplicate_optab, mode)) > > != CODE_FOR_nothing) > > && (elt = uniform_vector_p (exp)) > > && !VECTOR_TYPE_P (TREE_TYPE (elt))) > > { > > class expand_operand ops[2]; > > create_output_operand (&ops[0], target, mode); > > create_input_operand (&ops[1], expand_normal (elt), eltmode); > > expand_insn (icode, 2, ops); > > if (!rtx_equal_p (target, ops[0].value)) > > emit_move_insn (target, ops[0].value); > > break; > > } > > > > I don't know what's supposed to clear the flag (and what the purpose of > > setting it in the first place would be then). > > It's probably safe to remove the !TREE_SIDE_EFFECTS check above > but already gimplification should have made sure all side-effects are > pushed to separate stmts. gimplifiation usually calls recompute_side_effects > but that doesn't seem to touch CONSTRUCTORs. But I do remember fixing > some spurious TREE_SIDE_EFFECTS on CTORs before. > > Might be worth verifying in verify_gimple_assign_single that CTORs > do not have TREE_SIDE_EFFECTS set (unless this is a clobber). OK, so maybe there's another bug somewhere that causes the side-effects flag not to be cleared where expected, however I an inconvinced as to withdrawing my original point. That is why treat code like: v4sf odd_even (v4sf x, float y) { return x + f; } effectively like: v4sf odd_even (v4sf x, volatile float y) { return x + f; } which I infer from the terse justification in the discussions referred is the sole purpose of making use of `save_expr' here, also for targets that have a cheap (or free if combined with another operation) `vec_duplicateM' machine operation? While removing the !TREE_SIDE_EFFECTS check may cause `vec_duplicateM' to be used, the middle end will still ensure the element broadcast operation won't be repeated, e.g. at the cost of consuming a temporary register to carry a vector of identical elements, where it may not be the least costly approach. Where we have an actual `vec_duplicateM' insn we can use its cost to determine the best approach, can't we? Am I still missing something? Maciej
On Thu, Jan 27, 2022 at 8:14 PM Maciej W. Rozycki <macro@embecosm.com> wrote: > > On Thu, 27 Jan 2022, Richard Biener wrote: > > > > > > Index: gcc/gcc/c/c-typeck.cc > > > > > =================================================================== > > > > > --- gcc.orig/gcc/c/c-typeck.cc > > > > > +++ gcc/gcc/c/c-typeck.cc > > > > > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. > > > > > #include "gomp-constants.h" > > > > > #include "spellcheck-tree.h" > > > > > #include "gcc-rich-location.h" > > > > > +#include "optabs-query.h" > > > > > #include "stringpool.h" > > > > > #include "attribs.h" > > > > > #include "asan.h" > > > > > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en > > > > > bool maybe_const = true; > > > > > tree sc; > > > > > sc = c_fully_fold (op0, false, &maybe_const); > > > > > - sc = save_expr (sc); > > > > > + if (optab_handler (vec_duplicate_optab, > > > > > + TYPE_MODE (type1)) == CODE_FOR_nothing) > > > > > + sc = save_expr (sc); > > > > > > > > This doesn't make much sense - I suppose the CONSTRUCTOR retains > > > > TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE > > > > and thus should have been cleared during gimplification or in the end > > > > ignored by RTL expansion. > > > > > > This is how the expression built here eventually looks in > > > `store_constructor': > > > > > > (gdb) print exp > > > $41 = <constructor 0x7ffff5c06ba0> > > > (gdb) pt > > > <constructor 0x7ffff5c06ba0 > > > type <vector_type 0x7ffff5e7ea48 v4sf > > > type <real_type 0x7ffff5cf1260 float sizes-gimplified SF > > > size <integer_cst 0x7ffff5c00f78 constant 32> > > > unit-size <integer_cst 0x7ffff5c00f90 constant 4> > > > align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5cf1260 precision:32 > > > pointer_to_this <pointer_type 0x7ffff5cf1848>> > > > sizes-gimplified V4SF > > > size <integer_cst 0x7ffff5c00d80 constant 128> > > > unit-size <integer_cst 0x7ffff5c00d98 constant 16> > > > align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5d19648 nunits:4 context <translation_unit_decl 0x7ffff5ec0bb8 v4sf-dup.c>> > > > side-effects length:4 > > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > > visited var <parm_decl 0x7ffff5f00080 y> > > > def_stmt GIMPLE_NOP > > > version:2> > > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > > visited var <parm_decl 0x7ffff5f00080 y> > > > def_stmt GIMPLE_NOP > > > version:2> > > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > > visited var <parm_decl 0x7ffff5f00080 y> > > > def_stmt GIMPLE_NOP > > > version:2> > > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > > visited var <parm_decl 0x7ffff5f00080 y> > > > def_stmt GIMPLE_NOP > > > version:2>> > > > (gdb) > > > > > > The `side-effects' flag prevents this conditional from executing: > > > > > > /* Try using vec_duplicate_optab for uniform vectors. */ > > > if (!TREE_SIDE_EFFECTS (exp) > > > && VECTOR_MODE_P (mode) > > > && eltmode == GET_MODE_INNER (mode) > > > && ((icode = optab_handler (vec_duplicate_optab, mode)) > > > != CODE_FOR_nothing) > > > && (elt = uniform_vector_p (exp)) > > > && !VECTOR_TYPE_P (TREE_TYPE (elt))) > > > { > > > class expand_operand ops[2]; > > > create_output_operand (&ops[0], target, mode); > > > create_input_operand (&ops[1], expand_normal (elt), eltmode); > > > expand_insn (icode, 2, ops); > > > if (!rtx_equal_p (target, ops[0].value)) > > > emit_move_insn (target, ops[0].value); > > > break; > > > } > > > > > > I don't know what's supposed to clear the flag (and what the purpose of > > > setting it in the first place would be then). > > > > It's probably safe to remove the !TREE_SIDE_EFFECTS check above > > but already gimplification should have made sure all side-effects are > > pushed to separate stmts. gimplifiation usually calls recompute_side_effects > > but that doesn't seem to touch CONSTRUCTORs. But I do remember fixing > > some spurious TREE_SIDE_EFFECTS on CTORs before. > > > > Might be worth verifying in verify_gimple_assign_single that CTORs > > do not have TREE_SIDE_EFFECTS set (unless this is a clobber). > > OK, so maybe there's another bug somewhere that causes the side-effects > flag not to be cleared where expected, however I an inconvinced as to > withdrawing my original point. That is why treat code like: > > v4sf > odd_even (v4sf x, float y) > { > return x + f; > } > > effectively like: > > v4sf > odd_even (v4sf x, volatile float y) > { > return x + f; > } that's not what it does. It treats it like float tem = f; return x + { tem, tem, tem, tem }; avoiding, like for x + (1.0f + f) creating return x + { 1.0+f, 1.0+f, 1.0+f ...} it's more CSE than volatile qualifying. > which I infer from the terse justification in the discussions referred is > the sole purpose of making use of `save_expr' here, also for targets that > have a cheap (or free if combined with another operation) `vec_duplicateM' > machine operation? Because the IL from the frontends should not depend on target capabilities and whether we have to preserve side-effects properly doesn't depend on the cheapness of the operation itself. Consider return x + bar (f); you definitely want bar(f) to be only evaluated once, even when the target can cheaply do the splat. Richard. > While removing the !TREE_SIDE_EFFECTS check may cause `vec_duplicateM' to > be used, the middle end will still ensure the element broadcast operation > won't be repeated, e.g. at the cost of consuming a temporary register to > carry a vector of identical elements, where it may not be the least costly > approach. Where we have an actual `vec_duplicateM' insn we can use its > cost to determine the best approach, can't we? > > Am I still missing something? > > Maciej
On Fri, Jan 28, 2022 at 1:22 PM Richard Biener <richard.guenther@gmail.com> wrote: > > On Thu, Jan 27, 2022 at 8:14 PM Maciej W. Rozycki <macro@embecosm.com> wrote: > > > > On Thu, 27 Jan 2022, Richard Biener wrote: > > > > > > > > Index: gcc/gcc/c/c-typeck.cc > > > > > > =================================================================== > > > > > > --- gcc.orig/gcc/c/c-typeck.cc > > > > > > +++ gcc/gcc/c/c-typeck.cc > > > > > > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. > > > > > > #include "gomp-constants.h" > > > > > > #include "spellcheck-tree.h" > > > > > > #include "gcc-rich-location.h" > > > > > > +#include "optabs-query.h" > > > > > > #include "stringpool.h" > > > > > > #include "attribs.h" > > > > > > #include "asan.h" > > > > > > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en > > > > > > bool maybe_const = true; > > > > > > tree sc; > > > > > > sc = c_fully_fold (op0, false, &maybe_const); > > > > > > - sc = save_expr (sc); > > > > > > + if (optab_handler (vec_duplicate_optab, > > > > > > + TYPE_MODE (type1)) == CODE_FOR_nothing) > > > > > > + sc = save_expr (sc); > > > > > > > > > > This doesn't make much sense - I suppose the CONSTRUCTOR retains > > > > > TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE > > > > > and thus should have been cleared during gimplification or in the end > > > > > ignored by RTL expansion. > > > > > > > > This is how the expression built here eventually looks in > > > > `store_constructor': > > > > > > > > (gdb) print exp > > > > $41 = <constructor 0x7ffff5c06ba0> > > > > (gdb) pt > > > > <constructor 0x7ffff5c06ba0 > > > > type <vector_type 0x7ffff5e7ea48 v4sf > > > > type <real_type 0x7ffff5cf1260 float sizes-gimplified SF > > > > size <integer_cst 0x7ffff5c00f78 constant 32> > > > > unit-size <integer_cst 0x7ffff5c00f90 constant 4> > > > > align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5cf1260 precision:32 > > > > pointer_to_this <pointer_type 0x7ffff5cf1848>> > > > > sizes-gimplified V4SF > > > > size <integer_cst 0x7ffff5c00d80 constant 128> > > > > unit-size <integer_cst 0x7ffff5c00d98 constant 16> > > > > align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5d19648 nunits:4 context <translation_unit_decl 0x7ffff5ec0bb8 v4sf-dup.c>> > > > > side-effects length:4 > > > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > > > visited var <parm_decl 0x7ffff5f00080 y> > > > > def_stmt GIMPLE_NOP > > > > version:2> > > > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > > > visited var <parm_decl 0x7ffff5f00080 y> > > > > def_stmt GIMPLE_NOP > > > > version:2> > > > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > > > visited var <parm_decl 0x7ffff5f00080 y> > > > > def_stmt GIMPLE_NOP > > > > version:2> > > > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > > > visited var <parm_decl 0x7ffff5f00080 y> > > > > def_stmt GIMPLE_NOP > > > > version:2>> > > > > (gdb) > > > > > > > > The `side-effects' flag prevents this conditional from executing: > > > > > > > > /* Try using vec_duplicate_optab for uniform vectors. */ > > > > if (!TREE_SIDE_EFFECTS (exp) > > > > && VECTOR_MODE_P (mode) > > > > && eltmode == GET_MODE_INNER (mode) > > > > && ((icode = optab_handler (vec_duplicate_optab, mode)) > > > > != CODE_FOR_nothing) > > > > && (elt = uniform_vector_p (exp)) > > > > && !VECTOR_TYPE_P (TREE_TYPE (elt))) > > > > { > > > > class expand_operand ops[2]; > > > > create_output_operand (&ops[0], target, mode); > > > > create_input_operand (&ops[1], expand_normal (elt), eltmode); > > > > expand_insn (icode, 2, ops); > > > > if (!rtx_equal_p (target, ops[0].value)) > > > > emit_move_insn (target, ops[0].value); > > > > break; > > > > } > > > > > > > > I don't know what's supposed to clear the flag (and what the purpose of > > > > setting it in the first place would be then). > > > > > > It's probably safe to remove the !TREE_SIDE_EFFECTS check above > > > but already gimplification should have made sure all side-effects are > > > pushed to separate stmts. gimplifiation usually calls recompute_side_effects > > > but that doesn't seem to touch CONSTRUCTORs. But I do remember fixing > > > some spurious TREE_SIDE_EFFECTS on CTORs before. > > > > > > Might be worth verifying in verify_gimple_assign_single that CTORs > > > do not have TREE_SIDE_EFFECTS set (unless this is a clobber). > > > > OK, so maybe there's another bug somewhere that causes the side-effects > > flag not to be cleared where expected, however I an inconvinced as to > > withdrawing my original point. That is why treat code like: > > > > v4sf > > odd_even (v4sf x, float y) > > { > > return x + f; > > } > > > > effectively like: > > > > v4sf > > odd_even (v4sf x, volatile float y) > > { > > return x + f; > > } > > that's not what it does. It treats it like > > float tem = f; > return x + { tem, tem, tem, tem }; > > avoiding, like for x + (1.0f + f) creating > > return x + { 1.0+f, 1.0+f, 1.0+f ...} > > it's more CSE than volatile qualifying. > > > which I infer from the terse justification in the discussions referred is > > the sole purpose of making use of `save_expr' here, also for targets that > > have a cheap (or free if combined with another operation) `vec_duplicateM' > > machine operation? > > Because the IL from the frontends should not depend on target capabilities > and whether we have to preserve side-effects properly doesn't depend on > the cheapness of the operation itself. Consider > > return x + bar (f); > > you definitely want bar(f) to be only evaluated once, even when the > target can cheaply do the splat. Btw, diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc index efd10332c53..c0f7d98931d 100644 --- a/gcc/tree-cfg.cc +++ b/gcc/tree-cfg.cc @@ -4703,6 +4703,12 @@ verify_gimple_assign_single (gassign *stmt) debug_generic_stmt (rhs1); return true; } + if (TREE_SIDE_EFFECTS (rhs1) && !gimple_clobber_p (stmt)) + { + error ("%qs with side-effects", code_name); + debug_generic_stmt (rhs1); + return true; + } return res; case ASSERT_EXPR: does not cause ICEs on the two testcases (on trunk). Richard. > > Richard. > > > While removing the !TREE_SIDE_EFFECTS check may cause `vec_duplicateM' to > > be used, the middle end will still ensure the element broadcast operation > > won't be repeated, e.g. at the cost of consuming a temporary register to > > carry a vector of identical elements, where it may not be the least costly > > approach. Where we have an actual `vec_duplicateM' insn we can use its > > cost to determine the best approach, can't we? > > > > Am I still missing something? > > > > Maciej
On Fri, 28 Jan 2022, Richard Biener wrote: > > that's not what it does. It treats it like > > > > float tem = f; > > return x + { tem, tem, tem, tem }; > > > > avoiding, like for x + (1.0f + f) creating > > > > return x + { 1.0+f, 1.0+f, 1.0+f ...} > > > > it's more CSE than volatile qualifying. I see, thanks for your time to explain me. I got this confused. > > Because the IL from the frontends should not depend on target capabilities > > and whether we have to preserve side-effects properly doesn't depend on > > the cheapness of the operation itself. Consider > > > > return x + bar (f); > > > > you definitely want bar(f) to be only evaluated once, even when the > > target can cheaply do the splat. Indeed. > Btw, > > diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc > index efd10332c53..c0f7d98931d 100644 > --- a/gcc/tree-cfg.cc > +++ b/gcc/tree-cfg.cc > @@ -4703,6 +4703,12 @@ verify_gimple_assign_single (gassign *stmt) > debug_generic_stmt (rhs1); > return true; > } > + if (TREE_SIDE_EFFECTS (rhs1) && !gimple_clobber_p (stmt)) > + { > + error ("%qs with side-effects", code_name); > + debug_generic_stmt (rhs1); > + return true; > + } > return res; > > case ASSERT_EXPR: > > does not cause ICEs on the two testcases (on trunk). Right, so it has turned out I had the wrong binary run under GDB, sigh. I have re-verified the current trunk and indeed the side-effect annotation has gone: (gdb) frame #0 store_constructor (exp=<constructor 0x7ffff5b36e88>, target=0x7ffff5b388e0, cleared=0, size=..., reverse=false) at .../gcc/expr.cc:7169 7169 if (!TREE_SIDE_EFFECTS (exp) (gdb) pt exp <constructor 0x7ffff5b36e88 type <vector_type 0x7ffff5e23678 v4sf type <real_type 0x7ffff5c41260 float sizes-gimplified SF size <integer_cst 0x7ffff5b31050 constant 32> unit-size <integer_cst 0x7ffff5b31068 constant 4> align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5c41260 precision:32 pointer_to_this <pointer_type 0x7ffff5c41848>> sizes-gimplified V4SF size <integer_cst 0x7ffff5b30e58 constant 128> unit-size <integer_cst 0x7ffff5b30e70 constant 16> align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5c6d740 nunits:4 context <translation_unit_decl 0x7ffff5e908e8 v4sf.c>> length:4 val <ssa_name 0x7ffff5bd1200 type <real_type 0x7ffff5c41260 float> visited var <parm_decl 0x7ffff7f80100 y> def_stmt GIMPLE_NOP version:2> val <ssa_name 0x7ffff5bd1200 type <real_type 0x7ffff5c41260 float> visited var <parm_decl 0x7ffff7f80100 y> def_stmt GIMPLE_NOP version:2> val <ssa_name 0x7ffff5bd1200 type <real_type 0x7ffff5c41260 float> visited var <parm_decl 0x7ffff7f80100 y> def_stmt GIMPLE_NOP version:2> val <ssa_name 0x7ffff5bd1200 type <real_type 0x7ffff5c41260 float> visited var <parm_decl 0x7ffff7f80100 y> def_stmt GIMPLE_NOP version:2>> (gdb) and I have tracked down commit 512429a885e8 ("tree-optimization/99863 - clear vector CTOR TREE_SIDE_EFFECTS") of yours to be the change required. Thank you for your assistance! Maciej
Index: gcc/gcc/c/c-typeck.cc =================================================================== --- gcc.orig/gcc/c/c-typeck.cc +++ gcc/gcc/c/c-typeck.cc @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. #include "gomp-constants.h" #include "spellcheck-tree.h" #include "gcc-rich-location.h" +#include "optabs-query.h" #include "stringpool.h" #include "attribs.h" #include "asan.h" @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en bool maybe_const = true; tree sc; sc = c_fully_fold (op0, false, &maybe_const); - sc = save_expr (sc); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (type1)) == CODE_FOR_nothing) + sc = save_expr (sc); sc = convert (TREE_TYPE (type1), sc); op0 = build_vector_from_val (type1, sc); if (!maybe_const) @@ -11938,7 +11941,9 @@ build_binary_op (location_t location, en bool maybe_const = true; tree sc; sc = c_fully_fold (op1, false, &maybe_const); - sc = save_expr (sc); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (type0)) == CODE_FOR_nothing) + sc = save_expr (sc); sc = convert (TREE_TYPE (type0), sc); op1 = build_vector_from_val (type0, sc); if (!maybe_const) Index: gcc/gcc/cp/call.cc =================================================================== --- gcc.orig/gcc/cp/call.cc +++ gcc/gcc/cp/call.cc @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. #include "langhooks.h" #include "c-family/c-objc.h" #include "internal-fn.h" +#include "optabs-query.h" #include "stringpool.h" #include "attribs.h" #include "gcc-rich-location.h" @@ -5435,12 +5436,18 @@ build_conditional_expr (const op_locatio return error_mark_node; } + bool have_vec_duplicate + = optab_handler (vec_duplicate_optab, + TYPE_MODE (vtype)) != CODE_FOR_nothing; + arg2 = cp_convert (stype, arg2, complain); - arg2 = save_expr (arg2); + if (!have_vec_duplicate) + arg2 = save_expr (arg2); arg2 = build_vector_from_val (vtype, arg2); arg2_type = vtype; arg3 = cp_convert (stype, arg3, complain); - arg3 = save_expr (arg3); + if (!have_vec_duplicate) + arg3 = save_expr (arg3); arg3 = build_vector_from_val (vtype, arg3); arg3_type = vtype; } @@ -5458,7 +5465,9 @@ build_conditional_expr (const op_locatio return error_mark_node; case stv_firstarg: { - arg2 = save_expr (arg2); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (arg3_type)) == CODE_FOR_nothing) + arg2 = save_expr (arg2); arg2 = convert (TREE_TYPE (arg3_type), arg2); arg2 = build_vector_from_val (arg3_type, arg2); arg2_type = TREE_TYPE (arg2); @@ -5466,7 +5475,9 @@ build_conditional_expr (const op_locatio } case stv_secondarg: { - arg3 = save_expr (arg3); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (arg2_type)) == CODE_FOR_nothing) + arg3 = save_expr (arg3); arg3 = convert (TREE_TYPE (arg2_type), arg3); arg3 = build_vector_from_val (arg2_type, arg3); arg3_type = TREE_TYPE (arg3); Index: gcc/gcc/cp/typeck.cc =================================================================== --- gcc.orig/gcc/cp/typeck.cc +++ gcc/gcc/cp/typeck.cc @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3. #include "c-family/c-objc.h" #include "c-family/c-ubsan.h" #include "gcc-rich-location.h" +#include "optabs-query.h" #include "stringpool.h" #include "attribs.h" #include "asan.h" @@ -5066,7 +5067,9 @@ cp_build_binary_op (const op_location_t case stv_firstarg: { op0 = convert (TREE_TYPE (type1), op0); - op0 = save_expr (op0); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (type1)) == CODE_FOR_nothing) + op0 = save_expr (op0); op0 = build_vector_from_val (type1, op0); type0 = TREE_TYPE (op0); code0 = TREE_CODE (type0); @@ -5076,7 +5079,9 @@ cp_build_binary_op (const op_location_t case stv_secondarg: { op1 = convert (TREE_TYPE (type0), op1); - op1 = save_expr (op1); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (type0)) == CODE_FOR_nothing) + op1 = save_expr (op1); op1 = build_vector_from_val (type0, op1); type1 = TREE_TYPE (op1); code1 = TREE_CODE (type1);