
[stage1] Lower VEC_COND_EXPR into internal functions.

Message ID 0e5c0fa7-9719-e3ab-b909-f1c0b052b283@suse.cz

Commit Message

Martin Liška April 1, 2020, 10:19 a.m. UTC
Hello.

This is the second attempt to get rid of tcc_comparison GENERIC trees
being used as the first argument of VEC_COND_EXPR.

The patch achieves that in the following steps (a short sketch of what
the new pass emits follows the list):
1) the veclower pass expands all tcc_comparison expressions into SSA_NAMEs
2) after that, a tcc_comparison can no longer be used as the first argument of VEC_COND_EXPR
    (enforced by the GIMPLE verifier)
3) I exposed new internal functions with:
DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)

4) the logic of expand_vec_cond_expr is moved into the new pass_gimple_isel pass
5) the pass expands VEC_COND_EXPR into one of the internal functions defined in 3)
6) moreover, I've added new logic that prefers expand_vec_cmp_expr_p when
    an SSA_NAME is used in multiple (2+) VEC_COND_EXPR statements
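
A short sketch of steps 5) and 6): for a statement
lhs = VEC_COND_EXPR <op0, op1, op2>, the pass builds roughly the
following call (simplified illustration only, see the patch for the real
code; the variable names are those of the new gimple_expand_vec_cond_expr
function):

  gcall *call;
  if (tcode == SSA_NAME)
    /* Keep the mask as a separate SSA_NAME and select with vcond_mask.  */
    call = gimple_build_call_internal (IFN_VCOND_MASK, 3, op0, op1, op2);
  else
    /* Fold the defining comparison op0a <tcode> op0b back into the
       selection and use vcond/vcondu.  */
    call = gimple_build_call_internal (unsignedp ? IFN_VCONDU : IFN_VCOND, 5,
                                       op0a, op0b, op1, op2,
                                       build_int_cst (integer_type_node, tcode));
  gimple_call_set_lhs (call, lhs);
  gsi_replace (gsi, call, false);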

The patch bootstraps on x86_64-linux-gnu and survives regression tests.
Moreover, I ran the SPEC2006 and SPEC2017 benchmarks on znver1, znver2 and skylake
targets and I don't see any noticeable change.

Achieved benefits of the patch:
- removal of a GENERIC expression being used in GIMPLE statements
- extraction into SSA_NAMEs can enable proper tree optimizations (FRE, DOM, PRE)
- possibility to expand smarter based on number of uses (expand_vec_cmp_expr_p)

Future plans:
- tcc_comparison removal just during gimplification
- removal of the code where these expressions are handled for VEC_COND_EXPR
- do a similar thing for COND_EXPR?

The task was guided by Richi (Biener) and I bet he can help with both further questions
and reasoning.

Thanks,
Martin

Comments

Richard Sandiford April 6, 2020, 9:17 a.m. UTC | #1
Martin Liška <mliska@suse.cz> writes:
> Hello.
>
> This is the second attempt to get rid of tcc_comparison GENERIC trees
> being used as the first argument of VEC_COND_EXPR.
>
> The patch achieves that in the following steps:
> 1) the veclower pass expands all tcc_comparison expressions into SSA_NAMEs
> 2) after that, a tcc_comparison can no longer be used as the first argument of VEC_COND_EXPR
>     (enforced by the GIMPLE verifier)
> 3) I exposed new internal functions with:
> DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
> DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
> DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
> DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
>
> 4) the logic of expand_vec_cond_expr is moved into the new pass_gimple_isel pass
> 5) the pass expands VEC_COND_EXPR into one of the internal functions defined in 3)
> 6) moreover, I've added new logic that prefers expand_vec_cmp_expr_p when
>     an SSA_NAME is used in multiple (2+) VEC_COND_EXPR statements
>
> The patch bootstraps on x86_64-linux-gnu and survives regression tests.
> Moreover, I ran the SPEC2006 and SPEC2017 benchmarks on znver1, znver2 and skylake
> targets and I don't see any noticeable change.
>
> Achieved benefits of the patch:
> - removal of a GENERIC expression being used in GIMPLE statements
> - extraction into SSA_NAMEs can enable proper tree optimizations (FRE, DOM, PRE)
> - possibility to expand smarter based on number of uses (expand_vec_cmp_expr_p)
>
> Future plans:
> - tcc_comparison removal just during gimplification
> - removal of the code where these expressions are handled for VEC_COND_EXPR
> - do a similar thing for COND_EXPR?
>
> The task was guided by Richi (Biener) and I bet he can help with both further questions
> and reasoning.

Thanks for doing this.  It definitely seems more friendly than the
four-operand version to targets where separate comparisons are the norm.

Just a couple of comments about the implementation:

> diff --git a/gcc/passes.def b/gcc/passes.def
> index 2bf2cb78fc5..d654e5ee9fe 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -397,6 +397,7 @@ along with GCC; see the file COPYING3.  If not see
>    NEXT_PASS (pass_cleanup_eh);
>    NEXT_PASS (pass_lower_resx);
>    NEXT_PASS (pass_nrv);
> +  NEXT_PASS (pass_gimple_isel);
>    NEXT_PASS (pass_cleanup_cfg_post_optimizing);
>    NEXT_PASS (pass_warn_function_noreturn);
>    NEXT_PASS (pass_gen_hsail);

What was the reason for making this a separate pass, rather than doing
it as part of veclower?  If we do them separately, then it's harder for
veclower to know which VEC_COND_EXPRs it needs to open-code.  (OK, so
that's a general problem between veclower and expand already, but it
seems like the new approach could help to move away from that by
doing the instruction selection directly in veclower.)

> +/* Expand all VEC_COND_EXPR gimple assignments into calls to internal
> +   function based on type of selected expansion.  */
> +
> +static gimple *
> +gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi)
> +{
> +  tree lhs, op0a = NULL_TREE, op0b = NULL_TREE;
> +  enum tree_code code;
> +  enum tree_code tcode;
> +  machine_mode cmp_op_mode;
> +  bool unsignedp;
> +  enum insn_code icode;
> +  imm_use_iterator imm_iter;
> +
> +  /* Only consider code == GIMPLE_ASSIGN.  */
> +  gassign *stmt = dyn_cast<gassign *> (gsi_stmt (*gsi));
> +  if (!stmt)
> +    return NULL;
> +
> +  code = gimple_assign_rhs_code (stmt);
> +  if (code != VEC_COND_EXPR)
> +    return NULL;
> +
> +  tree op0 = gimple_assign_rhs1 (stmt);
> +  tree op1 = gimple_assign_rhs2 (stmt);
> +  tree op2 = gimple_assign_rhs3 (stmt);
> +  lhs = gimple_assign_lhs (stmt);
> +  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
> +
> +  gcc_assert (!COMPARISON_CLASS_P (op0));
> +  if (TREE_CODE (op0) == SSA_NAME)
> +    {
> +      unsigned int used_vec_cond_exprs = 0;
> +      gimple *use_stmt;
> +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, op0)
> +	{
> +	  gassign *assign = dyn_cast<gassign *> (use_stmt);
> +	  if (assign != NULL && gimple_assign_rhs_code (assign) == VEC_COND_EXPR
> +	      && gimple_assign_rhs1 (assign) == op0)
> +	    used_vec_cond_exprs++;
> +	}

This looks like it's quadratic in the worst case.  Could we check
this in a different way?

> +
> +      gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> +      if (def_stmt)
> +	{
> +	  tcode = gimple_assign_rhs_code (def_stmt);
> +	  op0a = gimple_assign_rhs1 (def_stmt);
> +	  op0b = gimple_assign_rhs2 (def_stmt);
> +
> +	  tree op0a_type = TREE_TYPE (op0a);
> +	  if (used_vec_cond_exprs >= 2

It would be good if targets were able to provide only vcond_mask.
In that case I guess we should go this path if the later one would fail.

> +	      && (get_vcond_mask_icode (mode, TYPE_MODE (op0a_type))
> +		  != CODE_FOR_nothing)
> +	      && expand_vec_cmp_expr_p (op0a_type, TREE_TYPE (lhs), tcode))
> +	    {
> +	      /* Keep the SSA name and use vcond_mask.  */
> +	      tcode = TREE_CODE (op0);
> +	    }
> +	}
> +      else
> +	tcode = TREE_CODE (op0);
> +    }
> +  else
> +    tcode = TREE_CODE (op0);

Might be easier to follow if tcode is TREE_CODE (op0) by default and
only gets changed when we want to fold in the comparison.
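
Something like this (untested sketch, reusing the variable names from the
hunk above; "fold_comparison_p" is just a placeholder for whatever
condition ends up deciding to fold the comparison back in):

  tcode = TREE_CODE (op0);
  if (TREE_CODE (op0) == SSA_NAME)
    if (gassign *def = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (op0)))
      if (TREE_CODE_CLASS (gimple_assign_rhs_code (def)) == tcc_comparison
          && fold_comparison_p)
        {
          tcode = gimple_assign_rhs_code (def);
          op0a = gimple_assign_rhs1 (def);
          op0b = gimple_assign_rhs2 (def);
        }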

Thanks,
Richard
Li, Pan2 via Gcc-patches April 6, 2020, 12:30 p.m. UTC | #2
On Mon, Apr 6, 2020 at 11:18 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Martin Liška <mliska@suse.cz> writes:
> > Hello.
> >
> > This is the second attempt to get rid of tcc_comparison GENERIC trees
> > being used as the first argument of VEC_COND_EXPR.
> >
> > The patch achieves that in the following steps:
> > 1) the veclower pass expands all tcc_comparison expressions into SSA_NAMEs
> > 2) after that, a tcc_comparison can no longer be used as the first argument of VEC_COND_EXPR
> >     (enforced by the GIMPLE verifier)
> > 3) I exposed new internal functions with:
> > DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
> > DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
> > DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
> > DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
> >
> > 4) the logic of expand_vec_cond_expr is moved into the new pass_gimple_isel pass
> > 5) the pass expands VEC_COND_EXPR into one of the internal functions defined in 3)
> > 6) moreover, I've added new logic that prefers expand_vec_cmp_expr_p when
> >     an SSA_NAME is used in multiple (2+) VEC_COND_EXPR statements
> >
> > The patch bootstraps on x86_64-linux-gnu and survives regression tests.
> > Moreover, I ran the SPEC2006 and SPEC2017 benchmarks on znver1, znver2 and skylake
> > targets and I don't see any noticeable change.
> >
> > Achieved benefits of the patch:
> > - removal of a GENERIC expression being used in GIMPLE statements
> > - extraction into SSA_NAMEs can enable proper tree optimizations (FRE, DOM, PRE)
> > - possibility to expand smarter based on number of uses (expand_vec_cmp_expr_p)
> >
> > Future plans:
> > - tcc_comparison removal just during gimplification
> > - removal of the code where these expressions are handled for VEC_COND_EXPR
> > - do a similar thing for COND_EXPR?
> >
> > The task was guided by Richi (Biener) and I bet he can help with both further questions
> > and reasoning.
>
> Thanks for doing this.  It definitely seems more friendly than the
> four-operand version to targets where separate comparisons are the norm.
>
> Just a couple of comments about the implementation:
>
> > diff --git a/gcc/passes.def b/gcc/passes.def
> > index 2bf2cb78fc5..d654e5ee9fe 100644
> > --- a/gcc/passes.def
> > +++ b/gcc/passes.def
> > @@ -397,6 +397,7 @@ along with GCC; see the file COPYING3.  If not see
> >    NEXT_PASS (pass_cleanup_eh);
> >    NEXT_PASS (pass_lower_resx);
> >    NEXT_PASS (pass_nrv);
> > +  NEXT_PASS (pass_gimple_isel);
> >    NEXT_PASS (pass_cleanup_cfg_post_optimizing);
> >    NEXT_PASS (pass_warn_function_noreturn);
> >    NEXT_PASS (pass_gen_hsail);
>
> What was the reason for making this a separate pass, rather than doing
> it as part of veclower?  If we do them separately, then it's harder for
> veclower to know which VEC_COND_EXPRs it needs to open-code.  (OK, so
> that's a general problem between veclower and expand already, but it
> seems like the new approach could help to move away from that by
> doing the instruction selection directly in veclower.)

As the name of the pass suggests it was supposed to be the starting point
of doing all the "complex" (multi-GIMPLE-stmt matching) RTL expansion tricks.

But most importantly veclower is too early to catch CSE opportunities from
loop opts on the conditions and if veclower lowers things then we also want
CSE to cleanup its mess.  I guess we also do not want veclower to be done
before vectorization since it should be easier to re-vectorize from unsupported
vector code than from what veclower makes out of it ... catch-22.

So I consider pass placement a secondary issue for now.

> > +/* Expand all VEC_COND_EXPR gimple assignments into calls to internal
> > +   function based on type of selected expansion.  */
> > +
> > +static gimple *
> > +gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi)
> > +{
> > +  tree lhs, op0a = NULL_TREE, op0b = NULL_TREE;
> > +  enum tree_code code;
> > +  enum tree_code tcode;
> > +  machine_mode cmp_op_mode;
> > +  bool unsignedp;
> > +  enum insn_code icode;
> > +  imm_use_iterator imm_iter;
> > +
> > +  /* Only consider code == GIMPLE_ASSIGN.  */
> > +  gassign *stmt = dyn_cast<gassign *> (gsi_stmt (*gsi));
> > +  if (!stmt)
> > +    return NULL;
> > +
> > +  code = gimple_assign_rhs_code (stmt);
> > +  if (code != VEC_COND_EXPR)
> > +    return NULL;
> > +
> > +  tree op0 = gimple_assign_rhs1 (stmt);
> > +  tree op1 = gimple_assign_rhs2 (stmt);
> > +  tree op2 = gimple_assign_rhs3 (stmt);
> > +  lhs = gimple_assign_lhs (stmt);
> > +  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
> > +
> > +  gcc_assert (!COMPARISON_CLASS_P (op0));
> > +  if (TREE_CODE (op0) == SSA_NAME)
> > +    {
> > +      unsigned int used_vec_cond_exprs = 0;
> > +      gimple *use_stmt;
> > +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, op0)
> > +     {
> > +       gassign *assign = dyn_cast<gassign *> (use_stmt);
> > +       if (assign != NULL && gimple_assign_rhs_code (assign) == VEC_COND_EXPR
> > +           && gimple_assign_rhs1 (assign) == op0)
> > +         used_vec_cond_exprs++;
> > +     }
>
> This looks like it's quadratic in the worst case.  Could we check
> this in a different way?
>
> > +
> > +      gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> > +      if (def_stmt)
> > +     {
> > +       tcode = gimple_assign_rhs_code (def_stmt);
> > +       op0a = gimple_assign_rhs1 (def_stmt);
> > +       op0b = gimple_assign_rhs2 (def_stmt);
> > +
> > +       tree op0a_type = TREE_TYPE (op0a);
> > +       if (used_vec_cond_exprs >= 2
>
> It would be good if targets were able to provide only vcond_mask.
> In that case I guess we should go this path if the later one would fail.
>
> > +           && (get_vcond_mask_icode (mode, TYPE_MODE (op0a_type))
> > +               != CODE_FOR_nothing)
> > +           && expand_vec_cmp_expr_p (op0a_type, TREE_TYPE (lhs), tcode))
> > +         {
> > +           /* Keep the SSA name and use vcond_mask.  */
> > +           tcode = TREE_CODE (op0);
> > +         }
> > +     }
> > +      else
> > +     tcode = TREE_CODE (op0);
> > +    }
> > +  else
> > +    tcode = TREE_CODE (op0);
>
> Might be easier to follow if tcode is TREE_CODE (op0) by default and
> only gets changed when we want to fold in the comparison.
>
> Thanks,
> Richard
Li, Pan2 via Gcc-patches April 6, 2020, 12:33 p.m. UTC | #3
On Mon, Apr 6, 2020 at 11:18 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Martin Liška <mliska@suse.cz> writes:
> > Hello.
> >
> > This is the second attempt to get rid of tcc_comparison GENERIC trees
> > being used as the first argument of VEC_COND_EXPR.
> >
> > The patch achieves that in the following steps:
> > 1) the veclower pass expands all tcc_comparison expressions into SSA_NAMEs
> > 2) after that, a tcc_comparison can no longer be used as the first argument of VEC_COND_EXPR
> >     (enforced by the GIMPLE verifier)
> > 3) I exposed new internal functions with:
> > DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
> > DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
> > DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
> > DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
> >
> > 4) the logic of expand_vec_cond_expr is moved into the new pass_gimple_isel pass
> > 5) the pass expands VEC_COND_EXPR into one of the internal functions defined in 3)
> > 6) moreover, I've added new logic that prefers expand_vec_cmp_expr_p when
> >     an SSA_NAME is used in multiple (2+) VEC_COND_EXPR statements
> >
> > The patch bootstraps on x86_64-linux-gnu and survives regression tests.
> > Moreover, I ran the SPEC2006 and SPEC2017 benchmarks on znver1, znver2 and skylake
> > targets and I don't see any noticeable change.
> >
> > Achieved benefits of the patch:
> > - removal of a GENERIC expression being used in GIMPLE statements
> > - extraction into SSA_NAMEs can enable proper tree optimizations (FRE, DOM, PRE)
> > - possibility to expand smarter based on number of uses (expand_vec_cmp_expr_p)
> >
> > Future plans:
> > - tcc_comparison removal just during gimplification
> > - removal of the code where these expressions are handled for VEC_COND_EXPR
> > - do a similar thing for COND_EXPR?
> >
> > The task was guided by Richi (Biener) and I bet he can help with both further questions
> > and reasoning.
>
> Thanks for doing this.  It definitely seems more friendly than the
> four-operand version to targets where separate comparisons are the norm.
>
> Just a couple of comments about the implementation:
>
> > diff --git a/gcc/passes.def b/gcc/passes.def
> > index 2bf2cb78fc5..d654e5ee9fe 100644
> > --- a/gcc/passes.def
> > +++ b/gcc/passes.def
> > @@ -397,6 +397,7 @@ along with GCC; see the file COPYING3.  If not see
> >    NEXT_PASS (pass_cleanup_eh);
> >    NEXT_PASS (pass_lower_resx);
> >    NEXT_PASS (pass_nrv);
> > +  NEXT_PASS (pass_gimple_isel);
> >    NEXT_PASS (pass_cleanup_cfg_post_optimizing);
> >    NEXT_PASS (pass_warn_function_noreturn);
> >    NEXT_PASS (pass_gen_hsail);
>
> What was the reason for making this a separate pass, rather than doing
> it as part of veclower?  If we do them separately, then it's harder for
> veclower to know which VEC_COND_EXPRs it needs to open-code.  (OK, so
> that's a general problem between veclower and expand already, but it
> seems like the new approach could help to move away from that by
> doing the instruction selection directly in veclower.)
>
> > +/* Expand all VEC_COND_EXPR gimple assignments into calls to internal
> > +   function based on type of selected expansion.  */
> > +
> > +static gimple *
> > +gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi)
> > +{
> > +  tree lhs, op0a = NULL_TREE, op0b = NULL_TREE;
> > +  enum tree_code code;
> > +  enum tree_code tcode;
> > +  machine_mode cmp_op_mode;
> > +  bool unsignedp;
> > +  enum insn_code icode;
> > +  imm_use_iterator imm_iter;
> > +
> > +  /* Only consider code == GIMPLE_ASSIGN.  */
> > +  gassign *stmt = dyn_cast<gassign *> (gsi_stmt (*gsi));
> > +  if (!stmt)
> > +    return NULL;
> > +
> > +  code = gimple_assign_rhs_code (stmt);
> > +  if (code != VEC_COND_EXPR)
> > +    return NULL;
> > +
> > +  tree op0 = gimple_assign_rhs1 (stmt);
> > +  tree op1 = gimple_assign_rhs2 (stmt);
> > +  tree op2 = gimple_assign_rhs3 (stmt);
> > +  lhs = gimple_assign_lhs (stmt);
> > +  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
> > +
> > +  gcc_assert (!COMPARISON_CLASS_P (op0));
> > +  if (TREE_CODE (op0) == SSA_NAME)
> > +    {
> > +      unsigned int used_vec_cond_exprs = 0;
> > +      gimple *use_stmt;
> > +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, op0)
> > +     {
> > +       gassign *assign = dyn_cast<gassign *> (use_stmt);
> > +       if (assign != NULL && gimple_assign_rhs_code (assign) == VEC_COND_EXPR
> > +           && gimple_assign_rhs1 (assign) == op0)
> > +         used_vec_cond_exprs++;
> > +     }
>
> This looks like it's quadratic in the worst case.  Could we check
> this in a different way?

We could remember an SSA name's cond-expr uses and thus only compute it
once.
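
For instance (untested, just to illustrate the idea; the map would be
owned by the pass and "mask" is the comparison SSA_NAME):

  static unsigned int
  num_vec_cond_expr_uses (tree mask, hash_map<tree, unsigned int> &cache)
  {
    bool existed;
    unsigned int &count = cache.get_or_insert (mask, &existed);
    if (!existed)
      {
        count = 0;
        imm_use_iterator iter;
        gimple *use_stmt;
        FOR_EACH_IMM_USE_STMT (use_stmt, iter, mask)
          if (gassign *a = dyn_cast <gassign *> (use_stmt))
            if (gimple_assign_rhs_code (a) == VEC_COND_EXPR
                && gimple_assign_rhs1 (a) == mask)
              count++;
      }
    return count;
  }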

> > +
> > +      gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> > +      if (def_stmt)
> > +     {
> > +       tcode = gimple_assign_rhs_code (def_stmt);
> > +       op0a = gimple_assign_rhs1 (def_stmt);
> > +       op0b = gimple_assign_rhs2 (def_stmt);
> > +
> > +       tree op0a_type = TREE_TYPE (op0a);
> > +       if (used_vec_cond_exprs >= 2
>
> It would be good if targets were able to provide only vcond_mask.
> In that case I guess we should go this path if the later one would fail.
>
> > +           && (get_vcond_mask_icode (mode, TYPE_MODE (op0a_type))
> > +               != CODE_FOR_nothing)
> > +           && expand_vec_cmp_expr_p (op0a_type, TREE_TYPE (lhs), tcode))
> > +         {
> > +           /* Keep the SSA name and use vcond_mask.  */
> > +           tcode = TREE_CODE (op0);
> > +         }
> > +     }
> > +      else
> > +     tcode = TREE_CODE (op0);
> > +    }
> > +  else
> > +    tcode = TREE_CODE (op0);
>
> Might be easier to follow if tcode is TREE_CODE (op0) by default and
> only gets changed when we want to fold in the comparison.
>
> Thanks,
> Richard
Martin Liška May 21, 2020, 12:51 p.m. UTC | #4
Hi.

Back to this I noticed that ppc64le target build is broken due to:

g++  -fno-PIE -c   -g   -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. -I/home/marxin/Programming/gcc/gcc -I/home/marxin/Programming/gcc/gcc/. -I/home/marxin/Programming/gcc/gcc/../include -I/home/marxin/Programming/gcc/gcc/../libcpp/include  -I/home/marxin/Programming/gcc/gcc/../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libbacktrace   -o insn-emit.o -MT insn-emit.o -MMD -MP -MF ./.deps/insn-emit.TPo insn-emit.c
/home/marxin/Programming/gcc/gcc/config/rs6000/vector.md:357:11: error: vcondv4sfv4sf cannot FAIL
   357 |     FAIL;
       |           ^
/home/marxin/Programming/gcc/gcc/config/rs6000/vector.md:357:11: error: vcondv2dfv2df cannot FAIL
   357 |     FAIL;
       |           ^
/home/marxin/Programming/gcc/gcc/config/rs6000/vector.md:374:11: error: vcondv16qiv16qi cannot FAIL
   374 |     FAIL;
       |           ^
/home/marxin/Programming/gcc/gcc/config/rs6000/vector.md:374:11: error: vcondv8hiv8hi cannot FAIL
   374 |     FAIL;
       |           ^
...


which is caused by the 4 added optabs:

+DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
+DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
+DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
+DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)

looking at the generator:

Breakpoint 6, gen_expand (info=0x7fffffffe160) at /home/marxin/Programming/gcc/gcc/genemit.c:516
516	      if (find_optab (&p, XSTR (expand, 0)))
(gdb) bt
#0  emit_c_code (code=0x7fa0f0 "{\n  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],\n\t\t\t\t    operands[3], operands[4], operands[5]))\n    DONE;\n  else\n    FAIL;\n}", can_fail_p=false, name=0x7fa190 "vcondv4sfv4sf")
     at /home/marxin/Programming/gcc/gcc/genemit.c:306
#1  0x00000000004039b5 in gen_expand (info=0x7fffffffe160) at /home/marxin/Programming/gcc/gcc/genemit.c:522
#2  0x0000000000404912 in main (argc=4, argv=0x7fffffffe288) at /home/marxin/Programming/gcc/gcc/genemit.c:916

I get there due to:

B- │516               if (find_optab (&p, XSTR (expand, 0)))│
    │517                 {                                   │
    │518                   gcc_assert (p.op < NUM_OPTABS);   │
    │519                   if (nofail_optabs[p.op])          │
    │520                     can_fail_p = false;             │
    │521                 }                                   │


#define DEF_INTERNAL_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \
   nofail_optabs[OPTAB##_optab] = true;
#include "internal-fn.def"

Any hint what's bad? Note that x86_64-linux-gnu is fine.
Do I miss a target hook?

Martin
Martin Liška May 21, 2020, 1:29 p.m. UTC | #5
Adding Segher to CC, he can help us.

Martin

On 5/21/20 2:51 PM, Martin Liška wrote:
> Hi.
> 
> Back to this I noticed that ppc64le target build is broken due to:
> 
> g++  -fno-PIE -c   -g   -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. -I/home/marxin/Programming/gcc/gcc -I/home/marxin/Programming/gcc/gcc/. -I/home/marxin/Programming/gcc/gcc/../include -I/home/marxin/Programming/gcc/gcc/../libcpp/include  -I/home/marxin/Programming/gcc/gcc/../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libbacktrace   -o insn-emit.o -MT insn-emit.o -MMD -MP -MF ./.deps/insn-emit.TPo insn-emit.c
> /home/marxin/Programming/gcc/gcc/config/rs6000/vector.md:357:11: error: vcondv4sfv4sf cannot FAIL
>    357 |     FAIL;
>        |           ^
> /home/marxin/Programming/gcc/gcc/config/rs6000/vector.md:357:11: error: vcondv2dfv2df cannot FAIL
>    357 |     FAIL;
>        |           ^
> /home/marxin/Programming/gcc/gcc/config/rs6000/vector.md:374:11: error: vcondv16qiv16qi cannot FAIL
>    374 |     FAIL;
>        |           ^
> /home/marxin/Programming/gcc/gcc/config/rs6000/vector.md:374:11: error: vcondv8hiv8hi cannot FAIL
>    374 |     FAIL;
>        |           ^
> ...
> 
> 
> which is caused by the 4 added optabs:
> 
> +DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
> +DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
> +DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
> +DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
> 
> looking at the generator:
> 
> Breakpoint 6, gen_expand (info=0x7fffffffe160) at /home/marxin/Programming/gcc/gcc/genemit.c:516
> 516          if (find_optab (&p, XSTR (expand, 0)))
> (gdb) bt
> #0  emit_c_code (code=0x7fa0f0 "{\n  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],\n\t\t\t\t    operands[3], operands[4], operands[5]))\n    DONE;\n  else\n    FAIL;\n}", can_fail_p=false, name=0x7fa190 "vcondv4sfv4sf")
>      at /home/marxin/Programming/gcc/gcc/genemit.c:306
> #1  0x00000000004039b5 in gen_expand (info=0x7fffffffe160) at /home/marxin/Programming/gcc/gcc/genemit.c:522
> #2  0x0000000000404912 in main (argc=4, argv=0x7fffffffe288) at /home/marxin/Programming/gcc/gcc/genemit.c:916
> 
> I get there due to:
> 
> B- │516               if (find_optab (&p, XSTR (expand, 0)))│
>     │517                 {                                   │
>     │518                   gcc_assert (p.op < NUM_OPTABS);   │
>     │519                   if (nofail_optabs[p.op])          │
>     │520                     can_fail_p = false;             │
>     │521                 }                                   │
> 
> 
> #define DEF_INTERNAL_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \
>    nofail_optabs[OPTAB##_optab] = true;
> #include "internal-fn.def"
> 
> Any hint what's bad? Note that x86_64-linux-gnu is fine.
> Do I miss a target hook?
> 
> Martin
Segher Boessenkool May 21, 2020, 8:16 p.m. UTC | #6
Hi!

On Thu, May 21, 2020 at 03:29:49PM +0200, Martin Liška wrote:
> Adding Segher to CC, he can help us.

Oh dear.  Are you sure?

> On 5/21/20 2:51 PM, Martin Liška wrote:
> >Back to this I noticed that ppc64le target build is broken due to:

> >insn-emit.o -MMD -MP -MF ./.deps/insn-emit.TPo insn-emit.c
> >/home/marxin/Programming/gcc/gcc/config/rs6000/vector.md:357:11: error: 
> >vcondv4sfv4sf cannot FAIL
> >   357 |     FAIL;

Is it new that vcond cannot FAIL?  Because we have done that for years.

Since this breaks bootstrap on a primary target, please revert the patch
until it is sorted.

> >which is caused by the 4 added optabs:
> >
> >+DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
> >+DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
> >+DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
> >+DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)

> >looking at the generator:

> >I get there due to:
> >
> >B- │516               if (find_optab (&p, XSTR (expand, 0)))│
> >    │517                 {                                   │
> >    │518                   gcc_assert (p.op < NUM_OPTABS);   │
> >    │519                   if (nofail_optabs[p.op])          │
> >    │520                     can_fail_p = false;             │
> >    │521                 }                                   │
> >
> >
> >#define DEF_INTERNAL_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \
> >   nofail_optabs[OPTAB##_optab] = true;

So yes it is new.  Please fix :-(

> >Any hint what's bad? Note that x86_64-linux-gnu is fine.
> >Do I miss a target hook?

There is a new IFN that requires the existing optabs to never fail.  But
they *do* sometimes fail.  That is what I understand from this anyway,
please correct if needed :-)

We can make the rs6000 patterns never FAIL if that is a good idea (I am
not convinced however), but this should be documented, and all existing
targets need to be checked.

In general it is not pleasant at all to have patterns that cannot FAIL,
it makes writing a (new) port much harder, and there can be cases where
there is no sane code at all that can be generated for some cases, etc.


Segher
Richard Biener May 22, 2020, 11:14 a.m. UTC | #7
On Thu, May 21, 2020 at 10:17 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Thu, May 21, 2020 at 03:29:49PM +0200, Martin Liška wrote:
> > Adding Segher to CC, he can help us.
>
> Oh dear.  Are you sure?
>
> > On 5/21/20 2:51 PM, Martin Liška wrote:
> > >Back to this I noticed that ppc64le target build is broken due to:
>
> > >insn-emit.o -MMD -MP -MF ./.deps/insn-emit.TPo insn-emit.c
> > >/home/marxin/Programming/gcc/gcc/config/rs6000/vector.md:357:11: error:
> > >vcondv4sfv4sf cannot FAIL
> > >   357 |     FAIL;
>
> Is it new that vcond cannot FAIL?  Because we have done that for years.
>
> Since this breaks bootstrap on a primary target, please revert the patch
> until it is sorted.
>
> > >which is caused by the 4 added optabs:
> > >
> > >+DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
> > >+DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
> > >+DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
> > >+DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
>
> > >looking at the generator:
>
> > >I get there due to:
> > >
> > >B- │516               if (find_optab (&p, XSTR (expand, 0)))│
> > >    │517                 {                                   │
> > >    │518                   gcc_assert (p.op < NUM_OPTABS);   │
> > >    │519                   if (nofail_optabs[p.op])          │
> > >    │520                     can_fail_p = false;             │
> > >    │521                 }                                   │

OK, so this is an "artifact" of direct internal functions.  We do check that
expansion does not actually FAIL before emitting calls to those IFNs.

I guess this simply makes direct internal functions not a 100% match for
our use and the way out is to add regular internal functions mapping to
the optabs.  That is, I guess, for direct-internal functions it should be
enough to check direct_internal_function_supported_p which it is not
for the case of vcond*.

Richard, do you agree?

Thanks,
Richard.

> > >
> > >#define DEF_INTERNAL_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \
> > >   nofail_optabs[OPTAB##_optab] = true;
>
> So yes it is new.  Please fix :-(
>
> > >Any hint what's bad? Note that x86_64-linux-gnu is fine.
> > >Do I miss a target hook?
>
> There is a new IFN that requires the existing optabs to never fail.  But
> they *do* sometimes fail.  That is what I understand from this anyway,
> please correct if needed :-)
>
> We can make the rs6000 patterns never FAIL if that is a good idea (I am
> not convinced however), but this should be documented, and all existing
> targets need to be checked.
>
> In general it is not pleasant at all to have patterns that cannot FAIL,
> it makes writing a (new) port much harder, and there can be cases where
> there is no sane code at all that can be generated for some cases, etc.
>
>
> Segher
Richard Sandiford May 26, 2020, 10:15 a.m. UTC | #8
Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, May 21, 2020 at 10:17 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
>>
>> Hi!
>>
>> On Thu, May 21, 2020 at 03:29:49PM +0200, Martin Liška wrote:
>> > Adding Segher to CC, he can help us.
>>
>> Oh dear.  Are you sure?
>>
>> > On 5/21/20 2:51 PM, Martin Liška wrote:
>> > >Back to this I noticed that ppc64le target build is broken due to:
>>
>> > >insn-emit.o -MMD -MP -MF ./.deps/insn-emit.TPo insn-emit.c
>> > >/home/marxin/Programming/gcc/gcc/config/rs6000/vector.md:357:11: error:
>> > >vcondv4sfv4sf cannot FAIL
>> > >   357 |     FAIL;
>>
>> Is it new that vcond cannot FAIL?  Because we have done that for years.
>>
>> Since this breaks bootstrap on a primary target, please revert the patch
>> until it is sorted.
>>
>> > >which is caused by the 4 added optabs:
>> > >
>> > >+DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
>> > >+DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
>> > >+DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
>> > >+DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
>>
>> > >looking at the generator:
>>
>> > >I get there due to:
>> > >
>> > >B- │516               if (find_optab (&p, XSTR (expand, 0)))│
>> > >    │517                 {                                   │
>> > >    │518                   gcc_assert (p.op < NUM_OPTABS);   │
>> > >    │519                   if (nofail_optabs[p.op])          │
>> > >    │520                     can_fail_p = false;             │
>> > >    │521                 }                                   │
>
> OK, so this is an "artifact" of direct internal functions.  We do check that
> expansion does not actually FAIL before emitting calls to those IFNs.
>
> I guess this simply makes direct internal functions not a 100% match for
> our use and the way out is to add regular internal functions mapping to
> the optabs.  That is, I guess, for direct-internal functions it should be
> enough to check direct_internal_function_supported_p which it is not
> for the case of vcond*.
>
> Richard, do you agree?

Sorry for the late reply, been off for a few days.

I guess that would be OK for VCOND(U) as an intermediate step,
but long term, I think we should try to make all VCOND* directly-mapped.
If we're doing instruction selection on gimple (a good thing IMO)
we need to know before expand whether an operation is supported.

So longer-term, I think we should replace VCOND(U) with individual ifns,
like for VCONDEQ.  We could reduce the number of optabs needed by
canonicalising greater-based tests to lesser-based tests.
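
For example, roughly (untested sketch; this would run while building the
IFN call, so that only lesser-based and equality optabs are needed):

  if (tcode == GT_EXPR || tcode == GE_EXPR
      || tcode == UNGT_EXPR || tcode == UNGE_EXPR)
    {
      std::swap (op0a, op0b);
      tcode = swap_tree_comparison (tcode);  /* GT -> LT, GE -> LE, ...  */
    }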

Thanks,
Richard
Martin Liška May 27, 2020, 2:04 p.m. UTC | #9
On 5/26/20 12:15 PM, Richard Sandiford wrote:
> So longer-term, I think we should replace VCOND(U) with individual ifns,
> like for VCONDEQ.  We could reduce the number of optabs needed by
> canonicalising greater-based tests to lesser-based tests.

Hello.

Thanks for the feedback. So would it be possible to go with something
like DEF_INTERNAL_OPTAB_CAN_FAIL (see the attachment)?

I'm sending the complete patch that survives bootstrap and regression
tests on x86_64-linux-gnu and ppc64le-linux-gnu.

Martin
Richard Sandiford May 27, 2020, 4:13 p.m. UTC | #10
Martin Liška <mliska@suse.cz> writes:
> On 5/26/20 12:15 PM, Richard Sandiford wrote:
>> So longer-term, I think we should replace VCOND(U) with individual ifns,
>> like for VCONDEQ.  We could reduce the number of optabs needed by
>> canonicalising greater-based tests to lesser-based tests.
>
> Hello.
>
> Thanks for the feedback. So would it be possible to go with something
> like DEF_INTERNAL_OPTAB_CAN_FAIL (see the attachment)?

It doesn't look like this will solve the problem.  The reason that we
don't allow optabs for directly-mapped IFNs to FAIL is that:

  expand_insn (icode, 6, ops);

will (deliberately) ICE when the pattern FAILs.  Code that copes with
FAILing optabs instead needs to do:

  rtx_insn *watermark = get_last_insn (); <-- position where it should go.
  ...
  if (maybe_expand_insn (icode, 6, ops))
    {
      ...Success...;
    }

  delete_insns_since (watermark);
  ...fallback code that implements the IFN without optab support...

At this point the IFN isn't really directly-mapped in the intended sense:
the optab is “just” a way of optimising the IFN.

So I think the effect of the patch will be to suppress the build failure,
but instead ICE for PowerPC when the FAIL condition is hit.  It might
be quite difficult to trigger though.  (That's why the static checking
is there. :-))

I think instead we should treat VCOND(U) as not directly-mapped,
as Richard suggested (IIRC).  The internal-fn.c code should then handle
the case in which we have an IFN_VCOND(U) call and the associated
optab fails.  Of course, this is only going to be exercised on targets
like powerpc* that having failing patterns, so it'll need testing there.

What I meant by the quote above is that I think this shows the flaw in
using IFN_VCOND(U) rather than splitting it up further.  Longer term,
we should have a separate IFN_VCOND* and optab for each necessary
condition.  There would then be no need (IMO) to allow the patterns
to FAIL, and we could use directly-mapped IFNs with no fallback.
There'd also be no need for the tree comparison operand to the IFN.

Thanks,
Richard
Richard Biener May 27, 2020, 4:32 p.m. UTC | #11
On May 27, 2020 6:13:24 PM GMT+02:00, Richard Sandiford <richard.sandiford@arm.com> wrote:
>Martin Liška <mliska@suse.cz> writes:
>> On 5/26/20 12:15 PM, Richard Sandiford wrote:
>>> So longer-term, I think we should replace VCOND(U) with individual
>ifns,
>>> like for VCONDEQ.  We could reduce the number of optabs needed by
>>> canonicalising greater-based tests to lesser-based tests.
>>
>> Hello.
>>
>> Thanks for the feedback. So would it be possible to go with something
>> like DEF_INTERNAL_OPTAB_CAN_FAIL (see the attachment)?
>
>It doesn't look like this will solve the problem.  The reason that we
>don't allow optabs for directly-mapped IFNs to FAIL is that:
>
>  expand_insn (icode, 6, ops);
>
>will (deliberately) ICE when the pattern FAILs.  Code that copes with
>FAILing optabs instead needs to do:
>
>rtx_insn *watermark = get_last_insn (); <-- position where it should go.
>  ...
>  if (maybe_expand_insn (icode, 6, ops))
>    {
>      ...Success...;
>    }
>
>  delete_insns_since (watermark);
>  ...fallback code that implements the IFN without optab support...
>
>At this point the IFN isn't really directly-mapped in the intended
>sense:
>the optab is “just” a way of optimising the IFN.
>
>So I think the effect of the patch will be to suppress the build
>failure,
>but instead ICE for PowerPC when the FAIL condition is hit.  It might
>be quite difficult to trigger though.  (That's why the static checking
>is there. :-))
>
>I think instead we should treat VCOND(U) as not directly-mapped,
>as Richard suggested (IIRC).  The internal-fn.c code should then handle
>the case in which we have an IFN_VCOND(U) call and the associated
>optab fails.  Of course, this is only going to be exercised on targets
>like powerpc* that having failing patterns, so it'll need testing
>there.
>
>What I meant by the quote above is that I think this shows the flaw in
>using IFN_VCOND(U) rather than splitting it up further.  Longer term,
>we should have a separate IFN_VCOND* and optab for each necessary
>condition.  There would then be no need (IMO) to allow the patterns
>to FAIL, and we could use directly-mapped IFNs with no fallback.
>There'd also be no need for the tree comparison operand to the IFN.

That might be indeed a good idea. 

Richard. 

>Thanks,
>Richard
Martin Liška May 28, 2020, 2:46 p.m. UTC | #12
Hi.

There's a new patch that adds normal internal functions for the 4
VCOND* functions.

The patch survives bootstrap and regression
tests on x86_64-linux-gnu and ppc64le-linux-gnu.

Thoughts?
Martin
Richard Sandiford May 28, 2020, 3:28 p.m. UTC | #13
Martin Liška <mliska@suse.cz> writes:
> Hi.
>
> There's a new patch that adds normal internal functions for the 4
> VCOND* functions.
>
> The patch that survives bootstrap and regression
> tests on x86_64-linux-gnu and ppc64le-linux-gnu.

I think this has the same problem as the previous one.  What I meant
in yesterday's message is that:

  expand_insn (icode, 6, ops);

is simply not valid when icode is allowed to FAIL.  That's true in
any context, not just internal functions.  If icode does FAIL,
the expand_insn call will ICE:

  if (!maybe_expand_insn (icode, nops, ops))
    gcc_unreachable ();

When using optabs you either:

(a) declare that the md patterns aren't allowed to FAIL.  expand_insn
    is for this case.

(b) allow the md patterns to FAIL and provide a fallback when they do.
    maybe_expand_insn is for this case.

So if we keep IFN_VCOND, we need to use maybe_expand_insn and find some
way of implementing the IFN_VCOND when the pattern FAILs.

Thanks,
Richard
Richard Biener May 29, 2020, 12:17 p.m. UTC | #14
On Thu, May 28, 2020 at 5:28 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Martin Liška <mliska@suse.cz> writes:
> > Hi.
> >
> > There's a new patch that adds normal internal functions for the 4
> > VCOND* functions.
> >
> > The patch that survives bootstrap and regression
> > tests on x86_64-linux-gnu and ppc64le-linux-gnu.
>
> I think this has the same problem as the previous one.  What I meant
> in yesterday's message is that:
>
>   expand_insn (icode, 6, ops);
>
> is simply not valid when icode is allowed to FAIL.  That's true in
> any context, not just internal functions.  If icode does FAIL,
> the expand_insn call will ICE:
>
>   if (!maybe_expand_insn (icode, nops, ops))
>     gcc_unreachable ();
>
> When using optabs you either:
>
> (a) declare that the md patterns aren't allowed to FAIL.  expand_insn
>     is for this case.
>
> (b) allow the md patterns to FAIL and provide a fallback when they do.
>     maybe_expand_insn is for this case.
>
> So if we keep IFN_VCOND, we need to use maybe_expand_insn and find some
> way of implementing the IFN_VCOND when the pattern FAILs.

But we should not have generated the pattern in that case - we actually verify
we can expand at the time we do this "instruction selection".  This is in-line
with other vectorizations where we also do not expect things to FAIL.

See also the expanders that are removed in the patch.

But adding a comment in the internal function expander to reflect this
is probably good, also pointing to the verification routines (the
preexisting expand_vec_cond_expr_p and expand_vec_cmp_expr_p
routines).  Because of this pre-verification I suggested the direct
internal function first, not being aware of the static cannot-FAIL logic.

Now it looks like that those verification also simply checks optab
availability only but then this is just a preexisting issue (and we can
possibly build a testcase that FAILs RTL expansion for power...).

So given that this means the latent bug in the powerpc backend
should be fixed and we should use a direct internal function instead?

Thanks,
Richard.

> Thanks,
> Richard
Richard Biener May 29, 2020, 12:43 p.m. UTC | #15
On Fri, May 29, 2020 at 2:17 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, May 28, 2020 at 5:28 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > Martin Liška <mliska@suse.cz> writes:
> > > Hi.
> > >
> > > There's a new patch that adds normal internal functions for the 4
> > > VCOND* functions.
> > >
> > > The patch that survives bootstrap and regression
> > > tests on x86_64-linux-gnu and ppc64le-linux-gnu.
> >
> > I think this has the same problem as the previous one.  What I meant
> > in yesterday's message is that:
> >
> >   expand_insn (icode, 6, ops);
> >
> > is simply not valid when icode is allowed to FAIL.  That's true in
> > any context, not just internal functions.  If icode does FAIL,
> > the expand_insn call will ICE:
> >
> >   if (!maybe_expand_insn (icode, nops, ops))
> >     gcc_unreachable ();
> >
> > When using optabs you either:
> >
> > (a) declare that the md patterns aren't allowed to FAIL.  expand_insn
> >     is for this case.
> >
> > (b) allow the md patterns to FAIL and provide a fallback when they do.
> >     maybe_expand_insn is for this case.
> >
> > So if we keep IFN_VCOND, we need to use maybe_expand_insn and find some
> > way of implementing the IFN_VCOND when the pattern FAILs.
>
> But we should not have generated the pattern in that case - we actually verify
> we can expand at the time we do this "instruction selection".  This is in-line
> with other vectorizations where we also do not expect things to FAIL.
>
> See also the expanders that are removed in the patch.
>
> But adding a comment in the internal function expander to reflect this
> is probably good, also pointing to the verification routines (the
> preexisting expand_vec_cond_expr_p and expand_vec_cmp_expr_p
> routines).  Because of this pre-verification I suggested the direct
> internal function first, not being aware of the static cannot-FAIL logic.
>
> Now it looks like that those verification also simply checks optab
> availability only but then this is just a preexisting issue (and we can
> possibly build a testcase that FAILs RTL expansion for power...).
>
> So given that this means the latent bug in the powerpc backend
> should be fixed and we should use a direct internal function instead?

So I tried to understand the circumstances the rs6000 patterns FAIL
but FAILed ;)  It looks like some outs of rs6000_emit_vector_cond_expr
are unwarranted and the following should work:

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8435bc15d72..5503215a00a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -14638,8 +14638,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
        rtx mask2;

        rev_code = reverse_condition_maybe_unordered (rcode);
-       if (rev_code == UNKNOWN)
-         return NULL_RTX;
+       gcc_assert (rev_code != UNKNOWN);

        nor_code = optab_handler (one_cmpl_optab, dmode);
        if (nor_code == CODE_FOR_nothing)
@@ -14737,8 +14736,7 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx
op_true, rtx op_false,
   rtx cond2;
   bool invert_move = false;

-  if (VECTOR_UNIT_NONE_P (dest_mode))
-    return 0;
+  gcc_assert (VECTOR_UNIT_NONE_P (dest_mode));

   gcc_assert (GET_MODE_SIZE (dest_mode) == GET_MODE_SIZE (mask_mode)
              && GET_MODE_NUNITS (dest_mode) == GET_MODE_NUNITS (mask_mode));
@@ -14756,8 +14754,7 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx
op_true, rtx op_false,
         e.g., A  = (B != C) ? D : E becomes A = (B == C) ? E : D.  */
       invert_move = true;
       rcode = reverse_condition_maybe_unordered (rcode);
-      if (rcode == UNKNOWN)
-       return 0;
+      gcc_assert (rcode != UNKNOWN);
       break;

     case GE:

which leaves the

  /* Get the vector mask for the given relational operations.  */
  mask = rs6000_emit_vector_compare (rcode, cc_op0, cc_op1, mask_mode);

  if (!mask)
    return 0;

fail but that function recurses heavily - from reading
rs6000_emit_vector_compare_inner
it looks like power can do a lot of compares but floating-point LT which
reverse_condition_maybe_unordered would turn into UNGE which is not
handled either.
But then rs6000_emit_vector_compare just tries GT for that anyway (not UNGE) so
it is actually handled (but should not be?).

So I bet the expansion of the patterns cannot fail at the moment.  Thus I'd
replace the FAIL with a gcc_unreachable () and see if we have test
coverage for those
FAILs.

Segher - do you actually know this code to guess why the patterns are defensive?

Thanks,
Richard.

> Thanks,
> Richard.
>
> > Thanks,
> > Richard
Segher Boessenkool May 29, 2020, 3:39 p.m. UTC | #16
On Fri, May 29, 2020 at 02:17:00PM +0200, Richard Biener wrote:
> Now it looks like that those verification also simply checks optab
> availability only but then this is just a preexisting issue (and we can
> possibly build a testcase that FAILs RTL expansion for power...).
> 
> So given that this means the latent bug in the powerpc backend
> should be fixed and we should use a direct internal function instead?

I don't see what you consider a bug in the backend here?  The expansion
FAILs, and it is explicitly allowed to do that.

Not allowed to FAIL are:
-- The "lanes" things;
-- vec_duplicate, vec_series;
-- maskload, maskstore;
-- fmin, fmax;
-- madd and friends;
-- sqrt, rsqrt;
-- fmod, remainder;
-- scalb, ldexp;
-- sin, cos, tan, asin, acos, atan;
-- exp, expm1, exp10, exp2, log, log1p, log10, log2, logb;
-- significand, pow, atan2, floor, btrunc, round, ceil, nearbyint, rint;
-- copysign, xorsign;
-- ffs, clrsb, clz, ctz, popcount, parity.

All vcond* patterns are allowed to fail.

Maybe ours don't *need* to, but that doesn't change a thing.

In general, it is a Very Good Thing if patterns are allowed to fail: if
they are not allowed to fail, they have to duplicate all the code that
the generic expander should have, into ever target that needs it.  It
also makes writing a (new) backend easier.


Segher
Segher Boessenkool May 29, 2020, 4:47 p.m. UTC | #17
On Fri, May 29, 2020 at 02:43:12PM +0200, Richard Biener wrote:
> So I tried to understand the circumstances the rs6000 patterns FAIL
> but FAILed ;)  It looks like some outs of rs6000_emit_vector_cond_expr
> are unwarranted and the following should work:
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 8435bc15d72..5503215a00a 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -14638,8 +14638,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,

(Different function, btw)

>         rtx mask2;
> 
>         rev_code = reverse_condition_maybe_unordered (rcode);
> -       if (rev_code == UNKNOWN)
> -         return NULL_RTX;
> +       gcc_assert (rev_code != UNKNOWN);

reverse_condition_maybe_unordered is documented as possibly returning
UNKNOWN.  The current implementation doesn't, sure.  But fix that first?

rs6000_emit_vector_compare can fail for several other reasons, too --
including when rs6000_emit_vector_compare_inner fails.

> @@ -14737,8 +14736,7 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx
> op_true, rtx op_false,
>    rtx cond2;
>    bool invert_move = false;
> 
> -  if (VECTOR_UNIT_NONE_P (dest_mode))
> -    return 0;
> +  gcc_assert (VECTOR_UNIT_NONE_P (dest_mode));

Why can this condition never be true?  (Missing a ! btw)

It needs a big comment if you want to make wide assumptions like that,
in any case.  Pretty much *all* (non-trivial) asserts need an explanation.

(And perhaps VECTOR_UNIT_ALTIVEC_OR_VSX_P is better).

>   /* Get the vector mask for the given relational operations.  */
>   mask = rs6000_emit_vector_compare (rcode, cc_op0, cc_op1, mask_mode);
> 
>   if (!mask)
>     return 0;
> 
> fail but that function recurses heavily - from reading
> rs6000_emit_vector_compare_inner
> it looks like power can do a lot of compares but floating-point LT which
> reverse_condition_maybe_unordered would turn into UNGE which is not
> handled either.
> But then rs6000_emit_vector_compare just tries GT for that anyway (not UNGE) so
> it is actually handled (but should not be?).
> 
> So I bet the expansion of the patterns cannot fail at the moment.  Thus I'd
> replace the FAIL with a gcc_unreachable () and see if we have test
> coverage for those
> FAILs.

I am not comfortable with that at all.

> Segher - do you actually know this code to guess why the patterns are defensive?

Yes.


If you want to change the documented semantics of widely used functions,
please propose that?


Segher
Richard Sandiford May 29, 2020, 4:57 p.m. UTC | #18
Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Fri, May 29, 2020 at 02:17:00PM +0200, Richard Biener wrote:
>> Now it looks like that those verification also simply checks optab
>> availability only but then this is just a preexisting issue (and we can
>> possibly build a testcase that FAILs RTL expansion for power...).
>> 
>> So given that this means the latent bug in the powerpc backend
>> should be fixed and we should use a direct internal function instead?
>
> I don't see what you consider a bug in the backend here?  The expansion
> FAILs, and it is explicitly allowed to do that.

Well, the docs say:

  …  For **certain** named patterns, it may invoke @code{FAIL} to tell the
  compiler to use an alternate way of performing that task.  …

(my emphasis).  Later on they say:

  @findex FAIL
  @item FAIL
  …

  Failure is currently supported only for binary (addition, multiplication,
  shifting, etc.) and bit-field (@code{extv}, @code{extzv}, and @code{insv})
  operations.

which explicitly says that vcond* isn't allowed to fail.

OK, so that list looks out of date.  But still. :-)

We now explicitly say that some patterns aren't allowed to FAIL,
which I guess gives the (implicit) impression that all the others can.
But that wasn't the intention.  The lines were just added for emphasis.
(AFAIK 7f9844caf1ebd513 was the first patch to do this.)

Richard
Richard Sandiford May 29, 2020, 5:05 p.m. UTC | #19
Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Fri, May 29, 2020 at 02:43:12PM +0200, Richard Biener wrote:
>> So I tried to understand the circumstances the rs6000 patterns FAIL
>> but FAILed ;)  It looks like some outs of rs6000_emit_vector_cond_expr
>> are unwarranted and the following should work:
>> 
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index 8435bc15d72..5503215a00a 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -14638,8 +14638,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
>
> (Different function, btw)
>
>>         rtx mask2;
>> 
>>         rev_code = reverse_condition_maybe_unordered (rcode);
>> -       if (rev_code == UNKNOWN)
>> -         return NULL_RTX;
>> +       gcc_assert (rev_code != UNKNOWN);
>
> reverse_condition_maybe_unordered is documented as possibly returning
> UNKNOWN.  The current implementation doesn't, sure.  But fix that first?
>
> rs6000_emit_vector_compare can fail for several other reasons, too --
> including when rs6000_emit_vector_compare_inner fails.
>
>> @@ -14737,8 +14736,7 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx
>> op_true, rtx op_false,
>>    rtx cond2;
>>    bool invert_move = false;
>> 
>> -  if (VECTOR_UNIT_NONE_P (dest_mode))
>> -    return 0;
>> +  gcc_assert (VECTOR_UNIT_NONE_P (dest_mode));
>
> Why can this condition never be true?  (Missing a ! btw)
>
> It needs a big comment if you want to make wide assumptions like that,
> in any case.  Pretty much *all* (non-trivial) asserts need an explanation.
>
> (And perhaps VECTOR_UNIT_ALTIVEC_OR_VSX_P is better).
>
>>   /* Get the vector mask for the given relational operations.  */
>>   mask = rs6000_emit_vector_compare (rcode, cc_op0, cc_op1, mask_mode);
>> 
>>   if (!mask)
>>     return 0;
>> 
>> fail but that function recurses heavily - from reading
>> rs6000_emit_vector_compare_inner
>> it looks like power can do a lot of compares but floating-point LT which
>> reverse_condition_maybe_unordered would turn into UNGE which is not
>> handled either.
>> But then rs6000_emit_vector_compare just tries GT for that anyway (not UNGE) so
>> it is actually be handled (but should not?).
>> 
>> So I bet the expansion of the patterns cannot fail at the moment.  Thus I'd
>> replace the FAIL with a gcc_unreachable () and see if we have test
>> coverage for those
>> FAILs.
>
> I am not comfortable with that at all.
>
>> Segher - do you actually know this code to guess why the patterns are defensive?
>
> Yes.

In that case, can you give a specific example in which the patterns do
actually fail?

I think Richard's point is that even the current compiler will ICE if
the vcond* patterns fail.  All Martin's patch did was expose that via
the extra static checking we get for directly-mapped internal fns.
If you want us to fix that by providing a fallback, we need to know what
the fallback should do.  E.g. the obvious thing would be to emit the
embedded comparison separately and then emit bitwise operations to
implement the select.  But in the powerpc case, it's actually the
comparison that's the potential problem, so that expansion would just
kick the can further down the road.
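
For concreteness, the bitwise-select half of that fallback would be
roughly the following (sketch only, with made-up variable names, and
assuming everything has already been bitcast to the integer mask mode);
the problematic comparison expansion would still be needed to produce
"mask" first:

  rtx sel1 = expand_binop (mask_mode, and_optab, op1, mask,
                           NULL_RTX, 0, OPTAB_LIB_WIDEN);
  rtx nmask = expand_unop (mask_mode, one_cmpl_optab, mask, NULL_RTX, 0);
  rtx sel2 = expand_binop (mask_mode, and_optab, op2, nmask,
                           NULL_RTX, 0, OPTAB_LIB_WIDEN);
  rtx res = expand_binop (mask_mode, ior_optab, sel1, sel2,
                          target, 0, OPTAB_LIB_WIDEN);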

So which vector comparisons doesn't powerpc support, and what should the
fallback vcond* expansion for them be?

Richard
Segher Boessenkool May 29, 2020, 5:09 p.m. UTC | #20
On Fri, May 29, 2020 at 05:57:13PM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > On Fri, May 29, 2020 at 02:17:00PM +0200, Richard Biener wrote:
> >> Now it looks like that those verification also simply checks optab
> >> availability only but then this is just a preexisting issue (and we can
> >> possibly build a testcase that FAILs RTL expansion for power...).
> >> 
> >> So given that this means the latent bug in the powerpc backend
> >> should be fixed and we should use a direct internal function instead?
> >
> > I don't see what you consider a bug in the backend here?  The expansion
> > FAILs, and it is explicitly allowed to do that.
> 
> Well, the docs say:
> 
>   …  For **certain** named patterns, it may invoke @code{FAIL} to tell the
>   compiler to use an alternate way of performing that task.  …
> 
> (my emphasis).  Later on they say:
> 
>   @findex FAIL
>   @item FAIL
>   …
> 
>   Failure is currently supported only for binary (addition, multiplication,
>   shifting, etc.) and bit-field (@code{extv}, @code{extzv}, and @code{insv})
>   operations.
> 
> which explicitly says that vcond* isn't allowed to fail.
> 
> OK, so that list looks out of date.  But still. :-)
> 
> We now explicitly say that some patterns aren't allowed to FAIL,
> which I guess gives the (implicit) impression that all the others can.
> But that wasn't the intention.  The lines were just added for emphasis.
> (AFAIK 7f9844caf1ebd513 was the first patch to do this.)

Most patterns *do* FAIL on some target.  We cannot rewind time.


Segher
Richard Sandiford May 29, 2020, 5:26 p.m. UTC | #21
Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Fri, May 29, 2020 at 05:57:13PM +0100, Richard Sandiford wrote:
>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>> > On Fri, May 29, 2020 at 02:17:00PM +0200, Richard Biener wrote:
>> >> Now it looks like that those verification also simply checks optab
>> >> availability only but then this is just a preexisting issue (and we can
>> >> possibly build a testcase that FAILs RTL expansion for power...).
>> >> 
>> >> So given that this means the latent bug in the powerpc backend
>> >> should be fixed and we should use a direct internal function instead?
>> >
>> > I don't see what you consider a bug in the backend here?  The expansion
>> > FAILs, and it is explicitly allowed to do that.
>> 
>> Well, the docs say:
>> 
>>   …  For **certain** named patterns, it may invoke @code{FAIL} to tell the
>>   compiler to use an alternate way of performing that task.  …
>> 
>> (my emphasis).  Later on they say:
>> 
>>   @findex FAIL
>>   @item FAIL
>>   …
>> 
>>   Failure is currently supported only for binary (addition, multiplication,
>>   shifting, etc.) and bit-field (@code{extv}, @code{extzv}, and @code{insv})
>>   operations.
>> 
>> which explicitly says that vcond* isn't allowed to fail.
>> 
>> OK, so that list looks out of date.  But still. :-)
>> 
>> We now explicitly say that some patterns aren't allowed to FAIL,
>> which I guess gives the (implicit) impression that all the others can.
>> But that wasn't the intention.  The lines were just added for emphasis.
>> (AFAIK 7f9844caf1ebd513 was the first patch to do this.)
>
> Most patterns *do* FAIL on some target.  We cannot rewind time.

Sure.  But the point is that FAILing isn't “explicitly allowed” for vcond*.
In fact it's the opposite.

If we ignore the docs and look at what the status quo actually is --
which I agree seems safest for GCC :-) -- then patterns are allowed to
FAIL if target-independent code provides an expand-time fallback for
the FAILing case.  But that isn't true for vcond either.
expand_vec_cond_expr does:

  icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);
  if (icode == CODE_FOR_nothing)
    ...

  comparison = vector_compare_rtx (VOIDmode, tcode, op0a, op0b, unsignedp,
				   icode, 4);
  rtx_op1 = expand_normal (op1);
  rtx_op2 = expand_normal (op2);

  create_output_operand (&ops[0], target, mode);
  create_input_operand (&ops[1], rtx_op1, mode);
  create_input_operand (&ops[2], rtx_op2, mode);
  create_fixed_operand (&ops[3], comparison);
  create_fixed_operand (&ops[4], XEXP (comparison, 0));
  create_fixed_operand (&ops[5], XEXP (comparison, 1));
  expand_insn (icode, 6, ops);
  return ops[0].value;

which ICEs if the expander FAILs.
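
For reference, expand_insn itself is roughly the following (quoting
optabs.c from memory, so treat the exact form as approximate):

void
expand_insn (enum insn_code icode, unsigned int nops,
             class expand_operand *ops)
{
  if (!maybe_expand_insn (icode, nops, ops))
    gcc_unreachable ();
}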

So whether you go from the docs or from what's actually implemented,
vcond* isn't currently allowed to FAIL.  All Richard's gcc_unreachable
suggestion would do is change where the ICE happens.

Richard
Segher Boessenkool May 29, 2020, 5:30 p.m. UTC | #22
On Fri, May 29, 2020 at 06:05:14PM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > On Fri, May 29, 2020 at 02:43:12PM +0200, Richard Biener wrote:
> >> Segher - do you actually know this code to guess why the patterns are defensive?
> >
> > Yes.
> 
> In that case, can you give a specific example in which the patterns do
> actually fail?

That is a very different question.  (And this is shifting the burden of
proof again.)

> I think Richard's point is that even the current compiler will ICE if
> the vcond* patterns fail.  All Martin's patch did was expose that via
> the extra static checking we get for directly-mapped internal fns.

How will they ICE?

> If you want us to fix that by providing a fallback, we need to know what
> the fallback should do.

Just whatever vcond* is documented to do, of course ;-)

> E.g. the obvious thing would be to emit the
> embedded comparison separately and then emit bitwise operations to
> implement the select.  But in the powerpc case, it's actually the
> comparison that's the potential problem, so that expansion would just
> kick the can further down the road.
> 
> So which vector comparisons doesn't powerpc support, and what should the
> fallback vcond* expansion for them be?

It depends on which set of vector registers is in use, and on the ISA
version as well, what the hardware can do.  What the backend can do --
well, it is allowed to FAIL these patterns, and it sometimes does.
That's the whole point, isn't it?

vec_cmp* won't FAIL.  I don't know if there is a portable variant of
this?


Segher
Segher Boessenkool May 29, 2020, 5:37 p.m. UTC | #23
On Fri, May 29, 2020 at 06:26:55PM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > Most patterns *do* FAIL on some target.  We cannot rewind time.
> 
> Sure.  But the point is that FAILing isn't “explicitly allowed” for vcond*.
> In fact it's the opposite.

It has FAILed on rs6000 since 2004.

> If we ignore the docs and look at what the status quo actually is --
> which I agree seems safest for GCC :-) -- then patterns are allowed to
> FAIL if target-independent code provides an expand-time fallback for
> the FAILing case.  But that isn't true for vcond either.

That is a bug in the callers then :-)

> expand_vec_cond_expr does:
> 
>   icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);
>   if (icode == CODE_FOR_nothing)
>     ...
> 
>   comparison = vector_compare_rtx (VOIDmode, tcode, op0a, op0b, unsignedp,
> 				   icode, 4);
>   rtx_op1 = expand_normal (op1);
>   rtx_op2 = expand_normal (op2);
> 
>   create_output_operand (&ops[0], target, mode);
>   create_input_operand (&ops[1], rtx_op1, mode);
>   create_input_operand (&ops[2], rtx_op2, mode);
>   create_fixed_operand (&ops[3], comparison);
>   create_fixed_operand (&ops[4], XEXP (comparison, 0));
>   create_fixed_operand (&ops[5], XEXP (comparison, 1));
>   expand_insn (icode, 6, ops);
>   return ops[0].value;
> 
> which ICEs if the expander FAILs.
> 
> So whether you go from the docs or from what's actually implemented,
> vcond* isn't currently allowed to FAIL.  All Richard's gcc_unreachable
> suggestion would do is change where the ICE happens.


>   icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);
>   if (icode == CODE_FOR_nothing)
>     ...

Of course it is allowed to FAIL, based on this code.  That is: the RTL
pattern is allowed to FAIL.  Whatever optabs do, I never understood :-)

Is this vec_cmp that is used by the fallback?  That will never FAIL
for us (if it is enabled at all, natch, same as for any other target).


Segher
Richard Sandiford May 30, 2020, 7:15 a.m. UTC | #24
Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Fri, May 29, 2020 at 06:26:55PM +0100, Richard Sandiford wrote:
>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>> > Most patterns *do* FAIL on some target.  We cannot rewind time.
>> 
>> Sure.  But the point is that FAILing isn't “explicitly allowed” for vcond*.
>> In fact it's the opposite.
>
> It has FAILed on rs6000 since 2004.

But that just means that the powerpc bug has been there since 2004,
assuming these FAILs can actually trigger in practice.  At that time,
the corresponding expand code was:

/* Generate insns for VEC_COND_EXPR.  */

rtx
expand_vec_cond_expr (tree vec_cond_expr, rtx target)
{
  enum insn_code icode;
  rtx comparison, rtx_op1, rtx_op2, cc_op0, cc_op1;
  enum machine_mode mode = TYPE_MODE (TREE_TYPE (vec_cond_expr));
  bool unsignedp = TYPE_UNSIGNED (TREE_TYPE (vec_cond_expr));

  icode = get_vcond_icode (vec_cond_expr, mode);
  if (icode == CODE_FOR_nothing)
    return 0;

  if (!target)
    target = gen_reg_rtx (mode);

  /* Get comparison rtx.  First expand both cond expr operands.  */
  comparison = vector_compare_rtx (TREE_OPERAND (vec_cond_expr, 0), 
                                   unsignedp, icode);
  cc_op0 = XEXP (comparison, 0);
  cc_op1 = XEXP (comparison, 1);
  /* Expand both operands and force them in reg, if required.  */
  rtx_op1 = expand_expr (TREE_OPERAND (vec_cond_expr, 1),
                         NULL_RTX, VOIDmode, 1);
  if (!(*insn_data[icode].operand[1].predicate) (rtx_op1, mode)
      && mode != VOIDmode)
    rtx_op1 = force_reg (mode, rtx_op1);

  rtx_op2 = expand_expr (TREE_OPERAND (vec_cond_expr, 2),
                         NULL_RTX, VOIDmode, 1);
  if (!(*insn_data[icode].operand[2].predicate) (rtx_op2, mode)
      && mode != VOIDmode)
    rtx_op2 = force_reg (mode, rtx_op2);

  /* Emit instruction! */
  emit_insn (GEN_FCN (icode) (target, rtx_op1, rtx_op2, 
                              comparison, cc_op0,  cc_op1));

  return target;
}

i.e. no fallbacks, and no checking whether the expansion even
succeeded.  Since FAIL just causes the generator to return null,
and since emit_insn is a no-op for null insns, the effect for
FAILs was to emit no instructions and return an uninitialised
target register.

The silent use of an uninitialised register was changed in 2011
to an ICE, via the introduction of expand_insn.

>> If we ignore the docs and look at what the status quo actually is --
>> which I agree seems safest for GCC :-) -- then patterns are allowed to
>> FAIL if target-independent code provides an expand-time fallback for
>> the FAILing case.  But that isn't true for vcond either.
>
> That is a bug in the callers then :-)

It was a bug in the powerpc patch you cited that added the FAILs
without changing target-independent code to cope with them.

The fact that we've had no code to handle the FAILs for 15+ years
without apparent problems makes it even more likely that the FAILs
never happen in practice.

If you think the FAILs do trigger in practice, please provide an example.

Richard
Segher Boessenkool May 30, 2020, 1:08 p.m. UTC | #25
Hi!

On Sat, May 30, 2020 at 08:15:55AM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> >> Sure.  But the point is that FAILing isn't “explicitly allowed” for vcond*.
> >> In fact it's the opposite.

I disagree btw, and no one else has noticed for 16 years either.

In general, almost all patterns can FAIL, and those that can not are
simply because no one wrote fallback code.  Which means that all
targets that need a fallback need to implement the same thing for
themselves, which is just a waste and causes extra errors.

So, "cannot FAIL" should be a temporary thing, and should change to
"can FAIL" as soon as someone implements that, and never be changed
back -- and it should be how almost everything is in the first place
(and it still is, thankfully).

> > It has FAILed on rs6000 since 2004.
> 
> But that just means that the powerpc bug has been there since 2004,
> assuming these FAILs can actually trigger in practice.  At that time,
> the corresponding expand code was:

I, and I think most other people, thought it was allowed to FAIL (and
I still do).

> rtx
> expand_vec_cond_expr (tree vec_cond_expr, rtx target)

[ snip ]

So this was buggy.

> i.e. no fallbacks, and no checking whether the expansion even
> succeeded.  Since FAIL just causes the generator to return null,
> and since emit_insn is a no-op for null insns, the effect for
> FAILs was to emit no instructions and return an uninitialised
> target register.
> 
> The silent use of an uninitialised register was changed in 2011
> to an ICE, via the introduction of expand_insn.

Yeah, I ran into some of that in 2015, at least then not all of that
was fixed.  That was some very basic insn I think, that really should
never fail, a simple branch or something...  Was surprising though, a
good reminder to always check return values :-)

> The fact that we've had no code to handle the FAILs for 15+ years
> without apparent problems makes it even more likely that the FAILs
> never happen in practice.

AltiVec can do a lot less than VSX (and VSX on p7 can do less than on
p8, and that can do less than p9, etc.), so I am pretty certain it
could fail for some cases.  Until not so very long ago these
patterns were mainly (or only?) used via builtins, and the code for
those handles all those cases already.

> If you think the FAILs do trigger in practice, please provide an example.

As I said before, that is completely beside the point.

vcond is allowed to FAIL.  No pattern that can FAIL should ever be
changed to not allow that anymore.  This would make no sense at all.


Segher
Richard Biener June 2, 2020, 11:09 a.m. UTC | #26
On Sat, May 30, 2020 at 3:08 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Sat, May 30, 2020 at 08:15:55AM +0100, Richard Sandiford wrote:
> > Segher Boessenkool <segher@kernel.crashing.org> writes:
> > >> Sure.  But the point is that FAILing isn't “explicitly allowed” for vcond*.
> > >> In fact it's the opposite.
>
> I disagree btw, and no one else has noticed for 16 years either.
>
> In general, almost all patterns can FAIL, and those that can not are
> simply because no one wrote fallback code.  Which means that all
> targets that need a fallback need to implement the same thing for
> themselves, which is just a waste and causes extra errors.
>
> So, "cannot FAIL" should be a temporary thing, and should change to
> "can FAIL" as soon as someone implements that, and never be changed
> back -- and it should be how almost everything is in the first place
> (and it still is, thankfully).
>
> > > It has FAILed on rs6000 since 2004.
> >
> > But that just means that the powerpc bug has been there since 2004,
> > assuming these FAILs can actually trigger in practice.  At that time,
> > the corresponding expand code was:
>
> I, and I think most other people, thought it was allowed to FAIL (and
> I still do).
>
> > rtx
> > expand_vec_cond_expr (tree vec_cond_expr, rtx target)
>
> [ snip ]
>
> So this was buggy.
>
> > i.e. no fallbacks, and no checking whether the expansion even
> > succeeded.  Since FAIL just causes the generator to return null,
> > and since emit_insn is a no-op for null insns, the effect for
> > FAILs was to emit no instructions and return an uninitialised
> > target register.
> >
> > The silent use of an uninitialised register was changed in 2011
> > to an ICE, via the introduction of expand_insn.
>
> Yeah, I ran into some of that in 2015, at least then not all of that
> was fixed.  That was some very basic insn I think, that really should
> never fail, a simple branch or something...  Was surprising though, a
> good reminder to always check return values :-)
>
> > The fact that we've had no code to handle the FAILs for 15+ years
> > without apparent problems makes it even more likely that the FAILs
> > never happen in practice.
>
> AltiVec can do a lot less than VSX (and VSX on p7 can do less than on
> p8, and that can do less than p9, etc.), so I am pretty certain it
> could fail for some cases.  Only up to not so very long ago these
> patterns were mainly (or only?) used via builtins, and the code for
> those handles all those cases already.
>
> > If you think the FAILs do trigger in practice, please provide an example.
>
> As I said before, that is completely beside the point.
>
> vcond is allowed to FAIL.  No pattern that can FAIL should ever be
> changed to not allow that anymore.  This would make no sense at all.

Fact is, if the FAIL happens we ICE _currently_.

The patterns may not FAIL since otherwise the vectorizer has no way
to check whether the backend can code-generate and fail vectorization
if not.  FAIL is a _very_ unspecific fail and I guess the middle-end would
need to cope with a pattern (considered as may-FAIL) doing

  if (random() == 5)
    FAIL;

specifically not FAIL when invoked during vectorization but FAIL when
re-invoked during RTL expansion just because some internal state
at the point of RTL expansion cannot be simulated at vectorization time.

Now the real issue here is of course that if vcond expansion may
FAIL then, based on your reasoning, vec_cmp expansion may do so
as well.  In that case there's _no_ way to code generate
either of the two.  Well, spill, do elementwise compare (are cmp*
allowed to FAIL?), store, load would do the trick - but that defeats
the attempt to cost during vectorization.

So please be constructive.  Like, provide a testcase that ICEs
with the FAILs replaced by gcc_unreachable ().  Martin, may I suggest
to do this replacement and bootstrap/test?  I think it would be nice
to have testsuite coverage for the FAILs, and maybe we have that
already.

To get out of the deadlock here I'll approve a patch variant that
uses a separate internal function (thus without the static non-FAIL
checking) that preserves current behavior - thus ICE if the pattern
FAILs.  This change is then not a regression.  (please re-post for
appropriate approval)

Unless we come to a consensus in this discussion which seems
to dance around the latent vectorizer <-> rs6000 interface issue.

CCing David as other rs6000 maintainer (the subject isn't
specifically pointing to rs6000 so not sure if you followed the
discussion).

Thanks,
Richard.

>
> Segher
Martin Liška June 2, 2020, 3 p.m. UTC | #27
On 6/2/20 1:09 PM, Richard Biener wrote:
> So please be constructive.  Like, provide a testcase that ICEs
> with the FAILs replaced by gcc_unreachable ().  Martin, may I suggest
> to do this replacement and bootstrap/test?  I think it would be nice
> to have testsuite coverage for the FAILs, and maybe we have that
> already.

Hello.

There's the suggested patch that survives bootstrap on ppc64le-linux-gnu
and passes test-suite.

Martin
Richard Biener June 3, 2020, 7:38 a.m. UTC | #28
On Tue, Jun 2, 2020 at 5:00 PM Martin Liška <mliska@suse.cz> wrote:
>
> On 6/2/20 1:09 PM, Richard Biener wrote:
> > So please be constructive.  Like, provide a testcase that ICEs
> > with the FAILs replaced by gcc_unreachable ().  Martin, may I suggest
> > to do this replacement and bootstrap/test?  I think it would be nice
> > to have testsuite coverage for the FAILs, and maybe we have that
> > already.
>
> Hello.
>
> There's the suggested patch that survives bootstrap on ppc64le-linux-gnu
> and passes test-suite.

OK, so can you please re-post the version of the VEC_COND_EXPR
patch that uses a regular IFN (without the static non-FAIL checking)
in a new thread?  If there's no OK from rs6000 maintainers to remove
the FAILs then we'll go ahead with that version, unless Richard objects
here.

Thanks,
Richard.

> Martin
Richard Sandiford June 3, 2020, 1:41 p.m. UTC | #29
Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Jun 2, 2020 at 5:00 PM Martin Liška <mliska@suse.cz> wrote:
>>
>> On 6/2/20 1:09 PM, Richard Biener wrote:
>> > So please be constructive.  Like, provide a testcase that ICEs
>> > with the FAILs replaced by gcc_unreachable ().  Martin, may I suggest
>> > to do this replacement and bootstrap/test?  I think it would be nice
>> > to have testsuite coverage for the FAILs, and maybe we have that
>> > already.
>>
>> Hello.
>>
>> There's the suggested patch that survives bootstrap on ppc64le-linux-gnu
>> and passes test-suite.
>
> OK, so can you please re-post the version of the VEC_COND_EXPR
> patch that uses a regular IFN (without the static non-FAIL checking)
> in a new thread?  If there's no OK from rs6000 maintainers to remove
> the FAILs then we'll go ahead with that version, unless Richard objects
> here.

Well, it seems unfortunate to have to do that.

I think Martin's powerpc patch is the correct one.  But assuming that
the powerpc maintainers still object, I guess the options are:

- Find enough global reviewers who are prepared to approve that patch,
  to override the powerpc maintainers.

- Avoid conflict by going with the regular IFN patch.  To be clear,
  this will ICE in exactly the same cases that Martin's powerpc patch
  does (and current master does), so there's no real benefit to the
  powerpc port from doing this.  It just makes the code more complicated
  and means that other ports don't benefit from the static checking.

In the circumstances, I agree the second is probably the most practical
way forward.

I can't help but think this is a process failure though.  I don't think
using regular IFNs has any technical merits, and it doesn't give Segher
what he wants either (i.e. code that copes with failing vconds).

Thanks,
Richard
David Edelsohn June 3, 2020, 2:17 p.m. UTC | #30
On Wed, Jun 3, 2020 at 9:41 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Tue, Jun 2, 2020 at 5:00 PM Martin Liška <mliska@suse.cz> wrote:
> >>
> >> On 6/2/20 1:09 PM, Richard Biener wrote:
> >> > So please be constructive.  Like, provide a testcase that ICEs
> >> > with the FAILs replaced by gcc_unreachable ().  Martin, may I suggest
> >> > to do this replacement and bootstrap/test?  I think it would be nice
> >> > to have testsuite coverage for the FAILs, and maybe we have that
> >> > already.
> >>
> >> Hello.
> >>
> >> There's the suggested patch that survives bootstrap on ppc64le-linux-gnu
> >> and passes test-suite.
> >
> > OK, so can you please re-post the version of the VEC_COND_EXPR
> > patch that uses a regular IFN (without the static non-FAIL checking)
> > in a new thread?  If there's no OK from rs6000 maintainers to remove
> > the FAILs then we'll go ahead with that version, unless Richard objects
> > here.
>
> Well, it seems unfortunate to have to do that.
>
> I think Martin's powerpc patch is the correct one.  But assuming that
> the powerpc maintainers still object, I guess the options are:
>
> - Find enough global reviewers who are prepared to approve that patch,
>   to override the powerpc maintainers.

Luckily GCC Development does not operate this way.

How about (3) help to remove reliance on this incorrect behavior from
the PowerPC port?

I didn't formally check, but if this is 16 years old, then it's from
the original RHES Altivec work.

I don't believe that anyone fundamentally is objecting to "fixing this
correctly".  I don't know the entire history of this discussion, but
my objection is to a fix that breaks a long-time assumption of the
PowerPC port and leaves it as an exercise to the PowerPC maintainers
to fix it.

Thanks, David
Richard Biener June 3, 2020, 2:46 p.m. UTC | #31
On Wed, Jun 3, 2020 at 4:17 PM David Edelsohn <dje.gcc@gmail.com> wrote:
>
> On Wed, Jun 3, 2020 at 9:41 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > Richard Biener <richard.guenther@gmail.com> writes:
> > > On Tue, Jun 2, 2020 at 5:00 PM Martin Liška <mliska@suse.cz> wrote:
> > >>
> > >> On 6/2/20 1:09 PM, Richard Biener wrote:
> > >> > So please be constructive.  Like, provide a testcase that ICEs
> > >> > with the FAILs replaced by gcc_unreachable ().  Martin, may I suggest
> > >> > to do this replacement and bootstrap/test?  I think it would be nice
> > >> > to have testsuite coverage for the FAILs, and maybe we have that
> > >> > already.
> > >>
> > >> Hello.
> > >>
> > >> There's the suggested patch that survives bootstrap on ppc64le-linux-gnu
> > >> and passes test-suite.
> > >
> > > OK, so can you please re-post the version of the VEC_COND_EXPR
> > > patch that uses a regular IFN (without the static non-FAIL checking)
> > > in a new thread?  If there's no OK from rs6000 maintainers to remove
> > > the FAILs then we'll go ahead with that version, unless Richard objects
> > > here.
> >
> > Well, it seems unfortunate to have to do that.
> >
> > I think Martin's powerpc patch is the correct one.  But assuming that
> > the powerpc maintainers still object, I guess the options are:
> >
> > - Find enough global reviewers who are prepared to approve that patch,
> >   to override the powerpc maintainers.
>
> Luckily GCC Development does not operate this way.
>
> How about (3) help to remove reliance on this incorrect behavior from
> the PowerPC port?
>
> I didn't formally check, but if this is 16 years old, then it's from
> the original RHES Altivec work.
>
> I don't believe that anyone fundamentally is objecting to "fixing this
> correctly".  I don't know the entire history of this discussion, but
> my objection is to a fix that breaks a long-time assumption of the
> PowerPC port and leaves it as an exercise to the PowerPC maintainers
> to fix it.

I _think_ there's nothing to fix besides removing the FAIL.  And I would
have no idea how to "fix" the powerpc port here since a) we lack a testcase
that actually FAILs, b) I'm not familiar with the ISA.  So we did (3) by
replacing the FAILs with gcc_unreachable () and bootstrap/regtest this
without any regression which I think "proves" the failure modes do not
actually exist.

So I'm not sure how we can help.

There's the option to remove the vcond_* patterns entirely which makes
us fall back to the vec_cmp_* ones, which never FAIL on powerpc.
But I suspect this might regress code generation.

It's suspicious that vec_cmp_* never FAIL when vcond_* do btw. - see
the case I pointed out where it appears to have inconsistencies/wrong-code
issues regarding ordered compare support.  But see above (I do not know
the powerpc ISA).

A vcond can usually be emulated by vec_cmp plus masking.  So if
we ever get a testcase that runs into the gcc_unreachable () I'll promise
to fix it up using this strategy in the vcond expander.  But without a
testcase and powerpc ISA knowledge it's really hard.  Or do you want
us to stick the vec_cmp expansion fallback in place of the FAILs?
I'm sure the powerpc maintainers are better suited to do that even though
I'll probably manage with some cut&paste.  To recap: vcond is
equal to

  mask = vec_cmp of the comparison
  true_masked = true_op & mask;
  false_masked = false_op & ~mask;
  result = true_masked | false_masked;

but I believe this would be dead code never triggered.
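
As a rough source-level illustration of that masking fallback (a hedged
sketch using GNU C vector extensions, not the actual expander code; the
type and function names are made up):

typedef long long v2di __attribute__ ((vector_size (16)));
typedef double v2df __attribute__ ((vector_size (16)));

v2df
vcond_via_masking (v2df a, v2df b, v2df true_op, v2df false_op)
{
  v2di mask = a < b;            /* vec_cmp: all-ones where true, zero otherwise.  */
  v2di true_masked = (v2di) true_op & mask;
  v2di false_masked = (v2di) false_op & ~mask;
  return (v2df) (true_masked | false_masked);
}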

Richard.

>
> Thanks, David
Segher Boessenkool June 3, 2020, 5:01 p.m. UTC | #32
Hi!

On Wed, Jun 03, 2020 at 04:46:12PM +0200, Richard Biener wrote:
> On Wed, Jun 3, 2020 at 4:17 PM David Edelsohn <dje.gcc@gmail.com> wrote:
> > On Wed, Jun 3, 2020 at 9:41 AM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> > > Well, it seems unfortunate to have to do that.
> > >
> > > I think Martin's powerpc patch is the correct one.

It is papering over the issues a little -- the same assumption is made
at lower levels as well, so all *that* needs to be changed as well (not
"fixed", it is not a bug, we have a change in the vcond* interface here;
oh and that should be documented as well).

> > How about (3) help to remove reliance on this incorrect behavior from
> > the PowerPC port?

It is not a reliance on incorrect behaviour.  This is a change.  Which
pretty much everyone seems to want, so fine, but that takes time.

> > I didn't formally check, but if this is 16 years old, then it's from
> > the original RHES Altivec work.

It is, exactly.

> > I don't believe that anyone fundamentally is objecting to "fixing this
> > correctly".  I don't know the entire history of this discussion, but
> > my objection is to a fix that breaks a long-time assumption of the
> > PowerPC port and leaves it as an exercise to the PowerPC maintainers
> > to fix it.

*Exactly*.  This is changing an ancient interface, claiming "it always
was that way" (which very obviously isn't true), and leaving the rs6000
people to deal with all the fallout.  Again.

> I _think_ there's nothing to fix besides removing the FAIL.

All the lower levels need to get asserts as well.  We need a week or so
to put the whole thing through the wringer.  The documentation needs to
be changed by whoever changes the vcond* semantics.  All other ports
should be checked, too.

> And I would
> have no idea how to "fix" the powerpc port here since a) we lack a testcase
> that actually FAILs, b) I'm not familiar with the ISA.  So we did (3) by
> replacing the FAILs with gcc_unreachable () and bootstrap/regtest this
> without any regression which I think "proves" the failure modes do not
> actually exist.

Heh, assuming the testsuite is comprehensive?  Heh.  (Bootstrap doesn't
mean much for vector code).

> So I'm not sure how we can help.

You'll have to document the "vcond* is not allowed to FAIL" change.
We'll deal with the rest.  But testing needs a week or so.  (That is
an extremely short timescale already).

> A vcond can usually be emulated by vec_cmp plus masking.

That would be the generic way to implement this of course, but apparently
such code doesn't yet exist?  If there is a generic implementation it
should be trivial to deal with FAILs.

> So if
> we ever get a testcase that runs into the gcc_unreachable () I'll promise
> to fix it up using this strategy in the vcond expander.  But without a
> testcase and powerpc ISA knowledge it's really hard.  Or do you want
> us to stick the vec_cmp expansion fallback in place of the FAILs?
> I'm sure the powerpc maintainers are better suited to do that even though
> I'll probably manage with some cut&paste.  To recap: vcond is
> equal to
> 
>   mask = vec_cmp of the comparison
>   true_masked = true_op & mask;
>   false_masked = false_op & ~mask;
>   result = true_masked | false_masked;
> 
> but I believe this would be dead code never triggered.

But that would be the generic code as well?  Is that not useful to have
in any case?


Segher
Richard Biener June 3, 2020, 5:23 p.m. UTC | #33
On June 3, 2020 7:01:39 PM GMT+02:00, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>Hi!
>
>On Wed, Jun 03, 2020 at 04:46:12PM +0200, Richard Biener wrote:
>> On Wed, Jun 3, 2020 at 4:17 PM David Edelsohn <dje.gcc@gmail.com>
>wrote:
>> > On Wed, Jun 3, 2020 at 9:41 AM Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> > > Well, it seems unfortunate to have to do that.
>> > >
>> > > I think Martin's powerpc patch is the correct one.
>
>It is papering over the issues a little -- the same assumption is made
>at lower levels as well, so all *that* needs to be changed as well (not
>"fixed", it is not a bug, we have a change in the vcond* interface
>here;
>oh and that should be documented as well).
>
>> > How about (3) help to remove reliance on this incorrect behavior
>from
>> > the PowerPC port?
>
>It is not a reliance on incorrect behaviour.  This is a change.  Which
>pretty much everyone seems to want, so fine, but that takes time.
>
>> > I didn't formally check, but if this is 16 years old, then it's
>from
>> > the original RHES Altivec work.
>
>It is, exactly.
>
>> > I don't believe that anyone fundamentally is objecting to "fixing
>this
>> > correctly".  I don't know the entire history of this discussion,
>but
>> > my objection is to a fix that breaks a long-time assumption of the
>> > PowerPC port and leaves it as an exercise to the PowerPC
>maintainers
>> > to fix it.
>
>*Exactly*.  This is changing an ancient interface, claiming "it always
>was that way" (which very obviously isn't true), and leaving the rs6000
>people to deal with all the fallout.  Again.
>
>> I _think_ there's nothing to fix besides removing the FAIL.
>
>All the lower levels need to get asserts as well.  We need a week or so
>to put the whole thing through the wringer.  The documentation needs to
>be changed by whoever changes the vcond* semantics.  All other ports
>should be checked, too.
>
>> And I would
>> have no idea how to "fix" the powerpc port here since a) we lack a
>testcase
>> that actually FAILs, b) I'm not familiar with the ISA.  So we did (3)
>by
>> replacing the FAILs with gcc_unreachable () and bootstrap/regtest
>this
>> without any regression which I think "proves" the failure modes do
>not
>> actually exist.
>
>Heh, assuming the testsuite is comprehensive?  Heh.  (Bootstrap doesn't
>mean much for vector code).
>
>> So I'm not sure how we can help.
>
>You'll have to document the "vcond* is not allowed to FAIL" change.
>We'll deal with the rest.  But testing needs a week or so.  (That is
>an extremely short timescale already).
>
>> A vcond can usually be emulated by vec_cmp plus masking.
>
>That would be the generic way to implement this of course, but
>apparently
>such code doesn't yet exist?  If there is a generic implementation it
>should be trivial to deal with FAILs.
>
>> So if
>> we ever get a testcase that runs into the gcc_unreachable () I'll
>promise
>> to fix it up using this strategy in the vcond expander.  But without
>a
>> testcase and powerpc ISA knowledge it's really hard.  Or do you want
>> us to stick the vec_cmp expansion fallback in place of the FAILs?
>> I'm sure the powerpc maintainers are better suited to do that even
>though
>> I'll probably manage with some cut&paste.  To recap: vcond is
>> equal to
>> 
>>   mask = vec_cmp of the comparison
>>   true_masked = true_op & mask;
>>   false_masked = false_op & ~mask;
>>   result = true_masked | false_masked;
>> 
>> but I believe this would be dead code never triggered.
>
>But that would be the generic code as well?  Is that not useful to have
>in any case?

Sure. If you remove the vcond patterns from your port the vectorizer will do this transparently for you. So if you do not actually have a more clever way of representing this in the ISA there's no point of the vcond patterns. (though I think the vec_cmp ones didn't originally exist) 

The point is the vectorizer relies on a optab query for querying backend support and power claims vcond support here. If you then FAIL you have lied. (not in your interpretation of the pattern docs but in the implementations since introduction of vcond named patterns) 

So if you're happy I'll document explicitly that vector named patterns may not FAIL. 

Richard. 

>
>Segher
Segher Boessenkool June 3, 2020, 6:23 p.m. UTC | #34
On Wed, Jun 03, 2020 at 07:23:47PM +0200, Richard Biener wrote:
> >>   mask = vec_cmp of the comparison
> >>   true_masked = true_op & mask;
> >>   false_masked = false_op & ~mask;
> >>   result = true_masked | false_masked;
> >> 
> >> but I believe this would be dead code never triggered.
> >
> >But that would be the generic code as well?  Is that not useful to have
> >in any case?
> 
> Sure. If you remove the vcond patterns from your port the vectorizer will do this transparently for you. So if you do not actually have a more clever way of representing this in the ISA there's no point of the vcond patterns. (though I think the vec_cmp ones didn't originally exist) 

So why can the expander not just do that whenever the patterns FAIL as
well?

> The point is the vectorizer relies on a optab query for querying backend support and power claims vcond support here. If you then FAIL you have lied. (not in your interpretation of the pattern docs but in the implementations since introduction of vcond named patterns) 

Almost all RTL patterns are allowed to FAIL, and that is a very good
thing.  If the vectoriser does not allow that, *it* is buggy.

> So if you're happy I'll document explicitly that vector named patterns may not FAIL. 

That will not work in general at all, no.  Please document it for only
those RTL patterns you need it for (and it is documented per pattern
currently, anyway).


Segher
Segher Boessenkool June 3, 2020, 6:27 p.m. UTC | #35
Hi Martin,

Okay, let's try this out.  Okay for trunk.  Thanks for the work!

On Tue, Jun 02, 2020 at 05:00:56PM +0200, Martin Liška wrote:
> >From 22db04d058c9bbd140041e7aa2caf1613767095a Mon Sep 17 00:00:00 2001
> From: Martin Liska <mliska@suse.cz>
> Date: Tue, 2 Jun 2020 15:29:37 +0200
> Subject: [PATCH] rs6000: replace FAIL with gcc_unreachable.

("Replace", and no dot at the end).

> 	* config/rs6000/vector.md: Replace FAIL with gcc_unreachable
> 	in all vcond* patterns.


Segher
Richard Biener June 3, 2020, 6:38 p.m. UTC | #36
On June 3, 2020 8:23:14 PM GMT+02:00, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>On Wed, Jun 03, 2020 at 07:23:47PM +0200, Richard Biener wrote:
>> >>   mask = vec_cmp of the comparison
>> >>   true_masked = true_op & mask;
>> >>   false_masked = false_op & ~mask;
>> >>   result = true_masked | false_masked;
>> >> 
>> >> but I believe this would be dead code never triggered.
>> >
>> >But that would be the generic code as well?  Is that not useful to
>have
>> >in any case?
>> 
>> Sure. If you remove the vcond patterns from your port the vectorizer
>will do this transparently for you. So if you do not actually have a
>more clever way of representing this in the ISA there's no point of the
>vcond patterns. (though I think the vec_cmp ones didn't originally
>exist) 
>
>So why can the expander not just do that whenever the patterns FAIL as
>well?

It could but all the vectorizer costing assumed it goes the 'cheaper' way. So this is kind of a sanity check. And what when vec_cmp expansion fails as well? Resort to scalar soft FP support as ultimate fallback? That sounds very wrong as a auto vectorization result... 

>> The point is the vectorizer relies on a optab query for querying
>backend support and power claims vcond support here. If you then FAIL
>you have lied. (not in your interpretation of the pattern docs but in
>the implementations since introduction of vcond named patterns) 
>
>Almost all RTL patterns are allowed to FAIL, and that is a very good
>thing.  If the vectoriser does not allow that, *it* is buggy.

Your opinion. Please suggest a better way to query target vector capabilities. 

>> So if you're happy I'll document explicitly that vector named
>patterns may not FAIL. 
>
>That will not work in general at all, no.  Please document it for only
>those RTL patterns you need it for (and it is documented per pattern
>currently, anyway).

Sure, will do. But as Richard said, the documented list is the other way around. 
At least that was my interpretation. 

All vectorizer queried optabs have this constraint BTW.

Richard. 

>
>Segher
David Edelsohn June 3, 2020, 6:46 p.m. UTC | #37
On Wed, Jun 3, 2020 at 2:38 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On June 3, 2020 8:23:14 PM GMT+02:00, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >On Wed, Jun 03, 2020 at 07:23:47PM +0200, Richard Biener wrote:
> >> >>   mask = vec_cmp of the comparison
> >> >>   true_masked = true_op & mask;
> >> >>   false_masked = false_op & ~mask;
> >> >>   result = true_masked | false_masked;
> >> >>
> >> >> but I believe this would be dead code never triggered.
> >> >
> >> >But that would be the generic code as well?  Is that not useful to
> >have
> >> >in any case?
> >>
> >> Sure. If you remove the vcond patterns from your port the vectorizer
> >will do this transparently for you. So if you do not actually have a
> >more clever way of representing this in the ISA there's no point of the
> >vcond patterns. (though I think the vec_cmp ones didn't originally
> >exist)
> >
> >So why can the expander not just do that whenever the patterns FAIL as
> >well?
>
> It could but all the vectorizer costing assumed it goes the 'cheaper' way. So this is kind of a sanity check. And what when vec_cmp expansion fails as well? Resort to scalar soft FP support as ultimate fallback? That sounds very wrong as a auto vectorization result...
>
> >> The point is the vectorizer relies on a optab query for querying
> >backend support and power claims vcond support here. If you then FAIL
> >you have lied. (not in your interpretation of the pattern docs but in
> >the implementations since introduction of vcond named patterns)
> >
> >Almost all RTL patterns are allowed to FAIL, and that is a very good
> >thing.  If the vectoriser does not allow that, *it* is buggy.
>
> Your opinion. Please suggest a better way to query target vector capabilities.
>
> >> So if you're happy I'll document explicitly that vector named
> >patterns may not FAIL.
> >
> >That will not work in general at all, no.  Please document it for only
> >those RTL patterns you need it for (and it is documented per pattern
> >currently, anyway).
>
> Sure, will do. But as Richard said, the documented list is the other way around.
> At least that was my interpretation.
>
> All vectorizer queried optabs have this constraint BTW.

Looking at other primary targets, like x86, ARM, AArch64, I agree that
other targets don't FAIL vector patterns in the same manner as scalar
patterns.  The design of Altivec support came with the work from RHES
and no one at the time mentioned that FAILing those named patterns was
incorrect.  As Segher said, allowing named patterns to fail is fairly
standard in GCC and the restriction for vector patterns seems to have
appeared without warning or documentation.  Maybe the vectorizer
started to assume that (because of x86 behavior) but no one announced
this.

Again, I'm not saying that it is an incorrect design or policy, but
that it seems to have been imposed retroactively.  This needs to be
documented. It needs to be applied uniformly throughout the common
parts of the vectorizer and RTL generation. It needs to allow time for
the Power port to adapt. And we need to test this thoroughly to catch
unanticipated fallout.

Thanks, David
Segher Boessenkool June 3, 2020, 7:09 p.m. UTC | #38
On Wed, Jun 03, 2020 at 08:38:04PM +0200, Richard Biener wrote:
> On June 3, 2020 8:23:14 PM GMT+02:00, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >On Wed, Jun 03, 2020 at 07:23:47PM +0200, Richard Biener wrote:
> >> >>   mask = vec_cmp of the comparison
> >> >>   true_masked = true_op & mask;
> >> >>   false_masked = false_op & ~mask;
> >> >>   result = true_masked | false_masked;
> >> >> 
> >> >> but I believe this would be dead code never triggered.
> >> >
> >> >But that would be the generic code as well?  Is that not useful to
> >have
> >> >in any case?
> >> 
> >> Sure. If you remove the vcond patterns from your port the vectorizer
> >will do this transparently for you. So if you do not actually have a
> >more clever way of representing this in the ISA there's no point of the
> >vcond patterns. (though I think the vec_cmp ones didn't originally
> >exist) 
> >
> >So why can the expander not just do that whenever the patterns FAIL as
> >well?
> 
> It could but all the vectorizer costing assumed it goes the 'cheaper' way. So this is kind of a sanity check. And what when vec_cmp expansion fails as well? Resort to scalar soft FP support as ultimate fallback? That sounds very wrong as a auto vectorization result... 

Yeah, but that is specific to the vectoriser, not something that the
RTL expander should have to deal with.  A big impedance mismatch there.

> >> The point is the vectorizer relies on a optab query for querying
> >backend support and power claims vcond support here. If you then FAIL
> >you have lied. (not in your interpretation of the pattern docs but in
> >the implementations since introduction of vcond named patterns) 
> >
> >Almost all RTL patterns are allowed to FAIL, and that is a very good
> >thing.  If the vectoriser does not allow that, *it* is buggy.
> 
> Your opinion. Please suggest a better way to query target vector capabilities. 

Again, that is what RTL does.  If the vectoriser has extra considerations
or requirements, it perhaps should use some more conditions than just
"this pattern exists"?  Maybe this can be done via the optabs, dunno.

I have no good idea how to query things better -- I don't work in the
vectorisers much at all -- but this should not constrain the target
instruction support that much (it makes no good sense to have to have
two separate patterns for things, one for the vectorisers, one for the
builtins and everything else).

Thanks,


Segher
Jakub Jelinek June 3, 2020, 7:13 p.m. UTC | #39
On Wed, Jun 03, 2020 at 02:09:11PM -0500, Segher Boessenkool wrote:
> Yeah, but that is specific to the vectoriser, not something that the
> RTL expander should have to deal with.  A big impedance mismatch there.

It isn't specific to the vectorizer; many named patterns aren't allowed to FAIL
and many others are.
E.g. if one looks at optabs.c, there are many calls to maybe_emit_*
functions which are allowed to FAIL, and many calls to the corresponding
emit_* functions that assert that it doesn't FAIL.
It is true that the documentation documents this in some and not in all
cases.

	Jakub
Martin Liška June 8, 2020, 11:04 a.m. UTC | #40
Hello.

Thank you for the approval. There's the patch that defines 4 new DEF_INTERNAL_OPTAB_FN.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
It also builds on ppc64le-linux-gnu.

Ready to be installed?
Thanks,
Martin
Richard Biener June 9, 2020, 1:42 p.m. UTC | #41
On Mon, Jun 8, 2020 at 1:04 PM Martin Liška <mliska@suse.cz> wrote:
>
> Hello.
>
> Thank you for the approval. There's the patch that defines 4 new DEF_INTERNAL_OPTAB_FN.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> It also builds on ppc64le-linux-gnu.
>
> Ready to be installed?

The ChangeLog refers to DEF_INTERNAL_OPTAB_CAN_FAIL which is no longer there.

Can you put the isel pass to a separate file please?

So this is a first step towards sanitizing VEC_COND_EXPR.  There were followups
mentioned, namely a) enforcing that VEC_COND_EXPR constraint everywhere,
b) isel vector comparisons at the same time since expansion has a
vec_cond fallback

There's

+         /* ???  If we do not cleanup EH then we will ICE in
+            verification.  But in reality we have created wrong-code
+            as we did not properly transition EH info and edges to
+            the piecewise computations.  */
+         if (maybe_clean_eh_stmt (gsi_stmt (gsi))
+             && gimple_purge_dead_eh_edges (bb))
+           cfg_changed = true;

which of course is bad.  It's the comparison that can throw and I guess current
RTL expansion manages to cope by find_many_sub_bbs and friends.  But we
need to get this correct on GIMPLE here.  Note I find it odd this only triggers
during ISEL - it should trigger during the lowering step which splits
the comparison
from the VEC_COND_EXPR.  An appropriate fix at lowering time would be to
insert the VEC_COND_EXPR w/o the condition on the normal outgoing edge
and keep the comparison in place of the original VEC_COND_EXPR,
moving EH info from the VEC_COND_EXPR to the comparison.

I think we need to fix that before merging.

Thanks,
Richard.

> Thanks,
> Martin
Martin Liška June 10, 2020, 8:51 a.m. UTC | #42
On 6/9/20 3:42 PM, Richard Biener wrote:
> On Mon, Jun 8, 2020 at 1:04 PM Martin Liška <mliska@suse.cz> wrote:
>>
>> Hello.
>>
>> Thank you for the approval. There's the patch that defines 4 new DEF_INTERNAL_OPTAB_FN.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>> It also builds on ppc64le-linux-gnu.
>>
>> Ready to be installed?
> 
> The ChangeLog refers to DEF_INTERNAL_OPTAB_CAN_FAIL which is no longer there.


Sure.

> 
> Can you put the isel pass to a separate file please?

Yes.

> 
> So this is a first step towards sanitizing VEC_COND_EXPR.  There were followups
> mentioned, namely a) enforcing that VEC_COND_EXPR constraint everywhere,
> b) isel vector comparisons at the same time since expansion has a
> vec_cond fallback

I'm planning to work on the follow up.

> 
> There's
> 
> +         /* ???  If we do not cleanup EH then we will ICE in
> +            verification.  But in reality we have created wrong-code
> +            as we did not properly transition EH info and edges to
> +            the piecewise computations.  */
> +         if (maybe_clean_eh_stmt (gsi_stmt (gsi))
> +             && gimple_purge_dead_eh_edges (bb))
> +           cfg_changed = true;

Hm, I've tried commenting out the code in both ISEL and expansion and I can't find a test-case
that would trigger a verification error (in vect.exp and i386.exp). Can you come up with
something that will trigger the code?

> 
> which of course is bad.  It's the comparison that can throw and I guess current
> RTL expansion manages to cope by find_many_sub_bbs and friends.  But we
> need to get this correct on GIMPLE here.  Note I find it odd this only triggers
> during ISEL - it should trigger during the lowering step which splits
> the comparison
> from the VEC_COND_EXPR.  An appropriate fix at lowering time would be to
> insert the VEC_COND_EXPR w/o the condition on the normal outgoing edge
> and keep the comparison in place of the original VEC_COND_EXPR,
> moving EH info from the VEC_COND_EXPR to the comparison.

Ah ok, so it's about correction of EH..

Martin

> 
> I think we need to fix that before merging.
> 
> Thanks,
> Richard.
> 
>> Thanks,
>> Martin
Richard Biener June 10, 2020, 10:50 a.m. UTC | #43
On Wed, Jun 10, 2020 at 10:51 AM Martin Liška <mliska@suse.cz> wrote:
>
> On 6/9/20 3:42 PM, Richard Biener wrote:
> > On Mon, Jun 8, 2020 at 1:04 PM Martin Liška <mliska@suse.cz> wrote:
> >>
> >> Hello.
> >>
> >> Thank you for the approval. There's the patch that defines 4 new DEF_INTERNAL_OPTAB_FN.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >> It also builds on ppc64le-linux-gnu.
> >>
> >> Ready to be installed?
> >
> > The ChangeLog refers to DEF_INTERNAL_OPTAB_CAN_FAIL which is no longer there.
>
>
> Sure.
>
> >
> > Can you put the isel pass to a separate file please?
>
> Yes.
>
> >
> > So this is a first step towards sanitizing VEC_COND_EXPR.  There were followups
> > mentioned, namely a) enforcing that VEC_COND_EXPR constraint everywhere,
> > b) isel vector comparisons at the same time since expansion has a
> > vec_cond fallback
>
> I'm planning to work on the follow up.
>
> >
> > There's
> >
> > +         /* ???  If we do not cleanup EH then we will ICE in
> > +            verification.  But in reality we have created wrong-code
> > +            as we did not properly transition EH info and edges to
> > +            the piecewise computations.  */
> > +         if (maybe_clean_eh_stmt (gsi_stmt (gsi))
> > +             && gimple_purge_dead_eh_edges (bb))
> > +           cfg_changed = true;
>
> Hm, I've tried commenting out the code in both ISEL and expansion and I can't find a test-case
> that would trigger a verification error (in vect.exp and i386.exp). Can you come up with
> something that will trigger the code?

typedef double v2df __attribute__((vector_size(16)));

v2df foo (v2df a, v2df b, v2df c, v2df d)
{
  try
    {
      v2df res = a < b ? c : d;
      return res;
    }
  catch (...)
    {
      return (v2df){};
    }
}

with -fnon-call-exceptions should trigger it.

> >
> > which of course is bad.  It's the comparison that can throw and I guess current
> > RTL expansion manages to cope by find_many_sub_bbs and friends.  But we
> > need to get this correct on GIMPLE here.  Note I find it odd this only triggers
> > during ISEL - it should trigger during the lowering step which splits
> > the comparison
> > from the VEC_COND_EXPR.  An appropriate fix at lowering time would be to
> > insert the VEC_COND_EXPR w/o the condition on the normal outgoing edge
> > and keep the comparison in place of the original VEC_COND_EXPR,
> > moving EH info from the VEC_COND_EXPR to the comparison.
>
> Ah ok, so it's about correction of EH..
>
> Martin
>
> >
> > I think we need to fix that before merging.
> >
> > Thanks,
> > Richard.
> >
> >> Thanks,
> >> Martin
>
Martin Liška June 10, 2020, 12:27 p.m. UTC | #44
On 6/10/20 12:50 PM, Richard Biener wrote:
> with -fnon-call-exceptions should trigger it.

Thanks, that works!

We start with:

foo (v2df a, v2df b, v2df c, v2df d)
Eh tree:
    1 try land:{1,<L1>} catch:{}
{
   void * _1;
   v2df _2;
   v2df _8;

   <bb 2> [local count: 1073741824]:
   [LP 1] _8 = VEC_COND_EXPR <a_4(D) < b_5(D), c_6(D), d_7(D)>;

   <bb 3> [local count: 1073741824]:
   # _2 = PHI <{ 0.0, 0.0 }(4), _8(2)>
   return _2;

   <bb 4> [count: 0]:
<L1>: [LP 1]
   _1 = __builtin_eh_pointer (1);
   __cxa_begin_catch (_1);
   __cxa_end_catch ();
   goto <bb 3>; [0.00%]

I tried to use:

	  maybe_clean_or_replace_eh_stmt (stmt, assign);

which does:

   <bb 2> [local count: 1073741824]:
   [LP 1] _12 = a_4(D) < b_5(D);

   <bb 3> [local count: 1073741824]:
   _8 = VEC_COND_EXPR <_12, c_6(D), d_7(D)>;

which requires splitting the BB. But now I'm missing an edge:

/home/marxin/Programming/testcases/vect-low.c: In function ‘v2df foo(v2df, v2df, v2df, v2df)’:
/home/marxin/Programming/testcases/vect-low.c:3:6: error: BB 2 is missing an EH edge

Am I doing that correctly? Or do we have a better function for it?
Martin
Martin Liška June 10, 2020, 1:01 p.m. UTC | #45
On 6/10/20 2:27 PM, Martin Liška wrote:
> /home/marxin/Programming/testcases/vect-low.c: In function ‘v2df foo(v2df, v2df, v2df, v2df)’:
> /home/marxin/Programming/testcases/vect-low.c:3:6: error: BB 2 is missing an EH edge

Ok, I was missing copying the EH edges:

       FOR_EACH_EDGE (e, ei, gimple_bb (old_stmt)->succs)
	{
	  if (e->flags & EDGE_EH)
	    make_edge (gimple_bb (new_stmt), e->dest, e->flags);
	}

Martin
Martin Liška June 11, 2020, 8:52 a.m. UTC | #46
On 6/9/20 3:42 PM, Richard Biener wrote:
> I think we need to fix that before merging.

There's an updated version of the patch that should handle the EH properly.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin
Richard Biener June 12, 2020, 9:43 a.m. UTC | #47
On Thu, Jun 11, 2020 at 10:52 AM Martin Liška <mliska@suse.cz> wrote:
>
> On 6/9/20 3:42 PM, Richard Biener wrote:
> > I think we need to fix that before merging.
>
> There's an updated version of the patch that should handle the EH properly.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

I think it would be better to avoid creating a new edge and removing the old one
by simply moving the condition stmt to the normal outgoing edge.  Thus in
expand_vector_condition do

   if (lookup_stmt_eh_lp (stmt) != 0)
     {
        maybe_clean_or_replace_eh_stmt (stmt, assign);
        edge_iterator ei;
        edge e;
        FOR_EACH_EDGE (e, ei, gimple_bb (assign)->succs)
           if (e->flags & EDGE_EH)
             break;
        if (e)
          {
             gsi_remove (gsi, false);
             gsi_insert_on_edge_immediate (e, stmt);
          }
        else
           gsi_remove (gsi, true);
     }

a twist is probably the following which shows how we wrongly
make 'res' available in the catch block.  The above would
break (your variant as well) since SSA form is broken afterwards.
This of course solves itself when we not start with the broken
IL in the first place.  We could also try to "solve" this in the
SSA renamer, but then I'm not sure if gimplification doesn't
already break it.

typedef double v2df __attribute__((vector_size(16)));

v2df foo (v2df a, v2df b, v2df c, v2df d)
{
  v2df res;
  try
    {
      res = a < b ? c : d;
      return res; // replace with gcc_unreachable () for more twists
    }
  catch (...)
    {
      return res;
    }
}

So ... how far are you with enforcing a split VEC_COND_EXPR?
Thus can we avoid the above completely (even as intermediate
state)?

Thanks,
Richard.

> Thanks,
> Martin
Martin Liška June 12, 2020, 1:24 p.m. UTC | #48
On 6/12/20 11:43 AM, Richard Biener wrote:
> So ... how far are you with enforcing a split VEC_COND_EXPR?
> Thus can we avoid the above completely (even as intermediate
> state)?

Apparently, I'm quite close. Using the attached patch I see only 2 testsuite
failures:

FAIL: gcc.dg/tree-ssa/pr68714.c scan-tree-dump-times reassoc1 " <= " 1
FAIL: gcc.target/i386/pr78102.c scan-assembler-times pcmpeqq 3

The first one is about teaching reassoc about the SSA_NAMEs in VEC_COND_EXPR. I haven't
analyzed the second failure.

I'm also not sure about the gimplification change, I see superfluous assignments:
   vec_cond_cmp.5 = _1 == _2;
   vec_cond_cmp.6 = vec_cond_cmp.5;
   vec_cond_cmp.7 = vec_cond_cmp.6;
   _3 = VEC_COND_EXPR <vec_cond_cmp.7, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
?

So with the suggested patch, the EH should be gone as you suggested. Right?

Martin
Richard Biener June 15, 2020, 7:14 a.m. UTC | #49
On Fri, Jun 12, 2020 at 3:24 PM Martin Liška <mliska@suse.cz> wrote:
>
> On 6/12/20 11:43 AM, Richard Biener wrote:
> > So ... how far are you with enforcing a split VEC_COND_EXPR?
> > Thus can we avoid the above completely (even as intermediate
> > state)?
>
> Apparently, I'm quite close. Using the attached patch I see only 2 testsuite
> failures:
>
> FAIL: gcc.dg/tree-ssa/pr68714.c scan-tree-dump-times reassoc1 " <= " 1
> FAIL: gcc.target/i386/pr78102.c scan-assembler-times pcmpeqq 3
>
> The first one is about teaching reassoc about the SSA_NAMEs in VEC_COND_EXPR. I haven't
> analyzed the second failure.
>
> I'm also not sure about the gimplification change, I see superfluous assignments:
>    vec_cond_cmp.5 = _1 == _2;
>    vec_cond_cmp.6 = vec_cond_cmp.5;
>    vec_cond_cmp.7 = vec_cond_cmp.6;
>    _3 = VEC_COND_EXPR <vec_cond_cmp.7, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
> ?
>
> So with the suggested patch, the EH should be gone as you suggested. Right?

Right, it should be on the comparison already from the start.

@@ -14221,9 +14221,13 @@ gimplify_expr (tree *expr_p, gimple_seq
*pre_p, gimple_seq *post_p,
        case VEC_COND_EXPR:
          {
            enum gimplify_status r0, r1, r2;
-
            r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
                                post_p, is_gimple_condexpr, fb_rvalue);
+           tree xop0 = TREE_OPERAND (*expr_p, 0);
+           tmp = create_tmp_var_raw (TREE_TYPE (xop0), "vec_cond_cmp");
+           gimple_add_tmp_var (tmp);
+           gimplify_assign (tmp, xop0, pre_p);
+           TREE_OPERAND (*expr_p, 0) = tmp;
            r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
                                post_p, is_gimple_val, fb_rvalue);

all of VEC_COND_EXPR can now be a simple goto expr_3;
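(In other words — just a sketch, not the final hunk — the whole case could
shrink to

        case VEC_COND_EXPR:
          goto expr_3;

and let the generic three-operand path gimplify all operands to values.)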

diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index 494c9e9c20b..090fb52a2f1 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -3136,6 +3136,10 @@ pass_forwprop::execute (function *fun)
                    if (code == COND_EXPR
                        || code == VEC_COND_EXPR)
                      {
+                       /* Do not propagate into VEC_COND_EXPRs.  */
+                       if (code == VEC_COND_EXPR)
+                         break;
+

err - remove the || code == VEC_COND_EXPR instead?

@@ -2221,24 +2226,12 @@ expand_vector_operations (void)
 {
   gimple_stmt_iterator gsi;
   basic_block bb;
-  bool cfg_changed = false;

   FOR_EACH_BB_FN (bb, cfun)
-    {
-      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-       {
-         expand_vector_operations_1 (&gsi);
-         /* ???  If we do not cleanup EH then we will ICE in
-            verification.  But in reality we have created wrong-code
-            as we did not properly transition EH info and edges to
-            the piecewise computations.  */
-         if (maybe_clean_eh_stmt (gsi_stmt (gsi))
-             && gimple_purge_dead_eh_edges (bb))
-           cfg_changed = true;
-       }
-    }

I'm not sure about this.  Consider the C++ testcase where
the ?: is replaced by a division.  If veclower needs to replace
that with four scalar division statements then the above
still applies - veclower does not correctly duplicate EH info
and EH edges to the individual divisions (and we do not know
which component might trap).

So please leave the above in.  You can check whether using integer
division makes it break, and add such a testcase if there's
no coverage for this in the testsuite.

What's missing from the patch is adjusting
verify_gimple_assign_ternary from

  if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
       ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
      || !is_gimple_val (rhs2)
      || !is_gimple_val (rhs3))
    {
      error ("invalid operands in ternary operation");
      return true;

to the same with the rhs_code == VEC_COND_EXPR case removed.
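That is, roughly (sketch only, with the VEC_COND_EXPR case dropped):

  if ((rhs_code == COND_EXPR
       ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
      || !is_gimple_val (rhs2)
      || !is_gimple_val (rhs3))
    {
      error ("invalid operands in ternary operation");
      return true;

so a VEC_COND_EXPR condition has to be a gimple value like the other operands.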

You'll likely figure the vectorizer still creates some VEC_COND_EXPRs
with embedded comparisons.

Thanks,
Richard.


> Martin
Martin Liška June 15, 2020, 11:19 a.m. UTC | #50
On 6/15/20 9:14 AM, Richard Biener wrote:
> On Fri, Jun 12, 2020 at 3:24 PM Martin Liška <mliska@suse.cz> wrote:
>>
>> On 6/12/20 11:43 AM, Richard Biener wrote:
>>> So ... how far are you with enforcing a split VEC_COND_EXPR?
>>> Thus can we avoid the above completely (even as intermediate
>>> state)?
>>
>> Apparently, I'm quite close. Using the attached patch I see only 2 testsuite
>> failures:
>>
>> FAIL: gcc.dg/tree-ssa/pr68714.c scan-tree-dump-times reassoc1 " <= " 1
>> FAIL: gcc.target/i386/pr78102.c scan-assembler-times pcmpeqq 3
>>
>> The first one is about teaching reassoc about the SSA_NAMEs in VEC_COND_EXPR. I haven't
>> analyze the second failure.
>>
>> I'm also not sure about the gimlification change, I see a superfluous assignments:
>>     vec_cond_cmp.5 = _1 == _2;
>>     vec_cond_cmp.6 = vec_cond_cmp.5;
>>     vec_cond_cmp.7 = vec_cond_cmp.6;
>>     _3 = VEC_COND_EXPR <vec_cond_cmp.7, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
>> ?
>>
>> So with the suggested patch, the EH should be gone as you suggested. Right?
> 
> Right, it should be on the comparison already from the start.
> 
> @@ -14221,9 +14221,13 @@ gimplify_expr (tree *expr_p, gimple_seq
> *pre_p, gimple_seq *post_p,
>          case VEC_COND_EXPR:
>            {
>              enum gimplify_status r0, r1, r2;
> -
>              r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
>                                  post_p, is_gimple_condexpr, fb_rvalue);
> +           tree xop0 = TREE_OPERAND (*expr_p, 0);
> +           tmp = create_tmp_var_raw (TREE_TYPE (xop0), "vec_cond_cmp");
> +           gimple_add_tmp_var (tmp);
> +           gimplify_assign (tmp, xop0, pre_p);
> +           TREE_OPERAND (*expr_p, 0) = tmp;
>              r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
>                                  post_p, is_gimple_val, fb_rvalue);
> 
> all of VEC_COND_EXPR can now be a simple goto expr_3;

Works for me, thanks!

> 
> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
> index 494c9e9c20b..090fb52a2f1 100644
> --- a/gcc/tree-ssa-forwprop.c
> +++ b/gcc/tree-ssa-forwprop.c
> @@ -3136,6 +3136,10 @@ pass_forwprop::execute (function *fun)
>                      if (code == COND_EXPR
>                          || code == VEC_COND_EXPR)
>                        {
> +                       /* Do not propagate into VEC_COND_EXPRs.  */
> +                       if (code == VEC_COND_EXPR)
> +                         break;
> +
> 
> err - remove the || code == VEC_COND_EXPR instead?

Yep.

> 
> @@ -2221,24 +2226,12 @@ expand_vector_operations (void)
>   {
>     gimple_stmt_iterator gsi;
>     basic_block bb;
> -  bool cfg_changed = false;
> 
>     FOR_EACH_BB_FN (bb, cfun)
> -    {
> -      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> -       {
> -         expand_vector_operations_1 (&gsi);
> -         /* ???  If we do not cleanup EH then we will ICE in
> -            verification.  But in reality we have created wrong-code
> -            as we did not properly transition EH info and edges to
> -            the piecewise computations.  */
> -         if (maybe_clean_eh_stmt (gsi_stmt (gsi))
> -             && gimple_purge_dead_eh_edges (bb))
> -           cfg_changed = true;
> -       }
> -    }
> 
> I'm not sure about this.  Consider the C++ testcase where
> the ?: is replaced by a division.  If veclower needs to replace
> that with four scalrar division statements then the above
> still applies - veclower does not correctly duplicate EH info
> and EH edges to the individual divisions (and we do not know
> which component might trap).
> 
> So please leave the above in.  You can try if using integer
> division makes it break and add such a testcase if there's
> no coverage for this in the testsuite.

I'm leaving the above in. Can you please explain how a division test-case
can be created?

> 
> What's missing from the patch is adjusting
> verify_gimple_assign_ternary from
> 
>    if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
>         ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
>        || !is_gimple_val (rhs2)
>        || !is_gimple_val (rhs3))
>      {
>        error ("invalid operands in ternary operation");
>        return true;
> 
> to the same with the rhs_code == VEC_COND_EXPR case removed.

Hmm. I'm not sure I understand this comment. Why do we want to change it,
and is it done right in the patch?

> 
> You'll likely figure the vectorizer still creates some VEC_COND_EXPRs
> with embedded comparisons.

I've fixed the 2 failing test-cases I mentioned in the previous email.

Martin

> 
> Thanks,
> Richard.
> 
> 
>> Martin
Richard Biener June 15, 2020, 11:59 a.m. UTC | #51
On Mon, Jun 15, 2020 at 1:19 PM Martin Liška <mliska@suse.cz> wrote:
>
> On 6/15/20 9:14 AM, Richard Biener wrote:
> > On Fri, Jun 12, 2020 at 3:24 PM Martin Liška <mliska@suse.cz> wrote:
> >>
> >> On 6/12/20 11:43 AM, Richard Biener wrote:
> >>> So ... how far are you with enforcing a split VEC_COND_EXPR?
> >>> Thus can we avoid the above completely (even as intermediate
> >>> state)?
> >>
> >> Apparently, I'm quite close. Using the attached patch I see only 2 testsuite
> >> failures:
> >>
> >> FAIL: gcc.dg/tree-ssa/pr68714.c scan-tree-dump-times reassoc1 " <= " 1
> >> FAIL: gcc.target/i386/pr78102.c scan-assembler-times pcmpeqq 3
> >>
> >> The first one is about teaching reassoc about the SSA_NAMEs in VEC_COND_EXPR. I haven't
> >> analyze the second failure.
> >>
> >> I'm also not sure about the gimlification change, I see a superfluous assignments:
> >>     vec_cond_cmp.5 = _1 == _2;
> >>     vec_cond_cmp.6 = vec_cond_cmp.5;
> >>     vec_cond_cmp.7 = vec_cond_cmp.6;
> >>     _3 = VEC_COND_EXPR <vec_cond_cmp.7, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
> >> ?
> >>
> >> So with the suggested patch, the EH should be gone as you suggested. Right?
> >
> > Right, it should be on the comparison already from the start.
> >
> > @@ -14221,9 +14221,13 @@ gimplify_expr (tree *expr_p, gimple_seq
> > *pre_p, gimple_seq *post_p,
> >          case VEC_COND_EXPR:
> >            {
> >              enum gimplify_status r0, r1, r2;
> > -
> >              r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
> >                                  post_p, is_gimple_condexpr, fb_rvalue);
> > +           tree xop0 = TREE_OPERAND (*expr_p, 0);
> > +           tmp = create_tmp_var_raw (TREE_TYPE (xop0), "vec_cond_cmp");
> > +           gimple_add_tmp_var (tmp);
> > +           gimplify_assign (tmp, xop0, pre_p);
> > +           TREE_OPERAND (*expr_p, 0) = tmp;
> >              r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
> >                                  post_p, is_gimple_val, fb_rvalue);
> >
> > all of VEC_COND_EXPR can now be a simple goto expr_3;
>
> Works for me, thanks!
>
> >
> > diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
> > index 494c9e9c20b..090fb52a2f1 100644
> > --- a/gcc/tree-ssa-forwprop.c
> > +++ b/gcc/tree-ssa-forwprop.c
> > @@ -3136,6 +3136,10 @@ pass_forwprop::execute (function *fun)
> >                      if (code == COND_EXPR
> >                          || code == VEC_COND_EXPR)
> >                        {
> > +                       /* Do not propagate into VEC_COND_EXPRs.  */
> > +                       if (code == VEC_COND_EXPR)
> > +                         break;
> > +
> >
> > err - remove the || code == VEC_COND_EXPR instead?
>
> Yep.
>
> >
> > @@ -2221,24 +2226,12 @@ expand_vector_operations (void)
> >   {
> >     gimple_stmt_iterator gsi;
> >     basic_block bb;
> > -  bool cfg_changed = false;
> >
> >     FOR_EACH_BB_FN (bb, cfun)
> > -    {
> > -      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > -       {
> > -         expand_vector_operations_1 (&gsi);
> > -         /* ???  If we do not cleanup EH then we will ICE in
> > -            verification.  But in reality we have created wrong-code
> > -            as we did not properly transition EH info and edges to
> > -            the piecewise computations.  */
> > -         if (maybe_clean_eh_stmt (gsi_stmt (gsi))
> > -             && gimple_purge_dead_eh_edges (bb))
> > -           cfg_changed = true;
> > -       }
> > -    }
> >
> > I'm not sure about this.  Consider the C++ testcase where
> > the ?: is replaced by a division.  If veclower needs to replace
> > that with four scalrar division statements then the above
> > still applies - veclower does not correctly duplicate EH info
> > and EH edges to the individual divisions (and we do not know
> > which component might trap).
> >
> > So please leave the above in.  You can try if using integer
> > division makes it break and add such a testcase if there's
> > no coverage for this in the testsuite.
>
> I'm leaving that above. Can you please explain how can a division test-case
> be created?

typedef long v2di __attribute__((vector_size(16)));

v2di foo (v2di a, v2di b)
{
  try
  {
    v2di res = a / b;
    return res;
    }
    catch (...)
    {
    return (v2di){};
    }
}

with -fnon-call-exceptions I see in t.ii.090t.ehdisp (correctly):

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  [LP 1] _6 = a_4(D) / b_5(D);
;;    succ:       5
;;                3

while after t.ii.226t.veclower we have

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _13 = BIT_FIELD_REF <a_4(D), 64, 0>;
  _14 = BIT_FIELD_REF <b_5(D), 64, 0>;
  _15 = _13 / _14;
  _16 = BIT_FIELD_REF <a_4(D), 64, 64>;
  _17 = BIT_FIELD_REF <b_5(D), 64, 64>;
  _18 = _16 / _17;
  _6 = {_15, _18};
  res_7 = _6;
  _8 = res_7;
;;    succ:       3

and all EH is gone and we'd ICE if you remove the above hunk.  Hopefully.

We still generate wrong-code obviously as we'd need to duplicate the
EH info on each component division (and split blocks and generate
extra EH edges).  That's a pre-existing bug of course.  I just wanted
to avoid creating a new instance just because of the early instruction
selection for VEC_COND_EXPR.

> >
> > What's missing from the patch is adjusting
> > verify_gimple_assign_ternary from
> >
> >    if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
> >         ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
> >        || !is_gimple_val (rhs2)
> >        || !is_gimple_val (rhs3))
> >      {
> >        error ("invalid operands in ternary operation");
> >        return true;
> >
> > to the same with the rhs_code == VEC_COND_EXPR case removed.
>
> Hmm. I'm not sure I've got this comment. Why do we want to change it
> and is it done wright in the patch?

Ah, I missed the hunk you added.  But the check should be an inclusive
one, not an exclusive one, and earlier accepting an is_gimple_condexpr
is superfluous when you later reject the tcc_comparison part.  Just
testing is_gimple_val is better.  So yes, remove your tree-cfg.c hunk
and just adjust the above test.

> >
> > You'll likely figure the vectorizer still creates some VEC_COND_EXPRs
> > with embedded comparisons.
>
> I've fixed 2 failing test-cases I mentioned in the previous email.
>
> Martin
>
> >
> > Thanks,
> > Richard.
> >
> >
> >> Martin
>
Martin Liška June 15, 2020, 12:20 p.m. UTC | #52
On 6/15/20 1:59 PM, Richard Biener wrote:
> On Mon, Jun 15, 2020 at 1:19 PM Martin Liška <mliska@suse.cz> wrote:
>>
>> On 6/15/20 9:14 AM, Richard Biener wrote:
>>> On Fri, Jun 12, 2020 at 3:24 PM Martin Liška <mliska@suse.cz> wrote:
>>>>
>>>> On 6/12/20 11:43 AM, Richard Biener wrote:
>>>>> So ... how far are you with enforcing a split VEC_COND_EXPR?
>>>>> Thus can we avoid the above completely (even as intermediate
>>>>> state)?
>>>>
>>>> Apparently, I'm quite close. Using the attached patch I see only 2 testsuite
>>>> failures:
>>>>
>>>> FAIL: gcc.dg/tree-ssa/pr68714.c scan-tree-dump-times reassoc1 " <= " 1
>>>> FAIL: gcc.target/i386/pr78102.c scan-assembler-times pcmpeqq 3
>>>>
>>>> The first one is about teaching reassoc about the SSA_NAMEs in VEC_COND_EXPR. I haven't
>>>> analyze the second failure.
>>>>
>>>> I'm also not sure about the gimlification change, I see a superfluous assignments:
>>>>      vec_cond_cmp.5 = _1 == _2;
>>>>      vec_cond_cmp.6 = vec_cond_cmp.5;
>>>>      vec_cond_cmp.7 = vec_cond_cmp.6;
>>>>      _3 = VEC_COND_EXPR <vec_cond_cmp.7, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
>>>> ?
>>>>
>>>> So with the suggested patch, the EH should be gone as you suggested. Right?
>>>
>>> Right, it should be on the comparison already from the start.
>>>
>>> @@ -14221,9 +14221,13 @@ gimplify_expr (tree *expr_p, gimple_seq
>>> *pre_p, gimple_seq *post_p,
>>>           case VEC_COND_EXPR:
>>>             {
>>>               enum gimplify_status r0, r1, r2;
>>> -
>>>               r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
>>>                                   post_p, is_gimple_condexpr, fb_rvalue);
>>> +           tree xop0 = TREE_OPERAND (*expr_p, 0);
>>> +           tmp = create_tmp_var_raw (TREE_TYPE (xop0), "vec_cond_cmp");
>>> +           gimple_add_tmp_var (tmp);
>>> +           gimplify_assign (tmp, xop0, pre_p);
>>> +           TREE_OPERAND (*expr_p, 0) = tmp;
>>>               r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
>>>                                   post_p, is_gimple_val, fb_rvalue);
>>>
>>> all of VEC_COND_EXPR can now be a simple goto expr_3;
>>
>> Works for me, thanks!
>>
>>>
>>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>>> index 494c9e9c20b..090fb52a2f1 100644
>>> --- a/gcc/tree-ssa-forwprop.c
>>> +++ b/gcc/tree-ssa-forwprop.c
>>> @@ -3136,6 +3136,10 @@ pass_forwprop::execute (function *fun)
>>>                       if (code == COND_EXPR
>>>                           || code == VEC_COND_EXPR)
>>>                         {
>>> +                       /* Do not propagate into VEC_COND_EXPRs.  */
>>> +                       if (code == VEC_COND_EXPR)
>>> +                         break;
>>> +
>>>
>>> err - remove the || code == VEC_COND_EXPR instead?
>>
>> Yep.
>>
>>>
>>> @@ -2221,24 +2226,12 @@ expand_vector_operations (void)
>>>    {
>>>      gimple_stmt_iterator gsi;
>>>      basic_block bb;
>>> -  bool cfg_changed = false;
>>>
>>>      FOR_EACH_BB_FN (bb, cfun)
>>> -    {
>>> -      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>> -       {
>>> -         expand_vector_operations_1 (&gsi);
>>> -         /* ???  If we do not cleanup EH then we will ICE in
>>> -            verification.  But in reality we have created wrong-code
>>> -            as we did not properly transition EH info and edges to
>>> -            the piecewise computations.  */
>>> -         if (maybe_clean_eh_stmt (gsi_stmt (gsi))
>>> -             && gimple_purge_dead_eh_edges (bb))
>>> -           cfg_changed = true;
>>> -       }
>>> -    }
>>>
>>> I'm not sure about this.  Consider the C++ testcase where
>>> the ?: is replaced by a division.  If veclower needs to replace
>>> that with four scalrar division statements then the above
>>> still applies - veclower does not correctly duplicate EH info
>>> and EH edges to the individual divisions (and we do not know
>>> which component might trap).
>>>
>>> So please leave the above in.  You can try if using integer
>>> division makes it break and add such a testcase if there's
>>> no coverage for this in the testsuite.
>>
>> I'm leaving that above. Can you please explain how can a division test-case
>> be created?
> 
> typedef long v2di __attribute__((vector_size(16)));
> 
> v2di foo (v2di a, v2di b)
> {
>    try
>    {
>      v2di res = a / b;
>      return res;
>      }
>      catch (...)
>      {
>      return (v2di){};
>      }
> }
> 
> with -fnon-call-exceptions I see in t.ii.090t.ehdisp (correctly):
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>    [LP 1] _6 = a_4(D) / b_5(D);
> ;;    succ:       5
> ;;                3
> 
> while after t.ii.226t.veclower we have
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>    _13 = BIT_FIELD_REF <a_4(D), 64, 0>;
>    _14 = BIT_FIELD_REF <b_5(D), 64, 0>;
>    _15 = _13 / _14;
>    _16 = BIT_FIELD_REF <a_4(D), 64, 64>;
>    _17 = BIT_FIELD_REF <b_5(D), 64, 64>;
>    _18 = _16 / _17;
>    _6 = {_15, _18};
>    res_7 = _6;
>    _8 = res_7;
> ;;    succ:       3
> 
> and all EH is gone and we'd ICE if you remove the above hunk.  Hopefully.

Yes, it ICEs then:


./xg++ -B. ~/Programming/testcases/ice.c -c -fnon-call-exceptions -O3
/home/marxin/Programming/testcases/ice.c: In function ‘v2di foo(v2di, v2di)’:
/home/marxin/Programming/testcases/ice.c:3:6: error: statement marked for throw, but doesn’t
     3 | v2di foo (v2di a, v2di b)
       |      ^~~
_6 = {_12, _15};
during GIMPLE pass: veclower2
/home/marxin/Programming/testcases/ice.c:3:6: internal compiler error: verify_gimple failed
0x10e308a verify_gimple_in_cfg(function*, bool)
	/home/marxin/Programming/gcc/gcc/tree-cfg.c:5461
0xfc9caf execute_function_todo
	/home/marxin/Programming/gcc/gcc/passes.c:1985
0xfcaafc do_per_function
	/home/marxin/Programming/gcc/gcc/passes.c:1640
0xfcaafc execute_todo
	/home/marxin/Programming/gcc/gcc/passes.c:2039
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

> 
> We still generate wrong-code obviously as we'd need to duplicate the
> EH info on each component division (and split blocks and generate
> extra EH edges).  That's a pre-existing bug of course.  I just wanted
> to avoid to create a new instance just because of the early instruction
> selection for VEC_COND_EXPR.

Fine!

> 
>>>
>>> What's missing from the patch is adjusting
>>> verify_gimple_assign_ternary from
>>>
>>>     if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
>>>          ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
>>>         || !is_gimple_val (rhs2)
>>>         || !is_gimple_val (rhs3))
>>>       {
>>>         error ("invalid operands in ternary operation");
>>>         return true;
>>>
>>> to the same with the rhs_code == VEC_COND_EXPR case removed.
>>
>> Hmm. I'm not sure I've got this comment. Why do we want to change it
>> and is it done wright in the patch?
> 
> Ah, I missed the hunk you added.

That explains my confusion.

>  But the check should be an inclusive
> one, not an exclusive one and earlier accepting a is_gimple_condexpr
> is superfluous when you later reject the tcc_comparison part.  Just
> testing is_gimple_val is better.  So yes, remove your tree-cfg.c hunk
> and just adjust the above test.

I simplified that.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Thanks,
Martin

> 
>>>
>>> You'll likely figure the vectorizer still creates some VEC_COND_EXPRs
>>> with embedded comparisons.
>>
>> I've fixed 2 failing test-cases I mentioned in the previous email.
>>
>> Martin
>>
>>>
>>> Thanks,
>>> Richard.
>>>
>>>
>>>> Martin
>>
Richard Biener June 17, 2020, 8:50 a.m. UTC | #53
On Mon, Jun 15, 2020 at 2:20 PM Martin Liška <mliska@suse.cz> wrote:
>
> On 6/15/20 1:59 PM, Richard Biener wrote:
> > On Mon, Jun 15, 2020 at 1:19 PM Martin Liška <mliska@suse.cz> wrote:
> >>
> >> On 6/15/20 9:14 AM, Richard Biener wrote:
> >>> On Fri, Jun 12, 2020 at 3:24 PM Martin Liška <mliska@suse.cz> wrote:
> >>>>
> >>>> On 6/12/20 11:43 AM, Richard Biener wrote:
> >>>>> So ... how far are you with enforcing a split VEC_COND_EXPR?
> >>>>> Thus can we avoid the above completely (even as intermediate
> >>>>> state)?
> >>>>
> >>>> Apparently, I'm quite close. Using the attached patch I see only 2 testsuite
> >>>> failures:
> >>>>
> >>>> FAIL: gcc.dg/tree-ssa/pr68714.c scan-tree-dump-times reassoc1 " <= " 1
> >>>> FAIL: gcc.target/i386/pr78102.c scan-assembler-times pcmpeqq 3
> >>>>
> >>>> The first one is about teaching reassoc about the SSA_NAMEs in VEC_COND_EXPR. I haven't
> >>>> analyze the second failure.
> >>>>
> >>>> I'm also not sure about the gimlification change, I see a superfluous assignments:
> >>>>      vec_cond_cmp.5 = _1 == _2;
> >>>>      vec_cond_cmp.6 = vec_cond_cmp.5;
> >>>>      vec_cond_cmp.7 = vec_cond_cmp.6;
> >>>>      _3 = VEC_COND_EXPR <vec_cond_cmp.7, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
> >>>> ?
> >>>>
> >>>> So with the suggested patch, the EH should be gone as you suggested. Right?
> >>>
> >>> Right, it should be on the comparison already from the start.
> >>>
> >>> @@ -14221,9 +14221,13 @@ gimplify_expr (tree *expr_p, gimple_seq
> >>> *pre_p, gimple_seq *post_p,
> >>>           case VEC_COND_EXPR:
> >>>             {
> >>>               enum gimplify_status r0, r1, r2;
> >>> -
> >>>               r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
> >>>                                   post_p, is_gimple_condexpr, fb_rvalue);
> >>> +           tree xop0 = TREE_OPERAND (*expr_p, 0);
> >>> +           tmp = create_tmp_var_raw (TREE_TYPE (xop0), "vec_cond_cmp");
> >>> +           gimple_add_tmp_var (tmp);
> >>> +           gimplify_assign (tmp, xop0, pre_p);
> >>> +           TREE_OPERAND (*expr_p, 0) = tmp;
> >>>               r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
> >>>                                   post_p, is_gimple_val, fb_rvalue);
> >>>
> >>> all of VEC_COND_EXPR can now be a simple goto expr_3;
> >>
> >> Works for me, thanks!
> >>
> >>>
> >>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
> >>> index 494c9e9c20b..090fb52a2f1 100644
> >>> --- a/gcc/tree-ssa-forwprop.c
> >>> +++ b/gcc/tree-ssa-forwprop.c
> >>> @@ -3136,6 +3136,10 @@ pass_forwprop::execute (function *fun)
> >>>                       if (code == COND_EXPR
> >>>                           || code == VEC_COND_EXPR)
> >>>                         {
> >>> +                       /* Do not propagate into VEC_COND_EXPRs.  */
> >>> +                       if (code == VEC_COND_EXPR)
> >>> +                         break;
> >>> +
> >>>
> >>> err - remove the || code == VEC_COND_EXPR instead?
> >>
> >> Yep.
> >>
> >>>
> >>> @@ -2221,24 +2226,12 @@ expand_vector_operations (void)
> >>>    {
> >>>      gimple_stmt_iterator gsi;
> >>>      basic_block bb;
> >>> -  bool cfg_changed = false;
> >>>
> >>>      FOR_EACH_BB_FN (bb, cfun)
> >>> -    {
> >>> -      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> >>> -       {
> >>> -         expand_vector_operations_1 (&gsi);
> >>> -         /* ???  If we do not cleanup EH then we will ICE in
> >>> -            verification.  But in reality we have created wrong-code
> >>> -            as we did not properly transition EH info and edges to
> >>> -            the piecewise computations.  */
> >>> -         if (maybe_clean_eh_stmt (gsi_stmt (gsi))
> >>> -             && gimple_purge_dead_eh_edges (bb))
> >>> -           cfg_changed = true;
> >>> -       }
> >>> -    }
> >>>
> >>> I'm not sure about this.  Consider the C++ testcase where
> >>> the ?: is replaced by a division.  If veclower needs to replace
> >>> that with four scalrar division statements then the above
> >>> still applies - veclower does not correctly duplicate EH info
> >>> and EH edges to the individual divisions (and we do not know
> >>> which component might trap).
> >>>
> >>> So please leave the above in.  You can try if using integer
> >>> division makes it break and add such a testcase if there's
> >>> no coverage for this in the testsuite.
> >>
> >> I'm leaving that above. Can you please explain how can a division test-case
> >> be created?
> >
> > typedef long v2di __attribute__((vector_size(16)));
> >
> > v2di foo (v2di a, v2di b)
> > {
> >    try
> >    {
> >      v2di res = a / b;
> >      return res;
> >      }
> >      catch (...)
> >      {
> >      return (v2di){};
> >      }
> > }
> >
> > with -fnon-call-exceptions I see in t.ii.090t.ehdisp (correctly):
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >    [LP 1] _6 = a_4(D) / b_5(D);
> > ;;    succ:       5
> > ;;                3
> >
> > while after t.ii.226t.veclower we have
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >    _13 = BIT_FIELD_REF <a_4(D), 64, 0>;
> >    _14 = BIT_FIELD_REF <b_5(D), 64, 0>;
> >    _15 = _13 / _14;
> >    _16 = BIT_FIELD_REF <a_4(D), 64, 64>;
> >    _17 = BIT_FIELD_REF <b_5(D), 64, 64>;
> >    _18 = _16 / _17;
> >    _6 = {_15, _18};
> >    res_7 = _6;
> >    _8 = res_7;
> > ;;    succ:       3
> >
> > and all EH is gone and we'd ICE if you remove the above hunk.  Hopefully.
>
> Yes, it ICEs then:
>
>
> ./xg++ -B. ~/Programming/testcases/ice.c -c -fnon-call-exceptions -O3
> /home/marxin/Programming/testcases/ice.c: In function ‘v2di foo(v2di, v2di)’:
> /home/marxin/Programming/testcases/ice.c:3:6: error: statement marked for throw, but doesn’t
>      3 | v2di foo (v2di a, v2di b)
>        |      ^~~
> _6 = {_12, _15};
> during GIMPLE pass: veclower2
> /home/marxin/Programming/testcases/ice.c:3:6: internal compiler error: verify_gimple failed
> 0x10e308a verify_gimple_in_cfg(function*, bool)
>         /home/marxin/Programming/gcc/gcc/tree-cfg.c:5461
> 0xfc9caf execute_function_todo
>         /home/marxin/Programming/gcc/gcc/passes.c:1985
> 0xfcaafc do_per_function
>         /home/marxin/Programming/gcc/gcc/passes.c:1640
> 0xfcaafc execute_todo
>         /home/marxin/Programming/gcc/gcc/passes.c:2039
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <https://gcc.gnu.org/bugs/> for instructions.
>
> >
> > We still generate wrong-code obviously as we'd need to duplicate the
> > EH info on each component division (and split blocks and generate
> > extra EH edges).  That's a pre-existing bug of course.  I just wanted
> > to avoid to create a new instance just because of the early instruction
> > selection for VEC_COND_EXPR.
>
> Fine!
>
> >
> >>>
> >>> What's missing from the patch is adjusting
> >>> verify_gimple_assign_ternary from
> >>>
> >>>     if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
> >>>          ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
> >>>         || !is_gimple_val (rhs2)
> >>>         || !is_gimple_val (rhs3))
> >>>       {
> >>>         error ("invalid operands in ternary operation");
> >>>         return true;
> >>>
> >>> to the same with the rhs_code == VEC_COND_EXPR case removed.
> >>
> >> Hmm. I'm not sure I've got this comment. Why do we want to change it
> >> and is it done wright in the patch?
> >
> > Ah, I missed the hunk you added.
>
> That explains the confusion I got.
>
> >  But the check should be an inclusive
> > one, not an exclusive one and earlier accepting a is_gimple_condexpr
> > is superfluous when you later reject the tcc_comparison part.  Just
> > testing is_gimple_val is better.  So yes, remove your tree-cfg.c hunk
> > and just adjust the above test.
>
> I simplified that.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Please double-check the changelog

        (do_store_flag):

+       tree-vect-isel.o \

IMHO we want to move more of the pattern matching magic of RTL
expansion here to obsolete TER.  So please name it gimple-isel.cc
(.cc!, not .c)

+  gassign *assign = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (cond));
+  if (stmt != NULL
+      && TREE_CODE_CLASS (gimple_assign_rhs_code (assign)) != tcc_comparison)
+    return ERROR_MARK;

you want stmt == NULL || TREE_CODE_CLASS (...)

in case the def stmt is a call.
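
I.e. something like this (sketch only; assuming the dyn_cast result is what
the guard should test, so a condition defined by a call bails out as well):

  gassign *assign = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (cond));
  if (assign == NULL
      || TREE_CODE_CLASS (gimple_assign_rhs_code (assign)) != tcc_comparison)
    return ERROR_MARK;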

+         gimple_seq seq;
+         tree exp = force_gimple_operand (comb, &seq, true, NULL_TREE);
+         if (seq)
+           {
+             gimple_stmt_iterator gsi = gsi_for_stmt (vcond0);
+             gsi_insert_before (&gsi, seq, GSI_SAME_STMT);
+           }

use force_gimple_operand_gsi, which makes the above simpler.
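
E.g. (sketch only; assuming the iterator is the one for vcond0 as above):

  gimple_stmt_iterator gsi = gsi_for_stmt (vcond0);
  tree exp = force_gimple_operand_gsi (&gsi, comb, true, NULL_TREE,
                                       true, GSI_SAME_STMT);

which inserts any generated statements right before vcond0 and drops the
manual seq handling.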

          if (invert)
-           std::swap (*gimple_assign_rhs2_ptr (stmt0),
-                      *gimple_assign_rhs3_ptr (stmt0));
-         update_stmt (stmt0);
+           std::swap (*gimple_assign_rhs2_ptr (vcond0),
+                      *gimple_assign_rhs3_ptr (vcond0));

use swap_ssa_operands.
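
I.e. (sketch):

          if (invert)
            swap_ssa_operands (vcond0, gimple_assign_rhs2_ptr (vcond0),
                               gimple_assign_rhs3_ptr (vcond0));

so the SSA operands stay consistent without a separate update_stmt.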

+         gimple_assign_set_rhs1 (vcond0, exp);
+         update_stmt (vcond0);

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index cf2d979fea1..710b17a7c5c 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -9937,8 +9937,8 @@ vectorizable_condition (vec_info *vinfo,
        {
          vec_cond_rhs = vec_oprnds1[i];
          if (bitop1 == NOP_EXPR)
-           vec_compare = build2 (cond_code, vec_cmp_type,
-                                 vec_cond_lhs, vec_cond_rhs);
+           vec_compare = gimplify_build2 (gsi, cond_code, vec_cmp_type,
+                                          vec_cond_lhs, vec_cond_rhs);
          else
            {

please don't introduce more uses of gimplify_buildN - I'd like to
get rid of those.  You can use

     gimple_seq stmts = NULL;
     vec_compare = gimple_build (&stmts, cond_code, ...);
     gsi_insert_seq_before/after (...);
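
For instance (sketch only; the arguments are just the ones from the hunk above):

     gimple_seq stmts = NULL;
     vec_compare = gimple_build (&stmts, cond_code, vec_cmp_type,
                                 vec_cond_lhs, vec_cond_rhs);
     gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);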

OK with those changes.

Thanks,
Richard.


> Thanks,
> Martin
>
> >
> >>>
> >>> You'll likely figure the vectorizer still creates some VEC_COND_EXPRs
> >>> with embedded comparisons.
> >>
> >> I've fixed 2 failing test-cases I mentioned in the previous email.
> >>
> >> Martin
> >>
> >>>
> >>> Thanks,
> >>> Richard.
> >>>
> >>>
> >>>> Martin
> >>
>
Richard Biener June 17, 2020, 1:15 p.m. UTC | #54
On Wed, Jun 17, 2020 at 10:50 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Jun 15, 2020 at 2:20 PM Martin Liška <mliska@suse.cz> wrote:
> >
> > On 6/15/20 1:59 PM, Richard Biener wrote:
> > > On Mon, Jun 15, 2020 at 1:19 PM Martin Liška <mliska@suse.cz> wrote:
> > >>
> > >> On 6/15/20 9:14 AM, Richard Biener wrote:
> > >>> On Fri, Jun 12, 2020 at 3:24 PM Martin Liška <mliska@suse.cz> wrote:
> > >>>>
> > >>>> On 6/12/20 11:43 AM, Richard Biener wrote:
> > >>>>> So ... how far are you with enforcing a split VEC_COND_EXPR?
> > >>>>> Thus can we avoid the above completely (even as intermediate
> > >>>>> state)?
> > >>>>
> > >>>> Apparently, I'm quite close. Using the attached patch I see only 2 testsuite
> > >>>> failures:
> > >>>>
> > >>>> FAIL: gcc.dg/tree-ssa/pr68714.c scan-tree-dump-times reassoc1 " <= " 1
> > >>>> FAIL: gcc.target/i386/pr78102.c scan-assembler-times pcmpeqq 3
> > >>>>
> > >>>> The first one is about teaching reassoc about the SSA_NAMEs in VEC_COND_EXPR. I haven't
> > >>>> analyze the second failure.
> > >>>>
> > >>>> I'm also not sure about the gimlification change, I see a superfluous assignments:
> > >>>>      vec_cond_cmp.5 = _1 == _2;
> > >>>>      vec_cond_cmp.6 = vec_cond_cmp.5;
> > >>>>      vec_cond_cmp.7 = vec_cond_cmp.6;
> > >>>>      _3 = VEC_COND_EXPR <vec_cond_cmp.7, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
> > >>>> ?
> > >>>>
> > >>>> So with the suggested patch, the EH should be gone as you suggested. Right?
> > >>>
> > >>> Right, it should be on the comparison already from the start.
> > >>>
> > >>> @@ -14221,9 +14221,13 @@ gimplify_expr (tree *expr_p, gimple_seq
> > >>> *pre_p, gimple_seq *post_p,
> > >>>           case VEC_COND_EXPR:
> > >>>             {
> > >>>               enum gimplify_status r0, r1, r2;
> > >>> -
> > >>>               r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
> > >>>                                   post_p, is_gimple_condexpr, fb_rvalue);
> > >>> +           tree xop0 = TREE_OPERAND (*expr_p, 0);
> > >>> +           tmp = create_tmp_var_raw (TREE_TYPE (xop0), "vec_cond_cmp");
> > >>> +           gimple_add_tmp_var (tmp);
> > >>> +           gimplify_assign (tmp, xop0, pre_p);
> > >>> +           TREE_OPERAND (*expr_p, 0) = tmp;
> > >>>               r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
> > >>>                                   post_p, is_gimple_val, fb_rvalue);
> > >>>
> > >>> all of VEC_COND_EXPR can now be a simple goto expr_3;
> > >>
> > >> Works for me, thanks!
> > >>
> > >>>
> > >>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
> > >>> index 494c9e9c20b..090fb52a2f1 100644
> > >>> --- a/gcc/tree-ssa-forwprop.c
> > >>> +++ b/gcc/tree-ssa-forwprop.c
> > >>> @@ -3136,6 +3136,10 @@ pass_forwprop::execute (function *fun)
> > >>>                       if (code == COND_EXPR
> > >>>                           || code == VEC_COND_EXPR)
> > >>>                         {
> > >>> +                       /* Do not propagate into VEC_COND_EXPRs.  */
> > >>> +                       if (code == VEC_COND_EXPR)
> > >>> +                         break;
> > >>> +
> > >>>
> > >>> err - remove the || code == VEC_COND_EXPR instead?
> > >>
> > >> Yep.
> > >>
> > >>>
> > >>> @@ -2221,24 +2226,12 @@ expand_vector_operations (void)
> > >>>    {
> > >>>      gimple_stmt_iterator gsi;
> > >>>      basic_block bb;
> > >>> -  bool cfg_changed = false;
> > >>>
> > >>>      FOR_EACH_BB_FN (bb, cfun)
> > >>> -    {
> > >>> -      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > >>> -       {
> > >>> -         expand_vector_operations_1 (&gsi);
> > >>> -         /* ???  If we do not cleanup EH then we will ICE in
> > >>> -            verification.  But in reality we have created wrong-code
> > >>> -            as we did not properly transition EH info and edges to
> > >>> -            the piecewise computations.  */
> > >>> -         if (maybe_clean_eh_stmt (gsi_stmt (gsi))
> > >>> -             && gimple_purge_dead_eh_edges (bb))
> > >>> -           cfg_changed = true;
> > >>> -       }
> > >>> -    }
> > >>>
> > >>> I'm not sure about this.  Consider the C++ testcase where
> > >>> the ?: is replaced by a division.  If veclower needs to replace
> > >>> that with four scalrar division statements then the above
> > >>> still applies - veclower does not correctly duplicate EH info
> > >>> and EH edges to the individual divisions (and we do not know
> > >>> which component might trap).
> > >>>
> > >>> So please leave the above in.  You can try if using integer
> > >>> division makes it break and add such a testcase if there's
> > >>> no coverage for this in the testsuite.
> > >>
> > >> I'm leaving that above. Can you please explain how can a division test-case
> > >> be created?
> > >
> > > typedef long v2di __attribute__((vector_size(16)));
> > >
> > > v2di foo (v2di a, v2di b)
> > > {
> > >    try
> > >    {
> > >      v2di res = a / b;
> > >      return res;
> > >      }
> > >      catch (...)
> > >      {
> > >      return (v2di){};
> > >      }
> > > }
> > >
> > > with -fnon-call-exceptions I see in t.ii.090t.ehdisp (correctly):
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;    pred:       ENTRY
> > >    [LP 1] _6 = a_4(D) / b_5(D);
> > > ;;    succ:       5
> > > ;;                3
> > >
> > > while after t.ii.226t.veclower we have
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;    pred:       ENTRY
> > >    _13 = BIT_FIELD_REF <a_4(D), 64, 0>;
> > >    _14 = BIT_FIELD_REF <b_5(D), 64, 0>;
> > >    _15 = _13 / _14;
> > >    _16 = BIT_FIELD_REF <a_4(D), 64, 64>;
> > >    _17 = BIT_FIELD_REF <b_5(D), 64, 64>;
> > >    _18 = _16 / _17;
> > >    _6 = {_15, _18};
> > >    res_7 = _6;
> > >    _8 = res_7;
> > > ;;    succ:       3
> > >
> > > and all EH is gone and we'd ICE if you remove the above hunk.  Hopefully.
> >
> > Yes, it ICEs then:
> >
> >
> > ./xg++ -B. ~/Programming/testcases/ice.c -c -fnon-call-exceptions -O3
> > /home/marxin/Programming/testcases/ice.c: In function ‘v2di foo(v2di, v2di)’:
> > /home/marxin/Programming/testcases/ice.c:3:6: error: statement marked for throw, but doesn’t
> >      3 | v2di foo (v2di a, v2di b)
> >        |      ^~~
> > _6 = {_12, _15};
> > during GIMPLE pass: veclower2
> > /home/marxin/Programming/testcases/ice.c:3:6: internal compiler error: verify_gimple failed
> > 0x10e308a verify_gimple_in_cfg(function*, bool)
> >         /home/marxin/Programming/gcc/gcc/tree-cfg.c:5461
> > 0xfc9caf execute_function_todo
> >         /home/marxin/Programming/gcc/gcc/passes.c:1985
> > 0xfcaafc do_per_function
> >         /home/marxin/Programming/gcc/gcc/passes.c:1640
> > 0xfcaafc execute_todo
> >         /home/marxin/Programming/gcc/gcc/passes.c:2039
> > Please submit a full bug report,
> > with preprocessed source if appropriate.
> > Please include the complete backtrace with any bug report.
> > See <https://gcc.gnu.org/bugs/> for instructions.
> >
> > >
> > > We still generate wrong-code obviously as we'd need to duplicate the
> > > EH info on each component division (and split blocks and generate
> > > extra EH edges).  That's a pre-existing bug of course.  I just wanted
> > > to avoid to create a new instance just because of the early instruction
> > > selection for VEC_COND_EXPR.
> >
> > Fine!
> >
> > >
> > >>>
> > >>> What's missing from the patch is adjusting
> > >>> verify_gimple_assign_ternary from
> > >>>
> > >>>     if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
> > >>>          ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
> > >>>         || !is_gimple_val (rhs2)
> > >>>         || !is_gimple_val (rhs3))
> > >>>       {
> > >>>         error ("invalid operands in ternary operation");
> > >>>         return true;
> > >>>
> > >>> to the same with the rhs_code == VEC_COND_EXPR case removed.
> > >>
> > >> Hmm. I'm not sure I've got this comment. Why do we want to change it
> > >> and is it done wright in the patch?
> > >
> > > Ah, I missed the hunk you added.
> >
> > That explains the confusion I got.
> >
> > >  But the check should be an inclusive
> > > one, not an exclusive one and earlier accepting a is_gimple_condexpr
> > > is superfluous when you later reject the tcc_comparison part.  Just
> > > testing is_gimple_val is better.  So yes, remove your tree-cfg.c hunk
> > > and just adjust the above test.
> >
> > I simplified that.
> >
> > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Please double-check the changelog
>
>         (do_store_flag):
>
> +       tree-vect-isel.o \
>
> IMHO we want to move more of the pattern matching magic of RTL
> expansion here to obsolete TER.  So please name it gimple-isel.cc
> (.cc!, not .c)
>
> +  gassign *assign = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (cond));
> +  if (stmt != NULL
> +      && TREE_CODE_CLASS (gimple_assign_rhs_code (assign)) != tcc_comparison)
> +    return ERROR_MARK;
>
> you want stmt == NULL || TREE_CODE_CLASS (...)
>
> in case the def stmt is a call.
>
> +         gimple_seq seq;
> +         tree exp = force_gimple_operand (comb, &seq, true, NULL_TREE);
> +         if (seq)
> +           {
> +             gimple_stmt_iterator gsi = gsi_for_stmt (vcond0);
> +             gsi_insert_before (&gsi, seq, GSI_SAME_STMT);
> +           }
>
> use force_gimple_operand_gsi that makes the above simpler.
>
>           if (invert)
> -           std::swap (*gimple_assign_rhs2_ptr (stmt0),
> -                      *gimple_assign_rhs3_ptr (stmt0));
> -         update_stmt (stmt0);
> +           std::swap (*gimple_assign_rhs2_ptr (vcond0),
> +                      *gimple_assign_rhs3_ptr (vcond0));
>
> use swap_ssa_operands.
>
> +         gimple_assign_set_rhs1 (vcond0, exp);
> +         update_stmt (vcond0);
>
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index cf2d979fea1..710b17a7c5c 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -9937,8 +9937,8 @@ vectorizable_condition (vec_info *vinfo,
>         {
>           vec_cond_rhs = vec_oprnds1[i];
>           if (bitop1 == NOP_EXPR)
> -           vec_compare = build2 (cond_code, vec_cmp_type,
> -                                 vec_cond_lhs, vec_cond_rhs);
> +           vec_compare = gimplify_build2 (gsi, cond_code, vec_cmp_type,
> +                                          vec_cond_lhs, vec_cond_rhs);
>           else
>             {
>
> please don't introduce more uses of gimplify_buildN - I'd like to
> get rid of those.  You can use
>
>      gimple_seq stmts = NULL;
>      vec_compare = gimple_build (&stmts, cond_code, ...);
>      gsi_insert_seq_before/after (...);
>
> OK with those changes.

Applying the patch caused

Running target unix//-m32
FAIL: gcc.c-torture/execute/ieee/pr50310.c execution,  -O3
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
-finline-functions
FAIL: gcc.c-torture/execute/ieee/pr50310.c execution,  -O3 -g

and

FAIL: ext/random/simd_fast_mersenne_twister_engine/operators/inequal.cc
(test for excess errors)
UNRESOLVED: ext/random/simd_fast_mersenne_twister_engine/operators/inequal.cc
compilation failed to produce executable

Richard.

> Thanks,
> Richard.
>
>
> > Thanks,
> > Martin
> >
> > >
> > >>>
> > >>> You'll likely figure the vectorizer still creates some VEC_COND_EXPRs
> > >>> with embedded comparisons.
> > >>
> > >> I've fixed 2 failing test-cases I mentioned in the previous email.
> > >>
> > >> Martin
> > >>
> > >>>
> > >>> Thanks,
> > >>> Richard.
> > >>>
> > >>>
> > >>>> Martin
> > >>
> >
Martin Liška June 18, 2020, 8:10 a.m. UTC | #55
On 6/17/20 3:15 PM, Richard Biener wrote:
> On Wed, Jun 17, 2020 at 10:50 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
>>
>> On Mon, Jun 15, 2020 at 2:20 PM Martin Liška <mliska@suse.cz> wrote:
>>>
>>> On 6/15/20 1:59 PM, Richard Biener wrote:
>>>> On Mon, Jun 15, 2020 at 1:19 PM Martin Liška <mliska@suse.cz> wrote:
>>>>>
>>>>> On 6/15/20 9:14 AM, Richard Biener wrote:
>>>>>> On Fri, Jun 12, 2020 at 3:24 PM Martin Liška <mliska@suse.cz> wrote:
>>>>>>>
>>>>>>> On 6/12/20 11:43 AM, Richard Biener wrote:
>>>>>>>> So ... how far are you with enforcing a split VEC_COND_EXPR?
>>>>>>>> Thus can we avoid the above completely (even as intermediate
>>>>>>>> state)?
>>>>>>>
>>>>>>> Apparently, I'm quite close. Using the attached patch I see only 2 testsuite
>>>>>>> failures:
>>>>>>>
>>>>>>> FAIL: gcc.dg/tree-ssa/pr68714.c scan-tree-dump-times reassoc1 " <= " 1
>>>>>>> FAIL: gcc.target/i386/pr78102.c scan-assembler-times pcmpeqq 3
>>>>>>>
>>>>>>> The first one is about teaching reassoc about the SSA_NAMEs in VEC_COND_EXPR. I haven't
>>>>>>> analyze the second failure.
>>>>>>>
>>>>>>> I'm also not sure about the gimlification change, I see a superfluous assignments:
>>>>>>>       vec_cond_cmp.5 = _1 == _2;
>>>>>>>       vec_cond_cmp.6 = vec_cond_cmp.5;
>>>>>>>       vec_cond_cmp.7 = vec_cond_cmp.6;
>>>>>>>       _3 = VEC_COND_EXPR <vec_cond_cmp.7, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
>>>>>>> ?
>>>>>>>
>>>>>>> So with the suggested patch, the EH should be gone as you suggested. Right?
>>>>>>
>>>>>> Right, it should be on the comparison already from the start.
>>>>>>
>>>>>> @@ -14221,9 +14221,13 @@ gimplify_expr (tree *expr_p, gimple_seq
>>>>>> *pre_p, gimple_seq *post_p,
>>>>>>            case VEC_COND_EXPR:
>>>>>>              {
>>>>>>                enum gimplify_status r0, r1, r2;
>>>>>> -
>>>>>>                r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
>>>>>>                                    post_p, is_gimple_condexpr, fb_rvalue);
>>>>>> +           tree xop0 = TREE_OPERAND (*expr_p, 0);
>>>>>> +           tmp = create_tmp_var_raw (TREE_TYPE (xop0), "vec_cond_cmp");
>>>>>> +           gimple_add_tmp_var (tmp);
>>>>>> +           gimplify_assign (tmp, xop0, pre_p);
>>>>>> +           TREE_OPERAND (*expr_p, 0) = tmp;
>>>>>>                r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
>>>>>>                                    post_p, is_gimple_val, fb_rvalue);
>>>>>>
>>>>>> all of VEC_COND_EXPR can now be a simple goto expr_3;
>>>>>
>>>>> Works for me, thanks!
>>>>>
>>>>>>
>>>>>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>>>>>> index 494c9e9c20b..090fb52a2f1 100644
>>>>>> --- a/gcc/tree-ssa-forwprop.c
>>>>>> +++ b/gcc/tree-ssa-forwprop.c
>>>>>> @@ -3136,6 +3136,10 @@ pass_forwprop::execute (function *fun)
>>>>>>                        if (code == COND_EXPR
>>>>>>                            || code == VEC_COND_EXPR)
>>>>>>                          {
>>>>>> +                       /* Do not propagate into VEC_COND_EXPRs.  */
>>>>>> +                       if (code == VEC_COND_EXPR)
>>>>>> +                         break;
>>>>>> +
>>>>>>
>>>>>> err - remove the || code == VEC_COND_EXPR instead?
>>>>>
>>>>> Yep.
>>>>>
>>>>>>
>>>>>> @@ -2221,24 +2226,12 @@ expand_vector_operations (void)
>>>>>>     {
>>>>>>       gimple_stmt_iterator gsi;
>>>>>>       basic_block bb;
>>>>>> -  bool cfg_changed = false;
>>>>>>
>>>>>>       FOR_EACH_BB_FN (bb, cfun)
>>>>>> -    {
>>>>>> -      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>> -       {
>>>>>> -         expand_vector_operations_1 (&gsi);
>>>>>> -         /* ???  If we do not cleanup EH then we will ICE in
>>>>>> -            verification.  But in reality we have created wrong-code
>>>>>> -            as we did not properly transition EH info and edges to
>>>>>> -            the piecewise computations.  */
>>>>>> -         if (maybe_clean_eh_stmt (gsi_stmt (gsi))
>>>>>> -             && gimple_purge_dead_eh_edges (bb))
>>>>>> -           cfg_changed = true;
>>>>>> -       }
>>>>>> -    }
>>>>>>
>>>>>> I'm not sure about this.  Consider the C++ testcase where
>>>>>> the ?: is replaced by a division.  If veclower needs to replace
>>>>>> that with four scalrar division statements then the above
>>>>>> still applies - veclower does not correctly duplicate EH info
>>>>>> and EH edges to the individual divisions (and we do not know
>>>>>> which component might trap).
>>>>>>
>>>>>> So please leave the above in.  You can try if using integer
>>>>>> division makes it break and add such a testcase if there's
>>>>>> no coverage for this in the testsuite.
>>>>>
>>>>> I'm leaving that above. Can you please explain how can a division test-case
>>>>> be created?
>>>>
>>>> typedef long v2di __attribute__((vector_size(16)));
>>>>
>>>> v2di foo (v2di a, v2di b)
>>>> {
>>>>     try
>>>>     {
>>>>       v2di res = a / b;
>>>>       return res;
>>>>       }
>>>>       catch (...)
>>>>       {
>>>>       return (v2di){};
>>>>       }
>>>> }
>>>>
>>>> with -fnon-call-exceptions I see in t.ii.090t.ehdisp (correctly):
>>>>
>>>> ;;   basic block 2, loop depth 0
>>>> ;;    pred:       ENTRY
>>>>     [LP 1] _6 = a_4(D) / b_5(D);
>>>> ;;    succ:       5
>>>> ;;                3
>>>>
>>>> while after t.ii.226t.veclower we have
>>>>
>>>> ;;   basic block 2, loop depth 0
>>>> ;;    pred:       ENTRY
>>>>     _13 = BIT_FIELD_REF <a_4(D), 64, 0>;
>>>>     _14 = BIT_FIELD_REF <b_5(D), 64, 0>;
>>>>     _15 = _13 / _14;
>>>>     _16 = BIT_FIELD_REF <a_4(D), 64, 64>;
>>>>     _17 = BIT_FIELD_REF <b_5(D), 64, 64>;
>>>>     _18 = _16 / _17;
>>>>     _6 = {_15, _18};
>>>>     res_7 = _6;
>>>>     _8 = res_7;
>>>> ;;    succ:       3
>>>>
>>>> and all EH is gone and we'd ICE if you remove the above hunk.  Hopefully.
>>>
>>> Yes, it ICEs then:
>>>
>>>
>>> ./xg++ -B. ~/Programming/testcases/ice.c -c -fnon-call-exceptions -O3
>>> /home/marxin/Programming/testcases/ice.c: In function ‘v2di foo(v2di, v2di)’:
>>> /home/marxin/Programming/testcases/ice.c:3:6: error: statement marked for throw, but doesn’t
>>>       3 | v2di foo (v2di a, v2di b)
>>>         |      ^~~
>>> _6 = {_12, _15};
>>> during GIMPLE pass: veclower2
>>> /home/marxin/Programming/testcases/ice.c:3:6: internal compiler error: verify_gimple failed
>>> 0x10e308a verify_gimple_in_cfg(function*, bool)
>>>          /home/marxin/Programming/gcc/gcc/tree-cfg.c:5461
>>> 0xfc9caf execute_function_todo
>>>          /home/marxin/Programming/gcc/gcc/passes.c:1985
>>> 0xfcaafc do_per_function
>>>          /home/marxin/Programming/gcc/gcc/passes.c:1640
>>> 0xfcaafc execute_todo
>>>          /home/marxin/Programming/gcc/gcc/passes.c:2039
>>> Please submit a full bug report,
>>> with preprocessed source if appropriate.
>>> Please include the complete backtrace with any bug report.
>>> See <https://gcc.gnu.org/bugs/> for instructions.
>>>
>>>>
>>>> We still generate wrong-code obviously as we'd need to duplicate the
>>>> EH info on each component division (and split blocks and generate
>>>> extra EH edges).  That's a pre-existing bug of course.  I just wanted
>>>> to avoid to create a new instance just because of the early instruction
>>>> selection for VEC_COND_EXPR.
>>>
>>> Fine!
>>>
>>>>
>>>>>>
>>>>>> What's missing from the patch is adjusting
>>>>>> verify_gimple_assign_ternary from
>>>>>>
>>>>>>      if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
>>>>>>           ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
>>>>>>          || !is_gimple_val (rhs2)
>>>>>>          || !is_gimple_val (rhs3))
>>>>>>        {
>>>>>>          error ("invalid operands in ternary operation");
>>>>>>          return true;
>>>>>>
>>>>>> to the same with the rhs_code == VEC_COND_EXPR case removed.
>>>>>
>>>>> Hmm. I'm not sure I've got this comment. Why do we want to change it
>>>>> and is it done wright in the patch?
>>>>
>>>> Ah, I missed the hunk you added.
>>>
>>> That explains the confusion I got.
>>>
>>>>   But the check should be an inclusive
>>>> one, not an exclusive one and earlier accepting a is_gimple_condexpr
>>>> is superfluous when you later reject the tcc_comparison part.  Just
>>>> testing is_gimple_val is better.  So yes, remove your tree-cfg.c hunk
>>>> and just adjust the above test.
>>>
>>> I simplified that.
>>>
>>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Please double-check the changelog
>>
>>          (do_store_flag):
>>
>> +       tree-vect-isel.o \
>>
>> IMHO we want to move more of the pattern matching magic of RTL
>> expansion here to obsolete TER.  So please name it gimple-isel.cc
>> (.cc!, not .c)
>>
>> +  gassign *assign = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (cond));
>> +  if (stmt != NULL
>> +      && TREE_CODE_CLASS (gimple_assign_rhs_code (assign)) != tcc_comparison)
>> +    return ERROR_MARK;
>>
>> you want stmt == NULL || TREE_CODE_CLASS (...)
>>
>> in case the def stmt is a call.
>>
>> +         gimple_seq seq;
>> +         tree exp = force_gimple_operand (comb, &seq, true, NULL_TREE);
>> +         if (seq)
>> +           {
>> +             gimple_stmt_iterator gsi = gsi_for_stmt (vcond0);
>> +             gsi_insert_before (&gsi, seq, GSI_SAME_STMT);
>> +           }
>>
>> use force_gimple_operand_gsi that makes the above simpler.
>>
>>            if (invert)
>> -           std::swap (*gimple_assign_rhs2_ptr (stmt0),
>> -                      *gimple_assign_rhs3_ptr (stmt0));
>> -         update_stmt (stmt0);
>> +           std::swap (*gimple_assign_rhs2_ptr (vcond0),
>> +                      *gimple_assign_rhs3_ptr (vcond0));
>>
>> use swap_ssa_operands.
>>
>> +         gimple_assign_set_rhs1 (vcond0, exp);
>> +         update_stmt (vcond0);
>>
>> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> index cf2d979fea1..710b17a7c5c 100644
>> --- a/gcc/tree-vect-stmts.c
>> +++ b/gcc/tree-vect-stmts.c
>> @@ -9937,8 +9937,8 @@ vectorizable_condition (vec_info *vinfo,
>>          {
>>            vec_cond_rhs = vec_oprnds1[i];
>>            if (bitop1 == NOP_EXPR)
>> -           vec_compare = build2 (cond_code, vec_cmp_type,
>> -                                 vec_cond_lhs, vec_cond_rhs);
>> +           vec_compare = gimplify_build2 (gsi, cond_code, vec_cmp_type,
>> +                                          vec_cond_lhs, vec_cond_rhs);
>>            else
>>              {
>>
>> please don't introduce more uses of gimplify_buildN - I'd like to
>> get rid of those.  You can use
>>
>>       gimple_seq stmts = NULL;
>>       vec_compare = gimple_build (&stmts, cond_code, ...);
>>       gsi_insert_seq_before/after (...);
>>
>> OK with those changes.
> 
> Applying the patch caused
> 
> Running target unix//-m32
> FAIL: gcc.c-torture/execute/ieee/pr50310.c execution,  -O3
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
> -finline-functions
> FAIL: gcc.c-torture/execute/ieee/pr50310.c execution,  -O3 -g

I can't reproduce that with current master. Can you?

> 
> and
> 
> FAIL: ext/random/simd_fast_mersenne_twister_engine/operators/inequal.cc
> (test for excess errors)
> UNRESOLVED: ext/random/simd_fast_mersenne_twister_engine/operators/inequal.cc
> compilation failed to produce executable

I've just fixed this one.

Martin

> 
> Richard.
> 
>> Thanks,
>> Richard.
>>
>>
>>> Thanks,
>>> Martin
>>>
>>>>
>>>>>>
>>>>>> You'll likely figure the vectorizer still creates some VEC_COND_EXPRs
>>>>>> with embedded comparisons.
>>>>>
>>>>> I've fixed 2 failing test-cases I mentioned in the previous email.
>>>>>
>>>>> Martin
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>
>>>>>>> Martin
>>>>>
>>>
Richard Biener June 18, 2020, 8:52 a.m. UTC | #56
On Thu, Jun 18, 2020 at 10:10 AM Martin Liška <mliska@suse.cz> wrote:
>
> On 6/17/20 3:15 PM, Richard Biener wrote:
> > On Wed, Jun 17, 2020 at 10:50 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> >>
> >> On Mon, Jun 15, 2020 at 2:20 PM Martin Liška <mliska@suse.cz> wrote:
> >>>
> >>> On 6/15/20 1:59 PM, Richard Biener wrote:
> >>>> On Mon, Jun 15, 2020 at 1:19 PM Martin Liška <mliska@suse.cz> wrote:
> >>>>>
> >>>>> On 6/15/20 9:14 AM, Richard Biener wrote:
> >>>>>> On Fri, Jun 12, 2020 at 3:24 PM Martin Liška <mliska@suse.cz> wrote:
> >>>>>>>
> >>>>>>> On 6/12/20 11:43 AM, Richard Biener wrote:
> >>>>>>>> So ... how far are you with enforcing a split VEC_COND_EXPR?
> >>>>>>>> Thus can we avoid the above completely (even as intermediate
> >>>>>>>> state)?
> >>>>>>>
> >>>>>>> Apparently, I'm quite close. Using the attached patch I see only 2 testsuite
> >>>>>>> failures:
> >>>>>>>
> >>>>>>> FAIL: gcc.dg/tree-ssa/pr68714.c scan-tree-dump-times reassoc1 " <= " 1
> >>>>>>> FAIL: gcc.target/i386/pr78102.c scan-assembler-times pcmpeqq 3
> >>>>>>>
> >>>>>>> The first one is about teaching reassoc about the SSA_NAMEs in VEC_COND_EXPR. I haven't
> >>>>>>> analyzed the second failure.
> >>>>>>>
> >>>>>>> I'm also not sure about the gimplification change, I see superfluous assignments:
> >>>>>>>       vec_cond_cmp.5 = _1 == _2;
> >>>>>>>       vec_cond_cmp.6 = vec_cond_cmp.5;
> >>>>>>>       vec_cond_cmp.7 = vec_cond_cmp.6;
> >>>>>>>       _3 = VEC_COND_EXPR <vec_cond_cmp.7, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
> >>>>>>> ?
> >>>>>>>
> >>>>>>> So with the suggested patch, the EH should be gone as you suggested. Right?
> >>>>>>
> >>>>>> Right, it should be on the comparison already from the start.
> >>>>>>
> >>>>>> @@ -14221,9 +14221,13 @@ gimplify_expr (tree *expr_p, gimple_seq
> >>>>>> *pre_p, gimple_seq *post_p,
> >>>>>>            case VEC_COND_EXPR:
> >>>>>>              {
> >>>>>>                enum gimplify_status r0, r1, r2;
> >>>>>> -
> >>>>>>                r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
> >>>>>>                                    post_p, is_gimple_condexpr, fb_rvalue);
> >>>>>> +           tree xop0 = TREE_OPERAND (*expr_p, 0);
> >>>>>> +           tmp = create_tmp_var_raw (TREE_TYPE (xop0), "vec_cond_cmp");
> >>>>>> +           gimple_add_tmp_var (tmp);
> >>>>>> +           gimplify_assign (tmp, xop0, pre_p);
> >>>>>> +           TREE_OPERAND (*expr_p, 0) = tmp;
> >>>>>>                r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
> >>>>>>                                    post_p, is_gimple_val, fb_rvalue);
> >>>>>>
> >>>>>> all of VEC_COND_EXPR can now be a simple goto expr_3;
> >>>>>
> >>>>> Works for me, thanks!
> >>>>>
> >>>>>>
> >>>>>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
> >>>>>> index 494c9e9c20b..090fb52a2f1 100644
> >>>>>> --- a/gcc/tree-ssa-forwprop.c
> >>>>>> +++ b/gcc/tree-ssa-forwprop.c
> >>>>>> @@ -3136,6 +3136,10 @@ pass_forwprop::execute (function *fun)
> >>>>>>                        if (code == COND_EXPR
> >>>>>>                            || code == VEC_COND_EXPR)
> >>>>>>                          {
> >>>>>> +                       /* Do not propagate into VEC_COND_EXPRs.  */
> >>>>>> +                       if (code == VEC_COND_EXPR)
> >>>>>> +                         break;
> >>>>>> +
> >>>>>>
> >>>>>> err - remove the || code == VEC_COND_EXPR instead?
> >>>>>
> >>>>> Yep.
> >>>>>
> >>>>>>
> >>>>>> @@ -2221,24 +2226,12 @@ expand_vector_operations (void)
> >>>>>>     {
> >>>>>>       gimple_stmt_iterator gsi;
> >>>>>>       basic_block bb;
> >>>>>> -  bool cfg_changed = false;
> >>>>>>
> >>>>>>       FOR_EACH_BB_FN (bb, cfun)
> >>>>>> -    {
> >>>>>> -      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> >>>>>> -       {
> >>>>>> -         expand_vector_operations_1 (&gsi);
> >>>>>> -         /* ???  If we do not cleanup EH then we will ICE in
> >>>>>> -            verification.  But in reality we have created wrong-code
> >>>>>> -            as we did not properly transition EH info and edges to
> >>>>>> -            the piecewise computations.  */
> >>>>>> -         if (maybe_clean_eh_stmt (gsi_stmt (gsi))
> >>>>>> -             && gimple_purge_dead_eh_edges (bb))
> >>>>>> -           cfg_changed = true;
> >>>>>> -       }
> >>>>>> -    }
> >>>>>>
> >>>>>> I'm not sure about this.  Consider the C++ testcase where
> >>>>>> the ?: is replaced by a division.  If veclower needs to replace
> >>>>>> that with four scalar division statements then the above
> >>>>>> still applies - veclower does not correctly duplicate EH info
> >>>>>> and EH edges to the individual divisions (and we do not know
> >>>>>> which component might trap).
> >>>>>>
> >>>>>> So please leave the above in.  You can try if using integer
> >>>>>> division makes it break and add such a testcase if there's
> >>>>>> no coverage for this in the testsuite.
> >>>>>
> >>>>> I'm leaving that above. Can you please explain how a division test-case
> >>>>> can be created?
> >>>>
> >>>> typedef long v2di __attribute__((vector_size(16)));
> >>>>
> >>>> v2di foo (v2di a, v2di b)
> >>>> {
> >>>>     try
> >>>>     {
> >>>>       v2di res = a / b;
> >>>>       return res;
> >>>>       }
> >>>>       catch (...)
> >>>>       {
> >>>>       return (v2di){};
> >>>>       }
> >>>> }
> >>>>
> >>>> with -fnon-call-exceptions I see in t.ii.090t.ehdisp (correctly):
> >>>>
> >>>> ;;   basic block 2, loop depth 0
> >>>> ;;    pred:       ENTRY
> >>>>     [LP 1] _6 = a_4(D) / b_5(D);
> >>>> ;;    succ:       5
> >>>> ;;                3
> >>>>
> >>>> while after t.ii.226t.veclower we have
> >>>>
> >>>> ;;   basic block 2, loop depth 0
> >>>> ;;    pred:       ENTRY
> >>>>     _13 = BIT_FIELD_REF <a_4(D), 64, 0>;
> >>>>     _14 = BIT_FIELD_REF <b_5(D), 64, 0>;
> >>>>     _15 = _13 / _14;
> >>>>     _16 = BIT_FIELD_REF <a_4(D), 64, 64>;
> >>>>     _17 = BIT_FIELD_REF <b_5(D), 64, 64>;
> >>>>     _18 = _16 / _17;
> >>>>     _6 = {_15, _18};
> >>>>     res_7 = _6;
> >>>>     _8 = res_7;
> >>>> ;;    succ:       3
> >>>>
> >>>> and all EH is gone and we'd ICE if you remove the above hunk.  Hopefully.
> >>>
> >>> Yes, it ICEs then:
> >>>
> >>>
> >>> ./xg++ -B. ~/Programming/testcases/ice.c -c -fnon-call-exceptions -O3
> >>> /home/marxin/Programming/testcases/ice.c: In function ‘v2di foo(v2di, v2di)’:
> >>> /home/marxin/Programming/testcases/ice.c:3:6: error: statement marked for throw, but doesn’t
> >>>       3 | v2di foo (v2di a, v2di b)
> >>>         |      ^~~
> >>> _6 = {_12, _15};
> >>> during GIMPLE pass: veclower2
> >>> /home/marxin/Programming/testcases/ice.c:3:6: internal compiler error: verify_gimple failed
> >>> 0x10e308a verify_gimple_in_cfg(function*, bool)
> >>>          /home/marxin/Programming/gcc/gcc/tree-cfg.c:5461
> >>> 0xfc9caf execute_function_todo
> >>>          /home/marxin/Programming/gcc/gcc/passes.c:1985
> >>> 0xfcaafc do_per_function
> >>>          /home/marxin/Programming/gcc/gcc/passes.c:1640
> >>> 0xfcaafc execute_todo
> >>>          /home/marxin/Programming/gcc/gcc/passes.c:2039
> >>> Please submit a full bug report,
> >>> with preprocessed source if appropriate.
> >>> Please include the complete backtrace with any bug report.
> >>> See <https://gcc.gnu.org/bugs/> for instructions.
> >>>
> >>>>
> >>>> We still generate wrong-code obviously as we'd need to duplicate the
> >>>> EH info on each component division (and split blocks and generate
> >>>> extra EH edges).  That's a pre-existing bug of course.  I just wanted
> >>>> to avoid creating a new instance just because of the early instruction
> >>>> selection for VEC_COND_EXPR.
> >>>
> >>> Fine!
> >>>
> >>>>
> >>>>>>
> >>>>>> What's missing from the patch is adjusting
> >>>>>> verify_gimple_assign_ternary from
> >>>>>>
> >>>>>>      if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
> >>>>>>           ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
> >>>>>>          || !is_gimple_val (rhs2)
> >>>>>>          || !is_gimple_val (rhs3))
> >>>>>>        {
> >>>>>>          error ("invalid operands in ternary operation");
> >>>>>>          return true;
> >>>>>>
> >>>>>> to the same with the rhs_code == VEC_COND_EXPR case removed.
> >>>>>
> >>>>> Hmm. I'm not sure I've got this comment. Why do we want to change it
> >>>>> and is it done right in the patch?
> >>>>
> >>>> Ah, I missed the hunk you added.
> >>>
> >>> That explains the confusion I got.
> >>>
> >>>>   But the check should be an inclusive
> >>>> one, not an exclusive one and earlier accepting a is_gimple_condexpr
> >>>> is superfluous when you later reject the tcc_comparison part.  Just
> >>>> testing is_gimple_val is better.  So yes, remove your tree-cfg.c hunk
> >>>> and just adjust the above test.
> >>>
> >>> I simplified that.
> >>>
> >>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Please double-check the changelog
> >>
> >>          (do_store_flag):
> >>
> >> +       tree-vect-isel.o \
> >>
> >> IMHO we want to move more of the pattern matching magic of RTL
> >> expansion here to obsolete TER.  So please name it gimple-isel.cc
> >> (.cc!, not .c)
> >>
> >> +  gassign *assign = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (cond));
> >> +  if (stmt != NULL
> >> +      && TREE_CODE_CLASS (gimple_assign_rhs_code (assign)) != tcc_comparison)
> >> +    return ERROR_MARK;
> >>
> >> you want stmt == NULL || TREE_CODE_CLASS (...)
> >>
> >> in case the def stmt is a call.
> >>
> >> +         gimple_seq seq;
> >> +         tree exp = force_gimple_operand (comb, &seq, true, NULL_TREE);
> >> +         if (seq)
> >> +           {
> >> +             gimple_stmt_iterator gsi = gsi_for_stmt (vcond0);
> >> +             gsi_insert_before (&gsi, seq, GSI_SAME_STMT);
> >> +           }
> >>
> >> use force_gimple_operand_gsi that makes the above simpler.
> >>
> >>            if (invert)
> >> -           std::swap (*gimple_assign_rhs2_ptr (stmt0),
> >> -                      *gimple_assign_rhs3_ptr (stmt0));
> >> -         update_stmt (stmt0);
> >> +           std::swap (*gimple_assign_rhs2_ptr (vcond0),
> >> +                      *gimple_assign_rhs3_ptr (vcond0));
> >>
> >> use swap_ssa_operands.
> >>
> >> +         gimple_assign_set_rhs1 (vcond0, exp);
> >> +         update_stmt (vcond0);
> >>
> >> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> >> index cf2d979fea1..710b17a7c5c 100644
> >> --- a/gcc/tree-vect-stmts.c
> >> +++ b/gcc/tree-vect-stmts.c
> >> @@ -9937,8 +9937,8 @@ vectorizable_condition (vec_info *vinfo,
> >>          {
> >>            vec_cond_rhs = vec_oprnds1[i];
> >>            if (bitop1 == NOP_EXPR)
> >> -           vec_compare = build2 (cond_code, vec_cmp_type,
> >> -                                 vec_cond_lhs, vec_cond_rhs);
> >> +           vec_compare = gimplify_build2 (gsi, cond_code, vec_cmp_type,
> >> +                                          vec_cond_lhs, vec_cond_rhs);
> >>            else
> >>              {
> >>
> >> please don't introduce more uses of gimplify_buildN - I'd like to
> >> get rid of those.  You can use
> >>
> >>       gimple_seq stmts = NULL;
> >>       vec_compare = gimple_build (&stmts, cond_code, ...);
> >>       gsi_insert_seq_before/after (...);
> >>
> >> OK with those changes.
> >
> > Applying the patch caused
> >
> > Running target unix//-m32
> > FAIL: gcc.c-torture/execute/ieee/pr50310.c execution,  -O3
> > -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
> > -finline-functions
> > FAIL: gcc.c-torture/execute/ieee/pr50310.c execution,  -O3 -g
>
> I can't reproduce that with current master. Can you?

Yes.

> make check-gcc RUNTESTFLAGS="--target_board=unix/-m32 ieee.exp=pr50310.c"
...
FAIL: gcc.c-torture/execute/ieee/pr50310.c execution,  -O3
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
-finline-functions
FAIL: gcc.c-torture/execute/ieee/pr50310.c execution,  -O3 -g

mind the -m32

> >
> > and
> >
> > FAIL: ext/random/simd_fast_mersenne_twister_engine/operators/inequal.cc
> > (test for excess errors)
> > UNRESOLVED: ext/random/simd_fast_mersenne_twister_engine/operators/inequal.cc
> > compilation failed to produce executable
>
> I've just fixed this one.
>
> Martin
>
> >
> > Richard.
> >
> >> Thanks,
> >> Richard.
> >>
> >>
> >>> Thanks,
> >>> Martin
> >>>
> >>>>
> >>>>>>
> >>>>>> You'll likely figure the vectorizer still creates some VEC_COND_EXPRs
> >>>>>> with embedded comparisons.
> >>>>>
> >>>>> I've fixed 2 failing test-cases I mentioned in the previous email.
> >>>>>
> >>>>> Martin
> >>>>>
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Richard.
> >>>>>>
> >>>>>>
> >>>>>>> Martin
> >>>>>
> >>>
>
Martin Liška June 18, 2020, 9:02 a.m. UTC | #57
On 6/18/20 10:52 AM, Richard Biener wrote:
> On Thu, Jun 18, 2020 at 10:10 AM Martin Liška <mliska@suse.cz> wrote:
>>
>> On 6/17/20 3:15 PM, Richard Biener wrote:
>>> On Wed, Jun 17, 2020 at 10:50 AM Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>>
>>>> On Mon, Jun 15, 2020 at 2:20 PM Martin Liška <mliska@suse.cz> wrote:
>>>>>
>>>>> On 6/15/20 1:59 PM, Richard Biener wrote:
>>>>>> On Mon, Jun 15, 2020 at 1:19 PM Martin Liška <mliska@suse.cz> wrote:
>>>>>>>
>>>>>>> On 6/15/20 9:14 AM, Richard Biener wrote:
>>>>>>>> On Fri, Jun 12, 2020 at 3:24 PM Martin Liška <mliska@suse.cz> wrote:
>>>>>>>>>
>>>>>>>>> On 6/12/20 11:43 AM, Richard Biener wrote:
>>>>>>>>>> So ... how far are you with enforcing a split VEC_COND_EXPR?
>>>>>>>>>> Thus can we avoid the above completely (even as intermediate
>>>>>>>>>> state)?
>>>>>>>>>
>>>>>>>>> Apparently, I'm quite close. Using the attached patch I see only 2 testsuite
>>>>>>>>> failures:
>>>>>>>>>
>>>>>>>>> FAIL: gcc.dg/tree-ssa/pr68714.c scan-tree-dump-times reassoc1 " <= " 1
>>>>>>>>> FAIL: gcc.target/i386/pr78102.c scan-assembler-times pcmpeqq 3
>>>>>>>>>
>>>>>>>>> The first one is about teaching reassoc about the SSA_NAMEs in VEC_COND_EXPR. I haven't
>>>>>>>>> analyzed the second failure.
>>>>>>>>>
>>>>>>>>> I'm also not sure about the gimplification change, I see superfluous assignments:
>>>>>>>>>        vec_cond_cmp.5 = _1 == _2;
>>>>>>>>>        vec_cond_cmp.6 = vec_cond_cmp.5;
>>>>>>>>>        vec_cond_cmp.7 = vec_cond_cmp.6;
>>>>>>>>>        _3 = VEC_COND_EXPR <vec_cond_cmp.7, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
>>>>>>>>> ?
>>>>>>>>>
>>>>>>>>> So with the suggested patch, the EH should be gone as you suggested. Right?
>>>>>>>>
>>>>>>>> Right, it should be on the comparison already from the start.
>>>>>>>>
>>>>>>>> @@ -14221,9 +14221,13 @@ gimplify_expr (tree *expr_p, gimple_seq
>>>>>>>> *pre_p, gimple_seq *post_p,
>>>>>>>>             case VEC_COND_EXPR:
>>>>>>>>               {
>>>>>>>>                 enum gimplify_status r0, r1, r2;
>>>>>>>> -
>>>>>>>>                 r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
>>>>>>>>                                     post_p, is_gimple_condexpr, fb_rvalue);
>>>>>>>> +           tree xop0 = TREE_OPERAND (*expr_p, 0);
>>>>>>>> +           tmp = create_tmp_var_raw (TREE_TYPE (xop0), "vec_cond_cmp");
>>>>>>>> +           gimple_add_tmp_var (tmp);
>>>>>>>> +           gimplify_assign (tmp, xop0, pre_p);
>>>>>>>> +           TREE_OPERAND (*expr_p, 0) = tmp;
>>>>>>>>                 r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
>>>>>>>>                                     post_p, is_gimple_val, fb_rvalue);
>>>>>>>>
>>>>>>>> all of VEC_COND_EXPR can now be a simple goto expr_3;
>>>>>>>
>>>>>>> Works for me, thanks!
>>>>>>>
>>>>>>>>
>>>>>>>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>>>>>>>> index 494c9e9c20b..090fb52a2f1 100644
>>>>>>>> --- a/gcc/tree-ssa-forwprop.c
>>>>>>>> +++ b/gcc/tree-ssa-forwprop.c
>>>>>>>> @@ -3136,6 +3136,10 @@ pass_forwprop::execute (function *fun)
>>>>>>>>                         if (code == COND_EXPR
>>>>>>>>                             || code == VEC_COND_EXPR)
>>>>>>>>                           {
>>>>>>>> +                       /* Do not propagate into VEC_COND_EXPRs.  */
>>>>>>>> +                       if (code == VEC_COND_EXPR)
>>>>>>>> +                         break;
>>>>>>>> +
>>>>>>>>
>>>>>>>> err - remove the || code == VEC_COND_EXPR instead?
>>>>>>>
>>>>>>> Yep.
>>>>>>>
>>>>>>>>
>>>>>>>> @@ -2221,24 +2226,12 @@ expand_vector_operations (void)
>>>>>>>>      {
>>>>>>>>        gimple_stmt_iterator gsi;
>>>>>>>>        basic_block bb;
>>>>>>>> -  bool cfg_changed = false;
>>>>>>>>
>>>>>>>>        FOR_EACH_BB_FN (bb, cfun)
>>>>>>>> -    {
>>>>>>>> -      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>>>> -       {
>>>>>>>> -         expand_vector_operations_1 (&gsi);
>>>>>>>> -         /* ???  If we do not cleanup EH then we will ICE in
>>>>>>>> -            verification.  But in reality we have created wrong-code
>>>>>>>> -            as we did not properly transition EH info and edges to
>>>>>>>> -            the piecewise computations.  */
>>>>>>>> -         if (maybe_clean_eh_stmt (gsi_stmt (gsi))
>>>>>>>> -             && gimple_purge_dead_eh_edges (bb))
>>>>>>>> -           cfg_changed = true;
>>>>>>>> -       }
>>>>>>>> -    }
>>>>>>>>
>>>>>>>> I'm not sure about this.  Consider the C++ testcase where
>>>>>>>> the ?: is replaced by a division.  If veclower needs to replace
>>>>>>>> that with four scalar division statements then the above
>>>>>>>> still applies - veclower does not correctly duplicate EH info
>>>>>>>> and EH edges to the individual divisions (and we do not know
>>>>>>>> which component might trap).
>>>>>>>>
>>>>>>>> So please leave the above in.  You can try if using integer
>>>>>>>> division makes it break and add such a testcase if there's
>>>>>>>> no coverage for this in the testsuite.
>>>>>>>
>>>>>>> I'm leaving that above. Can you please explain how a division test-case
>>>>>>> can be created?
>>>>>>
>>>>>> typedef long v2di __attribute__((vector_size(16)));
>>>>>>
>>>>>> v2di foo (v2di a, v2di b)
>>>>>> {
>>>>>>      try
>>>>>>      {
>>>>>>        v2di res = a / b;
>>>>>>        return res;
>>>>>>        }
>>>>>>        catch (...)
>>>>>>        {
>>>>>>        return (v2di){};
>>>>>>        }
>>>>>> }
>>>>>>
>>>>>> with -fnon-call-exceptions I see in t.ii.090t.ehdisp (correctly):
>>>>>>
>>>>>> ;;   basic block 2, loop depth 0
>>>>>> ;;    pred:       ENTRY
>>>>>>      [LP 1] _6 = a_4(D) / b_5(D);
>>>>>> ;;    succ:       5
>>>>>> ;;                3
>>>>>>
>>>>>> while after t.ii.226t.veclower we have
>>>>>>
>>>>>> ;;   basic block 2, loop depth 0
>>>>>> ;;    pred:       ENTRY
>>>>>>      _13 = BIT_FIELD_REF <a_4(D), 64, 0>;
>>>>>>      _14 = BIT_FIELD_REF <b_5(D), 64, 0>;
>>>>>>      _15 = _13 / _14;
>>>>>>      _16 = BIT_FIELD_REF <a_4(D), 64, 64>;
>>>>>>      _17 = BIT_FIELD_REF <b_5(D), 64, 64>;
>>>>>>      _18 = _16 / _17;
>>>>>>      _6 = {_15, _18};
>>>>>>      res_7 = _6;
>>>>>>      _8 = res_7;
>>>>>> ;;    succ:       3
>>>>>>
>>>>>> and all EH is gone and we'd ICE if you remove the above hunk.  Hopefully.
>>>>>
>>>>> Yes, it ICEs then:
>>>>>
>>>>>
>>>>> ./xg++ -B. ~/Programming/testcases/ice.c -c -fnon-call-exceptions -O3
>>>>> /home/marxin/Programming/testcases/ice.c: In function ‘v2di foo(v2di, v2di)’:
>>>>> /home/marxin/Programming/testcases/ice.c:3:6: error: statement marked for throw, but doesn’t
>>>>>        3 | v2di foo (v2di a, v2di b)
>>>>>          |      ^~~
>>>>> _6 = {_12, _15};
>>>>> during GIMPLE pass: veclower2
>>>>> /home/marxin/Programming/testcases/ice.c:3:6: internal compiler error: verify_gimple failed
>>>>> 0x10e308a verify_gimple_in_cfg(function*, bool)
>>>>>           /home/marxin/Programming/gcc/gcc/tree-cfg.c:5461
>>>>> 0xfc9caf execute_function_todo
>>>>>           /home/marxin/Programming/gcc/gcc/passes.c:1985
>>>>> 0xfcaafc do_per_function
>>>>>           /home/marxin/Programming/gcc/gcc/passes.c:1640
>>>>> 0xfcaafc execute_todo
>>>>>           /home/marxin/Programming/gcc/gcc/passes.c:2039
>>>>> Please submit a full bug report,
>>>>> with preprocessed source if appropriate.
>>>>> Please include the complete backtrace with any bug report.
>>>>> See <https://gcc.gnu.org/bugs/> for instructions.
>>>>>
>>>>>>
>>>>>> We still generate wrong-code obviously as we'd need to duplicate the
>>>>>> EH info on each component division (and split blocks and generate
>>>>>> extra EH edges).  That's a pre-existing bug of course.  I just wanted
>>>>>> to avoid creating a new instance just because of the early instruction
>>>>>> selection for VEC_COND_EXPR.
>>>>>
>>>>> Fine!
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> What's missing from the patch is adjusting
>>>>>>>> verify_gimple_assign_ternary from
>>>>>>>>
>>>>>>>>       if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
>>>>>>>>            ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
>>>>>>>>           || !is_gimple_val (rhs2)
>>>>>>>>           || !is_gimple_val (rhs3))
>>>>>>>>         {
>>>>>>>>           error ("invalid operands in ternary operation");
>>>>>>>>           return true;
>>>>>>>>
>>>>>>>> to the same with the rhs_code == VEC_COND_EXPR case removed.
>>>>>>>
>>>>>>> Hmm. I'm not sure I've got this comment. Why do we want to change it
>>>>>>> and is it done right in the patch?
>>>>>>
>>>>>> Ah, I missed the hunk you added.
>>>>>
>>>>> That explains the confusion I got.
>>>>>
>>>>>>    But the check should be an inclusive
>>>>>> one, not an exclusive one and earlier accepting a is_gimple_condexpr
>>>>>> is superfluous when you later reject the tcc_comparison part.  Just
>>>>>> testing is_gimple_val is better.  So yes, remove your tree-cfg.c hunk
>>>>>> and just adjust the above test.
>>>>>
>>>>> I simplified that.
>>>>>
>>>>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>>>
>>>> Please double-check the changelog
>>>>
>>>>           (do_store_flag):
>>>>
>>>> +       tree-vect-isel.o \
>>>>
>>>> IMHO we want to move more of the pattern matching magic of RTL
>>>> expansion here to obsolete TER.  So please name it gimple-isel.cc
>>>> (.cc!, not .c)
>>>>
>>>> +  gassign *assign = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (cond));
>>>> +  if (stmt != NULL
>>>> +      && TREE_CODE_CLASS (gimple_assign_rhs_code (assign)) != tcc_comparison)
>>>> +    return ERROR_MARK;
>>>>
>>>> you want stmt == NULL || TREE_CODE_CLASS (...)
>>>>
>>>> in case the def stmt is a call.
>>>>
>>>> +         gimple_seq seq;
>>>> +         tree exp = force_gimple_operand (comb, &seq, true, NULL_TREE);
>>>> +         if (seq)
>>>> +           {
>>>> +             gimple_stmt_iterator gsi = gsi_for_stmt (vcond0);
>>>> +             gsi_insert_before (&gsi, seq, GSI_SAME_STMT);
>>>> +           }
>>>>
>>>> use force_gimple_operand_gsi that makes the above simpler.
>>>>
>>>>             if (invert)
>>>> -           std::swap (*gimple_assign_rhs2_ptr (stmt0),
>>>> -                      *gimple_assign_rhs3_ptr (stmt0));
>>>> -         update_stmt (stmt0);
>>>> +           std::swap (*gimple_assign_rhs2_ptr (vcond0),
>>>> +                      *gimple_assign_rhs3_ptr (vcond0));
>>>>
>>>> use swap_ssa_operands.
>>>>
>>>> +         gimple_assign_set_rhs1 (vcond0, exp);
>>>> +         update_stmt (vcond0);
>>>>
>>>> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>>>> index cf2d979fea1..710b17a7c5c 100644
>>>> --- a/gcc/tree-vect-stmts.c
>>>> +++ b/gcc/tree-vect-stmts.c
>>>> @@ -9937,8 +9937,8 @@ vectorizable_condition (vec_info *vinfo,
>>>>           {
>>>>             vec_cond_rhs = vec_oprnds1[i];
>>>>             if (bitop1 == NOP_EXPR)
>>>> -           vec_compare = build2 (cond_code, vec_cmp_type,
>>>> -                                 vec_cond_lhs, vec_cond_rhs);
>>>> +           vec_compare = gimplify_build2 (gsi, cond_code, vec_cmp_type,
>>>> +                                          vec_cond_lhs, vec_cond_rhs);
>>>>             else
>>>>               {
>>>>
>>>> please don't introduce more uses of gimplify_buildN - I'd like to
>>>> get rid of those.  You can use
>>>>
>>>>        gimple_seq stmts = NULL;
>>>>        vec_compare = gimple_build (&stmts, cond_code, ...);
>>>>        gsi_insert_seq_before/after (...);
>>>>
>>>> OK with those changes.
>>>
>>> Applying the patch caused
>>>
>>> Running target unix//-m32
>>> FAIL: gcc.c-torture/execute/ieee/pr50310.c execution,  -O3
>>> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
>>> -finline-functions
>>> FAIL: gcc.c-torture/execute/ieee/pr50310.c execution,  -O3 -g
>>
>> I can't reproduce that with current master. Can you?
> 
> Yes.
> 
>> make check-gcc RUNTESTFLAGS="--target_board=unix/-m32 ieee.exp=pr50310.c"
> ...
> FAIL: gcc.c-torture/execute/ieee/pr50310.c execution,  -O3
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
> -finline-functions
> FAIL: gcc.c-torture/execute/ieee/pr50310.c execution,  -O3 -g

Now I've got it.

> 
> mind the -m32

I did, but -ffloat-store was not mentioned in the previous list of options ;)

Martin

> 
>>>
>>> and
>>>
>>> FAIL: ext/random/simd_fast_mersenne_twister_engine/operators/inequal.cc
>>> (test for excess errors)
>>> UNRESOLVED: ext/random/simd_fast_mersenne_twister_engine/operators/inequal.cc
>>> compilation failed to produce executable
>>
>> I've just fixed this one.
>>
>> Martin
>>
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>
>>>>> Thanks,
>>>>> Martin
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> You'll likely figure the vectorizer still creates some VEC_COND_EXPRs
>>>>>>>> with embedded comparisons.
>>>>>>>
>>>>>>> I've fixed 2 failing test-cases I mentioned in the previous email.
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Martin
>>>>>>>
>>>>>
>>
Martin Liška June 18, 2020, 9:29 a.m. UTC | #58
On 6/18/20 11:02 AM, Martin Liška wrote:
> Now I've got it.

I've just reduced that to:

$ cat pr50310.c
double s1[4], s2[4], s3[64];

int
main ()
{
   s1[0] = 5.0;
   s1[1] = 6.0;
   s1[2] = 5.0;
   s1[3] = __builtin_nan ("");
   s2[0] = 6.0;
   s2[1] = 5.0;
   s2[2] = 5.0;
   s2[3] = 5.0;

   asm volatile ("" : : : "memory");
   for (int i = 1; i < 4; i++)
     s3[i] = __builtin_isgreater (s1[i], s2[i]) ? -1.0 : 0.0;
   asm volatile ("" : : : "memory");

   __builtin_printf ("val: %f\n", s3[1]);
   if (s3[1] != -1.0)
     __builtin_abort ();
   return 0;
}

The optimized dump differs in the expected manner:

    <bb 2> [local count: 805306369]:
    MEM <vector(2) double> [(double *)&s1] = { 5.0e+0, 6.0e+0 };
    MEM <vector(2) double> [(double *)&s1 + 16B] = { 5.0e+0,  Nan };
    MEM <vector(2) double> [(double *)&s2] = { 6.0e+0, 5.0e+0 };
    MEM <vector(2) double> [(double *)&s2 + 16B] = { 5.0e+0, 5.0e+0 };
    __asm__ __volatile__("" :  :  : "memory");
    vect__1.13_51 = MEM <vector(2) double> [(double *)&s1 + 8B];
    vect__2.16_55 = MEM <vector(2) double> [(double *)&s2 + 8B];
-  vect_iftmp.17_58 = VEC_COND_EXPR <vect__1.13_51 u<= vect__2.16_55, { 0.0, 0.0 }, { -1.0e+0, -1.0e+0 }>;
-  MEM <vector(2) double> [(double *)&s3 + 8B] = vect_iftmp.17_58;
+  _58 = vect__1.13_51 u<= vect__2.16_55;
+  vect_iftmp.17_59 = .VCOND (vect__1.13_51, vect__2.16_55, { 0.0, 0.0 }, { -1.0e+0, -1.0e+0 }, 117);
+  MEM <vector(2) double> [(double *)&s3 + 8B] = vect_iftmp.17_59;
    _41 = s1[3];
    _42 = s2[3];
    if (_41 u<= _42)
      goto <bb 3>; [50.00%]
    else
      goto <bb 4>; [50.00%]
  
    <bb 3> [local count: 402653185]:
  
    <bb 4> [local count: 805306369]:
    # iftmp.0_43 = PHI <-1.0e+0(2), 0.0(3)>
    s3[3] = iftmp.0_43;
    __asm__ __volatile__("" :  :  : "memory");

but we fail with:

$ gcc pr50310.c -m32 -O3 -ffloat-store && ./a.out
val: -nan
Aborted (core dumped)

I'm digging deeper.
Martin
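
(For reference, a scalar cross-check of what the loop in the reduced test must compute; __builtin_isgreater is an ordered comparison, so the NaN lane has to yield 0.0 and only lane 1 becomes -1.0. A standalone sketch, not one of the testsuite files.)

#include <stdio.h>

int
main (void)
{
  double s1[4] = { 5.0, 6.0, 5.0, __builtin_nan ("") };
  double s2[4] = { 6.0, 5.0, 5.0, 5.0 };
  double s3[4] = { 0.0, 0.0, 0.0, 0.0 };

  for (int i = 1; i < 4; i++)
    s3[i] = __builtin_isgreater (s1[i], s2[i]) ? -1.0 : 0.0;

  /* Expected output: -1.000000 0.000000 0.000000 (the NaN lane compares
     unordered, so isgreater is false there).  */
  printf ("%f %f %f\n", s3[1], s3[2], s3[3]);
  return 0;
}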
diff mbox series

Patch

From 4a6f4aa3cdef7a032a4ad442e6cd5ec2e706144d Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Mon, 9 Mar 2020 13:23:03 +0100
Subject: [PATCH] Lower VEC_COND_EXPR into internal functions.

gcc/ChangeLog:

2020-03-30  Martin Liska  <mliska@suse.cz>

	* expr.c (expand_expr_real_2): Put gcc_unreachable, we should not reach
	this path.
	(do_store_flag): Likewise here.
	* internal-fn.c (vec_cond_mask_direct): New.
	(vec_cond_direct): Likewise.
	(vec_condu_direct): Likewise.
	(vec_condeq_direct): Likewise.
	(expand_vect_cond_optab_fn): Move from optabs.c.
	(expand_vec_cond_optab_fn): New alias.
	(expand_vec_condu_optab_fn): Likewise.
	(expand_vec_condeq_optab_fn): Likewise.
	(expand_vect_cond_mask_optab_fn): Moved from optabs.c.
	(expand_vec_cond_mask_optab_fn): New alias.
	(direct_vec_cond_mask_optab_supported_p): New.
	(direct_vec_cond_optab_supported_p): Likewise.
	(direct_vec_condu_optab_supported_p): Likewise.
	(direct_vec_condeq_optab_supported_p): Likewise.
	* internal-fn.def (VCOND): New internal optab
	function.
	(VCONDU): Likewise.
	(VCONDEQ): Likewise.
	(VCOND_MASK): Likewise.
	* optabs.c (expand_vec_cond_mask_expr): Removed.
	(expand_vec_cond_expr): Likewise.
	* optabs.h (expand_vec_cond_expr): Likewise.
	(vector_compare_rtx): Likewise.
	* passes.def: Add pass_gimple_isel.
	* tree-cfg.c (verify_gimple_assign_ternary): Add new
	GIMPLE check.
	* tree-pass.h (make_pass_gimple_isel): New.
	* tree-ssa-forwprop.c (pass_forwprop::execute): Do not forward
	into an already lowered VEC_COND_EXPR.
	* tree-vect-generic.c (expand_vector_divmod): Expand to SSA_NAME.
	(expand_vector_condition): Expand tcc_comparison of a VEC_COND_EXPR
	into a SSA_NAME.
	(gimple_expand_vec_cond_expr): New.
	(gimple_expand_vec_cond_exprs): New.
	(class pass_gimple_isel): New.
	(make_pass_gimple_isel): New.
---
 gcc/expr.c              |  25 +----
 gcc/internal-fn.c       |  89 ++++++++++++++++
 gcc/internal-fn.def     |   5 +
 gcc/optabs.c            | 124 +---------------------
 gcc/optabs.h            |   7 +-
 gcc/passes.def          |   1 +
 gcc/tree-cfg.c          |   8 ++
 gcc/tree-pass.h         |   1 +
 gcc/tree-ssa-forwprop.c |   6 ++
 gcc/tree-vect-generic.c | 226 ++++++++++++++++++++++++++++++++++++++--
 10 files changed, 338 insertions(+), 154 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index b97c217e86d..d6cecd0f251 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9200,17 +9200,8 @@  expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
       if (temp != 0)
 	return temp;
 
-      /* For vector MIN <x, y>, expand it a VEC_COND_EXPR <x <= y, x, y>
-	 and similarly for MAX <x, y>.  */
       if (VECTOR_TYPE_P (type))
-	{
-	  tree t0 = make_tree (type, op0);
-	  tree t1 = make_tree (type, op1);
-	  tree comparison = build2 (code == MIN_EXPR ? LE_EXPR : GE_EXPR,
-				    type, t0, t1);
-	  return expand_vec_cond_expr (type, comparison, t0, t1,
-				       original_target);
-	}
+	gcc_unreachable ();
 
       /* At this point, a MEM target is no longer useful; we will get better
 	 code without it.  */
@@ -9799,10 +9790,6 @@  expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	return temp;
       }
 
-    case VEC_COND_EXPR:
-      target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
-      return target;
-
     case VEC_DUPLICATE_EXPR:
       op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
       target = expand_vector_broadcast (mode, op0);
@@ -12133,8 +12120,7 @@  do_store_flag (sepops ops, rtx target, machine_mode mode)
   STRIP_NOPS (arg1);
 
   /* For vector typed comparisons emit code to generate the desired
-     all-ones or all-zeros mask.  Conveniently use the VEC_COND_EXPR
-     expander for this.  */
+     all-ones or all-zeros mask.  */
   if (TREE_CODE (ops->type) == VECTOR_TYPE)
     {
       tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
@@ -12142,12 +12128,7 @@  do_store_flag (sepops ops, rtx target, machine_mode mode)
 	  && expand_vec_cmp_expr_p (TREE_TYPE (arg0), ops->type, ops->code))
 	return expand_vec_cmp_expr (ops->type, ifexp, target);
       else
-	{
-	  tree if_true = constant_boolean_node (true, ops->type);
-	  tree if_false = constant_boolean_node (false, ops->type);
-	  return expand_vec_cond_expr (ops->type, ifexp, if_true,
-				       if_false, target);
-	}
+	gcc_unreachable ();
     }
 
   /* Optimize (x % C1) == C2 or (x % C1) != C2 if it is beneficial
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 52d1638917a..827bd5aa0d2 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -49,6 +49,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "gimple-ssa.h"
 #include "tree-phinodes.h"
 #include "ssa-iterators.h"
+#include "explow.h"
 
 /* The names of each internal function, indexed by function number.  */
 const char *const internal_fn_name_array[] = {
@@ -107,6 +108,10 @@  init_internal_fns ()
 #define mask_store_direct { 3, 2, false }
 #define store_lanes_direct { 0, 0, false }
 #define mask_store_lanes_direct { 0, 0, false }
+#define vec_cond_mask_direct { 0, 0, false }
+#define vec_cond_direct { 0, 0, false }
+#define vec_condu_direct { 0, 0, false }
+#define vec_condeq_direct { 0, 0, false }
 #define scatter_store_direct { 3, 1, false }
 #define unary_direct { 0, 0, true }
 #define binary_direct { 0, 0, true }
@@ -2544,6 +2549,86 @@  expand_mask_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 
 #define expand_mask_store_lanes_optab_fn expand_mask_store_optab_fn
 
+/* Expand VCOND, VCONDU and VCONDEQ optab internal functions.
+   The expansion of STMT is based on the associated optab OPTAB.  */
+
+static void
+expand_vect_cond_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[6];
+  insn_code icode;
+  tree lhs = gimple_call_lhs (stmt);
+  tree op0a = gimple_call_arg (stmt, 0);
+  tree op0b = gimple_call_arg (stmt, 1);
+  tree op1 = gimple_call_arg (stmt, 2);
+  tree op2 = gimple_call_arg (stmt, 3);
+  enum tree_code tcode = (tree_code) int_cst_value (gimple_call_arg (stmt, 4));
+
+  tree vec_cond_type = TREE_TYPE (lhs);
+  tree op_mode = TREE_TYPE (op0a);
+  bool unsignedp = TYPE_UNSIGNED (op_mode);
+
+  machine_mode mode = TYPE_MODE (vec_cond_type);
+  machine_mode cmp_op_mode = TYPE_MODE (op_mode);
+
+  icode = convert_optab_handler (optab, mode, cmp_op_mode);
+  rtx comparison
+    = vector_compare_rtx (VOIDmode, tcode, op0a, op0b, unsignedp, icode, 4);
+  rtx rtx_op1 = expand_normal (op1);
+  rtx rtx_op2 = expand_normal (op2);
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_op1, mode);
+  create_input_operand (&ops[2], rtx_op2, mode);
+  create_fixed_operand (&ops[3], comparison);
+  create_fixed_operand (&ops[4], XEXP (comparison, 0));
+  create_fixed_operand (&ops[5], XEXP (comparison, 1));
+  expand_insn (icode, 6, ops);
+}
+
+#define expand_vec_cond_optab_fn expand_vect_cond_optab_fn
+#define expand_vec_condu_optab_fn expand_vect_cond_optab_fn
+#define expand_vec_condeq_optab_fn expand_vect_cond_optab_fn
+
+/* Expand VCOND_MASK optab internal function.
+   The expansion of STMT is based on the associated optab OPTAB.  */
+
+static void
+expand_vect_cond_mask_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[4];
+
+  tree lhs = gimple_call_lhs (stmt);
+  tree op0 = gimple_call_arg (stmt, 0);
+  tree op1 = gimple_call_arg (stmt, 1);
+  tree op2 = gimple_call_arg (stmt, 2);
+  tree vec_cond_type = TREE_TYPE (lhs);
+
+  machine_mode mode = TYPE_MODE (vec_cond_type);
+  machine_mode mask_mode = TYPE_MODE (TREE_TYPE (op0));
+  enum insn_code icode = convert_optab_handler (optab, mode, mask_mode);
+  rtx mask, rtx_op1, rtx_op2;
+
+  gcc_assert (icode != CODE_FOR_nothing);
+
+  mask = expand_normal (op0);
+  rtx_op1 = expand_normal (op1);
+  rtx_op2 = expand_normal (op2);
+
+  mask = force_reg (mask_mode, mask);
+  rtx_op1 = force_reg (GET_MODE (rtx_op1), rtx_op1);
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_op1, mode);
+  create_input_operand (&ops[2], rtx_op2, mode);
+  create_input_operand (&ops[3], mask, mask_mode);
+  expand_insn (icode, 4, ops);
+}
+
+#define expand_vec_cond_mask_optab_fn expand_vect_cond_mask_optab_fn
+
 static void
 expand_ABNORMAL_DISPATCHER (internal_fn, gcall *)
 {
@@ -3125,6 +3210,10 @@  multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_store_optab_supported_p direct_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
+#define direct_vec_cond_mask_optab_supported_p multi_vector_optab_supported_p
+#define direct_vec_cond_optab_supported_p multi_vector_optab_supported_p
+#define direct_vec_condu_optab_supported_p multi_vector_optab_supported_p
+#define direct_vec_condeq_optab_supported_p multi_vector_optab_supported_p
 #define direct_scatter_store_optab_supported_p convert_optab_supported_p
 #define direct_while_optab_supported_p convert_optab_supported_p
 #define direct_fold_extract_optab_supported_p direct_optab_supported_p
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 1d190d492ff..0c6fc371190 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -136,6 +136,11 @@  DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
 DEF_INTERNAL_OPTAB_FN (MASK_STORE_LANES, 0,
 		       vec_mask_store_lanes, mask_store_lanes)
 
+DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
+DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
+DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
+DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
+
 DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
 DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
 		       check_raw_ptrs, check_ptrs)
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 8dd351286cd..c66c08e7d55 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -5439,7 +5439,7 @@  get_rtx_code (enum tree_code tcode, bool unsignedp)
    first comparison operand for insn ICODE.  Do not generate the
    compare instruction itself.  */
 
-static rtx
+rtx
 vector_compare_rtx (machine_mode cmp_mode, enum tree_code tcode,
 		    tree t_op0, tree t_op1, bool unsignedp,
 		    enum insn_code icode, unsigned int opno)
@@ -5804,128 +5804,6 @@  expand_vec_perm_var (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
   return tmp;
 }
 
-/* Generate insns for a VEC_COND_EXPR with mask, given its TYPE and its
-   three operands.  */
-
-rtx
-expand_vec_cond_mask_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
-			   rtx target)
-{
-  class expand_operand ops[4];
-  machine_mode mode = TYPE_MODE (vec_cond_type);
-  machine_mode mask_mode = TYPE_MODE (TREE_TYPE (op0));
-  enum insn_code icode = get_vcond_mask_icode (mode, mask_mode);
-  rtx mask, rtx_op1, rtx_op2;
-
-  if (icode == CODE_FOR_nothing)
-    return 0;
-
-  mask = expand_normal (op0);
-  rtx_op1 = expand_normal (op1);
-  rtx_op2 = expand_normal (op2);
-
-  mask = force_reg (mask_mode, mask);
-  rtx_op1 = force_reg (GET_MODE (rtx_op1), rtx_op1);
-
-  create_output_operand (&ops[0], target, mode);
-  create_input_operand (&ops[1], rtx_op1, mode);
-  create_input_operand (&ops[2], rtx_op2, mode);
-  create_input_operand (&ops[3], mask, mask_mode);
-  expand_insn (icode, 4, ops);
-
-  return ops[0].value;
-}
-
-/* Generate insns for a VEC_COND_EXPR, given its TYPE and its
-   three operands.  */
-
-rtx
-expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
-		      rtx target)
-{
-  class expand_operand ops[6];
-  enum insn_code icode;
-  rtx comparison, rtx_op1, rtx_op2;
-  machine_mode mode = TYPE_MODE (vec_cond_type);
-  machine_mode cmp_op_mode;
-  bool unsignedp;
-  tree op0a, op0b;
-  enum tree_code tcode;
-
-  if (COMPARISON_CLASS_P (op0))
-    {
-      op0a = TREE_OPERAND (op0, 0);
-      op0b = TREE_OPERAND (op0, 1);
-      tcode = TREE_CODE (op0);
-    }
-  else
-    {
-      gcc_assert (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (op0)));
-      if (get_vcond_mask_icode (mode, TYPE_MODE (TREE_TYPE (op0)))
-	  != CODE_FOR_nothing)
-	return expand_vec_cond_mask_expr (vec_cond_type, op0, op1,
-					  op2, target);
-      /* Fake op0 < 0.  */
-      else
-	{
-	  gcc_assert (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (op0)))
-		      == MODE_VECTOR_INT);
-	  op0a = op0;
-	  op0b = build_zero_cst (TREE_TYPE (op0));
-	  tcode = LT_EXPR;
-	}
-    }
-  cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
-  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
-
-
-  gcc_assert (known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (cmp_op_mode))
-	      && known_eq (GET_MODE_NUNITS (mode),
-			   GET_MODE_NUNITS (cmp_op_mode)));
-
-  icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);
-  if (icode == CODE_FOR_nothing)
-    {
-      if (tcode == LT_EXPR
-	  && op0a == op0
-	  && TREE_CODE (op0) == VECTOR_CST)
-	{
-	  /* A VEC_COND_EXPR condition could be folded from EQ_EXPR/NE_EXPR
-	     into a constant when only get_vcond_eq_icode is supported.
-	     Verify < 0 and != 0 behave the same and change it to NE_EXPR.  */
-	  unsigned HOST_WIDE_INT nelts;
-	  if (!VECTOR_CST_NELTS (op0).is_constant (&nelts))
-	    {
-	      if (VECTOR_CST_STEPPED_P (op0))
-		return 0;
-	      nelts = vector_cst_encoded_nelts (op0);
-	    }
-	  for (unsigned int i = 0; i < nelts; ++i)
-	    if (tree_int_cst_sgn (vector_cst_elt (op0, i)) == 1)
-	      return 0;
-	  tcode = NE_EXPR;
-	}
-      if (tcode == EQ_EXPR || tcode == NE_EXPR)
-	icode = get_vcond_eq_icode (mode, cmp_op_mode);
-      if (icode == CODE_FOR_nothing)
-	return 0;
-    }
-
-  comparison = vector_compare_rtx (VOIDmode, tcode, op0a, op0b, unsignedp,
-				   icode, 4);
-  rtx_op1 = expand_normal (op1);
-  rtx_op2 = expand_normal (op2);
-
-  create_output_operand (&ops[0], target, mode);
-  create_input_operand (&ops[1], rtx_op1, mode);
-  create_input_operand (&ops[2], rtx_op2, mode);
-  create_fixed_operand (&ops[3], comparison);
-  create_fixed_operand (&ops[4], XEXP (comparison, 0));
-  create_fixed_operand (&ops[5], XEXP (comparison, 1));
-  expand_insn (icode, 6, ops);
-  return ops[0].value;
-}
-
 /* Generate VEC_SERIES_EXPR <OP0, OP1>, returning a value of mode VMODE.
    Use TARGET for the result if nonnull and convenient.  */
 
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 5bd19503a0a..7c2ec257cb0 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -321,9 +321,6 @@  extern rtx expand_vec_perm_const (machine_mode, rtx, rtx,
 /* Generate code for vector comparison.  */
 extern rtx expand_vec_cmp_expr (tree, tree, rtx);
 
-/* Generate code for VEC_COND_EXPR.  */
-extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
-
 /* Generate code for VEC_SERIES_EXPR.  */
 extern rtx expand_vec_series_expr (machine_mode, rtx, rtx, rtx);
 
@@ -364,5 +361,9 @@  extern void expand_jump_insn (enum insn_code icode, unsigned int nops,
 			      class expand_operand *ops);
 
 extern enum rtx_code get_rtx_code (enum tree_code tcode, bool unsignedp);
+extern rtx vector_compare_rtx (machine_mode cmp_mode, enum tree_code tcode,
+			       tree t_op0, tree t_op1, bool unsignedp,
+			       enum insn_code icode, unsigned int opno);
+
 
 #endif /* GCC_OPTABS_H */
diff --git a/gcc/passes.def b/gcc/passes.def
index 2bf2cb78fc5..d654e5ee9fe 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -397,6 +397,7 @@  along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_cleanup_eh);
   NEXT_PASS (pass_lower_resx);
   NEXT_PASS (pass_nrv);
+  NEXT_PASS (pass_gimple_isel);
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
   NEXT_PASS (pass_warn_function_noreturn);
   NEXT_PASS (pass_gen_hsail);
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index f7b817d94e6..7154f436bb8 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4197,6 +4197,14 @@  verify_gimple_assign_ternary (gassign *stmt)
 	  debug_generic_expr (rhs1_type);
 	  return true;
 	}
+      else if (cfun->curr_properties & PROP_gimple_lvec
+	       && TREE_CODE_CLASS (TREE_CODE (rhs1)) == tcc_comparison)
+	{
+	  error ("the first argument of %<VEC_COND_EXPR%> cannot be "
+		 "a %<GENERIC%> tree comparison expression");
+	  debug_generic_expr (rhs1);
+	  return true;
+	}
       /* Fallthrough.  */
     case COND_EXPR:
       if (!is_gimple_val (rhs1)
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index a1207a20a3c..490bc9702be 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -625,6 +625,7 @@  extern gimple_opt_pass *make_pass_local_fn_summary (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vaarg (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_gimple_isel (gcc::context *ctxt);
 
 /* Current optimization pass.  */
 extern opt_pass *current_pass;
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index 234c1f7dd7d..ce8537a58a7 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -3057,6 +3057,12 @@  pass_forwprop::execute (function *fun)
 		    if (code == COND_EXPR
 			|| code == VEC_COND_EXPR)
 		      {
+			/* Do not propagate into VEC_COND_EXPRs after the
+			   vector lowering pass.  */
+			if (code == VEC_COND_EXPR
+			    && (fun->curr_properties & PROP_gimple_lvec))
+			  break;
+
 			/* In this case the entire COND_EXPR is in rhs1. */
 			if (forward_propagate_into_cond (&gsi))
 			  {
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index 2f6fd5e980c..587faf7eb6e 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -691,12 +691,14 @@  expand_vector_divmod (gimple_stmt_iterator *gsi, tree type, tree op0,
 	  if (addend == NULL_TREE
 	      && expand_vec_cond_expr_p (type, type, LT_EXPR))
 	    {
-	      tree zero, cst, cond, mask_type;
-	      gimple *stmt;
+	      tree zero, cst, mask_type, mask;
+	      gimple *stmt, *cond;
 
 	      mask_type = truth_type_for (type);
 	      zero = build_zero_cst (type);
-	      cond = build2 (LT_EXPR, mask_type, op0, zero);
+	      mask = make_ssa_name (mask_type);
+	      cond = gimple_build_assign (mask, LT_EXPR, op0, zero);
+	      gsi_insert_before (gsi, cond, GSI_SAME_STMT);
 	      tree_vector_builder vec (type, nunits, 1);
 	      for (i = 0; i < nunits; i++)
 		vec.quick_push (build_int_cst (TREE_TYPE (type),
@@ -704,8 +706,8 @@  expand_vector_divmod (gimple_stmt_iterator *gsi, tree type, tree op0,
 						<< shifts[i]) - 1));
 	      cst = vec.build ();
 	      addend = make_ssa_name (type);
-	      stmt = gimple_build_assign (addend, VEC_COND_EXPR, cond,
-					  cst, zero);
+	      stmt
+		= gimple_build_assign (addend, VEC_COND_EXPR, mask, cst, zero);
 	      gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
 	    }
 	}
@@ -944,7 +946,17 @@  expand_vector_condition (gimple_stmt_iterator *gsi)
     }
 
   if (expand_vec_cond_expr_p (type, TREE_TYPE (a1), TREE_CODE (a)))
-    return;
+    {
+      if (a_is_comparison)
+	{
+	  a = gimplify_build2 (gsi, TREE_CODE (a), TREE_TYPE (a), a1, a2);
+	  gimple_assign_set_rhs1 (stmt, a);
+	  update_stmt (stmt);
+	  return;
+	}
+      gcc_assert (TREE_CODE (a) == SSA_NAME || TREE_CODE (a) == VECTOR_CST);
+      return;
+    }
 
   /* Handle vector boolean types with bitmasks.  If there is a comparison
      and we can expand the comparison into the vector boolean bitmask,
@@ -2224,6 +2236,165 @@  expand_vector_operations (void)
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }
 
+/* Expand all VEC_COND_EXPR gimple assignments into calls to an internal
+   function based on the type of the selected expansion.  */
+
+static gimple *
+gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi)
+{
+  tree lhs, op0a = NULL_TREE, op0b = NULL_TREE;
+  enum tree_code code;
+  enum tree_code tcode;
+  machine_mode cmp_op_mode;
+  bool unsignedp;
+  enum insn_code icode;
+  imm_use_iterator imm_iter;
+
+  /* Only consider code == GIMPLE_ASSIGN.  */
+  gassign *stmt = dyn_cast<gassign *> (gsi_stmt (*gsi));
+  if (!stmt)
+    return NULL;
+
+  code = gimple_assign_rhs_code (stmt);
+  if (code != VEC_COND_EXPR)
+    return NULL;
+
+  tree op0 = gimple_assign_rhs1 (stmt);
+  tree op1 = gimple_assign_rhs2 (stmt);
+  tree op2 = gimple_assign_rhs3 (stmt);
+  lhs = gimple_assign_lhs (stmt);
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+
+  gcc_assert (!COMPARISON_CLASS_P (op0));
+  if (TREE_CODE (op0) == SSA_NAME)
+    {
+      unsigned int used_vec_cond_exprs = 0;
+      gimple *use_stmt;
+      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, op0)
+	{
+	  gassign *assign = dyn_cast<gassign *> (use_stmt);
+	  if (assign != NULL && gimple_assign_rhs_code (assign) == VEC_COND_EXPR
+	      && gimple_assign_rhs1 (assign) == op0)
+	    used_vec_cond_exprs++;
+	}
+
+      gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
+      if (def_stmt)
+	{
+	  tcode = gimple_assign_rhs_code (def_stmt);
+	  op0a = gimple_assign_rhs1 (def_stmt);
+	  op0b = gimple_assign_rhs2 (def_stmt);
+
+	  tree op0a_type = TREE_TYPE (op0a);
+	  if (used_vec_cond_exprs >= 2
+	      && (get_vcond_mask_icode (mode, TYPE_MODE (op0a_type))
+		  != CODE_FOR_nothing)
+	      && expand_vec_cmp_expr_p (op0a_type, TREE_TYPE (lhs), tcode))
+	    {
+	      /* Keep the SSA name and use vcond_mask.  */
+	      tcode = TREE_CODE (op0);
+	    }
+	}
+      else
+	tcode = TREE_CODE (op0);
+    }
+  else
+    tcode = TREE_CODE (op0);
+
+  if (TREE_CODE_CLASS (tcode) != tcc_comparison)
+    {
+      gcc_assert (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (op0)));
+      if (get_vcond_mask_icode (mode, TYPE_MODE (TREE_TYPE (op0)))
+	  != CODE_FOR_nothing)
+	return gimple_build_call_internal (IFN_VCOND_MASK, 3, op0, op1, op2);
+      /* Fake op0 < 0.  */
+      else
+	{
+	  gcc_assert (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (op0)))
+		      == MODE_VECTOR_INT);
+	  op0a = op0;
+	  op0b = build_zero_cst (TREE_TYPE (op0));
+	  tcode = LT_EXPR;
+	}
+    }
+  cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
+  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
+
+
+  gcc_assert (known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (cmp_op_mode))
+	      && known_eq (GET_MODE_NUNITS (mode),
+			   GET_MODE_NUNITS (cmp_op_mode)));
+
+  icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);
+  if (icode == CODE_FOR_nothing)
+    {
+      if (tcode == LT_EXPR
+	  && op0a == op0
+	  && TREE_CODE (op0) == VECTOR_CST)
+	{
+	  /* A VEC_COND_EXPR condition could be folded from EQ_EXPR/NE_EXPR
+	     into a constant when only get_vcond_eq_icode is supported.
+	     Verify < 0 and != 0 behave the same and change it to NE_EXPR.  */
+	  unsigned HOST_WIDE_INT nelts;
+	  if (!VECTOR_CST_NELTS (op0).is_constant (&nelts))
+	    {
+	      if (VECTOR_CST_STEPPED_P (op0))
+		gcc_unreachable ();
+	      nelts = vector_cst_encoded_nelts (op0);
+	    }
+	  for (unsigned int i = 0; i < nelts; ++i)
+	    if (tree_int_cst_sgn (vector_cst_elt (op0, i)) == 1)
+	      gcc_unreachable ();
+	  tcode = NE_EXPR;
+	}
+      if (tcode == EQ_EXPR || tcode == NE_EXPR)
+	{
+	  tree tcode_tree = build_int_cst (integer_type_node, tcode);
+	  return gimple_build_call_internal (IFN_VCONDEQ, 5, op0a, op0b, op1,
+					     op2, tcode_tree);
+	}
+    }
+
+  gcc_assert (icode != CODE_FOR_nothing);
+  tree tcode_tree = build_int_cst (integer_type_node, tcode);
+  return gimple_build_call_internal (unsignedp ? IFN_VCONDU : IFN_VCOND,
+				     5, op0a, op0b, op1, op2, tcode_tree);
+}
+
+/* Iterate all gimple statements and try to expand
+   VEC_COND_EXPR assignments.  */
+
+static unsigned int
+gimple_expand_vec_cond_exprs (void)
+{
+  gimple_stmt_iterator gsi;
+  basic_block bb;
+  bool cfg_changed = false;
+
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple *g = gimple_expand_vec_cond_expr (&gsi);
+	  if (g != NULL)
+	    {
+	      tree lhs = gimple_assign_lhs (gsi_stmt (gsi));
+	      gimple_set_lhs (g, lhs);
+	      gsi_replace (&gsi, g, false);
+	    }
+	  /* ???  If we do not cleanup EH then we will ICE in
+	     verification.  But in reality we have created wrong-code
+	     as we did not properly transition EH info and edges to
+	     the piecewise computations.  */
+	  if (maybe_clean_eh_stmt (gsi_stmt (gsi))
+	      && gimple_purge_dead_eh_edges (bb))
+	    cfg_changed = true;
+	}
+    }
+
+  return cfg_changed ? TODO_cleanup_cfg : 0;
+}
+
 namespace {
 
 const pass_data pass_data_lower_vector =
@@ -2307,4 +2478,47 @@  make_pass_lower_vector_ssa (gcc::context *ctxt)
   return new pass_lower_vector_ssa (ctxt);
 }
 
+namespace {
+
+const pass_data pass_data_gimple_isel =
+{
+  GIMPLE_PASS, /* type */
+  "isel", /* name */
+  OPTGROUP_VEC, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_cfg, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */
+};
+
+class pass_gimple_isel : public gimple_opt_pass
+{
+public:
+  pass_gimple_isel (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_gimple_isel, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return true;
+    }
+
+  virtual unsigned int execute (function *)
+    {
+      return gimple_expand_vec_cond_exprs ();
+    }
+
+}; // class pass_gimple_isel
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_gimple_isel (gcc::context *ctxt)
+{
+  return new pass_gimple_isel (ctxt);
+}
+
 #include "gt-tree-vect-generic.h"
-- 
2.26.0
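
(As a usage illustration, a loop of the kind this series changes; a sketch, not taken from the testsuite. Compiled with -O3 on a target with vcond/vcond_mask support, the vectorizer now keeps the comparison in a separate SSA name and the new isel pass replaces the VEC_COND_EXPR with a .VCOND or .VCOND_MASK internal call, as in the dumps quoted earlier in the thread.)

void
foo (double *restrict r, const double *restrict a,
     const double *restrict b, int n)
{
  /* Vectorizes to a mask comparison feeding a VEC_COND_EXPR; after the
     isel pass the VEC_COND_EXPR becomes an IFN_VCOND/IFN_VCOND_MASK call.  */
  for (int i = 0; i < n; i++)
    r[i] = a[i] > b[i] ? -1.0 : 0.0;
}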