Replace FMA_EXPR with one internal fn per optab

Message ID 87po22p10d.fsf@linaro.org
State New

Commit Message

Richard Sandiford May 11, 2018, 5:15 p.m.
There are four optabs for various forms of fused multiply-add:
fma, fms, fnma and fnms.  Of these, only fma had a direct gimple
representation.  For the other three we relied on special pattern-
matching during expand, although tree-ssa-math-opts.c did have
some code to try to second-guess what expand would do.
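
For reference, the four operations differ only in which operands are negated. The following is a minimal standalone C++ sketch of those semantics, not GCC code: the `ref_*` names are hypothetical and exist only for illustration. The sign conventions follow the constant folding that this patch adds for CFN_FMS, CFN_FNMA and CFN_FNMS, which negates the relevant arguments and then performs a plain fma.

```cpp
#include <cmath>

// Reference semantics of the four fused multiply-add optabs.  Each one is
// written as a single std::fma call, so the result is rounded only once.
// The ref_* names are illustrative only; they are not GCC interfaces.

//  a * b + c
double ref_fma(double a, double b, double c) { return std::fma(a, b, c); }

//  a * b - c   (negated addend)
double ref_fms(double a, double b, double c) { return std::fma(a, b, -c); }

// -(a * b) + c (negated product)
double ref_fnma(double a, double b, double c) { return std::fma(-a, b, c); }

// -(a * b) - c (negated product and negated addend)
double ref_fnms(double a, double b, double c) { return std::fma(-a, b, -c); }
```

With a = 2, b = 3 and c = 1 all intermediate values are exactly representable, so the four functions give 7, 5, -5 and -7 respectively.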

This patch removes the old FMA_EXPR representation of fma and
introduces four new internal functions, one for each optab.
IFN_FMA is tied to BUILT_IN_FMA* while the other three are
independent directly-mapped internal functions.  It's then
possible to do the pattern-matching in match.pd, so that
tree-ssa-math-opts.c (via folding) can select the exact
FMA-based operation.
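
The selection itself amounts to a small decision table on which inputs are negated. The sketch below models that table in standalone C++, in the same priority order as the expand-time code that the patch removes from expr.c. The enum and function names are hypothetical, and the sketch deliberately leaves out the target-support checks that the real code performs before choosing an operation.

```cpp
// Hypothetical model of choosing between the four fused operations for an
// expression of the form (+/-)(a * b) (+/-) c.  Not GCC code.
enum class fused_op { FMA, FMS, FNMA, FNMS };

// negate_product: one operand of the multiplication is fed by a negate.
// negate_addend:  the addend is fed by a negate (that is, it is subtracted).
fused_op select_fused_op(bool negate_product, bool negate_addend)
{
  if (negate_product && negate_addend)
    return fused_op::FNMS;   // -(a * b) - c
  if (negate_product)
    return fused_op::FNMA;   // -(a * b) + c
  if (negate_addend)
    return fused_op::FMS;    //  a * b - c
  return fused_op::FMA;      //  a * b + c
}
```

In the patch this choice is expressed as folds rather than as an explicit function; the table form is used here only to make the mapping easy to read.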

The patch removes the gimple FE support for __FMA rather than mapping
it to the internal function.  There's no reason now to treat it
differently from other internal functions (although the FE doesn't
handle those yet).

The BRIG & HSA parts are a best guess, but seem relatively simple.

The genmatch.c changes are structured to allow ternary ops in which
the second two rather than the first two operands are commutative.
A later patch makes use of this.
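
As a rough illustration of the idea (not the genmatch.c implementation), the sketch below enumerates the operand orders to try for a ternary operation, given the index of the first of its two commutative operands. Everything here is an assumption made for the example: the `operands` type, the `commuted_variants` name, and the use of a negative index to mean "not commutative".

```cpp
#include <array>
#include <utility>
#include <vector>

// Three operands of a ternary operation, represented as plain integers.
using operands = std::array<int, 3>;

// Return the operand orders to try when operands FIRST and FIRST + 1 can be
// swapped without changing the result.  A negative FIRST means that no pair
// of operands is commutative, so only the original order is returned.
std::vector<operands> commuted_variants(const operands &ops, int first)
{
  std::vector<operands> result{ops};
  if (first >= 0 && first + 1 < 3)
    {
      operands swapped = ops;
      std::swap(swapped[first], swapped[first + 1]);
      result.push_back(swapped);
    }
  return result;
}
```

With `first` equal to 0 this gives the usual "first two operands commute" behaviour (as for a*b in a fused multiply-add); with `first` equal to 1 it swaps the second and third operands instead.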

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf,
x86_64-linux-gnu and powerpc64le-linux-gnu.  OK to install?

Richard


2018-05-11  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* doc/sourcebuild.texi (all_scalar_fma): Document.
	* tree.def (FMA_EXPR): Delete.
	* internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions.
	* internal-fn.c (ternary_direct): New macro.
	(expand_ternary_optab_fn): Likewise.
	(direct_ternary_optab_supported_p): Likewise.
	* Makefile.in (build/genmatch.o): Depend on case-cfn-macros.h.
	* builtins.c (fold_builtin_fma): Delete.
	(fold_builtin_3): Don't call it.
	* cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling.
	* expr.c (expand_expr_real_2): Likewise.
	* fold-const.c (operand_equal_p): Likewise.
	(fold_ternary_loc): Likewise.
	* gimple-pretty-print.c (dump_ternary_rhs): Likewise.
	* gimple.c (DEFTREECODE): Likewise.
	* gimplify.c (gimplify_expr): Likewise.
	* optabs-tree.c (optab_for_tree_code): Likewise.
	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
	* tree-eh.c (operation_could_trap_p): Likewise.
	(stmt_could_throw_1_p): Likewise.
	* tree-inline.c (estimate_operator_cost): Likewise.
	* tree-pretty-print.c (dump_generic_node): Likewise.
	(op_code_prio): Likewise.
	* tree-ssa-loop-im.c (stmt_cost): Likewise.
	* tree-ssa-operands.c (get_expr_operands): Likewise.
	* tree.c (commutative_ternary_tree_code, add_expr): Likewise.
	* fold-const-call.h (fold_fma): Delete.
	* fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS,
	CFN_FNMA and CFN_FNMS.
	(fold_fma): Delete.
	* genmatch.c (combined_fn): New enum.
	(commutative_ternary_tree_code): Remove FMA_EXPR handling.
	(commutative_op): New function.
	(commutate): Use it.  Handle more than 2 operands.
	(dt_operand::gen_gimple_expr): Use commutative_op.
	(parser::parse_expr): Allow :c to be used with non-binary
	operators if the commutative operand is known.
	* gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle
	CFN_FMS, CFN_FNMA and CFN_FNMS.
	(backprop::process_assign_use): Remove FMA_EXPR handling.
	* hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise.
	(gen_hsa_fma): New function.
	(gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS,
	IFN_FNMA and IFN_FNMS.
	* match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS.
	* tree-ssa-math-opts.c (aggressive_valueize): New function.
	(convert_mult_to_fma_1): Use the gimple_build interface and use
	aggressive_valueize to fold the result.
	(convert_mult_to_fma): Use direct_internal_fn_supported_p
	instead of checking for optabs directly.
	* config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls
	rather than FMA_EXPRs.
	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a
	call to IFN_FMA instead of an FMA_EXPR.

gcc/brig/
	* brigfrontend/brig-function.cc
	(brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA
	for BRIG_OPCODE_FMA.
	(brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA
	as a call.

gcc/c/
	* gimple-parser.c (c_parser_gimple_postfix_expression): Remove
	__FMA_EXPR handling.

gcc/cp/
	* constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling.
	(potential_constant_expression_1): Likewise.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_all_scalar_fma):
	New proc.
	* gcc.dg/fma-1.c: New test.
	* gcc.dg/fma-2.c: Likewise.
	* gcc.dg/fma-3.c: Likewise.
	* gcc.dg/fma-4.c: Likewise.
	* gcc.dg/fma-5.c: Likewise.
	* gcc.dg/fma-6.c: Likewise.
	* gcc.dg/fma-7.c: Likewise.
	* gcc.dg/gimplefe-26.c: Remove.
	* gfortran.dg/reassoc_7.f: Pass -ffp-contract=off.
	* gfortran.dg/reassoc_8.f: Likewise.
	* gfortran.dg/reassoc_9.f: Likewise.
	* gfortran.dg/reassoc_10.f: Likewise.

Comments

Andrew Pinski May 11, 2018, 5:32 p.m. | #1
On Fri, May 11, 2018 at 10:15 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> There are four optabs for various forms of fused multiply-add:
> fma, fms, fnma and fnms.  Of these, only fma had a direct gimple
> representation.  For the other three we relied on special pattern-
> matching during expand, although tree-ssa-math-opts.c did have
> some code to try to second-guess what expand would do.
>
> This patch removes the old FMA_EXPR representation of fma and
> introduces four new internal functions, one for each optab.
> IFN_FMA is tied to BUILT_IN_FMA* while the other three are
> independent directly-mapped internal functions.  It's then
> possible to do the pattern-matching in match.pd and
> tree-ssa-math-opts.c (via folding) can select the exact
> FMA-based operation.
>
> The patch removes the gimple FE support for __FMA rather than mapping
> it to the internal function.  There's no reason now to treat it
> differently from other internal functions (although the FE doesn't
> handle those yet).
>
> The BRIG & HSA parts are a best guess, but seem relatively simple.
>
> The genmatch.c changes are structured to allow ternary ops in which
> the second two rather than the first two operands are commutative.
> A later patch makes use of this.
>
> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf,
> x86_64-linux-gnu and powerpc64le-linux-gnu.  OK to install?


I think there might be an issue with long double and __float128
support here (for both PowerPC and x86_64).  Please add testcases for
those to show they are not problematic.
What about half float on the aarch64 case?  Is that handled correctly?
I did not see a testcase for that case either.

Thanks,
Andrew

>
> Richard
>
>
> 2018-05-11  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * doc/sourcebuild.texi (all_scalar_fma): Document.
>         * tree.def (FMA_EXPR): Delete.
>         * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions.
>         * internal-fn.c (ternary_direct): New macro.
>         (expand_ternary_optab_fn): Likewise.
>         (direct_ternary_optab_supported_p): Likewise.
>         * Makefile.in (build/genmatch.o): Depend on case-cfn-macros.h.
>         * builtins.c (fold_builtin_fma): Delete.
>         (fold_builtin_3): Don't call it.
>         * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling.
>         * expr.c (expand_expr_real_2): Likewise.
>         * fold-const.c (operand_equal_p): Likewise.
>         (fold_ternary_loc): Likewise.
>         * gimple-pretty-print.c (dump_ternary_rhs): Likewise.
>         * gimple.c (DEFTREECODE): Likewise.
>         * gimplify.c (gimplify_expr): Likewise.
>         * optabs-tree.c (optab_for_tree_code): Likewise.
>         * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
>         * tree-eh.c (operation_could_trap_p): Likewise.
>         (stmt_could_throw_1_p): Likewise.
>         * tree-inline.c (estimate_operator_cost): Likewise.
>         * tree-pretty-print.c (dump_generic_node): Likewise.
>         (op_code_prio): Likewise.
>         * tree-ssa-loop-im.c (stmt_cost): Likewise.
>         * tree-ssa-operands.c (get_expr_operands): Likewise.
>         * tree.c (commutative_ternary_tree_code, add_expr): Likewise.
>         * fold-const-call.h (fold_fma): Delete.
>         * fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS,
>         CFN_FNMA and CFN_FNMS.
>         (fold_fma): Delete.
>         * genmatch.c (combined_fn): New enum.
>         (commutative_ternary_tree_code): Remove FMA_EXPR handling.
>         (commutative_op): New function.
>         (commutate): Use it.  Handle more than 2 operands.
>         (dt_operand::gen_gimple_expr): Use commutative_op.
>         (parser::parse_expr): Allow :c to be used with non-binary
>         operators if the commutative operand is known.
>         * gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle
>         CFN_FMS, CFN_FNMA and CFN_FNMS.
>         (backprop::process_assign_use): Remove FMA_EXPR handling.
>         * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise.
>         (gen_hsa_fma): New function.
>         (gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS,
>         IFN_FNMA and IFN_FNMS.
>         * match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS.
>         * tree-ssa-math-opts.c (aggressive_valueize): New function.
>         (convert_mult_to_fma_1): Use the gimple_build interface and use
>         aggressive_valueize to fold the result.
>         (convert_mult_to_fma): Use direct_internal_fn_supported_p
>         instead of checking for optabs directly.
>         * config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls
>         rather than FMA_EXPRs.
>         * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a
>         call to IFN_FMA instead of an FMA_EXPR.
>
> gcc/brig/
>         * brigfrontend/brig-function.cc
>         (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA
>         for BRIG_OPCODE_FMA.
>         (brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA
>         as a call.
>
> gcc/c/
>         * gimple-parser.c (c_parser_gimple_postfix_expression): Remove
>         __FMA_EXPR handling.
>
> gcc/cp/
>         * constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling.
>         (potential_constant_expression_1): Likewise.
>
> gcc/testsuite/
>         * lib/target-supports.exp (check_effective_target_all_scalar_fma):
>         New proc.
>         * gcc.dg/fma-1.c: New test.
>         * gcc.dg/fma-2.c: Likewise.
>         * gcc.dg/fma-3.c: Likewise.
>         * gcc.dg/fma-4.c: Likewise.
>         * gcc.dg/fma-5.c: Likewise.
>         * gcc.dg/fma-6.c: Likewise.
>         * gcc.dg/fma-7.c: Likewise.
>         * gcc.dg/gimplefe-26.c: Remove.
>         * gfortran.dg/reassoc_7.f: Pass -ffp-contract=off.
>         * gfortran.dg/reassoc_8.f: Likewise.
>         * gfortran.dg/reassoc_9.f: Likewise.
>         * gfortran.dg/reassoc_10.f: Likewise.
>
> Index: gcc/doc/sourcebuild.texi
> ===================================================================
> --- gcc/doc/sourcebuild.texi    2018-05-08 09:42:01.479723260 +0100
> +++ gcc/doc/sourcebuild.texi    2018-05-11 18:08:24.833947453 +0100
> @@ -2170,6 +2170,11 @@ Target supports wide characters.
>  @subsubsection Other attributes
>
>  @table @code
> +@item all_scalar_fma
> +Target supports all four fused multiply-add optabs for both @code{float}
> +and @code{double}.  These optabs are: @code{fma_optab}, @code{fms_optab},
> +@code{fnma_optab} and @code{fnms_optab}.
> +
>  @item automatic_stack_alignment
>  Target supports automatic stack alignment.
>
> Index: gcc/tree.def
> ===================================================================
> --- gcc/tree.def        2018-01-03 11:12:58.606649676 +0000
> +++ gcc/tree.def        2018-05-11 18:08:24.852946671 +0100
> @@ -1345,12 +1345,6 @@ DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "wid
>     by the second argument.  */
>  DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2)
>
> -/* Fused multiply-add.
> -   All operands and the result are of the same type.  No intermediate
> -   rounding is performed after multiplying operand one with operand two
> -   before adding operand three.  */
> -DEFTREECODE (FMA_EXPR, "fma_expr", tcc_expression, 3)
> -
>  /* Widening vector multiplication.
>     The two operands are vectors with N elements of size S. Multiplying the
>     elements of the two vectors will result in N products of size 2*S.
> Index: gcc/internal-fn.def
> ===================================================================
> --- gcc/internal-fn.def 2018-02-20 09:40:41.809452604 +0000
> +++ gcc/internal-fn.def 2018-05-11 18:08:24.842947083 +0100
> @@ -57,6 +57,7 @@ along with GCC; see the file COPYING3.
>
>     - unary: a normal unary optab, such as vec_reverse_<mode>
>     - binary: a normal binary optab, such as vec_interleave_lo_<mode>
> +   - ternary: a normal ternary optab, such as fma<mode>4
>
>     - cond_binary: a conditional binary optab, such as add<mode>cc
>
> @@ -138,6 +139,10 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CO
>  DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW,
>                        vec_shl_insert, binary)
>
> +DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
> +DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
> +DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
> +
>  DEF_INTERNAL_OPTAB_FN (COND_ADD, ECF_CONST, cond_add, cond_binary)
>  DEF_INTERNAL_OPTAB_FN (COND_SUB, ECF_CONST, cond_sub, cond_binary)
>  DEF_INTERNAL_SIGNED_OPTAB_FN (COND_MIN, ECF_CONST, first,
> @@ -218,6 +223,9 @@ DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS
>  /* FP scales.  */
>  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
>
> +/* Ternary math functions.  */
> +DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary)
> +
>  /* Unary integer ops.  */
>  DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary)
>  DEF_INTERNAL_INT_FN (CLZ, ECF_CONST | ECF_NOTHROW, clz, unary)
> Index: gcc/internal-fn.c
> ===================================================================
> --- gcc/internal-fn.c   2018-03-02 09:44:45.456595753 +0000
> +++ gcc/internal-fn.c   2018-05-11 18:08:24.841947124 +0100
> @@ -90,6 +90,7 @@ #define mask_store_lanes_direct { 0, 0,
>  #define scatter_store_direct { 3, 3, false }
>  #define unary_direct { 0, 0, true }
>  #define binary_direct { 0, 0, true }
> +#define ternary_direct { 0, 0, true }
>  #define cond_unary_direct { 1, 1, true }
>  #define cond_binary_direct { 1, 1, true }
>  #define while_direct { 0, 2, false }
> @@ -2962,6 +2963,9 @@ #define expand_unary_optab_fn(FN, STMT,
>  #define expand_binary_optab_fn(FN, STMT, OPTAB) \
>    expand_direct_optab_fn (FN, STMT, OPTAB, 2)
>
> +#define expand_ternary_optab_fn(FN, STMT, OPTAB) \
> +  expand_direct_optab_fn (FN, STMT, OPTAB, 3)
> +
>  #define expand_cond_unary_optab_fn(FN, STMT, OPTAB) \
>    expand_direct_optab_fn (FN, STMT, OPTAB, 2)
>
> @@ -3047,6 +3051,7 @@ multi_vector_optab_supported_p (convert_
>
>  #define direct_unary_optab_supported_p direct_optab_supported_p
>  #define direct_binary_optab_supported_p direct_optab_supported_p
> +#define direct_ternary_optab_supported_p direct_optab_supported_p
>  #define direct_cond_unary_optab_supported_p direct_optab_supported_p
>  #define direct_cond_binary_optab_supported_p direct_optab_supported_p
>  #define direct_mask_load_optab_supported_p direct_optab_supported_p
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in     2018-03-13 15:06:01.749426280 +0000
> +++ gcc/Makefile.in     2018-05-11 18:08:24.816948152 +0100
> @@ -2786,7 +2786,7 @@ build/genmddump.o : genmddump.c $(RTL_BA
>    $(CORETYPES_H) $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H)
>  build/genmatch.o : genmatch.c $(BCONFIG_H) $(SYSTEM_H) \
>    $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h \
> -  tree.def builtins.def internal-fn.def
> +  tree.def builtins.def internal-fn.def case-cfn-macros.h
>  build/gencfn-macros.o : gencfn-macros.c $(BCONFIG_H) $(SYSTEM_H)       \
>    $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-set.h builtins.def      \
>    internal-fn.def
> Index: gcc/builtins.c
> ===================================================================
> --- gcc/builtins.c      2018-05-08 09:42:01.633717606 +0100
> +++ gcc/builtins.c      2018-05-11 18:08:24.817948111 +0100
> @@ -8340,21 +8340,6 @@ fold_builtin_abs (location_t loc, tree a
>    return fold_build1_loc (loc, ABS_EXPR, type, arg);
>  }
>
> -/* Fold a call to fma, fmaf, or fmal with arguments ARG[012].  */
> -
> -static tree
> -fold_builtin_fma (location_t loc, tree arg0, tree arg1, tree arg2, tree type)
> -{
> -  /* ??? Only expand to FMA_EXPR if it's directly supported.  */
> -  if (validate_arg (arg0, REAL_TYPE)
> -      && validate_arg (arg1, REAL_TYPE)
> -      && validate_arg (arg2, REAL_TYPE)
> -      && optab_handler (fma_optab, TYPE_MODE (type)) != CODE_FOR_nothing)
> -    return fold_build3_loc (loc, FMA_EXPR, type, arg0, arg1, arg2);
> -
> -  return NULL_TREE;
> -}
> -
>  /* Fold a call to builtin carg(a+bi) -> atan2(b,a).  */
>
>  static tree
> @@ -9260,10 +9245,6 @@ fold_builtin_3 (location_t loc, tree fnd
>      CASE_FLT_FN (BUILT_IN_SINCOS):
>        return fold_builtin_sincos (loc, arg0, arg1, arg2);
>
> -    CASE_FLT_FN (BUILT_IN_FMA):
> -    CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA):
> -      return fold_builtin_fma (loc, arg0, arg1, arg2, type);
> -
>      CASE_FLT_FN (BUILT_IN_REMQUO):
>        if (validate_arg (arg0, REAL_TYPE)
>           && validate_arg (arg1, REAL_TYPE)
> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c     2018-05-09 11:34:47.190553782 +0100
> +++ gcc/cfgexpand.c     2018-05-11 18:08:24.818948070 +0100
> @@ -4202,7 +4202,6 @@ expand_debug_expr (tree exp)
>         case SAD_EXPR:
>         case WIDEN_MULT_PLUS_EXPR:
>         case WIDEN_MULT_MINUS_EXPR:
> -       case FMA_EXPR:
>           goto ternary;
>
>         case TRUTH_ANDIF_EXPR:
> @@ -5190,9 +5189,6 @@ expand_debug_expr (tree exp)
>         }
>        return NULL;
>
> -    case FMA_EXPR:
> -      return simplify_gen_ternary (FMA, mode, inner_mode, op0, op1, op2);
> -
>      default:
>      flag_unsupported:
>        if (flag_checking)
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  2018-05-01 19:30:32.099548805 +0100
> +++ gcc/expr.c  2018-05-11 18:08:24.833947453 +0100
> @@ -8853,67 +8853,6 @@ #define REDUCE_BIT_FIELD(expr)   (reduce_b
>        expand_operands (treeop0, treeop1, subtarget, &op0, &op1, EXPAND_NORMAL);
>        return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1, target, unsignedp));
>
> -    case FMA_EXPR:
> -      {
> -       optab opt = fma_optab;
> -       gimple *def0, *def2;
> -
> -       /* If there is no insn for FMA, emit it as __builtin_fma{,f,l}
> -          call.  */
> -       if (optab_handler (fma_optab, mode) == CODE_FOR_nothing)
> -         {
> -           tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA);
> -           tree call_expr;
> -
> -           gcc_assert (fn != NULL_TREE);
> -           call_expr = build_call_expr (fn, 3, treeop0, treeop1, treeop2);
> -           return expand_builtin (call_expr, target, subtarget, mode, false);
> -         }
> -
> -       def0 = get_def_for_expr (treeop0, NEGATE_EXPR);
> -       /* The multiplication is commutative - look at its 2nd operand
> -          if the first isn't fed by a negate.  */
> -       if (!def0)
> -         {
> -           def0 = get_def_for_expr (treeop1, NEGATE_EXPR);
> -           /* Swap operands if the 2nd operand is fed by a negate.  */
> -           if (def0)
> -             std::swap (treeop0, treeop1);
> -         }
> -       def2 = get_def_for_expr (treeop2, NEGATE_EXPR);
> -
> -       op0 = op2 = NULL;
> -
> -       if (def0 && def2
> -           && optab_handler (fnms_optab, mode) != CODE_FOR_nothing)
> -         {
> -           opt = fnms_optab;
> -           op0 = expand_normal (gimple_assign_rhs1 (def0));
> -           op2 = expand_normal (gimple_assign_rhs1 (def2));
> -         }
> -       else if (def0
> -                && optab_handler (fnma_optab, mode) != CODE_FOR_nothing)
> -         {
> -           opt = fnma_optab;
> -           op0 = expand_normal (gimple_assign_rhs1 (def0));
> -         }
> -       else if (def2
> -                && optab_handler (fms_optab, mode) != CODE_FOR_nothing)
> -         {
> -           opt = fms_optab;
> -           op2 = expand_normal (gimple_assign_rhs1 (def2));
> -         }
> -
> -       if (op0 == NULL)
> -         op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL);
> -       if (op2 == NULL)
> -         op2 = expand_normal (treeop2);
> -       op1 = expand_normal (treeop1);
> -
> -       return expand_ternary_op (TYPE_MODE (type), opt,
> -                                 op0, op1, op2, target, 0);
> -      }
> -
>      case MULT_EXPR:
>        /* If this is a fixed-point operation, then we cannot use the code
>          below because "expand_mult" doesn't support sat/no-sat fixed-point
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    2018-05-08 09:42:01.637717459 +0100
> +++ gcc/fold-const.c    2018-05-11 18:08:24.835947371 +0100
> @@ -3297,7 +3297,6 @@ #define OP_SAME_WITH_NULL(N)                              \
>         case TRUTH_ORIF_EXPR:
>           return OP_SAME (0) && OP_SAME (1);
>
> -       case FMA_EXPR:
>         case WIDEN_MULT_PLUS_EXPR:
>         case WIDEN_MULT_MINUS_EXPR:
>           if (!OP_SAME (2))
> @@ -11708,17 +11707,6 @@ fold_ternary_loc (location_t loc, enum t
>
>        return NULL_TREE;
>
> -    case FMA_EXPR:
> -      /* For integers we can decompose the FMA if possible.  */
> -      if (TREE_CODE (arg0) == INTEGER_CST
> -         && TREE_CODE (arg1) == INTEGER_CST)
> -       return fold_build2_loc (loc, PLUS_EXPR, type,
> -                               const_binop (MULT_EXPR, arg0, arg1), arg2);
> -      if (integer_zerop (arg2))
> -       return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);
> -
> -      return fold_fma (loc, type, arg0, arg1, arg2);
> -
>      case VEC_PERM_EXPR:
>        if (TREE_CODE (arg2) == VECTOR_CST)
>         {
> Index: gcc/gimple-pretty-print.c
> ===================================================================
> --- gcc/gimple-pretty-print.c   2018-02-13 10:28:33.325853088 +0000
> +++ gcc/gimple-pretty-print.c   2018-05-11 18:08:24.836947330 +0100
> @@ -490,27 +490,6 @@ dump_ternary_rhs (pretty_printer *buffer
>        pp_greater (buffer);
>        break;
>
> -    case FMA_EXPR:
> -      if (flags & TDF_GIMPLE)
> -       {
> -         pp_string (buffer, "__FMA (");
> -         dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
> -         pp_comma (buffer);
> -         dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
> -         pp_comma (buffer);
> -         dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
> -         pp_right_paren (buffer);
> -       }
> -      else
> -       {
> -         dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
> -         pp_string (buffer, " * ");
> -         dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
> -         pp_string (buffer, " + ");
> -         dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
> -       }
> -      break;
> -
>      case DOT_PROD_EXPR:
>        pp_string (buffer, "DOT_PROD_EXPR <");
>        dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
> Index: gcc/gimple.c
> ===================================================================
> --- gcc/gimple.c        2018-02-13 10:28:33.415861313 +0000
> +++ gcc/gimple.c        2018-05-11 18:08:24.838947247 +0100
> @@ -2143,8 +2143,7 @@ #define DEFTREECODE(SYM, STRING, TYPE, N
>        || (SYM) == REALIGN_LOAD_EXPR                                        \
>        || (SYM) == VEC_COND_EXPR                                                    \
>        || (SYM) == VEC_PERM_EXPR                                             \
> -      || (SYM) == BIT_INSERT_EXPR                                          \
> -      || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS                           \
> +      || (SYM) == BIT_INSERT_EXPR) ? GIMPLE_TERNARY_RHS                            \
>     : ((SYM) == CONSTRUCTOR                                                 \
>        || (SYM) == OBJ_TYPE_REF                                             \
>        || (SYM) == ASSERT_EXPR                                              \
> Index: gcc/gimplify.c
> ===================================================================
> --- gcc/gimplify.c      2018-05-08 09:42:02.972668452 +0100
> +++ gcc/gimplify.c      2018-05-11 18:08:24.840947165 +0100
> @@ -12086,7 +12086,6 @@ gimplify_expr (tree *expr_p, gimple_seq
>           }
>           break;
>
> -       case FMA_EXPR:
>         case VEC_PERM_EXPR:
>           /* Classified as tcc_expression.  */
>           goto expr_3;
> Index: gcc/optabs-tree.c
> ===================================================================
> --- gcc/optabs-tree.c   2018-01-13 18:01:26.108685820 +0000
> +++ gcc/optabs-tree.c   2018-05-11 18:08:24.842947083 +0100
> @@ -143,9 +143,6 @@ optab_for_tree_code (enum tree_code code
>               : (TYPE_SATURATING (type)
>                  ? ssmsub_widen_optab : smsub_widen_optab));
>
> -    case FMA_EXPR:
> -      return fma_optab;
> -
>      case VEC_WIDEN_MULT_HI_EXPR:
>        return TYPE_UNSIGNED (type) ?
>         vec_widen_umult_hi_optab : vec_widen_smult_hi_optab;
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c      2018-05-01 19:31:03.079312535 +0100
> +++ gcc/tree-cfg.c      2018-05-11 18:08:24.848946836 +0100
> @@ -4109,20 +4109,6 @@ verify_gimple_assign_ternary (gassign *s
>         }
>        break;
>
> -    case FMA_EXPR:
> -      if (!useless_type_conversion_p (lhs_type, rhs1_type)
> -         || !useless_type_conversion_p (lhs_type, rhs2_type)
> -         || !useless_type_conversion_p (lhs_type, rhs3_type))
> -       {
> -         error ("type mismatch in fused multiply-add expression");
> -         debug_generic_expr (lhs_type);
> -         debug_generic_expr (rhs1_type);
> -         debug_generic_expr (rhs2_type);
> -         debug_generic_expr (rhs3_type);
> -         return true;
> -       }
> -      break;
> -
>      case VEC_COND_EXPR:
>        if (!VECTOR_BOOLEAN_TYPE_P (rhs1_type)
>           || maybe_ne (TYPE_VECTOR_SUBPARTS (rhs1_type),
> Index: gcc/tree-eh.c
> ===================================================================
> --- gcc/tree-eh.c       2018-02-08 13:34:20.321293427 +0000
> +++ gcc/tree-eh.c       2018-05-11 18:08:24.849946795 +0100
> @@ -2512,8 +2512,7 @@ operation_could_trap_p (enum tree_code o
>
>    if (TREE_CODE_CLASS (op) != tcc_comparison
>        && TREE_CODE_CLASS (op) != tcc_unary
> -      && TREE_CODE_CLASS (op) != tcc_binary
> -      && op != FMA_EXPR)
> +      && TREE_CODE_CLASS (op) != tcc_binary)
>      return false;
>
>    return operation_could_trap_helper_p (op, fp_operation, honor_trapv,
> @@ -2825,8 +2824,7 @@ stmt_could_throw_1_p (gassign *stmt)
>
>    if (TREE_CODE_CLASS (code) == tcc_comparison
>        || TREE_CODE_CLASS (code) == tcc_unary
> -      || TREE_CODE_CLASS (code) == tcc_binary
> -      || code == FMA_EXPR)
> +      || TREE_CODE_CLASS (code) == tcc_binary)
>      {
>        if (TREE_CODE_CLASS (code) == tcc_comparison)
>         t = TREE_TYPE (gimple_assign_rhs1 (stmt));
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c   2018-03-13 15:06:01.738427167 +0000
> +++ gcc/tree-inline.c   2018-05-11 18:08:24.849946795 +0100
> @@ -3855,7 +3855,6 @@ estimate_operator_cost (enum tree_code c
>      case MINUS_EXPR:
>      case MULT_EXPR:
>      case MULT_HIGHPART_EXPR:
> -    case FMA_EXPR:
>
>      case ADDR_SPACE_CONVERT_EXPR:
>      case FIXED_CONVERT_EXPR:
> Index: gcc/tree-pretty-print.c
> ===================================================================
> --- gcc/tree-pretty-print.c     2018-03-13 15:06:00.523525161 +0000
> +++ gcc/tree-pretty-print.c     2018-05-11 18:08:24.850946754 +0100
> @@ -2901,16 +2901,6 @@ dump_generic_node (pretty_printer *pp, t
>        pp_string (pp, " > ");
>        break;
>
> -    case FMA_EXPR:
> -      pp_string (pp, " FMA_EXPR < ");
> -      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> -      pp_string (pp, ", ");
> -      dump_generic_node (pp, TREE_OPERAND (node, 1), spc, flags, false);
> -      pp_string (pp, ", ");
> -      dump_generic_node (pp, TREE_OPERAND (node, 2), spc, flags, false);
> -      pp_string (pp, " > ");
> -      break;
> -
>      case OACC_PARALLEL:
>        pp_string (pp, "#pragma acc parallel");
>        goto dump_omp_clauses_body;
> @@ -3549,7 +3539,6 @@ op_code_prio (enum tree_code code)
>      case CEIL_MOD_EXPR:
>      case FLOOR_MOD_EXPR:
>      case ROUND_MOD_EXPR:
> -    case FMA_EXPR:
>        return 13;
>
>      case TRUTH_NOT_EXPR:
> Index: gcc/tree-ssa-loop-im.c
> ===================================================================
> --- gcc/tree-ssa-loop-im.c      2018-03-13 15:06:00.521525322 +0000
> +++ gcc/tree-ssa-loop-im.c      2018-05-11 18:08:24.850946754 +0100
> @@ -493,7 +493,6 @@ stmt_cost (gimple *stmt)
>      case WIDEN_MULT_PLUS_EXPR:
>      case WIDEN_MULT_MINUS_EXPR:
>      case DOT_PROD_EXPR:
> -    case FMA_EXPR:
>      case TRUNC_DIV_EXPR:
>      case CEIL_DIV_EXPR:
>      case FLOOR_DIV_EXPR:
> Index: gcc/tree-ssa-operands.c
> ===================================================================
> --- gcc/tree-ssa-operands.c     2018-01-03 11:12:59.133628905 +0000
> +++ gcc/tree-ssa-operands.c     2018-05-11 18:08:24.851946713 +0100
> @@ -849,7 +849,6 @@ get_expr_operands (struct function *fn,
>      case REALIGN_LOAD_EXPR:
>      case WIDEN_MULT_PLUS_EXPR:
>      case WIDEN_MULT_MINUS_EXPR:
> -    case FMA_EXPR:
>        {
>         get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 0), flags);
>         get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 1), flags);
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c  2018-05-08 09:42:01.631717680 +0100
> +++ gcc/tree.c  2018-05-11 18:08:24.852946671 +0100
> @@ -7171,7 +7171,6 @@ commutative_ternary_tree_code (enum tree
>      case WIDEN_MULT_PLUS_EXPR:
>      case WIDEN_MULT_MINUS_EXPR:
>      case DOT_PROD_EXPR:
> -    case FMA_EXPR:
>        return true;
>
>      default:
> @@ -7457,7 +7456,6 @@ add_expr (const_tree t, inchash::hash &h
>               flags &= ~OEP_ADDRESS_OF;
>               break;
>
> -           case FMA_EXPR:
>             case WIDEN_MULT_PLUS_EXPR:
>             case WIDEN_MULT_MINUS_EXPR:
>               {
> Index: gcc/fold-const-call.h
> ===================================================================
> --- gcc/fold-const-call.h       2018-01-03 11:12:58.150667646 +0000
> +++ gcc/fold-const-call.h       2018-05-11 18:08:24.833947453 +0100
> @@ -23,7 +23,6 @@ #define GCC_FOLD_CONST_CALL_H
>  tree fold_const_call (combined_fn, tree, tree);
>  tree fold_const_call (combined_fn, tree, tree, tree);
>  tree fold_const_call (combined_fn, tree, tree, tree, tree);
> -tree fold_fma (location_t, tree, tree, tree, tree);
>  tree build_cmp_result (tree type, int res);
>
>  #endif
> Index: gcc/fold-const-call.c
> ===================================================================
> --- gcc/fold-const-call.c       2018-05-08 09:42:01.844709861 +0100
> +++ gcc/fold-const-call.c       2018-05-11 18:08:24.833947453 +0100
> @@ -1606,6 +1606,26 @@ fold_const_call_ssss (real_value *result
>      CASE_CFN_FMA_FN:
>        return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, arg2, format);
>
> +    case CFN_FMS:
> +      {
> +       real_value new_arg2 = real_value_negate (arg2);
> +       return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, &new_arg2, format);
> +      }
> +
> +    case CFN_FNMA:
> +      {
> +       real_value new_arg0 = real_value_negate (arg0);
> +       return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, arg2, format);
> +      }
> +
> +    case CFN_FNMS:
> +      {
> +       real_value new_arg0 = real_value_negate (arg0);
> +       real_value new_arg2 = real_value_negate (arg2);
> +       return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1,
> +                            &new_arg2, format);
> +      }
> +
>      default:
>        return false;
>      }
> @@ -1719,20 +1739,3 @@ fold_const_call (combined_fn fn, tree ty
>        return fold_const_call_1 (fn, type, arg0, arg1, arg2);
>      }
>  }
> -
> -/* Fold a fma operation with arguments ARG[012].  */
> -
> -tree
> -fold_fma (location_t, tree type, tree arg0, tree arg1, tree arg2)
> -{
> -  REAL_VALUE_TYPE result;
> -  if (real_cst_p (arg0)
> -      && real_cst_p (arg1)
> -      && real_cst_p (arg2)
> -      && do_mpfr_arg3 (&result, mpfr_fma, TREE_REAL_CST_PTR (arg0),
> -                      TREE_REAL_CST_PTR (arg1), TREE_REAL_CST_PTR (arg2),
> -                      REAL_MODE_FORMAT (TYPE_MODE (type))))
> -    return build_real (type, result);
> -
> -  return NULL_TREE;
> -}
> Index: gcc/genmatch.c
> ===================================================================
> --- gcc/genmatch.c      2018-03-01 08:20:43.846526310 +0000
> +++ gcc/genmatch.c      2018-05-11 18:08:24.835947371 +0100
> @@ -241,6 +241,20 @@ enum internal_fn {
>    IFN_LAST
>  };
>
> +enum combined_fn {
> +#define DEF_BUILTIN(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND) \
> +  CFN_##ENUM = int (ENUM),
> +#include "builtins.def"
> +
> +#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \
> +  CFN_##CODE = int (END_BUILTINS) + int (IFN_##CODE),
> +#include "internal-fn.def"
> +
> +  CFN_LAST
> +};
> +
> +#include "case-cfn-macros.h"
> +
>  /* Return true if CODE represents a commutative tree code.  Otherwise
>     return false.  */
>  bool
> @@ -288,7 +302,6 @@ commutative_ternary_tree_code (enum tree
>      case WIDEN_MULT_PLUS_EXPR:
>      case WIDEN_MULT_MINUS_EXPR:
>      case DOT_PROD_EXPR:
> -    case FMA_EXPR:
>        return true;
>
>      default:
> @@ -450,6 +463,44 @@ is_a_helper <user_id *>::test (id_base *
>    return id->kind == id_base::USER;
>  }
>
> +/* If ID has a pair of consecutive, commutative operands, return the
> +   index of the first, otherwise return -1.  */
> +
> +static int
> +commutative_op (id_base *id)
> +{
> +  if (operator_id *code = dyn_cast <operator_id *> (id))
> +    {
> +      if (commutative_tree_code (code->code)
> +         || commutative_ternary_tree_code (code->code))
> +       return 0;
> +      return -1;
> +    }
> +  if (fn_id *fn = dyn_cast <fn_id *> (id))
> +    switch (fn->fn)
> +      {
> +      CASE_CFN_FMA:
> +      case CFN_FMS:
> +      case CFN_FNMA:
> +      case CFN_FNMS:
> +       return 0;
> +
> +      default:
> +       return -1;
> +      }
> +  if (user_id *uid = dyn_cast<user_id *> (id))
> +    {
> +      int res = commutative_op (uid->substitutes[0]);
> +      if (res < 0)
> +       return 0;
> +      for (unsigned i = 1; i < uid->substitutes.length (); ++i)
> +       if (res != commutative_op (uid->substitutes[i]))
> +         return -1;
> +      return res;
> +    }
> +  return -1;
> +}
> +
>  /* Add a predicate identifier to the hash.  */
>
>  static predicate_id *
> @@ -946,6 +997,9 @@ commutate (operand *op, vec<vec<user_id
>    if (!e->is_commutative)
>      return ret;
>
> +  /* The operation is always binary if it isn't inherently commutative.  */
> +  int natural_opno = commutative_op (e->operation);
> +  unsigned int opno = natural_opno >= 0 ? natural_opno : 0;
>    for (unsigned i = 0; i < result.length (); ++i)
>      {
>        expr *ne = new expr (e);
> @@ -994,9 +1048,11 @@ commutate (operand *op, vec<vec<user_id
>             }
>         }
>        ne->is_commutative = false;
> -      // result[i].length () is 2 since e->operation is binary
> -      for (unsigned j = result[i].length (); j; --j)
> -       ne->append_op (result[i][j-1]);
> +      for (unsigned j = 0; j < result[i].length (); ++j)
> +       {
> +         int old_j = (j == opno ? opno + 1 : j == opno + 1 ? opno : j);
> +         ne->append_op (result[i][old_j]);
> +       }
>        ret.safe_push (ne);
>      }
>
> @@ -2759,24 +2815,18 @@ dt_operand::gen_gimple_expr (FILE *f, in
>    /* While the toplevel operands are canonicalized by the caller
>       after valueizing operands of sub-expressions we have to
>       re-canonicalize operand order.  */
> -  if (operator_id *code = dyn_cast <operator_id *> (id))
> +  int opno = commutative_op (id);
> +  if (opno >= 0)
>      {
> -      /* ???  We can't canonicalize tcc_comparison operands here
> -         because that requires changing the comparison code which
> -        we already matched...  */
> -      if (commutative_tree_code (code->code)
> -         || commutative_ternary_tree_code (code->code))
> -       {
> -         char child_opname0[20], child_opname1[20];
> -         gen_opname (child_opname0, 0);
> -         gen_opname (child_opname1, 1);
> -         fprintf_indent (f, indent,
> -                         "if (tree_swap_operands_p (%s, %s))\n",
> -                         child_opname0, child_opname1);
> -         fprintf_indent (f, indent,
> -                         "  std::swap (%s, %s);\n",
> -                         child_opname0, child_opname1);
> -       }
> +      char child_opname0[20], child_opname1[20];
> +      gen_opname (child_opname0, opno);
> +      gen_opname (child_opname1, opno + 1);
> +      fprintf_indent (f, indent,
> +                     "if (tree_swap_operands_p (%s, %s))\n",
> +                     child_opname0, child_opname1);
> +      fprintf_indent (f, indent,
> +                     "  std::swap (%s, %s);\n",
> +                     child_opname0, child_opname1);
>      }
>
>    return n_braces;
> @@ -4217,11 +4267,14 @@ parser::parse_expr ()
>                       e->operation->id, e->operation->nargs, e->ops.length ());
>           if (is_commutative)
>             {
> -             if (e->ops.length () == 2)
> +             if (e->ops.length () == 2
> +                 || commutative_op (e->operation) >= 0)
>                 e->is_commutative = true;
>               else
> -               fatal_at (token, "only binary operators or function with "
> -                         "two arguments can be marked commutative");
> +               fatal_at (token, "only binary operators or functions with "
> +                         "two arguments can be marked commutative, "
> +                         "unless the operation is known to be inherently "
> +                         "commutative");
>             }
>           e->expr_type = expr_type;
>           return op;
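
To make the new operand loop in commutate easier to follow, here is a small standalone sketch of the index mapping. It is not the genmatch code itself: swapped_index and commute_operands are hypothetical names, and opno stands for the index of the first commutative operand as returned by commutative_op.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the old_j computation in commutate:
   exchange positions OPNO and OPNO + 1 and leave every other index
   unchanged.  */
static size_t
swapped_index (size_t j, size_t opno)
{
  return j == opno ? opno + 1 : j == opno + 1 ? opno : j;
}

/* Copy the N operands in SRC to DST with the commutative pair that
   starts at OPNO swapped.  Hypothetical stand-in for the append_op
   loop.  */
static void
commute_operands (const int *src, int *dst, size_t n, size_t opno)
{
  /* The pair must lie entirely within the operand list.  */
  assert (opno + 1 < n);
  for (size_t j = 0; j < n; ++j)
    dst[j] = src[swapped_index (j, opno)];
}
```

With two operands and opno == 0 this gives the same reversed order as the old loop.  With three operands, opno == 0 swaps the first two (the FMA case) and opno == 1 swaps the last two, which is the case the commit message says a later patch uses.
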
> Index: gcc/gimple-ssa-backprop.c
> ===================================================================
> --- gcc/gimple-ssa-backprop.c   2018-01-03 11:12:59.113629694 +0000
> +++ gcc/gimple-ssa-backprop.c   2018-05-11 18:08:24.836947330 +0100
> @@ -375,6 +375,9 @@ backprop::process_builtin_call_use (gcal
>
>      CASE_CFN_FMA:
>      CASE_CFN_FMA_FN:
> +    case CFN_FMS:
> +    case CFN_FNMA:
> +    case CFN_FNMS:
>        /* In X * X + Y, where Y is distinct from X, the sign of X doesn't
>          matter.  */
>        if (gimple_call_arg (call, 0) == rhs
> @@ -420,15 +423,6 @@ backprop::process_assign_use (gassign *a
>         }
>        break;
>
> -    case FMA_EXPR:
> -      /* In X * X + Y, where Y is distinct from X, the sign of X doesn't
> -        matter.  */
> -      if (gimple_assign_rhs1 (assign) == rhs
> -         && gimple_assign_rhs2 (assign) == rhs
> -         && gimple_assign_rhs3 (assign) != rhs)
> -       info->flags.ignore_sign = true;
> -      break;
> -
>      case MULT_EXPR:
>        /* In X * X, the sign of X doesn't matter.  */
>        if (gimple_assign_rhs1 (assign) == rhs
> Index: gcc/hsa-gen.c
> ===================================================================
> --- gcc/hsa-gen.c       2018-03-17 08:30:21.230924973 +0000
> +++ gcc/hsa-gen.c       2018-05-11 18:08:24.841947124 +0100
> @@ -3178,23 +3178,6 @@ gen_hsa_insns_for_operation_assignment (
>      case NEGATE_EXPR:
>        opcode = BRIG_OPCODE_NEG;
>        break;
> -    case FMA_EXPR:
> -      /* There is a native HSA instruction for scalar FMAs but not for vector
> -        ones.  */
> -      if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE)
> -       {
> -         hsa_op_reg *dest
> -           = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign));
> -         hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
> -         hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
> -         hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
> -         hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
> -         gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb);
> -         gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp, op3, hbb);
> -         return;
> -       }
> -      opcode = BRIG_OPCODE_MAD;
> -      break;
>      case MIN_EXPR:
>        opcode = BRIG_OPCODE_MIN;
>        break;
> @@ -4490,6 +4473,57 @@ gen_hsa_divmod (gcall *call, hsa_bb *hbb
>    insn->set_output_in_type (dest, 0, hbb);
>  }
>
> +/* Emit instructions that implement FMA, FMS, FNMA or FNMS call STMT.
> +   Instructions are appended to basic block HBB.  NEGATE1 is true for
> +   FNMA and FNMS.  NEGATE3 is true for FMS and FNMS.  */
> +
> +static void
> +gen_hsa_fma (gcall *call, hsa_bb *hbb, bool negate1, bool negate3)
> +{
> +  tree lhs = gimple_call_lhs (call);
> +  if (lhs == NULL_TREE)
> +    return;
> +
> +  tree rhs1 = gimple_call_arg (call, 0);
> +  tree rhs2 = gimple_call_arg (call, 1);
> +  tree rhs3 = gimple_call_arg (call, 2);
> +
> +  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
> +  hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
> +  hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
> +  hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
> +
> +  if (negate1)
> +    {
> +      hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
> +      gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op1, hbb);
> +      op1 = tmp;
> +    }
> +
> +  /* There is a native HSA instruction for scalar FMAs but not for vector
> +     ones.  */
> +  if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE)
> +    {
> +      hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
> +      gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb);
> +      gen_hsa_binary_operation (negate3 ? BRIG_OPCODE_SUB : BRIG_OPCODE_ADD,
> +                               dest, tmp, op3, hbb);
> +    }
> +  else
> +    {
> +      if (negate3)
> +       {
> +         hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
> +         gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op3, hbb);
> +         op3 = tmp;
> +       }
> +      hsa_insn_basic *insn = new hsa_insn_basic (4, BRIG_OPCODE_MAD,
> +                                                dest->m_type, dest,
> +                                                op1, op2, op3);
> +      hbb->append_insn (insn);
> +    }
> +}
> +
>  /* Set VALUE to a shadow kernel debug argument and append a new instruction
>     to HBB basic block.  */
>
> @@ -5224,6 +5258,22 @@ gen_hsa_insn_for_internal_fn_call (gcall
>        gen_hsa_insns_for_call_of_internal_fn (stmt, hbb);
>        break;
>
> +    case IFN_FMA:
> +      gen_hsa_fma (stmt, hbb, false, false);
> +      break;
> +
> +    case IFN_FMS:
> +      gen_hsa_fma (stmt, hbb, false, true);
> +      break;
> +
> +    case IFN_FNMA:
> +      gen_hsa_fma (stmt, hbb, true, false);
> +      break;
> +
> +    case IFN_FNMS:
> +      gen_hsa_fma (stmt, hbb, true, true);
> +      break;
> +
>      default:
>        HSA_SORRY_ATV (gimple_location (stmt),
>                      "support for HSA does not implement internal function: %s",
> Index: gcc/match.pd
> ===================================================================
> --- gcc/match.pd        2018-05-02 08:39:32.882737077 +0100
> +++ gcc/match.pd        2018-05-11 18:08:24.842947083 +0100
> @@ -4702,3 +4702,60 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>         || wi::geu_p (wi::to_wide (@rpos),
>                       wi::to_wide (@ipos) + isize))
>      (BIT_FIELD_REF @0 @rsize @rpos)))))
> +
> +(for fmas (FMA)
> + (simplify
> +  (fmas:c (negate @0) @1 @2)
> +  (IFN_FNMA @0 @1 @2))
> + (simplify
> +  (fmas @0 @1 (negate @2))
> +  (IFN_FMS @0 @1 @2))
> + (simplify
> +  (fmas:c (negate @0) @1 (negate @2))
> +  (IFN_FNMS @0 @1 @2))
> + (simplify
> +  (negate (fmas@3 @0 @1 @2))
> +  (if (single_use (@3))
> +   (IFN_FNMS @0 @1 @2))))
> +
> +(simplify
> + (IFN_FMS:c (negate @0) @1 @2)
> + (IFN_FNMS @0 @1 @2))
> +(simplify
> + (IFN_FMS @0 @1 (negate @2))
> + (IFN_FMA @0 @1 @2))
> +(simplify
> + (IFN_FMS:c (negate @0) @1 (negate @2))
> + (IFN_FNMA @0 @1 @2))
> +(simplify
> + (negate (IFN_FMS@3 @0 @1 @2))
> +  (if (single_use (@3))
> +   (IFN_FNMA @0 @1 @2)))
> +
> +(simplify
> + (IFN_FNMA:c (negate @0) @1 @2)
> + (IFN_FMA @0 @1 @2))
> +(simplify
> + (IFN_FNMA @0 @1 (negate @2))
> + (IFN_FNMS @0 @1 @2))
> +(simplify
> + (IFN_FNMA:c (negate @0) @1 (negate @2))
> + (IFN_FMS @0 @1 @2))
> +(simplify
> + (negate (IFN_FNMA@3 @0 @1 @2))
> + (if (single_use (@3))
> +  (IFN_FMS @0 @1 @2)))
> +
> +(simplify
> + (IFN_FNMS:c (negate @0) @1 @2)
> + (IFN_FMS @0 @1 @2))
> +(simplify
> + (IFN_FNMS @0 @1 (negate @2))
> + (IFN_FNMA @0 @1 @2))
> +(simplify
> + (IFN_FNMS:c (negate @0) @1 (negate @2))
> + (IFN_FMA @0 @1 @2))
> +(simplify
> + (negate (IFN_FNMS@3 @0 @1 @2))
> + (if (single_use (@3))
> +  (IFN_FMA @0 @1 @2)))
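
As a sanity check on the signs in the match.pd rules above, here is a minimal sketch of the four operations and a subset of the rewrites. The ref_* and check_rewrites names are hypothetical. The model uses a plain multiply and add, so it captures only the sign conventions and not the single rounding of a real fused operation. The conventions are taken from the constant folding in fold-const-call.c: FMS negates operand 2, FNMA negates operand 0, and FNMS negates both.

```c
/* Sign-convention models only; these are not fused operations.  */
static double ref_fma  (double a, double b, double c) { return a * b + c; }
static double ref_fms  (double a, double b, double c) { return a * b - c; }
static double ref_fnma (double a, double b, double c) { return -(a * b) + c; }
static double ref_fnms (double a, double b, double c) { return -(a * b) - c; }

/* Return 1 if a subset of the match.pd rewrites holds for A, B and C.
   Use inputs for which the arithmetic is exact, such as small
   integers, since this model rounds twice.  */
static int
check_rewrites (double a, double b, double c)
{
  return (ref_fma (-a, b, c) == ref_fnma (a, b, c)      /* FMA (-a, b, c)  */
          && ref_fma (a, b, -c) == ref_fms (a, b, c)    /* FMA (a, b, -c)  */
          && ref_fma (-a, b, -c) == ref_fnms (a, b, c)  /* FMA (-a, b, -c) */
          && -ref_fma (a, b, c) == ref_fnms (a, b, c)   /* -FMA  */
          && -ref_fms (a, b, c) == ref_fnma (a, b, c)   /* -FMS  */
          && -ref_fnma (a, b, c) == ref_fms (a, b, c)   /* -FNMA */
          && -ref_fnms (a, b, c) == ref_fma (a, b, c)); /* -FNMS */
}
```

The same conventions line up with the new tests: a * b - c should become FMS, c - a * b should become FNMA, and -(a * b) - c should become FNMS.
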
> Index: gcc/tree-ssa-math-opts.c
> ===================================================================
> --- gcc/tree-ssa-math-opts.c    2018-01-12 14:45:51.037434575 +0000
> +++ gcc/tree-ssa-math-opts.c    2018-05-11 18:08:24.850946754 +0100
> @@ -2640,6 +2640,14 @@ convert_plusminus_to_widen (gimple_stmt_
>    return true;
>  }
>
> +/* gimple_fold callback that "valueizes" everything.  */
> +
> +static tree
> +aggressive_valueize (tree val)
> +{
> +  return val;
> +}
> +
>  /* Given a result MUL_RESULT which is a result of a multiplication of OP1 and
>     OP2 and which we know is used in statements that can be, together with the
>     multiplication, converted to FMAs, perform the transformation.  */
> @@ -2650,7 +2658,7 @@ convert_mult_to_fma_1 (tree mul_result,
>    tree type = TREE_TYPE (mul_result);
>    gimple *use_stmt;
>    imm_use_iterator imm_iter;
> -  gassign *fma_stmt;
> +  gcall *fma_stmt;
>
>    FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, mul_result)
>      {
> @@ -2658,6 +2666,7 @@ convert_mult_to_fma_1 (tree mul_result,
>        enum tree_code use_code;
>        tree addop, mulop1 = op1, result = mul_result;
>        bool negate_p = false;
> +      gimple_seq seq = NULL;
>
>        if (is_gimple_debug (use_stmt))
>         continue;
> @@ -2683,11 +2692,7 @@ convert_mult_to_fma_1 (tree mul_result,
>           addop = gimple_assign_rhs2 (use_stmt);
>           /* a * b - c -> a * b + (-c)  */
>           if (gimple_assign_rhs_code (use_stmt) == MINUS_EXPR)
> -           addop = force_gimple_operand_gsi (&gsi,
> -                                             build1 (NEGATE_EXPR,
> -                                                     type, addop),
> -                                             true, NULL_TREE, true,
> -                                             GSI_SAME_STMT);
> +           addop = gimple_build (&seq, NEGATE_EXPR, type, addop);
>         }
>        else
>         {
> @@ -2698,23 +2703,26 @@ convert_mult_to_fma_1 (tree mul_result,
>         }
>
>        if (negate_p)
> -       mulop1 = force_gimple_operand_gsi (&gsi,
> -                                          build1 (NEGATE_EXPR,
> -                                                  type, mulop1),
> -                                          true, NULL_TREE, true,
> -                                          GSI_SAME_STMT);
> +       mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1);
>
> -      fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt),
> -                                     FMA_EXPR, mulop1, op2, addop);
> +      if (seq)
> +       gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
> +      fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
> +      gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt));
> +      gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (use_stmt));
> +      gsi_replace (&gsi, fma_stmt, true);
> +      /* Valueize aggressively so that we generate FMS, FNMA and FNMS
> +        regardless of where the negation occurs.  */
> +      if (fold_stmt (&gsi, aggressive_valueize))
> +       update_stmt (gsi_stmt (gsi));
>
>        if (dump_file && (dump_flags & TDF_DETAILS))
>         {
>           fprintf (dump_file, "Generated FMA ");
> -         print_gimple_stmt (dump_file, fma_stmt, 0, 0);
> +         print_gimple_stmt (dump_file, gsi_stmt (gsi), 0, 0);
>           fprintf (dump_file, "\n");
>         }
>
> -      gsi_replace (&gsi, fma_stmt, true);
>        widen_mul_stats.fmas_inserted++;
>      }
>  }
> @@ -2862,7 +2870,8 @@ convert_mult_to_fma (gimple *mul_stmt, t
>
>    /* If the target doesn't support it, don't generate it.  We assume that
>       if fma isn't available then fms, fnma or fnms are not either.  */
> -  if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
> +  optimization_type opt_type = bb_optimization_type (gimple_bb (mul_stmt));
> +  if (!direct_internal_fn_supported_p (IFN_FMA, type, opt_type))
>      return false;
>
>    /* If the multiplication has zero uses, it is kept around probably because
> @@ -2958,8 +2967,8 @@ convert_mult_to_fma (gimple *mul_stmt, t
>          that a mult / subtract pair.  */
>        if (use_code == MINUS_EXPR && !negate_p
>           && gimple_assign_rhs1 (use_stmt) == result
> -         && optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing
> -         && optab_handler (fnma_optab, TYPE_MODE (type)) != CODE_FOR_nothing)
> +         && !direct_internal_fn_supported_p (IFN_FMS, type, opt_type)
> +         && direct_internal_fn_supported_p (IFN_FNMA, type, opt_type))
>         {
>           tree rhs2 = gimple_assign_rhs2 (use_stmt);
>
> Index: gcc/config/i386/i386.c
> ===================================================================
> --- gcc/config/i386/i386.c      2018-05-09 11:34:41.204789431 +0100
> +++ gcc/config/i386/i386.c      2018-05-11 18:08:24.825947782 +0100
> @@ -50516,19 +50516,20 @@ ix86_add_stmt_cost (void *data, int coun
>    tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
>    int stmt_cost = - 1;
>
> +  bool fp = false;
> +  machine_mode mode = TImode;
> +
> +  if (vectype != NULL)
> +    {
> +      fp = FLOAT_TYPE_P (vectype);
> +      mode = TYPE_MODE (vectype);
> +    }
> +
>    if ((kind == vector_stmt || kind == scalar_stmt)
>        && stmt_info
>        && stmt_info->stmt && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN)
>      {
>        tree_code subcode = gimple_assign_rhs_code (stmt_info->stmt);
> -      bool fp = false;
> -      machine_mode mode = TImode;
> -
> -      if (vectype != NULL)
> -       {
> -         fp = FLOAT_TYPE_P (vectype);
> -         mode = TYPE_MODE (vectype);
> -       }
>        /*machine_mode inner_mode = mode;
>        if (VECTOR_MODE_P (mode))
>         inner_mode = GET_MODE_INNER (mode);*/
> @@ -50559,12 +50560,6 @@ ix86_add_stmt_cost (void *data, int coun
>         case MULT_HIGHPART_EXPR:
>           stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
>           break;
> -       case FMA_EXPR:
> -          stmt_cost = ix86_vec_cost (mode,
> -                                    mode == SFmode ? ix86_cost->fmass
> -                                    : ix86_cost->fmasd,
> -                                    true);
> -         break;
>         case NEGATE_EXPR:
>           if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
>             stmt_cost = ix86_cost->sse_op;
> @@ -50627,6 +50622,24 @@ ix86_add_stmt_cost (void *data, int coun
>           break;
>         }
>      }
> +
> +  combined_fn cfn;
> +  if ((kind == vector_stmt || kind == scalar_stmt)
> +      && stmt_info
> +      && stmt_info->stmt
> +      && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST)
> +    switch (cfn)
> +      {
> +      case CFN_FMA:
> +       stmt_cost = ix86_vec_cost (mode,
> +                                  mode == SFmode ? ix86_cost->fmass
> +                                  : ix86_cost->fmasd,
> +                                  true);
> +       break;
> +      default:
> +       break;
> +      }
> +
>    /* If we do elementwise loads into a vector then we are bound by
>       latency and execution resources for the many scalar loads
>       (AGU and load ports).  Try to account for this by scaling the
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c  2018-05-08 09:42:03.510648702 +0100
> +++ gcc/config/rs6000/rs6000.c  2018-05-11 18:08:24.830947576 +0100
> @@ -16025,7 +16025,9 @@ rs6000_gimple_fold_builtin (gimple_stmt_
>         arg1 = gimple_call_arg (stmt, 1);
>         tree arg2 = gimple_call_arg (stmt, 2);
>         lhs = gimple_call_lhs (stmt);
> -       gimple *g = gimple_build_assign (lhs, FMA_EXPR, arg0, arg1, arg2);
> +       gcall *g = gimple_build_call_internal (IFN_FMA, 3, arg0, arg1, arg2);
> +       gimple_call_set_lhs (g, lhs);
> +       gimple_call_set_nothrow (g, true);
>         gimple_set_location (g, gimple_location (stmt));
>         gsi_replace (gsi, g, true);
>         return true;
> Index: gcc/brig/brigfrontend/brig-function.cc
> ===================================================================
> --- gcc/brig/brigfrontend/brig-function.cc      2018-05-08 09:42:01.419725462 +0100
> +++ gcc/brig/brigfrontend/brig-function.cc      2018-05-11 18:08:24.817948111 +0100
> @@ -1218,6 +1218,7 @@ brig_function::get_builtin_for_hsa_opcod
>      case BRIG_OPCODE_NEXP2:
>        builtin = mathfn_built_in (builtin_type, BUILT_IN_EXP2);
>        break;
> +    case BRIG_OPCODE_FMA:
>      case BRIG_OPCODE_NFMA:
>        builtin = mathfn_built_in (builtin_type, BUILT_IN_FMA);
>        break;
> @@ -1460,8 +1461,6 @@ brig_function::get_tree_code_for_hsa_opc
>         return CALL_EXPR;
>        else
>         return MAX_EXPR;
> -    case BRIG_OPCODE_FMA:
> -      return FMA_EXPR;
>      case BRIG_OPCODE_ABS:
>        return ABS_EXPR;
>      case BRIG_OPCODE_SHL:
> @@ -1496,6 +1495,7 @@ brig_function::get_tree_code_for_hsa_opc
>        /* Implement as 1/f (x).  gcc should pattern detect that and
>          use a native instruction, if available, for it.  */
>        return TREE_LIST;
> +    case BRIG_OPCODE_FMA:
>      case BRIG_OPCODE_FLOOR:
>      case BRIG_OPCODE_CEIL:
>      case BRIG_OPCODE_SQRT:
> Index: gcc/c/gimple-parser.c
> ===================================================================
> --- gcc/c/gimple-parser.c       2018-01-03 11:12:56.269741723 +0000
> +++ gcc/c/gimple-parser.c       2018-05-11 18:08:24.817948111 +0100
> @@ -903,27 +903,6 @@ c_parser_gimple_postfix_expression (c_pa
>               expr.value = fold_convert (type, val);
>               return expr;
>             }
> -         else if (strcmp (IDENTIFIER_POINTER (id), "__FMA") == 0)
> -           {
> -             c_parser_consume_token (parser);
> -             auto_vec<tree> args;
> -
> -             if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
> -               {
> -                 c_parser_gimple_expr_list (parser, &args);
> -                 c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
> -                                            "expected %<)%>");
> -               }
> -             if (args.length () != 3)
> -               {
> -                 error_at (loc, "invalid number of operands to __FMA");
> -                 expr.value = error_mark_node;
> -                 return expr;
> -               }
> -             expr.value = build3_loc (loc, FMA_EXPR, TREE_TYPE (args[0]),
> -                                      args[0], args[1], args[2]);
> -             return expr;
> -           }
>
>           /* SSA name.  */
>           unsigned version, ver_offset;
> Index: gcc/cp/constexpr.c
> ===================================================================
> --- gcc/cp/constexpr.c  2018-05-01 19:30:30.973597315 +0100
> +++ gcc/cp/constexpr.c  2018-05-11 18:08:24.830947576 +0100
> @@ -4573,7 +4573,6 @@ cxx_eval_constant_expression (const cons
>                              non_constant_p, overflow_p);
>        break;
>
> -    case FMA_EXPR:
>      case VEC_PERM_EXPR:
>        r = cxx_eval_trinary_expression (ctx, t, lval,
>                                        non_constant_p, overflow_p);
> @@ -5998,7 +5997,6 @@ #define RECUR(T,RV) \
>           return false;
>        return true;
>
> -    case FMA_EXPR:
>      case VEC_PERM_EXPR:
>       for (i = 0; i < 3; ++i)
>        if (!RECUR (TREE_OPERAND (t, i), true))
> Index: gcc/testsuite/lib/target-supports.exp
> ===================================================================
> --- gcc/testsuite/lib/target-supports.exp       2018-05-08 09:42:01.256731446 +0100
> +++ gcc/testsuite/lib/target-supports.exp       2018-05-11 18:08:24.848946836 +0100
> @@ -2879,6 +2879,13 @@ proc check_effective_target_base_quadflo
>      return 1
>  }
>
> +# Return 1 if the target supports all four forms of fused multiply-add
> +# (fma, fms, fnma, and fnms) for both float and double.
> +
> +proc check_effective_target_all_scalar_fma { } {
> +    return [istarget aarch64*-*-*]
> +}
> +
>  # Return 1 if the target supports compiling fixed-point,
>  # 0 otherwise.
>
> Index: gcc/testsuite/gcc.dg/fma-1.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-1.c        2018-05-11 18:08:24.844947000 +0100
> @@ -0,0 +1,15 @@
> +/* { dg-options "-O2 -fdump-tree-widening_mul" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return a * b + c;
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return a * b + c;
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = FMA \(} 2 "widening_mul" { target all_scalar_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-2.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-2.c        2018-05-11 18:08:24.844947000 +0100
> @@ -0,0 +1,15 @@
> +/* { dg-options "-O2 -fdump-tree-widening_mul" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return a * b - c;
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return a * b - c;
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = FMS \(} 2 "widening_mul" { target all_scalar_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-3.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-3.c        2018-05-11 18:08:24.845946959 +0100
> @@ -0,0 +1,15 @@
> +/* { dg-options "-O2 -fdump-tree-widening_mul" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return c - a * b;
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return c - a * b;
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = FNMA \(} 2 "widening_mul" { target all_scalar_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-4.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-4.c        2018-05-11 18:08:24.845946959 +0100
> @@ -0,0 +1,15 @@
> +/* { dg-options "-O2 -fdump-tree-widening_mul" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return -(a * b) - c;
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return -(a * b) - c;
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = FNMS \(} 2 "widening_mul" { target all_scalar_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-5.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-5.c        2018-05-11 18:08:24.845946959 +0100
> @@ -0,0 +1,53 @@
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return __builtin_fmaf (a, b, -c);
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return __builtin_fma (a, b, -c);
> +}
> +
> +void
> +f3 (float a, float b, float c, float d, float e, float *res)
> +{
> +  res[0] = __builtin_fmaf (a, b, -e);
> +  res[1] = __builtin_fmaf (c, d, -e);
> +}
> +
> +void
> +f4 (double a, double b, double c, double d, double e, double *res)
> +{
> +  res[0] = __builtin_fma (a, b, -e);
> +  res[1] = __builtin_fma (c, d, -e);
> +}
> +
> +float
> +f5 (float a, float b, float c)
> +{
> +  return -__builtin_fmaf (-a, b, c);
> +}
> +
> +double
> +f6 (double a, double b, double c)
> +{
> +  return -__builtin_fma (-a, b, c);
> +}
> +
> +float
> +f7 (float a, float b, float c)
> +{
> +  return -__builtin_fmaf (a, -b, c);
> +}
> +
> +double
> +f8 (double a, double b, double c)
> +{
> +  return -__builtin_fma (a, -b, c);
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = FMS \(} 10 "optimized" { target all_scalar_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-6.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-6.c        2018-05-11 18:08:24.845946959 +0100
> @@ -0,0 +1,67 @@
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return __builtin_fmaf (-a, b, c);
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return __builtin_fma (-a, b, c);
> +}
> +
> +float
> +f3 (float a, float b, float c)
> +{
> +  return __builtin_fmaf (a, -b, c);
> +}
> +
> +double
> +f4 (double a, double b, double c)
> +{
> +  return __builtin_fma (a, -b, c);
> +}
> +
> +void
> +f5 (float a, float b, float c, float d, float e, float *res)
> +{
> +  res[0] = __builtin_fmaf (-a, b, c);
> +  res[1] = __builtin_fmaf (-a, d, e);
> +}
> +
> +void
> +f6 (double a, double b, double c, double d, double e, double *res)
> +{
> +  res[0] = __builtin_fma (-a, b, c);
> +  res[1] = __builtin_fma (-a, d, e);
> +}
> +
> +void
> +f7 (float a, float b, float c, float d, float e, float *res)
> +{
> +  res[0] = __builtin_fmaf (a, -b, c);
> +  res[1] = __builtin_fmaf (d, -b, e);
> +}
> +
> +void
> +f8 (double a, double b, double c, double d, double e, double *res)
> +{
> +  res[0] = __builtin_fma (a, -b, c);
> +  res[1] = __builtin_fma (d, -b, e);
> +}
> +
> +float
> +f9 (float a, float b, float c)
> +{
> +  return -__builtin_fmaf (a, b, -c);
> +}
> +
> +double
> +f10 (double a, double b, double c)
> +{
> +  return -__builtin_fma (a, b, -c);
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = FNMA \(} 14 "optimized" { target all_scalar_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-7.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-7.c        2018-05-11 18:08:24.845946959 +0100
> @@ -0,0 +1,67 @@
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return __builtin_fmaf (-a, b, -c);
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return __builtin_fma (-a, b, -c);
> +}
> +
> +float
> +f3 (float a, float b, float c)
> +{
> +  return __builtin_fmaf (a, -b, -c);
> +}
> +
> +double
> +f4 (double a, double b, double c)
> +{
> +  return __builtin_fma (a, -b, -c);
> +}
> +
> +void
> +f5 (float a, float b, float c, float d, float *res)
> +{
> +  res[0] = __builtin_fmaf (-a, b, -c);
> +  res[1] = __builtin_fmaf (-a, d, -c);
> +}
> +
> +void
> +f6 (double a, double b, double c, double d, double *res)
> +{
> +  res[0] = __builtin_fma (-a, b, -c);
> +  res[1] = __builtin_fma (-a, d, -c);
> +}
> +
> +void
> +f7 (float a, float b, float c, float d, float *res)
> +{
> +  res[0] = __builtin_fmaf (a, -b, -c);
> +  res[1] = __builtin_fmaf (d, -b, -c);
> +}
> +
> +void
> +f8 (double a, double b, double c, double d, double *res)
> +{
> +  res[0] = __builtin_fma (a, -b, -c);
> +  res[1] = __builtin_fma (d, -b, -c);
> +}
> +
> +float
> +f9 (float a, float b, float c)
> +{
> +  return -__builtin_fmaf (a, b, c);
> +}
> +
> +double
> +f10 (double a, double b, double c)
> +{
> +  return -__builtin_fma (a, b, c);
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = FNMS \(} 14 "optimized" { target all_scalar_fma } } } */
> Index: gcc/testsuite/gcc.dg/gimplefe-26.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/gimplefe-26.c  2017-02-23 19:54:08.000000000 +0000
> +++ /dev/null   2018-04-20 16:19:46.369131350 +0100
> @@ -1,16 +0,0 @@
> -/* { dg-do compile { target c99_runtime } } */
> -/* { dg-options "-O -fgimple -fdump-tree-ssa-gimple" } */
> -
> -#define foo(type, num) \
> -type __GIMPLE () foo_##num (type a, type b, type c) \
> -{ \
> -  type t0; \
> -  t0_1 = __FMA (a, b, c); \
> -  return t0_1; \
> -}
> -
> -foo(float, 1)
> -foo(double, 2)
> -foo(long double, 3)
> -
> -/* { dg-final { scan-tree-dump-times "__FMA" 3 "ssa" } } */
> Index: gcc/testsuite/gfortran.dg/reassoc_7.f
> ===================================================================
> --- gcc/testsuite/gfortran.dg/reassoc_7.f       2015-06-02 23:52:46.000000000 +0100
> +++ gcc/testsuite/gfortran.dg/reassoc_7.f       2018-05-11 18:08:24.846946918 +0100
> @@ -1,5 +1,5 @@
>  ! { dg-do compile }
> -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
> +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }
>
>        SUBROUTINE S55199(P,Dvdph)
>        implicit none
> Index: gcc/testsuite/gfortran.dg/reassoc_8.f
> ===================================================================
> --- gcc/testsuite/gfortran.dg/reassoc_8.f       2015-06-02 23:52:46.000000000 +0100
> +++ gcc/testsuite/gfortran.dg/reassoc_8.f       2018-05-11 18:08:24.846946918 +0100
> @@ -1,5 +1,5 @@
>  ! { dg-do compile }
> -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
> +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }
>
>        SUBROUTINE S55199(P,Dvdph)
>        implicit none
> Index: gcc/testsuite/gfortran.dg/reassoc_9.f
> ===================================================================
> --- gcc/testsuite/gfortran.dg/reassoc_9.f       2015-06-02 23:52:48.000000000 +0100
> +++ gcc/testsuite/gfortran.dg/reassoc_9.f       2018-05-11 18:08:24.847946877 +0100
> @@ -1,5 +1,5 @@
>  ! { dg-do compile }
> -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
> +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }
>
>        SUBROUTINE S55199(P,Dvdph)
>        implicit none
> Index: gcc/testsuite/gfortran.dg/reassoc_10.f
> ===================================================================
> --- gcc/testsuite/gfortran.dg/reassoc_10.f      2015-06-02 23:52:44.000000000 +0100
> +++ gcc/testsuite/gfortran.dg/reassoc_10.f      2018-05-11 18:08:24.846946918 +0100
> @@ -1,5 +1,5 @@
>  ! { dg-do compile }
> -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
> +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }
>
>        SUBROUTINE S55199(P,Q,Dvdph)
>        implicit none
Richard Sandiford May 11, 2018, 6:04 p.m. | #2
Andrew Pinski <pinskia@gmail.com> writes:
> On Fri, May 11, 2018 at 10:15 AM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> There are four optabs for various forms of fused multiply-add:
>> fma, fms, fnma and fnms.  Of these, only fma had a direct gimple
>> representation.  For the other three we relied on special pattern-
>> matching during expand, although tree-ssa-math-opts.c did have
>> some code to try to second-guess what expand would do.
>>
>> This patch removes the old FMA_EXPR representation of fma and
>> introduces four new internal functions, one for each optab.
>> IFN_FMA is tied to BUILT_IN_FMA* while the other three are
>> independent directly-mapped internal functions.  It's then
>> possible to do the pattern-matching in match.pd and
>> tree-ssa-math-opts.c (via folding) can select the exact
>> FMA-based operation.
>>
>> The patch removes the gimple FE support for __FMA rather than mapping
>> it to the internal function.  There's no reason now to treat it
>> differently from other internal functions (although the FE doesn't
>> handle those yet).
>>
>> The BRIG & HSA parts are a best guess, but seem relatively simple.
>>
>> The genmatch.c changes are structured to allow ternary ops in which
>> the second two rather than the first two operands are commutative.
>> A later patch makes use of this.
>>
>> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf,
>> x86_64-linux-gnu and powerpc64le-linux-gnu.  OK to install?
>
>
> I think there might be an issue with long double and __float128
> support here (for both PowerPC and x86_64).  Please add testcases for
> those to show they are not problematic.
> What about half float on the aarch64 case?  Is that handled correctly?
> I did not see a testcase for that case either.

What specific kind of problem are you thinking of?  The patch is
type-generic and the internal functions are only used if the target
advertises the appropriate optab.  Targets already check (or should
check) that the optabs are used under the appropriate conditions for
that target.  E.g. gcc.target/powerpc/float128-fma1.c checks for the
four cases of __float128 for PowerPC.
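
For reference, the arithmetic behind the four optabs can be sketched in
plain C.  This is only an illustration and not part of the patch: the
ref_* names are hypothetical, and the expressions round the product
before the addition, whereas the real fused operations round once.
The values are chosen so that every result is exact either way.

```c
#include <assert.h>

/* Hypothetical reference helpers (not GCC code).  They spell out the
   algebra of each optab; the real fused operations round only once.  */
static double ref_fma  (double a, double b, double c) { return a * b + c; }
static double ref_fms  (double a, double b, double c) { return a * b - c; }
static double ref_fnma (double a, double b, double c) { return -(a * b) + c; }
static double ref_fnms (double a, double b, double c) { return -(a * b) - c; }

/* Check the negation identities exercised by fma-6.c and fma-7.c,
   using small values for which every intermediate result is exact.  */
void
check_identities (void)
{
  double a = 2.0, b = 3.0, c = 4.0;

  /* Forms that fma-6.c expects to become FNMA.  */
  assert (ref_fma (-a, b, c) == ref_fnma (a, b, c));
  assert (ref_fma (a, -b, c) == ref_fnma (a, b, c));
  assert (-ref_fma (a, b, -c) == ref_fnma (a, b, c));

  /* Forms that fma-7.c expects to become FNMS.  */
  assert (ref_fma (-a, b, -c) == ref_fnms (a, b, c));
  assert (ref_fma (a, -b, -c) == ref_fnms (a, b, c));
  assert (-ref_fma (a, b, c) == ref_fnms (a, b, c));

  /* Negating only the addend gives FMS.  */
  assert (ref_fma (a, b, -c) == ref_fms (a, b, c));
}
```

Whether a given type actually uses one of these operations still
depends on the target providing the optab for that mode, as described
above.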

Thanks,
Richard
Andrew Pinski May 11, 2018, 6:21 p.m. | #3
On Fri, May 11, 2018 at 11:04 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Andrew Pinski <pinskia@gmail.com> writes:
>> On Fri, May 11, 2018 at 10:15 AM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> There are four optabs for various forms of fused multiply-add:
>>> fma, fms, fnma and fnms.  Of these, only fma had a direct gimple
>>> representation.  For the other three we relied on special pattern-
>>> matching during expand, although tree-ssa-math-opts.c did have
>>> some code to try to second-guess what expand would do.
>>>
>>> This patch removes the old FMA_EXPR representation of fma and
>>> introduces four new internal functions, one for each optab.
>>> IFN_FMA is tied to BUILT_IN_FMA* while the other three are
>>> independent directly-mapped internal functions.  It's then
>>> possible to do the pattern-matching in match.pd and
>>> tree-ssa-math-opts.c (via folding) can select the exact
>>> FMA-based operation.
>>>
>>> The patch removes the gimple FE support for __FMA rather than mapping
>>> it to the internal function.  There's no reason now to treat it
>>> differently from other internal functions (although the FE doesn't
>>> handle those yet).
>>>
>>> The BRIG & HSA parts are a best guess, but seem relatively simple.
>>>
>>> The genmatch.c changes are structured to allow ternary ops in which
>>> the second two rather than the first two operands are commutative.
>>> A later patch makes use of this.
>>>
>>> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf,
>>> x86_64-linux-gnu and powerpc64le-linux-gnu.  OK to install?
>>
>>
>> I think there might be an issue with long double and __float128
>> support here (for both PowerPC and x86_64).  Please add testcases for
>> those to show they are not problematic.
>> What about half float on the aarch64 case?  Is that handled correctly?
>> I did not see a testcase for that case either.
>
> What specific kind of problem are you thinking of?  The patch is
> type-generic and the internal functions are only used if the target
> advertises the appropriate optab.  Targets already check (or should
> check) that the optabs are used under the appropriate conditions for
> that target.  E.g. gcc.target/powerpc/float128-fma1.c checks for the
> four cases of __float128 for PowerPC.

It was more of a reference to the documentation addition you made:
"+Target supports all four fused multiply-add optabs for both @code{float}
+and @code{double}."

Also, a side note: while I was working on improving the use of integer
madd instructions on aarch64, I found that if I changed the "madd<mode>"
pattern name to "fma<mode>", I could get more madd instructions used
(and add with shifts too).  Does the FMA internal function still
support integer types?

Thanks,
Andrew

>
> Thanks,
> Richard
Richard Sandiford May 12, 2018, 6:52 a.m. | #4
Andrew Pinski <pinskia@gmail.com> writes:
> On Fri, May 11, 2018 at 11:04 AM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Andrew Pinski <pinskia@gmail.com> writes:
>>> On Fri, May 11, 2018 at 10:15 AM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> There are four optabs for various forms of fused multiply-add:
>>>> fma, fms, fnma and fnms.  Of these, only fma had a direct gimple
>>>> representation.  For the other three we relied on special pattern-
>>>> matching during expand, although tree-ssa-math-opts.c did have
>>>> some code to try to second-guess what expand would do.
>>>>
>>>> This patch removes the old FMA_EXPR representation of fma and
>>>> introduces four new internal functions, one for each optab.
>>>> IFN_FMA is tied to BUILT_IN_FMA* while the other three are
>>>> independent directly-mapped internal functions.  It's then
>>>> possible to do the pattern-matching in match.pd and
>>>> tree-ssa-math-opts.c (via folding) can select the exact
>>>> FMA-based operation.
>>>>
>>>> The patch removes the gimple FE support for __FMA rather than mapping
>>>> it to the internal function.  There's no reason now to treat it
>>>> differently from other internal functions (although the FE doesn't
>>>> handle those yet).
>>>>
>>>> The BRIG & HSA parts are a best guess, but seem relatively simple.
>>>>
>>>> The genmatch.c changes are structured to allow ternary ops in which
>>>> the second two rather than the first two operands are commutative.
>>>> A later patch makes use of this.
>>>>
>>>> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf,
>>>> x86_64-linux-gnu and powerpc64le-linux-gnu.  OK to install?
>>>
>>>
>>> I think there might be an issue with long double and __float128
>>> support here (for both PowerPC and x86_64).  Please add testcases for
>>> those to show they are not problematic.
>>> What about half float on the aarch64 case?  Is that handled correctly?
>>> I did not see a testcase for that case either.
>>
>> What specific kind of problem are you thinking of?  The patch is
>> type-generic and the internal functions are only used if the target
>> advertises the appropriate optab.  Targets already check (or should
>> check) that the optabs are used under the appropriate conditions for
>> that target.  E.g. gcc.target/powerpc/float128-fma1.c checks for the
>> four cases of __float128 for PowerPC.
>
> It was more of a reference to the documentation addition you made:
> "+Target supports all four fused multiply-add optabs for both @code{float}
> +and @code{double}."

Ah, OK.  That was just for the new testsuite target selector.  It had
to be defined relative to *some* types, and I thought in practice any
target that implements these optabs will implement at least float and
double, so those were the best types to use in the new tests.

The tests weren't meant to cover every possible type, and I guess
there's an argument that the float tests are redundant given the double
ones.  At least having both does prove some level of type genericity though.

> Also, a side note: while I was working on improving the use of integer
> madd instructions on aarch64, I found that if I changed the "madd<mode>"
> pattern name to "fma<mode>", I could get more madd instructions used
> (and add with shifts too).  Does the FMA internal function still
> support integer types?

Yeah, if the optab is defined for the corresponding integer modes.

Not sure whether that's a good or a bad thing :-), but at least it's
consistent with how FMA_EXPR worked.
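
As a small illustration of the integer case (a sketch only, with a
hypothetical helper name): integer arithmetic has no rounding step, so
an integer multiply-add is simply a * b + c.  The snippet makes no
claim about what GCC emits for it on any target.

```c
#include <stdint.h>

/* Hypothetical helper: integer multiply-add.  Unsigned arithmetic is
   used so that wraparound on overflow is well defined.  */
uint64_t
int_madd (uint64_t a, uint64_t b, uint64_t c)
{
  return a * b + c;
}
```

With a zero addend this reduces to a plain multiplication, which is
the same simplification the removed FMA_EXPR folding applied.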

Thanks,
Richard
Richard Biener May 15, 2018, 10:16 a.m. | #5
On Fri, May 11, 2018 at 7:15 PM Richard Sandiford <
richard.sandiford@linaro.org> wrote:

> There are four optabs for various forms of fused multiply-add:
> fma, fms, fnma and fnms.  Of these, only fma had a direct gimple
> representation.  For the other three we relied on special pattern-
> matching during expand, although tree-ssa-math-opts.c did have
> some code to try to second-guess what expand would do.

> This patch removes the old FMA_EXPR representation of fma and
> introduces four new internal functions, one for each optab.
> IFN_FMA is tied to BUILT_IN_FMA* while the other three are
> independent directly-mapped internal functions.  It's then
> possible to do the pattern-matching in match.pd and
> tree-ssa-math-opts.c (via folding) can select the exact
> FMA-based operation.

> The patch removes the gimple FE support for __FMA rather than mapping
> it to the internal function.  There's no reason now to treat it
> differently from other internal functions (although the FE doesn't
> handle those yet).

> The BRIG & HSA parts are a best guess, but seem relatively simple.

> The genmatch.c changes are structured to allow ternary ops in which
> the second two rather than the first two operands are commutative.
> A later patch makes use of this.

> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf,
> x86_64-linux-gnu and powerpc64le-linux-gnu.  OK to install?

Comment below

> Richard


> 2018-05-11  Richard Sandiford  <richard.sandiford@linaro.org>

> gcc/
>          * doc/sourcebuild.texi (all_scalar_fma): Document.
>          * tree.def (FMA_EXPR): Delete.
>          * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions.
>          * internal-fn.c (ternary_direct): New macro.
>          (expand_ternary_optab_fn): Likewise.
>          (direct_ternary_optab_supported_p): Likewise.
>          * Makefile.in (build/genmatch.o): Depend on case-cfn-macros.h.
>          * builtins.c (fold_builtin_fma): Delete.
>          (fold_builtin_3): Don't call it.
>          * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling.
>          * expr.c (expand_expr_real_2): Likewise.
>          * fold-const.c (operand_equal_p): Likewise.
>          (fold_ternary_loc): Likewise.
>          * gimple-pretty-print.c (dump_ternary_rhs): Likewise.
>          * gimple.c (DEFTREECODE): Likewise.
>          * gimplify.c (gimplify_expr): Likewise.
>          * optabs-tree.c (optab_for_tree_code): Likewise.
>          * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
>          * tree-eh.c (operation_could_trap_p): Likewise.
>          (stmt_could_throw_1_p): Likewise.
>          * tree-inline.c (estimate_operator_cost): Likewise.
>          * tree-pretty-print.c (dump_generic_node): Likewise.
>          (op_code_prio): Likewise.
>          * tree-ssa-loop-im.c (stmt_cost): Likewise.
>          * tree-ssa-operands.c (get_expr_operands): Likewise.
>          * tree.c (commutative_ternary_tree_code, add_expr): Likewise.
>          * fold-const-call.h (fold_fma): Delete.
>          * fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS,
>          CFN_FNMA and CFN_FNMS.
>          (fold_fma): Delete.
>          * genmatch.c (combined_fn): New enum.
>          (commutative_ternary_tree_code): Remove FMA_EXPR handling.
>          (commutative_op): New function.
>          (commutate): Use it.  Handle more than 2 operands.
>          (dt_operand::gen_gimple_expr): Use commutative_op.
>          (parser::parse_expr): Allow :c to be used with non-binary
>          operators if the commutative operand is known.
>          * gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle
>          CFN_FMS, CFN_FNMA and CFN_FNMS.
>          (backprop::process_assign_use): Remove FMA_EXPR handling.
>          * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise.
>          (gen_hsa_fma): New function.
>          (gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS,
>          IFN_FNMA and IFN_FNMS.
>          * match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS.
>          * tree-ssa-math-opts.c (aggressive_valueize): New function.
>          (convert_mult_to_fma_1): Use the gimple_build interface and use
>          aggressive_valueize to fold the result.
>          (convert_mult_to_fma): Use direct_internal_fn_supported_p
>          instead of checking for optabs directly.
>          * config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls
>          rather than FMA_EXPRs.
>          * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a
>          call to IFN_FMA instead of an FMA_EXPR.

> gcc/brig/
>          * brigfrontend/brig-function.cc
>          (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA
>          for BRIG_OPCODE_FMA.
>          (brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA
>          as a call.

> gcc/c/
>          * gimple-parser.c (c_parser_gimple_postfix_expression): Remove
>          __FMA_EXPR handling.

> gcc/cp/
>          * constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling.
>          (potential_constant_expression_1): Likewise.

> gcc/testsuite/
>          * lib/target-supports.exp (check_effective_target_all_scalar_fma):
>          New proc.
>          * gcc.dg/fma-1.c: New test.
>          * gcc.dg/fma-2.c: Likewise.
>          * gcc.dg/fma-3.c: Likewise.
>          * gcc.dg/fma-4.c: Likewise.
>          * gcc.dg/fma-5.c: Likewise.
>          * gcc.dg/fma-6.c: Likewise.
>          * gcc.dg/fma-7.c: Likewise.
>          * gcc.dg/gimplefe-26.c: Remove.
>          * gfortran.dg/reassoc_7.f: Pass -ffp-contract=off.
>          * gfortran.dg/reassoc_8.f: Likewise.
>          * gfortran.dg/reassoc_9.f: Likewise.
>          * gfortran.dg/reassoc_10.f: Likewise.

> Index: gcc/doc/sourcebuild.texi
> ===================================================================
> --- gcc/doc/sourcebuild.texi    2018-05-08 09:42:01.479723260 +0100
> +++ gcc/doc/sourcebuild.texi    2018-05-11 18:08:24.833947453 +0100
> @@ -2170,6 +2170,11 @@ Target supports wide characters.
>   @subsubsection Other attributes

>   @table @code
> +@item all_scalar_fma
> +Target supports all four fused multiply-add optabs for both @code{float}
> +and @code{double}.  These optabs are: @code{fma_optab}, @code{fms_optab},
> +@code{fnma_optab} and @code{fnms_optab}.
> +
>   @item automatic_stack_alignment
>   Target supports automatic stack alignment.

> Index: gcc/tree.def
> ===================================================================
> --- gcc/tree.def        2018-01-03 11:12:58.606649676 +0000
> +++ gcc/tree.def        2018-05-11 18:08:24.852946671 +0100
> @@ -1345,12 +1345,6 @@ DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "wid
>      by the second argument.  */
>   DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2)

> -/* Fused multiply-add.
> -   All operands and the result are of the same type.  No intermediate
> -   rounding is performed after multiplying operand one with operand two
> -   before adding operand three.  */
> -DEFTREECODE (FMA_EXPR, "fma_expr", tcc_expression, 3)
> -
>   /* Widening vector multiplication.
>      The two operands are vectors with N elements of size S. Multiplying the
>      elements of the two vectors will result in N products of size 2*S.
> Index: gcc/internal-fn.def
> ===================================================================
> --- gcc/internal-fn.def 2018-02-20 09:40:41.809452604 +0000
> +++ gcc/internal-fn.def 2018-05-11 18:08:24.842947083 +0100
> @@ -57,6 +57,7 @@ along with GCC; see the file COPYING3.

>      - unary: a normal unary optab, such as vec_reverse_<mode>
>      - binary: a normal binary optab, such as vec_interleave_lo_<mode>
> +   - ternary: a normal ternary optab, such as fma<mode>4

>      - cond_binary: a conditional binary optab, such as add<mode>cc

> @@ -138,6 +139,10 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CO
>   DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW,
>                         vec_shl_insert, binary)

> +DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
> +DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
> +DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
> +
>   DEF_INTERNAL_OPTAB_FN (COND_ADD, ECF_CONST, cond_add, cond_binary)
>   DEF_INTERNAL_OPTAB_FN (COND_SUB, ECF_CONST, cond_sub, cond_binary)
>   DEF_INTERNAL_SIGNED_OPTAB_FN (COND_MIN, ECF_CONST, first,
> @@ -218,6 +223,9 @@ DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS
>   /* FP scales.  */
>   DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)

> +/* Ternary math functions.  */
> +DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary)
> +
>   /* Unary integer ops.  */
>   DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary)
>   DEF_INTERNAL_INT_FN (CLZ, ECF_CONST | ECF_NOTHROW, clz, unary)
> Index: gcc/internal-fn.c
> ===================================================================
> --- gcc/internal-fn.c   2018-03-02 09:44:45.456595753 +0000
> +++ gcc/internal-fn.c   2018-05-11 18:08:24.841947124 +0100
> @@ -90,6 +90,7 @@ #define mask_store_lanes_direct { 0, 0,
>   #define scatter_store_direct { 3, 3, false }
>   #define unary_direct { 0, 0, true }
>   #define binary_direct { 0, 0, true }
> +#define ternary_direct { 0, 0, true }
>   #define cond_unary_direct { 1, 1, true }
>   #define cond_binary_direct { 1, 1, true }
>   #define while_direct { 0, 2, false }
> @@ -2962,6 +2963,9 @@ #define expand_unary_optab_fn(FN, STMT,
>   #define expand_binary_optab_fn(FN, STMT, OPTAB) \
>     expand_direct_optab_fn (FN, STMT, OPTAB, 2)

> +#define expand_ternary_optab_fn(FN, STMT, OPTAB) \
> +  expand_direct_optab_fn (FN, STMT, OPTAB, 3)
> +
>   #define expand_cond_unary_optab_fn(FN, STMT, OPTAB) \
>     expand_direct_optab_fn (FN, STMT, OPTAB, 2)

> @@ -3047,6 +3051,7 @@ multi_vector_optab_supported_p (convert_

>   #define direct_unary_optab_supported_p direct_optab_supported_p
>   #define direct_binary_optab_supported_p direct_optab_supported_p
> +#define direct_ternary_optab_supported_p direct_optab_supported_p
>   #define direct_cond_unary_optab_supported_p direct_optab_supported_p
>   #define direct_cond_binary_optab_supported_p direct_optab_supported_p
>   #define direct_mask_load_optab_supported_p direct_optab_supported_p
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in     2018-03-13 15:06:01.749426280 +0000
> +++ gcc/Makefile.in     2018-05-11 18:08:24.816948152 +0100
> @@ -2786,7 +2786,7 @@ build/genmddump.o : genmddump.c $(RTL_BA
>     $(CORETYPES_H) $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H)
>   build/genmatch.o : genmatch.c $(BCONFIG_H) $(SYSTEM_H) \
>     $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h \
> -  tree.def builtins.def internal-fn.def
> +  tree.def builtins.def internal-fn.def case-cfn-macros.h
>   build/gencfn-macros.o : gencfn-macros.c $(BCONFIG_H) $(SYSTEM_H)       \
>     $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-set.h builtins.def      \
>     internal-fn.def
> Index: gcc/builtins.c
> ===================================================================
> --- gcc/builtins.c      2018-05-08 09:42:01.633717606 +0100
> +++ gcc/builtins.c      2018-05-11 18:08:24.817948111 +0100
> @@ -8340,21 +8340,6 @@ fold_builtin_abs (location_t loc, tree a
>     return fold_build1_loc (loc, ABS_EXPR, type, arg);
>   }

> -/* Fold a call to fma, fmaf, or fmal with arguments ARG[012].  */
> -
> -static tree
> -fold_builtin_fma (location_t loc, tree arg0, tree arg1, tree arg2, tree type)
> -{
> -  /* ??? Only expand to FMA_EXPR if it's directly supported.  */
> -  if (validate_arg (arg0, REAL_TYPE)
> -      && validate_arg (arg1, REAL_TYPE)
> -      && validate_arg (arg2, REAL_TYPE)
> -      && optab_handler (fma_optab, TYPE_MODE (type)) != CODE_FOR_nothing)
> -    return fold_build3_loc (loc, FMA_EXPR, type, arg0, arg1, arg2);
> -
> -  return NULL_TREE;
> -}
> -
>   /* Fold a call to builtin carg(a+bi) -> atan2(b,a).  */

>   static tree
> @@ -9260,10 +9245,6 @@ fold_builtin_3 (location_t loc, tree fnd
>       CASE_FLT_FN (BUILT_IN_SINCOS):
>         return fold_builtin_sincos (loc, arg0, arg1, arg2);

> -    CASE_FLT_FN (BUILT_IN_FMA):
> -    CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA):
> -      return fold_builtin_fma (loc, arg0, arg1, arg2, type);
> -
>       CASE_FLT_FN (BUILT_IN_REMQUO):
>         if (validate_arg (arg0, REAL_TYPE)
>            && validate_arg (arg1, REAL_TYPE)
> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c     2018-05-09 11:34:47.190553782 +0100
> +++ gcc/cfgexpand.c     2018-05-11 18:08:24.818948070 +0100
> @@ -4202,7 +4202,6 @@ expand_debug_expr (tree exp)
>          case SAD_EXPR:
>          case WIDEN_MULT_PLUS_EXPR:
>          case WIDEN_MULT_MINUS_EXPR:
> -       case FMA_EXPR:
>            goto ternary;

>          case TRUTH_ANDIF_EXPR:
> @@ -5190,9 +5189,6 @@ expand_debug_expr (tree exp)
>          }
>         return NULL;

> -    case FMA_EXPR:
> -      return simplify_gen_ternary (FMA, mode, inner_mode, op0, op1, op2);
> -
>       default:
>       flag_unsupported:
>         if (flag_checking)
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  2018-05-01 19:30:32.099548805 +0100
> +++ gcc/expr.c  2018-05-11 18:08:24.833947453 +0100
> @@ -8853,67 +8853,6 @@ #define REDUCE_BIT_FIELD(expr)   (reduce_b
>         expand_operands (treeop0, treeop1, subtarget, &op0, &op1, EXPAND_NORMAL);
>         return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1, target, unsignedp));

> -    case FMA_EXPR:
> -      {
> -       optab opt = fma_optab;
> -       gimple *def0, *def2;
> -
> -       /* If there is no insn for FMA, emit it as __builtin_fma{,f,l}
> -          call.  */
> -       if (optab_handler (fma_optab, mode) == CODE_FOR_nothing)
> -         {
> -           tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA);
> -           tree call_expr;
> -
> -           gcc_assert (fn != NULL_TREE);
> -           call_expr = build_call_expr (fn, 3, treeop0, treeop1, treeop2);
> -           return expand_builtin (call_expr, target, subtarget, mode, false);
> -         }
> -
> -       def0 = get_def_for_expr (treeop0, NEGATE_EXPR);
> -       /* The multiplication is commutative - look at its 2nd operand
> -          if the first isn't fed by a negate.  */
> -       if (!def0)
> -         {
> -           def0 = get_def_for_expr (treeop1, NEGATE_EXPR);
> -           /* Swap operands if the 2nd operand is fed by a negate.  */
> -           if (def0)
> -             std::swap (treeop0, treeop1);
> -         }
> -       def2 = get_def_for_expr (treeop2, NEGATE_EXPR);
> -
> -       op0 = op2 = NULL;
> -
> -       if (def0 && def2
> -           && optab_handler (fnms_optab, mode) != CODE_FOR_nothing)
> -         {
> -           opt = fnms_optab;
> -           op0 = expand_normal (gimple_assign_rhs1 (def0));
> -           op2 = expand_normal (gimple_assign_rhs1 (def2));
> -         }
> -       else if (def0
> -                && optab_handler (fnma_optab, mode) != CODE_FOR_nothing)
> -         {
> -           opt = fnma_optab;
> -           op0 = expand_normal (gimple_assign_rhs1 (def0));
> -         }
> -       else if (def2
> -                && optab_handler (fms_optab, mode) != CODE_FOR_nothing)
> -         {
> -           opt = fms_optab;
> -           op2 = expand_normal (gimple_assign_rhs1 (def2));
> -         }
> -
> -       if (op0 == NULL)
> -         op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL);
> -       if (op2 == NULL)
> -         op2 = expand_normal (treeop2);
> -       op1 = expand_normal (treeop1);
> -
> -       return expand_ternary_op (TYPE_MODE (type), opt,
> -                                 op0, op1, op2, target, 0);
> -      }
> -
>       case MULT_EXPR:
>         /* If this is a fixed-point operation, then we cannot use the code
>           below because "expand_mult" doesn't support sat/no-sat fixed-point
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    2018-05-08 09:42:01.637717459 +0100
> +++ gcc/fold-const.c    2018-05-11 18:08:24.835947371 +0100
> @@ -3297,7 +3297,6 @@ #define OP_SAME_WITH_NULL(N) \
>          case TRUTH_ORIF_EXPR:
>            return OP_SAME (0) && OP_SAME (1);

> -       case FMA_EXPR:
>          case WIDEN_MULT_PLUS_EXPR:
>          case WIDEN_MULT_MINUS_EXPR:
>            if (!OP_SAME (2))
> @@ -11708,17 +11707,6 @@ fold_ternary_loc (location_t loc, enum t

>         return NULL_TREE;

> -    case FMA_EXPR:
> -      /* For integers we can decompose the FMA if possible.  */
> -      if (TREE_CODE (arg0) == INTEGER_CST
> -         && TREE_CODE (arg1) == INTEGER_CST)
> -       return fold_build2_loc (loc, PLUS_EXPR, type,
> -                               const_binop (MULT_EXPR, arg0, arg1), arg2);
> -      if (integer_zerop (arg2))
> -       return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);
> -
> -      return fold_fma (loc, type, arg0, arg1, arg2);
> -
>       case VEC_PERM_EXPR:
>         if (TREE_CODE (arg2) == VECTOR_CST)
>          {
> Index: gcc/gimple-pretty-print.c
> ===================================================================
> --- gcc/gimple-pretty-print.c   2018-02-13 10:28:33.325853088 +0000
> +++ gcc/gimple-pretty-print.c   2018-05-11 18:08:24.836947330 +0100
> @@ -490,27 +490,6 @@ dump_ternary_rhs (pretty_printer *buffer
>         pp_greater (buffer);
>         break;

> -    case FMA_EXPR:
> -      if (flags & TDF_GIMPLE)
> -       {
> -         pp_string (buffer, "__FMA (");
> -         dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
> -         pp_comma (buffer);
> -         dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
> -         pp_comma (buffer);
> -         dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
> -         pp_right_paren (buffer);
> -       }
> -      else
> -       {
> -         dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
> -         pp_string (buffer, " * ");
> -         dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
> -         pp_string (buffer, " + ");
> -         dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
> -       }
> -      break;
> -
>       case DOT_PROD_EXPR:
>         pp_string (buffer, "DOT_PROD_EXPR <");
>         dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
> Index: gcc/gimple.c
> ===================================================================
> --- gcc/gimple.c        2018-02-13 10:28:33.415861313 +0000
> +++ gcc/gimple.c        2018-05-11 18:08:24.838947247 +0100
> @@ -2143,8 +2143,7 @@ #define DEFTREECODE(SYM, STRING, TYPE, N
>         || (SYM) == REALIGN_LOAD_EXPR    \
>         || (SYM) == VEC_COND_EXPR    \
>         || (SYM) == VEC_PERM_EXPR    \
> -      || (SYM) == BIT_INSERT_EXPR    \
> -      || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS    \
> +      || (SYM) == BIT_INSERT_EXPR) ? GIMPLE_TERNARY_RHS    \
>      : ((SYM) == CONSTRUCTOR    \
>         || (SYM) == OBJ_TYPE_REF    \
>         || (SYM) == ASSERT_EXPR    \
> Index: gcc/gimplify.c
> ===================================================================
> --- gcc/gimplify.c      2018-05-08 09:42:02.972668452 +0100
> +++ gcc/gimplify.c      2018-05-11 18:08:24.840947165 +0100
> @@ -12086,7 +12086,6 @@ gimplify_expr (tree *expr_p, gimple_seq
>            }
>            break;

> -       case FMA_EXPR:
>          case VEC_PERM_EXPR:
>            /* Classified as tcc_expression.  */
>            goto expr_3;
> Index: gcc/optabs-tree.c
> ===================================================================
> --- gcc/optabs-tree.c   2018-01-13 18:01:26.108685820 +0000
> +++ gcc/optabs-tree.c   2018-05-11 18:08:24.842947083 +0100
> @@ -143,9 +143,6 @@ optab_for_tree_code (enum tree_code code
>                : (TYPE_SATURATING (type)
>                   ? ssmsub_widen_optab : smsub_widen_optab));

> -    case FMA_EXPR:
> -      return fma_optab;
> -
>       case VEC_WIDEN_MULT_HI_EXPR:
>         return TYPE_UNSIGNED (type) ?
>          vec_widen_umult_hi_optab : vec_widen_smult_hi_optab;
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c      2018-05-01 19:31:03.079312535 +0100
> +++ gcc/tree-cfg.c      2018-05-11 18:08:24.848946836 +0100
> @@ -4109,20 +4109,6 @@ verify_gimple_assign_ternary (gassign *s
>          }
>         break;

> -    case FMA_EXPR:
> -      if (!useless_type_conversion_p (lhs_type, rhs1_type)
> -         || !useless_type_conversion_p (lhs_type, rhs2_type)
> -         || !useless_type_conversion_p (lhs_type, rhs3_type))
> -       {
> -         error ("type mismatch in fused multiply-add expression");
> -         debug_generic_expr (lhs_type);
> -         debug_generic_expr (rhs1_type);
> -         debug_generic_expr (rhs2_type);
> -         debug_generic_expr (rhs3_type);
> -         return true;
> -       }
> -      break;
> -
>       case VEC_COND_EXPR:
>         if (!VECTOR_BOOLEAN_TYPE_P (rhs1_type)
>            || maybe_ne (TYPE_VECTOR_SUBPARTS (rhs1_type),
> Index: gcc/tree-eh.c
> ===================================================================
> --- gcc/tree-eh.c       2018-02-08 13:34:20.321293427 +0000
> +++ gcc/tree-eh.c       2018-05-11 18:08:24.849946795 +0100
> @@ -2512,8 +2512,7 @@ operation_could_trap_p (enum tree_code o

>     if (TREE_CODE_CLASS (op) != tcc_comparison
>         && TREE_CODE_CLASS (op) != tcc_unary
> -      && TREE_CODE_CLASS (op) != tcc_binary
> -      && op != FMA_EXPR)
> +      && TREE_CODE_CLASS (op) != tcc_binary)
>       return false;

>     return operation_could_trap_helper_p (op, fp_operation, honor_trapv,
> @@ -2825,8 +2824,7 @@ stmt_could_throw_1_p (gassign *stmt)

>     if (TREE_CODE_CLASS (code) == tcc_comparison
>         || TREE_CODE_CLASS (code) == tcc_unary
> -      || TREE_CODE_CLASS (code) == tcc_binary
> -      || code == FMA_EXPR)
> +      || TREE_CODE_CLASS (code) == tcc_binary)
>       {
>         if (TREE_CODE_CLASS (code) == tcc_comparison)
>          t = TREE_TYPE (gimple_assign_rhs1 (stmt));
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c   2018-03-13 15:06:01.738427167 +0000
> +++ gcc/tree-inline.c   2018-05-11 18:08:24.849946795 +0100
> @@ -3855,7 +3855,6 @@ estimate_operator_cost (enum tree_code c
>       case MINUS_EXPR:
>       case MULT_EXPR:
>       case MULT_HIGHPART_EXPR:
> -    case FMA_EXPR:

>       case ADDR_SPACE_CONVERT_EXPR:
>       case FIXED_CONVERT_EXPR:
> Index: gcc/tree-pretty-print.c
> ===================================================================
> --- gcc/tree-pretty-print.c     2018-03-13 15:06:00.523525161 +0000
> +++ gcc/tree-pretty-print.c     2018-05-11 18:08:24.850946754 +0100
> @@ -2901,16 +2901,6 @@ dump_generic_node (pretty_printer *pp, t
>         pp_string (pp, " > ");
>         break;

> -    case FMA_EXPR:
> -      pp_string (pp, " FMA_EXPR < ");
> -      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> -      pp_string (pp, ", ");
> -      dump_generic_node (pp, TREE_OPERAND (node, 1), spc, flags, false);
> -      pp_string (pp, ", ");
> -      dump_generic_node (pp, TREE_OPERAND (node, 2), spc, flags, false);
> -      pp_string (pp, " > ");
> -      break;
> -
>       case OACC_PARALLEL:
>         pp_string (pp, "#pragma acc parallel");
>         goto dump_omp_clauses_body;
> @@ -3549,7 +3539,6 @@ op_code_prio (enum tree_code code)
>       case CEIL_MOD_EXPR:
>       case FLOOR_MOD_EXPR:
>       case ROUND_MOD_EXPR:
> -    case FMA_EXPR:
>         return 13;

>       case TRUTH_NOT_EXPR:
> Index: gcc/tree-ssa-loop-im.c
> ===================================================================
> --- gcc/tree-ssa-loop-im.c      2018-03-13 15:06:00.521525322 +0000
> +++ gcc/tree-ssa-loop-im.c      2018-05-11 18:08:24.850946754 +0100
> @@ -493,7 +493,6 @@ stmt_cost (gimple *stmt)
>       case WIDEN_MULT_PLUS_EXPR:
>       case WIDEN_MULT_MINUS_EXPR:
>       case DOT_PROD_EXPR:
> -    case FMA_EXPR:
>       case TRUNC_DIV_EXPR:
>       case CEIL_DIV_EXPR:
>       case FLOOR_DIV_EXPR:
> Index: gcc/tree-ssa-operands.c
> ===================================================================
> --- gcc/tree-ssa-operands.c     2018-01-03 11:12:59.133628905 +0000
> +++ gcc/tree-ssa-operands.c     2018-05-11 18:08:24.851946713 +0100
> @@ -849,7 +849,6 @@ get_expr_operands (struct function *fn,
>       case REALIGN_LOAD_EXPR:
>       case WIDEN_MULT_PLUS_EXPR:
>       case WIDEN_MULT_MINUS_EXPR:
> -    case FMA_EXPR:
>         {
>          get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 0), flags);
>          get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 1), flags);
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c  2018-05-08 09:42:01.631717680 +0100
> +++ gcc/tree.c  2018-05-11 18:08:24.852946671 +0100
> @@ -7171,7 +7171,6 @@ commutative_ternary_tree_code (enum tree
>       case WIDEN_MULT_PLUS_EXPR:
>       case WIDEN_MULT_MINUS_EXPR:
>       case DOT_PROD_EXPR:
> -    case FMA_EXPR:
>         return true;

>       default:
> @@ -7457,7 +7456,6 @@ add_expr (const_tree t, inchash::hash &h
>                flags &= ~OEP_ADDRESS_OF;
>                break;

> -           case FMA_EXPR:
>              case WIDEN_MULT_PLUS_EXPR:
>              case WIDEN_MULT_MINUS_EXPR:
>                {
> Index: gcc/fold-const-call.h
> ===================================================================
> --- gcc/fold-const-call.h       2018-01-03 11:12:58.150667646 +0000
> +++ gcc/fold-const-call.h       2018-05-11 18:08:24.833947453 +0100
> @@ -23,7 +23,6 @@ #define GCC_FOLD_CONST_CALL_H
>   tree fold_const_call (combined_fn, tree, tree);
>   tree fold_const_call (combined_fn, tree, tree, tree);
>   tree fold_const_call (combined_fn, tree, tree, tree, tree);
> -tree fold_fma (location_t, tree, tree, tree, tree);
>   tree build_cmp_result (tree type, int res);

>   #endif
> Index: gcc/fold-const-call.c
> ===================================================================
> --- gcc/fold-const-call.c       2018-05-08 09:42:01.844709861 +0100
> +++ gcc/fold-const-call.c       2018-05-11 18:08:24.833947453 +0100
> @@ -1606,6 +1606,26 @@ fold_const_call_ssss (real_value *result
>       CASE_CFN_FMA_FN:
>         return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, arg2, format);

> +    case CFN_FMS:
> +      {
> +       real_value new_arg2 = real_value_negate (arg2);
> +       return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, &new_arg2, format);
> +      }
> +
> +    case CFN_FNMA:
> +      {
> +       real_value new_arg0 = real_value_negate (arg0);
> +       return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, arg2, format);
> +      }
> +
> +    case CFN_FNMS:
> +      {
> +       real_value new_arg0 = real_value_negate (arg0);
> +       real_value new_arg2 = real_value_negate (arg2);
> +       return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1,
> +                            &new_arg2, format);
> +      }
> +
>       default:
>         return false;
>       }
> @@ -1719,20 +1739,3 @@ fold_const_call (combined_fn fn, tree ty
>         return fold_const_call_1 (fn, type, arg0, arg1, arg2);
>       }
>   }
> -
> -/* Fold a fma operation with arguments ARG[012].  */
> -
> -tree
> -fold_fma (location_t, tree type, tree arg0, tree arg1, tree arg2)
> -{
> -  REAL_VALUE_TYPE result;
> -  if (real_cst_p (arg0)
> -      && real_cst_p (arg1)
> -      && real_cst_p (arg2)
> -      && do_mpfr_arg3 (&result, mpfr_fma, TREE_REAL_CST_PTR (arg0),
> -                      TREE_REAL_CST_PTR (arg1), TREE_REAL_CST_PTR (arg2),
> -                      REAL_MODE_FORMAT (TYPE_MODE (type))))
> -    return build_real (type, result);
> -
> -  return NULL_TREE;
> -}
> Index: gcc/genmatch.c
> ===================================================================
> --- gcc/genmatch.c      2018-03-01 08:20:43.846526310 +0000
> +++ gcc/genmatch.c      2018-05-11 18:08:24.835947371 +0100
> @@ -241,6 +241,20 @@ enum internal_fn {
>     IFN_LAST
>   };

> +enum combined_fn {
> +#define DEF_BUILTIN(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND) \
> +  CFN_##ENUM = int (ENUM),
> +#include "builtins.def"
> +
> +#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \
> +  CFN_##CODE = int (END_BUILTINS) + int (IFN_##CODE),
> +#include "internal-fn.def"
> +
> +  CFN_LAST
> +};
> +
> +#include "case-cfn-macros.h"
> +
>   /* Return true if CODE represents a commutative tree code.  Otherwise
>      return false.  */
>   bool
> @@ -288,7 +302,6 @@ commutative_ternary_tree_code (enum tree
>       case WIDEN_MULT_PLUS_EXPR:
>       case WIDEN_MULT_MINUS_EXPR:
>       case DOT_PROD_EXPR:
> -    case FMA_EXPR:
>         return true;

>       default:
> @@ -450,6 +463,44 @@ is_a_helper <user_id *>::test (id_base *
>     return id->kind == id_base::USER;
>   }

> +/* If ID has a pair of consecutive, commutative operands, return the
> +   index of the first, otherwise return -1.  */
> +
> +static int
> +commutative_op (id_base *id)
> +{
> +  if (operator_id *code = dyn_cast <operator_id *> (id))
> +    {
> +      if (commutative_tree_code (code->code)
> +         || commutative_ternary_tree_code (code->code))
> +       return 0;
> +      return -1;
> +    }
> +  if (fn_id *fn = dyn_cast <fn_id *> (id))
> +    switch (fn->fn)
> +      {
> +      CASE_CFN_FMA:
> +      case CFN_FMS:
> +      case CFN_FNMA:
> +      case CFN_FNMS:
> +       return 0;
> +
> +      default:
> +       return -1;
> +      }
> +  if (user_id *uid = dyn_cast<user_id *> (id))
> +    {
> +      int res = commutative_op (uid->substitutes[0]);
> +      if (res < 0)
> +       return 0;
> +      for (unsigned i = 1; i < uid->substitutes.length (); ++i)
> +       if (res != commutative_op (uid->substitutes[i]))
> +         return -1;
> +      return res;
> +    }
> +  return -1;
> +}
> +
>   /* Add a predicate identifier to the hash.  */

>   static predicate_id *
> @@ -946,6 +997,9 @@ commutate (operand *op, vec<vec<user_id
>     if (!e->is_commutative)
>       return ret;

> +  /* The operation is always binary if it isn't inherently commutative.  */
> +  int natural_opno = commutative_op (e->operation);
> +  unsigned int opno = natural_opno >= 0 ? natural_opno : 0;
>     for (unsigned i = 0; i < result.length (); ++i)
>       {
>         expr *ne = new expr (e);
> @@ -994,9 +1048,11 @@ commutate (operand *op, vec<vec<user_id
>              }
>          }
>         ne->is_commutative = false;
> -      // result[i].length () is 2 since e->operation is binary
> -      for (unsigned j = result[i].length (); j; --j)
> -       ne->append_op (result[i][j-1]);
> +      for (unsigned j = 0; j < result[i].length (); ++j)
> +       {
> +         int old_j = (j == opno ? opno + 1 : j == opno + 1 ? opno : j);
> +         ne->append_op (result[i][old_j]);
> +       }
>         ret.safe_push (ne);
>       }

> @@ -2759,24 +2815,18 @@ dt_operand::gen_gimple_expr (FILE *f, in
>     /* While the toplevel operands are canonicalized by the caller
>        after valueizing operands of sub-expressions we have to
>        re-canonicalize operand order.  */
> -  if (operator_id *code = dyn_cast <operator_id *> (id))
> +  int opno = commutative_op (id);
> +  if (opno >= 0)
>       {
> -      /* ???  We can't canonicalize tcc_comparison operands here
> -         because that requires changing the comparison code which
> -        we already matched...  */
> -      if (commutative_tree_code (code->code)
> -         || commutative_ternary_tree_code (code->code))
> -       {
> -         char child_opname0[20], child_opname1[20];
> -         gen_opname (child_opname0, 0);
> -         gen_opname (child_opname1, 1);
> -         fprintf_indent (f, indent,
> -                         "if (tree_swap_operands_p (%s, %s))\n",
> -                         child_opname0, child_opname1);
> -         fprintf_indent (f, indent,
> -                         "  std::swap (%s, %s);\n",
> -                         child_opname0, child_opname1);
> -       }
> +      char child_opname0[20], child_opname1[20];
> +      gen_opname (child_opname0, opno);
> +      gen_opname (child_opname1, opno + 1);
> +      fprintf_indent (f, indent,
> +                     "if (tree_swap_operands_p (%s, %s))\n",
> +                     child_opname0, child_opname1);
> +      fprintf_indent (f, indent,
> +                     "  std::swap (%s, %s);\n",
> +                     child_opname0, child_opname1);
>       }

>     return n_braces;
> @@ -4217,11 +4267,14 @@ parser::parse_expr ()
>                        e->operation->id, e->operation->nargs, e->ops.length ());
>            if (is_commutative)
>              {
> -             if (e->ops.length () == 2)
> +             if (e->ops.length () == 2
> +                 || commutative_op (e->operation) >= 0)
>                  e->is_commutative = true;
>                else
> -               fatal_at (token, "only binary operators or function with "
> -                         "two arguments can be marked commutative");
> +               fatal_at (token, "only binary operators or functions with "
> +                         "two arguments can be marked commutative, "
> +                         "unless the operation is known to be inherently "
> +                         "commutative");
>              }
>            e->expr_type = expr_type;
>            return op;
> Index: gcc/gimple-ssa-backprop.c
> ===================================================================
> --- gcc/gimple-ssa-backprop.c   2018-01-03 11:12:59.113629694 +0000
> +++ gcc/gimple-ssa-backprop.c   2018-05-11 18:08:24.836947330 +0100
> @@ -375,6 +375,9 @@ backprop::process_builtin_call_use (gcal

>       CASE_CFN_FMA:
>       CASE_CFN_FMA_FN:
> +    case CFN_FMS:
> +    case CFN_FNMA:
> +    case CFN_FNMS:
>         /* In X * X + Y, where Y is distinct from X, the sign of X doesn't
>           matter.  */
>         if (gimple_call_arg (call, 0) == rhs
> @@ -420,15 +423,6 @@ backprop::process_assign_use (gassign *a
>          }
>         break;

> -    case FMA_EXPR:
> -      /* In X * X + Y, where Y is distinct from X, the sign of X doesn't
> -        matter.  */
> -      if (gimple_assign_rhs1 (assign) == rhs
> -         && gimple_assign_rhs2 (assign) == rhs
> -         && gimple_assign_rhs3 (assign) != rhs)
> -       info->flags.ignore_sign = true;
> -      break;
> -
>       case MULT_EXPR:
>         /* In X * X, the sign of X doesn't matter.  */
>         if (gimple_assign_rhs1 (assign) == rhs
> Index: gcc/hsa-gen.c
> ===================================================================
> --- gcc/hsa-gen.c       2018-03-17 08:30:21.230924973 +0000
> +++ gcc/hsa-gen.c       2018-05-11 18:08:24.841947124 +0100
> @@ -3178,23 +3178,6 @@ gen_hsa_insns_for_operation_assignment (
>       case NEGATE_EXPR:
>         opcode = BRIG_OPCODE_NEG;
>         break;
> -    case FMA_EXPR:
> -      /* There is a native HSA instruction for scalar FMAs but not for vector
> -        ones.  */
> -      if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE)
> -       {
> -         hsa_op_reg *dest
> -           = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign));
> -         hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
> -         hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
> -         hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
> -         hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
> -         gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb);
> -         gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp, op3, hbb);
> -         return;
> -       }
> -      opcode = BRIG_OPCODE_MAD;
> -      break;
>       case MIN_EXPR:
>         opcode = BRIG_OPCODE_MIN;
>         break;
> @@ -4490,6 +4473,57 @@ gen_hsa_divmod (gcall *call, hsa_bb *hbb
>     insn->set_output_in_type (dest, 0, hbb);
>   }

> +/* Emit instructions that implement FMA, FMS, FNMA or FNMS call STMT.
> +   Instructions are appended to basic block HBB.  NEGATE1 is true for
> +   FNMA and FNMS.  NEGATE3 is true for FMS and FNMS.  */
> +
> +static void
> +gen_hsa_fma (gcall *call, hsa_bb *hbb, bool negate1, bool negate3)
> +{
> +  tree lhs = gimple_call_lhs (call);
> +  if (lhs == NULL_TREE)
> +    return;
> +
> +  tree rhs1 = gimple_call_arg (call, 0);
> +  tree rhs2 = gimple_call_arg (call, 1);
> +  tree rhs3 = gimple_call_arg (call, 2);
> +
> +  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
> +  hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
> +  hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
> +  hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
> +
> +  if (negate1)
> +    {
> +      hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
> +      gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op1, hbb);
> +      op1 = tmp;
> +    }
> +
> +  /* There is a native HSA instruction for scalar FMAs but not for vector
> +     ones.  */
> +  if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE)
> +    {
> +      hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
> +      gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb);
> +      gen_hsa_binary_operation (negate3 ? BRIG_OPCODE_SUB : BRIG_OPCODE_ADD,
> +                               dest, tmp, op3, hbb);
> +    }
> +  else
> +    {
> +      if (negate3)
> +       {
> +         hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
> +         gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op3, hbb);
> +         op3 = tmp;
> +       }
> +      hsa_insn_basic *insn = new hsa_insn_basic (4, BRIG_OPCODE_MAD,
> +                                                dest->m_type, dest,
> +                                                op1, op2, op3);
> +      hbb->append_insn (insn);
> +    }
> +}
> +
>   /* Set VALUE to a shadow kernel debug argument and append a new instruction
>      to HBB basic block.  */

> @@ -5224,6 +5258,22 @@ gen_hsa_insn_for_internal_fn_call (gcall
>         gen_hsa_insns_for_call_of_internal_fn (stmt, hbb);
>         break;

> +    case IFN_FMA:
> +      gen_hsa_fma (stmt, hbb, false, false);
> +      break;
> +
> +    case IFN_FMS:
> +      gen_hsa_fma (stmt, hbb, false, true);
> +      break;
> +
> +    case IFN_FNMA:
> +      gen_hsa_fma (stmt, hbb, true, false);
> +      break;
> +
> +    case IFN_FNMS:
> +      gen_hsa_fma (stmt, hbb, true, true);
> +      break;
> +
>       default:
>         HSA_SORRY_ATV (gimple_location (stmt),
>                       "support for HSA does not implement internal function: %s",
> Index: gcc/match.pd
> ===================================================================
> --- gcc/match.pd        2018-05-02 08:39:32.882737077 +0100
> +++ gcc/match.pd        2018-05-11 18:08:24.842947083 +0100
> @@ -4702,3 +4702,60 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>          || wi::geu_p (wi::to_wide (@rpos),
>                        wi::to_wide (@ipos) + isize))
>       (BIT_FIELD_REF @0 @rsize @rpos)))))
> +
> +(for fmas (FMA)
> + (simplify
> +  (fmas:c (negate @0) @1 @2)
> +  (IFN_FNMA @0 @1 @2))
> + (simplify
> +  (fmas @0 @1 (negate @2))
> +  (IFN_FMS @0 @1 @2))
> + (simplify
> +  (fmas:c (negate @0) @1 (negate @2))
> +  (IFN_FNMS @0 @1 @2))
> + (simplify
> +  (negate (fmas@3 @0 @1 @2))
> +  (if (single_use (@3))
> +   (IFN_FNMS @0 @1 @2))))
> +
> +(simplify
> + (IFN_FMS:c (negate @0) @1 @2)
> + (IFN_FNMS @0 @1 @2))
> +(simplify
> + (IFN_FMS @0 @1 (negate @2))
> + (IFN_FMA @0 @1 @2))
> +(simplify
> + (IFN_FMS:c (negate @0) @1 (negate @2))
> + (IFN_FNMA @0 @1 @2))
> +(simplify
> + (negate (IFN_FMS@3 @0 @1 @2))
> +  (if (single_use (@3))
> +   (IFN_FNMA @0 @1 @2)))
> +
> +(simplify
> + (IFN_FNMA:c (negate @0) @1 @2)
> + (IFN_FMA @0 @1 @2))
> +(simplify
> + (IFN_FNMA @0 @1 (negate @2))
> + (IFN_FNMS @0 @1 @2))
> +(simplify
> + (IFN_FNMA:c (negate @0) @1 (negate @2))
> + (IFN_FMS @0 @1 @2))
> +(simplify
> + (negate (IFN_FNMA@3 @0 @1 @2))
> + (if (single_use (@3))
> +  (IFN_FMS @0 @1 @2)))
> +
> +(simplify
> + (IFN_FNMS:c (negate @0) @1 @2)
> + (IFN_FMS @0 @1 @2))
> +(simplify
> + (IFN_FNMS @0 @1 (negate @2))
> + (IFN_FNMA @0 @1 @2))
> +(simplify
> + (IFN_FNMS:c (negate @0) @1 (negate @2))
> + (IFN_FMA @0 @1 @2))
> +(simplify
> + (negate (IFN_FNMS@3 @0 @1 @2))
> + (if (single_use (@3))
> +  (IFN_FMA @0 @1 @2)))
> Index: gcc/tree-ssa-math-opts.c
> ===================================================================
> --- gcc/tree-ssa-math-opts.c    2018-01-12 14:45:51.037434575 +0000
> +++ gcc/tree-ssa-math-opts.c    2018-05-11 18:08:24.850946754 +0100
> @@ -2640,6 +2640,14 @@ convert_plusminus_to_widen (gimple_stmt_
>     return true;
>   }

> +/* gimple_fold callback that "valueizes" everything.  */
> +
> +static tree
> +aggressive_valueize (tree val)
> +{
> +  return val;
> +}
> +
>   /* Given a result MUL_RESULT which is a result of a multiplication of OP1 and
>      OP2 and which we know is used in statements that can be, together with the
>      multiplication, converted to FMAs, perform the transformation.  */
> @@ -2650,7 +2658,7 @@ convert_mult_to_fma_1 (tree mul_result,
>     tree type = TREE_TYPE (mul_result);
>     gimple *use_stmt;
>     imm_use_iterator imm_iter;
> -  gassign *fma_stmt;
> +  gcall *fma_stmt;

>     FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, mul_result)
>       {
> @@ -2658,6 +2666,7 @@ convert_mult_to_fma_1 (tree mul_result,
>         enum tree_code use_code;
>         tree addop, mulop1 = op1, result = mul_result;
>         bool negate_p = false;
> +      gimple_seq seq = NULL;

>         if (is_gimple_debug (use_stmt))
>          continue;
> @@ -2683,11 +2692,7 @@ convert_mult_to_fma_1 (tree mul_result,
>            addop = gimple_assign_rhs2 (use_stmt);
>            /* a * b - c -> a * b + (-c)  */
>            if (gimple_assign_rhs_code (use_stmt) == MINUS_EXPR)
> -           addop = force_gimple_operand_gsi (&gsi,
> -                                             build1 (NEGATE_EXPR,
> -                                                     type, addop),
> -                                             true, NULL_TREE, true,
> -                                             GSI_SAME_STMT);
> +           addop = gimple_build (&seq, NEGATE_EXPR, type, addop);
>          }
>         else
>          {
> @@ -2698,23 +2703,26 @@ convert_mult_to_fma_1 (tree mul_result,
>          }

>         if (negate_p)
> -       mulop1 = force_gimple_operand_gsi (&gsi,
> -                                          build1 (NEGATE_EXPR,
> -                                                  type, mulop1),
> -                                          true, NULL_TREE, true,
> -                                          GSI_SAME_STMT);
> +       mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1);

> -      fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt),
> -                                     FMA_EXPR, mulop1, op2, addop);
> +      if (seq)
> +       gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
> +      fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
> +      gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt));
> +      gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (use_stmt));
> +      gsi_replace (&gsi, fma_stmt, true);
> +      /* Valueize aggressively so that we generate FMS, FNMA and FNMS
> +        regardless of where the negation occurs.  */
> +      if (fold_stmt (&gsi, aggressive_valueize))
> +       update_stmt (gsi_stmt (gsi));

I think it would be nice to be able to use gimple_build () with IFNs, so that
you can gimple_build () the IFN and then use gsi_replace_with_seq () on it.
You only need to fold with generated negates, not with negates already in the
IL?  Then the folding implied by gimple_build will take care of it.
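
For reference, the negation folds in question rest on a handful of sign
identities, the same ones the new CFN_FMS / CFN_FNMA / CFN_FNMS cases in
fold-const-call.c encode.  A minimal standalone sketch that checks them
numerically with std::fma; the *_ref names and check_sign_identities are
made up for illustration and are not GCC functions, and the checks assume
non-NaN inputs and the default rounding mode:

```cpp
#include <cassert>
#include <cmath>

// Reference semantics assumed for the four operations (illustrative names,
// chosen to mirror the optab names; not part of GCC).
double fma_ref  (double a, double b, double c) { return std::fma ( a, b,  c); } //  a*b + c
double fms_ref  (double a, double b, double c) { return std::fma ( a, b, -c); } //  a*b - c
double fnma_ref (double a, double b, double c) { return std::fma (-a, b,  c); } // -a*b + c
double fnms_ref (double a, double b, double c) { return std::fma (-a, b, -c); } // -a*b - c

// Check the rewrites that the match.pd patterns for FMA perform.
// Hypothetical helper; assumes a, b and c are not NaN.
void check_sign_identities (double a, double b, double c)
{
  // Negating the first operand: FMA (-a, b, c) -> FNMA (a, b, c).
  assert (fma_ref (-a, b, c) == fnma_ref (a, b, c));
  // Negating the addend: FMA (a, b, -c) -> FMS (a, b, c).
  assert (fma_ref (a, b, -c) == fms_ref (a, b, c));
  // Negating both: FMA (-a, b, -c) -> FNMS (a, b, c).
  assert (fma_ref (-a, b, -c) == fnms_ref (a, b, c));
  // Negating the result: -FMA (a, b, c) -> FNMS (a, b, c).
  assert (-fma_ref (a, b, c) == fnms_ref (a, b, c));
  // The same result negation applied to the other three operations.
  assert (-fms_ref (a, b, c) == fnma_ref (a, b, c));
  assert (-fnma_ref (a, b, c) == fms_ref (a, b, c));
  assert (-fnms_ref (a, b, c) == fma_ref (a, b, c));
}
```

Calling check_sign_identities (1.5, 2.25, 0.125) exercises every identity
with exactly representable values.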

Otherwise can you please move aggressive_valueize to gimple-fold.[ch]
alongside no_follow_ssa_edges / follow_single_use_edges and maybe
rename it as follow_all_ssa_edges?
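
To spell out how the three policies would differ, here is a toy model
rather than the real gimple-fold interface: ToyName stands in for an SSA
name, and every toy_* identifier is hypothetical.  It assumes a callback
returns the value to match through, or a null pointer to stop the matcher
from looking at the definition:

```cpp
#include <cassert>

// Toy stand-in for an SSA name: it only records how many uses the name has.
struct ToyName
{
  int num_uses;
};

// A valueize callback returns the value to match through, or nullptr to
// tell the matcher not to look at the definition of this name.
using ToyValueize = const ToyName *(*) (const ToyName *);

// Never look through definitions.
const ToyName *toy_no_follow (const ToyName *) { return nullptr; }

// Look through a definition only if nothing else uses its result.
const ToyName *toy_follow_single_use (const ToyName *name)
{
  return name->num_uses == 1 ? name : nullptr;
}

// Always look through definitions, as aggressive_valueize does in the patch.
const ToyName *toy_follow_all (const ToyName *name) { return name; }

// Whether a pattern such as (FMA (negate @0) @1 @2) could see the negate
// that defines NAME under the given policy.
bool toy_can_match_through (const ToyName *name, ToyValueize valueize)
{
  return valueize (name) != nullptr;
}
```

With a negate whose result has two uses, only the follow-all policy lets
the FMA patterns see it, which is the behaviour the patch relies on.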

Otherwise looks good to me.

I guess I cannot convince you to add IFN parsing support to the GIMPLE FE? ;)

Thanks,
Richard.


>         if (dump_file && (dump_flags & TDF_DETAILS))
>          {
>            fprintf (dump_file, "Generated FMA ");
> -         print_gimple_stmt (dump_file, fma_stmt, 0, 0);
> +         print_gimple_stmt (dump_file, gsi_stmt (gsi), 0, 0);
>            fprintf (dump_file, "\n");
>          }

> -      gsi_replace (&gsi, fma_stmt, true);
>         widen_mul_stats.fmas_inserted++;
>       }
>   }
> @@ -2862,7 +2870,8 @@ convert_mult_to_fma (gimple *mul_stmt, t

>     /* If the target doesn't support it, don't generate it.  We assume that
>        if fma isn't available then fms, fnma or fnms are not either.  */
> -  if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
> +  optimization_type opt_type = bb_optimization_type (gimple_bb (mul_stmt));
> +  if (!direct_internal_fn_supported_p (IFN_FMA, type, opt_type))
>       return false;

>     /* If the multiplication has zero uses, it is kept around probably because
> @@ -2958,8 +2967,8 @@ convert_mult_to_fma (gimple *mul_stmt, t
>           that a mult / subtract pair.  */
>         if (use_code == MINUS_EXPR && !negate_p
>            && gimple_assign_rhs1 (use_stmt) == result
> -         && optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing
> -         && optab_handler (fnma_optab, TYPE_MODE (type)) != CODE_FOR_nothing)
> +         && !direct_internal_fn_supported_p (IFN_FMS, type, opt_type)
> +         && direct_internal_fn_supported_p (IFN_FNMA, type, opt_type))
>          {
>            tree rhs2 = gimple_assign_rhs2 (use_stmt);

> Index: gcc/config/i386/i386.c
> ===================================================================
> --- gcc/config/i386/i386.c      2018-05-09 11:34:41.204789431 +0100
> +++ gcc/config/i386/i386.c      2018-05-11 18:08:24.825947782 +0100
> @@ -50516,19 +50516,20 @@ ix86_add_stmt_cost (void *data, int coun
>     tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
>     int stmt_cost = - 1;

> +  bool fp = false;
> +  machine_mode mode = TImode;
> +
> +  if (vectype != NULL)
> +    {
> +      fp = FLOAT_TYPE_P (vectype);
> +      mode = TYPE_MODE (vectype);
> +    }
> +
>     if ((kind == vector_stmt || kind == scalar_stmt)
>         && stmt_info
>         && stmt_info->stmt && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN)
>       {
>         tree_code subcode = gimple_assign_rhs_code (stmt_info->stmt);
> -      bool fp = false;
> -      machine_mode mode = TImode;
> -
> -      if (vectype != NULL)
> -       {
> -         fp = FLOAT_TYPE_P (vectype);
> -         mode = TYPE_MODE (vectype);
> -       }
>         /*machine_mode inner_mode = mode;
>         if (VECTOR_MODE_P (mode))
>          inner_mode = GET_MODE_INNER (mode);*/
> @@ -50559,12 +50560,6 @@ ix86_add_stmt_cost (void *data, int coun
>          case MULT_HIGHPART_EXPR:
>            stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
>            break;
> -       case FMA_EXPR:
> -          stmt_cost = ix86_vec_cost (mode,
> -                                    mode == SFmode ? ix86_cost->fmass
> -                                    : ix86_cost->fmasd,
> -                                    true);
> -         break;
>          case NEGATE_EXPR:
>            if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
>              stmt_cost = ix86_cost->sse_op;
> @@ -50627,6 +50622,24 @@ ix86_add_stmt_cost (void *data, int coun
>            break;
>          }
>       }
> +
> +  combined_fn cfn;
> +  if ((kind == vector_stmt || kind == scalar_stmt)
> +      && stmt_info
> +      && stmt_info->stmt
> +      && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST)
> +    switch (cfn)
> +      {
> +      case CFN_FMA:
> +       stmt_cost = ix86_vec_cost (mode,
> +                                  mode == SFmode ? ix86_cost->fmass
> +                                  : ix86_cost->fmasd,
> +                                  true);
> +       break;
> +      default:
> +       break;
> +      }
> +
>     /* If we do elementwise loads into a vector then we are bound by
>        latency and execution resources for the many scalar loads
>        (AGU and load ports).  Try to account for this by scaling the
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c  2018-05-08 09:42:03.510648702 +0100
> +++ gcc/config/rs6000/rs6000.c  2018-05-11 18:08:24.830947576 +0100
> @@ -16025,7 +16025,9 @@ rs6000_gimple_fold_builtin (gimple_stmt_
>          arg1 = gimple_call_arg (stmt, 1);
>          tree arg2 = gimple_call_arg (stmt, 2);
>          lhs = gimple_call_lhs (stmt);
> -       gimple *g = gimple_build_assign (lhs, FMA_EXPR, arg0, arg1, arg2);
> +       gcall *g = gimple_build_call_internal (IFN_FMA, 3, arg0, arg1, arg2);
> +       gimple_call_set_lhs (g, lhs);
> +       gimple_call_set_nothrow (g, true);
>          gimple_set_location (g, gimple_location (stmt));
>          gsi_replace (gsi, g, true);
>          return true;
> Index: gcc/brig/brigfrontend/brig-function.cc
> ===================================================================
> --- gcc/brig/brigfrontend/brig-function.cc      2018-05-08 09:42:01.419725462 +0100
> +++ gcc/brig/brigfrontend/brig-function.cc      2018-05-11 18:08:24.817948111 +0100
> @@ -1218,6 +1218,7 @@ brig_function::get_builtin_for_hsa_opcod
>       case BRIG_OPCODE_NEXP2:
>         builtin = mathfn_built_in (builtin_type, BUILT_IN_EXP2);
>         break;
> +    case BRIG_OPCODE_FMA:
>       case BRIG_OPCODE_NFMA:
>         builtin = mathfn_built_in (builtin_type, BUILT_IN_FMA);
>         break;
> @@ -1460,8 +1461,6 @@ brig_function::get_tree_code_for_hsa_opc
>          return CALL_EXPR;
>         else
>          return MAX_EXPR;
> -    case BRIG_OPCODE_FMA:
> -      return FMA_EXPR;
>       case BRIG_OPCODE_ABS:
>         return ABS_EXPR;
>       case BRIG_OPCODE_SHL:
> @@ -1496,6 +1495,7 @@ brig_function::get_tree_code_for_hsa_opc
>         /* Implement as 1/f (x).  gcc should pattern detect that and
>           use a native instruction, if available, for it.  */
>         return TREE_LIST;
> +    case BRIG_OPCODE_FMA:
>       case BRIG_OPCODE_FLOOR:
>       case BRIG_OPCODE_CEIL:
>       case BRIG_OPCODE_SQRT:
> Index: gcc/c/gimple-parser.c
> ===================================================================
> --- gcc/c/gimple-parser.c       2018-01-03 11:12:56.269741723 +0000
> +++ gcc/c/gimple-parser.c       2018-05-11 18:08:24.817948111 +0100
> @@ -903,27 +903,6 @@ c_parser_gimple_postfix_expression (c_pa
>                expr.value = fold_convert (type, val);
>                return expr;
>              }
> -         else if (strcmp (IDENTIFIER_POINTER (id), "__FMA") == 0)
> -           {
> -             c_parser_consume_token (parser);
> -             auto_vec<tree> args;
> -
> -             if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
> -               {
> -                 c_parser_gimple_expr_list (parser, &args);
> -                 c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
> -                                            "expected %<)%>");
> -               }
> -             if (args.length () != 3)
> -               {
> -                 error_at (loc, "invalid number of operands to __FMA");
> -                 expr.value = error_mark_node;
> -                 return expr;
> -               }
> -             expr.value = build3_loc (loc, FMA_EXPR, TREE_TYPE (args[0]),
> -                                      args[0], args[1], args[2]);
> -             return expr;
> -           }

>            /* SSA name.  */
>            unsigned version, ver_offset;
> Index: gcc/cp/constexpr.c
> ===================================================================
> --- gcc/cp/constexpr.c  2018-05-01 19:30:30.973597315 +0100
> +++ gcc/cp/constexpr.c  2018-05-11 18:08:24.830947576 +0100
> @@ -4573,7 +4573,6 @@ cxx_eval_constant_expression (const cons
>                               non_constant_p, overflow_p);
>         break;

> -    case FMA_EXPR:
>       case VEC_PERM_EXPR:
>         r = cxx_eval_trinary_expression (ctx, t, lval,
>                                         non_constant_p, overflow_p);
> @@ -5998,7 +5997,6 @@ #define RECUR(T,RV) \
>            return false;
>         return true;

> -    case FMA_EXPR:
>       case VEC_PERM_EXPR:
>        for (i = 0; i < 3; ++i)
>         if (!RECUR (TREE_OPERAND (t, i), true))
> Index: gcc/testsuite/lib/target-supports.exp
> ===================================================================
> --- gcc/testsuite/lib/target-supports.exp       2018-05-08 09:42:01.256731446 +0100
> +++ gcc/testsuite/lib/target-supports.exp       2018-05-11 18:08:24.848946836 +0100
> @@ -2879,6 +2879,13 @@ proc check_effective_target_base_quadflo
>       return 1
>   }

> +# Return 1 if the target supports all four forms of fused multiply-add
> +# (fma, fms, fnma, and fnms) for both float and double.
> +
> +proc check_effective_target_all_scalar_fma { } {
> +    return [istarget aarch64*-*-*]
> +}
> +
>   # Return 1 if the target supports compiling fixed-point,
>   # 0 otherwise.

> Index: gcc/testsuite/gcc.dg/fma-1.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-1.c        2018-05-11 18:08:24.844947000 +0100
> @@ -0,0 +1,15 @@
> +/* { dg-options "-O2 -fdump-tree-widening_mul" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return a * b + c;
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return a * b + c;
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = FMA \(} 2 "widening_mul" { target all_scalar_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-2.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-2.c        2018-05-11 18:08:24.844947000 +0100
> @@ -0,0 +1,15 @@
> +/* { dg-options "-O2 -fdump-tree-widening_mul" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return a * b - c;
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return a * b - c;
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = FMS \(} 2 "widening_mul" { target all_scalar_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-3.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-3.c        2018-05-11 18:08:24.845946959 +0100
> @@ -0,0 +1,15 @@
> +/* { dg-options "-O2 -fdump-tree-widening_mul" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return c - a * b;
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return c - a * b;
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = FNMA \(} 2 "widening_mul" { target all_scalar_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-4.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-4.c        2018-05-11 18:08:24.845946959 +0100
> @@ -0,0 +1,15 @@
> +/* { dg-options "-O2 -fdump-tree-widening_mul" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return -(a * b) - c;
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return -(a * b) - c;
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = FNMS \(} 2 "widening_mul" { target all_scalar_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-5.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-5.c        2018-05-11 18:08:24.845946959 +0100
> @@ -0,0 +1,53 @@
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return __builtin_fmaf (a, b, -c);
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return __builtin_fma (a, b, -c);
> +}
> +
> +void
> +f3 (float a, float b, float c, float d, float e, float *res)
> +{
> +  res[0] = __builtin_fmaf (a, b, -e);
> +  res[1] = __builtin_fmaf (c, d, -e);
> +}
> +
> +void
> +f4 (double a, double b, double c, double d, double e, double *res)
> +{
> +  res[0] = __builtin_fma (a, b, -e);
> +  res[1] = __builtin_fma (c, d, -e);
> +}
> +
> +float
> +f5 (float a, float b, float c)
> +{
> +  return -__builtin_fmaf (-a, b, c);
> +}
> +
> +double
> +f6 (double a, double b, double c)
> +{
> +  return -__builtin_fma (-a, b, c);
> +}
> +
> +float
> +f7 (float a, float b, float c)
> +{
> +  return -__builtin_fmaf (a, -b, c);
> +}
> +
> +double
> +f8 (double a, double b, double c)
> +{
> +  return -__builtin_fma (a, -b, c);
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = FMS \(} 10 "optimized" { target all_scalar_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-6.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-6.c        2018-05-11 18:08:24.845946959 +0100
> @@ -0,0 +1,67 @@
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return __builtin_fmaf (-a, b, c);
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return __builtin_fma (-a, b, c);
> +}
> +
> +float
> +f3 (float a, float b, float c)
> +{
> +  return __builtin_fmaf (a, -b, c);
> +}
> +
> +double
> +f4 (double a, double b, double c)
> +{
> +  return __builtin_fma (a, -b, c);
> +}
> +
> +void
> +f5 (float a, float b, float c, float d, float e, float *res)
> +{
> +  res[0] = __builtin_fmaf (-a, b, c);
> +  res[1] = __builtin_fmaf (-a, d, e);
> +}
> +
> +void
> +f6 (double a, double b, double c, double d, double e, double *res)
> +{
> +  res[0] = __builtin_fma (-a, b, c);
> +  res[1] = __builtin_fma (-a, d, e);
> +}
> +
> +void
> +f7 (float a, float b, float c, float d, float e, float *res)
> +{
> +  res[0] = __builtin_fmaf (a, -b, c);
> +  res[1] = __builtin_fmaf (d, -b, e);
> +}
> +
> +void
> +f8 (double a, double b, double c, double d, double e, double *res)
> +{
> +  res[0] = __builtin_fma (a, -b, c);
> +  res[1] = __builtin_fma (d, -b, e);
> +}
> +
> +float
> +f9 (float a, float b, float c)
> +{
> +  return -__builtin_fmaf (a, b, -c);
> +}
> +
> +double
> +f10 (double a, double b, double c)
> +{
> +  return -__builtin_fma (a, b, -c);
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = FNMA \(} 14 "optimized" { target all_scalar_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-7.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-7.c        2018-05-11 18:08:24.845946959 +0100
> @@ -0,0 +1,67 @@
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return __builtin_fmaf (-a, b, -c);
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return __builtin_fma (-a, b, -c);
> +}
> +
> +float
> +f3 (float a, float b, float c)
> +{
> +  return __builtin_fmaf (a, -b, -c);
> +}
> +
> +double
> +f4 (double a, double b, double c)
> +{
> +  return __builtin_fma (a, -b, -c);
> +}
> +
> +void
> +f5 (float a, float b, float c, float d, float *res)
> +{
> +  res[0] = __builtin_fmaf (-a, b, -c);
> +  res[1] = __builtin_fmaf (-a, d, -c);
> +}
> +
> +void
> +f6 (double a, double b, double c, double d, double *res)
> +{
> +  res[0] = __builtin_fma (-a, b, -c);
> +  res[1] = __builtin_fma (-a, d, -c);
> +}
> +
> +void
> +f7 (float a, float b, float c, float d, float *res)
> +{
> +  res[0] = __builtin_fmaf (a, -b, -c);
> +  res[1] = __builtin_fmaf (d, -b, -c);
> +}
> +
> +void
> +f8 (double a, double b, double c, double d, double *res)
> +{
> +  res[0] = __builtin_fma (a, -b, -c);
> +  res[1] = __builtin_fma (d, -b, -c);
> +}
> +
> +float
> +f9 (float a, float b, float c)
> +{
> +  return -__builtin_fmaf (a, b, c);
> +}
> +
> +double
> +f10 (double a, double b, double c)
> +{
> +  return -__builtin_fma (a, b, c);
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = FNMS \(} 14 "optimized" { target all_scalar_fma } } } */
> Index: gcc/testsuite/gcc.dg/gimplefe-26.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/gimplefe-26.c  2017-02-23 19:54:08.000000000 +0000
> +++ /dev/null   2018-04-20 16:19:46.369131350 +0100
> @@ -1,16 +0,0 @@
> -/* { dg-do compile { target c99_runtime } } */
> -/* { dg-options "-O -fgimple -fdump-tree-ssa-gimple" } */
> -
> -#define foo(type, num) \
> -type __GIMPLE () foo_##num (type a, type b, type c) \
> -{ \
> -  type t0; \
> -  t0_1 = __FMA (a, b, c); \
> -  return t0_1; \
> -}
> -
> -foo(float, 1)
> -foo(double, 2)
> -foo(long double, 3)
> -
> -/* { dg-final { scan-tree-dump-times "__FMA" 3 "ssa" } } */
> Index: gcc/testsuite/gfortran.dg/reassoc_7.f
> ===================================================================
> --- gcc/testsuite/gfortran.dg/reassoc_7.f       2015-06-02 23:52:46.000000000 +0100
> +++ gcc/testsuite/gfortran.dg/reassoc_7.f       2018-05-11 18:08:24.846946918 +0100
> @@ -1,5 +1,5 @@
>   ! { dg-do compile }
> -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
> +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }

>         SUBROUTINE S55199(P,Dvdph)
>         implicit none
> Index: gcc/testsuite/gfortran.dg/reassoc_8.f
> ===================================================================
> --- gcc/testsuite/gfortran.dg/reassoc_8.f       2015-06-02 23:52:46.000000000 +0100
> +++ gcc/testsuite/gfortran.dg/reassoc_8.f       2018-05-11 18:08:24.846946918 +0100
> @@ -1,5 +1,5 @@
>   ! { dg-do compile }
> -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
> +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }

>         SUBROUTINE S55199(P,Dvdph)
>         implicit none
> Index: gcc/testsuite/gfortran.dg/reassoc_9.f
> ===================================================================
> --- gcc/testsuite/gfortran.dg/reassoc_9.f       2015-06-02 23:52:48.000000000 +0100
> +++ gcc/testsuite/gfortran.dg/reassoc_9.f       2018-05-11 18:08:24.847946877 +0100
> @@ -1,5 +1,5 @@
>   ! { dg-do compile }
> -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
> +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }

>         SUBROUTINE S55199(P,Dvdph)
>         implicit none
> Index: gcc/testsuite/gfortran.dg/reassoc_10.f
> ===================================================================
> --- gcc/testsuite/gfortran.dg/reassoc_10.f      2015-06-02 23:52:44.000000000 +0100
> +++ gcc/testsuite/gfortran.dg/reassoc_10.f      2018-05-11 18:08:24.846946918 +0100
> @@ -1,5 +1,5 @@
>   ! { dg-do compile }
> -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
> +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }

>         SUBROUTINE S55199(P,Q,Dvdph)
>         implicit none
Martin Jambor May 15, 2018, 11:21 a.m. | #6
Hi,

On Fri, May 11 2018, Richard Sandiford wrote:
> There are four optabs for various forms of fused multiply-add:
> fma, fms, fnma and fnms.  Of these, only fma had a direct gimple
> representation.  For the other three we relied on special pattern-
> matching during expand, although tree-ssa-math-opts.c did have
> some code to try to second-guess what expand would do.
>
> This patch removes the old FMA_EXPR representation of fma and
> introduces four new internal functions, one for each optab.
> IFN_FMA is tied to BUILT_IN_FMA* while the other three are
> independent directly-mapped internal functions.  It's then
> possible to do the pattern-matching in match.pd and
> tree-ssa-math-opts.c (via folding) can select the exact
> FMA-based operation.
>
> The patch removes the gimple FE support for __FMA rather than mapping
> it to the internal function.  There's no reason now to treat it
> differently from other internal functions (although the FE doesn't
> handle those yet).
>
> The BRIG & HSA parts are a best guess, but seem relatively simple.

Both parts are OK.

Thanks,

Martin


>
> The genmatch.c changes are structured to allow ternary ops in which
> the second two rather than the first two operands are commutative.
> A later patch makes use of this.
>
> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf,
> x86_64-linux-gnu and powerpc64le-linux-gnu.  OK to install?
>
> Richard
>
>
> 2018-05-11  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
> 	* doc/sourcebuild.texi (all_scalar_fma): Document.
> 	* tree.def (FMA_EXPR): Delete.
> 	* internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions.
> 	* internal-fn.c (ternary_direct): New macro.
> 	(expand_ternary_optab_fn): Likewise.
> 	(direct_ternary_optab_supported_p): Likewise.
> 	* Makefile.in (build/genmatch.o): Depend on case-cfn-macros.h.
> 	* builtins.c (fold_builtin_fma): Delete.
> 	(fold_builtin_3): Don't call it.
> 	* cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling.
> 	* expr.c (expand_expr_real_2): Likewise.
> 	* fold-const.c (operand_equal_p): Likewise.
> 	(fold_ternary_loc): Likewise.
> 	* gimple-pretty-print.c (dump_ternary_rhs): Likewise.
> 	* gimple.c (DEFTREECODE): Likewise.
> 	* gimplify.c (gimplify_expr): Likewise.
> 	* optabs-tree.c (optab_for_tree_code): Likewise.
> 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> 	* tree-eh.c (operation_could_trap_p): Likewise.
> 	(stmt_could_throw_1_p): Likewise.
> 	* tree-inline.c (estimate_operator_cost): Likewise.
> 	* tree-pretty-print.c (dump_generic_node): Likewise.
> 	(op_code_prio): Likewise.
> 	* tree-ssa-loop-im.c (stmt_cost): Likewise.
> 	* tree-ssa-operands.c (get_expr_operands): Likewise.
> 	* tree.c (commutative_ternary_tree_code, add_expr): Likewise.
> 	* fold-const-call.h (fold_fma): Delete.
> 	* fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS,
> 	CFN_FNMA and CFN_FNMS.
> 	(fold_fma): Delete.
> 	* genmatch.c (combined_fn): New enum.
> 	(commutative_ternary_tree_code): Remove FMA_EXPR handling.
> 	(commutative_op): New function.
> 	(commutate): Use it.  Handle more than 2 operands.
> 	(dt_operand::gen_gimple_expr): Use commutative_op.
> 	(parser::parse_expr): Allow :c to be used with non-binary
> 	operators if the commutative operand is known.
> 	* gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle
> 	CFN_FMS, CFN_FNMA and CFN_FNMS.
> 	(backprop::process_assign_use): Remove FMA_EXPR handling.
> 	* hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise.
> 	(gen_hsa_fma): New function.
> 	(gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS,
> 	IFN_FNMA and IFN_FNMS.
> 	* match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS.
> 	* tree-ssa-math-opts.c (aggressive_valueize): New function.
> 	(convert_mult_to_fma_1): Use the gimple_build interface and use
> 	aggressive_valueize to fold the result.
> 	(convert_mult_to_fma): Use direct_internal_fn_supported_p
> 	instead of checking for optabs directly.
> 	* config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls
> 	rather than FMA_EXPRs.
> 	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a
> 	call to IFN_FMA instead of an FMA_EXPR.
>
> gcc/brig/
> 	* brigfrontend/brig-function.cc
> 	(brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA
> 	for BRIG_OPCODE_FMA.
> 	(brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA
> 	as a call.
>
> gcc/c/
> 	* gimple-parser.c (c_parser_gimple_postfix_expression): Remove
> 	__FMA handling.
>
> gcc/cp/
> 	* constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling.
> 	(potential_constant_expression_1): Likewise.
>
> gcc/testsuite/
> 	* lib/target-supports.exp (check_effective_target_all_scalar_fma):
> 	New proc.
> 	* gcc.dg/fma-1.c: New test.
> 	* gcc.dg/fma-2.c: Likewise.
> 	* gcc.dg/fma-3.c: Likewise.
> 	* gcc.dg/fma-4.c: Likewise.
> 	* gcc.dg/fma-5.c: Likewise.
> 	* gcc.dg/fma-6.c: Likewise.
> 	* gcc.dg/fma-7.c: Likewise.
> 	* gcc.dg/gimplefe-26.c: Remove.
> 	* gfortran.dg/reassoc_7.f: Pass -ffp-contract=off.
> 	* gfortran.dg/reassoc_8.f: Likewise.
> 	* gfortran.dg/reassoc_9.f: Likewise.
> 	* gfortran.dg/reassoc_10.f: Likewise.
>
Richard Sandiford May 17, 2018, 8:56 a.m. | #7
Richard Biener <richard.guenther@gmail.com> writes:
>> @@ -2698,23 +2703,26 @@ convert_mult_to_fma_1 (tree mul_result,
>>          }
>
>>         if (negate_p)
>> -       mulop1 = force_gimple_operand_gsi (&gsi,
>> -                                          build1 (NEGATE_EXPR,
>> -                                                  type, mulop1),
>> -                                          true, NULL_TREE, true,
>> -                                          GSI_SAME_STMT);
>> +       mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1);
>
>> -      fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt),
>> -                                     FMA_EXPR, mulop1, op2, addop);
>> +      if (seq)
>> +       gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
>> +      fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
>> +      gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt));
>> +      gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (use_stmt));
>> +      gsi_replace (&gsi, fma_stmt, true);
>> +      /* Valueize aggressively so that we generate FMS, FNMA and FNMS
>> +        regardless of where the negation occurs.  */
>> +      if (fold_stmt (&gsi, aggressive_valueize))
>> +       update_stmt (gsi_stmt (gsi));
>
> I think it would be nice to be able to use gimple_build () with IFNs so you
> can gimple_build () the IFN and then use gsi_replace_with_seq () on it.  You
> only need to fold with generated negates, not with negates already in the IL?
> Then the folding implied with gimple_build will take care of it.

The idea was to pick up existing negates that feed the multiplication
as well as any added by the pass itself.
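
As an illustration only (not part of the patch), the sign bookkeeping
behind that folding can be modelled in a few lines of C.  The helper
classify_fused and the enum fused_kind are hypothetical names invented
for this sketch, not GCC functions, and the sketch ignores any
floating-point conditions that the real match.pd folds may check before
moving a negation.  The expected results mirror the scans in fma-5.c,
fma-6.c and fma-7.c.

```c
#include <stdbool.h>

/* The four fused multiply-add forms named in the patch.  */
enum fused_kind { FUSED_FMA, FUSED_FMS, FUSED_FNMA, FUSED_FNMS };

/* Hypothetical helper, not a GCC function: classify the expression
   [-] fma ([-]a, [-]b, [-]c) by tracking which terms end up negated.
   Each flag says whether the corresponding negation is present.  */
enum fused_kind
classify_fused (bool neg_a, bool neg_b, bool neg_c, bool neg_result)
{
  /* Negating the result negates both the product and the addend.  */
  bool neg_product = neg_a ^ neg_b ^ neg_result;
  bool neg_addend = neg_c ^ neg_result;

  if (neg_product)
    return neg_addend ? FUSED_FNMS : FUSED_FNMA;
  return neg_addend ? FUSED_FMS : FUSED_FMA;
}
```

For example, -fma (-a, b, c) has a negated multiplication operand and a
negated result, so the product keeps its sign and only the addend is
negated, which is the FMS form that fma-5.c scans for.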

On IRC yesterday we talked about how this should handle the ECF_NOTHROW
flag, and whether things like IFN_SQRT and IFN_FMA should always be
nothrow (like the built-in functions are).  But in the end I thought
it'd be better to keep things as they are.  We already handle
-fnon-call-exceptions for unfused a * b + c and before the patch also
handled it for FMA_EXPR.  It'd seem like a step backwards if the new
internal functions didn't handle it too.  If anything it seems like the
built-in functions should change to be closer to the tree_code and
internal_fn way of doing things, if we want to support -fnon-call-exceptions
properly.

This also surprised me when doing the if-conversion patch I sent yesterday.
We're happy to vectorise:

  for (int i = 0; i < 100; ++i)
    x[i] = ... ? sqrt (x[i]) : 0;

by doing the sqrt unconditionally and selecting on the result, even with
the default maths flags, but refuse to vectorise the simpler:

  for (int i = 0; i < 100; ++i)
    x[i] = ... ? x[i] + 1 : 0;

in the same way.
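
A hand-written sketch of what "in the same way" would mean for the
second loop is shown here.  The cond array is a hypothetical stand-in
for the condition elided as "..." above, and the function names are
invented for the example.  It only shows that the two forms compute the
same values; it makes no claim about which form a given compiler
version vectorises.

```c
#include <stddef.h>

/* Original form: the addition only executes when the condition holds.
   COND is a hypothetical stand-in for the elided condition.  */
void
add_one_branchy (float *x, const unsigned char *cond, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    x[i] = cond[i] ? x[i] + 1 : 0;
}

/* Hand if-converted form: do the addition unconditionally, then
   select between the sum and zero.  */
void
add_one_select (float *x, const unsigned char *cond, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      float sum = x[i] + 1;
      x[i] = cond[i] ? sum : 0;
    }
}
```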

> Otherwise can you please move aggressive_valueize to gimple-fold.[ch]
> alongside no_follow_ssa_edges / follow_single_use_edges and maybe
> rename it as follow_all_ssa_edges?

Ah, yeah, that's definitely a better name.

I also renamed all_scalar_fma to scalar_all_fma, since I realised
after Andrew's reply that the old name made it sound like it was
"all scalars", whereas it was meant to mean "all fmas".

Tested as before.

Thanks,
Richard

2018-05-17  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* doc/sourcebuild.texi (scalar_all_fma): Document.
	* tree.def (FMA_EXPR): Delete.
	* internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions.
	* internal-fn.c (ternary_direct): New macro.
	(expand_ternary_optab_fn): Likewise.
	(direct_ternary_optab_supported_p): Likewise.
	* Makefile.in (build/genmatch.o): Depend on case-cfn-macros.h.
	* builtins.c (fold_builtin_fma): Delete.
	(fold_builtin_3): Don't call it.
	* cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling.
	* expr.c (expand_expr_real_2): Likewise.
	* fold-const.c (operand_equal_p): Likewise.
	(fold_ternary_loc): Likewise.
	* gimple-pretty-print.c (dump_ternary_rhs): Likewise.
	* gimple.c (DEFTREECODE): Likewise.
	* gimplify.c (gimplify_expr): Likewise.
	* optabs-tree.c (optab_for_tree_code): Likewise.
	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
	* tree-eh.c (operation_could_trap_p): Likewise.
	(stmt_could_throw_1_p): Likewise.
	* tree-inline.c (estimate_operator_cost): Likewise.
	* tree-pretty-print.c (dump_generic_node): Likewise.
	(op_code_prio): Likewise.
	* tree-ssa-loop-im.c (stmt_cost): Likewise.
	* tree-ssa-operands.c (get_expr_operands): Likewise.
	* tree.c (commutative_ternary_tree_code, add_expr): Likewise.
	* fold-const-call.h (fold_fma): Delete.
	* fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS,
	CFN_FNMA and CFN_FNMS.
	(fold_fma): Delete.
	* genmatch.c (combined_fn): New enum.
	(commutative_ternary_tree_code): Remove FMA_EXPR handling.
	(commutative_op): New function.
	(commutate): Use it.  Handle more than 2 operands.
	(dt_operand::gen_gimple_expr): Use commutative_op.
	(parser::parse_expr): Allow :c to be used with non-binary
	operators if the commutative operand is known.
	* gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle
	CFN_FMS, CFN_FNMA and CFN_FNMS.
	(backprop::process_assign_use): Remove FMA_EXPR handling.
	* hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise.
	(gen_hsa_fma): New function.
	(gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS,
	IFN_FNMA and IFN_FNMS.
	* match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS.
	* gimple-fold.h (follow_all_ssa_edges): Declare.
	* gimple-fold.c (follow_all_ssa_edges): New function.
	* tree-ssa-math-opts.c (convert_mult_to_fma_1): Use the
	gimple_build interface and use follow_all_ssa_edges to fold the result.
	(convert_mult_to_fma): Use direct_internal_fn_supported_p
	instead of checking for optabs directly.
	* config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls
	rather than FMA_EXPRs.
	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a
	call to IFN_FMA instead of an FMA_EXPR.

gcc/brig/
	* brigfrontend/brig-function.cc
	(brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA
	for BRIG_OPCODE_FMA.
	(brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA
	as a call.

gcc/c/
	* gimple-parser.c (c_parser_gimple_postfix_expression): Remove
	__FMA handling.

gcc/cp/
	* constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling.
	(potential_constant_expression_1): Likewise.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_scalar_all_fma):
	New proc.
	* gcc.dg/fma-1.c: New test.
	* gcc.dg/fma-2.c: Likewise.
	* gcc.dg/fma-3.c: Likewise.
	* gcc.dg/fma-4.c: Likewise.
	* gcc.dg/fma-5.c: Likewise.
	* gcc.dg/fma-6.c: Likewise.
	* gcc.dg/fma-7.c: Likewise.
	* gcc.dg/gimplefe-26.c: Use .FMA instead of __FMA and require
	scalar_all_fma.
	* gfortran.dg/reassoc_7.f: Pass -ffp-contract=off.
	* gfortran.dg/reassoc_8.f: Likewise.
	* gfortran.dg/reassoc_9.f: Likewise.
	* gfortran.dg/reassoc_10.f: Likewise.

Index: gcc/doc/sourcebuild.texi
===================================================================
--- gcc/doc/sourcebuild.texi	2018-05-16 12:48:59.410941892 +0100
+++ gcc/doc/sourcebuild.texi	2018-05-17 09:18:19.954942948 +0100
@@ -2251,6 +2251,11 @@ Target supports @option{-pie}, @option{-
 @item rdynamic
 Target supports @option{-rdynamic}.
 
+@item scalar_all_fma
+Target supports all four fused multiply-add optabs for both @code{float}
+and @code{double}.  These optabs are: @code{fma_optab}, @code{fms_optab},
+@code{fnma_optab} and @code{fnms_optab}.
+
 @item section_anchors
 Target supports section anchors.
 
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	2018-05-16 12:48:59.410941892 +0100
+++ gcc/tree.def	2018-05-17 09:18:19.987942174 +0100
@@ -1345,12 +1345,6 @@ DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "wid
    by the second argument.  */
 DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2)
 
-/* Fused multiply-add.
-   All operands and the result are of the same type.  No intermediate
-   rounding is performed after multiplying operand one with operand two
-   before adding operand three.  */
-DEFTREECODE (FMA_EXPR, "fma_expr", tcc_expression, 3)
-
 /* Widening vector multiplication.
    The two operands are vectors with N elements of size S. Multiplying the
    elements of the two vectors will result in N products of size 2*S.
Index: gcc/internal-fn.def
===================================================================
--- gcc/internal-fn.def	2018-05-16 12:48:59.410941892 +0100
+++ gcc/internal-fn.def	2018-05-17 09:18:19.974942479 +0100
@@ -57,6 +57,7 @@ along with GCC; see the file COPYING3.
 
    - unary: a normal unary optab, such as vec_reverse_<mode>
    - binary: a normal binary optab, such as vec_interleave_lo_<mode>
+   - ternary: a normal ternary optab, such as fma<mode>4
 
    - cond_binary: a conditional binary optab, such as add<mode>cc
 
@@ -138,6 +139,10 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CO
 DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW,
 		       vec_shl_insert, binary)
 
+DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
+DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
+DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
+
 DEF_INTERNAL_OPTAB_FN (COND_ADD, ECF_CONST, cond_add, cond_binary)
 DEF_INTERNAL_OPTAB_FN (COND_SUB, ECF_CONST, cond_sub, cond_binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (COND_MIN, ECF_CONST, first,
@@ -218,6 +223,9 @@ DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS
 /* FP scales.  */
 DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
 
+/* Ternary math functions.  */
+DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary)
+
 /* Unary integer ops.  */
 DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary)
 DEF_INTERNAL_INT_FN (CLZ, ECF_CONST | ECF_NOTHROW, clz, unary)
Index: gcc/internal-fn.c
===================================================================
--- gcc/internal-fn.c	2018-05-17 09:17:58.757608747 +0100
+++ gcc/internal-fn.c	2018-05-17 09:18:19.974942479 +0100
@@ -110,6 +110,7 @@ #define mask_store_lanes_direct { 0, 0,
 #define scatter_store_direct { 3, 3, false }
 #define unary_direct { 0, 0, true }
 #define binary_direct { 0, 0, true }
+#define ternary_direct { 0, 0, true }
 #define cond_unary_direct { 1, 1, true }
 #define cond_binary_direct { 1, 1, true }
 #define while_direct { 0, 2, false }
@@ -2982,6 +2983,9 @@ #define expand_unary_optab_fn(FN, STMT,
 #define expand_binary_optab_fn(FN, STMT, OPTAB) \
   expand_direct_optab_fn (FN, STMT, OPTAB, 2)
 
+#define expand_ternary_optab_fn(FN, STMT, OPTAB) \
+  expand_direct_optab_fn (FN, STMT, OPTAB, 3)
+
 #define expand_cond_unary_optab_fn(FN, STMT, OPTAB) \
   expand_direct_optab_fn (FN, STMT, OPTAB, 2)
 
@@ -3067,6 +3071,7 @@ multi_vector_optab_supported_p (convert_
 
 #define direct_unary_optab_supported_p direct_optab_supported_p
 #define direct_binary_optab_supported_p direct_optab_supported_p
+#define direct_ternary_optab_supported_p direct_optab_supported_p
 #define direct_cond_unary_optab_supported_p direct_optab_supported_p
 #define direct_cond_binary_optab_supported_p direct_optab_supported_p
 #define direct_mask_load_optab_supported_p direct_optab_supported_p
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	2018-05-16 12:49:10.495216143 +0100
+++ gcc/Makefile.in	2018-05-17 09:18:19.936943371 +0100
@@ -2787,7 +2787,7 @@ build/genmddump.o : genmddump.c $(RTL_BA
   $(CORETYPES_H) $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H)
 build/genmatch.o : genmatch.c $(BCONFIG_H) $(SYSTEM_H) \
   $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h \
-  tree.def builtins.def internal-fn.def
+  tree.def builtins.def internal-fn.def case-cfn-macros.h
 build/gencfn-macros.o : gencfn-macros.c $(BCONFIG_H) $(SYSTEM_H)	\
   $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-set.h builtins.def	\
   internal-fn.def
Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/builtins.c	2018-05-17 09:18:19.939943300 +0100
@@ -8340,21 +8340,6 @@ fold_builtin_abs (location_t loc, tree a
   return fold_build1_loc (loc, ABS_EXPR, type, arg);
 }
 
-/* Fold a call to fma, fmaf, or fmal with arguments ARG[012].  */
-
-static tree
-fold_builtin_fma (location_t loc, tree arg0, tree arg1, tree arg2, tree type)
-{
-  /* ??? Only expand to FMA_EXPR if it's directly supported.  */
-  if (validate_arg (arg0, REAL_TYPE)
-      && validate_arg (arg1, REAL_TYPE)
-      && validate_arg (arg2, REAL_TYPE)
-      && optab_handler (fma_optab, TYPE_MODE (type)) != CODE_FOR_nothing)
-    return fold_build3_loc (loc, FMA_EXPR, type, arg0, arg1, arg2);
-
-  return NULL_TREE;
-}
-
 /* Fold a call to builtin carg(a+bi) -> atan2(b,a).  */
 
 static tree
@@ -9260,10 +9245,6 @@ fold_builtin_3 (location_t loc, tree fnd
     CASE_FLT_FN (BUILT_IN_SINCOS):
       return fold_builtin_sincos (loc, arg0, arg1, arg2);
 
-    CASE_FLT_FN (BUILT_IN_FMA):
-    CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA):
-      return fold_builtin_fma (loc, arg0, arg1, arg2, type);
-
     CASE_FLT_FN (BUILT_IN_REMQUO):
       if (validate_arg (arg0, REAL_TYPE)
 	  && validate_arg (arg1, REAL_TYPE)
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/cfgexpand.c	2018-05-17 09:18:19.941943253 +0100
@@ -4202,7 +4202,6 @@ expand_debug_expr (tree exp)
 	case SAD_EXPR:
 	case WIDEN_MULT_PLUS_EXPR:
 	case WIDEN_MULT_MINUS_EXPR:
-	case FMA_EXPR:
 	  goto ternary;
 
 	case TRUTH_ANDIF_EXPR:
@@ -5190,9 +5189,6 @@ expand_debug_expr (tree exp)
 	}
       return NULL;
 
-    case FMA_EXPR:
-      return simplify_gen_ternary (FMA, mode, inner_mode, op0, op1, op2);
-
     default:
     flag_unsupported:
       if (flag_checking)
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/expr.c	2018-05-17 09:18:19.957942878 +0100
@@ -8853,67 +8853,6 @@ #define REDUCE_BIT_FIELD(expr)	(reduce_b
       expand_operands (treeop0, treeop1, subtarget, &op0, &op1, EXPAND_NORMAL);
       return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1, target, unsignedp));
 
-    case FMA_EXPR:
-      {
-	optab opt = fma_optab;
-	gimple *def0, *def2;
-
-	/* If there is no insn for FMA, emit it as __builtin_fma{,f,l}
-	   call.  */
-	if (optab_handler (fma_optab, mode) == CODE_FOR_nothing)
-	  {
-	    tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA);
-	    tree call_expr;
-
-	    gcc_assert (fn != NULL_TREE);
-	    call_expr = build_call_expr (fn, 3, treeop0, treeop1, treeop2);
-	    return expand_builtin (call_expr, target, subtarget, mode, false);
-	  }
-
-	def0 = get_def_for_expr (treeop0, NEGATE_EXPR);
-	/* The multiplication is commutative - look at its 2nd operand
-	   if the first isn't fed by a negate.  */
-	if (!def0)
-	  {
-	    def0 = get_def_for_expr (treeop1, NEGATE_EXPR);
-	    /* Swap operands if the 2nd operand is fed by a negate.  */
-	    if (def0)
-	      std::swap (treeop0, treeop1);
-	  }
-	def2 = get_def_for_expr (treeop2, NEGATE_EXPR);
-
-	op0 = op2 = NULL;
-
-	if (def0 && def2
-	    && optab_handler (fnms_optab, mode) != CODE_FOR_nothing)
-	  {
-	    opt = fnms_optab;
-	    op0 = expand_normal (gimple_assign_rhs1 (def0));
-	    op2 = expand_normal (gimple_assign_rhs1 (def2));
-	  }
-	else if (def0
-		 && optab_handler (fnma_optab, mode) != CODE_FOR_nothing)
-	  {
-	    opt = fnma_optab;
-	    op0 = expand_normal (gimple_assign_rhs1 (def0));
-	  }
-	else if (def2
-		 && optab_handler (fms_optab, mode) != CODE_FOR_nothing)
-	  {
-	    opt = fms_optab;
-	    op2 = expand_normal (gimple_assign_rhs1 (def2));
-	  }
-
-	if (op0 == NULL)
-	  op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL);
-	if (op2 == NULL)
-	  op2 = expand_normal (treeop2);
-	op1 = expand_normal (treeop1);
-
-	return expand_ternary_op (TYPE_MODE (type), opt,
-				  op0, op1, op2, target, 0);
-      }
-
     case MULT_EXPR:
       /* If this is a fixed-point operation, then we cannot use the code
 	 below because "expand_mult" doesn't support sat/no-sat fixed-point
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/fold-const.c	2018-05-17 09:18:19.959942831 +0100
@@ -3297,7 +3297,6 @@ #define OP_SAME_WITH_NULL(N)				\
 	case TRUTH_ORIF_EXPR:
 	  return OP_SAME (0) && OP_SAME (1);
 
-	case FMA_EXPR:
 	case WIDEN_MULT_PLUS_EXPR:
 	case WIDEN_MULT_MINUS_EXPR:
 	  if (!OP_SAME (2))
@@ -11708,17 +11707,6 @@ fold_ternary_loc (location_t loc, enum t
 
       return NULL_TREE;
 
-    case FMA_EXPR:
-      /* For integers we can decompose the FMA if possible.  */
-      if (TREE_CODE (arg0) == INTEGER_CST
-	  && TREE_CODE (arg1) == INTEGER_CST)
-	return fold_build2_loc (loc, PLUS_EXPR, type,
-				const_binop (MULT_EXPR, arg0, arg1), arg2);
-      if (integer_zerop (arg2))
-	return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);
-
-      return fold_fma (loc, type, arg0, arg1, arg2);
-
     case VEC_PERM_EXPR:
       if (TREE_CODE (arg2) == VECTOR_CST)
 	{
Index: gcc/gimple-pretty-print.c
===================================================================
--- gcc/gimple-pretty-print.c	2018-05-17 09:17:58.756608780 +0100
+++ gcc/gimple-pretty-print.c	2018-05-17 09:18:19.963942737 +0100
@@ -490,27 +490,6 @@ dump_ternary_rhs (pretty_printer *buffer
       pp_greater (buffer);
       break;
 
-    case FMA_EXPR:
-      if (flags & TDF_GIMPLE)
-	{
-	  pp_string (buffer, "__FMA (");
-	  dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
-	  pp_comma (buffer);
-	  dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
-	  pp_comma (buffer);
-	  dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
-	  pp_right_paren (buffer);
-	}
-      else
-	{
-	  dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
-	  pp_string (buffer, " * ");
-	  dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
-	  pp_string (buffer, " + ");
-	  dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
-	}
-      break;
-
     case DOT_PROD_EXPR:
       pp_string (buffer, "DOT_PROD_EXPR <");
       dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c	2018-05-17 09:17:58.756608780 +0100
+++ gcc/gimple.c	2018-05-17 09:18:19.966942667 +0100
@@ -2150,8 +2150,7 @@ #define DEFTREECODE(SYM, STRING, TYPE, N
       || (SYM) == REALIGN_LOAD_EXPR					    \
       || (SYM) == VEC_COND_EXPR						    \
       || (SYM) == VEC_PERM_EXPR                                             \
-      || (SYM) == BIT_INSERT_EXPR					    \
-      || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS			    \
+      || (SYM) == BIT_INSERT_EXPR) ? GIMPLE_TERNARY_RHS			    \
    : ((SYM) == CONSTRUCTOR						    \
       || (SYM) == OBJ_TYPE_REF						    \
       || (SYM) == ASSERT_EXPR						    \
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/gimplify.c	2018-05-17 09:18:19.972942526 +0100
@@ -12086,7 +12086,6 @@ gimplify_expr (tree *expr_p, gimple_seq
 	  }
 	  break;
 
-	case FMA_EXPR:
 	case VEC_PERM_EXPR:
 	  /* Classified as tcc_expression.  */
 	  goto expr_3;
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/optabs-tree.c	2018-05-17 09:18:19.975942456 +0100
@@ -143,9 +143,6 @@ optab_for_tree_code (enum tree_code code
 	      : (TYPE_SATURATING (type)
 		 ? ssmsub_widen_optab : smsub_widen_optab));
 
-    case FMA_EXPR:
-      return fma_optab;
-
     case VEC_WIDEN_MULT_HI_EXPR:
       return TYPE_UNSIGNED (type) ?
 	vec_widen_umult_hi_optab : vec_widen_smult_hi_optab;
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	2018-05-16 12:49:10.497382733 +0100
+++ gcc/tree-cfg.c	2018-05-17 09:18:19.979942362 +0100
@@ -4109,20 +4109,6 @@ verify_gimple_assign_ternary (gassign *s
 	}
       break;
 
-    case FMA_EXPR:
-      if (!useless_type_conversion_p (lhs_type, rhs1_type)
-	  || !useless_type_conversion_p (lhs_type, rhs2_type)
-	  || !useless_type_conversion_p (lhs_type, rhs3_type))
-	{
-	  error ("type mismatch in fused multiply-add expression");
-	  debug_generic_expr (lhs_type);
-	  debug_generic_expr (rhs1_type);
-	  debug_generic_expr (rhs2_type);
-	  debug_generic_expr (rhs3_type);
-	  return true;
-	}
-      break;
-
     case VEC_COND_EXPR:
       if (!VECTOR_BOOLEAN_TYPE_P (rhs1_type)
 	  || maybe_ne (TYPE_VECTOR_SUBPARTS (rhs1_type),
Index: gcc/tree-eh.c
===================================================================
--- gcc/tree-eh.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/tree-eh.c	2018-05-17 09:18:19.980942338 +0100
@@ -2512,8 +2512,7 @@ operation_could_trap_p (enum tree_code o
 
   if (TREE_CODE_CLASS (op) != tcc_comparison
       && TREE_CODE_CLASS (op) != tcc_unary
-      && TREE_CODE_CLASS (op) != tcc_binary
-      && op != FMA_EXPR)
+      && TREE_CODE_CLASS (op) != tcc_binary)
     return false;
 
   return operation_could_trap_helper_p (op, fp_operation, honor_trapv,
@@ -2825,8 +2824,7 @@ stmt_could_throw_1_p (gassign *stmt)
 
   if (TREE_CODE_CLASS (code) == tcc_comparison
       || TREE_CODE_CLASS (code) == tcc_unary
-      || TREE_CODE_CLASS (code) == tcc_binary
-      || code == FMA_EXPR)
+      || TREE_CODE_CLASS (code) == tcc_binary)
     {
       if (TREE_CODE_CLASS (code) == tcc_comparison)
 	t = TREE_TYPE (gimple_assign_rhs1 (stmt));
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/tree-inline.c	2018-05-17 09:18:19.980942338 +0100
@@ -3855,7 +3855,6 @@ estimate_operator_cost (enum tree_code c
     case MINUS_EXPR:
     case MULT_EXPR:
     case MULT_HIGHPART_EXPR:
-    case FMA_EXPR:
 
     case ADDR_SPACE_CONVERT_EXPR:
     case FIXED_CONVERT_EXPR:
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	2018-05-17 09:17:58.758608713 +0100
+++ gcc/tree-pretty-print.c	2018-05-17 09:18:19.981942315 +0100
@@ -2904,16 +2904,6 @@ dump_generic_node (pretty_printer *pp, t
       pp_string (pp, " > ");
       break;
 
-    case FMA_EXPR:
-      pp_string (pp, " FMA_EXPR < ");
-      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
-      pp_string (pp, ", ");
-      dump_generic_node (pp, TREE_OPERAND (node, 1), spc, flags, false);
-      pp_string (pp, ", ");
-      dump_generic_node (pp, TREE_OPERAND (node, 2), spc, flags, false);
-      pp_string (pp, " > ");
-      break;
-
     case OACC_PARALLEL:
       pp_string (pp, "#pragma acc parallel");
       goto dump_omp_clauses_body;
@@ -3552,7 +3542,6 @@ op_code_prio (enum tree_code code)
     case CEIL_MOD_EXPR:
     case FLOOR_MOD_EXPR:
     case ROUND_MOD_EXPR:
-    case FMA_EXPR:
       return 13;
 
     case TRUTH_NOT_EXPR:
Index: gcc/tree-ssa-loop-im.c
===================================================================
--- gcc/tree-ssa-loop-im.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/tree-ssa-loop-im.c	2018-05-17 09:18:19.981942315 +0100
@@ -493,7 +493,6 @@ stmt_cost (gimple *stmt)
     case WIDEN_MULT_PLUS_EXPR:
     case WIDEN_MULT_MINUS_EXPR:
     case DOT_PROD_EXPR:
-    case FMA_EXPR:
     case TRUNC_DIV_EXPR:
     case CEIL_DIV_EXPR:
     case FLOOR_DIV_EXPR:
Index: gcc/tree-ssa-operands.c
===================================================================
--- gcc/tree-ssa-operands.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/tree-ssa-operands.c	2018-05-17 09:18:19.982942291 +0100
@@ -849,7 +849,6 @@ get_expr_operands (struct function *fn,
     case REALIGN_LOAD_EXPR:
     case WIDEN_MULT_PLUS_EXPR:
     case WIDEN_MULT_MINUS_EXPR:
-    case FMA_EXPR:
       {
 	get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 0), flags);
 	get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 1), flags);
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/tree.c	2018-05-17 09:18:19.986942197 +0100
@@ -7171,7 +7171,6 @@ commutative_ternary_tree_code (enum tree
     case WIDEN_MULT_PLUS_EXPR:
     case WIDEN_MULT_MINUS_EXPR:
     case DOT_PROD_EXPR:
-    case FMA_EXPR:
       return true;
 
     default:
@@ -7457,7 +7456,6 @@ add_expr (const_tree t, inchash::hash &h
 	      flags &= ~OEP_ADDRESS_OF;
 	      break;
 
-	    case FMA_EXPR:
 	    case WIDEN_MULT_PLUS_EXPR:
 	    case WIDEN_MULT_MINUS_EXPR:
 	      {
Index: gcc/fold-const-call.h
===================================================================
--- gcc/fold-const-call.h	2018-05-16 12:48:59.410941892 +0100
+++ gcc/fold-const-call.h	2018-05-17 09:18:19.957942878 +0100
@@ -23,7 +23,6 @@ #define GCC_FOLD_CONST_CALL_H
 tree fold_const_call (combined_fn, tree, tree);
 tree fold_const_call (combined_fn, tree, tree, tree);
 tree fold_const_call (combined_fn, tree, tree, tree, tree);
-tree fold_fma (location_t, tree, tree, tree, tree);
 tree build_cmp_result (tree type, int res);
 
 #endif
Index: gcc/fold-const-call.c
===================================================================
--- gcc/fold-const-call.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/fold-const-call.c	2018-05-17 09:18:19.957942878 +0100
@@ -1606,6 +1606,26 @@ fold_const_call_ssss (real_value *result
     CASE_CFN_FMA_FN:
       return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, arg2, format);
 
+    case CFN_FMS:
+      {
+	real_value new_arg2 = real_value_negate (arg2);
+	return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, &new_arg2, format);
+      }
+
+    case CFN_FNMA:
+      {
+	real_value new_arg0 = real_value_negate (arg0);
+	return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, arg2, format);
+      }
+
+    case CFN_FNMS:
+      {
+	real_value new_arg0 = real_value_negate (arg0);
+	real_value new_arg2 = real_value_negate (arg2);
+	return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1,
+			     &new_arg2, format);
+      }
+
     default:
       return false;
     }
@@ -1719,20 +1739,3 @@ fold_const_call (combined_fn fn, tree ty
       return fold_const_call_1 (fn, type, arg0, arg1, arg2);
     }
 }
-
-/* Fold a fma operation with arguments ARG[012].  */
-
-tree
-fold_fma (location_t, tree type, tree arg0, tree arg1, tree arg2)
-{
-  REAL_VALUE_TYPE result;
-  if (real_cst_p (arg0)
-      && real_cst_p (arg1)
-      && real_cst_p (arg2)
-      && do_mpfr_arg3 (&result, mpfr_fma, TREE_REAL_CST_PTR (arg0),
-		       TREE_REAL_CST_PTR (arg1), TREE_REAL_CST_PTR (arg2),
-		       REAL_MODE_FORMAT (TYPE_MODE (type))))
-    return build_real (type, result);
-
-  return NULL_TREE;
-}
Index: gcc/genmatch.c
===================================================================
--- gcc/genmatch.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/genmatch.c	2018-05-17 09:18:19.960942808 +0100
@@ -241,6 +241,20 @@ enum internal_fn {
   IFN_LAST
 };
 
+enum combined_fn {
+#define DEF_BUILTIN(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND) \
+  CFN_##ENUM = int (ENUM),
+#include "builtins.def"
+
+#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \
+  CFN_##CODE = int (END_BUILTINS) + int (IFN_##CODE),
+#include "internal-fn.def"
+
+  CFN_LAST
+};
+
+#include "case-cfn-macros.h"
+
 /* Return true if CODE represents a commutative tree code.  Otherwise
    return false.  */
 bool
@@ -288,7 +302,6 @@ commutative_ternary_tree_code (enum tree
     case WIDEN_MULT_PLUS_EXPR:
     case WIDEN_MULT_MINUS_EXPR:
     case DOT_PROD_EXPR:
-    case FMA_EXPR:
       return true;
 
     default:
@@ -450,6 +463,44 @@ is_a_helper <user_id *>::test (id_base *
   return id->kind == id_base::USER;
 }
 
+/* If ID has a pair of consecutive, commutative operands, return the
+   index of the first, otherwise return -1.  */
+
+static int
+commutative_op (id_base *id)
+{
+  if (operator_id *code = dyn_cast <operator_id *> (id))
+    {
+      if (commutative_tree_code (code->code)
+	  || commutative_ternary_tree_code (code->code))
+	return 0;
+      return -1;
+    }
+  if (fn_id *fn = dyn_cast <fn_id *> (id))
+    switch (fn->fn)
+      {
+      CASE_CFN_FMA:
+      case CFN_FMS:
+      case CFN_FNMA:
+      case CFN_FNMS:
+	return 0;
+
+      default:
+	return -1;
+      }
+  if (user_id *uid = dyn_cast<user_id *> (id))
+    {
+      int res = commutative_op (uid->substitutes[0]);
+      if (res < 0)
+	return -1;
+      for (unsigned i = 1; i < uid->substitutes.length (); ++i)
+	if (res != commutative_op (uid->substitutes[i]))
+	  return -1;
+      return res;
+    }
+  return -1;
+}
+
 /* Add a predicate identifier to the hash.  */
 
 static predicate_id *
@@ -946,6 +997,9 @@ commutate (operand *op, vec<vec<user_id
   if (!e->is_commutative)
     return ret;
 
+  /* The operation is always binary if it isn't inherently commutative.  */
+  int natural_opno = commutative_op (e->operation);
+  unsigned int opno = natural_opno >= 0 ? natural_opno : 0;
   for (unsigned i = 0; i < result.length (); ++i)
     {
       expr *ne = new expr (e);
@@ -994,9 +1048,11 @@ commutate (operand *op, vec<vec<user_id
 	    }
 	}
       ne->is_commutative = false;
-      // result[i].length () is 2 since e->operation is binary
-      for (unsigned j = result[i].length (); j; --j)
-	ne->append_op (result[i][j-1]);
+      for (unsigned j = 0; j < result[i].length (); ++j)
+	{
+	  int old_j = (j == opno ? opno + 1 : j == opno + 1 ? opno : j);
+	  ne->append_op (result[i][old_j]);
+	}
       ret.safe_push (ne);
     }
 
@@ -2759,24 +2815,18 @@ dt_operand::gen_gimple_expr (FILE *f, in
   /* While the toplevel operands are canonicalized by the caller
      after valueizing operands of sub-expressions we have to
      re-canonicalize operand order.  */
-  if (operator_id *code = dyn_cast <operator_id *> (id))
+  int opno = commutative_op (id);
+  if (opno >= 0)
     {
-      /* ???  We can't canonicalize tcc_comparison operands here
-         because that requires changing the comparison code which
-	 we already matched...  */
-      if (commutative_tree_code (code->code)
-	  || commutative_ternary_tree_code (code->code))
-	{
-	  char child_opname0[20], child_opname1[20];
-	  gen_opname (child_opname0, 0);
-	  gen_opname (child_opname1, 1);
-	  fprintf_indent (f, indent,
-			  "if (tree_swap_operands_p (%s, %s))\n",
-			  child_opname0, child_opname1);
-	  fprintf_indent (f, indent,
-			  "  std::swap (%s, %s);\n",
-			  child_opname0, child_opname1);
-	}
+      char child_opname0[20], child_opname1[20];
+      gen_opname (child_opname0, opno);
+      gen_opname (child_opname1, opno + 1);
+      fprintf_indent (f, indent,
+		      "if (tree_swap_operands_p (%s, %s))\n",
+		      child_opname0, child_opname1);
+      fprintf_indent (f, indent,
+		      "  std::swap (%s, %s);\n",
+		      child_opname0, child_opname1);
     }
 
   return n_braces;
@@ -4217,11 +4267,14 @@ parser::parse_expr ()
 		      e->operation->id, e->operation->nargs, e->ops.length ());
 	  if (is_commutative)
 	    {
-	      if (e->ops.length () == 2)
+	      if (e->ops.length () == 2
+		  || commutative_op (e->operation) >= 0)
 		e->is_commutative = true;
 	      else
-		fatal_at (token, "only binary operators or function with "
-			  "two arguments can be marked commutative");
+		fatal_at (token, "only binary operators or functions with "
+			  "two arguments can be marked commutative, "
+			  "unless the operation is known to be inherently "
+			  "commutative");
 	    }
 	  e->expr_type = expr_type;
 	  return op;
Index: gcc/gimple-ssa-backprop.c
===================================================================
--- gcc/gimple-ssa-backprop.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/gimple-ssa-backprop.c	2018-05-17 09:18:19.963942737 +0100
@@ -375,6 +375,9 @@ backprop::process_builtin_call_use (gcal
 
     CASE_CFN_FMA:
     CASE_CFN_FMA_FN:
+    case CFN_FMS:
+    case CFN_FNMA:
+    case CFN_FNMS:
       /* In X * X + Y, where Y is distinct from X, the sign of X doesn't
 	 matter.  */
       if (gimple_call_arg (call, 0) == rhs
@@ -420,15 +423,6 @@ backprop::process_assign_use (gassign *a
 	}
       break;
 
-    case FMA_EXPR:
-      /* In X * X + Y, where Y is distinct from X, the sign of X doesn't
-	 matter.  */
-      if (gimple_assign_rhs1 (assign) == rhs
-	  && gimple_assign_rhs2 (assign) == rhs
-	  && gimple_assign_rhs3 (assign) != rhs)
-	info->flags.ignore_sign = true;
-      break;
-
     case MULT_EXPR:
       /* In X * X, the sign of X doesn't matter.  */
       if (gimple_assign_rhs1 (assign) == rhs
Index: gcc/hsa-gen.c
===================================================================
--- gcc/hsa-gen.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/hsa-gen.c	2018-05-17 09:18:19.973942503 +0100
@@ -3178,23 +3178,6 @@ gen_hsa_insns_for_operation_assignment (
     case NEGATE_EXPR:
       opcode = BRIG_OPCODE_NEG;
       break;
-    case FMA_EXPR:
-      /* There is a native HSA instruction for scalar FMAs but not for vector
-	 ones.  */
-      if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE)
-	{
-	  hsa_op_reg *dest
-	    = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign));
-	  hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
-	  hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
-	  hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
-	  hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
-	  gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb);
-	  gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp, op3, hbb);
-	  return;
-	}
-      opcode = BRIG_OPCODE_MAD;
-      break;
     case MIN_EXPR:
       opcode = BRIG_OPCODE_MIN;
       break;
@@ -4490,6 +4473,57 @@ gen_hsa_divmod (gcall *call, hsa_bb *hbb
   insn->set_output_in_type (dest, 0, hbb);
 }
 
+/* Emit instructions that implement FMA, FMS, FNMA or FNMS call CALL.
+   Instructions are appended to basic block HBB.  NEGATE1 is true for
+   FNMA and FNMS.  NEGATE3 is true for FMS and FNMS.  */
+
+static void
+gen_hsa_fma (gcall *call, hsa_bb *hbb, bool negate1, bool negate3)
+{
+  tree lhs = gimple_call_lhs (call);
+  if (lhs == NULL_TREE)
+    return;
+
+  tree rhs1 = gimple_call_arg (call, 0);
+  tree rhs2 = gimple_call_arg (call, 1);
+  tree rhs3 = gimple_call_arg (call, 2);
+
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+  hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
+  hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
+  hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
+
+  if (negate1)
+    {
+      hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
+      gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op1, hbb);
+      op1 = tmp;
+    }
+
+  /* There is a native HSA instruction for scalar FMAs but not for vector
+     ones.  */
+  if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE)
+    {
+      hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
+      gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb);
+      gen_hsa_binary_operation (negate3 ? BRIG_OPCODE_SUB : BRIG_OPCODE_ADD,
+				dest, tmp, op3, hbb);
+    }
+  else
+    {
+      if (negate3)
+	{
+	  hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
+	  gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op3, hbb);
+	  op3 = tmp;
+	}
+      hsa_insn_basic *insn = new hsa_insn_basic (4, BRIG_OPCODE_MAD,
+						 dest->m_type, dest,
+						 op1, op2, op3);
+      hbb->append_insn (insn);
+    }
+}
+
 /* Set VALUE to a shadow kernel debug argument and append a new instruction
    to HBB basic block.  */
 
@@ -5224,6 +5258,22 @@ gen_hsa_insn_for_internal_fn_call (gcall
       gen_hsa_insns_for_call_of_internal_fn (stmt, hbb);
       break;
 
+    case IFN_FMA:
+      gen_hsa_fma (stmt, hbb, false, false);
+      break;
+
+    case IFN_FMS:
+      gen_hsa_fma (stmt, hbb, false, true);
+      break;
+
+    case IFN_FNMA:
+      gen_hsa_fma (stmt, hbb, true, false);
+      break;
+
+    case IFN_FNMS:
+      gen_hsa_fma (stmt, hbb, true, true);
+      break;
+
     default:
       HSA_SORRY_ATV (gimple_location (stmt),
 		     "support for HSA does not implement internal function: %s",
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	2018-05-16 12:48:59.410941892 +0100
+++ gcc/match.pd	2018-05-17 09:18:19.974942479 +0100
@@ -4702,3 +4702,60 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	|| wi::geu_p (wi::to_wide (@rpos),
 		      wi::to_wide (@ipos) + isize))
     (BIT_FIELD_REF @0 @rsize @rpos)))))
+
+(for fmas (FMA)
+ (simplify
+  (fmas:c (negate @0) @1 @2)
+  (IFN_FNMA @0 @1 @2))
+ (simplify
+  (fmas @0 @1 (negate @2))
+  (IFN_FMS @0 @1 @2))
+ (simplify
+  (fmas:c (negate @0) @1 (negate @2))
+  (IFN_FNMS @0 @1 @2))
+ (simplify
+  (negate (fmas@3 @0 @1 @2))
+  (if (single_use (@3))
+   (IFN_FNMS @0 @1 @2))))
+
+(simplify
+ (IFN_FMS:c (negate @0) @1 @2)
+ (IFN_FNMS @0 @1 @2))
+(simplify
+ (IFN_FMS @0 @1 (negate @2))
+ (IFN_FMA @0 @1 @2))
+(simplify
+ (IFN_FMS:c (negate @0) @1 (negate @2))
+ (IFN_FNMA @0 @1 @2))
+(simplify
+ (negate (IFN_FMS@3 @0 @1 @2))
+ (if (single_use (@3))
+  (IFN_FNMA @0 @1 @2)))
+
+(simplify
+ (IFN_FNMA:c (negate @0) @1 @2)
+ (IFN_FMA @0 @1 @2))
+(simplify
+ (IFN_FNMA @0 @1 (negate @2))
+ (IFN_FNMS @0 @1 @2))
+(simplify
+ (IFN_FNMA:c (negate @0) @1 (negate @2))
+ (IFN_FMS @0 @1 @2))
+(simplify
+ (negate (IFN_FNMA@3 @0 @1 @2))
+ (if (single_use (@3))
+  (IFN_FMS @0 @1 @2)))
+
+(simplify
+ (IFN_FNMS:c (negate @0) @1 @2)
+ (IFN_FMS @0 @1 @2))
+(simplify
+ (IFN_FNMS @0 @1 (negate @2))
+ (IFN_FNMA @0 @1 @2))
+(simplify
+ (IFN_FNMS:c (negate @0) @1 (negate @2))
+ (IFN_FMA @0 @1 @2))
+(simplify
+ (negate (IFN_FNMS@3 @0 @1 @2))
+ (if (single_use (@3))
+  (IFN_FMA @0 @1 @2)))
Index: gcc/gimple-fold.h
===================================================================
--- gcc/gimple-fold.h	2018-05-17 09:17:32.876478942 +0100
+++ gcc/gimple-fold.h	2018-05-17 09:18:19.962942761 +0100
@@ -41,6 +41,7 @@ extern bool arith_overflowed_p (enum tre
 				const_tree);
 extern tree no_follow_ssa_edges (tree);
 extern tree follow_single_use_edges (tree);
+extern tree follow_all_ssa_edges (tree);
 extern tree gimple_fold_stmt_to_constant_1 (gimple *, tree (*) (tree),
 					    tree (*) (tree) = no_follow_ssa_edges);
 extern tree gimple_fold_stmt_to_constant (gimple *, tree (*) (tree));
Index: gcc/gimple-fold.c
===================================================================
--- gcc/gimple-fold.c	2018-05-17 09:17:32.876478942 +0100
+++ gcc/gimple-fold.c	2018-05-17 09:18:19.961942784 +0100
@@ -4967,6 +4967,14 @@ follow_single_use_edges (tree val)
   return val;
 }
 
+/* Valueization callback that follows all SSA edges.  */
+
+tree
+follow_all_ssa_edges (tree val)
+{
+  return val;
+}
+
 /* Fold the statement pointed to by GSI.  In some cases, this function may
    replace the whole statement with a new one.  Returns true iff folding
    makes any changes.
Index: gcc/tree-ssa-math-opts.c
===================================================================
--- gcc/tree-ssa-math-opts.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/tree-ssa-math-opts.c	2018-05-17 09:18:19.982942291 +0100
@@ -2650,7 +2650,7 @@ convert_mult_to_fma_1 (tree mul_result,
   tree type = TREE_TYPE (mul_result);
   gimple *use_stmt;
   imm_use_iterator imm_iter;
-  gassign *fma_stmt;
+  gcall *fma_stmt;
 
   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, mul_result)
     {
@@ -2658,6 +2658,7 @@ convert_mult_to_fma_1 (tree mul_result,
       enum tree_code use_code;
       tree addop, mulop1 = op1, result = mul_result;
       bool negate_p = false;
+      gimple_seq seq = NULL;
 
       if (is_gimple_debug (use_stmt))
 	continue;
@@ -2683,11 +2684,7 @@ convert_mult_to_fma_1 (tree mul_result,
 	  addop = gimple_assign_rhs2 (use_stmt);
 	  /* a * b - c -> a * b + (-c)  */
 	  if (gimple_assign_rhs_code (use_stmt) == MINUS_EXPR)
-	    addop = force_gimple_operand_gsi (&gsi,
-					      build1 (NEGATE_EXPR,
-						      type, addop),
-					      true, NULL_TREE, true,
-					      GSI_SAME_STMT);
+	    addop = gimple_build (&seq, NEGATE_EXPR, type, addop);
 	}
       else
 	{
@@ -2698,23 +2695,26 @@ convert_mult_to_fma_1 (tree mul_result,
 	}
 
       if (negate_p)
-	mulop1 = force_gimple_operand_gsi (&gsi,
-					   build1 (NEGATE_EXPR,
-						   type, mulop1),
-					   true, NULL_TREE, true,
-					   GSI_SAME_STMT);
+	mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1);
 
-      fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt),
-				      FMA_EXPR, mulop1, op2, addop);
+      if (seq)
+	gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+      fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
+      gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt));
+      gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (use_stmt));
+      gsi_replace (&gsi, fma_stmt, true);
+      /* Follow all SSA edges so that we generate FMS, FNMA and FNMS
+	 regardless of where the negation occurs.  */
+      if (fold_stmt (&gsi, follow_all_ssa_edges))
+	update_stmt (gsi_stmt (gsi));
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "Generated FMA ");
-	  print_gimple_stmt (dump_file, fma_stmt, 0, 0);
+	  print_gimple_stmt (dump_file, gsi_stmt (gsi), 0, 0);
 	  fprintf (dump_file, "\n");
 	}
 
-      gsi_replace (&gsi, fma_stmt, true);
       widen_mul_stats.fmas_inserted++;
     }
 }
@@ -2862,7 +2862,8 @@ convert_mult_to_fma (gimple *mul_stmt, t
 
   /* If the target doesn't support it, don't generate it.  We assume that
      if fma isn't available then fms, fnma or fnms are not either.  */
-  if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
+  optimization_type opt_type = bb_optimization_type (gimple_bb (mul_stmt));
+  if (!direct_internal_fn_supported_p (IFN_FMA, type, opt_type))
     return false;
 
   /* If the multiplication has zero uses, it is kept around probably because
@@ -2958,8 +2959,8 @@ convert_mult_to_fma (gimple *mul_stmt, t
 	 that a mult / subtract pair.  */
       if (use_code == MINUS_EXPR && !negate_p
 	  && gimple_assign_rhs1 (use_stmt) == result
-	  && optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing
-	  && optab_handler (fnma_optab, TYPE_MODE (type)) != CODE_FOR_nothing)
+	  && !direct_internal_fn_supported_p (IFN_FMS, type, opt_type)
+	  && direct_internal_fn_supported_p (IFN_FNMA, type, opt_type))
 	{
 	  tree rhs2 = gimple_assign_rhs2 (use_stmt);
 
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	2018-05-16 12:49:10.501715913 +0100
+++ gcc/config/i386/i386.c	2018-05-17 09:18:19.949943066 +0100
@@ -50614,19 +50614,20 @@ ix86_add_stmt_cost (void *data, int coun
   tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
   int stmt_cost = - 1;
 
+  bool fp = false;
+  machine_mode mode = TImode;
+
+  if (vectype != NULL)
+    {
+      fp = FLOAT_TYPE_P (vectype);
+      mode = TYPE_MODE (vectype);
+    }
+
   if ((kind == vector_stmt || kind == scalar_stmt)
       && stmt_info
       && stmt_info->stmt && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN)
     {
       tree_code subcode = gimple_assign_rhs_code (stmt_info->stmt);
-      bool fp = false;
-      machine_mode mode = TImode;
-
-      if (vectype != NULL)
-	{
-	  fp = FLOAT_TYPE_P (vectype);
-	  mode = TYPE_MODE (vectype);
-	}
       /*machine_mode inner_mode = mode;
       if (VECTOR_MODE_P (mode))
 	inner_mode = GET_MODE_INNER (mode);*/
@@ -50657,12 +50658,6 @@ ix86_add_stmt_cost (void *data, int coun
 	case MULT_HIGHPART_EXPR:
 	  stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
 	  break;
-	case FMA_EXPR:
-          stmt_cost = ix86_vec_cost (mode,
-				     mode == SFmode ? ix86_cost->fmass
-				     : ix86_cost->fmasd,
-				     true);
-	  break;
 	case NEGATE_EXPR:
 	  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
 	    stmt_cost = ix86_cost->sse_op;
@@ -50725,6 +50720,24 @@ ix86_add_stmt_cost (void *data, int coun
 	  break;
 	}
     }
+
+  combined_fn cfn;
+  if ((kind == vector_stmt || kind == scalar_stmt)
+      && stmt_info
+      && stmt_info->stmt
+      && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST)
+    switch (cfn)
+      {
+      case CFN_FMA:
+	stmt_cost = ix86_vec_cost (mode,
+				   mode == SFmode ? ix86_cost->fmass
+				   : ix86_cost->fmasd,
+				   true);
+	break;
+      default:
+	break;
+      }
+
   /* If we do elementwise loads into a vector then we are bound by
      latency and execution resources for the many scalar loads
      (AGU and load ports).  Try to account for this by scaling the
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/config/rs6000/rs6000.c	2018-05-17 09:18:19.953942972 +0100
@@ -15880,7 +15880,9 @@ rs6000_gimple_fold_builtin (gimple_stmt_
 	arg1 = gimple_call_arg (stmt, 1);
 	tree arg2 = gimple_call_arg (stmt, 2);
 	lhs = gimple_call_lhs (stmt);
-	gimple *g = gimple_build_assign (lhs, FMA_EXPR, arg0, arg1, arg2);
+	gcall *g = gimple_build_call_internal (IFN_FMA, 3, arg0, arg1, arg2);
+	gimple_call_set_lhs (g, lhs);
+	gimple_call_set_nothrow (g, true);
 	gimple_set_location (g, gimple_location (stmt));
 	gsi_replace (gsi, g, true);
 	return true;
Index: gcc/brig/brigfrontend/brig-function.cc
===================================================================
--- gcc/brig/brigfrontend/brig-function.cc	2018-05-16 12:48:59.410941892 +0100
+++ gcc/brig/brigfrontend/brig-function.cc	2018-05-17 09:18:19.936943371 +0100
@@ -1218,6 +1218,7 @@ brig_function::get_builtin_for_hsa_opcod
     case BRIG_OPCODE_NEXP2:
       builtin = mathfn_built_in (builtin_type, BUILT_IN_EXP2);
       break;
+    case BRIG_OPCODE_FMA:
     case BRIG_OPCODE_NFMA:
       builtin = mathfn_built_in (builtin_type, BUILT_IN_FMA);
       break;
@@ -1460,8 +1461,6 @@ brig_function::get_tree_code_for_hsa_opc
 	return CALL_EXPR;
       else
 	return MAX_EXPR;
-    case BRIG_OPCODE_FMA:
-      return FMA_EXPR;
     case BRIG_OPCODE_ABS:
       return ABS_EXPR;
     case BRIG_OPCODE_SHL:
@@ -1496,6 +1495,7 @@ brig_function::get_tree_code_for_hsa_opc
       /* Implement as 1/f (x).  gcc should pattern detect that and
 	 use a native instruction, if available, for it.  */
       return TREE_LIST;
+    case BRIG_OPCODE_FMA:
     case BRIG_OPCODE_FLOOR:
     case BRIG_OPCODE_CEIL:
     case BRIG_OPCODE_SQRT:
Index: gcc/c/gimple-parser.c
===================================================================
--- gcc/c/gimple-parser.c	2018-05-17 09:17:58.756608780 +0100
+++ gcc/c/gimple-parser.c	2018-05-17 09:18:19.939943300 +0100
@@ -952,27 +952,6 @@ c_parser_gimple_postfix_expression (c_pa
 	      expr.value = fold_convert (type, val);
 	      return expr;
 	    }
-	  else if (strcmp (IDENTIFIER_POINTER (id), "__FMA") == 0)
-	    {
-	      c_parser_consume_token (parser);
-	      auto_vec<tree> args;
-
-	      if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
-		{
-		  c_parser_gimple_expr_list (parser, &args);
-		  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
-					     "expected %<)%>");
-		}
-	      if (args.length () != 3)
-		{
-		  error_at (loc, "invalid number of operands to __FMA");
-		  expr.value = error_mark_node;
-		  return expr;
-		}
-	      expr.value = build3_loc (loc, FMA_EXPR, TREE_TYPE (args[0]),
-				       args[0], args[1], args[2]);
-	      return expr;
-	    }
 
 	  /* SSA name.  */
 	  unsigned version, ver_offset;
Index: gcc/cp/constexpr.c
===================================================================
--- gcc/cp/constexpr.c	2018-05-16 12:49:10.203809793 +0100
+++ gcc/cp/constexpr.c	2018-05-17 09:18:19.954942948 +0100
@@ -4574,7 +4574,6 @@ cxx_eval_constant_expression (const cons
 			     non_constant_p, overflow_p);
       break;
 
-    case FMA_EXPR:
     case VEC_PERM_EXPR:
       r = cxx_eval_trinary_expression (ctx, t, lval,
 				       non_constant_p, overflow_p);
@@ -5999,7 +5998,6 @@ #define RECUR(T,RV) \
 	  return false;
       return true;
 
-    case FMA_EXPR:
     case VEC_PERM_EXPR:
      for (i = 0; i < 3; ++i)
       if (!RECUR (TREE_OPERAND (t, i), true))
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	2018-05-16 12:48:59.410941892 +0100
+++ gcc/testsuite/lib/target-supports.exp	2018-05-17 09:18:19.978942385 +0100
@@ -2879,6 +2879,13 @@ proc check_effective_target_base_quadflo
     return 1
 }
 
+# Return 1 if the target supports all four forms of fused multiply-add
+# (fma, fms, fnma, and fnms) for both float and double.
+
+proc check_effective_target_scalar_all_fma { } {
+    return [istarget aarch64*-*-*]
+}
+
 # Return 1 if the target supports compiling fixed-point,
 # 0 otherwise.
 
Index: gcc/testsuite/gcc.dg/fma-1.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-1.c	2018-05-17 09:18:19.975942456 +0100
@@ -0,0 +1,15 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return a * b + c;
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return a * b + c;
+}
+
+/* { dg-final { scan-tree-dump-times { = \.FMA \(} 2 "widening_mul" { target scalar_all_fma } } } */
Index: gcc/testsuite/gcc.dg/fma-2.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-2.c	2018-05-17 09:18:19.975942456 +0100
@@ -0,0 +1,15 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return a * b - c;
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return a * b - c;
+}
+
+/* { dg-final { scan-tree-dump-times { = \.FMS \(} 2 "widening_mul" { target scalar_all_fma } } } */
Index: gcc/testsuite/gcc.dg/fma-3.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-3.c	2018-05-17 09:18:19.975942456 +0100
@@ -0,0 +1,15 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return c - a * b;
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return c - a * b;
+}
+
+/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 2 "widening_mul" { target scalar_all_fma } } } */
Index: gcc/testsuite/gcc.dg/fma-4.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-4.c	2018-05-17 09:18:19.975942456 +0100
@@ -0,0 +1,15 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return -(a * b) - c;
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return -(a * b) - c;
+}
+
+/* { dg-final { scan-tree-dump-times { = \.FNMS \(} 2 "widening_mul" { target scalar_all_fma } } } */
Index: gcc/testsuite/gcc.dg/fma-5.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-5.c	2018-05-17 09:18:19.976942432 +0100
@@ -0,0 +1,53 @@
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return __builtin_fmaf (a, b, -c);
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return __builtin_fma (a, b, -c);
+}
+
+void
+f3 (float a, float b, float c, float d, float e, float *res)
+{
+  res[0] = __builtin_fmaf (a, b, -e);
+  res[1] = __builtin_fmaf (c, d, -e);
+}
+
+void
+f4 (double a, double b, double c, double d, double e, double *res)
+{
+  res[0] = __builtin_fma (a, b, -e);
+  res[1] = __builtin_fma (c, d, -e);
+}
+
+float
+f5 (float a, float b, float c)
+{
+  return -__builtin_fmaf (-a, b, c);
+}
+
+double
+f6 (double a, double b, double c)
+{
+  return -__builtin_fma (-a, b, c);
+}
+
+float
+f7 (float a, float b, float c)
+{
+  return -__builtin_fmaf (a, -b, c);
+}
+
+double
+f8 (double a, double b, double c)
+{
+  return -__builtin_fma (a, -b, c);
+}
+
+/* { dg-final { scan-tree-dump-times { = \.FMS \(} 10 "optimized" { target scalar_all_fma } } } */
Index: gcc/testsuite/gcc.dg/fma-6.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-6.c	2018-05-17 09:18:19.976942432 +0100
@@ -0,0 +1,67 @@
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return __builtin_fmaf (-a, b, c);
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return __builtin_fma (-a, b, c);
+}
+
+float
+f3 (float a, float b, float c)
+{
+  return __builtin_fmaf (a, -b, c);
+}
+
+double
+f4 (double a, double b, double c)
+{
+  return __builtin_fma (a, -b, c);
+}
+
+void
+f5 (float a, float b, float c, float d, float e, float *res)
+{
+  res[0] = __builtin_fmaf (-a, b, c);
+  res[1] = __builtin_fmaf (-a, d, e);
+}
+
+void
+f6 (double a, double b, double c, double d, double e, double *res)
+{
+  res[0] = __builtin_fma (-a, b, c);
+  res[1] = __builtin_fma (-a, d, e);
+}
+
+void
+f7 (float a, float b, float c, float d, float e, float *res)
+{
+  res[0] = __builtin_fmaf (a, -b, c);
+  res[1] = __builtin_fmaf (d, -b, e);
+}
+
+void
+f8 (double a, double b, double c, double d, double e, double *res)
+{
+  res[0] = __builtin_fma (a, -b, c);
+  res[1] = __builtin_fma (d, -b, e);
+}
+
+float
+f9 (float a, float b, float c)
+{
+  return -__builtin_fmaf (a, b, -c);
+}
+
+double
+f10 (double a, double b, double c)
+{
+  return -__builtin_fma (a, b, -c);
+}
+
+/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 14 "optimized" { target scalar_all_fma } } } */
Index: gcc/testsuite/gcc.dg/fma-7.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-7.c	2018-05-17 09:18:19.976942432 +0100
@@ -0,0 +1,67 @@
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return __builtin_fmaf (-a, b, -c);
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return __builtin_fma (-a, b, -c);
+}
+
+float
+f3 (float a, float b, float c)
+{
+  return __builtin_fmaf (a, -b, -c);
+}
+
+double
+f4 (double a, double b, double c)
+{
+  return __builtin_fma (a, -b, -c);
+}
+
+void
+f5 (float a, float b, float c, float d, float *res)
+{
+  res[0] = __builtin_fmaf (-a, b, -c);
+  res[1] = __builtin_fmaf (-a, d, -c);
+}
+
+void
+f6 (double a, double b, double c, double d, double *res)
+{
+  res[0] = __builtin_fma (-a, b, -c);
+  res[1] = __builtin_fma (-a, d, -c);
+}
+
+void
+f7 (float a, float b, float c, float d, float *res)
+{
+  res[0] = __builtin_fmaf (a, -b, -c);
+  res[1] = __builtin_fmaf (d, -b, -c);
+}
+
+void
+f8 (double a, double b, double c, double d, double *res)
+{
+  res[0] = __builtin_fma (a, -b, -c);
+  res[1] = __builtin_fma (d, -b, -c);
+}
+
+float
+f9 (float a, float b, float c)
+{
+  return -__builtin_fmaf (a, b, c);
+}
+
+double
+f10 (double a, double b, double c)
+{
+  return -__builtin_fma (a, b, c);
+}
+
+/* { dg-final { scan-tree-dump-times { = \.FNMS \(} 14 "optimized" { target scalar_all_fma } } } */
Index: gcc/testsuite/gcc.dg/gimplefe-26.c
===================================================================
--- gcc/testsuite/gcc.dg/gimplefe-26.c	2018-05-16 12:48:59.410941892 +0100
+++ gcc/testsuite/gcc.dg/gimplefe-26.c	2018-05-17 09:18:19.976942432 +0100
@@ -1,16 +1,15 @@
-/* { dg-do compile { target c99_runtime } } */
+/* { dg-do compile { target scalar_all_fma } } */
 /* { dg-options "-O -fgimple -fdump-tree-ssa-gimple" } */
 
 #define foo(type, num) \
 type __GIMPLE () foo_##num (type a, type b, type c) \
 { \
   type t0; \
-  t0_1 = __FMA (a, b, c); \
+  t0_1 = .FMA (a, b, c); \
   return t0_1; \
 }
 
 foo(float, 1)
 foo(double, 2)
-foo(long double, 3)
 
-/* { dg-final { scan-tree-dump-times "__FMA" 3 "ssa" } } */
+/* { dg-final { scan-tree-dump-times {\.FMA} 2 "ssa" } } */
Index: gcc/testsuite/gfortran.dg/reassoc_7.f
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_7.f	2018-05-16 12:48:59.410941892 +0100
+++ gcc/testsuite/gfortran.dg/reassoc_7.f	2018-05-17 09:18:19.976942432 +0100
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }
 
       SUBROUTINE S55199(P,Dvdph)
       implicit none
Index: gcc/testsuite/gfortran.dg/reassoc_8.f
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_8.f	2018-05-16 12:48:59.410941892 +0100
+++ gcc/testsuite/gfortran.dg/reassoc_8.f	2018-05-17 09:18:19.977942409 +0100
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }
 
       SUBROUTINE S55199(P,Dvdph)
       implicit none
Index: gcc/testsuite/gfortran.dg/reassoc_9.f
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_9.f	2018-05-16 12:48:59.410941892 +0100
+++ gcc/testsuite/gfortran.dg/reassoc_9.f	2018-05-17 09:18:19.977942409 +0100
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }
 
       SUBROUTINE S55199(P,Dvdph)
       implicit none
Index: gcc/testsuite/gfortran.dg/reassoc_10.f
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_10.f	2018-05-16 12:48:59.410941892 +0100
+++ gcc/testsuite/gfortran.dg/reassoc_10.f	2018-05-17 09:18:19.976942432 +0100
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }
 
       SUBROUTINE S55199(P,Q,Dvdph)
       implicit none
Richard Biener May 17, 2018, 10:49 a.m. | #8
On Thu, May 17, 2018 at 10:56 AM Richard Sandiford <
richard.sandiford@linaro.org> wrote:

> Richard Biener <richard.guenther@gmail.com> writes:
> >> @@ -2698,23 +2703,26 @@ convert_mult_to_fma_1 (tree mul_result,
> >>          }
> >
> >>         if (negate_p)
> >> -       mulop1 = force_gimple_operand_gsi (&gsi,
> >> -                                          build1 (NEGATE_EXPR,
> >> -                                                  type, mulop1),
> >> -                                          true, NULL_TREE, true,
> >> -                                          GSI_SAME_STMT);
> >> +       mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1);
> >
> >> -      fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt),
> >> -                                     FMA_EXPR, mulop1, op2, addop);
> >> +      if (seq)
> >> +       gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
> >> +      fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
> >> +      gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt));
> >> +      gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (use_stmt));
> >> +      gsi_replace (&gsi, fma_stmt, true);
> >> +      /* Valueize aggressively so that we generate FMS, FNMA and FNMS
> >> +        regardless of where the negation occurs.  */
> >> +      if (fold_stmt (&gsi, aggressive_valueize))
> >> +       update_stmt (gsi_stmt (gsi));
> >
> > I think it would be nice to be able to use gimple_build () with IFNs so
> > you can gimple_build () the IFN and then use gsi_replace_with_seq () on
> > it.  You only need to fold with generated negates, not with negates
> > already in the IL?  Then the folding implied with gimple_build will take
> > care of it.

> The idea was to pick up existing negates that feed the multiplication
> as well as any added by the pass itself.

> On IRC yesterday we talked about how this should handle the ECF_NOTHROW
> flag, and whether things like IFN_SQRT and IFN_FMA should always be
> nothrow (like the built-in functions are).  But in the end I thought
> it'd be better to keep things as they are.  We already handle
> -fnon-call-exceptions for unfused a * b + c and before the patch also
> handled it for FMA_EXPR.  It'd seem like a step backwards if the new
> internal functions didn't handle it too.  If anything it seems like the
> built-in functions should change to be closer to the tree_code and
> internal_fn way of doing things, if we want to support
> -fnon-call-exceptions properly.

Right.  -fnon-call-exceptions isn't very well tested outside of Ada which
must have its own builtin declarations.

> This also surprised me when doing the if-conversion patch I sent
> yesterday.  We're happy to vectorise:

>    for (int i = 0; i < 100; ++i)
>      x[i] = ... ? sqrt (x[i]) : 0;

> by doing the sqrt unconditionally and selecting on the result, even with
> the default maths flags, but refuse to vectorise the simpler:

>    for (int i = 0; i < 100; ++i)
>      x[i] = ... ? x[i] + 1 : 0;

> in the same way.

Heh.
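
For reference, "doing the operation unconditionally and selecting on the
result" can be sketched by hand for the second quoted loop.  This is an
illustration only, not what the vectoriser emits: the elided condition is
replaced by a hypothetical per-element flag array, and the function names
are invented for the example.

```c
/* Branchy form of the second quoted loop.  The elided condition is
   replaced by a hypothetical per-element flag array COND.  */
void
add_one_branchy (double *x, const int *cond, int n)
{
  for (int i = 0; i < n; ++i)
    x[i] = cond[i] ? x[i] + 1 : 0;
}

/* Hand-written if-converted form: perform the addition for every element
   and then select between the result and zero.  */
void
add_one_select (double *x, const int *cond, int n)
{
  for (int i = 0; i < n; ++i)
    {
      double sum = x[i] + 1;      /* Unconditional operation.  */
      x[i] = cond[i] ? sum : 0;   /* Select on the result.  */
    }
}
```

Both functions store the same values; the second form only differs in
that the addition is also executed for elements whose result is discarded.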

> > Otherwise can you please move aggressive_valueize to gimple-fold.[ch]
> > alongside no_follow_ssa_edges / follow_single_use_edges and maybe
> > rename it as follow_all_ssa_edges?

> Ah, yeah, that's definitely a better name.

> I also renamed all_scalar_fma to scalar_all_fma, since I realised
> after Andrew's reply that the old name made it sound like it was
> "all scalars", whereas it was meant to mean "all fmas".

> Tested as before.

OK.

Thanks,
Richard.

> Thanks,
> Richard

> 2018-05-17  Richard Sandiford  <richard.sandiford@linaro.org>

> gcc/
>          * doc/sourcebuild.texi (scalar_all_fma): Document.
>          * tree.def (FMA_EXPR): Delete.
>          * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions.
>          * internal-fn.c (ternary_direct): New macro.
>          (expand_ternary_optab_fn): Likewise.
>          (direct_ternary_optab_supported_p): Likewise.
>          * Makefile.in (build/genmatch.o): Depend on case-cfn-macros.h.
>          * builtins.c (fold_builtin_fma): Delete.
>          (fold_builtin_3): Don't call it.
>          * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling.
>          * expr.c (expand_expr_real_2): Likewise.
>          * fold-const.c (operand_equal_p): Likewise.
>          (fold_ternary_loc): Likewise.
>          * gimple-pretty-print.c (dump_ternary_rhs): Likewise.
>          * gimple.c (DEFTREECODE): Likewise.
>          * gimplify.c (gimplify_expr): Likewise.
>          * optabs-tree.c (optab_for_tree_code): Likewise.
>          * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
>          * tree-eh.c (operation_could_trap_p): Likewise.
>          (stmt_could_throw_1_p): Likewise.
>          * tree-inline.c (estimate_operator_cost): Likewise.
>          * tree-pretty-print.c (dump_generic_node): Likewise.
>          (op_code_prio): Likewise.
>          * tree-ssa-loop-im.c (stmt_cost): Likewise.
>          * tree-ssa-operands.c (get_expr_operands): Likewise.
>          * tree.c (commutative_ternary_tree_code, add_expr): Likewise.
>          * fold-const-call.h (fold_fma): Delete.
>          * fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS,
>          CFN_FNMA and CFN_FNMS.
>          (fold_fma): Delete.
>          * genmatch.c (combined_fn): New enum.
>          (commutative_ternary_tree_code): Remove FMA_EXPR handling.
>          (commutative_op): New function.
>          (commutate): Use it.  Handle more than 2 operands.
>          (dt_operand::gen_gimple_expr): Use commutative_op.
>          (parser::parse_expr): Allow :c to be used with non-binary
>          operators if the commutative operand is known.
>          * gimple-ssa-backprop.c (backprop::process_builtin_call_use):
>          Handle CFN_FMS, CFN_FNMA and CFN_FNMS.
>          (backprop::process_assign_use): Remove FMA_EXPR handling.
>          * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise.
>          (gen_hsa_fma): New function.
>          (gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS,
>          IFN_FNMA and IFN_FNMS.
>          * match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS.
>          * gimple-fold.h (follow_all_ssa_edges): Declare.
>          * gimple-fold.c (follow_all_ssa_edges): New function.
>          * tree-ssa-math-opts.c (convert_mult_to_fma_1): Use the
>          gimple_build interface and use follow_all_ssa_edges to fold the
>          result.
>          (convert_mult_to_fma): Use direct_internal_fn_supported_p
>          instead of checking for optabs directly.
>          * config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls
>          rather than FMA_EXPRs.
>          * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a
>          call to IFN_FMA instead of an FMA_EXPR.

> gcc/brig/
>          * brigfrontend/brig-function.cc
>          (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA
>          for BRIG_OPCODE_FMA.
>          (brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA
>          as a call.

> gcc/c/
>          * gimple-parser.c (c_parser_gimple_postfix_expression): Remove
>          __FMA_EXPR handling.

> gcc/cp/
>          * constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR
>          handling.
>          (potential_constant_expression_1): Likewise.

> gcc/testsuite/
>          * lib/target-supports.exp (check_effective_target_scalar_all_fma):
>          New proc.
>          * gcc.dg/fma-1.c: New test.
>          * gcc.dg/fma-2.c: Likewise.
>          * gcc.dg/fma-3.c: Likewise.
>          * gcc.dg/fma-4.c: Likewise.
>          * gcc.dg/fma-5.c: Likewise.
>          * gcc.dg/fma-6.c: Likewise.
>          * gcc.dg/fma-7.c: Likewise.
>          * gcc.dg/gimplefe-26.c: Use .FMA instead of __FMA and require
>          scalar_all_fma.
>          * gfortran.dg/reassoc_7.f: Pass -ffp-contract=off.
>          * gfortran.dg/reassoc_8.f: Likewise.
>          * gfortran.dg/reassoc_9.f: Likewise.
>          * gfortran.dg/reassoc_10.f: Likewise.

> Index: gcc/doc/sourcebuild.texi
> ===================================================================
> --- gcc/doc/sourcebuild.texi    2018-05-16 12:48:59.410941892 +0100
> +++ gcc/doc/sourcebuild.texi    2018-05-17 09:18:19.954942948 +0100
> @@ -2251,6 +2251,11 @@ Target supports @option{-pie}, @option{-
>   @item rdynamic
>   Target supports @option{-rdynamic}.

> +@item scalar_all_fma
> +Target supports all four fused multiply-add optabs for both @code{float}
> +and @code{double}.  These optabs are: @code{fma_optab}, @code{fms_optab},
> +@code{fnma_optab} and @code{fnms_optab}.
> +
>   @item section_anchors
>   Target supports section anchors.

> Index: gcc/tree.def
> ===================================================================
> --- gcc/tree.def        2018-05-16 12:48:59.410941892 +0100
> +++ gcc/tree.def        2018-05-17 09:18:19.987942174 +0100
> @@ -1345,12 +1345,6 @@ DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "wid
>      by the second argument.  */
>   DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2)

> -/* Fused multiply-add.
> -   All operands and the result are of the same type.  No intermediate
> -   rounding is performed after multiplying operand one with operand two
> -   before adding operand three.  */
> -DEFTREECODE (FMA_EXPR, "fma_expr", tcc_expression, 3)
> -
>   /* Widening vector multiplication.
>      The two operands are vectors with N elements of size S. Multiplying the
>      elements of the two vectors will result in N products of size 2*S.
> Index: gcc/internal-fn.def
> ===================================================================
> --- gcc/internal-fn.def 2018-05-16 12:48:59.410941892 +0100
> +++ gcc/internal-fn.def 2018-05-17 09:18:19.974942479 +0100
> @@ -57,6 +57,7 @@ along with GCC; see the file COPYING3.

>      - unary: a normal unary optab, such as vec_reverse_<mode>
>      - binary: a normal binary optab, such as vec_interleave_lo_<mode>
> +   - ternary: a normal ternary optab, such as fma<mode>4

>      - cond_binary: a conditional binary optab, such as add<mode>cc

> @@ -138,6 +139,10 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CO
>   DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW,
>                         vec_shl_insert, binary)

> +DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
> +DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
> +DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
> +
>   DEF_INTERNAL_OPTAB_FN (COND_ADD, ECF_CONST, cond_add, cond_binary)
>   DEF_INTERNAL_OPTAB_FN (COND_SUB, ECF_CONST, cond_sub, cond_binary)
>   DEF_INTERNAL_SIGNED_OPTAB_FN (COND_MIN, ECF_CONST, first,
> @@ -218,6 +223,9 @@ DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS
>   /* FP scales.  */
>   DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)

> +/* Ternary math functions.  */
> +DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary)
> +
>   /* Unary integer ops.  */
>   DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary)
>   DEF_INTERNAL_INT_FN (CLZ, ECF_CONST | ECF_NOTHROW, clz, unary)
> Index: gcc/internal-fn.c
> ===================================================================
> --- gcc/internal-fn.c   2018-05-17 09:17:58.757608747 +0100
> +++ gcc/internal-fn.c   2018-05-17 09:18:19.974942479 +0100
> @@ -110,6 +110,7 @@ #define mask_store_lanes_direct { 0, 0,
>   #define scatter_store_direct { 3, 3, false }
>   #define unary_direct { 0, 0, true }
>   #define binary_direct { 0, 0, true }
> +#define ternary_direct { 0, 0, true }
>   #define cond_unary_direct { 1, 1, true }
>   #define cond_binary_direct { 1, 1, true }
>   #define while_direct { 0, 2, false }
> @@ -2982,6 +2983,9 @@ #define expand_unary_optab_fn(FN, STMT,
>   #define expand_binary_optab_fn(FN, STMT, OPTAB) \
>     expand_direct_optab_fn (FN, STMT, OPTAB, 2)

> +#define expand_ternary_optab_fn(FN, STMT, OPTAB) \
> +  expand_direct_optab_fn (FN, STMT, OPTAB, 3)
> +
>   #define expand_cond_unary_optab_fn(FN, STMT, OPTAB) \
>     expand_direct_optab_fn (FN, STMT, OPTAB, 2)

> @@ -3067,6 +3071,7 @@ multi_vector_optab_supported_p (convert_

>   #define direct_unary_optab_supported_p direct_optab_supported_p
>   #define direct_binary_optab_supported_p direct_optab_supported_p
> +#define direct_ternary_optab_supported_p direct_optab_supported_p
>   #define direct_cond_unary_optab_supported_p direct_optab_supported_p
>   #define direct_cond_binary_optab_supported_p direct_optab_supported_p
>   #define direct_mask_load_optab_supported_p direct_optab_supported_p
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in     2018-05-16 12:49:10.495216143 +0100
> +++ gcc/Makefile.in     2018-05-17 09:18:19.936943371 +0100
> @@ -2787,7 +2787,7 @@ build/genmddump.o : genmddump.c $(RTL_BA
>     $(CORETYPES_H) $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H)
>   build/genmatch.o : genmatch.c $(BCONFIG_H) $(SYSTEM_H) \
>     $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h \
> -  tree.def builtins.def internal-fn.def
> +  tree.def builtins.def internal-fn.def case-cfn-macros.h
>   build/gencfn-macros.o : gencfn-macros.c $(BCONFIG_H) $(SYSTEM_H)       \
>     $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-set.h builtins.def      \
>     internal-fn.def
> Index: gcc/builtins.c
> ===================================================================
> --- gcc/builtins.c      2018-05-16 12:48:59.410941892 +0100
> +++ gcc/builtins.c      2018-05-17 09:18:19.939943300 +0100
> @@ -8340,21 +8340,6 @@ fold_builtin_abs (location_t loc, tree a
>     return fold_build1_loc (loc, ABS_EXPR, type, arg);
>   }

> -/* Fold a call to fma, fmaf, or fmal with arguments ARG[012].  */
> -
> -static tree
> -fold_builtin_fma (location_t loc, tree arg0, tree arg1, tree arg2, tree type)
> -{
> -  /* ??? Only expand to FMA_EXPR if it's directly supported.  */
> -  if (validate_arg (arg0, REAL_TYPE)
> -      && validate_arg (arg1, REAL_TYPE)
> -      && validate_arg (arg2, REAL_TYPE)
> -      && optab_handler (fma_optab, TYPE_MODE (type)) != CODE_FOR_nothing)
> -    return fold_build3_loc (loc, FMA_EXPR, type, arg0, arg1, arg2);
> -
> -  return NULL_TREE;
> -}
> -
>   /* Fold a call to builtin carg(a+bi) -> atan2(b,a).  */

>   static tree
> @@ -9260,10 +9245,6 @@ fold_builtin_3 (location_t loc, tree fnd
>       CASE_FLT_FN (BUILT_IN_SINCOS):
>         return fold_builtin_sincos (loc, arg0, arg1, arg2);

> -    CASE_FLT_FN (BUILT_IN_FMA):
> -    CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA):
> -      return fold_builtin_fma (loc, arg0, arg1, arg2, type);
> -
>       CASE_FLT_FN (BUILT_IN_REMQUO):
>         if (validate_arg (arg0, REAL_TYPE)
>            && validate_arg (arg1, REAL_TYPE)
> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c     2018-05-16 12:48:59.410941892 +0100
> +++ gcc/cfgexpand.c     2018-05-17 09:18:19.941943253 +0100
> @@ -4202,7 +4202,6 @@ expand_debug_expr (tree exp)
>          case SAD_EXPR:
>          case WIDEN_MULT_PLUS_EXPR:
>          case WIDEN_MULT_MINUS_EXPR:
> -       case FMA_EXPR:
>            goto ternary;

>          case TRUTH_ANDIF_EXPR:
> @@ -5190,9 +5189,6 @@ expand_debug_expr (tree exp)
>          }
>         return NULL;

> -    case FMA_EXPR:
> -      return simplify_gen_ternary (FMA, mode, inner_mode, op0, op1, op2);
> -
>       default:
>       flag_unsupported:
>         if (flag_checking)
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  2018-05-16 12:48:59.410941892 +0100
> +++ gcc/expr.c  2018-05-17 09:18:19.957942878 +0100
> @@ -8853,67 +8853,6 @@ #define REDUCE_BIT_FIELD(expr)   (reduce_b
>         expand_operands (treeop0, treeop1, subtarget, &op0, &op1, EXPAND_NORMAL);
>         return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1, target, unsignedp));

> -    case FMA_EXPR:
> -      {
> -       optab opt = fma_optab;
> -       gimple *def0, *def2;
> -
> -       /* If there is no insn for FMA, emit it as __builtin_fma{,f,l}
> -          call.  */
> -       if (optab_handler (fma_optab, mode) == CODE_FOR_nothing)
> -         {
> -           tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA);
> -           tree call_expr;
> -
> -           gcc_assert (fn != NULL_TREE);
> -           call_expr = build_call_expr (fn, 3, treeop0, treeop1, treeop2);
> -           return expand_builtin (call_expr, target, subtarget, mode, false);
> -         }
> -
> -       def0 = get_def_for_expr (treeop0, NEGATE_EXPR);
> -       /* The multiplication is commutative - look at its 2nd operand
> -          if the first isn't fed by a negate.  */
> -       if (!def0)
> -         {
> -           def0 = get_def_for_expr (treeop1, NEGATE_EXPR);
> -           /* Swap operands if the 2nd operand is fed by a negate.  */
> -           if (def0)
> -             std::swap (treeop0, treeop1);
> -         }
> -       def2 = get_def_for_expr (treeop2, NEGATE_EXPR);
> -
> -       op0 = op2 = NULL;
> -
> -       if (def0 && def2
> -           && optab_handler (fnms_optab, mode) != CODE_FOR_nothing)
> -         {
> -           opt = fnms_optab;
> -           op0 = expand_normal (gimple_assign_rhs1 (def0));
> -           op2 = expand_normal (gimple_assign_rhs1 (def2));
> -         }
> -       else if (def0
> -                && optab_handler (fnma_optab, mode) != CODE_FOR_nothing)
> -         {
> -           opt = fnma_optab;
> -           op0 = expand_normal (gimple_assign_rhs1 (def0));
> -         }
> -       else if (def2
> -                && optab_handler (fms_optab, mode) != CODE_FOR_nothing)
> -         {
> -           opt = fms_optab;
> -           op2 = expand_normal (gimple_assign_rhs1 (def2));
> -         }
> -
> -       if (op0 == NULL)
> -         op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL);
> -       if (op2 == NULL)
> -         op2 = expand_normal (treeop2);
> -       op1 = expand_normal (treeop1);
> -
> -       return expand_ternary_op (TYPE_MODE (type), opt,
> -                                 op0, op1, op2, target, 0);
> -      }
> -
>       case MULT_EXPR:
>         /* If this is a fixed-point operation, then we cannot use the code
>           below because "expand_mult" doesn't support sat/no-sat fixed-point
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    2018-05-16 12:48:59.410941892 +0100
> +++ gcc/fold-const.c    2018-05-17 09:18:19.959942831 +0100
> @@ -3297,7 +3297,6 @@ #define OP_SAME_WITH_NULL(N)        \
>          case TRUTH_ORIF_EXPR:
>            return OP_SAME (0) && OP_SAME (1);

> -       case FMA_EXPR:
>          case WIDEN_MULT_PLUS_EXPR:
>          case WIDEN_MULT_MINUS_EXPR:
>            if (!OP_SAME (2))
> @@ -11708,17 +11707,6 @@ fold_ternary_loc (location_t loc, enum t

>         return NULL_TREE;

> -    case FMA_EXPR:
> -      /* For integers we can decompose the FMA if possible.  */
> -      if (TREE_CODE (arg0) == INTEGER_CST
> -         && TREE_CODE (arg1) == INTEGER_CST)
> -       return fold_build2_loc (loc, PLUS_EXPR, type,
> -                               const_binop (MULT_EXPR, arg0, arg1), arg2);
> -      if (integer_zerop (arg2))
> -       return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);
> -
> -      return fold_fma (loc, type, arg0, arg1, arg2);
> -
>       case VEC_PERM_EXPR:
>         if (TREE_CODE (arg2) == VECTOR_CST)
>          {
> Index: gcc/gimple-pretty-print.c
> ===================================================================
> --- gcc/gimple-pretty-print.c   2018-05-17 09:17:58.756608780 +0100
> +++ gcc/gimple-pretty-print.c   2018-05-17 09:18:19.963942737 +0100
> @@ -490,27 +490,6 @@ dump_ternary_rhs (pretty_printer *buffer
>         pp_greater (buffer);
>         break;

> -    case FMA_EXPR:
> -      if (flags & TDF_GIMPLE)
> -       {
> -         pp_string (buffer, "__FMA (");
> -         dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
> -         pp_comma (buffer);
> -         dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
> -         pp_comma (buffer);
> -         dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
> -         pp_right_paren (buffer);
> -       }
> -      else
> -       {
> -         dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
> -         pp_string (buffer, " * ");
> -         dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
> -         pp_string (buffer, " + ");
> -         dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
> -       }
> -      break;
> -
>       case DOT_PROD_EXPR:
>         pp_string (buffer, "DOT_PROD_EXPR <");
>         dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
> Index: gcc/gimple.c
> ===================================================================
> --- gcc/gimple.c        2018-05-17 09:17:58.756608780 +0100
> +++ gcc/gimple.c        2018-05-17 09:18:19.966942667 +0100
> @@ -2150,8 +2150,7 @@ #define DEFTREECODE(SYM, STRING, TYPE, N
>         || (SYM) == REALIGN_LOAD_EXPR    \
>         || (SYM) == VEC_COND_EXPR    \
>         || (SYM) == VEC_PERM_EXPR    \
> -      || (SYM) == BIT_INSERT_EXPR    \
> -      || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS    \
> +      || (SYM) == BIT_INSERT_EXPR) ? GIMPLE_TERNARY_RHS    \
>      : ((SYM) == CONSTRUCTOR    \
>         || (SYM) == OBJ_TYPE_REF    \
>         || (SYM) == ASSERT_EXPR    \
> Index: gcc/gimplify.c
> ===================================================================
> --- gcc/gimplify.c      2018-05-16 12:48:59.410941892 +0100
> +++ gcc/gimplify.c      2018-05-17 09:18:19.972942526 +0100
> @@ -12086,7 +12086,6 @@ gimplify_expr (tree *expr_p, gimple_seq
>            }
>            break;

> -       case FMA_EXPR:
>          case VEC_PERM_EXPR:
>            /* Classified as tcc_expression.  */
>            goto expr_3;
> Index: gcc/optabs-tree.c
> ===================================================================
> --- gcc/optabs-tree.c   2018-05-16 12:48:59.410941892 +0100
> +++ gcc/optabs-tree.c   2018-05-17 09:18:19.975942456 +0100
> @@ -143,9 +143,6 @@ optab_for_tree_code (enum tree_code code
>                : (TYPE_SATURATING (type)
>                   ? ssmsub_widen_optab : smsub_widen_optab));

> -    case FMA_EXPR:
> -      return fma_optab;
> -
>       case VEC_WIDEN_MULT_HI_EXPR:
>         return TYPE_UNSIGNED (type) ?
>          vec_widen_umult_hi_optab : vec_widen_smult_hi_optab;
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c      2018-05-16 12:49:10.497382733 +0100
> +++ gcc/tree-cfg.c      2018-05-17 09:18:19.979942362 +0100
> @@ -4109,20 +4109,6 @@ verify_gimple_assign_ternary (gassign *s
>          }
>         break;

> -    case FMA_EXPR:
> -      if (!useless_type_conversion_p (lhs_type, rhs1_type)
> -         || !useless_type_conversion_p (lhs_type, rhs2_type)
> -         || !useless_type_conversion_p (lhs_type, rhs3_type))
> -       {
> -         error ("type mismatch in fused multiply-add expression");
> -         debug_generic_expr (lhs_type);
> -         debug_generic_expr (rhs1_type);
> -         debug_generic_expr (rhs2_type);
> -         debug_generic_expr (rhs3_type);
> -         return true;
> -       }
> -      break;
> -
>       case VEC_COND_EXPR:
>         if (!VECTOR_BOOLEAN_TYPE_P (rhs1_type)
>            || maybe_ne (TYPE_VECTOR_SUBPARTS (rhs1_type),
> Index: gcc/tree-eh.c
> ===================================================================
> --- gcc/tree-eh.c       2018-05-16 12:48:59.410941892 +0100
> +++ gcc/tree-eh.c       2018-05-17 09:18:19.980942338 +0100
> @@ -2512,8 +2512,7 @@ operation_could_trap_p (enum tree_code o

>     if (TREE_CODE_CLASS (op) != tcc_comparison
>         && TREE_CODE_CLASS (op) != tcc_unary
> -      && TREE_CODE_CLASS (op) != tcc_binary
> -      && op != FMA_EXPR)
> +      && TREE_CODE_CLASS (op) != tcc_binary)
>       return false;

>     return operation_could_trap_helper_p (op, fp_operation, honor_trapv,
> @@ -2825,8 +2824,7 @@ stmt_could_throw_1_p (gassign *stmt)

>     if (TREE_CODE_CLASS (code) == tcc_comparison
>         || TREE_CODE_CLASS (code) == tcc_unary
> -      || TREE_CODE_CLASS (code) == tcc_binary
> -      || code == FMA_EXPR)
> +      || TREE_CODE_CLASS (code) == tcc_binary)
>       {
>         if (TREE_CODE_CLASS (code) == tcc_comparison)
>          t = TREE_TYPE (gimple_assign_rhs1 (stmt));
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c   2018-05-16 12:48:59.410941892 +0100
> +++ gcc/tree-inline.c   2018-05-17 09:18:19.980942338 +0100
> @@ -3855,7 +3855,6 @@ estimate_operator_cost (enum tree_code c
>       case MINUS_EXPR:
>       case MULT_EXPR:
>       case MULT_HIGHPART_EXPR:
> -    case FMA_EXPR:

>       case ADDR_SPACE_CONVERT_EXPR:
>       case FIXED_CONVERT_EXPR:
> Index: gcc/tree-pretty-print.c
> ===================================================================
> --- gcc/tree-pretty-print.c     2018-05-17 09:17:58.758608713 +0100
> +++ gcc/tree-pretty-print.c     2018-05-17 09:18:19.981942315 +0100
> @@ -2904,16 +2904,6 @@ dump_generic_node (pretty_printer *pp, t
>         pp_string (pp, " > ");
>         break;

> -    case FMA_EXPR:
> -      pp_string (pp, " FMA_EXPR < ");
> -      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> -      pp_string (pp, ", ");
> -      dump_generic_node (pp, TREE_OPERAND (node, 1), spc, flags, false);
> -      pp_string (pp, ", ");
> -      dump_generic_node (pp, TREE_OPERAND (node, 2), spc, flags, false);
> -      pp_string (pp, " > ");
> -      break;
> -
>       case OACC_PARALLEL:
>         pp_string (pp, "#pragma acc parallel");
>         goto dump_omp_clauses_body;
> @@ -3552,7 +3542,6 @@ op_code_prio (enum tree_code code)
>       case CEIL_MOD_EXPR:
>       case FLOOR_MOD_EXPR:
>       case ROUND_MOD_EXPR:
> -    case FMA_EXPR:
>         return 13;

>       case TRUTH_NOT_EXPR:
> Index: gcc/tree-ssa-loop-im.c
> ===================================================================
> --- gcc/tree-ssa-loop-im.c      2018-05-16 12:48:59.410941892 +0100
> +++ gcc/tree-ssa-loop-im.c      2018-05-17 09:18:19.981942315 +0100
> @@ -493,7 +493,6 @@ stmt_cost (gimple *stmt)
>       case WIDEN_MULT_PLUS_EXPR:
>       case WIDEN_MULT_MINUS_EXPR:
>       case DOT_PROD_EXPR:
> -    case FMA_EXPR:
>       case TRUNC_DIV_EXPR:
>       case CEIL_DIV_EXPR:
>       case FLOOR_DIV_EXPR:
> Index: gcc/tree-ssa-operands.c
> ===================================================================
> --- gcc/tree-ssa-operands.c     2018-05-16 12:48:59.410941892 +0100
> +++ gcc/tree-ssa-operands.c     2018-05-17 09:18:19.982942291 +0100
> @@ -849,7 +849,6 @@ get_expr_operands (struct function *fn,
>       case REALIGN_LOAD_EXPR:
>       case WIDEN_MULT_PLUS_EXPR:
>       case WIDEN_MULT_MINUS_EXPR:
> -    case FMA_EXPR:
>         {
>          get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 0), flags);
>          get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 1), flags);
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c  2018-05-16 12:48:59.410941892 +0100
> +++ gcc/tree.c  2018-05-17 09:18:19.986942197 +0100
> @@ -7171,7 +7171,6 @@ commutative_ternary_tree_code (enum tree
>       case WIDEN_MULT_PLUS_EXPR:
>       case WIDEN_MULT_MINUS_EXPR:
>       case DOT_PROD_EXPR:
> -    case FMA_EXPR:
>         return true;

>       default:
> @@ -7457,7 +7456,6 @@ add_expr (const_tree t, inchash::hash &h
>                flags &= ~OEP_ADDRESS_OF;
>                break;

> -           case FMA_EXPR:
>              case WIDEN_MULT_PLUS_EXPR:
>              case WIDEN_MULT_MINUS_EXPR:
>                {
> Index: gcc/fold-const-call.h
> ===================================================================
> --- gcc/fold-const-call.h       2018-05-16 12:48:59.410941892 +0100
> +++ gcc/fold-const-call.h       2018-05-17 09:18:19.957942878 +0100
> @@ -23,7 +23,6 @@ #define GCC_FOLD_CONST_CALL_H
>   tree fold_const_call (combined_fn, tree, tree);
>   tree fold_const_call (combined_fn, tree, tree, tree);
>   tree fold_const_call (combined_fn, tree, tree, tree, tree);
> -tree fold_fma (location_t, tree, tree, tree, tree);
>   tree build_cmp_result (tree type, int res);

>   #endif
> Index: gcc/fold-const-call.c
> ===================================================================
> --- gcc/fold-const-call.c       2018-05-16 12:48:59.410941892 +0100
> +++ gcc/fold-const-call.c       2018-05-17 09:18:19.957942878 +0100
> @@ -1606,6 +1606,26 @@ fold_const_call_ssss (real_value *result
>       CASE_CFN_FMA_FN:
>         return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, arg2, format);

> +    case CFN_FMS:
> +      {
> +       real_value new_arg2 = real_value_negate (arg2);
> +       return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, &new_arg2, format);
> +      }
> +
> +    case CFN_FNMA:
> +      {
> +       real_value new_arg0 = real_value_negate (arg0);
> +       return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, arg2, format);
> +      }
> +
> +    case CFN_FNMS:
> +      {
> +       real_value new_arg0 = real_value_negate (arg0);
> +       real_value new_arg2 = real_value_negate (arg2);
> +       return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1,
> +                            &new_arg2, format);
> +      }
> +
>       default:
>         return false;
>       }
> @@ -1719,20 +1739,3 @@ fold_const_call (combined_fn fn, tree ty
>         return fold_const_call_1 (fn, type, arg0, arg1, arg2);
>       }
>   }
> -
> -/* Fold a fma operation with arguments ARG[012].  */
> -
> -tree
> -fold_fma (location_t, tree type, tree arg0, tree arg1, tree arg2)
> -{
> -  REAL_VALUE_TYPE result;
> -  if (real_cst_p (arg0)
> -      && real_cst_p (arg1)
> -      && real_cst_p (arg2)
> -      && do_mpfr_arg3 (&result, mpfr_fma, TREE_REAL_CST_PTR (arg0),
> -                      TREE_REAL_CST_PTR (arg1), TREE_REAL_CST_PTR (arg2),
> -                      REAL_MODE_FORMAT (TYPE_MODE (type))))
> -    return build_real (type, result);
> -
> -  return NULL_TREE;
> -}
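
The three new constant-folding cases all reduce to one fused multiply-add with some inputs negated. A minimal standalone sketch of that mapping follows, using std::fma in place of GCC's do_mpfr_arg3/mpfr_fma. The namespace and helper names are illustrative assumptions, not GCC API.

```cpp
#include <cassert>
#include <cmath>

namespace fold_sketch {

// Illustrative helpers (not GCC API).  Each variant is a single fused
// multiply-add with selected inputs negated, mirroring the CFN_FMS,
// CFN_FNMA and CFN_FNMS cases in fold_const_call_ssss.
inline double fms (double a, double b, double c)
{
  return std::fma (a, b, -c);   //  a * b - c
}

inline double fnma (double a, double b, double c)
{
  return std::fma (-a, b, c);   // -(a * b) + c
}

inline double fnms (double a, double b, double c)
{
  return std::fma (-a, b, -c);  // -(a * b) - c
}

} // namespace fold_sketch
```

Negating an input is exact in IEEE arithmetic, so routing every variant through the same fused operation still rounds only once.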
> Index: gcc/genmatch.c
> ===================================================================
> --- gcc/genmatch.c      2018-05-16 12:48:59.410941892 +0100
> +++ gcc/genmatch.c      2018-05-17 09:18:19.960942808 +0100
> @@ -241,6 +241,20 @@ enum internal_fn {
>     IFN_LAST
>   };

> +enum combined_fn {
> +#define DEF_BUILTIN(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND) \
> +  CFN_##ENUM = int (ENUM),
> +#include "builtins.def"
> +
> +#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \
> +  CFN_##CODE = int (END_BUILTINS) + int (IFN_##CODE),
> +#include "internal-fn.def"
> +
> +  CFN_LAST
> +};
> +
> +#include "case-cfn-macros.h"
> +
>   /* Return true if CODE represents a commutative tree code.  Otherwise
>      return false.  */
>   bool
> @@ -288,7 +302,6 @@ commutative_ternary_tree_code (enum tree
>       case WIDEN_MULT_PLUS_EXPR:
>       case WIDEN_MULT_MINUS_EXPR:
>       case DOT_PROD_EXPR:
> -    case FMA_EXPR:
>         return true;

>       default:
> @@ -450,6 +463,44 @@ is_a_helper <user_id *>::test (id_base *
>     return id->kind == id_base::USER;
>   }

> +/* If ID has a pair of consecutive, commutative operands, return the
> +   index of the first, otherwise return -1.  */
> +
> +static int
> +commutative_op (id_base *id)
> +{
> +  if (operator_id *code = dyn_cast <operator_id *> (id))
> +    {
> +      if (commutative_tree_code (code->code)
> +         || commutative_ternary_tree_code (code->code))
> +       return 0;
> +      return -1;
> +    }
> +  if (fn_id *fn = dyn_cast <fn_id *> (id))
> +    switch (fn->fn)
> +      {
> +      CASE_CFN_FMA:
> +      case CFN_FMS:
> +      case CFN_FNMA:
> +      case CFN_FNMS:
> +       return 0;
> +
> +      default:
> +       return -1;
> +      }
> +  if (user_id *uid = dyn_cast<user_id *> (id))
> +    {
> +      int res = commutative_op (uid->substitutes[0]);
> +      if (res < 0)
> +       return 0;
> +      for (unsigned i = 1; i < uid->substitutes.length (); ++i)
> +       if (res != commutative_op (uid->substitutes[i]))
> +         return -1;
> +      return res;
> +    }
> +  return -1;
> +}
> +
>   /* Add a predicate identifier to the hash.  */

>   static predicate_id *
> @@ -946,6 +997,9 @@ commutate (operand *op, vec<vec<user_id
>     if (!e->is_commutative)
>       return ret;

> +  /* The operation is always binary if it isn't inherently commutative.  */
> +  int natural_opno = commutative_op (e->operation);
> +  unsigned int opno = natural_opno >= 0 ? natural_opno : 0;
>     for (unsigned i = 0; i < result.length (); ++i)
>       {
>         expr *ne = new expr (e);
> @@ -994,9 +1048,11 @@ commutate (operand *op, vec<vec<user_id
>              }
>          }
>         ne->is_commutative = false;
> -      // result[i].length () is 2 since e->operation is binary
> -      for (unsigned j = result[i].length (); j; --j)
> -       ne->append_op (result[i][j-1]);
> +      for (unsigned j = 0; j < result[i].length (); ++j)
> +       {
> +         int old_j = (j == opno ? opno + 1 : j == opno + 1 ? opno : j);
> +         ne->append_op (result[i][old_j]);
> +       }
>         ret.safe_push (ne);
>       }

> @@ -2759,24 +2815,18 @@ dt_operand::gen_gimple_expr (FILE *f, in
>     /* While the toplevel operands are canonicalized by the caller
>        after valueizing operands of sub-expressions we have to
>        re-canonicalize operand order.  */
> -  if (operator_id *code = dyn_cast <operator_id *> (id))
> +  int opno = commutative_op (id);
> +  if (opno >= 0)
>       {
> -      /* ???  We can't canonicalize tcc_comparison operands here
> -         because that requires changing the comparison code which
> -        we already matched...  */
> -      if (commutative_tree_code (code->code)
> -         || commutative_ternary_tree_code (code->code))
> -       {
> -         char child_opname0[20], child_opname1[20];
> -         gen_opname (child_opname0, 0);
> -         gen_opname (child_opname1, 1);
> -         fprintf_indent (f, indent,
> -                         "if (tree_swap_operands_p (%s, %s))\n",
> -                         child_opname0, child_opname1);
> -         fprintf_indent (f, indent,
> -                         "  std::swap (%s, %s);\n",
> -                         child_opname0, child_opname1);
> -       }
> +      char child_opname0[20], child_opname1[20];
> +      gen_opname (child_opname0, opno);
> +      gen_opname (child_opname1, opno + 1);
> +      fprintf_indent (f, indent,
> +                     "if (tree_swap_operands_p (%s, %s))\n",
> +                     child_opname0, child_opname1);
> +      fprintf_indent (f, indent,
> +                     "  std::swap (%s, %s);\n",
> +                     child_opname0, child_opname1);
>       }

>     return n_braces;
> @@ -4217,11 +4267,14 @@ parser::parse_expr ()
>                        e->operation->id, e->operation->nargs, e->ops.length ());
>            if (is_commutative)
>              {
> -             if (e->ops.length () == 2)
> +             if (e->ops.length () == 2
> +                 || commutative_op (e->operation) >= 0)
>                  e->is_commutative = true;
>                else
> -               fatal_at (token, "only binary operators or function with "
> -                         "two arguments can be marked commutative");
> +               fatal_at (token, "only binary operators or functions with "
> +                         "two arguments can be marked commutative, "
> +                         "unless the operation is known to be inherently "
> +                         "commutative");
>              }
>            e->expr_type = expr_type;
>            return op;
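
The new loop in commutate () exchanges only the operand pair starting at opno and leaves any other operand in place, which is what allows a ternary operation to be commuted. A small standalone sketch of that index mapping follows. The function name and the use of std::vector are assumptions for illustration; genmatch uses its own vec type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace genmatch_sketch {

// Return a copy of OPS with the operands at OPNO and OPNO + 1 exchanged,
// using the same index mapping as the new loop in commutate ().
// Assumes OPNO + 1 < OPS.size ().
template <typename T>
std::vector<T>
swap_commutative_pair (const std::vector<T> &ops, std::size_t opno)
{
  std::vector<T> out;
  out.reserve (ops.size ());
  for (std::size_t j = 0; j < ops.size (); ++j)
    {
      std::size_t old_j = (j == opno ? opno + 1 : j == opno + 1 ? opno : j);
      out.push_back (ops[old_j]);
    }
  return out;
}

} // namespace genmatch_sketch
```

With opno == 0 and three operands this yields (op1, op0, op2), the form needed for the FMA family. With opno == 1 it would yield (op0, op2, op1), the case the commit message says a later patch relies on.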
> Index: gcc/gimple-ssa-backprop.c
> ===================================================================
> --- gcc/gimple-ssa-backprop.c   2018-05-16 12:48:59.410941892 +0100
> +++ gcc/gimple-ssa-backprop.c   2018-05-17 09:18:19.963942737 +0100
> @@ -375,6 +375,9 @@ backprop::process_builtin_call_use (gcal

>       CASE_CFN_FMA:
>       CASE_CFN_FMA_FN:
> +    case CFN_FMS:
> +    case CFN_FNMA:
> +    case CFN_FNMS:
>         /* In X * X + Y, where Y is distinct from X, the sign of X doesn't
>           matter.  */
>         if (gimple_call_arg (call, 0) == rhs
> @@ -420,15 +423,6 @@ backprop::process_assign_use (gassign *a
>          }
>         break;

> -    case FMA_EXPR:
> -      /* In X * X + Y, where Y is distinct from X, the sign of X doesn't
> -        matter.  */
> -      if (gimple_assign_rhs1 (assign) == rhs
> -         && gimple_assign_rhs2 (assign) == rhs
> -         && gimple_assign_rhs3 (assign) != rhs)
> -       info->flags.ignore_sign = true;
> -      break;
> -
>       case MULT_EXPR:
>         /* In X * X, the sign of X doesn't matter.  */
>         if (gimple_assign_rhs1 (assign) == rhs
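
The backprop case relies on X appearing as both multiplication operands, so flipping its sign cannot change the product. A minimal numeric check of that property is sketched below, with std::fma standing in for the internal function. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

namespace backprop_sketch {

// Return true if replacing X with -X leaves X * X + Y unchanged for
// these particular inputs.  (-X) * (-X) has the same exact value as
// X * X, so the fused results agree for any non-NaN inputs.
inline bool sign_of_x_ignored (double x, double y)
{
  return std::fma (x, x, y) == std::fma (-x, -x, y);
}

} // namespace backprop_sketch
```

The same argument covers FMS, FNMA and FNMS, since those only negate the product or the addend as a whole.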
> Index: gcc/hsa-gen.c
> ===================================================================
> --- gcc/hsa-gen.c       2018-05-16 12:48:59.410941892 +0100
> +++ gcc/hsa-gen.c       2018-05-17 09:18:19.973942503 +0100
> @@ -3178,23 +3178,6 @@ gen_hsa_insns_for_operation_assignment (
>       case NEGATE_EXPR:
>         opcode = BRIG_OPCODE_NEG;
>         break;
> -    case FMA_EXPR:
> -      /* There is a native HSA instruction for scalar FMAs but not for vector
> -        ones.  */
> -      if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE)
> -       {
> -         hsa_op_reg *dest
> -           = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign));
> -         hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
> -         hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
> -         hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
> -         hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
> -         gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb);
> -         gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp, op3, hbb);
> -         return;
> -       }
> -      opcode = BRIG_OPCODE_MAD;
> -      break;
>       case MIN_EXPR:
>         opcode = BRIG_OPCODE_MIN;
>         break;
> @@ -4490,6 +4473,57 @@ gen_hsa_divmod (gcall *call, hsa_bb *hbb
>     insn->set_output_in_type (dest, 0, hbb);
>   }

> +/* Emit instructions that implement FMA, FMS, FNMA or FNMS call STMT.
> +   Instructions are appended to basic block HBB.  NEGATE1 is true for
> +   FNMA and FNMS.  NEGATE3 is true for FMS and FNMS.  */
> +
> +static void
> +gen_hsa_fma (gcall *call, hsa_bb *hbb, bool negate1, bool negate3)
> +{
> +  tree lhs = gimple_call_lhs (call);
> +  if (lhs == NULL_TREE)
> +    return;
> +
> +  tree rhs1 = gimple_call_arg (call, 0);
> +  tree rhs2 = gimple_call_arg (call, 1);
> +  tree rhs3 = gimple_call_arg (call, 2);
> +
> +  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
> +  hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
> +  hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
> +  hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
> +
> +  if (negate1)
> +    {
> +      hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
> +      gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op1, hbb);
> +      op1 = tmp;
> +    }
> +
> +  /* There is a native HSA instruction for scalar FMAs but not for vector
> +     ones.  */
> +  if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE)
> +    {
> +      hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
> +      gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb);
> +      gen_hsa_binary_operation (negate3 ? BRIG_OPCODE_SUB : BRIG_OPCODE_ADD,
> +                               dest, tmp, op3, hbb);
> +    }
> +  else
> +    {
> +      if (negate3)
> +       {
> +         hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
> +         gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op3, hbb);
> +         op3 = tmp;
> +       }
> +      hsa_insn_basic *insn = new hsa_insn_basic (4, BRIG_OPCODE_MAD,
> +                                                dest->m_type, dest,
> +                                                op1, op2, op3);
> +      hbb->append_insn (insn);
> +    }
> +}
> +
>   /* Set VALUE to a shadow kernel debug argument and append a new instruction
>      to HBB basic block.  */

> @@ -5224,6 +5258,22 @@ gen_hsa_insn_for_internal_fn_call (gcall
>         gen_hsa_insns_for_call_of_internal_fn (stmt, hbb);
>         break;

> +    case IFN_FMA:
> +      gen_hsa_fma (stmt, hbb, false, false);
> +      break;
> +
> +    case IFN_FMS:
> +      gen_hsa_fma (stmt, hbb, false, true);
> +      break;
> +
> +    case IFN_FNMA:
> +      gen_hsa_fma (stmt, hbb, true, false);
> +      break;
> +
> +    case IFN_FNMS:
> +      gen_hsa_fma (stmt, hbb, true, true);
> +      break;
> +
>       default:
>         HSA_SORRY_ATV (gimple_location (stmt),
>                       "support for HSA does not implement internal function: %s",
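
The instruction sequences chosen by gen_hsa_fma can be summarised with a small model. This is a sketch only: the Op enum and lower_fma are hypothetical stand-ins for the BRIG opcodes and the hsa_insn_basic construction in the patch, and only the order of emitted opcodes is modelled.

```cpp
#include <cassert>
#include <vector>

namespace hsa_sketch {

// Hypothetical stand-ins for BRIG_OPCODE_NEG, _MUL, _ADD, _SUB and _MAD.
enum class Op { NEG, MUL, ADD, SUB, MAD };

// Model of the opcode sequence gen_hsa_fma emits.  NEGATE1 is true for
// FNMA and FNMS, NEGATE3 is true for FMS and FNMS.
inline std::vector<Op>
lower_fma (bool is_vector, bool negate1, bool negate3)
{
  std::vector<Op> seq;
  if (negate1)
    seq.push_back (Op::NEG);        // negate the first multiplicand
  if (is_vector)
    {
      // No native vector FMA: separate multiply, then add or subtract.
      seq.push_back (Op::MUL);
      seq.push_back (negate3 ? Op::SUB : Op::ADD);
    }
  else
    {
      if (negate3)
        seq.push_back (Op::NEG);    // negate the addend
      seq.push_back (Op::MAD);      // native scalar multiply-add
    }
  return seq;
}

} // namespace hsa_sketch
```

In this model a scalar FNMS becomes NEG, NEG, MAD, while a vector FMS becomes MUL, SUB with no explicit negation of the addend.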
> Index: gcc/match.pd
> ===================================================================
> --- gcc/match.pd        2018-05-16 12:48:59.410941892 +0100
> +++ gcc/match.pd        2018-05-17 09:18:19.974942479 +0100
> @@ -4702,3 +4702,60 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>          || wi::geu_p (wi::to_wide (@rpos),
>                        wi::to_wide (@ipos) + isize))
>       (BIT_FIELD_REF @0 @rsize @rpos)))))
> +
> +(for fmas (FMA)
> + (simplify
> +  (fmas:c (negate @0) @1 @2)
> +  (IFN_FNMA @0 @1 @2))
> + (simplify
> +  (fmas @0 @1 (negate @2))
> +  (IFN_FMS @0 @1 @2))
> + (simplify
> +  (fmas:c (negate @0) @1 (negate @2))
> +  (IFN_FNMS @0 @1 @2))
> + (simplify
> +  (negate (fmas@3 @0 @1 @2))
> +  (if (single_use (@3))
> +   (IFN_FNMS @0 @1 @2))))
> +
> +(simplify
> + (IFN_FMS:c (negate @0) @1 @2)
> + (IFN_FNMS @0 @1 @2))
> +(simplify
> + (IFN_FMS @0 @1 (negate @2))
> + (IFN_FMA @0 @1 @2))
> +(simplify
> + (IFN_FMS:c (negate @0) @1 (negate @2))
> + (IFN_FNMA @0 @1 @2))
> +(simplify
> + (negate (IFN_FMS@3 @0 @1 @2))
> +  (if (single_use (@3))
> +   (IFN_FNMA @0 @1 @2)))
> +
> +(simplify
> + (IFN_FNMA:c (negate @0) @1 @2)
> + (IFN_FMA @0 @1 @2))
> +(simplify
> + (IFN_FNMA @0 @1 (negate @2))
> + (IFN_FNMS @0 @1 @2))
> +(simplify
> + (IFN_FNMA:c (negate @0) @1 (negate @2))
> + (IFN_FMS @0 @1 @2))
> +(simplify
> + (negate (IFN_FNMA@3 @0 @1 @2))
> + (if (single_use (@3))
> +  (IFN_FMS @0 @1 @2)))
> +
> +(simplify
> + (IFN_FNMS:c (negate @0) @1 @2)
> + (IFN_FMS @0 @1 @2))
> +(simplify
> + (IFN_FNMS @0 @1 (negate @2))
> + (IFN_FNMA @0 @1 @2))
> +(simplify
> + (IFN_FNMS:c (negate @0) @1 (negate @2))
> + (IFN_FMA @0 @1 @2))
> +(simplify
> + (negate (IFN_FNMS@3 @0 @1 @2))
> + (if (single_use (@3))
> +  (IFN_FMA @0 @1 @2)))
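
The sixteen match.pd patterns follow one rule: each operation is identified by whether the product and the addend are negated, and a negate on operand 0, operand 2 or the result toggles those flags. A compact sketch of that rule follows. The two-bit encoding and the helper names are assumptions for illustration and do not appear in the patch.

```cpp
#include <cassert>

namespace matchpd_sketch {

// Hypothetical encoding: bit 0 set = product negated,
// bit 1 set = addend negated.
enum fma_kind : unsigned { FMA = 0, FNMA = 1, FMS = 2, FNMS = 3 };

// Negating operand 0 flips the sign of the product.
constexpr fma_kind negate_op0 (fma_kind k) { return fma_kind (k ^ 1u); }

// Negating operand 2 flips the sign of the addend.
constexpr fma_kind negate_op2 (fma_kind k) { return fma_kind (k ^ 2u); }

// Negating the result flips both signs.
constexpr fma_kind negate_result (fma_kind k) { return fma_kind (k ^ 3u); }

// Spot checks against the patterns in the hunk above.
static_assert (negate_op0 (FMA) == FNMA, "fmas (negate @0) -> IFN_FNMA");
static_assert (negate_op2 (FMA) == FMS, "fmas (negate @2) -> IFN_FMS");
static_assert (negate_result (FMA) == FNMS, "negate (fmas) -> IFN_FNMS");
static_assert (negate_result (FMS) == FNMA, "negate (IFN_FMS) -> IFN_FNMA");
static_assert (negate_op0 (negate_op2 (FNMS)) == FMA, "both negated");

} // namespace matchpd_sketch
```

Each toggle is its own inverse, which matches the way the patterns map the four operations onto one another in pairs.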
> Index: gcc/gimple-fold.h
> ===================================================================
> --- gcc/gimple-fold.h   2018-05-17 09:17:32.876478942 +0100
> +++ gcc/gimple-fold.h   2018-05-17 09:18:19.962942761 +0100
> @@ -41,6 +41,7 @@ extern bool arith_overflowed_p (enum tre
>                                  const_tree);
>   extern tree no_follow_ssa_edges (tree);
>   extern tree follow_single_use_edges (tree);
> +extern tree follow_all_ssa_edges (tree);
>   extern tree gimple_fold_stmt_to_constant_1 (gimple *, tree (*) (tree),
>                                              tree (*) (tree) = no_follow_ssa_edges);
>   extern tree gimple_fold_stmt_to_constant (gimple *, tree (*) (tree));
> Index: gcc/gimple-fold.c
> ===================================================================
> --- gcc/gimple-fold.c   2018-05-17 09:17:32.876478942 +0100
> +++ gcc/gimple-fold.c   2018-05-17 09:18:19.961942784 +0100
> @@ -4967,6 +4967,14 @@ follow_single_use_edges (tree val)
>     return val;
>   }

> +/* Valueization callback that follows all SSA edges.  */
> +
> +tree
> +follow_all_ssa_edges (tree val)
> +{
> +  return val;
> +}
> +
>   /* Fold the statement pointed to by GSI.  In some cases, this function may
>      replace the whole statement with a new one.  Returns true iff folding
>      makes any changes.
> Index: gcc/tree-ssa-math-opts.c
> ===================================================================
> --- gcc/tree-ssa-math-opts.c    2018-05-16 12:48:59.410941892 +0100
> +++ gcc/tree-ssa-math-opts.c    2018-05-17 09:18:19.982942291 +0100
> @@ -2650,7 +2650,7 @@ convert_mult_to_fma_1 (tree mul_result,
>     tree type = TREE_TYPE (mul_result);
>     gimple *use_stmt;
>     imm_use_iterator imm_iter;
> -  gassign *fma_stmt;
> +  gcall *fma_stmt;

>     FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, mul_result)
>       {
> @@ -2658,6 +2658,7 @@ convert_mult_to_fma_1 (tree mul_result,
>         enum tree_code use_code;
>         tree addop, mulop1 = op1, result = mul_result;
>         bool negate_p = false;
> +      gimple_seq seq = NULL;

>         if (is_gimple_debug (use_stmt))
>          continue;
> @@ -2683,11 +2684,7 @@ convert_mult_to_fma_1 (tree mul_result,
>            addop = gimple_assign_rhs2 (use_stmt);
>            /* a * b - c -> a * b + (-c)  */
>            if (gimple_assign_rhs_code (use_stmt) == MINUS_EXPR)
> -           addop = force_gimple_operand_gsi (&gsi,
> -                                             build1 (NEGATE_EXPR,
> -                                                     type, addop),
> -                                             true, NULL_TREE, true,
> -                                             GSI_SAME_STMT);
> +           addop = gimple_build (&seq, NEGATE_EXPR, type, addop);
>          }
>         else
>          {
> @@ -2698,23 +2695,26 @@ convert_mult_to_fma_1 (tree mul_result,
>          }

>         if (negate_p)
> -       mulop1 = force_gimple_operand_gsi (&gsi,
> -                                          build1 (NEGATE_EXPR,
> -                                                  type, mulop1),
> -                                          true, NULL_TREE, true,
> -                                          GSI_SAME_STMT);
> +       mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1);

> -      fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt),
> -                                     FMA_EXPR, mulop1, op2, addop);
> +      if (seq)
> +       gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
> +      fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
> +      gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt));
> +      gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (use_stmt));
> +      gsi_replace (&gsi, fma_stmt, true);
> +      /* Follow all SSA edges so that we generate FMS, FNMA and FNMS
> +        regardless of where the negation occurs.  */
> +      if (fold_stmt (&gsi, follow_all_ssa_edges))
> +       update_stmt (gsi_stmt (gsi));

>         if (dump_file && (dump_flags & TDF_DETAILS))
>          {
>            fprintf (dump_file, "Generated FMA ");
> -         print_gimple_stmt (dump_file, fma_stmt, 0, 0);
> +         print_gimple_stmt (dump_file, gsi_stmt (gsi), 0, 0);
>            fprintf (dump_file, "\n");
>          }

> -      gsi_replace (&gsi, fma_stmt, true);
>         widen_mul_stats.fmas_inserted++;
>       }
>   }
> @@ -2862,7 +2862,8 @@ convert_mult_to_fma (gimple *mul_stmt, t

>     /* If the target doesn't support it, don't generate it.  We assume that
>        if fma isn't available then fms, fnma or fnms are not either.  */
> -  if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
> +  optimization_type opt_type = bb_optimization_type (gimple_bb (mul_stmt));
> +  if (!direct_internal_fn_supported_p (IFN_FMA, type, opt_type))
>       return false;

>   /* If the multiplication has zero uses, it is kept around probably because
> @@ -2958,8 +2959,8 @@ convert_mult_to_fma (gimple *mul_stmt, t
>           that a mult / subtract pair.  */
>         if (use_code == MINUS_EXPR && !negate_p
>            && gimple_assign_rhs1 (use_stmt) == result
> -         && optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing
> -         && optab_handler (fnma_optab, TYPE_MODE (type)) != CODE_FOR_nothing)
> +         && !direct_internal_fn_supported_p (IFN_FMS, type, opt_type)
> +         && direct_internal_fn_supported_p (IFN_FNMA, type, opt_type))
>          {
>            tree rhs2 = gimple_assign_rhs2 (use_stmt);

> Index: gcc/config/i386/i386.c
> ===================================================================
> --- gcc/config/i386/i386.c      2018-05-16 12:49:10.501715913 +0100
> +++ gcc/config/i386/i386.c      2018-05-17 09:18:19.949943066 +0100
> @@ -50614,19 +50614,20 @@ ix86_add_stmt_cost (void *data, int coun
>     tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
>     int stmt_cost = - 1;

> +  bool fp = false;
> +  machine_mode mode = TImode;
> +
> +  if (vectype != NULL)
> +    {
> +      fp = FLOAT_TYPE_P (vectype);
> +      mode = TYPE_MODE (vectype);
> +    }
> +
>     if ((kind == vector_stmt || kind == scalar_stmt)
>         && stmt_info
>         && stmt_info->stmt && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN)
>       {
>         tree_code subcode = gimple_assign_rhs_code (stmt_info->stmt);
> -      bool fp = false;
> -      machine_mode mode = TImode;
> -
> -      if (vectype != NULL)
> -       {
> -         fp = FLOAT_TYPE_P (vectype);
> -         mode = TYPE_MODE (vectype);
> -       }
>         /*machine_mode inner_mode = mode;
>         if (VECTOR_MODE_P (mode))
>          inner_mode = GET_MODE_INNER (mode);*/
> @@ -50657,12 +50658,6 @@ ix86_add_stmt_cost (void *data, int coun
>          case MULT_HIGHPART_EXPR:
>            stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
>            break;
> -       case FMA_EXPR:
> -          stmt_cost = ix86_vec_cost (mode,
> -                                    mode == SFmode ? ix86_cost->fmass
> -                                    : ix86_cost->fmasd,
> -                                    true);
> -         break;
>          case NEGATE_EXPR:
>            if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
>              stmt_cost = ix86_cost->sse_op;
> @@ -50725,6 +50720,24 @@ ix86_add_stmt_cost (void *data, int coun
>            break;
>          }
>       }
> +
> +  combined_fn cfn;
> +  if ((kind == vector_stmt || kind == scalar_stmt)
> +      && stmt_info
> +      && stmt_info->stmt
> +      && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST)
> +    switch (cfn)
> +      {
> +      case CFN_FMA:
> +       stmt_cost = ix86_vec_cost (mode,
> +                                  mode == SFmode ? ix86_cost->fmass
> +                                  : ix86_cost->fmasd,
> +                                  true);
> +       break;
> +      default:
> +       break;
> +      }
> +
>     /* If we do elementwise loads into a vector then we are bound by
>        latency and execution resources for the many scalar loads
>        (AGU and load ports).  Try to account for this by scaling the
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c  2018-05-16 12:48:59.410941892 +0100
> +++ gcc/config/rs6000/rs6000.c  2018-05-17 09:18:19.953942972 +0100
> @@ -15880,7 +15880,9 @@ rs6000_gimple_fold_builtin (gimple_stmt_
>          arg1 = gimple_call_arg (stmt, 1);
>          tree arg2 = gimple_call_arg (stmt, 2);
>          lhs = gimple_call_lhs (stmt);
> -       gimple *g = gimple_build_assign (lhs, FMA_EXPR, arg0, arg1, arg2);
> +       gcall *g = gimple_build_call_internal (IFN_FMA, 3, arg0, arg1, arg2);
> +       gimple_call_set_lhs (g, lhs);
> +       gimple_call_set_nothrow (g, true);
>          gimple_set_location (g, gimple_location (stmt));
>          gsi_replace (gsi, g, true);
>          return true;
> Index: gcc/brig/brigfrontend/brig-function.cc
> ===================================================================
> --- gcc/brig/brigfrontend/brig-function.cc      2018-05-16 12:48:59.410941892 +0100
> +++ gcc/brig/brigfrontend/brig-function.cc      2018-05-17 09:18:19.936943371 +0100
> @@ -1218,6 +1218,7 @@ brig_function::get_builtin_for_hsa_opcod
>       case BRIG_OPCODE_NEXP2:
>         builtin = mathfn_built_in (builtin_type, BUILT_IN_EXP2);
>         break;
> +    case BRIG_OPCODE_FMA:
>       case BRIG_OPCODE_NFMA:
>         builtin = mathfn_built_in (builtin_type, BUILT_IN_FMA);
>         break;
> @@ -1460,8 +1461,6 @@ brig_function::get_tree_code_for_hsa_opc
>          return CALL_EXPR;
>         else
>          return MAX_EXPR;
> -    case BRIG_OPCODE_FMA:
> -      return FMA_EXPR;
>       case BRIG_OPCODE_ABS:
>         return ABS_EXPR;
>       case BRIG_OPCODE_SHL:
> @@ -1496,6 +1495,7 @@ brig_function::get_tree_code_for_hsa_opc
>         /* Implement as 1/f (x).  gcc should pattern detect that and
>           use a native instruction, if available, for it.  */
>         return TREE_LIST;
> +    case BRIG_OPCODE_FMA:
>       case BRIG_OPCODE_FLOOR:
>       case BRIG_OPCODE_CEIL:
>       case BRIG_OPCODE_SQRT:
> Index: gcc/c/gimple-parser.c
> ===================================================================
> --- gcc/c/gimple-parser.c       2018-05-17 09:17:58.756608780 +0100
> +++ gcc/c/gimple-parser.c       2018-05-17 09:18:19.939943300 +0100
> @@ -952,27 +952,6 @@ c_parser_gimple_postfix_expression (c_pa
>                expr.value = fold_convert (type, val);
>                return expr;
>              }
> -         else if (strcmp (IDENTIFIER_POINTER (id), "__FMA") == 0)
> -           {
> -             c_parser_consume_token (parser);
> -             auto_vec<tree> args;
> -
> -             if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
> -               {
> -                 c_parser_gimple_expr_list (parser, &args);
> -                 c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
> -                                            "expected %<)%>");
> -               }
> -             if (args.length () != 3)
> -               {
> -                 error_at (loc, "invalid number of operands to __FMA");
> -                 expr.value = error_mark_node;
> -                 return expr;
> -               }
> -             expr.value = build3_loc (loc, FMA_EXPR, TREE_TYPE (args[0]),
> -                                      args[0], args[1], args[2]);
> -             return expr;
> -           }

>            /* SSA name.  */
>            unsigned version, ver_offset;
> Index: gcc/cp/constexpr.c
> ===================================================================
> --- gcc/cp/constexpr.c  2018-05-16 12:49:10.203809793 +0100
> +++ gcc/cp/constexpr.c  2018-05-17 09:18:19.954942948 +0100
> @@ -4574,7 +4574,6 @@ cxx_eval_constant_expression (const cons
>                               non_constant_p, overflow_p);
>         break;

> -    case FMA_EXPR:
>       case VEC_PERM_EXPR:
>         r = cxx_eval_trinary_expression (ctx, t, lval,
>                                         non_constant_p, overflow_p);
> @@ -5999,7 +5998,6 @@ #define RECUR(T,RV) \
>            return false;
>         return true;

> -    case FMA_EXPR:
>       case VEC_PERM_EXPR:
>        for (i = 0; i < 3; ++i)
>         if (!RECUR (TREE_OPERAND (t, i), true))
> Index: gcc/testsuite/lib/target-supports.exp
> ===================================================================
> --- gcc/testsuite/lib/target-supports.exp       2018-05-16 12:48:59.410941892 +0100
> +++ gcc/testsuite/lib/target-supports.exp       2018-05-17 09:18:19.978942385 +0100
> @@ -2879,6 +2879,13 @@ proc check_effective_target_base_quadflo
>       return 1
>   }

> +# Return 1 if the target supports all four forms of fused multiply-add
> +# (fma, fms, fnma, and fnms) for both float and double.
> +
> +proc check_effective_target_scalar_all_fma { } {
> +    return [istarget aarch64*-*-*]
> +}
> +
>   # Return 1 if the target supports compiling fixed-point,
>   # 0 otherwise.

> Index: gcc/testsuite/gcc.dg/fma-1.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-1.c        2018-05-17 09:18:19.975942456 +0100
> @@ -0,0 +1,15 @@
> +/* { dg-options "-O2 -fdump-tree-widening_mul" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return a * b + c;
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return a * b + c;
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = \.FMA \(} 2 "widening_mul" { target scalar_all_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-2.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-2.c        2018-05-17 09:18:19.975942456 +0100
> @@ -0,0 +1,15 @@
> +/* { dg-options "-O2 -fdump-tree-widening_mul" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return a * b - c;
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return a * b - c;
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = \.FMS \(} 2 "widening_mul" { target scalar_all_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-3.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-3.c        2018-05-17 09:18:19.975942456 +0100
> @@ -0,0 +1,15 @@
> +/* { dg-options "-O2 -fdump-tree-widening_mul" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return c - a * b;
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return c - a * b;
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 2 "widening_mul" { target scalar_all_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-4.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-4.c        2018-05-17 09:18:19.975942456 +0100
> @@ -0,0 +1,15 @@
> +/* { dg-options "-O2 -fdump-tree-widening_mul" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return -(a * b) - c;
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return -(a * b) - c;
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = \.FNMS \(} 2 "widening_mul" { target scalar_all_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-5.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-5.c        2018-05-17 09:18:19.976942432 +0100
> @@ -0,0 +1,53 @@
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return __builtin_fmaf (a, b, -c);
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return __builtin_fma (a, b, -c);
> +}
> +
> +void
> +f3 (float a, float b, float c, float d, float e, float *res)
> +{
> +  res[0] = __builtin_fmaf (a, b, -e);
> +  res[1] = __builtin_fmaf (c, d, -e);
> +}
> +
> +void
> +f4 (double a, double b, double c, double d, double e, double *res)
> +{
> +  res[0] = __builtin_fma (a, b, -e);
> +  res[1] = __builtin_fma (c, d, -e);
> +}
> +
> +float
> +f5 (float a, float b, float c)
> +{
> +  return -__builtin_fmaf (-a, b, c);
> +}
> +
> +double
> +f6 (double a, double b, double c)
> +{
> +  return -__builtin_fma (-a, b, c);
> +}
> +
> +float
> +f7 (float a, float b, float c)
> +{
> +  return -__builtin_fmaf (a, -b, c);
> +}
> +
> +double
> +f8 (double a, double b, double c)
> +{
> +  return -__builtin_fma (a, -b, c);
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = \.FMS \(} 10 "optimized" { target scalar_all_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-6.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-6.c        2018-05-17 09:18:19.976942432 +0100
> @@ -0,0 +1,67 @@
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return __builtin_fmaf (-a, b, c);
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return __builtin_fma (-a, b, c);
> +}
> +
> +float
> +f3 (float a, float b, float c)
> +{
> +  return __builtin_fmaf (a, -b, c);
> +}
> +
> +double
> +f4 (double a, double b, double c)
> +{
> +  return __builtin_fma (a, -b, c);
> +}
> +
> +void
> +f5 (float a, float b, float c, float d, float e, float *res)
> +{
> +  res[0] = __builtin_fmaf (-a, b, c);
> +  res[1] = __builtin_fmaf (-a, d, e);
> +}
> +
> +void
> +f6 (double a, double b, double c, double d, double e, double *res)
> +{
> +  res[0] = __builtin_fma (-a, b, c);
> +  res[1] = __builtin_fma (-a, d, e);
> +}
> +
> +void
> +f7 (float a, float b, float c, float d, float e, float *res)
> +{
> +  res[0] = __builtin_fmaf (a, -b, c);
> +  res[1] = __builtin_fmaf (d, -b, e);
> +}
> +
> +void
> +f8 (double a, double b, double c, double d, double e, double *res)
> +{
> +  res[0] = __builtin_fma (a, -b, c);
> +  res[1] = __builtin_fma (d, -b, e);
> +}
> +
> +float
> +f9 (float a, float b, float c)
> +{
> +  return -__builtin_fmaf (a, b, -c);
> +}
> +
> +double
> +f10 (double a, double b, double c)
> +{
> +  return -__builtin_fma (a, b, -c);
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 14 "optimized" { target scalar_all_fma } } } */
> Index: gcc/testsuite/gcc.dg/fma-7.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/fma-7.c        2018-05-17 09:18:19.976942432 +0100
> @@ -0,0 +1,67 @@
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +float
> +f1 (float a, float b, float c)
> +{
> +  return __builtin_fmaf (-a, b, -c);
> +}
> +
> +double
> +f2 (double a, double b, double c)
> +{
> +  return __builtin_fma (-a, b, -c);
> +}
> +
> +float
> +f3 (float a, float b, float c)
> +{
> +  return __builtin_fmaf (a, -b, -c);
> +}
> +
> +double
> +f4 (double a, double b, double c)
> +{
> +  return __builtin_fma (a, -b, -c);
> +}
> +
> +void
> +f5 (float a, float b, float c, float d, float *res)
> +{
> +  res[0] = __builtin_fmaf (-a, b, -c);
> +  res[1] = __builtin_fmaf (-a, d, -c);
> +}
> +
> +void
> +f6 (double a, double b, double c, double d, double *res)
> +{
> +  res[0] = __builtin_fma (-a, b, -c);
> +  res[1] = __builtin_fma (-a, d, -c);
> +}
> +
> +void
> +f7 (float a, float b, float c, float d, float *res)
> +{
> +  res[0] = __builtin_fmaf (a, -b, -c);
> +  res[1] = __builtin_fmaf (d, -b, -c);
> +}
> +
> +void
> +f8 (double a, double b, double c, double d, double *res)
> +{
> +  res[0] = __builtin_fma (a, -b, -c);
> +  res[1] = __builtin_fma (d, -b, -c);
> +}
> +
> +float
> +f9 (float a, float b, float c)
> +{
> +  return -__builtin_fmaf (a, b, c);
> +}
> +
> +double
> +f10 (double a, double b, double c)
> +{
> +  return -__builtin_fma (a, b, c);
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = \.FNMS \(} 14 "optimized" { target scalar_all_fma } } } */
> Index: gcc/testsuite/gcc.dg/gimplefe-26.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/gimplefe-26.c  2018-05-16 12:48:59.410941892 +0100
> +++ gcc/testsuite/gcc.dg/gimplefe-26.c  2018-05-17 09:18:19.976942432 +0100
> @@ -1,16 +1,15 @@
> -/* { dg-do compile { target c99_runtime } } */
> +/* { dg-do compile { target scalar_all_fma } } */
>   /* { dg-options "-O -fgimple -fdump-tree-ssa-gimple" } */

>   #define foo(type, num) \
>   type __GIMPLE () foo_##num (type a, type b, type c) \
>   { \
>     type t0; \
> -  t0_1 = __FMA (a, b, c); \
> +  t0_1 = .FMA (a, b, c); \
>     return t0_1; \
>   }

>   foo(float, 1)
>   foo(double, 2)
> -foo(long double, 3)

> -/* { dg-final { scan-tree-dump-times "__FMA" 3 "ssa" } } */
> +/* { dg-final { scan-tree-dump-times {\.FMA} 2 "ssa" } } */
> Index: gcc/testsuite/gfortran.dg/reassoc_7.f
> ===================================================================
> --- gcc/testsuite/gfortran.dg/reassoc_7.f       2018-05-16 12:48:59.410941892 +0100
> +++ gcc/testsuite/gfortran.dg/reassoc_7.f       2018-05-17 09:18:19.976942432 +0100
> @@ -1,5 +1,5 @@
>   ! { dg-do compile }
> -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
> +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }

>         SUBROUTINE S55199(P,Dvdph)
>         implicit none
> Index: gcc/testsuite/gfortran.dg/reassoc_8.f
> ===================================================================
> --- gcc/testsuite/gfortran.dg/reassoc_8.f       2018-05-16 12:48:59.410941892 +0100
> +++ gcc/testsuite/gfortran.dg/reassoc_8.f       2018-05-17 09:18:19.977942409 +0100
> @@ -1,5 +1,5 @@
>   ! { dg-do compile }
> -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
> +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }

>         SUBROUTINE S55199(P,Dvdph)
>         implicit none
> Index: gcc/testsuite/gfortran.dg/reassoc_9.f
> ===================================================================
> --- gcc/testsuite/gfortran.dg/reassoc_9.f       2018-05-16 12:48:59.410941892 +0100
> +++ gcc/testsuite/gfortran.dg/reassoc_9.f       2018-05-17 09:18:19.977942409 +0100
> @@ -1,5 +1,5 @@
>   ! { dg-do compile }
> -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
> +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }

>         SUBROUTINE S55199(P,Dvdph)
>         implicit none
> Index: gcc/testsuite/gfortran.dg/reassoc_10.f
> ===================================================================
> --- gcc/testsuite/gfortran.dg/reassoc_10.f      2018-05-16 12:48:59.410941892 +0100
> +++ gcc/testsuite/gfortran.dg/reassoc_10.f      2018-05-17 09:18:19.976942432 +0100
> @@ -1,5 +1,5 @@
>   ! { dg-do compile }
> -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
> +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }

>         SUBROUTINE S55199(P,Q,Dvdph)
>         implicit none
H.J. Lu May 22, 2018, 7:27 p.m. | #9
On Thu, May 17, 2018 at 1:56 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>>> @@ -2698,23 +2703,26 @@ convert_mult_to_fma_1 (tree mul_result,
>>>          }
>>
>>>         if (negate_p)
>>> -       mulop1 = force_gimple_operand_gsi (&gsi,
>>> -                                          build1 (NEGATE_EXPR,
>>> -                                                  type, mulop1),
>>> -                                          true, NULL_TREE, true,
>>> -                                          GSI_SAME_STMT);
>>> +       mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1);
>>
>>> -      fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt),
>>> -                                     FMA_EXPR, mulop1, op2, addop);
>>> +      if (seq)
>>> +       gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
>>> +      fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
>>> +      gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt));
>>> +      gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (use_stmt));
>>> +      gsi_replace (&gsi, fma_stmt, true);
>>> +      /* Valueize aggressively so that we generate FMS, FNMA and FNMS
>>> +        regardless of where the negation occurs.  */
>>> +      if (fold_stmt (&gsi, aggressive_valueize))
>>> +       update_stmt (gsi_stmt (gsi));
>>
>> I think it would be nice to be able to use gimple_build () with IFNs so
>> you can gimple_build () the IFN and then use gsi_replace_with_seq () on
>> it.  You only need to fold with generated negates, not with negates
>> already in the IL?  Then the folding implied with gimple_build will take
>> care of it.
>
> The idea was to pick up existing negates that feed the multiplication
> as well as any added by the pass itself.
>
> On IRC yesterday we talked about how this should handle the ECF_NOTHROW
> flag, and whether things like IFN_SQRT and IFN_FMA should always be
> nothrow (like the built-in functions are).  But in the end I thought
> it'd be better to keep things as they are.  We already handle
> -fnon-call-exceptions for unfused a * b + c and before the patch also
> handled it for FMA_EXPR.  It'd seem like a step backwards if the new
> internal functions didn't handle it too.  If anything it seems like the
> built-in functions should change to be closer to the tree_code and
> internal_fn way of doing things, if we want to support -fnon-call-exceptions
> properly.
>
> This also surprised me when doing the if-conversion patch I sent yesterday.
> We're happy to vectorise:
>
>   for (int i = 0; i < 100; ++i)
>     x[i] = ... ? sqrt (x[i]) : 0;
>
> by doing the sqrt unconditionally and selecting on the result, even with
> the default maths flags, but refuse to vectorise the simpler:
>
>   for (int i = 0; i < 100; ++i)
>     x[i] = ... ? x[i] + 1 : 0;
>
> in the same way.
>
>> Otherwise can you please move aggressive_valueize to gimple-fold.[ch]
>> alongside no_follow_ssa_edges / follow_single_use_edges and maybe
>> rename it as follow_all_ssa_edges?
>
> Ah, yeah, that's definitely a better name.
>
> I also renamed all_scalar_fma to scalar_all_fma, since I realised
> after Andrew's reply that the old name made it sound like it was
> "all scalars", whereas it was meant to mean "all fmas".
>
> Tested as before.
>
> Thanks,
> Richard
>
> 2018-05-17  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * doc/sourcebuild.texi (scalar_all_fma): Document.
>         * tree.def (FMA_EXPR): Delete.
>         * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions.
>         * internal-fn.c (ternary_direct): New macro.
>         (expand_ternary_optab_fn): Likewise.
>         (direct_ternary_optab_supported_p): Likewise.
>         * Makefile.in (build/genmatch.o): Depend on case-fn-macros.h.
>         * builtins.c (fold_builtin_fma): Delete.
>         (fold_builtin_3): Don't call it.
>         * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling.
>         * expr.c (expand_expr_real_2): Likewise.
>         * fold-const.c (operand_equal_p): Likewise.
>         (fold_ternary_loc): Likewise.
>         * gimple-pretty-print.c (dump_ternary_rhs): Likewise.
>         * gimple.c (DEFTREECODE): Likewise.
>         * gimplify.c (gimplify_expr): Likewise.
>         * optabs-tree.c (optab_for_tree_code): Likewise.
>         * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
>         * tree-eh.c (operation_could_trap_p): Likewise.
>         (stmt_could_throw_1_p): Likewise.
>         * tree-inline.c (estimate_operator_cost): Likewise.
>         * tree-pretty-print.c (dump_generic_node): Likewise.
>         (op_code_prio): Likewise.
>         * tree-ssa-loop-im.c (stmt_cost): Likewise.
>         * tree-ssa-operands.c (get_expr_operands): Likewise.
>         * tree.c (commutative_ternary_tree_code, add_expr): Likewise.
>         * fold-const-call.h (fold_fma): Delete.
>         * fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS,
>         CFN_FNMA and CFN_FNMS.
>         (fold_fma): Delete.
>         * genmatch.c (combined_fn): New enum.
>         (commutative_ternary_tree_code): Remove FMA_EXPR handling.
>         (commutative_op): New function.
>         (commutate): Use it.  Handle more than 2 operands.
>         (dt_operand::gen_gimple_expr): Use commutative_op.
>         (parser::parse_expr): Allow :c to be used with non-binary
>         operators if the commutative operand is known.
>         * gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle
>         CFN_FMS, CFN_FNMA and CFN_FNMS.
>         (backprop::process_assign_use): Remove FMA_EXPR handling.
>         * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise.
>         (gen_hsa_fma): New function.
>         (gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS,
>         IFN_FNMA and IFN_FNMS.
>         * match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS.
>         * gimple-fold.h (follow_all_ssa_edges): Declare.
>         * gimple-fold.c (follow_all_ssa_edges): New function.
>         * tree-ssa-math-opts.c (convert_mult_to_fma_1): Use the
>         gimple_build interface and use follow_all_ssa_edges to fold the result.
>         (convert_mult_to_fma): Use direct_internal_fn_supported_p
>         instead of checking for optabs directly.
>         * config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls
>         rather than FMA_EXPRs.
>         * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a
>         call to IFN_FMA instead of an FMA_EXPR.
>
> gcc/brig/
>         * brigfrontend/brig-function.cc
>         (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA
>         for BRIG_OPCODE_FMA.
>         (brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA
>         as a call.
>
> gcc/c/
>         * gimple-parser.c (c_parser_gimple_postfix_expression): Remove
>         __FMA_EXPR handling.
>
> gcc/cp/
>         * constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling.
>         (potential_constant_expression_1): Likewise.
>
> gcc/testsuite/
>         * lib/target-supports.exp (check_effective_target_scalar_all_fma):
>         New proc.
>         * gcc.dg/fma-1.c: New test.
>         * gcc.dg/fma-2.c: Likewise.
>         * gcc.dg/fma-3.c: Likewise.
>         * gcc.dg/fma-4.c: Likewise.
>         * gcc.dg/fma-5.c: Likewise.
>         * gcc.dg/fma-6.c: Likewise.
>         * gcc.dg/fma-7.c: Likewise.
>         * gcc.dg/gimplefe-26.c: Use .FMA instead of __FMA and require
>         scalar_all_fma.
>         * gfortran.dg/reassoc_7.f: Pass -ffp-contract=off.
>         * gfortran.dg/reassoc_8.f: Likewise.
>         * gfortran.dg/reassoc_9.f: Likewise.
>         * gfortran.dg/reassoc_10.f: Likewise.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85881


H.J.
Richard Sandiford May 23, 2018, 8:23 a.m. | #10
"H.J. Lu" <hjl.tools@gmail.com> writes:
> This caused:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85881

Sorry about that.  Looking at it, I think it's another instance of the
original RA problem in PR 80481.  The expand code after this patch is
the same as before apart from the numbering of the pseudo registers:
after the patch there are fewer unused pseudo registers than before,
which in itself should be a good thing.  However, it seems to be enough
to defeat the fix for PR 80481, and in fact reverting that fix makes
pr80481.C pass again.

Thanks,
Richard

Patch

Index: gcc/doc/sourcebuild.texi
===================================================================
--- gcc/doc/sourcebuild.texi	2018-05-08 09:42:01.479723260 +0100
+++ gcc/doc/sourcebuild.texi	2018-05-11 18:08:24.833947453 +0100
@@ -2170,6 +2170,11 @@  Target supports wide characters.
 @subsubsection Other attributes
 
 @table @code
+@item all_scalar_fma
+Target supports all four fused multiply-add optabs for both @code{float}
+and @code{double}.  These optabs are: @code{fma_optab}, @code{fms_optab},
+@code{fnma_optab} and @code{fnms_optab}.
+
 @item automatic_stack_alignment
 Target supports automatic stack alignment.
 
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	2018-01-03 11:12:58.606649676 +0000
+++ gcc/tree.def	2018-05-11 18:08:24.852946671 +0100
@@ -1345,12 +1345,6 @@  DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "wid
    by the second argument.  */
 DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2)
 
-/* Fused multiply-add.
-   All operands and the result are of the same type.  No intermediate
-   rounding is performed after multiplying operand one with operand two
-   before adding operand three.  */
-DEFTREECODE (FMA_EXPR, "fma_expr", tcc_expression, 3)
-
 /* Widening vector multiplication.
    The two operands are vectors with N elements of size S. Multiplying the
    elements of the two vectors will result in N products of size 2*S.
Index: gcc/internal-fn.def
===================================================================
--- gcc/internal-fn.def	2018-02-20 09:40:41.809452604 +0000
+++ gcc/internal-fn.def	2018-05-11 18:08:24.842947083 +0100
@@ -57,6 +57,7 @@  along with GCC; see the file COPYING3.
 
    - unary: a normal unary optab, such as vec_reverse_<mode>
    - binary: a normal binary optab, such as vec_interleave_lo_<mode>
+   - ternary: a normal ternary optab, such as fma<mode>4
 
    - cond_binary: a conditional binary optab, such as add<mode>cc
 
@@ -138,6 +139,10 @@  DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CO
 DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW,
 		       vec_shl_insert, binary)
 
+DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
+DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
+DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
+
 DEF_INTERNAL_OPTAB_FN (COND_ADD, ECF_CONST, cond_add, cond_binary)
 DEF_INTERNAL_OPTAB_FN (COND_SUB, ECF_CONST, cond_sub, cond_binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (COND_MIN, ECF_CONST, first,
@@ -218,6 +223,9 @@  DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS
 /* FP scales.  */
 DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
 
+/* Ternary math functions.  */
+DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary)
+
 /* Unary integer ops.  */
 DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary)
 DEF_INTERNAL_INT_FN (CLZ, ECF_CONST | ECF_NOTHROW, clz, unary)
Index: gcc/internal-fn.c
===================================================================
--- gcc/internal-fn.c	2018-03-02 09:44:45.456595753 +0000
+++ gcc/internal-fn.c	2018-05-11 18:08:24.841947124 +0100
@@ -90,6 +90,7 @@  #define mask_store_lanes_direct { 0, 0,
 #define scatter_store_direct { 3, 3, false }
 #define unary_direct { 0, 0, true }
 #define binary_direct { 0, 0, true }
+#define ternary_direct { 0, 0, true }
 #define cond_unary_direct { 1, 1, true }
 #define cond_binary_direct { 1, 1, true }
 #define while_direct { 0, 2, false }
@@ -2962,6 +2963,9 @@  #define expand_unary_optab_fn(FN, STMT,
 #define expand_binary_optab_fn(FN, STMT, OPTAB) \
   expand_direct_optab_fn (FN, STMT, OPTAB, 2)
 
+#define expand_ternary_optab_fn(FN, STMT, OPTAB) \
+  expand_direct_optab_fn (FN, STMT, OPTAB, 3)
+
 #define expand_cond_unary_optab_fn(FN, STMT, OPTAB) \
   expand_direct_optab_fn (FN, STMT, OPTAB, 2)
 
@@ -3047,6 +3051,7 @@  multi_vector_optab_supported_p (convert_
 
 #define direct_unary_optab_supported_p direct_optab_supported_p
 #define direct_binary_optab_supported_p direct_optab_supported_p
+#define direct_ternary_optab_supported_p direct_optab_supported_p
 #define direct_cond_unary_optab_supported_p direct_optab_supported_p
 #define direct_cond_binary_optab_supported_p direct_optab_supported_p
 #define direct_mask_load_optab_supported_p direct_optab_supported_p
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	2018-03-13 15:06:01.749426280 +0000
+++ gcc/Makefile.in	2018-05-11 18:08:24.816948152 +0100
@@ -2786,7 +2786,7 @@  build/genmddump.o : genmddump.c $(RTL_BA
   $(CORETYPES_H) $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H)
 build/genmatch.o : genmatch.c $(BCONFIG_H) $(SYSTEM_H) \
   $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h \
-  tree.def builtins.def internal-fn.def
+  tree.def builtins.def internal-fn.def case-cfn-macros.h
 build/gencfn-macros.o : gencfn-macros.c $(BCONFIG_H) $(SYSTEM_H)	\
   $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-set.h builtins.def	\
   internal-fn.def
Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	2018-05-08 09:42:01.633717606 +0100
+++ gcc/builtins.c	2018-05-11 18:08:24.817948111 +0100
@@ -8340,21 +8340,6 @@  fold_builtin_abs (location_t loc, tree a
   return fold_build1_loc (loc, ABS_EXPR, type, arg);
 }
 
-/* Fold a call to fma, fmaf, or fmal with arguments ARG[012].  */
-
-static tree
-fold_builtin_fma (location_t loc, tree arg0, tree arg1, tree arg2, tree type)
-{
-  /* ??? Only expand to FMA_EXPR if it's directly supported.  */
-  if (validate_arg (arg0, REAL_TYPE)
-      && validate_arg (arg1, REAL_TYPE)
-      && validate_arg (arg2, REAL_TYPE)
-      && optab_handler (fma_optab, TYPE_MODE (type)) != CODE_FOR_nothing)
-    return fold_build3_loc (loc, FMA_EXPR, type, arg0, arg1, arg2);
-
-  return NULL_TREE;
-}
-
 /* Fold a call to builtin carg(a+bi) -> atan2(b,a).  */
 
 static tree
@@ -9260,10 +9245,6 @@  fold_builtin_3 (location_t loc, tree fnd
     CASE_FLT_FN (BUILT_IN_SINCOS):
       return fold_builtin_sincos (loc, arg0, arg1, arg2);
 
-    CASE_FLT_FN (BUILT_IN_FMA):
-    CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA):
-      return fold_builtin_fma (loc, arg0, arg1, arg2, type);
-
     CASE_FLT_FN (BUILT_IN_REMQUO):
       if (validate_arg (arg0, REAL_TYPE)
 	  && validate_arg (arg1, REAL_TYPE)
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2018-05-09 11:34:47.190553782 +0100
+++ gcc/cfgexpand.c	2018-05-11 18:08:24.818948070 +0100
@@ -4202,7 +4202,6 @@  expand_debug_expr (tree exp)
 	case SAD_EXPR:
 	case WIDEN_MULT_PLUS_EXPR:
 	case WIDEN_MULT_MINUS_EXPR:
-	case FMA_EXPR:
 	  goto ternary;
 
 	case TRUTH_ANDIF_EXPR:
@@ -5190,9 +5189,6 @@  expand_debug_expr (tree exp)
 	}
       return NULL;
 
-    case FMA_EXPR:
-      return simplify_gen_ternary (FMA, mode, inner_mode, op0, op1, op2);
-
     default:
     flag_unsupported:
       if (flag_checking)
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2018-05-01 19:30:32.099548805 +0100
+++ gcc/expr.c	2018-05-11 18:08:24.833947453 +0100
@@ -8853,67 +8853,6 @@  #define REDUCE_BIT_FIELD(expr)	(reduce_b
       expand_operands (treeop0, treeop1, subtarget, &op0, &op1, EXPAND_NORMAL);
       return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1, target, unsignedp));
 
-    case FMA_EXPR:
-      {
-	optab opt = fma_optab;
-	gimple *def0, *def2;
-
-	/* If there is no insn for FMA, emit it as __builtin_fma{,f,l}
-	   call.  */
-	if (optab_handler (fma_optab, mode) == CODE_FOR_nothing)
-	  {
-	    tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA);
-	    tree call_expr;
-
-	    gcc_assert (fn != NULL_TREE);
-	    call_expr = build_call_expr (fn, 3, treeop0, treeop1, treeop2);
-	    return expand_builtin (call_expr, target, subtarget, mode, false);
-	  }
-
-	def0 = get_def_for_expr (treeop0, NEGATE_EXPR);
-	/* The multiplication is commutative - look at its 2nd operand
-	   if the first isn't fed by a negate.  */
-	if (!def0)
-	  {
-	    def0 = get_def_for_expr (treeop1, NEGATE_EXPR);
-	    /* Swap operands if the 2nd operand is fed by a negate.  */
-	    if (def0)
-	      std::swap (treeop0, treeop1);
-	  }
-	def2 = get_def_for_expr (treeop2, NEGATE_EXPR);
-
-	op0 = op2 = NULL;
-
-	if (def0 && def2
-	    && optab_handler (fnms_optab, mode) != CODE_FOR_nothing)
-	  {
-	    opt = fnms_optab;
-	    op0 = expand_normal (gimple_assign_rhs1 (def0));
-	    op2 = expand_normal (gimple_assign_rhs1 (def2));
-	  }
-	else if (def0
-		 && optab_handler (fnma_optab, mode) != CODE_FOR_nothing)
-	  {
-	    opt = fnma_optab;
-	    op0 = expand_normal (gimple_assign_rhs1 (def0));
-	  }
-	else if (def2
-		 && optab_handler (fms_optab, mode) != CODE_FOR_nothing)
-	  {
-	    opt = fms_optab;
-	    op2 = expand_normal (gimple_assign_rhs1 (def2));
-	  }
-
-	if (op0 == NULL)
-	  op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL);
-	if (op2 == NULL)
-	  op2 = expand_normal (treeop2);
-	op1 = expand_normal (treeop1);
-
-	return expand_ternary_op (TYPE_MODE (type), opt,
-				  op0, op1, op2, target, 0);
-      }
-
     case MULT_EXPR:
       /* If this is a fixed-point operation, then we cannot use the code
 	 below because "expand_mult" doesn't support sat/no-sat fixed-point
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2018-05-08 09:42:01.637717459 +0100
+++ gcc/fold-const.c	2018-05-11 18:08:24.835947371 +0100
@@ -3297,7 +3297,6 @@  #define OP_SAME_WITH_NULL(N)				\
 	case TRUTH_ORIF_EXPR:
 	  return OP_SAME (0) && OP_SAME (1);
 
-	case FMA_EXPR:
 	case WIDEN_MULT_PLUS_EXPR:
 	case WIDEN_MULT_MINUS_EXPR:
 	  if (!OP_SAME (2))
@@ -11708,17 +11707,6 @@  fold_ternary_loc (location_t loc, enum t
 
       return NULL_TREE;
 
-    case FMA_EXPR:
-      /* For integers we can decompose the FMA if possible.  */
-      if (TREE_CODE (arg0) == INTEGER_CST
-	  && TREE_CODE (arg1) == INTEGER_CST)
-	return fold_build2_loc (loc, PLUS_EXPR, type,
-				const_binop (MULT_EXPR, arg0, arg1), arg2);
-      if (integer_zerop (arg2))
-	return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);
-
-      return fold_fma (loc, type, arg0, arg1, arg2);
-
     case VEC_PERM_EXPR:
       if (TREE_CODE (arg2) == VECTOR_CST)
 	{
Index: gcc/gimple-pretty-print.c
===================================================================
--- gcc/gimple-pretty-print.c	2018-02-13 10:28:33.325853088 +0000
+++ gcc/gimple-pretty-print.c	2018-05-11 18:08:24.836947330 +0100
@@ -490,27 +490,6 @@  dump_ternary_rhs (pretty_printer *buffer
       pp_greater (buffer);
       break;
 
-    case FMA_EXPR:
-      if (flags & TDF_GIMPLE)
-	{
-	  pp_string (buffer, "__FMA (");
-	  dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
-	  pp_comma (buffer);
-	  dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
-	  pp_comma (buffer);
-	  dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
-	  pp_right_paren (buffer);
-	}
-      else
-	{
-	  dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
-	  pp_string (buffer, " * ");
-	  dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
-	  pp_string (buffer, " + ");
-	  dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
-	}
-      break;
-
     case DOT_PROD_EXPR:
       pp_string (buffer, "DOT_PROD_EXPR <");
       dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c	2018-02-13 10:28:33.415861313 +0000
+++ gcc/gimple.c	2018-05-11 18:08:24.838947247 +0100
@@ -2143,8 +2143,7 @@  #define DEFTREECODE(SYM, STRING, TYPE, N
       || (SYM) == REALIGN_LOAD_EXPR					    \
       || (SYM) == VEC_COND_EXPR						    \
       || (SYM) == VEC_PERM_EXPR                                             \
-      || (SYM) == BIT_INSERT_EXPR					    \
-      || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS			    \
+      || (SYM) == BIT_INSERT_EXPR) ? GIMPLE_TERNARY_RHS			    \
    : ((SYM) == CONSTRUCTOR						    \
       || (SYM) == OBJ_TYPE_REF						    \
       || (SYM) == ASSERT_EXPR						    \
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	2018-05-08 09:42:02.972668452 +0100
+++ gcc/gimplify.c	2018-05-11 18:08:24.840947165 +0100
@@ -12086,7 +12086,6 @@  gimplify_expr (tree *expr_p, gimple_seq
 	  }
 	  break;
 
-	case FMA_EXPR:
 	case VEC_PERM_EXPR:
 	  /* Classified as tcc_expression.  */
 	  goto expr_3;
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c	2018-01-13 18:01:26.108685820 +0000
+++ gcc/optabs-tree.c	2018-05-11 18:08:24.842947083 +0100
@@ -143,9 +143,6 @@  optab_for_tree_code (enum tree_code code
 	      : (TYPE_SATURATING (type)
 		 ? ssmsub_widen_optab : smsub_widen_optab));
 
-    case FMA_EXPR:
-      return fma_optab;
-
     case VEC_WIDEN_MULT_HI_EXPR:
       return TYPE_UNSIGNED (type) ?
 	vec_widen_umult_hi_optab : vec_widen_smult_hi_optab;
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	2018-05-01 19:31:03.079312535 +0100
+++ gcc/tree-cfg.c	2018-05-11 18:08:24.848946836 +0100
@@ -4109,20 +4109,6 @@  verify_gimple_assign_ternary (gassign *s
 	}
       break;
 
-    case FMA_EXPR:
-      if (!useless_type_conversion_p (lhs_type, rhs1_type)
-	  || !useless_type_conversion_p (lhs_type, rhs2_type)
-	  || !useless_type_conversion_p (lhs_type, rhs3_type))
-	{
-	  error ("type mismatch in fused multiply-add expression");
-	  debug_generic_expr (lhs_type);
-	  debug_generic_expr (rhs1_type);
-	  debug_generic_expr (rhs2_type);
-	  debug_generic_expr (rhs3_type);
-	  return true;
-	}
-      break;
-
     case VEC_COND_EXPR:
       if (!VECTOR_BOOLEAN_TYPE_P (rhs1_type)
 	  || maybe_ne (TYPE_VECTOR_SUBPARTS (rhs1_type),
Index: gcc/tree-eh.c
===================================================================
--- gcc/tree-eh.c	2018-02-08 13:34:20.321293427 +0000
+++ gcc/tree-eh.c	2018-05-11 18:08:24.849946795 +0100
@@ -2512,8 +2512,7 @@  operation_could_trap_p (enum tree_code o
 
   if (TREE_CODE_CLASS (op) != tcc_comparison
       && TREE_CODE_CLASS (op) != tcc_unary
-      && TREE_CODE_CLASS (op) != tcc_binary
-      && op != FMA_EXPR)
+      && TREE_CODE_CLASS (op) != tcc_binary)
     return false;
 
   return operation_could_trap_helper_p (op, fp_operation, honor_trapv,
@@ -2825,8 +2824,7 @@  stmt_could_throw_1_p (gassign *stmt)
 
   if (TREE_CODE_CLASS (code) == tcc_comparison
       || TREE_CODE_CLASS (code) == tcc_unary
-      || TREE_CODE_CLASS (code) == tcc_binary
-      || code == FMA_EXPR)
+      || TREE_CODE_CLASS (code) == tcc_binary)
     {
       if (TREE_CODE_CLASS (code) == tcc_comparison)
 	t = TREE_TYPE (gimple_assign_rhs1 (stmt));
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	2018-03-13 15:06:01.738427167 +0000
+++ gcc/tree-inline.c	2018-05-11 18:08:24.849946795 +0100
@@ -3855,7 +3855,6 @@  estimate_operator_cost (enum tree_code c
     case MINUS_EXPR:
     case MULT_EXPR:
     case MULT_HIGHPART_EXPR:
-    case FMA_EXPR:
 
     case ADDR_SPACE_CONVERT_EXPR:
     case FIXED_CONVERT_EXPR:
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	2018-03-13 15:06:00.523525161 +0000
+++ gcc/tree-pretty-print.c	2018-05-11 18:08:24.850946754 +0100
@@ -2901,16 +2901,6 @@  dump_generic_node (pretty_printer *pp, t
       pp_string (pp, " > ");
       break;
 
-    case FMA_EXPR:
-      pp_string (pp, " FMA_EXPR < ");
-      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
-      pp_string (pp, ", ");
-      dump_generic_node (pp, TREE_OPERAND (node, 1), spc, flags, false);
-      pp_string (pp, ", ");
-      dump_generic_node (pp, TREE_OPERAND (node, 2), spc, flags, false);
-      pp_string (pp, " > ");
-      break;
-
     case OACC_PARALLEL:
       pp_string (pp, "#pragma acc parallel");
       goto dump_omp_clauses_body;
@@ -3549,7 +3539,6 @@  op_code_prio (enum tree_code code)
     case CEIL_MOD_EXPR:
     case FLOOR_MOD_EXPR:
     case ROUND_MOD_EXPR:
-    case FMA_EXPR:
       return 13;
 
     case TRUTH_NOT_EXPR:
Index: gcc/tree-ssa-loop-im.c
===================================================================
--- gcc/tree-ssa-loop-im.c	2018-03-13 15:06:00.521525322 +0000
+++ gcc/tree-ssa-loop-im.c	2018-05-11 18:08:24.850946754 +0100
@@ -493,7 +493,6 @@  stmt_cost (gimple *stmt)
     case WIDEN_MULT_PLUS_EXPR:
     case WIDEN_MULT_MINUS_EXPR:
     case DOT_PROD_EXPR:
-    case FMA_EXPR:
     case TRUNC_DIV_EXPR:
     case CEIL_DIV_EXPR:
     case FLOOR_DIV_EXPR:
Index: gcc/tree-ssa-operands.c
===================================================================
--- gcc/tree-ssa-operands.c	2018-01-03 11:12:59.133628905 +0000
+++ gcc/tree-ssa-operands.c	2018-05-11 18:08:24.851946713 +0100
@@ -849,7 +849,6 @@  get_expr_operands (struct function *fn,
     case REALIGN_LOAD_EXPR:
     case WIDEN_MULT_PLUS_EXPR:
     case WIDEN_MULT_MINUS_EXPR:
-    case FMA_EXPR:
       {
 	get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 0), flags);
 	get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 1), flags);
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2018-05-08 09:42:01.631717680 +0100
+++ gcc/tree.c	2018-05-11 18:08:24.852946671 +0100
@@ -7171,7 +7171,6 @@  commutative_ternary_tree_code (enum tree
     case WIDEN_MULT_PLUS_EXPR:
     case WIDEN_MULT_MINUS_EXPR:
     case DOT_PROD_EXPR:
-    case FMA_EXPR:
       return true;
 
     default:
@@ -7457,7 +7456,6 @@  add_expr (const_tree t, inchash::hash &h
 	      flags &= ~OEP_ADDRESS_OF;
 	      break;
 
-	    case FMA_EXPR:
 	    case WIDEN_MULT_PLUS_EXPR:
 	    case WIDEN_MULT_MINUS_EXPR:
 	      {
Index: gcc/fold-const-call.h
===================================================================
--- gcc/fold-const-call.h	2018-01-03 11:12:58.150667646 +0000
+++ gcc/fold-const-call.h	2018-05-11 18:08:24.833947453 +0100
@@ -23,7 +23,6 @@  #define GCC_FOLD_CONST_CALL_H
 tree fold_const_call (combined_fn, tree, tree);
 tree fold_const_call (combined_fn, tree, tree, tree);
 tree fold_const_call (combined_fn, tree, tree, tree, tree);
-tree fold_fma (location_t, tree, tree, tree, tree);
 tree build_cmp_result (tree type, int res);
 
 #endif
Index: gcc/fold-const-call.c
===================================================================
--- gcc/fold-const-call.c	2018-05-08 09:42:01.844709861 +0100
+++ gcc/fold-const-call.c	2018-05-11 18:08:24.833947453 +0100
@@ -1606,6 +1606,26 @@  fold_const_call_ssss (real_value *result
     CASE_CFN_FMA_FN:
       return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, arg2, format);
 
+    case CFN_FMS:
+      {
+	real_value new_arg2 = real_value_negate (arg2);
+	return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, &new_arg2, format);
+      }
+
+    case CFN_FNMA:
+      {
+	real_value new_arg0 = real_value_negate (arg0);
+	return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, arg2, format);
+      }
+
+    case CFN_FNMS:
+      {
+	real_value new_arg0 = real_value_negate (arg0);
+	real_value new_arg2 = real_value_negate (arg2);
+	return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1,
+			     &new_arg2, format);
+      }
+
     default:
       return false;
     }
@@ -1719,20 +1739,3 @@  fold_const_call (combined_fn fn, tree ty
       return fold_const_call_1 (fn, type, arg0, arg1, arg2);
     }
 }
-
-/* Fold a fma operation with arguments ARG[012].  */
-
-tree
-fold_fma (location_t, tree type, tree arg0, tree arg1, tree arg2)
-{
-  REAL_VALUE_TYPE result;
-  if (real_cst_p (arg0)
-      && real_cst_p (arg1)
-      && real_cst_p (arg2)
-      && do_mpfr_arg3 (&result, mpfr_fma, TREE_REAL_CST_PTR (arg0),
-		       TREE_REAL_CST_PTR (arg1), TREE_REAL_CST_PTR (arg2),
-		       REAL_MODE_FORMAT (TYPE_MODE (type))))
-    return build_real (type, result);
-
-  return NULL_TREE;
-}
Index: gcc/genmatch.c
===================================================================
--- gcc/genmatch.c	2018-03-01 08:20:43.846526310 +0000
+++ gcc/genmatch.c	2018-05-11 18:08:24.835947371 +0100
@@ -241,6 +241,20 @@  enum internal_fn {
   IFN_LAST
 };
 
+enum combined_fn {
+#define DEF_BUILTIN(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND) \
+  CFN_##ENUM = int (ENUM),
+#include "builtins.def"
+
+#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \
+  CFN_##CODE = int (END_BUILTINS) + int (IFN_##CODE),
+#include "internal-fn.def"
+
+  CFN_LAST
+};
+
+#include "case-cfn-macros.h"
+
 /* Return true if CODE represents a commutative tree code.  Otherwise
    return false.  */
 bool
@@ -288,7 +302,6 @@  commutative_ternary_tree_code (enum tree
     case WIDEN_MULT_PLUS_EXPR:
     case WIDEN_MULT_MINUS_EXPR:
     case DOT_PROD_EXPR:
-    case FMA_EXPR:
       return true;
 
     default:
@@ -450,6 +463,44 @@  is_a_helper <user_id *>::test (id_base *
   return id->kind == id_base::USER;
 }
 
+/* If ID has a pair of consecutive, commutative operands, return the
+   index of the first, otherwise return -1.  */
+
+static int
+commutative_op (id_base *id)
+{
+  if (operator_id *code = dyn_cast <operator_id *> (id))
+    {
+      if (commutative_tree_code (code->code)
+	  || commutative_ternary_tree_code (code->code))
+	return 0;
+      return -1;
+    }
+  if (fn_id *fn = dyn_cast <fn_id *> (id))
+    switch (fn->fn)
+      {
+      CASE_CFN_FMA:
+      case CFN_FMS:
+      case CFN_FNMA:
+      case CFN_FNMS:
+	return 0;
+
+      default:
+	return -1;
+      }
+  if (user_id *uid = dyn_cast<user_id *> (id))
+    {
+      int res = commutative_op (uid->substitutes[0]);
+      if (res < 0)
+	return 0;
+      for (unsigned i = 1; i < uid->substitutes.length (); ++i)
+	if (res != commutative_op (uid->substitutes[i]))
+	  return -1;
+      return res;
+    }
+  return -1;
+}
+
 /* Add a predicate identifier to the hash.  */
 
 static predicate_id *
@@ -946,6 +997,9 @@  commutate (operand *op, vec<vec<user_id
   if (!e->is_commutative)
     return ret;
 
+  /* The operation is always binary if it isn't inherently commutative.  */
+  int natural_opno = commutative_op (e->operation);
+  unsigned int opno = natural_opno >= 0 ? natural_opno : 0;
   for (unsigned i = 0; i < result.length (); ++i)
     {
       expr *ne = new expr (e);
@@ -994,9 +1048,11 @@  commutate (operand *op, vec<vec<user_id
 	    }
 	}
       ne->is_commutative = false;
-      // result[i].length () is 2 since e->operation is binary
-      for (unsigned j = result[i].length (); j; --j)
-	ne->append_op (result[i][j-1]);
+      for (unsigned j = 0; j < result[i].length (); ++j)
+	{
+	  int old_j = (j == opno ? opno + 1 : j == opno + 1 ? opno : j);
+	  ne->append_op (result[i][old_j]);
+	}
       ret.safe_push (ne);
     }
 
@@ -2759,24 +2815,18 @@  dt_operand::gen_gimple_expr (FILE *f, in
   /* While the toplevel operands are canonicalized by the caller
      after valueizing operands of sub-expressions we have to
      re-canonicalize operand order.  */
-  if (operator_id *code = dyn_cast <operator_id *> (id))
+  int opno = commutative_op (id);
+  if (opno >= 0)
     {
-      /* ???  We can't canonicalize tcc_comparison operands here
-         because that requires changing the comparison code which
-	 we already matched...  */
-      if (commutative_tree_code (code->code)
-	  || commutative_ternary_tree_code (code->code))
-	{
-	  char child_opname0[20], child_opname1[20];
-	  gen_opname (child_opname0, 0);
-	  gen_opname (child_opname1, 1);
-	  fprintf_indent (f, indent,
-			  "if (tree_swap_operands_p (%s, %s))\n",
-			  child_opname0, child_opname1);
-	  fprintf_indent (f, indent,
-			  "  std::swap (%s, %s);\n",
-			  child_opname0, child_opname1);
-	}
+      char child_opname0[20], child_opname1[20];
+      gen_opname (child_opname0, opno);
+      gen_opname (child_opname1, opno + 1);
+      fprintf_indent (f, indent,
+		      "if (tree_swap_operands_p (%s, %s))\n",
+		      child_opname0, child_opname1);
+      fprintf_indent (f, indent,
+		      "  std::swap (%s, %s);\n",
+		      child_opname0, child_opname1);
     }
 
   return n_braces;
@@ -4217,11 +4267,14 @@  parser::parse_expr ()
 		      e->operation->id, e->operation->nargs, e->ops.length ());
 	  if (is_commutative)
 	    {
-	      if (e->ops.length () == 2)
+	      if (e->ops.length () == 2
+		  || commutative_op (e->operation) >= 0)
 		e->is_commutative = true;
 	      else
-		fatal_at (token, "only binary operators or function with "
-			  "two arguments can be marked commutative");
+		fatal_at (token, "only binary operators or functions with "
+			  "two arguments can be marked commutative, "
+			  "unless the operation is known to be inherently "
+			  "commutative");
 	    }
 	  e->expr_type = expr_type;
 	  return op;
Index: gcc/gimple-ssa-backprop.c
===================================================================
--- gcc/gimple-ssa-backprop.c	2018-01-03 11:12:59.113629694 +0000
+++ gcc/gimple-ssa-backprop.c	2018-05-11 18:08:24.836947330 +0100
@@ -375,6 +375,9 @@  backprop::process_builtin_call_use (gcal
 
     CASE_CFN_FMA:
     CASE_CFN_FMA_FN:
+    case CFN_FMS:
+    case CFN_FNMA:
+    case CFN_FNMS:
       /* In X * X + Y, where Y is distinct from X, the sign of X doesn't
 	 matter.  */
       if (gimple_call_arg (call, 0) == rhs
@@ -420,15 +423,6 @@  backprop::process_assign_use (gassign *a
 	}
       break;
 
-    case FMA_EXPR:
-      /* In X * X + Y, where Y is distinct from X, the sign of X doesn't
-	 matter.  */
-      if (gimple_assign_rhs1 (assign) == rhs
-	  && gimple_assign_rhs2 (assign) == rhs
-	  && gimple_assign_rhs3 (assign) != rhs)
-	info->flags.ignore_sign = true;
-      break;
-
     case MULT_EXPR:
       /* In X * X, the sign of X doesn't matter.  */
       if (gimple_assign_rhs1 (assign) == rhs
Index: gcc/hsa-gen.c
===================================================================
--- gcc/hsa-gen.c	2018-03-17 08:30:21.230924973 +0000
+++ gcc/hsa-gen.c	2018-05-11 18:08:24.841947124 +0100
@@ -3178,23 +3178,6 @@  gen_hsa_insns_for_operation_assignment (
     case NEGATE_EXPR:
       opcode = BRIG_OPCODE_NEG;
       break;
-    case FMA_EXPR:
-      /* There is a native HSA instruction for scalar FMAs but not for vector
-	 ones.  */
-      if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE)
-	{
-	  hsa_op_reg *dest
-	    = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign));
-	  hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
-	  hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
-	  hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
-	  hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
-	  gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb);
-	  gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp, op3, hbb);
-	  return;
-	}
-      opcode = BRIG_OPCODE_MAD;
-      break;
     case MIN_EXPR:
       opcode = BRIG_OPCODE_MIN;
       break;
@@ -4490,6 +4473,57 @@  gen_hsa_divmod (gcall *call, hsa_bb *hbb
   insn->set_output_in_type (dest, 0, hbb);
 }
 
+/* Emit instructions that implement FMA, FMS, FNMA or FNMS call STMT.
+   Instructions are appended to basic block HBB.  NEGATE1 is true for
+   FNMA and FNMS.  NEGATE3 is true for FMS and FNMS.  */
+
+static void
+gen_hsa_fma (gcall *call, hsa_bb *hbb, bool negate1, bool negate3)
+{
+  tree lhs = gimple_call_lhs (call);
+  if (lhs == NULL_TREE)
+    return;
+
+  tree rhs1 = gimple_call_arg (call, 0);
+  tree rhs2 = gimple_call_arg (call, 1);
+  tree rhs3 = gimple_call_arg (call, 2);
+
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+  hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
+  hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
+  hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
+
+  if (negate1)
+    {
+      hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
+      gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op1, hbb);
+      op1 = tmp;
+    }
+
+  /* There is a native HSA instruction for scalar FMAs but not for vector
+     ones.  */
+  if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE)
+    {
+      hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
+      gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb);
+      gen_hsa_binary_operation (negate3 ? BRIG_OPCODE_SUB : BRIG_OPCODE_ADD,
+				dest, tmp, op3, hbb);
+    }
+  else
+    {
+      if (negate3)
+	{
+	  hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
+	  gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op3, hbb);
+	  op3 = tmp;
+	}
+      hsa_insn_basic *insn = new hsa_insn_basic (4, BRIG_OPCODE_MAD,
+						 dest->m_type, dest,
+						 op1, op2, op3);
+      hbb->append_insn (insn);
+    }
+}
+
 /* Set VALUE to a shadow kernel debug argument and append a new instruction
    to HBB basic block.  */
 
@@ -5224,6 +5258,22 @@  gen_hsa_insn_for_internal_fn_call (gcall
       gen_hsa_insns_for_call_of_internal_fn (stmt, hbb);
       break;
 
+    case IFN_FMA:
+      gen_hsa_fma (stmt, hbb, false, false);
+      break;
+
+    case IFN_FMS:
+      gen_hsa_fma (stmt, hbb, false, true);
+      break;
+
+    case IFN_FNMA:
+      gen_hsa_fma (stmt, hbb, true, false);
+      break;
+
+    case IFN_FNMS:
+      gen_hsa_fma (stmt, hbb, true, true);
+      break;
+
     default:
       HSA_SORRY_ATV (gimple_location (stmt),
 		     "support for HSA does not implement internal function: %s",
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	2018-05-02 08:39:32.882737077 +0100
+++ gcc/match.pd	2018-05-11 18:08:24.842947083 +0100
@@ -4702,3 +4702,60 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	|| wi::geu_p (wi::to_wide (@rpos),
 		      wi::to_wide (@ipos) + isize))
     (BIT_FIELD_REF @0 @rsize @rpos)))))
+
+(for fmas (FMA)
+ (simplify
+  (fmas:c (negate @0) @1 @2)
+  (IFN_FNMA @0 @1 @2))
+ (simplify
+  (fmas @0 @1 (negate @2))
+  (IFN_FMS @0 @1 @2))
+ (simplify
+  (fmas:c (negate @0) @1 (negate @2))
+  (IFN_FNMS @0 @1 @2))
+ (simplify
+  (negate (fmas@3 @0 @1 @2))
+  (if (single_use (@3))
+   (IFN_FNMS @0 @1 @2))))
+
+(simplify
+ (IFN_FMS:c (negate @0) @1 @2)
+ (IFN_FNMS @0 @1 @2))
+(simplify
+ (IFN_FMS @0 @1 (negate @2))
+ (IFN_FMA @0 @1 @2))
+(simplify
+ (IFN_FMS:c (negate @0) @1 (negate @2))
+ (IFN_FNMA @0 @1 @2))
+(simplify
+ (negate (IFN_FMS@3 @0 @1 @2))
+  (if (single_use (@3))
+   (IFN_FNMA @0 @1 @2)))
+
+(simplify
+ (IFN_FNMA:c (negate @0) @1 @2)
+ (IFN_FMA @0 @1 @2))
+(simplify
+ (IFN_FNMA @0 @1 (negate @2))
+ (IFN_FNMS @0 @1 @2))
+(simplify
+ (IFN_FNMA:c (negate @0) @1 (negate @2))
+ (IFN_FMS @0 @1 @2))
+(simplify
+ (negate (IFN_FNMA@3 @0 @1 @2))
+ (if (single_use (@3))
+  (IFN_FMS @0 @1 @2)))
+
+(simplify
+ (IFN_FNMS:c (negate @0) @1 @2)
+ (IFN_FMS @0 @1 @2))
+(simplify
+ (IFN_FNMS @0 @1 (negate @2))
+ (IFN_FNMA @0 @1 @2))
+(simplify
+ (IFN_FNMS:c (negate @0) @1 (negate @2))
+ (IFN_FMA @0 @1 @2))
+(simplify
+ (negate (IFN_FNMS@3 @0 @1 @2))
+ (if (single_use (@3))
+  (IFN_FMA @0 @1 @2)))
Index: gcc/tree-ssa-math-opts.c
===================================================================
--- gcc/tree-ssa-math-opts.c	2018-01-12 14:45:51.037434575 +0000
+++ gcc/tree-ssa-math-opts.c	2018-05-11 18:08:24.850946754 +0100
@@ -2640,6 +2640,14 @@  convert_plusminus_to_widen (gimple_stmt_
   return true;
 }
 
+/* gimple_fold callback that "valueizes" everything.  */
+
+static tree
+aggressive_valueize (tree val)
+{
+  return val;
+}
+
 /* Given a result MUL_RESULT which is a result of a multiplication of OP1 and
    OP2 and which we know is used in statements that can be, together with the
    multiplication, converted to FMAs, perform the transformation.  */
@@ -2650,7 +2658,7 @@  convert_mult_to_fma_1 (tree mul_result,
   tree type = TREE_TYPE (mul_result);
   gimple *use_stmt;
   imm_use_iterator imm_iter;
-  gassign *fma_stmt;
+  gcall *fma_stmt;
 
   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, mul_result)
     {
@@ -2658,6 +2666,7 @@  convert_mult_to_fma_1 (tree mul_result,
       enum tree_code use_code;
       tree addop, mulop1 = op1, result = mul_result;
       bool negate_p = false;
+      gimple_seq seq = NULL;
 
       if (is_gimple_debug (use_stmt))
 	continue;
@@ -2683,11 +2692,7 @@  convert_mult_to_fma_1 (tree mul_result,
 	  addop = gimple_assign_rhs2 (use_stmt);
 	  /* a * b - c -> a * b + (-c)  */
 	  if (gimple_assign_rhs_code (use_stmt) == MINUS_EXPR)
-	    addop = force_gimple_operand_gsi (&gsi,
-					      build1 (NEGATE_EXPR,
-						      type, addop),
-					      true, NULL_TREE, true,
-					      GSI_SAME_STMT);
+	    addop = gimple_build (&seq, NEGATE_EXPR, type, addop);
 	}
       else
 	{
@@ -2698,23 +2703,26 @@  convert_mult_to_fma_1 (tree mul_result,
 	}
 
       if (negate_p)
-	mulop1 = force_gimple_operand_gsi (&gsi,
-					   build1 (NEGATE_EXPR,
-						   type, mulop1),
-					   true, NULL_TREE, true,
-					   GSI_SAME_STMT);
+	mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1);
 
-      fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt),
-				      FMA_EXPR, mulop1, op2, addop);
+      if (seq)
+	gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+      fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
+      gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt));
+      gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (use_stmt));
+      gsi_replace (&gsi, fma_stmt, true);
+      /* Valueize aggressively so that we generate FMS, FNMA and FNMS
+	 regardless of where the negation occurs.  */
+      if (fold_stmt (&gsi, aggressive_valueize))
+	update_stmt (gsi_stmt (gsi));
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "Generated FMA ");
-	  print_gimple_stmt (dump_file, fma_stmt, 0, 0);
+	  print_gimple_stmt (dump_file, gsi_stmt (gsi), 0, 0);
 	  fprintf (dump_file, "\n");
 	}
 
-      gsi_replace (&gsi, fma_stmt, true);
       widen_mul_stats.fmas_inserted++;
     }
 }
@@ -2862,7 +2870,8 @@  convert_mult_to_fma (gimple *mul_stmt, t
 
   /* If the target doesn't support it, don't generate it.  We assume that
      if fma isn't available then fms, fnma or fnms are not either.  */
-  if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
+  optimization_type opt_type = bb_optimization_type (gimple_bb (mul_stmt));
+  if (!direct_internal_fn_supported_p (IFN_FMA, type, opt_type))
     return false;
 
   /* If the multiplication has zero uses, it is kept around probably because
@@ -2958,8 +2967,8 @@  convert_mult_to_fma (gimple *mul_stmt, t
 	 that a mult / subtract pair.  */
       if (use_code == MINUS_EXPR && !negate_p
 	  && gimple_assign_rhs1 (use_stmt) == result
-	  && optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing
-	  && optab_handler (fnma_optab, TYPE_MODE (type)) != CODE_FOR_nothing)
+	  && !direct_internal_fn_supported_p (IFN_FMS, type, opt_type)
+	  && direct_internal_fn_supported_p (IFN_FNMA, type, opt_type))
 	{
 	  tree rhs2 = gimple_assign_rhs2 (use_stmt);
 
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	2018-05-09 11:34:41.204789431 +0100
+++ gcc/config/i386/i386.c	2018-05-11 18:08:24.825947782 +0100
@@ -50516,19 +50516,20 @@  ix86_add_stmt_cost (void *data, int coun
   tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
   int stmt_cost = - 1;
 
+  bool fp = false;
+  machine_mode mode = TImode;
+
+  if (vectype != NULL)
+    {
+      fp = FLOAT_TYPE_P (vectype);
+      mode = TYPE_MODE (vectype);
+    }
+
   if ((kind == vector_stmt || kind == scalar_stmt)
       && stmt_info
       && stmt_info->stmt && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN)
     {
       tree_code subcode = gimple_assign_rhs_code (stmt_info->stmt);
-      bool fp = false;
-      machine_mode mode = TImode;
-
-      if (vectype != NULL)
-	{
-	  fp = FLOAT_TYPE_P (vectype);
-	  mode = TYPE_MODE (vectype);
-	}
       /*machine_mode inner_mode = mode;
       if (VECTOR_MODE_P (mode))
 	inner_mode = GET_MODE_INNER (mode);*/
@@ -50559,12 +50560,6 @@  ix86_add_stmt_cost (void *data, int coun
 	case MULT_HIGHPART_EXPR:
 	  stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
 	  break;
-	case FMA_EXPR:
-          stmt_cost = ix86_vec_cost (mode,
-				     mode == SFmode ? ix86_cost->fmass
-				     : ix86_cost->fmasd,
-				     true);
-	  break;
 	case NEGATE_EXPR:
 	  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
 	    stmt_cost = ix86_cost->sse_op;
@@ -50627,6 +50622,24 @@  ix86_add_stmt_cost (void *data, int coun
 	  break;
 	}
     }
+
+  combined_fn cfn;
+  if ((kind == vector_stmt || kind == scalar_stmt)
+      && stmt_info
+      && stmt_info->stmt
+      && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST)
+    switch (cfn)
+      {
+      case CFN_FMA:
+	stmt_cost = ix86_vec_cost (mode,
+				   mode == SFmode ? ix86_cost->fmass
+				   : ix86_cost->fmasd,
+				   true);
+	break;
+      default:
+	break;
+      }
+
   /* If we do elementwise loads into a vector then we are bound by
      latency and execution resources for the many scalar loads
      (AGU and load ports).  Try to account for this by scaling the
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	2018-05-08 09:42:03.510648702 +0100
+++ gcc/config/rs6000/rs6000.c	2018-05-11 18:08:24.830947576 +0100
@@ -16025,7 +16025,9 @@  rs6000_gimple_fold_builtin (gimple_stmt_
 	arg1 = gimple_call_arg (stmt, 1);
 	tree arg2 = gimple_call_arg (stmt, 2);
 	lhs = gimple_call_lhs (stmt);
-	gimple *g = gimple_build_assign (lhs, FMA_EXPR, arg0, arg1, arg2);
+	gcall *g = gimple_build_call_internal (IFN_FMA, 3, arg0, arg1, arg2);
+	gimple_call_set_lhs (g, lhs);
+	gimple_call_set_nothrow (g, true);
 	gimple_set_location (g, gimple_location (stmt));
 	gsi_replace (gsi, g, true);
 	return true;
Index: gcc/brig/brigfrontend/brig-function.cc
===================================================================
--- gcc/brig/brigfrontend/brig-function.cc	2018-05-08 09:42:01.419725462 +0100
+++ gcc/brig/brigfrontend/brig-function.cc	2018-05-11 18:08:24.817948111 +0100
@@ -1218,6 +1218,7 @@  brig_function::get_builtin_for_hsa_opcod
     case BRIG_OPCODE_NEXP2:
       builtin = mathfn_built_in (builtin_type, BUILT_IN_EXP2);
       break;
+    case BRIG_OPCODE_FMA:
     case BRIG_OPCODE_NFMA:
       builtin = mathfn_built_in (builtin_type, BUILT_IN_FMA);
       break;
@@ -1460,8 +1461,6 @@  brig_function::get_tree_code_for_hsa_opc
 	return CALL_EXPR;
       else
 	return MAX_EXPR;
-    case BRIG_OPCODE_FMA:
-      return FMA_EXPR;
     case BRIG_OPCODE_ABS:
       return ABS_EXPR;
     case BRIG_OPCODE_SHL:
@@ -1496,6 +1495,7 @@  brig_function::get_tree_code_for_hsa_opc
       /* Implement as 1/f (x).  gcc should pattern detect that and
 	 use a native instruction, if available, for it.  */
       return TREE_LIST;
+    case BRIG_OPCODE_FMA:
     case BRIG_OPCODE_FLOOR:
     case BRIG_OPCODE_CEIL:
     case BRIG_OPCODE_SQRT:
Index: gcc/c/gimple-parser.c
===================================================================
--- gcc/c/gimple-parser.c	2018-01-03 11:12:56.269741723 +0000
+++ gcc/c/gimple-parser.c	2018-05-11 18:08:24.817948111 +0100
@@ -903,27 +903,6 @@  c_parser_gimple_postfix_expression (c_pa
 	      expr.value = fold_convert (type, val);
 	      return expr;
 	    }
-	  else if (strcmp (IDENTIFIER_POINTER (id), "__FMA") == 0)
-	    {
-	      c_parser_consume_token (parser);
-	      auto_vec<tree> args;
-
-	      if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
-		{
-		  c_parser_gimple_expr_list (parser, &args);
-		  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
-					     "expected %<)%>");
-		}
-	      if (args.length () != 3)
-		{
-		  error_at (loc, "invalid number of operands to __FMA");
-		  expr.value = error_mark_node;
-		  return expr;
-		}
-	      expr.value = build3_loc (loc, FMA_EXPR, TREE_TYPE (args[0]),
-				       args[0], args[1], args[2]);
-	      return expr;
-	    }
 
 	  /* SSA name.  */
 	  unsigned version, ver_offset;
Index: gcc/cp/constexpr.c
===================================================================
--- gcc/cp/constexpr.c	2018-05-01 19:30:30.973597315 +0100
+++ gcc/cp/constexpr.c	2018-05-11 18:08:24.830947576 +0100
@@ -4573,7 +4573,6 @@  cxx_eval_constant_expression (const cons
 			     non_constant_p, overflow_p);
       break;
 
-    case FMA_EXPR:
     case VEC_PERM_EXPR:
       r = cxx_eval_trinary_expression (ctx, t, lval,
 				       non_constant_p, overflow_p);
@@ -5998,7 +5997,6 @@  #define RECUR(T,RV) \
 	  return false;
       return true;
 
-    case FMA_EXPR:
     case VEC_PERM_EXPR:
      for (i = 0; i < 3; ++i)
       if (!RECUR (TREE_OPERAND (t, i), true))
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	2018-05-08 09:42:01.256731446 +0100
+++ gcc/testsuite/lib/target-supports.exp	2018-05-11 18:08:24.848946836 +0100
@@ -2879,6 +2879,13 @@  proc check_effective_target_base_quadflo
     return 1
 }
 
+# Return 1 if the target supports all four forms of fused multiply-add
+# (fma, fms, fnma, and fnms) for both float and double.
+
+proc check_effective_target_all_scalar_fma { } {
+    return [istarget aarch64*-*-*]
+}
+
 # Return 1 if the target supports compiling fixed-point,
 # 0 otherwise.
 
Index: gcc/testsuite/gcc.dg/fma-1.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-1.c	2018-05-11 18:08:24.844947000 +0100
@@ -0,0 +1,15 @@ 
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return a * b + c;
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return a * b + c;
+}
+
+/* { dg-final { scan-tree-dump-times { = FMA \(} 2 "widening_mul" { target all_scalar_fma } } } */
Index: gcc/testsuite/gcc.dg/fma-2.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-2.c	2018-05-11 18:08:24.844947000 +0100
@@ -0,0 +1,15 @@ 
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return a * b - c;
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return a * b - c;
+}
+
+/* { dg-final { scan-tree-dump-times { = FMS \(} 2 "widening_mul" { target all_scalar_fma } } } */
Index: gcc/testsuite/gcc.dg/fma-3.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-3.c	2018-05-11 18:08:24.845946959 +0100
@@ -0,0 +1,15 @@ 
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return c - a * b;
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return c - a * b;
+}
+
+/* { dg-final { scan-tree-dump-times { = FNMA \(} 2 "widening_mul" { target all_scalar_fma } } } */
Index: gcc/testsuite/gcc.dg/fma-4.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-4.c	2018-05-11 18:08:24.845946959 +0100
@@ -0,0 +1,15 @@ 
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return -(a * b) - c;
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return -(a * b) - c;
+}
+
+/* { dg-final { scan-tree-dump-times { = FNMS \(} 2 "widening_mul" { target all_scalar_fma } } } */
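Note on fma-1.c to fma-4.c: each test pins one source shape to one internal function. A rough way to read the expected results is as a two-bit table over "is the product negated?" and "is the addend negated?". The sketch below is a toy model of that table, assuming the folding behaves as the match.pd rules describe. It does not use GCC's real data structures, and fma_kind, select_fma_kind and check_testsuite_shapes are hypothetical names, not GCC identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for IFN_FMA, IFN_FMS, IFN_FNMA and IFN_FNMS.  */
enum fma_kind { KIND_FMA, KIND_FMS, KIND_FNMA, KIND_FNMS };

/* Model of folding IFN_FMA (x, b, y), where x is a or -a and y is
   c or -c.  The result depends only on which operands carry a
   negation.  */
enum fma_kind
select_fma_kind (bool negated_mulop, bool negated_addop)
{
  if (negated_mulop)
    return negated_addop ? KIND_FNMS : KIND_FNMA;
  return negated_addop ? KIND_FMS : KIND_FMA;
}

/* The source shapes exercised by fma-1.c to fma-4.c.  */
void
check_testsuite_shapes (void)
{
  assert (select_fma_kind (false, false) == KIND_FMA);  /* a * b + c  */
  assert (select_fma_kind (false, true) == KIND_FMS);   /* a * b - c  */
  assert (select_fma_kind (true, false) == KIND_FNMA);  /* c - a * b  */
  assert (select_fma_kind (true, true) == KIND_FNMS);   /* -(a * b) - c  */
}
```

In the patch itself, convert_mult_to_fma_1 always builds an IFN_FMA call with explicit NEGATE_EXPRs and leaves the choice of FMS, FNMA or FNMS to fold_stmt, so no table like this appears in the real code.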
Index: gcc/testsuite/gcc.dg/fma-5.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-5.c	2018-05-11 18:08:24.845946959 +0100
@@ -0,0 +1,53 @@ 
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return __builtin_fmaf (a, b, -c);
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return __builtin_fma (a, b, -c);
+}
+
+void
+f3 (float a, float b, float c, float d, float e, float *res)
+{
+  res[0] = __builtin_fmaf (a, b, -e);
+  res[1] = __builtin_fmaf (c, d, -e);
+}
+
+void
+f4 (double a, double b, double c, double d, double e, double *res)
+{
+  res[0] = __builtin_fma (a, b, -e);
+  res[1] = __builtin_fma (c, d, -e);
+}
+
+float
+f5 (float a, float b, float c)
+{
+  return -__builtin_fmaf (-a, b, c);
+}
+
+double
+f6 (double a, double b, double c)
+{
+  return -__builtin_fma (-a, b, c);
+}
+
+float
+f7 (float a, float b, float c)
+{
+  return -__builtin_fmaf (a, -b, c);
+}
+
+double
+f8 (double a, double b, double c)
+{
+  return -__builtin_fma (a, -b, c);
+}
+
+/* { dg-final { scan-tree-dump-times { = FMS \(} 10 "optimized" { target all_scalar_fma } } } */
Index: gcc/testsuite/gcc.dg/fma-6.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-6.c	2018-05-11 18:08:24.845946959 +0100
@@ -0,0 +1,67 @@ 
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return __builtin_fmaf (-a, b, c);
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return __builtin_fma (-a, b, c);
+}
+
+float
+f3 (float a, float b, float c)
+{
+  return __builtin_fmaf (a, -b, c);
+}
+
+double
+f4 (double a, double b, double c)
+{
+  return __builtin_fma (a, -b, c);
+}
+
+void
+f5 (float a, float b, float c, float d, float e, float *res)
+{
+  res[0] = __builtin_fmaf (-a, b, c);
+  res[1] = __builtin_fmaf (-a, d, e);
+}
+
+void
+f6 (double a, double b, double c, double d, double e, double *res)
+{
+  res[0] = __builtin_fma (-a, b, c);
+  res[1] = __builtin_fma (-a, d, e);
+}
+
+void
+f7 (float a, float b, float c, float d, float e, float *res)
+{
+  res[0] = __builtin_fmaf (a, -b, c);
+  res[1] = __builtin_fmaf (d, -b, e);
+}
+
+void
+f8 (double a, double b, double c, double d, double e, double *res)
+{
+  res[0] = __builtin_fma (a, -b, c);
+  res[1] = __builtin_fma (d, -b, e);
+}
+
+float
+f9 (float a, float b, float c)
+{
+  return -__builtin_fmaf (a, b, -c);
+}
+
+double
+f10 (double a, double b, double c)
+{
+  return -__builtin_fma (a, b, -c);
+}
+
+/* { dg-final { scan-tree-dump-times { = FNMA \(} 14 "optimized" { target all_scalar_fma } } } */
Index: gcc/testsuite/gcc.dg/fma-7.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-7.c	2018-05-11 18:08:24.845946959 +0100
@@ -0,0 +1,67 @@ 
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return __builtin_fmaf (-a, b, -c);
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return __builtin_fma (-a, b, -c);
+}
+
+float
+f3 (float a, float b, float c)
+{
+  return __builtin_fmaf (a, -b, -c);
+}
+
+double
+f4 (double a, double b, double c)
+{
+  return __builtin_fma (a, -b, -c);
+}
+
+void
+f5 (float a, float b, float c, float d, float *res)
+{
+  res[0] = __builtin_fmaf (-a, b, -c);
+  res[1] = __builtin_fmaf (-a, d, -c);
+}
+
+void
+f6 (double a, double b, double c, double d, double *res)
+{
+  res[0] = __builtin_fma (-a, b, -c);
+  res[1] = __builtin_fma (-a, d, -c);
+}
+
+void
+f7 (float a, float b, float c, float d, float *res)
+{
+  res[0] = __builtin_fmaf (a, -b, -c);
+  res[1] = __builtin_fmaf (d, -b, -c);
+}
+
+void
+f8 (double a, double b, double c, double d, double *res)
+{
+  res[0] = __builtin_fma (a, -b, -c);
+  res[1] = __builtin_fma (d, -b, -c);
+}
+
+float
+f9 (float a, float b, float c)
+{
+  return -__builtin_fmaf (a, b, c);
+}
+
+double
+f10 (double a, double b, double c)
+{
+  return -__builtin_fma (a, b, c);
+}
+
+/* { dg-final { scan-tree-dump-times { = FNMS \(} 14 "optimized" { target all_scalar_fma } } } */
Index: gcc/testsuite/gcc.dg/gimplefe-26.c
===================================================================
--- gcc/testsuite/gcc.dg/gimplefe-26.c	2017-02-23 19:54:08.000000000 +0000
+++ /dev/null	2018-04-20 16:19:46.369131350 +0100
@@ -1,16 +0,0 @@ 
-/* { dg-do compile { target c99_runtime } } */
-/* { dg-options "-O -fgimple -fdump-tree-ssa-gimple" } */
-
-#define foo(type, num) \
-type __GIMPLE () foo_##num (type a, type b, type c) \
-{ \
-  type t0; \
-  t0_1 = __FMA (a, b, c); \
-  return t0_1; \
-}
-
-foo(float, 1)
-foo(double, 2)
-foo(long double, 3)
-
-/* { dg-final { scan-tree-dump-times "__FMA" 3 "ssa" } } */
Index: gcc/testsuite/gfortran.dg/reassoc_7.f
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_7.f	2015-06-02 23:52:46.000000000 +0100
+++ gcc/testsuite/gfortran.dg/reassoc_7.f	2018-05-11 18:08:24.846946918 +0100
@@ -1,5 +1,5 @@ 
 ! { dg-do compile }
-! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }
 
       SUBROUTINE S55199(P,Dvdph)
       implicit none
Index: gcc/testsuite/gfortran.dg/reassoc_8.f
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_8.f	2015-06-02 23:52:46.000000000 +0100
+++ gcc/testsuite/gfortran.dg/reassoc_8.f	2018-05-11 18:08:24.846946918 +0100
@@ -1,5 +1,5 @@ 
 ! { dg-do compile }
-! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }
 
       SUBROUTINE S55199(P,Dvdph)
       implicit none
Index: gcc/testsuite/gfortran.dg/reassoc_9.f
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_9.f	2015-06-02 23:52:48.000000000 +0100
+++ gcc/testsuite/gfortran.dg/reassoc_9.f	2018-05-11 18:08:24.847946877 +0100
@@ -1,5 +1,5 @@ 
 ! { dg-do compile }
-! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }
 
       SUBROUTINE S55199(P,Dvdph)
       implicit none
Index: gcc/testsuite/gfortran.dg/reassoc_10.f
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_10.f	2015-06-02 23:52:44.000000000 +0100
+++ gcc/testsuite/gfortran.dg/reassoc_10.f	2018-05-11 18:08:24.846946918 +0100
@@ -1,5 +1,5 @@ 
 ! { dg-do compile }
-! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }
 
       SUBROUTINE S55199(P,Q,Dvdph)
       implicit none