From patchwork Fri May 11 17:15:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 912087 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-477598-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="UXfIHuFc"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40jGvb07wCz9ryk for ; Sat, 12 May 2018 03:15:53 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; q=dns; s= default; b=gZ20sa4x3HfIee50yLqXqTWxMQQ6y0v0s2JUBUffS3IZbNG5aZchJ fCts2AZuBYqqpEoU/342aqrHn4iRaYdHsJp+EgQzIQB1Rz7o1uACtXvYkE3zGmaM CouVnc4I2MTgbWgKDYQH7o9mX/kvDHjRzx/fXJ/iYfGaKMPusq3e3o= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; s= default; bh=kmTLOveZu0E0YwusUSgurpB//q0=; b=UXfIHuFcA+38kGf+fQO4 frUXb8+ppdi8HcuNddYm47ZbuQ0MyIwlQ7bHO1N6c3R0NAfmUSTofluvm3UK+Tj5 FEwaGjsequ941U+xWMzRjC5n+ydYzGZG3T4umP7xzwW4j4rfoceQlyZOn3xWmMEP e8sDC8km4D10Ruez+fmDRkY= Received: (qmail 119519 invoked by alias); 11 May 2018 17:15:39 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 119413 invoked by uid 89); 11 May 2018 17:15:27 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-15.6 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.2 spammy=user_id, elementwise, multiplying, ternary X-HELO: mail-wr0-f172.google.com Received: from mail-wr0-f172.google.com (HELO mail-wr0-f172.google.com) (209.85.128.172) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 11 May 2018 17:15:20 +0000 Received: by mail-wr0-f172.google.com with SMTP id v15-v6so6037130wrm.10 for ; Fri, 11 May 2018 10:15:20 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:mail-followup-to:subject:date:message-id :user-agent:mime-version; bh=rTl+HqWWZsoI7sfOUBnGItQm1CTDeV3Zv4FuYxXhdOg=; b=DAdCMN0Klr3ueDYT2UWzmYlXQxRrfZBcDvCBHxYC83UuYc2fzx/dTge/L/RSIB+/Gr goUEkW+HfYmP9DO6f0FO8P0o8LoPeZnbcmeOtEL3/gEmXbctlTgKsRlYhekNJwE+/iQO 2J5u4PUmtLWDJdVTJbsco0rUSRNCmqDjke9IOOjRy4zUEKbQIlNdlmFHyqSCcEerGDX0 +vnGJ4EznkhH0c0M0+7m/mi62dOVtd7QyG0tvERxbKmfv+miMztmMvb9OQQX7rTVZlT1 79Y8CJas2a9ItvtwcNdUoMn4m69bXWp67aUyDquOXMRRmMh6OFSDvjruz06sqmgr2MHq 3Iig== X-Gm-Message-State: ALKqPwdGEamivfZr5mkVytQ7wftRFnoN1nldgffPLD0oaibBXstnojeK rYLTZ6/kPXi06n6zeYhsl5wgoekTu4E= X-Google-Smtp-Source: AB8JxZrFnMfFh8I1aTAYV09vY9qoHzP5KYHnKgOLZYl3Ht4bqJBeI1LNH8mnKnb9MXwMr5MEyYZ8tQ== X-Received: by 2002:adf:af28:: with SMTP id z37-v6mr5088203wrc.72.1526058916582; Fri, 11 May 2018 10:15:16 -0700 (PDT) Received: from localhost (116.58.7.51.dyn.plus.net. [51.7.58.116]) by smtp.gmail.com with ESMTPSA id q194-v6sm2975062wmd.26.2018.05.11.10.15.14 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 11 May 2018 10:15:15 -0700 (PDT) From: Richard Sandiford To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org Subject: Replace FMA_EXPR with one internal fn per optab Date: Fri, 11 May 2018 18:15:14 +0100 Message-ID: <87po22p10d.fsf@linaro.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux) MIME-Version: 1.0 There are four optabs for various forms of fused multiply-add: fma, fms, fnma and fnms. Of these, only fma had a direct gimple representation. For the other three we relied on special pattern- matching during expand, although tree-ssa-math-opts.c did have some code to try to second-guess what expand would do. This patch removes the old FMA_EXPR representation of fma and introduces four new internal functions, one for each optab. IFN_FMA is tied to BUILT_IN_FMA* while the other three are independent directly-mapped internal functions. It's then possible to do the pattern-matching in match.pd and tree-ssa-math-opts.c (via folding) can select the exact FMA-based operation. The patch removes the gimple FE support for __FMA rather than mapping it to the internal function. There's no reason now to treat it differently from other internal functions (although the FE doesn't handle those yet). The BRIG & HSA parts are a best guess, but seem relatively simple. The genmatch.c changes are structured to allow ternary ops in which the second two rather than the first two operands are commutative. A later patch makes use of this. Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf, x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? Richard 2018-05-11 Richard Sandiford gcc/ * doc/sourcebuild.texi (all_scalar_fma): Document. * tree.def (FMA_EXPR): Delete. * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions. * internal-fn.c (ternary_direct): New macro. (expand_ternary_optab_fn): Likewise. (direct_ternary_optab_supported_p): Likewise. * Makefile.in (build/genmatch.o): Depend on case-fn-macros.h. * builtins.c (fold_builtin_fma): Delete. (fold_builtin_3): Don't call it. * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling. * expr.c (expand_expr_real_2): Likewise. * fold-const.c (operand_equal_p): Likewise. (fold_ternary_loc): Likewise. * gimple-pretty-print.c (dump_ternary_rhs): Likewise. * gimple.c (DEFTREECODE): Likewise. * gimplify.c (gimplify_expr): Likewise. * optabs-tree.c (optab_for_tree_code): Likewise. * tree-cfg.c (verify_gimple_assign_ternary): Likewise. * tree-eh.c (operation_could_trap_p): Likewise. (stmt_could_throw_1_p): Likewise. * tree-inline.c (estimate_operator_cost): Likewise. * tree-pretty-print.c (dump_generic_node): Likewise. (op_code_prio): Likewise. * tree-ssa-loop-im.c (stmt_cost): Likewise. * tree-ssa-operands.c (get_expr_operands): Likewise. * tree.c (commutative_ternary_tree_code, add_expr): Likewise. * fold-const-call.h (fold_fma): Delete. * fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS, CFN_FNMA and CFN_FNMS. (fold_fma): Delete. * genmatch.c (combined_fn): New enum. (commutative_ternary_tree_code): Remove FMA_EXPR handling. (commutative_op): New function. (commutate): Use it. Handle more than 2 operands. (dt_operand::gen_gimple_expr): Use commutative_op. (parser::parse_expr): Allow :c to be used with non-binary operators if the commutative operand is known. * gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle CFN_FMS, CFN_FNMA and CFN_FNMS. (backprop::process_assign_use): Remove FMA_EXPR handling. * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise. (gen_hsa_fma): New function. (gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS, IFN_FNMA and IFN_FNMS. * match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS. * tree-ssa-math-opts.c (aggressive_valueize): New function. (convert_mult_to_fma_1): Use the gimple_build interface and use aggerssive_valueize to fold the result. (convert_mult_to_fma): Use direct_internal_fn_suppoerted_p instead of checking for optabs directly. * config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls rather than FMA_EXPRs. * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a call to IFN_FMA instead of an FMA_EXPR. gcc/brig/ * brigfrontend/brig-function.cc (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA for BRIG_OPCODE_FMA. (brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA as a call. gcc/c/ * gimple-parser.c (c_parser_gimple_postfix_expression): Remove __FMA_EXPR handlng. gcc/cp/ * constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling. (potential_constant_expression_1): Likewise. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_all_scalar_fma): New proc. * gcc.dg/fma-1.c: New test. * gcc.dg/fma-2.c: Likewise. * gcc.dg/fma-3.c: Likewise. * gcc.dg/fma-4.c: Likewise. * gcc.dg/fma-5.c: Likewise. * gcc.dg/fma-6.c: Likewise. * gcc.dg/fma-7.c: Likewise. * gcc.dg/gimplefe-26.c: Remove. * gfortran.dg/reassoc_7.f: Pass -ffp-contract=off. * gfortran.dg/reassoc_8.f: Likewise. * gfortran.dg/reassoc_9.f: Likewise. * gfortran.dg/reassoc_10.f: Likewise. Index: gcc/doc/sourcebuild.texi =================================================================== --- gcc/doc/sourcebuild.texi 2018-05-08 09:42:01.479723260 +0100 +++ gcc/doc/sourcebuild.texi 2018-05-11 18:08:24.833947453 +0100 @@ -2170,6 +2170,11 @@ Target supports wide characters. @subsubsection Other attributes @table @code +@item all_scalar_fma +Target supports all four fused multiply-add optabs for both @code{float} +and @code{double}. These optabs are: @code{fma_optab}, @code{fms_optab}, +@code{fnma_optab} and @code{fnms_optab}. + @item automatic_stack_alignment Target supports automatic stack alignment. Index: gcc/tree.def =================================================================== --- gcc/tree.def 2018-01-03 11:12:58.606649676 +0000 +++ gcc/tree.def 2018-05-11 18:08:24.852946671 +0100 @@ -1345,12 +1345,6 @@ DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "wid by the second argument. */ DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2) -/* Fused multiply-add. - All operands and the result are of the same type. No intermediate - rounding is performed after multiplying operand one with operand two - before adding operand three. */ -DEFTREECODE (FMA_EXPR, "fma_expr", tcc_expression, 3) - /* Widening vector multiplication. The two operands are vectors with N elements of size S. Multiplying the elements of the two vectors will result in N products of size 2*S. Index: gcc/internal-fn.def =================================================================== --- gcc/internal-fn.def 2018-02-20 09:40:41.809452604 +0000 +++ gcc/internal-fn.def 2018-05-11 18:08:24.842947083 +0100 @@ -57,6 +57,7 @@ along with GCC; see the file COPYING3. - unary: a normal unary optab, such as vec_reverse_ - binary: a normal binary optab, such as vec_interleave_lo_ + - ternary: a normal ternary optab, such as fma4 - cond_binary: a conditional binary optab, such as addcc @@ -138,6 +139,10 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CO DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW, vec_shl_insert, binary) +DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary) +DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary) +DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary) + DEF_INTERNAL_OPTAB_FN (COND_ADD, ECF_CONST, cond_add, cond_binary) DEF_INTERNAL_OPTAB_FN (COND_SUB, ECF_CONST, cond_sub, cond_binary) DEF_INTERNAL_SIGNED_OPTAB_FN (COND_MIN, ECF_CONST, first, @@ -218,6 +223,9 @@ DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS /* FP scales. */ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) +/* Ternary math functions. */ +DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary) + /* Unary integer ops. */ DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary) DEF_INTERNAL_INT_FN (CLZ, ECF_CONST | ECF_NOTHROW, clz, unary) Index: gcc/internal-fn.c =================================================================== --- gcc/internal-fn.c 2018-03-02 09:44:45.456595753 +0000 +++ gcc/internal-fn.c 2018-05-11 18:08:24.841947124 +0100 @@ -90,6 +90,7 @@ #define mask_store_lanes_direct { 0, 0, #define scatter_store_direct { 3, 3, false } #define unary_direct { 0, 0, true } #define binary_direct { 0, 0, true } +#define ternary_direct { 0, 0, true } #define cond_unary_direct { 1, 1, true } #define cond_binary_direct { 1, 1, true } #define while_direct { 0, 2, false } @@ -2962,6 +2963,9 @@ #define expand_unary_optab_fn(FN, STMT, #define expand_binary_optab_fn(FN, STMT, OPTAB) \ expand_direct_optab_fn (FN, STMT, OPTAB, 2) +#define expand_ternary_optab_fn(FN, STMT, OPTAB) \ + expand_direct_optab_fn (FN, STMT, OPTAB, 3) + #define expand_cond_unary_optab_fn(FN, STMT, OPTAB) \ expand_direct_optab_fn (FN, STMT, OPTAB, 2) @@ -3047,6 +3051,7 @@ multi_vector_optab_supported_p (convert_ #define direct_unary_optab_supported_p direct_optab_supported_p #define direct_binary_optab_supported_p direct_optab_supported_p +#define direct_ternary_optab_supported_p direct_optab_supported_p #define direct_cond_unary_optab_supported_p direct_optab_supported_p #define direct_cond_binary_optab_supported_p direct_optab_supported_p #define direct_mask_load_optab_supported_p direct_optab_supported_p Index: gcc/Makefile.in =================================================================== --- gcc/Makefile.in 2018-03-13 15:06:01.749426280 +0000 +++ gcc/Makefile.in 2018-05-11 18:08:24.816948152 +0100 @@ -2786,7 +2786,7 @@ build/genmddump.o : genmddump.c $(RTL_BA $(CORETYPES_H) $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H) build/genmatch.o : genmatch.c $(BCONFIG_H) $(SYSTEM_H) \ $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h \ - tree.def builtins.def internal-fn.def + tree.def builtins.def internal-fn.def case-cfn-macros.h build/gencfn-macros.o : gencfn-macros.c $(BCONFIG_H) $(SYSTEM_H) \ $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-set.h builtins.def \ internal-fn.def Index: gcc/builtins.c =================================================================== --- gcc/builtins.c 2018-05-08 09:42:01.633717606 +0100 +++ gcc/builtins.c 2018-05-11 18:08:24.817948111 +0100 @@ -8340,21 +8340,6 @@ fold_builtin_abs (location_t loc, tree a return fold_build1_loc (loc, ABS_EXPR, type, arg); } -/* Fold a call to fma, fmaf, or fmal with arguments ARG[012]. */ - -static tree -fold_builtin_fma (location_t loc, tree arg0, tree arg1, tree arg2, tree type) -{ - /* ??? Only expand to FMA_EXPR if it's directly supported. */ - if (validate_arg (arg0, REAL_TYPE) - && validate_arg (arg1, REAL_TYPE) - && validate_arg (arg2, REAL_TYPE) - && optab_handler (fma_optab, TYPE_MODE (type)) != CODE_FOR_nothing) - return fold_build3_loc (loc, FMA_EXPR, type, arg0, arg1, arg2); - - return NULL_TREE; -} - /* Fold a call to builtin carg(a+bi) -> atan2(b,a). */ static tree @@ -9260,10 +9245,6 @@ fold_builtin_3 (location_t loc, tree fnd CASE_FLT_FN (BUILT_IN_SINCOS): return fold_builtin_sincos (loc, arg0, arg1, arg2); - CASE_FLT_FN (BUILT_IN_FMA): - CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA): - return fold_builtin_fma (loc, arg0, arg1, arg2, type); - CASE_FLT_FN (BUILT_IN_REMQUO): if (validate_arg (arg0, REAL_TYPE) && validate_arg (arg1, REAL_TYPE) Index: gcc/cfgexpand.c =================================================================== --- gcc/cfgexpand.c 2018-05-09 11:34:47.190553782 +0100 +++ gcc/cfgexpand.c 2018-05-11 18:08:24.818948070 +0100 @@ -4202,7 +4202,6 @@ expand_debug_expr (tree exp) case SAD_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: - case FMA_EXPR: goto ternary; case TRUTH_ANDIF_EXPR: @@ -5190,9 +5189,6 @@ expand_debug_expr (tree exp) } return NULL; - case FMA_EXPR: - return simplify_gen_ternary (FMA, mode, inner_mode, op0, op1, op2); - default: flag_unsupported: if (flag_checking) Index: gcc/expr.c =================================================================== --- gcc/expr.c 2018-05-01 19:30:32.099548805 +0100 +++ gcc/expr.c 2018-05-11 18:08:24.833947453 +0100 @@ -8853,67 +8853,6 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b expand_operands (treeop0, treeop1, subtarget, &op0, &op1, EXPAND_NORMAL); return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1, target, unsignedp)); - case FMA_EXPR: - { - optab opt = fma_optab; - gimple *def0, *def2; - - /* If there is no insn for FMA, emit it as __builtin_fma{,f,l} - call. */ - if (optab_handler (fma_optab, mode) == CODE_FOR_nothing) - { - tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA); - tree call_expr; - - gcc_assert (fn != NULL_TREE); - call_expr = build_call_expr (fn, 3, treeop0, treeop1, treeop2); - return expand_builtin (call_expr, target, subtarget, mode, false); - } - - def0 = get_def_for_expr (treeop0, NEGATE_EXPR); - /* The multiplication is commutative - look at its 2nd operand - if the first isn't fed by a negate. */ - if (!def0) - { - def0 = get_def_for_expr (treeop1, NEGATE_EXPR); - /* Swap operands if the 2nd operand is fed by a negate. */ - if (def0) - std::swap (treeop0, treeop1); - } - def2 = get_def_for_expr (treeop2, NEGATE_EXPR); - - op0 = op2 = NULL; - - if (def0 && def2 - && optab_handler (fnms_optab, mode) != CODE_FOR_nothing) - { - opt = fnms_optab; - op0 = expand_normal (gimple_assign_rhs1 (def0)); - op2 = expand_normal (gimple_assign_rhs1 (def2)); - } - else if (def0 - && optab_handler (fnma_optab, mode) != CODE_FOR_nothing) - { - opt = fnma_optab; - op0 = expand_normal (gimple_assign_rhs1 (def0)); - } - else if (def2 - && optab_handler (fms_optab, mode) != CODE_FOR_nothing) - { - opt = fms_optab; - op2 = expand_normal (gimple_assign_rhs1 (def2)); - } - - if (op0 == NULL) - op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL); - if (op2 == NULL) - op2 = expand_normal (treeop2); - op1 = expand_normal (treeop1); - - return expand_ternary_op (TYPE_MODE (type), opt, - op0, op1, op2, target, 0); - } - case MULT_EXPR: /* If this is a fixed-point operation, then we cannot use the code below because "expand_mult" doesn't support sat/no-sat fixed-point Index: gcc/fold-const.c =================================================================== --- gcc/fold-const.c 2018-05-08 09:42:01.637717459 +0100 +++ gcc/fold-const.c 2018-05-11 18:08:24.835947371 +0100 @@ -3297,7 +3297,6 @@ #define OP_SAME_WITH_NULL(N) \ case TRUTH_ORIF_EXPR: return OP_SAME (0) && OP_SAME (1); - case FMA_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: if (!OP_SAME (2)) @@ -11708,17 +11707,6 @@ fold_ternary_loc (location_t loc, enum t return NULL_TREE; - case FMA_EXPR: - /* For integers we can decompose the FMA if possible. */ - if (TREE_CODE (arg0) == INTEGER_CST - && TREE_CODE (arg1) == INTEGER_CST) - return fold_build2_loc (loc, PLUS_EXPR, type, - const_binop (MULT_EXPR, arg0, arg1), arg2); - if (integer_zerop (arg2)) - return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1); - - return fold_fma (loc, type, arg0, arg1, arg2); - case VEC_PERM_EXPR: if (TREE_CODE (arg2) == VECTOR_CST) { Index: gcc/gimple-pretty-print.c =================================================================== --- gcc/gimple-pretty-print.c 2018-02-13 10:28:33.325853088 +0000 +++ gcc/gimple-pretty-print.c 2018-05-11 18:08:24.836947330 +0100 @@ -490,27 +490,6 @@ dump_ternary_rhs (pretty_printer *buffer pp_greater (buffer); break; - case FMA_EXPR: - if (flags & TDF_GIMPLE) - { - pp_string (buffer, "__FMA ("); - dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); - pp_comma (buffer); - dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false); - pp_comma (buffer); - dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false); - pp_right_paren (buffer); - } - else - { - dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); - pp_string (buffer, " * "); - dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false); - pp_string (buffer, " + "); - dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false); - } - break; - case DOT_PROD_EXPR: pp_string (buffer, "DOT_PROD_EXPR <"); dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); Index: gcc/gimple.c =================================================================== --- gcc/gimple.c 2018-02-13 10:28:33.415861313 +0000 +++ gcc/gimple.c 2018-05-11 18:08:24.838947247 +0100 @@ -2143,8 +2143,7 @@ #define DEFTREECODE(SYM, STRING, TYPE, N || (SYM) == REALIGN_LOAD_EXPR \ || (SYM) == VEC_COND_EXPR \ || (SYM) == VEC_PERM_EXPR \ - || (SYM) == BIT_INSERT_EXPR \ - || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS \ + || (SYM) == BIT_INSERT_EXPR) ? GIMPLE_TERNARY_RHS \ : ((SYM) == CONSTRUCTOR \ || (SYM) == OBJ_TYPE_REF \ || (SYM) == ASSERT_EXPR \ Index: gcc/gimplify.c =================================================================== --- gcc/gimplify.c 2018-05-08 09:42:02.972668452 +0100 +++ gcc/gimplify.c 2018-05-11 18:08:24.840947165 +0100 @@ -12086,7 +12086,6 @@ gimplify_expr (tree *expr_p, gimple_seq } break; - case FMA_EXPR: case VEC_PERM_EXPR: /* Classified as tcc_expression. */ goto expr_3; Index: gcc/optabs-tree.c =================================================================== --- gcc/optabs-tree.c 2018-01-13 18:01:26.108685820 +0000 +++ gcc/optabs-tree.c 2018-05-11 18:08:24.842947083 +0100 @@ -143,9 +143,6 @@ optab_for_tree_code (enum tree_code code : (TYPE_SATURATING (type) ? ssmsub_widen_optab : smsub_widen_optab)); - case FMA_EXPR: - return fma_optab; - case VEC_WIDEN_MULT_HI_EXPR: return TYPE_UNSIGNED (type) ? vec_widen_umult_hi_optab : vec_widen_smult_hi_optab; Index: gcc/tree-cfg.c =================================================================== --- gcc/tree-cfg.c 2018-05-01 19:31:03.079312535 +0100 +++ gcc/tree-cfg.c 2018-05-11 18:08:24.848946836 +0100 @@ -4109,20 +4109,6 @@ verify_gimple_assign_ternary (gassign *s } break; - case FMA_EXPR: - if (!useless_type_conversion_p (lhs_type, rhs1_type) - || !useless_type_conversion_p (lhs_type, rhs2_type) - || !useless_type_conversion_p (lhs_type, rhs3_type)) - { - error ("type mismatch in fused multiply-add expression"); - debug_generic_expr (lhs_type); - debug_generic_expr (rhs1_type); - debug_generic_expr (rhs2_type); - debug_generic_expr (rhs3_type); - return true; - } - break; - case VEC_COND_EXPR: if (!VECTOR_BOOLEAN_TYPE_P (rhs1_type) || maybe_ne (TYPE_VECTOR_SUBPARTS (rhs1_type), Index: gcc/tree-eh.c =================================================================== --- gcc/tree-eh.c 2018-02-08 13:34:20.321293427 +0000 +++ gcc/tree-eh.c 2018-05-11 18:08:24.849946795 +0100 @@ -2512,8 +2512,7 @@ operation_could_trap_p (enum tree_code o if (TREE_CODE_CLASS (op) != tcc_comparison && TREE_CODE_CLASS (op) != tcc_unary - && TREE_CODE_CLASS (op) != tcc_binary - && op != FMA_EXPR) + && TREE_CODE_CLASS (op) != tcc_binary) return false; return operation_could_trap_helper_p (op, fp_operation, honor_trapv, @@ -2825,8 +2824,7 @@ stmt_could_throw_1_p (gassign *stmt) if (TREE_CODE_CLASS (code) == tcc_comparison || TREE_CODE_CLASS (code) == tcc_unary - || TREE_CODE_CLASS (code) == tcc_binary - || code == FMA_EXPR) + || TREE_CODE_CLASS (code) == tcc_binary) { if (TREE_CODE_CLASS (code) == tcc_comparison) t = TREE_TYPE (gimple_assign_rhs1 (stmt)); Index: gcc/tree-inline.c =================================================================== --- gcc/tree-inline.c 2018-03-13 15:06:01.738427167 +0000 +++ gcc/tree-inline.c 2018-05-11 18:08:24.849946795 +0100 @@ -3855,7 +3855,6 @@ estimate_operator_cost (enum tree_code c case MINUS_EXPR: case MULT_EXPR: case MULT_HIGHPART_EXPR: - case FMA_EXPR: case ADDR_SPACE_CONVERT_EXPR: case FIXED_CONVERT_EXPR: Index: gcc/tree-pretty-print.c =================================================================== --- gcc/tree-pretty-print.c 2018-03-13 15:06:00.523525161 +0000 +++ gcc/tree-pretty-print.c 2018-05-11 18:08:24.850946754 +0100 @@ -2901,16 +2901,6 @@ dump_generic_node (pretty_printer *pp, t pp_string (pp, " > "); break; - case FMA_EXPR: - pp_string (pp, " FMA_EXPR < "); - dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); - pp_string (pp, ", "); - dump_generic_node (pp, TREE_OPERAND (node, 1), spc, flags, false); - pp_string (pp, ", "); - dump_generic_node (pp, TREE_OPERAND (node, 2), spc, flags, false); - pp_string (pp, " > "); - break; - case OACC_PARALLEL: pp_string (pp, "#pragma acc parallel"); goto dump_omp_clauses_body; @@ -3549,7 +3539,6 @@ op_code_prio (enum tree_code code) case CEIL_MOD_EXPR: case FLOOR_MOD_EXPR: case ROUND_MOD_EXPR: - case FMA_EXPR: return 13; case TRUTH_NOT_EXPR: Index: gcc/tree-ssa-loop-im.c =================================================================== --- gcc/tree-ssa-loop-im.c 2018-03-13 15:06:00.521525322 +0000 +++ gcc/tree-ssa-loop-im.c 2018-05-11 18:08:24.850946754 +0100 @@ -493,7 +493,6 @@ stmt_cost (gimple *stmt) case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: case DOT_PROD_EXPR: - case FMA_EXPR: case TRUNC_DIV_EXPR: case CEIL_DIV_EXPR: case FLOOR_DIV_EXPR: Index: gcc/tree-ssa-operands.c =================================================================== --- gcc/tree-ssa-operands.c 2018-01-03 11:12:59.133628905 +0000 +++ gcc/tree-ssa-operands.c 2018-05-11 18:08:24.851946713 +0100 @@ -849,7 +849,6 @@ get_expr_operands (struct function *fn, case REALIGN_LOAD_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: - case FMA_EXPR: { get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 0), flags); get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 1), flags); Index: gcc/tree.c =================================================================== --- gcc/tree.c 2018-05-08 09:42:01.631717680 +0100 +++ gcc/tree.c 2018-05-11 18:08:24.852946671 +0100 @@ -7171,7 +7171,6 @@ commutative_ternary_tree_code (enum tree case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: case DOT_PROD_EXPR: - case FMA_EXPR: return true; default: @@ -7457,7 +7456,6 @@ add_expr (const_tree t, inchash::hash &h flags &= ~OEP_ADDRESS_OF; break; - case FMA_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: { Index: gcc/fold-const-call.h =================================================================== --- gcc/fold-const-call.h 2018-01-03 11:12:58.150667646 +0000 +++ gcc/fold-const-call.h 2018-05-11 18:08:24.833947453 +0100 @@ -23,7 +23,6 @@ #define GCC_FOLD_CONST_CALL_H tree fold_const_call (combined_fn, tree, tree); tree fold_const_call (combined_fn, tree, tree, tree); tree fold_const_call (combined_fn, tree, tree, tree, tree); -tree fold_fma (location_t, tree, tree, tree, tree); tree build_cmp_result (tree type, int res); #endif Index: gcc/fold-const-call.c =================================================================== --- gcc/fold-const-call.c 2018-05-08 09:42:01.844709861 +0100 +++ gcc/fold-const-call.c 2018-05-11 18:08:24.833947453 +0100 @@ -1606,6 +1606,26 @@ fold_const_call_ssss (real_value *result CASE_CFN_FMA_FN: return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, arg2, format); + case CFN_FMS: + { + real_value new_arg2 = real_value_negate (arg2); + return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, &new_arg2, format); + } + + case CFN_FNMA: + { + real_value new_arg0 = real_value_negate (arg0); + return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, arg2, format); + } + + case CFN_FNMS: + { + real_value new_arg0 = real_value_negate (arg0); + real_value new_arg2 = real_value_negate (arg2); + return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, + &new_arg2, format); + } + default: return false; } @@ -1719,20 +1739,3 @@ fold_const_call (combined_fn fn, tree ty return fold_const_call_1 (fn, type, arg0, arg1, arg2); } } - -/* Fold a fma operation with arguments ARG[012]. */ - -tree -fold_fma (location_t, tree type, tree arg0, tree arg1, tree arg2) -{ - REAL_VALUE_TYPE result; - if (real_cst_p (arg0) - && real_cst_p (arg1) - && real_cst_p (arg2) - && do_mpfr_arg3 (&result, mpfr_fma, TREE_REAL_CST_PTR (arg0), - TREE_REAL_CST_PTR (arg1), TREE_REAL_CST_PTR (arg2), - REAL_MODE_FORMAT (TYPE_MODE (type)))) - return build_real (type, result); - - return NULL_TREE; -} Index: gcc/genmatch.c =================================================================== --- gcc/genmatch.c 2018-03-01 08:20:43.846526310 +0000 +++ gcc/genmatch.c 2018-05-11 18:08:24.835947371 +0100 @@ -241,6 +241,20 @@ enum internal_fn { IFN_LAST }; +enum combined_fn { +#define DEF_BUILTIN(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND) \ + CFN_##ENUM = int (ENUM), +#include "builtins.def" + +#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \ + CFN_##CODE = int (END_BUILTINS) + int (IFN_##CODE), +#include "internal-fn.def" + + CFN_LAST +}; + +#include "case-cfn-macros.h" + /* Return true if CODE represents a commutative tree code. Otherwise return false. */ bool @@ -288,7 +302,6 @@ commutative_ternary_tree_code (enum tree case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: case DOT_PROD_EXPR: - case FMA_EXPR: return true; default: @@ -450,6 +463,44 @@ is_a_helper ::test (id_base * return id->kind == id_base::USER; } +/* If ID has a pair of consecutive, commutative operands, return the + index of the first, otherwise return -1. */ + +static int +commutative_op (id_base *id) +{ + if (operator_id *code = dyn_cast (id)) + { + if (commutative_tree_code (code->code) + || commutative_ternary_tree_code (code->code)) + return 0; + return -1; + } + if (fn_id *fn = dyn_cast (id)) + switch (fn->fn) + { + CASE_CFN_FMA: + case CFN_FMS: + case CFN_FNMA: + case CFN_FNMS: + return 0; + + default: + return -1; + } + if (user_id *uid = dyn_cast (id)) + { + int res = commutative_op (uid->substitutes[0]); + if (res < 0) + return 0; + for (unsigned i = 1; i < uid->substitutes.length (); ++i) + if (res != commutative_op (uid->substitutes[i])) + return -1; + return res; + } + return -1; +} + /* Add a predicate identifier to the hash. */ static predicate_id * @@ -946,6 +997,9 @@ commutate (operand *op, vecis_commutative) return ret; + /* The operation is always binary if it isn't inherently commutative. */ + int natural_opno = commutative_op (e->operation); + unsigned int opno = natural_opno >= 0 ? natural_opno : 0; for (unsigned i = 0; i < result.length (); ++i) { expr *ne = new expr (e); @@ -994,9 +1048,11 @@ commutate (operand *op, vecis_commutative = false; - // result[i].length () is 2 since e->operation is binary - for (unsigned j = result[i].length (); j; --j) - ne->append_op (result[i][j-1]); + for (unsigned j = 0; j < result[i].length (); ++j) + { + int old_j = (j == opno ? opno + 1 : j == opno + 1 ? opno : j); + ne->append_op (result[i][old_j]); + } ret.safe_push (ne); } @@ -2759,24 +2815,18 @@ dt_operand::gen_gimple_expr (FILE *f, in /* While the toplevel operands are canonicalized by the caller after valueizing operands of sub-expressions we have to re-canonicalize operand order. */ - if (operator_id *code = dyn_cast (id)) + int opno = commutative_op (id); + if (opno >= 0) { - /* ??? We can't canonicalize tcc_comparison operands here - because that requires changing the comparison code which - we already matched... */ - if (commutative_tree_code (code->code) - || commutative_ternary_tree_code (code->code)) - { - char child_opname0[20], child_opname1[20]; - gen_opname (child_opname0, 0); - gen_opname (child_opname1, 1); - fprintf_indent (f, indent, - "if (tree_swap_operands_p (%s, %s))\n", - child_opname0, child_opname1); - fprintf_indent (f, indent, - " std::swap (%s, %s);\n", - child_opname0, child_opname1); - } + char child_opname0[20], child_opname1[20]; + gen_opname (child_opname0, opno); + gen_opname (child_opname1, opno + 1); + fprintf_indent (f, indent, + "if (tree_swap_operands_p (%s, %s))\n", + child_opname0, child_opname1); + fprintf_indent (f, indent, + " std::swap (%s, %s);\n", + child_opname0, child_opname1); } return n_braces; @@ -4217,11 +4267,14 @@ parser::parse_expr () e->operation->id, e->operation->nargs, e->ops.length ()); if (is_commutative) { - if (e->ops.length () == 2) + if (e->ops.length () == 2 + || commutative_op (e->operation) >= 0) e->is_commutative = true; else - fatal_at (token, "only binary operators or function with " - "two arguments can be marked commutative"); + fatal_at (token, "only binary operators or functions with " + "two arguments can be marked commutative, " + "unless the operation is known to be inherently " + "commutative"); } e->expr_type = expr_type; return op; Index: gcc/gimple-ssa-backprop.c =================================================================== --- gcc/gimple-ssa-backprop.c 2018-01-03 11:12:59.113629694 +0000 +++ gcc/gimple-ssa-backprop.c 2018-05-11 18:08:24.836947330 +0100 @@ -375,6 +375,9 @@ backprop::process_builtin_call_use (gcal CASE_CFN_FMA: CASE_CFN_FMA_FN: + case CFN_FMS: + case CFN_FNMA: + case CFN_FNMS: /* In X * X + Y, where Y is distinct from X, the sign of X doesn't matter. */ if (gimple_call_arg (call, 0) == rhs @@ -420,15 +423,6 @@ backprop::process_assign_use (gassign *a } break; - case FMA_EXPR: - /* In X * X + Y, where Y is distinct from X, the sign of X doesn't - matter. */ - if (gimple_assign_rhs1 (assign) == rhs - && gimple_assign_rhs2 (assign) == rhs - && gimple_assign_rhs3 (assign) != rhs) - info->flags.ignore_sign = true; - break; - case MULT_EXPR: /* In X * X, the sign of X doesn't matter. */ if (gimple_assign_rhs1 (assign) == rhs Index: gcc/hsa-gen.c =================================================================== --- gcc/hsa-gen.c 2018-03-17 08:30:21.230924973 +0000 +++ gcc/hsa-gen.c 2018-05-11 18:08:24.841947124 +0100 @@ -3178,23 +3178,6 @@ gen_hsa_insns_for_operation_assignment ( case NEGATE_EXPR: opcode = BRIG_OPCODE_NEG; break; - case FMA_EXPR: - /* There is a native HSA instruction for scalar FMAs but not for vector - ones. */ - if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE) - { - hsa_op_reg *dest - = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign)); - hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb); - hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb); - hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); - hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); - gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb); - gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp, op3, hbb); - return; - } - opcode = BRIG_OPCODE_MAD; - break; case MIN_EXPR: opcode = BRIG_OPCODE_MIN; break; @@ -4490,6 +4473,57 @@ gen_hsa_divmod (gcall *call, hsa_bb *hbb insn->set_output_in_type (dest, 0, hbb); } +/* Emit instructions that implement FMA, FMS, FNMA or FNMS call STMT. + Instructions are appended to basic block HBB. NEGATE1 is true for + FNMA and FNMS. NEGATE3 is true for FMS and FNMS. */ + +static void +gen_hsa_fma (gcall *call, hsa_bb *hbb, bool negate1, bool negate3) +{ + tree lhs = gimple_call_lhs (call); + if (lhs == NULL_TREE) + return; + + tree rhs1 = gimple_call_arg (call, 0); + tree rhs2 = gimple_call_arg (call, 1); + tree rhs3 = gimple_call_arg (call, 2); + + hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs); + hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb); + hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb); + hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); + + if (negate1) + { + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); + gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op1, hbb); + op1 = tmp; + } + + /* There is a native HSA instruction for scalar FMAs but not for vector + ones. */ + if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE) + { + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); + gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb); + gen_hsa_binary_operation (negate3 ? BRIG_OPCODE_SUB : BRIG_OPCODE_ADD, + dest, tmp, op3, hbb); + } + else + { + if (negate3) + { + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); + gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op3, hbb); + op3 = tmp; + } + hsa_insn_basic *insn = new hsa_insn_basic (4, BRIG_OPCODE_MAD, + dest->m_type, dest, + op1, op2, op3); + hbb->append_insn (insn); + } +} + /* Set VALUE to a shadow kernel debug argument and append a new instruction to HBB basic block. */ @@ -5224,6 +5258,22 @@ gen_hsa_insn_for_internal_fn_call (gcall gen_hsa_insns_for_call_of_internal_fn (stmt, hbb); break; + case IFN_FMA: + gen_hsa_fma (stmt, hbb, false, false); + break; + + case IFN_FMS: + gen_hsa_fma (stmt, hbb, false, true); + break; + + case IFN_FNMA: + gen_hsa_fma (stmt, hbb, true, false); + break; + + case IFN_FNMS: + gen_hsa_fma (stmt, hbb, true, true); + break; + default: HSA_SORRY_ATV (gimple_location (stmt), "support for HSA does not implement internal function: %s", Index: gcc/match.pd =================================================================== --- gcc/match.pd 2018-05-02 08:39:32.882737077 +0100 +++ gcc/match.pd 2018-05-11 18:08:24.842947083 +0100 @@ -4702,3 +4702,60 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) || wi::geu_p (wi::to_wide (@rpos), wi::to_wide (@ipos) + isize)) (BIT_FIELD_REF @0 @rsize @rpos))))) + +(for fmas (FMA) + (simplify + (fmas:c (negate @0) @1 @2) + (IFN_FNMA @0 @1 @2)) + (simplify + (fmas @0 @1 (negate @2)) + (IFN_FMS @0 @1 @2)) + (simplify + (fmas:c (negate @0) @1 (negate @2)) + (IFN_FNMS @0 @1 @2)) + (simplify + (negate (fmas@3 @0 @1 @2)) + (if (single_use (@3)) + (IFN_FNMS @0 @1 @2)))) + +(simplify + (IFN_FMS:c (negate @0) @1 @2) + (IFN_FNMS @0 @1 @2)) +(simplify + (IFN_FMS @0 @1 (negate @2)) + (IFN_FMA @0 @1 @2)) +(simplify + (IFN_FMS:c (negate @0) @1 (negate @2)) + (IFN_FNMA @0 @1 @2)) +(simplify + (negate (IFN_FMS@3 @0 @1 @2)) + (if (single_use (@3)) + (IFN_FNMA @0 @1 @2))) + +(simplify + (IFN_FNMA:c (negate @0) @1 @2) + (IFN_FMA @0 @1 @2)) +(simplify + (IFN_FNMA @0 @1 (negate @2)) + (IFN_FNMS @0 @1 @2)) +(simplify + (IFN_FNMA:c (negate @0) @1 (negate @2)) + (IFN_FMS @0 @1 @2)) +(simplify + (negate (IFN_FNMA@3 @0 @1 @2)) + (if (single_use (@3)) + (IFN_FMS @0 @1 @2))) + +(simplify + (IFN_FNMS:c (negate @0) @1 @2) + (IFN_FMS @0 @1 @2)) +(simplify + (IFN_FNMS @0 @1 (negate @2)) + (IFN_FNMA @0 @1 @2)) +(simplify + (IFN_FNMS:c (negate @0) @1 (negate @2)) + (IFN_FMA @0 @1 @2)) +(simplify + (negate (IFN_FNMS@3 @0 @1 @2)) + (if (single_use (@3)) + (IFN_FMA @0 @1 @2))) Index: gcc/tree-ssa-math-opts.c =================================================================== --- gcc/tree-ssa-math-opts.c 2018-01-12 14:45:51.037434575 +0000 +++ gcc/tree-ssa-math-opts.c 2018-05-11 18:08:24.850946754 +0100 @@ -2640,6 +2640,14 @@ convert_plusminus_to_widen (gimple_stmt_ return true; } +/* gimple_fold callback that "valueizes" everything. */ + +static tree +aggressive_valueize (tree val) +{ + return val; +} + /* Given a result MUL_RESULT which is a result of a multiplication of OP1 and OP2 and which we know is used in statements that can be, together with the multiplication, converted to FMAs, perform the transformation. */ @@ -2650,7 +2658,7 @@ convert_mult_to_fma_1 (tree mul_result, tree type = TREE_TYPE (mul_result); gimple *use_stmt; imm_use_iterator imm_iter; - gassign *fma_stmt; + gcall *fma_stmt; FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, mul_result) { @@ -2658,6 +2666,7 @@ convert_mult_to_fma_1 (tree mul_result, enum tree_code use_code; tree addop, mulop1 = op1, result = mul_result; bool negate_p = false; + gimple_seq seq = NULL; if (is_gimple_debug (use_stmt)) continue; @@ -2683,11 +2692,7 @@ convert_mult_to_fma_1 (tree mul_result, addop = gimple_assign_rhs2 (use_stmt); /* a * b - c -> a * b + (-c) */ if (gimple_assign_rhs_code (use_stmt) == MINUS_EXPR) - addop = force_gimple_operand_gsi (&gsi, - build1 (NEGATE_EXPR, - type, addop), - true, NULL_TREE, true, - GSI_SAME_STMT); + addop = gimple_build (&seq, NEGATE_EXPR, type, addop); } else { @@ -2698,23 +2703,26 @@ convert_mult_to_fma_1 (tree mul_result, } if (negate_p) - mulop1 = force_gimple_operand_gsi (&gsi, - build1 (NEGATE_EXPR, - type, mulop1), - true, NULL_TREE, true, - GSI_SAME_STMT); + mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1); - fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt), - FMA_EXPR, mulop1, op2, addop); + if (seq) + gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT); + fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop); + gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt)); + gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (use_stmt)); + gsi_replace (&gsi, fma_stmt, true); + /* Valueize aggressively so that we generate FMS, FNMA and FNMS + regardless of where the negation occurs. */ + if (fold_stmt (&gsi, aggressive_valueize)) + update_stmt (gsi_stmt (gsi)); if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "Generated FMA "); - print_gimple_stmt (dump_file, fma_stmt, 0, 0); + print_gimple_stmt (dump_file, gsi_stmt (gsi), 0, 0); fprintf (dump_file, "\n"); } - gsi_replace (&gsi, fma_stmt, true); widen_mul_stats.fmas_inserted++; } } @@ -2862,7 +2870,8 @@ convert_mult_to_fma (gimple *mul_stmt, t /* If the target doesn't support it, don't generate it. We assume that if fma isn't available then fms, fnma or fnms are not either. */ - if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing) + optimization_type opt_type = bb_optimization_type (gimple_bb (mul_stmt)); + if (!direct_internal_fn_supported_p (IFN_FMA, type, opt_type)) return false; /* If the multiplication has zero uses, it is kept around probably because @@ -2958,8 +2967,8 @@ convert_mult_to_fma (gimple *mul_stmt, t that a mult / subtract pair. */ if (use_code == MINUS_EXPR && !negate_p && gimple_assign_rhs1 (use_stmt) == result - && optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing - && optab_handler (fnma_optab, TYPE_MODE (type)) != CODE_FOR_nothing) + && !direct_internal_fn_supported_p (IFN_FMS, type, opt_type) + && direct_internal_fn_supported_p (IFN_FNMA, type, opt_type)) { tree rhs2 = gimple_assign_rhs2 (use_stmt); Index: gcc/config/i386/i386.c =================================================================== --- gcc/config/i386/i386.c 2018-05-09 11:34:41.204789431 +0100 +++ gcc/config/i386/i386.c 2018-05-11 18:08:24.825947782 +0100 @@ -50516,19 +50516,20 @@ ix86_add_stmt_cost (void *data, int coun tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE; int stmt_cost = - 1; + bool fp = false; + machine_mode mode = TImode; + + if (vectype != NULL) + { + fp = FLOAT_TYPE_P (vectype); + mode = TYPE_MODE (vectype); + } + if ((kind == vector_stmt || kind == scalar_stmt) && stmt_info && stmt_info->stmt && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN) { tree_code subcode = gimple_assign_rhs_code (stmt_info->stmt); - bool fp = false; - machine_mode mode = TImode; - - if (vectype != NULL) - { - fp = FLOAT_TYPE_P (vectype); - mode = TYPE_MODE (vectype); - } /*machine_mode inner_mode = mode; if (VECTOR_MODE_P (mode)) inner_mode = GET_MODE_INNER (mode);*/ @@ -50559,12 +50560,6 @@ ix86_add_stmt_cost (void *data, int coun case MULT_HIGHPART_EXPR: stmt_cost = ix86_multiplication_cost (ix86_cost, mode); break; - case FMA_EXPR: - stmt_cost = ix86_vec_cost (mode, - mode == SFmode ? ix86_cost->fmass - : ix86_cost->fmasd, - true); - break; case NEGATE_EXPR: if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) stmt_cost = ix86_cost->sse_op; @@ -50627,6 +50622,24 @@ ix86_add_stmt_cost (void *data, int coun break; } } + + combined_fn cfn; + if ((kind == vector_stmt || kind == scalar_stmt) + && stmt_info + && stmt_info->stmt + && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST) + switch (cfn) + { + case CFN_FMA: + stmt_cost = ix86_vec_cost (mode, + mode == SFmode ? ix86_cost->fmass + : ix86_cost->fmasd, + true); + break; + default: + break; + } + /* If we do elementwise loads into a vector then we are bound by latency and execution resources for the many scalar loads (AGU and load ports). Try to account for this by scaling the Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c 2018-05-08 09:42:03.510648702 +0100 +++ gcc/config/rs6000/rs6000.c 2018-05-11 18:08:24.830947576 +0100 @@ -16025,7 +16025,9 @@ rs6000_gimple_fold_builtin (gimple_stmt_ arg1 = gimple_call_arg (stmt, 1); tree arg2 = gimple_call_arg (stmt, 2); lhs = gimple_call_lhs (stmt); - gimple *g = gimple_build_assign (lhs, FMA_EXPR, arg0, arg1, arg2); + gcall *g = gimple_build_call_internal (IFN_FMA, 3, arg0, arg1, arg2); + gimple_call_set_lhs (g, lhs); + gimple_call_set_nothrow (g, true); gimple_set_location (g, gimple_location (stmt)); gsi_replace (gsi, g, true); return true; Index: gcc/brig/brigfrontend/brig-function.cc =================================================================== --- gcc/brig/brigfrontend/brig-function.cc 2018-05-08 09:42:01.419725462 +0100 +++ gcc/brig/brigfrontend/brig-function.cc 2018-05-11 18:08:24.817948111 +0100 @@ -1218,6 +1218,7 @@ brig_function::get_builtin_for_hsa_opcod case BRIG_OPCODE_NEXP2: builtin = mathfn_built_in (builtin_type, BUILT_IN_EXP2); break; + case BRIG_OPCODE_FMA: case BRIG_OPCODE_NFMA: builtin = mathfn_built_in (builtin_type, BUILT_IN_FMA); break; @@ -1460,8 +1461,6 @@ brig_function::get_tree_code_for_hsa_opc return CALL_EXPR; else return MAX_EXPR; - case BRIG_OPCODE_FMA: - return FMA_EXPR; case BRIG_OPCODE_ABS: return ABS_EXPR; case BRIG_OPCODE_SHL: @@ -1496,6 +1495,7 @@ brig_function::get_tree_code_for_hsa_opc /* Implement as 1/f (x). gcc should pattern detect that and use a native instruction, if available, for it. */ return TREE_LIST; + case BRIG_OPCODE_FMA: case BRIG_OPCODE_FLOOR: case BRIG_OPCODE_CEIL: case BRIG_OPCODE_SQRT: Index: gcc/c/gimple-parser.c =================================================================== --- gcc/c/gimple-parser.c 2018-01-03 11:12:56.269741723 +0000 +++ gcc/c/gimple-parser.c 2018-05-11 18:08:24.817948111 +0100 @@ -903,27 +903,6 @@ c_parser_gimple_postfix_expression (c_pa expr.value = fold_convert (type, val); return expr; } - else if (strcmp (IDENTIFIER_POINTER (id), "__FMA") == 0) - { - c_parser_consume_token (parser); - auto_vec args; - - if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>")) - { - c_parser_gimple_expr_list (parser, &args); - c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, - "expected %<)%>"); - } - if (args.length () != 3) - { - error_at (loc, "invalid number of operands to __FMA"); - expr.value = error_mark_node; - return expr; - } - expr.value = build3_loc (loc, FMA_EXPR, TREE_TYPE (args[0]), - args[0], args[1], args[2]); - return expr; - } /* SSA name. */ unsigned version, ver_offset; Index: gcc/cp/constexpr.c =================================================================== --- gcc/cp/constexpr.c 2018-05-01 19:30:30.973597315 +0100 +++ gcc/cp/constexpr.c 2018-05-11 18:08:24.830947576 +0100 @@ -4573,7 +4573,6 @@ cxx_eval_constant_expression (const cons non_constant_p, overflow_p); break; - case FMA_EXPR: case VEC_PERM_EXPR: r = cxx_eval_trinary_expression (ctx, t, lval, non_constant_p, overflow_p); @@ -5998,7 +5997,6 @@ #define RECUR(T,RV) \ return false; return true; - case FMA_EXPR: case VEC_PERM_EXPR: for (i = 0; i < 3; ++i) if (!RECUR (TREE_OPERAND (t, i), true)) Index: gcc/testsuite/lib/target-supports.exp =================================================================== --- gcc/testsuite/lib/target-supports.exp 2018-05-08 09:42:01.256731446 +0100 +++ gcc/testsuite/lib/target-supports.exp 2018-05-11 18:08:24.848946836 +0100 @@ -2879,6 +2879,13 @@ proc check_effective_target_base_quadflo return 1 } +# Return 1 if the target supports all four forms of fused multiply-add +# (fma, fms, fnma, and fnms) for both float and double. + +proc check_effective_target_all_scalar_fma { } { + return [istarget aarch64*-*-*] +} + # Return 1 if the target supports compiling fixed-point, # 0 otherwise. Index: gcc/testsuite/gcc.dg/fma-1.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-1.c 2018-05-11 18:08:24.844947000 +0100 @@ -0,0 +1,15 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ + +float +f1 (float a, float b, float c) +{ + return a * b + c; +} + +double +f2 (double a, double b, double c) +{ + return a * b + c; +} + +/* { dg-final { scan-tree-dump-times { = FMA \(} 2 "widening_mul" { target all_scalar_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-2.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-2.c 2018-05-11 18:08:24.844947000 +0100 @@ -0,0 +1,15 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ + +float +f1 (float a, float b, float c) +{ + return a * b - c; +} + +double +f2 (double a, double b, double c) +{ + return a * b - c; +} + +/* { dg-final { scan-tree-dump-times { = FMS \(} 2 "widening_mul" { target all_scalar_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-3.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-3.c 2018-05-11 18:08:24.845946959 +0100 @@ -0,0 +1,15 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ + +float +f1 (float a, float b, float c) +{ + return c - a * b; +} + +double +f2 (double a, double b, double c) +{ + return c - a * b; +} + +/* { dg-final { scan-tree-dump-times { = FNMA \(} 2 "widening_mul" { target all_scalar_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-4.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-4.c 2018-05-11 18:08:24.845946959 +0100 @@ -0,0 +1,15 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ + +float +f1 (float a, float b, float c) +{ + return -(a * b) - c; +} + +double +f2 (double a, double b, double c) +{ + return -(a * b) - c; +} + +/* { dg-final { scan-tree-dump-times { = FNMS \(} 2 "widening_mul" { target all_scalar_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-5.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-5.c 2018-05-11 18:08:24.845946959 +0100 @@ -0,0 +1,53 @@ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +float +f1 (float a, float b, float c) +{ + return __builtin_fmaf (a, b, -c); +} + +double +f2 (double a, double b, double c) +{ + return __builtin_fma (a, b, -c); +} + +void +f3 (float a, float b, float c, float d, float e, float *res) +{ + res[0] = __builtin_fmaf (a, b, -e); + res[1] = __builtin_fmaf (c, d, -e); +} + +void +f4 (double a, double b, double c, double d, double e, double *res) +{ + res[0] = __builtin_fma (a, b, -e); + res[1] = __builtin_fma (c, d, -e); +} + +float +f5 (float a, float b, float c) +{ + return -__builtin_fmaf (-a, b, c); +} + +double +f6 (double a, double b, double c) +{ + return -__builtin_fma (-a, b, c); +} + +float +f7 (float a, float b, float c) +{ + return -__builtin_fmaf (a, -b, c); +} + +double +f8 (double a, double b, double c) +{ + return -__builtin_fma (a, -b, c); +} + +/* { dg-final { scan-tree-dump-times { = FMS \(} 10 "optimized" { target all_scalar_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-6.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-6.c 2018-05-11 18:08:24.845946959 +0100 @@ -0,0 +1,67 @@ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +float +f1 (float a, float b, float c) +{ + return __builtin_fmaf (-a, b, c); +} + +double +f2 (double a, double b, double c) +{ + return __builtin_fma (-a, b, c); +} + +float +f3 (float a, float b, float c) +{ + return __builtin_fmaf (a, -b, c); +} + +double +f4 (double a, double b, double c) +{ + return __builtin_fma (a, -b, c); +} + +void +f5 (float a, float b, float c, float d, float e, float *res) +{ + res[0] = __builtin_fmaf (-a, b, c); + res[1] = __builtin_fmaf (-a, d, e); +} + +void +f6 (double a, double b, double c, double d, double e, double *res) +{ + res[0] = __builtin_fma (-a, b, c); + res[1] = __builtin_fma (-a, d, e); +} + +void +f7 (float a, float b, float c, float d, float e, float *res) +{ + res[0] = __builtin_fmaf (a, -b, c); + res[1] = __builtin_fmaf (d, -b, e); +} + +void +f8 (double a, double b, double c, double d, double e, double *res) +{ + res[0] = __builtin_fma (a, -b, c); + res[1] = __builtin_fma (d, -b, e); +} + +float +f9 (float a, float b, float c) +{ + return -__builtin_fmaf (a, b, -c); +} + +double +f10 (double a, double b, double c) +{ + return -__builtin_fma (a, b, -c); +} + +/* { dg-final { scan-tree-dump-times { = FNMA \(} 14 "optimized" { target all_scalar_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-7.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-7.c 2018-05-11 18:08:24.845946959 +0100 @@ -0,0 +1,67 @@ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +float +f1 (float a, float b, float c) +{ + return __builtin_fmaf (-a, b, -c); +} + +double +f2 (double a, double b, double c) +{ + return __builtin_fma (-a, b, -c); +} + +float +f3 (float a, float b, float c) +{ + return __builtin_fmaf (a, -b, -c); +} + +double +f4 (double a, double b, double c) +{ + return __builtin_fma (a, -b, -c); +} + +void +f5 (float a, float b, float c, float d, float *res) +{ + res[0] = __builtin_fmaf (-a, b, -c); + res[1] = __builtin_fmaf (-a, d, -c); +} + +void +f6 (double a, double b, double c, double d, double *res) +{ + res[0] = __builtin_fma (-a, b, -c); + res[1] = __builtin_fma (-a, d, -c); +} + +void +f7 (float a, float b, float c, float d, float *res) +{ + res[0] = __builtin_fmaf (a, -b, -c); + res[1] = __builtin_fmaf (d, -b, -c); +} + +void +f8 (double a, double b, double c, double d, double *res) +{ + res[0] = __builtin_fma (a, -b, -c); + res[1] = __builtin_fma (d, -b, -c); +} + +float +f9 (float a, float b, float c) +{ + return -__builtin_fmaf (a, b, c); +} + +double +f10 (double a, double b, double c) +{ + return -__builtin_fma (a, b, c); +} + +/* { dg-final { scan-tree-dump-times { = FNMS \(} 14 "optimized" { target all_scalar_fma } } } */ Index: gcc/testsuite/gcc.dg/gimplefe-26.c =================================================================== --- gcc/testsuite/gcc.dg/gimplefe-26.c 2017-02-23 19:54:08.000000000 +0000 +++ /dev/null 2018-04-20 16:19:46.369131350 +0100 @@ -1,16 +0,0 @@ -/* { dg-do compile { target c99_runtime } } */ -/* { dg-options "-O -fgimple -fdump-tree-ssa-gimple" } */ - -#define foo(type, num) \ -type __GIMPLE () foo_##num (type a, type b, type c) \ -{ \ - type t0; \ - t0_1 = __FMA (a, b, c); \ - return t0_1; \ -} - -foo(float, 1) -foo(double, 2) -foo(long double, 3) - -/* { dg-final { scan-tree-dump-times "__FMA" 3 "ssa" } } */ Index: gcc/testsuite/gfortran.dg/reassoc_7.f =================================================================== --- gcc/testsuite/gfortran.dg/reassoc_7.f 2015-06-02 23:52:46.000000000 +0100 +++ gcc/testsuite/gfortran.dg/reassoc_7.f 2018-05-11 18:08:24.846946918 +0100 @@ -1,5 +1,5 @@ ! { dg-do compile } -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } SUBROUTINE S55199(P,Dvdph) implicit none Index: gcc/testsuite/gfortran.dg/reassoc_8.f =================================================================== --- gcc/testsuite/gfortran.dg/reassoc_8.f 2015-06-02 23:52:46.000000000 +0100 +++ gcc/testsuite/gfortran.dg/reassoc_8.f 2018-05-11 18:08:24.846946918 +0100 @@ -1,5 +1,5 @@ ! { dg-do compile } -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } SUBROUTINE S55199(P,Dvdph) implicit none Index: gcc/testsuite/gfortran.dg/reassoc_9.f =================================================================== --- gcc/testsuite/gfortran.dg/reassoc_9.f 2015-06-02 23:52:48.000000000 +0100 +++ gcc/testsuite/gfortran.dg/reassoc_9.f 2018-05-11 18:08:24.847946877 +0100 @@ -1,5 +1,5 @@ ! { dg-do compile } -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } SUBROUTINE S55199(P,Dvdph) implicit none Index: gcc/testsuite/gfortran.dg/reassoc_10.f =================================================================== --- gcc/testsuite/gfortran.dg/reassoc_10.f 2015-06-02 23:52:44.000000000 +0100 +++ gcc/testsuite/gfortran.dg/reassoc_10.f 2018-05-11 18:08:24.846946918 +0100 @@ -1,5 +1,5 @@ ! { dg-do compile } -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } SUBROUTINE S55199(P,Q,Dvdph) implicit none