From patchwork Tue Aug 22 01:58:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "juzhe.zhong@rivai.ai" X-Patchwork-Id: 1823811 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RVCFM4khxz1yZs for ; Tue, 22 Aug 2023 11:59:03 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 37BDA3854148 for ; Tue, 22 Aug 2023 01:59:01 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtpbgbr1.qq.com (smtpbgbr1.qq.com [54.207.19.206]) by sourceware.org (Postfix) with ESMTPS id 78E0A3858D33 for ; Tue, 22 Aug 2023 01:58:45 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 78E0A3858D33 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivai.ai Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai X-QQ-mid: bizesmtp62t1692669516tn7fmhsk Received: from rios-cad121.hadoop.rioslab.org ( [58.60.1.7]) by bizesmtp.qq.com (ESMTP) with id ; Tue, 22 Aug 2023 09:58:35 +0800 (CST) X-QQ-SSF: 01400000000000G0V000000A0000000 X-QQ-FEAT: CQfe3icsPEhI23kKkViYEfKZzF/X04yDC3K91yFGfcJFc+aPaGs0U2pxlK6/z 8wO8e7KdkYDCbVYgW0PFxvnOossqBjcMKF6oiq5xZ+/2dPFL9yKRTnB/bfj/pmsGY8UnFG6 Cb+ktfnZQ8wcovzsX3hm4gEmgyF+Zf2jEgcfbJSBh/a/AIZa7kcBSzo48ZyVkI5nm4yOGc4 JwkfQXHbHIJEj+Ydk94Zqac/bMrwciR9hiJj2NmQQwoAM9VmE2FWQIftouceOzMkoQ1goQd gpo7CVFOj4qTOR947lvUqZpqQGcAUFLQ9wmachGUlfREZk4iaxuHXW46IxF2Y8dSDkB14gh wPp6hM8GPv5xb/QWWKBM9MgyPHMfvjJaOiPRtxazt/UupN94xkvdYYrCa5wvdJHom5jX+TX xgMnltgDKV8= X-QQ-GoodBg: 2 X-BIZMAIL-ID: 10376093010247961449 From: Juzhe-Zhong To: gcc-patches@gcc.gnu.org Cc: richard.sandiford@arm.com, rguenther@suse.de, Juzhe-Zhong Subject: [PATCH V2] gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold Date: Tue, 22 Aug 2023 09:58:34 +0800 Message-Id: <20230822015834.3540320-1-juzhe.zhong@rivai.ai> X-Mailer: git-send-email 2.36.3 MIME-Version: 1.0 X-QQ-SENDSIZE: 520 Feedback-ID: bizesmtp:rivai.ai:qybglogicsvrgz:qybglogicsvrgz7a-one-0 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Hi, Richard and Richi. Currently, GCC support COND_LEN_FMA for floating-point **NO** -ffast-math. It's supported in tree-ssa-math-opts.cc. However, GCC failed to support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS. Consider this following case: #define TEST_TYPE(TYPE) \ __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, \ TYPE *__restrict a, \ TYPE *__restrict b, int n) \ { \ for (int i = 0; i < n; i++) \ dst[i] -= a[i] * b[i]; \ } #define TEST_ALL() \ TEST_TYPE (float) \ TEST_ALL () Gimple IR for RVV: ... _39 = -vect__8.14_26; vect__10.16_21 = .COND_LEN_FMA ({ -1, ... }, vect__6.11_30, _39, vect__4.8_34, vect__4.8_34, _46, 0); ... This is because this following piece of codes in tree-ssa-math-opts.cc: if (len) fma_stmt = gimple_build_call_internal (IFN_COND_LEN_FMA, 7, cond, mulop1, op2, addop, else_value, len, bias); else if (cond) fma_stmt = gimple_build_call_internal (IFN_COND_FMA, 5, cond, mulop1, op2, addop, else_value); else fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop); gimple_set_lhs (fma_stmt, gimple_get_lhs (use_stmt)); gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (cfun, use_stmt)); gsi_replace (&gsi, fma_stmt, true); /* Follow all SSA edges so that we generate FMS, FNMA and FNMS regardless of where the negation occurs. */ gimple *orig_stmt = gsi_stmt (gsi); if (fold_stmt (&gsi, follow_all_ssa_edges)) { if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi))) gcc_unreachable (); update_stmt (gsi_stmt (gsi)); } 'fold_stmt' failed to fold NEGATE_EXPR + COND_LEN_FMA ====> COND_LEN_FNMA. This patch support STMT fold into: vect__10.16_21 = .COND_LEN_FNMA ({ -1, ... }, vect__8.14_26, vect__6.11_30, vect__4.8_34, { 0.0, ... }, _46, 0); Note that COND_LEN_FNMA has 7 arguments and COND_LEN_ADD has 6 arguments. Extend maximum num ops: - static const unsigned int MAX_NUM_OPS = 5; + static const unsigned int MAX_NUM_OPS = 7; Bootstrap and Regtest on X86 passed. Tested on aarch64 Qemu. Fully tested COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS on RISC-V backend. gcc/ChangeLog: * genmatch.cc (decision_tree::gen): Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold. * gimple-match-exports.cc (gimple_simplify): Ditto. (gimple_resimplify6): New function. (gimple_resimplify7): New function. (gimple_match_op::resimplify): Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold. (convert_conditional_op): Ditto. (build_call_internal): Ditto. (try_conditional_simplification): Ditto. (gimple_extract): Ditto. * gimple-match.h (gimple_match_cond::gimple_match_cond): Ditto. * internal-fn.cc (CASE): Ditto. --- gcc/genmatch.cc | 2 +- gcc/gimple-match-exports.cc | 123 ++++++++++++++++++++++++++++++++++-- gcc/gimple-match.h | 16 ++++- gcc/internal-fn.cc | 7 +- 4 files changed, 138 insertions(+), 10 deletions(-) diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc index f46d2e1520d..a1925a747a7 100644 --- a/gcc/genmatch.cc +++ b/gcc/genmatch.cc @@ -4052,7 +4052,7 @@ decision_tree::gen (vec &files, bool gimple) } fprintf (stderr, "removed %u duplicate tails\n", rcnt); - for (unsigned n = 1; n <= 5; ++n) + for (unsigned n = 1; n <= 7; ++n) { bool has_kids_p = false; diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc index 7aeb4ddb152..b36027b0bad 100644 --- a/gcc/gimple-match-exports.cc +++ b/gcc/gimple-match-exports.cc @@ -60,6 +60,12 @@ extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree), code_helper, tree, tree, tree, tree, tree); extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree), code_helper, tree, tree, tree, tree, tree, tree); +extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree), + code_helper, tree, tree, tree, tree, tree, tree, + tree); +extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree), + code_helper, tree, tree, tree, tree, tree, tree, + tree, tree); /* Functions that are needed by gimple-match but that are exported and used in other places in the compiler. */ @@ -89,6 +95,8 @@ static bool gimple_resimplify2 (gimple_seq *, gimple_match_op *, tree (*)(tree)) static bool gimple_resimplify3 (gimple_seq *, gimple_match_op *, tree (*)(tree)); static bool gimple_resimplify4 (gimple_seq *, gimple_match_op *, tree (*)(tree)); static bool gimple_resimplify5 (gimple_seq *, gimple_match_op *, tree (*)(tree)); +static bool gimple_resimplify6 (gimple_seq *, gimple_match_op *, tree (*)(tree)); +static bool gimple_resimplify7 (gimple_seq *, gimple_match_op *, tree (*)(tree)); /* Match and simplify the toplevel valueized operation THIS. Replaces THIS with a simplified and/or canonicalized result and @@ -109,6 +117,10 @@ gimple_match_op::resimplify (gimple_seq *seq, tree (*valueize)(tree)) return gimple_resimplify4 (seq, this, valueize); case 5: return gimple_resimplify5 (seq, this, valueize); + case 6: + return gimple_resimplify6 (seq, this, valueize); + case 7: + return gimple_resimplify7 (seq, this, valueize); default: gcc_unreachable (); } @@ -146,7 +158,14 @@ convert_conditional_op (gimple_match_op *orig_op, if (ifn == IFN_LAST) return false; unsigned int num_ops = orig_op->num_ops; - new_op->set_op (as_combined_fn (ifn), orig_op->type, num_ops + 2); + unsigned int num_cond_ops = 2; + if (orig_op->cond.len) + { + /* Add the length and bias parameters. */ + ifn = get_len_internal_fn (ifn); + num_cond_ops = 4; + } + new_op->set_op (as_combined_fn (ifn), orig_op->type, num_ops + num_cond_ops); new_op->ops[0] = orig_op->cond.cond; for (unsigned int i = 0; i < num_ops; ++i) new_op->ops[i + 1] = orig_op->ops[i]; @@ -155,6 +174,11 @@ convert_conditional_op (gimple_match_op *orig_op, else_value = targetm.preferred_else_value (ifn, orig_op->type, num_ops, orig_op->ops); new_op->ops[num_ops + 1] = else_value; + if (orig_op->cond.len) + { + new_op->ops[num_ops + 2] = orig_op->cond.len; + new_op->ops[num_ops + 3] = orig_op->cond.bias; + } return true; } /* Helper for gimple_simplify valueizing OP using VALUEIZE and setting @@ -219,7 +243,9 @@ build_call_internal (internal_fn fn, gimple_match_op *res_op) res_op->op_or_null (1), res_op->op_or_null (2), res_op->op_or_null (3), - res_op->op_or_null (4)); + res_op->op_or_null (4), + res_op->op_or_null (5), + res_op->op_or_null (6)); } /* RES_OP is the result of a simplification. If it is conditional, @@ -319,6 +345,7 @@ try_conditional_simplification (internal_fn ifn, gimple_match_op *res_op, { code_helper op; tree_code code = conditional_internal_fn_code (ifn); + int len_index = internal_fn_len_index (ifn); if (code != ERROR_MARK) op = code; else @@ -330,12 +357,19 @@ try_conditional_simplification (internal_fn ifn, gimple_match_op *res_op, } unsigned int num_ops = res_op->num_ops; + /* num_cond_ops = 2 for COND_ADD (MASK and ELSE) + wheras num_cond_ops = 4 for COND_LEN_ADD (MASK, ELSE, LEN and BIAS). */ + unsigned int num_cond_ops = len_index < 0 ? 2 : 4; + tree else_value + = len_index < 0 ? res_op->ops[num_ops - 1] : res_op->ops[num_ops - 3]; + tree len = len_index < 0 ? NULL_TREE : res_op->ops[num_ops - 2]; + tree bias = len_index < 0 ? NULL_TREE : res_op->ops[num_ops - 1]; gimple_match_op cond_op (gimple_match_cond (res_op->ops[0], - res_op->ops[num_ops - 1]), - op, res_op->type, num_ops - 2); + else_value, len, bias), + op, res_op->type, num_ops - num_cond_ops); memcpy (cond_op.ops, res_op->ops + 1, (num_ops - 1) * sizeof *cond_op.ops); - switch (num_ops - 2) + switch (num_ops - num_cond_ops) { case 1: if (!gimple_resimplify1 (seq, &cond_op, valueize)) @@ -717,7 +751,7 @@ gimple_extract (gimple *stmt, gimple_match_op *res_op, /* ??? This way we can't simplify calls with side-effects. */ if (gimple_call_lhs (stmt) != NULL_TREE && gimple_call_num_args (stmt) >= 1 - && gimple_call_num_args (stmt) <= 5) + && gimple_call_num_args (stmt) <= 7) { combined_fn cfn; if (gimple_call_internal_p (stmt)) @@ -1145,6 +1179,83 @@ gimple_resimplify5 (gimple_seq *seq, gimple_match_op *res_op, return canonicalized; } +/* Helper that matches and simplifies the toplevel result from + a gimple_simplify run (where we don't want to build + a stmt in case it's used in in-place folding). Replaces + RES_OP with a simplified and/or canonicalized result and + returns whether any change was made. */ + +static bool +gimple_resimplify6 (gimple_seq *seq, gimple_match_op *res_op, + tree (*valueize)(tree)) +{ + /* No constant folding is defined for six-operand functions. */ + + /* Canonicalize operand order. */ + bool canonicalized = false; + int argno = first_commutative_argument (res_op->code, res_op->type); + if (argno >= 0 + && tree_swap_operands_p (res_op->ops[argno], res_op->ops[argno + 1])) + { + std::swap (res_op->ops[argno], res_op->ops[argno + 1]); + canonicalized = true; + } + + gimple_match_op res_op2 (*res_op); + if (gimple_simplify (&res_op2, seq, valueize, + res_op->code, res_op->type, + res_op->ops[0], res_op->ops[1], res_op->ops[2], + res_op->ops[3], res_op->ops[4], res_op->ops[5])) + { + *res_op = res_op2; + return true; + } + + if (maybe_resimplify_conditional_op (seq, res_op, valueize)) + return true; + + return canonicalized; +} + +/* Helper that matches and simplifies the toplevel result from + a gimple_simplify run (where we don't want to build + a stmt in case it's used in in-place folding). Replaces + RES_OP with a simplified and/or canonicalized result and + returns whether any change was made. */ + +static bool +gimple_resimplify7 (gimple_seq *seq, gimple_match_op *res_op, + tree (*valueize)(tree)) +{ + /* No constant folding is defined for seven-operand functions. */ + + /* Canonicalize operand order. */ + bool canonicalized = false; + int argno = first_commutative_argument (res_op->code, res_op->type); + if (argno >= 0 + && tree_swap_operands_p (res_op->ops[argno], res_op->ops[argno + 1])) + { + std::swap (res_op->ops[argno], res_op->ops[argno + 1]); + canonicalized = true; + } + + gimple_match_op res_op2 (*res_op); + if (gimple_simplify (&res_op2, seq, valueize, + res_op->code, res_op->type, + res_op->ops[0], res_op->ops[1], res_op->ops[2], + res_op->ops[3], res_op->ops[4], res_op->ops[5], + res_op->ops[6])) + { + *res_op = res_op2; + return true; + } + + if (maybe_resimplify_conditional_op (seq, res_op, valueize)) + return true; + + return canonicalized; +} + /* Return a canonical form for CODE when operating on TYPE. The idea is to remove redundant ways of representing the same operation so that code_helpers can be hashed and compared for equality. diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h index b20585dca4b..bec3ff42e3e 100644 --- a/gcc/gimple-match.h +++ b/gcc/gimple-match.h @@ -34,6 +34,7 @@ public: /* Build an unconditional op. */ gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE) {} gimple_match_cond (tree, tree); + gimple_match_cond (tree, tree, tree, tree); gimple_match_cond any_else () const; @@ -44,6 +45,13 @@ public: /* The value to use when the condition is false. This is NULL_TREE if the operation is unconditional or if the value doesn't matter. */ tree else_value; + + /* The length and bias parameters to be applied to a vector operation, + so that the condition is forced to false when the element index is + >= LEN + BIAS. These are NULL_TREE if the operation isn't applied + to vectors, or if no such length limit is in use. */ + tree len; + tree bias; }; inline @@ -52,6 +60,12 @@ gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in) { } +inline +gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in, + tree len_in, tree bias_in) + : cond (cond_in), else_value (else_value_in), len (len_in), bias (bias_in) +{} + /* Return a gimple_match_cond with the same condition but with an arbitrary ELSE_VALUE. */ @@ -93,7 +107,7 @@ public: bool resimplify (gimple_seq *, tree (*)(tree)); /* The maximum value of NUM_OPS. */ - static const unsigned int MAX_NUM_OPS = 5; + static const unsigned int MAX_NUM_OPS = 7; /* The conditions under which the operation is performed, and the value to use as a fallback. */ diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index df673fda3f0..314f63b614b 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -4443,8 +4443,11 @@ get_unconditional_internal_fn (internal_fn ifn) { switch (ifn) { -#define CASE(NAME) case IFN_COND_##NAME: return IFN_##NAME; - FOR_EACH_COND_FN_PAIR(CASE) +#define CASE(NAME) \ + case IFN_COND_##NAME: \ + case IFN_COND_LEN_##NAME: \ + return IFN_##NAME; +FOR_EACH_COND_FN_PAIR (CASE) #undef CASE default: return IFN_LAST;