From patchwork Thu Nov 7 10:26:42 2019
X-Patchwork-Submitter: Kyrill Tkachov
X-Patchwork-Id: 1191037
To: "gcc-patches@gcc.gnu.org"
From: Kyrill Tkachov
Subject: [PATCH][arm][1/X] Add initial support for saturation intrinsics
Message-ID: <757b0bc7-2e7f-dd7a-1042-3b25d775caa2@foss.arm.com>
Date: Thu, 7 Nov 2019 10:26:42 +0000

Hi all,

This patch adds the plumbing for and an implementation of the saturation
intrinsics from ACLE [1], in particular the __ssat, __usat intrinsics.
These intrinsics set the Q sticky bit in APSR if an overflow occurred.
ACLE allows the user to read that bit (within the same function, it's not
defined across function boundaries) using the __saturation_occurred
intrinsic and reset it using __set_saturation_occurred.
Thus, if the user cares about the Q bit they would be using a flow such as:

  __set_saturation_occurred (0); // reset the Q bit
  ...
  __ssat (...) // Do some calculations involving __ssat
  ...
  if (__saturation_occurred ()) // if Q bit set handle overflow
    ...

For the implementation this has a few implications:

* We must track the Q-setting side-effects of these instructions to make
  sure saturation reading/writing intrinsics are ordered properly.
  This is done by introducing a new "apsrq" register (and associated
  APSRQ_REGNUM) in a similar way to the "fake" cc register.

* The RTL patterns coming out of these intrinsics can have two forms:
  one where they set the APSRQ_REGNUM and one where they don't.
  Which one is used depends on whether the function cares about reading
  the Q flag. This is detected using the TARGET_CHECK_BUILTIN_CALL hook on
  the __saturation_occurred, __set_saturation_occurred occurrences.
  If no Q-flag read is present in the function we'll use the simpler
  non-Q-setting form to allow for more aggressive scheduling and such.
  If a Q-bit read is present then the Q-setting form is emitted.
  To avoid adding two patterns for each intrinsic to the MD file we make
  use of define_subst to auto-generate the Q-setting forms.

* Some existing patterns already produce instructions that may clobber the
  Q bit, but they don't model it (as we didn't care about that bit up till
  now). Since these patterns can be generated from straight-line C code
  they can affect the Q-bit reads from intrinsics. Therefore they have to
  be disabled when a Q-bit read is present. These are mostly patterns in
  arm-fixed.md that are not very common anyway, but there are also a couple
  of widening multiply-accumulate patterns in arm.md that can set the Q-bit
  during accumulation.

There are more Q-setting intrinsics in ACLE, but these will be implemented
in a more mechanical fashion once the infrastructure in this patch goes in.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk.
Thanks,
Kyrill

2019-11-07  Kyrylo Tkachov

    * config/arm/aout.h (REGISTER_NAMES): Add apsrq.
    * config/arm/arm.md (APSRQ_REGNUM): Define.
    (add_setq): New define_subst.
    (add_clobber_q_name): New define_subst_attr.
    (add_clobber_q_pred): Likewise.
    (maddhisi4): Change to define_expand.  Split into mult and add if
    ARM_Q_BIT_READ.
    (arm_maddhisi4): New define_insn.
    (*maddhisi4tb): Disable for ARM_Q_BIT_READ.
    (*maddhisi4tt): Likewise.
    (arm_ssat): New define_expand.
    (arm_usat): Likewise.
    (arm_get_apsr): New define_insn.
    (arm_set_apsr): Likewise.
    (arm_saturation_occurred): New define_expand.
    (arm_set_saturation): Likewise.
    (*satsi_): Rename to...
    (satsi_): ... This.
    (*satsi__shift): Disable for ARM_Q_BIT_READ.
    * config/arm/arm.h (FIXED_REGISTERS): Mark apsrq as fixed.
    (CALL_USED_REGISTERS): Mark apsrq.
    (FIRST_PSEUDO_REGISTER): Update value.
    (REG_ALLOC_ORDER): Add APSRQ_REGNUM.
    (machine_function): Add q_bit_access.
    (ARM_Q_BIT_READ): Define.
    * config/arm/arm.c (TARGET_CHECK_BUILTIN_CALL): Define.
    (arm_conditional_register_usage): Clear APSRQ_REGNUM from
    operand_reg_set.
    (arm_q_bit_access): Define.
    * config/arm/arm-builtins.c: Include stringpool.h.
    (arm_sat_binop_imm_qualifiers,
    arm_unsigned_sat_binop_unsigned_imm_qualifiers,
    arm_sat_occurred_qualifiers, arm_set_sat_qualifiers): Define.
    (SAT_BINOP_UNSIGNED_IMM_QUALIFIERS,
    UNSIGNED_SAT_BINOP_UNSIGNED_IMM_QUALIFIERS, SAT_OCCURRED_QUALIFIERS,
    SET_SAT_QUALIFIERS): Likewise.
    (arm_builtins): Define ARM_BUILTIN_SAT_IMM_CHECK.
    (arm_init_acle_builtins): Initialize __builtin_sat_imm_check.
    Handle 0 argument expander.
    (arm_expand_acle_builtin): Handle ARM_BUILTIN_SAT_IMM_CHECK.
    (arm_check_builtin_call): Define.
    * config/arm/arm.md (ssmulsa3, usmulusa3, usmuluha3,
    arm_ssatsihi_shift, arm_usatsihi): Disable when ARM_Q_BIT_READ.
    * config/arm/arm-protos.h (arm_check_builtin_call): Declare prototype.
    (arm_q_bit_access): Likewise.
    * config/arm/arm_acle.h (__ssat, __usat, __ignore_saturation,
    __saturation_occurred, __set_saturation_occurred): Define.
    * config/arm/arm_acle_builtins.def: Define builtins for ssat, usat,
    saturation_occurred, set_saturation_occurred.
    * config/arm/unspecs.md (UNSPEC_Q_SET): Define.
    (UNSPEC_APSR_READ): Likewise.
    (VUNSPEC_APSR_WRITE): Likewise.
    * config/arm/arm-fixed.md (ssadd3): Convert to define_expand.
    (*arm_ssadd3): New define_insn.
    (sssub3): Convert to define_expand.
    (*arm_sssub3): New define_insn.
    (ssmulsa3): Convert to define_expand.
    (*arm_ssmulsa3): New define_insn.
    (usmulusa3): Convert to define_expand.
    (*arm_usmulusa3): New define_insn.
    (ssmulha3): FAIL if ARM_Q_BIT_READ.
    (arm_ssatsihi_shift, arm_usatsihi): Disable for ARM_Q_BIT_READ.
    * config/arm/iterators.md (qaddsub_clob_q): New mode attribute.

2019-11-07  Kyrylo Tkachov

    * gcc.target/arm/acle/saturation.c: New test.
    * gcc.target/arm/acle/sat_no_smlatb.c: Likewise.
    * lib/target-supports.exp (check_effective_target_arm_qbit_ok_nocache):
    Define.
    (check_effective_target_arm_qbit_ok): Likewise.
    (add_options_for_arm_qbit): Likewise.

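For readers less familiar with the semantics, the saturation bounds used by
the arm_ssat and arm_usat expanders and the sticky behaviour of the Q bit
(bit 27 of the APSR) can be modelled in portable C. This is only an
illustrative sketch and not part of the patch: the model_* names and the
model_apsr variable are hypothetical, and the real intrinsics expand to
SSAT/USAT and MRS/MSR instructions rather than to code like this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the APSR; only the Q bit (bit 27) is modelled.  */
#define MODEL_Q_BIT_POS 27
static uint32_t model_apsr;

/* Model of __set_saturation_occurred: insert a 0/1 value at bit 27.  */
static void
model_set_saturation_occurred (int value)
{
  uint32_t bit = value != 0;
  model_apsr = (model_apsr & ~(UINT32_C (1) << MODEL_Q_BIT_POS))
	       | (bit << MODEL_Q_BIT_POS);
}

/* Model of __saturation_occurred: extract bit 27.  */
static int
model_saturation_occurred (void)
{
  return (model_apsr >> MODEL_Q_BIT_POS) & 1;
}

/* Model of __ssat (x, bits) for 1 <= bits <= 32: clamp to
   [-2^(bits-1), 2^(bits-1) - 1] and set the sticky flag on clamping.  */
static int32_t
model_ssat (int32_t x, unsigned bits)
{
  assert (bits >= 1 && bits <= 32);
  int64_t upper = (INT64_C (1) << (bits - 1)) - 1;
  int64_t lower = -upper - 1;
  if (x > upper || x < lower)
    {
      model_set_saturation_occurred (1);
      return (int32_t) (x > upper ? upper : lower);
    }
  return x;
}

/* Model of __usat (x, bits) for 0 <= bits <= 31: clamp to
   [0, 2^bits - 1] and set the sticky flag on clamping.  */
static uint32_t
model_usat (int32_t x, unsigned bits)
{
  assert (bits <= 31);
  int64_t upper = (INT64_C (1) << bits) - 1;
  if (x < 0 || x > upper)
    {
      model_set_saturation_occurred (1);
      return (uint32_t) (x < 0 ? 0 : upper);
    }
  return (uint32_t) x;
}
```

In this model, model_ssat (200, 8) returns 127 and sets the flag, and the
flag stays set across later non-saturating operations until
model_set_saturation_occurred (0) clears it, which mirrors the
reset / compute / check flow in the cover note.
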
diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h index 91830a6cbde7ea1edeabb9d6d5ec021d0413877d..a5f83cb503f61cc1cab0e61795edde33250610e7 100644 --- a/gcc/config/arm/aout.h +++ b/gcc/config/arm/aout.h @@ -72,7 +72,7 @@ "wr8", "wr9", "wr10", "wr11", \ "wr12", "wr13", "wr14", "wr15", \ "wcgr0", "wcgr1", "wcgr2", "wcgr3", \ - "cc", "vfpcc", "sfp", "afp" \ + "cc", "vfpcc", "sfp", "afp", "apsrq" \ } #endif diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index c5cdb7b5d339748bbf5d1f26a9de676b702b5c1b..995f50785f6ebff7b3cd47185516f7bcb4fd5b81 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -41,6 +41,7 @@ #include "langhooks.h" #include "case-cfn-macros.h" #include "sbitmap.h" +#include "stringpool.h" #define SIMD_MAX_BUILTIN_ARGS 7 @@ -127,6 +128,20 @@ arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_immediate }; #define BINOP_IMM_QUALIFIERS (arm_binop_imm_qualifiers) +/* T (T, unsigned immediate). */ +static enum arm_type_qualifiers +arm_sat_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_none, qualifier_unsigned_immediate }; +#define SAT_BINOP_UNSIGNED_IMM_QUALIFIERS \ + (arm_sat_binop_imm_qualifiers) + +/* unsigned T (T, unsigned immediate). */ +static enum arm_type_qualifiers +arm_unsigned_sat_binop_unsigned_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_none, qualifier_unsigned_immediate }; +#define UNSIGNED_SAT_BINOP_UNSIGNED_IMM_QUALIFIERS \ + (arm_unsigned_sat_binop_unsigned_imm_qualifiers) + /* T (T, lane index). */ static enum arm_type_qualifiers arm_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS] @@ -285,6 +300,18 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_none, qualifier_struct_load_store_lane_index }; #define STORE1LANE_QUALIFIERS (arm_storestruct_lane_qualifiers) + /* int (void). 
*/ +static enum arm_type_qualifiers +arm_sat_occurred_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_void }; +#define SAT_OCCURRED_QUALIFIERS (arm_sat_occurred_qualifiers) + + /* void (int). */ +static enum arm_type_qualifiers +arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_void, qualifier_none }; +#define SET_SAT_QUALIFIERS (arm_set_sat_qualifiers) + #define v8qi_UP E_V8QImode #define v4hi_UP E_V4HImode #define v4hf_UP E_V4HFmode @@ -674,6 +701,7 @@ enum arm_builtins ARM_BUILTIN_##N, ARM_BUILTIN_ACLE_BASE, + ARM_BUILTIN_SAT_IMM_CHECK = ARM_BUILTIN_ACLE_BASE, #include "arm_acle_builtins.def" @@ -1169,6 +1197,16 @@ arm_init_acle_builtins (void) { unsigned int i, fcode = ARM_BUILTIN_ACLE_PATTERN_START; + tree sat_check_fpr = build_function_type_list (void_type_node, + intSI_type_node, + intSI_type_node, + intSI_type_node, + NULL); + arm_builtin_decls[ARM_BUILTIN_SAT_IMM_CHECK] + = add_builtin_function ("__builtin_sat_imm_check", sat_check_fpr, + ARM_BUILTIN_SAT_IMM_CHECK, BUILT_IN_MD, + NULL, NULL_TREE); + for (i = 0; i < ARRAY_SIZE (acle_builtin_data); i++, fcode++) { arm_builtin_datum *d = &acle_builtin_data[i]; @@ -2307,6 +2345,9 @@ constant_arg: if (have_retval) switch (argc) { + case 0: + pat = GEN_FCN (icode) (target); + break; case 1: pat = GEN_FCN (icode) (target, op[0]); break; @@ -2465,7 +2506,26 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx target, static rtx arm_expand_acle_builtin (int fcode, tree exp, rtx target) { - + if (fcode == ARM_BUILTIN_SAT_IMM_CHECK) + { + /* Check the saturation immediate bounds. 
*/ + + rtx min_sat = expand_normal (CALL_EXPR_ARG (exp, 1)); + rtx max_sat = expand_normal (CALL_EXPR_ARG (exp, 2)); + gcc_assert (CONST_INT_P (min_sat)); + gcc_assert (CONST_INT_P (max_sat)); + rtx sat_imm = expand_normal (CALL_EXPR_ARG (exp, 0)); + if (CONST_INT_P (sat_imm)) + { + if (!IN_RANGE (sat_imm, min_sat, max_sat)) + error ("%Ksaturation bit range must be in the range [%wd, %wd]", + exp, UINTVAL (min_sat), UINTVAL (max_sat)); + } + else + error ("%Ksaturation bit range must be a constant immediate", exp); + /* Don't generate any RTL. */ + return const0_rtx; + } arm_builtin_datum *d = &acle_builtin_data[fcode - ARM_BUILTIN_ACLE_PATTERN_START]; @@ -3295,4 +3355,22 @@ arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update) reload_fenv, restore_fnenv), update_call); } +/* Implement TARGET_CHECK_BUILTIN_CALL. Record a read of the Q bit through + intrinsics in the machine function. */ +bool +arm_check_builtin_call (location_t , vec , tree fndecl, + tree, unsigned int, tree *) +{ + int fcode = DECL_MD_FUNCTION_CODE (fndecl); + if (fcode == ARM_BUILTIN_saturation_occurred + || fcode == ARM_BUILTIN_set_saturation) + { + if (cfun && cfun->decl) + DECL_ATTRIBUTES (cfun->decl) + = tree_cons (get_identifier ("acle qbit"), NULL_TREE, + DECL_ATTRIBUTES (cfun->decl)); + } + return true; +} + #include "gt-arm-builtins.h" diff --git a/gcc/config/arm/arm-fixed.md b/gcc/config/arm/arm-fixed.md index fcab40d13f634a678ea6dffa3406c155998f720f..85dbc5d05c35921bc5115df68d30292a712729cf 100644 --- a/gcc/config/arm/arm-fixed.md +++ b/gcc/config/arm/arm-fixed.md @@ -46,11 +46,22 @@ [(set_attr "predicable" "yes") (set_attr "type" "alu_dsp_reg")]) -(define_insn "ssadd3" +(define_expand "ssadd3" + [(set (match_operand:QADDSUB 0 "s_register_operand") + (ss_plus:QADDSUB (match_operand:QADDSUB 1 "s_register_operand") + (match_operand:QADDSUB 2 "s_register_operand")))] + "TARGET_INT_SIMD" + { + if () + FAIL; + } +) + +(define_insn "*arm_ssadd3" [(set (match_operand:QADDSUB 0 
"s_register_operand" "=r") (ss_plus:QADDSUB (match_operand:QADDSUB 1 "s_register_operand" "r") (match_operand:QADDSUB 2 "s_register_operand" "r")))] - "TARGET_INT_SIMD" + "TARGET_INT_SIMD && !" "qadd%?\\t%0, %1, %2" [(set_attr "predicable" "yes") (set_attr "type" "alu_dsp_reg")]) @@ -84,11 +95,22 @@ [(set_attr "predicable" "yes") (set_attr "type" "alu_dsp_reg")]) -(define_insn "sssub3" +(define_expand "sssub3" + [(set (match_operand:QADDSUB 0 "s_register_operand") + (ss_minus:QADDSUB (match_operand:QADDSUB 1 "s_register_operand") + (match_operand:QADDSUB 2 "s_register_operand")))] + "TARGET_INT_SIMD" + { + if () + FAIL; + } +) + +(define_insn "*arm_sssub3" [(set (match_operand:QADDSUB 0 "s_register_operand" "=r") (ss_minus:QADDSUB (match_operand:QADDSUB 1 "s_register_operand" "r") (match_operand:QADDSUB 2 "s_register_operand" "r")))] - "TARGET_INT_SIMD" + "TARGET_INT_SIMD && !" "qsub%?\\t%0, %1, %2" [(set_attr "predicable" "yes") (set_attr "type" "alu_dsp_reg")]) @@ -193,19 +215,31 @@ DONE; }) -;; The code sequence emitted by this insn pattern uses the Q flag, which GCC -;; doesn't generally know about, so we don't bother expanding to individual -;; instructions. It may be better to just use an out-of-line asm libcall for -;; this. +;; The code sequence emitted by this insn pattern uses the Q flag, so we need +;; to bail out when ARM_Q_BIT_READ and resort to a library sequence instead. 
+ +(define_expand "ssmulsa3" + [(parallel [(set (match_operand:SA 0 "s_register_operand") + (ss_mult:SA (match_operand:SA 1 "s_register_operand") + (match_operand:SA 2 "s_register_operand"))) + (clobber (match_scratch:DI 3)) + (clobber (match_scratch:SI 4)) + (clobber (reg:CC CC_REGNUM))])] + "TARGET_32BIT && arm_arch6" + { + if (ARM_Q_BIT_READ) + FAIL; + } +) -(define_insn "ssmulsa3" +(define_insn "*arm_ssmulsa3" [(set (match_operand:SA 0 "s_register_operand" "=r") (ss_mult:SA (match_operand:SA 1 "s_register_operand" "r") (match_operand:SA 2 "s_register_operand" "r"))) (clobber (match_scratch:DI 3 "=r")) (clobber (match_scratch:SI 4 "=r")) (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT && arm_arch6" + "TARGET_32BIT && arm_arch6 && !ARM_Q_BIT_READ" { /* s16.15 * s16.15 -> s32.30. */ output_asm_insn ("smull\\t%Q3, %R3, %1, %2", operands); @@ -256,16 +290,28 @@ (const_int 38)) (const_int 32)))]) -;; Same goes for this. +(define_expand "usmulusa3" + [(parallel [(set (match_operand:USA 0 "s_register_operand") + (us_mult:USA (match_operand:USA 1 "s_register_operand") + (match_operand:USA 2 "s_register_operand"))) + (clobber (match_scratch:DI 3)) + (clobber (match_scratch:SI 4)) + (clobber (reg:CC CC_REGNUM))])] + "TARGET_32BIT && arm_arch6" + { + if (ARM_Q_BIT_READ) + FAIL; + } +) -(define_insn "usmulusa3" +(define_insn "*arm_usmulusa3" [(set (match_operand:USA 0 "s_register_operand" "=r") (us_mult:USA (match_operand:USA 1 "s_register_operand" "r") (match_operand:USA 2 "s_register_operand" "r"))) (clobber (match_scratch:DI 3 "=r")) (clobber (match_scratch:SI 4 "=r")) (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT && arm_arch6" + "TARGET_32BIT && arm_arch6 && !ARM_Q_BIT_READ" { /* 16.16 * 16.16 -> 32.32. 
*/ output_asm_insn ("umull\\t%Q3, %R3, %1, %2", operands); @@ -358,6 +404,8 @@ (match_operand:HA 2 "s_register_operand")))] "TARGET_32BIT && TARGET_DSP_MULTIPLY && arm_arch6" { + if (ARM_Q_BIT_READ) + FAIL; rtx tmp = gen_reg_rtx (SImode); rtx rshift; @@ -378,6 +426,9 @@ (match_operand:UHA 2 "s_register_operand")))] "TARGET_INT_SIMD" { + if (ARM_Q_BIT_READ) + FAIL; + rtx tmp1 = gen_reg_rtx (SImode); rtx tmp2 = gen_reg_rtx (SImode); rtx tmp3 = gen_reg_rtx (SImode); @@ -405,7 +456,7 @@ (ss_truncate:HI (match_operator:SI 1 "sat_shift_operator" [(match_operand:SI 2 "s_register_operand" "r") (match_operand:SI 3 "immediate_operand" "I")])))] - "TARGET_32BIT && arm_arch6" + "TARGET_32BIT && arm_arch6 && !ARM_Q_BIT_READ" "ssat%?\\t%0, #16, %2%S1" [(set_attr "predicable" "yes") (set_attr "shift" "1") @@ -414,7 +465,7 @@ (define_insn "arm_usatsihi" [(set (match_operand:HI 0 "s_register_operand" "=r") (us_truncate:HI (match_operand:SI 1 "s_register_operand")))] - "TARGET_INT_SIMD" + "TARGET_INT_SIMD && !ARM_Q_BIT_READ" "usat%?\\t%0, #16, %1" [(set_attr "predicable" "yes") (set_attr "type" "alu_imm")] diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index c685bcbf99c81210a34192cc31db055fa7b2d605..963dc3e92f0119f424014a023edb51fbf32fc63f 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -28,6 +28,8 @@ extern enum unwind_info_type arm_except_unwind_info (struct gcc_options *); extern int use_return_insn (int, rtx); extern bool use_simple_return_p (void); extern enum reg_class arm_regno_class (int); +extern bool arm_check_builtin_call (location_t , vec , tree, + tree, unsigned int, tree *); extern void arm_load_pic_register (unsigned long, rtx); extern int arm_volatile_func (void); extern void arm_expand_prologue (void); @@ -58,6 +60,7 @@ extern bool arm_simd_check_vect_par_cnst_half_p (rtx op, machine_mode mode, bool high); extern void arm_emit_speculation_barrier_function (void); extern void arm_decompose_di_binop (rtx, rtx, rtx 
*, rtx *, rtx *, rtx *); +extern bool arm_q_bit_access (void); #ifdef RTX_CODE extern void arm_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode, diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 0e6edef53ae71a6e52568808758ae64427da20fc..1bbd006fa22a3ccc2b5f732aa11c3f1c7cf7958d 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -720,6 +720,8 @@ extern int arm_arch_cmse; goto. Without it fp appears to be used and the elimination code won't get rid of sfp. It tracks fp exactly at all times. + apsrq Nor this, it is used to track operations on the Q bit + of APSR by ACLE saturating intrinsics. *: See TARGET_CONDITIONAL_REGISTER_USAGE */ @@ -767,7 +769,7 @@ extern int arm_arch_cmse; 1,1,1,1,1,1,1,1, \ 1,1,1,1, \ /* Specials. */ \ - 1,1,1,1 \ + 1,1,1,1,1 \ } /* 1 for registers not available across function calls. @@ -797,7 +799,7 @@ extern int arm_arch_cmse; 1,1,1,1,1,1,1,1, \ 1,1,1,1, \ /* Specials. */ \ - 1,1,1,1 \ + 1,1,1,1,1 \ } #ifndef SUBTARGET_CONDITIONAL_REGISTER_USAGE @@ -972,10 +974,10 @@ extern int arm_arch_cmse; ((((REGNUM) - FIRST_VFP_REGNUM) & 3) == 0 \ && (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1)) -/* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP. */ +/* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP + 1 APSRQ. */ /* Intel Wireless MMX Technology registers add 16 + 4 more. */ /* VFP (VFP3) adds 32 (64) + 1 VFPCC. */ -#define FIRST_PSEUDO_REGISTER 104 +#define FIRST_PSEUDO_REGISTER 105 #define DBX_REGISTER_NUMBER(REGNO) arm_dbx_register_number (REGNO) @@ -1059,7 +1061,7 @@ extern int arm_regs_in_sequence[]; /* Registers not for general use. */ \ CC_REGNUM, VFPCC_REGNUM, \ FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM, \ - SP_REGNUM, PC_REGNUM \ + SP_REGNUM, PC_REGNUM, APSRQ_REGNUM \ } /* Use different register alloc ordering for Thumb. 
*/ @@ -1399,6 +1401,8 @@ typedef struct GTY(()) machine_function machine_function; #endif +#define ARM_Q_BIT_READ (arm_q_bit_access ()) + /* As in the machine_function, a global set of call-via labels, for code that is in text_section. */ extern GTY(()) rtx thumb_call_via_label[14]; diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 6cc50a373e26d4c22ce1fc30c22e61c7fa980bb2..1ce6931c6e993160ca859e7736963da33fda56b5 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -384,6 +384,9 @@ static const struct attribute_spec arm_attribute_table[] = #define TARGET_MERGE_DECL_ATTRIBUTES merge_dllimport_decl_attributes #endif +#undef TARGET_CHECK_BUILTIN_CALL +#define TARGET_CHECK_BUILTIN_CALL arm_check_builtin_call + #undef TARGET_LEGITIMIZE_ADDRESS #define TARGET_LEGITIMIZE_ADDRESS arm_legitimize_address @@ -28773,6 +28776,10 @@ arm_conditional_register_usage (void) if (TARGET_CALLER_INTERWORKING) global_regs[ARM_HARD_FRAME_POINTER_REGNUM] = 1; } + + /* The Q bit is only accessed via special ACLE patterns. */ + CLEAR_HARD_REG_BIT (operand_reg_set, APSRQ_REGNUM); + SUBTARGET_CONDITIONAL_REGISTER_USAGE } @@ -32008,6 +32015,16 @@ arm_emit_speculation_barrier_function () emit_library_call (speculation_barrier_libfunc, LCT_NORMAL, VOIDmode); } +/* Have we recorded an explicit access to the Q bit of APSR?. 
*/ +bool +arm_q_bit_access (void) +{ + if (cfun && cfun->decl) + return lookup_attribute ("acle qbit", + DECL_ATTRIBUTES (cfun->decl)); + return true; +} + #if CHECKING_P namespace selftest { diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 1e41d4630f1708631c147ee490631b24a4127864..09b632b5dbc8b38dcca22494468366c97a514bb6 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -39,6 +39,7 @@ (LAST_ARM_REGNUM 15) ; (CC_REGNUM 100) ; Condition code pseudo register (VFPCC_REGNUM 101) ; VFP Condition code pseudo register + (APSRQ_REGNUM 104) ; Q bit pseudo register ] ) ;; 3rd operand to select_dominance_cc_mode @@ -423,6 +424,20 @@ (include "marvell-pj4.md") (include "xgene1.md") +;; define_subst and associated attributes + +(define_subst "add_setq" + [(set (match_operand:SI 0 "" "") + (match_operand:SI 1 "" ""))] + "" + [(set (match_dup 0) + (match_dup 1)) + (set (reg:CC APSRQ_REGNUM) + (unspec:CC [(reg:CC APSRQ_REGNUM)] UNSPEC_Q_SET))]) + +(define_subst_attr "add_clobber_q_name" "add_setq" "" "_setq") +(define_subst_attr "add_clobber_q_pred" "add_setq" "!ARM_Q_BIT_READ" + "ARM_Q_BIT_READ") ;;--------------------------------------------------------------------------- ;; Insn patterns @@ -2515,14 +2530,36 @@ (set_attr "predicable" "yes")] ) -(define_insn "maddhisi4" +(define_expand "maddhisi4" + [(set (match_operand:SI 0 "s_register_operand") + (plus:SI (mult:SI (sign_extend:SI + (match_operand:HI 1 "s_register_operand")) + (sign_extend:SI + (match_operand:HI 2 "s_register_operand"))) + (match_operand:SI 3 "s_register_operand")))] + "TARGET_DSP_MULTIPLY" + { + /* If this function reads the Q bit from ACLE intrinsics break up the + multiplication and accumulation as an overflow during accumulation will + clobber the Q flag. 
*/ + if (ARM_Q_BIT_READ) + { + rtx tmp = gen_reg_rtx (SImode); + emit_insn (gen_mulhisi3 (tmp, operands[1], operands[2])); + emit_insn (gen_addsi3 (operands[0], tmp, operands[3])); + DONE; + } + } +) + +(define_insn "*arm_maddhisi4" [(set (match_operand:SI 0 "s_register_operand" "=r") (plus:SI (mult:SI (sign_extend:SI (match_operand:HI 1 "s_register_operand" "r")) (sign_extend:SI (match_operand:HI 2 "s_register_operand" "r"))) (match_operand:SI 3 "s_register_operand" "r")))] - "TARGET_DSP_MULTIPLY" + "TARGET_DSP_MULTIPLY && !ARM_Q_BIT_READ" "smlabb%?\\t%0, %1, %2, %3" [(set_attr "type" "smlaxy") (set_attr "predicable" "yes")] @@ -2537,7 +2574,7 @@ (sign_extend:SI (match_operand:HI 2 "s_register_operand" "r"))) (match_operand:SI 3 "s_register_operand" "r")))] - "TARGET_DSP_MULTIPLY" + "TARGET_DSP_MULTIPLY && !ARM_Q_BIT_READ" "smlatb%?\\t%0, %1, %2, %3" [(set_attr "type" "smlaxy") (set_attr "predicable" "yes")] @@ -2552,7 +2589,7 @@ (match_operand:SI 2 "s_register_operand" "r") (const_int 16))) (match_operand:SI 3 "s_register_operand" "r")))] - "TARGET_DSP_MULTIPLY" + "TARGET_DSP_MULTIPLY && !ARM_Q_BIT_READ" "smlatt%?\\t%0, %1, %2, %3" [(set_attr "type" "smlaxy") (set_attr "predicable" "yes")] @@ -4044,12 +4081,113 @@ (define_code_attr SATlo [(smin "1") (smax "2")]) (define_code_attr SAThi [(smin "2") (smax "1")]) -(define_insn "*satsi_" +(define_expand "arm_ssat" + [(match_operand:SI 0 "s_register_operand") + (match_operand:SI 1 "s_register_operand") + (match_operand:SI 2 "const_int_operand")] + "TARGET_32BIT && arm_arch6" + { + HOST_WIDE_INT val = INTVAL (operands[2]); + /* The builtin checking code should have ensured the right + range for the immediate. 
*/ + gcc_assert (IN_RANGE (val, 1, 32)); + HOST_WIDE_INT upper_bound = (HOST_WIDE_INT_1 << (val - 1)) - 1; + HOST_WIDE_INT lower_bound = -upper_bound - 1; + rtx up_rtx = gen_int_mode (upper_bound, SImode); + rtx lo_rtx = gen_int_mode (lower_bound, SImode); + if (ARM_Q_BIT_READ) + emit_insn (gen_satsi_smin_setq (operands[0], lo_rtx, + up_rtx, operands[1])); + else + emit_insn (gen_satsi_smin (operands[0], lo_rtx, up_rtx, operands[1])); + DONE; + } +) + +(define_expand "arm_usat" + [(match_operand:SI 0 "s_register_operand") + (match_operand:SI 1 "s_register_operand") + (match_operand:SI 2 "const_int_operand")] + "TARGET_32BIT && arm_arch6" + { + HOST_WIDE_INT val = INTVAL (operands[2]); + /* The builtin checking code should have ensured the right + range for the immediate. */ + gcc_assert (IN_RANGE (val, 0, 31)); + HOST_WIDE_INT upper_bound = (HOST_WIDE_INT_1 << val) - 1; + rtx up_rtx = gen_int_mode (upper_bound, SImode); + rtx lo_rtx = CONST0_RTX (SImode); + if (ARM_Q_BIT_READ) + emit_insn (gen_satsi_smin_setq (operands[0], lo_rtx, up_rtx, + operands[1])); + else + emit_insn (gen_satsi_smin (operands[0], lo_rtx, up_rtx, operands[1])); + DONE; + } +) + +(define_insn "arm_get_apsr" + [(set (match_operand:SI 0 "s_register_operand" "=r") + (unspec:SI [(reg:CC APSRQ_REGNUM)] UNSPEC_APSR_READ))] + "TARGET_ARM_QBIT" + "mrs%?\t%0, APSR" + [(set_attr "predicable" "yes") + (set_attr "conds" "use")] +) + +(define_insn "arm_set_apsr" + [(set (reg:CC APSRQ_REGNUM) + (unspec_volatile:CC + [(match_operand:SI 0 "s_register_operand" "r")] VUNSPEC_APSR_WRITE))] + "TARGET_ARM_QBIT" + "msr%?\tAPSR_nzcvq, %0" + [(set_attr "predicable" "yes") + (set_attr "conds" "set")] +) + +;; Read the APSR and extract the Q bit (bit 27) +(define_expand "arm_saturation_occurred" + [(match_operand:SI 0 "s_register_operand")] + "TARGET_ARM_QBIT" + { + rtx apsr = gen_reg_rtx (SImode); + emit_insn (gen_arm_get_apsr (apsr)); + emit_insn (gen_extzv (operands[0], apsr, CONST1_RTX (SImode), + gen_int_mode (27, 
SImode))); + DONE; + } +) + +;; Read the APSR and set the Q bit (bit position 27) according to operand 0 +(define_expand "arm_set_saturation" + [(match_operand:SI 0 "reg_or_int_operand")] + "TARGET_ARM_QBIT" + { + rtx apsr = gen_reg_rtx (SImode); + emit_insn (gen_arm_get_apsr (apsr)); + rtx to_insert = gen_reg_rtx (SImode); + if (CONST_INT_P (operands[0])) + emit_move_insn (to_insert, operands[0] == CONST0_RTX (SImode) + ? CONST0_RTX (SImode) : CONST1_RTX (SImode)); + else + { + rtx cmp = gen_rtx_NE (SImode, operands[0], CONST0_RTX (SImode)); + emit_insn (gen_cstoresi4 (to_insert, cmp, operands[0], + CONST0_RTX (SImode))); + } + emit_insn (gen_insv (apsr, CONST1_RTX (SImode), + gen_int_mode (27, SImode), to_insert)); + emit_insn (gen_arm_set_apsr (apsr)); + DONE; + } +) + +(define_insn "satsi_" [(set (match_operand:SI 0 "s_register_operand" "=r") (SAT:SI (:SI (match_operand:SI 3 "s_register_operand" "r") (match_operand:SI 1 "const_int_operand" "i")) (match_operand:SI 2 "const_int_operand" "i")))] - "TARGET_32BIT && arm_arch6 + "TARGET_32BIT && arm_arch6 && && arm_sat_operator_match (operands[], operands[], NULL, NULL)" { int mask; @@ -4075,7 +4213,7 @@ (match_operand:SI 5 "const_int_operand" "i")]) (match_operand:SI 1 "const_int_operand" "i")) (match_operand:SI 2 "const_int_operand" "i")))] - "TARGET_32BIT && arm_arch6 + "TARGET_32BIT && arm_arch6 && !ARM_Q_BIT_READ && arm_sat_operator_match (operands[], operands[], NULL, NULL)" { int mask; diff --git a/gcc/config/arm/arm_acle.h b/gcc/config/arm/arm_acle.h index 248a355d00239a8724e46b9203c818906a4d4908..2564ad849856610f9415586e386f85eea6947bf7 100644 --- a/gcc/config/arm/arm_acle.h +++ b/gcc/config/arm/arm_acle.h @@ -433,6 +433,50 @@ __smlsldx (int16x2_t __a, int16x2_t __b, int64_t __c) #endif +#ifdef __ARM_FEATURE_SAT + +#define __ssat(__a, __sat) \ + __extension__ \ + ({ \ + int32_t __arg = (__a); \ + __builtin_sat_imm_check (__sat, 1, 32); \ + int32_t __res = __builtin_arm_ssat (__arg, __sat); \ + __res; \ + }) 
+
+#define __usat(__a, __sat) \
+  __extension__ \
+  ({ \
+    int32_t __arg = (__a); \
+    __builtin_sat_imm_check (__sat, 0, 31); \
+    uint32_t __res = __builtin_arm_usat (__arg, __sat); \
+    __res; \
+  })
+
+#endif
+
+#ifdef __ARM_FEATURE_QBIT
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__ignore_saturation (void)
+{
+  /* ACLE designates this intrinsic as a hint.
+     Implement as a nop for now. */
+}
+
+/* These are defined as macros because the implementation of the builtins
+   requires easy access to the current function so wrapping it in an
+   always_inline function complicates things. */
+
+#define __saturation_occurred __builtin_arm_saturation_occurred
+
+#define __set_saturation_occurred(__a) \
+  __extension__ \
+  ({ \
+    int __arg = (__a); \
+    __builtin_arm_set_saturation (__arg); \
+  })
+#endif
 #pragma GCC push_options
 #ifdef __ARM_FEATURE_CRC32
diff --git a/gcc/config/arm/arm_acle_builtins.def b/gcc/config/arm/arm_acle_builtins.def
index 0021c0036ad7e1bddef6553a900c9eaf145037b6..c72480321faa952ac307418f9e4f7d5f5f9e3745 100644
--- a/gcc/config/arm/arm_acle_builtins.def
+++ b/gcc/config/arm/arm_acle_builtins.def
@@ -79,3 +79,8 @@ VAR1 (TERNOP, smlald, di)
 VAR1 (TERNOP, smlaldx, di)
 VAR1 (TERNOP, smlsld, di)
 VAR1 (TERNOP, smlsldx, di)
+
+VAR1 (SAT_BINOP_UNSIGNED_IMM, ssat, si)
+VAR1 (UNSIGNED_SAT_BINOP_UNSIGNED_IMM, usat, si)
+VAR1 (SAT_OCCURRED, saturation_occurred, si)
+VAR1 (SET_SAT, set_saturation, void)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 4eb203365a6c10492a948b45db38c244ec191427..e5cef6852a2dfcef4cd3597c163a53a6c247afab 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -763,6 +763,12 @@
                      (V4QQ "8") (V2HQ "16") (QQ "8") (HQ "16")
                      (V2HA "16") (HA "16") (SQ "") (SA "")])
+(define_mode_attr qaddsub_clob_q [(V4UQQ "0") (V2UHQ "0") (UQQ "0") (UHQ "0")
+                                  (V2UHA "0") (UHA "0")
+                                  (V4QQ "0") (V2HQ "0") (QQ "0") (HQ "0")
+                                  (V2HA "0") (HA "0") (SQ "ARM_Q_BIT_READ")
+                                  (SA "ARM_Q_BIT_READ")])
+
 ;; Mode attribute for vshll.
 (define_mode_attr V_innermode [(V8QI "QI") (V4HI "HI") (V2SI "SI")])
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 78f88d5fa09f424a9ab638053cc4fe068aa19368..a4287949e525688ee5141e4975917537f84466ff 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -70,6 +70,9 @@
                         ; that.
   UNSPEC_UNALIGNED_STORE ; Same for str/strh.
   UNSPEC_PIC_UNIFIED    ; Create a common pic addressing form.
+  UNSPEC_Q_SET          ; Represent setting the Q bit.
+  UNSPEC_APSR_READ      ; Represent reading the APSR.
+
   UNSPEC_LL             ; Represent an unpaired load-register-exclusive.
   UNSPEC_VRINTZ         ; Represent a float to integral float rounding
                         ; towards zero.
@@ -211,6 +214,7 @@
   VUNSPEC_MRRC          ; Represent the coprocessor mrrc instruction.
   VUNSPEC_MRRC2         ; Represent the coprocessor mrrc2 instruction.
   VUNSPEC_SPECULATION_BARRIER ; Represents an unconditional speculation barrier.
+  VUNSPEC_APSR_WRITE    ; Represent writing the APSR.
 ])
 ;; Enumerators for NEON unspecs.
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 42a10cf2243836da9eb7a06dc8dd2be74cdbd050..f3bf66c44ee82a3f28f1ad638a8cea1b6cc19bf6 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1911,6 +1911,12 @@ ARM Target supports options suitable for accessing the SIMD32 intrinsics from
 @code{arm_acle.h}.
 Some multilibs may be incompatible with these options.
+@item arm_qbit_ok
+@anchor{arm_qbit_ok}
+ARM Target supports options suitable for accessing the Q-bit manipulation
+intrinsics from @code{arm_acle.h}.
+Some multilibs may be incompatible with these options.
+
 @end table
 @subsubsection AArch64-specific attributes
diff --git a/gcc/testsuite/gcc.target/arm/acle/sat_no_smlatb.c b/gcc/testsuite/gcc.target/arm/acle/sat_no_smlatb.c
new file mode 100644
index 0000000000000000000000000000000000000000..e0c53ed4dc9c9a2ff580d03a434ff34005db9ce6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/sat_no_smlatb.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_dsp } */
+
+/* Ensure the smlatb doesn't get generated when reading the Q flag
+   from ACLE. */
+
+#include <arm_acle.h>
+
+int
+foo (int x, int in, int32_t c)
+{
+  short a = in & 0xffff;
+  short b = (in & 0xffff0000) >> 16;
+
+  int res = x + b * a + __ssat (c, 24);
+  return res + __saturation_occurred ();
+}
+
+/* { dg-final { scan-assembler-not "smlatb\\t" } } */
diff --git a/gcc/testsuite/gcc.target/arm/acle/saturation.c b/gcc/testsuite/gcc.target/arm/acle/saturation.c
new file mode 100644
index 0000000000000000000000000000000000000000..0b3fe519933d05a2d35106ec47b0f432365e430a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/saturation.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_qbit_ok } */
+/* { dg-add-options arm_qbit } */
+
+#include <arm_acle.h>
+
+int32_t
+test_ssat (int32_t a)
+{
+  return __ssat (a, 8);
+}
+
+/* { dg-final { scan-assembler-times "ssat\t...?, #8, ...?" 1 } } */
+
+uint32_t
+test_usat (int32_t a)
+{
+  return __usat (a, 24);
+}
+
+/* { dg-final { scan-assembler-times "usat\t...?, #24, ...?" 1 } } */
+
+/* Test that USAT doesn't get removed as we need its Q-setting behavior. */
+int
+test_sat_occur (int32_t a)
+{
+  uint32_t res = __usat (a, 3);
+  return __saturation_occurred ();
+}
+
+/* { dg-final { scan-assembler-times "usat\t...?, #3, ...?" 1 } } */
+/* { dg-final { scan-assembler "mrs\t...?, APSR" } } */
+
+void
+test_set_sat (void)
+{
+  __set_saturation_occurred (0);
+}
+
+/* { dg-final { scan-assembler-times "msr\tAPSR_nzcvq, ...?" 1 } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 6f224fa81416e9f3f402abe3525e86df02313e6a..751045d4744777991cda826c6a654ce2bcc73962 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3845,6 +3845,44 @@ proc add_options_for_arm_simd32 { flags } {
     return "$flags $et_arm_simd32_flags"
 }
+# Return 1 if this is an ARM target supporting the saturation intrinsics
+# from arm_acle.h. Some multilibs may be incompatible with these options.
+# Also set et_arm_qbit_flags to the best options to add.
+# arm_acle.h includes stdint.h which can cause trouble with incompatible
+# -mfloat-abi= options.
+
+proc check_effective_target_arm_qbit_ok_nocache { } {
+    global et_arm_qbit_flags
+    set et_arm_qbit_flags ""
+    foreach flags {"" "-march=armv5te" "-march=armv5te -mfloat-abi=softfp" "-march=armv5te -mfloat-abi=hard"} {
+      if { [check_no_compiler_messages_nocache et_arm_qbit_flags object {
+        #include <arm_acle.h>
+        int dummy;
+        #ifndef __ARM_FEATURE_QBIT
+        #error not QBIT
+        #endif
+      } "$flags"] } {
+        set et_arm_qbit_flags $flags
+        return 1
+      }
+    }
+
+  return 0
+}
+
+proc check_effective_target_arm_qbit_ok { } {
+    return [check_cached_effective_target et_arm_qbit_flags \
+                check_effective_target_arm_qbit_ok_nocache]
+}
+
+proc add_options_for_arm_qbit { flags } {
+    if { ! [check_effective_target_arm_qbit_ok] } {
+        return "$flags"
+    }
+    global et_arm_qbit_flags
+    return "$flags $et_arm_qbit_flags"
+}
+
 # Return 1 if this is an ARM target supporting -mfpu=neon without any
 # -mfloat-abi= option. Useful in tests where add_options is not
 # supported (such as lto tests).
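
For readers without an Arm toolchain to hand, the behaviour the tests above expect can be sketched as a host-side C model. This is not part of the patch and is only an illustration: every model_* name is hypothetical, the Q bit is emulated with a plain static variable, and the clamping ranges are taken from the __builtin_sat_imm_check bounds in arm_acle.h (1..32 for __ssat, 0..31 for __usat). The nonzero-means-set behaviour mirrors the NE comparison in the arm_set_saturation expander.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software stand-in for the APSR Q sticky bit.  */
static int model_q;

/* Models __set_saturation_occurred: any nonzero argument sets the flag.  */
static void
model_set_saturation_occurred (int v)
{
  model_q = (v != 0);
}

/* Models __saturation_occurred: reads the sticky flag.  */
static int
model_saturation_occurred (void)
{
  return model_q;
}

/* Models __ssat (x, bits): clamp X to the signed BITS-bit range.
   BITS must be in [1, 32], as in the header's immediate check.  */
static int32_t
model_ssat (int32_t x, unsigned bits)
{
  assert (bits >= 1 && bits <= 32);
  int64_t max = ((int64_t) 1 << (bits - 1)) - 1;
  int64_t min = -max - 1;
  if (x > max)
    {
      model_q = 1;   /* Sticky: only an explicit reset clears it.  */
      return (int32_t) max;
    }
  if (x < min)
    {
      model_q = 1;
      return (int32_t) min;
    }
  return x;
}

/* Models __usat (x, bits): clamp X to [0, 2^BITS - 1].
   BITS must be in [0, 31], as in the header's immediate check.  */
static uint32_t
model_usat (int32_t x, unsigned bits)
{
  assert (bits <= 31);
  int64_t max = ((int64_t) 1 << bits) - 1;
  if (x < 0)
    {
      model_q = 1;
      return 0;
    }
  if (x > max)
    {
      model_q = 1;
      return (uint32_t) max;
    }
  return (uint32_t) x;
}

/* The reset / compute / read flow described in the cover letter.  */
static int
model_ssat_saturates (int32_t x, unsigned bits)
{
  model_set_saturation_occurred (0);
  (void) model_ssat (x, bits);
  return model_saturation_occurred ();
}
```

On a real target the same flow would use __set_saturation_occurred, __ssat and __saturation_occurred from arm_acle.h within a single function, since ACLE does not define the Q bit across function boundaries.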