From patchwork Wed Aug 7 18:12:19 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1143613
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Cc: richard.earnshaw@arm.com, james.greenhalgh@arm.com, marcus.shawcroft@arm.com
Subject: [AArch64] Tweak handling of fp moves via int registers
Date: Wed, 07 Aug 2019 19:12:19 +0100

The AArch64 port uses define_splits to prefer moving certain float
constants via integer registers over loading them from memory.  E.g.:

    (set (reg:SF X) (const_double:SF C))

splits to:

    (set (reg:SI tmp) (const_int C'))
    (set (reg:SF X) (subreg:SF (reg:SI tmp) 0))

The problem with using splits for this -- especially when the split
instruction is a constant move -- is that the original form is still
valid and can be recreated by later pre-RA passes.
(And I think that's a valid thing for them to do, since they're folding
away what appears in rtl terms to be a redundant instruction.)

One pass that can do this is ira's combine_and_move_insns, which among
other things looks for registers that are set once and used once.
If the register is set to a rematerialisable value, the code tries
to fold that value into the single use.

We don't normally see this effect at -O2 and above because
combine_and_move_insns isn't run when -fsched-pressure is enabled
(which it is by default on AArch64).  But arguably the combine part is
useful independently of -fsched-pressure, and only the move part is
suspect.  So I don't think we should rely on the combination not
happening here.

The new tests demonstrate the problem by running the original tests
at -O instead of -O2.

This patch does the optimisation by splitting the moves at generation
time and rejecting the combined form while the split is still possible.
REG_EQUAL notes on the second move still give the original floating-point
value for passes that need it.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
OK to install?

Richard


2019-08-07  Richard Sandiford

gcc/
	* config/aarch64/aarch64-protos.h (aarch64_move_float_via_int_p):
	Declare.
	* config/aarch64/aarch64.c (aarch64_move_float_via_int_p): New
	function, extracted from the GPF_HF move splitter.
	* config/aarch64/aarch64.md: Remove GPF_HF move splitter.
	(mov): Move via an integer register if
	aarch64_move_float_via_int_p.
	(*movhf_aarch64, *movsf_aarch64, *movdf_aarch64): Check
	aarch64_move_float_via_int_p.
	* config/aarch64/iterators.md (fcvt_target): Handle TI and TF.
	(FCVT_TARGET): Likewise.

gcc/testsuite/
	* gcc.target/aarch64/dbl_mov_immediate_2.c: New test.
	* gcc.target/aarch64/f16_mov_immediate_5.c: Likewise.
	* gcc.target/aarch64/flt_mov_immediate_2.c: Likewise.

Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
--- gcc/config/aarch64/aarch64-protos.h	2019-07-01 09:37:06.704528805 +0100
+++ gcc/config/aarch64/aarch64-protos.h	2019-08-07 19:07:38.199739765 +0100
@@ -519,6 +519,7 @@ const char * aarch64_output_probe_stack_
 const char * aarch64_output_probe_sve_stack_clash (rtx, rtx, rtx, rtx);
 void aarch64_err_no_fpadvsimd (machine_mode);
 void aarch64_expand_epilogue (bool);
+bool aarch64_move_float_via_int_p (rtx);
 void aarch64_expand_mov_immediate (rtx, rtx, rtx (*) (rtx, rtx) = 0);
 rtx aarch64_ptrue_reg (machine_mode);
 rtx aarch64_pfalse_reg (machine_mode);
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2019-07-16 09:11:06.449416469 +0100
+++ gcc/config/aarch64/aarch64.c	2019-08-07 19:07:38.203739735 +0100
@@ -3278,6 +3278,22 @@ aarch64_expand_sve_const_vector (rtx des
   gcc_assert (vectors[0] == dest);
 }
 
+/* Return true if floating-point value SRC should be moved into an
+   integer register first and then moved into a floating-point register.
+   This means that SRC is a constant that cannot be moved directly into
+   floating-point registers but assembling it in integer registers is
+   better than forcing it to memory.  */
+bool
+aarch64_move_float_via_int_p (rtx src)
+{
+  return (GET_MODE (src) != TFmode
+          && GET_CODE (src) == CONST_DOUBLE
+          && can_create_pseudo_p ()
+          && !aarch64_can_const_movi_rtx_p (src, GET_MODE (src))
+          && !aarch64_float_const_representable_p (src)
+          && aarch64_float_const_rtx_p (src));
+}
+
 /* Set DEST to immediate IMM.  For SVE vector modes, GEN_VEC_DUPLICATE
    is a pattern that can be used to set DEST to a replicated scalar
    element.  */
Index: gcc/config/aarch64/aarch64.md
===================================================================
--- gcc/config/aarch64/aarch64.md	2019-08-05 17:46:20.713723611 +0100
+++ gcc/config/aarch64/aarch64.md	2019-08-07 19:07:38.203739735 +0100
@@ -1249,14 +1249,24 @@ (define_expand "mov"
         && ! (GET_CODE (operands[1]) == CONST_DOUBLE
               && aarch64_float_const_zero_rtx_p (operands[1])))
       operands[1] = force_reg (mode, operands[1]);
+
+    if (aarch64_move_float_via_int_p (operands[1]))
+      {
+        rtx imm = simplify_gen_subreg (mode, operands[1],
+                                       mode, 0);
+        rtx tmp = force_reg (mode, imm);
+        operands[1] = gen_lowpart (mode, tmp);
+      }
   }
 )
 
 (define_insn "*movhf_aarch64"
   [(set (match_operand:HF 0 "nonimmediate_operand" "=w,w , w,?r,w,w ,w ,w,m,r,m ,r")
         (match_operand:HF 1 "general_operand" "Y ,?rY,?r, w,w,Ufc,Uvi,m,w,m,rY,r"))]
-  "TARGET_FLOAT && (register_operand (operands[0], HFmode)
-    || aarch64_reg_or_fp_zero (operands[1], HFmode))"
+  "TARGET_FLOAT
+   && (register_operand (operands[0], HFmode)
+       || aarch64_reg_or_fp_zero (operands[1], HFmode))
+   && !aarch64_move_float_via_int_p (operands[1])"
   "@
    movi\\t%0.4h, #0
    fmov\\t%h0, %w1
@@ -1278,8 +1288,10 @@ (define_insn "*movhf_aarch64"
 (define_insn "*movsf_aarch64"
   [(set (match_operand:SF 0 "nonimmediate_operand" "=w,w ,?r,w,w ,w ,w,m,r,m ,r,r")
         (match_operand:SF 1 "general_operand" "Y ,?rY, w,w,Ufc,Uvi,m,w,m,rY,r,M"))]
-  "TARGET_FLOAT && (register_operand (operands[0], SFmode)
-    || aarch64_reg_or_fp_zero (operands[1], SFmode))"
+  "TARGET_FLOAT
+   && (register_operand (operands[0], SFmode)
+       || aarch64_reg_or_fp_zero (operands[1], SFmode))
+   && !aarch64_move_float_via_int_p (operands[1])"
   "@
    movi\\t%0.2s, #0
    fmov\\t%s0, %w1
@@ -1302,8 +1314,10 @@ (define_insn "*movsf_aarch64"
 (define_insn "*movdf_aarch64"
   [(set (match_operand:DF 0 "nonimmediate_operand" "=w, w ,?r,w,w ,w ,w,m,r,m ,r,r")
         (match_operand:DF 1 "general_operand" "Y , ?rY, w,w,Ufc,Uvi,m,w,m,rY,r,N"))]
-  "TARGET_FLOAT && (register_operand (operands[0], DFmode)
-    || aarch64_reg_or_fp_zero (operands[1], DFmode))"
+  "TARGET_FLOAT
+   && (register_operand (operands[0], DFmode)
+       || aarch64_reg_or_fp_zero (operands[1], DFmode))
+   && !aarch64_move_float_via_int_p (operands[1])"
   "@
    movi\\t%d0, #0
    fmov\\t%d0, %x1
@@ -1323,26 +1337,6 @@ (define_insn "*movdf_aarch64"
    (set_attr "arch" "simd,*,*,*,*,simd,*,*,*,*,*,*")]
 )
 
-(define_split
-  [(set (match_operand:GPF_HF 0 "nonimmediate_operand")
-        (match_operand:GPF_HF 1 "general_operand"))]
-  "can_create_pseudo_p ()
-   && !aarch64_can_const_movi_rtx_p (operands[1], mode)
-   && !aarch64_float_const_representable_p (operands[1])
-   && aarch64_float_const_rtx_p (operands[1])"
-  [(const_int 0)]
-  {
-    unsigned HOST_WIDE_INT ival;
-    if (!aarch64_reinterpret_float_as_int (operands[1], &ival))
-      FAIL;
-
-    rtx tmp = gen_reg_rtx (mode);
-    emit_move_insn (tmp, gen_int_mode (ival, mode));
-    emit_move_insn (operands[0], gen_lowpart (mode, tmp));
-    DONE;
-  }
-)
-
 (define_insn "*movtf_aarch64"
   [(set (match_operand:TF 0
          "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,m,?r,m ,m")
Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md	2019-07-29 09:39:46.658190164 +0100
+++ gcc/config/aarch64/iterators.md	2019-08-07 19:07:38.203739735 +0100
@@ -960,12 +960,14 @@ (define_mode_attr fcvt_target [(V2DF "v2
                                (V2DI "v2df") (V4SI "v4sf") (V2SI "v2sf")
                                (SF "si") (DF "di") (SI "sf") (DI "df")
                                (V4HF "v4hi") (V8HF "v8hi") (V4HI "v4hf")
-                               (V8HI "v8hf") (HF "hi") (HI "hf")])
+                               (V8HI "v8hf") (HF "hi") (HI "hf")
+                               (TI "tf") (TF "ti")])
 (define_mode_attr FCVT_TARGET [(V2DF "V2DI") (V4SF "V4SI") (V2SF "V2SI")
                                (V2DI "V2DF") (V4SI "V4SF") (V2SI "V2SF")
                                (SF "SI") (DF "DI") (SI "SF") (DI "DF")
                                (V4HF "V4HI") (V8HF "V8HI") (V4HI "V4HF")
-                               (V8HI "V8HF") (HF "HI") (HI "HF")])
+                               (V8HI "V8HF") (HF "HI") (HI "HF")
+                               (TI "TF") (TF "TI")])
 
 ;; for the inequal width integer to fp conversions
Index: gcc/testsuite/gcc.target/aarch64/dbl_mov_immediate_2.c
===================================================================
--- /dev/null	2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.target/aarch64/dbl_mov_immediate_2.c	2019-08-07 19:07:38.203739735 +0100
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O -mno-pc-relative-literal-loads" } */
+/* { dg-skip-if "Tiny model won't generate adrp" { *-*-* } { "-mcmodel=tiny" } { "" } } */
+
+#include "dbl_mov_immediate_1.c"
+
+/* { dg-final { scan-assembler-times "movi\td\[0-9\]+, #?0" 1 } } */
+
+/* { dg-final { scan-assembler-times "adrp\tx\[0-9\]+, \.LC\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "ldr\td\[0-9\]+, \\\[x\[0-9\], #:lo12:\.LC\[0-9\]\\\]" 2 } } */
+
+/* { dg-final { scan-assembler-times "fmov\td\[0-9\]+, 1\\\.5e\\\+0" 1 } } */
+
+/* { dg-final { scan-assembler-times "mov\tx\[0-9\]+, 25838523252736" 1 } } */
+/* { dg-final { scan-assembler-times "movk\tx\[0-9\]+, 0x40fe, lsl 48" 1 } } */
+/* { dg-final { scan-assembler-times "mov\tx\[0-9\]+, -9223372036854775808" 1 } } */
+/* { dg-final { scan-assembler-times "fmov\td\[0-9\]+, x\[0-9\]+" 2 } } */
+
Index: gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_5.c
===================================================================
--- /dev/null	2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_5.c	2019-08-07 19:07:38.203739735 +0100
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */
+/* { dg-add-options arm_v8_2a_fp16_scalar } */
+
+#include "f16_mov_immediate_2.c"
+
+/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.4h, ?#0" 1 } } */
+/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.4h, 0x80, lsl 8" 1 } } */
+/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.4h, 0x5c, lsl 8" 1 } } */
+/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.4h, 0x7c, lsl 8" 1 } } */
+
+/* { dg-final { scan-assembler-times {fmov\th[0-9]+, #?1.7e\+1} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/flt_mov_immediate_2.c
===================================================================
--- /dev/null	2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.target/aarch64/flt_mov_immediate_2.c	2019-08-07 19:07:38.203739735 +0100
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#include "flt_mov_immediate_1.c"
+
+/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.2s, ?#0" 1 } } */
+/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.2s, 0x80, lsl 24" 1 } } */
+/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.2s, 0x80, lsl 24" 1 } } */
+
+/* { dg-final { scan-assembler-times "mov\tw\[0-9\]+, 48128" 1 } } */
+/* { dg-final { scan-assembler-times "movk\tw\[0-9\]+, 0x47f0, lsl 16" 1 } } */
+
+/* { dg-final { scan-assembler-times "fmov\ts\[0-9\]+, 2\\\.0e\\\+0" 1 } } */
+
+/* { dg-final { scan-assembler-times "mov\tw\[0-9\]+, 16435" 1 } } */
+/* { dg-final { scan-assembler-times "movk\tw\[0-9\]+, 0xc69c, lsl 16" 1 } } */
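
As a cross-check that is not part of the patch, the mov/movk immediates the new tests scan for can be reproduced on a host by reinterpreting a floating-point constant as an integer and reading off its 16-bit chunks. The following is a minimal sketch in C, assuming the host uses IEEE 754 binary32 and binary64 for float and double. The helper names float_bits, double_bits and chunk16 are hypothetical and do not exist in GCC. The value 123256.0 is used only because its encoding matches the immediates in the tests; the included *_1.c files are not shown in this message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float and double are IEEE 754 binary32 and binary64.  */
static_assert (sizeof (float) == sizeof (uint32_t), "expects 32-bit float");
static_assert (sizeof (double) == sizeof (uint64_t), "expects 64-bit double");

/* Return the bit pattern of VALUE as a 32-bit integer.  memcpy copies the
   object representation without breaking the aliasing rules.  */
uint32_t
float_bits (float value)
{
  uint32_t bits;
  memcpy (&bits, &value, sizeof bits);
  return bits;
}

/* Likewise for a double and a 64-bit integer.  */
uint64_t
double_bits (double value)
{
  uint64_t bits;
  memcpy (&bits, &value, sizeof bits);
  return bits;
}

/* Return the 16-bit chunk of BITS starting at bit SHIFT, i.e. the
   immediate that a movk with "lsl SHIFT" would insert.  */
unsigned
chunk16 (uint64_t bits, unsigned shift)
{
  return (unsigned) ((bits >> shift) & 0xffff);
}
```

With these helpers, float_bits (123256.0f) is 0x47f0bc00, whose low chunk is 48128 and whose high chunk is 0x47f0, matching the mov/movk pair in flt_mov_immediate_2.c. Similarly, double_bits (123256.0) is 0x40fe178000000000, whose low 48 bits are 25838523252736 and whose top chunk is 0x40fe, and double_bits (-0.0) is the bit pattern of -9223372036854775808.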