From patchwork Tue May 11 11:20:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 1477090 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=TSAHAHFm; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4Ffb846BR1z9sWl for ; Tue, 11 May 2021 21:20:51 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 764DC3839C56; Tue, 11 May 2021 11:20:48 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 764DC3839C56 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1620732048; bh=Lhv64aGzYBhsvq2htMMzod1IkcBY58FpxdIHXoGUid8=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=TSAHAHFmijr+AuUuHyINPUwco4Ch4glq6HBS4590PxSdvNwDP5OqhkDCTb2YfwRJG IQXIxCwOQNKb0RCcDSXD0Kfdp4WA2F8WWn7IQ+NqTDd1jTcPMHd3dnO1dFyZzi5yky 5JqsXnxxVx0zqiycPiUYiSTb6QwZASFt1wdaKD1s= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 46981385783A for ; Tue, 11 May 2021 11:20:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 46981385783A Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C9773169E for ; Tue, 11 May 2021 04:20:36 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 6E3563F719 for ; Tue, 11 May 2021 04:20:31 -0700 (PDT) To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com Subject: [committed] aarch64: A couple of mul_laneq tweaks Date: Tue, 11 May 2021 12:20:30 +0100 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Richard Sandiford via Gcc-patches From: Richard Sandiford Reply-To: Richard Sandiford Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" This patch removes the duplication between the mul_laneq3 and the older mul-lane patterns. The older patterns were previously divided into two based on whether the indexed operand had the same mode as the other operands or whether it had the opposite length from the other operands (64-bit vs. 128-bit). However, it seemed easier to divide them instead based on whether the indexed operand was 64-bit or 128-bit, since that maps directly to the arm_neon.h “q” conventions. Also, it looks like the older patterns were missing cases for V8HF<->V4HF combinations, which meant that vmul_laneq_f16 and vmulq_lane_f16 didn't produce single instructions. There was a typo in the V2SF entry for VCONQ, but in practice no patterns were using that entry until now. The test passes for both endiannesses, but endianness does change the mapping between regexps and functions. Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk. Richard gcc/ * config/aarch64/iterators.md (VMUL_CHANGE_NLANES): Delete. (VMULD): New iterator. (VCOND): Handle V4HF and V8HF. (VCONQ): Fix entry for V2SF. * config/aarch64/aarch64-simd.md (mul_lane3): Use VMULD instead of VMUL. Use a 64-bit vector mode for the indexed operand. (*aarch64_mul3_elt_): Merge with... (mul_laneq3): ...this define_insn. Use VMUL instead of VDQSF. Use a 128-bit vector mode for the indexed operand. Use stype for the scheduling type. gcc/testsuite/ * gcc.target/aarch64/fmul_lane_1.c: New test. --- gcc/config/aarch64/aarch64-simd.md | 46 +++++---------- gcc/config/aarch64/iterators.md | 13 ++-- .../gcc.target/aarch64/fmul_lane_1.c | 59 +++++++++++++++++++ 3 files changed, 82 insertions(+), 36 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/fmul_lane_1.c diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 234762960bd..99620895e78 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -719,51 +719,35 @@ (define_expand "copysign3" ) (define_insn "mul_lane3" - [(set (match_operand:VMUL 0 "register_operand" "=w") - (mult:VMUL - (vec_duplicate:VMUL + [(set (match_operand:VMULD 0 "register_operand" "=w") + (mult:VMULD + (vec_duplicate:VMULD (vec_select: - (match_operand:VMUL 2 "register_operand" "") + (match_operand: 2 "register_operand" "") (parallel [(match_operand:SI 3 "immediate_operand" "i")]))) - (match_operand:VMUL 1 "register_operand" "w")))] + (match_operand:VMULD 1 "register_operand" "w")))] "TARGET_SIMD" { - operands[3] = aarch64_endian_lane_rtx (mode, INTVAL (operands[3])); + operands[3] = aarch64_endian_lane_rtx (mode, INTVAL (operands[3])); return "mul\\t%0., %1., %2.[%3]"; } [(set_attr "type" "neon_mul__scalar")] ) (define_insn "mul_laneq3" - [(set (match_operand:VDQSF 0 "register_operand" "=w") - (mult:VDQSF - (vec_duplicate:VDQSF - (vec_select: - (match_operand:V4SF 2 "register_operand" "w") - (parallel [(match_operand:SI 3 "immediate_operand" "i")]))) - (match_operand:VDQSF 1 "register_operand" "w")))] - "TARGET_SIMD" - { - operands[3] = aarch64_endian_lane_rtx (V4SFmode, INTVAL (operands[3])); - return "fmul\\t%0., %1., %2.[%3]"; - } - [(set_attr "type" "neon_fp_mul_s_scalar")] -) - -(define_insn "*aarch64_mul3_elt_" - [(set (match_operand:VMUL_CHANGE_NLANES 0 "register_operand" "=w") - (mult:VMUL_CHANGE_NLANES - (vec_duplicate:VMUL_CHANGE_NLANES + [(set (match_operand:VMUL 0 "register_operand" "=w") + (mult:VMUL + (vec_duplicate:VMUL (vec_select: - (match_operand: 1 "register_operand" "") - (parallel [(match_operand:SI 2 "immediate_operand")]))) - (match_operand:VMUL_CHANGE_NLANES 3 "register_operand" "w")))] + (match_operand: 2 "register_operand" "") + (parallel [(match_operand:SI 3 "immediate_operand")]))) + (match_operand:VMUL 1 "register_operand" "w")))] "TARGET_SIMD" { - operands[2] = aarch64_endian_lane_rtx (mode, INTVAL (operands[2])); - return "mul\\t%0., %3., %1.[%2]"; + operands[3] = aarch64_endian_lane_rtx (mode, INTVAL (operands[3])); + return "mul\\t%0., %1., %2.[%3]"; } - [(set_attr "type" "neon_mul__scalar")] + [(set_attr "type" "neon_mul__scalar")] ) (define_insn "mul_n3" diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index c57aa6bf2f4..69d9dbebe8f 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -312,15 +312,17 @@ (define_mode_iterator SX2 [SI SF]) (define_mode_iterator DSX [DF DI SF SI]) -;; Modes available for Advanced SIMD mul lane operations. +;; Modes available for Advanced SIMD mul operations. (define_mode_iterator VMUL [V4HI V8HI V2SI V4SI (V4HF "TARGET_SIMD_F16INST") (V8HF "TARGET_SIMD_F16INST") V2SF V4SF V2DF]) -;; Modes available for Advanced SIMD mul lane operations changing lane -;; count. -(define_mode_iterator VMUL_CHANGE_NLANES [V4HI V8HI V2SI V4SI V2SF V4SF]) +;; The subset of VMUL for which VCOND is a vector mode. +(define_mode_iterator VMULD [V4HI V8HI V2SI V4SI + (V4HF "TARGET_SIMD_F16INST") + (V8HF "TARGET_SIMD_F16INST") + V2SF V4SF]) ;; Iterators for single modes, for "@" patterns. (define_mode_iterator VNx16QI_ONLY [VNx16QI]) @@ -1201,6 +1203,7 @@ (define_mode_attr VCOND [(HI "V4HI") (SI "V2SI") (V4HI "V4HI") (V8HI "V4HI") (V2SI "V2SI") (V4SI "V2SI") (DI "DI") (V2DI "DI") + (V4HF "V4HF") (V8HF "V4HF") (V2SF "V2SF") (V4SF "V2SF") (V2DF "DF")]) @@ -1210,7 +1213,7 @@ (define_mode_attr VCONQ [(V8QI "V16QI") (V16QI "V16QI") (V2SI "V4SI") (V4SI "V4SI") (DI "V2DI") (V2DI "V2DI") (V4HF "V8HF") (V8HF "V8HF") - (V2SF "V2SF") (V4SF "V4SF") + (V2SF "V4SF") (V4SF "V4SF") (V2DF "V2DF") (SI "V4SI") (HI "V8HI") (QI "V16QI")]) diff --git a/gcc/testsuite/gcc.target/aarch64/fmul_lane_1.c b/gcc/testsuite/gcc.target/aarch64/fmul_lane_1.c new file mode 100644 index 00000000000..a2b57581c84 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/fmul_lane_1.c @@ -0,0 +1,59 @@ +/* { dg-options "-O" } */ + +#pragma GCC target "+simd+fp16" + +__Float16x4_t +f1 (__Float16x4_t x, __Float16x4_t y) +{ + return x * y[0]; +} + +__Float16x4_t +f2 (__Float16x4_t x, __Float16x4_t y) +{ + return x * y[3]; +} + +__Float16x4_t +f3 (__Float16x4_t x, __Float16x8_t y) +{ + return x * y[0]; +} + +__Float16x4_t +f4 (__Float16x4_t x, __Float16x8_t y) +{ + return x * y[7]; +} + +__Float16x8_t +f5 (__Float16x8_t x, __Float16x4_t y) +{ + return x * y[0]; +} + +__Float16x8_t +f6 (__Float16x8_t x, __Float16x4_t y) +{ + return x * y[3]; +} + +__Float16x8_t +f7 (__Float16x8_t x, __Float16x8_t y) +{ + return x * y[0]; +} + +__Float16x8_t +f8 (__Float16x8_t x, __Float16x8_t y) +{ + return x * y[7]; +} + +/* { dg-final { scan-assembler-times {\tfmul\tv0.4h, v0.4h, v1.h\[0\]} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmul\tv0.4h, v0.4h, v1.h\[3\]} 1 } } */ +/* { dg-final { scan-assembler-times {\tfmul\tv0.4h, v0.4h, v1.h\[7\]} 1 } } */ + +/* { dg-final { scan-assembler-times {\tfmul\tv0.8h, v0.8h, v1.h\[0\]} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmul\tv0.8h, v0.8h, v1.h\[3\]} 1 } } */ +/* { dg-final { scan-assembler-times {\tfmul\tv0.8h, v0.8h, v1.h\[7\]} 1 } } */