From patchwork Tue Jan 17 14:50:19 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 716218
From: Tamar Christina
To: GCC Patches, James Greenhalgh, Marcus Shawcroft, Richard Earnshaw
CC: nd
Subject: [PATCH][GCC][Aarch64] Add vectorize pattern for copysign.
Date: Tue, 17 Jan 2017 14:50:19 +0000

Hi All,

This patch vectorizes the copysign builtin for AArch64,
similar to how it is done for Arm.

AArch64 now generates:

        ...
.L4:
        ldr     q1, [x6, x3]
        add     w4, w4, 1
        ldr     q0, [x5, x3]
        cmp     w4, w7
        bif     v1.16b, v2.16b, v3.16b
        fmul    v0.2d, v0.2d, v1.2d
        str     q0, [x5, x3]

for the input:

     x * copysign(1.0, y)

On 481.wrf in Spec2006 on AArch64 this gives us a speedup of 9.1%.
Regtested on aarch64-none-linux-gnu with no regressions.

Ok for trunk?

gcc/
2017-01-17  Tamar Christina

        * config/aarch64/aarch64-builtins.c
        (aarch64_builtin_vectorized_function): Added CASE_CFN_COPYSIGN.
        * config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
        Changed int to HOST_WIDE_INT.
        * config/aarch64/aarch64-protos.h
        (aarch64_simd_gen_const_vector_dup): Likewise.
        * config/aarch64/aarch64-simd-builtins.def: Added copysign BINOP.
        * config/aarch64/aarch64-simd.md: Added copysign3.

gcc/testsuite/
2017-01-17  Tamar Christina

        * gcc.target/arm/vect-copysignf.c: Move to...
        * gcc.dg/vect/vect-copysignf.c: ... Here.

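As a rough illustration only (not part of the patch), the bit-select that the new expander emits can be modelled in scalar C. The sketch assumes 64-bit IEEE doubles, and the helper names copysign_bitselect and scale_by_sign are hypothetical. The mask corresponds to HOST_WIDE_INT_M1 << bits with bits = 63, so only the sign bit is taken from the second operand and all remaining bits come from the first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert (sizeof (double) == sizeof (uint64_t),
                "sketch assumes 64-bit doubles");

/* Scalar model of the vector bit-select: take the sign bit from SIGN
   and every other bit from MAG.  Hypothetical helper, not GCC code.  */
static double
copysign_bitselect (double mag, double sign)
{
  uint64_t m, s, r;
  /* All-ones shifted left by 63: only the sign bit is set.  */
  const uint64_t mask = ~UINT64_C (0) << 63;
  double out;

  memcpy (&m, &mag, sizeof m);
  memcpy (&s, &sign, sizeof s);
  r = (s & mask) | (m & ~mask);
  memcpy (&out, &r, sizeof out);
  return out;
}

/* The kind of loop the cover letter refers to:
   x[i] * copysign (1.0, y[i]).  */
static void
scale_by_sign (double *x, const double *y, int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    x[i] *= copysign_bitselect (1.0, y[i]);
}
```

Calling scale_by_sign on two double arrays flips the sign of each x[i] whose matching y[i] has its sign bit set, including y[i] == -0.0.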
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 69fb756f0fbdc016f35ce1d08f2aaf092a034704..faba7a1a38b6e494e9589637d51c639e3126969d 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1447,6 +1447,16 @@ aarch64_builtin_vectorized_function (unsigned int fn, tree type_out,
         return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOPU_bswapv2di];
       else
         return NULL_TREE;
+    CASE_CFN_COPYSIGN:
+      if (AARCH64_CHECK_BUILTIN_MODE (2, S))
+        return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_BINOP_copysignv2sf];
+      else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
+        return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_BINOP_copysignv4sf];
+      else if (AARCH64_CHECK_BUILTIN_MODE (2, D))
+        return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_BINOP_copysignv2df];
+      else
+        return NULL_TREE;
+
     default:
       return NULL_TREE;
     }
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 29a3bd71151aa4fb7c6728f0fb52e2f3f233f41d..e75ba29f93e9e749791803ca3fa8d716ca261064 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -362,7 +362,7 @@ rtx aarch64_final_eh_return_addr (void);
 rtx aarch64_mask_from_zextract_ops (rtx, rtx);
 const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr (int, rtx);
-rtx aarch64_simd_gen_const_vector_dup (machine_mode, int);
+rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
 bool aarch64_simd_mem_operand_p (rtx);
 rtx aarch64_simd_vect_par_cnst_half (machine_mode, bool);
 rtx aarch64_tls_get_addr (void);
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index d713d5d8b88837ec6f2dc51188fb252f8d5bc8bd..a67b7589e8badfbd0f13168557ef87e052eedcb1 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -151,6 +151,9 @@
   BUILTIN_VQN (TERNOP, raddhn2, 0)
   BUILTIN_VQN (TERNOP, rsubhn2, 0)
 
+  /* Implemented by copysign3.  */
+  BUILTIN_VHSDF (BINOP, copysign, 3)
+
   BUILTIN_VSQN_HSDI (UNOP, sqmovun, 0)
   /* Implemented by aarch64_qmovn.  */
   BUILTIN_VSQN_HSDI (UNOP, sqmovn, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index a12e2268ef9b023112f8d05db0a86957fee83273..627ada98b3e4d4b02685d5b5ff71ae74d8e3356a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -338,6 +338,24 @@
   }
 )
 
+(define_expand "copysign3"
+  [(match_operand:VHSDF 0 "register_operand")
+   (match_operand:VHSDF 1 "register_operand")
+   (match_operand:VHSDF 2 "register_operand")]
+  "TARGET_FLOAT && TARGET_SIMD"
+{
+  rtx v_bitmask = gen_reg_rtx (mode);
+  int bits = GET_MODE_UNIT_BITSIZE (mode) - 1;
+
+  emit_move_insn (v_bitmask,
+                  aarch64_simd_gen_const_vector_dup (mode,
+                                                     HOST_WIDE_INT_M1 << bits));
+  emit_insn (gen_aarch64_simd_bsl (operands[0], v_bitmask,
+                                   operands[2], operands[1]));
+  DONE;
+}
+)
+
 (define_insn "*aarch64_mul3_elt"
  [(set (match_operand:VMUL 0 "register_operand" "=w")
     (mult:VMUL
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0cf7d12186af3e05ba8742af5a03425f61f51754..1a69605db5d2a4a0efb8c9f97a019de9dded40eb 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11244,14 +11244,16 @@ aarch64_mov_operand_p (rtx x, machine_mode mode)
 
 /* Return a const_int vector of VAL.  */
 rtx
-aarch64_simd_gen_const_vector_dup (machine_mode mode, int val)
+aarch64_simd_gen_const_vector_dup (machine_mode mode, HOST_WIDE_INT val)
 {
   int nunits = GET_MODE_NUNITS (mode);
   rtvec v = rtvec_alloc (nunits);
   int i;
 
+  rtx cache = GEN_INT (val);
+
   for (i=0; i < nunits; i++)
-    RTVEC_ELT (v, i) = GEN_INT (val);
+    RTVEC_ELT (v, i) = cache;
 
   return gen_rtx_CONST_VECTOR (mode, v);
 }
diff --git a/gcc/testsuite/gcc.target/arm/vect-copysignf.c b/gcc/testsuite/gcc.dg/vect/vect-copysignf.c
similarity index 91%
rename from gcc/testsuite/gcc.target/arm/vect-copysignf.c
rename to gcc/testsuite/gcc.dg/vect/vect-copysignf.c
index 425f1b78af7b07be6929f9e5bc1118ca901bc9ce..dc961d0223399c6e7ee8209d22ca77f6d22dbd70 100644
--- a/gcc/testsuite/gcc.target/arm/vect-copysignf.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-copysignf.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-require-effective-target arm_neon_hw { target { arm*-*-* } } } */
 /* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
 /* { dg-add-options "arm_neon" } */