From patchwork Fri Sep 14 18:02:14 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ulrich Weigand X-Patchwork-Id: 183992 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id A2E392C00A8 for ; Sat, 15 Sep 2012 04:02:58 +1000 (EST) Comment: DKIM? See http://www.dkim.org DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=gcc.gnu.org; s=default; x=1348250578; h=Comment: DomainKey-Signature:Received:Received:Received:Received:Received: Received:Received:Received:Message-Id:Received:Subject:To:Date: From:MIME-Version:Content-Type:Content-Transfer-Encoding: Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:Sender:Delivered-To; bh=8L2hGIu5YfOGVWjD+kUW c0LYlJI=; b=y3WppUSrvlIiT2gOh4bvAtZqoRu+OfXOSvKY7MJi2g4/ciQywnpS t6oy7XpcigRDPAyNo0FdeOltbd2eHO3DseAwYQ2Avk7jK18BJKiJEfFDNVhUPxLt B6k6d7Sd+Uvlj1dLMHvb8/vdNcm1S4aSCNWi7LisOllG/6OjqZEUNb4= Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=gcc.gnu.org; h=Received:Received:X-SWARE-Spam-Status:X-Spam-Check-By:Received:Received:Received:Received:Received:Received:Message-Id:Received:Subject:To:Date:From:MIME-Version:Content-Type:Content-Transfer-Encoding:x-cbid:Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:Delivered-To; b=EDZ1MAk/s+xxG2olxL0k0ZAYVV5pZ9p0kT5DBpGpn/izMs+8dnRlG5D/R69h67 psxC4tFFsIi8wq8cpsalhzfw/FKLz/W4MwbIkKSkfMo/j81L731+PkxYzBJikmFV nBTtL81sZtQmeIZWEFVcfDyMBIKO4T1b1DRLHH6Zibm+s=; Received: (qmail 30461 invoked by alias); 14 Sep 2012 18:02:52 -0000 Received: (qmail 30450 invoked by uid 22791); 14 Sep 2012 18:02:49 -0000 X-SWARE-Spam-Status: No, hits=-3.6 required=5.0 tests=AWL, BAYES_00, KHOP_RCVD_UNTRUST, MSGID_FROM_MTA_HEADER, RCVD_IN_HOSTKARMA_W, RCVD_IN_HOSTKARMA_WL, RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from e06smtp13.uk.ibm.com (HELO e06smtp13.uk.ibm.com) (195.75.94.109) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 14 Sep 2012 18:02:33 +0000 Received: from /spool/local by e06smtp13.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 14 Sep 2012 19:02:31 +0100 Received: from b06cxnps4075.portsmouth.uk.ibm.com (9.149.109.197) by e06smtp13.uk.ibm.com (192.168.101.143) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Fri, 14 Sep 2012 19:02:17 +0100 Received: from d06av02.portsmouth.uk.ibm.com (d06av02.portsmouth.uk.ibm.com [9.149.37.228]) by b06cxnps4075.portsmouth.uk.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q8EI2AT547579340 for ; Fri, 14 Sep 2012 18:02:10 GMT Received: from d06av02.portsmouth.uk.ibm.com (loopback [127.0.0.1]) by d06av02.portsmouth.uk.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q8EI2GFQ010440 for ; Fri, 14 Sep 2012 12:02:16 -0600 Received: from tuxmaker.boeblingen.de.ibm.com (tuxmaker.boeblingen.de.ibm.com [9.152.85.9]) by d06av02.portsmouth.uk.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with SMTP id q8EI2E4d010415; Fri, 14 Sep 2012 12:02:14 -0600 Message-Id: <201209141802.q8EI2E4d010415@d06av02.portsmouth.uk.ibm.com> Received: by tuxmaker.boeblingen.de.ibm.com (sSMTP sendmail emulation); Fri, 14 Sep 2012 20:02:14 +0200 Subject: [PATCH, ARM] Prefer vld1.64/vst1.64 over vldm/vstm To: gcc-patches@gcc.gnu.org, ramrad01@arm.com Date: Fri, 14 Sep 2012 20:02:14 +0200 (CEST) From: "Ulrich Weigand" MIME-Version: 1.0 x-cbid: 12091418-2966-0000-0000-000005497B45 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Hello, this patch changes the ARM back-end to use vld1.64/vst1.64 instructions instead of vldm/vstm -where possible- to implement double-word moves. The main benefit of this is that it allows the compiler to provide appropriate alignment hints, which may improve performance. The patch is based on an earlier version by Ramana. This version has now successfully passed regression testing and benchmarking (no performance regressions found, improvements of up to 2.5% on certain benchmarks). Tested on arm-linux-gnueabi. OK for mainline? Bye, Ulrich 2012-09-14 Ramana Radhakrishnan Ulrich Weigand * config/arm/arm.c (output_move_neon): Update comment. Use vld1.64/vst1.64 instead of vldm/vstm where possible. (neon_vector_mem_operand): Support double-word modes. * config/arm/neon.md (*neon_mov VD): Call output_move_neon instead of output_move_vfp. Change constraint from Uv to Un. Index: gcc-head/gcc/config/arm/arm.c =================================================================== --- gcc-head.orig/gcc/config/arm/arm.c 2012-09-14 19:38:20.000000000 +0200 +++ gcc-head/gcc/config/arm/arm.c 2012-09-14 19:40:51.000000000 +0200 @@ -9629,7 +9629,11 @@ neon_vector_mem_operand (rtx op, int typ && REG_MODE_OK_FOR_BASE_P (XEXP (ind, 0), VOIDmode) && CONST_INT_P (XEXP (ind, 1)) && INTVAL (XEXP (ind, 1)) > -1024 - && INTVAL (XEXP (ind, 1)) < 1016 + /* For quad modes, we restrict the constant offset to be slightly less + than what the instruction format permits. We have no such constraint + on double mode offsets. (This must match arm_legitimate_index_p.) */ + && (INTVAL (XEXP (ind, 1)) + < (VALID_NEON_QREG_MODE (GET_MODE (op))? 1016 : 1024)) && (INTVAL (XEXP (ind, 1)) & 3) == 0) return TRUE; @@ -14573,15 +14577,16 @@ output_move_vfp (rtx *operands) return ""; } -/* Output a Neon quad-word load or store, or a load or store for - larger structure modes. +/* Output a Neon double-word or quad-word load or store, or a load + or store for larger structure modes. WARNING: The ordering of elements is weird in big-endian mode, - because we use VSTM, as required by the EABI. GCC RTL defines - element ordering based on in-memory order. This can be differ - from the architectural ordering of elements within a NEON register. - The intrinsics defined in arm_neon.h use the NEON register element - ordering, not the GCC RTL element ordering. + because the EABI requires that vectors stored in memory appear + as though they were stored by a VSTM, as required by the EABI. + GCC RTL defines element ordering based on in-memory order. + This can be different from the architectural ordering of elements + within a NEON register. The intrinsics defined in arm_neon.h use the + NEON register element ordering, not the GCC RTL element ordering. For example, the in-memory ordering of a big-endian a quadword vector with 16-bit elements when stored from register pair {d0,d1} @@ -14595,7 +14600,22 @@ output_move_vfp (rtx *operands) dN -> (rN+1, rN), dN+1 -> (rN+3, rN+2) So that STM/LDM can be used on vectors in ARM registers, and the - same memory layout will result as if VSTM/VLDM were used. */ + same memory layout will result as if VSTM/VLDM were used. + + Instead of VSTM/VLDM we prefer to use VST1.64/VLD1.64 where + possible, which allows use of appropriate alignment tags. + Note that the choice of "64" is independent of the actual vector + element size; this size simply ensures that the behavior is + equivalent to VSTM/VLDM in both little-endian and big-endian mode. + + Due to limitations of those instructions, use of VST1.64/VLD1.64 + is not possible if: + - the address contains PRE_DEC, or + - the mode refers to more than 4 double-word registers + + In those cases, it would be possible to replace VSTM/VLDM by a + sequence of instructions; this is not currently implemented since + this is not certain to actually improve performance. */ const char * output_move_neon (rtx *operands) @@ -14629,13 +14649,23 @@ output_move_neon (rtx *operands) switch (GET_CODE (addr)) { case POST_INC: - templ = "v%smia%%?\t%%0!, %%h1"; - ops[0] = XEXP (addr, 0); + /* We have to use vldm / vstm for too-large modes. */ + if (ARM_NUM_REGS (mode) / 2 > 4) + { + templ = "v%smia%%?\t%%0!, %%h1"; + ops[0] = XEXP (addr, 0); + } + else + { + templ = "v%s1.64\t%%h1, %%A0"; + ops[0] = mem; + } ops[1] = reg; break; case PRE_DEC: - /* FIXME: We should be using vld1/vst1 here in BE mode? */ + /* We have to use vldm / vstm in this case, since there is no + pre-decrement form of the vld1 / vst1 instructions. */ templ = "v%smdb%%?\t%%0!, %%h1"; ops[0] = XEXP (addr, 0); ops[1] = reg; @@ -14679,7 +14709,12 @@ output_move_neon (rtx *operands) } default: - templ = "v%smia%%?\t%%m0, %%h1"; + /* We have to use vldm / vstm for too-large modes. */ + if (ARM_NUM_REGS (mode) / 2 > 4) + templ = "v%smia%%?\t%%m0, %%h1"; + else + templ = "v%s1.64\t%%h1, %%A0"; + ops[0] = mem; ops[1] = reg; } Index: gcc-head/gcc/config/arm/neon.md =================================================================== --- gcc-head.orig/gcc/config/arm/neon.md 2012-09-14 19:38:20.000000000 +0200 +++ gcc-head/gcc/config/arm/neon.md 2012-09-14 19:40:51.000000000 +0200 @@ -156,9 +156,9 @@ (define_insn "*neon_mov" [(set (match_operand:VDX 0 "nonimmediate_operand" - "=w,Uv,w, w, ?r,?w,?r,?r, ?Us") + "=w,Un,w, w, ?r,?w,?r,?r, ?Us") (match_operand:VDX 1 "general_operand" - " w,w, Dn,Uvi, w, r, r, Usi,r"))] + " w,w, Dn,Uni, w, r, r, Usi,r"))] "TARGET_NEON && (register_operand (operands[0], mode) || register_operand (operands[1], mode))" @@ -181,15 +181,10 @@ return templ; } - /* FIXME: If the memory layout is changed in big-endian mode, output_move_vfp - below must be changed to output_move_neon (which will use the - element/structure loads/stores), and the constraint changed to 'Um' instead - of 'Uv'. */ - switch (which_alternative) { case 0: return "vmov\t%P0, %P1 @ "; - case 1: case 3: return output_move_vfp (operands); + case 1: case 3: return output_move_neon (operands); case 2: gcc_unreachable (); case 4: return "vmov\t%Q0, %R0, %P1 @ "; case 5: return "vmov\t%P0, %Q1, %R1 @ ";