From patchwork Wed Mar 19 09:55:41 2014
X-Patchwork-Submitter: Kyrylo Tkachov
X-Patchwork-Id: 331667
Message-ID: <5329699D.6020605@arm.com>
Date: Wed, 19 Mar 2014 09:55:41 +0000
From: Kyrill Tkachov
To: GCC Patches
CC: Marcus Shawcroft, Richard Earnshaw
Subject: [PATCH][AArch64][2/3] Recognise rev16 operations on SImode and DImode data

Hi all,

This patch adds a recogniser for the bitmask/shift/orr sequence of instructions
that can be used to reverse the bytes in the 16-bit halfwords of a register
(for the sequence itself, see the testcase included in the patch).  This
sequence can be implemented with a single rev16 instruction.  Since the shifts
can occur in either order and there are no canonicalisation rules for where
they appear in the expression, we need two patterns to match both cases.

The rtx costs function is updated to recognise the pattern and cost it
appropriately, using the rev field of the cost tables introduced in patch
[1/3].  The rtx costs helper functions used to recognise those bitwise
operations are placed in config/arm/aarch-common.c so that they can be reused
by both the arm and aarch64 backends.

I've added an execute testcase but no scan-assembler tests, since in the
future the combiner might legitimately decide against using a rev16
instruction on rtx-cost grounds.  We can at least test that the generated
code is functionally correct.

Tested aarch64-none-elf.

Ok for stage1?

[gcc/]
2014-03-19  Kyrylo Tkachov

    * config/aarch64/aarch64.md (rev16<mode>2): New pattern.
    (rev16<mode>2_alt): Likewise.
    * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle rev16 case.
    * config/arm/aarch-common.c (aarch_rev16_shright_mask_imm_p): New.
    (aarch_rev16_shleft_mask_imm_p): Likewise.
    (aarch_rev16_p_1): Likewise.
    (aarch_rev16_p): Likewise.
    * config/arm/aarch-common-protos.h (aarch_rev16_p): Declare extern.
    (aarch_rev16_shright_mask_imm_p): Likewise.
    (aarch_rev16_shleft_mask_imm_p): Likewise.

[gcc/testsuite/]
2014-03-19  Kyrylo Tkachov

    * gcc.target/aarch64/rev16_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ebd58c0..41761ae 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4682,6 +4682,16 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
       return false;
 
     case IOR:
+      if (aarch_rev16_p (x))
+        {
+          *cost = COSTS_N_INSNS (1);
+
+          if (speed)
+            *cost += extra_cost->alu.rev;
+
+          return true;
+        }
+      /* Fall through.  */
     case XOR:
     case AND:
     cost_logic:
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 99a6ac8..a23452b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3173,6 +3173,38 @@
   [(set_attr "type" "rev")]
 )
 
+;; There are no canonicalisation rules for the position of the lshiftrt, ashift
+;; operations within an IOR/AND RTX, therefore we have two patterns matching
+;; each valid permutation.
+
+(define_insn "rev16<mode>2"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand" "r")
+                                      (const_int 8))
+                          (match_operand:GPI 3 "const_int_operand" "n"))
+                 (and:GPI (lshiftrt:GPI (match_dup 1)
+                                        (const_int 8))
+                          (match_operand:GPI 2 "const_int_operand" "n"))))]
+  "aarch_rev16_shleft_mask_imm_p (operands[3], <MODE>mode)
+   && aarch_rev16_shright_mask_imm_p (operands[2], <MODE>mode)"
+  "rev16\\t%<w>0, %<w>1"
+  [(set_attr "type" "rev")]
+)
+
+(define_insn "rev16<mode>2_alt"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (lshiftrt:GPI (match_operand:GPI 1 "register_operand" "r")
+                                        (const_int 8))
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (ashift:GPI (match_dup 1)
+                                      (const_int 8))
+                          (match_operand:GPI 3 "const_int_operand" "n"))))]
+  "aarch_rev16_shleft_mask_imm_p (operands[3], <MODE>mode)
+   && aarch_rev16_shright_mask_imm_p (operands[2], <MODE>mode)"
+  "rev16\\t%<w>0, %<w>1"
+  [(set_attr "type" "rev")]
+)
+
 ;; zero_extend version of above
 (define_insn "*bswapsi2_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index d97ee61..08c4c7a 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -23,6 +23,9 @@
 #ifndef GCC_AARCH_COMMON_PROTOS_H
 #define GCC_AARCH_COMMON_PROTOS_H
 
+extern bool aarch_rev16_p (rtx);
+extern bool aarch_rev16_shleft_mask_imm_p (rtx, enum machine_mode);
+extern bool aarch_rev16_shright_mask_imm_p (rtx, enum machine_mode);
 extern int arm_early_load_addr_dep (rtx, rtx);
 extern int arm_early_store_addr_dep (rtx, rtx);
 extern int arm_mac_accumulator_is_mul_result (rtx, rtx);
diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index c11f7e9..75ed3fd 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -155,6 +155,79 @@ arm_get_set_operands (rtx producer, rtx consumer,
   return 0;
 }
 
+bool
+aarch_rev16_shright_mask_imm_p (rtx val, enum machine_mode mode)
+{
+  return CONST_INT_P (val)
+         && INTVAL (val) == trunc_int_for_mode (0xff00ff00ff00ff, mode);
+}
+
+bool
+aarch_rev16_shleft_mask_imm_p (rtx val, enum machine_mode mode)
+{
+  return CONST_INT_P (val)
+         && INTVAL (val) == trunc_int_for_mode (0xff00ff00ff00ff00, mode);
+}
+
+static bool
+aarch_rev16_p_1 (rtx lhs, rtx rhs, enum machine_mode mode)
+{
+  if (GET_CODE (lhs) == AND
+      && GET_CODE (XEXP (lhs, 0)) == ASHIFT
+      && CONST_INT_P (XEXP (XEXP (lhs, 0), 1))
+      && INTVAL (XEXP (XEXP (lhs, 0), 1)) == 8
+      && REG_P (XEXP (XEXP (lhs, 0), 0))
+      && CONST_INT_P (XEXP (lhs, 1))
+      && GET_CODE (rhs) == AND
+      && GET_CODE (XEXP (rhs, 0)) == LSHIFTRT
+      && REG_P (XEXP (XEXP (rhs, 0), 0))
+      && CONST_INT_P (XEXP (XEXP (rhs, 0), 1))
+      && INTVAL (XEXP (XEXP (rhs, 0), 1)) == 8
+      && CONST_INT_P (XEXP (rhs, 1))
+      && REGNO (XEXP (XEXP (rhs, 0), 0)) == REGNO (XEXP (XEXP (lhs, 0), 0)))
+    {
+      rtx lhs_mask = XEXP (lhs, 1);
+      rtx rhs_mask = XEXP (rhs, 1);
+
+      return aarch_rev16_shright_mask_imm_p (rhs_mask, mode)
+             && aarch_rev16_shleft_mask_imm_p (lhs_mask, mode);
+    }
+
+  return false;
+}
+
+/* Recognise a sequence of bitwise operations corresponding to a rev16
+   operation.  These will be of the form:
+     ((x >> 8) & 0x00ff00ff)
+     | ((x << 8) & 0xff00ff00)
+   for SImode and with similar but wider bitmasks for DImode.
+   The two sub-expressions of the IOR can appear on either side so check both
+   permutations with the help of aarch_rev16_p_1 above.  */
+
+bool
+aarch_rev16_p (rtx x)
+{
+  rtx left_sub_rtx, right_sub_rtx;
+  bool is_rev = false;
+
+  if (GET_CODE (x) != IOR)
+    return false;
+
+  left_sub_rtx = XEXP (x, 0);
+  right_sub_rtx = XEXP (x, 1);
+
+  /* There are no canonicalisation rules for the position of the two shifts
+     involved in a rev, so try both permutations.  */
+  is_rev = aarch_rev16_p_1 (left_sub_rtx, right_sub_rtx, GET_MODE (x));
+
+  if (!is_rev)
+    is_rev = aarch_rev16_p_1 (right_sub_rtx, left_sub_rtx, GET_MODE (x));
+
+  return is_rev;
+}
+
 /* Return nonzero if the CONSUMER instruction (a load) does need
    PRODUCER's value to calculate the address.  */
 int
diff --git a/gcc/testsuite/gcc.target/aarch64/rev16_1.c b/gcc/testsuite/gcc.target/aarch64/rev16_1.c
new file mode 100644
index 0000000..126d3c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/rev16_1.c
@@ -0,0 +1,59 @@
+/* { dg-options "-O2" } */
+/* { dg-do run } */
+
+extern void abort (void);
+
+typedef unsigned int __u32;
+
+__u32
+__rev16_32_alt (__u32 x)
+{
+  return (((__u32)(x) & (__u32)0xff00ff00UL) >> 8)
+         | (((__u32)(x) & (__u32)0x00ff00ffUL) << 8);
+}
+
+__u32
+__rev16_32 (__u32 x)
+{
+  return (((__u32)(x) & (__u32)0x00ff00ffUL) << 8)
+         | (((__u32)(x) & (__u32)0xff00ff00UL) >> 8);
+}
+
+typedef unsigned long long __u64;
+
+__u64
+__rev16_64_alt (__u64 x)
+{
+  return (((__u64)(x) & (__u64)0xff00ff00ff00ff00UL) >> 8)
+         | (((__u64)(x) & (__u64)0x00ff00ff00ff00ffUL) << 8);
+}
+
+__u64
+__rev16_64 (__u64 x)
+{
+  return (((__u64)(x) & (__u64)0x00ff00ff00ff00ffUL) << 8)
+         | (((__u64)(x) & (__u64)0xff00ff00ff00ff00UL) >> 8);
+}
+
+int
+main (void)
+{
+  volatile __u32 in32 = 0x12345678;
+  volatile __u32 expected32 = 0x34127856;
+  volatile __u64 in64 = 0x1234567890abcdefUL;
+  volatile __u64 expected64 = 0x34127856ab90efcdUL;
+
+  if (__rev16_32 (in32) != expected32)
+    abort ();
+
+  if (__rev16_32_alt (in32) != expected32)
+    abort ();
+
+  if (__rev16_64 (in64) != expected64)
+    abort ();
+
+  if (__rev16_64_alt (in64) != expected64)
+    abort ();
+
+  return 0;
+}