From patchwork Fri Apr 8 07:52:40 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chung-Lin Tang X-Patchwork-Id: 90279 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id C4A9CB6EF7 for ; Fri, 8 Apr 2011 17:53:04 +1000 (EST) Received: (qmail 30828 invoked by alias); 8 Apr 2011 07:53:02 -0000 Received: (qmail 30816 invoked by uid 22791); 8 Apr 2011 07:53:00 -0000 X-SWARE-Spam-Status: No, hits=-1.5 required=5.0 tests=AWL, BAYES_00, SARE_OBFU_REAL_SUB, TW_TD, T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (38.113.113.100) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 08 Apr 2011 07:52:56 +0000 Received: (qmail 2376 invoked from network); 8 Apr 2011 07:52:53 -0000 Received: from unknown (HELO ?60.245.67.151?) (cltang@127.0.0.2) by mail.codesourcery.com with ESMTPA; 8 Apr 2011 07:52:53 -0000 Message-ID: <4D9EBEC8.2020807@codesourcery.com> Date: Fri, 08 Apr 2011 15:52:40 +0800 From: Chung-Lin Tang User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.15) Gecko/20110303 Thunderbird/3.1.9 MIME-Version: 1.0 To: gcc-patches CC: Richard Earnshaw Subject: [patch, ARM] PR48250, rehaul arm_legitimize_reload_address() X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Hi Richard, here's a new patch, with some quite significant changes to arm_legitimize_reload_address(). This time fully utilizing the sign-magnitude offset macro you gave in the last thread, and adapting your description into embedded comments. The reload_outdf pattern caused some regressions in a first round of testing. With the new reload address legitimization, which creates a RTX like (mem (plus (plus reg #high) #low)), the current SECONDARY_RELOAD_OUTPUT_CLASS macro returns 'GENERAL_REGS' under VFP, which triggers the execution of reload_outdf, but now creating unexpected (set (reg1) (plus (plus reg #high) #low)) insns. The historical reasons for the reload_outdf pattern are not entirely clear, as we discussed off list. In any case, we should aim to update the ARM secondary reload stuff to use the newer framework. I have disabled the pattern under ARM mode, and cross-tested the entire patch under v7-A/v5TE VFP, v7-A NEON, and v4T soft-float configs, all clear of regressions. reload_outdf is disabled by changing the pattern condition from TARGET_32BIT to TARGET_THUMB2 (disabling TARGET_ARM only). This is temporary, Thumb-2 probably needs a few more bits of simultaneous updates to work, so I'm leaving that for a later patch. I've listed your name together as significant parts are from you. Is it ready for trunk? Thanks, Chung-Lin 2011-04-08 Chung-Lin Tang Richard Earnshaw PR target/48250 * config/arm/arm.c (arm_legitimize_reload_address): Update cases to use sign-magnitude offsets. Reject unsupported unaligned cases. Add detailed description in comments. * config/arm/arm.md (reload_outdf): Disable for ARM mode; change condition from TARGET_32BIT to TARGET_ARM. Index: config/arm/arm.c =================================================================== --- config/arm/arm.c (revision 172057) +++ config/arm/arm.c (working copy) @@ -6419,23 +6419,126 @@ HOST_WIDE_INT val = INTVAL (XEXP (*p, 1)); HOST_WIDE_INT low, high; - if (mode == DImode || (mode == DFmode && TARGET_SOFT_FLOAT)) - low = ((val & 0xf) ^ 0x8) - 0x8; - else if (TARGET_MAVERICK && TARGET_HARD_FLOAT) - /* Need to be careful, -256 is not a valid offset. */ - low = val >= 0 ? (val & 0xff) : -((-val) & 0xff); - else if (mode == SImode - || (mode == SFmode && TARGET_SOFT_FLOAT) - || ((mode == HImode || mode == QImode) && ! arm_arch4)) - /* Need to be careful, -4096 is not a valid offset. */ - low = val >= 0 ? (val & 0xfff) : -((-val) & 0xfff); - else if ((mode == HImode || mode == QImode) && arm_arch4) - /* Need to be careful, -256 is not a valid offset. */ - low = val >= 0 ? (val & 0xff) : -((-val) & 0xff); - else if (GET_MODE_CLASS (mode) == MODE_FLOAT - && TARGET_HARD_FLOAT && TARGET_FPA) - /* Need to be careful, -1024 is not a valid offset. */ - low = val >= 0 ? (val & 0x3ff) : -((-val) & 0x3ff); + /* Detect coprocessor load/stores. */ + bool coproc_p = ((TARGET_HARD_FLOAT + && (TARGET_VFP || TARGET_FPA || TARGET_MAVERICK) + && (mode == SFmode || mode == DFmode + || (mode == DImode && TARGET_MAVERICK))) + || (TARGET_REALLY_IWMMXT + && VALID_IWMMXT_REG_MODE (mode)) + || (TARGET_NEON + && (VALID_NEON_DREG_MODE (mode) + || VALID_NEON_QREG_MODE (mode)))); + + /* For some conditions, bail out when lower two bits are unaligned. */ + if ((val & 0x3) != 0 + /* Coprocessor load/store indexes are 8-bits + '00' appended. */ + && (coproc_p + /* For DI, and DF under soft-float: */ + || ((mode == DImode || mode == DFmode) + /* Without ldrd, we use stm/ldm, which does not + fair well with unaligned bits. */ + && (! TARGET_LDRD + /* Thumb-2 ldrd/strd is [-1020,+1020] in steps of 4. */ + || TARGET_THUMB2)))) + return false; + + /* When breaking down a [reg+index] reload address into [(reg+high)+low], + of which the (reg+high) gets turned into a reload add insn, + we try to decompose the index into high/low values that can often + also lead to better reload CSE. + For example: + ldr r0, [r2, #4100] // Offset too large + ldr r1, [r2, #4104] // Offset too large + + is best reloaded as: + add t1, r2, #4096 + ldr r0, [t1, #4] + add t2, r2, #4096 + ldr r1, [t2, #8] + + which post-reload CSE can simplify in most cases to eliminate the + second add instruction: + add t1, r2, #4096 + ldr r0, [t1, #4] + ldr r1, [t1, #8] + + The idea here is that we want to split out the bits of the constant + as a mask, rather than as subtracting the maximum offset that the + respective type of load/store used can handle. + + When encountering negative offsets, we can still utilize it even if + the overall offset is positive; sometimes this may lead to an immediate + that can be constructed with fewer instructions. + For example: + ldr r0, [r2, #0x3FFFFC] + + This is best reloaded as: + add t1, r2, #0x400000 + ldr r0, [t1, #-4] + + The trick for spotting this for a load insn with N bits of offset + (i.e. bits N-1:0) is to look at bit N; if it is set, then chose a + negative offset that is going to make bit N and all the bits below + it become zero in the remainder part. + + The SIGN_MAG_LOW_ADDR_BITS macro below implements this, with respect + to sign-magnitude addressing (i.e. separate +- bit, or 1's complement), + used in most cases of ARM load/store instructions. */ + +#define SIGN_MAG_LOW_ADDR_BITS(VAL, N) \ + (((VAL) & ((1 << (N)) - 1)) \ + ? (((VAL) & ((1 << ((N) + 1)) - 1)) ^ (1 << (N))) - (1 << (N)) \ + : 0) + + if (coproc_p) + low = SIGN_MAG_LOW_ADDR_BITS (val, 10); + else if (GET_MODE_SIZE (mode) == 8) + { + if (TARGET_LDRD) + low = (TARGET_THUMB2 + ? SIGN_MAG_LOW_ADDR_BITS (val, 10) + : SIGN_MAG_LOW_ADDR_BITS (val, 8)); + else + /* For pre-ARMv5TE (without ldrd), we use ldm/stm(db/da/ib) + to access doublewords. The supported load/store offsets are + -8, -4, and 4, which we try to produce here. */ + low = ((val & 0xf) ^ 0x8) - 0x8; + } + else if (GET_MODE_SIZE (mode) < 8) + { + /* NEON element load/stores do not have an offset. */ + if (TARGET_NEON_FP16 && mode == HFmode) + return false; + + if (TARGET_THUMB2) + { + /* Thumb-2 has an asymmetrical index range of (-256,4096). + Try the wider 12-bit range first, and re-try if the result + is out of range. */ + low = SIGN_MAG_LOW_ADDR_BITS (val, 12); + if (low < -255) + low = SIGN_MAG_LOW_ADDR_BITS (val, 8); + } + else + { + if (mode == HImode || mode == HFmode) + { + if (arm_arch4) + low = SIGN_MAG_LOW_ADDR_BITS (val, 8); + else + { + /* The storehi/movhi_bytes fallbacks can use only + [-4094,+4094] of the full ldrb/strb index range. */ + low = SIGN_MAG_LOW_ADDR_BITS (val, 12); + if (low == 4095 || low == -4095) + return false; + } + } + else + low = SIGN_MAG_LOW_ADDR_BITS (val, 12); + } + } else return false; Index: config/arm/arm.md =================================================================== --- config/arm/arm.md (revision 172057) +++ config/arm/arm.md (working copy) @@ -6188,7 +6188,7 @@ [(match_operand:DF 0 "arm_reload_memory_operand" "=o") (match_operand:DF 1 "s_register_operand" "r") (match_operand:SI 2 "s_register_operand" "=&r")] - "TARGET_32BIT" + "TARGET_THUMB2" " { enum rtx_code code = GET_CODE (XEXP (operands[0], 0));