From patchwork Fri May 6 12:34:14 2011
X-Patchwork-Submitter: Julian Brown
X-Patchwork-Id: 94373
Date: Fri, 6 May 2011 13:34:14 +0100
From: Julian Brown
To: gcc-patches@gcc.gnu.org
Cc: paul@codesourcery.com, rearnsha@arm.com, Ramana Radhakrishnan
Subject: [PATCH, ARM] Unaligned accesses for packed structures [1/2]
Message-ID: <20110506133414.14dc027c@rex.config>

Hi,

This is the first of two patches to add unaligned-access support to the
ARM backend.  It takes a somewhat different approach from Jie Zhang's
earlier patch:

  http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01890.html

With Jie's patch, *any* pointer dereference would be allowed to access
unaligned data.  That has the undesirable side-effect of disallowing
instructions which don't support unaligned accesses (LDRD, LDM etc.)
whenever unaligned accesses are enabled.

Instead, this patch enables only packed-structure accesses to use
ldr/str/ldrh/strh, taking a hint from the MIPS ldl/ldr implementation.
The unaligned-access ARM case is quite similar, except that normal
loads/stores are used and the shifting/merging happens in hardware.

The standard names extv/extzv/insv can take a memory operand for the
source/destination of the extract/insert operation, so when
unaligned-access support is enabled we simply expand to unspec'ed
versions of the load and store operations.  The benefit of doing that,
rather than, say, expanding via the regular movsi pattern, is that we
bypass any smartness in the compiler which might replace operations
that work for unaligned accesses (ldr/str/ldrh/strh) with operations
that don't (ldrd/strd/ldm/stm/vldr/...).  The downside is that we may
miss some optimization opportunities, since these accesses no longer
look like plain memory accesses.  Doing things this way also lets us
leave the STRICT_ALIGNMENT/SLOW_BYTE_ACCESS settings alone, avoiding
the disruption that changing them might cause.
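To make the intent concrete, here is a minimal, hypothetical example
(not part of the patch or its testsuite) of the kind of packed-structure
access this is aimed at.  The layout forces the word and halfword fields
to odd offsets, so the field accesses should go through the
extv/extzv/insv paths and, with unaligned access enabled, expand to the
new unaligned_load*/unaligned_store* patterns (plain ldr/ldrh/str/strh
annotated "@ unaligned"):

/* packed-example.c -- hypothetical illustration only.  */
#include <stdio.h>

struct __attribute__ ((packed)) pkt
{
  unsigned char tag;      /* offset 0 */
  unsigned int val;       /* offset 1: misaligned word */
  unsigned short half;    /* offset 5: misaligned halfword */
};

unsigned int
get_val (struct pkt *p)
{
  /* Misaligned word read: with the patch this should use a plain
     (unaligned) ldr rather than a byte-by-byte sequence.  */
  return p->val;
}

void
set_half (struct pkt *p, unsigned short x)
{
  /* Misaligned halfword write: should become a single strh.  */
  p->half = x;
}

int
main (void)
{
  struct pkt p = { 0x01, 0xdeadbeef, 0x1234 };
  set_half (&p, 0x4321);
  printf ("%#x %#x\n", get_val (&p), (unsigned int) p.half);
  return 0;
}

Something like "arm-none-linux-gnueabi-gcc -O2 -march=armv7-a
-munaligned-access -S packed-example.c" (triple and flags purely
illustrative) should show the difference against -mno-unaligned-access,
which ought to fall back to the byte-by-byte expansion.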
The most awkward change in the patch is to generic code (expmed.c,
{store,extract}_bit_field_1): in big-endian mode, the existing
behaviour when inserting into or extracting a bit field from a memory
location is definitely bogus: "unit" is set to BITS_PER_UNIT for memory
locations, so if bitsize (the size of the field to insert/extract) is
greater than BITS_PER_UNIT -- which isn't unusual at all -- xbitpos
becomes negative.  That can't possibly be intentional; I can only
assume that this code path is not exercised on machines which have
memory alternatives for bit-field insert/extract and BITS_BIG_ENDIAN of
0 in BYTES_BIG_ENDIAN mode.  (A small worked example of the arithmetic
is appended after the patch.)

The logic for choosing when to enable unaligned-access support (and the
name of the option to override the default behaviour) is lifted from
Jie's patch.

Tested with a cross to ARM Linux, and (on a branch) with crosses to ARM
EABI in both little- and big-endian mode, with no regressions.  OK to
apply?

Thanks,

Julian

ChangeLog

    gcc/
    * config/arm/arm.c (arm_option_override): Add unaligned_access
    support.
    * config/arm/arm.md (UNSPEC_UNALIGNED_LOAD)
    (UNSPEC_UNALIGNED_STORE): Add constants for unspecs.
    (insv, extzv): Add unaligned-access support.
    (extv): Change to expander.  Likewise.
    (unaligned_loadsi, unaligned_loadhis, unaligned_loadhiu)
    (unaligned_storesi, unaligned_storehi): New.
    (*extv_reg): New (previous extv implementation).
    * config/arm/arm.opt (munaligned-access): Add option.
    * expmed.c (store_bit_field_1): Don't tweak bitfield numbering for
    memory locations if BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN.
    (extract_bit_field_1): Likewise.

commit e76508ff702406fd63bc59465d9c7ab70dcb3266
Author: Julian Brown
Date:   Wed May 4 10:06:25 2011 -0700

    Permit regular ldr/str/ldrh/strh for packed-structure accesses etc.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4f9c2aa..a18aea6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1833,6 +1833,22 @@ arm_option_override (void)
       fix_cm3_ldrd = 0;
     }
 
+  /* Enable -munaligned-access by default for
+     - all ARMv6 architecture-based processors
+     - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
+
+     Disable -munaligned-access by default for
+     - all pre-ARMv6 architecture-based processors
+     - ARMv6-M architecture-based processors.  */
+
+  if (unaligned_access == 2)
+    {
+      if (arm_arch6 && (arm_arch_notm || arm_arch7))
+        unaligned_access = 1;
+      else
+        unaligned_access = 0;
+    }
+
   if (TARGET_THUMB1 && flag_schedule_insns)
     {
       /* Don't warn since it's on by default in -O2.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 40ebf35..7d37445 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -104,6 +104,10 @@
   UNSPEC_SYMBOL_OFFSET  ; The offset of the start of the symbol from
                         ; another symbolic address.
   UNSPEC_MEMORY_BARRIER ; Represent a memory barrier.
+  UNSPEC_UNALIGNED_LOAD ; Used to represent ldr/ldrh instructions that access
+                        ; unaligned locations, on architectures which support
+                        ; that.
+  UNSPEC_UNALIGNED_STORE ; Same for str/strh.
 ])
 
 ;; UNSPEC_VOLATILE Usage:
@@ -2393,7 +2397,7 @@
 ;;; this insv pattern, so this pattern needs to be reevalutated.
 (define_expand "insv"
-  [(set (zero_extract:SI (match_operand:SI 0 "s_register_operand" "")
+  [(set (zero_extract:SI (match_operand:SI 0 "nonimmediate_operand" "")
                          (match_operand:SI 1 "general_operand" "")
                          (match_operand:SI 2 "general_operand" ""))
         (match_operand:SI 3 "reg_or_int_operand" ""))]
@@ -2407,35 +2411,66 @@
   if (arm_arch_thumb2)
     {
-      bool use_bfi = TRUE;
-
-      if (GET_CODE (operands[3]) == CONST_INT)
+      if (unaligned_access && MEM_P (operands[0])
+          && s_register_operand (operands[3], GET_MODE (operands[3]))
+          && (width == 16 || width == 32) && (start_bit % BITS_PER_UNIT) == 0)
         {
-          HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
-
-          if (val == 0)
+          rtx base_addr;
+
+          if (width == 32)
             {
-              emit_insn (gen_insv_zero (operands[0], operands[1],
-                                        operands[2]));
-              DONE;
+              base_addr = adjust_address (operands[0], SImode,
+                                          start_bit / BITS_PER_UNIT);
+              emit_insn (gen_unaligned_storesi (base_addr, operands[3]));
             }
+          else
+            {
+              rtx tmp = gen_reg_rtx (HImode);
 
-          /* See if the set can be done with a single orr instruction.  */
-          if (val == mask && const_ok_for_arm (val << start_bit))
-            use_bfi = FALSE;
+              base_addr = adjust_address (operands[0], HImode,
+                                          start_bit / BITS_PER_UNIT);
+              emit_move_insn (tmp, gen_lowpart (HImode, operands[3]));
+              emit_insn (gen_unaligned_storehi (base_addr, tmp));
+            }
+          DONE;
         }
-
-      if (use_bfi)
+      else if (s_register_operand (operands[0], GET_MODE (operands[0])))
         {
-          if (GET_CODE (operands[3]) != REG)
-            operands[3] = force_reg (SImode, operands[3]);
+          bool use_bfi = TRUE;
 
-          emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
-                                  operands[3]));
-          DONE;
+          if (GET_CODE (operands[3]) == CONST_INT)
+            {
+              HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
+
+              if (val == 0)
+                {
+                  emit_insn (gen_insv_zero (operands[0], operands[1],
+                                            operands[2]));
+                  DONE;
+                }
+
+              /* See if the set can be done with a single orr instruction.  */
+              if (val == mask && const_ok_for_arm (val << start_bit))
+                use_bfi = FALSE;
+            }
+
+          if (use_bfi)
+            {
+              if (GET_CODE (operands[3]) != REG)
+                operands[3] = force_reg (SImode, operands[3]);
+
+              emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
+                                      operands[3]));
+              DONE;
+            }
         }
+      else
+        FAIL;
     }
 
+  if (!s_register_operand (operands[0], GET_MODE (operands[0])))
+    FAIL;
+
   target = copy_rtx (operands[0]);
   /* Avoid using a subreg as a subtarget, and avoid writing a paradoxical
      subreg as the final target.  */
@@ -3628,7 +3663,7 @@
 (define_expand "extzv"
   [(set (match_dup 4)
-        (ashift:SI (match_operand:SI 1 "register_operand" "")
+        (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "")
                    (match_operand:SI 2 "const_int_operand" "")))
    (set (match_operand:SI 0 "register_operand" "")
         (lshiftrt:SI (match_dup 4)
@@ -3641,10 +3676,53 @@
   if (arm_arch_thumb2)
     {
-      emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
-                               operands[3]));
-      DONE;
+      HOST_WIDE_INT width = INTVAL (operands[2]);
+      HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+      if (unaligned_access && MEM_P (operands[1])
+          && (width == 16 || width == 32) && (bitpos % BITS_PER_UNIT) == 0)
+        {
+          rtx base_addr;
+
+          if (width == 32)
+            {
+              base_addr = adjust_address (operands[1], SImode,
+                                          bitpos / BITS_PER_UNIT);
+              emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+            }
+          else
+            {
+              rtx dest = operands[0];
+              rtx tmp = gen_reg_rtx (SImode);
+
+              /* We may get a paradoxical subreg here.  Strip it off.  */
+              if (GET_CODE (dest) == SUBREG
+                  && GET_MODE (dest) == SImode
+                  && GET_MODE (SUBREG_REG (dest)) == HImode)
+                dest = SUBREG_REG (dest);
+
+              if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+                FAIL;
+
+              base_addr = adjust_address (operands[1], HImode,
+                                          bitpos / BITS_PER_UNIT);
+              emit_insn (gen_unaligned_loadhiu (tmp, base_addr));
+              emit_move_insn (gen_lowpart (SImode, dest), tmp);
+            }
+          DONE;
+        }
+      else if (s_register_operand (operands[1], GET_MODE (operands[1])))
+        {
+          emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
+                                   operands[3]));
+          DONE;
+        }
+      else
+        FAIL;
     }
+
+  if (!s_register_operand (operands[1], GET_MODE (operands[1])))
+    FAIL;
 
   operands[3] = GEN_INT (rshift);
@@ -3659,7 +3737,103 @@
   }"
 )
 
-(define_insn "extv"
+(define_expand "extv"
+  [(set (match_operand 0 "s_register_operand" "")
+        (sign_extract (match_operand 1 "nonimmediate_operand" "")
+                      (match_operand 2 "const_int_operand" "")
+                      (match_operand 3 "const_int_operand" "")))]
+  "arm_arch_thumb2"
+{
+  HOST_WIDE_INT width = INTVAL (operands[2]);
+  HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+  if (unaligned_access && MEM_P (operands[1]) && (width == 16 || width == 32)
+      && (bitpos % BITS_PER_UNIT) == 0)
+    {
+      rtx base_addr;
+
+      if (width == 32)
+        {
+          base_addr = adjust_address (operands[1], SImode,
+                                      bitpos / BITS_PER_UNIT);
+          emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+        }
+      else
+        {
+          rtx dest = operands[0];
+          rtx tmp = gen_reg_rtx (SImode);
+
+          /* We may get a paradoxical subreg here.  Strip it off.  */
+          if (GET_CODE (dest) == SUBREG
+              && GET_MODE (dest) == SImode
+              && GET_MODE (SUBREG_REG (dest)) == HImode)
+            dest = SUBREG_REG (dest);
+
+          if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+            FAIL;
+
+          base_addr = adjust_address (operands[1], HImode,
+                                      bitpos / BITS_PER_UNIT);
+          emit_insn (gen_unaligned_loadhis (tmp, base_addr));
+          emit_move_insn (gen_lowpart (SImode, dest), tmp);
+        }
+
+      DONE;
+    }
+  else if (s_register_operand (operands[1], GET_MODE (operands[1])))
+    DONE;
+  else
+    FAIL;
+})
+
+; ARMv6+ unaligned load/store instructions (used for packed structure accesses).
+
+(define_insn "unaligned_loadsi"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (unspec:SI [(match_operand:SI 1 "memory_operand" "m")]
+                   UNSPEC_UNALIGNED_LOAD))]
+  "unaligned_access"
+  "ldr%?\t%0, %1\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "load1")])
+
+(define_insn "unaligned_loadhis"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (sign_extend:SI (unspec:HI [(match_operand:HI 1 "memory_operand" "m")]
+                                   UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access"
+  "ldr%(sh%)\t%0, %1\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_loadhiu"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (zero_extend:SI (unspec:HI [(match_operand:HI 1 "memory_operand" "m")]
+                                   UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access"
+  "ldr%(h%)\t%0, %1\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_storesi"
+  [(set (match_operand:SI 0 "memory_operand" "=m")
+        (unspec:SI [(match_operand:SI 1 "s_register_operand" "r")]
+                   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access"
+  "str%?\t%1, %0\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "unaligned_storehi"
+  [(set (match_operand:HI 0 "memory_operand" "=m")
+        (unspec:HI [(match_operand:HI 1 "s_register_operand" "r")]
+                   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access"
+  "str%(h%)\t%1, %0\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "*extv_reg"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
         (sign_extract:SI (match_operand:SI 1 "s_register_operand" "r")
                          (match_operand:SI 2 "const_int_operand" "M")
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 7d2d84c..3ed9016 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -170,3 +170,7 @@ mfix-cortex-m3-ldrd
 Target Report Var(fix_cm3_ldrd) Init(2)
 Avoid overlapping destination and address registers on LDRD instructions
 that may trigger Cortex-M3 errata.
+
+munaligned-access
+Target Report Var(unaligned_access) Init(2)
+Enable unaligned word and halfword accesses to packed data.
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 7482747..a525184 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -648,7 +648,7 @@ store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 
       /* On big-endian machines, we count bits from the most significant.
          If the bit field insn does not, we must invert.  */
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0))
        xbitpos = unit - bitsize - xbitpos;
 
       /* We have been counting XBITPOS within UNIT.
@@ -1456,7 +1456,7 @@ extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 
      /* On big-endian machines, we count bits from the most significant.
         If the bit field insn does not, we must invert.  */
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0))
        xbitpos = unit - bitsize - xbitpos;
 
       /* Now convert from counting within UNIT to counting in EXT_MODE.  */
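P.S. To illustrate the expmed.c issue described above, here is a tiny
standalone sketch (hypothetical, not part of the patch) of the
pre-patch big-endian adjustment for the memory case, using the
quantities named in the explanation; any field wider than a byte
produces a negative bit position once "unit" is clamped to
BITS_PER_UNIT for a MEM:

/* xbitpos-sketch.c -- illustration only, not part of the patch.  */
#include <stdio.h>

int
main (void)
{
  int unit = 8;      /* BITS_PER_UNIT: the per-byte unit used for MEMs.  */
  int bitsize = 16;  /* e.g. a halfword field in a packed structure.  */
  int xbitpos = 0;   /* Field starts at the first bit of its byte.  */

  /* The existing adjustment when BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN.  */
  xbitpos = unit - bitsize - xbitpos;

  printf ("xbitpos = %d\n", xbitpos);  /* prints -8 */
  return 0;
}

The patch simply skips that adjustment for MEM operands instead.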