
[ARM] Unaligned accesses for packed structures [1/2]

Message ID 20110506133414.14dc027c@rex.config
State New

Commit Message

Julian Brown May 6, 2011, 12:34 p.m. UTC
Hi,

This is the first of two patches to add unaligned-access support to the
ARM backend. This is done somewhat differently from Jie Zhang's earlier
patch:

  http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01890.html

The difference is that with Jie's patch, *any* pointer dereference would
be allowed to access unaligned data. That has the undesirable side effect
of disallowing instructions which don't support unaligned accesses (LDRD,
LDM, etc.) whenever unaligned accesses are enabled.

Instead, this patch allows only packed-structure accesses to use
ldr/str/ldrh/strh, taking a hint from the MIPS ldl/ldr implementation.
The unaligned-access ARM case is quite similar to that one, except that
normal loads/stores are used and the shifting/merging happens in
hardware.
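
For illustration, this is the kind of access the patch targets (a minimal
example of my own, not taken from the patch; the names are made up):

    struct packed_int
    {
      char tag;
      int value;
    } __attribute__((packed));

    int
    get_value (struct packed_int *p)
    {
      /* p->value is only byte-aligned; with unaligned-access support this
         can become a single ldr instead of four ldrb/orr operations.  */
      return p->value;
    }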

The standard names extv/extzv/insv can take a memory
operand for the source/destination of the extract/insert operation, so
we just expand to unspec'd versions of the load and store operations
when unaligned-access support is enabled. The benefit of doing that,
rather than, say, expanding using the regular movsi pattern, is that we
bypass any smartness in the compiler which might replace operations
that work for unaligned accesses (ldr/str/ldrh/strh) with operations
that don't (ldrd/strd/ldm/stm/vldr/...). The downside is that we
potentially miss out on optimization opportunities, since these accesses
no longer look like plain memory accesses.

Doing things this way allows us to leave the settings for
STRICT_ALIGNMENT/SLOW_BYTE_ACCESS alone, avoiding the disruption that
changing them might cause.

The most awkward change in the patch is to generic code (expmed.c,
{store,extract}_bit_field_1): in big-endian mode, the existing behaviour
when inserting/extracting a bitfield into/from a memory location is
definitely bogus: "unit" is set to BITS_PER_UNIT for memory locations,
and if bitsize (the size of the field to insert/extract) is greater than
BITS_PER_UNIT (which isn't unusual at all), xbitpos becomes negative.
That can't possibly be intentional; I can only assume that this code
path is not exercised on machines which have memory alternatives for
bitfield insert/extract and which have BITS_BIG_ENDIAN of 0 in
BYTES_BIG_ENDIAN mode.
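
To make the problem concrete (numbers chosen for illustration, not taken
from the patch): for a memory location the existing code effectively does

    unit = BITS_PER_UNIT;                /* 8 for a MEM  */
    /* e.g. bitsize = 16, xbitpos = 0 (a halfword field at offset 0)  */
    xbitpos = unit - bitsize - xbitpos;  /* 8 - 16 - 0 = -8  */

so any field wider than a byte yields a negative bit position, hence the
!MEM_P (xop0) guard added by the patch.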

The logic for choosing when to enable the unaligned-access support (and
the name of the option to override the default behaviour) is lifted from
Jie's patch.

Tested with a cross compiler to ARM Linux, and (on a branch) with both
little- and big-endian crosses to ARM EABI, with no regressions. OK to apply?

Thanks,

Julian

ChangeLog

    gcc/
    * config/arm/arm.c (arm_option_override): Add unaligned_access
    support.
    * config/arm/arm.md (UNSPEC_UNALIGNED_LOAD)
    (UNSPEC_UNALIGNED_STORE): Add constants for unspecs.
    (insv, extzv): Add unaligned-access support.
    (extv): Change to expander. Likewise.
    (unaligned_loadsi, unaligned_loadhis, unaligned_loadhiu)
    (unaligned_storesi, unaligned_storehi): New.
    (*extv_reg): New (previous extv implementation).
    * config/arm/arm.opt (munaligned-access): Add option.
    * expmed.c (store_bit_field_1): Don't tweak bitfield numbering for
    memory locations if BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN.
    (extract_bit_field_1): Likewise.

Comments

Richard Earnshaw May 6, 2011, 1:04 p.m. UTC | #1
On Fri, 2011-05-06 at 13:34 +0100, Julian Brown wrote:
> [...]

+  if (unaligned_access == 2)
+    {
+      if (arm_arch6 && (arm_arch_notm || arm_arch7))
+       unaligned_access = 1;
+      else
+       unaligned_access = 0;
+    }
+

The compiler should fault -munaligned-access on cores that don't support
it.
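
A minimal sketch of what that check might look like in
arm_option_override (the diagnostic wording here is an assumption on my
part, not something in the patch):

    if (unaligned_access == 2)
      {
        /* Option not given explicitly: pick a default for the core.  */
        if (arm_arch6 && (arm_arch_notm || arm_arch7))
          unaligned_access = 1;
        else
          unaligned_access = 0;
      }
    else if (unaligned_access == 1
             && !(arm_arch6 && (arm_arch_notm || arm_arch7)))
      {
        /* -munaligned-access was requested on a core without support.  */
        warning (0, "target CPU does not support unaligned accesses");
        unaligned_access = 0;
      }
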
+(define_insn "unaligned_loadsi"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+       (unspec:SI [(match_operand:SI 1 "memory_operand" "m")]
+                  UNSPEC_UNALIGNED_LOAD))]
+  "unaligned_access"
+  "ldr%?\t%0, %1\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "load1")])

I think the final condition should also include TARGET_32BIT, as these
patterns aren't expected to work with Thumb-1.

Secondly, they should be structured to get the length information right
when a 16-bit encoding can be used in Thumb-2 (you'll keep Carrot happy
that way :-) : just add an alternative that's only enabled for Thumb
mode and which matches the requirements for a 16-bit instruction (beware
however of the 16-bit write-back case).
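
For the first point, a minimal sketch of the tightened condition (only
the TARGET_32BIT change is shown; the 16-bit Thumb-2 alternative would
additionally need a Thumb-only alternative plus arch/length attributes):

    (define_insn "unaligned_loadsi"
      [(set (match_operand:SI 0 "s_register_operand" "=r")
            (unspec:SI [(match_operand:SI 1 "memory_operand" "m")]
                       UNSPEC_UNALIGNED_LOAD))]
      "unaligned_access && TARGET_32BIT"
      "ldr%?\t%0, %1\t@ unaligned"
      [(set_attr "predicable" "yes")
       (set_attr "type" "load1")])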

Finally, I don't see anything to put out the correct build attribute
information for unaligned access enabled (Tag_CPU_unaligned_access).
Have I just missed it?
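
For reference, Tag_CPU_unaligned_access is EABI attribute number 34; a
sketch of one way to emit it (not part of the patch) would be, from
arm_file_start in arm.c:

    if (unaligned_access)
      /* Tag_CPU_unaligned_access (34) = 1: unaligned accesses permitted.  */
      asm_fprintf (asm_out_file, "\t.eabi_attribute 34, 1\n");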

R.
Joey Ye Sept. 22, 2011, 4:54 a.m. UTC | #2
Julian,

This patch works when the stored value comes from a register, but not
when it is an immediate, as shown in the example below.

Would you improve it to handle immediates as well?

Thanks - Joey

$ arm-none-eabi-gcc -v 2>&1 | grep version
gcc version 4.7.0 20110922 (experimental) [trunk revision 179074] (GCC)
$ cat u.c
struct packed_str
{
	char f0;
	int  f1;
}__attribute__((packed)) u;

void foo(int a)
{
  u.f1 = a; // works fine
}

void bar()
{
  u.f1 = 1; // doesn't work
}

$ arm-none-eabi-gcc -mthumb -march=armv7 -S -O2 u.c
$ cat u.s
	.syntax unified
	.arch armv7
	.fpu softvfp
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 1
	.eabi_attribute 30, 2
	.eabi_attribute 34, 1
	.eabi_attribute 18, 4
	.thumb
	.file	"u.c"
	.text
	.align	2
	.global	foo
	.thumb
	.thumb_func
	.type	foo, %function
foo:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	movw	r3, #:lower16:u
	movt	r3, #:upper16:u
	str	r0, [r3, #1]	@ unaligned
	bx	lr
	.size	foo, .-foo
	.align	2
	.global	bar
	.thumb
	.thumb_func
	.type	bar, %function
bar:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	movw	r3, #:lower16:u
	movs	r1, #0
	movt	r3, #:upper16:u

@ Still uses aligned str from here
	ldrb	r2, [r3, #0]	@ zero_extendqisi2
	strb	r1, [r3, #4]
	orr	r2, r2, #256
	str	r2, [r3, #0]
	bx	lr
	.size	bar, .-bar
	.comm	u,5,4
	.ident	"GCC: (GNU) 4.7.0 20110922 (experimental) [trunk revision 179074]"
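
The store in bar() reaches the "insv" expander with a CONST_INT source,
and the unaligned path in the patch only triggers when operands[3] is
already a register, so the expander FAILs and the generic
read-modify-write sequence is used instead. One possible direction for a
fix (a sketch of mine, not part of the patch) is to drop the
s_register_operand check on operands[3] from that condition and load
constants into a register first:

    /* Inside the unaligned MEM_P (operands[0]) branch of "insv":  */
    if (!s_register_operand (operands[3], SImode))
      operands[3] = force_reg (SImode, operands[3]);

after which the existing unaligned_storesi/unaligned_storehi expansion
can proceed as in the register case.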

Patch

commit e76508ff702406fd63bc59465d9c7ab70dcb3266
Author: Julian Brown <julian@henry7.codesourcery.com>
Date:   Wed May 4 10:06:25 2011 -0700

    Permit regular ldr/str/ldrh/strh for packed-structure accesses etc.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4f9c2aa..a18aea6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1833,6 +1833,22 @@  arm_option_override (void)
 	fix_cm3_ldrd = 0;
     }
 
+  /* Enable -munaligned-access by default for
+     - all ARMv6 architecture-based processors
+     - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
+
+     Disable -munaligned-access by default for
+     - all pre-ARMv6 architecture-based processors
+     - ARMv6-M architecture-based processors.  */
+
+  if (unaligned_access == 2)
+    {
+      if (arm_arch6 && (arm_arch_notm || arm_arch7))
+	unaligned_access = 1;
+      else
+	unaligned_access = 0;
+    }
+
   if (TARGET_THUMB1 && flag_schedule_insns)
     {
       /* Don't warn since it's on by default in -O2.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 40ebf35..7d37445 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -104,6 +104,10 @@ 
   UNSPEC_SYMBOL_OFFSET  ; The offset of the start of the symbol from
                         ; another symbolic address.
   UNSPEC_MEMORY_BARRIER ; Represent a memory barrier.
+  UNSPEC_UNALIGNED_LOAD	; Used to represent ldr/ldrh instructions that access
+			; unaligned locations, on architectures which support
+			; that.
+  UNSPEC_UNALIGNED_STORE ; Same for str/strh.
 ])
 
 ;; UNSPEC_VOLATILE Usage:
@@ -2393,7 +2397,7 @@ 
 ;;; this insv pattern, so this pattern needs to be reevalutated.
 
 (define_expand "insv"
-  [(set (zero_extract:SI (match_operand:SI 0 "s_register_operand" "")
+  [(set (zero_extract:SI (match_operand:SI 0 "nonimmediate_operand" "")
                          (match_operand:SI 1 "general_operand" "")
                          (match_operand:SI 2 "general_operand" ""))
         (match_operand:SI 3 "reg_or_int_operand" ""))]
@@ -2407,35 +2411,66 @@ 
 
     if (arm_arch_thumb2)
       {
-	bool use_bfi = TRUE;
-
-	if (GET_CODE (operands[3]) == CONST_INT)
+        if (unaligned_access && MEM_P (operands[0])
+	    && s_register_operand (operands[3], GET_MODE (operands[3]))
+	    && (width == 16 || width == 32) && (start_bit % BITS_PER_UNIT) == 0)
 	  {
-	    HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
-
-	    if (val == 0)
+	    rtx base_addr;
+	    
+	    if (width == 32)
 	      {
-		emit_insn (gen_insv_zero (operands[0], operands[1],
-					  operands[2]));
-		DONE;
+	        base_addr = adjust_address (operands[0], SImode,
+					    start_bit / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_storesi (base_addr, operands[3]));
 	      }
+	    else
+	      {
+	        rtx tmp = gen_reg_rtx (HImode);
 
-	    /* See if the set can be done with a single orr instruction.  */
-	    if (val == mask && const_ok_for_arm (val << start_bit))
-	      use_bfi = FALSE;
+	        base_addr = adjust_address (operands[0], HImode,
+					    start_bit / BITS_PER_UNIT);
+		emit_move_insn (tmp, gen_lowpart (HImode, operands[3]));
+		emit_insn (gen_unaligned_storehi (base_addr, tmp));
+	      }
+	    DONE;
 	  }
-	  
-	if (use_bfi)
+	else if (s_register_operand (operands[0], GET_MODE (operands[0])))
 	  {
-	    if (GET_CODE (operands[3]) != REG)
-	      operands[3] = force_reg (SImode, operands[3]);
+	    bool use_bfi = TRUE;
 
-	    emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
-				    operands[3]));
-	    DONE;
+	    if (GET_CODE (operands[3]) == CONST_INT)
+	      {
+		HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
+
+		if (val == 0)
+		  {
+		    emit_insn (gen_insv_zero (operands[0], operands[1],
+					      operands[2]));
+		    DONE;
+		  }
+
+		/* See if the set can be done with a single orr instruction.  */
+		if (val == mask && const_ok_for_arm (val << start_bit))
+		  use_bfi = FALSE;
+	      }
+
+	    if (use_bfi)
+	      {
+		if (GET_CODE (operands[3]) != REG)
+		  operands[3] = force_reg (SImode, operands[3]);
+
+		emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
+					operands[3]));
+		DONE;
+	      }
 	  }
+	else
+	  FAIL;
       }
 
+    if (!s_register_operand (operands[0], GET_MODE (operands[0])))
+      FAIL;
+
     target = copy_rtx (operands[0]);
     /* Avoid using a subreg as a subtarget, and avoid writing a paradoxical 
        subreg as the final target.  */
@@ -3628,7 +3663,7 @@ 
 
 (define_expand "extzv"
   [(set (match_dup 4)
-	(ashift:SI (match_operand:SI   1 "register_operand" "")
+	(ashift:SI (match_operand:SI   1 "nonimmediate_operand" "")
 		   (match_operand:SI   2 "const_int_operand" "")))
    (set (match_operand:SI              0 "register_operand" "")
 	(lshiftrt:SI (match_dup 4)
@@ -3641,10 +3676,53 @@ 
     
     if (arm_arch_thumb2)
       {
-	emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
-				 operands[3]));
-	DONE;
+	HOST_WIDE_INT width = INTVAL (operands[2]);
+	HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+	if (unaligned_access && MEM_P (operands[1])
+	    && (width == 16 || width == 32) && (bitpos % BITS_PER_UNIT) == 0)
+	  {
+	    rtx base_addr;
+
+	    if (width == 32)
+              {
+		base_addr = adjust_address (operands[1], SImode,
+					    bitpos / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+              }
+	    else
+              {
+		rtx dest = operands[0];
+		rtx tmp = gen_reg_rtx (SImode);
+
+		/* We may get a paradoxical subreg here.  Strip it off.  */
+		if (GET_CODE (dest) == SUBREG
+		    && GET_MODE (dest) == SImode
+		    && GET_MODE (SUBREG_REG (dest)) == HImode)
+		  dest = SUBREG_REG (dest);
+
+		if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+		  FAIL;
+
+		base_addr = adjust_address (operands[1], HImode,
+					    bitpos / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_loadhiu (tmp, base_addr));
+		emit_move_insn (gen_lowpart (SImode, dest), tmp);
+	      }
+	    DONE;
+	  }
+	else if (s_register_operand (operands[1], GET_MODE (operands[1])))
+	  {
+	    emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
+				     operands[3]));
+	    DONE;
+	  }
+	else
+	  FAIL;
       }
+    
+    if (!s_register_operand (operands[1], GET_MODE (operands[1])))
+      FAIL;
 
     operands[3] = GEN_INT (rshift);
     
@@ -3659,7 +3737,103 @@ 
   }"
 )
 
-(define_insn "extv"
+(define_expand "extv"
+  [(set (match_operand 0 "s_register_operand" "")
+	(sign_extract (match_operand 1 "nonimmediate_operand" "")
+		      (match_operand 2 "const_int_operand" "")
+		      (match_operand 3 "const_int_operand" "")))]
+  "arm_arch_thumb2"
+{
+  HOST_WIDE_INT width = INTVAL (operands[2]);
+  HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+  if (unaligned_access && MEM_P (operands[1]) && (width == 16 || width == 32)
+      && (bitpos % BITS_PER_UNIT) == 0)
+    {
+      rtx base_addr;
+      
+      if (width == 32)
+        {
+	  base_addr = adjust_address (operands[1], SImode,
+				      bitpos / BITS_PER_UNIT);
+	  emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+        }
+      else
+        {
+	  rtx dest = operands[0];
+	  rtx tmp = gen_reg_rtx (SImode);
+	  
+	  /* We may get a paradoxical subreg here.  Strip it off.  */
+	  if (GET_CODE (dest) == SUBREG
+	      && GET_MODE (dest) == SImode
+	      && GET_MODE (SUBREG_REG (dest)) == HImode)
+	    dest = SUBREG_REG (dest);
+	  
+	  if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+	    FAIL;
+	  
+	  base_addr = adjust_address (operands[1], HImode,
+				      bitpos / BITS_PER_UNIT);
+	  emit_insn (gen_unaligned_loadhis (tmp, base_addr));
+	  emit_move_insn (gen_lowpart (SImode, dest), tmp);
+	}
+
+      DONE;
+    }
+  else if (s_register_operand (operands[1], GET_MODE (operands[1])))
+    DONE;
+  else
+    FAIL;
+})
+
+; ARMv6+ unaligned load/store instructions (used for packed structure accesses).
+
+(define_insn "unaligned_loadsi"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "m")]
+		   UNSPEC_UNALIGNED_LOAD))]
+  "unaligned_access"
+  "ldr%?\t%0, %1\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "load1")])
+
+(define_insn "unaligned_loadhis"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(sign_extend:SI (unspec:HI [(match_operand:HI 1 "memory_operand" "m")]
+				   UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access"
+  "ldr%(sh%)\t%0, %1\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_loadhiu"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(zero_extend:SI (unspec:HI [(match_operand:HI 1 "memory_operand" "m")]
+				   UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access"
+  "ldr%(h%)\t%0, %1\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_storesi"
+  [(set (match_operand:SI 0 "memory_operand" "=m")
+	(unspec:SI [(match_operand:SI 1 "s_register_operand" "r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access"
+  "str%?\t%1, %0\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "unaligned_storehi"
+  [(set (match_operand:HI 0 "memory_operand" "=m")
+	(unspec:HI [(match_operand:HI 1 "s_register_operand" "r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access"
+  "str%(h%)\t%1, %0\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "*extv_reg"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(sign_extract:SI (match_operand:SI 1 "s_register_operand" "r")
                          (match_operand:SI 2 "const_int_operand" "M")
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 7d2d84c..3ed9016 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -170,3 +170,7 @@  mfix-cortex-m3-ldrd
 Target Report Var(fix_cm3_ldrd) Init(2)
 Avoid overlapping destination and address registers on LDRD instructions
 that may trigger Cortex-M3 errata.
+
+munaligned-access
+Target Report Var(unaligned_access) Init(2)
+Enable unaligned word and halfword accesses to packed data.
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 7482747..a525184 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -648,7 +648,7 @@  store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
       /* On big-endian machines, we count bits from the most significant.
 	 If the bit field insn does not, we must invert.  */
 
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0))
 	xbitpos = unit - bitsize - xbitpos;
 
       /* We have been counting XBITPOS within UNIT.
@@ -1456,7 +1456,7 @@  extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 
       /* On big-endian machines, we count bits from the most significant.
 	 If the bit field insn does not, we must invert.  */
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0))
 	xbitpos = unit - bitsize - xbitpos;
 
       /* Now convert from counting within UNIT to counting in EXT_MODE.  */