[ARM] Unaligned accesses for packed structures [1/2]

Message ID 20110825164650.01ec842f@rex.config
State New

Commit Message

Julian Brown Aug. 25, 2011, 3:46 p.m. UTC
On Mon, 4 Jul 2011 13:43:31 +0200 (CEST)
"Ulrich Weigand" <uweigand@de.ibm.com> wrote:

> Julian Brown wrote:
> 
> > The most awkward change in the patch is to generic code (expmed.c,
> > {store,extract}_bit_field_1): in big-endian mode, the existing
> > behaviour (when inserting a bitfield into, or extracting one from, a
> > memory location) is definitely bogus: "unit" is set to BITS_PER_UNIT
> > for memory locations, and if bitsize (the size of the field to
> > insert/extract) is greater than BITS_PER_UNIT (which isn't unusual
> > at all), xbitpos becomes negative. That can't possibly be
> > intentional; I can only assume that this code path is not exercised
> > for machines which have memory alternatives for bitfield
> > insert/extract, and BITS_BIG_ENDIAN of 0 in BYTES_BIG_ENDIAN mode.

> I agree that the current code cannot possibly be correct.  However,
> just disabling the BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN renumbering
> *completely* seems wrong to me as well.  
> 
> According to the docs, the meaning of the bit position passed to the
> extv/insv expanders is determined by BITS_BIG_ENDIAN, both in the
> cases of register and memory operands.  Therefore, if BITS_BIG_ENDIAN
> differs from BYTES_BIG_ENDIAN, we would need a correction for memory
> operands as well.  However, this correction needs to be relative to
> the size of the access (i.e. the operand to the extv/insv), not just
> BITS_PER_UNIT.

> Note that with that change, the new code your patch introduces to the
> ARM back-end will also need to change.  You currently handle bitpos
> like this:
> 
>           base_addr = adjust_address (operands[1], HImode,
>                                       bitpos / BITS_PER_UNIT);
> 
> This implicitly assumes that bitpos counts according to
> BYTES_BIG_ENDIAN, not BITS_BIG_ENDIAN -- which exactly cancels out
> the common code behaviour introduced by your patch ...
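
To make the arithmetic concrete, here is a small standalone sketch (plain
C, not GCC code: the names only mirror the expmed.c variables, and the
16-bit field and SImode operand are assumed values for illustration) of
the old renumbering versus the access-size-relative one:

    #include <stdio.h>

    int
    main (void)
    {
      int bitsize = 16;          /* field width */
      int xbitpos = 0;           /* bit position within the unit */
      int unit = 8;              /* BITS_PER_UNIT, as currently used for MEMs */
      int op_mode_bitsize = 32;  /* GET_MODE_BITSIZE (op_mode), e.g. SImode */

      /* Existing behaviour: invert while unit is still BITS_PER_UNIT.  */
      printf ("old xbitpos = %d\n", unit - bitsize - xbitpos);  /* -8, bogus */

      /* Corrected behaviour: count relative to the size of the access.  */
      unit = op_mode_bitsize;
      printf ("new xbitpos = %d\n", unit - bitsize - xbitpos);  /* 16 */

      return 0;
    }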

I've updated the patch to work with current mainline, and implemented
your suggestion along with the change to the interpretation of bitpos
in the insv/extv/extzv expanders in arm.md. It seems to work fine
(testing still in progress), but I'm a bit concerned that the semantics
of bit-positioning for memory operands when BYTES_BIG_ENDIAN
&& !BITS_BIG_ENDIAN are now strange to the point of perversity.

The problem is that if we're using little-endian bit numbering for
memory locations in big-endian-bytes mode, we need to define an origin
to count "backwards" from. For the current implementation, this will
now be (I believe) one word beyond the base address of the access in
question, which is IMO slightly bizarre, to say the least. But I can't
think of any easier way forward than either this patch, or the
previous one, which disabled bit-endian switching entirely for memory
operands in this case.
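
As a concrete instance (taking the insv expander in the patch below, and
assuming an SImode source operand): a 16-bit store at an incoming bit
position of 0 is adjusted, under BYTES_BIG_ENDIAN, to

    start_bit = 32 - 16 - 0 = 16,  i.e. byte offset start_bit / BITS_PER_UNIT = 2

so bit 0 ends up naming the halfword furthest from the base address; the
count effectively runs backwards from one word past the base.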

So, OK to apply this version, assuming testing comes out OK? (And the
followup patch [2/2], which remains unchanged?)

Thanks,

Julian

ChangeLog

    gcc/
    * config/arm/arm.c (arm_override_options): Add unaligned_access
    support.
    (arm_file_start): Emit attribute for unaligned access as
    appropriate.
    * config/arm/arm.md (UNSPEC_UNALIGNED_LOAD)
    (UNSPEC_UNALIGNED_STORE): Add constants for unspecs.
    (insv, extzv): Add unaligned-access support.
    (extv): Change to expander. Likewise.
    (extzv_t1, extv_regsi): Add helpers.
    (unaligned_loadsi, unaligned_loadhis, unaligned_loadhiu)
    (unaligned_storesi, unaligned_storehi): New.
    (*extv_reg): New (previous extv implementation).
    * config/arm/arm.opt (munaligned_access): Add option.
    * config/arm/constraints.md (Uw): New constraint.
    * expmed.c (store_bit_field_1): Adjust bitfield numbering according
    to size of access, not size of unit, when BITS_BIG_ENDIAN !=
    BYTES_BIG_ENDIAN.
    (extract_bit_field_1): Likewise.

Comments

Julian Brown Aug. 25, 2011, 5:31 p.m. UTC | #1
On Thu, 25 Aug 2011 16:46:50 +0100
Julian Brown <julian@codesourcery.com> wrote:

> So, OK to apply this version, assuming testing comes out OK? (And the
> followup patch [2/2], which remains unchanged?)

FWIW, all tests pass, apart from gcc.target/arm/volatile-bitfields-3.c,
which regresses. The output contains:

        ldrh    r0, [r3, #2]    @ unaligned

I believe that, to conform to the ARM EABI, GCC must use an (aligned)
ldr in this case. Is that correct? If so, it looks like the middle-end
bitfield code does not take the setting of -fstrict-volatile-bitfields
into account.
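
For illustration only (this is not the contents of volatile-bitfields-3.c,
just the shape of the problem): with int-typed volatile bitfields such as

    struct bits
    {
      volatile unsigned int a : 16;
      volatile unsigned int b : 16;
    };

    unsigned int
    read_b (struct bits *x)
    {
      return x->b;
    }

the container of b is a 32-bit int, so under -fstrict-volatile-bitfields
the read should be a single aligned ldr of the whole container followed by
a shift, not a narrower ldrh of part of it.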

Julian
Ramana Radhakrishnan Aug. 25, 2011, 8:18 p.m. UTC | #2
On 25 August 2011 18:31, Julian Brown <julian@codesourcery.com> wrote:
> On Thu, 25 Aug 2011 16:46:50 +0100
> Julian Brown <julian@codesourcery.com> wrote:
>
>> So, OK to apply this version, assuming testing comes out OK? (And the
>> followup patch [2/2], which remains unchanged?)
>
> FWIW, all tests pass, apart from gcc.target/arm/volatile-bitfields-3.c,
> which regresses. The output contains:
>
>        ldrh    r0, [r3, #2]    @ unaligned
>
> I believe that, to conform to the ARM EABI, GCC must use an
> (aligned) ldr in this case. Is that correct?

That is correct by my reading of the ABI Spec.

The relevant section is 7.1.7.5, where it states that:

"When a volatile bitfield is read it's container must be read exactly
once using the access width appropriate to the type of the container.
"

Here the type of the container is a long, and hence the access should
be done with an ldr instruction followed by a shift, as it is today,
irrespective of whether we support unaligned accesses in this case.
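
In other words, the conforming expansion is conceptually (a hand-written
sketch; the shift and mask are placeholders for whichever field is read):

    unsigned int
    read_field (const volatile unsigned int *container, unsigned int shift,
                unsigned int mask)
    {
      unsigned int word = *container;  /* container read exactly once (ldr) */
      return (word >> shift) & mask;   /* field extracted in registers */
    }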

cheers
Ramana
Joseph Myers Aug. 25, 2011, 8:25 p.m. UTC | #3
On Thu, 25 Aug 2011, Ramana Radhakrishnan wrote:

> On 25 August 2011 18:31, Julian Brown <julian@codesourcery.com> wrote:
> > On Thu, 25 Aug 2011 16:46:50 +0100
> > Julian Brown <julian@codesourcery.com> wrote:
> >
> >> So, OK to apply this version, assuming testing comes out OK? (And the
> >> followup patch [2/2], which remains unchanged?)
> >
> > FWIW, all tests pass, apart from gcc.target/arm/volatile-bitfields-3.c,
> > which regresses. The output contains:
> >
> >        ldrh    r0, [r3, #2]    @ unaligned
> >
> > I believe that, to conform to the ARM EABI, GCC must use an
> > (aligned) ldr in this case. Is that correct?
> 
> That is correct by my reading of the ABI Spec.
> 
> The relevant section is 7.1.7.5, where it states that:
> 
> "When a volatile bitfield is read it's container must be read exactly
> once using the access width appropriate to the type of the container.
> "
> 
> Here the type of the container is a long, and hence the access should
> be done with an ldr instruction followed by a shift, as it is today,
> irrespective of whether we support unaligned accesses in this case.

Except for packed structures, anyway (and there aren't any packed 
structures involved in this particular testcase).  The ABI doesn't cover 
packed structures and I maintain that there the target-independent GNU C 
semantics take precedence - meaning that if the compiler doesn't know that 
a single read with the relevant access width is going to be safe, it must 
use an instruction sequence it does know to be safe.
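
An illustrative case of the sort of thing meant here (hypothetical, not
from the testsuite):

    /* Because of the packed attribute, the int container of f may start at
       any byte address, so on a target without unaligned access support the
       compiler cannot assume a single full-width read of the container is
       safe; it has to fall back to a sequence it knows to be safe, such as
       byte loads combined with shifts.  */
    struct __attribute__ ((packed)) p
    {
      char c;
      volatile unsigned int f : 16;
    };

    unsigned int
    read_f (struct p *x)
    {
      return x->f;
    }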

Patch

commit 7bf9c1c0806ad1ae75e96635cda55fff4c40e7ae
Author: Julian Brown <julian@henry8.codesourcery.com>
Date:   Tue Aug 23 05:46:22 2011 -0700

    Unaligned support for packed structs

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 2353704..dda2718 100644
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3162b30..cc1eb80 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1905,6 +1905,28 @@  arm_option_override (void)
 	fix_cm3_ldrd = 0;
     }
 
+  /* Enable -munaligned-access by default for
+     - all ARMv6 architecture-based processors
+     - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
+
+     Disable -munaligned-access by default for
+     - all pre-ARMv6 architecture-based processors
+     - ARMv6-M architecture-based processors.  */
+
+  if (unaligned_access == 2)
+    {
+      if (arm_arch6 && (arm_arch_notm || arm_arch7))
+	unaligned_access = 1;
+      else
+	unaligned_access = 0;
+    }
+  else if (unaligned_access == 1
+	   && !(arm_arch6 && (arm_arch_notm || arm_arch7)))
+    {
+      warning (0, "target CPU does not support unaligned accesses");
+      unaligned_access = 0;
+    }
+
   if (TARGET_THUMB1 && flag_schedule_insns)
     {
       /* Don't warn since it's on by default in -O2.  */
@@ -22145,6 +22167,10 @@  arm_file_start (void)
 	val = 6;
       asm_fprintf (asm_out_file, "\t.eabi_attribute 30, %d\n", val);
 
+      /* Tag_CPU_unaligned_access.  */
+      asm_fprintf (asm_out_file, "\t.eabi_attribute 34, %d\n",
+		   unaligned_access);
+
       /* Tag_ABI_FP_16bit_format.  */
       if (arm_fp16_format)
 	asm_fprintf (asm_out_file, "\t.eabi_attribute 38, %d\n",
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0f23400..0ea0f7f 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -103,6 +103,10 @@ 
   UNSPEC_SYMBOL_OFFSET  ; The offset of the start of the symbol from
                         ; another symbolic address.
   UNSPEC_MEMORY_BARRIER ; Represent a memory barrier.
+  UNSPEC_UNALIGNED_LOAD	; Used to represent ldr/ldrh instructions that access
+			; unaligned locations, on architectures which support
+			; that.
+  UNSPEC_UNALIGNED_STORE ; Same for str/strh.
 ])
 
 ;; UNSPEC_VOLATILE Usage:
@@ -2468,10 +2472,10 @@ 
 ;;; this insv pattern, so this pattern needs to be reevalutated.
 
 (define_expand "insv"
-  [(set (zero_extract:SI (match_operand:SI 0 "s_register_operand" "")
-                         (match_operand:SI 1 "general_operand" "")
-                         (match_operand:SI 2 "general_operand" ""))
-        (match_operand:SI 3 "reg_or_int_operand" ""))]
+  [(set (zero_extract (match_operand 0 "nonimmediate_operand" "")
+                      (match_operand 1 "general_operand" "")
+                      (match_operand 2 "general_operand" ""))
+        (match_operand 3 "reg_or_int_operand" ""))]
   "TARGET_ARM || arm_arch_thumb2"
   "
   {
@@ -2482,35 +2486,70 @@ 
 
     if (arm_arch_thumb2)
       {
-	bool use_bfi = TRUE;
-
-	if (GET_CODE (operands[3]) == CONST_INT)
+        if (unaligned_access && MEM_P (operands[0])
+	    && s_register_operand (operands[3], GET_MODE (operands[3]))
+	    && (width == 16 || width == 32) && (start_bit % BITS_PER_UNIT) == 0)
 	  {
-	    HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
+	    rtx base_addr;
+
+	    if (BYTES_BIG_ENDIAN)
+	      start_bit = GET_MODE_BITSIZE (GET_MODE (operands[3])) - width
+			  - start_bit;
 
-	    if (val == 0)
+	    if (width == 32)
 	      {
-		emit_insn (gen_insv_zero (operands[0], operands[1],
-					  operands[2]));
-		DONE;
+	        base_addr = adjust_address (operands[0], SImode,
+					    start_bit / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_storesi (base_addr, operands[3]));
 	      }
+	    else
+	      {
+	        rtx tmp = gen_reg_rtx (HImode);
 
-	    /* See if the set can be done with a single orr instruction.  */
-	    if (val == mask && const_ok_for_arm (val << start_bit))
-	      use_bfi = FALSE;
+	        base_addr = adjust_address (operands[0], HImode,
+					    start_bit / BITS_PER_UNIT);
+		emit_move_insn (tmp, gen_lowpart (HImode, operands[3]));
+		emit_insn (gen_unaligned_storehi (base_addr, tmp));
+	      }
+	    DONE;
 	  }
-	  
-	if (use_bfi)
+	else if (s_register_operand (operands[0], GET_MODE (operands[0])))
 	  {
-	    if (GET_CODE (operands[3]) != REG)
-	      operands[3] = force_reg (SImode, operands[3]);
+	    bool use_bfi = TRUE;
 
-	    emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
-				    operands[3]));
-	    DONE;
+	    if (GET_CODE (operands[3]) == CONST_INT)
+	      {
+		HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
+
+		if (val == 0)
+		  {
+		    emit_insn (gen_insv_zero (operands[0], operands[1],
+					      operands[2]));
+		    DONE;
+		  }
+
+		/* See if the set can be done with a single orr instruction.  */
+		if (val == mask && const_ok_for_arm (val << start_bit))
+		  use_bfi = FALSE;
+	      }
+
+	    if (use_bfi)
+	      {
+		if (GET_CODE (operands[3]) != REG)
+		  operands[3] = force_reg (SImode, operands[3]);
+
+		emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
+					operands[3]));
+		DONE;
+	      }
 	  }
+	else
+	  FAIL;
       }
 
+    if (!s_register_operand (operands[0], GET_MODE (operands[0])))
+      FAIL;
+
     target = copy_rtx (operands[0]);
     /* Avoid using a subreg as a subtarget, and avoid writing a paradoxical 
        subreg as the final target.  */
@@ -3702,12 +3741,10 @@ 
 ;; to reduce register pressure later on.
 
 (define_expand "extzv"
-  [(set (match_dup 4)
-	(ashift:SI (match_operand:SI   1 "register_operand" "")
-		   (match_operand:SI   2 "const_int_operand" "")))
-   (set (match_operand:SI              0 "register_operand" "")
-	(lshiftrt:SI (match_dup 4)
-		     (match_operand:SI 3 "const_int_operand" "")))]
+  [(set (match_operand 0 "s_register_operand" "")
+	(zero_extract (match_operand 1 "nonimmediate_operand" "")
+		      (match_operand 2 "const_int_operand" "")
+		      (match_operand 3 "const_int_operand" "")))]
   "TARGET_THUMB1 || arm_arch_thumb2"
   "
   {
@@ -3716,10 +3753,57 @@ 
     
     if (arm_arch_thumb2)
       {
-	emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
-				 operands[3]));
-	DONE;
+	HOST_WIDE_INT width = INTVAL (operands[2]);
+	HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+	if (unaligned_access && MEM_P (operands[1])
+	    && (width == 16 || width == 32) && (bitpos % BITS_PER_UNIT) == 0)
+	  {
+	    rtx base_addr;
+
+	    if (BYTES_BIG_ENDIAN)
+	      bitpos = GET_MODE_BITSIZE (GET_MODE (operands[0])) - width
+		       - bitpos;
+
+	    if (width == 32)
+              {
+		base_addr = adjust_address (operands[1], SImode,
+					    bitpos / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+              }
+	    else
+              {
+		rtx dest = operands[0];
+		rtx tmp = gen_reg_rtx (SImode);
+
+		/* We may get a paradoxical subreg here.  Strip it off.  */
+		if (GET_CODE (dest) == SUBREG
+		    && GET_MODE (dest) == SImode
+		    && GET_MODE (SUBREG_REG (dest)) == HImode)
+		  dest = SUBREG_REG (dest);
+
+		if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+		  FAIL;
+
+		base_addr = adjust_address (operands[1], HImode,
+					    bitpos / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_loadhiu (tmp, base_addr));
+		emit_move_insn (gen_lowpart (SImode, dest), tmp);
+	      }
+	    DONE;
+	  }
+	else if (s_register_operand (operands[1], GET_MODE (operands[1])))
+	  {
+	    emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
+				     operands[3]));
+	    DONE;
+	  }
+	else
+	  FAIL;
       }
+    
+    if (!s_register_operand (operands[1], GET_MODE (operands[1])))
+      FAIL;
 
     operands[3] = GEN_INT (rshift);
     
@@ -3729,12 +3813,154 @@ 
         DONE;
       }
       
-    operands[2] = GEN_INT (lshift);
-    operands[4] = gen_reg_rtx (SImode);
+    emit_insn (gen_extzv_t1 (operands[0], operands[1], GEN_INT (lshift),
+			     operands[3], gen_reg_rtx (SImode)));
+    DONE;
   }"
 )
 
-(define_insn "extv"
+;; Helper for extzv, for the Thumb-1 register-shifts case.
+
+(define_expand "extzv_t1"
+  [(set (match_operand:SI 4 "s_register_operand" "")
+	(ashift:SI (match_operand:SI 1 "nonimmediate_operand" "")
+		   (match_operand:SI 2 "const_int_operand" "")))
+   (set (match_operand:SI 0 "s_register_operand" "")
+	(lshiftrt:SI (match_dup 4)
+		     (match_operand:SI 3 "const_int_operand" "")))]
+  "TARGET_THUMB1"
+  "")
+
+(define_expand "extv"
+  [(set (match_operand 0 "s_register_operand" "")
+	(sign_extract (match_operand 1 "nonimmediate_operand" "")
+		      (match_operand 2 "const_int_operand" "")
+		      (match_operand 3 "const_int_operand" "")))]
+  "arm_arch_thumb2"
+{
+  HOST_WIDE_INT width = INTVAL (operands[2]);
+  HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+  if (unaligned_access && MEM_P (operands[1]) && (width == 16 || width == 32)
+      && (bitpos % BITS_PER_UNIT)  == 0)
+    {
+      rtx base_addr;
+      
+      if (BYTES_BIG_ENDIAN)
+	bitpos = GET_MODE_BITSIZE (GET_MODE (operands[0])) - width - bitpos;
+      
+      if (width == 32)
+        {
+	  base_addr = adjust_address (operands[1], SImode,
+				      bitpos / BITS_PER_UNIT);
+	  emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+        }
+      else
+        {
+	  rtx dest = operands[0];
+	  rtx tmp = gen_reg_rtx (SImode);
+	  
+	  /* We may get a paradoxical subreg here.  Strip it off.  */
+	  if (GET_CODE (dest) == SUBREG
+	      && GET_MODE (dest) == SImode
+	      && GET_MODE (SUBREG_REG (dest)) == HImode)
+	    dest = SUBREG_REG (dest);
+	  
+	  if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+	    FAIL;
+	  
+	  base_addr = adjust_address (operands[1], HImode,
+				      bitpos / BITS_PER_UNIT);
+	  emit_insn (gen_unaligned_loadhis (tmp, base_addr));
+	  emit_move_insn (gen_lowpart (SImode, dest), tmp);
+	}
+
+      DONE;
+    }
+  else if (!s_register_operand (operands[1], GET_MODE (operands[1])))
+    FAIL;
+  else if (GET_MODE (operands[0]) == SImode
+	   && GET_MODE (operands[1]) == SImode)
+    {
+      emit_insn (gen_extv_regsi (operands[0], operands[1], operands[2],
+				 operands[3]));
+      DONE;
+    }
+
+  FAIL;
+})
+
+; Helper to expand register forms of extv with the proper modes.
+
+(define_expand "extv_regsi"
+  [(set (match_operand:SI 0 "s_register_operand" "")
+	(sign_extract:SI (match_operand:SI 1 "s_register_operand" "")
+			 (match_operand 2 "const_int_operand" "")
+			 (match_operand 3 "const_int_operand" "")))]
+  ""
+{
+})
+
+; ARMv6+ unaligned load/store instructions (used for packed structure accesses).
+
+(define_insn "unaligned_loadsi"
+  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "Uw,m")]
+		   UNSPEC_UNALIGNED_LOAD))]
+  "unaligned_access && TARGET_32BIT"
+  "ldr%?\t%0, %1\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "load1")])
+
+(define_insn "unaligned_loadhis"
+  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
+	(sign_extend:SI
+	  (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,m")]
+		     UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access && TARGET_32BIT"
+  "ldr%(sh%)\t%0, %1\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_loadhiu"
+  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
+	(zero_extend:SI
+	  (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,m")]
+		     UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access && TARGET_32BIT"
+  "ldr%(h%)\t%0, %1\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_storesi"
+  [(set (match_operand:SI 0 "memory_operand" "=Uw,m")
+	(unspec:SI [(match_operand:SI 1 "s_register_operand" "l,r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access && TARGET_32BIT"
+  "str%?\t%1, %0\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "unaligned_storehi"
+  [(set (match_operand:HI 0 "memory_operand" "=Uw,m")
+	(unspec:HI [(match_operand:HI 1 "s_register_operand" "l,r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access && TARGET_32BIT"
+  "str%(h%)\t%1, %0\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "*extv_reg"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(sign_extract:SI (match_operand:SI 1 "s_register_operand" "r")
                          (match_operand:SI 2 "const_int_operand" "M")
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index be5fd3c..98eace3 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -249,3 +249,7 @@  mfix-cortex-m3-ldrd
 Target Report Var(fix_cm3_ldrd) Init(2)
 Avoid overlapping destination and address registers on LDRD instructions
 that may trigger Cortex-M3 errata.
+
+munaligned-access
+Target Report Var(unaligned_access) Init(2)
+Enable unaligned word and halfword accesses to packed data.
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index f5b8521..53c358e 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -36,7 +36,7 @@ 
 ;; The following memory constraints have been used:
 ;; in ARM/Thumb-2 state: Q, Ut, Uv, Uy, Un, Um, Us
 ;; in ARM state: Uq
-;; in Thumb state: Uu
+;; in Thumb state: Uu, Uw
 
 
 (define_register_constraint "f" "TARGET_ARM ? FPA_REGS : NO_REGS"
@@ -341,6 +341,19 @@ 
 		   && thumb1_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
 						   0)")))
 
+; The 16-bit post-increment LDR/STR accepted by thumb1_legitimate_address_p
+; are actually LDM/STM instructions, so cannot be used to access unaligned
+; data.
+(define_memory_constraint "Uw"
+ "@internal
+  In Thumb state an address that is valid in 16bit encoding, and that can be
+  used for unaligned accesses."
+ (and (match_code "mem")
+      (match_test "TARGET_THUMB
+		   && thumb1_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
+						   0)
+		   && GET_CODE (XEXP (op, 0)) != POST_INC")))
+
 ;; We used to have constraint letters for S and R in ARM state, but
 ;; all uses of these now appear to have been removed.
 
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 0047cd0..2a663b3 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -659,19 +659,19 @@  store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 	  copy_back = true;
 	}
 
-      /* On big-endian machines, we count bits from the most significant.
-	 If the bit field insn does not, we must invert.  */
-
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
-	xbitpos = unit - bitsize - xbitpos;
-
       /* We have been counting XBITPOS within UNIT.
 	 Count instead within the size of the register.  */
-      if (BITS_BIG_ENDIAN && !MEM_P (xop0))
+      if (BYTES_BIG_ENDIAN && !MEM_P (xop0))
 	xbitpos += GET_MODE_BITSIZE (op_mode) - unit;
 
       unit = GET_MODE_BITSIZE (op_mode);
 
+      /* On big-endian machines, we count bits from the most significant.
+	 If the bit field insn does not, we must invert.  */
+
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+	xbitpos = unit - bitsize - xbitpos;
+
       /* Convert VALUE to op_mode (which insv insn wants) in VALUE1.  */
       value1 = value;
       if (GET_MODE (value) != op_mode)
@@ -1528,17 +1528,17 @@  extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 	/* Get ref to first byte containing part of the field.  */
 	xop0 = adjust_address (xop0, byte_mode, xoffset);
 
-      /* On big-endian machines, we count bits from the most significant.
-	 If the bit field insn does not, we must invert.  */
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
-	xbitpos = unit - bitsize - xbitpos;
-
       /* Now convert from counting within UNIT to counting in EXT_MODE.  */
-      if (BITS_BIG_ENDIAN && !MEM_P (xop0))
+      if (BYTES_BIG_ENDIAN && !MEM_P (xop0))
 	xbitpos += GET_MODE_BITSIZE (ext_mode) - unit;
 
       unit = GET_MODE_BITSIZE (ext_mode);
 
+      /* On big-endian machines, we count bits from the most significant.
+	 If the bit field insn does not, we must invert.  */
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+	xbitpos = unit - bitsize - xbitpos;
+
       if (xtarget == 0)
 	xtarget = xspec_target = gen_reg_rtx (tmode);