[PATCH:,ARM] PR 45335 Use ldrd and strd to access two consecutive words

Message ID AANLkTi=yvvsJVJwymkE5nTL-q8RnW+WXMf5GNw1sTqpe@mail.gmail.com
State New

Commit Message

Carrot Wei Jan. 18, 2011, 2:59 p.m. UTC
On Fri, Jan 14, 2011 at 5:51 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
>
> On Fri, 2011-01-14 at 17:18 +0800, Carrot Wei wrote:
>> On Thu, Jan 13, 2011 at 6:25 PM, Ramana Radhakrishnan
>> <ramana.gcc@googlemail.com> wrote:
>> > On Thu, Jan 13, 2011 at 9:27 AM, Carrot Wei <carrot@google.com> wrote:
>> >> One question about the attribute length. It looks the attribute
>> >> expression is not very powerful according to
>> >> http://gcc.gnu.org/onlinedocs/gccint/Expressions.html#Expressions,
>> >> then how can I express following expressions:
>> >>
>> >> if (fix_cm3_ldrd && (operands[2] == operands[0]))
>> >
>> >
>> > Can't you express this as ?
>> >
>> > (set_attr "length"
>> >        (and (ne (symbol_ref ("fix_cm3_ldrd") (const_int 0))
>> >               (eq (match_dup 2) (match_dup 0))) (const_int <length of insns>)
>> >
>> According to http://gcc.gnu.org/onlinedocs/gccint/Insn-Lengths.html#Insn-Lengths,
>> (match_dup n) can only be used with a label_ref operand.
>>
>> >
>> > It's too early in the day and I haven't yet had my coffee but you
>> > probably get the picture.
>> >
>> > If you can't get separate constraints as Richard says or the logic as
>> > I mention above becomes too complicated which I suspect it might, it
>> > would be worth factoring out the logic into a common C function
>> > parameterised on counting vs emitting. Then you could just call the
>> > same function with and without the "emit/count" flag from the place
>> > where you set the length attribute and the place where you want to
>> > emit the assembler.
>> >
>> > Then the logic is in one place and more maintainable than having 2
>> > implementations of the same logic.
>> >
>> This is a good idea!
>>
>
> It can sometimes be done that way, but beware: separating the length
> calculations from the insns it relates to is a long-term maintenance
> nightmare, because now the code is in two separate places.
>
> R.

Ramana's method is to put the instruction output and counting in one place,
so it's easy to keep them synchronized.
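
As an illustration of the emit/count idea, here is a minimal standalone
sketch. It is not the code in the patch: emit_line is a hypothetical stand-in
for GCC's output_asm_insn, the register names are fixed, and the function
name and its boolean parameters are made up for the example. The point is
only that one function makes every decision once and can run in either mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for GCC's output_asm_insn: print one line.  */
static void
emit_line (const char *text)
{
  printf ("\t%s\n", text);
}

/* Emit a two-word load and return its length in bytes.  When EMIT_P is
   false nothing is printed and only the length is computed, so the
   output template and the length attribute share the same logic.
   USE_LDRD and LOW_REGS are assumptions of this sketch; the real code
   derives such facts from the operands.  */
static int
output_two_word_load (bool use_ldrd, bool low_regs, bool emit_p)
{
  int length = 0;

  if (use_ldrd)
    {
      if (emit_p)
        emit_line ("ldrd\tr0, r1, [r2]");
      length = 4;
    }
  else
    {
      /* Two single loads; assume a 16-bit encoding for low registers.  */
      if (emit_p)
        emit_line ("ldr\tr1, [r2, #4]");
      length += low_regs ? 2 : 4;

      if (emit_p)
        emit_line ("ldr\tr0, [r2]");
      length += low_regs ? 2 : 4;
    }

  /* Every path emits at least one instruction.  */
  assert (length > 0);
  return length;
}
```

Calling output_two_word_load (false, true, false) only counts, while the
same call with the last argument set to true also prints the two ldr lines;
both return the same length because they take the same branches.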

The latest version of the patch makes the following changes compared to
the earlier version: added support for ARM ldrd/strd instructions, added a
length attribute to the insn patterns, and moved the insn patterns to ldmstm.md.

It has passed the dejagnu testing on arm qemu.

thanks
Carrot



ChangeLog:
2011-01-18  Wei Guozhi  <carrot@google.com>

        PR target/45335
        * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_ib, stm2_ib, ldm2_da,
        stm2_da, ldm2_db, stm2_db): Add condition !arm_arch7 to these insns.
        (ldrd, ldrd_reg1, ldrd_reg2 and peephole2): New insn patterns and
        related peephole2.
        (strd, strd_reg1, strd_reg2 and peephole2): New insn patterns and
        related peephole2.
        * gcc/config/arm/arm-protos.h (arm_check_ldrd_operands): New prototype.
        (arm_legitimate_ldrd_p): New prototype.
        (arm_output_ldrd): New prototype.
        * gcc/config/arm/arm.c (arm_check_ldrd_operands): New function.
        (arm_legitimate_ldrd_p): New function.
        (arm_output_ldrd): New function.


2011-01-18  Wei Guozhi  <carrot@google.com>

        PR target/45335
        * gcc.target/arm/pr45335.c: New test.
        * gcc.target/arm/pr45335-2.c: New test.
        * gcc.target/arm/pr45335-3.c: New test.
        * gcc.target/arm/pr40457-1.c: Add another possible output "ldrd".
        * gcc.target/arm/pr40457-2.c: Changed to store 3 words.
        * gcc.target/arm/pr40457-3.c: Changed to store 3 words.
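
For readers who want to try the operand rules outside GCC, the following is
a simplified model of the check performed by arm_check_ldrd_operands in the
patch. It is a sketch under assumptions: plain integers replace the rtx
operands, the function names are invented for the example, and the thumb2
flag replaces the TARGET_THUMB2/TARGET_ARM macros. The offset limits (1020
for Thumb-2, 255 for ARM) and the ldm offset window are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Each offset must be word aligned and lie in
   [-max_offset, max_offset + 4]; the upper slack is for the second word.  */
static bool
offset_in_range (long off, long max_offset)
{
  return off <= max_offset + 4 && off >= -max_offset && (off & 3) == 0;
}

/* Model of the operand check: REGNO1/OFF1 and REGNO2/OFF2 describe the
   two word accesses relative to a common base register.  */
static bool
check_ldrd_operands_model (int regno1, int regno2,
                           long off1, long off2, bool thumb2)
{
  long max_offset = thumb2 ? 1020 : 255;

  if (!offset_in_range (off1, max_offset)
      || !offset_in_range (off2, max_offset))
    return false;

  /* Order the pair so that (regno1, off1) is the lower address.  */
  if (off1 + 4 != off2)
    {
      if (off2 + 4 != off1)
        return false;
      int r = regno1;  regno1 = regno2;  regno2 = r;
      long o = off1;   off1 = off2;      off2 = o;
    }
  assert (off1 + 4 == off2);

  if (thumb2)
    return true;

  /* ARM mode: ldrd needs an even register followed by the next one.  */
  if ((regno1 & 1) == 0 && regno1 + 1 == regno2)
    return true;

  /* Otherwise fall back to ldm, which needs ascending registers and one
     of the db/da/ia/ib starting offsets.  */
  return regno1 < regno2 && off1 <= 4 && off1 >= -8;
}
```

For example, registers 1 and 2 at offsets 16 and 20 are accepted in Thumb-2
mode but rejected in ARM mode, because the pair is not even/odd and the
offset is outside the ldm window.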

Comments

Jie Zhang Jan. 27, 2011, 4:10 a.m. UTC | #1
Hi Carrot,

I just found your patch is not in a good format. See the following piece 
of it:

On 01/18/2011 10:59 PM, Carrot Wei wrote:
> +      /* TARGET_ARM  */
> +      if (((regno1 & 1) == 0) && ((regno1 + 1) == regno2))
> /* ldrd  */
> +	return true;
> +

The "/* ldrd */" line is bad and patch reports an error for it. There 
are several other similar cases. So the patch can't be applied easily. 
Could you resend your patch, in a good format?


Regards,
Carrot Wei March 15, 2011, 9:19 a.m. UTC | #2
The trunk is open again. Could a maintainer continue reviewing this patch?

thanks
Carrot

On Tue, Jan 18, 2011 at 10:59 PM, Carrot Wei <carrot@google.com> wrote:
> Ramana's method is to put the instruction output and counting in one place,
> so it's easy to keep them synchronized.
>
> The latest version of the patch makes the following changes compared to
> the earlier version: added support for ARM ldrd/strd instructions, added a
> length attribute to the insn patterns, and moved the insn patterns to ldmstm.md.
>
> It has passed the dejagnu testing on arm qemu.
>
> thanks
> Carrot
Mike Stump March 24, 2011, 12:25 a.m. UTC | #3
On Jan 18, 2011, at 6:59 AM, Carrot Wei wrote:
> +(define_insn "*ldrd"
> +  [(parallel [(set (match_operand:SI 0 "arm_hard_register_operand" "")

parallel is implicit, you can safely remove it from all define_insns.

Patch

Index: arm.c
===================================================================
--- arm.c	(revision 168737)
+++ arm.c	(working copy)
@@ -23574,4 +23574,234 @@  arm_preferred_rename_class (reg_class_t
     return NO_REGS;
 }

+/* Check the validity of operands in an ldrd/strd instruction.  */
+bool
+arm_check_ldrd_operands (rtx reg1, rtx reg2, rtx off1, rtx off2)
+{
+  HOST_WIDE_INT offset1 = 0;
+  HOST_WIDE_INT offset2 = 0;
+  int regno1 = REGNO (reg1);
+  int regno2 = REGNO (reg2);
+  HOST_WIDE_INT max_offset = 1020;
+
+  if (TARGET_ARM)
+    max_offset = 255;
+
+  if (off1 != NULL_RTX)
+    offset1 = INTVAL (off1);
+  if (off2 != NULL_RTX)
+    offset2 = INTVAL (off2);
+
+  /* The offset range of LDRD is [-max_offset, max_offset]. Here we check if
+     both offsets lie in the range [-max_offset, max_offset+4]. If one of the
+     offsets is max_offset+4, the following condition
+         ((offset1 + 4) == offset2)
+     will ensure offset1 to be max_offset, suitable for instruction LDRD.  */
+  if ((offset1 > (max_offset + 4)) || (offset1 < -max_offset)
+      || ((offset1 & 3) != 0))
+    return false;
+  if ((offset2 > (max_offset + 4)) || (offset2 < -max_offset)
+      || ((offset2 & 3) != 0))
+    return false;
+
+  if ((offset1 + 4) == offset2)
+    {
+      if (TARGET_THUMB2)
+	return true;
+
+      /* TARGET_ARM  */
+      if (((regno1 & 1) == 0) && ((regno1 + 1) == regno2))  /* ldrd  */
+	return true;
+
+      if ((regno1 < regno2) && ((offset1 <= 4) && (offset1 >= -8)))  /* ldm  */
+	return true;
+    }
+  if ((offset2 + 4) == offset1)
+    {
+      if (TARGET_THUMB2)
+	return true;
+
+      /* TARGET_ARM  */
+      if (((regno2 & 1) == 0) && ((regno2 + 1) == regno1))  /* ldrd  */
+	return true;
+
+      if ((regno2 < regno1) && ((offset2 <= 4) && (offset2 >= -8)))  /* ldm  */
+	return true;
+    }
+
+  return false;
+}
+
+/* Check if the two memory accesses can be merged to an ldrd/strd instruction.
+   That is they use the same base register, and the gap between constant
+   offsets should be 4.  */
+bool
+arm_legitimate_ldrd_p (rtx reg1, rtx reg2, rtx mem1, rtx mem2, bool ldrd)
+{
+  rtx base1, base2;
+  rtx offset1 = NULL_RTX;
+  rtx offset2 = NULL_RTX;
+  rtx addr1 = XEXP (mem1, 0);
+  rtx addr2 = XEXP (mem2, 0);
+
+  if (MEM_VOLATILE_P (mem1) || MEM_VOLATILE_P (mem2))
+    return false;
+
+  if (REG_P (addr1))
+    base1 = addr1;
+  else if (GET_CODE (addr1) == PLUS)
+    {
+      base1 = XEXP (addr1, 0);
+      offset1 = XEXP (addr1, 1);
+      if (!REG_P (base1) || (GET_CODE (offset1) != CONST_INT))
+	return false;
+    }
+  else
+    return false;
+
+  if (REG_P (addr2))
+    base2 = addr2;
+  else if (GET_CODE (addr2) == PLUS)
+    {
+      base2 = XEXP (addr2, 0);
+      offset2 = XEXP (addr2, 1);
+      if (!REG_P (base2) || (GET_CODE (offset2) != CONST_INT))
+	return false;
+    }
+  else
+    return false;
+
+  if (base1 != base2)
+    return false;
+
+  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
+    return false;
+
+  return arm_check_ldrd_operands (reg1, reg2, offset1, offset2);
+}
+
+/* Output the instructions for ldrd and return the number of bytes
+   output.  Do not actually output instructions if EMIT_P is false.  */
+int
+arm_output_ldrd (rtx reg1, rtx reg2, rtx base, rtx off1, rtx off2, bool emit_p)
+{
+  int length = 0;
+  rtx operands[5];
+  HOST_WIDE_INT offset1 = 0;
+  HOST_WIDE_INT offset2 = 0;
+
+  if (off1 != NULL_RTX)
+    offset1 = INTVAL (off1);
+  else
+    off1 = GEN_INT (0);
+  if (off2 != NULL_RTX)
+    offset2 = INTVAL (off2);
+  else
+    off2 = GEN_INT (0);
+  if (offset1 > offset2)
+    {
+      rtx tmp;
+      HOST_WIDE_INT t = offset1;   offset1 = offset2;   offset2 = t;
+      tmp = off1;   off1 = off2;   off2 = tmp;
+      tmp = reg1;   reg1 = reg2;   reg2 = tmp;
+    }
+
+  operands[0] = reg1;
+  operands[1] = reg2;
+  operands[2] = base;
+  operands[3] = off1;
+  operands[4] = off2;
+
+  if (TARGET_THUMB2)
+    {
+      if (fix_cm3_ldrd && (base == reg1))
+	{
+	  if (offset1 <= -256)
+	    {
+	      if (emit_p)
+		output_asm_insn ("sub\t%2, %2, %n3", operands);
+	      length = 4;
+
+	      if (emit_p)
+		output_asm_insn ("ldr\t%1, [%2, #4]", operands);
+	      if (low_register_operand (reg2, SImode)
+		  && low_register_operand (base, SImode))
+		length += 2;
+	      else
+		length += 4;
+
+	      if (emit_p)
+		output_asm_insn ("ldr\t%0, [%2]", operands);
+	      if (low_register_operand (base, SImode))
+		length += 2;
+	      else
+		length += 4;
+	    }
+	  else
+	    {
+	      if (emit_p)
+		output_asm_insn ("ldr\t%1, [%2, %4]", operands);
+	      if (low_register_operand (reg2, SImode) && (offset2 >= 0)
+		  && low_register_operand (base, SImode) && (offset2 < 128))
+		length += 2;
+	      else
+		length += 4;
+
+	      if (emit_p)
+		output_asm_insn ("ldr\t%0, [%2, %3]", operands);
+	      if (low_register_operand (base, SImode)
+		  && (offset1 >= 0) && (offset1 < 128))
+		length += 2;
+	      else
+		length += 4;
+	    }
+	}
+      else
+	{
+	  if (emit_p)
+	    output_asm_insn ("ldrd\t%0, %1, [%2, %3]", operands);
+	  length = 4;
+	}
+    }
+  else    /* TARGET_ARM  */
+    {
+      if ((REGNO (reg2) == (REGNO (reg1) + 1)) && ((REGNO (reg1) & 1) == 0))
+	{
+	  if (emit_p)
+	    output_asm_insn ("ldrd\t%0, %1, [%2, %3]", operands);
+	  length = 4;
+	}
+      else
+	{
+	  if (emit_p)
+	    {
+	      switch (offset1)
+		{
+		case -8:
+		  output_asm_insn ("ldm%(db%)\t%2, {%0, %1}", operands);
+		  break;
+
+		case -4:
+		  output_asm_insn ("ldm%(da%)\t%2, {%0, %1}", operands);
+		  break;
+
+		case 0:
+		  output_asm_insn ("ldm%(ia%)\t%2, {%0, %1}", operands);
+		  break;
+
+		case 4:
+		  output_asm_insn ("ldm%(ib%)\t%2, {%0, %1}", operands);
+		  break;
+
+		default:
+		  gcc_unreachable ();
+		}
+	    }
+	  length = 4;
+	}
+    }
+
+  return length;
+}
+
 #include "gt-arm.h"
Index: arm-protos.h
===================================================================
--- arm-protos.h	(revision 168737)
+++ arm-protos.h	(working copy)
@@ -150,6 +150,9 @@  extern void arm_expand_sync (enum machin
 extern const char *arm_output_memory_barrier (rtx *);
 extern const char *arm_output_sync_insn (rtx, rtx *);
 extern unsigned int arm_sync_loop_insns (rtx , rtx *);
+extern bool arm_check_ldrd_operands (rtx, rtx, rtx, rtx);
+extern bool arm_legitimate_ldrd_p (rtx, rtx, rtx, rtx, bool);
+extern int arm_output_ldrd (rtx, rtx, rtx, rtx, rtx, bool);

 #if defined TREE_CODE
 extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
Index: ldmstm.md
===================================================================
--- ldmstm.md	(revision 168737)
+++ ldmstm.md	(working copy)
@@ -852,7 +852,7 @@  (define_insn "*ldm2_ia"
      (set (match_operand:SI 2 "arm_hard_register_operand" "")
           (mem:SI (plus:SI (match_dup 3)
                   (const_int 4))))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
+  "TARGET_32BIT && !arm_arch7 && XVECLEN (operands[0], 0) == 2"
   "ldm%(ia%)\t%3, {%1, %2}"
   [(set_attr "type" "load2")
    (set_attr "predicable" "yes")])
@@ -901,7 +901,7 @@  (define_insn "*stm2_ia"
           (match_operand:SI 1 "arm_hard_register_operand" ""))
      (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
+  "TARGET_32BIT && !arm_arch7 && XVECLEN (operands[0], 0) == 2"
   "stm%(ia%)\t%3, {%1, %2}"
   [(set_attr "type" "store2")
    (set_attr "predicable" "yes")])
@@ -939,7 +939,7 @@  (define_insn "*ldm2_ib"
      (set (match_operand:SI 2 "arm_hard_register_operand" "")
           (mem:SI (plus:SI (match_dup 3)
                   (const_int 8))))])]
-  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
+  "TARGET_ARM && !arm_arch7 && XVECLEN (operands[0], 0) == 2"
   "ldm%(ib%)\t%3, {%1, %2}"
   [(set_attr "type" "load2")
    (set_attr "predicable" "yes")])
@@ -965,7 +965,7 @@  (define_insn "*stm2_ib"
           (match_operand:SI 1 "arm_hard_register_operand" ""))
      (set (mem:SI (plus:SI (match_dup 3) (const_int 8)))
           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
-  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
+  "TARGET_ARM && !arm_arch7 && XVECLEN (operands[0], 0) == 2"
   "stm%(ib%)\t%3, {%1, %2}"
   [(set_attr "type" "store2")
    (set_attr "predicable" "yes")])
@@ -990,7 +990,7 @@  (define_insn "*ldm2_da"
                   (const_int -4))))
      (set (match_operand:SI 2 "arm_hard_register_operand" "")
           (mem:SI (match_dup 3)))])]
-  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
+  "TARGET_ARM && !arm_arch7 && XVECLEN (operands[0], 0) == 2"
   "ldm%(da%)\t%3, {%1, %2}"
   [(set_attr "type" "load2")
    (set_attr "predicable" "yes")])
@@ -1015,7 +1015,7 @@  (define_insn "*stm2_da"
           (match_operand:SI 1 "arm_hard_register_operand" ""))
      (set (mem:SI (match_dup 3))
           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
-  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
+  "TARGET_ARM && !arm_arch7 && XVECLEN (operands[0], 0) == 2"
   "stm%(da%)\t%3, {%1, %2}"
   [(set_attr "type" "store2")
    (set_attr "predicable" "yes")])
@@ -1041,7 +1041,7 @@  (define_insn "*ldm2_db"
      (set (match_operand:SI 2 "arm_hard_register_operand" "")
           (mem:SI (plus:SI (match_dup 3)
                   (const_int -4))))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
+  "TARGET_32BIT && !arm_arch7 && XVECLEN (operands[0], 0) == 2"
   "ldm%(db%)\t%3, {%1, %2}"
   [(set_attr "type" "load2")
    (set_attr "predicable" "yes")])
@@ -1067,7 +1067,7 @@  (define_insn "*stm2_db"
           (match_operand:SI 1 "arm_hard_register_operand" ""))
      (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
+  "TARGET_32BIT && !arm_arch7 && XVECLEN (operands[0], 0) == 2"
   "stm%(db%)\t%3, {%1, %2}"
   [(set_attr "type" "store2")
    (set_attr "predicable" "yes")])
@@ -1189,3 +1189,211 @@  (define_peephole2
     FAIL;
 })

+(define_insn "*ldrd"
+  [(parallel [(set (match_operand:SI 0 "arm_hard_register_operand" "")
+		   (mem:SI (plus:SI
+				(match_operand:SI 2 "s_register_operand" "rk")
+				(match_operand:SI 3 "const_int_operand" ""))))
+	      (set (match_operand:SI 1 "arm_hard_register_operand" "")
+		   (mem:SI (plus:SI (match_dup 2)
+			     (match_operand:SI 4 "const_int_operand" ""))))])]
+  "TARGET_32BIT && arm_arch7
+   && arm_check_ldrd_operands (operands[0], operands[1],
+			       operands[3], operands[4])"
+  "*
+  arm_output_ldrd (operands[0], operands[1],
+		   operands[2], operands[3], operands[4], true);
+  return \"\";
+  "
+  [(set (attr "length")
+	(symbol_ref ("arm_output_ldrd (operands[0], operands[1], operands[2],
+				       operands[3], operands[4], false)")))]
+)
+
+(define_insn "*ldrd_reg1"
+  [(parallel [(set (match_operand:SI 0 "arm_hard_register_operand" "")
+		   (mem:SI (match_operand:SI 2 "s_register_operand" "rk")))
+	      (set (match_operand:SI 1 "arm_hard_register_operand" "")
+		   (mem:SI (plus:SI (match_dup 2)
+			     (match_operand:SI 3 "const_int_operand" ""))))])]
+  "TARGET_32BIT && arm_arch7
+   && arm_check_ldrd_operands (operands[0], operands[1], NULL_RTX, operands[3])"
+  "*
+  arm_output_ldrd (operands[0], operands[1],
+		   operands[2], NULL_RTX, operands[3], true);
+  return \"\";
+  "
+  [(set (attr "length")
+	(symbol_ref ("arm_output_ldrd (operands[0], operands[1], operands[2],
+				       NULL_RTX, operands[3], false)")))]
+)
+
+(define_insn "*ldrd_reg2"
+  [(parallel [(set (match_operand:SI 0 "arm_hard_register_operand" "")
+		   (mem:SI (plus:SI
+				(match_operand:SI 2 "s_register_operand" "rk")
+				(match_operand:SI 3 "const_int_operand" ""))))
+	      (set (match_operand:SI 1 "arm_hard_register_operand" "")
+		   (mem:SI (match_dup 2)))])]
+  "TARGET_32BIT && arm_arch7
+   && arm_check_ldrd_operands (operands[0], operands[1], operands[3], NULL_RTX)"
+  "*
+  arm_output_ldrd (operands[0], operands[1],
+		   operands[2], operands[3], NULL_RTX, true);
+  return \"\";
+  "
+  [(set (attr "length")
+	(symbol_ref ("arm_output_ldrd (operands[0], operands[1], operands[2],
+				       operands[3], NULL_RTX, false)")))]
+)
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+	(match_operand:SI 2 "memory_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+	(match_operand:SI 3 "memory_operand" ""))]
+  "TARGET_32BIT && arm_arch7
+   && arm_legitimate_ldrd_p (operands[0], operands[1],
+				operands[2], operands[3], true)"
+  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
+		   (match_operand:SI 2 "memory_operand" ""))
+	      (set (match_operand:SI 1 "s_register_operand" "")
+		   (match_operand:SI 3 "memory_operand" ""))])]
+  ""
+)
+
+(define_insn "*strd"
+  [(parallel [(set (mem:SI
+			(plus:SI (match_operand:SI 2 "s_register_operand" "rk")
+				 (match_operand:SI 3 "const_int_operand" "")))
+		   (match_operand:SI 0 "arm_hard_register_operand" ""))
+	      (set (mem:SI (plus:SI (match_dup 2)
+				 (match_operand:SI 4 "const_int_operand" "")))
+		   (match_operand:SI 1 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && arm_arch7
+   && arm_check_ldrd_operands (operands[0], operands[1],
+			       operands[3], operands[4])"
+  "*
+  {
+    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
+    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
+    if (offset1 > offset2)
+      {
+	rtx tmp = operands[0];  operands[0] = operands[1];  operands[1] = tmp;
+	tmp = operands[3];  operands[3] = operands[4];  operands[4] = tmp;
+	offset1 = INTVAL (operands[3]);
+	offset2 = INTVAL (operands[4]);
+      }
+    if (TARGET_THUMB2)
+      return \"strd\\t%0, %1, [%2, %3]\";
+    else          /* TARGET_ARM  */
+      {
+	if ((REGNO (operands[1]) == (REGNO (operands[0]) + 1))
+	    && ((REGNO (operands[0]) & 1) == 0))
+	  return \"strd\\t%0, %1, [%2, %3]\";
+	else if (offset1 == -8)
+	  return \"stm%(db%)\\t%2, {%0, %1}\";
+	else      /* offset1 == 4  */
+	  return \"stm%(ib%)\\t%2, {%0, %1}\";
+      }
+  }"
+  [(set_attr "length" "4")]
+)
+
+(define_insn "*strd_reg1"
+  [(parallel [(set (mem:SI (match_operand:SI 2 "s_register_operand" "rk"))
+		   (match_operand:SI 0 "arm_hard_register_operand" ""))
+	      (set (mem:SI (plus:SI (match_dup 2)
+				(match_operand:SI 3 "const_int_operand" "")))
+		   (match_operand:SI 1 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && arm_arch7
+   && arm_check_ldrd_operands (operands[0], operands[1], NULL_RTX, operands[3])"
+  "*
+  {
+    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
+    if (TARGET_THUMB2)
+      {
+	if (offset2 == 4)
+	  return \"strd\\t%0, %1, [%2]\";
+	else
+	  return \"strd\\t%1, %0, [%2, %3]\";
+      }
+    else           /* TARGET_ARM  */
+      {
+	if (offset2 == 4)
+	  {
+	    if ((REGNO (operands[1]) == (REGNO (operands[0]) + 1))
+		&& ((REGNO (operands[0]) & 1) == 0))
+	      return \"strd\\t%0, %1, [%2]\";
+	    else
+	      return \"stm%(ia%)\\t%2, {%0, %1}\";
+	  }
+	else      /* offset2 == -4  */
+	  {
+	    if ((REGNO (operands[0]) == (REGNO (operands[1]) + 1))
+		&& ((REGNO (operands[1]) & 1) == 0))
+	      return \"strd\\t%1, %0, [%2, %3]\";
+	    else
+	      return \"stm%(da%)\\t%2, {%1, %0}\";
+	  }
+      }
+  }"
+  [(set_attr "length" "4")]
+)
+
+(define_insn "*strd_reg2"
+  [(parallel [(set (mem:SI (plus:SI
+				(match_operand:SI 2 "s_register_operand" "rk")
+				(match_operand:SI 3 "const_int_operand" "")))
+		   (match_operand:SI 0 "arm_hard_register_operand" ""))
+	      (set (mem:SI (match_dup 2))
+		   (match_operand:SI 1 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && arm_arch7
+   && arm_check_ldrd_operands (operands[0], operands[1], operands[3], NULL_RTX)"
+  "*
+  {
+    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
+    if (TARGET_THUMB2)
+      {
+	if (offset1 == -4)
+	  return \"strd\\t%0, %1, [%2, %3]\";
+	else
+	  return \"strd\\t%1, %0, [%2]\";
+      }
+    else           /* TARGET_ARM  */
+      {
+	if (offset1 == -4)
+	  {
+	    if ((REGNO (operands[1]) == (REGNO (operands[0]) + 1))
+		&& ((REGNO (operands[0]) & 1) == 0))
+	      return \"strd\\t%0, %1, [%2, %3]\";
+	    else
+	      return \"stm%(da%)\\t%2, {%0, %1}\";
+	  }
+	else
+	  {
+	    if ((REGNO (operands[0]) == (REGNO (operands[1]) + 1))
+		&& ((REGNO (operands[1]) & 1) == 0))
+	      return \"strd\\t%1, %0, [%2]\";
+	    else
+	      return \"stm%(ia%)\\t%2, {%1, %0}\";
+	  }
+      }
+  }"
+  [(set_attr "length" "4")]
+)
+
+(define_peephole2
+  [(set (match_operand:SI 2 "memory_operand" "")
+	(match_operand:SI 0 "s_register_operand" ""))
+   (set (match_operand:SI 3 "memory_operand" "")
+	(match_operand:SI 1 "s_register_operand" ""))]
+  "TARGET_32BIT && arm_arch7
+   && arm_legitimate_ldrd_p (operands[0], operands[1],
+				operands[2], operands[3], false)"
+  [(parallel [(set (match_operand:SI 2 "memory_operand" "")
+		   (match_operand:SI 0 "s_register_operand" ""))
+	      (set (match_operand:SI 3 "memory_operand" "")
+		   (match_operand:SI 1 "s_register_operand" ""))])]
+  ""
+)


Index: pr40457-3.c
===================================================================
--- pr40457-3.c	(revision 168737)
+++ pr40457-3.c	(working copy)
@@ -5,6 +5,7 @@  void foo(int* p)
 {
   p[0] = 1;
   p[1] = 0;
+  p[2] = 2;
 }

 /* { dg-final { scan-assembler "stm" } } */
Index: pr45335-2.c
===================================================================
--- pr45335-2.c	(revision 0)
+++ pr45335-2.c	(revision 0)
@@ -0,0 +1,10 @@ 
+/* { dg-options "-Os -march=armv7-a" }  */
+/* { dg-do compile } */
+
+void foo(int a, int b, int* p)
+{
+  p[2] = a;
+  p[3] = b;
+}
+
+/* { dg-final { scan-assembler "strd" } } */
Index: pr45335-3.c
===================================================================
--- pr45335-3.c	(revision 0)
+++ pr45335-3.c	(revision 0)
@@ -0,0 +1,12 @@ 
+/* { dg-options "-Os -march=armv7-a" }  */
+/* { dg-do compile } */
+
+int foo(int a, int b, int* p, int *q)
+{
+  a = p[2] + p[3];
+  *q = a;
+  *p = a;
+  return a;
+}
+
+/* { dg-final { scan-assembler "ldrd" } } */
Index: pr40457-1.c
===================================================================
--- pr40457-1.c	(revision 168737)
+++ pr40457-1.c	(working copy)
@@ -7,4 +7,4 @@  int bar(int* p)
   return x;
 }

-/* { dg-final { scan-assembler "ldm" } } */
+/* { dg-final { scan-assembler "ldm|ldrd" } } */
Index: pr40457-2.c
===================================================================
--- pr40457-2.c	(revision 168737)
+++ pr40457-2.c	(working copy)
@@ -5,6 +5,7 @@  void foo(int* p)
 {
   p[0] = 1;
   p[1] = 0;
+  p[2] = 2;
 }

 /* { dg-final { scan-assembler "stm" } } */
Index: pr45335.c
===================================================================
--- pr45335.c	(revision 0)
+++ pr45335.c	(revision 0)
@@ -0,0 +1,22 @@ 
+/* { dg-options "-mthumb -O2" } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-final { scan-assembler "ldrd" } } */
+/* { dg-final { scan-assembler "strd" } } */
+
+struct S
+{
+    void* p1;
+    void* p2;
+    void* p3;
+    void* p4;
+};
+
+extern printf(char*, ...);
+
+void foo1(struct S* fp, struct S* otherSaveArea)
+{
+    struct S* saveA = fp - 1;
+    printf("StackSaveArea for fp %p [%p/%p]:\n", fp, saveA, otherSaveArea);
+    printf("prevFrame=%p savedPc=%p meth=%p curPc=%p fp[0]=0x%08x\n",
+        saveA->p1, saveA->p2, saveA->p3, saveA->p4, *(unsigned int*)fp);
+}