diff mbox

[PATCH:,ARM] PR 45335 Use ldrd and strd to access two consecutive words

Message ID AANLkTi=obfc3R-sguDE4uaU_Ni+ciNz5gtPVKnG-JBVB@mail.gmail.com
State New
Headers show

Commit Message

Carrot Wei Oct. 16, 2010, 12:27 p.m. UTC
On Wed, Oct 13, 2010 at 7:01 PM, Paul Brook <paul@codesourcery.com> wrote:
>> ChangeLog:
>> 2010-09-04  Wei Guozhi  <carrot@google.com>
>>
>>         PR target/45335
>>         * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>         thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>         peephole2.
>>         (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>         New insn pattern and related peephole2.
>>         * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>         (thumb2_check_ldrd_operands): New function.
>>         (thumb2_prefer_ldmstm): New function.
>>         * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New
>> prototype. (thumb2_check_ldrd_operands): New prototype.
>>         (thumb2_prefer_ldmstm): New prototype.
>>         * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>         Change the ldm/stm patterns with 2 words to ARM only.
>>         * gcc/config/arm/constraints.md (Py): New thumb2 constant
>> constraint suitable to ldrd/strd instructions.
>
> Not ok.
>
> Why is this restricted to Thumb mode? The ARM variant of ldrd isn't quite as
> flexible, but still provides a useful improvement over ldm.
>
I agree the ARM version is also useful. But it brings much less
benefit with too much complexity (due to more restriction and insn
pattern conflict with ldm). So I will leave it as a future
improvement.

> This transformation is only valid on ARMv7 cores. On earlier hardware
> (depending on system configuration) it may cause undefined behavior or an
> alignment trap.
>
done.

> The range on -1020 to +1024 is used in several places, but without any
> apparent explanation of why it's different to the range of an ldrd
> instruction.  I figured it out eventually, but it deserves a comment.
>
Comments added.

>> +  "TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
>> +                                           operands[2], 0, operands[3], 1)"
>
> Passed operands do not match expected types. Specifically "0" is not an rtx
> (should be "NULL_RTX"), and "1" is not a boolean value (should be "true").
> Many other occurrences.
>
Fixed.

>> +(define_constraint "Py"
>> +  "@internal In Thumb-2 state a constant that is a multiple of 4 in the
>> +   range -1020 to 1024"
>
> This comment seems particularly pointless. You should mention why this
> exists/where it is used.
>
> I think you're better off enforcing this in the insn condition, and remove
> this constraint. At least half the uses (the -reg[12] insns) are incorrect,
> and you already need the condition to enforce the dependency between the
> operands.
>
I removed this constraint and add the check to insn condition.

>> +thumb2_check_ldrd_operands (rtx reg1, rtx reg2, rtx base,
>>...
>> +  if (ldrd && (reg1 == reg2))
>> +    return false;
>
> This function is part of the instruction condition.  Instruction conditions
> must not be used to enforce register allocation.
>
removed.

>> +thumb2_legitimate_ldrd_p (
>>...
>> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>> +    return false;
>
> You're incorrectly assuming offset1 < offset2, which might not be true at this
> point.
>
The following check assumes offset1 < offset2
+  if ((offset1 + 4) == offset2)
+    return true;

And another check assumes offset2 < offset1, so both cases are covered.
+  if ((offset2 + 4) == offset1)
+    return true;

>> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>> +     cost.  */
>> +  return false;
>
> Code clearly doesn't match the comment.  In fact this function always returns
> false.
>
Richard mentioned that in some cases (specifically cortex A9) ldm has
less cost than ldrd and we should model this in the insn pattern. This
function is used for this. But I don't know the cortex A9 architecture
detail, so it should be filled by somebody with more knowledge about
it in future.

Wei Guozhi


ChangeLog:
2010-10-16  Wei Guozhi  <carrot@google.com>

        PR target/45335
        * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
        thumb2_ldrd_reg2 and peephole2): New insn pattern and related
        peephole2.
        (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
        New insn pattern and related peephole2.
        * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
        (thumb2_check_ldrd_operands): New function.
        (thumb2_prefer_ldmstm): New function.
        * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New prototype.
        (thumb2_check_ldrd_operands): New prototype.
        (thumb2_prefer_ldmstm): New prototype.
        * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
        Change the ldm/stm patterns with 2 words to ARM only.


2010-10-16  Wei Guozhi  <carrot@google.com>

        PR target/45335
        * gcc.target/arm/pr45335.c: New test.
        * gcc.target/arm/pr40457-1.c: Changed to load 3 words.
        * gcc.target/arm/pr40457-2.c: Changed to store 3 words.
        * gcc.target/arm/pr40457-3.c: Changed to store 3 words.

Comments

Carrot Wei Oct. 24, 2010, 1:46 p.m. UTC | #1
Ping

On Sat, Oct 16, 2010 at 8:27 PM, Carrot Wei <carrot@google.com> wrote:
> On Wed, Oct 13, 2010 at 7:01 PM, Paul Brook <paul@codesourcery.com> wrote:
>>> ChangeLog:
>>> 2010-09-04  Wei Guozhi  <carrot@google.com>
>>>
>>>         PR target/45335
>>>         * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>         thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>         peephole2.
>>>         (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>         New insn pattern and related peephole2.
>>>         * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>         (thumb2_check_ldrd_operands): New function.
>>>         (thumb2_prefer_ldmstm): New function.
>>>         * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New
>>> prototype. (thumb2_check_ldrd_operands): New prototype.
>>>         (thumb2_prefer_ldmstm): New prototype.
>>>         * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>         Change the ldm/stm patterns with 2 words to ARM only.
>>>         * gcc/config/arm/constraints.md (Py): New thumb2 constant
>>> constraint suitable to ldrd/strd instructions.
>>
>> Not ok.
>>
>> Why is this restricted to Thumb mode? The ARM variant of ldrd isn't quite as
>> flexible, but still provides a useful improvement over ldm.
>>
> I agree the ARM version is also useful. But it brings much less
> benefit with too much complexity (due to more restriction and insn
> pattern conflict with ldm). So I will leave it as a future
> improvement.
>
>> This transformation is only valid on ARMv7 cores. On earlier hardware
>> (depending on system configuration) it may cause undefined behavior or an
>> alignment trap.
>>
> done.
>
>> The range on -1020 to +1024 is used in several places, but without any
>> apparent explanation of why it's different to the range of an ldrd
>> instruction.  I figured it out eventually, but it deserves a comment.
>>
> Comments added.
>
>>> +  "TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
>>> +                                           operands[2], 0, operands[3], 1)"
>>
>> Passed operands do not match expected types. Specifically "0" is not an rtx
>> (should be "NULL_RTX"), and "1" is not a boolean value (should be "true").
>> Many other occurrences.
>>
> Fixed.
>
>>> +(define_constraint "Py"
>>> +  "@internal In Thumb-2 state a constant that is a multiple of 4 in the
>>> +   range -1020 to 1024"
>>
>> This comment seems particularly pointless. You should mention why this
>> exists/where it is used.
>>
>> I think you're better off enforcing this in the insn condition, and remove
>> this constraint. At least half the uses (the -reg[12] insns) are incorrect,
>> and you already need the condition to enforce the dependency between the
>> operands.
>>
> I removed this constraint and add the check to insn condition.
>
>>> +thumb2_check_ldrd_operands (rtx reg1, rtx reg2, rtx base,
>>>...
>>> +  if (ldrd && (reg1 == reg2))
>>> +    return false;
>>
>> This function is part of the instruction condition.  Instruction conditions
>> must not be used to enforce register allocation.
>>
> removed.
>
>>> +thumb2_legitimate_ldrd_p (
>>>...
>>> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>> +    return false;
>>
>> You're incorrectly assuming offset1 < offset2, which might not be true at this
>> point.
>>
> The following check assumes offset1 < offset2
> +  if ((offset1 + 4) == offset2)
> +    return true;
>
> And another check assumes offset2 < offset1, so both cases are covered.
> +  if ((offset2 + 4) == offset1)
> +    return true;
>
>>> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>> +     cost.  */
>>> +  return false;
>>
>> Code clearly doesn't match the comment.  In fact this function always returns
>> false.
>>
> Richard mentioned that in some cases (specifically cortex A9) ldm has
> less cost than ldrd and we should model this in the insn pattern. This
> function is used for this. But I don't know the cortex A9 architecture
> detail, so it should be filled by somebody with more knowledge about
> it in future.
>
> Wei Guozhi
>
>
> ChangeLog:
> 2010-10-16  Wei Guozhi  <carrot@google.com>
>
>        PR target/45335
>        * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>        thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>        peephole2.
>        (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>        New insn pattern and related peephole2.
>        * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>        (thumb2_check_ldrd_operands): New function.
>        (thumb2_prefer_ldmstm): New function.
>        * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New prototype.
>        (thumb2_check_ldrd_operands): New prototype.
>        (thumb2_prefer_ldmstm): New prototype.
>        * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>        Change the ldm/stm patterns with 2 words to ARM only.
>
>
> 2010-10-16  Wei Guozhi  <carrot@google.com>
>
>        PR target/45335
>        * gcc.target/arm/pr45335.c: New test.
>        * gcc.target/arm/pr40457-1.c: Changed to load 3 words.
>        * gcc.target/arm/pr40457-2.c: Changed to store 3 words.
>        * gcc.target/arm/pr40457-3.c: Changed to store 3 words.
>
>
> Index: thumb2.md
> ===================================================================
> --- thumb2.md   (revision 165492)
> +++ thumb2.md   (working copy)
> @@ -1118,3 +1118,228 @@ (define_peephole2
>   "
>   operands[2] = GEN_INT (32 - INTVAL (operands[2]));
>   ")
> +
> +(define_insn "*thumb2_ldrd"
> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
> +                  (mem:SI (plus:SI
> +                               (match_operand:SI 2 "s_register_operand" "rk")
> +                               (match_operand:SI 3 "const_int_operand" ""))))
> +             (set (match_operand:SI 1 "s_register_operand" "")
> +                  (mem:SI (plus:SI (match_dup 2)
> +                            (match_operand:SI 4 "const_int_operand" ""))))])]
> +  "TARGET_THUMB2 && arm_arch7
> +   && thumb2_check_ldrd_operands (operands[3], operands[4])"
> +  "*
> +  {
> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
> +    if (offset1 > offset2)
> +      {
> +       /* Swap the operands so that memory [base+offset1] is loaded into
> +          operands[0].  */
> +       rtx tmp = operands[0];
> +       operands[0] = operands[1];
> +       operands[1] = tmp;
> +       tmp = operands[3];
> +       operands[3] = operands[4];
> +       operands[4] = tmp;
> +       offset1 = INTVAL (operands[3]);
> +       offset2 = INTVAL (operands[4]);
> +      }
> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
> +                             operands[2], operands[3], operands[4], true))
> +      return \"ldmdb\\t%2, {%0, %1}\";
> +    else if (fix_cm3_ldrd && (operands[2] == operands[0]))
> +      {
> +       if (offset1 <= -256)
> +         {
> +           output_asm_insn (\"sub\\t%2, %2, %n3\", operands);
> +           output_asm_insn (\"ldr\\t%1, [%2, #4]\", operands);
> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
> +         }
> +       else
> +         {
> +           output_asm_insn (\"ldr\\t%1, [%2, %4]\", operands);
> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
> +         }
> +       return \"\";
> +      }
> +    else
> +      return \"ldrd\\t%0, %1, [%2, %3]\";
> +  }"
> +)
> +
> +(define_insn "*thumb2_ldrd_reg1"
> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
> +                  (mem:SI (match_operand:SI 2 "s_register_operand" "rk")))
> +             (set (match_operand:SI 1 "s_register_operand" "")
> +                  (mem:SI (plus:SI (match_dup 2)
> +                            (match_operand:SI 3 "const_int_operand" ""))))])]
> +  "TARGET_THUMB2 && arm_arch7
> +   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
> +  "*
> +  {
> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
> +    if (offset2 == 4)
> +      {
> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
> +                                 operands[2], NULL_RTX, operands[3], true))
> +         return \"ldmia\\t%2, {%0, %1}\";
> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
> +         {
> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
> +           return \"\";
> +         }
> +       return \"ldrd\\t%0, %1, [%2]\";
> +      }
> +    else
> +      {
> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
> +         {
> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
> +         }
> +       return \"ldrd\\t%1, %0, [%2, %3]\";
> +      }
> +  }"
> +)
> +
> +(define_insn "*thumb2_ldrd_reg2"
> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
> +                  (mem:SI (plus:SI
> +                               (match_operand:SI 2 "s_register_operand" "rk")
> +                               (match_operand:SI 3 "const_int_operand" ""))))
> +             (set (match_operand:SI 1 "s_register_operand" "")
> +                  (mem:SI (match_dup 2)))])]
> +  "TARGET_THUMB2 && arm_arch7
> +   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
> +  "*
> +  {
> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
> +    if (offset1 == -4)
> +      {
> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
> +         {
> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
> +           return \"\";
> +         }
> +       return \"ldrd\\t%0, %1, [%2, %3]\";
> +      }
> +    else
> +      {
> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
> +                                 operands[2], operands[3], NULL_RTX, true))
> +         return \"ldmia\\t%2, {%1, %0}\";
> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
> +         {
> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
> +           return \"\";
> +         }
> +       return \"ldrd\\t%1, %0, [%2]\";
> +      }
> +  }"
> +)
> +
> +(define_peephole2
> +  [(set (match_operand:SI 0 "s_register_operand" "")
> +       (match_operand:SI 2 "memory_operand" ""))
> +   (set (match_operand:SI 1 "s_register_operand" "")
> +       (match_operand:SI 3 "memory_operand" ""))]
> +  "TARGET_THUMB2 && arm_arch7
> +   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
> +                               operands[2], operands[3], true)"
> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
> +                  (match_operand:SI 2 "memory_operand" ""))
> +             (set (match_operand:SI 1 "s_register_operand" "")
> +                  (match_operand:SI 3 "memory_operand" ""))])]
> +  ""
> +)
> +
> +(define_insn "*thumb2_strd"
> +  [(parallel [(set (mem:SI
> +                       (plus:SI (match_operand:SI 2 "s_register_operand" "rk")
> +                                (match_operand:SI 3 "const_int_operand" "")))
> +                  (match_operand:SI 0 "s_register_operand" ""))
> +             (set (mem:SI (plus:SI (match_dup 2)
> +                                (match_operand:SI 4 "const_int_operand" "")))
> +                  (match_operand:SI 1 "s_register_operand" ""))])]
> +  "TARGET_THUMB2 && arm_arch7
> +   && thumb2_check_ldrd_operands (operands[3], operands[4])"
> +  "*
> +  {
> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
> +                             operands[2], operands[3], operands[4], false))
> +      return \"stmdb\\t%2, {%0, %1}\";
> +    if (offset1 < offset2)
> +      return \"strd\\t%0, %1, [%2, %3]\";
> +    else
> +      return \"strd\\t%1, %0, [%2, %4]\";
> +  }"
> +)
> +
> +(define_insn "*thumb2_strd_reg1"
> +  [(parallel [(set (mem:SI (match_operand:SI 2 "s_register_operand" "rk"))
> +                  (match_operand:SI 0 "s_register_operand" ""))
> +             (set (mem:SI (plus:SI (match_dup 2)
> +                               (match_operand:SI 3 "const_int_operand" "")))
> +                  (match_operand:SI 1 "s_register_operand" ""))])]
> +  "TARGET_THUMB2 && arm_arch7
> +   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
> +  "*
> +  {
> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
> +    if (offset2 == 4)
> +      {
> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
> +                                 operands[2], NULL_RTX, operands[3], false))
> +         return \"stmia\\t%2, {%0, %1}\";
> +       return \"strd\\t%0, %1, [%2]\";
> +      }
> +    else
> +      return \"strd\\t%1, %0, [%2, %3]\";
> +  }"
> +)
> +
> +(define_insn "*thumb2_strd_reg2"
> +  [(parallel [(set (mem:SI (plus:SI
> +                               (match_operand:SI 2 "s_register_operand" "rk")
> +                               (match_operand:SI 3 "const_int_operand" "")))
> +                  (match_operand:SI 0 "s_register_operand" ""))
> +             (set (mem:SI (match_dup 2))
> +                  (match_operand:SI 1 "s_register_operand" ""))])]
> +  "TARGET_THUMB2 && arm_arch7
> +   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
> +  "*
> +  {
> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
> +    if (offset1 == -4)
> +      return \"strd\\t%0, %1, [%2, %3]\";
> +    else
> +      {
> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
> +                                 operands[2], operands[3], NULL_RTX, false))
> +         return \"stmia\\t%2, {%1, %0}\";
> +       return \"strd\\t%1, %0, [%2]\";
> +      }
> +  }"
> +)
> +
> +(define_peephole2
> +  [(set (match_operand:SI 2 "memory_operand" "")
> +       (match_operand:SI 0 "s_register_operand" ""))
> +   (set (match_operand:SI 3 "memory_operand" "")
> +       (match_operand:SI 1 "s_register_operand" ""))]
> +  "TARGET_THUMB2 && arm_arch7
> +   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
> +                               operands[2], operands[3], false)"
> +  [(parallel [(set (match_operand:SI 2 "memory_operand" "")
> +                  (match_operand:SI 0 "s_register_operand" ""))
> +             (set (match_operand:SI 3 "memory_operand" "")
> +                  (match_operand:SI 1 "s_register_operand" ""))])]
> +  ""
> +)
> Index: arm.c
> ===================================================================
> --- arm.c       (revision 165492)
> +++ arm.c       (working copy)
> @@ -23254,4 +23254,134 @@ arm_builtin_support_vector_misalignment
>                                                      is_packed);
>  }
>
> +/* Check the validity of operands in an ldrd/strd instruction.  */
> +bool
> +thumb2_check_ldrd_operands (rtx off1, rtx off2)
> +{
> +  HOST_WIDE_INT offset1 = 0;
> +  HOST_WIDE_INT offset2 = 0;
> +
> +  if (off1 != NULL_RTX)
> +    offset1 = INTVAL (off1);
> +  if (off2 != NULL_RTX)
> +    offset2 = INTVAL (off2);
> +
> +  /* The offset range of LDRD is [-1020, 1020]. Here we check if both
> +     offsets lie in the range [-1020, 1024]. If one of the offsets is
> +     1024, the following condition ((offset1 + 4) == offset2) will ensure
> +     offset1 to be 1020, suitable for instruction LDRD.  */
> +  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
> +    return false;
> +  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
> +    return false;
> +
> +  if ((offset1 + 4) == offset2)
> +    return true;
> +  if ((offset2 + 4) == offset1)
> +    return true;
> +
> +  return false;
> +}
> +
> +/* Check if the two memory accesses can be merged to an ldrd/strd instruction.
> +   That is they use the same base register, and the gap between constant
> +   offsets should be 4.  */
> +bool
> +thumb2_legitimate_ldrd_p (rtx reg1, rtx reg2, rtx mem1, rtx mem2, bool ldrd)
> +{
> +  rtx base1, base2, op1;
> +  rtx addr1 = XEXP (mem1, 0);
> +  rtx addr2 = XEXP (mem2, 0);
> +  HOST_WIDE_INT offset1 = 0;
> +  HOST_WIDE_INT offset2 = 0;
> +
> +  if (MEM_VOLATILE_P (mem1) || MEM_VOLATILE_P (mem2))
> +    return false;
> +
> +  if (REG_P (addr1))
> +    base1 = addr1;
> +  else if (GET_CODE (addr1) == PLUS)
> +    {
> +      base1 = XEXP (addr1, 0);
> +      op1 = XEXP (addr1, 1);
> +      if (!REG_P (base1) || (GET_CODE (op1) != CONST_INT))
> +       return false;
> +      offset1 = INTVAL (op1);
> +    }
> +  else
> +    return false;
> +
> +  if (REG_P (addr2))
> +    base2 = addr2;
> +  else if (GET_CODE (addr2) == PLUS)
> +    {
> +      base2 = XEXP (addr2, 0);
> +      op1 = XEXP (addr2, 1);
> +      if (!REG_P (base2) || (GET_CODE (op1) != CONST_INT))
> +       return false;
> +      offset2 = INTVAL (op1);
> +    }
> +  else
> +    return false;
> +
> +  if (base1 != base2)
> +    return false;
> +
> +  /* The offset range of LDRD is [-1020, 1020]. Here we check if both
> +     offsets lie in the range [-1020, 1024]. If one of the offsets is
> +     1024, the following condition ((offset1 + 4) == offset2) will ensure
> +     offset1 to be 1020, suitable for instruction LDRD.  */
> +  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
> +    return false;
> +  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
> +    return false;
> +
> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
> +    return false;
> +
> +  if ((offset1 + 4) == offset2)
> +    return true;
> +  if ((offset2 + 4) == offset1)
> +    return true;
> +
> +  return false;
> +}
> +
> +/* Check if the insn can be expressed as ldm/stm with less cost.  */
> +bool
> +thumb2_prefer_ldmstm (rtx reg1, rtx reg2, rtx base,
> +                     rtx off1, rtx off2, bool ldrd)
> +{
> +  HOST_WIDE_INT offset1 = 0;
> +  HOST_WIDE_INT offset2 = 0;
> +
> +  if (off1 != NULL_RTX)
> +    offset1 = INTVAL (off1);
> +  if (off2 != NULL_RTX)
> +    offset2 = INTVAL (off2);
> +
> +  if (offset1 > offset2)
> +    {
> +      rtx tmp;
> +      HOST_WIDE_INT t = offset1;
> +      offset1 = offset2;
> +      offset2 = t;
> +      tmp = reg1;
> +      reg1 = reg2;
> +      reg2 = tmp;
> +    }
> +
> +  /* The offset of ldmdb is -8, the offset of ldmia is 0.  */
> +  if ((offset1 != -8) && (offset1 != 0))
> +    return false;
> +
> +  /* Lower register corresponds to lower memory.  */
> +  if (REGNO (reg1) > REGNO (reg2))
> +    return false;
> +
> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
> +     cost.  */
> +  return false;
> +}
> +
>  #include "gt-arm.h"
> Index: arm-protos.h
> ===================================================================
> --- arm-protos.h        (revision 165492)
> +++ arm-protos.h        (working copy)
> @@ -150,6 +150,9 @@ extern void arm_expand_sync (enum machin
>  extern const char *arm_output_memory_barrier (rtx *);
>  extern const char *arm_output_sync_insn (rtx, rtx *);
>  extern unsigned int arm_sync_loop_insns (rtx , rtx *);
> +extern bool thumb2_check_ldrd_operands (rtx, rtx);
> +extern bool thumb2_legitimate_ldrd_p (rtx, rtx, rtx, rtx, bool);
> +extern bool thumb2_prefer_ldmstm (rtx, rtx, rtx, rtx, rtx, bool);
>
>  #if defined TREE_CODE
>  extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
> Index: ldmstm.md
> ===================================================================
> --- ldmstm.md   (revision 165492)
> +++ ldmstm.md   (working copy)
> @@ -852,7 +852,7 @@ (define_insn "*ldm2_ia"
>      (set (match_operand:SI 2 "arm_hard_register_operand" "")
>           (mem:SI (plus:SI (match_dup 3)
>                   (const_int 4))))])]
> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>   "ldm%(ia%)\t%3, {%1, %2}"
>   [(set_attr "type" "load2")
>    (set_attr "predicable" "yes")])
> @@ -901,7 +901,7 @@ (define_insn "*stm2_ia"
>           (match_operand:SI 1 "arm_hard_register_operand" ""))
>      (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
>           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>   "stm%(ia%)\t%3, {%1, %2}"
>   [(set_attr "type" "store2")
>    (set_attr "predicable" "yes")])
> @@ -1041,7 +1041,7 @@ (define_insn "*ldm2_db"
>      (set (match_operand:SI 2 "arm_hard_register_operand" "")
>           (mem:SI (plus:SI (match_dup 3)
>                   (const_int -4))))])]
> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>   "ldm%(db%)\t%3, {%1, %2}"
>   [(set_attr "type" "load2")
>    (set_attr "predicable" "yes")])
> @@ -1067,7 +1067,7 @@ (define_insn "*stm2_db"
>           (match_operand:SI 1 "arm_hard_register_operand" ""))
>      (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
>           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>   "stm%(db%)\t%3, {%1, %2}"
>   [(set_attr "type" "store2")
>    (set_attr "predicable" "yes")])
>
>
> Index: pr40457-3.c
> ===================================================================
> --- pr40457-3.c (revision 165492)
> +++ pr40457-3.c (working copy)
> @@ -5,6 +5,7 @@ void foo(int* p)
>  {
>   p[0] = 1;
>   p[1] = 0;
> +  p[2] = 2;
>  }
>
>  /* { dg-final { scan-assembler "stm" } } */
> Index: pr40457-1.c
> ===================================================================
> --- pr40457-1.c (revision 165492)
> +++ pr40457-1.c (working copy)
> @@ -1,9 +1,9 @@
> -/* { dg-options "-Os" }  */
> +/* { dg-options "-O2" }  */
>  /* { dg-do compile } */
>
>  int bar(int* p)
>  {
> -  int x = p[0] + p[1];
> +  int x = p[0] + p[1] + p[2];
>   return x;
>  }
>
> Index: pr40457-2.c
> ===================================================================
> --- pr40457-2.c (revision 165492)
> +++ pr40457-2.c (working copy)
> @@ -5,6 +5,7 @@ void foo(int* p)
>  {
>   p[0] = 1;
>   p[1] = 0;
> +  p[2] = 2;
>  }
>
>  /* { dg-final { scan-assembler "stm" } } */
> Index: pr45335.c
> ===================================================================
> --- pr45335.c   (revision 0)
> +++ pr45335.c   (revision 0)
> @@ -0,0 +1,22 @@
> +/* { dg-options "-mthumb -O2" } */
> +/* { dg-require-effective-target arm_thumb2_ok } */
> +/* { dg-final { scan-assembler "ldrd" } } */
> +/* { dg-final { scan-assembler "strd" } } */
> +
> +struct S
> +{
> +    void* p1;
> +    void* p2;
> +    void* p3;
> +    void* p4;
> +};
> +
> +extern printf(char*, ...);
> +
> +void foo1(struct S* fp, struct S* otherSaveArea)
> +{
> +    struct S* saveA = fp - 1;
> +    printf("StackSaveArea for fp %p [%p/%p]:\n", fp, saveA, otherSaveArea);
> +    printf("prevFrame=%p savedPc=%p meth=%p curPc=%p fp[0]=0x%08x\n",
> +        saveA->p1, saveA->p2, saveA->p3, saveA->p4, *(unsigned int*)fp);
> +}
>
Carrot Wei Oct. 31, 2010, 9:22 a.m. UTC | #2
Ping

On Sun, Oct 24, 2010 at 9:46 PM, Carrot Wei <carrot@google.com> wrote:
> Ping
>
> On Sat, Oct 16, 2010 at 8:27 PM, Carrot Wei <carrot@google.com> wrote:
>> On Wed, Oct 13, 2010 at 7:01 PM, Paul Brook <paul@codesourcery.com> wrote:
>>>> ChangeLog:
>>>> 2010-09-04  Wei Guozhi  <carrot@google.com>
>>>>
>>>>         PR target/45335
>>>>         * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>         thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>         peephole2.
>>>>         (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>         New insn pattern and related peephole2.
>>>>         * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>         (thumb2_check_ldrd_operands): New function.
>>>>         (thumb2_prefer_ldmstm): New function.
>>>>         * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New
>>>> prototype. (thumb2_check_ldrd_operands): New prototype.
>>>>         (thumb2_prefer_ldmstm): New prototype.
>>>>         * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>         Change the ldm/stm patterns with 2 words to ARM only.
>>>>         * gcc/config/arm/constraints.md (Py): New thumb2 constant
>>>> constraint suitable to ldrd/strd instructions.
>>>
>>> Not ok.
>>>
>>> Why is this restricted to Thumb mode? The ARM variant of ldrd isn't quite as
>>> flexible, but still provides a useful improvement over ldm.
>>>
>> I agree the ARM version is also useful. But it brings much less
>> benefit with too much complexity (due to more restriction and insn
>> pattern conflict with ldm). So I will leave it as a future
>> improvement.
>>
>>> This transformation is only valid on ARMv7 cores. On earlier hardware
>>> (depending on system configuration) it may cause undefined behavior or an
>>> alignment trap.
>>>
>> done.
>>
>>> The range on -1020 to +1024 is used in several places, but without any
>>> apparent explanation of why it's different to the range of an ldrd
>>> instruction.  I figured it out eventually, but it deserves a comment.
>>>
>> Comments added.
>>
>>>> +  "TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
>>>> +                                           operands[2], 0, operands[3], 1)"
>>>
>>> Passed operands do not match expected types. Specifically "0" is not an rtx
>>> (should be "NULL_RTX"), and "1" is not a boolean value (should be "true").
>>> Many other occurrences.
>>>
>> Fixed.
>>
>>>> +(define_constraint "Py"
>>>> +  "@internal In Thumb-2 state a constant that is a multiple of 4 in the
>>>> +   range -1020 to 1024"
>>>
>>> This comment seems particularly pointless. You should mention why this
>>> exists/where it is used.
>>>
>>> I think you're better off enforcing this in the insn condition, and remove
>>> this constraint. At least half the uses (the -reg[12] insns) are incorrect,
>>> and you already need the condition to enforce the dependency between the
>>> operands.
>>>
>> I removed this constraint and add the check to insn condition.
>>
>>>> +thumb2_check_ldrd_operands (rtx reg1, rtx reg2, rtx base,
>>>>...
>>>> +  if (ldrd && (reg1 == reg2))
>>>> +    return false;
>>>
>>> This function is part of the instruction condition.  Instruction conditions
>>> must not be used to enforce register allocation.
>>>
>> removed.
>>
>>>> +thumb2_legitimate_ldrd_p (
>>>>...
>>>> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>> +    return false;
>>>
>>> You're incorrectly assuming offset1 < offset2, which might not be true at this
>>> point.
>>>
>> The following check assumes offset1 < offset2
>> +  if ((offset1 + 4) == offset2)
>> +    return true;
>>
>> And another check assumes offset2 < offset1, so both cases are covered.
>> +  if ((offset2 + 4) == offset1)
>> +    return true;
>>
>>>> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>>> +     cost.  */
>>>> +  return false;
>>>
>>> Code clearly doesn't match the comment.  In fact this function always returns
>>> false.
>>>
>> Richard mentioned that in some cases (specifically cortex A9) ldm has
>> less cost than ldrd and we should model this in the insn pattern. This
>> function is used for this. But I don't know the cortex A9 architecture
>> detail, so it should be filled by somebody with more knowledge about
>> it in future.
>>
>> Wei Guozhi
>>
>>
>> ChangeLog:
>> 2010-10-16  Wei Guozhi  <carrot@google.com>
>>
>>        PR target/45335
>>        * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>        thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>        peephole2.
>>        (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>        New insn pattern and related peephole2.
>>        * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>        (thumb2_check_ldrd_operands): New function.
>>        (thumb2_prefer_ldmstm): New function.
>>        * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New prototype.
>>        (thumb2_check_ldrd_operands): New prototype.
>>        (thumb2_prefer_ldmstm): New prototype.
>>        * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>        Change the ldm/stm patterns with 2 words to ARM only.
>>
>>
>> 2010-10-16  Wei Guozhi  <carrot@google.com>
>>
>>        PR target/45335
>>        * gcc.target/arm/pr45335.c: New test.
>>        * gcc.target/arm/pr40457-1.c: Changed to load 3 words.
>>        * gcc.target/arm/pr40457-2.c: Changed to store 3 words.
>>        * gcc.target/arm/pr40457-3.c: Changed to store 3 words.
>>
>>
>> Index: thumb2.md
>> ===================================================================
>> --- thumb2.md   (revision 165492)
>> +++ thumb2.md   (working copy)
>> @@ -1118,3 +1118,228 @@ (define_peephole2
>>   "
>>   operands[2] = GEN_INT (32 - INTVAL (operands[2]));
>>   ")
>> +
>> +(define_insn "*thumb2_ldrd"
>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>> +                  (mem:SI (plus:SI
>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>> +                               (match_operand:SI 3 "const_int_operand" ""))))
>> +             (set (match_operand:SI 1 "s_register_operand" "")
>> +                  (mem:SI (plus:SI (match_dup 2)
>> +                            (match_operand:SI 4 "const_int_operand" ""))))])]
>> +  "TARGET_THUMB2 && arm_arch7
>> +   && thumb2_check_ldrd_operands (operands[3], operands[4])"
>> +  "*
>> +  {
>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>> +    if (offset1 > offset2)
>> +      {
>> +       /* Swap the operands so that memory [base+offset1] is loaded into
>> +          operands[0].  */
>> +       rtx tmp = operands[0];
>> +       operands[0] = operands[1];
>> +       operands[1] = tmp;
>> +       tmp = operands[3];
>> +       operands[3] = operands[4];
>> +       operands[4] = tmp;
>> +       offset1 = INTVAL (operands[3]);
>> +       offset2 = INTVAL (operands[4]);
>> +      }
>> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
>> +                             operands[2], operands[3], operands[4], true))
>> +      return \"ldmdb\\t%2, {%0, %1}\";
>> +    else if (fix_cm3_ldrd && (operands[2] == operands[0]))
>> +      {
>> +       if (offset1 <= -256)
>> +         {
>> +           output_asm_insn (\"sub\\t%2, %2, %n3\", operands);
>> +           output_asm_insn (\"ldr\\t%1, [%2, #4]\", operands);
>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>> +         }
>> +       else
>> +         {
>> +           output_asm_insn (\"ldr\\t%1, [%2, %4]\", operands);
>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>> +         }
>> +       return \"\";
>> +      }
>> +    else
>> +      return \"ldrd\\t%0, %1, [%2, %3]\";
>> +  }"
>> +)
>> +
>> +(define_insn "*thumb2_ldrd_reg1"
>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>> +                  (mem:SI (match_operand:SI 2 "s_register_operand" "rk")))
>> +             (set (match_operand:SI 1 "s_register_operand" "")
>> +                  (mem:SI (plus:SI (match_dup 2)
>> +                            (match_operand:SI 3 "const_int_operand" ""))))])]
>> +  "TARGET_THUMB2 && arm_arch7
>> +   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>> +  "*
>> +  {
>> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>> +    if (offset2 == 4)
>> +      {
>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>> +                                 operands[2], NULL_RTX, operands[3], true))
>> +         return \"ldmia\\t%2, {%0, %1}\";
>> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
>> +         {
>> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>> +           return \"\";
>> +         }
>> +       return \"ldrd\\t%0, %1, [%2]\";
>> +      }
>> +    else
>> +      {
>> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
>> +         {
>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>> +         }
>> +       return \"ldrd\\t%1, %0, [%2, %3]\";
>> +      }
>> +  }"
>> +)
>> +
>> +(define_insn "*thumb2_ldrd_reg2"
>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>> +                  (mem:SI (plus:SI
>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>> +                               (match_operand:SI 3 "const_int_operand" ""))))
>> +             (set (match_operand:SI 1 "s_register_operand" "")
>> +                  (mem:SI (match_dup 2)))])]
>> +  "TARGET_THUMB2 && arm_arch7
>> +   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>> +  "*
>> +  {
>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>> +    if (offset1 == -4)
>> +      {
>> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
>> +         {
>> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>> +           return \"\";
>> +         }
>> +       return \"ldrd\\t%0, %1, [%2, %3]\";
>> +      }
>> +    else
>> +      {
>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>> +                                 operands[2], operands[3], NULL_RTX, true))
>> +         return \"ldmia\\t%2, {%1, %0}\";
>> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
>> +         {
>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>> +           return \"\";
>> +         }
>> +       return \"ldrd\\t%1, %0, [%2]\";
>> +      }
>> +  }"
>> +)
>> +
>> +(define_peephole2
>> +  [(set (match_operand:SI 0 "s_register_operand" "")
>> +       (match_operand:SI 2 "memory_operand" ""))
>> +   (set (match_operand:SI 1 "s_register_operand" "")
>> +       (match_operand:SI 3 "memory_operand" ""))]
>> +  "TARGET_THUMB2 && arm_arch7
>> +   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>> +                               operands[2], operands[3], true)"
>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>> +                  (match_operand:SI 2 "memory_operand" ""))
>> +             (set (match_operand:SI 1 "s_register_operand" "")
>> +                  (match_operand:SI 3 "memory_operand" ""))])]
>> +  ""
>> +)
>> +
>> +(define_insn "*thumb2_strd"
>> +  [(parallel [(set (mem:SI
>> +                       (plus:SI (match_operand:SI 2 "s_register_operand" "rk")
>> +                                (match_operand:SI 3 "const_int_operand" "")))
>> +                  (match_operand:SI 0 "s_register_operand" ""))
>> +             (set (mem:SI (plus:SI (match_dup 2)
>> +                                (match_operand:SI 4 "const_int_operand" "")))
>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>> +  "TARGET_THUMB2 && arm_arch7
>> +   && thumb2_check_ldrd_operands (operands[3], operands[4])"
>> +  "*
>> +  {
>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
>> +                             operands[2], operands[3], operands[4], false))
>> +      return \"stmdb\\t%2, {%0, %1}\";
>> +    if (offset1 < offset2)
>> +      return \"strd\\t%0, %1, [%2, %3]\";
>> +    else
>> +      return \"strd\\t%1, %0, [%2, %4]\";
>> +  }"
>> +)
>> +
>> +(define_insn "*thumb2_strd_reg1"
>> +  [(parallel [(set (mem:SI (match_operand:SI 2 "s_register_operand" "rk"))
>> +                  (match_operand:SI 0 "s_register_operand" ""))
>> +             (set (mem:SI (plus:SI (match_dup 2)
>> +                               (match_operand:SI 3 "const_int_operand" "")))
>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>> +  "TARGET_THUMB2 && arm_arch7
>> +   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>> +  "*
>> +  {
>> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>> +    if (offset2 == 4)
>> +      {
>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>> +                                 operands[2], NULL_RTX, operands[3], false))
>> +         return \"stmia\\t%2, {%0, %1}\";
>> +       return \"strd\\t%0, %1, [%2]\";
>> +      }
>> +    else
>> +      return \"strd\\t%1, %0, [%2, %3]\";
>> +  }"
>> +)
>> +
>> +(define_insn "*thumb2_strd_reg2"
>> +  [(parallel [(set (mem:SI (plus:SI
>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>> +                               (match_operand:SI 3 "const_int_operand" "")))
>> +                  (match_operand:SI 0 "s_register_operand" ""))
>> +             (set (mem:SI (match_dup 2))
>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>> +  "TARGET_THUMB2 && arm_arch7
>> +   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>> +  "*
>> +  {
>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>> +    if (offset1 == -4)
>> +      return \"strd\\t%0, %1, [%2, %3]\";
>> +    else
>> +      {
>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>> +                                 operands[2], operands[3], NULL_RTX, false))
>> +         return \"stmia\\t%2, {%1, %0}\";
>> +       return \"strd\\t%1, %0, [%2]\";
>> +      }
>> +  }"
>> +)
>> +
>> +(define_peephole2
>> +  [(set (match_operand:SI 2 "memory_operand" "")
>> +       (match_operand:SI 0 "s_register_operand" ""))
>> +   (set (match_operand:SI 3 "memory_operand" "")
>> +       (match_operand:SI 1 "s_register_operand" ""))]
>> +  "TARGET_THUMB2 && arm_arch7
>> +   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>> +                               operands[2], operands[3], false)"
>> +  [(parallel [(set (match_operand:SI 2 "memory_operand" "")
>> +                  (match_operand:SI 0 "s_register_operand" ""))
>> +             (set (match_operand:SI 3 "memory_operand" "")
>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>> +  ""
>> +)
>> Index: arm.c
>> ===================================================================
>> --- arm.c       (revision 165492)
>> +++ arm.c       (working copy)
>> @@ -23254,4 +23254,134 @@ arm_builtin_support_vector_misalignment
>>                                                      is_packed);
>>  }
>>
>> +/* Check the validity of operands in an ldrd/strd instruction.  */
>> +bool
>> +thumb2_check_ldrd_operands (rtx off1, rtx off2)
>> +{
>> +  HOST_WIDE_INT offset1 = 0;
>> +  HOST_WIDE_INT offset2 = 0;
>> +
>> +  if (off1 != NULL_RTX)
>> +    offset1 = INTVAL (off1);
>> +  if (off2 != NULL_RTX)
>> +    offset2 = INTVAL (off2);
>> +
>> +  /* The offset range of LDRD is [-1020, 1020]. Here we check if both
>> +     offsets lie in the range [-1020, 1024]. If one of the offsets is
>> +     1024, the following condition ((offset1 + 4) == offset2) will ensure
>> +     offset1 to be 1020, suitable for instruction LDRD.  */
>> +  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>> +    return false;
>> +  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>> +    return false;
>> +
>> +  if ((offset1 + 4) == offset2)
>> +    return true;
>> +  if ((offset2 + 4) == offset1)
>> +    return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Check if the two memory accesses can be merged to an ldrd/strd instruction.
>> +   That is they use the same base register, and the gap between constant
>> +   offsets should be 4.  */
>> +bool
>> +thumb2_legitimate_ldrd_p (rtx reg1, rtx reg2, rtx mem1, rtx mem2, bool ldrd)
>> +{
>> +  rtx base1, base2, op1;
>> +  rtx addr1 = XEXP (mem1, 0);
>> +  rtx addr2 = XEXP (mem2, 0);
>> +  HOST_WIDE_INT offset1 = 0;
>> +  HOST_WIDE_INT offset2 = 0;
>> +
>> +  if (MEM_VOLATILE_P (mem1) || MEM_VOLATILE_P (mem2))
>> +    return false;
>> +
>> +  if (REG_P (addr1))
>> +    base1 = addr1;
>> +  else if (GET_CODE (addr1) == PLUS)
>> +    {
>> +      base1 = XEXP (addr1, 0);
>> +      op1 = XEXP (addr1, 1);
>> +      if (!REG_P (base1) || (GET_CODE (op1) != CONST_INT))
>> +       return false;
>> +      offset1 = INTVAL (op1);
>> +    }
>> +  else
>> +    return false;
>> +
>> +  if (REG_P (addr2))
>> +    base2 = addr2;
>> +  else if (GET_CODE (addr2) == PLUS)
>> +    {
>> +      base2 = XEXP (addr2, 0);
>> +      op1 = XEXP (addr2, 1);
>> +      if (!REG_P (base2) || (GET_CODE (op1) != CONST_INT))
>> +       return false;
>> +      offset2 = INTVAL (op1);
>> +    }
>> +  else
>> +    return false;
>> +
>> +  if (base1 != base2)
>> +    return false;
>> +
>> +  /* The offset range of LDRD is [-1020, 1020]. Here we check if both
>> +     offsets lie in the range [-1020, 1024]. If one of the offsets is
>> +     1024, the following condition ((offset1 + 4) == offset2) will ensure
>> +     offset1 to be 1020, suitable for instruction LDRD.  */
>> +  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>> +    return false;
>> +  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>> +    return false;
>> +
>> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>> +    return false;
>> +
>> +  if ((offset1 + 4) == offset2)
>> +    return true;
>> +  if ((offset2 + 4) == offset1)
>> +    return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Check if the insn can be expressed as ldm/stm with less cost.  */
>> +bool
>> +thumb2_prefer_ldmstm (rtx reg1, rtx reg2, rtx base,
>> +                     rtx off1, rtx off2, bool ldrd)
>> +{
>> +  HOST_WIDE_INT offset1 = 0;
>> +  HOST_WIDE_INT offset2 = 0;
>> +
>> +  if (off1 != NULL_RTX)
>> +    offset1 = INTVAL (off1);
>> +  if (off2 != NULL_RTX)
>> +    offset2 = INTVAL (off2);
>> +
>> +  if (offset1 > offset2)
>> +    {
>> +      rtx tmp;
>> +      HOST_WIDE_INT t = offset1;
>> +      offset1 = offset2;
>> +      offset2 = t;
>> +      tmp = reg1;
>> +      reg1 = reg2;
>> +      reg2 = tmp;
>> +    }
>> +
>> +  /* The offset of ldmdb is -8, the offset of ldmia is 0.  */
>> +  if ((offset1 != -8) && (offset1 != 0))
>> +    return false;
>> +
>> +  /* Lower register corresponds to lower memory.  */
>> +  if (REGNO (reg1) > REGNO (reg2))
>> +    return false;
>> +
>> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>> +     cost.  */
>> +  return false;
>> +}
>> +
>>  #include "gt-arm.h"
>> Index: arm-protos.h
>> ===================================================================
>> --- arm-protos.h        (revision 165492)
>> +++ arm-protos.h        (working copy)
>> @@ -150,6 +150,9 @@ extern void arm_expand_sync (enum machin
>>  extern const char *arm_output_memory_barrier (rtx *);
>>  extern const char *arm_output_sync_insn (rtx, rtx *);
>>  extern unsigned int arm_sync_loop_insns (rtx , rtx *);
>> +extern bool thumb2_check_ldrd_operands (rtx, rtx);
>> +extern bool thumb2_legitimate_ldrd_p (rtx, rtx, rtx, rtx, bool);
>> +extern bool thumb2_prefer_ldmstm (rtx, rtx, rtx, rtx, rtx, bool);
>>
>>  #if defined TREE_CODE
>>  extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
>> Index: ldmstm.md
>> ===================================================================
>> --- ldmstm.md   (revision 165492)
>> +++ ldmstm.md   (working copy)
>> @@ -852,7 +852,7 @@ (define_insn "*ldm2_ia"
>>      (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>           (mem:SI (plus:SI (match_dup 3)
>>                   (const_int 4))))])]
>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>   "ldm%(ia%)\t%3, {%1, %2}"
>>   [(set_attr "type" "load2")
>>    (set_attr "predicable" "yes")])
>> @@ -901,7 +901,7 @@ (define_insn "*stm2_ia"
>>           (match_operand:SI 1 "arm_hard_register_operand" ""))
>>      (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
>>           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>   "stm%(ia%)\t%3, {%1, %2}"
>>   [(set_attr "type" "store2")
>>    (set_attr "predicable" "yes")])
>> @@ -1041,7 +1041,7 @@ (define_insn "*ldm2_db"
>>      (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>           (mem:SI (plus:SI (match_dup 3)
>>                   (const_int -4))))])]
>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>   "ldm%(db%)\t%3, {%1, %2}"
>>   [(set_attr "type" "load2")
>>    (set_attr "predicable" "yes")])
>> @@ -1067,7 +1067,7 @@ (define_insn "*stm2_db"
>>           (match_operand:SI 1 "arm_hard_register_operand" ""))
>>      (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
>>           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>   "stm%(db%)\t%3, {%1, %2}"
>>   [(set_attr "type" "store2")
>>    (set_attr "predicable" "yes")])
>>
>>
>> Index: pr40457-3.c
>> ===================================================================
>> --- pr40457-3.c (revision 165492)
>> +++ pr40457-3.c (working copy)
>> @@ -5,6 +5,7 @@ void foo(int* p)
>>  {
>>   p[0] = 1;
>>   p[1] = 0;
>> +  p[2] = 2;
>>  }
>>
>>  /* { dg-final { scan-assembler "stm" } } */
>> Index: pr40457-1.c
>> ===================================================================
>> --- pr40457-1.c (revision 165492)
>> +++ pr40457-1.c (working copy)
>> @@ -1,9 +1,9 @@
>> -/* { dg-options "-Os" }  */
>> +/* { dg-options "-O2" }  */
>>  /* { dg-do compile } */
>>
>>  int bar(int* p)
>>  {
>> -  int x = p[0] + p[1];
>> +  int x = p[0] + p[1] + p[2];
>>   return x;
>>  }
>>
>> Index: pr40457-2.c
>> ===================================================================
>> --- pr40457-2.c (revision 165492)
>> +++ pr40457-2.c (working copy)
>> @@ -5,6 +5,7 @@ void foo(int* p)
>>  {
>>   p[0] = 1;
>>   p[1] = 0;
>> +  p[2] = 2;
>>  }
>>
>>  /* { dg-final { scan-assembler "stm" } } */
>> Index: pr45335.c
>> ===================================================================
>> --- pr45335.c   (revision 0)
>> +++ pr45335.c   (revision 0)
>> @@ -0,0 +1,22 @@
>> +/* { dg-options "-mthumb -O2" } */
>> +/* { dg-require-effective-target arm_thumb2_ok } */
>> +/* { dg-final { scan-assembler "ldrd" } } */
>> +/* { dg-final { scan-assembler "strd" } } */
>> +
>> +struct S
>> +{
>> +    void* p1;
>> +    void* p2;
>> +    void* p3;
>> +    void* p4;
>> +};
>> +
>> +extern printf(char*, ...);
>> +
>> +void foo1(struct S* fp, struct S* otherSaveArea)
>> +{
>> +    struct S* saveA = fp - 1;
>> +    printf("StackSaveArea for fp %p [%p/%p]:\n", fp, saveA, otherSaveArea);
>> +    printf("prevFrame=%p savedPc=%p meth=%p curPc=%p fp[0]=0x%08x\n",
>> +        saveA->p1, saveA->p2, saveA->p3, saveA->p4, *(unsigned int*)fp);
>> +}
>>
>
Carrot Wei Nov. 22, 2010, 11:16 p.m. UTC | #3
ping

On Sun, Oct 31, 2010 at 2:22 AM, Carrot Wei <carrot@google.com> wrote:
> Ping
>
> On Sun, Oct 24, 2010 at 9:46 PM, Carrot Wei <carrot@google.com> wrote:
>> Ping
>>
>> On Sat, Oct 16, 2010 at 8:27 PM, Carrot Wei <carrot@google.com> wrote:
>>> On Wed, Oct 13, 2010 at 7:01 PM, Paul Brook <paul@codesourcery.com> wrote:
>>>>> ChangeLog:
>>>>> 2010-09-04  Wei Guozhi  <carrot@google.com>
>>>>>
>>>>>         PR target/45335
>>>>>         * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>>         thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>>         peephole2.
>>>>>         (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>>         New insn pattern and related peephole2.
>>>>>         * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>>         (thumb2_check_ldrd_operands): New function.
>>>>>         (thumb2_prefer_ldmstm): New function.
>>>>>         * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New
>>>>> prototype. (thumb2_check_ldrd_operands): New prototype.
>>>>>         (thumb2_prefer_ldmstm): New prototype.
>>>>>         * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>>         Change the ldm/stm patterns with 2 words to ARM only.
>>>>>         * gcc/config/arm/constraints.md (Py): New thumb2 constant
>>>>> constraint suitable to ldrd/strd instructions.
>>>>
>>>> Not ok.
>>>>
>>>> Why is this restricted to Thumb mode? The ARM variant of ldrd isn't quite as
>>>> flexible, but still provides a useful improvement over ldm.
>>>>
>>> I agree the ARM version is also useful. But it brings much less
>>> benefit with too much complexity (due to more restriction and insn
>>> pattern conflict with ldm). So I will leave it as a future
>>> improvement.
>>>
>>>> This transformation is only valid on ARMv7 cores. On earlier hardware
>>>> (depending on system configuration) it may cause undefined behavior or an
>>>> alignment trap.
>>>>
>>> done.
>>>
>>>> The range on -1020 to +1024 is used in several places, but without any
>>>> apparent explanation of why it's different to the range of an ldrd
>>>> instruction.  I figured it out eventually, but it deserves a comment.
>>>>
>>> Comments added.
>>>
>>>>> +  "TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
>>>>> +                                           operands[2], 0, operands[3], 1)"
>>>>
>>>> Passed operands do not match expected types. Specifically "0" is not an rtx
>>>> (should be "NULL_RTX"), and "1" is not a boolean value (should be "true").
>>>> Many other occurrences.
>>>>
>>> Fixed.
>>>
>>>>> +(define_constraint "Py"
>>>>> +  "@internal In Thumb-2 state a constant that is a multiple of 4 in the
>>>>> +   range -1020 to 1024"
>>>>
>>>> This comment seems particularly pointless. You should mention why this
>>>> exists/where it is used.
>>>>
>>>> I think you're better off enforcing this in the insn condition, and remove
>>>> this constraint. At least half the uses (the -reg[12] insns) are incorrect,
>>>> and you already need the condition to enforce the dependency between the
>>>> operands.
>>>>
>>> I removed this constraint and add the check to insn condition.
>>>
>>>>> +thumb2_check_ldrd_operands (rtx reg1, rtx reg2, rtx base,
>>>>>...
>>>>> +  if (ldrd && (reg1 == reg2))
>>>>> +    return false;
>>>>
>>>> This function is part of the instruction condition.  Instruction conditions
>>>> must not be used to enforce register allocation.
>>>>
>>> removed.
>>>
>>>>> +thumb2_legitimate_ldrd_p (
>>>>>...
>>>>> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>>> +    return false;
>>>>
>>>> You're incorrectly assuming offset1 < offset2, which might not be true at this
>>>> point.
>>>>
>>> The following check assumes offset1 < offset2
>>> +  if ((offset1 + 4) == offset2)
>>> +    return true;
>>>
>>> And another check assumes offset2 < offset1, so both cases are covered.
>>> +  if ((offset2 + 4) == offset1)
>>> +    return true;
>>>
>>>>> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>>>> +     cost.  */
>>>>> +  return false;
>>>>
>>>> Code clearly doesn't match the comment.  In fact this function always returns
>>>> false.
>>>>
>>> Richard mentioned that in some cases (specifically cortex A9) ldm has
>>> less cost than ldrd and we should model this in the insn pattern. This
>>> function is used for this. But I don't know the cortex A9 architecture
>>> detail, so it should be filled by somebody with more knowledge about
>>> it in future.
>>>
>>> Wei Guozhi
>>>
>>>
>>> ChangeLog:
>>> 2010-10-16  Wei Guozhi  <carrot@google.com>
>>>
>>>        PR target/45335
>>>        * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>        thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>        peephole2.
>>>        (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>        New insn pattern and related peephole2.
>>>        * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>        (thumb2_check_ldrd_operands): New function.
>>>        (thumb2_prefer_ldmstm): New function.
>>>        * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New prototype.
>>>        (thumb2_check_ldrd_operands): New prototype.
>>>        (thumb2_prefer_ldmstm): New prototype.
>>>        * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>        Change the ldm/stm patterns with 2 words to ARM only.
>>>
>>>
>>> 2010-10-16  Wei Guozhi  <carrot@google.com>
>>>
>>>        PR target/45335
>>>        * gcc.target/arm/pr45335.c: New test.
>>>        * gcc.target/arm/pr40457-1.c: Changed to load 3 words.
>>>        * gcc.target/arm/pr40457-2.c: Changed to store 3 words.
>>>        * gcc.target/arm/pr40457-3.c: Changed to store 3 words.
>>>
>>>
>>> Index: thumb2.md
>>> ===================================================================
>>> --- thumb2.md   (revision 165492)
>>> +++ thumb2.md   (working copy)
>>> @@ -1118,3 +1118,228 @@ (define_peephole2
>>>   "
>>>   operands[2] = GEN_INT (32 - INTVAL (operands[2]));
>>>   ")
>>> +
>>> +(define_insn "*thumb2_ldrd"
>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>> +                  (mem:SI (plus:SI
>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>> +                               (match_operand:SI 3 "const_int_operand" ""))))
>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>> +                  (mem:SI (plus:SI (match_dup 2)
>>> +                            (match_operand:SI 4 "const_int_operand" ""))))])]
>>> +  "TARGET_THUMB2 && arm_arch7
>>> +   && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>> +  "*
>>> +  {
>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>> +    if (offset1 > offset2)
>>> +      {
>>> +       /* Swap the operands so that memory [base+offset1] is loaded into
>>> +          operands[0].  */
>>> +       rtx tmp = operands[0];
>>> +       operands[0] = operands[1];
>>> +       operands[1] = tmp;
>>> +       tmp = operands[3];
>>> +       operands[3] = operands[4];
>>> +       operands[4] = tmp;
>>> +       offset1 = INTVAL (operands[3]);
>>> +       offset2 = INTVAL (operands[4]);
>>> +      }
>>> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>> +                             operands[2], operands[3], operands[4], true))
>>> +      return \"ldmdb\\t%2, {%0, %1}\";
>>> +    else if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>> +      {
>>> +       if (offset1 <= -256)
>>> +         {
>>> +           output_asm_insn (\"sub\\t%2, %2, %n3\", operands);
>>> +           output_asm_insn (\"ldr\\t%1, [%2, #4]\", operands);
>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>> +         }
>>> +       else
>>> +         {
>>> +           output_asm_insn (\"ldr\\t%1, [%2, %4]\", operands);
>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>> +         }
>>> +       return \"\";
>>> +      }
>>> +    else
>>> +      return \"ldrd\\t%0, %1, [%2, %3]\";
>>> +  }"
>>> +)
>>> +
>>> +(define_insn "*thumb2_ldrd_reg1"
>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>> +                  (mem:SI (match_operand:SI 2 "s_register_operand" "rk")))
>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>> +                  (mem:SI (plus:SI (match_dup 2)
>>> +                            (match_operand:SI 3 "const_int_operand" ""))))])]
>>> +  "TARGET_THUMB2 && arm_arch7
>>> +   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>> +  "*
>>> +  {
>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>> +    if (offset2 == 4)
>>> +      {
>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>> +                                 operands[2], NULL_RTX, operands[3], true))
>>> +         return \"ldmia\\t%2, {%0, %1}\";
>>> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>> +         {
>>> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>> +           return \"\";
>>> +         }
>>> +       return \"ldrd\\t%0, %1, [%2]\";
>>> +      }
>>> +    else
>>> +      {
>>> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>> +         {
>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>> +         }
>>> +       return \"ldrd\\t%1, %0, [%2, %3]\";
>>> +      }
>>> +  }"
>>> +)
>>> +
>>> +(define_insn "*thumb2_ldrd_reg2"
>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>> +                  (mem:SI (plus:SI
>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>> +                               (match_operand:SI 3 "const_int_operand" ""))))
>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>> +                  (mem:SI (match_dup 2)))])]
>>> +  "TARGET_THUMB2 && arm_arch7
>>> +   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>> +  "*
>>> +  {
>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>> +    if (offset1 == -4)
>>> +      {
>>> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>> +         {
>>> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>> +           return \"\";
>>> +         }
>>> +       return \"ldrd\\t%0, %1, [%2, %3]\";
>>> +      }
>>> +    else
>>> +      {
>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>> +                                 operands[2], operands[3], NULL_RTX, true))
>>> +         return \"ldmia\\t%2, {%1, %0}\";
>>> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>> +         {
>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>> +           return \"\";
>>> +         }
>>> +       return \"ldrd\\t%1, %0, [%2]\";
>>> +      }
>>> +  }"
>>> +)
>>> +
>>> +(define_peephole2
>>> +  [(set (match_operand:SI 0 "s_register_operand" "")
>>> +       (match_operand:SI 2 "memory_operand" ""))
>>> +   (set (match_operand:SI 1 "s_register_operand" "")
>>> +       (match_operand:SI 3 "memory_operand" ""))]
>>> +  "TARGET_THUMB2 && arm_arch7
>>> +   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>> +                               operands[2], operands[3], true)"
>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>> +                  (match_operand:SI 2 "memory_operand" ""))
>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>> +                  (match_operand:SI 3 "memory_operand" ""))])]
>>> +  ""
>>> +)
>>> +
>>> +(define_insn "*thumb2_strd"
>>> +  [(parallel [(set (mem:SI
>>> +                       (plus:SI (match_operand:SI 2 "s_register_operand" "rk")
>>> +                                (match_operand:SI 3 "const_int_operand" "")))
>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>> +             (set (mem:SI (plus:SI (match_dup 2)
>>> +                                (match_operand:SI 4 "const_int_operand" "")))
>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>> +  "TARGET_THUMB2 && arm_arch7
>>> +   && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>> +  "*
>>> +  {
>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>> +                             operands[2], operands[3], operands[4], false))
>>> +      return \"stmdb\\t%2, {%0, %1}\";
>>> +    if (offset1 < offset2)
>>> +      return \"strd\\t%0, %1, [%2, %3]\";
>>> +    else
>>> +      return \"strd\\t%1, %0, [%2, %4]\";
>>> +  }"
>>> +)
>>> +
>>> +(define_insn "*thumb2_strd_reg1"
>>> +  [(parallel [(set (mem:SI (match_operand:SI 2 "s_register_operand" "rk"))
>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>> +             (set (mem:SI (plus:SI (match_dup 2)
>>> +                               (match_operand:SI 3 "const_int_operand" "")))
>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>> +  "TARGET_THUMB2 && arm_arch7
>>> +   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>> +  "*
>>> +  {
>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>> +    if (offset2 == 4)
>>> +      {
>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>> +                                 operands[2], NULL_RTX, operands[3], false))
>>> +         return \"stmia\\t%2, {%0, %1}\";
>>> +       return \"strd\\t%0, %1, [%2]\";
>>> +      }
>>> +    else
>>> +      return \"strd\\t%1, %0, [%2, %3]\";
>>> +  }"
>>> +)
>>> +
>>> +(define_insn "*thumb2_strd_reg2"
>>> +  [(parallel [(set (mem:SI (plus:SI
>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>> +                               (match_operand:SI 3 "const_int_operand" "")))
>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>> +             (set (mem:SI (match_dup 2))
>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>> +  "TARGET_THUMB2 && arm_arch7
>>> +   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>> +  "*
>>> +  {
>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>> +    if (offset1 == -4)
>>> +      return \"strd\\t%0, %1, [%2, %3]\";
>>> +    else
>>> +      {
>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>> +                                 operands[2], operands[3], NULL_RTX, false))
>>> +         return \"stmia\\t%2, {%1, %0}\";
>>> +       return \"strd\\t%1, %0, [%2]\";
>>> +      }
>>> +  }"
>>> +)
>>> +
>>> +(define_peephole2
>>> +  [(set (match_operand:SI 2 "memory_operand" "")
>>> +       (match_operand:SI 0 "s_register_operand" ""))
>>> +   (set (match_operand:SI 3 "memory_operand" "")
>>> +       (match_operand:SI 1 "s_register_operand" ""))]
>>> +  "TARGET_THUMB2 && arm_arch7
>>> +   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>> +                               operands[2], operands[3], false)"
>>> +  [(parallel [(set (match_operand:SI 2 "memory_operand" "")
>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>> +             (set (match_operand:SI 3 "memory_operand" "")
>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>> +  ""
>>> +)
>>> Index: arm.c
>>> ===================================================================
>>> --- arm.c       (revision 165492)
>>> +++ arm.c       (working copy)
>>> @@ -23254,4 +23254,134 @@ arm_builtin_support_vector_misalignment
>>>                                                      is_packed);
>>>  }
>>>
>>> +/* Check the validity of operands in an ldrd/strd instruction.  */
>>> +bool
>>> +thumb2_check_ldrd_operands (rtx off1, rtx off2)
>>> +{
>>> +  HOST_WIDE_INT offset1 = 0;
>>> +  HOST_WIDE_INT offset2 = 0;
>>> +
>>> +  if (off1 != NULL_RTX)
>>> +    offset1 = INTVAL (off1);
>>> +  if (off2 != NULL_RTX)
>>> +    offset2 = INTVAL (off2);
>>> +
>>> +  /* The offset range of LDRD is [-1020, 1020]. Here we check if both
>>> +     offsets lie in the range [-1020, 1024]. If one of the offsets is
>>> +     1024, the following condition ((offset1 + 4) == offset2) will ensure
>>> +     offset1 to be 1020, suitable for instruction LDRD.  */
>>> +  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>> +    return false;
>>> +  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>> +    return false;
>>> +
>>> +  if ((offset1 + 4) == offset2)
>>> +    return true;
>>> +  if ((offset2 + 4) == offset1)
>>> +    return true;
>>> +
>>> +  return false;
>>> +}
>>> +
>>> +/* Check if the two memory accesses can be merged to an ldrd/strd instruction.
>>> +   That is they use the same base register, and the gap between constant
>>> +   offsets should be 4.  */
>>> +bool
>>> +thumb2_legitimate_ldrd_p (rtx reg1, rtx reg2, rtx mem1, rtx mem2, bool ldrd)
>>> +{
>>> +  rtx base1, base2, op1;
>>> +  rtx addr1 = XEXP (mem1, 0);
>>> +  rtx addr2 = XEXP (mem2, 0);
>>> +  HOST_WIDE_INT offset1 = 0;
>>> +  HOST_WIDE_INT offset2 = 0;
>>> +
>>> +  if (MEM_VOLATILE_P (mem1) || MEM_VOLATILE_P (mem2))
>>> +    return false;
>>> +
>>> +  if (REG_P (addr1))
>>> +    base1 = addr1;
>>> +  else if (GET_CODE (addr1) == PLUS)
>>> +    {
>>> +      base1 = XEXP (addr1, 0);
>>> +      op1 = XEXP (addr1, 1);
>>> +      if (!REG_P (base1) || (GET_CODE (op1) != CONST_INT))
>>> +       return false;
>>> +      offset1 = INTVAL (op1);
>>> +    }
>>> +  else
>>> +    return false;
>>> +
>>> +  if (REG_P (addr2))
>>> +    base2 = addr2;
>>> +  else if (GET_CODE (addr2) == PLUS)
>>> +    {
>>> +      base2 = XEXP (addr2, 0);
>>> +      op1 = XEXP (addr2, 1);
>>> +      if (!REG_P (base2) || (GET_CODE (op1) != CONST_INT))
>>> +       return false;
>>> +      offset2 = INTVAL (op1);
>>> +    }
>>> +  else
>>> +    return false;
>>> +
>>> +  if (base1 != base2)
>>> +    return false;
>>> +
>>> +  /* The offset range of LDRD is [-1020, 1020]. Here we check if both
>>> +     offsets lie in the range [-1020, 1024]. If one of the offsets is
>>> +     1024, the following condition ((offset1 + 4) == offset2) will ensure
>>> +     offset1 to be 1020, suitable for instruction LDRD.  */
>>> +  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>> +    return false;
>>> +  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>> +    return false;
>>> +
>>> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>> +    return false;
>>> +
>>> +  if ((offset1 + 4) == offset2)
>>> +    return true;
>>> +  if ((offset2 + 4) == offset1)
>>> +    return true;
>>> +
>>> +  return false;
>>> +}
>>> +
>>> +/* Check if the insn can be expressed as ldm/stm with less cost.  */
>>> +bool
>>> +thumb2_prefer_ldmstm (rtx reg1, rtx reg2, rtx base,
>>> +                     rtx off1, rtx off2, bool ldrd)
>>> +{
>>> +  HOST_WIDE_INT offset1 = 0;
>>> +  HOST_WIDE_INT offset2 = 0;
>>> +
>>> +  if (off1 != NULL_RTX)
>>> +    offset1 = INTVAL (off1);
>>> +  if (off2 != NULL_RTX)
>>> +    offset2 = INTVAL (off2);
>>> +
>>> +  if (offset1 > offset2)
>>> +    {
>>> +      rtx tmp;
>>> +      HOST_WIDE_INT t = offset1;
>>> +      offset1 = offset2;
>>> +      offset2 = t;
>>> +      tmp = reg1;
>>> +      reg1 = reg2;
>>> +      reg2 = tmp;
>>> +    }
>>> +
>>> +  /* The offset of ldmdb is -8, the offset of ldmia is 0.  */
>>> +  if ((offset1 != -8) && (offset1 != 0))
>>> +    return false;
>>> +
>>> +  /* Lower register corresponds to lower memory.  */
>>> +  if (REGNO (reg1) > REGNO (reg2))
>>> +    return false;
>>> +
>>> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>> +     cost.  */
>>> +  return false;
>>> +}
>>> +
>>>  #include "gt-arm.h"
>>> Index: arm-protos.h
>>> ===================================================================
>>> --- arm-protos.h        (revision 165492)
>>> +++ arm-protos.h        (working copy)
>>> @@ -150,6 +150,9 @@ extern void arm_expand_sync (enum machin
>>>  extern const char *arm_output_memory_barrier (rtx *);
>>>  extern const char *arm_output_sync_insn (rtx, rtx *);
>>>  extern unsigned int arm_sync_loop_insns (rtx , rtx *);
>>> +extern bool thumb2_check_ldrd_operands (rtx, rtx);
>>> +extern bool thumb2_legitimate_ldrd_p (rtx, rtx, rtx, rtx, bool);
>>> +extern bool thumb2_prefer_ldmstm (rtx, rtx, rtx, rtx, rtx, bool);
>>>
>>>  #if defined TREE_CODE
>>>  extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
>>> Index: ldmstm.md
>>> ===================================================================
>>> --- ldmstm.md   (revision 165492)
>>> +++ ldmstm.md   (working copy)
>>> @@ -852,7 +852,7 @@ (define_insn "*ldm2_ia"
>>>      (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>           (mem:SI (plus:SI (match_dup 3)
>>>                   (const_int 4))))])]
>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>   "ldm%(ia%)\t%3, {%1, %2}"
>>>   [(set_attr "type" "load2")
>>>    (set_attr "predicable" "yes")])
>>> @@ -901,7 +901,7 @@ (define_insn "*stm2_ia"
>>>           (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>      (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
>>>           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>   "stm%(ia%)\t%3, {%1, %2}"
>>>   [(set_attr "type" "store2")
>>>    (set_attr "predicable" "yes")])
>>> @@ -1041,7 +1041,7 @@ (define_insn "*ldm2_db"
>>>      (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>           (mem:SI (plus:SI (match_dup 3)
>>>                   (const_int -4))))])]
>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>   "ldm%(db%)\t%3, {%1, %2}"
>>>   [(set_attr "type" "load2")
>>>    (set_attr "predicable" "yes")])
>>> @@ -1067,7 +1067,7 @@ (define_insn "*stm2_db"
>>>           (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>      (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
>>>           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>   "stm%(db%)\t%3, {%1, %2}"
>>>   [(set_attr "type" "store2")
>>>    (set_attr "predicable" "yes")])
>>>
>>>
>>> Index: pr40457-3.c
>>> ===================================================================
>>> --- pr40457-3.c (revision 165492)
>>> +++ pr40457-3.c (working copy)
>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>  {
>>>   p[0] = 1;
>>>   p[1] = 0;
>>> +  p[2] = 2;
>>>  }
>>>
>>>  /* { dg-final { scan-assembler "stm" } } */
>>> Index: pr40457-1.c
>>> ===================================================================
>>> --- pr40457-1.c (revision 165492)
>>> +++ pr40457-1.c (working copy)
>>> @@ -1,9 +1,9 @@
>>> -/* { dg-options "-Os" }  */
>>> +/* { dg-options "-O2" }  */
>>>  /* { dg-do compile } */
>>>
>>>  int bar(int* p)
>>>  {
>>> -  int x = p[0] + p[1];
>>> +  int x = p[0] + p[1] + p[2];
>>>   return x;
>>>  }
>>>
>>> Index: pr40457-2.c
>>> ===================================================================
>>> --- pr40457-2.c (revision 165492)
>>> +++ pr40457-2.c (working copy)
>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>  {
>>>   p[0] = 1;
>>>   p[1] = 0;
>>> +  p[2] = 2;
>>>  }
>>>
>>>  /* { dg-final { scan-assembler "stm" } } */
>>> Index: pr45335.c
>>> ===================================================================
>>> --- pr45335.c   (revision 0)
>>> +++ pr45335.c   (revision 0)
>>> @@ -0,0 +1,22 @@
>>> +/* { dg-options "-mthumb -O2" } */
>>> +/* { dg-require-effective-target arm_thumb2_ok } */
>>> +/* { dg-final { scan-assembler "ldrd" } } */
>>> +/* { dg-final { scan-assembler "strd" } } */
>>> +
>>> +struct S
>>> +{
>>> +    void* p1;
>>> +    void* p2;
>>> +    void* p3;
>>> +    void* p4;
>>> +};
>>> +
>>> +extern printf(char*, ...);
>>> +
>>> +void foo1(struct S* fp, struct S* otherSaveArea)
>>> +{
>>> +    struct S* saveA = fp - 1;
>>> +    printf("StackSaveArea for fp %p [%p/%p]:\n", fp, saveA, otherSaveArea);
>>> +    printf("prevFrame=%p savedPc=%p meth=%p curPc=%p fp[0]=0x%08x\n",
>>> +        saveA->p1, saveA->p2, saveA->p3, saveA->p4, *(unsigned int*)fp);
>>> +}
>>>
>>
>
Carrot Wei Nov. 29, 2010, 10:32 p.m. UTC | #4
ping

On Mon, Nov 22, 2010 at 3:16 PM, Carrot Wei <carrot@google.com> wrote:
> ping
>
> On Sun, Oct 31, 2010 at 2:22 AM, Carrot Wei <carrot@google.com> wrote:
>> Ping
>>
>> On Sun, Oct 24, 2010 at 9:46 PM, Carrot Wei <carrot@google.com> wrote:
>>> Ping
>>>
>>> On Sat, Oct 16, 2010 at 8:27 PM, Carrot Wei <carrot@google.com> wrote:
>>>> On Wed, Oct 13, 2010 at 7:01 PM, Paul Brook <paul@codesourcery.com> wrote:
>>>>>> ChangeLog:
>>>>>> 2010-09-04  Wei Guozhi  <carrot@google.com>
>>>>>>
>>>>>>         PR target/45335
>>>>>>         * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>>>         thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>>>         peephole2.
>>>>>>         (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>>>         New insn pattern and related peephole2.
>>>>>>         * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>>>         (thumb2_check_ldrd_operands): New function.
>>>>>>         (thumb2_prefer_ldmstm): New function.
>>>>>>         * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New
>>>>>> prototype. (thumb2_check_ldrd_operands): New prototype.
>>>>>>         (thumb2_prefer_ldmstm): New prototype.
>>>>>>         * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>>>         Change the ldm/stm patterns with 2 words to ARM only.
>>>>>>         * gcc/config/arm/constraints.md (Py): New thumb2 constant
>>>>>> constraint suitable to ldrd/strd instructions.
>>>>>
>>>>> Not ok.
>>>>>
>>>>> Why is this restricted to Thumb mode? The ARM variant of ldrd isn't quite as
>>>>> flexible, but still provides a useful improvement over ldm.
>>>>>
>>>> I agree the ARM version is also useful. But it brings much less
>>>> benefit with too much complexity (due to more restriction and insn
>>>> pattern conflict with ldm). So I will leave it as a future
>>>> improvement.
>>>>
>>>>> This transformation is only valid on ARMv7 cores. On earlier hardware
>>>>> (depending on system configuration) it may cause undefined behavior or an
>>>>> alignment trap.
>>>>>
>>>> done.
>>>>
>>>>> The range on -1020 to +1024 is used in several places, but without any
>>>>> apparent explanation of why it's different to the range of an ldrd
>>>>> instruction.  I figured it out eventually, but it deserves a comment.
>>>>>
>>>> Comments added.
>>>>
>>>>>> +  "TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
>>>>>> +                                           operands[2], 0, operands[3], 1)"
>>>>>
>>>>> Passed operands do not match expected types. Specifically "0" is not an rtx
>>>>> (should be "NULL_RTX"), and "1" is not a boolean value (should be "true").
>>>>> Many other occurrences.
>>>>>
>>>> Fixed.
>>>>
>>>>>> +(define_constraint "Py"
>>>>>> +  "@internal In Thumb-2 state a constant that is a multiple of 4 in the
>>>>>> +   range -1020 to 1024"
>>>>>
>>>>> This comment seems particularly pointless. You should mention why this
>>>>> exists/where it is used.
>>>>>
>>>>> I think you're better off enforcing this in the insn condition, and remove
>>>>> this constraint. At least half the uses (the -reg[12] insns) are incorrect,
>>>>> and you already need the condition to enforce the dependency between the
>>>>> operands.
>>>>>
>>>> I removed this constraint and add the check to insn condition.
>>>>
>>>>>> +thumb2_check_ldrd_operands (rtx reg1, rtx reg2, rtx base,
>>>>>>...
>>>>>> +  if (ldrd && (reg1 == reg2))
>>>>>> +    return false;
>>>>>
>>>>> This function is part of the instruction condition.  Instruction conditions
>>>>> must not be used to enforce register allocation.
>>>>>
>>>> removed.
>>>>
>>>>>> +thumb2_legitimate_ldrd_p (
>>>>>>...
>>>>>> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>>>> +    return false;
>>>>>
>>>>> You're incorrectly assuming offset1 < offset2, which might not be true at this
>>>>> point.
>>>>>
>>>> The following check assumes offset1 < offset2
>>>> +  if ((offset1 + 4) == offset2)
>>>> +    return true;
>>>>
>>>> And another check assumes offset2 < offset1, so both cases are covered.
>>>> +  if ((offset2 + 4) == offset1)
>>>> +    return true;
>>>>
>>>>>> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>>>>> +     cost.  */
>>>>>> +  return false;
>>>>>
>>>>> Code clearly doesn't match the comment.  In fact this function always returns
>>>>> false.
>>>>>
>>>> Richard mentioned that in some cases (specifically cortex A9) ldm has
>>>> less cost than ldrd and we should model this in the insn pattern. This
>>>> function is used for this. But I don't know the cortex A9 architecture
>>>> detail, so it should be filled by somebody with more knowledge about
>>>> it in future.
>>>>
>>>> Wei Guozhi
>>>>
>>>>
>>>> ChangeLog:
>>>> 2010-10-16  Wei Guozhi  <carrot@google.com>
>>>>
>>>>        PR target/45335
>>>>        * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>        thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>        peephole2.
>>>>        (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>        New insn pattern and related peephole2.
>>>>        * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>        (thumb2_check_ldrd_operands): New function.
>>>>        (thumb2_prefer_ldmstm): New function.
>>>>        * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New prototype.
>>>>        (thumb2_check_ldrd_operands): New prototype.
>>>>        (thumb2_prefer_ldmstm): New prototype.
>>>>        * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>        Change the ldm/stm patterns with 2 words to ARM only.
>>>>
>>>>
>>>> 2010-10-16  Wei Guozhi  <carrot@google.com>
>>>>
>>>>        PR target/45335
>>>>        * gcc.target/arm/pr45335.c: New test.
>>>>        * gcc.target/arm/pr40457-1.c: Changed to load 3 words.
>>>>        * gcc.target/arm/pr40457-2.c: Changed to store 3 words.
>>>>        * gcc.target/arm/pr40457-3.c: Changed to store 3 words.
>>>>
>>>>
>>>> Index: thumb2.md
>>>> ===================================================================
>>>> --- thumb2.md   (revision 165492)
>>>> +++ thumb2.md   (working copy)
>>>> @@ -1118,3 +1118,228 @@ (define_peephole2
>>>>   "
>>>>   operands[2] = GEN_INT (32 - INTVAL (operands[2]));
>>>>   ")
>>>> +
>>>> +(define_insn "*thumb2_ldrd"
>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>> +                  (mem:SI (plus:SI
>>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>>> +                               (match_operand:SI 3 "const_int_operand" ""))))
>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>> +                  (mem:SI (plus:SI (match_dup 2)
>>>> +                            (match_operand:SI 4 "const_int_operand" ""))))])]
>>>> +  "TARGET_THUMB2 && arm_arch7
>>>> +   && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>>> +  "*
>>>> +  {
>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>>> +    if (offset1 > offset2)
>>>> +      {
>>>> +       /* Swap the operands so that memory [base+offset1] is loaded into
>>>> +          operands[0].  */
>>>> +       rtx tmp = operands[0];
>>>> +       operands[0] = operands[1];
>>>> +       operands[1] = tmp;
>>>> +       tmp = operands[3];
>>>> +       operands[3] = operands[4];
>>>> +       operands[4] = tmp;
>>>> +       offset1 = INTVAL (operands[3]);
>>>> +       offset2 = INTVAL (operands[4]);
>>>> +      }
>>>> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>> +                             operands[2], operands[3], operands[4], true))
>>>> +      return \"ldmdb\\t%2, {%0, %1}\";
>>>> +    else if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>> +      {
>>>> +       if (offset1 <= -256)
>>>> +         {
>>>> +           output_asm_insn (\"sub\\t%2, %2, %n3\", operands);
>>>> +           output_asm_insn (\"ldr\\t%1, [%2, #4]\", operands);
>>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>> +         }
>>>> +       else
>>>> +         {
>>>> +           output_asm_insn (\"ldr\\t%1, [%2, %4]\", operands);
>>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>> +         }
>>>> +       return \"\";
>>>> +      }
>>>> +    else
>>>> +      return \"ldrd\\t%0, %1, [%2, %3]\";
>>>> +  }"
>>>> +)
>>>> +
>>>> +(define_insn "*thumb2_ldrd_reg1"
>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>> +                  (mem:SI (match_operand:SI 2 "s_register_operand" "rk")))
>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>> +                  (mem:SI (plus:SI (match_dup 2)
>>>> +                            (match_operand:SI 3 "const_int_operand" ""))))])]
>>>> +  "TARGET_THUMB2 && arm_arch7
>>>> +   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>>> +  "*
>>>> +  {
>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>>> +    if (offset2 == 4)
>>>> +      {
>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>> +                                 operands[2], NULL_RTX, operands[3], true))
>>>> +         return \"ldmia\\t%2, {%0, %1}\";
>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>> +         {
>>>> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>> +           return \"\";
>>>> +         }
>>>> +       return \"ldrd\\t%0, %1, [%2]\";
>>>> +      }
>>>> +    else
>>>> +      {
>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>>> +         {
>>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>>> +         }
>>>> +       return \"ldrd\\t%1, %0, [%2, %3]\";
>>>> +      }
>>>> +  }"
>>>> +)
>>>> +
>>>> +(define_insn "*thumb2_ldrd_reg2"
>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>> +                  (mem:SI (plus:SI
>>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>>> +                               (match_operand:SI 3 "const_int_operand" ""))))
>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>> +                  (mem:SI (match_dup 2)))])]
>>>> +  "TARGET_THUMB2 && arm_arch7
>>>> +   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>>> +  "*
>>>> +  {
>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>> +    if (offset1 == -4)
>>>> +      {
>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>> +         {
>>>> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>> +           return \"\";
>>>> +         }
>>>> +       return \"ldrd\\t%0, %1, [%2, %3]\";
>>>> +      }
>>>> +    else
>>>> +      {
>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>> +                                 operands[2], operands[3], NULL_RTX, true))
>>>> +         return \"ldmia\\t%2, {%1, %0}\";
>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>>> +         {
>>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>>> +           return \"\";
>>>> +         }
>>>> +       return \"ldrd\\t%1, %0, [%2]\";
>>>> +      }
>>>> +  }"
>>>> +)
>>>> +
>>>> +(define_peephole2
>>>> +  [(set (match_operand:SI 0 "s_register_operand" "")
>>>> +       (match_operand:SI 2 "memory_operand" ""))
>>>> +   (set (match_operand:SI 1 "s_register_operand" "")
>>>> +       (match_operand:SI 3 "memory_operand" ""))]
>>>> +  "TARGET_THUMB2 && arm_arch7
>>>> +   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>>> +                               operands[2], operands[3], true)"
>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>> +                  (match_operand:SI 2 "memory_operand" ""))
>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>> +                  (match_operand:SI 3 "memory_operand" ""))])]
>>>> +  ""
>>>> +)
>>>> +
>>>> +(define_insn "*thumb2_strd"
>>>> +  [(parallel [(set (mem:SI
>>>> +                       (plus:SI (match_operand:SI 2 "s_register_operand" "rk")
>>>> +                                (match_operand:SI 3 "const_int_operand" "")))
>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>> +             (set (mem:SI (plus:SI (match_dup 2)
>>>> +                                (match_operand:SI 4 "const_int_operand" "")))
>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>> +  "TARGET_THUMB2 && arm_arch7
>>>> +   && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>>> +  "*
>>>> +  {
>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>>> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>> +                             operands[2], operands[3], operands[4], false))
>>>> +      return \"stmdb\\t%2, {%0, %1}\";
>>>> +    if (offset1 < offset2)
>>>> +      return \"strd\\t%0, %1, [%2, %3]\";
>>>> +    else
>>>> +      return \"strd\\t%1, %0, [%2, %4]\";
>>>> +  }"
>>>> +)
>>>> +
>>>> +(define_insn "*thumb2_strd_reg1"
>>>> +  [(parallel [(set (mem:SI (match_operand:SI 2 "s_register_operand" "rk"))
>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>> +             (set (mem:SI (plus:SI (match_dup 2)
>>>> +                               (match_operand:SI 3 "const_int_operand" "")))
>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>> +  "TARGET_THUMB2 && arm_arch7
>>>> +   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>>> +  "*
>>>> +  {
>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>>> +    if (offset2 == 4)
>>>> +      {
>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>> +                                 operands[2], NULL_RTX, operands[3], false))
>>>> +         return \"stmia\\t%2, {%0, %1}\";
>>>> +       return \"strd\\t%0, %1, [%2]\";
>>>> +      }
>>>> +    else
>>>> +      return \"strd\\t%1, %0, [%2, %3]\";
>>>> +  }"
>>>> +)
>>>> +
>>>> +(define_insn "*thumb2_strd_reg2"
>>>> +  [(parallel [(set (mem:SI (plus:SI
>>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>>> +                               (match_operand:SI 3 "const_int_operand" "")))
>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>> +             (set (mem:SI (match_dup 2))
>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>> +  "TARGET_THUMB2 && arm_arch7
>>>> +   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>>> +  "*
>>>> +  {
>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>> +    if (offset1 == -4)
>>>> +      return \"strd\\t%0, %1, [%2, %3]\";
>>>> +    else
>>>> +      {
>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>> +                                 operands[2], operands[3], NULL_RTX, false))
>>>> +         return \"stmia\\t%2, {%1, %0}\";
>>>> +       return \"strd\\t%1, %0, [%2]\";
>>>> +      }
>>>> +  }"
>>>> +)
>>>> +
>>>> +(define_peephole2
>>>> +  [(set (match_operand:SI 2 "memory_operand" "")
>>>> +       (match_operand:SI 0 "s_register_operand" ""))
>>>> +   (set (match_operand:SI 3 "memory_operand" "")
>>>> +       (match_operand:SI 1 "s_register_operand" ""))]
>>>> +  "TARGET_THUMB2 && arm_arch7
>>>> +   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>>> +                               operands[2], operands[3], false)"
>>>> +  [(parallel [(set (match_operand:SI 2 "memory_operand" "")
>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>> +             (set (match_operand:SI 3 "memory_operand" "")
>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>> +  ""
>>>> +)
>>>> Index: arm.c
>>>> ===================================================================
>>>> --- arm.c       (revision 165492)
>>>> +++ arm.c       (working copy)
>>>> @@ -23254,4 +23254,134 @@ arm_builtin_support_vector_misalignment
>>>>                                                      is_packed);
>>>>  }
>>>>
>>>> +/* Check the validity of operands in an ldrd/strd instruction.  */
>>>> +bool
>>>> +thumb2_check_ldrd_operands (rtx off1, rtx off2)
>>>> +{
>>>> +  HOST_WIDE_INT offset1 = 0;
>>>> +  HOST_WIDE_INT offset2 = 0;
>>>> +
>>>> +  if (off1 != NULL_RTX)
>>>> +    offset1 = INTVAL (off1);
>>>> +  if (off2 != NULL_RTX)
>>>> +    offset2 = INTVAL (off2);
>>>> +
>>>> +  /* The offset range of LDRD is [-1020, 1020]. Here we check if both
>>>> +     offsets lie in the range [-1020, 1024]. If one of the offsets is
>>>> +     1024, the following condition ((offset1 + 4) == offset2) will ensure
>>>> +     offset1 to be 1020, suitable for instruction LDRD.  */
>>>> +  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>>> +    return false;
>>>> +  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>>> +    return false;
>>>> +
>>>> +  if ((offset1 + 4) == offset2)
>>>> +    return true;
>>>> +  if ((offset2 + 4) == offset1)
>>>> +    return true;
>>>> +
>>>> +  return false;
>>>> +}
>>>> +
>>>> +/* Check if the two memory accesses can be merged to an ldrd/strd instruction.
>>>> +   That is they use the same base register, and the gap between constant
>>>> +   offsets should be 4.  */
>>>> +bool
>>>> +thumb2_legitimate_ldrd_p (rtx reg1, rtx reg2, rtx mem1, rtx mem2, bool ldrd)
>>>> +{
>>>> +  rtx base1, base2, op1;
>>>> +  rtx addr1 = XEXP (mem1, 0);
>>>> +  rtx addr2 = XEXP (mem2, 0);
>>>> +  HOST_WIDE_INT offset1 = 0;
>>>> +  HOST_WIDE_INT offset2 = 0;
>>>> +
>>>> +  if (MEM_VOLATILE_P (mem1) || MEM_VOLATILE_P (mem2))
>>>> +    return false;
>>>> +
>>>> +  if (REG_P (addr1))
>>>> +    base1 = addr1;
>>>> +  else if (GET_CODE (addr1) == PLUS)
>>>> +    {
>>>> +      base1 = XEXP (addr1, 0);
>>>> +      op1 = XEXP (addr1, 1);
>>>> +      if (!REG_P (base1) || (GET_CODE (op1) != CONST_INT))
>>>> +       return false;
>>>> +      offset1 = INTVAL (op1);
>>>> +    }
>>>> +  else
>>>> +    return false;
>>>> +
>>>> +  if (REG_P (addr2))
>>>> +    base2 = addr2;
>>>> +  else if (GET_CODE (addr2) == PLUS)
>>>> +    {
>>>> +      base2 = XEXP (addr2, 0);
>>>> +      op1 = XEXP (addr2, 1);
>>>> +      if (!REG_P (base2) || (GET_CODE (op1) != CONST_INT))
>>>> +       return false;
>>>> +      offset2 = INTVAL (op1);
>>>> +    }
>>>> +  else
>>>> +    return false;
>>>> +
>>>> +  if (base1 != base2)
>>>> +    return false;
>>>> +
>>>> +  /* The offset range of LDRD is [-1020, 1020]. Here we check if both
>>>> +     offsets lie in the range [-1020, 1024]. If one of the offsets is
>>>> +     1024, the following condition ((offset1 + 4) == offset2) will ensure
>>>> +     offset1 to be 1020, suitable for instruction LDRD.  */
>>>> +  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>>> +    return false;
>>>> +  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>>> +    return false;
>>>> +
>>>> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>> +    return false;
>>>> +
>>>> +  if ((offset1 + 4) == offset2)
>>>> +    return true;
>>>> +  if ((offset2 + 4) == offset1)
>>>> +    return true;
>>>> +
>>>> +  return false;
>>>> +}
>>>> +
>>>> +/* Check if the insn can be expressed as ldm/stm with less cost.  */
>>>> +bool
>>>> +thumb2_prefer_ldmstm (rtx reg1, rtx reg2, rtx base,
>>>> +                     rtx off1, rtx off2, bool ldrd)
>>>> +{
>>>> +  HOST_WIDE_INT offset1 = 0;
>>>> +  HOST_WIDE_INT offset2 = 0;
>>>> +
>>>> +  if (off1 != NULL_RTX)
>>>> +    offset1 = INTVAL (off1);
>>>> +  if (off2 != NULL_RTX)
>>>> +    offset2 = INTVAL (off2);
>>>> +
>>>> +  if (offset1 > offset2)
>>>> +    {
>>>> +      rtx tmp;
>>>> +      HOST_WIDE_INT t = offset1;
>>>> +      offset1 = offset2;
>>>> +      offset2 = t;
>>>> +      tmp = reg1;
>>>> +      reg1 = reg2;
>>>> +      reg2 = tmp;
>>>> +    }
>>>> +
>>>> +  /* The offset of ldmdb is -8, the offset of ldmia is 0.  */
>>>> +  if ((offset1 != -8) && (offset1 != 0))
>>>> +    return false;
>>>> +
>>>> +  /* Lower register corresponds to lower memory.  */
>>>> +  if (REGNO (reg1) > REGNO (reg2))
>>>> +    return false;
>>>> +
>>>> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>>> +     cost.  */
>>>> +  return false;
>>>> +}
>>>> +
>>>>  #include "gt-arm.h"
>>>> Index: arm-protos.h
>>>> ===================================================================
>>>> --- arm-protos.h        (revision 165492)
>>>> +++ arm-protos.h        (working copy)
>>>> @@ -150,6 +150,9 @@ extern void arm_expand_sync (enum machin
>>>>  extern const char *arm_output_memory_barrier (rtx *);
>>>>  extern const char *arm_output_sync_insn (rtx, rtx *);
>>>>  extern unsigned int arm_sync_loop_insns (rtx , rtx *);
>>>> +extern bool thumb2_check_ldrd_operands (rtx, rtx);
>>>> +extern bool thumb2_legitimate_ldrd_p (rtx, rtx, rtx, rtx, bool);
>>>> +extern bool thumb2_prefer_ldmstm (rtx, rtx, rtx, rtx, rtx, bool);
>>>>
>>>>  #if defined TREE_CODE
>>>>  extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
>>>> Index: ldmstm.md
>>>> ===================================================================
>>>> --- ldmstm.md   (revision 165492)
>>>> +++ ldmstm.md   (working copy)
>>>> @@ -852,7 +852,7 @@ (define_insn "*ldm2_ia"
>>>>      (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>>           (mem:SI (plus:SI (match_dup 3)
>>>>                   (const_int 4))))])]
>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>   "ldm%(ia%)\t%3, {%1, %2}"
>>>>   [(set_attr "type" "load2")
>>>>    (set_attr "predicable" "yes")])
>>>> @@ -901,7 +901,7 @@ (define_insn "*stm2_ia"
>>>>           (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>>      (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
>>>>           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>   "stm%(ia%)\t%3, {%1, %2}"
>>>>   [(set_attr "type" "store2")
>>>>    (set_attr "predicable" "yes")])
>>>> @@ -1041,7 +1041,7 @@ (define_insn "*ldm2_db"
>>>>      (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>>           (mem:SI (plus:SI (match_dup 3)
>>>>                   (const_int -4))))])]
>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>   "ldm%(db%)\t%3, {%1, %2}"
>>>>   [(set_attr "type" "load2")
>>>>    (set_attr "predicable" "yes")])
>>>> @@ -1067,7 +1067,7 @@ (define_insn "*stm2_db"
>>>>           (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>>      (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
>>>>           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>   "stm%(db%)\t%3, {%1, %2}"
>>>>   [(set_attr "type" "store2")
>>>>    (set_attr "predicable" "yes")])
>>>>
>>>>
>>>> Index: pr40457-3.c
>>>> ===================================================================
>>>> --- pr40457-3.c (revision 165492)
>>>> +++ pr40457-3.c (working copy)
>>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>>  {
>>>>   p[0] = 1;
>>>>   p[1] = 0;
>>>> +  p[2] = 2;
>>>>  }
>>>>
>>>>  /* { dg-final { scan-assembler "stm" } } */
>>>> Index: pr40457-1.c
>>>> ===================================================================
>>>> --- pr40457-1.c (revision 165492)
>>>> +++ pr40457-1.c (working copy)
>>>> @@ -1,9 +1,9 @@
>>>> -/* { dg-options "-Os" }  */
>>>> +/* { dg-options "-O2" }  */
>>>>  /* { dg-do compile } */
>>>>
>>>>  int bar(int* p)
>>>>  {
>>>> -  int x = p[0] + p[1];
>>>> +  int x = p[0] + p[1] + p[2];
>>>>   return x;
>>>>  }
>>>>
>>>> Index: pr40457-2.c
>>>> ===================================================================
>>>> --- pr40457-2.c (revision 165492)
>>>> +++ pr40457-2.c (working copy)
>>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>>  {
>>>>   p[0] = 1;
>>>>   p[1] = 0;
>>>> +  p[2] = 2;
>>>>  }
>>>>
>>>>  /* { dg-final { scan-assembler "stm" } } */
>>>> Index: pr45335.c
>>>> ===================================================================
>>>> --- pr45335.c   (revision 0)
>>>> +++ pr45335.c   (revision 0)
>>>> @@ -0,0 +1,22 @@
>>>> +/* { dg-options "-mthumb -O2" } */
>>>> +/* { dg-require-effective-target arm_thumb2_ok } */
>>>> +/* { dg-final { scan-assembler "ldrd" } } */
>>>> +/* { dg-final { scan-assembler "strd" } } */
>>>> +
>>>> +struct S
>>>> +{
>>>> +    void* p1;
>>>> +    void* p2;
>>>> +    void* p3;
>>>> +    void* p4;
>>>> +};
>>>> +
>>>> +extern printf(char*, ...);
>>>> +
>>>> +void foo1(struct S* fp, struct S* otherSaveArea)
>>>> +{
>>>> +    struct S* saveA = fp - 1;
>>>> +    printf("StackSaveArea for fp %p [%p/%p]:\n", fp, saveA, otherSaveArea);
>>>> +    printf("prevFrame=%p savedPc=%p meth=%p curPc=%p fp[0]=0x%08x\n",
>>>> +        saveA->p1, saveA->p2, saveA->p3, saveA->p4, *(unsigned int*)fp);
>>>> +}
>>>>
>>>
>>
>
Carrot Wei Dec. 14, 2010, 10 p.m. UTC | #5
ping

On Mon, Nov 29, 2010 at 2:32 PM, Carrot Wei <carrot@google.com> wrote:
> ping
>
> On Mon, Nov 22, 2010 at 3:16 PM, Carrot Wei <carrot@google.com> wrote:
>> ping
>>
>> On Sun, Oct 31, 2010 at 2:22 AM, Carrot Wei <carrot@google.com> wrote:
>>> Ping
>>>
>>> On Sun, Oct 24, 2010 at 9:46 PM, Carrot Wei <carrot@google.com> wrote:
>>>> Ping
>>>>
>>>> On Sat, Oct 16, 2010 at 8:27 PM, Carrot Wei <carrot@google.com> wrote:
>>>>> On Wed, Oct 13, 2010 at 7:01 PM, Paul Brook <paul@codesourcery.com> wrote:
>>>>>>> ChangeLog:
>>>>>>> 2010-09-04  Wei Guozhi  <carrot@google.com>
>>>>>>>
>>>>>>>         PR target/45335
>>>>>>>         * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>>>>         thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>>>>         peephole2.
>>>>>>>         (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>>>>         New insn pattern and related peephole2.
>>>>>>>         * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>>>>         (thumb2_check_ldrd_operands): New function.
>>>>>>>         (thumb2_prefer_ldmstm): New function.
>>>>>>>         * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New
>>>>>>> prototype. (thumb2_check_ldrd_operands): New prototype.
>>>>>>>         (thumb2_prefer_ldmstm): New prototype.
>>>>>>>         * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>>>>         Change the ldm/stm patterns with 2 words to ARM only.
>>>>>>>         * gcc/config/arm/constraints.md (Py): New thumb2 constant
>>>>>>> constraint suitable to ldrd/strd instructions.
>>>>>>
>>>>>> Not ok.
>>>>>>
>>>>>> Why is this restricted to Thumb mode? The ARM variant of ldrd isn't quite as
>>>>>> flexible, but still provides a useful improvement over ldm.
>>>>>>
>>>>> I agree the ARM version is also useful. But it brings much less
>>>>> benefit with too much complexity (due to more restriction and insn
>>>>> pattern conflict with ldm). So I will leave it as a future
>>>>> improvement.
>>>>>
>>>>>> This transformation is only valid on ARMv7 cores. On earlier hardware
>>>>>> (depending on system configuration) it may cause undefined behavior or an
>>>>>> alignment trap.
>>>>>>
>>>>> done.
>>>>>
>>>>>> The range on -1020 to +1024 is used in several places, but without any
>>>>>> apparent explanation of why it's different to the range of an ldrd
>>>>>> instruction.  I figured it out eventually, but it deserves a comment.
>>>>>>
>>>>> Comments added.
>>>>>
>>>>>>> +  "TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
>>>>>>> +                                           operands[2], 0, operands[3], 1)"
>>>>>>
>>>>>> Passed operands do not match expected types. Specifically "0" is not an rtx
>>>>>> (should be "NULL_RTX"), and "1" is not a boolean value (should be "true").
>>>>>> Many other occurrences.
>>>>>>
>>>>> Fixed.
>>>>>
>>>>>>> +(define_constraint "Py"
>>>>>>> +  "@internal In Thumb-2 state a constant that is a multiple of 4 in the
>>>>>>> +   range -1020 to 1024"
>>>>>>
>>>>>> This comment seems particularly pointless. You should mention why this
>>>>>> exists/where it is used.
>>>>>>
>>>>>> I think you're better off enforcing this in the insn condition, and remove
>>>>>> this constraint. At least half the uses (the -reg[12] insns) are incorrect,
>>>>>> and you already need the condition to enforce the dependency between the
>>>>>> operands.
>>>>>>
>>>>> I removed this constraint and add the check to insn condition.
>>>>>
>>>>>>> +thumb2_check_ldrd_operands (rtx reg1, rtx reg2, rtx base,
>>>>>>>...
>>>>>>> +  if (ldrd && (reg1 == reg2))
>>>>>>> +    return false;
>>>>>>
>>>>>> This function is part of the instruction condition.  Instruction conditions
>>>>>> must not be used to enforce register allocation.
>>>>>>
>>>>> removed.
>>>>>
>>>>>>> +thumb2_legitimate_ldrd_p (
>>>>>>>...
>>>>>>> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>>>>> +    return false;
>>>>>>
>>>>>> You're incorrectly assuming offset1 < offset2, which might not be true at this
>>>>>> point.
>>>>>>
>>>>> The following check assumes offset1 < offset2
>>>>> +  if ((offset1 + 4) == offset2)
>>>>> +    return true;
>>>>>
>>>>> And another check assumes offset2 < offset1, so both cases are covered.
>>>>> +  if ((offset2 + 4) == offset1)
>>>>> +    return true;
>>>>>
>>>>>>> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>>>>>> +     cost.  */
>>>>>>> +  return false;
>>>>>>
>>>>>> Code clearly doesn't match the comment.  In fact this function always returns
>>>>>> false.
>>>>>>
>>>>> Richard mentioned that in some cases (specifically cortex A9) ldm has
>>>>> less cost than ldrd and we should model this in the insn pattern. This
>>>>> function is used for this. But I don't know the cortex A9 architecture
>>>>> detail, so it should be filled by somebody with more knowledge about
>>>>> it in future.
>>>>>
>>>>> Wei Guozhi
>>>>>
>>>>>
>>>>> ChangeLog:
>>>>> 2010-10-16  Wei Guozhi  <carrot@google.com>
>>>>>
>>>>>        PR target/45335
>>>>>        * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>>        thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>>        peephole2.
>>>>>        (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>>        New insn pattern and related peephole2.
>>>>>        * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>>        (thumb2_check_ldrd_operands): New function.
>>>>>        (thumb2_prefer_ldmstm): New function.
>>>>>        * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New prototype.
>>>>>        (thumb2_check_ldrd_operands): New prototype.
>>>>>        (thumb2_prefer_ldmstm): New prototype.
>>>>>        * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>>        Change the ldm/stm patterns with 2 words to ARM only.
>>>>>
>>>>>
>>>>> 2010-10-16  Wei Guozhi  <carrot@google.com>
>>>>>
>>>>>        PR target/45335
>>>>>        * gcc.target/arm/pr45335.c: New test.
>>>>>        * gcc.target/arm/pr40457-1.c: Changed to load 3 words.
>>>>>        * gcc.target/arm/pr40457-2.c: Changed to store 3 words.
>>>>>        * gcc.target/arm/pr40457-3.c: Changed to store 3 words.
>>>>>
>>>>>
>>>>> Index: thumb2.md
>>>>> ===================================================================
>>>>> --- thumb2.md   (revision 165492)
>>>>> +++ thumb2.md   (working copy)
>>>>> @@ -1118,3 +1118,228 @@ (define_peephole2
>>>>>   "
>>>>>   operands[2] = GEN_INT (32 - INTVAL (operands[2]));
>>>>>   ")
>>>>> +
>>>>> +(define_insn "*thumb2_ldrd"
>>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> +                  (mem:SI (plus:SI
>>>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>>>> +                               (match_operand:SI 3 "const_int_operand" ""))))
>>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>>> +                  (mem:SI (plus:SI (match_dup 2)
>>>>> +                            (match_operand:SI 4 "const_int_operand" ""))))])]
>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>> +   && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>>>> +  "*
>>>>> +  {
>>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>>>> +    if (offset1 > offset2)
>>>>> +      {
>>>>> +       /* Swap the operands so that memory [base+offset1] is loaded into
>>>>> +          operands[0].  */
>>>>> +       rtx tmp = operands[0];
>>>>> +       operands[0] = operands[1];
>>>>> +       operands[1] = tmp;
>>>>> +       tmp = operands[3];
>>>>> +       operands[3] = operands[4];
>>>>> +       operands[4] = tmp;
>>>>> +       offset1 = INTVAL (operands[3]);
>>>>> +       offset2 = INTVAL (operands[4]);
>>>>> +      }
>>>>> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> +                             operands[2], operands[3], operands[4], true))
>>>>> +      return \"ldmdb\\t%2, {%0, %1}\";
>>>>> +    else if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>> +      {
>>>>> +       if (offset1 <= -256)
>>>>> +         {
>>>>> +           output_asm_insn (\"sub\\t%2, %2, %n3\", operands);
>>>>> +           output_asm_insn (\"ldr\\t%1, [%2, #4]\", operands);
>>>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>> +         }
>>>>> +       else
>>>>> +         {
>>>>> +           output_asm_insn (\"ldr\\t%1, [%2, %4]\", operands);
>>>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>> +         }
>>>>> +       return \"\";
>>>>> +      }
>>>>> +    else
>>>>> +      return \"ldrd\\t%0, %1, [%2, %3]\";
>>>>> +  }"
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_ldrd_reg1"
>>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> +                  (mem:SI (match_operand:SI 2 "s_register_operand" "rk")))
>>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>>> +                  (mem:SI (plus:SI (match_dup 2)
>>>>> +                            (match_operand:SI 3 "const_int_operand" ""))))])]
>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>> +   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>>>> +  "*
>>>>> +  {
>>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>>>> +    if (offset2 == 4)
>>>>> +      {
>>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> +                                 operands[2], NULL_RTX, operands[3], true))
>>>>> +         return \"ldmia\\t%2, {%0, %1}\";
>>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>> +         {
>>>>> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>> +           return \"\";
>>>>> +         }
>>>>> +       return \"ldrd\\t%0, %1, [%2]\";
>>>>> +      }
>>>>> +    else
>>>>> +      {
>>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>>>> +         {
>>>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>>>> +         }
>>>>> +       return \"ldrd\\t%1, %0, [%2, %3]\";
>>>>> +      }
>>>>> +  }"
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_ldrd_reg2"
>>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> +                  (mem:SI (plus:SI
>>>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>>>> +                               (match_operand:SI 3 "const_int_operand" ""))))
>>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>>> +                  (mem:SI (match_dup 2)))])]
>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>> +   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>>>> +  "*
>>>>> +  {
>>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>> +    if (offset1 == -4)
>>>>> +      {
>>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>> +         {
>>>>> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>> +           return \"\";
>>>>> +         }
>>>>> +       return \"ldrd\\t%0, %1, [%2, %3]\";
>>>>> +      }
>>>>> +    else
>>>>> +      {
>>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> +                                 operands[2], operands[3], NULL_RTX, true))
>>>>> +         return \"ldmia\\t%2, {%1, %0}\";
>>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>>>> +         {
>>>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>>>> +           return \"\";
>>>>> +         }
>>>>> +       return \"ldrd\\t%1, %0, [%2]\";
>>>>> +      }
>>>>> +  }"
>>>>> +)
>>>>> +
>>>>> +(define_peephole2
>>>>> +  [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> +       (match_operand:SI 2 "memory_operand" ""))
>>>>> +   (set (match_operand:SI 1 "s_register_operand" "")
>>>>> +       (match_operand:SI 3 "memory_operand" ""))]
>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>> +   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>>>> +                               operands[2], operands[3], true)"
>>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> +                  (match_operand:SI 2 "memory_operand" ""))
>>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>>> +                  (match_operand:SI 3 "memory_operand" ""))])]
>>>>> +  ""
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_strd"
>>>>> +  [(parallel [(set (mem:SI
>>>>> +                       (plus:SI (match_operand:SI 2 "s_register_operand" "rk")
>>>>> +                                (match_operand:SI 3 "const_int_operand" "")))
>>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>>> +             (set (mem:SI (plus:SI (match_dup 2)
>>>>> +                                (match_operand:SI 4 "const_int_operand" "")))
>>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>> +   && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>>>> +  "*
>>>>> +  {
>>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>>>> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> +                             operands[2], operands[3], operands[4], false))
>>>>> +      return \"stmdb\\t%2, {%0, %1}\";
>>>>> +    if (offset1 < offset2)
>>>>> +      return \"strd\\t%0, %1, [%2, %3]\";
>>>>> +    else
>>>>> +      return \"strd\\t%1, %0, [%2, %4]\";
>>>>> +  }"
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_strd_reg1"
>>>>> +  [(parallel [(set (mem:SI (match_operand:SI 2 "s_register_operand" "rk"))
>>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>>> +             (set (mem:SI (plus:SI (match_dup 2)
>>>>> +                               (match_operand:SI 3 "const_int_operand" "")))
>>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>> +   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>>>> +  "*
>>>>> +  {
>>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>>>> +    if (offset2 == 4)
>>>>> +      {
>>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> +                                 operands[2], NULL_RTX, operands[3], false))
>>>>> +         return \"stmia\\t%2, {%0, %1}\";
>>>>> +       return \"strd\\t%0, %1, [%2]\";
>>>>> +      }
>>>>> +    else
>>>>> +      return \"strd\\t%1, %0, [%2, %3]\";
>>>>> +  }"
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_strd_reg2"
>>>>> +  [(parallel [(set (mem:SI (plus:SI
>>>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>>>> +                               (match_operand:SI 3 "const_int_operand" "")))
>>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>>> +             (set (mem:SI (match_dup 2))
>>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>> +   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>>>> +  "*
>>>>> +  {
>>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>> +    if (offset1 == -4)
>>>>> +      return \"strd\\t%0, %1, [%2, %3]\";
>>>>> +    else
>>>>> +      {
>>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> +                                 operands[2], operands[3], NULL_RTX, false))
>>>>> +         return \"stmia\\t%2, {%1, %0}\";
>>>>> +       return \"strd\\t%1, %0, [%2]\";
>>>>> +      }
>>>>> +  }"
>>>>> +)
>>>>> +
>>>>> +(define_peephole2
>>>>> +  [(set (match_operand:SI 2 "memory_operand" "")
>>>>> +       (match_operand:SI 0 "s_register_operand" ""))
>>>>> +   (set (match_operand:SI 3 "memory_operand" "")
>>>>> +       (match_operand:SI 1 "s_register_operand" ""))]
>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>> +   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>>>> +                               operands[2], operands[3], false)"
>>>>> +  [(parallel [(set (match_operand:SI 2 "memory_operand" "")
>>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>>> +             (set (match_operand:SI 3 "memory_operand" "")
>>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>>> +  ""
>>>>> +)
>>>>> Index: arm.c
>>>>> ===================================================================
>>>>> --- arm.c       (revision 165492)
>>>>> +++ arm.c       (working copy)
>>>>> @@ -23254,4 +23254,134 @@ arm_builtin_support_vector_misalignment
>>>>>                                                      is_packed);
>>>>>  }
>>>>>
>>>>> +/* Check the validity of operands in an ldrd/strd instruction.  */
>>>>> +bool
>>>>> +thumb2_check_ldrd_operands (rtx off1, rtx off2)
>>>>> +{
>>>>> +  HOST_WIDE_INT offset1 = 0;
>>>>> +  HOST_WIDE_INT offset2 = 0;
>>>>> +
>>>>> +  if (off1 != NULL_RTX)
>>>>> +    offset1 = INTVAL (off1);
>>>>> +  if (off2 != NULL_RTX)
>>>>> +    offset2 = INTVAL (off2);
>>>>> +
>>>>> +  /* The offset range of LDRD is [-1020, 1020]. Here we check if both
>>>>> +     offsets lie in the range [-1020, 1024]. If one of the offsets is
>>>>> +     1024, the following condition ((offset1 + 4) == offset2) will ensure
>>>>> +     offset1 to be 1020, suitable for instruction LDRD.  */
>>>>> +  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>>>> +    return false;
>>>>> +  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>>>> +    return false;
>>>>> +
>>>>> +  if ((offset1 + 4) == offset2)
>>>>> +    return true;
>>>>> +  if ((offset2 + 4) == offset1)
>>>>> +    return true;
>>>>> +
>>>>> +  return false;
>>>>> +}
>>>>> +
>>>>> +/* Check if the two memory accesses can be merged to an ldrd/strd instruction.
>>>>> +   That is they use the same base register, and the gap between constant
>>>>> +   offsets should be 4.  */
>>>>> +bool
>>>>> +thumb2_legitimate_ldrd_p (rtx reg1, rtx reg2, rtx mem1, rtx mem2, bool ldrd)
>>>>> +{
>>>>> +  rtx base1, base2, op1;
>>>>> +  rtx addr1 = XEXP (mem1, 0);
>>>>> +  rtx addr2 = XEXP (mem2, 0);
>>>>> +  HOST_WIDE_INT offset1 = 0;
>>>>> +  HOST_WIDE_INT offset2 = 0;
>>>>> +
>>>>> +  if (MEM_VOLATILE_P (mem1) || MEM_VOLATILE_P (mem2))
>>>>> +    return false;
>>>>> +
>>>>> +  if (REG_P (addr1))
>>>>> +    base1 = addr1;
>>>>> +  else if (GET_CODE (addr1) == PLUS)
>>>>> +    {
>>>>> +      base1 = XEXP (addr1, 0);
>>>>> +      op1 = XEXP (addr1, 1);
>>>>> +      if (!REG_P (base1) || (GET_CODE (op1) != CONST_INT))
>>>>> +       return false;
>>>>> +      offset1 = INTVAL (op1);
>>>>> +    }
>>>>> +  else
>>>>> +    return false;
>>>>> +
>>>>> +  if (REG_P (addr2))
>>>>> +    base2 = addr2;
>>>>> +  else if (GET_CODE (addr2) == PLUS)
>>>>> +    {
>>>>> +      base2 = XEXP (addr2, 0);
>>>>> +      op1 = XEXP (addr2, 1);
>>>>> +      if (!REG_P (base2) || (GET_CODE (op1) != CONST_INT))
>>>>> +       return false;
>>>>> +      offset2 = INTVAL (op1);
>>>>> +    }
>>>>> +  else
>>>>> +    return false;
>>>>> +
>>>>> +  if (base1 != base2)
>>>>> +    return false;
>>>>> +
>>>>> +  /* The offset range of LDRD is [-1020, 1020]. Here we check if both
>>>>> +     offsets lie in the range [-1020, 1024]. If one of the offsets is
>>>>> +     1024, the following condition ((offset1 + 4) == offset2) will ensure
>>>>> +     offset1 to be 1020, suitable for instruction LDRD.  */
>>>>> +  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>>>> +    return false;
>>>>> +  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>>>> +    return false;
>>>>> +
>>>>> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>>> +    return false;
>>>>> +
>>>>> +  if ((offset1 + 4) == offset2)
>>>>> +    return true;
>>>>> +  if ((offset2 + 4) == offset1)
>>>>> +    return true;
>>>>> +
>>>>> +  return false;
>>>>> +}
>>>>> +
>>>>> +/* Check if the insn can be expressed as ldm/stm with less cost.  */
>>>>> +bool
>>>>> +thumb2_prefer_ldmstm (rtx reg1, rtx reg2, rtx base,
>>>>> +                     rtx off1, rtx off2, bool ldrd)
>>>>> +{
>>>>> +  HOST_WIDE_INT offset1 = 0;
>>>>> +  HOST_WIDE_INT offset2 = 0;
>>>>> +
>>>>> +  if (off1 != NULL_RTX)
>>>>> +    offset1 = INTVAL (off1);
>>>>> +  if (off2 != NULL_RTX)
>>>>> +    offset2 = INTVAL (off2);
>>>>> +
>>>>> +  if (offset1 > offset2)
>>>>> +    {
>>>>> +      rtx tmp;
>>>>> +      HOST_WIDE_INT t = offset1;
>>>>> +      offset1 = offset2;
>>>>> +      offset2 = t;
>>>>> +      tmp = reg1;
>>>>> +      reg1 = reg2;
>>>>> +      reg2 = tmp;
>>>>> +    }
>>>>> +
>>>>> +  /* The offset of ldmdb is -8, the offset of ldmia is 0.  */
>>>>> +  if ((offset1 != -8) && (offset1 != 0))
>>>>> +    return false;
>>>>> +
>>>>> +  /* Lower register corresponds to lower memory.  */
>>>>> +  if (REGNO (reg1) > REGNO (reg2))
>>>>> +    return false;
>>>>> +
>>>>> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>>>> +     cost.  */
>>>>> +  return false;
>>>>> +}
>>>>> +
>>>>>  #include "gt-arm.h"
>>>>> Index: arm-protos.h
>>>>> ===================================================================
>>>>> --- arm-protos.h        (revision 165492)
>>>>> +++ arm-protos.h        (working copy)
>>>>> @@ -150,6 +150,9 @@ extern void arm_expand_sync (enum machin
>>>>>  extern const char *arm_output_memory_barrier (rtx *);
>>>>>  extern const char *arm_output_sync_insn (rtx, rtx *);
>>>>>  extern unsigned int arm_sync_loop_insns (rtx , rtx *);
>>>>> +extern bool thumb2_check_ldrd_operands (rtx, rtx);
>>>>> +extern bool thumb2_legitimate_ldrd_p (rtx, rtx, rtx, rtx, bool);
>>>>> +extern bool thumb2_prefer_ldmstm (rtx, rtx, rtx, rtx, rtx, bool);
>>>>>
>>>>>  #if defined TREE_CODE
>>>>>  extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
>>>>> Index: ldmstm.md
>>>>> ===================================================================
>>>>> --- ldmstm.md   (revision 165492)
>>>>> +++ ldmstm.md   (working copy)
>>>>> @@ -852,7 +852,7 @@ (define_insn "*ldm2_ia"
>>>>>      (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>>>           (mem:SI (plus:SI (match_dup 3)
>>>>>                   (const_int 4))))])]
>>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>   "ldm%(ia%)\t%3, {%1, %2}"
>>>>>   [(set_attr "type" "load2")
>>>>>    (set_attr "predicable" "yes")])
>>>>> @@ -901,7 +901,7 @@ (define_insn "*stm2_ia"
>>>>>           (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>>>      (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
>>>>>           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>   "stm%(ia%)\t%3, {%1, %2}"
>>>>>   [(set_attr "type" "store2")
>>>>>    (set_attr "predicable" "yes")])
>>>>> @@ -1041,7 +1041,7 @@ (define_insn "*ldm2_db"
>>>>>      (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>>>           (mem:SI (plus:SI (match_dup 3)
>>>>>                   (const_int -4))))])]
>>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>   "ldm%(db%)\t%3, {%1, %2}"
>>>>>   [(set_attr "type" "load2")
>>>>>    (set_attr "predicable" "yes")])
>>>>> @@ -1067,7 +1067,7 @@ (define_insn "*stm2_db"
>>>>>           (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>>>      (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
>>>>>           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>   "stm%(db%)\t%3, {%1, %2}"
>>>>>   [(set_attr "type" "store2")
>>>>>    (set_attr "predicable" "yes")])
>>>>>
>>>>>
>>>>> Index: pr40457-3.c
>>>>> ===================================================================
>>>>> --- pr40457-3.c (revision 165492)
>>>>> +++ pr40457-3.c (working copy)
>>>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>>>  {
>>>>>   p[0] = 1;
>>>>>   p[1] = 0;
>>>>> +  p[2] = 2;
>>>>>  }
>>>>>
>>>>>  /* { dg-final { scan-assembler "stm" } } */
>>>>> Index: pr40457-1.c
>>>>> ===================================================================
>>>>> --- pr40457-1.c (revision 165492)
>>>>> +++ pr40457-1.c (working copy)
>>>>> @@ -1,9 +1,9 @@
>>>>> -/* { dg-options "-Os" }  */
>>>>> +/* { dg-options "-O2" }  */
>>>>>  /* { dg-do compile } */
>>>>>
>>>>>  int bar(int* p)
>>>>>  {
>>>>> -  int x = p[0] + p[1];
>>>>> +  int x = p[0] + p[1] + p[2];
>>>>>   return x;
>>>>>  }
>>>>>
>>>>> Index: pr40457-2.c
>>>>> ===================================================================
>>>>> --- pr40457-2.c (revision 165492)
>>>>> +++ pr40457-2.c (working copy)
>>>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>>>  {
>>>>>   p[0] = 1;
>>>>>   p[1] = 0;
>>>>> +  p[2] = 2;
>>>>>  }
>>>>>
>>>>>  /* { dg-final { scan-assembler "stm" } } */
>>>>> Index: pr45335.c
>>>>> ===================================================================
>>>>> --- pr45335.c   (revision 0)
>>>>> +++ pr45335.c   (revision 0)
>>>>> @@ -0,0 +1,22 @@
>>>>> +/* { dg-options "-mthumb -O2" } */
>>>>> +/* { dg-require-effective-target arm_thumb2_ok } */
>>>>> +/* { dg-final { scan-assembler "ldrd" } } */
>>>>> +/* { dg-final { scan-assembler "strd" } } */
>>>>> +
>>>>> +struct S
>>>>> +{
>>>>> +    void* p1;
>>>>> +    void* p2;
>>>>> +    void* p3;
>>>>> +    void* p4;
>>>>> +};
>>>>> +
>>>>> +extern printf(char*, ...);
>>>>> +
>>>>> +void foo1(struct S* fp, struct S* otherSaveArea)
>>>>> +{
>>>>> +    struct S* saveA = fp - 1;
>>>>> +    printf("StackSaveArea for fp %p [%p/%p]:\n", fp, saveA, otherSaveArea);
>>>>> +    printf("prevFrame=%p savedPc=%p meth=%p curPc=%p fp[0]=0x%08x\n",
>>>>> +        saveA->p1, saveA->p2, saveA->p3, saveA->p4, *(unsigned int*)fp);
>>>>> +}
>>>>>
>>>>
>>>
>>
>
Carrot Wei Jan. 4, 2011, 6:09 a.m. UTC | #6
Happy new year!
Hope I can check in this patch in 2011


On Wed, Dec 15, 2010 at 6:00 AM, Carrot Wei <carrot@google.com> wrote:
> ping
>
> On Mon, Nov 29, 2010 at 2:32 PM, Carrot Wei <carrot@google.com> wrote:
>> ping
>>
>> On Mon, Nov 22, 2010 at 3:16 PM, Carrot Wei <carrot@google.com> wrote:
>>> ping
>>>
>>> On Sun, Oct 31, 2010 at 2:22 AM, Carrot Wei <carrot@google.com> wrote:
>>>> Ping
>>>>
>>>> On Sun, Oct 24, 2010 at 9:46 PM, Carrot Wei <carrot@google.com> wrote:
>>>>> Ping
>>>>>
>>>>> On Sat, Oct 16, 2010 at 8:27 PM, Carrot Wei <carrot@google.com> wrote:
>>>>>> On Wed, Oct 13, 2010 at 7:01 PM, Paul Brook <paul@codesourcery.com> wrote:
>>>>>>>> ChangeLog:
>>>>>>>> 2010-09-04  Wei Guozhi  <carrot@google.com>
>>>>>>>>
>>>>>>>>         PR target/45335
>>>>>>>>         * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>>>>>         thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>>>>>         peephole2.
>>>>>>>>         (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>>>>>         New insn pattern and related peephole2.
>>>>>>>>         * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>>>>>         (thumb2_check_ldrd_operands): New function.
>>>>>>>>         (thumb2_prefer_ldmstm): New function.
>>>>>>>>         * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New
>>>>>>>> prototype. (thumb2_check_ldrd_operands): New prototype.
>>>>>>>>         (thumb2_prefer_ldmstm): New prototype.
>>>>>>>>         * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>>>>>         Change the ldm/stm patterns with 2 words to ARM only.
>>>>>>>>         * gcc/config/arm/constraints.md (Py): New thumb2 constant
>>>>>>>> constraint suitable to ldrd/strd instructions.
>>>>>>>
>>>>>>> Not ok.
>>>>>>>
>>>>>>> Why is this restricted to Thumb mode? The ARM variant of ldrd isn't quite as
>>>>>>> flexible, but still provides a useful improvement over ldm.
>>>>>>>
>>>>>> I agree the ARM version is also useful. But it brings much less
>>>>>> benefit with too much complexity (due to more restriction and insn
>>>>>> pattern conflict with ldm). So I will leave it as a future
>>>>>> improvement.
>>>>>>
>>>>>>> This transformation is only valid on ARMv7 cores. On earlier hardware
>>>>>>> (depending on system configuration) it may cause undefined behavior or an
>>>>>>> alignment trap.
>>>>>>>
>>>>>> done.
>>>>>>
>>>>>>> The range on -1020 to +1024 is used in several places, but without any
>>>>>>> apparent explanation of why it's different to the range of an ldrd
>>>>>>> instruction.  I figured it out eventually, but it deserves a comment.
>>>>>>>
>>>>>> Comments added.
>>>>>>
>>>>>>>> +  "TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
>>>>>>>> +                                           operands[2], 0, operands[3], 1)"
>>>>>>>
>>>>>>> Passed operands do not match expected types. Specifically "0" is not an rtx
>>>>>>> (should be "NULL_RTX"), and "1" is not a boolean value (should be "true").
>>>>>>> Many other occurrences.
>>>>>>>
>>>>>> Fixed.
>>>>>>
>>>>>>>> +(define_constraint "Py"
>>>>>>>> +  "@internal In Thumb-2 state a constant that is a multiple of 4 in the
>>>>>>>> +   range -1020 to 1024"
>>>>>>>
>>>>>>> This comment seems particularly pointless. You should mention why this
>>>>>>> exists/where it is used.
>>>>>>>
>>>>>>> I think you're better off enforcing this in the insn condition, and remove
>>>>>>> this constraint. At least half the uses (the -reg[12] insns) are incorrect,
>>>>>>> and you already need the condition to enforce the dependency between the
>>>>>>> operands.
>>>>>>>
>>>>>> I removed this constraint and add the check to insn condition.
>>>>>>
>>>>>>>> +thumb2_check_ldrd_operands (rtx reg1, rtx reg2, rtx base,
>>>>>>>>...
>>>>>>>> +  if (ldrd && (reg1 == reg2))
>>>>>>>> +    return false;
>>>>>>>
>>>>>>> This function is part of the instruction condition.  Instruction conditions
>>>>>>> must not be used to enforce register allocation.
>>>>>>>
>>>>>> removed.
>>>>>>
>>>>>>>> +thumb2_legitimate_ldrd_p (
>>>>>>>>...
>>>>>>>> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>>>>>> +    return false;
>>>>>>>
>>>>>>> You're incorrectly assuming offset1 < offset2, which might not be true at this
>>>>>>> point.
>>>>>>>
>>>>>> The following check assumes offset1 < offset2
>>>>>> +  if ((offset1 + 4) == offset2)
>>>>>> +    return true;
>>>>>>
>>>>>> And another check assumes offset2 < offset1, so both cases are covered.
>>>>>> +  if ((offset2 + 4) == offset1)
>>>>>> +    return true;
>>>>>>
>>>>>>>> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>>>>>>> +     cost.  */
>>>>>>>> +  return false;
>>>>>>>
>>>>>>> Code clearly doesn't match the comment.  In fact this function always returns
>>>>>>> false.
>>>>>>>
>>>>>> Richard mentioned that in some cases (specifically cortex A9) ldm has
>>>>>> less cost than ldrd and we should model this in the insn pattern. This
>>>>>> function is used for this. But I don't know the cortex A9 architecture
>>>>>> detail, so it should be filled by somebody with more knowledge about
>>>>>> it in future.
>>>>>>
>>>>>> Wei Guozhi
>>>>>>
>>>>>>
>>>>>> ChangeLog:
>>>>>> 2010-10-16  Wei Guozhi  <carrot@google.com>
>>>>>>
>>>>>>        PR target/45335
>>>>>>        * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>>>        thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>>>        peephole2.
>>>>>>        (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>>>        New insn pattern and related peephole2.
>>>>>>        * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>>>        (thumb2_check_ldrd_operands): New function.
>>>>>>        (thumb2_prefer_ldmstm): New function.
>>>>>>        * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New prototype.
>>>>>>        (thumb2_check_ldrd_operands): New prototype.
>>>>>>        (thumb2_prefer_ldmstm): New prototype.
>>>>>>        * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>>>        Change the ldm/stm patterns with 2 words to ARM only.
>>>>>>
>>>>>>
>>>>>> 2010-10-16  Wei Guozhi  <carrot@google.com>
>>>>>>
>>>>>>        PR target/45335
>>>>>>        * gcc.target/arm/pr45335.c: New test.
>>>>>>        * gcc.target/arm/pr40457-1.c: Changed to load 3 words.
>>>>>>        * gcc.target/arm/pr40457-2.c: Changed to store 3 words.
>>>>>>        * gcc.target/arm/pr40457-3.c: Changed to store 3 words.
>>>>>>
>>>>>>
>>>>>> Index: thumb2.md
>>>>>> ===================================================================
>>>>>> --- thumb2.md   (revision 165492)
>>>>>> +++ thumb2.md   (working copy)
>>>>>> @@ -1118,3 +1118,228 @@ (define_peephole2
>>>>>>   "
>>>>>>   operands[2] = GEN_INT (32 - INTVAL (operands[2]));
>>>>>>   ")
>>>>>> +
>>>>>> +(define_insn "*thumb2_ldrd"
>>>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>>> +                  (mem:SI (plus:SI
>>>>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>>>>> +                               (match_operand:SI 3 "const_int_operand" ""))))
>>>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>>>> +                  (mem:SI (plus:SI (match_dup 2)
>>>>>> +                            (match_operand:SI 4 "const_int_operand" ""))))])]
>>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>>> +   && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>>>>> +  "*
>>>>>> +  {
>>>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>>>>> +    if (offset1 > offset2)
>>>>>> +      {
>>>>>> +       /* Swap the operands so that memory [base+offset1] is loaded into
>>>>>> +          operands[0].  */
>>>>>> +       rtx tmp = operands[0];
>>>>>> +       operands[0] = operands[1];
>>>>>> +       operands[1] = tmp;
>>>>>> +       tmp = operands[3];
>>>>>> +       operands[3] = operands[4];
>>>>>> +       operands[4] = tmp;
>>>>>> +       offset1 = INTVAL (operands[3]);
>>>>>> +       offset2 = INTVAL (operands[4]);
>>>>>> +      }
>>>>>> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> +                             operands[2], operands[3], operands[4], true))
>>>>>> +      return \"ldmdb\\t%2, {%0, %1}\";
>>>>>> +    else if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>>> +      {
>>>>>> +       if (offset1 <= -256)
>>>>>> +         {
>>>>>> +           output_asm_insn (\"sub\\t%2, %2, %n3\", operands);
>>>>>> +           output_asm_insn (\"ldr\\t%1, [%2, #4]\", operands);
>>>>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>>> +         }
>>>>>> +       else
>>>>>> +         {
>>>>>> +           output_asm_insn (\"ldr\\t%1, [%2, %4]\", operands);
>>>>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>>> +         }
>>>>>> +       return \"\";
>>>>>> +      }
>>>>>> +    else
>>>>>> +      return \"ldrd\\t%0, %1, [%2, %3]\";
>>>>>> +  }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_insn "*thumb2_ldrd_reg1"
>>>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>>> +                  (mem:SI (match_operand:SI 2 "s_register_operand" "rk")))
>>>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>>>> +                  (mem:SI (plus:SI (match_dup 2)
>>>>>> +                            (match_operand:SI 3 "const_int_operand" ""))))])]
>>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>>> +   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>>>>> +  "*
>>>>>> +  {
>>>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>>>>> +    if (offset2 == 4)
>>>>>> +      {
>>>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> +                                 operands[2], NULL_RTX, operands[3], true))
>>>>>> +         return \"ldmia\\t%2, {%0, %1}\";
>>>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>>> +         {
>>>>>> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>>>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>>> +           return \"\";
>>>>>> +         }
>>>>>> +       return \"ldrd\\t%0, %1, [%2]\";
>>>>>> +      }
>>>>>> +    else
>>>>>> +      {
>>>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>>>>> +         {
>>>>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>>> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>>>>> +         }
>>>>>> +       return \"ldrd\\t%1, %0, [%2, %3]\";
>>>>>> +      }
>>>>>> +  }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_insn "*thumb2_ldrd_reg2"
>>>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>>> +                  (mem:SI (plus:SI
>>>>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>>>>> +                               (match_operand:SI 3 "const_int_operand" ""))))
>>>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>>>> +                  (mem:SI (match_dup 2)))])]
>>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>>> +   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>>>>> +  "*
>>>>>> +  {
>>>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>>> +    if (offset1 == -4)
>>>>>> +      {
>>>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>>> +         {
>>>>>> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>>>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>>> +           return \"\";
>>>>>> +         }
>>>>>> +       return \"ldrd\\t%0, %1, [%2, %3]\";
>>>>>> +      }
>>>>>> +    else
>>>>>> +      {
>>>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> +                                 operands[2], operands[3], NULL_RTX, true))
>>>>>> +         return \"ldmia\\t%2, {%1, %0}\";
>>>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>>>>> +         {
>>>>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>>> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>>>>> +           return \"\";
>>>>>> +         }
>>>>>> +       return \"ldrd\\t%1, %0, [%2]\";
>>>>>> +      }
>>>>>> +  }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_peephole2
>>>>>> +  [(set (match_operand:SI 0 "s_register_operand" "")
>>>>>> +       (match_operand:SI 2 "memory_operand" ""))
>>>>>> +   (set (match_operand:SI 1 "s_register_operand" "")
>>>>>> +       (match_operand:SI 3 "memory_operand" ""))]
>>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>>> +   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>>>>> +                               operands[2], operands[3], true)"
>>>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>>> +                  (match_operand:SI 2 "memory_operand" ""))
>>>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>>>> +                  (match_operand:SI 3 "memory_operand" ""))])]
>>>>>> +  ""
>>>>>> +)
>>>>>> +
>>>>>> +(define_insn "*thumb2_strd"
>>>>>> +  [(parallel [(set (mem:SI
>>>>>> +                       (plus:SI (match_operand:SI 2 "s_register_operand" "rk")
>>>>>> +                                (match_operand:SI 3 "const_int_operand" "")))
>>>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>>>> +             (set (mem:SI (plus:SI (match_dup 2)
>>>>>> +                                (match_operand:SI 4 "const_int_operand" "")))
>>>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>>> +   && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>>>>> +  "*
>>>>>> +  {
>>>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>>>>> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> +                             operands[2], operands[3], operands[4], false))
>>>>>> +      return \"stmdb\\t%2, {%0, %1}\";
>>>>>> +    if (offset1 < offset2)
>>>>>> +      return \"strd\\t%0, %1, [%2, %3]\";
>>>>>> +    else
>>>>>> +      return \"strd\\t%1, %0, [%2, %4]\";
>>>>>> +  }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_insn "*thumb2_strd_reg1"
>>>>>> +  [(parallel [(set (mem:SI (match_operand:SI 2 "s_register_operand" "rk"))
>>>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>>>> +             (set (mem:SI (plus:SI (match_dup 2)
>>>>>> +                               (match_operand:SI 3 "const_int_operand" "")))
>>>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>>> +   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>>>>> +  "*
>>>>>> +  {
>>>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>>>>> +    if (offset2 == 4)
>>>>>> +      {
>>>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> +                                 operands[2], NULL_RTX, operands[3], false))
>>>>>> +         return \"stmia\\t%2, {%0, %1}\";
>>>>>> +       return \"strd\\t%0, %1, [%2]\";
>>>>>> +      }
>>>>>> +    else
>>>>>> +      return \"strd\\t%1, %0, [%2, %3]\";
>>>>>> +  }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_insn "*thumb2_strd_reg2"
>>>>>> +  [(parallel [(set (mem:SI (plus:SI
>>>>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>>>>> +                               (match_operand:SI 3 "const_int_operand" "")))
>>>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>>>> +             (set (mem:SI (match_dup 2))
>>>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>>> +   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>>>>> +  "*
>>>>>> +  {
>>>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>>> +    if (offset1 == -4)
>>>>>> +      return \"strd\\t%0, %1, [%2, %3]\";
>>>>>> +    else
>>>>>> +      {
>>>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> +                                 operands[2], operands[3], NULL_RTX, false))
>>>>>> +         return \"stmia\\t%2, {%1, %0}\";
>>>>>> +       return \"strd\\t%1, %0, [%2]\";
>>>>>> +      }
>>>>>> +  }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_peephole2
>>>>>> +  [(set (match_operand:SI 2 "memory_operand" "")
>>>>>> +       (match_operand:SI 0 "s_register_operand" ""))
>>>>>> +   (set (match_operand:SI 3 "memory_operand" "")
>>>>>> +       (match_operand:SI 1 "s_register_operand" ""))]
>>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>>> +   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>>>>> +                               operands[2], operands[3], false)"
>>>>>> +  [(parallel [(set (match_operand:SI 2 "memory_operand" "")
>>>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>>>> +             (set (match_operand:SI 3 "memory_operand" "")
>>>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>>>> +  ""
>>>>>> +)
>>>>>> Index: arm.c
>>>>>> ===================================================================
>>>>>> --- arm.c       (revision 165492)
>>>>>> +++ arm.c       (working copy)
>>>>>> @@ -23254,4 +23254,134 @@ arm_builtin_support_vector_misalignment
>>>>>>                                                      is_packed);
>>>>>>  }
>>>>>>
>>>>>> +/* Check the validity of operands in an ldrd/strd instruction.  */
>>>>>> +bool
>>>>>> +thumb2_check_ldrd_operands (rtx off1, rtx off2)
>>>>>> +{
>>>>>> +  HOST_WIDE_INT offset1 = 0;
>>>>>> +  HOST_WIDE_INT offset2 = 0;
>>>>>> +
>>>>>> +  if (off1 != NULL_RTX)
>>>>>> +    offset1 = INTVAL (off1);
>>>>>> +  if (off2 != NULL_RTX)
>>>>>> +    offset2 = INTVAL (off2);
>>>>>> +
>>>>>> +  /* The offset range of LDRD is [-1020, 1020]. Here we check if both
>>>>>> +     offsets lie in the range [-1020, 1024]. If one of the offsets is
>>>>>> +     1024, the following condition ((offset1 + 4) == offset2) will ensure
>>>>>> +     offset1 to be 1020, suitable for instruction LDRD.  */
>>>>>> +  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>>>>> +    return false;
>>>>>> +  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  if ((offset1 + 4) == offset2)
>>>>>> +    return true;
>>>>>> +  if ((offset2 + 4) == offset1)
>>>>>> +    return true;
>>>>>> +
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Check if the two memory accesses can be merged to an ldrd/strd instruction.
>>>>>> +   That is they use the same base register, and the gap between constant
>>>>>> +   offsets should be 4.  */
>>>>>> +bool
>>>>>> +thumb2_legitimate_ldrd_p (rtx reg1, rtx reg2, rtx mem1, rtx mem2, bool ldrd)
>>>>>> +{
>>>>>> +  rtx base1, base2, op1;
>>>>>> +  rtx addr1 = XEXP (mem1, 0);
>>>>>> +  rtx addr2 = XEXP (mem2, 0);
>>>>>> +  HOST_WIDE_INT offset1 = 0;
>>>>>> +  HOST_WIDE_INT offset2 = 0;
>>>>>> +
>>>>>> +  if (MEM_VOLATILE_P (mem1) || MEM_VOLATILE_P (mem2))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  if (REG_P (addr1))
>>>>>> +    base1 = addr1;
>>>>>> +  else if (GET_CODE (addr1) == PLUS)
>>>>>> +    {
>>>>>> +      base1 = XEXP (addr1, 0);
>>>>>> +      op1 = XEXP (addr1, 1);
>>>>>> +      if (!REG_P (base1) || (GET_CODE (op1) != CONST_INT))
>>>>>> +       return false;
>>>>>> +      offset1 = INTVAL (op1);
>>>>>> +    }
>>>>>> +  else
>>>>>> +    return false;
>>>>>> +
>>>>>> +  if (REG_P (addr2))
>>>>>> +    base2 = addr2;
>>>>>> +  else if (GET_CODE (addr2) == PLUS)
>>>>>> +    {
>>>>>> +      base2 = XEXP (addr2, 0);
>>>>>> +      op1 = XEXP (addr2, 1);
>>>>>> +      if (!REG_P (base2) || (GET_CODE (op1) != CONST_INT))
>>>>>> +       return false;
>>>>>> +      offset2 = INTVAL (op1);
>>>>>> +    }
>>>>>> +  else
>>>>>> +    return false;
>>>>>> +
>>>>>> +  if (base1 != base2)
>>>>>> +    return false;
>>>>>> +
>>>>>> +  /* The offset range of LDRD is [-1020, 1020]. Here we check if both
>>>>>> +     offsets lie in the range [-1020, 1024]. If one of the offsets is
>>>>>> +     1024, the following condition ((offset1 + 4) == offset2) will ensure
>>>>>> +     offset1 to be 1020, suitable for instruction LDRD.  */
>>>>>> +  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>>>>> +    return false;
>>>>>> +  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  if ((offset1 + 4) == offset2)
>>>>>> +    return true;
>>>>>> +  if ((offset2 + 4) == offset1)
>>>>>> +    return true;
>>>>>> +
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Check if the insn can be expressed as ldm/stm with less cost.  */
>>>>>> +bool
>>>>>> +thumb2_prefer_ldmstm (rtx reg1, rtx reg2, rtx base,
>>>>>> +                     rtx off1, rtx off2, bool ldrd)
>>>>>> +{
>>>>>> +  HOST_WIDE_INT offset1 = 0;
>>>>>> +  HOST_WIDE_INT offset2 = 0;
>>>>>> +
>>>>>> +  if (off1 != NULL_RTX)
>>>>>> +    offset1 = INTVAL (off1);
>>>>>> +  if (off2 != NULL_RTX)
>>>>>> +    offset2 = INTVAL (off2);
>>>>>> +
>>>>>> +  if (offset1 > offset2)
>>>>>> +    {
>>>>>> +      rtx tmp;
>>>>>> +      HOST_WIDE_INT t = offset1;
>>>>>> +      offset1 = offset2;
>>>>>> +      offset2 = t;
>>>>>> +      tmp = reg1;
>>>>>> +      reg1 = reg2;
>>>>>> +      reg2 = tmp;
>>>>>> +    }
>>>>>> +
>>>>>> +  /* The offset of ldmdb is -8, the offset of ldmia is 0.  */
>>>>>> +  if ((offset1 != -8) && (offset1 != 0))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  /* Lower register corresponds to lower memory.  */
>>>>>> +  if (REGNO (reg1) > REGNO (reg2))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>>>>> +     cost.  */
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>>  #include "gt-arm.h"
>>>>>> Index: arm-protos.h
>>>>>> ===================================================================
>>>>>> --- arm-protos.h        (revision 165492)
>>>>>> +++ arm-protos.h        (working copy)
>>>>>> @@ -150,6 +150,9 @@ extern void arm_expand_sync (enum machin
>>>>>>  extern const char *arm_output_memory_barrier (rtx *);
>>>>>>  extern const char *arm_output_sync_insn (rtx, rtx *);
>>>>>>  extern unsigned int arm_sync_loop_insns (rtx , rtx *);
>>>>>> +extern bool thumb2_check_ldrd_operands (rtx, rtx);
>>>>>> +extern bool thumb2_legitimate_ldrd_p (rtx, rtx, rtx, rtx, bool);
>>>>>> +extern bool thumb2_prefer_ldmstm (rtx, rtx, rtx, rtx, rtx, bool);
>>>>>>
>>>>>>  #if defined TREE_CODE
>>>>>>  extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
>>>>>> Index: ldmstm.md
>>>>>> ===================================================================
>>>>>> --- ldmstm.md   (revision 165492)
>>>>>> +++ ldmstm.md   (working copy)
>>>>>> @@ -852,7 +852,7 @@ (define_insn "*ldm2_ia"
>>>>>>      (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>>>>           (mem:SI (plus:SI (match_dup 3)
>>>>>>                   (const_int 4))))])]
>>>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>>   "ldm%(ia%)\t%3, {%1, %2}"
>>>>>>   [(set_attr "type" "load2")
>>>>>>    (set_attr "predicable" "yes")])
>>>>>> @@ -901,7 +901,7 @@ (define_insn "*stm2_ia"
>>>>>>           (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>>>>      (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
>>>>>>           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>>   "stm%(ia%)\t%3, {%1, %2}"
>>>>>>   [(set_attr "type" "store2")
>>>>>>    (set_attr "predicable" "yes")])
>>>>>> @@ -1041,7 +1041,7 @@ (define_insn "*ldm2_db"
>>>>>>      (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>>>>           (mem:SI (plus:SI (match_dup 3)
>>>>>>                   (const_int -4))))])]
>>>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>>   "ldm%(db%)\t%3, {%1, %2}"
>>>>>>   [(set_attr "type" "load2")
>>>>>>    (set_attr "predicable" "yes")])
>>>>>> @@ -1067,7 +1067,7 @@ (define_insn "*stm2_db"
>>>>>>           (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>>>>      (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
>>>>>>           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>>   "stm%(db%)\t%3, {%1, %2}"
>>>>>>   [(set_attr "type" "store2")
>>>>>>    (set_attr "predicable" "yes")])
>>>>>>
>>>>>>
>>>>>> Index: pr40457-3.c
>>>>>> ===================================================================
>>>>>> --- pr40457-3.c (revision 165492)
>>>>>> +++ pr40457-3.c (working copy)
>>>>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>>>>  {
>>>>>>   p[0] = 1;
>>>>>>   p[1] = 0;
>>>>>> +  p[2] = 2;
>>>>>>  }
>>>>>>
>>>>>>  /* { dg-final { scan-assembler "stm" } } */
>>>>>> Index: pr40457-1.c
>>>>>> ===================================================================
>>>>>> --- pr40457-1.c (revision 165492)
>>>>>> +++ pr40457-1.c (working copy)
>>>>>> @@ -1,9 +1,9 @@
>>>>>> -/* { dg-options "-Os" }  */
>>>>>> +/* { dg-options "-O2" }  */
>>>>>>  /* { dg-do compile } */
>>>>>>
>>>>>>  int bar(int* p)
>>>>>>  {
>>>>>> -  int x = p[0] + p[1];
>>>>>> +  int x = p[0] + p[1] + p[2];
>>>>>>   return x;
>>>>>>  }
>>>>>>
>>>>>> Index: pr40457-2.c
>>>>>> ===================================================================
>>>>>> --- pr40457-2.c (revision 165492)
>>>>>> +++ pr40457-2.c (working copy)
>>>>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>>>>  {
>>>>>>   p[0] = 1;
>>>>>>   p[1] = 0;
>>>>>> +  p[2] = 2;
>>>>>>  }
>>>>>>
>>>>>>  /* { dg-final { scan-assembler "stm" } } */
>>>>>> Index: pr45335.c
>>>>>> ===================================================================
>>>>>> --- pr45335.c   (revision 0)
>>>>>> +++ pr45335.c   (revision 0)
>>>>>> @@ -0,0 +1,22 @@
>>>>>> +/* { dg-options "-mthumb -O2" } */
>>>>>> +/* { dg-require-effective-target arm_thumb2_ok } */
>>>>>> +/* { dg-final { scan-assembler "ldrd" } } */
>>>>>> +/* { dg-final { scan-assembler "strd" } } */
>>>>>> +
>>>>>> +struct S
>>>>>> +{
>>>>>> +    void* p1;
>>>>>> +    void* p2;
>>>>>> +    void* p3;
>>>>>> +    void* p4;
>>>>>> +};
>>>>>> +
>>>>>> +extern printf(char*, ...);
>>>>>> +
>>>>>> +void foo1(struct S* fp, struct S* otherSaveArea)
>>>>>> +{
>>>>>> +    struct S* saveA = fp - 1;
>>>>>> +    printf("StackSaveArea for fp %p [%p/%p]:\n", fp, saveA, otherSaveArea);
>>>>>> +    printf("prevFrame=%p savedPc=%p meth=%p curPc=%p fp[0]=0x%08x\n",
>>>>>> +        saveA->p1, saveA->p2, saveA->p3, saveA->p4, *(unsigned int*)fp);
>>>>>> +}
>>>>>>
>>>>>
>>>>
>>>
>>
>
Nick Clifton Jan. 11, 2011, 2:19 p.m. UTC | #7
Hi Carrot,

>>>>>>> ChangeLog:
>>>>>>> 2010-10-16  Wei Guozhi<carrot@google.com>
>>>>>>>
>>>>>>>         PR target/45335
>>>>>>>         * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>>>>         thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>>>>         peephole2.
>>>>>>>         (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>>>>         New insn pattern and related peephole2.
>>>>>>>         * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>>>>         (thumb2_check_ldrd_operands): New function.
>>>>>>>         (thumb2_prefer_ldmstm): New function.
>>>>>>>         * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New prototype.
>>>>>>>         (thumb2_check_ldrd_operands): New prototype.
>>>>>>>         (thumb2_prefer_ldmstm): New prototype.
>>>>>>>         * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>>>>         Change the ldm/stm patterns with 2 words to ARM only.

Approved - please apply.

Happy New Year!

Cheers
   Nick
Richard Earnshaw Jan. 11, 2011, 2:49 p.m. UTC | #8
On Tue, 2011-01-11 at 14:19 +0000, Nick Clifton wrote:
> Hi Carrot,
> 
> >>>>>>> ChangeLog:
> >>>>>>> 2010-10-16  Wei Guozhi<carrot@google.com>
> >>>>>>>
> >>>>>>>         PR target/45335
> >>>>>>>         * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
> >>>>>>>         thumb2_ldrd_reg2 and peephole2): New insn pattern and related
> >>>>>>>         peephole2.
> >>>>>>>         (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
> >>>>>>>         New insn pattern and related peephole2.
> >>>>>>>         * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
> >>>>>>>         (thumb2_check_ldrd_operands): New function.
> >>>>>>>         (thumb2_prefer_ldmstm): New function.
> >>>>>>>         * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New prototype.
> >>>>>>>         (thumb2_check_ldrd_operands): New prototype.
> >>>>>>>         (thumb2_prefer_ldmstm): New prototype.
> >>>>>>>         * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
> >>>>>>>         Change the ldm/stm patterns with 2 words to ARM only.
> 
> Approved - please apply.
> 

No, this is not yet ready.  It certainly isn't ready to go in this late
in gcc-4.6, which is now in regression-only fixes stage.

Sorry, I still think this needs more work, but I'm dashing off to yet
another meeting right now.

R.

> Happy New Year!
> 
> Cheers
>    Nick
> 
> 
>
Nathan Froyd Jan. 11, 2011, 3:02 p.m. UTC | #9
On Tue, Jan 11, 2011 at 02:49:06PM +0000, Richard Earnshaw wrote:
> On Tue, 2011-01-11 at 14:19 +0000, Nick Clifton wrote:
> > Hi Carrot,
> > 
> > >>>>>>> ChangeLog:
> > >>>>>>> 2010-10-16  Wei Guozhi<carrot@google.com>
> > 
> > Approved - please apply.
> > 
> 
> No, this is not yet ready.  It certainly isn't ready to go in this late
> in gcc-4.6, which is now in regression-only fixes stage.
> 
> Sorry, I still think this needs more work, but I'm dashing off to yet
> another meeting right now.

Carrot has been pinging *this* patch for almost three months (and needed
to ping a previous patch for a month, maybe more).  And only just now,
when it's been approved, you come forward and wave your hands and say
"needs more work"?  That doesn't seem very sporting.

-Nathan
Ian Lance Taylor Jan. 12, 2011, 5:23 a.m. UTC | #10
On Tue, Jan 11, 2011 at 6:49 AM, Richard Earnshaw <rearnsha@arm.com> wrote:
>
> On Tue, 2011-01-11 at 14:19 +0000, Nick Clifton wrote:
>> Hi Carrot,
>>
>> >>>>>>> ChangeLog:
>> >>>>>>> 2010-10-16  Wei Guozhi<carrot@google.com>
>> >>>>>>>
>> >>>>>>>         PR target/45335
>> >>>>>>>         * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>> >>>>>>>         thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>> >>>>>>>         peephole2.
>> >>>>>>>         (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>> >>>>>>>         New insn pattern and related peephole2.
>> >>>>>>>         * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>> >>>>>>>         (thumb2_check_ldrd_operands): New function.
>> >>>>>>>         (thumb2_prefer_ldmstm): New function.
>> >>>>>>>         * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New prototype.
>> >>>>>>>         (thumb2_check_ldrd_operands): New prototype.
>> >>>>>>>         (thumb2_prefer_ldmstm): New prototype.
>> >>>>>>>         * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>> >>>>>>>         Change the ldm/stm patterns with 2 words to ARM only.
>>
>> Approved - please apply.
>>
>
> No, this is not yet ready.  It certainly isn't ready to go in this late
> in gcc-4.6, which is now in regression-only fixes stage.
>
> Sorry, I still think this needs more work, but I'm dashing off to yet
> another meeting right now.

Richard, I really have to object to this.  Carrot proposed this patch
before gcc went into stage 3, and has pinged it regularly, which by
the rules as I understand them means that it can be accepted during
stage 3 if the target maintainers agree.  This really hits a chord for
me as I've been hearing recently from people both inside and outside
of Google that it is incredibly hard to get any patches into gcc, and
people are naturally looking for alternatives.  I think that we as a
community need to take this seriously.  I don't think you have to
accept this patch.  But it is just not OK to ignore multiple pings and
then reject it after it has been accepted by another ARM maintainer.
If you're going to reject a patch, reject it fast.  If you are going
to reject this patch now, please, please, take the time to do a proper
review and suggest how the patch can be improved.  Please do not say
that a patch that was first proposed in August (!), and has been
regularly pinged and updated quickly to all substantive responses, has
to wait another three months for stage 1 before it can be accepted.

Ian
Richard Biener Jan. 12, 2011, 10:11 a.m. UTC | #11
On Wed, Jan 12, 2011 at 6:23 AM, Ian Lance Taylor <iant@google.com> wrote:
> On Tue, Jan 11, 2011 at 6:49 AM, Richard Earnshaw <rearnsha@arm.com> wrote:
>>
>> On Tue, 2011-01-11 at 14:19 +0000, Nick Clifton wrote:
>>> Hi Carrot,
>>>
>>> >>>>>>> ChangeLog:
>>> >>>>>>> 2010-10-16  Wei Guozhi<carrot@google.com>
>>> >>>>>>>
>>> >>>>>>>         PR target/45335
>>> >>>>>>>         * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>> >>>>>>>         thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>> >>>>>>>         peephole2.
>>> >>>>>>>         (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>> >>>>>>>         New insn pattern and related peephole2.
>>> >>>>>>>         * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>> >>>>>>>         (thumb2_check_ldrd_operands): New function.
>>> >>>>>>>         (thumb2_prefer_ldmstm): New function.
>>> >>>>>>>         * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New prototype.
>>> >>>>>>>         (thumb2_check_ldrd_operands): New prototype.
>>> >>>>>>>         (thumb2_prefer_ldmstm): New prototype.
>>> >>>>>>>         * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>> >>>>>>>         Change the ldm/stm patterns with 2 words to ARM only.
>>>
>>> Approved - please apply.
>>>
>>
>> No, this is not yet ready.  It certainly isn't ready to go in this late
>> in gcc-4.6, which is now in regression-only fixes stage.
>>
>> Sorry, I still think this needs more work, but I'm dashing off to yet
>> another meeting right now.
>
> Richard, I really have to object to this.  Carrot proposed this patch
> before gcc went into stage 3, and has pinged it regularly, which by
> the rules as I understand them means that it can be accepted during
> stage 3 if the target maintainers agree.

Not specific to this issue, but we are in stage 4 now which means that
only fixes for regressions and documentation are allowed.
Patches that only affect non-primary/secondary targets
target maintainers may have more freedom.  We also regularly
accepted non-regression wrong-code and rejects-valid patches
at this stage.

That's 2 cents from your release manager(s) (aka, "it was posted
during stageN < 3" isn't on its own a good enough reason to
not honor restrictions of stageN >= 3).

Richard.
Paul Brook Jan. 12, 2011, 1:19 p.m. UTC | #12
> > Why is this restricted to Thumb mode? The ARM variant of ldrd isn't quite
> > as flexible, but still provides a useful improvement over ldm.
> 
> I agree the ARM version is also useful. But it brings much less
> benefit with too much complexity (due to more restriction and insn
> pattern conflict with ldm). So I will leave it as a future
> improvement.

I'm still not convinced.  Surely there's no more complexity than the current 
ldm fallback bits.

> >> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
> >> +     cost.  */
> >> +  return false;
> > 
> > Code clearly doesn't match the comment.  In fact this function always
> > returns false.
> 
> Richard mentioned that in some cases (specifically cortex A9) ldm has
> less cost than ldrd and we should model this in the insn pattern. This
> function is used for this. But I don't know the cortex A9 architecture
> detail, so it should be filled by somebody with more knowledge about
> it in future.

This is trivially dead code. As such it should be removed.
I consider this sort of thing to be actively harmful.  At best it's likely to 
bitrot and need rewriting when you implement your "future" changes. At worst 
it triggers incorrectly and breaks something.

> --- pr40457-1.c (revision 165492)
> +++ pr40457-1.c (working copy)
> @@ -1,9 +1,9 @@
> -/* { dg-options "-Os" }  */
> +/* { dg-options "-O2" }  */
> 
>  /* { dg-do compile } */
>  
>  int bar(int* p)

This looks wrong.  ldm is always smaller but is only faster on some cores, so 
I'd expect compiling with -O2 to make this test less reliable.

Paul
Diego Novillo Jan. 12, 2011, 1:49 p.m. UTC | #13
On Tue, Jan 11, 2011 at 09:49, Richard Earnshaw <rearnsha@arm.com> wrote:

> No, this is not yet ready.  It certainly isn't ready to go in this late
> in gcc-4.6, which is now in regression-only fixes stage.
>
> Sorry, I still think this needs more work, but I'm dashing off to yet
> another meeting right now.

Richard,

This is simply unacceptable.  Carrot has been carrying this patch for
more than 4 months now, diligently pinging the patch and incorporating
the very scarce reviews he got from the ARM maintainers.  Now that it
has finally been approved, you summarily reject it with no
explanation?  Please provide an explanation, at least.

Our ARM developers at Google have been having a particularly hard time
trying to get their patches noticed by maintainers.  Patches are
pinged many times and ignored for long stretches of time.  Perhaps we
need more ARM maintainers?  Is there anything we could do to help the
current situation?


Diego.
Richard Earnshaw Jan. 12, 2011, 3:06 p.m. UTC | #14
On Wed, 2011-01-12 at 13:19 +0000, Paul Brook wrote:
> > > Why is this restricted to Thumb mode? The ARM variant of ldrd isn't quite
> > > as flexible, but still provides a useful improvement over ldm.
> > 
> > I agree the ARM version is also useful. But it brings much less
> > benefit with too much complexity (due to more restriction and insn
> > pattern conflict with ldm). So I will leave it as a future
> > improvement.
> 
> I'm still not convinced.  Surely there's no more complexity than the current 
> ldm fallback bits.
> 
> > >> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
> > >> +     cost.  */
> > >> +  return false;
> > > 
> > > Code clearly doesn't match the comment.  In fact this function always
> > > returns false.
> > 
> > Richard mentioned that in some cases (specifically cortex A9) ldm has
> > less cost than ldrd and we should model this in the insn pattern. This
> > function is used for this. But I don't know the cortex A9 architecture
> > detail, so it should be filled by somebody with more knowledge about
> > it in future.
> 
> This is trivially dead code. As such it should be removed.
> I consider this sort of thing to be actively harmful.  At best it's likely to 
> bitrot and need rewriting when you implement your "future" changes. At worst 
> it triggers incorrectly and breaks something.
> 
> > --- pr40457-1.c (revision 165492)
> > +++ pr40457-1.c (working copy)
> > @@ -1,9 +1,9 @@
> > -/* { dg-options "-Os" }  */
> > +/* { dg-options "-O2" }  */
> > 
> >  /* { dg-do compile } */
> >  
> >  int bar(int* p)
> 
> This looks wrong.  ldm is always smaller but is only faster on some cores, so 
> I'd expect compiling with -O2 to make this test less reliable.
> 
> Paul
> 

Additionally, all the define_insn patterns in the patch can generate
more than one instruction.  They MUST have a length attribute that
specifies the number of bytes that can be generated when the default (4
bytes) is insufficient; otherwise the constant placement code will fail.

R.
Mike Stump Jan. 12, 2011, 8:34 p.m. UTC | #15
On Jan 12, 2011, at 5:49 AM, Diego Novillo wrote:
> Patches are pinged many times and ignored for long stretches of time.  Perhaps we
> need more ARM maintainers?

3 months, yeah, I'd think it would be beneficial to have additional maintainers as well.  I don't think gcc is benefitted by review times that take more than 48 hours.  If the gcc web site had a poll button (a la /.), we could run random polls to help answer interesting questions.
Carrot Wei Jan. 13, 2011, 9:27 a.m. UTC | #16
One question about the attribute length. It looks the attribute
expression is not very powerful according to
http://gcc.gnu.org/onlinedocs/gccint/Expressions.html#Expressions,
then how can I express following expressions:

if (fix_cm3_ldrd && (operands[2] == operands[0]))

if (offset1 <= -256)

thanks
Carrot

>
> Additionally, all the define_insn patterns in the patch can generate
> more than one instruction.  They MUST have a length attribute that
> specifies the number of bytes that can be generated when the default (4
> bytes) is insufficient; otherwise the constant placement code will fail.
>
> R.
>
>
>
Richard Earnshaw Jan. 13, 2011, 10:02 a.m. UTC | #17
On Thu, 2011-01-13 at 17:27 +0800, Carrot Wei wrote:
> One question about the attribute length. It looks the attribute
> expression is not very powerful according to
> http://gcc.gnu.org/onlinedocs/gccint/Expressions.html#Expressions,
> then how can I express following expressions:
> 
> if (fix_cm3_ldrd && (operands[2] == operands[0]))
> 
> if (offset1 <= -256)
> 
> thanks
> Carrot
> 
> >
> > Additionally, all the define_insn patterns in the patch can generate
> > more than one instruction.  They MUST have a length attribute that
> > specifies the number of bytes that can be generated when the default (4
> > bytes) is insufficient; otherwise the constant placement code will fail.
> >
> > R.
> >
> >
> >
> 

In the worst case you have to make the attribute express the longest
sequence that the pattern can generate.  If that's going to be
significantly longer than the common case then really you should think
about how you might restructure the code to avoid over accounting too
often (maybe by creating separate constraint alternatives).  Attribute
expressions are not totally inflexible and with some care you can often
express the length quite precisely.

R.
Ramana Radhakrishnan Jan. 13, 2011, 10:25 a.m. UTC | #18
On Thu, Jan 13, 2011 at 9:27 AM, Carrot Wei <carrot@google.com> wrote:
> One question about the attribute length. It looks the attribute
> expression is not very powerful according to
> http://gcc.gnu.org/onlinedocs/gccint/Expressions.html#Expressions,
> then how can I express following expressions:
>
> if (fix_cm3_ldrd && (operands[2] == operands[0]))


Can't you express this as ?

(set_attr "length"
        (and (ne (symbol_ref ("fix_cm3_ldrd") (const_int 0))
               (eq (match_dup 2) (match_dup 0))) (const_int <length of insns>)


It's too early in the day and I haven't yet had my coffee but you
probably get the picture.

If you can't get separate constraints as Richard says or the logic as
I mention above becomes too complicated which I suspect it might, it
would be worth factoring out the logic into a common C function
parameterised on counting vs emiting. Then you could just call the
same function with and without the "emit/count" flag from the place
where you set the length attribute and the place where you want to
emit the assembler.

Then the logic is in one place and more maintainable than having 2
implementatons of the same logic.


HTH
Ramana
Mike Stump Jan. 13, 2011, 5:14 p.m. UTC | #19
On Jan 13, 2011, at 1:27 AM, Carrot Wei wrote:
> One question about the attribute length. It looks the attribute
> expression is not very powerful

Check out i386/i386.md...  Take a look at nops and *jcc_1 for example.
Carrot Wei Jan. 14, 2011, 9:18 a.m. UTC | #20
On Thu, Jan 13, 2011 at 6:25 PM, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Thu, Jan 13, 2011 at 9:27 AM, Carrot Wei <carrot@google.com> wrote:
>> One question about the attribute length. It looks the attribute
>> expression is not very powerful according to
>> http://gcc.gnu.org/onlinedocs/gccint/Expressions.html#Expressions,
>> then how can I express following expressions:
>>
>> if (fix_cm3_ldrd && (operands[2] == operands[0]))
>
>
> Can't you express this as ?
>
> (set_attr "length"
>        (and (ne (symbol_ref ("fix_cm3_ldrd") (const_int 0))
>               (eq (match_dup 2) (match_dup 0))) (const_int <length of insns>)
>
According to http://gcc.gnu.org/onlinedocs/gccint/Insn-Lengths.html#Insn-Lengths,
(match_dup n) can only be used with a label_ref operand.

>
> It's too early in the day and I haven't yet had my coffee but you
> probably get the picture.
>
> If you can't get separate constraints as Richard says or the logic as
> I mention above becomes too complicated which I suspect it might, it
> would be worth factoring out the logic into a common C function
> parameterised on counting vs emiting. Then you could just call the
> same function with and without the "emit/count" flag from the place
> where you set the length attribute and the place where you want to
> emit the assembler.
>
> Then the logic is in one place and more maintainable than having 2
> implementatons of the same logic.
>
This is a good idea!

thanks
Carrot
Richard Earnshaw Jan. 14, 2011, 9:51 a.m. UTC | #21
On Fri, 2011-01-14 at 17:18 +0800, Carrot Wei wrote:
> On Thu, Jan 13, 2011 at 6:25 PM, Ramana Radhakrishnan
> <ramana.gcc@googlemail.com> wrote:
> > On Thu, Jan 13, 2011 at 9:27 AM, Carrot Wei <carrot@google.com> wrote:
> >> One question about the attribute length. It looks the attribute
> >> expression is not very powerful according to
> >> http://gcc.gnu.org/onlinedocs/gccint/Expressions.html#Expressions,
> >> then how can I express following expressions:
> >>
> >> if (fix_cm3_ldrd && (operands[2] == operands[0]))
> >
> >
> > Can't you express this as ?
> >
> > (set_attr "length"
> >        (and (ne (symbol_ref ("fix_cm3_ldrd") (const_int 0))
> >               (eq (match_dup 2) (match_dup 0))) (const_int <length of insns>)
> >
> According to http://gcc.gnu.org/onlinedocs/gccint/Insn-Lengths.html#Insn-Lengths,
> (match_dup n) can only be used with a label_ref operand.
> 
> >
> > It's too early in the day and I haven't yet had my coffee but you
> > probably get the picture.
> >
> > If you can't get separate constraints as Richard says or the logic as
> > I mention above becomes too complicated which I suspect it might, it
> > would be worth factoring out the logic into a common C function
> > parameterised on counting vs emiting. Then you could just call the
> > same function with and without the "emit/count" flag from the place
> > where you set the length attribute and the place where you want to
> > emit the assembler.
> >
> > Then the logic is in one place and more maintainable than having 2
> > implementatons of the same logic.
> >
> This is a good idea!
> 

It can sometimes be done that way, but beware: separating the length
calculations from the insns it relates to is a long-term maintenance
nightmare, because now the code is in two separate places.

R.

> thanks
> Carrot
>
diff mbox

Patch

Index: thumb2.md
===================================================================
--- thumb2.md   (revision 165492)
+++ thumb2.md   (working copy)
@@ -1118,3 +1118,228 @@  (define_peephole2
   "
   operands[2] = GEN_INT (32 - INTVAL (operands[2]));
   ")
+
+(define_insn "*thumb2_ldrd"
+  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
+                  (mem:SI (plus:SI
+                               (match_operand:SI 2 "s_register_operand" "rk")
+                               (match_operand:SI 3 "const_int_operand" ""))))
+             (set (match_operand:SI 1 "s_register_operand" "")
+                  (mem:SI (plus:SI (match_dup 2)
+                            (match_operand:SI 4 "const_int_operand" ""))))])]
+  "TARGET_THUMB2 && arm_arch7
+   && thumb2_check_ldrd_operands (operands[3], operands[4])"
+  "*
+  {
+    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
+    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
+    if (offset1 > offset2)
+      {
+       /* Swap the operands so that memory [base+offset1] is loaded into
+          operands[0].  */
+       rtx tmp = operands[0];
+       operands[0] = operands[1];
+       operands[1] = tmp;
+       tmp = operands[3];
+       operands[3] = operands[4];
+       operands[4] = tmp;
+       offset1 = INTVAL (operands[3]);
+       offset2 = INTVAL (operands[4]);
+      }
+    if (thumb2_prefer_ldmstm (operands[0], operands[1],
+                             operands[2], operands[3], operands[4], true))
+      return \"ldmdb\\t%2, {%0, %1}\";
+    else if (fix_cm3_ldrd && (operands[2] == operands[0]))
+      {
+       if (offset1 <= -256)
+         {
+           output_asm_insn (\"sub\\t%2, %2, %n3\", operands);
+           output_asm_insn (\"ldr\\t%1, [%2, #4]\", operands);
+           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
+         }
+       else
+         {
+           output_asm_insn (\"ldr\\t%1, [%2, %4]\", operands);
+           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
+         }
+       return \"\";
+      }
+    else
+      return \"ldrd\\t%0, %1, [%2, %3]\";
+  }"
+)
+
+(define_insn "*thumb2_ldrd_reg1"
+  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
+                  (mem:SI (match_operand:SI 2 "s_register_operand" "rk")))
+             (set (match_operand:SI 1 "s_register_operand" "")
+                  (mem:SI (plus:SI (match_dup 2)
+                            (match_operand:SI 3 "const_int_operand" ""))))])]
+  "TARGET_THUMB2 && arm_arch7
+   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
+  "*
+  {
+    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
+    if (offset2 == 4)
+      {
+       if (thumb2_prefer_ldmstm (operands[0], operands[1],
+                                 operands[2], NULL_RTX, operands[3], true))
+         return \"ldmia\\t%2, {%0, %1}\";
+       if (fix_cm3_ldrd && (operands[2] == operands[0]))
+         {
+           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
+           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
+           return \"\";
+         }
+       return \"ldrd\\t%0, %1, [%2]\";
+      }
+    else
+      {
+       if (fix_cm3_ldrd && (operands[2] == operands[1]))
+         {
+           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
+           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
+         }
+       return \"ldrd\\t%1, %0, [%2, %3]\";
+      }
+  }"
+)
+
+(define_insn "*thumb2_ldrd_reg2"
+  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
+                  (mem:SI (plus:SI
+                               (match_operand:SI 2 "s_register_operand" "rk")
+                               (match_operand:SI 3 "const_int_operand" ""))))
+             (set (match_operand:SI 1 "s_register_operand" "")
+                  (mem:SI (match_dup 2)))])]
+  "TARGET_THUMB2 && arm_arch7
+   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
+  "*
+  {
+    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
+    if (offset1 == -4)
+      {
+       if (fix_cm3_ldrd && (operands[2] == operands[0]))
+         {
+           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
+           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
+           return \"\";
+         }
+       return \"ldrd\\t%0, %1, [%2, %3]\";
+      }
+    else
+      {
+       if (thumb2_prefer_ldmstm (operands[0], operands[1],
+                                 operands[2], operands[3], NULL_RTX, true))
+         return \"ldmia\\t%2, {%1, %0}\";
+       if (fix_cm3_ldrd && (operands[2] == operands[1]))
+         {
+           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
+           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
+           return \"\";
+         }
+       return \"ldrd\\t%1, %0, [%2]\";
+      }
+  }"
+)
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+       (match_operand:SI 2 "memory_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+       (match_operand:SI 3 "memory_operand" ""))]
+  "TARGET_THUMB2 && arm_arch7
+   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
+                               operands[2], operands[3], true)"
+  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
+                  (match_operand:SI 2 "memory_operand" ""))
+             (set (match_operand:SI 1 "s_register_operand" "")
+                  (match_operand:SI 3 "memory_operand" ""))])]
+  ""
+)
+
+(define_insn "*thumb2_strd"
+  [(parallel [(set (mem:SI
+                       (plus:SI (match_operand:SI 2 "s_register_operand" "rk")
+                                (match_operand:SI 3 "const_int_operand" "")))
+                  (match_operand:SI 0 "s_register_operand" ""))
+             (set (mem:SI (plus:SI (match_dup 2)
+                                (match_operand:SI 4 "const_int_operand" "")))
+                  (match_operand:SI 1 "s_register_operand" ""))])]
+  "TARGET_THUMB2 && arm_arch7
+   && thumb2_check_ldrd_operands (operands[3], operands[4])"
+  "*
+  {
+    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
+    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
+    if (thumb2_prefer_ldmstm (operands[0], operands[1],
+                             operands[2], operands[3], operands[4], false))
+      return \"stmdb\\t%2, {%0, %1}\";
+    if (offset1 < offset2)
+      return \"strd\\t%0, %1, [%2, %3]\";
+    else
+      return \"strd\\t%1, %0, [%2, %4]\";
+  }"
+)
+
+(define_insn "*thumb2_strd_reg1"
+  [(parallel [(set (mem:SI (match_operand:SI 2 "s_register_operand" "rk"))
+                  (match_operand:SI 0 "s_register_operand" ""))
+             (set (mem:SI (plus:SI (match_dup 2)
+                               (match_operand:SI 3 "const_int_operand" "")))
+                  (match_operand:SI 1 "s_register_operand" ""))])]
+  "TARGET_THUMB2 && arm_arch7
+   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
+  "*
+  {
+    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
+    if (offset2 == 4)
+      {
+       if (thumb2_prefer_ldmstm (operands[0], operands[1],
+                                 operands[2], NULL_RTX, operands[3], false))
+         return \"stmia\\t%2, {%0, %1}\";
+       return \"strd\\t%0, %1, [%2]\";
+      }
+    else
+      return \"strd\\t%1, %0, [%2, %3]\";
+  }"
+)
+
+(define_insn "*thumb2_strd_reg2"
+  [(parallel [(set (mem:SI (plus:SI
+                               (match_operand:SI 2 "s_register_operand" "rk")
+                               (match_operand:SI 3 "const_int_operand" "")))
+                  (match_operand:SI 0 "s_register_operand" ""))
+             (set (mem:SI (match_dup 2))
+                  (match_operand:SI 1 "s_register_operand" ""))])]
+  "TARGET_THUMB2 && arm_arch7
+   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
+  "*
+  {
+    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
+    if (offset1 == -4)
+      return \"strd\\t%0, %1, [%2, %3]\";
+    else
+      {
+       if (thumb2_prefer_ldmstm (operands[0], operands[1],
+                                 operands[2], operands[3], NULL_RTX, false))
+         return \"stmia\\t%2, {%1, %0}\";
+       return \"strd\\t%1, %0, [%2]\";
+      }
+  }"
+)
+
+(define_peephole2
+  [(set (match_operand:SI 2 "memory_operand" "")
+       (match_operand:SI 0 "s_register_operand" ""))
+   (set (match_operand:SI 3 "memory_operand" "")
+       (match_operand:SI 1 "s_register_operand" ""))]
+  "TARGET_THUMB2 && arm_arch7
+   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
+                               operands[2], operands[3], false)"
+  [(parallel [(set (match_operand:SI 2 "memory_operand" "")
+                  (match_operand:SI 0 "s_register_operand" ""))
+             (set (match_operand:SI 3 "memory_operand" "")
+                  (match_operand:SI 1 "s_register_operand" ""))])]
+  ""
+)
Index: arm.c
===================================================================
--- arm.c       (revision 165492)
+++ arm.c       (working copy)
@@ -23254,4 +23254,134 @@  arm_builtin_support_vector_misalignment
                                                      is_packed);
 }

+/* Check the validity of operands in an ldrd/strd instruction.  */
+bool
+thumb2_check_ldrd_operands (rtx off1, rtx off2)
+{
+  HOST_WIDE_INT offset1 = 0;
+  HOST_WIDE_INT offset2 = 0;
+
+  if (off1 != NULL_RTX)
+    offset1 = INTVAL (off1);
+  if (off2 != NULL_RTX)
+    offset2 = INTVAL (off2);
+
+  /* The offset range of LDRD is [-1020, 1020]. Here we check if both
+     offsets lie in the range [-1020, 1024]. If one of the offsets is
+     1024, the following condition ((offset1 + 4) == offset2) will ensure
+     offset1 to be 1020, suitable for instruction LDRD.  */
+  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
+    return false;
+  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
+    return false;
+
+  if ((offset1 + 4) == offset2)
+    return true;
+  if ((offset2 + 4) == offset1)
+    return true;
+
+  return false;
+}
+
+/* Check if the two memory accesses can be merged to an ldrd/strd instruction.
+   That is they use the same base register, and the gap between constant
+   offsets should be 4.  */
+bool
+thumb2_legitimate_ldrd_p (rtx reg1, rtx reg2, rtx mem1, rtx mem2, bool ldrd)
+{
+  rtx base1, base2, op1;
+  rtx addr1 = XEXP (mem1, 0);
+  rtx addr2 = XEXP (mem2, 0);
+  HOST_WIDE_INT offset1 = 0;
+  HOST_WIDE_INT offset2 = 0;
+
+  if (MEM_VOLATILE_P (mem1) || MEM_VOLATILE_P (mem2))
+    return false;
+
+  if (REG_P (addr1))
+    base1 = addr1;
+  else if (GET_CODE (addr1) == PLUS)
+    {
+      base1 = XEXP (addr1, 0);
+      op1 = XEXP (addr1, 1);
+      if (!REG_P (base1) || (GET_CODE (op1) != CONST_INT))
+       return false;
+      offset1 = INTVAL (op1);
+    }
+  else
+    return false;
+
+  if (REG_P (addr2))
+    base2 = addr2;
+  else if (GET_CODE (addr2) == PLUS)
+    {
+      base2 = XEXP (addr2, 0);
+      op1 = XEXP (addr2, 1);
+      if (!REG_P (base2) || (GET_CODE (op1) != CONST_INT))
+       return false;
+      offset2 = INTVAL (op1);
+    }
+  else
+    return false;
+
+  if (base1 != base2)
+    return false;
+
+  /* The offset range of LDRD is [-1020, 1020]. Here we check if both
+     offsets lie in the range [-1020, 1024]. If one of the offsets is
+     1024, the following condition ((offset1 + 4) == offset2) will ensure
+     offset1 to be 1020, suitable for instruction LDRD.  */
+  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
+    return false;
+  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
+    return false;
+
+  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
+    return false;
+
+  if ((offset1 + 4) == offset2)
+    return true;
+  if ((offset2 + 4) == offset1)
+    return true;
+
+  return false;
+}
+
+/* Check if the insn can be expressed as ldm/stm with less cost.  */
+bool
+thumb2_prefer_ldmstm (rtx reg1, rtx reg2, rtx base,
+                     rtx off1, rtx off2, bool ldrd)
+{
+  HOST_WIDE_INT offset1 = 0;
+  HOST_WIDE_INT offset2 = 0;
+
+  if (off1 != NULL_RTX)
+    offset1 = INTVAL (off1);
+  if (off2 != NULL_RTX)
+    offset2 = INTVAL (off2);
+
+  if (offset1 > offset2)
+    {
+      rtx tmp;
+      HOST_WIDE_INT t = offset1;
+      offset1 = offset2;
+      offset2 = t;
+      tmp = reg1;
+      reg1 = reg2;
+      reg2 = tmp;
+    }
+
+  /* The offset of ldmdb is -8, the offset of ldmia is 0.  */
+  if ((offset1 != -8) && (offset1 != 0))
+    return false;
+
+  /* Lower register corresponds to lower memory.  */
+  if (REGNO (reg1) > REGNO (reg2))
+    return false;
+
+  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
+     cost.  */
+  return false;
+}
+
 #include "gt-arm.h"
Index: arm-protos.h
===================================================================
--- arm-protos.h        (revision 165492)
+++ arm-protos.h        (working copy)
@@ -150,6 +150,9 @@  extern void arm_expand_sync (enum machin
 extern const char *arm_output_memory_barrier (rtx *);
 extern const char *arm_output_sync_insn (rtx, rtx *);
 extern unsigned int arm_sync_loop_insns (rtx , rtx *);
+extern bool thumb2_check_ldrd_operands (rtx, rtx);
+extern bool thumb2_legitimate_ldrd_p (rtx, rtx, rtx, rtx, bool);
+extern bool thumb2_prefer_ldmstm (rtx, rtx, rtx, rtx, rtx, bool);

 #if defined TREE_CODE
 extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
Index: ldmstm.md
===================================================================
--- ldmstm.md   (revision 165492)
+++ ldmstm.md   (working copy)
@@ -852,7 +852,7 @@  (define_insn "*ldm2_ia"
      (set (match_operand:SI 2 "arm_hard_register_operand" "")
           (mem:SI (plus:SI (match_dup 3)
                   (const_int 4))))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
   "ldm%(ia%)\t%3, {%1, %2}"
   [(set_attr "type" "load2")
    (set_attr "predicable" "yes")])
@@ -901,7 +901,7 @@  (define_insn "*stm2_ia"
           (match_operand:SI 1 "arm_hard_register_operand" ""))
      (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
   "stm%(ia%)\t%3, {%1, %2}"
   [(set_attr "type" "store2")
    (set_attr "predicable" "yes")])
@@ -1041,7 +1041,7 @@  (define_insn "*ldm2_db"
      (set (match_operand:SI 2 "arm_hard_register_operand" "")
           (mem:SI (plus:SI (match_dup 3)
                   (const_int -4))))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
   "ldm%(db%)\t%3, {%1, %2}"
   [(set_attr "type" "load2")
    (set_attr "predicable" "yes")])
@@ -1067,7 +1067,7 @@  (define_insn "*stm2_db"
           (match_operand:SI 1 "arm_hard_register_operand" ""))
      (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
   "stm%(db%)\t%3, {%1, %2}"
   [(set_attr "type" "store2")
    (set_attr "predicable" "yes")])


Index: pr40457-3.c
===================================================================
--- pr40457-3.c (revision 165492)
+++ pr40457-3.c (working copy)
@@ -5,6 +5,7 @@  void foo(int* p)
 {
   p[0] = 1;
   p[1] = 0;
+  p[2] = 2;
 }

 /* { dg-final { scan-assembler "stm" } } */
Index: pr40457-1.c
===================================================================
--- pr40457-1.c (revision 165492)
+++ pr40457-1.c (working copy)
@@ -1,9 +1,9 @@ 
-/* { dg-options "-Os" }  */
+/* { dg-options "-O2" }  */
 /* { dg-do compile } */

 int bar(int* p)
 {
-  int x = p[0] + p[1];
+  int x = p[0] + p[1] + p[2];
   return x;
 }

Index: pr40457-2.c
===================================================================
--- pr40457-2.c (revision 165492)
+++ pr40457-2.c (working copy)
@@ -5,6 +5,7 @@  void foo(int* p)
 {
   p[0] = 1;
   p[1] = 0;
+  p[2] = 2;
 }

 /* { dg-final { scan-assembler "stm" } } */
Index: pr45335.c
===================================================================
--- pr45335.c   (revision 0)
+++ pr45335.c   (revision 0)
@@ -0,0 +1,22 @@ 
+/* { dg-options "-mthumb -O2" } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-final { scan-assembler "ldrd" } } */
+/* { dg-final { scan-assembler "strd" } } */
+
+struct S
+{
+    void* p1;
+    void* p2;
+    void* p3;
+    void* p4;
+};
+
+extern printf(char*, ...);
+
+void foo1(struct S* fp, struct S* otherSaveArea)
+{
+    struct S* saveA = fp - 1;
+    printf("StackSaveArea for fp %p [%p/%p]:\n", fp, saveA, otherSaveArea);
+    printf("prevFrame=%p savedPc=%p meth=%p curPc=%p fp[0]=0x%08x\n",
+        saveA->p1, saveA->p2, saveA->p3, saveA->p4, *(unsigned int*)fp);
+}