diff mbox

[PR67383,ARM,4.9] Backport of "Allow any register for DImode values in Thumb2"

Message ID 561FB1B4.3060103@arm.com
State New
Headers show

Commit Message

Renlin Li Oct. 15, 2015, 2:01 p.m. UTC
Hi all,

This is a backport patch to loosen restrictions on core registers for 
DImode values in Thumb2.

It fixes PR67383. In this particular case, reload tries to spill a hard 
register, and use next register together as a pair to reload a DImode 
pseudo register. However, the spilled register number is odd.This is 
rejected by arm_hard_regno_mode_ok(). There is no other register 
available, so the compiler throws an ICE.


The test case in PR67383 is too big, so I didn't include it as part of 
the patch.
arm-none-eabi regression test Okay without any new issues. Okay to 
backport to 4.9?

Regards,
Renlin Li

gcc/ChangeLog:

2015-10-15  Renlin Li  <renlin.li@arm.com>

         PR target/67383
         Backport from mainline.
         2014-04-22  Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>

         * config/arm/arm.c (arm_hard_regno_mode_ok): Loosen
         restrictions on core registers for DImode values in Thumb2.

Comments

Ramana Radhakrishnan Oct. 16, 2015, 10:52 a.m. UTC | #1
On Thu, Oct 15, 2015 at 03:01:24PM +0100, Renlin Li wrote:
> Hi all,
> 
> This is a backport patch to loosen restrictions on core registers
> for DImode values in Thumb2.
> 
> It fixes PR67383. In this particular case, reload tries to spill a
> hard register, and use next register together as a pair to reload a
> DImode pseudo register. However, the spilled register number is
> odd.This is rejected by arm_hard_regno_mode_ok(). There is no other
> register available, so the compiler throws an ICE.

I was not convinced enough by the reasoning provided in the description
because this patch was intended to be a bit of an optimization
rather than a correctness fix.

The command line implies we remove r7 (frame pointer in Thumb2 - historical accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 (-mpic-register) which
leaves us with:

* r0, r1
* r2, r3
* r4, r5

as the only free registers available for DImode values for the whole compilation.

We then have r0, r1 and r2 live across the insn which means that there are no free registers to handle DImode values
under the constraints provided unless LRA / reload can spill the argument registers which it doesn't seem to be able to do
in this particular testcase. Vlad, is that correct ?

Then I wondered why the same problem did not occur in ARM state given that has the same restriction.
In ARM state life is a bit better because the Frame pointer is r11 which means you pretty much have r6 and r7
as well available in addition to the above list, which means that theoretically you can
get away with this in ARM state.

Can you do some more comparison with ARM state as to why we don't have the same issue there ?


> 
> The test case in PR67383 is too big, so I didn't include it as part
> of the patch.

I've put up a reduced testcase on the bz, the one I was using to debug.

> arm-none-eabi regression test Okay without any new issues. Okay to
> backport to 4.9?

For changes of this nature please bootstrap and regression test this in arm and thumb2 state as well please.

regards
Ramana
Renlin Li Oct. 16, 2015, 1:54 p.m. UTC | #2
Hi Ramana,

On 16/10/15 11:52, Ramana Radhakrishnan wrote:
> On Thu, Oct 15, 2015 at 03:01:24PM +0100, Renlin Li wrote:
>> Hi all,
>>
>> This is a backport patch to loosen restrictions on core registers
>> for DImode values in Thumb2.
>>
>> It fixes PR67383. In this particular case, reload tries to spill a
>> hard register, and use next register together as a pair to reload a
>> DImode pseudo register. However, the spilled register number is
>> odd.This is rejected by arm_hard_regno_mode_ok(). There is no other
>> register available, so the compiler throws an ICE.
> I was not convinced enough by the reasoning provided in the description
> because this patch was intended to be a bit of an optimization
> rather than a correctness fix.

True, It's not a fix. It just allows more flexibility for register 
allocation.

> The command line implies we remove r7 (frame pointer in Thumb2 - historical accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 (-mpic-register) which
> leaves us with:
>
> * r0, r1
> * r2, r3
> * r4, r5
>
> as the only free registers available for DImode values for the whole compilation.
>
> We then have r0, r1 and r2 live across the insn which means that there are no free registers to handle DImode values
> under the constraints provided unless LRA / reload can spill the argument registers which it doesn't seem to be able to do
> in this particular testcase. Vlad, is that correct ?
According to the logic, conflict hard register are excluded from spill 
candidate. That's why, in this case, r0, r1, r2 cannot be used.

> Then I wondered why the same problem did not occur in ARM state given that has the same restriction.
> In ARM state life is a bit better because the Frame pointer is r11 which means you pretty much have r6 and r7
> as well available in addition to the above list, which means that theoretically you can
> get away with this in ARM state.
>
> Can you do some more comparison with ARM state as to why we don't have the same issue there ?
>
Presumably, ARM state should suffer from the same issue. I will have a look.

Regards,
Renlin
>> The test case in PR67383 is too big, so I didn't include it as part
>> of the patch.
> I've put up a reduced testcase on the bz, the one I was using to debug.
>
>> arm-none-eabi regression test Okay without any new issues. Okay to
>> backport to 4.9?
> For changes of this nature please bootstrap and regression test this in arm and thumb2 state as well please.
>
> regards
> Ramana
>
Renlin Li Nov. 27, 2015, 9:40 a.m. UTC | #3
Hi Ramana,

On 16/10/15 14:54, Renlin Li wrote:
>
>
>> The command line implies we remove r7 (frame pointer in Thumb2 - 
>> historical accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 
>> (-mpic-register) which
>> leaves us with:
>>
>> * r0, r1
>> * r2, r3
>> * r4, r5
>>
>> as the only free registers available for DImode values for the whole 
>> compilation.
>>
>> We then have r0, r1 and r2 live across the insn which means that 
>> there are no free registers to handle DImode values
>> under the constraints provided unless LRA / reload can spill the 
>> argument registers which it doesn't seem to be able to do
>> in this particular testcase. Vlad, is that correct ?
> According to the logic, conflict hard register are excluded from spill 
> candidate. That's why, in this case, r0, r1, r2 cannot be used.


In the test case, there are code structure like this.


uint64_t callee (int a, int b, int c, int d);
uint64_t caller (int a, int b, int c, int d)
{
   uint64_t res;
/*
single BB contains complicated data processing which requires register pair
*/

   res = callee (tmp, b ,c, d);
   return res;
}

CES pass in this case will extend the hard register live range across 
the whole BB until the callee. In this case, r1, r2, r3 are excluded 
from allocatable registers.

There are places in CES which prevents extending the hard register's 
live range, for example for hard register which fullfil 
small_register_classes_for_mode_p(), class_likely_spilled_p(). However, 
argument registers belong to neither of them.

I tried to stop CES from extending argument registers live range. 
However, later, scheduler jumps in and re-orders the instruction to 
reduce the pseudo register pressure, which in effect extend the argument 
register live again.

Regards,

Renlin Li
Ramana Radhakrishnan Nov. 27, 2015, 11:26 a.m. UTC | #4
On 27/11/15 09:40, Renlin Li wrote:
> Hi Ramana,
> 
> On 16/10/15 14:54, Renlin Li wrote:
>>
>>
>>> The command line implies we remove r7 (frame pointer in Thumb2 - historical accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 (-mpic-register) which
>>> leaves us with:
>>>
>>> * r0, r1
>>> * r2, r3
>>> * r4, r5
>>>
>>> as the only free registers available for DImode values for the whole compilation.
>>>
>>> We then have r0, r1 and r2 live across the insn which means that there are no free registers to handle DImode values
>>> under the constraints provided unless LRA / reload can spill the argument registers which it doesn't seem to be able to do
>>> in this particular testcase. Vlad, is that correct ?
>> According to the logic, conflict hard register are excluded from spill candidate. That's why, in this case, r0, r1, r2 cannot be used.
> 
> 
> In the test case, there are code structure like this.
> 
> 
> uint64_t callee (int a, int b, int c, int d);
> uint64_t caller (int a, int b, int c, int d)
> {
>   uint64_t res;
> /*
> single BB contains complicated data processing which requires register pair
> */
> 
>   res = callee (tmp, b ,c, d);
>   return res;
> }
> 
> CES pass in this case will extend the hard register live range across the whole BB until the callee. In this case, r1, r2, r3 are excluded from allocatable registers.
> 
> There are places in CES which prevents extending the hard register's live range, for example for hard register which fullfil small_register_classes_for_mode_p(), class_likely_spilled_p(). However, argument registers belong to neither of them.
> 
> I tried to stop CES from extending argument registers live range. However, later, scheduler jumps in and re-orders the instruction to reduce the pseudo register pressure, which in effect extend the argument register live again.

Thanks for digging further and trying to figure out what the solution was. I can't think of a less risky fix than what you have proposed, thus Ok if no regressions.


regards
Ramana





> 
> Regards,
> 
> Renlin Li
> 
> 
>
Christophe Lyon Dec. 3, 2015, 2:25 p.m. UTC | #5
On 27 November 2015 at 12:26, Ramana Radhakrishnan
<ramana.radhakrishnan@foss.arm.com> wrote:
>
>
> On 27/11/15 09:40, Renlin Li wrote:
>> Hi Ramana,
>>
>> On 16/10/15 14:54, Renlin Li wrote:
>>>
>>>
>>>> The command line implies we remove r7 (frame pointer in Thumb2 - historical accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 (-mpic-register) which
>>>> leaves us with:
>>>>
>>>> * r0, r1
>>>> * r2, r3
>>>> * r4, r5
>>>>
>>>> as the only free registers available for DImode values for the whole compilation.
>>>>
>>>> We then have r0, r1 and r2 live across the insn which means that there are no free registers to handle DImode values
>>>> under the constraints provided unless LRA / reload can spill the argument registers which it doesn't seem to be able to do
>>>> in this particular testcase. Vlad, is that correct ?
>>> According to the logic, conflict hard register are excluded from spill candidate. That's why, in this case, r0, r1, r2 cannot be used.
>>
>>
>> In the test case, there are code structure like this.
>>
>>
>> uint64_t callee (int a, int b, int c, int d);
>> uint64_t caller (int a, int b, int c, int d)
>> {
>>   uint64_t res;
>> /*
>> single BB contains complicated data processing which requires register pair
>> */
>>
>>   res = callee (tmp, b ,c, d);
>>   return res;
>> }
>>
>> CES pass in this case will extend the hard register live range across the whole BB until the callee. In this case, r1, r2, r3 are excluded from allocatable registers.
>>
>> There are places in CES which prevents extending the hard register's live range, for example for hard register which fullfil small_register_classes_for_mode_p(), class_likely_spilled_p(). However, argument registers belong to neither of them.
>>
>> I tried to stop CES from extending argument registers live range. However, later, scheduler jumps in and re-orders the instruction to reduce the pseudo register pressure, which in effect extend the argument register live again.
>
> Thanks for digging further and trying to figure out what the solution was. I can't think of a less risky fix than what you have proposed, thus Ok if no regressions.
>
>

Hi,

I have noticed regressions after this commit to the 4.9 branch:
Passed now fails          [PASS => FAIL]:
  gcc.c-torture/compile/pr34856.c  -O3 -fomit-frame-pointer
-funroll-all-loops -finline-functions  (test for excess errors)
  gcc.c-torture/compile/pr34856.c  -O3 -fomit-frame-pointer
-funroll-loops  (test for excess errors)
Pass disappears           [PASS =>     ]:
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2 -flto
-fno-use-linker-plugin -flto-partition=none
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects
Fail appears              [     => FAIL]:
  gcc.c-torture/compile/pr34856.c  -O3 -fomit-frame-pointer
-funroll-all-loops -finline-functions  (internal compiler error)
  gcc.c-torture/compile/pr34856.c  -O3 -fomit-frame-pointer
-funroll-loops  (internal compiler error)
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2  (internal
compiler error)
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2 -flto
-fno-use-linker-plugin -flto-partition=none  (internal compiler error)
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error)

See the red links in
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-4_9-branch/231177/report-build-info.html

Christophe.

> regards
> Ramana
>
>
>
>
>
>>
>> Regards,
>>
>> Renlin Li
>>
>>
>>
diff mbox

Patch

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 08b5255..88d957a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22646,12 +22646,19 @@  arm_hard_regno_mode_ok (unsigned int regno, enum machine_mode mode)
     }
 
   /* We allow almost any value to be stored in the general registers.
-     Restrict doubleword quantities to even register pairs so that we can
-     use ldrd.  Do not allow very large Neon structure opaque modes in
-     general registers; they would use too many.  */
+     Restrict doubleword quantities to even register pairs in ARM state
+     so that we can use ldrd.  Do not allow very large Neon structure
+     opaque modes in general registers; they would use too many.  */
   if (regno <= LAST_ARM_REGNUM)
-    return !(TARGET_LDRD && GET_MODE_SIZE (mode) > 4 && (regno & 1) != 0)
-      && ARM_NUM_REGS (mode) <= 4;
+    {
+      if (ARM_NUM_REGS (mode) > 4)
+	  return FALSE;
+
+      if (TARGET_THUMB2)
+	return TRUE;
+
+      return !(TARGET_LDRD && GET_MODE_SIZE (mode) > 4 && (regno & 1) != 0);
+    }
 
   if (regno == FRAME_POINTER_REGNUM
       || regno == ARG_POINTER_REGNUM)