diff mbox

[ARM] PR48250, rehaul arm_legitimize_reload_address()

Message ID 4D9EBEC8.2020807@codesourcery.com
State New
Headers show

Commit Message

Chung-Lin Tang April 8, 2011, 7:52 a.m. UTC
Hi Richard,
here's a new patch, with some quite significant changes to
arm_legitimize_reload_address(). This time fully utilizing the
sign-magnitude offset macro you gave in the last thread, and adapting
your description into embedded comments.

The reload_outdf pattern caused some regressions in a first round of
testing. With the new reload address legitimization, which creates a RTX
like (mem (plus (plus reg #high) #low)), the current
SECONDARY_RELOAD_OUTPUT_CLASS macro returns 'GENERAL_REGS' under VFP,
which triggers the execution of reload_outdf, but now creating
unexpected (set (reg1) (plus (plus reg #high) #low)) insns.

The historical reasons for the reload_outdf pattern are not entirely
clear, as we discussed off list. In any case, we should aim to update
the ARM secondary reload stuff to use the newer framework. I have
disabled the pattern under ARM mode, and cross-tested the entire patch
under v7-A/v5TE VFP, v7-A NEON, and v4T soft-float configs, all clear of
regressions.

reload_outdf is disabled by changing the pattern condition from
TARGET_32BIT to TARGET_THUMB2 (disabling TARGET_ARM only). This is
temporary, Thumb-2 probably needs a few more bits of simultaneous
updates to work, so I'm leaving that for a later patch.

I've listed your name together as significant parts are from you. Is it
ready for trunk?

Thanks,
Chung-Lin

2011-04-08  Chung-Lin Tang  <cltang@codesourcery.com>
	    Richard Earnshaw  <rearnsha@arm.com>

	PR target/48250
	* config/arm/arm.c (arm_legitimize_reload_address): Update cases
	to use sign-magnitude offsets. Reject unsupported unaligned
	cases. Add detailed description in comments.
	* config/arm/arm.md (reload_outdf): Disable for ARM mode; change
	condition from TARGET_32BIT to TARGET_ARM.

Comments

Richard Earnshaw April 11, 2011, 10:53 a.m. UTC | #1
On Fri, 2011-04-08 at 15:52 +0800, Chung-Lin Tang wrote:
> Hi Richard,
> here's a new patch, with some quite significant changes to
> arm_legitimize_reload_address(). This time fully utilizing the
> sign-magnitude offset macro you gave in the last thread, and adapting
> your description into embedded comments.
> 
> The reload_outdf pattern caused some regressions in a first round of
> testing. With the new reload address legitimization, which creates a RTX
> like (mem (plus (plus reg #high) #low)), the current
> SECONDARY_RELOAD_OUTPUT_CLASS macro returns 'GENERAL_REGS' under VFP,
> which triggers the execution of reload_outdf, but now creating
> unexpected (set (reg1) (plus (plus reg #high) #low)) insns.
> 
> The historical reasons for the reload_outdf pattern are not entirely
> clear, as we discussed off list. In any case, we should aim to update
> the ARM secondary reload stuff to use the newer framework. I have
> disabled the pattern under ARM mode, and cross-tested the entire patch
> under v7-A/v5TE VFP, v7-A NEON, and v4T soft-float configs, all clear of
> regressions.
> 
> reload_outdf is disabled by changing the pattern condition from
> TARGET_32BIT to TARGET_THUMB2 (disabling TARGET_ARM only). This is
> temporary, Thumb-2 probably needs a few more bits of simultaneous
> updates to work, so I'm leaving that for a later patch.
> 
> I've listed your name together as significant parts are from you. Is it
> ready for trunk?
> 
> Thanks,
> Chung-Lin
> 
> 2011-04-08  Chung-Lin Tang  <cltang@codesourcery.com>
> 	    Richard Earnshaw  <rearnsha@arm.com>
> 
> 	PR target/48250
> 	* config/arm/arm.c (arm_legitimize_reload_address): Update cases
> 	to use sign-magnitude offsets. Reject unsupported unaligned
> 	cases. Add detailed description in comments.
> 	* config/arm/arm.md (reload_outdf): Disable for ARM mode; change
> 	condition from TARGET_32BIT to TARGET_ARM.

This is OK, thanks for working on it.

Nothing to do with this patch specifically, but I'll note here for
completeness that LEGITIMIZE_RELOAD_ADDRESS is missing some fundamental
information which makes generating optimal code here impossible: there's
no information available about what the address is going to be used for.
Given that the set of legitimate addresses depends on this is, IMO, a
fundamental problem.

R.
Richard Sandiford April 20, 2011, 1:24 p.m. UTC | #2
Hi Chung-Lin,

I'm seeing an ICE with this patch, specifically;

Chung-Lin Tang <cltang@codesourcery.com> writes:
> +      if (coproc_p)
> +	low = SIGN_MAG_LOW_ADDR_BITS (val, 10);

We generate:

Reload 1: reload_out (V4SI) = (mem/c:V4SI (plus:SI (plus:SI (reg/f:SI 11 fp)
                                                            (const_int -6144 [0xffffffffffffe800]))
                                                        (const_int 1020 [0x3fc])) [43 %sfp+-5024 S16 A64])

but 1020 isn't a legitimate offset for V4SI:

  /* For quad modes, we restrict the constant offset to be slightly less
     than what the instruction format permits.  We do this because for
     quad mode moves, we will actually decompose them into two separate
     double-mode reads or writes.  INDEX must therefore be a valid
     (double-mode) offset and so should INDEX+8.  */
  if (TARGET_NEON && VALID_NEON_QREG_MODE (mode))
    return (code == CONST_INT
	    && INTVAL (index) < 1016
	    && INTVAL (index) > -1024
	    && (INTVAL (index) & 3) == 0);

A simple "fix" would be to use 9 instead of 10, but something a little
more subtle might be preferred :-)

Richard
Chung-Lin Tang April 20, 2011, 3:06 p.m. UTC | #3
On 2011/4/20 09:24 PM, Richard Sandiford wrote:
> Hi Chung-Lin,
> 
> I'm seeing an ICE with this patch, specifically;
> 
> Chung-Lin Tang <cltang@codesourcery.com> writes:
>> +      if (coproc_p)
>> +	low = SIGN_MAG_LOW_ADDR_BITS (val, 10);
> 
> We generate:
> 
> Reload 1: reload_out (V4SI) = (mem/c:V4SI (plus:SI (plus:SI (reg/f:SI 11 fp)
>                                                             (const_int -6144 [0xffffffffffffe800]))
>                                                         (const_int 1020 [0x3fc])) [43 %sfp+-5024 S16 A64])
> 
> but 1020 isn't a legitimate offset for V4SI:
> 
>   /* For quad modes, we restrict the constant offset to be slightly less
>      than what the instruction format permits.  We do this because for
>      quad mode moves, we will actually decompose them into two separate
>      double-mode reads or writes.  INDEX must therefore be a valid
>      (double-mode) offset and so should INDEX+8.  */
>   if (TARGET_NEON && VALID_NEON_QREG_MODE (mode))
>     return (code == CONST_INT
> 	    && INTVAL (index) < 1016
> 	    && INTVAL (index) > -1024
> 	    && (INTVAL (index) & 3) == 0);
> 
> A simple "fix" would be to use 9 instead of 10, but something a little
> more subtle might be preferred :-)
> 
> Richard

Oh dear, for some reason I mistakenly thought that NEON had a quad-word
load/store, sorry :P

Reducing from 10 to 9 may be a possible solution, if restricted to the
necessary cases. For example:

-low = SIGN_MAG_LOW_ADDR_BITS (val, 10);
+{
+  low = SIGN_MAG_LOW_ADDR_BITS (val, 10);
+
+  /* NEON quad-word load/stores are made of two double-word accesses,
+     so the valid index range is reduced by 8. Treat as 9-bit range if
+     we go over it.  */
+  if (TARGET_NEON && VALID_NEON_QREG_MODE (mode) && low >= 1016)
+    low = SIGN_MAG_LOW_ADDR_BITS (val, 9);
+}

To Richard Earnshaw, how do you think of a fix like this? Or should we
just simply return false under this out-of-range case (it should be rare
I hope).

Thanks,
Chung-Lin
Richard Earnshaw April 20, 2011, 3:12 p.m. UTC | #4
On Wed, 2011-04-20 at 23:06 +0800, Chung-Lin Tang wrote:
> On 2011/4/20 09:24 PM, Richard Sandiford wrote:
> > Hi Chung-Lin,
> > 
> > I'm seeing an ICE with this patch, specifically;
> > 
> > Chung-Lin Tang <cltang@codesourcery.com> writes:
> >> +      if (coproc_p)
> >> +	low = SIGN_MAG_LOW_ADDR_BITS (val, 10);
> > 
> > We generate:
> > 
> > Reload 1: reload_out (V4SI) = (mem/c:V4SI (plus:SI (plus:SI (reg/f:SI 11 fp)
> >                                                             (const_int -6144 [0xffffffffffffe800]))
> >                                                         (const_int 1020 [0x3fc])) [43 %sfp+-5024 S16 A64])
> > 
> > but 1020 isn't a legitimate offset for V4SI:
> > 
> >   /* For quad modes, we restrict the constant offset to be slightly less
> >      than what the instruction format permits.  We do this because for
> >      quad mode moves, we will actually decompose them into two separate
> >      double-mode reads or writes.  INDEX must therefore be a valid
> >      (double-mode) offset and so should INDEX+8.  */
> >   if (TARGET_NEON && VALID_NEON_QREG_MODE (mode))
> >     return (code == CONST_INT
> > 	    && INTVAL (index) < 1016
> > 	    && INTVAL (index) > -1024
> > 	    && (INTVAL (index) & 3) == 0);
> > 
> > A simple "fix" would be to use 9 instead of 10, but something a little
> > more subtle might be preferred :-)
> > 
> > Richard
> 
> Oh dear, for some reason I mistakenly thought that NEON had a quad-word
> load/store, sorry :P
> 
> Reducing from 10 to 9 may be a possible solution, if restricted to the
> necessary cases. For example:
> 
> -low = SIGN_MAG_LOW_ADDR_BITS (val, 10);
> +{
> +  low = SIGN_MAG_LOW_ADDR_BITS (val, 10);
> +
> +  /* NEON quad-word load/stores are made of two double-word accesses,
> +     so the valid index range is reduced by 8. Treat as 9-bit range if
> +     we go over it.  */
> +  if (TARGET_NEON && VALID_NEON_QREG_MODE (mode) && low >= 1016)
> +    low = SIGN_MAG_LOW_ADDR_BITS (val, 9);
> +}
> 
> To Richard Earnshaw, how do you think of a fix like this? Or should we
> just simply return false under this out-of-range case (it should be rare
> I hope).
> 

I don't think it matters a great deal.  The above is fine.

Note, that some targets don't have LDRD either.  Do we do the right
thing if we're going to fall back to two LDR instructions?

R.
Chung-Lin Tang April 20, 2011, 3:28 p.m. UTC | #5
On 2011/4/20 11:12 PM, Richard Earnshaw wrote:
> 
> On Wed, 2011-04-20 at 23:06 +0800, Chung-Lin Tang wrote:
>> On 2011/4/20 09:24 PM, Richard Sandiford wrote:
>>> Hi Chung-Lin,
>>>
>>> I'm seeing an ICE with this patch, specifically;
>>>
>>> Chung-Lin Tang <cltang@codesourcery.com> writes:
>>>> +      if (coproc_p)
>>>> +	low = SIGN_MAG_LOW_ADDR_BITS (val, 10);
>>>
>>> We generate:
>>>
>>> Reload 1: reload_out (V4SI) = (mem/c:V4SI (plus:SI (plus:SI (reg/f:SI 11 fp)
>>>                                                             (const_int -6144 [0xffffffffffffe800]))
>>>                                                         (const_int 1020 [0x3fc])) [43 %sfp+-5024 S16 A64])
>>>
>>> but 1020 isn't a legitimate offset for V4SI:
>>>
>>>   /* For quad modes, we restrict the constant offset to be slightly less
>>>      than what the instruction format permits.  We do this because for
>>>      quad mode moves, we will actually decompose them into two separate
>>>      double-mode reads or writes.  INDEX must therefore be a valid
>>>      (double-mode) offset and so should INDEX+8.  */
>>>   if (TARGET_NEON && VALID_NEON_QREG_MODE (mode))
>>>     return (code == CONST_INT
>>> 	    && INTVAL (index) < 1016
>>> 	    && INTVAL (index) > -1024
>>> 	    && (INTVAL (index) & 3) == 0);
>>>
>>> A simple "fix" would be to use 9 instead of 10, but something a little
>>> more subtle might be preferred :-)
>>>
>>> Richard
>>
>> Oh dear, for some reason I mistakenly thought that NEON had a quad-word
>> load/store, sorry :P
>>
>> Reducing from 10 to 9 may be a possible solution, if restricted to the
>> necessary cases. For example:
>>
>> -low = SIGN_MAG_LOW_ADDR_BITS (val, 10);
>> +{
>> +  low = SIGN_MAG_LOW_ADDR_BITS (val, 10);
>> +
>> +  /* NEON quad-word load/stores are made of two double-word accesses,
>> +     so the valid index range is reduced by 8. Treat as 9-bit range if
>> +     we go over it.  */
>> +  if (TARGET_NEON && VALID_NEON_QREG_MODE (mode) && low >= 1016)
>> +    low = SIGN_MAG_LOW_ADDR_BITS (val, 9);
>> +}
>>
>> To Richard Earnshaw, how do you think of a fix like this? Or should we
>> just simply return false under this out-of-range case (it should be rare
>> I hope).
>>
> 
> I don't think it matters a great deal.  The above is fine.
> 
> Note, that some targets don't have LDRD either.  Do we do the right
> thing if we're going to fall back to two LDR instructions?
> 
> R.

The current non-TARGET_LDRD case goes through this path:
    ...
    else
      /* For pre-ARMv5TE (without ldrd), we use ldm/stm(db/da/ib)
         to access doublewords. The supported load/store offsets are
         -8, -4, and 4, which we try to produce here.  */
      low = ((val & 0xf) ^ 0x8) - 0x8;

which uses ldm/stm. This should be safe.

As for pre-ARMv4 ldrh, this is special cased as:
    if (arm_arch4)
      low = SIGN_MAG_LOW_ADDR_BITS (val, 8);
    else
      {
         /* The storehi/movhi_bytes fallbacks can use only
            [-4094,+4094] of the full ldrb/strb index range.  */
         low = SIGN_MAG_LOW_ADDR_BITS (val, 12);
         if (low == 4095 || low == -4095)
           return false;
      }

Although to be frank, I haven't really tested a pre-ARMv4 config; not
very easy to do so in an EABI world :)

I'll take the above NEON QREG mode fix as approved.

Chung-Lin
diff mbox

Patch

Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c	(revision 172057)
+++ config/arm/arm.c	(working copy)
@@ -6419,23 +6419,126 @@ 
       HOST_WIDE_INT val = INTVAL (XEXP (*p, 1));
       HOST_WIDE_INT low, high;
 
-      if (mode == DImode || (mode == DFmode && TARGET_SOFT_FLOAT))
-	low = ((val & 0xf) ^ 0x8) - 0x8;
-      else if (TARGET_MAVERICK && TARGET_HARD_FLOAT)
-	/* Need to be careful, -256 is not a valid offset.  */
-	low = val >= 0 ? (val & 0xff) : -((-val) & 0xff);
-      else if (mode == SImode
-	       || (mode == SFmode && TARGET_SOFT_FLOAT)
-	       || ((mode == HImode || mode == QImode) && ! arm_arch4))
-	/* Need to be careful, -4096 is not a valid offset.  */
-	low = val >= 0 ? (val & 0xfff) : -((-val) & 0xfff);
-      else if ((mode == HImode || mode == QImode) && arm_arch4)
-	/* Need to be careful, -256 is not a valid offset.  */
-	low = val >= 0 ? (val & 0xff) : -((-val) & 0xff);
-      else if (GET_MODE_CLASS (mode) == MODE_FLOAT
-	       && TARGET_HARD_FLOAT && TARGET_FPA)
-	/* Need to be careful, -1024 is not a valid offset.  */
-	low = val >= 0 ? (val & 0x3ff) : -((-val) & 0x3ff);
+      /* Detect coprocessor load/stores.  */
+      bool coproc_p = ((TARGET_HARD_FLOAT
+			&& (TARGET_VFP || TARGET_FPA || TARGET_MAVERICK)
+			&& (mode == SFmode || mode == DFmode
+			    || (mode == DImode && TARGET_MAVERICK)))
+		       || (TARGET_REALLY_IWMMXT
+			   && VALID_IWMMXT_REG_MODE (mode))
+		       || (TARGET_NEON
+			   && (VALID_NEON_DREG_MODE (mode)
+			       || VALID_NEON_QREG_MODE (mode))));
+
+      /* For some conditions, bail out when lower two bits are unaligned.  */
+      if ((val & 0x3) != 0
+	  /* Coprocessor load/store indexes are 8-bits + '00' appended.  */
+	  && (coproc_p
+	      /* For DI, and DF under soft-float: */
+	      || ((mode == DImode || mode == DFmode)
+		  /* Without ldrd, we use stm/ldm, which does not
+		     fair well with unaligned bits.  */
+		  && (! TARGET_LDRD
+		      /* Thumb-2 ldrd/strd is [-1020,+1020] in steps of 4.  */
+		      || TARGET_THUMB2))))
+	return false;
+
+      /* When breaking down a [reg+index] reload address into [(reg+high)+low],
+	 of which the (reg+high) gets turned into a reload add insn,
+	 we try to decompose the index into high/low values that can often
+	 also lead to better reload CSE.
+	 For example:
+	         ldr r0, [r2, #4100]  // Offset too large
+		 ldr r1, [r2, #4104]  // Offset too large
+
+	 is best reloaded as:
+	         add t1, r2, #4096
+		 ldr r0, [t1, #4]
+		 add t2, r2, #4096
+		 ldr r1, [t2, #8]
+
+	 which post-reload CSE can simplify in most cases to eliminate the
+	 second add instruction:
+	         add t1, r2, #4096
+		 ldr r0, [t1, #4]
+		 ldr r1, [t1, #8]
+
+	 The idea here is that we want to split out the bits of the constant
+	 as a mask, rather than as subtracting the maximum offset that the
+	 respective type of load/store used can handle.
+
+	 When encountering negative offsets, we can still utilize it even if
+	 the overall offset is positive; sometimes this may lead to an immediate
+	 that can be constructed with fewer instructions.
+	 For example:
+	         ldr r0, [r2, #0x3FFFFC]
+
+	 This is best reloaded as:
+	         add t1, r2, #0x400000
+		 ldr r0, [t1, #-4]
+
+	 The trick for spotting this for a load insn with N bits of offset
+	 (i.e. bits N-1:0) is to look at bit N; if it is set, then chose a
+	 negative offset that is going to make bit N and all the bits below
+	 it become zero in the remainder part.
+
+	 The SIGN_MAG_LOW_ADDR_BITS macro below implements this, with respect
+	 to sign-magnitude addressing (i.e. separate +- bit, or 1's complement),
+	 used in most cases of ARM load/store instructions.  */
+
+#define SIGN_MAG_LOW_ADDR_BITS(VAL, N)					\
+      (((VAL) & ((1 << (N)) - 1))					\
+       ? (((VAL) & ((1 << ((N) + 1)) - 1)) ^ (1 << (N))) - (1 << (N))	\
+       : 0)
+
+      if (coproc_p)
+	low = SIGN_MAG_LOW_ADDR_BITS (val, 10);
+      else if (GET_MODE_SIZE (mode) == 8)
+	{
+	  if (TARGET_LDRD)
+	    low = (TARGET_THUMB2
+		   ? SIGN_MAG_LOW_ADDR_BITS (val, 10)
+		   : SIGN_MAG_LOW_ADDR_BITS (val, 8));
+	  else
+	    /* For pre-ARMv5TE (without ldrd), we use ldm/stm(db/da/ib)
+	       to access doublewords. The supported load/store offsets are
+	       -8, -4, and 4, which we try to produce here.  */
+	    low = ((val & 0xf) ^ 0x8) - 0x8;
+	}
+      else if (GET_MODE_SIZE (mode) < 8)
+	{
+	  /* NEON element load/stores do not have an offset.  */
+	  if (TARGET_NEON_FP16 && mode == HFmode)
+	    return false;
+
+	  if (TARGET_THUMB2)
+	    {
+	      /* Thumb-2 has an asymmetrical index range of (-256,4096).
+		 Try the wider 12-bit range first, and re-try if the result
+		 is out of range.  */
+	      low = SIGN_MAG_LOW_ADDR_BITS (val, 12);
+	      if (low < -255)
+		low = SIGN_MAG_LOW_ADDR_BITS (val, 8);
+	    }
+	  else
+	    {
+	      if (mode == HImode || mode == HFmode)
+		{
+		  if (arm_arch4)
+		    low = SIGN_MAG_LOW_ADDR_BITS (val, 8);
+		  else
+		    {
+		      /* The storehi/movhi_bytes fallbacks can use only
+			 [-4094,+4094] of the full ldrb/strb index range.  */
+		      low = SIGN_MAG_LOW_ADDR_BITS (val, 12);
+		      if (low == 4095 || low == -4095)
+			return false;
+		    }
+		}
+	      else
+		low = SIGN_MAG_LOW_ADDR_BITS (val, 12);
+	    }
+	}
       else
 	return false;
 
Index: config/arm/arm.md
===================================================================
--- config/arm/arm.md	(revision 172057)
+++ config/arm/arm.md	(working copy)
@@ -6188,7 +6188,7 @@ 
   [(match_operand:DF 0 "arm_reload_memory_operand" "=o")
    (match_operand:DF 1 "s_register_operand" "r")
    (match_operand:SI 2 "s_register_operand" "=&r")]
-  "TARGET_32BIT"
+  "TARGET_THUMB2"
   "
   {
     enum rtx_code code = GET_CODE (XEXP (operands[0], 0));