diff mbox

Tighten ARM's CANNOT_CHANGE_MODE_CLASS

Message ID g47hbov856.fsf@linaro.org
State New
Headers show

Commit Message

Richard Sandiford March 24, 2011, 3:40 p.m. UTC
We currently generate very poor code for tests like:

#include <arm_neon.h>

void
foo (uint32_t *a, uint32_t *b, uint32_t *c)
{
  uint32x4x3_t x, y;

  x = vld3q_u32 (a);
  y = vld3q_u32 (b);
  x.val[0] = vaddq_u32 (x.val[0], y.val[0]);
  x.val[1] = vaddq_u32 (x.val[1], y.val[1]);
  x.val[2] = vaddq_u32 (x.val[2], y.val[2]);
  vst3q_u32 (a, x);
}

This is because we force the uint32x4x3_t values to the stack and
then load and store the individual vectors.

What we actually want is for the uint32x4x3_t values to be stored
in registers, and for the individual vectors to be accessed as
subregs of those registers.  The first part involves some middle-end
mode changes (see recent gcc@ thread), while the second part requires
a change to ARM's CANNOT_CHANGE_MODE_CLASS.

CANNOT_CHANGE_MODE_CLASS is defined as:

/* FPA registers can't do subreg as all values are reformatted to internal
   precision.  VFP registers may only be accessed in the mode they
   were set.  */
#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)	\
  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)		\
   ? reg_classes_intersect_p (FPA_REGS, (CLASS))	\
     || reg_classes_intersect_p (VFP_REGS, (CLASS))	\
   : 0)

But this VFP restriction appears to apply only to VFPv1; thanks to
Peter Maydell for the archaeology.

Tested on arm-linux-gnueabi.  OK to install?

This doesn't have any direct benefit without the middle-end mode change,
but it needs to go in first in order for that change not to regress.

Richard


gcc/
	* config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Restrict FPA_REGS
	case to VFPv1.

Comments

Richard Earnshaw March 24, 2011, 4:47 p.m. UTC | #1
On Thu, 2011-03-24 at 15:40 +0000, Richard Sandiford wrote:
> We currently generate very poor code for tests like:
> 
> #include <arm_neon.h>
> 
> void
> foo (uint32_t *a, uint32_t *b, uint32_t *c)
> {
>   uint32x4x3_t x, y;
> 
>   x = vld3q_u32 (a);
>   y = vld3q_u32 (b);
>   x.val[0] = vaddq_u32 (x.val[0], y.val[0]);
>   x.val[1] = vaddq_u32 (x.val[1], y.val[1]);
>   x.val[2] = vaddq_u32 (x.val[2], y.val[2]);
>   vst3q_u32 (a, x);
> }
> 
> This is because we force the uint32x4x3_t values to the stack and
> then load and store the individual vectors.
> 
> What we actually want is for the uint32x4x3_t values to be stored
> in registers, and for the individual vectors to be accessed as
> subregs of those registers.  The first part involves some middle-end
> mode changes (see recent gcc@ thread), while the second part requires
> a change to ARM's CANNOT_CHANGE_MODE_CLASS.
> 
> CANNOT_CHANGE_MODE_CLASS is defined as:
> 
> /* FPA registers can't do subreg as all values are reformatted to internal
>    precision.  VFP registers may only be accessed in the mode they
>    were set.  */
> #define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)	\
>   (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)		\
>    ? reg_classes_intersect_p (FPA_REGS, (CLASS))	\
>      || reg_classes_intersect_p (VFP_REGS, (CLASS))	\
>    : 0)
> 
> But this VFP restriction appears to apply only to VFPv1; thanks to
> Peter Maydell for the archaeology.
> 
> Tested on arm-linux-gnueabi.  OK to install?
> 
> This doesn't have any direct benefit without the middle-end mode change,
> but it needs to go in first in order for that change not to regress.
> 
> Richard
> 
> 
> gcc/
> 	* config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Restrict FPA_REGS
> 	case to VFPv1.
> 

GCC doesn't support VFPv1 (see the all_fpus table), and I don't think
many chips based on that ever escaped into the wild world, so I'm not
worried about trying to add that now.

So it's probably safe to just kill that check for VFP entirely.

R.
diff mbox

Patch

Index: gcc/config/arm/arm.h
===================================================================
--- gcc/config/arm/arm.h	2011-03-24 13:47:14.000000000 +0000
+++ gcc/config/arm/arm.h	2011-03-24 15:26:19.000000000 +0000
@@ -1167,12 +1167,14 @@  #define IRA_COVER_CLASSES						     \
 }
 
 /* FPA registers can't do subreg as all values are reformatted to internal
-   precision.  VFP registers may only be accessed in the mode they
+   precision.  VFPv1 registers may only be accessed in the mode they
    were set.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)	\
-  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)		\
-   ? reg_classes_intersect_p (FPA_REGS, (CLASS))	\
-     || reg_classes_intersect_p (VFP_REGS, (CLASS))	\
+#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)		\
+  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)			\
+   ? (reg_classes_intersect_p (FPA_REGS, (CLASS))		\
+      || (TARGET_VFP						\
+	  && arm_fpu_desc->rev == 1				\
+	  && reg_classes_intersect_p (VFP_REGS, (CLASS))))	\
    : 0)
 
 /* The class value for index registers, and the one for base regs.  */