[ARM,3/6] Adjust costs for Large moves for ARM.

Message ID CACUk7=VbZVOy+3Jd6-nMxmL-7j+TWB+OB4AZc_j0dYUFUWf91g@mail.gmail.com
State New

Commit Message

Ramana Radhakrishnan July 30, 2012, 11:46 a.m.

lower-subreg.c sometimes goes badly wrong on code that uses the large
vector modes, especially vld3 / vst3 style operations. In these cases
the large modes are usually split into SImode moves, which then cause
massive spilling, and we end up generating very bad code.

The problem appears to be that we report the cost of a reg-reg move
as 0 and the cost of the split alternative is also 0, which means that
by default we split every move of a large register mode. I am a bit
unsure whether DImode moves should be split or not, which is why there
is a FIXME for that particular case.

With the examples that I've tried, which have been suitably complex
NEON intrinsics code, this appears to prevent the gratuitous splitting.
Of course, not splitting has its own problems, as we now have to
allocate 3 contiguous registers holding large values. If only smaller
portions of those large registers are used, it gets a bit harder for
the register allocator to get this right. I'm not sure how this will
hold up in practice and in real-life applications, and if someone could
provide some feedback on this it would be appreciated.

So this patch might need more tweaking and is potentially the most
contentious of the lot. In addition, the same logic could be applied
to arm_size_cost before I commit this patch.


2012-07-27  Ramana Radhakrishnan  <ramana.radhakrishnan@linaro.org>

	* config/arm/arm.c (arm_reg_reg_move_cost_for_mode): New.
	(arm_rtx_costs_1): Use it to adjust the cost of register to
	register moves.
 gcc/config/arm/arm.c |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b281485..c59184f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -268,6 +268,7 @@  static int arm_cortex_a5_branch_cost (bool, bool);

 static bool arm_vectorize_vec_perm_const_ok (enum machine_mode vmode,
 					     const unsigned char *sel);
+static int arm_reg_reg_move_cost_for_mode (enum machine_mode mode);

 /* Table of machine attributes.  */
@@ -7637,6 +7638,13 @@  arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed)
       return true;

     case SET:
+      if (s_register_operand (SET_DEST (x), GET_MODE (SET_DEST (x)))
+	  && s_register_operand (SET_SRC (x), GET_MODE (SET_SRC (x))))
+	{
+	  *total = COSTS_N_INSNS (arm_reg_reg_move_cost_for_mode
+				  (GET_MODE (SET_DEST (x))));
+	  return true;
+	}
       return false;

     case UNSPEC:
@@ -26501,4 +26509,42 @@ arm_split_tocx_imoves (rtx *operands, enum machine_mode mode)


+static int
+arm_reg_reg_move_cost_for_mode (enum machine_mode mode)
+{
+  /* Check if this is a move between 2 pseudos and
+     2 hard registers will fall out from the stuff
+     below.  */
+  if (TARGET_NEON)
+    {
+      /* FIXME - this is currently in only to prevent
+	 the large register moves. However in practice
+	 preventing splitting of DImode values requires
+	 more tuning.  */
+      if (mode != DImode
+	  && (VALID_NEON_DREG_MODE (mode)
+	      || VALID_NEON_QREG_MODE (mode)))
+	return 1;
+      /* The cost of moving a structure type size is the
+	 number of 128 bit moves one needs to do in addition
+	 to the number of 64 bit moves one needs to do in
+	 case of the EImode values.  */
+      if (VALID_NEON_STRUCT_MODE (mode))
+	{
+	  return ((GET_MODE_SIZE (mode) / GET_MODE_SIZE (TImode))
+		  + ((GET_MODE_SIZE (mode) / GET_MODE_SIZE (DImode)) & 1));
+	}
+    }
+  if (TARGET_HARD_FLOAT && TARGET_VFP)
+    {
+      if (mode == DFmode
+	  || mode == SFmode)
+	return 1;
+    }
+  return ARM_NUM_REGS (mode);
+}
 #include "gt-arm.h"