Patchwork [IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.

login
register
mail settings
Submitter Wei Mi
Date Oct. 1, 2013, 5:55 a.m.
Message ID <CA+4CFy7y3BNQTDCaNR2ypg4k8N-aFLsqmJPed2=qWZa33-g07Q@mail.gmail.com>
Download mbox | patch
Permalink /patch/279338/
State New
Headers show

Comments

Wei Mi - Oct. 1, 2013, 5:55 a.m.
> Probably the best place to add a code for this is in
> lra-constraints.c::simplify_operand_subreg by permitting subreg reload
> for paradoxical subregs whose hard regs are not fully in allocno class
> of the inner pseudo.
>
> It needs a good testing (i'd check that the generated code is not
> changed on variety benchmarks to see that the change has no impact on
> the most programs performance) and you need to add a good comment
> describing why this change is needed.
>

Vlad, thanks! I make another patch here by following your guidance.
Please check whether it is ok. Boostrap and regression ok. I am also
verifying its performance effect on google applications (But most of
them are 64 bits, so I cannot verify its performance effect on 32 bits
apps).

The idea of the patch is here:

For the following two types of paradoxical subreg, we insert reload in
simplify_operand_subreg:
1. If the op_type is OP_IN, and the hardreg could not be paired with
other hardreg to contain the outermode operand, for example R15 in
x86-64 (checked by in_hard_reg_set_p), we need to insert a reload. If
the hardreg allocated in IRA is R12, we don't need to insert reload
here because upper half of rvalue paradoxical subreg is undefined so
it is ok for R13 to contain undefined data.

2. If the op_type is OP_OUT or OP_INOUT.
    (It is possible that we don't need to insert reload for this case
too, because the upper half of lvalue paradoxical subreg is useless.
If the assignment to upper half of subreg register will not be
generated by rtl split4 stage, we don't need to insert reload here.
But I havn't got a testcase to verify it so I keep it)

Here is a paradoxical subreg example showing how the reload is generated:

     (insn 5 4 7 2 (set (reg:TI 106 [ __comp ])
        (subreg:TI (reg:DI 107 [ __comp ]) 0)) {*movti_internal_rex64}

In IRA, reg107 is allocated to a DImode hardreg. If reg107 is assigned
to hardreg R15, compiler cannot find another hardreg to pair with R15
to contain TImode data. So we insert a TImode reload pseudo reg180 for
it.

After reload is inserted:

     (insn 283 0 0 (set (subreg:DI (reg:TI 180 [orig:107 __comp ] [107]) 0)
        (reg:DI 107 [ __comp ])) -1
     (insn 5 4 7 2 (set (reg:TI 106 [ __comp ])
        (subreg:TI (reg:TI 180 [orig:107 __comp ] [107]) 0))
{*movti_internal_rex64}

Two reload hard registers will be allocated to reg180 to save TImode
operand in LRA_assign.

Thanks,
Wei Mi.

2013-09-30  Wei Mi  <wmi@google.com>

        * lra-constraints.c (insert_move_for_subreg): New function.
        (simplify_operand_subreg): Add reload for paradoxical subreg.


 }
Wei Mi - Oct. 1, 2013, 3:06 p.m.
> Please check whether it is ok. Boostrap and regression ok. I am also
> verifying its performance effect on google applications (But most of
> them are 64 bits, so I cannot verify its performance effect on 32 bits
> apps).

Have verified It has no performance impact on google applications.

Thanks,
Wei Mi.
Vladimir Makarov - Oct. 3, 2013, 3:36 p.m.
On 10/01/2013 01:55 AM, Wei Mi wrote:
>> Probably the best place to add a code for this is in
>> lra-constraints.c::simplify_operand_subreg by permitting subreg reload
>> for paradoxical subregs whose hard regs are not fully in allocno class
>> of the inner pseudo.
>>
>> It needs a good testing (i'd check that the generated code is not
>> changed on variety benchmarks to see that the change has no impact on
>> the most programs performance) and you need to add a good comment
>> describing why this change is needed.
>>
> Vlad, thanks! I make another patch here by following your guidance.
> Please check whether it is ok. Boostrap and regression ok. I am also
> verifying its performance effect on google applications (But most of
> them are 64 bits, so I cannot verify its performance effect on 32 bits
> apps).
>
> The idea of the patch is here:
>
> For the following two types of paradoxical subreg, we insert reload in
> simplify_operand_subreg:
> 1. If the op_type is OP_IN, and the hardreg could not be paired with
> other hardreg to contain the outermode operand, for example R15 in
> x86-64 (checked by in_hard_reg_set_p), we need to insert a reload. If
> the hardreg allocated in IRA is R12, we don't need to insert reload
> here because upper half of rvalue paradoxical subreg is undefined so
> it is ok for R13 to contain undefined data.
>
> 2. If the op_type is OP_OUT or OP_INOUT.
>     (It is possible that we don't need to insert reload for this case
> too, because the upper half of lvalue paradoxical subreg is useless.
> If the assignment to upper half of subreg register will not be
> generated by rtl split4 stage, we don't need to insert reload here.
> But I havn't got a testcase to verify it so I keep it)
>
> Here is a paradoxical subreg example showing how the reload is generated:
>
>      (insn 5 4 7 2 (set (reg:TI 106 [ __comp ])
>         (subreg:TI (reg:DI 107 [ __comp ]) 0)) {*movti_internal_rex64}
>
> In IRA, reg107 is allocated to a DImode hardreg. If reg107 is assigned
> to hardreg R15, compiler cannot find another hardreg to pair with R15
> to contain TImode data. So we insert a TImode reload pseudo reg180 for
> it.
>
> After reload is inserted:
>
>      (insn 283 0 0 (set (subreg:DI (reg:TI 180 [orig:107 __comp ] [107]) 0)
>         (reg:DI 107 [ __comp ])) -1
>      (insn 5 4 7 2 (set (reg:TI 106 [ __comp ])
>         (subreg:TI (reg:TI 180 [orig:107 __comp ] [107]) 0))
> {*movti_internal_rex64}
>
> Two reload hard registers will be allocated to reg180 to save TImode
> operand in LRA_assign.
>
Wei Mi, thanks very much for working on this.  The patch is ok for me
except for one change (please see below).
>
> 2013-09-30  Wei Mi  <wmi@google.com>
>
>         * lra-constraints.c (insert_move_for_subreg): New function.
>         (simplify_operand_subreg): Add reload for paradoxical subreg.
>
>
> @@ -1215,13 +1242,9 @@ simplify_operand_subreg (int nop, enum m
>         && (hard_regno_nregs[hard_regno][GET_MODE (reg)]
>            >= hard_regno_nregs[hard_regno][mode])
>         && simplify_subreg_regno (hard_regno, GET_MODE (reg),
> -                                SUBREG_BYTE (operand), mode) < 0
> -       /* Don't reload subreg for matching reload.  It is actually
> -         valid subreg in LRA.  */
> -       && ! LRA_SUBREG_P (operand))
> +                                SUBREG_BYTE (operand), mode) < 0)
You removed conditition with LRA_SUBREG for non-paradoxical subreg
generated for matched operands.  I think that is important condition and
the comment says why.  There are some 32-bit insns constraints requiring
different modes (int and fp ones) for matching operands in FP regs.  The
condition prevents LRA cycling as such subreg can look invalid (in
simplify_subreg_regno) but it is used internally in LRA for matching
constraint expression and should be not reloaded.

With this change the patch is ok for me.
 
>        || CONSTANT_P (reg) || GET_CODE (reg) == PLUS || MEM_P (reg))
>      {
> -      enum op_type type = curr_static_id->operand[nop].type;
>        /* The class will be defined later in curr_insn_transform.  */
>        enum reg_class rclass
>         = (enum reg_class) targetm.preferred_reload_class (reg, ALL_REGS);
>
Wei Mi - Oct. 3, 2013, 5:21 p.m.
> You removed conditition with LRA_SUBREG for non-paradoxical subreg
> generated for matched operands.  I think that is important condition and
> the comment says why.  There are some 32-bit insns constraints requiring
> different modes (int and fp ones) for matching operands in FP regs.  The
> condition prevents LRA cycling as such subreg can look invalid (in
> simplify_subreg_regno) but it is used internally in LRA for matching
> constraint expression and should be not reloaded.
>
> With this change the patch is ok for me.
>

Thank you very much for the review! Patch fixed according to your
comments and committed as r203169.

Regards,
Wei Mi.

Patch

Index: lra-constraints.c
===================================================================
--- lra-constraints.c   (revision 201963)
+++ lra-constraints.c   (working copy)
@@ -1158,6 +1158,30 @@  process_addr_reg (rtx *loc, rtx *before,
   return true;
 }

+/* Insert move insn in simplify_operand_subreg. BEFORE returns
+   the insn to be inserted before curr insn. AFTER returns the
+   the insn to be inserted after curr insn.  ORIGREG and NEWREG
+   are the original reg and new reg for reload.  */
+static void
+insert_move_for_subreg (rtx *before, rtx *after, rtx origreg, rtx newreg)
+{
+  if (before)
+    {
+      push_to_sequence (*before);
+      lra_emit_move (newreg, origreg);
+      *before = get_insns ();
+      end_sequence ();
+    }
+  if (after)
+    {
+      start_sequence ();
+      lra_emit_move (origreg, newreg);
+      emit_insn (*after);
+      *after = get_insns ();
+      end_sequence ();
+    }
+}
+
 /* Make reloads for subreg in operand NOP with internal subreg mode
    REG_MODE, add new reloads for further processing.  Return true if
    any reload was generated.  */
@@ -1169,6 +1193,8 @@  simplify_operand_subreg (int nop, enum m
   enum machine_mode mode;
   rtx reg, new_reg;
   rtx operand = *curr_id->operand_loc[nop];
+  enum reg_class regclass;
+  enum op_type type;

   before = after = NULL_RTX;

@@ -1177,6 +1203,7 @@  simplify_operand_subreg (int nop, enum m

   mode = GET_MODE (operand);
   reg = SUBREG_REG (operand);
+  type = curr_static_id->operand[nop].type;
   /* If we change address for paradoxical subreg of memory, the
      address might violate the necessary alignment or the access might
      be slow.  So take this into consideration.  We should not worry
@@ -1215,13 +1242,9 @@  simplify_operand_subreg (int nop, enum m
        && (hard_regno_nregs[hard_regno][GET_MODE (reg)]
           >= hard_regno_nregs[hard_regno][mode])
        && simplify_subreg_regno (hard_regno, GET_MODE (reg),
-                                SUBREG_BYTE (operand), mode) < 0
-       /* Don't reload subreg for matching reload.  It is actually
-         valid subreg in LRA.  */
-       && ! LRA_SUBREG_P (operand))
+                                SUBREG_BYTE (operand), mode) < 0)
       || CONSTANT_P (reg) || GET_CODE (reg) == PLUS || MEM_P (reg))
     {
-      enum op_type type = curr_static_id->operand[nop].type;
       /* The class will be defined later in curr_insn_transform.  */
       enum reg_class rclass
        = (enum reg_class) targetm.preferred_reload_class (reg, ALL_REGS);
@@ -1229,29 +1252,85 @@  simplify_operand_subreg (int nop, enum m
       if (get_reload_reg (curr_static_id->operand[nop].type, reg_mode, reg,
                          rclass, "subreg reg", &new_reg))
        {
+         bool insert_before, insert_after;
          bitmap_set_bit (&lra_subreg_reload_pseudos, REGNO (new_reg));
-         if (type != OP_OUT
-             || GET_MODE_SIZE (GET_MODE (reg)) > GET_MODE_SIZE (mode))
-           {
-             push_to_sequence (before);
-             lra_emit_move (new_reg, reg);
-             before = get_insns ();
-             end_sequence ();
-           }
-         if (type != OP_IN)
-           {
-             start_sequence ();
-             lra_emit_move (reg, new_reg);
-             emit_insn (after);
-             after = get_insns ();
-             end_sequence ();
-           }
+
+         insert_before = (type != OP_OUT
+                          || GET_MODE_SIZE (GET_MODE (reg)) >
GET_MODE_SIZE (mode));
+         insert_after = (type != OP_IN);
+         insert_move_for_subreg (insert_before ? &before : NULL,
+                                 insert_after ? &after : NULL,
+                                 reg, new_reg);
        }
       SUBREG_REG (operand) = new_reg;
       lra_process_new_insns (curr_insn, before, after,
                             "Inserting subreg reload");
       return true;
     }
+  /* Force a reload for a paradoxical subreg. For paradoxical subreg,
+     IRA allocates hardreg to the inner pseudo reg according to its mode
+     instead of the outermode, so the size of the hardreg may not be enough
+     to contain the outermode operand, in that case we may need to insert
+     reload for the reg. For the following two types of paradoxical subreg,
+     we need to insert reload:
+     1. If the op_type is OP_IN, and the hardreg could not be paired with
+        other hardreg to contain the outermode operand
+        (checked by in_hard_reg_set_p), we need to insert the reload.
+     2. If the op_type is OP_OUT or OP_INOUT.
+
+     Here is a paradoxical subreg example showing how the reload is generated:
+
+     (insn 5 4 7 2 (set (reg:TI 106 [ __comp ])
+        (subreg:TI (reg:DI 107 [ __comp ]) 0)) {*movti_internal_rex64}
+
+     In IRA, reg107 is allocated to a DImode hardreg. We use x86-64 as example
+     here, if reg107 is assigned to hardreg R15, because R15 is the last
+     hardreg, compiler cannot find another hardreg to pair with R15 to
+     contain TImode data. So we insert a TImode reload reg180 for it.
+     After reload is inserted:
+
+     (insn 283 0 0 (set (subreg:DI (reg:TI 180 [orig:107 __comp ] [107]) 0)
+        (reg:DI 107 [ __comp ])) -1
+     (insn 5 4 7 2 (set (reg:TI 106 [ __comp ])
+        (subreg:TI (reg:TI 180 [orig:107 __comp ] [107]) 0))
{*movti_internal_rex64}
+
+     Two reload hard registers will be allocated to reg180 to save TImode data
+     in LRA_assign.  */
+  else if (REG_P (reg)
+          && REGNO (reg) >= FIRST_PSEUDO_REGISTER
+          && (hard_regno = lra_get_regno_hard_regno (REGNO (reg))) >= 0
+          && (hard_regno_nregs[hard_regno][GET_MODE (reg)]
+              < hard_regno_nregs[hard_regno][mode])
+          && (regclass = lra_get_allocno_class (REGNO (reg)))
+          && (type != OP_IN
+              || !in_hard_reg_set_p (reg_class_contents[regclass],
+                                     mode, hard_regno)))
+    {
+      /* The class will be defined later in curr_insn_transform.  */
+      enum reg_class rclass
+       = (enum reg_class) targetm.preferred_reload_class (reg, ALL_REGS);
+
+      if (get_reload_reg (curr_static_id->operand[nop].type, mode, reg,
+                          rclass, "paradoxical subreg", &new_reg))
+        {
+         rtx subreg;
+         bool insert_before, insert_after;
+
+         PUT_MODE (new_reg, mode);
+          subreg = simplify_gen_subreg (GET_MODE (reg), new_reg, mode, 0);
+         bitmap_set_bit (&lra_subreg_reload_pseudos, REGNO (new_reg));
+
+         insert_before = (type != OP_OUT);
+         insert_after = (type != OP_IN);
+         insert_move_for_subreg (insert_before ? &before : NULL,
+                                 insert_after ? &after : NULL,
+                                 reg, subreg);
+       }
+      SUBREG_REG (operand) = new_reg;
+      lra_process_new_insns (curr_insn, before, after,
+                             "Inserting paradoxical subreg reload");
+      return true;
+    }
   return false;