diff mbox series

[IRA] Handle fully-tied destinations in a similar way to earlyclobbers

Message ID mpt7e5p97ae.fsf@arm.com
State New
Headers show
Series [IRA] Handle fully-tied destinations in a similar way to earlyclobbers | expand

Commit Message

Richard Sandiford Sept. 30, 2019, 2:40 p.m. UTC
IRA's make_early_clobber_and_input_conflicts checks for cases in
which an output operand is likely to be an earlyclobber and an input
operand is unlikely to be tieable with it.  If so, the allocno for
the output conflicts with the allocno for the input.  This seems
to work well.

However, a similar situation arises if an output operand is likely
to be tied to one of a set of input operands X and if another input
operand has a different value from all of the operands in X.
E.g. if we have:

  0: "=r, r"
  1: "0, r"
  2: "r, 0"
  3: "r, r"

operand 0 will always be tied to operand 1 or operand 2, so if operand 3
is different from them both, operand 0 acts like an earlyclobber as far
as operand 3 (only) is concerned.  The same is true for operand 2 in:

  0: "=r"
  1: "0"
  2: "r"

In the second example, we'd normally have a copy between operand 1 and
operand 0 if operand 1 dies in the instruction, and so there's rarely
a problem.  But if operand 1 doesn't die in the instruction, operand 0
still acts as an earlyclobber for operand 2 (if different from operand 1),
since in that case LRA must copy operand 1 to operand 0 before the
instruction.

As the existing comment says:

    Avoid introducing unnecessary conflicts by checking classes of the
    constraints and pseudos because otherwise significant code
    degradation is possible for some targets.

I think that's doubly true here.  E.g. it's perfectly reasonable to have
constraints like:

  0: "=r, r"
  1: "0, r"
  2: "r, r"

on targets like s390 that have shorter instructions for tied operands,
but that don't want the size difference to influence RA too much.
We shouldn't treat operand 0 as earlyclobber wrt operand 2 in that case.

This patch therefore treats a normal tied non-earlyclobber output as
being effectively earlyclobber wrt to an input if it is so for *all*
preferred alternatives.

My usual bogo-comparison of gcc.c-torture, gcc.dg and g++.dg
(this time using -Os -fno-schedule-insns{,2}) gives:

Target                 Tests  Delta   Best  Worst Median
======                 =====  =====   ====  ===== ======
aarch64-linux-gnu          3     -3     -1     -1     -1
aarch64_be-linux-gnu       4     -4     -1     -1     -1
alpha-linux-gnu          136   -190    -56     84     -1
arc-elf                   31   -172    -27      3     -2
arm-linux-gnueabi         59   -996   -136      4     -1
arm-linux-gnueabihf       59   -996   -136      4     -1
bfin-elf                  22    -31    -19      8     -1
bpf-elf                  276   -388   -191     12     -1
cris-elf                  73     69    -18     26     -1
epiphany-elf              58    -91    -10      2     -1
fr30-elf                 123   -156    -33     20     -1
h8300-elf                150   -426    -36     17     -2
hppa64-hp-hpux11.23       39    -65    -16      1     -1
i686-apple-darwin         93    -51    -29     26     -1
i686-pc-linux-gnu         43      8    -10     27     -1
m32r-elf                  68    -92    -31     14     -1
m68k-linux-gnu           169    -65    -23     33     -1
mcore-elf                 27    -29    -14      8     -1
mmix                      25    -75    -28      2     -1
mn10300-elf              166     32    -46    149     -1
moxie-rtems              937   1461  -1649   6000     -1
msp430-elf                89  -1364   -835      5     -4
nds32le-elf               34    -54    -29      2     -1
pdp11                    252   -458    -23     13     -1
powerpc-ibm-aix7.0         3     -4     -2     -1     -1
powerpc64-linux-gnu        1     -1     -1     -1     -1
powerpc64le-linux-gnu      3     -3     -1     -1     -1
rl78-elf                   4    -12     -4     -2     -4
rx-elf                    59    -99    -11      2     -1
s390-linux-gnu           115   -117    -53     21     -1
s390x-linux-gnu          120    -47    -25     21     -1
sh-linux-gnu              54    -89    -31      8     -1
sparc64-linux-gnu         14     -6     -5      4     -1
tilepro-linux-gnu        209   -452    -55     16     -1
v850-elf                  10     18     -2     21     -1
vax-netbsdelf              5     -5     -1     -1     -1
x86_64-darwin             53    -62    -33      3     -1
x86_64-linux-gnu          52     -8     -8     13     -1
xstormy16-elf            144   -814   -541     25     -1
xtensa-elf               578  -2096   -138     15     -1

The eye-watering moxie-rtems +6,000 outlier is from gcc.dg/pr59992.c,
where the same code template is instantiated 10,000 times.  In some
instances the code improves by one instruction, but it regresses by
one instruction in many more.

To emphasise that this is a poor metric (and is just to get a flavour),
most of the {i686,x86_64}-linux-gnu LOC increases happen in frame info
rather than code.  I should try to script that out...

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2019-09-30  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* ira-lives.c (check_and_make_def_conflict): Handle cases in which
	DEF is not a true earlyclobber but is tied to a specific input
	operand, and so is effectively earlyclobber wrt inputs that have
	different values.
	(make_early_clobber_and_input_conflicts): Pass this case to the above.

Comments

Vladimir Makarov Oct. 2, 2019, 1:44 p.m. UTC | #1
On 2019-09-30 10:40 a.m., Richard Sandiford wrote:
> IRA's make_early_clobber_and_input_conflicts checks for cases in
> which an output operand is likely to be an earlyclobber and an input
> operand is unlikely to be tieable with it.  If so, the allocno for
> the output conflicts with the allocno for the input.  This seems
> to work well.
>
> However, a similar situation arises if an output operand is likely
> to be tied to one of a set of input operands X and if another input
> operand has a different value from all of the operands in X.
> E.g. if we have:
>
>    0: "=r, r"
>    1: "0, r"
>    2: "r, 0"
>    3: "r, r"
>
> operand 0 will always be tied to operand 1 or operand 2, so if operand 3
> is different from them both, operand 0 acts like an earlyclobber as far
> as operand 3 (only) is concerned.  The same is true for operand 2 in:
>
>    0: "=r"
>    1: "0"
>    2: "r"
>
> In the second example, we'd normally have a copy between operand 1 and
> operand 0 if operand 1 dies in the instruction, and so there's rarely
> a problem.  But if operand 1 doesn't die in the instruction, operand 0
> still acts as an earlyclobber for operand 2 (if different from operand 1),
> since in that case LRA must copy operand 1 to operand 0 before the
> instruction.
>
> As the existing comment says:
>
>      Avoid introducing unnecessary conflicts by checking classes of the
>      constraints and pseudos because otherwise significant code
>      degradation is possible for some targets.
>
> I think that's doubly true here.  E.g. it's perfectly reasonable to have
> constraints like:
>
>    0: "=r, r"
>    1: "0, r"
>    2: "r, r"
>
> on targets like s390 that have shorter instructions for tied operands,
> but that don't want the size difference to influence RA too much.
> We shouldn't treat operand 0 as earlyclobber wrt operand 2 in that case.
>
> This patch therefore treats a normal tied non-earlyclobber output as
> being effectively earlyclobber wrt to an input if it is so for *all*
> preferred alternatives.
>
> My usual bogo-comparison of gcc.c-torture, gcc.dg and g++.dg
> (this time using -Os -fno-schedule-insns{,2}) gives:
>
> Target                 Tests  Delta   Best  Worst Median
> ======                 =====  =====   ====  ===== ======
> aarch64-linux-gnu          3     -3     -1     -1     -1
> aarch64_be-linux-gnu       4     -4     -1     -1     -1
> alpha-linux-gnu          136   -190    -56     84     -1
> arc-elf                   31   -172    -27      3     -2
> arm-linux-gnueabi         59   -996   -136      4     -1
> arm-linux-gnueabihf       59   -996   -136      4     -1
> bfin-elf                  22    -31    -19      8     -1
> bpf-elf                  276   -388   -191     12     -1
> cris-elf                  73     69    -18     26     -1
> epiphany-elf              58    -91    -10      2     -1
> fr30-elf                 123   -156    -33     20     -1
> h8300-elf                150   -426    -36     17     -2
> hppa64-hp-hpux11.23       39    -65    -16      1     -1
> i686-apple-darwin         93    -51    -29     26     -1
> i686-pc-linux-gnu         43      8    -10     27     -1
> m32r-elf                  68    -92    -31     14     -1
> m68k-linux-gnu           169    -65    -23     33     -1
> mcore-elf                 27    -29    -14      8     -1
> mmix                      25    -75    -28      2     -1
> mn10300-elf              166     32    -46    149     -1
> moxie-rtems              937   1461  -1649   6000     -1
> msp430-elf                89  -1364   -835      5     -4
> nds32le-elf               34    -54    -29      2     -1
> pdp11                    252   -458    -23     13     -1
> powerpc-ibm-aix7.0         3     -4     -2     -1     -1
> powerpc64-linux-gnu        1     -1     -1     -1     -1
> powerpc64le-linux-gnu      3     -3     -1     -1     -1
> rl78-elf                   4    -12     -4     -2     -4
> rx-elf                    59    -99    -11      2     -1
> s390-linux-gnu           115   -117    -53     21     -1
> s390x-linux-gnu          120    -47    -25     21     -1
> sh-linux-gnu              54    -89    -31      8     -1
> sparc64-linux-gnu         14     -6     -5      4     -1
> tilepro-linux-gnu        209   -452    -55     16     -1
> v850-elf                  10     18     -2     21     -1
> vax-netbsdelf              5     -5     -1     -1     -1
> x86_64-darwin             53    -62    -33      3     -1
> x86_64-linux-gnu          52     -8     -8     13     -1
> xstormy16-elf            144   -814   -541     25     -1
> xtensa-elf               578  -2096   -138     15     -1
>
> The eye-watering moxie-rtems +6,000 outlier is from gcc.dg/pr59992.c,
> where the same code template is instantiated 10,000 times.  In some
> instances the code improves by one instruction, but it regresses by
> one instruction in many more.
>
> To emphasise that this is a poor metric (and is just to get a flavour),
> most of the {i686,x86_64}-linux-gnu LOC increases happen in frame info
> rather than code.  I should try to script that out...
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
Yes.

It is a very non-trivial patch.  It took a some time for me to analyze 
your patch.  But this time was spent for good :)

Thank you, Richard!

>
>
> 2019-09-30  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
> 	* ira-lives.c (check_and_make_def_conflict): Handle cases in which
> 	DEF is not a true earlyclobber but is tied to a specific input
> 	operand, and so is effectively earlyclobber wrt inputs that have
> 	different values.
> 	(make_early_clobber_and_input_conflicts): Pass this case to the above.
>
diff mbox series

Patch

Index: gcc/ira-lives.c
===================================================================
--- gcc/ira-lives.c	2019-09-26 08:37:44.000000000 +0100
+++ gcc/ira-lives.c	2019-09-30 15:23:22.629428359 +0100
@@ -632,9 +632,28 @@  check_and_make_def_use_conflict (rtx dre
 
 /* Check and make if necessary conflicts for definition DEF of class
    DEF_CL of the current insn with input operands.  Process only
-   constraints of alternative ALT.  */
+   constraints of alternative ALT.
+
+   One of three things is true when this function is called:
+
+   (1) DEF is an earlyclobber for alternative ALT.  Input operands then
+       conflict with DEF in ALT unless they explicitly match DEF via 0-9
+       constraints.
+
+   (2) DEF matches (via 0-9 constraints) an operand that is an
+       earlyclobber for alternative ALT.  Other input operands then
+       conflict with DEF in ALT.
+
+   (3) [FOR_TIE_P] Some input operand X matches DEF for alternative ALT.
+       Input operands with a different value from X then conflict with
+       DEF in ALT.
+
+   However, there's still a judgement call to make when deciding
+   whether a conflict in ALT is important enough to be reflected
+   in the pan-alternative allocno conflict set.  */
 static void
-check_and_make_def_conflict (int alt, int def, enum reg_class def_cl)
+check_and_make_def_conflict (int alt, int def, enum reg_class def_cl,
+			     bool for_tie_p)
 {
   int use, use_match;
   ira_allocno_t a;
@@ -668,14 +687,40 @@  check_and_make_def_conflict (int alt, in
       if (use == def || recog_data.operand_type[use] == OP_OUT)
 	continue;
 
+      /* An earlyclobber on DEF doesn't apply to an input operand X if X
+	 explicitly matches DEF, but it applies to other input operands
+	 even if they happen to be the same value as X.
+
+	 In contrast, if an input operand X is tied to a non-earlyclobber
+	 DEF, there's no conflict with other input operands that have the
+	 same value as X.  */
+      if (op_alt[use].matches == def
+	  || (for_tie_p
+	      && rtx_equal_p (recog_data.operand[use],
+			      recog_data.operand[op_alt[def].matched])))
+	continue;
+
       if (op_alt[use].anything_ok)
 	use_cl = ALL_REGS;
       else
 	use_cl = op_alt[use].cl;
+      if (use_cl == NO_REGS)
+	continue;
+
+      /* If DEF is simply a tied operand, ignore cases in which this
+	 alternative requires USE to have a likely-spilled class.
+	 Adding a conflict would just constrain USE further if DEF
+	 happens to be allocated first.  */
+      if (for_tie_p && targetm.class_likely_spilled_p (use_cl))
+	continue;
 
       /* If there's any alternative that allows USE to match DEF, do not
 	 record a conflict.  If that causes us to create an invalid
-	 instruction due to the earlyclobber, reload must fix it up.  */
+	 instruction due to the earlyclobber, reload must fix it up.
+
+	 Likewise, if we're treating a tied DEF like a partial earlyclobber,
+	 do not record a conflict if there's another alternative in which
+	 DEF is neither tied nor earlyclobber.  */
       for (alt1 = 0; alt1 < recog_data.n_alternatives; alt1++)
 	{
 	  if (!TEST_BIT (preferred_alternatives, alt1))
@@ -690,6 +735,12 @@  check_and_make_def_conflict (int alt, in
 		  && recog_data.constraints[use - 1][0] == '%'
 		  && op_alt1[use - 1].matches == def))
 	    break;
+	  if (for_tie_p
+	      && !op_alt1[def].earlyclobber
+	      && op_alt1[def].matched < 0
+	      && alternative_class (op_alt1, def) != NO_REGS
+	      && alternative_class (op_alt1, use) != NO_REGS)
+	    break;
 	}
 
       if (alt1 < recog_data.n_alternatives)
@@ -700,8 +751,7 @@  check_and_make_def_conflict (int alt, in
 
       if ((use_match = op_alt[use].matches) >= 0)
 	{
-	  if (use_match == def)
-	    continue;
+	  gcc_checking_assert (use_match != def);
 
 	  if (op_alt[use_match].anything_ok)
 	    use_cl = ALL_REGS;
@@ -716,7 +766,11 @@  check_and_make_def_conflict (int alt, in
 /* Make conflicts of early clobber pseudo registers of the current
    insn with its inputs.  Avoid introducing unnecessary conflicts by
    checking classes of the constraints and pseudos because otherwise
-   significant code degradation is possible for some targets.  */
+   significant code degradation is possible for some targets.
+
+   For these purposes, tying an input to an output makes that output act
+   like an earlyclobber for inputs with a different value, since the output
+   register then has a predetermined purpose on input to the instruction.  */
 static void
 make_early_clobber_and_input_conflicts (void)
 {
@@ -731,15 +785,19 @@  make_early_clobber_and_input_conflicts (
     if (TEST_BIT (preferred_alternatives, alt))
       for (def = 0; def < n_operands; def++)
 	{
-	  def_cl = NO_REGS;
-	  if (op_alt[def].earlyclobber)
+	  if (op_alt[def].anything_ok)
+	    def_cl = ALL_REGS;
+	  else
+	    def_cl = op_alt[def].cl;
+	  if (def_cl != NO_REGS)
 	    {
-	      if (op_alt[def].anything_ok)
-		def_cl = ALL_REGS;
-	      else
-		def_cl = op_alt[def].cl;
-	      check_and_make_def_conflict (alt, def, def_cl);
+	      if (op_alt[def].earlyclobber)
+		check_and_make_def_conflict (alt, def, def_cl, false);
+	      else if (op_alt[def].matched >= 0
+		       && !targetm.class_likely_spilled_p (def_cl))
+		check_and_make_def_conflict (alt, def, def_cl, true);
 	    }
+
 	  if ((def_match = op_alt[def].matches) >= 0
 	      && (op_alt[def_match].earlyclobber
 		  || op_alt[def].earlyclobber))
@@ -748,7 +806,7 @@  make_early_clobber_and_input_conflicts (
 		def_cl = ALL_REGS;
 	      else
 		def_cl = op_alt[def_match].cl;
-	      check_and_make_def_conflict (alt, def, def_cl);
+	      check_and_make_def_conflict (alt, def, def_cl, false);
 	    }
 	}
 }