diff mbox

[GCC/LRA] Teach LRA to not use same register value for multiple output operands of an insn

Message ID 3064206.UDVhgfly4a@e108577-lin
State New
Headers show

Commit Message

Thomas Preudhomme July 8, 2016, 3:07 p.m. UTC
Hi,

While investigating the root cause a testsuite regression for the 
ARM/embedded-5-branch GCC in gcc.dg/vect/slp-perm-5.c, we found that the bug 
seems to also affect trunk. The bug manifests itself as an ICE in cselib due to 
a parallel insn with two SET to the same register. When processing the second 
SET in cselib_record_set (), the assert gcc_assert (REG_VALUES (dreg)->elt == 
0) fails because the field was already set when processing the first SET. The 
root cause seems to be a register allocation issue in lra-constraints.

When considering an output operand with matching input operand(s), 
match_reload does a number of checks to see if it can reuse the first matching 
input operand register value or if a new unique value should be given to the 
output operand. The current check ignores the case of multiple output operands 
(as in neon_vtrn<mode>_insn insn pattern in config/arm/arm.md). This can lead 
to cases where multiple output operands share a same register value when the 
first matching input operand has the same value as another output operand, 
leading to the ICE in cselib. This patch changes match_reload to get 
information about other output operands and check whether this case is met or 
not.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2016-07-01  Thomas Preud'homme  <thomas.preudhomme@arm.com>

        * lra-constraints.c (match_reload): Pass information about other
        output operands.  Create new unique register value if matching input
        operand shares same register value as output operand being considered.
        (curr_insn_transform): Record output operands already processed.


Patch passed bootstrap under arm-none-linux-gnueabihf (Cortex-A57 in ARM mode 
as well as Thumb mode), aarch64-linux-gnu (Cortex-A57) and x86_64-linux-gnu 
and testsuite results does not regress for these and for arm-none-eabi 
targeting Cortex-A8.

Is this ok for trunk?

Best regards,

Thomas

Comments

Mike Stump July 8, 2016, 4:05 p.m. UTC | #1
On Jul 8, 2016, at 8:07 AM, Thomas Preudhomme <thomas.preudhomme@foss.arm.com> wrote:
> While investigating the root cause a testsuite regression for the 
> ARM/embedded-5-branch GCC in gcc.dg/vect/slp-perm-5.c, we found that the bug 
> seems to also affect trunk.

Hum...  If in 6.x, and safe to back port to 6, a back port would be nice...  I use LRA in 6.x, and seems like I'd be susceptible to this sort of thing, but, I didn't test it.
Thomas Preudhomme July 14, 2016, 9:32 a.m. UTC | #2
On Friday 08 July 2016 09:05:55 Mike Stump wrote:
> On Jul 8, 2016, at 8:07 AM, Thomas Preudhomme 
<thomas.preudhomme@foss.arm.com> wrote:
> > While investigating the root cause a testsuite regression for the
> > ARM/embedded-5-branch GCC in gcc.dg/vect/slp-perm-5.c, we found that the
> > bug seems to also affect trunk.
> 
> Hum...  If in 6.x, and safe to back port to 6, a back port would be nice... 
> I use LRA in 6.x, and seems like I'd be susceptible to this sort of thing,
> but, I didn't test it.

It should affect 6.x as well yes since there is no special protection against 
it in the code being modified. However I only ever managed to reproduce this on 
the ARM/embedded-5-branch.

Best regards,

Thomas
Thomas Preudhomme July 14, 2016, 12:23 p.m. UTC | #3
On Thursday 14 July 2016 10:32:48 Thomas Preudhomme wrote:
> On Friday 08 July 2016 09:05:55 Mike Stump wrote:
> > On Jul 8, 2016, at 8:07 AM, Thomas Preudhomme
> 
> <thomas.preudhomme@foss.arm.com> wrote:
> > > While investigating the root cause a testsuite regression for the
> > > ARM/embedded-5-branch GCC in gcc.dg/vect/slp-perm-5.c, we found that the
> > > bug seems to also affect trunk.
> > 
> > Hum...  If in 6.x, and safe to back port to 6, a back port would be
> > nice...
> > I use LRA in 6.x, and seems like I'd be susceptible to this sort of thing,
> > but, I didn't test it.
> 
> It should affect 6.x as well yes since there is no special protection
> against it in the code being modified. However I only ever managed to
> reproduce this on the ARM/embedded-5-branch.

I've created PR71878 for this problem with more details on how to reproduce 
the issue. it only requires to backport a single commit to gcc-5-branch, build 
gcc for arm-none-eabi (make all-gcc is enough) and compile the testcase 
attached in the PR with the given options.

Please let me know if you need any more details.

Best regards,

Thomas
Vladimir Makarov July 14, 2016, 4:25 p.m. UTC | #4
On 07/08/2016 11:07 AM, Thomas Preudhomme wrote:
> Hi,
>
> While investigating the root cause a testsuite regression for the
> ARM/embedded-5-branch GCC in gcc.dg/vect/slp-perm-5.c, we found that the bug
> seems to also affect trunk. The bug manifests itself as an ICE in cselib due to
> a parallel insn with two SET to the same register. When processing the second
> SET in cselib_record_set (), the assert gcc_assert (REG_VALUES (dreg)->elt ==
> 0) fails because the field was already set when processing the first SET. The
> root cause seems to be a register allocation issue in lra-constraints.
>
> When considering an output operand with matching input operand(s),
> match_reload does a number of checks to see if it can reuse the first matching
> input operand register value or if a new unique value should be given to the
> output operand. The current check ignores the case of multiple output operands
> (as in neon_vtrn<mode>_insn insn pattern in config/arm/arm.md). This can lead
> to cases where multiple output operands share a same register value when the
> first matching input operand has the same value as another output operand,
> leading to the ICE in cselib. This patch changes match_reload to get
> information about other output operands and check whether this case is met or
> not.
>
> ChangeLog entry is as follows:
>
> *** gcc/ChangeLog ***
>
> 2016-07-01  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>
>          * lra-constraints.c (match_reload): Pass information about other
>          output operands.  Create new unique register value if matching input
>          operand shares same register value as output operand being considered.
>          (curr_insn_transform): Record output operands already processed.
>
>
> Patch passed bootstrap under arm-none-linux-gnueabihf (Cortex-A57 in ARM mode
> as well as Thumb mode), aarch64-linux-gnu (Cortex-A57) and x86_64-linux-gnu
> and testsuite results does not regress for these and for arm-none-eabi
> targeting Cortex-A8.
Sorry, it took sometime to think about the problem.  It is a nontrivial 
problem with a lot of possible scenarios in general case.  The patch is 
safe in any case (it can not create a wrong code if LRA w/o the patch 
does not create a wrong code).
> Is this ok for trunk?
>
>

Yes.  Thank you, Thomas.
Thomas Preudhomme Sept. 15, 2016, 2:30 p.m. UTC | #5
Hi Vladimir & release managers,

The patch applies cleanly to gcc-6-branch. Ok to backport?

Best regards,

Thomas

On 14/07/16 17:25, Vladimir Makarov wrote:
> On 07/08/2016 11:07 AM, Thomas Preudhomme wrote:
>> Hi,
>>
>> While investigating the root cause a testsuite regression for the
>> ARM/embedded-5-branch GCC in gcc.dg/vect/slp-perm-5.c, we found that the bug
>> seems to also affect trunk. The bug manifests itself as an ICE in cselib due to
>> a parallel insn with two SET to the same register. When processing the second
>> SET in cselib_record_set (), the assert gcc_assert (REG_VALUES (dreg)->elt ==
>> 0) fails because the field was already set when processing the first SET. The
>> root cause seems to be a register allocation issue in lra-constraints.
>>
>> When considering an output operand with matching input operand(s),
>> match_reload does a number of checks to see if it can reuse the first matching
>> input operand register value or if a new unique value should be given to the
>> output operand. The current check ignores the case of multiple output operands
>> (as in neon_vtrn<mode>_insn insn pattern in config/arm/arm.md). This can lead
>> to cases where multiple output operands share a same register value when the
>> first matching input operand has the same value as another output operand,
>> leading to the ICE in cselib. This patch changes match_reload to get
>> information about other output operands and check whether this case is met or
>> not.
>>
>> ChangeLog entry is as follows:
>>
>> *** gcc/ChangeLog ***
>>
>> 2016-07-01  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>>
>>          * lra-constraints.c (match_reload): Pass information about other
>>          output operands.  Create new unique register value if matching input
>>          operand shares same register value as output operand being considered.
>>          (curr_insn_transform): Record output operands already processed.
>>
>>
>> Patch passed bootstrap under arm-none-linux-gnueabihf (Cortex-A57 in ARM mode
>> as well as Thumb mode), aarch64-linux-gnu (Cortex-A57) and x86_64-linux-gnu
>> and testsuite results does not regress for these and for arm-none-eabi
>> targeting Cortex-A8.
> Sorry, it took sometime to think about the problem.  It is a nontrivial problem
> with a lot of possible scenarios in general case.  The patch is safe in any case
> (it can not create a wrong code if LRA w/o the patch does not create a wrong code).
>> Is this ok for trunk?
>>
>>
>
> Yes.  Thank you, Thomas.
>
Vladimir Makarov Sept. 19, 2016, 3:05 p.m. UTC | #6
On 07/08/2016 11:07 AM, Thomas Preudhomme wrote:
> Hi,
>
> While investigating the root cause a testsuite regression for the
> ARM/embedded-5-branch GCC in gcc.dg/vect/slp-perm-5.c, we found that the bug
> seems to also affect trunk. The bug manifests itself as an ICE in cselib due to
> a parallel insn with two SET to the same register. When processing the second
> SET in cselib_record_set (), the assert gcc_assert (REG_VALUES (dreg)->elt ==
> 0) fails because the field was already set when processing the first SET. The
> root cause seems to be a register allocation issue in lra-constraints.
>
> When considering an output operand with matching input operand(s),
> match_reload does a number of checks to see if it can reuse the first matching
> input operand register value or if a new unique value should be given to the
> output operand. The current check ignores the case of multiple output operands
> (as in neon_vtrn<mode>_insn insn pattern in config/arm/arm.md). This can lead
> to cases where multiple output operands share a same register value when the
> first matching input operand has the same value as another output operand,
> leading to the ICE in cselib. This patch changes match_reload to get
> information about other output operands and check whether this case is met or
> not.
>
> ChangeLog entry is as follows:
>
> *** gcc/ChangeLog ***
>
> 2016-07-01  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>
>          * lra-constraints.c (match_reload): Pass information about other
>          output operands.  Create new unique register value if matching input
>          operand shares same register value as output operand being considered.
>          (curr_insn_transform): Record output operands already processed.
>
>
> Patch passed bootstrap under arm-none-linux-gnueabihf (Cortex-A57 in ARM mode
> as well as Thumb mode), aarch64-linux-gnu (Cortex-A57) and x86_64-linux-gnu
> and testsuite results does not regress for these and for arm-none-eabi
> targeting Cortex-A8.
>
> Is this ok for trunk?
>
>
Yes, Thomas.  I tried to find an alternative solution but your approach 
is probably better.  Your patch is also ok for gcc-6 branch.  I'd delay 
its commit to gcc-6 branch for a few days to see how it behaves on the 
trunk.  Although I don't expect any troubles as the patch is quite safe 
with my point of view.

Thank you for the patch and sorry for the delay with the answer.
Thomas Preudhomme Sept. 20, 2016, 9:20 a.m. UTC | #7
Hi Vladimir,

I'm sorry I failed to give a proper amount of context in that email. You already 
approved that patch for trunk one or two months ago and I was just replying to 
your approval to ask if you are now ok to commit it to gcc-6-branch.

I've committed it now then since it has been on trunk for long enough ;-)

Best regards.

Thomas

On 19/09/16 16:05, Vladimir N Makarov wrote:
>
>
> On 07/08/2016 11:07 AM, Thomas Preudhomme wrote:
>> Hi,
>>
>> While investigating the root cause a testsuite regression for the
>> ARM/embedded-5-branch GCC in gcc.dg/vect/slp-perm-5.c, we found that the bug
>> seems to also affect trunk. The bug manifests itself as an ICE in cselib due to
>> a parallel insn with two SET to the same register. When processing the second
>> SET in cselib_record_set (), the assert gcc_assert (REG_VALUES (dreg)->elt ==
>> 0) fails because the field was already set when processing the first SET. The
>> root cause seems to be a register allocation issue in lra-constraints.
>>
>> When considering an output operand with matching input operand(s),
>> match_reload does a number of checks to see if it can reuse the first matching
>> input operand register value or if a new unique value should be given to the
>> output operand. The current check ignores the case of multiple output operands
>> (as in neon_vtrn<mode>_insn insn pattern in config/arm/arm.md). This can lead
>> to cases where multiple output operands share a same register value when the
>> first matching input operand has the same value as another output operand,
>> leading to the ICE in cselib. This patch changes match_reload to get
>> information about other output operands and check whether this case is met or
>> not.
>>
>> ChangeLog entry is as follows:
>>
>> *** gcc/ChangeLog ***
>>
>> 2016-07-01  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>>
>>          * lra-constraints.c (match_reload): Pass information about other
>>          output operands.  Create new unique register value if matching input
>>          operand shares same register value as output operand being considered.
>>          (curr_insn_transform): Record output operands already processed.
>>
>>
>> Patch passed bootstrap under arm-none-linux-gnueabihf (Cortex-A57 in ARM mode
>> as well as Thumb mode), aarch64-linux-gnu (Cortex-A57) and x86_64-linux-gnu
>> and testsuite results does not regress for these and for arm-none-eabi
>> targeting Cortex-A8.
>>
>> Is this ok for trunk?
>>
>>
> Yes, Thomas.  I tried to find an alternative solution but your approach is
> probably better.  Your patch is also ok for gcc-6 branch.  I'd delay its commit
> to gcc-6 branch for a few days to see how it behaves on the trunk.  Although I
> don't expect any troubles as the patch is quite safe with my point of view.
>
> Thank you for the patch and sorry for the delay with the answer.
diff mbox

Patch

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index bf08dce2e0a4c2ef4c339aedbda4dba47cba1645..a3fd6c93c648050f3479dc8aca359a819d24863e 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -871,15 +871,18 @@  regno_val_use_in (unsigned int regno, rtx x)
 }
 
 /* Generate reloads for matching OUT and INS (array of input operand
-   numbers with end marker -1) with reg class GOAL_CLASS.  Add input
-   and output reloads correspondingly to the lists *BEFORE and *AFTER.
-   OUT might be negative.  In this case we generate input reloads for
-   matched input operands INS.  EARLY_CLOBBER_P is a flag that the
-   output operand is early clobbered for chosen alternative.  */
+   numbers with end marker -1) with reg class GOAL_CLASS, considering
+   output operands OUTS (similar array to INS) needing to be in different
+   registers.  Add input and output reloads correspondingly to the lists
+   *BEFORE and *AFTER.  OUT might be negative.  In this case we generate
+   input reloads for matched input operands INS.  EARLY_CLOBBER_P is a flag
+   that the output operand is early clobbered for chosen alternative.  */
 static void
-match_reload (signed char out, signed char *ins, enum reg_class goal_class,
-	      rtx_insn **before, rtx_insn **after, bool early_clobber_p)
+match_reload (signed char out, signed char *ins, signed char *outs,
+	      enum reg_class goal_class, rtx_insn **before,
+	      rtx_insn **after, bool early_clobber_p)
 {
+  bool out_conflict;
   int i, in;
   rtx new_in_reg, new_out_reg, reg;
   machine_mode inmode, outmode;
@@ -968,12 +971,32 @@  match_reload (signed char out, signed char *ins, enum reg_class goal_class,
 	 We don't care about eliminable hard regs here as we are
 	 interesting only in pseudos.  */
 
+      /* Matching input's register value is the same as one of the other
+	 output operand.  Output operands in a parallel insn must be in
+	 different registers.  */
+      out_conflict = false;
+      if (REG_P (in_rtx))
+	{
+	  for (i = 0; outs[i] >= 0; i++)
+	    {
+	      rtx other_out_rtx = *curr_id->operand_loc[outs[i]];
+	      if (REG_P (other_out_rtx)
+		  && (regno_val_use_in (REGNO (in_rtx), other_out_rtx)
+		      != NULL_RTX))
+		{
+		  out_conflict = true;
+		  break;
+		}
+	    }
+	}
+
       new_in_reg = new_out_reg
 	= (! early_clobber_p && ins[1] < 0 && REG_P (in_rtx)
 	   && (int) REGNO (in_rtx) < lra_new_regno_start
 	   && find_regno_note (curr_insn, REG_DEAD, REGNO (in_rtx))
 	   && (out < 0
 	       || regno_val_use_in (REGNO (in_rtx), out_rtx) == NULL_RTX)
+	   && !out_conflict
 	   ? lra_create_new_reg (inmode, in_rtx, goal_class, "")
 	   : lra_create_new_reg_with_unique_value (outmode, out_rtx,
 						   goal_class, ""));
@@ -3432,9 +3455,11 @@  curr_insn_transform (bool check_only_p)
   int i, j, k;
   int n_operands;
   int n_alternatives;
+  int n_outputs;
   int commutative;
   signed char goal_alt_matched[MAX_RECOG_OPERANDS][MAX_RECOG_OPERANDS];
   signed char match_inputs[MAX_RECOG_OPERANDS + 1];
+  signed char outputs[MAX_RECOG_OPERANDS + 1];
   rtx_insn *before, *after;
   bool alt_p = false;
   /* Flag that the insn has been changed through a transformation.  */
@@ -3844,6 +3869,8 @@  curr_insn_transform (bool check_only_p)
 	  }
       }
 
+  n_outputs = 0;
+  outputs[0] = -1;
   for (i = 0; i < n_operands; i++)
     {
       int regno;
@@ -4001,7 +4028,7 @@  curr_insn_transform (bool check_only_p)
 	  /* generate reloads for input and matched outputs.  */
 	  match_inputs[0] = i;
 	  match_inputs[1] = -1;
-	  match_reload (goal_alt_matched[i][0], match_inputs,
+	  match_reload (goal_alt_matched[i][0], match_inputs, outputs,
 			goal_alt[i], &before, &after,
 			curr_static_id->operand_alternative
 			[goal_alt_number * n_operands + goal_alt_matched[i][0]]
@@ -4011,9 +4038,9 @@  curr_insn_transform (bool check_only_p)
 	       && (curr_static_id->operand[goal_alt_matched[i][0]].type
 		   == OP_IN))
 	/* Generate reloads for output and matched inputs.  */
-	match_reload (i, goal_alt_matched[i], goal_alt[i], &before, &after,
-		      curr_static_id->operand_alternative
-		      [goal_alt_number * n_operands + i].earlyclobber);
+	match_reload (i, goal_alt_matched[i], outputs, goal_alt[i], &before,
+		      &after, curr_static_id->operand_alternative
+			      [goal_alt_number * n_operands + i].earlyclobber);
       else if (curr_static_id->operand[i].type == OP_IN
 	       && (curr_static_id->operand[goal_alt_matched[i][0]].type
 		   == OP_IN))
@@ -4023,12 +4050,22 @@  curr_insn_transform (bool check_only_p)
 	  for (j = 0; (k = goal_alt_matched[i][j]) >= 0; j++)
 	    match_inputs[j + 1] = k;
 	  match_inputs[j + 1] = -1;
-	  match_reload (-1, match_inputs, goal_alt[i], &before, &after, false);
+	  match_reload (-1, match_inputs, outputs, goal_alt[i], &before,
+			&after, false);
 	}
       else
 	/* We must generate code in any case when function
 	   process_alt_operands decides that it is possible.  */
 	gcc_unreachable ();
+
+      /* Memorise processed outputs so that output remaining to be processed
+	 can avoid using the same register value (see match_reload).  */
+      if (curr_static_id->operand[i].type == OP_OUT)
+	{
+	  outputs[n_outputs++] = i;
+	  outputs[n_outputs] = -1;
+	}
+
       if (optional_p)
 	{
 	  lra_assert (REG_P (op));