diff mbox series

[rs6000] Fix PR88845: ICE in lra_set_insn_recog_data

Message ID a9956825-4423-c479-8ad3-06d6bad4c782@linux.ibm.com
State New
Headers show
Series [rs6000] Fix PR88845: ICE in lra_set_insn_recog_data | expand

Commit Message

Peter Bergner March 1, 2019, 7:33 p.m. UTC
PR88845 shows a problem where LRA spilled an input operand of an inline
asm statement by calling our generic movsf pattern which ended up generating
an insn we don't have a pattern for, so we ICE.  The insn was:

  (insn (set (reg:SF 125)
	     (subreg:SF (reg:SI 124) 0)))

The problem is that rs6000_emit_move_si_sf_subreg() is disabled for LRA
and so wasn't able to call gen_movsf_from_si() which generates the correct
pattern for moving a 32-bit value from a GPR to a FPR.  The patch below
fixes the issue by allowing rs6000_emit_move_si_sf_subreg() to be called
during LRA as well as creating an expander so that when it is called during
LRA, we can create the scratch register that is required for its associated
splitter.  We have to do this, since LRA has already converted all of the
scratches into real registers before it does any spilling.

This passed bootstrap and regtesting on powerpc64le-linux with no
regressions.  Ok for mainline?

Peter

gcc/
	PR rtl-optimization/88845
	* config/rs6000/rs6000.c (rs6000_emit_move_si_sf_subreg): Enable during
	LRA.
	* config/rs6000/rs6000.md (movsf_from_si): New expander; old insn and
	splitter renamed from this ...
	(movsf_from_si_internal): ... to this.

gcc/testsuite/
	PR rtl-optimization/88845
	* gcc.target/powerpc/pr88845.c: New test.

Comments

Segher Boessenkool March 4, 2019, 7:27 p.m. UTC | #1
Hi Peter,

On Fri, Mar 01, 2019 at 01:33:27PM -0600, Peter Bergner wrote:
> PR88845 shows a problem where LRA spilled an input operand of an inline
> asm statement by calling our generic movsf pattern which ended up generating
> an insn we don't have a pattern for, so we ICE.  The insn was:
> 
>   (insn (set (reg:SF 125)
> 	     (subreg:SF (reg:SI 124) 0)))
> 
> The problem is that rs6000_emit_move_si_sf_subreg() is disabled for LRA
> and so wasn't able to call gen_movsf_from_si() which generates the correct
> pattern for moving a 32-bit value from a GPR to a FPR.  The patch below
> fixes the issue by allowing rs6000_emit_move_si_sf_subreg() to be called
> during LRA as well as creating an expander so that when it is called during
> LRA, we can create the scratch register that is required for its associated
> splitter.  We have to do this, since LRA has already converted all of the
> scratches into real registers before it does any spilling.

> +      /* If LRA is generating a direct move from a GPR to a FPR,
> +	 then the splitter is going to need a scratch register.  */
> +      rtx insn = gen_movsf_from_si_internal (operands[0], operands[1]);
> +      XEXP (XVECEXP (insn, 0, 1), 0) = gen_reg_rtx (DImode);
> +      emit_insn (insn);
> +      DONE;

This part isn't so great, needing detailed knowledge of the RTL generated
by some other pattern.  Maybe there already exists some function that
generates a register for every scratch in an insn, or you can make such
a function?

Okay for trunk with or without such an improvement.  Also backports, if
you want those.  But note (on trunk):

> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
> +/* { dg-options "-mcpu=power8 -O2" } */

These two lines should now be just

/* { dg-options "-mdejagnu-cpu=power8 -O2" } */


Thanks!


Segher
Peter Bergner March 4, 2019, 10:16 p.m. UTC | #2
On 3/4/19 1:27 PM, Segher Boessenkool wrote:
>> +      /* If LRA is generating a direct move from a GPR to a FPR,
>> +	 then the splitter is going to need a scratch register.  */
>> +      rtx insn = gen_movsf_from_si_internal (operands[0], operands[1]);
>> +      XEXP (XVECEXP (insn, 0, 1), 0) = gen_reg_rtx (DImode);
>> +      emit_insn (insn);
>> +      DONE;
> 
> This part isn't so great, needing detailed knowledge of the RTL generated
> by some other pattern.  Maybe there already exists some function that
> generates a register for every scratch in an insn, or you can make such
> a function?

A function that updates one insn does not exist.  There is remove_scratches(),
but that works on the entire cfg.  As part of my earlier attempts, I did split
remove_scratches() into a function that traverses the cfg and another that
replaces the scratches in one insn.  I've included it below.  I went with the
current patch, because that doesn't touch anything outside of the port.
If you prefer the patch below, we can go with that instead.  Let me know which
you prefer.


>> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
>> +/* { dg-options "-mcpu=power8 -O2" } */
> 
> These two lines should now be just
> 
> /* { dg-options "-mdejagnu-cpu=power8 -O2" } */

Ok, will update that.

Peter



Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 269028)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -9887,7 +9887,7 @@ valid_sf_si_move (rtx dest, rtx src, mac
 static bool
 rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode)
 {
-  if (TARGET_DIRECT_MOVE_64BIT && !lra_in_progress && !reload_completed
+  if (TARGET_DIRECT_MOVE_64BIT && !reload_completed
       && (!SUBREG_P (dest) || !sf_subreg_operand (dest, mode))
       && SUBREG_P (source) && sf_subreg_operand (source, mode))
     {
@@ -9902,7 +9902,9 @@ rs6000_emit_move_si_sf_subreg (rtx dest,
 
       if (mode == SFmode && inner_mode == SImode)
 	{
-	  emit_insn (gen_movsf_from_si (dest, inner_source));
+	  rtx_insn *insn = emit_insn (gen_movsf_from_si (dest, inner_source));
+	  if (lra_in_progress)
+	    remove_scratches_1 (insn);
 	  return true;
 	}
     }

Index: gcc/lra.c
===================================================================
--- gcc/lra.c	(revision 269028)
+++ gcc/lra.c	(working copy)
@@ -2077,7 +2077,40 @@ lra_register_new_scratch_op (rtx_insn *i
   add_reg_note (insn, REG_UNUSED, op);
 }
 
-/* Change scratches onto pseudos and save their location.  */
+/* Change INSN's scratches into pseudos and save their location.  */
+void
+remove_scratches_1 (rtx_insn *insn)
+{
+  int i;
+  bool insn_changed_p;
+  rtx reg;
+  lra_insn_recog_data_t id;
+  struct lra_static_insn_data *static_id;
+
+  id = lra_get_insn_recog_data (insn);
+  static_id = id->insn_static_data;
+  insn_changed_p = false;
+  for (i = 0; i < static_id->n_operands; i++)
+    if (GET_CODE (*id->operand_loc[i]) == SCRATCH
+	&& GET_MODE (*id->operand_loc[i]) != VOIDmode)
+      {
+	insn_changed_p = true;
+	*id->operand_loc[i] = reg
+	  = lra_create_new_reg (static_id->operand[i].mode,
+				*id->operand_loc[i], ALL_REGS, NULL);
+	lra_register_new_scratch_op (insn, i, id->icode);
+	if (lra_dump_file != NULL)
+	  fprintf (lra_dump_file,
+		   "Removing SCRATCH in insn #%u (nop %d)\n",
+		   INSN_UID (insn), i);
+      }
+  if (insn_changed_p)
+    /* Because we might use DF right after caller-saves sub-pass
+       we need to keep DF info up to date.  */
+    df_insn_rescan (insn);
+}
+
+/* Change scratches into pseudos and save their location.  */
 static void
 remove_scratches (void)
 {
@@ -2095,29 +2128,7 @@ remove_scratches (void)
   FOR_EACH_BB_FN (bb, cfun)
     FOR_BB_INSNS (bb, insn)
     if (INSN_P (insn))
-      {
-	id = lra_get_insn_recog_data (insn);
-	static_id = id->insn_static_data;
-	insn_changed_p = false;
-	for (i = 0; i < static_id->n_operands; i++)
-	  if (GET_CODE (*id->operand_loc[i]) == SCRATCH
-	      && GET_MODE (*id->operand_loc[i]) != VOIDmode)
-	    {
-	      insn_changed_p = true;
-	      *id->operand_loc[i] = reg
-		= lra_create_new_reg (static_id->operand[i].mode,
-				      *id->operand_loc[i], ALL_REGS, NULL);
-	      lra_register_new_scratch_op (insn, i, id->icode);
-	      if (lra_dump_file != NULL)
-		fprintf (lra_dump_file,
-			 "Removing SCRATCH in insn #%u (nop %d)\n",
-			 INSN_UID (insn), i);
-	    }
-	if (insn_changed_p)
-	  /* Because we might use DF right after caller-saves sub-pass
-	     we need to keep DF info up to date.  */
-	  df_insn_rescan (insn);
-      }
+      remove_scratches_1 (insn);
 }
 
 /* Changes pseudos created by function remove_scratches onto scratches.	 */
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	(revision 269028)
+++ gcc/rtl.h	(working copy)
@@ -4087,6 +4087,9 @@ extern rtx fis_get_condition (rtx_insn *
 extern HARD_REG_SET eliminable_regset;
 extern void mark_elimination (int, int);
 
+/* In lra.c */
+extern void remove_scratches_1 (rtx_insn *);
+
 /* In reginfo.c */
 extern int reg_classes_intersect_p (reg_class_t, reg_class_t);
 extern int reg_class_subset_p (reg_class_t, reg_class_t);
Peter Bergner March 4, 2019, 10:24 p.m. UTC | #3
On 3/4/19 4:16 PM, Peter Bergner wrote:
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c	(revision 269028)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -9887,7 +9887,7 @@ valid_sf_si_move (rtx dest, rtx src, mac
>  static bool
>  rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode)
>  {
> -  if (TARGET_DIRECT_MOVE_64BIT && !lra_in_progress && !reload_completed
> +  if (TARGET_DIRECT_MOVE_64BIT && !reload_completed
>        && (!SUBREG_P (dest) || !sf_subreg_operand (dest, mode))
>        && SUBREG_P (source) && sf_subreg_operand (source, mode))
>      {
> @@ -9902,7 +9902,9 @@ rs6000_emit_move_si_sf_subreg (rtx dest,
>  
>        if (mode == SFmode && inner_mode == SImode)
>  	{
> -	  emit_insn (gen_movsf_from_si (dest, inner_source));
> +	  rtx_insn *insn = emit_insn (gen_movsf_from_si (dest, inner_source));
> +	  if (lra_in_progress)
> +	    remove_scratches_1 (insn);
>  	  return true;
>  	}
>      }

But maybe the call to remove_scratches_1() should move to lra_emit_move(),
which is how we get to this code in the first place?  Who knows what other
generic move patterns might need scratches?

Peter
Peter Bergner March 5, 2019, 12:41 a.m. UTC | #4
On 3/4/19 4:24 PM, Peter Bergner wrote:
> On 3/4/19 4:16 PM, Peter Bergner wrote:
>> Index: gcc/config/rs6000/rs6000.c
>> ===================================================================
>> --- gcc/config/rs6000/rs6000.c	(revision 269028)
>> +++ gcc/config/rs6000/rs6000.c	(working copy)
>> @@ -9887,7 +9887,7 @@ valid_sf_si_move (rtx dest, rtx src, mac
>>  static bool
>>  rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode)
>>  {
>> -  if (TARGET_DIRECT_MOVE_64BIT && !lra_in_progress && !reload_completed
>> +  if (TARGET_DIRECT_MOVE_64BIT && !reload_completed
>>        && (!SUBREG_P (dest) || !sf_subreg_operand (dest, mode))
>>        && SUBREG_P (source) && sf_subreg_operand (source, mode))
>>      {
>> @@ -9902,7 +9902,9 @@ rs6000_emit_move_si_sf_subreg (rtx dest,
>>  
>>        if (mode == SFmode && inner_mode == SImode)
>>  	{
>> -	  emit_insn (gen_movsf_from_si (dest, inner_source));
>> +	  rtx_insn *insn = emit_insn (gen_movsf_from_si (dest, inner_source));
>> +	  if (lra_in_progress)
>> +	    remove_scratches_1 (insn);
>>  	  return true;
>>  	}
>>      }
> 
> But maybe the call to remove_scratches_1() should move to lra_emit_move(),
> which is how we get to this code in the first place?  Who knows what other
> generic move patterns might need scratches too?

Like this.  This bootstraps and regtests with no regressions.  Do you prefer
this instead?  If so, we'll need Vlad or Jeff or ... to approve the LRA
changes.

Vlad and Jeff,

The original problem and patch is described here:

    https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00061.html

Short answer is, after enabling a rs6000 move pattern we need for spilling,
we ICE when spilling, because the move pattern uses a scratch register
and scratch registers are replaced early on during LRA initialization.
The patch below just extracts out the code that fixes up one insn and
makes it a function itself.  I then changed lra_emit_move() to then call
that function after generating the move insn so as to replace the scratch
register the move pattern generated.  Thoughts on this patch compared to
the rs6000 only patch linked above?

Peter


gcc/
	PR rtl-optimization/88845
	* config/rs6000/rs6000.c (rs6000_emit_move_si_sf_subreg): Enable during
	LRA.
	* lra.c (remove_scratches_1): New function.
	(remove_scratches): Use it.
	(lra_emit_move): Likewise.

gcc/testsuite/
	PR rtl-optimization/88845
	* gcc.target/powerpc/pr88845.c: New test.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 269263)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -9887,7 +9887,7 @@ valid_sf_si_move (rtx dest, rtx src, mac
 static bool
 rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode)
 {
-  if (TARGET_DIRECT_MOVE_64BIT && !lra_in_progress && !reload_completed
+  if (TARGET_DIRECT_MOVE_64BIT && !reload_completed
       && (!SUBREG_P (dest) || !sf_subreg_operand (dest, mode))
       && SUBREG_P (source) && sf_subreg_operand (source, mode))
     {
Index: gcc/lra.c
===================================================================
--- gcc/lra.c	(revision 269263)
+++ gcc/lra.c	(working copy)
@@ -159,6 +159,7 @@ static void invalidate_insn_recog_data (
 static int get_insn_freq (rtx_insn *);
 static void invalidate_insn_data_regno_info (lra_insn_recog_data_t,
 					     rtx_insn *, int);
+static void remove_scratches_1 (rtx_insn *);
 
 /* Expand all regno related info needed for LRA.  */
 static void
@@ -494,7 +495,11 @@ lra_emit_move (rtx x, rtx y)
       if (rtx_equal_p (x, y))
 	return;
       old = max_reg_num ();
-      emit_move_insn (x, y);
+      rtx_insn *insn = emit_move_insn (x, y);
+      /* The move pattern may require scratch registers, so convert them
+	 into real registers now.  */
+      if (insn != NULL_RTX)
+	remove_scratches_1 (insn);
       if (REG_P (x))
 	lra_reg_info[ORIGINAL_REGNO (x)].last_reload = ++lra_curr_reload_num;
       /* Function emit_move can create pseudos -- so expand the pseudo
@@ -2077,47 +2082,53 @@ lra_register_new_scratch_op (rtx_insn *i
   add_reg_note (insn, REG_UNUSED, op);
 }
 
-/* Change scratches onto pseudos and save their location.  */
+/* Change INSN's scratches into pseudos and save their location.  */
 static void
-remove_scratches (void)
+remove_scratches_1 (rtx_insn *insn)
 {
   int i;
   bool insn_changed_p;
-  basic_block bb;
-  rtx_insn *insn;
   rtx reg;
   lra_insn_recog_data_t id;
   struct lra_static_insn_data *static_id;
 
+  id = lra_get_insn_recog_data (insn);
+  static_id = id->insn_static_data;
+  insn_changed_p = false;
+  for (i = 0; i < static_id->n_operands; i++)
+    if (GET_CODE (*id->operand_loc[i]) == SCRATCH
+	&& GET_MODE (*id->operand_loc[i]) != VOIDmode)
+      {
+	insn_changed_p = true;
+	*id->operand_loc[i] = reg
+	  = lra_create_new_reg (static_id->operand[i].mode,
+				*id->operand_loc[i], ALL_REGS, NULL);
+	lra_register_new_scratch_op (insn, i, id->icode);
+	if (lra_dump_file != NULL)
+	  fprintf (lra_dump_file,
+		   "Removing SCRATCH in insn #%u (nop %d)\n",
+		   INSN_UID (insn), i);
+      }
+  if (insn_changed_p)
+    /* Because we might use DF right after caller-saves sub-pass
+       we need to keep DF info up to date.  */
+    df_insn_rescan (insn);
+}
+
+/* Change scratches into pseudos and save their location.  */
+static void
+remove_scratches (void)
+{
+  basic_block bb;
+  rtx_insn *insn;
+
   scratches.create (get_max_uid ());
   bitmap_initialize (&scratch_bitmap, &reg_obstack);
   bitmap_initialize (&scratch_operand_bitmap, &reg_obstack);
   FOR_EACH_BB_FN (bb, cfun)
     FOR_BB_INSNS (bb, insn)
     if (INSN_P (insn))
-      {
-	id = lra_get_insn_recog_data (insn);
-	static_id = id->insn_static_data;
-	insn_changed_p = false;
-	for (i = 0; i < static_id->n_operands; i++)
-	  if (GET_CODE (*id->operand_loc[i]) == SCRATCH
-	      && GET_MODE (*id->operand_loc[i]) != VOIDmode)
-	    {
-	      insn_changed_p = true;
-	      *id->operand_loc[i] = reg
-		= lra_create_new_reg (static_id->operand[i].mode,
-				      *id->operand_loc[i], ALL_REGS, NULL);
-	      lra_register_new_scratch_op (insn, i, id->icode);
-	      if (lra_dump_file != NULL)
-		fprintf (lra_dump_file,
-			 "Removing SCRATCH in insn #%u (nop %d)\n",
-			 INSN_UID (insn), i);
-	    }
-	if (insn_changed_p)
-	  /* Because we might use DF right after caller-saves sub-pass
-	     we need to keep DF info up to date.  */
-	  df_insn_rescan (insn);
-      }
+      remove_scratches_1 (insn);
 }
 
 /* Changes pseudos created by function remove_scratches onto scratches.	 */
Index: gcc/testsuite/gcc.target/powerpc/pr88845.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr88845.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr88845.c	(working copy)
@@ -0,0 +1,24 @@
+/* { dg-do compile { target powerpc*-*-linux* } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-skip-if "" { powerpc*-*-*spe* } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2" } */
+/* { dg-final { scan-assembler {\mmtvsrd\M} { target { lp64 } } } } */
+/* { dg-final { scan-assembler {\mxscvspdpn\M} { target { lp64 } } } } */
+
+/* Verify that we do not ICE and that we generate a direct move
+   for float types when compiling for 64-bit.  */
+
+struct a {
+  unsigned ui;
+  float f;
+};
+
+void
+foo (void)
+{
+  float e;
+  struct a s;
+  e = s.f;
+  __asm__("" : : "d" (e));
+}
Segher Boessenkool March 5, 2019, 11:25 a.m. UTC | #5
Hi Peter,

On Mon, Mar 04, 2019 at 04:16:23PM -0600, Peter Bergner wrote:
> On 3/4/19 1:27 PM, Segher Boessenkool wrote:
> >> +      /* If LRA is generating a direct move from a GPR to a FPR,
> >> +	 then the splitter is going to need a scratch register.  */
> >> +      rtx insn = gen_movsf_from_si_internal (operands[0], operands[1]);
> >> +      XEXP (XVECEXP (insn, 0, 1), 0) = gen_reg_rtx (DImode);
> >> +      emit_insn (insn);
> >> +      DONE;
> > 
> > This part isn't so great, needing detailed knowledge of the RTL generated
> > by some other pattern.  Maybe there already exists some function that
> > generates a register for every scratch in an insn, or you can make such
> > a function?
> 
> A function that updates one insn does not exist.  There is remove_scratches(),
> but that works on the entire cfg.  As part of my earlier attempts, I did split
> remove_scratches() into a function that traverses the cfg and another that
> replaces the scratches in one insn.  I've included it below.  I went with the
> current patch, because that doesn't touch anything outside of the port.
> If you prefer the patch below, we can go with that instead.  Let me know which
> you prefer.

I meant just like you did in your original patch, just gen_reg_rtx and
nothing more, but finding the SCRATCH locations by itself.  This might also
be useful for the very many splitters we have that allocate regs for
SCRATCHes.


Something like (totally untested)

void itch_scratches (rtx_insn *insn)
{
  subrtx_ptr_iterator::array_type array;
  FOR_EACH_SUBRTX_PTR (iter, array, PATTERN (insn), NONCONST)
    if (GET_CODE (**iter) == SCRATCH)
      **iter = gen_reg_rtx (GET_MODE (**iter));
}

But, it seems you need to keep track of other things on the side for LRA?


Segher
Segher Boessenkool March 5, 2019, 11:51 a.m. UTC | #6
On Mon, Mar 04, 2019 at 06:41:16PM -0600, Peter Bergner wrote:
> Like this.  This bootstraps and regtests with no regressions.  Do you prefer
> this instead?  If so, we'll need Vlad or Jeff or ... to approve the LRA
> changes.

Either solution is fine with me, whatever the LRA experts prefer.  Even
your original patch is okay, just a bit shaky; I don't expect us to need
more like this though, SF is a bit special, so I can just close my eyes
and move on, no problem :-)

[ I do wonder why we are having these problems, it's not a very special
problem, other targets should hit similar...  do they use secondary reloads
maybe? ]


Segher
Peter Bergner March 5, 2019, 2:08 p.m. UTC | #7
On 3/5/19 5:25 AM, Segher Boessenkool wrote:
> But, it seems you need to keep track of other things on the side for LRA?

The extra LRA info is to keep track of scratches that are not needed.
In our case, only the one alternative in movsf_from_si requires a
scratch register.  The rest use an X constraint.  For those alternatives,
LRA changes the scratch reg back to just a scratch when it's all done.

Peter
Peter Bergner March 6, 2019, 1:46 a.m. UTC | #8
On 3/5/19 5:51 AM, Segher Boessenkool wrote:
> On Mon, Mar 04, 2019 at 06:41:16PM -0600, Peter Bergner wrote:
>> Like this.  This bootstraps and regtests with no regressions.  Do you prefer
>> this instead?  If so, we'll need Vlad or Jeff or ... to approve the LRA
>> changes.
> 
> Either solution is fine with me, whatever the LRA experts prefer.  Even
> your original patch is okay, just a bit shaky; I don't expect us to need
> more like this though, SF is a bit special, so I can just close my eyes
> and move on, no problem :-)

I'd like to hold off until Vlad and/or Jeff...or anyone else chimes in.
It seems like LRA is assuming the move insn it generates won't have a
scratch, since it replaces all scratches early, but as we can see for
our 32-bit GPR to FPR copy, we do need one.


> [ I do wonder why we are having these problems, it's not a very special
> problem, other targets should hit similar...  do they use secondary reloads
> maybe? ]

Secondary reload can't fix the problem of the scratch replacement, since
we try and immediately recog the move insn we just created and we ICE
because the scratch hasn't yet been replaced with a pseudo.

Peter
Vladimir Makarov March 6, 2019, 5:03 a.m. UTC | #9
On 03/04/2019 07:41 PM, Peter Bergner wrote:
> On 3/4/19 4:24 PM, Peter Bergner wrote:
>> On 3/4/19 4:16 PM, Peter Bergner wrote:
>>> Index: gcc/config/rs6000/rs6000.c
>>> ===================================================================
>>> --- gcc/config/rs6000/rs6000.c	(revision 269028)
>>> +++ gcc/config/rs6000/rs6000.c	(working copy)
>>> @@ -9887,7 +9887,7 @@ valid_sf_si_move (rtx dest, rtx src, mac
>>>   static bool
>>>   rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode)
>>>   {
>>> -  if (TARGET_DIRECT_MOVE_64BIT && !lra_in_progress && !reload_completed
>>> +  if (TARGET_DIRECT_MOVE_64BIT && !reload_completed
>>>         && (!SUBREG_P (dest) || !sf_subreg_operand (dest, mode))
>>>         && SUBREG_P (source) && sf_subreg_operand (source, mode))
>>>       {
>>> @@ -9902,7 +9902,9 @@ rs6000_emit_move_si_sf_subreg (rtx dest,
>>>   
>>>         if (mode == SFmode && inner_mode == SImode)
>>>   	{
>>> -	  emit_insn (gen_movsf_from_si (dest, inner_source));
>>> +	  rtx_insn *insn = emit_insn (gen_movsf_from_si (dest, inner_source));
>>> +	  if (lra_in_progress)
>>> +	    remove_scratches_1 (insn);
>>>   	  return true;
>>>   	}
>>>       }
>> But maybe the call to remove_scratches_1() should move to lra_emit_move(),
>> which is how we get to this code in the first place?  Who knows what other
>> generic move patterns might need scratches too?
> Like this.  This bootstraps and regtests with no regressions.  Do you prefer
> this instead?  If so, we'll need Vlad or Jeff or ... to approve the LRA
> changes.
>
> Vlad and Jeff,
>
> The original problem and patch is described here:
>
>      https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00061.html
>
> Short answer is, after enabling a rs6000 move pattern we need for spilling,
> we ICE when spilling, because the move pattern uses a scratch register
> and scratch registers are replaced early on during LRA initialization.
> The patch below just extracts out the code that fixes up one insn and
> makes it a function itself.  I then changed lra_emit_move() to then call
> that function after generating the move insn so as to replace the scratch
> register the move pattern generated.  Thoughts on this patch compared to
> the rs6000 only patch linked above?
I recommend don't use scratches at all because IRA ignores them and it 
might result in worse allocation.  Unfortunately, removing scratches for 
all targets requires a lot of work.  This patch deals with scratches 
during all LRA work which is good.  So I like the patch and LRA part of 
it is ok to commit.

Peter, thank you for adding the new functionality to LRA.
>
>
> gcc/
> 	PR rtl-optimization/88845
> 	* config/rs6000/rs6000.c (rs6000_emit_move_si_sf_subreg): Enable during
> 	LRA.
> 	* lra.c (remove_scratches_1): New function.
> 	(remove_scratches): Use it.
> 	(lra_emit_move): Likewise.
>
> gcc/testsuite/
> 	PR rtl-optimization/88845
> 	* gcc.target/powerpc/pr88845.c: New test.
>
>
> Index: gcc/lra.c
> ===================================================================
> --- gcc/lra.c	(revision 269263)
> +++ gcc/lra.c	(working copy)
> @@ -159,6 +159,7 @@ static void invalidate_insn_recog_data (
>   static int get_insn_freq (rtx_insn *);
>   static void invalidate_insn_data_regno_info (lra_insn_recog_data_t,
>   					     rtx_insn *, int);
> +static void remove_scratches_1 (rtx_insn *);
>   
>   /* Expand all regno related info needed for LRA.  */
>   static void
> @@ -494,7 +495,11 @@ lra_emit_move (rtx x, rtx y)
>         if (rtx_equal_p (x, y))
>   	return;
>         old = max_reg_num ();
> -      emit_move_insn (x, y);
> +      rtx_insn *insn = emit_move_insn (x, y);
> +      /* The move pattern may require scratch registers, so convert them
> +	 into real registers now.  */
> +      if (insn != NULL_RTX)
> +	remove_scratches_1 (insn);
>         if (REG_P (x))
>   	lra_reg_info[ORIGINAL_REGNO (x)].last_reload = ++lra_curr_reload_num;
>         /* Function emit_move can create pseudos -- so expand the pseudo
> @@ -2077,47 +2082,53 @@ lra_register_new_scratch_op (rtx_insn *i
>     add_reg_note (insn, REG_UNUSED, op);
>   }
>   
> -/* Change scratches onto pseudos and save their location.  */
> +/* Change INSN's scratches into pseudos and save their location.  */
>   static void
> -remove_scratches (void)
> +remove_scratches_1 (rtx_insn *insn)
>   {
>     int i;
>     bool insn_changed_p;
> -  basic_block bb;
> -  rtx_insn *insn;
>     rtx reg;
>     lra_insn_recog_data_t id;
>     struct lra_static_insn_data *static_id;
>   
> +  id = lra_get_insn_recog_data (insn);
> +  static_id = id->insn_static_data;
> +  insn_changed_p = false;
> +  for (i = 0; i < static_id->n_operands; i++)
> +    if (GET_CODE (*id->operand_loc[i]) == SCRATCH
> +	&& GET_MODE (*id->operand_loc[i]) != VOIDmode)
> +      {
> +	insn_changed_p = true;
> +	*id->operand_loc[i] = reg
> +	  = lra_create_new_reg (static_id->operand[i].mode,
> +				*id->operand_loc[i], ALL_REGS, NULL);
> +	lra_register_new_scratch_op (insn, i, id->icode);
> +	if (lra_dump_file != NULL)
> +	  fprintf (lra_dump_file,
> +		   "Removing SCRATCH in insn #%u (nop %d)\n",
> +		   INSN_UID (insn), i);
> +      }
> +  if (insn_changed_p)
> +    /* Because we might use DF right after caller-saves sub-pass
> +       we need to keep DF info up to date.  */
> +    df_insn_rescan (insn);
> +}
> +
> +/* Change scratches into pseudos and save their location.  */
> +static void
> +remove_scratches (void)
> +{
> +  basic_block bb;
> +  rtx_insn *insn;
> +
>     scratches.create (get_max_uid ());
>     bitmap_initialize (&scratch_bitmap, &reg_obstack);
>     bitmap_initialize (&scratch_operand_bitmap, &reg_obstack);
>     FOR_EACH_BB_FN (bb, cfun)
>       FOR_BB_INSNS (bb, insn)
>       if (INSN_P (insn))
> -      {
> -	id = lra_get_insn_recog_data (insn);
> -	static_id = id->insn_static_data;
> -	insn_changed_p = false;
> -	for (i = 0; i < static_id->n_operands; i++)
> -	  if (GET_CODE (*id->operand_loc[i]) == SCRATCH
> -	      && GET_MODE (*id->operand_loc[i]) != VOIDmode)
> -	    {
> -	      insn_changed_p = true;
> -	      *id->operand_loc[i] = reg
> -		= lra_create_new_reg (static_id->operand[i].mode,
> -				      *id->operand_loc[i], ALL_REGS, NULL);
> -	      lra_register_new_scratch_op (insn, i, id->icode);
> -	      if (lra_dump_file != NULL)
> -		fprintf (lra_dump_file,
> -			 "Removing SCRATCH in insn #%u (nop %d)\n",
> -			 INSN_UID (insn), i);
> -	    }
> -	if (insn_changed_p)
> -	  /* Because we might use DF right after caller-saves sub-pass
> -	     we need to keep DF info up to date.  */
> -	  df_insn_rescan (insn);
> -      }
> +      remove_scratches_1 (insn);
>   }
>
diff mbox series

Patch

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 269263)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -9887,7 +9887,7 @@  valid_sf_si_move (rtx dest, rtx src, mac
 static bool
 rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode)
 {
-  if (TARGET_DIRECT_MOVE_64BIT && !lra_in_progress && !reload_completed
+  if (TARGET_DIRECT_MOVE_64BIT && !reload_completed
       && (!SUBREG_P (dest) || !sf_subreg_operand (dest, mode))
       && SUBREG_P (source) && sf_subreg_operand (source, mode))
     {
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 269263)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -7353,9 +7353,31 @@  (define_insn "*mov<mode>_softfloat"
 ;; This function is called before reload, and it creates the temporary as
 ;; needed.
 
+(define_expand "movsf_from_si"
+  [(parallel [(set (match_operand:SF 0 "nonimmediate_operand")
+		   (unspec:SF [(match_operand:SI 1 "input_operand" )]
+			      UNSPEC_SF_FROM_SI))
+	      (clobber (match_scratch:DI 2))])]
+  "TARGET_NO_SF_SUBREG
+   && (register_operand (operands[0], SFmode)
+       || register_operand (operands[1], SImode))"
+{
+  if (lra_in_progress
+      && REG_P (operands[0])
+      && REG_P (operands[1]))
+    {
+      /* If LRA is generating a direct move from a GPR to a FPR,
+	 then the splitter is going to need a scratch register.  */
+      rtx insn = gen_movsf_from_si_internal (operands[0], operands[1]);
+      XEXP (XVECEXP (insn, 0, 1), 0) = gen_reg_rtx (DImode);
+      emit_insn (insn);
+      DONE;
+    }
+})
+
 ;;	    LWZ          LFS        LXSSP      LXSSPX     STW        STFIWX
 ;;	    STXSIWX      GPR->VSX   VSX->GPR   GPR->GPR
-(define_insn_and_split "movsf_from_si"
+(define_insn_and_split "movsf_from_si_internal"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	    "=!r,       f,         wb,        wu,        m,         Z,
 	     Z,         wy,        ?r,        !r")
Index: gcc/testsuite/gcc.target/powerpc/pr88845.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr88845.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr88845.c	(working copy)
@@ -0,0 +1,25 @@ 
+/* { dg-do compile { target powerpc*-*-linux* } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-skip-if "" { powerpc*-*-*spe* } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O2" } */
+/* { dg-final { scan-assembler {\mmtvsrd\M} { target { lp64 } } } } */
+/* { dg-final { scan-assembler {\mxscvspdpn\M} { target { lp64 } } } } */
+
+/* Verify that we do not ICE and that we generate a direct move
+   for float types when compiling for 64-bit.  */
+
+struct a {
+  unsigned ui;
+  float f;
+};
+
+void
+foo (void)
+{
+  float e;
+  struct a s;
+  e = s.f;
+  __asm__("" : : "d" (e));
+}