diff mbox

combine: If a parallel I2 was split, do not allow a new I2 (PR64268)

Message ID 35b5f631a14f76c9305c7bfc8e8bf5fd764ec845.1418640924.git.segher@kernel.crashing.org
State New
Headers show

Commit Message

Segher Boessenkool Dec. 15, 2014, 11:05 a.m. UTC
If combine is given a parallel I2, it splits it into separate I1 and
I2 insns in some cases (one case since the beginning of time; more
cases since my r218248).  It gives the new I1 the same UID as the I2
already has; there is a comment that this works fine because the I1
never is added to the insn stream.

When combine eventually replaces the insns with something new, it
calls SET_INSN_DELETED on those it wants gone.  Since r125624 (when
DF was added, back in 2007) SET_INSN_DELETED uses the UID of the insn
it is called for to do the deletion for dataflow.  So since then, when
such a split I1 is deleted, DF thinks I2 is deleted as well.  This
of course explodes if I2 is in fact needed (but only if that I2 still
exists at the end of combine, i.e. the insn was not combined to further
down; that might explain why it wasn't noticed before).

This should be fixed properly, but it is somewhat involved.  This
patch simply disallows all combinations coming from a split parallel
I2 if it needs a new I2.  This fixes PR target/64268.

Bootstrapped on powerpc64-linux and powerpc-linux; the fails are gone.
Regtested on powerpc64-linux -m32,-m32/-mpowerpc64,-m64,-m64/-mlra.
Is this okay for mainline?

Segher


2014-12-15  Segher Boessenkool  <segher@kernel.crashing.org>

gcc/
	* combine.c (try_combine): Don't try to split newpat if we started
	with I2 a PARALLEL that we split.

---
 gcc/combine.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Paolo Bonzini Dec. 15, 2014, 3:51 p.m. UTC | #1
On 15/12/2014 12:05, Segher Boessenkool wrote:
> If combine is given a parallel I2, it splits it into separate I1 and
> I2 insns in some cases (one case since the beginning of time; more
> cases since my r218248).  It gives the new I1 the same UID as the I2
> already has; there is a comment that this works fine because the I1
> never is added to the insn stream.
> 
> When combine eventually replaces the insns with something new, it
> calls SET_INSN_DELETED on those it wants gone.  Since r125624 (when
> DF was added, back in 2007) SET_INSN_DELETED uses the UID of the insn
> it is called for to do the deletion for dataflow.  So since then, when
> such a split I1 is deleted, DF thinks I2 is deleted as well.  This
> of course explodes if I2 is in fact needed (but only if that I2 still
> exists at the end of combine, i.e. the insn was not combined to further
> down; that might explain why it wasn't noticed before).
> 
> This should be fixed properly, but it is somewhat involved.  This
> patch simply disallows all combinations coming from a split parallel
> I2 if it needs a new I2.  This fixes PR target/64268.
> 
> Bootstrapped on powerpc64-linux and powerpc-linux; the fails are gone.
> Regtested on powerpc64-linux -m32,-m32/-mpowerpc64,-m64,-m64/-mlra.
> Is this okay for mainline?
> 
> Segher
> 
> 
> 2014-12-15  Segher Boessenkool  <segher@kernel.crashing.org>
> 
> gcc/
> 	* combine.c (try_combine): Don't try to split newpat if we started
> 	with I2 a PARALLEL that we split.
> 
> ---
>  gcc/combine.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 8995c1d3..de2e49f 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -3471,6 +3471,17 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
>    if (insn_code_number < 0)
>      insn_code_number = recog_for_combine (&newpat, i3, &new_i3_notes);
>  
> +  /* If we got I1 and I2 from splitting the original (parallel) I2 into two,
> +     I1 and I2 have the same UID, which means that DF ends up deleting I2
> +     when it is asked to delete I1.  So only allow this if we want I2 deleted,
> +     that is, if we get only one insn as combine result; don't try to split
> +     off a new I2 if it doesn't match yet.  */
> +  if (i1 && insn_code_number < 0 && INSN_UID (i1) == INSN_UID (i2))
> +    {
> +      undo_all ();
> +      return 0;
> +    }
> +
>    /* If we were combining three insns and the result is a simple SET
>       with no ASM_OPERANDS that wasn't recognized, try to split it into two
>       insns.  There are two ways to do this.  It can be split using a
> 

Random questions:

1) did you check that it never triggers on e.g. an x86 bootstrap, and
that it doesn't trigger too often on PPC64?

2) if it triggers rarely, should combine just try and give a new UID to
i1?  What makes that hard?  Or should it just skip the SET_INSN_DELETED
on i1 unless it is added to the instruction stream?

That said, if (1) is true, this looks like this fix is good enough for 5
and open branches.

Thanks David for pointing this out to me.

Paolo
Segher Boessenkool Dec. 15, 2014, 4:24 p.m. UTC | #2
On Mon, Dec 15, 2014 at 04:51:14PM +0100, Paolo Bonzini wrote:
> Random questions:
> 
> 1) did you check that it never triggers on e.g. an x86 bootstrap, and
> that it doesn't trigger too often on PPC64?

I have checked on my largish connection of tests for the carry insns
on PowerPC, and only two (related) transforms are disabled, and they
aren't too important anyway.  Well, and the bad transforms are disabled,
only just two of-em but much more frequent (long long x; x--;).

I haven't checked on x86, but it's a bugfix: don't do things that blow up!
It is amazing to me that it didn't show up before.  One theory is that
instructions that set the condition code as well as a GPR will never
combine with a later insn to two insns, always to just one.  But nothing
made this explicit so AFAICS it is just an accident that it worked before.

I'll do an instrumented x86 bootstrap.

[ That testcase, -m32:

long long addSH(long long a, unsigned long b)
{
	return a + ((unsigned long long)b << 32);
}

results in  addic 4,4,0 ; addze 3,5   while it could just be  add 3,5  ]

> 2) if it triggers rarely, should combine just try and give a new UID to
> i1?

That should in principle works, sure.  Seems too dangerous for stage3
though.  combine cannot create UIDs as things stand now.

> What makes that hard?  Or should it just skip the SET_INSN_DELETED
> on i1 unless it is added to the instruction stream?

I have tried that first, but it blew up in other ways.  I didn't look
too closely, sorry.

It really is quite dangerous to allow combining just two original insns to
two other insns; there are arguments why it cannot loop in this case (it
always moves a parallel down, for example) but that is quite fragile, and
also not documented anywhere.  It also _did_ loop with some of my (perhaps
broken) patch attempts.  So I opted for the heavy hammer approach.

> That said, if (1) is true, this looks like this fix is good enough for 5
> and open branches.
> 
> Thanks David for pointing this out to me.

I need to remember to CC: people, sorry :-/

Thanks for looking,


Segher
Paolo Bonzini Dec. 16, 2014, 10:57 a.m. UTC | #3
On 15/12/2014 17:24, Segher Boessenkool wrote:
> On Mon, Dec 15, 2014 at 04:51:14PM +0100, Paolo Bonzini wrote:
>> Random questions:
>>
>> 1) did you check that it never triggers on e.g. an x86 bootstrap, and
>> that it doesn't trigger too often on PPC64?
> 
> I have checked on my largish connection of tests for the carry insns
> on PowerPC, and only two (related) transforms are disabled, and they
> aren't too important anyway.  Well, and the bad transforms are disabled,
> only just two of-em but much more frequent (long long x; x--;).
> 
> I haven't checked on x86, but it's a bugfix: don't do things that blow up!

I agree---I just want to be sure of the extent of the change.

> [ That testcase, -m32:
> 
> long long addSH(long long a, unsigned long b)
> {
> 	return a + ((unsigned long long)b << 32);
> }
> 
> results in  addic 4,4,0 ; addze 3,5   while it could just be  add 3,5  ]

Ah, that's a pity.  But breaking x-- is much worse.

>> 2) if it triggers rarely, should combine just try and give a new UID to
>> i1?
> 
> That should in principle works, sure.  Seems too dangerous for stage3
> though.

Indeed.  Just thinking about it for 5.1.

Paolo
Segher Boessenkool Dec. 16, 2014, 1:28 p.m. UTC | #4
On Mon, Dec 15, 2014 at 10:24:47AM -0600, Segher Boessenkool wrote:
> On Mon, Dec 15, 2014 at 04:51:14PM +0100, Paolo Bonzini wrote:
> > 1) did you check that it never triggers on e.g. an x86 bootstrap, and
> > that it doesn't trigger too often on PPC64?
> 
> I have checked on my largish connection of tests for the carry insns
> on PowerPC, and only two (related) transforms are disabled, and they
> aren't too important anyway.  Well, and the bad transforms are disabled,
> only just two of-em but much more frequent (long long x; x--;).
> 
> I haven't checked on x86, but it's a bugfix: don't do things that blow up!
> It is amazing to me that it didn't show up before.  One theory is that
> instructions that set the condition code as well as a GPR will never
> combine with a later insn to two insns, always to just one.  But nothing
> made this explicit so AFAICS it is just an accident that it worked before.
> 
> I'll do an instrumented x86 bootstrap.

I did a run for powerpc64, one for powerpc, and one for x86-64.

The powerpc64 bootstrap was with pre-installed GMP etc.; the others
had those libraries in-tree.

"type1" is when try_combine used the ancient combine code to split a
parallel set and set of cc; "type2" is when it used my code to split
any other parallel that sets two things; and "type0" is when it didn't
do either but still ended up with I1 and I2 the same UID (I think it
might be called with the same insn twice; not a good thing, it does
not know how to handle this; and it is really worrisome that it then
sometimes succeeds in combining it).

"tries" is how often that split-orig-I2-to-two code is used; "recog"
is how often it reached the first call to recog (so it passed
can_combine_p etc.); "fail" is how often it eventually failed (after
reaching recog), "one" is how often it combined to one insn, "two"
is how often it combined to two.


powerpc64
	tries	recog	fail	one	two
type1	39214	39214	38944	202	18
type2	21540	18968	18928	2	38
type0		292	289	0	3


powerpc
	tries	recog	fail	one	two
type1	21654	21654	21167	485	2
type2	21839	19754	19243	0	509	(*)
type0		427	294	0	133


x86-64
	tries	recog	fail	one	two
type1	17387	17387	17288	70	29
type2	40413	31681	30242	60	1369
type0		0	0	0	0


Not sure what to make of the high number in the x86-64 type2/two
result.


Segher


(*) The (32-bit) powerpc bootstrap failed (that's what the patch
is trying to rectify, after all); the columns on this line don't
add up correctly (two are missing; this was a -j60 build).
Segher Boessenkool Dec. 16, 2014, 7:03 p.m. UTC | #5
On Tue, Dec 16, 2014 at 07:28:22AM -0600, Segher Boessenkool wrote:
> and "type0" is when it didn't
> do either but still ended up with I1 and I2 the same UID (I think it
> might be called with the same insn twice; not a good thing, it does
> not know how to handle this; and it is really worrisome that it then
> sometimes succeeds in combining it).

... and when I make try_combine immediately refuse duplicates in I0,I1,I2
everything works again.

When INSN_UID (i1) == INSN_UID (i2), calling df_insn_delete (i1) does
set the "should be deleted" flag on i2 instead, but try_combine later
calls df_insn_rescan (i2) which resets it again.  This is icky and
looks very wrong in the debug dump, but it does work fine.

Doing an (effictively) 2->2 combine like this still feels quite unsafe
wrt combine possibly looping, but it has worked for over twenty years.

Bootstrapping, making new test results, will send a new patch later
today.

Thanks for pushing me to look much closer,


Segher
Segher Boessenkool Dec. 16, 2014, 11:16 p.m. UTC | #6
On Tue, Dec 16, 2014 at 07:28:22AM -0600, Segher Boessenkool wrote:
> I did a run for powerpc64, one for powerpc, and one for x86-64.
> 
> The powerpc64 bootstrap was with pre-installed GMP etc.; the others
> had those libraries in-tree.
> 
> "type1" is when try_combine used the ancient combine code to split a
> parallel set and set of cc; "type2" is when it used my code to split
> any other parallel that sets two things; and "type0" is when it didn't
> do either but still ended up with I1 and I2 the same UID (I think it
> might be called with the same insn twice; not a good thing, it does
> not know how to handle this; and it is really worrisome that it then
> sometimes succeeds in combining it).
> 
> "tries" is how often that split-orig-I2-to-two code is used; "recog"
> is how often it reached the first call to recog (so it passed
> can_combine_p etc.); "fail" is how often it eventually failed (after
> reaching recog), "one" is how often it combined to one insn, "two"
> is how often it combined to two.
> 
> 
> powerpc64
> 	tries	recog	fail	one	two
> type1	39214	39214	38944	202	18
> type2	21540	18968	18928	2	38
> type0		292	289	0	3
> 
> 
> powerpc
> 	tries	recog	fail	one	two
> type1	21654	21654	21167	485	2
> type2	21839	19754	19243	0	509	(*)
> type0		427	294	0	133
> 
> 
> x86-64
> 	tries	recog	fail	one	two
> type1	17387	17387	17288	70	29
> type2	40413	31681	30242	60	1369
> type0		0	0	0	0

The new numbers:

powerpc64
	tries	recog	fail	one	two
type1	39216	39216	38996	202	18
type2	21540	18968	18928	2	38


powerpc
	tries	recog	fail	one	two
type1	45276	45276	44196	1040	40
type2	46149	41589	40553	0	1032


x86-64
	tries	recog	fail	one	two
type1	17387	17387	17288	70	29
type2	40088	31625	30196	60	1369


Segher
diff mbox

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index 8995c1d3..de2e49f 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -3471,6 +3471,17 @@  try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
   if (insn_code_number < 0)
     insn_code_number = recog_for_combine (&newpat, i3, &new_i3_notes);
 
+  /* If we got I1 and I2 from splitting the original (parallel) I2 into two,
+     I1 and I2 have the same UID, which means that DF ends up deleting I2
+     when it is asked to delete I1.  So only allow this if we want I2 deleted,
+     that is, if we get only one insn as combine result; don't try to split
+     off a new I2 if it doesn't match yet.  */
+  if (i1 && insn_code_number < 0 && INSN_UID (i1) == INSN_UID (i2))
+    {
+      undo_all ();
+      return 0;
+    }
+
   /* If we were combining three insns and the result is a simple SET
      with no ASM_OPERANDS that wasn't recognized, try to split it into two
      insns.  There are two ways to do this.  It can be split using a