diff mbox

[RFA,rtl-optimization/52714] Do not allow combine to create invalid RTL

Message ID 52F421F5.4000203@redhat.com
State New
Headers show

Commit Message

Jeff Law Feb. 6, 2014, 11:59 p.m. UTC
As mentioned in the PR, we call try_combine with:


i1 = (insn 14 13 16 2 (set (reg/v/f:SI 34 [ stack ])
         (reg/f:SI 15 %sp)) pr52714.c:9 38 {*movsi_m68k2}
      (nil))
i2 = (insn 16 14 17 2 (set (cc0)
         (compare (reg/v/f:SI 34 [ stack ])
             (const_int 0 [0]))) pr52714.c:10 4 {*tstsi_internal_68020_cf}
      (nil))
i3 = (jump_insn 17 16 18 2 (set (pc)
         (if_then_else (eq (cc0)
                 (const_int 0 [0]))
             (label_ref 40)
             (pc))) pr52714.c:10 390 {beq}
      (int_list:REG_BR_PROB 1014 (nil))
  -> 40)

This is then combined into:

newpat = (parallel [
         (set (pc)
             (pc))
         (set (reg/v/f:SI 34 [ stack ])
             (reg/f:SI 15 %sp))
     ])

This isn't a recognized insn, and it needs to be split.  Since it's just 
a parallel of independent sets, combine tries to split it into a pair of 
assignments which look like:

(insn 16 14 17 2 (set (pc)
         (pc)) pr52714.c:10 2147483647 {NOOP_MOVE}
      (nil))
(jump_insn 17 16 18 2 (set (reg/v/f:SI 34 [ stack ])
         (reg/f:SI 15 %sp)) pr52714.c:10 38 {*movsi_m68k2}
      (int_list:REG_BR_PROB 1014 (nil))
  -> 40)


Note how the second is a JUMP_INSN, but it doesn't modify the PC.  Opps.


ISTM the code in combine which tries to rip apart a PARALLEL like that 
needs to ensure that I2 and I3 are both INSNs.

The astute reader would notice that this wasn't optimized at the tree 
level.  That's an artifact of using -O1 instead of -O2.  At -O2 VRP 
would come in and zap that test well before it ever expanded into RTL.

Verified all 3 tests in the PR are fixed (two full, one reduced). 
Reduced test to be added to the test suite.

Bootstrap and regression test in progress on x86_64-unknown-linux-gnu. 
OK if that passes without incident?

Comments

Eric Botcazou Feb. 8, 2014, 10:44 a.m. UTC | #1
> This is then combined into:
> 
> newpat = (parallel [
>          (set (pc)
>              (pc))
>          (set (reg/v/f:SI 34 [ stack ])
>              (reg/f:SI 15 %sp))
>      ])
> 
> This isn't a recognized insn, and it needs to be split.  Since it's just
> a parallel of independent sets, combine tries to split it into a pair of
> assignments which look like:
> 
> (insn 16 14 17 2 (set (pc)
>          (pc)) pr52714.c:10 2147483647 {NOOP_MOVE}
>       (nil))
> (jump_insn 17 16 18 2 (set (reg/v/f:SI 34 [ stack ])
>          (reg/f:SI 15 %sp)) pr52714.c:10 38 {*movsi_m68k2}
>       (int_list:REG_BR_PROB 1014 (nil))
>   -> 40)
> 
> 
> Note how the second is a JUMP_INSN, but it doesn't modify the PC.  Opps.
> 
> 
> ISTM the code in combine which tries to rip apart a PARALLEL like that
> needs to ensure that I2 and I3 are both INSNs.

That seems to be an unnecessary pessimization given that the combination looks 
perfectly valid if you swap the insns.  Can't we enhance the code just below 
which chooses the order of the insns after splitting?
Jeff Law Feb. 10, 2014, 6:23 a.m. UTC | #2
On 02/08/14 03:44, Eric Botcazou wrote:
>> This is then combined into:
>>
>> newpat = (parallel [
>>           (set (pc)
>>               (pc))
>>           (set (reg/v/f:SI 34 [ stack ])
>>               (reg/f:SI 15 %sp))
>>       ])
>>
>> This isn't a recognized insn, and it needs to be split.  Since it's just
>> a parallel of independent sets, combine tries to split it into a pair of
>> assignments which look like:
>>
>> (insn 16 14 17 2 (set (pc)
>>           (pc)) pr52714.c:10 2147483647 {NOOP_MOVE}
>>        (nil))
>> (jump_insn 17 16 18 2 (set (reg/v/f:SI 34 [ stack ])
>>           (reg/f:SI 15 %sp)) pr52714.c:10 38 {*movsi_m68k2}
>>        (int_list:REG_BR_PROB 1014 (nil))
>>    -> 40)
>>
>>
>> Note how the second is a JUMP_INSN, but it doesn't modify the PC.  Opps.
>>
>>
>> ISTM the code in combine which tries to rip apart a PARALLEL like that
>> needs to ensure that I2 and I3 are both INSNs.
>
> That seems to be an unnecessary pessimization given that the combination looks
> perfectly valid if you swap the insns.  Can't we enhance the code just below
> which chooses the order of the insns after splitting?
I pondered changing the condition for swapping the insn order, but it 
didn't seem worth it.  I doubt we see many 3->2 combinations where I3 is 
a JUMP_INSN, the result turns into two simple sets and the insn swapping 
code you wrote decides it needs to swap the insns.

I also pondered identifying the nop set within the parallel & taking it 
apart again.  Again, I rejected as I suspect it doesn't happen enough in 
practice to make it worth the effort.

I pondered adding  the guard for the newi2pat = set0; newpat = set1 case 
where we actually swap the insns.  But realized we could run into 
similar problems in the prior conditional if I2 was a CALL_INSN that we 
somehow simplified.

It seems to me that as long as we're re-using the existing insns to 
contain the simple sets that we have to ensure that they're INSNs, not 
CALL_INSNs or JUMP_INSNs.

I also pondered resetting the code on each insn via SET_CODE (whatever, 
INSN).  It's rather hacky, but the formats are close enough that it 
would work.  Then I realized that we can still undo_all later and didn't 
want to add compensation code to put the original code back.

Jeff
Eric Botcazou Feb. 11, 2014, 9:06 a.m. UTC | #3
> I pondered changing the condition for swapping the insn order, but it
> didn't seem worth it.  I doubt we see many 3->2 combinations where I3 is
> a JUMP_INSN, the result turns into two simple sets and the insn swapping
> code you wrote decides it needs to swap the insns.

I didn't actually write it, just enhanced it recently, that's why I suggested 
to do the same here.  It's one line of code and we have an example of valid 
simplification at hand so I think we ought to do it.

> It seems to me that as long as we're re-using the existing insns to
> contain the simple sets that we have to ensure that they're INSNs, not
> CALL_INSNs or JUMP_INSNs.

I disagree, nullifying JUMP_INSNs by changing them to (set (pc) (pc)) is a 
standard method in the combiner.
Jeff Law Feb. 11, 2014, 8:25 p.m. UTC | #4
On 02/11/14 02:06, Eric Botcazou wrote:
>> I pondered changing the condition for swapping the insn order, but it
>> didn't seem worth it.  I doubt we see many 3->2 combinations where I3 is
>> a JUMP_INSN, the result turns into two simple sets and the insn swapping
>> code you wrote decides it needs to swap the insns.
>
> I didn't actually write it, just enhanced it recently, that's why I suggested
> to do the same here.  It's one line of code and we have an example of valid
> simplification at hand so I think we ought to do it.
>
>> It seems to me that as long as we're re-using the existing insns to
>> contain the simple sets that we have to ensure that they're INSNs, not
>> CALL_INSNs or JUMP_INSNs.
>
> I disagree, nullifying JUMP_INSNs by changing them to (set (pc) (pc)) is a
> standard method in the combiner.
Right, but most of the time we're doing a 2->1 or 3->1 that turns the 
conditional jump into a nop -- at which point all we do is modify I3 
(which is the conditional jump) into a nop-jump.  That common case 
shouldn't be affected by my change.

It's only when we have a 3->2 or 4->2 combination where one of the 
earlier insns has a side effect we need to keep and we're able to 
simplify the conditional into a NOP that would be affected by my change. 
  And I suspect that is much more rare.

Regardless, I'll see what I can do.

THanks,
Jeff
Jeff Law Feb. 14, 2014, 6:05 p.m. UTC | #5
On 02/11/14 02:06, Eric Botcazou wrote:
>> I pondered changing the condition for swapping the insn order, but it
>> didn't seem worth it.  I doubt we see many 3->2 combinations where I3 is
>> a JUMP_INSN, the result turns into two simple sets and the insn swapping
>> code you wrote decides it needs to swap the insns.
>
> I didn't actually write it, just enhanced it recently, that's why I suggested
> to do the same here.  It's one line of code and we have an example of valid
> simplification at hand so I think we ought to do it.
>
>> It seems to me that as long as we're re-using the existing insns to
>> contain the simple sets that we have to ensure that they're INSNs, not
>> CALL_INSNs or JUMP_INSNs.
>
> I disagree, nullifying JUMP_INSNs by changing them to (set (pc) (pc)) is a
> standard method in the combiner.
So, the code has this structure

if (looks safe)
   emit in existing order
else if (reverse order looks safe)
   emit in reversed order
else
   undo_all


In this specific case, the existing order is never going to look safe 
because set1 uses (sp) as an input argument and use_crosses_set_p is 
very conservative when the value is the stack pointer on a PUSH_ROUNDING 
machine (such as the m68k)

So we could put the verification code that both I3 and I2 are INSNs in 
the else if (reverse order looks safe) clause.    That would mean for 
this testcase, we ultimately undo_all.  But I consider that reasonable 
given the only reason this instance bled into RTL land was -O1 instead 
of -O2 compilation.

I already know that variant works as it's what I had before I started 
thinking about what happens if we have a CALL_INSN as I2.

Jeff
Eric Botcazou Feb. 17, 2014, 9:28 a.m. UTC | #6
> So, the code has this structure
> 
> if (looks safe)
>    emit in existing order
> else if (reverse order looks safe)
>    emit in reversed order
> else
>    undo_all
> 
> 
> In this specific case, the existing order is never going to look safe
> because set1 uses (sp) as an input argument and use_crosses_set_p is
> very conservative when the value is the stack pointer on a PUSH_ROUNDING
> machine (such as the m68k)

What a bummer, it seems all things are aligned so as to block the optimization 
here...  It is a bit of a shame, as this is a powerful combination that would 
eliminate a branch if it succeeded.  The combination of the 3 insns yields the 
(set (pc) (pc)) pattern and it's only because added_sets_1 is true that the 
annoying PARALLEL is created.

> So we could put the verification code that both I3 and I2 are INSNs in
> the else if (reverse order looks safe) clause.    That would mean for
> this testcase, we ultimately undo_all.  But I consider that reasonable
> given the only reason this instance bled into RTL land was -O1 instead
> of -O2 compilation.

Although it's probably time to concede defeat in this particular case, I don't 
think we should use the NONJUMP_INSN_P big hammer because we want to eliminate 
branches if possible.  So I'd just add the minimal test to the two existing 
conditions so as to prevent the invalid RTL from being created, e.g.

  && (!JUMP_P (i3) || SET_DEST (set[01]) == pc_rtx)
Jeff Law Feb. 25, 2014, 6:27 a.m. UTC | #7
On 02/17/14 02:28, Eric Botcazou wrote:
>> So, the code has this structure
>>
>> if (looks safe)
>>     emit in existing order
>> else if (reverse order looks safe)
>>     emit in reversed order
>> else
>>     undo_all
>>
>>
>> In this specific case, the existing order is never going to look safe
>> because set1 uses (sp) as an input argument and use_crosses_set_p is
>> very conservative when the value is the stack pointer on a PUSH_ROUNDING
>> machine (such as the m68k)
>
> What a bummer, it seems all things are aligned so as to block the optimization
> here...  It is a bit of a shame, as this is a powerful combination that would
> eliminate a branch if it succeeded.  The combination of the 3 insns yields the
> (set (pc) (pc)) pattern and it's only because added_sets_1 is true that the
> annoying PARALLEL is created.
Yea.  As I mentioned before, I actually considered peeking inside the 
vector for a (set (pc) (pc)) and simplifying vector (possibly back down 
to a single set).  I really like eliminating branches ;-0

But I couldn't convince myself it'd be worth the effort.

>
>> So we could put the verification code that both I3 and I2 are INSNs in
>> the else if (reverse order looks safe) clause.    That would mean for
>> this testcase, we ultimately undo_all.  But I consider that reasonable
>> given the only reason this instance bled into RTL land was -O1 instead
>> of -O2 compilation.
>
> Although it's probably time to concede defeat in this particular case, I don't
> think we should use the NONJUMP_INSN_P big hammer because we want to eliminate
> branches if possible.  So I'd just add the minimal test to the two existing
> conditions so as to prevent the invalid RTL from being created, e.g.
>
>    && (!JUMP_P (i3) || SET_DEST (set[01]) == pc_rtx)
OK.  I'll give that a spin.

jeff
diff mbox

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index e489b62..7983139 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@ 
+2014-02-06  Jeff Law  <law@redhat.com>
+
+	PR rtl-optimization/52714
+	* combine.c (try_combine): When splitting an unrecognized PARALLEL
+	into two independent simple sets, make sure both I3 and I2 are
+	INSNs (as opposed to JUMP_INSNs or CALL_INSNs).
+
 2014-02-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
 
 	* config/arm/aarch-cost-tables.h (cortexa57_extra_costs): New table.
diff --git a/gcc/combine.c b/gcc/combine.c
index fd4294b..12db3a3 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -3692,7 +3692,11 @@  try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p,
 	   && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
 				  XVECEXP (newpat, 0, 1))
 	   && ! (contains_muldiv (SET_SRC (XVECEXP (newpat, 0, 0)))
-		 && contains_muldiv (SET_SRC (XVECEXP (newpat, 0, 1)))))
+		 && contains_muldiv (SET_SRC (XVECEXP (newpat, 0, 1))))
+	   /* Make sure both I2 and I3 are simple INSNs, otherwise we might
+	      create a JUMP_INSN with a PATTERN that is a simple set.  */
+	   && GET_CODE (i2) == INSN
+	   && GET_CODE (i3) == INSN)
     {
       rtx set0 = XVECEXP (newpat, 0, 0);
       rtx set1 = XVECEXP (newpat, 0, 1);
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 2cac8d2..5d6e066 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@ 
+2014-02-06  Jeff Law  <law@redhat.com>
+
+	PR rtl-optimization/52714
+	* gcc.c-torture/compile/pr52714.c: New test.
+
 2014-02-06  Jakub Jelinek  <jakub@redhat.com>
 
 	PR target/59575
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr52714.c b/gcc/testsuite/gcc.c-torture/compile/pr52714.c
new file mode 100644
index 0000000..03d4990
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr52714.c
@@ -0,0 +1,25 @@ 
+
+int __re_compile_fastmap(unsigned char *p)
+{
+    unsigned char **stack;
+    unsigned size;
+    unsigned avail;
+
+    stack = __builtin_alloca(5 * sizeof(unsigned char*));
+    if (stack == 0)
+	return -2;
+    size = 5;
+    avail = 0;
+
+    for (;;) {
+	switch (*p++) {
+	case 0:
+	    if (avail == size)
+		return -2;
+	    stack[avail++] = p;
+	}
+    }
+
+    return 0;
+}
+