Message ID | 52F421F5.4000203@redhat.com |
---|---|
State | New |
Headers | show |
> This is then combined into: > > newpat = (parallel [ > (set (pc) > (pc)) > (set (reg/v/f:SI 34 [ stack ]) > (reg/f:SI 15 %sp)) > ]) > > This isn't a recognized insn, and it needs to be split. Since it's just > a parallel of independent sets, combine tries to split it into a pair of > assignments which look like: > > (insn 16 14 17 2 (set (pc) > (pc)) pr52714.c:10 2147483647 {NOOP_MOVE} > (nil)) > (jump_insn 17 16 18 2 (set (reg/v/f:SI 34 [ stack ]) > (reg/f:SI 15 %sp)) pr52714.c:10 38 {*movsi_m68k2} > (int_list:REG_BR_PROB 1014 (nil)) > -> 40) > > > Note how the second is a JUMP_INSN, but it doesn't modify the PC. Opps. > > > ISTM the code in combine which tries to rip apart a PARALLEL like that > needs to ensure that I2 and I3 are both INSNs. That seems to be an unnecessary pessimization given that the combination looks perfectly valid if you swap the insns. Can't we enhance the code just below which chooses the order of the insns after splitting?
On 02/08/14 03:44, Eric Botcazou wrote: >> This is then combined into: >> >> newpat = (parallel [ >> (set (pc) >> (pc)) >> (set (reg/v/f:SI 34 [ stack ]) >> (reg/f:SI 15 %sp)) >> ]) >> >> This isn't a recognized insn, and it needs to be split. Since it's just >> a parallel of independent sets, combine tries to split it into a pair of >> assignments which look like: >> >> (insn 16 14 17 2 (set (pc) >> (pc)) pr52714.c:10 2147483647 {NOOP_MOVE} >> (nil)) >> (jump_insn 17 16 18 2 (set (reg/v/f:SI 34 [ stack ]) >> (reg/f:SI 15 %sp)) pr52714.c:10 38 {*movsi_m68k2} >> (int_list:REG_BR_PROB 1014 (nil)) >> -> 40) >> >> >> Note how the second is a JUMP_INSN, but it doesn't modify the PC. Opps. >> >> >> ISTM the code in combine which tries to rip apart a PARALLEL like that >> needs to ensure that I2 and I3 are both INSNs. > > That seems to be an unnecessary pessimization given that the combination looks > perfectly valid if you swap the insns. Can't we enhance the code just below > which chooses the order of the insns after splitting? I pondered changing the condition for swapping the insn order, but it didn't seem worth it. I doubt we see many 3->2 combinations where I3 is a JUMP_INSN, the result turns into two simple sets and the insn swapping code you wrote decides it needs to swap the insns. I also pondered identifying the nop set within the parallel & taking it apart again. Again, I rejected as I suspect it doesn't happen enough in practice to make it worth the effort. I pondered adding the guard for the newi2pat = set0; newpat = set1 case where we actually swap the insns. But realized we could run into similar problems in the prior conditional if I2 was a CALL_INSN that we somehow simplified. It seems to me that as long as we're re-using the existing insns to contain the simple sets that we have to ensure that they're INSNs, not CALL_INSNs or JUMP_INSNs. I also pondered resetting the code on each insn via SET_CODE (whatever, INSN). It's rather hacky, but the formats are close enough that it would work. Then I realized that we can still undo_all later and didn't want to add compensation code to put the original code back. Jeff
> I pondered changing the condition for swapping the insn order, but it > didn't seem worth it. I doubt we see many 3->2 combinations where I3 is > a JUMP_INSN, the result turns into two simple sets and the insn swapping > code you wrote decides it needs to swap the insns. I didn't actually write it, just enhanced it recently, that's why I suggested to do the same here. It's one line of code and we have an example of valid simplification at hand so I think we ought to do it. > It seems to me that as long as we're re-using the existing insns to > contain the simple sets that we have to ensure that they're INSNs, not > CALL_INSNs or JUMP_INSNs. I disagree, nullifying JUMP_INSNs by changing them to (set (pc) (pc)) is a standard method in the combiner.
On 02/11/14 02:06, Eric Botcazou wrote: >> I pondered changing the condition for swapping the insn order, but it >> didn't seem worth it. I doubt we see many 3->2 combinations where I3 is >> a JUMP_INSN, the result turns into two simple sets and the insn swapping >> code you wrote decides it needs to swap the insns. > > I didn't actually write it, just enhanced it recently, that's why I suggested > to do the same here. It's one line of code and we have an example of valid > simplification at hand so I think we ought to do it. > >> It seems to me that as long as we're re-using the existing insns to >> contain the simple sets that we have to ensure that they're INSNs, not >> CALL_INSNs or JUMP_INSNs. > > I disagree, nullifying JUMP_INSNs by changing them to (set (pc) (pc)) is a > standard method in the combiner. Right, but most of the time we're doing a 2->1 or 3->1 that turns the conditional jump into a nop -- at which point all we do is modify I3 (which is the conditional jump) into a nop-jump. That common case shouldn't be affected by my change. It's only when we have a 3->2 or 4->2 combination where one of the earlier insns has a side effect we need to keep and we're able to simplify the conditional into a NOP that would be affected by my change. And I suspect that is much more rare. Regardless, I'll see what I can do. THanks, Jeff
On 02/11/14 02:06, Eric Botcazou wrote: >> I pondered changing the condition for swapping the insn order, but it >> didn't seem worth it. I doubt we see many 3->2 combinations where I3 is >> a JUMP_INSN, the result turns into two simple sets and the insn swapping >> code you wrote decides it needs to swap the insns. > > I didn't actually write it, just enhanced it recently, that's why I suggested > to do the same here. It's one line of code and we have an example of valid > simplification at hand so I think we ought to do it. > >> It seems to me that as long as we're re-using the existing insns to >> contain the simple sets that we have to ensure that they're INSNs, not >> CALL_INSNs or JUMP_INSNs. > > I disagree, nullifying JUMP_INSNs by changing them to (set (pc) (pc)) is a > standard method in the combiner. So, the code has this structure if (looks safe) emit in existing order else if (reverse order looks safe) emit in reversed order else undo_all In this specific case, the existing order is never going to look safe because set1 uses (sp) as an input argument and use_crosses_set_p is very conservative when the value is the stack pointer on a PUSH_ROUNDING machine (such as the m68k) So we could put the verification code that both I3 and I2 are INSNs in the else if (reverse order looks safe) clause. That would mean for this testcase, we ultimately undo_all. But I consider that reasonable given the only reason this instance bled into RTL land was -O1 instead of -O2 compilation. I already know that variant works as it's what I had before I started thinking about what happens if we have a CALL_INSN as I2. Jeff
> So, the code has this structure > > if (looks safe) > emit in existing order > else if (reverse order looks safe) > emit in reversed order > else > undo_all > > > In this specific case, the existing order is never going to look safe > because set1 uses (sp) as an input argument and use_crosses_set_p is > very conservative when the value is the stack pointer on a PUSH_ROUNDING > machine (such as the m68k) What a bummer, it seems all things are aligned so as to block the optimization here... It is a bit of a shame, as this is a powerful combination that would eliminate a branch if it succeeded. The combination of the 3 insns yields the (set (pc) (pc)) pattern and it's only because added_sets_1 is true that the annoying PARALLEL is created. > So we could put the verification code that both I3 and I2 are INSNs in > the else if (reverse order looks safe) clause. That would mean for > this testcase, we ultimately undo_all. But I consider that reasonable > given the only reason this instance bled into RTL land was -O1 instead > of -O2 compilation. Although it's probably time to concede defeat in this particular case, I don't think we should use the NONJUMP_INSN_P big hammer because we want to eliminate branches if possible. So I'd just add the minimal test to the two existing conditions so as to prevent the invalid RTL from being created, e.g. && (!JUMP_P (i3) || SET_DEST (set[01]) == pc_rtx)
On 02/17/14 02:28, Eric Botcazou wrote: >> So, the code has this structure >> >> if (looks safe) >> emit in existing order >> else if (reverse order looks safe) >> emit in reversed order >> else >> undo_all >> >> >> In this specific case, the existing order is never going to look safe >> because set1 uses (sp) as an input argument and use_crosses_set_p is >> very conservative when the value is the stack pointer on a PUSH_ROUNDING >> machine (such as the m68k) > > What a bummer, it seems all things are aligned so as to block the optimization > here... It is a bit of a shame, as this is a powerful combination that would > eliminate a branch if it succeeded. The combination of the 3 insns yields the > (set (pc) (pc)) pattern and it's only because added_sets_1 is true that the > annoying PARALLEL is created. Yea. As I mentioned before, I actually considered peeking inside the vector for a (set (pc) (pc)) and simplifying vector (possibly back down to a single set). I really like eliminating branches ;-0 But I couldn't convince myself it'd be worth the effort. > >> So we could put the verification code that both I3 and I2 are INSNs in >> the else if (reverse order looks safe) clause. That would mean for >> this testcase, we ultimately undo_all. But I consider that reasonable >> given the only reason this instance bled into RTL land was -O1 instead >> of -O2 compilation. > > Although it's probably time to concede defeat in this particular case, I don't > think we should use the NONJUMP_INSN_P big hammer because we want to eliminate > branches if possible. So I'd just add the minimal test to the two existing > conditions so as to prevent the invalid RTL from being created, e.g. > > && (!JUMP_P (i3) || SET_DEST (set[01]) == pc_rtx) OK. I'll give that a spin. jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog index e489b62..7983139 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,10 @@ +2014-02-06 Jeff Law <law@redhat.com> + + PR rtl-optimization/52714 + * combine.c (try_combine): When splitting an unrecognized PARALLEL + into two independent simple sets, make sure both I3 and I2 are + INSNs (as opposed to JUMP_INSNs or CALL_INSNs). + 2014-02-06 Kyrylo Tkachov <kyrylo.tkachov@arm.com> * config/arm/aarch-cost-tables.h (cortexa57_extra_costs): New table. diff --git a/gcc/combine.c b/gcc/combine.c index fd4294b..12db3a3 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -3692,7 +3692,11 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p, && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)), XVECEXP (newpat, 0, 1)) && ! (contains_muldiv (SET_SRC (XVECEXP (newpat, 0, 0))) - && contains_muldiv (SET_SRC (XVECEXP (newpat, 0, 1))))) + && contains_muldiv (SET_SRC (XVECEXP (newpat, 0, 1)))) + /* Make sure both I2 and I3 are simple INSNs, otherwise we might + create a JUMP_INSN with a PATTERN that is a simple set. */ + && GET_CODE (i2) == INSN + && GET_CODE (i3) == INSN) { rtx set0 = XVECEXP (newpat, 0, 0); rtx set1 = XVECEXP (newpat, 0, 1); diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 2cac8d2..5d6e066 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2014-02-06 Jeff Law <law@redhat.com> + + PR rtl-optimization/52714 + * gcc.c-torture/compile/pr52714.c: New test. + 2014-02-06 Jakub Jelinek <jakub@redhat.com> PR target/59575 diff --git a/gcc/testsuite/gcc.c-torture/compile/pr52714.c b/gcc/testsuite/gcc.c-torture/compile/pr52714.c new file mode 100644 index 0000000..03d4990 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr52714.c @@ -0,0 +1,25 @@ + +int __re_compile_fastmap(unsigned char *p) +{ + unsigned char **stack; + unsigned size; + unsigned avail; + + stack = __builtin_alloca(5 * sizeof(unsigned char*)); + if (stack == 0) + return -2; + size = 5; + avail = 0; + + for (;;) { + switch (*p++) { + case 0: + if (avail == size) + return -2; + stack[avail++] = p; + } + } + + return 0; +} +