Message ID | 20171124213813.GB14653@tucnak |
---|---|
State | New |
Headers | show |
Series | Fix combine's simplify_if_then_else (PR rtl-optimization/81553) | expand |
Hi, On Fri, Nov 24, 2017 at 10:38:13PM +0100, Jakub Jelinek wrote: > The following testcase ICEs in wide-int*, but the reason is a mode mismatch > (we build a SImode MULT with one QImode argument and one VOIDmode argument, > then it is folded into SImode NEG with QImode argument, ...). > The bug is in assuming that the mode of c1 must be m, that is usually the > case, but shifts are special, the second argument can have a different mode. > > The following patch makes sure we perform the computation of the new shift > count in the right mode. Okay, thanks! Is this better than bailing out though, do you have an example? Segher > 2017-11-24 Jakub Jelinek <jakub@redhat.com> > > PR rtl-optimization/81553 > * combine.c (simplify_if_then_else): In (if_then_else COND (OP Z C1) Z) > to (OP Z (mult COND (C1 * STORE_FLAG_VALUE))) optimization, if OP > is a shift where C1 has different mode than the whole shift, use C1's > mode for MULT rather than the shift's mode. > > * gcc.c-torture/compile/pr81553.c: New test.
On Fri, Nov 24, 2017 at 07:19:29PM -0600, Segher Boessenkool wrote: > Hi, > > On Fri, Nov 24, 2017 at 10:38:13PM +0100, Jakub Jelinek wrote: > > The following testcase ICEs in wide-int*, but the reason is a mode mismatch > > (we build a SImode MULT with one QImode argument and one VOIDmode argument, > > then it is folded into SImode NEG with QImode argument, ...). > > The bug is in assuming that the mode of c1 must be m, that is usually the > > case, but shifts are special, the second argument can have a different mode. > > > > The following patch makes sure we perform the computation of the new shift > > count in the right mode. > > Okay, thanks! Is this better than bailing out though, do you have > an example? I don't, but I haven't bootstrapped/regtested with a patch to gather statistics on whether it would ever be successful (how? Should I note in a flag successful optimization of this and then check that flag upon successful try_combine and clear it at the end of any try_combine? It would need to be done for all targets probably). This code dates from 1994, but testsuite was separate, so it is hard to check if this was covered by any testcases. If C1 is a constant (but then it would likely have VOIDmode), then I can imagine it would match, though hard to find testcases, because usually GIMPLE folding should propagate constants far earlier. If C1 is a pseudo, then we'd need an instruction that has (any_shift (something) (mult (something) (something))) which is quite unlikely. But it is similarly unlikely for (ior (something) (mult (something) (something))) and similar. Or can try_combine then split it into two separate instructions, one doing the mult and one doing the other, say when combining from original 3 or 4 instructions? If so, then it could be likely it hits. Jakub
On Sat, Nov 25, 2017 at 09:39:54AM +0100, Jakub Jelinek wrote: > > Okay, thanks! Is this better than bailing out though, do you have > > an example? > > I don't, but I haven't bootstrapped/regtested with a patch to gather > statistics on whether it would ever be successful (how? Should I note in a > flag successful optimization of this and then check that flag upon > successful try_combine and clear it at the end of any try_combine? It would > need to be done for all targets probably). This code dates from 1994, but > testsuite was separate, so it is hard to check if this was covered by any > testcases. > > If C1 is a constant (but then it would likely have VOIDmode), then I can > imagine it would match, though hard to find testcases, because usually > GIMPLE folding should propagate constants far earlier. If C1 is a pseudo, > then we'd need an instruction that has > (any_shift (something) (mult (something) (something))) > which is quite unlikely. But it is similarly unlikely for > (ior (something) (mult (something) (something))) and similar. const_int is always VOIDmode yes. So we're left with a shift of a mul of pseudos, and when that is split it is extremely likely split on a boundary that was already there when we began, so I don't see it likely it helps. I thought you might have hit on an example where it does :-) The patch is fine either way. > Or can try_combine then split it into two separate instructions, one doing > the mult and one doing the other, say when combining from original 3 or 4 > instructions? If so, then it could be likely it hits. Yes: it will use a define_split if there is one, or else it will find what it thinks is the best spot to split, and try that. That should usually split the mul off, and you usually will have started with a mul in a separate insn, so says this was i1+i2+i3 with i3 the mul, in effect this is then just combining i1+i2. Sometimes of course that will then work, where combining just i1+i2 won't (because of known register values, or REG_DEAD notes, or similar). Segher
--- gcc/combine.c.jj 2017-11-19 18:08:08.000000000 +0100 +++ gcc/combine.c 2017-11-24 12:07:25.701480794 +0100 @@ -6639,11 +6639,15 @@ simplify_if_then_else (rtx x) if (z) { - temp = subst (simplify_gen_relational (true_code, m, VOIDmode, + machine_mode cm = m; + if ((op == ASHIFT || op == LSHIFTRT || op == ASHIFTRT) + && GET_MODE (c1) != VOIDmode) + cm = GET_MODE (c1); + temp = subst (simplify_gen_relational (true_code, cm, VOIDmode, cond_op0, cond_op1), pc_rtx, pc_rtx, 0, 0, 0); - temp = simplify_gen_binary (MULT, m, temp, - simplify_gen_binary (MULT, m, c1, + temp = simplify_gen_binary (MULT, cm, temp, + simplify_gen_binary (MULT, cm, c1, const_true_rtx)); temp = subst (temp, pc_rtx, pc_rtx, 0, 0, 0); temp = simplify_gen_binary (op, m, gen_lowpart (m, z), temp); --- gcc/testsuite/gcc.c-torture/compile/pr81553.c.jj 2017-11-24 12:11:25.681551110 +0100 +++ gcc/testsuite/gcc.c-torture/compile/pr81553.c 2017-11-24 12:10:55.000000000 +0100 @@ -0,0 +1,10 @@ +/* PR rtl-optimization/81553 */ + +int a, b, c, d; + +void +foo (void) +{ + d = 1 >> c >> 1; + b = ~(209883449764912897ULL & d) << (0 >= a) | ~d; +}