Message ID | 20131012073934.GB30970@tucnak.zalov.cz |
---|---|
State | New |
Headers | show |
Patch and test cases are updated according to your comments. Bootstrap and no make check regression on X86-64. ChangeLog: 2013-10-14 Zhenqiang Chen <zhenqiang.chen@arm.com> * tree-ssa-reassoc.c (try_transfer_range_tests_1): New function, extracted from optimize_range_tests * (try_transfer_range_tests_2): New function. * (try_transfer_range_tests): New function, extracted from optimize_range_tests and calls try_transfer_range_tests_1 and try_transfer_range_tests_2, * (optimize_range_tests): Use try_transfer_range_tests. testsuite/ChangeLog: 2013-10-14 Zhenqiang Chen <zhenqiang.chen@arm.com> * gcc.dg/tree-ssa/reassoc-32.c: New test case. * gcc.dg/tree-ssa/reassoc-33.c: New test case. * gcc.dg/tree-ssa/reassoc-34.c: New test case. * gcc.dg/tree-ssa/reassoc-35.c: New test case. * gcc.dg/tree-ssa/reassoc-36.c: New test case. > -----Original Message----- > From: Jakub Jelinek [mailto:jakub@redhat.com] > Sent: Saturday, October 12, 2013 3:40 PM > To: Zhenqiang Chen > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - > CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0 > > On Sat, Oct 12, 2013 at 10:08:12AM +0800, Zhenqiang Chen wrote: > > As you had mentioned, the transition in this patch does not reduce > > instructions. But the preexisting optimization does. So we prefer to > > do the preexisting optimization first. E.g. > > > > X == 10 || X == 12 || X == 26 > > Ok, that makes sense. > > @@ -2279,6 +2275,71 @@ optimize_range_tests (enum tree_code opcode, > } > } > > + /* Optimize X == CST1 || X == CST2 > + if popcount (CST2 - CST1) == 1 into > + ((X - CST1) & ~(CST2 - CST1)) == 0. */ > > Mention here also similarly to the other comment that it works also with > ranges. > > + if (BRANCH_COST (optimize_function_for_speed_p (cfun), false) >= 2) > + for (i = first; i < length; i++) > + { > + tree lowi, highi, lowj, highj, type, tem1, tem2, mask; > + > + if (ranges[i].exp == NULL_TREE || ranges[i].in_p) > + continue; > + type = TREE_TYPE (ranges[i].exp); > + if (!INTEGRAL_TYPE_P (type)) > + continue; > + lowi = ranges[i].low; > + if (lowi == NULL_TREE) > + continue; > > The other loop has here: > if (lowi == NULL_TREE) > lowi = TYPE_MIN_VALUE (type); > instead, which is better, can you please change it? > > + highi = ranges[i].high; > + if (highi == NULL_TREE) > + continue; > + for (j = i + 1; j < length && j < i + 64; j++) > + { > + lowj = ranges[j].low; > + if (lowj == NULL_TREE) > + continue; > + highj = ranges[j].high; > + if (highj == NULL_TREE) > + continue; > > The other loop has > if (highj == NULL_TREE) > highj = TYPE_MAX_VALUE (type); > here instead. > > + if (ranges[j].exp == NULL_TREE || ranges[j].in_p > + || (ranges[i].exp != ranges[j].exp)) > + continue; > > Can you please move this test the lowj = assignment, and remove the > ranges[j].exp == NULL_TREE test (both here and in the earlier loop)? I mean, > we've checked at the beginning of for (i = first; i < length; i++) loop that > ranges[i].exp is not NULL_TREE, and if ranges[j].exp is NULL_TREE, it will be > different than ranges[i].exp. > So > if (ranges[i].exp != ranges[j].exp || ranges[j].in_p) > continue; > (and please also collapse the two checks in the first loop, so that both look > similar). > > + /* Check lowj > highi. */ > + tem1 = fold_binary (GT_EXPR, boolean_type_node, > + lowj, highi); > + if (tem1 == NULL_TREE || !integer_onep (tem1)) > + continue; > + /* Check highi - lowi == highj - lowj. */ > + tem1 = fold_binary (MINUS_EXPR, type, highi, lowi); > + if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST) > + continue; > + tem2 = fold_binary (MINUS_EXPR, type, highj, lowj); > + if (tem2 == NULL_TREE || TREE_CODE (tem2) != INTEGER_CST) > + continue; > + if (!tree_int_cst_equal (tem1, tem2)) > + continue; > + /* Check popcount (lowj - lowi) == 1. */ > + tem1 = fold_binary (MINUS_EXPR, type, lowj, lowi); > + if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST) > + continue; > + if (tree_log2 (tem1) < 0) > + continue; > + mask = fold_build1 (BIT_NOT_EXPR, type, tem1); > + tem1 = fold_binary (MINUS_EXPR, type, ranges[i].exp, lowi); > + tem1 = fold_build2 (BIT_AND_EXPR, type, tem1, mask); > + lowi = build_int_cst (type, 0); > > Please use lowj instead of lowi here. Because, if update_range_test fails, we > continue looking for further matches in following ranges, and while lowj will > be computed again, lowi will be assumed to contain the low bound of the > first range. > > + if (update_range_test (ranges + i, ranges + j, 1, opcode, ops, tem1, > + ranges[i].in_p, lowi, tem2, > > And here too. > > Or, alternatively to avoid duplication, you could put the whole for (i = first; i < > length; i++) loop from the first optimization into another int i; > for (pass = 0; pass < (BRANCH_COST (optimize_function_for_speed_p > (cfun), > false) >= 2) + 1; pass++) > { > } > loop, use whatever is common in the loop unconditionally, and just > conditionalize the rest on pass == 0 vs. pass == 1. > Or maybe even better just move this whole loop into a new function, with > ranges, opcode, first, length arguments plus another (bool?) argument which > of the two optimizations you want to perform, and return the bool > any_changes. Then you wouldn't run into line length issues that you could > run into with the extra loop. > > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-33.c > @@ -0,0 +1,38 @@ > +/* { dg-do run { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* > +v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* > +mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */ > + > +/* { dg-options "-O2 -fno-inline -fdump-tree-reassoc1-details" } */ > +/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */ > + > +int test (int a, int b, int c) > +{ > + if (a == 43 || a == 75 || a == 44 || a == 78 || a == 77 || a == 46 || a == 76 || > a == 45) > + return b; > + else > + return c; > +} > + > +int main () > +{ > + if (test (43, 20, 30) != 20) > + __builtin_abort (); > + if (test (44, 20, 30) != 20) > + __builtin_abort (); > + if (test (45, 20, 30) != 20) > + __builtin_abort (); > + if (test (46, 20, 30) != 20) > + __builtin_abort (); > + if (test (75, 20, 30) != 20) > + __builtin_abort (); > + if (test (76, 20, 30) != 20) > + __builtin_abort (); > + if (test (77, 20, 30) != 20) > + __builtin_abort (); > + if (test (78, 20, 30) != 20) > + __builtin_abort (); > + if (test (30, 20, 30) != 30) > + __builtin_abort (); > > Perhaps it would be better to test more than just one value outside of the > range. Say everything from -10 to 100 might be better, but you'd want to > make sure the optimization can't be applied to the What about > > int > main () > { > volatile int n43, n47, n75, n79; > n43 = 43; n47 = n43 + 4; n75 = 75; n79 = n75 + 4; > int i; > for (i = -10; i <= 100; i++) > if (test (i, 20, 30) != 30 - ((i >= n43 && i < n47) > || (i >= n75 && i < n79)) * 10) > __builtin_abort (); > return 0; > } > > ? > > Jakub
On Mon, Oct 14, 2013 at 03:10:12PM +0800, Zhenqiang Chen wrote: @@ -2131,6 +2133,155 @@ update_range_test (struct range_entry *range, struct range_entry *otherrange, return true; } +/* Optimize X == CST1 || X == CST2 + if popcount (CST1 ^ CST2) == 1 into + (X & ~(CST1 ^ CST2)) == (CST1 & ~(CST1 ^ CST2)). + Similarly for ranges. E.g. + X != 2 && X != 3 && X != 10 && X != 11 + will be transformed by the previous optimization into + (X - 2U) <= 1U && (X - 10U) <= 1U + and this loop can transform that into + ((X & ~8) - 2U) <= 1U. */ + +static bool +try_transfer_range_tests_1 (enum tree_code opcode, int i, int j, tree type, + tree lowi, tree lowj, tree highi, tree highj, + vec<operand_entry_t> *ops, + struct range_entry *ranges) The function names are bad, you aren't transfering anything, but optimizing. Please rename try_transfer_range_tests to optimize_range_tests_1 and try_transfer_range_tests_{1,2} to optimize_range_tests_{2,3} or perhaps better yet to optimize_range_tests_{xor,diff}. Also, perhaps instead of passing ranges and i and j to these two functions you could pass struct range_entry *range1, struct range_entry *range2 (caller would pass ranges + i, ranges + j). +/* It does some common checks for function try_transfer_range_tests_1 and + try_transfer_range_tests_2. Please adjust the comment for the renaming. Please change trans_option to bool optimize_xor. + if (trans_option == 1) + { + if (try_transfer_range_tests_1 (opcode, i, j, type, lowi, lowj, + highi, highj, ops, ranges)) + { + any_changes = true; + break; + } + } + else if (trans_option == 2) + { + if (try_transfer_range_tests_2 (opcode, i, j, type, lowi, lowj, + highi, highj, ops, ranges)) + { + any_changes = true; + break; + } + } I'd prefer if (optimize_xor) any_changes = optimize_range_tests_xor (opcode, type, lowi, lowj, highi, highj, ops, ranges + i, ranges + j); else any_changes = optimize_range_tests_xor (opcode, type, lowi, lowj, highi, highj, ops, ranges + i, ranges + j); if (any_changes) break; Ok with those changes. Jakub
> -----Original Message----- > From: Jakub Jelinek [mailto:jakub@redhat.com] > Sent: Monday, October 14, 2013 4:49 PM > To: Zhenqiang Chen > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - > CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0 > > On Mon, Oct 14, 2013 at 03:10:12PM +0800, Zhenqiang Chen wrote: > > @@ -2131,6 +2133,155 @@ update_range_test (struct range_entry *range, > struct range_entry *otherrange, > return true; > } > > +/* Optimize X == CST1 || X == CST2 > + if popcount (CST1 ^ CST2) == 1 into > + (X & ~(CST1 ^ CST2)) == (CST1 & ~(CST1 ^ CST2)). > + Similarly for ranges. E.g. > + X != 2 && X != 3 && X != 10 && X != 11 > + will be transformed by the previous optimization into > + (X - 2U) <= 1U && (X - 10U) <= 1U > + and this loop can transform that into > + ((X & ~8) - 2U) <= 1U. */ > + > +static bool > +try_transfer_range_tests_1 (enum tree_code opcode, int i, int j, tree type, > + tree lowi, tree lowj, tree highi, tree highj, > + vec<operand_entry_t> *ops, > + struct range_entry *ranges) > > The function names are bad, you aren't transfering anything, but optimizing. > Please rename try_transfer_range_tests to optimize_range_tests_1 and > try_transfer_range_tests_{1,2} to optimize_range_tests_{2,3} or perhaps > better yet to optimize_range_tests_{xor,diff}. try_transfer_range_tests is changed to optimize_range_tests_1 try_transfer_range_tests_1 is changed to optimize_range_tests_xor try_transfer_range_tests_2 is changed to optimize_range_tests_diff > Also, perhaps instead of passing ranges and i and j to these two functions > you could pass struct range_entry *range1, struct range_entry *range2 > (caller would pass ranges + i, ranges + j). Updated to rangei and rangej since we use i, j, lowi, lowj etc at other places. > +/* It does some common checks for function try_transfer_range_tests_1 > and > + try_transfer_range_tests_2. > > Please adjust the comment for the renaming. Please change trans_option to > bool optimize_xor. Updated. > + if (trans_option == 1) > + { > + if (try_transfer_range_tests_1 (opcode, i, j, type, lowi, lowj, > + highi, highj, ops, ranges)) > + { > + any_changes = true; > + break; > + } > + } > + else if (trans_option == 2) > + { > + if (try_transfer_range_tests_2 (opcode, i, j, type, lowi, lowj, > + highi, highj, ops, ranges)) > + { > + any_changes = true; > + break; > + } > + } > > I'd prefer > if (optimize_xor) > any_changes > = optimize_range_tests_xor (opcode, type, lowi, lowj, highi, > highj, ops, ranges + i, ranges + j); > else > any_changes > = optimize_range_tests_xor (opcode, type, lowi, lowj, highi, > highj, ops, ranges + i, ranges + j); > if (any_changes) > break; This is incorrect. The "any_changes" should be "|=" of the return values. In the updated patch, I changed the code segment as bool changes; ... if (optimize_xor) changes = optimize_range_tests_xor (opcode, type, lowi, lowj, highi, highj, ops, ranges + i, ranges + j); else changes = optimize_range_tests_diff (opcode, type, lowi, lowj, highi, highj, ops, ranges + i, ranges + j); if (changes) { any_changes = true; break; } Is it OK? Thanks! -Zhenqiang > Ok with those changes. > > Jakub
On Tue, Oct 15, 2013 at 03:57:23PM +0800, Zhenqiang Chen wrote:
> Is it OK?
Ok, except two comments apparently still need updating.
+/* Optimize X == CST1 || X == CST2
+ if popcount (CST1 ^ CST2) == 1 into
+ (X & ~(CST1 ^ CST2)) == (CST1 & ~(CST1 ^ CST2)).
+ Similarly for ranges. E.g.
+ X != 2 && X != 3 && X != 10 && X != 11
+ will be transformed by the previous optimization into
+ (X - 2U) <= 1U && (X - 10U) <= 1U
+ and this loop can transform that into
+ ((X & ~8) - 2U) <= 1U. */
Here the example is using != and &&, so it is transformed into:
!((X - 2U) <= 1U || (X - 10U) <= 1U)
(everything is negated at the end, and note || instead of &&,
with && it could fold into !(0))
and finally into:
!(((X & ~8) - 2U) <= 1U)
Or alternatively change the first expression into:
X == 2 || X == 3 || X == 10 || X == 11
and the second to:
(X - 2U) <= 1U || (X - 10U) <= 1U
and the third keep as is.
+/* Optimize X == CST1 || X == CST2
+ if popcount (CST2 - CST1) == 1 into
+ ((X - CST1) & ~(CST2 - CST1)) == 0.
+ Similarly for ranges. E.g.
+ X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46
+ || X == 75 || X == 45
+ will be transformed by the previous optimization into
+ (X - 43U) <= 3U && (X - 75U) <= 3U
+ and this loop can transform that into
+ ((X - 43U) & ~(75U - 43U)) <= 3U. */
Here the example is using == and ||, so the only wrong thing in there
is that the second expression should be
(X - 43U) <= 3U || (X - 75U) <= 3U
Ok with those changes.
Jakub
On 10/15/13 02:12, Jakub Jelinek wrote: > On Tue, Oct 15, 2013 at 03:57:23PM +0800, Zhenqiang Chen wrote: >> Is it OK? > > Ok, except two comments apparently still need updating. > > +/* Optimize X == CST1 || X == CST2 > + if popcount (CST1 ^ CST2) == 1 into > + (X & ~(CST1 ^ CST2)) == (CST1 & ~(CST1 ^ CST2)). > + Similarly for ranges. E.g. > + X != 2 && X != 3 && X != 10 && X != 11 > + will be transformed by the previous optimization into > + (X - 2U) <= 1U && (X - 10U) <= 1U > + and this loop can transform that into > + ((X & ~8) - 2U) <= 1U. */ > > Here the example is using != and &&, so it is transformed into: > !((X - 2U) <= 1U || (X - 10U) <= 1U) > (everything is negated at the end, and note || instead of &&, > with && it could fold into !(0)) > and finally into: > !(((X & ~8) - 2U) <= 1U) > Or alternatively change the first expression into: > X == 2 || X == 3 || X == 10 || X == 11 > and the second to: > (X - 2U) <= 1U || (X - 10U) <= 1U > and the third keep as is. > > +/* Optimize X == CST1 || X == CST2 > + if popcount (CST2 - CST1) == 1 into > + ((X - CST1) & ~(CST2 - CST1)) == 0. > + Similarly for ranges. E.g. > + X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46 > + || X == 75 || X == 45 > + will be transformed by the previous optimization into > + (X - 43U) <= 3U && (X - 75U) <= 3U > + and this loop can transform that into > + ((X - 43U) & ~(75U - 43U)) <= 3U. */ > > Here the example is using == and ||, so the only wrong thing in there > is that the second expression should be > (X - 43U) <= 3U || (X - 75U) <= 3U > > Ok with those changes. I noticed that we're now including rtl.h and tm_p.h in tree-ssa-reassoc.c, which seems wrong. Jeff
On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote: > I noticed that we're now including rtl.h and tm_p.h in > tree-ssa-reassoc.c, which seems wrong. Isn't that required for BRANCH_COST use? Other option would be to add some dummy wrapper around BRANCH_COST, put that wrapper into some file that already includes rtl.h and tm_p.h and just call it from there. Jakub
On 10/15/13 10:53, Jakub Jelinek wrote: > On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote: >> I noticed that we're now including rtl.h and tm_p.h in >> tree-ssa-reassoc.c, which seems wrong. > > Isn't that required for BRANCH_COST use? > Other option would be to add some dummy wrapper around > BRANCH_COST, put that wrapper into some file that already includes rtl.h and > tm_p.h and just call it from there. Yea, looking at it now, I'm a bit surprised this hasn't been converted to a target hook which would avoid this problem entirely. jeff
On 10/15/13 02:12, Jakub Jelinek wrote: > On Tue, Oct 15, 2013 at 03:57:23PM +0800, Zhenqiang Chen wrote: >> Is it OK? > > Ok, except two comments apparently still need updating. > > +/* Optimize X == CST1 || X == CST2 > + if popcount (CST1 ^ CST2) == 1 into > + (X & ~(CST1 ^ CST2)) == (CST1 & ~(CST1 ^ CST2)). > + Similarly for ranges. E.g. > + X != 2 && X != 3 && X != 10 && X != 11 > + will be transformed by the previous optimization into > + (X - 2U) <= 1U && (X - 10U) <= 1U > + and this loop can transform that into > + ((X & ~8) - 2U) <= 1U. */ > > Here the example is using != and &&, so it is transformed into: > !((X - 2U) <= 1U || (X - 10U) <= 1U) > (everything is negated at the end, and note || instead of &&, > with && it could fold into !(0)) > and finally into: > !(((X & ~8) - 2U) <= 1U) > Or alternatively change the first expression into: > X == 2 || X == 3 || X == 10 || X == 11 > and the second to: > (X - 2U) <= 1U || (X - 10U) <= 1U > and the third keep as is. > > +/* Optimize X == CST1 || X == CST2 > + if popcount (CST2 - CST1) == 1 into > + ((X - CST1) & ~(CST2 - CST1)) == 0. > + Similarly for ranges. E.g. > + X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46 > + || X == 75 || X == 45 > + will be transformed by the previous optimization into > + (X - 43U) <= 3U && (X - 75U) <= 3U > + and this loop can transform that into > + ((X - 43U) & ~(75U - 43U)) <= 3U. */ > > Here the example is using == and ||, so the only wrong thing in there > is that the second expression should be > (X - 43U) <= 3U || (X - 75U) <= 3U > > Ok with those changes. I fixed up the comments & ChangeLog entry and committed on Zhenqiang's behalf. jeff
On Tue, Oct 15, 2013 at 6:55 PM, Jeff Law <law@redhat.com> wrote: > On 10/15/13 10:53, Jakub Jelinek wrote: >> >> On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote: >>> >>> I noticed that we're now including rtl.h and tm_p.h in >>> tree-ssa-reassoc.c, which seems wrong. >> >> >> Isn't that required for BRANCH_COST use? >> Other option would be to add some dummy wrapper around >> BRANCH_COST, put that wrapper into some file that already includes rtl.h >> and >> tm_p.h and just call it from there. > > Yea, looking at it now, I'm a bit surprised this hasn't been converted to a > target hook which would avoid this problem entirely. You got the job! ;) Richard. > jeff
On 10/16/13 02:21, Richard Biener wrote: > On Tue, Oct 15, 2013 at 6:55 PM, Jeff Law <law@redhat.com> wrote: >> On 10/15/13 10:53, Jakub Jelinek wrote: >>> >>> On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote: >>>> >>>> I noticed that we're now including rtl.h and tm_p.h in >>>> tree-ssa-reassoc.c, which seems wrong. >>> >>> >>> Isn't that required for BRANCH_COST use? >>> Other option would be to add some dummy wrapper around >>> BRANCH_COST, put that wrapper into some file that already includes rtl.h >>> and >>> tm_p.h and just call it from there. >> >> Yea, looking at it now, I'm a bit surprised this hasn't been converted to a >> target hook which would avoid this problem entirely. > > You got the job! Joys :-) What's the policy on the GDFL stuff. I've tried so hard to avoid having to worry about that rats nest that I have no idea what our policy is. Basically I just want to take the old docs for BRANCH_COST and re-use those. Is that considered kosher with all the GDFL nonsense? Jeff
On Wed, 16 Oct 2013, Jeff Law wrote: > What's the policy on the GDFL stuff. I've tried so hard to avoid having to > worry about that rats nest that I have no idea what our policy is. Basically I > just want to take the old docs for BRANCH_COST and re-use those. Is that > considered kosher with all the GDFL nonsense? When moving documentation text from the manual into target.def (so it ends up in both target.def and tm.texi, with tm.texi.in just having an @hook line to show where the regeneration of tm.texi should put the text), CC one of the people listed as docstring relicensing maintainers and ask us to approve the GPL relicensing in that case.
--- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-33.c @@ -0,0 +1,38 @@ +/* { dg-do run { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */ + +/* { dg-options "-O2 -fno-inline -fdump-tree-reassoc1-details" } */ +/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */ + +int test (int a, int b, int c) +{ + if (a == 43 || a == 75 || a == 44 || a == 78 || a == 77 || a == 46 || a == 76 || a == 45) + return b; + else + return c; +} + +int main () +{ + if (test (43, 20, 30) != 20) + __builtin_abort (); + if (test (44, 20, 30) != 20) + __builtin_abort (); + if (test (45, 20, 30) != 20) + __builtin_abort (); + if (test (46, 20, 30) != 20) + __builtin_abort (); + if (test (75, 20, 30) != 20) + __builtin_abort (); + if (test (76, 20, 30) != 20) + __builtin_abort (); + if (test (77, 20, 30) != 20) + __builtin_abort (); + if (test (78, 20, 30) != 20) + __builtin_abort (); + if (test (30, 20, 30) != 30) + __builtin_abort (); Perhaps it would be better to test more than just one value outside of the range. Say everything from -10 to 100 might be better, but you'd want to make sure the optimization can't be applied to the What about