Message ID | 20220129162334.GV2646553@tucnak |
---|---|
State | New |
Headers | show |
Series | match.pd: Fix up 1 / X for unsigned X optimization [PR104280] | expand |
On 1/29/2022 9:23 AM, Jakub Jelinek wrote: > Hi! > > On Fri, Jan 28, 2022 at 11:38:23AM -0700, Jeff Law wrote: >> Thanks. Given the original submission and most of the review work was done >> prior to stage3 closing, I went ahead and installed this on the trunk. > Unfortunately this breaks quite a lot of things. > The main problem is that GIMPLE allows EQ_EXPR etc. only with BOOLEAN_TYPE > or with TYPE_PRECISION == 1 integral type (or vector boolean). > Violating this causes verification failures in tree-cfg.cc in some cases, > in other cases wrong-code issues because before it is verified we e.g. > transform > 1U / x > into > x == 1U > and later into > x (because we assume that == type must be one of the above cases and > when it is the same type as the type of the first operand, for boolean-ish > cases it should be equivalent). > > Fixed by changing that > (eq @1 { build_one_cst (type); }) > into > (convert (eq:boolean_type_node @1 { build_one_cst (type); })) > Note, I'm not 100% sure if :boolean_type_node is required in that case, > I see some spots in match.pd that look exactly like this, while there is > e.g. (convert (le ...)) that supposedly does the right thing too. > The signed integer 1/X case doesn't need changes changes, for > (cond (le ...) ...) > le gets correctly boolean_type_node and cond should use type. > I've also reformatted it, some lines were too long, match.pd uses > indentation by 1 column instead of 2 etc. > > Bootstrapped/regtested on powerpc64le-linux and tested on the testcases > on x86_64-linux, ok for trunk? > > 2022-01-29 Jakub Jelinek <jakub@redhat.com> > Andrew Pinski <apinski@marvell.com> > > PR tree-optimization/104279 > PR tree-optimization/104280 > PR tree-optimization/104281 > * match.pd (1 / X -> X == 1 for unsigned X): Build eq with > boolean_type_node and convert to type. Formatting fixes. > > * gcc.dg/torture/pr104279.c: New test. > * gcc.dg/torture/pr104280.c: New test. > * gcc.dg/torture/pr104281.c: New test. Ugh. Sorry for the breakage. THat makes me wonder how the original patch was tested at all. I think both forms will DTRT (eq:boolean_type_node ...) vs (convert (le ...)) . OK for the trunk as well as the testcase fix. Jeff
Sincere apologies for the issues. I wasn't aware of the need for a cast but after reading the PRs, I understand that now. On the other hand, the incorrect test case was simply a major oversight on my part. I'll be sure to be more careful next time. Thanks for the fixes!
On Sat, Jan 29, 2022 at 6:30 PM Zhao Wei Liew via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Sincere apologies for the issues. I wasn't aware of the need for a cast but > after reading the PRs, I understand that now. On the other hand, the > incorrect test case was simply a major oversight on my part. > > I'll be sure to be more careful next time. Well I think gimple-match should have caught the cast issue really. I filed https://gcc.gnu.org/PR104286 for that. I hope to submit a patch for stage 1 to catch that so it will be harder to happen in the first place. Thanks, Andrew > > Thanks for the fixes!
> Unfortunately this breaks quite a lot of things.
Right, for example in Ada where we now happily turn a division by zero, which
should raise an exception with -gnatp, into nonsense. Do we really need this
rather useless optimization in GCC? Blindly mimicing LLVM is not a reason...
I have installed the attached testcase, which now fails because of the change.
* gnat.dg/div_zero.adb: New test.
On Sat, 29 Jan 2022, Jakub Jelinek wrote: > Hi! > > On Fri, Jan 28, 2022 at 11:38:23AM -0700, Jeff Law wrote: > > Thanks. Given the original submission and most of the review work was done > > prior to stage3 closing, I went ahead and installed this on the trunk. > > Unfortunately this breaks quite a lot of things. > The main problem is that GIMPLE allows EQ_EXPR etc. only with BOOLEAN_TYPE > or with TYPE_PRECISION == 1 integral type (or vector boolean). > Violating this causes verification failures in tree-cfg.cc in some cases, > in other cases wrong-code issues because before it is verified we e.g. > transform > 1U / x > into > x == 1U > and later into > x (because we assume that == type must be one of the above cases and > when it is the same type as the type of the first operand, for boolean-ish > cases it should be equivalent). > > Fixed by changing that > (eq @1 { build_one_cst (type); }) > into > (convert (eq:boolean_type_node @1 { build_one_cst (type); })) > Note, I'm not 100% sure if :boolean_type_node is required in that case, It shouldn't be necessary. Unfortunately it's hard for genmatch to statically detect this case as bogus ... Thanks for spotting and fixing this. Richard. > I see some spots in match.pd that look exactly like this, while there is > e.g. (convert (le ...)) that supposedly does the right thing too. > The signed integer 1/X case doesn't need changes changes, for > (cond (le ...) ...) > le gets correctly boolean_type_node and cond should use type. > I've also reformatted it, some lines were too long, match.pd uses > indentation by 1 column instead of 2 etc. > > Bootstrapped/regtested on powerpc64le-linux and tested on the testcases > on x86_64-linux, ok for trunk? > > 2022-01-29 Jakub Jelinek <jakub@redhat.com> > Andrew Pinski <apinski@marvell.com> > > PR tree-optimization/104279 > PR tree-optimization/104280 > PR tree-optimization/104281 > * match.pd (1 / X -> X == 1 for unsigned X): Build eq with > boolean_type_node and convert to type. Formatting fixes. > > * gcc.dg/torture/pr104279.c: New test. > * gcc.dg/torture/pr104280.c: New test. > * gcc.dg/torture/pr104281.c: New test. > > --- gcc/match.pd.jj 2022-01-29 11:11:39.316628007 +0100 > +++ gcc/match.pd 2022-01-29 12:19:46.662096678 +0100 > @@ -435,18 +435,22 @@ (define_operator_list SYNC_FETCH_AND_AND > && TYPE_UNSIGNED (type)) > (trunc_divmod @0 @1)))) > > - /* 1 / X -> X == 1 for unsigned integer X. > - 1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X. > - But not for 1 / 0 so that we can get proper warnings and errors, > - and not for 1-bit integers as they are edge cases better handled elsewhere. */ > +/* 1 / X -> X == 1 for unsigned integer X. > + 1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X. > + But not for 1 / 0 so that we can get proper warnings and errors, > + and not for 1-bit integers as they are edge cases better handled > + elsewhere. */ > (simplify > - (trunc_div integer_onep@0 @1) > - (if (INTEGRAL_TYPE_P (type) && !integer_zerop (@1) && TYPE_PRECISION (type) > 1) > - (if (TYPE_UNSIGNED (type)) > - (eq @1 { build_one_cst (type); }) > - (with { tree utype = unsigned_type_for (type); } > - (cond (le (plus (convert:utype @1) { build_one_cst (utype); }) { build_int_cst (utype, 2); }) > - @1 { build_zero_cst (type); }))))) > + (trunc_div integer_onep@0 @1) > + (if (INTEGRAL_TYPE_P (type) > + && !integer_zerop (@1) > + && TYPE_PRECISION (type) > 1) > + (if (TYPE_UNSIGNED (type)) > + (convert (eq:boolean_type_node @1 { build_one_cst (type); })) > + (with { tree utype = unsigned_type_for (type); } > + (cond (le (plus (convert:utype @1) { build_one_cst (utype); }) > + { build_int_cst (utype, 2); }) > + @1 { build_zero_cst (type); }))))) > > /* Combine two successive divisions. Note that combining ceil_div > and floor_div is trickier and combining round_div even more so. */ > --- gcc/testsuite/gcc.dg/torture/pr104279.c.jj 2022-01-29 12:25:36.388174312 +0100 > +++ gcc/testsuite/gcc.dg/torture/pr104279.c 2022-01-29 12:25:53.206937588 +0100 > @@ -0,0 +1,12 @@ > +/* PR tree-optimization/104279 */ > +/* { dg-do compile } */ > + > +unsigned a, b; > + > +int > +main () > +{ > + b = ~(0 || ~0); > + a = ~b / ~a; > + return 0; > +} > --- gcc/testsuite/gcc.dg/torture/pr104280.c.jj 2022-01-29 12:24:36.190021595 +0100 > +++ gcc/testsuite/gcc.dg/torture/pr104280.c 2022-01-29 12:25:28.681282783 +0100 > @@ -0,0 +1,16 @@ > +/* PR tree-optimization/104280 */ > +/* { dg-do run } */ > + > +int > +foo (unsigned b, int c) > +{ > + return b / c; > +} > + > +int > +main () > +{ > + if (foo (1, 2) != 0) > + __builtin_abort (); > + return 0; > +} > --- gcc/testsuite/gcc.dg/torture/pr104281.c.jj 2022-01-29 12:27:29.840577473 +0100 > +++ gcc/testsuite/gcc.dg/torture/pr104281.c 2022-01-29 12:27:24.526652267 +0100 > @@ -0,0 +1,22 @@ > +/* PR tree-optimization/104281 */ > +/* { dg-do run } */ > + > +unsigned a = 1; > +int b, c = 2; > +long d; > + > +int > +main () > +{ > + while (1) > + { > + int m = a; > + L: > + a = ~(-(m || b & d)); > + b = ((1 ^ a) / c); > + if (b) > + goto L; > + break; > + } > + return 0; > +} > > > Jakub > >
On Sat, 29 Jan 2022, Andrew Pinski wrote: > On Sat, Jan 29, 2022 at 6:30 PM Zhao Wei Liew via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > > > Sincere apologies for the issues. I wasn't aware of the need for a cast but > > after reading the PRs, I understand that now. On the other hand, the > > incorrect test case was simply a major oversight on my part. > > > > I'll be sure to be more careful next time. > > Well I think gimple-match should have caught the cast issue really. I > filed https://gcc.gnu.org/PR104286 for that. I hope to submit a patch > for stage 1 to catch that so it will be harder to happen in the first > place. I suppose that we could change verify_gimple_* to receive gimple_match_op (or simply exploded args) and we could then, with -fchecking, verify each such built match-op (some passes also build "custom" ones they feed into match-and-simplify). Richard.
On Mon, Jan 31, 2022 at 12:17 AM Eric Botcazou via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > > Unfortunately this breaks quite a lot of things. > > Right, for example in Ada where we now happily turn a division by zero, which > should raise an exception with -gnatp, into nonsense. Do we really need this > rather useless optimization in GCC? Blindly mimicing LLVM is not a reason... Since I didn't see anyone responding to this problem, I filed PR 104356 to record the regression. And yes this should be handled correctly. Thanks, Andrew Pinski > > I have installed the attached testcase, which now fails because of the change. > > * gnat.dg/div_zero.adb: New test. > > -- > Eric Botcazou
> Since I didn't see anyone responding to this problem, I filed PR > 104356 to record the regression. > And yes this should be handled correctly. Thanks. Note that we have an example of this in libgcc/libgcc2.c too.
On Thu, 3 Feb 2022, Eric Botcazou wrote: > > Since I didn't see anyone responding to this problem, I filed PR > > 104356 to record the regression. > > And yes this should be handled correctly. > > Thanks. Note that we have an example of this in libgcc/libgcc2.c too. I assumed this was handled by the followups to the patch ... Well, yes, we have to fix it. I will share my thoughts when coming along the bugreport. Richard.
> Well, yes, we have to fix it. I will share my thoughts when coming > along the bugreport. IMO we should simply scrap it, as it does not serve any useful purpose, breaks a very ancient and useful idiom and also introduces an artificial discrepancy between 1/X and 2/X for example.
On Thu, Feb 03, 2022 at 10:06:42AM +0100, Eric Botcazou wrote: > > Well, yes, we have to fix it. I will share my thoughts when coming > > along the bugreport. > > IMO we should simply scrap it, as it does not serve any useful purpose, breaks > a very ancient and useful idiom and also introduces an artificial discrepancy > between 1/X and 2/X for example. That doesn't make sense. The optimization can be useful, it doesn't have to be for user written y = 1 / x; but can appear through inlining or some other optimizations, e.g. jump threading and suddenly we have constant 1 in some bb, if it a never executed at runtime block, it will be likely shorter because of the optimization, if it is not, then it will be cheaper. And, this is definitely not the first optimization that assumes undefined behavior in integer division should not happen, lots of other optimizations do the same thing. E.g. consider unsigned foo (unsigned x, unsigned y) { if (x >= 2) return 0; if (x == 1) y += 4; return y / x; } Here the ranger optimizes away the division, in GCC11 in vrp1 pass, in GCC12 in evrp pass already. Even in match.pd I vaguely remember one or two optimizations where it also relied on division by zero being undefined. Jakub
On Thu, 3 Feb 2022, Jakub Jelinek wrote: > On Thu, Feb 03, 2022 at 10:06:42AM +0100, Eric Botcazou wrote: > > > Well, yes, we have to fix it. I will share my thoughts when coming > > > along the bugreport. > > > > IMO we should simply scrap it, as it does not serve any useful purpose, breaks > > a very ancient and useful idiom and also introduces an artificial discrepancy > > between 1/X and 2/X for example. > > That doesn't make sense. > The optimization can be useful, it doesn't have to be for user written > y = 1 / x; but can appear through inlining or some other optimizations, e.g. > jump threading and suddenly we have constant 1 in some bb, if it a never > executed at runtime block, it will be likely shorter because of the > optimization, if it is not, then it will be cheaper. > And, this is definitely not the first optimization that assumes undefined > behavior in integer division should not happen, lots of other optimizations > do the same thing. > E.g. consider > unsigned > foo (unsigned x, unsigned y) > { > if (x >= 2) > return 0; > if (x == 1) > y += 4; > return y / x; > } > Here the ranger optimizes away the division, in GCC11 in vrp1 pass, in > GCC12 in evrp pass already. Even in match.pd I vaguely remember one or two > optimizations where it also relied on division by zero being undefined. Yes, we definitely have multiple of those cases. I do think preserving "an idiom", for example literal 0/0 or all x/0 might be worth considering. But I also think we have to sort out different language standards requirements vs. the middle-end and whos responsible for making sure we adhere here. That said, for GCC 12 we can take shortcuts and one option is of course to revert (but we don't have to rush). oh, and even GCC 4.3 optimizes int main() { volatile int tem = 0/0; } to just movl $0, -4(%rsp) even with -fnon-call-exceptions -fexceptions. Richard.
> The optimization can be useful, it doesn't have to be for user written > y = 1 / x; but can appear through inlining or some other optimizations, e.g. > jump threading and suddenly we have constant 1 in some bb, if it a never > executed at runtime block, it will be likely shorter because of the > optimization, if it is not, then it will be cheaper. The hundreds of people having worked on GCC for the past 30 years must have been stupid then, how come have they missed such a great optimization? ;-)
On Thu, Feb 03, 2022 at 10:40:10AM +0100, Richard Biener wrote: > Yes, we definitely have multiple of those cases. I do think > preserving "an idiom", for example literal 0/0 or all x/0 might be > worth considering. But I also think we have to sort out different > language standards requirements vs. the middle-end and whos > responsible for making sure we adhere here. I think we try to preserve literal 0/0 and x/0, including this optimization which punts if the divisor is literal. But, for literal 0/0 and x/0 we alsy emit -Wdiv-by-zero warning, maybe that was the reason why libgcc2.c did it differently. Jakub
On Thu, 3 Feb 2022, Jakub Jelinek wrote: > On Thu, Feb 03, 2022 at 10:40:10AM +0100, Richard Biener wrote: > > Yes, we definitely have multiple of those cases. I do think > > preserving "an idiom", for example literal 0/0 or all x/0 might be > > worth considering. But I also think we have to sort out different > > language standards requirements vs. the middle-end and whos > > responsible for making sure we adhere here. > > I think we try to preserve literal 0/0 and x/0, including this > optimization which punts if the divisor is literal. But, for literal > 0/0 and x/0 we alsy emit -Wdiv-by-zero warning, maybe that was the > reason why libgcc2.c did it differently. Sure, I bet the code is quite old since we very likely propagate the equality and optimize the division away since a long time. Now that #pragma GCC diagnostic is a thing we can probably write literal 0/0 if we choose to preserve that until the end. But as said, for the libgcc2.c case I'd simply remove all of it. Richard.
On Thu, Feb 03, 2022 at 12:03:15PM +0100, Richard Biener wrote:
> But as said, for the libgcc2.c case I'd simply remove all of it.
I can't read RMS' mind, it is indeed UB, so we can do anything, but I bet
it was just a QoI attempt, when (most of the time) normal single-word
or smaller division for / 0 behaves certain way (SIGFPE with FPE_INTDIV,
or being ignored, or whatever else) that it would be nice if the multi-word
division behaved likewise.
On the platforms where it is SIGFPE with FPE_INTDIV, raising that would
help people figure out what's going on.
Jakub
On Thu, 3 Feb 2022, Jakub Jelinek wrote: > On Thu, Feb 03, 2022 at 12:03:15PM +0100, Richard Biener wrote: > > But as said, for the libgcc2.c case I'd simply remove all of it. > > I can't read RMS' mind, it is indeed UB, so we can do anything, but I bet > it was just a QoI attempt, when (most of the time) normal single-word > or smaller division for / 0 behaves certain way (SIGFPE with FPE_INTDIV, > or being ignored, or whatever else) that it would be nice if the multi-word > division behaved likewise. > On the platforms where it is SIGFPE with FPE_INTDIV, raising that would > help people figure out what's going on. Yes, I think the intent is clear. The question is whether we should re-instantiate the clear intent of preserving a literal / 0 as well (for C, without -fnon-call-exceptions). That said, I'm fine with the asms but they are ugly ;) Richard.
On Thu, Feb 03, 2022 at 01:13:29PM +0100, Richard Biener wrote: > On Thu, 3 Feb 2022, Jakub Jelinek wrote: > > > On Thu, Feb 03, 2022 at 12:03:15PM +0100, Richard Biener wrote: > > > But as said, for the libgcc2.c case I'd simply remove all of it. > > > > I can't read RMS' mind, it is indeed UB, so we can do anything, but I bet > > it was just a QoI attempt, when (most of the time) normal single-word > > or smaller division for / 0 behaves certain way (SIGFPE with FPE_INTDIV, > > or being ignored, or whatever else) that it would be nice if the multi-word > > division behaved likewise. > > On the platforms where it is SIGFPE with FPE_INTDIV, raising that would > > help people figure out what's going on. > > Yes, I think the intent is clear. The question is whether we should > re-instantiate the clear intent of preserving a literal / 0 as well > (for C, without -fnon-call-exceptions). I think currently it is path isolation that breaks it. Bet in a good intent that everything after it is UB and allowing the stmts after it to be optimized away. Unfortunately we don't (easily) know whether the division by zero was literal already in the source or propagated through inlining etc. or even jump threading. Especially in the last case it is nice to be able to emit as short code as possible, which __builtin_trap () serves nicely on many targets. Emitting both a 1 / 0 insn followed by __builtin_trap () might work with perhaps some RTL optimization to optimize away the trap if the target is known to trap on division by zero. Jakub
> Yes, I think the intent is clear. The question is whether we should > re-instantiate the clear intent of preserving a literal / 0 as well > (for C, without -fnon-call-exceptions). Note that the code is precisely compiled with -fnon-call-exceptions: ifeq ($(LIB2_DIVMOD_EXCEPTION_FLAGS),) # Provide default flags for compiling divmod functions, if they haven't been # set already by a target-specific Makefile fragment. LIB2_DIVMOD_EXCEPTION_FLAGS := -fexceptions -fnon-call-exceptions endif # Build LIB2_DIVMOD_FUNCS. lib2-divmod-o = $(patsubst %,%$(objext),$(LIB2_DIVMOD_FUNCS)) $(lib2-divmod-o): %$(objext): $(srcdir)/libgcc2.c $(gcc_compile) -DL$* -c $< \ $(LIB2_DIVMOD_EXCEPTION_FLAGS) $(vis_hide) libgcc-objects += $(lib2-divmod-o) ifeq ($(enable_shared),yes) lib2-divmod-s-o = $(patsubst %,%_s$(objext),$(LIB2_DIVMOD_FUNCS)) $(lib2-divmod-s-o): %_s$(objext): $(srcdir)/libgcc2.c $(gcc_s_compile) -DL$* -c $< \ $(LIB2_DIVMOD_EXCEPTION_FLAGS) libgcc-s-objects += $(lib2-divmod-s-o) endif so it seems like we were about to reinvent the wheel here. ;-)
> Well, yes, we have to fix it.
Here's the fix we agreed upon in the audit trail, OK for the mainline?
PR tree-optimization/104356
* match.pd (X / bool_range_Y is X): Add guard.
(X / X is one): Likewise.
(X / abs (X) is X < 0 ? -1 : 1): Likewise.
(X / -X is -1): Likewise.
(1 / X -> X == 1): Likewise.
On Fri, Feb 04, 2022 at 10:53:22AM +0100, Eric Botcazou wrote: > > Well, yes, we have to fix it. > > Here's the fix we agreed upon in the audit trail, OK for the mainline? > > PR tree-optimization/104356 > * match.pd (X / bool_range_Y is X): Add guard. > (X / X is one): Likewise. > (X / abs (X) is X < 0 ? -1 : 1): Likewise. > (X / -X is -1): Likewise. > (1 / X -> X == 1): Likewise. Looks mostly good, just a few nits. > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -401,27 +401,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > /* X / bool_range_Y is X. */ > (simplify > (div @0 SSA_NAME@1) > - (if (INTEGRAL_TYPE_P (type) && ssa_name_has_boolean_range (@1)) > + (if (INTEGRAL_TYPE_P (type) > + && ssa_name_has_boolean_range (@1) > + && !flag_non_call_exceptions) ssa_name_has_boolean_range call is certainly more expensive than !flag_non_call_exceptions check, can you swap those two? > @@ -444,6 +452,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (trunc_div integer_onep@0 @1) > (if (INTEGRAL_TYPE_P (type) > && !integer_zerop (@1) > + && (!flag_non_call_exceptions || tree_expr_nonzero_p (@1)) > && TYPE_PRECISION (type) > 1) And similarly, TYPE_PRECISION (type) > 1 check is very cheap, can it be done before the && !integer_zerop (@1) line? I admit this one is already preexisting, but tree_expr_nonzero_p can be quite expensive. Otherwise LGTM. Jakub
> > --- a/gcc/match.pd > > +++ b/gcc/match.pd > > @@ -401,27 +401,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > > > /* X / bool_range_Y is X. */ > > (simplify > > > > (div @0 SSA_NAME@1) > > > > - (if (INTEGRAL_TYPE_P (type) && ssa_name_has_boolean_range (@1)) > > + (if (INTEGRAL_TYPE_P (type) > > + && ssa_name_has_boolean_range (@1) > > + && !flag_non_call_exceptions) > > ssa_name_has_boolean_range call is certainly more expensive than > !flag_non_call_exceptions check, can you swap those two? But !flag_non_call_exceptions is (almost) always true for the C family of languages, so you're going to penalize them by doing this. > And similarly, TYPE_PRECISION (type) > 1 check is very cheap, can > it be done before the && !integer_zerop (@1) line? Yes, it clearly belongs there.
On Fri, Feb 04, 2022 at 11:14:05AM +0100, Eric Botcazou wrote: > > > --- a/gcc/match.pd > > > +++ b/gcc/match.pd > > > @@ -401,27 +401,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > > > > > /* X / bool_range_Y is X. */ > > > (simplify > > > > > > (div @0 SSA_NAME@1) > > > > > > - (if (INTEGRAL_TYPE_P (type) && ssa_name_has_boolean_range (@1)) > > > + (if (INTEGRAL_TYPE_P (type) > > > + && ssa_name_has_boolean_range (@1) > > > + && !flag_non_call_exceptions) > > > > ssa_name_has_boolean_range call is certainly more expensive than > > !flag_non_call_exceptions check, can you swap those two? > > But !flag_non_call_exceptions is (almost) always true for the C family of > languages, so you're going to penalize them by doing this. True, but much less so than the other order penalizing Ada/Go. > > And similarly, TYPE_PRECISION (type) > 1 check is very cheap, can > > it be done before the && !integer_zerop (@1) line? > > Yes, it clearly belongs there. Jakub
On Fri, Feb 04, 2022 at 11:26:30AM +0100, Jakub Jelinek via Gcc-patches wrote: > On Fri, Feb 04, 2022 at 11:14:05AM +0100, Eric Botcazou wrote: > > > > --- a/gcc/match.pd > > > > +++ b/gcc/match.pd > > > > @@ -401,27 +401,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > > > > > > > /* X / bool_range_Y is X. */ > > > > (simplify > > > > > > > > (div @0 SSA_NAME@1) > > > > > > > > - (if (INTEGRAL_TYPE_P (type) && ssa_name_has_boolean_range (@1)) > > > > + (if (INTEGRAL_TYPE_P (type) > > > > + && ssa_name_has_boolean_range (@1) > > > > + && !flag_non_call_exceptions) > > > > > > ssa_name_has_boolean_range call is certainly more expensive than > > > !flag_non_call_exceptions check, can you swap those two? > > > > But !flag_non_call_exceptions is (almost) always true for the C family of > > languages, so you're going to penalize them by doing this. > > True, but much less so than the other order penalizing Ada/Go. > > > > And similarly, TYPE_PRECISION (type) > 1 check is very cheap, can > > > it be done before the && !integer_zerop (@1) line? > > > > Yes, it clearly belongs there. Anyway, not a big deal. Jakub
--- gcc/match.pd.jj 2022-01-29 11:11:39.316628007 +0100 +++ gcc/match.pd 2022-01-29 12:19:46.662096678 +0100 @@ -435,18 +435,22 @@ (define_operator_list SYNC_FETCH_AND_AND && TYPE_UNSIGNED (type)) (trunc_divmod @0 @1)))) - /* 1 / X -> X == 1 for unsigned integer X. - 1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X. - But not for 1 / 0 so that we can get proper warnings and errors, - and not for 1-bit integers as they are edge cases better handled elsewhere. */ +/* 1 / X -> X == 1 for unsigned integer X. + 1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X. + But not for 1 / 0 so that we can get proper warnings and errors, + and not for 1-bit integers as they are edge cases better handled + elsewhere. */ (simplify - (trunc_div integer_onep@0 @1) - (if (INTEGRAL_TYPE_P (type) && !integer_zerop (@1) && TYPE_PRECISION (type) > 1) - (if (TYPE_UNSIGNED (type)) - (eq @1 { build_one_cst (type); }) - (with { tree utype = unsigned_type_for (type); } - (cond (le (plus (convert:utype @1) { build_one_cst (utype); }) { build_int_cst (utype, 2); }) - @1 { build_zero_cst (type); }))))) + (trunc_div integer_onep@0 @1) + (if (INTEGRAL_TYPE_P (type) + && !integer_zerop (@1) + && TYPE_PRECISION (type) > 1) + (if (TYPE_UNSIGNED (type)) + (convert (eq:boolean_type_node @1 { build_one_cst (type); })) + (with { tree utype = unsigned_type_for (type); } + (cond (le (plus (convert:utype @1) { build_one_cst (utype); }) + { build_int_cst (utype, 2); }) + @1 { build_zero_cst (type); }))))) /* Combine two successive divisions. Note that combining ceil_div and floor_div is trickier and combining round_div even more so. */ --- gcc/testsuite/gcc.dg/torture/pr104279.c.jj 2022-01-29 12:25:36.388174312 +0100 +++ gcc/testsuite/gcc.dg/torture/pr104279.c 2022-01-29 12:25:53.206937588 +0100 @@ -0,0 +1,12 @@ +/* PR tree-optimization/104279 */ +/* { dg-do compile } */ + +unsigned a, b; + +int +main () +{ + b = ~(0 || ~0); + a = ~b / ~a; + return 0; +} --- gcc/testsuite/gcc.dg/torture/pr104280.c.jj 2022-01-29 12:24:36.190021595 +0100 +++ gcc/testsuite/gcc.dg/torture/pr104280.c 2022-01-29 12:25:28.681282783 +0100 @@ -0,0 +1,16 @@ +/* PR tree-optimization/104280 */ +/* { dg-do run } */ + +int +foo (unsigned b, int c) +{ + return b / c; +} + +int +main () +{ + if (foo (1, 2) != 0) + __builtin_abort (); + return 0; +} --- gcc/testsuite/gcc.dg/torture/pr104281.c.jj 2022-01-29 12:27:29.840577473 +0100 +++ gcc/testsuite/gcc.dg/torture/pr104281.c 2022-01-29 12:27:24.526652267 +0100 @@ -0,0 +1,22 @@ +/* PR tree-optimization/104281 */ +/* { dg-do run } */ + +unsigned a = 1; +int b, c = 2; +long d; + +int +main () +{ + while (1) + { + int m = a; + L: + a = ~(-(m || b & d)); + b = ((1 ^ a) / c); + if (b) + goto L; + break; + } + return 0; +}