diff mbox series

match.pd: Fix up 1 / X for unsigned X optimization [PR104280]

Message ID 20220129162334.GV2646553@tucnak
State New
Headers show
Series match.pd: Fix up 1 / X for unsigned X optimization [PR104280] | expand

Commit Message

Jakub Jelinek Jan. 29, 2022, 4:23 p.m. UTC
Hi!

On Fri, Jan 28, 2022 at 11:38:23AM -0700, Jeff Law wrote:
> Thanks.  Given the original submission and most of the review work was done
> prior to stage3 closing, I went ahead and installed this on the trunk.

Unfortunately this breaks quite a lot of things.
The main problem is that GIMPLE allows EQ_EXPR etc. only with BOOLEAN_TYPE
or with TYPE_PRECISION == 1 integral type (or vector boolean).
Violating this causes verification failures in tree-cfg.cc in some cases,
in other cases wrong-code issues because before it is verified we e.g.
transform
1U / x
into
x == 1U
and later into
x (because we assume that == type must be one of the above cases and
when it is the same type as the type of the first operand, for boolean-ish
cases it should be equivalent).

Fixed by changing that
(eq @1 { build_one_cst (type); })
into
(convert (eq:boolean_type_node @1 { build_one_cst (type); }))
Note, I'm not 100% sure if :boolean_type_node is required in that case,
I see some spots in match.pd that look exactly like this, while there is
e.g. (convert (le ...)) that supposedly does the right thing too.
The signed integer 1/X case doesn't need changes changes, for
(cond (le ...) ...)
le gets correctly boolean_type_node and cond should use type.
I've also reformatted it, some lines were too long, match.pd uses
indentation by 1 column instead of 2 etc.

Bootstrapped/regtested on powerpc64le-linux and tested on the testcases
on x86_64-linux, ok for trunk?

2022-01-29  Jakub Jelinek  <jakub@redhat.com>
	    Andrew Pinski  <apinski@marvell.com>

	PR tree-optimization/104279
	PR tree-optimization/104280
	PR tree-optimization/104281
	* match.pd (1 / X -> X == 1 for unsigned X): Build eq with
	boolean_type_node and convert to type.  Formatting fixes.

	* gcc.dg/torture/pr104279.c: New test.
	* gcc.dg/torture/pr104280.c: New test.
	* gcc.dg/torture/pr104281.c: New test.



	Jakub

Comments

Jeff Law Jan. 29, 2022, 4:48 p.m. UTC | #1
On 1/29/2022 9:23 AM, Jakub Jelinek wrote:
> Hi!
>
> On Fri, Jan 28, 2022 at 11:38:23AM -0700, Jeff Law wrote:
>> Thanks.  Given the original submission and most of the review work was done
>> prior to stage3 closing, I went ahead and installed this on the trunk.
> Unfortunately this breaks quite a lot of things.
> The main problem is that GIMPLE allows EQ_EXPR etc. only with BOOLEAN_TYPE
> or with TYPE_PRECISION == 1 integral type (or vector boolean).
> Violating this causes verification failures in tree-cfg.cc in some cases,
> in other cases wrong-code issues because before it is verified we e.g.
> transform
> 1U / x
> into
> x == 1U
> and later into
> x (because we assume that == type must be one of the above cases and
> when it is the same type as the type of the first operand, for boolean-ish
> cases it should be equivalent).
>
> Fixed by changing that
> (eq @1 { build_one_cst (type); })
> into
> (convert (eq:boolean_type_node @1 { build_one_cst (type); }))
> Note, I'm not 100% sure if :boolean_type_node is required in that case,
> I see some spots in match.pd that look exactly like this, while there is
> e.g. (convert (le ...)) that supposedly does the right thing too.
> The signed integer 1/X case doesn't need changes changes, for
> (cond (le ...) ...)
> le gets correctly boolean_type_node and cond should use type.
> I've also reformatted it, some lines were too long, match.pd uses
> indentation by 1 column instead of 2 etc.
>
> Bootstrapped/regtested on powerpc64le-linux and tested on the testcases
> on x86_64-linux, ok for trunk?
>
> 2022-01-29  Jakub Jelinek  <jakub@redhat.com>
> 	    Andrew Pinski  <apinski@marvell.com>
>
> 	PR tree-optimization/104279
> 	PR tree-optimization/104280
> 	PR tree-optimization/104281
> 	* match.pd (1 / X -> X == 1 for unsigned X): Build eq with
> 	boolean_type_node and convert to type.  Formatting fixes.
>
> 	* gcc.dg/torture/pr104279.c: New test.
> 	* gcc.dg/torture/pr104280.c: New test.
> 	* gcc.dg/torture/pr104281.c: New test.
Ugh.  Sorry for the breakage.  THat makes me wonder how the original 
patch was tested at all.

I think both forms will DTRT (eq:boolean_type_node ...) vs (convert (le 
...)) .

OK for the trunk as well as the testcase fix.

Jeff
Zhao Wei Liew Jan. 30, 2022, 2:28 a.m. UTC | #2
Sincere apologies for the issues. I wasn't aware of the need for a cast but
after reading the PRs, I understand that now. On the other hand, the
incorrect test case was simply a major oversight on my part.

I'll be sure to be more careful next time.

Thanks for the fixes!
Andrew Pinski Jan. 30, 2022, 2:38 a.m. UTC | #3
On Sat, Jan 29, 2022 at 6:30 PM Zhao Wei Liew via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Sincere apologies for the issues. I wasn't aware of the need for a cast but
> after reading the PRs, I understand that now. On the other hand, the
> incorrect test case was simply a major oversight on my part.
>
> I'll be sure to be more careful next time.

Well I think gimple-match should have caught the cast issue really. I
filed https://gcc.gnu.org/PR104286 for that. I hope to submit a patch
for stage 1 to catch that so it will be harder to happen in the first
place.

Thanks,
Andrew

>
> Thanks for the fixes!
Eric Botcazou Jan. 31, 2022, 8:16 a.m. UTC | #4
> Unfortunately this breaks quite a lot of things.

Right, for example in Ada where we now happily turn a division by zero, which 
should raise an exception with -gnatp, into nonsense.  Do we really need this 
rather useless optimization in GCC?  Blindly mimicing LLVM is not a reason...

I have installed the attached testcase, which now fails because of the change.

	* gnat.dg/div_zero.adb: New test.
Richard Biener Jan. 31, 2022, 8:28 a.m. UTC | #5
On Sat, 29 Jan 2022, Jakub Jelinek wrote:

> Hi!
> 
> On Fri, Jan 28, 2022 at 11:38:23AM -0700, Jeff Law wrote:
> > Thanks.  Given the original submission and most of the review work was done
> > prior to stage3 closing, I went ahead and installed this on the trunk.
> 
> Unfortunately this breaks quite a lot of things.
> The main problem is that GIMPLE allows EQ_EXPR etc. only with BOOLEAN_TYPE
> or with TYPE_PRECISION == 1 integral type (or vector boolean).
> Violating this causes verification failures in tree-cfg.cc in some cases,
> in other cases wrong-code issues because before it is verified we e.g.
> transform
> 1U / x
> into
> x == 1U
> and later into
> x (because we assume that == type must be one of the above cases and
> when it is the same type as the type of the first operand, for boolean-ish
> cases it should be equivalent).
> 
> Fixed by changing that
> (eq @1 { build_one_cst (type); })
> into
> (convert (eq:boolean_type_node @1 { build_one_cst (type); }))
> Note, I'm not 100% sure if :boolean_type_node is required in that case,

It shouldn't be necessary.  Unfortunately it's hard for genmatch to
statically detect this case as bogus ...

Thanks for spotting and fixing this.

Richard.

> I see some spots in match.pd that look exactly like this, while there is
> e.g. (convert (le ...)) that supposedly does the right thing too.
> The signed integer 1/X case doesn't need changes changes, for
> (cond (le ...) ...)
> le gets correctly boolean_type_node and cond should use type.
> I've also reformatted it, some lines were too long, match.pd uses
> indentation by 1 column instead of 2 etc.
> 
> Bootstrapped/regtested on powerpc64le-linux and tested on the testcases
> on x86_64-linux, ok for trunk?
> 
> 2022-01-29  Jakub Jelinek  <jakub@redhat.com>
> 	    Andrew Pinski  <apinski@marvell.com>
> 
> 	PR tree-optimization/104279
> 	PR tree-optimization/104280
> 	PR tree-optimization/104281
> 	* match.pd (1 / X -> X == 1 for unsigned X): Build eq with
> 	boolean_type_node and convert to type.  Formatting fixes.
> 
> 	* gcc.dg/torture/pr104279.c: New test.
> 	* gcc.dg/torture/pr104280.c: New test.
> 	* gcc.dg/torture/pr104281.c: New test.
> 
> --- gcc/match.pd.jj	2022-01-29 11:11:39.316628007 +0100
> +++ gcc/match.pd	2022-01-29 12:19:46.662096678 +0100
> @@ -435,18 +435,22 @@ (define_operator_list SYNC_FETCH_AND_AND
>         && TYPE_UNSIGNED (type))
>     (trunc_divmod @0 @1))))
>  
> - /* 1 / X -> X == 1 for unsigned integer X.
> -    1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X.
> -    But not for 1 / 0 so that we can get proper warnings and errors,
> -    and not for 1-bit integers as they are edge cases better handled elsewhere. */
> +/* 1 / X -> X == 1 for unsigned integer X.
> +   1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X.
> +   But not for 1 / 0 so that we can get proper warnings and errors,
> +   and not for 1-bit integers as they are edge cases better handled
> +   elsewhere.  */
>  (simplify
> -  (trunc_div integer_onep@0 @1)
> -  (if (INTEGRAL_TYPE_P (type) && !integer_zerop (@1) && TYPE_PRECISION (type) > 1)
> -    (if (TYPE_UNSIGNED (type))
> -      (eq @1 { build_one_cst (type); })
> -      (with { tree utype = unsigned_type_for (type); }
> -        (cond (le (plus (convert:utype @1) { build_one_cst (utype); }) { build_int_cst (utype, 2); })
> -          @1 { build_zero_cst (type); })))))
> + (trunc_div integer_onep@0 @1)
> + (if (INTEGRAL_TYPE_P (type)
> +      && !integer_zerop (@1)
> +      && TYPE_PRECISION (type) > 1)
> +  (if (TYPE_UNSIGNED (type))
> +   (convert (eq:boolean_type_node @1 { build_one_cst (type); }))
> +   (with { tree utype = unsigned_type_for (type); }
> +    (cond (le (plus (convert:utype @1) { build_one_cst (utype); })
> +	      { build_int_cst (utype, 2); })
> +     @1 { build_zero_cst (type); })))))
>  
>  /* Combine two successive divisions.  Note that combining ceil_div
>     and floor_div is trickier and combining round_div even more so.  */
> --- gcc/testsuite/gcc.dg/torture/pr104279.c.jj	2022-01-29 12:25:36.388174312 +0100
> +++ gcc/testsuite/gcc.dg/torture/pr104279.c	2022-01-29 12:25:53.206937588 +0100
> @@ -0,0 +1,12 @@
> +/* PR tree-optimization/104279 */
> +/* { dg-do compile } */
> +
> +unsigned a, b;
> +
> +int
> +main ()
> +{
> +  b = ~(0 || ~0);
> +  a = ~b / ~a;
> +  return 0;
> +}
> --- gcc/testsuite/gcc.dg/torture/pr104280.c.jj	2022-01-29 12:24:36.190021595 +0100
> +++ gcc/testsuite/gcc.dg/torture/pr104280.c	2022-01-29 12:25:28.681282783 +0100
> @@ -0,0 +1,16 @@
> +/* PR tree-optimization/104280 */
> +/* { dg-do run } */
> +
> +int
> +foo (unsigned b, int c)
> +{
> +  return b / c;
> +}
> +
> +int
> +main ()
> +{
> +  if (foo (1, 2) != 0)
> +    __builtin_abort ();
> +  return 0;
> +}
> --- gcc/testsuite/gcc.dg/torture/pr104281.c.jj	2022-01-29 12:27:29.840577473 +0100
> +++ gcc/testsuite/gcc.dg/torture/pr104281.c	2022-01-29 12:27:24.526652267 +0100
> @@ -0,0 +1,22 @@
> +/* PR tree-optimization/104281 */
> +/* { dg-do run } */
> +
> +unsigned a = 1;
> +int b, c = 2;
> +long d;
> +
> +int
> +main ()
> +{
> +  while (1)
> +    {
> +      int m = a;
> +    L:
> +      a = ~(-(m || b & d));
> +      b = ((1 ^ a) / c);
> +      if (b)
> +	goto L;
> +      break;
> +    }
> +  return 0;
> +}
> 
> 
> 	Jakub
> 
>
Richard Biener Jan. 31, 2022, 8:33 a.m. UTC | #6
On Sat, 29 Jan 2022, Andrew Pinski wrote:

> On Sat, Jan 29, 2022 at 6:30 PM Zhao Wei Liew via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Sincere apologies for the issues. I wasn't aware of the need for a cast but
> > after reading the PRs, I understand that now. On the other hand, the
> > incorrect test case was simply a major oversight on my part.
> >
> > I'll be sure to be more careful next time.
> 
> Well I think gimple-match should have caught the cast issue really. I
> filed https://gcc.gnu.org/PR104286 for that. I hope to submit a patch
> for stage 1 to catch that so it will be harder to happen in the first
> place.

I suppose that we could change verify_gimple_* to receive gimple_match_op
(or simply exploded args) and we could then, with -fchecking, verify
each such built match-op (some passes also build "custom" ones they feed
into match-and-simplify).

Richard.
Andrew Pinski Feb. 2, 2022, 10:58 p.m. UTC | #7
On Mon, Jan 31, 2022 at 12:17 AM Eric Botcazou via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> > Unfortunately this breaks quite a lot of things.
>
> Right, for example in Ada where we now happily turn a division by zero, which
> should raise an exception with -gnatp, into nonsense.  Do we really need this
> rather useless optimization in GCC?  Blindly mimicing LLVM is not a reason...

Since I didn't see anyone responding to this problem, I filed PR
104356 to record the regression.
And yes this should be handled correctly.

Thanks,
Andrew Pinski

>
> I have installed the attached testcase, which now fails because of the change.
>
>         * gnat.dg/div_zero.adb: New test.
>
> --
> Eric Botcazou
Eric Botcazou Feb. 2, 2022, 11:01 p.m. UTC | #8
> Since I didn't see anyone responding to this problem, I filed PR
> 104356 to record the regression.
> And yes this should be handled correctly.

Thanks.  Note that we have an example of this in libgcc/libgcc2.c too.
Richard Biener Feb. 3, 2022, 7:16 a.m. UTC | #9
On Thu, 3 Feb 2022, Eric Botcazou wrote:

> > Since I didn't see anyone responding to this problem, I filed PR
> > 104356 to record the regression.
> > And yes this should be handled correctly.
> 
> Thanks.  Note that we have an example of this in libgcc/libgcc2.c too.

I assumed this was handled by the followups to the patch ...

Well, yes, we have to fix it.  I will share my thoughts when coming
along the bugreport.

Richard.
Eric Botcazou Feb. 3, 2022, 9:06 a.m. UTC | #10
> Well, yes, we have to fix it.  I will share my thoughts when coming
> along the bugreport.

IMO we should simply scrap it, as it does not serve any useful purpose, breaks 
a very ancient and useful idiom and also introduces an artificial discrepancy 
between 1/X and 2/X for example.
Jakub Jelinek Feb. 3, 2022, 9:33 a.m. UTC | #11
On Thu, Feb 03, 2022 at 10:06:42AM +0100, Eric Botcazou wrote:
> > Well, yes, we have to fix it.  I will share my thoughts when coming
> > along the bugreport.
> 
> IMO we should simply scrap it, as it does not serve any useful purpose, breaks 
> a very ancient and useful idiom and also introduces an artificial discrepancy 
> between 1/X and 2/X for example.

That doesn't make sense.
The optimization can be useful, it doesn't have to be for user written
y = 1 / x; but can appear through inlining or some other optimizations, e.g.
jump threading and suddenly we have constant 1 in some bb, if it a never
executed at runtime block, it will be likely shorter because of the
optimization, if it is not, then it will be cheaper.
And, this is definitely not the first optimization that assumes undefined
behavior in integer division should not happen, lots of other optimizations
do the same thing.
E.g. consider
unsigned
foo (unsigned x, unsigned y)
{
  if (x >= 2)
    return 0;
  if (x == 1)
    y += 4;
  return y / x;
}
Here the ranger optimizes away the division, in GCC11 in vrp1 pass, in
GCC12 in evrp pass already.  Even in match.pd I vaguely remember one or two
optimizations where it also relied on division by zero being undefined.

	Jakub
Richard Biener Feb. 3, 2022, 9:40 a.m. UTC | #12
On Thu, 3 Feb 2022, Jakub Jelinek wrote:

> On Thu, Feb 03, 2022 at 10:06:42AM +0100, Eric Botcazou wrote:
> > > Well, yes, we have to fix it.  I will share my thoughts when coming
> > > along the bugreport.
> > 
> > IMO we should simply scrap it, as it does not serve any useful purpose, breaks 
> > a very ancient and useful idiom and also introduces an artificial discrepancy 
> > between 1/X and 2/X for example.
> 
> That doesn't make sense.
> The optimization can be useful, it doesn't have to be for user written
> y = 1 / x; but can appear through inlining or some other optimizations, e.g.
> jump threading and suddenly we have constant 1 in some bb, if it a never
> executed at runtime block, it will be likely shorter because of the
> optimization, if it is not, then it will be cheaper.
> And, this is definitely not the first optimization that assumes undefined
> behavior in integer division should not happen, lots of other optimizations
> do the same thing.
> E.g. consider
> unsigned
> foo (unsigned x, unsigned y)
> {
>   if (x >= 2)
>     return 0;
>   if (x == 1)
>     y += 4;
>   return y / x;
> }
> Here the ranger optimizes away the division, in GCC11 in vrp1 pass, in
> GCC12 in evrp pass already.  Even in match.pd I vaguely remember one or two
> optimizations where it also relied on division by zero being undefined.

Yes, we definitely have multiple of those cases.  I do think
preserving "an idiom", for example literal 0/0 or all x/0 might be
worth considering.  But I also think we have to sort out different
language standards requirements vs. the middle-end and whos
responsible for making sure we adhere here.

That said, for GCC 12 we can take shortcuts and one option is of
course to revert (but we don't have to rush).

oh, and even GCC 4.3 optimizes

int main()
{
  volatile int tem = 0/0;
}

to just

        movl    $0, -4(%rsp)

even with -fnon-call-exceptions -fexceptions.

Richard.
Eric Botcazou Feb. 3, 2022, 9:49 a.m. UTC | #13
> The optimization can be useful, it doesn't have to be for user written
> y = 1 / x; but can appear through inlining or some other optimizations, e.g.
> jump threading and suddenly we have constant 1 in some bb, if it a never
> executed at runtime block, it will be likely shorter because of the
> optimization, if it is not, then it will be cheaper.

The hundreds of people having worked on GCC for the past 30 years must have 
been stupid then, how come have they missed such a great optimization? ;-)
Jakub Jelinek Feb. 3, 2022, 9:55 a.m. UTC | #14
On Thu, Feb 03, 2022 at 10:40:10AM +0100, Richard Biener wrote:
> Yes, we definitely have multiple of those cases.  I do think
> preserving "an idiom", for example literal 0/0 or all x/0 might be
> worth considering.  But I also think we have to sort out different
> language standards requirements vs. the middle-end and whos
> responsible for making sure we adhere here.

I think we try to preserve literal 0/0 and x/0, including this
optimization which punts if the divisor is literal.  But, for literal
0/0 and x/0 we alsy emit -Wdiv-by-zero warning, maybe that was the
reason why libgcc2.c did it differently.

	Jakub
Richard Biener Feb. 3, 2022, 11:03 a.m. UTC | #15
On Thu, 3 Feb 2022, Jakub Jelinek wrote:

> On Thu, Feb 03, 2022 at 10:40:10AM +0100, Richard Biener wrote:
> > Yes, we definitely have multiple of those cases.  I do think
> > preserving "an idiom", for example literal 0/0 or all x/0 might be
> > worth considering.  But I also think we have to sort out different
> > language standards requirements vs. the middle-end and whos
> > responsible for making sure we adhere here.
> 
> I think we try to preserve literal 0/0 and x/0, including this
> optimization which punts if the divisor is literal.  But, for literal
> 0/0 and x/0 we alsy emit -Wdiv-by-zero warning, maybe that was the
> reason why libgcc2.c did it differently.

Sure, I bet the code is quite old since we very likely propagate
the equality and optimize the division away since a long time.
Now that #pragma GCC diagnostic is a thing we can probably write
literal 0/0 if we choose to preserve that until the end.

But as said, for the libgcc2.c case I'd simply remove all of it.

Richard.
Jakub Jelinek Feb. 3, 2022, 11:18 a.m. UTC | #16
On Thu, Feb 03, 2022 at 12:03:15PM +0100, Richard Biener wrote:
> But as said, for the libgcc2.c case I'd simply remove all of it.

I can't read RMS' mind, it is indeed UB, so we can do anything, but I bet
it was just a QoI attempt, when (most of the time) normal single-word
or smaller division for / 0 behaves certain way (SIGFPE with FPE_INTDIV,
or being ignored, or whatever else) that it would be nice if the multi-word
division behaved likewise.
On the platforms where it is SIGFPE with FPE_INTDIV, raising that would
help people figure out what's going on.

	Jakub
Richard Biener Feb. 3, 2022, 12:13 p.m. UTC | #17
On Thu, 3 Feb 2022, Jakub Jelinek wrote:

> On Thu, Feb 03, 2022 at 12:03:15PM +0100, Richard Biener wrote:
> > But as said, for the libgcc2.c case I'd simply remove all of it.
> 
> I can't read RMS' mind, it is indeed UB, so we can do anything, but I bet
> it was just a QoI attempt, when (most of the time) normal single-word
> or smaller division for / 0 behaves certain way (SIGFPE with FPE_INTDIV,
> or being ignored, or whatever else) that it would be nice if the multi-word
> division behaved likewise.
> On the platforms where it is SIGFPE with FPE_INTDIV, raising that would
> help people figure out what's going on.

Yes, I think the intent is clear.  The question is whether we should
re-instantiate the clear intent of preserving a literal / 0 as well
(for C, without -fnon-call-exceptions).

That said, I'm fine with the asms but they are ugly ;)

Richard.
Jakub Jelinek Feb. 3, 2022, 12:22 p.m. UTC | #18
On Thu, Feb 03, 2022 at 01:13:29PM +0100, Richard Biener wrote:
> On Thu, 3 Feb 2022, Jakub Jelinek wrote:
> 
> > On Thu, Feb 03, 2022 at 12:03:15PM +0100, Richard Biener wrote:
> > > But as said, for the libgcc2.c case I'd simply remove all of it.
> > 
> > I can't read RMS' mind, it is indeed UB, so we can do anything, but I bet
> > it was just a QoI attempt, when (most of the time) normal single-word
> > or smaller division for / 0 behaves certain way (SIGFPE with FPE_INTDIV,
> > or being ignored, or whatever else) that it would be nice if the multi-word
> > division behaved likewise.
> > On the platforms where it is SIGFPE with FPE_INTDIV, raising that would
> > help people figure out what's going on.
> 
> Yes, I think the intent is clear.  The question is whether we should
> re-instantiate the clear intent of preserving a literal / 0 as well
> (for C, without -fnon-call-exceptions).

I think currently it is path isolation that breaks it.
Bet in a good intent that everything after it is UB and allowing the
stmts after it to be optimized away.
Unfortunately we don't (easily) know whether the division by zero
was literal already in the source or propagated through inlining etc.
or even jump threading.  Especially in the last case it is nice to
be able to emit as short code as possible, which __builtin_trap ()
serves nicely on many targets.
Emitting both a 1 / 0 insn followed by __builtin_trap () might work
with perhaps some RTL optimization to optimize away the trap if
the target is known to trap on division by zero.

	Jakub
Eric Botcazou Feb. 3, 2022, 12:24 p.m. UTC | #19
> Yes, I think the intent is clear.  The question is whether we should
> re-instantiate the clear intent of preserving a literal / 0 as well
> (for C, without -fnon-call-exceptions).

Note that the code is precisely compiled with -fnon-call-exceptions:

ifeq ($(LIB2_DIVMOD_EXCEPTION_FLAGS),)
# Provide default flags for compiling divmod functions, if they haven't been
# set already by a target-specific Makefile fragment.
LIB2_DIVMOD_EXCEPTION_FLAGS := -fexceptions -fnon-call-exceptions
endif

# Build LIB2_DIVMOD_FUNCS.
lib2-divmod-o = $(patsubst %,%$(objext),$(LIB2_DIVMOD_FUNCS))
$(lib2-divmod-o): %$(objext): $(srcdir)/libgcc2.c
	$(gcc_compile) -DL$* -c $< \
	  $(LIB2_DIVMOD_EXCEPTION_FLAGS) $(vis_hide)
libgcc-objects += $(lib2-divmod-o)

ifeq ($(enable_shared),yes)
lib2-divmod-s-o = $(patsubst %,%_s$(objext),$(LIB2_DIVMOD_FUNCS))
$(lib2-divmod-s-o): %_s$(objext): $(srcdir)/libgcc2.c
	$(gcc_s_compile) -DL$* -c $< \
	  $(LIB2_DIVMOD_EXCEPTION_FLAGS)
libgcc-s-objects += $(lib2-divmod-s-o)
endif

so it seems like we were about to reinvent the wheel here. ;-)
Eric Botcazou Feb. 4, 2022, 9:53 a.m. UTC | #20
> Well, yes, we have to fix it.

Here's the fix we agreed upon in the audit trail, OK for the mainline?

	PR tree-optimization/104356
	* match.pd (X / bool_range_Y is X): Add guard.
	(X / X is one): Likewise.
	(X / abs (X) is X < 0 ? -1 : 1): Likewise.
	(X / -X is -1): Likewise.
	(1 / X -> X == 1): Likewise.
Jakub Jelinek Feb. 4, 2022, 10:07 a.m. UTC | #21
On Fri, Feb 04, 2022 at 10:53:22AM +0100, Eric Botcazou wrote:
> > Well, yes, we have to fix it.
> 
> Here's the fix we agreed upon in the audit trail, OK for the mainline?
> 
> 	PR tree-optimization/104356
> 	* match.pd (X / bool_range_Y is X): Add guard.
> 	(X / X is one): Likewise.
> 	(X / abs (X) is X < 0 ? -1 : 1): Likewise.
> 	(X / -X is -1): Likewise.
> 	(1 / X -> X == 1): Likewise.

Looks mostly good, just a few nits.

> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -401,27 +401,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   /* X / bool_range_Y is X.  */ 
>   (simplify
>    (div @0 SSA_NAME@1)
> -  (if (INTEGRAL_TYPE_P (type) && ssa_name_has_boolean_range (@1))
> +  (if (INTEGRAL_TYPE_P (type)
> +       && ssa_name_has_boolean_range (@1)
> +       && !flag_non_call_exceptions)

ssa_name_has_boolean_range call is certainly more expensive than
!flag_non_call_exceptions check, can you swap those two?

> @@ -444,6 +452,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (trunc_div integer_onep@0 @1)
>   (if (INTEGRAL_TYPE_P (type)
>        && !integer_zerop (@1)
> +      && (!flag_non_call_exceptions || tree_expr_nonzero_p (@1))
>        && TYPE_PRECISION (type) > 1)

And similarly, TYPE_PRECISION (type) > 1 check is very cheap, can
it be done before the && !integer_zerop (@1) line?
I admit this one is already preexisting, but tree_expr_nonzero_p
can be quite expensive.

Otherwise LGTM.

	Jakub
Eric Botcazou Feb. 4, 2022, 10:14 a.m. UTC | #22
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -401,27 +401,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > 
> >   /* X / bool_range_Y is X.  */
> >   (simplify
> >   
> >    (div @0 SSA_NAME@1)
> > 
> > -  (if (INTEGRAL_TYPE_P (type) && ssa_name_has_boolean_range (@1))
> > +  (if (INTEGRAL_TYPE_P (type)
> > +       && ssa_name_has_boolean_range (@1)
> > +       && !flag_non_call_exceptions)
> 
> ssa_name_has_boolean_range call is certainly more expensive than
> !flag_non_call_exceptions check, can you swap those two?

But !flag_non_call_exceptions is (almost) always true for the C family of 
languages, so you're going to penalize them by doing this.

> And similarly, TYPE_PRECISION (type) > 1 check is very cheap, can
> it be done before the && !integer_zerop (@1) line?

Yes, it clearly belongs there.
Jakub Jelinek Feb. 4, 2022, 10:26 a.m. UTC | #23
On Fri, Feb 04, 2022 at 11:14:05AM +0100, Eric Botcazou wrote:
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -401,27 +401,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > 
> > >   /* X / bool_range_Y is X.  */
> > >   (simplify
> > >   
> > >    (div @0 SSA_NAME@1)
> > > 
> > > -  (if (INTEGRAL_TYPE_P (type) && ssa_name_has_boolean_range (@1))
> > > +  (if (INTEGRAL_TYPE_P (type)
> > > +       && ssa_name_has_boolean_range (@1)
> > > +       && !flag_non_call_exceptions)
> > 
> > ssa_name_has_boolean_range call is certainly more expensive than
> > !flag_non_call_exceptions check, can you swap those two?
> 
> But !flag_non_call_exceptions is (almost) always true for the C family of 
> languages, so you're going to penalize them by doing this.

True, but much less so than the other order penalizing Ada/Go.

> > And similarly, TYPE_PRECISION (type) > 1 check is very cheap, can
> > it be done before the && !integer_zerop (@1) line?
> 
> Yes, it clearly belongs there.

	Jakub
Jakub Jelinek Feb. 4, 2022, 10:29 a.m. UTC | #24
On Fri, Feb 04, 2022 at 11:26:30AM +0100, Jakub Jelinek via Gcc-patches wrote:
> On Fri, Feb 04, 2022 at 11:14:05AM +0100, Eric Botcazou wrote:
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -401,27 +401,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > 
> > > >   /* X / bool_range_Y is X.  */
> > > >   (simplify
> > > >   
> > > >    (div @0 SSA_NAME@1)
> > > > 
> > > > -  (if (INTEGRAL_TYPE_P (type) && ssa_name_has_boolean_range (@1))
> > > > +  (if (INTEGRAL_TYPE_P (type)
> > > > +       && ssa_name_has_boolean_range (@1)
> > > > +       && !flag_non_call_exceptions)
> > > 
> > > ssa_name_has_boolean_range call is certainly more expensive than
> > > !flag_non_call_exceptions check, can you swap those two?
> > 
> > But !flag_non_call_exceptions is (almost) always true for the C family of 
> > languages, so you're going to penalize them by doing this.
> 
> True, but much less so than the other order penalizing Ada/Go.
> 
> > > And similarly, TYPE_PRECISION (type) > 1 check is very cheap, can
> > > it be done before the && !integer_zerop (@1) line?
> > 
> > Yes, it clearly belongs there.

Anyway, not a big deal.

	Jakub
diff mbox series

Patch

--- gcc/match.pd.jj	2022-01-29 11:11:39.316628007 +0100
+++ gcc/match.pd	2022-01-29 12:19:46.662096678 +0100
@@ -435,18 +435,22 @@  (define_operator_list SYNC_FETCH_AND_AND
        && TYPE_UNSIGNED (type))
    (trunc_divmod @0 @1))))
 
- /* 1 / X -> X == 1 for unsigned integer X.
-    1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X.
-    But not for 1 / 0 so that we can get proper warnings and errors,
-    and not for 1-bit integers as they are edge cases better handled elsewhere. */
+/* 1 / X -> X == 1 for unsigned integer X.
+   1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X.
+   But not for 1 / 0 so that we can get proper warnings and errors,
+   and not for 1-bit integers as they are edge cases better handled
+   elsewhere.  */
 (simplify
-  (trunc_div integer_onep@0 @1)
-  (if (INTEGRAL_TYPE_P (type) && !integer_zerop (@1) && TYPE_PRECISION (type) > 1)
-    (if (TYPE_UNSIGNED (type))
-      (eq @1 { build_one_cst (type); })
-      (with { tree utype = unsigned_type_for (type); }
-        (cond (le (plus (convert:utype @1) { build_one_cst (utype); }) { build_int_cst (utype, 2); })
-          @1 { build_zero_cst (type); })))))
+ (trunc_div integer_onep@0 @1)
+ (if (INTEGRAL_TYPE_P (type)
+      && !integer_zerop (@1)
+      && TYPE_PRECISION (type) > 1)
+  (if (TYPE_UNSIGNED (type))
+   (convert (eq:boolean_type_node @1 { build_one_cst (type); }))
+   (with { tree utype = unsigned_type_for (type); }
+    (cond (le (plus (convert:utype @1) { build_one_cst (utype); })
+	      { build_int_cst (utype, 2); })
+     @1 { build_zero_cst (type); })))))
 
 /* Combine two successive divisions.  Note that combining ceil_div
    and floor_div is trickier and combining round_div even more so.  */
--- gcc/testsuite/gcc.dg/torture/pr104279.c.jj	2022-01-29 12:25:36.388174312 +0100
+++ gcc/testsuite/gcc.dg/torture/pr104279.c	2022-01-29 12:25:53.206937588 +0100
@@ -0,0 +1,12 @@ 
+/* PR tree-optimization/104279 */
+/* { dg-do compile } */
+
+unsigned a, b;
+
+int
+main ()
+{
+  b = ~(0 || ~0);
+  a = ~b / ~a;
+  return 0;
+}
--- gcc/testsuite/gcc.dg/torture/pr104280.c.jj	2022-01-29 12:24:36.190021595 +0100
+++ gcc/testsuite/gcc.dg/torture/pr104280.c	2022-01-29 12:25:28.681282783 +0100
@@ -0,0 +1,16 @@ 
+/* PR tree-optimization/104280 */
+/* { dg-do run } */
+
+int
+foo (unsigned b, int c)
+{
+  return b / c;
+}
+
+int
+main ()
+{
+  if (foo (1, 2) != 0)
+    __builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.dg/torture/pr104281.c.jj	2022-01-29 12:27:29.840577473 +0100
+++ gcc/testsuite/gcc.dg/torture/pr104281.c	2022-01-29 12:27:24.526652267 +0100
@@ -0,0 +1,22 @@ 
+/* PR tree-optimization/104281 */
+/* { dg-do run } */
+
+unsigned a = 1;
+int b, c = 2;
+long d;
+
+int
+main ()
+{
+  while (1)
+    {
+      int m = a;
+    L:
+      a = ~(-(m || b & d));
+      b = ((1 ^ a) / c);
+      if (b)
+	goto L;
+      break;
+    }
+  return 0;
+}