
Extend tree-call-cdce to calls whose result is used

Message ID 874mgyorwm.fsf@e105548-lin.cambridge.arm.com
State New

Commit Message

Richard Sandiford Nov. 7, 2015, 1:33 p.m. UTC
For -fmath-errno, builtins.c currently expands calls to sqrt to:

    y = sqrt_optab (x);
    if (y != y)
      [ sqrt (x); or errno = EDOM; ]

The drawbacks of this are:

- the call to sqrt is protected by the result of the optab rather
  than the input.  It would be better to check !(x >= 0), like
  tree-call-cdce.c does.

- the branch isn't exposed at the gimple level and so gets little
  high-level optimisation.

- we do this for log too, but for log a zero input produces
  -inf rather than a NaN, and sets errno to ERANGE rather than EDOM.

This patch moves the code to tree-call-cdce.c instead, with the optab
operation being represented as an internal function.  This means that
we can use the existing argument-based range checks rather than the
result-based checks and that we get more gimple optimisation of
the branch.
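
As a rough illustration (in plain C rather than GIMPLE, and with
fast_sqrt standing in as a purely hypothetical placeholder for the
errno-free internal function), the shape we aim for when the result
of sqrt is used is roughly:

    #include <math.h>

    /* Placeholder for the internal function: computes the result
       without touching errno.  Illustrative only.  */
    static inline double fast_sqrt (double x) { return __builtin_sqrt (x); }

    double
    shrink_wrapped_sqrt (double x)
    {
      double y = fast_sqrt (x);  /* always provides the numerical result */
      if (!(x >= 0))             /* rare path: call libm only to set errno */
        sqrt (x);
      return y;
    }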

Previously the pass was only enabled by default at -O2 or above,
but the old builtins.c code was enabled at -O.  The patch therefore
enables the pass at -O as well.

The previous patch to cfgexpand.c handled cases where functions
don't (or are assumed not to) set errno, so this patch makes
the builtins.c code dead.

Tested on x86_64-linux-gnu, aarch64-linux-gnu, arm-linux-gnueabi
and visium-elf (for the EDOM stuff).  OK to install?

Thanks,
Richard


gcc/
	* builtins.c (expand_errno_check, expand_builtin_mathfn)
	(expand_builtin_mathfn_2): Delete.
	(expand_builtin): Remove handling of functions with
	internal function equivalents.
	* internal-fn.def (SET_EDOM): New internal function.
	* internal-fn.h (set_edom_supported_p): Declare.
	* internal-fn.c (expand_SET_EDOM): New function.
	(set_edom_supported_p): Likewise.
	* tree-call-cdce.c: Include builtins.h and internal-fn.h.
	Rewrite comment at head of file.
	(is_call_dce_candidate): Rename to...
	(can_test_argument_range): ...this.  Don't check gimple_call_lhs
	or gimple_call_builtin_p here.
	(edom_only_function): New function.
	(shrink_wrap_one_built_in_call_with_conds): New function, split out
	from...
	(shrink_wrap_one_built_in_call): ...here.
	(can_use_internal_fn, use_internal_fn): New functions.
	(shrink_wrap_conditional_dead_built_in_calls): Call use_internal_fn
	for calls that have an lhs.
	(pass_call_cdce::gate): Remove optimize_function_for_speed_p check.
	(pass_call_cdce::execute): Skip blocks that are optimized for size.
	Check gimple_call_builtin_p here.  Use can_use_internal_fn for
	calls with an lhs.
	* opts.c (default_options_table): Enable -ftree-builtin-call-dce
	at -O and above.

Comments

Michael Matz Nov. 9, 2015, 2:30 p.m. UTC | #1
Hi,

On Sat, 7 Nov 2015, Richard Sandiford wrote:

> For -fmath-errno, builtins.c currently expands calls to sqrt to:
> 
>     y = sqrt_optab (x);
>     if (y != y)
>       [ sqrt (x); or errno = EDOM; ]
> 
> - the call to sqrt is protected by the result of the optab rather
>   than the input.  It would be better to check !(x >= 0), like
>   tree-call-cdce.c does.

It depends.  With fast-math (and hence without NaNs) you can trivially 
optimize away a (y != y) test.  You can't do so with !(x>=0) at all.  

> - the branch isn't exposed at the gimple level and so gets little
>   high-level optimisation.
> 
> - we do this for log too, but for log a zero input produces
>   -inf rather than a NaN, and sets errno to ERANGE rather than EDOM.
> 
> This patch moves the code to tree-call-cdce.c instead,

This somehow feels wrong.  Dead-code elimination doesn't have anything to 
do with the transformation you want; rather, it is rewriting all feasible 
calls into something else, like fold_builtins does.  Also cdce currently 
doesn't seem to do any checks on the fast-math flags, so I wonder if some 
of the conditions that you now also insert for calls whose results are 
used stay until final code.

> Previously the pass was only enabled by default at -O2 or above, but the 
> old builtins.c code was enabled at -O.  The patch therefore enables the 
> pass at -O as well.

The pass is somewhat expensive in that it removes dominator info and 
schedules a full ssa update.  The transformation is trivial enough that 
dominators and SSA form can be updated on the fly; without that, I think 
it's not feasible for -O.

But as said I think this transformation should better be moved into 
builtin folding (or other call folding), at which point also the fast-math 
flags can be checked.  The infrastructure routines of tree-call-cdce can 
be used there of course.  If so moved the cdce pass would be subsumed by 
that btw. (because the dead call result will be trivially exposed), and 
that would be a good thing.


Ciao,
Michael.
Richard Sandiford Nov. 9, 2015, 3:18 p.m. UTC | #2
Michael Matz <matz@suse.de> writes:
> On Sat, 7 Nov 2015, Richard Sandiford wrote:
>> For -fmath-errno, builtins.c currently expands calls to sqrt to:
>> 
>>     y = sqrt_optab (x);
>>     if (y != y)
>>       [ sqrt (x); or errno = EDOM; ]
>> 
>> - the call to sqrt is protected by the result of the optab rather
>>   than the input.  It would be better to check !(x >= 0), like
>>   tree-call-cdce.c does.
>
> It depends.  With fast-math (and hence without NaNs) you can trivially 
> optimize away a (y != y) test.  You can't do so with !(x>=0) at all.  

-ffast-math would already cause us to treat the function as not setting
errno, so the code wouldn't be used.

>> - the branch isn't exposed at the gimple level and so gets little
>>   high-level optimisation.
>> 
>> - we do this for log too, but for log a zero input produces
>>   -inf rather than a NaN, and sets errno to ERANGE rather than EDOM.
>> 
>> This patch moves the code to tree-call-cdce.c instead,
>
> This somehow feels wrong.  Dead-code elimination doesn't have anything to 
> do with the transformation you want, it rather is rewriting all feasible 
> calls into something else, like fold_builtins does.  Also cdce currently 
> doesn't seem to do any checks on the fast-math flags, so I wonder if some 
> of the conditions that you now also insert for calls whose results are 
> used stay until final code.

Isn't that just a question of what you call the current pass though?
Arguably "conditional DCE" was always a confusing name, since the
pass only _adds_ code :-)  Obviously the point is that it eliminates
the external call at runtime in most cases, but so does the transformation
I'm adding.

I'd be happy to rename the file to something else if that would help.

>> Previously the pass was only enabled by default at -O2 or above, but the 
>> old builtins.c code was enabled at -O.  The patch therefore enables the 
>> pass at -O as well.
>
> The pass is somewhat expensive in that it removes dominator info and 
> schedules a full ssa update.  The transformation is trivial enough that 
> dominators and SSA form can be updated on the fly, I think without that 
> it's not feasible for -O.

r229916 fixed that for the non-EH case.  I posted a patch to update
the vops for the non-EH case as well:

    https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03355.html

If this aspect is going to be a blocker, maybe we can reconsider that,
since the EH case should be rare.

> But as said I think this transformation should better be moved into 
> builtin folding (or other call folding), at which point also the fast-math 
> flags can be checked.  The infrastructure routines of tree-call-cdce can 
> be used there of course.  If so moved the cdce pass would be subsumed by 
> that btw. (because the dead call result will be trivially exposed), and 
> that would be a good thing.

Yeah, that was why I added the code to the cdce pass.  I don't think
it makes sense to have one pass that adds range checks for calls whose
lhs is unused and a separate pass for calls whose lhs is used.

But by "builtin folding", do you mean fold_builtin_n etc.?
I thought the point was that we were trying to move away from
relying on folding at that level.  Or did you mean something else?

Thanks,
Richard
Michael Matz Nov. 9, 2015, 5:02 p.m. UTC | #3
Hi,

On Mon, 9 Nov 2015, Richard Sandiford wrote:

> -ffast-math would already cause us to treat the function as not setting 
> errno, so the code wouldn't be used.

What is "the code"?  I don't see any checking of the relevant flags in 
tree-call-cdce.c, so I wonder what would prevent the addition of the 
unnecessary checking.  It's clear that such calls with unused results will 
be removed by normal DCE when errno isn't involved (so cdce doesn't have 
anything to worry about, and hence doesn't add any conditions).  But you're 
adding handling of calls with a _used_ result, i.e. the call itself will 
stay, and you're adding some conditions around this, like:

  if (x>=0)
    _1 = SQRT(x);
  else
    _2 = sqrt(x);
  res_3 = phi(_1, _2);

My question was, what transforms this into

  res_3 = SQRT(x);

under fast-math?  Before your patch this was done by removing the y!=y 
test eventually, but with your patch?

> > This somehow feels wrong.  Dead-code elimination doesn't have anything 
> > to do with the transformation you want, it rather is rewriting all 
> > feasible calls into something else, like fold_builtins does.  Also 
> > cdce currently doesn't seem to do any checks on the fast-math flags, 
> > so I wonder if some of the conditions that you now also insert for 
> > calls whose results are used stay until final code.
> 
> Isn't that just a question of what you call the current pass though?

The name definitely was always suspect, but I really think the pass itself 
is the problem; it should have been part of dce itself from the beginning.  
Adding more code to it that doesn't even have anything to do with the 
original purpose moves us away from the ideal situation IMHO.  We do have 
existing places that rewrite calls into something else, so instead of 
another pass over all instructions, why not extend those places (and then 
remove the cdce pass)?

> > The pass is somewhat expensive in that it removes dominator info and 
> > schedules a full ssa update.  The transformation is trivial enough 
> > that dominators and SSA form can be updated on the fly, I think 
> > without that it's not feasible for -O.
> 
> r229916 fixed that for the non-EH case.

Ah, missed it.  Even the EH case shouldn't be difficult.  If the original 
dominator of the EH destination was the call block, it moves; otherwise it 
remains unchanged.

> I posted a patch to update the vops for the non-EH case as well:
> 
>     https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03355.html

I see, here the EH case is a bit more difficult as you need to 
differentiate between VOP uses in the EH and the non-EH region, but not 
insurmountable.

> > But as said I think this transformation should better be moved into 
> > builtin folding (or other call folding), at which point also the 
> > fast-math flags can be checked.  The infrastructure routines of 
> > tree-call-cdce can be used there of course.  If so moved the cdce pass 
> > would be subsumed by that btw. (because the dead call result will be 
> > trivially exposed), and that would be a good thing.
> 
> Yeah, that was why I added the code to the cdce pass.  I don't think it 
> makes sense to have one pass that adds range checks for calls whose lhs 
> is unused and a separate pass for calls whose lhs is used.

Certainly not.  But it does make sense to add the range checks 
unconditionally in some rewriting pass, thereby obsoleting the cdce pass.

> But by "builtin folding", do you mean fold_builtin_n etc.?

I had the pass_fold_builtins in mind.

> I thought the point was that we were trying to move away from relying on 
> folding at that level.

If you mean in the sense of the various fold_builtin_n calls, yes.  But I 
don't see us moving to a state where we don't have _any_ specific pass 
that does a generic replace-these-patterns-with-those-patterns work, IOW I 
don't think we should make it so that everything is subsumed by the 
match.pd method.  For instance it doesn't make sense to diddle with the 
stdarg builtins before a specific time, and after diddling it doesn't make 
sense anymore to check for them, so putting this into the match.pd work 
would merely do useless work before that time.  Also match.pd (currently) 
isn't too happy with generating control flow.  And sometimes it simply 
makes sense to open-code the replacement sequences.  So, whereever we end 
up being, I think we always will have some pass going over all BBs/insns 
and pattern matching stuff.

Currently we have quite a few such passes (reassoc, forwprop, 
lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap and 
others), but they are all handling only special situations in one way or 
the other.  pass_fold_builtins is another one, but it seems most related 
to what you want (replacing a call with something else), so I thought 
that'd be the natural choice.

call_cdce is also such a pass, but I think it's simply not the appropriate 
one (only in so far as its source file contains the helper routines you 
need), and in addition I think it shouldn't exist at all (and wouldn't 
need to if it had been part of DCE from the start, or if you implemented 
the conditionalizing as part of another pass).  Hey, you could be the one 
to remove a pass! ;-)


Ciao,
Michael.
Richard Sandiford Nov. 9, 2015, 6:02 p.m. UTC | #4
Michael Matz <matz@suse.de> writes:
> On Mon, 9 Nov 2015, Richard Sandiford wrote:
>
>> -ffast-math would already cause us to treat the function as not setting 
>> errno, so the code wouldn't be used.
>
> What is "the code"?  I don't see any checking of the relevant flags in 
> tree-call-cdce.c, so I wonder what would prevent the addition of the 
> unnecessary checking.

-ffast-math implies -fno-errno-math, which in turn changes how the
function attributes are set.  E.g.:

#undef ATTR_MATHFN_FPROUNDING_ERRNO
#define ATTR_MATHFN_FPROUNDING_ERRNO (flag_errno_math ? \
	ATTR_NOTHROW_LEAF_LIST : ATTR_MATHFN_FPROUNDING)

So with -ffast-math these functions don't set errno and don't have a vdef.
The patch checks for that here:

+/* Return true if built-in function call CALL could be implemented using
+   a combination of an internal function to compute the result and a
+   separate call to set errno.  */
+
+static bool
+can_use_internal_fn (gcall *call)
+{
+  /* Only replace calls that set errno.  */
+  if (!gimple_vdef (call))
+    return false;

Checking for a vdef seemed reasonable.  If we treat these libm functions
as doing nothing other than producing a numerical result then we'll get
better optimisation across the board if we don't add unnecessary vops.
(Which we do, but see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68235 )

>> > The pass is somewhat expensive in that it removes dominator info and 
>> > schedules a full ssa update.  The transformation is trivial enough 
>> > that dominators and SSA form can be updated on the fly, I think 
>> > without that it's not feasible for -O.
>> 
>> r229916 fixed that for the non-EH case.
>
> Ah, missed it.  Even the EH case shouldn't be difficult.  If the original 
> dominator of the EH destination was the call block it moves, otherwise it 
> remains unchanged.

The target of the edge is easy in itself, I agree, but that isn't
necessarily the only affected block, if the EH handler doesn't
exit or rethrow.

>> I posted a patch to update the vops for the non-EH case as well:
>> 
>>     https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03355.html
>
> I see, here the EH case is a bit more difficult as you need to 
> differentiate between VOP uses in the EH and the non-EH region, but not 
> insurmountable.

Well, I agree it's not insurmountable. :-)

>> But by "builtin folding", do you mean fold_builtin_n etc.?
>
> I had the pass_fold_builtins in mind.

OK.

> Currently we have quite some of such passes (reassoc, forwprop, 
> lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap and 
> others), but they are all handling only special situations in one way or 
> the other.  pass_fold_builtins is another one, but it seems most related 
> to what you want (replacing a call with something else), so I thought 
> that'd be the natural choice.

Well, to be pedantic, it's not really replacing the call.  Except for
the special case of targets that support direct assignments to errno,
it keeps the original call but ensures that it isn't usually executed.
From that point of view it doesn't really seem like a fold.

But I suppose that's just naming again :-).  And it's easily solved with
s/fold/rewrite/.

> call_cdce is also such a pass, but I think it's simply not the appropriate 
> one (only in so far as its source file contains the helper routines you 
> need), and in addition I think it shouldn't exist at all (and wouldn't 
> need to if it had been part of DCE from the start, or if you implemented 
> the conditionalizing as part of another pass).  Hey, you could be one to 
> remove a pass! ;-)

It still seems a bit artificial to me to say that the transformation
with a null lhs is "DCE enough" to go in the main DCE pass (even
though like I say it doesn't actually eliminate any code from the IR,
it just adds more code) and should be kept in a separate pass from
the one that does the transformation on a non-null lhs.

Thanks,
Richard
Michael Matz Nov. 9, 2015, 9:03 p.m. UTC | #5
Hi,

On Mon, 9 Nov 2015, Richard Sandiford wrote:

> +static bool
> +can_use_internal_fn (gcall *call)
> +{
> +  /* Only replace calls that set errno.  */
> +  if (!gimple_vdef (call))
> +    return false;

Oh, I managed to confuse this in my head while reading the patch.  So, 
hmm, you don't actually replace the builtin with an internal function 
(without the condition) under no-errno-math?  Does something else do that?  
Because otherwise that seems an unnecessary restriction?

> >> r229916 fixed that for the non-EH case.
> >
> > Ah, missed it.  Even the EH case shouldn't be difficult.  If the 
> > original dominator of the EH destination was the call block it moves, 
> > otherwise it remains unchanged.
> 
> The target of the edge is easy in itself, I agree, but that isn't
> necessarily the only affected block, if the EH handler doesn't
> exit or rethrow.

You're worried the non-EH and the EH regions merge again, right?  Like so:

before change:

        BB1: throwing-call
     fallthru/           \EH
    BB2                   BBeh
     |                   /....\ (stuff in EH-region)
     |                     /    some path out of EH region
     | /------------------/
    BB3

Here, BB3 must at least be dominated by BB1 (the throwing block), or by 
something further up (when there are other side-entries to the path 
BB2->BB3 or into the EH region).  When further up, nothing changes; when 
it's BB1, then it's afterwards dominated by the BB containing the 
condition.  So everything with idom==BB1 gets idom=Bcond, except for BBeh, 
which gets idom=Bcall.  Depending on how you split BB1, either Bcond or 
Bcall might still be BB1, in which case the dom tree doesn't change.
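
As a minimal sketch (untested; bb_cond, bb_call and eh_dest are
hypothetical names for Bcond, Bcall and BBeh above), assuming the
condition block inherits the dom-tree position of the old BB1, the
whole update with the dominance.h API would be:

  /* The new call block hangs off the condition block; the EH
     destination, now reachable only through the call block, moves
     under it.  Everything else keeps its old immediate dominator.  */
  set_immediate_dominator (CDI_DOMINATORS, bb_call, bb_cond);
  set_immediate_dominator (CDI_DOMINATORS, eh_dest, bb_call);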

> > Currently we have quite some of such passes (reassoc, forwprop, 
> > lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap 
> > and others), but they are all handling only special situations in one 
> > way or the other.  pass_fold_builtins is another one, but it seems 
> > most related to what you want (replacing a call with something else), 
> > so I thought that'd be the natural choice.
> 
> Well, to be pedantic, it's not really replacing the call.  Except for
> the special case of targets that support direct assignments to errno,
> it keeps the original call but ensures that it isn't usually executed.
> From that point of view it doesn't really seem like a fold.
> 
> But I suppose that's just naming again :-).  And it's easily solved with
> s/fold/rewrite/.

Exactly, in my mind pass_fold_builtins (like many of the others I 
mentioned) doesn't do folding but rewriting :)

> > call_cdce is also such a pass, but I think it's simply not the 
> > appropriate one (only in so far as its source file contains the helper 
> > routines you need), and in addition I think it shouldn't exist at all 
> > (and wouldn't need to if it had been part of DCE from the start, or if 
> > you implemented the conditionalizing as part of another pass).  Hey, 
> > you could be one to remove a pass! ;-)
> 
> It still seems a bit artificial to me to say that the transformation 
> with a null lhs is "DCE enough" to go in the main DCE pass (even though 
> like I say it doesn't actually eliminate any code from the IR, it just 
> adds more code) and should be kept in a separate pass from the one that 
> does the transformation on a non-null lhs.

Oh, I agree, I might not have been clear: I'm not arguing that the normal 
DCE should now be changed to do the conditionalizing when it removes a 
call LHS; I was saying that it _would_ have been good instead of adding 
the call_cdce pass in the past, when it was for DCE purposes only.  But 
now your proposal is on the plate, namely doing the conditionalizing also 
with an LHS.  So that conditionalizing should take place in some rewriting 
pass (and ideally not call_cdce), no matter the LHS, and normal DCE not be 
changed (it will still remove LHSs of non-removable calls, just that those 
then are sometimes under a condition, when DCE runs after the rewriting).


Ciao,
Michael.
Joseph Myers Nov. 9, 2015, 11:49 p.m. UTC | #6
On Sat, 7 Nov 2015, Richard Sandiford wrote:

> - the call to sqrt is protected by the result of the optab rather
>   than the input.  It would be better to check !(x >= 0), like
>   tree-call-cdce.c does.

Well, no, isless (x, 0) (or !isgreaterequal (x, 0), since it doesn't 
matter what result a NaN gets).  If a quiet NaN is the argument, the 
"invalid" exception should not be raised, which >= will do (modulo bugs on 
some targets where unordered comparison instructions are wrongly used for 
ordered comparisons).  Bug 68264 filed.
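
As a small self-contained illustration of that difference (behaviour as
required by C99 Annex F; whether the flag is actually observable depends
on the target's fenv support and on optimization level):

    #include <fenv.h>
    #include <math.h>
    #include <stdio.h>

    int
    main (void)
    {
      volatile double x = NAN;
      volatile int r;

      feclearexcept (FE_INVALID);
      r = (x >= 0);       /* signalling comparison: raises "invalid" on NaN */
      printf (">=    : result %d, invalid %sraised\n",
              r, fetestexcept (FE_INVALID) ? "" : "not ");

      feclearexcept (FE_INVALID);
      r = isless (x, 0);  /* quiet comparison: must not raise "invalid" */
      printf ("isless: result %d, invalid %sraised\n",
              r, fetestexcept (FE_INVALID) ? "" : "not ");
      return 0;
    }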

> +/* Return true if CALL can produce a domain error (EDOM) but can never
> +   produce a pole or range overflow error (ERANGE).  This means that we
> +   can tell whether a function would have set errno by testing whether
> +   the result is a NaN.
> +
> +   Note that we do not consider range underflow errors, which are
> +   implementation-defined in C99.  */

I'd think this should respect the library's definition.  For glibc that is 
that underflow to 0 should set errno if the result would also underflow to 
0 in the default rounding mode, but underflow to subnormal result doesn't 
need to set errno, and neither does rounding-mode-dependent underflow to 0 
in cases where f (x) is very close to x for small x but in some rounding 
modes f of the least subnormal is 0.  (That caveat around f (x) close to x 
avoids possibly needing to set errno for the least subnormal only for lots 
of functions.)

> +    CASE_FLT_FN (BUILT_IN_EXP):
> +    CASE_FLT_FN (BUILT_IN_EXP10):
> +    CASE_FLT_FN (BUILT_IN_EXP2):
> +    CASE_FLT_FN (BUILT_IN_EXPM1):

These can overflow.

> +    CASE_FLT_FN (BUILT_IN_ATAN2):

And this is the only case here where, given that caveat, underflow setting 
ERANGE is applicable with glibc, but overflow is not.
Richard Biener Nov. 10, 2015, 10:02 a.m. UTC | #7
On Mon, Nov 9, 2015 at 10:03 PM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Mon, 9 Nov 2015, Richard Sandiford wrote:
>
>> +static bool
>> +can_use_internal_fn (gcall *call)
>> +{
>> +  /* Only replace calls that set errno.  */
>> +  if (!gimple_vdef (call))
>> +    return false;
>
> Oh, I managed to confuse this in my head while reading the patch.  So,
> hmm, you don't actually replace the builtin with an internal function
> (without the condition) under no-errno-math?  Does something else do that?
> Because otherwise that seems an unnecessary restriction?
>
>> >> r229916 fixed that for the non-EH case.
>> >
>> > Ah, missed it.  Even the EH case shouldn't be difficult.  If the
>> > original dominator of the EH destination was the call block it moves,
>> > otherwise it remains unchanged.
>>
>> The target of the edge is easy in itself, I agree, but that isn't
>> necessarily the only affected block, if the EH handler doesn't
>> exit or rethrow.
>
> You're worried the non-EH and the EH regions merge again, right?  Like so:
>
> before change:
>
>         BB1: throwing-call
>      fallthru/           \EH
>     BB2                   BBeh
>      |                   /....\ (stuff in EH-region)
>      |                     /    some path out of EH region
>      | /------------------/
>     BB3
>
> Here, BB3 must at least be dominated by BB1 (the throwing block), or by
> something further up (when there are other side-entries to the path
> BB2->BB3 or into the EH region).  When further up, nothing changes, when
> it's BB1, then it's afterwards dominated by the BB containing the
> condition.  So everything with idom==BB1 gets idom=Bcond, except for BBeh,
> which gets idom=Bcall.  Depending on how you split BB1, either Bcond or
> BBcall might still be BB1 and doesn't lead to changes in the dom tree.
>
>> > Currently we have quite some of such passes (reassoc, forwprop,
>> > lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap
>> > and others), but they are all handling only special situations in one
>> > way or the other.  pass_fold_builtins is another one, but it seems
>> > most related to what you want (replacing a call with something else),
>> > so I thought that'd be the natural choice.
>>
>> Well, to be pedantic, it's not really replacing the call.  Except for
>> the special case of targets that support direct assignments to errno,
>> it keeps the original call but ensures that it isn't usually executed.
>> From that point of view it doesn't really seem like a fold.
>>
>> But I suppose that's just naming again :-).  And it's easily solved with
>> s/fold/rewrite/.
>
> Exactly, in my mind pass_fold_builtin (like many of the others I
> mentioned) doesn't do folding but rewriting :)

So I am replying here to the issue of where to do the transform call_cdce
does and the one Richard wants to add.  For example we "lower"
posix_memalign as early as GIMPLE lowering (that's before CFG construction).
We also lower sincos to cexpi during GENERIC folding (or if that is dropped
either GIMPLE lowering or GIMPLE folding during gimplification would be
appropriate).

Now, with offloading we have to avoid creating target dependencies before
LTO stream-out (thus no IFN replacements before that - not sure if
Richards patches have an issue there already).  Which would leave us
with a lowering stage early in the main optimization pipeline - I think
fold_builtins pass is way too late but any "folding" pass will do
(like forwprop or backprop where the latter might be better because
it might end up computing FP "ranges" to improve the initial lowering code).

Of course call_cdce is as good as long as it still exists.

>> > call_cdce is also such a pass, but I think it's simply not the
>> > appropriate one (only in so far as its source file contains the helper
>> > routines you need), and in addition I think it shouldn't exist at all
>> > (and wouldn't need to if it had been part of DCE from the start, or if
>> > you implemented the conditionalizing as part of another pass).  Hey,
>> > you could be one to remove a pass! ;-)
>>
>> It still seems a bit artificial to me to say that the transformation
>> with a null lhs is "DCE enough" to go in the main DCE pass (even though
>> like I say it doesn't actually eliminate any code from the IR, it just
>> adds more code) and should be kept in a separate pass from the one that
>> does the transformation on a non-null lhs.
>
> Oh, I agree, I might not have been clear: I'm not arguing that the normal
> DCE should now be changed to do the conditionalizing when it removes an
> call LHS; I was saying that it _would_ have been good instead of adding
> the call_cdce pass in the past, when it was for DCE purposes only.

Yes, I also argued that.

> But
> now your proposal is on the plate, namely doing the conditionalizing also
> with an LHS.  So that conditionalizing should take place in some rewriting
> pass (and ideally not call_cdce), no matter the LHS, and normal DCE not be
> changed (it will still remove LHSs of non-removable calls, just that those
> then are sometimes under a condition, when DCE runs after the rewriting).

Indeed.  How difficult is it to integrate all this with backprop?

Thanks,
Richard.

>
> Ciao,
> Michael.
Richard Sandiford Nov. 13, 2015, 1:12 p.m. UTC | #8
Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Nov 9, 2015 at 10:03 PM, Michael Matz <matz@suse.de> wrote:
>> Hi,
>>
>> On Mon, 9 Nov 2015, Richard Sandiford wrote:
>>
>>> +static bool
>>> +can_use_internal_fn (gcall *call)
>>> +{
>>> +  /* Only replace calls that set errno.  */
>>> +  if (!gimple_vdef (call))
>>> +    return false;
>>
>> Oh, I managed to confuse this in my head while reading the patch.  So,
>> hmm, you don't actually replace the builtin with an internal function
>> (without the condition) under no-errno-math?  Does something else do that?
>> Because otherwise that seems an unnecessary restriction?
>>
>>> >> r229916 fixed that for the non-EH case.
>>> >
>>> > Ah, missed it.  Even the EH case shouldn't be difficult.  If the
>>> > original dominator of the EH destination was the call block it moves,
>>> > otherwise it remains unchanged.
>>>
>>> The target of the edge is easy in itself, I agree, but that isn't
>>> necessarily the only affected block, if the EH handler doesn't
>>> exit or rethrow.
>>
>> You're worried the non-EH and the EH regions merge again, right?  Like so:
>>
>> before change:
>>
>>         BB1: throwing-call
>>      fallthru/           \EH
>>     BB2                   BBeh
>>      |                   /....\ (stuff in EH-region)
>>      |                     /    some path out of EH region
>>      | /------------------/
>>     BB3
>>
>> Here, BB3 must at least be dominated by BB1 (the throwing block), or by
>> something further up (when there are other side-entries to the path
>> BB2->BB3 or into the EH region).  When further up, nothing changes, when
>> it's BB1, then it's afterwards dominated by the BB containing the
>> condition.  So everything with idom==BB1 gets idom=Bcond, except for BBeh,
>> which gets idom=Bcall.  Depending on how you split BB1, either Bcond or
>> BBcall might still be BB1 and doesn't lead to changes in the dom tree.
>>
>>> > Currently we have quite some of such passes (reassoc, forwprop,
>>> > lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap
>>> > and others), but they are all handling only special situations in one
>>> > way or the other.  pass_fold_builtins is another one, but it seems
>>> > most related to what you want (replacing a call with something else),
>>> > so I thought that'd be the natural choice.
>>>
>>> Well, to be pedantic, it's not really replacing the call.  Except for
>>> the special case of targets that support direct assignments to errno,
>>> it keeps the original call but ensures that it isn't usually executed.
>>> From that point of view it doesn't really seem like a fold.
>>>
>>> But I suppose that's just naming again :-).  And it's easily solved with
>>> s/fold/rewrite/.
>>
>> Exactly, in my mind pass_fold_builtin (like many of the others I
>> mentioned) doesn't do folding but rewriting :)
>
> So I am replying here to the issue of where to do the transform call_cdce
> does and the one Richard wants to add.  For example we "lower"
> posix_memalign as early as GIMPLE lowering (that's before CFG construction).
> We also lower sincos to cexpi during GENERIC folding (or if that is dropped
> either GIMPLE lowering or GIMPLE folding during gimplification would be
> appropriate).
>
> Now, with offloading we have to avoid creating target dependencies before
> LTO stream-out (thus no IFN replacements before that - not sure if
> Richards patches have an issue there already).

No, this patch was the earliest point at which we converted to internal
functions.  The idea was to make code treat ECF_PURE built-in functions
and internal functions as being basically equivalent.  There's therefore
not much benefit to doing a straight replacement of one with the other
during either GENERIC or gimple.  Instead the series only used internal
functions for things that built-in functions couldn't do, specifically:

- the case used in this patch, to optimise part of a non-pure built-in
  function using a pure equivalent.

- vector versions of built-in functions.

The cfgexpand patch makes sure that pure built-in functions are expanded
like internal functions where possible.

> Which would leave us with a lowering stage early in the main
> optimization pipeline - I think fold_builtins pass is way too late but
> any "folding" pass will do (like forwprop or backprop where the latter
> might be better because it might end up computing FP "ranges" to
> improve the initial lowering code).

This isn't at all related to what backprop is doing though.
backprop is about optimising definitions based on information
about all uses.

Does fold_builtins need to be as late as it is?

> Of course call_cdce is as good as long as it still exists.

Does this meann that you're not against the patch in principle
(i.e. keeping call_cdce for now and extending it in the way that
this patch does)?

Thanks,
Richard
Richard Biener Nov. 16, 2015, 2:01 p.m. UTC | #9
On Fri, Nov 13, 2015 at 2:12 PM, Richard Sandiford
<richard.sandiford@arm.com> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Mon, Nov 9, 2015 at 10:03 PM, Michael Matz <matz@suse.de> wrote:
>>> Hi,
>>>
>>> On Mon, 9 Nov 2015, Richard Sandiford wrote:
>>>
>>>> +static bool
>>>> +can_use_internal_fn (gcall *call)
>>>> +{
>>>> +  /* Only replace calls that set errno.  */
>>>> +  if (!gimple_vdef (call))
>>>> +    return false;
>>>
>>> Oh, I managed to confuse this in my head while reading the patch.  So,
>>> hmm, you don't actually replace the builtin with an internal function
>>> (without the condition) under no-errno-math?  Does something else do that?
>>> Because otherwise that seems an unnecessary restriction?
>>>
>>>> >> r229916 fixed that for the non-EH case.
>>>> >
>>>> > Ah, missed it.  Even the EH case shouldn't be difficult.  If the
>>>> > original dominator of the EH destination was the call block it moves,
>>>> > otherwise it remains unchanged.
>>>>
>>>> The target of the edge is easy in itself, I agree, but that isn't
>>>> necessarily the only affected block, if the EH handler doesn't
>>>> exit or rethrow.
>>>
>>> You're worried the non-EH and the EH regions merge again, right?  Like so:
>>>
>>> before change:
>>>
>>>         BB1: throwing-call
>>>      fallthru/           \EH
>>>     BB2                   BBeh
>>>      |                   /....\ (stuff in EH-region)
>>>      |                     /    some path out of EH region
>>>      | /------------------/
>>>     BB3
>>>
>>> Here, BB3 must at least be dominated by BB1 (the throwing block), or by
>>> something further up (when there are other side-entries to the path
>>> BB2->BB3 or into the EH region).  When further up, nothing changes, when
>>> it's BB1, then it's afterwards dominated by the BB containing the
>>> condition.  So everything with idom==BB1 gets idom=Bcond, except for BBeh,
>>> which gets idom=Bcall.  Depending on how you split BB1, either Bcond or
>>> BBcall might still be BB1 and doesn't lead to changes in the dom tree.
>>>
>>>> > Currently we have quite some of such passes (reassoc, forwprop,
>>>> > lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap
>>>> > and others), but they are all handling only special situations in one
>>>> > way or the other.  pass_fold_builtins is another one, but it seems
>>>> > most related to what you want (replacing a call with something else),
>>>> > so I thought that'd be the natural choice.
>>>>
>>>> Well, to be pedantic, it's not really replacing the call.  Except for
>>>> the special case of targets that support direct assignments to errno,
>>>> it keeps the original call but ensures that it isn't usually executed.
>>>> From that point of view it doesn't really seem like a fold.
>>>>
>>>> But I suppose that's just naming again :-).  And it's easily solved with
>>>> s/fold/rewrite/.
>>>
>>> Exactly, in my mind pass_fold_builtin (like many of the others I
>>> mentioned) doesn't do folding but rewriting :)
>>
>> So I am replying here to the issue of where to do the transform call_cdce
>> does and the one Richard wants to add.  For example we "lower"
>> posix_memalign as early as GIMPLE lowering (that's before CFG construction).
>> We also lower sincos to cexpi during GENERIC folding (or if that is dropped
>> either GIMPLE lowering or GIMPLE folding during gimplification would be
>> appropriate).
>>
>> Now, with offloading we have to avoid creating target dependencies before
>> LTO stream-out (thus no IFN replacements before that - not sure if
>> Richards patches have an issue there already).
>
> No, this patch was the earliest point at which we converted to internal
> functions.  The idea was to make code treat ECF_PURE built-in functions
> and internal functions as being basically equivalent.  There's therefore
> not much benefit to doing a straight replacement of one with the other
> during either GENERIC or gimple.  Instead the series only used internal
> functions for things that built-in functions couldn't do, specifically:
>
> - the case used in this patch, to optimise part of a non-pure built-in
>   function using a pure equivalent.
>
> - vector versions of built-in functions.
>
> The cfgexpand patch makes sure that pure built-in functions are expanded
> like internal functions where possible.
>
>> Which would leave us with a lowering stage early in the main
>> optimization pipeline - I think fold_builtins pass is way too late but
>> any "folding" pass will do (like forwprop or backprop where the latter
>> might be better because it might end up computing FP "ranges" to
>> improve the initial lowering code).
>
> This isn't at all related to what backprop is doing though.
> backprop is about optimising definitions based on information
> about all uses.
>
> Does fold_builtins need to be as late as it is?

Not sure.  It folds remaining __builtin_constant_p stuff for example.

>> Of course call_cdce is as good as long as it still exists.
>
> Does this meann that you're not against the patch in principle
> (i.e. keeping call_cdce for now and extending it in the way that
> this patch does)?

Yes, I'm fine with extending call_cdce.  Of course I'd happily
approve a patch dissolving it into somewhere where it makes more
sense.  But this shouldn't block this patch.

Richard.

>
> Thanks,
> Richard
>
Michael Matz Nov. 16, 2015, 4:19 p.m. UTC | #10
Hi,

On Mon, 16 Nov 2015, Richard Biener wrote:

> >> Which would leave us with a lowering stage early in the main 
> >> optimization pipeline - I think fold_builtins pass is way too late 
> >> but any "folding" pass will do (like forwprop or backprop where the 
> >> latter might be better because it might end up computing FP "ranges" 
> >> to improve the initial lowering code).
> >
> > This isn't at all related to what backprop is doing though. backprop 
> > is about optimising definitions based on information about all uses.

Right, I think backprop would be even worse than call_cdce; that pass has 
a completely different structure.

> >> Of course call_cdce is as good as long as it still exists.
> >
> > Does this meann that you're not against the patch in principle (i.e. 
> > keeping call_cdce for now and extending it in the way that this patch 
> > does)?
> 
> Yes, I'm fine with extending call_cdce.  Of course I'd happily approve a 
> patch dissolving it into somewhere where it makes more sense.  But this 
> shouldn't block this patch.

Okay, I like merging passes, so I'll try to do that, once the stuff is in 
:)


Ciao,
Michael.

Patch

diff --git a/gcc/builtins.c b/gcc/builtins.c
index bbcc7dc3..1c13a51 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -101,9 +101,6 @@  static rtx expand_builtin_apply (rtx, rtx, rtx);
 static void expand_builtin_return (rtx);
 static enum type_class type_to_class (tree);
 static rtx expand_builtin_classify_type (tree);
-static void expand_errno_check (tree, rtx);
-static rtx expand_builtin_mathfn (tree, rtx, rtx);
-static rtx expand_builtin_mathfn_2 (tree, rtx, rtx);
 static rtx expand_builtin_mathfn_3 (tree, rtx, rtx);
 static rtx expand_builtin_mathfn_ternary (tree, rtx, rtx);
 static rtx expand_builtin_interclass_mathfn (tree, rtx);
@@ -1972,286 +1969,6 @@  replacement_internal_fn (gcall *call)
   return IFN_LAST;
 }
 
-/* If errno must be maintained, expand the RTL to check if the result,
-   TARGET, of a built-in function call, EXP, is NaN, and if so set
-   errno to EDOM.  */
-
-static void
-expand_errno_check (tree exp, rtx target)
-{
-  rtx_code_label *lab = gen_label_rtx ();
-
-  /* Test the result; if it is NaN, set errno=EDOM because
-     the argument was not in the domain.  */
-  do_compare_rtx_and_jump (target, target, EQ, 0, GET_MODE (target),
-			   NULL_RTX, NULL, lab,
-			   /* The jump is very likely.  */
-			   REG_BR_PROB_BASE - (REG_BR_PROB_BASE / 2000 - 1));
-
-#ifdef TARGET_EDOM
-  /* If this built-in doesn't throw an exception, set errno directly.  */
-  if (TREE_NOTHROW (TREE_OPERAND (CALL_EXPR_FN (exp), 0)))
-    {
-#ifdef GEN_ERRNO_RTX
-      rtx errno_rtx = GEN_ERRNO_RTX;
-#else
-      rtx errno_rtx
-	  = gen_rtx_MEM (word_mode, gen_rtx_SYMBOL_REF (Pmode, "errno"));
-#endif
-      emit_move_insn (errno_rtx,
-		      gen_int_mode (TARGET_EDOM, GET_MODE (errno_rtx)));
-      emit_label (lab);
-      return;
-    }
-#endif
-
-  /* Make sure the library call isn't expanded as a tail call.  */
-  CALL_EXPR_TAILCALL (exp) = 0;
-
-  /* We can't set errno=EDOM directly; let the library call do it.
-     Pop the arguments right away in case the call gets deleted.  */
-  NO_DEFER_POP;
-  expand_call (exp, target, 0);
-  OK_DEFER_POP;
-  emit_label (lab);
-}
-
-/* Expand a call to one of the builtin math functions (sqrt, exp, or log).
-   Return NULL_RTX if a normal call should be emitted rather than expanding
-   the function in-line.  EXP is the expression that is a call to the builtin
-   function; if convenient, the result should be placed in TARGET.
-   SUBTARGET may be used as the target for computing one of EXP's operands.  */
-
-static rtx
-expand_builtin_mathfn (tree exp, rtx target, rtx subtarget)
-{
-  optab builtin_optab;
-  rtx op0;
-  rtx_insn *insns;
-  tree fndecl = get_callee_fndecl (exp);
-  machine_mode mode;
-  bool errno_set = false;
-  bool try_widening = false;
-  tree arg;
-
-  if (!validate_arglist (exp, REAL_TYPE, VOID_TYPE))
-    return NULL_RTX;
-
-  arg = CALL_EXPR_ARG (exp, 0);
-
-  switch (DECL_FUNCTION_CODE (fndecl))
-    {
-    CASE_FLT_FN (BUILT_IN_SQRT):
-      errno_set = ! tree_expr_nonnegative_p (arg);
-      try_widening = true;
-      builtin_optab = sqrt_optab;
-      break;
-    CASE_FLT_FN (BUILT_IN_EXP):
-      errno_set = true; builtin_optab = exp_optab; break;
-    CASE_FLT_FN (BUILT_IN_EXP10):
-    CASE_FLT_FN (BUILT_IN_POW10):
-      errno_set = true; builtin_optab = exp10_optab; break;
-    CASE_FLT_FN (BUILT_IN_EXP2):
-      errno_set = true; builtin_optab = exp2_optab; break;
-    CASE_FLT_FN (BUILT_IN_EXPM1):
-      errno_set = true; builtin_optab = expm1_optab; break;
-    CASE_FLT_FN (BUILT_IN_LOGB):
-      errno_set = true; builtin_optab = logb_optab; break;
-    CASE_FLT_FN (BUILT_IN_LOG):
-      errno_set = true; builtin_optab = log_optab; break;
-    CASE_FLT_FN (BUILT_IN_LOG10):
-      errno_set = true; builtin_optab = log10_optab; break;
-    CASE_FLT_FN (BUILT_IN_LOG2):
-      errno_set = true; builtin_optab = log2_optab; break;
-    CASE_FLT_FN (BUILT_IN_LOG1P):
-      errno_set = true; builtin_optab = log1p_optab; break;
-    CASE_FLT_FN (BUILT_IN_ASIN):
-      builtin_optab = asin_optab; break;
-    CASE_FLT_FN (BUILT_IN_ACOS):
-      builtin_optab = acos_optab; break;
-    CASE_FLT_FN (BUILT_IN_TAN):
-      builtin_optab = tan_optab; break;
-    CASE_FLT_FN (BUILT_IN_ATAN):
-      builtin_optab = atan_optab; break;
-    CASE_FLT_FN (BUILT_IN_FLOOR):
-      builtin_optab = floor_optab; break;
-    CASE_FLT_FN (BUILT_IN_CEIL):
-      builtin_optab = ceil_optab; break;
-    CASE_FLT_FN (BUILT_IN_TRUNC):
-      builtin_optab = btrunc_optab; break;
-    CASE_FLT_FN (BUILT_IN_ROUND):
-      builtin_optab = round_optab; break;
-    CASE_FLT_FN (BUILT_IN_NEARBYINT):
-      builtin_optab = nearbyint_optab;
-      if (flag_trapping_math)
-	break;
-      /* Else fallthrough and expand as rint.  */
-    CASE_FLT_FN (BUILT_IN_RINT):
-      builtin_optab = rint_optab; break;
-    CASE_FLT_FN (BUILT_IN_SIGNIFICAND):
-      builtin_optab = significand_optab; break;
-    default:
-      gcc_unreachable ();
-    }
-
-  /* Make a suitable register to place result in.  */
-  mode = TYPE_MODE (TREE_TYPE (exp));
-
-  if (! flag_errno_math || ! HONOR_NANS (mode))
-    errno_set = false;
-
-  /* Before working hard, check whether the instruction is available, but try
-     to widen the mode for specific operations.  */
-  if ((optab_handler (builtin_optab, mode) != CODE_FOR_nothing
-       || (try_widening && !excess_precision_type (TREE_TYPE (exp))))
-      && (!errno_set || !optimize_insn_for_size_p ()))
-    {
-      rtx result = gen_reg_rtx (mode);
-
-      /* Wrap the computation of the argument in a SAVE_EXPR, as we may
-	 need to expand the argument again.  This way, we will not perform
-	 side-effects more the once.  */
-      CALL_EXPR_ARG (exp, 0) = arg = builtin_save_expr (arg);
-
-      op0 = expand_expr (arg, subtarget, VOIDmode, EXPAND_NORMAL);
-
-      start_sequence ();
-
-      /* Compute into RESULT.
-	 Set RESULT to wherever the result comes back.  */
-      result = expand_unop (mode, builtin_optab, op0, result, 0);
-
-      if (result != 0)
-	{
-	  if (errno_set)
-	    expand_errno_check (exp, result);
-
-	  /* Output the entire sequence.  */
-	  insns = get_insns ();
-	  end_sequence ();
-	  emit_insn (insns);
-	  return result;
-	}
-
-      /* If we were unable to expand via the builtin, stop the sequence
-	 (without outputting the insns) and call to the library function
-	 with the stabilized argument list.  */
-      end_sequence ();
-    }
-
-  return expand_call (exp, target, target == const0_rtx);
-}
-
-/* Expand a call to the builtin binary math functions (pow and atan2).
-   Return NULL_RTX if a normal call should be emitted rather than expanding the
-   function in-line.  EXP is the expression that is a call to the builtin
-   function; if convenient, the result should be placed in TARGET.
-   SUBTARGET may be used as the target for computing one of EXP's
-   operands.  */
-
-static rtx
-expand_builtin_mathfn_2 (tree exp, rtx target, rtx subtarget)
-{
-  optab builtin_optab;
-  rtx op0, op1, result;
-  rtx_insn *insns;
-  int op1_type = REAL_TYPE;
-  tree fndecl = get_callee_fndecl (exp);
-  tree arg0, arg1;
-  machine_mode mode;
-  bool errno_set = true;
-
-  switch (DECL_FUNCTION_CODE (fndecl))
-    {
-    CASE_FLT_FN (BUILT_IN_SCALBN):
-    CASE_FLT_FN (BUILT_IN_SCALBLN):
-    CASE_FLT_FN (BUILT_IN_LDEXP):
-      op1_type = INTEGER_TYPE;
-    default:
-      break;
-    }
-
-  if (!validate_arglist (exp, REAL_TYPE, op1_type, VOID_TYPE))
-    return NULL_RTX;
-
-  arg0 = CALL_EXPR_ARG (exp, 0);
-  arg1 = CALL_EXPR_ARG (exp, 1);
-
-  switch (DECL_FUNCTION_CODE (fndecl))
-    {
-    CASE_FLT_FN (BUILT_IN_POW):
-      builtin_optab = pow_optab; break;
-    CASE_FLT_FN (BUILT_IN_ATAN2):
-      builtin_optab = atan2_optab; break;
-    CASE_FLT_FN (BUILT_IN_SCALB):
-      if (REAL_MODE_FORMAT (TYPE_MODE (TREE_TYPE (exp)))->b != 2)
-	return 0;
-      builtin_optab = scalb_optab; break;
-    CASE_FLT_FN (BUILT_IN_SCALBN):
-    CASE_FLT_FN (BUILT_IN_SCALBLN):
-      if (REAL_MODE_FORMAT (TYPE_MODE (TREE_TYPE (exp)))->b != 2)
-	return 0;
-    /* Fall through... */
-    CASE_FLT_FN (BUILT_IN_LDEXP):
-      builtin_optab = ldexp_optab; break;
-    CASE_FLT_FN (BUILT_IN_FMOD):
-      builtin_optab = fmod_optab; break;
-    CASE_FLT_FN (BUILT_IN_REMAINDER):
-    CASE_FLT_FN (BUILT_IN_DREM):
-      builtin_optab = remainder_optab; break;
-    default:
-      gcc_unreachable ();
-    }
-
-  /* Make a suitable register to place result in.  */
-  mode = TYPE_MODE (TREE_TYPE (exp));
-
-  /* Before working hard, check whether the instruction is available.  */
-  if (optab_handler (builtin_optab, mode) == CODE_FOR_nothing)
-    return NULL_RTX;
-
-  result = gen_reg_rtx (mode);
-
-  if (! flag_errno_math || ! HONOR_NANS (mode))
-    errno_set = false;
-
-  if (errno_set && optimize_insn_for_size_p ())
-    return 0;
-
-  /* Always stabilize the argument list.  */
-  CALL_EXPR_ARG (exp, 0) = arg0 = builtin_save_expr (arg0);
-  CALL_EXPR_ARG (exp, 1) = arg1 = builtin_save_expr (arg1);
-
-  op0 = expand_expr (arg0, subtarget, VOIDmode, EXPAND_NORMAL);
-  op1 = expand_normal (arg1);
-
-  start_sequence ();
-
-  /* Compute into RESULT.
-     Set RESULT to wherever the result comes back.  */
-  result = expand_binop (mode, builtin_optab, op0, op1,
-			 result, 0, OPTAB_DIRECT);
-
-  /* If we were unable to expand via the builtin, stop the sequence
-     (without outputting the insns) and call to the library function
-     with the stabilized argument list.  */
-  if (result == 0)
-    {
-      end_sequence ();
-      return expand_call (exp, target, target == const0_rtx);
-    }
-
-  if (errno_set)
-    expand_errno_check (exp, result);
-
-  /* Output the entire sequence.  */
-  insns = get_insns ();
-  end_sequence ();
-  emit_insn (insns);
-
-  return result;
-}
-
 /* Expand a call to the builtin trinary math functions (fma).
    Return NULL_RTX if a normal call should be emitted rather than expanding the
    function in-line.  EXP is the expression that is a call to the builtin
@@ -5984,37 +5701,6 @@  expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
     CASE_FLT_FN (BUILT_IN_CABS):
       break;
 
-    CASE_FLT_FN (BUILT_IN_EXP):
-    CASE_FLT_FN (BUILT_IN_EXP10):
-    CASE_FLT_FN (BUILT_IN_POW10):
-    CASE_FLT_FN (BUILT_IN_EXP2):
-    CASE_FLT_FN (BUILT_IN_EXPM1):
-    CASE_FLT_FN (BUILT_IN_LOGB):
-    CASE_FLT_FN (BUILT_IN_LOG):
-    CASE_FLT_FN (BUILT_IN_LOG10):
-    CASE_FLT_FN (BUILT_IN_LOG2):
-    CASE_FLT_FN (BUILT_IN_LOG1P):
-    CASE_FLT_FN (BUILT_IN_TAN):
-    CASE_FLT_FN (BUILT_IN_ASIN):
-    CASE_FLT_FN (BUILT_IN_ACOS):
-    CASE_FLT_FN (BUILT_IN_ATAN):
-    CASE_FLT_FN (BUILT_IN_SIGNIFICAND):
-      /* Treat these like sqrt only if unsafe math optimizations are allowed,
-	 because of possible accuracy problems.  */
-      if (! flag_unsafe_math_optimizations)
-	break;
-    CASE_FLT_FN (BUILT_IN_SQRT):
-    CASE_FLT_FN (BUILT_IN_FLOOR):
-    CASE_FLT_FN (BUILT_IN_CEIL):
-    CASE_FLT_FN (BUILT_IN_TRUNC):
-    CASE_FLT_FN (BUILT_IN_ROUND):
-    CASE_FLT_FN (BUILT_IN_NEARBYINT):
-    CASE_FLT_FN (BUILT_IN_RINT):
-      target = expand_builtin_mathfn (exp, target, subtarget);
-      if (target)
-	return target;
-      break;
-
     CASE_FLT_FN (BUILT_IN_FMA):
       target = expand_builtin_mathfn_ternary (exp, target, subtarget);
       if (target)
@@ -6061,23 +5747,6 @@  expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
 	return target;
       break;
 
-    CASE_FLT_FN (BUILT_IN_ATAN2):
-    CASE_FLT_FN (BUILT_IN_LDEXP):
-    CASE_FLT_FN (BUILT_IN_SCALB):
-    CASE_FLT_FN (BUILT_IN_SCALBN):
-    CASE_FLT_FN (BUILT_IN_SCALBLN):
-      if (! flag_unsafe_math_optimizations)
-	break;
-
-    CASE_FLT_FN (BUILT_IN_FMOD):
-    CASE_FLT_FN (BUILT_IN_REMAINDER):
-    CASE_FLT_FN (BUILT_IN_DREM):
-    CASE_FLT_FN (BUILT_IN_POW):
-      target = expand_builtin_mathfn_2 (exp, target, subtarget);
-      if (target)
-	return target;
-      break;
-
     CASE_FLT_FN (BUILT_IN_CEXPI):
       target = expand_builtin_cexpi (exp, target);
       gcc_assert (target);
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index c03c0fc..898c83d 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2069,6 +2069,24 @@  expand_GOACC_REDUCTION (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* Set errno to EDOM.  */
+
+static void
+expand_SET_EDOM (internal_fn, gcall *)
+{
+#ifdef TARGET_EDOM
+#ifdef GEN_ERRNO_RTX
+  rtx errno_rtx = GEN_ERRNO_RTX;
+#else
+  rtx errno_rtx = gen_rtx_MEM (word_mode, gen_rtx_SYMBOL_REF (Pmode, "errno"));
+#endif
+  emit_move_insn (errno_rtx,
+		  gen_int_mode (TARGET_EDOM, GET_MODE (errno_rtx)));
+#else
+  gcc_unreachable ();
+#endif
+}
+
 /* Expand a call to FN using the operands in STMT.  FN has a single
    output operand and NARGS input operands.  */
 
@@ -2213,6 +2231,18 @@  direct_internal_fn_supported_p (internal_fn fn, tree type)
   return direct_internal_fn_supported_p (fn, tree_pair (type, type));
 }
 
+/* Return true if IFN_SET_EDOM is supported.  */
+
+bool
+set_edom_supported_p (void)
+{
+#ifdef TARGET_EDOM
+  return true;
+#else
+  return false;
+#endif
+}
+
 #define DEF_INTERNAL_OPTAB_FN(CODE, FLAGS, OPTAB, TYPE) \
   static void						\
   expand_##CODE (internal_fn fn, gcall *stmt)		\
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index bf8047a..825dba1 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -181,6 +181,10 @@  DEF_INTERNAL_FN (GOACC_LOOP, ECF_PURE | ECF_NOTHROW, NULL)
 /* OpenACC reduction abstraction.  See internal-fn.h  for usage.  */
 DEF_INTERNAL_FN (GOACC_REDUCTION, ECF_NOTHROW | ECF_LEAF, NULL)
 
+/* Set errno to EDOM, if GCC knows how to do that directly for the
+   current target.  */
+DEF_INTERNAL_FN (SET_EDOM, ECF_LEAF | ECF_NOTHROW, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_OPTAB_FN
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 5ee43b8..6cb123f 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -160,6 +160,7 @@  extern tree_pair direct_internal_fn_types (internal_fn, tree, tree *);
 extern tree_pair direct_internal_fn_types (internal_fn, gcall *);
 extern bool direct_internal_fn_supported_p (internal_fn, tree_pair);
 extern bool direct_internal_fn_supported_p (internal_fn, tree);
+extern bool set_edom_supported_p (void);
 
 extern void expand_internal_call (gcall *);
 extern void expand_internal_call (internal_fn, gcall *);
diff --git a/gcc/opts.c b/gcc/opts.c
index 9a3fbb3..99d35ae 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -464,6 +464,7 @@  static const struct default_options default_options_table[] =
     { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fmove_loop_invariants, NULL, 1 },
     { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_ftree_pta, NULL, 1 },
     { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fssa_phiopt, NULL, 1 },
+    { OPT_LEVELS_1_PLUS, OPT_ftree_builtin_call_dce, NULL, 1 },
 
     /* -O2 optimizations.  */
     { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
@@ -489,7 +490,6 @@  static const struct default_options default_options_table[] =
       REORDER_BLOCKS_ALGORITHM_STC },
     { OPT_LEVELS_2_PLUS, OPT_freorder_functions, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 },
-    { OPT_LEVELS_2_PLUS, OPT_ftree_builtin_call_dce, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_ftree_pre, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_ftree_switch_conversion, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_fipa_cp, NULL, 1 },
diff --git a/gcc/tree-call-cdce.c b/gcc/tree-call-cdce.c
index a5f38ce..986c218 100644
--- a/gcc/tree-call-cdce.c
+++ b/gcc/tree-call-cdce.c
@@ -33,45 +33,77 @@  along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "tree-cfg.h"
 #include "tree-into-ssa.h"
+#include "builtins.h"
+#include "internal-fn.h"
 
 
-/* Conditional dead call elimination
+/* This pass serves two closely-related purposes:
 
-   Some builtin functions can set errno on error conditions, but they
-   are otherwise pure.  If the result of a call to such a function is
-   not used, the compiler can still not eliminate the call without
-   powerful interprocedural analysis to prove that the errno is not
-   checked.  However, if the conditions under which the error occurs
-   are known, the compiler can conditionally dead code eliminate the
-   calls by shrink-wrapping the semi-dead calls into the error condition:
+   1. It conditionally executes calls that set errno if (a) the result of
+      the call is unused and (b) a simple range check on the arguments can
+      detect most cases where errno does not need to be set.
 
-        built_in_call (args)
-          ==>
-        if (error_cond (args))
-             built_in_call (args)
+      This is the "conditional dead call elimination" that gave the pass
+      its original name, since the call is dead for most argument values.
+      The calls for which it helps are usually part of the C++ abstraction
+      penalty exposed after inlining.
 
-    An actual simple example is :
-         log (x);   // Mostly dead call
+   2. It looks for calls to built-in functions that set errno and whose
+      result is used.  It checks whether there is an associated internal
+      function that doesn't set errno and whether the target supports
+      that internal function.  If so, the pass uses the internal function
+      to compute the result of the built-in function but still arranges
+      for errno to be set when necessary.  There are two ways of setting
+      errno:
+
+      a. by protecting the original call with the same argument checks as (1)
+
+      b. by protecting the original call with a check that the result
+	 of the internal function is not equal to itself (i.e. is NaN).
+
+      (b) requires that NaNs are the only erroneous results.  It is not
+      appropriate for functions like log, which sets errno to ERANGE for a
+      zero argument.  (b) is also likely to perform worse than (a) because it
+      requires the result to be calculated first.  The pass therefore uses
+      (a) when it can and uses (b) as a fallback.
+
+      For (b) the pass can replace the original call with a call to
+      IFN_SET_EDOM, if the target supports direct assignments to errno.
+
+   In both cases, arguments that require errno to be set should occur
+   rarely in practice.  Checks of the errno result should also be rare,
+   but the compiler would need powerful interprocedural analysis to
+   prove that errno is not checked.  It's much easier to add argument
+   checks or result checks instead.
+
+     An example of (1) is:
+
+	 log (x);   // Mostly dead call
      ==>
-         if (x <= 0)
-             log (x);
+	 if (x <= 0)
+	     log (x);
+
      With this change, call to log (x) is effectively eliminated, as
-     in majority of the cases, log won't be called with x out of
+     in the majority of the cases, log won't be called with x out of
      range.  The branch is totally predictable, so the branch cost
      is low.
 
+     An example of (2) is:
+
+	y = sqrt (x);
+     ==>
+	y = IFN_SQRT (x);
+	if (x < 0)
+	    sqrt (x);
+
+     In the vast majority of cases we should then never need to call sqrt.
+
    Note that library functions are not supposed to clear errno to zero without
    error.  See IEEE Std 1003.1, section 2.3 Error Numbers, and section 7.5:3 of
    ISO/IEC 9899 (C99).
 
    The condition wrapping the builtin call is conservatively set to avoid too
-   aggressive (wrong) shrink wrapping.  The optimization is called conditional
-   dead call elimination because the call is eliminated under the condition
-   that the input arguments would not lead to domain or range error (for
-   instance when x <= 0 for a log (x) call), however the chances that the error
-   condition is hit is very low (those builtin calls which are conditionally
-   dead are usually part of the C++ abstraction penalty exposed after
-   inlining).  */
+   aggressive (wrong) shrink wrapping.  */
 
 
 /* A structure for representing input domain of
@@ -250,28 +282,15 @@  check_builtin_call (gcall *bcall)
   return check_target_format (arg);
 }
 
-/* A helper function to determine if a builtin function call is a
-   candidate for conditional DCE.  Returns true if the builtin call
-   is a candidate.  */
+/* Return true if built-in function call CALL calls a math function
+   and if we know how to test the range of its arguments to detect _most_
+   situations in which errno is not set.  The test must err on the side
+   of treating non-erroneous values as potentially erroneous.  */
 
 static bool
-is_call_dce_candidate (gcall *call)
+can_test_argument_range (gcall *call)
 {
-  tree fn;
-  enum built_in_function fnc;
-
-  /* Only potentially dead calls are considered.  */
-  if (gimple_call_lhs (call))
-    return false;
-
-  fn = gimple_call_fndecl (call);
-  if (!fn
-      || !DECL_BUILT_IN (fn)
-      || (DECL_BUILT_IN_CLASS (fn) != BUILT_IN_NORMAL))
-    return false;
-
-  fnc = DECL_FUNCTION_CODE (fn);
-  switch (fnc)
+  switch (DECL_FUNCTION_CODE (gimple_call_fndecl (call)))
     {
     /* Trig functions.  */
     CASE_FLT_FN (BUILT_IN_ACOS):
@@ -305,6 +324,39 @@  is_call_dce_candidate (gcall *call)
   return false;
 }
 
+/* Return true if CALL can produce a domain error (EDOM) but can never
+   produce a pole or range overflow error (ERANGE).  This means that we
+   can tell whether a function would have set errno by testing whether
+   the result is a NaN.
+
+   Note that we do not consider range underflow errors, which are
+   implementation-defined in C99.  */
+
+static bool
+edom_only_function (gcall *call)
+{
+  switch (DECL_FUNCTION_CODE (gimple_call_fndecl (call)))
+    {
+    CASE_FLT_FN (BUILT_IN_ACOS):
+    CASE_FLT_FN (BUILT_IN_ASIN):
+    CASE_FLT_FN (BUILT_IN_ATAN):
+    CASE_FLT_FN (BUILT_IN_COS):
+    CASE_FLT_FN (BUILT_IN_EXP):
+    CASE_FLT_FN (BUILT_IN_EXP10):
+    CASE_FLT_FN (BUILT_IN_EXP2):
+    CASE_FLT_FN (BUILT_IN_EXPM1):
+    CASE_FLT_FN (BUILT_IN_SIGNIFICAND):
+    CASE_FLT_FN (BUILT_IN_SIN):
+    CASE_FLT_FN (BUILT_IN_SQRT):
+    CASE_FLT_FN (BUILT_IN_ATAN2):
+    CASE_FLT_FN (BUILT_IN_FMOD):
+    CASE_FLT_FN (BUILT_IN_REMAINDER):
+      return true;
+
+    default:
+      return false;
+    }
+}
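
(A tiny illustration, not part of the patch and assuming a libm that
reports errors through errno, of why sqrt qualifies for the list above
while log is deliberately absent:

    /* Link with -lm.  */
    #include <errno.h>
    #include <math.h>
    #include <stdio.h>

    int
    main (void)
    {
      errno = 0;
      double a = sqrt (-1.0);  /* NaN result, errno == EDOM */
      printf ("%f %d\n", a, errno == EDOM);

      errno = 0;
      double b = log (0.0);    /* -inf result, errno == ERANGE, not a NaN */
      printf ("%f %d\n", b, errno == ERANGE);
      return 0;
    }

Because log's erroneous result is -inf rather than a NaN, the
result-based test in (b) cannot detect it, so log has to rely on the
argument checks from (1).)
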
 
 /* A helper function to generate gimple statements for
    one bound comparison.  ARG is the call argument to
@@ -704,33 +756,24 @@  gen_shrink_wrap_conditions (gcall *bi_call, vec<gimple *> conds,
 /* Probability of the branch (to the call) is taken.  */
 #define ERR_PROB 0.01
 
-/* The function to shrink wrap a partially dead builtin call
-   whose return value is not used anywhere, but has to be kept
-   live due to potential error condition.  Returns true if the
-   transformation actually happens.  */
+/* Shrink-wrap BI_CALL so that it is only called when the NCONDS conditions
+   in CONDS are true.
+
+   Return true on success, in which case the cfg will have been updated.  */
 
 static bool
-shrink_wrap_one_built_in_call (gcall *bi_call)
+shrink_wrap_one_built_in_call_with_conds (gcall *bi_call, vec <gimple *> conds,
+					  unsigned int nconds)
 {
   gimple_stmt_iterator bi_call_bsi;
   basic_block bi_call_bb, join_tgt_bb, guard_bb;
   edge join_tgt_in_edge_from_call, join_tgt_in_edge_fall_thru;
   edge bi_call_in_edge0, guard_bb_in_edge;
-  unsigned tn_cond_stmts, nconds;
+  unsigned tn_cond_stmts;
   unsigned ci;
   gimple *cond_expr = NULL;
   gimple *cond_expr_start;
 
-  auto_vec<gimple *, 12> conds;
-  gen_shrink_wrap_conditions (bi_call, conds, &nconds);
-
-  /* This can happen if the condition generator decides
-     it is not beneficial to do the transformation.  Just
-     return false and do not do any transformation for
-     the call.  */
-  if (nconds == 0)
-    return false;
-
   /* The cfg we want to create looks like this:
 
 	   [guard n-1]         <- guard_bb (old block)
@@ -869,6 +912,117 @@  shrink_wrap_one_built_in_call (gcall *bi_call)
   return true;
 }
 
+/* Shrink-wrap BI_CALL so that it is only called when it might set errno
+   (but is always called if it would set errno).
+
+   Return true on success, in which case the cfg will have been updated.  */
+
+static bool
+shrink_wrap_one_built_in_call (gcall *bi_call)
+{
+  unsigned nconds = 0;
+  auto_vec<gimple *, 12> conds;
+  gen_shrink_wrap_conditions (bi_call, conds, &nconds);
+  /* This can happen if the condition generator decides
+     it is not beneficial to do the transformation.  Just
+     return false and do not do any transformation for
+     the call.  */
+  if (nconds == 0)
+    return false;
+  return shrink_wrap_one_built_in_call_with_conds (bi_call, conds, nconds);
+}
+
+/* Return true if built-in function call CALL could be implemented using
+   a combination of an internal function to compute the result and a
+   separate call to set errno.  */
+
+static bool
+can_use_internal_fn (gcall *call)
+{
+  /* Only replace calls that set errno.  */
+  if (!gimple_vdef (call))
+    return false;
+
+  /* Punt if we can't conditionalize the call.  */
+  basic_block bb = gimple_bb (call);
+  if (stmt_ends_bb_p (call) && !find_fallthru_edge (bb->succs))
+    return false;
+
+  /* See whether there is an internal function for this built-in.  */
+  if (replacement_internal_fn (call) == IFN_LAST)
+    return false;
+
+  /* See whether we can catch all cases where errno would be set,
+     while still avoiding the call in most cases.  */
+  if (!can_test_argument_range (call)
+      && !edom_only_function (call))
+    return false;
+
+  return true;
+}
+
+/* Implement built-in function call CALL using an internal function.
+   Return true on success, in which case the cfg will have changed.  */
+
+static bool
+use_internal_fn (gcall *call)
+{
+  unsigned nconds = 0;
+  auto_vec<gimple *, 12> conds;
+  gen_shrink_wrap_conditions (call, conds, &nconds);
+  if (nconds == 0 && !edom_only_function (call))
+    return false;
+
+  internal_fn ifn = replacement_internal_fn (call);
+  gcc_assert (ifn != IFN_LAST);
+
+  /* Construct the new call, with the same arguments as the original one.  */
+  auto_vec <tree, 16> args;
+  unsigned int nargs = gimple_call_num_args (call);
+  for (unsigned int i = 0; i < nargs; ++i)
+    args.safe_push (gimple_call_arg (call, i));
+  gcall *new_call = gimple_build_call_internal_vec (ifn, args);
+  gimple_set_location (new_call, gimple_location (call));
+
+  /* Transfer the LHS to the new call.  */
+  tree lhs = gimple_call_lhs (call);
+  gimple_call_set_lhs (new_call, lhs);
+  gimple_call_set_lhs (call, NULL_TREE);
+  SSA_NAME_DEF_STMT (lhs) = new_call;
+
+  /* Insert the new call.  */
+  gimple_stmt_iterator gsi = gsi_for_stmt (call);
+  gsi_insert_before (&gsi, new_call, GSI_SAME_STMT);
+
+  if (nconds == 0)
+    {
+      /* Check LHS != LHS.  If we reach here, EDOM is the only
+	 valid errno value and it is used iff the result is NaN.  */
+      conds.quick_push (gimple_build_cond (NE_EXPR, lhs, lhs,
+					   NULL_TREE, NULL_TREE));
+      nconds++;
+
+      /* Try replacing the original call with a direct assignment to
+	 errno, via an internal function.  */
+      if (set_edom_supported_p () && !stmt_ends_bb_p (call))
+	{
+	  gimple_stmt_iterator gsi = gsi_for_stmt (call);
+	  gcall *new_call = gimple_build_call_internal (IFN_SET_EDOM, 0);
+	  gimple_set_vuse (new_call, gimple_vuse (call));
+	  gimple_set_vdef (new_call, gimple_vdef (call));
+	  SSA_NAME_DEF_STMT (gimple_vdef (new_call)) = new_call;
+	  gimple_set_location (new_call, gimple_location (call));
+	  gsi_replace (&gsi, new_call, false);
+	  call = new_call;
+	}
+    }
+
+  if (!shrink_wrap_one_built_in_call_with_conds (call, conds, nconds))
+    /* It's too late to back out now.  */
+    gcc_unreachable ();
+  return true;
+}
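
(To make the fallback concrete, a rough C-level model (mine, not
generated output) of what use_internal_fn aims to produce for a built-in
that has no argument check in the pass, using fmod as the example and
my_fmod_insn as a hypothetical stand-in for the errno-free internal
function:

    #include <errno.h>

    extern double my_fmod_insn (double, double);

    double
    wrapped_fmod (double x, double y)
    {
      double r = my_fmod_insn (x, y);
      if (r != r)      /* NaN: the only case in which errno may matter */
        errno = EDOM;  /* IFN_SET_EDOM; without it, re-call fmod (x, y) */
      return r;
    }

When can_test_argument_range holds instead, the guard tests the
arguments, as in the sqrt example at the top of the file, and the result
never needs to be inspected.)
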
+
 /* The top level function for conditional dead code shrink
    wrapping transformation.  */
 
@@ -885,7 +1039,10 @@  shrink_wrap_conditional_dead_built_in_calls (vec<gcall *> calls)
   for (; i < n ; i++)
     {
       gcall *bi_call = calls[i];
-      changed |= shrink_wrap_one_built_in_call (bi_call);
+      if (gimple_call_lhs (bi_call))
+	changed |= use_internal_fn (bi_call);
+      else
+	changed |= shrink_wrap_one_built_in_call (bi_call);
     }
 
   return changed;
@@ -914,13 +1071,12 @@  public:
   {}
 
   /* opt_pass methods: */
-  virtual bool gate (function *fun)
+  virtual bool gate (function *)
     {
       /* The limit constants used in the implementation
 	 assume IEEE floating point format.  Other formats
 	 can be supported in the future if needed.  */
-      return flag_tree_builtin_call_dce != 0
-       	&& optimize_function_for_speed_p (fun);
+      return flag_tree_builtin_call_dce != 0;
     }
 
   virtual unsigned int execute (function *);
@@ -936,11 +1092,20 @@  pass_call_cdce::execute (function *fun)
   auto_vec<gcall *> cond_dead_built_in_calls;
   FOR_EACH_BB_FN (bb, fun)
     {
+      /* Skip blocks that are being optimized for size, since our
+	 transformation always increases code size.  */
+      if (optimize_bb_for_size_p (bb))
+	continue;
+
       /* Collect dead call candidates.  */
       for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next (&i))
         {
 	  gcall *stmt = dyn_cast <gcall *> (gsi_stmt (i));
-          if (stmt && is_call_dce_candidate (stmt))
+          if (stmt
+	      && gimple_call_builtin_p (stmt, BUILT_IN_NORMAL)
+	      && (gimple_call_lhs (stmt)
+		  ? can_use_internal_fn (stmt)
+		  : can_test_argument_range (stmt)))
             {
               if (dump_file && (dump_flags & TDF_DETAILS))
                 {