Message-ID: CAEFO=4AiqFvHH5sb1xWguEoSY2osH+NZJzWELkHqefEbgTd_6g@mail.gmail.com

State: New
ping

On Sat, Aug 4, 2018 at 10:22 AM, Giuliano Augusto Faulin Belinassi
<giuliano.belinassi@usp.br> wrote:
> Closes bug #86829
>
> Description: Adds substitution rules for both sin(atan(x)) and
> cos(atan(x)). These formulas are replaced by x / sqrt(x*x + 1) and 1 /
> sqrt(x*x + 1) respectively, providing up to 10x speedup. This identity
> can be proved mathematically.
>
> Changelog:
>
> 2018-08-03  Giuliano Belinassi  <giuliano.belinassi@usp.br>
>
>     * match.pd: add simplification rules to sin(atan(x)) and cos(atan(x)).
>
> Bootstrap and Testing:
> There were no unexpected failures in a proper testing in GCC 8.1.0
> under a x86_64 running Ubuntu 18.04.
On 08/04/2018 07:22 AM, Giuliano Augusto Faulin Belinassi wrote:
> Closes bug #86829
>
> Description: Adds substitution rules for both sin(atan(x)) and
> cos(atan(x)). These formulas are replaced by x / sqrt(x*x + 1) and 1 /
> sqrt(x*x + 1) respectively, providing up to 10x speedup. This identity
> can be proved mathematically.
>
> Changelog:
>
> 2018-08-03  Giuliano Belinassi  <giuliano.belinassi@usp.br>
>
>     * match.pd: add simplification rules to sin(atan(x)) and cos(atan(x)).
>
> Bootstrap and Testing:
> There were no unexpected failures in a proper testing in GCC 8.1.0
> under a x86_64 running Ubuntu 18.04.
I understand these are mathematical identities. But floating point
arithmetic in a compiler isn't nearly that clean :) We have to worry
about overflows, underflows, rounding, and the simple fact that many
floating point numbers can't be exactly represented.

Just as an example, compare the results for
x = 0x1.fffffffffffffp1023

I think sin(atan (x)) is well defined in that case. But the x*x isn't
because it overflows.

So I think this has to be somewhere under the -ffast-math umbrella.
And the testing requirements for that are painful -- you have to verify
it doesn't break the spec benchmark.

I know Richi acked in the PR, but that might have been premature.

jeff
On Mon, Aug 20, 2018 at 9:40 PM Jeff Law <law@redhat.com> wrote:
>
> On 08/04/2018 07:22 AM, Giuliano Augusto Faulin Belinassi wrote:
> > Closes bug #86829
> >
> > Description: Adds substitution rules for both sin(atan(x)) and
> > cos(atan(x)). These formulas are replaced by x / sqrt(x*x + 1) and 1 /
> > sqrt(x*x + 1) respectively, providing up to 10x speedup. This identity
> > can be proved mathematically.
> >
> > Changelog:
> >
> > 2018-08-03  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> >
> >     * match.pd: add simplification rules to sin(atan(x)) and cos(atan(x)).
> >
> > Bootstrap and Testing:
> > There were no unexpected failures in a proper testing in GCC 8.1.0
> > under a x86_64 running Ubuntu 18.04.
> I understand these are mathematical identities. But floating point
> arithmetic in a compiler isn't nearly that clean :) We have to worry
> about overflows, underflows, rounding, and the simple fact that many
> floating point numbers can't be exactly represented.
>
> Just as an example, compare the results for
> x = 0x1.fffffffffffffp1023
>
> I think sin(atan (x)) is well defined in that case. But the x*x isn't
> because it overflows.
>
> So I think this has to be somewhere under the -ffast-math umbrella.
> And the testing requirements for that are painful -- you have to verify
> it doesn't break the spec benchmark.
>
> I know Richi acked in the PR, but that might have been premature.

It's under the flag_unsafe_math_optimizations umbrella, but sure,
a "proper" way to optimize this would be to further expand
sqrt (x*x + 1) to fabs(x) + ... (extra terms) that are precise enough
and not have this overflow issue.

But yes, I do not find (quickly skimming) other simplifications that
have this kind of overflow issue (in fact I do remember raising
overflow/underflow issues for other patches).

Thus approval withdrawn.

If we had useful range info on floats we might conditionalize such
transforms appropriately. Or we can enable it on floats and do
the sqrt (x*x + 1) in double.

Richard.

> jeff
On 08/21/2018 02:02 AM, Richard Biener wrote:
> On Mon, Aug 20, 2018 at 9:40 PM Jeff Law <law@redhat.com> wrote:
>>
>> On 08/04/2018 07:22 AM, Giuliano Augusto Faulin Belinassi wrote:
>>> Closes bug #86829
>>>
>>> Description: Adds substitution rules for both sin(atan(x)) and
>>> cos(atan(x)). These formulas are replaced by x / sqrt(x*x + 1) and 1 /
>>> sqrt(x*x + 1) respectively, providing up to 10x speedup. This identity
>>> can be proved mathematically.
>>>
>>> Changelog:
>>>
>>> 2018-08-03  Giuliano Belinassi  <giuliano.belinassi@usp.br>
>>>
>>>     * match.pd: add simplification rules to sin(atan(x)) and cos(atan(x)).
>>>
>>> Bootstrap and Testing:
>>> There were no unexpected failures in a proper testing in GCC 8.1.0
>>> under a x86_64 running Ubuntu 18.04.
>> I understand these are mathematical identities. But floating point
>> arithmetic in a compiler isn't nearly that clean :) We have to worry
>> about overflows, underflows, rounding, and the simple fact that many
>> floating point numbers can't be exactly represented.
>>
>> Just as an example, compare the results for
>> x = 0x1.fffffffffffffp1023
>>
>> I think sin(atan (x)) is well defined in that case. But the x*x isn't
>> because it overflows.
>>
>> So I think this has to be somewhere under the -ffast-math umbrella.
>> And the testing requirements for that are painful -- you have to verify
>> it doesn't break the spec benchmark.
>>
>> I know Richi acked in the PR, but that might have been premature.
>
> It's under the flag_unsafe_math_optimizations umbrella, but sure,
> a "proper" way to optimize this would be to further expand
> sqrt (x*x + 1) to fabs(x) + ... (extra terms) that are precise enough
> and not have this overflow issue.
>
> But yes, I do not find (quickly skimming) other simplifications that
> have this kind of overflow issue (in fact I do remember raising
> overflow/underflow issues for other patches).
>
> Thus approval withdrawn.
At least until we can do some testing around spec. There's also a patch
for logarithm addition/subtraction from MCC CS and another from Giuliano
for hyperbolics that need testing with spec. I think that getting that
testing done anytime between now and stage1 close is sufficient -- none
of the 3 patches is particularly complex.

>
> If we had useful range info on floats we might conditionalize such
> transforms appropriately. Or we can enable it on floats and do
> the sqrt (x*x + 1) in double.
Yea. I keep thinking about what it might take to start doing some light
VRP of floating point objects. I'd originally been thinking to just
track 0.0 and exceptional value state. But the more I ponder the more I
think we could use the range information to allow transformations that
are currently guarded by the -ffast-math family of options.

jeff
> Just as an example, compare the results for
> x = 0x1.fffffffffffffp1023

Thank you for your answer and the counterexample. :)

> If we had useful range info on floats we might conditionalize such
> transforms appropriately. Or we can enable it on floats and do
> the sqrt (x*x + 1) in double.

I think I managed to find a bound where the transformation can be done
without overflow harm; however, I don't know about rounding problems.

Suppose we are handling double precision floats for now. The function
x/sqrt(1 + x*x) approaches 1 when x is big enough. How big must x be
for the function to be 1?

Since sqrt(1 + x*x) > x when x > 1, we must find a value of x such
that x/sqrt(1 + x*x) < eps, where eps is the biggest double smaller
than 1. Such eps must be around 1 - 2^-53 in ieee double because the
mantissa has 52 bits. Solving for x yields that x must be somewhat
bigger than 6.7e7, so let's take 1e8. Therefore if abs(x) > 1e8, it is
enough to return copysign(1, x). Notice that this argument is also
valid for x = +inf (if the target supports that) because sin(atan(+inf))
= +1, and it can be extended to other floating point formats. The
following test code illustrates my point:
https://pastebin.com/M4G4neLQ

This might still be faster than calculating sin(atan(x)) explicitly.

Please let me know if this is unfeasible. :)

Giuliano.

On Tue, Aug 21, 2018 at 11:27 AM, Jeff Law <law@redhat.com> wrote:
> On 08/21/2018 02:02 AM, Richard Biener wrote:
>> On Mon, Aug 20, 2018 at 9:40 PM Jeff Law <law@redhat.com> wrote:
>>>
>>> On 08/04/2018 07:22 AM, Giuliano Augusto Faulin Belinassi wrote:
>>>> Closes bug #86829
>>>>
>>>> Description: Adds substitution rules for both sin(atan(x)) and
>>>> cos(atan(x)). These formulas are replaced by x / sqrt(x*x + 1) and 1 /
>>>> sqrt(x*x + 1) respectively, providing up to 10x speedup. This identity
>>>> can be proved mathematically.
>>>>
>>>> Changelog:
>>>>
>>>> 2018-08-03  Giuliano Belinassi  <giuliano.belinassi@usp.br>
>>>>
>>>>     * match.pd: add simplification rules to sin(atan(x)) and cos(atan(x)).
>>>>
>>>> Bootstrap and Testing:
>>>> There were no unexpected failures in a proper testing in GCC 8.1.0
>>>> under a x86_64 running Ubuntu 18.04.
>>> I understand these are mathematical identities. But floating point
>>> arithmetic in a compiler isn't nearly that clean :) We have to worry
>>> about overflows, underflows, rounding, and the simple fact that many
>>> floating point numbers can't be exactly represented.
>>>
>>> Just as an example, compare the results for
>>> x = 0x1.fffffffffffffp1023
>>>
>>> I think sin(atan (x)) is well defined in that case. But the x*x isn't
>>> because it overflows.
>>>
>>> So I think this has to be somewhere under the -ffast-math umbrella.
>>> And the testing requirements for that are painful -- you have to verify
>>> it doesn't break the spec benchmark.
>>>
>>> I know Richi acked in the PR, but that might have been premature.
>>
>> It's under the flag_unsafe_math_optimizations umbrella, but sure,
>> a "proper" way to optimize this would be to further expand
>> sqrt (x*x + 1) to fabs(x) + ... (extra terms) that are precise enough
>> and not have this overflow issue.
>>
>> But yes, I do not find (quickly skimming) other simplifications that
>> have this kind of overflow issue (in fact I do remember raising
>> overflow/underflow issues for other patches).
>>
>> Thus approval withdrawn.
> At least until we can do some testing around spec. There's also a patch
> for logarithm addition/subtraction from MCC CS and another from Giuliano
> for hyperbolics that need testing with spec. I think that getting that
> testing done anytime between now and stage1 close is sufficient -- none
> of the 3 patches is particularly complex.
>
>
>>
>> If we had useful range info on floats we might conditionalize such
>> transforms appropriately. Or we can enable it on floats and do
>> the sqrt (x*x + 1) in double.
> Yea. I keep thinking about what it might take to start doing some light
> VRP of floating point objects. I'd originally been thinking to just
> track 0.0 and exceptional value state. But the more I ponder the more I
> think we could use the range information to allow transformations that
> are currently guarded by the -ffast-math family of options.
>
> jeff
>
On 08/21/2018 02:08 PM, Giuliano Augusto Faulin Belinassi wrote:
>> Just as an example, compare the results for
>> x = 0x1.fffffffffffffp1023
>
> Thank you for your answer and the counterexample. :)
>
>> If we had useful range info on floats we might conditionalize such
>> transforms appropriately. Or we can enable it on floats and do
>> the sqrt (x*x + 1) in double.
>
> I think I managed to find a bound where the transformation can be done
> without overflow harm; however, I don't know about rounding problems.
>
> Suppose we are handling double precision floats for now. The function
> x/sqrt(1 + x*x) approaches 1 when x is big enough. How big must x be
> for the function to be 1?
>
> Since sqrt(1 + x*x) > x when x > 1, we must find a value of x such
> that x/sqrt(1 + x*x) < eps, where eps is the biggest double smaller
> than 1. Such eps must be around 1 - 2^-53 in ieee double because the
> mantissa has 52 bits. Solving for x yields that x must be somewhat
> bigger than 6.7e7, so let's take 1e8. Therefore if abs(x) > 1e8, it is
> enough to return copysign(1, x). Notice that this argument is also
> valid for x = +inf (if the target supports that) because sin(atan(+inf))
> = +1, and it can be extended to other floating point formats. The
> following test code illustrates my point:
> https://pastebin.com/M4G4neLQ
>
> This might still be faster than calculating sin(atan(x)) explicitly.
>
> Please let me know if this is unfeasible. :)
The problem is our VRP implementation doesn't handle any floating point
types at this time. If we had range information for FP types, then
this kind of analysis is precisely what we'd need to do the
transformation regardless of -ffast-math.

jeff

>
> Giuliano.
>
> On Tue, Aug 21, 2018 at 11:27 AM, Jeff Law <law@redhat.com> wrote:
>> On 08/21/2018 02:02 AM, Richard Biener wrote:
>>> On Mon, Aug 20, 2018 at 9:40 PM Jeff Law <law@redhat.com> wrote:
>>>>
>>>> On 08/04/2018 07:22 AM, Giuliano Augusto Faulin Belinassi wrote:
>>>>> Closes bug #86829
>>>>>
>>>>> Description: Adds substitution rules for both sin(atan(x)) and
>>>>> cos(atan(x)). These formulas are replaced by x / sqrt(x*x + 1) and 1 /
>>>>> sqrt(x*x + 1) respectively, providing up to 10x speedup. This identity
>>>>> can be proved mathematically.
>>>>>
>>>>> Changelog:
>>>>>
>>>>> 2018-08-03  Giuliano Belinassi  <giuliano.belinassi@usp.br>
>>>>>
>>>>>     * match.pd: add simplification rules to sin(atan(x)) and cos(atan(x)).
>>>>>
>>>>> Bootstrap and Testing:
>>>>> There were no unexpected failures in a proper testing in GCC 8.1.0
>>>>> under a x86_64 running Ubuntu 18.04.
>>>> I understand these are mathematical identities. But floating point
>>>> arithmetic in a compiler isn't nearly that clean :) We have to worry
>>>> about overflows, underflows, rounding, and the simple fact that many
>>>> floating point numbers can't be exactly represented.
>>>>
>>>> Just as an example, compare the results for
>>>> x = 0x1.fffffffffffffp1023
>>>>
>>>> I think sin(atan (x)) is well defined in that case. But the x*x isn't
>>>> because it overflows.
>>>>
>>>> So I think this has to be somewhere under the -ffast-math umbrella.
>>>> And the testing requirements for that are painful -- you have to verify
>>>> it doesn't break the spec benchmark.
>>>>
>>>> I know Richi acked in the PR, but that might have been premature.
>>>
>>> It's under the flag_unsafe_math_optimizations umbrella, but sure,
>>> a "proper" way to optimize this would be to further expand
>>> sqrt (x*x + 1) to fabs(x) + ... (extra terms) that are precise enough
>>> and not have this overflow issue.
>>>
>>> But yes, I do not find (quickly skimming) other simplifications that
>>> have this kind of overflow issue (in fact I do remember raising
>>> overflow/underflow issues for other patches).
>>>
>>> Thus approval withdrawn.
>> At least until we can do some testing around spec. There's also a patch
>> for logarithm addition/subtraction from MCC CS and another from Giuliano
>> for hyperbolics that need testing with spec. I think that getting that
>> testing done anytime between now and stage1 close is sufficient -- none
>> of the 3 patches is particularly complex.
>>
>>
>>>
>>> If we had useful range info on floats we might conditionalize such
>>> transforms appropriately. Or we can enable it on floats and do
>>> the sqrt (x*x + 1) in double.
>> Yea. I keep thinking about what it might take to start doing some light
>> VRP of floating point objects. I'd originally been thinking to just
>> track 0.0 and exceptional value state. But the more I ponder the more I
>> think we could use the range information to allow transformations that
>> are currently guarded by the -ffast-math family of options.
>>
>> jeff
>>
On Tue, 21 Aug 2018, Jeff Law wrote:

> The problem is our VRP implementation doesn't handle any floating point
> types at this time. If we had range information for FP types, then
> this kind of analysis is precisely what we'd need to do the
> transformation regardless of -ffast-math.

I don't think you can do it regardless of -ffast-math simply because it
may change the semantics, and we've generally assumed that if the
optimization might produce results different from what you get with
correctly rounded library functions, it should go under
-funsafe-math-optimizations.

One might try to figure out a way to split that option, to distinguish
optimizations that might change a correctly rounded result but keep
errors small from optimizations that might produce results that are way
off, or spurious exceptions, for some inputs.
On Tue, Aug 21, 2018 at 11:27 PM Jeff Law <law@redhat.com> wrote:
>
> On 08/21/2018 02:08 PM, Giuliano Augusto Faulin Belinassi wrote:
> >> Just as an example, compare the results for
> >> x = 0x1.fffffffffffffp1023
> >
> > Thank you for your answer and the counterexample. :)
> >
> >> If we had useful range info on floats we might conditionalize such
> >> transforms appropriately. Or we can enable it on floats and do
> >> the sqrt (x*x + 1) in double.
> >
> > I think I managed to find a bound where the transformation can be done
> > without overflow harm; however, I don't know about rounding problems.
> >
> > Suppose we are handling double precision floats for now. The function
> > x/sqrt(1 + x*x) approaches 1 when x is big enough. How big must x be
> > for the function to be 1?
> >
> > Since sqrt(1 + x*x) > x when x > 1, we must find a value of x such
> > that x/sqrt(1 + x*x) < eps, where eps is the biggest double smaller
> > than 1. Such eps must be around 1 - 2^-53 in ieee double because the
> > mantissa has 52 bits. Solving for x yields that x must be somewhat
> > bigger than 6.7e7, so let's take 1e8. Therefore if abs(x) > 1e8, it is
> > enough to return copysign(1, x). Notice that this argument is also
> > valid for x = +inf (if the target supports that) because sin(atan(+inf))
> > = +1, and it can be extended to other floating point formats. The
> > following test code illustrates my point:
> > https://pastebin.com/M4G4neLQ
> >
> > This might still be faster than calculating sin(atan(x)) explicitly.
> >
> > Please let me know if this is unfeasible. :)
> The problem is our VRP implementation doesn't handle any floating point
> types at this time. If we had range information for FP types, then
> this kind of analysis is precisely what we'd need to do the
> transformation regardless of -ffast-math.

I think his idea was to emit a runtime test? You'd have to use a
COND_EXPR and evaluate both arms at the same time because
match.pd doesn't allow you to create control flow.

Note the rounding issue is also real given for large x you strip
away lower mantissa bits when computing x*x.

Richard.

> jeff
> >
> > Giuliano.
> >
> > On Tue, Aug 21, 2018 at 11:27 AM, Jeff Law <law@redhat.com> wrote:
> >> On 08/21/2018 02:02 AM, Richard Biener wrote:
> >>> On Mon, Aug 20, 2018 at 9:40 PM Jeff Law <law@redhat.com> wrote:
> >>>>
> >>>> On 08/04/2018 07:22 AM, Giuliano Augusto Faulin Belinassi wrote:
> >>>>> Closes bug #86829
> >>>>>
> >>>>> Description: Adds substitution rules for both sin(atan(x)) and
> >>>>> cos(atan(x)). These formulas are replaced by x / sqrt(x*x + 1) and 1 /
> >>>>> sqrt(x*x + 1) respectively, providing up to 10x speedup. This identity
> >>>>> can be proved mathematically.
> >>>>>
> >>>>> Changelog:
> >>>>>
> >>>>> 2018-08-03  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> >>>>>
> >>>>>     * match.pd: add simplification rules to sin(atan(x)) and cos(atan(x)).
> >>>>>
> >>>>> Bootstrap and Testing:
> >>>>> There were no unexpected failures in a proper testing in GCC 8.1.0
> >>>>> under a x86_64 running Ubuntu 18.04.
> >>>> I understand these are mathematical identities. But floating point
> >>>> arithmetic in a compiler isn't nearly that clean :) We have to worry
> >>>> about overflows, underflows, rounding, and the simple fact that many
> >>>> floating point numbers can't be exactly represented.
> >>>>
> >>>> Just as an example, compare the results for
> >>>> x = 0x1.fffffffffffffp1023
> >>>>
> >>>> I think sin(atan (x)) is well defined in that case. But the x*x isn't
> >>>> because it overflows.
> >>>>
> >>>> So I think this has to be somewhere under the -ffast-math umbrella.
> >>>> And the testing requirements for that are painful -- you have to verify
> >>>> it doesn't break the spec benchmark.
> >>>>
> >>>> I know Richi acked in the PR, but that might have been premature.
> >>>
> >>> It's under the flag_unsafe_math_optimizations umbrella, but sure,
> >>> a "proper" way to optimize this would be to further expand
> >>> sqrt (x*x + 1) to fabs(x) + ... (extra terms) that are precise enough
> >>> and not have this overflow issue.
> >>>
> >>> But yes, I do not find (quickly skimming) other simplifications that
> >>> have this kind of overflow issue (in fact I do remember raising
> >>> overflow/underflow issues for other patches).
> >>>
> >>> Thus approval withdrawn.
> >> At least until we can do some testing around spec. There's also a patch
> >> for logarithm addition/subtraction from MCC CS and another from Giuliano
> >> for hyperbolics that need testing with spec. I think that getting that
> >> testing done anytime between now and stage1 close is sufficient -- none
> >> of the 3 patches is particularly complex.
> >>
> >>
> >>>
> >>> If we had useful range info on floats we might conditionalize such
> >>> transforms appropriately. Or we can enable it on floats and do
> >>> the sqrt (x*x + 1) in double.
> >> Yea. I keep thinking about what it might take to start doing some light
> >> VRP of floating point objects. I'd originally been thinking to just
> >> track 0.0 and exceptional value state. But the more I ponder the more I
> >> think we could use the range information to allow transformations that
> >> are currently guarded by the -ffast-math family of options.
> >>
> >> jeff
> >>
>
On 08/22/2018 06:02 AM, Richard Biener wrote:
> On Tue, Aug 21, 2018 at 11:27 PM Jeff Law <law@redhat.com> wrote:
>>
>> On 08/21/2018 02:08 PM, Giuliano Augusto Faulin Belinassi wrote:
>>>> Just as an example, compare the results for
>>>> x = 0x1.fffffffffffffp1023
>>>
>>> Thank you for your answer and the counterexample. :)
>>>
>>>> If we had useful range info on floats we might conditionalize such
>>>> transforms appropriately. Or we can enable it on floats and do
>>>> the sqrt (x*x + 1) in double.
>>>
>>> I think I managed to find a bound where the transformation can be done
>>> without overflow harm; however, I don't know about rounding problems.
>>>
>>> Suppose we are handling double precision floats for now. The function
>>> x/sqrt(1 + x*x) approaches 1 when x is big enough. How big must x be
>>> for the function to be 1?
>>>
>>> Since sqrt(1 + x*x) > x when x > 1, we must find a value of x such
>>> that x/sqrt(1 + x*x) < eps, where eps is the biggest double smaller
>>> than 1. Such eps must be around 1 - 2^-53 in ieee double because the
>>> mantissa has 52 bits. Solving for x yields that x must be somewhat
>>> bigger than 6.7e7, so let's take 1e8. Therefore if abs(x) > 1e8, it is
>>> enough to return copysign(1, x). Notice that this argument is also
>>> valid for x = +inf (if the target supports that) because sin(atan(+inf))
>>> = +1, and it can be extended to other floating point formats. The
>>> following test code illustrates my point:
>>> https://pastebin.com/M4G4neLQ
>>>
>>> This might still be faster than calculating sin(atan(x)) explicitly.
>>>
>>> Please let me know if this is unfeasible. :)
>> The problem is our VRP implementation doesn't handle any floating point
>> types at this time. If we had range information for FP types, then
>> this kind of analysis is precisely what we'd need to do the
>> transformation regardless of -ffast-math.
>
> I think his idea was to emit a runtime test? You'd have to use a
> COND_EXPR and evaluate both arms at the same time because
> match.pd doesn't allow you to create control flow.
>
> Note the rounding issue is also real given for large x you strip
> away lower mantissa bits when computing x*x.
Ah, a runtime test. That'd be sufficient. The cost when we can't do
the transformation is relatively small, but the gains when we can are huge.

Jeff
>
> Ah, a runtime test. That'd be sufficient. The cost when we can't do
> the transformation is relatively small, but the gains when we can are huge.

Thank you. I will update the patch and send it again :)

On Wed, Aug 22, 2018 at 7:05 PM, Jeff Law <law@redhat.com> wrote:
> On 08/22/2018 06:02 AM, Richard Biener wrote:
>> On Tue, Aug 21, 2018 at 11:27 PM Jeff Law <law@redhat.com> wrote:
>>>
>>> On 08/21/2018 02:08 PM, Giuliano Augusto Faulin Belinassi wrote:
>>>>> Just as an example, compare the results for
>>>>> x = 0x1.fffffffffffffp1023
>>>>
>>>> Thank you for your answer and the counterexample. :)
>>>>
>>>>> If we had useful range info on floats we might conditionalize such
>>>>> transforms appropriately. Or we can enable it on floats and do
>>>>> the sqrt (x*x + 1) in double.
>>>>
>>>> I think I managed to find a bound where the transformation can be done
>>>> without overflow harm; however, I don't know about rounding problems.
>>>>
>>>> Suppose we are handling double precision floats for now. The function
>>>> x/sqrt(1 + x*x) approaches 1 when x is big enough. How big must x be
>>>> for the function to be 1?
>>>>
>>>> Since sqrt(1 + x*x) > x when x > 1, we must find a value of x such
>>>> that x/sqrt(1 + x*x) < eps, where eps is the biggest double smaller
>>>> than 1. Such eps must be around 1 - 2^-53 in ieee double because the
>>>> mantissa has 52 bits. Solving for x yields that x must be somewhat
>>>> bigger than 6.7e7, so let's take 1e8. Therefore if abs(x) > 1e8, it is
>>>> enough to return copysign(1, x). Notice that this argument is also
>>>> valid for x = +inf (if the target supports that) because sin(atan(+inf))
>>>> = +1, and it can be extended to other floating point formats. The
>>>> following test code illustrates my point:
>>>> https://pastebin.com/M4G4neLQ
>>>>
>>>> This might still be faster than calculating sin(atan(x)) explicitly.
>>>>
>>>> Please let me know if this is unfeasible. :)
>>> The problem is our VRP implementation doesn't handle any floating point
>>> types at this time. If we had range information for FP types, then
>>> this kind of analysis is precisely what we'd need to do the
>>> transformation regardless of -ffast-math.
>>
>> I think his idea was to emit a runtime test? You'd have to use a
>> COND_EXPR and evaluate both arms at the same time because
>> match.pd doesn't allow you to create control flow.
>>
>> Note the rounding issue is also real given for large x you strip
>> away lower mantissa bits when computing x*x.
> Ah, a runtime test. That'd be sufficient. The cost when we can't do
> the transformation is relatively small, but the gains when we can are huge.
>
> Jeff
On 8/22/18 6:02 AM, Richard Biener wrote:
> On Tue, Aug 21, 2018 at 11:27 PM Jeff Law <law@redhat.com> wrote:
>>
>> On 08/21/2018 02:08 PM, Giuliano Augusto Faulin Belinassi wrote:
>>>> Just as an example, compare the results for
>>>> x = 0x1.fffffffffffffp1023
>>>
>>> Thank you for your answer and the counterexample. :)
>>>
>>>> If we had useful range info on floats we might conditionalize such
>>>> transforms appropriately. Or we can enable it on floats and do
>>>> the sqrt (x*x + 1) in double.
>>>
>>> I think I managed to find a bound where the transformation can be done
>>> without overflow harm; however, I don't know about rounding problems.
>>>
>>> Suppose we are handling double precision floats for now. The function
>>> x/sqrt(1 + x*x) approaches 1 when x is big enough. How big must x be
>>> for the function to be 1?
>>>
>>> Since sqrt(1 + x*x) > x when x > 1, we must find a value of x such
>>> that x/sqrt(1 + x*x) < eps, where eps is the biggest double smaller
>>> than 1. Such eps must be around 1 - 2^-53 in ieee double because the
>>> mantissa has 52 bits. Solving for x yields that x must be somewhat
>>> bigger than 6.7e7, so let's take 1e8. Therefore if abs(x) > 1e8, it is
>>> enough to return copysign(1, x). Notice that this argument is also
>>> valid for x = +inf (if the target supports that) because sin(atan(+inf))
>>> = +1, and it can be extended to other floating point formats. The
>>> following test code illustrates my point:
>>> https://pastebin.com/M4G4neLQ
>>>
>>> This might still be faster than calculating sin(atan(x)) explicitly.
>>>
>>> Please let me know if this is unfeasible. :)
>> The problem is our VRP implementation doesn't handle any floating point
>> types at this time. If we had range information for FP types, then
>> this kind of analysis is precisely what we'd need to do the
>> transformation regardless of -ffast-math.
>
> I think his idea was to emit a runtime test? You'd have to use a
> COND_EXPR and evaluate both arms at the same time because
> match.pd doesn't allow you to create control flow.
Right. That's what his subsequent patch does. Can you take a peek at
the match.pd part?

> Note the rounding issue is also real given for large x you strip
> away lower mantissa bits when computing x*x.
Does that happen for values less than 1e8?

Jeff
diff -rupN -u gcc_orig/gcc/match.pd gcc_mod/gcc/match.pd
--- gcc_orig/gcc/match.pd	2018-04-20 07:31:23.000000000 -0300
+++ gcc_mod/gcc/match.pd	2018-08-03 15:25:32.307520365 -0300
@@ -4174,6 +4174,25 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       && ! HONOR_INFINITIES (@0))
   (rdiv { build_one_cst (type); } (COS @0))))
 
+ /* Simplify sin(atan(x)) -> x / sqrt(x*x + 1). */
+ (for sins (SIN)
+      atans (ATAN)
+      sqrts (SQRT)
+  (simplify
+   (sins (atans:s @0))
+   (rdiv @0 (sqrts (plus (mult @0 @0)
+                         {build_one_cst (type);})))))
+
+
+ /* Simplify cos(atan(x)) -> 1 / sqrt(x*x + 1). */
+ (for coss (COS)
+      atans (ATAN)
+      sqrts (SQRT)
+  (simplify
+   (coss (atans:s @0))
+   (rdiv {build_one_cst (type);}
+         (sqrts (plus (mult @0 @0) {build_one_cst (type);})))))
+
 /* Simplify pow(x,y) * pow(x,z) -> pow(x,y+z). */
 (simplify
  (mult (POW:s @0 @1) (POW:s @0 @2))
diff -rupN -u gcc_orig/gcc/testsuite/gcc.dg/sinatan-1.c gcc_mod/gcc/testsuite/gcc.dg/sinatan-1.c
--- gcc_orig/gcc/testsuite/gcc.dg/sinatan-1.c	1969-12-31 21:00:00.000000000 -0300
+++ gcc_mod/gcc/testsuite/gcc.dg/sinatan-1.c	2018-08-02 19:06:23.632015000 -0300
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
+
+extern double sin(double x);
+extern double atan(double x);
+
+double __attribute__ ((noinline))
+sinatan_(double x)
+{
+  return sin(atan(x));
+}
+
+/* There should be no calls to sin nor atan */
+/* { dg-final { scan-tree-dump-not "sin " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "atan " "optimized" } } */
diff -rupN -u gcc_orig/gcc/testsuite/gcc.dg/sinatan-2.c gcc_mod/gcc/testsuite/gcc.dg/sinatan-2.c
--- gcc_orig/gcc/testsuite/gcc.dg/sinatan-2.c	1969-12-31 21:00:00.000000000 -0300
+++ gcc_mod/gcc/testsuite/gcc.dg/sinatan-2.c	2018-08-02 19:06:29.579990000 -0300
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
+
+extern double cos(double x);
+extern double atan(double x);
+
+double __attribute__ ((noinline))
+cosatan_(double x)
+{
+  return cos(atan(x));
+}
+
+/* There should be no calls to cos nor atan */
+/* { dg-final { scan-tree-dump-not "cos " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "atan " "optimized" } } */
diff -rupN -u gcc_orig/gcc/testsuite/gcc.dg/sinatan-3.c gcc_mod/gcc/testsuite/gcc.dg/sinatan-3.c
--- gcc_orig/gcc/testsuite/gcc.dg/sinatan-3.c	1969-12-31 21:00:00.000000000 -0300
+++ gcc_mod/gcc/testsuite/gcc.dg/sinatan-3.c	2018-08-02 19:17:11.663657000 -0300
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
+
+extern double sin(double x);
+extern double atan(double x);
+
+double __attribute__ ((noinline))
+sinatan_(double x)
+{
+  double atg = atan(x);
+  return sin(atg) + atg;
+}
+
+/* There should be calls to both sin and atan */
+/* { dg-final { scan-tree-dump "sin " "optimized" } } */
+/* { dg-final { scan-tree-dump "atan " "optimized" } } */
diff -rupN -u gcc_orig/gcc/testsuite/gcc.dg/sinatan-4.c gcc_mod/gcc/testsuite/gcc.dg/sinatan-4.c
--- gcc_orig/gcc/testsuite/gcc.dg/sinatan-4.c	1969-12-31 21:00:00.000000000 -0300
+++ gcc_mod/gcc/testsuite/gcc.dg/sinatan-4.c	2018-08-02 19:17:01.091726000 -0300
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
+
+extern double cos(double x);
+extern double atan(double x);
+
+double __attribute__ ((noinline))
+cosatan_(double x)
+{
+  double atg = atan(x);
+  return cos(atg) + atg;
+}
+
+/* There should be calls to both cos and atan */
+/* { dg-final { scan-tree-dump "cos " "optimized" } } */
+/* { dg-final { scan-tree-dump "atan " "optimized" } } */