Optimize sin(atan(x)) and cos(atan(x)), take 3 (PR tree-optimization/86829)

Message ID CAEFO=4DcnN6u_aXHiE=r8MUmmPuaBqYLYsw7Yg3fHggQC3_c6g@mail.gmail.com
State New
Headers show
Series
  • Optimize sin(atan(x)) and cos(atan(x)), take 3 (PR tree-optimization/86829)
Related show

Commit Message

Giuliano Augusto Faulin Belinassi Oct. 9, 2018, 11:29 p.m.
Fixed all issues pointed in the previous iteration.
There is now a significant change regarding how the sin(atan(x))
constant is calculated, as now it checks for which values such that
computing 1 + x*x won't overflow. There are two reasons for this
change: (1) Avoid an intermediate infinity value when optimizing
cos(atan(x)), and (2) avoid the requirement of separate constants for
sin(atan(x)) and cos(atan(x)), thus making easier to maintain the
code.

gcc/ChangeLog

2018-10-09  Giuliano Belinassi  <giuliano.belinassi@usp.br>

    PR tree-optimization/86829
    * match.pd: Added sin(atan(x)) and cos(atan(x)) simplification rules.
    * real.c (build_sinatan_real): New function to build a constant equal to the
    largest value c such that 1 + c*c will not overflow.
    * real.h (build_sinatan_real): Allows this function to be called externally.

gcc/testsuite/gcc.dg/ChangeLog

2018-10-09  Giuliano Belinassi  <giuliano.belinassi@usp.br>

    PR tree-optimization/86829
    * gcc.dg/sinatan-1.c: New test.
    * gcc.dg/sinatan-2.c: New test.
    * gcc.dg/sinatan-3.c: New test.

There are no tests broken in trunk that seems related to this PR.

Comments

Jeff Law Oct. 11, 2018, 9:07 p.m. | #1
On 10/9/18 5:29 PM, Giuliano Augusto Faulin Belinassi wrote:
> Fixed all issues pointed in the previous iteration.
> There is now a significant change regarding how the sin(atan(x))
> constant is calculated, as now it checks for which values such that
> computing 1 + x*x won't overflow. There are two reasons for this
> change: (1) Avoid an intermediate infinity value when optimizing
> cos(atan(x)), and (2) avoid the requirement of separate constants for
> sin(atan(x)) and cos(atan(x)), thus making easier to maintain the
> code.
> 
> gcc/ChangeLog
> 
> 2018-10-09  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> 
>     PR tree-optimization/86829
>     * match.pd: Added sin(atan(x)) and cos(atan(x)) simplification rules.
>     * real.c (build_sinatan_real): New function to build a constant equal to the
>     largest value c such that 1 + c*c will not overflow.
>     * real.h (build_sinatan_real): Allows this function to be called externally.
> 
> gcc/testsuite/gcc.dg/ChangeLog
> 
> 2018-10-09  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> 
>     PR tree-optimization/86829
>     * gcc.dg/sinatan-1.c: New test.
>     * gcc.dg/sinatan-2.c: New test.
>     * gcc.dg/sinatan-3.c: New test.
> 
> There are no tests broken in trunk that seems related to this PR.
THanks.  I've installed this onto the trunk.  It's right at the
borderline of what would require a copyright assignment.  So if you're
going to do further work on GCC you should go ahead and start the
copyright assignment process.

Jeff
Christophe Lyon Oct. 12, 2018, 8:08 a.m. | #2
On Thu, 11 Oct 2018 at 23:07, Jeff Law <law@redhat.com> wrote:
>
> On 10/9/18 5:29 PM, Giuliano Augusto Faulin Belinassi wrote:
> > Fixed all issues pointed in the previous iteration.
> > There is now a significant change regarding how the sin(atan(x))
> > constant is calculated, as now it checks for which values such that
> > computing 1 + x*x won't overflow. There are two reasons for this
> > change: (1) Avoid an intermediate infinity value when optimizing
> > cos(atan(x)), and (2) avoid the requirement of separate constants for
> > sin(atan(x)) and cos(atan(x)), thus making easier to maintain the
> > code.
> >
> > gcc/ChangeLog
> >
> > 2018-10-09  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> >
> >     PR tree-optimization/86829
> >     * match.pd: Added sin(atan(x)) and cos(atan(x)) simplification rules.
> >     * real.c (build_sinatan_real): New function to build a constant equal to the
> >     largest value c such that 1 + c*c will not overflow.
> >     * real.h (build_sinatan_real): Allows this function to be called externally.
> >
> > gcc/testsuite/gcc.dg/ChangeLog
> >
> > 2018-10-09  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> >
> >     PR tree-optimization/86829
> >     * gcc.dg/sinatan-1.c: New test.
> >     * gcc.dg/sinatan-2.c: New test.
> >     * gcc.dg/sinatan-3.c: New test.
> >
> > There are no tests broken in trunk that seems related to this PR.
> THanks.  I've installed this onto the trunk.  It's right at the
> borderline of what would require a copyright assignment.  So if you're
> going to do further work on GCC you should go ahead and start the
> copyright assignment process.
>
> Jeff

Hi!

The new sinatan-1.c test fails to link against newlib on aarch64-elf:
/tmp/ccmp5fP4.o: In function `sinatanl':
sinatan-1.c:(.text+0x68): undefined reference to `atanl'
sinatan-1.c:(.text+0x70): undefined reference to `sinl'
/tmp/ccmp5fP4.o: In function `cosatanl':
sinatan-1.c:(.text+0x80): undefined reference to `atanl'
sinatan-1.c:(.text+0x88): undefined reference to `cosl'
collect2: error: ld returned 1 exit status

I'm not familiar enough with newlib to know if it's a newlib bug, or
if we should skip the test?

On arm-eabi, the same test fails at runtime, but I haven't yet taken
the time to reproduce it manually.

Christophe
Richard Biener Oct. 12, 2018, 11:27 a.m. | #3
On Fri, Oct 12, 2018 at 10:08 AM Christophe Lyon
<christophe.lyon@linaro.org> wrote:
>
> On Thu, 11 Oct 2018 at 23:07, Jeff Law <law@redhat.com> wrote:
> >
> > On 10/9/18 5:29 PM, Giuliano Augusto Faulin Belinassi wrote:
> > > Fixed all issues pointed in the previous iteration.
> > > There is now a significant change regarding how the sin(atan(x))
> > > constant is calculated, as now it checks for which values such that
> > > computing 1 + x*x won't overflow. There are two reasons for this
> > > change: (1) Avoid an intermediate infinity value when optimizing
> > > cos(atan(x)), and (2) avoid the requirement of separate constants for
> > > sin(atan(x)) and cos(atan(x)), thus making easier to maintain the
> > > code.
> > >
> > > gcc/ChangeLog
> > >
> > > 2018-10-09  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> > >
> > >     PR tree-optimization/86829
> > >     * match.pd: Added sin(atan(x)) and cos(atan(x)) simplification rules.
> > >     * real.c (build_sinatan_real): New function to build a constant equal to the
> > >     largest value c such that 1 + c*c will not overflow.
> > >     * real.h (build_sinatan_real): Allows this function to be called externally.
> > >
> > > gcc/testsuite/gcc.dg/ChangeLog
> > >
> > > 2018-10-09  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> > >
> > >     PR tree-optimization/86829
> > >     * gcc.dg/sinatan-1.c: New test.
> > >     * gcc.dg/sinatan-2.c: New test.
> > >     * gcc.dg/sinatan-3.c: New test.
> > >
> > > There are no tests broken in trunk that seems related to this PR.
> > THanks.  I've installed this onto the trunk.  It's right at the
> > borderline of what would require a copyright assignment.  So if you're
> > going to do further work on GCC you should go ahead and start the
> > copyright assignment process.
> >
> > Jeff
>
> Hi!
>
> The new sinatan-1.c test fails to link against newlib on aarch64-elf:
> /tmp/ccmp5fP4.o: In function `sinatanl':
> sinatan-1.c:(.text+0x68): undefined reference to `atanl'
> sinatan-1.c:(.text+0x70): undefined reference to `sinl'
> /tmp/ccmp5fP4.o: In function `cosatanl':
> sinatan-1.c:(.text+0x80): undefined reference to `atanl'
> sinatan-1.c:(.text+0x88): undefined reference to `cosl'
> collect2: error: ld returned 1 exit status
>
> I'm not familiar enough with newlib to know if it's a newlib bug, or
> if we should skip the test?
>
> On arm-eabi, the same test fails at runtime, but I haven't yet taken
> the time to reproduce it manually.

target_c99_math might do the trick

> Christophe
Christophe Lyon Oct. 12, 2018, 2:11 p.m. | #4
On Fri, 12 Oct 2018 at 13:27, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Fri, Oct 12, 2018 at 10:08 AM Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
> >
> > On Thu, 11 Oct 2018 at 23:07, Jeff Law <law@redhat.com> wrote:
> > >
> > > On 10/9/18 5:29 PM, Giuliano Augusto Faulin Belinassi wrote:
> > > > Fixed all issues pointed in the previous iteration.
> > > > There is now a significant change regarding how the sin(atan(x))
> > > > constant is calculated, as now it checks for which values such that
> > > > computing 1 + x*x won't overflow. There are two reasons for this
> > > > change: (1) Avoid an intermediate infinity value when optimizing
> > > > cos(atan(x)), and (2) avoid the requirement of separate constants for
> > > > sin(atan(x)) and cos(atan(x)), thus making easier to maintain the
> > > > code.
> > > >
> > > > gcc/ChangeLog
> > > >
> > > > 2018-10-09  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> > > >
> > > >     PR tree-optimization/86829
> > > >     * match.pd: Added sin(atan(x)) and cos(atan(x)) simplification rules.
> > > >     * real.c (build_sinatan_real): New function to build a constant equal to the
> > > >     largest value c such that 1 + c*c will not overflow.
> > > >     * real.h (build_sinatan_real): Allows this function to be called externally.
> > > >
> > > > gcc/testsuite/gcc.dg/ChangeLog
> > > >
> > > > 2018-10-09  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> > > >
> > > >     PR tree-optimization/86829
> > > >     * gcc.dg/sinatan-1.c: New test.
> > > >     * gcc.dg/sinatan-2.c: New test.
> > > >     * gcc.dg/sinatan-3.c: New test.
> > > >
> > > > There are no tests broken in trunk that seems related to this PR.
> > > THanks.  I've installed this onto the trunk.  It's right at the
> > > borderline of what would require a copyright assignment.  So if you're
> > > going to do further work on GCC you should go ahead and start the
> > > copyright assignment process.
> > >
> > > Jeff
> >
> > Hi!
> >
> > The new sinatan-1.c test fails to link against newlib on aarch64-elf:
> > /tmp/ccmp5fP4.o: In function `sinatanl':
> > sinatan-1.c:(.text+0x68): undefined reference to `atanl'
> > sinatan-1.c:(.text+0x70): undefined reference to `sinl'
> > /tmp/ccmp5fP4.o: In function `cosatanl':
> > sinatan-1.c:(.text+0x80): undefined reference to `atanl'
> > sinatan-1.c:(.text+0x88): undefined reference to `cosl'
> > collect2: error: ld returned 1 exit status
> >
> > I'm not familiar enough with newlib to know if it's a newlib bug, or
> > if we should skip the test?
> >
> > On arm-eabi, the same test fails at runtime, but I haven't yet taken
> > the time to reproduce it manually.
>
> target_c99_math might do the trick
>

Yes, if we want to skip the test, but I'm not sure about that?
On arm-eabi, adding some traces in the test indicates that the outputs
of cosatanf, cosatan and cosatanl are wrong.

> > Christophe
Jeff Law Oct. 12, 2018, 2:14 p.m. | #5
On 10/12/18 8:11 AM, Christophe Lyon wrote:
> On Fri, 12 Oct 2018 at 13:27, Richard Biener <richard.guenther@gmail.com> wrote:
>>
>> On Fri, Oct 12, 2018 at 10:08 AM Christophe Lyon
>> <christophe.lyon@linaro.org> wrote:
>>>
>>> On Thu, 11 Oct 2018 at 23:07, Jeff Law <law@redhat.com> wrote:
>>>>
>>>> On 10/9/18 5:29 PM, Giuliano Augusto Faulin Belinassi wrote:
>>>>> Fixed all issues pointed in the previous iteration.
>>>>> There is now a significant change regarding how the sin(atan(x))
>>>>> constant is calculated, as now it checks for which values such that
>>>>> computing 1 + x*x won't overflow. There are two reasons for this
>>>>> change: (1) Avoid an intermediate infinity value when optimizing
>>>>> cos(atan(x)), and (2) avoid the requirement of separate constants for
>>>>> sin(atan(x)) and cos(atan(x)), thus making easier to maintain the
>>>>> code.
>>>>>
>>>>> gcc/ChangeLog
>>>>>
>>>>> 2018-10-09  Giuliano Belinassi  <giuliano.belinassi@usp.br>
>>>>>
>>>>>     PR tree-optimization/86829
>>>>>     * match.pd: Added sin(atan(x)) and cos(atan(x)) simplification rules.
>>>>>     * real.c (build_sinatan_real): New function to build a constant equal to the
>>>>>     largest value c such that 1 + c*c will not overflow.
>>>>>     * real.h (build_sinatan_real): Allows this function to be called externally.
>>>>>
>>>>> gcc/testsuite/gcc.dg/ChangeLog
>>>>>
>>>>> 2018-10-09  Giuliano Belinassi  <giuliano.belinassi@usp.br>
>>>>>
>>>>>     PR tree-optimization/86829
>>>>>     * gcc.dg/sinatan-1.c: New test.
>>>>>     * gcc.dg/sinatan-2.c: New test.
>>>>>     * gcc.dg/sinatan-3.c: New test.
>>>>>
>>>>> There are no tests broken in trunk that seems related to this PR.
>>>> THanks.  I've installed this onto the trunk.  It's right at the
>>>> borderline of what would require a copyright assignment.  So if you're
>>>> going to do further work on GCC you should go ahead and start the
>>>> copyright assignment process.
>>>>
>>>> Jeff
>>>
>>> Hi!
>>>
>>> The new sinatan-1.c test fails to link against newlib on aarch64-elf:
>>> /tmp/ccmp5fP4.o: In function `sinatanl':
>>> sinatan-1.c:(.text+0x68): undefined reference to `atanl'
>>> sinatan-1.c:(.text+0x70): undefined reference to `sinl'
>>> /tmp/ccmp5fP4.o: In function `cosatanl':
>>> sinatan-1.c:(.text+0x80): undefined reference to `atanl'
>>> sinatan-1.c:(.text+0x88): undefined reference to `cosl'
>>> collect2: error: ld returned 1 exit status
>>>
>>> I'm not familiar enough with newlib to know if it's a newlib bug, or
>>> if we should skip the test?
>>>
>>> On arm-eabi, the same test fails at runtime, but I haven't yet taken
>>> the time to reproduce it manually.
>>
>> target_c99_math might do the trick
>>
> 
> Yes, if we want to skip the test, but I'm not sure about that?
> On arm-eabi, adding some traces in the test indicates that the outputs
> of cosatanf, cosatan and cosatanl are wrong.
I think skipping on the newlib targets is fine.  I'm much more concerned
about the execution tests -- even if it's the library implementation
that is wrong, it's going to be painful to manage the xfails over time.

jeff
Christophe Lyon Oct. 12, 2018, 3:15 p.m. | #6
On Fri, 12 Oct 2018 at 16:14, Jeff Law <law@redhat.com> wrote:
>
> On 10/12/18 8:11 AM, Christophe Lyon wrote:
> > On Fri, 12 Oct 2018 at 13:27, Richard Biener <richard.guenther@gmail.com> wrote:
> >>
> >> On Fri, Oct 12, 2018 at 10:08 AM Christophe Lyon
> >> <christophe.lyon@linaro.org> wrote:
> >>>
> >>> On Thu, 11 Oct 2018 at 23:07, Jeff Law <law@redhat.com> wrote:
> >>>>
> >>>> On 10/9/18 5:29 PM, Giuliano Augusto Faulin Belinassi wrote:
> >>>>> Fixed all issues pointed in the previous iteration.
> >>>>> There is now a significant change regarding how the sin(atan(x))
> >>>>> constant is calculated, as now it checks for which values such that
> >>>>> computing 1 + x*x won't overflow. There are two reasons for this
> >>>>> change: (1) Avoid an intermediate infinity value when optimizing
> >>>>> cos(atan(x)), and (2) avoid the requirement of separate constants for
> >>>>> sin(atan(x)) and cos(atan(x)), thus making easier to maintain the
> >>>>> code.
> >>>>>
> >>>>> gcc/ChangeLog
> >>>>>
> >>>>> 2018-10-09  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> >>>>>
> >>>>>     PR tree-optimization/86829
> >>>>>     * match.pd: Added sin(atan(x)) and cos(atan(x)) simplification rules.
> >>>>>     * real.c (build_sinatan_real): New function to build a constant equal to the
> >>>>>     largest value c such that 1 + c*c will not overflow.
> >>>>>     * real.h (build_sinatan_real): Allows this function to be called externally.
> >>>>>
> >>>>> gcc/testsuite/gcc.dg/ChangeLog
> >>>>>
> >>>>> 2018-10-09  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> >>>>>
> >>>>>     PR tree-optimization/86829
> >>>>>     * gcc.dg/sinatan-1.c: New test.
> >>>>>     * gcc.dg/sinatan-2.c: New test.
> >>>>>     * gcc.dg/sinatan-3.c: New test.
> >>>>>
> >>>>> There are no tests broken in trunk that seems related to this PR.
> >>>> THanks.  I've installed this onto the trunk.  It's right at the
> >>>> borderline of what would require a copyright assignment.  So if you're
> >>>> going to do further work on GCC you should go ahead and start the
> >>>> copyright assignment process.
> >>>>
> >>>> Jeff
> >>>
> >>> Hi!
> >>>
> >>> The new sinatan-1.c test fails to link against newlib on aarch64-elf:
> >>> /tmp/ccmp5fP4.o: In function `sinatanl':
> >>> sinatan-1.c:(.text+0x68): undefined reference to `atanl'
> >>> sinatan-1.c:(.text+0x70): undefined reference to `sinl'
> >>> /tmp/ccmp5fP4.o: In function `cosatanl':
> >>> sinatan-1.c:(.text+0x80): undefined reference to `atanl'
> >>> sinatan-1.c:(.text+0x88): undefined reference to `cosl'
> >>> collect2: error: ld returned 1 exit status
> >>>
> >>> I'm not familiar enough with newlib to know if it's a newlib bug, or
> >>> if we should skip the test?
> >>>
> >>> On arm-eabi, the same test fails at runtime, but I haven't yet taken
> >>> the time to reproduce it manually.
> >>
> >> target_c99_math might do the trick
> >>
> >
> > Yes, if we want to skip the test, but I'm not sure about that?
> > On arm-eabi, adding some traces in the test indicates that the outputs
> > of cosatanf, cosatan and cosatanl are wrong.
> I think skipping on the newlib targets is fine.  I'm much more concerned
> about the execution tests -- even if it's the library implementation
> that is wrong, it's going to be painful to manage the xfails over time.
>

The two problems I noticed are both with newlib:
- link failure on aarch64-elf
- runtime error on arm-eabi

A little bit more info on the arm-eabi problem:
The testcase has:
float fc = nextafterf (sqrtf (__FLT_MAX__ - 1), __FLT_MAX__);
[...]
    fy = cosatanf (fc);

fc is built with:
        mov     r0, #0
        movt    r0, 24448
so r0=0x5f800000 (1.8446744E19) which looks ok
but then cosatanf is computed as (ie there's not call to cosatanf()):
        movw    r3, #48430
        movt    r3, 45883
so r3=0xb33bbd2e (-4.371139E-8) which is not zero.

cosatan and cosatanl are called and seem to return wrong values too,
but I haven't stepped into them yet.


Christophe
Giuliano Augusto Faulin Belinassi Oct. 12, 2018, 3:51 p.m. | #7
Hello
     What is the output of these functions on such arch? Since the
test didn't fail for the sinatan counterpart, an possible explanation
would be that the calculation of the sqrf, sqrt and sqrtl (lines
62-64) yielded a number that is far behind of what it should be.
However, I am still not sure about this, so I will investigate
further.
     How about I  write a small program to check if the result
obtained by this calculation is what it should be?
Giuliano
Jeff Law Oct. 12, 2018, 3:57 p.m. | #8
On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote:
> Hello
>      What is the output of these functions on such arch? Since the
> test didn't fail for the sinatan counterpart, an possible explanation
> would be that the calculation of the sqrf, sqrt and sqrtl (lines
> 62-64) yielded a number that is far behind of what it should be.
> However, I am still not sure about this, so I will investigate
> further.
>      How about I  write a small program to check if the result
> obtained by this calculation is what it should be?
I suspect it's less the architecture and more the underlying library.
As Christophe mentioned, both issues are with newlib which is an C
library primarily used in the emebedded space.  I believe it's math code
derives from early BSD libm and likely hasn't been stressed for this
kind of correctness.  It's lightly maintained (primarily for cygwin).

I'm going to run the testcases in my arm linux chroots.  That should
allow us to rule out codegen issues as those chroots will be using glibc
for their math library.

jeff
Giuliano Augusto Faulin Belinassi Oct. 12, 2018, 6:01 p.m. | #9
> fc is built with:
>        mov     r0, #0
>        movt    r0, 24448
> so r0=0x5f800000 (1.8446744E19) which looks ok

this is correct. My x86_64 yields the same value

> but then cosatanf is computed as (ie there's not call to cosatanf()):
>        movw    r3, #48430
>        movt    r3, 45883
> so r3=0xb33bbd2e (-4.371139E-8) which is not zero.

Ugh. So GCC replaced the function call with a precomputed value? This
does not happens in my x86_64. Maybe it is Ofast's fault?
Also, it seems that GCC is precomputing cos(atan(x)) before the
substitution, as the following python script yields the same result:

import numpy as np
x = np.float32 (1.8446744e19)
print (np.cos (np.arctan (x))

I would also like to add that -4.371139E-8 is very far away from
5.421011E-20, which is a "more" correct value for this computation. So
returning 0 may be a better option?
On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote:
>
> On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote:
> > Hello
> >      What is the output of these functions on such arch? Since the
> > test didn't fail for the sinatan counterpart, an possible explanation
> > would be that the calculation of the sqrf, sqrt and sqrtl (lines
> > 62-64) yielded a number that is far behind of what it should be.
> > However, I am still not sure about this, so I will investigate
> > further.
> >      How about I  write a small program to check if the result
> > obtained by this calculation is what it should be?
> I suspect it's less the architecture and more the underlying library.
> As Christophe mentioned, both issues are with newlib which is an C
> library primarily used in the emebedded space.  I believe it's math code
> derives from early BSD libm and likely hasn't been stressed for this
> kind of correctness.  It's lightly maintained (primarily for cygwin).
>
> I'm going to run the testcases in my arm linux chroots.  That should
> allow us to rule out codegen issues as those chroots will be using glibc
> for their math library.
>
> jeff
Christophe Lyon Oct. 12, 2018, 6:23 p.m. | #10
On Fri, 12 Oct 2018 at 17:57, Jeff Law <law@redhat.com> wrote:
>
> On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote:
> > Hello
> >      What is the output of these functions on such arch? Since the
> > test didn't fail for the sinatan counterpart, an possible explanation
> > would be that the calculation of the sqrf, sqrt and sqrtl (lines
> > 62-64) yielded a number that is far behind of what it should be.
> > However, I am still not sure about this, so I will investigate
> > further.
> >      How about I  write a small program to check if the result
> > obtained by this calculation is what it should be?
> I suspect it's less the architecture and more the underlying library.
> As Christophe mentioned, both issues are with newlib which is an C
> library primarily used in the emebedded space.  I believe it's math code
> derives from early BSD libm and likely hasn't been stressed for this
> kind of correctness.  It's lightly maintained (primarily for cygwin).
>
> I'm going to run the testcases in my arm linux chroots.  That should
> allow us to rule out codegen issues as those chroots will be using glibc
> for their math library.
>
My tests on arm-linux-gnueabi* are OK, so yours shoud be OK too :)

> jeff
Christophe Lyon Oct. 12, 2018, 6:24 p.m. | #11
On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi
<giuliano.belinassi@usp.br> wrote:
>
> > fc is built with:
> >        mov     r0, #0
> >        movt    r0, 24448
> > so r0=0x5f800000 (1.8446744E19) which looks ok
>
> this is correct. My x86_64 yields the same value
>
> > but then cosatanf is computed as (ie there's not call to cosatanf()):
> >        movw    r3, #48430
> >        movt    r3, 45883
> > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
>
> Ugh. So GCC replaced the function call with a precomputed value? This
> does not happens in my x86_64. Maybe it is Ofast's fault?
> Also, it seems that GCC is precomputing cos(atan(x)) before the
> substitution, as the following python script yields the same result:
>

Yes, I was surprised to see that.

> import numpy as np
> x = np.float32 (1.8446744e19)
> print (np.cos (np.arctan (x))
>
> I would also like to add that -4.371139E-8 is very far away from
> 5.421011E-20, which is a "more" correct value for this computation. So
> returning 0 may be a better option?
> On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote:
> >
> > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote:
> > > Hello
> > >      What is the output of these functions on such arch? Since the
> > > test didn't fail for the sinatan counterpart, an possible explanation
> > > would be that the calculation of the sqrf, sqrt and sqrtl (lines
> > > 62-64) yielded a number that is far behind of what it should be.
> > > However, I am still not sure about this, so I will investigate
> > > further.
> > >      How about I  write a small program to check if the result
> > > obtained by this calculation is what it should be?
> > I suspect it's less the architecture and more the underlying library.
> > As Christophe mentioned, both issues are with newlib which is an C
> > library primarily used in the emebedded space.  I believe it's math code
> > derives from early BSD libm and likely hasn't been stressed for this
> > kind of correctness.  It's lightly maintained (primarily for cygwin).
> >
> > I'm going to run the testcases in my arm linux chroots.  That should
> > allow us to rule out codegen issues as those chroots will be using glibc
> > for their math library.
> >
> > jeff
Giuliano Augusto Faulin Belinassi Oct. 16, 2018, 3:15 p.m. | #12
Hello. Sorry for the late reply.

> but then cosatanf is computed as (ie there's not call to cosatanf()):
>        movw    r3, #48430
>        movt    r3, 45883
> so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
Does this behavior is still present if we change the line 58 to:
    int __attribute__ ((optimize("O0")))
in sinatan-1.c?
On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon
<christophe.lyon@linaro.org> wrote:
>
> On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi
> <giuliano.belinassi@usp.br> wrote:
> >
> > > fc is built with:
> > >        mov     r0, #0
> > >        movt    r0, 24448
> > > so r0=0x5f800000 (1.8446744E19) which looks ok
> >
> > this is correct. My x86_64 yields the same value
> >
> > > but then cosatanf is computed as (ie there's not call to cosatanf()):
> > >        movw    r3, #48430
> > >        movt    r3, 45883
> > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
> >
> > Ugh. So GCC replaced the function call with a precomputed value? This
> > does not happens in my x86_64. Maybe it is Ofast's fault?
> > Also, it seems that GCC is precomputing cos(atan(x)) before the
> > substitution, as the following python script yields the same result:
> >
>
> Yes, I was surprised to see that.
>
> > import numpy as np
> > x = np.float32 (1.8446744e19)
> > print (np.cos (np.arctan (x))
> >
> > I would also like to add that -4.371139E-8 is very far away from
> > 5.421011E-20, which is a "more" correct value for this computation. So
> > returning 0 may be a better option?
> > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote:
> > >
> > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote:
> > > > Hello
> > > >      What is the output of these functions on such arch? Since the
> > > > test didn't fail for the sinatan counterpart, an possible explanation
> > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines
> > > > 62-64) yielded a number that is far behind of what it should be.
> > > > However, I am still not sure about this, so I will investigate
> > > > further.
> > > >      How about I  write a small program to check if the result
> > > > obtained by this calculation is what it should be?
> > > I suspect it's less the architecture and more the underlying library.
> > > As Christophe mentioned, both issues are with newlib which is an C
> > > library primarily used in the emebedded space.  I believe it's math code
> > > derives from early BSD libm and likely hasn't been stressed for this
> > > kind of correctness.  It's lightly maintained (primarily for cygwin).
> > >
> > > I'm going to run the testcases in my arm linux chroots.  That should
> > > allow us to rule out codegen issues as those chroots will be using glibc
> > > for their math library.
> > >
> > > jeff
Christophe Lyon Oct. 16, 2018, 3:23 p.m. | #13
On Tue, 16 Oct 2018 at 17:15, Giuliano Augusto Faulin Belinassi
<giuliano.belinassi@usp.br> wrote:
>
> Hello. Sorry for the late reply.
>
> > but then cosatanf is computed as (ie there's not call to cosatanf()):
> >        movw    r3, #48430
> >        movt    r3, 45883
> > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
> Does this behavior is still present if we change the line 58 to:
>     int __attribute__ ((optimize("O0")))
> in sinatan-1.c?

No, this now generates:
        ldr     r0, [fp, #-32]
        bl      cosatanf
where r0=0x5f800000, and cosatanf() returns 0xb33bbd2e
(ie the same value as what was computed by GCC)

> On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
> >
> > On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi
> > <giuliano.belinassi@usp.br> wrote:
> > >
> > > > fc is built with:
> > > >        mov     r0, #0
> > > >        movt    r0, 24448
> > > > so r0=0x5f800000 (1.8446744E19) which looks ok
> > >
> > > this is correct. My x86_64 yields the same value
> > >
> > > > but then cosatanf is computed as (ie there's not call to cosatanf()):
> > > >        movw    r3, #48430
> > > >        movt    r3, 45883
> > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
> > >
> > > Ugh. So GCC replaced the function call with a precomputed value? This
> > > does not happens in my x86_64. Maybe it is Ofast's fault?
> > > Also, it seems that GCC is precomputing cos(atan(x)) before the
> > > substitution, as the following python script yields the same result:
> > >
> >
> > Yes, I was surprised to see that.
> >
> > > import numpy as np
> > > x = np.float32 (1.8446744e19)
> > > print (np.cos (np.arctan (x))
> > >
> > > I would also like to add that -4.371139E-8 is very far away from
> > > 5.421011E-20, which is a "more" correct value for this computation. So
> > > returning 0 may be a better option?
> > > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote:
> > > >
> > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote:
> > > > > Hello
> > > > >      What is the output of these functions on such arch? Since the
> > > > > test didn't fail for the sinatan counterpart, an possible explanation
> > > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines
> > > > > 62-64) yielded a number that is far behind of what it should be.
> > > > > However, I am still not sure about this, so I will investigate
> > > > > further.
> > > > >      How about I  write a small program to check if the result
> > > > > obtained by this calculation is what it should be?
> > > > I suspect it's less the architecture and more the underlying library.
> > > > As Christophe mentioned, both issues are with newlib which is an C
> > > > library primarily used in the emebedded space.  I believe it's math code
> > > > derives from early BSD libm and likely hasn't been stressed for this
> > > > kind of correctness.  It's lightly maintained (primarily for cygwin).
> > > >
> > > > I'm going to run the testcases in my arm linux chroots.  That should
> > > > allow us to rule out codegen issues as those chroots will be using glibc
> > > > for their math library.
> > > >
> > > > jeff
Giuliano Augusto Faulin Belinassi Oct. 16, 2018, 4:04 p.m. | #14
Hello, Christophe
    Could you please dump the assembly of cosatanf here?


On Tue, Oct 16, 2018 at 12:23 PM Christophe Lyon
<christophe.lyon@linaro.org> wrote:
>
> On Tue, 16 Oct 2018 at 17:15, Giuliano Augusto Faulin Belinassi
> <giuliano.belinassi@usp.br> wrote:
> >
> > Hello. Sorry for the late reply.
> >
> > > but then cosatanf is computed as (ie there's not call to cosatanf()):
> > >        movw    r3, #48430
> > >        movt    r3, 45883
> > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
> > Does this behavior is still present if we change the line 58 to:
> >     int __attribute__ ((optimize("O0")))
> > in sinatan-1.c?
>
> No, this now generates:
>         ldr     r0, [fp, #-32]
>         bl      cosatanf
> where r0=0x5f800000, and cosatanf() returns 0xb33bbd2e
> (ie the same value as what was computed by GCC)
>
> > On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon
> > <christophe.lyon@linaro.org> wrote:
> > >
> > > On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi
> > > <giuliano.belinassi@usp.br> wrote:
> > > >
> > > > > fc is built with:
> > > > >        mov     r0, #0
> > > > >        movt    r0, 24448
> > > > > so r0=0x5f800000 (1.8446744E19) which looks ok
> > > >
> > > > this is correct. My x86_64 yields the same value
> > > >
> > > > > but then cosatanf is computed as (ie there's not call to cosatanf()):
> > > > >        movw    r3, #48430
> > > > >        movt    r3, 45883
> > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
> > > >
> > > > Ugh. So GCC replaced the function call with a precomputed value? This
> > > > does not happens in my x86_64. Maybe it is Ofast's fault?
> > > > Also, it seems that GCC is precomputing cos(atan(x)) before the
> > > > substitution, as the following python script yields the same result:
> > > >
> > >
> > > Yes, I was surprised to see that.
> > >
> > > > import numpy as np
> > > > x = np.float32 (1.8446744e19)
> > > > print (np.cos (np.arctan (x))
> > > >
> > > > I would also like to add that -4.371139E-8 is very far away from
> > > > 5.421011E-20, which is a "more" correct value for this computation. So
> > > > returning 0 may be a better option?
> > > > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote:
> > > > >
> > > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote:
> > > > > > Hello
> > > > > >      What is the output of these functions on such arch? Since the
> > > > > > test didn't fail for the sinatan counterpart, an possible explanation
> > > > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines
> > > > > > 62-64) yielded a number that is far behind of what it should be.
> > > > > > However, I am still not sure about this, so I will investigate
> > > > > > further.
> > > > > >      How about I  write a small program to check if the result
> > > > > > obtained by this calculation is what it should be?
> > > > > I suspect it's less the architecture and more the underlying library.
> > > > > As Christophe mentioned, both issues are with newlib which is an C
> > > > > library primarily used in the emebedded space.  I believe it's math code
> > > > > derives from early BSD libm and likely hasn't been stressed for this
> > > > > kind of correctness.  It's lightly maintained (primarily for cygwin).
> > > > >
> > > > > I'm going to run the testcases in my arm linux chroots.  That should
> > > > > allow us to rule out codegen issues as those chroots will be using glibc
> > > > > for their math library.
> > > > >
> > > > > jeff
Christophe Lyon Oct. 16, 2018, 4:25 p.m. | #15
On Tue, 16 Oct 2018 at 18:04, Giuliano Augusto Faulin Belinassi
<giuliano.belinassi@usp.br> wrote:
>
> Hello, Christophe
>     Could you please dump the assembly of cosatanf here?
>
>
Sure:
        .global cosatanf
        .syntax unified
        .arm
        .fpu softvfp
        .type   cosatanf, %function
cosatanf:
.LFB3:
        .loc 1 44 1 is_stmt 1
        .cfi_startproc
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
.LVL50:
        .loc 1 45 5
        .loc 1 44 1 is_stmt 0
        push    {r4, lr}
        .cfi_def_cfa_offset 8
        .cfi_offset 4, -8
        .cfi_offset 14, -4
        .loc 1 45 12
        bl      atanf
.LVL51:
        .loc 1 46 1
        pop     {r4, lr}
        .cfi_restore 14
        .cfi_restore 4
        .cfi_def_cfa_offset 0
        .loc 1 45 12
        b       cosf
.LVL52:
        .cfi_endproc
.LFE3:
        .size   cosatanf, .-cosatanf

So, upon entry we have r0=0x5f800000
atanf returns 0x3fc90fdb
and cosf returns 0xb33bbd2e


> On Tue, Oct 16, 2018 at 12:23 PM Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
> >
> > On Tue, 16 Oct 2018 at 17:15, Giuliano Augusto Faulin Belinassi
> > <giuliano.belinassi@usp.br> wrote:
> > >
> > > Hello. Sorry for the late reply.
> > >
> > > > but then cosatanf is computed as (ie there's not call to cosatanf()):
> > > >        movw    r3, #48430
> > > >        movt    r3, 45883
> > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
> > > Does this behavior is still present if we change the line 58 to:
> > >     int __attribute__ ((optimize("O0")))
> > > in sinatan-1.c?
> >
> > No, this now generates:
> >         ldr     r0, [fp, #-32]
> >         bl      cosatanf
> > where r0=0x5f800000, and cosatanf() returns 0xb33bbd2e
> > (ie the same value as what was computed by GCC)
> >
> > > On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon
> > > <christophe.lyon@linaro.org> wrote:
> > > >
> > > > On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi
> > > > <giuliano.belinassi@usp.br> wrote:
> > > > >
> > > > > > fc is built with:
> > > > > >        mov     r0, #0
> > > > > >        movt    r0, 24448
> > > > > > so r0=0x5f800000 (1.8446744E19) which looks ok
> > > > >
> > > > > this is correct. My x86_64 yields the same value
> > > > >
> > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()):
> > > > > >        movw    r3, #48430
> > > > > >        movt    r3, 45883
> > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
> > > > >
> > > > > Ugh. So GCC replaced the function call with a precomputed value? This
> > > > > does not happens in my x86_64. Maybe it is Ofast's fault?
> > > > > Also, it seems that GCC is precomputing cos(atan(x)) before the
> > > > > substitution, as the following python script yields the same result:
> > > > >
> > > >
> > > > Yes, I was surprised to see that.
> > > >
> > > > > import numpy as np
> > > > > x = np.float32 (1.8446744e19)
> > > > > print (np.cos (np.arctan (x))
> > > > >
> > > > > I would also like to add that -4.371139E-8 is very far away from
> > > > > 5.421011E-20, which is a "more" correct value for this computation. So
> > > > > returning 0 may be a better option?
> > > > > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote:
> > > > > >
> > > > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote:
> > > > > > > Hello
> > > > > > >      What is the output of these functions on such arch? Since the
> > > > > > > test didn't fail for the sinatan counterpart, an possible explanation
> > > > > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines
> > > > > > > 62-64) yielded a number that is far behind of what it should be.
> > > > > > > However, I am still not sure about this, so I will investigate
> > > > > > > further.
> > > > > > >      How about I  write a small program to check if the result
> > > > > > > obtained by this calculation is what it should be?
> > > > > > I suspect it's less the architecture and more the underlying library.
> > > > > > As Christophe mentioned, both issues are with newlib which is an C
> > > > > > library primarily used in the emebedded space.  I believe it's math code
> > > > > > derives from early BSD libm and likely hasn't been stressed for this
> > > > > > kind of correctness.  It's lightly maintained (primarily for cygwin).
> > > > > >
> > > > > > I'm going to run the testcases in my arm linux chroots.  That should
> > > > > > allow us to rule out codegen issues as those chroots will be using glibc
> > > > > > for their math library.
> > > > > >
> > > > > > jeff
Giuliano Augusto Faulin Belinassi Oct. 16, 2018, 5:33 p.m. | #16
Hello,

> cosatanf:
> .LFB3:
>         .loc 1 44 1 is_stmt 1
>         .cfi_startproc
>        @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
> .LVL50:
>         .loc 1 45 5
>         .loc 1 44 1 is_stmt 0
>         push    {r4, lr}
>         .cfi_def_cfa_offset 8
>         .cfi_offset 4, -8
>         .cfi_offset 14, -4
>         .loc 1 45 12
>         bl      atanf
> .LVL51:
>         .loc 1 46 1
>          pop     {r4, lr}
>         .cfi_restore 14
>         .cfi_restore 4
>         .cfi_def_cfa_offset 0
>         .loc 1 45 12
>         b       cosf
> .LVL52:

This means that the expression 'cos (atan (x))' was not simplified
:-(. Did the check at line 4281 (svn revision 265209) in gcc/match.pd
failed?

> Yes, if we want to skip the test, but I'm not sure about that?
Since the only point of this patch is to simplify these kind of
expressions, and it is not being simplified at all in your arch as the
asm dump suggests, then it seems safe to skip all sinatan-*.c tests.
On Tue, Oct 16, 2018 at 1:25 PM Christophe Lyon
<christophe.lyon@linaro.org> wrote:
>
> On Tue, 16 Oct 2018 at 18:04, Giuliano Augusto Faulin Belinassi
> <giuliano.belinassi@usp.br> wrote:
> >
> > Hello, Christophe
> >     Could you please dump the assembly of cosatanf here?
> >
> >
> Sure:
>         .global cosatanf
>         .syntax unified
>         .arm
>         .fpu softvfp
>         .type   cosatanf, %function
> cosatanf:
> .LFB3:
>         .loc 1 44 1 is_stmt 1
>         .cfi_startproc
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
> .LVL50:
>         .loc 1 45 5
>         .loc 1 44 1 is_stmt 0
>         push    {r4, lr}
>         .cfi_def_cfa_offset 8
>         .cfi_offset 4, -8
>         .cfi_offset 14, -4
>         .loc 1 45 12
>         bl      atanf
> .LVL51:
>         .loc 1 46 1
>         pop     {r4, lr}
>         .cfi_restore 14
>         .cfi_restore 4
>         .cfi_def_cfa_offset 0
>         .loc 1 45 12
>         b       cosf
> .LVL52:
>         .cfi_endproc
> .LFE3:
>         .size   cosatanf, .-cosatanf
>
> So, upon entry we have r0=0x5f800000
> atanf returns 0x3fc90fdb
> and cosf returns 0xb33bbd2e
>
>
> > On Tue, Oct 16, 2018 at 12:23 PM Christophe Lyon
> > <christophe.lyon@linaro.org> wrote:
> > >
> > > On Tue, 16 Oct 2018 at 17:15, Giuliano Augusto Faulin Belinassi
> > > <giuliano.belinassi@usp.br> wrote:
> > > >
> > > > Hello. Sorry for the late reply.
> > > >
> > > > > but then cosatanf is computed as (ie there's not call to cosatanf()):
> > > > >        movw    r3, #48430
> > > > >        movt    r3, 45883
> > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
> > > > Does this behavior is still present if we change the line 58 to:
> > > >     int __attribute__ ((optimize("O0")))
> > > > in sinatan-1.c?
> > >
> > > No, this now generates:
> > >         ldr     r0, [fp, #-32]
> > >         bl      cosatanf
> > > where r0=0x5f800000, and cosatanf() returns 0xb33bbd2e
> > > (ie the same value as what was computed by GCC)
> > >
> > > > On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon
> > > > <christophe.lyon@linaro.org> wrote:
> > > > >
> > > > > On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi
> > > > > <giuliano.belinassi@usp.br> wrote:
> > > > > >
> > > > > > > fc is built with:
> > > > > > >        mov     r0, #0
> > > > > > >        movt    r0, 24448
> > > > > > > so r0=0x5f800000 (1.8446744E19) which looks ok
> > > > > >
> > > > > > this is correct. My x86_64 yields the same value
> > > > > >
> > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()):
> > > > > > >        movw    r3, #48430
> > > > > > >        movt    r3, 45883
> > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
> > > > > >
> > > > > > Ugh. So GCC replaced the function call with a precomputed value? This
> > > > > > does not happens in my x86_64. Maybe it is Ofast's fault?
> > > > > > Also, it seems that GCC is precomputing cos(atan(x)) before the
> > > > > > substitution, as the following python script yields the same result:
> > > > > >
> > > > >
> > > > > Yes, I was surprised to see that.
> > > > >
> > > > > > import numpy as np
> > > > > > x = np.float32 (1.8446744e19)
> > > > > > print (np.cos (np.arctan (x))
> > > > > >
> > > > > > I would also like to add that -4.371139E-8 is very far away from
> > > > > > 5.421011E-20, which is a "more" correct value for this computation. So
> > > > > > returning 0 may be a better option?
> > > > > > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote:
> > > > > > >
> > > > > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote:
> > > > > > > > Hello
> > > > > > > >      What is the output of these functions on such arch? Since the
> > > > > > > > test didn't fail for the sinatan counterpart, an possible explanation
> > > > > > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines
> > > > > > > > 62-64) yielded a number that is far behind of what it should be.
> > > > > > > > However, I am still not sure about this, so I will investigate
> > > > > > > > further.
> > > > > > > >      How about I  write a small program to check if the result
> > > > > > > > obtained by this calculation is what it should be?
> > > > > > > I suspect it's less the architecture and more the underlying library.
> > > > > > > As Christophe mentioned, both issues are with newlib which is an C
> > > > > > > library primarily used in the emebedded space.  I believe it's math code
> > > > > > > derives from early BSD libm and likely hasn't been stressed for this
> > > > > > > kind of correctness.  It's lightly maintained (primarily for cygwin).
> > > > > > >
> > > > > > > I'm going to run the testcases in my arm linux chroots.  That should
> > > > > > > allow us to rule out codegen issues as those chroots will be using glibc
> > > > > > > for their math library.
> > > > > > >
> > > > > > > jeff
Christophe Lyon Oct. 16, 2018, 7:14 p.m. | #17
On Tue, 16 Oct 2018 at 19:33, Giuliano Augusto Faulin Belinassi
<giuliano.belinassi@usp.br> wrote:
>
> Hello,
>
> > cosatanf:
> > .LFB3:
> >         .loc 1 44 1 is_stmt 1
> >         .cfi_startproc
> >        @ args = 0, pretend = 0, frame = 0
> >         @ frame_needed = 0, uses_anonymous_args = 0
> > .LVL50:
> >         .loc 1 45 5
> >         .loc 1 44 1 is_stmt 0
> >         push    {r4, lr}
> >         .cfi_def_cfa_offset 8
> >         .cfi_offset 4, -8
> >         .cfi_offset 14, -4
> >         .loc 1 45 12
> >         bl      atanf
> > .LVL51:
> >         .loc 1 46 1
> >          pop     {r4, lr}
> >         .cfi_restore 14
> >         .cfi_restore 4
> >         .cfi_def_cfa_offset 0
> >         .loc 1 45 12
> >         b       cosf
> > .LVL52:
>
> This means that the expression 'cos (atan (x))' was not simplified
> :-(. Did the check at line 4281 (svn revision 265209) in gcc/match.pd
> failed?

Do you mean r265064? There's no r265209 in trunk.

> > Yes, if we want to skip the test, but I'm not sure about that?
> Since the only point of this patch is to simplify these kind of
> expressions, and it is not being simplified at all in your arch as the
> asm dump suggests, then it seems safe to skip all sinatan-*.c tests.


> On Tue, Oct 16, 2018 at 1:25 PM Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
> >
> > On Tue, 16 Oct 2018 at 18:04, Giuliano Augusto Faulin Belinassi
> > <giuliano.belinassi@usp.br> wrote:
> > >
> > > Hello, Christophe
> > >     Could you please dump the assembly of cosatanf here?
> > >
> > >
> > Sure:
> >         .global cosatanf
> >         .syntax unified
> >         .arm
> >         .fpu softvfp
> >         .type   cosatanf, %function
> > cosatanf:
> > .LFB3:
> >         .loc 1 44 1 is_stmt 1
> >         .cfi_startproc
> >         @ args = 0, pretend = 0, frame = 0
> >         @ frame_needed = 0, uses_anonymous_args = 0
> > .LVL50:
> >         .loc 1 45 5
> >         .loc 1 44 1 is_stmt 0
> >         push    {r4, lr}
> >         .cfi_def_cfa_offset 8
> >         .cfi_offset 4, -8
> >         .cfi_offset 14, -4
> >         .loc 1 45 12
> >         bl      atanf
> > .LVL51:
> >         .loc 1 46 1
> >         pop     {r4, lr}
> >         .cfi_restore 14
> >         .cfi_restore 4
> >         .cfi_def_cfa_offset 0
> >         .loc 1 45 12
> >         b       cosf
> > .LVL52:
> >         .cfi_endproc
> > .LFE3:
> >         .size   cosatanf, .-cosatanf
> >
> > So, upon entry we have r0=0x5f800000
> > atanf returns 0x3fc90fdb
> > and cosf returns 0xb33bbd2e
> >
> >
> > > On Tue, Oct 16, 2018 at 12:23 PM Christophe Lyon
> > > <christophe.lyon@linaro.org> wrote:
> > > >
> > > > On Tue, 16 Oct 2018 at 17:15, Giuliano Augusto Faulin Belinassi
> > > > <giuliano.belinassi@usp.br> wrote:
> > > > >
> > > > > Hello. Sorry for the late reply.
> > > > >
> > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()):
> > > > > >        movw    r3, #48430
> > > > > >        movt    r3, 45883
> > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
> > > > > Does this behavior is still present if we change the line 58 to:
> > > > >     int __attribute__ ((optimize("O0")))
> > > > > in sinatan-1.c?
> > > >
> > > > No, this now generates:
> > > >         ldr     r0, [fp, #-32]
> > > >         bl      cosatanf
> > > > where r0=0x5f800000, and cosatanf() returns 0xb33bbd2e
> > > > (ie the same value as what was computed by GCC)
> > > >
> > > > > On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon
> > > > > <christophe.lyon@linaro.org> wrote:
> > > > > >
> > > > > > On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi
> > > > > > <giuliano.belinassi@usp.br> wrote:
> > > > > > >
> > > > > > > > fc is built with:
> > > > > > > >        mov     r0, #0
> > > > > > > >        movt    r0, 24448
> > > > > > > > so r0=0x5f800000 (1.8446744E19) which looks ok
> > > > > > >
> > > > > > > this is correct. My x86_64 yields the same value
> > > > > > >
> > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()):
> > > > > > > >        movw    r3, #48430
> > > > > > > >        movt    r3, 45883
> > > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
> > > > > > >
> > > > > > > Ugh. So GCC replaced the function call with a precomputed value? This
> > > > > > > does not happens in my x86_64. Maybe it is Ofast's fault?
> > > > > > > Also, it seems that GCC is precomputing cos(atan(x)) before the
> > > > > > > substitution, as the following python script yields the same result:
> > > > > > >
> > > > > >
> > > > > > Yes, I was surprised to see that.
> > > > > >
> > > > > > > import numpy as np
> > > > > > > x = np.float32 (1.8446744e19)
> > > > > > > print (np.cos (np.arctan (x))
> > > > > > >
> > > > > > > I would also like to add that -4.371139E-8 is very far away from
> > > > > > > 5.421011E-20, which is a "more" correct value for this computation. So
> > > > > > > returning 0 may be a better option?
> > > > > > > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote:
> > > > > > > > > Hello
> > > > > > > > >      What is the output of these functions on such arch? Since the
> > > > > > > > > test didn't fail for the sinatan counterpart, an possible explanation
> > > > > > > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines
> > > > > > > > > 62-64) yielded a number that is far behind of what it should be.
> > > > > > > > > However, I am still not sure about this, so I will investigate
> > > > > > > > > further.
> > > > > > > > >      How about I  write a small program to check if the result
> > > > > > > > > obtained by this calculation is what it should be?
> > > > > > > > I suspect it's less the architecture and more the underlying library.
> > > > > > > > As Christophe mentioned, both issues are with newlib which is an C
> > > > > > > > library primarily used in the emebedded space.  I believe it's math code
> > > > > > > > derives from early BSD libm and likely hasn't been stressed for this
> > > > > > > > kind of correctness.  It's lightly maintained (primarily for cygwin).
> > > > > > > >
> > > > > > > > I'm going to run the testcases in my arm linux chroots.  That should
> > > > > > > > allow us to rule out codegen issues as those chroots will be using glibc
> > > > > > > > for their math library.
> > > > > > > >
> > > > > > > > jeff
Giuliano Augusto Faulin Belinassi Oct. 16, 2018, 7:29 p.m. | #18
Oh, sorry about that. :P

Could you please check if the 'if (SCALAR_FLOAT_TYPE_P (type)', at
line 4281 in gcc/match.pd  is evaluated true in your arch?

svn revision:
$ svn info | grep Revision
Revision: 265212
On Tue, Oct 16, 2018 at 4:14 PM Christophe Lyon
<christophe.lyon@linaro.org> wrote:
>
> On Tue, 16 Oct 2018 at 19:33, Giuliano Augusto Faulin Belinassi
> <giuliano.belinassi@usp.br> wrote:
> >
> > Hello,
> >
> > > cosatanf:
> > > .LFB3:
> > >         .loc 1 44 1 is_stmt 1
> > >         .cfi_startproc
> > >        @ args = 0, pretend = 0, frame = 0
> > >         @ frame_needed = 0, uses_anonymous_args = 0
> > > .LVL50:
> > >         .loc 1 45 5
> > >         .loc 1 44 1 is_stmt 0
> > >         push    {r4, lr}
> > >         .cfi_def_cfa_offset 8
> > >         .cfi_offset 4, -8
> > >         .cfi_offset 14, -4
> > >         .loc 1 45 12
> > >         bl      atanf
> > > .LVL51:
> > >         .loc 1 46 1
> > >          pop     {r4, lr}
> > >         .cfi_restore 14
> > >         .cfi_restore 4
> > >         .cfi_def_cfa_offset 0
> > >         .loc 1 45 12
> > >         b       cosf
> > > .LVL52:
> >
> > This means that the expression 'cos (atan (x))' was not simplified
> > :-(. Did the check at line 4281 (svn revision 265209) in gcc/match.pd
> > failed?
>
> Do you mean r265064? There's no r265209 in trunk.
>
> > > Yes, if we want to skip the test, but I'm not sure about that?
> > Since the only point of this patch is to simplify these kind of
> > expressions, and it is not being simplified at all in your arch as the
> > asm dump suggests, then it seems safe to skip all sinatan-*.c tests.
>
>
> > On Tue, Oct 16, 2018 at 1:25 PM Christophe Lyon
> > <christophe.lyon@linaro.org> wrote:
> > >
> > > On Tue, 16 Oct 2018 at 18:04, Giuliano Augusto Faulin Belinassi
> > > <giuliano.belinassi@usp.br> wrote:
> > > >
> > > > Hello, Christophe
> > > >     Could you please dump the assembly of cosatanf here?
> > > >
> > > >
> > > Sure:
> > >         .global cosatanf
> > >         .syntax unified
> > >         .arm
> > >         .fpu softvfp
> > >         .type   cosatanf, %function
> > > cosatanf:
> > > .LFB3:
> > >         .loc 1 44 1 is_stmt 1
> > >         .cfi_startproc
> > >         @ args = 0, pretend = 0, frame = 0
> > >         @ frame_needed = 0, uses_anonymous_args = 0
> > > .LVL50:
> > >         .loc 1 45 5
> > >         .loc 1 44 1 is_stmt 0
> > >         push    {r4, lr}
> > >         .cfi_def_cfa_offset 8
> > >         .cfi_offset 4, -8
> > >         .cfi_offset 14, -4
> > >         .loc 1 45 12
> > >         bl      atanf
> > > .LVL51:
> > >         .loc 1 46 1
> > >         pop     {r4, lr}
> > >         .cfi_restore 14
> > >         .cfi_restore 4
> > >         .cfi_def_cfa_offset 0
> > >         .loc 1 45 12
> > >         b       cosf
> > > .LVL52:
> > >         .cfi_endproc
> > > .LFE3:
> > >         .size   cosatanf, .-cosatanf
> > >
> > > So, upon entry we have r0=0x5f800000
> > > atanf returns 0x3fc90fdb
> > > and cosf returns 0xb33bbd2e
> > >
> > >
> > > > On Tue, Oct 16, 2018 at 12:23 PM Christophe Lyon
> > > > <christophe.lyon@linaro.org> wrote:
> > > > >
> > > > > On Tue, 16 Oct 2018 at 17:15, Giuliano Augusto Faulin Belinassi
> > > > > <giuliano.belinassi@usp.br> wrote:
> > > > > >
> > > > > > Hello. Sorry for the late reply.
> > > > > >
> > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()):
> > > > > > >        movw    r3, #48430
> > > > > > >        movt    r3, 45883
> > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
> > > > > > Does this behavior is still present if we change the line 58 to:
> > > > > >     int __attribute__ ((optimize("O0")))
> > > > > > in sinatan-1.c?
> > > > >
> > > > > No, this now generates:
> > > > >         ldr     r0, [fp, #-32]
> > > > >         bl      cosatanf
> > > > > where r0=0x5f800000, and cosatanf() returns 0xb33bbd2e
> > > > > (ie the same value as what was computed by GCC)
> > > > >
> > > > > > On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon
> > > > > > <christophe.lyon@linaro.org> wrote:
> > > > > > >
> > > > > > > On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi
> > > > > > > <giuliano.belinassi@usp.br> wrote:
> > > > > > > >
> > > > > > > > > fc is built with:
> > > > > > > > >        mov     r0, #0
> > > > > > > > >        movt    r0, 24448
> > > > > > > > > so r0=0x5f800000 (1.8446744E19) which looks ok
> > > > > > > >
> > > > > > > > this is correct. My x86_64 yields the same value
> > > > > > > >
> > > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()):
> > > > > > > > >        movw    r3, #48430
> > > > > > > > >        movt    r3, 45883
> > > > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
> > > > > > > >
> > > > > > > > Ugh. So GCC replaced the function call with a precomputed value? This
> > > > > > > > does not happens in my x86_64. Maybe it is Ofast's fault?
> > > > > > > > Also, it seems that GCC is precomputing cos(atan(x)) before the
> > > > > > > > substitution, as the following python script yields the same result:
> > > > > > > >
> > > > > > >
> > > > > > > Yes, I was surprised to see that.
> > > > > > >
> > > > > > > > import numpy as np
> > > > > > > > x = np.float32 (1.8446744e19)
> > > > > > > > print (np.cos (np.arctan (x))
> > > > > > > >
> > > > > > > > I would also like to add that -4.371139E-8 is very far away from
> > > > > > > > 5.421011E-20, which is a "more" correct value for this computation. So
> > > > > > > > returning 0 may be a better option?
> > > > > > > > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote:
> > > > > > > > > > Hello
> > > > > > > > > >      What is the output of these functions on such arch? Since the
> > > > > > > > > > test didn't fail for the sinatan counterpart, an possible explanation
> > > > > > > > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines
> > > > > > > > > > 62-64) yielded a number that is far behind of what it should be.
> > > > > > > > > > However, I am still not sure about this, so I will investigate
> > > > > > > > > > further.
> > > > > > > > > >      How about I  write a small program to check if the result
> > > > > > > > > > obtained by this calculation is what it should be?
> > > > > > > > > I suspect it's less the architecture and more the underlying library.
> > > > > > > > > As Christophe mentioned, both issues are with newlib which is an C
> > > > > > > > > library primarily used in the emebedded space.  I believe it's math code
> > > > > > > > > derives from early BSD libm and likely hasn't been stressed for this
> > > > > > > > > kind of correctness.  It's lightly maintained (primarily for cygwin).
> > > > > > > > >
> > > > > > > > > I'm going to run the testcases in my arm linux chroots.  That should
> > > > > > > > > allow us to rule out codegen issues as those chroots will be using glibc
> > > > > > > > > for their math library.
> > > > > > > > >
> > > > > > > > > jeff
Giuliano Augusto Faulin Belinassi Oct. 16, 2018, 8:33 p.m. | #19
Please ignore the previous message. I did not know how revisions in
SVN worked until now. Sorry :-(

The code I referred was added in r265064, by law. So Christophe was correct.

Could you please check if the 'if (SCALAR_FLOAT_TYPE_P (type)', at
line 4260 in gcc/match.pd (r265064)  is evaluated true in your arch?


On Tue, Oct 16, 2018 at 4:29 PM Giuliano Augusto Faulin Belinassi
<giuliano.belinassi@usp.br> wrote:
>
> Oh, sorry about that. :P
>
> Could you please check if the 'if (SCALAR_FLOAT_TYPE_P (type)', at
> line 4281 in gcc/match.pd  is evaluated true in your arch?
>
> svn revision:
> $ svn info | grep Revision
> Revision: 265212
> On Tue, Oct 16, 2018 at 4:14 PM Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
> >
> > On Tue, 16 Oct 2018 at 19:33, Giuliano Augusto Faulin Belinassi
> > <giuliano.belinassi@usp.br> wrote:
> > >
> > > Hello,
> > >
> > > > cosatanf:
> > > > .LFB3:
> > > >         .loc 1 44 1 is_stmt 1
> > > >         .cfi_startproc
> > > >        @ args = 0, pretend = 0, frame = 0
> > > >         @ frame_needed = 0, uses_anonymous_args = 0
> > > > .LVL50:
> > > >         .loc 1 45 5
> > > >         .loc 1 44 1 is_stmt 0
> > > >         push    {r4, lr}
> > > >         .cfi_def_cfa_offset 8
> > > >         .cfi_offset 4, -8
> > > >         .cfi_offset 14, -4
> > > >         .loc 1 45 12
> > > >         bl      atanf
> > > > .LVL51:
> > > >         .loc 1 46 1
> > > >          pop     {r4, lr}
> > > >         .cfi_restore 14
> > > >         .cfi_restore 4
> > > >         .cfi_def_cfa_offset 0
> > > >         .loc 1 45 12
> > > >         b       cosf
> > > > .LVL52:
> > >
> > > This means that the expression 'cos (atan (x))' was not simplified
> > > :-(. Did the check at line 4281 (svn revision 265209) in gcc/match.pd
> > > failed?
> >
> > Do you mean r265064? There's no r265209 in trunk.
> >
> > > > Yes, if we want to skip the test, but I'm not sure about that?
> > > Since the only point of this patch is to simplify these kind of
> > > expressions, and it is not being simplified at all in your arch as the
> > > asm dump suggests, then it seems safe to skip all sinatan-*.c tests.
> >
> >
> > > On Tue, Oct 16, 2018 at 1:25 PM Christophe Lyon
> > > <christophe.lyon@linaro.org> wrote:
> > > >
> > > > On Tue, 16 Oct 2018 at 18:04, Giuliano Augusto Faulin Belinassi
> > > > <giuliano.belinassi@usp.br> wrote:
> > > > >
> > > > > Hello, Christophe
> > > > >     Could you please dump the assembly of cosatanf here?
> > > > >
> > > > >
> > > > Sure:
> > > >         .global cosatanf
> > > >         .syntax unified
> > > >         .arm
> > > >         .fpu softvfp
> > > >         .type   cosatanf, %function
> > > > cosatanf:
> > > > .LFB3:
> > > >         .loc 1 44 1 is_stmt 1
> > > >         .cfi_startproc
> > > >         @ args = 0, pretend = 0, frame = 0
> > > >         @ frame_needed = 0, uses_anonymous_args = 0
> > > > .LVL50:
> > > >         .loc 1 45 5
> > > >         .loc 1 44 1 is_stmt 0
> > > >         push    {r4, lr}
> > > >         .cfi_def_cfa_offset 8
> > > >         .cfi_offset 4, -8
> > > >         .cfi_offset 14, -4
> > > >         .loc 1 45 12
> > > >         bl      atanf
> > > > .LVL51:
> > > >         .loc 1 46 1
> > > >         pop     {r4, lr}
> > > >         .cfi_restore 14
> > > >         .cfi_restore 4
> > > >         .cfi_def_cfa_offset 0
> > > >         .loc 1 45 12
> > > >         b       cosf
> > > > .LVL52:
> > > >         .cfi_endproc
> > > > .LFE3:
> > > >         .size   cosatanf, .-cosatanf
> > > >
> > > > So, upon entry we have r0=0x5f800000
> > > > atanf returns 0x3fc90fdb
> > > > and cosf returns 0xb33bbd2e
> > > >
> > > >
> > > > > On Tue, Oct 16, 2018 at 12:23 PM Christophe Lyon
> > > > > <christophe.lyon@linaro.org> wrote:
> > > > > >
> > > > > > On Tue, 16 Oct 2018 at 17:15, Giuliano Augusto Faulin Belinassi
> > > > > > <giuliano.belinassi@usp.br> wrote:
> > > > > > >
> > > > > > > Hello. Sorry for the late reply.
> > > > > > >
> > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()):
> > > > > > > >        movw    r3, #48430
> > > > > > > >        movt    r3, 45883
> > > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
> > > > > > > Does this behavior is still present if we change the line 58 to:
> > > > > > >     int __attribute__ ((optimize("O0")))
> > > > > > > in sinatan-1.c?
> > > > > >
> > > > > > No, this now generates:
> > > > > >         ldr     r0, [fp, #-32]
> > > > > >         bl      cosatanf
> > > > > > where r0=0x5f800000, and cosatanf() returns 0xb33bbd2e
> > > > > > (ie the same value as what was computed by GCC)
> > > > > >
> > > > > > > On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon
> > > > > > > <christophe.lyon@linaro.org> wrote:
> > > > > > > >
> > > > > > > > On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi
> > > > > > > > <giuliano.belinassi@usp.br> wrote:
> > > > > > > > >
> > > > > > > > > > fc is built with:
> > > > > > > > > >        mov     r0, #0
> > > > > > > > > >        movt    r0, 24448
> > > > > > > > > > so r0=0x5f800000 (1.8446744E19) which looks ok
> > > > > > > > >
> > > > > > > > > this is correct. My x86_64 yields the same value
> > > > > > > > >
> > > > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()):
> > > > > > > > > >        movw    r3, #48430
> > > > > > > > > >        movt    r3, 45883
> > > > > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
> > > > > > > > >
> > > > > > > > > Ugh. So GCC replaced the function call with a precomputed value? This
> > > > > > > > > does not happens in my x86_64. Maybe it is Ofast's fault?
> > > > > > > > > Also, it seems that GCC is precomputing cos(atan(x)) before the
> > > > > > > > > substitution, as the following python script yields the same result:
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes, I was surprised to see that.
> > > > > > > >
> > > > > > > > > import numpy as np
> > > > > > > > > x = np.float32 (1.8446744e19)
> > > > > > > > > print (np.cos (np.arctan (x))
> > > > > > > > >
> > > > > > > > > I would also like to add that -4.371139E-8 is very far away from
> > > > > > > > > 5.421011E-20, which is a "more" correct value for this computation. So
> > > > > > > > > returning 0 may be a better option?
> > > > > > > > > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote:
> > > > > > > > > > > Hello
> > > > > > > > > > >      What is the output of these functions on such arch? Since the
> > > > > > > > > > > test didn't fail for the sinatan counterpart, an possible explanation
> > > > > > > > > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines
> > > > > > > > > > > 62-64) yielded a number that is far behind of what it should be.
> > > > > > > > > > > However, I am still not sure about this, so I will investigate
> > > > > > > > > > > further.
> > > > > > > > > > >      How about I  write a small program to check if the result
> > > > > > > > > > > obtained by this calculation is what it should be?
> > > > > > > > > > I suspect it's less the architecture and more the underlying library.
> > > > > > > > > > As Christophe mentioned, both issues are with newlib which is an C
> > > > > > > > > > library primarily used in the emebedded space.  I believe it's math code
> > > > > > > > > > derives from early BSD libm and likely hasn't been stressed for this
> > > > > > > > > > kind of correctness.  It's lightly maintained (primarily for cygwin).
> > > > > > > > > >
> > > > > > > > > > I'm going to run the testcases in my arm linux chroots.  That should
> > > > > > > > > > allow us to rule out codegen issues as those chroots will be using glibc
> > > > > > > > > > for their math library.
> > > > > > > > > >
> > > > > > > > > > jeff
Richard Biener Oct. 17, 2018, 7:18 a.m. | #20
On Tue, Oct 16, 2018 at 10:33 PM Giuliano Augusto Faulin Belinassi
<giuliano.belinassi@usp.br> wrote:
>
> Please ignore the previous message. I did not know how revisions in
> SVN worked until now. Sorry :-(
>
> The code I referred was added in r265064, by law. So Christophe was correct.
>
> Could you please check if the 'if (SCALAR_FLOAT_TYPE_P (type)', at
> line 4260 in gcc/match.pd (r265064)  is evaluated true in your arch?

The issue is more likely copysign being used in the replacement but the
testcases not declaring that and the target not providing an optab.
Same for sqrt.

match.pd will only do replacements to functions that have been properly
declared.

Richard.

>
> On Tue, Oct 16, 2018 at 4:29 PM Giuliano Augusto Faulin Belinassi
> <giuliano.belinassi@usp.br> wrote:
> >
> > Oh, sorry about that. :P
> >
> > Could you please check if the 'if (SCALAR_FLOAT_TYPE_P (type)', at
> > line 4281 in gcc/match.pd  is evaluated true in your arch?
> >
> > svn revision:
> > $ svn info | grep Revision
> > Revision: 265212
> > On Tue, Oct 16, 2018 at 4:14 PM Christophe Lyon
> > <christophe.lyon@linaro.org> wrote:
> > >
> > > On Tue, 16 Oct 2018 at 19:33, Giuliano Augusto Faulin Belinassi
> > > <giuliano.belinassi@usp.br> wrote:
> > > >
> > > > Hello,
> > > >
> > > > > cosatanf:
> > > > > .LFB3:
> > > > >         .loc 1 44 1 is_stmt 1
> > > > >         .cfi_startproc
> > > > >        @ args = 0, pretend = 0, frame = 0
> > > > >         @ frame_needed = 0, uses_anonymous_args = 0
> > > > > .LVL50:
> > > > >         .loc 1 45 5
> > > > >         .loc 1 44 1 is_stmt 0
> > > > >         push    {r4, lr}
> > > > >         .cfi_def_cfa_offset 8
> > > > >         .cfi_offset 4, -8
> > > > >         .cfi_offset 14, -4
> > > > >         .loc 1 45 12
> > > > >         bl      atanf
> > > > > .LVL51:
> > > > >         .loc 1 46 1
> > > > >          pop     {r4, lr}
> > > > >         .cfi_restore 14
> > > > >         .cfi_restore 4
> > > > >         .cfi_def_cfa_offset 0
> > > > >         .loc 1 45 12
> > > > >         b       cosf
> > > > > .LVL52:
> > > >
> > > > This means that the expression 'cos (atan (x))' was not simplified
> > > > :-(. Did the check at line 4281 (svn revision 265209) in gcc/match.pd
> > > > failed?
> > >
> > > Do you mean r265064? There's no r265209 in trunk.
> > >
> > > > > Yes, if we want to skip the test, but I'm not sure about that?
> > > > Since the only point of this patch is to simplify these kind of
> > > > expressions, and it is not being simplified at all in your arch as the
> > > > asm dump suggests, then it seems safe to skip all sinatan-*.c tests.
> > >
> > >
> > > > On Tue, Oct 16, 2018 at 1:25 PM Christophe Lyon
> > > > <christophe.lyon@linaro.org> wrote:
> > > > >
> > > > > On Tue, 16 Oct 2018 at 18:04, Giuliano Augusto Faulin Belinassi
> > > > > <giuliano.belinassi@usp.br> wrote:
> > > > > >
> > > > > > Hello, Christophe
> > > > > >     Could you please dump the assembly of cosatanf here?
> > > > > >
> > > > > >
> > > > > Sure:
> > > > >         .global cosatanf
> > > > >         .syntax unified
> > > > >         .arm
> > > > >         .fpu softvfp
> > > > >         .type   cosatanf, %function
> > > > > cosatanf:
> > > > > .LFB3:
> > > > >         .loc 1 44 1 is_stmt 1
> > > > >         .cfi_startproc
> > > > >         @ args = 0, pretend = 0, frame = 0
> > > > >         @ frame_needed = 0, uses_anonymous_args = 0
> > > > > .LVL50:
> > > > >         .loc 1 45 5
> > > > >         .loc 1 44 1 is_stmt 0
> > > > >         push    {r4, lr}
> > > > >         .cfi_def_cfa_offset 8
> > > > >         .cfi_offset 4, -8
> > > > >         .cfi_offset 14, -4
> > > > >         .loc 1 45 12
> > > > >         bl      atanf
> > > > > .LVL51:
> > > > >         .loc 1 46 1
> > > > >         pop     {r4, lr}
> > > > >         .cfi_restore 14
> > > > >         .cfi_restore 4
> > > > >         .cfi_def_cfa_offset 0
> > > > >         .loc 1 45 12
> > > > >         b       cosf
> > > > > .LVL52:
> > > > >         .cfi_endproc
> > > > > .LFE3:
> > > > >         .size   cosatanf, .-cosatanf
> > > > >
> > > > > So, upon entry we have r0=0x5f800000
> > > > > atanf returns 0x3fc90fdb
> > > > > and cosf returns 0xb33bbd2e
> > > > >
> > > > >
> > > > > > On Tue, Oct 16, 2018 at 12:23 PM Christophe Lyon
> > > > > > <christophe.lyon@linaro.org> wrote:
> > > > > > >
> > > > > > > On Tue, 16 Oct 2018 at 17:15, Giuliano Augusto Faulin Belinassi
> > > > > > > <giuliano.belinassi@usp.br> wrote:
> > > > > > > >
> > > > > > > > Hello. Sorry for the late reply.
> > > > > > > >
> > > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()):
> > > > > > > > >        movw    r3, #48430
> > > > > > > > >        movt    r3, 45883
> > > > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
> > > > > > > > Does this behavior is still present if we change the line 58 to:
> > > > > > > >     int __attribute__ ((optimize("O0")))
> > > > > > > > in sinatan-1.c?
> > > > > > >
> > > > > > > No, this now generates:
> > > > > > >         ldr     r0, [fp, #-32]
> > > > > > >         bl      cosatanf
> > > > > > > where r0=0x5f800000, and cosatanf() returns 0xb33bbd2e
> > > > > > > (ie the same value as what was computed by GCC)
> > > > > > >
> > > > > > > > On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon
> > > > > > > > <christophe.lyon@linaro.org> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi
> > > > > > > > > <giuliano.belinassi@usp.br> wrote:
> > > > > > > > > >
> > > > > > > > > > > fc is built with:
> > > > > > > > > > >        mov     r0, #0
> > > > > > > > > > >        movt    r0, 24448
> > > > > > > > > > > so r0=0x5f800000 (1.8446744E19) which looks ok
> > > > > > > > > >
> > > > > > > > > > this is correct. My x86_64 yields the same value
> > > > > > > > > >
> > > > > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()):
> > > > > > > > > > >        movw    r3, #48430
> > > > > > > > > > >        movt    r3, 45883
> > > > > > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero.
> > > > > > > > > >
> > > > > > > > > > Ugh. So GCC replaced the function call with a precomputed value? This
> > > > > > > > > > does not happens in my x86_64. Maybe it is Ofast's fault?
> > > > > > > > > > Also, it seems that GCC is precomputing cos(atan(x)) before the
> > > > > > > > > > substitution, as the following python script yields the same result:
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes, I was surprised to see that.
> > > > > > > > >
> > > > > > > > > > import numpy as np
> > > > > > > > > > x = np.float32 (1.8446744e19)
> > > > > > > > > > print (np.cos (np.arctan (x))
> > > > > > > > > >
> > > > > > > > > > I would also like to add that -4.371139E-8 is very far away from
> > > > > > > > > > 5.421011E-20, which is a "more" correct value for this computation. So
> > > > > > > > > > returning 0 may be a better option?
> > > > > > > > > > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote:
> > > > > > > > > > > > Hello
> > > > > > > > > > > >      What is the output of these functions on such arch? Since the
> > > > > > > > > > > > test didn't fail for the sinatan counterpart, an possible explanation
> > > > > > > > > > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines
> > > > > > > > > > > > 62-64) yielded a number that is far behind of what it should be.
> > > > > > > > > > > > However, I am still not sure about this, so I will investigate
> > > > > > > > > > > > further.
> > > > > > > > > > > >      How about I  write a small program to check if the result
> > > > > > > > > > > > obtained by this calculation is what it should be?
> > > > > > > > > > > I suspect it's less the architecture and more the underlying library.
> > > > > > > > > > > As Christophe mentioned, both issues are with newlib which is an C
> > > > > > > > > > > library primarily used in the emebedded space.  I believe it's math code
> > > > > > > > > > > derives from early BSD libm and likely hasn't been stressed for this
> > > > > > > > > > > kind of correctness.  It's lightly maintained (primarily for cygwin).
> > > > > > > > > > >
> > > > > > > > > > > I'm going to run the testcases in my arm linux chroots.  That should
> > > > > > > > > > > allow us to rule out codegen issues as those chroots will be using glibc
> > > > > > > > > > > for their math library.
> > > > > > > > > > >
> > > > > > > > > > > jeff

Patch

Index: gcc/match.pd
===================================================================
--- gcc/match.pd	(revision 264941)
+++ gcc/match.pd	(working copy)
@@ -4223,6 +4223,45 @@ 
    (tans (atans @0))
    @0)))
 
+ /* Simplify sin(atan(x)) -> x / sqrt(x*x + 1). */
+ (for sins (SIN)
+      atans (ATAN)
+      sqrts (SQRT)
+      copysigns (COPYSIGN)
+  (simplify
+   (sins (atans:s @0))
+   (with
+     {
+      REAL_VALUE_TYPE r_cst;
+      build_sinatan_real (&r_cst, type);
+      tree t_cst = build_real (type, r_cst);
+      tree t_one = build_one_cst (type);
+     }
+    (if (SCALAR_FLOAT_TYPE_P (type))
+     (cond (le (abs @0) { t_cst; })
+      (rdiv @0 (sqrts (plus (mult @0 @0) { t_one; })))
+      (copysigns { t_one; } @0))))))
+
+/* Simplify cos(atan(x)) -> 1 / sqrt(x*x + 1). */
+ (for coss (COS)
+      atans (ATAN)
+      sqrts (SQRT)
+      copysigns (COPYSIGN)
+  (simplify
+   (coss (atans:s @0))
+   (with
+     {
+      REAL_VALUE_TYPE r_cst;
+      build_sinatan_real (&r_cst, type);
+      tree t_cst = build_real (type, r_cst);
+      tree t_one = build_one_cst (type);
+      tree t_zero = build_zero_cst (type);
+     }
+    (if (SCALAR_FLOAT_TYPE_P (type))
+     (cond (le (abs @0) { t_cst; })
+      (rdiv { t_one; } (sqrts (plus (mult @0 @0) { t_one; })))
+      (copysigns { t_zero; } @0))))))
+
 /* cabs(x+0i) or cabs(0+xi) -> abs(x).  */
 (simplify
  (CABS (complex:C @0 real_zerop@1))
Index: gcc/real.c
===================================================================
--- gcc/real.c	(revision 264941)
+++ gcc/real.c	(working copy)
@@ -5279,3 +5279,29 @@ 
 {
   return HONOR_SIGN_DEPENDENT_ROUNDING (GET_MODE (x));
 }
+
+/* Fills r with the largest value such that 1 + r*r won't overflow.
+   This is used in both sin (atan (x)) and cos (atan(x)) optimizations. */
+
+void
+build_sinatan_real (REAL_VALUE_TYPE * r, tree type)
+{
+  REAL_VALUE_TYPE maxval;
+  mpfr_t mpfr_const1, mpfr_c, mpfr_maxval;
+  machine_mode mode = TYPE_MODE (type);
+  const struct real_format * fmt = REAL_MODE_FORMAT (mode);
+
+  real_maxval (&maxval, 0, mode);
+
+  mpfr_inits (mpfr_const1, mpfr_c, mpfr_maxval, NULL);
+
+  mpfr_from_real (mpfr_const1, &dconst1, GMP_RNDN);
+  mpfr_from_real (mpfr_maxval, &maxval,  GMP_RNDN);
+
+  mpfr_sub (mpfr_c, mpfr_maxval, mpfr_const1, GMP_RNDN);
+  mpfr_sqrt (mpfr_c, mpfr_c, GMP_RNDZ);
+
+  real_from_mpfr (r, mpfr_c, fmt, GMP_RNDZ);
+  
+  mpfr_clears (mpfr_const1, mpfr_c, mpfr_maxval, NULL);
+}
Index: gcc/real.h
===================================================================
--- gcc/real.h	(revision 264941)
+++ gcc/real.h	(working copy)
@@ -523,4 +523,8 @@ 
 			       const wide_int_ref &, signop);
 #endif
 
+/* Fills r with the largest value such that 1 + r*r won't overflow.
+   This is used in both sin (atan (x)) and cos (atan(x)) optimizations. */
+extern void build_sinatan_real (REAL_VALUE_TYPE *, tree); 
+
 #endif /* ! GCC_REAL_H */
Index: gcc/testsuite/gcc.dg/sinatan-1.c
===================================================================
--- gcc/testsuite/gcc.dg/sinatan-1.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/sinatan-1.c	(working copy)
@@ -0,0 +1,101 @@ 
+/* { dg-do run } */
+/* { dg-options "-Ofast" } */
+
+extern float sinf (float);
+extern float cosf (float);
+extern float atanf (float);
+extern float sqrtf (float);
+extern float nextafterf (float, float);
+extern double sin (double);
+extern double cos (double);
+extern double atan (double);
+extern double sqrt (double);
+extern double nextafter (double, double);
+extern long double sinl (long double);
+extern long double cosl (long double);
+extern long double atanl (long double);
+extern long double sqrtl (long double);
+extern long double nextafterl (long double, long double);
+
+extern void abort ();
+
+double __attribute__ ((noinline, optimize("Ofast")))
+sinatan (double x)
+{
+    return sin (atan (x));
+}
+
+double __attribute__ ((noinline, optimize("Ofast")))
+cosatan (double x)
+{
+    return cos (atan (x));
+}
+
+float __attribute__ ((noinline, optimize("Ofast")))
+sinatanf(float x)
+{
+    return sinf (atanf (x));
+}
+
+float __attribute__ ((noinline, optimize("Ofast")))
+cosatanf(float x)
+{
+    return cosf (atanf (x));
+}
+
+long double __attribute__ ((noinline, optimize("Ofast")))
+sinatanl (long double x)
+{
+    return sinl (atanl (x));
+}
+
+long double __attribute__ ((noinline, optimize("Ofast")))
+cosatanl (long double x)
+{
+    return cosl (atanl (x));
+}
+
+int
+main()
+{
+    /* Get first x such that 1 + x*x will overflow */
+    float fc = nextafterf (sqrtf (__FLT_MAX__ - 1), __FLT_MAX__);
+    double c = nextafter (sqrt (__DBL_MAX__ - 1), __DBL_MAX__);
+    long double lc = nextafter (sqrtl (__LDBL_MAX__ - 1), __LDBL_MAX__);
+
+    /*  Force move from FPU to memory, otherwise comparison may
+        fail due to possible more accurate registers (see 387)  */
+    volatile float fy;
+    volatile double y;
+    volatile long double ly;
+
+    fy = sinatanf (fc);
+    y = sinatan (c);
+    ly = sinatanl (lc);
+
+    if (fy != 1.f || y != 1 || ly != 1.L)
+        abort ();
+
+    fy = cosatanf (fc);
+    y = cosatan (c);
+    ly = cosatanl (lc);
+
+    if (fy != 0.f || y != 0. || ly != 0.L)
+        abort ();
+
+    fy = sinatanf (-fc);
+    y = sinatan (-c);
+    ly = sinatanl (-lc);
+
+    if (fy != -1.f || y != -1. || ly != -1.L)
+        abort ();
+    
+    fy = cosatanf (-fc);
+    y = cosatan (-c);
+    ly = cosatanl (-lc);
+
+    if (fy != 0.f || y != 0. || ly != 0.L)
+        abort ();
+
+    return 0;
+}
Index: gcc/testsuite/gcc.dg/sinatan-2.c
===================================================================
--- gcc/testsuite/gcc.dg/sinatan-2.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/sinatan-2.c	(working copy)
@@ -0,0 +1,59 @@ 
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fdump-tree-optimized" } */
+
+extern float sinf (float);
+extern float cosf (float);
+extern float atanf (float);
+extern double sin (double);
+extern double cos (double);
+extern double atan (double);
+extern long double sinl (long double);
+extern long double cosl (long double);
+extern long double atanl (long double);
+
+double __attribute__ ((noinline))
+sinatan_ (double x)
+{
+    return sin (atan (x));
+}
+
+double __attribute__ ((noinline))
+cosatan_ (double x)
+{
+    return cos (atan (x));
+}
+
+float __attribute__ ((noinline))
+sinatanf_(float x)
+{
+    return sinf (atanf (x));
+}
+
+float __attribute__ ((noinline))
+cosatanf_(float x)
+{
+    return cosf (atanf (x));
+}
+
+long double __attribute__ ((noinline))
+sinatanl_ (long double x)
+{
+    return sinl (atanl (x));
+}
+
+long double __attribute__ ((noinline))
+cosatanl_ (long double x)
+{
+    return cosl (atanl (x));
+}
+
+/* There must be no calls to sin, cos, or atan */
+/* {dg-final { scan-tree-dump-not "sin " "optimized" } } */
+/* {dg-final { scan-tree-dump-not "cos " "optimized" } } */
+/* {dg-final { scan-tree-dump-not "atan " "optimized" }} */
+/* {dg-final { scan-tree-dump-not "sinf " "optimized" } } */
+/* {dg-final { scan-tree-dump-not "cosf " "optimized" } } */
+/* {dg-final { scan-tree-dump-not "atanf " "optimized" }} */
+/* {dg-final { scan-tree-dump-not "sinl " "optimized" } } */
+/* {dg-final { scan-tree-dump-not "cosl " "optimized" } } */
+/* {dg-final { scan-tree-dump-not "atanl " "optimized" }} */
Index: gcc/testsuite/gcc.dg/sinatan-3.c
===================================================================
--- gcc/testsuite/gcc.dg/sinatan-3.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/sinatan-3.c	(working copy)
@@ -0,0 +1,65 @@ 
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fdump-tree-optimized" } */
+
+extern float sinf (float);
+extern float cosf (float);
+extern float atanf (float);
+extern double sin (double);
+extern double cos (double);
+extern double atan (double);
+extern long double sinl (long double);
+extern long double cosl (long double);
+extern long double atanl (long double);
+
+float __attribute__ ((noinline)) 
+cosatanf_(float x)
+{
+    float atg = atanf(x);
+    return cosf(atg) + atg;
+}
+
+double __attribute__ ((noinline)) 
+cosatan_(double x)
+{
+    double atg = atan(x);
+    return cos(atg) + atg;
+}
+
+long double __attribute__ ((noinline)) 
+cosatanl_(long double x)
+{
+    long double atg = atanl(x);
+    return cosl(atg) + atg;
+}
+
+float __attribute__ ((noinline)) 
+sinatanf_(float x)
+{
+    float atg = atanf(x);
+    return sinf(atg) + atg;
+}
+
+double __attribute__ ((noinline)) 
+sinatan_(double x)
+{
+    double atg = atan(x);
+    return sin(atg) + atg;
+}
+
+long double __attribute__ ((noinline)) 
+sinatanl_(long double x)
+{
+    long double atg = atanl(x);
+    return sinl(atg) + atg;
+}
+
+/* There should be calls to both sin and atan */
+/* { dg-final { scan-tree-dump "cos " "optimized" } } */
+/* { dg-final { scan-tree-dump "sin " "optimized" } } */
+/* { dg-final { scan-tree-dump "atan " "optimized" } } */
+/* { dg-final { scan-tree-dump "cosf " "optimized" } } */
+/* { dg-final { scan-tree-dump "sinf " "optimized" } } */
+/* { dg-final { scan-tree-dump "atanf " "optimized" } } */
+/* { dg-final { scan-tree-dump "cosl " "optimized" } } */
+/* { dg-final { scan-tree-dump "sinl " "optimized" } } */
+/* { dg-final { scan-tree-dump "atanl " "optimized" } } */