Message ID | CAEFO=4DcnN6u_aXHiE=r8MUmmPuaBqYLYsw7Yg3fHggQC3_c6g@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | Optimize sin(atan(x)) and cos(atan(x)), take 3 (PR tree-optimization/86829) | expand |
On 10/9/18 5:29 PM, Giuliano Augusto Faulin Belinassi wrote: > Fixed all issues pointed in the previous iteration. > There is now a significant change regarding how the sin(atan(x)) > constant is calculated, as now it checks for which values such that > computing 1 + x*x won't overflow. There are two reasons for this > change: (1) Avoid an intermediate infinity value when optimizing > cos(atan(x)), and (2) avoid the requirement of separate constants for > sin(atan(x)) and cos(atan(x)), thus making easier to maintain the > code. > > gcc/ChangeLog > > 2018-10-09 Giuliano Belinassi <giuliano.belinassi@usp.br> > > PR tree-optimization/86829 > * match.pd: Added sin(atan(x)) and cos(atan(x)) simplification rules. > * real.c (build_sinatan_real): New function to build a constant equal to the > largest value c such that 1 + c*c will not overflow. > * real.h (build_sinatan_real): Allows this function to be called externally. > > gcc/testsuite/gcc.dg/ChangeLog > > 2018-10-09 Giuliano Belinassi <giuliano.belinassi@usp.br> > > PR tree-optimization/86829 > * gcc.dg/sinatan-1.c: New test. > * gcc.dg/sinatan-2.c: New test. > * gcc.dg/sinatan-3.c: New test. > > There are no tests broken in trunk that seems related to this PR. THanks. I've installed this onto the trunk. It's right at the borderline of what would require a copyright assignment. So if you're going to do further work on GCC you should go ahead and start the copyright assignment process. Jeff
On Thu, 11 Oct 2018 at 23:07, Jeff Law <law@redhat.com> wrote: > > On 10/9/18 5:29 PM, Giuliano Augusto Faulin Belinassi wrote: > > Fixed all issues pointed in the previous iteration. > > There is now a significant change regarding how the sin(atan(x)) > > constant is calculated, as now it checks for which values such that > > computing 1 + x*x won't overflow. There are two reasons for this > > change: (1) Avoid an intermediate infinity value when optimizing > > cos(atan(x)), and (2) avoid the requirement of separate constants for > > sin(atan(x)) and cos(atan(x)), thus making easier to maintain the > > code. > > > > gcc/ChangeLog > > > > 2018-10-09 Giuliano Belinassi <giuliano.belinassi@usp.br> > > > > PR tree-optimization/86829 > > * match.pd: Added sin(atan(x)) and cos(atan(x)) simplification rules. > > * real.c (build_sinatan_real): New function to build a constant equal to the > > largest value c such that 1 + c*c will not overflow. > > * real.h (build_sinatan_real): Allows this function to be called externally. > > > > gcc/testsuite/gcc.dg/ChangeLog > > > > 2018-10-09 Giuliano Belinassi <giuliano.belinassi@usp.br> > > > > PR tree-optimization/86829 > > * gcc.dg/sinatan-1.c: New test. > > * gcc.dg/sinatan-2.c: New test. > > * gcc.dg/sinatan-3.c: New test. > > > > There are no tests broken in trunk that seems related to this PR. > THanks. I've installed this onto the trunk. It's right at the > borderline of what would require a copyright assignment. So if you're > going to do further work on GCC you should go ahead and start the > copyright assignment process. > > Jeff Hi! The new sinatan-1.c test fails to link against newlib on aarch64-elf: /tmp/ccmp5fP4.o: In function `sinatanl': sinatan-1.c:(.text+0x68): undefined reference to `atanl' sinatan-1.c:(.text+0x70): undefined reference to `sinl' /tmp/ccmp5fP4.o: In function `cosatanl': sinatan-1.c:(.text+0x80): undefined reference to `atanl' sinatan-1.c:(.text+0x88): undefined reference to `cosl' collect2: error: ld returned 1 exit status I'm not familiar enough with newlib to know if it's a newlib bug, or if we should skip the test? On arm-eabi, the same test fails at runtime, but I haven't yet taken the time to reproduce it manually. Christophe
On Fri, Oct 12, 2018 at 10:08 AM Christophe Lyon <christophe.lyon@linaro.org> wrote: > > On Thu, 11 Oct 2018 at 23:07, Jeff Law <law@redhat.com> wrote: > > > > On 10/9/18 5:29 PM, Giuliano Augusto Faulin Belinassi wrote: > > > Fixed all issues pointed in the previous iteration. > > > There is now a significant change regarding how the sin(atan(x)) > > > constant is calculated, as now it checks for which values such that > > > computing 1 + x*x won't overflow. There are two reasons for this > > > change: (1) Avoid an intermediate infinity value when optimizing > > > cos(atan(x)), and (2) avoid the requirement of separate constants for > > > sin(atan(x)) and cos(atan(x)), thus making easier to maintain the > > > code. > > > > > > gcc/ChangeLog > > > > > > 2018-10-09 Giuliano Belinassi <giuliano.belinassi@usp.br> > > > > > > PR tree-optimization/86829 > > > * match.pd: Added sin(atan(x)) and cos(atan(x)) simplification rules. > > > * real.c (build_sinatan_real): New function to build a constant equal to the > > > largest value c such that 1 + c*c will not overflow. > > > * real.h (build_sinatan_real): Allows this function to be called externally. > > > > > > gcc/testsuite/gcc.dg/ChangeLog > > > > > > 2018-10-09 Giuliano Belinassi <giuliano.belinassi@usp.br> > > > > > > PR tree-optimization/86829 > > > * gcc.dg/sinatan-1.c: New test. > > > * gcc.dg/sinatan-2.c: New test. > > > * gcc.dg/sinatan-3.c: New test. > > > > > > There are no tests broken in trunk that seems related to this PR. > > THanks. I've installed this onto the trunk. It's right at the > > borderline of what would require a copyright assignment. So if you're > > going to do further work on GCC you should go ahead and start the > > copyright assignment process. > > > > Jeff > > Hi! > > The new sinatan-1.c test fails to link against newlib on aarch64-elf: > /tmp/ccmp5fP4.o: In function `sinatanl': > sinatan-1.c:(.text+0x68): undefined reference to `atanl' > sinatan-1.c:(.text+0x70): undefined reference to `sinl' > /tmp/ccmp5fP4.o: In function `cosatanl': > sinatan-1.c:(.text+0x80): undefined reference to `atanl' > sinatan-1.c:(.text+0x88): undefined reference to `cosl' > collect2: error: ld returned 1 exit status > > I'm not familiar enough with newlib to know if it's a newlib bug, or > if we should skip the test? > > On arm-eabi, the same test fails at runtime, but I haven't yet taken > the time to reproduce it manually. target_c99_math might do the trick > Christophe
On Fri, 12 Oct 2018 at 13:27, Richard Biener <richard.guenther@gmail.com> wrote: > > On Fri, Oct 12, 2018 at 10:08 AM Christophe Lyon > <christophe.lyon@linaro.org> wrote: > > > > On Thu, 11 Oct 2018 at 23:07, Jeff Law <law@redhat.com> wrote: > > > > > > On 10/9/18 5:29 PM, Giuliano Augusto Faulin Belinassi wrote: > > > > Fixed all issues pointed in the previous iteration. > > > > There is now a significant change regarding how the sin(atan(x)) > > > > constant is calculated, as now it checks for which values such that > > > > computing 1 + x*x won't overflow. There are two reasons for this > > > > change: (1) Avoid an intermediate infinity value when optimizing > > > > cos(atan(x)), and (2) avoid the requirement of separate constants for > > > > sin(atan(x)) and cos(atan(x)), thus making easier to maintain the > > > > code. > > > > > > > > gcc/ChangeLog > > > > > > > > 2018-10-09 Giuliano Belinassi <giuliano.belinassi@usp.br> > > > > > > > > PR tree-optimization/86829 > > > > * match.pd: Added sin(atan(x)) and cos(atan(x)) simplification rules. > > > > * real.c (build_sinatan_real): New function to build a constant equal to the > > > > largest value c such that 1 + c*c will not overflow. > > > > * real.h (build_sinatan_real): Allows this function to be called externally. > > > > > > > > gcc/testsuite/gcc.dg/ChangeLog > > > > > > > > 2018-10-09 Giuliano Belinassi <giuliano.belinassi@usp.br> > > > > > > > > PR tree-optimization/86829 > > > > * gcc.dg/sinatan-1.c: New test. > > > > * gcc.dg/sinatan-2.c: New test. > > > > * gcc.dg/sinatan-3.c: New test. > > > > > > > > There are no tests broken in trunk that seems related to this PR. > > > THanks. I've installed this onto the trunk. It's right at the > > > borderline of what would require a copyright assignment. So if you're > > > going to do further work on GCC you should go ahead and start the > > > copyright assignment process. > > > > > > Jeff > > > > Hi! > > > > The new sinatan-1.c test fails to link against newlib on aarch64-elf: > > /tmp/ccmp5fP4.o: In function `sinatanl': > > sinatan-1.c:(.text+0x68): undefined reference to `atanl' > > sinatan-1.c:(.text+0x70): undefined reference to `sinl' > > /tmp/ccmp5fP4.o: In function `cosatanl': > > sinatan-1.c:(.text+0x80): undefined reference to `atanl' > > sinatan-1.c:(.text+0x88): undefined reference to `cosl' > > collect2: error: ld returned 1 exit status > > > > I'm not familiar enough with newlib to know if it's a newlib bug, or > > if we should skip the test? > > > > On arm-eabi, the same test fails at runtime, but I haven't yet taken > > the time to reproduce it manually. > > target_c99_math might do the trick > Yes, if we want to skip the test, but I'm not sure about that? On arm-eabi, adding some traces in the test indicates that the outputs of cosatanf, cosatan and cosatanl are wrong. > > Christophe
On 10/12/18 8:11 AM, Christophe Lyon wrote: > On Fri, 12 Oct 2018 at 13:27, Richard Biener <richard.guenther@gmail.com> wrote: >> >> On Fri, Oct 12, 2018 at 10:08 AM Christophe Lyon >> <christophe.lyon@linaro.org> wrote: >>> >>> On Thu, 11 Oct 2018 at 23:07, Jeff Law <law@redhat.com> wrote: >>>> >>>> On 10/9/18 5:29 PM, Giuliano Augusto Faulin Belinassi wrote: >>>>> Fixed all issues pointed in the previous iteration. >>>>> There is now a significant change regarding how the sin(atan(x)) >>>>> constant is calculated, as now it checks for which values such that >>>>> computing 1 + x*x won't overflow. There are two reasons for this >>>>> change: (1) Avoid an intermediate infinity value when optimizing >>>>> cos(atan(x)), and (2) avoid the requirement of separate constants for >>>>> sin(atan(x)) and cos(atan(x)), thus making easier to maintain the >>>>> code. >>>>> >>>>> gcc/ChangeLog >>>>> >>>>> 2018-10-09 Giuliano Belinassi <giuliano.belinassi@usp.br> >>>>> >>>>> PR tree-optimization/86829 >>>>> * match.pd: Added sin(atan(x)) and cos(atan(x)) simplification rules. >>>>> * real.c (build_sinatan_real): New function to build a constant equal to the >>>>> largest value c such that 1 + c*c will not overflow. >>>>> * real.h (build_sinatan_real): Allows this function to be called externally. >>>>> >>>>> gcc/testsuite/gcc.dg/ChangeLog >>>>> >>>>> 2018-10-09 Giuliano Belinassi <giuliano.belinassi@usp.br> >>>>> >>>>> PR tree-optimization/86829 >>>>> * gcc.dg/sinatan-1.c: New test. >>>>> * gcc.dg/sinatan-2.c: New test. >>>>> * gcc.dg/sinatan-3.c: New test. >>>>> >>>>> There are no tests broken in trunk that seems related to this PR. >>>> THanks. I've installed this onto the trunk. It's right at the >>>> borderline of what would require a copyright assignment. So if you're >>>> going to do further work on GCC you should go ahead and start the >>>> copyright assignment process. >>>> >>>> Jeff >>> >>> Hi! >>> >>> The new sinatan-1.c test fails to link against newlib on aarch64-elf: >>> /tmp/ccmp5fP4.o: In function `sinatanl': >>> sinatan-1.c:(.text+0x68): undefined reference to `atanl' >>> sinatan-1.c:(.text+0x70): undefined reference to `sinl' >>> /tmp/ccmp5fP4.o: In function `cosatanl': >>> sinatan-1.c:(.text+0x80): undefined reference to `atanl' >>> sinatan-1.c:(.text+0x88): undefined reference to `cosl' >>> collect2: error: ld returned 1 exit status >>> >>> I'm not familiar enough with newlib to know if it's a newlib bug, or >>> if we should skip the test? >>> >>> On arm-eabi, the same test fails at runtime, but I haven't yet taken >>> the time to reproduce it manually. >> >> target_c99_math might do the trick >> > > Yes, if we want to skip the test, but I'm not sure about that? > On arm-eabi, adding some traces in the test indicates that the outputs > of cosatanf, cosatan and cosatanl are wrong. I think skipping on the newlib targets is fine. I'm much more concerned about the execution tests -- even if it's the library implementation that is wrong, it's going to be painful to manage the xfails over time. jeff
On Fri, 12 Oct 2018 at 16:14, Jeff Law <law@redhat.com> wrote: > > On 10/12/18 8:11 AM, Christophe Lyon wrote: > > On Fri, 12 Oct 2018 at 13:27, Richard Biener <richard.guenther@gmail.com> wrote: > >> > >> On Fri, Oct 12, 2018 at 10:08 AM Christophe Lyon > >> <christophe.lyon@linaro.org> wrote: > >>> > >>> On Thu, 11 Oct 2018 at 23:07, Jeff Law <law@redhat.com> wrote: > >>>> > >>>> On 10/9/18 5:29 PM, Giuliano Augusto Faulin Belinassi wrote: > >>>>> Fixed all issues pointed in the previous iteration. > >>>>> There is now a significant change regarding how the sin(atan(x)) > >>>>> constant is calculated, as now it checks for which values such that > >>>>> computing 1 + x*x won't overflow. There are two reasons for this > >>>>> change: (1) Avoid an intermediate infinity value when optimizing > >>>>> cos(atan(x)), and (2) avoid the requirement of separate constants for > >>>>> sin(atan(x)) and cos(atan(x)), thus making easier to maintain the > >>>>> code. > >>>>> > >>>>> gcc/ChangeLog > >>>>> > >>>>> 2018-10-09 Giuliano Belinassi <giuliano.belinassi@usp.br> > >>>>> > >>>>> PR tree-optimization/86829 > >>>>> * match.pd: Added sin(atan(x)) and cos(atan(x)) simplification rules. > >>>>> * real.c (build_sinatan_real): New function to build a constant equal to the > >>>>> largest value c such that 1 + c*c will not overflow. > >>>>> * real.h (build_sinatan_real): Allows this function to be called externally. > >>>>> > >>>>> gcc/testsuite/gcc.dg/ChangeLog > >>>>> > >>>>> 2018-10-09 Giuliano Belinassi <giuliano.belinassi@usp.br> > >>>>> > >>>>> PR tree-optimization/86829 > >>>>> * gcc.dg/sinatan-1.c: New test. > >>>>> * gcc.dg/sinatan-2.c: New test. > >>>>> * gcc.dg/sinatan-3.c: New test. > >>>>> > >>>>> There are no tests broken in trunk that seems related to this PR. > >>>> THanks. I've installed this onto the trunk. It's right at the > >>>> borderline of what would require a copyright assignment. So if you're > >>>> going to do further work on GCC you should go ahead and start the > >>>> copyright assignment process. > >>>> > >>>> Jeff > >>> > >>> Hi! > >>> > >>> The new sinatan-1.c test fails to link against newlib on aarch64-elf: > >>> /tmp/ccmp5fP4.o: In function `sinatanl': > >>> sinatan-1.c:(.text+0x68): undefined reference to `atanl' > >>> sinatan-1.c:(.text+0x70): undefined reference to `sinl' > >>> /tmp/ccmp5fP4.o: In function `cosatanl': > >>> sinatan-1.c:(.text+0x80): undefined reference to `atanl' > >>> sinatan-1.c:(.text+0x88): undefined reference to `cosl' > >>> collect2: error: ld returned 1 exit status > >>> > >>> I'm not familiar enough with newlib to know if it's a newlib bug, or > >>> if we should skip the test? > >>> > >>> On arm-eabi, the same test fails at runtime, but I haven't yet taken > >>> the time to reproduce it manually. > >> > >> target_c99_math might do the trick > >> > > > > Yes, if we want to skip the test, but I'm not sure about that? > > On arm-eabi, adding some traces in the test indicates that the outputs > > of cosatanf, cosatan and cosatanl are wrong. > I think skipping on the newlib targets is fine. I'm much more concerned > about the execution tests -- even if it's the library implementation > that is wrong, it's going to be painful to manage the xfails over time. > The two problems I noticed are both with newlib: - link failure on aarch64-elf - runtime error on arm-eabi A little bit more info on the arm-eabi problem: The testcase has: float fc = nextafterf (sqrtf (__FLT_MAX__ - 1), __FLT_MAX__); [...] fy = cosatanf (fc); fc is built with: mov r0, #0 movt r0, 24448 so r0=0x5f800000 (1.8446744E19) which looks ok but then cosatanf is computed as (ie there's not call to cosatanf()): movw r3, #48430 movt r3, 45883 so r3=0xb33bbd2e (-4.371139E-8) which is not zero. cosatan and cosatanl are called and seem to return wrong values too, but I haven't stepped into them yet. Christophe
Hello What is the output of these functions on such arch? Since the test didn't fail for the sinatan counterpart, an possible explanation would be that the calculation of the sqrf, sqrt and sqrtl (lines 62-64) yielded a number that is far behind of what it should be. However, I am still not sure about this, so I will investigate further. How about I write a small program to check if the result obtained by this calculation is what it should be? Giuliano
On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote: > Hello > What is the output of these functions on such arch? Since the > test didn't fail for the sinatan counterpart, an possible explanation > would be that the calculation of the sqrf, sqrt and sqrtl (lines > 62-64) yielded a number that is far behind of what it should be. > However, I am still not sure about this, so I will investigate > further. > How about I write a small program to check if the result > obtained by this calculation is what it should be? I suspect it's less the architecture and more the underlying library. As Christophe mentioned, both issues are with newlib which is an C library primarily used in the emebedded space. I believe it's math code derives from early BSD libm and likely hasn't been stressed for this kind of correctness. It's lightly maintained (primarily for cygwin). I'm going to run the testcases in my arm linux chroots. That should allow us to rule out codegen issues as those chroots will be using glibc for their math library. jeff
> fc is built with: > mov r0, #0 > movt r0, 24448 > so r0=0x5f800000 (1.8446744E19) which looks ok this is correct. My x86_64 yields the same value > but then cosatanf is computed as (ie there's not call to cosatanf()): > movw r3, #48430 > movt r3, 45883 > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. Ugh. So GCC replaced the function call with a precomputed value? This does not happens in my x86_64. Maybe it is Ofast's fault? Also, it seems that GCC is precomputing cos(atan(x)) before the substitution, as the following python script yields the same result: import numpy as np x = np.float32 (1.8446744e19) print (np.cos (np.arctan (x)) I would also like to add that -4.371139E-8 is very far away from 5.421011E-20, which is a "more" correct value for this computation. So returning 0 may be a better option? On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote: > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote: > > Hello > > What is the output of these functions on such arch? Since the > > test didn't fail for the sinatan counterpart, an possible explanation > > would be that the calculation of the sqrf, sqrt and sqrtl (lines > > 62-64) yielded a number that is far behind of what it should be. > > However, I am still not sure about this, so I will investigate > > further. > > How about I write a small program to check if the result > > obtained by this calculation is what it should be? > I suspect it's less the architecture and more the underlying library. > As Christophe mentioned, both issues are with newlib which is an C > library primarily used in the emebedded space. I believe it's math code > derives from early BSD libm and likely hasn't been stressed for this > kind of correctness. It's lightly maintained (primarily for cygwin). > > I'm going to run the testcases in my arm linux chroots. That should > allow us to rule out codegen issues as those chroots will be using glibc > for their math library. > > jeff
On Fri, 12 Oct 2018 at 17:57, Jeff Law <law@redhat.com> wrote: > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote: > > Hello > > What is the output of these functions on such arch? Since the > > test didn't fail for the sinatan counterpart, an possible explanation > > would be that the calculation of the sqrf, sqrt and sqrtl (lines > > 62-64) yielded a number that is far behind of what it should be. > > However, I am still not sure about this, so I will investigate > > further. > > How about I write a small program to check if the result > > obtained by this calculation is what it should be? > I suspect it's less the architecture and more the underlying library. > As Christophe mentioned, both issues are with newlib which is an C > library primarily used in the emebedded space. I believe it's math code > derives from early BSD libm and likely hasn't been stressed for this > kind of correctness. It's lightly maintained (primarily for cygwin). > > I'm going to run the testcases in my arm linux chroots. That should > allow us to rule out codegen issues as those chroots will be using glibc > for their math library. > My tests on arm-linux-gnueabi* are OK, so yours shoud be OK too :) > jeff
On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi <giuliano.belinassi@usp.br> wrote: > > > fc is built with: > > mov r0, #0 > > movt r0, 24448 > > so r0=0x5f800000 (1.8446744E19) which looks ok > > this is correct. My x86_64 yields the same value > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > movw r3, #48430 > > movt r3, 45883 > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > > Ugh. So GCC replaced the function call with a precomputed value? This > does not happens in my x86_64. Maybe it is Ofast's fault? > Also, it seems that GCC is precomputing cos(atan(x)) before the > substitution, as the following python script yields the same result: > Yes, I was surprised to see that. > import numpy as np > x = np.float32 (1.8446744e19) > print (np.cos (np.arctan (x)) > > I would also like to add that -4.371139E-8 is very far away from > 5.421011E-20, which is a "more" correct value for this computation. So > returning 0 may be a better option? > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote: > > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote: > > > Hello > > > What is the output of these functions on such arch? Since the > > > test didn't fail for the sinatan counterpart, an possible explanation > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines > > > 62-64) yielded a number that is far behind of what it should be. > > > However, I am still not sure about this, so I will investigate > > > further. > > > How about I write a small program to check if the result > > > obtained by this calculation is what it should be? > > I suspect it's less the architecture and more the underlying library. > > As Christophe mentioned, both issues are with newlib which is an C > > library primarily used in the emebedded space. I believe it's math code > > derives from early BSD libm and likely hasn't been stressed for this > > kind of correctness. It's lightly maintained (primarily for cygwin). > > > > I'm going to run the testcases in my arm linux chroots. That should > > allow us to rule out codegen issues as those chroots will be using glibc > > for their math library. > > > > jeff
Hello. Sorry for the late reply. > but then cosatanf is computed as (ie there's not call to cosatanf()): > movw r3, #48430 > movt r3, 45883 > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. Does this behavior is still present if we change the line 58 to: int __attribute__ ((optimize("O0"))) in sinatan-1.c? On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon <christophe.lyon@linaro.org> wrote: > > On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi > <giuliano.belinassi@usp.br> wrote: > > > > > fc is built with: > > > mov r0, #0 > > > movt r0, 24448 > > > so r0=0x5f800000 (1.8446744E19) which looks ok > > > > this is correct. My x86_64 yields the same value > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > > movw r3, #48430 > > > movt r3, 45883 > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > > > > Ugh. So GCC replaced the function call with a precomputed value? This > > does not happens in my x86_64. Maybe it is Ofast's fault? > > Also, it seems that GCC is precomputing cos(atan(x)) before the > > substitution, as the following python script yields the same result: > > > > Yes, I was surprised to see that. > > > import numpy as np > > x = np.float32 (1.8446744e19) > > print (np.cos (np.arctan (x)) > > > > I would also like to add that -4.371139E-8 is very far away from > > 5.421011E-20, which is a "more" correct value for this computation. So > > returning 0 may be a better option? > > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote: > > > > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote: > > > > Hello > > > > What is the output of these functions on such arch? Since the > > > > test didn't fail for the sinatan counterpart, an possible explanation > > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines > > > > 62-64) yielded a number that is far behind of what it should be. > > > > However, I am still not sure about this, so I will investigate > > > > further. > > > > How about I write a small program to check if the result > > > > obtained by this calculation is what it should be? > > > I suspect it's less the architecture and more the underlying library. > > > As Christophe mentioned, both issues are with newlib which is an C > > > library primarily used in the emebedded space. I believe it's math code > > > derives from early BSD libm and likely hasn't been stressed for this > > > kind of correctness. It's lightly maintained (primarily for cygwin). > > > > > > I'm going to run the testcases in my arm linux chroots. That should > > > allow us to rule out codegen issues as those chroots will be using glibc > > > for their math library. > > > > > > jeff
On Tue, 16 Oct 2018 at 17:15, Giuliano Augusto Faulin Belinassi <giuliano.belinassi@usp.br> wrote: > > Hello. Sorry for the late reply. > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > movw r3, #48430 > > movt r3, 45883 > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > Does this behavior is still present if we change the line 58 to: > int __attribute__ ((optimize("O0"))) > in sinatan-1.c? No, this now generates: ldr r0, [fp, #-32] bl cosatanf where r0=0x5f800000, and cosatanf() returns 0xb33bbd2e (ie the same value as what was computed by GCC) > On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon > <christophe.lyon@linaro.org> wrote: > > > > On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi > > <giuliano.belinassi@usp.br> wrote: > > > > > > > fc is built with: > > > > mov r0, #0 > > > > movt r0, 24448 > > > > so r0=0x5f800000 (1.8446744E19) which looks ok > > > > > > this is correct. My x86_64 yields the same value > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > > > movw r3, #48430 > > > > movt r3, 45883 > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > > > > > > Ugh. So GCC replaced the function call with a precomputed value? This > > > does not happens in my x86_64. Maybe it is Ofast's fault? > > > Also, it seems that GCC is precomputing cos(atan(x)) before the > > > substitution, as the following python script yields the same result: > > > > > > > Yes, I was surprised to see that. > > > > > import numpy as np > > > x = np.float32 (1.8446744e19) > > > print (np.cos (np.arctan (x)) > > > > > > I would also like to add that -4.371139E-8 is very far away from > > > 5.421011E-20, which is a "more" correct value for this computation. So > > > returning 0 may be a better option? > > > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote: > > > > > > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote: > > > > > Hello > > > > > What is the output of these functions on such arch? Since the > > > > > test didn't fail for the sinatan counterpart, an possible explanation > > > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines > > > > > 62-64) yielded a number that is far behind of what it should be. > > > > > However, I am still not sure about this, so I will investigate > > > > > further. > > > > > How about I write a small program to check if the result > > > > > obtained by this calculation is what it should be? > > > > I suspect it's less the architecture and more the underlying library. > > > > As Christophe mentioned, both issues are with newlib which is an C > > > > library primarily used in the emebedded space. I believe it's math code > > > > derives from early BSD libm and likely hasn't been stressed for this > > > > kind of correctness. It's lightly maintained (primarily for cygwin). > > > > > > > > I'm going to run the testcases in my arm linux chroots. That should > > > > allow us to rule out codegen issues as those chroots will be using glibc > > > > for their math library. > > > > > > > > jeff
Hello, Christophe Could you please dump the assembly of cosatanf here? On Tue, Oct 16, 2018 at 12:23 PM Christophe Lyon <christophe.lyon@linaro.org> wrote: > > On Tue, 16 Oct 2018 at 17:15, Giuliano Augusto Faulin Belinassi > <giuliano.belinassi@usp.br> wrote: > > > > Hello. Sorry for the late reply. > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > > movw r3, #48430 > > > movt r3, 45883 > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > > Does this behavior is still present if we change the line 58 to: > > int __attribute__ ((optimize("O0"))) > > in sinatan-1.c? > > No, this now generates: > ldr r0, [fp, #-32] > bl cosatanf > where r0=0x5f800000, and cosatanf() returns 0xb33bbd2e > (ie the same value as what was computed by GCC) > > > On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon > > <christophe.lyon@linaro.org> wrote: > > > > > > On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi > > > <giuliano.belinassi@usp.br> wrote: > > > > > > > > > fc is built with: > > > > > mov r0, #0 > > > > > movt r0, 24448 > > > > > so r0=0x5f800000 (1.8446744E19) which looks ok > > > > > > > > this is correct. My x86_64 yields the same value > > > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > > > > movw r3, #48430 > > > > > movt r3, 45883 > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > > > > > > > > Ugh. So GCC replaced the function call with a precomputed value? This > > > > does not happens in my x86_64. Maybe it is Ofast's fault? > > > > Also, it seems that GCC is precomputing cos(atan(x)) before the > > > > substitution, as the following python script yields the same result: > > > > > > > > > > Yes, I was surprised to see that. > > > > > > > import numpy as np > > > > x = np.float32 (1.8446744e19) > > > > print (np.cos (np.arctan (x)) > > > > > > > > I would also like to add that -4.371139E-8 is very far away from > > > > 5.421011E-20, which is a "more" correct value for this computation. So > > > > returning 0 may be a better option? > > > > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote: > > > > > > > > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote: > > > > > > Hello > > > > > > What is the output of these functions on such arch? Since the > > > > > > test didn't fail for the sinatan counterpart, an possible explanation > > > > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines > > > > > > 62-64) yielded a number that is far behind of what it should be. > > > > > > However, I am still not sure about this, so I will investigate > > > > > > further. > > > > > > How about I write a small program to check if the result > > > > > > obtained by this calculation is what it should be? > > > > > I suspect it's less the architecture and more the underlying library. > > > > > As Christophe mentioned, both issues are with newlib which is an C > > > > > library primarily used in the emebedded space. I believe it's math code > > > > > derives from early BSD libm and likely hasn't been stressed for this > > > > > kind of correctness. It's lightly maintained (primarily for cygwin). > > > > > > > > > > I'm going to run the testcases in my arm linux chroots. That should > > > > > allow us to rule out codegen issues as those chroots will be using glibc > > > > > for their math library. > > > > > > > > > > jeff
On Tue, 16 Oct 2018 at 18:04, Giuliano Augusto Faulin Belinassi <giuliano.belinassi@usp.br> wrote: > > Hello, Christophe > Could you please dump the assembly of cosatanf here? > > Sure: .global cosatanf .syntax unified .arm .fpu softvfp .type cosatanf, %function cosatanf: .LFB3: .loc 1 44 1 is_stmt 1 .cfi_startproc @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 .LVL50: .loc 1 45 5 .loc 1 44 1 is_stmt 0 push {r4, lr} .cfi_def_cfa_offset 8 .cfi_offset 4, -8 .cfi_offset 14, -4 .loc 1 45 12 bl atanf .LVL51: .loc 1 46 1 pop {r4, lr} .cfi_restore 14 .cfi_restore 4 .cfi_def_cfa_offset 0 .loc 1 45 12 b cosf .LVL52: .cfi_endproc .LFE3: .size cosatanf, .-cosatanf So, upon entry we have r0=0x5f800000 atanf returns 0x3fc90fdb and cosf returns 0xb33bbd2e > On Tue, Oct 16, 2018 at 12:23 PM Christophe Lyon > <christophe.lyon@linaro.org> wrote: > > > > On Tue, 16 Oct 2018 at 17:15, Giuliano Augusto Faulin Belinassi > > <giuliano.belinassi@usp.br> wrote: > > > > > > Hello. Sorry for the late reply. > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > > > movw r3, #48430 > > > > movt r3, 45883 > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > > > Does this behavior is still present if we change the line 58 to: > > > int __attribute__ ((optimize("O0"))) > > > in sinatan-1.c? > > > > No, this now generates: > > ldr r0, [fp, #-32] > > bl cosatanf > > where r0=0x5f800000, and cosatanf() returns 0xb33bbd2e > > (ie the same value as what was computed by GCC) > > > > > On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon > > > <christophe.lyon@linaro.org> wrote: > > > > > > > > On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi > > > > <giuliano.belinassi@usp.br> wrote: > > > > > > > > > > > fc is built with: > > > > > > mov r0, #0 > > > > > > movt r0, 24448 > > > > > > so r0=0x5f800000 (1.8446744E19) which looks ok > > > > > > > > > > this is correct. My x86_64 yields the same value > > > > > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > > > > > movw r3, #48430 > > > > > > movt r3, 45883 > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > > > > > > > > > > Ugh. So GCC replaced the function call with a precomputed value? This > > > > > does not happens in my x86_64. Maybe it is Ofast's fault? > > > > > Also, it seems that GCC is precomputing cos(atan(x)) before the > > > > > substitution, as the following python script yields the same result: > > > > > > > > > > > > > Yes, I was surprised to see that. > > > > > > > > > import numpy as np > > > > > x = np.float32 (1.8446744e19) > > > > > print (np.cos (np.arctan (x)) > > > > > > > > > > I would also like to add that -4.371139E-8 is very far away from > > > > > 5.421011E-20, which is a "more" correct value for this computation. So > > > > > returning 0 may be a better option? > > > > > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote: > > > > > > > > > > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote: > > > > > > > Hello > > > > > > > What is the output of these functions on such arch? Since the > > > > > > > test didn't fail for the sinatan counterpart, an possible explanation > > > > > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines > > > > > > > 62-64) yielded a number that is far behind of what it should be. > > > > > > > However, I am still not sure about this, so I will investigate > > > > > > > further. > > > > > > > How about I write a small program to check if the result > > > > > > > obtained by this calculation is what it should be? > > > > > > I suspect it's less the architecture and more the underlying library. > > > > > > As Christophe mentioned, both issues are with newlib which is an C > > > > > > library primarily used in the emebedded space. I believe it's math code > > > > > > derives from early BSD libm and likely hasn't been stressed for this > > > > > > kind of correctness. It's lightly maintained (primarily for cygwin). > > > > > > > > > > > > I'm going to run the testcases in my arm linux chroots. That should > > > > > > allow us to rule out codegen issues as those chroots will be using glibc > > > > > > for their math library. > > > > > > > > > > > > jeff
Hello, > cosatanf: > .LFB3: > .loc 1 44 1 is_stmt 1 > .cfi_startproc > @ args = 0, pretend = 0, frame = 0 > @ frame_needed = 0, uses_anonymous_args = 0 > .LVL50: > .loc 1 45 5 > .loc 1 44 1 is_stmt 0 > push {r4, lr} > .cfi_def_cfa_offset 8 > .cfi_offset 4, -8 > .cfi_offset 14, -4 > .loc 1 45 12 > bl atanf > .LVL51: > .loc 1 46 1 > pop {r4, lr} > .cfi_restore 14 > .cfi_restore 4 > .cfi_def_cfa_offset 0 > .loc 1 45 12 > b cosf > .LVL52: This means that the expression 'cos (atan (x))' was not simplified :-(. Did the check at line 4281 (svn revision 265209) in gcc/match.pd failed? > Yes, if we want to skip the test, but I'm not sure about that? Since the only point of this patch is to simplify these kind of expressions, and it is not being simplified at all in your arch as the asm dump suggests, then it seems safe to skip all sinatan-*.c tests. On Tue, Oct 16, 2018 at 1:25 PM Christophe Lyon <christophe.lyon@linaro.org> wrote: > > On Tue, 16 Oct 2018 at 18:04, Giuliano Augusto Faulin Belinassi > <giuliano.belinassi@usp.br> wrote: > > > > Hello, Christophe > > Could you please dump the assembly of cosatanf here? > > > > > Sure: > .global cosatanf > .syntax unified > .arm > .fpu softvfp > .type cosatanf, %function > cosatanf: > .LFB3: > .loc 1 44 1 is_stmt 1 > .cfi_startproc > @ args = 0, pretend = 0, frame = 0 > @ frame_needed = 0, uses_anonymous_args = 0 > .LVL50: > .loc 1 45 5 > .loc 1 44 1 is_stmt 0 > push {r4, lr} > .cfi_def_cfa_offset 8 > .cfi_offset 4, -8 > .cfi_offset 14, -4 > .loc 1 45 12 > bl atanf > .LVL51: > .loc 1 46 1 > pop {r4, lr} > .cfi_restore 14 > .cfi_restore 4 > .cfi_def_cfa_offset 0 > .loc 1 45 12 > b cosf > .LVL52: > .cfi_endproc > .LFE3: > .size cosatanf, .-cosatanf > > So, upon entry we have r0=0x5f800000 > atanf returns 0x3fc90fdb > and cosf returns 0xb33bbd2e > > > > On Tue, Oct 16, 2018 at 12:23 PM Christophe Lyon > > <christophe.lyon@linaro.org> wrote: > > > > > > On Tue, 16 Oct 2018 at 17:15, Giuliano Augusto Faulin Belinassi > > > <giuliano.belinassi@usp.br> wrote: > > > > > > > > Hello. Sorry for the late reply. > > > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > > > > movw r3, #48430 > > > > > movt r3, 45883 > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > > > > Does this behavior is still present if we change the line 58 to: > > > > int __attribute__ ((optimize("O0"))) > > > > in sinatan-1.c? > > > > > > No, this now generates: > > > ldr r0, [fp, #-32] > > > bl cosatanf > > > where r0=0x5f800000, and cosatanf() returns 0xb33bbd2e > > > (ie the same value as what was computed by GCC) > > > > > > > On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon > > > > <christophe.lyon@linaro.org> wrote: > > > > > > > > > > On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi > > > > > <giuliano.belinassi@usp.br> wrote: > > > > > > > > > > > > > fc is built with: > > > > > > > mov r0, #0 > > > > > > > movt r0, 24448 > > > > > > > so r0=0x5f800000 (1.8446744E19) which looks ok > > > > > > > > > > > > this is correct. My x86_64 yields the same value > > > > > > > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > > > > > > movw r3, #48430 > > > > > > > movt r3, 45883 > > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > > > > > > > > > > > > Ugh. So GCC replaced the function call with a precomputed value? This > > > > > > does not happens in my x86_64. Maybe it is Ofast's fault? > > > > > > Also, it seems that GCC is precomputing cos(atan(x)) before the > > > > > > substitution, as the following python script yields the same result: > > > > > > > > > > > > > > > > Yes, I was surprised to see that. > > > > > > > > > > > import numpy as np > > > > > > x = np.float32 (1.8446744e19) > > > > > > print (np.cos (np.arctan (x)) > > > > > > > > > > > > I would also like to add that -4.371139E-8 is very far away from > > > > > > 5.421011E-20, which is a "more" correct value for this computation. So > > > > > > returning 0 may be a better option? > > > > > > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote: > > > > > > > > > > > > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote: > > > > > > > > Hello > > > > > > > > What is the output of these functions on such arch? Since the > > > > > > > > test didn't fail for the sinatan counterpart, an possible explanation > > > > > > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines > > > > > > > > 62-64) yielded a number that is far behind of what it should be. > > > > > > > > However, I am still not sure about this, so I will investigate > > > > > > > > further. > > > > > > > > How about I write a small program to check if the result > > > > > > > > obtained by this calculation is what it should be? > > > > > > > I suspect it's less the architecture and more the underlying library. > > > > > > > As Christophe mentioned, both issues are with newlib which is an C > > > > > > > library primarily used in the emebedded space. I believe it's math code > > > > > > > derives from early BSD libm and likely hasn't been stressed for this > > > > > > > kind of correctness. It's lightly maintained (primarily for cygwin). > > > > > > > > > > > > > > I'm going to run the testcases in my arm linux chroots. That should > > > > > > > allow us to rule out codegen issues as those chroots will be using glibc > > > > > > > for their math library. > > > > > > > > > > > > > > jeff
On Tue, 16 Oct 2018 at 19:33, Giuliano Augusto Faulin Belinassi <giuliano.belinassi@usp.br> wrote: > > Hello, > > > cosatanf: > > .LFB3: > > .loc 1 44 1 is_stmt 1 > > .cfi_startproc > > @ args = 0, pretend = 0, frame = 0 > > @ frame_needed = 0, uses_anonymous_args = 0 > > .LVL50: > > .loc 1 45 5 > > .loc 1 44 1 is_stmt 0 > > push {r4, lr} > > .cfi_def_cfa_offset 8 > > .cfi_offset 4, -8 > > .cfi_offset 14, -4 > > .loc 1 45 12 > > bl atanf > > .LVL51: > > .loc 1 46 1 > > pop {r4, lr} > > .cfi_restore 14 > > .cfi_restore 4 > > .cfi_def_cfa_offset 0 > > .loc 1 45 12 > > b cosf > > .LVL52: > > This means that the expression 'cos (atan (x))' was not simplified > :-(. Did the check at line 4281 (svn revision 265209) in gcc/match.pd > failed? Do you mean r265064? There's no r265209 in trunk. > > Yes, if we want to skip the test, but I'm not sure about that? > Since the only point of this patch is to simplify these kind of > expressions, and it is not being simplified at all in your arch as the > asm dump suggests, then it seems safe to skip all sinatan-*.c tests. > On Tue, Oct 16, 2018 at 1:25 PM Christophe Lyon > <christophe.lyon@linaro.org> wrote: > > > > On Tue, 16 Oct 2018 at 18:04, Giuliano Augusto Faulin Belinassi > > <giuliano.belinassi@usp.br> wrote: > > > > > > Hello, Christophe > > > Could you please dump the assembly of cosatanf here? > > > > > > > > Sure: > > .global cosatanf > > .syntax unified > > .arm > > .fpu softvfp > > .type cosatanf, %function > > cosatanf: > > .LFB3: > > .loc 1 44 1 is_stmt 1 > > .cfi_startproc > > @ args = 0, pretend = 0, frame = 0 > > @ frame_needed = 0, uses_anonymous_args = 0 > > .LVL50: > > .loc 1 45 5 > > .loc 1 44 1 is_stmt 0 > > push {r4, lr} > > .cfi_def_cfa_offset 8 > > .cfi_offset 4, -8 > > .cfi_offset 14, -4 > > .loc 1 45 12 > > bl atanf > > .LVL51: > > .loc 1 46 1 > > pop {r4, lr} > > .cfi_restore 14 > > .cfi_restore 4 > > .cfi_def_cfa_offset 0 > > .loc 1 45 12 > > b cosf > > .LVL52: > > .cfi_endproc > > .LFE3: > > .size cosatanf, .-cosatanf > > > > So, upon entry we have r0=0x5f800000 > > atanf returns 0x3fc90fdb > > and cosf returns 0xb33bbd2e > > > > > > > On Tue, Oct 16, 2018 at 12:23 PM Christophe Lyon > > > <christophe.lyon@linaro.org> wrote: > > > > > > > > On Tue, 16 Oct 2018 at 17:15, Giuliano Augusto Faulin Belinassi > > > > <giuliano.belinassi@usp.br> wrote: > > > > > > > > > > Hello. Sorry for the late reply. > > > > > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > > > > > movw r3, #48430 > > > > > > movt r3, 45883 > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > > > > > Does this behavior is still present if we change the line 58 to: > > > > > int __attribute__ ((optimize("O0"))) > > > > > in sinatan-1.c? > > > > > > > > No, this now generates: > > > > ldr r0, [fp, #-32] > > > > bl cosatanf > > > > where r0=0x5f800000, and cosatanf() returns 0xb33bbd2e > > > > (ie the same value as what was computed by GCC) > > > > > > > > > On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon > > > > > <christophe.lyon@linaro.org> wrote: > > > > > > > > > > > > On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi > > > > > > <giuliano.belinassi@usp.br> wrote: > > > > > > > > > > > > > > > fc is built with: > > > > > > > > mov r0, #0 > > > > > > > > movt r0, 24448 > > > > > > > > so r0=0x5f800000 (1.8446744E19) which looks ok > > > > > > > > > > > > > > this is correct. My x86_64 yields the same value > > > > > > > > > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > > > > > > > movw r3, #48430 > > > > > > > > movt r3, 45883 > > > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > > > > > > > > > > > > > > Ugh. So GCC replaced the function call with a precomputed value? This > > > > > > > does not happens in my x86_64. Maybe it is Ofast's fault? > > > > > > > Also, it seems that GCC is precomputing cos(atan(x)) before the > > > > > > > substitution, as the following python script yields the same result: > > > > > > > > > > > > > > > > > > > Yes, I was surprised to see that. > > > > > > > > > > > > > import numpy as np > > > > > > > x = np.float32 (1.8446744e19) > > > > > > > print (np.cos (np.arctan (x)) > > > > > > > > > > > > > > I would also like to add that -4.371139E-8 is very far away from > > > > > > > 5.421011E-20, which is a "more" correct value for this computation. So > > > > > > > returning 0 may be a better option? > > > > > > > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote: > > > > > > > > > > > > > > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote: > > > > > > > > > Hello > > > > > > > > > What is the output of these functions on such arch? Since the > > > > > > > > > test didn't fail for the sinatan counterpart, an possible explanation > > > > > > > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines > > > > > > > > > 62-64) yielded a number that is far behind of what it should be. > > > > > > > > > However, I am still not sure about this, so I will investigate > > > > > > > > > further. > > > > > > > > > How about I write a small program to check if the result > > > > > > > > > obtained by this calculation is what it should be? > > > > > > > > I suspect it's less the architecture and more the underlying library. > > > > > > > > As Christophe mentioned, both issues are with newlib which is an C > > > > > > > > library primarily used in the emebedded space. I believe it's math code > > > > > > > > derives from early BSD libm and likely hasn't been stressed for this > > > > > > > > kind of correctness. It's lightly maintained (primarily for cygwin). > > > > > > > > > > > > > > > > I'm going to run the testcases in my arm linux chroots. That should > > > > > > > > allow us to rule out codegen issues as those chroots will be using glibc > > > > > > > > for their math library. > > > > > > > > > > > > > > > > jeff
Oh, sorry about that. :P Could you please check if the 'if (SCALAR_FLOAT_TYPE_P (type)', at line 4281 in gcc/match.pd is evaluated true in your arch? svn revision: $ svn info | grep Revision Revision: 265212 On Tue, Oct 16, 2018 at 4:14 PM Christophe Lyon <christophe.lyon@linaro.org> wrote: > > On Tue, 16 Oct 2018 at 19:33, Giuliano Augusto Faulin Belinassi > <giuliano.belinassi@usp.br> wrote: > > > > Hello, > > > > > cosatanf: > > > .LFB3: > > > .loc 1 44 1 is_stmt 1 > > > .cfi_startproc > > > @ args = 0, pretend = 0, frame = 0 > > > @ frame_needed = 0, uses_anonymous_args = 0 > > > .LVL50: > > > .loc 1 45 5 > > > .loc 1 44 1 is_stmt 0 > > > push {r4, lr} > > > .cfi_def_cfa_offset 8 > > > .cfi_offset 4, -8 > > > .cfi_offset 14, -4 > > > .loc 1 45 12 > > > bl atanf > > > .LVL51: > > > .loc 1 46 1 > > > pop {r4, lr} > > > .cfi_restore 14 > > > .cfi_restore 4 > > > .cfi_def_cfa_offset 0 > > > .loc 1 45 12 > > > b cosf > > > .LVL52: > > > > This means that the expression 'cos (atan (x))' was not simplified > > :-(. Did the check at line 4281 (svn revision 265209) in gcc/match.pd > > failed? > > Do you mean r265064? There's no r265209 in trunk. > > > > Yes, if we want to skip the test, but I'm not sure about that? > > Since the only point of this patch is to simplify these kind of > > expressions, and it is not being simplified at all in your arch as the > > asm dump suggests, then it seems safe to skip all sinatan-*.c tests. > > > > On Tue, Oct 16, 2018 at 1:25 PM Christophe Lyon > > <christophe.lyon@linaro.org> wrote: > > > > > > On Tue, 16 Oct 2018 at 18:04, Giuliano Augusto Faulin Belinassi > > > <giuliano.belinassi@usp.br> wrote: > > > > > > > > Hello, Christophe > > > > Could you please dump the assembly of cosatanf here? > > > > > > > > > > > Sure: > > > .global cosatanf > > > .syntax unified > > > .arm > > > .fpu softvfp > > > .type cosatanf, %function > > > cosatanf: > > > .LFB3: > > > .loc 1 44 1 is_stmt 1 > > > .cfi_startproc > > > @ args = 0, pretend = 0, frame = 0 > > > @ frame_needed = 0, uses_anonymous_args = 0 > > > .LVL50: > > > .loc 1 45 5 > > > .loc 1 44 1 is_stmt 0 > > > push {r4, lr} > > > .cfi_def_cfa_offset 8 > > > .cfi_offset 4, -8 > > > .cfi_offset 14, -4 > > > .loc 1 45 12 > > > bl atanf > > > .LVL51: > > > .loc 1 46 1 > > > pop {r4, lr} > > > .cfi_restore 14 > > > .cfi_restore 4 > > > .cfi_def_cfa_offset 0 > > > .loc 1 45 12 > > > b cosf > > > .LVL52: > > > .cfi_endproc > > > .LFE3: > > > .size cosatanf, .-cosatanf > > > > > > So, upon entry we have r0=0x5f800000 > > > atanf returns 0x3fc90fdb > > > and cosf returns 0xb33bbd2e > > > > > > > > > > On Tue, Oct 16, 2018 at 12:23 PM Christophe Lyon > > > > <christophe.lyon@linaro.org> wrote: > > > > > > > > > > On Tue, 16 Oct 2018 at 17:15, Giuliano Augusto Faulin Belinassi > > > > > <giuliano.belinassi@usp.br> wrote: > > > > > > > > > > > > Hello. Sorry for the late reply. > > > > > > > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > > > > > > movw r3, #48430 > > > > > > > movt r3, 45883 > > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > > > > > > Does this behavior is still present if we change the line 58 to: > > > > > > int __attribute__ ((optimize("O0"))) > > > > > > in sinatan-1.c? > > > > > > > > > > No, this now generates: > > > > > ldr r0, [fp, #-32] > > > > > bl cosatanf > > > > > where r0=0x5f800000, and cosatanf() returns 0xb33bbd2e > > > > > (ie the same value as what was computed by GCC) > > > > > > > > > > > On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon > > > > > > <christophe.lyon@linaro.org> wrote: > > > > > > > > > > > > > > On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi > > > > > > > <giuliano.belinassi@usp.br> wrote: > > > > > > > > > > > > > > > > > fc is built with: > > > > > > > > > mov r0, #0 > > > > > > > > > movt r0, 24448 > > > > > > > > > so r0=0x5f800000 (1.8446744E19) which looks ok > > > > > > > > > > > > > > > > this is correct. My x86_64 yields the same value > > > > > > > > > > > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > > > > > > > > movw r3, #48430 > > > > > > > > > movt r3, 45883 > > > > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > > > > > > > > > > > > > > > > Ugh. So GCC replaced the function call with a precomputed value? This > > > > > > > > does not happens in my x86_64. Maybe it is Ofast's fault? > > > > > > > > Also, it seems that GCC is precomputing cos(atan(x)) before the > > > > > > > > substitution, as the following python script yields the same result: > > > > > > > > > > > > > > > > > > > > > > Yes, I was surprised to see that. > > > > > > > > > > > > > > > import numpy as np > > > > > > > > x = np.float32 (1.8446744e19) > > > > > > > > print (np.cos (np.arctan (x)) > > > > > > > > > > > > > > > > I would also like to add that -4.371139E-8 is very far away from > > > > > > > > 5.421011E-20, which is a "more" correct value for this computation. So > > > > > > > > returning 0 may be a better option? > > > > > > > > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote: > > > > > > > > > > > > > > > > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote: > > > > > > > > > > Hello > > > > > > > > > > What is the output of these functions on such arch? Since the > > > > > > > > > > test didn't fail for the sinatan counterpart, an possible explanation > > > > > > > > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines > > > > > > > > > > 62-64) yielded a number that is far behind of what it should be. > > > > > > > > > > However, I am still not sure about this, so I will investigate > > > > > > > > > > further. > > > > > > > > > > How about I write a small program to check if the result > > > > > > > > > > obtained by this calculation is what it should be? > > > > > > > > > I suspect it's less the architecture and more the underlying library. > > > > > > > > > As Christophe mentioned, both issues are with newlib which is an C > > > > > > > > > library primarily used in the emebedded space. I believe it's math code > > > > > > > > > derives from early BSD libm and likely hasn't been stressed for this > > > > > > > > > kind of correctness. It's lightly maintained (primarily for cygwin). > > > > > > > > > > > > > > > > > > I'm going to run the testcases in my arm linux chroots. That should > > > > > > > > > allow us to rule out codegen issues as those chroots will be using glibc > > > > > > > > > for their math library. > > > > > > > > > > > > > > > > > > jeff
Please ignore the previous message. I did not know how revisions in SVN worked until now. Sorry :-( The code I referred was added in r265064, by law. So Christophe was correct. Could you please check if the 'if (SCALAR_FLOAT_TYPE_P (type)', at line 4260 in gcc/match.pd (r265064) is evaluated true in your arch? On Tue, Oct 16, 2018 at 4:29 PM Giuliano Augusto Faulin Belinassi <giuliano.belinassi@usp.br> wrote: > > Oh, sorry about that. :P > > Could you please check if the 'if (SCALAR_FLOAT_TYPE_P (type)', at > line 4281 in gcc/match.pd is evaluated true in your arch? > > svn revision: > $ svn info | grep Revision > Revision: 265212 > On Tue, Oct 16, 2018 at 4:14 PM Christophe Lyon > <christophe.lyon@linaro.org> wrote: > > > > On Tue, 16 Oct 2018 at 19:33, Giuliano Augusto Faulin Belinassi > > <giuliano.belinassi@usp.br> wrote: > > > > > > Hello, > > > > > > > cosatanf: > > > > .LFB3: > > > > .loc 1 44 1 is_stmt 1 > > > > .cfi_startproc > > > > @ args = 0, pretend = 0, frame = 0 > > > > @ frame_needed = 0, uses_anonymous_args = 0 > > > > .LVL50: > > > > .loc 1 45 5 > > > > .loc 1 44 1 is_stmt 0 > > > > push {r4, lr} > > > > .cfi_def_cfa_offset 8 > > > > .cfi_offset 4, -8 > > > > .cfi_offset 14, -4 > > > > .loc 1 45 12 > > > > bl atanf > > > > .LVL51: > > > > .loc 1 46 1 > > > > pop {r4, lr} > > > > .cfi_restore 14 > > > > .cfi_restore 4 > > > > .cfi_def_cfa_offset 0 > > > > .loc 1 45 12 > > > > b cosf > > > > .LVL52: > > > > > > This means that the expression 'cos (atan (x))' was not simplified > > > :-(. Did the check at line 4281 (svn revision 265209) in gcc/match.pd > > > failed? > > > > Do you mean r265064? There's no r265209 in trunk. > > > > > > Yes, if we want to skip the test, but I'm not sure about that? > > > Since the only point of this patch is to simplify these kind of > > > expressions, and it is not being simplified at all in your arch as the > > > asm dump suggests, then it seems safe to skip all sinatan-*.c tests. > > > > > > > On Tue, Oct 16, 2018 at 1:25 PM Christophe Lyon > > > <christophe.lyon@linaro.org> wrote: > > > > > > > > On Tue, 16 Oct 2018 at 18:04, Giuliano Augusto Faulin Belinassi > > > > <giuliano.belinassi@usp.br> wrote: > > > > > > > > > > Hello, Christophe > > > > > Could you please dump the assembly of cosatanf here? > > > > > > > > > > > > > > Sure: > > > > .global cosatanf > > > > .syntax unified > > > > .arm > > > > .fpu softvfp > > > > .type cosatanf, %function > > > > cosatanf: > > > > .LFB3: > > > > .loc 1 44 1 is_stmt 1 > > > > .cfi_startproc > > > > @ args = 0, pretend = 0, frame = 0 > > > > @ frame_needed = 0, uses_anonymous_args = 0 > > > > .LVL50: > > > > .loc 1 45 5 > > > > .loc 1 44 1 is_stmt 0 > > > > push {r4, lr} > > > > .cfi_def_cfa_offset 8 > > > > .cfi_offset 4, -8 > > > > .cfi_offset 14, -4 > > > > .loc 1 45 12 > > > > bl atanf > > > > .LVL51: > > > > .loc 1 46 1 > > > > pop {r4, lr} > > > > .cfi_restore 14 > > > > .cfi_restore 4 > > > > .cfi_def_cfa_offset 0 > > > > .loc 1 45 12 > > > > b cosf > > > > .LVL52: > > > > .cfi_endproc > > > > .LFE3: > > > > .size cosatanf, .-cosatanf > > > > > > > > So, upon entry we have r0=0x5f800000 > > > > atanf returns 0x3fc90fdb > > > > and cosf returns 0xb33bbd2e > > > > > > > > > > > > > On Tue, Oct 16, 2018 at 12:23 PM Christophe Lyon > > > > > <christophe.lyon@linaro.org> wrote: > > > > > > > > > > > > On Tue, 16 Oct 2018 at 17:15, Giuliano Augusto Faulin Belinassi > > > > > > <giuliano.belinassi@usp.br> wrote: > > > > > > > > > > > > > > Hello. Sorry for the late reply. > > > > > > > > > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > > > > > > > movw r3, #48430 > > > > > > > > movt r3, 45883 > > > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > > > > > > > Does this behavior is still present if we change the line 58 to: > > > > > > > int __attribute__ ((optimize("O0"))) > > > > > > > in sinatan-1.c? > > > > > > > > > > > > No, this now generates: > > > > > > ldr r0, [fp, #-32] > > > > > > bl cosatanf > > > > > > where r0=0x5f800000, and cosatanf() returns 0xb33bbd2e > > > > > > (ie the same value as what was computed by GCC) > > > > > > > > > > > > > On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon > > > > > > > <christophe.lyon@linaro.org> wrote: > > > > > > > > > > > > > > > > On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi > > > > > > > > <giuliano.belinassi@usp.br> wrote: > > > > > > > > > > > > > > > > > > > fc is built with: > > > > > > > > > > mov r0, #0 > > > > > > > > > > movt r0, 24448 > > > > > > > > > > so r0=0x5f800000 (1.8446744E19) which looks ok > > > > > > > > > > > > > > > > > > this is correct. My x86_64 yields the same value > > > > > > > > > > > > > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > > > > > > > > > movw r3, #48430 > > > > > > > > > > movt r3, 45883 > > > > > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > > > > > > > > > > > > > > > > > > Ugh. So GCC replaced the function call with a precomputed value? This > > > > > > > > > does not happens in my x86_64. Maybe it is Ofast's fault? > > > > > > > > > Also, it seems that GCC is precomputing cos(atan(x)) before the > > > > > > > > > substitution, as the following python script yields the same result: > > > > > > > > > > > > > > > > > > > > > > > > > Yes, I was surprised to see that. > > > > > > > > > > > > > > > > > import numpy as np > > > > > > > > > x = np.float32 (1.8446744e19) > > > > > > > > > print (np.cos (np.arctan (x)) > > > > > > > > > > > > > > > > > > I would also like to add that -4.371139E-8 is very far away from > > > > > > > > > 5.421011E-20, which is a "more" correct value for this computation. So > > > > > > > > > returning 0 may be a better option? > > > > > > > > > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote: > > > > > > > > > > > > > > > > > > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote: > > > > > > > > > > > Hello > > > > > > > > > > > What is the output of these functions on such arch? Since the > > > > > > > > > > > test didn't fail for the sinatan counterpart, an possible explanation > > > > > > > > > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines > > > > > > > > > > > 62-64) yielded a number that is far behind of what it should be. > > > > > > > > > > > However, I am still not sure about this, so I will investigate > > > > > > > > > > > further. > > > > > > > > > > > How about I write a small program to check if the result > > > > > > > > > > > obtained by this calculation is what it should be? > > > > > > > > > > I suspect it's less the architecture and more the underlying library. > > > > > > > > > > As Christophe mentioned, both issues are with newlib which is an C > > > > > > > > > > library primarily used in the emebedded space. I believe it's math code > > > > > > > > > > derives from early BSD libm and likely hasn't been stressed for this > > > > > > > > > > kind of correctness. It's lightly maintained (primarily for cygwin). > > > > > > > > > > > > > > > > > > > > I'm going to run the testcases in my arm linux chroots. That should > > > > > > > > > > allow us to rule out codegen issues as those chroots will be using glibc > > > > > > > > > > for their math library. > > > > > > > > > > > > > > > > > > > > jeff
On Tue, Oct 16, 2018 at 10:33 PM Giuliano Augusto Faulin Belinassi <giuliano.belinassi@usp.br> wrote: > > Please ignore the previous message. I did not know how revisions in > SVN worked until now. Sorry :-( > > The code I referred was added in r265064, by law. So Christophe was correct. > > Could you please check if the 'if (SCALAR_FLOAT_TYPE_P (type)', at > line 4260 in gcc/match.pd (r265064) is evaluated true in your arch? The issue is more likely copysign being used in the replacement but the testcases not declaring that and the target not providing an optab. Same for sqrt. match.pd will only do replacements to functions that have been properly declared. Richard. > > On Tue, Oct 16, 2018 at 4:29 PM Giuliano Augusto Faulin Belinassi > <giuliano.belinassi@usp.br> wrote: > > > > Oh, sorry about that. :P > > > > Could you please check if the 'if (SCALAR_FLOAT_TYPE_P (type)', at > > line 4281 in gcc/match.pd is evaluated true in your arch? > > > > svn revision: > > $ svn info | grep Revision > > Revision: 265212 > > On Tue, Oct 16, 2018 at 4:14 PM Christophe Lyon > > <christophe.lyon@linaro.org> wrote: > > > > > > On Tue, 16 Oct 2018 at 19:33, Giuliano Augusto Faulin Belinassi > > > <giuliano.belinassi@usp.br> wrote: > > > > > > > > Hello, > > > > > > > > > cosatanf: > > > > > .LFB3: > > > > > .loc 1 44 1 is_stmt 1 > > > > > .cfi_startproc > > > > > @ args = 0, pretend = 0, frame = 0 > > > > > @ frame_needed = 0, uses_anonymous_args = 0 > > > > > .LVL50: > > > > > .loc 1 45 5 > > > > > .loc 1 44 1 is_stmt 0 > > > > > push {r4, lr} > > > > > .cfi_def_cfa_offset 8 > > > > > .cfi_offset 4, -8 > > > > > .cfi_offset 14, -4 > > > > > .loc 1 45 12 > > > > > bl atanf > > > > > .LVL51: > > > > > .loc 1 46 1 > > > > > pop {r4, lr} > > > > > .cfi_restore 14 > > > > > .cfi_restore 4 > > > > > .cfi_def_cfa_offset 0 > > > > > .loc 1 45 12 > > > > > b cosf > > > > > .LVL52: > > > > > > > > This means that the expression 'cos (atan (x))' was not simplified > > > > :-(. Did the check at line 4281 (svn revision 265209) in gcc/match.pd > > > > failed? > > > > > > Do you mean r265064? There's no r265209 in trunk. > > > > > > > > Yes, if we want to skip the test, but I'm not sure about that? > > > > Since the only point of this patch is to simplify these kind of > > > > expressions, and it is not being simplified at all in your arch as the > > > > asm dump suggests, then it seems safe to skip all sinatan-*.c tests. > > > > > > > > > > On Tue, Oct 16, 2018 at 1:25 PM Christophe Lyon > > > > <christophe.lyon@linaro.org> wrote: > > > > > > > > > > On Tue, 16 Oct 2018 at 18:04, Giuliano Augusto Faulin Belinassi > > > > > <giuliano.belinassi@usp.br> wrote: > > > > > > > > > > > > Hello, Christophe > > > > > > Could you please dump the assembly of cosatanf here? > > > > > > > > > > > > > > > > > Sure: > > > > > .global cosatanf > > > > > .syntax unified > > > > > .arm > > > > > .fpu softvfp > > > > > .type cosatanf, %function > > > > > cosatanf: > > > > > .LFB3: > > > > > .loc 1 44 1 is_stmt 1 > > > > > .cfi_startproc > > > > > @ args = 0, pretend = 0, frame = 0 > > > > > @ frame_needed = 0, uses_anonymous_args = 0 > > > > > .LVL50: > > > > > .loc 1 45 5 > > > > > .loc 1 44 1 is_stmt 0 > > > > > push {r4, lr} > > > > > .cfi_def_cfa_offset 8 > > > > > .cfi_offset 4, -8 > > > > > .cfi_offset 14, -4 > > > > > .loc 1 45 12 > > > > > bl atanf > > > > > .LVL51: > > > > > .loc 1 46 1 > > > > > pop {r4, lr} > > > > > .cfi_restore 14 > > > > > .cfi_restore 4 > > > > > .cfi_def_cfa_offset 0 > > > > > .loc 1 45 12 > > > > > b cosf > > > > > .LVL52: > > > > > .cfi_endproc > > > > > .LFE3: > > > > > .size cosatanf, .-cosatanf > > > > > > > > > > So, upon entry we have r0=0x5f800000 > > > > > atanf returns 0x3fc90fdb > > > > > and cosf returns 0xb33bbd2e > > > > > > > > > > > > > > > > On Tue, Oct 16, 2018 at 12:23 PM Christophe Lyon > > > > > > <christophe.lyon@linaro.org> wrote: > > > > > > > > > > > > > > On Tue, 16 Oct 2018 at 17:15, Giuliano Augusto Faulin Belinassi > > > > > > > <giuliano.belinassi@usp.br> wrote: > > > > > > > > > > > > > > > > Hello. Sorry for the late reply. > > > > > > > > > > > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > > > > > > > > movw r3, #48430 > > > > > > > > > movt r3, 45883 > > > > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > > > > > > > > Does this behavior is still present if we change the line 58 to: > > > > > > > > int __attribute__ ((optimize("O0"))) > > > > > > > > in sinatan-1.c? > > > > > > > > > > > > > > No, this now generates: > > > > > > > ldr r0, [fp, #-32] > > > > > > > bl cosatanf > > > > > > > where r0=0x5f800000, and cosatanf() returns 0xb33bbd2e > > > > > > > (ie the same value as what was computed by GCC) > > > > > > > > > > > > > > > On Fri, Oct 12, 2018 at 3:24 PM Christophe Lyon > > > > > > > > <christophe.lyon@linaro.org> wrote: > > > > > > > > > > > > > > > > > > On Fri, 12 Oct 2018 at 20:01, Giuliano Augusto Faulin Belinassi > > > > > > > > > <giuliano.belinassi@usp.br> wrote: > > > > > > > > > > > > > > > > > > > > > fc is built with: > > > > > > > > > > > mov r0, #0 > > > > > > > > > > > movt r0, 24448 > > > > > > > > > > > so r0=0x5f800000 (1.8446744E19) which looks ok > > > > > > > > > > > > > > > > > > > > this is correct. My x86_64 yields the same value > > > > > > > > > > > > > > > > > > > > > but then cosatanf is computed as (ie there's not call to cosatanf()): > > > > > > > > > > > movw r3, #48430 > > > > > > > > > > > movt r3, 45883 > > > > > > > > > > > so r3=0xb33bbd2e (-4.371139E-8) which is not zero. > > > > > > > > > > > > > > > > > > > > Ugh. So GCC replaced the function call with a precomputed value? This > > > > > > > > > > does not happens in my x86_64. Maybe it is Ofast's fault? > > > > > > > > > > Also, it seems that GCC is precomputing cos(atan(x)) before the > > > > > > > > > > substitution, as the following python script yields the same result: > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, I was surprised to see that. > > > > > > > > > > > > > > > > > > > import numpy as np > > > > > > > > > > x = np.float32 (1.8446744e19) > > > > > > > > > > print (np.cos (np.arctan (x)) > > > > > > > > > > > > > > > > > > > > I would also like to add that -4.371139E-8 is very far away from > > > > > > > > > > 5.421011E-20, which is a "more" correct value for this computation. So > > > > > > > > > > returning 0 may be a better option? > > > > > > > > > > On Fri, Oct 12, 2018 at 12:57 PM Jeff Law <law@redhat.com> wrote: > > > > > > > > > > > > > > > > > > > > > > On 10/12/18 9:51 AM, Giuliano Augusto Faulin Belinassi wrote: > > > > > > > > > > > > Hello > > > > > > > > > > > > What is the output of these functions on such arch? Since the > > > > > > > > > > > > test didn't fail for the sinatan counterpart, an possible explanation > > > > > > > > > > > > would be that the calculation of the sqrf, sqrt and sqrtl (lines > > > > > > > > > > > > 62-64) yielded a number that is far behind of what it should be. > > > > > > > > > > > > However, I am still not sure about this, so I will investigate > > > > > > > > > > > > further. > > > > > > > > > > > > How about I write a small program to check if the result > > > > > > > > > > > > obtained by this calculation is what it should be? > > > > > > > > > > > I suspect it's less the architecture and more the underlying library. > > > > > > > > > > > As Christophe mentioned, both issues are with newlib which is an C > > > > > > > > > > > library primarily used in the emebedded space. I believe it's math code > > > > > > > > > > > derives from early BSD libm and likely hasn't been stressed for this > > > > > > > > > > > kind of correctness. It's lightly maintained (primarily for cygwin). > > > > > > > > > > > > > > > > > > > > > > I'm going to run the testcases in my arm linux chroots. That should > > > > > > > > > > > allow us to rule out codegen issues as those chroots will be using glibc > > > > > > > > > > > for their math library. > > > > > > > > > > > > > > > > > > > > > > jeff
Index: gcc/match.pd =================================================================== --- gcc/match.pd (revision 264941) +++ gcc/match.pd (working copy) @@ -4223,6 +4223,45 @@ (tans (atans @0)) @0))) + /* Simplify sin(atan(x)) -> x / sqrt(x*x + 1). */ + (for sins (SIN) + atans (ATAN) + sqrts (SQRT) + copysigns (COPYSIGN) + (simplify + (sins (atans:s @0)) + (with + { + REAL_VALUE_TYPE r_cst; + build_sinatan_real (&r_cst, type); + tree t_cst = build_real (type, r_cst); + tree t_one = build_one_cst (type); + } + (if (SCALAR_FLOAT_TYPE_P (type)) + (cond (le (abs @0) { t_cst; }) + (rdiv @0 (sqrts (plus (mult @0 @0) { t_one; }))) + (copysigns { t_one; } @0)))))) + +/* Simplify cos(atan(x)) -> 1 / sqrt(x*x + 1). */ + (for coss (COS) + atans (ATAN) + sqrts (SQRT) + copysigns (COPYSIGN) + (simplify + (coss (atans:s @0)) + (with + { + REAL_VALUE_TYPE r_cst; + build_sinatan_real (&r_cst, type); + tree t_cst = build_real (type, r_cst); + tree t_one = build_one_cst (type); + tree t_zero = build_zero_cst (type); + } + (if (SCALAR_FLOAT_TYPE_P (type)) + (cond (le (abs @0) { t_cst; }) + (rdiv { t_one; } (sqrts (plus (mult @0 @0) { t_one; }))) + (copysigns { t_zero; } @0)))))) + /* cabs(x+0i) or cabs(0+xi) -> abs(x). */ (simplify (CABS (complex:C @0 real_zerop@1)) Index: gcc/real.c =================================================================== --- gcc/real.c (revision 264941) +++ gcc/real.c (working copy) @@ -5279,3 +5279,29 @@ { return HONOR_SIGN_DEPENDENT_ROUNDING (GET_MODE (x)); } + +/* Fills r with the largest value such that 1 + r*r won't overflow. + This is used in both sin (atan (x)) and cos (atan(x)) optimizations. */ + +void +build_sinatan_real (REAL_VALUE_TYPE * r, tree type) +{ + REAL_VALUE_TYPE maxval; + mpfr_t mpfr_const1, mpfr_c, mpfr_maxval; + machine_mode mode = TYPE_MODE (type); + const struct real_format * fmt = REAL_MODE_FORMAT (mode); + + real_maxval (&maxval, 0, mode); + + mpfr_inits (mpfr_const1, mpfr_c, mpfr_maxval, NULL); + + mpfr_from_real (mpfr_const1, &dconst1, GMP_RNDN); + mpfr_from_real (mpfr_maxval, &maxval, GMP_RNDN); + + mpfr_sub (mpfr_c, mpfr_maxval, mpfr_const1, GMP_RNDN); + mpfr_sqrt (mpfr_c, mpfr_c, GMP_RNDZ); + + real_from_mpfr (r, mpfr_c, fmt, GMP_RNDZ); + + mpfr_clears (mpfr_const1, mpfr_c, mpfr_maxval, NULL); +} Index: gcc/real.h =================================================================== --- gcc/real.h (revision 264941) +++ gcc/real.h (working copy) @@ -523,4 +523,8 @@ const wide_int_ref &, signop); #endif +/* Fills r with the largest value such that 1 + r*r won't overflow. + This is used in both sin (atan (x)) and cos (atan(x)) optimizations. */ +extern void build_sinatan_real (REAL_VALUE_TYPE *, tree); + #endif /* ! GCC_REAL_H */ Index: gcc/testsuite/gcc.dg/sinatan-1.c =================================================================== --- gcc/testsuite/gcc.dg/sinatan-1.c (nonexistent) +++ gcc/testsuite/gcc.dg/sinatan-1.c (working copy) @@ -0,0 +1,101 @@ +/* { dg-do run } */ +/* { dg-options "-Ofast" } */ + +extern float sinf (float); +extern float cosf (float); +extern float atanf (float); +extern float sqrtf (float); +extern float nextafterf (float, float); +extern double sin (double); +extern double cos (double); +extern double atan (double); +extern double sqrt (double); +extern double nextafter (double, double); +extern long double sinl (long double); +extern long double cosl (long double); +extern long double atanl (long double); +extern long double sqrtl (long double); +extern long double nextafterl (long double, long double); + +extern void abort (); + +double __attribute__ ((noinline, optimize("Ofast"))) +sinatan (double x) +{ + return sin (atan (x)); +} + +double __attribute__ ((noinline, optimize("Ofast"))) +cosatan (double x) +{ + return cos (atan (x)); +} + +float __attribute__ ((noinline, optimize("Ofast"))) +sinatanf(float x) +{ + return sinf (atanf (x)); +} + +float __attribute__ ((noinline, optimize("Ofast"))) +cosatanf(float x) +{ + return cosf (atanf (x)); +} + +long double __attribute__ ((noinline, optimize("Ofast"))) +sinatanl (long double x) +{ + return sinl (atanl (x)); +} + +long double __attribute__ ((noinline, optimize("Ofast"))) +cosatanl (long double x) +{ + return cosl (atanl (x)); +} + +int +main() +{ + /* Get first x such that 1 + x*x will overflow */ + float fc = nextafterf (sqrtf (__FLT_MAX__ - 1), __FLT_MAX__); + double c = nextafter (sqrt (__DBL_MAX__ - 1), __DBL_MAX__); + long double lc = nextafter (sqrtl (__LDBL_MAX__ - 1), __LDBL_MAX__); + + /* Force move from FPU to memory, otherwise comparison may + fail due to possible more accurate registers (see 387) */ + volatile float fy; + volatile double y; + volatile long double ly; + + fy = sinatanf (fc); + y = sinatan (c); + ly = sinatanl (lc); + + if (fy != 1.f || y != 1 || ly != 1.L) + abort (); + + fy = cosatanf (fc); + y = cosatan (c); + ly = cosatanl (lc); + + if (fy != 0.f || y != 0. || ly != 0.L) + abort (); + + fy = sinatanf (-fc); + y = sinatan (-c); + ly = sinatanl (-lc); + + if (fy != -1.f || y != -1. || ly != -1.L) + abort (); + + fy = cosatanf (-fc); + y = cosatan (-c); + ly = cosatanl (-lc); + + if (fy != 0.f || y != 0. || ly != 0.L) + abort (); + + return 0; +} Index: gcc/testsuite/gcc.dg/sinatan-2.c =================================================================== --- gcc/testsuite/gcc.dg/sinatan-2.c (nonexistent) +++ gcc/testsuite/gcc.dg/sinatan-2.c (working copy) @@ -0,0 +1,59 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-optimized" } */ + +extern float sinf (float); +extern float cosf (float); +extern float atanf (float); +extern double sin (double); +extern double cos (double); +extern double atan (double); +extern long double sinl (long double); +extern long double cosl (long double); +extern long double atanl (long double); + +double __attribute__ ((noinline)) +sinatan_ (double x) +{ + return sin (atan (x)); +} + +double __attribute__ ((noinline)) +cosatan_ (double x) +{ + return cos (atan (x)); +} + +float __attribute__ ((noinline)) +sinatanf_(float x) +{ + return sinf (atanf (x)); +} + +float __attribute__ ((noinline)) +cosatanf_(float x) +{ + return cosf (atanf (x)); +} + +long double __attribute__ ((noinline)) +sinatanl_ (long double x) +{ + return sinl (atanl (x)); +} + +long double __attribute__ ((noinline)) +cosatanl_ (long double x) +{ + return cosl (atanl (x)); +} + +/* There must be no calls to sin, cos, or atan */ +/* {dg-final { scan-tree-dump-not "sin " "optimized" } } */ +/* {dg-final { scan-tree-dump-not "cos " "optimized" } } */ +/* {dg-final { scan-tree-dump-not "atan " "optimized" }} */ +/* {dg-final { scan-tree-dump-not "sinf " "optimized" } } */ +/* {dg-final { scan-tree-dump-not "cosf " "optimized" } } */ +/* {dg-final { scan-tree-dump-not "atanf " "optimized" }} */ +/* {dg-final { scan-tree-dump-not "sinl " "optimized" } } */ +/* {dg-final { scan-tree-dump-not "cosl " "optimized" } } */ +/* {dg-final { scan-tree-dump-not "atanl " "optimized" }} */ Index: gcc/testsuite/gcc.dg/sinatan-3.c =================================================================== --- gcc/testsuite/gcc.dg/sinatan-3.c (nonexistent) +++ gcc/testsuite/gcc.dg/sinatan-3.c (working copy) @@ -0,0 +1,65 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-optimized" } */ + +extern float sinf (float); +extern float cosf (float); +extern float atanf (float); +extern double sin (double); +extern double cos (double); +extern double atan (double); +extern long double sinl (long double); +extern long double cosl (long double); +extern long double atanl (long double); + +float __attribute__ ((noinline)) +cosatanf_(float x) +{ + float atg = atanf(x); + return cosf(atg) + atg; +} + +double __attribute__ ((noinline)) +cosatan_(double x) +{ + double atg = atan(x); + return cos(atg) + atg; +} + +long double __attribute__ ((noinline)) +cosatanl_(long double x) +{ + long double atg = atanl(x); + return cosl(atg) + atg; +} + +float __attribute__ ((noinline)) +sinatanf_(float x) +{ + float atg = atanf(x); + return sinf(atg) + atg; +} + +double __attribute__ ((noinline)) +sinatan_(double x) +{ + double atg = atan(x); + return sin(atg) + atg; +} + +long double __attribute__ ((noinline)) +sinatanl_(long double x) +{ + long double atg = atanl(x); + return sinl(atg) + atg; +} + +/* There should be calls to both sin and atan */ +/* { dg-final { scan-tree-dump "cos " "optimized" } } */ +/* { dg-final { scan-tree-dump "sin " "optimized" } } */ +/* { dg-final { scan-tree-dump "atan " "optimized" } } */ +/* { dg-final { scan-tree-dump "cosf " "optimized" } } */ +/* { dg-final { scan-tree-dump "sinf " "optimized" } } */ +/* { dg-final { scan-tree-dump "atanf " "optimized" } } */ +/* { dg-final { scan-tree-dump "cosl " "optimized" } } */ +/* { dg-final { scan-tree-dump "sinl " "optimized" } } */ +/* { dg-final { scan-tree-dump "atanl " "optimized" } } */