
inhibit the sincos optimization when the target has sin and cos instructions

Message ID 573B88B0.2080508@codesourcery.com
State New

Commit Message

Cesar Philippidis May 17, 2016, 9:10 p.m. UTC
On 05/13/2016 01:13 PM, Andrew Pinski wrote:
> On Fri, May 13, 2016 at 12:58 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On May 13, 2016 9:18:57 PM GMT+02:00, Cesar Philippidis <cesar@codesourcery.com> wrote:
>>> The cse_sincos pass tries to optimize sequences such as
>>>
>>>  sin (x);
>>>  cos (x);
>>>
>>> into a single call to sincos, or cexpi, when available. However, the
>>> nvptx target has sin and cos instructions, albeit with some loss of
>>> precision (so it's only enabled with -ffast-math). This patch teaches
>>> the cse_sincos pass to ignore sin, cos and cexpi calls when the
>>> target can expand them. This yields a 6x speedup in 314.omriq
>>> from SPEC ACCEL when running on Nvidia accelerators.
>>>
>>> Is this OK for trunk?
>>
>> Isn't there an optab for sincos?
> 
> This is exactly what I was going to suggest.  The back end should
> undo this transformation, turning sincos back into sin/cos instructions.

I didn't realize that the 387 has sin, cos and sincos instructions,
so yeah, my original patch is bad.
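To make the transformation under discussion concrete, here is a toy model of what cse_sincos does: it looks for a sin and a cos of the same argument and merges them into one combined call (sincos or cexpi in the real pass). This is a hypothetical sketch, not GCC's implementation: the real pass works on GIMPLE and keys on the SSA name of the argument, while here the names `struct call`, `F_SINCOS` and `cse_sincos_model` are invented and an integer id stands in for the argument.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: each "call" is a function tag plus an integer id that
   stands in for the SSA name of its argument. */
enum func { F_SIN, F_COS, F_SINCOS };

struct call
{
  enum func fn;
  int arg;      /* stand-in for the SSA name of the argument */
  int removed;  /* nonzero once the call has been merged away */
};

/* Merge each sin (x) with a later cos (x), or each cos (x) with a later
   sin (x), into a single sincos (x).  Returns the number of calls left. */
static int
cse_sincos_model (struct call *calls, size_t n)
{
  assert (calls != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      if (calls[i].removed || calls[i].fn == F_SINCOS)
        continue;
      for (size_t j = i + 1; j < n; j++)
        {
          if (calls[j].removed || calls[j].arg != calls[i].arg)
            continue;
          if ((calls[i].fn == F_SIN && calls[j].fn == F_COS)
              || (calls[i].fn == F_COS && calls[j].fn == F_SIN))
            {
              calls[i].fn = F_SINCOS;  /* one call now yields both values */
              calls[j].removed = 1;    /* the partner call is dropped */
              break;
            }
        }
    }

  int left = 0;
  for (size_t i = 0; i < n; i++)
    left += !calls[i].removed;
  return left;
}
```

For example, given sin (x), cos (x) and cos (y), the model merges the first two into sincos (x) and leaves cos (y) alone, so two calls remain. On a target like nvptx, where sin and cos are single instructions, that merge only pays off if the back end can split the combined call again.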

Nathan, is this patch OK for trunk and gcc-6? It adds a new sincos
pattern to the nvptx backend. I hadn't tested a standalone nvptx
toolchain before this patch, so I'm not sure whether my test results
look sane: I seem to get a different set of failures each time I
test a clean trunk build. I've attached my results below for
reference.
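A rough model of what the new sincossf3 expander contributes: once the middle end has a merged sincos, a target that provides a sincos pattern gets one sin and one cos instruction instead of a combined call. The sketch below is hypothetical: the `target_has_sincos` flag, the `emit` helper, the library-call fallback string and the pseudo-PTX register names are all invented for illustration. The operand numbering follows the define_expand in the patch (cosine to operand 0, sine to operand 1, input in operand 2).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append one line of pseudo-PTX to BUF, whose capacity is CAP. */
static void
emit (char *buf, size_t cap, const char *line)
{
  size_t len = strlen (buf);
  snprintf (buf + len, cap - len, "%s\n", line);
}

/* Expand a merged sincos (x).  TARGET_HAS_SINCOS stands in for "the
   target defines a sincos pattern for this mode". */
static void
expand_sincos (int target_has_sincos, char *buf, size_t cap)
{
  assert (cap > 0);
  buf[0] = '\0';
  if (target_has_sincos)
    {
      /* Like the nvptx expander: sin into operand 1, cos into operand 0,
         both reading operand 2.  */
      emit (buf, cap, "sin.approx.f32 %r1, %r2;");
      emit (buf, cap, "cos.approx.f32 %r0, %r2;");
    }
  else
    emit (buf, cap, "call sincosf");  /* simplified fallback: a libcall */
}
```

With the flag set, the output contains exactly one sin.approx.f32 and one cos.approx.f32 line, which is what sincos-1.c scans for; without it, the model leaves a single library call.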

Cesar

g++.sum
Tests that now fail, but worked before:

nvptx-none-run: g++.dg/abi/param1.C  -std=c++14 execution test

Tests that now work, but didn't before:

nvptx-none-run: g++.dg/opt/pr30590.C  -std=gnu++98 execution test
nvptx-none-run: g++.dg/opt/pr36187.C  -std=gnu++14 execution test

gfortran.sum
Tests that now fail, but worked before:

nvptx-none-run: gfortran.dg/alloc_comp_assign_10.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
nvptx-none-run: gfortran.dg/allocate_with_source_5.f90   -O1  execution test
nvptx-none-run: gfortran.dg/func_assign_3.f90   -O3 -g  execution test
nvptx-none-run: gfortran.dg/inline_sum_3.f90   -O1  execution test
nvptx-none-run: gfortran.dg/inline_sum_3.f90   -O3 -g  execution test
nvptx-none-run: gfortran.dg/internal_pack_15.f90   -O2  execution test
nvptx-none-run: gfortran.dg/internal_pack_8.f90   -Os  execution test
nvptx-none-run: gfortran.dg/intrinsic_ifunction_2.f90   -O0  execution test
nvptx-none-run: gfortran.dg/intrinsic_ifunction_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
nvptx-none-run: gfortran.dg/intrinsic_pack_5.f90   -O3 -g  execution test
nvptx-none-run: gfortran.dg/intrinsic_product_1.f90   -O1  execution test
nvptx-none-run: gfortran.dg/intrinsic_verify_1.f90   -O3 -g  execution test
nvptx-none-run: gfortran.dg/is_iostat_end_eor_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
nvptx-none-run: gfortran.dg/iso_c_binding_rename_1.f03   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

Tests that now work, but didn't before:

nvptx-none-run: gfortran.dg/char_pointer_assign.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
nvptx-none-run: gfortran.dg/char_pointer_dummy.f90   -O1  execution test
nvptx-none-run: gfortran.dg/char_pointer_dummy.f90   -Os  execution test
nvptx-none-run: gfortran.dg/char_result_13.f90   -O3 -g  execution test
nvptx-none-run: gfortran.dg/char_result_2.f90   -O1  execution test
nvptx-none-run: gfortran.dg/char_type_len.f90   -Os  execution test
nvptx-none-run: gfortran.dg/character_array_constructor_1.f90   -O0  execution test
nvptx-none-run: gfortran.dg/nested_allocatables_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

gcc.sum
Tests that now fail, but worked before:

nvptx-none-run: gcc.c-torture/execute/20100316-1.c   -Os  execution test
nvptx-none-run: gcc.c-torture/execute/20100708-1.c   -O1  execution test
nvptx-none-run: gcc.c-torture/execute/20100805-1.c   -O0  execution test
nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -g  execution test

Tests that now work, but didn't before:

nvptx-none-run: gcc.c-torture/execute/20091229-1.c   -O3 -g  execution test
nvptx-none-run: gcc.c-torture/execute/20101013-1.c   -Os  execution test
nvptx-none-run: gcc.c-torture/execute/20101025-1.c   -Os  execution test
nvptx-none-run: gcc.c-torture/execute/20120105-1.c   -O0  execution test
nvptx-none-run: gcc.c-torture/execute/20120111-1.c   -O0  execution test

New tests that PASS:

nvptx-none-run: gcc.target/nvptx/sincos-1.c (test for excess errors)
nvptx-none-run: gcc.target/nvptx/sincos-1.c scan-assembler-times cos.approx.f32 1
nvptx-none-run: gcc.target/nvptx/sincos-1.c scan-assembler-times sin.approx.f32 1
nvptx-none-run: gcc.target/nvptx/sincos-2.c (test for excess errors)
nvptx-none-run: gcc.target/nvptx/sincos-2.c execution test


>> ISTR x87 handles this pass just fine and also can do sin and cos.
>>
>> Richard.
>>
>>> Cesar
>>
>>

Comments

Andrew Pinski May 17, 2016, 9:22 p.m. UTC
On Tue, May 17, 2016 at 2:10 PM, Cesar Philippidis
<cesar@codesourcery.com> wrote:
> On 05/13/2016 01:13 PM, Andrew Pinski wrote:
>> On Fri, May 13, 2016 at 12:58 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On May 13, 2016 9:18:57 PM GMT+02:00, Cesar Philippidis <cesar@codesourcery.com> wrote:
>>>> The cse_sincos pass tries to optimize sequences such as
>>>>
>>>>  sin (x);
>>>>  cos (x);
>>>>
>>>> into a single call to sincos, or cexpi, when available. However, the
>>>> nvptx target has sin and cos instructions, albeit with some loss of
>>>> precision (so it's only enabled with -ffast-math). This patch teaches
>>>> the cse_sincos pass to ignore sin, cos and cexpi calls when the
>>>> target can expand them. This yields a 6x speedup in 314.omriq
>>>> from SPEC ACCEL when running on Nvidia accelerators.
>>>>
>>>> Is this OK for trunk?
>>>
>>> Isn't there an optab for sincos?
>>
>> This is exactly what I was going to suggest.  This transformation
>> should be done in the back-end back to sin/cos instructions.
>
> I didn't realize that the 387 has sin, cos and sincos instructions,
> so yeah, my original patch is bad.
>
> Nathan, is this patch OK for trunk and gcc-6? It adds a new sincos
> pattern to the nvptx backend. I hadn't tested a standalone nvptx
> toolchain before this patch, so I'm not sure whether my test results
> look sane: I seem to get a different set of failures each time I
> test a clean trunk build. I've attached my results below for
> reference.


UNSPEC_SINCOS is unused, so why add it?

Thanks,
Andrew Pinski



Patch

2016-05-17  Cesar Philippidis  <cesar@codesourcery.com>

	gcc/
	* config/nvptx/nvptx.md (unspec): Add UNSPEC_SINCOS.
	(sincossf3): New pattern.

	gcc/testsuite/
	* gcc.target/nvptx/sincos-1.c: New test.
	* gcc.target/nvptx/sincos-2.c: New test.


diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 33a4862..03a2f67 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -26,6 +26,7 @@ 
    UNSPEC_EXP2
    UNSPEC_SIN
    UNSPEC_COS
+   UNSPEC_SINCOS
 
    UNSPEC_FPINT_FLOOR
    UNSPEC_FPINT_BTRUNC
@@ -794,6 +795,20 @@ 
   ""
   "%.\\tsqrt%#%t0\\t%0, %1;")
 
+(define_expand "sincossf3"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SF 2 "nvptx_register_operand" "R")]
+	           UNSPEC_COS))
+   (set (match_operand:SF 1 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_dup 2)] UNSPEC_SIN))]
+  "flag_unsafe_math_optimizations"
+{
+  emit_insn (gen_sinsf2 (operands[1], operands[2]));
+  emit_insn (gen_cossf2 (operands[0], operands[2]));
+
+  DONE;
+})
+
 (define_insn "sinsf2"
   [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
 	(unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
diff --git a/gcc/testsuite/gcc.target/nvptx/sincos-1.c b/gcc/testsuite/gcc.target/nvptx/sincos-1.c
new file mode 100644
index 0000000..921ec41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/sincos-1.c
@@ -0,0 +1,17 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math" } */
+
+extern float sinf (float);
+extern float cosf (float);
+
+float
+sincos_add (float x)
+{
+  float s = sinf (x);
+  float c = cosf (x);
+
+  return s + c;
+}
+
+/* { dg-final { scan-assembler-times "sin.approx.f32" 1 } } */
+/* { dg-final { scan-assembler-times "cos.approx.f32" 1 } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/sincos-2.c b/gcc/testsuite/gcc.target/nvptx/sincos-2.c
new file mode 100644
index 0000000..b617a7c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/sincos-2.c
@@ -0,0 +1,30 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math" } */
+
+#include <assert.h>
+
+extern float sinf (float);
+extern float cosf (float);
+
+float val = 1.0;
+
+float
+test_sincos (float x, float other_cos)
+{
+  float s = sinf (x);
+  float c = cosf (x);
+
+  assert (c == other_cos);
+  
+  return s + c;
+}
+
+int
+main ()
+{
+  float c = cosf (val);
+  
+  test_sincos (val, c);
+ 
+  return 0;
+}
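sincos-1.c relies on scan-assembler-times to check that each approximate instruction appears exactly once in the generated assembly. A simplified version of that check can be sketched as a literal substring count. This is an assumption-laden model: the real DejaGnu directive matches a Tcl regular expression (so "." in "sin.approx.f32" matches any character), and the function name `count_occurrences` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping, literal occurrences of NEEDLE in HAYSTACK. */
static int
count_occurrences (const char *haystack, const char *needle)
{
  size_t nlen = strlen (needle);
  int count = 0;

  assert (nlen > 0);  /* an empty needle would match forever */
  for (const char *p = strstr (haystack, needle); p != NULL;
       p = strstr (p + nlen, needle))
    count++;
  return count;
}
```

Applied to a made-up fragment such as "\tsin.approx.f32\t%r24, %r23;\n\tcos.approx.f32\t%r25, %r23;\n", each mnemonic is counted once, matching the "1" expected by the two dg-final lines in sincos-1.c.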