diff mbox series

move sincos after pre (was: Re: [Ada,FYI] revamp ada.numerics.aux)

Message ID oro8kt2f88.fsf_-_@livre.home
State New
Headers show
Series move sincos after pre (was: Re: [Ada,FYI] revamp ada.numerics.aux) | expand

Commit Message

Alexandre Oliva Oct. 23, 2020, 2:23 p.m. UTC
On Oct 22, 2020, Alexandre Oliva <oliva@adacore.com> wrote:

> On Oct 18, 2020, Alexandre Oliva <oliva@adacore.com> wrote:
>> The option is provided by default, but there is an alternate version
>> that doesn't, that is used for vxworks targets.

> vxworks float EFs not precise enough -> use long float

> From: Alexandre Oliva <oliva@adacore.com>

> Some acats-4 tests that check the precision of Float elementary
> functions fail with vxworks 7.2's implementations of single-precision
> math functions.

> This patch arranges for us to bypass the single-precision functions,
> and use the Aux_Long_Float implementation, based on the double-typed
> calls from the C library, for Float and Short_Float.


On platforms in which Aux_[Real_Type] involves non-NOP conversions
(e.g., between single- and double-precision, or between short float
and float), the conversions before the calls are CSEd too late for
sincos to combine calls.

This patch moves sincos after PRE, where the conversions are unified
at -O2.

I'm regstrapping this on x86_64-linux-gnu and powerpc64-linux-gnu, and
also testing it on affected platforms.  Another way to go, that would
take a little more effort, would be to extend sincos to take equivalent
conversions as the same operand, but I doubt I'll be able to undertake
that any time soon, so...  Is this one ok to install?


for  gcc/ChangeLog

	* passes.def: Move sincos after pre.

for  gcc/testsuite/ChangeLog

	* gnat.dg/sin_cos.ads: New.
	* gnat.dg/sin_cos.adb: New.
	* gcc.dg/sin_cos.c: New.
---
 gcc/passes.def                    |    2 +-
 gcc/testsuite/gcc.dg/sin_cos.c    |   41 +++++++++++++++++++++++++++++++++++++
 gcc/testsuite/gnat.dg/sin_cos.adb |   14 +++++++++++++
 gcc/testsuite/gnat.dg/sin_cos.ads |    4 ++++
 4 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/sin_cos.c
 create mode 100644 gcc/testsuite/gnat.dg/sin_cos.adb
 create mode 100644 gcc/testsuite/gnat.dg/sin_cos.ads

Comments

Richard Biener Oct. 23, 2020, 3:05 p.m. UTC | #1
On October 23, 2020 4:23:35 PM GMT+02:00, Alexandre Oliva <oliva@adacore.com> wrote:
>On Oct 22, 2020, Alexandre Oliva <oliva@adacore.com> wrote:
>
>> On Oct 18, 2020, Alexandre Oliva <oliva@adacore.com> wrote:
>>> The option is provided by default, but there is an alternate version
>>> that doesn't, that is used for vxworks targets.
>
>> vxworks float EFs not precise enough -> use long float
>
>> From: Alexandre Oliva <oliva@adacore.com>
>
>> Some acats-4 tests that check the precision of Float elementary
>> functions fail with vxworks 7.2's implementations of single-precision
>> math functions.
>
>> This patch arranges for us to bypass the single-precision functions,
>> and use the Aux_Long_Float implementation, based on the double-typed
>> calls from the C library, for Float and Short_Float.
>
>
>On platforms in which Aux_[Real_Type] involves non-NOP conversions
>(e.g., between single- and double-precision, or between short float
>and float), the conversions before the calls are CSEd too late for
>sincos to combine calls.
>
>This patch moves sincos after PRE, where the conversions are unified
>at -O2.
>
>I'm regstrapping this on x86_64-linux-gnu and powerpc64-linux-gnu, and
>also testing it on affected platforms.  Another way to go, that would
>take a little more effort, would be to extend sincos to take equivalent
>conversions as the same operand, but I doubt I'll be able to undertake
>that any time soon, so...  Is this one ok to install?

Can you move it one pass further after sink please? Also I don't remember exactly but does pass_sincos only handle sin/cos unifying? 

Thanks, 
Richard. 

>
>for  gcc/ChangeLog
>
>	* passes.def: Move sincos after pre.
>
>for  gcc/testsuite/ChangeLog
>
>	* gnat.dg/sin_cos.ads: New.
>	* gnat.dg/sin_cos.adb: New.
>	* gcc.dg/sin_cos.c: New.
>---
> gcc/passes.def                    |    2 +-
>gcc/testsuite/gcc.dg/sin_cos.c    |   41
>+++++++++++++++++++++++++++++++++++++
> gcc/testsuite/gnat.dg/sin_cos.adb |   14 +++++++++++++
> gcc/testsuite/gnat.dg/sin_cos.ads |    4 ++++
> 4 files changed, 60 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.dg/sin_cos.c
> create mode 100644 gcc/testsuite/gnat.dg/sin_cos.adb
> create mode 100644 gcc/testsuite/gnat.dg/sin_cos.ads
>
>diff --git a/gcc/passes.def b/gcc/passes.def
>index cf15d8e..2743506 100644
>--- a/gcc/passes.def
>+++ b/gcc/passes.def
>@@ -242,12 +242,12 @@ along with GCC; see the file COPYING3.  If not
>see
>       NEXT_PASS (pass_ccp, true /* nonzero_p */);
>       /* After CCP we rewrite no longer addressed locals into SSA
> 	 form if possible.  */
>-      NEXT_PASS (pass_cse_sincos);
>       NEXT_PASS (pass_optimize_bswap);
>       NEXT_PASS (pass_laddress);
>       NEXT_PASS (pass_lim);
>       NEXT_PASS (pass_walloca, false);
>       NEXT_PASS (pass_pre);
>+      NEXT_PASS (pass_cse_sincos);
>       NEXT_PASS (pass_sink_code);
>       NEXT_PASS (pass_sancov);
>       NEXT_PASS (pass_asan);
>diff --git a/gcc/testsuite/gcc.dg/sin_cos.c
>b/gcc/testsuite/gcc.dg/sin_cos.c
>new file mode 100644
>index 00000000..a4a7727
>--- /dev/null
>+++ b/gcc/testsuite/gcc.dg/sin_cos.c
>@@ -0,0 +1,41 @@
>+/* { dg-do compile } */
>+/* { dg-options "-O2" } */
>+
>+/* This maps to essentially the same gimple that is generated for
>+   gnat.dg/sin_cos.adb, on platforms that use the wraplf variant of
>+   Ada.Numerics.Aux_Float.  The value of EPSILON is not relevant to
>+   the test, but the test must be there to keep the conversions in
>+   different BBs long enough to trigger the problem that prevented the
>+   sincos optimization, because the arguments passed to sin and cos
>+   didn't get unified into a single SSA_NAME in time for sincos.  */
>+
>+#include <math.h>
>+
>+#define EPSILON 3.4526697709225118160247802734375e-4
>+
>+static float my_sinf(float x) {
>+  return (float) sin ((double) x);
>+}
>+
>+static float wrap_sinf(float x) {
>+  if (fabs (x) < EPSILON)
>+    return 0;
>+  return my_sinf (x);
>+}
>+
>+static float my_cosf(float x) {
>+  return (float) cos ((double) x);
>+}
>+
>+static float wrap_cosf(float x) {
>+  if (fabs (x) < EPSILON)
>+    return 1;
>+  return my_cosf (x);
>+}
>+
>+float my_sin_cos(float x, float *s, float *c) {
>+  *s = wrap_sinf (x);
>+  *c = wrap_cosf (x);
>+}
>+
>+/* { dg-final { scan-assembler "sincos\|cexp" { target *-linux-gnu*
>*-w64-mingw* powerpc*-*-* } } } */
>diff --git a/gcc/testsuite/gnat.dg/sin_cos.adb
>b/gcc/testsuite/gnat.dg/sin_cos.adb
>new file mode 100644
>index 00000000..e72f521
>--- /dev/null
>+++ b/gcc/testsuite/gnat.dg/sin_cos.adb
>@@ -0,0 +1,14 @@
>+--  { dg-do compile }
>+--  { dg-options "-O2 -gnatn" }
>+
>+with Ada.Numerics.Elementary_Functions;
>+use Ada.Numerics.Elementary_Functions;
>+package body Sin_Cos is
>+   procedure Sin_Cos (Angle : T; SinA, CosA : out T) is
>+   begin
>+      SinA := Sin (Angle);
>+      CosA := Cos (Angle);
>+   end;
>+end Sin_Cos;
>+
>+--  { dg-final { scan-assembler "sincos\|cexp" { target *-linux-gnu*
>*-w64-mingw* powerpc*-*-* } } }
>diff --git a/gcc/testsuite/gnat.dg/sin_cos.ads
>b/gcc/testsuite/gnat.dg/sin_cos.ads
>new file mode 100644
>index 00000000..a0eff3d
>--- /dev/null
>+++ b/gcc/testsuite/gnat.dg/sin_cos.ads
>@@ -0,0 +1,4 @@
>+package Sin_Cos is
>+   subtype T is Float;
>+   procedure Sin_Cos (Angle : T; SinA, CosA : out T);
>+end Sin_Cos;
Alexandre Oliva Oct. 27, 2020, 5:32 a.m. UTC | #2
On Oct 23, 2020, Richard Biener <richard.guenther@gmail.com> wrote:

> Can you move it one pass further after sink please?

I did, but it didn't solve the recip regressions that my first attempt
brought about.
  
> Also I don't
> remember exactly but does pass_sincos only handle sin/cos unifying?

It rearranges some powi computations, and that's what breaks recip.  It
adds a copy of y*y in extract_recip_[34], we'd need a forwprop or
similar to get rid of the trivial copy before recip could do its job
properly again.

So I figured I'd try to cse type conversions before sincos, and that
turned out to be pretty easy, and it didn't regress anything.

Regstrapped on x86_64-linux-gnu.  Ok to install?


CSE conversions within sincos

From: Alexandre Oliva <oliva@adacore.com>

On platforms in which Aux_[Real_Type] involves non-NOP conversions
(e.g., between single- and double-precision, or between short float
and float), the conversions before the calls are CSEd too late for
sincos to combine calls.

This patch enables the sincos pass to CSE type casts used as arguments
to eligible calls before looking for other calls using the same
operand.


for  gcc/ChangeLog

	* tree-ssa-math-opts.c (sincos_stats): Rename inserted to
	sincos_inserted.  Add conv_inserted.
	(maybe_record_sincos): Rename to...
	(maybe_record_stmt): ... this.
	(execute_cse_conv_1): New.
	(execute_cse_sincos_1): Call it.  Adjust.
	(pass_cse_sincos::execute): Adjust.  Report conv_inserted.

for  gcc/testsuite/ChangeLog

	* gnat.dg/sin_cos.ads: New.
	* gnat.dg/sin_cos.adb: New.
	* gcc.dg/sin_cos.c: New.
---
 gcc/testsuite/gcc.dg/sin_cos.c    |   41 +++++++++++++
 gcc/testsuite/gnat.dg/sin_cos.adb |   14 ++++
 gcc/testsuite/gnat.dg/sin_cos.ads |    4 +
 gcc/tree-ssa-math-opts.c          |  119 +++++++++++++++++++++++++++++++++++--
 4 files changed, 171 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/sin_cos.c
 create mode 100644 gcc/testsuite/gnat.dg/sin_cos.adb
 create mode 100644 gcc/testsuite/gnat.dg/sin_cos.ads

diff --git a/gcc/testsuite/gcc.dg/sin_cos.c b/gcc/testsuite/gcc.dg/sin_cos.c
new file mode 100644
index 00000000..aa71dca
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/sin_cos.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* This maps to essentially the same gimple that is generated for
+   gnat.dg/sin_cos.adb, on platforms that use the wraplf variant of
+   Ada.Numerics.Aux_Float.  The value of EPSILON is not relevant to
+   the test, but the test must be there to keep the conversions in
+   different BBs long enough to trigger the problem that prevented the
+   sincos optimization, because the arguments passed to sin and cos
+   didn't get unified into a single SSA_NAME in time for sincos.  */
+
+#include <math.h>
+
+#define EPSILON 3.4526697709225118160247802734375e-4
+
+static float my_sinf(float x) {
+  return (float) sin ((double) x);
+}
+
+static float wrap_sinf(float x) {
+  if (fabs (x) < EPSILON)
+    return 0;
+  return my_sinf (x);
+}
+
+static float my_cosf(float x) {
+  return (float) cos ((double) x);
+}
+
+static float wrap_cosf(float x) {
+  if (fabs (x) < EPSILON)
+    return 1;
+  return my_cosf (x);
+}
+
+float my_sin_cos(float x, float *s, float *c) {
+  *s = wrap_sinf (x);
+  *c = wrap_cosf (x);
+}
+
+/* { dg-final { scan-assembler "sincos\|cexp" { target *-linux-gnu* *-w64-mingw* *-*-vxworks* } } } */
diff --git a/gcc/testsuite/gnat.dg/sin_cos.adb b/gcc/testsuite/gnat.dg/sin_cos.adb
new file mode 100644
index 00000000..6e18df9
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/sin_cos.adb
@@ -0,0 +1,14 @@
+--  { dg-do compile }
+--  { dg-options "-O2 -gnatn" }
+
+with Ada.Numerics.Elementary_Functions;
+use Ada.Numerics.Elementary_Functions;
+package body Sin_Cos is
+   procedure Sin_Cos (Angle : T; SinA, CosA : out T) is
+   begin
+      SinA := Sin (Angle);
+      CosA := Cos (Angle);
+   end;
+end Sin_Cos;
+
+--  { dg-final { scan-assembler "sincos\|cexp" { target *-linux-gnu* *-w64-mingw* *-*-vxworks* } } }
diff --git a/gcc/testsuite/gnat.dg/sin_cos.ads b/gcc/testsuite/gnat.dg/sin_cos.ads
new file mode 100644
index 00000000..a0eff3d
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/sin_cos.ads
@@ -0,0 +1,4 @@
+package Sin_Cos is
+   subtype T is Float;
+   procedure Sin_Cos (Angle : T; SinA, CosA : out T);
+end Sin_Cos;
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 90dfb98..a32f5ca 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -186,7 +186,11 @@ static struct
 static struct
 {
   /* Number of cexpi calls inserted.  */
-  int inserted;
+  int sincos_inserted;
+
+  /* Number of conversions inserted.  */
+  int conv_inserted;
+
 } sincos_stats;
 
 static struct
@@ -1106,7 +1110,7 @@ make_pass_cse_reciprocals (gcc::context *ctxt)
    statements in the vector.  */
 
 static bool
-maybe_record_sincos (vec<gimple *> *stmts,
+maybe_record_stmt (vec<gimple *> *stmts,
 		     basic_block *top_bb, gimple *use_stmt)
 {
   basic_block use_bb = gimple_bb (use_stmt);
@@ -1126,6 +1130,103 @@ maybe_record_sincos (vec<gimple *> *stmts,
   return true;
 }
 
+/* If NAME is the result of a type conversion, look for other
+   conversions from the same source to the same type and CSE them.
+   Replace results of conversions in sin, cos and cexpi calls with the
+   CSEd SSA_NAME.  Return a SSA_NAME that holds the result of the
+   conversion to be used instead of NAME.  */
+
+static tree
+execute_cse_conv_1 (tree name)
+{
+  if (SSA_NAME_IS_DEFAULT_DEF (name))
+    return name;
+
+  gimple *def_stmt = SSA_NAME_DEF_STMT (name);
+
+  if (!gimple_assign_cast_p (def_stmt))
+    return name;
+
+  tree src = gimple_assign_rhs1 (def_stmt);
+  int count = 0;
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+  auto_vec<gimple *> stmts;
+  basic_block top_bb = NULL;
+
+  /* Record equivalent conversions from the same source.  */
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, src)
+    {
+      if (!gimple_assign_cast_p (use_stmt))
+	continue;
+
+      if (!types_compatible_p (TREE_TYPE (name),
+			       TREE_TYPE (gimple_assign_lhs (use_stmt))))
+	continue;
+
+      gcc_checking_assert (gimple_assign_rhs1 (use_stmt)
+			   == gimple_assign_rhs1 (def_stmt));
+
+      if (maybe_record_stmt (&stmts, &top_bb, use_stmt))
+	count++;
+    }
+
+  if (count <= 1)
+    return name;
+
+  /* Build and insert a new conversion.  */
+  name = make_temp_ssa_name (TREE_TYPE (name), def_stmt, "convtmp");
+  gimple *stmt = gimple_build_assign (name,
+				      gimple_assign_rhs_code (def_stmt),
+				      gimple_assign_rhs1 (def_stmt));
+
+  gimple_stmt_iterator gsi;
+  if (!SSA_NAME_IS_DEFAULT_DEF (name)
+      && gimple_code (def_stmt) != GIMPLE_PHI
+      && gimple_bb (def_stmt) == top_bb)
+    gsi = gsi_for_stmt (def_stmt);
+  else
+    gsi = gsi_after_labels (top_bb);
+  gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+  update_stmt (stmt);
+  sincos_stats.conv_inserted++;
+
+  /* And adjust the recorded old conversion sites.  */
+  for (int i = 0; stmts.iterate (i, &use_stmt); ++i)
+    {
+      tree lhs = gimple_assign_lhs (use_stmt);
+      gimple_assign_set_rhs1 (use_stmt, name);
+      gimple_assign_set_rhs_code (use_stmt, SSA_NAME);
+      update_stmt (use_stmt);
+
+      /* Replace uses of LHS with NAME in sin, cos and cexpi calls
+	 right away, so that we can recognize them as taking the same
+	 arguments.  */
+      FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, lhs)
+	{
+	  if (gimple_code (use_stmt) != GIMPLE_CALL
+	      || !gimple_call_lhs (use_stmt))
+	    continue;
+
+	  switch (gimple_call_combined_fn (use_stmt))
+	    {
+	    CASE_CFN_COS:
+	    CASE_CFN_SIN:
+	    CASE_CFN_CEXPI:
+	      gcc_checking_assert (gimple_call_arg (use_stmt, 0) == lhs);
+	      gimple_call_set_arg (use_stmt, 0, name);
+	      update_stmt (use_stmt);
+	      break;
+
+	    default:
+	      continue;
+	    }
+	}
+    }
+
+  return name;
+}
+
 /* Look for sin, cos and cexpi calls with the same argument NAME and
    create a single call to cexpi CSEing the result in this case.
    We first walk over all immediate uses of the argument collecting
@@ -1147,6 +1248,8 @@ execute_cse_sincos_1 (tree name)
   int i;
   bool cfg_changed = false;
 
+  name = execute_cse_conv_1 (name);
+
   FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, name)
     {
       if (gimple_code (use_stmt) != GIMPLE_CALL
@@ -1156,15 +1259,15 @@ execute_cse_sincos_1 (tree name)
       switch (gimple_call_combined_fn (use_stmt))
 	{
 	CASE_CFN_COS:
-	  seen_cos |= maybe_record_sincos (&stmts, &top_bb, use_stmt) ? 1 : 0;
+	  seen_cos |= maybe_record_stmt (&stmts, &top_bb, use_stmt) ? 1 : 0;
 	  break;
 
 	CASE_CFN_SIN:
-	  seen_sin |= maybe_record_sincos (&stmts, &top_bb, use_stmt) ? 1 : 0;
+	  seen_sin |= maybe_record_stmt (&stmts, &top_bb, use_stmt) ? 1 : 0;
 	  break;
 
 	CASE_CFN_CEXPI:
-	  seen_cexpi |= maybe_record_sincos (&stmts, &top_bb, use_stmt) ? 1 : 0;
+	  seen_cexpi |= maybe_record_stmt (&stmts, &top_bb, use_stmt) ? 1 : 0;
 	  break;
 
 	default:;
@@ -1208,7 +1311,7 @@ execute_cse_sincos_1 (tree name)
       gsi = gsi_after_labels (top_bb);
       gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
     }
-  sincos_stats.inserted++;
+  sincos_stats.sincos_inserted++;
 
   /* And adjust the recorded old call sites.  */
   for (i = 0; stmts.iterate (i, &use_stmt); ++i)
@@ -2295,7 +2398,9 @@ pass_cse_sincos::execute (function *fun)
     }
 
   statistics_counter_event (fun, "sincos statements inserted",
-			    sincos_stats.inserted);
+			    sincos_stats.sincos_inserted);
+  statistics_counter_event (fun, "conv statements inserted",
+			    sincos_stats.conv_inserted);
 
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }
Richard Biener Oct. 27, 2020, 8:54 a.m. UTC | #3
On Tue, Oct 27, 2020 at 6:32 AM Alexandre Oliva <oliva@adacore.com> wrote:
>
> On Oct 23, 2020, Richard Biener <richard.guenther@gmail.com> wrote:
>
> > Can you move it one pass further after sink please?
>
> I did, but it didn't solve the recip regressions that my first attempt
> brought about.
>
> > Also I don't
> > remember exactly but does pass_sincos only handle sin/cos unifying?
>
> It rearranges some powi computations, and that's what breaks recip.  It
> adds a copy of y*y in extract_recip_[34], we'd need a forwprop or
> similar to get rid of the trivial copy before recip could do its job
> properly again.
>
> So I figured I'd try to cse type conversions before sincos, and that
> turned out to be pretty easy, and it didn't regress anything.
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?
>
>
> CSE conversions within sincos
>
> From: Alexandre Oliva <oliva@adacore.com>
>
> On platforms in which Aux_[Real_Type] involves non-NOP conversions
> (e.g., between single- and double-precision, or between short float
> and float), the conversions before the calls are CSEd too late for
> sincos to combine calls.
>
> This patch enables the sincos pass to CSE type casts used as arguments
> to eligible calls before looking for other calls using the same
> operand.
>
>
> for  gcc/ChangeLog
>
>         * tree-ssa-math-opts.c (sincos_stats): Rename inserted to
>         sincos_inserted.  Add conv_inserted.
>         (maybe_record_sincos): Rename to...
>         (maybe_record_stmt): ... this.
>         (execute_cse_conv_1): New.
>         (execute_cse_sincos_1): Call it.  Adjust.
>         (pass_cse_sincos::execute): Adjust.  Report conv_inserted.
>
> for  gcc/testsuite/ChangeLog
>
>         * gnat.dg/sin_cos.ads: New.
>         * gnat.dg/sin_cos.adb: New.
>         * gcc.dg/sin_cos.c: New.
> ---
>  gcc/testsuite/gcc.dg/sin_cos.c    |   41 +++++++++++++
>  gcc/testsuite/gnat.dg/sin_cos.adb |   14 ++++
>  gcc/testsuite/gnat.dg/sin_cos.ads |    4 +
>  gcc/tree-ssa-math-opts.c          |  119 +++++++++++++++++++++++++++++++++++--
>  4 files changed, 171 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/sin_cos.c
>  create mode 100644 gcc/testsuite/gnat.dg/sin_cos.adb
>  create mode 100644 gcc/testsuite/gnat.dg/sin_cos.ads
>
> diff --git a/gcc/testsuite/gcc.dg/sin_cos.c b/gcc/testsuite/gcc.dg/sin_cos.c
> new file mode 100644
> index 00000000..aa71dca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/sin_cos.c
> @@ -0,0 +1,41 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* This maps to essentially the same gimple that is generated for
> +   gnat.dg/sin_cos.adb, on platforms that use the wraplf variant of
> +   Ada.Numerics.Aux_Float.  The value of EPSILON is not relevant to
> +   the test, but the test must be there to keep the conversions in
> +   different BBs long enough to trigger the problem that prevented the
> +   sincos optimization, because the arguments passed to sin and cos
> +   didn't get unified into a single SSA_NAME in time for sincos.  */
> +
> +#include <math.h>
> +
> +#define EPSILON 3.4526697709225118160247802734375e-4
> +
> +static float my_sinf(float x) {
> +  return (float) sin ((double) x);
> +}
> +
> +static float wrap_sinf(float x) {
> +  if (fabs (x) < EPSILON)
> +    return 0;
> +  return my_sinf (x);
> +}
> +
> +static float my_cosf(float x) {
> +  return (float) cos ((double) x);
> +}
> +
> +static float wrap_cosf(float x) {
> +  if (fabs (x) < EPSILON)
> +    return 1;
> +  return my_cosf (x);
> +}
> +
> +float my_sin_cos(float x, float *s, float *c) {
> +  *s = wrap_sinf (x);
> +  *c = wrap_cosf (x);
> +}
> +
> +/* { dg-final { scan-assembler "sincos\|cexp" { target *-linux-gnu* *-w64-mingw* *-*-vxworks* } } } */
> diff --git a/gcc/testsuite/gnat.dg/sin_cos.adb b/gcc/testsuite/gnat.dg/sin_cos.adb
> new file mode 100644
> index 00000000..6e18df9
> --- /dev/null
> +++ b/gcc/testsuite/gnat.dg/sin_cos.adb
> @@ -0,0 +1,14 @@
> +--  { dg-do compile }
> +--  { dg-options "-O2 -gnatn" }
> +
> +with Ada.Numerics.Elementary_Functions;
> +use Ada.Numerics.Elementary_Functions;
> +package body Sin_Cos is
> +   procedure Sin_Cos (Angle : T; SinA, CosA : out T) is
> +   begin
> +      SinA := Sin (Angle);
> +      CosA := Cos (Angle);
> +   end;
> +end Sin_Cos;
> +
> +--  { dg-final { scan-assembler "sincos\|cexp" { target *-linux-gnu* *-w64-mingw* *-*-vxworks* } } }
> diff --git a/gcc/testsuite/gnat.dg/sin_cos.ads b/gcc/testsuite/gnat.dg/sin_cos.ads
> new file mode 100644
> index 00000000..a0eff3d
> --- /dev/null
> +++ b/gcc/testsuite/gnat.dg/sin_cos.ads
> @@ -0,0 +1,4 @@
> +package Sin_Cos is
> +   subtype T is Float;
> +   procedure Sin_Cos (Angle : T; SinA, CosA : out T);
> +end Sin_Cos;
> diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> index 90dfb98..a32f5ca 100644
> --- a/gcc/tree-ssa-math-opts.c
> +++ b/gcc/tree-ssa-math-opts.c
> @@ -186,7 +186,11 @@ static struct
>  static struct
>  {
>    /* Number of cexpi calls inserted.  */
> -  int inserted;
> +  int sincos_inserted;
> +
> +  /* Number of conversions inserted.  */
> +  int conv_inserted;
> +
>  } sincos_stats;
>
>  static struct
> @@ -1106,7 +1110,7 @@ make_pass_cse_reciprocals (gcc::context *ctxt)
>     statements in the vector.  */
>
>  static bool
> -maybe_record_sincos (vec<gimple *> *stmts,
> +maybe_record_stmt (vec<gimple *> *stmts,
>                      basic_block *top_bb, gimple *use_stmt)
>  {
>    basic_block use_bb = gimple_bb (use_stmt);
> @@ -1126,6 +1130,103 @@ maybe_record_sincos (vec<gimple *> *stmts,
>    return true;
>  }
>
> +/* If NAME is the result of a type conversion, look for other
> +   conversions from the same source to the same type and CSE them.
> +   Replace results of conversions in sin, cos and cexpi calls with the
> +   CSEd SSA_NAME.  Return a SSA_NAME that holds the result of the
> +   conversion to be used instead of NAME.  */
> +
> +static tree
> +execute_cse_conv_1 (tree name)
> +{
> +  if (SSA_NAME_IS_DEFAULT_DEF (name))
> +    return name;
> +
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (name);
> +
> +  if (!gimple_assign_cast_p (def_stmt))
> +    return name;
> +
> +  tree src = gimple_assign_rhs1 (def_stmt);

For trapping math SRC may be a constant?  Better be safe
and guard against TREE_CODE (src) != SSA_NAME.

You also want to guard against SSA_NAME_OCCURS_IN_ABNORMAL_PHI (src)
since you cannot generally propagate or move uses of those.  That
applies to 'name' as well, and ...

> +  int count = 0;
> +  imm_use_iterator use_iter;
> +  gimple *use_stmt;
> +  auto_vec<gimple *> stmts;
> +  basic_block top_bb = NULL;
> +
> +  /* Record equivalent conversions from the same source.  */
> +  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, src)
> +    {
> +      if (!gimple_assign_cast_p (use_stmt))
> +       continue;

... the lhs of the other conversions.

> +      if (!types_compatible_p (TREE_TYPE (name),
> +                              TREE_TYPE (gimple_assign_lhs (use_stmt))))
> +       continue;
> +
> +      gcc_checking_assert (gimple_assign_rhs1 (use_stmt)
> +                          == gimple_assign_rhs1 (def_stmt));
> +
> +      if (maybe_record_stmt (&stmts, &top_bb, use_stmt))
> +       count++;
> +    }
> +
> +  if (count <= 1)
> +    return name;
> +
> +  /* Build and insert a new conversion.  */
> +  name = make_temp_ssa_name (TREE_TYPE (name), def_stmt, "convtmp");
> +  gimple *stmt = gimple_build_assign (name,
> +                                     gimple_assign_rhs_code (def_stmt),
> +                                     gimple_assign_rhs1 (def_stmt));
> +
> +  gimple_stmt_iterator gsi;
> +  if (!SSA_NAME_IS_DEFAULT_DEF (name)
> +      && gimple_code (def_stmt) != GIMPLE_PHI
> +      && gimple_bb (def_stmt) == top_bb)
> +    gsi = gsi_for_stmt (def_stmt);

So I think this will go wrong if def_stmt is in a BB like

    def1 = (T) src;
    def2 = (T) src;

and def_stmt is def2.  I think you need to look at the
definition of SRC instead, so

   if (!SSA_NAME_IS_DEFAULT_DEF (src)
       && gimple_code (SSA_NAME_DEF_STMT (src)) != GIMPLE_PHI
       && gimple_bb (SSA_NAME_DEF_STMT (src)) == top_bb)
   {
     gsi = gsi_for_stmt (SSA_NAME_DEF_STMT (src);
     gsi_insert_after (...);

> +  else
> +    gsi = gsi_after_labels (top_bb);
> +  gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
> +  update_stmt (stmt);
> +  sincos_stats.conv_inserted++;
> +
> +  /* And adjust the recorded old conversion sites.  */
> +  for (int i = 0; stmts.iterate (i, &use_stmt); ++i)
> +    {
> +      tree lhs = gimple_assign_lhs (use_stmt);
> +      gimple_assign_set_rhs1 (use_stmt, name);
> +      gimple_assign_set_rhs_code (use_stmt, SSA_NAME);
> +      update_stmt (use_stmt);
> +
> +      /* Replace uses of LHS with NAME in sin, cos and cexpi calls
> +        right away, so that we can recognize them as taking the same
> +        arguments.  */
> +      FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, lhs)
> +       {
> +         if (gimple_code (use_stmt) != GIMPLE_CALL
> +             || !gimple_call_lhs (use_stmt))
> +           continue;

Any reason you are not replacing all uses via replace_uses_by
and removing the old conversion stmts?  OK, removing might
wreck the upthread iterator so replacing with GIMPLE_NOP
is the usual trick then.

> +         switch (gimple_call_combined_fn (use_stmt))
> +           {
> +           CASE_CFN_COS:
> +           CASE_CFN_SIN:
> +           CASE_CFN_CEXPI:
> +             gcc_checking_assert (gimple_call_arg (use_stmt, 0) == lhs);
> +             gimple_call_set_arg (use_stmt, 0, name);
> +             update_stmt (use_stmt);
> +             break;
> +
> +           default:
> +             continue;
> +           }
> +       }
> +    }
> +
> +  return name;
> +}
> +
>  /* Look for sin, cos and cexpi calls with the same argument NAME and
>     create a single call to cexpi CSEing the result in this case.
>     We first walk over all immediate uses of the argument collecting
> @@ -1147,6 +1248,8 @@ execute_cse_sincos_1 (tree name)
>    int i;
>    bool cfg_changed = false;
>
> +  name = execute_cse_conv_1 (name);
> +
>    FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, name)
>      {
>        if (gimple_code (use_stmt) != GIMPLE_CALL
> @@ -1156,15 +1259,15 @@ execute_cse_sincos_1 (tree name)
>        switch (gimple_call_combined_fn (use_stmt))
>         {
>         CASE_CFN_COS:
> -         seen_cos |= maybe_record_sincos (&stmts, &top_bb, use_stmt) ? 1 : 0;
> +         seen_cos |= maybe_record_stmt (&stmts, &top_bb, use_stmt) ? 1 : 0;
>           break;
>
>         CASE_CFN_SIN:
> -         seen_sin |= maybe_record_sincos (&stmts, &top_bb, use_stmt) ? 1 : 0;
> +         seen_sin |= maybe_record_stmt (&stmts, &top_bb, use_stmt) ? 1 : 0;
>           break;
>
>         CASE_CFN_CEXPI:
> -         seen_cexpi |= maybe_record_sincos (&stmts, &top_bb, use_stmt) ? 1 : 0;
> +         seen_cexpi |= maybe_record_stmt (&stmts, &top_bb, use_stmt) ? 1 : 0;
>           break;
>
>         default:;
> @@ -1208,7 +1311,7 @@ execute_cse_sincos_1 (tree name)
>        gsi = gsi_after_labels (top_bb);
>        gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
>      }
> -  sincos_stats.inserted++;
> +  sincos_stats.sincos_inserted++;
>
>    /* And adjust the recorded old call sites.  */
>    for (i = 0; stmts.iterate (i, &use_stmt); ++i)
> @@ -2295,7 +2398,9 @@ pass_cse_sincos::execute (function *fun)
>      }
>
>    statistics_counter_event (fun, "sincos statements inserted",
> -                           sincos_stats.inserted);
> +                           sincos_stats.sincos_inserted);
> +  statistics_counter_event (fun, "conv statements inserted",
> +                           sincos_stats.conv_inserted);
>
>    return cfg_changed ? TODO_cleanup_cfg : 0;
>  }
>
>
> --
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer
Alexandre Oliva Oct. 28, 2020, 3:17 a.m. UTC | #4
On Oct 27, 2020, Richard Biener <richard.guenther@gmail.com> wrote:

> For trapping math SRC may be a constant?  Better be safe
> and guard against TREE_CODE (src) != SSA_NAME.

*nod*

> You also want to guard against SSA_NAME_OCCURS_IN_ABNORMAL_PHI (src)
> since you cannot generally propagate or move uses of those.

What if I don't propagate or move them?

In my first cut at this change, I figured it (looked like it) took half
a brain to implement it, and shut down the other half, only making minor
tweaks to a copy of execute_cse_sincos_1.

Your response was a wake-up call to many issues I hadn't considered but
should have, and that it looks like sincos doesn't either, so I ended up
rewriting the cse_conv function to use and propagate a preexisting
dominating def, instead of inserting one at a point where there wasn't
one.  That's because any such conv might trap, unless an earlier one
didn't.

> Any reason you are not replacing all uses via replace_uses_by
> and removing the old conversion stmts?  OK, removing might
> wreck the upthread iterator so replacing with GIMPLE_NOP
> is the usual trick then.

Removing them would be fine, I was just thoughtlessly mirroring the
cse_sincos_1 behavior that I'd based it on.


Now, I put in code to leave conversion results alone if
SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs), but I wonder if it would work to
replace uses thereof, or to propagate uses thereof, as long as I turned
such conversion results that would be deleted into copies.  I.e., given:

  _2 = (double) _1;
  _4 = sin (_2);
  ...
  _3 = (double) _1; // occurs in abnormal phi below
  _5 = cos (_3);
  ...
  _6 = PHI <..., _3 (abnormal), ...>;
  
Wouldn't this be ok to turn into:

  _2 = (double) _1;
  _4 = sin (_2);
  ...
  _3 = _2;
  _5 = cos (_2);
  ...
  _6 = PHI <..., _3 (abnormal), ...>;

?


Anyway, here's a patch that does *not* assume it would work, and skips
defs used in abnormal phis instead.


sincos, however, may mess with them, and even introduce/move a trapping
call to a place where there wasn't any, even if none of the preexisting
calls were going to be executed.  The only thing I fixed there was a
plain return within a FOR_EACH_IMM_USE_STMT that I introduced recently.
Though it's unlikely to ever hit, I understand it's wrong per the
current API.

BTW, any reason why we are not (yet?) using something like:

#define FOR_EACH_IMM_USE_STMT(STMT, ITER, SSAVAR)		\
  for (auto_end_imm_use_stmt_traverse auto_end                  \
       ((((STMT) = first_imm_use_stmt (&(ITER), (SSAVAR))),     \
         &(ITER)));		                                \
       !end_imm_use_stmt_p (&(ITER));				\
       (void) ((STMT) = next_imm_use_stmt (&(ITER))))


Anyway, here's what I'm testing now.  Bootstrap succeeded, regression
testing underway.  Ok to install if it succeeds?


CSE conversions within sincos

From: Alexandre Oliva <oliva@adacore.com>

On platforms in which Aux_[Real_Type] involves non-NOP conversions
(e.g., between single- and double-precision, or between short float
and float), the conversions before the calls are CSEd too late for
sincos to combine calls.

This patch enables the sincos pass to CSE type casts used as arguments
to eligible calls before looking for other calls using the same
operand.


for  gcc/ChangeLog

	* tree-ssa-math-opts.c (sincos_stats): Add conv_removed.
	(execute_cse_conv_1): New.
	(execute_cse_sincos_1): Call it.  Fix return within
	FOR_EACH_IMM_USE_STMT.
	(pass_cse_sincos::execute): Report conv_inserted.

for  gcc/testsuite/ChangeLog

	* gnat.dg/sin_cos.ads: New.
	* gnat.dg/sin_cos.adb: New.
	* gcc.dg/sin_cos.c: New.
---
 gcc/testsuite/gcc.dg/sin_cos.c    |   41 ++++++++++++++
 gcc/testsuite/gnat.dg/sin_cos.adb |   14 +++++
 gcc/testsuite/gnat.dg/sin_cos.ads |    4 +
 gcc/tree-ssa-math-opts.c          |  107 +++++++++++++++++++++++++++++++++++++
 4 files changed, 165 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/sin_cos.c
 create mode 100644 gcc/testsuite/gnat.dg/sin_cos.adb
 create mode 100644 gcc/testsuite/gnat.dg/sin_cos.ads

diff --git a/gcc/testsuite/gcc.dg/sin_cos.c b/gcc/testsuite/gcc.dg/sin_cos.c
new file mode 100644
index 00000000..aa71dca
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/sin_cos.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* This maps to essentially the same gimple that is generated for
+   gnat.dg/sin_cos.adb, on platforms that use the wraplf variant of
+   Ada.Numerics.Aux_Float.  The value of EPSILON is not relevant to
+   the test, but the test must be there to keep the conversions in
+   different BBs long enough to trigger the problem that prevented the
+   sincos optimization, because the arguments passed to sin and cos
+   didn't get unified into a single SSA_NAME in time for sincos.  */
+
+#include <math.h>
+
+#define EPSILON 3.4526697709225118160247802734375e-4
+
+static float my_sinf(float x) {
+  return (float) sin ((double) x);
+}
+
+static float wrap_sinf(float x) {
+  if (fabs (x) < EPSILON)
+    return 0;
+  return my_sinf (x);
+}
+
+static float my_cosf(float x) {
+  return (float) cos ((double) x);
+}
+
+static float wrap_cosf(float x) {
+  if (fabs (x) < EPSILON)
+    return 1;
+  return my_cosf (x);
+}
+
+float my_sin_cos(float x, float *s, float *c) {
+  *s = wrap_sinf (x);
+  *c = wrap_cosf (x);
+}
+
+/* { dg-final { scan-assembler "sincos\|cexp" { target *-linux-gnu* *-w64-mingw* *-*-vxworks* } } } */
diff --git a/gcc/testsuite/gnat.dg/sin_cos.adb b/gcc/testsuite/gnat.dg/sin_cos.adb
new file mode 100644
index 00000000..6e18df9
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/sin_cos.adb
@@ -0,0 +1,14 @@
+--  { dg-do compile }
+--  { dg-options "-O2 -gnatn" }
+
+with Ada.Numerics.Elementary_Functions;
+use Ada.Numerics.Elementary_Functions;
+package body Sin_Cos is
+   procedure Sin_Cos (Angle : T; SinA, CosA : out T) is
+   begin
+      SinA := Sin (Angle);
+      CosA := Cos (Angle);
+   end;
+end Sin_Cos;
+
+--  { dg-final { scan-assembler "sincos\|cexp" { target *-linux-gnu* *-w64-mingw* *-*-vxworks* } } }
diff --git a/gcc/testsuite/gnat.dg/sin_cos.ads b/gcc/testsuite/gnat.dg/sin_cos.ads
new file mode 100644
index 00000000..a0eff3d
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/sin_cos.ads
@@ -0,0 +1,4 @@
+package Sin_Cos is
+   subtype T is Float;
+   procedure Sin_Cos (Angle : T; SinA, CosA : out T);
+end Sin_Cos;
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 90dfb98..65c7b34 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -187,6 +187,10 @@ static struct
 {
   /* Number of cexpi calls inserted.  */
   int inserted;
+
+  /* Number of conversions removed.  */
+  int conv_removed;
+
 } sincos_stats;
 
 static struct
@@ -1099,6 +1103,103 @@ make_pass_cse_reciprocals (gcc::context *ctxt)
   return new pass_cse_reciprocals (ctxt);
 }
 
+/* If NAME is the result of a type conversion, look for other
+   equivalent dominating or dominated conversions, and replace all
+   uses with the earliest dominating name, removing the redundant
+   conversions.  Return the prevailing name.  */
+
+static tree
+execute_cse_conv_1 (tree name)
+{
+  if (SSA_NAME_IS_DEFAULT_DEF (name)
+      || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name))
+    return name;
+
+  gimple *def_stmt = SSA_NAME_DEF_STMT (name);
+
+  if (!gimple_assign_cast_p (def_stmt))
+    return name;
+
+  tree src = gimple_assign_rhs1 (def_stmt);
+
+  if (TREE_CODE (src) != SSA_NAME)
+    return name;
+
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+
+  /* Find the earliest dominating def.    */
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, src)
+    {
+      if (use_stmt == def_stmt
+	  || !gimple_assign_cast_p (use_stmt))
+	continue;
+
+      tree lhs = gimple_assign_lhs (use_stmt);
+
+      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs)
+	  || (gimple_assign_rhs1 (use_stmt)
+	      != gimple_assign_rhs1 (def_stmt))
+	  || !types_compatible_p (TREE_TYPE (name), TREE_TYPE (lhs)))
+	continue;
+
+      bool use_dominates;
+      if (gimple_bb (def_stmt) == gimple_bb (use_stmt))
+	{
+	  gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+	  while (!gsi_end_p (gsi) && gsi_stmt (gsi) != def_stmt)
+	    gsi_next (&gsi);
+	  use_dominates = !gsi_end_p (gsi);
+	}
+      else if (dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt),
+			       gimple_bb (def_stmt)))
+	use_dominates = false;
+      else if (dominated_by_p (CDI_DOMINATORS, gimple_bb (def_stmt),
+			       gimple_bb (use_stmt)))
+	use_dominates = true;
+      else
+	continue;
+
+      if (use_dominates)
+	{
+	  std::swap (name, lhs);
+	  std::swap (def_stmt, use_stmt);
+	}
+    }
+
+  /* Now go through all uses of SRC again, replacing the equivalent
+     dominated conversions.  We may replace defs that were not
+     dominated by the then-prevailing defs when we first visited
+     them.  */
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, src)
+    {
+      if (use_stmt == def_stmt
+	  || !gimple_assign_cast_p (use_stmt))
+	continue;
+
+      tree lhs = gimple_assign_lhs (use_stmt);
+
+      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs)
+	  || (gimple_assign_rhs1 (use_stmt)
+	      != gimple_assign_rhs1 (def_stmt))
+	  || !types_compatible_p (TREE_TYPE (name), TREE_TYPE (lhs)))
+	continue;
+
+      if (gimple_bb (def_stmt) == gimple_bb (use_stmt)
+	  || dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt),
+			     gimple_bb (def_stmt)))
+	{
+	  sincos_stats.conv_removed++;
+
+	  gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+	  replace_uses_by (lhs, name);
+	  gsi_remove (&gsi, true);
+	}
+    }
+
+  return name;
+}
+
 /* Records an occurrence at statement USE_STMT in the vector of trees
    STMTS if it is dominated by *TOP_BB or dominates it or this basic block
    is not yet initialized.  Returns true if the occurrence was pushed on
@@ -1147,6 +1248,8 @@ execute_cse_sincos_1 (tree name)
   int i;
   bool cfg_changed = false;
 
+  name = execute_cse_conv_1 (name);
+
   FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, name)
     {
       if (gimple_code (use_stmt) != GIMPLE_CALL
@@ -1181,7 +1284,7 @@ execute_cse_sincos_1 (tree name)
 	 and, in subsequent rounds, that the built_in type is the same
 	 type, or a compatible type.  */
       if (type != t && !types_compatible_p (type, t))
-	return false;
+	RETURN_FROM_IMM_USE_STMT (use_iter, false);
     }
   if (seen_cos + seen_sin + seen_cexpi <= 1)
     return false;
@@ -2296,6 +2399,8 @@ pass_cse_sincos::execute (function *fun)
 
   statistics_counter_event (fun, "sincos statements inserted",
 			    sincos_stats.inserted);
+  statistics_counter_event (fun, "conv statements removed",
+			    sincos_stats.conv_removed);
 
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }
Richard Biener Oct. 28, 2020, 1:01 p.m. UTC | #5
On Wed, Oct 28, 2020 at 4:18 AM Alexandre Oliva <oliva@adacore.com> wrote:
>
> On Oct 27, 2020, Richard Biener <richard.guenther@gmail.com> wrote:
>
> > For trapping math SRC may be a constant?  Better be safe
> > and guard against TREE_CODE (src) != SSA_NAME.
>
> *nod*
>
> > You also want to guard against SSA_NAME_OCCURS_IN_ABNORMAL_PHI (src)
> > since you cannot generally propagate or move uses of those.
>
> What if I don't propagate or move them?

That's fine then.

> In my first cut at this change, I figured it (looked like it) took half
> a brain to implement it, and shut down the other half, only making minor
> tweaks to a copy of execute_cse_sincos_1.
>
> Your response was a wake-up call to many issues I hadn't considered but
> should have, and that it looks like sincos doesn't either, so I ended up
> rewriting the cse_conv function to use and propagate a preexisting
> dominating def, instead of inserting one at a point where there wasn't
> one.  That's because any such conv might trap, unless an earlier one
> didn't.
>
> > Any reason you are not replacing all uses via replace_uses_by
> > and removing the old conversion stmts?  OK, removing might
> > wreck the upthread iterator so replacing with GIMPLE_NOP
> > is the usual trick then.
>
> Removing them would be fine, I was just thoughtlessly mirroring the
> cse_sincos_1 behavior that I'd based it on.
>
>
> Now, I put in code to leave conversion results alone if
> SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs), but I wonder if it would work to
> replace uses thereof, or to propagate uses thereof, as long as I turned
> such conversion results that would be deleted into copies.  I.e., given:
>
>   _2 = (double) _1;
>   _4 = sin (_2);
>   ...
>   _3 = (double) _1; // occurs in abnormal phi below
>   _5 = cos (_3);
>   ...
>   _6 = PHI <..., _3 (abnormal), ...>;
>
> Wouldn't this be ok to turn into:
>
>   _2 = (double) _1;
>   _4 = sin (_2);
>   ...
>   _3 = _2;
>   _5 = cos (_2);
>   ...
>   _6 = PHI <..., _3 (abnormal), ...>;
>
> ?

Yes.

>
> Anyway, here's a patch that does *not* assume it would work, and skips
> defs used in abnormal phis instead.
>
>
> sincos, however, may mess with them, and even introduce/move a trapping
> call to a place where there wasn't any, even if none of the preexisting
> calls were going to be executed.  The only thing I fixed there was a
> plain return within a FOR_EACH_IMM_USE_STMT that I introduced recently.
> Though it's unlikely to ever hit, I understand it's wrong per the
> current API.
>
> BTW, any reason why we are not (yet?) using something like:
>
> #define FOR_EACH_IMM_USE_STMT(STMT, ITER, SSAVAR)               \
>   for (auto_end_imm_use_stmt_traverse auto_end                  \
>        ((((STMT) = first_imm_use_stmt (&(ITER), (SSAVAR))),     \
>          &(ITER)));                                             \
>        !end_imm_use_stmt_p (&(ITER));                           \
>        (void) ((STMT) = next_imm_use_stmt (&(ITER))))

Just laziness.  Or rather last time I remembered this I tried to do it
more fancy via range-for but didn't get very far due to the nested
iteration ...

>
> Anyway, here's what I'm testing now.  Bootstrap succeeded, regression
> testing underway.  Ok to install if it succeeds?

OK.

Thanks,
Richard.

>
> CSE conversions within sincos
>
> From: Alexandre Oliva <oliva@adacore.com>
>
> On platforms in which Aux_[Real_Type] involves non-NOP conversions
> (e.g., between single- and double-precision, or between short float
> and float), the conversions before the calls are CSEd too late for
> sincos to combine calls.
>
> This patch enables the sincos pass to CSE type casts used as arguments
> to eligible calls before looking for other calls using the same
> operand.
>
>
> for  gcc/ChangeLog
>
>         * tree-ssa-math-opts.c (sincos_stats): Add conv_removed.
>         (execute_cse_conv_1): New.
>         (execute_cse_sincos_1): Call it.  Fix return within
>         FOR_EACH_IMM_USE_STMT.
>         (pass_cse_sincos::execute): Report conv_inserted.
>
> for  gcc/testsuite/ChangeLog
>
>         * gnat.dg/sin_cos.ads: New.
>         * gnat.dg/sin_cos.adb: New.
>         * gcc.dg/sin_cos.c: New.
> ---
>  gcc/testsuite/gcc.dg/sin_cos.c    |   41 ++++++++++++++
>  gcc/testsuite/gnat.dg/sin_cos.adb |   14 +++++
>  gcc/testsuite/gnat.dg/sin_cos.ads |    4 +
>  gcc/tree-ssa-math-opts.c          |  107 +++++++++++++++++++++++++++++++++++++
>  4 files changed, 165 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/sin_cos.c
>  create mode 100644 gcc/testsuite/gnat.dg/sin_cos.adb
>  create mode 100644 gcc/testsuite/gnat.dg/sin_cos.ads
>
> diff --git a/gcc/testsuite/gcc.dg/sin_cos.c b/gcc/testsuite/gcc.dg/sin_cos.c
> new file mode 100644
> index 00000000..aa71dca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/sin_cos.c
> @@ -0,0 +1,41 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* This maps to essentially the same gimple that is generated for
> +   gnat.dg/sin_cos.adb, on platforms that use the wraplf variant of
> +   Ada.Numerics.Aux_Float.  The value of EPSILON is not relevant to
> +   the test, but the test must be there to keep the conversions in
> +   different BBs long enough to trigger the problem that prevented the
> +   sincos optimization, because the arguments passed to sin and cos
> +   didn't get unified into a single SSA_NAME in time for sincos.  */
> +
> +#include <math.h>
> +
> +#define EPSILON 3.4526697709225118160247802734375e-4
> +
> +static float my_sinf(float x) {
> +  return (float) sin ((double) x);
> +}
> +
> +static float wrap_sinf(float x) {
> +  if (fabs (x) < EPSILON)
> +    return 0;
> +  return my_sinf (x);
> +}
> +
> +static float my_cosf(float x) {
> +  return (float) cos ((double) x);
> +}
> +
> +static float wrap_cosf(float x) {
> +  if (fabs (x) < EPSILON)
> +    return 1;
> +  return my_cosf (x);
> +}
> +
> +float my_sin_cos(float x, float *s, float *c) {
> +  *s = wrap_sinf (x);
> +  *c = wrap_cosf (x);
> +}
> +
> +/* { dg-final { scan-assembler "sincos\|cexp" { target *-linux-gnu* *-w64-mingw* *-*-vxworks* } } } */
> diff --git a/gcc/testsuite/gnat.dg/sin_cos.adb b/gcc/testsuite/gnat.dg/sin_cos.adb
> new file mode 100644
> index 00000000..6e18df9
> --- /dev/null
> +++ b/gcc/testsuite/gnat.dg/sin_cos.adb
> @@ -0,0 +1,14 @@
> +--  { dg-do compile }
> +--  { dg-options "-O2 -gnatn" }
> +
> +with Ada.Numerics.Elementary_Functions;
> +use Ada.Numerics.Elementary_Functions;
> +package body Sin_Cos is
> +   procedure Sin_Cos (Angle : T; SinA, CosA : out T) is
> +   begin
> +      SinA := Sin (Angle);
> +      CosA := Cos (Angle);
> +   end;
> +end Sin_Cos;
> +
> +--  { dg-final { scan-assembler "sincos\|cexp" { target *-linux-gnu* *-w64-mingw* *-*-vxworks* } } }
> diff --git a/gcc/testsuite/gnat.dg/sin_cos.ads b/gcc/testsuite/gnat.dg/sin_cos.ads
> new file mode 100644
> index 00000000..a0eff3d
> --- /dev/null
> +++ b/gcc/testsuite/gnat.dg/sin_cos.ads
> @@ -0,0 +1,4 @@
> +package Sin_Cos is
> +   subtype T is Float;
> +   procedure Sin_Cos (Angle : T; SinA, CosA : out T);
> +end Sin_Cos;
> diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> index 90dfb98..65c7b34 100644
> --- a/gcc/tree-ssa-math-opts.c
> +++ b/gcc/tree-ssa-math-opts.c
> @@ -187,6 +187,10 @@ static struct
>  {
>    /* Number of cexpi calls inserted.  */
>    int inserted;
> +
> +  /* Number of conversions removed.  */
> +  int conv_removed;
> +
>  } sincos_stats;
>
>  static struct
> @@ -1099,6 +1103,103 @@ make_pass_cse_reciprocals (gcc::context *ctxt)
>    return new pass_cse_reciprocals (ctxt);
>  }
>
> +/* If NAME is the result of a type conversion, look for other
> +   equivalent dominating or dominated conversions, and replace all
> +   uses with the earliest dominating name, removing the redundant
> +   conversions.  Return the prevailing name.  */
> +
> +static tree
> +execute_cse_conv_1 (tree name)
> +{
> +  if (SSA_NAME_IS_DEFAULT_DEF (name)
> +      || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name))
> +    return name;
> +
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (name);
> +
> +  if (!gimple_assign_cast_p (def_stmt))
> +    return name;
> +
> +  tree src = gimple_assign_rhs1 (def_stmt);
> +
> +  if (TREE_CODE (src) != SSA_NAME)
> +    return name;
> +
> +  imm_use_iterator use_iter;
> +  gimple *use_stmt;
> +
> +  /* Find the earliest dominating def.    */
> +  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, src)
> +    {
> +      if (use_stmt == def_stmt
> +         || !gimple_assign_cast_p (use_stmt))
> +       continue;
> +
> +      tree lhs = gimple_assign_lhs (use_stmt);
> +
> +      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs)
> +         || (gimple_assign_rhs1 (use_stmt)
> +             != gimple_assign_rhs1 (def_stmt))
> +         || !types_compatible_p (TREE_TYPE (name), TREE_TYPE (lhs)))
> +       continue;
> +
> +      bool use_dominates;
> +      if (gimple_bb (def_stmt) == gimple_bb (use_stmt))
> +       {
> +         gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> +         while (!gsi_end_p (gsi) && gsi_stmt (gsi) != def_stmt)
> +           gsi_next (&gsi);
> +         use_dominates = !gsi_end_p (gsi);
> +       }
> +      else if (dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt),
> +                              gimple_bb (def_stmt)))
> +       use_dominates = false;
> +      else if (dominated_by_p (CDI_DOMINATORS, gimple_bb (def_stmt),
> +                              gimple_bb (use_stmt)))
> +       use_dominates = true;
> +      else
> +       continue;
> +
> +      if (use_dominates)
> +       {
> +         std::swap (name, lhs);
> +         std::swap (def_stmt, use_stmt);
> +       }
> +    }
> +
> +  /* Now go through all uses of SRC again, replacing the equivalent
> +     dominated conversions.  We may replace defs that were not
> +     dominated by the then-prevailing defs when we first visited
> +     them.  */
> +  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, src)
> +    {
> +      if (use_stmt == def_stmt
> +         || !gimple_assign_cast_p (use_stmt))
> +       continue;
> +
> +      tree lhs = gimple_assign_lhs (use_stmt);
> +
> +      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs)
> +         || (gimple_assign_rhs1 (use_stmt)
> +             != gimple_assign_rhs1 (def_stmt))
> +         || !types_compatible_p (TREE_TYPE (name), TREE_TYPE (lhs)))
> +       continue;
> +
> +      if (gimple_bb (def_stmt) == gimple_bb (use_stmt)
> +         || dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt),
> +                            gimple_bb (def_stmt)))
> +       {
> +         sincos_stats.conv_removed++;
> +
> +         gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> +         replace_uses_by (lhs, name);
> +         gsi_remove (&gsi, true);
> +       }
> +    }
> +
> +  return name;
> +}
> +
>  /* Records an occurrence at statement USE_STMT in the vector of trees
>     STMTS if it is dominated by *TOP_BB or dominates it or this basic block
>     is not yet initialized.  Returns true if the occurrence was pushed on
> @@ -1147,6 +1248,8 @@ execute_cse_sincos_1 (tree name)
>    int i;
>    bool cfg_changed = false;
>
> +  name = execute_cse_conv_1 (name);
> +
>    FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, name)
>      {
>        if (gimple_code (use_stmt) != GIMPLE_CALL
> @@ -1181,7 +1284,7 @@ execute_cse_sincos_1 (tree name)
>          and, in subsequent rounds, that the built_in type is the same
>          type, or a compatible type.  */
>        if (type != t && !types_compatible_p (type, t))
> -       return false;
> +       RETURN_FROM_IMM_USE_STMT (use_iter, false);
>      }
>    if (seen_cos + seen_sin + seen_cexpi <= 1)
>      return false;
> @@ -2296,6 +2399,8 @@ pass_cse_sincos::execute (function *fun)
>
>    statistics_counter_event (fun, "sincos statements inserted",
>                             sincos_stats.inserted);
> +  statistics_counter_event (fun, "conv statements removed",
> +                           sincos_stats.conv_removed);
>
>    return cfg_changed ? TODO_cleanup_cfg : 0;
>  }
>
>
>
>
> --
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer
diff mbox series

Patch

diff --git a/gcc/passes.def b/gcc/passes.def
index cf15d8e..2743506 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -242,12 +242,12 @@  along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_ccp, true /* nonzero_p */);
       /* After CCP we rewrite no longer addressed locals into SSA
 	 form if possible.  */
-      NEXT_PASS (pass_cse_sincos);
       NEXT_PASS (pass_optimize_bswap);
       NEXT_PASS (pass_laddress);
       NEXT_PASS (pass_lim);
       NEXT_PASS (pass_walloca, false);
       NEXT_PASS (pass_pre);
+      NEXT_PASS (pass_cse_sincos);
       NEXT_PASS (pass_sink_code);
       NEXT_PASS (pass_sancov);
       NEXT_PASS (pass_asan);
diff --git a/gcc/testsuite/gcc.dg/sin_cos.c b/gcc/testsuite/gcc.dg/sin_cos.c
new file mode 100644
index 00000000..a4a7727
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/sin_cos.c
@@ -0,0 +1,41 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* This maps to essentially the same gimple that is generated for
+   gnat.dg/sin_cos.adb, on platforms that use the wraplf variant of
+   Ada.Numerics.Aux_Float.  The value of EPSILON is not relevant to
+   the test, but the test must be there to keep the conversions in
+   different BBs long enough to trigger the problem that prevented the
+   sincos optimization, because the arguments passed to sin and cos
+   didn't get unified into a single SSA_NAME in time for sincos.  */
+
+#include <math.h>
+
+#define EPSILON 3.4526697709225118160247802734375e-4
+
+static float my_sinf(float x) {
+  return (float) sin ((double) x);
+}
+
+static float wrap_sinf(float x) {
+  if (fabs (x) < EPSILON)
+    return 0;
+  return my_sinf (x);
+}
+
+static float my_cosf(float x) {
+  return (float) cos ((double) x);
+}
+
+static float wrap_cosf(float x) {
+  if (fabs (x) < EPSILON)
+    return 1;
+  return my_cosf (x);
+}
+
+float my_sin_cos(float x, float *s, float *c) {
+  *s = wrap_sinf (x);
+  *c = wrap_cosf (x);
+}
+
+/* { dg-final { scan-assembler "sincos\|cexp" { target *-linux-gnu* *-w64-mingw* powerpc*-*-* } } } */
diff --git a/gcc/testsuite/gnat.dg/sin_cos.adb b/gcc/testsuite/gnat.dg/sin_cos.adb
new file mode 100644
index 00000000..e72f521
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/sin_cos.adb
@@ -0,0 +1,14 @@ 
+--  { dg-do compile }
+--  { dg-options "-O2 -gnatn" }
+
+with Ada.Numerics.Elementary_Functions;
+use Ada.Numerics.Elementary_Functions;
+package body Sin_Cos is
+   procedure Sin_Cos (Angle : T; SinA, CosA : out T) is
+   begin
+      SinA := Sin (Angle);
+      CosA := Cos (Angle);
+   end;
+end Sin_Cos;
+
+--  { dg-final { scan-assembler "sincos\|cexp" { target *-linux-gnu* *-w64-mingw* powerpc*-*-* } } }
diff --git a/gcc/testsuite/gnat.dg/sin_cos.ads b/gcc/testsuite/gnat.dg/sin_cos.ads
new file mode 100644
index 00000000..a0eff3d
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/sin_cos.ads
@@ -0,0 +1,4 @@ 
+package Sin_Cos is
+   subtype T is Float;
+   procedure Sin_Cos (Angle : T; SinA, CosA : out T);
+end Sin_Cos;