diff mbox series

path solver: Minimize exported ranges to subsequent blocks.

Message ID 20211127083206.2450093-1-aldyh@redhat.com
State New
Headers show
Series path solver: Minimize exported ranges to subsequent blocks. | expand

Commit Message

Aldy Hernandez Nov. 27, 2021, 8:32 a.m. UTC
Currently we have a set of interesting names in the path that we solve
for.  However, solving *all* of them on exit from each block is
unnecessary, even if an edge range is available.  We can reduce the
queried edges by only solving SSA names from the current block
that will be needed in subsequent blocks.

With this patch we avoid the explosion in PR103254, without the need for
Andrew's patch (which is still independently needed).

In my testbed we loose 6 threads in the non-resolving threading passes,
but the overall thread count is the same, as we pick them up later.  We
could probably try harder to get them, but it's probably not worth the
effort.

Benchmarking this patch shows no difference in overall compilation speed
and a tiny slowdown (< 0.80%) for some of the threading passes.
Benchmarked for both -O2, -O3, and -O2 --param ranger-logical-depth=10.

I'm unsure what the review requirements are in stage3, so...

OK for trunk?

p.s. I saw some cleanups that could be done (intersecting more bitmaps,
putting the rest of the bitmaps in the new obstack, etc etc), but I
avoided doing them to minimize the churn in stage3.

Regstrapped on x86-64 & ppc64le Linux.

	PR tree-optimization/103254

gcc/ChangeLog:

	* gimple-range-path.cc (path_range_query::path_range_query):
	Initialize obstack.
	(path_range_query::~path_range_query): Release obstack.
	(path_range_query::needs_on_entry_for_block): New.
	(path_range_query::compute_needs_on_entry_set): New.
	(path_range_query::compute_ranges_in_block): Only calculate
	outgoing edges for m_needs_on_entry set.
	(path_range_query::compute_ranges): Call
	compute_needs_on_entry_set.
	* gimple-range-path.h: Add needs_on_entry_for_block,
	compute_needs_on_entry_set, m_bitmaps, and m_needs_on_entry.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr103254.c: Add -fdump-tree-vrp2-details.
	* gcc.dg/pr103254-2.c: New test.
---
 gcc/gimple-range-path.cc          | 67 +++++++++++++++++++++++++++++--
 gcc/gimple-range-path.h           |  4 ++
 gcc/testsuite/gcc.dg/pr103254-2.c | 29 +++++++++++++
 gcc/testsuite/gcc.dg/pr103254.c   |  6 ++-
 4 files changed, 102 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr103254-2.c

Comments

Aldy Hernandez Dec. 1, 2021, 4:18 p.m. UTC | #1
I'm going to hold off pushing this until I hear from the global
maintainers and/or release managers.

This patch fixes a pathological performance case, but it may also be
handled by Andrew's fixes in this area.  It's up to y'all to decide if
it's something we want in this release.  An alternative would be to
keep track of any pathological issues going forward, and see if this
fixes the issue.  I'm going to be taking leave any day now :).

Aldy

On Sat, Nov 27, 2021 at 9:32 AM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> Currently we have a set of interesting names in the path that we solve
> for.  However, solving *all* of them on exit from each block is
> unnecessary, even if an edge range is available.  We can reduce the
> queried edges by only solving SSA names from the current block
> that will be needed in subsequent blocks.
>
> With this patch we avoid the explosion in PR103254, without the need for
> Andrew's patch (which is still independently needed).
>
> In my testbed we loose 6 threads in the non-resolving threading passes,
> but the overall thread count is the same, as we pick them up later.  We
> could probably try harder to get them, but it's probably not worth the
> effort.
>
> Benchmarking this patch shows no difference in overall compilation speed
> and a tiny slowdown (< 0.80%) for some of the threading passes.
> Benchmarked for both -O2, -O3, and -O2 --param ranger-logical-depth=10.
>
> I'm unsure what the review requirements are in stage3, so...
>
> OK for trunk?
>
> p.s. I saw some cleanups that could be done (intersecting more bitmaps,
> putting the rest of the bitmaps in the new obstack, etc etc), but I
> avoided doing them to minimize the churn in stage3.
>
> Regstrapped on x86-64 & ppc64le Linux.
>
>         PR tree-optimization/103254
>
> gcc/ChangeLog:
>
>         * gimple-range-path.cc (path_range_query::path_range_query):
>         Initialize obstack.
>         (path_range_query::~path_range_query): Release obstack.
>         (path_range_query::needs_on_entry_for_block): New.
>         (path_range_query::compute_needs_on_entry_set): New.
>         (path_range_query::compute_ranges_in_block): Only calculate
>         outgoing edges for m_needs_on_entry set.
>         (path_range_query::compute_ranges): Call
>         compute_needs_on_entry_set.
>         * gimple-range-path.h: Add needs_on_entry_for_block,
>         compute_needs_on_entry_set, m_bitmaps, and m_needs_on_entry.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/pr103254.c: Add -fdump-tree-vrp2-details.
>         * gcc.dg/pr103254-2.c: New test.
> ---
>  gcc/gimple-range-path.cc          | 67 +++++++++++++++++++++++++++++--
>  gcc/gimple-range-path.h           |  4 ++
>  gcc/testsuite/gcc.dg/pr103254-2.c | 29 +++++++++++++
>  gcc/testsuite/gcc.dg/pr103254.c   |  6 ++-
>  4 files changed, 102 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr103254-2.c
>
> diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> index b9c71226c1c..32226df05b5 100644
> --- a/gcc/gimple-range-path.cc
> +++ b/gcc/gimple-range-path.cc
> @@ -48,6 +48,7 @@ path_range_query::path_range_query (bool resolve, gimple_ranger *ranger)
>      m_ranger = ranger;
>
>    m_oracle = new path_oracle (m_ranger->oracle ());
> +  bitmap_obstack_initialize (&m_bitmaps);
>  }
>
>  path_range_query::~path_range_query ()
> @@ -57,6 +58,7 @@ path_range_query::~path_range_query ()
>      delete m_ranger;
>    BITMAP_FREE (m_has_cache_entry);
>    delete m_cache;
> +  bitmap_obstack_release (&m_bitmaps);
>  }
>
>  // Return TRUE if NAME is in the import bitmap.
> @@ -401,6 +403,62 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
>      }
>  }
>
> +// Return the set of SSA names that would be useful to have resolved
> +// on entry to this block.  These are the names that will ultimately
> +// help resolve the final conditional on the path.
> +
> +void
> +path_range_query::needs_on_entry_for_block (bitmap on_entry, unsigned bbn)
> +{
> +  gori_compute &g = m_ranger->gori ();
> +  basic_block next = m_path[bbn];
> +  bitmap_copy (on_entry, g.imports (next));
> +
> +  // Any incoming PHI args are needed on entry, but are not in the
> +  // g.imports() above, so add them manually.
> +  for (auto iter = gsi_start_phis (next); !gsi_end_p (iter); gsi_next (&iter))
> +    {
> +      gphi *phi = iter.phi ();
> +      tree result = gimple_phi_result (phi);
> +
> +      if (bitmap_bit_p (m_imports, SSA_NAME_VERSION (result)))
> +       for (size_t i = 0; i < gimple_phi_num_args (phi); ++i)
> +         {
> +           tree arg = gimple_phi_arg (phi, i)->def;
> +
> +           if (TREE_CODE (arg) == SSA_NAME
> +               && m_path.contains (gimple_phi_arg_edge (phi, i)->src))
> +             bitmap_set_bit (on_entry, SSA_NAME_VERSION (arg));
> +         }
> +    }
> +  bitmap_and_into (on_entry, m_imports);
> +}
> +
> +// Compute the set of needed on-entry names for all path blocks.
> +
> +void
> +path_range_query::compute_needs_on_entry_set ()
> +{
> +  unsigned len = m_path.length ();
> +
> +  if (len > m_needs_on_entry.length ())
> +    m_needs_on_entry.safe_grow_cleared (len);
> +
> +  // Iterate backwards through the path, excluding the entry block
> +  // since we'll be asking the ranger for incoming values into the path.
> +  for (unsigned i = 0; i + 1 < len; ++i)
> +    {
> +      if (!m_needs_on_entry[i])
> +       m_needs_on_entry[i] = BITMAP_ALLOC (&m_bitmaps);
> +
> +      needs_on_entry_for_block (m_needs_on_entry[i], i);
> +
> +      // Include the on-entry names for the subsequent blocks.
> +      if (i > 0)
> +       bitmap_ior_into (m_needs_on_entry[i], m_needs_on_entry[i - 1]);
> +    }
> +}
> +
>  // Compute ranges defined in the current block, or exported to the
>  // next block.
>
> @@ -441,11 +499,12 @@ path_range_query::compute_ranges_in_block (basic_block bb)
>    // Solve imports that are exported to the next block.
>    basic_block next = next_bb ();
>    edge e = find_edge (bb, next);
> -  EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
> +  const_bitmap candidates = m_needs_on_entry[m_pos - 1];
> +  gori_compute &g = m_ranger->gori ();
> +  bitmap exports = g.exports (bb);
> +  EXECUTE_IF_SET_IN_BITMAP (candidates, 0, i, bi)
>      {
>        tree name = ssa_name (i);
> -      gori_compute &g = m_ranger->gori ();
> -      bitmap exports = g.exports (bb);
>
>        if (bitmap_bit_p (exports, i))
>         {
> @@ -602,6 +661,8 @@ path_range_query::compute_ranges (const vec<basic_block> &path,
>    else
>      compute_imports (m_imports, exit_bb ());
>
> +  compute_needs_on_entry_set ();
> +
>    if (m_resolve)
>      get_path_oracle ()->reset_path ();
>
> diff --git a/gcc/gimple-range-path.h b/gcc/gimple-range-path.h
> index 57a9ae9bdcd..f5601377209 100644
> --- a/gcc/gimple-range-path.h
> +++ b/gcc/gimple-range-path.h
> @@ -61,6 +61,8 @@ private:
>    void compute_ranges_in_phis (basic_block bb);
>    void adjust_for_non_null_uses (basic_block bb);
>    void ssa_range_in_phi (irange &r, gphi *phi);
> +  void needs_on_entry_for_block (bitmap interesting, unsigned bbn);
> +  void compute_needs_on_entry_set ();
>    void compute_outgoing_relations (basic_block bb, basic_block next);
>    void compute_phi_relations (basic_block bb, basic_block prev);
>    void maybe_register_phi_relation (gphi *, tree arg);
> @@ -93,6 +95,8 @@ private:
>    auto_bitmap m_imports;
>    gimple_ranger *m_ranger;
>    non_null_ref m_non_null;
> +  bitmap_obstack m_bitmaps;
> +  auto_vec<bitmap> m_needs_on_entry;
>
>    // Current path position.
>    unsigned m_pos;
> diff --git a/gcc/testsuite/gcc.dg/pr103254-2.c b/gcc/testsuite/gcc.dg/pr103254-2.c
> new file mode 100644
> index 00000000000..1956d90a563
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr103254-2.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 --param ranger-logical-depth=99" } */
> +/* { dg-timeout 10 } */
> +
> +/* This is the same test as pr103254.c, but making sure the threader
> +   succeeds regardless of the depth limit, since it should avoid
> +   asking for unnecessary edges.  */
> +
> +short int i;
> +
> +void
> +foo (void)
> +{
> +  for (i = 1; i < 2; i += 4)
> +    {
> +      int j;
> +
> +      for (j = 0; j < 5; j += 4)
> +        {
> +          int k;
> +
> +          for (k = 0; k < 68; k += 4)
> +            {
> +              i &= j;
> +              j &= i;
> +            }
> +        }
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.dg/pr103254.c b/gcc/testsuite/gcc.dg/pr103254.c
> index 62d2415f216..9d6828a6771 100644
> --- a/gcc/testsuite/gcc.dg/pr103254.c
> +++ b/gcc/testsuite/gcc.dg/pr103254.c
> @@ -1,7 +1,11 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3" } */
> +/* { dg-options "-O3 -fdump-tree-vrp2-details" } */
>  /* { dg-timeout 10 } */
>
> +/* The vrp2 dump above is needed to test that ranger can dump all the
> +   known ranges in the IL and not blow up, which would happen for
> +   --param ranger-logical-depth=99.  */
> +
>  short int i;
>
>  void
> --
> 2.31.1
>
Jeff Law Dec. 1, 2021, 4:38 p.m. UTC | #2
On 12/1/2021 9:18 AM, Aldy Hernandez via Gcc-patches wrote:
> I'm going to hold off pushing this until I hear from the global
> maintainers and/or release managers.
>
> This patch fixes a pathological performance case, but it may also be
> handled by Andrew's fixes in this area.  It's up to y'all to decide if
> it's something we want in this release.  An alternative would be to
> keep track of any pathological issues going forward, and see if this
> fixes the issue.  I'm going to be taking leave any day now :).
I'm leaving this to you and Andrew to decide.  Y'all know this code 
better than anyone.

And yes, I keep expecting you to disappear without notice :-)

Jeff
diff mbox series

Patch

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index b9c71226c1c..32226df05b5 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -48,6 +48,7 @@  path_range_query::path_range_query (bool resolve, gimple_ranger *ranger)
     m_ranger = ranger;
 
   m_oracle = new path_oracle (m_ranger->oracle ());
+  bitmap_obstack_initialize (&m_bitmaps);
 }
 
 path_range_query::~path_range_query ()
@@ -57,6 +58,7 @@  path_range_query::~path_range_query ()
     delete m_ranger;
   BITMAP_FREE (m_has_cache_entry);
   delete m_cache;
+  bitmap_obstack_release (&m_bitmaps);
 }
 
 // Return TRUE if NAME is in the import bitmap.
@@ -401,6 +403,62 @@  path_range_query::compute_ranges_in_phis (basic_block bb)
     }
 }
 
+// Return the set of SSA names that would be useful to have resolved
+// on entry to this block.  These are the names that will ultimately
+// help resolve the final conditional on the path.
+
+void
+path_range_query::needs_on_entry_for_block (bitmap on_entry, unsigned bbn)
+{
+  gori_compute &g = m_ranger->gori ();
+  basic_block next = m_path[bbn];
+  bitmap_copy (on_entry, g.imports (next));
+
+  // Any incoming PHI args are needed on entry, but are not in the
+  // g.imports() above, so add them manually.
+  for (auto iter = gsi_start_phis (next); !gsi_end_p (iter); gsi_next (&iter))
+    {
+      gphi *phi = iter.phi ();
+      tree result = gimple_phi_result (phi);
+
+      if (bitmap_bit_p (m_imports, SSA_NAME_VERSION (result)))
+	for (size_t i = 0; i < gimple_phi_num_args (phi); ++i)
+	  {
+	    tree arg = gimple_phi_arg (phi, i)->def;
+
+	    if (TREE_CODE (arg) == SSA_NAME
+		&& m_path.contains (gimple_phi_arg_edge (phi, i)->src))
+	      bitmap_set_bit (on_entry, SSA_NAME_VERSION (arg));
+	  }
+    }
+  bitmap_and_into (on_entry, m_imports);
+}
+
+// Compute the set of needed on-entry names for all path blocks.
+
+void
+path_range_query::compute_needs_on_entry_set ()
+{
+  unsigned len = m_path.length ();
+
+  if (len > m_needs_on_entry.length ())
+    m_needs_on_entry.safe_grow_cleared (len);
+
+  // Iterate backwards through the path, excluding the entry block
+  // since we'll be asking the ranger for incoming values into the path.
+  for (unsigned i = 0; i + 1 < len; ++i)
+    {
+      if (!m_needs_on_entry[i])
+	m_needs_on_entry[i] = BITMAP_ALLOC (&m_bitmaps);
+
+      needs_on_entry_for_block (m_needs_on_entry[i], i);
+
+      // Include the on-entry names for the subsequent blocks.
+      if (i > 0)
+	bitmap_ior_into (m_needs_on_entry[i], m_needs_on_entry[i - 1]);
+    }
+}
+
 // Compute ranges defined in the current block, or exported to the
 // next block.
 
@@ -441,11 +499,12 @@  path_range_query::compute_ranges_in_block (basic_block bb)
   // Solve imports that are exported to the next block.
   basic_block next = next_bb ();
   edge e = find_edge (bb, next);
-  EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
+  const_bitmap candidates = m_needs_on_entry[m_pos - 1];
+  gori_compute &g = m_ranger->gori ();
+  bitmap exports = g.exports (bb);
+  EXECUTE_IF_SET_IN_BITMAP (candidates, 0, i, bi)
     {
       tree name = ssa_name (i);
-      gori_compute &g = m_ranger->gori ();
-      bitmap exports = g.exports (bb);
 
       if (bitmap_bit_p (exports, i))
 	{
@@ -602,6 +661,8 @@  path_range_query::compute_ranges (const vec<basic_block> &path,
   else
     compute_imports (m_imports, exit_bb ());
 
+  compute_needs_on_entry_set ();
+
   if (m_resolve)
     get_path_oracle ()->reset_path ();
 
diff --git a/gcc/gimple-range-path.h b/gcc/gimple-range-path.h
index 57a9ae9bdcd..f5601377209 100644
--- a/gcc/gimple-range-path.h
+++ b/gcc/gimple-range-path.h
@@ -61,6 +61,8 @@  private:
   void compute_ranges_in_phis (basic_block bb);
   void adjust_for_non_null_uses (basic_block bb);
   void ssa_range_in_phi (irange &r, gphi *phi);
+  void needs_on_entry_for_block (bitmap interesting, unsigned bbn);
+  void compute_needs_on_entry_set ();
   void compute_outgoing_relations (basic_block bb, basic_block next);
   void compute_phi_relations (basic_block bb, basic_block prev);
   void maybe_register_phi_relation (gphi *, tree arg);
@@ -93,6 +95,8 @@  private:
   auto_bitmap m_imports;
   gimple_ranger *m_ranger;
   non_null_ref m_non_null;
+  bitmap_obstack m_bitmaps;
+  auto_vec<bitmap> m_needs_on_entry;
 
   // Current path position.
   unsigned m_pos;
diff --git a/gcc/testsuite/gcc.dg/pr103254-2.c b/gcc/testsuite/gcc.dg/pr103254-2.c
new file mode 100644
index 00000000000..1956d90a563
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr103254-2.c
@@ -0,0 +1,29 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 --param ranger-logical-depth=99" } */
+/* { dg-timeout 10 } */
+
+/* This is the same test as pr103254.c, but making sure the threader
+   succeeds regardless of the depth limit, since it should avoid
+   asking for unnecessary edges.  */
+
+short int i;
+
+void
+foo (void)
+{
+  for (i = 1; i < 2; i += 4)
+    {
+      int j;
+
+      for (j = 0; j < 5; j += 4)
+        {
+          int k;
+
+          for (k = 0; k < 68; k += 4)
+            {
+              i &= j;
+              j &= i;
+            }
+        }
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/pr103254.c b/gcc/testsuite/gcc.dg/pr103254.c
index 62d2415f216..9d6828a6771 100644
--- a/gcc/testsuite/gcc.dg/pr103254.c
+++ b/gcc/testsuite/gcc.dg/pr103254.c
@@ -1,7 +1,11 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -fdump-tree-vrp2-details" } */
 /* { dg-timeout 10 } */
 
+/* The vrp2 dump above is needed to test that ranger can dump all the
+   known ranges in the IL and not blow up, which would happen for
+   --param ranger-logical-depth=99.  */
+
 short int i;
 
 void