diff mbox series

tree-optimization/101120 - fix compile-time issue with SLP groups

Message ID 73235n67-9r97-sq35-o146-6rp057p08q9s@fhfr.qr
State New
Series tree-optimization/101120 - fix compile-time issue with SLP groups

Commit Message

Richard Biener June 18, 2021, 12:22 p.m. UTC
This adds two hacks to work around an old compile-time issue when
vectorizing large permuted SLP groups with gaps.  There we end up
emitting loads and IV adjustments for the gap as well, and those
are quite costly until they are eventually cleaned up.

The first hack is to fold the auto-inc style IV updates early
in the vectorizer rather than in the next forwprop pass which
shortens the SSA use-def chains of the used IV.
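
To illustrate why folding each bump as it is emitted keeps the chains short, here is a minimal toy model, not GCC code.  The PtrPlus struct and the helpers fold_ptr_plus, chain_length and build_bumps are hypothetical names invented for this sketch; they only mimic the effect of combining constant offsets across a POINTER_PLUS_EXPR chain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for "result = base + offset" (not a GCC data structure).
// `base` is the index of the defining statement, or -1 for the
// original data-ref pointer.
struct PtrPlus {
  int base;
  long offset;
};

// Fold statement `i` by looking through the definition of its base and
// combining the two constant offsets.  Returns true if it changed.
bool fold_ptr_plus(std::vector<PtrPlus> &stmts, std::size_t i) {
  PtrPlus &s = stmts[i];
  if (s.base < 0)
    return false;  // already based on the original pointer
  assert(static_cast<std::size_t>(s.base) < i);
  const PtrPlus &def = stmts[s.base];
  s.offset += def.offset;
  s.base = def.base;
  return true;
}

// Number of statements visited when walking from statement `i` back to
// the original pointer.
std::size_t chain_length(const std::vector<PtrPlus> &stmts, std::size_t i) {
  std::size_t n = 1;
  for (int b = stmts[i].base; b >= 0; b = stmts[b].base)
    ++n;
  return n;
}

// Emit `n` bumps of `step` bytes, optionally folding each one right
// after it is emitted.
std::vector<PtrPlus> build_bumps(std::size_t n, long step, bool fold_early) {
  std::vector<PtrPlus> stmts;
  for (std::size_t i = 0; i < n; ++i) {
    stmts.push_back({static_cast<int>(i) - 1, step});
    if (fold_early)
      fold_ptr_plus(stmts, i);
  }
  return stmts;
}
```

In this model, building 100 bumps without early folding leaves the last one at the end of a 100-statement chain, while folding early leaves every bump based directly on the original pointer with an accumulated offset.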

The second hack is to remove the unused loads after we've picked
all that we possibly use.
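
The cleanup can be pictured with a similar toy model, again not GCC code.  ToyLoad and remove_unused_loads are hypothetical names; the sketch assumes each emitted load records how many uses its result has, and it drops the loads nothing picked up, which is roughly what checking has_zero_uses and removing the defining statement achieves in the vectorizer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a vector load emitted for the DR chain
// (not a GCC data structure).
struct ToyLoad {
  std::string name;   // name of the loaded value
  unsigned num_uses;  // how many permutes ended up using it
};

// Remove every load whose result has no uses, keeping the others in
// their original order.  Returns the number of loads removed.
std::size_t remove_unused_loads(std::vector<ToyLoad> &block) {
  const std::size_t before = block.size();
  block.erase(std::remove_if(block.begin(), block.end(),
                             [](const ToyLoad &l) { return l.num_uses == 0; }),
              block.end());
  assert(before >= block.size());
  return before - block.size();
}
```

One difference from the real change: the patch leaves dr_chain itself alone and removes the statements from the IL, whereas this sketch simply erases entries from a vector.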

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

I wonder if this is too gross (and I have to check the one or two
bug duplicates), but it should be at least easy to backport ...

Thanks,
Richard.

2021-06-18  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/101120
	* tree-vect-data-refs.c (bump_vector_ptr): Fold the
	built increment.
	* tree-vect-stmts.c (vectorizable_load): Remove unused
	loads in the DR chain for SLP.
---
 gcc/tree-vect-data-refs.c | 12 +++++++++++-
 gcc/tree-vect-stmts.c     | 12 ++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

Comments

Richard Biener June 18, 2021, 2:24 p.m. UTC | #1
On Fri, Jun 18, 2021 at 2:23 PM Richard Biener <rguenther@suse.de> wrote:
>
> This places two hacks to avoid an old compile-time issue when
> vectorizing large permuted SLP groups with gaps where we end up
> emitting loads and IV adjustments for the gap as well and those
> have quite a high cost until they are eventually cleaned up.
>
> The first hack is to fold the auto-inc style IV updates early
> in the vectorizer rather than in the next forwprop pass which
> shortens the SSA use-def chains of the used IV.
>
> The second hack is to remove the unused loads after we've picked
> all that we possibly use.
>
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>
> I wonder if this is too gross (and I have to check the one or two
> bug duplicates), but it should be at least easy to backport ...

Was apparently too simple - the following passes bootstrap and
regtest.

Richard.
Richard Biener June 21, 2021, 1:02 p.m. UTC | #2
On Fri, Jun 18, 2021 at 4:24 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Jun 18, 2021 at 2:23 PM Richard Biener <rguenther@suse.de> wrote:
> >
> > This places two hacks to avoid an old compile-time issue when
> > vectorizing large permuted SLP groups with gaps where we end up
> > emitting loads and IV adjustments for the gap as well and those
> > have quite a high cost until they are eventually cleaned up.
> >
> > The first hack is to fold the auto-inc style IV updates early
> > in the vectorizer rather than in the next forwprop pass which
> > shortens the SSA use-def chains of the used IV.
> >
> > The second hack is to remove the unused loads after we've picked
> > all that we possibly use.
> >
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >
> > I wonder if this is too gross (and I have to check the one or two
> > bug duplicates), but it should be at least easy to backport ...
>
> Was apparently too simple - the following passes bootstrap and
> regtest.

I've pushed this now after thinking about better solutions.

Richard.

>
> Richard.

Patch

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index bb086c6ac1c..be067c8923b 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -53,6 +53,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-hash-traits.h"
 #include "vec-perm-indices.h"
 #include "internal-fn.h"
+#include "gimple-fold.h"
 
 /* Return true if load- or store-lanes optab OPTAB is implemented for
    COUNT vectors of type VECTYPE.  NAME is the name of OPTAB.  */
@@ -5026,7 +5027,7 @@  bump_vector_ptr (vec_info *vinfo,
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   tree update = TYPE_SIZE_UNIT (vectype);
-  gassign *incr_stmt;
+  gimple *incr_stmt;
   ssa_op_iter iter;
   use_operand_p use_p;
   tree new_dataref_ptr;
@@ -5041,6 +5042,15 @@  bump_vector_ptr (vec_info *vinfo,
   incr_stmt = gimple_build_assign (new_dataref_ptr, POINTER_PLUS_EXPR,
 				   dataref_ptr, update);
   vect_finish_stmt_generation (vinfo, stmt_info, incr_stmt, gsi);
+  /* Fold the increment, avoiding excessive use-def chains of
+     those, leading to compile-time issues for passes until the next
+     forwprop pass which would do this as well.  */
+  gimple_stmt_iterator fold_gsi = gsi_for_stmt (incr_stmt);
+  if (fold_stmt (&fold_gsi, follow_all_ssa_edges))
+    {
+      incr_stmt = gsi_stmt (fold_gsi);
+      update_stmt (incr_stmt);
+    }
 
   /* Copy the points-to information if it exists. */
   if (DR_PTR_INFO (dr))
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index eeef96a2eb6..1636e6716df 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -9765,6 +9765,18 @@  vectorizable_load (vec_info *vinfo,
 	  bool ok = vect_transform_slp_perm_load (vinfo, slp_node, dr_chain,
 						  gsi, vf, false, &n_perms);
 	  gcc_assert (ok);
+	  /* For SLP we know we've seen all possible uses of dr_chain.
+	     See to remove stmts we didn't need.
+	     ???  This is a hack to prevent compile-time issues as seen
+	     in PR101120 and friends.  */
+	  for (tree op : dr_chain)
+	    if (has_zero_uses (op))
+	      {
+		gimple *stmt = SSA_NAME_DEF_STMT (op);
+		gimple_stmt_iterator rgsi = gsi_for_stmt (stmt);
+		gsi_remove (&rgsi, true);
+		release_defs (stmt);
+	      }
         }
       else
         {