Patchwork More improvements for the code generation of the if-conversion

login
register
mail settings
Submitter Sebastian Pop
Date June 22, 2010, 7:18 p.m.
Message ID <AANLkTikk4dX2Os740MRxRaaAzaOnyh6LmVvCij_aQwU8@mail.gmail.com>
Download mbox | patch
Permalink /patch/56571/
State New
Headers show

Comments

Sebastian Pop - June 22, 2010, 7:18 p.m.
Hi,

The attached patches clean the code generated by the if-conversion:

0001 adds a debug counter, useful to reduce code generation problems
due to the tree-if-conversion pass.

0002 calls the cleanup_tree_cfg after if-conversion to ensure that the
CFG representation passed to the vectorizer is in good shape.

0003 adds some more sanity checks after the if-conversion pass:
verify_loop_structure and verify_loop_closed_ssa.

0004 moves the code needed for the computation of the predicate of a
BB at the beginning of the predicated BB.  The computation of the
insertion place is simplified, and makes it possible to use this
predicate in the code of the BB.

0005 implements and calls an unfold SSA function that first expands
the SSA_NAMEs and then calls fold on the expanded expression tree.
unfold_ssa_names exposes to fold a more complete expression than
otherwise available in the predicates of a BB.  This patch is now
needed to improve the quality of the code generated by the
if-conversion as if-convert gimplifies the predicates in order to
avoid recomputations of some predicates.  With this patch it is
possible to fold (a || b) into the true predicate when a and b are the
predicates of the two branches of a condition, as the SSA_NAMEs a and
b are first unfolded into their respective expressions and then fold
finishes by computing the true predicate.  Note that this patch needs
some adjustment, i.e., replacing the magic constant 5 that is the
depth level for the unfold traversal to be some parameter, or so.

0006 uses a single function reset_bb_predicate to reset the predicate
of a BB to true.  This function has to release all the SSA_NAMEs used
in the gimplification of a predicate.

0007 avoids the generation of code computing the true predicate, that
occurs for all the BBs merging disjunct predicates leading to the true
predicate.

0008 forces the predicate of a BB to be either is_gimple_condexpr or
otherwise the predicate is gimplified, avoiding the insertion of the
same computation on other predicates.

This patch-set passed regstrap on amd64-linux with
BOOT_CFLAGS= -g -O3 -msse2.
Ok for trunk?

Thanks,
Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools
Richard Guenther - June 25, 2010, 2:48 p.m.
On Tue, 22 Jun 2010, Sebastian Pop wrote:

> Hi,
> 
> The attached patches clean the code generated by the if-conversion:
> 
> 0001 adds a debug counter, useful to reduce code generation problems
> due to the tree-if-conversion pass.

Ok.

> 0002 calls the cleanup_tree_cfg after if-conversion to ensure that the
> CFG representation passed to the vectorizer is in good shape.

Instead of

+  if (changed)
+    cleanup_tree_cfg ();

   return 0;
 }

do

   return changed ? TODO_cleanup_cfg : 0;

ok with that change.

> 0003 adds some more sanity checks after the if-conversion pass:
> verify_loop_structure and verify_loop_closed_ssa.

This is already done.  See passes.c.

Not ok.

> 0004 moves the code needed for the computation of the predicate of a
> BB at the beginning of the predicated BB.  The computation of the
> insertion place is simplified, and makes it possible to use this
> predicate in the code of the BB.

Why should it ever be needed to use it in the code of the BB?
As far as I see this only enlarges life-time of them.

> 0005 implements and calls an unfold SSA function that first expands
> the SSA_NAMEs and then calls fold on the expanded expression tree.
> unfold_ssa_names exposes to fold a more complete expression than
> otherwise available in the predicates of a BB.  This patch is now
> needed to improve the quality of the code generated by the
> if-conversion as if-convert gimplifies the predicates in order to
> avoid recomputations of some predicates.  With this patch it is
> possible to fold (a || b) into the true predicate when a and b are the
> predicates of the two branches of a condition, as the SSA_NAMEs a and
> b are first unfolded into their respective expressions and then fold
> finishes by computing the true predicate.  Note that this patch needs
> some adjustment, i.e., replacing the magic constant 5 that is the
> depth level for the unfold traversal to be some parameter, or so.

+      op2 = unfold_ssa_names (gimple_assign_rhs2 (stmt), level);

this possibly accesses uninitialized and unallocated memory.

I do not think this patch is a good idea at all and you allow
exponential growth as you do not decrement level for each operand
you visit.  Thus, for all-ternary operands and your use of a limit
of depth 5 you'd get 243 substitutions and foldings.

At most a substitution depth of 1 would be which in turn will
simplify that patch a lot.

Please re-work it.

> 0006 uses a single function reset_bb_predicate to reset the predicate
> of a BB to true.  This function has to release all the SSA_NAMEs used
> in the gimplification of a predicate.

Ok.

> 0007 avoids the generation of code computing the true predicate, that
> occurs for all the BBs merging disjunct predicates leading to the true
> predicate.

Ok.

> 0008 forces the predicate of a BB to be either is_gimple_condexpr or
> otherwise the predicate is gimplified, avoiding the insertion of the
> same computation on other predicates.

I suppose this is only necessary because of 0005, so defered.

Please post your patches separately and inline, not as attachment.
It's a major nuisance to deal with this kind of mails with its loads
of attachements.

Thanks,
Richard.
Sebastian Pop - June 25, 2010, 3:05 p.m.
On Fri, Jun 25, 2010 at 09:48, Richard Guenther <rguenther@suse.de> wrote:
>> 0004 moves the code needed for the computation of the predicate of a
>> BB at the beginning of the predicated BB.  The computation of the
>> insertion place is simplified, and makes it possible to use this
>> predicate in the code of the BB.
>
> Why should it ever be needed to use it in the code of the BB?
> As far as I see this only enlarges life-time of them.

This is needed for predicating memory accesses in the BB: see 0009.
I will include this patch with the other patch-set.

> Please post your patches separately and inline, not as attachment.
> It's a major nuisance to deal with this kind of mails with its loads
> of attachements.

Ok, sorry for the inconvenient.

Sebastian
Sebastian Pop - June 29, 2010, 8:46 p.m.
On Fri, Jun 25, 2010 at 09:48, Richard Guenther <rguenther@suse.de> wrote:
>> 0005 implements and calls an unfold SSA function that first expands
>> the SSA_NAMEs and then calls fold on the expanded expression tree.
>> unfold_ssa_names exposes to fold a more complete expression than
>> otherwise available in the predicates of a BB.  This patch is now
>> needed to improve the quality of the code generated by the
>> if-conversion as if-convert gimplifies the predicates in order to
>> avoid recomputations of some predicates.  With this patch it is
>> possible to fold (a || b) into the true predicate when a and b are the
>> predicates of the two branches of a condition, as the SSA_NAMEs a and
>> b are first unfolded into their respective expressions and then fold
>> finishes by computing the true predicate.  Note that this patch needs
>> some adjustment, i.e., replacing the magic constant 5 that is the
>> depth level for the unfold traversal to be some parameter, or so.
>
> +      op2 = unfold_ssa_names (gimple_assign_rhs2 (stmt), level);
>
> this possibly accesses uninitialized and unallocated memory.

How could this happen, could you explain?

> I do not think this patch is a good idea at all and you allow
> exponential growth as you do not decrement level for each operand
> you visit.  Thus, for all-ternary operands and your use of a limit
> of depth 5 you'd get 243 substitutions and foldings.
>
> At most a substitution depth of 1 would be which in turn will
> simplify that patch a lot.

A substitution of depth 1 doesn't seem to be enough: if I am limiting
the level to 1, then with my current patch-set, I get all these failures:

FAIL: gcc.dg/vect/vect-ifcvt-16.c scan-tree-dump-times vect
"vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-ifcvt-17.c scan-tree-dump-times vect
"vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-ifcvt-2.c scan-tree-dump-times vect "vectorized
1 loops" 1
FAIL: gcc.dg/vect/vect-ifcvt-3.c scan-tree-dump-times vect "vectorized
1 loops" 1
FAIL: gcc.dg/vect/vect-ifcvt-4.c scan-tree-dump-times vect "vectorized
1 loops" 1
FAIL: gcc.dg/vect/vect-ifcvt-5.c scan-tree-dump-times vect "vectorized
1 loops" 1
FAIL: gcc.dg/vect/vect-ifcvt-6.c scan-tree-dump-times vect "vectorized
1 loops" 1
FAIL: gcc.dg/vect/vect-ifcvt-7.c scan-tree-dump-times vect "vectorized
1 loops" 1
FAIL: gcc.dg/vect/vect-ifcvt-9.c scan-tree-dump-times vect "vectorized
1 loops" 2
FAIL: gcc.dg/vect/no-trapping-math-2.c scan-tree-dump-times vect
"vectorized 1 loops" 1
FAIL: gcc.dg/vect/no-trapping-math-vect-111.c scan-tree-dump-times
vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c
scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c
scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c
scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c
scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c
scan-tree-dump-times vect "vectorized 1 loops" 1

for level = 2, I still get these fails:

FAIL: gcc.dg/vect/vect-ifcvt-16.c scan-tree-dump-times vect
"vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-ifcvt-17.c scan-tree-dump-times vect
"vectorized 1 loops" 1

for level = 3 these fails disappear.

Sebastian
Richard Guenther - June 30, 2010, 8:22 a.m.
On Tue, 29 Jun 2010, Sebastian Pop wrote:

> On Fri, Jun 25, 2010 at 09:48, Richard Guenther <rguenther@suse.de> wrote:
> >> 0005 implements and calls an unfold SSA function that first expands
> >> the SSA_NAMEs and then calls fold on the expanded expression tree.
> >> unfold_ssa_names exposes to fold a more complete expression than
> >> otherwise available in the predicates of a BB.  This patch is now
> >> needed to improve the quality of the code generated by the
> >> if-conversion as if-convert gimplifies the predicates in order to
> >> avoid recomputations of some predicates.  With this patch it is
> >> possible to fold (a || b) into the true predicate when a and b are the
> >> predicates of the two branches of a condition, as the SSA_NAMEs a and
> >> b are first unfolded into their respective expressions and then fold
> >> finishes by computing the true predicate.  Note that this patch needs
> >> some adjustment, i.e., replacing the magic constant 5 that is the
> >> depth level for the unfold traversal to be some parameter, or so.
> >
> > +      op2 = unfold_ssa_names (gimple_assign_rhs2 (stmt), level);
> >
> > this possibly accesses uninitialized and unallocated memory.
> 
> How could this happen, could you explain?

You unconditionally access op2 w/o checking the rhs is binary.

> > I do not think this patch is a good idea at all and you allow
> > exponential growth as you do not decrement level for each operand
> > you visit.  Thus, for all-ternary operands and your use of a limit
> > of depth 5 you'd get 243 substitutions and foldings.
> >
> > At most a substitution depth of 1 would be which in turn will
> > simplify that patch a lot.
> 
> A substitution of depth 1 doesn't seem to be enough: if I am limiting
> the level to 1, then with my current patch-set, I get all these failures:

How is it that an improvement patch creates new FAILs?  W/o your
patch we also don't need to "unfold".

What kind of expressions do you need to fold?  I suppose they
are all comparisons and logical/bitwise and/or or nots?

Please have a look at the infrastructure Sandra added
for the patch for PRs 39874 and 28685 then.  This is how it should
be done - not using fold and building large trees for any
expression kinds you are running into.  I bet you can simply
re-use what Sandra added.

Richard.

> FAIL: gcc.dg/vect/vect-ifcvt-16.c scan-tree-dump-times vect
> "vectorized 1 loops" 1
> FAIL: gcc.dg/vect/vect-ifcvt-17.c scan-tree-dump-times vect
> "vectorized 1 loops" 1
> FAIL: gcc.dg/vect/vect-ifcvt-2.c scan-tree-dump-times vect "vectorized
> 1 loops" 1
> FAIL: gcc.dg/vect/vect-ifcvt-3.c scan-tree-dump-times vect "vectorized
> 1 loops" 1
> FAIL: gcc.dg/vect/vect-ifcvt-4.c scan-tree-dump-times vect "vectorized
> 1 loops" 1
> FAIL: gcc.dg/vect/vect-ifcvt-5.c scan-tree-dump-times vect "vectorized
> 1 loops" 1
> FAIL: gcc.dg/vect/vect-ifcvt-6.c scan-tree-dump-times vect "vectorized
> 1 loops" 1
> FAIL: gcc.dg/vect/vect-ifcvt-7.c scan-tree-dump-times vect "vectorized
> 1 loops" 1
> FAIL: gcc.dg/vect/vect-ifcvt-9.c scan-tree-dump-times vect "vectorized
> 1 loops" 2
> FAIL: gcc.dg/vect/no-trapping-math-2.c scan-tree-dump-times vect
> "vectorized 1 loops" 1
> FAIL: gcc.dg/vect/no-trapping-math-vect-111.c scan-tree-dump-times
> vect "vectorized 1 loops" 1
> FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c
> scan-tree-dump-times vect "vectorized 1 loops" 1
> FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c
> scan-tree-dump-times vect "vectorized 1 loops" 1
> FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c
> scan-tree-dump-times vect "vectorized 1 loops" 1
> FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c
> scan-tree-dump-times vect "vectorized 1 loops" 1
> FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c
> scan-tree-dump-times vect "vectorized 1 loops" 1
> 
> for level = 2, I still get these fails:
> 
> FAIL: gcc.dg/vect/vect-ifcvt-16.c scan-tree-dump-times vect
> "vectorized 1 loops" 1
> FAIL: gcc.dg/vect/vect-ifcvt-17.c scan-tree-dump-times vect
> "vectorized 1 loops" 1
> 
> for level = 3 these fails disappear.
> 
> Sebastian
> 
>
Sebastian Pop - June 30, 2010, 3:03 p.m.
On Wed, Jun 30, 2010 at 03:22, Richard Guenther <rguenther@suse.de> wrote:
>> A substitution of depth 1 doesn't seem to be enough: if I am limiting
>> the level to 1, then with my current patch-set, I get all these failures:
>
> How is it that an improvement patch creates new FAILs?  W/o your
> patch we also don't need to "unfold".
>

The patch-set contains the if-conversion of memory writes, so these
fails correspond to vect not able to handle the codes generated by the
if-conversion, see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44710

> What kind of expressions do you need to fold?  I suppose they
> are all comparisons and logical/bitwise and/or or nots?
>

Yes, see the gimple code in the PR, although I guess that this case
would be handled by level=1.

> Please have a look at the infrastructure Sandra added
> for the patch for PRs 39874 and 28685 then.  This is how it should
> be done - not using fold and building large trees for any
> expression kinds you are running into.  I bet you can simply
> re-use what Sandra added.
>

Thanks for the pointer, I will try to reuse this.

Sebastian

Patch

From 35f74eaa5d4a79f3fcdfeae8bdf3b81751c6f1ba Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Wed, 16 Jun 2010 15:57:53 -0500
Subject: [PATCH 08/10] The predicate of a BB should either be a gimple_condexpr or an SSA_NAME.

	* tree-if-conv.c (add_to_predicate_list): Call force_gimple_operand
	when the predicate is not a gimple_condexpr.  Call reset_bb_predicate
	when the predicate is true.

	* gcc.dg/tree-ssa/ifc-6.c: New.
---
 gcc/testsuite/gcc.dg/tree-ssa/ifc-6.c |   15 +++++++++++++++
 gcc/tree-if-conv.c                    |   12 +++++++++++-
 2 files changed, 26 insertions(+), 1 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-6.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-6.c
new file mode 100644
index 0000000..a9c5db3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-6.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile } */
+/* { dg-options "-c -O2 -ftree-vectorize" { target *-*-* } } */
+
+static int x;
+foo (int n, int *A)
+{
+  int i;
+  for (i = 0; i < n; i++)
+    {
+      if (A[i])
+	x = 2;
+      if (A[i + 1])
+	x = 1;
+    }
+}
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index ac3ba93..759920c 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -278,7 +278,17 @@  add_to_predicate_list (basic_block bb, tree new_cond)
       cond = unfold_ssa_names (cond, 5);
     }
 
-  set_bb_predicate (bb, cond);
+  if (!is_gimple_condexpr (cond))
+    {
+      gimple_seq stmts;
+      cond = force_gimple_operand (cond, &stmts, true, NULL_TREE);
+      add_bb_predicate_gimplified_stmts (bb, stmts);
+    }
+
+  if (is_true_predicate (cond))
+    reset_bb_predicate (bb);
+  else
+    set_bb_predicate (bb, cond);
 }
 
 /* Add the condition COND to the previous condition PREV_COND, and add
-- 
1.7.0.4