[0/n] Merge from match-and-simplify

Message ID 20141017182811.GA14499@f1.c.bardezibar.internal
State New

Commit Message

Sebastian Pop Oct. 17, 2014, 6:28 p.m. UTC
Sebastian Pop wrote:
> Richard Biener wrote:
> > looks like
> > RTL issues and/or IVOPTs issues?
> 
> I should have posted the first diff between the compilers with -fdump-tree-all:
> that would expose the problem at its root.

Looks like this is caused by the forwprop pass:

Comments

Richard Biener Oct. 20, 2014, 11:42 a.m. UTC | #1
On Fri, 17 Oct 2014, Sebastian Pop wrote:

> Sebastian Pop wrote:
> > Richard Biener wrote:
> > > looks like
> > > RTL issues and/or IVOPTs issues?
> > 
> > I should have posted the first diff between the compilers with -fdump-tree-all:
> > that would expose the problem at its root.
> 
> Looks like this is caused by the forwprop pass:
> 
> diff -u -r ./foo.i.087t.forwprop3 ../mas/foo.i.087t.forwprop3
> --- ./foo.i.087t.forwprop3      2014-10-17 13:17:29.985327000 -0500
> +++ ../mas/foo.i.087t.forwprop3 2014-10-17 13:17:29.308814000 -0500
> @@ -5,6 +5,8 @@
>  Pass statistics:
>  ----------------
>  
> +Applying pattern match-comparison.pd:43, gimple-match.c:11747
> +gimple_simplified to if (i_20 != 99)
>  
>  Pass statistics:
>  ----------------
> @@ -60,7 +62,7 @@
>    i_17 = i_20 + 1;
>    # DEBUG iD.2450 => i_17
>    # DEBUG iD.2450 => i_17
> -  if (i_17 != 100)
> +  if (i_20 != 99)
>      goto <bb 3>;
>    else
>      goto <bb 4>;

Ok, so this is one effect of the thing Marc pointed out: currently
no patterns (well, none but one) guard themselves with has_single_use
predicates.

That was a conscious decision and the idea was that the caller should
do this via its lattice valueization function which could look like

tree
valueize (tree t)
{
  if (TREE_CODE (t) == SSA_NAME
      && !has_single_use (t))
    return NULL_TREE;
  return t;
}

But of course doing that unconditionally would also pessimize code.
Generally we'd like to avoid un-CSEing stuff in a way that cannot
be CSEd again.  That's a more complex condition than what can be
implemented with has_single_use.  You might also consider a
stmt doing a_1 + a_1 where a_1 has two uses now.
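
As a rough illustration of that hook, here is a self-contained toy model
in C.  The struct toy_ssa_name and the function toy_valueize are
hypothetical stand-ins, not GCC APIs; the real hook works on tree and
has_single_use ().  The use counts are an assumption taken from the ccp3
dump in this thread, where i_17 has two uses before the simplification.

```c
#include <stddef.h>

/* Hypothetical stand-in for an SSA name; GCC's real `tree` is far richer.  */
struct toy_ssa_name
{
  const char *name;
  int num_uses;   /* number of non-debug uses of this name */
};

/* Toy version of the valueize hook: returning NULL tells the caller
   not to look through the definition of T when simplifying.  */
static const struct toy_ssa_name *
toy_valueize (const struct toy_ssa_name *t)
{
  if (t != NULL && t->num_uses != 1)
    return NULL;
  return t;
}
```

In this model toy_valueize returns NULL for a name like i_17 with two
uses, so a caller honoring the hook would not look through
i_17 = i_20 + 1 and the exit test would stay if (i_17 != 100).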

For Sebastian's case above the issue is that we are apparently
bad at optimizing post-increment exit tests.  But if you'd consider
code like

  i_2 = i_1 + 1;
  b1_3 = i_2 < 100;
  b2_4 = i_2 > 50;
  if (b1_3 && b2_4)
    ...

then it is profitable to remove i_2 by changing the two comparisons
to i_1 <= 98 and i_1 > 49.
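
The arithmetic behind that rewrite can be checked with a small
stand-alone C sketch.  Removing i_2 means the comparisons end up on i_1:
i_1 + 1 < 100 becomes i_1 <= 98 and i_1 + 1 > 50 becomes i_1 > 49.  The
function names below are made up for illustration, and the check assumes
i_1 + 1 does not overflow.

```c
#include <stdbool.h>

/* Original form: both comparisons use i_2 = i_1 + 1.  */
static bool
cond_original (int i_1)
{
  int i_2 = i_1 + 1;
  return i_2 < 100 && i_2 > 50;
}

/* Rewritten form: comparisons expressed directly on i_1, so i_2
   is no longer needed.  */
static bool
cond_rewritten (int i_1)
{
  return i_1 <= 98 && i_1 > 49;
}

/* Count the inputs in [lo, hi] on which the two forms disagree.
   The caller must keep hi below INT_MAX so that i_1 + 1 and the
   loop increment cannot overflow.  */
static int
count_mismatches (int lo, int hi)
{
  int mismatches = 0;
  for (int i = lo; i <= hi; i++)
    if (cond_original (i) != cond_rewritten (i))
      mismatches++;
  return mismatches;
}
```

Calling count_mismatches over a range that straddles 49 and 99 exercises
both boundaries of the rewritten comparisons.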

I thought about doing all simplifications first without committing
any simplified sequence to the IL, then scanning over the result,
pruning out cases that end up pessimizing code (how exactly isn't
yet clear to me).

So I'm not sure what we want to do here now.  I don't very much like
doing things explicitly in the pattern description (nor using the
"has_single_use" predicate).
I suppose for the gimple_build () stuff we could restrict simplifications
to the expression we are building (not simplifying with SSA defs in the
IL), more exactly mimicking fold_buildN behavior.
I suppose for forwprop we could use the above valueize hook (but then
regress because not all patterns as implemented in forwprop guard
their def stmt lookup with has_single_use...).

Any opinion on this?  Any idea of a "simple" cost function if
you have the function's IL before and after simplifications (but
without any DCE/CSE applied)?

Thanks,
Richard.
Jeff Law Oct. 22, 2014, 8:40 p.m. UTC | #2
On 10/20/14 05:42, Richard Biener wrote:
> That was a conscious decision and the idea was that the caller should
> do this via its lattice valueization function which could look like
>
> tree
> valueize (tree t)
> {
>    if (TREE_CODE (t) == SSA_NAME
>        && !has_single_use (t))
>      return NULL_TREE;
>    return t;
> }
>
> But of course doing that unconditionally would also pessimize code.
> Generally we'd like to avoid un-CSEing stuff in a way that cannot
> be CSEd again.  That's a more complex condition than what can be
> implemented with has_single_use.  You might also consider a
> stmt doing a_1 + a_1 where a_1 has two uses now.
FWIW, I wouldn't worry much about the two uses in a single statement
case.  I looked at that in RTL eons ago; it just doesn't happen enough to
bother trying to detect and treat as a single use.

>
> I thought about doing all simplifications first without committing
> any simplified sequence to the IL, then scanning over the result,
> pruning out cases that end up pessimizing code (how exactly isn't
> yet clear to me).
>
> So I'm not sure what we want to do here now.  I don't very much like
> doing things explicitly in the pattern description (nor using the
> "has_single_use" predicate).
> I suppose for the gimple_build () stuff we could restrict simplifications
> to the expression we are building (not simplifying with SSA defs in the
> IL), more exactly mimicking fold_buildN behavior.
> I suppose for forwprop we could use the above valueize hook (but then
> regress because not all patterns as implemented in forwprop guard
> their def stmt lookup with has_single_use...).
>
> Any opinion on this?  Any idea of a "simple" cost function if
> you have the function's IL before and after simplifications (but
> without any DCE/CSE applied)?
It's certainly ideal to be able to CSE/un-CSE depending on
final context, and it's a design goal I've heard other compiler
developers state.  I.e., every early transformation which may be somewhat
speculative must be "un-doable" later.  But the infrastructure for that
is, umm, hard.

The concept of simplify on the side, then prune out stuff that isn't 
profitable is nice, but as you state, that's nontrivial as well.

In general, the has_single_use case is profitable.  So we want to 
aggressively go after those and I think we can commit those immediately 
and use the valueize function shown above.

Maybe you then look at the more speculative cases...

jeff

Patch

diff -u -r ./foo.i.087t.forwprop3 ../mas/foo.i.087t.forwprop3
--- ./foo.i.087t.forwprop3      2014-10-17 13:17:29.985327000 -0500
+++ ../mas/foo.i.087t.forwprop3 2014-10-17 13:17:29.308814000 -0500
@@ -5,6 +5,8 @@ 
 Pass statistics:
 ----------------
 
+Applying pattern match-comparison.pd:43, gimple-match.c:11747
+gimple_simplified to if (i_20 != 99)
 
 Pass statistics:
 ----------------
@@ -60,7 +62,7 @@ 
   i_17 = i_20 + 1;
   # DEBUG iD.2450 => i_17
   # DEBUG iD.2450 => i_17
-  if (i_17 != 100)
+  if (i_20 != 99)
     goto <bb 3>;
   else
     goto <bb 4>;

[...]
diff -u -r ./foo.i.089t.ccp3 ../mas/foo.i.089t.ccp3
--- ./foo.i.089t.ccp3   2014-10-17 13:17:29.991734000 -0500
+++ ../mas/foo.i.089t.ccp3      2014-10-17 13:17:29.316140000 -0500
@@ -53,13 +53,13 @@ 
 # VUSE <.MEM_16>
 return;
 
-i_17 : -->2 uses.
+i_17 : --> single use.
 i_20 = PHI <i_17(3), 0(2)>
 # DEBUG i => i_17
-if (i_17 != 100)
 # DEBUG i => i_17
 
-i_20 : -->2 uses.
+i_20 : -->3 uses.
+if (i_20 != 99)
 i_17 = i_20 + 1;
 _4 = (long unsigned int) i_20;
 # DEBUG i => i_20