diff mbox

[RFA,PR,tree-optimization/64058] Improve and stabilize sorting of coalesce pairs

Message ID 56E234EE.9080804@redhat.com
State New
Headers show

Commit Message

Jeff Law March 11, 2016, 3:01 a.m. UTC
As discussed in the BZ, we have multiple problems with how we sort the 
coalesce list during out-of-ssa coalescing.

First, the sort is not stable.  If the cost of two coalesce pairs is the 
same, we break the tie by looking at the underlying SSA_NAME_VERSION of 
the first, then the second elements in the coalesce pairs.

As a result, changes in SSA_NAME_VERSIONs in the IL can result in 
different coalescing during out-of-ssa.  That in turn can cause changes 
in what objects are coalesced, which in turn causes random performance 
changes.

This patch addresses that problem by recording an index for each 
coalescing pair discovered and using that index as the final tiebreaker 
rather than looking at SSA_NAME_VERSIONs.  That brings stability to the 
coalescing process and avoids a lot of unnecessary differences in the 
code we generate when SSA_NAME_VERSIONs change.

The second problem is our costing heuristic only looks at edge 
frequencies & flags.  It's actually a pretty good heuristic and captures 
the main goal of coalescing -- reducing the most commonly executed 
copies.  However, in the case where the edge frequencies/flags result in 
the same cost we can do better.

When we coalesce two SSA_NAMEs, we have to build the union of the 
conflicts of each of the SSA_NAMEs -- which means the resulting union 
object is less likely to be able to participate in further coalescing.

So given two coalescing pairs with the same primary cost, preferring the 
coalescing pair with the smaller resulting conflict set gives us a 
better chance that the resulting object will be able to participate in 
further coalescing.

That heuristic broadly mirrors one aspect of how iterated conservative 
coalescing works.  The other interesting heuristic (that I did not 
implement) was to favor coalescing of the pair which had a higher degree 
of common conflicts between the two nodes -- which broadly falls into 
the same category as what we're doing with this patch.  The key being 
that the conflict sets are an important thing to consider when coalescing.

Using the conflict sizes as a tie-breaker eliminates the regression in 
64058 and AFAICT also eliminates the regression in 68654 (the latter 
doesn't include a testcase or as in-depth analysis as 64058, but my 
testing indicates this patch should generate the desired code for both 
cases).

The patch has (of course) bootstrapped and regression tested on 
x86_64-linux-gnu.

I'd be curious for thoughts on how to build a testcase for this.  I 
could emit the conflict sizes along with the coalescing cost in the 
dumps, but that won't positively verify that we've done the preferred 
set of coalescings.

I might be able to look at the .expand dumps and perhaps look for copies 
on edges.  However, unless the only copies are the ones that were 
causing the regression, I suspect such a test would end up being rather 
fragile.

Other thoughts on how to get this under regression testing?  And of 
course, thoughts on the patch itself?

Thanks,
Jeff

Comments

Richard Biener March 11, 2016, 10:02 a.m. UTC | #1
On Fri, Mar 11, 2016 at 4:01 AM, Jeff Law <law@redhat.com> wrote:
>
> As discussed in the BZ, we have multiple problems with how we sort the
> coalesce list during out-of-ssa coalescing.
>
> First, the sort is not stable.  If the cost of two coalesce pairs is the
> same, we break the tie by looking at the underlying SSA_NAME_VERSION of the
> first, then the second elements in the coalesce pairs.
>
> As a result, changes in SSA_NAME_VERSIONs in the IL can result in different
> coalescing during out-of-ssa.  That in turn can cause changes in what
> objects are coalesced, which in turn causes random performance changes.
>
> This patch addresses that problem by recording an index for each coalescing
> pair discovered and using that index as the final tiebreaker rather than
> looking at SSA_NAME_VERSIONs.  That brings stability to the coalescing
> process and avoids a lot of unnecessary differences in the code we generate
> when SSA_NAME_VERSIONs change.
>
> The second problem is our costing heuristic only looks at edge frequencies &
> flags.  It's actually a pretty good heuristic and captures the main goal of
> coalescing -- reducing the most commonly executed copies.  However, in the
> case where the edge frequencies/flags result in the same cost we can do
> better.
>
> When we coalesce two SSA_NAMEs, we have to build the union of the conflicts
> of each of the SSA_NAMEs -- which means the resulting union object is less
> likely to be able to participate in further coalescing.
>
> So given two coalescing pairs with the same primary cost, preferring the
> coalescing pair with the smaller resulting conflict set gives us a better
> chance that the resulting object will be able to participate in further
> coalescing.
>
> That heuristic broadly mirrors one aspect of how iterated conservative
> coalescing works.  The other interesting heuristic (that I did not
> implement) was to favor coalescing of the pair which had a higher degree of
> common conflicts between the two nodes -- which broadly falls into the same
> category as what we're doing with this patch.  The key being that the
> conflict sets are an important thing to consider when coalescing.
>
> Using the conflict sizes as a tie-breaker eliminates the regression in 64058
> and AFAICT also eliminates the regression in 68654 (the latter doesn't
> include a testcase or as in-depth analysis as 64058, but my testing
> indicates this patch should generate the desired code for both cases).
>
> The patch has (of course) bootstrapped and regression tested on
> x86_64-linux-gnu.
>
> I'd be curious for thoughts on how to build a testcase for this.  I could
> emit the conflict sizes along with the coalescing cost in the dumps, but
> that won't positively verify that we've done the preferred set of
> coalescings.
>
> I might be able to look at the .expand dumps and perhaps look for copies on
> edges.  However, unless the only copies are the ones that were causing the
> regression, I suspect such a test would end up being rather fragile.
>
> Other thoughts on how to get this under regression testing?  And of course,
> thoughts on the patch itself?

Can you please split out the 'index' introduction as a separate patch
and apply that?
I think it is quite obviously a good idea and might make regression
hunting easier
later if needed.

For the other part I noticed a few things
 1) having a bitmap_count_ior_bits () would be an improvement
 2) you might end up with redundant work(?) as you are iterating
 over SSA name coalesce candidates but look at partition conflicts
 3) having this extra heuristic might be best guarded by
flag_expensive_optimizations
 as it is a quite expensive "tie breaker" - maybe even improve things
by first sorting
 after cost and then only doing the tie breaking when necessary, re-sorting the
 sub-sequence with same original cost.  It may also be enough to only perform
 this for "important" candidates, say within the first 100 of the function or so
 or with cost > X.

And finally - if we really think that looking at the conflict size
increase is the way to go
it would maybe be better to use a fibheap updating keys in attempt_coalesce
when we merge the conflicts.  That would also mean to work on a list (fibheap)
of coalesces of partitions rather than SSA names.

I think the patch is reasonable enough for GCC 6 if we can bring compile-time
cost down a bit (it can be quadratic in the number of SSA names if we have
a lot of coalesce candidates and nearly full conflict bitmaps - of course that's
not a case we handle very well right now but still).  I would have hoped the
index part of the patch fixed the regression (by luck)...

As far as a testcase goes we want to scan the dumps for the actual coalesces
being done.  Might be a bit fragile though...

Thanks,
Richard.

> Thanks,
> Jeff
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index cc91e84..f28baa2 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,18 @@
> +2016-03-10  Jeff Law  <law@redhat.com>
> +
> +       PR tree-optimization/64058
> +       * tree-ssa-coalesce.c (struct coalesce_pair): Add new fields
> +       CONFLICT_COUNT and INDEX.
> +       (num_coalesce_pairs): Move up earlier in the file.
> +       (find_coalesce_pair): Initialize the INDEX field for each pair
> +       discovered.
> +       (add_conflict_counts): New function to initialize the CONFLICT_COUNT
> +       field for each conflict pair.
> +       (coalesce_ssa_name): Call it.
> +       (compare_pairs): No longer sort on the elements of each pair.
> +       Instead break ties with the conflict size and finally the index
> +       of the coalesce pair.
> +
>  2016-03-10  Ulrich Weigand  <Ulrich.Weigand@de.ibm.com>
>
>         PR target/70168
> diff --git a/gcc/tree-ssa-coalesce.c b/gcc/tree-ssa-coalesce.c
> index 6624e7e..b8a2e0d 100644
> --- a/gcc/tree-ssa-coalesce.c
> +++ b/gcc/tree-ssa-coalesce.c
> @@ -50,6 +50,19 @@ struct coalesce_pair
>    int first_element;
>    int second_element;
>    int cost;
> +
> +  /* A count of the number of unique partitions this pair would conflict
> +     with if coalescing was successful.  This is the secondary sort key,
> +     given two pairs with equal costs, we will prefer the pair with a
> smaller
> +     conflict set.
> +
> +     Note this is not updated and propagated as pairs are coalesced.  */
> +  int conflict_count;
> +
> +  /* The order in which coalescing pairs are discovered is recorded in this
> +     field, which is used as the final tie breaker when sorting coalesce
> +     pairs.  */
> +  int index;
>  };
>
>  /* Coalesce pair hashtable helpers.  */
> @@ -254,6 +267,13 @@ delete_coalesce_list (coalesce_list *cl)
>    free (cl);
>  }
>
> +/* Return the number of unique coalesce pairs in CL.  */
> +
> +static inline int
> +num_coalesce_pairs (coalesce_list *cl)
> +{
> +  return cl->list->elements ();
> +}
>
>  /* Find a matching coalesce pair object in CL for the pair P1 and P2.  If
>     one isn't found, return NULL if CREATE is false, otherwise create a new
> @@ -290,6 +310,7 @@ find_coalesce_pair (coalesce_list *cl, int p1, int p2,
> bool create)
>        pair->first_element = p.first_element;
>        pair->second_element = p.second_element;
>        pair->cost = 0;
> +      pair->index = num_coalesce_pairs (cl);
>        *slot = pair;
>      }
>
> @@ -343,29 +364,21 @@ compare_pairs (const void *p1, const void *p2)
>    int result;
>
>    result = (* pp1)->cost - (* pp2)->cost;
> -  /* Since qsort does not guarantee stability we use the elements
> -     as a secondary key.  This provides us with independence from
> -     the host's implementation of the sorting algorithm.  */
> +  /* We use the size of the resulting conflict set as the secondary sort
> key.
> +     Given two equal costing coalesce pairs, we want to prefer the pair
> that
> +     has the smaller conflict set.  */
>    if (result == 0)
>      {
> -      result = (* pp2)->first_element - (* pp1)->first_element;
> +      result = (*pp2)->conflict_count - (*pp1)->conflict_count;
> +      /* And if everything else is equal, then sort based on which
> +        coalesce pair was found first.  */
>        if (result == 0)
> -       result = (* pp2)->second_element - (* pp1)->second_element;
> +       result = (*pp2)->index - (*pp1)->index;
>      }
>
>    return result;
>  }
>
> -
> -/* Return the number of unique coalesce pairs in CL.  */
> -
> -static inline int
> -num_coalesce_pairs (coalesce_list *cl)
> -{
> -  return cl->list->elements ();
> -}
> -
> -
>  /* Iterate over CL using ITER, returning values in PAIR.  */
>
>  #define FOR_EACH_PARTITION_PAIR(PAIR, ITER, CL)                \
> @@ -578,6 +591,42 @@ ssa_conflicts_merge (ssa_conflicts *ptr, unsigned x,
> unsigned y)
>      }
>  }
>
> +/* Compute and record how many unique conflicts would exist for the
> +   representative partition for each coalesce pair in CL.
> +
> +   CONFLICTS is the conflict graph and MAP is the current partition view.
> */
> +
> +static void
> +add_conflict_counts (ssa_conflicts *conflicts, var_map map, coalesce_list
> *cl)
> +{
> +  coalesce_pair *p;
> +  coalesce_iterator_type ppi;
> +  bitmap tmp = BITMAP_ALLOC (NULL);
> +
> +  FOR_EACH_PARTITION_PAIR (p, ppi, cl)
> +    {
> +      int p1 = var_to_partition (map, ssa_name (p->first_element));
> +      int p2 = var_to_partition (map, ssa_name (p->second_element));
> +
> +      /* 4 cases.  If both P1 and P2 have conflicts, then build their
> +        union and count the members.  Else handle the degenerate cases
> +        in the obvious ways.  */
> +      if (conflicts->conflicts[p1] && conflicts->conflicts[p2])
> +       {
> +         bitmap_ior (tmp, conflicts->conflicts[p1],
> conflicts->conflicts[p2]);
> +         p->conflict_count = bitmap_count_bits (tmp);
> +         bitmap_clear (tmp);
> +       }
> +      else if (conflicts->conflicts[p1])
> +       p->conflict_count = bitmap_count_bits (conflicts->conflicts[p1]);
> +      else if (conflicts->conflicts[p2])
> +       p->conflict_count = bitmap_count_bits (conflicts->conflicts[p2]);
> +      else
> +       p->conflict_count = 0;
> +    }
> +
> +  BITMAP_FREE (tmp);
> +}
>
>  /* Dump a conflicts graph.  */
>
> @@ -1802,6 +1851,7 @@ coalesce_ssa_name (void)
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      ssa_conflicts_dump (dump_file, graph);
>
> +  add_conflict_counts (graph, map, cl);
>    sort_coalesce_list (cl);
>
>    if (dump_file && (dump_flags & TDF_DETAILS))
>
Jeff Law March 14, 2016, 10:32 p.m. UTC | #2
On 03/11/2016 03:02 AM, Richard Biener wrote:
>
>
> For the other part I noticed a few things
>   1) having a bitmap_count_ior_bits () would be an improvement
Yea, I almost built one.  That's easy enough to add.

>   2) you might end up with redundant work(?) as you are iterating
>   over SSA name coalesce candidates but look at partition conflicts
We'd have redundant work if the elements mapped back to SSA_NAMEs which 
in turn mapped to partitions which appeared as a coalescing pair 
already.  But there's no way to know that in advance.

This is mitigated somewhat in the next version which computes the 
conflict sizes lazily when the qsort comparison function is given two 
conflict pairs with an equal cost.



>   3) having this extra heuristic might be best guarded by
> flag_expensive_optimizations
Perhaps.  I don't see this tie breaker as being all that expensive.  But 
I won't object to guarding with flag_expensive_optimizations.

>   as it is a quite expensive "tie breaker" - maybe even improve things
> by first sorting
>   after cost and then only doing the tie breaking when necessary, re-sorting the
>   sub-sequence with same original cost.  It may also be enough to only perform
>   this for "important" candidates, say within the first 100 of the function or so
>   or with cost > X.
The problem with this is qsort's interface into the comparison function 
has a terribly narrow API and I don't think we want to rely on qsort_r. 
  In fact that's the whole reason why I didn't do lazy evaluation on the 
conflict sizes initially.

To work around the narrow API in the comparison function we have to 
either store additional data in each node or have them available in 
globals.  The former would be horribly wasteful, the latter is just 
ugly.  I choose the latter in the lazy evaluation of the conflicts version.

>
> And finally - if we really think that looking at the conflict size
> increase is the way to go
> it would maybe be better to use a fibheap updating keys in attempt_coalesce
> when we merge the conflicts.  That would also mean to work on a list (fibheap)
> of coalesces of partitions rather than SSA names.
I really doubt it's worth this effort.  The literature I've been looking 
at in this space essentially says that given a reasonable coalescer, 
improvements, while measurable, are very very small in terms of the 
efficiency of the final code.

Thus I rejected conservative coalescing + iteration, biased coalescing, 
& trial coalescing as too expensive given the trivial benefit. 
Similarly I rejected trying to update the costs as we coalesce 
partitions.  A single successful coalesce could have a significant 
ripple effect.  Perhaps that could be mitigated by realizing that many 
updates wouldn't be needed, but it's just a level of complexity that's 
not needed here.

And note we work on partitions, not SSA_NAMEs.  It just happens that we 
start with each SSA_NAME in its own partition.  Most SSA_NAMEs actually 
don't participate in coalescing as they're not used in a copy 
instruction or as a phi arg/result.   That's why we compact the 
partitions after we've scanned the IL for names that are going to 
participate in coalescing.





>
> I think the patch is reasonable enough for GCC 6 if we can bring compile-time
> cost down a bit (it can be quadratic in the number of SSA names if we have
> a lot of coalesce candidates and nearly full conflict bitmaps - of course that's
> not a case we handle very well right now but still).  I would have hoped the
> index part of the patch fixed the regression (by luck)...
I'd hoped it'd fix the regression by luck as well, but that was not the 
case :(


>
> As far as a testcase goes we want to scan the dumps for the actual coalesces
> being done.  Might be a bit fragile though...
I suspect that's going to be quite fragile and may have more target 
dependencies than we'd like (due to branch costing and such).



Jeff
Trevor Saunders March 15, 2016, 1:08 a.m. UTC | #3
On Mon, Mar 14, 2016 at 04:32:06PM -0600, Jeff Law wrote:
> On 03/11/2016 03:02 AM, Richard Biener wrote:
> >
> >
> >For the other part I noticed a few things
> >  1) having a bitmap_count_ior_bits () would be an improvement
> Yea, I almost built one.  That's easy enough to add.
> 
> >  2) you might end up with redundant work(?) as you are iterating
> >  over SSA name coalesce candidates but look at partition conflicts
> We'd have redundant work if the elements mapped back to SSA_NAMEs which in
> turn mapped to partitions which appeared as a coalescing pair already.  But
> there's no way to know that in advance.
> 
> This is mitigated somewhat in the next version which computes the conflict
> sizes lazily when the qsort comparison function is given two conflict pairs
> with an equal cost.
> 
> 
> 
> >  3) having this extra heuristic might be best guarded by
> >flag_expensive_optimizations
> Perhaps.  I don't see this tie breaker as being all that expensive.  But I
> won't object to guarding with flag_expensive_optimizations.
> 
> >  as it is a quite expensive "tie breaker" - maybe even improve things
> >by first sorting
> >  after cost and then only doing the tie breaking when necessary, re-sorting the
> >  sub-sequence with same original cost.  It may also be enough to only perform
> >  this for "important" candidates, say within the first 100 of the function or so
> >  or with cost > X.
> The problem with this is qsort's interface into the comparison function has
> a terribly narrow API and I don't think we want to rely on qsort_r.  In fact
> that's the whole reason why I didn't do lazy evaluation on the conflict
> sizes initially.
> 
> To work around the narrow API in the comparison function we have to either
> store additional data in each node or have them available in globals.  The
> former would be horribly wasteful, the latter is just ugly.  I choose the
> latter in the lazy evaluation of the conflicts version.

its a bit ugly in C++98, but you can give std::sort a random object with
operator () to compare with.

Trev
Richard Biener March 15, 2016, 2:22 p.m. UTC | #4
On Mon, Mar 14, 2016 at 11:32 PM, Jeff Law <law@redhat.com> wrote:
> On 03/11/2016 03:02 AM, Richard Biener wrote:
>>
>>
>>
>> For the other part I noticed a few things
>>   1) having a bitmap_count_ior_bits () would be an improvement
>
> Yea, I almost built one.  That's easy enough to add.
>
>>   2) you might end up with redundant work(?) as you are iterating
>>   over SSA name coalesce candidates but look at partition conflicts
>
> We'd have redundant work if the elements mapped back to SSA_NAMEs which in
> turn mapped to partitions which appeared as a coalescing pair already.  But
> there's no way to know that in advance.
>
> This is mitigated somewhat in the next version which computes the conflict
> sizes lazily when the qsort comparison function is given two conflict pairs
> with an equal cost.

That sounds good.

>>   3) having this extra heuristic might be best guarded by
>> flag_expensive_optimizations
>
> Perhaps.  I don't see this tie breaker as being all that expensive.  But I
> won't object to guarding with flag_expensive_optimizations.

Yeah, we should first address the quadraticness of the live compute which is
usually what hits us first when hitting a bottleneck in coalescing / out-of-SSA.

>>   as it is a quite expensive "tie breaker" - maybe even improve things
>> by first sorting
>>   after cost and then only doing the tie breaking when necessary,
>> re-sorting the
>>   sub-sequence with same original cost.  It may also be enough to only
>> perform
>>   this for "important" candidates, say within the first 100 of the
>> function or so
>>   or with cost > X.
>
> The problem with this is qsort's interface into the comparison function has
> a terribly narrow API and I don't think we want to rely on qsort_r.  In fact
> that's the whole reason why I didn't do lazy evaluation on the conflict
> sizes initially.
>
> To work around the narrow API in the comparison function we have to either
> store additional data in each node or have them available in globals.  The
> former would be horribly wasteful, the latter is just ugly.  I choose the
> latter in the lazy evaluation of the conflicts version.

Works for me.

>>
>> And finally - if we really think that looking at the conflict size
>> increase is the way to go
>> it would maybe be better to use a fibheap updating keys in
>> attempt_coalesce
>> when we merge the conflicts.  That would also mean to work on a list
>> (fibheap)
>> of coalesces of partitions rather than SSA names.
>
> I really doubt it's worth this effort.  The literature I've been looking at
> in this space essentially says that given a reasonable coalescer,
> improvements, while measurable, are very very small in terms of the
> efficiency of the final code.
>
> Thus I rejected conservative coalescing + iteration, biased coalescing, &
> trial coalescing as too expensive given the trivial benefit. Similarly I
> rejected trying to update the costs as we coalesce partitions.  A single
> successful coalesce could have a significant ripple effect.  Perhaps that
> could be mitigated by realizing that many updates wouldn't be needed, but
> it's just a level of complexity that's not needed here.

Ok.

> And note we work on partitions, not SSA_NAMEs.  It just happens that we
> start with each SSA_NAME in its own partition.  Most SSA_NAMEs actually
> don't participate in coalescing as they're not used in a copy instruction or
> as a phi arg/result.   That's why we compact the partitions after we've
> scanned the IL for names that are going to participate in coalescing.
>
>
>
>
>
>>
>> I think the patch is reasonable enough for GCC 6 if we can bring
>> compile-time
>> cost down a bit (it can be quadratic in the number of SSA names if we have
>> a lot of coalesce candidates and nearly full conflict bitmaps - of course
>> that's
>> not a case we handle very well right now but still).  I would have hoped
>> the
>> index part of the patch fixed the regression (by luck)...
>
> I'd hoped it'd fix the regression by luck as well, but that was not the case
> :(
>
>
>>
>> As far as a testcase goes we want to scan the dumps for the actual
>> coalesces
>> being done.  Might be a bit fragile though...
>
> I suspect that's going to be quite fragile and may have more target
> dependencies than we'd like (due to branch costing and such).

Yes.

Otherwise -ENOPATCH.

Thanks,
Richard.

>
>
> Jeff
Jeff Law March 16, 2016, 5:10 p.m. UTC | #5
On 03/14/2016 07:08 PM, Trevor Saunders wrote:
>>
>> To work around the narrow API in the comparison function we have to either
>> store additional data in each node or have them available in globals.  The
>> former would be horribly wasteful, the latter is just ugly.  I choose the
>> latter in the lazy evaluation of the conflicts version.
>
> its a bit ugly in C++98, but you can give std::sort a random object with
> operator () to compare with.
So we could just wrap the object with a class that has a suitable 
operator?  I like that much more than mucking around with global 
variables.  Let me give that a whirl.

Jeff
Jeff Law March 16, 2016, 5:13 p.m. UTC | #6
On 03/15/2016 08:22 AM, Richard Biener wrote:
>> To work around the narrow API in the comparison function we have to either
>> store additional data in each node or have them available in globals.  The
>> former would be horribly wasteful, the latter is just ugly.  I choose the
>> latter in the lazy evaluation of the conflicts version.
>
> Works for me.
I'm going to take a look at Trevor's suggestion to use std::sort with a 
suitable class.  That may ultimately be cleaner.


>>> As far as a testcase goes we want to scan the dumps for the actual
>>> coalesces
>>> being done.  Might be a bit fragile though...
>>
>> I suspect that's going to be quite fragile and may have more target
>> dependencies than we'd like (due to branch costing and such).
>
> Yes.
>
> Otherwise -ENOPATCH.
Right.  I haven't written the part to count the number of unique bits 
across two bitmaps yet as exported function from bitmap.[ch] yet.  So no 
patch was included.  Off to do that now :-)

jeff
diff mbox

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index cc91e84..f28baa2 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,18 @@ 
+2016-03-10  Jeff Law  <law@redhat.com>
+
+	PR tree-optimization/64058
+	* tree-ssa-coalesce.c (struct coalesce_pair): Add new fields
+	CONFLICT_COUNT and INDEX.
+	(num_coalesce_pairs): Move up earlier in the file.
+	(find_coalesce_pair): Initialize the INDEX field for each pair
+	discovered.
+	(add_conflict_counts): New function to initialize the CONFLICT_COUNT
+	field for each conflict pair.
+	(coalesce_ssa_name): Call it.
+	(compare_pairs): No longer sort on the elements of each pair.
+	Instead break ties with the conflict size and finally the index
+	of the coalesce pair.
+
 2016-03-10  Ulrich Weigand  <Ulrich.Weigand@de.ibm.com>
 
 	PR target/70168
diff --git a/gcc/tree-ssa-coalesce.c b/gcc/tree-ssa-coalesce.c
index 6624e7e..b8a2e0d 100644
--- a/gcc/tree-ssa-coalesce.c
+++ b/gcc/tree-ssa-coalesce.c
@@ -50,6 +50,19 @@  struct coalesce_pair
   int first_element;
   int second_element;
   int cost;
+
+  /* A count of the number of unique partitions this pair would conflict
+     with if coalescing was successful.  This is the secondary sort key,
+     given two pairs with equal costs, we will prefer the pair with a smaller
+     conflict set.
+
+     Note this is not updated and propagated as pairs are coalesced.  */
+  int conflict_count;
+
+  /* The order in which coalescing pairs are discovered is recorded in this
+     field, which is used as the final tie breaker when sorting coalesce
+     pairs.  */
+  int index;
 };
 
 /* Coalesce pair hashtable helpers.  */
@@ -254,6 +267,13 @@  delete_coalesce_list (coalesce_list *cl)
   free (cl);
 }
 
+/* Return the number of unique coalesce pairs in CL.  */
+
+static inline int
+num_coalesce_pairs (coalesce_list *cl)
+{
+  return cl->list->elements ();
+}
 
 /* Find a matching coalesce pair object in CL for the pair P1 and P2.  If
    one isn't found, return NULL if CREATE is false, otherwise create a new
@@ -290,6 +310,7 @@  find_coalesce_pair (coalesce_list *cl, int p1, int p2, bool create)
       pair->first_element = p.first_element;
       pair->second_element = p.second_element;
       pair->cost = 0;
+      pair->index = num_coalesce_pairs (cl);
       *slot = pair;
     }
 
@@ -343,29 +364,21 @@  compare_pairs (const void *p1, const void *p2)
   int result;
 
   result = (* pp1)->cost - (* pp2)->cost;
-  /* Since qsort does not guarantee stability we use the elements
-     as a secondary key.  This provides us with independence from
-     the host's implementation of the sorting algorithm.  */
+  /* We use the size of the resulting conflict set as the secondary sort key.
+     Given two equal costing coalesce pairs, we want to prefer the pair that
+     has the smaller conflict set.  */
   if (result == 0)
     {
-      result = (* pp2)->first_element - (* pp1)->first_element;
+      result = (*pp2)->conflict_count - (*pp1)->conflict_count;
+      /* And if everything else is equal, then sort based on which
+	 coalesce pair was found first.  */
       if (result == 0)
-	result = (* pp2)->second_element - (* pp1)->second_element;
+	result = (*pp2)->index - (*pp1)->index;
     }
 
   return result;
 }
 
-
-/* Return the number of unique coalesce pairs in CL.  */
-
-static inline int
-num_coalesce_pairs (coalesce_list *cl)
-{
-  return cl->list->elements ();
-}
-
-
 /* Iterate over CL using ITER, returning values in PAIR.  */
 
 #define FOR_EACH_PARTITION_PAIR(PAIR, ITER, CL)		\
@@ -578,6 +591,42 @@  ssa_conflicts_merge (ssa_conflicts *ptr, unsigned x, unsigned y)
     }
 }
 
+/* Compute and record how many unique conflicts would exist for the
+   representative partition for each coalesce pair in CL.
+
+   CONFLICTS is the conflict graph and MAP is the current partition view.  */
+
+static void
+add_conflict_counts (ssa_conflicts *conflicts, var_map map, coalesce_list *cl)
+{
+  coalesce_pair *p;
+  coalesce_iterator_type ppi;
+  bitmap tmp = BITMAP_ALLOC (NULL);
+
+  FOR_EACH_PARTITION_PAIR (p, ppi, cl)
+    {
+      int p1 = var_to_partition (map, ssa_name (p->first_element));
+      int p2 = var_to_partition (map, ssa_name (p->second_element));
+
+      /* 4 cases.  If both P1 and P2 have conflicts, then build their
+	 union and count the members.  Else handle the degenerate cases
+	 in the obvious ways.  */
+      if (conflicts->conflicts[p1] && conflicts->conflicts[p2])
+	{
+	  bitmap_ior (tmp, conflicts->conflicts[p1], conflicts->conflicts[p2]);
+	  p->conflict_count = bitmap_count_bits (tmp);
+	  bitmap_clear (tmp);
+	}
+      else if (conflicts->conflicts[p1])
+	p->conflict_count = bitmap_count_bits (conflicts->conflicts[p1]);
+      else if (conflicts->conflicts[p2])
+	p->conflict_count = bitmap_count_bits (conflicts->conflicts[p2]);
+      else
+	p->conflict_count = 0;
+    }
+
+  BITMAP_FREE (tmp);
+}
 
 /* Dump a conflicts graph.  */
 
@@ -1802,6 +1851,7 @@  coalesce_ssa_name (void)
   if (dump_file && (dump_flags & TDF_DETAILS))
     ssa_conflicts_dump (dump_file, graph);
 
+  add_conflict_counts (graph, map, cl);
   sort_coalesce_list (cl);
 
   if (dump_file && (dump_flags & TDF_DETAILS))