
Keep static VTA locs in cselib tables only

Message ID: or7h2rup3b.fsf@livre.localdomain
State: New

Commit Message

Alexandre Oliva Nov. 23, 2011, 10:10 a.m. UTC
This patch reduces VTA memory consumption and even speeds it up
somewhat, by avoiding recording permanent equivalences in dataflow sets
(and propagating them all the way down the control flow graph), keeping
them in cselib tables only.  This saves some micro-operations, some
duplicate attempts to expand the same complex operations, and, most of
all, time and memory for locations in dataflow sets.

I've also moved reverse operations and entry values, which are also
permanent equivalences, to cselib tables, introducing a mechanism to add
equivalences there that doesn't depend on expressions hashing equal:
instead, locs for values in an equivalence set are grouped in the loc
list for the earliest (canonical) value in the set, in the cselib
tables, with a single entry in the loc list of each of the other set
members pointing to the canonical value.
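
To give an idea of the shape this takes, here's a minimal standalone
sketch (toy types only, not the actual cselib_val/elt_loc_list
declarations) of how the single-entry loc list of a non-canonical value
leads back to the canonical member holding the grouped locs, mirroring
the canonical_cselib_val helper introduced below:

/* Toy model of the equivalence grouping: a value whose loc list is a
   single entry naming another VALUE is a non-canonical member of an
   equivalence set; all concrete locs live on the canonical
   (lowest-uid) member.  */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct toy_val;

struct toy_loc
{
  struct toy_loc *next;
  struct toy_val *equiv;	/* Non-NULL: a VALUE link.  */
  const char *expr;		/* Otherwise: a concrete location.  */
};

struct toy_val
{
  int uid;
  struct toy_loc *locs;
};

/* Follow the single-entry VALUE link towards the earliest (lowest
   uid) member of the set, as canonical_cselib_val does.  */
static struct toy_val *
toy_canonical (struct toy_val *v)
{
  if (!v->locs || v->locs->next || !v->locs->equiv
      || v->uid < v->locs->equiv->uid)
    return v;
  assert (toy_canonical (v->locs->equiv) == v->locs->equiv);
  return v->locs->equiv;
}

int
main (void)
{
  /* canon (uid 1) holds the grouped concrete locs; v2 (uid 2) only
     carries a single link back to it.  */
  struct toy_loc reg = { NULL, NULL, "(reg:SI 0)" };
  struct toy_val canon = { 1, &reg };
  struct toy_loc link = { NULL, &canon, NULL };
  struct toy_val v2 = { 2, &link };

  printf ("canonical of uid %d is uid %d\n",
	  v2.uid, toy_canonical (&v2)->uid);
  return 0;
}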

The downside is that we don't sort loc lists in cselib as we do in
var-tracking, so we don't give expressions the same preferences we did
before, which means there's some potential for debug info degradation,
particularly from preferring entry value expressions over concrete
expressions guaranteed to have an available value.  I'm going to see
whether sorting gets us better/faster results next, but just sorting
them won't get us all the way: while before we'd sort all equivalences
for the var-tracking-canonical value, we may now fail to merge
location lists because the static equivalences aren't taken into account
when merging dynamic equivalence sets in var-tracking dataflow sets.  I
haven't thought about whether this makes much of a difference, or how to
do that efficiently if desirable, but I figured I wouldn't wait any
longer before submitting this patch for 4.7.
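
(Purely for illustration, and not part of this patch: if we do end up
sorting cselib loc lists, the preference order would presumably rank
concrete locations ahead of equivalence links and entry values, along
the lines of this hypothetical toy comparator.)

/* Hypothetical ranking for loc list sorting: concrete registers and
   memory first, VALUE equivalence links next, entry values last, so
   that expressions with an available value win over ENTRY_VALUEs.  */
#include <stdio.h>
#include <stdlib.h>

enum toy_loc_kind { TOY_REG, TOY_MEM, TOY_VALUE, TOY_ENTRY_VALUE };

static int
toy_loc_rank (enum toy_loc_kind kind)
{
  switch (kind)
    {
    case TOY_REG:
    case TOY_MEM:
      return 0;		/* Concrete, directly usable.  */
    case TOY_VALUE:
      return 1;		/* Needs chasing through the set.  */
    default:
      return 2;		/* ENTRY_VALUE: last resort.  */
    }
}

static int
toy_loc_cmp (const void *a, const void *b)
{
  return (toy_loc_rank (*(const enum toy_loc_kind *) a)
	  - toy_loc_rank (*(const enum toy_loc_kind *) b));
}

int
main (void)
{
  enum toy_loc_kind locs[] = { TOY_ENTRY_VALUE, TOY_VALUE, TOY_REG };

  qsort (locs, sizeof locs / sizeof locs[0], sizeof locs[0], toy_loc_cmp);
  /* After sorting, the preferred (concrete) loc comes first.  */
  printf ("first loc kind after sort: %d\n", (int) locs[0]);
  return 0;
}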

This was regstrapped on i686-pc-linux-gnu and x86_64-linux-gnu.  I've
also run some debug info, memory and compile-time measurements:

- compiling stage3-gcc/ (--enable-languages=all,ada) became some 1-2%
faster on average (0.5% to 5% speedups were observed over 3
measurements)

- comparable speedups with a not-very-random sample of preprocessed
sources that used to be VTA bad-performers, with var-tracking memory use
down by 10% to 50%.

- compiling stage2 target libs and stage3 host patched sources (with
both unpatched and patched stage2 compiler) produced cc1plus with 10%
fewer entry value expressions (a welcome surprise!), 1% fewer call site
value expressions, an increase of 0.1% in the total number of variables
with location lists, and a decrease of less than 0.5% in variables with
full coverage.

Here's the patch.  Ok to install?

Comments

Jakub Jelinek Nov. 25, 2011, 2:51 p.m. UTC | #1
On Wed, Nov 23, 2011 at 08:10:00AM -0200, Alexandre Oliva wrote:
> - compiling stage2 target libs and stage3 host patched sources (with
> both unpatched and patched stage2 compiler) produced cc1plus with 10%
> fewer entry value expressions (a welcome surprise!), 1% fewer call site
> value expressions, an increase of 0.1% in the total number of variables
> with location lists and less than 0.5% decrease in variables with full
> coverage.

The numbers I got with your patch (RTL checking) are below; it seems
the cumulative numbers other than 100% are all bigger with the patched
stage2, which unfortunately means debug info quality degradation.  Have
you analysed, at least on some shorter testcases, why that happens?

Otherwise the patch looks good to me.

x86_64 patched stage3 compiled by vanilla stage2
cov%	samples	cumul
0.0	230172/32%	230172/32%
0..10	12267/1%	242439/34%
11..20	10548/1%	252987/35%
21..30	17018/2%	270005/37%
31..40	16374/2%	286379/40%
41..50	17533/2%	303912/42%
51..60	13051/1%	316963/44%
61..70	13946/1%	330909/46%
71..80	19627/2%	350536/49%
81..90	28877/4%	379413/53%
91..99	85086/11%	464499/65%
100	246568/34%	711067/100%
x86_64 patched stage3 compiled by patched stage2
cov%	samples	cumul
0.0	230182/32%	230182/32%
0..10	12319/1%	242501/34%
11..20	10765/1%	253266/35%
21..30	17390/2%	270656/38%
31..40	16745/2%	287401/40%
41..50	17821/2%	305222/42%
51..60	13306/1%	318528/44%
61..70	14104/1%	332632/46%
71..80	19795/2%	352427/49%
81..90	29030/4%	381457/53%
91..99	85171/11%	466628/65%
100	244439/34%	711067/100%
i686 patched stage3 compiled by vanilla stage2
cov%	samples	cumul
0.0	225909/32%	225909/32%
0..10	12420/1%	238329/34%
11..20	10693/1%	249022/35%
21..30	17102/2%	266124/38%
31..40	13529/1%	279653/40%
41..50	17232/2%	296885/42%
51..60	12568/1%	309453/44%
61..70	14769/2%	324222/46%
71..80	14937/2%	339159/48%
81..90	23868/3%	363027/52%
91..99	86306/12%	449333/64%
100	245327/35%	694660/100%
i686 patched stage3 compiled by patched stage2
cov%	samples	cumul
0.0	225917/32%	225917/32%
0..10	12471/1%	238388/34%
11..20	10848/1%	249236/35%
21..30	17292/2%	266528/38%
31..40	13716/1%	280244/40%
41..50	17324/2%	297568/42%
51..60	12673/1%	310241/44%
61..70	14950/2%	325191/46%
71..80	15085/2%	340276/48%
81..90	24019/3%	364295/52%
91..99	86228/12%	450523/64%
100	244137/35%	694660/100%

	Jakub
Andreas Krebbel Jan. 2, 2012, 4:47 p.m. UTC | #2
Hi,

this seems to have caused a bootstrap failure on s390x: PR51735

/home/andreas/git/gcc-head/gcc/tree-ssa-pre.c: In function ‘bool
insert_aux(basic_block)’:
/home/andreas/git/gcc-head/gcc/tree-ssa-pre.c:3791:1: internal compiler error:
Segmentation fault

Bye,

-Andreas-

On 11/23/2011 11:10 AM, Alexandre Oliva wrote:
> This patch reduces VTA memory consumption and even speeds it up
> somewhat, by avoiding recording permanent equivalences in dataflow sets
> (and propagating them all the way down the control flow graph), keeping
> them in cselib tables only.  This saves some micro-operations, some
> duplicate attempts to expand the same complex operations, and, most of
> all, time and memory for locations in dataflow sets.
> 
> I've also moved reverse operations and entry values, which are also
> permanent equivalences, to cselib tables, introducing a mechanism to add
> equivalences there that doesn't depend on expressions hashing equal:
> instead, locs for values in an equivalence set are grouped in the loc
> list for the earliest (canonical) value in the set, in the cselib
> tables, with a single entry in the loc list of each of the other set
> members pointing to the canonical value.
> 
> The downside is that we don't sort loc lists in cselib as we do in
> var-tracking, so we don't give expressions the same preferences we did
> before, which means there's some potential for debug info degradation,
> particularly from preferring entry value expressions over concrete
> expressions guaranteed to have an available value.  I'm going to see
> whether sorting gets us better/faster results next, but just sorting
> them won't get us all the way: while before we'd sort all equivalences
> for the var-tracking-canonical value, we may now fail to merge
> location lists because the static equivalences aren't taken into account
> when merging dynamic equivalence sets in var-tracking dataflow sets.  I
> haven't thought about whether this makes much of a difference, or how to
> do that efficiently if desirable, but I figured I wouldn't wait any
> longer before submitting this patch for 4.7.
> 
> This was regstrapped on i686-pc-linux-gnu and x86_64-linux-gnu.  I've
> also run some debug info, memory and compile-time measurements:
> 
> - compiling stage3-gcc/ (--enable-languages=all,ada) became some 1-2%
> faster on average (0.5% to 5% speedups were observed over 3
> measurements)
> 
> - comparable speedups with a not-very-random sample of preprocessed
> sources that used to be VTA bad-performers, with var-tracking memory use
> down by 10% to 50%.
> 
> - compiling stage2 target libs and stage3 host patched sources (with
> both unpatched and patched stage2 compiler) produced cc1plus with 10%
> fewer entry value expressions (a welcome surprise!), 1% fewer call site
> value expressions, an increase of 0.1% in the total number of variables
> with location lists, and a decrease of less than 0.5% in variables with
> full coverage.
> 
> Here's the patch.  Ok to install?

Patch

for  gcc/ChangeLog
from  Alexandre Oliva  <aoliva@redhat.com>

	* cselib.h (cselib_add_permanent_equiv): Declare.
	(canonical_cselib_val): New.
	* cselib.c (new_elt_loc_list): Rework to support value
	equivalences.  Adjust all callers.
	(preserve_only_constants): Retain value equivalences.
	(references_value_p): Retain preserved values.
	(rtx_equal_for_cselib_1): Handle value equivalences.
	(cselib_invalidate_regno): Use canonical value.
	(cselib_add_permanent_equiv): New.
	* alias.c (find_base_term): Reset locs lists while recursing.
	* var-tracking.c (val_bind): New.  Don't add equivalences
	present in cselib table, compared with code moved from...
	(val_store): ... here.
	(val_resolve): Use val_bind.
	(VAL_EXPR_HAS_REVERSE): Drop.
	(add_uses): Do not create MOps for addresses.  Do not mark
	non-REG non-MEM expressions as requiring resolution.
	(reverse_op): Record reverse as a cselib equivalence.
	(add_stores): Use it.  Do not create MOps for addresses.
	Do not require resolution for non-REG non-MEM expressions.
	Simplify support for reverse operations.
	(compute_bb_dataflow): Drop reverse support.
	(emit_notes_in_bb): Likewise.
	(create_entry_value): Rename to...
	(record_entry_value): ... this.  Use cselib equivalences.
	(vt_add_function_parameter): Adjust.

Index: gcc/cselib.h
===================================================================
--- gcc/cselib.h.orig	2011-11-21 02:23:54.111708716 -0200
+++ gcc/cselib.h	2011-11-21 05:31:42.176203099 -0200
@@ -96,5 +96,24 @@  extern void cselib_preserve_value (cseli
 extern bool cselib_preserved_value_p (cselib_val *);
 extern void cselib_preserve_only_values (void);
 extern void cselib_preserve_cfa_base_value (cselib_val *, unsigned int);
+extern void cselib_add_permanent_equiv (cselib_val *, rtx, rtx);
 
 extern void dump_cselib_table (FILE *);
+
+/* Return the canonical value for VAL, following the equivalence chain
+   towards the earliest (== lowest uid) equivalent value.  */
+
+static inline cselib_val *
+canonical_cselib_val (cselib_val *val)
+{
+  cselib_val *canon;
+
+  if (!val->locs || val->locs->next
+      || !val->locs->loc || GET_CODE (val->locs->loc) != VALUE
+      || val->uid < CSELIB_VAL_PTR (val->locs->loc)->uid)
+    return val;
+
+  canon = CSELIB_VAL_PTR (val->locs->loc);
+  gcc_checking_assert (canonical_cselib_val (canon) == canon);
+  return canon;
+}
Index: gcc/cselib.c
===================================================================
--- gcc/cselib.c.orig	2011-11-21 02:23:54.110708730 -0200
+++ gcc/cselib.c	2011-11-21 08:07:25.290452455 -0200
@@ -55,7 +55,7 @@  static bool cselib_preserve_constants;
 static int entry_and_rtx_equal_p (const void *, const void *);
 static hashval_t get_value_hash (const void *);
 static struct elt_list *new_elt_list (struct elt_list *, cselib_val *);
-static struct elt_loc_list *new_elt_loc_list (struct elt_loc_list *, rtx);
+static void new_elt_loc_list (cselib_val *, rtx);
 static void unchain_one_value (cselib_val *);
 static void unchain_one_elt_list (struct elt_list **);
 static void unchain_one_elt_loc_list (struct elt_loc_list **);
@@ -223,26 +223,75 @@  new_elt_list (struct elt_list *next, cse
   return el;
 }
 
-/* Allocate a struct elt_loc_list and fill in its two elements with the
-   arguments.  */
+/* Allocate a struct elt_loc_list with LOC and prepend it to VAL's loc
+   list.  */
 
-static inline struct elt_loc_list *
-new_elt_loc_list (struct elt_loc_list *next, rtx loc)
+static inline void
+new_elt_loc_list (cselib_val *val, rtx loc)
 {
-  struct elt_loc_list *el;
-  el = (struct elt_loc_list *) pool_alloc (elt_loc_list_pool);
-  el->next = next;
-  el->loc = loc;
-  el->setting_insn = cselib_current_insn;
-  gcc_assert (!next || !next->setting_insn
-	      || !DEBUG_INSN_P (next->setting_insn));
+  struct elt_loc_list *el, *next = val->locs;
+
+  gcc_checking_assert (!next || !next->setting_insn
+		       || !DEBUG_INSN_P (next->setting_insn)
+		       || cselib_current_insn == next->setting_insn);
 
   /* If we're creating the first loc in a debug insn context, we've
      just created a debug value.  Count it.  */
   if (!next && cselib_current_insn && DEBUG_INSN_P (cselib_current_insn))
     n_debug_values++;
 
-  return el;
+  val = canonical_cselib_val (val);
+  next = val->locs;
+
+  if (GET_CODE (loc) == VALUE)
+    {
+      loc = canonical_cselib_val (CSELIB_VAL_PTR (loc))->val_rtx;
+
+      gcc_checking_assert (PRESERVED_VALUE_P (loc)
+			   == PRESERVED_VALUE_P (val->val_rtx));
+
+      if (val->val_rtx == loc)
+	return;
+      else if (val->uid > CSELIB_VAL_PTR (loc)->uid)
+	{
+	  /* Reverse the insertion.  */
+	  new_elt_loc_list (CSELIB_VAL_PTR (loc), val->val_rtx);
+	  return;
+	}
+
+      gcc_checking_assert (val->uid < CSELIB_VAL_PTR (loc)->uid);
+
+      if (CSELIB_VAL_PTR (loc)->locs)
+	{
+	  /* Bring all locs from LOC to VAL.  */
+	  for (el = CSELIB_VAL_PTR (loc)->locs; el->next; el = el->next)
+	    {
+	      /* Adjust values that have LOC as canonical so that VAL
+		 becomes their canonical.  */
+	      if (el->loc && GET_CODE (el->loc) == VALUE)
+		{
+		  gcc_checking_assert (CSELIB_VAL_PTR (el->loc)->locs->loc
+				       == loc);
+		  CSELIB_VAL_PTR (el->loc)->locs->loc = val->val_rtx;
+		}
+	    }
+	  el->next = val->locs;
+	  next = val->locs = CSELIB_VAL_PTR (loc)->locs;
+	}
+
+      /* Chain LOC back to VAL.  */
+      el = (struct elt_loc_list *) pool_alloc (elt_loc_list_pool);
+      el->loc = val->val_rtx;
+      el->setting_insn = cselib_current_insn;
+      el->next = NULL;
+      CSELIB_VAL_PTR (loc)->locs = el;
+    }
+
+  el = (struct elt_loc_list *) pool_alloc (elt_loc_list_pool);
+  el->loc = loc;
+  el->setting_insn = cselib_current_insn;
+  el->next = next;
+  val->locs = el;
 }
 
 /* Promote loc L to a nondebug cselib_current_insn if L is marked as
@@ -320,6 +369,7 @@  static int
 preserve_only_constants (void **x, void *info ATTRIBUTE_UNUSED)
 {
   cselib_val *v = (cselib_val *)*x;
+  struct elt_loc_list *l;
 
   if (v->locs != NULL
       && v->locs->next == NULL)
@@ -338,14 +388,11 @@  preserve_only_constants (void **x, void 
 	    return 1;
 	}
     }
-  /* Keep around VALUEs that forward function invariant ENTRY_VALUEs
-     to corresponding parameter VALUEs.  */
-  if (v->locs != NULL
-      && v->locs->next != NULL
-      && v->locs->next->next == NULL
-      && GET_CODE (v->locs->next->loc) == ENTRY_VALUE
-      && GET_CODE (v->locs->loc) == VALUE)
-    return 1;
+
+  /* Keep VALUE equivalences around.  */
+  for (l = v->locs; l; l = l->next)
+    if (GET_CODE (l->loc) == VALUE)
+      return 1;
 
   htab_clear_slot (cselib_hash_table, x);
   return 1;
@@ -490,7 +537,8 @@  references_value_p (const_rtx x, int onl
   int i, j;
 
   if (GET_CODE (x) == VALUE
-      && (! only_useless || CSELIB_VAL_PTR (x)->locs == 0))
+      && (! only_useless ||
+	  (CSELIB_VAL_PTR (x)->locs == 0 && !PRESERVED_VALUE_P (x))))
     return 1;
 
   for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
@@ -744,20 +792,22 @@  rtx_equal_for_cselib_1 (rtx x, rtx y, en
   if (x == y)
     return 1;
 
-  if (GET_CODE (x) == VALUE && GET_CODE (y) == VALUE)
-    return CSELIB_VAL_PTR (x) == CSELIB_VAL_PTR (y);
-
   if (GET_CODE (x) == VALUE)
     {
-      cselib_val *e = CSELIB_VAL_PTR (x);
+      cselib_val *e = canonical_cselib_val (CSELIB_VAL_PTR (x));
       struct elt_loc_list *l;
 
+      if (GET_CODE (y) == VALUE)
+	return e == canonical_cselib_val (CSELIB_VAL_PTR (y));
+
       for (l = e->locs; l; l = l->next)
 	{
 	  rtx t = l->loc;
 
-	  /* Avoid infinite recursion.  */
-	  if (REG_P (t) || MEM_P (t))
+	  /* Avoid infinite recursion.  We know we have the canonical
+	     value, so we can just skip any values in the equivalence
+	     list.  */
+	  if (REG_P (t) || MEM_P (t) || GET_CODE (t) == VALUE)
 	    continue;
 	  else if (rtx_equal_for_cselib_1 (t, y, memmode))
 	    return 1;
@@ -765,17 +815,16 @@  rtx_equal_for_cselib_1 (rtx x, rtx y, en
 
       return 0;
     }
-
-  if (GET_CODE (y) == VALUE)
+  else if (GET_CODE (y) == VALUE)
     {
-      cselib_val *e = CSELIB_VAL_PTR (y);
+      cselib_val *e = canonical_cselib_val (CSELIB_VAL_PTR (y));
       struct elt_loc_list *l;
 
       for (l = e->locs; l; l = l->next)
 	{
 	  rtx t = l->loc;
 
-	  if (REG_P (t) || MEM_P (t))
+	  if (REG_P (t) || MEM_P (t) || GET_CODE (t) == VALUE)
 	    continue;
 	  else if (rtx_equal_for_cselib_1 (x, t, memmode))
 	    return 1;
@@ -1217,9 +1266,8 @@  add_mem_for_addr (cselib_val *addr_elt, 
       }
 
   addr_elt->addr_list = new_elt_list (addr_elt->addr_list, mem_elt);
-  mem_elt->locs
-    = new_elt_loc_list (mem_elt->locs,
-			replace_equiv_address_nv (x, addr_elt->val_rtx));
+  new_elt_loc_list (mem_elt,
+		    replace_equiv_address_nv (x, addr_elt->val_rtx));
   if (mem_elt->next_containing_mem == NULL)
     {
       mem_elt->next_containing_mem = first_containing_mem;
@@ -1858,7 +1906,7 @@  cselib_lookup_1 (rtx x, enum machine_mod
 	}
 
       e = new_cselib_val (next_uid, GET_MODE (x), x);
-      e->locs = new_elt_loc_list (e->locs, x);
+      new_elt_loc_list (e, x);
       if (REG_VALUES (i) == 0)
 	{
 	  /* Maintain the invariant that the first entry of
@@ -1901,7 +1949,7 @@  cselib_lookup_1 (rtx x, enum machine_mod
 	      rtx sub = lowpart_subreg (mode, lwider->elt->val_rtx,
 					GET_MODE (lwider->elt->val_rtx));
 	      if (sub)
-		e->locs->next = new_elt_loc_list (e->locs->next, sub);
+		new_elt_loc_list (e, sub);
 	    }
 	}
       REG_VALUES (i)->next = new_elt_list (REG_VALUES (i)->next, e);
@@ -1933,8 +1981,7 @@  cselib_lookup_1 (rtx x, enum machine_mod
      the hash table is inconsistent until we do so, and
      cselib_subst_to_values will need to do lookups.  */
   *slot = (void *) e;
-  e->locs = new_elt_loc_list (e->locs,
-			      cselib_subst_to_values (x, memmode));
+  new_elt_loc_list (e, cselib_subst_to_values (x, memmode));
   return e;
 }
 
@@ -2059,6 +2106,8 @@  cselib_invalidate_regno (unsigned int re
 	  else
 	    unchain_one_elt_list (l);
 
+	  v = canonical_cselib_val (v);
+
 	  had_locs = v->locs != NULL;
 	  setting_insn = v->locs ? v->locs->setting_insn : NULL;
 
@@ -2245,7 +2294,7 @@  cselib_record_set (rtx dest, cselib_val 
 
       if (src_elt->locs == 0 && !PRESERVED_VALUE_P (src_elt->val_rtx))
 	n_useless_values--;
-      src_elt->locs = new_elt_loc_list (src_elt->locs, dest);
+      new_elt_loc_list (src_elt, dest);
     }
   else if (MEM_P (dest) && dest_addr_elt != 0
 	   && cselib_record_memory)
@@ -2256,6 +2305,33 @@  cselib_record_set (rtx dest, cselib_val 
     }
 }
 
+/* Make ELT and X's VALUE equivalent to each other at INSN.  */
+
+void
+cselib_add_permanent_equiv (cselib_val *elt, rtx x, rtx insn)
+{
+  cselib_val *nelt;
+  rtx save_cselib_current_insn = cselib_current_insn;
+
+  gcc_checking_assert (elt);
+  gcc_checking_assert (PRESERVED_VALUE_P (elt->val_rtx));
+  gcc_checking_assert (!side_effects_p (x));
+
+  cselib_current_insn = insn;
+
+  nelt = cselib_lookup (x, GET_MODE (elt->val_rtx), 1, VOIDmode);
+
+  if (nelt != elt)
+    {
+      if (!PRESERVED_VALUE_P (nelt->val_rtx))
+	cselib_preserve_value (nelt);
+
+      new_elt_loc_list (nelt, elt->val_rtx);
+    }
+
+  cselib_current_insn = save_cselib_current_insn;
+}
+
 /* There is no good way to determine how many elements there can be
    in a PARALLEL.  Since it's fairly cheap, use a really large number.  */
 #define MAX_SETS (FIRST_PSEUDO_REGISTER * 2)
Index: gcc/alias.c
===================================================================
--- gcc/alias.c.orig	2011-11-21 02:23:54.110708730 -0200
+++ gcc/alias.c	2011-11-21 03:37:52.111387298 -0200
@@ -1542,7 +1542,8 @@  rtx
 find_base_term (rtx x)
 {
   cselib_val *val;
-  struct elt_loc_list *l;
+  struct elt_loc_list *l, *f;
+  rtx ret;
 
 #if defined (FIND_BASE_TERM)
   /* Try machine-dependent ways to find the base term.  */
@@ -1591,12 +1592,26 @@  find_base_term (rtx x)
 
     case VALUE:
       val = CSELIB_VAL_PTR (x);
+      ret = NULL_RTX;
+
       if (!val)
-	return 0;
-      for (l = val->locs; l; l = l->next)
-	if ((x = find_base_term (l->loc)) != 0)
-	  return x;
-      return 0;
+	return ret;
+
+      f = val->locs;
+      /* Temporarily reset val->locs to avoid infinite recursion.  */
+      val->locs = NULL;
+
+      for (l = f; l; l = l->next)
+	if (GET_CODE (l->loc) == VALUE
+	    && CSELIB_VAL_PTR (l->loc)->locs
+	    && !CSELIB_VAL_PTR (l->loc)->locs->next
+	    && CSELIB_VAL_PTR (l->loc)->locs->loc == x)
+	  continue;
+	else if ((ret = find_base_term (l->loc)) != 0)
+	  break;
+
+      val->locs = f;
+      return ret;
 
     case LO_SUM:
       /* The standard form is (lo_sum reg sym) so look only at the
Index: gcc/var-tracking.c
===================================================================
--- gcc/var-tracking.c.orig	2011-11-21 02:23:54.111708716 -0200
+++ gcc/var-tracking.c	2011-11-21 22:13:11.831071308 -0200
@@ -2027,6 +2027,50 @@  unsuitable_loc (rtx loc)
     }
 }
 
+/* Bind VAL to LOC in SET.  If MODIFIED, detach LOC from any values
+   bound to it.  */
+
+static inline void
+val_bind (dataflow_set *set, rtx val, rtx loc, bool modified)
+{
+  if (REG_P (loc))
+    {
+      if (modified)
+	var_regno_delete (set, REGNO (loc));
+      var_reg_decl_set (set, loc, VAR_INIT_STATUS_INITIALIZED,
+			dv_from_value (val), 0, NULL_RTX, INSERT);
+    }
+  else if (MEM_P (loc))
+    {
+      struct elt_loc_list *l = CSELIB_VAL_PTR (val)->locs;
+
+      if (l && GET_CODE (l->loc) == VALUE)
+	l = canonical_cselib_val (CSELIB_VAL_PTR (l->loc))->locs;
+
+      /* If this MEM is a global constant, we don't need it in the
+	 dynamic tables.  ??? We should test this before emitting the
+	 micro-op in the first place.  */
+      while (l)
+	if (GET_CODE (l->loc) == MEM && XEXP (l->loc, 0) == XEXP (loc, 0))
+	  break;
+	else
+	  l = l->next;
+
+      if (!l)
+	var_mem_decl_set (set, loc, VAR_INIT_STATUS_INITIALIZED,
+			  dv_from_value (val), 0, NULL_RTX, INSERT);
+    }
+  else
+    {
+      /* Other kinds of equivalences are necessarily static, at least
+	 so long as we do not perform substitutions while merging
+	 expressions.  */
+      gcc_unreachable ();
+      set_variable_part (set, loc, dv_from_value (val), 0,
+			 VAR_INIT_STATUS_INITIALIZED, NULL_RTX, INSERT);
+    }
+}
+
 /* Bind a value to a location it was just stored in.  If MODIFIED
    holds, assume the location was modified, detaching it from any
    values bound to it.  */
@@ -2058,21 +2102,7 @@  val_store (dataflow_set *set, rtx val, r
 
   gcc_checking_assert (!unsuitable_loc (loc));
 
-  if (REG_P (loc))
-    {
-      if (modified)
-	var_regno_delete (set, REGNO (loc));
-      var_reg_decl_set (set, loc, VAR_INIT_STATUS_INITIALIZED,
-			dv_from_value (val), 0, NULL_RTX, INSERT);
-    }
-  else if (MEM_P (loc))
-    var_mem_decl_set (set, loc, VAR_INIT_STATUS_INITIALIZED,
-		      dv_from_value (val), 0, NULL_RTX, INSERT);
-  else
-    /* ??? Ideally we wouldn't get these, and use them from the static
-       cselib loc list.  */
-    set_variable_part (set, loc, dv_from_value (val), 0,
-		       VAR_INIT_STATUS_INITIALIZED, NULL_RTX, INSERT);
+  val_bind (set, val, loc, modified);
 }
 
 /* Reset this node, detaching all its equivalences.  Return the slot
@@ -2187,20 +2217,13 @@  val_resolve (dataflow_set *set, rtx val,
 
       /* If we didn't find any equivalence, we need to remember that
 	 this value is held in the named register.  */
-      if (!found)
-	var_reg_decl_set (set, loc, VAR_INIT_STATUS_INITIALIZED,
-			  dv_from_value (val), 0, NULL_RTX, INSERT);
+      if (found)
+	return;
     }
-  else if (MEM_P (loc))
-    /* ??? Merge equivalent MEMs.  */
-    var_mem_decl_set (set, loc, VAR_INIT_STATUS_INITIALIZED,
-		      dv_from_value (val), 0, NULL_RTX, INSERT);
-  else
-    /* ??? Ideally we wouldn't get these, and use them from the static
-       cselib loc list.  */
-    /* ??? Merge equivalent expressions.  */
-    set_variable_part (set, loc, dv_from_value (val), 0,
-		       VAR_INIT_STATUS_INITIALIZED, NULL_RTX, INSERT);
+  /* ??? Attempt to find and merge equivalent MEMs or other
+     expressions too.  */
+
+  val_bind (set, val, loc, false);
 }
 
 /* Initialize dataflow set SET to be empty.
@@ -5046,10 +5069,6 @@  log_op_type (rtx x, basic_block bb, rtx 
    MO_CLOBBER as well.  */
 #define VAL_EXPR_IS_CLOBBERED(x) \
   (RTL_FLAG_CHECK1 ("VAL_EXPR_IS_CLOBBERED", (x), CONCAT)->unchanging)
-/* Whether the location is a CONCAT of the MO_VAL_SET expression and
-   a reverse operation that should be handled afterwards.  */
-#define VAL_EXPR_HAS_REVERSE(x) \
-  (RTL_FLAG_CHECK1 ("VAL_EXPR_HAS_REVERSE", (x), CONCAT)->return_val)
 
 /* All preserved VALUEs.  */
 static VEC (rtx, heap) *preserved_values;
@@ -5129,28 +5148,7 @@  add_uses (rtx *ploc, void *data)
 				 GET_MODE (mloc));
 
 	      if (val && !cselib_preserved_value_p (val))
-		{
-		  micro_operation moa;
-		  preserve_value (val);
-
-		  if (GET_CODE (XEXP (mloc, 0)) != ENTRY_VALUE
-		      && (GET_CODE (XEXP (mloc, 0)) != PLUS
-			  || XEXP (XEXP (mloc, 0), 0) != cfa_base_rtx
-			  || !CONST_INT_P (XEXP (XEXP (mloc, 0), 1))))
-		    {
-		      mloc = cselib_subst_to_values (XEXP (mloc, 0),
-						     GET_MODE (mloc));
-		      moa.type = MO_VAL_USE;
-		      moa.insn = cui->insn;
-		      moa.u.loc = gen_rtx_CONCAT (address_mode,
-						  val->val_rtx, mloc);
-		      if (dump_file && (dump_flags & TDF_DETAILS))
-			log_op_type (moa.u.loc, cui->bb, cui->insn,
-				     moa.type, dump_file);
-		      VEC_safe_push (micro_operation, heap, VTI (bb)->mos,
-				     &moa);
-		    }
-		}
+		preserve_value (val);
 	    }
 
 	  if (CONSTANT_P (vloc)
@@ -5162,7 +5160,11 @@  add_uses (rtx *ploc, void *data)
 	    {
 	      enum machine_mode mode2;
 	      enum micro_operation_type type2;
-	      rtx nloc = replace_expr_with_values (vloc);
+	      rtx nloc = NULL;
+	      bool resolvable = REG_P (vloc) || MEM_P (vloc);
+
+	      if (resolvable)
+		nloc = replace_expr_with_values (vloc);
 
 	      if (nloc)
 		{
@@ -5180,7 +5182,7 @@  add_uses (rtx *ploc, void *data)
 	      if (type2 == MO_CLOBBER
 		  && !cselib_preserved_value_p (val))
 		{
-		  VAL_NEEDS_RESOLUTION (oloc) = 1;
+		  VAL_NEEDS_RESOLUTION (oloc) = resolvable;
 		  preserve_value (val);
 		}
 	    }
@@ -5212,28 +5214,7 @@  add_uses (rtx *ploc, void *data)
 				 GET_MODE (mloc));
 
 	      if (val && !cselib_preserved_value_p (val))
-		{
-		  micro_operation moa;
-		  preserve_value (val);
-
-		  if (GET_CODE (XEXP (mloc, 0)) != ENTRY_VALUE
-		      && (GET_CODE (XEXP (mloc, 0)) != PLUS
-			  || XEXP (XEXP (mloc, 0), 0) != cfa_base_rtx
-			  || !CONST_INT_P (XEXP (XEXP (mloc, 0), 1))))
-		    {
-		      mloc = cselib_subst_to_values (XEXP (mloc, 0),
-						     GET_MODE (mloc));
-		      moa.type = MO_VAL_USE;
-		      moa.insn = cui->insn;
-		      moa.u.loc = gen_rtx_CONCAT (address_mode,
-						  val->val_rtx, mloc);
-		      if (dump_file && (dump_flags & TDF_DETAILS))
-			log_op_type (moa.u.loc, cui->bb, cui->insn,
-				     moa.type, dump_file);
-		      VEC_safe_push (micro_operation, heap, VTI (bb)->mos,
-				     &moa);
-		    }
-		}
+		preserve_value (val);
 	    }
 
 	  type2 = use_type (loc, 0, &mode2);
@@ -5256,6 +5237,7 @@  add_uses (rtx *ploc, void *data)
 
 	  */
 
+	  gcc_checking_assert (REG_P (loc) || MEM_P (loc));
 	  nloc = replace_expr_with_values (loc);
 	  if (!nloc)
 	    nloc = oloc;
@@ -5307,22 +5289,22 @@  add_uses_1 (rtx *x, void *cui)
    representable anyway.  */
 #define EXPR_USE_DEPTH (PARAM_VALUE (PARAM_MAX_VARTRACK_EXPR_DEPTH))
 
-/* Attempt to reverse the EXPR operation in the debug info.  Say for
-   reg1 = reg2 + 6 even when reg2 is no longer live we
-   can express its value as VAL - 6.  */
+/* Attempt to reverse the EXPR operation in the debug info and record
+   it in the cselib table.  Say for reg1 = reg2 + 6 even when reg2 is
+   no longer live we can express its value as VAL - 6.  */
 
-static rtx
-reverse_op (rtx val, const_rtx expr)
+static void
+reverse_op (rtx val, const_rtx expr, rtx insn)
 {
   rtx src, arg, ret;
   cselib_val *v;
   enum rtx_code code;
 
   if (GET_CODE (expr) != SET)
-    return NULL_RTX;
+    return;
 
   if (!REG_P (SET_DEST (expr)) || GET_MODE (val) != GET_MODE (SET_DEST (expr)))
-    return NULL_RTX;
+    return;
 
   src = SET_SRC (expr);
   switch (GET_CODE (src))
@@ -5333,30 +5315,30 @@  reverse_op (rtx val, const_rtx expr)
     case NOT:
     case NEG:
       if (!REG_P (XEXP (src, 0)))
-	return NULL_RTX;
+	return;
       break;
     case SIGN_EXTEND:
     case ZERO_EXTEND:
       if (!REG_P (XEXP (src, 0)) && !MEM_P (XEXP (src, 0)))
-	return NULL_RTX;
+	return;
       break;
     default:
-      return NULL_RTX;
+      return;
     }
 
   if (!SCALAR_INT_MODE_P (GET_MODE (src)) || XEXP (src, 0) == cfa_base_rtx)
-    return NULL_RTX;
+    return;
 
   v = cselib_lookup (XEXP (src, 0), GET_MODE (XEXP (src, 0)), 0, VOIDmode);
   if (!v || !cselib_preserved_value_p (v))
-    return NULL_RTX;
+    return;
 
   switch (GET_CODE (src))
     {
     case NOT:
     case NEG:
       if (GET_MODE (v->val_rtx) != GET_MODE (val))
-	return NULL_RTX;
+	return;
       ret = gen_rtx_fmt_e (GET_CODE (src), GET_MODE (val), val);
       break;
     case SIGN_EXTEND:
@@ -5374,15 +5356,15 @@  reverse_op (rtx val, const_rtx expr)
       goto binary;
     binary:
       if (GET_MODE (v->val_rtx) != GET_MODE (val))
-	return NULL_RTX;
+	return;
       arg = XEXP (src, 1);
       if (!CONST_INT_P (arg) && GET_CODE (arg) != SYMBOL_REF)
 	{
 	  arg = cselib_expand_value_rtx (arg, scratch_regs, 5);
 	  if (arg == NULL_RTX)
-	    return NULL_RTX;
+	    return;
 	  if (!CONST_INT_P (arg) && GET_CODE (arg) != SYMBOL_REF)
-	    return NULL_RTX;
+	    return;
 	}
       ret = simplify_gen_binary (code, GET_MODE (val), val, arg);
       if (ret == val)
@@ -5395,7 +5377,7 @@  reverse_op (rtx val, const_rtx expr)
       gcc_unreachable ();
     }
 
-  return gen_rtx_CONCAT (GET_MODE (v->val_rtx), v->val_rtx, ret);
+  cselib_add_permanent_equiv (v, ret, insn);
 }
 
 /* Add stores (register and memory references) LOC which will be tracked
@@ -5414,7 +5396,6 @@  add_stores (rtx loc, const_rtx expr, voi
   bool track_p = false;
   cselib_val *v;
   bool resolve, preserve;
-  rtx reverse;
 
   if (type == MO_CLOBBER)
     return;
@@ -5479,26 +5460,7 @@  add_stores (rtx loc, const_rtx expr, voi
 					   GET_MODE (mloc));
 
 	  if (val && !cselib_preserved_value_p (val))
-	    {
-	      preserve_value (val);
-
-	      if (GET_CODE (XEXP (mloc, 0)) != ENTRY_VALUE
-		  && (GET_CODE (XEXP (mloc, 0)) != PLUS
-		      || XEXP (XEXP (mloc, 0), 0) != cfa_base_rtx
-		      || !CONST_INT_P (XEXP (XEXP (mloc, 0), 1))))
-		{
-		  mloc = cselib_subst_to_values (XEXP (mloc, 0),
-						 GET_MODE (mloc));
-		  mo.type = MO_VAL_USE;
-		  mo.insn = cui->insn;
-		  mo.u.loc = gen_rtx_CONCAT (address_mode,
-					     val->val_rtx, mloc);
-		  if (dump_file && (dump_flags & TDF_DETAILS))
-		    log_op_type (mo.u.loc, cui->bb, cui->insn,
-				 mo.type, dump_file);
-		  VEC_safe_push (micro_operation, heap, VTI (bb)->mos, &mo);
-		}
-	    }
+	    preserve_value (val);
 	}
 
       if (GET_CODE (expr) == CLOBBER || !track_p)
@@ -5578,7 +5540,10 @@  add_stores (rtx loc, const_rtx expr, voi
     }
   else if (resolve && GET_CODE (mo.u.loc) == SET)
     {
-      nloc = replace_expr_with_values (SET_SRC (expr));
+      if (REG_P (SET_SRC (expr)) || MEM_P (SET_SRC (expr)))
+	nloc = replace_expr_with_values (SET_SRC (expr));
+      else
+	nloc = NULL_RTX;
 
       /* Avoid the mode mismatch between oexpr and expr.  */
       if (!nloc && mode != mode2)
@@ -5587,7 +5552,7 @@  add_stores (rtx loc, const_rtx expr, voi
 	  gcc_assert (oloc == SET_DEST (expr));
 	}
 
-      if (nloc)
+      if (nloc && nloc != SET_SRC (mo.u.loc))
 	oloc = gen_rtx_SET (GET_MODE (mo.u.loc), oloc, nloc);
       else
 	{
@@ -5634,14 +5599,7 @@  add_stores (rtx loc, const_rtx expr, voi
   */
 
   if (GET_CODE (PATTERN (cui->insn)) != COND_EXEC)
-    {
-      reverse = reverse_op (v->val_rtx, expr);
-      if (reverse)
-	{
-	  loc = gen_rtx_CONCAT (GET_MODE (mo.u.loc), loc, reverse);
-	  VAL_EXPR_HAS_REVERSE (loc) = 1;
-	}
-    }
+    reverse_op (v->val_rtx, expr, cui->insn);
 
   mo.u.loc = loc;
 
@@ -6299,14 +6257,9 @@  compute_bb_dataflow (basic_block bb)
 	  case MO_VAL_SET:
 	    {
 	      rtx loc = mo->u.loc;
-	      rtx val, vloc, uloc, reverse = NULL_RTX;
+	      rtx val, vloc, uloc;
 
 	      vloc = loc;
-	      if (VAL_EXPR_HAS_REVERSE (loc))
-		{
-		  reverse = XEXP (loc, 1);
-		  vloc = XEXP (loc, 0);
-		}
 	      uloc = XEXP (vloc, 1);
 	      val = XEXP (vloc, 0);
 	      vloc = uloc;
@@ -6382,10 +6335,6 @@  compute_bb_dataflow (basic_block bb)
 		var_regno_delete (out, REGNO (uloc));
 
 	      val_store (out, val, vloc, insn, true);
-
-	      if (reverse)
-		val_store (out, XEXP (reverse, 0), XEXP (reverse, 1),
-			   insn, false);
 	    }
 	    break;
 
@@ -8709,14 +8658,9 @@  emit_notes_in_bb (basic_block bb, datafl
 	  case MO_VAL_SET:
 	    {
 	      rtx loc = mo->u.loc;
-	      rtx val, vloc, uloc, reverse = NULL_RTX;
+	      rtx val, vloc, uloc;
 
 	      vloc = loc;
-	      if (VAL_EXPR_HAS_REVERSE (loc))
-		{
-		  reverse = XEXP (loc, 1);
-		  vloc = XEXP (loc, 0);
-		}
 	      uloc = XEXP (vloc, 1);
 	      val = XEXP (vloc, 0);
 	      vloc = uloc;
@@ -8787,10 +8731,6 @@  emit_notes_in_bb (basic_block bb, datafl
 
 	      val_store (set, val, vloc, insn, true);
 
-	      if (reverse)
-		val_store (set, XEXP (reverse, 0), XEXP (reverse, 1),
-			   insn, false);
-
 	      emit_notes_for_changes (next_insn, EMIT_NOTE_BEFORE_INSN,
 				      set->vars);
 	    }
@@ -8957,28 +8897,17 @@  vt_get_decl_and_offset (rtx rtl, tree *d
   return false;
 }
 
-/* Mark the value for the ENTRY_VALUE of RTL as equivalent to EQVAL in
-   OUT.  */
+/* Record the value for the ENTRY_VALUE of RTL as a global equivalence
+   of VAL.  */
 
 static void
-create_entry_value (dataflow_set *out, rtx eqval, rtx rtl)
+record_entry_value (cselib_val *val, rtx rtl)
 {
   rtx ev = gen_rtx_ENTRY_VALUE (GET_MODE (rtl));
-  cselib_val *val;
 
   ENTRY_VALUE_EXP (ev) = rtl;
 
-  val = cselib_lookup_from_insn (ev, GET_MODE (ev), true,
-				 VOIDmode, get_insns ());
-
-  if (val->val_rtx != eqval)
-    {
-      preserve_value (val);
-      set_variable_part (out, val->val_rtx, dv_from_value (eqval), 0,
-			 VAR_INIT_STATUS_INITIALIZED, NULL_RTX, INSERT);
-      set_variable_part (out, eqval, dv_from_value (val->val_rtx), 0,
-			 VAR_INIT_STATUS_INITIALIZED, NULL_RTX, INSERT);
-    }
+  cselib_add_permanent_equiv (val, ev, get_insns ());
 }
 
 /* Insert function parameter PARM in IN and OUT sets of ENTRY_BLOCK.  */
@@ -9137,7 +9066,7 @@  vt_add_function_parameter (tree parm)
 			 VAR_INIT_STATUS_INITIALIZED, NULL, INSERT);
       if (dv_is_value_p (dv))
 	{
-	  create_entry_value (out, dv_as_value (dv), incoming);
+	  record_entry_value (CSELIB_VAL_PTR (dv_as_value (dv)), incoming);
 	  if (TREE_CODE (TREE_TYPE (parm)) == REFERENCE_TYPE
 	      && INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (parm))))
 	    {
@@ -9150,9 +9079,9 @@  vt_add_function_parameter (tree parm)
 	      if (val)
 		{
 		  preserve_value (val);
+		  record_entry_value (val, mem);
 		  set_variable_part (out, mem, dv_from_value (val->val_rtx), 0,
 				     VAR_INIT_STATUS_INITIALIZED, NULL, INSERT);
-		  create_entry_value (out, val->val_rtx, mem);
 		}
 	    }
 	}