
Strengthen ICF hash

Message ID 20150303072545.GA84757@kam.mff.cuni.cz
State New

Commit Message

Jan Hubicka March 3, 2015, 7:25 a.m. UTC
Andi,
> Jan Hubicka <hubicka@ucw.cz> writes:
> >
> > The hash itself is quite simple (borrowing incremental hash for constants, adding
> > very simple match for other stuff + logic to skip things that may match even if
> > they are syntactically different). The hash can be strengthened significantly,
> > but I suppose we may do it based on actual profiling. It also is loosely based
> > on varasm.c constant pool hash implementation that was OK for years.
> 
> FWIW I have some older patches to use a stronger/faster spooky hash for
> incremental hash. But I never submitted them because I wasn't able
> to show clear gains for the type hashing (that was the original
> motivator) since you fixed it.
> 
> But if you're interested I can update them to the latest tree.
> spooky is very good at dealing with a lot of input data quickly.

You are my expert on modern hashing functions - my knowledge is very 1970s,
with some random insights from papers I read. I still think it may be interesting
to get a bit more up to date.

Ipa-icf is similar in nature to tree merging and suffers from similar problems.
What it does is the following:

1) At compile time it analyzes bodies of functions and initializers and computes
   a hash.
2) At link time the pass is trying to prove equivalence of two functions.  This
   is a standard congruence algorithm.
   As a starting point it divides functions into equivalence classes based on the hash.
3) Next it takes readily available data and does a compare within classes (equals_wpa).
   Here it compares function declarations and other flags and subdivides the classes.
4) Next it subdivides the classes by congruence (i.e. if two functions are
   equivalent but they refer to two functions that are not equivalent, the classes
   are subdivided).
5) Finally it reads function bodies for each non-trivial class, compares the bodies
   and subdivides the equivalence classes.  About 25% of all functions are read on
   Firefox.  After that the subdividing is done again as in step 4.
Finally about 10% of functions are known to be equivalent to something else.
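To make the pipeline concrete, here is a toy sketch of steps 2) and 4) - hash-based partitioning followed by congruence refinement. The names (`func`, `partition`) and the data layout are made up for the example and are not the ipa-icf data structures:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

/* Toy model: each "function" is an index with a precomputed hash and a
   list of callee indices.  */
struct func { size_t hash; std::vector<int> callees; };

static std::vector<std::vector<int>>
partition (const std::vector<func> &funcs)
{
  /* Step 2: initial equivalence classes keyed by hash.  */
  std::map<size_t, std::vector<int>> by_hash;
  for (int i = 0; i < (int) funcs.size (); i++)
    by_hash[funcs[i].hash].push_back (i);

  std::vector<std::vector<int>> classes;
  std::vector<int> cls (funcs.size ());
  for (std::map<size_t, std::vector<int>>::iterator p = by_hash.begin ();
       p != by_hash.end (); ++p)
    {
      for (size_t j = 0; j < p->second.size (); j++)
	cls[p->second[j]] = (int) classes.size ();
      classes.push_back (p->second);
    }

  /* Step 4: congruence refinement.  Members whose callees fall into
     different classes are split off, until a fixed point is reached.  */
  for (bool changed = true; changed; )
    {
      changed = false;
      for (size_t c = 0; c < classes.size (); c++)
	{
	  std::map<std::vector<int>, std::vector<int>> split;
	  for (size_t j = 0; j < classes[c].size (); j++)
	    {
	      int f = classes[c][j];
	      std::vector<int> key;
	      for (size_t k = 0; k < funcs[f].callees.size (); k++)
		key.push_back (cls[funcs[f].callees[k]]);
	      split[key].push_back (f);
	    }
	  if (split.size () <= 1)
	    continue;
	  changed = true;
	  std::map<std::vector<int>, std::vector<int>>::iterator it
	    = split.begin ();
	  classes[c] = it->second;	/* First subset keeps the slot.  */
	  for (++it; it != split.end (); ++it)
	    {
	      for (size_t j = 0; j < it->second.size (); j++)
		cls[it->second[j]] = (int) classes.size ();
	      classes.push_back (it->second);
	    }
	}
    }
  return classes;
}
```

Functions 0 and 1 hash identically but call non-equivalent functions, so refinement separates them; functions 4 and 5 call the same function and stay in one class.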

Unlike tree merging, the equivalence is much more loosely defined.  Basically
it is a value numbering problem, but for now we require both
functions to have the same CFG and the same sequence of statements in it.
The hash computed in 1) must be stable across these equivalences.

More interesting are the types, which are handled by the existing types_compatible_p
machinery.  This adds some extra fun - for example, int and long may be considered
compatible on a 32bit target unless they are used in memory references where
the alias info differs.

So I think the main problem is not the actual strength of the incremental hash,
but the fact that we do not hash enough.

The following WIP patch makes the situation a bit better by properly hashing
some type properties that are known to be stable and also by moving some checks
from 4) to 2).

I seem to get about 200k body compares, compared to 1M without the patch.

I think one can also shift some work from 3) to 2) by doing the SCC hash
the same way tree hashing does, which is linear in the size of the input, unlike
the congruence subdivision that may run in O(n^2). I will discuss this
with Martin tomorrow.

Having said all that, I would be very curious whether an actually stronger hash
would help.  I saw some miscompares for stuff that should be covered by the hash.
I will try to get some stats soon.

Honza

Comments

Mikhail Maltsev March 4, 2015, 2:45 a.m. UTC | #1
03.03.2015 10:25, Jan Hubicka writes:

>>>
>>> The hash itself is quite simple (borrowing incremental hash for constants, adding
>>> very simple match for other stuff + logic to skip things that may match even if
>>> they are syntactically different). The hash can be strengthened significantly,
>>> but I suppose we may do it based on actual profiling. It also is loosely based
>>> on varasm.c constant pool hash implementation that was OK for years.
>>
>> FWIW I have some older patches to use a stronger/faster spooky hash for
>> incremental hash. But I never submitted them because I wasn't able
>> to show clear gains for the type hashing (that was the original
>> motivator) since you fixed it.
>>
>> But if you're interested I can update them to the latest tree.
>> spooky is very good at dealing with a lot of input data quickly.
> 
> You are my expert on modern hashing functions - my knowledge is very 1970s,
> with some random insights from papers I read. I still think it may be interesting
> to get a bit more up to date.
> 

When I read your conversation, I recalled that one of my colleagues has
quite good experience in using and evaluating hash functions, so I asked
him for advice.

Yura gave me the link to his repo on GitHub; it contains an
implementation of his own hash function (you are free to use it, it's in
the public domain):
https://github.com/funny-falcon/funny_hash

Even if you are not interested in it (I understand that you are probably
looking for something well-known), this repo is worth looking at,
because it contains a benchmark of several popular hash functions.
Notably, they include "lookup3" (the function used in libiberty and
inchash.c is based on its predecessor, "lookup2"), "Murmurhash" (which
is also used in GCC, in libstdc++-v3/libsupc++/hash_bytes.cc) and the
spooky hash mentioned by Andi.

I also ran the benchmark on GCC trunk and a recent version of clang
(results are attached). If you are interested in studying hash
functions in more detail, SMHasher
https://code.google.com/p/smhasher/wiki/SMHasher
is probably a good framework for their analysis and benchmarking.

> So I think the main problem is not the actual strength of the incremental hash,
> but the fact that we do not hash enough.
> 
> The following WIP patch makes the situation a bit better by properly hashing
> some type properties that are known to be stable and also by moving some checks
> from 4) to 2).

I have an idea. Unfortunately, I have not studied the layout of GIMPLE
nodes in detail, so it might be useless... But suppose that we can
extract the information which is relevant to ICF in the same way
gengtype extracts pointers for garbage collection. If the fields are
located close enough to each other, we could simply use bitwise "and"
with a constant mask to extract them. I.e., we use some tool to get the
fields of tree nodes which will be used when calculating the hash. At
compile time this tool will generate a lookup table (TREE_CODE will be
an index into it). Each element of the table can either contain a pair
(bitmask, length) - in this case we copy the tree node into a temporary
buffer and do bitwise "and" - or, perhaps, a list of (offset, length)
pairs corresponding to some fields of the node (and possibly another
bit mask). In this case we copy the fields.

When we have enough data in the buffer, we perform hashing (that would
probably be faster than element-wise hashing, especially if we use SIMD).
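A minimal sketch of the (offset, length) variant of this idea, using a made-up node layout - the real tree/GIMPLE layout differs, and the descriptor table would be generated by a gengtype-like tool rather than written by hand:

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical node layout for illustration only.  */
struct toy_node
{
  uint8_t code;		/* relevant to the hash */
  uint16_t flags;	/* relevant */
  void *gc_link;	/* irrelevant: a pointer, unstable between runs */
  uint32_t precision;	/* relevant */
};

/* Per-code extraction descriptor, as the tool could generate it: a list
   of (offset, length) pairs naming the hash-relevant fields.  */
struct field_span { size_t offset; size_t length; };

static const struct field_span toy_spans[] = {
  { offsetof (struct toy_node, code),      sizeof (uint8_t)  },
  { offsetof (struct toy_node, flags),     sizeof (uint16_t) },
  { offsetof (struct toy_node, precision), sizeof (uint32_t) },
};

/* Copy just the relevant bytes into BUF; the caller then hashes BUF in
   one go instead of field by field.  Returns the bytes copied.  */
static size_t
extract_for_hash (const struct toy_node *node, unsigned char *buf)
{
  size_t n = 0;
  for (size_t i = 0; i < sizeof toy_spans / sizeof toy_spans[0]; i++)
    {
      memcpy (buf + n,
	      (const unsigned char *) node + toy_spans[i].offset,
	      toy_spans[i].length);
      n += toy_spans[i].length;
    }
  return n;
}
```

Two nodes that agree on the relevant fields then produce identical buffers regardless of the garbage-collector pointer, so a single bulk hash of the buffer is stable.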

Depending on how sparse the information inside tree nodes is, we could
fall back to something we already have:

> +  else if (VECTOR_TYPE_P (type))
> +    {
> +      hstate.add_int (VECTOR_TYPE);
> +      hstate.add_int (TYPE_PRECISION (type));
> +      sem_item::add_type (TREE_TYPE (type), hstate);
> +    }
> +  else if (TREE_CODE (type) == ARRAY_TYPE)
> +    {
> +      hstate.add_int (ARRAY_TYPE);
> +      sem_item::add_type (TREE_TYPE (type), hstate);
> +      sem_item::add_expr (TYPE_SIZE (type), hstate);
> +      hstate.add_int (POINTER_TYPE_P (type));

Note that this code can also be generated automatically, based on
gengtype-like annotations or, better, some separate description like
match.pd.
Andi Kleen March 4, 2015, 1:20 p.m. UTC | #2
Hi Honza,

Regarding modern hash functions, as far as I understand the trend
is to just stop doing anything fancy and just use the CRC instructions
in modern CPUs (unless you need a somewhat crypto hash to guard against
DoS attacks). spooky doesn't do that though, it's just a highly
optimized classical hash. It's much more streamlined than the somewhat
weird iterative hash construction that standard gcc uses, and I suspect
it has a better avalanche effect. Also it'll do much less work per bit
(at the cost of larger code)
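The avalanche effect mentioned here is usually quantified by flipping each input bit in turn and counting how many output bits change; a good mixer averages close to half. A sketch of the measurement, using Murmur3's public-domain 32-bit finalizer as the stand-in mixer under test (not GCC code):

```cpp
#include <assert.h>
#include <stdint.h>

/* Murmur3's 32-bit finalizer - a small mixer with good avalanche
   behavior, used here only as the function being measured.  */
static uint32_t
fmix32 (uint32_t h)
{
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}

/* Portable population count.  */
static int
popcount32 (uint32_t x)
{
  int n = 0;
  for (; x; x &= x - 1)
    n++;
  return n;
}

/* Flip each of the 32 input bits in turn and return the average number
   of output bits that change; ~16 means good avalanche.  */
static double
avalanche (uint32_t x)
{
  int total = 0;
  for (int bit = 0; bit < 32; bit++)
    total += popcount32 (fmix32 (x) ^ fmix32 (x ^ (1u << bit)));
  return total / 32.0;
}
```

A full evaluation (as SMHasher does) averages this over many inputs and also checks per-output-bit bias, but the single-input probe already shows the idea.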

> Having said all that, I would be very curious whether an actually stronger hash
> would help.  I saw some miscompares for stuff that should be covered by the hash.
> I will try to get some stats soon.

I sent you the spooky patches in a separate mail. I did a quick test run
and they don't seem to regress anything.

-Andi
Richard Biener March 4, 2015, 1:51 p.m. UTC | #3
On Wed, Mar 4, 2015 at 2:20 PM, Andi Kleen <andi@firstfloor.org> wrote:
> Hi Honza,
>
> Regarding modern hash functions, as far as I understand the trend
> is to just stop doing anything fancy and just use the CRC instructions
> in modern CPUs

I wonder if we can use intrinsics for CRC in place of the
existing iterative_hash_hashval_t (and thus the 'mix' macro)?
Similar to how we optimize libcpp?  Seems like

#define mix(a,b,c) \
 c = _mm_crc32_u32 (b, c)

would do?  And for iterative_hash_host_wide_int use _mm_crc32_u64?
(libcpp seems to use GCC target builtins, not intrinsics though,
see init_vectorized_lexer)

Thanks,
Richard.

> (unless you need a somewhat crypto hash to guard against
> DoS attacks). spooky doesn't do that though, it's just a highly
> optimized classical hash. It's much more streamlined than the somewhat
> weird iterative hash construction that standard gcc uses, and I suspect
> it has a better avalanche effect. Also it'll do much less work per bit
> (at the cost of larger code)
>
>> Having said all that, I would be very curious whether an actually stronger hash
>> would help.  I saw some miscompares for stuff that should be covered by the hash.
>> I will try to get some stats soon.
>
> I sent you the spooky patches in a separate mail. I did a quick test run
> and they don't seem to regress anything.
>
> -Andi
Jan Hubicka March 4, 2015, 8:22 p.m. UTC | #4
> On Wed, Mar 4, 2015 at 2:20 PM, Andi Kleen <andi@firstfloor.org> wrote:
> > Hi Honza,
> >
> > Regarding modern hash functions, as far as I understand the trend
> > is to just stop doing anything fancy and just use the CRC instructions
> > in modern CPUs
> 
> I wonder if we can use intrinsics for CRC in place of the
> existing iterative_hash_hashval_t (and thus the 'mix' macro)?
> Similar to how we optimize libcpp?  Seems like
> 
> #define mix(a,b,c) \
>  c = _mm_crc32_u32 (b, c)
> 
> would do?  And for iterative_hash_host_wide_int use _mm_crc32_u64?
> (libcpp seems to use GCC target builtins, not intrinsics though,
> see init_vectorized_lexer)

That looks like a good idea.  For tree/ICF streaming we however need
to get host independent at some point. Can we have a reasonable & compatible
generic implementation?

Honza
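[The generic implementation Honza asks about could start from the classic reflected bitwise CRC-32C loop - the same Castagnoli CRC family the SSE4.2 instructions compute, so a software fallback can stay bit-compatible with `_mm_crc32_u32` on hosts without the instruction. A sketch, not GCC code; production code would use a precomputed table or slice-by-8, and note that the raw intrinsic omits the initial/final inversion this standard formulation applies:]

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte of reflected CRC-32C (polynomial 0x1EDC6F41, reflected
   constant 0x82F63B78).  */
static uint32_t
crc32c_byte (uint32_t crc, uint8_t byte)
{
  crc ^= byte;
  for (int i = 0; i < 8; i++)
    crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
  return crc;
}

/* Standard CRC-32C over a buffer: init to all-ones, final xor.  */
static uint32_t
crc32c (const void *buf, size_t len)
{
  const uint8_t *p = (const uint8_t *) buf;
  uint32_t crc = 0xFFFFFFFFu;
  while (len--)
    crc = crc32c_byte (crc, *p++);
  return crc ^ 0xFFFFFFFFu;
}
```

Since the loop works byte-by-byte on values, not on host memory layout, the result is endian- and host-independent for the same logical byte stream.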
Richard Biener March 4, 2015, 8:35 p.m. UTC | #5
On March 4, 2015 9:22:24 PM CET, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Wed, Mar 4, 2015 at 2:20 PM, Andi Kleen <andi@firstfloor.org>
>wrote:
>> > Hi Honza,
>> >
>> > Regarding modern hash functions, as far as I understand the trend
>> > is to just stop doing anything fancy and just use the CRC
>instructions
>> > in modern CPUs
>> 
>> I wonder if we can use intrinsics for CRC in place of the
>> existing iterative_hash_hashval_t (and thus the 'mix' macro)?
>> Similar to how we optimize libcpp?  Seems like
>> 
>> #define mix(a,b,c) \
>>  c = _mm_crc32_u32 (b, c)
>> 
>> would do?  And for iterative_hash_host_wide_int use _mm_crc32_u64?
>> (libcpp seems to use GCC target builtins, not intrinsics though,
>> see init_vectorized_lexer)
>
>That looks like a good idea.  For tree/ICF streaming we however need
>to get host independent at some point. Can we have a reasonable & compatible
>generic implementation?

I honestly don't see us ending up with a host independent LTO byte code.  I know we kind of try, but did anybody try using bytecode produced by a cross on a native system?

Richard.

>Honza
Jan Hubicka March 4, 2015, 9:18 p.m. UTC | #6
> On March 4, 2015 9:22:24 PM CET, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> On Wed, Mar 4, 2015 at 2:20 PM, Andi Kleen <andi@firstfloor.org>
> >wrote:
> >> > Hi Honza,
> >> >
> >> > Regarding modern hash functions, as far as I understand the trend
> >> > is to just stop doing anything fancy and just use the CRC
> >instructions
> >> > in modern CPUs
> >> 
> >> I wonder if we can use intrinsics for CRC in place of the
> >> existing iterative_hash_hashval_t (and thus the 'mix' macro)?
> >> Similar to how we optimize libcpp?  Seems like
> >> 
> >> #define mix(a,b,c) \
> >>  c = _mm_crc32_u32 (b, c)
> >> 
> >> would do?  And for iterative_hash_host_wide_int use _mm_crc32_u64?
> >> (libcpp seems to use GCC target builtins, not intrinsics though,
> >> see init_vectorized_lexer)
> >
> >That looks like a good idea.  For tree/ICF streaming we however need
> >to get host independent at some point. Can we have a reasonable & compatible
> >generic implementation?
> 
> I honestly don't see us ending up with a host independent LTO byte code.  I know we kind of try, but did anybody try using bytecode produced by a cross on a native system?

Not in GCC-5. But in the longer run I do expect people to be able to ship LTO .o
objects given that the major GCC version matches.  Other compilers including LLVM do
that; we should too. It has other benefits, too - like forcing us to design
things in a clean and maintainable manner.

After early debug (hopefully landing in GCC 6) and with the gimple/trees interface
cleanups, I think a well defined IL stream may be the next reachable goal.

For optimization summaries I have an alternative plan - those can stay minor-version
specific.  If you get a mismatch for the minor version, you can just recompute them
at WPA time.  Assuming the LTO shipped libraries will be of controlled size, it
should not be too bad.

But yes, I do not think we need to significantly slow things down in the current
implementation when this feature does not work.  Just perhaps keep track of
changes that introduce host specific stuff and not introduce them when it is
easily avoidable.

Honza
> 
> Richard.
> 
> >Honza
> 
>
Joseph Myers March 7, 2015, 12:29 a.m. UTC | #7
On Wed, 4 Mar 2015, Jan Hubicka wrote:

> But yes, I do not think we need to significantly slow things down in the current
> implementation when this feature does not work.  Just perhaps keep track of
> changes that introduce host specific stuff and not introduce them when it is
> easily avoidable.

Known LTO host dependencies are listed in bug 41526, so I suggest listing 
any new such host dependencies there (and noting there if anything listed 
is in fact now fixed).

Patch

Index: ipa-icf.c
===================================================================
--- ipa-icf.c	(revision 221126)
+++ ipa-icf.c	(working copy)
@@ -389,33 +389,79 @@  sem_function::equals_wpa (sem_item *item
 
   m_compared_func = static_cast<sem_function *> (item);
 
-  if (arg_types.length () != m_compared_func->arg_types.length ())
-    return return_false_with_msg ("different number of arguments");
+  /* This is quite time critical, cheap checks first.  */
 
-  /* Checking types of arguments.  */
-  for (unsigned i = 0; i < arg_types.length (); i++)
-    {
-      /* This guard is here for function pointer with attributes (pr59927.c).  */
-      if (!arg_types[i] || !m_compared_func->arg_types[i])
-	return return_false_with_msg ("NULL argument type");
+  if (node->num_references () != item->node->num_references ())
+    return return_false_with_msg ("different number of references");
 
-      /* Polymorphic comparison is executed just for non-leaf functions.  */
-      bool is_not_leaf = get_node ()->callees != NULL
-			 || get_node ()->indirect_calls != NULL;
+  if (DECL_DISREGARD_INLINE_LIMITS (decl)
+      != DECL_DISREGARD_INLINE_LIMITS (item->decl))
+    return return_false_with_msg ("DECL_DISREGARD_INLINE_LIMITS are different");
 
-      if (!func_checker::compatible_types_p (arg_types[i],
-					     m_compared_func->arg_types[i],
-					     is_not_leaf, i == 0))
-	return return_false_with_msg ("argument type is different");
-    }
+  if (DECL_DECLARED_INLINE_P (decl) != DECL_DECLARED_INLINE_P (item->decl))
+    return return_false_with_msg ("inline attributes are different");
+
+  if (DECL_IS_OPERATOR_NEW (decl) != DECL_IS_OPERATOR_NEW (item->decl))
+    return return_false_with_msg ("operator new flags are different");
+
+  if (DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (decl)
+       != DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (item->decl))
+    return return_false_with_msg ("instrument function entry exit "
+				  "attributes are different");
+
+  if (DECL_NO_LIMIT_STACK (decl) != DECL_NO_LIMIT_STACK (item->decl))
+    return return_false_with_msg ("no stack limit attributes are different");
+
+  /* Compare special function DECL attributes.  */
+  if (DECL_FUNCTION_PERSONALITY (decl)
+      != DECL_FUNCTION_PERSONALITY (item->decl))
+    return return_false_with_msg ("function personalities are different");
+
+  if (flags_from_decl_or_type (decl) != flags_from_decl_or_type (item->decl))
+    return return_false_with_msg ("decl_or_type flags are different");
 
   /* Result type checking.  */
   if (!func_checker::compatible_types_p (result_type,
 					 m_compared_func->result_type))
     return return_false_with_msg ("result types are different");
 
-  if (node->num_references () != item->node->num_references ())
-    return return_false_with_msg ("different number of references");
+  /* Checking function TARGET and OPTIMIZATION flags.  */
+  cl_target_option *tar1 = target_opts_for_fn (decl);
+  cl_target_option *tar2 = target_opts_for_fn (item->decl);
+
+  if (tar1 != tar2 && !cl_target_option_eq (tar1, tar2))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "target flags difference");
+	  cl_target_option_print_diff (dump_file, 2, tar1, tar2);
+	}
+
+      return return_false_with_msg ("Target flags are different");
+    }
+
+  cl_optimization *opt1 = opts_for_fn (decl);
+  cl_optimization *opt2 = opts_for_fn (item->decl);
+
+  if (opt1 != opt2 && memcmp (opt1, opt2, sizeof(cl_optimization)))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "optimization flags difference");
+	  cl_optimization_print_diff (dump_file, 2, opt1, opt2);
+	}
+
+      return return_false_with_msg ("optimization flags are different");
+    }
+
+  /* Checking function attributes.
+     This is quadratic in the number of attributes.  */
+  if (!comp_attributes (DECL_ATTRIBUTES (decl),
+			DECL_ATTRIBUTES (item->decl)))
+    return return_false ();
+  if (comp_type_attributes (TREE_TYPE (decl),
+			    TREE_TYPE (item->decl)) != 1)
+    return return_false ();
 
   ipa_ref *ref = NULL, *ref2 = NULL;
   for (unsigned i = 0; node->iterate_reference (i, ref); i++)
@@ -424,7 +470,8 @@  sem_function::equals_wpa (sem_item *item
 
       if (!compare_cgraph_references (ignored_nodes, ref->referred,
 				      ref2->referred,
-				      ref->address_matters_p ()))
+				      ref->address_matters_p ()
+				      || ref2->address_matters_p ()))
 	return false;
     }
 
@@ -471,11 +518,57 @@  sem_function::equals (sem_item *item,
   return eq;
 }
 
+/* Return true if A1 and A2 represent equivalent function attribute lists.
+   Based on comp_type_attributes.  */
+
+bool
+sem_function::comp_attributes (const_tree a1, const_tree a2)
+{
+  const_tree a;
+  if (a1 == a2)
+    return true;
+  for (a = a1; a != NULL_TREE; a = TREE_CHAIN (a))
+    {
+      const struct attribute_spec *as;
+      const_tree attr;
+
+      as = lookup_attribute_spec (get_attribute_name (a));
+      if (!as || as->affects_type_identity == false)
+        continue;
+
+      attr = lookup_attribute (as->name, CONST_CAST_TREE (a2));
+      if (!attr || !attribute_value_equal (a, attr))
+        break;
+    }
+  if (!a)
+    {
+      for (a = a2; a != NULL_TREE; a = TREE_CHAIN (a))
+	{
+	  const struct attribute_spec *as;
+
+	  as = lookup_attribute_spec (get_attribute_name (a));
+	  if (!as || as->affects_type_identity == false)
+	    continue;
+
+	  if (!lookup_attribute (as->name, CONST_CAST_TREE (a1)))
+	    break;
+	  /* We don't need to compare trees again, as we did this
+	     already in first loop.  */
+	}
+      /* All attributes affecting type identity are equal, so
+         there is no need to call the target hook for comparison.  */
+      if (!a)
+        return true;
+    }
+  return false;
+}
+
 /* Processes function equality comparison.  */
 
 bool
 sem_function::equals_private (sem_item *item,
-			      hash_map <symtab_node *, sem_item *> &ignored_nodes)
+			      hash_map <symtab_node *, sem_item *>
+				 &ignored_nodes ATTRIBUTE_UNUSED)
 {
   if (item->type != FUNC)
     return false;
@@ -484,7 +577,6 @@  sem_function::equals_private (sem_item *
   edge e1, e2;
   edge_iterator ei1, ei2;
   bool result = true;
-  tree arg1, arg2;
 
   m_compared_func = static_cast<sem_function *> (item);
 
@@ -495,93 +587,39 @@  sem_function::equals_private (sem_item *
       || cfg_checksum != m_compared_func->cfg_checksum)
     return return_false ();
 
-  if (!equals_wpa (item, ignored_nodes))
-    return false;
+#ifdef ENABLE_CHECKING
+  gcc_assert (equals_wpa (item, ignored_nodes));
+#endif
 
-  /* Checking function TARGET and OPTIMIZATION flags.  */
-  cl_target_option *tar1 = target_opts_for_fn (decl);
-  cl_target_option *tar2 = target_opts_for_fn (m_compared_func->decl);
+  if (arg_types.length () != m_compared_func->arg_types.length ())
+    return return_false_with_msg ("different number of arguments");
 
-  if (tar1 != NULL && tar2 != NULL)
+  /* Checking types of arguments.  */
+  for (unsigned i = 0; i < arg_types.length (); i++)
     {
-      if (!cl_target_option_eq (tar1, tar2))
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    {
-	      fprintf (dump_file, "target flags difference");
-	      cl_target_option_print_diff (dump_file, 2, tar1, tar2);
-	    }
-
-	  return return_false_with_msg ("Target flags are different");
-	}
-    }
-  else if (tar1 != NULL || tar2 != NULL)
-    return return_false_with_msg ("Target flags are different");
-
-  cl_optimization *opt1 = opts_for_fn (decl);
-  cl_optimization *opt2 = opts_for_fn (m_compared_func->decl);
+      /* This guard is here for function pointer with attributes (pr59927.c).  */
+      if (!arg_types[i] || !m_compared_func->arg_types[i])
+	return return_false_with_msg ("NULL argument type");
 
-  if (opt1 != NULL && opt2 != NULL)
-    {
-      if (memcmp (opt1, opt2, sizeof(cl_optimization)))
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    {
-	      fprintf (dump_file, "optimization flags difference");
-	      cl_optimization_print_diff (dump_file, 2, opt1, opt2);
-	    }
+      /* Polymorphic comparison is executed just for non-leaf functions.  */
+      bool is_not_leaf = get_node ()->callees != NULL
+			 || get_node ()->indirect_calls != NULL;
 
-	  return return_false_with_msg ("optimization flags are different");
-	}
+      if (!func_checker::compatible_types_p (arg_types[i],
+					     m_compared_func->arg_types[i],
+					     is_not_leaf,
+					     i == 0
+					     && TREE_CODE (TREE_TYPE (decl))
+						 == METHOD_TYPE
+					     && compare_polymorphic_p ()))
+	return return_false_with_msg ("argument type is different");
     }
-  else if (opt1 != NULL || opt2 != NULL)
-    return return_false_with_msg ("optimization flags are different");
-
-  /* Checking function arguments.  */
-  tree decl1 = DECL_ATTRIBUTES (decl);
-  tree decl2 = DECL_ATTRIBUTES (m_compared_func->decl);
 
   m_checker = new func_checker (decl, m_compared_func->decl,
 				compare_polymorphic_p (),
 				false,
 				&refs_set,
 				&m_compared_func->refs_set);
-  while (decl1)
-    {
-      if (decl2 == NULL)
-	return return_false ();
-
-      if (get_attribute_name (decl1) != get_attribute_name (decl2))
-	return return_false ();
-
-      tree attr_value1 = TREE_VALUE (decl1);
-      tree attr_value2 = TREE_VALUE (decl2);
-
-      if (attr_value1 && attr_value2)
-	{
-	  bool ret = m_checker->compare_operand (TREE_VALUE (attr_value1),
-						 TREE_VALUE (attr_value2));
-	  if (!ret)
-	    return return_false_with_msg ("attribute values are different");
-	}
-      else if (!attr_value1 && !attr_value2)
-	{}
-      else
-	return return_false ();
-
-      decl1 = TREE_CHAIN (decl1);
-      decl2 = TREE_CHAIN (decl2);
-    }
-
-  if (decl1 != decl2)
-    return return_false();
-
-
-  for (arg1 = DECL_ARGUMENTS (decl),
-       arg2 = DECL_ARGUMENTS (m_compared_func->decl);
-       arg1; arg1 = DECL_CHAIN (arg1), arg2 = DECL_CHAIN (arg2))
-    if (!m_checker->compare_decl (arg1, arg2))
-      return return_false ();
 
   /* Fill-up label dictionary.  */
   for (unsigned i = 0; i < bb_sorted.length (); ++i)
@@ -632,30 +670,6 @@  sem_function::equals_private (sem_item *
     if (!compare_phi_node (bb_sorted[i]->bb, m_compared_func->bb_sorted[i]->bb))
       return return_false_with_msg ("PHI node comparison returns false");
 
-  /* Compare special function DECL attributes.  */
-  if (DECL_FUNCTION_PERSONALITY (decl) != DECL_FUNCTION_PERSONALITY (item->decl))
-    return return_false_with_msg ("function personalities are different");
-
-  if (DECL_DISREGARD_INLINE_LIMITS (decl) != DECL_DISREGARD_INLINE_LIMITS (item->decl))
-    return return_false_with_msg ("DECL_DISREGARD_INLINE_LIMITS are different");
-
-  if (DECL_DECLARED_INLINE_P (decl) != DECL_DECLARED_INLINE_P (item->decl))
-    return return_false_with_msg ("inline attributes are different");
-
-  if (DECL_IS_OPERATOR_NEW (decl) != DECL_IS_OPERATOR_NEW (item->decl))
-    return return_false_with_msg ("operator new flags are different");
-
-  if (DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (decl)
-       != DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (item->decl))
-    return return_false_with_msg ("intrument function entry exit "
-				  "attributes are different");
-
-  if (DECL_NO_LIMIT_STACK (decl) != DECL_NO_LIMIT_STACK (item->decl))
-    return return_false_with_msg ("no stack limit attributes are different");
-
-  if (flags_from_decl_or_type (decl) != flags_from_decl_or_type (item->decl))
-    return return_false_with_msg ("decl_or_type flags are different");
-
   return result;
 }
 
@@ -716,7 +730,7 @@  redirect_all_callers (cgraph_node *n, cg
 	      && n->get_availability () > AVAIL_INTERPOSABLE))
 	{
 	  nredirected += redirect_all_callers (n_alias, to);
-	  if (n_alias->can_remove_if_no_direct_calls_p ()
+	  if (n_alias->can_remove_with_comdat_group_if_no_direct_calls_p ()
 	      && !n_alias->has_aliases_p ())
 	    n_alias->remove ();
 	}
@@ -907,7 +921,7 @@  sem_function::merge (sem_item *alias_ite
 	  return false;
 	}
       if (!create_wrapper
-	  && !alias->can_remove_if_no_direct_calls_p ())
+	  && !alias->can_remove_with_comdat_group_if_no_direct_calls_p ())
 	{
 	  if (dump_file)
 	    fprintf (dump_file, "Not unifying; can not make wrapper and "
@@ -933,7 +947,7 @@  sem_function::merge (sem_item *alias_ite
 	}
 
       /* If all callers was redirected, do not produce wrapper.  */
-      if (alias->can_remove_if_no_direct_calls_p ()
+      if (alias->can_remove_with_comdat_group_if_no_direct_calls_p ()
 	  && !alias->has_aliases_p ())
 	{
 	  create_wrapper = false;
@@ -1114,10 +1128,8 @@  sem_item::add_expr (const_tree exp, inch
       add_expr (get_base_address (TREE_OPERAND (exp, 0)), hstate);
       break;
     case SSA_NAME:
-    case VAR_DECL:
-    case CONST_DECL:
-    case PARM_DECL:
-      hstate.add_wide_int (int_size_in_bytes (TREE_TYPE (exp)));
+      if (SSA_NAME_IS_DEFAULT_DEF (exp))
+	add_expr (SSA_NAME_VAR (exp), hstate);
       break;
     case MEM_REF:
     case POINTER_PLUS_EXPR:
@@ -1142,6 +1154,66 @@  sem_item::add_expr (const_tree exp, inch
     }
 }
 
+/* Accumulate into HSTATE a hash of type T.
+   Types that may end up being compatible after LTO type merging need to have
+   the same hash.  */
+void
+sem_item::add_type (const_tree type, inchash::hash &hstate)
+{
+  if (type == NULL_TREE)
+    {
+      hstate.merge_hash (0);
+      return;
+    }
+
+  type = TYPE_MAIN_VARIANT (type);
+  if (TYPE_CANONICAL (type))
+    type = TYPE_CANONICAL (type);
+
+  if (!AGGREGATE_TYPE_P (type))
+    hstate.add_int (TYPE_MODE (type));
+
+  if (TREE_CODE (type) == COMPLEX_TYPE)
+    {
+      hstate.add_int (COMPLEX_TYPE);
+      sem_item::add_type (TREE_TYPE (type), hstate);
+    }
+  else if (INTEGRAL_TYPE_P (type))
+    {
+      hstate.add_int (INTEGER_TYPE);
+      hstate.add_flag (TYPE_UNSIGNED (type));
+      hstate.add_int (TYPE_PRECISION (type));
+    }
+  else if (VECTOR_TYPE_P (type))
+    {
+      hstate.add_int (VECTOR_TYPE);
+      hstate.add_int (TYPE_PRECISION (type));
+      sem_item::add_type (TREE_TYPE (type), hstate);
+    }
+  else if (TREE_CODE (type) == ARRAY_TYPE)
+    {
+      hstate.add_int (ARRAY_TYPE);
+      sem_item::add_type (TREE_TYPE (type), hstate);
+      sem_item::add_expr (TYPE_SIZE (type), hstate);
+      hstate.add_int (POINTER_TYPE_P (type));
+    }
+  else if (RECORD_OR_UNION_TYPE_P (type))
+    {
+      unsigned nf;
+      tree f;
+      hstate.add_int (RECORD_TYPE);
+
+      for (f = TYPE_FIELDS (type), nf = 0; f; f = TREE_CHAIN (f))
+	if (TREE_CODE (f) == FIELD_DECL)
+	  {
+	    add_type (TREE_TYPE (f), hstate);
+	    nf++;
+	  }
+
+      hstate.add_int (nf);
+    }
+}
+
 /* Improve accumulated hash for HSTATE based on a gimple statement STMT.  */
 
 void
@@ -1153,14 +1225,20 @@  sem_function::hash_stmt (gimple stmt, in
 
   switch (code)
     {
+    case GIMPLE_SWITCH:
+      add_expr (gimple_switch_index (as_a <gswitch *> (stmt)), hstate);
+      break;
     case GIMPLE_ASSIGN:
+      hstate.add_int (gimple_assign_rhs_code (stmt));
       if (commutative_tree_code (gimple_assign_rhs_code (stmt))
 	  || commutative_ternary_tree_code (gimple_assign_rhs_code (stmt)))
 	{
 	  inchash::hash one, two;
 
 	  add_expr (gimple_assign_rhs1 (stmt), one);
+	  add_type (TREE_TYPE (gimple_assign_rhs1 (stmt)), one);
 	  add_expr (gimple_assign_rhs2 (stmt), two);
+	  add_type (TREE_TYPE (gimple_assign_rhs2 (stmt)), two);
 	  hstate.add_commutative (one, two);
 	  add_expr (gimple_assign_lhs (stmt), hstate);
 	  break;
@@ -1173,7 +1251,11 @@  sem_function::hash_stmt (gimple stmt, in
     case GIMPLE_RETURN:
       /* All these statements are equivalent if their operands are.  */
       for (unsigned i = 0; i < gimple_num_ops (stmt); ++i)
-	add_expr (gimple_op (stmt, i), hstate);
+	{
+	  add_expr (gimple_op (stmt, i), hstate);
+	  if (gimple_op (stmt, i))
+	    add_type (TREE_TYPE (gimple_op (stmt, i)), hstate);
+	}
     default:
       break;
     }
@@ -1343,41 +1425,6 @@  sem_function::bb_dict_test (vec<int> *bb
     return (*bb_dict)[source] == target;
 }
 
-/* Iterates all tree types in T1 and T2 and returns true if all types
-   are compatible. If COMPARE_POLYMORPHIC is set to true,
-   more strict comparison is executed.  */
-
-bool
-sem_function::compare_type_list (tree t1, tree t2, bool compare_polymorphic)
-{
-  tree tv1, tv2;
-  tree_code tc1, tc2;
-
-  if (!t1 && !t2)
-    return true;
-
-  while (t1 != NULL && t2 != NULL)
-    {
-      tv1 = TREE_VALUE (t1);
-      tv2 = TREE_VALUE (t2);
-
-      tc1 = TREE_CODE (tv1);
-      tc2 = TREE_CODE (tv2);
-
-      if (tc1 == NOP_EXPR && tc2 == NOP_EXPR)
-	{}
-      else if (tc1 == NOP_EXPR || tc2 == NOP_EXPR)
-	return false;
-      else if (!func_checker::compatible_types_p (tv1, tv2, compare_polymorphic))
-	return false;
-
-      t1 = TREE_CHAIN (t1);
-      t2 = TREE_CHAIN (t2);
-    }
-
-  return !(t1 || t2);
-}
-
 
 /* Semantic variable constructor that uses STACK as bitmap memory stack.  */
 
@@ -1439,7 +1486,8 @@  sem_variable::equals_wpa (sem_item *item
 
       if (!compare_cgraph_references (ignored_nodes,
 				      ref->referred, ref2->referred,
-				      ref->address_matters_p ()))
+				      ref->address_matters_p ()
+				      || ref2->address_matters_p ()))
 	return false;
     }
 
@@ -1452,7 +1500,8 @@  sem_variable::equals_wpa (sem_item *item
 
 bool
 sem_variable::equals (sem_item *item,
-		      hash_map <symtab_node *, sem_item *> &)
+		      hash_map <symtab_node *, sem_item *> &ignored_nodes
+			ATTRIBUTE_UNUSED)
 {
   gcc_assert (item->type == VAR);
   bool ret;
@@ -1462,6 +1511,9 @@  sem_variable::equals (sem_item *item,
   if (DECL_INITIAL (item->decl) == error_mark_node && in_lto_p)
     dyn_cast <varpool_node *>(item->node)->get_constructor ();
 
+#ifdef ENABLE_CHECKING
+  gcc_assert (equals_wpa (item, ignored_nodes));
+#endif
   ret = sem_variable::equals (DECL_INITIAL (decl),
 			      DECL_INITIAL (item->node->decl));
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -2384,8 +2436,9 @@  sem_item_optimizer::subdivide_classes_by
 		{
 		  sem_item *item = c->members[j];
 
-		  bool equals = in_wpa ? first->equals_wpa (item,
-				m_symtab_node_map) : first->equals (item, m_symtab_node_map);
+		  bool equals = in_wpa
+				 ? first->equals_wpa (item, m_symtab_node_map)
+				 : first->equals (item, m_symtab_node_map);
 
 		  if (equals)
 		    new_vector.safe_push (item);
@@ -2396,8 +2449,9 @@  sem_item_optimizer::subdivide_classes_by
 		      for (unsigned k = class_split_first; k < (*it)->classes.length (); k++)
 			{
 			  sem_item *x = (*it)->classes[k]->members[0];
-			  bool equals = in_wpa ? x->equals_wpa (item,
-								m_symtab_node_map) : x->equals (item, m_symtab_node_map);
+			  bool equals = in_wpa
+					? x->equals_wpa (item, m_symtab_node_map)
+					: x->equals (item, m_symtab_node_map);
 
 			  if (equals)
 			    {
@@ -2419,7 +2473,8 @@  sem_item_optimizer::subdivide_classes_by
 		    }
 		}
 
-	      // we replace newly created new_vector for the class we've just splitted
+	      /* Replace members of the just split class with the newly
+		 created new_vector.  */
 	      c->members.release ();
 	      c->members.create (new_vector.length ());
 
Index: ipa-icf.h
===================================================================
--- ipa-icf.h	(revision 221126)
+++ ipa-icf.h	(working copy)
@@ -242,8 +242,11 @@  protected:
   /* Cached, once calculated hash for the item.  */
   hashval_t hash;
 
-  /* Accumulate to HSTATE a hash of constructor expression EXP.  */
+  /* Accumulate to HSTATE a hash of expression EXP (either constructor or
+     gimple operand).  */
   static void add_expr (const_tree exp, inchash::hash &hstate);
+  /* Accumulate to HSTATE a hash of type T.  */
+  static void add_type (const_tree t, inchash::hash &hstate);
 
   /* For a given symbol table nodes N1 and N2, we check that FUNCTION_DECLs
      point to a same function. Comparison can be skipped if IGNORED_NODES
@@ -353,15 +356,13 @@  private:
     corresponds to TARGET.  */
  bool bb_dict_test (vec<int> *bb_dict, int source, int target);
 
-  /* Iterates all tree types in T1 and T2 and returns true if all types
-     are compatible. If COMPARE_POLYMORPHIC is set to true,
-     more strict comparison is executed.  */
-  bool compare_type_list (tree t1, tree t2, bool compare_polymorphic);
-
   /* If cgraph edges E1 and E2 are indirect calls, verify that
      ICF flags are the same.  */
  bool compare_edge_flags (cgraph_edge *e1, cgraph_edge *e2);
 
+  /* Return true if attribute lists LIST1 and LIST2 are equivalent.  */
+  bool comp_attributes (const_tree list1, const_tree list2);
+
   /* Processes function equality comparison.  */
   bool equals_private (sem_item *item,
 		       hash_map <symtab_node *, sem_item *> &ignored_nodes);
Index: tree.c
===================================================================
--- tree.c	(revision 221126)
+++ tree.c	(working copy)
@@ -4873,7 +4873,7 @@  simple_cst_list_equal (const_tree l1, co
    attribute values are known to be equal; otherwise return false.
 */
 
-static bool
+bool
 attribute_value_equal (const_tree attr1, const_tree attr2)
 {
   if (TREE_VALUE (attr1) == TREE_VALUE (attr2))
Index: tree.h
===================================================================
--- tree.h	(revision 221126)
+++ tree.h	(working copy)
@@ -4366,6 +4366,7 @@  extern tree lhd_gcc_personality (void);
 extern void assign_assembler_name_if_neeeded (tree);
 extern void warn_deprecated_use (tree, tree);
 extern void cache_integer_cst (tree);
+extern bool attribute_value_equal (const_tree, const_tree);
 
 /* Compare and hash for any structure which begins with a canonical
    pointer.  Assumes all pointers are interchangeable, which is sort