Patchwork User directed Function Multiversioning via Function Overloading (issue5752064)

Submitter Sriraman Tallam
Date April 27, 2012, 5:08 a.m.
Message ID <CAAs8HmyeZE25B+vAp7aretL85WtPfyUuLAZNw7oFPFsUHZBtPA@mail.gmail.com>
Permalink /patch/155380/
State New

Comments

Sriraman Tallam - April 27, 2012, 5:08 a.m.
Hi,

   I have made the following changes in this new patch which is attached:

* Use target attribute itself to create function versions.
* Handle any number of ISA names and arch=  args to target attribute,
generating the right dispatchers.
* Integrate with the CPU runtime detection checked in this week.
* Overload resolution: If the caller's target matches any of the
version function's target, then a direct call to the version is
generated, no need to go through the dispatching.

Patch also available for review here:
http://codereview.appspot.com/5752064

Thanks,
-Sri.


On Fri, Mar 9, 2012 at 12:04 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi Richard,
>
>  Here is a more detailed overview of the front-end description:
>
> * Tracking decls that correspond to function versions of function
> name, say "foo":
>
> When the front-end sees a decl for "foo" with "targetv" attributes, it
> tags it as a function version. To prevent duplicate definition errors
> with other versions of "foo", I change "decls_match" function in
> cp/decl.c to return false when 2 decls have the same signature but
> different targetv attributes. This will make all function versions of
> "foo" to be added to the overload list of "foo".
>
> To expand further, the targetv attributes of two decls are compared
> after sorting the arguments to targetv.
>
> * Change the assembler names of the function versions.
>
> The front-end changes the assembler names of the function versions by
> tagging the sorted list of args to "targetv" to the function name of
> "foo". For example, the assembler name of "void foo () __attribute__
> ((targetv ("sse4")))" will become _Z3foov.sse4.
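
The naming scheme described above can be sketched in standalone C++. This is only an illustration and not the patch's code: `canonical_attr` and `versioned_name` are hypothetical names, and the rules (replace `=` with `_`, sort the comma-separated arguments, join them with `_`, append to the mangled name after a `.`) are taken from the description in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turn a targetv argument string such as
// "sse4,arch=corei7" into an order-independent identifier.
std::string canonical_attr(std::string attr) {
  // '=' is replaced by '_' before the arguments are split.
  std::replace(attr.begin(), attr.end(), '=', '_');

  // Split on commas, skipping empty pieces.
  std::vector<std::string> parts;
  std::stringstream in(attr);
  std::string item;
  while (std::getline(in, item, ','))
    if (!item.empty()) parts.push_back(item);

  // Sorting makes "sse4,avx" and "avx,sse4" produce the same identifier.
  std::sort(parts.begin(), parts.end());

  std::string out;
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i) out += '_';
    out += parts[i];
  }
  return out;
}

// Hypothetical helper: tag the canonical attribute string onto the
// mangled name of the default version.
std::string versioned_name(const std::string& mangled,
                           const std::string& attr) {
  assert(!attr.empty());
  return mangled + "." + canonical_attr(attr);
}
```

With these assumptions, `versioned_name("_Z3foov", "sse4")` yields `_Z3foov.sse4`, matching the example in the text, and two attribute strings that list the same arguments in a different order map to the same name.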
>
> * Separately group all function versions of "foo" together, in multiversion.c:
>
> File multiversion.c maintains a hashtab, decl_version_htab,  that maps
> the  default function decl of "foo" to the list of all other versions
> of this function "foo". This is meant to be used when creating the
> dispatcher for this function.
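
As a rough illustration of the grouping described above, the following C++ sketch maps a default decl to the list of its other versions. It is an assumption-laden stand-in: `Decl` and `VersionTable` are hypothetical, decls are represented only by their assembler names, and the patch itself uses GCC's `htab_t` and `VEC` rather than standard containers.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a function decl: just its assembler name.
using Decl = std::string;

class VersionTable {
 public:
  // Record `version` under `default_decl`. A version that is already
  // present is ignored. Returns true if the version was newly added.
  bool add(const Decl& default_decl, const Decl& version) {
    std::vector<Decl>& list = table_[default_decl];
    if (std::find(list.begin(), list.end(), version) != list.end())
      return false;
    list.push_back(version);
    return true;
  }

  // All versions recorded for `default_decl` (empty if there are none).
  std::vector<Decl> versions_of(const Decl& default_decl) const {
    auto it = table_.find(default_decl);
    return it == table_.end() ? std::vector<Decl>{} : it->second;
  }

 private:
  std::unordered_map<Decl, std::vector<Decl>> table_;
};
```

A dispatcher builder could then ask the table for `versions_of("_Z3foov")` to learn which candidates it has to choose between.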
>
> * Overload resolution:
>
>  Function "build_over_call" in cp/call.c sees a call to function
> "foo", which is multi-versioned. The overload resolution happens in
> function "joust" in "cp/call.c". Here, the call to "foo" has all
> possible versions of "foo" as candidates. Currently, "joust" returns
> the default version of "foo" as the winning candidate. But,
> "build_over_call" realizes that this is a versioned function and
> replaces the call-site of foo with a "ifunc" call for foo, by querying
> a function in "multiversion.c" which builds the ifunc decl. After
> this, all call-sites of "foo" contain the call to the ifunc.
>
> Notice that, for  calls from a sse function to a versioned function
> with an sse variant, I can modify "joust" to return the "sse" function
> version rather than the default and not replace this call with an
> ifunc. To do this, I must pass the target attributes of the callee to
> "joust" and check if the target attributes also match any version.
>
> * Creating the dispatcher:
>
> The dispatcher is independently created in a new pass, called
> "pass_dispatch_versions", that runs immediately after cfg and cgraph are
> created. The dispatcher looks at all possible versions and queries the
> target to give it the CPU detection predicates it must use to dispatch
> each version. Then, the dispatcher body is created and the ifunc is
> mapped to use this dispatcher.
>
> Notice that only the dispatcher creation is done after the front-end.
> Everything else occurs in the front-end itself. I could have created
> the dispatcher also in the front-end. I did not do so because I
> thought keeping it as a separate pass made it easier to add more
> dispatch mechanisms. For example, when IFUNC is not available, it could
> be replaced with control-flow that makes direct calls to the function
> versions. Also, making the dispatcher after "cfg" is created was easy.
>
> Thanks,
> -Sri.
>
>
> On Wed, Mar 7, 2012 at 6:05 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Wed, Mar 7, 2012 at 1:46 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> User directed Function Multiversioning (MV) via Function Overloading
>>> ====================================================================
>>>
>>> This patch adds support for user directed function MV via function overloading.
>>> For more detailed description:
>>> http://gcc.gnu.org/ml/gcc/2012-03/msg00074.html
>>>
>>>
>>> Here is an example program with function versions:
>>>
>>> int foo ();  /* Default version */
>>> int foo () __attribute__ ((targetv("arch=corei7")));/*Specialized for corei7 */
>>> int foo () __attribute__ ((targetv("arch=core2")));/*Specialized for core2 */
>>>
>>> int main ()
>>> {
>>>  int (*p)() = &foo;
>>>  return foo () + (*p)();
>>> }
>>>
>>> int foo ()
>>> {
>>>  return 0;
>>> }
>>>
>>> int __attribute__ ((targetv("arch=corei7")))
>>> foo ()
>>> {
>>>  return 0;
>>> }
>>>
>>> int __attribute__ ((targetv("arch=core2")))
>>> foo ()
>>> {
>>>  return 0;
>>> }
>>>
>>> The above example has foo defined 3 times, but all 3 definitions of foo are
>>> different versions of the same function. The call to foo in main, directly and
>>> via a pointer, are calls to the multi-versioned function foo which is dispatched
>>> to the right foo at run-time.
>>>
>>> Function versions must have the same signature but must differ in the specifier
>>> string provided to a new attribute called "targetv", which is nothing but the
>>> target attribute with an extra specification to indicate a version. Any number
>>> of versions can be created using the targetv attribute but it is mandatory to
>>> have one function without the attribute, which is treated as the default
>>> version.
>>>
>>> The dispatching is done using the IFUNC mechanism to keep the dispatch overhead
>>> low. The compiler creates a dispatcher function which checks the CPU type and
>>> calls the right version of foo. The dispatching code checks for the platform
>>> type and calls the first version that matches. The default function is called if
>>> no specialized version is appropriate for execution.
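
The selection rule described above (try each version in order, call the first one whose CPU check passes, otherwise fall back to the default) can be sketched in plain C++. This is not the IFUNC resolver the patch generates: `Version`, `pick_version` and the two predicates are hypothetical, the predicates return fixed values instead of querying the CPU, and the stand-in functions return distinct values only so that the choice is observable.

```cpp
#include <vector>

using FooFn = int (*)();

// Stand-ins for the three versions of foo from the example program.
int foo_default() { return 0; }
int foo_corei7()  { return 1; }
int foo_core2()   { return 2; }

// Hypothetical CPU predicates. In the real dispatcher these checks come
// from the target's CPU detection; here they are hard-coded.
bool cpu_is_corei7() { return false; }
bool cpu_is_core2()  { return true; }

// One dispatch candidate: a predicate plus the version it guards.
struct Version {
  bool (*matches)();
  FooFn fn;
};

// Return the first version whose predicate holds, else the default.
FooFn pick_version(const std::vector<Version>& versions, FooFn fallback) {
  for (const Version& v : versions)
    if (v.matches())
      return v.fn;
  return fallback;
}
```

With the hard-coded predicates above, `pick_version({{cpu_is_corei7, foo_corei7}, {cpu_is_core2, foo_core2}}, foo_default)` selects `foo_core2`. With IFUNC this choice is made by the resolver when the symbol is resolved, so later calls are expected to go straight to the selected version without re-running the checks.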
>>>
>>> The pointer to foo is made to be the address of the dispatcher function, so that
>>> it is unique and calls made via the pointer also work correctly. The assembler
>>> names of the various versions of foo are made different, by tagging
>>> the specifier strings, to keep them unique.  A specific version can be called
>>> directly by creating an alias to its assembler name. For instance, to call the
>>> corei7 version directly, make an alias :
>>> int foo_corei7 () __attribute__((alias ("_Z3foov.arch_corei7")));
>>> and then call foo_corei7.
>>>
>>> Note that using IFUNC  blocks inlining of versioned functions. I had implemented
>>> an optimization earlier to do hot path cloning to allow versioned functions to
>>> be inlined. Please see : http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html
>>> In the next iteration, I plan to merge these two. With that, hot code paths with
>>> versioned functions will be cloned so that versioned functions can be inlined.
>>
>> Note that inlining of functions with the target attribute is limited as well,
>> but your issue is that of the indirect dispatch as ...
>>
>> You don't give an overview of the frontend implementation.  Thus I have
>> extracted the following
>>
>>  - the FE does not really know about the "overloading", nor can it directly
>>   resolve calls from a "sse" function to another "sse" function without going
>>   through the 2nd IFUNC
>>
>>  - cgraph also does not know about the "overloading", so it cannot do such
>>   "devirtualization" either
>>
>> you seem to have implemented something in between a pure frontend
>> solution and a proper middle-end solution.  For optimization and eventually
>> automatically selecting functions for cloning (like, callees of a manual "sse"
>> versioned function should be cloned?) it would be nice if the cgraph would
>> know about the different versions and their relationships (and the dispatcher).
>> Especially the cgraph code should know the functions are semantically
>> equivalent (I suppose we should require that).  The IFUNC should be
>> generated by cgraph / target code, similar to how we generate C++ thunks.
>>
>> Honza, any suggestions on how the FE side of such cgraph infrastructure
>> should look and how we should encode the target bits?
>>
>> Thanks,
>> Richard.
>>
>>>        * doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION.
>>>        * doc/tm.texi: Regenerate.
>>>        * c-family/c-common.c (handle_targetv_attribute): New function.
>>>        * target.def (dispatch_version): New target hook.
>>>        * tree.h (DECL_FUNCTION_VERSIONED): New macro.
>>>        (tree_function_decl): New bit-field versioned_function.
>>>        * tree-pass.h (pass_dispatch_versions): New pass.
>>>        * multiversion.c: New file.
>>>        * multiversion.h: New file.
>>>        * cgraphunit.c: Include multiversion.h
>>>        (cgraph_finalize_function): Change assembler names of versioned
>>>        functions.
>>>        * cp/class.c: Include multiversion.h
>>>        (add_method): aggregate function versions. Change assembler names of
>>>        versioned functions.
>>>        (resolve_address_of_overloaded_function): Match address of function
>>>        version with default function.  Return address of ifunc dispatcher
>>>        for address of versioned functions.
>>>        * cp/decl.c (decls_match): Make decls unmatched for versioned
>>>        functions.
>>>        (duplicate_decls): Remove ambiguity for versioned functions. Notify
>>>        of deleted function version decls.
>>>        (start_decl): Change assembler name of versioned functions.
>>>        (start_function): Change assembler name of versioned functions.
>>>        (cxx_comdat_group): Make comdat group of versioned functions be the
>>>        same.
>>>        * cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
>>>        functions that are also marked inline.
>>>        * cp/decl2.c: Include multiversion.h
>>>        (check_classfn): Check attributes of versioned functions for match.
>>>        * cp/call.c: Include multiversion.h
>>>        (build_over_call): Make calls to multiversioned functions to call the
>>>        dispatcher.
>>>        (joust): For calls to multi-versioned functions, make the default
>>>        function win.
>>>        * timevar.def (TV_MULTIVERSION_DISPATCH): New time var.
>>>        * varasm.c (finish_aliases_1): Check if the alias points to a function
>>>        with a body before giving an error.
>>>        * Makefile.in: Add multiversion.o
>>>        * passes.c: Add pass_dispatch_versions to the pass list.
>>>        * config/i386/i386.c (add_condition_to_bb): New function.
>>>        (get_builtin_code_for_version): New function.
>>>        (ix86_dispatch_version): New function.
>>>        (TARGET_DISPATCH_VERSION): New macro.
>>>        * testsuite/g++.dg/mv1.C: New test.
>>>
>>> Index: doc/tm.texi
>>> ===================================================================
>>> --- doc/tm.texi (revision 184971)
>>> +++ doc/tm.texi (working copy)
>>> @@ -10995,6 +10995,14 @@ The result is another tree containing a simplified
>>>  call's result.  If @var{ignore} is true the value will be ignored.
>>>  @end deftypefn
>>>
>>> +@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
>>> +For a multi-versioned function, this hook sets up the dispatcher.
>>> +@var{dispatch_decl} is the function that will be used to dispatch the
>>> +version. @var{fndecls} are the function choices for dispatch.
>>> +@var{empty_bb} is a basic block in @var{dispatch_decl} where the
>>> +code to do the dispatch will be added.
>>> +@end deftypefn
>>> +
>>>  @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
>>>
>>>  Take an instruction in @var{insn} and return NULL if it is valid within a
>>> Index: doc/tm.texi.in
>>> ===================================================================
>>> --- doc/tm.texi.in      (revision 184971)
>>> +++ doc/tm.texi.in      (working copy)
>>> @@ -10873,6 +10873,14 @@ The result is another tree containing a simplified
>>>  call's result.  If @var{ignore} is true the value will be ignored.
>>>  @end deftypefn
>>>
>>> +@hook TARGET_DISPATCH_VERSION
>>> +For a multi-versioned function, this hook sets up the dispatcher.
>>> +@var{dispatch_decl} is the function that will be used to dispatch the
>>> +version. @var{fndecls} are the function choices for dispatch.
>>> +@var{empty_bb} is a basic block in @var{dispatch_decl} where the
>>> +code to do the dispatch will be added.
>>> +@end deftypefn
>>> +
>>>  @hook TARGET_INVALID_WITHIN_DOLOOP
>>>
>>>  Take an instruction in @var{insn} and return NULL if it is valid within a
>>> Index: c-family/c-common.c
>>> ===================================================================
>>> --- c-family/c-common.c (revision 184971)
>>> +++ c-family/c-common.c (working copy)
>>> @@ -315,6 +315,7 @@ static tree check_case_value (tree);
>>>  static bool check_case_bounds (tree, tree, tree *, tree *);
>>>
>>>  static tree handle_packed_attribute (tree *, tree, tree, int, bool *);
>>> +static tree handle_targetv_attribute (tree *, tree, tree, int, bool *);
>>>  static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *);
>>>  static tree handle_common_attribute (tree *, tree, tree, int, bool *);
>>>  static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
>>> @@ -604,6 +605,8 @@ const struct attribute_spec c_common_attribute_tab
>>>  {
>>>   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
>>>        affects_type_identity } */
>>> +  { "targetv",               1, -1, true, false, false,
>>> +                             handle_targetv_attribute, false },
>>>   { "packed",                 0, 0, false, false, false,
>>>                              handle_packed_attribute , false},
>>>   { "nocommon",               0, 0, true,  false, false,
>>> @@ -5869,6 +5872,54 @@ handle_packed_attribute (tree *node, tree name, tr
>>>   return NULL_TREE;
>>>  }
>>>
>>> +/* The targetv attribute is used to specify a function version
>>> +   targeted to specific platform types.  The "targetv" attributes
>>> +   have to be valid "target" attributes.  NODE should always point
>>> +   to a FUNCTION_DECL.  ARGS contain the arguments to "targetv"
>>> +   which should be valid arguments to attribute "target" too.
>>> +   Check handle_target_attribute for FLAGS and NO_ADD_ATTRS.  */
>>> +
>>> +static tree
>>> +handle_targetv_attribute (tree *node, tree name,
>>> +                         tree args,
>>> +                         int flags,
>>> +                         bool *no_add_attrs)
>>> +{
>>> +  const char *attr_str = NULL;
>>> +  gcc_assert (TREE_CODE (*node) == FUNCTION_DECL);
>>> +  gcc_assert (args != NULL);
>>> +
>>> +  /* This is a function version.  */
>>> +  DECL_FUNCTION_VERSIONED (*node) = 1;
>>> +
>>> +  attr_str = TREE_STRING_POINTER (TREE_VALUE (args));
>>> +
>>> +  /* Check if multiple sets of target attributes are there.  This
>>> +     is not supported now.   In future, this will be supported by
>>> +     cloning this function for each set.  */
>>> +  if (TREE_CHAIN (args) != NULL)
>>> +    warning (OPT_Wattributes, "%qE attribute has multiple sets which "
>>> +            "is not supported", name);
>>> +
>>> +  if (attr_str == NULL
>>> +      || strstr (attr_str, "arch=") == NULL)
>>> +    error_at (DECL_SOURCE_LOCATION (*node),
>>> +             "Versioning supported only on \"arch=\" for now");
>>> +
>>> +  /* targetv attributes must translate into target attributes.  */
>>> +  handle_target_attribute (node, get_identifier ("target"), args, flags,
>>> +                          no_add_attrs);
>>> +
>>> +  if (*no_add_attrs)
>>> +    warning (OPT_Wattributes, "%qE attribute has no effect", name);
>>> +
>>> +  /* This is necessary to keep the attribute tagged to the decl
>>> +     all the time.  */
>>> +  *no_add_attrs = false;
>>> +
>>> +  return NULL_TREE;
>>> +}
>>> +
>>>  /* Handle a "nocommon" attribute; arguments as in
>>>    struct attribute_spec.handler.  */
>>>
>>> Index: target.def
>>> ===================================================================
>>> --- target.def  (revision 184971)
>>> +++ target.def  (working copy)
>>> @@ -1249,6 +1249,15 @@ DEFHOOK
>>>  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
>>>  hook_tree_tree_int_treep_bool_null)
>>>
>>> +/* Target hook to generate the dispatching code for calls to multi-versioned
>>> +   functions.  DISPATCH_DECL is the function that will have the dispatching
>>> +   logic.  FNDECLS are the list of choices for dispatch and EMPTY_BB is the
>>> +   basic block in DISPATCH_DECL which will contain the code.  */
>>> +DEFHOOK
>>> +(dispatch_version,
>>> + "",
>>> + int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
>>> +
>>>  /* Returns a code for a target-specific builtin that implements
>>>    reciprocal of the function, or NULL_TREE if not available.  */
>>>  DEFHOOK
>>> Index: tree.h
>>> ===================================================================
>>> --- tree.h      (revision 184971)
>>> +++ tree.h      (working copy)
>>> @@ -3532,6 +3532,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
>>>  #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
>>>    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
>>>
>>> +/* In FUNCTION_DECL, this is set if this function has other versions generated
>>> +   using "targetv" attributes.  The default version is the one which does not
>>> +   have any "targetv" attribute set. */
>>> +#define DECL_FUNCTION_VERSIONED(NODE)\
>>> +   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
>>> +
>>>  /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
>>>    arguments/result/saved_tree fields by front ends.   It was either inherit
>>>    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
>>> @@ -3576,8 +3582,8 @@ struct GTY(()) tree_function_decl {
>>>   unsigned looping_const_or_pure_flag : 1;
>>>   unsigned has_debug_args_flag : 1;
>>>   unsigned tm_clone_flag : 1;
>>> -
>>> -  /* 1 bit left */
>>> +  unsigned versioned_function : 1;
>>> +  /* No bits left.  */
>>>  };
>>>
>>>  /* The source language of the translation-unit.  */
>>> Index: tree-pass.h
>>> ===================================================================
>>> --- tree-pass.h (revision 184971)
>>> +++ tree-pass.h (working copy)
>>> @@ -455,6 +455,7 @@ extern struct gimple_opt_pass pass_tm_memopt;
>>>  extern struct gimple_opt_pass pass_tm_edges;
>>>  extern struct gimple_opt_pass pass_split_functions;
>>>  extern struct gimple_opt_pass pass_feedback_split_functions;
>>> +extern struct gimple_opt_pass pass_dispatch_versions;
>>>
>>>  /* IPA Passes */
>>>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
>>> Index: multiversion.c
>>> ===================================================================
>>> --- multiversion.c      (revision 0)
>>> +++ multiversion.c      (revision 0)
>>> @@ -0,0 +1,798 @@
>>> +/* Function Multiversioning.
>>> +   Copyright (C) 2012 Free Software Foundation, Inc.
>>> +   Contributed by Sriraman Tallam (tmsriram@google.com)
>>> +
>>> +This file is part of GCC.
>>> +
>>> +GCC is free software; you can redistribute it and/or modify it under
>>> +the terms of the GNU General Public License as published by the Free
>>> +Software Foundation; either version 3, or (at your option) any later
>>> +version.
>>> +
>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>> +for more details.
>>> +
>>> +You should have received a copy of the GNU General Public License
>>> +along with GCC; see the file COPYING3.  If not see
>>> +<http://www.gnu.org/licenses/>. */
>>> +
>>> +/* Holds the state for multi-versioned functions here. The front-end
>>> +   updates the state as and when function versions are encountered.
>>> +   This is then used to generate the dispatch code.  Also, the
>>> +   optimization passes to clone hot paths involving versioned functions
>>> +   will be done here.
>>> +
>>> +   Function versions are created by using the same function signature but
>>> +   also tagging attribute "targetv" to specify the platform type for which
>>> +   the version must be executed.  Here is an example:
>>> +
>>> +   int foo ()
>>> +   {
>>> +     printf ("Execute as default");
>>> +     return 0;
>>> +   }
>>> +
>>> +   int  __attribute__ ((targetv ("arch=corei7")))
>>> +   foo ()
>>> +   {
>>> +     printf ("Execute for corei7");
>>> +     return 0;
>>> +   }
>>> +
>>> +   int main ()
>>> +   {
>>> +     return foo ();
>>> +   }
>>> +
>>> +   The call to foo in main is replaced with a call to an IFUNC function that
>>> +   contains the dispatch code to call the correct function version at
>>> +   run-time.  */
>>> +
>>> +
>>> +#include "config.h"
>>> +#include "system.h"
>>> +#include "coretypes.h"
>>> +#include "tm.h"
>>> +#include "tree.h"
>>> +#include "tree-inline.h"
>>> +#include "langhooks.h"
>>> +#include "flags.h"
>>> +#include "cgraph.h"
>>> +#include "diagnostic.h"
>>> +#include "toplev.h"
>>> +#include "timevar.h"
>>> +#include "params.h"
>>> +#include "fibheap.h"
>>> +#include "intl.h"
>>> +#include "tree-pass.h"
>>> +#include "hashtab.h"
>>> +#include "coverage.h"
>>> +#include "ggc.h"
>>> +#include "tree-flow.h"
>>> +#include "rtl.h"
>>> +#include "ipa-prop.h"
>>> +#include "basic-block.h"
>>> +#include "toplev.h"
>>> +#include "dbgcnt.h"
>>> +#include "tree-dump.h"
>>> +#include "output.h"
>>> +#include "vecprim.h"
>>> +#include "gimple-pretty-print.h"
>>> +#include "ipa-inline.h"
>>> +#include "target.h"
>>> +#include "multiversion.h"
>>> +
>>> +typedef void * void_p;
>>> +
>>> +DEF_VEC_P (void_p);
>>> +DEF_VEC_ALLOC_P (void_p, heap);
>>> +
>>> +/* Each function decl that is a function version gets an instance of this
>>> +   structure.   Since this is called by the front-end, decl merging can
>>> +   happen, where a decl created for a new declaration is merged with
>>> +   the old. In this case, the new decl is deleted and the IS_DELETED
>>> +   field is set for the struct instance corresponding to the new decl.
>>> +   IFUNC_DECL is the decl of the ifunc function for default decls.
>>> +   IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS
>>> +   is a vector containing the list of function versions  that are
>>> +   the candidates for dispatch.  */
>>> +
>>> +typedef struct version_function_d {
>>> +  tree decl;
>>> +  tree ifunc_decl;
>>> +  tree ifunc_resolver_decl;
>>> +  VEC (void_p, heap) *versions;
>>> +  bool is_deleted;
>>> +} version_function;
>>> +
>>> +/* Hashmap has an entry for every function decl that has other function
>>> +   versions.  For function decls that are the default, it also stores the
>>> +   list of all the other function versions.  Each entry is a structure
>>> +   of type version_function_d.  */
>>> +static htab_t decl_version_htab = NULL;
>>> +
>>> +/* Hashtable helpers for decl_version_htab. */
>>> +
>>> +static hashval_t
>>> +decl_version_htab_hash_descriptor (const void *p)
>>> +{
>>> +  const version_function *t = (const version_function *) p;
>>> +  return htab_hash_pointer (t->decl);
>>> +}
>>> +
>>> +/* Hashtable helper for decl_version_htab. */
>>> +
>>> +static int
>>> +decl_version_htab_eq_descriptor (const void *p1, const void *p2)
>>> +{
>>> +  const version_function *t1 = (const version_function *) p1;
>>> +  return htab_eq_pointer ((const void_p) t1->decl, p2);
>>> +}
>>> +
>>> +/* Create the decl_version_htab.  */
>>> +static void
>>> +create_decl_version_htab (void)
>>> +{
>>> +  if (decl_version_htab == NULL)
>>> +    decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
>>> +                                    decl_version_htab_eq_descriptor, NULL);
>>> +}
>>> +
>>> +/* Creates an instance of version_function for decl DECL.  */
>>> +
>>> +static version_function*
>>> +new_version_function (const tree decl)
>>> +{
>>> +  version_function *v;
>>> +  v = (version_function *)xmalloc(sizeof (version_function));
>>> +  v->decl = decl;
>>> +  v->ifunc_decl = NULL;
>>> +  v->ifunc_resolver_decl = NULL;
>>> +  v->versions = NULL;
>>> +  v->is_deleted = false;
>>> +  return v;
>>> +}
>>> +
>>> +/* Comparator function to be used in qsort routine to sort attribute
>>> +   specification strings to "targetv".  */
>>> +
>>> +static int
>>> +attr_strcmp (const void *v1, const void *v2)
>>> +{
>>> +  const char *c1 = *(char *const*)v1;
>>> +  const char *c2 = *(char *const*)v2;
>>> +  return strcmp (c1, c2);
>>> +}
>>> +
>>> +/* STR is the argument to targetv attribute.  This function tokenizes
>>> +   the comma separated arguments, sorts them and returns a string which
>>> +   is a unique identifier for the comma separated arguments.  */
>>> +
>>> +static char *
>>> +sorted_attr_string (const char *str)
>>> +{
>>> +  char **args = NULL;
>>> +  char *attr_str, *ret_str;
>>> +  char *attr = NULL;
>>> +  unsigned int argnum = 1;
>>> +  unsigned int i;
>>> +
>>> +  for (i = 0; i < strlen (str); i++)
>>> +    if (str[i] == ',')
>>> +      argnum++;
>>> +
>>> +  attr_str = (char *)xmalloc (strlen (str) + 1);
>>> +  strcpy (attr_str, str);
>>> +
>>> +  for (i = 0; i < strlen (attr_str); i++)
>>> +    if (attr_str[i] == '=')
>>> +      attr_str[i] = '_';
>>> +
>>> +  if (argnum == 1)
>>> +    return attr_str;
>>> +
>>> +  args = (char **)xmalloc (argnum * sizeof (char *));
>>> +
>>> +  i = 0;
>>> +  attr = strtok (attr_str, ",");
>>> +  while (attr != NULL)
>>> +    {
>>> +      args[i] = attr;
>>> +      i++;
>>> +      attr = strtok (NULL, ",");
>>> +    }
>>> +
>>> +  qsort (args, argnum, sizeof (char*), attr_strcmp);
>>> +
>>> +  ret_str = (char *)xmalloc (strlen (str) + 1);
>>> +  strcpy (ret_str, args[0]);
>>> +  for (i = 1; i < argnum; i++)
>>> +    {
>>> +      strcat (ret_str, "_");
>>> +      strcat (ret_str, args[i]);
>>> +    }
>>> +
>>> +  free (args);
>>> +  free (attr_str);
>>> +  return ret_str;
>>> +}
>>> +
>>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
>>> +   or if the "targetv" attribute strings of DECL1 and DECL2 don't match.  */
>>> +
>>> +bool
>>> +has_different_version_attributes (const tree decl1, const tree decl2)
>>> +{
>>> +  tree attr1, attr2;
>>> +  char *c1, *c2;
>>> +  bool ret = false;
>>> +
>>> +  if (TREE_CODE (decl1) != FUNCTION_DECL
>>> +      || TREE_CODE (decl2) != FUNCTION_DECL)
>>> +    return false;
>>> +
>>> +  attr1 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl1));
>>> +  attr2 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl2));
>>> +
>>> +  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
>>> +    return false;
>>> +
>>> +  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
>>> +      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
>>> +    return true;
>>> +
>>> +  c1 = sorted_attr_string (
>>> +       TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
>>> +  c2 = sorted_attr_string (
>>> +       TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
>>> +
>>> +  if (strcmp (c1, c2) != 0)
>>> +     ret = true;
>>> +
>>> +  free (c1);
>>> +  free (c2);
>>> +
>>> +  return ret;
>>> +}
>>> +
>>> +/* If this decl corresponds to a function and has "targetv" attribute,
>>> +   append the attribute string to its assembler name.  */
>>> +
>>> +void
>>> +version_assembler_name (const tree decl)
>>> +{
>>> +  tree version_attr;
>>> +  const char *orig_name, *version_string, *attr_str;
>>> +  char *assembler_name;
>>> +  tree assembler_name_tree;
>>> +
>>> +  if (TREE_CODE (decl) != FUNCTION_DECL
>>> +      || DECL_ASSEMBLER_NAME_SET_P (decl)
>>> +      || !DECL_FUNCTION_VERSIONED (decl))
>>> +    return;
>>> +
>>> +  if (DECL_DECLARED_INLINE_P (decl)
>>> +      &&lookup_attribute ("gnu_inline",
>>> +                         DECL_ATTRIBUTES (decl)))
>>> +    error_at (DECL_SOURCE_LOCATION (decl),
>>> +             "Function versions cannot be marked as gnu_inline,"
>>> +             " bodies have to be generated\n");
>>> +
>>> +  if (DECL_VIRTUAL_P (decl)
>>> +      || DECL_VINDEX (decl))
>>> +    error_at (DECL_SOURCE_LOCATION (decl),
>>> +             "Virtual function versioning not supported\n");
>>> +
>>> +  version_attr = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
>>> +  /* targetv attribute string is NULL for default functions.  */
>>> +  if (version_attr == NULL_TREE)
>>> +    return;
>>> +
>>> +  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>>> +  version_string
>>> +    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
>>> +
>>> +  attr_str = sorted_attr_string (version_string);
>>> +  assembler_name = (char *) xmalloc (strlen (orig_name)
>>> +                                    + strlen (attr_str) + 2);
>>> +
>>> +  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
>>> +  if (dump_file)
>>> +    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
>>> +            assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
>>> +  assembler_name_tree = get_identifier (assembler_name);
>>> +  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
>>> +}
>>> +
>>> +/* Returns true if DECL is multi-versioned and is the default function,
>>> +   that is, it is not tagged with the "targetv" attribute.  */
>>> +
>>> +bool
>>> +is_default_function (const tree decl)
>>> +{
>>> +  return (TREE_CODE (decl) == FUNCTION_DECL
>>> +         && DECL_FUNCTION_VERSIONED (decl)
>>> +         && (lookup_attribute ("targetv", DECL_ATTRIBUTES (decl))
>>> +             == NULL_TREE));
>>> +}
>>> +
>>> +/* For function decl DECL, find the version_function struct in the
>>> +   decl_version_htab.  */
>>> +
>>> +static version_function *
>>> +find_function_version (const tree decl)
>>> +{
>>> +  void *slot;
>>> +
>>> +  if (!DECL_FUNCTION_VERSIONED (decl))
>>> +    return NULL;
>>> +
>>> +  if (!decl_version_htab)
>>> +    return NULL;
>>> +
>>> +  slot = htab_find_with_hash (decl_version_htab, decl,
>>> +                              htab_hash_pointer (decl));
>>> +
>>> +  if (slot != NULL)
>>> +    return (version_function *)slot;
>>> +
>>> +  return NULL;
>>> +}
>>> +
>>> +/* Record DECL as a function version by creating a version_function struct
>>> +   for it and storing it in the hashtable.  */
>>> +
>>> +static version_function *
>>> +add_function_version (const tree decl)
>>> +{
>>> +  void **slot;
>>> +  version_function *v;
>>> +
>>> +  if (!DECL_FUNCTION_VERSIONED (decl))
>>> +    return NULL;
>>> +
>>> +  create_decl_version_htab ();
>>> +
>>> +  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
>>> +                                   htab_hash_pointer ((const void_p)decl),
>>> +                                  INSERT);
>>> +
>>> +  if (*slot != NULL)
>>> +    return (version_function *)*slot;
>>> +
>>> +  v = new_version_function (decl);
>>> +  *slot = v;
>>> +
>>> +  return v;
>>> +}
>>> +
>>> +/* Push V into VEC only if it is not already present.  */
>>> +
>>> +static void
>>> +push_function_version (version_function *v, VEC (void_p, heap) *vec)
>>> +{
>>> +  int ix;
>>> +  void_p ele;
>>> +  for (ix = 0; VEC_iterate (void_p, vec, ix, ele); ++ix)
>>> +    {
>>> +      if (ele == (void_p)v)
>>> +        return;
>>> +    }
>>> +
>>> +  VEC_safe_push (void_p, heap, vec, (void*)v);
>>> +}
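
A note on push_function_version above: as far as I can tell, VEC_safe_push takes the vector as an lvalue and may reallocate it, and `vec` here is a by-value parameter, so a reallocation would update only the local copy and not default_v->versions in the caller. The stand-alone sketch below shows the push-if-absent idea with the vector passed by address. It does not use the GCC VEC API; `struct ptr_vec` and `push_unique` are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for VEC (void_p, heap): header and elements live
   in one allocation, so growing the vector can move the whole object.  */
struct ptr_vec
{
  size_t len;
  size_t cap;
  void *elts[];
};

/* Push ELT into *VECP unless it is already present.  VECP points at the
   caller's vector pointer, so a reallocation is visible to the caller.
   Return 1 if ELT was added, 0 if it was already there.  */
static int
push_unique (struct ptr_vec **vecp, void *elt)
{
  struct ptr_vec *vec;
  size_t ix;

  assert (vecp != NULL);
  vec = *vecp;

  for (ix = 0; vec != NULL && ix < vec->len; ++ix)
    if (vec->elts[ix] == elt)
      return 0;

  if (vec == NULL || vec->len == vec->cap)
    {
      size_t old_len = vec ? vec->len : 0;
      size_t new_cap = vec ? vec->cap * 2 : 1;

      vec = realloc (vec, sizeof *vec + new_cap * sizeof (void *));
      if (vec == NULL)
        abort ();
      vec->len = old_len;
      vec->cap = new_cap;
      /* Publish the possibly moved vector to the caller.  */
      *vecp = vec;
    }

  vec->elts[vec->len++] = elt;
  return 1;
}
```

With this shape the call site would pass `&default_v->versions` rather than the vector itself.
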
>>> +
>>> +/* Mark DECL as deleted.  This is called by the front-end when a duplicate
>>> +   decl is merged with the original decl and the duplicate decl is deleted.
>>> +   This function marks the duplicate_decl as invalid.  Called by
>>> +   duplicate_decls in cp/decl.c.  */
>>> +
>>> +void
>>> +mark_delete_decl_version (const tree decl)
>>> +{
>>> +  version_function *decl_v;
>>> +
>>> +  decl_v = find_function_version (decl);
>>> +
>>> +  if (decl_v == NULL)
>>> +    return;
>>> +
>>> +  decl_v->is_deleted = true;
>>> +
>>> +  if (is_default_function (decl)
>>> +      && decl_v->versions != NULL)
>>> +    {
>>> +      VEC_truncate (void_p, decl_v->versions, 0);
>>> +      VEC_free (void_p, heap, decl_v->versions);
>>> +    }
>>> +}
>>> +
>>> +/* Mark DECL1 and DECL2 to be function versions in the same group.  One
>>> +   of DECL1 and DECL2 must be the default, otherwise this function does
>>> +   nothing.  This function aggregates the versions.  */
>>> +
>>> +int
>>> +group_function_versions (const tree decl1, const tree decl2)
>>> +{
>>> +  tree default_decl, version_decl;
>>> +  version_function *default_v, *version_v;
>>> +
>>> +  gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
>>> +             && DECL_FUNCTION_VERSIONED (decl2));
>>> +
>>> +  /* The version decls are added only to the default decl.  */
>>> +  if (!is_default_function (decl1)
>>> +      && !is_default_function (decl2))
>>> +    return 0;
>>> +
>>> +  /* This can happen with duplicate declarations.  Just ignore.  */
>>> +  if (is_default_function (decl1)
>>> +      && is_default_function (decl2))
>>> +    return 0;
>>> +
>>> +  default_decl = (is_default_function (decl1)) ? decl1 : decl2;
>>> +  version_decl = (default_decl == decl1) ? decl2 : decl1;
>>> +
>>> +  gcc_assert (default_decl != version_decl);
>>> +  create_decl_version_htab ();
>>> +
>>> +  /* If the version function is found, it has been added.  */
>>> +  if (find_function_version (version_decl))
>>> +    return 0;
>>> +
>>> +  default_v = add_function_version (default_decl);
>>> +  version_v = add_function_version (version_decl);
>>> +
>>> +  if (default_v->versions == NULL)
>>> +    default_v->versions = VEC_alloc (void_p, heap, 1);
>>> +
>>> +  push_function_version (version_v, default_v->versions);
>>> +  return 0;
>>> +}
>>> +
>>> +/* Makes a function attribute of the form NAME(ARG_NAME) and chains
>>> +   it to CHAIN.  */
>>> +
>>> +static tree
>>> +make_attribute (const char *name, const char *arg_name, tree chain)
>>> +{
>>> +  tree attr_name;
>>> +  tree attr_arg_name;
>>> +  tree attr_args;
>>> +  tree attr;
>>> +
>>> +  attr_name = get_identifier (name);
>>> +  attr_arg_name = build_string (strlen (arg_name), arg_name);
>>> +  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
>>> +  attr = tree_cons (attr_name, attr_args, chain);
>>> +  return attr;
>>> +}
>>> +
>>> +/* Return a new name by appending SUFFIX to the assembler name of DECL.
>>> +   If MAKE_UNIQUE is true, also include a file-specific unique name.  */
>>> +
>>> +static char *
>>> +make_name (tree decl, const char *suffix, bool make_unique)
>>> +{
>>> +  char *global_var_name;
>>> +  int name_len;
>>> +  const char *name;
>>> +  const char *unique_name = NULL;
>>> +
>>> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>>> +
>>> +  /* Get a unique name that can be used globally without any chance
>>> +     of collision at link time.  */
>>> +  if (make_unique)
>>> +    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
>>> +
>>> +  name_len = strlen (name) + strlen (suffix) + 2;
>>> +
>>> +  if (make_unique)
>>> +    name_len += strlen (unique_name) + 1;
>>> +  global_var_name = (char *) xmalloc (name_len);
>>> +
>>> +  /* Use '.' to concatenate names as it is demangler friendly.  */
>>> +  if (make_unique)
>>> +    snprintf (global_var_name, name_len, "%s.%s.%s", name,
>>> +              unique_name, suffix);
>>> +  else
>>> +    snprintf (global_var_name, name_len, "%s.%s", name, suffix);
>>> +
>>> +  return global_var_name;
>>> +}
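
The buffer arithmetic in make_name can be checked in isolation. The sketch below is a stand-alone model of it under stated assumptions: it takes plain strings instead of a decl, `join_name` is a hypothetical name, and "file_c" in the usage is only a placeholder for whatever get_file_function_name would return.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'ed "NAME.SUFFIX", or "NAME.UNIQUE.SUFFIX" when UNIQUE
   is non-NULL.  The caller frees the result.  */
static char *
join_name (const char *name, const char *unique, const char *suffix)
{
  size_t len;
  char *buf;

  assert (name != NULL && suffix != NULL);

  /* One byte for the '.' separator and one for the terminating NUL.  */
  len = strlen (name) + strlen (suffix) + 2;
  if (unique != NULL)
    len += strlen (unique) + 1;	/* The second '.' separator.  */

  buf = malloc (len);
  if (buf == NULL)
    abort ();

  if (unique != NULL)
    snprintf (buf, len, "%s.%s.%s", name, unique, suffix);
  else
    snprintf (buf, len, "%s.%s", name, suffix);

  return buf;
}
```

For example, join_name ("_Z3foov", NULL, "resolver") yields "_Z3foov.resolver", and passing "file_c" as UNIQUE with suffix "ifunc" yields "_Z3foov.file_c.ifunc".
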
>>> +
>>> +/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
>>> +   the versions of multi-versioned function DEFAULT_DECL.  Create an
>>> +   empty basic block in the resolver and store the pointer in
>>> +   EMPTY_BB.  Return the decl of the resolver function.  */
>>> +
>>> +static tree
>>> +make_ifunc_resolver_func (const tree default_decl,
>>> +                         const tree ifunc_decl,
>>> +                         basic_block *empty_bb)
>>> +{
>>> +  char *resolver_name;
>>> +  tree decl, type, decl_name, t;
>>> +  basic_block new_bb;
>>> +  tree old_current_function_decl;
>>> +  bool make_unique = false;
>>> +
>>> +  /* IFUNCs have to be globally visible.  So, if the default_decl is
>>> +     not, then the name of the IFUNC should be made unique.  */
>>> +  if (TREE_PUBLIC (default_decl) == 0)
>>> +    make_unique = true;
>>> +
>>> +  /* Append the filename to the resolver function if the versions are
>>> +     not externally visible.  This is because the resolver function has
>>> +     to be externally visible for the loader to find it.  So, appending
>>> +     the filename will prevent conflicts with a resolver function from
>>> +     another module which is based on the same version name.  */
>>> +  resolver_name = make_name (default_decl, "resolver", make_unique);
>>> +
>>> +  /* The resolver function should return a (void *). */
>>> +  type = build_function_type_list (ptr_type_node, NULL_TREE);
>>> +
>>> +  decl = build_fn_decl (resolver_name, type);
>>> +  decl_name = get_identifier (resolver_name);
>>> +  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
>>> +
>>> +  DECL_NAME (decl) = decl_name;
>>> +  TREE_USED (decl) = TREE_USED (default_decl);
>>> +  DECL_ARTIFICIAL (decl) = 1;
>>> +  DECL_IGNORED_P (decl) = 0;
>>> +  /* IFUNC resolvers have to be externally visible.  */
>>> +  TREE_PUBLIC (decl) = 1;
>>> +  DECL_UNINLINABLE (decl) = 1;
>>> +
>>> +  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
>>> +  DECL_EXTERNAL (ifunc_decl) = 0;
>>> +
>>> +  DECL_CONTEXT (decl) = NULL_TREE;
>>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>>> +  DECL_STATIC_CONSTRUCTOR (decl) = 0;
>>> +  TREE_READONLY (decl) = 0;
>>> +  DECL_PURE_P (decl) = 0;
>>> +  DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
>>> +  if (DECL_COMDAT_GROUP (default_decl))
>>> +    {
>>> +      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
>>> +    }
>>> +  /* Build result decl and add to function_decl. */
>>> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
>>> +  DECL_ARTIFICIAL (t) = 1;
>>> +  DECL_IGNORED_P (t) = 1;
>>> +  DECL_RESULT (decl) = t;
>>> +
>>> +  gimplify_function_tree (decl);
>>> +  old_current_function_decl = current_function_decl;
>>> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
>>> +  current_function_decl = decl;
>>> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
>>> +  cfun->curr_properties |=
>>> +    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
>>> +     PROP_ssa);
>>> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
>>> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
>>> +  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
>>> +  *empty_bb = new_bb;
>>> +
>>> +  cgraph_add_new_function (decl, true);
>>> +  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
>>> +  cgraph_analyze_function (cgraph_get_create_node (decl));
>>> +  cgraph_mark_needed_node (cgraph_get_create_node (decl));
>>> +
>>> +  if (DECL_COMDAT_GROUP (default_decl))
>>> +    {
>>> +      gcc_assert (cgraph_get_node (default_decl));
>>> +      cgraph_add_to_same_comdat_group (cgraph_get_node (decl),
>>> +                                      cgraph_get_node (default_decl));
>>> +    }
>>> +
>>> +  pop_cfun ();
>>> +  current_function_decl = old_current_function_decl;
>>> +
>>> +  gcc_assert (ifunc_decl != NULL);
>>> +  DECL_ATTRIBUTES (ifunc_decl)
>>> +    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
>>> +  assemble_alias (ifunc_decl, get_identifier (resolver_name));
>>> +  return decl;
>>> +}
>>> +
>>> +/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to
>>> +   the DECL function will be replaced with calls to the ifunc.  Return the
>>> +   decl of the ifunc created.  */
>>> +
>>> +static tree
>>> +make_ifunc_func (const tree decl)
>>> +{
>>> +  tree ifunc_decl;
>>> +  char *ifunc_name, *resolver_name;
>>> +  tree fn_type, ifunc_type;
>>> +  bool make_unique = false;
>>> +
>>> +  if (TREE_PUBLIC (decl) == 0)
>>> +    make_unique = true;
>>> +
>>> +  ifunc_name = make_name (decl, "ifunc", make_unique);
>>> +  resolver_name = make_name (decl, "resolver", make_unique);
>>> +  gcc_assert (resolver_name);
>>> +
>>> +  fn_type = TREE_TYPE (decl);
>>> +  ifunc_type = build_function_type (TREE_TYPE (fn_type),
>>> +                                   TYPE_ARG_TYPES (fn_type));
>>> +
>>> +  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type);
>>> +  TREE_USED (ifunc_decl) = 1;
>>> +  DECL_CONTEXT (ifunc_decl) = NULL_TREE;
>>> +  DECL_INITIAL (ifunc_decl) = error_mark_node;
>>> +  DECL_ARTIFICIAL (ifunc_decl) = 1;
>>> +  /* Mark this ifunc as external, the resolver will flip it again if
>>> +     it gets generated.  */
>>> +  DECL_EXTERNAL (ifunc_decl) = 1;
>>> +  /* IFUNCs have to be externally visible.  */
>>> +  TREE_PUBLIC (ifunc_decl) = 1;
>>> +
>>> +  return ifunc_decl;
>>> +}
>>> +
>>> +/* For multi-versioned function DECL, which must also be the default,
>>> +   return the decl of the ifunc dispatcher, creating it if it does not
>>> +   exist.  */
>>> +
>>> +tree
>>> +get_ifunc_for_version (const tree decl)
>>> +{
>>> +  version_function *decl_v;
>>> +  int ix;
>>> +  void_p ele;
>>> +
>>> +  /* DECL has to be the default version, otherwise it is missing and
>>> +     that is not allowed.  */
>>> +  if (!is_default_function (decl))
>>> +    {
>>> +      error_at (DECL_SOURCE_LOCATION (decl), "default version not found");
>>> +      return decl;
>>> +    }
>>> +
>>> +  decl_v = find_function_version (decl);
>>> +  gcc_assert (decl_v != NULL);
>>> +  if (decl_v->ifunc_decl == NULL)
>>> +    {
>>> +      tree ifunc_decl;
>>> +      ifunc_decl = make_ifunc_func (decl);
>>> +      decl_v->ifunc_decl = ifunc_decl;
>>> +    }
>>> +
>>> +  if (cgraph_get_node (decl))
>>> +    cgraph_mark_needed_node (cgraph_get_node (decl));
>>> +
>>> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
>>> +    {
>>> +      version_function *v = (version_function *) ele;
>>> +      gcc_assert (v->decl != NULL);
>>> +      if (cgraph_get_node (v->decl))
>>> +       cgraph_mark_needed_node (cgraph_get_node (v->decl));
>>> +    }
>>> +
>>> +  return decl_v->ifunc_decl;
>>> +}
>>> +
>>> +/* Generate the dispatching code to dispatch multi-versioned function
>>> +   DECL.  Make a new function decl for dispatching and call the target
>>> +   hook to process the "targetv" attributes and provide the code to
>>> +   dispatch the right function at run-time.  */
>>> +
>>> +static tree
>>> +make_ifunc_resolver_for_version (const tree decl)
>>> +{
>>> +  version_function *decl_v;
>>> +  tree ifunc_resolver_decl, ifunc_decl;
>>> +  basic_block empty_bb;
>>> +  int ix;
>>> +  void_p ele;
>>> +  VEC (tree, heap) *fn_ver_vec = NULL;
>>> +
>>> +  gcc_assert (is_default_function (decl));
>>> +
>>> +  decl_v = find_function_version (decl);
>>> +  gcc_assert (decl_v != NULL);
>>> +
>>> +  if (decl_v->ifunc_resolver_decl != NULL)
>>> +    return decl_v->ifunc_resolver_decl;
>>> +
>>> +  ifunc_decl = decl_v->ifunc_decl;
>>> +
>>> +  if (ifunc_decl == NULL)
>>> +    ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl);
>>> +
>>> +  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl,
>>> +                                                 &empty_bb);
>>> +
>>> +  fn_ver_vec = VEC_alloc (tree, heap, 2);
>>> +  VEC_safe_push (tree, heap, fn_ver_vec, decl);
>>> +
>>> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
>>> +    {
>>> +      version_function *v = (version_function *) ele;
>>> +      gcc_assert (v->decl != NULL);
>>> +      /* Check for virtual functions here again, as by this time it should
>>> +        have been determined if this function needs a vtable index or
>>> +        not.  This happens for methods in derived classes that override
>>> +        virtual methods in base classes but are not explicitly marked as
>>> +        virtual.  */
>>> +      if (DECL_VINDEX (v->decl))
>>> +        error_at (DECL_SOURCE_LOCATION (v->decl),
>>> +                 "virtual function versioning not supported");
>>> +      if (!v->is_deleted)
>>> +       VEC_safe_push (tree, heap, fn_ver_vec, v->decl);
>>> +    }
>>> +
>>> +  gcc_assert (targetm.dispatch_version);
>>> +  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb);
>>> +  decl_v->ifunc_resolver_decl = ifunc_resolver_decl;
>>> +
>>> +  return ifunc_resolver_decl;
>>> +}
>>> +
>>> +/* Main entry point to pass_dispatch_versions. For multi-versioned functions,
>>> +   generate the dispatching code.  */
>>> +
>>> +static unsigned int
>>> +do_dispatch_versions (void)
>>> +{
>>> +  /* A new pass for generating dispatch code for multi-versioned functions.
>>> +     Currently, dispatching is done through IFUNC.  Other forms of dispatch
>>> +     can be added for targets without ifunc support, such as calling the
>>> +     function directly after checking the target type.  This pass will
>>> +     become more meaningful when other dispatch mechanisms are added.  */
>>> +
>>> +  /* Cloning a function to produce more versions will happen here when the
>>> +     user requests that via the targetv attribute. For example,
>>> +     int foo () __attribute__ ((targetv(("arch=core2"), ("arch=corei7"))));
>>> +     means that the user wants the same body of foo to be versioned for core2
>>> +     and corei7.  In that case, this function will be cloned during this
>>> +     pass.  */
>>> +
>>> +  if (DECL_FUNCTION_VERSIONED (current_function_decl)
>>> +      && is_default_function (current_function_decl))
>>> +    {
>>> +      tree decl = make_ifunc_resolver_for_version (current_function_decl);
>>> +      if (dump_file && decl)
>>> +       dump_function_to_file (decl, dump_file, TDF_BLOCKS);
>>> +    }
>>> +  return 0;
>>> +}
>>> +
>>> +static bool
>>> +gate_dispatch_versions (void)
>>> +{
>>> +  return true;
>>> +}
>>> +
>>> +/* A pass to generate the dispatch code to execute the appropriate version
>>> +   of a multi-versioned function at run-time.  */
>>> +
>>> +struct gimple_opt_pass pass_dispatch_versions =
>>> +{
>>> + {
>>> +  GIMPLE_PASS,
>>> +  "dispatch_multiversion_functions",    /* name */
>>> +  gate_dispatch_versions,              /* gate */
>>> +  do_dispatch_versions,                        /* execute */
>>> +  NULL,                                        /* sub */
>>> +  NULL,                                        /* next */
>>> +  0,                                   /* static_pass_number */
>>> +  TV_MULTIVERSION_DISPATCH,            /* tv_id */
>>> +  PROP_cfg,                            /* properties_required */
>>> +  PROP_cfg,                            /* properties_provided */
>>> +  0,                                   /* properties_destroyed */
>>> +  0,                                   /* todo_flags_start */
>>> +  TODO_dump_func |                     /* todo_flags_finish */
>>> +  TODO_cleanup_cfg | TODO_dump_cgraph
>>> + }
>>> +};
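
As a rough mental model of what the generated resolver does, here is a portable emulation in plain C. It is only a sketch: a real IFUNC is resolved by the dynamic loader rather than lazily on first call, and every name below (foo_default, foo_sse4, cpu_has_sse4, foo_resolver, foo_dispatch) is hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*foo_fn) (void);

/* Two hypothetical versions of the same function.  */
static int foo_default (void) { return 0; }
static int foo_sse4 (void) { return 1; }

/* Hypothetical feature flag; a real resolver would query the CPU.  */
static int cpu_has_sse4 = 0;

/* Analogue of the resolver: pick a version and return its address.
   The default version is the fallback when nothing else applies.  */
static foo_fn
foo_resolver (void)
{
  if (cpu_has_sse4)
    return foo_sse4;
  return foo_default;
}

/* Cached result of the resolver, filled in on first use.  */
static foo_fn foo_resolved = NULL;

/* Analogue of the ifunc symbol: callers always go through this.  */
static int
foo_dispatch (void)
{
  if (foo_resolved == NULL)
    foo_resolved = foo_resolver ();
  assert (foo_resolved != NULL);
  return foo_resolved ();
}
```

The choice is made once and then cached, which is why later changes to the feature flag do not affect callers that have already dispatched.
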
>>> Index: cgraphunit.c
>>> ===================================================================
>>> --- cgraphunit.c        (revision 184971)
>>> +++ cgraphunit.c        (working copy)
>>> @@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "ipa-inline.h"
>>>  #include "ipa-utils.h"
>>>  #include "lto-streamer.h"
>>> +#include "multiversion.h"
>>>
>>>  static void cgraph_expand_all_functions (void);
>>>  static void cgraph_mark_functions_to_output (void);
>>> @@ -343,6 +344,13 @@ cgraph_finalize_function (tree decl, bool nested)
>>>       node->local.redefined_extern_inline = true;
>>>     }
>>>
>>> +  /* If this is a function version and not the default, change the
>>> +     assembler name of this function.  The DECL names of function
>>> +     versions are the same, only the assembler names are made unique.
>>> +     The assembler name is changed by appending the string from
>>> +     the "targetv" attribute.  */
>>> +  version_assembler_name (decl);
>>> +
>>>   notice_global_symbol (decl);
>>>   node->local.finalized = true;
>>>   node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL;
>>> Index: multiversion.h
>>> ===================================================================
>>> --- multiversion.h      (revision 0)
>>> +++ multiversion.h      (revision 0)
>>> @@ -0,0 +1,52 @@
>>> +/* Function Multiversioning.
>>> +   Copyright (C) 2012 Free Software Foundation, Inc.
>>> +   Contributed by Sriraman Tallam (tmsriram@google.com)
>>> +
>>> +This file is part of GCC.
>>> +
>>> +GCC is free software; you can redistribute it and/or modify it under
>>> +the terms of the GNU General Public License as published by the Free
>>> +Software Foundation; either version 3, or (at your option) any later
>>> +version.
>>> +
>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>> +for more details.
>>> +
>>> +You should have received a copy of the GNU General Public License
>>> +along with GCC; see the file COPYING3.  If not see
>>> +<http://www.gnu.org/licenses/>. */
>>> +
>>> +/* This is the header file which provides the functions to keep track
>>> +   of functions that are multi-versioned and to generate the dispatch
>>> +   code to call the right version at run-time.  */
>>> +
>>> +#ifndef GCC_MULTIVERSION_H
>>> +#define GCC_MULTIVERSION_H
>>> +
>>> +#include "tree.h"
>>> +
>>> +/* Mark DECL1 and DECL2 as function versions.  */
>>> +int group_function_versions (const tree decl1, const tree decl2);
>>> +
>>> +/* Mark DECL as deleted and no longer a version.  */
>>> +void mark_delete_decl_version (const tree decl);
>>> +
>>> +/* Returns true if DECL is the default version to be executed if all
>>> +   other versions are inappropriate at run-time.  */
>>> +bool is_default_function (const tree decl);
>>> +
>>> +/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL
>>> +   must be the default function in the multi-versioned group.  */
>>> +tree get_ifunc_for_version (const tree decl);
>>> +
>>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
>>> +   or if the "targetv" attribute strings of DECL1 and DECL2 don't match.  */
>>> +bool has_different_version_attributes (const tree decl1, const tree decl2);
>>> +
>>> +/* If DECL is a function version and not the default version, the assembler
>>> +   name of DECL is changed to include the attribute string to keep the
>>> +   name unambiguous.  */
>>> +void version_assembler_name (const tree decl);
>>> +#endif
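
The earlier overview says that different "targetv" attributes are detected by sorting the attribute arguments. The body of has_different_version_attributes is not shown in this part of the patch, so the following is only a sketch of that idea on plain string arrays; `attrs_differ` and `cmp_str` are hypothetical names, and the sketch sorts its inputs in place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(const char *const *) a, *(const char *const *) b);
}

/* Return 1 if the N1 strings in ARGS1 and the N2 strings in ARGS2 are
   not the same list once order is ignored, 0 otherwise.  Both arrays
   are sorted in place.  A count of zero models a decl with no
   attribute at all.  */
static int
attrs_differ (const char **args1, size_t n1, const char **args2, size_t n2)
{
  size_t i;

  assert (args1 != NULL && args2 != NULL);

  if (n1 != n2)
    return 1;

  qsort (args1, n1, sizeof *args1, cmp_str);
  qsort (args2, n2, sizeof *args2, cmp_str);

  for (i = 0; i < n1; ++i)
    if (strcmp (args1[i], args2[i]) != 0)
      return 1;

  return 0;
}
```

Sorting first means that ("sse4.2", "popcnt") and ("popcnt", "sse4.2") compare as the same version.
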
>>> Index: cp/class.c
>>> ===================================================================
>>> --- cp/class.c  (revision 184971)
>>> +++ cp/class.c  (working copy)
>>> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "tree-dump.h"
>>>  #include "splay-tree.h"
>>>  #include "pointer-set.h"
>>> +#include "multiversion.h"
>>>
>>>  /* The number of nested classes being processed.  If we are not in the
>>>    scope of any class, this is zero.  */
>>> @@ -1092,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec
>>>              || same_type_p (TREE_TYPE (fn_type),
>>>                              TREE_TYPE (method_type))))
>>>        {
>>> -         if (using_decl)
>>> +         /* For function versions, their parms and types match
>>> +            but they are not duplicates.  Record function versions
>>> +            as and when they are found.  */
>>> +         if (TREE_CODE (fn) == FUNCTION_DECL
>>> +             && TREE_CODE (method) == FUNCTION_DECL
>>> +             && (DECL_FUNCTION_VERSIONED (fn)
>>> +                 || DECL_FUNCTION_VERSIONED (method)))
>>> +           {
>>> +             DECL_FUNCTION_VERSIONED (fn) = 1;
>>> +             DECL_FUNCTION_VERSIONED (method) = 1;
>>> +             group_function_versions (fn, method);
>>> +             continue;
>>> +           }
>>> +         else if (using_decl)
>>>            {
>>>              if (DECL_CONTEXT (fn) == type)
>>>                /* Defer to the local function.  */
>>> @@ -1150,6 +1164,13 @@ add_method (tree type, tree method, tree using_dec
>>>   else
>>>     /* Replace the current slot.  */
>>>     VEC_replace (tree, method_vec, slot, overload);
>>> +
>>> +  /* Change the assembler name of method here if it has "targetv"
>>> +     attributes.  Since all versions have the same mangled name,
>>> +     their assembler name is changed by appending the string from
>>> +     the "targetv" attribute. */
>>> +  version_assembler_name (method);
>>> +
>>>   return true;
>>>  }
>>>
>>> @@ -6890,8 +6911,11 @@ resolve_address_of_overloaded_function (tree targe
>>>          if (DECL_ANTICIPATED (fn))
>>>            continue;
>>>
>>> -         /* See if there's a match.  */
>>> -         if (same_type_p (target_fn_type, static_fn_type (fn)))
>>> +         /* See if there's a match.  For multi-versioned functions,
>>> +            match only the default function.  */
>>> +         if (same_type_p (target_fn_type, static_fn_type (fn))
>>> +             && (!DECL_FUNCTION_VERSIONED (fn)
>>> +                 || is_default_function (fn)))
>>>            matches = tree_cons (fn, NULL_TREE, matches);
>>>        }
>>>     }
>>> @@ -7053,6 +7077,21 @@ resolve_address_of_overloaded_function (tree targe
>>>       perform_or_defer_access_check (access_path, fn, fn);
>>>     }
>>>
>>> +  /* If a pointer to a function that is multi-versioned is requested, the
>>> +     pointer to the dispatcher function is returned instead.  This works
>>> +     well because indirectly calling the function will dispatch the right
>>> +     function version at run-time. Also, the function address is kept
>>> +     unique.  */
>>> +  if (DECL_FUNCTION_VERSIONED (fn)
>>> +      && is_default_function (fn))
>>> +    {
>>> +      tree ifunc_decl;
>>> +      ifunc_decl = get_ifunc_for_version (fn);
>>> +      gcc_assert (ifunc_decl != NULL);
>>> +      mark_used (fn);
>>> +      return build_fold_addr_expr (ifunc_decl);
>>> +    }
>>> +
>>>   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
>>>     return cp_build_addr_expr (fn, flags);
>>>   else
>>> Index: cp/decl.c
>>> ===================================================================
>>> --- cp/decl.c   (revision 184971)
>>> +++ cp/decl.c   (working copy)
>>> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "pointer-set.h"
>>>  #include "splay-tree.h"
>>>  #include "plugin.h"
>>> +#include "multiversion.h"
>>>
>>>  /* Possible cases of bad specifiers type used by bad_specifiers. */
>>>  enum bad_spec_place {
>>> @@ -972,6 +973,23 @@ decls_match (tree newdecl, tree olddecl)
>>>       if (t1 != t2)
>>>        return 0;
>>>
>>> +      /* The decls don't match if they correspond to two different versions
>>> +        of the same function.  */
>>> +      if (compparms (p1, p2)
>>> +         && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2))
>>> +         && (DECL_FUNCTION_VERSIONED (newdecl)
>>> +             || DECL_FUNCTION_VERSIONED (olddecl))
>>> +         && has_different_version_attributes (newdecl, olddecl))
>>> +       {
>>> +         /* One of the decls could be the default without the "targetv"
>>> +            attribute. Set it to be a versioned function here.  */
>>> +         DECL_FUNCTION_VERSIONED (newdecl) = 1;
>>> +         DECL_FUNCTION_VERSIONED (olddecl) = 1;
>>> +         /* Accumulate all the versions of a function.  */
>>> +         group_function_versions (olddecl, newdecl);
>>> +         return 0;
>>> +       }
>>> +
>>>       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
>>>          && ! (DECL_EXTERN_C_P (newdecl)
>>>                && DECL_EXTERN_C_P (olddecl)))
>>> @@ -1482,7 +1500,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
>>>              error ("previous declaration %q+#D here", olddecl);
>>>              return NULL_TREE;
>>>            }
>>> -         else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
>>> +         /* For function versions, params and types match, but they
>>> +            are not ambiguous.  */
>>> +         else if ((!DECL_FUNCTION_VERSIONED (newdecl)
>>> +                   && !DECL_FUNCTION_VERSIONED (olddecl))
>>> +                  && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
>>>                              TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
>>>            {
>>>              error ("new declaration %q#D", newdecl);
>>> @@ -2250,6 +2272,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool
>>>   else if (DECL_PRESERVE_P (newdecl))
>>>     DECL_PRESERVE_P (olddecl) = 1;
>>>
>>> +  /* If the olddecl is a version, so is the newdecl.  */
>>> +  if (TREE_CODE (newdecl) == FUNCTION_DECL
>>> +      && DECL_FUNCTION_VERSIONED (olddecl))
>>> +    {
>>> +      DECL_FUNCTION_VERSIONED (newdecl) = 1;
>>> +      /* Record that newdecl is not a valid version and has
>>> +        been deleted.  */
>>> +      mark_delete_decl_version (newdecl);
>>> +    }
>>> +
>>>   if (TREE_CODE (newdecl) == FUNCTION_DECL)
>>>     {
>>>       int function_size;
>>> @@ -4512,6 +4544,10 @@ start_decl (const cp_declarator *declarator,
>>>   /* Enter this declaration into the symbol table.  */
>>>   decl = maybe_push_decl (decl);
>>>
>>> +  /* If this decl is a function version and not the default, its assembler
>>> +     name has to be changed.  */
>>> +  version_assembler_name (decl);
>>> +
>>>   if (processing_template_decl)
>>>     decl = push_template_decl (decl);
>>>   if (decl == error_mark_node)
>>> @@ -13019,6 +13055,10 @@ start_function (cp_decl_specifier_seq *declspecs,
>>>     gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)),
>>>                             integer_type_node));
>>>
>>> +  /* If this decl is a function version and not the default, its assembler
>>> +     name has to be changed.  */
>>> +  version_assembler_name (decl1);
>>> +
>>>   start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT);
>>>
>>>   return 1;
>>> @@ -13960,6 +14000,11 @@ cxx_comdat_group (tree decl)
>>>            break;
>>>        }
>>>       name = DECL_ASSEMBLER_NAME (decl);
>>> +      if (TREE_CODE (decl) == FUNCTION_DECL
>>> +         && DECL_FUNCTION_VERSIONED (decl))
>>> +       name = DECL_NAME (decl);
>>> +      else
>>> +        name = DECL_ASSEMBLER_NAME (decl);
>>>     }
>>>
>>>   return name;
>>> Index: cp/semantics.c
>>> ===================================================================
>>> --- cp/semantics.c      (revision 184971)
>>> +++ cp/semantics.c      (working copy)
>>> @@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn)
>>>       /* If the user wants us to keep all inline functions, then mark
>>>         this function as needed so that finish_file will make sure to
>>>         output it later.  Similarly, all dllexport'd functions must
>>> -        be emitted; there may be callers in other DLLs.  */
>>> -      if ((flag_keep_inline_functions
>>> +        be emitted; there may be callers in other DLLs.
>>> +        Also, mark this function as needed if it is marked inline but
>>> +        is a multi-versioned function.  */
>>> +      if (((flag_keep_inline_functions
>>> +           || DECL_FUNCTION_VERSIONED (fn))
>>>           && DECL_DECLARED_INLINE_P (fn)
>>>           && !DECL_REALLY_EXTERN (fn))
>>>          || (flag_keep_inline_dllexport
>>> Index: cp/decl2.c
>>> ===================================================================
>>> --- cp/decl2.c  (revision 184971)
>>> +++ cp/decl2.c  (working copy)
>>> @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "splay-tree.h"
>>>  #include "langhooks.h"
>>>  #include "c-family/c-ada-spec.h"
>>> +#include "multiversion.h"
>>>
>>>  extern cpp_reader *parse_in;
>>>
>>> @@ -674,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem
>>>          if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
>>>            continue;
>>>
>>> +         /* While finding a match, same types and params are not enough
>>> +            if the function is versioned.  Also check version ("targetv")
>>> +            attributes.  */
>>>          if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
>>>                           TREE_TYPE (TREE_TYPE (fndecl)))
>>>              && compparms (p1, p2)
>>> +             && !has_different_version_attributes (function, fndecl)
>>>              && (!is_template
>>>                  || comp_template_parms (template_parms,
>>>                                          DECL_TEMPLATE_PARMS (fndecl)))
>>> Index: cp/call.c
>>> ===================================================================
>>> --- cp/call.c   (revision 184971)
>>> +++ cp/call.c   (working copy)
>>> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "langhooks.h"
>>>  #include "c-family/c-objc.h"
>>>  #include "timevar.h"
>>> +#include "multiversion.h"
>>>
>>>  /* The various kinds of conversion.  */
>>>
>>> @@ -6730,6 +6731,17 @@ build_over_call (struct z_candidate *cand, int fla
>>>   if (!already_used)
>>>     mark_used (fn);
>>>
>>> +  /* For a call to a multi-versioned function, the call should actually be to
>>> +     the dispatcher.  */
>>> +  if (DECL_FUNCTION_VERSIONED (fn))
>>> +    {
>>> +      tree ifunc_decl;
>>> +      ifunc_decl = get_ifunc_for_version (fn);
>>> +      gcc_assert (ifunc_decl != NULL);
>>> +      return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl,
>>> +                                       nargs, argarray);
>>> +    }
>>> +
>>>   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
>>>     {
>>>       tree t;
>>> @@ -7980,6 +7992,30 @@ joust (struct z_candidate *cand1, struct z_candida
>>>   size_t i;
>>>   size_t len;
>>>
>>> +  /* For candidates of a multi-versioned function, the one marked default
>>> +     wins.  This is because the default decl is used as key to aggregate
>>> +     all the other versions provided for it in multiversion.c.  When
>>> +     generating the actual call, the appropriate dispatcher is created
>>> +     to call the right function version at run-time.  */
>>> +
>>> +  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
>>> +       && DECL_FUNCTION_VERSIONED (cand1->fn))
>>> +      || (TREE_CODE (cand2->fn) == FUNCTION_DECL
>>> +          && DECL_FUNCTION_VERSIONED (cand2->fn)))
>>> +    {
>>> +      if (is_default_function (cand1->fn))
>>> +       {
>>> +          mark_used (cand2->fn);
>>> +         return 1;
>>> +       }
>>> +      if (is_default_function (cand2->fn))
>>> +       {
>>> +          mark_used (cand1->fn);
>>> +         return -1;
>>> +       }
>>> +      return 0;
>>> +    }
>>> +
>>>   /* Candidates that involve bad conversions are always worse than those
>>>      that don't.  */
>>>   if (cand1->viable > cand2->viable)
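
The rule added to joust can be restated as a small decision function. This is a simplified model, assuming a candidate is described by just two flags; `struct cand` and `rank_versions` are hypothetical names, and the mark_used side effects of the real hunk are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of an overload candidate.  */
struct cand
{
  bool versioned;	/* Models DECL_FUNCTION_VERSIONED.  */
  bool is_default;	/* Versioned and carries no "targetv" attribute.  */
};

/* If either candidate is a function version, decide between them and
   return true, storing 1 in *RESULT when C1 wins, -1 when C2 wins and
   0 when neither is the default.  Return false when the version rule
   does not apply and ordinary ranking should continue.  */
static bool
rank_versions (const struct cand *c1, const struct cand *c2, int *result)
{
  assert (c1 != NULL && c2 != NULL && result != NULL);

  if (!c1->versioned && !c2->versioned)
    return false;

  if (c1->versioned && c1->is_default)
    *result = 1;
  else if (c2->versioned && c2->is_default)
    *result = -1;
  else
    *result = 0;

  return true;
}
```

The default always wins because it is the key under which the other versions are aggregated; the actual version is picked later by the dispatcher.
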
>>> Index: timevar.def
>>> ===================================================================
>>> --- timevar.def (revision 184971)
>>> +++ timevar.def (working copy)
>>> @@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
>>>  DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
>>>  DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
>>>  DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
>>> +DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
>>>
>>>  /* Everything else in rest_of_compilation not included above.  */
>>>  DEFTIMEVAR (TV_EARLY_LOCAL          , "early local passes")
>>> Index: varasm.c
>>> ===================================================================
>>> --- varasm.c    (revision 184971)
>>> +++ varasm.c    (working copy)
>>> @@ -5755,6 +5755,8 @@ finish_aliases_1 (void)
>>>        }
>>>       else if (! (p->emitted_diags & ALIAS_DIAG_TO_EXTERN)
>>>               && DECL_EXTERNAL (target_decl)
>>> +              && (TREE_CODE (target_decl) != FUNCTION_DECL
>>> +                  || !DECL_STRUCT_FUNCTION (target_decl))
>>>               /* We use local aliases for C++ thunks to force the tailcall
>>>                  to bind locally.  This is a hack - to keep it working do
>>>                  the following (which is not strictly correct).  */
>>> Index: Makefile.in
>>> ===================================================================
>>> --- Makefile.in (revision 184971)
>>> +++ Makefile.in (working copy)
>>> @@ -1298,6 +1298,7 @@ OBJS = \
>>>        mcf.o \
>>>        mode-switching.o \
>>>        modulo-sched.o \
>>> +       multiversion.o \
>>>        omega.o \
>>>        omp-low.o \
>>>        optabs.o \
>>> @@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
>>>    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>>    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
>>>    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
>>> +multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
>>> +   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
>>> +   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
>>> +   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
>>> +   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
>>>  cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
>>> Index: passes.c
>>> ===================================================================
>>> --- passes.c    (revision 184971)
>>> +++ passes.c    (working copy)
>>> @@ -1190,6 +1190,7 @@ init_optimization_passes (void)
>>>   NEXT_PASS (pass_build_cfg);
>>>   NEXT_PASS (pass_warn_function_return);
>>>   NEXT_PASS (pass_build_cgraph_edges);
>>> +  NEXT_PASS (pass_dispatch_versions);
>>>   *p = NULL;
>>>
>>>   /* Interprocedural optimization passes.  */
>>> Index: config/i386/i386.c
>>> ===================================================================
>>> --- config/i386/i386.c  (revision 184971)
>>> +++ config/i386/i386.c  (working copy)
>>> @@ -27446,6 +27473,593 @@ ix86_init_mmx_sse_builtins (void)
>>>     }
>>>  }
>>>
>>> +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
>>> +   to return a pointer to VERSION_DECL if the outcome of the function
>>> +   PREDICATE_DECL is true.  This function will be called during version
>>> +   dispatch to decide which function version to execute.  It returns the
>>> +   basic block at the end to which more conditions can be added.  */
>>> +
>>> +static basic_block
>>> +add_condition_to_bb (tree function_decl, tree version_decl,
>>> +                    basic_block new_bb, tree predicate_decl)
>>> +{
>>> +  gimple return_stmt;
>>> +  tree convert_expr, result_var;
>>> +  gimple convert_stmt;
>>> +  gimple call_cond_stmt;
>>> +  gimple if_else_stmt;
>>> +
>>> +  basic_block bb1, bb2, bb3;
>>> +  edge e12, e23;
>>> +
>>> +  tree cond_var;
>>> +  gimple_seq gseq;
>>> +
>>> +  tree old_current_function_decl;
>>> +
>>> +  old_current_function_decl = current_function_decl;
>>> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
>>> +  current_function_decl = function_decl;
>>> +
>>> +  gcc_assert (new_bb != NULL);
>>> +  gseq = bb_seq (new_bb);
>>> +
>>> +
>>> +  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
>>> +                        build_fold_addr_expr (version_decl));
>>> +  result_var = create_tmp_var (ptr_type_node, NULL);
>>> +  convert_stmt = gimple_build_assign (result_var, convert_expr);
>>> +  return_stmt = gimple_build_return (result_var);
>>> +
>>> +  if (predicate_decl == NULL_TREE)
>>> +    {
>>> +      gimple_seq_add_stmt (&gseq, convert_stmt);
>>> +      gimple_seq_add_stmt (&gseq, return_stmt);
>>> +      set_bb_seq (new_bb, gseq);
>>> +      gimple_set_bb (convert_stmt, new_bb);
>>> +      gimple_set_bb (return_stmt, new_bb);
>>> +      pop_cfun ();
>>> +      current_function_decl = old_current_function_decl;
>>> +      return new_bb;
>>> +    }
>>> +
>>> +  cond_var = create_tmp_var (integer_type_node, NULL);
>>> +  call_cond_stmt = gimple_build_call (predicate_decl, 0);
>>> +  gimple_call_set_lhs (call_cond_stmt, cond_var);
>>> +
>>> +  gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
>>> +  gimple_set_bb (call_cond_stmt, new_bb);
>>> +  gimple_seq_add_stmt (&gseq, call_cond_stmt);
>>> +
>>> +  if_else_stmt = gimple_build_cond (GT_EXPR, cond_var,
>>> +                                   integer_zero_node,
>>> +                                   NULL_TREE, NULL_TREE);
>>> +  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
>>> +  gimple_set_bb (if_else_stmt, new_bb);
>>> +  gimple_seq_add_stmt (&gseq, if_else_stmt);
>>> +
>>> +  gimple_seq_add_stmt (&gseq, convert_stmt);
>>> +  gimple_seq_add_stmt (&gseq, return_stmt);
>>> +  set_bb_seq (new_bb, gseq);
>>> +
>>> +  bb1 = new_bb;
>>> +  e12 = split_block (bb1, if_else_stmt);
>>> +  bb2 = e12->dest;
>>> +  e12->flags &= ~EDGE_FALLTHRU;
>>> +  e12->flags |= EDGE_TRUE_VALUE;
>>> +
>>> +  e23 = split_block (bb2, return_stmt);
>>> +
>>> +  gimple_set_bb (convert_stmt, bb2);
>>> +  gimple_set_bb (return_stmt, bb2);
>>> +
>>> +  bb3 = e23->dest;
>>> +  make_edge (bb1, bb3, EDGE_FALSE_VALUE);
>>> +
>>> +  remove_edge (e23);
>>> +  make_edge (bb2, EXIT_BLOCK_PTR, 0);
>>> +
>>> +  rebuild_cgraph_edges ();
>>> +
>>> +  pop_cfun ();
>>> +  current_function_decl = old_current_function_decl;
>>> +
>>> +  return bb3;
>>> +}
>>> +
>>> +/* This parses the attribute arguments to targetv in DECL and determines
>>> +   the right builtin to use to match the platform specification.
>>> +   For now, only one target argument ("arch=") is allowed.  */
>>> +
>>> +static enum ix86_builtins
>>> +get_builtin_code_for_version (tree decl)
>>> +{
>>> +  tree attrs;
>>> +  struct cl_target_option cur_target;
>>> +  tree target_node;
>>> +  struct cl_target_option *new_target;
>>> +  enum ix86_builtins builtin_code = IX86_BUILTIN_MAX;
>>> +
>>> +  attrs = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
>>> +  gcc_assert (attrs != NULL);
>>> +
>>> +  cl_target_option_save (&cur_target, &global_options);
>>> +
>>> +  target_node = ix86_valid_target_attribute_tree
>>> +                 (TREE_VALUE (TREE_VALUE (attrs)));
>>> +
>>> +  gcc_assert (target_node);
>>> +  new_target = TREE_TARGET_OPTION (target_node);
>>> +  gcc_assert (new_target);
>>> +
>>> +  if (new_target->arch_specified && new_target->arch > 0)
>>> +    {
>>> +      switch (new_target->arch)
>>> +        {
>>> +       case 1:
>>> +       case 2:
>>> +       case 3:
>>> +       case 4:
>>> +       case 5:
>>> +       case 6:
>>> +       case 7:
>>> +       case 8:
>>> +       case 9:
>>> +       case 10:
>>> +       case 11:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL;
>>> +         break;
>>> +       case 12:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_CORE2;
>>> +         break;
>>> +       case 13:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_COREI7;
>>> +         break;
>>> +       case 14:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_ATOM;
>>> +         break;
>>> +       case 15:
>>> +       case 16:
>>> +       case 17:
>>> +       case 18:
>>> +       case 19:
>>> +       case 20:
>>> +       case 21:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMD;
>>> +         break;
>>> +       case 22:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM10H;
>>> +         break;
>>> +       case 23:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1;
>>> +         break;
>>> +       case 24:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2;
>>> +         break;
>>> +       case 25: /* What is btver1 ? */
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMD;
>>> +         break;
>>> +       }
>>> +    }
>>> +
>>> +  cl_target_option_restore (&global_options, &cur_target);
>>> +  if (builtin_code == IX86_BUILTIN_MAX)
>>> +      error_at (DECL_SOURCE_LOCATION (decl),
>>> +               "No dispatcher found for the versioning attributes");
>>> +
>>> +  return builtin_code;
>>> +}
>>> +
>>> +/* This is the target hook to generate the dispatch function for
>>> +   multi-versioned functions.  DISPATCH_DECL is the function which will
>>> +   contain the dispatch logic.  FNDECLS are the function choices for
>>> +   dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer
>>> +   in DISPATCH_DECL in which the dispatch code is generated.  */
>>> +
>>> +static int
>>> +ix86_dispatch_version (tree dispatch_decl,
>>> +                      void *fndecls_p,
>>> +                      basic_block *empty_bb)
>>> +{
>>> +  tree default_decl;
>>> +  gimple ifunc_cpu_init_stmt;
>>> +  gimple_seq gseq;
>>> +  tree old_current_function_decl;
>>> +  int ix;
>>> +  tree ele;
>>> +  VEC (tree, heap) *fndecls;
>>> +
>>> +  gcc_assert (dispatch_decl != NULL
>>> +             && fndecls_p != NULL
>>> +             && empty_bb != NULL);
>>> +
>>> +  /* fndecls_p is actually a vector.  */
>>> +  fndecls = (VEC (tree, heap) *)fndecls_p;
>>> +
>>> +  /* At least one more version other than the default.  */
>>> +  gcc_assert (VEC_length (tree, fndecls) >= 2);
>>> +
>>> +  /* The first version in the vector is the default decl.  */
>>> +  default_decl = VEC_index (tree, fndecls, 0);
>>> +
>>> +  old_current_function_decl = current_function_decl;
>>> +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
>>> +  current_function_decl = dispatch_decl;
>>> +
>>> +  gseq = bb_seq (*empty_bb);
>>> +  ifunc_cpu_init_stmt = gimple_build_call_vec (
>>> +                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
>>> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
>>> +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
>>> +  set_bb_seq (*empty_bb, gseq);
>>> +
>>> +  pop_cfun ();
>>> +  current_function_decl = old_current_function_decl;
>>> +
>>> +
>>> +  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
>>> +    {
>>> +      tree version_decl = ele;
>>> +      /* Get attribute string, parse it and find the right predicate decl.
>>> +         The predicate function could be a lengthy combination of many
>>> +        features, like arch-type and various isa-variants.  For now, only
>>> +        check the arch-type.  */
>>> +      tree predicate_decl = ix86_builtins [
>>> +                       get_builtin_code_for_version (version_decl)];
>>> +      *empty_bb = add_condition_to_bb (dispatch_decl, version_decl, *empty_bb,
>>> +                                      predicate_decl);
>>> +
>>> +    }
>>> +  /* dispatch default version at the end.  */
>>> +  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl, *empty_bb,
>>> +                                  NULL);
>>> +  return 0;
>>> +}
>>>
>>> @@ -38610,6 +39269,12 @@ ix86_autovectorize_vector_sizes (void)
>>>  #undef TARGET_BUILD_BUILTIN_VA_LIST
>>>  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list
>>>
>>> +#undef TARGET_DISPATCH_VERSION
>>> +#define TARGET_DISPATCH_VERSION ix86_dispatch_version
>>> +
>>>  #undef TARGET_ENUM_VA_LIST_P
>>>  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
>>>
>>> Index: testsuite/g++.dg/mv1.C
>>> ===================================================================
>>> --- testsuite/g++.dg/mv1.C      (revision 0)
>>> +++ testsuite/g++.dg/mv1.C      (revision 0)
>>> @@ -0,0 +1,23 @@
>>> +/* Simple test case to check if Multiversioning works.  */
>>> +/* { dg-do run } */
>>> +/* { dg-options "-O2" } */
>>> +
>>> +int foo ();
>>> +int foo () __attribute__ ((targetv("arch=corei7")));
>>> +
>>> +int main ()
>>> +{
>>> +  int (*p)() = &foo;
>>> +  return foo () + (*p)();
>>> +}
>>> +
>>> +int foo ()
>>> +{
>>> +  return 0;
>>> +}
>>> +
>>> +int __attribute__ ((targetv("arch=corei7")))
>>> +foo ()
>>> +{
>>> +  return 0;
>>> +}
>>>
>>>
>>> --
>>> This patch is available for review at http://codereview.appspot.com/5752064
Overview of the patch which adds front-end support to specify function versions.

Example:

int foo ();  /* Default version */
int foo () __attribute__ ((target("avx,popcnt"))); /* Specialized for avx and popcnt */
int foo () __attribute__ ((target("arch=core2,ssse3"))); /* Specialized for core2 and ssse3 */

int main ()
{
 int (*p)() = &foo;
 return foo () + (*p)();
}

int foo ()
{
 return 0;
}

int __attribute__ ((target("avx,popcnt")))
foo ()
{
 return 0;
}

int __attribute__ ((target("arch=core2,ssse3")))
foo ()
{
 return 0;
}

The above example defines foo 3 times, but all 3 definitions are different
versions of the same function. The calls to foo in main, both the direct call
and the call via a pointer, are calls to the multi-versioned function foo, which
is dispatched to the right version of foo at run-time.

What does the patch do?

* Tracking decls that correspond to function versions of function
name, say "foo":

When the front-end sees more than one decl for "foo", with at least one decl
tagged with a "target" attribute, it marks the decls as function versions. To
prevent duplicate definition errors with other versions of "foo", I change the
"decls_match" function in cp/decl.c to return false when 2 decls have the
same signature but different target attributes. This causes all function
versions of "foo" to be added to the overload list of "foo".

* Change the assembler names of the function versions.

The front-end changes the assembler names of the function versions by suffixing
the sorted list of args to "target" to the function name of "foo". For example,
the assembler name of "void foo () __attribute__ ((target ("sse4")))" will
become _Z3foov.sse4.
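
As a rough illustration of the renaming rule above (not code from the patch),
the suffix can be computed by splitting the "target" string, sorting the
pieces, and appending them to the mangled name. The helper name
versioned_assembler_name is hypothetical, and the "_" separator between
multiple arguments is an assumption: the description only shows the
single-argument case.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: append the sorted "target" arguments to a mangled
// name.  The "_" separator between arguments is an assumption; only the
// single-argument form (_Z3foov.sse4) is given in the description.
std::string versioned_assembler_name (const std::string &mangled,
                                      const std::string &target_args)
{
  assert (!mangled.empty ());

  // Split the attribute string on commas.
  std::vector<std::string> args;
  std::stringstream ss (target_args);
  std::string arg;
  while (std::getline (ss, arg, ','))
    if (!arg.empty ())
      args.push_back (arg);

  // Sorting makes "avx,popcnt" and "popcnt,avx" name the same version.
  std::sort (args.begin (), args.end ());

  std::string name = mangled;
  for (std::size_t i = 0; i < args.size (); ++i)
    name += (i == 0 ? "." : "_") + args[i];
  return name;
}
```

Sorting first means the order in which the user writes the target arguments
does not change the resulting symbol name.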

* Separately group all function versions of "foo" together, in multiversion.c:

File multiversion.c maintains a hashtab, decl_version_htab, that maps
the default function decl of "foo" to the list of all other versions
of this function "foo". This is used when creating the dispatcher for
this function.
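
The grouping can be pictured with the small sketch below. It is only a
model of the idea, not the data structure in multiversion.c: decls are
represented by their assembler names rather than tree nodes, a std::map
stands in for the GCC hashtab, and the names VersionTable,
add_function_version and versions_of are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for decl_version_htab: decls are represented by
// their assembler names instead of GCC tree nodes.
typedef std::map<std::string, std::vector<std::string> > VersionTable;

// Record VERSION as one more version of the function whose default
// decl is DEFAULT_DECL.  Duplicate registrations are ignored.
void add_function_version (VersionTable &table,
                           const std::string &default_decl,
                           const std::string &version)
{
  assert (default_decl != version);
  std::vector<std::string> &versions = table[default_decl];
  for (std::size_t i = 0; i < versions.size (); ++i)
    if (versions[i] == version)
      return;
  versions.push_back (version);
}

// Return the versions the dispatcher for DEFAULT_DECL must choose from,
// or an empty list if the function is not multi-versioned.
std::vector<std::string> versions_of (const VersionTable &table,
                                      const std::string &default_decl)
{
  VersionTable::const_iterator it = table.find (default_decl);
  if (it == table.end ())
    return std::vector<std::string> ();
  return it->second;
}
```

Keying the table on the default decl is what lets the dispatcher creation
start from a single decl and find every other version.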

* Overload resolution:

Function "build_over_call" in cp/call.c sees a call to function
"foo", which is multi-versioned. The overload resolution happens in
function "joust" in cp/call.c. Here, the call to "foo" has all
possible versions of "foo" as candidates. If the caller has target
attributes and they match the target attributes of one of the function
versions, then a direct call is made to that function version.

For example:

int __attribute__ ((target ("avx,popcnt")))
baz ()
{
  return foo ();
}

If baz calls foo, which is multi-versioned, then the call to foo here
will become a direct call to the version of foo targeted to avx,popcnt.

When a direct call to a version cannot be made, the default
version of "foo" is the winning candidate. However, "build_over_call" realizes
that this is a versioned function and replaces the call-site of foo with an
"ifunc" call for foo, by querying a function in "multiversion.c" which
builds the ifunc decl. After this, all call-sites of "foo" that were not
resolved to a direct call contain the call to the ifunc.
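
The decision made at each call site can be sketched as follows. This is a
simplified model and not the logic in cp/call.c: target attributes are
modeled as sets of strings, "matches" is assumed to mean that the two sets
are equal, and the names TargetSet, CallTarget and resolve_call are
hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

typedef std::set<std::string> TargetSet;

// Hypothetical result of resolving one call site.
struct CallTarget
{
  bool use_dispatcher;  // true: call through the ifunc dispatcher
  int version_index;    // valid only when use_dispatcher is false
};

// CALLER is the caller's set of "target" arguments (empty if none).
// VERSIONS holds the target sets of the non-default versions.
// Assumption: a caller "matches" a version when the sets are equal.
CallTarget resolve_call (const TargetSet &caller,
                         const std::vector<TargetSet> &versions)
{
  CallTarget result = { true, -1 };
  if (caller.empty ())
    return result;
  for (std::size_t i = 0; i < versions.size (); ++i)
    {
      // Non-default versions always carry target arguments.
      assert (!versions[i].empty ());
      if (versions[i] == caller)
        {
          result.use_dispatcher = false;
          result.version_index = static_cast<int> (i);
          return result;
        }
    }
  return result;
}
```

With the versions from the example, a caller tagged "avx,popcnt" resolves
to a direct call, while a caller with no target attribute, or with a
target that matches no version, goes through the dispatcher.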

* Creating the dispatcher:

The dispatcher is independently created in a new pass, called
"pass_dispatch_versions", that runs immediately after cfg and cgraph are
created. The pass looks at all possible versions and queries the
target to give it the CPU detection predicates it must use to dispatch
each version. Then, the dispatcher body is created and the ifunc is
mapped to use this dispatcher.
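
In source terms, the generated dispatcher body behaves roughly like the
sketch below: each version's predicate is tested in turn, and the default
version is returned when no predicate holds. This is portable C++ written
for illustration, not the GIMPLE the pass builds; DispatchEntry, dispatch,
and the always_true/always_false predicates are hypothetical stand-ins for
the target's CPU detection builtins.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef int (*version_fn) ();
typedef bool (*predicate_fn) ();

// One dispatch condition: use VERSION if PREDICATE reports that the
// running CPU is suitable.
struct DispatchEntry
{
  predicate_fn predicate;
  version_fn version;
};

// Test the predicates in order and return the first version whose
// predicate holds; fall through to DEFAULT_VERSION when none does.
version_fn dispatch (const std::vector<DispatchEntry> &entries,
                     version_fn default_version)
{
  assert (default_version != NULL);
  for (std::size_t i = 0; i < entries.size (); ++i)
    if (entries[i].predicate ())
      return entries[i].version;
  return default_version;
}

// Hypothetical predicates and versions, used only for illustration.
bool always_false () { return false; }
bool always_true () { return true; }
int foo_default () { return 0; }
int foo_corei7 () { return 1; }
```

Because the function returns a pointer rather than calling the version
itself, it has the same shape as an ifunc resolver: the choice is made
once and later calls go straight to the selected version.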

Notice that only the dispatcher creation is done after the front-end.
Everything else occurs in the front-end itself. I could have created
the dispatcher also in the front-end. I did not do so because I
thought keeping it as a separate pass made sense to easily add more
dispatch mechanisms. For example, when IFUNC is not available, it could be
replaced with control-flow to make direct calls to the function versions.
Also, making the dispatcher after cfg is created was easy.


	* doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION.
	* doc/tm.texi: Regenerate.
	* target.def (dispatch_version): New target hook.
	* tree.h (DECL_FUNCTION_VERSIONED): New macro.
	(tree_function_decl): New bit-field versioned_function.
	* tree-pass.h (pass_dispatch_versions): New pass.
	* multiversion.c: New file.
	* multiversion.h: New file.
	* cgraphunit.c (cgraph_finalize_function): Force output of versioned
	inline functions.
	* cp/class.c: Include multiversion.h.
	(add_method): Aggregate function versions.  Change assembler names of
	versioned functions.
	(resolve_address_of_overloaded_function): Match address of function
	version with default function.  Return address of ifunc dispatcher
	for address of versioned functions.
	(cxx_comdat_group): Use decl names for comdat groups of versioned
	functions.
	* cp/decl.c (decls_match): Make decls unmatched for versioned
	functions.
	(duplicate_decls): Remove ambiguity for versioned functions. Notify
	of deleted function version decls.
	(start_decl): Change assembler name of versioned functions.
	(start_function): Change assembler name of versioned functions.
	(cxx_comdat_group): Make comdat group of versioned functions be the
	same.
	* cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
	functions that are also marked inline.
	* cp/decl2.c: Include multiversion.h.
	(check_classfn): Check attributes of versioned functions for match.
	* cp/call.c: Include multiversion.h.
	(build_over_call): Make calls to multiversioned functions to call the
	dispatcher.
	(joust): For calls to multi-versioned functions, make the default
	function win.
	* timevar.def (TV_MULTIVERSION_DISPATCH): New time var.
	* varasm.c (finish_aliases_1): Check if the alias points to a function
	with a body before giving an error.
	* Makefile.in: Add multiversion.o
	* passes.c: Add pass_dispatch_versions to the pass list.
	* config/i386/i386.c (add_condition_to_bb): New function.
	(get_builtin_code_for_version): New function.
	(ix86_dispatch_version): New function.
	(TARGET_DISPATCH_VERSION): New macro.
	* testsuite/g++.dg/mv1.C: New test.
H.J. Lu - April 27, 2012, 1:38 p.m.
On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi,
>
>   I have made the following changes in this new patch which is attached:
>
> * Use target attribute itself to create function versions.
> * Handle any number of ISA names and arch=  args to target attribute,
> generating the right dispatchers.
> * Integrate with the CPU runtime detection checked in this week.
> * Overload resolution: If the caller's target matches any of the
> version function's target, then a direct call to the version is
> generated, no need to go through the dispatching.
>
> Patch also available for review here:
> http://codereview.appspot.com/5752064
>

Does it work with

int foo ();
int foo () __attribute__ ((targetv("arch=corei7")));

int (*foo_p) () = foo?

Does it support C++?

Thanks.
Sriraman Tallam - April 27, 2012, 2:35 p.m.
On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi,
>>
>>   I have made the following changes in this new patch which is attached:
>>
>> * Use target attribute itself to create function versions.
>> * Handle any number of ISA names and arch=  args to target attribute,
>> generating the right dispatchers.
>> * Integrate with the CPU runtime detection checked in this week.
>> * Overload resolution: If the caller's target matches any of the
>> version function's target, then a direct call to the version is
>> generated, no need to go through the dispatching.
>>
>> Patch also available for review here:
>> http://codereview.appspot.com/5752064
>>
>
> Does it work with
>
> int foo ();
> int foo () __attribute__ ((targetv("arch=corei7")));
>
> int (*foo_p) () = foo?

Yes, this will work. foo_p will be the address of the dispatcher
function and hence doing (*foo_p)() will call the right version.

>
> Does it support C++?

Partially, no support for virtual function versioning yet. I will add
it in the next iteration.

Thanks,
-Sri.

>
> Thanks.
>
> --
> H.J.
H.J. Lu - April 27, 2012, 2:38 p.m.
On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Hi,
>>>
>>>   I have made the following changes in this new patch which is attached:
>>>
>>> * Use target attribute itself to create function versions.
>>> * Handle any number of ISA names and arch=  args to target attribute,
>>> generating the right dispatchers.
>>> * Integrate with the CPU runtime detection checked in this week.
>>> * Overload resolution: If the caller's target matches any of the
>>> version function's target, then a direct call to the version is
>>> generated, no need to go through the dispatching.
>>>
>>> Patch also available for review here:
>>> http://codereview.appspot.com/5752064
>>>
>>
>> Does it work with
>>
>> int foo ();
>> int foo () __attribute__ ((targetv("arch=corei7")));
>>
>> int (*foo_p) () = foo?
>
> Yes, this will work. foo_p will be the address of the dispatcher
> function and hence doing (*foo_p)() will call the right version.

Even when foo_p is a global variable and compiled with -fPIC?

Thanks.
Sriraman Tallam - April 27, 2012, 2:53 p.m.
On Fri, Apr 27, 2012 at 7:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Hi,
>>>>
>>>>   I have made the following changes in this new patch which is attached:
>>>>
>>>> * Use target attribute itself to create function versions.
>>>> * Handle any number of ISA names and arch=  args to target attribute,
>>>> generating the right dispatchers.
>>>> * Integrate with the CPU runtime detection checked in this week.
>>>> * Overload resolution: If the caller's target matches any of the
>>>> version function's target, then a direct call to the version is
>>>> generated, no need to go through the dispatching.
>>>>
>>>> Patch also available for review here:
>>>> http://codereview.appspot.com/5752064
>>>>
>>>
>>> Does it work with
>>>
>>> int foo ();
>>> int foo () __attribute__ ((targetv("arch=corei7")));
>>>
>>> int (*foo_p) () = foo?
>>
>> Yes, this will work. foo_p will be the address of the dispatcher
>> function and hence doing (*foo_p)() will call the right version.
>
> Even when foo_p is a global variable and compiled with -fPIC?

I am not sure I understand what the complication is here, but FWIW, I
tried this example and it works:

int foo ()
{
 return 0;
}

int  __attribute__ ((target ("arch=corei7")))
foo ()
{
 return 1;
}

int (*foo_p)() = foo;
int main ()
{
 return (*foo_p)();
}

g++ -fPIC -O2 example.cc


Did you have something else in mind? Could you please elaborate if you
have a particular case in mind?

The way I handle function pointers is straightforward. When the
front-end sees a pointer to a function that is versioned, it returns
the pointer to the dispatcher instead.

Thanks,
-Sri.

>
> Thanks.
>
> --
> H.J.
H.J. Lu - April 27, 2012, 3:36 p.m.
On Fri, Apr 27, 2012 at 7:53 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Fri, Apr 27, 2012 at 7:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>> Hi,
>>>>>
>>>>>   I have made the following changes in this new patch which is attached:
>>>>>
>>>>> * Use target attribute itself to create function versions.
>>>>> * Handle any number of ISA names and arch=  args to target attribute,
>>>>> generating the right dispatchers.
>>>>> * Integrate with the CPU runtime detection checked in this week.
>>>>> * Overload resolution: If the caller's target matches any of the
>>>>> version function's target, then a direct call to the version is
>>>>> generated, no need to go through the dispatching.
>>>>>
>>>>> Patch also available for review here:
>>>>> http://codereview.appspot.com/5752064
>>>>>
>>>>
>>>> Does it work with
>>>>
>>>> int foo ();
>>>> int foo () __attribute__ ((targetv("arch=corei7")));
>>>>
>>>> int (*foo_p) () = foo?
>>>
>>> Yes, this will work. foo_p will be the address of the dispatcher
>>> function and hence doing (*foo_p)() will call the right version.
>>
>> Even when foo_p is a global variable and compiled with -fPIC?
>
> I am not sure I understand what the complication is here, but FWIW, I
> tried this example and it works
>
> int foo ()
> {
>  return 0;
> }
>
> int  __attribute__ ((target ("arch=corei7)))
> foo ()
> {
>  return 1;
> }
>
> int (*foo_p)() = foo;
> int main ()
> {
>  return (*foo_p)();
> }
>
> g++ -fPIC -O2 example.cc
>
>
> Did you have something else in mind? Could you please elaborate if you
> a have a particular case in mind.
>

That is what I meant.  But I didn't see it in your testcase.
Can you add it to your testcase?

Also you should verify the correct function is called in
your testcase at run-time.


Thanks.
Sriraman Tallam - April 27, 2012, 3:45 p.m.
On Fri, Apr 27, 2012 at 8:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Apr 27, 2012 at 7:53 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Fri, Apr 27, 2012 at 7:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>>   I have made the following changes in this new patch which is attached:
>>>>>>
>>>>>> * Use target attribute itself to create function versions.
>>>>>> * Handle any number of ISA names and arch=  args to target attribute,
>>>>>> generating the right dispatchers.
>>>>>> * Integrate with the CPU runtime detection checked in this week.
>>>>>> * Overload resolution: If the caller's target matches any of the
>>>>>> version function's target, then a direct call to the version is
>>>>>> generated, no need to go through the dispatching.
>>>>>>
>>>>>> Patch also available for review here:
>>>>>> http://codereview.appspot.com/5752064
>>>>>>
>>>>>
>>>>> Does it work with
>>>>>
>>>>> int foo ();
>>>>> int foo () __attribute__ ((targetv("arch=corei7")));
>>>>>
>>>>> int (*foo_p) () = foo?
>>>>
>>>> Yes, this will work. foo_p will be the address of the dispatcher
>>>> function and hence doing (*foo_p)() will call the right version.
>>>
>>> Even when foo_p is a global variable and compiled with -fPIC?
>>
>> I am not sure I understand what the complication is here, but FWIW, I
>> tried this example and it works
>>
>> int foo ()
>> {
>>  return 0;
>> }
>>
>> int  __attribute__ ((target ("arch=corei7)))
>> foo ()
>> {
>>  return 1;
>> }
>>
>> int (*foo_p)() = foo;
>> int main ()
>> {
>>  return (*foo_p)();
>> }
>>
>> g++ -fPIC -O2 example.cc
>>
>>
>> Did you have something else in mind? Could you please elaborate if you
>> a have a particular case in mind.
>>
>
> That is what I meant.  But I didn't see it in your testcase.
> Can you add it to your testcase?
>
> Also you should verify the correct function is called in
> your testcase at run-time.

OK, I will update the patch.

Thanks,
-Sri.

>
>
> Thanks.
>
>
> --
> H.J.

Patch

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 186883)
+++ gcc/doc/tm.texi	(working copy)
@@ -10997,6 +10997,14 @@  The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
+For multi-versioned functions, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version. @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 186883)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -10877,6 +10877,14 @@  The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_DISPATCH_VERSION
+For multi-versioned functions, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version. @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 186883)
+++ gcc/target.def	(working copy)
@@ -1249,6 +1249,15 @@  DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook to generate the dispatching code for calls to multi-versioned
+   functions.  DISPATCH_DECL is the function that will have the dispatching
+   logic.  FNDECLS is the list of choices for dispatch and EMPTY_BB is the
+   basic block in DISPATCH_DECL which will contain the code.  */
+DEFHOOK
+(dispatch_version,
+ "",
+ int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 186883)
+++ gcc/tree.h	(working copy)
@@ -3539,6 +3539,12 @@  extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set. */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3583,8 +3589,8 @@  struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h	(revision 186883)
+++ gcc/tree-pass.h	(working copy)
@@ -453,6 +453,7 @@  extern struct gimple_opt_pass pass_tm_memopt;
 extern struct gimple_opt_pass pass_tm_edges;
 extern struct gimple_opt_pass pass_split_functions;
 extern struct gimple_opt_pass pass_feedback_split_functions;
+extern struct gimple_opt_pass pass_dispatch_versions;
 
 /* IPA Passes */
 extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
Index: gcc/multiversion.c
===================================================================
--- gcc/multiversion.c	(revision 0)
+++ gcc/multiversion.c	(revision 0)
@@ -0,0 +1,832 @@ 
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* This file holds the state for multi-versioned functions.  The front-end
+   updates the state as and when function versions are encountered.
+   This is then used to generate the dispatch code.  Also, the
+   optimization passes to clone hot paths involving versioned functions
+   will be done here.
+
+   Function versions are created by using the same function signature but
+   also tagging attribute "target" to specify the platform type for which
+   the version must be executed.  Here is an example:
+
+   int foo ()
+   {
+     printf ("Execute as default");
+     return 0;
+   }
+
+   int  __attribute__ ((target ("arch=corei7")))
+   foo ()
+   {
+     printf ("Execute for corei7");
+     return 0;
+   }
+   
+   int main ()
+   {
+     return foo ();
+   } 
+
+   The call to foo in main is replaced with a call to an IFUNC function that
+   contains the dispatch code to call the correct function version at
+   run-time.  */
+
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "tree-inline.h"
+#include "langhooks.h"
+#include "flags.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "toplev.h"
+#include "timevar.h"
+#include "params.h"
+#include "fibheap.h"
+#include "intl.h"
+#include "tree-pass.h"
+#include "hashtab.h"
+#include "coverage.h"
+#include "ggc.h"
+#include "tree-flow.h"
+#include "rtl.h"
+#include "ipa-prop.h"
+#include "basic-block.h"
+#include "toplev.h"
+#include "dbgcnt.h"
+#include "tree-dump.h"
+#include "output.h"
+#include "vecprim.h"
+#include "gimple-pretty-print.h"
+#include "ipa-inline.h"
+#include "target.h"
+#include "multiversion.h"
+
+typedef void * void_p;
+
+DEF_VEC_P (void_p);
+DEF_VEC_ALLOC_P (void_p, heap);
+
+/* Each function decl that is a function version gets an instance of this
+   structure.  Since instances are created from the front-end, decl merging
+   can happen, where a decl created for a new declaration is merged with
+   the old.  In this case, the new decl is deleted and the IS_DELETED
+   field is set for the struct instance corresponding to the new decl.
+   IFUNC_DECL is the decl of the ifunc function for default decls.
+   IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS
+   is a vector containing the list of function versions that are
+   the candidates for dispatch.  */
+
+typedef struct version_function_d {
+  tree decl;
+  tree ifunc_decl;
+  tree ifunc_resolver_decl;
+  VEC (void_p, heap) *versions;
+  bool is_deleted;
+} version_function;
+
+/* Hashmap has an entry for every function decl that has other function
+   versions.  For function decls that are the default, it also stores the
+   list of all the other function versions.  Each entry is a structure
+   of type version_function_d.  */
+static htab_t decl_version_htab = NULL;
+
+/* Hashtable helpers for decl_version_htab. */
+
+static hashval_t
+decl_version_htab_hash_descriptor (const void *p)
+{
+  const version_function *t = (const version_function *) p;
+  return htab_hash_pointer (t->decl);
+}
+
+/* Hashtable helper for decl_version_htab. */
+
+static int
+decl_version_htab_eq_descriptor (const void *p1, const void *p2)
+{
+  const version_function *t1 = (const version_function *) p1;
+  return htab_eq_pointer ((const void_p) t1->decl, p2);
+}
+
+/* Create the decl_version_htab.  */
+static void
+create_decl_version_htab (void)
+{
+  if (decl_version_htab == NULL)
+    decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
+				     decl_version_htab_eq_descriptor, NULL);
+}
+
+/* Creates an instance of version_function for decl DECL.  */
+
+static version_function*
+new_version_function (const tree decl)
+{
+  version_function *v;
+  v = (version_function *)xmalloc(sizeof (version_function));
+  v->decl = decl;
+  v->ifunc_decl = NULL;
+  v->ifunc_resolver_decl = NULL;
+  v->versions = NULL;
+  v->is_deleted = false;
+  return v;
+}
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = (char **)xmalloc (argnum * sizeof (char *));
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "target"
+   or if the "target" attribute strings of DECL1 and DECL2 don't match.  */
+
+bool
+has_different_version_attributes (const tree decl1, const tree decl2)
+{
+  tree attr1, attr2;
+  char *c1, *c2;
+  bool ret = false;
+
+  if (TREE_CODE (decl1) != FUNCTION_DECL
+      || TREE_CODE (decl2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = lookup_attribute ("target", DECL_ATTRIBUTES (decl1));
+  attr2 = lookup_attribute ("target", DECL_ATTRIBUTES (decl2));
+
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
+      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
+    return true;
+
+  c1 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
+  c2 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
+
+  if (strcmp (c1, c2) != 0)
+     ret = true;
+
+  free (c1);
+  free (c2);
+
+  return ret;
+}
+
+/* If this decl corresponds to a function and has "target" attribute,
+   append the attribute string to its assembler name.  */
+
+static void
+version_assembler_name (const tree decl)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+  tree assembler_name_tree;
+  
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    return;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      && lookup_attribute ("gnu_inline",
+			  DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "virtual function versioning not supported");
+
+  version_attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return;
+
+  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+  if (dump_file)
+    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
+	     assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
+
+  assembler_name_tree = get_identifier (assembler_name);
+
+  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
+  SET_DECL_RTL (decl, NULL);
+}
+
+void
+mark_function_as_version (const tree decl)
+{
+  if (DECL_FUNCTION_VERSIONED (decl))
+    return;
+  DECL_FUNCTION_VERSIONED (decl) = 1;
+  version_assembler_name (decl);
+}
+
+/* Returns true if function DECL has target attribute set.  This could be
+   a version.  */
+
+bool
+is_target_attribute_set (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && (lookup_attribute ("target", DECL_ATTRIBUTES (decl))
+	      != NULL_TREE));
+}
+
+/* Returns true if DECL is multi-versioned and is the default function,
+   that is, it is not tagged with the "target" attribute.  */
+
+bool
+is_default_function (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && (lookup_attribute ("target", DECL_ATTRIBUTES (decl))
+	      == NULL_TREE));	
+}
+
+/* For function decl DECL, find the version_function struct in the
+   decl_version_htab.  */
+
+static version_function *
+find_function_version (const tree decl)
+{
+  void *slot;
+
+  if (!DECL_FUNCTION_VERSIONED (decl))
+    return NULL;
+
+  if (!decl_version_htab)
+    return NULL;
+
+  slot = htab_find_with_hash (decl_version_htab, decl,
+                              htab_hash_pointer (decl));
+
+  if (slot != NULL)
+    return (version_function *)slot;
+
+  return NULL;
+}
+
+/* Record DECL as a function version by creating a version_function struct
+   for it and storing it in the hashtable.  */
+
+static version_function *
+add_function_version (const tree decl)
+{
+  void **slot;
+  version_function *v;
+
+  if (!DECL_FUNCTION_VERSIONED (decl))
+    return NULL;
+
+  create_decl_version_htab ();
+
+  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
+                                   htab_hash_pointer ((const void_p)decl),
+				   INSERT);
+
+  if (*slot != NULL)
+    return (version_function *)*slot;
+
+  v = new_version_function (decl);
+  *slot = v;
+
+  return v;
+}
+
+/* Push V into VEC only if it is not already present.  If already present
+   returns false.  */
+
+static bool
+push_function_version (version_function *v, VEC (void_p, heap) **vec)
+{
+  int ix;
+  void_p ele; 
+  for (ix = 0; VEC_iterate (void_p, *vec, ix, ele); ++ix)
+    {
+      if (ele == (void_p)v)
+        return false;
+    }
+
+  VEC_safe_push (void_p, heap, *vec, (void*)v);
+  return true;
+}
+ 
+/* Mark DECL as deleted.  This is called by the front-end when a duplicate
+   decl is merged with the original decl and the duplicate decl is deleted.
+   This function marks the duplicate_decl as invalid.  Called by
+   duplicate_decls in cp/decl.c.  */
+
+void
+mark_delete_decl_version (const tree decl)
+{
+  version_function *decl_v;
+
+  decl_v = find_function_version (decl);
+
+  if (decl_v == NULL)
+    return;
+
+  decl_v->is_deleted = true;
+
+  if (is_default_function (decl)
+      && decl_v->versions != NULL)
+    {
+      VEC_truncate (void_p, decl_v->versions, 0);
+      VEC_free (void_p, heap, decl_v->versions);
+      decl_v->versions = NULL;
+    }
+}
+
+/* Mark DECL1 and DECL2 to be function versions in the same group.  One
+   of DECL1 and DECL2 must be the default, otherwise this function does
+   nothing.  This function aggregates the versions.  */
+
+int
+group_function_versions (const tree decl1, const tree decl2)
+{
+  tree default_decl, version_decl;
+  version_function *default_v, *version_v;
+
+  gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
+	      && DECL_FUNCTION_VERSIONED (decl2));
+
+  /* The version decls are added only to the default decl.  */
+  if (!is_default_function (decl1)
+      && !is_default_function (decl2))
+    return 0;
+
+  /* This can happen with duplicate declarations.  Just ignore.  */
+  if (is_default_function (decl1)
+      && is_default_function (decl2))
+    return 0;
+
+  default_decl = (is_default_function (decl1)) ? decl1 : decl2;
+  version_decl = (default_decl == decl1) ? decl2 : decl1;
+
+  gcc_assert (default_decl != version_decl);
+  create_decl_version_htab ();
+
+  /* If the version function is found, it has been added.  */
+  if (find_function_version (version_decl))
+    return 0;
+
+  default_v = add_function_version (default_decl);
+  version_v = add_function_version (version_decl);
+
+  if (default_v->versions == NULL)
+    default_v->versions = VEC_alloc (void_p, heap, 1);
+
+  push_function_version (version_v, &default_v->versions);
+  return 0;
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If
+   MAKE_UNIQUE is true, also include a name unique to this file.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = (char *) xmalloc (name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+      snprintf (global_var_name, name_len, "%s.%s.%s", name,
+		unique_name, suffix);
+  else
+      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
+   the versions of multi-versioned function DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_ifunc_resolver_func (const tree default_decl,
+			  const tree ifunc_decl,
+			  basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  basic_block new_bb;
+  tree old_current_function_decl;
+  bool make_unique = false;
+
+  /* IFUNCs have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    make_unique = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", make_unique);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = TREE_USED (default_decl);
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
+  DECL_EXTERNAL (ifunc_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+  DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  current_function_decl = decl;
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
+     PROP_ssa);
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  *empty_bb = new_bb;
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+  cgraph_mark_force_output_node (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      cgraph_add_to_same_comdat_group (cgraph_get_node (decl),
+				       cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  gcc_assert (ifunc_decl != NULL);
+  /* Mark ifunc_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (ifunc_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
+
+  /* Create the alias here.  */
+  cgraph_create_function_alias (ifunc_decl, decl);
+  return decl;
+}
+
+/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to
+   DECL function will be replaced with calls to the ifunc.  Return the decl
+   of the ifunc created.  */
+
+static tree
+make_ifunc_func (const tree decl)
+{
+  tree ifunc_decl;
+  char *ifunc_name, *resolver_name;
+  tree fn_type, ifunc_type;
+  bool make_unique = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    make_unique = true;
+
+  ifunc_name = make_name (decl, "ifunc", make_unique);
+  resolver_name = make_name (decl, "resolver", make_unique);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  ifunc_type = build_function_type (TREE_TYPE (fn_type),
+				    TYPE_ARG_TYPES (fn_type));
+  
+  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type);
+  TREE_USED (ifunc_decl) = 1;
+  DECL_CONTEXT (ifunc_decl) = NULL_TREE;
+  DECL_INITIAL (ifunc_decl) = error_mark_node;
+  DECL_ARTIFICIAL (ifunc_decl) = 1;
+  /* Mark this ifunc as external, the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (ifunc_decl) = 1;
+  /* IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (ifunc_decl) = 1;
+
+  return ifunc_decl;  
+}
+
+/* For multi-versioned function DECL, which should also be the default,
+   return the decl of the ifunc, creating it if it does not
+   exist.  */
+
+tree
+get_ifunc_for_version (const tree decl)
+{
+  version_function *decl_v;
+  int ix;
+  void_p ele;
+
+  /* DECL has to be the default version, otherwise it is missing and
+     that is not allowed.  */
+  if (!is_default_function (decl))
+    {
+      error_at (DECL_SOURCE_LOCATION (decl), "default version not found");
+      return decl;
+    }
+
+  decl_v = find_function_version (decl);
+  gcc_assert (decl_v != NULL);
+  if (decl_v->ifunc_decl == NULL)
+    {
+      tree ifunc_decl;
+      ifunc_decl = make_ifunc_func (decl);
+      decl_v->ifunc_decl = ifunc_decl;
+    }
+
+  if (cgraph_get_node (decl))
+    cgraph_mark_force_output_node (cgraph_get_node (decl));
+
+  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
+    {
+      version_function *v = (version_function *) ele;
+      /* This could be a deleted version.  Happens with
+	 duplicate declarations. */
+      if (v->is_deleted)
+	continue;
+      gcc_assert (v->decl != NULL);
+      if (cgraph_get_node (v->decl))
+	cgraph_mark_force_output_node (cgraph_get_node (v->decl));
+    }
+
+  return decl_v->ifunc_decl;
+}
+
+/* Generate the dispatching code to dispatch multi-versioned function
+   DECL.  Make a new function decl for dispatching and call the target
+   hook to process the "target" attributes and provide the code to
+   dispatch the right function at run-time.  */
+
+static tree
+make_ifunc_resolver_for_version (const tree decl)
+{
+  version_function *decl_v;
+  tree ifunc_resolver_decl, ifunc_decl;
+  basic_block empty_bb;
+  int ix;
+  void_p ele;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree old_current_function_decl;
+
+  gcc_assert (is_default_function (decl));
+
+  decl_v = find_function_version (decl);
+  gcc_assert (decl_v != NULL);
+
+  if (decl_v->ifunc_resolver_decl != NULL)
+    return decl_v->ifunc_resolver_decl;
+
+  ifunc_decl = decl_v->ifunc_decl;
+
+  if (ifunc_decl == NULL)
+    ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl);
+
+  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl,
+						  &empty_bb);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (ifunc_resolver_decl));
+  current_function_decl = ifunc_resolver_decl;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+  VEC_safe_push (tree, heap, fn_ver_vec, decl);
+
+  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
+    {
+      version_function *v = (version_function *) ele;
+      gcc_assert (v->decl != NULL);
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (v->decl))
+        error_at (DECL_SOURCE_LOCATION (v->decl),
+		  "virtual function versioning not supported");
+      if (!v->is_deleted)
+	VEC_safe_push (tree, heap, fn_ver_vec, v->decl);
+    }
+
+  gcc_assert (targetm.dispatch_version);
+  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb);
+  decl_v->ifunc_resolver_decl = ifunc_resolver_decl;
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+  return ifunc_resolver_decl;
+}
+
+/* Main entry point to pass_dispatch_versions. For multi-versioned functions,
+   generate the dispatching code.  */
+
+static unsigned int
+do_dispatch_versions (void)
+{
+  /* A new pass for generating dispatch code for multi-versioned functions.
+     Other forms of dispatch can be added for when ifunc support is not
+     available, such as calling the function directly after checking the
+     target type.  Currently, dispatching is done through IFUNC.  This pass
+     will become more meaningful when other dispatch mechanisms are added.  */
+
+  /* Cloning a function to produce more versions will happen here when the
+     user requests that via the target attribute. For example,
+     int foo () __attribute__ ((target(("arch=core2"), ("arch=corei7"))));
+     means that the user wants the same body of foo to be versioned for core2
+     and corei7.  In that case, this function will be cloned during this
+     pass.  */
+  
+  if (DECL_FUNCTION_VERSIONED (current_function_decl)
+      && is_default_function (current_function_decl))
+    {
+      tree decl = make_ifunc_resolver_for_version (current_function_decl);
+      if (dump_file && decl)
+	dump_function_to_file (decl, dump_file, TDF_BLOCKS);
+    }
+  return 0;
+}
+
+static bool
+gate_dispatch_versions (void)
+{
+  return true;
+}
+
+/* A pass to generate the dispatch code to execute the appropriate version
+   of a multi-versioned function at run-time.  */
+
+struct gimple_opt_pass pass_dispatch_versions =
+{
+ {
+  GIMPLE_PASS,
+  "dispatch_multiversion_functions",    /* name */
+  gate_dispatch_versions,		/* gate */
+  do_dispatch_versions,			/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_MULTIVERSION_DISPATCH,		/* tv_id */
+  PROP_cfg,				/* properties_required */
+  PROP_cfg,				/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  0					/* todo_flags_finish */
+ }
+};
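
For reviewers: the version key built by sorted_attr_string above can be sketched standalone as follows. This is only an illustration under stated assumptions, not the code in the patch: the name normalize_target_string is hypothetical, it uses plain malloc instead of xmalloc, and it counts the tokens strtok actually returns, which is an assumption about how empty arguments such as "a,,b" should be handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Hypothetical helper, not part of the patch.  Return a malloc'ed key for
   the comma separated target string STR: '=' becomes '_', the arguments
   are sorted and then joined with '_'.  Returns NULL if allocation fails.
   The caller frees the result.  */
char *
normalize_target_string (const char *str)
{
  size_t len, max_args = 1, n = 0, i;
  char *copy, *out, *tok;
  char **args;

  assert (str != NULL);
  len = strlen (str);

  /* Upper bound on the number of arguments: commas + 1.  */
  for (i = 0; i < len; i++)
    if (str[i] == ',')
      max_args++;

  copy = malloc (len + 1);
  out = malloc (len + 1);
  args = malloc (max_args * sizeof (char *));
  if (copy == NULL || out == NULL || args == NULL)
    {
      free (copy);
      free (out);
      free (args);
      return NULL;
    }

  memcpy (copy, str, len + 1);
  for (i = 0; i < len; i++)
    if (copy[i] == '=')
      copy[i] = '_';

  /* strtok skips empty tokens, so N may be smaller than MAX_ARGS.  */
  for (tok = strtok (copy, ","); tok != NULL; tok = strtok (NULL, ","))
    args[n++] = tok;

  qsort (args, n, sizeof (char *), cmp_str);

  /* The joined string is never longer than STR itself.  */
  out[0] = '\0';
  for (i = 0; i < n; i++)
    {
      if (i > 0)
	strcat (out, "_");
      strcat (out, args[i]);
    }

  free (args);
  free (copy);
  return out;
}
```

With this sketch, "sse4.2,arch=corei7" and "arch=corei7,sse4.2" both map to "arch_corei7_sse4.2", which is the property has_different_version_attributes relies on when it compares two "target" strings.
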
Index: gcc/multiversion.h
===================================================================
--- gcc/multiversion.h	(revision 0)
+++ gcc/multiversion.h	(revision 0)
@@ -0,0 +1,55 @@ 
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* This is the header file which provides the functions to keep track
+   of functions that are multi-versioned and to generate the dispatch
+   code to call the right version at run-time.  */
+
+#ifndef GCC_MULTIVERSION_H
+#define GCC_MULTIVERSION_H
+
+#include "tree.h"
+
+/* Mark DECL1 and DECL2 as function versions.  */
+int group_function_versions (const tree decl1, const tree decl2);
+
+/* Mark DECL as deleted and no longer a version.  */
+void mark_delete_decl_version (const tree decl);
+
+/* Returns true if DECL is the default version to be executed if all
+   other versions are inappropriate at run-time.  */
+bool is_default_function (const tree decl);
+
+/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL
+   must be the default function in the multi-versioned group.  */
+tree get_ifunc_for_version (const tree decl);
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "target"
+   or if the "target" attribute strings of DECL1 and DECL2 don't match.  */
+bool has_different_version_attributes (const tree decl1, const tree decl2);
+
+/* Function DECL is marked to be a multi-versioned function.  If DECL is
+   not the default version, the assembler name of DECL is changed to include
+   the attribute string to keep the name unambiguous.  */
+void mark_function_as_version (const tree decl);
+
+/* Check if decl is FUNCTION_DECL with target attribute set.  */
+bool is_target_attribute_set (const tree decl);
+#endif
Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c	(revision 186883)
+++ gcc/cgraphunit.c	(working copy)
@@ -411,6 +411,13 @@  cgraph_finalize_function (tree decl, bool nested)
       && !DECL_DISREGARD_INLINE_LIMITS (decl))
     node->symbol.force_output = 1;
 
+  /* With function versions, keep inline functions and do not worry about
+     inline limits.  */
+  if (DECL_FUNCTION_VERSIONED (decl)
+      && DECL_DECLARED_INLINE_P (decl)
+      && !DECL_EXTERNAL (decl))
+    node->symbol.force_output = 1;
+
   /* When not optimizing, also output the static functions. (see
      PR24561), but don't do so for always_inline functions, functions
      declared inline and nested functions.  These were optimized out
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,24 @@ 
+/* Simple test case to check if Multiversioning works.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-dispatch_multiversion_functions" } */
+
+int foo ();
+int foo () __attribute__ ((target("arch=corei7")));
+
+int main ()
+{
+  int (*p)() = &foo;
+  return foo () + (*p)();
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 0;
+}
+
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 186883)
+++ gcc/cp/class.c	(working copy)
@@ -38,6 +38,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-dump.h"
 #include "splay-tree.h"
 #include "pointer-set.h"
+#include "multiversion.h"
 
 /* The number of nested classes being processed.  If we are not in the
    scope of any class, this is zero.  */
@@ -1092,7 +1093,21 @@  add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
-	  if (using_decl)
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && (is_target_attribute_set (fn)
+		  || is_target_attribute_set (method))
+	      && has_different_version_attributes (fn, method))
+ 	    {
+	      mark_function_as_version (fn);
+	      mark_function_as_version (method);
+	      group_function_versions (fn, method);
+	      continue;
+	    }
+	  else if (using_decl)
 	    {
 	      if (DECL_CONTEXT (fn) == type)
 		/* Defer to the local function.  */
@@ -1150,6 +1165,7 @@  add_method (tree type, tree method, tree using_dec
   else
     /* Replace the current slot.  */
     VEC_replace (tree, method_vec, slot, overload);
+
   return true;
 }
 
@@ -6930,8 +6946,11 @@  resolve_address_of_overloaded_function (tree targe
 	  if (DECL_ANTICIPATED (fn))
 	    continue;
 
-	  /* See if there's a match.  */
-	  if (same_type_p (target_fn_type, static_fn_type (fn)))
+	  /* See if there's a match.  For functions that are multi-versioned,
+	     match only the default function.  */
+	  if (same_type_p (target_fn_type, static_fn_type (fn))
+	      && (!DECL_FUNCTION_VERSIONED (fn)
+		  || is_default_function (fn)))
 	    matches = tree_cons (fn, NULL_TREE, matches);
 	}
     }
@@ -7093,6 +7112,21 @@  resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time. Also, the function address is kept
+     unique.  */
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && is_default_function (fn))
+    {
+      tree ifunc_decl;
+      ifunc_decl = get_ifunc_for_version (fn);
+      gcc_assert (ifunc_decl != NULL);
+      mark_used (fn);
+      return build_fold_addr_expr (ifunc_decl);
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 186883)
+++ gcc/cp/decl.c	(working copy)
@@ -54,6 +54,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "multiversion.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -973,6 +974,21 @@  decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls don't match if they correspond to two different versions
+	 of the same function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) 
+	  && has_different_version_attributes (newdecl, olddecl))
+	{
+	  /* One of the decls could be the default without the "target"
+	     attribute. Set it to be a versioned function here.  */
+	  mark_function_as_version (newdecl);
+	  mark_function_as_version (olddecl);
+	  /* Accumulate all the versions of a function.  */
+	  group_function_versions (olddecl, newdecl);
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1490,7 +1506,11 @@  duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2262,6 +2282,16 @@  duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    {
+      DECL_FUNCTION_VERSIONED (newdecl) = 1;
+      /* Record that newdecl is not a valid version and has
+	 been deleted.  */
+      mark_delete_decl_version (newdecl);
+    }
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
@@ -14035,7 +14065,11 @@  cxx_comdat_group (tree decl)
 	  else
 	    break;
 	}
-      name = DECL_ASSEMBLER_NAME (decl);
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl))
+	name = DECL_NAME (decl);
+      else
+        name = DECL_ASSEMBLER_NAME (decl);
     }
 
   return name;
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 186883)
+++ gcc/cp/semantics.c	(working copy)
@@ -3783,8 +3783,11 @@  expand_or_defer_fn_1 (tree fn)
       /* If the user wants us to keep all inline functions, then mark
 	 this function as needed so that finish_file will make sure to
 	 output it later.  Similarly, all dllexport'd functions must
-	 be emitted; there may be callers in other DLLs.  */
-      if ((flag_keep_inline_functions
+	 be emitted; there may be callers in other DLLs.
+	 Also, mark this function as needed if it is marked inline but
+	 is a multi-versioned function.  */
+      if (((flag_keep_inline_functions
+	    || DECL_FUNCTION_VERSIONED (fn))
 	   && DECL_DECLARED_INLINE_P (fn)
 	   && !DECL_REALLY_EXTERN (fn))
 	  || (flag_keep_inline_dllexport
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 186883)
+++ gcc/cp/decl2.c	(working copy)
@@ -53,6 +53,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "splay-tree.h"
 #include "langhooks.h"
 #include "c-family/c-ada-spec.h"
+#include "multiversion.h"
 
 extern cpp_reader *parse_in;
 
@@ -677,9 +678,13 @@  check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !has_different_version_attributes (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 186883)
+++ gcc/cp/call.c	(working copy)
@@ -41,6 +41,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "multiversion.h"
 
 /* The various kinds of conversion.  */
 
@@ -3903,6 +3904,16 @@  build_new_function_call (tree fn, VEC(tree,gc) **a
     {
       if (complain & tf_error)
 	{
+	  /* If the call is to a multiversioned function without
+	     a default version, overload resolution will fail.  */
+	  if (candidates
+	      && TREE_CODE (candidates->fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (candidates->fn))
+	    error_at (location_of (DECL_NAME (OVL_CURRENT (fn))),
+		      "Call to multiversioned function %<%D(%A)%> with"
+		      " no default version", DECL_NAME (OVL_CURRENT (fn)),
+		      build_tree_list_vec (*args));
+
 	  if (!any_viable_p && candidates && ! candidates->next
 	      && (TREE_CODE (candidates->fn) == FUNCTION_DECL))
 	    return cp_build_function_call_vec (candidates->fn, args, complain);
@@ -6809,6 +6820,18 @@  build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For a call to a multi-versioned function, the call should actually be to
+     the dispatcher.  */
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && is_default_function (fn))
+    {
+      tree ifunc_decl;
+      ifunc_decl = get_ifunc_for_version (fn);
+      gcc_assert (ifunc_decl != NULL);
+      return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl,
+					nargs, argarray);
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -8067,6 +8090,60 @@  joust (struct z_candidate *cand1, struct z_candida
   size_t i;
   size_t len;
 
+  /* For candidates of a multi-versioned function, first check if the
+     target flags of the caller match any of the candidates.  If so,
+     the caller can call this candidate directly; otherwise the one marked
+     default wins.  This is because the default decl is used as key to
+     aggregate all the other versions provided for it in multiversion.c.
+     When generating the actual call, the appropriate dispatcher is created
+     to call the right function version at run-time.  */
+
+  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
+       && DECL_FUNCTION_VERSIONED (cand1->fn))
+      ||(TREE_CODE (cand2->fn) == FUNCTION_DECL
+	 && DECL_FUNCTION_VERSIONED (cand2->fn)))
+    {
+      /* Both functions must be marked versioned.  */
+      gcc_assert (DECL_FUNCTION_VERSIONED (cand1->fn)
+		  && DECL_FUNCTION_VERSIONED (cand2->fn));
+
+      /* Try to see if a direct call can be made to a version.  This is
+	 possible if the caller and callee have the same target flags.
+	 If cand->fn is marked with target attributes,  check if the
+	 target approves inlining this into the caller.  If so, this is
+	 the version we want.  */
+
+      if (is_target_attribute_set (cand1->fn)
+	  && targetm.target_option.can_inline_p (current_function_decl,
+						 cand1->fn))
+	return 1;
+
+      if (is_target_attribute_set (cand2->fn)
+	  && targetm.target_option.can_inline_p (current_function_decl,
+						 cand2->fn))
+	return -1;
+
+      /* A direct call to a version is not possible, so find the default
+	 function and return it.  This will later be converted to dispatch
+	 the right version at run time.  */
+
+      if (is_default_function (cand1->fn))
+	{
+          mark_used (cand2->fn);
+	  return 1;
+	}
+
+      if (is_default_function (cand2->fn))
+	{
+          mark_used (cand1->fn);
+	  return -1;
+	}
+
+      /* If a default function is absent, this will never get resolved leading
+	 to an ambiguous call error.  */
+      return 0;
+    }
+
   /* Candidates that involve bad conversions are always worse than those
      that don't.  */
   if (cand1->viable > cand2->viable)
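
The ordering that the joust hunk above applies to two versions of the
same function can be modelled outside the compiler.  The following is a
minimal sketch, not GCC code: struct version and joust_versions are
hypothetical names, and the three boolean fields merely stand in for
is_target_attribute_set, targetm.target_option.can_inline_p and
is_default_function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one candidate version of a function.  */
struct version
{
  const char *name;        /* Label used only for illustration.  */
  bool has_target_attr;    /* Models is_target_attribute_set ().  */
  bool caller_compatible;  /* Models target_option.can_inline_p ().  */
  bool is_default;         /* Models is_default_function ().  */
};

/* Return 1 if V1 is preferred, -1 if V2 is preferred, and 0 if neither
   can be preferred (no direct call possible and no default present).  */
static int
joust_versions (const struct version *v1, const struct version *v2)
{
  assert (v1 != NULL && v2 != NULL);

  /* A version whose target flags suit the caller can be called
     directly, so it wins outright.  */
  if (v1->has_target_attr && v1->caller_compatible)
    return 1;
  if (v2->has_target_attr && v2->caller_compatible)
    return -1;

  /* Otherwise the default wins; the call is later routed through the
     dispatcher.  */
  if (v1->is_default)
    return 1;
  if (v2->is_default)
    return -1;

  return 0;
}
```

With a caller-compatible version and a default, the compatible version
wins regardless of argument order.  When neither version is compatible
and neither is the default, the result is 0, which corresponds to the
ambiguous call error mentioned in the comment.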
Index: gcc/timevar.def
===================================================================
--- gcc/timevar.def	(revision 186883)
+++ gcc/timevar.def	(working copy)
@@ -253,6 +253,7 @@  DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
 DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
 DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
 DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
+DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
 
 /* Everything else in rest_of_compilation not included above.  */
 DEFTIMEVAR (TV_EARLY_LOCAL	     , "early local passes")
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 186883)
+++ gcc/Makefile.in	(working copy)
@@ -1294,6 +1294,7 @@  OBJS = \
 	mcf.o \
 	mode-switching.o \
 	modulo-sched.o \
+	multiversion.o \
 	omega.o \
 	omp-low.o \
 	optabs.o \
@@ -3030,6 +3031,11 @@  ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
+multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
+   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
+   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
+   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
 cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 186883)
+++ gcc/passes.c	(working copy)
@@ -1287,6 +1287,7 @@  init_optimization_passes (void)
   NEXT_PASS (pass_build_cfg);
   NEXT_PASS (pass_warn_function_return);
   NEXT_PASS (pass_build_cgraph_edges);
+  NEXT_PASS (pass_dispatch_versions);
   *p = NULL;
 
   /* Interprocedural optimization passes.  */
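
For orientation, the dispatcher that this pass asks the target hook to
build (see ix86_dispatch_version in the i386.c part of the patch) has
roughly the shape below when written by hand.  This is a sketch under
assumptions, not output of the patch: foo_default, foo_sse42,
foo_corei7_avx and foo_resolver are hypothetical names; only
__builtin_cpu_init, __builtin_cpu_is and __builtin_cpu_supports are the
real GCC x86 builtins.  Each predicate is normalized to 0 or 1 before
being combined, which is a choice of this sketch, and on non-x86 targets
it simply falls back to the default.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*foo_fn) (void);

/* Hypothetical versions of "foo".  In the patch these would be separate
   decls distinguished by their "target" attributes.  */
static int foo_default (void) { return 0; }
static int foo_sse42 (void) { return 1; }
static int foo_corei7_avx (void) { return 2; }

/* Hand-written model of a dispatcher: test each version's predicates in
   turn and return the first version whose predicates all hold, falling
   back to the default at the end.  */
static foo_fn
foo_resolver (void)
{
  int is_corei7_avx = 0;
  int has_sse42 = 0;

#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  /* Populate the CPU model data before querying it.  */
  __builtin_cpu_init ();

  /* A version with two predicates: both must hold.  */
  is_corei7_avx = (__builtin_cpu_is ("corei7") != 0)
		  && (__builtin_cpu_supports ("avx") != 0);

  /* A version with a single ISA predicate.  */
  has_sse42 = __builtin_cpu_supports ("sse4.2") != 0;
#endif

  if (is_corei7_avx)
    return foo_corei7_avx;
  if (has_sse42)
    return foo_sse42;

  /* Default version is dispatched at the end, unconditionally.  */
  return foo_default;
}
```

Calling foo_resolver () yields a pointer to one of the three functions
depending on the CPU the program runs on; the choice is stable across
calls on the same machine.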
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 186883)
+++ gcc/config/i386/i386.c	(working copy)
@@ -27678,6 +27678,324 @@  ix86_init_mmx_sse_builtins (void)
     }
 }
 
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree old_current_function_decl;
+  tree predicate_decl, predicate_arg;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+  current_function_decl = function_decl;
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      current_function_decl = old_current_function_decl;
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  assign_stmt = gimple_build_assign_with_ops (BIT_AND_EXPR,
+						      and_expr_var,
+						      cond_var, and_expr_var);
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  rebuild_cgraph_edges ();
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and returns the
+   chain of predicate builtins that match the platform specification.
+   An "arch=" argument and any number of ISA feature names are handled.  */
+
+static tree 
+get_builtin_code_for_version (tree decl)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+  const char *feature_list[] = {"mmx", "popcnt", "sse", "sse2", "sse3",
+				"ssse3", "sse4.1", "sse4.2", "avx", "avx2"};
+  unsigned int NUM_FEATURES = sizeof (feature_list) / sizeof (const char *);
+  unsigned int i;
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+  /* Handle arch= if specified.  */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+      if (arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+	    	"No dispatcher found for the versioning attributes");
+	  return NULL;
+	}
+    
+      predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+      /* For a C string literal the length includes the trailing NUL.  */
+      predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+      predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				   predicate_chain);
+    }
+
+  /* Process feature name.  */
+  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
+  strcpy (tok_str, attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch="  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i]) == 0)
+	    {
+	      predicate_arg = build_string_literal (
+				strlen (feature_list[i]) + 1,
+				feature_list[i]);
+	      predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					   predicate_chain);
+	      break;
+	    }
+	}
+      if (i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "No dispatcher found for %s", token);
+	  return NULL;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "No dispatcher found for the versioning attributes : %s",
+	        attrs_str);
+      return NULL;
+    }
+
+  predicate_chain = nreverse (predicate_chain);
+  return predicate_chain; 
+} 
+
+/* This is the target hook to generate the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS are the function choices for
+   dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer
+   in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+ix86_dispatch_version (tree dispatch_decl,
+		       void *fndecls_p,
+		       basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  tree old_current_function_decl;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = (VEC (tree, heap) *)fndecls_p;
+
+  /* At least one more version other than the default.  */
+  gcc_assert (VEC_length (tree, fndecls) >= 2);
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+  current_function_decl = dispatch_decl;
+
+  gseq = bb_seq (*empty_bb);
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      predicate_chain = get_builtin_code_for_version (version_decl);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
+      *empty_bb = add_condition_to_bb (dispatch_decl, version_decl,
+				       predicate_chain, *empty_bb);
+
+    }
+  /* Dispatch the default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+  return 0;
+}
+
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/i386-cpuinfo.c  */
 
@@ -39463,6 +39781,9 @@  ix86_autovectorize_vector_sizes (void)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_DISPATCH_VERSION
+#define TARGET_DISPATCH_VERSION ix86_dispatch_version
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list