Patchwork [google] Patch to support calling multi-versioned functions via new GCC builtin. (issue4440078)

login
register
mail settings
Submitter Sriraman Tallam
Date May 4, 2011, 7:35 p.m.
Message ID <BANLkTinSxAvNSpFahLAmsF+xML1xk2=zfg@mail.gmail.com>
Download mbox | patch
Permalink /patch/94122/
State New
Headers show

Comments

Sriraman Tallam - May 4, 2011, 7:35 p.m.
Hi,

   I had reverted this patch from google/main because it failed to
boot-strap with java enabled. I have fixed that bug. I have attached
the new patch. The only changes I made from the old version are in
file
gcc/mversn-dispatch.c :

=============================================================================================
347,349d346
<   cfun->curr_properties |=
<     (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
<      PROP_ssa);
865c862
<   e = make_edge (new_bb, EXIT_BLOCK_PTR, 0);
---
>   e = make_edge (new_bb, EXIT_BLOCK_PTR, EDGE_FALLTHRU);
1507c1504
<       gimple_call_set_lhs (feature_test_call, ssa_result_var);
---
>       gimple_assign_set_lhs (feature_test_call, ssa_result_var);
1558c1555
<       init_stmt = gimple_build_assign (tmp_var, build_zero_cst (return_type));
---
>       init_stmt = gimple_build_assign (tmp_var, integer_zero_node);
1708c1705
< 	  if (strstr (IDENTIFIER_POINTER (DECL_NAME (call_decl)),
---
>           if (strstr (cgraph_node_name (cgraph_node (call_decl)),
==============================================================================================

I now checked that it boot-straps with Java. All tests pass. Is this
ok to submit to google/main ?

	* tree-pass.h (pass_tree_convert_builtin_dispatch): New pass.
	(pass_ipa_multiversion_dispatch): New pass.
	* builtin-types.def (BT_PTR_FN_INT): New pointer type.
	(BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR): New function type for __builtin_dispatch.
	* builtins.def (BUILT_IN_DISPATCH): New builtin to
	support multi-version calls.
	* mversn-dispatch.c: New file.
	* timevar.de (TV_MVERSN_DISPATCH): New time var.
	* common.opt (fclone-hot-version-paths): New flag.
	* Makefile.in (mversn-dispatch.o): New rule.
	* passes.c (init_optimization_passes): Add the new
	multi-version and dispatch passes to the pass list.
	* params.def (PARAM_NUMBER_OF_MVERSN_CLONES): Define.
	(PARAM_MVERSN_CLONE_CGRAPH_DEPTH): Define.
	* doc/invoke.texi (mversn-clone-depth): Document.
	(num-mversn-clones): Document.
	(fclone-hot-version-paths): Document.
	* testsuite/gcc.dg/mversn7.c: New test.
	* testsuite/gcc.dg/mversn4.c: New test.
	* testsuite/gcc.dg/mversn4.h: New test.
	* testsuite/gcc.dg/mversn4a.c: New test.
	* testsuite/gcc.dg/torture/mversn1.c: New test.
	* testsuite/gcc.dg/mversn2.c: New test.
	* testsuite/gcc.dg/mversn6.c: New test.
	* testsuite/gcc.dg/mversn3.c: New test.
	* testsuite/g++.dg/mversn8.C: New test.
	* testsuite/g++.dg/mversn10a.C: New test.
	* testsuite/g++.dg/mversn14a.C: New test.
	* testsuite/g++.dg/tree-prof/mversn13.C: New test.
	* testsuite/g++.dg/tree-prof/mversn15.C: New test.
	* testsuite/g++.dg/tree-prof/mversn15a.C: New test.
	* testsuite/g++.dg/mversn9.C: New test.
	* testsuite/g++.dg/mversn10.C: New test.
	* testsuite/g++.dg/mversn12.C: New test.
	* testsuite/g++.dg/mversn14.C: New test.
	* testsuite/g++.dg/mversn16.C: New test.
	* testsuite/g++.dg/torture/mversn11.C: New test.
	* testsuite/g++.dg/torture/mversn5.C: New test.
	* testsuite/g++.dg/torture/mversn5.h: New test.
	* testsuite/g++.dg/torture/mversn5a.C: New test.
	* c-family/c-common.c (handle_version_selector_attribute): New function.
	(c_common_attribute_table): New attribute "version_selector".

Thanks,
-Sri.


On Mon, May 2, 2011 at 1:38 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Thanks. I committed this patch.
>
> -Sri.
>
>
> On Mon, May 2, 2011 at 12:32 PM, Xinliang David Li <davidxl@google.com> wrote:
>> Ok. There may be more correctness related comments -- but those can be
>> addressed when available. For trunk, you need to address issues such
>> as multi-way dispatch.
>>
>> Thanks,
>>
>> David
>>
>> On Mon, May 2, 2011 at 12:11 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Hi,
>>>
>>>  I want to submit this patch to google/main to make sure it is
>>> available for our internal use at Google in order to materialize some
>>> optimization opportunities. Let us continue this dicussion as I make
>>> changes and submit this for review for trunk.
>>>
>>> Thanks,
>>> -Sri.
>>>
>>>
>>> On Mon, May 2, 2011 at 9:41 AM, Xinliang David Li <davidxl@google.com> wrote:
>>>> On Mon, May 2, 2011 at 2:11 AM, Richard Guenther
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Fri, Apr 29, 2011 at 6:23 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>> Here is the background for this feature:
>>>>>>
>>>>>> 1) People relies on function multi-version to explore hw features and
>>>>>> squeeze performance, but there is no standard ways of doing so, either
>>>>>> a) using indirect function calls with function pointers set at program
>>>>>> initialization; b) using manual dispatch at each callsite; b) using
>>>>>> features like IFUNC.  The dispatch mechanism needs to be promoted to
>>>>>> the language level and becomes the first class citizen;
>>>>>
>>>>> You are not doing that, you are inventing a new (crude) GCC extension.
>>>>
>>>> To capture the high level semantics and prevent user from lowering the
>>>> dispatch calls into forms compiler can not recognize, language
>>>> extension is the way to go.
>>>>
>>>>>
>>>>
>>>>>> See above. Multi-way dispatch can be added similarly.
>>>>>
>>>>> Not with the specified syntax.  So you'd end up with _two_
>>>>> language extensions.  That's bad (and unacceptable, IMHO).
>>>>
>>>> This part can be improved.
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> That's a nice idea, but why not simply process all functions with
>>>>>>> a const attribute and no arguments this way?  IMHO
>>>>>>>
>>>>>>> int testsse (void) __attribute__((const));
>>>>>>>
>>>>>>> is as good and avoids the new attribute completely.  The pass would
>>>>>>> also benefit other uses of this idiom (giving a way to have global
>>>>>>> dynamic initializers in C).  The functions may not be strictly 'const'
>>>>>>> in a way the compiler autodetects this attribute but it presents
>>>>>>> exactly the promises to the compiler that allow this optimization.
>>>>>>>
>>>>>>> Thus, please split this piece out into a separate patch and use
>>>>>>> the const attribute.
>>>>>>
>>>>>>
>>>>>> Sounds good.
>>>>>>
>>>>
>>>>>>
>>>>>> 1) the function selection may happen in a different function;
>>>>>
>>>>> Well, sure.  I propose to have the above as lowered form.  If people
>>>>> deliberately obfuscate code it's their fault.  Your scheme simply
>>>>> makes that obfuscation impossible (or less likely possible).  So
>>>>> 1) is not a valid argument.
>>>>
>>>> Lowering too early may defeat the purpose because
>>>>
>>>> 1) the desired optimization may not happen subject to many compiler
>>>> heuristic changes;
>>>> 2) it has other side effects such as wrong estimation of function size
>>>> which may impact inlining
>>>> 3) it limits the lowering into one form which may not be ideal  --
>>>> with builtin_dispatch, after hoisting optimization, the lowering can
>>>> use more efficient IFUNC scheme, for instance.
>>>>
>>>>>
>>>>>
>>>>> My point is that such optimization is completely independent of
>>>>> that dispatch thing.  The above could as well be a selection to
>>>>> use different input arrays, one causing alias analysis issues
>>>>> and one not.
>>>>>
>>>>> Thus, a __builtin_dispatch centric optimization pass is the wrong
>>>>> way to go.
>>>>
>>>> I agree that many things can implemented in different ways, but a high
>>>> level standard builtin_dispatch mechanism doing hoisting
>>>> interprocedcurally is cleaner and simpler and targeted for function
>>>> multi-versioning -- it does not depend on/rely on later phase's
>>>> heuristic tunings to make the right things to happen. Function MV
>>>> deserves this level of treatment as it will become more and more
>>>> important for some users (e.g., Google).
>>>>
>>>>>
>>>>> Now, with FDO I'd expect the foo is inlined into bar (because foo
>>>>> is deemed hot),
>>>>
>>>> That is a myth -- the truth is that there are other heuristics which
>>>> can prevent this from happening.
>>>>
>>>>> then you only need to deal with loop unswitching,
>>>>> which should be easy to drive from FDO.
>>>>
>>>> Same here -- the loop body may not be well formed/recognized. The loop
>>>> nests may not be perfectly nested, or other unswitching heuristics may
>>>> block it from happening.  This is the common problem form many other
>>>> things that get lowered too early. It is cleaner to make the high
>>>> level transformation first in IPA, and let unswitching dealing with
>>>> intra-procedural optimization.
>>>>
>>>> Thanks,
>>>>
>>>> David
>>>>
>>>>>
>>>>> Richard.
>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Note that we already have special FDO support for indirect to direct
>>>>>>> call promotion, so that might work already.
>>>>>>>
>>>>>>> Thus, with all this, __builtin_dispatch would be more like syntactic
>>>>>>> sugar to avoid writing above switch statement yourself.  If you restrict
>>>>>>> that sugar to a constant number of candidates it can be replaced by
>>>>>>> a macro (or several ones, specialized for the number of candidates).
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>>  For the call graph that looks like this before cloning :
>>>>>>>>
>>>>>>>> func_cold ----> func_hot ----> findOnes ----> IsPopCntSupported ----> popcnt
>>>>>>>>                                                |
>>>>>>>>                                                -----> no_popcnt
>>>>>>>>
>>>>>>>>  where popcnt and no_popcnt are the multi-versioned functions, then after cloning :
>>>>>>>>
>>>>>>>> func_cold ---->IsPopCntSupported ----> func_hot_clone0 ----> findOnes_clone0 ----> popcnt
>>>>>>>>                              |
>>>>>>>>                               ----> func_hot_clone1 ----> findOnes_clone1 ----> no_popcnt
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  Flags : -fclone-hot-version-paths does function unswitching via cloning.
>>>>>>>>          --param=num-mversn-clones=<num> allows to specify the number of
>>>>>>>>          functions that should be cloned.
>>>>>>>>          --param=mversn-clone-depth=<num> allows to specify the length of
>>>>>>>>          the call graph path that should be cloned.  num = 0 implies only
>>>>>>>>          leaf node that contains the __builtin_dispatch statement must be
>>>>>>>>          cloned.
>>>>>>>> ============================================================================================
>>>>>>>>
>>>>>>>> Google ref 45947, 45986
>>>>>>>>
>>>>>>>> 2011-04-28  Sriraman Tallam  <tmsriram@google.com>
>>>>>>>>
>>>>>>>>        * c-family/c-common.c   (revision 173122) (handle_version_selector_attribute): New function.
>>>>>>>>        (c_common_attribute_table): New attribute "version_selector".
>>>>>>>>        * tree-pass.h   (revision 173122) (pass_tree_convert_builtin_dispatch): New pass.
>>>>>>>>        (pass_ipa_multiversion_dispatch): New pass.
>>>>>>>>        * testsuite/gcc.dg/mversn7.c    (revision 0): New test case.
>>>>>>>>        * testsuite/gcc.dg/mversn4.c    (revision 0): New test case.
>>>>>>>>        * testsuite/gcc.dg/mversn4.h    (revision 0): New test case.
>>>>>>>>        * testsuite/gcc.dg/mversn4a.c   (revision 0): New test case.
>>>>>>>>        * testsuite/gcc.dg/torture/mversn1.c    (revision 0): New test case.
>>>>>>>>        * testsuite/gcc.dg/mversn2.c    (revision 0): New test case.
>>>>>>>>        * testsuite/gcc.dg/mversn6.c    (revision 0): New test case.
>>>>>>>>        * testsuite/gcc.dg/mversn3.c    (revision 0): New test case.
>>>>>>>>        * testsuite/g++.dg/mversn8.C    (revision 0): New test case.
>>>>>>>>        * testsuite/g++.dg/mversn10a.C  (revision 0): New test case.
>>>>>>>>        * testsuite/g++.dg/mversn14a.C  (revision 0): New test case.
>>>>>>>>        * testsuite/g++.dg/tree-prof/mversn13.C (revision 0): New test case.
>>>>>>>>        * testsuite/g++.dg/tree-prof/mversn15.C (revision 0): New test case.
>>>>>>>>        * testsuite/g++.dg/tree-prof/mversn15a.C        (revision 0): New test case.
>>>>>>>>        * testsuite/g++.dg/mversn9.C    (revision 0): New test case.
>>>>>>>>        * testsuite/g++.dg/mversn10.C   (revision 0): New test case.
>>>>>>>>        * testsuite/g++.dg/mversn12.C   (revision 0): New test case.
>>>>>>>>        * testsuite/g++.dg/mversn14.C   (revision 0): New test case.
>>>>>>>>        * testsuite/g++.dg/mversn16.C   (revision 0): New test case.
>>>>>>>>        * testsuite/g++.dg/torture/mversn11.C   (revision 0): New test case.
>>>>>>>>        * testsuite/g++.dg/torture/mversn5.C    (revision 0): New test case.
>>>>>>>>        * testsuite/g++.dg/torture/mversn5.h    (revision 0): New test case.
>>>>>>>>        * testsuite/g++.dg/torture/mversn5a.C   (revision 0): New test case.
>>>>>>>>        * builtin-types.def     (revision 173122) (BT_PTR_FN_INT): New pointer type.
>>>>>>>>        (BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR): New function type for __builtin_dispatch.
>>>>>>>>        * builtins.def  (revision 173122) (BUILT_IN_DISPATCH): New builtin to
>>>>>>>>        support multi-version calls.
>>>>>>>>        * mversn-dispatch.c     (revision 0): New file.
>>>>>>>>        * timevar.def   (revision 173122) (TV_MVERSN_DISPATCH): New time var.
>>>>>>>>        * common.opt    (revision 173122) (fclone-hot-version-paths): New flag.
>>>>>>>>        * Makefile.in   (revision 173122) (mversn-dispatch.o): New rule.
>>>>>>>>        * passes.c      (revision 173122) (init_optimization_passes): Add the new
>>>>>>>>        multi-version and dispatch passes to the pass list.
>>>>>>>>        * params.def    (revision 173122) (PARAM_NUMBER_OF_MVERSN_CLONES): Define.
>>>>>>>>        (PARAM_MVERSN_CLONE_CGRAPH_DEPTH): Define.
>>>>>>>>        * doc/invoke.texi       (revision 173122) (mversn-clone-depth): Document.
>>>>>>>>        (num-mversn-clones): Document.
>>>>>>>>        (fclone-hot-version-paths): Document.
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
Diego Novillo - May 4, 2011, 10:13 p.m.
On Wed, May 4, 2011 at 15:35, Sriraman Tallam <tmsriram@google.com> wrote:

>        * tree-pass.h (pass_tree_convert_builtin_dispatch): New pass.
>        (pass_ipa_multiversion_dispatch): New pass.
>        * builtin-types.def (BT_PTR_FN_INT): New pointer type.
>        (BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR): New function type for __builtin_dispatch.
>        * builtins.def (BUILT_IN_DISPATCH): New builtin to
>        support multi-version calls.
>        * mversn-dispatch.c: New file.
>        * timevar.de (TV_MVERSN_DISPATCH): New time var.
>        * common.opt (fclone-hot-version-paths): New flag.
>        * Makefile.in (mversn-dispatch.o): New rule.
>        * passes.c (init_optimization_passes): Add the new
>        multi-version and dispatch passes to the pass list.
>        * params.def (PARAM_NUMBER_OF_MVERSN_CLONES): Define.
>        (PARAM_MVERSN_CLONE_CGRAPH_DEPTH): Define.
>        * doc/invoke.texi (mversn-clone-depth): Document.
>        (num-mversn-clones): Document.
>        (fclone-hot-version-paths): Document.
>        * testsuite/gcc.dg/mversn7.c: New test.
>        * testsuite/gcc.dg/mversn4.c: New test.
>        * testsuite/gcc.dg/mversn4.h: New test.
>        * testsuite/gcc.dg/mversn4a.c: New test.
>        * testsuite/gcc.dg/torture/mversn1.c: New test.
>        * testsuite/gcc.dg/mversn2.c: New test.
>        * testsuite/gcc.dg/mversn6.c: New test.
>        * testsuite/gcc.dg/mversn3.c: New test.
>        * testsuite/g++.dg/mversn8.C: New test.
>        * testsuite/g++.dg/mversn10a.C: New test.
>        * testsuite/g++.dg/mversn14a.C: New test.
>        * testsuite/g++.dg/tree-prof/mversn13.C: New test.
>        * testsuite/g++.dg/tree-prof/mversn15.C: New test.
>        * testsuite/g++.dg/tree-prof/mversn15a.C: New test.
>        * testsuite/g++.dg/mversn9.C: New test.
>        * testsuite/g++.dg/mversn10.C: New test.
>        * testsuite/g++.dg/mversn12.C: New test.
>        * testsuite/g++.dg/mversn14.C: New test.
>        * testsuite/g++.dg/mversn16.C: New test.
>        * testsuite/g++.dg/torture/mversn11.C: New test.
>        * testsuite/g++.dg/torture/mversn5.C: New test.
>        * testsuite/g++.dg/torture/mversn5.h: New test.
>        * testsuite/g++.dg/torture/mversn5a.C: New test.
>        * c-family/c-common.c (handle_version_selector_attribute): New function.
>        (c_common_attribute_table): New attribute "version_selector".

OK.  Thanks for the quick fix!


Diego.
Sriraman Tallam - May 4, 2011, 10:16 p.m.
I submitted the patch.

Thanks,
-Sri.

On Wed, May 4, 2011 at 3:13 PM, Diego Novillo <dnovillo@google.com> wrote:
> On Wed, May 4, 2011 at 15:35, Sriraman Tallam <tmsriram@google.com> wrote:
>
>>        * tree-pass.h (pass_tree_convert_builtin_dispatch): New pass.
>>        (pass_ipa_multiversion_dispatch): New pass.
>>        * builtin-types.def (BT_PTR_FN_INT): New pointer type.
>>        (BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR): New function type for __builtin_dispatch.
>>        * builtins.def (BUILT_IN_DISPATCH): New builtin to
>>        support multi-version calls.
>>        * mversn-dispatch.c: New file.
>>        * timevar.de (TV_MVERSN_DISPATCH): New time var.
>>        * common.opt (fclone-hot-version-paths): New flag.
>>        * Makefile.in (mversn-dispatch.o): New rule.
>>        * passes.c (init_optimization_passes): Add the new
>>        multi-version and dispatch passes to the pass list.
>>        * params.def (PARAM_NUMBER_OF_MVERSN_CLONES): Define.
>>        (PARAM_MVERSN_CLONE_CGRAPH_DEPTH): Define.
>>        * doc/invoke.texi (mversn-clone-depth): Document.
>>        (num-mversn-clones): Document.
>>        (fclone-hot-version-paths): Document.
>>        * testsuite/gcc.dg/mversn7.c: New test.
>>        * testsuite/gcc.dg/mversn4.c: New test.
>>        * testsuite/gcc.dg/mversn4.h: New test.
>>        * testsuite/gcc.dg/mversn4a.c: New test.
>>        * testsuite/gcc.dg/torture/mversn1.c: New test.
>>        * testsuite/gcc.dg/mversn2.c: New test.
>>        * testsuite/gcc.dg/mversn6.c: New test.
>>        * testsuite/gcc.dg/mversn3.c: New test.
>>        * testsuite/g++.dg/mversn8.C: New test.
>>        * testsuite/g++.dg/mversn10a.C: New test.
>>        * testsuite/g++.dg/mversn14a.C: New test.
>>        * testsuite/g++.dg/tree-prof/mversn13.C: New test.
>>        * testsuite/g++.dg/tree-prof/mversn15.C: New test.
>>        * testsuite/g++.dg/tree-prof/mversn15a.C: New test.
>>        * testsuite/g++.dg/mversn9.C: New test.
>>        * testsuite/g++.dg/mversn10.C: New test.
>>        * testsuite/g++.dg/mversn12.C: New test.
>>        * testsuite/g++.dg/mversn14.C: New test.
>>        * testsuite/g++.dg/mversn16.C: New test.
>>        * testsuite/g++.dg/torture/mversn11.C: New test.
>>        * testsuite/g++.dg/torture/mversn5.C: New test.
>>        * testsuite/g++.dg/torture/mversn5.h: New test.
>>        * testsuite/g++.dg/torture/mversn5a.C: New test.
>>        * c-family/c-common.c (handle_version_selector_attribute): New function.
>>        (c_common_attribute_table): New attribute "version_selector".
>
> OK.  Thanks for the quick fix!
>
>
> Diego.
>

Patch

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 173381)
+++ doc/invoke.texi	(working copy)
@@ -343,7 +343,8 @@  Objective-C and Objective-C++ Dialects}.
 -falign-labels[=@var{n}] -falign-loops[=@var{n}] -fassociative-math @gol
 -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol
 -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol
--fcheck-data-deps -fcombine-stack-adjustments -fconserve-stack @gol
+-fcheck-data-deps -fclone-hot-version-paths @gol
+-fcombine-stack-adjustments -fconserve-stack @gol
 -fcompare-elim -fcprop-registers -fcrossjumping @gol
 -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
 -fcx-limited-range @gol
@@ -8283,6 +8284,10 @@  used in one place: in @file{reorg.c}, instead of g
 branch is most likely to take, the @samp{REG_BR_PROB} values are used to
 exactly determine which path is taken more often.
 
+@item -fclone-hot-version-paths
+When multi-version calls are made using @samp{__builtin_dispatch}, this flag
+enables cloning and hoisting of hot multiversioned paths.
+
 @item -fprofile-values
 @opindex fprofile-values
 If combined with @option{-fprofile-arcs}, it adds code so that some
@@ -8544,6 +8549,16 @@  by the compiler will be investigated.  To those fu
 be applied.
 The default value is 40.
 
+@item mversn-clone-depth
+When using @option{-fclone-hot-version-paths}, hot function paths are multi-
+versioned via cloning.  This parameter specifies the maximum length of the
+call graph path that can be cloned. The default value is 2.
+
+@item num-mversn-clones
+When using @option{-fclone-hot-version-paths}, hot function paths are multi-
+versioned via cloning.  This parameter specifies the maximum number of
+functions that can be cloned. The default value is 10.
+
 @item large-function-insns
 The limit specifying really large functions.  For functions larger than this
 limit after inlining, inlining is constrained by
Index: c-family/c-common.c
===================================================================
--- c-family/c-common.c	(revision 173381)
+++ c-family/c-common.c	(working copy)
@@ -315,6 +315,7 @@  static tree check_case_value (tree);
 static bool check_case_bounds (tree, tree, tree *, tree *);
 
 static tree handle_packed_attribute (tree *, tree, tree, int, bool *);
+static tree handle_version_selector_attribute (tree *, tree, tree, int, bool *);
 static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *);
 static tree handle_common_attribute (tree *, tree, tree, int, bool *);
 static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
@@ -603,6 +604,8 @@  const unsigned int num_c_common_reswords =
 const struct attribute_spec c_common_attribute_table[] =
 {
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler } */
+  { "version_selector",	      0, 0, true, false, false,
+			      handle_version_selector_attribute },
   { "packed",                 0, 0, false, false, false,
 			      handle_packed_attribute },
   { "nocommon",               0, 0, true,  false, false,
@@ -5786,6 +5789,37 @@  handle_packed_attribute (tree *node, tree name, tr
   return NULL_TREE;
 }
 
+/* Handle a "version_selector attribute".
+   Functions are marked with attribute "version_selector" only if
+   they are run-time constants.  Example of such functions would
+   be those that test if a particular feature is available on a
+   particular architecture.  Such function must return a positive
+   integer. For two-way functions, those that test if a feature
+   is present or not must return 1 or 0 respectively. */
+
+static tree
+handle_version_selector_attribute (tree *node, tree name,
+ 				   tree ARG_UNUSED (args),
+ 				   int ARG_UNUSED (flags),
+				   bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) == FUNCTION_DECL)
+    {
+      /* Check that the return type is integer. */
+      gcc_assert (TREE_CODE (TREE_TYPE (TREE_TYPE (*node)))
+                  == INTEGER_TYPE);
+      if (dump_file)
+        fprintf (dump_file, "%s is a version_selector function\n",
+   	         IDENTIFIER_POINTER (DECL_NAME (*node)));
+    }
+  else
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored", name);
+      *no_add_attrs = true;
+    }
+  return NULL_TREE;
+}
+
 /* Handle a "nocommon" attribute; arguments as in
    struct attribute_spec.handler.  */
 
Index: tree-pass.h
===================================================================
--- tree-pass.h	(revision 173381)
+++ tree-pass.h	(working copy)
@@ -448,6 +448,7 @@  extern struct gimple_opt_pass pass_warn_unused_res
 extern struct gimple_opt_pass pass_split_functions;
 extern struct gimple_opt_pass pass_feedback_split_functions;
 extern struct gimple_opt_pass pass_threadsafe_analyze;
+extern struct gimple_opt_pass pass_tree_convert_builtin_dispatch;
 
 /* IPA Passes */
 extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
@@ -472,6 +473,7 @@  extern struct ipa_opt_pass_d pass_ipa_lto_wpa_fixu
 extern struct ipa_opt_pass_d pass_ipa_lto_finish_out;
 extern struct ipa_opt_pass_d pass_ipa_profile;
 extern struct ipa_opt_pass_d pass_ipa_cdtor_merge;
+extern struct simple_ipa_opt_pass pass_ipa_multiversion_dispatch;
 
 extern struct gimple_opt_pass pass_all_optimizations;
 extern struct gimple_opt_pass pass_cleanup_cfg_post_optimizing;
Index: testsuite/gcc.dg/mversn7.c
===================================================================
--- testsuite/gcc.dg/mversn7.c	(revision 0)
+++ testsuite/gcc.dg/mversn7.c	(revision 0)
@@ -0,0 +1,47 @@ 
+/* This test checks if cloning and dispatching works correctly with
+   a motivating example. The function problem calculates the sum of
+   numbers from 1 to 10 in two different ways.  Hence, after cloning
+   both clones should return 55, which means main returns 0 if function
+   "problem" gets inlined.
+   This example also shows the benefits of function
+   unswitching.  Without cloning, the loop will be done. */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -fclone-hot-version-paths -fdump-tree-optimized" } */
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+  return 1;
+}
+
+int foo (int i)
+{
+  return i;
+}
+
+int bar (int i)
+{
+  return (11 - i);  
+}
+
+/* This calculates the sum of numbers from 1 to 10 in 2 different ways. */
+int __attribute__ ((hot))
+problem ()
+{
+  int ret = 0;
+  int j = 1;
+  for (j = 1; j<=10; j++)
+    ret += __builtin_dispatch (featureTest, (void *)foo, (void *)bar, j);
+  return ret;
+}
+
+int __attribute__ ((hot))
+main ()
+{
+  return problem() - 55;
+}
+
+
+/* { dg-final { scan-tree-dump "return 55" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/mversn4.c
===================================================================
--- testsuite/gcc.dg/mversn4.c	(revision 0)
+++ testsuite/gcc.dg/mversn4.c	(revision 0)
@@ -0,0 +1,29 @@ 
+/* Check that static feature test functions are correctly handled. This
+   test case also needs mversn4a.c.  extern_func calls foo and returns 0.*/
+   
+/* { dg-do run } */
+/* { dg-additional-sources "mversn4a.c" } */
+/* { dg-options "-O2" } */
+
+#include "mversn4.h"
+
+extern int extern_func ();
+
+int foo ()
+{
+  return 0;
+}
+
+int bar ()
+{
+  return 1;
+}
+
+int __attribute__ ((cold))
+main ()
+{
+  int a = 1, b = 1;
+  a  = extern_func ();
+  b  = __builtin_dispatch (featureTest, (void *)bar, (void *) foo);
+  return a * b;
+}
Index: testsuite/gcc.dg/mversn4.h
===================================================================
--- testsuite/gcc.dg/mversn4.h	(revision 0)
+++ testsuite/gcc.dg/mversn4.h	(revision 0)
@@ -0,0 +1,5 @@ 
+static int __attribute__ ((version_selector))
+featureTest ()
+{
+  return 1;
+}
Index: testsuite/gcc.dg/mversn4a.c
===================================================================
--- testsuite/gcc.dg/mversn4a.c	(revision 0)
+++ testsuite/gcc.dg/mversn4a.c	(revision 0)
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "mversn4.h"
+
+extern int foo ();
+extern int bar ();
+
+
+int extern_func ()
+{
+  return __builtin_dispatch (featureTest, (void *)foo, (void *) bar);
+}
Index: testsuite/gcc.dg/torture/mversn1.c
===================================================================
--- testsuite/gcc.dg/torture/mversn1.c	(revision 0)
+++ testsuite/gcc.dg/torture/mversn1.c	(revision 0)
@@ -0,0 +1,31 @@ 
+/* Check that __builtin_dispatch gets converted and the executable runs fine. 
+   Since featureTest () returns 1, foo should be called and return 0.  Cloning
+   and Hoisting is not turned on here. */
+/* { dg-do run } */
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+  return 1;
+}
+
+int __attribute__ ((cold))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((cold))
+bar ()
+{
+  return 1;
+}
+
+int __attribute__ ((cold))
+main ()
+{
+  int a, b;
+  a = __builtin_dispatch (featureTest, (void *)foo, (void *) bar);
+  b = __builtin_dispatch (featureTest, (void *)bar, (void *) foo);
+  return a * b;
+}
Index: testsuite/gcc.dg/mversn2.c
===================================================================
--- testsuite/gcc.dg/mversn2.c	(revision 0)
+++ testsuite/gcc.dg/mversn2.c	(revision 0)
@@ -0,0 +1,47 @@ 
+/* This checks if cloning works correctly.  Since dispatch and fn1 are hot, they
+   should be cloned.  main should not be cloned.*/
+
+/* { dg-do run } */
+/* { dg-options "-O2 -fclone-hot-version-paths -fdump-tree-optimized" } */
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+  return 1;
+}
+
+int __attribute__ ((cold))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((cold))
+bar ()
+{
+  return 1;
+}
+
+int __attribute__ ((hot))
+dispatch ()
+{
+  return __builtin_dispatch (featureTest, (void *)foo, (void *) bar);
+}
+int __attribute__ ((hot))
+fn1 ()
+{
+  return dispatch ();
+}
+
+int __attribute__ ((cold))
+main ()
+{
+  return fn1 ();
+}
+
+/* { dg-final { scan-tree-dump "fn1_clone_1" "optimized" } } */
+/* { dg-final { scan-tree-dump "dispatch_clone_0" "optimized" } } */
+/* { dg-final { scan-tree-dump "dispatch_clone_1" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "main_clone_0" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "main_clone_1" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/mversn6.c
===================================================================
--- testsuite/gcc.dg/mversn6.c	(revision 0)
+++ testsuite/gcc.dg/mversn6.c	(revision 0)
@@ -0,0 +1,29 @@ 
+/* Check that __builtin_dispatch gets converted and the executable runs fine. 
+   when featureTest () is not marked with "version_selector". foo should be 
+   called and return 0.  Cloning and Hoisting is also done. */
+/* { dg-do run } */
+/* { dg-options "-O2 -fclone-hot-version-paths" } */
+
+int
+featureTest ()
+{
+  return 1;
+}
+
+int __attribute__ ((cold))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((cold))
+bar ()
+{
+  return 1;
+}
+
+int __attribute__ ((cold))
+main ()
+{
+  return __builtin_dispatch (featureTest, (void *)foo, (void *) bar);
+}
Index: testsuite/gcc.dg/mversn3.c
===================================================================
--- testsuite/gcc.dg/mversn3.c	(revision 0)
+++ testsuite/gcc.dg/mversn3.c	(revision 0)
@@ -0,0 +1,30 @@ 
+/* Simple check if parameters are passed correctly. */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -fclone-hot-version-paths" } */
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+  return 1;
+}
+
+int __attribute__ ((cold))
+foo (int a)
+{
+  if (a == 1729)
+    return 0;
+  return 1;
+}
+
+int __attribute__ ((cold))
+bar (int a)
+{
+  return 1;
+}
+
+int __attribute__ ((cold))
+main ()
+{
+  return __builtin_dispatch (featureTest, (void *)foo, (void *) bar, 1729);
+}
Index: testsuite/g++.dg/mversn8.C
===================================================================
--- testsuite/g++.dg/mversn8.C	(revision 0)
+++ testsuite/g++.dg/mversn8.C	(revision 0)
@@ -0,0 +1,45 @@ 
+/* Call the caller of __builtin_dispatch indirectly.  Specify the
+   feature test function as a function pointer.  Make sure cloning
+   still happens. */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -fclone-hot-version-paths -fdump-tree-optimized" } */
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+  return 1;
+}
+
+int
+foo ()
+{
+  return 0;
+}
+
+int
+bar ()
+{
+  return 1;
+}
+
+int __attribute__ ((hot))
+dispatch ()
+{
+  int (*funcp)() = featureTest;
+  int ret = __builtin_dispatch (funcp, (void *)foo, (void *)bar);
+  return ret;
+}
+
+int __attribute__ ((hot))
+main (int argc, char **argv)
+{
+  int (*ptr)() = dispatch;
+  return (*ptr)();
+}
+
+/* { dg-final { scan-tree-dump "dispatchv_clone_0" "optimized" } } */
+/* { dg-final { scan-tree-dump "dispatchv_clone_1" "optimized" } } */
+/* { dg-final { scan-tree-dump "main_clone_0" "optimized" } } */
+/* { dg-final { scan-tree-dump "main_clone_1" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/g++.dg/mversn10a.C
===================================================================
--- testsuite/g++.dg/mversn10a.C	(revision 0)
+++ testsuite/g++.dg/mversn10a.C	(revision 0)
@@ -0,0 +1,37 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fclone-hot-version-paths -fdump-tree-optimized" } */
+
+static int __attribute__ ((version_selector))
+featureTest ()
+{
+  return 1;
+}
+
+static int foo ()
+{
+  return 0;
+}
+
+static int bar ()
+{
+  return 1;
+}
+
+static int __attribute__ ((noinline))
+dispatch ()
+{
+  __builtin_dispatch (featureTest, (void *)foo, (void *)bar);
+  return 0;
+}
+
+int
+fn2 ()
+{
+  for (int i = 0; i < 1000; i++)
+    dispatch ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "dispatchv_clone_0" "optimized" } } */
+/* { dg-final { scan-tree-dump "dispatchv_clone_1" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/g++.dg/mversn14a.C
===================================================================
--- testsuite/g++.dg/mversn14a.C	(revision 0)
+++ testsuite/g++.dg/mversn14a.C	(revision 0)
@@ -0,0 +1,8 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+  return 1;
+}
Index: testsuite/g++.dg/tree-prof/mversn13.C
===================================================================
--- testsuite/g++.dg/tree-prof/mversn13.C	(revision 0)
+++ testsuite/g++.dg/tree-prof/mversn13.C	(revision 0)
@@ -0,0 +1,37 @@ 
+/* Make sure -fprofile-generate and -fprofile-use work fine. */
+
+/* { dg-options "-O2 -fclone-hot-version-paths" } */
+
+static int glob = 0;
+int __attribute__ ((version_selector))
+featureTest ()
+{
+  return glob;
+}
+
+int bar (int i)
+{
+  if (i > 500)
+    return 2 * i;
+  return 3 * i;
+}
+
+int foo (int i)
+{
+  bar (i);
+}
+
+int
+dispatch ()
+{
+  int ret = 0;
+  for (int i = 0; i < 1000; i++)
+    ret += __builtin_dispatch (featureTest, (void *)foo, (void *)bar,  i);
+  return ret;
+}
+
+int main ()
+{
+  int val = dispatch ();
+  return val > 10000000;
+}
Index: testsuite/g++.dg/tree-prof/mversn15.C
===================================================================
--- testsuite/g++.dg/tree-prof/mversn15.C	(revision 0)
+++ testsuite/g++.dg/tree-prof/mversn15.C	(revision 0)
@@ -0,0 +1,23 @@ 
+/* Make sure LIPO works correctly. dispatch is defined in mversn15a.C. It either
+   calls foo or bar and both returns 1. So, the value of ret is always 1000.
+   After cross-module inlining, main must return 0. */
+
+/* { dg-additional-sources "mversn15a.C" } */
+/* { dg-options "-O2 -fclone-hot-version-paths -fripa -fdump-tree-optimized" } */
+
+extern int foo ();
+extern int bar ();
+extern int dispatch ();
+
+int
+main ()
+{
+  int ret = 0;
+  for (int i = 1; i <= 1000; i++)
+   ret += dispatch ();
+  return ret - 1000;
+}
+
+/* { dg-final-use { scan-tree-dump "main_clone" "optimized" } } */
+/* { dg-final-use { scan-tree-dump "return 0" "optimized" } } */
+/* { dg-final-use { cleanup-tree-dump "optimized" } } */
Index: testsuite/g++.dg/tree-prof/mversn15a.C
===================================================================
--- testsuite/g++.dg/tree-prof/mversn15a.C	(revision 0)
+++ testsuite/g++.dg/tree-prof/mversn15a.C	(revision 0)
@@ -0,0 +1,26 @@ 
+/* { dg-options "-O2 -fclone-hot-version-paths -fripa" } */
+/* { dg-additional-sources "mversn15.C" } */
+
+#include <stdio.h>
+
+inline int
+ __attribute__ ((version_selector))
+featureTest()
+{
+  return 1;
+}
+
+int foo ()
+{
+  return 1;
+}
+
+int bar ()
+{
+  return 1;
+}
+
+int dispatch ()
+{
+  return __builtin_dispatch (featureTest, (void *)foo, (void *)bar);
+}
Index: testsuite/g++.dg/mversn9.C
===================================================================
--- testsuite/g++.dg/mversn9.C	(revision 0)
+++ testsuite/g++.dg/mversn9.C	(revision 0)
@@ -0,0 +1,30 @@ 
+/* Two __builtin_dispatch calls in different basic blocks. */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -fclone-hot-version-paths" } */
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+  return 1;
+}
+
+int
+foo ()
+{
+  return 0;
+}
+
+int
+bar ()
+{
+  return 1;
+}
+
+int main (int argc, char **argv)
+{
+  if (argc)
+   return __builtin_dispatch (featureTest, (void *)foo, (void *)bar);
+  else
+   return (__builtin_dispatch (featureTest, (void *)bar, (void *)foo) - 1);
+}
Index: testsuite/g++.dg/mversn10.C
===================================================================
--- testsuite/g++.dg/mversn10.C	(revision 0)
+++ testsuite/g++.dg/mversn10.C	(revision 0)
@@ -0,0 +1,54 @@ 
+/* The purpose of this test is to check if the attributes of the clone
+   correctly shadow the parent function.  In this case, function "dispatch"
+   is cloned but is a static function.  The other file, "mversn10a.C" also
+   has a dispatch function that is cloned.  So, if the attributes of the
+   clone are not correct the linker will complain. */
+
+/* { dg-do run } */
+/* { dg-additional-sources "mversn10a.C" } */
+/* { dg-options "-O2 -fclone-hot-version-paths -fdump-tree-optimized" } */
+
+static int __attribute__ ((version_selector))
+featureTest ()
+{
+  return 1;
+}
+
+static int foo ()
+{
+  return 0;
+}
+
+static int bar ()
+{
+  return 1;
+}
+
+static int __attribute__ ((noinline))
+dispatch ()
+{
+  __builtin_dispatch (featureTest, (void *)foo, (void *)bar);
+  return 0;
+}
+
+int
+fn1 ()
+{
+  for (int i = 0; i < 1000; i++)
+    dispatch ();
+  return 0;
+}
+
+extern int fn2 ();
+
+int __attribute__ ((hot))
+main ()
+{
+  fn1 ();
+  fn2 ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "dispatchv_clone_0" "optimized" } } */
+/* { dg-final { scan-tree-dump "dispatchv_clone_1" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/g++.dg/mversn12.C
===================================================================
--- testsuite/g++.dg/mversn12.C	(revision 0)
+++ testsuite/g++.dg/mversn12.C	(revision 0)
@@ -0,0 +1,43 @@ 
+/* Check if everything is fine if the versioned functions are static
+   member functions. */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -fclone-hot-version-paths" } */
+
+int  __attribute__ ((version_selector))
+featureTest()
+{
+  return 1;;
+}
+
+class TestClass
+{
+ public:
+  static int foo ()
+  {
+    return 0;
+  }
+
+  static int bar ()
+  {
+    return 1;
+  }
+
+  int dispatch ()
+  {
+    int a = __builtin_dispatch (featureTest,
+                               (void*)(&TestClass::foo),
+			       (void*)(&TestClass::bar));
+    int b = __builtin_dispatch (featureTest,
+                               (void*)(&TestClass::bar),
+			       (void*)(&TestClass::foo));
+    return a * b;
+  }
+};
+
+int
+main ()
+{
+  TestClass c1;
+  return c1.dispatch ();
+}
Index: testsuite/g++.dg/mversn14.C
===================================================================
--- testsuite/g++.dg/mversn14.C	(revision 0)
+++ testsuite/g++.dg/mversn14.C	(revision 0)
@@ -0,0 +1,26 @@ 
+/* Check if everything is fine when the feature test body is in a different
+   module that does not have a __builtin-dispatch".  Requires mversn14a.C. */
+
+/* { dg-do run } */
+/* { dg-additional-sources "mversn14a.C" } */
+/* { dg-options "-O2 -fclone-hot-version-paths" } */
+
+int __attribute__ ((version_selector))
+featureTest ();
+
+int
+foo ()
+{
+  return 0;
+}
+
+int
+bar ()
+{
+  return 1;
+}
+
+int main ()
+{
+   return __builtin_dispatch (featureTest, (void *)foo, (void *)bar);
+}
Index: testsuite/g++.dg/mversn16.C
===================================================================
--- testsuite/g++.dg/mversn16.C	(revision 0)
+++ testsuite/g++.dg/mversn16.C	(revision 0)
@@ -0,0 +1,39 @@ 
+/* dispatch is cloned. Make sure the double is returned to main correctly. */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -fclone-hot-version-paths -fdump-tree-optimized" } */
+
+#include "assert.h"
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+  return 1;
+}
+
+static int foo ()
+{
+  return 0;
+}
+
+static int bar ()
+{
+  return 1;
+}
+
+double
+dispatch ()
+{
+  __builtin_dispatch (featureTest, (void *)foo, (void *)bar);
+  return 2.536;
+}
+
+int main ()
+{
+  double d = dispatch ();
+  assert (d == 2.536);
+  return 0; 
+}
+
+/* { dg-final { scan-tree-dump "dispatchv_clone" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/g++.dg/torture/mversn11.C
===================================================================
--- testsuite/g++.dg/torture/mversn11.C	(revision 0)
+++ testsuite/g++.dg/torture/mversn11.C	(revision 0)
@@ -0,0 +1,40 @@ 
+/* Check if parameters are passed correctly to the versioned function. */
+
+/* { dg-do run } */
+
+#include <stdio.h>
+#include <assert.h>
+
+int __attribute__ ((version_selector))
+featureTest ()
+{
+  return 1;
+}
+
+int __attribute__ ((noinline))
+foo (int n1, double n2, char n3)
+{
+  assert (n1 == 10);
+  assert (n2 == 20.0);
+  assert (n3 == 'c');
+  return n1;
+}
+
+
+int __attribute__ ((noinline))
+bar (int n1, double n2, char n3)
+{
+  assert (n1 == 10);
+  assert (n2 == 20.0);
+  assert (n3 == 'c');
+  return (n1 - 10);
+}
+
+int main ()
+{
+  int a = __builtin_dispatch (featureTest, (void *)foo, (void *)bar,
+                              10, 20.0, 'c');
+  int b = __builtin_dispatch (featureTest, (void *)bar, (void *)foo,
+                              10, 20.0, 'c');
+  return a * b;
+}
Index: testsuite/g++.dg/torture/mversn5.C
===================================================================
--- testsuite/g++.dg/torture/mversn5.C	(revision 0)
+++ testsuite/g++.dg/torture/mversn5.C	(revision 0)
@@ -0,0 +1,28 @@ 
+/* Check that global inline feature test functions are correctly handled. This
+   test case also needs mversn5a.c.  extern_func calls foo and returns 0.*/
+
+/* { dg-do run } */
+/* { dg-additional-sources "mversn5a.C" } */
+
+#include "mversn5.h"
+
+extern int extern_func ();
+
+int foo ()
+{
+  return 0;
+}
+
+int bar ()
+{
+  return 1;
+}
+
+int __attribute__ ((cold))
+main ()
+{
+  int a = 1, b = 1;
+  a  = extern_func ();
+  b  = __builtin_dispatch (featureTest, (void *)bar, (void *) foo);
+  return a * b;
+}
Index: testsuite/g++.dg/torture/mversn5.h
===================================================================
--- testsuite/g++.dg/torture/mversn5.h	(revision 0)
+++ testsuite/g++.dg/torture/mversn5.h	(revision 0)
@@ -0,0 +1,5 @@ 
+inline int __attribute__ ((version_selector))
+featureTest ()
+{
+  return 1;
+}
Index: testsuite/g++.dg/torture/mversn5a.C
===================================================================
--- testsuite/g++.dg/torture/mversn5a.C	(revision 0)
+++ testsuite/g++.dg/torture/mversn5a.C	(revision 0)
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+
+#include "mversn5.h"
+
+extern int foo ();
+extern int bar ();
+
+
+int extern_func ()
+{
+  return __builtin_dispatch (featureTest, (void *)foo, (void *) bar);
+}
Index: builtin-types.def
===================================================================
--- builtin-types.def	(revision 173381)
+++ builtin-types.def	(working copy)
@@ -475,3 +475,7 @@  DEF_FUNCTION_TYPE_VAR_5 (BT_FN_INT_INT_INT_INT_INT
 DEF_POINTER_TYPE (BT_PTR_FN_VOID_VAR, BT_FN_VOID_VAR)
 DEF_FUNCTION_TYPE_3 (BT_FN_PTR_PTR_FN_VOID_VAR_PTR_SIZE,
 		     BT_PTR, BT_PTR_FN_VOID_VAR, BT_PTR, BT_SIZE)
+
+DEF_POINTER_TYPE (BT_PTR_FN_INT, BT_FN_INT)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR,
+			 BT_INT, BT_PTR_FN_INT, BT_PTR, BT_PTR)
Index: builtins.def
===================================================================
--- builtins.def	(revision 173381)
+++ builtins.def	(working copy)
@@ -760,6 +760,9 @@  DEF_BUILTIN (BUILT_IN_EMUTLS_REGISTER_COMMON,
 	     true, true, true, ATTR_NOTHROW_LEAF_LIST, false,
 	     !targetm.have_tls)
 
+/* Multiversioning builtin dispatch hook. */
+DEF_GCC_BUILTIN (BUILT_IN_DISPATCH, "dispatch", BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR, ATTR_NULL)
+
 /* Exception support.  */
 DEF_BUILTIN_STUB (BUILT_IN_UNWIND_RESUME, "__builtin_unwind_resume")
 DEF_BUILTIN_STUB (BUILT_IN_CXA_END_CLEANUP, "__builtin_cxa_end_cleanup")
Index: mversn-dispatch.c
===================================================================
--- mversn-dispatch.c	(revision 0)
+++ mversn-dispatch.c	(revision 0)
@@ -0,0 +1,1766 @@ 
+/* Mulitversion Dispatch Pass.
+   Copyright (C) 2011 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+
+/* This pass processes __builtin_dispatch calls to call multi-versioned
+   functions. Only two versions are supported now. Example use :
+
+  int popcnt_sse4(unsigned int x) __attribute__((__target__("popcnt")));
+  int popcnt_sse4(unsigned int x)
+  {
+    int count = __builtin_popcount(x);
+    return count;
+  }
+
+  int popcnt(unsigned int x) __attribute__((__target__("no-popcnt")));
+  int popcnt(unsigned int x)
+  {
+    int count = __builtin_popcount(x);
+    return count;
+  }
+
+  int testsse() __attribute__((version_selector));
+  int main ()
+  {
+    ...
+    ret = __builtin_dispatch (testsse,  (void*)popcnt_sse4, (void*)popcnt, 25);
+    ...
+  }
+
+  There are two passes that are run to achieve multi-versioning.
+  "pass_ipa_multiversion_dispatch" is an ipa pass that decides which functions
+  have to be cloned and hoists the feature-test calls appropriately.  This
+  pass can be enabled with the flag "-fclone-hot-version-paths" and disabled
+  with "-fno-clone-hot-version-paths".
+
+  "pass_tree_convert_builtin_dispatch" does the lowering.  It is a
+  function-level pass.  Functions marked with attribute "version_selector" are
+  also handled by this pass.  This pass is always on.
+
+  How to use __builtin_dispatch ?
+  -----------------------------
+
+  __builtin_dispatch takes 3 mandatory arguments :
+
+  __builtin_dispatch (arg1, arg2, arg3, <arg4>, <arg5>, ...);
+
+  arg1 is the pointer to the feature-test function.
+  arg2 is the ( void *) cast pointer to the versioned function that is
+  executed when the feature test returns 1.
+  arg3 is the ( void *) cast pointer to the versioned function that is
+  executed when the feature test returns 0.
+  arg4, arg5, ... are optional. They are the arguments to the versioned
+  functions.  Both versions must accept the same number of arguments.
+  The __builtin_dispatch function returns the value returned by the
+  versioned function that gets executed.  The versioned function arg2
+  is executed when the feature_test function arg1 returns 1 and arg3
+  is executed when the feature_test function arg1 returns 0. arg1
+  could be marked as a "version_selector" function if it is a pure
+  function with no side-effects, returns a constant at run-time and
+  can be evaluated at any point in the execution.
+
+  When to use the "version_selector" attribute ?
+  -----------------------------------------------
+
+  Functions are marked with attribute "version_selector" only if
+  they are run-time constants.  Example of such functions would
+  be those that test if a specific feature is available on a
+  particular architecture.  Such functions must return a positive
+  integer. For two-way functions, those that test if a feature
+  is present or not must return 1 or 0 respectively.
+
+
+  The code is organized into five parts.  The first part has the functionality
+  to detect and handle functions marked with attribute "version_selector".  The
+  second part is the analysis phase where we find calls to __builtin_dispatch
+  and mark all functions that are hot and have a call-graph path to a
+  __builtin_dispatch call.  The third part decides which functions
+  to clone.  This is based on the number of clones that have to be created for
+  the functions marked in the analysis phase. Only two clones are allowed for
+  a function currently.  The fourth part is where the actual cloning happens.
+  The fifth part contains the implementation to lower the __builtin_dispatch
+  calls.
+
+  Flags : -fclone-hot-version-paths does function unswitching via cloning.
+          --param=num-mversn-clones=<num> allows to specify the number of
+          functions that should be cloned.
+	  --param=mversn-clone-depth=<num> allows to specify the length of
+          the call graph path that should be cloned.  num = 0 implies only
+          leaf node that contains the __builtin_dispatch statement must be
+          cloned. */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "tree-inline.h"
+#include "langhooks.h"
+#include "flags.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "toplev.h"
+#include "timevar.h"
+#include "params.h"
+#include "fibheap.h"
+#include "intl.h"
+#include "tree-pass.h"
+#include "hashtab.h"
+#include "coverage.h"
+#include "ggc.h"
+#include "tree-flow.h"
+#include "rtl.h"
+#include "ipa-prop.h"
+#include "basic-block.h"
+#include "toplev.h"
+#include "dbgcnt.h"
+#include "tree-dump.h"
+#include "output.h"
+#include "vecprim.h"
+#include "gimple-pretty-print.h"
+
+typedef struct cgraph_node* NODEPTR;
+DEF_VEC_P (NODEPTR);
+DEF_VEC_ALLOC_P (NODEPTR, heap);
+
+/* Store the decl of __builtin_dispatch */
+static tree builtin_function_decl = NULL;
+
+/* Hash to map name to a decl.  Used for variables and functions. */
+static htab_t name_decl_htab = NULL;
+
+/* Hashtable helpers for name_decl_htab. */
+
+static hashval_t
+name_decl_htab_hash_descriptor (const void *p)
+{
+  const_tree t = (const_tree) p;
+  const char *name
+    = (IDENTIFIER_POINTER (DECL_NAME (t)));
+  return htab_hash_string(name);
+}
+
+/* Hashtable helper for name_decl_htab. */
+
+static int
+name_decl_htab_eq_descriptor (const void *p1, const void *p2)
+{
+  const_tree t1 = (const_tree) p1;
+  const char *c1 = IDENTIFIER_POINTER (DECL_NAME (t1));
+  const char *c2 = (const char *)p2;
+
+  return (strcmp (c1, c2) == 0);
+}
+
+/* Return true if NODE is a hot function.  It is a hot function
+   if its execution frequency is determined to be hot or
+   if any of its incoming or outgoing call-graph edges is hot.  */
+
+static bool
+hot_function_p (struct cgraph_node *node)
+{
+  struct cgraph_edge *edge;
+
+  if (node->frequency == NODE_FREQUENCY_HOT)
+    return true;
+
+  for (edge = node->callees; edge; edge = edge->next_callee)
+    if (cgraph_maybe_hot_edge_p (edge))
+      return true;
+
+  for (edge = node->callers; edge; edge = edge->next_caller)
+    if (cgraph_maybe_hot_edge_p (edge))
+      return true;
+
+  return false;
+}
+
+/* Return the number of arguments that a function has.  */
+
+static int
+function_args_count (tree fntype)
+{
+  function_args_iterator args_iter;
+  tree t;
+  int num = 0;
+
+  if (fntype)
+    {
+      FOREACH_FUNCTION_ARGS(fntype, t, args_iter)
+	{
+	  num++;
+	}
+    }
+
+  return num;
+}
+
+/* Return the variable name (global/constructor) to use for the
+   version_selector function with name of DECL by appending SUFFIX. */
+
+static char *
+make_name (tree decl, const char *suffix)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+  global_var_name = (char *) xmalloc (name_len);
+  snprintf (global_var_name, name_len, "%s_%s", name, suffix);
+  return global_var_name;
+}
+
+/* Code for handling version_selector attribute functions.  Such functions are
+   run-time constants and need to be executed only once.  They are hoisted
+   to a static constructor and their result is stored in a global.
+ */
+
+
+/* This function returns the global variable / constructor name created
+   for feature-test functions marked with attribute "version_selector".
+   The name returned is the DECL name appended with
+   "version_selector_global" for the variable and
+   "version_selector_constructor" for the constructor. */
+
+static char*
+make_feature_test_global_name (tree decl, bool is_constructor)
+{
+  if (is_constructor)
+    return make_name (decl, "version_selector_constructor");
+
+  return make_name (decl, "version_selector_global");
+}
+
+/* This function creates a new VAR_DECL with attributes set
+   using the parameters.  PUBLIK corresponds to TREE_PUBLIC,
+   EXTERNAL corresponds to DECL_EXTERNAL and comdat is
+   for DECL_ONE_ONLY.  The global variable will have the
+   same status as the version_selector function.*/
+
+static tree
+allocate_new_var (const char *name, int publik,
+                  int external, int comdat)
+{
+  tree new_global_var;
+  struct varpool_node *vnode;
+
+  new_global_var = build_decl (UNKNOWN_LOCATION,
+			       VAR_DECL,
+	  	               get_identifier (name),
+		               integer_type_node);
+
+  DECL_EXTERNAL (new_global_var) = external;
+  TREE_STATIC (new_global_var) = 1;
+  TREE_PUBLIC (new_global_var) = publik;
+  DECL_INITIAL (new_global_var) = 0;
+  DECL_ARTIFICIAL (new_global_var) = 1;
+  DECL_PRESERVE_P (new_global_var) = 1;
+
+  if (comdat)
+    make_decl_one_only (new_global_var, DECL_ASSEMBLER_NAME (new_global_var));
+  assemble_variable (new_global_var, 0, 0, 0);
+
+  vnode = varpool_node (new_global_var);
+  gcc_assert (vnode != NULL);
+  /* Set finalized to 1, otherwise it asserts in function "write_symbol" in
+     lto-streamer-out.c. */
+  vnode->finalized = 1;
+  
+  return new_global_var;
+}
+
+/* Make a new constructor function here to call a feature-test function
+   and set its body to CONSTRUCTOR_BODY.  Its public and comdat
+   attributes are set from the parameters, PUBLIK, and COMDAT.
+   VERSION_SELECTOR_VAR is the global decl that saves the result of the
+   feature-test function in the constructor. */
+
+static tree
+make_constructor_function (char *name, gimple constructor_body, int publik,
+			   int comdat, tree version_selector_var)
+{
+  tree decl, type, t;
+  gimple_seq seq;
+  basic_block new_bb;
+  tree old_current_function_decl;
+
+  type = build_function_type_list (void_type_node, NULL_TREE);
+
+  if (dump_file)
+    fprintf (dump_file, "Name of new constructor function = %s\n", name);
+
+  decl = build_fn_decl (name, type);
+
+  DECL_NAME (decl) = get_identifier (name);
+  SET_DECL_ASSEMBLER_NAME (decl, DECL_NAME (decl));
+  gcc_assert (cgraph_node (decl) != NULL);
+
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  TREE_PUBLIC (decl) = publik;
+  DECL_UNINLINABLE (decl) = 1;
+  DECL_EXTERNAL (decl) = 0;
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 1;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+
+  if (comdat)
+    make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, void_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+
+  /* Build CFG for this function. */
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  current_function_decl = decl;
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
+     PROP_ssa);
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+
+  /* XXX: Not sure if the edge commented below is necessary.  If I add this
+     edge, it fails in gimple_verify_flow_info in tree-cfg.c in condition :
+     " if (e->flags & EDGE_FALLTHRU)"
+     during -fprofile-generate.
+     Otherwise, it is fine.  Deleting this edge does not break anything.
+     Commenting this so that it is clear I am intentionally not doing this.*/
+  /* make_edge (new_bb, EXIT_BLOCK_PTR, EDGE_FALLTHRU); */
+
+  seq = gimple_seq_alloc_with_stmt (constructor_body);
+
+  set_bb_seq (new_bb, seq);
+  gimple_set_bb (constructor_body, new_bb);
+
+  /* Set the lexical block of the constructor body. Fails the inliner
+     other wise. */
+  gimple_set_block (constructor_body, DECL_INITIAL (decl));
+
+  /* This call is very important if this pass runs when the IR is in
+     SSA form.  It breaks things in strange ways otherwise. */
+  init_tree_ssa (DECL_STRUCT_FUNCTION (decl));
+  add_referenced_var (version_selector_var);
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_node (decl));
+  cgraph_mark_needed_node (cgraph_node (decl));
+
+  if (dump_file)
+    dump_function_to_file (decl, dump_file, TDF_BLOCKS);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+  return decl;
+}
+
+/* If  the current function is marked with attribute
+   "version_selector" then it is the predicate (feature-test) function
+   for multi-versioning.  Call this function in a constructor and assign
+   the return value to a global variable.
+   The constructor's name is the decl name suffixed
+   "version_selector_constructor" and the global variable's name is the
+   decl name suffixed with "version_selector_global"
+
+   For example, feature-test function isSSE4 marked with attribute
+   version_selector is converted to
+
+   void isSSE4_version_selector_constructor ()
+   {
+     isSSE4_version_selector_global = isSSE4 ();
+   }
+
+   This function returns the decl of the global variable.
+
+   THIS_DECL is the function decl of the "version_selector" function.
+   */
+
+static tree
+handle_version_selector_attr_function (tree this_decl)
+{
+  char *global_var_name;
+  tree version_selector_var = NULL;
+  void **slot;
+
+  gcc_assert (!flag_lto);
+
+  if (dump_file)
+    fprintf (dump_file, "Creating constructor/global for function %s\n",
+	     IDENTIFIER_POINTER (DECL_NAME (this_decl)));
+
+  global_var_name = make_feature_test_global_name (this_decl,
+                                                   false);
+
+  slot = htab_find_slot_with_hash (name_decl_htab, global_var_name,
+                                   htab_hash_string (global_var_name),
+                                   INSERT);
+  if (*slot == NULL)
+    {
+      if (dump_file)
+        fprintf (dump_file, "Creating global variable %s\n",
+	         global_var_name);
+      *slot = allocate_new_var (global_var_name,
+                                TREE_PUBLIC (this_decl),
+                                DECL_EXTERNAL (this_decl),
+                                DECL_ONE_ONLY (this_decl));
+    }
+  else
+    {
+      free (global_var_name);
+      return (tree) *slot;
+    }
+
+  version_selector_var = (tree) *slot;
+
+  /* If the feature-test function is not external, create a constructor and
+     call this function in the constructor. */
+
+  if (!DECL_EXTERNAL (this_decl))
+    {
+      char *constructor_name;
+      gimple constructor_body;
+      tree constructor_decl;
+
+      constructor_name
+        = make_feature_test_global_name (this_decl, true);
+
+      constructor_body = gimple_build_call (this_decl, 0);
+
+      gimple_call_set_lhs (constructor_body, version_selector_var);
+
+      if (dump_file)
+        print_gimple_stmt (dump_file, constructor_body, 0, TDF_VOPS);
+
+      constructor_decl =
+        make_constructor_function (constructor_name, constructor_body,
+				   TREE_PUBLIC (this_decl),
+				   DECL_ONE_ONLY (this_decl),
+			           version_selector_var);
+
+      gcc_assert (constructor_decl != NULL_TREE);
+      free (constructor_name);
+    }
+
+  free (global_var_name);
+  return version_selector_var;
+}
+
+/* Start Analysis phase.  Mark all functions that are hot and have a call-graph
+   path to a __builtin_dispatch call. */
+
+/* This function returns the address of the feature test function.
+   If the address of the function is saved to a temporary,
+   this function traverses the gimple statements before BUILTIN_STMT
+   and finds an assignment whose rhs is the feature test function.
+   If the feature test function is specified as a function pointer
+   whose function value is unknown, this funcition returns NULL. */
+
+static tree
+find_version_selector_func_addr (gimple builtin_stmt)
+{
+  tree cond_func_addr = NULL;
+  gimple def_stmt = NULL;
+
+  cond_func_addr = gimple_call_arg (builtin_stmt, 0);
+
+  gcc_assert (TREE_CODE (cond_func_addr) == ADDR_EXPR
+	      || TREE_CODE (cond_func_addr) == SSA_NAME);
+
+  if (TREE_CODE (cond_func_addr) == ADDR_EXPR)
+    return cond_func_addr;
+
+  /* TREE_CODE (cond_func_addr) ==  SSA_NAME
+     This means a new function pointer variable is created and assigned the
+     address of the feature-test function. Traverse the statements backwards
+     and find the assignment to get the RHS. */
+
+  def_stmt = SSA_NAME_DEF_STMT (cond_func_addr);
+
+  gcc_assert (def_stmt
+              && gimple_assign_lhs (def_stmt) == cond_func_addr);
+
+  cond_func_addr = gimple_assign_rhs1 (def_stmt);
+
+  /* If the cond_func_addr is still not an ADDR_EXPR, it means that the
+     feature-test function is specified as a pointer.  In this case, we
+     return NULL, since the feature-test function decl is not known. */
+
+  if (cond_func_addr == NULL
+      || TREE_CODE (cond_func_addr) != ADDR_EXPR)
+    return NULL;
+
+  /* If the operand of the ADDR_EXPR is not a function_decl, return NULL
+     as this still means the feature-test function is specified as a
+     function pointer. */
+
+  if (TREE_CODE (TREE_OPERAND (cond_func_addr, 0)) != FUNCTION_DECL)
+    return NULL;
+
+  return cond_func_addr;
+}
+
+/* Finds the gimple calls  to __builtin_dispatch in function pointed
+   to by the call graph NODE and populates the vector VEC.  Returns
+   true if at least one statement was found where the feature test
+   function is marked as "version_selector".  Otherwise, there is no
+   question of hoisting it. */
+
+static bool
+is_builtin_dispatch_stmt_present (struct cgraph_node *node,
+			          VEC (tree,heap) **vec)
+{
+  struct cgraph_edge *edge;
+  bool present = false;
+
+  gcc_assert (!flag_lto);
+
+  for (edge = node->callees; edge; edge = edge->next_callee)
+    {
+      if (edge->callee->decl == builtin_function_decl)
+        {
+	  tree cond_func_decl;
+	  tree cond_func_addr;
+	  gcc_assert (*vec != NULL);
+	  cond_func_addr = find_version_selector_func_addr (edge->call_stmt);
+
+	  if (cond_func_addr == NULL)
+            continue;
+
+	  cond_func_decl = TREE_OPERAND (cond_func_addr, 0);
+
+	  /* Do not consider for hoisting if "version_selector" attribute is
+	     not set. */
+	  if (lookup_attribute ("version_selector",
+			        DECL_ATTRIBUTES (cond_func_decl)) == NULL)
+            {
+              if (dump_file)
+                {
+                  fprintf (dump_file, "Not hoisting builtin_dispatch as "
+                           "feature_test function not version_selector :\n");
+                  print_gimple_stmt (dump_file, edge->call_stmt, 0, TDF_VOPS);
+		}
+              continue;
+            }
+
+	  present = true;
+	  VEC_safe_push (tree, heap, *vec, cond_func_decl);
+        }
+    }
+  return present;
+}
+
+/* Updates the list of feature-test function decls reaching the cgraph
+   function NODE. */
+
+static void
+update_reachable_decls_list (struct cgraph_node *node,
+                             VEC (tree, heap) *predicate_decls)
+{
+  VEC (tree, heap) **decl_list = NULL;
+  tree cond_func_decl;
+  int ix;
+
+  if (node->aux == NULL)
+    {
+      decl_list = (VEC (tree, heap) **) xmalloc (sizeof (VEC (tree, heap) *));
+      *decl_list = VEC_alloc (tree, heap, 1);
+      node->aux = decl_list;
+    }
+  else
+    decl_list = (VEC (tree, heap) **) node->aux;
+
+  for (ix = 0; VEC_iterate (tree, predicate_decls, ix, cond_func_decl); ++ix)
+    VEC_safe_push (tree, heap, *decl_list, cond_func_decl);
+}
+
+/* Propagate the __builtin_dispatch stmt (s) called from node to its
+   callers, PREDICATE_DECLS is the decls list of the predicate functions. */
+
+static unsigned int
+mark_reachable_functions (struct cgraph_node *this_node,
+			  VEC (tree, heap) *predicate_decls)
+{
+  VEC (NODEPTR, heap) *work_list;
+  VEC (int, heap) *depth_list;
+  struct cgraph_edge *e;
+  htab_t node_htab = NULL;
+  void **slot = NULL;
+
+  /* Use a work-list style algorithm to mark functions in any call-graph
+     path to the current function. */
+
+  work_list = VEC_alloc (NODEPTR, heap, 8);
+  depth_list = VEC_alloc (int, heap, 8);
+
+  VEC_safe_push (NODEPTR, heap, work_list, this_node);
+  VEC_safe_push (int, heap, depth_list, 0);
+
+  node_htab = htab_create (10, htab_hash_pointer,
+  			   htab_eq_pointer, NULL);
+
+  slot = htab_find_slot (node_htab, this_node, INSERT);
+
+  gcc_assert (*slot == NULL);
+  *slot = this_node;
+
+  while (!VEC_empty (NODEPTR, work_list))
+    {
+      struct cgraph_node *node = VEC_pop (NODEPTR, work_list);
+      int depth = VEC_pop (int, depth_list);
+
+      if (dump_file)
+        fprintf (dump_file, "%s has a depth = %d callgraph path to %s\n",
+                 cgraph_node_name (node), depth,
+                 cgraph_node_name (this_node));
+
+      update_reachable_decls_list (node, predicate_decls);
+
+      gcc_assert (node->aux != NULL);
+
+      if (depth >= PARAM_VALUE (PARAM_MVERSN_CLONE_CGRAPH_DEPTH))
+        {
+          if (dump_file)
+            fprintf (dump_file, "Not propogating __builtin_dispatch... "
+                     "maximum cloning depth  = %d reached\n",
+		     PARAM_VALUE (PARAM_MVERSN_CLONE_CGRAPH_DEPTH));
+          continue;
+        }
+
+      for (e = node->callers; e; e = e->next_caller)
+        {
+          slot = htab_find_slot (node_htab, e->caller, INSERT);
+	  if (*slot != NULL)
+	    continue;
+          *slot = e->caller;
+          if (!hot_function_p (e->caller))
+            continue;
+
+          VEC_safe_push (NODEPTR, heap, work_list, e->caller);
+          VEC_safe_push (int, heap, depth_list, (depth + 1));
+        }
+    }
+
+  htab_delete (node_htab);
+  VEC_free (NODEPTR, heap, work_list);
+  VEC_free (int, heap, depth_list);
+  return 0;
+}
+
+/* Scan the call graph and detect hot functions that have __builtin_dispatch
+   calls. Then, propogate this information to its callers. Returns true if
+   a suitable __builtin_dispatch was found. */
+
+static bool
+perform_analysis_phase (void)
+{
+  struct cgraph_node *node;
+  VEC(tree, heap) *builtin_predicates_vec = NULL;
+  bool flag = false;
+
+  builtin_predicates_vec = VEC_alloc (tree, heap, 1);
+
+  for (node = cgraph_nodes; node; node = node->next)
+    {
+      /* if the body of this decl is from outside, do nothing. */
+      if (DECL_EXTERNAL (node->decl))
+        continue;
+
+      if (!hot_function_p (node))
+        continue;
+
+      if (!is_builtin_dispatch_stmt_present (node, &builtin_predicates_vec))
+        continue;
+
+      if (dump_file)
+        {
+          fprintf (dump_file, "%s calls __builtin_dispatch atleast once.\n",
+                   cgraph_node_name (node));
+
+          fprintf (dump_file, "%s is a hot function, consider cloning ...\n",
+        	   cgraph_node_name (node));
+        }
+
+      flag = true;
+      mark_reachable_functions (node, builtin_predicates_vec);
+      VEC_truncate (tree, builtin_predicates_vec, 0);
+    }
+
+  VEC_free (tree, heap, builtin_predicates_vec);
+  return flag;
+}
+
+/* End Analysis phase. */
+
+/* Decide Cloning Phase.
+
+   In this phase, we go through each function and decide if it should be
+   cloned or not. */
+
+/* This function counts the number of unique decls in the DECL_LIST.*/
+
+static int
+count_predicate_functions (VEC (tree,heap) *decl_list)
+{
+  int ix;
+  int count = 0;
+  tree cond_func_decl = NULL;
+  htab_t dup_decl_htab = NULL;
+
+  if (VEC_length (tree, decl_list) == 1)
+    return 1;
+
+  dup_decl_htab = htab_create (2, htab_hash_pointer, htab_eq_pointer, NULL);
+
+  for (ix = 0; VEC_iterate (tree, decl_list, ix, cond_func_decl); ++ix)
+    {
+      void **slot = NULL;
+      slot = htab_find_slot (dup_decl_htab, cond_func_decl, INSERT);
+
+      if (*slot != NULL)
+        continue;
+      count++;
+      *slot = cond_func_decl;
+    }
+
+  htab_delete (dup_decl_htab);
+  return count;
+}
+
+/* This function decides which functions to clone based on the number of
+   feature_test decls reaching it.  Currently, only one feature_test decl
+   is allowed. */
+
+static bool
+decide_cloning_phase (void)
+{
+  struct cgraph_node *node;
+  int count;
+  bool run_cloning_phase = false;
+  int num_funcs_cloned = 0;
+
+  for (node = cgraph_nodes; node; node = node->next)
+    {
+      tree cond_func_decl = NULL;
+      VEC (tree, heap) *vec;
+      if (node->aux == NULL)
+        continue;
+
+      if (num_funcs_cloned >= PARAM_VALUE (PARAM_NUMBER_OF_MVERSN_CLONES))
+        {
+          if (dump_file)
+            fprintf (dump_file, "Reached cloning limit specified "
+                     "by \"num-mversn-clones\" for %s\n",
+	             cgraph_node_name (node));
+
+          free (node->aux);
+	  node->aux = NULL;
+	  continue;
+        }
+
+      vec = *(VEC (tree,heap) **) node->aux;
+      count = count_predicate_functions (vec);
+      gcc_assert (count >= 1);
+      cond_func_decl = VEC_index (tree, vec, 0);
+      gcc_assert (cond_func_decl != NULL);
+      VEC_free (tree, heap, vec);
+      free (node->aux);
+      node->aux = NULL;
+
+      if (count > 1)
+        {
+          if (dump_file)
+            fprintf (dump_file, "%s has %d predicates, Not cloning for > 1\n",
+	             cgraph_node_name (node), count);
+	  continue;
+        }
+      /* Set the node's aux value to be that of the predicate decl. */
+      node->aux = cond_func_decl;
+      run_cloning_phase = true;
+      num_funcs_cloned++;
+    }
+  return run_cloning_phase;
+}
+
+/* End Decide Cloning Phase. */
+
+/* Cloning Phase. */
+
+/* Deletes all basic-blocks and leaves function with :
+   ENTRY_BLOCK ---> (new empty basic block) ---> EXIT_BLOCK
+*/
+
+static basic_block
+empty_function_body (tree fndecl)
+{
+  basic_block bb, new_bb;
+  edge e;
+  tree old_current_function_decl;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (fndecl));
+  current_function_decl = fndecl;
+
+  clear_edges ();
+  for (bb = ENTRY_BLOCK_PTR; bb != NULL;)
+    {
+      basic_block bb_next;
+      bb_next = bb->next_bb;
+      if (bb != EXIT_BLOCK_PTR
+          && bb != ENTRY_BLOCK_PTR)
+        {
+          if (bb_seq (bb) != NULL)
+            {
+              gimple_stmt_iterator i;
+              for (i = gsi_start_bb (bb); !gsi_end_p (i);)
+	        {
+		  gimple stmt = gsi_stmt (i);
+		  unlink_stmt_vdef (stmt);
+                  gsi_remove (&i, true);
+		  release_defs (stmt);
+		}
+            }
+          bb->il.gimple = NULL;
+          bb->prev_bb = NULL;
+          bb->next_bb = NULL;
+          SET_BASIC_BLOCK (bb->index, NULL);
+          n_basic_blocks--;
+        }
+      bb = bb_next;
+    }
+  ENTRY_BLOCK_PTR->next_bb = EXIT_BLOCK_PTR;
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  e = make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  gcc_assert (e != NULL);
+  /* XXX:Is this edge necessary ? */
+  e = make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  gcc_assert (e != NULL);
+
+  current_function_decl = old_current_function_decl;
+  pop_cfun ();
+  return new_bb;
+}
+
+/* Takes function with decl ORIG_FNDECL and clones it.  The
+   name of the clone is the original name suffixed with
+   NAME_SUFFIX.  Code is adapted from cgraph_function_versioning
+   in cgraphunit.c */
+
+static tree
+clone_function (tree orig_fndecl, const char *name_suffix)
+{
+  tree new_decl;
+  char *new_name;
+  struct cgraph_node *new_version;
+  struct cgraph_node *old_version;
+  void **slot;
+  tree old_current_function_decl;
+
+  new_name = make_name (orig_fndecl, name_suffix);
+  new_decl = copy_node (orig_fndecl);
+
+
+  slot = htab_find_slot_with_hash (name_decl_htab, new_name,
+                                   htab_hash_string (new_name), INSERT);
+
+  gcc_assert (*slot == NULL);
+  *slot = new_decl;
+
+  /* Code adapted from cgraph_function_versioning in cgraphuinit.c */
+
+  new_version = cgraph_node (new_decl);
+  old_version = cgraph_node (orig_fndecl);
+
+  new_version->local = old_version->local;
+  new_version->global = old_version->global;
+  new_version->rtl = old_version->rtl;
+  new_version->reachable = true;
+  new_version->count = old_version->count;
+
+  /* Set the name of the new function. */
+  DECL_NAME (new_decl) = get_identifier (new_name);
+  SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
+  SET_DECL_RTL (new_decl, NULL);
+
+  tree_function_versioning (orig_fndecl, new_decl, NULL /*tree_map*/,
+                            false, NULL /*args_to_skip*/,
+			    NULL /* blocks_to_copy */ ,
+                            NULL /* new_entry */);
+
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (new_decl));
+  current_function_decl = new_decl;
+
+  TREE_READONLY (new_decl) = TREE_READONLY (orig_fndecl);
+  TREE_STATIC (new_decl) = TREE_STATIC (orig_fndecl);
+  TREE_USED (new_decl) = TREE_USED (orig_fndecl);
+  DECL_ARTIFICIAL (new_decl) = 1;
+  DECL_IGNORED_P (new_decl) = 0;
+  TREE_PUBLIC (new_decl) = TREE_PUBLIC (orig_fndecl);
+  DECL_CONTEXT (new_decl) = DECL_CONTEXT (orig_fndecl);
+
+  DECL_EXTERNAL (new_decl) = DECL_EXTERNAL (orig_fndecl);
+  DECL_COMDAT (new_decl) = DECL_COMDAT (orig_fndecl);
+  DECL_COMDAT_GROUP (new_decl) = DECL_COMDAT_GROUP (orig_fndecl);
+  DECL_VIRTUAL_P (new_decl) = DECL_VIRTUAL_P (orig_fndecl);
+  DECL_WEAK (new_decl) = DECL_WEAK (orig_fndecl);
+
+  /* Always inline the clones. Why are we cloning otherwise? */
+  DECL_DECLARED_INLINE_P (new_decl) = 1;
+  DECL_UNINLINABLE (new_decl) = 0;
+  new_version->local.externally_visible
+  = old_version->local.externally_visible;
+  new_version->local.local
+  = old_version->local.local;
+
+  new_version->analyzed = true;
+  new_version->lowered = true;
+
+  if (dump_file)
+    dump_function_to_file (new_decl, dump_file, TDF_BLOCKS);
+
+  cgraph_add_new_function (new_decl, true);
+
+  cgraph_call_function_insertion_hooks (new_version);
+  cgraph_mark_needed_node (new_version);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  return new_decl;
+}
+
+/* This function populates the vector *VEC with the args in the gimple
+   call statement STMT. SKIP_ARGS is the number of args to skip.*/
+
+static void
+get_function_args (gimple stmt, int num_args, VEC (tree, heap) **vec,
+                   int skip_args)
+{
+  int i;
+
+  if (num_args == 0) return;
+
+  *vec = VEC_alloc (tree, heap, num_args);
+  /* The number of args in a function is 1 plus the actual number of
+     args.  Also, there are 3 special args reserved, so the first arg
+     starts from 3. */
+  for (i = 0; i <= num_args - 2; ++i)
+    VEC_quick_push (tree, *vec, gimple_call_arg (stmt, (skip_args + i)));
+}
+
+/* Given ret = __builtin_dispatch (pred, fn1, fn2, arg1, ....)
+   get ret = fn1 (arg1, ...) or ret = fn2 (arg1, ....)
+   depending on the value of SIDE == 0 or 1. */
+
+static gimple
+make_specialized_call_from_builtin (gimple builtin_stmt, int side)
+{
+  tree func_addr;
+  int num_func_args = 0;
+  VEC (tree, heap) *nargs = NULL;
+  tree lhs_stmt;
+  gimple specialized_call_stmt;
+
+  if (side == 0)
+    func_addr = gimple_call_arg (builtin_stmt, 1);
+  else
+    func_addr = gimple_call_arg (builtin_stmt, 2);
+
+  num_func_args
+    =  function_args_count (TREE_TYPE (TREE_OPERAND (func_addr, 0)));
+
+  get_function_args (builtin_stmt, num_func_args, &nargs, 3);
+
+  specialized_call_stmt = gimple_build_call_vec (func_addr, nargs);
+
+  lhs_stmt = gimple_call_lhs (builtin_stmt);
+
+  if (lhs_stmt != NULL_TREE)
+    gimple_call_set_lhs (specialized_call_stmt, lhs_stmt);
+
+  if (nargs != NULL)
+    VEC_free (tree, heap, nargs);
+
+  return specialized_call_stmt;
+}
+
+/* Given a call (GENERIC_STMT) to a function that is cloned, substitute
+   with a call to the correct clone. */
+
+static gimple
+make_specialized_call_to_clone (gimple generic_stmt, int side)
+{
+  tree new_decl;
+  char *new_name;
+  tree generic_fndecl;
+  gimple specialized_call_stmt;
+  void **slot;
+  int num_func_args;
+  tree lhs_stmt;
+  VEC (tree, heap) *nargs= NULL;
+
+  generic_fndecl = gimple_call_fndecl (generic_stmt);
+  gcc_assert (generic_fndecl != NULL);
+
+  if (side == 0)
+    new_name = make_name (generic_fndecl, "clone_0");
+  else
+    new_name = make_name (generic_fndecl, "clone_1");
+
+  slot = htab_find_slot_with_hash (name_decl_htab, new_name,
+                                   htab_hash_string (new_name), NO_INSERT);
+  gcc_assert (slot != NULL);
+  new_decl = (tree) *slot;
+  gcc_assert (new_decl);
+
+  num_func_args = function_args_count (TREE_TYPE (generic_fndecl));
+  get_function_args (generic_stmt, num_func_args, &nargs, 0);
+  specialized_call_stmt = gimple_build_call_vec (new_decl, nargs);
+
+  lhs_stmt = gimple_call_lhs (generic_stmt);
+
+  if (lhs_stmt != NULL_TREE)
+    gimple_call_set_lhs (specialized_call_stmt, lhs_stmt);
+
+  if (nargs != NULL)
+    VEC_free (tree, heap, nargs);
+
+  return specialized_call_stmt;
+}
+
+/* Returns true if STMT is a call to __builtin_dispatch and its
+   predicate feature-test function is marked with attribute
+   "version_selector". */
+
+static bool
+is_builtin_with_predicate_version_selector (gimple stmt)
+{
+  tree cond_func_addr, cond_func_decl;
+
+  gcc_assert (!flag_lto);
+ 
+  if (gimple_call_fndecl (stmt) != builtin_function_decl)
+    return false;
+
+  cond_func_addr = find_version_selector_func_addr (stmt);
+
+  if (cond_func_addr == NULL)
+    return false;
+
+  cond_func_decl = TREE_OPERAND (cond_func_addr, 0);
+  if (lookup_attribute ("version_selector",
+			DECL_ATTRIBUTES (cond_func_decl)) != NULL)
+    return true;
+
+  return false;
+}
+
+/* Find calls to __builtin_dispatch or to functions that are versioned
+   in CLONE_DECL and substitute the call with the correct version based
+   on the value of SIDE. */
+
+static void
+specialize_call (tree clone_decl, int side)
+{
+  basic_block bb;
+  tree old_current_function_decl;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (clone_decl));
+  current_function_decl = clone_decl;
+
+  /* Iterate over call edges and find out if there is
+     a call to __builtin_dispatch or a cloned function.
+     We cannot iterate over call graph edges as there are
+     no edges for the clones yet. */
+
+  FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (clone_decl))
+    {
+      gimple_stmt_iterator gsi;
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+        {
+	  tree callee_decl;
+	  struct cgraph_node *callee_node;
+	  gimple specialized_call_stmt;
+	  gimple stmt = gsi_stmt (gsi);
+
+	  if (!is_gimple_call (stmt))
+            continue;
+
+	  callee_decl = gimple_call_fndecl (stmt);
+
+	  if (callee_decl == NULL)
+            continue;
+
+	  callee_node = cgraph_node (callee_decl);
+
+	  /* For a __builtin_dispatch stmt, only specialize if
+             version_selector attribute is set. Otherwise, it is
+             not hoisted, so no specialization. */
+
+	  if (is_builtin_with_predicate_version_selector (stmt))
+	    {
+	      specialized_call_stmt =
+	        make_specialized_call_from_builtin (stmt, side);
+	    }
+	  else if  (callee_node->aux != NULL)
+            {
+	      specialized_call_stmt =
+	        make_specialized_call_to_clone (stmt, side);
+            }
+	  else
+	    continue;
+
+          if (dump_file)
+            {
+	      fprintf (dump_file, "Specialize stmt : \n");
+              print_gimple_stmt (dump_file, stmt, 0, TDF_VOPS);
+	      fprintf (dump_file, "Specialized stmt : \n");
+              print_gimple_stmt (dump_file, specialized_call_stmt,
+			         0, TDF_VOPS);
+            }
+
+	  gimple_set_block (specialized_call_stmt, gimple_block (stmt));
+	  gsi_insert_before_without_update (&gsi, specialized_call_stmt,
+                                            GSI_SAME_STMT);
+
+	  
+          unlink_stmt_vdef (stmt);
+	  gsi_remove (&gsi, true); 
+	  mark_symbols_for_renaming (specialized_call_stmt);
+
+	  /* After removing make sure gsi is set correctly to not skip
+	     a statememt. */
+          gsi = gsi_for_stmt (specialized_call_stmt);
+	}
+    }
+  current_function_decl = old_current_function_decl;
+  pop_cfun ();
+}
+
+/* When a function is version cloned, its body is replaced to call one
+   of the versions with the feature-test function acting as a predicate.
+   This is done with __builtin_dispatch which is later expanded. */
+
+static gimple
+make_builtin_call_to_clones (tree orig_fndecl, tree clone_0_addr,
+                             tree clone_1_addr, tree cond_func_addr)
+{
+  gimple new_builtin_call;
+  VEC(tree, heap) *vargs = VEC_alloc (tree, heap, 4);
+  tree arg;
+
+  VEC_quick_push (tree, vargs, cond_func_addr);
+  VEC_quick_push (tree, vargs, clone_0_addr);
+  VEC_quick_push (tree, vargs, clone_1_addr);
+
+  for (arg = DECL_ARGUMENTS (orig_fndecl); arg; arg = TREE_CHAIN (arg))
+    {
+      VEC_safe_push (tree, heap, vargs, arg);
+      /* Again, this add_referenced_var is very very important.  It broke
+         a build where a cloned function's arguments where never
+         referenced.  Missing this statement in places asserts at
+         tree-dfa.c:589, in function referenced_var_lookup at
+         "gcc_assert (h || uid == 0);" and is very difficult to triage. */
+      add_referenced_var (arg);
+    }
+
+  new_builtin_call = gimple_build_call_vec (builtin_function_decl, vargs);
+  mark_symbols_for_renaming (new_builtin_call);
+
+
+  if (dump_file)
+    print_gimple_stmt (dump_file, new_builtin_call, 0, TDF_VOPS);
+
+  VEC_free (tree, heap, vargs);
+
+  return new_builtin_call;
+}
+
+/* This clones a dispatch function whose callee-graph path has a function
+   which calls __builtin_dispatch.  This function is cloned and the
+   original function branches to the right clone. */
+
+static int
+clone_and_dispatch_function (struct cgraph_node *orig_node, tree *clone_0,
+			     tree *clone_1)
+{
+  tree clone_0_decl, clone_1_decl;
+  gimple new_builtin_call = NULL;
+  gimple new_return_stmt = NULL;
+  gimple_seq seq = NULL;
+  basic_block new_bb;
+  tree orig_fndecl;
+  tree return_var = NULL;
+  tree return_type;
+  tree old_current_function_decl;
+
+  old_current_function_decl = current_function_decl;
+  orig_fndecl = orig_node->decl;
+  push_cfun (DECL_STRUCT_FUNCTION (orig_fndecl));
+  current_function_decl = orig_fndecl;
+
+  /* Make 2 clones for true and false function. */
+  clone_0_decl = clone_function (orig_fndecl, "clone_0");
+  clone_1_decl = clone_function (orig_fndecl, "clone_1");
+  *clone_0 = clone_0_decl;
+  *clone_1 = clone_1_decl;
+
+  new_bb = empty_function_body (orig_fndecl);
+
+  new_builtin_call = make_builtin_call_to_clones (
+		       orig_fndecl,
+                       build_fold_addr_expr (clone_0_decl),
+		       build_fold_addr_expr (clone_1_decl),
+		       build_fold_addr_expr ((tree)orig_node->aux));
+
+  return_type = TREE_TYPE (TREE_TYPE (orig_fndecl));
+
+  if (!TREE_ADDRESSABLE (return_type) && COMPLETE_TYPE_P (return_type))
+    {
+      tree tmp_var;
+      tmp_var = create_tmp_var (return_type, NULL);
+      add_referenced_var (tmp_var);
+      return_var = make_ssa_name (tmp_var, new_builtin_call);
+      gimple_call_set_lhs (new_builtin_call, return_var);
+    }
+
+  mark_symbols_for_renaming (new_builtin_call);
+  new_return_stmt = gimple_build_return (return_var);
+  mark_symbols_for_renaming (new_return_stmt);
+  gimple_seq_add_stmt (&seq, new_builtin_call);
+  gimple_seq_add_stmt (&seq, new_return_stmt);
+  set_bb_seq (new_bb, seq);
+  gimple_set_bb (new_builtin_call, new_bb);
+  gimple_set_bb (new_return_stmt, new_bb);
+
+  gimple_set_block (new_builtin_call, DECL_INITIAL (orig_fndecl));
+  gimple_set_block (new_return_stmt, DECL_INITIAL (orig_fndecl));
+
+  if (dump_file)
+    dump_function_to_file (orig_fndecl, dump_file, TDF_BLOCKS);
+
+  /* This update_ssa is necessary here for the following reason.  SSA uses
+     a global syms_to_rename bitmap that stores syms that must be renamed.
+     So, if we accumulate the syms from one function in IPA but move to
+     a different function without updating SSA, then we could be
+     accumulating syms from many functions.  This would assert in
+     referenced_var_lookup because the hashtab storing the syms is
+     function local. This is horrible. gcc-4.6 makes this bitmap a
+     global. */
+  update_ssa (TODO_update_ssa);
+
+  compute_inline_parameters (cgraph_node (orig_fndecl));
+  DECL_DECLARED_INLINE_P (orig_fndecl) = 1;
+  DECL_UNINLINABLE (orig_fndecl) = 0;
+  current_function_decl = old_current_function_decl;
+  pop_cfun ();
+  return 0;
+}
+
+/* Clone all functions marked for cloning by the earlier phase. */
+
+static void
+perform_cloning_phase (void)
+{
+  struct cgraph_node *node;
+  int ix;
+  VEC (tree, heap) *cloned_decl_list = NULL;
+  tree cloned_decl = NULL;
+
+  cloned_decl_list = VEC_alloc (tree, heap, 2);
+
+  /* First clone, then specialize the clones. */
+  for (node = cgraph_nodes; node; node = node->next)
+    {
+      tree clone_0_decl, clone_1_decl;
+      if (node->aux == NULL)
+        continue;
+      if (dump_file)
+      {
+        fprintf (dump_file, "%s will be cloned\n", cgraph_node_name (node));
+        dump_function_to_file (node->decl, dump_file, TDF_BLOCKS);
+      }
+      clone_and_dispatch_function (node, &clone_0_decl, &clone_1_decl);
+      VEC_safe_push (tree, heap, cloned_decl_list, clone_0_decl);
+      VEC_safe_push (tree, heap, cloned_decl_list, clone_1_decl);
+      continue;
+    }
+
+  /* Specialize the clones now. */
+  for (ix = 0; VEC_iterate (tree, cloned_decl_list, ix, cloned_decl); ++ix)
+    {
+      int which_clone = ix % 2;
+      specialize_call (cloned_decl, which_clone);
+    }
+
+  VEC_free (tree, heap, cloned_decl_list);
+}
+
+/* End Cloning phase. */
+
+/* Checks if there is atleast one call to __builtin_dispatch. */
+
+static bool
+find_builtin_decl (void)
+{
+  struct cgraph_node *node;
+  for (node = cgraph_nodes; node; node = node->next)
+    {
+      if (strstr (cgraph_node_name (node), "__builtin_dispatch") != NULL)
+        {
+          builtin_function_decl = node->decl;
+          return true;
+        }
+    }
+  return false;
+}
+
+/* Set the aux fields of all nodes and edges in the call graph to be NULL. */
+
+static void
+cleanup_aux_field (void)
+{
+  struct cgraph_node *node;
+  struct cgraph_edge *edge;
+
+  for (node = cgraph_nodes; node; node = node->next)
+    {
+      node->aux = NULL;
+      for (edge = node->callees; edge; edge = edge->next_callee)
+        edge->aux = NULL;
+    }
+}
+
+/* Main driver function. It scans the __builtin_dispatch calls and
+   figures out which functions to clone.  It then clones the functions. */
+
+static unsigned int
+builtin_dispatch_ipa_clone (void)
+{
+  cleanup_aux_field ();
+
+  /* Allocate hashtab mapping name to decl. */
+  name_decl_htab = htab_create (10, name_decl_htab_hash_descriptor,
+			        name_decl_htab_eq_descriptor, NULL);
+
+  /* Turn it on for O1 and above.  At -O0, there is a SSA alias bug
+     with create_tmp_var.  Cloning and hoisting is not necessary at
+     -O0 anyways.  Also, guard it with the flag
+     "-fclone-hot-version-paths". 
+     Disabled for LTO as it needs more work. */
+  if (optimize == 0
+      || profile_arc_flag
+      || !flag_clone_hot_version_paths
+      || flag_lto)
+    return 0;
+
+  if (!find_builtin_decl ())
+    return 0;
+
+  gcc_assert (builtin_function_decl != NULL);
+
+  if (!perform_analysis_phase ())
+    {
+      cleanup_aux_field ();
+      return 0;
+    }
+
+  if (decide_cloning_phase ())
+    perform_cloning_phase ();
+
+  cleanup_aux_field ();
+
+  return 0;
+}
+
+static bool
+gate_handle_builtin_dispatch (void)
+{
+  return true;
+}
+
+struct simple_ipa_opt_pass pass_ipa_multiversion_dispatch =
+{
+ {
+  SIMPLE_IPA_PASS,
+  "multiversion_dispatch",		/* name */
+  gate_handle_builtin_dispatch,		/* gate */
+  builtin_dispatch_ipa_clone,           /* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_MVERSN_DISPATCH,			/* tv_id */
+  0,	                                /* properties_required */
+  PROP_cfg,				/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_dump_func |			/* todo_flags_finish */
+  TODO_update_ssa
+ }
+};
+
+/* Lowering of the __builtin_dispatch calls. */
+
+
+/* This function converts STMT which is a __builtin_dispatch
+   call of the form :
+   ret = __builtin_dispatch (predicate, foo, bar, arg1, ...)
+   into :
+   var_1 = predicate
+   if  (var_1)
+     var_2 = foo (arg1, ...);
+   else
+     var_3 = bar (arg1, ...);
+   var_4 = phi (var_2, var_3)
+   ret = var_4
+
+   var_? are ssa names for variable var.
+*/
+
+static unsigned int
+convert_builtin_dispatch (gimple stmt)
+{
+  tree cond_func_addr, if_func_addr, else_func_addr;
+  tree cond_func_decl = NULL;
+  gimple if_part, else_part, if_else_stmt;
+  basic_block bb1, bb2, bb3, bb4;
+  gimple bb1end, bb2end, bb3end;
+  edge e12, e13, e23, e24, e34;
+  VEC(tree, heap) *nargs = NULL;
+  int num_func_args = 0, i;
+  tree version_selector_var;
+  tree lhs_result;
+  gimple_stmt_iterator gsi;
+  basic_block bb;
+  gimple feature_test_call = NULL;
+  tree tmp_var = NULL;
+  gimple init_stmt = NULL;
+  tree ssa_if_name, ssa_else_name;
+  gimple phinode = NULL;
+  tree tmp_result_var, ssa_result_var;
+
+  gsi = gsi_for_stmt (stmt);
+  bb = gsi_bb (gsi);
+
+  cond_func_addr = find_version_selector_func_addr (stmt);
+  if (cond_func_addr != NULL)
+    {
+      cond_func_decl = TREE_OPERAND (cond_func_addr, 0);
+      gcc_assert (cond_func_decl);
+    }
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "Before Converting __builtin_dispatch :\n");
+      dump_function_to_file (current_function_decl, dump_file, TDF_BLOCKS);
+    }
+
+  if_func_addr = gimple_call_arg (stmt, 1);
+  else_func_addr = gimple_call_arg (stmt, 2);
+
+  tmp_result_var = create_tmp_var (integer_type_node, NULL);
+  add_referenced_var (tmp_result_var);
+
+  if (flag_lto
+      || cond_func_decl == NULL
+      || lookup_attribute ("version_selector",
+                           DECL_ATTRIBUTES (cond_func_decl)) == NULL)
+    {
+      tree arg = gimple_call_arg (stmt, 0);
+      /* This means the feature-test function is not set with attribute
+         version_selector or it is a function pointer or in LTO. So,
+         explicitly call it. */
+      feature_test_call = gimple_build_call (arg, 0);
+      ssa_result_var = make_ssa_name (tmp_result_var, feature_test_call);
+      gimple_call_set_lhs (feature_test_call, ssa_result_var);
+      mark_symbols_for_renaming (feature_test_call);
+      version_selector_var = ssa_result_var;
+    }
+  else
+    {
+      /* Get the global corresponding to the "version_selector" function. */
+      version_selector_var
+        = handle_version_selector_attr_function (cond_func_decl);
+      gcc_assert (version_selector_var);
+      add_referenced_var (version_selector_var);
+      feature_test_call = gimple_build_assign (tmp_result_var,
+                                               version_selector_var);
+      ssa_result_var = make_ssa_name (tmp_result_var, feature_test_call);
+      gimple_assign_set_lhs (feature_test_call, ssa_result_var);
+      mark_symbols_for_renaming (feature_test_call);
+      version_selector_var = ssa_result_var;
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR,
+                                    version_selector_var,
+                                    integer_zero_node,
+                                    NULL_TREE, NULL_TREE);
+
+  mark_symbols_for_renaming (if_else_stmt);
+
+  num_func_args =  function_args_count (
+    TREE_TYPE (TREE_OPERAND (if_func_addr, 0)));
+
+  nargs = VEC_alloc (tree, heap, num_func_args);
+
+  /* The arguments to the feature test function start from the 4th argument
+     in __builtin_dispatch.  The first 3 arguments are mandatory. */
+
+  for (i = 0; i <= num_func_args - 2; ++i)
+    VEC_quick_push (tree, nargs,
+                    gimple_call_arg (stmt, (3 + i)));
+
+  if_part = gimple_build_call_vec (if_func_addr, nargs);
+  else_part = gimple_build_call_vec (else_func_addr, nargs);
+
+  lhs_result = gimple_call_lhs (stmt);
+
+  if (lhs_result != NULL_TREE)
+    {
+      tree ssa_var;
+      tree return_type;
+      return_type = TREE_TYPE (lhs_result);
+      tmp_var = create_tmp_var (return_type, NULL);
+      add_referenced_var (tmp_var);
+
+      init_stmt = gimple_build_assign (tmp_var, build_zero_cst (return_type));
+      ssa_var = make_ssa_name (tmp_var, init_stmt);
+      gimple_assign_set_lhs (init_stmt, ssa_var);
+      mark_symbols_for_renaming (init_stmt);
+
+      ssa_if_name = make_ssa_name (tmp_var, init_stmt);
+      ssa_else_name = make_ssa_name (tmp_var, init_stmt);
+      gimple_call_set_lhs (if_part, ssa_if_name);
+      gimple_call_set_lhs (else_part, ssa_else_name);
+    }
+  mark_symbols_for_renaming (if_part);
+  mark_symbols_for_renaming (else_part);
+
+  /* Set the lexical block to be the same as the dispatch call. */
+  gcc_assert (feature_test_call);
+  gimple_set_block (feature_test_call, gimple_block (stmt));
+
+  if (init_stmt)
+    gimple_set_block (init_stmt, gimple_block (stmt));
+
+  gimple_set_block (if_else_stmt, gimple_block (stmt));
+  gimple_set_block (if_part, gimple_block (stmt));
+  gimple_set_block (else_part, gimple_block (stmt));
+
+  gsi_insert_before_without_update (&gsi, feature_test_call, GSI_SAME_STMT);
+  gimple_set_bb (feature_test_call, bb);
+
+  if (init_stmt)
+    {
+      gsi_insert_before_without_update (&gsi, init_stmt,
+                                        GSI_SAME_STMT);
+      gimple_set_bb (init_stmt, bb);
+    }
+
+  gsi_insert_before_without_update (&gsi, if_else_stmt, GSI_SAME_STMT);
+  gsi_insert_before_without_update (&gsi, if_part, GSI_SAME_STMT);
+  gsi_insert_before_without_update (&gsi, else_part, GSI_SAME_STMT);
+
+  /* Remove the builtin_dispatch call after the expansion. */
+  unlink_stmt_vdef (stmt);
+  gsi_remove (&gsi, true);
+
+  bb1end = if_else_stmt;
+  bb2end = if_part;
+  bb3end = else_part;
+  bb1 = bb;
+  e12 = split_block (bb1, bb1end);
+  bb2 = e12->dest;
+  e23 = split_block (bb2, bb2end);
+  bb3 = e23->dest;
+  e34 = split_block (bb3, bb3end);
+  bb4 = e34->dest;
+
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+  e13 = make_edge (bb1, bb3, EDGE_FALSE_VALUE);
+  gcc_assert (e13);
+  e24 = make_edge (bb2, bb4, EDGE_FALLTHRU);
+  gcc_assert (e24);
+  remove_edge (e23);
+
+  if (tmp_var)
+    {
+      gimple assign_stmt;
+      phinode = create_phi_node (tmp_var, bb4);
+      add_phi_arg (phinode, ssa_if_name, e24, UNKNOWN_LOCATION);
+      add_phi_arg (phinode, ssa_else_name, e34, UNKNOWN_LOCATION);
+      mark_symbols_for_renaming (phinode);
+      gcc_assert (lhs_result);
+      assign_stmt
+        = gimple_build_assign (lhs_result, gimple_phi_result (phinode));
+      mark_symbols_for_renaming (assign_stmt);
+      gsi = gsi_start_bb (bb4);
+      gsi_insert_before_without_update (&gsi, assign_stmt, GSI_SAME_STMT);
+      gimple_set_bb (assign_stmt, bb4);
+    }
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "Converted __builtin_dispatch :\n");
+      dump_function_to_file (current_function_decl, dump_file, TDF_BLOCKS);
+    }
+
+  return 0;
+}
+
+/* This function does two things.
+
+   1) For a feature-test function marked with attribute "version_selector",
+      it creates a constructor that calls the feature-test function and a
+      global that holds the result.  The global's result will be used
+      to lower any __builtin_dispatch statement that refers to this feature
+      test function.  The __builtin_dispatch statement and the feature test
+      function can be in different modules.
+
+   2) It lowers __builtin_dispatch statements. */
+
+static unsigned int
+do_convert_builtin_dispatch (void)
+{
+  basic_block bb;
+  gimple_stmt_iterator gsi;
+  VEC (gimple, heap) *builtin_stmt_list = NULL;
+  int ix;
+  gimple builtin_stmt;
+
+  /* Allocate hashtab mapping name to decl. */
+  if (name_decl_htab == NULL)
+    name_decl_htab = htab_create (10, name_decl_htab_hash_descriptor,
+		                  name_decl_htab_eq_descriptor, NULL);
+
+  /* Look for functions with attribute "version_selector" and make a
+     constructor which calls the function and saves the result in a
+     global.  Disabled for LTO as it needs more work. */
+
+  if (!flag_lto
+      && lookup_attribute ("version_selector",
+                           DECL_ATTRIBUTES (current_function_decl)) != NULL)
+    {
+      if (dump_file)
+        fprintf (dump_file, "Function with version_selector attribute found :"
+                 " %s.  Making constructor for it.\n",
+		 current_function_name ());
+
+      handle_version_selector_attr_function (current_function_decl);
+      /* Assume there are no __builtin_dispatch calls in feature test
+	 functions.  So it is safe to return. */
+      return 0;
+    }
+
+  /* Find and lower __builtin_dispatch calls. */
+
+  FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (current_function_decl))
+    {
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+        {
+          gimple stmt = gsi_stmt (gsi);
+          tree call_decl;
+
+          if (!is_gimple_call (stmt))
+            continue;
+
+          call_decl = gimple_call_fndecl (stmt);
+
+	  if (call_decl == NULL)
+            continue;
+
+	  if (DECL_NAME (call_decl) == NULL_TREE)
+	    continue;
+
+	  if (strstr (IDENTIFIER_POINTER (DECL_NAME (call_decl)),
+                      "__builtin_dispatch") == NULL)
+            continue;
+
+	  if (dump_file)
+            {
+	      fprintf (dump_file, "Converting __builtin_dispatch stmt in:%s\n",
+		       current_function_name ());
+              print_gimple_stmt (dump_file, stmt, 0, TDF_VOPS);
+            }
+
+	  if (builtin_stmt_list == NULL)
+              builtin_stmt_list = VEC_alloc (gimple, heap, 2);
+
+          gcc_assert (builtin_stmt_list != NULL);
+	  VEC_safe_push (gimple, heap, builtin_stmt_list, stmt);
+        }
+    }
+
+  if (!builtin_stmt_list)
+    return 0;
+ 
+  for (ix = 0; VEC_iterate (gimple, builtin_stmt_list, ix, builtin_stmt);
+       ++ix)
+    convert_builtin_dispatch (builtin_stmt);
+
+  compute_inline_parameters (cgraph_node (current_function_decl));
+ 
+  VEC_free (gimple, heap, builtin_stmt_list); 
+  
+  return 0;
+}
+
+static bool
+gate_convert_builtin_dispatch (void)
+{
+  return true;
+}
+
+struct gimple_opt_pass pass_tree_convert_builtin_dispatch =
+{
+ {
+  GIMPLE_PASS,
+  "convert_builtin_dispatch",	        /* name */
+  gate_convert_builtin_dispatch,	/* gate */
+  do_convert_builtin_dispatch,		/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_MVERSN_DISPATCH,			/* tv_id */
+  PROP_cfg,				/* properties_required */
+  PROP_cfg,				/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_dump_func |			/* todo_flags_finish */
+  TODO_cleanup_cfg | TODO_dump_cgraph |
+  TODO_update_ssa | TODO_verify_ssa
+ }
+};
Index: timevar.def
===================================================================
--- timevar.def	(revision 173381)
+++ timevar.def	(working copy)
@@ -106,6 +106,7 @@  DEFTIMEVAR (TV_LEX		     , "lexical analysis")
 DEFTIMEVAR (TV_PARSE                 , "parser")
 DEFTIMEVAR (TV_NAME_LOOKUP           , "name lookup")
 DEFTIMEVAR (TV_INLINE_HEURISTICS     , "inline heuristics")
+DEFTIMEVAR (TV_MVERSN_DISPATCH       , "multiversion dispatch")
 DEFTIMEVAR (TV_INTEGRATION           , "integration")
 DEFTIMEVAR (TV_TREE_GIMPLIFY	     , "tree gimplify")
 DEFTIMEVAR (TV_TREE_EH		     , "tree eh")
Index: common.opt
===================================================================
--- common.opt	(revision 173381)
+++ common.opt	(working copy)
@@ -2205,6 +2205,10 @@  ftree-builtin-call-dce
 Common Report Var(flag_tree_builtin_call_dce) Init(0) Optimization
 Enable conditional dead code elimination for builtin calls
 
+fclone-hot-version-paths
+Common Report Var(flag_clone_hot_version_paths) Init(0)
+Enable cloning and hoisting of hot multiversioned paths
+
 fwhole-program
 Common Report Var(flag_whole_program) Init(0) Optimization
 Perform whole program optimizations
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 173381)
+++ Makefile.in	(working copy)
@@ -1304,6 +1304,7 @@  OBJS-common = \
 	mcf.o \
 	mode-switching.o \
 	modulo-sched.o \
+	mversn-dispatch.o \
 	omega.o \
 	omp-low.o \
 	optabs.o \
@@ -3100,6 +3101,11 @@  implicit-zee.o : implicit-zee.c $(CONFIG_H) $(SYST
    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
+mversn-dispatch.o : mversn-dispatch.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
+   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
+   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
+   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H)
 gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
Index: passes.c
===================================================================
--- passes.c	(revision 173381)
+++ passes.c	(working copy)
@@ -796,6 +796,15 @@  init_optimization_passes (void)
     }
   NEXT_PASS (pass_ipa_increase_alignment);
   NEXT_PASS (pass_ipa_matrix_reorg);
+  NEXT_PASS (pass_ipa_multiversion_dispatch);
+    {
+      struct opt_pass **p = &pass_ipa_multiversion_dispatch.pass.sub;
+      NEXT_PASS (pass_tree_convert_builtin_dispatch);
+      /* Rebuilding cgraph edges is necessary as the above passes change
+         the call graph.  Otherwise, future optimizations use the old
+	 call graph and make wrong decisions sometimes.*/
+      NEXT_PASS (pass_rebuild_cgraph_edges);
+    }
   NEXT_PASS (pass_ipa_lower_emutls);
   *p = NULL;
 
Index: params.def
===================================================================
--- params.def	(revision 173381)
+++ params.def	(working copy)
@@ -950,6 +950,17 @@  DEFPARAM (CXX_MAX_NAMESPACES_FOR_DIAGNOSTIC_HELP,
 	  "name lookup fails",
 	  1000, 0, 0)
 
+DEFPARAM (PARAM_NUMBER_OF_MVERSN_CLONES,
+          "num-mversn-clones",
+	  "maximum number of functions to be cloned while doing "
+          "multiversioning",
+	  10, 0, 1000)
+
+DEFPARAM (PARAM_MVERSN_CLONE_CGRAPH_DEPTH,
+          "mversn-clone-depth",
+	  "maximum length of the call graph path to be cloned "
+          "while doing multiversioning",
+	  2, 0, 5)
 /*
 Local variables:
 mode:c