
Add attribute((target_clone(...))) to PowerPC

Message ID 20170525200539.GA13410@ibm-tiger.the-meissners.org
State New

Commit Message

Michael Meissner May 25, 2017, 8:05 p.m. UTC
On Thu, May 25, 2017 at 09:56:20PM +0200, Florian Weimer wrote:
> On Thu, May 25, 2017 at 8:25 PM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
> > This patch adds the initial attribute((target_clone(...))) support to the
> 
> Patch seems to be missing.
> 
> Florian
> 

Sorry about that.

This patch adds the initial __attribute__((target_clones(...))) support to the
PowerPC.  It looks at the HWCAP bits for ISA 2.05 (power6), ISA 2.06 (power7),
ISA 2.07 (power8) and ISA 3.0 (power9) to determine which clone function to
run.  The implementation uses the existing i386/x86_64 support for
target_clones as a template.
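
For readers who have not used the attribute, here is a minimal usage sketch.
The "cpu=power9,default" string is the one from the extend.texi hunk of this
patch.  The function scale_sum and the opt-in macro DEMO_USE_TARGET_CLONES are
made up for illustration; the attribute is only applied when that macro is
defined on a PowerPC build, on the assumption that it needs a GCC carrying this
patch and a C library with ifunc support.

```cpp
#include <cassert>
#include <cstddef>

// DEMO_USE_TARGET_CLONES is a hypothetical opt-in macro for this sketch.
// Without it the attribute expands to nothing and a single plain version
// of the function is built, so the file compiles on any target.
#if defined(DEMO_USE_TARGET_CLONES) && defined(__powerpc__)
#define DEMO_CLONES __attribute__((target_clones("cpu=power9,default")))
#else
#define DEMO_CLONES
#endif

// With the attribute active, the compiler is expected to emit one power9
// clone and one default clone of this function, and to pick between them
// at run time.
DEMO_CLONES
long scale_sum(const int *v, std::size_t n, int k)
{
  assert(v != nullptr || n == 0);
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += static_cast<long>(v[i]) * k;
  return sum;
}
```

Callers use scale_sum like any other function; the clone selection is meant to
be invisible at the call site.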

At the moment, it has the same basic flaw as the i386/x86_64 implementation:
outside of the current module, only the default version of the function is
exported.  Only the module in which the function is defined can call the
different target clones.  I hope to add support in the future to make the
exported function be the ifunc handler and not the default version.  However,
I wanted to get the basic framework into the compiler before tackling that
issue.

I have tested these patches on a little endian power8 system and there were no
regressions.  Can I install it into the trunk?

[gcc]
2017-05-24  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (toplevel): Include attribs.h.
	(enum clone_list): New enumeration to give the target clones
	processors we generate code for.
	(rs6000_clone_map): New array to identify which clone processors
	the current program is running on.
	(TARGET_COMPARE_VERSION_PRIORITY): Define to enable the
	target_clone attribute.
	(TARGET_GENERATE_VERSION_DISPATCHER_BODY): Likewise.
	(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): Likewise.
	(TARGET_OPTION_FUNCTION_VERSIONS): Likewise.
	(cpu_expand_builtin): Add support for target_clone attribute.
	(rs6000_valid_attribute_p): Allow "default" attribute.
	(get_decl_name): New debug function to simplify printing the
	current function name in debugging statements.
	(rs6000_clone_priority): New functions to support the target_clone
	attribute, and be able to generate code to switch between ISA 2.05
	through ISA 3.0 (power6 through power9).
	(rs6000_compare_version_priority): Likewise.
	(rs6000_get_function_versions_dispatcher): Likewise.
	(make_resolver_func): Likewise.
	(add_condition_to_bb): Likewise.
	(dispatch_function_versions): Likewise.
	(rs6000_generate_version_dispatcher_body): Likewise.
	(rs6000_can_inline_p): Call get_decl_name for debugging usage.
	* doc/extend.texi (Common Function Attributes): Document that the
	PowerPC supports the target_clone attribute.

[gcc/testsuite]
2017-05-24  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/clone1.c: New test.

Comments

Segher Boessenkool May 30, 2017, 9:51 p.m. UTC | #1
Hi Mike,

On Thu, May 25, 2017 at 04:05:39PM -0400, Michael Meissner wrote:
> +/* On PowerPC, we have a limited number of target clones that we care about
> +   which means we can use an array to hold the options, rather than having more
> +   elaborate data structures to identify each possible variation.  Order the
> +   clones from the highest ISA to the least.  */
> +enum clone_list {
> +  CLONE_ISA_3_00,		/* ISA 3.00 (power9).  */
> +  CLONE_ISA_2_07,		/* ISA 2.07 (power8).  */
> +  CLONE_ISA_2_06,		/* ISA 2.06 (power7).  */
> +  CLONE_ISA_2_05,		/* ISA 2.05 (power6).  */
> +  CLONE_DEFAULT,		/* default clone.  */
> +  CLONE_MAX
> +};

Is this easier than the more natural ordering (from default to higher)?
Also, since you use the enum values as numbers, please make the first
one explicitly "= 0".  These go together: default 0 is nice to have.
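
One possible reading of this suggestion is sketched below.  The enumerator
names are taken from the patch, but the reordering, the pick_clone helper and
its two bool arrays are hypothetical and are not part of the posted code.  The
sketch only shows that with the default at 0 the selection loop has to count
down from the highest ISA.

```cpp
#include <cassert>

// Hypothetical reordering of the patch's clone_list: default first, with
// an explicit "= 0", then increasing ISA levels.
enum clone_list {
  CLONE_DEFAULT = 0,	// default clone
  CLONE_ISA_2_05,	// ISA 2.05 (power6)
  CLONE_ISA_2_06,	// ISA 2.06 (power7)
  CLONE_ISA_2_07,	// ISA 2.07 (power8)
  CLONE_ISA_3_00,	// ISA 3.00 (power9)
  CLONE_MAX
};

// present[i]   : a clone for level i was compiled.
// supported[i] : the running CPU supports level i.
// Returns the highest level that is both present and supported, falling
// back to the default clone.
int pick_clone(const bool present[CLONE_MAX], const bool supported[CLONE_MAX])
{
  assert(present[CLONE_DEFAULT]);
  for (int i = CLONE_MAX - 1; i > CLONE_DEFAULT; --i)
    if (present[i] && supported[i])
      return i;
  return CLONE_DEFAULT;
}
```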

> +static const struct clone_map rs6000_clone_map[ (int)CLONE_MAX ] = {

Space after cast; no spaces inside [].

> +static inline const char *
> +get_decl_name (tree fn)

Please don't use inline unless there is a good reason to.

> +  if (TARGET_DEBUG_TARGET)
> +    fprintf (stderr, "rs6000_get_function_version_priority (%s) => %d\n",
> +	     get_decl_name (fndecl), (int) ret);

"ret" already is an int.  Similarly, are the casts of the enum values
necessary?

> +  struct cgraph_function_version_info *default_version_info = NULL;

You always initialise this variable later on; don't set it to NULL
earlier.  You can move the declaration down to where the var is first
initialised.

> +  tree dispatch_decl = NULL;

For this one, you can put it inside the if (), and just explicitly
return NULL on the error path (you do that in one case already).

> +#if defined (ASM_OUTPUT_TYPE_DIRECTIVE)

Is this the correct conditional to use?  It is not obvious to me why
it would be.  Does it have to be an #ifdef anyway, can't it be an if?

> +  if (targetm.has_ifunc_p ())
> +    {
> +      struct cgraph_function_version_info *it_v = NULL;
> +      struct cgraph_node *dispatcher_node = NULL;
> +      struct cgraph_function_version_info *dispatcher_version_info = NULL;

No NULL for these either please.  If you later add a path where you
forget to initialise one of these vars you will not get a warning
(and if nothing goes wrong these initialisations are distracting noise).

> +/* Make the resolver function decl to dispatch the versions of
> +   a multi-versioned function,  DEFAULT_DECL.  Create an

One space after comma.

> +  /* The resolver function should return a (void *). */

And two after a dot.

> +  gcc_assert (dispatch_decl != NULL);
> +  /* Mark dispatch_decl as "ifunc" with resolver as resolver_name.  */
> +  DECL_ATTRIBUTES (dispatch_decl)
> +    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));

That assert is not very useful: the very next statement would segfault
if the assertion fails, giving just as much information.

> +  /* Create the alias for dispatch to resolver here.  */
> +  /*cgraph_create_function_alias (dispatch_decl, decl);*/

Do you need to keep this line?  Please add a comment saying why it is
disabled for now, or such.

> +  gcc_assert (new_bb != NULL);
> +  gseq = bb_seq (new_bb);

Same as before.

> +  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
> +	     		 build_fold_addr_expr (version_decl));

Indent is broken here.

> +  result_var = create_tmp_var (ptr_type_node);
> +  convert_stmt = gimple_build_assign (result_var, convert_expr); 

Space at end of line.

> +  if (clone_isa == (int)CLONE_DEFAULT)

Space after cast.  Do you need a cast here?

> +  predicate_decl = rs6000_builtin_decls [(int) RS6000_BUILTIN_CPU_SUPPORTS];

You don't need a cast here either afaics.

> +  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 

Space at end of line.

> +  /* The first version in the vector is the default decl.  */
> +  memset ((void *) clones, '\0', sizeof (clones));

memset (clones, 0, sizeof clones);

or just initialise it in the first place:

tree clones[CLONE_MAX] = { 0 };
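
A small stand-alone check of the two forms, as a sketch only: tree_stub is a
stand-in for GCC's tree type (a pointer), the helper names are invented, and
the constant 5 mirrors the number of enumerators before CLONE_MAX in the patch.
It assumes, as GCC's hosts do, that an all-zero-bytes pointer is a null
pointer.

```cpp
#include <cassert>
#include <cstring>

const int CLONE_MAX = 5;	// same count as the patch's enum
typedef void *tree_stub;	// stand-in for GCC's pointer type `tree'

// Return true if the first N entries of V are all null.
bool all_null(const tree_stub *v, int n)
{
  assert(n >= 0);
  for (int i = 0; i < n; ++i)
    if (v[i] != nullptr)
      return false;
  return true;
}

// Build the array both ways and report whether the results agree.
bool init_forms_agree()
{
  tree_stub a[CLONE_MAX];
  std::memset(a, 0, sizeof a);		// clear after declaration

  tree_stub b[CLONE_MAX] = { 0 };	// initialise in the declaration

  return all_null(a, CLONE_MAX) && all_null(b, CLONE_MAX);
}
```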

> +  /* On the PowerPC, we do not need to call __builtin_cpu_init, if we are using
> +     a new enough glibc.  If we ever need to call it, we would need to insert
> +     the code here to do the call.  */

Are we always using a new enough glibc?  If so, please clarify the
comment.

> +static tree 
> +rs6000_generate_version_dispatcher_body (void *node_p)

Trailing space.

> +  node = (cgraph_node *)node_p;

Space after cast.

> +On a PowerPC, you could compile a function with
> +@code{target_clones("cpu=power9,default")}.  GCC creates two function

"For PowerPC you can ..."?

> --- gcc/testsuite/gcc.target/powerpc/clone1.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/clone1.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 248446)
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */

s/powerpc64/powerpc/

Looks good so far, just needs some polish ;-)  Please consider changing
the clone_list enum to a more natural order (and does the enum need a
name, anyway?), tidy up layout stuff etc., and repost.

Thanks,


Segher

Michael Meissner May 30, 2017, 11:39 p.m. UTC | #2
On Tue, May 30, 2017 at 04:51:34PM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Thu, May 25, 2017 at 04:05:39PM -0400, Michael Meissner wrote:
> > +/* On PowerPC, we have a limited number of target clones that we care about
> > +   which means we can use an array to hold the options, rather than having more
> > +   elaborate data structures to identify each possible variation.  Order the
> > +   clones from the highest ISA to the least.  */
> > +enum clone_list {
> > +  CLONE_ISA_3_00,		/* ISA 3.00 (power9).  */
> > +  CLONE_ISA_2_07,		/* ISA 2.07 (power8).  */
> > +  CLONE_ISA_2_06,		/* ISA 2.06 (power7).  */
> > +  CLONE_ISA_2_05,		/* ISA 2.05 (power6).  */
> > +  CLONE_DEFAULT,		/* default clone.  */
> > +  CLONE_MAX
> > +};
> 
> Is this easier than the more natural ordering (from default to higher)?
> Also, since you use the enum values as numbers, please make the first
> on explicitly "= 0".  These go together: default 0 is nice to have.

It is easier to write the loops going up, but I have changed it to const ints
and deleted the enum.

> > +static const struct clone_map rs6000_clone_map[ (int)CLONE_MAX ] = {
> 
> Space after cast; no spaces inside [].

Yep.

> > +static inline const char *
> > +get_decl_name (tree fn)
> 
> Please don't use inline unless there is a good reason to.

Ok.

> > +  if (TARGET_DEBUG_TARGET)
> > +    fprintf (stderr, "rs6000_get_function_version_priority (%s) => %d\n",
> > +	     get_decl_name (fndecl), (int) ret);
> 
> "ret" already is an int.  Similarly, are the casts of the enum values
> necessary?

Yep.

> > +  struct cgraph_function_version_info *default_version_info = NULL;
> 
> You always initialise this variable later on; don't set it to NULL
> earlier.  You can move the declaration down to where the var is first
> initialised.

Ok.

> > +  tree dispatch_decl = NULL;
> 
> For this one, you can put it inside the if (), and just explicitly
> return NULL on the error path (you do that in one case already).

Ok.

> > +#if defined (ASM_OUTPUT_TYPE_DIRECTIVE)
> 
> Is this the correct conditional to use?  It is not obvious to me why
> it would be.  Does it have to be an #ifdef anyway, can't it be an if?

Yes I believe it is.  ASM_OUTPUT_TYPE_DIRECTIVE is only defined in sysv4.h.
You need the .type directive to be able to declare .ifunc functions (plus
enabling ifunc which we now do as a default).  AIX and non-Linux systems will
not be able to use target_clones.
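
For context, an ifunc symbol is bound by calling a resolver function that
returns the address of the implementation to use.  The sketch below is only a
portable emulation of that idea with a function pointer; it does not use the
real ifunc attribute or the .type directive.  All names in it are invented, and
cpu_has_arch_3_00 is a hard-wired stand-in for a
__builtin_cpu_supports ("arch_3_00") test.

```cpp
#include <cassert>

typedef int (*impl_fn)(int);

// Two versions of the same function.  In a real build the second would be
// compiled with power9 code generation; here both compute the same value.
int impl_default(int x) { return x + 1; }
int impl_power9(int x) { return x + 1; }

int resolver_calls = 0;		// counts resolver runs, for the demo only

// Hypothetical stand-in for __builtin_cpu_supports ("arch_3_00").
bool cpu_has_arch_3_00() { return false; }

// Emulated resolver: decide once which implementation to use.
impl_fn resolve()
{
  ++resolver_calls;
  return cpu_has_arch_3_00() ? impl_power9 : impl_default;
}

// Emulated dispatcher: the resolver runs on the first call only, and every
// later call goes straight through the cached pointer.
int dispatch(int x)
{
  static impl_fn chosen = resolve();
  return chosen(x);
}
```

With a real ifunc the dynamic loader does the equivalent of resolve () when it
binds the symbol, so no per-call test or cached pointer appears in user code.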

> > +  if (targetm.has_ifunc_p ())
> > +    {
> > +      struct cgraph_function_version_info *it_v = NULL;
> > +      struct cgraph_node *dispatcher_node = NULL;
> > +      struct cgraph_function_version_info *dispatcher_version_info = NULL;
> 
> No NULL for these either please.  If you later add a path where you
> forget to initialise one of these vars you will not get a warning
> (and if nothing goes wrong these initialisations are distracting noise).

I've recoded these.

> > +/* Make the resolver function decl to dispatch the versions of
> > +   a multi-versioned function,  DEFAULT_DECL.  Create an
> 
> One space after comma.

Ok.

> > +  /* The resolver function should return a (void *). */
> 
> And two after a dot.

Ok.

> > +  gcc_assert (dispatch_decl != NULL);
> > +  /* Mark dispatch_decl as "ifunc" with resolver as resolver_name.  */
> > +  DECL_ATTRIBUTES (dispatch_decl)
> > +    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
> 
> That assert is not very useful: the very next statement would segfault
> if the assertion fails, giving just as much information.

Ok.

> > +  /* Create the alias for dispatch to resolver here.  */
> > +  /*cgraph_create_function_alias (dispatch_decl, decl);*/
> 
> Do you need to keep this line?  Please add a comment saying why it is
> disabled for now, or such.

I will probably need to call cgraph_create_function_alias in the next round
when I fix what I consider to be the big problem with target_clones (namely,
outside of the module you don't use the target clones; you only use the ifunc
support within the current module).  But I will comment it for now.

> 
> > +  gcc_assert (new_bb != NULL);
> > +  gseq = bb_seq (new_bb);
> 
> Same as before.

Ok.

> > +  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
> > +	     		 build_fold_addr_expr (version_decl));
> 
> Indent is broken here.

Ok.

> > +  result_var = create_tmp_var (ptr_type_node);
> > +  convert_stmt = gimple_build_assign (result_var, convert_expr); 
> 
> Space at end of line.
> 
> > +  if (clone_isa == (int)CLONE_DEFAULT)
> 
> Space after cast.  Do you need a cast here?
> 
> > +  predicate_decl = rs6000_builtin_decls [(int) RS6000_BUILTIN_CPU_SUPPORTS];
> 
> You don't need a cast here either afaics.

See above.

> > +  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
> 
> Space at end of line.
> 
> > +  /* The first version in the vector is the default decl.  */
> > +  memset ((void *) clones, '\0', sizeof (clones));
> 
> memset (clones, 0, sizeof clones);

Ummm, it was my understanding that in C++ you no longer get a free cast to
void *, and when you do need to use it in the mem* functions, you need an
explicit cast.
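
As far as I know, the rule runs the other way round: C++ still converts any
object pointer to void * implicitly, and it is the conversion from void * back
to a typed pointer that needs a cast.  A small stand-alone sketch (the function
name is invented for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Zero an int array through memset, then read it back through a void *.
int zero_and_sum()
{
  int values[4] = { 1, 2, 3, 4 };

  // int * -> void * is implicit, so no cast is needed for memset.
  std::memset(values, 0, sizeof values);

  void *raw = values;			// also implicit
  int *typed = static_cast<int *>(raw);	// void * -> int * needs a cast
  assert(typed == values);

  int sum = 0;
  for (std::size_t i = 0; i < sizeof values / sizeof values[0]; ++i)
    sum += typed[i];
  return sum;
}
```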

> or just initialise it in the first place:
> 
> tree clones[CLONE_MAX] = { 0 };
> 
> > +  /* On the PowerPC, we do not need to call __builtin_cpu_init, if we are using
> > +     a new enough glibc.  If we ever need to call it, we would need to insert
> > +     the code here to do the call.  */
> 
> Are we always using a new enough glibc?  If so, please clarify the
> comment.

The expansion of the __builtin_cpu_supports ensures we have a new enough glibc,
but I can expand on the comment (basically x86 needs to call
__builtin_cpu_init, we don't).

> > +static tree 
> > +rs6000_generate_version_dispatcher_body (void *node_p)
> 
> Trailing space.

Ok.

> > +  node = (cgraph_node *)node_p;
> 
> Space after cast.

Ok.

> > +On a PowerPC, you could compile a function with
> > +@code{target_clones("cpu=power9,default")}.  GCC creates two function
> 
> "For PowerPC you can ..."?
> 
> > --- gcc/testsuite/gcc.target/powerpc/clone1.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/clone1.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 248446)
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
> 
> s/powerpc64/powerpc/

Ok.

> 
> Looks good so far, just needs some polish ;-)  Please consider changing
> the clone_list enum to a more natural order (and does the enum need a
> name, anyway?), tidy up layout stuff etc., and repost.
> 
> Thanks,
> 
> 
> Segher
>

Patch

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 248378)
+++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000)	(working copy)
@@ -42,6 +42,7 @@ 
 #include "flags.h"
 #include "alias.h"
 #include "fold-const.h"
+#include "attribs.h"
 #include "stor-layout.h"
 #include "calls.h"
 #include "print-tree.h"
@@ -384,6 +385,34 @@  static const struct
   { "ieee128",		PPC_FEATURE2_HAS_IEEE128,	1 }
 };
 
+/* On PowerPC, we have a limited number of target clones that we care about
+   which means we can use an array to hold the options, rather than having more
+   elaborate data structures to identify each possible variation.  Order the
+   clones from the highest ISA to the least.  */
+enum clone_list {
+  CLONE_ISA_3_00,		/* ISA 3.00 (power9).  */
+  CLONE_ISA_2_07,		/* ISA 2.07 (power8).  */
+  CLONE_ISA_2_06,		/* ISA 2.06 (power7).  */
+  CLONE_ISA_2_05,		/* ISA 2.05 (power6).  */
+  CLONE_DEFAULT,		/* default clone.  */
+  CLONE_MAX
+};
+
+/* Map compiler ISA bits into HWCAP names.  */
+struct clone_map {
+  HOST_WIDE_INT isa_mask;	/* rs6000_isa mask */
+  const char *name;		/* name to use in __builtin_cpu_supports.  */
+};
+
+static const struct clone_map rs6000_clone_map[ (int)CLONE_MAX ] = {
+  { OPTION_MASK_P9_VECTOR,	"arch_3_00" },	/* ISA 3.00 (power9).  */
+  { OPTION_MASK_P8_VECTOR,	"arch_2_07" },	/* ISA 2.07 (power8).  */
+  { OPTION_MASK_POPCNTD,	"arch_2_06" },	/* ISA 2.06 (power7).  */
+  { OPTION_MASK_CMPB,		"arch_2_05" },	/* ISA 2.05 (power6).  */
+  { 0,				"" },		/* Default options.  */
+};
+
+
 /* Newer LIBCs explicitly export this symbol to declare that they provide
    the AT_PLATFORM and AT_HWCAP/AT_HWCAP2 values in the TCB.  We emit a
    reference to this symbol whenever we expand a CPU builtin, so that
@@ -1969,6 +1998,21 @@  static const struct attribute_spec rs600
 
 #undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
 #define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
+
+#undef TARGET_COMPARE_VERSION_PRIORITY
+#define TARGET_COMPARE_VERSION_PRIORITY rs6000_compare_version_priority
+
+#undef TARGET_GENERATE_VERSION_DISPATCHER_BODY
+#define TARGET_GENERATE_VERSION_DISPATCHER_BODY				\
+  rs6000_generate_version_dispatcher_body
+
+#undef TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+#define TARGET_GET_FUNCTION_VERSIONS_DISPATCHER				\
+  rs6000_get_function_versions_dispatcher
+
+#undef TARGET_OPTION_FUNCTION_VERSIONS
+#define TARGET_OPTION_FUNCTION_VERSIONS common_function_versions
+
 
 
 /* Processor table.  */
@@ -15616,6 +15660,14 @@  cpu_expand_builtin (enum rs6000_builtins
 
 #ifdef TARGET_LIBC_PROVIDES_HWCAP_IN_TCB
   tree arg = TREE_OPERAND (CALL_EXPR_ARG (exp, 0), 0);
+  /* Target clones creates an ARRAY_REF instead of STRING_CST, convert it back
+     to a STRING_CST.  */
+  if (TREE_CODE (arg) == ARRAY_REF
+      && TREE_CODE (TREE_OPERAND (arg, 0)) == STRING_CST
+      && TREE_CODE (TREE_OPERAND (arg, 1)) == INTEGER_CST
+      && compare_tree_int (TREE_OPERAND (arg, 1), 0) == 0)
+    arg = TREE_OPERAND (arg, 0);
+
   if (TREE_CODE (arg) != STRING_CST)
     {
       error ("builtin %s only accepts a string argument",
@@ -39743,6 +39795,14 @@  rs6000_valid_attribute_p (tree fndecl,
       fprintf (stderr, "--------------------\n");
     }
 
+  /* attribute((target("default"))) does nothing, beyond
+     affecting multi-versioning.  */
+  if (TREE_VALUE (args)
+      && TREE_CODE (TREE_VALUE (args)) == STRING_CST
+      && TREE_CHAIN (args) == NULL_TREE
+      && strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "default") == 0)
+    return true;
+
   old_optimize = build_optimization_node (&global_options);
   func_optimize = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl);
 
@@ -40175,6 +40235,486 @@  rs6000_disable_incompatible_switches (vo
 }
 
 
+/* Helper function for printing the function name when debugging.  */
+
+static inline const char *
+get_decl_name (tree fn)
+{
+  tree name;
+
+  if (!fn)
+    return "<null>";
+
+  name = DECL_NAME (fn);
+  if (!name)
+    return "<no-name>";
+
+  return IDENTIFIER_POINTER (name);
+}
+
+/* Return the clone id of the target we are compiling code for in a target
+   clone.  The clone id is ordered from 0 to CLONE_MAX-1 and gives the priority
+   list for the target clones (ordered from highest to lowest).  */
+
+static int
+rs6000_clone_priority (tree fndecl)
+{
+  tree fn_opts = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);
+  HOST_WIDE_INT isa_masks;
+  int ret = (int) CLONE_DEFAULT;
+  tree attrs = lookup_attribute ("target", DECL_ATTRIBUTES (fndecl));
+  const char *attrs_str = NULL;
+
+  gcc_assert (attrs != NULL);
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+  /* Return CLONE_DEFAULT (the lowest priority) for the default function.
+     Return the ISA needed for the function if it is not the default.  */
+  if (strcmp (attrs_str, "default") != 0)
+    {
+      if (fn_opts == NULL_TREE)
+	fn_opts = target_option_default_node;
+
+      if (!fn_opts || !TREE_TARGET_OPTION (fn_opts))
+	isa_masks = rs6000_isa_flags;
+      else
+	isa_masks = TREE_TARGET_OPTION (fn_opts)->x_rs6000_isa_flags;
+
+      for (ret = 0; ret < (int) CLONE_DEFAULT; ret++)
+	if ((rs6000_clone_map[ret].isa_mask & isa_masks) != 0)
+	  break;
+    }
+
+  if (TARGET_DEBUG_TARGET)
+    fprintf (stderr, "rs6000_clone_priority (%s) => %d\n",
+	     get_decl_name (fndecl), (int) ret);
+
+  return ret;
+}
+
+/* This compares the priority of target features in function DECL1 and DECL2.
+   It returns positive value if DECL1 is higher priority, negative value if
+   DECL2 is higher priority and 0 if they are the same.  Note, priorities are
+   ordered from highest (0, CLONE_ISA_3_00) to lowest (CLONE_DEFAULT).  */
+
+static int
+rs6000_compare_version_priority (tree decl1, tree decl2)
+{
+  int priority1 = rs6000_clone_priority (decl1);
+  int priority2 = rs6000_clone_priority (decl2);
+  int ret = priority2 - priority1;
+
+  if (TARGET_DEBUG_TARGET)
+    fprintf (stderr, "rs6000_compare_version_priority (%s, %s) => %d\n",
+	     get_decl_name (decl1), get_decl_name (decl2), ret);
+
+  return ret;
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  Returns the decl of the dispatcher function.  */
+
+static tree
+rs6000_get_function_versions_dispatcher (void *decl)
+{
+  tree fn = (tree) decl;
+  struct cgraph_node *node = NULL;
+  struct cgraph_node *default_node = NULL;
+  struct cgraph_function_version_info *node_v = NULL;
+  struct cgraph_function_version_info *first_v = NULL;
+
+  tree dispatch_decl = NULL;
+
+  struct cgraph_function_version_info *default_version_info = NULL;
+ 
+  gcc_assert (fn != NULL && DECL_FUNCTION_VERSIONED (fn));
+
+  if (TARGET_DEBUG_TARGET)
+    fprintf (stderr, "rs6000_get_function_versions_dispatcher (%s)\n",
+	     get_decl_name (fn));
+
+  node = cgraph_node::get (fn);
+  gcc_assert (node != NULL);
+
+  node_v = node->function_version ();
+  gcc_assert (node_v != NULL);
+ 
+  if (node_v->dispatcher_resolver != NULL)
+    return node_v->dispatcher_resolver;
+
+  /* Find the default version and make it the first node.  */
+  first_v = node_v;
+  /* Go to the beginning of the chain.  */
+  while (first_v->prev != NULL)
+    first_v = first_v->prev;
+
+  default_version_info = first_v;
+  while (default_version_info != NULL)
+    {
+      const tree decl2 = default_version_info->this_node->decl;
+      if (is_function_default_version (decl2))
+        break;
+      default_version_info = default_version_info->next;
+    }
+
+  /* If there is no default node, just return NULL.  */
+  if (default_version_info == NULL)
+    return NULL;
+
+  /* Make default info the first node.  */
+  if (first_v != default_version_info)
+    {
+      default_version_info->prev->next = default_version_info->next;
+      if (default_version_info->next)
+        default_version_info->next->prev = default_version_info->prev;
+      first_v->prev = default_version_info;
+      default_version_info->next = first_v;
+      default_version_info->prev = NULL;
+    }
+
+  default_node = default_version_info->this_node;
+
+#if defined (ASM_OUTPUT_TYPE_DIRECTIVE)
+  if (targetm.has_ifunc_p ())
+    {
+      struct cgraph_function_version_info *it_v = NULL;
+      struct cgraph_node *dispatcher_node = NULL;
+      struct cgraph_function_version_info *dispatcher_version_info = NULL;
+
+      /* Right now, the dispatching is done via ifunc.  */
+      dispatch_decl = make_dispatcher_decl (default_node->decl);
+
+      dispatcher_node = cgraph_node::get_create (dispatch_decl);
+      gcc_assert (dispatcher_node != NULL);
+      dispatcher_node->dispatcher_function = 1;
+      dispatcher_version_info
+	= dispatcher_node->insert_new_function_version ();
+      dispatcher_version_info->next = default_version_info;
+      dispatcher_node->definition = 1;
+
+      /* Set the dispatcher for all the versions.  */
+      it_v = default_version_info;
+      while (it_v != NULL)
+	{
+	  it_v->dispatcher_resolver = dispatch_decl;
+	  it_v = it_v->next;
+	}
+    }
+  else
+#endif
+    {
+      error_at (DECL_SOURCE_LOCATION (default_node->decl),
+		"multiversioning needs ifunc which is not supported "
+		"on this target");
+    }
+
+  return dispatch_decl;
+}
+
+/* Make the resolver function decl to dispatch the versions of
+   a multi-versioned function,  DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_resolver_func (const tree default_decl,
+		    const tree dispatch_decl,
+		    basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  bool is_uniq = false;
+
+  /* IFUNC's have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    is_uniq = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_unique_name (default_decl, "resolver", is_uniq);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  /* Resolver is not external, body is generated.  */
+  DECL_EXTERNAL (decl) = 0;
+  DECL_EXTERNAL (dispatch_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+
+  if (DECL_COMDAT_GROUP (default_decl)
+      || TREE_PUBLIC (default_decl))
+    {
+      /* In this case, each translation unit with a call to this
+	 versioned function will put out a resolver.  Ensure it
+	 is comdat to keep just one copy.  */
+      DECL_COMDAT (decl) = 1;
+      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  *empty_bb = init_lowered_empty_function (decl, false, 0);
+
+  cgraph_node::add_new_function (decl, true);
+  symtab->call_cgraph_insertion_hooks (cgraph_node::get_create (decl));
+
+  pop_cfun ();
+
+  gcc_assert (dispatch_decl != NULL);
+  /* Mark dispatch_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (dispatch_decl)
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+
+  /* Create the alias for dispatch to resolver here.  */
+  /*cgraph_create_function_alias (dispatch_decl, decl);*/
+  cgraph_node::create_same_body_alias (dispatch_decl, decl);
+  XDELETEVEC (resolver_name);
+  return decl;
+}
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL to
+   return a pointer to VERSION_DECL if we are running on a machine that
+   supports the index CLONE_ISA hardware architecture bits.  This function will
+   be called during version dispatch to decide which function version to
+   execute.  It returns the basic block at the end, to which more conditions
+   can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     int clone_isa, basic_block new_bb)
+{
+  gimple *return_stmt;
+  tree convert_expr, result_var;
+  gimple *convert_stmt;
+  gimple_seq gseq;
+  gimple *call_cond_stmt;
+  gimple *if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+  tree cond_var,  predicate_decl, predicate_arg, bool_zero;
+  const char *arg_str;
+
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (clone_isa == (int)CLONE_DEFAULT)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      return new_bb;
+    }
+
+  bool_zero = build_int_cst (bool_int_type_node, 0);
+  cond_var = create_tmp_var (bool_int_type_node);
+  predicate_decl = rs6000_builtin_decls [(int) RS6000_BUILTIN_CPU_SUPPORTS];
+  arg_str = rs6000_clone_map[clone_isa].name;
+  predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+  call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+  gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+  gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (call_cond_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+  if_else_stmt = gimple_build_cond (NE_EXPR, cond_var, bool_zero, NULL_TREE,
+				    NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR_FOR_FN (cfun), 0);
+
+  pop_cfun ();
+
+  return bb3;
+}
+
+/* This function generates the dispatch function for multi-versioned functions.
+   DISPATCH_DECL is the function which will contain the dispatch logic.
+   FNDECLS are the function choices for dispatch, and is a tree chain.
+   EMPTY_BB is the basic block pointer in DISPATCH_DECL in which the dispatch
+   code is generated.  */
+
+static int
+dispatch_function_versions (tree dispatch_decl,
+			    void *fndecls_p,
+			    basic_block *empty_bb)
+{
+  int ix;
+  tree ele;
+  vec<tree> *fndecls;
+  tree clones[ (int)CLONE_MAX ];
+
+  if (TARGET_DEBUG_TARGET)
+    fputs ("dispatch_function_versions, top\n", stderr);
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = static_cast<vec<tree> *> (fndecls_p);
+
+  /* At least one more version other than the default.  */
+  gcc_assert (fndecls->length () >= 2);
+
+  /* The first version in the vector is the default decl.  */
+  memset ((void *) clones, '\0', sizeof (clones));
+  clones[ (int)CLONE_DEFAULT ] = (*fndecls)[0];
+
+  /* On the PowerPC, we do not need to call __builtin_cpu_init, if we are using
+     a new enough glibc.  If we ever need to call it, we would need to insert
+     the code here to do the call.  */
+
+  for (ix = 1; fndecls->iterate (ix, &ele); ++ix)
+    {
+      int priority = rs6000_clone_priority (ele);
+      if (!clones[priority])
+	clones[priority] = ele;
+    }
+
+  for (ix = 0; ix < (int)CLONE_MAX; ix++)
+    if (clones[ix])
+      {
+	if (TARGET_DEBUG_TARGET)
+	  fprintf (stderr, "dispatch_function_versions, clone %d, %s\n",
+		   ix, get_decl_name (clones[ix]));
+
+	*empty_bb = add_condition_to_bb (dispatch_decl, clones[ix], ix,
+					 *empty_bb);
+      }
+
+  return 0;
+}
+
+/* Generate the dispatching code body to dispatch multi-versioned function
+   DECL.  The target hook is called to process the "target" attributes and
+   provide the code to dispatch the right function at run-time.  NODE_P
+   points to the dispatcher's cgraph node whose body will be created.  */
+
+static tree
+rs6000_generate_version_dispatcher_body (void *node_p)
+{
+  tree resolver_decl;
+  basic_block empty_bb;
+  tree default_ver_decl;
+  struct cgraph_node *versn;
+  struct cgraph_node *node;
+
+  struct cgraph_function_version_info *node_version_info = NULL;
+  struct cgraph_function_version_info *versn_info = NULL;
+
+  node = (cgraph_node *)node_p;
+
+  node_version_info = node->function_version ();
+  gcc_assert (node->dispatcher_function
+	      && node_version_info != NULL);
+
+  if (node_version_info->dispatcher_resolver)
+    return node_version_info->dispatcher_resolver;
+
+  /* The first version in the chain corresponds to the default version.  */
+  default_ver_decl = node_version_info->next->this_node->decl;
+
+  /* node is going to be an alias, so remove the finalized bit.  */
+  node->definition = false;
+
+  resolver_decl = make_resolver_func (default_ver_decl,
+				      node->decl, &empty_bb);
+
+  node_version_info->dispatcher_resolver = resolver_decl;
+
+  if (TARGET_DEBUG_TARGET)
+    fprintf (stderr, "rs6000_generate_version_dispatcher_body, %s\n",
+	     get_decl_name (resolver_decl));
+
+  push_cfun (DECL_STRUCT_FUNCTION (resolver_decl));
+
+  auto_vec<tree, 2> fn_ver_vec;
+
+  for (versn_info = node_version_info->next; versn_info;
+       versn_info = versn_info->next)
+    {
+      versn = versn_info->this_node;
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (versn->decl))
+	sorry ("virtual function multiversioning not supported");
+
+      fn_ver_vec.safe_push (versn->decl);
+    }
+
+  dispatch_function_versions (resolver_decl, &fn_ver_vec, &empty_bb);
+  cgraph_edge::rebuild_edges ();
+  pop_cfun ();
+  return resolver_decl;
+}
+
+
 /* Hook to determine if one function can safely inline another.  */
 
 static bool
@@ -40208,12 +40748,7 @@  rs6000_can_inline_p (tree caller, tree c
 
   if (TARGET_DEBUG_TARGET)
     fprintf (stderr, "rs6000_can_inline_p:, caller %s, callee %s, %s inline\n",
-	     (DECL_NAME (caller)
-	      ? IDENTIFIER_POINTER (DECL_NAME (caller))
-	      : "<unknown>"),
-	     (DECL_NAME (callee)
-	      ? IDENTIFIER_POINTER (DECL_NAME (callee))
-	      : "<unknown>"),
+	     get_decl_name (caller), get_decl_name (callee),
 	     (ret ? "can" : "cannot"));
 
   return ret;
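
For reviewers, here is a minimal sketch of the selection order the resolver built by dispatch_function_versions needs: test the highest ISA level first and fall back to the default clone last. It is not code from the patch. Only CLONE_DEFAULT and CLONE_MAX appear in the hunks above; the other enumerators, the clone_hwcap_bit table and its bit values, select_clone, and the two sample clones are hypothetical stand-ins, and the real resolver tests HWCAP bits rather than a plain mask argument.

```c
#include <assert.h>
#include <stddef.h>

/* Clone slots, lowest priority first.  Only CLONE_DEFAULT and CLONE_MAX
   appear in the patch hunks; the ISA enumerators are assumed names.  */
enum clone_id
{
  CLONE_DEFAULT = 0,
  CLONE_ISA_2_05,
  CLONE_ISA_2_06,
  CLONE_ISA_2_07,
  CLONE_ISA_3_00,
  CLONE_MAX
};

typedef long (*clone_fn) (long, long);

/* Hypothetical capability bits, one per clone slot.  These are NOT the
   real HWCAP/HWCAP2 values.  */
static const unsigned long clone_hwcap_bit[CLONE_MAX] =
  { 0, 1ul << 0, 1ul << 1, 1ul << 2, 1ul << 3 };

/* Two sample clones (hypothetical); both compute the remainder.  */
long mod_default (long a, long b) { return a - (a / b) * b; }
long mod_isa_3_00 (long a, long b) { return a % b; }

/* Walk the slots from the highest ISA level down and return the first
   clone that exists and whose capability bit is set.  The default clone
   is unconditional, so it must be considered last.  */
clone_fn
select_clone (const clone_fn clones[CLONE_MAX], unsigned long hwcap)
{
  assert (clones[CLONE_DEFAULT] != NULL);

  for (int ix = CLONE_MAX - 1; ix > CLONE_DEFAULT; ix--)
    if (clones[ix] != NULL && (hwcap & clone_hwcap_bit[ix]) != 0)
      return clones[ix];

  return clones[CLONE_DEFAULT];
}
```

A machine that reports every capability bit also reports the older ones, so testing the slots in ascending order would pick the oldest matching clone rather than the best one.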
Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/doc)	(revision 248378)
+++ gcc/doc/extend.texi	(.../gcc/doc)	(working copy)
@@ -3257,7 +3257,15 @@  For instance, on an x86, you could compi
 @code{target_clones("sse4.1,avx")}.  GCC creates two function clones,
 one compiled with @option{-msse4.1} and another with @option{-mavx}.
 It also creates a resolver function (see the @code{ifunc} attribute
-above) that dynamically selects a clone suitable for current architecture.
+above) that dynamically selects a clone suitable for current
+architecture.
+
+On a PowerPC, you could compile a function with
+@code{target_clones("cpu=power9,default")}.  GCC creates two function
+clones, one compiled with @option{-mcpu=power9} and another with the
+default options.  It also creates a resolver function (see the
+@code{ifunc} attribute above) that dynamically selects a clone
+suitable for the current architecture.
 
 @item unused
 @cindex @code{unused} function attribute
Index: gcc/testsuite/gcc.target/powerpc/clone1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/clone1.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/clone1.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 248446)
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O2" } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+
+__attribute__((target_clones("cpu=power9,default")))
+long mod_func (long a, long b)
+{
+  return a % b;
+}
+
+long mod_func_or (long a, long b, long c)
+{
+  return mod_func (a, b) | c;
+}
+
+/* { dg-final { scan-assembler-times {\mdivd\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mmulld\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mmodsd\M} 1 } } */
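
A note on why clone1.c expects one divd, one mulld and one modsd: ISA 3.0 (power9) has a modulo instruction, while on earlier ISA levels the remainder is computed from a divide and a multiply followed by a subtract. The sketch below shows that identity in plain C; the function names are hypothetical and not part of the patch.

```c
#include <assert.h>

/* Remainder computed the way a pre-ISA-3.0 clone has to do it:
   divide, multiply back, subtract.  */
long
mod_via_div_mul (long a, long b)
{
  assert (b != 0);
  long quotient = a / b;	/* divide */
  long product = quotient * b;	/* multiply back */
  return a - product;		/* subtract */
}

/* Remainder via the % operator, which a power9 clone can implement
   with a single modulo instruction.  */
long
mod_direct (long a, long b)
{
  assert (b != 0);
  return a % b;
}
```

Both functions agree for every nonzero divisor whose quotient does not overflow, since C division truncates toward zero.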