Fix PR46728 (move pow/powi folds to tree phases)

Message ID 1305298365.4889.6.camel@L3G5336.ibm.com
State New

Commit Message

Bill Schmidt May 13, 2011, 2:52 p.m. UTC
This patch addresses PR46728, which notes that pow and powi need to be
lowered in tree phases to restore lost FMA opportunities and expose
vectorization opportunities.

The approach is to move most optimizations from expand_builtin_pow[i]
into fold_builtin_pow[i]. One exception is the rewrite of powi as an
optimal sequence of multiplies, which relies on the ability to insert
new statements. Because fold_builtin_powi is called during front-end
parsing, the gimple machinery for statement manipulation can't be relied
upon there. 

A new pass (tree-ssa-math-opts.c:execute_lower_pow) is added at all opt
levels to lower calls to pow and powi early in the middle end. This is
where the expansion of powi into multiplies takes place. Other folds
from fold_builtin_pow[i] take place here as well. In many cases, these
opportunities were already folded during parsing, but not all front ends
may do this, so the patch doesn't rely on it.
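
For readers unfamiliar with the idea, the rewrite of powi as a sequence of multiplies can be sketched roughly as below. This is only an illustration under stated assumptions: it uses plain square-and-multiply rather than the table-driven sequence that the patch's powi_as_mults is meant to produce, and the name powi_sketch is hypothetical, not part of the patch.

```c
/* Sketch only: plain square-and-multiply, not GCC's table-driven
   expansion.  powi_sketch is a hypothetical name used for
   illustration; it does not appear in the patch.  */
double
powi_sketch (double x, int n)
{
  /* Work on the magnitude as unsigned so INT_MIN does not overflow.  */
  unsigned int m = n < 0 ? 0u - (unsigned int) n : (unsigned int) n;
  double result = 1.0;
  double base = x;

  while (m != 0)
    {
      if (m & 1)
        result *= base;   /* This bit of the exponent is set.  */
      m >>= 1;
      if (m != 0)
        base *= base;     /* Square for the next bit.  */
    }

  /* A negative exponent reciprocates the result.  */
  return n < 0 ? 1.0 / result : result;
}
```

With this scheme powi_sketch (x, 10) needs only a handful of multiplies instead of a library call, and each multiply is an ordinary statement that later passes can see.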

Miscellaneous notes:

      * expand_builtin_pow[i] remain as skeletons, for those cases that
        can't be lowered into another form.
        
      * In some cases, fold_builtin_sqrt attempts to convert sqrt into
        pow[i] (the inverse of what the pow lowering does); this must be
        disabled when a hardware sqrt instruction is available.
        
      * Many fewer pow invocations will exist in tree form; they will
        now be converted to use sqrt, cbrt, powi, etc. where possible.
        Because powi will be more prevalent now, I duplicated many of
        the pow folds in fold-const.c to work on powi as well.
        
      * I added 16 new test cases. There is already good test coverage
        for this area, so most of the tests are powerpc-specific to test
        behavior when a hardware sqrt instruction is available.
        
Patch was regression-tested on powerpc64-linux and i686-linux-gnu. OK
for mainline?

Bill


gcc/

2011-04-28  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* Makefile.in (tree-ssa-math-opts.o): Add dependency on
	tree-ssa-propagate.h.
	* builtins.c (fold_builtin_pow, fold_builtin_powi): Remove forward
	declarations.
	(powi_cost): Update commentary.
	(expand_powi_1): Remove.
	(expand_powi): Remove.
	(expand_builtin_pow_root): Remove.
	(expand_builtin_pow): Remove all folds.
	(expand_builtin_powi): Remove all folds.
	(fold_builtin_sqrt): Restrict fold of sqrt(Nroot(x)) when
	hardware sqrt is available; add fold of sqrt(powi(x,y)) to
	pow(|x|,y*0.5).
	(fold_builtin_cbrt): Restrict fold of cbrt(sqrt(x)) when hardware
	sqrt is available.
	(fold_eval_powi): New function.
	(build_call_expr_loc_strip_sign): New function.
	(fold_builtin_pow_frac_exp): New function.
	(fold_builtin_pow): Remove static declaration; add fold of
	pow(x,0.25) to sqrt(sqrt(x)); add fold of pow(x,0.75) to
	sqrt(x)*sqrt(sqrt(x)); add folds of pow(x,c) to use powi when c,
	2c, or 3c is an integer.
	(powi_as_mults_1): New function.
	(powi_as_mults): New function.
	(tree_expand_builtin_powi): New function.
	(fold_builtin_powi): Remove static declaration; delay handling of
	compile-time constants until after simple folds, and move that
	handling into fold_eval_powi.
	* fold-const.c (fold_binary_loc): Add folds on powi similar to
	existing folds on pow; remove fold of x*x to pow(x,2.0).
	* passes.c (init_optimization_passes): Add pass_lower_pow.
	* tree.h (fold_builtin_pow, fold_builtin_powi): Add declarations.
	* tree-flow.h (tree_expand_builtin_powi): Add declaration.
	* tree-pass.h (pass_lower_pow): Add declaration.
	* tree-ssa-math-opts.c (tree-ssa-propagate.h): New include.
	(execute_lower_pow): New function.
	(pass_lower_pow): New gimple_opt_pass.
	
gcc/testsuite/

2011-05-13  Bill Schmidt <wschmidt@linux.vnet.ibm.com>

	* gcc.target/powerpc/pr46728-1.c: New testcase.
	* gcc.target/powerpc/pr46728-2.c: New testcase.
	* gcc.target/powerpc/pr46728-3.c: New testcase.
	* gcc.target/powerpc/pr46728-4.c: New testcase.
	* gcc.target/powerpc/pr46728-5.c: New testcase.
	* gcc.dg/pr46728-6.c: New testcase.
	* gcc.target/powerpc/pr46728-7.c: New testcase.
	* gcc.target/powerpc/pr46728-8.c: New testcase.
	* gcc.dg/pr46728-9.c: New testcase.
	* gcc.target/powerpc/pr46728-10.c: New testcase.
	* gcc.target/powerpc/pr46728-11.c: New testcase.
	* gcc.dg/pr46728-12.c: New testcase.
	* gcc.target/powerpc/pr46728-13.c: New testcase.
	* gcc.target/powerpc/pr46728-14.c: New testcase.
	* gcc.target/powerpc/pr46728-15.c: New testcase.
	* gcc.target/powerpc/pr46728-16.c: New testcase.

Comments

Nathan Froyd May 13, 2011, 3:01 p.m. UTC | #1
On 05/13/2011 10:52 AM, William J. Schmidt wrote:
> This patch addresses PR46728, which notes that pow and powi need to be
> lowered in tree phases to restore lost FMA opportunities and expose
> vectorization opportunities.
>
> +struct gimple_opt_pass pass_lower_pow =
> +{
> + {
> +  GIMPLE_PASS,
> +  "lower_pow",				/* name */
> +  NULL,					/* gate */

Please make this controlled by an option; this pass doesn't need to be run all
the time.

IMHO, the pass shouldn't run at anything less than -O3, but that's for other
people to decide.

-Nathan

Richard Biener May 13, 2011, 3:26 p.m. UTC | #2
On Fri, May 13, 2011 at 5:01 PM, Nathan Froyd <froydnj@codesourcery.com> wrote:
> On 05/13/2011 10:52 AM, William J. Schmidt wrote:
>> This patch addresses PR46728, which notes that pow and powi need to be
>> lowered in tree phases to restore lost FMA opportunities and expose
>> vectorization opportunities.
>>
>> +struct gimple_opt_pass pass_lower_pow =
>> +{
>> + {
>> +  GIMPLE_PASS,
>> +  "lower_pow",                               /* name */
>> +  NULL,                                      /* gate */
>
> Please make this controlled by an option; this pass doesn't need to be run all
> the time.
>
> IMHO, the pass shouldn't run at anything less than -O3, but that's for other
> people to decide.

It was run unconditionally before, so unless we preserve the code at
expansion time we have to do it here.

I will have a closer look at the patch early next week.  Btw, I thought
of adding a POW_EXPR tree code that can take mixed-mode operands
to make foldings (eventually) simpler, but I'm not sure it's worth the
trouble.

The position of the pass is odd - why did you place it there?  I would
have placed it alongside pass_cse_sincos and pass_optimize_bswap.
The foldings should probably be done via fold-stmt only (where they
should already apply), and you won't catch things like pow(sqrt(...))
there because you only see the outer call.  That said, I'd be happier
if the patch just did the powi expansion and left the rest to somebody
else.

Richard.

Bill Schmidt May 13, 2011, 3:52 p.m. UTC | #3
On Fri, 2011-05-13 at 17:26 +0200, Richard Guenther wrote:
> On Fri, May 13, 2011 at 5:01 PM, Nathan Froyd <froydnj@codesourcery.com> wrote:
> > On 05/13/2011 10:52 AM, William J. Schmidt wrote:
> >> This patch addresses PR46728, which notes that pow and powi need to be
> >> lowered in tree phases to restore lost FMA opportunities and expose
> >> vectorization opportunities.
> >>
> >> +struct gimple_opt_pass pass_lower_pow =
> >> +{
> >> + {
> >> +  GIMPLE_PASS,
> >> +  "lower_pow",                               /* name */
> >> +  NULL,                                      /* gate */
> >
> > Please make this controlled by an option; this pass doesn't need to be run all
> > the time.
> >
> > IMHO, the pass shouldn't run at anything less than -O3, but that's for other
> > people to decide.
> 
> It was run unconditionally before, so unless we preserve the code at
> expansion time we have to do it here.

Right.  A number of tests fail at -O0 if it's not done unconditionally.
This seemed better than having duplicate code remain in expand.

> 
> I will have a closer look at the patch early next week.  

Much obliged!

> Btw, I thought
> of adding a POW_EXPR tree code that can take mixed-mode operands
> to make foldings (eventually) simpler, but I'm not sure it's worth the
> trouble.
> 
> The position of the pass is odd - why did you place it there?  I would
> have placed it alongside pass_cse_sincos and pass_optimize_bswap.

That was where I wanted it initially also, but this seems necessary for
the pass to run unconditionally.  If I recall correctly,
gate_all_optimizations() was kicking in at -O0, so I had to move it
earlier.

> The foldings should probably be done via fold-stmt only (where they
> should already apply), and you won't catch things like pow(sqrt(...))
> there because you only see the outer call.  That said, I'd be happier
> if the patch just did the powi expansion and left the rest to somebody
> else.

I'm not sure I understand this last part.  The original concern of
PR46728 regarded __builtin_pow(x, 0.75) being lowered too late for the
FMA optimization to kick in, so I needed to address that.  I'm probably
misunderstanding you.

> 
> Richard.

Nathan Froyd May 13, 2011, 4 p.m. UTC | #4
On 05/13/2011 11:26 AM, Richard Guenther wrote:
> On Fri, May 13, 2011 at 5:01 PM, Nathan Froyd <froydnj@codesourcery.com> wrote:
>> On 05/13/2011 10:52 AM, William J. Schmidt wrote:
>>> This patch addresses PR46728, which notes that pow and powi need to be
>>> lowered in tree phases to restore lost FMA opportunities and expose
>>> vectorization opportunities.
>>>
>>> +struct gimple_opt_pass pass_lower_pow =
>>> +{
>>> + {
>>> +  GIMPLE_PASS,
>>> +  "lower_pow",                               /* name */
>>> +  NULL,                                      /* gate */
>>
>> Please make this controlled by an option; this pass doesn't need to be run all
>> the time.
>>
>> IMHO, the pass shouldn't run at anything less than -O3, but that's for other
>> people to decide.
> 
> It was run unconditionally before, so unless we preserve the code at
> expansion time we have to do it here.

We were doing it unconditionally before because we were calling it through
folding and expansion, both of which only fired if we were expanding pow
calls; now we're groveling over the whole IR to look for optimization
opportunities that, let's be honest, the vast majority of code is never going
to care about.  The whole point of the patch is "to restore lost FMA
opportunities and expose vectorization opportunities".  The first reason
*might* be good justification for running it at -O2, but the second reason
calls for -O3 or at the very least flag_tree_vectorize.

However, I don't think anybody's going to notice/care if -O1 stopped folding
pow calls; could we at least add a flag_expensive_optimizations && optimize gate?
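
A gate along those lines might look roughly like the sketch below. The two globals are stand-ins declared here only so that the fragment compiles on its own; inside GCC they come from the option machinery. gate_lower_pow is a hypothetical name, since the posted patch passes NULL for the gate.

```c
#include <stdbool.h>

/* Stand-ins for GCC's global option variables, defined here only to
   keep the sketch self-contained.  */
int optimize = 0;
int flag_expensive_optimizations = 0;

/* Hypothetical gate function: run the pass only when optimizing and
   when expensive optimizations are enabled.  */
bool
gate_lower_pow (void)
{
  return flag_expensive_optimizations && optimize;
}
```

The pass descriptor would then name this function in its gate slot in place of NULL.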

-Nathan

Bill Schmidt May 13, 2011, 9:55 p.m. UTC | #5
On Fri, 2011-05-13 at 10:52 -0500, William J. Schmidt wrote:
> On Fri, 2011-05-13 at 17:26 +0200, Richard Guenther wrote:

 -- snip --

> > 
> > The position of the pass is odd - why did you place it there?  I would
> > have placed it alongside pass_cse_sincos and pass_optimize_bswap.
> 
> That was where I wanted it initially also, but this seems necessary for
> the pass to run unconditionally.  If I recall correctly,
> gate_all_optimizations() was kicking in at -O0, so I had to move it
> earlier.

As an alternative, I could reinstate the "expand" transformations to
kick in when the lower_pow pass is disabled.  I can then move the
lower_pow pass to the neighborhood of pass_cse_sincos and
pass_optimize_bswap, and limit it to -O1 and above.  Optionally, I could
gate it on flag_expensive_optimizations as Nathan suggested, though that
is perhaps not appropriate for a simple linear scan.

I did a quick regtest of this and it held up without regressions.  Let
me know if you'd prefer me to implement it that way.  I'd probably vote
for it myself, as I wasn't happy with extra compile time at -O0 either.
Just a matter of whether we want to tolerate duplicated logic to avoid
that.

Richard Biener May 16, 2011, 2:26 p.m. UTC | #6
On Fri, May 13, 2011 at 6:00 PM, Nathan Froyd <froydnj@codesourcery.com> wrote:
> On 05/13/2011 11:26 AM, Richard Guenther wrote:
>> On Fri, May 13, 2011 at 5:01 PM, Nathan Froyd <froydnj@codesourcery.com> wrote:
>>> On 05/13/2011 10:52 AM, William J. Schmidt wrote:
>>>> This patch addresses PR46728, which notes that pow and powi need to be
>>>> lowered in tree phases to restore lost FMA opportunities and expose
>>>> vectorization opportunities.
>>>>
>>>> +struct gimple_opt_pass pass_lower_pow =
>>>> +{
>>>> + {
>>>> +  GIMPLE_PASS,
>>>> +  "lower_pow",                               /* name */
>>>> +  NULL,                                      /* gate */
>>>
>>> Please make this controlled by an option; this pass doesn't need to be run all
>>> the time.
>>>
>>> IMHO, the pass shouldn't run at anything less than -O3, but that's for other
>>> people to decide.
>>
>> It was run unconditionally before, so unless we preserve the code at
>> expansion time we have to do it here.
>
> We were doing it unconditionally before because we were calling it through
> folding and expansion, both of which only fired if we were expanding pow
> calls; now we're groveling over the whole IR to look for optimization
> opportunities that, let's be honest, the vast majority of code is never going
> to care about.  The whole point of the patch is "to restore lost FMA
> opportunities and expose vectorization opportunities".  The first reason
> *might* be good justification for running it at -O2, but the second reason
> calls for -O3 or at the very least flag_tree_vectorize.

The patch of course does more, like expanding to power series.

> However, I don't think anybody's going to notice/care if -O1 stopped folding
> pow calls; could we at least add a flag_expensive_optimizations && optimize gate?

The pass doesn't look very expensive and the IL walk could be
shared with cse_sincos.  Note that I agree that the transforms
probably should never have been applied at -O0 (we IMHO do too
much builtin folding at -O0 which can be surprising - like
transforming pow (x, 0.5) to sqrt (x)).  So I wouldn't worry about
losing the -O0 transforms at all.

Richard.

> -Nathan
>
>

Richard Biener May 16, 2011, 3:07 p.m. UTC | #7
On Fri, May 13, 2011 at 4:52 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> This patch addresses PR46728, which notes that pow and powi need to be
> lowered in tree phases to restore lost FMA opportunities and expose
> vectorization opportunities.
>
> The approach is to move most optimizations from expand_builtin_pow[i]
> into fold_builtin_pow[i]. One exception is the rewrite of powi as an
> optimal sequence of multiplies, which relies on the ability to insert
> new statements. Because fold_builtin_powi is called during front-end
> parsing, the gimple machinery for statement manipulation can't be relied
> upon there.
>
> A new pass (tree-ssa-math-opts.c:execute_lower_pow) is added at all opt
> levels to lower calls to pow and powi early in the middle end. This is
> where the expansion of powi into multiplies takes place. Other folds
> from fold_builtin_pow[i] take place here as well. In many cases, these
> opportunities were already folded during parsing, but not all front ends
> may do this, so the patch doesn't rely on it.
>
> Miscellaneous notes:
>
>      * expand_builtin_pow[i] remain as skeletons, for those cases that
>        can't be lowered into another form.
>
>      * In some cases, fold_builtin_sqrt attempts to convert sqrt into
>        pow[i] (the inverse of what the pow lowering does); this must be
>        disabled when a hardware sqrt instruction is available.
>
>      * Many fewer pow invocations will exist in tree form; they will
>        now be converted to use sqrt, cbrt, powi, etc. where possible.
>        Because powi will be more prevalent now, I duplicated many of
>        the pow folds in fold-const.c to work on powi as well.
>
>      * I added 16 new test cases. There is already good test coverage
>        for this area, so most of the tests are powerpc-specific to test
>        behavior when a hardware sqrt instruction is available.
>
> Patch was regression-tested on powerpc64-linux and i686-linux-gnu. OK
> for mainline?
>
> Bill
>
>
> gcc/
>
> 2011-04-28  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>        * Makefile.in (tree-ssa-math-opts.o): Add dependency on
>        tree-ssa-propagate.h.
>        * builtins.c (fold_builtin_pow, fold_builtin_powi): Remove forward
>        declarations.
>        (powi_cost): Update commentary.
>        (expand_powi_1): Remove.
>        (expand_powi): Remove.
>        (expand_builtin_pow_root): Remove.
>        (expand_builtin_pow): Remove all folds.
>        (expand_builtin_powi): Remove all folds.
>        (fold_builtin_sqrt): Restrict fold of sqrt(Nroot(x)) when
>        hardware sqrt is available; add fold of sqrt(powi(x,y)) to
>        pow(|x|,y*0.5).
>        (fold_builtin_cbrt): Restrict fold of (cbrt(sqrt(x)) when hardware
>        sqrt is available.
>        (fold_eval_powi): New function.
>        (build_call_expr_loc_strip_sign): New function.
>        (fold_builtin_pow_frac_exp): New function.
>        (fold_builtin_pow): Remove static declaration; add fold of
>        pow(x,0.25) to sqrt(sqrt(x)); add fold of pow(x,0.75) to
>        sqrt(x)*sqrt(sqrt(x)); add folds of pow(x,c) to use powi when c,
>        2c, or 3c is an integer.
>        (powi_as_mults_1): New function.
>        (powi_as_mults): New function.
>        (tree_expand_builtin_powi): New function.
>        (fold_builtin_powi): Remove static declaration; delay handling of
>        compile-time constants until after simple folds, and move that
>        handling into fold_eval_powi.
>        * fold-const.c (fold_binary_loc): Add folds on powi similar to
>        existing folds on pow; remove fold of x*x to pow(x,2.0).
>        * passes.c (init_optimization_passes): Add pass_lower_pow.
>        * tree.h (fold_builtin_pow, fold_builtin_powi): Add declarations.
>        * tree-flow.h (tree_expand_builtin_powi): Add declaration.
>        * tree-pass.h (pass_lower_pow): Add declaration.
>        * tree-ssa-math-opts.c (tree-ssa-propagate.h): New include.
>        (execute_lower_pow): New function.
>        (pass_lower_pow): New gimple_opt_pass.
>
> gcc/testsuite/
>
> 2011-05-13  Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
>        * gcc.target/powerpc/pr46728-1.c: New testcase.
>        * gcc.target/powerpc/pr46728-2.c: New testcase.
>        * gcc.target/powerpc/pr46728-3.c: New testcase.
>        * gcc.target/powerpc/pr46728-4.c: New testcase.
>        * gcc.target/powerpc/pr46728-5.c: New testcase.
>        * gcc.dg/pr46728-6.c: New testcase.
>        * gcc.target/powerpc/pr46728-7.c: New testcase.
>        * gcc.target/powerpc/pr46728-8.c: New testcase.
>        * gcc.dg/pr46728-9.c: New testcase.
>        * gcc.target/powerpc/pr46728-10.c: New testcase.
>        * gcc.target/powerpc/pr46728-11.c: New testcase.
>        * gcc.dg/pr46728-12.c: New testcase.
>        * gcc.target/powerpc/pr46728-13.c: New testcase.
>        * gcc.target/powerpc/pr46728-14.c: New testcase.
>        * gcc.target/powerpc/pr46728-15.c: New testcase.
>        * gcc.target/powerpc/pr46728-16.c: New testcase.
>
>
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h  (revision 173730)
> +++ gcc/tree.h  (working copy)
> @@ -1,6 +1,6 @@
>  /* Front-end tree definitions for GNU compiler.
>    Copyright (C) 1989, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
> -   2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
> +   2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
>    Free Software Foundation, Inc.
>
>  This file is part of GCC.
> @@ -5271,6 +5271,8 @@
>  extern bool fold_builtin_next_arg (tree, bool);
>  extern enum built_in_function builtin_mathfn_code (const_tree);
>  extern tree fold_builtin_call_array (location_t, tree, tree, int, tree *);
> +extern tree fold_builtin_pow (location_t, tree, tree, tree, tree);
> +extern tree fold_builtin_powi (location_t, tree, tree, tree, tree);
>  extern tree build_call_expr_loc_array (location_t, tree, int, tree *);
>  extern tree build_call_expr_loc_vec (location_t, tree, VEC(tree,gc) *);
>  extern tree build_call_expr_loc (location_t, tree, int, ...);
> Index: gcc/tree-pass.h
> ===================================================================
> --- gcc/tree-pass.h     (revision 173730)
> +++ gcc/tree-pass.h     (working copy)
> @@ -419,6 +419,7 @@
>  extern struct gimple_opt_pass pass_cse_sincos;
>  extern struct gimple_opt_pass pass_optimize_bswap;
>  extern struct gimple_opt_pass pass_optimize_widening_mul;
> +extern struct gimple_opt_pass pass_lower_pow;
>  extern struct gimple_opt_pass pass_warn_function_return;
>  extern struct gimple_opt_pass pass_warn_function_noreturn;
>  extern struct gimple_opt_pass pass_cselim;
> Index: gcc/builtins.c
> ===================================================================
> --- gcc/builtins.c      (revision 173730)
> +++ gcc/builtins.c      (working copy)
> @@ -149,8 +149,6 @@
>  static rtx expand_builtin_signbit (tree, rtx);
>  static tree fold_builtin_sqrt (location_t, tree, tree);
>  static tree fold_builtin_cbrt (location_t, tree, tree);
> -static tree fold_builtin_pow (location_t, tree, tree, tree, tree);
> -static tree fold_builtin_powi (location_t, tree, tree, tree, tree);
>  static tree fold_builtin_cos (location_t, tree, tree, tree);
>  static tree fold_builtin_cosh (location_t, tree, tree, tree);
>  static tree fold_builtin_tan (tree, tree);
> @@ -2940,7 +2938,7 @@
>
>  /* Return the number of multiplications required to calculate
>    powi(x,n) for an arbitrary x, given the exponent N.  This
> -   function needs to be kept in sync with expand_powi below.  */
> +   function needs to be kept in sync with fold_powi_as_mults, below.  */
>
>  static int
>  powi_cost (HOST_WIDE_INT n)
> @@ -2981,165 +2979,6 @@
>   return result + powi_lookup_cost (val, cache);
>  }
>
> -/* Recursive subroutine of expand_powi.  This function takes the array,
> -   CACHE, of already calculated exponents and an exponent N and returns
> -   an RTX that corresponds to CACHE[1]**N, as calculated in mode MODE.  */
> -
> -static rtx
> -expand_powi_1 (enum machine_mode mode, unsigned HOST_WIDE_INT n, rtx *cache)
> -{
> -  unsigned HOST_WIDE_INT digit;
> -  rtx target, result;
> -  rtx op0, op1;
> -
> -  if (n < POWI_TABLE_SIZE)
> -    {
> -      if (cache[n])
> -       return cache[n];
> -
> -      target = gen_reg_rtx (mode);
> -      cache[n] = target;
> -
> -      op0 = expand_powi_1 (mode, n - powi_table[n], cache);
> -      op1 = expand_powi_1 (mode, powi_table[n], cache);
> -    }
> -  else if (n & 1)
> -    {
> -      target = gen_reg_rtx (mode);
> -      digit = n & ((1 << POWI_WINDOW_SIZE) - 1);
> -      op0 = expand_powi_1 (mode, n - digit, cache);
> -      op1 = expand_powi_1 (mode, digit, cache);
> -    }
> -  else
> -    {
> -      target = gen_reg_rtx (mode);
> -      op0 = expand_powi_1 (mode, n >> 1, cache);
> -      op1 = op0;
> -    }
> -
> -  result = expand_mult (mode, op0, op1, target, 0);
> -  if (result != target)
> -    emit_move_insn (target, result);
> -  return target;
> -}
> -
> -/* Expand the RTL to evaluate powi(x,n) in mode MODE.  X is the
> -   floating point operand in mode MODE, and N is the exponent.  This
> -   function needs to be kept in sync with powi_cost above.  */
> -
> -static rtx
> -expand_powi (rtx x, enum machine_mode mode, HOST_WIDE_INT n)
> -{
> -  rtx cache[POWI_TABLE_SIZE];
> -  rtx result;
> -
> -  if (n == 0)
> -    return CONST1_RTX (mode);
> -
> -  memset (cache, 0, sizeof (cache));
> -  cache[1] = x;
> -
> -  result = expand_powi_1 (mode, (n < 0) ? -n : n, cache);
> -
> -  /* If the original exponent was negative, reciprocate the result.  */
> -  if (n < 0)
> -    result = expand_binop (mode, sdiv_optab, CONST1_RTX (mode),
> -                          result, NULL_RTX, 0, OPTAB_LIB_WIDEN);
> -
> -  return result;
> -}
> -
> -/* Fold a builtin function call to pow, powf, or powl into a series of sqrts or
> -   cbrts.  Return NULL_RTX if no simplification can be made or expand the tree
> -   if we can simplify it.  */
> -static rtx
> -expand_builtin_pow_root (location_t loc, tree arg0, tree arg1, tree type,
> -                        rtx subtarget)
> -{
> -  if (TREE_CODE (arg1) == REAL_CST
> -      && !TREE_OVERFLOW (arg1)
> -      && flag_unsafe_math_optimizations)
> -    {
> -      enum machine_mode mode = TYPE_MODE (type);
> -      tree sqrtfn = mathfn_built_in (type, BUILT_IN_SQRT);
> -      tree cbrtfn = mathfn_built_in (type, BUILT_IN_CBRT);
> -      REAL_VALUE_TYPE c = TREE_REAL_CST (arg1);
> -      tree op = NULL_TREE;
> -
> -      if (sqrtfn)
> -       {
> -         /* Optimize pow (x, 0.5) into sqrt.  */
> -         if (REAL_VALUES_EQUAL (c, dconsthalf))
> -           op = build_call_nofold_loc (loc, sqrtfn, 1, arg0);
> -
> -         /* Don't do this optimization if we don't have a sqrt insn.  */
> -         else if (optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)
> -           {
> -             REAL_VALUE_TYPE dconst1_4 = dconst1;
> -             REAL_VALUE_TYPE dconst3_4;
> -             SET_REAL_EXP (&dconst1_4, REAL_EXP (&dconst1_4) - 2);
> -
> -             real_from_integer (&dconst3_4, VOIDmode, 3, 0, 0);
> -             SET_REAL_EXP (&dconst3_4, REAL_EXP (&dconst3_4) - 2);
> -
> -             /* Optimize pow (x, 0.25) into sqrt (sqrt (x)).  Assume on most
> -                machines that a builtin sqrt instruction is smaller than a
> -                call to pow with 0.25, so do this optimization even if
> -                -Os.  */
> -             if (REAL_VALUES_EQUAL (c, dconst1_4))
> -               {
> -                 op = build_call_nofold_loc (loc, sqrtfn, 1, arg0);
> -                 op = build_call_nofold_loc (loc, sqrtfn, 1, op);
> -               }
> -
> -             /* Optimize pow (x, 0.75) = sqrt (x) * sqrt (sqrt (x)) unless we
> -                are optimizing for space.  */
> -             else if (optimize_insn_for_speed_p ()
> -                      && !TREE_SIDE_EFFECTS (arg0)
> -                      && REAL_VALUES_EQUAL (c, dconst3_4))
> -               {
> -                 tree sqrt1 = build_call_expr_loc (loc, sqrtfn, 1, arg0);
> -                 tree sqrt2 = builtin_save_expr (sqrt1);
> -                 tree sqrt3 = build_call_expr_loc (loc, sqrtfn, 1, sqrt1);
> -                 op = fold_build2_loc (loc, MULT_EXPR, type, sqrt2, sqrt3);
> -               }
> -           }
> -       }
> -
> -      /* Check whether we can do cbrt insstead of pow (x, 1./3.) and
> -        cbrt/sqrts instead of pow (x, 1./6.).  */
> -      if (cbrtfn && ! op
> -         && (tree_expr_nonnegative_p (arg0) || !HONOR_NANS (mode)))
> -       {
> -         /* First try 1/3.  */
> -         REAL_VALUE_TYPE dconst1_3
> -           = real_value_truncate (mode, dconst_third ());
> -
> -         if (REAL_VALUES_EQUAL (c, dconst1_3))
> -           op = build_call_nofold_loc (loc, cbrtfn, 1, arg0);
> -
> -             /* Now try 1/6.  */
> -         else if (optimize_insn_for_speed_p ()
> -                  && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)
> -           {
> -             REAL_VALUE_TYPE dconst1_6 = dconst1_3;
> -             SET_REAL_EXP (&dconst1_6, REAL_EXP (&dconst1_6) - 1);
> -
> -             if (REAL_VALUES_EQUAL (c, dconst1_6))
> -               {
> -                 op = build_call_nofold_loc (loc, sqrtfn, 1, arg0);
> -                 op = build_call_nofold_loc (loc, cbrtfn, 1, op);
> -               }
> -           }
> -       }
> -
> -      if (op)
> -       return expand_expr (op, subtarget, mode, EXPAND_NORMAL);
> -    }
> -
> -  return NULL_RTX;
> -}
> -
>  /* Expand a call to the pow built-in mathematical function.  Return NULL_RTX if
>    a normal call should be emitted rather than expanding the function
>    in-line.  EXP is the expression that is a call to the builtin
> @@ -3148,147 +2987,9 @@
>  static rtx
>  expand_builtin_pow (tree exp, rtx target, rtx subtarget)
>  {

This function now seems redundant - please simply remove its call
and make sure it falls through to libcall emit.

Also use diff -p, it's hard to spot patch context otherwise.

> -  tree arg0, arg1;
> -  tree fn, narg0;
> -  tree type = TREE_TYPE (exp);
> -  REAL_VALUE_TYPE cint, c, c2;
> -  HOST_WIDE_INT n;
> -  rtx op, op2;
> -  enum machine_mode mode = TYPE_MODE (type);
> -
>   if (! validate_arglist (exp, REAL_TYPE, REAL_TYPE, VOID_TYPE))
>     return NULL_RTX;
>
> -  arg0 = CALL_EXPR_ARG (exp, 0);
> -  arg1 = CALL_EXPR_ARG (exp, 1);
> -
> -  if (TREE_CODE (arg1) != REAL_CST
> -      || TREE_OVERFLOW (arg1))
> -    return expand_builtin_mathfn_2 (exp, target, subtarget);
> -
> -  /* Handle constant exponents.  */
> -
> -  /* For integer valued exponents we can expand to an optimal multiplication
> -     sequence using expand_powi.  */
> -  c = TREE_REAL_CST (arg1);
> -  n = real_to_integer (&c);
> -  real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
> -  if (real_identical (&c, &cint)
> -      && ((n >= -1 && n <= 2)
> -         || (flag_unsafe_math_optimizations
> -             && optimize_insn_for_speed_p ()
> -             && powi_cost (n) <= POWI_MAX_MULTS)))
> -    {
> -      op = expand_expr (arg0, subtarget, VOIDmode, EXPAND_NORMAL);
> -      if (n != 1)
> -       {
> -         op = force_reg (mode, op);
> -         op = expand_powi (op, mode, n);
> -       }
> -      return op;
> -    }
> -
> -  narg0 = builtin_save_expr (arg0);
> -
> -  /* If the exponent is not integer valued, check if it is half of an integer.
> -     In this case we can expand to sqrt (x) * x**(n/2).  */
> -  fn = mathfn_built_in (type, BUILT_IN_SQRT);
> -  if (fn != NULL_TREE)
> -    {
> -      real_arithmetic (&c2, MULT_EXPR, &c, &dconst2);
> -      n = real_to_integer (&c2);
> -      real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
> -      if (real_identical (&c2, &cint)
> -         && ((flag_unsafe_math_optimizations
> -              && optimize_insn_for_speed_p ()
> -              && powi_cost (n/2) <= POWI_MAX_MULTS)
> -             /* Even the c == 0.5 case cannot be done unconditionally
> -                when we need to preserve signed zeros, as
> -                pow (-0, 0.5) is +0, while sqrt(-0) is -0.  */
> -             || (!HONOR_SIGNED_ZEROS (mode) && n == 1)
> -             /* For c == 1.5 we can assume that x * sqrt (x) is always
> -                smaller than pow (x, 1.5) if sqrt will not be expanded
> -                as a call.  */
> -             || (n == 3
> -                 && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)))
> -       {
> -         tree call_expr = build_call_nofold_loc (EXPR_LOCATION (exp), fn, 1,
> -                                                 narg0);
> -         /* Use expand_expr in case the newly built call expression
> -            was folded to a non-call.  */
> -         op = expand_expr (call_expr, subtarget, mode, EXPAND_NORMAL);
> -         if (n != 1)
> -           {
> -             op2 = expand_expr (narg0, subtarget, VOIDmode, EXPAND_NORMAL);
> -             op2 = force_reg (mode, op2);
> -             op2 = expand_powi (op2, mode, abs (n / 2));
> -             op = expand_simple_binop (mode, MULT, op, op2, NULL_RTX,
> -                                       0, OPTAB_LIB_WIDEN);
> -             /* If the original exponent was negative, reciprocate the
> -                result.  */
> -             if (n < 0)
> -               op = expand_binop (mode, sdiv_optab, CONST1_RTX (mode),
> -                                  op, NULL_RTX, 0, OPTAB_LIB_WIDEN);
> -           }
> -         return op;
> -       }
> -    }
> -
> -  /* Check whether we can do a series of sqrt or cbrt's instead of the pow
> -     call.  */
> -  op = expand_builtin_pow_root (EXPR_LOCATION (exp), arg0, arg1, type,
> -                               subtarget);
> -  if (op)
> -    return op;
> -
> -  /* Try if the exponent is a third of an integer.  In this case
> -     we can expand to x**(n/3) * cbrt(x)**(n%3).  As cbrt (x) is
> -     different from pow (x, 1./3.) due to rounding and behavior
> -     with negative x we need to constrain this transformation to
> -     unsafe math and positive x or finite math.  */
> -  fn = mathfn_built_in (type, BUILT_IN_CBRT);
> -  if (fn != NULL_TREE
> -      && flag_unsafe_math_optimizations
> -      && (tree_expr_nonnegative_p (arg0)
> -         || !HONOR_NANS (mode)))
> -    {
> -      REAL_VALUE_TYPE dconst3;
> -      real_from_integer (&dconst3, VOIDmode, 3, 0, 0);
> -      real_arithmetic (&c2, MULT_EXPR, &c, &dconst3);
> -      real_round (&c2, mode, &c2);
> -      n = real_to_integer (&c2);
> -      real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
> -      real_arithmetic (&c2, RDIV_EXPR, &cint, &dconst3);
> -      real_convert (&c2, mode, &c2);
> -      if (real_identical (&c2, &c)
> -         && ((optimize_insn_for_speed_p ()
> -              && powi_cost (n/3) <= POWI_MAX_MULTS)
> -             || n == 1))
> -       {
> -         tree call_expr = build_call_nofold_loc (EXPR_LOCATION (exp), fn, 1,
> -                                                 narg0);
> -         op = expand_builtin (call_expr, NULL_RTX, subtarget, mode, 0);
> -         if (abs (n) % 3 == 2)
> -           op = expand_simple_binop (mode, MULT, op, op, op,
> -                                     0, OPTAB_LIB_WIDEN);
> -         if (n != 1)
> -           {
> -             op2 = expand_expr (narg0, subtarget, VOIDmode, EXPAND_NORMAL);
> -             op2 = force_reg (mode, op2);
> -             op2 = expand_powi (op2, mode, abs (n / 3));
> -             op = expand_simple_binop (mode, MULT, op, op2, NULL_RTX,
> -                                       0, OPTAB_LIB_WIDEN);
> -             /* If the original exponent was negative, reciprocate the
> -                result.  */
> -             if (n < 0)
> -               op = expand_binop (mode, sdiv_optab, CONST1_RTX (mode),
> -                                  op, NULL_RTX, 0, OPTAB_LIB_WIDEN);
> -           }
> -         return op;
> -       }
> -    }
> -
> -  /* Fall back to optab expansion.  */
>   return expand_builtin_mathfn_2 (exp, target, subtarget);
>  }
>
> @@ -3312,27 +3013,6 @@
>   arg1 = CALL_EXPR_ARG (exp, 1);
>   mode = TYPE_MODE (TREE_TYPE (exp));
>
> -  /* Handle constant power.  */
> -
> -  if (TREE_CODE (arg1) == INTEGER_CST
> -      && !TREE_OVERFLOW (arg1))
> -    {
> -      HOST_WIDE_INT n = TREE_INT_CST_LOW (arg1);
> -
> -      /* If the exponent is -1, 0, 1 or 2, then expand_powi is exact.
> -        Otherwise, check the number of multiplications required.  */
> -      if ((TREE_INT_CST_HIGH (arg1) == 0
> -          || TREE_INT_CST_HIGH (arg1) == -1)
> -         && ((n >= -1 && n <= 2)
> -             || (optimize_insn_for_speed_p ()
> -                 && powi_cost (n) <= POWI_MAX_MULTS)))
> -       {
> -         op0 = expand_expr (arg0, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> -         op0 = force_reg (mode, op0);
> -         return expand_powi (op0, mode, n);
> -       }
> -    }
> -
>   /* Emit a libcall to libgcc.  */
>
>   /* Mode of the 2nd argument must match that of an int.  */
> @@ -7195,8 +6875,13 @@
>       return build_call_expr_loc (loc, expfn, 1, arg);
>     }
>
> -  /* Optimize sqrt(Nroot(x)) -> pow(x,1/(2*N)).  */
> -  if (flag_unsafe_math_optimizations && BUILTIN_ROOT_P (fcode))
> +  /* Optimize sqrt(Nroot(x)) -> pow(x,1/(2*N)).  However, for N=2, only
> +     do this if there is no hardware sqrt instruction.  (For N=3, this
> +     has the effect of canonicalizing sqrt(cbrt(x)) as cbrt(sqrt(x)),
> +     due to folding on pow(x,1/6).)  */
> +  if (flag_unsafe_math_optimizations && BUILTIN_ROOT_P (fcode)
> +      && (!BUILTIN_SQRT_P (fcode)
> +         || optab_handler (sqrt_optab, TYPE_MODE (type)) == CODE_FOR_nothing))
>     {
>       tree powfn = mathfn_built_in (type, BUILT_IN_POW);
>
> @@ -7238,6 +6923,39 @@
>       return build_call_expr_loc (loc, powfn, 2, arg0, narg1);
>     }
>
> +  /* Optimize sqrt(powi(x,y)) = pow(|x|,y*0.5).  */
> +  if (flag_unsafe_math_optimizations
> +      && (fcode == BUILT_IN_POWI
> +         || fcode == BUILT_IN_POWIF
> +         || fcode == BUILT_IN_POWIL))
> +    {
> +      tree powfn = mathfn_built_in (type, BUILT_IN_POW);
> +      tree arg0 = CALL_EXPR_ARG (arg, 0);
> +      tree arg1 = CALL_EXPR_ARG (arg, 1);
> +      tree narg1;
> +      if (!tree_expr_nonnegative_p (arg0))
> +       arg0 = build1 (ABS_EXPR, type, arg0);
> +      narg1 = fold_convert_loc (loc, type, arg1);
> +      narg1 = fold_build2_loc (loc, MULT_EXPR, type, narg1,
> +                          build_real (type, dconsthalf));
> +      return build_call_expr_loc (loc, powfn, 2, arg0, narg1);
> +    }

Can't you merge this with the existing pow code?

> +
> +  /* Optimize sqrt(x*x) = |x|.  */
> +  if (flag_unsafe_math_optimizations
> +      && TREE_CODE (arg) == MULT_EXPR)
> +    {
> +      tree arg0 = TREE_OPERAND (arg, 0);
> +      tree arg1 = TREE_OPERAND (arg, 1);
> +
> +      if (operand_equal_p (arg0, arg1, 0))
> +       {
> +         if (!tree_expr_nonnegative_p (arg0))
> +           arg0 = build1 (ABS_EXPR, type, arg0);
> +         return fold_convert_loc (loc, type, arg0);
> +       }
> +    }
> +

It looks like this (and the previous) change could easily be split
out into a separate patch.  Please do so (and consider it for
other parts that I might have missed).

>   return NULL_TREE;
>  }
>
> @@ -7271,8 +6989,11 @@
>          return build_call_expr_loc (loc, expfn, 1, arg);
>        }
>
> -      /* Optimize cbrt(sqrt(x)) -> pow(x,1/6).  */
> -      if (BUILTIN_SQRT_P (fcode))
> +      /* Optimize cbrt(sqrt(x)) -> pow(x,1/6), but only if there is no
> +        native square root instruction or we are optimizing for size.  */
> +      if (BUILTIN_SQRT_P (fcode)
> +         && (optab_handler (sqrt_optab, TYPE_MODE (type)) == CODE_FOR_nothing
> +             || !optimize_function_for_speed_p (cfun)))

Note that most of the folding code was also there to canonicalize
things, for example to catch pow(cbrt(sqrt(x)),6).  So I'm not sure
we really want to disable those foldings (which apply from the FE
side only anyway); instead we might want to undo some of
them during optimization (in your new pass).

>        {
>          tree powfn = mathfn_built_in (type, BUILT_IN_POW);
>
> @@ -8010,11 +7731,177 @@
>  }
>
>
> +/* Attempt to evaluate powi(arg0,n) at compile time, unless this should
> +   raise an exception.  */
> +static tree
> +fold_eval_powi (tree arg0, HOST_WIDE_INT n, tree type, enum machine_mode mode)

Please name this fold_powi_to_constant.

> +{
> +  if (TREE_CODE (arg0) == REAL_CST
> +      && !TREE_OVERFLOW (arg0)
> +      && (n > 0
> +         || (!flag_trapping_math && !flag_errno_math)
> +         || !REAL_VALUES_EQUAL (TREE_REAL_CST (arg0), dconst0)))
> +    {
> +      REAL_VALUE_TYPE x;
> +      bool inexact;
> +
> +      x = TREE_REAL_CST (arg0);
> +      inexact = real_powi (&x, mode, &x, n);
> +      if (flag_unsafe_math_optimizations || !inexact)
> +       return build_real (type, x);
> +    }
> +
> +  return NULL_TREE;
> +}
> +
> +
> +/* Build a call to FNDECL with location LOC and arguments ARG0 and ARG1.
> +   If N is even, strip the sign from ARG0 before building the call.  */
> +static tree
> +build_call_expr_loc_strip_sign (HOST_WIDE_INT n, location_t loc, tree fndecl,
> +                               tree arg0, tree arg1)
> +{
> +  if ((n & 1) == 0 && flag_unsafe_math_optimizations)
> +    {
> +      tree narg0 = fold_strip_sign_ops (arg0);
> +      if (narg0)
> +       return build_call_expr_loc (loc, fndecl, 2, narg0, arg1);
> +    }
> +
> +  return build_call_expr_loc (loc, fndecl, 2, arg0, arg1);

That's expensive; please inline this into the callers and avoid
building calls again where possible.

> +}
> +
> +
> +/* Attempt to optimize pow(ARG0, C), where C is a real constant not equal
> +   to any integer.  When 2C or 3C is an integer, we can sometimes improve
> +   the code using sqrt and/or cbrt.  */
> +static tree
> +fold_builtin_pow_frac_exp (location_t loc, tree arg0, REAL_VALUE_TYPE c,
> +                          tree type, enum machine_mode mode)
> +{
> +  REAL_VALUE_TYPE c2, cint;
> +  HOST_WIDE_INT n;
> +  tree sqrtfn = mathfn_built_in (type, BUILT_IN_SQRT);
> +  tree cbrtfn = mathfn_built_in (type, BUILT_IN_CBRT);
> +  tree powifn = mathfn_built_in (type, BUILT_IN_POWI);
> +
> +  /* Optimize pow(x,c), where c = floor(c) + 0.5, into
> +     sqrt(x) * powi(x, floor(c)).  */
> +
> +  real_arithmetic (&c2, MULT_EXPR, &c, &dconst2);
> +  n = real_to_integer (&c2);
> +  real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
> +
> +  if (real_identical (&c2, &cint)
> +      && ((flag_unsafe_math_optimizations
> +          && sqrtfn != NULL_TREE
> +          && powi_cost (n/2) <= POWI_MAX_MULTS)
> +         /* pow(x,0.5) can be done unconditionally provided signed
> +            zeros must not be maintained.  pow(-0,0.5) = +0, but
> +            sqrt(-0) = -0.  */
> +         || (!HONOR_SIGNED_ZEROS (mode) && n == 1)
> +         /* pow(x,1.5)=x*sqrt(x) is safe, and smaller than pow(x,1.5)
> +            provided sqrt will not be expanded as a call.  */
> +         || (n == 3
> +             && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)))
> +    {
> +      tree narg0 = builtin_save_expr (arg0);
> +      tree powi_x_floor_c = NULL_TREE;
> +      HOST_WIDE_INT floor_c = n / 2;
> +      if (n <= 0)
> +       floor_c--;
> +
> +      /* Attempt to fold powi(arg0, floor_c) into a constant.  */
> +      powi_x_floor_c = fold_eval_powi (arg0, floor_c, type, mode);
> +
> +      if (!powi_x_floor_c && powifn)
> +       {
> +         tree tree_floor_c = build_int_cst (integer_type_node, floor_c);
> +         powi_x_floor_c = build_call_expr_loc_strip_sign (floor_c, loc, powifn,
> +                                                          narg0, tree_floor_c);
> +       }
> +
> +      if (powi_x_floor_c)
> +       {
> +         tree sqrt_arg0 = build_call_nofold_loc (loc, sqrtfn, 1, narg0);
> +         return fold_build2_loc (loc, MULT_EXPR, type,
> +                                 sqrt_arg0, powi_x_floor_c);
> +       }
> +    }
> +
> +  /* Optimize pow(x,c), where 3c = n for some integer n, into
> +     powi(x, floor(c)) * powi(cbrt(x), n%3).  */
> +  if (cbrtfn != NULL_TREE
> +      && powifn != NULL_TREE
> +      && flag_unsafe_math_optimizations
> +      && (tree_expr_nonnegative_p (arg0) || !HONOR_NANS (mode)))
> +    {
> +      REAL_VALUE_TYPE dconst3;
> +
> +      real_from_integer (&dconst3, VOIDmode, 3, 0, 0);
> +      real_arithmetic (&c2, MULT_EXPR, &c, &dconst3);
> +      real_round (&c2, mode, &c2);
> +      n = real_to_integer (&c2);
> +      real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
> +      real_arithmetic (&c2, RDIV_EXPR, &cint, &dconst3);
> +      real_convert (&c2, mode, &c2);
> +      if (real_identical (&c2, &c)
> +         && ((optimize_function_for_speed_p (cfun)
> +              && powi_cost (n / 3) <= POWI_MAX_MULTS)
> +             || n == 1))
> +       {
> +         HOST_WIDE_INT floor_c = n / 3;
> +         tree narg0 = builtin_save_expr (arg0);
> +         tree powi_x_floor_c;
> +
> +         if (n <= 0)
> +           floor_c--;
> +
> +         /* Attempt to fold powi(x, floor(c)) into a constant.  */
> +         powi_x_floor_c = fold_eval_powi (arg0, floor_c, type, mode);
> +
> +         if (!powi_x_floor_c)
> +           {
> +             tree tree_floor_c =
> +               build_int_cst (integer_type_node, floor_c);
> +
> +             powi_x_floor_c =
> +               build_call_expr_loc_strip_sign (floor_c, loc, powifn,
> +                                               narg0, tree_floor_c);
> +           }
> +
> +         if (powi_x_floor_c)
> +           {
> +             HOST_WIDE_INT n_mod_3 = n % 3;
> +             tree tree_n_mod_3, powi_cbrt_x, cbrt_arg0;
> +
> +             if (n <= 0)
> +               n_mod_3 = n_mod_3 + 3;
> +
> +             tree_n_mod_3 = build_int_cst (integer_type_node, n_mod_3);
> +
> +             cbrt_arg0 = build_call_nofold_loc (loc, cbrtfn, 1, narg0);
> +             powi_cbrt_x =
> +               build_call_expr_loc_strip_sign (n_mod_3, loc, powifn,
> +                                               cbrt_arg0, tree_n_mod_3);
> +
> +             if (powi_cbrt_x)
> +               return fold_build2_loc (loc, MULT_EXPR, type,
> +                                       powi_x_floor_c, powi_cbrt_x);
> +           }
> +       }
> +    }
> +
> +  return NULL_TREE;
> +}

This is probably mostly code reorganization; can you split it out
into a separate patch?

I'm skipping a bit now ...

> +
>  /* Fold a builtin function call to pow, powf, or powl.  Return
>    NULL_TREE if no simplification can be made.  */
> -static tree
> +tree
>  fold_builtin_pow (location_t loc, tree fndecl, tree arg0, tree arg1, tree type)
>  {
> +  enum machine_mode mode = TYPE_MODE (type);
>   tree res;
>
>   if (!validate_arg (arg0, REAL_TYPE)
> @@ -8032,9 +7919,10 @@
>   if (TREE_CODE (arg1) == REAL_CST
>       && !TREE_OVERFLOW (arg1))
>     {
> -      REAL_VALUE_TYPE cint;
>       REAL_VALUE_TYPE c;
> -      HOST_WIDE_INT n;
> +      tree sqrtfn = mathfn_built_in (type, BUILT_IN_SQRT);
> +      tree cbrtfn = mathfn_built_in (type, BUILT_IN_CBRT);
> +      REAL_VALUE_TYPE dconst1_4, dconst3_4;
>
>       c = TREE_REAL_CST (arg1);
>
> @@ -8054,56 +7942,63 @@
>
>       /* Optimize pow(x,0.5) = sqrt(x).  */
>       if (flag_unsafe_math_optimizations
> -         && REAL_VALUES_EQUAL (c, dconsthalf))
> +         && REAL_VALUES_EQUAL (c, dconsthalf)
> +         && sqrtfn != NULL_TREE)
> +       return build_call_expr_loc (loc, sqrtfn, 1, arg0);
> +
> +      /* Optimize pow(x,0.25) = sqrt(sqrt(x)).  */
> +      dconst1_4 = dconst1;
> +      SET_REAL_EXP (&dconst1_4, REAL_EXP (&dconst1_4) - 2);
> +
> +      if (flag_unsafe_math_optimizations
> +         && REAL_VALUES_EQUAL (c, dconst1_4)
> +         && sqrtfn != NULL_TREE
> +         && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)
>        {
> -         tree sqrtfn = mathfn_built_in (type, BUILT_IN_SQRT);
> -
> -         if (sqrtfn != NULL_TREE)
> -           return build_call_expr_loc (loc, sqrtfn, 1, arg0);
> +         tree sqrt_arg0 = build_call_nofold_loc (loc, sqrtfn, 1, arg0);
> +         return build_call_nofold_loc (loc, sqrtfn, 1, sqrt_arg0);
>        }
>
> -      /* Optimize pow(x,1.0/3.0) = cbrt(x).  */
> -      if (flag_unsafe_math_optimizations)
> +      /* Optimize pow(x,0.75) = sqrt(x) * sqrt(sqrt(x)) unless we are
> +        optimizing for space.  */
> +      real_from_integer (&dconst3_4, VOIDmode, 3, 0, 0);
> +      SET_REAL_EXP (&dconst3_4, REAL_EXP (&dconst3_4) - 2);
> +
> +      if (flag_unsafe_math_optimizations
> +         && optimize_function_for_speed_p (cfun)
> +         && !TREE_SIDE_EFFECTS (arg0)
> +         && REAL_VALUES_EQUAL (c, dconst3_4)
> +         && sqrtfn != NULL_TREE
> +         && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)
>        {
> -         const REAL_VALUE_TYPE dconstroot
> -           = real_value_truncate (TYPE_MODE (type), dconst_third ());
> -
> -         if (REAL_VALUES_EQUAL (c, dconstroot))
> -           {
> -             tree cbrtfn = mathfn_built_in (type, BUILT_IN_CBRT);
> -             if (cbrtfn != NULL_TREE)
> -               return build_call_expr_loc (loc, cbrtfn, 1, arg0);
> -           }
> +         tree sqrt_arg0 = build_call_expr_loc (loc, sqrtfn, 1, arg0);
> +         tree sqrt_save = builtin_save_expr (sqrt_arg0);
> +         tree sqrt_sqrt = build_call_expr_loc (loc, sqrtfn, 1, sqrt_arg0);
> +         return fold_build2_loc (loc, MULT_EXPR, type, sqrt_save, sqrt_sqrt);
>        }
>
> -      /* Check for an integer exponent.  */
> -      n = real_to_integer (&c);
> -      real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
> -      if (real_identical (&c, &cint))
> +      /* Optimize pow(x,1.0/3.0) = cbrt(x), and pow(x,1.0/6.0) =
> +        cbrt(sqrt(x)).  */
> +      if (flag_unsafe_math_optimizations && cbrtfn != NULL_TREE)
>        {
> -         /* Attempt to evaluate pow at compile-time, unless this should
> -            raise an exception.  */
> -         if (TREE_CODE (arg0) == REAL_CST
> -             && !TREE_OVERFLOW (arg0)
> -             && (n > 0
> -                 || (!flag_trapping_math && !flag_errno_math)
> -                 || !REAL_VALUES_EQUAL (TREE_REAL_CST (arg0), dconst0)))
> -           {
> -             REAL_VALUE_TYPE x;
> -             bool inexact;
> +         const REAL_VALUE_TYPE dconst1_3
> +           = real_value_truncate (mode, dconst_third ());
>
> -             x = TREE_REAL_CST (arg0);
> -             inexact = real_powi (&x, TYPE_MODE (type), &x, n);
> -             if (flag_unsafe_math_optimizations || !inexact)
> -               return build_real (type, x);
> -           }
> +         if (REAL_VALUES_EQUAL (c, dconst1_3))
> +           return build_call_expr_loc (loc, cbrtfn, 1, arg0);
>
> -         /* Strip sign ops from even integer powers.  */
> -         if ((n & 1) == 0 && flag_unsafe_math_optimizations)
> +         if (optimize_function_for_speed_p (cfun)
> +             && sqrtfn != NULL_TREE
> +             && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)
>            {
> -             tree narg0 = fold_strip_sign_ops (arg0);
> -             if (narg0)
> -               return build_call_expr_loc (loc, fndecl, 2, narg0, arg1);
> +             REAL_VALUE_TYPE dconst1_6 = dconst1_3;
> +             SET_REAL_EXP (&dconst1_6, REAL_EXP (&dconst1_6) - 1);
> +
> +             if (REAL_VALUES_EQUAL (c, dconst1_6))
> +               {
> +                 tree sqrt_arg0 = build_call_nofold_loc (loc, sqrtfn, 1, arg0);
> +                 return build_call_nofold_loc (loc, cbrtfn, 1, sqrt_arg0);
> +               }
>            }
>        }
>     }
> @@ -8137,7 +8032,7 @@
>          if (tree_expr_nonnegative_p (arg))
>            {
>              const REAL_VALUE_TYPE dconstroot
> -               = real_value_truncate (TYPE_MODE (type), dconst_third ());
> +               = real_value_truncate (mode, dconst_third ());
>              tree narg1 = fold_build2_loc (loc, MULT_EXPR, type, arg1,
>                                        build_real (type, dconstroot));
>              return build_call_expr_loc (loc, fndecl, 2, arg, narg1);
> @@ -8159,12 +8054,148 @@
>        }
>     }
>
> +  if (TREE_CODE (arg1) == REAL_CST
> +      && !TREE_OVERFLOW (arg1)
> +      /* If we weren't able to fold a constant expression as reals,
> +        don't convert into a different form.  */
> +      && TREE_CODE (arg0) != REAL_CST)
> +    {
> +      REAL_VALUE_TYPE c, cint;
> +      HOST_WIDE_INT n;
> +
> +      c = TREE_REAL_CST (arg1);
> +
> +      /* Check for an integer exponent.  */
> +      n = real_to_integer (&c);
> +      real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
> +      if (real_identical (&c, &cint)
> +         && powi_cost (n) <= POWI_MAX_MULTS)
> +       {
> +         /* Convert to powi, which will be processed into an optimal
> +            number of multiplications.  */
> +         tree powifn = mathfn_built_in (type, BUILT_IN_POWI);
> +
> +         if (powifn)
> +           {
> +             tree power = build_int_cst (integer_type_node, n);
> +             return build_call_expr_loc (loc, powifn, 2, arg0, power);
> +           }
> +       }
> +
> +      /* Check for specific fractional exponents we can optimize.  */
> +      else
> +       {
> +         tree opt_tree =
> +           fold_builtin_pow_frac_exp (loc, arg0, c, type, mode);
> +
> +         if (opt_tree)
> +           return opt_tree;
> +       }
> +    }
> +
>   return NULL_TREE;
>  }
>
> +/* Recursive subroutine of fold_powi_as_mults.  This function takes the
> +   array, CACHE, of already calculated exponents and an exponent N and
> +   returns a tree that corresponds to CACHE[1]**N, with type TYPE.  */
> +
> +static tree
> +powi_as_mults_1 (gimple_stmt_iterator *gsi, location_t loc, tree type,
> +                HOST_WIDE_INT n, tree *cache)

I think most (if not all) of this code belongs to its caller in
tree-ssa-math-opts.c.
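
For readers unfamiliar with the expansion, the idea of turning powi
into a short sequence of multiplications can be sketched as follows.
This is a deliberately simplified version using plain binary
square-and-multiply; the quoted code instead consults powi_table for
small exponents and uses a window method for larger ones.  The names
powi_by_squaring and mult_count are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Multiplications performed by the most recent call (illustration only).  */
static unsigned mult_count;

/* Compute x**n with binary square-and-multiply, reciprocating the
   result for negative n.  */
static double
powi_by_squaring (double x, long n)
{
  unsigned long m;
  double result = 1.0;
  double base = x;
  int have_result = 0;

  /* Like the patch, refuse the most negative value, whose negation
     does not fit.  */
  assert (n != LONG_MIN);
  m = n < 0 ? (unsigned long) -n : (unsigned long) n;
  mult_count = 0;

  while (m != 0)
    {
      if (m & 1)
	{
	  if (have_result)
	    {
	      result *= base;
	      mult_count++;
	    }
	  else
	    {
	      /* First set bit: copy instead of multiplying by 1.0.  */
	      result = base;
	      have_result = 1;
	    }
	}
      m >>= 1;
      if (m != 0)
	{
	  base *= base;
	  mult_count++;
	}
    }

  return n < 0 ? 1.0 / result : result;
}
```

With this sketch x**15 takes six multiplications, whereas the chain
x**2, x**3, x**6, x**12, x**15 needs only five; closing that kind of gap
is what a table of precomputed chains is for.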

> +{
> +  tree op0, op1, target;
> +  unsigned HOST_WIDE_INT digit;
> +  gimple mult_stmt;
> +
> +  if (n < POWI_TABLE_SIZE)
> +    {
> +      if (cache[n])
> +       return cache[n];
> +
> +      target = create_tmp_var (type, "powmult");
> +      add_referenced_var (target);
> +      target = make_ssa_name (target, NULL);
> +      cache[n] = target;
> +
> +      op0 = powi_as_mults_1 (gsi, loc, type, n - powi_table[n], cache);
> +      op1 = powi_as_mults_1 (gsi, loc, type, powi_table[n], cache);
> +    }
> +  else if (n & 1)
> +    {
> +      target = create_tmp_var (type, "powmult");
> +      add_referenced_var (target);
> +      target = make_ssa_name (target, NULL);
> +      digit = n & ((1 << POWI_WINDOW_SIZE) - 1);
> +      op0 = powi_as_mults_1 (gsi, loc, type, n - digit, cache);
> +      op1 = powi_as_mults_1 (gsi, loc, type, digit, cache);
> +    }
> +  else
> +    {
> +      target = create_tmp_var (type, "powmult");
> +      add_referenced_var (target);
> +      target = make_ssa_name (target, NULL);
> +      op0 = powi_as_mults_1 (gsi, loc, type, n >> 1, cache);
> +      op1 = op0;
> +    }
> +
> +  mult_stmt = gimple_build_assign_with_ops (MULT_EXPR, target, op0, op1);
> +  SSA_NAME_DEF_STMT (target) = mult_stmt;
> +  gsi_insert_before (gsi, mult_stmt, GSI_SAME_STMT);
> +
> +  return target;
> +}
> +
> +/* Convert ARG0**N to a tree of multiplications of ARG0 with itself.
> +   This function needs to be kept in sync with powi_cost, above.  */
> +
> +static tree
> +powi_as_mults (gimple_stmt_iterator *gsi, location_t loc,
> +              tree arg0, HOST_WIDE_INT n)
> +{
> +  tree cache[POWI_TABLE_SIZE], result, type = TREE_TYPE (arg0);
> +
> +  if (n == 0)
> +    return omit_one_operand_loc (loc, type, build_real (type, dconst1), arg0);
> +
> +  memset (cache, 0,  sizeof (cache));
> +  cache[1] = arg0;
> +
> +  result = powi_as_mults_1 (gsi, loc, type, (n < 0) ? -n : n, cache);
> +
> +  /* If the original exponent was negative, reciprocate the result.  */
> +  if (n < 0)
> +    result = build2_loc (loc, RDIV_EXPR, type,
> +                        build_real (type, dconst1), result);
> +  return result;
> +}
> +
> +/* ARGS are the two arguments to a powi builtin in GSI with location info
> +   LOC.  If the arguments are appropriate, create an equivalent set of
> +   statements prior to GSI using an optimal number of multiplications,
> +   and return an expression holding the result.  */
> +
> +tree
> +tree_expand_builtin_powi (gimple_stmt_iterator *gsi, location_t loc, tree *args)
> +{
> +  HOST_WIDE_INT n = TREE_INT_CST_LOW (args[1]);
> +  HOST_WIDE_INT n_hi = TREE_INT_CST_HIGH (args[1]);
> +
> +  if ((n_hi == 0 || n_hi == -1)
> +      /* Avoid largest negative number.  */
> +      && (n != -n)
> +      && ((n >= -1 && n <= 2)
> +         || (optimize_function_for_speed_p (cfun)
> +             && powi_cost (n) <= POWI_MAX_MULTS)))
> +    return powi_as_mults (gsi, loc, args[0], n);
> +
> +  return NULL_TREE;
> +}
> +
>  /* Fold a builtin function call to powi, powif, or powil with argument ARG.
>    Return NULL_TREE if no simplification can be made.  */
> -static tree
> +
> +tree
>  fold_builtin_powi (location_t loc, tree fndecl ATTRIBUTE_UNUSED,
>                   tree arg0, tree arg1, tree type)
>  {
> @@ -8179,17 +8210,8 @@
>   if (host_integerp (arg1, 0))
>     {
>       HOST_WIDE_INT c = TREE_INT_CST_LOW (arg1);
> +      tree powi_const;
>
> -      /* Evaluate powi at compile-time.  */
> -      if (TREE_CODE (arg0) == REAL_CST
> -         && !TREE_OVERFLOW (arg0))
> -       {
> -         REAL_VALUE_TYPE x;
> -         x = TREE_REAL_CST (arg0);
> -         real_powi (&x, TYPE_MODE (type), &x, c);
> -         return build_real (type, x);
> -       }
> -
>       /* Optimize pow(x,0) = 1.0.  */
>       if (c == 0)
>        return omit_one_operand_loc (loc, type, build_real (type, dconst1),
> @@ -8203,6 +8225,12 @@
>       if (c == -1)
>        return fold_build2_loc (loc, RDIV_EXPR, type,
>                           build_real (type, dconst1), arg0);
> +
> +      /* Attempt to evaluate powi at compile time.  */
> +      powi_const = fold_eval_powi (arg0, c, type, TYPE_MODE (type));
> +
> +      if (powi_const)
> +       return powi_const;
>     }
>
>   return NULL_TREE;
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    (revision 173730)
> +++ gcc/fold-const.c    (working copy)
> @@ -10460,6 +10460,36 @@
>                    }
>                }
>
> +             /* Optimizations of powi(...)*powi(...).  */
> +             if ((fcode0 == BUILT_IN_POWI && fcode1 == BUILT_IN_POWI)
> +                 || (fcode0 == BUILT_IN_POWIF && fcode1 == BUILT_IN_POWIF)
> +                 || (fcode0 == BUILT_IN_POWIL && fcode1 == BUILT_IN_POWIL))
> +               {
> +                 tree arg00 = CALL_EXPR_ARG (arg0, 0);
> +                 tree arg01 = CALL_EXPR_ARG (arg0, 1);
> +                 tree arg10 = CALL_EXPR_ARG (arg1, 0);
> +                 tree arg11 = CALL_EXPR_ARG (arg1, 1);
> +
> +                 /* Optimize powi(x,y)*powi(z,y) as powi(x*z,y).  */
> +                 if (operand_equal_p (arg01, arg11, 0))
> +                   {
> +                     tree powfn = TREE_OPERAND (CALL_EXPR_FN (arg0), 0);
> +                     tree arg = fold_build2_loc (loc, MULT_EXPR, type,
> +                                             arg00, arg10);
> +                     return build_call_expr_loc (loc, powfn, 2, arg, arg01);
> +                   }
> +
> +                 /* Optimize powi(x,y)*powi(x,z) as powi(x,y+z).  */
> +                 if (operand_equal_p (arg00, arg10, 0))
> +                   {
> +                     tree powfn = TREE_OPERAND (CALL_EXPR_FN (arg0), 0);
> +                     tree inttype = TREE_TYPE (arg01);
> +                     tree arg = fold_build2_loc (loc, PLUS_EXPR, inttype,
> +                                             arg01, arg11);
> +                     return build_call_expr_loc (loc, powfn, 2, arg00, arg);
> +                   }
> +               }
> +
>              /* Optimize tan(x)*cos(x) as sin(x).  */
>              if (((fcode0 == BUILT_IN_TAN && fcode1 == BUILT_IN_COS)
>                   || (fcode0 == BUILT_IN_TANF && fcode1 == BUILT_IN_COSF)
> @@ -10521,16 +10551,61 @@
>                    }
>                }
>
> -             /* Optimize x*x as pow(x,2.0), which is expanded as x*x.  */
> -             if (optimize_function_for_speed_p (cfun)
> -                 && operand_equal_p (arg0, arg1, 0))
> +             /* Optimize x*powi(x,c) as powi(x,c+1).  */
> +             if (fcode1 == BUILT_IN_POWI
> +                 || fcode1 == BUILT_IN_POWIF
> +                 || fcode1 == BUILT_IN_POWIL)
>                {
> -                 tree powfn = mathfn_built_in (type, BUILT_IN_POW);
> +                 tree arg10 = CALL_EXPR_ARG (arg1, 0);
> +                 tree arg11 = CALL_EXPR_ARG (arg1, 1);
> +                 if (TREE_CODE (arg11) == INTEGER_CST
> +                     && !TREE_OVERFLOW (arg11)
> +                     && operand_equal_p (arg0, arg10, 0))
> +                   {
> +                     tree powifn = TREE_OPERAND (CALL_EXPR_FN (arg1), 0);
> +                     HOST_WIDE_INT n, n_hi, n_plus_1;
> +                     tree arg;
>
> -                 if (powfn)
> +                     n = TREE_INT_CST_LOW (arg11);
> +                     n_hi = TREE_INT_CST_HIGH (arg11);
> +                     n_plus_1 = n + 1;
> +                     if ((n_hi == 0 || n_hi == -1)
> +                         /* Avoid overflow.  */
> +                         && n_plus_1 > n)
> +                       {
> +                         arg = build_int_cst (TREE_TYPE (arg11), n + 1);
> +                         return build_call_expr_loc (loc, powifn, 2,
> +                                                     arg0, arg);
> +                       }
> +                   }
> +               }
> +
> +             /* Optimize powi(x,c)*x as powi(x,c+1).  */
> +             if (fcode0 == BUILT_IN_POWI
> +                 || fcode0 == BUILT_IN_POWIF
> +                 || fcode0 == BUILT_IN_POWIL)
> +               {
> +                 tree arg00 = CALL_EXPR_ARG (arg0, 0);
> +                 tree arg01 = CALL_EXPR_ARG (arg0, 1);
> +                 if (TREE_CODE (arg01) == INTEGER_CST
> +                     && !TREE_OVERFLOW (arg01)
> +                     && operand_equal_p (arg1, arg00, 0))
>                    {
> -                     tree arg = build_real (type, dconst2);
> -                     return build_call_expr_loc (loc, powfn, 2, arg0, arg);
> +                     tree powifn = TREE_OPERAND (CALL_EXPR_FN (arg0), 0);
> +                     HOST_WIDE_INT n, n_hi, n_plus_1;
> +                     tree arg;
> +
> +                     n = TREE_INT_CST_LOW (arg01);
> +                     n_hi = TREE_INT_CST_HIGH (arg01);
> +                     n_plus_1 = n + 1;
> +                     if ((n_hi == 0 || n_hi == -1)
> +                         /* Avoid overflow.  */
> +                         && n_plus_1 > n)
> +                       {
> +                         arg = build_int_cst (TREE_TYPE (arg01), n + 1);
> +                         return build_call_expr_loc (loc, powifn, 2,
> +                                                     arg1, arg);
> +                       }
>                    }
>                }
>            }
> @@ -11457,6 +11532,34 @@
>                }
>            }
>
> +         /* Optimize powi(x,c)/x as powi(x,c-1).  */
> +         if (fcode0 == BUILT_IN_POWI
> +             || fcode0 == BUILT_IN_POWIF
> +             || fcode0 == BUILT_IN_POWIL)
> +           {
> +             tree arg00 = CALL_EXPR_ARG (arg0, 0);
> +             tree arg01 = CALL_EXPR_ARG (arg0, 1);
> +             if (TREE_CODE (arg01) == INTEGER_CST
> +                 && !TREE_OVERFLOW (arg01)
> +                 && operand_equal_p (arg1, arg00, 0))
> +               {
> +                 tree powifn = TREE_OPERAND (CALL_EXPR_FN (arg0), 0);
> +                 HOST_WIDE_INT n, n_hi, n_minus_1;
> +                 tree arg;
> +
> +                 n = TREE_INT_CST_LOW (arg01);
> +                 n_hi = TREE_INT_CST_HIGH (arg01);
> +                 n_minus_1 = n - 1;
> +                 if ((n_hi == 0 || n_hi == -1)
> +                     /* Avoid overflow.  */
> +                     && n_minus_1 < n)
> +                   {
> +                     arg = build_int_cst (TREE_TYPE (arg01), n - 1);
> +                     return build_call_expr_loc (loc, powifn, 2, arg1, arg);
> +                   }
> +               }
> +           }
> +
>          /* Optimize a/root(b/c) into a*root(c/b).  */
>          if (BUILTIN_ROOT_P (fcode1))
>            {
> @@ -11499,6 +11602,20 @@
>              arg1 = build_call_expr_loc (loc, powfn, 2, arg10, neg11);
>              return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);
>            }
> +
> +         /* Optimize x/powi(y,z) into x*powi(y,-z).  */
> +         if (fcode1 == BUILT_IN_POWI
> +             || fcode1 == BUILT_IN_POWIF
> +             || fcode1 == BUILT_IN_POWIL)
> +           {
> +             tree powifn = TREE_OPERAND (CALL_EXPR_FN (arg1), 0);
> +             tree arg10 = CALL_EXPR_ARG (arg1, 0);
> +             tree arg11 = CALL_EXPR_ARG (arg1, 1);
> +             tree neg11 = fold_convert_loc (loc, integer_type_node,
> +                                            negate_expr (arg11));
> +             arg1 = build_call_expr_loc (loc, powifn, 2, arg10, neg11);
> +             return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);
> +           }

Did you check whether any of the above apply?  You removed the
x*x -> pow(x,2.0) folding as far as I see but did not introduce a
corresponding powi(x,2) one.
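
For illustration only (this is not code from the patch), the identities
behind the powi folds discussed above can be checked numerically.  The
sketch assumes a hypothetical reference helper, powi_ref, standing in for
__builtin_powi, and compares with a relative tolerance because the
rewritten forms need not round identically.  It is only meant for small
exponents.

```c
/* Hypothetical stand-in for __builtin_powi: x**n by repeated
   multiplication.  Assumes |n| is small (no overflow handling).  */
static double
powi_ref (double x, int n)
{
  double r = 1.0;
  int i, m = n < 0 ? -n : n;

  for (i = 0; i < m; i++)
    r *= x;
  return n < 0 ? 1.0 / r : r;
}

static double
abs_d (double v)
{
  return v < 0.0 ? -v : v;
}

/* Relative comparison; bit-exact equality is not guaranteed.  */
static int
close_enough (double a, double b)
{
  double scale = abs_d (a) > abs_d (b) ? abs_d (a) : abs_d (b);

  return abs_d (a - b) <= 1e-12 * scale;
}

/* Returns 1 if all three rewrites agree for the given inputs:
     powi (y, n) / y  ->  powi (y, n - 1)
     x / powi (y, n)  ->  x * powi (y, -n)
     x * x            ->  powi (x, 2)  */
static int
check_powi_folds (double x, double y, int n)
{
  return close_enough (powi_ref (y, n) / y, powi_ref (y, n - 1))
         && close_enough (x / powi_ref (y, n), x * powi_ref (y, -n))
         && close_enough (x * x, powi_ref (x, 2));
}
```

Calling check_powi_folds (3.0, 1.95, 5) exercises all three rewrites with
values taken from the test inputs in the patch.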

>        }
>       return NULL_TREE;
>
> Index: gcc/testsuite/gcc.target/powerpc/pr46728-13.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr46728-13.c       (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr46728-13.c       (revision 0)
> @@ -0,0 +1,27 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
> +
> +#include <math.h>
> +
> +extern void abort (void);
> +
> +#define NVALS 6
> +
> +static double
> +convert_it (double x)
> +{
> +  return pow (x, 1.0 / 6.0);
> +}
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  double values[NVALS] = { 3.0, 1.95, 2.227, 729.0, 64.0, .0008797 };
> +  unsigned i;
> +
> +  for (i = 0; i < NVALS; i++)
> +    if (convert_it (values[i]) != cbrt (sqrt (values[i])))
> +      abort ();
> +
> +  return 0;
> +}
> Index: gcc/testsuite/gcc.target/powerpc/pr46728-3.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr46728-3.c        (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr46728-3.c        (revision 0)
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
> +
> +#include <math.h>
> +
> +extern void abort (void);
> +
> +#define NVALS 6
> +
> +static double
> +convert_it (double x)
> +{
> +  return pow (x, 0.75);
> +}
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
> +  unsigned i;
> +
> +  for (i = 0; i < NVALS; i++)
> +    if (convert_it (values[i]) != sqrt(values[i]) * sqrt (sqrt (values[i])))
> +      abort ();
> +
> +  return 0;
> +}
> +
> +
> +/* { dg-final { scan-assembler-times "sqrt" 4 { target powerpc*-*-* } } } */
> +/* { dg-final { scan-assembler-not "pow" { target powerpc*-*-* } } } */
> Index: gcc/testsuite/gcc.target/powerpc/pr46728-14.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr46728-14.c       (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr46728-14.c       (revision 0)
> @@ -0,0 +1,78 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
> +
> +#include <math.h>
> +
> +extern void abort (void);
> +
> +#define NVALS 6
> +
> +static double
> +convert_it_1 (double x)
> +{
> +  return pow (x, 1.5);
> +}
> +
> +static double
> +convert_it_2 (double x)
> +{
> +  return pow (x, 2.5);
> +}
> +
> +static double
> +convert_it_3 (double x)
> +{
> +  return pow (x, -0.5);
> +}
> +
> +static double
> +convert_it_4 (double x)
> +{
> +  return pow (x, 10.5);
> +}
> +
> +static double
> +convert_it_5 (double x)
> +{
> +  return pow (x, -3.5);
> +}
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
> +  double PREC = .999999;
> +  unsigned i;
> +
> +  for (i = 0; i < NVALS; i++)
> +    {
> +      volatile double x, y;
> +
> +      x = sqrt (values[i]);
> +      y = __builtin_powi (values[i], 1);
> +      if (fabs (convert_it_1 (values[i]) / (x * y)) < PREC)
> +       abort ();
> +
> +      x = sqrt (values[i]);
> +      y = __builtin_powi (values[i], 2);
> +      if (fabs (convert_it_2 (values[i]) / (x * y)) < PREC)
> +       abort ();
> +
> +      x = sqrt (values[i]);
> +      y = __builtin_powi (values[i], -1);
> +      if (fabs (convert_it_3 (values[i]) / (x * y)) < PREC)
> +       abort ();
> +
> +      x = sqrt (values[i]);
> +      y = __builtin_powi (values[i], 10);
> +      if (fabs (convert_it_4 (values[i]) / (x * y)) < PREC)
> +       abort ();
> +
> +      x = sqrt (values[i]);
> +      y = __builtin_powi (values[i], -4);
> +      if (fabs (convert_it_5 (values[i]) / (x * y)) < PREC)
> +       abort ();
> +    }
> +
> +  return 0;
> +}
> Index: gcc/testsuite/gcc.target/powerpc/pr46728-4.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr46728-4.c        (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr46728-4.c        (revision 0)
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
> +
> +#include <math.h>
> +
> +extern void abort (void);
> +
> +#define NVALS 6
> +
> +static double
> +convert_it (double x)
> +{
> +  return pow (x, 1.0 / 3.0);
> +}
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  double values[NVALS] = { 3.0, 1.95, 2.227, 729.0, 64.0, .0008797 };
> +  unsigned i;
> +
> +  for (i = 0; i < NVALS; i++)
> +    if (convert_it (values[i]) != cbrt (values[i]))
> +      abort ();
> +
> +  return 0;
> +}
> +
> +
> +/* { dg-final { scan-assembler-times "cbrt" 2 { target powerpc*-*-* } } } */
> +/* { dg-final { scan-assembler-not "pow" { target powerpc*-*-* } } } */
> Index: gcc/testsuite/gcc.target/powerpc/pr46728-15.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr46728-15.c       (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr46728-15.c       (revision 0)
> @@ -0,0 +1,67 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
> +
> +#include <math.h>
> +
> +extern void abort (void);
> +
> +#define NVALS 6
> +
> +static double
> +convert_it_1 (double x)
> +{
> +  return pow (x, 10.0 / 3.0);
> +}
> +
> +static double
> +convert_it_2 (double x)
> +{
> +  return pow (x, 11.0 / 3.0);
> +}
> +
> +static double
> +convert_it_3 (double x)
> +{
> +  return pow (x, -7.0 / 3.0);
> +}
> +
> +static double
> +convert_it_4 (double x)
> +{
> +  return pow (x, -8.0 / 3.0);
> +}
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
> +  double PREC = .999999;
> +  unsigned i;
> +
> +  for (i = 0; i < NVALS; i++)
> +    {
> +      volatile double x, y;
> +
> +      x = __builtin_powi (values[i], 3);
> +      y = __builtin_powi (cbrt (values[i]), 1);
> +      if (fabs (convert_it_1 (values[i]) / (x * y)) < PREC)
> +       abort ();
> +
> +      x = __builtin_powi (values[i], 3);
> +      y = __builtin_powi (cbrt (values[i]), 2);
> +      if (fabs (convert_it_2 (values[i]) / (x * y)) < PREC)
> +       abort ();
> +
> +      x = __builtin_powi (values[i], -3);
> +      y = __builtin_powi (cbrt (values[i]), 2);
> +      if (fabs (convert_it_3 (values[i]) / (x * y)) < PREC)
> +       abort ();
> +
> +      x = __builtin_powi (values[i], -3);
> +      y = __builtin_powi (cbrt (values[i]), 1);
> +      if (fabs (convert_it_4 (values[i]) / (x * y)) < PREC)
> +       abort ();
> +    }
> +
> +  return 0;
> +}
> Index: gcc/testsuite/gcc.target/powerpc/pr46728-5.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr46728-5.c        (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr46728-5.c        (revision 0)
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
> +
> +#include <math.h>
> +
> +extern void abort (void);
> +
> +#define NVALS 6
> +
> +static double
> +convert_it (double x)
> +{
> +  return pow (x, 1.0 / 6.0);
> +}
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  double values[NVALS] = { 3.0, 1.95, 2.227, 729.0, 64.0, .0008797 };
> +  unsigned i;
> +
> +  for (i = 0; i < NVALS; i++)
> +    if (convert_it (values[i]) != cbrt (sqrt (values[i])))
> +      abort ();
> +
> +  return 0;
> +}
> +
> +
> +/* { dg-final { scan-assembler-times "cbrt" 2 { target powerpc*-*-* } } } */
> +/* { dg-final { scan-assembler-not " pow " { target powerpc*-*-* } } } */
> Index: gcc/testsuite/gcc.target/powerpc/pr46728-16.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr46728-16.c       (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr46728-16.c       (revision 0)
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffast-math -mcpu=power6" } */
> +
> +double foo (double x, double y)
> +{
> +  return __builtin_pow (x, 0.75) + y;
> +}
> +
> +
> +/* { dg-final { scan-assembler "fmadd" { target powerpc*-*-* } } } */
> Index: gcc/testsuite/gcc.target/powerpc/pr46728-7.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr46728-7.c        (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr46728-7.c        (revision 0)
> @@ -0,0 +1,58 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
> +
> +#include <math.h>
> +
> +extern void abort (void);
> +
> +#define NVALS 6
> +
> +static double
> +convert_it_1 (double x)
> +{
> +  return pow (x, 1.5);
> +}
> +
> +static double
> +convert_it_2 (double x)
> +{
> +  return pow (x, 2.5);
> +}
> +
> +static double
> +convert_it_3 (double x)
> +{
> +  return pow (x, -0.5);
> +}
> +
> +static double
> +convert_it_4 (double x)
> +{
> +  return pow (x, 10.5);
> +}
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
> +  unsigned i;
> +
> +  for (i = 0; i < NVALS; i++)
> +    {
> +      if (convert_it_1 (values[i]) != sqrt (values[i]) * powi (values[i], 1))
> +       abort ();
> +      if (convert_it_2 (values[i]) != sqrt (values[i]) * powi (values[i], 2))
> +       abort ();
> +      if (convert_it_3 (values[i]) != sqrt (values[i]) * powi (values[i], -1))
> +       abort ();
> +      if (convert_it_4 (values[i]) != sqrt (values[i]) * powi (values[i], 10))
> +       abort ();
> +    }
> +
> +  return 0;
> +}
> +
> +
> +/* { dg-final { scan-assembler-times "sqrt" 5 { target powerpc*-*-* } } } */
> +/* { dg-final { scan-assembler-times "powi" 4 { target powerpc*-*-* } } } */
> +/* { dg-final { scan-assembler-not "pow " { target powerpc*-*-* } } } */
> Index: gcc/testsuite/gcc.target/powerpc/pr46728-10.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr46728-10.c       (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr46728-10.c       (revision 0)
> @@ -0,0 +1,28 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
> +
> +#include <math.h>
> +
> +extern void abort (void);
> +
> +#define NVALS 6
> +
> +static double
> +convert_it (double x)
> +{
> +  return pow (x, 0.25);
> +}
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
> +  unsigned i;
> +
> +  for (i = 0; i < NVALS; i++)
> +    if (convert_it (values[i]) != sqrt (sqrt (values[i])))
> +      abort ();
> +
> +  return 0;
> +}
> +
> Index: gcc/testsuite/gcc.target/powerpc/pr46728-8.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr46728-8.c        (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr46728-8.c        (revision 0)
> @@ -0,0 +1,62 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
> +
> +#include <math.h>
> +
> +extern void abort (void);
> +
> +#define NVALS 6
> +
> +static double
> +convert_it_1 (double x)
> +{
> +  return pow (x, 10.0 / 3.0);
> +}
> +
> +static double
> +convert_it_2 (double x)
> +{
> +  return pow (x, 11.0 / 3.0);
> +}
> +
> +static double
> +convert_it_3 (double x)
> +{
> +  return pow (x, -7.0 / 3.0);
> +}
> +
> +static double
> +convert_it_4 (double x)
> +{
> +  return pow (x, -8.0 / 3.0);
> +}
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
> +  unsigned i;
> +
> +  for (i = 0; i < NVALS; i++)
> +    {
> +      if (convert_it_1 (values[i]) !=
> +         powi (values[i], 3) * powi (cbrt (values[i]), 1))
> +       abort ();
> +      if (convert_it_2 (values[i]) !=
> +         powi (values[i], 3) * powi (cbrt (values[i]), 2))
> +       abort ();
> +      if (convert_it_3 (values[i]) !=
> +         powi (values[i], -3) * powi (cbrt (values[i]), 2))
> +       abort ();
> +      if (convert_it_4 (values[i]) !=
> +         powi (values[i], -3) * powi (cbrt (values[i]), 1))
> +       abort ();
> +    }
> +
> +  return 0;
> +}
> +
> +
> +/* { dg-final { scan-assembler-times "powi" 8 { target powerpc*-*-* } } } */
> +/* { dg-final { scan-assembler-times "cbrt" 5 { target powerpc*-*-* } } } */
> +/* { dg-final { scan-assembler-not "pow " { target powerpc*-*-* } } } */
> Index: gcc/testsuite/gcc.target/powerpc/pr46728-11.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr46728-11.c       (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr46728-11.c       (revision 0)
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
> +
> +#include <math.h>
> +
> +extern void abort (void);
> +
> +#define NVALS 6
> +
> +static double
> +convert_it (double x)
> +{
> +  return pow (x, 0.75);
> +}
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
> +  double PREC = 0.999999;
> +  unsigned i;
> +
> +  for (i = 0; i < NVALS; i++)
> +    {
> +      volatile double x, y;
> +      x = sqrt (values[i]);
> +      y = sqrt (sqrt (values[i]));
> +
> +      if (fabs (convert_it (values[i]) / (x * y)) < PREC)
> +       abort ();
> +    }
> +
> +  return 0;
> +}
> Index: gcc/testsuite/gcc.target/powerpc/pr46728-1.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr46728-1.c        (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr46728-1.c        (revision 0)
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
> +
> +#include <math.h>
> +
> +extern void abort (void);
> +
> +#define NVALS 6
> +
> +static double
> +convert_it (double x)
> +{
> +  return pow (x, 0.5);
> +}
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
> +  unsigned i;
> +
> +  for (i = 0; i < NVALS; i++)
> +    if (convert_it (values[i]) != sqrt (values[i]))
> +      abort ();
> +
> +  return 0;
> +}
> +
> +
> +/* { dg-final { scan-assembler-times "fsqrt" 2 { target powerpc*-*-* } } } */
> +/* { dg-final { scan-assembler-not "pow" { target powerpc*-*-* } } } */
> Index: gcc/testsuite/gcc.target/powerpc/pr46728-2.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr46728-2.c        (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr46728-2.c        (revision 0)
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
> +
> +#include <math.h>
> +
> +extern void abort (void);
> +
> +#define NVALS 6
> +
> +static double
> +convert_it (double x)
> +{
> +  return pow (x, 0.25);
> +}
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
> +  unsigned i;
> +
> +  for (i = 0; i < NVALS; i++)
> +    if (convert_it (values[i]) != sqrt (sqrt (values[i])))
> +      abort ();
> +
> +  return 0;
> +}
> +
> +
> +/* { dg-final { scan-assembler-times "fsqrt" 4 { target powerpc*-*-* } } } */
> +/* { dg-final { scan-assembler-not "pow" { target powerpc*-*-* } } } */
> Index: gcc/testsuite/gcc.dg/pr46728-9.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/pr46728-9.c    (revision 0)
> +++ gcc/testsuite/gcc.dg/pr46728-9.c    (revision 0)
> @@ -0,0 +1,29 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm" } */
> +
> +#include <math.h>
> +
> +extern void abort (void);
> +
> +#define NVALS 6
> +
> +static double
> +convert_it (double x)
> +{
> +  return pow (x, 0.5);
> +}
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
> +  double PREC = 0.999999;
> +  unsigned i;
> +
> +  for (i = 0; i < NVALS; i++)
> +    if (fabs (convert_it (values[i]) / sqrt (values[i])) < PREC)
> +      abort ();
> +
> +  return 0;
> +}
> +
> Index: gcc/testsuite/gcc.dg/pr46728-12.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/pr46728-12.c   (revision 0)
> +++ gcc/testsuite/gcc.dg/pr46728-12.c   (revision 0)
> @@ -0,0 +1,28 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm" } */
> +
> +#include <math.h>
> +
> +extern void abort (void);
> +
> +#define NVALS 6
> +
> +static double
> +convert_it (double x)
> +{
> +  return pow (x, 1.0 / 3.0);
> +}
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  double values[NVALS] = { 3.0, 1.95, 2.227, 729.0, 64.0, .0008797 };
> +  double PREC = 0.999999;
> +  unsigned i;
> +
> +  for (i = 0; i < NVALS; i++)
> +    if (fabs (convert_it (values[i]) / cbrt (values[i])) < PREC)
> +      abort ();
> +
> +  return 0;
> +}
> Index: gcc/testsuite/gcc.dg/pr46728-6.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/pr46728-6.c    (revision 0)
> +++ gcc/testsuite/gcc.dg/pr46728-6.c    (revision 0)
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffast-math -lm" } */
> +
> +#include <math.h>
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  volatile double result;
> +
> +  result = pow (-0.0, 3.0);
> +  result = pow (26.47, -2.0);
> +  result = pow (0.0, 0.0);
> +  result = pow (22.3, 1.0);
> +  result = pow (33.2, -1.0);
> +
> +  return 0;
> +}
> +
> +
> +/* { dg-final { scan-assembler-not "pow" } } */
> Index: gcc/tree-ssa-math-opts.c
> ===================================================================
> --- gcc/tree-ssa-math-opts.c    (revision 173730)
> +++ gcc/tree-ssa-math-opts.c    (working copy)
> @@ -1,5 +1,5 @@
>  /* Global, SSA-based optimizations using mathematical identities.
> -   Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010
> +   Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010, 2011
>    Free Software Foundation, Inc.
>
>  This file is part of GCC.
> @@ -103,6 +103,7 @@
>  #include "rtl.h"               /* Because optabs.h wants enum rtx_code.  */
>  #include "expr.h"              /* Because optabs.h wants sepops.  */
>  #include "optabs.h"
> +#include "tree-ssa-propagate.h"

What for?

>  /* This structure represents one basic block that either computes a
>    division, or is a common dominator for basic block that compute a
> @@ -1854,3 +1855,123 @@
>   | TODO_update_ssa                     /* todo_flags_finish */
>  }
>  };
> +
> +/* Simplify built-in calls to pow and powi.  This is done prior to
> +   the vectorizer to expose vector square root and multiplication
> +   series opportunities.  */
> +
> +static unsigned int
> +execute_lower_pow (void)
> +{
> +  basic_block bb;
> +
> +  FOR_EACH_BB (bb)
> +    {
> +      gimple_stmt_iterator gsi;
> +
> +      for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);)
> +        {
> +         gimple stmt = gsi_stmt (gsi);
> +
> +         if (is_gimple_call (stmt))
> +           {
> +             tree fndecl = gimple_call_fndecl (stmt);
> +             tree result = NULL_TREE;
> +
> +             if (!fndecl
> +                 || TREE_CODE (fndecl) != FUNCTION_DECL

fndecl is always a FUNCTION_DECL here (gimple_call_fndecl returns
nothing else), so this check is redundant.

> +                 || !DECL_BUILT_IN (fndecl)
> +                 || gimple_call_va_arg_pack_p (stmt)

Why special-case this?

> +                 || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
> +               {
> +                 gsi_next (&gsi);
> +                 continue;
> +               }
> +
> +             switch (DECL_FUNCTION_CODE (fndecl))
> +               {
> +               case BUILT_IN_POW:
> +               case BUILT_IN_POWF:
> +               case BUILT_IN_POWL:
> +                 {
> +                   location_t loc = gimple_location (stmt);
> +                   tree *args = gimple_call_arg_ptr (stmt, 0);
> +                   tree type = TREE_TYPE (TREE_TYPE (fndecl));
> +                   result = fold_builtin_pow (loc, fndecl, args[0],
> +                                              args[1], type);

I think this code belongs in gimple_fold_call.  Also I don't think
any of the foldings apart from constant folding will apply as you
never will get calls as arguments there.  The folding should also
already be applied via gimple_fold_call -> gimple_fold_builtin ->
fold_call_stmt.

> +                   break;
> +                 }
> +               case BUILT_IN_POWI:
> +               case BUILT_IN_POWIF:
> +               case BUILT_IN_POWIL:
> +                 {
> +                   location_t loc = gimple_location (stmt);
> +                   tree *args = gimple_call_arg_ptr (stmt, 0);
> +                   tree type = TREE_TYPE (TREE_TYPE (fndecl));
> +                   result = fold_builtin_powi (loc, fndecl, args[0],
> +                                               args[1], type);

Likewise.

Explicit un-canonicalization could be done here, though.

So expanding to power-series remains, which btw could be split
out into a separate patch.
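
As a rough sketch of what expanding powi into multiplies means, here is
plain binary exponentiation with a multiply counter.  The helper name
powi_by_squaring is hypothetical and this is not the algorithm in
tree_expand_builtin_powi; the binary method is simply the easiest one to
show, and it does not always give the shortest sequence.

```c
/* Hypothetical helper: compute x**n for n >= 0 by binary exponentiation
   and report in *mults how many multiplications were performed.  */
static double
powi_by_squaring (double x, unsigned n, unsigned *mults)
{
  double result = 1.0;
  int have_result = 0;

  *mults = 0;
  while (n)
    {
      if (n & 1)
        {
          if (have_result)
            {
              result *= x;
              ++*mults;
            }
          else
            {
              /* First factor: copy instead of multiplying by 1.0.  */
              result = x;
              have_result = 1;
            }
        }
      n >>= 1;
      if (n)
        {
          /* Square only if more exponent bits remain.  */
          x *= x;
          ++*mults;
        }
    }
  return result;
}
```

With n = 10 this performs 4 multiplications.  With n = 15 it performs 6,
while the chain x, x^2, x^3, x^6, x^12, x^15 needs only 5, which is why a
pass that wants the optimal sequence cannot rely on the binary method
alone.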

> +                   /* Expanding powi into an optimal number of
> +                      multiplications requires adding statements,
> +                      so handle that separately.  */
> +                   if (result == NULL_TREE
> +                       && host_integerp (args[1], 0)
> +                       && !TREE_OVERFLOW (args[1]))
> +                     result = tree_expand_builtin_powi (&gsi, loc, args);
> +
> +                   break;
> +                 }
> +               default:
> +                 break;
> +               }
> +
> +             if (result)
> +               {
> +                 /* Propagate location information from original call to
> +                    expansion of builtin.  Otherwise things like

Did you check why that does not happen anyway?  It sounds like a bug.

Richard.

> +                    maybe_emit_chk_warning, that operate on the expansion
> +                    of a builtin, will use the wrong location information.  */
> +                 if (gimple_has_location (stmt))
> +                   {
> +                     tree realret = result;
> +                     if (TREE_CODE (result) == NOP_EXPR)
> +                       realret = TREE_OPERAND (result, 0);
> +                     if (CAN_HAVE_LOCATION_P (realret)
> +                         && !EXPR_HAS_LOCATION (realret))
> +                       SET_EXPR_LOCATION (realret, gimple_location (stmt));
> +                     result = realret;
> +                   }
> +               }
> +
> +             if (result && !update_call_from_tree (&gsi, result))
> +               gimplify_and_update_call_from_tree (&gsi, result);
> +           }
> +
> +         gsi_next (&gsi);
> +       }
> +    }
> +
> +  return 0;
> +}
> +
> +struct gimple_opt_pass pass_lower_pow =
> +{
> + {
> +  GIMPLE_PASS,
> +  "lower_pow",                         /* name */
> +  NULL,                                        /* gate */
> +  execute_lower_pow,                   /* execute */
> +  NULL,                                        /* sub */
> +  NULL,                                        /* next */
> +  0,                                   /* static_pass_number */
> +  TV_NONE,                             /* tv_id */
> +  PROP_ssa,                            /* properties_required */
> +  0,                                   /* properties_provided */
> +  0,                                   /* properties_destroyed */
> +  0,                                   /* todo_flags_start */
> +  TODO_verify_ssa
> +  | TODO_verify_stmts
> +  | TODO_dump_func
> +  | TODO_update_ssa                     /* todo_flags_finish */
> + }
> +};
> Index: gcc/tree-flow.h
> ===================================================================
> --- gcc/tree-flow.h     (revision 173730)
> +++ gcc/tree-flow.h     (working copy)
> @@ -856,4 +856,7 @@
>
>  void swap_tree_operands (gimple, tree *, tree *);
>
> +/* In builtins.c  */
> +tree tree_expand_builtin_powi (gimple_stmt_iterator *, location_t, tree *);
> +
>  #endif /* _TREE_FLOW_H  */
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in     (revision 173730)
> +++ gcc/Makefile.in     (working copy)
> @@ -2639,7 +2639,8 @@
>  tree-ssa-math-opts.o : tree-ssa-math-opts.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
>    $(TM_H) $(FLAGS_H) $(TREE_H) $(TREE_FLOW_H) $(TIMEVAR_H) \
>    $(TREE_PASS_H) alloc-pool.h $(BASIC_BLOCK_H) $(TARGET_H) \
> -   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h
> +   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h \
> +   tree-ssa-propagate.h
>  tree-ssa-alias.o : tree-ssa-alias.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
>    $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) $(TREE_INLINE_H) $(FLAGS_H) \
>    $(FUNCTION_H) $(TIMEVAR_H) convert.h $(TM_H) coretypes.h langhooks.h \
> Index: gcc/passes.c
> ===================================================================
> --- gcc/passes.c        (revision 173730)
> +++ gcc/passes.c        (working copy)
> @@ -1,7 +1,7 @@
>  /* Top level of GCC compilers (cc1, cc1plus, etc.)
>    Copyright (C) 1987, 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
> -   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
> -   Free Software Foundation, Inc.
> +   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
> +   2011 Free Software Foundation, Inc.
>
>  This file is part of GCC.
>
> @@ -812,6 +812,7 @@
>      output to the assembler file.  */
>   p = &all_passes;
>   NEXT_PASS (pass_lower_eh_dispatch);
> +  NEXT_PASS (pass_lower_pow);
>   NEXT_PASS (pass_all_optimizations);
>     {
>       struct opt_pass **p = &pass_all_optimizations.pass.sub;
>
>
>
Bill Schmidt May 16, 2011, 5:30 p.m. UTC | #8
Richi, thank you for the detailed review!

I'll plan to move the power-series expansion into the existing IL walk
during pass_cse_sincos.  As part of this, I'll move
tree_expand_builtin_powi and its subfunctions from builtins.c into
tree-ssa-math-opts.c.  I'll submit this as a separate patch.

I will also stop attempting to make code generation match completely at
-O0.  If there are tests in the test suite that fail only at -O0 due to
these changes, I'll modify the tests to require -O1 or higher.

I understand that you'd prefer that I leave the existing
canonicalization folds in place, and only un-canonicalize them during my
new pass (now, during cse_sincos).  Actually, that was my first approach
to this issue.  The problem that I ran into is that the various folds
are not performed just by the front end, but can pop up later, after my
pass is done.  In particular, pass_fold_builtins will undo my changes,
turning expressions involving roots back into expressions involving
pow/powi.  It wasn't clear to me whether the folds could kick in
elsewhere as well, so I took the approach of shutting them down.  I see
now that this does lose some optimizations such as
pow(sqrt(cbrt(x)),6.0), as you pointed out.
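
To make the un-canonicalization concrete, here is a minimal sketch of the
exponent test for one of the lowered forms, pow (x, N + 0.5) rewritten as
powi (x, N) * sqrt (x).  The helper name split_half_exponent is
hypothetical and is not a function from the patch; the expected values
mirror the checks in pr46728-14.c.

```c
/* Hypothetical helper: if C == N + 0.5 for some integer N, store N in *n
   and return 1, so that pow (x, C) may be rewritten as
   powi (x, N) * sqrt (x).  Otherwise return 0 and leave *n untouched.  */
static int
split_half_exponent (double c, long *n)
{
  double twice_c = 2.0 * c;
  long twice;

  /* Keep the conversion to long well defined; this also rejects NaN.  */
  if (!(twice_c > -1.0e9 && twice_c < 1.0e9))
    return 0;

  twice = (long) twice_c;
  /* 2*C must be an odd integer.  */
  if ((double) twice != twice_c || twice % 2 == 0)
    return 0;

  *n = (twice - 1) / 2;
  return 1;
}
```

For example, 2.5 yields N = 2 and -3.5 yields N = -4, while 0.75 and 2.0
are rejected and would need a different rewrite.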

Should I attempt to leave the folds in place, and screen out the
particular cases that are causing trouble in pass_fold_builtins?  Or is
it too fragile to try to catch all places where folds occur?  If there's
a flag that indicates parsing is complete, I suppose I could disable
individual folds once we're into the optimizer.  I'd appreciate your
guidance.

Thanks,
Bill
Richard Biener May 17, 2011, 9:03 a.m. UTC | #9
On Mon, May 16, 2011 at 7:30 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Richi, thank you for the detailed review!
>
> I'll plan to move the power-series expansion into the existing IL walk
> during pass_cse_sincos.  As part of this, I'll move
> tree_expand_builtin_powi and its subfunctions from builtins.c into
> tree-ssa-math-opts.c.  I'll submit this as a separate patch.
>
> I will also stop attempting to make code generation match completely at
> -O0.  If there are tests in the test suite that fail only at -O0 due to
> these changes, I'll modify the tests to require -O1 or higher.
>
> I understand that you'd prefer that I leave the existing
> canonicalization folds in place, and only un-canonicalize them during my
> new pass (now, during cse_sincos).  Actually, that was my first approach
> to this issue.  The problem that I ran into is that the various folds
> are not performed just by the front end, but can pop up later, after my
> pass is done.  In particular, pass_fold_builtins will undo my changes,
> turning expressions involving roots back into expressions involving
> pow/powi.  It wasn't clear to me whether the folds could kick in
> elsewhere as well, so I took the approach of shutting them down.  I see
> now that this does lose some optimizations such as
> pow(sqrt(cbrt(x)),6.0), as you pointed out.

Yeah, it's always a delicate balance between canonicalization
and optimal form for further optimization.  Did you really see
sqrt(cbrt(x)) being transformed back to pow()?  I would doubt that,
as on gimple the foldings that require two function calls to match
shouldn't trigger.  Or do you see sqrt(x) turned into pow(x,0.5)?
I see that the vectorizer for example handles both pow(x,0.5) and
pow(x,2), so indeed that might happen.

I think what we might want to do is limit what the generic
gimple fold_stmt folding does to function calls, also to avoid
building regular generic call statements again.  But that might
be a bigger project and certainly should be done separately.

So I'd say don't worry about this issue for the initial patch but
instead deal with it separately.

We have also repeatedly wondered whether canonicalizing
everything to pow is a good idea.  In particular, our
canonicalizing of x * x to pow (x, 2) leads to interesting
effects in some cases, as several passes do not handle
function calls very well.  So I also thought about introducing
a POW_EXPR tree code that would be easier to handle in this
regard and would be a more IL-friendly canonical form
of the power-related functions.

> Should I attempt to leave the folds in place, and screen out the
> particular cases that are causing trouble in pass_fold_builtins?  Or is
> it too fragile to try to catch all places where folds occur?  If there's
> a flag that indicates parsing is complete, I suppose I could disable
> individual folds once we're into the optimizer.  I'd appreciate your
> guidance.

Indeed restricting canonicalization to earlier passes would be the
way to go I think.  I will think of the best way to achieve this.

Richard.

> Thanks,
> Bill
>
>
>
Bill Schmidt May 17, 2011, 11:41 a.m. UTC | #10
On Tue, 2011-05-17 at 11:03 +0200, Richard Guenther wrote:
> On Mon, May 16, 2011 at 7:30 PM, William J. Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > Richi, thank you for the detailed review!
> >
> > I'll plan to move the power-series expansion into the existing IL walk
> > during pass_cse_sincos.  As part of this, I'll move
> > tree_expand_builtin_powi and its subfunctions from builtins.c into
> > tree-ssa-math-opts.c.  I'll submit this as a separate patch.
> >
> > I will also stop attempting to make code generation match completely at
> > -O0.  If there are tests in the test suite that fail only at -O0 due to
> > these changes, I'll modify the tests to require -O1 or higher.
> >
> > I understand that you'd prefer that I leave the existing
> > canonicalization folds in place, and only un-canonicalize them during my
> > new pass (now, during cse_sincos).  Actually, that was my first approach
> > to this issue.  The problem that I ran into is that the various folds
> > are not performed just by the front end, but can pop up later, after my
> > pass is done.  In particular, pass_fold_builtins will undo my changes,
> > turning expressions involving roots back into expressions involving
> > pow/powi.  It wasn't clear to me whether the folds could kick in
> > elsewhere as well, so I took the approach of shutting them down.  I see
> > now that this does lose some optimizations such as
> > pow(sqrt(cbrt(x)),6.0), as you pointed out.
> 
> Yeah, it's always a delicate balance between canonicalization
> and optimal form for further optimization.  Did you really see
> sqrt(cbrt(x)) being transformed back to pow()?  I would doubt that,
> as on gimple the foldings that require two function calls to match
> shouldn't trigger.  Or do you see sqrt(x) turned into pow(x,0.5)?
> I see that the vectorizer for example handles both pow(x,0.5) and
> pow(x,2), so indeed that might happen.

Yes, I was seeing sqrt(x) turned back to pow(x,0.5), and even x*x
turning back into pow(x,2.0).  I don't specifically recall the
sqrt(cbrt(x)) case; you're probably right about that one.  But I had
several test cases break because of this.
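
As a toy illustration of that round trip (a sketch only, not GCC code:
the enum, the function names, and the in_optimizer flag are all
hypothetical), the following C fragment models a lowering step, a
canonicalizing fold that undoes it, and a phase flag that would suppress
the fold once lowering has run:

```c
#include <assert.h>

/* Toy stand-ins for the expression shapes discussed in this thread.
   They are illustrative only and are not GCC tree codes.  */
enum form { POW_HALF, SQRT_CALL, POW_TWO, MULT_SELF };

/* Lowering: pow(x,0.5) -> sqrt(x), pow(x,2.0) -> x*x.  */
static enum form
lower (enum form f)
{
  switch (f)
    {
    case POW_HALF: return SQRT_CALL;
    case POW_TWO:  return MULT_SELF;
    default:       return f;
    }
}

/* Canonicalizing fold: the inverse rewrite.  IN_OPTIMIZER is a
   hypothetical phase flag; when it is set, the fold is skipped.  */
static enum form
canonicalize (enum form f, int in_optimizer)
{
  if (in_optimizer)
    return f;
  switch (f)
    {
    case SQRT_CALL: return POW_HALF;
    case MULT_SELF: return POW_TWO;
    default:        return f;
    }
}

/* Self-check showing both behaviours.  */
static void
check_round_trip (void)
{
  /* Ungated: a later fold turns the lowered form back into pow.  */
  assert (canonicalize (lower (POW_HALF), 0) == POW_HALF);
  assert (canonicalize (lower (POW_TWO), 0) == POW_TWO);
  /* Gated: the lowered forms survive.  */
  assert (canonicalize (lower (POW_HALF), 1) == SQRT_CALL);
  assert (canonicalize (lower (POW_TWO), 1) == MULT_SELF);
}
```

Calling check_round_trip () from a test driver exercises both cases.
Whether a single flag is enough in practice depends on how many places
can trigger the folds, which is the open question in this thread.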

> 
> I think what we might want to do is limit what the generic
> gimple fold_stmt folding does to function calls, also to avoid
> building regular generic call statements again.  But that might
> be a bigger project and certainly should be done separately.
> 
> So I'd say don't worry about this issue for the initial patch but
> instead deal with it separately.

Agreed...

> 
> We also repeatedly thought about whether canonicalizing
> everything to pow is a good idea or not, especially our
> canonicalizing of x * x to pow (x, 2) leads to interesting
> effects in some cases, as several passes do not handle
> function calls very well.  So I also thought about introducing
> a POW_EXPR tree code that would be easier in this
> regard and would be a more IL friendly canonical form
> of the power-related functions.
> 
> > Should I attempt to leave the folds in place, and screen out the
> > particular cases that are causing trouble in pass_fold_builtins?  Or is
> > it too fragile to try to catch all places where folds occur?  If there's
> > a flag that indicates parsing is complete, I suppose I could disable
> > individual folds once we're into the optimizer.  I'd appreciate your
> > guidance.
> 
> Indeed restricting canonicalization to earlier passes would be the
> way to go I think.  I will think of the best way to achieve this.

Thanks.  I think we need to address this as part of this patch, unless
you're willing to live with a number of broken test cases in the
meanwhile.  If I only do the un-canonicalization in the new pass and let
some of the folds be re-done later, some will fail.  I'll start
experimenting and see how many.

Bill

> 
> Richard.
> 
> > Thanks,
> > Bill
> >
> >
> >

Patch

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 173730)
+++ gcc/tree.h	(working copy)
@@ -1,6 +1,6 @@ 
 /* Front-end tree definitions for GNU compiler.
    Copyright (C) 1989, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
-   2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
+   2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
    Free Software Foundation, Inc.
 
 This file is part of GCC.
@@ -5271,6 +5271,8 @@ 
 extern bool fold_builtin_next_arg (tree, bool);
 extern enum built_in_function builtin_mathfn_code (const_tree);
 extern tree fold_builtin_call_array (location_t, tree, tree, int, tree *);
+extern tree fold_builtin_pow (location_t, tree, tree, tree, tree);
+extern tree fold_builtin_powi (location_t, tree, tree, tree, tree);
 extern tree build_call_expr_loc_array (location_t, tree, int, tree *);
 extern tree build_call_expr_loc_vec (location_t, tree, VEC(tree,gc) *);
 extern tree build_call_expr_loc (location_t, tree, int, ...);
Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h	(revision 173730)
+++ gcc/tree-pass.h	(working copy)
@@ -419,6 +419,7 @@ 
 extern struct gimple_opt_pass pass_cse_sincos;
 extern struct gimple_opt_pass pass_optimize_bswap;
 extern struct gimple_opt_pass pass_optimize_widening_mul;
+extern struct gimple_opt_pass pass_lower_pow;
 extern struct gimple_opt_pass pass_warn_function_return;
 extern struct gimple_opt_pass pass_warn_function_noreturn;
 extern struct gimple_opt_pass pass_cselim;
Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	(revision 173730)
+++ gcc/builtins.c	(working copy)
@@ -149,8 +149,6 @@ 
 static rtx expand_builtin_signbit (tree, rtx);
 static tree fold_builtin_sqrt (location_t, tree, tree);
 static tree fold_builtin_cbrt (location_t, tree, tree);
-static tree fold_builtin_pow (location_t, tree, tree, tree, tree);
-static tree fold_builtin_powi (location_t, tree, tree, tree, tree);
 static tree fold_builtin_cos (location_t, tree, tree, tree);
 static tree fold_builtin_cosh (location_t, tree, tree, tree);
 static tree fold_builtin_tan (tree, tree);
@@ -2940,7 +2938,7 @@ 
 
 /* Return the number of multiplications required to calculate
    powi(x,n) for an arbitrary x, given the exponent N.  This
-   function needs to be kept in sync with expand_powi below.  */
+   function needs to be kept in sync with powi_as_mults, below.  */
 
 static int
 powi_cost (HOST_WIDE_INT n)
@@ -2981,165 +2979,6 @@ 
   return result + powi_lookup_cost (val, cache);
 }
 
-/* Recursive subroutine of expand_powi.  This function takes the array,
-   CACHE, of already calculated exponents and an exponent N and returns
-   an RTX that corresponds to CACHE[1]**N, as calculated in mode MODE.  */
-
-static rtx
-expand_powi_1 (enum machine_mode mode, unsigned HOST_WIDE_INT n, rtx *cache)
-{
-  unsigned HOST_WIDE_INT digit;
-  rtx target, result;
-  rtx op0, op1;
-
-  if (n < POWI_TABLE_SIZE)
-    {
-      if (cache[n])
-	return cache[n];
-
-      target = gen_reg_rtx (mode);
-      cache[n] = target;
-
-      op0 = expand_powi_1 (mode, n - powi_table[n], cache);
-      op1 = expand_powi_1 (mode, powi_table[n], cache);
-    }
-  else if (n & 1)
-    {
-      target = gen_reg_rtx (mode);
-      digit = n & ((1 << POWI_WINDOW_SIZE) - 1);
-      op0 = expand_powi_1 (mode, n - digit, cache);
-      op1 = expand_powi_1 (mode, digit, cache);
-    }
-  else
-    {
-      target = gen_reg_rtx (mode);
-      op0 = expand_powi_1 (mode, n >> 1, cache);
-      op1 = op0;
-    }
-
-  result = expand_mult (mode, op0, op1, target, 0);
-  if (result != target)
-    emit_move_insn (target, result);
-  return target;
-}
-
-/* Expand the RTL to evaluate powi(x,n) in mode MODE.  X is the
-   floating point operand in mode MODE, and N is the exponent.  This
-   function needs to be kept in sync with powi_cost above.  */
-
-static rtx
-expand_powi (rtx x, enum machine_mode mode, HOST_WIDE_INT n)
-{
-  rtx cache[POWI_TABLE_SIZE];
-  rtx result;
-
-  if (n == 0)
-    return CONST1_RTX (mode);
-
-  memset (cache, 0, sizeof (cache));
-  cache[1] = x;
-
-  result = expand_powi_1 (mode, (n < 0) ? -n : n, cache);
-
-  /* If the original exponent was negative, reciprocate the result.  */
-  if (n < 0)
-    result = expand_binop (mode, sdiv_optab, CONST1_RTX (mode),
-			   result, NULL_RTX, 0, OPTAB_LIB_WIDEN);
-
-  return result;
-}
-
-/* Fold a builtin function call to pow, powf, or powl into a series of sqrts or
-   cbrts.  Return NULL_RTX if no simplification can be made or expand the tree
-   if we can simplify it.  */
-static rtx
-expand_builtin_pow_root (location_t loc, tree arg0, tree arg1, tree type,
-			 rtx subtarget)
-{
-  if (TREE_CODE (arg1) == REAL_CST
-      && !TREE_OVERFLOW (arg1)
-      && flag_unsafe_math_optimizations)
-    {
-      enum machine_mode mode = TYPE_MODE (type);
-      tree sqrtfn = mathfn_built_in (type, BUILT_IN_SQRT);
-      tree cbrtfn = mathfn_built_in (type, BUILT_IN_CBRT);
-      REAL_VALUE_TYPE c = TREE_REAL_CST (arg1);
-      tree op = NULL_TREE;
-
-      if (sqrtfn)
-	{
-	  /* Optimize pow (x, 0.5) into sqrt.  */
-	  if (REAL_VALUES_EQUAL (c, dconsthalf))
-	    op = build_call_nofold_loc (loc, sqrtfn, 1, arg0);
-
-	  /* Don't do this optimization if we don't have a sqrt insn.  */
-	  else if (optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)
-	    {
-	      REAL_VALUE_TYPE dconst1_4 = dconst1;
-	      REAL_VALUE_TYPE dconst3_4;
-	      SET_REAL_EXP (&dconst1_4, REAL_EXP (&dconst1_4) - 2);
-
-	      real_from_integer (&dconst3_4, VOIDmode, 3, 0, 0);
-	      SET_REAL_EXP (&dconst3_4, REAL_EXP (&dconst3_4) - 2);
-
-	      /* Optimize pow (x, 0.25) into sqrt (sqrt (x)).  Assume on most
-		 machines that a builtin sqrt instruction is smaller than a
-		 call to pow with 0.25, so do this optimization even if
-		 -Os.  */
-	      if (REAL_VALUES_EQUAL (c, dconst1_4))
-		{
-		  op = build_call_nofold_loc (loc, sqrtfn, 1, arg0);
-		  op = build_call_nofold_loc (loc, sqrtfn, 1, op);
-		}
-
-	      /* Optimize pow (x, 0.75) = sqrt (x) * sqrt (sqrt (x)) unless we
-		 are optimizing for space.  */
-	      else if (optimize_insn_for_speed_p ()
-		       && !TREE_SIDE_EFFECTS (arg0)
-		       && REAL_VALUES_EQUAL (c, dconst3_4))
-		{
-		  tree sqrt1 = build_call_expr_loc (loc, sqrtfn, 1, arg0);
-		  tree sqrt2 = builtin_save_expr (sqrt1);
-		  tree sqrt3 = build_call_expr_loc (loc, sqrtfn, 1, sqrt1);
-		  op = fold_build2_loc (loc, MULT_EXPR, type, sqrt2, sqrt3);
-		}
-	    }
-	}
-
-      /* Check whether we can do cbrt insstead of pow (x, 1./3.) and
-	 cbrt/sqrts instead of pow (x, 1./6.).  */
-      if (cbrtfn && ! op
-	  && (tree_expr_nonnegative_p (arg0) || !HONOR_NANS (mode)))
-	{
-	  /* First try 1/3.  */
-	  REAL_VALUE_TYPE dconst1_3
-	    = real_value_truncate (mode, dconst_third ());
-
-	  if (REAL_VALUES_EQUAL (c, dconst1_3))
-	    op = build_call_nofold_loc (loc, cbrtfn, 1, arg0);
-
-	      /* Now try 1/6.  */
-	  else if (optimize_insn_for_speed_p ()
-		   && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)
-	    {
-	      REAL_VALUE_TYPE dconst1_6 = dconst1_3;
-	      SET_REAL_EXP (&dconst1_6, REAL_EXP (&dconst1_6) - 1);
-
-	      if (REAL_VALUES_EQUAL (c, dconst1_6))
-		{
-		  op = build_call_nofold_loc (loc, sqrtfn, 1, arg0);
-		  op = build_call_nofold_loc (loc, cbrtfn, 1, op);
-		}
-	    }
-	}
-
-      if (op)
-	return expand_expr (op, subtarget, mode, EXPAND_NORMAL);
-    }
-
-  return NULL_RTX;
-}
-
 /* Expand a call to the pow built-in mathematical function.  Return NULL_RTX if
    a normal call should be emitted rather than expanding the function
    in-line.  EXP is the expression that is a call to the builtin
@@ -3148,147 +2987,9 @@ 
 static rtx
 expand_builtin_pow (tree exp, rtx target, rtx subtarget)
 {
-  tree arg0, arg1;
-  tree fn, narg0;
-  tree type = TREE_TYPE (exp);
-  REAL_VALUE_TYPE cint, c, c2;
-  HOST_WIDE_INT n;
-  rtx op, op2;
-  enum machine_mode mode = TYPE_MODE (type);
-
   if (! validate_arglist (exp, REAL_TYPE, REAL_TYPE, VOID_TYPE))
     return NULL_RTX;
 
-  arg0 = CALL_EXPR_ARG (exp, 0);
-  arg1 = CALL_EXPR_ARG (exp, 1);
-
-  if (TREE_CODE (arg1) != REAL_CST
-      || TREE_OVERFLOW (arg1))
-    return expand_builtin_mathfn_2 (exp, target, subtarget);
-
-  /* Handle constant exponents.  */
-
-  /* For integer valued exponents we can expand to an optimal multiplication
-     sequence using expand_powi.  */
-  c = TREE_REAL_CST (arg1);
-  n = real_to_integer (&c);
-  real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
-  if (real_identical (&c, &cint)
-      && ((n >= -1 && n <= 2)
-	  || (flag_unsafe_math_optimizations
-	      && optimize_insn_for_speed_p ()
-	      && powi_cost (n) <= POWI_MAX_MULTS)))
-    {
-      op = expand_expr (arg0, subtarget, VOIDmode, EXPAND_NORMAL);
-      if (n != 1)
-	{
-	  op = force_reg (mode, op);
-	  op = expand_powi (op, mode, n);
-	}
-      return op;
-    }
-
-  narg0 = builtin_save_expr (arg0);
-
-  /* If the exponent is not integer valued, check if it is half of an integer.
-     In this case we can expand to sqrt (x) * x**(n/2).  */
-  fn = mathfn_built_in (type, BUILT_IN_SQRT);
-  if (fn != NULL_TREE)
-    {
-      real_arithmetic (&c2, MULT_EXPR, &c, &dconst2);
-      n = real_to_integer (&c2);
-      real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
-      if (real_identical (&c2, &cint)
-	  && ((flag_unsafe_math_optimizations
-	       && optimize_insn_for_speed_p ()
-	       && powi_cost (n/2) <= POWI_MAX_MULTS)
-	      /* Even the c == 0.5 case cannot be done unconditionally
-	         when we need to preserve signed zeros, as
-		 pow (-0, 0.5) is +0, while sqrt(-0) is -0.  */
-	      || (!HONOR_SIGNED_ZEROS (mode) && n == 1)
-	      /* For c == 1.5 we can assume that x * sqrt (x) is always
-	         smaller than pow (x, 1.5) if sqrt will not be expanded
-		 as a call.  */
-	      || (n == 3
-		  && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)))
-	{
-	  tree call_expr = build_call_nofold_loc (EXPR_LOCATION (exp), fn, 1,
-						  narg0);
-	  /* Use expand_expr in case the newly built call expression
-	     was folded to a non-call.  */
-	  op = expand_expr (call_expr, subtarget, mode, EXPAND_NORMAL);
-	  if (n != 1)
-	    {
-	      op2 = expand_expr (narg0, subtarget, VOIDmode, EXPAND_NORMAL);
-	      op2 = force_reg (mode, op2);
-	      op2 = expand_powi (op2, mode, abs (n / 2));
-	      op = expand_simple_binop (mode, MULT, op, op2, NULL_RTX,
-					0, OPTAB_LIB_WIDEN);
-	      /* If the original exponent was negative, reciprocate the
-		 result.  */
-	      if (n < 0)
-		op = expand_binop (mode, sdiv_optab, CONST1_RTX (mode),
-				   op, NULL_RTX, 0, OPTAB_LIB_WIDEN);
-	    }
-	  return op;
-	}
-    }
-
-  /* Check whether we can do a series of sqrt or cbrt's instead of the pow
-     call.  */
-  op = expand_builtin_pow_root (EXPR_LOCATION (exp), arg0, arg1, type,
-				subtarget);
-  if (op)
-    return op;
-
-  /* Try if the exponent is a third of an integer.  In this case
-     we can expand to x**(n/3) * cbrt(x)**(n%3).  As cbrt (x) is
-     different from pow (x, 1./3.) due to rounding and behavior
-     with negative x we need to constrain this transformation to
-     unsafe math and positive x or finite math.  */
-  fn = mathfn_built_in (type, BUILT_IN_CBRT);
-  if (fn != NULL_TREE
-      && flag_unsafe_math_optimizations
-      && (tree_expr_nonnegative_p (arg0)
-	  || !HONOR_NANS (mode)))
-    {
-      REAL_VALUE_TYPE dconst3;
-      real_from_integer (&dconst3, VOIDmode, 3, 0, 0);
-      real_arithmetic (&c2, MULT_EXPR, &c, &dconst3);
-      real_round (&c2, mode, &c2);
-      n = real_to_integer (&c2);
-      real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
-      real_arithmetic (&c2, RDIV_EXPR, &cint, &dconst3);
-      real_convert (&c2, mode, &c2);
-      if (real_identical (&c2, &c)
-	  && ((optimize_insn_for_speed_p ()
-	       && powi_cost (n/3) <= POWI_MAX_MULTS)
-	      || n == 1))
-	{
-	  tree call_expr = build_call_nofold_loc (EXPR_LOCATION (exp), fn, 1,
-						  narg0);
-	  op = expand_builtin (call_expr, NULL_RTX, subtarget, mode, 0);
-	  if (abs (n) % 3 == 2)
-	    op = expand_simple_binop (mode, MULT, op, op, op,
-				      0, OPTAB_LIB_WIDEN);
-	  if (n != 1)
-	    {
-	      op2 = expand_expr (narg0, subtarget, VOIDmode, EXPAND_NORMAL);
-	      op2 = force_reg (mode, op2);
-	      op2 = expand_powi (op2, mode, abs (n / 3));
-	      op = expand_simple_binop (mode, MULT, op, op2, NULL_RTX,
-					0, OPTAB_LIB_WIDEN);
-	      /* If the original exponent was negative, reciprocate the
-		 result.  */
-	      if (n < 0)
-		op = expand_binop (mode, sdiv_optab, CONST1_RTX (mode),
-				   op, NULL_RTX, 0, OPTAB_LIB_WIDEN);
-	    }
-	  return op;
-	}
-    }
-
-  /* Fall back to optab expansion.  */
   return expand_builtin_mathfn_2 (exp, target, subtarget);
 }
 
@@ -3312,27 +3013,6 @@ 
   arg1 = CALL_EXPR_ARG (exp, 1);
   mode = TYPE_MODE (TREE_TYPE (exp));
 
-  /* Handle constant power.  */
-
-  if (TREE_CODE (arg1) == INTEGER_CST
-      && !TREE_OVERFLOW (arg1))
-    {
-      HOST_WIDE_INT n = TREE_INT_CST_LOW (arg1);
-
-      /* If the exponent is -1, 0, 1 or 2, then expand_powi is exact.
-	 Otherwise, check the number of multiplications required.  */
-      if ((TREE_INT_CST_HIGH (arg1) == 0
-	   || TREE_INT_CST_HIGH (arg1) == -1)
-	  && ((n >= -1 && n <= 2)
-	      || (optimize_insn_for_speed_p ()
-		  && powi_cost (n) <= POWI_MAX_MULTS)))
-	{
-	  op0 = expand_expr (arg0, NULL_RTX, VOIDmode, EXPAND_NORMAL);
-	  op0 = force_reg (mode, op0);
-	  return expand_powi (op0, mode, n);
-	}
-    }
-
   /* Emit a libcall to libgcc.  */
 
   /* Mode of the 2nd argument must match that of an int.  */
@@ -7195,8 +6875,13 @@ 
       return build_call_expr_loc (loc, expfn, 1, arg);
     }
 
-  /* Optimize sqrt(Nroot(x)) -> pow(x,1/(2*N)).  */
-  if (flag_unsafe_math_optimizations && BUILTIN_ROOT_P (fcode))
+  /* Optimize sqrt(Nroot(x)) -> pow(x,1/(2*N)).  However, for N=2, only 
+     do this if there is no hardware sqrt instruction.  (For N=3, this
+     has the effect of canonicalizing sqrt(cbrt(x)) as cbrt(sqrt(x)),
+     due to folding on pow(x,1/6).)  */
+  if (flag_unsafe_math_optimizations && BUILTIN_ROOT_P (fcode)
+      && (!BUILTIN_SQRT_P (fcode)
+	  || optab_handler (sqrt_optab, TYPE_MODE (type)) == CODE_FOR_nothing))
     {
       tree powfn = mathfn_built_in (type, BUILT_IN_POW);
 
@@ -7238,6 +6923,39 @@ 
       return build_call_expr_loc (loc, powfn, 2, arg0, narg1);
     }
 
+  /* Optimize sqrt(powi(x,y)) = pow(|x|,y*0.5).  */
+  if (flag_unsafe_math_optimizations
+      && (fcode == BUILT_IN_POWI
+	  || fcode == BUILT_IN_POWIF
+	  || fcode == BUILT_IN_POWIL))
+    {
+      tree powfn = mathfn_built_in (type, BUILT_IN_POW);
+      tree arg0 = CALL_EXPR_ARG (arg, 0);
+      tree arg1 = CALL_EXPR_ARG (arg, 1);
+      tree narg1;
+      if (!tree_expr_nonnegative_p (arg0))
+	arg0 = build1 (ABS_EXPR, type, arg0);
+      narg1 = fold_convert_loc (loc, type, arg1);
+      narg1 = fold_build2_loc (loc, MULT_EXPR, type, narg1,
+			   build_real (type, dconsthalf));
+      return build_call_expr_loc (loc, powfn, 2, arg0, narg1);
+    }
+
+  /* Optimize sqrt(x*x) = |x|.  */
+  if (flag_unsafe_math_optimizations
+      && TREE_CODE (arg) == MULT_EXPR)
+    {
+      tree arg0 = TREE_OPERAND (arg, 0);
+      tree arg1 = TREE_OPERAND (arg, 1);
+
+      if (operand_equal_p (arg0, arg1, 0))
+	{
+	  if (!tree_expr_nonnegative_p (arg0))
+	    arg0 = build1 (ABS_EXPR, type, arg0);
+	  return fold_convert_loc (loc, type, arg0);
+	}
+    }
+
   return NULL_TREE;
 }
 
@@ -7271,8 +6989,11 @@ 
 	  return build_call_expr_loc (loc, expfn, 1, arg);
 	}
 
-      /* Optimize cbrt(sqrt(x)) -> pow(x,1/6).  */
-      if (BUILTIN_SQRT_P (fcode))
+      /* Optimize cbrt(sqrt(x)) -> pow(x,1/6), but only if there is no
+	 native square root instruction or we are optimizing for size.  */
+      if (BUILTIN_SQRT_P (fcode)
+	  && (optab_handler (sqrt_optab, TYPE_MODE (type)) == CODE_FOR_nothing
+	      || !optimize_function_for_speed_p (cfun)))
 	{
 	  tree powfn = mathfn_built_in (type, BUILT_IN_POW);
 
@@ -8010,11 +7731,177 @@ 
 }
 
 
+/* Attempt to evaluate powi(arg0,n) at compile time, unless this should
+   raise an exception.  */
+static tree
+fold_eval_powi (tree arg0, HOST_WIDE_INT n, tree type, enum machine_mode mode)
+{
+  if (TREE_CODE (arg0) == REAL_CST
+      && !TREE_OVERFLOW (arg0)
+      && (n > 0
+	  || (!flag_trapping_math && !flag_errno_math)
+	  || !REAL_VALUES_EQUAL (TREE_REAL_CST (arg0), dconst0)))
+    {
+      REAL_VALUE_TYPE x;
+      bool inexact;
+      
+      x = TREE_REAL_CST (arg0);
+      inexact = real_powi (&x, mode, &x, n);
+      if (flag_unsafe_math_optimizations || !inexact)
+	return build_real (type, x);
+    }
+
+  return NULL_TREE;
+}
+
+
+/* Build a call to FNDECL with location LOC and arguments ARG0 and ARG1.
+   If N is even, strip the sign from ARG0 before building the call.  */
+static tree
+build_call_expr_loc_strip_sign (HOST_WIDE_INT n, location_t loc, tree fndecl,
+				tree arg0, tree arg1)
+{
+  if ((n & 1) == 0 && flag_unsafe_math_optimizations)
+    {
+      tree narg0 = fold_strip_sign_ops (arg0);
+      if (narg0)
+	return build_call_expr_loc (loc, fndecl, 2, narg0, arg1);
+    }
+
+  return build_call_expr_loc (loc, fndecl, 2, arg0, arg1);
+}
+
+
+/* Attempt to optimize pow(ARG0, C), where C is a real constant not equal
+   to any integer.  When 2C or 3C is an integer, we can sometimes improve
+   the code using sqrt and/or cbrt.  */
+static tree
+fold_builtin_pow_frac_exp (location_t loc, tree arg0, REAL_VALUE_TYPE c,
+			   tree type, enum machine_mode mode)
+{
+  REAL_VALUE_TYPE c2, cint;
+  HOST_WIDE_INT n;
+  tree sqrtfn = mathfn_built_in (type, BUILT_IN_SQRT);
+  tree cbrtfn = mathfn_built_in (type, BUILT_IN_CBRT);
+  tree powifn = mathfn_built_in (type, BUILT_IN_POWI);
+  
+  /* Optimize pow(x,c), where c = floor(c) + 0.5, into
+     sqrt(x) * powi(x, floor(c)).  */
+
+  real_arithmetic (&c2, MULT_EXPR, &c, &dconst2);
+  n = real_to_integer (&c2);
+  real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
+
+  if (real_identical (&c2, &cint)
+      && ((flag_unsafe_math_optimizations
+	   && sqrtfn != NULL_TREE
+	   && powi_cost (n/2) <= POWI_MAX_MULTS)
+	  /* pow(x,0.5) can be done unconditionally provided signed
+	     zeros must not be maintained.  pow(-0,0.5) = +0, but 
+	     sqrt(-0) = -0.  */
+	  || (!HONOR_SIGNED_ZEROS (mode) && n == 1)
+	  /* pow(x,1.5)=x*sqrt(x) is safe, and smaller than pow(x,1.5)
+	     provided sqrt will not be expanded as a call.  */
+	  || (n == 3
+	      && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)))
+    {
+      tree narg0 = builtin_save_expr (arg0);
+      tree powi_x_floor_c = NULL_TREE;
+      HOST_WIDE_INT floor_c = n / 2;
+      if (n <= 0)
+	floor_c--;
+
+      /* Attempt to fold powi(arg0, floor_c) into a constant.  */
+      powi_x_floor_c = fold_eval_powi (arg0, floor_c, type, mode);
+
+      if (!powi_x_floor_c && powifn)
+	{
+	  tree tree_floor_c = build_int_cst (integer_type_node, floor_c);
+	  powi_x_floor_c = build_call_expr_loc_strip_sign (floor_c, loc, powifn,
+							   narg0, tree_floor_c);
+	}
+
+      if (powi_x_floor_c && sqrtfn != NULL_TREE)
+	{
+	  tree sqrt_arg0 = build_call_nofold_loc (loc, sqrtfn, 1, narg0);
+	  return fold_build2_loc (loc, MULT_EXPR, type,
+				  sqrt_arg0, powi_x_floor_c);
+	}
+    }
+
+  /* Optimize pow(x,c), where 3c = n for some integer n, into
+     powi(x, floor(c)) * powi(cbrt(x), n%3).  */
+  if (cbrtfn != NULL_TREE
+      && powifn != NULL_TREE
+      && flag_unsafe_math_optimizations
+      && (tree_expr_nonnegative_p (arg0) || !HONOR_NANS (mode)))
+    {
+      REAL_VALUE_TYPE dconst3;
+      
+      real_from_integer (&dconst3, VOIDmode, 3, 0, 0);
+      real_arithmetic (&c2, MULT_EXPR, &c, &dconst3);
+      real_round (&c2, mode, &c2);
+      n = real_to_integer (&c2);
+      real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
+      real_arithmetic (&c2, RDIV_EXPR, &cint, &dconst3);
+      real_convert (&c2, mode, &c2);
+      if (real_identical (&c2, &c)
+	  && ((optimize_function_for_speed_p (cfun)
+	       && powi_cost (n / 3) <= POWI_MAX_MULTS)
+	      || n == 1))
+	{
+	  HOST_WIDE_INT floor_c = n / 3;
+	  tree narg0 = builtin_save_expr (arg0);
+	  tree powi_x_floor_c;
+
+	  if (n <= 0)
+	    floor_c--;
+
+	  /* Attempt to fold powi(x, floor(c)) into a constant.  */
+	  powi_x_floor_c = fold_eval_powi (arg0, floor_c, type, mode);
+
+	  if (!powi_x_floor_c)
+	    {
+	      tree tree_floor_c =
+		build_int_cst (integer_type_node, floor_c);
+
+	      powi_x_floor_c = 
+		build_call_expr_loc_strip_sign (floor_c, loc, powifn,
+						narg0, tree_floor_c);
+	    }
+
+	  if (powi_x_floor_c)
+	    {
+	      HOST_WIDE_INT n_mod_3 = n % 3;
+	      tree tree_n_mod_3, powi_cbrt_x, cbrt_arg0;
+	      
+	      if (n <= 0)
+		n_mod_3 = n_mod_3 + 3;
+
+	      tree_n_mod_3 = build_int_cst (integer_type_node, n_mod_3);
+
+	      cbrt_arg0 = build_call_nofold_loc (loc, cbrtfn, 1, narg0);
+	      powi_cbrt_x =
+		build_call_expr_loc_strip_sign (n_mod_3, loc, powifn,
+						cbrt_arg0, tree_n_mod_3);
+
+	      if (powi_cbrt_x)
+		return fold_build2_loc (loc, MULT_EXPR, type,
+					powi_x_floor_c, powi_cbrt_x);
+	    }
+	}
+    }
+
+  return NULL_TREE;
+}
+
+
 /* Fold a builtin function call to pow, powf, or powl.  Return
    NULL_TREE if no simplification can be made.  */
-static tree
+tree
 fold_builtin_pow (location_t loc, tree fndecl, tree arg0, tree arg1, tree type)
 {
+  enum machine_mode mode = TYPE_MODE (type);
   tree res;
 
   if (!validate_arg (arg0, REAL_TYPE)
@@ -8032,9 +7919,10 @@ 
   if (TREE_CODE (arg1) == REAL_CST
       && !TREE_OVERFLOW (arg1))
     {
-      REAL_VALUE_TYPE cint;
       REAL_VALUE_TYPE c;
-      HOST_WIDE_INT n;
+      tree sqrtfn = mathfn_built_in (type, BUILT_IN_SQRT);
+      tree cbrtfn = mathfn_built_in (type, BUILT_IN_CBRT);
+      REAL_VALUE_TYPE dconst1_4, dconst3_4;
 
       c = TREE_REAL_CST (arg1);
 
@@ -8054,56 +7942,63 @@ 
 
       /* Optimize pow(x,0.5) = sqrt(x).  */
       if (flag_unsafe_math_optimizations
-	  && REAL_VALUES_EQUAL (c, dconsthalf))
+	  && REAL_VALUES_EQUAL (c, dconsthalf)
+	  && sqrtfn != NULL_TREE)
+	return build_call_expr_loc (loc, sqrtfn, 1, arg0);
+
+      /* Optimize pow(x,0.25) = sqrt(sqrt(x)).  */
+      dconst1_4 = dconst1;
+      SET_REAL_EXP (&dconst1_4, REAL_EXP (&dconst1_4) - 2);
+
+      if (flag_unsafe_math_optimizations
+	  && REAL_VALUES_EQUAL (c, dconst1_4)
+	  && sqrtfn != NULL_TREE
+	  && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)
 	{
-	  tree sqrtfn = mathfn_built_in (type, BUILT_IN_SQRT);
-
-	  if (sqrtfn != NULL_TREE)
-	    return build_call_expr_loc (loc, sqrtfn, 1, arg0);
+	  tree sqrt_arg0 = build_call_nofold_loc (loc, sqrtfn, 1, arg0);
+	  return build_call_nofold_loc (loc, sqrtfn, 1, sqrt_arg0);
 	}
 
-      /* Optimize pow(x,1.0/3.0) = cbrt(x).  */
-      if (flag_unsafe_math_optimizations)
+      /* Optimize pow(x,0.75) = sqrt(x) * sqrt(sqrt(x)) unless we are
+	 optimizing for space.  */
+      real_from_integer (&dconst3_4, VOIDmode, 3, 0, 0);
+      SET_REAL_EXP (&dconst3_4, REAL_EXP (&dconst3_4) - 2);
+
+      if (flag_unsafe_math_optimizations
+	  && optimize_function_for_speed_p (cfun)
+	  && !TREE_SIDE_EFFECTS (arg0)
+	  && REAL_VALUES_EQUAL (c, dconst3_4)
+	  && sqrtfn != NULL_TREE
+	  && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)
 	{
-	  const REAL_VALUE_TYPE dconstroot
-	    = real_value_truncate (TYPE_MODE (type), dconst_third ());
-
-	  if (REAL_VALUES_EQUAL (c, dconstroot))
-	    {
-	      tree cbrtfn = mathfn_built_in (type, BUILT_IN_CBRT);
-	      if (cbrtfn != NULL_TREE)
-		return build_call_expr_loc (loc, cbrtfn, 1, arg0);
-	    }
+	  tree sqrt_arg0 = build_call_expr_loc (loc, sqrtfn, 1, arg0);
+	  tree sqrt_save = builtin_save_expr (sqrt_arg0);
+	  tree sqrt_sqrt = build_call_expr_loc (loc, sqrtfn, 1, sqrt_arg0);
+	  return fold_build2_loc (loc, MULT_EXPR, type, sqrt_save, sqrt_sqrt);
 	}
 
-      /* Check for an integer exponent.  */
-      n = real_to_integer (&c);
-      real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
-      if (real_identical (&c, &cint))
+      /* Optimize pow(x,1.0/3.0) = cbrt(x), and pow(x,1.0/6.0) =
+	 cbrt(sqrt(x)).  */
+      if (flag_unsafe_math_optimizations && cbrtfn != NULL_TREE)
 	{
-	  /* Attempt to evaluate pow at compile-time, unless this should
-	     raise an exception.  */
-	  if (TREE_CODE (arg0) == REAL_CST
-	      && !TREE_OVERFLOW (arg0)
-	      && (n > 0
-		  || (!flag_trapping_math && !flag_errno_math)
-		  || !REAL_VALUES_EQUAL (TREE_REAL_CST (arg0), dconst0)))
-	    {
-	      REAL_VALUE_TYPE x;
-	      bool inexact;
+	  const REAL_VALUE_TYPE dconst1_3
+	    = real_value_truncate (mode, dconst_third ());
 
-	      x = TREE_REAL_CST (arg0);
-	      inexact = real_powi (&x, TYPE_MODE (type), &x, n);
-	      if (flag_unsafe_math_optimizations || !inexact)
-		return build_real (type, x);
-	    }
+	  if (REAL_VALUES_EQUAL (c, dconst1_3))
+	    return build_call_expr_loc (loc, cbrtfn, 1, arg0);
 
-	  /* Strip sign ops from even integer powers.  */
-	  if ((n & 1) == 0 && flag_unsafe_math_optimizations)
+	  if (optimize_function_for_speed_p (cfun)
+	      && sqrtfn != NULL_TREE
+	      && optab_handler (sqrt_optab, mode) != CODE_FOR_nothing)
 	    {
-	      tree narg0 = fold_strip_sign_ops (arg0);
-	      if (narg0)
-		return build_call_expr_loc (loc, fndecl, 2, narg0, arg1);
+	      REAL_VALUE_TYPE dconst1_6 = dconst1_3;
+	      SET_REAL_EXP (&dconst1_6, REAL_EXP (&dconst1_6) - 1);
+
+	      if (REAL_VALUES_EQUAL (c, dconst1_6))
+		{
+		  tree sqrt_arg0 = build_call_nofold_loc (loc, sqrtfn, 1, arg0);
+		  return build_call_nofold_loc (loc, cbrtfn, 1, sqrt_arg0);
+		}
 	    }
 	}
     }
@@ -8137,7 +8032,7 @@ 
 	  if (tree_expr_nonnegative_p (arg))
 	    {
 	      const REAL_VALUE_TYPE dconstroot
-		= real_value_truncate (TYPE_MODE (type), dconst_third ());
+		= real_value_truncate (mode, dconst_third ());
 	      tree narg1 = fold_build2_loc (loc, MULT_EXPR, type, arg1,
 					build_real (type, dconstroot));
 	      return build_call_expr_loc (loc, fndecl, 2, arg, narg1);
@@ -8159,12 +8054,148 @@ 
 	}
     }
 
+  if (TREE_CODE (arg1) == REAL_CST
+      && !TREE_OVERFLOW (arg1)
+      /* If we weren't able to fold a constant expression as reals,
+	 don't convert into a different form.  */
+      && TREE_CODE (arg0) != REAL_CST)
+    {
+      REAL_VALUE_TYPE c, cint;
+      HOST_WIDE_INT n;
+
+      c = TREE_REAL_CST (arg1);
+
+      /* Check for an integer exponent.  */
+      n = real_to_integer (&c);
+      real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
+      if (real_identical (&c, &cint)
+	  && powi_cost (n) <= POWI_MAX_MULTS)
+	{
+	  /* Convert to powi, which will be processed into an optimal
+	     number of multiplications.  */
+	  tree powifn = mathfn_built_in (type, BUILT_IN_POWI);
+
+	  if (powifn)
+	    {
+	      tree power = build_int_cst (integer_type_node, n);
+	      return build_call_expr_loc (loc, powifn, 2, arg0, power);
+	    }
+	}
+
+      /* Check for specific fractional exponents we can optimize.  */
+      else
+	{
+	  tree opt_tree =
+	    fold_builtin_pow_frac_exp (loc, arg0, c, type, mode);
+
+	  if (opt_tree)
+	    return opt_tree;
+	}
+    }
+
   return NULL_TREE;
 }
 
+/* Recursive subroutine of powi_as_mults.  This function takes the
+   array, CACHE, of already calculated exponents and an exponent N and
+   returns a tree that corresponds to CACHE[1]**N, with type TYPE.  */
+
+static tree
+powi_as_mults_1 (gimple_stmt_iterator *gsi, location_t loc, tree type,
+		 HOST_WIDE_INT n, tree *cache)
+{
+  tree op0, op1, target;
+  unsigned HOST_WIDE_INT digit;
+  gimple mult_stmt;
+
+  if (n < POWI_TABLE_SIZE)
+    {
+      if (cache[n])
+	return cache[n];
+
+      target = create_tmp_var (type, "powmult");
+      add_referenced_var (target);
+      target = make_ssa_name (target, NULL);
+      cache[n] = target;
+
+      op0 = powi_as_mults_1 (gsi, loc, type, n - powi_table[n], cache);
+      op1 = powi_as_mults_1 (gsi, loc, type, powi_table[n], cache);
+    }
+  else if (n & 1)
+    {
+      target = create_tmp_var (type, "powmult");
+      add_referenced_var (target);
+      target = make_ssa_name (target, NULL);
+      digit = n & ((1 << POWI_WINDOW_SIZE) - 1);
+      op0 = powi_as_mults_1 (gsi, loc, type, n - digit, cache);
+      op1 = powi_as_mults_1 (gsi, loc, type, digit, cache);
+    }
+  else
+    {
+      target = create_tmp_var (type, "powmult");
+      add_referenced_var (target);
+      target = make_ssa_name (target, NULL);
+      op0 = powi_as_mults_1 (gsi, loc, type, n >> 1, cache);
+      op1 = op0;
+    }
+
+  mult_stmt = gimple_build_assign_with_ops (MULT_EXPR, target, op0, op1);
+  SSA_NAME_DEF_STMT (target) = mult_stmt;
+  gsi_insert_before (gsi, mult_stmt, GSI_SAME_STMT);
+
+  return target;
+}
+
+/* Convert ARG0**N to a tree of multiplications of ARG0 with itself.
+   This function needs to be kept in sync with powi_cost, above.  */
+
+static tree
+powi_as_mults (gimple_stmt_iterator *gsi, location_t loc,
+	       tree arg0, HOST_WIDE_INT n)
+{
+  tree cache[POWI_TABLE_SIZE], result, type = TREE_TYPE (arg0);
+
+  if (n == 0)
+    return omit_one_operand_loc (loc, type, build_real (type, dconst1), arg0);
+
+  memset (cache, 0, sizeof (cache));
+  cache[1] = arg0;
+
+  result = powi_as_mults_1 (gsi, loc, type, (n < 0) ? -n : n, cache);
+
+  /* If the original exponent was negative, reciprocate the result.  */
+  if (n < 0)
+    result = build2_loc (loc, RDIV_EXPR, type,
+			 build_real (type, dconst1), result);
+  return result;
+}
+
+/* ARGS are the two arguments to a powi builtin in GSI with location info
+   LOC.  If the arguments are appropriate, create an equivalent set of
+   statements prior to GSI using an optimal number of multiplications,
+   and return an expression holding the result.  */
+
+tree
+tree_expand_builtin_powi (gimple_stmt_iterator *gsi, location_t loc, tree *args)
+{
+  HOST_WIDE_INT n = TREE_INT_CST_LOW (args[1]);
+  HOST_WIDE_INT n_hi = TREE_INT_CST_HIGH (args[1]);
+
+  if ((n_hi == 0 || n_hi == -1)
+      /* Avoid largest negative number.  */
+      && (n != -n)
+      && ((n >= -1 && n <= 2)
+	  || (optimize_function_for_speed_p (cfun)
+	      && powi_cost (n) <= POWI_MAX_MULTS)))
+    return powi_as_mults (gsi, loc, args[0], n);
+
+  return NULL_TREE;
+}
+
 /* Fold a builtin function call to powi, powif, or powil with argument ARG.
    Return NULL_TREE if no simplification can be made.  */
-static tree
+
+tree
 fold_builtin_powi (location_t loc, tree fndecl ATTRIBUTE_UNUSED,
 		   tree arg0, tree arg1, tree type)
 {
@@ -8179,17 +8210,8 @@ 
   if (host_integerp (arg1, 0))
     {
       HOST_WIDE_INT c = TREE_INT_CST_LOW (arg1);
+      tree powi_const;
 
-      /* Evaluate powi at compile-time.  */
-      if (TREE_CODE (arg0) == REAL_CST
-	  && !TREE_OVERFLOW (arg0))
-	{
-	  REAL_VALUE_TYPE x;
-	  x = TREE_REAL_CST (arg0);
-	  real_powi (&x, TYPE_MODE (type), &x, c);
-	  return build_real (type, x);
-	}
-
       /* Optimize pow(x,0) = 1.0.  */
       if (c == 0)
 	return omit_one_operand_loc (loc, type, build_real (type, dconst1),
@@ -8203,6 +8225,12 @@ 
       if (c == -1)
 	return fold_build2_loc (loc, RDIV_EXPR, type,
 			   build_real (type, dconst1), arg0);
+
+      /* Attempt to evaluate powi at compile time.  */
+      powi_const = fold_eval_powi (arg0, c, type, TYPE_MODE (type));
+
+      if (powi_const)
+	return powi_const;
     }
 
   return NULL_TREE;
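A note on the multiplication sequence built by powi_as_mults above: powi_as_mults_1 consults powi_table for small exponents and peels a POWI_WINDOW_SIZE-bit digit from larger odd ones, caching shared subproducts. As a rough, standalone illustration of the same idea, the sketch below uses plain binary square-and-multiply instead. It is not the patch's algorithm and may use more multiplications than the table-driven version; powi_sketch is a hypothetical name that exists only for this example.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper, not part of GCC: raise X to the integer power N
   with binary square-and-multiply.  The patch itself uses powi_table
   and a sliding window; this is a simplified stand-in.  */
static double
powi_sketch (double x, long n)
{
  /* -LONG_MIN is not representable, so reject it up front, in the
     spirit of the "avoid largest negative number" check above.  */
  assert (n != LONG_MIN);

  unsigned long e = (n < 0) ? (unsigned long) -n : (unsigned long) n;
  double result = 1.0;
  double base = x;

  while (e != 0)
    {
      if (e & 1)
        result *= base;   /* This bit of the exponent contributes.  */
      e >>= 1;
      if (e != 0)
        base *= base;     /* Square for the next bit.  */
    }

  /* If the original exponent was negative, reciprocate the result.  */
  return (n < 0) ? 1.0 / result : result;
}
```

For example, powi_sketch (x, 10) squares three times and multiplies twice, and a negative exponent costs one extra division, mirroring the RDIV_EXPR built in powi_as_mults.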
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 173730)
+++ gcc/fold-const.c	(working copy)
@@ -10460,6 +10460,36 @@ 
 		    }
 		}
 
+	      /* Optimizations of powi(...)*powi(...).  */
+	      if ((fcode0 == BUILT_IN_POWI && fcode1 == BUILT_IN_POWI)
+		  || (fcode0 == BUILT_IN_POWIF && fcode1 == BUILT_IN_POWIF)
+		  || (fcode0 == BUILT_IN_POWIL && fcode1 == BUILT_IN_POWIL))
+		{
+		  tree arg00 = CALL_EXPR_ARG (arg0, 0);
+		  tree arg01 = CALL_EXPR_ARG (arg0, 1);
+		  tree arg10 = CALL_EXPR_ARG (arg1, 0);
+		  tree arg11 = CALL_EXPR_ARG (arg1, 1);
+
+		  /* Optimize powi(x,y)*powi(z,y) as powi(x*z,y).  */
+		  if (operand_equal_p (arg01, arg11, 0))
+		    {
+		      tree powfn = TREE_OPERAND (CALL_EXPR_FN (arg0), 0);
+		      tree arg = fold_build2_loc (loc, MULT_EXPR, type,
+					      arg00, arg10);
+		      return build_call_expr_loc (loc, powfn, 2, arg, arg01);
+		    }
+
+		  /* Optimize powi(x,y)*powi(x,z) as powi(x,y+z).  */
+		  if (operand_equal_p (arg00, arg10, 0))
+		    {
+		      tree powfn = TREE_OPERAND (CALL_EXPR_FN (arg0), 0);
+		      tree inttype = TREE_TYPE (arg01);
+		      tree arg = fold_build2_loc (loc, PLUS_EXPR, inttype,
+					      arg01, arg11);
+		      return build_call_expr_loc (loc, powfn, 2, arg00, arg);
+		    }
+		}
+
 	      /* Optimize tan(x)*cos(x) as sin(x).  */
 	      if (((fcode0 == BUILT_IN_TAN && fcode1 == BUILT_IN_COS)
 		   || (fcode0 == BUILT_IN_TANF && fcode1 == BUILT_IN_COSF)
@@ -10521,16 +10551,61 @@ 
 		    }
 		}
 
-	      /* Optimize x*x as pow(x,2.0), which is expanded as x*x.  */
-	      if (optimize_function_for_speed_p (cfun)
-		  && operand_equal_p (arg0, arg1, 0))
+	      /* Optimize x*powi(x,c) as powi(x,c+1).  */
+	      if (fcode1 == BUILT_IN_POWI
+		  || fcode1 == BUILT_IN_POWIF
+		  || fcode1 == BUILT_IN_POWIL)
 		{
-		  tree powfn = mathfn_built_in (type, BUILT_IN_POW);
+		  tree arg10 = CALL_EXPR_ARG (arg1, 0);
+		  tree arg11 = CALL_EXPR_ARG (arg1, 1);
+		  if (TREE_CODE (arg11) == INTEGER_CST
+		      && !TREE_OVERFLOW (arg11)
+		      && operand_equal_p (arg0, arg10, 0))
+		    {
+		      tree powifn = TREE_OPERAND (CALL_EXPR_FN (arg1), 0);
+		      HOST_WIDE_INT n, n_hi, n_plus_1;
+		      tree arg;
 
-		  if (powfn)
+		      n = TREE_INT_CST_LOW (arg11);
+		      n_hi = TREE_INT_CST_HIGH (arg11);
+		      n_plus_1 = n + 1;
+		      if ((n_hi == 0 || n_hi == -1)
+			  /* Avoid overflow.  */
+			  && n_plus_1 > n)
+			{
+			  arg = build_int_cst (TREE_TYPE (arg11), n + 1);
+			  return build_call_expr_loc (loc, powifn, 2,
+						      arg0, arg);
+			}
+		    }
+		}
+
+	      /* Optimize powi(x,c)*x as powi(x,c+1).  */
+	      if (fcode0 == BUILT_IN_POWI
+		  || fcode0 == BUILT_IN_POWIF
+		  || fcode0 == BUILT_IN_POWIL)
+		{
+		  tree arg00 = CALL_EXPR_ARG (arg0, 0);
+		  tree arg01 = CALL_EXPR_ARG (arg0, 1);
+		  if (TREE_CODE (arg01) == INTEGER_CST
+		      && !TREE_OVERFLOW (arg01)
+		      && operand_equal_p (arg1, arg00, 0))
 		    {
-		      tree arg = build_real (type, dconst2);
-		      return build_call_expr_loc (loc, powfn, 2, arg0, arg);
+		      tree powifn = TREE_OPERAND (CALL_EXPR_FN (arg0), 0);
+		      HOST_WIDE_INT n, n_hi, n_plus_1;
+		      tree arg;
+
+		      n = TREE_INT_CST_LOW (arg01);
+		      n_hi = TREE_INT_CST_HIGH (arg01);
+		      n_plus_1 = n + 1;
+		      if ((n_hi == 0 || n_hi == -1)
+			  /* Avoid overflow.  */
+			  && n_plus_1 > n)
+			{
+			  arg = build_int_cst (TREE_TYPE (arg01), n + 1);
+			  return build_call_expr_loc (loc, powifn, 2,
+						      arg1, arg);
+			}
 		    }
 		}
 	    }
@@ -11457,6 +11532,34 @@ 
 		}
 	    }
 
+	  /* Optimize powi(x,c)/x as powi(x,c-1).  */
+	  if (fcode0 == BUILT_IN_POWI
+	      || fcode0 == BUILT_IN_POWIF
+	      || fcode0 == BUILT_IN_POWIL)
+	    {
+	      tree arg00 = CALL_EXPR_ARG (arg0, 0);
+	      tree arg01 = CALL_EXPR_ARG (arg0, 1);
+	      if (TREE_CODE (arg01) == INTEGER_CST
+		  && !TREE_OVERFLOW (arg01)
+		  && operand_equal_p (arg1, arg00, 0))
+		{
+		  tree powifn = TREE_OPERAND (CALL_EXPR_FN (arg0), 0);
+		  HOST_WIDE_INT n, n_hi, n_minus_1;
+		  tree arg;
+
+		  n = TREE_INT_CST_LOW (arg01);
+		  n_hi = TREE_INT_CST_HIGH (arg01);
+		  n_minus_1 = n - 1;
+		  if ((n_hi == 0 || n_hi == -1)
+		      /* Avoid overflow.  */
+		      && n_minus_1 < n)
+		    {
+		      arg = build_int_cst (TREE_TYPE (arg01), n - 1);
+		      return build_call_expr_loc (loc, powifn, 2, arg1, arg);
+		    }
+		}
+	    }
+
 	  /* Optimize a/root(b/c) into a*root(c/b).  */
 	  if (BUILTIN_ROOT_P (fcode1))
 	    {
@@ -11499,6 +11602,20 @@ 
 	      arg1 = build_call_expr_loc (loc, powfn, 2, arg10, neg11);
 	      return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);
 	    }
+
+	  /* Optimize x/powi(y,z) into x*powi(y,-z).  */
+	  if (fcode1 == BUILT_IN_POWI
+	      || fcode1 == BUILT_IN_POWIF
+	      || fcode1 == BUILT_IN_POWIL)
+	    {
+	      tree powifn = TREE_OPERAND (CALL_EXPR_FN (arg1), 0);
+	      tree arg10 = CALL_EXPR_ARG (arg1, 0);
+	      tree arg11 = CALL_EXPR_ARG (arg1, 1);
+	      tree neg11 = fold_convert_loc (loc, integer_type_node,
+					     negate_expr (arg11));
+	      arg1 = build_call_expr_loc (loc, powifn, 2, arg10, neg11);
+	      return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);
+	    }
 	}
       return NULL_TREE;
 
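A note on the "Avoid overflow" checks above: they compute n + 1 (or n - 1) first and then compare the result with n. Signed overflow is undefined in ISO C, so a compiler may assume that n + 1 > n always holds. A minimal sketch of a check that compares against the limit before doing the arithmetic follows. It uses long and LONG_MAX/LONG_MIN in place of HOST_WIDE_INT, and checked_adjust is a hypothetical helper, not an existing GCC function.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: store N + DELTA in *OUT and return true, or
   return false without touching *OUT if the sum would overflow.
   DELTA must be +1 or -1, matching the c+1 and c-1 folds above.  */
static bool
checked_adjust (long n, int delta, long *out)
{
  assert (out != NULL);
  assert (delta == 1 || delta == -1);

  /* Test against the limit first so the addition below is always
     well defined.  */
  if (delta > 0 ? n == LONG_MAX : n == LONG_MIN)
    return false;

  *out = n + delta;
  return true;
}
```

With such a helper, the x*powi(x,c) fold would build the new exponent only when checked_adjust (n, 1, &m) succeeds, and the powi(x,c)/x fold only when checked_adjust (n, -1, &m) succeeds.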
Index: gcc/testsuite/gcc.target/powerpc/pr46728-13.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-13.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-13.c	(revision 0)
@@ -0,0 +1,27 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 1.0 / 6.0);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 729.0, 64.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (convert_it (values[i]) != cbrt (sqrt (values[i])))
+      abort ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr46728-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-3.c	(revision 0)
@@ -0,0 +1,31 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 0.75);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (convert_it (values[i]) != sqrt (values[i]) * sqrt (sqrt (values[i])))
+      abort ();
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "sqrt" 4 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-not "pow" { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/pr46728-14.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-14.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-14.c	(revision 0)
@@ -0,0 +1,78 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it_1 (double x)
+{
+  return pow (x, 1.5);
+}
+
+static double
+convert_it_2 (double x)
+{
+  return pow (x, 2.5);
+}
+
+static double
+convert_it_3 (double x)
+{
+  return pow (x, -0.5);
+}
+
+static double
+convert_it_4 (double x)
+{
+  return pow (x, 10.5);
+}
+
+static double
+convert_it_5 (double x)
+{
+  return pow (x, -3.5);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  double PREC = .999999;
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    {
+      volatile double x, y;
+
+      x = sqrt (values[i]);
+      y = __builtin_powi (values[i], 1);
+      if (fabs (convert_it_1 (values[i]) / (x * y)) < PREC)
+	abort ();
+
+      x = sqrt (values[i]);
+      y = __builtin_powi (values[i], 2);
+      if (fabs (convert_it_2 (values[i]) / (x * y)) < PREC)
+	abort ();
+
+      x = sqrt (values[i]);
+      y = __builtin_powi (values[i], -1);
+      if (fabs (convert_it_3 (values[i]) / (x * y)) < PREC)
+	abort ();
+
+      x = sqrt (values[i]);
+      y = __builtin_powi (values[i], 10);
+      if (fabs (convert_it_4 (values[i]) / (x * y)) < PREC)
+	abort ();
+
+      x = sqrt (values[i]);
+      y = __builtin_powi (values[i], -4);
+      if (fabs (convert_it_5 (values[i]) / (x * y)) < PREC)
+	abort ();
+    }
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr46728-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-4.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-4.c	(revision 0)
@@ -0,0 +1,31 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 1.0 / 3.0);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 729.0, 64.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (convert_it (values[i]) != cbrt (values[i]))
+      abort ();
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "cbrt" 2 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-not "pow" { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/pr46728-15.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-15.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-15.c	(revision 0)
@@ -0,0 +1,67 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it_1 (double x)
+{
+  return pow (x, 10.0 / 3.0);
+}
+
+static double
+convert_it_2 (double x)
+{
+  return pow (x, 11.0 / 3.0);
+}
+
+static double
+convert_it_3 (double x)
+{
+  return pow (x, -7.0 / 3.0);
+}
+
+static double
+convert_it_4 (double x)
+{
+  return pow (x, -8.0 / 3.0);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  double PREC = .999999;
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    {
+      volatile double x, y;
+
+      x = __builtin_powi (values[i], 3);
+      y = __builtin_powi (cbrt (values[i]), 1);
+      if (fabs (convert_it_1 (values[i]) / (x * y)) < PREC)
+	abort ();
+
+      x = __builtin_powi (values[i], 3);
+      y = __builtin_powi (cbrt (values[i]), 2);
+      if (fabs (convert_it_2 (values[i]) / (x * y)) < PREC)
+	abort ();
+
+      x = __builtin_powi (values[i], -3);
+      y = __builtin_powi (cbrt (values[i]), 2);
+      if (fabs (convert_it_3 (values[i]) / (x * y)) < PREC)
+	abort ();
+
+      x = __builtin_powi (values[i], -3);
+      y = __builtin_powi (cbrt (values[i]), 1);
+      if (fabs (convert_it_4 (values[i]) / (x * y)) < PREC)
+	abort ();
+    }
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr46728-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-5.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-5.c	(revision 0)
@@ -0,0 +1,31 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 1.0 / 6.0);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 729.0, 64.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (convert_it (values[i]) != cbrt (sqrt (values[i])))
+      abort ();
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "cbrt" 2 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-not " pow " { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/pr46728-16.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-16.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-16.c	(revision 0)
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -mcpu=power6" } */
+
+double foo (double x, double y)
+{
+  return __builtin_pow (x, 0.75) + y;
+}
+
+
+/* { dg-final { scan-assembler "fmadd" { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/pr46728-7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-7.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-7.c	(revision 0)
@@ -0,0 +1,58 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it_1 (double x)
+{
+  return pow (x, 1.5);
+}
+
+static double
+convert_it_2 (double x)
+{
+  return pow (x, 2.5);
+}
+
+static double
+convert_it_3 (double x)
+{
+  return pow (x, -0.5);
+}
+
+static double
+convert_it_4 (double x)
+{
+  return pow (x, 10.5);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    {
+      if (convert_it_1 (values[i]) != sqrt (values[i]) * powi (values[i], 1))
+	abort ();
+      if (convert_it_2 (values[i]) != sqrt (values[i]) * powi (values[i], 2))
+	abort ();
+      if (convert_it_3 (values[i]) != sqrt (values[i]) * powi (values[i], -1))
+	abort ();
+      if (convert_it_4 (values[i]) != sqrt (values[i]) * powi (values[i], 10))
+	abort ();
+    }
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "sqrt" 5 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "powi" 4 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-not "pow " { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/pr46728-10.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-10.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-10.c	(revision 0)
@@ -0,0 +1,28 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 0.25);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (convert_it (values[i]) != sqrt (sqrt (values[i])))
+      abort ();
+
+  return 0;
+}
+
Index: gcc/testsuite/gcc.target/powerpc/pr46728-8.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-8.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-8.c	(revision 0)
@@ -0,0 +1,62 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it_1 (double x)
+{
+  return pow (x, 10.0 / 3.0);
+}
+
+static double
+convert_it_2 (double x)
+{
+  return pow (x, 11.0 / 3.0);
+}
+
+static double
+convert_it_3 (double x)
+{
+  return pow (x, -7.0 / 3.0);
+}
+
+static double
+convert_it_4 (double x)
+{
+  return pow (x, -8.0 / 3.0);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    {
+      if (convert_it_1 (values[i]) != 
+	  powi (values[i], 3) * powi (cbrt (values[i]), 1))
+	abort ();
+      if (convert_it_2 (values[i]) != 
+	  powi (values[i], 3) * powi (cbrt (values[i]), 2))
+	abort ();
+      if (convert_it_3 (values[i]) != 
+	  powi (values[i], -3) * powi (cbrt (values[i]), 2))
+	abort ();
+      if (convert_it_4 (values[i]) !=
+	  powi (values[i], -3) * powi (cbrt (values[i]), 1))
+	abort ();
+    }
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "powi" 8 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "cbrt" 5 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-not "pow " { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/pr46728-11.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-11.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-11.c	(revision 0)
@@ -0,0 +1,34 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 0.75);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  double PREC = 0.999999;
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    {
+      volatile double x, y;
+      x = sqrt (values[i]);
+      y = sqrt (sqrt (values[i]));
+  
+      if (fabs (convert_it (values[i]) / (x * y)) < PREC)
+	abort ();
+    }
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr46728-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-1.c	(revision 0)
@@ -0,0 +1,31 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 0.5);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (convert_it (values[i]) != sqrt (values[i]))
+      abort ();
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "fsqrt" 2 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-not "pow" { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/pr46728-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-2.c	(revision 0)
@@ -0,0 +1,31 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 0.25);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (convert_it (values[i]) != sqrt (sqrt (values[i])))
+      abort ();
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "fsqrt" 4 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-not "pow" { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.dg/pr46728-9.c
===================================================================
--- gcc/testsuite/gcc.dg/pr46728-9.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr46728-9.c	(revision 0)
@@ -0,0 +1,29 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 0.5);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  double PREC = 0.999999;
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (fabs (convert_it (values[i]) / sqrt (values[i])) < PREC)
+      abort ();
+
+  return 0;
+}
+
Index: gcc/testsuite/gcc.dg/pr46728-12.c
===================================================================
--- gcc/testsuite/gcc.dg/pr46728-12.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr46728-12.c	(revision 0)
@@ -0,0 +1,28 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 1.0 / 3.0);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 729.0, 64.0, .0008797 };
+  double PREC = 0.999999;
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (fabs (convert_it (values[i]) / cbrt (values[i])) < PREC)
+      abort ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/pr46728-6.c
===================================================================
--- gcc/testsuite/gcc.dg/pr46728-6.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr46728-6.c	(revision 0)
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -lm" } */
+
+#include <math.h>
+
+int
+main (int argc, char *argv[])
+{
+  volatile double result;
+
+  result = pow (-0.0, 3.0);
+  result = pow (26.47, -2.0);
+  result = pow (0.0, 0.0);
+  result = pow (22.3, 1.0);
+  result = pow (33.2, -1.0);
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-not "pow" } } */
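A note on the PREC comparisons in the run tests above: they abort only when the ratio of the two results falls below PREC, so a lowered result that is too large would still pass. A two-sided variant could be sketched as follows. The name ratio_close is hypothetical and the helper is not part of the testsuite.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: return true if |ACTUAL / EXPECTED| lies within
   [PREC, 1 / PREC].  PREC is expected to be slightly below 1.0, like
   the 0.999999 used in the tests above.  */
static bool
ratio_close (double actual, double expected, double prec)
{
  assert (prec > 0.0 && prec < 1.0);

  double ratio = actual / expected;
  if (ratio < 0.0)
    ratio = -ratio;

  /* Reject results that are too small as well as too large.  */
  return ratio >= prec && ratio <= 1.0 / prec;
}
```

A test body would then read, for instance, if (!ratio_close (convert_it (values[i]), sqrt (values[i]) * sqrt (sqrt (values[i])), PREC)) abort ();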
Index: gcc/tree-ssa-math-opts.c
===================================================================
--- gcc/tree-ssa-math-opts.c	(revision 173730)
+++ gcc/tree-ssa-math-opts.c	(working copy)
@@ -1,5 +1,5 @@ 
 /* Global, SSA-based optimizations using mathematical identities.
-   Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010
+   Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010, 2011
    Free Software Foundation, Inc.
 
 This file is part of GCC.
@@ -103,6 +103,7 @@ 
 #include "rtl.h"		/* Because optabs.h wants enum rtx_code.  */
 #include "expr.h"		/* Because optabs.h wants sepops.  */
 #include "optabs.h"
+#include "tree-ssa-propagate.h"
 
 /* This structure represents one basic block that either computes a
    division, or is a common dominator for basic block that compute a
@@ -1854,3 +1855,123 @@ 
   | TODO_update_ssa                     /* todo_flags_finish */
  }
 };
+
+/* Simplify built-in calls to pow and powi.  This is done prior to
+   the vectorizer to expose vector square root and multiplication
+   series opportunities.  */
+
+static unsigned int
+execute_lower_pow (void)
+{
+  basic_block bb;
+
+  FOR_EACH_BB (bb)
+    {
+      gimple_stmt_iterator gsi;
+
+      for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);)
+	{
+	  gimple stmt = gsi_stmt (gsi);
+
+	  if (is_gimple_call (stmt))
+	    {
+	      tree fndecl = gimple_call_fndecl (stmt);
+	      tree result = NULL_TREE;
+
+	      if (!fndecl
+		  || TREE_CODE (fndecl) != FUNCTION_DECL
+		  || !DECL_BUILT_IN (fndecl)
+		  || gimple_call_va_arg_pack_p (stmt)
+		  || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
+		{
+		  gsi_next (&gsi);
+		  continue;
+		}
+
+	      switch (DECL_FUNCTION_CODE (fndecl))
+		{
+		case BUILT_IN_POW:
+		case BUILT_IN_POWF:
+		case BUILT_IN_POWL:
+		  {
+		    location_t loc = gimple_location (stmt);
+		    tree *args = gimple_call_arg_ptr (stmt, 0);
+		    tree type = TREE_TYPE (TREE_TYPE (fndecl));
+		    result = fold_builtin_pow (loc, fndecl, args[0],
+					       args[1], type);
+		    break;
+		  }
+		case BUILT_IN_POWI:
+		case BUILT_IN_POWIF:
+		case BUILT_IN_POWIL:
+		  {
+		    location_t loc = gimple_location (stmt);
+		    tree *args = gimple_call_arg_ptr (stmt, 0);
+		    tree type = TREE_TYPE (TREE_TYPE (fndecl));
+		    result = fold_builtin_powi (loc, fndecl, args[0],
+						args[1], type);
+
+		    /* Expanding powi into an optimal number of
+		       multiplications requires adding statements,
+		       so handle that separately.  */
+		    if (result == NULL_TREE
+			&& host_integerp (args[1], 0)
+			&& !TREE_OVERFLOW (args[1]))
+		      result = tree_expand_builtin_powi (&gsi, loc, args);
+
+		    break;
+		  }
+		default:
+		  break;
+		}
+
+	      if (result)
+		{
+		  /* Propagate location information from original call to
+		     expansion of builtin.  Otherwise things like
+		     maybe_emit_chk_warning, that operate on the expansion
+		     of a builtin, will use the wrong location information.  */
+		  if (gimple_has_location (stmt))
+		    {
+		      tree realret = result;
+		      if (TREE_CODE (result) == NOP_EXPR)
+			realret = TREE_OPERAND (result, 0);
+		      if (CAN_HAVE_LOCATION_P (realret)
+			  && !EXPR_HAS_LOCATION (realret))
+			SET_EXPR_LOCATION (realret, gimple_location (stmt));
+		      result = realret;
+		    }
+		}
+
+	      if (result && !update_call_from_tree (&gsi, result))
+		gimplify_and_update_call_from_tree (&gsi, result);
+	    }
+
+	  gsi_next (&gsi);
+	}
+    }
+
+  return 0;
+}
+
+struct gimple_opt_pass pass_lower_pow =
+{
+ {
+  GIMPLE_PASS,
+  "lower_pow",				/* name */
+  NULL,					/* gate */
+  execute_lower_pow,			/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_NONE,				/* tv_id */
+  PROP_ssa,				/* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_verify_ssa
+  | TODO_verify_stmts
+  | TODO_dump_func
+  | TODO_update_ssa                     /* todo_flags_finish */
+ }
+};
Index: gcc/tree-flow.h
===================================================================
--- gcc/tree-flow.h	(revision 173730)
+++ gcc/tree-flow.h	(working copy)
@@ -856,4 +856,7 @@ 
 
 void swap_tree_operands (gimple, tree *, tree *);
 
+/* In builtins.c  */
+tree tree_expand_builtin_powi (gimple_stmt_iterator *, location_t, tree *);
+
 #endif /* _TREE_FLOW_H  */
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 173730)
+++ gcc/Makefile.in	(working copy)
@@ -2639,7 +2639,8 @@ 
 tree-ssa-math-opts.o : tree-ssa-math-opts.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(TM_H) $(FLAGS_H) $(TREE_H) $(TREE_FLOW_H) $(TIMEVAR_H) \
    $(TREE_PASS_H) alloc-pool.h $(BASIC_BLOCK_H) $(TARGET_H) \
-   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h
+   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h \
+   tree-ssa-propagate.h
 tree-ssa-alias.o : tree-ssa-alias.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
    $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) $(TREE_INLINE_H) $(FLAGS_H) \
    $(FUNCTION_H) $(TIMEVAR_H) convert.h $(TM_H) coretypes.h langhooks.h \
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 173730)
+++ gcc/passes.c	(working copy)
@@ -1,7 +1,7 @@ 
 /* Top level of GCC compilers (cc1, cc1plus, etc.)
    Copyright (C) 1987, 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
-   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
-   Free Software Foundation, Inc.
+   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
+   2011 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -812,6 +812,7 @@ 
      output to the assembler file.  */
   p = &all_passes;
   NEXT_PASS (pass_lower_eh_dispatch);
+  NEXT_PASS (pass_lower_pow);
   NEXT_PASS (pass_all_optimizations);
     {
       struct opt_pass **p = &pass_all_optimizations.pass.sub;