Patchwork Add FMA_EXPR, un-cse multiplications before expansion

login
register
mail settings
Submitter Richard Guenther
Date Oct. 22, 2010, 2:03 p.m.
Message ID <alpine.LNX.2.00.1010221600330.23074@zhemvz.fhfr.qr>
Download mbox | patch
Permalink /patch/68857/
State New
Headers show

Comments

Richard Guenther - Oct. 22, 2010, 2:03 p.m.
On Fri, 22 Oct 2010, Richard Henderson wrote:

> On 10/21/2010 05:09 PM, Joseph S. Myers wrote:
> > My model is that the .md files always accurately describe the instructions 
> > - using fma RTL, not (a*b)+c RTL - at which point the insns should be 
> > conditioned on the instructions actually being available, not on the 
> > -ffp-contract setting.
> 
> With the exception of s390 and arm, I believe the backends have been
> converted so that the fma rtl pattern is available.  However, the
> existing backend switch -mfused-madd (TARGET_FUSED_MADD) is still
> supported via a*b+c rtl.
> 
> I would be absolutely delighted if we could drop/alias the -m switch
> to a generic -f switch that can handle this in either combine or in
> Richi's new pass.

Consider the following added to my patch.  Note that for the
mul/add patterns to be removed we have to address the fma expander
pattern issue I raised in the original posting (the expander
doesn't like (fma (mul (neg x) y) z) yet and so the neg gets
split away and optimized by its expander confusing combine).

Joseph, I'm waiting for your last option handling patch to go in
to adjust -ffast-math to include -ffp-contract.  I'm also not
sure how to do the tristate on/off/fast at the moment.

Richard.

Index: gcc/tree-ssa-math-opts.c
===================================================================
--- gcc/tree-ssa-math-opts.c.orig       2010-10-22 16:00:28.000000000 
+0200
+++ gcc/tree-ssa-math-opts.c    2010-10-22 16:00:22.000000000 +0200
@@ -1507,6 +1507,10 @@ convert_mult_to_fma (gimple mul_stmt)
   use_operand_p use_p;
   imm_use_iterator imm_iter;
 
+  if (FLOAT_TYPE_P (type)
+      && !flag_fp_contract)
+    return false;
+
   /* If the target doesn't support it, don't generate it.
      ???  We have no way of querying support for the various variants
      with negated operands, so for the following we simply assume
Joseph S. Myers - Oct. 22, 2010, 2:24 p.m.
On Fri, 22 Oct 2010, Richard Guenther wrote:

> Joseph, I'm waiting for your last option handling patch to go in
> to adjust -ffast-math to include -ffp-contract.  I'm also not

Right now I don't have any uncommitted options patches.

> sure how to do the tristate on/off/fast at the moment.

The current way is code in the option handler that checks for different 
string arguments, as done for -fexcess-precision= (a two-state option 
where the option design allows for other versions such as "none" to be 
added if anyone wants them).

(I do intend at some point to add a generic facility for such options in 
.opt files; there's a lot of duplication of similar code to handle such 
options, especially in all the targets looking up possible -march values.  
Given a generic facility, diagnostics for invalid values can all tell you 
what the valid values are, as can --help.)
Richard Guenther - Oct. 22, 2010, 3:06 p.m.
On Fri, 22 Oct 2010, Joseph S. Myers wrote:

> On Fri, 22 Oct 2010, Richard Guenther wrote:
> 
> > Joseph, I'm waiting for your last option handling patch to go in
> > to adjust -ffast-math to include -ffp-contract.  I'm also not
> 
> Right now I don't have any uncommitted options patches.
> 
> > sure how to do the tristate on/off/fast at the moment.
> 
> The current way is code in the option handler that checks for different 
> string arguments, as done for -fexcess-precision= (a two-state option 
> where the option design allows for other versions such as "none" to be 
> added if anyone wants them).
> 
> (I do intend at some point to add a generic facility for such options in 
> .opt files; there's a lot of duplication of similar code to handle such 
> options, especially in all the targets looking up possible -march values.  
> Given a generic facility, diagnostics for invalid values can all tell you 
> what the valid values are, as can --help.)

--help= seems to be broken for me at the moment.

Here's my current patch in case anyone plans to do fma related work
elsewhere.

Richard.

2010-10-22  Richard Guenther  <rguenther@suse.de>

	* tree.def (FMA_EXPR): New tree code.
	* expr.c (expand_expr_real_2): Add FMA_EXPR expansion code.
	* gimple.c (gimple_rhs_class_table): FMA_EXPR is a GIMPLE_TERNARY_RHS.
	* tree-cfg.c (verify_gimple_assign_ternary): Verify FMA_EXPR types.
	* tree-inline.c (estimate_operator_cost): Handle FMA_EXPR.
	* gimple-pretty-print.c (dump_ternary_rhs): Likewise.
	* tree-ssa-math-opts.c (convert_mult_to_fma): New function.
	(execute_optimize_widening_mul): Call it.  Reorganize to allow
	dead stmt removal.  Move TODO flags ...
	(pass_optimize_widening_mul): ... here.
	* common.opt (-ffp-contract): New option.
	* opts.c (common_handle_option): Handle it.
	(set_unsafe_math_optimizations_flags): Enable -ffp-contract=fast.
	* doc/invoke.texi (-ffp-contract): Document.
	(-funsafe-math-optimizations): Adjust.

	* gcc.target/i386/fma4-vector-2.c: New testcase.

Index: gcc/tree.def
===================================================================
*** gcc/tree.def.orig	2010-10-22 16:43:20.000000000 +0200
--- gcc/tree.def	2010-10-22 16:48:01.000000000 +0200
*************** DEFTREECODE (WIDEN_MULT_PLUS_EXPR, "wide
*** 1092,1097 ****
--- 1092,1103 ----
     is subtracted from t3.  */
  DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "widen_mult_plus_expr", tcc_expression, 3)
  
+ /* Fused multiply-add.
+    All operands and the result are of the same type.  No intermediate
+    rounding is performed after multiplying operand one with operand two
+    before adding operand three.  */
+ DEFTREECODE (FMA_EXPR, "fma_expr", tcc_expression, 3)
+ 
  /* Whole vector left/right shift in bits.
     Operand 0 is a vector to be shifted.
     Operand 1 is an integer shift amount in bits.  */
Index: gcc/expr.c
===================================================================
*** gcc/expr.c.orig	2010-10-22 16:43:20.000000000 +0200
--- gcc/expr.c	2010-10-22 16:48:01.000000000 +0200
*************** expand_expr_real_2 (sepops ops, rtx targ
*** 7254,7260 ****
    int ignore;
    bool reduce_bit_field;
    location_t loc = ops->location;
!   tree treeop0, treeop1;
  #define REDUCE_BIT_FIELD(expr)	(reduce_bit_field			  \
  				 ? reduce_to_bit_field_precision ((expr), \
  								  target, \
--- 7254,7260 ----
    int ignore;
    bool reduce_bit_field;
    location_t loc = ops->location;
!   tree treeop0, treeop1, treeop2;
  #define REDUCE_BIT_FIELD(expr)	(reduce_bit_field			  \
  				 ? reduce_to_bit_field_precision ((expr), \
  								  target, \
*************** expand_expr_real_2 (sepops ops, rtx targ
*** 7267,7272 ****
--- 7267,7273 ----
  
    treeop0 = ops->op0;
    treeop1 = ops->op1;
+   treeop2 = ops->op2;
  
    /* We should be called only on simple (binary or unary) expressions,
       exactly those that are valid in gimple expressions that aren't
*************** expand_expr_real_2 (sepops ops, rtx targ
*** 7624,7630 ****
      case WIDEN_MULT_PLUS_EXPR:
      case WIDEN_MULT_MINUS_EXPR:
        expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, EXPAND_NORMAL);
!       op2 = expand_normal (ops->op2);
        target = expand_widen_pattern_expr (ops, op0, op1, op2,
  					  target, unsignedp);
        return target;
--- 7625,7631 ----
      case WIDEN_MULT_PLUS_EXPR:
      case WIDEN_MULT_MINUS_EXPR:
        expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, EXPAND_NORMAL);
!       op2 = expand_normal (treeop2);
        target = expand_widen_pattern_expr (ops, op0, op1, op2,
  					  target, unsignedp);
        return target;
*************** expand_expr_real_2 (sepops ops, rtx targ
*** 7711,7716 ****
--- 7712,7745 ----
        expand_operands (treeop0, treeop1, subtarget, &op0, &op1, EXPAND_NORMAL);
        return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1, target, unsignedp));
  
+     case FMA_EXPR:
+       {
+ 	gimple def;
+ 	def = get_def_for_expr (treeop0, NEGATE_EXPR);
+ 	if (def)
+ 	  {
+ 	    op0 = expand_normal (gimple_assign_rhs1 (def));
+ 	    op0 = force_reg (mode, op0);
+ 	    op0 = gen_rtx_NEG (mode, op0);
+ 	  }
+ 	else
+ 	  op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL);
+ 	op1 = expand_normal (treeop1);
+ 	def = get_def_for_expr (treeop2, NEGATE_EXPR);
+ 	if (def)
+ 	  {
+ 	    op2 = expand_normal (gimple_assign_rhs1 (def));
+ 	    op2 = force_reg (mode, op2);
+ 	    op2 = gen_rtx_NEG (mode, op2);
+ 	  }
+ 	else
+ 	  op2 = expand_normal (treeop2);
+ 	/* ???  Building the explicit negs above doesn't help if the
+ 	   fma<mode>4 expander doesn't accept it.  */
+ 	return expand_ternary_op (TYPE_MODE (type), fma_optab,
+ 				  op0, op1, op2, target, 0);
+       }
+ 
      case MULT_EXPR:
        /* If this is a fixed-point operation, then we cannot use the code
  	 below because "expand_mult" doesn't support sat/no-sat fixed-point
Index: gcc/gimple.c
===================================================================
*** gcc/gimple.c.orig	2010-10-22 16:43:20.000000000 +0200
--- gcc/gimple.c	2010-10-22 16:48:01.000000000 +0200
*************** get_gimple_rhs_num_ops (enum tree_code c
*** 2528,2534 ****
        || (SYM) == TRUTH_XOR_EXPR) ? GIMPLE_BINARY_RHS			    \
     : (SYM) == TRUTH_NOT_EXPR ? GIMPLE_UNARY_RHS				    \
     : ((SYM) == WIDEN_MULT_PLUS_EXPR					    \
!       || (SYM) == WIDEN_MULT_MINUS_EXPR) ? GIMPLE_TERNARY_RHS		    \
     : ((SYM) == COND_EXPR						    \
        || (SYM) == CONSTRUCTOR						    \
        || (SYM) == OBJ_TYPE_REF						    \
--- 2528,2535 ----
        || (SYM) == TRUTH_XOR_EXPR) ? GIMPLE_BINARY_RHS			    \
     : (SYM) == TRUTH_NOT_EXPR ? GIMPLE_UNARY_RHS				    \
     : ((SYM) == WIDEN_MULT_PLUS_EXPR					    \
!       || (SYM) == WIDEN_MULT_MINUS_EXPR					    \
!       || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS			    \
     : ((SYM) == COND_EXPR						    \
        || (SYM) == CONSTRUCTOR						    \
        || (SYM) == OBJ_TYPE_REF						    \
Index: gcc/tree-cfg.c
===================================================================
*** gcc/tree-cfg.c.orig	2010-10-22 16:43:20.000000000 +0200
--- gcc/tree-cfg.c	2010-10-22 16:48:01.000000000 +0200
*************** verify_gimple_assign_ternary (gimple stm
*** 3748,3753 ****
--- 3748,3767 ----
  	}
        break;
  
+     case FMA_EXPR:
+       if (!useless_type_conversion_p (lhs_type, rhs1_type)
+ 	  || !useless_type_conversion_p (lhs_type, rhs2_type)
+ 	  || !useless_type_conversion_p (lhs_type, rhs3_type))
+ 	{
+ 	  error ("type mismatch in fused multiply-add expression");
+ 	  debug_generic_expr (lhs_type);
+ 	  debug_generic_expr (rhs1_type);
+ 	  debug_generic_expr (rhs2_type);
+ 	  debug_generic_expr (rhs3_type);
+ 	  return true;
+ 	}
+       break;
+ 
      default:
        gcc_unreachable ();
      }
Index: gcc/tree-inline.c
===================================================================
*** gcc/tree-inline.c.orig	2010-10-22 16:43:20.000000000 +0200
--- gcc/tree-inline.c	2010-10-22 16:48:01.000000000 +0200
*************** estimate_operator_cost (enum tree_code c
*** 3284,3289 ****
--- 3284,3290 ----
      case POINTER_PLUS_EXPR:
      case MINUS_EXPR:
      case MULT_EXPR:
+     case FMA_EXPR:
  
      case ADDR_SPACE_CONVERT_EXPR:
      case FIXED_CONVERT_EXPR:
Index: gcc/gimple-pretty-print.c
===================================================================
*** gcc/gimple-pretty-print.c.orig	2010-10-22 16:43:20.000000000 +0200
--- gcc/gimple-pretty-print.c	2010-10-22 16:48:01.000000000 +0200
*************** dump_ternary_rhs (pretty_printer *buffer
*** 400,405 ****
--- 400,413 ----
        pp_character (buffer, '>');
        break;
  
+     case FMA_EXPR:
+       dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
+       pp_string (buffer, " * ");
+       dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
+       pp_string (buffer, " + ");
+       dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
+       break;
+ 
      default:
        gcc_unreachable ();
      }
Index: gcc/tree-ssa-math-opts.c
===================================================================
*** gcc/tree-ssa-math-opts.c.orig	2010-10-22 16:43:20.000000000 +0200
--- gcc/tree-ssa-math-opts.c	2010-10-22 16:48:01.000000000 +0200
*************** convert_plusminus_to_widen (gimple_stmt_
*** 1494,1499 ****
--- 1494,1604 ----
    return true;
  }
  
+ /* Combine the multiplication at MUL_STMT with uses in additions and
+    subtractions to form fused multiply-add operations.  Returns true
+    if successful and MUL_STMT should be removed.  */
+ 
+ static bool
+ convert_mult_to_fma (gimple mul_stmt)
+ {
+   tree mul_result = gimple_assign_lhs (mul_stmt);
+   tree type = TREE_TYPE (mul_result);
+   gimple use_stmt, fma_stmt;
+   use_operand_p use_p;
+   imm_use_iterator imm_iter;
+ 
+   if (FLOAT_TYPE_P (type)
+       && !flag_fp_contract)
+     return false;
+ 
+   /* If the target doesn't support it, don't generate it.
+      ???  We have no way of querying support for the various variants
+      with negated operands, so for the following we simply assume
+      they are all available ((-a)*b+c, a*b-c and (-a)*b-c).  */
+   if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
+     return false;
+ 
+   /* We don't want to do bitfield reduction ops.  */
+   if (INTEGRAL_TYPE_P (type)
+       && (TYPE_PRECISION (type)
+ 	  != GET_MODE_PRECISION (TYPE_MODE (type))))
+     return false;
+ 
+   /* Make sure that the multiplication statement becomes dead after
+      the transformation, thus that all uses are transformed to FMAs.
+      This means we assume that an FMA operation has the same cost
+      as an addition.  */
+   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, mul_result)
+     {
+       enum tree_code use_code;
+ 
+       use_stmt = USE_STMT (use_p);
+ 
+       if (!is_gimple_assign (use_stmt))
+ 	return false;
+       use_code = gimple_assign_rhs_code (use_stmt);
+       /* ???  Handle NEGATE_EXPR.  */
+       if (use_code != PLUS_EXPR
+ 	  && use_code != MINUS_EXPR)
+ 	return false;
+ 
+       /* We can't handle a * b + a * b.  */
+       if (gimple_assign_rhs1 (use_stmt) == gimple_assign_rhs2 (use_stmt))
+ 	return false;
+ 
+       /* For now restrict this operations to single basic blocks.  In theory
+ 	 we would want to support sinking the multiplication in
+ 	 m = a*b;
+ 	 if ()
+ 	   ma = m + c;
+ 	 else
+ 	   d = m;
+ 	 to form a fma in the then block and sink the multiplication to the
+ 	 else block.  */
+       if (gimple_bb (use_stmt) != gimple_bb (mul_stmt))
+ 	return false;
+     }
+ 
+   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, mul_result)
+     {
+       tree addop, mulop1;
+       gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+ 
+       mulop1 = gimple_assign_rhs1 (mul_stmt);
+       if (gimple_assign_rhs1 (use_stmt) == mul_result)
+ 	{
+ 	  addop = gimple_assign_rhs2 (use_stmt);
+ 	  /* a * b - c -> a * b + (-c)  */
+ 	  if (gimple_assign_rhs_code (use_stmt) == MINUS_EXPR)
+ 	    addop = force_gimple_operand_gsi (&gsi,
+ 					      build1 (NEGATE_EXPR,
+ 						      type, addop),
+ 					      true, NULL_TREE, true,
+ 					      GSI_SAME_STMT);
+ 	}
+       else
+ 	{
+ 	  addop = gimple_assign_rhs1 (use_stmt);
+ 	  /* a - b * c -> (-b) * c + a */
+ 	  if (gimple_assign_rhs_code (use_stmt) == MINUS_EXPR)
+ 	    mulop1 = force_gimple_operand_gsi (&gsi,
+ 					       build1 (NEGATE_EXPR,
+ 						       type, mulop1),
+ 					       true, NULL_TREE, true,
+ 					       GSI_SAME_STMT);
+ 	}
+ 
+       fma_stmt = gimple_build_assign_with_ops3 (FMA_EXPR,
+ 						gimple_assign_lhs (use_stmt),
+ 						mulop1,
+ 						gimple_assign_rhs2 (mul_stmt),
+ 						addop);
+       gsi_replace (&gsi, fma_stmt, true);
+     }
+ 
+   return true;
+ }
+ 
  /* Find integer multiplications where the operands are extended from
     smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR
     where appropriate.  */
*************** convert_plusminus_to_widen (gimple_stmt_
*** 1501,1531 ****
  static unsigned int
  execute_optimize_widening_mul (void)
  {
-   bool changed = false;
    basic_block bb;
  
    FOR_EACH_BB (bb)
      {
        gimple_stmt_iterator gsi;
  
!       for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
          {
  	  gimple stmt = gsi_stmt (gsi);
  	  enum tree_code code;
  
! 	  if (!is_gimple_assign (stmt))
! 	    continue;
! 
! 	  code = gimple_assign_rhs_code (stmt);
! 	  if (code == MULT_EXPR)
! 	    changed |= convert_mult_to_widen (stmt);
! 	  else if (code == PLUS_EXPR || code == MINUS_EXPR)
! 	    changed |= convert_plusminus_to_widen (&gsi, stmt, code);
  	}
      }
  
!   return (changed ? TODO_dump_func | TODO_update_ssa | TODO_verify_ssa
! 	  | TODO_verify_stmts : 0);
  }
  
  static bool
--- 1606,1650 ----
  static unsigned int
  execute_optimize_widening_mul (void)
  {
    basic_block bb;
  
    FOR_EACH_BB (bb)
      {
        gimple_stmt_iterator gsi;
  
!       for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);)
          {
  	  gimple stmt = gsi_stmt (gsi);
  	  enum tree_code code;
  
! 	  if (is_gimple_assign (stmt))
! 	    {
! 	      code = gimple_assign_rhs_code (stmt);
! 	      switch (code)
! 		{
! 		case MULT_EXPR:
! 		  if (!convert_mult_to_widen (stmt)
! 		      && convert_mult_to_fma (stmt))
! 		    {
! 		      gsi_remove (&gsi, true);
! 		      release_defs (stmt);
! 		      continue;
! 		    }
! 		  break;
! 
! 		case PLUS_EXPR:
! 		case MINUS_EXPR:
! 		  convert_plusminus_to_widen (&gsi, stmt, code);
! 		  break;
! 
! 		default:;
! 		}
! 	    }
! 	  gsi_next (&gsi);
  	}
      }
  
!   return 0;
  }
  
  static bool
*************** struct gimple_opt_pass pass_optimize_wid
*** 1549,1554 ****
    0,					/* properties_provided */
    0,					/* properties_destroyed */
    0,					/* todo_flags_start */
!   0                                     /* todo_flags_finish */
   }
  };
--- 1668,1676 ----
    0,					/* properties_provided */
    0,					/* properties_destroyed */
    0,					/* todo_flags_start */
!   TODO_verify_ssa
!   | TODO_verify_stmts
!   | TODO_dump_func
!   | TODO_update_ssa                     /* todo_flags_finish */
   }
  };
Index: gcc/testsuite/gcc.target/i386/fma4-vector-2.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.target/i386/fma4-vector-2.c	2010-10-22 16:48:01.000000000 +0200
***************
*** 0 ****
--- 1,21 ----
+ /* { dg-do compile } */
+ /* { dg-require-effective-target lp64 } */
+ /* { dg-options "-O2 -mfma4 -ftree-vectorize -mtune=generic" } */
+ 
+ float r[256], s[256];
+ float x[256];
+ float y[256];
+ float z[256];
+ 
+ void foo (void)
+ {
+   int i;
+   for (i = 0; i < 256; ++i)
+     {
+       r[i] = x[i] * y[i] - z[i];
+       s[i] = x[i] * y[i] + z[i];
+     }
+ }
+ 
+ /* { dg-final { scan-assembler "vfmaddps" } } */
+ /* { dg-final { scan-assembler "vfmsubps" } } */
Index: gcc/common.opt
===================================================================
*** gcc/common.opt.orig	2010-10-22 16:43:20.000000000 +0200
--- gcc/common.opt	2010-10-22 16:48:24.000000000 +0200
*************** fforward-propagate
*** 842,847 ****
--- 842,851 ----
  Common Report Var(flag_forward_propagate) Optimization
  Perform a forward propagation pass on RTL
  
+ ffp-contract=
+ Common Joined RejectNegative Var(flag_fp_contract)
+ -ffp-contract=[on|off|fast] Perform floating-point expression contraction.
+ 
  ; Nonzero means don't put addresses of constant functions in registers.
  ; Used for compiling the Unix kernel, where strange substitutions are
  ; done on the assembly output.
Index: gcc/doc/invoke.texi
===================================================================
*** gcc/doc/invoke.texi.orig	2010-10-22 16:43:20.000000000 +0200
--- gcc/doc/invoke.texi	2010-10-22 16:48:01.000000000 +0200
*************** Objective-C and Objective-C++ Dialects}.
*** 342,348 ****
  -fdelayed-branch -fdelete-null-pointer-checks -fdse -fdse @gol
  -fearly-inlining -fipa-sra -fexpensive-optimizations -ffast-math @gol
  -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol
! -fforward-propagate -ffunction-sections @gol
  -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
  -fgcse-sm -fif-conversion -fif-conversion2 -findirect-inlining @gol
  -finline-functions -finline-functions-called-once -finline-limit=@var{n} @gol
--- 342,348 ----
  -fdelayed-branch -fdelete-null-pointer-checks -fdse -fdse @gol
  -fearly-inlining -fipa-sra -fexpensive-optimizations -ffast-math @gol
  -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol
! -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol
  -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
  -fgcse-sm -fif-conversion -fif-conversion2 -findirect-inlining @gol
  -finline-functions -finline-functions-called-once -finline-limit=@var{n} @gol
*************** loop unrolling.
*** 5980,5985 ****
--- 5980,5996 ----
  This option is enabled by default at optimization levels @option{-O},
  @option{-O2}, @option{-O3}, @option{-Os}.
  
+ @item -ffp-contract=@var{style}
+ @opindex ffp-contract
+ @option{-ffp-contract=off} disables floating-point expression contraction.
+ @option{-ffp-contract=fast} enables floating-point expression contraction
+ such as forming of fused multiply-add operations if the target has
+ native support for them.
+ @option{-ffp-contract=on} enables floating-point expression contraction
+ if allowed by the language standard.
+ 
+ The default is @option{-ffp-contract=off}.
+ 
  @item -fomit-frame-pointer
  @opindex fomit-frame-pointer
  Don't keep the frame pointer in a register for functions that
*************** an exact implementation of IEEE or ISO r
*** 7816,7822 ****
  math functions. It may, however, yield faster code for programs
  that do not require the guarantees of these specifications.
  Enables @option{-fno-signed-zeros}, @option{-fno-trapping-math},
! @option{-fassociative-math} and @option{-freciprocal-math}.
  
  The default is @option{-fno-unsafe-math-optimizations}.
  
--- 7827,7834 ----
  math functions. It may, however, yield faster code for programs
  that do not require the guarantees of these specifications.
  Enables @option{-fno-signed-zeros}, @option{-fno-trapping-math},
! @option{-fassociative-math}, @option{-freciprocal-math} and
! @option{-ffp-contract=fast}.
  
  The default is @option{-fno-unsafe-math-optimizations}.
  
Index: gcc/opts.c
===================================================================
*** gcc/opts.c.orig	2010-10-22 16:43:20.000000000 +0200
--- gcc/opts.c	2010-10-22 16:54:23.000000000 +0200
*************** common_handle_option (struct gcc_options
*** 1901,1906 ****
--- 1901,1918 ----
  	return false;
        break;
  
+     case OPT_ffp_contract_:
+       if (!strcmp (arg, "on"))
+ 	/* Not implemented.  */
+ 	flag_fp_contract = 0;
+       else if (!strcmp (arg, "off"))
+ 	flag_fp_contract = 0;
+       else if (!strcmp (arg, "fast"))
+ 	flag_fp_contract = 1;
+       else
+ 	error ("unknown floating point contraction style \"%s\"", arg);
+       break;
+ 
      case OPT_fexcess_precision_:
        if (!strcmp (arg, "fast"))
  	flag_excess_precision_cmdline = EXCESS_PRECISION_FAST;
*************** set_unsafe_math_optimizations_flags (int
*** 2289,2294 ****
--- 2301,2307 ----
    flag_signed_zeros = !set;
    flag_associative_math = set;
    flag_reciprocal_math = set;
+   flag_fp_contract = set;
  }
  
  /* Return true iff flags are set as if -ffast-math.  */
Joseph S. Myers - Oct. 22, 2010, 3:14 p.m.
On Fri, 22 Oct 2010, Richard Guenther wrote:

> + ffp-contract=
> + Common Joined RejectNegative Var(flag_fp_contract)
> + -ffp-contract=[on|off|fast] Perform floating-point expression contraction.

Var with Joined and without UInteger is supposed to create a pointer 
variable, which doesn't fit well with assigning integer values to it 
elsewhere.

(I'd suggest an enum variable.  Define the enumeration in flag-types.h, 
define the variable through a Variable entry among those at the top of 
common.opt.  Don't put Var on the individual option entry.  Store distinct 
enum values for all three option arguments even if the code checking the 
values later doesn't distinguish them all.)

> + @item -ffp-contract=@var{style}
> + @opindex ffp-contract
> + @option{-ffp-contract=off} disables floating-point expression contraction.
> + @option{-ffp-contract=fast} enables floating-point expression contraction
> + such as forming of fused multiply-add operations if the target has
> + native support for them.
> + @option{-ffp-contract=on} enables floating-point expression contraction
> + if allowed by the language standard.
> + 
> + The default is @option{-ffp-contract=off}.

FWIW, hitherto the default has been "fast", though certainly in ISO C 
modes it should be "on" or "off".

Patch

Index: gcc/common.opt
===================================================================
--- gcc/common.opt.orig 2010-10-14 14:00:03.000000000 +0200
+++ gcc/common.opt      2010-10-22 15:58:37.000000000 +0200
@@ -842,6 +842,10 @@  fforward-propagate
 Common Report Var(flag_forward_propagate) Optimization
 Perform a forward propagation pass on RTL
 
+ffp-contract
+Common Report Var(flag_fp_contract) Optimization
+Perform floating-point expression contraction.
+
 ; Nonzero means don't put addresses of constant functions in registers.
 ; Used for compiling the Unix kernel, where strange substitutions are
 ; done on the assembly output.