diff mbox

On x86 allow if-conversion of more than one insn as long as there is at most one cmov (PR tree-optimization/79390)

Message ID 20170401122027.GT17461@tucnak
State New
Headers show

Commit Message

Jakub Jelinek April 1, 2017, 12:20 p.m. UTC
Hi!

As discussed in the PR, in the following testcase we don't if-convert
with the generic (and many other) tuning, because we default to
--param max-rtl-if-conversion-insns=1 in most of the tunings.
The problem we have is with multiple cmov instructions, but in the
testcase there is just one cmov and the other insn is turned into a SSE
max insn, which is fine.

This patch stops artificially lowering that param, and for one_if_conv_insn
tuning it instead rejects the if-conversion if the resulting sequence has
multiple cmov instructions.  The hook is passed if_info too, so it can
in the future do better heuristics based on predictability of the edges,
how far the uses of the cmov result are (I assume cmov major problem is
latency, right?) etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-04-01  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/79390
	* target.h (struct noce_if_info): Declare.
	* targhooks.h (default_noce_conversion_profitable_p): Declare.
	* target.def (noce_conversion_profitable_p): New target hook.
	* ifcvt.h (struct noce_if_info): New type, moved from ...
	* ifcvt.c (struct noce_if_info): ... here.
	(noce_conversion_profitable_p): Renamed to ...
	(default_noce_conversion_profitable_p): ... this.  No longer
	static nor inline.
	(noce_try_store_flag_constants, noce_try_addcc,
	noce_try_store_flag_mask, noce_try_cmove, noce_try_cmove_arith,
	noce_convert_multiple_sets): Use targetm.noce_conversion_profitable_p
	instead of noce_conversion_profitable_p.
	* config/i386/i386.c: Include ifcvt.h.
	(ix86_option_override_internal): Don't override
	PARAM_MAX_RTL_IF_CONVERSION_INSNS default.
	(ix86_noce_conversion_profitable_p): New function.
	(TARGET_NOCE_CONVERSION_PROFITABLE_P): Redefine.
	* config/i386/x86-tune.def (X86_TUNE_ONE_IF_CONV_INSN): Adjust comment.
	* doc/tm.texi.in (TARGET_NOCE_CONVERSION_PROFITABLE_P): Add.
	* doc/tm.texi: Regenerated.

	* gcc.target/i386/pr79390.c: New test.
	* gcc.dg/ifcvt-4.c: Use -mtune-ctrl=^one_if_conv_insn for i?86/x86_64.


	Jakub

Comments

Segher Boessenkool April 1, 2017, 6:43 p.m. UTC | #1
Hi,

On Sat, Apr 01, 2017 at 02:20:27PM +0200, Jakub Jelinek wrote:
> As discussed in the PR, in the following testcase we don't if-convert
> with the generic (and many other) tuning, because we default to
> --param max-rtl-if-conversion-insns=1 in most of the tunings.
> The problem we have is with multiple cmov instructions, but in the
> testcase there is just one cmov and the other insn is turned into a SSE
> max insn, which is fine.
> 
> This patch stops artificially lowering that param, and for one_if_conv_insn
> tuning it instead rejects the if-conversion if the resulting sequence has
> multiple cmov instructions.  The hook is passed if_info too, so it can
> in the future do better heuristics based on predictability of the edges,
> how far the uses of the cmov result are (I assume cmov major problem is
> latency, right?) etc.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Does this change anything for targets that do not implement the new hook?
It isn't immediately obvious from the patch.


Segher
Jakub Jelinek April 1, 2017, 6:47 p.m. UTC | #2
On Sat, Apr 01, 2017 at 01:43:36PM -0500, Segher Boessenkool wrote:
> Hi,
> 
> On Sat, Apr 01, 2017 at 02:20:27PM +0200, Jakub Jelinek wrote:
> > As discussed in the PR, in the following testcase we don't if-convert
> > with the generic (and many other) tuning, because we default to
> > --param max-rtl-if-conversion-insns=1 in most of the tunings.
> > The problem we have is with multiple cmov instructions, but in the
> > testcase there is just one cmov and the other insn is turned into a SSE
> > max insn, which is fine.
> > 
> > This patch stops artificially lowering that param, and for one_if_conv_insn
> > tuning it instead rejects the if-conversion if the resulting sequence has
> > multiple cmov instructions.  The hook is passed if_info too, so it can
> > in the future do better heuristics based on predictability of the edges,
> > how far the uses of the cmov result are (I assume cmov major problem is
> > latency, right?) etc.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Does this change anything for targets that do not implement the new hook?

No.  The patch changes all calls to noce_conversion_profitable_p into the
new target hook with the same parameters, and provides the old definition
of the function as the default implementation of the target hook.
So, other targets shouldn't see any change (ok, indirect vs. direct call
or inline in the compiler internals), and targets that choose to override
it can use the default hook as fallthrough or whatever else they choose.

> It isn't immediately obvious from the patch.

	Jakub
Uros Bizjak April 2, 2017, 6:44 p.m. UTC | #3
On Sat, Apr 1, 2017 at 2:20 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> As discussed in the PR, in the following testcase we don't if-convert
> with the generic (and many other) tuning, because we default to
> --param max-rtl-if-conversion-insns=1 in most of the tunings.
> The problem we have is with multiple cmov instructions, but in the
> testcase there is just one cmov and the other insn is turned into a SSE
> max insn, which is fine.
>
> This patch stops artificially lowering that param, and for one_if_conv_insn
> tuning it instead rejects the if-conversion if the resulting sequence has
> multiple cmov instructions.  The hook is passed if_info too, so it can
> in the future do better heuristics based on predictability of the edges,
> how far the uses of the cmov result are (I assume cmov major problem is
> latency, right?) etc.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2017-04-01  Jakub Jelinek  <jakub@redhat.com>
>
>         PR tree-optimization/79390
>         * target.h (struct noce_if_info): Declare.
>         * targhooks.h (default_noce_conversion_profitable_p): Declare.
>         * target.def (noce_conversion_profitable_p): New target hook.
>         * ifcvt.h (struct noce_if_info): New type, moved from ...
>         * ifcvt.c (struct noce_if_info): ... here.
>         (noce_conversion_profitable_p): Renamed to ...
>         (default_noce_conversion_profitable_p): ... this.  No longer
>         static nor inline.
>         (noce_try_store_flag_constants, noce_try_addcc,
>         noce_try_store_flag_mask, noce_try_cmove, noce_try_cmove_arith,
>         noce_convert_multiple_sets): Use targetm.noce_conversion_profitable_p
>         instead of noce_conversion_profitable_p.
>         * config/i386/i386.c: Include ifcvt.h.
>         (ix86_option_override_internal): Don't override
>         PARAM_MAX_RTL_IF_CONVERSION_INSNS default.
>         (ix86_noce_conversion_profitable_p): New function.
>         (TARGET_NOCE_CONVERSION_PROFITABLE_P): Redefine.
>         * config/i386/x86-tune.def (X86_TUNE_ONE_IF_CONV_INSN): Adjust comment.
>         * doc/tm.texi.in (TARGET_NOCE_CONVERSION_PROFITABLE_P): Add.
>         * doc/tm.texi: Regenerated.
>
>         * gcc.target/i386/pr79390.c: New test.
>         * gcc.dg/ifcvt-4.c: Use -mtune-ctrl=^one_if_conv_insn for i?86/x86_64.

x86 part LGTM.

Hopefully, this infrastructure will allow us to fix (or it already
fixes) PR 56309 [1].

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309

Thanks,
Uros.
Jakub Jelinek April 3, 2017, 7:20 a.m. UTC | #4
On Sun, Apr 02, 2017 at 08:44:03PM +0200, Uros Bizjak wrote:
> x86 part LGTM.
> 
> Hopefully, this infrastructure will allow us to fix (or it already
> fixes) PR 56309 [1].

I think only allows to.  The target hook has access to the if_info
structure which contains the original basic blocks, their edges,
frequencies, and can inspect both the new sequence as well as the original
basic blocks etc.  I really don't know in detail what the problem with
cmov is (latency, or that it blocks some CPU units, something else?).

In any case, any change will need lots of benchmarking, because apparently
cmov is extremely important to get right (for not well predictable branches
use it, for other not really).  Another question is if say setcc has similar
problem or not.

> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309

	Jakub
Jeff Law April 4, 2017, 5:31 p.m. UTC | #5
On 04/01/2017 06:20 AM, Jakub Jelinek wrote:
> Hi!
>
> As discussed in the PR, in the following testcase we don't if-convert
> with the generic (and many other) tuning, because we default to
> --param max-rtl-if-conversion-insns=1 in most of the tunings.
> The problem we have is with multiple cmov instructions, but in the
> testcase there is just one cmov and the other insn is turned into a SSE
> max insn, which is fine.
>
> This patch stops artificially lowering that param, and for one_if_conv_insn
> tuning it instead rejects the if-conversion if the resulting sequence has
> multiple cmov instructions.  The hook is passed if_info too, so it can
> in the future do better heuristics based on predictability of the edges,
> how far the uses of the cmov result are (I assume cmov major problem is
> latency, right?) etc.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2017-04-01  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR tree-optimization/79390
> 	* target.h (struct noce_if_info): Declare.
> 	* targhooks.h (default_noce_conversion_profitable_p): Declare.
> 	* target.def (noce_conversion_profitable_p): New target hook.
> 	* ifcvt.h (struct noce_if_info): New type, moved from ...
> 	* ifcvt.c (struct noce_if_info): ... here.
> 	(noce_conversion_profitable_p): Renamed to ...
> 	(default_noce_conversion_profitable_p): ... this.  No longer
> 	static nor inline.
> 	(noce_try_store_flag_constants, noce_try_addcc,
> 	noce_try_store_flag_mask, noce_try_cmove, noce_try_cmove_arith,
> 	noce_convert_multiple_sets): Use targetm.noce_conversion_profitable_p
> 	instead of noce_conversion_profitable_p.
> 	* config/i386/i386.c: Include ifcvt.h.
> 	(ix86_option_override_internal): Don't override
> 	PARAM_MAX_RTL_IF_CONVERSION_INSNS default.
> 	(ix86_noce_conversion_profitable_p): New function.
> 	(TARGET_NOCE_CONVERSION_PROFITABLE_P): Redefine.
> 	* config/i386/x86-tune.def (X86_TUNE_ONE_IF_CONV_INSN): Adjust comment.
> 	* doc/tm.texi.in (TARGET_NOCE_CONVERSION_PROFITABLE_P): Add.
> 	* doc/tm.texi: Regenerated.
>
> 	* gcc.target/i386/pr79390.c: New test.
> 	* gcc.dg/ifcvt-4.c: Use -mtune-ctrl=^one_if_conv_insn for i?86/x86_64.
OK.
jeff
diff mbox

Patch

--- gcc/target.h.jj	2017-01-01 12:45:34.000000000 +0100
+++ gcc/target.h	2017-04-01 09:33:21.306019575 +0200
@@ -140,6 +140,9 @@  struct ddg;
 /* This is defined in cfgloop.h .  */
 struct loop;
 
+/* This is defined in ifcvt.h.  */
+struct noce_if_info;
+
 /* This is defined in tree-ssa-alias.h.  */
 struct ao_ref;
 
--- gcc/targhooks.h.jj	2017-01-01 12:45:38.000000000 +0100
+++ gcc/targhooks.h	2017-04-01 09:34:16.621342814 +0200
@@ -257,6 +257,8 @@  extern void default_setup_incoming_varar
 extern bool default_optab_supported_p (int, machine_mode, machine_mode,
 				       optimization_type);
 extern unsigned int default_max_noce_ifcvt_seq_cost (edge);
+extern bool default_noce_conversion_profitable_p (rtx_insn *,
+						  struct noce_if_info *);
 extern unsigned int default_min_arithmetic_precision (void);
 
 extern enum flt_eval_method
--- gcc/target.def.jj	2017-02-06 20:24:34.000000000 +0100
+++ gcc/target.def	2017-04-01 09:31:54.652079752 +0200
@@ -3660,6 +3660,16 @@  and uses a multiple of @code{BRANCH_COST
 unsigned int, (edge e),
 default_max_noce_ifcvt_seq_cost)
 
+/* Return true if the given instruction sequence is a good candidate
+   as a replacement for the if-convertible sequence.  */
+DEFHOOK
+(noce_conversion_profitable_p,
+ "This hook returns true if the instruction sequence @code{seq} is a good\n\
+candidate as a replacement for the if-convertible sequence described in\n\
+@code{if_info}.",
+bool, (rtx_insn *seq, struct noce_if_info *if_info),
+default_noce_conversion_profitable_p)
+
 /* Permit speculative instructions in delay slots during delayed-branch 
    scheduling.  */
 DEFHOOK
--- gcc/ifcvt.h.jj	2017-01-01 12:45:34.000000000 +0100
+++ gcc/ifcvt.h	2017-04-01 09:21:32.478105243 +0200
@@ -40,4 +40,74 @@  struct ce_if_block
   int pass;				/* Pass number.  */
 };
 
+/* Used by noce_process_if_block to communicate with its subroutines.
+
+   The subroutines know that A and B may be evaluated freely.  They
+   know that X is a register.  They should insert new instructions
+   before cond_earliest.  */
+
+struct noce_if_info
+{
+  /* The basic blocks that make up the IF-THEN-{ELSE-,}JOIN block.  */
+  basic_block test_bb, then_bb, else_bb, join_bb;
+
+  /* The jump that ends TEST_BB.  */
+  rtx_insn *jump;
+
+  /* The jump condition.  */
+  rtx cond;
+
+  /* Reversed jump condition.  */
+  rtx rev_cond;
+
+  /* New insns should be inserted before this one.  */
+  rtx_insn *cond_earliest;
+
+  /* Insns in the THEN and ELSE block.  There is always just this
+     one insns in those blocks.  The insns are single_set insns.
+     If there was no ELSE block, INSN_B is the last insn before
+     COND_EARLIEST, or NULL_RTX.  In the former case, the insn
+     operands are still valid, as if INSN_B was moved down below
+     the jump.  */
+  rtx_insn *insn_a, *insn_b;
+
+  /* The SET_SRC of INSN_A and INSN_B.  */
+  rtx a, b;
+
+  /* The SET_DEST of INSN_A.  */
+  rtx x;
+
+  /* The original set destination that the THEN and ELSE basic blocks finally
+     write their result to.  */
+  rtx orig_x;
+  /* True if this if block is not canonical.  In the canonical form of
+     if blocks, the THEN_BB is the block reached via the fallthru edge
+     from TEST_BB.  For the noce transformations, we allow the symmetric
+     form as well.  */
+  bool then_else_reversed;
+
+  /* True if the contents of then_bb and else_bb are a
+     simple single set instruction.  */
+  bool then_simple;
+  bool else_simple;
+
+  /* True if we're optimisizing the control block for speed, false if
+     we're optimizing for size.  */
+  bool speed_p;
+
+  /* An estimate of the original costs.  When optimizing for size, this is the
+     combined cost of COND, JUMP and the costs for THEN_BB and ELSE_BB.
+     When optimizing for speed, we use the costs of COND plus the minimum of
+     the costs for THEN_BB and ELSE_BB, as computed in the next field.  */
+  unsigned int original_cost;
+
+  /* Maximum permissible cost for the unconditional sequence we should
+     generate to replace this branch.  */
+  unsigned int max_seq_cost;
+
+  /* The name of the noce transform that succeeded in if-converting
+     this structure.  Used for debugging.  */
+  const char *transform_name;
+};
+
 #endif /* GCC_IFCVT_H */
--- gcc/ifcvt.c.jj	2017-02-23 23:01:42.000000000 +0100
+++ gcc/ifcvt.c	2017-04-01 09:50:48.831572345 +0200
@@ -760,76 +760,6 @@  cond_exec_process_if_block (ce_if_block
   return FALSE;
 }
 
-/* Used by noce_process_if_block to communicate with its subroutines.
-
-   The subroutines know that A and B may be evaluated freely.  They
-   know that X is a register.  They should insert new instructions
-   before cond_earliest.  */
-
-struct noce_if_info
-{
-  /* The basic blocks that make up the IF-THEN-{ELSE-,}JOIN block.  */
-  basic_block test_bb, then_bb, else_bb, join_bb;
-
-  /* The jump that ends TEST_BB.  */
-  rtx_insn *jump;
-
-  /* The jump condition.  */
-  rtx cond;
-
-  /* Reversed jump condition.  */
-  rtx rev_cond;
-
-  /* New insns should be inserted before this one.  */
-  rtx_insn *cond_earliest;
-
-  /* Insns in the THEN and ELSE block.  There is always just this
-     one insns in those blocks.  The insns are single_set insns.
-     If there was no ELSE block, INSN_B is the last insn before
-     COND_EARLIEST, or NULL_RTX.  In the former case, the insn
-     operands are still valid, as if INSN_B was moved down below
-     the jump.  */
-  rtx_insn *insn_a, *insn_b;
-
-  /* The SET_SRC of INSN_A and INSN_B.  */
-  rtx a, b;
-
-  /* The SET_DEST of INSN_A.  */
-  rtx x;
-
-  /* The original set destination that the THEN and ELSE basic blocks finally
-     write their result to.  */
-  rtx orig_x;
-  /* True if this if block is not canonical.  In the canonical form of
-     if blocks, the THEN_BB is the block reached via the fallthru edge
-     from TEST_BB.  For the noce transformations, we allow the symmetric
-     form as well.  */
-  bool then_else_reversed;
-
-  /* True if the contents of then_bb and else_bb are a
-     simple single set instruction.  */
-  bool then_simple;
-  bool else_simple;
-
-  /* True if we're optimisizing the control block for speed, false if
-     we're optimizing for size.  */
-  bool speed_p;
-
-  /* An estimate of the original costs.  When optimizing for size, this is the
-     combined cost of COND, JUMP and the costs for THEN_BB and ELSE_BB.
-     When optimizing for speed, we use the costs of COND plus the minimum of
-     the costs for THEN_BB and ELSE_BB, as computed in the next field.  */
-  unsigned int original_cost;
-
-  /* Maximum permissible cost for the unconditional sequence we should
-     generate to replace this branch.  */
-  unsigned int max_seq_cost;
-
-  /* The name of the noce transform that succeeded in if-converting
-     this structure.  Used for debugging.  */
-  const char *transform_name;
-};
-
 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
 static int noce_try_move (struct noce_if_info *);
 static int noce_try_ifelse_collapse (struct noce_if_info *);
@@ -857,11 +787,14 @@  noce_reversed_cond_code (struct noce_if_
   return reversed_comparison_code (if_info->cond, if_info->jump);
 }
 
-/* Return TRUE if SEQ is a good candidate as a replacement for the
-   if-convertible sequence described in IF_INFO.  */
-
-inline static bool
-noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
+/* Return true if SEQ is a good candidate as a replacement for the
+   if-convertible sequence described in IF_INFO.
+   This is the default implementation that targets can override
+   through a target hook.  */
+
+bool
+default_noce_conversion_profitable_p (rtx_insn *seq,
+				      struct noce_if_info *if_info)
 {
   bool speed_p = if_info->speed_p;
 
@@ -1544,7 +1477,7 @@  noce_try_store_flag_constants (struct no
 	noce_emit_move_insn (if_info->x, target);
 
       seq = end_ifcvt_sequence (if_info);
-      if (!seq || !noce_conversion_profitable_p (seq, if_info))
+      if (!seq || !targetm.noce_conversion_profitable_p (seq, if_info))
 	return FALSE;
 
       emit_insn_before_setloc (seq, if_info->jump,
@@ -1605,7 +1538,7 @@  noce_try_addcc (struct noce_if_info *if_
 		noce_emit_move_insn (if_info->x, target);
 
 	      seq = end_ifcvt_sequence (if_info);
-	      if (!seq || !noce_conversion_profitable_p (seq, if_info))
+	      if (!seq || !targetm.noce_conversion_profitable_p (seq, if_info))
 		return FALSE;
 
 	      emit_insn_before_setloc (seq, if_info->jump,
@@ -1647,7 +1580,7 @@  noce_try_addcc (struct noce_if_info *if_
 		noce_emit_move_insn (if_info->x, target);
 
 	      seq = end_ifcvt_sequence (if_info);
-	      if (!seq || !noce_conversion_profitable_p (seq, if_info))
+	      if (!seq || !targetm.noce_conversion_profitable_p (seq, if_info))
 		return FALSE;
 
 	      emit_insn_before_setloc (seq, if_info->jump,
@@ -1698,7 +1631,7 @@  noce_try_store_flag_mask (struct noce_if
 	    noce_emit_move_insn (if_info->x, target);
 
 	  seq = end_ifcvt_sequence (if_info);
-	  if (!seq || !noce_conversion_profitable_p (seq, if_info))
+	  if (!seq || !targetm.noce_conversion_profitable_p (seq, if_info))
 	    return FALSE;
 
 	  emit_insn_before_setloc (seq, if_info->jump,
@@ -1850,7 +1783,7 @@  noce_try_cmove (struct noce_if_info *if_
 	    noce_emit_move_insn (if_info->x, target);
 
 	  seq = end_ifcvt_sequence (if_info);
-	  if (!seq || !noce_conversion_profitable_p (seq, if_info))
+	  if (!seq || !targetm.noce_conversion_profitable_p (seq, if_info))
 	    return FALSE;
 
 	  emit_insn_before_setloc (seq, if_info->jump,
@@ -1903,7 +1836,7 @@  noce_try_cmove (struct noce_if_info *if_
 		noce_emit_move_insn (if_info->x, target);
 
 	      seq = end_ifcvt_sequence (if_info);
-	      if (!seq || !noce_conversion_profitable_p (seq, if_info))
+	      if (!seq || !targetm.noce_conversion_profitable_p (seq, if_info))
 		return FALSE;
 
 	      emit_insn_before_setloc (seq, if_info->jump,
@@ -2345,7 +2278,7 @@  noce_try_cmove_arith (struct noce_if_inf
     noce_emit_move_insn (x, target);
 
   ifcvt_seq = end_ifcvt_sequence (if_info);
-  if (!ifcvt_seq || !noce_conversion_profitable_p (ifcvt_seq, if_info))
+  if (!ifcvt_seq || !targetm.noce_conversion_profitable_p (ifcvt_seq, if_info))
     return FALSE;
 
   emit_insn_before_setloc (ifcvt_seq, if_info->jump,
@@ -3308,7 +3241,7 @@  noce_convert_multiple_sets (struct noce_
   /* Actually emit the sequence if it isn't too expensive.  */
   rtx_insn *seq = get_insns ();
 
-  if (!noce_conversion_profitable_p (seq, if_info))
+  if (!targetm.noce_conversion_profitable_p (seq, if_info))
     {
       end_sequence ();
       return FALSE;
--- gcc/config/i386/i386.c.jj	2017-03-29 07:11:22.000000000 +0200
+++ gcc/config/i386/i386.c	2017-04-01 10:44:45.428063140 +0200
@@ -84,6 +84,7 @@  along with GCC; see the file COPYING3.
 #include "selftest-rtl.h"
 #include "print-rtl.h"
 #include "intl.h"
+#include "ifcvt.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -6104,13 +6105,6 @@  ix86_option_override_internal (bool main
 			 opts->x_param_values,
 			 opts_set->x_param_values);
 
-  /* Restrict number of if-converted SET insns to 1.  */
-  if (TARGET_ONE_IF_CONV_INSN)
-    maybe_set_param_value (PARAM_MAX_RTL_IF_CONVERSION_INSNS,
-			   1,
-			   opts->x_param_values,
-			   opts_set->x_param_values);
-
   /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
   if (opts->x_flag_prefetch_loop_arrays < 0
       && HAVE_prefetch
@@ -50622,6 +50616,41 @@  ix86_max_noce_ifcvt_seq_cost (edge e)
     return BRANCH_COST (true, predictable_p) * COSTS_N_INSNS (2);
 }
 
+/* Return true if SEQ is a good candidate as a replacement for the
+   if-convertible sequence described in IF_INFO.  */
+
+static bool
+ix86_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
+{
+  if (TARGET_ONE_IF_CONV_INSN && if_info->speed_p)
+    {
+      int cmov_cnt = 0;
+      /* Punt if SEQ contains more than one CMOV or FCMOV instruction.
+	 Maybe we should allow even more conditional moves as long as they
+	 are used far enough not to stall the CPU, or also consider
+	 IF_INFO->TEST_BB succ edge probabilities.  */
+      for (rtx_insn *insn = seq; insn; insn = NEXT_INSN (insn))
+	{
+	  rtx set = single_set (insn);
+	  if (!set)
+	    continue;
+	  if (GET_CODE (SET_SRC (set)) != IF_THEN_ELSE)
+	    continue;
+	  rtx src = SET_SRC (set);
+	  enum machine_mode mode = GET_MODE (src);
+	  if (GET_MODE_CLASS (mode) != MODE_INT
+	      && GET_MODE_CLASS (mode) != MODE_FLOAT)
+	    continue;
+	  if ((!REG_P (XEXP (src, 1)) && !MEM_P (XEXP (src, 1)))
+	      || (!REG_P (XEXP (src, 2)) && !MEM_P (XEXP (src, 2))))
+	    continue;
+	  /* insn is CMOV or FCMOV.  */
+	  if (++cmov_cnt > 1)
+	    return false;
+	}
+    }
+  return default_noce_conversion_profitable_p (seq, if_info);
+}
 
 /* Implement targetm.vectorize.init_cost.  */
 
@@ -52174,6 +52203,10 @@  ix86_run_selftests (void)
 
 #undef TARGET_MAX_NOCE_IFCVT_SEQ_COST
 #define TARGET_MAX_NOCE_IFCVT_SEQ_COST ix86_max_noce_ifcvt_seq_cost
+
+#undef TARGET_NOCE_CONVERSION_PROFITABLE_P
+#define TARGET_NOCE_CONVERSION_PROFITABLE_P ix86_noce_conversion_profitable_p
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
--- gcc/config/i386/x86-tune.def.jj	2017-01-01 12:45:42.000000000 +0100
+++ gcc/config/i386/x86-tune.def	2017-04-01 09:47:54.671828727 +0200
@@ -547,7 +547,7 @@  DEF_TUNE (X86_TUNE_PROMOTE_QI_REGS, "pro
    the unroll factor so that the unrolled loop fits the loop buffer.  */
 DEF_TUNE (X86_TUNE_ADJUST_UNROLL, "adjust_unroll_factor", m_BDVER3 | m_BDVER4)
 
-/* X86_TUNE_ONE_IF_CONV_INSNS: Restrict a number of set insns to be
-   if-converted to one.  */
+/* X86_TUNE_ONE_IF_CONV_INSNS: Restrict a number of cmov insns in
+   if-converted sequence to one.  */
 DEF_TUNE (X86_TUNE_ONE_IF_CONV_INSN, "one_if_conv_insn",
 	  m_SILVERMONT | m_KNL | m_INTEL | m_CORE_ALL | m_GENERIC)
--- gcc/doc/tm.texi.in.jj	2017-03-22 19:31:40.000000000 +0100
+++ gcc/doc/tm.texi.in	2017-04-01 09:35:48.975212900 +0200
@@ -4796,6 +4796,8 @@  Define this macro if a non-short-circuit
 
 @hook TARGET_MAX_NOCE_IFCVT_SEQ_COST
 
+@hook TARGET_NOCE_CONVERSION_PROFITABLE_P
+
 @hook TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
 
 @node Scheduling
--- gcc/doc/tm.texi.jj	2017-03-22 19:31:40.000000000 +0100
+++ gcc/doc/tm.texi	2017-04-01 09:35:55.000000000 +0200
@@ -6644,6 +6644,12 @@  The default implementation of this hook
 and uses a multiple of @code{BRANCH_COST} otherwise.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_NOCE_CONVERSION_PROFITABLE_P (rtx_insn *@var{seq}, struct noce_if_info *@var{if_info})
+This hook returns true if the instruction sequence @code{seq} is a good
+candidate as a replacement for the if-convertible sequence described in
+@code{if_info}.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P (void)
 This predicate controls the use of the eager delay slot filler to disallow
 speculatively executed instructions being placed in delay slots.  Targets
--- gcc/testsuite/gcc.target/i386/pr79390.c.jj	2017-04-01 10:41:13.341777312 +0200
+++ gcc/testsuite/gcc.target/i386/pr79390.c	2017-04-01 10:47:05.865265897 +0200
@@ -0,0 +1,28 @@ 
+/* PR tree-optimization/79390 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -march=haswell -mtune=haswell -mfpmath=sse" } */
+/* Make sure we emit a conditional move in this loop.  */
+
+extern double A[32];
+
+int
+foo (void)
+{
+  double t = A[0];
+  int jp = 0;
+  int i;
+
+  for (i = 0; i < 32; i++)
+    {
+      double ab = A[i];
+      if (ab > t)
+	{
+	  jp = i;
+	  t = ab;
+	}
+    }
+ 
+  return jp;
+}
+
+/* { dg-final { scan-assembler "\[ \\t\]cmov\[a-z]+\[ \\t\]" } } */
--- gcc/testsuite/gcc.dg/ifcvt-4.c.jj	2017-02-23 23:00:50.000000000 +0100
+++ gcc/testsuite/gcc.dg/ifcvt-4.c	2017-04-01 13:45:36.688942812 +0200
@@ -1,6 +1,7 @@ 
 /* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-insns=3 --param max-rtl-if-conversion-unpredictable-cost=100" } */
 /* { dg-additional-options "-misel" { target { powerpc*-*-* } } } */
 /* { dg-additional-options "-march=z196" { target { s390x-*-* } } } */
+/* { dg-additional-options "-mtune-ctrl=^one_if_conv_insn" { target { i?86-*-* x86_64-*-* } } } */
 /* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" { "arm*-*-* hppa*64*-*-* s390-*-* visium-*-*" riscv*-*-* } }  */
 /* { dg-skip-if "" { "s390x-*-*" } { "-m31" } }  */