diff mbox

0004-Model-Core-2-i7-decoder-bottleneck

Message ID 4CC6E3A4.9040105@codesourcery.com
State New
Headers show

Commit Message

Maxim Kuvyrkov Oct. 26, 2010, 2:20 p.m. UTC
This patch makes the scheduler aware of decoder restrictions on Core 
2/i7.  It adds new hooks to multipass scheduling that allow the target 
to filter the search space from instructions that should not be tried in 
current [partial] solution of multipass scheduling.

The primary motivation of this patch is to model limited-size buffers, 
such as decoder fetch blocks, and, more generally, CPU pipeline features 
that are difficult to express in a DFA model.  For Core 2/i7 we need to 
model a 16-byte decoder buffer filled with variable-length instructions. 
  When the modelled buffer becomes nearly full the hooks remove 
instructions that would not fit the rest of the buffer from multipass 
scheduler consideration.

The patch does not affect behavior of targets that do not define the new 
hooks.

The patch touches both i386 backend and the haifa scheduler; the i386 
part is at the beginning of the patch and the target-independent changes 
are in the second part of the patch.

Tested by bootstrapping on i686-pc-linux-gnu, an earlier version was 
successfully regtested.  The patch shows 0.1-0.25% performance 
improvement on SPEC2000.  I'm now rerunning the SPEC2000 and SPEC2006 
benchmarks and will post the final results in a day or so.

Your approvals and comments are welcome.

OK to commit?

Thank you,

Comments

Vladimir Makarov Oct. 29, 2010, 7:32 p.m. UTC | #1
On 10/26/2010 10:20 AM, Maxim Kuvyrkov wrote:
> This patch makes the scheduler aware of decoder restrictions on Core 
> 2/i7.  It adds new hooks to multipass scheduling that allow the target 
> to filter the search space from instructions that should not be tried 
> in current [partial] solution of multipass scheduling.
>
> The primary motivation of this patch is to model limited-size buffers, 
> such as decoder fetch blocks, and, more generally, CPU pipeline 
> features that are difficult to express in a DFA model.  For Core 2/i7 
> we need to model a 16-byte decoder buffer filled with variable-length 
> instructions.  When the modelled buffer becomes nearly full the hooks 
> remove instructions that would not fit the rest of the buffer from 
> multipass scheduler consideration.
>
> The patch does not affect behavior of targets that do not define the 
> new hooks.
>
> The patch touches both i386 backend and the haifa scheduler; the i386 
> part is at the beginning of the patch and the target-independent 
> changes are in the second part of the patch.
>
> Tested by bootstrapping on i686-pc-linux-gnu, an earlier version was 
> successfully regtested.  The patch shows 0.1-0.25% performance 
> improvement on SPEC2000.  I'm now rerunning the SPEC2000 and SPEC2006 
> benchmarks and will post the final results in a day or so.
>
> Your approvals and comments are welcome.
>
I think the way you simulate the insns buffer would be even more 
interesting to AMD folks because AMD processors are usually more 
sensitive to insn scheduling than Intel ones.
> OK to commit? 
The scheduler part is ok for me.  You just need to fix the typo in Changelog

* doc/tm.texi: Regenrate.

and add a sentence for  TARGET_SCHED_FIRST_CYCLE_MULTIPASS_FINI in 
document files (although its meaning is easy to guess from the context 
and it is described in target.def).  That is what I found.

Thanks for the patch, Maxim.
Uros Bizjak Oct. 30, 2010, 8:02 p.m. UTC | #2
On Fri, Oct 29, 2010 at 9:32 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>  On 10/26/2010 10:20 AM, Maxim Kuvyrkov wrote:
>>
>> This patch makes the scheduler aware of decoder restrictions on Core 2/i7.
>>  It adds new hooks to multipass scheduling that allow the target to filter
>> the search space from instructions that should not be tried in current
>> [partial] solution of multipass scheduling.
>>
>> The primary motivation of this patch is to model limited-size buffers,
>> such as decoder fetch blocks, and, more generally, CPU pipeline features
>> that are difficult to express in a DFA model.  For Core 2/i7 we need to
>> model a 16-byte decoder buffer filled with variable-length instructions.
>>  When the modelled buffer becomes nearly full the hooks remove instructions
>> that would not fit the rest of the buffer from multipass scheduler
>> consideration.
>>
>> The patch does not affect behavior of targets that do not define the new
>> hooks.
>>
>> The patch touches both i386 backend and the haifa scheduler; the i386 part
>> is at the beginning of the patch and the target-independent changes are in
>> the second part of the patch.
>>
>> Tested by bootstrapping on i686-pc-linux-gnu, an earlier version was
>> successfully regtested.  The patch shows 0.1-0.25% performance improvement
>> on SPEC2000.  I'm now rerunning the SPEC2000 and SPEC2006 benchmarks and
>> will post the final results in a day or so.
>>
>> Your approvals and comments are welcome.
>>
> I think the way you simulate the insns buffer would be even more interesting
> to AMD folks because AMD processors are usually more sensitive to insn
> scheduling than Intel ones.
>>
>> OK to commit?
>
> The scheduler part is ok for me.  You just need to fix the typo in Changelog

The x86 part is also OK.

Thanks,
Uros.
diff mbox

Patch

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 9c10103..38c5ff1 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -263,3 +263,21 @@  extern int asm_preferred_eh_data_format (int, int);
 #ifdef HAVE_ATTR_cpu
 extern enum attr_cpu ix86_schedule;
 #endif
+
+#ifdef RTX_CODE
+/* Target data for multipass lookahead scheduling.
+   Currently used for Core 2/i7 tuning.  */
+struct ix86_first_cycle_multipass_data_
+{
+  /* The length (in bytes) of ifetch block in this solution.  */
+  int ifetch_block_len;
+  /* Number of instructions in ifetch block in this solution.  */
+  int ifetch_block_n_insns;
+  /* Bitmap to remember changes to ready_try for backtracking.  */
+  sbitmap ready_try_change;
+  /* Size of the bitmap.  */
+  int ready_try_change_size;
+};
+# define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DATA_T	\
+  struct ix86_first_cycle_multipass_data_
+#endif /* RTX_CODE */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 70423fa..ae47f4d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21538,12 +21538,265 @@  ia32_multipass_dfa_lookahead (void)
     case PROCESSOR_K6:
       return 1;
 
+    case PROCESSOR_CORE2:
+    case PROCESSOR_COREI7_32:
+    case PROCESSOR_COREI7_64:
+      /* Generally, we want haifa-sched:max_issue() to look ahead as far
+	 as many instructions can be executed on a cycle, i.e.,
+	 issue_rate.  I wonder why tuning for many CPUs does not do this.  */
+      return ix86_issue_rate ();
+
     default:
       return 0;
     }
 }
 
 
+
+/* Model decoder of Core 2/i7.
+   Below hooks for multipass scheduling (see haifa-sched.c:max_issue)
+   track the instruction fetch block boundaries and make sure that long
+   (9+ bytes) instructions are assigned to D0.  */
+
+/* Maximum length of an insn that can be handled by
+   a secondary decoder unit.  '8' for Core 2/i7.  */
+static int core2i7_secondary_decoder_max_insn_size;
+
+/* Ifetch block size, i.e., number of bytes decoder reads per cycle.
+   '16' for Core 2/i7.  */
+static int core2i7_ifetch_block_size;
+
+/* Maximum number of instructions decoder can handle per cycle.
+   '6' for Core 2/i7.  */
+static int core2i7_ifetch_block_max_insns;
+
+typedef struct ix86_first_cycle_multipass_data_ *
+  ix86_first_cycle_multipass_data_t;
+typedef const struct ix86_first_cycle_multipass_data_ *
+  const_ix86_first_cycle_multipass_data_t;
+
+/* A variable to store target state across calls to max_issue within
+   one cycle.  */
+static struct ix86_first_cycle_multipass_data_ _ix86_first_cycle_multipass_data,
+  *ix86_first_cycle_multipass_data = &_ix86_first_cycle_multipass_data;
+
+/* Initialize DATA.  */
+static void
+core2i7_first_cycle_multipass_init (void *_data)
+{
+  ix86_first_cycle_multipass_data_t data
+    = (ix86_first_cycle_multipass_data_t) _data;
+
+  data->ifetch_block_len = 0;
+  data->ifetch_block_n_insns = 0;
+  data->ready_try_change = NULL;
+  data->ready_try_change_size = 0;
+}
+
+/* Advancing the cycle; reset ifetch block counts.  */
+static void
+core2i7_dfa_post_advance_cycle (void)
+{
+  ix86_first_cycle_multipass_data_t data = ix86_first_cycle_multipass_data;
+
+  gcc_assert (data->ifetch_block_n_insns <= core2i7_ifetch_block_max_insns);
+
+  data->ifetch_block_len = 0;
+  data->ifetch_block_n_insns = 0;
+}
+
+static int min_insn_size (rtx);
+
+/* Filter out insns from ready_try that the core will not be able to issue
+   on current cycle due to decoder.  */
+static void
+core2i7_first_cycle_multipass_filter_ready_try
+(const_ix86_first_cycle_multipass_data_t data,
+ char *ready_try, int n_ready, bool first_cycle_insn_p)
+{
+  while (n_ready--)
+    {
+      rtx insn;
+      int insn_size;
+
+      if (ready_try[n_ready])
+	continue;
+
+      insn = get_ready_element (n_ready);
+      insn_size = min_insn_size (insn);
+
+      if (/* If this is a too long an insn for a secondary decoder ...  */
+	  (!first_cycle_insn_p
+	   && insn_size > core2i7_secondary_decoder_max_insn_size)
+	  /* ... or it would not fit into the ifetch block ...  */
+	  || data->ifetch_block_len + insn_size > core2i7_ifetch_block_size
+	  /* ... or the decoder is full already ...  */
+	  || data->ifetch_block_n_insns > core2i7_ifetch_block_max_insns)
+	/* ... mask the insn out.  */
+	{
+	  ready_try[n_ready] = 1;
+
+	  if (data->ready_try_change)
+	    SET_BIT (data->ready_try_change, n_ready);
+	}
+    }
+}
+
+/* Prepare for a new round of multipass lookahead scheduling.  */
+static void
+core2i7_first_cycle_multipass_begin (void *_data, char *ready_try, int n_ready,
+				     bool first_cycle_insn_p)
+{
+  ix86_first_cycle_multipass_data_t data
+    = (ix86_first_cycle_multipass_data_t) _data;
+  const_ix86_first_cycle_multipass_data_t prev_data
+    = ix86_first_cycle_multipass_data;
+
+  /* Restore the state from the end of the previous round.  */
+  data->ifetch_block_len = prev_data->ifetch_block_len;
+  data->ifetch_block_n_insns = prev_data->ifetch_block_n_insns;
+
+  /* Filter instructions that cannot be issued on current cycle due to
+     decoder restrictions.  */
+  core2i7_first_cycle_multipass_filter_ready_try (data, ready_try, n_ready,
+						  first_cycle_insn_p);
+}
+
+/* INSN is being issued in current solution.  Account for its impact on
+   the decoder model.  */
+static void
+core2i7_first_cycle_multipass_issue (void *_data, char *ready_try, int n_ready,
+				     rtx insn, const void *_prev_data)
+{
+  ix86_first_cycle_multipass_data_t data
+    = (ix86_first_cycle_multipass_data_t) _data;
+  const_ix86_first_cycle_multipass_data_t prev_data
+    = (const_ix86_first_cycle_multipass_data_t) _prev_data;
+
+  int insn_size = min_insn_size (insn);
+
+  data->ifetch_block_len = prev_data->ifetch_block_len + insn_size;
+  data->ifetch_block_n_insns = prev_data->ifetch_block_n_insns + 1;
+  gcc_assert (data->ifetch_block_len <= core2i7_ifetch_block_size
+	      && data->ifetch_block_n_insns <= core2i7_ifetch_block_max_insns);
+
+  /* Allocate or resize the bitmap for storing INSN's effect on ready_try.  */
+  if (!data->ready_try_change)
+    {
+      data->ready_try_change = sbitmap_alloc (n_ready);
+      data->ready_try_change_size = n_ready;
+    }
+  else if (data->ready_try_change_size < n_ready)
+    {
+      data->ready_try_change = sbitmap_resize (data->ready_try_change,
+					       n_ready, 0);
+      data->ready_try_change_size = n_ready;
+    }
+  sbitmap_zero (data->ready_try_change);
+
+  /* Filter out insns from ready_try that the core will not be able to issue
+     on current cycle due to decoder.  */
+  core2i7_first_cycle_multipass_filter_ready_try (data, ready_try, n_ready,
+						  false);
+}
+
+/* Revert the effect on ready_try.  */
+static void
+core2i7_first_cycle_multipass_backtrack (const void *_data,
+					 char *ready_try,
+					 int n_ready ATTRIBUTE_UNUSED)
+{
+  const_ix86_first_cycle_multipass_data_t data
+    = (const_ix86_first_cycle_multipass_data_t) _data;
+  unsigned int i = 0;
+  sbitmap_iterator sbi;
+
+  gcc_assert (sbitmap_last_set_bit (data->ready_try_change) < n_ready);
+  EXECUTE_IF_SET_IN_SBITMAP (data->ready_try_change, 0, i, sbi)
+    {
+      ready_try[i] = 0;
+    }
+}
+
+/* Save the result of multipass lookahead scheduling for the next round.  */
+static void
+core2i7_first_cycle_multipass_end (const void *_data)
+{
+  const_ix86_first_cycle_multipass_data_t data
+    = (const_ix86_first_cycle_multipass_data_t) _data;
+  ix86_first_cycle_multipass_data_t next_data
+    = ix86_first_cycle_multipass_data;
+
+  if (data != NULL)
+    {
+      next_data->ifetch_block_len = data->ifetch_block_len;
+      next_data->ifetch_block_n_insns = data->ifetch_block_n_insns;
+    }
+}
+
+/* Deallocate target data.  */
+static void
+core2i7_first_cycle_multipass_fini (void *_data)
+{
+  ix86_first_cycle_multipass_data_t data
+    = (ix86_first_cycle_multipass_data_t) _data;
+
+  if (data->ready_try_change)
+    {
+      sbitmap_free (data->ready_try_change);
+      data->ready_try_change = NULL;
+      data->ready_try_change_size = 0;
+    }
+}
+
+/* Prepare for scheduling pass.  */
+static void
+ix86_sched_init_global (FILE *dump ATTRIBUTE_UNUSED,
+			int verbose ATTRIBUTE_UNUSED,
+			int max_uid ATTRIBUTE_UNUSED)
+{
+  /* Install scheduling hooks for current CPU.  Some of these hooks are used
+     in time-critical parts of the scheduler, so we only set them up when
+     they are actually used.  */
+  switch (ix86_tune)
+    {
+    case PROCESSOR_CORE2:
+    case PROCESSOR_COREI7_32:
+    case PROCESSOR_COREI7_64:
+      targetm.sched.dfa_post_advance_cycle
+	= core2i7_dfa_post_advance_cycle;
+      targetm.sched.first_cycle_multipass_init
+	= core2i7_first_cycle_multipass_init;
+      targetm.sched.first_cycle_multipass_begin
+	= core2i7_first_cycle_multipass_begin;
+      targetm.sched.first_cycle_multipass_issue
+	= core2i7_first_cycle_multipass_issue;
+      targetm.sched.first_cycle_multipass_backtrack
+	= core2i7_first_cycle_multipass_backtrack;
+      targetm.sched.first_cycle_multipass_end
+	= core2i7_first_cycle_multipass_end;
+      targetm.sched.first_cycle_multipass_fini
+	= core2i7_first_cycle_multipass_fini;
+
+      /* Set decoder parameters.  */
+      core2i7_secondary_decoder_max_insn_size = 8;
+      core2i7_ifetch_block_size = 16;
+      core2i7_ifetch_block_max_insns = 6;
+      break;
+
+    default:
+      targetm.sched.dfa_post_advance_cycle = NULL;
+      targetm.sched.first_cycle_multipass_init = NULL;
+      targetm.sched.first_cycle_multipass_begin = NULL;
+      targetm.sched.first_cycle_multipass_issue = NULL;
+      targetm.sched.first_cycle_multipass_backtrack = NULL;
+      targetm.sched.first_cycle_multipass_end = NULL;
+      targetm.sched.first_cycle_multipass_fini = NULL;
+      break;
+    }
+}
+
+
 /* Compute the alignment given to a constant that is being placed in memory.
    EXP is the constant and ALIGN is the alignment that the object would
    ordinarily have.
@@ -33214,6 +33467,8 @@  ix86_autovectorize_vector_sizes (void)
 #undef TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA
 #define TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA i386_asm_output_addr_const_extra 
 
+#undef TARGET_SCHED_INIT_GLOBAL
+#define TARGET_SCHED_INIT_GLOBAL ix86_sched_init_global
 #undef TARGET_SCHED_ADJUST_COST
 #define TARGET_SCHED_ADJUST_COST ix86_adjust_cost
 #undef TARGET_SCHED_ISSUE_RATE
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index a4f33a7..83b3065 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6703,6 +6703,32 @@  be issued.
 The default is that any ready insns can be chosen to be issued.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_SCHED_FIRST_CYCLE_MULTIPASS_BEGIN (void *@var{data}, char *@var{ready_try}, int @var{n_ready}, bool @var{first_cycle_insn_p})
+This hook prepares the target backend for a new round of multipass
+scheduling.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SCHED_FIRST_CYCLE_MULTIPASS_ISSUE (void *@var{data}, char *@var{ready_try}, int @var{n_ready}, rtx @var{insn}, const void *@var{prev_data})
+This hook is called when multipass scheduling evaluates instruction INSN.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SCHED_FIRST_CYCLE_MULTIPASS_BACKTRACK (const void *@var{data}, char *@var{ready_try}, int @var{n_ready})
+This is called when multipass scheduling backtracks from evaluation of
+an instruction.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SCHED_FIRST_CYCLE_MULTIPASS_END (const void *@var{data})
+This hook notifies the target about the result of the concluded current
+round of multipass scheduling.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SCHED_FIRST_CYCLE_MULTIPASS_INIT (void *@var{data})
+This hook initilizes target-specific data used in multipass scheduling.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SCHED_FIRST_CYCLE_MULTIPASS_FINI (void *@var{data})
+@end deftypefn
+
 @deftypefn {Target Hook} int TARGET_SCHED_DFA_NEW_CYCLE (FILE *@var{dump}, int @var{verbose}, rtx @var{insn}, int @var{last_clock}, int @var{clock}, int *@var{sort_p})
 This hook is called by the insn scheduler before issuing @var{insn}
 on cycle @var{clock}.  If the hook returns nonzero,
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index a9592b1..4f6bbd6 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -6697,6 +6697,32 @@  be issued.
 The default is that any ready insns can be chosen to be issued.
 @end deftypefn
 
+@hook TARGET_SCHED_FIRST_CYCLE_MULTIPASS_BEGIN
+This hook prepares the target backend for a new round of multipass
+scheduling.
+@end deftypefn
+
+@hook TARGET_SCHED_FIRST_CYCLE_MULTIPASS_ISSUE
+This hook is called when multipass scheduling evaluates instruction INSN.
+@end deftypefn
+
+@hook TARGET_SCHED_FIRST_CYCLE_MULTIPASS_BACKTRACK
+This is called when multipass scheduling backtracks from evaluation of
+an instruction.
+@end deftypefn
+
+@hook TARGET_SCHED_FIRST_CYCLE_MULTIPASS_END
+This hook notifies the target about the result of the concluded current
+round of multipass scheduling.
+@end deftypefn
+
+@hook TARGET_SCHED_FIRST_CYCLE_MULTIPASS_INIT
+This hook initilizes target-specific data used in multipass scheduling.
+@end deftypefn
+
+@hook TARGET_SCHED_FIRST_CYCLE_MULTIPASS_FINI
+@end deftypefn
+
 @hook TARGET_SCHED_DFA_NEW_CYCLE
 This hook is called by the insn scheduler before issuing @var{insn}
 on cycle @var{clock}.  If the hook returns nonzero,
diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 5b5459f..9528fd8 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -544,8 +544,6 @@  static void debug_ready_list (struct ready_list *);
 static rtx ready_remove (struct ready_list *, int);
 static void ready_remove_insn (rtx);
 
-static int choose_ready (struct ready_list *, rtx *);
-
 static void fix_inter_tick (rtx, rtx);
 static int fix_tick_ready (rtx);
 static void change_queue_index (rtx, int);
@@ -2394,6 +2392,12 @@  insn_finishes_cycle_p (rtx insn)
   return false;
 }
 
+/* Define type for target data used in multipass scheduling.  */
+#ifndef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DATA_T
+# define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DATA_T int
+#endif
+typedef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DATA_T first_cycle_multipass_data_t;
+
 /* The following structure describe an entry of the stack of choices.  */
 struct choice_entry
 {
@@ -2405,6 +2409,8 @@  struct choice_entry
   int n;
   /* State after issuing the insn.  */
   state_t state;
+  /* Target-specific data.  */
+  first_cycle_multipass_data_t target_data;
 };
 
 /* The following array is used to implement a stack of choices used in
@@ -2456,7 +2462,7 @@  static int cached_issue_rate = 0;
    CLOBBERs, etc must be filtered elsewhere.  */
 int
 max_issue (struct ready_list *ready, int privileged_n, state_t state,
-	   int *index)
+	   bool first_cycle_insn_p, int *index)
 {
   int n, i, all, n_ready, best, delay, tries_num, max_points;
   int more_issue;
@@ -2476,6 +2482,11 @@  max_issue (struct ready_list *ready, int privileged_n, state_t state,
 	max_lookahead_tries *= dfa_lookahead;
     }
 
+  if (targetm.sched.first_cycle_multipass_begin)
+    targetm.sched.first_cycle_multipass_begin (&top->target_data,
+					       ready_try, n_ready,
+					       first_cycle_insn_p);
+
   /* Init max_points.  */
   max_points = 0;
   more_issue = issue_rate - cycle_issued_insns;
@@ -2559,6 +2570,11 @@  max_issue (struct ready_list *ready, int privileged_n, state_t state,
 
 	  /* Backtrack.  */
 	  ready_try [i] = 0;
+
+	  if (targetm.sched.first_cycle_multipass_backtrack)
+	    targetm.sched.first_cycle_multipass_backtrack (&top->target_data,
+							   ready_try, n_ready);
+
 	  top--;
 	  memcpy (state, top->state, dfa_state_size);
 	}
@@ -2590,8 +2606,15 @@  max_issue (struct ready_list *ready, int privileged_n, state_t state,
 	      top->index = i;
 	      top->n = n;
 	      memcpy (top->state, state, dfa_state_size);
-
 	      ready_try [i] = 1;
+
+	      if (targetm.sched.first_cycle_multipass_issue)
+		targetm.sched.first_cycle_multipass_issue (&top->target_data,
+							   ready_try, n_ready,
+							   insn,
+							   &((top - 1)
+							     ->target_data));
+
 	      i = -1;
 	    }
 	}
@@ -2600,6 +2623,11 @@  max_issue (struct ready_list *ready, int privileged_n, state_t state,
       i++;
     }
 
+  if (targetm.sched.first_cycle_multipass_end)
+    targetm.sched.first_cycle_multipass_end (best != 0
+					     ? &choice_stack[1].target_data
+					     : NULL);
+
   /* Restore the original state of the DFA.  */
   memcpy (state, choice_stack->state, dfa_state_size);
 
@@ -2614,7 +2642,8 @@  max_issue (struct ready_list *ready, int privileged_n, state_t state,
    0 if INSN_PTR is set to point to the desirable insn,
    1 if choose_ready () should be restarted without advancing the cycle.  */
 static int
-choose_ready (struct ready_list *ready, rtx *insn_ptr)
+choose_ready (struct ready_list *ready, bool first_cycle_insn_p,
+	      rtx *insn_ptr)
 {
   int lookahead;
 
@@ -2745,7 +2774,7 @@  choose_ready (struct ready_list *ready, rtx *insn_ptr)
 		     (insn)));
 	  }
 
-      if (max_issue (ready, 1, curr_state, &index) == 0)
+      if (max_issue (ready, 1, curr_state, first_cycle_insn_p, &index) == 0)
 	{
 	  *insn_ptr = ready_remove_first (ready);
 	  if (sched_verbose >= 4)
@@ -2773,7 +2802,8 @@  choose_ready (struct ready_list *ready, rtx *insn_ptr)
 void
 schedule_block (basic_block *target_bb)
 {
-  int i, first_cycle_insn_p;
+  int i;
+  bool first_cycle_insn_p;
   int can_issue_more;
   state_t temp_state = NULL;  /* It is used for multipass scheduling.  */
   int sort_p, advance, start_clock_var;
@@ -2979,7 +3009,7 @@  schedule_block (basic_block *target_bb)
       else
 	can_issue_more = issue_rate;
 
-      first_cycle_insn_p = 1;
+      first_cycle_insn_p = true;
       cycle_issued_insns = 0;
       for (;;)
 	{
@@ -3022,7 +3052,7 @@  schedule_block (basic_block *target_bb)
 	      int res;
 
 	      insn = NULL_RTX;
-	      res = choose_ready (&ready, &insn);
+	      res = choose_ready (&ready, first_cycle_insn_p, &insn);
 
 	      if (res < 0)
 		/* Finish cycle.  */
@@ -3175,7 +3205,7 @@  schedule_block (basic_block *target_bb)
 	  if (advance != 0)
 	    break;
 
-	  first_cycle_insn_p = 0;
+	  first_cycle_insn_p = false;
 
 	  /* Sort the ready list based on priority.  This must be
 	     redone here, as schedule_insn may have readied additional
@@ -3971,7 +4001,13 @@  sched_extend_ready_list (int new_sched_ready_n_insns)
 			     new_sched_ready_n_insns + 1);
 
   for (; i <= new_sched_ready_n_insns; i++)
-    choice_stack[i].state = xmalloc (dfa_state_size);
+    {
+      choice_stack[i].state = xmalloc (dfa_state_size);
+
+      if (targetm.sched.first_cycle_multipass_init)
+	targetm.sched.first_cycle_multipass_init (&(choice_stack[i]
+						    .target_data));
+    }
 
   sched_ready_n_insns = new_sched_ready_n_insns;
 }
@@ -3990,7 +4026,13 @@  sched_finish_ready_list (void)
   ready_try = NULL;
 
   for (i = 0; i <= sched_ready_n_insns; i++)
-    free (choice_stack [i].state);
+    {
+      if (targetm.sched.first_cycle_multipass_fini)
+	targetm.sched.first_cycle_multipass_fini (&(choice_stack[i]
+						    .target_data));
+
+      free (choice_stack [i].state);
+    }
   free (choice_stack);
   choice_stack = NULL;
 
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index fd2e15d..1797421 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -195,7 +195,7 @@  struct ready_list
 extern char *ready_try;
 extern struct ready_list ready;
 
-extern int max_issue (struct ready_list *, int, state_t, int *);
+extern int max_issue (struct ready_list *, int, state_t, bool, int *);
 
 extern void ebb_compute_jump_reg_dependencies (rtx, regset, regset, regset);
 
diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index 12af486..20d226a 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -4318,8 +4318,9 @@  choose_best_insn (fence_t fence, int privileged_n, int *index)
   if (dfa_lookahead > 0)
     {
       cycle_issued_insns = FENCE_ISSUED_INSNS (fence);
+      /* TODO: pass equivalent of first_cycle_insn_p to max_issue ().  */
       can_issue = max_issue (&ready, privileged_n,
-                             FENCE_STATE (fence), index);
+                             FENCE_STATE (fence), true, index);
       if (sched_verbose >= 2)
         sel_print ("max_issue: we can issue %d insns, already did %d insns\n",
                    can_issue, FENCE_ISSUED_INSNS (fence));
diff --git a/gcc/target.def b/gcc/target.def
index 82f3040..be718ec 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -650,6 +650,84 @@  DEFHOOK
  "",
  int, (rtx insn), NULL)
 
+/* This hook prepares the target for a new round of multipass
+   scheduling.
+   DATA is a pointer to target-specific data used for multipass scheduling.
+   READY_TRY and N_READY represent the current state of search in the
+   optimization space.  The target can filter out instructions that
+   should not be tried during current round by setting corresponding
+   elements in READY_TRY to non-zero.
+   FIRST_CYCLE_INSN_P is true if this is the first round of multipass
+   scheduling on current cycle.  */
+DEFHOOK
+(first_cycle_multipass_begin,
+ "",
+ void, (void *data, char *ready_try, int n_ready, bool first_cycle_insn_p),
+ NULL)
+
+/* This hook is called when multipass scheduling evaluates instruction INSN.
+   DATA is a pointer to target-specific data that can be used to record effects
+   of INSN on CPU that are not described in DFA.
+   READY_TRY and N_READY represent the current state of search in the
+   optimization space.  The target can filter out instructions that
+   should not be tried after issueing INSN by setting corresponding
+   elements in READY_TRY to non-zero.
+   INSN is the instruction being evaluated.
+   PREV_DATA is a pointer to target-specific data corresponding
+   to a state before issueing INSN.  */
+DEFHOOK
+(first_cycle_multipass_issue,
+ "",
+ void, (void *data, char *ready_try, int n_ready, rtx insn,
+	const void *prev_data), NULL)
+
+/* This hook is called when multipass scheduling backtracks from evaluation of
+   instruction corresponding to DATA.
+   DATA is a pointer to target-specific data that stores the effects
+   of instruction from which the algorithm backtracks on CPU that are not
+   described in DFA.
+   READY_TRY and N_READY represent the current state of search in the
+   optimization space.  The target can filter out instructions that
+   should not be tried after issueing INSN by setting corresponding
+   elements in READY_TRY to non-zero.  */
+DEFHOOK
+(first_cycle_multipass_backtrack,
+ "",
+ void, (const void *data, char *ready_try, int n_ready), NULL)
+
+/* This hook notifies the target about the result of the concluded current
+   round of multipass scheduling.
+   DATA is a pointer.
+   If DATA is non-NULL it points to target-specific data used for multipass
+   scheduling which corresponds to instruction at the start of the chain of
+   the winning solution.  DATA is NULL when multipass scheduling cannot find
+   a good enough solution on current cycle and decides to retry later,
+   usually after advancing the cycle count.  */
+DEFHOOK
+(first_cycle_multipass_end,
+ "",
+ void, (const void *data), NULL)
+
+/* This hook is called to initialize target-specific data for multipass
+   scheduling after it has been allocated.
+   DATA is a pointer to target-specific data that stores the effects
+   of instruction from which the algorithm backtracks on CPU that are not
+   described in DFA.  */
+DEFHOOK
+(first_cycle_multipass_init,
+ "",
+ void, (void *data), NULL)
+
+/* This hook is called to finalize target-specific data for multipass
+   scheduling before it is deallocated.
+   DATA is a pointer to target-specific data that stores the effects
+   of instruction from which the algorithm backtracks on CPU that are not
+   described in DFA.  */
+DEFHOOK
+(first_cycle_multipass_fini,
+ "",
+ void, (void *data), NULL)
+
 /* The following member value is pointer to a function called by
    the insn scheduler before issuing insn passed as the third
    parameter on given cycle.  If the hook returns nonzero, the