Patchwork C6X port 4/11: Backtracking scheduler

login
register
mail settings
Submitter Bernd Schmidt
Date May 10, 2011, 3:33 p.m.
Message ID <4DC95ADB.1010408@codesourcery.com>
Download mbox | patch
Permalink /patch/95249/
State New
Headers show

Comments

Bernd Schmidt - May 10, 2011, 3:33 p.m.
On C6X, every jump instruction has 5 delay slots which can be filled
with normally scheduled instructions. With an issue width of 8
insns/cycle, this means that up to 40 insns can be issued after the jump
insn before the jump's side-effect takes place. I didn't particularaly
feel like using reorg.c to deal with this, hence these scheduler patches.

Conceptually, one can think of the dependency between the two instructions

	mv .l1	a1, a4    ; set the first function argument
	call	somewhere

as a true dependence with cost -5, meaning that the first instruction
has to be scheduled no later than 5 cycles after the call. We can extend
the scheduler to issue the call when only such negative-cost
dependencies are left, but generally there is no way of knowing in
advance whether it will be possible to schedule all the remaining
dependencies in the 5 cycles following it. This means we have to
backtrack to a previous state when we find that we got into a state
where it is not possible to generate a valid schedule.

The papers I looked at all seemed to use this model of negative-cost
dependencies, and that's what I implemented first. It kind of worked,
except that the RTL was meaningless after the scheduling pass - you
could have unconditional jumps followed by instructions which would
appear dead to any other consumer of RTL. Unfortunately, the scheduling
pass isn't quite the last thing that runs - var_tracking happens
afterwards, and it gets terminally confused by such bogus RTL, which
means the whole scheme only works as long as one doesn't try to produce
debug output.

Because of this problem, I've chosen a different model for this patch.
Before calling the final sched_ebb pass, the port must split jump insns
(or anything that should have delay slots) into two separate insns, a
real insn and a shadow. As an example,

 (jump_insn (set (pc) (label_ref 24)))

becomes

 (insn (unspec [(label_ref 24)] UNSPEC_REAL_JUMP)
 (jump_insn (set (pc) (unspec [(pc)] UNSPEC_JUMP_SHADOW)

where the first insn emits the branch into the assembly file, while the
second one produces no output (or, in the case of C6X, a helpful comment
that documents that the side effect takes place). For each such pair, a
new function in haifa-sched.c should be called to record the two insns
and the number of delay slots. The scheduler then ensures that these are
scheduled with the prescribed distance in cycles, and that the shadow
insn(s) are the last to occur in their cycle (see the shadows_only_p
mechanism).

This model allows us to leave the dependencies looking rather natural -
e.g. everything that shouldn't cross a jump will depend on the branch
shadow, rather than having a dependence with negative cost on the real
branch.

On C6X, other instructions than jumps (e.g. loads and multiplies) also
have delay slots, but we can't make use of them in normal code since
only for jumps does the CPU ensure that code is interrupt-safe. However,
when using the SPLOOP instruction for hardware-pipelined loops, we can
make use of this as well. This will be part of a future patch which
extends the mechanism to do modulo scheduling using the C6X SPLOOP
instruction.


Bernd
* haifa-sched.c: Include "hashtab.h"
	(sched_no_dce): New global variable.
	(INSN_EXACT_TICK, INSN_TICK_ESTIMATE, FEEDS_BACKTRACK_INSN,
	SHADOW_P): New macros.
	(last_clock_var, cycle_issued_insns): Move declarations.
	(must_backtrack): New static variable.
	(struct delay_pair): New structure.
	(delay_htab, delay_htab_i2): New static variables.
	(delay_hash_i1, delay_hash_i2, delay_i1_eq, delay_i2_eq,
	record_delay_slot_pair, pair_delay, add_delay_dependencies): New
	functions.
	(dep_cost_1): If delay pairs exist, try to look up the insns and
	use the correct pair delay if we find them.
	(rank-for_schedule): Tweak priority for insns that must be scheduled
	soon to avoid backtracking.
	(queue_insn): Detect conditions which force backtracking.
	(ready_add): Likewise.
	(struct sched_block_state): Add member shadows_only_p.
	(struct haifa_save_data): New structure.
	(backtrack_queue): New static variable.
	(mark_backtrack_feeds, copy_insn_list, save_backtrack_point,
	unschedule_insns_until, restore_last_backtrack_point,
	free_topmost_backtrack_point, free_backtrack_queue,
	estimate_insn_tick, estimate_shadow_tick): New functions.
	(prune_ready_list): New arg shadows_only_p.  All callers changed.
	If true, remove everything that isn't SHADOW_P.  Look up delay
	pairs and estimate ticks to avoid scheduling the first insn too
	early.
	(verify_shadows): New function.
	(schedule_block): Add machinery to enable backtracking.
	(sched_init): Take sched_no_dce into account when setting
	DF_LR_RUN_DCE.
	(free_delay_pairs): New function.
	(init_h_i_d): Initialize INSN_EXACT_TICK.
	* Makefile.in (haifa-sched.o): Add $(HASHTAB_H).
	* sched-deps.c (sd_unresolve_dep): New function.
	* sched-int. (struct haifa_sched_info): New fields save_state
	and restore_state.
	(struct _haifa_insn_data): New fields exact_tick, tick_estiamte,
	feeds_backtrack_insn and shadow_p.
	(DO_BACKTRACKING): New value in enum SCHED_FLAGS.
	(sched_no_dce): Declare variable.
	(record_delay_slot_pair, free_delay_pairs, add_delay_dependencies,
	sd_unresolve_dep): Declare functions.
	* modulo-sched.c (sms_sched_info): Clear the two new fields.
	* sched-rgn.c (rgn_const_sched_info): Likewise.
	* sel-sched-ir.c (sched_sel_haifa_sched_info): Likewise.
	* sched-ebb.c (save_ebb_state, restore_ebb_state): New functions.
	(ebb_sched_info): Add them for the two new fields.
	(add_deps_for_risky_insns): Call add_delay_dependencies.
Hans-Peter Nilsson - May 25, 2011, 8:47 a.m.
On Tue, 10 May 2011, Bernd Schmidt wrote:
> On C6X, every jump instruction has 5 delay slots which can be filled
> with normally scheduled instructions. With an issue width of 8
> insns/cycle, this means that up to 40 insns can be issued after the jump
> insn before the jump's side-effect takes place. I didn't particularaly
> feel like using reorg.c to deal with this,

No kidding... multi-delay-slot bugs just waiting for you...

> hence these scheduler patches.

THANK YOU for these first steps!

brgds, H-P

Patch

Index: gcc/haifa-sched.c
===================================================================
--- gcc.orig/haifa-sched.c
+++ gcc/haifa-sched.c
@@ -148,6 +148,7 @@  along with GCC; see the file COPYING3.
 #include "cfgloop.h"
 #include "ira.h"
 #include "emit-rtl.h"  /* FIXME: Can go away once crtl is moved to rtl.h.  */
+#include "hashtab.h"
 
 #ifdef INSN_SCHEDULING
 
@@ -157,6 +158,10 @@  along with GCC; see the file COPYING3.
 
 int issue_rate;
 
+/* This can be set to true by a backend if the scheduler should not
+   enable a DCE pass.  */
+bool sched_no_dce;
+
 /* sched-verbose controls the amount of debugging output the
    scheduler prints.  It is controlled by -fsched-verbose=N:
    N>0 and no -DSR : the output is directed to stderr.
@@ -177,7 +182,11 @@  FILE *sched_dump = 0;
 struct common_sched_info_def *common_sched_info;
 
 #define INSN_TICK(INSN)	(HID (INSN)->tick)
+#define INSN_EXACT_TICK(INSN) (HID (INSN)->exact_tick)
+#define INSN_TICK_ESTIMATE(INSN) (HID (INSN)->tick_estimate)
 #define INTER_TICK(INSN) (HID (INSN)->inter_tick)
+#define FEEDS_BACKTRACK_INSN(INSN) (HID (INSN)->feeds_backtrack_insn)
+#define SHADOW_P(INSN) (HID (INSN)->shadow_p)
 
 /* If INSN_TICK of an instruction is equal to INVALID_TICK,
    then it should be recalculated from scratch.  */
@@ -302,6 +311,18 @@  static struct ready_list *readyp = &read
 /* Scheduling clock.  */
 static int clock_var;
 
+/* Clock at which the previous instruction was issued.  */
+static int last_clock_var;
+
+/* Set to true if, when queuing a shadow insn, we discover that it would be
+   scheduled too late.  */
+static bool must_backtrack;
+
+/* The following variable value is number of essential insns issued on
+   the current cycle.  An insn is essential one if it changes the
+   processors state.  */
+int cycle_issued_insns;
+
 /* This records the actual schedule.  It is built up during the main phase
    of schedule_block, and afterwards used to reorder the insns in the RTL.  */
 static VEC(rtx, heap) *scheduled_insns;
@@ -488,6 +509,147 @@  haifa_classify_insn (const_rtx insn)
   return haifa_classify_rtx (PATTERN (insn));
 }
 
+/* A structure to record a pair of insns where the first one is a real
+   insn that has delay slots, and the second is its delayed shadow.
+   I1 is scheduled normally and will emit an assembly instruction,
+   while I2 describes the side effect that takes place at the
+   transition between cycles CYCLES and (CYCLES + 1) after I1.  */
+struct delay_pair
+{
+  struct delay_pair *next_same_i1;
+  rtx i1, i2;
+  int cycles;
+};
+
+/* Two hash tables to record delay_pairs, one indexed by I1 and the other
+   indexed by I2.  */
+static htab_t delay_htab;
+static htab_t delay_htab_i2;
+
+/* Returns a hash value for X (which really is a delay_pair), based on
+   hashing just I1.  */
+static hashval_t
+delay_hash_i1 (const void *x)
+{
+  return htab_hash_pointer (((const struct delay_pair *) x)->i1);
+}
+
+/* Returns a hash value for X (which really is a delay_pair), based on
+   hashing just I2.  */
+static hashval_t
+delay_hash_i2 (const void *x)
+{
+  return htab_hash_pointer (((const struct delay_pair *) x)->i2);
+}
+
+/* Return nonzero if I1 of pair X is the same as that of pair Y.  */
+static int
+delay_i1_eq (const void *x, const void *y)
+{
+  return ((const struct delay_pair *) x)->i1 == y;
+}
+
+/* Return nonzero if I2 of pair X is the same as that of pair Y.  */
+static int
+delay_i2_eq (const void *x, const void *y)
+{
+  return ((const struct delay_pair *) x)->i2 == y;
+}
+
+/* This function can be called by a port just before it starts the
+   final scheduling pass.  It records the fact that an instruction
+   with delay slots has been split into two insns, I1 and I2.  The
+   first one will be scheduled normally and initiates the operation.
+   The second one is a shadow which must follow a specific number of
+   CYCLES after I1; its only purpose is to show the side effect that
+   occurs at that cycle in the RTL.  If a JUMP_INSN or a CALL_INSN has
+   been split, I1 should be a normal INSN, while I2 retains the
+   original insn type.  */
+
+void
+record_delay_slot_pair (rtx i1, rtx i2, int cycles)
+{
+  struct delay_pair *p = XNEW (struct delay_pair);
+  struct delay_pair **slot;
+
+  p->i1 = i1;
+  p->i2 = i2;
+  p->cycles = cycles;
+
+  if (!delay_htab)
+    {
+      delay_htab = htab_create (10, delay_hash_i1, delay_i1_eq, NULL);
+      delay_htab_i2 = htab_create (10, delay_hash_i2, delay_i2_eq, free);
+    }
+  slot = ((struct delay_pair **)
+	  htab_find_slot_with_hash (delay_htab, i1, htab_hash_pointer (i1),
+				    INSERT));
+  p->next_same_i1 = *slot;
+  *slot = p;
+  slot = ((struct delay_pair **)
+	  htab_find_slot_with_hash (delay_htab_i2, i2, htab_hash_pointer (i2),
+				    INSERT));
+  *slot = p;
+}
+
+/* For a pair P of insns, return the fixed distance in cycles from the first
+   insn after which the second must be scheduled.  */
+static int
+pair_delay (struct delay_pair *p)
+{
+  return p->cycles;
+}
+
+/* Given an insn INSN, add a dependence on its delayed shadow if it
+   has one.  Also try to find situations where shadows depend on each other
+   and add dependencies to the real insns to limit the amount of backtracking
+   needed.  */
+void
+add_delay_dependencies (rtx insn)
+{
+  struct delay_pair *pair;
+  sd_iterator_def sd_it;
+  dep_t dep;
+
+  if (!delay_htab)
+    return;
+
+  pair
+    = (struct delay_pair *)htab_find_with_hash (delay_htab_i2, insn,
+						htab_hash_pointer (insn));
+  if (!pair)
+    return;
+  add_dependence (insn, pair->i1, REG_DEP_ANTI);
+
+  FOR_EACH_DEP (pair->i2, SD_LIST_BACK, sd_it, dep)
+    {
+      rtx pro = DEP_PRO (dep);
+      struct delay_pair *other_pair
+	= (struct delay_pair *)htab_find_with_hash (delay_htab_i2, pro,
+						    htab_hash_pointer (pro));
+      if (!other_pair)
+	continue;
+      if (pair_delay (other_pair) >= pair_delay (pair))
+	{
+	  if (sched_verbose >= 4)
+	    {
+	      fprintf (sched_dump, ";;\tadding dependence %d <- %d\n",
+		       INSN_UID (other_pair->i1),
+		       INSN_UID (pair->i1));
+	      fprintf (sched_dump, ";;\tpair1 %d <- %d, cost %d\n",
+		       INSN_UID (pair->i1),
+		       INSN_UID (pair->i2),
+		       pair_delay (pair));
+	      fprintf (sched_dump, ";;\tpair2 %d <- %d, cost %d\n",
+		       INSN_UID (other_pair->i1),
+		       INSN_UID (other_pair->i2),
+		       pair_delay (other_pair));
+	    }
+	  add_dependence (pair->i1, other_pair->i1, REG_DEP_ANTI);
+	}
+    }
+}
+
 /* Forward declarations.  */
 
 static int priority (rtx);
@@ -858,6 +1020,22 @@  dep_cost_1 (dep_t link, dw_t dw)
   if (DEP_COST (link) != UNKNOWN_DEP_COST)
     return DEP_COST (link);
 
+  if (delay_htab)
+    {
+      struct delay_pair *delay_entry;
+      delay_entry
+	= (struct delay_pair *)htab_find_with_hash (delay_htab_i2, used,
+						    htab_hash_pointer (used));
+      if (delay_entry)
+	{
+	  if (delay_entry->i1 == insn)
+	    {
+	      DEP_COST (link) = pair_delay (delay_entry);
+	      return DEP_COST (link);
+	    }
+	}
+    }
+
   /* A USE insn should never require the value used to be computed.
      This allows the computation of a function's result and parameter
      values to overlap the return and call.  We don't care about the
@@ -1214,6 +1392,17 @@  rank_for_schedule (const void *x, const
       else
 	return INSN_TICK (tmp) - INSN_TICK (tmp2);
     }
+
+  /* If we are doing backtracking in this schedule, prefer insns that
+     have forward dependencies with negative cost against an insn that
+     was already scheduled.  */
+  if (current_sched_info->flags & DO_BACKTRACKING)
+    {
+      priority_val = FEEDS_BACKTRACK_INSN (tmp2) - FEEDS_BACKTRACK_INSN (tmp);
+      if (priority_val)
+	return priority_val;
+    }
+
   /* Prefer insn with higher priority.  */
   priority_val = INSN_PRIORITY (tmp2) - INSN_PRIORITY (tmp);
 
@@ -1326,6 +1515,7 @@  queue_insn (rtx insn, int n_cycles, cons
 {
   int next_q = NEXT_Q_AFTER (q_ptr, n_cycles);
   rtx link = alloc_INSN_LIST (insn, insn_queue[next_q]);
+  int new_tick;
 
   gcc_assert (n_cycles <= max_insn_queue_index);
   gcc_assert (!DEBUG_INSN_P (insn));
@@ -1342,6 +1532,21 @@  queue_insn (rtx insn, int n_cycles, cons
     }
 
   QUEUE_INDEX (insn) = next_q;
+
+  if (current_sched_info->flags & DO_BACKTRACKING)
+    {
+      new_tick = clock_var + n_cycles;
+      if (INSN_TICK (insn) == INVALID_TICK || INSN_TICK (insn) < new_tick)
+	INSN_TICK (insn) = new_tick;
+
+      if (INSN_EXACT_TICK (insn) != INVALID_TICK
+	  && INSN_EXACT_TICK (insn) < clock_var + n_cycles)
+	{
+	  must_backtrack = true;
+	  if (sched_verbose >= 2)
+	    fprintf (sched_dump, ";;\t\tcausing a backtrack.\n");
+	}
+    }
 }
 
 /* Remove INSN from queue.  */
@@ -1401,6 +1606,12 @@  ready_add (struct ready_list *ready, rtx
 
   gcc_assert (QUEUE_INDEX (insn) != QUEUE_READY);
   QUEUE_INDEX (insn) = QUEUE_READY;
+
+  if (INSN_EXACT_TICK (insn) != INVALID_TICK
+      && INSN_EXACT_TICK (insn) < clock_var)
+    {
+      must_backtrack = true;
+    }
 }
 
 /* Remove the element with the highest priority from the ready list and
@@ -1547,9 +1758,6 @@  advance_one_cycle (void)
     fprintf (sched_dump, ";;\tAdvanced a state.\n");
 }
 
-/* Clock at which the previous instruction was issued.  */
-static int last_clock_var;
-
 /* Update register pressure after scheduling INSN.  */
 static void
 update_register_pressure (rtx insn)
@@ -1645,6 +1853,9 @@  struct sched_block_state
 {
   /* True if no real insns have been scheduled in the current cycle.  */
   bool first_cycle_insn_p;
+  /* True if a shadow insn has been scheduled in the current cycle, which
+     means that no more normal insns can be issued.  */
+  bool shadows_only_p;
   /* Initialized with the machine's issue rate every cycle, and updated
      by calls to the variable_issue hook.  */
   int can_issue_more;
@@ -1901,6 +2112,372 @@  remove_notes (rtx head, rtx tail)
     }
 }
 
+/* A structure to record enough data to allow us to backtrack the scheduler to
+   a previous state.  */
+struct haifa_saved_data
+{
+  /* Next entry on the list.  */
+  struct haifa_saved_data *next;
+
+  /* Backtracking is associated with scheduling insns that have delay slots.
+     DELAY_PAIR points to the structure that contains the insns involved, and
+     the number of cycles between them.  */
+  struct delay_pair *delay_pair;
+
+  /* Data used by the frontend (e.g. sched-ebb or sched-rgn).  */
+  void *fe_saved_data;
+  /* Data used by the backend.  */
+  void *be_saved_data;
+
+  /* Copies of global state.  */
+  int clock_var, last_clock_var;
+  struct ready_list ready;
+  state_t curr_state;
+
+  rtx last_scheduled_insn;
+  rtx last_nondebug_scheduled_insn;
+  int cycle_issued_insns;
+
+  /* Copies of state used in the inner loop of schedule_block.  */
+  struct sched_block_state sched_block;
+
+  /* We don't need to save q_ptr, as its value is arbitrary and we can set it
+     to 0 when restoring.  */
+  int q_size;
+  rtx *insn_queue;
+};
+
+/* A record, in reverse order, of all scheduled insns which have delay slots
+   and may require backtracking.  */
+static struct haifa_saved_data *backtrack_queue;
+
+/* For every dependency of INSN, set the FEEDS_BACKTRACK_INSN bit according
+   to SET_P.  */
+static void
+mark_backtrack_feeds (rtx insn, int set_p)
+{
+  sd_iterator_def sd_it;
+  dep_t dep;
+  FOR_EACH_DEP (insn, SD_LIST_HARD_BACK, sd_it, dep)
+    {
+      FEEDS_BACKTRACK_INSN (DEP_PRO (dep)) = set_p;
+    }
+}
+
+/* Make a copy of the INSN_LIST list LINK and return it.  */
+static rtx
+copy_insn_list (rtx link)
+{
+  rtx new_queue;
+  rtx *pqueue = &new_queue;
+
+  for (; link; link = XEXP (link, 1))
+    {
+      rtx x = XEXP (link, 0);
+      rtx newlink = alloc_INSN_LIST (x, NULL);
+      *pqueue = newlink;
+      pqueue = &XEXP (newlink, 1);
+    }
+  *pqueue = NULL_RTX;
+  return new_queue;
+}
+
+/* Save the current scheduler state so that we can backtrack to it
+   later if necessary.  PAIR gives the insns that make it necessary to
+   save this point.  SCHED_BLOCK is the local state of schedule_block
+   that need to be saved.  */
+static void
+save_backtrack_point (struct delay_pair *pair,
+		      struct sched_block_state sched_block)
+{
+  int i;
+  struct haifa_saved_data *save = XNEW (struct haifa_saved_data);
+
+  save->curr_state = xmalloc (dfa_state_size);
+  memcpy (save->curr_state, curr_state, dfa_state_size);
+
+  save->ready.first = ready.first;
+  save->ready.n_ready = ready.n_ready;
+  save->ready.n_debug = ready.n_debug;
+  save->ready.veclen = ready.veclen;
+  save->ready.vec = XNEWVEC (rtx, ready.veclen);
+  memcpy (save->ready.vec, ready.vec, ready.veclen * sizeof (rtx));
+
+  save->insn_queue = XNEWVEC (rtx, max_insn_queue_index + 1);
+  save->q_size = q_size;
+  for (i = 0; i <= max_insn_queue_index; i++)
+    {
+      int q = NEXT_Q_AFTER (q_ptr, i);
+      save->insn_queue[i] = copy_insn_list (insn_queue[q]);
+    }
+
+  save->clock_var = clock_var;
+  save->last_clock_var = last_clock_var;
+  save->cycle_issued_insns = cycle_issued_insns;
+  save->last_scheduled_insn = last_scheduled_insn;
+  save->last_nondebug_scheduled_insn = last_nondebug_scheduled_insn;
+
+  save->sched_block = sched_block;
+
+  if (current_sched_info->save_state)
+    save->fe_saved_data = (*current_sched_info->save_state) ();
+
+  if (targetm.sched.alloc_sched_context)
+    {
+      save->be_saved_data = targetm.sched.alloc_sched_context ();
+      targetm.sched.init_sched_context (save->be_saved_data, false);
+    }
+  else
+    save->be_saved_data = NULL;
+
+  save->delay_pair = pair;
+
+  save->next = backtrack_queue;
+  backtrack_queue = save;
+
+  while (pair)
+    {
+      mark_backtrack_feeds (pair->i2, 1);
+      INSN_TICK (pair->i2) = INVALID_TICK;
+      INSN_EXACT_TICK (pair->i2) = clock_var + pair_delay (pair);
+      SHADOW_P (pair->i2) = true;
+      pair = pair->next_same_i1;
+    }
+}
+
+/* Pop entries from the SCHEDULED_INSNS vector up to and including INSN.
+   Restore their dependencies to an unresolved state, and mark them as
+   queued nowhere.  */
+
+static void
+unschedule_insns_until (rtx insn)
+{
+  for (;;)
+    {
+      rtx last;
+      sd_iterator_def sd_it;
+      dep_t dep;
+
+      last = VEC_pop (rtx, scheduled_insns);
+
+      /* This will be changed by restore_backtrack_point if the insn is in
+	 any queue.  */
+      QUEUE_INDEX (last) = QUEUE_NOWHERE;
+      if (last != insn)
+	INSN_TICK (last) = INVALID_TICK;
+
+      for (sd_it = sd_iterator_start (last, SD_LIST_RES_FORW);
+	   sd_iterator_cond (&sd_it, &dep);)
+	{
+	  rtx con = DEP_CON (dep);
+	  TODO_SPEC (con) |= HARD_DEP;
+	  INSN_TICK (con) = INVALID_TICK;
+	  sd_unresolve_dep (sd_it);
+	}
+
+      if (last == insn)
+	break;
+    }
+}
+
+/* Restore scheduler state from the topmost entry on the backtracking queue.
+   PSCHED_BLOCK_P points to the local data of schedule_block that we must
+   overwrite with the saved data.
+   The caller must already have called unschedule_insns_until.  */
+
+static void
+restore_last_backtrack_point (struct sched_block_state *psched_block)
+
+{
+  rtx link;
+  int i;
+  struct haifa_saved_data *save = backtrack_queue;
+
+  backtrack_queue = save->next;
+
+  if (current_sched_info->restore_state)
+    (*current_sched_info->restore_state) (save->fe_saved_data);
+
+  if (targetm.sched.alloc_sched_context)
+    {
+      targetm.sched.set_sched_context (save->be_saved_data);
+      targetm.sched.free_sched_context (save->be_saved_data);
+    }
+
+  /* Clear the QUEUE_INDEX of everything in the ready list or one
+     of the queues.  */
+  if (ready.n_ready > 0)
+    {
+      rtx *first = ready_lastpos (&ready);
+      for (i = 0; i < ready.n_ready; i++)
+	{
+	  QUEUE_INDEX (first[i]) = QUEUE_NOWHERE;
+	  INSN_TICK (first[i]) = INVALID_TICK;
+	}
+    }
+  for (i = 0; i <= max_insn_queue_index; i++)
+    {
+      int q = NEXT_Q_AFTER (q_ptr, i);
+
+      for (link = insn_queue[q]; link; link = XEXP (link, 1))
+	{
+	  rtx x = XEXP (link, 0);
+	  QUEUE_INDEX (x) = QUEUE_NOWHERE;
+	  INSN_TICK (x) = INVALID_TICK;
+	}
+      free_INSN_LIST_list (&insn_queue[q]);
+    }
+
+  free (ready.vec);
+  ready = save->ready;
+
+  if (ready.n_ready > 0)
+    {
+      rtx *first = ready_lastpos (&ready);
+      for (i = 0; i < ready.n_ready; i++)
+	{
+	  QUEUE_INDEX (first[i]) = QUEUE_READY;
+	  INSN_TICK (first[i]) = save->clock_var;
+	}
+    }
+
+  q_ptr = 0;
+  q_size = save->q_size;
+  for (i = 0; i <= max_insn_queue_index; i++)
+    {
+      int q = NEXT_Q_AFTER (q_ptr, i);
+
+      insn_queue[q] = save->insn_queue[q];
+
+      for (link = insn_queue[q]; link; link = XEXP (link, 1))
+	{
+	  rtx x = XEXP (link, 0);
+	  QUEUE_INDEX (x) = i;
+	  INSN_TICK (x) = save->clock_var + i;
+	}
+    }
+  free (save->insn_queue);
+
+  clock_var = save->clock_var;
+  last_clock_var = save->last_clock_var;
+  cycle_issued_insns = save->cycle_issued_insns;
+  last_scheduled_insn = save->last_scheduled_insn;
+  last_nondebug_scheduled_insn = save->last_nondebug_scheduled_insn;
+
+  *psched_block = save->sched_block;
+
+  memcpy (curr_state, save->curr_state, dfa_state_size);
+  free (save->curr_state);
+
+  mark_backtrack_feeds (save->delay_pair->i2, 0);
+
+  free (save);
+
+  for (save = backtrack_queue; save; save = save->next)
+    {
+      mark_backtrack_feeds (save->delay_pair->i2, 1);
+    }
+}
+
+/* Discard all data associated with the topmost entry in the backtrack
+   queue.  If RESET_TICK is false, we just want to free the data.  If true,
+   we are doing this because we discovered a reason to backtrack.  In the
+   latter case, also reset the INSN_TICK for the shadow insn.  */
+static void
+free_topmost_backtrack_point (bool reset_tick)
+{
+  struct haifa_saved_data *save = backtrack_queue;
+  int i;
+
+  backtrack_queue = save->next;
+
+  if (reset_tick)
+    {
+      struct delay_pair *pair = save->delay_pair;
+      while (pair)
+	{
+	  INSN_TICK (pair->i2) = INVALID_TICK;
+	  INSN_EXACT_TICK (pair->i2) = INVALID_TICK;
+	  pair = pair->next_same_i1;
+	}
+    }
+  if (targetm.sched.free_sched_context)
+    targetm.sched.free_sched_context (save->be_saved_data);
+  if (current_sched_info->restore_state)
+    free (save->fe_saved_data);
+  for (i = 0; i <= max_insn_queue_index; i++)
+    free_INSN_LIST_list (&save->insn_queue[i]);
+  free (save->insn_queue);
+  free (save->curr_state);
+  free (save->ready.vec);
+  free (save);
+}
+
+/* Free the entire backtrack queue.  */
+static void
+free_backtrack_queue (void)
+{
+  while (backtrack_queue)
+    free_topmost_backtrack_point (false);
+}
+
+static bool
+estimate_insn_tick (bitmap processed, rtx insn, int budget)
+{
+  sd_iterator_def sd_it;
+  dep_t dep;
+  int earliest = INSN_TICK (insn);
+
+  FOR_EACH_DEP (insn, SD_LIST_BACK, sd_it, dep)
+    {
+      rtx pro = DEP_PRO (dep);
+      int t;
+
+      if (QUEUE_INDEX (pro) == QUEUE_SCHEDULED)
+	gcc_assert (INSN_TICK (pro) + dep_cost (dep) <= INSN_TICK (insn));
+      else
+	{
+	  int cost = dep_cost (dep);
+	  if (cost >= budget)
+	    return false;
+	  if (!bitmap_bit_p (processed, INSN_LUID (pro)))
+	    {
+	      if (!estimate_insn_tick (processed, pro, budget - cost))
+		return false;
+	    }
+	  gcc_assert (INSN_TICK_ESTIMATE (pro) != INVALID_TICK);
+	  t = INSN_TICK_ESTIMATE (pro) + cost;
+	  if (earliest == INVALID_TICK || t > earliest)
+	    earliest = t;
+	}
+    }
+  bitmap_set_bit (processed, INSN_LUID (insn));
+  INSN_TICK_ESTIMATE (insn) = earliest;
+  return true;
+}
+
+/* Examine the pair of insns in P, and estimate (optimistically, assuming
+   infinite resources) the cycle in which the delayed shadow can be issued.
+   Return the number of cycles that must pass before the real insn can be
+   issued in order to meet this constraint.  */
+static int
+estimate_shadow_tick (struct delay_pair *p)
+{
+  bitmap_head processed;
+  int t;
+  bool cutoff;
+  bitmap_initialize (&processed, 0);
+
+  cutoff = !estimate_insn_tick (&processed, p->i2,
+				max_insn_queue_index + pair_delay (p));
+  bitmap_clear (&processed);
+  if (cutoff)
+    return max_insn_queue_index;
+  t = INSN_TICK_ESTIMATE (p->i2) - (clock_var + pair_delay (p) + 1);
+  if (t > 0)
+    return t;
+  return 0;
+}
 
 /* Return the head and tail pointers of ebb starting at BEG and ending
    at END.  */
@@ -2468,11 +3045,6 @@  struct choice_entry
    function max_issue.  */
 static struct choice_entry *choice_stack;
 
-/* The following variable value is number of essential insns issued on
-   the current cycle.  An insn is essential one if it changes the
-   processors state.  */
-int cycle_issued_insns;
-
 /* This holds the value of the target dfa_lookahead hook.  */
 int dfa_lookahead;
 
@@ -2884,10 +3456,18 @@  commit_schedule (rtx prev_head, rtx tail
    issued in this cycle.  TEMP_STATE is temporary scheduler state we
    can use as scratch space.  If FIRST_CYCLE_INSN_P is true, no insns
    have been issued for the current cycle, which means it is valid to
-   issue an asm statement.  */
+   issue an asm statement.
+
+   If SHADOWS_ONLY_P is true, we eliminate all real insns and only
+   leave those for which SHADOW_P is true.
+
+   Return the number of cycles we must
+   advance to find the next ready instruction, or zero if there remain
+   insns on the ready list.  */
 
 static void
-prune_ready_list (state_t temp_state, bool first_cycle_insn_p)
+prune_ready_list (state_t temp_state, bool first_cycle_insn_p,
+		  bool shadows_only_p)
 {
   int i;
 
@@ -2898,7 +3478,12 @@  prune_ready_list (state_t temp_state, bo
       int cost = 0;
       const char *reason = "resource conflict";
 
-      if (recog_memoized (insn) < 0)
+      if (shadows_only_p && !DEBUG_INSN_P (insn) && !SHADOW_P (insn))
+	{
+	  cost = 1;
+	  reason = "not a shadow";
+	}
+      else if (recog_memoized (insn) < 0)
 	{
 	  if (!first_cycle_insn_p
 	      && (GET_CODE (PATTERN (insn)) == ASM_INPUT
@@ -2910,12 +3495,34 @@  prune_ready_list (state_t temp_state, bo
 	cost = 0;
       else
 	{
+	  int delay_cost = 0;
+
+	  if (delay_htab)
+	    {
+	      struct delay_pair *delay_entry;
+	      delay_entry
+		= (struct delay_pair *)htab_find_with_hash (delay_htab, insn,
+							    htab_hash_pointer (insn));
+	      while (delay_entry && delay_cost == 0)
+		{
+		  delay_cost = estimate_shadow_tick (delay_entry);
+		  if (delay_cost > max_insn_queue_index)
+		    delay_cost = max_insn_queue_index;
+		  delay_entry = delay_entry->next_same_i1;
+		}
+	    }
+
 	  memcpy (temp_state, curr_state, dfa_state_size);
 	  cost = state_transition (temp_state, insn);
 	  if (cost < 0)
 	    cost = 0;
 	  else if (cost == 0)
 	    cost = 1;
+	  if (cost < delay_cost)
+	    {
+	      cost = delay_cost;
+	      reason = "shadow tick";
+	    }
 	}
       if (cost >= 1)
 	{
@@ -2926,6 +3533,60 @@  prune_ready_list (state_t temp_state, bo
     }
 }
 
+/* Called when we detect that the schedule is impossible.  We examine the
+   backtrack queue to find the earliest insn that caused this condition.  */
+
+static struct haifa_saved_data *
+verify_shadows (void)
+{
+  struct haifa_saved_data *save, *earliest_fail = NULL;
+  for (save = backtrack_queue; save; save = save->next)
+    {
+      int t;
+      struct delay_pair *pair = save->delay_pair;
+      rtx i1 = pair->i1;
+
+      for (; pair; pair = pair->next_same_i1)
+	{
+	  rtx i2 = pair->i2;
+
+	  if (QUEUE_INDEX (i2) == QUEUE_SCHEDULED)
+	    continue;
+
+	  t = INSN_TICK (i1) + pair_delay (pair);
+	  if (t < clock_var)
+	    {
+	      if (sched_verbose >= 2)
+		fprintf (sched_dump,
+			 ";;\t\tfailed delay requirements for %d/%d (%d->%d)"
+			 ", not ready\n",
+			 INSN_UID (pair->i1), INSN_UID (pair->i2),
+			 INSN_TICK (pair->i1), INSN_EXACT_TICK (pair->i2));
+	      earliest_fail = save;
+	      break;
+	    }
+	  if (QUEUE_INDEX (i2) >= 0)
+	    {
+	      int queued_for = INSN_TICK (i2);
+
+	      if (t < queued_for)
+		{
+		  if (sched_verbose >= 2)
+		    fprintf (sched_dump,
+			     ";;\t\tfailed delay requirements for %d/%d"
+			     " (%d->%d), queued too late\n",
+			     INSN_UID (pair->i1), INSN_UID (pair->i2),
+			     INSN_TICK (pair->i1), INSN_EXACT_TICK (pair->i2));
+		  earliest_fail = save;
+		  break;
+		}
+	    }
+	}
+    }
+
+  return earliest_fail;
+}
+
 /* Use forward list scheduling to rearrange insns of block pointed to by
    TARGET_BB, possibly bringing insns from subsequent blocks in the same
    region.  */
@@ -2955,6 +3616,8 @@  schedule_block (basic_block *target_bb)
 
   haifa_recovery_bb_recently_added_p = false;
 
+  backtrack_queue = NULL;
+
   /* Debug info.  */
   if (sched_verbose)
     dump_new_block_header (0, *target_bb, head, tail);
@@ -3051,6 +3714,8 @@  schedule_block (basic_block *target_bb)
 
   gcc_assert (VEC_length (rtx, scheduled_insns) == 0);
   sort_p = TRUE;
+  must_backtrack = false;
+
   /* Loop until all the insns in BB are scheduled.  */
   while ((*current_sched_info->schedule_more_p) ())
     {
@@ -3080,18 +3745,21 @@  schedule_block (basic_block *target_bb)
       while (advance > 0);
 
       if (ready.n_ready > 0)
-	prune_ready_list (temp_state, true);
+	prune_ready_list (temp_state, true, false);
       if (ready.n_ready == 0)
 	continue;
+      if (must_backtrack)
+	goto do_backtrack;
 
       ls.first_cycle_insn_p = true;
+      ls.shadows_only_p = false;
       cycle_issued_insns = 0;
       ls.can_issue_more = issue_rate;
       for (;;)
 	{
 	  rtx insn;
 	  int cost;
-	  bool asm_p = false;
+	  bool asm_p;
 
 	  if (sort_p && ready.n_ready > 0)
 	    {
@@ -3130,6 +3798,7 @@  schedule_block (basic_block *target_bb)
 	  if (ls.first_cycle_insn_p && !ready.n_ready)
 	    break;
 
+	resume_after_backtrack:
 	  /* Allow the target to reorder the list, typically for
 	     better instruction bundling.  */
 	  if (sort_p
@@ -3236,6 +3905,22 @@  schedule_block (basic_block *target_bb)
 	      goto restart_choose_ready;
 	    }
 
+	  if (delay_htab)
+	    {
+	      /* If this insn is the first part of a delay-slot pair, record a
+		 backtrack point.  */
+	      struct delay_pair *delay_entry;
+	      delay_entry
+		= (struct delay_pair *)htab_find_with_hash (delay_htab, insn,
+							    htab_hash_pointer (insn));
+	      if (delay_entry)
+		{
+		  save_backtrack_point (delay_entry, ls);
+		  if (sched_verbose >= 2)
+		    fprintf (sched_dump, ";;\t\tsaving backtrack point\n");
+		}
+	    }
+
 	  /* DECISION is made.  */
 
           if (TODO_SPEC (insn) & SPECULATIVE)
@@ -3275,18 +3960,70 @@  schedule_block (basic_block *target_bb)
 	    ls.can_issue_more--;
 	  advance = schedule_insn (insn);
 
+	  if (SHADOW_P (insn))
+	    ls.shadows_only_p = true;
+
 	  /* After issuing an asm insn we should start a new cycle.  */
 	  if (advance == 0 && asm_p)
 	    advance = 1;
+
+	  if (must_backtrack)
+	    break;
+
 	  if (advance != 0)
 	    break;
 
 	  ls.first_cycle_insn_p = false;
 	  if (ready.n_ready > 0)
-	    prune_ready_list (temp_state, false);
+	    prune_ready_list (temp_state, false, ls.shadows_only_p);
 	}
-    }
 
+    do_backtrack:
+      if (!must_backtrack)
+	for (i = 0; i < ready.n_ready; i++)
+	  {
+	    rtx insn = ready_element (&ready, i);
+	    if (INSN_EXACT_TICK (insn) == clock_var)
+	      {
+		must_backtrack = true;
+		clock_var++;
+		break;
+	      }
+	  }
+      while (must_backtrack)
+	{
+	  struct haifa_saved_data *failed;
+	  rtx failed_insn;
+
+	  must_backtrack = false;
+	  failed = verify_shadows ();
+	  gcc_assert (failed);
+
+	  failed_insn = failed->delay_pair->i1;
+	  unschedule_insns_until (failed_insn);
+	  while (failed != backtrack_queue)
+	    free_topmost_backtrack_point (true);
+	  restore_last_backtrack_point (&ls);
+	  if (sched_verbose >= 2)
+	    fprintf (sched_dump, ";;\t\trewind to cycle %d\n", clock_var);
+	  /* Delay by at least a cycle.  This could cause additional
+	     backtracking.  */
+	  queue_insn (failed_insn, 1, "backtracked");
+	  advance = 0;
+	  if (must_backtrack)
+	    continue;
+	  if (ready.n_ready > 0)
+	    goto resume_after_backtrack;
+	  else
+	    {
+	      if (clock_var == 0 && ls.first_cycle_insn_p)
+		goto end_schedule;
+	      advance = 1;
+	      break;
+	    }
+	}
+    }
+ end_schedule:
   /* Debug info.  */
   if (sched_verbose)
     {
@@ -3364,6 +4101,8 @@  schedule_block (basic_block *target_bb)
 
   current_sched_info->head = head;
   current_sched_info->tail = tail;
+
+  free_backtrack_queue ();
 }
 
 /* Set_priorities: compute priority of each insn in the block.  */
@@ -3488,7 +4227,8 @@  sched_init (void)
 
   init_alias_analysis ();
 
-  df_set_flags (DF_LR_RUN_DCE);
+  if (!sched_no_dce)
+    df_set_flags (DF_LR_RUN_DCE);
   df_note_add_problem ();
 
   /* More problems needed for interloop dep calculation in SMS.  */
@@ -3653,6 +4393,17 @@  sched_finish (void)
 #endif
 }
 
+/* Free all delay_pair structures that were recorded.  */
+void
+free_delay_pairs (void)
+{
+  if (delay_htab)
+    {
+      htab_empty (delay_htab);
+      htab_empty (delay_htab_i2);
+    }
+}
+
 /* Fix INSN_TICKs of the instructions in the current block as well as
    INSN_TICKs of their dependents.
    HEAD and TAIL are the begin and the end of the current scheduled block.  */
@@ -5546,6 +6297,7 @@  init_h_i_d (rtx insn)
       INSN_COST (insn) = -1;
       QUEUE_INDEX (insn) = QUEUE_NOWHERE;
       INSN_TICK (insn) = INVALID_TICK;
+      INSN_EXACT_TICK (insn) = INVALID_TICK;
       INTER_TICK (insn) = INVALID_TICK;
       TODO_SPEC (insn) = HARD_DEP;
     }
Index: gcc/Makefile.in
===================================================================
--- gcc.orig/Makefile.in
+++ gcc/Makefile.in
@@ -3399,7 +3399,7 @@  modulo-sched.o : modulo-sched.c $(DDG_H)
 haifa-sched.o : haifa-sched.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(SCHED_INT_H) $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(FUNCTION_H) \
    $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) $(RECOG_H) $(EXCEPT_H) $(TM_P_H) $(TARGET_H) output.h \
-   $(PARAMS_H) $(DBGCNT_H) $(CFGLOOP_H) ira.h $(EMIT_RTL_H)
+   $(PARAMS_H) $(DBGCNT_H) $(CFGLOOP_H) ira.h $(EMIT_RTL_H) $(HASHTAB_H)
 sched-deps.o : sched-deps.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
    $(RTL_H) $(SCHED_INT_H) $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h \
    $(FUNCTION_H) $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) $(RECOG_H) $(EXCEPT_H) cselib.h \
Index: gcc/modulo-sched.c
===================================================================
--- gcc.orig/modulo-sched.c
+++ gcc/modulo-sched.c
@@ -276,6 +276,7 @@  static struct haifa_sched_info sms_sched
   0, 0,
 
   NULL, NULL, NULL, NULL,
+  NULL, NULL,
   0
 };
 
Index: gcc/sched-deps.c
===================================================================
--- gcc.orig/sched-deps.c
+++ gcc/sched-deps.c
@@ -1279,6 +1279,28 @@  sd_resolve_dep (sd_iterator_def sd_it)
 		 INSN_RESOLVED_FORW_DEPS (pro));
 }
 
+/* Perform the inverse operation of sd_resolve_dep.  Restore the dependence
+   pointed to by SD_IT to unresolved state.  */
+void
+sd_unresolve_dep (sd_iterator_def sd_it)
+{
+  dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
+  dep_t dep = DEP_NODE_DEP (node);
+  rtx pro = DEP_PRO (dep);
+  rtx con = DEP_CON (dep);
+
+  if ((current_sched_info->flags & DO_SPECULATION)
+      && (DEP_STATUS (dep) & SPECULATIVE))
+    move_dep_link (DEP_NODE_BACK (node), INSN_RESOLVED_BACK_DEPS (con),
+		   INSN_SPEC_BACK_DEPS (con));
+  else
+    move_dep_link (DEP_NODE_BACK (node), INSN_RESOLVED_BACK_DEPS (con),
+		   INSN_HARD_BACK_DEPS (con));
+
+  move_dep_link (DEP_NODE_FORW (node), INSN_RESOLVED_FORW_DEPS (pro),
+		 INSN_FORW_DEPS (pro));
+}
+
 /* Make TO depend on all the FROM's producers.
    If RESOLVED_P is true add dependencies to the resolved lists.  */
 void
Index: gcc/sched-ebb.c
===================================================================
--- gcc.orig/sched-ebb.c
+++ gcc/sched-ebb.c
@@ -74,6 +74,25 @@  static void ebb_add_block (basic_block,
 static basic_block advance_target_bb (basic_block, rtx);
 static void ebb_fix_recovery_cfg (int, int, int);
 
+/* Allocate memory and store the state of the frontend.  Return the allocated
+   memory.  */
+static void *
+save_ebb_state (void)
+{
+  int *p = XNEW (int);
+  *p = sched_rgn_n_insns;
+  return p;
+}
+
+/* Restore the state of the frontend from P_, then free it.  */
+static void
+restore_ebb_state (void *p_)
+{
+  int *p = (int *)p_;
+  sched_rgn_n_insns = *p;
+  free (p_);
+}
+
 /* Return nonzero if there are more insns that should be scheduled.  */
 
 static int
@@ -295,6 +314,10 @@  static struct haifa_sched_info ebb_sched
   begin_schedule_ready,
   begin_move_insn,
   advance_target_bb,
+
+  save_ebb_state,
+  restore_ebb_state,
+
   SCHED_EBB
   /* We can create new blocks in begin_schedule_ready ().  */
   | NEW_BBS
@@ -377,76 +400,80 @@  add_deps_for_risky_insns (rtx head, rtx
   basic_block last_block = NULL, bb;
 
   for (insn = head; insn != next_tail; insn = NEXT_INSN (insn))
-    if (control_flow_insn_p (insn))
-      {
-	bb = BLOCK_FOR_INSN (insn);
-	bb->aux = last_block;
-	last_block = bb;
-	last_jump = insn;
-      }
-    else if (INSN_P (insn) && last_jump != NULL_RTX)
-      {
-	classification = haifa_classify_insn (insn);
-	prev = last_jump;
-	switch (classification)
-	  {
-	  case PFREE_CANDIDATE:
-	    if (flag_schedule_speculative_load)
-	      {
-		bb = earliest_block_with_similiar_load (last_block, insn);
-		if (bb)
-		  {
-		    bb = (basic_block) bb->aux;
-		    if (!bb)
-		      break;
-		    prev = BB_END (bb);
-		  }
-	      }
-	    /* Fall through.  */
-	  case TRAP_RISKY:
-	  case IRISKY:
-	  case PRISKY_CANDIDATE:
-	    /* ??? We could implement better checking PRISKY_CANDIDATEs
-	       analogous to sched-rgn.c.  */
-	    /* We can not change the mode of the backward
-	       dependency because REG_DEP_ANTI has the lowest
-	       rank.  */
-	    if (! sched_insns_conditions_mutex_p (insn, prev))
-	      {
-		dep_def _dep, *dep = &_dep;
-
-		init_dep (dep, prev, insn, REG_DEP_ANTI);
-
-		if (!(current_sched_info->flags & USE_DEPS_LIST))
-		  {
-		    enum DEPS_ADJUST_RESULT res;
-
-		    res = sd_add_or_update_dep (dep, false);
-
-		    /* We can't change an existing dependency with
-		       DEP_ANTI.  */
-		    gcc_assert (res != DEP_CHANGED);
-		  }
-		else
-		  {
-		    if ((current_sched_info->flags & DO_SPECULATION)
-			&& (spec_info->mask & BEGIN_CONTROL))
-		      DEP_STATUS (dep) = set_dep_weak (DEP_ANTI, BEGIN_CONTROL,
-						       MAX_DEP_WEAK);
-
-		    sd_add_or_update_dep (dep, false);
-
-		    /* Dep_status could have been changed.
-		       No assertion here.  */
-		  }
-	      }
-
-            break;
-
-          default:
-            break;
-	  }
-      }
+    {
+      add_delay_dependencies (insn);
+      if (control_flow_insn_p (insn))
+	{
+	  bb = BLOCK_FOR_INSN (insn);
+	  bb->aux = last_block;
+	  last_block = bb;
+	  last_jump = insn;
+	}
+      else if (INSN_P (insn) && last_jump != NULL_RTX)
+	{
+	  classification = haifa_classify_insn (insn);
+	  prev = last_jump;
+
+	  switch (classification)
+	    {
+	    case PFREE_CANDIDATE:
+	      if (flag_schedule_speculative_load)
+		{
+		  bb = earliest_block_with_similiar_load (last_block, insn);
+		  if (bb)
+		    {
+		      bb = (basic_block) bb->aux;
+		      if (!bb)
+			break;
+		      prev = BB_END (bb);
+		    }
+		}
+	      /* Fall through.  */
+	    case TRAP_RISKY:
+	    case IRISKY:
+	    case PRISKY_CANDIDATE:
+	      /* ??? We could implement better checking PRISKY_CANDIDATEs
+		 analogous to sched-rgn.c.  */
+	      /* We can not change the mode of the backward
+		 dependency because REG_DEP_ANTI has the lowest
+		 rank.  */
+	      if (! sched_insns_conditions_mutex_p (insn, prev))
+		{
+		  dep_def _dep, *dep = &_dep;
+
+		  init_dep (dep, prev, insn, REG_DEP_ANTI);
+
+		  if (!(current_sched_info->flags & USE_DEPS_LIST))
+		    {
+		      enum DEPS_ADJUST_RESULT res;
+
+		      res = sd_add_or_update_dep (dep, false);
+
+		      /* We can't change an existing dependency with
+			 DEP_ANTI.  */
+		      gcc_assert (res != DEP_CHANGED);
+		    }
+		  else
+		    {
+		      if ((current_sched_info->flags & DO_SPECULATION)
+			  && (spec_info->mask & BEGIN_CONTROL))
+			DEP_STATUS (dep) = set_dep_weak (DEP_ANTI, BEGIN_CONTROL,
+							 MAX_DEP_WEAK);
+
+		      sd_add_or_update_dep (dep, false);
+
+		      /* Dep_status could have been changed.
+			 No assertion here.  */
+		    }
+		}
+
+	      break;
+
+	    default:
+	      break;
+	    }
+	}
+    }
   /* Maintain the invariant that bb->aux is clear after use.  */
   while (last_block)
     {
Index: gcc/sched-int.h
===================================================================
--- gcc.orig/sched-int.h
+++ gcc/sched-int.h
@@ -627,6 +627,13 @@  struct haifa_sched_info
      The first parameter is the current basic block in EBB.  */
   basic_block (*advance_target_bb) (basic_block, rtx);
 
+  /* Allocate memory, store the frontend scheduler state in it, and
+     return it.  */
+  void *(*save_state) (void);
+  /* Restore frontend scheduler state from the argument, and free the
+     memory.  */
+  void (*restore_state) (void *);
+
   /* ??? FIXME: should use straight bitfields inside sched_info instead of
      this flag field.  */
   unsigned int flags;
@@ -775,10 +782,18 @@  struct _haifa_insn_data
      used to note timing constraints for the insns in the pending list.  */
   int tick;
 
+  /* For insns that are scheduled at a fixed difference from another,
+     this records the tick in which they must be ready.  */
+  int exact_tick;
+
   /* INTER_TICK is used to adjust INSN_TICKs of instructions from the
      subsequent blocks in a region.  */
   int inter_tick;
 
+  /* Used temporarily to estimate an INSN_TICK value for an insn given
+     current knowledge.  */
+  int tick_estimate;
+
   /* See comment on QUEUE_INDEX macro in haifa-sched.c.  */
   int queue_index;
 
@@ -788,6 +803,14 @@  struct _haifa_insn_data
      moved load insn and this one.  */
   unsigned int fed_by_spec_load : 1;
   unsigned int is_load_insn : 1;
+  /* Nonzero if this insn has negative-cost forward dependencies against
+     an already scheduled insn.  */
+  unsigned int feeds_backtrack_insn : 1;
+
+  /* Nonzero if this insn is a shadow of another, scheduled after a fixed
+     delay.  We only emit shadows at the end of a cycle, with no other
+     real insns following them.  */
+  unsigned int shadow_p : 1;
 
   /* '> 0' if priority is valid,
      '== 0' if priority was not yet computed,
@@ -1028,7 +1051,8 @@  enum SCHED_FLAGS {
      Results in generation of data and control speculative dependencies.
      Requires USE_DEPS_LIST set.  */
   DO_SPECULATION = USE_DEPS_LIST << 1,
-  SCHED_RGN = DO_SPECULATION << 1,
+  DO_BACKTRACKING = DO_SPECULATION << 1,
+  SCHED_RGN = DO_BACKTRACKING << 1,
   SCHED_EBB = SCHED_RGN << 1,
   /* Scheduler can possibly create new basic blocks.  Used for assertions.  */
   NEW_BBS = SCHED_EBB << 1,
@@ -1315,7 +1339,11 @@  extern int *ebb_head;
 extern int current_nr_blocks;
 extern int current_blocks;
 extern int target_bb;
+extern bool sched_no_dce;
 
+extern void record_delay_slot_pair (rtx, rtx, int);
+extern void free_delay_pairs (void);
+extern void add_delay_dependencies (rtx);
 extern bool sched_is_disabled_for_current_region_p (void);
 extern void sched_rgn_init (bool);
 extern void sched_rgn_finish (void);
@@ -1489,6 +1517,7 @@  extern dep_t sd_find_dep_between (rtx, r
 extern void sd_add_dep (dep_t, bool);
 extern enum DEPS_ADJUST_RESULT sd_add_or_update_dep (dep_t, bool);
 extern void sd_resolve_dep (sd_iterator_def);
+extern void sd_unresolve_dep (sd_iterator_def);
 extern void sd_copy_back_deps (rtx, rtx, bool);
 extern void sd_delete_dep (sd_iterator_def);
 extern void sd_debug_lists (rtx, sd_list_types_def);
Index: gcc/sched-rgn.c
===================================================================
--- gcc.orig/sched-rgn.c
+++ gcc/sched-rgn.c
@@ -2371,6 +2371,7 @@  static const struct haifa_sched_info rgn
   begin_schedule_ready,
   NULL,
   advance_target_bb,
+  NULL, NULL,
   SCHED_RGN
 };
 
Index: gcc/sel-sched-ir.c
===================================================================
--- gcc.orig/sel-sched-ir.c
+++ gcc/sel-sched-ir.c
@@ -5652,6 +5652,10 @@  static struct haifa_sched_info sched_sel
   NULL, /* begin_schedule_ready */
   NULL, /* begin_move_insn */
   NULL, /* advance_target_bb */
+
+  NULL,
+  NULL,
+
   SEL_SCHED | NEW_BBS
 };