
Scheduler: Allow breaking dependencies by modifying patterns

Message ID: 501BBE7B.7030104@codesourcery.com
State: New

Commit Message

Bernd Schmidt Aug. 3, 2012, 12:05 p.m. UTC
This patch allows us to change

rn++
rm=[rn]

into

rm=[rn + 4]
rn++
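
In C terms, this is e.g. what a pre-increment pointer walk produces (a
hypothetical source-level view, not part of the patch):

/* Each iteration bumps the pointer, then loads through it, producing
   the rn++ / rm=[rn] pair above (assuming 4-byte int).  */
int
sum_pre_inc (const int *p, int n)
{
  int sum = 0;
  while (n-- > 0)
    {
      p++;        /* rn++ */
      sum += *p;  /* rm=[rn]; rewritable as rm=[rn + 4] */
    }
  return sum;
}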

Opportunities to do this are discovered by a mini-pass over the
instructions after generating dependencies and before scheduling a
block. At that point we have all the information required to ensure that
a candidate dep between two instructions is only used to show the
register dependence, and to ensure that every insn with a memory
reference is subject to at most one dep causing a pattern change.

The dep_t structure is extended to hold an optional pointer to a
"replacement description", which holds information about what to change
when a dependency is broken. The time when this replacement is applied
differs depending on whether the changed insn is the DEP_CON (in which
case the pattern is changed whenever the broken dependency becomes the
last one), or the DEP_PRO, in which case we make the change when the
corresponding DEP_CON has been scheduled. This ensures that the ready
list always contains insns with the correct pattern.
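
A minimal sketch of that replacement description (field names as
find_inc fills them in, in the patch below):

struct dep_replacement
{
  rtx *loc;     /* Location within the insn to change.  */
  rtx orig;     /* Original value at *LOC.  */
  rtx newval;   /* Replacement value, e.g. the offset-adjusted MEM.  */
  rtx insn;     /* The insn to be changed.  */
};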

A few additional bits are needed in the dep structure: one to hold
information about whether a dependency occurs multiple times, and one to
distinguish dependencies that are purely for register values from those
with other meanings (e.g. memory references).
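
In struct _dep that amounts to roughly the following (a sketch; the
patch makes the type and cost fields bitfields to compensate):

  struct dep_replacement *replace;  /* Nonnull if breaking the dep
                                       entails a pattern change.  */
  unsigned int nonreg:1;    /* Dep is for more than a register value.  */
  unsigned int multiple:1;  /* Same pro/con pair occurs more than once.  */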

Also, sched-rgn was changed to use a new bit, DEP_POSTPONED, rather than
HARD_DEP to indicate that we don't want to schedule an insn in the
current block.

A possible future extension would be to also allow autoinc addressing
modes as the increment insn.
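
I.e., something like

rm2=[rn++]
rm=[rn]

could then become

rm=[rn + 4]
rm2=[rn++]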

Bootstrapped and tested on x86_64-linux, and also tested on c6x-elf
(quite a number of changes were necessary to make it work there). It was
originally written for a mips target and tested there in the context of
a 4.6 tree. I've also run spec2000 on x86_64, with no change that looked
like anything other than noise. Ok?


Bernd

Comments

Maxim Kuvyrkov Aug. 14, 2012, 4:49 a.m. UTC | #1
On 4/08/2012, at 12:05 AM, Bernd Schmidt wrote:

> This patch allows us to change
> 
> rn++
> rm=[rn]
> 
> into
> 
> rm=[rn + 4]
> rn++

This is a good scheduler optimization that I have wanted for quite a while.  Bernd, kudos for implementing it!

I am going to review this patch, and then the scheduler maintainers will have the option of accepting my word for it or doing a second review.  I have reviewed earlier versions of this patch, so I know I'm happy with the general structure of the optimization.

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics
Maxim Kuvyrkov Aug. 16, 2012, 12:35 a.m. UTC | #2
On 4/08/2012, at 12:05 AM, Bernd Schmidt wrote:

> This patch allows us to change
> 
> rn++
> rm=[rn]
> 
> into
> 
> rm=[rn + 4]
> rn++

The patch is OK with the following nitpicks.

[BTW, if anyone else wants to review this patch, it helps to read it from the end.]

> 
> Opportunities to do this are discovered by a mini-pass over the
> instructions after generating dependencies and before scheduling a
> block. At that point we have all the information required to ensure that
> a candidate dep between two instructions is only used to show the
> register dependence, and to ensure that every insn with a memory
> reference is subject to at most one dep causing a pattern change.
> 
> The dep_t structure is extended to hold an optional pointer to a
> "replacement description", which holds information about what to change
> when a dependency is broken. The time when this replacement is applied
> differs depending on whether the changed insn is the DEP_CON (in which
> case the pattern is changed whenever the broken dependency becomes the
> last one), or the DEP_PRO, in which case we make the change when the
> corresponding DEP_CON has been scheduled. This ensures that the ready
> list always contains insns with the correct pattern.
> 
> A few additional bits are needed in the dep structure: one to hold
> information about whether a dependency occurs multiple times, and one to
> distinguish dependencies that are purely for register values from those
> with other meanings (e.g. memory references).
> 
> Also, sched-rgn was changed to use a new bit, DEP_POSTPONED, rather than
> HARD_DEP to indicate that we don't want to schedule an insn in the
> current block.
> 
> A possible future extension would be to also allow autoinc addressing
> modes as the increment insn.

Would you please copy the above into a comment describing how the new optimization fits within the scheduler?  In sched-deps.c, just before the definition of struct mem_inc_info seems like a good spot for an overview comment.

> 
> Bootstrapped and tested on x86_64-linux, and also tested on c6x-elf
> (quite a number of changes were necessary to make it work there). It was
> originally written for a mips target and tested there in the context of
> a 4.6 tree. I've also run spec2000 on x86_64, with no change that looked
> like anything other than noise. Ok?
> 

> +   We also perform pattern replacements for predication, and for broken
> +   replacement dependencies.  The latter is only done if FOR_BACKTRACK is
> +   false.  */
> 

We also perform pattern replacement for ?speculation? ...


> +void
> +find_modifiable_mems (rtx head, rtx tail)
> +{
> +  rtx insn;
> +  int success_in_block = 0;

Success_in_block is not really used.  Maybe add a debug printout for it?
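
E.g. something in the style of the other dumps in the patch
(hypothetical):

  if (sched_verbose >= 5 && success_in_block)
    fprintf (sched_dump, "%d candidate mem/inc pairs broken in block\n",
             success_in_block);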


> +  /* Dependency major type.  This field is superseded by STATUS below.
> +     Though, it is still in place because some targets use it.  */
> +  ENUM_BITFIELD(reg_note) type:6;
> 

... is superseded by STATUS *above*.


--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics
Bernd Schmidt Sept. 5, 2012, 2:27 p.m. UTC | #3
On 08/03/2012 02:05 PM, Bernd Schmidt wrote:
> This patch allows us to change
> 
> rn++
> rm=[rn]
> 
> into
> 
> rm=[rn + 4]
> rn++

Ping.


Bernd
Vladimir Makarov Sept. 11, 2012, 4:34 p.m. UTC | #4
On 08/03/2012 08:05 AM, Bernd Schmidt wrote:
> This patch allows us to change
>
> rn++
> rm=[rn]
>
> into
>
> rm=[rn + 4]
> rn++
That is an interesting optimization.  I think an analogous optimization
could be done for INC/DEC addressing (it would probably be beneficial
for ppc, which has such addressing in addition to displacement
addressing).  It would complicate the haifa scheduler quite a lot,
though, as a new insn is generated, and the real benefits may not be
worth it (the additional insn could in many cases even result in worse
code).
> Opportunities to do this are discovered by a mini-pass over the
> instructions after generating dependencies and before scheduling a
> block. At that point we have all the information required to ensure that
> a candidate dep between two instructions is only used to show the
> register dependence, and to ensure that every insn with a memory
> reference is subject to at most one dep causing a pattern change.
>
> The dep_t structure is extended to hold an optional pointer to a
> "replacement description", which holds information about what to change
> when a dependency is broken. The time when this replacement is applied
> differs depending on whether the changed insn is the DEP_CON (in which
> case the pattern is changed whenever the broken dependency becomes the
> last one), or the DEP_PRO, in which case we make the change when the
> corresponding DEP_CON has been scheduled. This ensures that the ready
> list always contains insns with the correct pattern.
>
> A few additional bits are needed in the dep structure: one to hold
> information about whether a dependency occurs multiple times, and one to
> distinguish dependencies that are purely for register values from those
> with other meanings (e.g. memory references).
>
> Also, sched-rgn was changed to use a new bit, DEP_POSTPONED, rather than
> HARD_DEP to indicate that we don't want to schedule an insn in the
> current block.
>
> A possible future extension would be to also allow autoinc addressing
> modes as the increment insn.
>
> Bootstrapped and tested on x86_64-linux, and also tested on c6x-elf
> (quite a number of changes were necessary to make it work there). It was
> originally written for a mips target and tested there in the context of
> a 4.6 tree. I've also run spec2000 on x86_64, with no change that looked
> like anything other than noise. Ok?
>
>
Ok, thanks.  The changes are pretty straightforward.  Just a few
comments.

One is a missing ChangeLog entry for haifa_note_dep.

The second one is for

+  /* Cached cost of the dependency.  Make sure to update UNKNOWN_DEP_COST
+     when changing the size of this field.  */
+  int cost:20;
  };

+#define UNKNOWN_DEP_COST (-1<<19)
+

You could use a macro to define the bit width and UNKNOWN_DEP_COST.  But it is probably a matter of taste.
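
E.g. (one hypothetical way to keep the two in sync):

#define DEP_COST_WIDTH 20

  int cost:DEP_COST_WIDTH;

#define UNKNOWN_DEP_COST (-1 << (DEP_COST_WIDTH - 1))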

The third one is success_in_block in find_modifiable_mems.  It is calculated but used nowhere.  Probably it was used for debugging.  You should do something about this.

Thanks for the patch, Bernd.  Sorry for the delay with the review.  I thought Maxim would write his comments first.
Maxim Kuvyrkov Sept. 12, 2012, 11:09 p.m. UTC | #5
On 12/09/2012, at 4:34 AM, Vladimir Makarov wrote:

> On 08/03/2012 08:05 AM, Bernd Schmidt wrote:
>> 
...
> Ok, thanks.  The changes are pretty straightforward.  Just a few comments.
> 
...
> 
> Thanks for the patch, Bernd.  Sorry for the delay with the review.  I thought Maxim would write his comments first.

I reviewed the patch in http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01028.html (comments were similar to yours), but didn't CC you.  Anyway, thanks for the review!

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics
H.J. Lu Sept. 20, 2012, 5:42 p.m. UTC | #6
On Fri, Aug 3, 2012 at 5:05 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> This patch allows us to change
>
> rn++
> rm=[rn]
>
> into
>
> rm=[rn + 4]
> rn++
>
> Opportunities to do this are discovered by a mini-pass over the
> instructions after generating dependencies and before scheduling a
> block. At that point we have all the information required to ensure that
> a candidate dep between two instructions is only used to show the
> register dependence, and to ensure that every insn with a memory
> reference is subject to at most one dep causing a pattern change.
>
> The dep_t structure is extended to hold an optional pointer to a
> "replacement description", which holds information about what to change
> when a dependency is broken. The time when this replacement is applied
> differs depending on whether the changed insn is the DEP_CON (in which
> case the pattern is changed whenever the broken dependency becomes the
> last one), or the DEP_PRO, in which case we make the change when the
> corresponding DEP_CON has been scheduled. This ensures that the ready
> list always contains insns with the correct pattern.
>
> A few additional bits are needed in the dep structure: one to hold
> information about whether a dependency occurs multiple times, and one to
> distinguish dependencies that are purely for register values from those
> with other meanings (e.g. memory references).
>
> Also, sched-rgn was changed to use a new bit, DEP_POSTPONED, rather than
> HARD_DEP to indicate that we don't want to schedule an insn in the
> current block.
>
> A possible future extension would be to also allow autoinc addressing
> modes as the increment insn.
>
> Bootstrapped and tested on x86_64-linux, and also tested on c6x-elf
> (quite a number of changes were necessary to make it work there). It was
> originally written for a mips target and tested there in the context of
> a 4.6 tree. I've also run spec2000 on x86_64, with no change that looked
> like anything other than noise. Ok?
>
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54645

Patch

	* dbgcnt.def (sched_breakdep): New counter.
	* haifa-sched.c (update_insn_after_change): New static function,
	broken out of haifa_change_pattern.
	(haifa_change_pattern): Call it.
	(dep_t heap vecs): Declare.
	(INSN_COST): Define earlier.
	(next_cycle_replace_deps, next_cycle_apply): New static
	variables.
	(apply_replacement): New static function.
	(recompute_todo_spec): New argument FOR_BACKTRACK.  All callers
	changed.  Handle DEP_REPLACE deps.
	(contributes_to_priority_p): False for replaceable deps.
	(must_restore_pattern_p, restore_pattern): New static functions.
	(schedule_insn): Use them.  Apply replacements for broken deps.
	(struct haifa_saved_data): Add new fields to keep track of
	replacements.
	(save_backtrack_point): Initialize them.
	(undo_replacements_for_backtrack): New static function.
	(restore_last_backtrack_point, free_topmost_backtrack_point):
	Use it and keep track of replacements.
	(perform_replacements_new_cycle, undo_all_replacements): New static
	functions.
	(schedule_block): Call these two as necessary.  Call
	find_modifiable_mems.
	(try_ready): Tweak the assert.  Check for DEP_POSTPONED.
	* sched-deps.c: Include "emit-rtl.h".
	(init_dep_1): Initialize DEP_NONREG, DEP_MULTIPLE and DEP_REPLACE.
	(dep_spec_p): True for DEP_REPLACE deps.
	(mark_as_hard): New static variable.
	(update_dep): Update DEP_NONREG and DEP_MULTIPLE.
	(add_dependence_list): New argument hard.  All callers changed.  Set
	and clear mark_as_hard around function body.
	(add_dependence_list_and_free): Likewise.
	(haifa_note_mem_dep): Set DEP_NONREG.
	(sched_analyze_insn): Switch loop with if statement testing for
	sel_sched_p.
	(struct mem_inc_info): New.
	(attempt_change, parse_add_or_inc, find_inc, find_mem): New static
	functions.
	(find_modifiable_mems): New function.
	* sched-int.h (struct dep_replacement): New.
	(struct _dep): Add replace, nonreg and multiple fields.  Make type and
	cost bitfields.
	(UNKNOWN_DEP_COST): Change to match the bitfield.
	(DEP_NONREG, DEP_MULTIPLE, DEP_REPLACE): New macros.
	(DEP_POSTPONED): New macro.
	(DEP_CANCELLED): Renumber.
	(find_modifiable_mems): Declare.
	(enum SCHED_FLAGS): Add DONT_BREAK_DEPENDENCIES.
	* sched-rgn.c (init_ready_list): Set TODO_SPEC here.
	(new_ready): Don't set HARD_DEP, use DEP_POSTPONED.
	(debug_dependencies): Dump DEP_NONREG and DEP_MULTIPLE.
	* Makefile.in (sched-deps.o): Update dependencies.
	* config/c6x/c6x.c (in_hwloop): New static variable.
	(c6x_set_sched_flags): If it is true, add DONT_BREAK_DEPENDENCIES.
	(hwloop_optimize): Set and clear it around preliminary scheduling
	pass.

Index: gcc/dbgcnt.def
===================================================================
--- gcc/dbgcnt.def	(revision 189284)
+++ gcc/dbgcnt.def	(working copy)
@@ -176,6 +176,7 @@  DEBUG_COUNTER (sched2_func)
 DEBUG_COUNTER (sched_block)
 DEBUG_COUNTER (sched_func)
 DEBUG_COUNTER (sched_insn)
+DEBUG_COUNTER (sched_breakdep)
 DEBUG_COUNTER (sched_region)
 DEBUG_COUNTER (sel_sched_cnt)
 DEBUG_COUNTER (sel_sched_region_cnt)
Index: gcc/haifa-sched.c
===================================================================
--- gcc/haifa-sched.c	(revision 189284)
+++ gcc/haifa-sched.c	(working copy)
@@ -213,6 +213,9 @@  struct common_sched_info_def *common_sch
 #define FEEDS_BACKTRACK_INSN(INSN) (HID (INSN)->feeds_backtrack_insn)
 #define SHADOW_P(INSN) (HID (INSN)->shadow_p)
 #define MUST_RECOMPUTE_SPEC_P(INSN) (HID (INSN)->must_recompute_spec)
+/* Cached cost of the instruction.  Use insn_cost to get cost of the
+   insn.  -1 here means that the field is not initialized.  */
+#define INSN_COST(INSN)	(HID (INSN)->cost)
 
 /* If INSN_TICK of an instruction is equal to INVALID_TICK,
    then it should be recalculated from scratch.  */
@@ -1114,18 +1117,59 @@  cond_clobbered_p (rtx insn, HARD_REG_SET
   return false;
 }
 
+/* This function should be called after modifying the pattern of INSN,
+   to update scheduler data structures as needed.  */
+static void
+update_insn_after_change (rtx insn)
+{
+  sd_iterator_def sd_it;
+  dep_t dep;
+
+  dfa_clear_single_insn_cache (insn);
+
+  sd_it = sd_iterator_start (insn,
+			     SD_LIST_FORW | SD_LIST_BACK | SD_LIST_RES_BACK);
+  while (sd_iterator_cond (&sd_it, &dep))
+    {
+      DEP_COST (dep) = UNKNOWN_DEP_COST;
+      sd_iterator_next (&sd_it);
+    }
+
+  /* Invalidate INSN_COST, so it'll be recalculated.  */
+  INSN_COST (insn) = -1;
+  /* Invalidate INSN_TICK, so it'll be recalculated.  */
+  INSN_TICK (insn) = INVALID_TICK;
+}
+
+DEF_VEC_P(dep_t);
+DEF_VEC_ALLOC_P(dep_t, heap);
+
+/* Two VECs, one to hold dependencies for which pattern replacements
+   need to be applied or restored at the start of the next cycle, and
+   another to hold an integer that is either one, to apply the
+   corresponding replacement, or zero to restore it.  */
+static VEC(dep_t, heap) *next_cycle_replace_deps;
+static VEC(int, heap) *next_cycle_apply;
+
+static void apply_replacement (dep_t, bool);
+static void restore_pattern (dep_t, bool);
+
 /* Look at the remaining dependencies for insn NEXT, and compute and return
    the TODO_SPEC value we should use for it.  This is called after one of
-   NEXT's dependencies has been resolved.  */
+   NEXT's dependencies has been resolved.
+   We also perform pattern replacements for predication, and for broken
+   replacement dependencies.  The latter is only done if FOR_BACKTRACK is
+   false.  */
 
 static ds_t
-recompute_todo_spec (rtx next)
+recompute_todo_spec (rtx next, bool for_backtrack)
 {
   ds_t new_ds;
   sd_iterator_def sd_it;
-  dep_t dep, control_dep = NULL;
+  dep_t dep, modify_dep = NULL;
   int n_spec = 0;
   int n_control = 0;
+  int n_replace = 0;
   bool first_p = true;
 
   if (sd_lists_empty_p (next, SD_LIST_BACK))
@@ -1142,9 +1186,10 @@  recompute_todo_spec (rtx next)
 
   FOR_EACH_DEP (next, SD_LIST_BACK, sd_it, dep)
     {
+      rtx pro = DEP_PRO (dep);
       ds_t ds = DEP_STATUS (dep) & SPECULATIVE;
 
-      if (DEBUG_INSN_P (DEP_PRO (dep)) && !DEBUG_INSN_P (next))
+      if (DEBUG_INSN_P (pro) && !DEBUG_INSN_P (next))
 	continue;
 
       if (ds)
@@ -1159,15 +1204,47 @@  recompute_todo_spec (rtx next)
 	  else
 	    new_ds = ds_merge (new_ds, ds);
 	}
-      if (DEP_TYPE (dep) == REG_DEP_CONTROL)
+      else if (DEP_TYPE (dep) == REG_DEP_CONTROL)
 	{
-	  n_control++;
-	  control_dep = dep;
+	  if (QUEUE_INDEX (pro) != QUEUE_SCHEDULED)
+	    {
+	      n_control++;
+	      modify_dep = dep;
+	    }
+	  DEP_STATUS (dep) &= ~DEP_CANCELLED;
+	}
+      else if (DEP_REPLACE (dep) != NULL)
+	{
+	  if (QUEUE_INDEX (pro) != QUEUE_SCHEDULED)
+	    {
+	      n_replace++;
+	      modify_dep = dep;
+	    }
 	  DEP_STATUS (dep) &= ~DEP_CANCELLED;
 	}
     }
 
-  if (n_control == 1 && n_spec == 0)
+  if (n_replace > 0 && n_control == 0 && n_spec == 0)
+    {
+      if (!dbg_cnt (sched_breakdep))
+	return HARD_DEP;
+      FOR_EACH_DEP (next, SD_LIST_BACK, sd_it, dep)
+	{
+	  struct dep_replacement *desc = DEP_REPLACE (dep);
+	  if (desc != NULL)
+	    {
+	      if (desc->insn == next && !for_backtrack)
+		{
+		  gcc_assert (n_replace == 1);
+		  apply_replacement (dep, true);
+		}
+	      DEP_STATUS (dep) |= DEP_CANCELLED;
+	    }
+	}
+      return 0;
+    }
+  
+  else if (n_control == 1 && n_replace == 0 && n_spec == 0)
     {
       rtx pro, other, new_pat;
       rtx cond = NULL_RTX;
@@ -1181,7 +1258,7 @@  recompute_todo_spec (rtx next)
 	      && PREDICATED_PAT (next) == NULL_RTX))
 	return HARD_DEP;
 
-      pro = DEP_PRO (control_dep);
+      pro = DEP_PRO (modify_dep);
       other = real_insn_for_shadow (pro);
       if (other != NULL_RTX)
 	pro = other;
@@ -1220,7 +1297,7 @@  recompute_todo_spec (rtx next)
 					       PREDICATED_PAT (next));
 	  gcc_assert (success);
 	}
-      DEP_STATUS (control_dep) |= DEP_CANCELLED;
+      DEP_STATUS (modify_dep) |= DEP_CANCELLED;
       return DEP_CONTROL;
     }
 
@@ -1237,11 +1314,12 @@  recompute_todo_spec (rtx next)
      dependencies, so we return HARD_DEP in such a case.  Also fail if
      we have speculative dependencies with not enough points, or more than
      one control dependency.  */
-  if ((n_spec > 0 && n_control > 0)
+  if ((n_spec > 0 && (n_control > 0 || n_replace > 0))
       || (n_spec > 0
 	  /* Too few points?  */
 	  && ds_weak (new_ds) < spec_info->data_weakness_cutoff)
-      || (n_control > 1))
+      || n_control > 0
+      || n_replace > 0)
     return HARD_DEP;
 
   return new_ds;
@@ -1261,10 +1339,6 @@  static rtx last_nondebug_scheduled_insn;
    first unscheduled one.  */
 static rtx nonscheduled_insns_begin;
 
-/* Cached cost of the instruction.  Use below function to get cost of the
-   insn.  -1 here means that the field is not initialized.  */
-#define INSN_COST(INSN)	(HID (INSN)->cost)
-
 /* Compute cost of executing INSN.
    This is the number of cycles between instruction issue and
    instruction results.  */
@@ -1443,6 +1517,9 @@  contributes_to_priority_p (dep_t dep)
 						    DEP_PRO (dep)))
     return false;
 
+  if (DEP_REPLACE (dep) != NULL)
+    return false;
+
   /* If flag COUNT_SPEC_IN_CRITICAL_PATH is set,
      then speculative instructions will less likely be
      scheduled.  That is because the priority of
@@ -2136,6 +2213,31 @@  model_recompute (rtx insn)
   if (print_p)
     fprintf (sched_dump, MODEL_BAR);
 }
+
+/* After DEP, which was cancelled, has been resolved for insn NEXT,
+   check whether the insn's pattern needs restoring.  */
+static bool
+must_restore_pattern_p (rtx next, dep_t dep)
+{
+  if (QUEUE_INDEX (next) == QUEUE_SCHEDULED)
+    return false;
+
+  if (DEP_TYPE (dep) == REG_DEP_CONTROL)
+    {
+      gcc_assert (ORIG_PAT (next) != NULL_RTX);
+      gcc_assert (next == DEP_CON (dep));
+    }
+  else
+    {
+      struct dep_replacement *desc = DEP_REPLACE (dep);
+      if (desc->insn != next)
+	{
+	  gcc_assert (*desc->loc == desc->orig);
+	  return false;
+	}
+    }
+  return true;
+}
 
 /* model_spill_cost (CL, P, P') returns the cost of increasing the
    pressure on CL from P to P'.  We use this to calculate a "base ECC",
@@ -3735,7 +3837,20 @@  schedule_insn (rtx insn)
 
   check_clobbered_conditions (insn);
 
-  /* Update dependent instructions.  */
+  /* Update dependent instructions.  First, see if by scheduling this insn
+     now we broke a dependence in a way that requires us to change another
+     insn.  */
+  for (sd_it = sd_iterator_start (insn, SD_LIST_SPEC_BACK);
+       sd_iterator_cond (&sd_it, &dep); sd_iterator_next (&sd_it))
+    {
+      struct dep_replacement *desc = DEP_REPLACE (dep);
+      rtx pro = DEP_PRO (dep);
+      if (QUEUE_INDEX (pro) != QUEUE_SCHEDULED
+	  && desc != NULL && desc->insn == pro)
+	apply_replacement (dep, false);
+    }
+
+  /* Go through and resolve forward dependencies.  */
   for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
        sd_iterator_cond (&sd_it, &dep);)
     {
@@ -3749,17 +3864,8 @@  schedule_insn (rtx insn)
 
       if (cancelled)
 	{
-	  if (QUEUE_INDEX (next) != QUEUE_SCHEDULED)
-	    {
-	      int tick = INSN_TICK (next);
-	      gcc_assert (ORIG_PAT (next) != NULL_RTX);
-	      haifa_change_pattern (next, ORIG_PAT (next));
-	      INSN_TICK (next) = tick;
-	      if (sd_lists_empty_p (next, SD_LIST_BACK))
-		TODO_SPEC (next) = 0;
-	      else if (!sd_lists_empty_p (next, SD_LIST_HARD_BACK))
-		TODO_SPEC (next) = HARD_DEP;
-	    }
+	  if (must_restore_pattern_p (next, dep))
+	    restore_pattern (dep, false);
 	  continue;
 	}
 
@@ -3917,6 +4023,16 @@  struct haifa_saved_data
      to 0 when restoring.  */
   int q_size;
   rtx *insn_queue;
+
+  /* Describe pattern replacements that occurred since this backtrack point
+     was queued.  */
+  VEC (dep_t, heap) *replacement_deps;
+  VEC (int, heap) *replace_apply;
+
+  /* A copy of the next-cycle replacement vectors at the time of the backtrack
+     point.  */
+  VEC (dep_t, heap) *next_cycle_deps;
+  VEC (int, heap) *next_cycle_apply;
 };
 
 /* A record, in reverse order, of all scheduled insns which have delay slots
@@ -3973,6 +4089,11 @@  save_backtrack_point (struct delay_pair
 
   save->sched_block = sched_block;
 
+  save->replacement_deps = NULL;
+  save->replace_apply = NULL;
+  save->next_cycle_deps = VEC_copy (dep_t, heap, next_cycle_replace_deps);
+  save->next_cycle_apply = VEC_copy (int, heap, next_cycle_apply);
+
   if (current_sched_info->save_state)
     save->fe_saved_data = (*current_sched_info->save_state) ();
 
@@ -4042,6 +4163,25 @@  toggle_cancelled_flags (bool set)
     }
 }
 
+/* Undo the replacements that have occurred after backtrack point SAVE
+   was placed.  */
+static void
+undo_replacements_for_backtrack (struct haifa_saved_data *save)
+{
+  while (!VEC_empty (dep_t, save->replacement_deps))
+    {
+      dep_t dep = VEC_pop (dep_t, save->replacement_deps);
+      int apply_p = VEC_pop (int, save->replace_apply);
+
+      if (apply_p)
+	restore_pattern (dep, true);
+      else
+	apply_replacement (dep, true);
+    }
+  VEC_free (dep_t, heap, save->replacement_deps);
+  VEC_free (int, heap, save->replace_apply);
+}
+
 /* Pop entries from the SCHEDULED_INSNS vector up to and including INSN.
    Restore their dependencies to an unresolved state, and mark them as
    queued nowhere.  */
@@ -4107,7 +4247,7 @@  unschedule_insns_until (rtx insn)
 	    haifa_change_pattern (con, ORIG_PAT (con));
 	}
       else if (QUEUE_INDEX (con) != QUEUE_SCHEDULED)
-	TODO_SPEC (con) = recompute_todo_spec (con);
+	TODO_SPEC (con) = recompute_todo_spec (con, true);
     }
   VEC_free (rtx, heap, recompute_vec);
 }
@@ -4135,6 +4275,10 @@  restore_last_backtrack_point (struct sch
       targetm.sched.free_sched_context (save->be_saved_data);
     }
 
+  /* Do this first since it clobbers INSN_TICK of the involved
+     instructions.  */
+  undo_replacements_for_backtrack (save);
+
   /* Clear the QUEUE_INDEX of everything in the ready list or one
      of the queues.  */
   if (ready.n_ready > 0)
@@ -4170,7 +4314,7 @@  restore_last_backtrack_point (struct sch
 	{
 	  rtx insn = first[i];
 	  QUEUE_INDEX (insn) = QUEUE_READY;
-	  TODO_SPEC (insn) = recompute_todo_spec (insn);
+	  TODO_SPEC (insn) = recompute_todo_spec (insn, true);
 	  INSN_TICK (insn) = save->clock_var;
 	}
     }
@@ -4187,7 +4331,7 @@  restore_last_backtrack_point (struct sch
 	{
 	  rtx x = XEXP (link, 0);
 	  QUEUE_INDEX (x) = i;
-	  TODO_SPEC (x) = recompute_todo_spec (x);
+	  TODO_SPEC (x) = recompute_todo_spec (x, true);
 	  INSN_TICK (x) = save->clock_var + i;
 	}
     }
@@ -4208,6 +4352,10 @@  restore_last_backtrack_point (struct sch
 
   mark_backtrack_feeds (save->delay_pair->i2, 0);
 
+  gcc_assert (VEC_empty (dep_t, next_cycle_replace_deps));
+  next_cycle_replace_deps = VEC_copy (dep_t, heap, save->next_cycle_deps);
+  next_cycle_apply = VEC_copy (int, heap, save->next_cycle_apply);
+
   free (save);
 
   for (save = backtrack_queue; save; save = save->next)
@@ -4237,7 +4385,14 @@  free_topmost_backtrack_point (bool reset
 	  INSN_EXACT_TICK (pair->i2) = INVALID_TICK;
 	  pair = pair->next_same_i1;
 	}
+      undo_replacements_for_backtrack (save);
     }
+  else
+    {
+      VEC_free (dep_t, heap, save->replacement_deps);
+      VEC_free (int, heap, save->replace_apply);
+    }
+
   if (targetm.sched.free_sched_context)
     targetm.sched.free_sched_context (save->be_saved_data);
   if (current_sched_info->restore_state)
@@ -4258,6 +4413,124 @@  free_backtrack_queue (void)
     free_topmost_backtrack_point (false);
 }
 
+/* Apply a replacement described by DESC.  If IMMEDIATELY is false, we
+   may have to postpone the replacement until the start of the next cycle,
+   at which point we will be called again with IMMEDIATELY true.  This is
+   only done for machines which have instruction packets with explicit
+   parallelism however.  */
+static void
+apply_replacement (dep_t dep, bool immediately)
+{
+  struct dep_replacement *desc = DEP_REPLACE (dep);
+  if (!immediately && targetm.sched.exposed_pipeline && reload_completed)
+    {
+      VEC_safe_push (dep_t, heap, next_cycle_replace_deps, dep);
+      VEC_safe_push (int, heap, next_cycle_apply, 1);
+    }
+  else
+    {
+      bool success;
+
+      if (QUEUE_INDEX (desc->insn) == QUEUE_SCHEDULED)
+	return;
+
+      if (sched_verbose >= 5)
+	fprintf (sched_dump, "applying replacement for insn %d\n",
+		 INSN_UID (desc->insn));
+
+      success = validate_change (desc->insn, desc->loc, desc->newval, 0);
+      gcc_assert (success);
+
+      update_insn_after_change (desc->insn);
+      if ((TODO_SPEC (desc->insn) & (HARD_DEP | DEP_POSTPONED)) == 0)
+	fix_tick_ready (desc->insn);
+
+      if (backtrack_queue != NULL)
+	{
+	  VEC_safe_push (dep_t, heap, backtrack_queue->replacement_deps, dep);
+	  VEC_safe_push (int, heap, backtrack_queue->replace_apply, 1);
+	}
+    }
+}
+
+/* We have determined that a pattern involved in DEP must be restored.
+   If IMMEDIATELY is false, we may have to postpone the replacement
+   until the start of the next cycle, at which point we will be called
+   again with IMMEDIATELY true.  */
+static void
+restore_pattern (dep_t dep, bool immediately)
+{
+  rtx next = DEP_CON (dep);
+  int tick = INSN_TICK (next);
+
+  /* If we already scheduled the insn, the modified version is
+     correct.  */
+  if (QUEUE_INDEX (next) == QUEUE_SCHEDULED)
+    return;
+
+  if (!immediately && targetm.sched.exposed_pipeline && reload_completed)
+    {
+      VEC_safe_push (dep_t, heap, next_cycle_replace_deps, dep);
+      VEC_safe_push (int, heap, next_cycle_apply, 0);
+      return;
+    }
+
+
+  if (DEP_TYPE (dep) == REG_DEP_CONTROL)
+    {
+      if (sched_verbose >= 5)
+	fprintf (sched_dump, "restoring pattern for insn %d\n",
+		 INSN_UID (next));
+      haifa_change_pattern (next, ORIG_PAT (next));
+    }
+  else
+    {
+      struct dep_replacement *desc = DEP_REPLACE (dep);
+      bool success;
+
+      if (sched_verbose >= 5)
+	fprintf (sched_dump, "restoring pattern for insn %d\n",
+		 INSN_UID (desc->insn));
+      tick = INSN_TICK (desc->insn);
+
+      success = validate_change (desc->insn, desc->loc, desc->orig, 0);
+      gcc_assert (success);
+      update_insn_after_change (desc->insn);
+      if (backtrack_queue != NULL)
+	{
+	  VEC_safe_push (dep_t, heap, backtrack_queue->replacement_deps, dep);
+	  VEC_safe_push (int, heap, backtrack_queue->replace_apply, 0);
+	}
+    }
+  INSN_TICK (next) = tick;
+  if (TODO_SPEC (next) == DEP_POSTPONED)
+    return;
+
+  if (sd_lists_empty_p (next, SD_LIST_BACK))
+    TODO_SPEC (next) = 0;
+  else if (!sd_lists_empty_p (next, SD_LIST_HARD_BACK))
+    TODO_SPEC (next) = HARD_DEP;
+}
+
+/* Perform pattern replacements that were queued up until the next
+   cycle.  */
+static void
+perform_replacements_new_cycle (void)
+{
+  int i;
+  dep_t dep;
+  FOR_EACH_VEC_ELT (dep_t, next_cycle_replace_deps, i, dep)
+    {
+      int apply_p = VEC_index (int, next_cycle_apply, i);
+      if (apply_p)
+	apply_replacement (dep, true);
+      else
+	restore_pattern (dep, true);
+    }
+  VEC_truncate (dep_t, next_cycle_replace_deps, 0);
+  VEC_truncate (int, next_cycle_apply, 0);
+}
+
 /* Compute INSN_TICK_ESTIMATE for INSN.  PROCESSED is a bitmap of
    instructions we've previously encountered, a set bit prevents
    recursion.  BUDGET is a limit on how far ahead we look, it is
@@ -4515,6 +4788,30 @@  restore_other_notes (rtx head, basic_blo
   return head;
 }
 
+/* When we know we are going to discard the schedule due to a failed attempt
+   at modulo scheduling, undo all replacements.  */
+static void
+undo_all_replacements (void)
+{
+  rtx insn;
+  int i;
+
+  FOR_EACH_VEC_ELT (rtx, scheduled_insns, i, insn)
+    {
+      sd_iterator_def sd_it;
+      dep_t dep;
+
+      /* See if we must undo a replacement.  */
+      for (sd_it = sd_iterator_start (insn, SD_LIST_RES_FORW);
+	   sd_iterator_cond (&sd_it, &dep); sd_iterator_next (&sd_it))
+	{
+	  struct dep_replacement *desc = DEP_REPLACE (dep);
+	  if (desc != NULL)
+	    validate_change (desc->insn, desc->loc, desc->orig, 0);
+	}
+    }
+}
+
 /* Move insns that became ready to fire from queue to ready list.  */
 
 static void
@@ -5556,6 +5853,9 @@  schedule_block (basic_block *target_bb)
   rtx head = NEXT_INSN (prev_head);
   rtx tail = PREV_INSN (next_tail);
 
+  if ((current_sched_info->flags & DONT_BREAK_DEPENDENCIES) == 0)
+    find_modifiable_mems (head, tail);
+
   /* We used to have code to avoid getting parameters moved from hard
      argument registers into pseudos.
 
@@ -5676,6 +5976,7 @@  schedule_block (basic_block *target_bb)
   /* Loop until all the insns in BB are scheduled.  */
   while ((*current_sched_info->schedule_more_p) ())
     {
+      perform_replacements_new_cycle ();
       do
 	{
 	  start_clock_var = clock_var;
@@ -5892,7 +6193,7 @@  schedule_block (basic_block *target_bb)
 	    /* We normally get here only if we don't want to move
 	       insn from the split block.  */
 	    {
-	      TODO_SPEC (insn) = HARD_DEP;
+	      TODO_SPEC (insn) = DEP_POSTPONED;
 	      goto restart_choose_ready;
 	    }
 
@@ -6003,6 +6304,8 @@  schedule_block (basic_block *target_bb)
 	  gcc_assert (failed);
 
 	  failed_insn = failed->delay_pair->i1;
+	  /* Clear these queues.  */
+	  perform_replacements_new_cycle ();
 	  toggle_cancelled_flags (false);
 	  unschedule_insns_until (failed_insn);
 	  while (failed != backtrack_queue)
@@ -6030,6 +6333,7 @@  schedule_block (basic_block *target_bb)
   if (ls.modulo_epilogue)
     success = true;
  end_schedule:
+  perform_replacements_new_cycle ();
   if (modulo_ii > 0)
     {
       /* Once again, debug insn suckiness: they can be on the ready list
@@ -6069,6 +6373,9 @@  schedule_block (basic_block *target_bb)
 	}
     }
 
+  if (!success)
+    undo_all_replacements ();
+
   /* Debug info.  */
   if (sched_verbose)
     {
@@ -6552,14 +6859,15 @@  try_ready (rtx next)
 
   old_ts = TODO_SPEC (next);
 
-  gcc_assert (!(old_ts & ~(SPECULATIVE | HARD_DEP | DEP_CONTROL))
-	      && ((old_ts & HARD_DEP)
+  gcc_assert (!(old_ts & ~(SPECULATIVE | HARD_DEP | DEP_CONTROL | DEP_POSTPONED))
+	      && (old_ts == HARD_DEP
+		  || old_ts == DEP_POSTPONED
 		  || (old_ts & SPECULATIVE)
-		  || (old_ts & DEP_CONTROL)));
+		  || old_ts == DEP_CONTROL));
 
-  new_ts = recompute_todo_spec (next);
+  new_ts = recompute_todo_spec (next, false);
 
-  if (new_ts & HARD_DEP)
+  if (new_ts & (HARD_DEP | DEP_POSTPONED))
     gcc_assert (new_ts == old_ts
 		&& QUEUE_INDEX (next) == QUEUE_NOWHERE);
   else if (current_sched_info->new_ready)
@@ -6627,7 +6935,7 @@  try_ready (rtx next)
 
   TODO_SPEC (next) = new_ts;
 
-  if (new_ts & HARD_DEP)
+  if (new_ts & (HARD_DEP | DEP_POSTPONED))
     {
       /* We can't assert (QUEUE_INDEX (next) == QUEUE_NOWHERE) here because
 	 control-speculative NEXT could have been discarded by sched-rgn.c
@@ -7647,27 +7955,13 @@  fix_recovery_deps (basic_block rec)
 static bool
 haifa_change_pattern (rtx insn, rtx new_pat)
 {
-  sd_iterator_def sd_it;
-  dep_t dep;
   int t;
 
   t = validate_change (insn, &PATTERN (insn), new_pat, 0);
   if (!t)
     return false;
-  dfa_clear_single_insn_cache (insn);
 
-  sd_it = sd_iterator_start (insn,
-			     SD_LIST_FORW | SD_LIST_BACK | SD_LIST_RES_BACK);
-  while (sd_iterator_cond (&sd_it, &dep))
-    {
-      DEP_COST (dep) = UNKNOWN_DEP_COST;
-      sd_iterator_next (&sd_it);
-    }
-
-  /* Invalidate INSN_COST, so it'll be recalculated.  */
-  INSN_COST (insn) = -1;
-  /* Invalidate INSN_TICK, so it'll be recalculated.  */
-  INSN_TICK (insn) = INVALID_TICK;
+  update_insn_after_change (insn);
   return true;
 }
 
Index: gcc/sched-deps.c
===================================================================
--- gcc/sched-deps.c	(revision 189284)
+++ gcc/sched-deps.c	(working copy)
@@ -38,6 +38,7 @@  along with GCC; see the file COPYING3.
 #include "insn-attr.h"
 #include "except.h"
 #include "recog.h"
+#include "emit-rtl.h"
 #include "sched-int.h"
 #include "params.h"
 #include "cselib.h"
@@ -108,6 +109,9 @@  init_dep_1 (dep_t dep, rtx pro, rtx con,
   DEP_TYPE (dep) = type;
   DEP_STATUS (dep) = ds;
   DEP_COST (dep) = UNKNOWN_DEP_COST;
+  DEP_NONREG (dep) = 0;
+  DEP_MULTIPLE (dep) = 0;
+  DEP_REPLACE (dep) = NULL;
 }
 
 /* Init DEP with the arguments.
@@ -433,6 +437,8 @@  dep_spec_p (dep_t dep)
       if (DEP_TYPE (dep) == REG_DEP_CONTROL)
 	return true;
     }
+  if (DEP_REPLACE (dep) != NULL)
+    return true;
   return false;
 }
 
@@ -471,11 +477,14 @@  static bitmap_head *control_dependency_c
 static bitmap_head *spec_dependency_cache = NULL;
 static int cache_size;
 
+/* True if we should mark added dependencies as non-register deps.  */
+static bool mark_as_hard;
+
 static int deps_may_trap_p (const_rtx);
 static void add_dependence_1 (rtx, rtx, enum reg_note);
-static void add_dependence_list (rtx, rtx, int, enum reg_note);
+static void add_dependence_list (rtx, rtx, int, enum reg_note, bool);
 static void add_dependence_list_and_free (struct deps_desc *, rtx,
-					  rtx *, int, enum reg_note);
+					  rtx *, int, enum reg_note, bool);
 static void delete_all_dependences (rtx);
 static void chain_to_prev_insn (rtx);
 
@@ -1135,6 +1144,9 @@  update_dep (dep_t dep, dep_t new_dep,
   enum reg_note old_type = DEP_TYPE (dep);
   bool was_spec = dep_spec_p (dep);
 
+  DEP_NONREG (dep) |= DEP_NONREG (new_dep);
+  DEP_MULTIPLE (dep) = 1;
+
   /* If this is a more restrictive type of dependence than the
      existing one, then change the existing dependence to this
      type.  */
@@ -1537,31 +1549,36 @@  add_dependence (rtx con, rtx pro, enum r
 	    fprintf (sched_dump, "making DEP_CONTROL for %d\n",
 		     INSN_UID (real_pro));
 	  add_dependence_list (con, INSN_COND_DEPS (real_pro), 0,
-			       REG_DEP_TRUE);
+			       REG_DEP_TRUE, false);
 	}
     }
 	  
   add_dependence_1 (con, pro, dep_type);
 }
 
-/* A convenience wrapper to operate on an entire list.  */
+/* A convenience wrapper to operate on an entire list.  HARD should be
+   true if DEP_NONREG should be set on newly created dependencies.  */
 
 static void
-add_dependence_list (rtx insn, rtx list, int uncond, enum reg_note dep_type)
+add_dependence_list (rtx insn, rtx list, int uncond, enum reg_note dep_type,
+		     bool hard)
 {
+  mark_as_hard = hard;
   for (; list; list = XEXP (list, 1))
     {
       if (uncond || ! sched_insns_conditions_mutex_p (insn, XEXP (list, 0)))
 	add_dependence (insn, XEXP (list, 0), dep_type);
     }
+  mark_as_hard = false;
 }
 
 /* Similar, but free *LISTP at the same time, when the context
-   is not readonly.  */
+   is not readonly.  HARD should be true if DEP_NONREG should be set on
+   newly created dependencies.  */
 
 static void
 add_dependence_list_and_free (struct deps_desc *deps, rtx insn, rtx *listp,
-                              int uncond, enum reg_note dep_type)
+                              int uncond, enum reg_note dep_type, bool hard)
 {
   rtx list, next;
 
@@ -1570,10 +1587,11 @@  add_dependence_list_and_free (struct dep
      disregarded.  */
   if (deps->readonly || DEBUG_INSN_P (insn))
     {
-      add_dependence_list (insn, *listp, uncond, dep_type);
+      add_dependence_list (insn, *listp, uncond, dep_type, hard);
       return;
     }
 
+  mark_as_hard = hard;
   for (list = *listp, *listp = NULL; list ; list = next)
     {
       next = XEXP (list, 1);
@@ -1581,6 +1599,7 @@  add_dependence_list_and_free (struct dep
 	add_dependence (insn, XEXP (list, 0), dep_type);
       free_INSN_LIST_node (list);
     }
+  mark_as_hard = false;
 }
 
 /* Remove all occurrences of INSN from LIST.  Return the number of
@@ -1746,7 +1765,7 @@  flush_pending_lists (struct deps_desc *d
   if (for_write)
     {
       add_dependence_list_and_free (deps, insn, &deps->pending_read_insns,
-                                    1, REG_DEP_ANTI);
+                                    1, REG_DEP_ANTI, true);
       if (!deps->readonly)
         {
           free_EXPR_LIST_list (&deps->pending_read_mems);
@@ -1755,14 +1774,16 @@  flush_pending_lists (struct deps_desc *d
     }
 
   add_dependence_list_and_free (deps, insn, &deps->pending_write_insns, 1,
-				for_read ? REG_DEP_ANTI : REG_DEP_OUTPUT);
+				for_read ? REG_DEP_ANTI : REG_DEP_OUTPUT,
+				true);
 
   add_dependence_list_and_free (deps, insn,
                                 &deps->last_pending_memory_flush, 1,
-                                for_read ? REG_DEP_ANTI : REG_DEP_OUTPUT);
+                                for_read ? REG_DEP_ANTI : REG_DEP_OUTPUT,
+				true);
 
   add_dependence_list_and_free (deps, insn, &deps->pending_jump_insns, 1,
-				REG_DEP_ANTI);
+				REG_DEP_ANTI, true);
 
   if (!deps->readonly)
     {
@@ -1827,6 +1848,7 @@  haifa_note_mem_dep (rtx mem, rtx pending
 
     init_dep_1 (dep, pending_insn, cur_insn, ds_to_dt (ds),
                 current_sched_info->flags & USE_DEPS_LIST ? ds : 0);
+    DEP_NONREG (dep) = 1;
     maybe_add_or_update_dep_1 (dep, false, pending_mem, mem);
   }
 
@@ -1839,6 +1861,8 @@  haifa_note_dep (rtx elem, ds_t ds)
   dep_t dep = &_dep;
 
   init_dep (dep, elem, cur_insn, ds_to_dt (ds));
+  if (mark_as_hard)
+    DEP_NONREG (dep) = 1;
   maybe_add_or_update_dep_1 (dep, false, NULL_RTX, NULL_RTX);
 }
 
@@ -2343,7 +2367,7 @@  sched_analyze_reg (struct deps_desc *dep
 	      = alloc_INSN_LIST (insn, deps->sched_before_next_call);
 	  else
 	    add_dependence_list (insn, deps->last_function_call, 1,
-				 REG_DEP_ANTI);
+				 REG_DEP_ANTI, false);
 	}
     }
 }
@@ -2499,9 +2523,9 @@  sched_analyze_1 (struct deps_desc *deps,
 	    }
 
 	  add_dependence_list (insn, deps->last_pending_memory_flush, 1,
-			       REG_DEP_ANTI);
+			       REG_DEP_ANTI, true);
 	  add_dependence_list (insn, deps->pending_jump_insns, 1,
-			       REG_DEP_CONTROL);
+			       REG_DEP_CONTROL, true);
 
           if (!deps->readonly)
             add_insn_mem_dependence (deps, false, insn, dest);
@@ -2801,7 +2825,7 @@  sched_analyze_insn (struct deps_desc *de
     /* Avoid moving trapping instructions across function calls that might
        not always return.  */
     add_dependence_list (insn, deps->last_function_call_may_noreturn,
-			 1, REG_DEP_ANTI);
+			 1, REG_DEP_ANTI, true);
 
   /* We must avoid creating a situation in which two successors of the
      current block have different unwind info after scheduling.  If at any
@@ -2818,7 +2842,8 @@  sched_analyze_insn (struct deps_desc *de
 	= alloc_INSN_LIST (insn, deps->sched_before_next_jump);
 
       /* Make sure epilogue insn is scheduled after preceding jumps.  */
-      add_dependence_list (insn, deps->pending_jump_insns, 1, REG_DEP_ANTI);
+      add_dependence_list (insn, deps->pending_jump_insns, 1, REG_DEP_ANTI,
+			   true);
     }
 
   if (code == COND_EXEC)
@@ -2839,7 +2864,7 @@  sched_analyze_insn (struct deps_desc *de
 	 instruction so that reg-stack won't get confused.  */
       if (code == CLOBBER)
 	add_dependence_list (insn, deps->last_function_call, 1,
-			     REG_DEP_OUTPUT);
+			     REG_DEP_OUTPUT, true);
     }
   else if (code == PARALLEL)
     {
@@ -2900,11 +2925,12 @@  sched_analyze_insn (struct deps_desc *de
               EXECUTE_IF_SET_IN_REG_SET (reg_pending_control_uses, 0, i, rsi)
                 {
                   struct deps_reg *reg_last = &deps->reg_last[i];
-                  add_dependence_list (insn, reg_last->sets, 0, REG_DEP_ANTI);
+                  add_dependence_list (insn, reg_last->sets, 0, REG_DEP_ANTI,
+				       false);
                   add_dependence_list (insn, reg_last->implicit_sets,
-				       0, REG_DEP_ANTI);
+				       0, REG_DEP_ANTI, false);
                   add_dependence_list (insn, reg_last->clobbers, 0,
-				       REG_DEP_ANTI);
+				       REG_DEP_ANTI, false);
                 }
             }
 
@@ -2934,9 +2960,9 @@  sched_analyze_insn (struct deps_desc *de
 	    }
 
 	  add_dependence_list (insn, deps->last_pending_memory_flush, 1,
-			       REG_DEP_ANTI);
+			       REG_DEP_ANTI, true);
 	  add_dependence_list (insn, deps->pending_jump_insns, 1,
-			       REG_DEP_ANTI);
+			       REG_DEP_ANTI, true);
 	}
     }
 
@@ -2969,19 +2995,20 @@  sched_analyze_insn (struct deps_desc *de
 	add_dependence (insn, prev, REG_DEP_ANTI);
 
       add_dependence_list (insn, deps->last_function_call, 1,
-			   REG_DEP_ANTI);
+			   REG_DEP_ANTI, false);
 
-      for (u = deps->last_pending_memory_flush; u; u = XEXP (u, 1))
-	if (!sel_sched_p ())
+      if (!sel_sched_p ())
+	for (u = deps->last_pending_memory_flush; u; u = XEXP (u, 1))
 	  add_dependence (insn, XEXP (u, 0), REG_DEP_ANTI);
 
       EXECUTE_IF_SET_IN_REG_SET (reg_pending_uses, 0, i, rsi)
 	{
 	  struct deps_reg *reg_last = &deps->reg_last[i];
-	  add_dependence_list (insn, reg_last->sets, 1, REG_DEP_ANTI);
+	  add_dependence_list (insn, reg_last->sets, 1, REG_DEP_ANTI, false);
 	  /* There's no point in making REG_DEP_CONTROL dependencies for
 	     debug insns.  */
-	  add_dependence_list (insn, reg_last->clobbers, 1, REG_DEP_ANTI);
+	  add_dependence_list (insn, reg_last->clobbers, 1, REG_DEP_ANTI,
+			       false);
 
 	  if (!deps->readonly)
 	    reg_last->uses = alloc_INSN_LIST (insn, reg_last->uses);
@@ -3007,9 +3034,11 @@  sched_analyze_insn (struct deps_desc *de
       EXECUTE_IF_SET_IN_REG_SET (reg_pending_uses, 0, i, rsi)
 	{
 	  struct deps_reg *reg_last = &deps->reg_last[i];
-	  add_dependence_list (insn, reg_last->sets, 0, REG_DEP_TRUE);
-	  add_dependence_list (insn, reg_last->implicit_sets, 0, REG_DEP_ANTI);
-	  add_dependence_list (insn, reg_last->clobbers, 0, REG_DEP_TRUE);
+	  add_dependence_list (insn, reg_last->sets, 0, REG_DEP_TRUE, false);
+	  add_dependence_list (insn, reg_last->implicit_sets, 0, REG_DEP_ANTI,
+			       false);
+	  add_dependence_list (insn, reg_last->clobbers, 0, REG_DEP_TRUE,
+			       false);
 
 	  if (!deps->readonly)
 	    {
@@ -3022,10 +3051,11 @@  sched_analyze_insn (struct deps_desc *de
 	if (TEST_HARD_REG_BIT (implicit_reg_pending_uses, i))
 	  {
 	    struct deps_reg *reg_last = &deps->reg_last[i];
-	    add_dependence_list (insn, reg_last->sets, 0, REG_DEP_TRUE);
+	    add_dependence_list (insn, reg_last->sets, 0, REG_DEP_TRUE, false);
 	    add_dependence_list (insn, reg_last->implicit_sets, 0,
-				 REG_DEP_ANTI);
-	    add_dependence_list (insn, reg_last->clobbers, 0, REG_DEP_TRUE);
+				 REG_DEP_ANTI, false);
+	    add_dependence_list (insn, reg_last->clobbers, 0, REG_DEP_TRUE,
+				 false);
 
 	    if (!deps->readonly)
 	      {
@@ -3060,12 +3090,14 @@  sched_analyze_insn (struct deps_desc *de
 	  EXECUTE_IF_SET_IN_REG_SET (reg_pending_clobbers, 0, i, rsi)
 	    {
 	      struct deps_reg *reg_last = &deps->reg_last[i];
-	      add_dependence_list (insn, reg_last->sets, 0, REG_DEP_OUTPUT);
+	      add_dependence_list (insn, reg_last->sets, 0, REG_DEP_OUTPUT,
+				   false);
 	      add_dependence_list (insn, reg_last->implicit_sets, 0,
-				   REG_DEP_ANTI);
-	      add_dependence_list (insn, reg_last->uses, 0, REG_DEP_ANTI);
+				   REG_DEP_ANTI, false);
+	      add_dependence_list (insn, reg_last->uses, 0, REG_DEP_ANTI,
+				   false);
 	      add_dependence_list (insn, reg_last->control_uses, 0,
-				   REG_DEP_CONTROL);
+				   REG_DEP_CONTROL, false);
 
 	      if (!deps->readonly)
 		{
@@ -3077,13 +3109,16 @@  sched_analyze_insn (struct deps_desc *de
 	  EXECUTE_IF_SET_IN_REG_SET (reg_pending_sets, 0, i, rsi)
 	    {
 	      struct deps_reg *reg_last = &deps->reg_last[i];
-	      add_dependence_list (insn, reg_last->sets, 0, REG_DEP_OUTPUT);
+	      add_dependence_list (insn, reg_last->sets, 0, REG_DEP_OUTPUT,
+				   false);
 	      add_dependence_list (insn, reg_last->implicit_sets, 0,
-				   REG_DEP_ANTI);
-	      add_dependence_list (insn, reg_last->clobbers, 0, REG_DEP_OUTPUT);
-	      add_dependence_list (insn, reg_last->uses, 0, REG_DEP_ANTI);
+				   REG_DEP_ANTI, false);
+	      add_dependence_list (insn, reg_last->clobbers, 0, REG_DEP_OUTPUT,
+				   false);
+	      add_dependence_list (insn, reg_last->uses, 0, REG_DEP_ANTI,
+				   false);
 	      add_dependence_list (insn, reg_last->control_uses, 0,
-				   REG_DEP_CONTROL);
+				   REG_DEP_CONTROL, false);
 
 	      if (!deps->readonly)
 		reg_last->sets = alloc_INSN_LIST (insn, reg_last->sets);
@@ -3098,17 +3133,18 @@  sched_analyze_insn (struct deps_desc *de
 		  || reg_last->clobbers_length > MAX_PENDING_LIST_LENGTH)
 		{
 		  add_dependence_list_and_free (deps, insn, &reg_last->sets, 0,
-						REG_DEP_OUTPUT);
+						REG_DEP_OUTPUT, false);
 		  add_dependence_list_and_free (deps, insn,
 						&reg_last->implicit_sets, 0,
-						REG_DEP_ANTI);
+						REG_DEP_ANTI, false);
 		  add_dependence_list_and_free (deps, insn, &reg_last->uses, 0,
-						REG_DEP_ANTI);
+						REG_DEP_ANTI, false);
 		  add_dependence_list_and_free (deps, insn,
 						&reg_last->control_uses, 0,
-						REG_DEP_ANTI);
-		  add_dependence_list_and_free
-		    (deps, insn, &reg_last->clobbers, 0, REG_DEP_OUTPUT);
+						REG_DEP_ANTI, false);
+		  add_dependence_list_and_free (deps, insn,
+						&reg_last->clobbers, 0,
+						REG_DEP_OUTPUT, false);
 
 		  if (!deps->readonly)
 		    {
@@ -3119,12 +3155,14 @@  sched_analyze_insn (struct deps_desc *de
 		}
 	      else
 		{
-		  add_dependence_list (insn, reg_last->sets, 0, REG_DEP_OUTPUT);
+		  add_dependence_list (insn, reg_last->sets, 0, REG_DEP_OUTPUT,
+				       false);
 		  add_dependence_list (insn, reg_last->implicit_sets, 0,
-				       REG_DEP_ANTI);
-		  add_dependence_list (insn, reg_last->uses, 0, REG_DEP_ANTI);
+				       REG_DEP_ANTI, false);
+		  add_dependence_list (insn, reg_last->uses, 0, REG_DEP_ANTI,
+				       false);
 		  add_dependence_list (insn, reg_last->control_uses, 0,
-				       REG_DEP_CONTROL);
+				       REG_DEP_CONTROL, false);
 		}
 
 	      if (!deps->readonly)
@@ -3139,16 +3177,16 @@  sched_analyze_insn (struct deps_desc *de
 	      struct deps_reg *reg_last = &deps->reg_last[i];
 
 	      add_dependence_list_and_free (deps, insn, &reg_last->sets, 0,
-					    REG_DEP_OUTPUT);
+					    REG_DEP_OUTPUT, false);
 	      add_dependence_list_and_free (deps, insn,
 					    &reg_last->implicit_sets,
-					    0, REG_DEP_ANTI);
+					    0, REG_DEP_ANTI, false);
 	      add_dependence_list_and_free (deps, insn, &reg_last->clobbers, 0,
-					    REG_DEP_OUTPUT);
+					    REG_DEP_OUTPUT, false);
 	      add_dependence_list_and_free (deps, insn, &reg_last->uses, 0,
-					    REG_DEP_ANTI);
+					    REG_DEP_ANTI, false);
 	      add_dependence_list (insn, reg_last->control_uses, 0,
-				   REG_DEP_CONTROL);
+				   REG_DEP_CONTROL, false);
 
 	      if (!deps->readonly)
 		{
@@ -3173,10 +3211,11 @@  sched_analyze_insn (struct deps_desc *de
     if (TEST_HARD_REG_BIT (implicit_reg_pending_clobbers, i))
       {
 	struct deps_reg *reg_last = &deps->reg_last[i];
-	add_dependence_list (insn, reg_last->sets, 0, REG_DEP_ANTI);
-	add_dependence_list (insn, reg_last->clobbers, 0, REG_DEP_ANTI);
-	add_dependence_list (insn, reg_last->uses, 0, REG_DEP_ANTI);
-	add_dependence_list (insn, reg_last->control_uses, 0, REG_DEP_ANTI);
+	add_dependence_list (insn, reg_last->sets, 0, REG_DEP_ANTI, false);
+	add_dependence_list (insn, reg_last->clobbers, 0, REG_DEP_ANTI, false);
+	add_dependence_list (insn, reg_last->uses, 0, REG_DEP_ANTI, false);
+	add_dependence_list (insn, reg_last->control_uses, 0, REG_DEP_ANTI,
+			     false);
 
 	if (!deps->readonly)
 	  reg_last->implicit_sets
@@ -3214,15 +3253,16 @@  sched_analyze_insn (struct deps_desc *de
 	  EXECUTE_IF_SET_IN_REG_SET (&deps->reg_last_in_use, 0, i, rsi)
 	    {
 	      struct deps_reg *reg_last = &deps->reg_last[i];
-	      add_dependence_list (insn, reg_last->uses, 0, REG_DEP_ANTI);
+	      add_dependence_list (insn, reg_last->uses, 0, REG_DEP_ANTI,
+				   true);
 	      add_dependence_list (insn, reg_last->sets, 0,
 				   reg_pending_barrier == TRUE_BARRIER
-				   ? REG_DEP_TRUE : REG_DEP_ANTI);
+				   ? REG_DEP_TRUE : REG_DEP_ANTI, true);
 	      add_dependence_list (insn, reg_last->implicit_sets, 0,
-				   REG_DEP_ANTI);
+				   REG_DEP_ANTI, true);
 	      add_dependence_list (insn, reg_last->clobbers, 0,
 				   reg_pending_barrier == TRUE_BARRIER
-				   ? REG_DEP_TRUE : REG_DEP_ANTI);
+				   ? REG_DEP_TRUE : REG_DEP_ANTI, true);
 	    }
 	}
       else
@@ -3231,19 +3271,21 @@  sched_analyze_insn (struct deps_desc *de
 	    {
 	      struct deps_reg *reg_last = &deps->reg_last[i];
 	      add_dependence_list_and_free (deps, insn, &reg_last->uses, 0,
-					    REG_DEP_ANTI);
+					    REG_DEP_ANTI, true);
 	      add_dependence_list_and_free (deps, insn,
 					    &reg_last->control_uses, 0,
-					    REG_DEP_CONTROL);
+					    REG_DEP_CONTROL, true);
 	      add_dependence_list_and_free (deps, insn, &reg_last->sets, 0,
 					    reg_pending_barrier == TRUE_BARRIER
-					    ? REG_DEP_TRUE : REG_DEP_ANTI);
+					    ? REG_DEP_TRUE : REG_DEP_ANTI,
+					    true);
 	      add_dependence_list_and_free (deps, insn,
 					    &reg_last->implicit_sets, 0,
-					    REG_DEP_ANTI);
+					    REG_DEP_ANTI, true);
 	      add_dependence_list_and_free (deps, insn, &reg_last->clobbers, 0,
 					    reg_pending_barrier == TRUE_BARRIER
-					    ? REG_DEP_TRUE : REG_DEP_ANTI);
+					    ? REG_DEP_TRUE : REG_DEP_ANTI,
+					    true);
 
               if (!deps->readonly)
                 {
@@ -3526,7 +3568,7 @@  deps_analyze_insn (struct deps_desc *dep
       /* For each insn which shouldn't cross a jump, add a dependence.  */
       add_dependence_list_and_free (deps, insn,
 				    &deps->sched_before_next_jump, 1,
-				    REG_DEP_ANTI);
+				    REG_DEP_ANTI, true);
 
       sched_analyze_insn (deps, PATTERN (insn), insn);
     }
@@ -3582,7 +3624,7 @@  deps_analyze_insn (struct deps_desc *dep
          between that insn and this call insn.  */
       add_dependence_list_and_free (deps, insn,
                                     &deps->sched_before_next_call, 1,
-                                    REG_DEP_ANTI);
+                                    REG_DEP_ANTI, true);
 
       sched_analyze_insn (deps, PATTERN (insn), insn);
 
@@ -4476,4 +4518,305 @@  check_dep (dep_t dep, bool relaxed_p)
 }
 #endif /* ENABLE_CHECKING */
 
+/* Holds information about a pair of memory reference and register increment
+   insns which depend on each other, but could possibly be interchanged.  */
+struct mem_inc_info
+{
+  rtx inc_insn;
+  rtx mem_insn;
+
+  rtx *mem_loc;
+  /* A register occurring in the memory address for which we wish to break
+     the dependence.  This must be identical to the destination register of
+     the increment.  */
+  rtx mem_reg0;
+  /* Any kind of index that is added to that register.  */
+  rtx mem_index;
+  /* The constant offset used in the memory address.  */
+  HOST_WIDE_INT mem_constant;
+  /* The constant added in the increment insn.  Negated if the increment is
+     after the memory address.  */
+  HOST_WIDE_INT inc_constant;
+  /* The source register used in the increment.  May be different from mem_reg0
+     if the increment occurs before the memory address.  */
+  rtx inc_input;
+};
+
+/* Verify that the memory location described in MII can be replaced with
+   one using NEW_ADDR.  Return the new memory reference or NULL_RTX.  The
+   insn remains unchanged by this function.  */
+
+static rtx
+attempt_change (struct mem_inc_info *mii, rtx new_addr)
+{
+  rtx mem = *mii->mem_loc;
+  rtx new_mem;
+
+  /* Jump thru a lot of hoops to keep the attributes up to date.  We
+     do not want to call one of the change address variants that take
+     an offset even though we know the offset in many cases.  These
+     assume you are changing where the address is pointing by the
+     offset.  */
+  new_mem = replace_equiv_address_nv (mem, new_addr);
+  if (! validate_change (mii->mem_insn, mii->mem_loc, new_mem, 0))
+    {
+      if (sched_verbose >= 5)
+	fprintf (sched_dump, "validation failure\n");
+      return NULL_RTX;
+    }
+
+  /* Put back the old one.  */
+  validate_change (mii->mem_insn, mii->mem_loc, mem, 0);
+
+  return new_mem;
+}
+
+/* Return true if INSN is of a form "a = b op c" where a and b are
+   regs.  op is + if c is a reg and +|- if c is a const.  Fill in
+   information in MII about what is found.
+   BEFORE_MEM indicates whether the increment is found before or after
+   a corresponding memory reference.  */
+
+static bool
+parse_add_or_inc (struct mem_inc_info *mii, rtx insn, bool before_mem)
+{
+  rtx pat = single_set (insn);
+  rtx src, cst;
+  bool regs_equal;
+
+  if (RTX_FRAME_RELATED_P (insn) || !pat)
+    return false;
+
+  /* Result must be single reg.  */
+  if (!REG_P (SET_DEST (pat)))
+    return false;
+
+  if (GET_CODE (SET_SRC (pat)) != PLUS
+      && GET_CODE (SET_SRC (pat)) != MINUS)
+    return false;
+
+  mii->inc_insn = insn;
+  src = SET_SRC (pat);
+  mii->inc_input = XEXP (src, 0);
+
+  if (!REG_P (XEXP (src, 0)))
+    return false;
+
+  if (!rtx_equal_p (SET_DEST (pat), mii->mem_reg0))
+    return false;
+
+  cst = XEXP (src, 1);
+  if (!CONST_INT_P (cst))
+    return false;
+  mii->inc_constant = INTVAL (cst);
+  if (GET_CODE (src) == MINUS)
+    mii->inc_constant = -mii->inc_constant;
+
+  regs_equal = rtx_equal_p (mii->inc_input, mii->mem_reg0);
+
+  if (!before_mem)
+    {
+      mii->inc_constant = -mii->inc_constant;
+      if (!regs_equal)
+	return false;
+    }
+
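+  /* When the increment adjusts the stack pointer, reject any transform
+     that could leave the memory access below the stack pointer, where
+     its contents may be clobbered (this assumes the stack grows
+     downward).  */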
+  if (regs_equal && REGNO (SET_DEST (pat)) == STACK_POINTER_REGNUM)
+    /* Note that the sign has already been reversed for !before_mem.  */
+    return mii->inc_constant > 0;
+
+  return true;
+}
+
+/* Once a suitable mem reference has been found and the corresponding data
+   in MII has been filled in, this function is called to find a suitable
+   add or inc insn involving the register we found in the memory
+   reference.  */
+
+static bool
+find_inc (struct mem_inc_info *mii, bool backwards)
+{
+  sd_iterator_def sd_it;
+  dep_t dep;
+
+  sd_it = sd_iterator_start (mii->mem_insn,
+			     backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
+  while (sd_iterator_cond (&sd_it, &dep))
+    {
+      dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
+      rtx pro = DEP_PRO (dep);
+      rtx con = DEP_CON (dep);
+      rtx inc_cand = backwards ? pro : con;
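+      /* DEP_NONREG marks deps carrying more than a plain register
+	 value; DEP_MULTIPLE marks insn pairs connected by more than
+	 one dep.  Neither kind can safely be broken by a pattern
+	 change.  */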
+      if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
+	goto next;
+      if (parse_add_or_inc (mii, inc_cand, backwards))
+	{
+	  struct dep_replacement *desc;
+	  df_ref *def_rec;
+	  rtx newaddr, newmem;
+
+	  if (sched_verbose >= 5)
+	    fprintf (sched_dump, "candidate mem/inc pair: %d %d\n",
+		     INSN_UID (mii->mem_insn), INSN_UID (inc_cand));
+
+	  /* Need to ensure that none of the operands of the inc
+	     instruction are assigned to by the mem insn.  */
+	  for (def_rec = DF_INSN_DEFS (mii->mem_insn); *def_rec; def_rec++)
+	    {
+	      df_ref def = *def_rec;
+	      if (reg_overlap_mentioned_p (DF_REF_REG (def), mii->inc_input)
+		  || reg_overlap_mentioned_p (DF_REF_REG (def), mii->mem_reg0))
+		{
+		  if (sched_verbose >= 5)
+		    fprintf (sched_dump,
+			     "inc conflicts with store failure.\n");
+		  goto next;
+		}
+	    }
+	  newaddr = mii->inc_input;
+	  if (mii->mem_index != NULL_RTX)
+	    newaddr = gen_rtx_PLUS (GET_MODE (newaddr), newaddr,
+				    mii->mem_index);
+	  newaddr = plus_constant (GET_MODE (newaddr), newaddr,
+				   mii->mem_constant + mii->inc_constant);
+	  newmem = attempt_change (mii, newaddr);
+	  if (newmem == NULL_RTX)
+	    goto next;
+	  if (sched_verbose >= 5)
+	    fprintf (sched_dump, "successful address replacement\n");
+	  desc = XCNEW (struct dep_replacement);
+	  DEP_REPLACE (dep) = desc;
+	  desc->loc = mii->mem_loc;
+	  desc->newval = newmem;
+	  desc->orig = *desc->loc;
+	  desc->insn = mii->mem_insn;
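+	  /* Move the dependence from the consumer's list of hard back
+	     deps to its list of speculative back deps; it is now
+	     breakable rather than unconditional.  */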
+	  move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con),
+			 INSN_SPEC_BACK_DEPS (con));
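+	  /* The replacement address uses INC_INPUT, so the mem insn
+	     must stay ordered with respect to any other insn that
+	     modifies that register.  */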
+	  if (backwards)
+	    {
+	      FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
+		if (modified_in_p (mii->inc_input, DEP_PRO (dep)))
+		  add_dependence_1 (mii->mem_insn, DEP_PRO (dep),
+				    REG_DEP_TRUE);
+	    }
+	  else
+	    {
+	      FOR_EACH_DEP (mii->inc_insn, SD_LIST_FORW, sd_it, dep)
+		if (modified_in_p (mii->inc_input, DEP_CON (dep)))
+		  add_dependence_1 (DEP_CON (dep), mii->mem_insn,
+				    REG_DEP_ANTI);
+	    }
+	  return true;
+	}
+    next:
+      sd_iterator_next (&sd_it);
+    }
+  return false;
+}
+
+/* A recursive function that walks ADDRESS_OF_X to find memory references
+   which could be modified during scheduling.  We call find_inc for each
+   one we find that has a recognizable form.  MII holds information about
+   the pair of memory/increment instructions.
+   We ensure that every instruction with a memory reference (which will be
+   the location of the replacement) is assigned at most one breakable
+   dependency.  */
+
+static bool
+find_mem (struct mem_inc_info *mii, rtx *address_of_x)
+{
+  rtx x = *address_of_x;
+  enum rtx_code code = GET_CODE (x);
+  const char *const fmt = GET_RTX_FORMAT (code);
+  int i;
+
+  if (code == MEM)
+    {
+      rtx reg0 = XEXP (x, 0);
+
+      mii->mem_loc = address_of_x;
+      mii->mem_index = NULL_RTX;
+      mii->mem_constant = 0;
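+      /* Dismantle addresses of the form
+	   (plus (plus reg index) (const_int N)),
+	 where the index and the constant may each be absent.  */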
+      if (GET_CODE (reg0) == PLUS && CONST_INT_P (XEXP (reg0, 1)))
+	{
+	  mii->mem_constant = INTVAL (XEXP (reg0, 1));
+	  reg0 = XEXP (reg0, 0);
+	}
+      if (GET_CODE (reg0) == PLUS)
+	{
+	  mii->mem_index = XEXP (reg0, 1);
+	  reg0 = XEXP (reg0, 0);
+	}
+      if (REG_P (reg0))
+	{
+	  df_ref *use_rec;
+	  int occurrences = 0;
+
+	  /* Make sure this reg appears only once in this insn.  Can't use
+	     count_occurrences since that only works for pseudos.  */
+	  for (use_rec = DF_INSN_USES (mii->mem_insn); *use_rec; use_rec++)
+	    {
+	      df_ref use = *use_rec;
+	      if (reg_overlap_mentioned_p (reg0, DF_REF_REG (use)))
+		if (++occurrences > 1)
+		  {
+		    if (sched_verbose >= 5)
+		      fprintf (sched_dump, "mem count failure\n");
+		    return false;
+		  }
+	    }
+
+	  mii->mem_reg0 = reg0;
+	  return find_inc (mii, true) || find_inc (mii, false);
+	}
+      return false;
+    }
+
+  if (code == SIGN_EXTRACT || code == ZERO_EXTRACT)
+    {
+      /* A MEM that is part of a bit-field reference cannot be replaced,
+	 so do not look inside these for candidates.  */
+      return false;
+    }
+
+  /* Time for some deep diving.  */
+  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+    {
+      if (fmt[i] == 'e')
+	{
+	  if (find_mem (mii, &XEXP (x, i)))
+	    return true;
+	}
+      else if (fmt[i] == 'E')
+	{
+	  int j;
+	  for (j = XVECLEN (x, i) - 1; j >= 0; j--)
+	    if (find_mem (mii, &XVECEXP (x, i, j)))
+	      return true;
+	}
+    }
+  return false;
+}
+
+/* Examine the instructions between HEAD and TAIL and try to find
+   dependencies that can be broken by modifying one of the patterns.  */
+
+void
+find_modifiable_mems (rtx head, rtx tail)
+{
+  rtx insn;
+  int success_in_block = 0;
+
+  for (insn = head; insn != tail; insn = NEXT_INSN (insn))
+    {
+      struct mem_inc_info mii;
+
+      if (!NONDEBUG_INSN_P (insn) || RTX_FRAME_RELATED_P (insn))
+	continue;
+
+      mii.mem_insn = insn;
+      if (find_mem (&mii, &PATTERN (insn)))
+	success_in_block++;
+    }
+  if (success_in_block && sched_verbose >= 5)
+    fprintf (sched_dump, "%d candidates for address modification found.\n",
+	     success_in_block);
+}
+
 #endif /* INSN_SCHEDULING */
Index: gcc/sched-int.h
===================================================================
--- gcc/sched-int.h	(revision 189284)
+++ gcc/sched-int.h	(working copy)
@@ -206,6 +206,18 @@  typedef int dw_t;
 extern enum reg_note ds_to_dk (ds_t);
 extern ds_t dk_to_ds (enum reg_note);
 
+/* Describe a dependency that can be broken by making a replacement
+   in one of the patterns.  LOC is the location, ORIG and NEWVAL the
+   two alternative contents, and INSN the instruction that must be
+   changed.  */
+struct dep_replacement
+{
+  rtx *loc;
+  rtx orig;
+  rtx newval;
+  rtx insn;
+};
+
 /* Information about the dependency.  */
 struct _dep
 {
@@ -215,18 +227,30 @@  struct _dep
   /* Consumer.  */
   rtx con;
 
-  /* Dependency major type.  This field is superseded by STATUS below.
-     Though, it is still in place because some targets use it.  */
-  enum reg_note type;
+  /* If nonnull, holds a pointer to information about how to break the
+     dependency by making a replacement in one of the insns.  Each insn
+     to be modified is associated with at most one such dependency.  */
+  struct dep_replacement *replace;
 
   /* Dependency status.  This field holds all dependency types and additional
      information for speculative dependencies.  */
   ds_t status;
 
-  /* Cached cost of the dependency.  */
-  int cost;
+  /* Dependency major type.  This field is superseded by STATUS above.
+     Though, it is still in place because some targets use it.  */
+  ENUM_BITFIELD(reg_note) type:6;
+
+  unsigned nonreg:1;
+  unsigned multiple:1;
+
+  /* Cached cost of the dependency.  Make sure to update UNKNOWN_DEP_COST
+     when changing the size of this field.  */
+  int cost:20;
 };
 
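+/* The value of DEP_COST when it has not yet been computed; chosen as
+   the most negative value representable in the 20-bit COST bitfield.  */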
+#define UNKNOWN_DEP_COST ((int) ((unsigned int) -1 << 19))
+
 typedef struct _dep dep_def;
 typedef dep_def *dep_t;
 
@@ -235,8 +259,9 @@  typedef dep_def *dep_t;
 #define DEP_TYPE(D) ((D)->type)
 #define DEP_STATUS(D) ((D)->status)
 #define DEP_COST(D) ((D)->cost)
-
-#define UNKNOWN_DEP_COST INT_MIN
+#define DEP_NONREG(D) ((D)->nonreg)
+#define DEP_MULTIPLE(D) ((D)->multiple)
+#define DEP_REPLACE(D) ((D)->replace)
 
 /* Functions to work with dep.  */
 
@@ -1047,7 +1072,11 @@  enum SPEC_TYPES_OFFSETS {
    Therefore, it can appear only in TODO_SPEC field of an instruction.  */
 #define HARD_DEP (DEP_CONTROL << 1)
 
-#define DEP_CANCELLED (HARD_DEP << 1)
+/* Set in the TODO_SPEC field of an instruction for which new_ready
+   has decided not to schedule it speculatively.  */
+#define DEP_POSTPONED (HARD_DEP << 1)
+
+#define DEP_CANCELLED (DEP_POSTPONED << 1)
 
 /* This represents the results of calling sched-deps.c functions,
    which modify dependencies.  */
@@ -1074,7 +1103,8 @@  enum SCHED_FLAGS {
   DO_SPECULATION = USE_DEPS_LIST << 1,
   DO_BACKTRACKING = DO_SPECULATION << 1,
   DO_PREDICATION = DO_BACKTRACKING << 1,
-  SCHED_RGN = DO_PREDICATION << 1,
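+  /* Set by a target to prevent the scheduler from breaking dependencies
+     by modifying instruction patterns (find_modifiable_mems).  */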
+  DONT_BREAK_DEPENDENCIES = DO_PREDICATION << 1,
+  SCHED_RGN = DONT_BREAK_DEPENDENCIES << 1,
   SCHED_EBB = SCHED_RGN << 1,
   /* Scheduler can possibly create new basic blocks.  Used for assertions.  */
   NEW_BBS = SCHED_EBB << 1,
@@ -1406,6 +1436,8 @@  extern void dump_region_dot_file (const
 extern void haifa_sched_init (void);
 extern void haifa_sched_finish (void);
 
+extern void find_modifiable_mems (rtx, rtx);
+
 /* sched-deps.c interface to walk, add, search, update, resolve, delete
    and debug instruction dependencies.  */
 
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 189284)
+++ gcc/Makefile.in	(working copy)
@@ -3263,7 +3263,7 @@  haifa-sched.o : haifa-sched.c $(CONFIG_H
 sched-deps.o : sched-deps.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
    $(RTL_H) $(SCHED_INT_H) $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h \
    $(FUNCTION_H) $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) $(RECOG_H) $(EXCEPT_H) cselib.h \
-   ira.h $(PARAMS_H) $(TM_P_H) ira.h $(TARGET_H)
+   ira.h $(PARAMS_H) $(TM_P_H) ira.h $(TARGET_H) $(EMIT_RTL_H)
 sched-rgn.o : sched-rgn.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
    $(RTL_H) $(SCHED_INT_H) $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h \
    $(FUNCTION_H) $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) $(RECOG_H) $(EXCEPT_H) $(PARAMS_H) \
Index: gcc/sched-rgn.c
===================================================================
--- gcc/sched-rgn.c	(revision 189284)
+++ gcc/sched-rgn.c	(working copy)
@@ -2103,6 +2103,8 @@  init_ready_list (void)
      Count number of insns in the target block being scheduled.  */
   for (insn = NEXT_INSN (prev_head); insn != next_tail; insn = NEXT_INSN (insn))
     {
+      gcc_assert (TODO_SPEC (insn) == HARD_DEP
+		  || TODO_SPEC (insn) == DEP_POSTPONED);
+      TODO_SPEC (insn) = HARD_DEP;
       try_ready (insn);
       target_n_insns++;
 
@@ -2126,7 +2128,11 @@  init_ready_list (void)
 
 	for (insn = src_head; insn != src_next_tail; insn = NEXT_INSN (insn))
 	  if (INSN_P (insn))
-	    try_ready (insn);
+	    {
+	      gcc_assert (TODO_SPEC (insn) == HARD_DEP
+			  || TODO_SPEC (insn) == DEP_POSTPONED);
+	      TODO_SPEC (insn) = HARD_DEP;
+	      try_ready (insn);
+	    }
       }
 }
 
@@ -2218,11 +2224,11 @@  new_ready (rtx next, ds_t ts)
 		ts = new_ds;
 	      else
 		/* NEXT isn't ready yet.  */
-		ts = (ts & ~SPECULATIVE) | HARD_DEP;
+		ts = DEP_POSTPONED;
 	    }
 	  else
 	    /* NEXT isn't ready yet.  */
-            ts = (ts & ~SPECULATIVE) | HARD_DEP;
+            ts = DEP_POSTPONED;
 	}
     }
 
@@ -2826,7 +2832,9 @@  void debug_dependencies (rtx head, rtx t
 	dep_t dep;
 
 	FOR_EACH_DEP (insn, SD_LIST_FORW, sd_it, dep)
-	  fprintf (sched_dump, "%d ", INSN_UID (DEP_CON (dep)));
+	  fprintf (sched_dump, "%d%s%s ", INSN_UID (DEP_CON (dep)),
+		   DEP_NONREG (dep) ? "n" : "",
+		   DEP_MULTIPLE (dep) ? "m" : "");
       }
       fprintf (sched_dump, "\n");
     }
Index: gcc/config/c6x/c6x.c
===================================================================
--- gcc/config/c6x/c6x.c	(revision 189284)
+++ gcc/config/c6x/c6x.c	(working copy)
@@ -3911,6 +3911,13 @@  c6x_free_sched_context (void *_sc)
   free (_sc);
 }
 
+/* True if we are currently performing a preliminary scheduling
+   pass before modulo scheduling; we can't allow the scheduler to
+   modify instruction patterns using packetization assumptions,
+   since there will be another scheduling pass later if modulo
+   scheduling fails.  */
+static bool in_hwloop;
+
 /* Provide information about speculation capabilities, and set the
    DO_BACKTRACKING flag.  */
 static void
@@ -3922,6 +3929,8 @@  c6x_set_sched_flags (spec_info_t spec_in
     {
       *flags |= DO_BACKTRACKING | DO_PREDICATION;
     }
+  if (in_hwloop)
+    *flags |= DONT_BREAK_DEPENDENCIES;
 
   spec_info->mask = 0;
 }
@@ -5535,9 +5544,11 @@  hwloop_optimize (hwloop_info loop)
 
   reshuffle_units (loop->head);
 
+  in_hwloop = true;
   schedule_ebbs_init ();
   schedule_ebb (BB_HEAD (loop->tail), loop->loop_end, true);
   schedule_ebbs_finish ();
+  in_hwloop = false;
 
   bb = loop->head;
   loop_earliest = bb_earliest_end_cycle (bb, loop->loop_end) + 1;