Patchwork Generic hwloop support library

login
register
mail settings
Submitter Bernd Schmidt
Date June 21, 2011, 1:37 p.m.
Message ID <4E009E8B.6010104@codesourcery.com>
Download mbox | patch
Permalink /patch/101289/
State New
Headers show

Comments

Bernd Schmidt - June 21, 2011, 1:37 p.m.
We currently support counted loops on several targets by having the
loop-doloop pass search for suitable loops and convert them to using the
doloop_end pattern. Unfortunately, there are various machines which have
hardware loop support, but need additional tests and transformations,
during the final stage of compilation, to ensure we can make use of
their hardware.

For example, Blackfin has an LSETUP instruction which sets up a loop
start and loop end address; the hardware compares the program counter to
the loop end address register and automatically resets it to the loop
start address if the end is reached. The encoding of LSETUP has a
limited amount of bits, which forces to compiler to ensure that an upper
bound for the loop's length is not exceeded. The loop end must be after
the loop start, which sometimes requires us to reorder the CFG.

On C6X, we'd like to make use of the SPLOOP/SPKERNEL instructions, which
define a hardware software pipelined loop. Instructions after SPLOOP are
copied into a limited-size loop buffer from which they are reexecuted;
SPKERNEL indicates the end of the loop.

We have some target code that deals with this. Nathan Sidwell originally
wrote this for the mt port (then called ms1, now removed), it was then
reused for the Blackfin by Jie Zhang and myself. For C6X, I've decided
that there is sufficient commonality that we can move parts of the
machinery into a target-independent file, callable from the backends
with a set of hooks that do the actual transformations. This is what the
patch below does.

There are some odd problems when regression testing on Blackfin right
now, but I've got two runs with identical results. Doing before/after
comparisons on a set of .i files, there is one case in my collection
where this patch produces different code; this seems merely a difference
between a long branch and a short branch. Not sure why it happens, but
it seems innocuous.

Ok?


Bernd
* function.c (record_hard_reg_sets): Moved to... 
	* rtlanal.c (record_hard_reg_sets): ... here.  No longer static.
	* rtl.h (record_hard_reg_sets): Declare.
	* hw-doloop.c: New file.
	* hw-doloop.h: New file.
	* Makefile.in (OBJS): Add hw-doloop.o.
	(hw-doloop.o): New rule.
	* config/bfin/bfin.c: Include "hw-doloop.h".
	(loop_info, DEF_VEC_P for loop_info, loop_info_d): Remove.
	(bfin_dump_loops, bfin_bb_in_loop, bfin_scan_loop): Remove.
	(hwloop_optimize): Renamed from bfin_optimize_loop.  Return bool,
	true if the loop was successfully optimized.  Remove code that
	was moved to hw-doloop.c, and adjust other parts.
	(hwloop_fail): New static function, containing parts that used
	to be in bfin_optimize_loop.
	(bfin_discover_loop, bfin_discover_loops, free_loops,
	bfin_reorder_loops): Remove.
	(hwloop_after_reorder, hwloop_pattern_reg): New static functions.
	(bfin_doloop_hooks): New variable.
	(bfin_reorg_loops): Remove most code, call reorg_loops.
	* config/bfin/bfin.md (doloop_end splitter): Also enable if
	loop counter is a memory_operand.
Richard Sandiford - July 2, 2011, 1:03 p.m.
I suppose my main question about this is: did you consider extending
the cfgloop machinery instead of writing new loop discovery code?

Also, loop discovery seems to be based on the liveness of the iterator,
but what happens if the iterator is a general register that is spilled
for part of the loop?  Obviously it's best if that doesn't happen, but
if it does, it looks like you might have a "loop" in which the recorded
head isn't actually considered to be a member of the loop.

I just wondered if it would be better to discover the loop
"normally" and then analyse the use of the iterator.

Bernd Schmidt <bernds@codesourcery.com> writes:
> +typedef struct loop_info *loop_info;

That'll fall foul of -Wc++-compat.

I wonder whether we should use a more specialised name, such as
"hw_doloop", and similarly in the function names.  "loop_info"
and "bb_in_loop" sound more like things that belong in cfgloop.

> +struct hw_doloop_hooks
> +{
> +  /* Examine INSN.  If it is a suitable doloop_end pattern, return the
> +     iteration register.  Otherwise, return NULL_RTX.  */
> +  rtx (*end_pattern_reg) (rtx insn);
> +  /* Called after the hw-doloop pass has finished reordering the basic block
> +     structure.  The target can defer some of its initializations to this
> +     hook if necessary.  This function should return true if the CFG has been
> +     changed and loops must be rediscovered.  */
> +  bool (*after_reorder) (void);
> +  /* Optimize LOOP.  Return true if successful, false if the loop should be
> +     marked bad.  */
> +  bool (*opt) (loop_info loop);
> +  /* Handle a loop that was marked bad for any reason.  This could be used to
> +     split the doloop_end pattern.  */
> +  void (*fail) (loop_info loop);
> +};

For the record, I wondered at first whether these should be target hooks,
but I agree they shouldn't.  md_reorg is special enough that this new
infrastructure really does need to be a subroutine of md_reorg rather
than a new pass.  Having separate hooks is more flexible in that case,
because it allows the hooks to rely on data that is set up by md_reorg
beforehand.

However, I think there needs to be a bit more overview commentary
somewhere.  The main comment is above reorg_loops():

> +/* Run from machine_dependent_reorg, this pass looks for doloop_end
> +   insns, analyzes the loop structure, and tries to rewrite the RTL of
> +   these loops so that the target can create whatever hardware looping
> +   instructions are available.
> +   
> +   DO_REORDER should be set to true if we should try to use the
> +   reorder_loops function to ensure the loop end occurs after the loop
> +   start.  HOOKS is used to pass in target specific hooks; see
> +   hw-doloop.h.  */

I think the combination of this comment and the comments in the hooks
ought to be self-contained enough that a back-end writer could tell from
them alone how to call the subroutine, and how the hooks need to be
defined.  Would you mind expanding it a bit?  E.g. I found the
after_reorder comment a bit hard to follow because there's no single
comment that describes what reordering is done, and how the backend
might want to react to it.  I think it would also help to describe
(broadly) what kind of hardware instruction you're trying to support.

The current code seems to assume that the iterator is a single hard
register.  It might be worth stating (and asserting?)  that somewhere.

I also wondered whether the comment ought to be at the head of the file,
but given that it's a subroutine rather than a stand-alone pass,
there's not really a clear answer...

> +/* Return true if BB is part of LOOP.  */
> +bool
> +bb_in_loop_p (loop_info loop, basic_block bb)

Silly nitlet 1, but the file seems a bit inconsistent about the
number of lines between the comment and the start of function.
(0 here and a few other places, but mostly 1).

> +/* Scan the blocks of LOOP (and its inferiors), and record things such as
> +   hard registers set, jumps out of and within the loop.  */
> +
> +static void
> +scan_loop (loop_info loop)
> +{
> +  unsigned ix;
> +  basic_block bb;
> +
> +  if (loop->head == NULL)
> +    {
> +      gcc_assert (loop->bad);
> +      return;
> +    }
> +  CLEAR_HARD_REG_SET (loop->regs_set_in_loop);
> +  loop->iter_reg_used = false;
> +  loop->jumps_within = false;
> +  loop->jumps_outof = false;
> +  loop->has_call = false;
> +  loop->has_asm = false;
> +  loop->iter_reg_used_outside = false;
> +  loop->timode_insns = 0;

Since the initialisation is split over two places, the zero initialisations
above and the mixed initialisations here:

> +  loop->tail = tail_bb;
> +  loop->loop_end = tail_insn;
> +  loop->last_insn = NULL_RTX;
> +  loop->iter_reg = reg;
> +  loop->depth = loop->length = 0;
> +  loop->visited = false;
> +  loop->bad = false;
> +  loop->outer = NULL;
> +  loop->loops = NULL;
> +  loop->incoming = VEC_alloc (edge, gc, 2);
> +  loop->end_label = NULL_RTX;
> +  loop->start_label = JUMP_LABEL (tail_insn);

it might be better to allocate using XCNEW instead.

> +	  if (GET_MODE (insn) == TImode)
> +	    loop->timode_insns++;

I assume this is used by the C6X port?  The information is only going
to be meaningful to ports like C6X that treat scheduling as a subroutine
of md_reorg, so I wonder whether it belongs here.  If it stays, I think
there should be a comment above the field definition saying that it is
only meaningful in that case.

> +	  CLEAR_HARD_REG_SET (set_this_insn);
> +	  note_stores (PATTERN (insn), record_hard_reg_sets,
> +		       &set_this_insn);
> +          for (x = REG_NOTES (insn); x; x = XEXP (x, 1))
> +            if (REG_NOTE_KIND (x) == REG_INC)
> +	      record_hard_reg_sets (XEXP (x, 0), NULL, &set_this_insn);
> +
> +	  if (CALL_P (insn))
> +	    {
> +	      loop->has_call = true;
> +	      for (x = CALL_INSN_FUNCTION_USAGE (insn); x; x = XEXP (x, 1))
> +		if (GET_CODE (XEXP (x, 0)) == CLOBBER)
> +		  record_hard_reg_sets (XEXP (XEXP (x, 0), 0), NULL,
> +					&set_this_insn);
> +	      
> +	    }

I think this should be iterating over DF_INSN_DEFS instead.  If that
isn't accurate for some reason, it needs to be fixed...

> +  if (EDGE_COUNT (tail_bb->succs) != 2)
> +    {
> +      loop->bad = true;
> +      loop->head = loop->successor = NULL;

What's the significance of clearing "head" and "successor" as well as
setting "bad"?

> +  works = VEC_alloc (basic_block,heap,20);

Missing spaces.

> +	      if (!VEC_space (basic_block, works, 1))
> +		{
> +		  if (dwork)
> +		    {
> +		      VEC_block_remove (basic_block, works, 0, dwork);
> +		      dwork = 0;
> +		    }
> +		  else
> +		    VEC_reserve (basic_block, heap, works, 1);
> +		}
> +	      VEC_quick_push (basic_block, works, succ);

I'm being dense, sorry, but could you walk me through what this
is doing (i.e. why it isn't a simple VEC_safe_push)?

> +      for (pass = 0, retry = 1; retry && pass < 2; pass++)
> +	{
> +	  edge e;
> +	  edge_iterator ei;
> +	  bool first = true;
> +	  retry = 0;
> +
> +	  FOR_EACH_EDGE (e, ei, loop->incoming)
> +	    {
> +	      if (first)
> +		{
> +		  loop->incoming_src = e->src;
> +		  loop->incoming_dest = e->dest;
> +		  first = false;
> +		}
> +	      else
> +		{
> +		  if (e->dest != loop->incoming_dest)
> +		    loop->incoming_dest = NULL;
> +		  if (e->src != loop->incoming_src)
> +		    loop->incoming_src = NULL;
> +		}
> +	      if (loop->incoming_src == NULL && loop->incoming_dest == NULL)
> +		{
> +		  if (pass == 0)
> +		    {
> +		      if (dump_file)
> +			fprintf (dump_file,
> +				 ";; retrying loop %d with forwarder blocks\n",
> +				 loop->loop_no);
> +		      retry = 1;
> +		      break;
> +		    }
> +		  loop->bad = 1;
> +		  if (dump_file)
> +		    fprintf (dump_file,
> +			     ";; can't find suitable entry for loop %d\n",
> +			     loop->loop_no);
> +		  goto out;
> +		}
> +	    }
> +	  if (retry)
> +	    {
> +	      retry = 0;
> +	      FOR_EACH_EDGE (e, ei, loop->incoming)
> +		{
> +		  if (forwarder_block_p (e->src))
> +		    {
> +		      edge e2;
> +		      edge_iterator ei2;
> +
> +		      if (dump_file)
> +			fprintf (dump_file,
> +				 ";; Adding forwarder block %d to loop %d and retrying\n",
> +				 e->src->index, loop->loop_no);
> +		      VEC_safe_push (basic_block, heap, loop->blocks, e->src);
> +		      bitmap_set_bit (loop->block_bitmap, e->src->index);
> +		      FOR_EACH_EDGE (e2, ei2, e->src->preds)
> +			VEC_safe_push (edge, gc, loop->incoming, e2);
> +		      VEC_unordered_remove (edge, loop->incoming, ei.index);
> +		      retry = 1;
> +		      break;
> +		    }
> +		}
> +	      if (!retry)
> +		{
> +		  if (dump_file)
> +		    fprintf (dump_file, ";; No forwarder blocks found\n");
> +		  loop->bad = 1;
> +		}
> +	    }
> +	}
> +    }

I think this would be easier to follow if the discovery part of the
loop_incoming loop was split out:

static bool
process_incoming_edges_p (loop_info loop)
{
  edge e;
  edge_iterator ei;
  bool first = true;

  FOR_EACH_EDGE (e, ei, loop->incoming)
    {
      if (first)
	{
	  loop->incoming_src = e->src;
	  loop->incoming_dest = e->dest;
	  first = false;
	}
      else
	{
	  if (e->dest != loop->incoming_dest)
	    loop->incoming_dest = NULL;
	  if (e->src != loop->incoming_src)
	    loop->incoming_src = NULL;
	  if (loop->incoming_src == NULL && loop->incoming_dest == NULL)
	    return false;
	}
    }
  return true;
}

static bool
add_forwarder_block (loop_info loop)
{
  edge e;
  edge_iterator ei;

  FOR_EACH_EDGE (e, ei, loop->incoming)
    if (forwarder_block_p (e->src))
      {
        edge e2;
        edge_iterator ei2;

        if (dump_file)
          fprintf (dump_file,
                   ";; Adding forwarder block %d to loop %d and retrying\n",
                   e->src->index, loop->loop_no);
        VEC_safe_push (basic_block, heap, loop->blocks, e->src);
        bitmap_set_bit (loop->block_bitmap, e->src->index);
        FOR_EACH_EDGE (e2, ei2, e->src->preds)
          VEC_safe_push (edge, gc, loop->incoming, e2);
        VEC_unordered_remove (edge, loop->incoming, ei.index);
      }
  return false;
}

and then:

    if (!process_incoming_edges_p (loop))
      {
        /* Try again with forwarder blocks.  */
        if (dump_file)
          fprintf (dump_file,
                   ";; retrying loop %d with forwarder blocks\n",
                   loop->loop_no);

	if (!add_forwarder_block (loop))
	  {
            if (dump_file)
              fprintf (dump_file, ";; No forwarder blocks found\n");
            loop->bad = 1;
	  }
        else if (!process_incoming_edges_p (loop))
          {
            if (dump_file)
              fprintf (dump_file,
                       ";; can't find suitable entry for loop %d\n",
                       loop->loop_no);
            loop->bad = 1;
          }
      }

Could you add commentary explaining what this case is trying to handle?
It seems like adding the forwarder adds more incoming_dests,
so presumably the idea is to unify the incoming_srcs.  Something like
(in insn order):

    +---- B1
    |     |
    |     V
    | +-- B2	  
    | |
    | |   L1 <-+
    | |   |    |
    | |   V    |
    | +-> L2   |
    |     |    |
    |     V    |
    +---> L3   |
          |    |
          V    |
          L4 --+
          |
          V
         ...

So we'd then consider B2 to be part of the loop.  Is that right?
If so, how does it help?  B2 still seems a bit of an odd-man-out,
even if we pretend it's part of the loop.

> +      /* There's a degenerate case we can handle - an empty loop consisting
> +	 of only a back branch.  Handle that by deleting the branch.  */
> +      insn = JUMP_LABEL (tail);
> +      while (insn && !NONDEBUG_INSN_P (insn))
> +	insn = NEXT_INSN (insn);
> +      if (insn == tail)
> +	{
> +	  if (dump_file)
> +	    {
> +	      fprintf (dump_file, ";; degenerate loop ending at\n");
> +	      print_rtl_single (dump_file, tail);
> +	    }
> +	  delete_insn_and_edges (tail);
> +	  continue;
> +	}

Do we really need to handle that here?  It sounds like something that
should be optimised by an earlier pass.  Don't you also need to check
that the iterator isn't live on fallthrough?

> +      bb->aux = loop;

Where is this used?  Is it provided for the backend hooks?
If so, that doesn't seem to be documented.

> +	  bitmap_and (tmp_bitmap, other->block_bitmap, loop->block_bitmap);
> +	  if (bitmap_empty_p (tmp_bitmap))
> +	    continue;
> +	  if (bitmap_equal_p (tmp_bitmap, other->block_bitmap))
> +	    {
> +	      other->outer = loop;
> +	      VEC_safe_push (loop_info, heap, loop->loops, other);
> +	    }
> +	  else if (bitmap_equal_p (tmp_bitmap, loop->block_bitmap))
> +	    {
> +	      loop->outer = other;
> +	      VEC_safe_push (loop_info, heap, other->loops, loop);
> +	    }

->outer doesn't seem to be used anywhere, and the choice of loop
looks somewhat arbitrary.

We can avoid the temporary here with:

	if (!bitmap_intersect_p (other->block_bitmap, loop->block_bitmap))
	  continue;
	if (!bitmap_intersect_compl_p (other->block_bitmap,
				       loop->block_bitmap))
	  {
	    other->outer = loop;
	    VEC_safe_push (loop_info, heap, loop->loops, other);
	  }
	else if (!bitmap_intersect_compl_p (loop->block_bitmap,
					    other->block_bitmap))
	  {
	    loop->outer = other;
	    VEC_safe_push (loop_info, heap, other->loops, loop);
	  }

> +#define BB_AUX_INDEX(BB) ((unsigned)(BB)->aux)

I think this'll produce warnings on targets with sizeof(int) !=
sizeof (void *).

> +      /* Recreate an index for basic blocks that represents their order.  */
> +      for (bb = ENTRY_BLOCK_PTR->next_bb, index = 0;
> +	   bb != EXIT_BLOCK_PTR;
> +	   bb = bb->next_bb, index++)
> +	bb->aux = (PTR) index;

I think this would be clearer with FOR_EACH_BB.  It seems a shame
to recompute this for every loop though, especially if no changes
are needed.

> +      if (BB_AUX_INDEX (loop->head) < BB_AUX_INDEX (loop->tail))
> +	continue;

<=?  A self loop isn't a problem.

> +      FOR_EACH_EDGE (e, ei, loop->head->succs)
> +	{
> +	  if (bitmap_bit_p (loop->block_bitmap, e->dest->index)
> +	      && BB_AUX_INDEX (e->dest) < BB_AUX_INDEX (loop->tail))
> +	    {
> +	      basic_block start_bb = e->dest;
> +	      basic_block start_prev_bb = start_bb->prev_bb;
> +
> +	      if (dump_file)
> +		fprintf (dump_file, ";; Moving block %d before block %d\n",
> +			 loop->head->index, start_bb->index);
> +	      loop->head->prev_bb->next_bb = loop->head->next_bb;
> +	      loop->head->next_bb->prev_bb = loop->head->prev_bb;
> +
> +	      loop->head->prev_bb = start_prev_bb;
> +	      loop->head->next_bb = start_bb;
> +	      start_prev_bb->next_bb = start_bb->prev_bb = loop->head;
> +	      break;
> +	    }
> +	}

How confident are we at this stage that the loop can use the hardware
instruction (i.e. won't be marked bad)?  It might be nice to have a
comment about the impact of this change in terms of branching,
both when we can and can't do the actual loop optimisation.

Is there no possibility of running the optimisation in cfglayout mode
instead?  It seems from this and the forwarder block stuff above as
though it might make things easier, but maybe not.

> +  /* Every loop contains in its list of inner loops every loop nested inside
> +     it, even if there are intermediate loops.  This works because we're doing
> +     a depth-first search here and never visit a loop more than once.  */
> +  for (ix = 0; VEC_iterate (loop_info, loop->loops, ix, inner); ix++)
> +    {
> +      optimize_loop (inner, hooks);

Perhaps use a worklist here, to avoid O(loops)-deep recursion?

> +  df_live_add_problem ();
> +  df_live_set_all_dirty ();
> +  df_analyze ();
> +
> +  bitmap_obstack_initialize (&stack);
> +
> +  if (dump_file)
> +    fprintf (dump_file, ";; Find loops, first pass\n\n");
> +
> +  loops = discover_loops (&stack, hooks);

Is it a required part of the interface that we call things like
after_reorder even if there are no loops?  If so, I think that
needs to be stated, otherwise we should early out if loops is null.

> +  if (do_reorder)
> +    {
> +      reorder_loops (loops);
> +      free_loops (loops);
> +
> +      if (dump_file)
> +	fprintf (dump_file, ";; Find loops, second pass\n\n");
> +
> +      loops = discover_loops (&stack, hooks);
> +    }
> +
> +  if (hooks->after_reorder)
> +    {
> +      if (hooks->after_reorder ())
> +	{
> +	  free_loops (loops);
> +	  if (dump_file)
> +	    fprintf (dump_file, ";; Find loops after target after_reorder\n\n");
> +
> +	  loops = discover_loops (&stack, hooks);
> +	}
> +    }

From a naive reading, this looks a bit inefficient: the loops created
after the reorder aren't passed to the after_reorder hook, and could
be immediately discarded if after_reorder returns true.  Maybe just
set loops to null if they become invalid, and recreate them with:

  if (!loops)
    loops = discover_loops ();

Depends on the answer to the early-out question above though.

Richard
Steven Bosscher - July 3, 2011, 1:58 p.m.
On Tue, Jun 21, 2011 at 3:37 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>
> +static void
> +reorder_loops (loop_info loops)
> +{
> +  basic_block bb;
> +  loop_info loop;
> +
> +  FOR_EACH_BB (bb)
> +    bb->aux = NULL;


See clear_aux_for_blocks().

Ciao!
Steven

Patch

Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	(revision 175202)
+++ gcc/rtlanal.c	(working copy)
@@ -1000,6 +1000,17 @@  set_of (const_rtx pat, const_rtx insn)
   return data.found;
 }
 
+/* This function, which is typically called through note_stores,
+   collects sets and clobbers of hard registers in a HARD_REG_SET,
+   which is pointed to by DATA.  */
+void
+record_hard_reg_sets (rtx x, const_rtx pat ATTRIBUTE_UNUSED, void *data)
+{
+  HARD_REG_SET *pset = (HARD_REG_SET *)data;
+  if (REG_P (x) && HARD_REGISTER_P (x))
+    add_to_hard_reg_set (pset, GET_MODE (x), REGNO (x));
+}
+
 /* Given an INSN, return a SET expression if this insn has only a single SET.
    It may also have CLOBBERs, USEs, or SET whose output
    will not be used, which we ignore.  */
Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 175202)
+++ gcc/function.c	(working copy)
@@ -2898,17 +2898,6 @@  assign_parm_setup_block (struct assign_p
   SET_DECL_RTL (parm, stack_parm);
 }
 
-/* A subroutine of assign_parm_setup_reg, called through note_stores.
-   This collects sets and clobbers of hard registers in a HARD_REG_SET,
-   which is pointed to by DATA.  */
-static void
-record_hard_reg_sets (rtx x, const_rtx pat ATTRIBUTE_UNUSED, void *data)
-{
-  HARD_REG_SET *pset = (HARD_REG_SET *)data;
-  if (REG_P (x) && HARD_REGISTER_P (x))
-    add_to_hard_reg_set (pset, GET_MODE (x), REGNO (x));
-}
-
 /* A subroutine of assign_parms.  Allocate a pseudo to hold the current
    parameter.  Get it there.  Perform all ABI specified conversions.  */
 
Index: gcc/hw-doloop.c
===================================================================
--- gcc/hw-doloop.c	(revision 0)
+++ gcc/hw-doloop.c	(revision 0)
@@ -0,0 +1,683 @@ 
+/* Code to analyze doloop loops in order for targets to perform late
+   optimizations converting doloops to other forms of hardware loops.
+   Copyright (C) 2011 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "flags.h"
+#include "expr.h"
+#include "hard-reg-set.h"
+#include "basic-block.h"
+#include "tm_p.h"
+#include "df.h"
+#include "cfglayout.h"
+#include "cfgloop.h"
+#include "output.h"
+#include "recog.h"
+#include "target.h"
+#include "hw-doloop.h"
+
+#ifdef HAVE_doloop_end
+
+/* Dump information collected in LOOPS.  */
+
+static void
+dump_hwloops (loop_info loops)
+{
+  loop_info loop;
+
+  for (loop = loops; loop; loop = loop->next)
+    {
+      loop_info i;
+      basic_block b;
+      unsigned ix;
+
+      fprintf (dump_file, ";; loop %d: ", loop->loop_no);
+      if (loop->bad)
+	fprintf (dump_file, "(bad) ");
+      fprintf (dump_file, "{head:%d, depth:%d, reg:%u}",
+	       loop->head == NULL ? -1 : loop->head->index,
+	       loop->depth, REGNO (loop->iter_reg));
+
+      fprintf (dump_file, " blocks: [ ");
+      for (ix = 0; VEC_iterate (basic_block, loop->blocks, ix, b); ix++)
+	fprintf (dump_file, "%d ", b->index);
+      fprintf (dump_file, "] ");
+
+      fprintf (dump_file, " inner loops: [ ");
+      for (ix = 0; VEC_iterate (loop_info, loop->loops, ix, i); ix++)
+	fprintf (dump_file, "%d ", i->loop_no);
+      fprintf (dump_file, "]\n");
+    }
+  fprintf (dump_file, "\n");
+}
+
+/* Return true if BB is part of LOOP.  */
+bool
+bb_in_loop_p (loop_info loop, basic_block bb)
+{
+  return bitmap_bit_p (loop->block_bitmap, bb->index);
+}
+
+/* Scan the blocks of LOOP (and its inferiors), and record things such as
+   hard registers set, jumps out of and within the loop.  */
+
+static void
+scan_loop (loop_info loop)
+{
+  unsigned ix;
+  basic_block bb;
+
+  if (loop->head == NULL)
+    {
+      gcc_assert (loop->bad);
+      return;
+    }
+  CLEAR_HARD_REG_SET (loop->regs_set_in_loop);
+  loop->iter_reg_used = false;
+  loop->jumps_within = false;
+  loop->jumps_outof = false;
+  loop->has_call = false;
+  loop->has_asm = false;
+  loop->iter_reg_used_outside = false;
+  loop->timode_insns = 0;
+
+  if (REGNO_REG_SET_P (df_get_live_in (loop->successor),
+		       REGNO (loop->iter_reg)))
+    loop->iter_reg_used_outside = true;
+
+  for (ix = 0; VEC_iterate (basic_block, loop->blocks, ix, bb); ix++)
+    {
+      rtx insn;
+      edge e;
+      edge_iterator ei;
+
+      if (bb != loop->tail)
+	FOR_EACH_EDGE (e, ei, bb->succs)
+	  {
+	    if (bb_in_loop_p (loop, e->dest))
+	      {
+		if (!(e->flags & EDGE_FALLTHRU))
+		  loop->jumps_within = true;
+	      }
+	    else
+	      {
+		loop->jumps_outof = true;
+		if (!loop->bad)
+		  gcc_assert (!REGNO_REG_SET_P (df_get_live_in (e->dest),
+						REGNO (loop->iter_reg)));
+	      }
+	  }
+
+      for (insn = BB_HEAD (bb);
+	   insn != NEXT_INSN (BB_END (bb));
+	   insn = NEXT_INSN (insn))
+	{
+	  HARD_REG_SET set_this_insn;
+	  rtx x;
+
+	  if (!INSN_P (insn))
+	    continue;
+
+	  if (recog_memoized (insn) < 0
+	      && (GET_CODE (PATTERN (insn)) == ASM_INPUT
+		  || asm_noperands (PATTERN (insn)) >= 0))
+	    loop->has_asm = true;
+
+	  if (GET_MODE (insn) == TImode)
+	    loop->timode_insns++;
+
+	  CLEAR_HARD_REG_SET (set_this_insn);
+	  note_stores (PATTERN (insn), record_hard_reg_sets,
+		       &set_this_insn);
+          for (x = REG_NOTES (insn); x; x = XEXP (x, 1))
+            if (REG_NOTE_KIND (x) == REG_INC)
+	      record_hard_reg_sets (XEXP (x, 0), NULL, &set_this_insn);
+
+	  if (CALL_P (insn))
+	    {
+	      loop->has_call = true;
+	      for (x = CALL_INSN_FUNCTION_USAGE (insn); x; x = XEXP (x, 1))
+		if (GET_CODE (XEXP (x, 0)) == CLOBBER)
+		  record_hard_reg_sets (XEXP (XEXP (x, 0), 0), NULL,
+					&set_this_insn);
+	      
+	    }
+
+	  if (insn == loop->loop_end)
+	    CLEAR_HARD_REG_BIT (set_this_insn, REGNO (loop->iter_reg));
+	  else if (reg_mentioned_p (loop->iter_reg, PATTERN (insn)))
+	    loop->iter_reg_used = true;
+	  IOR_HARD_REG_SET (loop->regs_set_in_loop, set_this_insn);
+	}
+    }
+}
+
+/* Called from reorg_loops when a potential loop end is found.  LOOP is
+   a newly set up structure describing the loop, it is this function's
+   responsibility to fill most of it.  TAIL_BB and TAIL_INSN point to the
+   loop_end insn and its enclosing basic block.  REG is the loop counter
+   register.  */
+
+static void
+discover_loop (loop_info loop, basic_block tail_bb, rtx tail_insn, rtx reg)
+{
+  unsigned dwork = 0;
+  basic_block bb;
+  VEC (basic_block,heap) *works;
+
+  loop->tail = tail_bb;
+  loop->loop_end = tail_insn;
+  loop->last_insn = NULL_RTX;
+  loop->iter_reg = reg;
+  loop->depth = loop->length = 0;
+  loop->visited = false;
+  loop->bad = false;
+  loop->outer = NULL;
+  loop->loops = NULL;
+  loop->incoming = VEC_alloc (edge, gc, 2);
+  loop->end_label = NULL_RTX;
+  loop->start_label = JUMP_LABEL (tail_insn);
+
+  if (EDGE_COUNT (tail_bb->succs) != 2)
+    {
+      loop->bad = true;
+      loop->head = loop->successor = NULL;
+      return;
+    }
+  loop->head = BRANCH_EDGE (tail_bb)->dest;
+  loop->successor = FALLTHRU_EDGE (tail_bb)->dest;
+
+  works = VEC_alloc (basic_block,heap,20);
+  VEC_safe_push (basic_block, heap, works, loop->head);
+
+  while (VEC_iterate (basic_block, works, dwork++, bb))
+    {
+      edge e;
+      edge_iterator ei;
+      if (bb == EXIT_BLOCK_PTR)
+	{
+	  /* We've reached the exit block.  The loop must be bad. */
+	  if (dump_file)
+	    fprintf (dump_file,
+		     ";; Loop is bad - reached exit block while scanning\n");
+	  loop->bad = 1;
+	  break;
+	}
+
+      if (bitmap_bit_p (loop->block_bitmap, bb->index))
+	continue;
+
+      /* We've not seen this block before.  Add it to the loop's
+	 list and then add each successor to the work list.  */
+
+      VEC_safe_push (basic_block, heap, loop->blocks, bb);
+      bitmap_set_bit (loop->block_bitmap, bb->index);
+
+      if (bb != tail_bb)
+	{
+	  FOR_EACH_EDGE (e, ei, bb->succs)
+	    {
+	      basic_block succ = EDGE_SUCC (bb, ei.index)->dest;
+	      if (!REGNO_REG_SET_P (df_get_live_in (succ),
+				    REGNO (loop->iter_reg)))
+		continue;
+	      if (!VEC_space (basic_block, works, 1))
+		{
+		  if (dwork)
+		    {
+		      VEC_block_remove (basic_block, works, 0, dwork);
+		      dwork = 0;
+		    }
+		  else
+		    VEC_reserve (basic_block, heap, works, 1);
+		}
+	      VEC_quick_push (basic_block, works, succ);
+	    }
+	}
+    }
+
+  /* Find the predecessor, and make sure nothing else jumps into this loop.  */
+  if (!loop->bad)
+    {
+      int pass, retry;
+      for (dwork = 0; VEC_iterate (basic_block, loop->blocks, dwork, bb); dwork++)
+	{
+	  edge e;
+	  edge_iterator ei;
+	  FOR_EACH_EDGE (e, ei, bb->preds)
+	    {
+	      basic_block pred = e->src;
+
+	      if (!bb_in_loop_p (loop, pred))
+		{
+		  if (dump_file)
+		    fprintf (dump_file, ";; Loop %d: incoming edge %d -> %d\n",
+			     loop->loop_no, pred->index,
+			     e->dest->index);
+		  VEC_safe_push (edge, gc, loop->incoming, e);
+		}
+	    }
+	}
+
+      for (pass = 0, retry = 1; retry && pass < 2; pass++)
+	{
+	  edge e;
+	  edge_iterator ei;
+	  bool first = true;
+	  retry = 0;
+
+	  FOR_EACH_EDGE (e, ei, loop->incoming)
+	    {
+	      if (first)
+		{
+		  loop->incoming_src = e->src;
+		  loop->incoming_dest = e->dest;
+		  first = false;
+		}
+	      else
+		{
+		  if (e->dest != loop->incoming_dest)
+		    loop->incoming_dest = NULL;
+		  if (e->src != loop->incoming_src)
+		    loop->incoming_src = NULL;
+		}
+	      if (loop->incoming_src == NULL && loop->incoming_dest == NULL)
+		{
+		  if (pass == 0)
+		    {
+		      if (dump_file)
+			fprintf (dump_file,
+				 ";; retrying loop %d with forwarder blocks\n",
+				 loop->loop_no);
+		      retry = 1;
+		      break;
+		    }
+		  loop->bad = 1;
+		  if (dump_file)
+		    fprintf (dump_file,
+			     ";; can't find suitable entry for loop %d\n",
+			     loop->loop_no);
+		  goto out;
+		}
+	    }
+	  if (retry)
+	    {
+	      retry = 0;
+	      FOR_EACH_EDGE (e, ei, loop->incoming)
+		{
+		  if (forwarder_block_p (e->src))
+		    {
+		      edge e2;
+		      edge_iterator ei2;
+
+		      if (dump_file)
+			fprintf (dump_file,
+				 ";; Adding forwarder block %d to loop %d and retrying\n",
+				 e->src->index, loop->loop_no);
+		      VEC_safe_push (basic_block, heap, loop->blocks, e->src);
+		      bitmap_set_bit (loop->block_bitmap, e->src->index);
+		      FOR_EACH_EDGE (e2, ei2, e->src->preds)
+			VEC_safe_push (edge, gc, loop->incoming, e2);
+		      VEC_unordered_remove (edge, loop->incoming, ei.index);
+		      retry = 1;
+		      break;
+		    }
+		}
+	      if (!retry)
+		{
+		  if (dump_file)
+		    fprintf (dump_file, ";; No forwarder blocks found\n");
+		  loop->bad = 1;
+		}
+	    }
+	}
+    }
+
+ out:
+  VEC_free (basic_block, heap, works);
+}
+
+/* Analyze the structure of the loops in the current function.  Use STACK
+   for bitmap allocations.  Returns all the valid candidates for hardware
+   loops found in this function.  */
+static loop_info
+discover_loops (bitmap_obstack *stack, struct hw_doloop_hooks *hooks)
+{
+  loop_info loops = NULL;
+  loop_info loop;
+  basic_block bb;
+  bitmap tmp_bitmap;
+  int nloops = 0;
+
+  /* Find all the possible loop tails.  This means searching for every
+     loop_end instruction.  For each one found, create a loop_info
+     structure and add the head block to the work list. */
+  FOR_EACH_BB (bb)
+    {
+      rtx tail = BB_END (bb);
+      rtx insn, reg;
+
+      bb->aux = NULL;
+
+      while (tail && GET_CODE (tail) == NOTE && tail != BB_HEAD (bb))
+	tail = PREV_INSN (tail);
+
+      if (tail == NULL_RTX)
+	continue;
+
+      if (!JUMP_P (tail))
+	continue;
+      reg = hooks->end_pattern_reg (tail);
+      if (reg == NULL_RTX)
+	continue;
+
+      /* A possible loop end */
+
+      /* There's a degenerate case we can handle - an empty loop consisting
+	 of only a back branch.  Handle that by deleting the branch.  */
+      insn = JUMP_LABEL (tail);
+      while (insn && !NONDEBUG_INSN_P (insn))
+	insn = NEXT_INSN (insn);
+      if (insn == tail)
+	{
+	  if (dump_file)
+	    {
+	      fprintf (dump_file, ";; degenerate loop ending at\n");
+	      print_rtl_single (dump_file, tail);
+	    }
+	  delete_insn_and_edges (tail);
+	  continue;
+	}
+
+      loop = XNEW (struct loop_info);
+      loop->next = loops;
+      loops = loop;
+      loop->loop_no = nloops++;
+      loop->blocks = VEC_alloc (basic_block, heap, 20);
+      loop->block_bitmap = BITMAP_ALLOC (stack);
+      bb->aux = loop;
+
+      if (dump_file)
+	{
+	  fprintf (dump_file, ";; potential loop %d ending at\n",
+		   loop->loop_no);
+	  print_rtl_single (dump_file, tail);
+	}
+
+      discover_loop (loop, bb, tail, reg);
+    }
+
+  tmp_bitmap = BITMAP_ALLOC (stack);
+  /* Compute loop nestings.  */
+  for (loop = loops; loop; loop = loop->next)
+    {
+      loop_info other;
+      if (loop->bad)
+	continue;
+
+      for (other = loop->next; other; other = other->next)
+	{
+	  if (other->bad)
+	    continue;
+
+	  bitmap_and (tmp_bitmap, other->block_bitmap, loop->block_bitmap);
+	  if (bitmap_empty_p (tmp_bitmap))
+	    continue;
+	  if (bitmap_equal_p (tmp_bitmap, other->block_bitmap))
+	    {
+	      other->outer = loop;
+	      VEC_safe_push (loop_info, heap, loop->loops, other);
+	    }
+	  else if (bitmap_equal_p (tmp_bitmap, loop->block_bitmap))
+	    {
+	      loop->outer = other;
+	      VEC_safe_push (loop_info, heap, other->loops, loop);
+	    }
+	  else
+	    {
+	      if (dump_file)
+		fprintf (dump_file,
+			 ";; can't find suitable nesting for loops %d and %d\n",
+			 loop->loop_no, other->loop_no);
+	      loop->bad = other->bad = 1;
+	    }
+	}
+    }
+  BITMAP_FREE (tmp_bitmap);
+
+  if (dump_file)
+    dump_hwloops (loops);
+
+  return loops;
+}
+
+/* Free up the loop structures in LOOPS.  */
+static void
+free_loops (loop_info loops)
+{
+  while (loops)
+    {
+      loop_info loop = loops;
+      loops = loop->next;
+      VEC_free (loop_info, heap, loop->loops);
+      VEC_free (basic_block, heap, loop->blocks);
+      BITMAP_FREE (loop->block_bitmap);
+      XDELETE (loop);
+    }
+}
+
+#define BB_AUX_INDEX(BB) ((unsigned)(BB)->aux)
+
+/* The taken-branch edge from the loop end can actually go forward.
+   If the target's hardware loop support requires that the loop end be
+   after the loop start, try to reorder a loop's basic blocks when we
+   find such a case.  */
+
+static void
+reorder_loops (loop_info loops)
+{
+  basic_block bb;
+  loop_info loop;
+
+  FOR_EACH_BB (bb)
+    bb->aux = NULL;
+  cfg_layout_initialize (0);
+
+  for (loop = loops; loop; loop = loop->next)
+    {
+      unsigned index;
+      basic_block bb;
+      edge e;
+      edge_iterator ei;
+
+      if (loop->bad)
+	continue;
+
+      /* Recreate an index for basic blocks that represents their order.  */
+      for (bb = ENTRY_BLOCK_PTR->next_bb, index = 0;
+	   bb != EXIT_BLOCK_PTR;
+	   bb = bb->next_bb, index++)
+	bb->aux = (PTR) index;
+
+      if (BB_AUX_INDEX (loop->head) < BB_AUX_INDEX (loop->tail))
+	continue;
+
+      FOR_EACH_EDGE (e, ei, loop->head->succs)
+	{
+	  if (bitmap_bit_p (loop->block_bitmap, e->dest->index)
+	      && BB_AUX_INDEX (e->dest) < BB_AUX_INDEX (loop->tail))
+	    {
+	      basic_block start_bb = e->dest;
+	      basic_block start_prev_bb = start_bb->prev_bb;
+
+	      if (dump_file)
+		fprintf (dump_file, ";; Moving block %d before block %d\n",
+			 loop->head->index, start_bb->index);
+	      loop->head->prev_bb->next_bb = loop->head->next_bb;
+	      loop->head->next_bb->prev_bb = loop->head->prev_bb;
+
+	      loop->head->prev_bb = start_prev_bb;
+	      loop->head->next_bb = start_bb;
+	      start_prev_bb->next_bb = start_bb->prev_bb = loop->head;
+	      break;
+	    }
+	}
+      loops = loops->next;
+    }
+  
+  FOR_EACH_BB (bb)
+    {
+      if (bb->next_bb != EXIT_BLOCK_PTR)
+	bb->aux = bb->next_bb;
+      else
+	bb->aux = NULL;
+    }
+  cfg_layout_finalize ();
+  df_analyze ();
+}
+
+/* Call the OPT function for LOOP and all of its sub-loops.  This is
+   done in a depth-first search; innermost loops are visited first.
+   OPTIMIZE and FAIL are the functions passed to reorg_loops by the
+   target's reorg pass.  */
+
+static void
+optimize_loop (loop_info loop, struct hw_doloop_hooks *hooks)
+{
+  int ix;
+  loop_info inner;
+  int inner_depth = 0;
+
+  if (loop->visited)
+    return;
+
+  loop->visited = 1;
+
+  if (loop->bad)
+    {
+      if (dump_file)
+	fprintf (dump_file, ";; loop %d bad when found\n", loop->loop_no);
+      goto bad_loop;
+    }
+
+  /* Every loop contains in its list of inner loops every loop nested inside
+     it, even if there are intermediate loops.  This works because we're doing
+     a depth-first search here and never visit a loop more than once.  */
+  for (ix = 0; VEC_iterate (loop_info, loop->loops, ix, inner); ix++)
+    {
+      optimize_loop (inner, hooks);
+
+      if (!inner->bad && inner_depth < inner->depth)
+	inner_depth = inner->depth;
+      /* The set of registers may be changed while optimizing the inner
+	 loop.  */
+      IOR_HARD_REG_SET (loop->regs_set_in_loop, inner->regs_set_in_loop);
+    }
+
+  loop->depth = inner_depth + 1;
+
+  if (hooks->opt (loop))
+    return;
+
+ bad_loop:
+  if (dump_file)
+    fprintf (dump_file, ";; loop %d is bad\n", loop->loop_no);
+
+  loop->bad = true;
+  hooks->fail (loop);
+}
+
+/* Run from machine_dependent_reorg, this pass looks for doloop_end
+   insns, analyzes the loop structure, and tries to rewrite the RTL of
+   these loops so that the target can create whatever hardware looping
+   instructions are available.
+   
+   DO_REORDER should be set to true if we should try to use the
+   reorder_loops function to ensure the loop end occurs after the loop
+   start.  HOOKS is used to pass in target specific hooks; see
+   hw-doloop.h.  */
+
+void
+reorg_loops (bool do_reorder, struct hw_doloop_hooks *hooks)
+{
+  loop_info loops = NULL;
+  loop_info loop;
+  basic_block bb;
+  bitmap_obstack stack;
+
+  df_live_add_problem ();
+  df_live_set_all_dirty ();
+  df_analyze ();
+
+  bitmap_obstack_initialize (&stack);
+
+  if (dump_file)
+    fprintf (dump_file, ";; Find loops, first pass\n\n");
+
+  loops = discover_loops (&stack, hooks);
+
+  if (do_reorder)
+    {
+      reorder_loops (loops);
+      free_loops (loops);
+
+      if (dump_file)
+	fprintf (dump_file, ";; Find loops, second pass\n\n");
+
+      loops = discover_loops (&stack, hooks);
+    }
+
+  if (hooks->after_reorder)
+    {
+      if (hooks->after_reorder ())
+	{
+	  free_loops (loops);
+	  if (dump_file)
+	    fprintf (dump_file, ";; Find loops after target after_reorder\n\n");
+
+	  loops = discover_loops (&stack, hooks);
+	}
+    }
+
+  for (loop = loops; loop; loop = loop->next)
+    scan_loop (loop);
+
+  /* Now apply the optimizations.  */
+  for (loop = loops; loop; loop = loop->next)
+    optimize_loop (loop, hooks);
+
+  if (dump_file)
+    {
+      fprintf (dump_file, ";; After hardware loops optimization:\n\n");
+      dump_hwloops (loops);
+    }
+
+  free_loops (loops);
+
+  if (dump_file)
+    print_rtl (dump_file, get_insns ());
+
+  FOR_EACH_BB (bb)
+    bb->aux = NULL;
+}
+#endif
Index: gcc/hw-doloop.h
===================================================================
--- gcc/hw-doloop.h	(revision 0)
+++ gcc/hw-doloop.h	(revision 0)
@@ -0,0 +1,149 @@ 
+/* Code to analyze doloop loops in order for targets to perform late
+   optimizations converting doloops to other forms of hardware loops.
+   Copyright (C) 2011 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* We need to keep a vector of loops */
+typedef struct loop_info *loop_info;
+DEF_VEC_P (loop_info);
+DEF_VEC_ALLOC_P (loop_info,heap);
+
+/* Information about a loop we have found (or are in the process of
+   finding).  */
+struct GTY (()) loop_info
+{
+  /* loop number, for dumps */
+  int loop_no;
+
+  /* Next loop in the graph. */
+  struct loop_info *next;
+
+  /* Immediate outer loop of this loop.  */
+  struct loop_info *outer;
+
+  /* Vector of blocks only within the loop, including those within
+     inner loops.  */
+  VEC (basic_block,heap) *blocks;
+
+  /* Same information in a bitmap.  */
+  bitmap block_bitmap;
+
+  /* Vector of inner loops within this loop  */
+  VEC (loop_info,heap) *loops;
+
+  /* All edges that jump into the loop.  */
+  VEC(edge,gc) *incoming;
+
+  /* The ports currently using this infrastructure can typically
+     handle two cases: all incoming edges have the same destination
+     block, or all incoming edges have the same source block.  These
+     two members are set to the common source or destination we found,
+     or NULL if different blocks were found.  If both are NULL the
+     loop can't be optimized.  */
+  basic_block incoming_src;
+  basic_block incoming_dest;
+
+  /* First block in the loop.  This is the one branched to by the loop_end
+     insn.  */
+  basic_block head;
+
+  /* Last block in the loop (the one with the loop_end insn).  */
+  basic_block tail;
+
+  /* The successor block of the loop.  This is the one the loop_end insn
+     falls into.  */
+  basic_block successor;
+
+  /* The last instruction in the tail.  */
+  rtx last_insn;
+
+  /* The loop_end insn.  */
+  rtx loop_end;
+
+  /* The iteration register.  */
+  rtx iter_reg;
+
+  /* The new label placed at the beginning of the loop. */
+  rtx start_label;
+
+  /* The new label placed at the end of the loop. */
+  rtx end_label;
+
+  /* The length of the loop.  */
+  int length;
+
+  /* The nesting depth of the loop.  Innermost loops are given a depth
+     of 1.  Only successfully optimized doloops are counted; if an inner
+     loop was marked as bad, it does not increase the depth of its parent
+     loop.
+     This value is valid when the target's optimize function is called.  */
+  int depth;
+
+  /* True if we can't optimize this loop.  */
+  bool bad;
+
+  /* True if we have visited this loop during the optimization phase.  */
+  bool visited;
+
+  /* The following values are collected before calling the target's optimize
+     function and are not valid earlier.  */
+  
+  /* Record information about control flow: whether the loop has calls
+     or asm statements, whether it has edges that jump out of the loop,
+     or edges that jump within the loop.  */
+  bool has_call;
+  bool has_asm;
+  bool jumps_within;
+  bool jumps_outof;
+
+  /* True if there is an instruction other than the doloop_end which uses the
+     iteration register.  */
+  bool iter_reg_used;
+  /* True if the iteration register lives past the doloop instruction.  */
+  bool iter_reg_used_outside;
+
+  /* Hard registers set at any point in the loop, except for the loop counter
+     register's set in the doloop_end instruction.  */
+  HARD_REG_SET regs_set_in_loop;
+
+  /* The number of instructions that are marked with TImode.  Useful if the
+     pass is run after the second scheduling pass.  */
+  int timode_insns;
+};
+
+struct hw_doloop_hooks
+{
+  /* Examine INSN.  If it is a suitable doloop_end pattern, return the
+     iteration register.  Otherwise, return NULL_RTX.  */
+  rtx (*end_pattern_reg) (rtx insn);
+  /* Called after the hw-doloop pass has finished reordering the basic block
+     structure.  The target can defer some of its initializations to this
+     hook if necessary.  This function should return true if the CFG has been
+     changed and loops must be rediscovered.  */
+  bool (*after_reorder) (void);
+  /* Optimize LOOP.  Return true if successful, false if the loop should be
+     marked bad.  */
+  bool (*opt) (loop_info loop);
+  /* Handle a loop that was marked bad for any reason.  This could be used to
+     split the doloop_end pattern.  */
+  void (*fail) (loop_info loop);
+};
+
+extern void reorg_loops (bool, struct hw_doloop_hooks *);
+
+extern bool bb_in_loop_p (loop_info, basic_block);
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	(revision 175202)
+++ gcc/rtl.h	(working copy)
@@ -1867,6 +1867,7 @@  extern rtx find_last_value (rtx, rtx *,
 extern int refers_to_regno_p (unsigned int, unsigned int, const_rtx, rtx *);
 extern int reg_overlap_mentioned_p (const_rtx, const_rtx);
 extern const_rtx set_of (const_rtx, const_rtx);
+extern void record_hard_reg_sets (rtx, const_rtx, void *);
 extern void note_stores (const_rtx, void (*) (rtx, const_rtx, void *), void *);
 extern void note_uses (rtx *, void (*) (rtx *, void *), void *);
 extern int dead_or_set_p (const_rtx, const_rtx);
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 175202)
+++ gcc/Makefile.in	(working copy)
@@ -1292,6 +1292,7 @@  OBJS = \
 	graphite-sese-to-poly.o \
 	gtype-desc.o \
 	haifa-sched.o \
+	hw-doloop.o \
 	hwint.o \
 	ifcvt.o \
 	implicit-zee.o \
@@ -3545,7 +3546,10 @@  target-globals.o : target-globals.c $(CO
    $(TM_H) insn-config.h $(MACHMODE_H) $(GGC_H) toplev.h target-globals.h \
    $(FLAGS_H) $(REGS_H) $(RTL_H) reload.h expmed.h $(EXPR_H) $(OPTABS_H) \
    $(LIBFUNCS_H) $(CFGLOOP_H) $(IRA_INT_H) builtins.h gcse.h bb-reorder.h
-
+hw-doloop.o : hw-doloop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+   $(RTL_H) $(FLAGS_H) $(EXPR_H) hard-reg-set.h $(BASIC_BLOCK_H) $(TM_P_H) \
+   $(DF_H) $(CFGLAYOUT_H) $(CFGLOOP_H) output.h $(RECOG_H) $(TARGET_H) \
+   hw-doloop.h
 $(out_object_file): $(out_file) $(CONFIG_H) coretypes.h $(TM_H) $(TREE_H) \
    $(RTL_H) $(REGS_H) hard-reg-set.h insn-config.h conditions.h \
    output.h $(INSN_ATTR_H) $(SYSTEM_H) toplev.h $(DIAGNOSTIC_CORE_H) \
Index: gcc/config/bfin/bfin.c
===================================================================
--- gcc/config/bfin/bfin.c	(revision 175202)
+++ gcc/config/bfin/bfin.c	(working copy)
@@ -56,6 +56,7 @@ 
 #include "timevar.h"
 #include "df.h"
 #include "sel-sched.h"
+#include "hw-doloop.h"
 #include "opts.h"
 
 /* A C structure for machine-specific, per-function data.
@@ -3381,157 +3382,6 @@  bfin_hardware_loop (void)
 /* Maximum distance of the LSETUP instruction from the loop start.  */
 #define MAX_LSETUP_DISTANCE 30
 
-/* We need to keep a vector of loops */
-typedef struct loop_info_d *loop_info;
-DEF_VEC_P (loop_info);
-DEF_VEC_ALLOC_P (loop_info,heap);
-
-/* Information about a loop we have found (or are in the process of
-   finding).  */
-struct GTY (()) loop_info_d
-{
-  /* loop number, for dumps */
-  int loop_no;
-
-  /* All edges that jump into and out of the loop.  */
-  VEC(edge,gc) *incoming;
-
-  /* We can handle two cases: all incoming edges have the same destination
-     block, or all incoming edges have the same source block.  These two
-     members are set to the common source or destination we found, or NULL
-     if different blocks were found.  If both are NULL the loop can't be
-     optimized.  */
-  basic_block incoming_src;
-  basic_block incoming_dest;
-
-  /* First block in the loop.  This is the one branched to by the loop_end
-     insn.  */
-  basic_block head;
-
-  /* Last block in the loop (the one with the loop_end insn).  */
-  basic_block tail;
-
-  /* The successor block of the loop.  This is the one the loop_end insn
-     falls into.  */
-  basic_block successor;
-
-  /* The last instruction in the tail.  */
-  rtx last_insn;
-
-  /* The loop_end insn.  */
-  rtx loop_end;
-
-  /* The iteration register.  */
-  rtx iter_reg;
-
-  /* The new label placed at the beginning of the loop. */
-  rtx start_label;
-
-  /* The new label placed at the end of the loop. */
-  rtx end_label;
-
-  /* The length of the loop.  */
-  int length;
-
-  /* The nesting depth of the loop.  */
-  int depth;
-
-  /* Nonzero if we can't optimize this loop.  */
-  int bad;
-
-  /* True if we have visited this loop.  */
-  int visited;
-
-  /* True if this loop body clobbers any of LC0, LT0, or LB0.  */
-  int clobber_loop0;
-
-  /* True if this loop body clobbers any of LC1, LT1, or LB1.  */
-  int clobber_loop1;
-
-  /* Next loop in the graph. */
-  struct loop_info_d *next;
-
-  /* Immediate outer loop of this loop.  */
-  struct loop_info_d *outer;
-
-  /* Vector of blocks only within the loop, including those within
-     inner loops.  */
-  VEC (basic_block,heap) *blocks;
-
-  /* Same information in a bitmap.  */
-  bitmap block_bitmap;
-
-  /* Vector of inner loops within this loop  */
-  VEC (loop_info,heap) *loops;
-};
-
-static void
-bfin_dump_loops (loop_info loops)
-{
-  loop_info loop;
-
-  for (loop = loops; loop; loop = loop->next)
-    {
-      loop_info i;
-      basic_block b;
-      unsigned ix;
-
-      fprintf (dump_file, ";; loop %d: ", loop->loop_no);
-      if (loop->bad)
-	fprintf (dump_file, "(bad) ");
-      fprintf (dump_file, "{head:%d, depth:%d}", loop->head->index, loop->depth);
-
-      fprintf (dump_file, " blocks: [ ");
-      FOR_EACH_VEC_ELT (basic_block, loop->blocks, ix, b)
-	fprintf (dump_file, "%d ", b->index);
-      fprintf (dump_file, "] ");
-
-      fprintf (dump_file, " inner loops: [ ");
-      FOR_EACH_VEC_ELT (loop_info, loop->loops, ix, i)
-	fprintf (dump_file, "%d ", i->loop_no);
-      fprintf (dump_file, "]\n");
-    }
-  fprintf (dump_file, "\n");
-}
-
-/* Scan the blocks of LOOP (and its inferiors) looking for basic block
-   BB. Return true, if we find it.  */
-
-static bool
-bfin_bb_in_loop (loop_info loop, basic_block bb)
-{
-  return bitmap_bit_p (loop->block_bitmap, bb->index);
-}
-
-/* Scan the blocks of LOOP (and its inferiors) looking for uses of
-   REG.  Return true, if we find any.  Don't count the loop's loop_end
-   insn if it matches LOOP_END.  */
-
-static bool
-bfin_scan_loop (loop_info loop, rtx reg, rtx loop_end)
-{
-  unsigned ix;
-  basic_block bb;
-
-  FOR_EACH_VEC_ELT (basic_block, loop->blocks, ix, bb)
-    {
-      rtx insn;
-
-      for (insn = BB_HEAD (bb);
-	   insn != NEXT_INSN (BB_END (bb));
-	   insn = NEXT_INSN (insn))
-	{
-	  if (!INSN_P (insn))
-	    continue;
-	  if (insn == loop_end)
-	    continue;
-	  if (reg_mentioned_p (reg, PATTERN (insn)))
-	    return true;
-	}
-    }
-  return false;
-}
-
 /* Estimate the length of INSN conservatively.  */
 
 static int
@@ -3559,67 +3409,32 @@  length_for_loop (rtx insn)
 
 /* Optimize LOOP.  */
 
-static void
-bfin_optimize_loop (loop_info loop)
+static bool
+hwloop_optimize (loop_info loop)
 {
   basic_block bb;
   loop_info inner;
   rtx insn, last_insn;
   rtx loop_init, start_label, end_label;
-  rtx reg_lc0, reg_lc1, reg_lt0, reg_lt1, reg_lb0, reg_lb1;
   rtx iter_reg, scratchreg, scratch_init, scratch_init_insn;
   rtx lc_reg, lt_reg, lb_reg;
   rtx seq, seq_end;
   int length;
   unsigned ix;
-  int inner_depth = 0;
-
-  if (loop->visited)
-    return;
-
-  loop->visited = 1;
-
-  if (loop->bad)
-    {
-      if (dump_file)
-	fprintf (dump_file, ";; loop %d bad when found\n", loop->loop_no);
-      goto bad_loop;
-    }
-
-  /* Every loop contains in its list of inner loops every loop nested inside
-     it, even if there are intermediate loops.  This works because we're doing
-     a depth-first search here and never visit a loop more than once.  */
-  FOR_EACH_VEC_ELT (loop_info, loop->loops, ix, inner)
-    {
-      bfin_optimize_loop (inner);
-
-      if (!inner->bad && inner_depth < inner->depth)
-	{
-	  inner_depth = inner->depth;
-
-	  loop->clobber_loop0 |= inner->clobber_loop0;
-	  loop->clobber_loop1 |= inner->clobber_loop1;
-	}
-    }
+  bool clobber0, clobber1;
 
-  loop->depth = inner_depth + 1;
   if (loop->depth > MAX_LOOP_DEPTH)
     {
       if (dump_file)
 	fprintf (dump_file, ";; loop %d too deep\n", loop->loop_no);
-      goto bad_loop;
+      return false;
     }
 
   /* Get the loop iteration register.  */
   iter_reg = loop->iter_reg;
 
-  if (!REG_P (iter_reg))
-    {
-      if (dump_file)
-	fprintf (dump_file, ";; loop %d iteration count not in a register\n",
-		 loop->loop_no);
-      goto bad_loop;
-    }
+  gcc_assert (REG_P (iter_reg));
+
   scratchreg = NULL_RTX;
   scratch_init = iter_reg;
   scratch_init_insn = NULL_RTX;
@@ -3680,7 +3495,7 @@  bfin_optimize_loop (loop_info loop)
 	  if (dump_file)
 	    fprintf (dump_file, ";; loop %d lsetup not before loop_start\n",
 		     loop->loop_no);
-	  goto bad_loop;
+	  return false;
 	}
 
       /* Account for the pop of a scratch register where necessary.  */
@@ -3692,7 +3507,7 @@  bfin_optimize_loop (loop_info loop)
 	{
 	  if (dump_file)
 	    fprintf (dump_file, ";; loop %d lsetup too far away\n", loop->loop_no);
-	  goto bad_loop;
+	  return false;
 	}
     }
 
@@ -3710,7 +3525,7 @@  bfin_optimize_loop (loop_info loop)
       if (dump_file)
 	fprintf (dump_file, ";; loop %d start_label not before loop_end\n",
 		 loop->loop_no);
-      goto bad_loop;
+      return false;
     }
 
   loop->length = length;
@@ -3718,58 +3533,29 @@  bfin_optimize_loop (loop_info loop)
     {
       if (dump_file)
 	fprintf (dump_file, ";; loop %d too long\n", loop->loop_no);
-      goto bad_loop;
+      return false;
     }
 
   /* Scan all the blocks to make sure they don't use iter_reg.  */
-  if (bfin_scan_loop (loop, iter_reg, loop->loop_end))
+  if (loop->iter_reg_used)
     {
       if (dump_file)
 	fprintf (dump_file, ";; loop %d uses iterator\n", loop->loop_no);
-      goto bad_loop;
-    }
-
-  /* Scan all the insns to see if the loop body clobber
-     any hardware loop registers. */
-
-  reg_lc0 = gen_rtx_REG (SImode, REG_LC0);
-  reg_lc1 = gen_rtx_REG (SImode, REG_LC1);
-  reg_lt0 = gen_rtx_REG (SImode, REG_LT0);
-  reg_lt1 = gen_rtx_REG (SImode, REG_LT1);
-  reg_lb0 = gen_rtx_REG (SImode, REG_LB0);
-  reg_lb1 = gen_rtx_REG (SImode, REG_LB1);
-
-  FOR_EACH_VEC_ELT (basic_block, loop->blocks, ix, bb)
-    {
-      rtx insn;
-
-      for (insn = BB_HEAD (bb);
-	   insn != NEXT_INSN (BB_END (bb));
-	   insn = NEXT_INSN (insn))
-	{
-	  if (!INSN_P (insn))
-	    continue;
-
-	  if (reg_set_p (reg_lc0, insn)
-	      || reg_set_p (reg_lt0, insn)
-	      || reg_set_p (reg_lb0, insn))
-	    loop->clobber_loop0 = 1;
-	  
-	  if (reg_set_p (reg_lc1, insn)
-	      || reg_set_p (reg_lt1, insn)
-	      || reg_set_p (reg_lb1, insn))
-	    loop->clobber_loop1 |= 1;
-	}
+      return false;
     }
 
-  if ((loop->clobber_loop0 && loop->clobber_loop1)
-      || (loop->depth == MAX_LOOP_DEPTH && loop->clobber_loop0))
+  clobber0 = (TEST_HARD_REG_BIT (loop->regs_set_in_loop, REG_LC0)
+	      || TEST_HARD_REG_BIT (loop->regs_set_in_loop, REG_LB0)
+	      || TEST_HARD_REG_BIT (loop->regs_set_in_loop, REG_LT0));
+  clobber1 = (TEST_HARD_REG_BIT (loop->regs_set_in_loop, REG_LC1)
+	      || TEST_HARD_REG_BIT (loop->regs_set_in_loop, REG_LB1)
+	      || TEST_HARD_REG_BIT (loop->regs_set_in_loop, REG_LT1));
+  if (clobber0 && clobber1)
     {
-      loop->depth = MAX_LOOP_DEPTH + 1;
       if (dump_file)
 	fprintf (dump_file, ";; loop %d no loop reg available\n",
 		 loop->loop_no);
-      goto bad_loop;
+      return false;
     }
 
   /* There should be an instruction before the loop_end instruction
@@ -3814,7 +3600,7 @@  bfin_optimize_loop (loop_info loop)
       if (dump_file)
 	fprintf (dump_file, ";; loop %d has no last instruction\n",
 		 loop->loop_no);
-      goto bad_loop;
+      return false;
     }
 
   if (JUMP_P (last_insn) && !any_condjump_p (last_insn))
@@ -3822,7 +3608,7 @@  bfin_optimize_loop (loop_info loop)
       if (dump_file)
 	fprintf (dump_file, ";; loop %d has bad last instruction\n",
 		 loop->loop_no);
-      goto bad_loop;
+      return false;
     }
   /* In all other cases, try to replace a bad last insn with a nop.  */
   else if (JUMP_P (last_insn)
@@ -3838,7 +3624,7 @@  bfin_optimize_loop (loop_info loop)
 	{
 	  if (dump_file)
 	    fprintf (dump_file, ";; loop %d too long\n", loop->loop_no);
-	  goto bad_loop;
+	  return false;
 	}
       if (dump_file)
 	fprintf (dump_file, ";; loop %d has bad last insn; replace with nop\n",
@@ -3854,19 +3640,19 @@  bfin_optimize_loop (loop_info loop)
   end_label = gen_label_rtx ();
   iter_reg = loop->iter_reg;
 
-  if (loop->depth == 1 && !loop->clobber_loop1)
+  if (loop->depth == 1 && !clobber1)
     {
-      lc_reg = reg_lc1;
-      lt_reg = reg_lt1;
-      lb_reg = reg_lb1;
-      loop->clobber_loop1 = 1;
+      lc_reg = gen_rtx_REG (SImode, REG_LC1);
+      lb_reg = gen_rtx_REG (SImode, REG_LB1);
+      lt_reg = gen_rtx_REG (SImode, REG_LT1);
+      SET_HARD_REG_BIT (loop->regs_set_in_loop, REG_LC1);
     }
   else
     {
-      lc_reg = reg_lc0;
-      lt_reg = reg_lt0;
-      lb_reg = reg_lb0;
-      loop->clobber_loop0 = 1;
+      lc_reg = gen_rtx_REG (SImode, REG_LC0);
+      lb_reg = gen_rtx_REG (SImode, REG_LB0);
+      lt_reg = gen_rtx_REG (SImode, REG_LT0);
+      SET_HARD_REG_BIT (loop->regs_set_in_loop, REG_LC0);
     }
 
   loop->end_label = end_label;
@@ -4009,15 +3795,17 @@  bfin_optimize_loop (loop_info loop)
   /* Insert the loop end label before the last instruction of the loop.  */
   emit_label_before (loop->end_label, loop->last_insn);
 
-  return;
-
- bad_loop:
-
-  if (dump_file)
-    fprintf (dump_file, ";; loop %d is bad\n", loop->loop_no);
-
-  loop->bad = 1;
+  return true;
+}
 
+/* A callback for the hw-doloop pass.  Called when a loop we have discovered
+   turns out not to be optimizable; we have to split the doloop_end pattern
+   into a subtract and a test.  */
+static void
+hwloop_fail (loop_info loop)
+{
+  rtx insn = loop->loop_end;
+  
   if (DPREG_P (loop->iter_reg))
     {
       /* If loop->iter_reg is a DREG or PREG, we can split it here
@@ -4039,370 +3827,51 @@  bfin_optimize_loop (loop_info loop)
       LABEL_NUSES (loop->start_label)++;
       delete_insn (loop->loop_end);
     }
-}
-
-/* Called from bfin_reorg_loops when a potential loop end is found.  LOOP is
-   a newly set up structure describing the loop, it is this function's
-   responsibility to fill most of it.  TAIL_BB and TAIL_INSN point to the
-   loop_end insn and its enclosing basic block.  */
-
-static void
-bfin_discover_loop (loop_info loop, basic_block tail_bb, rtx tail_insn)
-{
-  unsigned dwork = 0;
-  basic_block bb;
-  VEC (basic_block,heap) *works = VEC_alloc (basic_block,heap,20);
-
-  loop->tail = tail_bb;
-  loop->head = BRANCH_EDGE (tail_bb)->dest;
-  loop->successor = FALLTHRU_EDGE (tail_bb)->dest;
-  loop->loop_end = tail_insn;
-  loop->last_insn = NULL_RTX;
-  loop->iter_reg = SET_DEST (XVECEXP (PATTERN (tail_insn), 0, 1));
-  loop->depth = loop->length = 0;
-  loop->visited = 0;
-  loop->clobber_loop0 = loop->clobber_loop1 = 0;
-  loop->outer = NULL;
-  loop->loops = NULL;
-  loop->incoming = VEC_alloc (edge, gc, 2);
-  loop->start_label = XEXP (XEXP (SET_SRC (XVECEXP (PATTERN (tail_insn), 0, 0)), 1), 0);
-  loop->end_label = NULL_RTX;
-  loop->bad = 0;
-
-  VEC_safe_push (basic_block, heap, works, loop->head);
-
-  while (VEC_iterate (basic_block, works, dwork++, bb))
-    {
-      edge e;
-      edge_iterator ei;
-      if (bb == EXIT_BLOCK_PTR)
-	{
-	  /* We've reached the exit block.  The loop must be bad. */
-	  if (dump_file)
-	    fprintf (dump_file,
-		     ";; Loop is bad - reached exit block while scanning\n");
-	  loop->bad = 1;
-	  break;
-	}
-
-      if (!bitmap_set_bit (loop->block_bitmap, bb->index))
-	continue;
-
-      /* We've not seen this block before.  Add it to the loop's
-	 list and then add each successor to the work list.  */
-
-      VEC_safe_push (basic_block, heap, loop->blocks, bb);
-
-      if (bb != tail_bb)
-	{
-	  FOR_EACH_EDGE (e, ei, bb->succs)
-	    {
-	      basic_block succ = EDGE_SUCC (bb, ei.index)->dest;
-	      if (!REGNO_REG_SET_P (df_get_live_in (succ),
-				    REGNO (loop->iter_reg)))
-		continue;
-	      if (!VEC_space (basic_block, works, 1))
-		{
-		  if (dwork)
-		    {
-		      VEC_block_remove (basic_block, works, 0, dwork);
-		      dwork = 0;
-		    }
-		  else
-		    VEC_reserve (basic_block, heap, works, 1);
-		}
-	      VEC_quick_push (basic_block, works, succ);
-	    }
-	}
-    }
-
-  /* Find the predecessor, and make sure nothing else jumps into this loop.  */
-  if (!loop->bad)
-    {
-      int pass, retry;
-      FOR_EACH_VEC_ELT (basic_block, loop->blocks, dwork, bb)
-	{
-	  edge e;
-	  edge_iterator ei;
-	  FOR_EACH_EDGE (e, ei, bb->preds)
-	    {
-	      basic_block pred = e->src;
-
-	      if (!bfin_bb_in_loop (loop, pred))
-		{
-		  if (dump_file)
-		    fprintf (dump_file, ";; Loop %d: incoming edge %d -> %d\n",
-			     loop->loop_no, pred->index,
-			     e->dest->index);
-		  VEC_safe_push (edge, gc, loop->incoming, e);
-		}
-	    }
-	}
-
-      for (pass = 0, retry = 1; retry && pass < 2; pass++)
-	{
-	  edge e;
-	  edge_iterator ei;
-	  bool first = true;
-	  retry = 0;
-
-	  FOR_EACH_EDGE (e, ei, loop->incoming)
-	    {
-	      if (first)
-		{
-		  loop->incoming_src = e->src;
-		  loop->incoming_dest = e->dest;
-		  first = false;
-		}
-	      else
-		{
-		  if (e->dest != loop->incoming_dest)
-		    loop->incoming_dest = NULL;
-		  if (e->src != loop->incoming_src)
-		    loop->incoming_src = NULL;
-		}
-	      if (loop->incoming_src == NULL && loop->incoming_dest == NULL)
-		{
-		  if (pass == 0)
-		    {
-		      if (dump_file)
-			fprintf (dump_file,
-				 ";; retrying loop %d with forwarder blocks\n",
-				 loop->loop_no);
-		      retry = 1;
-		      break;
-		    }
-		  loop->bad = 1;
-		  if (dump_file)
-		    fprintf (dump_file,
-			     ";; can't find suitable entry for loop %d\n",
-			     loop->loop_no);
-		  goto out;
-		}
-	    }
-	  if (retry)
-	    {
-	      retry = 0;
-	      FOR_EACH_EDGE (e, ei, loop->incoming)
-		{
-		  if (forwarder_block_p (e->src))
-		    {
-		      edge e2;
-		      edge_iterator ei2;
-
-		      if (dump_file)
-			fprintf (dump_file,
-				 ";; Adding forwarder block %d to loop %d and retrying\n",
-				 e->src->index, loop->loop_no);
-		      VEC_safe_push (basic_block, heap, loop->blocks, e->src);
-		      bitmap_set_bit (loop->block_bitmap, e->src->index);
-		      FOR_EACH_EDGE (e2, ei2, e->src->preds)
-			VEC_safe_push (edge, gc, loop->incoming, e2);
-		      VEC_unordered_remove (edge, loop->incoming, ei.index);
-		      retry = 1;
-		      break;
-		    }
-		}
-	      if (!retry)
-		{
-		  if (dump_file)
-		    fprintf (dump_file, ";; No forwarder blocks found\n");
-		  loop->bad = 1;
-		}
-	    }
-	}
-    }
-
- out:
-  VEC_free (basic_block, heap, works);
-}
-
-/* Analyze the structure of the loops in the current function.  Use STACK
-   for bitmap allocations.  Returns all the valid candidates for hardware
-   loops found in this function.  */
-static loop_info
-bfin_discover_loops (bitmap_obstack *stack, FILE *dump_file)
-{
-  loop_info loops = NULL;
-  loop_info loop;
-  basic_block bb;
-  bitmap tmp_bitmap;
-  int nloops = 0;
-
-  /* Find all the possible loop tails.  This means searching for every
-     loop_end instruction.  For each one found, create a loop_info
-     structure and add the head block to the work list. */
-  FOR_EACH_BB (bb)
-    {
-      rtx tail = BB_END (bb);
-
-      while (GET_CODE (tail) == NOTE)
-	tail = PREV_INSN (tail);
-
-      bb->aux = NULL;
-
-      if (INSN_P (tail) && recog_memoized (tail) == CODE_FOR_loop_end)
-	{
-	  rtx insn;
-	  /* A possible loop end */
-
-	  /* There's a degenerate case we can handle - an empty loop consisting
-	     of only a back branch.  Handle that by deleting the branch.  */
-	  insn = BB_HEAD (BRANCH_EDGE (bb)->dest);
-	  if (next_real_insn (insn) == tail)
-	    {
-	      if (dump_file)
-		{
-		  fprintf (dump_file, ";; degenerate loop ending at\n");
-		  print_rtl_single (dump_file, tail);
-		}
-	      delete_insn_and_edges (tail);
-	      continue;
-	    }
-
-	  loop = XNEW (struct loop_info_d);
-	  loop->next = loops;
-	  loops = loop;
-	  loop->loop_no = nloops++;
-	  loop->blocks = VEC_alloc (basic_block, heap, 20);
-	  loop->block_bitmap = BITMAP_ALLOC (stack);
-	  bb->aux = loop;
-
-	  if (dump_file)
-	    {
-	      fprintf (dump_file, ";; potential loop %d ending at\n",
-		       loop->loop_no);
-	      print_rtl_single (dump_file, tail);
-	    }
-
-	  bfin_discover_loop (loop, bb, tail);
-	}
-    }
-
-  tmp_bitmap = BITMAP_ALLOC (stack);
-  /* Compute loop nestings.  */
-  for (loop = loops; loop; loop = loop->next)
+  else
     {
-      loop_info other;
-      if (loop->bad)
-	continue;
-
-      for (other = loop->next; other; other = other->next)
-	{
-	  if (other->bad)
-	    continue;
-
-	  bitmap_and (tmp_bitmap, other->block_bitmap, loop->block_bitmap);
-	  if (bitmap_empty_p (tmp_bitmap))
-	    continue;
-	  if (bitmap_equal_p (tmp_bitmap, other->block_bitmap))
-	    {
-	      other->outer = loop;
-	      VEC_safe_push (loop_info, heap, loop->loops, other);
-	    }
-	  else if (bitmap_equal_p (tmp_bitmap, loop->block_bitmap))
-	    {
-	      loop->outer = other;
-	      VEC_safe_push (loop_info, heap, other->loops, loop);
-	    }
-	  else
-	    {
-	      if (dump_file)
-		fprintf (dump_file,
-			 ";; can't find suitable nesting for loops %d and %d\n",
-			 loop->loop_no, other->loop_no);
-	      loop->bad = other->bad = 1;
-	    }
-	}
+      splitting_loops = 1;  
+      try_split (PATTERN (insn), insn, 1);
+      splitting_loops = 0;
     }
-  BITMAP_FREE (tmp_bitmap);
-
-  return loops;
 }
 
-/* Free up the loop structures in LOOPS.  */
-static void
-free_loops (loop_info loops)
+/* A callback for the hw-doloop pass.  This function is called after the loop
+   structure has been reordered.  It should return true if it makes any
+   further changes to the CFG that require the caller to rediscover the
+   loops.  */
+static bool
+hwloop_after_reorder (void)
 {
-  while (loops)
-    {
-      loop_info loop = loops;
-      loops = loop->next;
-      VEC_free (loop_info, heap, loop->loops);
-      VEC_free (basic_block, heap, loop->blocks);
-      BITMAP_FREE (loop->block_bitmap);
-      XDELETE (loop);
-    }
+  return false;
 }
 
-#define BB_AUX_INDEX(BB) ((intptr_t)(BB)->aux)
+/* A callback for the hw-doloop pass.  This function examines INSN; if
+   it is a loop_end pattern we recognize, return the reg rtx for the
+   loop counter.  Otherwise, return NULL_RTX.  */
 
-/* The taken-branch edge from the loop end can actually go forward.  Since the
-   Blackfin's LSETUP instruction requires that the loop end be after the loop
-   start, try to reorder a loop's basic blocks when we find such a case.  */
-static void
-bfin_reorder_loops (loop_info loops, FILE *dump_file)
+static rtx
+hwloop_pattern_reg (rtx insn)
 {
-  basic_block bb;
-  loop_info loop;
-
-  FOR_EACH_BB (bb)
-    bb->aux = NULL;
-  cfg_layout_initialize (0);
-
-  for (loop = loops; loop; loop = loop->next)
-    {
-      intptr_t index;
-      basic_block bb;
-      edge e;
-      edge_iterator ei;
-
-      if (loop->bad)
-	continue;
-
-      /* Recreate an index for basic blocks that represents their order.  */
-      for (bb = ENTRY_BLOCK_PTR->next_bb, index = 0;
-	   bb != EXIT_BLOCK_PTR;
-	   bb = bb->next_bb, index++)
-	bb->aux = (PTR) index;
-
-      if (BB_AUX_INDEX (loop->head) < BB_AUX_INDEX (loop->tail))
-	continue;
-
-      FOR_EACH_EDGE (e, ei, loop->head->succs)
-	{
-	  if (bitmap_bit_p (loop->block_bitmap, e->dest->index)
-	      && BB_AUX_INDEX (e->dest) < BB_AUX_INDEX (loop->tail))
-	    {
-	      basic_block start_bb = e->dest;
-	      basic_block start_prev_bb = start_bb->prev_bb;
+  rtx pat, reg;
 
-	      if (dump_file)
-		fprintf (dump_file, ";; Moving block %d before block %d\n",
-			 loop->head->index, start_bb->index);
-	      loop->head->prev_bb->next_bb = loop->head->next_bb;
-	      loop->head->next_bb->prev_bb = loop->head->prev_bb;
+  if (!JUMP_P (insn) || recog_memoized (insn) != CODE_FOR_loop_end)
+    return NULL_RTX;
 
-	      loop->head->prev_bb = start_prev_bb;
-	      loop->head->next_bb = start_bb;
-	      start_prev_bb->next_bb = start_bb->prev_bb = loop->head;
-	      break;
-	    }
-	}
-      loops = loops->next;
-    }
-  
-  FOR_EACH_BB (bb)
-    {
-      if (bb->next_bb != EXIT_BLOCK_PTR)
-	bb->aux = bb->next_bb;
-      else
-	bb->aux = NULL;
-    }
-  cfg_layout_finalize ();
-  df_analyze ();
+  pat = PATTERN (insn);
+  reg = SET_DEST (XVECEXP (PATTERN (insn), 0, 1));
+  if (!REG_P (reg))
+    return NULL_RTX;
+  return reg;
 }
 
+static struct hw_doloop_hooks bfin_doloop_hooks =
+{
+  hwloop_pattern_reg,
+  hwloop_after_reorder,
+  hwloop_optimize,
+  hwloop_fail
+};
+
 /* Run from machine_dependent_reorg, this pass looks for doloop_end insns
    and tries to rewrite the RTL of these loops so that proper Blackfin
    hardware loops are generated.  */
@@ -4410,62 +3879,7 @@  bfin_reorder_loops (loop_info loops, FIL
 static void
 bfin_reorg_loops (FILE *dump_file)
 {
-  loop_info loops = NULL;
-  loop_info loop;
-  basic_block bb;
-  bitmap_obstack stack;
-
-  bitmap_obstack_initialize (&stack);
-
-  if (dump_file)
-    fprintf (dump_file, ";; Find loops, first pass\n\n");
-
-  loops = bfin_discover_loops (&stack, dump_file);
-
-  if (dump_file)
-    bfin_dump_loops (loops);
-
-  bfin_reorder_loops (loops, dump_file);
-  free_loops (loops);
-
-  if (dump_file)
-    fprintf (dump_file, ";; Find loops, second pass\n\n");
-
-  loops = bfin_discover_loops (&stack, dump_file);
-  if (dump_file)
-    {
-      fprintf (dump_file, ";; All loops found:\n\n");
-      bfin_dump_loops (loops);
-    }
-
-  /* Now apply the optimizations.  */
-  for (loop = loops; loop; loop = loop->next)
-    bfin_optimize_loop (loop);
-
-  if (dump_file)
-    {
-      fprintf (dump_file, ";; After hardware loops optimization:\n\n");
-      bfin_dump_loops (loops);
-    }
-
-  free_loops (loops);
-
-  if (dump_file)
-    print_rtl (dump_file, get_insns ());
-
-  FOR_EACH_BB (bb)
-    bb->aux = NULL;
-
-  splitting_loops = 1;
-  FOR_EACH_BB (bb)
-    {
-      rtx insn = BB_END (bb);
-      if (!JUMP_P (insn))
-	continue;
-
-      try_split (PATTERN (insn), insn, 1);
-    }
-  splitting_loops = 0;
+  reorg_loops (true, &bfin_doloop_hooks);
 }
 
 /* Possibly generate a SEQUENCE out of three insns found in SLOT.
Index: gcc/config/bfin/bfin.md
===================================================================
--- gcc/config/bfin/bfin.md	(revision 175202)
+++ gcc/config/bfin/bfin.md	(working copy)
@@ -1989,7 +1989,7 @@  (define_split
 	      (const_int -1)))
    (unspec [(const_int 0)] UNSPEC_LSETUP_END)
    (clobber (match_scratch:SI 2 "=&r"))]
-  "splitting_loops"
+  "memory_operand (operands[0], SImode) || splitting_loops"
   [(set (match_dup 2) (match_dup 0))
    (set (match_dup 2) (plus:SI (match_dup 2) (const_int -1)))
    (set (match_dup 0) (match_dup 2))