
Ping / update: Re: RFA: hookize ADJUST_INSN_LENGTH

Message ID 20121122175417.1bzpdydqo8w8wo8g-nzlynne@webmail.spamcop.net
State New
Headers show

Commit Message

Joern Rennecke Nov. 22, 2012, 10:54 p.m. UTC
This uses the same interface as my previous patch:
http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00473.html ,
but I refined the algorithm for the get_insn_variants
mechanism to work properly with the reworked ARC port -
http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01891.html -
the only user so far, and added some documentation.

Bootstrapped in r193731 on i686-pc-linux-gnu.

As mentioned before, I can adjust the config part of the previous
hookization approach to convert the existing ADJUST_INSN_LENGTH targets
to use TARGET_ADJUST_INSN_LENGTH if that is desired.
2012-11-13  Joern Rennecke  <joern.rennecke@embecosm.com>

	* doc/tm.texi.in (@hook TARGET_ADJUST_INSN_LENGTH): Add.
	(@hook TARGET_INSN_LENGTH_PARAMETERS): Add.
	* doc/tm.texi: Regenerate.
	* final.c (get_attr_length_1): Assert HAVE_ATTR_length.
	Use targetm.adjust_insn_length instead of ADJUST_INSN_LENGTH.
	(shorten_branches_context_t): New typedef.
	(adjust_length): New function.
	(shorten_branches): Use adjust_length instead of ADJUST_INSN_LENGTH.
	Try to satisfy alignment by using variable length instructions.
	* target.def (adjust_insn_length, insn_length_parameters): New hooks.
	* target.h (insn_length_variant_t, insn_length_parameters_t): New types.
	* targhooks.c (default_adjust_insn_length): New function.
	* targhooks.h (default_adjust_insn_length): Declare.
	* genattrtab.c (make_length_attrs): Generate an insn_current_length
	function that is also valid for prima facie invariant length
	instructions.

Comments

Richard Sandiford Nov. 24, 2012, 9:35 a.m. UTC | #1
[responding because you kept me on cc:]

Joern Rennecke <joern.rennecke@embecosm.com> writes:
> This uses the same interface as my previous patch:
> http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00473.html ,
> but I refined the algorithm for the get_insn_variants
> mechanism to work properly with the reworked ARC port -
> http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01891.html -
> the only user so far, and added some documentation.
>
> Bootstrapped in r193731 on i686-pc-linux-gnu .

In http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00029.html
the conversation was roughly:

   > Here's a very general (and rather complicated) interface,
   I wasn't after something as complicated as that.  I just
   wanted to model alignment.

and it looks like this patch implements the very general and rather
complicated interface. :-)  Obviously you're completely free to do that,
but it means that any comments from me are going to be the same (and as
unhelpful and unproductive) as last time.  So I still think I should
bow out of this one.

In case that seems unfair, here's a summary of my concerns:

1) As Richard B says, having "locked lengths" with the comment "care must
   be taken to avoid cycles" doesn't sound like good design.  So the
   question was: without this, why would the length be going up and down
   "arbitrarily", even though we're trying to reach a fixed point?
   And the good answer seemed to be: because the backend wants to grow
   some instructions in order to give a particular (mis)alignment to
   later ones.  There's no inherent cycle problem with that, because
   although the length of an instruction can decrease when the padding
   it previously contained is no longer needed, the instruction addresses
   themselves still only move in one direction.  We don't need a locked
   length for this case.

   That kind of alignment optimisation sounds like a useful thing to have,
   but it should be modelled explicitly.  The new patch does model it
   explicitly but still keeps the "locked length: to avoid cycles" thing
   as well.

2) Another reason the lengths could go up and down "arbitrarily" is because
   the backend wants to perform if-conversion, predication, call-it-what-
   you-will on the fly during shorten_branches, and can therefore delete
   or restore instructions relative to previous iterations.  I'm still
   not convinced that's something we should support.

3) As mentioned previously, I don't think ADJUST_LENGTH should be
   aware of ADJUST_LENGTH calls for other instructions.  The way that
   the ARC port "simulates" instructions between the insn passed in the
   previous and current calls doesn't seem right.

4) The patch provides a complex interface and allows for complex decisions
   to be made, but the results of those decisions aren't stored in the rtl.
   That's not a problem for ARC because ARC keeps an on-the-side FSM
   to track these things.  I don't think we should encourage or require
   that though.

FWIW, one interface that would deal with the alignment side of things is:

    unsigned HOST_WIDE_INT preferred_insn_alignment (rtx insn);

which returns an instruction's preferred alignment, and which could
(if useful) be called for SEQUENCEs and/or the insns inside them.

    rtx longer_insn_form (rtx insn, unsigned HOST_WIDE_INT current_length,
                          unsigned HOST_WIDE_INT target_length);

which returns a pattern that would turn INSN from being CURRENT_LENGTH
to TARGET_LENGTH bytes in length, or null if no such pattern exists.

shorten_branches would only use these hooks when optimising the insns
for speed rather than size (the posted patch doesn't seem to account
for that).  It could record the original pattern internally so that
it can revert to that pattern if the extra padding is no longer needed.
longer_insn_form could be used for label alignment as well as
instruction alignment.

There's the potential for a third hook to return the possible target
lengths for a given instruction, a bit like the one in the patch,
so that alignment could be distributed over several instructions.
I think that's an unnecessary complication at this stage though.
(I don't think the original ARC submission supported it.)

But like I say, that's all just for the record.

Richard
Joern Rennecke Nov. 24, 2012, 11:34 a.m. UTC | #2
Quoting Richard Sandiford <rdsandiford@googlemail.com>:

> [responding because you kept me on cc:]
>
> Joern Rennecke <joern.rennecke@embecosm.com> writes:
>> This uses the same interface as my previous patch:
>> http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00473.html ,
>> but I refined the algorithm for the get_insn_variants
>> mechanism to work properly with the reworked ARC port -
>> http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01891.html -
>> the only user so far, and added some documentation.
>>
>> Bootstrapped in r193731 on i686-pc-linux-gnu .
>
> In http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00029.html
> the conversation was roughly:
>
>    > Here's a very general (and rather complicated) interface,
>    I wasn't after something as complicated as that.  I just
>    wanted to model alignment.
>
> and it looks like this patch implements the very general and rather
> complicated interface. :-)

For the port writer, the complexities exposed are proportional to the
complexities of the port that (s)he wants to describe.

>  Obviously you're completely free to do that,
> but it means that any comments from me are going to be the same (and as
> unhelpful and unproductive) as last time.  So I still think I should
> bow out of this one.
>
> In case that seems unfair, here's a summary of my concerns:
>
> 1) As Richard B says, having "locked lengths" with the comment "care must
>    be taken to avoid cycles" doesn't sound like good design.  So the
>    question was: without this, why would the length be going up and down
>    "arbitrarily", even though we're trying to reach a fixed point?
>    And the good answer seemed to be: because the backend wants to grow
>    some instructions in order to give a particular (mis)alignment to
>    later ones.  There's no inherent cycle problem with that, because
>    although the length of an instruction can decrease when the padding
>    it previously contained is no longer needed, the instruction addresses
>    themselves still only move in one direction.  We don't need a locked
>    length for this case.

I've seen cycles between different alignment requirements for different
instructions and branch lengths that were borderline.
The new approach to stabilize on instruction variants that are available
should give a saner convergence, but I was afraid there might be more complex
cycles that remain.  Well, I'll try to remove the lock_length logic and see
what happens.  But obviously I can't test with all possible code that could
be compiled - I just do a test build of an elf32 arc-linux-uclibc toolchain,
including libraries and Linux kernels.  It could happen that that succeeds,
but that we find a few months later that something else causes cycles.

> 2) Another reason the lengths could go up and down "arbitrarily" is because
>    the backend wants to perform if-conversion, predication, call-it-what-
>    you-will on the fly during shorten_branches, and can therefore delete
>    or restore instructions relative to previous iterations.  I'm still
>    not convinced that's something we should support.
>
> 3) As mentioned previously, I don't think ADJUST_LENGTH should be
>    aware of ADJUST_LENGTH calls for other instructions.  The way that
>    the ARC port "simulates" instructions between the insn passed in the
>    previous and current calls doesn't seem right.

I have removed these aspects of the ARC port, as you can see in:
http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01891.html

The new approach is to do an arc-specific if-conversion based on the ccfsm
action, and run this pass at a few strategic places, so that branch_shortening
sees these effects in the rtl.

> 4) The patch provides a complex interface and allows for complex decisions
>    to be made, but the results of those decisions aren't stored in the rtl.
>    That's not a problem for ARC because ARC keeps an on-the-side FSM
>    to track these things.  I don't think we should encourage or require
>    that though.

As stated above, the arc port no longer uses the FSM during branch shortening.
The interface needs to be complex because the target instruction selection
has to take complex circumstances into account.

> FWIW, one interface that would deal with the alignment side of things is:
>
>     unsigned HOST_WIDE_INT preferred_insn_alignment (rtx insn);
>
> which returns an instruction's preferred alignment, and which could
> (if useful) be called for SEQUENCEs and/or the insns inside them.
>

No, that doesn't make sense, because alignment cost differentials depend
on instruction length.
As length changes during branch shortening, you'd get different alignment
requirements, and thus infinite cycles.

>     rtx longer_insn_form (rtx insn, unsigned HOST_WIDE_INT current_length,
>                           unsigned HOST_WIDE_INT target_length);
>
> which returns a pattern that would turn INSN from being CURRENT_LENGTH
> to TARGET_LENGTH bytes in length, or null if no such pattern exists.

And again you would ignore the cost and alignment interactions.

> shorten_branches would only use these hooks when optimising the insns
> for speed rather than size (the posted patch doesn't seem to account
> for that).  It could record the original pattern internally so that
> it can revert to that pattern if the extra padding is no longer needed.

That isn't quite right either, because we are supposed to do
speed optimizations as long as they don't hurt size.
So lengthening an insn to avoid inserting a nop in front of an alignment
would be nice.

> longer_insn_form could be used for label alignment as well as
> instruction alignment.
>
> There's the potential for a third hook to return the possible target
> lengths for a given instruction, a bit like the one in the patch,
> so that alignment could be distributed over several instructions.
> I think that's an unnecessary complication at this stage though.
> (I don't think the original ARC submission supported it.)

Indeed, the focus for the ARC port is on (mis)alignment in terms of
an instruction address mod 4 being either 0 or 2.  This can be achieved
by upsizing a single instruction.
It would be desirable to also have the ability to do cache-line aware
optimizations: e.g. when a hot path branches to a bunch of long
instructions, we'd prefer there to be no cache line boundary right after
or inside the first instruction.
The interface I've designed does support that; the current iteration of
the code does not.

Patch

Index: doc/tm.texi
===================================================================
--- doc/tm.texi	(revision 2691)
+++ doc/tm.texi	(working copy)
@@ -11365,3 +11365,15 @@  @deftypefn {Target Hook} {unsigned HOST_
 @deftypevr {Target Hook} {unsigned char} TARGET_ATOMIC_TEST_AND_SET_TRUEVAL
 This value should be set if the result written by @code{atomic_test_and_set} is not exactly 1, i.e. the @code{bool} @code{true}.
 @end deftypevr
+
+@deftypefn {Target Hook} int TARGET_ADJUST_INSN_LENGTH (rtx @var{insn}, int @var{length}, bool @var{in_delay_sequence})
+Return an adjusted length for @var{insn}.  @var{length} is the value that has been calculated using the @code{length} instruction attribute.  @var{in_delay_sequence} indicates whether @var{insn} forms part of a delay sequence.  The default implementation uses @code{ADJUST_INSN_LENGTH}, if defined.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_INSN_LENGTH_PARAMETERS (insn_length_parameters_t *@var{insn_length_parameters})
+Fill in values used for branch shortening.  The type  @code{insn_length_parameters_t} is defined in @file{target-def.h}.  The main feature is the @code{get_variants} function. @smallexample
+int (*get_variants) (rtx insn, int length, bool sequence_p, bool target_p,
+		     insn_length_variant_t *variants)
+@end smallexample
+ For instructions where the ordinary monotonic branch shortening is insufficient to describe the alternatives, e.g. because there is alignment involved, get_variants can provide two or more variants for the instruction.  The return value is the number of variants filled in, which must never exceed the number filled in by @code{insn_length_parameters} in the @var{max_variants} field.  The set of variants filled in for any given instruction should not vary during branch shortening, but rather unavailable variants should be flagged with a @samp{false} @var{enabled} field.  This allows @code{shorten_branches} to keep track of variants that have ever been disabled in a previous iteration and keep them disabled, so as to avoid infinite looping inside @code{shorten_branches}.  The @var{length} parameter provides the length calculated previously from attributes.  @var{sequence_p} indicates if the instruction presented is inside a @code{SEQUENCE}.  Note, if you make @code{get_variants} provide variants for an entire @code{SEQUENCE}, the @code{SEQUENCE} will henceforth be handled as a single entity for branch shortening.  @var{target_p} indicates if the instruction is considered the target of a branch, function call, or function return.
+@end deftypefn
Index: doc/tm.texi.in
===================================================================
--- doc/tm.texi.in	(revision 2691)
+++ doc/tm.texi.in	(working copy)
@@ -11205,3 +11205,7 @@  @hook TARGET_MEMMODEL_CHECK
 @end deftypefn
 
 @hook TARGET_ATOMIC_TEST_AND_SET_TRUEVAL
+
+@hook TARGET_ADJUST_INSN_LENGTH
+
+@hook TARGET_INSN_LENGTH_PARAMETERS
Index: final.c
===================================================================
--- final.c	(revision 2691)
+++ final.c	(working copy)
@@ -82,6 +82,7 @@  Software Foundation; either version 3, o
 #include "cfgloop.h"
 #include "params.h"
 #include "tree-pretty-print.h" /* for dump_function_header */
+#include "sbitmap.h"
 
 #ifdef XCOFF_DEBUGGING_INFO
 #include "xcoffout.h"		/* Needed for external data
@@ -377,8 +378,7 @@  get_attr_length_1 (rtx insn, int (*fallb
   int i;
   int length = 0;
 
-  if (!HAVE_ATTR_length)
-    return 0;
+  gcc_assert (HAVE_ATTR_length);
 
   if (insn_lengths_max_uid > INSN_UID (insn))
     return insn_lengths[INSN_UID (insn)];
@@ -424,10 +424,7 @@  get_attr_length_1 (rtx insn, int (*fallb
 	break;
       }
 
-#ifdef ADJUST_INSN_LENGTH
-  ADJUST_INSN_LENGTH (insn, length);
-#endif
-  return length;
+  return targetm.adjust_insn_length (insn, length, false);
 }
 
 /* Obtain the current length of an insn.  If branch shortening has been done,
@@ -828,6 +825,243 @@  struct rtl_opt_pass pass_compute_alignme
 };
 
 
+/* Context to pass from shorten_branches to adjust_length.  */
+typedef struct {
+  /* For each varying length insn, a length to keep across iterations, to
+     avoid cycles.  */
+  int *uid_lock_length;
+  /* Number of iterations since last lock_length change.  */
+  int niter;
+  /* Values obtained from targetm.insn_length_parameters call.  */
+  insn_length_parameters_t parameters;
+  /* A scratch space to hold all the variants of the insn currently being
+     considered.  */
+  insn_length_variant_t *variants;
+  /* For each variant number, an sbitmap that says if that variant is still
+     being considered for this shuid.  */
+  sbitmap *shuid_variants;
+  /* indexed by shuid, alignment offset required at end of insn.  */
+  signed char *request_align;
+  /* Indexed by uid, indicates if and how an insn length varies - see
+     varying_length variable in shorten_branches.  */
+  char *varying_length;
+  /* uid of last insn that provides a choice of lengths.  */
+  int last_aligning_insn;
+  bool something_changed;
+} shorten_branches_context_t;
+
+/* NEW_LENGTH is the length for INSN that has been computed by evaluating
+   the raw instruction length attribute.  Return an adjusted length by
+   evaluating the adjust_insn_length target hook and the get_variants hook
+   in parameters.
+   If INSN is inside a SEQUENCE, then SEQ is that SEQUENCE.
+   TARGET_P indicates if INSN is the target of a call, branch, or call
+   return.  */
+
+static int
+adjust_length (rtx insn, int new_length, bool seq_p,
+	       shorten_branches_context_t *ctx, bool target_p)
+{
+  int uid = INSN_UID (insn);
+  new_length = targetm.adjust_insn_length (insn, new_length, seq_p);
+    /* If the sequence as a whole is subject to variant selection, don't
+       try it for the constituent instructions too; at best, it'd be a waste
+       of time; at worst, it leads to chaos when trying to align the sequence
+       by tweaking a constituent instruction.
+       Also, in general, if we have previously established that
+       variant selection doesn't apply, stick with that decision.  */
+  bool select_variant = (ctx->varying_length[uid] & 4);
+  int n_variants;
+
+  if (new_length < 0)
+    fatal_insn ("negative insn length", insn);
+  int iter_threshold = 0;
+  memset (ctx->variants, 0,
+	  sizeof *ctx->variants * ctx->parameters.max_variants);
+  if (select_variant
+      && (n_variants
+	  = ctx->parameters.get_variants (insn, new_length, seq_p, target_p,
+					  ctx->variants)) != 0)
+    {
+      unsigned align_base = 1 << ctx->parameters.align_base_log;
+      unsigned align_base_mask = align_base - 1;
+      unsigned align_base_set = (1U << align_base) - 1;
+      unsigned align_unit_mask = (1 << ctx->parameters.align_unit_log) - 1;
+      gcc_assert ((insn_current_address & align_unit_mask) == 0);
+      int best_cost = new_length = INT_MAX;
+      bool can_align = false;
+      int best_need_align = -1;
+      bool best_right_align = false;
+      int shuid = uid_shuid[uid];
+
+      /* Freeze disabled variants, and find cheapest variant;
+	 with any align if we have last_aligning_insn, otherwise with
+	 actual align.
+	 See if we can provide alignment at no extra cost.  */
+      for (int i = 0; i < n_variants; i++)
+	{
+	  insn_length_variant_t *variant = &ctx->variants[i];
+	  if (bitmap_bit_p (ctx->shuid_variants[i], shuid))
+	    {
+	      if (!variant->enabled)
+		{
+		  /* We could have the variant provide an iter_threshold
+		     field here, to use instead of 0 ... if there is any
+		     point in having variants that get frozen out only
+		     after a few tries.  */
+		  if (ctx->niter >= 0)
+		    {
+		      bitmap_clear_bit (ctx->shuid_variants[i], shuid);
+		      ctx->something_changed = true;
+		      ctx->niter = 0;
+		    }
+		  continue;
+		}
+	      int need_align;
+	      unsigned align_offset
+		= insn_current_address >> ctx->parameters.align_unit_log;
+	      bool right_align = false;
+	      align_offset &= align_base_mask;
+	      if (variant->align_set == align_base_set)
+		{
+		  need_align = -1;
+		  right_align = ctx->last_aligning_insn != 0;
+		}
+	      else if ((1 << align_offset) & variant->align_set)
+		need_align = 0; /* OK.  */
+	      /* Check if adding one unit provides alignment.
+		 FIXME: this works for the current ARC port,
+		 but we should more generally search for an offset
+		 such that a variable length insn can provide it.  */
+	      else if  ((1 << ((align_offset + 1) & align_base_mask)
+			 & variant->align_set)
+			&& ctx->last_aligning_insn)
+		need_align = 1;
+	      else
+		continue;
+	      int length = variant->length;
+	      /* Add extra length needed for alignment to insn length.
+		 This should go away in the next iteration, when the
+		 desired alignment is provided.  If that doesn't happen,
+		 we have a bug, possibly giving the target a length that
+		 doesn't correspond to an available variant.  */
+	      if (need_align >= 0)
+		length += need_align << ctx->parameters.align_unit_log;
+	      /* FIXME: Add probabilistic weighting and target cost.  */
+	      int cost = (variant->fallthrough_cost + variant->branch_cost);
+	      if (ctx->request_align[shuid] >= 0 && need_align >= 0)
+		{
+		  unsigned offset = insn_current_address + length;
+		  offset >>= ctx->parameters.align_unit_log;
+		  offset = ctx->request_align[shuid] - offset;
+		  /* Fixme: might want to apply smaller mask if alignment
+		     requirement has matching bitmask.  */
+		  offset &= align_base_mask;
+		  if (offset == 0)
+		    right_align = true;
+		}
+	      /* ??? if we had remembered the cost of not getting the desired
+		 alignment, we could weigh cost in this insn against cost
+		 for alignment in following insn.  */
+	      if (cost > best_cost)
+		continue;
+	      if (cost < best_cost)
+		{
+		  best_cost = cost;
+		  new_length = length;
+		  best_need_align = need_align;
+		  best_right_align = right_align;
+		  can_align = false;
+		  continue;
+		}
+	      /* FIXME: to cover the general case, we should really
+		 build a bitmap of the offsets that we can manage for
+		 alignment purposes.  */
+	      if (((length - new_length)
+		   & (align_base_mask << ctx->parameters.align_unit_log))
+		  && new_length <= ctx->uid_lock_length[uid]
+		  && length <= ctx->uid_lock_length[uid])
+		can_align = true;
+	      if (length < new_length && (right_align || !best_right_align))
+		{
+		  new_length = length;
+		  best_need_align = need_align;
+		  best_right_align = right_align;
+		}
+	      else if ((length == new_length
+			&& need_align < best_need_align
+			&& (right_align || !best_right_align))
+		       || (right_align && !best_right_align))
+		{
+		  new_length = length;
+		  best_need_align = need_align;
+		  best_right_align = right_align;
+		}
+	    }
+	}
+      gcc_assert (best_cost < INT_MAX);
+      if (best_need_align >= 0 && ctx->last_aligning_insn)
+	{
+	  if (best_need_align != 0)
+	    gcc_assert (ctx->something_changed);
+
+	  int aluid = ctx->last_aligning_insn;
+	  unsigned offset = INSN_ADDRESSES (aluid) + insn_lengths[aluid];
+	  offset >>= ctx->parameters.align_unit_log;
+	  offset += best_need_align;
+	  ctx->request_align[uid_shuid[aluid]] = (offset & align_base_mask);
+	  ctx->last_aligning_insn = 0;
+	}
+      else
+	gcc_assert (best_need_align <= 0);
+      if (can_align)
+	{
+	  if (ctx->request_align[shuid] >= 0)
+	    {
+	      unsigned offset = insn_current_address + new_length;
+	      offset >>= ctx->parameters.align_unit_log;
+	      offset = ctx->request_align[shuid] - offset;
+	      /* Fixme: might want to apply smaller mask if alignment
+		 requirement has matching bitmask.  */
+	      offset &= align_base_mask;
+	      offset <<= ctx->parameters.align_unit_log;
+	      new_length += offset;
+	    }
+	  ctx->last_aligning_insn = uid;
+	}
+      else if (ctx->request_align[shuid] >= 0)
+	{
+	  ctx->something_changed = true;
+	  ctx->request_align[shuid] = -1;
+	}
+      /* Since we are freezing out variants that have been disabled once,
+	 cycles should generally not happen, so bump up iter_threshold.  */
+      iter_threshold = 7;
+    }
+  /* Stabilizing by length should generally happen for entire delay slot
+     sequences, so doing it inside should be a last resort.  */
+  if (seq_p)
+    iter_threshold = 9;
+  if (new_length != insn_lengths[uid])
+    {
+      if (new_length < ctx->uid_lock_length[uid])
+	new_length = ctx->uid_lock_length[uid];
+      if (new_length == insn_lengths[uid])
+	; /* done here.  */
+      else if (ctx->niter < iter_threshold
+	       || new_length > insn_lengths[uid])
+	{
+	  if (ctx->niter >= iter_threshold)
+	    ctx->uid_lock_length[uid] = new_length, ctx->niter = 0;
+	  insn_lengths[uid] = new_length;
+	  ctx->something_changed = true;
+	}
+      else
+	new_length = insn_lengths[uid];
+    }
+  return new_length;
+}
+
 /* Make a pass over all insns and compute their actual lengths by shortening
    any branches of variable length if possible.  */
 
@@ -849,7 +1083,10 @@  shorten_branches (rtx first)
   int max_skip;
 #define MAX_CODE_ALIGN 16
   rtx seq;
-  int something_changed = 1;
+  /* Indexed by uid, indicates if and how an insn length varies.
+     Bit 0: varying length instruction
+     Bit 1: aligned label
+     Bit 2: apply instruction variant selection.  */
   char *varying_length;
   rtx body;
   int uid;
@@ -899,7 +1136,14 @@  shorten_branches (rtx first)
 
       INSN_SHUID (insn) = i++;
       if (INSN_P (insn))
-	continue;
+	{
+	  if (GET_CODE (PATTERN (insn)) == SEQUENCE)
+	    {
+	      insn = XVECEXP (PATTERN (insn), 0, 0);
+	      INSN_SHUID (insn) = i++;
+	    }
+	  continue;
+	}
 
       if (LABEL_P (insn))
 	{
@@ -964,14 +1208,39 @@  shorten_branches (rtx first)
   if (!HAVE_ATTR_length)
     return;
 
+  int max_shuid = INSN_SHUID (get_last_insn ()) + 1;
+
+  shorten_branches_context_t ctx;
+  insn_length_parameters_t *parameters = &ctx.parameters;
+  memset (parameters, 0, sizeof *parameters);
+  ctx.variants = 0;
+  if (targetm.insn_length_parameters)
+    {
+      targetm.insn_length_parameters (parameters);
+      ctx.variants = XCNEWVEC (insn_length_variant_t, parameters->max_variants);
+    }
+  unsigned align_base = 1 << parameters->align_base_log;
+  ctx.shuid_variants
+    = sbitmap_vector_alloc (parameters->max_variants, max_shuid);
+  bitmap_vector_ones (ctx.shuid_variants, parameters->max_variants);
+
+  ctx.request_align = 0;
+  if (parameters->align_base_log)
+    {
+      ctx.request_align = XNEWVEC (signed char, max_shuid);
+      gcc_assert ((1 << parameters->align_base_log) - 1 <= SCHAR_MAX);
+      memset (ctx.request_align, -1, max_shuid);
+    }
+
   /* Allocate the rest of the arrays.  */
-  insn_lengths = XNEWVEC (int, max_uid);
+  insn_lengths = XCNEWVEC (int, max_uid);
+  ctx.uid_lock_length = XCNEWVEC (int, max_uid);
   insn_lengths_max_uid = max_uid;
   /* Syntax errors can lead to labels being outside of the main insn stream.
      Initialize insn_addresses, so that we get reproducible results.  */
   INSN_ADDRESSES_ALLOC (max_uid);
 
-  varying_length = XCNEWVEC (char, max_uid);
+  ctx.varying_length = varying_length = XCNEWVEC (char, max_uid);
 
   /* Initialize uid_align.  We scan instructions
      from end to start, and keep in align_tab[n] the last seen insn
@@ -1011,7 +1280,6 @@  shorten_branches (rtx first)
          label fields.  */
 
       int min_shuid = INSN_SHUID (get_insns ()) - 1;
-      int max_shuid = INSN_SHUID (get_last_insn ()) + 1;
       int rel;
 
       for (insn = first; insn != 0; insn = NEXT_INSN (insn))
@@ -1066,15 +1334,17 @@  shorten_branches (rtx first)
 
   /* Compute initial lengths, addresses, and varying flags for each insn.  */
   int (*length_fun) (rtx) = increasing ? insn_min_length : insn_default_length;
+  ctx.niter = increasing ? 0 : -1;
+  ctx.last_aligning_insn = 0;
+  gcc_assert (INSN_UID (get_insns ()) > ctx.last_aligning_insn);
 
+  bool target_p = true;
   for (insn_current_address = 0, insn = first;
        insn != 0;
        insn_current_address += insn_lengths[uid], insn = NEXT_INSN (insn))
     {
       uid = INSN_UID (insn);
 
-      insn_lengths[uid] = 0;
-
       if (LABEL_P (insn))
 	{
 	  int log = LABEL_TO_ALIGNMENT (insn);
@@ -1084,16 +1354,19 @@  shorten_branches (rtx first)
 	      int new_address = (insn_current_address + align - 1) & -align;
 	      insn_lengths[uid] = new_address - insn_current_address;
 	    }
+	  target_p = true;
 	}
 
       INSN_ADDRESSES (uid) = insn_current_address + insn_lengths[uid];
 
       if (NOTE_P (insn) || BARRIER_P (insn)
-	  || LABEL_P (insn) || DEBUG_INSN_P(insn))
+	  || LABEL_P (insn) || DEBUG_INSN_P (insn))
 	continue;
       if (INSN_DELETED_P (insn))
 	continue;
 
+      int length = 0;
+
       body = PATTERN (insn);
       if (GET_CODE (body) == ADDR_VEC || GET_CODE (body) == ADDR_DIFF_VEC)
 	{
@@ -1101,13 +1374,15 @@  shorten_branches (rtx first)
 	     section.  */
 	  if (JUMP_TABLES_IN_TEXT_SECTION
 	      || readonly_data_section == text_section)
-	    insn_lengths[uid] = (XVECLEN (body,
-					  GET_CODE (body) == ADDR_DIFF_VEC)
-				 * GET_MODE_SIZE (GET_MODE (body)));
+	    length = (XVECLEN (body, GET_CODE (body) == ADDR_DIFF_VEC)
+		      * GET_MODE_SIZE (GET_MODE (body)));
 	  /* Alignment is handled by ADDR_VEC_ALIGN.  */
 	}
       else if (GET_CODE (body) == ASM_INPUT || asm_noperands (body) >= 0)
-	insn_lengths[uid] = asm_insn_count (body) * insn_default_length (insn);
+	{
+	  length = asm_insn_count (body) * insn_default_length (insn);
+	  target_p = false;
+	}
       else if (GET_CODE (body) == SEQUENCE)
 	{
 	  int i;
@@ -1135,50 +1410,81 @@  shorten_branches (rtx first)
 	      else
 		inner_length = inner_length_fun (inner_insn);
 
-	      insn_lengths[inner_uid] = inner_length;
+	      inner_length = adjust_length (inner_insn, inner_length, true,
+					    &ctx, target_p);
 	      if (const_delay_slots)
 		{
-		  if ((varying_length[inner_uid]
-		       = insn_variable_length_p (inner_insn)) != 0)
-		    varying_length[uid] = 1;
+		  if (parameters->get_variants
+		      && parameters->get_variants (inner_insn, inner_length,
+						   true, target_p && i == 0,
+						   ctx.variants) > 1)
+		    varying_length[inner_uid] = 5;
+		  else
+		    varying_length[inner_uid]
+		      = insn_variable_length_p (inner_insn);
+		  varying_length[uid] |= (varying_length[inner_uid] & 1);
 		  INSN_ADDRESSES (inner_uid) = (insn_current_address
 						+ insn_lengths[uid]);
 		}
 	      else
 		varying_length[inner_uid] = 0;
-	      insn_lengths[uid] += inner_length;
+	      insn_lengths[inner_uid] = inner_length;
+	      length += inner_length;
+	    }
+	  if (parameters->get_variants
+	      && (parameters->get_variants (insn, length, false, target_p,
+					    ctx.variants)
+		  > 1))
+	    {
+	      varying_length[uid] = 5;
+	      for (i = 0; i < XVECLEN (body, 0); i++)
+		{
+		  rtx inner_insn = XVECEXP (body, 0, i);
+		  int inner_uid = INSN_UID (inner_insn);
+		  varying_length[inner_uid] &= ~4;
+		}
 	    }
+	  target_p = false;
 	}
       else if (GET_CODE (body) != USE && GET_CODE (body) != CLOBBER)
 	{
-	  insn_lengths[uid] = length_fun (insn);
-	  varying_length[uid] = insn_variable_length_p (insn);
+	  length = length_fun (insn);
+	  if (parameters->get_variants
+	      && (parameters->get_variants (insn, length, false, target_p,
+					    ctx.variants)
+		  > 1))
+	    varying_length[uid] = 5;
+	  else
+	    varying_length[uid] = insn_variable_length_p (insn);
+	  target_p = false;
 	}
-
+
       /* If needed, do any adjustment.  */
-#ifdef ADJUST_INSN_LENGTH
-      ADJUST_INSN_LENGTH (insn, insn_lengths[uid]);
-      if (insn_lengths[uid] < 0)
-	fatal_insn ("negative insn length", insn);
-#endif
+      insn_lengths[uid] = adjust_length (insn, length, false, &ctx, target_p);
+      target_p
+	= (CALL_P (body)
+	   || (GET_CODE (body) == SEQUENCE && CALL_P (XVECEXP (body, 0, 0))));
     }
 
   /* Now loop over all the insns finding varying length insns.  For each,
      get the current insn length.  If it has changed, reflect the change.
      When nothing changes for a full pass, we are done.  */
 
-  while (something_changed)
+  ctx.something_changed = true;
+  while (ctx.something_changed)
     {
-      something_changed = 0;
+      if (increasing)
+	ctx.niter++;
+
+      ctx.something_changed = false;
+      ctx.last_aligning_insn = 0;
       insn_current_align = MAX_CODE_ALIGN - 1;
+      target_p = true;
       for (insn_current_address = 0, insn = first;
 	   insn != 0;
 	   insn = NEXT_INSN (insn))
 	{
 	  int new_length;
-#ifdef ADJUST_INSN_LENGTH
-	  int tmp_length;
-#endif
 	  int length_align;
 
 	  uid = INSN_UID (insn);
@@ -1197,6 +1503,17 @@  shorten_branches (rtx first)
 	      else
 		insn_lengths[uid] = 0;
 	      INSN_ADDRESSES (uid) = insn_current_address;
+	      if (log > parameters->align_base_log && ctx.last_aligning_insn)
+		{
+		  unsigned offset = insn_lengths[uid];
+		  offset -= INSN_ADDRESSES (ctx.last_aligning_insn);
+		  offset -= insn_lengths[ctx.last_aligning_insn];
+		  offset >>= parameters->align_unit_log;
+		  offset &= align_base - 1;
+		  ctx.request_align[uid_shuid[ctx.last_aligning_insn]] = offset;
+		  ctx.last_aligning_insn = 0;
+		}
+	      target_p = true;
 	      continue;
 	    }
 
@@ -1228,6 +1545,8 @@  shorten_branches (rtx first)
 	      flags = ADDR_DIFF_VEC_FLAGS (body);
 
 	      /* Try to find a known alignment for rel_lab.  */
+	      /* FIXME: We seem to have lost the code that sets
+		 varying_length & 2 for labels without max_skip.  */
 	      for (prev = rel_lab;
 		   prev
 		   && ! insn_lengths[INSN_UID (prev)]
@@ -1314,7 +1633,7 @@  shorten_branches (rtx first)
 		    = (XVECLEN (body, 1) * GET_MODE_SIZE (GET_MODE (body)));
 		  insn_current_address += insn_lengths[uid];
 		  if (insn_lengths[uid] != old_length)
-		    something_changed = 1;
+		    ctx.something_changed = true;
 		}
 
 	      continue;
@@ -1338,9 +1657,15 @@  shorten_branches (rtx first)
 
 		      insn_current_address += insn_lengths[inner_uid];
 		    }
+		  target_p = CALL_P (XVECEXP (body, 0, 0));
 		}
 	      else
-		insn_current_address += insn_lengths[uid];
+		{
+		  insn_current_address += insn_lengths[uid];
+		  if (BARRIER_P (insn))
+		    ctx.last_aligning_insn = 0;
+		  target_p = CALL_P (insn);
+		}
 
 	      continue;
 	    }
@@ -1364,49 +1689,42 @@  shorten_branches (rtx first)
 		  if (! varying_length[inner_uid])
 		    inner_length = insn_lengths[inner_uid];
 		  else
-		    inner_length = insn_current_length (inner_insn);
-
-		  if (inner_length != insn_lengths[inner_uid])
 		    {
-		      if (!increasing || inner_length > insn_lengths[inner_uid])
-			{
-			  insn_lengths[inner_uid] = inner_length;
-			  something_changed = 1;
-			}
-		      else
-			inner_length = insn_lengths[inner_uid];
+		      inner_length = insn_current_length (inner_insn);
+
+		      inner_length
+			= adjust_length (inner_insn, inner_length, true, &ctx,
+					 target_p && i == 0);
 		    }
 		  insn_current_address += inner_length;
 		  new_length += inner_length;
 		}
+	      /* Set INSN_CURRENT_ADDRESS back to the start of INSN, so that
+		 we have the right context when we process INSN as a whole
+		 in adjust_length later.  */
+	      insn_current_address -= new_length;
 	    }
 	  else
-	    {
-	      new_length = insn_current_length (insn);
-	      insn_current_address += new_length;
-	    }
+	    new_length = insn_current_length (insn);
 
-#ifdef ADJUST_INSN_LENGTH
 	  /* If needed, do any adjustment.  */
-	  tmp_length = new_length;
-	  ADJUST_INSN_LENGTH (insn, new_length);
-	  insn_current_address += (new_length - tmp_length);
-#endif
-
-	  if (new_length != insn_lengths[uid]
-	      && (!increasing || new_length > insn_lengths[uid]))
-	    {
-	      insn_lengths[uid] = new_length;
-	      something_changed = 1;
-	    }
-	  else
-	    insn_current_address += insn_lengths[uid] - new_length;
+	  new_length = adjust_length (insn, new_length, false, &ctx, target_p);
+	  target_p = (CALL_P (insn)
+		      || (NONJUMP_INSN_P (insn)
+			  && GET_CODE (body = PATTERN (insn)) == SEQUENCE
+			  && CALL_P (XVECEXP (body, 0, 0))));
+	  insn_current_address += new_length;
 	}
       /* For a non-optimizing compile, do only a single pass.  */
       if (!increasing)
 	break;
     }
 
+  if (ctx.variants)
+    free (ctx.variants);
+  if (ctx.request_align)
+    free (ctx.request_align);
+  sbitmap_vector_free (ctx.shuid_variants);
   free (varying_length);
 }
 
Index: target.def
===================================================================
--- target.def	(revision 2691)
+++ target.def	(working copy)
@@ -2859,6 +2859,47 @@  HOOK_VECTOR_END (target_option)
  enum unwind_info_type, (void),
  default_debug_unwind_info)
 
+DEFHOOK
+(adjust_insn_length,
+ "Return an adjusted length for @var{insn}.  @var{length} is the value that\
+ has been calculated using the @code{length} instruction attribute. \
+ @var{in_delay_sequence} indicates whether @var{insn} forms part of a delay \
+ slot sequence.  The default\
+ implementation uses @code{ADJUST_INSN_LENGTH}, if defined.",
+ int, (rtx insn, int length, bool in_delay_sequence),
+ default_adjust_insn_length)
+
+DEFHOOK
+(insn_length_parameters,
+ "Fill in values used for branch shortening.  The type \
+ @code{insn_length_parameters_t} is defined in @file{target.h}. \
+ The main feature is the @code{get_variants} function. \
+@smallexample\n\
+int (*get_variants) (rtx insn, int length, bool sequence_p, bool target_p,\n\
+		     insn_length_variant_t *variants)\n\
+@end smallexample\n\
+ For instructions where the ordinary monotonic branch shortening is\
+ insufficient to describe the alternatives, e.g. because there is alignment\
+ involved, @code{get_variants} can provide two or more variants for the\
 instruction. \
+ The return value is the number of variants filled in, which must never exceed\
+ the number filled in by @code{insn_length_parameters} in the\
+ @var{max_variants} field.  The set of variants for any given instruction\
+ filled in should not vary during branch shortening, but rather unavailable\
+ variants should be flagged with a @samp{false} @var{enabled} field. \
+ This allows @code{shorten_branches} to keep track of varaints that have\
+ been ever disabled in a previous iteration and keep them disabled, so\
+ as to avoid infinite looping inside @code{shorten_branches}. \
+ The @var{length} parameter provides the length calculated previously from\
+ attributes. \
+ @var{sequence_p} indicates whether the presented instruction is inside a\
+ @code{SEQUENCE}.  Note, if you make @code{get_variants} provide variants\
+ for an entire @code{SEQUENCE}, the @code{SEQUENCE} will henceforth be handled\
+ as a single entity for branch shortening. \
+ @var{target_p} indicates whether the instruction is considered the target of \
+ a branch, function call, or function return.",
+ void, (insn_length_parameters_t *insn_length_parameters),
+ NULL)
+
 DEFHOOKPOD
 (atomic_test_and_set_trueval,
  "This value should be set if the result written by\
Index: target.h
===================================================================
--- target.h	(revision 2691)
+++ target.h	(working copy)
@@ -165,6 +165,28 @@  enum vect_cost_model_location {
   vect_epilogue = 2
 };
 
+typedef struct
+{
+  unsigned align_set;
+  /* Cost as a branch / call target or call return address.  */
+  int target_cost;
+  int fallthrough_cost;
+  int branch_cost;
+  int length;
+  /* 0 for not length sensitive, 1 for largest offset range,
+     2 for next smaller etc.  */
+  unsigned length_sensitive : 8;
+  bool enabled;
+} insn_length_variant_t;
+
+typedef struct insn_length_parameters_s
+{
+  int align_unit_log;
+  int align_base_log;
+  int max_variants;
+  int (*get_variants) (rtx, int, bool, bool, insn_length_variant_t *);
+} insn_length_parameters_t;
+
 /* The target structure.  This holds all the backend hooks.  */
 #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
 #define DEFHOOK(NAME, DOC, TYPE, PARAMS, INIT) TYPE (* NAME) PARAMS;
Index: targhooks.c
===================================================================
--- targhooks.c	(revision 2691)
+++ targhooks.c	(working copy)
@@ -1540,4 +1540,18 @@  default_member_type_forces_blk (const_tr
   return false;
 }
 
+/* Default version of adjust_insn_length.  */
+
+int
+default_adjust_insn_length (rtx insn ATTRIBUTE_UNUSED, int length,
+			    bool in_delay_sequence ATTRIBUTE_UNUSED)
+{
+#ifdef ADJUST_INSN_LENGTH
+  if (!in_delay_sequence)
+    ADJUST_INSN_LENGTH (insn, length);
+#endif
+  return length;
+}
+
+
 #include "gt-targhooks.h"
Index: targhooks.h
===================================================================
--- targhooks.h	(revision 2691)
+++ targhooks.h	(working copy)
@@ -193,3 +193,4 @@  extern const char *default_pch_valid_p (
 extern void default_asm_output_ident_directive (const char*);
 
 extern bool default_member_type_forces_blk (const_tree, enum machine_mode);
+extern int default_adjust_insn_length (rtx, int, bool);
Index: genattrtab.c
===================================================================
--- genattrtab.c	(revision 2691)
+++ genattrtab.c	(working copy)
@@ -1551,7 +1551,7 @@  make_length_attrs (void)
       "*insn_current_length"
     };
   static rtx (*const no_address_fn[]) (rtx)
-    = {identity_fn,identity_fn, zero_fn, zero_fn};
+    = {identity_fn,identity_fn, zero_fn, identity_fn};
   static rtx (*const address_fn[]) (rtx)
     = {max_fn, min_fn, one_fn, identity_fn};
   size_t i;