
RFA: hookize ADJUST_INSN_LENGTH (Was: RFA: Add lock_length attribute to support the ARC port)

Message ID 20121028131312.poycwzqkg0kc4gc8-nzlynne@webmail.spamcop.net
State New

Commit Message

Joern Rennecke Oct. 28, 2012, 5:13 p.m. UTC
Quoting Richard Biener <richard.guenther@gmail.com>:

>> Thus, you can allow the length to vary downwards as well as upwards
>> across iterations with suitable definitions of the @code{length} attribute
>> and/or @code{ADJUST_INSN_LENGTH}.  Care has to be taken that this does not
>> lead to infinite loops.
>
> I don't see that you can shrink length with just suitable lock_length and
> length attributes.

I disagree there (for certain values of 'just'), but we can just agree
to disagree on this point because...

>  What seems to be the crucial difference is that you
> apply ADJUST_INSN_LENGTH after combining lock-length-max and length.
> But then you _decrease_ length with ADJUST_INSN_LENGTH ...
>
> Maybe what you really want is ADJUST_INSN_LENGTH_AFTER which is
> applied afterwards?  Thus,

Well, actually, I found a number of problems with ADJUST_INSN_LENGTH:
- it is not applied to inner insns in a delay slot sequence.  You can
   sort of work around this by stitching recursive calls together, but
   then you are faced with another problem:
- You don't know what the length prior to ADJUST_INSN_LENGTH was.  That
   was even worse with the non-unique uids where you didn't know squat
   about the instructions in the delay slot.
- As you said, I want the adjustment to happen after the maximum calculation.
   Well, usually.  There are actually special cases where it would be useful
   to have some sort of maximum calculation in place, to break up
   alignment-driven cycles, but only applicable with a lesser priority than
   for the range-driven branch offset calculations.

But adding yet another macro neither solves all these problems, nor
would it align with our goal to move away from target macros.

I have now found an alternative way to make the ARC port terminate building its
insn-attrtab.c - it involves using match_test("get_attr_length (insn) == 2")
instead of eq_attr("[lock_]length" (const_int 2)) - where I really want to
know if the instruction was considered short in the previous iteration.

So, I have made a patch to replace the ADJUST_INSN_LENGTH macro in final.c
with an adjust_insn_length hook, for which a default implementation
using ADJUST_INSN_LENGTH is provided in targhooks.c to provide for an
easy transition.
I've looked at the existing ports that use ADJUST_INSN_LENGTH, and some
seem to prefer to return an adjustment to be added to the length, while
others prefer to return the new length.  The latter seemed to be slightly
more numerous, so I went with that.

The hook has two new parameters:
- a flag that tells it if the insn in question is part of a delay slot
   sequence.  The default hook implementation skips the invocation of
   ADJUST_INSN_LENGTH in this case, for the sake of compatibility.
- a pointer to an int that can be set to the number of iterations before
   the length locking feature is applied to this instruction's length when
   the increasing-length algorithm is used.  The pointed-to value is
   initialized to zero, which means that length locking is always applied
   (assuming final.c uses the increasing length algorithm).  Setting it to
   a higher number effectively gives the new instruction length a lower
   priority for being put into uid_lock_length.
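
To make the mechanics of the two parameters concrete, here is a minimal
sketch (not part of the patch) of how a port could define the hook; the
two-byte adjustment and the threshold value are invented purely for
illustration:

/* Illustrative sketch only; the rule it implements is made up.  */
static int
example_adjust_insn_length (rtx insn ATTRIBUTE_UNUSED, int length,
                            bool in_delay_sequence, int *iteration_threshold)
{
  /* Invented rule: pretend every insn inside a delay slot sequence costs
     two extra bytes on this fictitious target.  */
  if (in_delay_sequence)
    length += 2;

  /* Let this adjustment oscillate for two iterations before the length
     locking logic in shorten_branches pins it down.  */
  *iteration_threshold = 2;

  return length;  /* The hook returns the new length, not a delta.  */
}

#undef TARGET_ADJUST_INSN_LENGTH
#define TARGET_ADJUST_INSN_LENGTH example_adjust_insn_length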

> Note that
>
>> Care has to be taken that this does not
>> lead to infinite loops.
>
> doesn't convince me that is properly designed ;)

With the hook mechanism, it is much harder to create an infinite loop
inside shorten_branches.  (It would involve something like setting
  iteration_threshold to MAX_INT and making the length decrease when niter
  is at MAX_INT, then letting integer overflow of niter take its course.
  Making it impossible for a port maintainer to get things wrong is not a
  meaningful goal for GCC, but making it straightforward to get it right is.)

This patch builds on:
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02527.html

Bootstrapped in revision 192879 on i686-pc-linux-gnu.
Tested with config-list.mk on x86_64-unknown-linux-gnu.
2012-10-28  Joern Rennecke  <joern.rennecke@embecosm.com>

        * doc/tm.texi.in (@hook TARGET_ADJUST_INSN_LENGTH): Add.
        * doc/tm.texi: Regenerate. 
        * final.c (get_attr_length_1): Use targetm.adjust_insn_length
        instead of ADJUST_INSN_LENGTH.
        (adjust_length): New function.
        (shorten_branches): Use adjust_length instead of ADJUST_INSN_LENGTH.
        * target.def (adjust_insn_length): New hook.
        * targhooks.c (default_adjust_insn_length): New function.
        * targhooks.h (default_adjust_insn_length): Declare.

Comments

Richard Biener Oct. 30, 2012, 2:59 p.m. UTC | #1
On Sun, Oct 28, 2012 at 6:13 PM, Joern Rennecke
<joern.rennecke@embecosm.com> wrote:
> Quoting Richard Biener <richard.guenther@gmail.com>:
>
>>> Thus, you can allow the length to vary downwards as well as upwards
>>> across iterations with suitable definitions of the @code{length}
>>> attribute
>>> and/or @code{ADJUST_INSN_LENGTH}.  Care has to be taken that this does
>>> not
>>> lead to infinite loops.
>>
>>
>> I don't see that you can shrink length with just suitable lock_length and
>> length attributes.
>
>
> I disagree there (for certain values of 'just'), but we can just agree
> to disagree on this point because...
>
>>  What seems to be the crucial difference is that you
>> apply ADJUST_INSN_LENGTH after combining lock-length-max and length.
>> But then you _decrease_ length with ADJUST_INSN_LENGTH ...
>>
>> Maybe what you really want is ADJUST_INSN_LENGTH_AFTER which is
>> applied afterwards?  Thus,
>
>
> Well, actually, I found a number of problems with ADJUST_INSN_LENGTH:
> - it is not applied to inner insns in a delay slot sequence.  You can
>   sort of work around this by stitching recursive calls together, but
>   then you are faced with another problem:
> - You don't know what the length prior to ADJUST_INSN_LENGTH was.  That
>   was even worse with the non-unique uids where you didn't know squat
>   about the instructions in the delay slot.
> - As you said, I want the adjustment to happen after the maximum calculation.
>   Well, usually.  There are actually special cases where it would be useful
>   to have some sort of maximum calculation in place, to break up
>   alignment-driven cycles, but only applicable with a lesser priority than
>   for the range-driven branch offset calculations.
>
> But adding yet another macro neither solves all these problems, nor
> would it align with our goal to move away from target macros.
>
> I have now found an alternative way to make the ARC port terminate building its
> insn-attrtab.c - it involves using match_test("get_attr_length (insn) == 2")
> instead of eq_attr("[lock_]length" (const_int 2)) - where I really want to
> know if the instruction was considered short in the previous iteration.
>
> So, I have made a patch to replace the ADJUST_INSN_LENGTH macro in final.c
> with an adjust_insn_length hook, for which a default implementation
> using ADJUST_INSN_LENGTH is provided in targhooks.c to provide for an
> easy transition.
> I've looked at the existing ports that use ADJUST_INSN_LENGTH, and some
> seem to prefer to return an adjustment to be added to the length, while
> others prefer to return the new length.  The latter seemed to be slightly
> more numerous, so I went with that.
>
> The hook has two new parameters:
> - a flag that tells it if the insn in question is part of a delay slot
>   sequence.  The default hook implementation skips the invocation of
>   ADJUST_INSN_LENGTH in this case, for the sake of compatibility.
> - a pointer to an int that can be set to the number of iterations before
>   the length locking feature is applied to this instruction's length when
>   the increasing-length algorithm is used.  The pointed-to value is
>   initialized to zero, which means that length locking is always applied
>   (assuming final.c uses the increasing length algorithm).  Setting it to
>   a higher number effectively gives the new instruction length a lower
>   priority for being put into uid_lock_length.
>
>> Note that
>>
>>> Care has to be taken that this does not
>>> lead to infinite loops.
>>
>>
>> doesn't convince me that is properly designed ;)
>
>
> With the hook mechanism, it is much harder to create an infinite loop
> inside shorten_branches.  (It would involve something like setting
>  iteration_threshold to MAX_INT and making the length decrease when niter
>  is at MAX_INT, then letting integer overflow of niter take its course.
>  Making it impossible for a port maintainer to get things wrong is not a
>  meaningful goal for GCC, but making it straightforward to get it right is.)
>
> This patch builds on:
> http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02527.html
>
> Bootstrapped in revision 192879 on i686-pc-linux-gnu.
> Tested with config-list.mk on x86_64-unknown-linux-gnu.

Apart from the iteration_threshold the hookization would be straight-forward.
Now I cannot decipher from the patch what functional change it introduces ;)

There are also only 10 users of ADJUST_INSN_LENGTH, not enough to
warrant introducing a hook without removing the macro IMHO.

Thanks,
Richard.

> 2012-10-28  Joern Rennecke  <joern.rennecke@embecosm.com>
>
>         * doc/tm.texi.in (@hook TARGET_ADJUST_INSN_LENGTH): Add.
>         * doc/tm.texi: Regenerate.
>         * final.c (get_attr_length_1): Use targetm.adjust_insn_length
>         instead of ADJUST_INSN_LENGTH.
>         (adjust_length): New function.
>         (shorten_branches): Use adjust_length instead of ADJUST_INSN_LENGTH.
>         * target.def (adjust_insn_length): New hook.
>         * targhooks.c (default_adjust_insn_length): New function.
>         * targhooks.h (default_adjust_insn_length): Declare.
>
> diff -drup gcc-192879-haveattr/doc/tm.texi gcc/doc/tm.texi
> --- gcc-192879-haveattr/doc/tm.texi     2012-10-28 01:07:38.463469923 +0000
> +++ gcc/doc/tm.texi     2012-10-28 01:31:15.522227927 +0000
> @@ -11341,3 +11341,7 @@ This value should be set if the result w
>  @deftypevrx {Target Hook} {bool} TARGET_HAVE_ATTR_LENGTH
>  These flags are automatically generated;  you should not override them in
> tm.c:
>  @end deftypevr
> +
> +@deftypefn {Target Hook} int TARGET_ADJUST_INSN_LENGTH (rtx @var{insn}, int @var{length}, bool @var{in_delay_sequence}, int *@var{iteration_threshold})
> +Return an adjusted length for @var{insn}.  @var{length} is the value that has been calculated using the @code{length} instruction attribute.  @var{in_delay_sequence} is true if @var{insn} forms part of a delay slot sequence.  *@var{iteration_threshold} specifies the number of branch shortening iterations before length decreases are inhibited.  The default implementation uses @code{ADJUST_INSN_LENGTH}, if defined.
> +@end deftypefn
> diff -drup gcc-192879-haveattr/doc/tm.texi.in gcc/doc/tm.texi.in
> --- gcc-192879-haveattr/doc/tm.texi.in  2012-10-28 01:07:38.489470252 +0000
> +++ gcc/doc/tm.texi.in  2012-10-28 01:31:55.578745701 +0000
> @@ -11175,3 +11175,5 @@ memory model bits are allowed.
>  @hook TARGET_ATOMIC_TEST_AND_SET_TRUEVAL
>
>  @hook TARGET_HAVE_CC0
> +
> +@hook TARGET_ADJUST_INSN_LENGTH
> diff -drup gcc-192879-haveattr/final.c gcc/final.c
> --- gcc-192879-haveattr/final.c 2012-10-28 01:07:38.492470288 +0000
> +++ gcc/final.c 2012-10-28 15:24:49.731169660 +0000
> @@ -424,10 +424,8 @@ get_attr_length_1 (rtx insn, int (*fallb
>         break;
>        }
>
> -#ifdef ADJUST_INSN_LENGTH
> -  ADJUST_INSN_LENGTH (insn, length);
> -#endif
> -  return length;
> +  int dummy = -1;
> +  return targetm.adjust_insn_length (insn, length, false, &dummy);
>  }
>
>  /* Obtain the current length of an insn.  If branch shortening has been
> done,
> @@ -827,6 +825,56 @@ struct rtl_opt_pass pass_compute_alignme
>  };
>
>
> +/* Helper function for shorten_branches, to apply the adjust_insn_length
> +   target hook, and lock in length increases in order to avoid infinite
> +   iteration cycles.
> +   INSN is the instruction being processed, and NEW_LENGTH is the length
> +   that has been calculated for it from its length attribute.  UID is
> +   its UID.  SEQ_P indicates if it is inside a delay slot sequence.
> +   INSN_LENGTHS and UID_LOCK_LENGTH are the arrays holding the current
> +   lengths and locked-in maximum lengths from the previous iterations,
> +   and are updated for UID when appropriate.
> +   *NITER contains the number of iterations since we last made a change
> +   to uid_lock_length.
> +   *SOMETHING_CHANGED will be set if we make a change to INSN_LENGTHS.
> +   The return value is the new length taking the result of the
> +   adjust_insn_length and uid_lock_length into account.
> +   Note that when *NITER is negative, no length locking is effective, and
> +   the new lengths are just assigned.  This is used in the initial pass,
> +   and when only making a single pass to update instruction lengths
> +   downward.  */
> +static int
> +adjust_length (rtx insn, int new_length, int uid, int *insn_lengths,
> +              int *uid_lock_length, bool seq_p, int *niter,
> +              bool *something_changed)
> +
> +{
> +  int iter_threshold = 0;
> +  new_length
> +    = targetm.adjust_insn_length (insn, new_length, seq_p, &iter_threshold);
> +  if (new_length < 0)
> +    fatal_insn ("negative insn length", insn);
> +  if (iter_threshold < 0)
> +    fatal_insn ("negative shorten_branches iteration threshold", insn);
> +  if (new_length != insn_lengths[uid])
> +    {
> +      if (new_length < uid_lock_length[uid])
> +       new_length = uid_lock_length[uid];
> +      if (new_length == insn_lengths[uid])
> +       ; /* done here.  */
> +      else if (*niter < iter_threshold
> +              || new_length > insn_lengths[uid])
> +       {
> +         if (*niter >= iter_threshold)
> +           uid_lock_length[uid] = new_length, *niter = 0;
> +         insn_lengths[uid] = new_length;
> +         *something_changed = true;
> +       }
> +      else
> +       new_length = insn_lengths[uid];
> +    }
> +  return new_length;
> +}
> +
>  /* Make a pass over all insns and compute their actual lengths by
> shortening
>     any branches of variable length if possible.  */
>
> @@ -848,7 +896,7 @@ shorten_branches (rtx first)
>    int max_skip;
>  #define MAX_CODE_ALIGN 16
>    rtx seq;
> -  int something_changed = 1;
> +  bool something_changed = true;
>    char *varying_length;
>    rtx body;
>    int uid;
> @@ -964,7 +1012,8 @@ shorten_branches (rtx first)
>      return;
>
>    /* Allocate the rest of the arrays.  */
> -  insn_lengths = XNEWVEC (int, max_uid);
> +  insn_lengths = XCNEWVEC (int, max_uid);
> +  int *uid_lock_length = XCNEWVEC (int, max_uid);
>    insn_lengths_max_uid = max_uid;
>    /* Syntax errors can lead to labels being outside of the main insn
> stream.
>       Initialize insn_addresses, so that we get reproducible results.  */
> @@ -1065,6 +1114,7 @@ shorten_branches (rtx first)
>
>    /* Compute initial lengths, addresses, and varying flags for each insn.
> */
> +  int (*length_fun) (rtx) = increasing ? insn_min_length : insn_default_length;
> +  int niter = increasing ? 0 : -1;
>
>    for (insn_current_address = 0, insn = first;
>         insn != 0;
> @@ -1072,8 +1122,6 @@ shorten_branches (rtx first)
>      {
>        uid = INSN_UID (insn);
>
> -      insn_lengths[uid] = 0;
> -
>        if (LABEL_P (insn))
>         {
>           int log = LABEL_TO_ALIGNMENT (insn);
> @@ -1093,6 +1141,8 @@ shorten_branches (rtx first)
>        if (INSN_DELETED_P (insn))
>         continue;
>
> +      int length = 0;
> +
>        body = PATTERN (insn);
>        if (GET_CODE (body) == ADDR_VEC || GET_CODE (body) == ADDR_DIFF_VEC)
>         {
> @@ -1100,13 +1150,12 @@ shorten_branches (rtx first)
>              section.  */
>           if (JUMP_TABLES_IN_TEXT_SECTION
>               || readonly_data_section == text_section)
> -           insn_lengths[uid] = (XVECLEN (body,
> -                                         GET_CODE (body) == ADDR_DIFF_VEC)
> -                                * GET_MODE_SIZE (GET_MODE (body)));
> +           length = (XVECLEN (body, GET_CODE (body) == ADDR_DIFF_VEC)
> +                     * GET_MODE_SIZE (GET_MODE (body)));
>           /* Alignment is handled by ADDR_VEC_ALIGN.  */
>         }
>        else if (GET_CODE (body) == ASM_INPUT || asm_noperands (body) >= 0)
> -       insn_lengths[uid] = asm_insn_count (body) * insn_default_length (insn);
> +       length = asm_insn_count (body) * insn_default_length (insn);
>        else if (GET_CODE (body) == SEQUENCE)
>         {
>           int i;
> @@ -1134,7 +1183,10 @@ shorten_branches (rtx first)
>               else
>                 inner_length = inner_length_fun (inner_insn);
>
> -             insn_lengths[inner_uid] = inner_length;
> +             inner_length
> +               = adjust_length (insn, inner_length, inner_uid, insn_lengths,
> +                                uid_lock_length, true, &niter,
> +                                &something_changed);
>               if (const_delay_slots)
>                 {
>                   if ((varying_length[inner_uid]
> @@ -1145,21 +1197,18 @@ shorten_branches (rtx first)
>                 }
>               else
>                 varying_length[inner_uid] = 0;
> -             insn_lengths[uid] += inner_length;
> +             length += inner_length;
>             }
>         }
>        else if (GET_CODE (body) != USE && GET_CODE (body) != CLOBBER)
>         {
> -         insn_lengths[uid] = length_fun (insn);
> +         length = length_fun (insn);
>           varying_length[uid] = insn_variable_length_p (insn);
>         }
> -
> +
>        /* If needed, do any adjustment.  */
> -#ifdef ADJUST_INSN_LENGTH
> -      ADJUST_INSN_LENGTH (insn, insn_lengths[uid]);
> -      if (insn_lengths[uid] < 0)
> -       fatal_insn ("negative insn length", insn);
> -#endif
> +      adjust_length (insn, length, uid, insn_lengths, uid_lock_length,
> +                    false, &niter, &something_changed);
>      }
>
>    /* Now loop over all the insns finding varying length insns.  For each,
> @@ -1168,16 +1217,16 @@ shorten_branches (rtx first)
>
>    while (something_changed)
>      {
> -      something_changed = 0;
> +      if (increasing)
> +       niter++;
> +
> +      something_changed = false;
>        insn_current_align = MAX_CODE_ALIGN - 1;
>        for (insn_current_address = 0, insn = first;
>            insn != 0;
>            insn = NEXT_INSN (insn))
>         {
>           int new_length;
> -#ifdef ADJUST_INSN_LENGTH
> -         int tmp_length;
> -#endif
>           int length_align;
>
>           uid = INSN_UID (insn);
> @@ -1363,17 +1412,13 @@ shorten_branches (rtx first)
>                   if (! varying_length[inner_uid])
>                     inner_length = insn_lengths[inner_uid];
>                   else
> -                   inner_length = insn_current_length (inner_insn);
> -
> -                 if (inner_length != insn_lengths[inner_uid])
>                     {
> -                     if (!increasing || inner_length > insn_lengths[inner_uid])
> -                       {
> -                         insn_lengths[inner_uid] = inner_length;
> -                         something_changed = 1;
> -                       }
> -                     else
> -                       inner_length = insn_lengths[inner_uid];
> +                     inner_length = insn_current_length (inner_insn);
> +
> +                     inner_length
> +                       = adjust_length (insn, inner_length, inner_uid,
> +                                        insn_lengths, uid_lock_length, true,
> +                                        &niter, &something_changed);
>                     }
>                   insn_current_address += inner_length;
>                   new_length += inner_length;
> @@ -1385,21 +1430,13 @@ shorten_branches (rtx first)
>               insn_current_address += new_length;
>             }
>
> -#ifdef ADJUST_INSN_LENGTH
>           /* If needed, do any adjustment.  */
> -         tmp_length = new_length;
> -         ADJUST_INSN_LENGTH (insn, new_length);
> +         int tmp_length = new_length;
> +         new_length
> +           = adjust_length (insn, new_length, uid, insn_lengths,
> +                            uid_lock_length, false, &niter,
> +                            &something_changed);
>           insn_current_address += (new_length - tmp_length);
> -#endif
> -
> -         if (new_length != insn_lengths[uid]
> -             && (!increasing || new_length > insn_lengths[uid]))
> -           {
> -             insn_lengths[uid] = new_length;
> -             something_changed = 1;
> -           }
> -         else
> -           insn_current_address += insn_lengths[uid] - new_length;
>         }
>        /* For a non-optimizing compile, do only a single pass.  */
>        if (!increasing)
> diff -drup gcc-192879-haveattr/target.def gcc/target.def
> --- gcc-192879-haveattr/target.def      2012-10-28 01:07:38.517470606 +0000
> +++ gcc/target.def      2012-10-28 01:31:15.525227979 +0000
> @@ -2818,6 +2818,17 @@ DEFHOOK
>   enum unwind_info_type, (void),
>   default_debug_unwind_info)
>
> +DEFHOOK
> +(adjust_insn_length,
> + "Return an adjusted length for @var{insn}.  @var{length} is the value
> that\
> + has been calculated using the @code{length} instruction attribute. \
> + @var{in_delay_sequence} if @var{insn} forms part of a delay sequence. \
> + *@var{iteration_threshold} specifies the number of branch shortening\
> + iterations before length decreases are inhibited.  The default\
> + implementation uses @code{ADJUST_INSN_LENGTH}, if defined.",
> + int, (rtx insn, int length, bool in_delay_sequence, int
> *iteration_threshold),
> + default_adjust_insn_length)
> +
>  DEFHOOKPOD
>  (atomic_test_and_set_trueval,
>   "This value should be set if the result written by\
> diff -drup gcc-192879-haveattr/targhooks.c gcc/targhooks.c
> --- gcc-192879-haveattr/targhooks.c     2012-10-25 03:33:26.000000000 +0100
> +++ gcc/targhooks.c     2012-10-28 01:41:35.082931024 +0000
> @@ -1540,4 +1540,20 @@ default_member_type_forces_blk (const_tr
>    return false;
>  }
>
> +/* Default version of adjust_insn_length.  */
> +
> +int
> +default_adjust_insn_length (rtx insn ATTRIBUTE_UNUSED, int length,
> +                           bool in_delay_sequence, int *iter_threshold)
> +{
> +  if (in_delay_sequence)
> +    *iter_threshold = INT_MAX;
> +#ifdef ADJUST_INSN_LENGTH
> +  else
> +    ADJUST_INSN_LENGTH (insn, length);
> +#endif
> +  return length;
> +}
> +
> +
>  #include "gt-targhooks.h"
> diff -drup gcc-192879-haveattr/targhooks.h gcc/targhooks.h
> --- gcc-192879-haveattr/targhooks.h     2012-10-25 03:33:26.000000000 +0100
> +++ gcc/targhooks.h     2012-10-28 01:31:15.479227682 +0000
> @@ -193,3 +193,4 @@ extern const char *default_pch_valid_p (
>  extern void default_asm_output_ident_directive (const char*);
>
>  extern bool default_member_type_forces_blk (const_tree, enum machine_mode);
> +extern int default_adjust_insn_length (rtx, int, bool, int *);
>
Joern Rennecke Oct. 30, 2012, 4:14 p.m. UTC | #2
Quoting Richard Biener <richard.guenther@gmail.com>:

> Apart from the iteration_threshold the hookization would be straight-forward.
> Now I cannot decipher from the patch what functional change it introduces ;)

The only change occurs if we reach an iteration count of MAX_INT iterations -
which should already be indicative of a problem.

At the MAX_INTth iteration, we will apply the length locking logic to
instructions inside a delay slot sequence as well.

If we disregard this exceptional case, there should be no functional changes
unless someone defines TARGET_ADJUST_INSN_LENGTH.

uid_lock_length gets allocated with XCNEWVEC.  So, in
particular, the elements corresponding to instructions inside delay slot
sequences are initialized to zero.

As default hook sets *iter_threshold to MAX_INT when inside a delay sequence,
and doesn't change length, the max operation with uid_lock_length is a no-op,
and *niter < iter_threshold is true, hence a length change results in
updating the length immediately, without changing uid_lock_length.

In the case that we are not inside a delay slot sequence, the default hook
leaves iter_threshold as 0, and applies ADJUST_INSN_LENGTH.  Thus, when niter
is 0 or larger, as is the case for the ordinary looping operation, we
always find *niter >= iter_threshold, and thus apply the length  
locking mechanism.

If we are in the preliminary pass, or doing the single !increasing iteration,
niter is set to -1, so *niter < iter_threshold is always true, so again we
update the length immediately.

> There are also only 10 users of ADJUST_INSN_LENGTH, not enough to
> warrant introducing a hook without removing the macro IMHO.

Ok, I can prepare a patch for that, even though it makes it even a bit
less obvious that there's no functional change.

What about the preparatory patch
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02527.html ?
Can I check that in now?
Richard Biener Oct. 31, 2012, 9:52 a.m. UTC | #3
On Tue, Oct 30, 2012 at 5:14 PM, Joern Rennecke
<joern.rennecke@embecosm.com> wrote:
> Quoting Richard Biener <richard.guenther@gmail.com>:
>
>> Apart from the iteration_threshold the hookization would be
>> straight-forward.
>> Now I cannot decipher from the patch what functional change it introduces
>> ;)
>
>
> The only change occurs if we reach an iteration count of MAX_INT iterations
> -
> which should already be indicative of a problem.
>
> At the MAX_INTth iteration, we will apply the length locking logic to
> instructions inside a delay slot sequence as well.
>
> If we disregard this exceptional case, there should be no functional changes
> unless someone defines TARGET_ADJUST_INSN_LENGTH.
>
> uid_lock_length gets allocated with XCNEWVEC.  So, in
> particular, the elements corresponding to instructions inside delay slot
> sequences are initialized to zero.
>
> As default hook sets *iter_threshold to MAX_INT when inside a delay
> sequence,
> and doesn't change length, the max operation with uid_lock_length is a
> no-op,
> and *niter < iter_threshold is true, hence a length change results in
> updating the length immediately, without changing uid_lock_length.
>
> In the case that we are not inside a delay slot sequence, the default hook
> leaves iter_threshold as 0, and applies ADJUST_INSN_LENGTH.  Thus, when
> niter
> is 0 or larger, as is the case for the ordinary looping operation, we
> always find *niter >= iter_threshold, and thus apply the length locking
> mechanism.
>
> If we are in the preliminary pass, or doing the single !increasing
> iteration,
> niter is set to -1, so *niter < iter_threshold is always true, so again we
> update the length immediately.

Ok, I see.

The patch is ok then.

>> There are also only 10 users of ADJUST_INSN_LENGTH, not enough to
>> warrant introducing a hook without removing the macro IMHO.
>
>
> Ok, I can prepare a patch for that, even though it makes it even a bit
> less obvious that there's no functional change.

That would be nice (you can do that as followup).

> What about the preparatory patch
> http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02527.html ?
> Can I check that in now?

Richard S. had some comments there so I defer to him.

Richard.
Richard Sandiford Oct. 31, 2012, 10:22 a.m. UTC | #4
Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Oct 30, 2012 at 5:14 PM, Joern Rennecke
> <joern.rennecke@embecosm.com> wrote:
>> Quoting Richard Biener <richard.guenther@gmail.com>:
>>
>>> Apart from the iteration_threshold the hookization would be
>>> straight-forward.
>>> Now I cannot decipher from the patch what functional change it introduces
>>> ;)
>>
>>
>> The only change occurs if we reach an iteration count of MAX_INT iterations
>> -
>> which should already be indicative of a problem.
>>
>> At the MAX_INTth iteration, we will apply the length locking logic to
>> instructions inside a delay slot sequence as well.
>>
>> If we disregard this exceptional case, there should be no functional changes
>> unless someone defines TARGET_ADJUST_INSN_LENGTH.
>>
>> uid_lock_length gets allocated with XCNEWVEC.  So, in
>> particular, the elements corresponding to instructions inside delay slot
>> sequences are initialized to zero.
>>
>> As default hook sets *iter_threshold to MAX_INT when inside a delay
>> sequence,
>> and doesn't change length, the max operation with uid_lock_length is a
>> no-op,
>> and *niter < iter_threshold is true, hence a length change results in
>> updating the length immediately, without changing uid_lock_length.
>>
>> In the case that we are not inside a delay slot sequence, the default hook
>> leaves iter_threshold as 0, and applies ADJUST_INSN_LENGTH.  Thus, when
>> niter
>> is 0 or larger, as is the case for the ordinary looping operation, we
>> always find *niter >= iter_threshold, and thus apply the length locking
>> mechanism.
>>
>> If we are in the preliminary pass, or doing the single !increasing
>> iteration,
>> niter is set to -1, so *niter < iter_threshold is always true, so again we
>> update the length immediately.
>
> Ok, I see.
>
> The patch is ok then.

I should probably have piped up earlier, but I'm really not sure about it.
ADJUST_INSN_LENGTH as defined now is telling us a property of the target.
The patch instead ties the new hook directly into the shorten_branches
algorithm, which I think is a bad idea.

IMO, the length attributes and ADJUST_INSN_LENGTH should be treated as
a fused process: every value returned by a length attribute (whether min,
default or current) should be filtered through ADJUST_INSN_LENGTH before
being used.  Whether ADJUST_INSN_LENGTH decreases or increases the original
length doesn't matter, because only the output of ADJUST_INSN_LENGTH should
be used.

Joern's patch makes us use ADJUST_INSN_LENGTH for delay slot insns,
which I agree is a good thing in itself.  It's the all-important
iteration parameter that feels wrong.

TBH, I still don't understand what property of the target we're trying
to describe here.  I gather it has something to do with alignment.
But I'd really like a description of that target property, independent
of the shorten_branches implementation.

Richard
Richard Biener Oct. 31, 2012, 10:30 a.m. UTC | #5
On Wed, Oct 31, 2012 at 11:22 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Tue, Oct 30, 2012 at 5:14 PM, Joern Rennecke
>> <joern.rennecke@embecosm.com> wrote:
>>> Quoting Richard Biener <richard.guenther@gmail.com>:
>>>
>>>> Apart from the iteration_threshold the hookization would be
>>>> straight-forward.
>>>> Now I cannot decipher from the patch what functional change it introduces
>>>> ;)
>>>
>>>
>>> The only change occurs if we reach an iteration count of MAX_INT iterations
>>> -
>>> which should already be indicative of a problem.
>>>
>>> At the MAX_INTth iteration, we will apply the length locking logic to
>>> instructions inside a delay slot sequence as well.
>>>
>>> If we disregard this exceptional case, there should be no functional changes
>>> unless someone defines TARGET_ADJUST_INSN_LENGTH.
>>>
>>> uid_lock_length gets allocated with XCNEWVEC.  So, in
>>> particular, the elements corresponding to instructions inside delay slot
>>> sequences are initialized to zero.
>>>
>>> As default hook sets *iter_threshold to MAX_INT when inside a delay
>>> sequence,
>>> and doesn't change length, the max operation with uid_lock_length is a
>>> no-op,
>>> and *niter < iter_threshold is true, hence a length change results in
>>> updating the length immediately, without changing uid_lock_length.
>>>
>>> In the case that we are not inside a delay slot sequence, the default hook
>>> leaves iter_threshold as 0, and applies ADJUST_INSN_LENGTH.  Thus, when
>>> niter
>>> is 0 or larger, as is the case for the ordinary looping operation, we
>>> always find *niter >= iter_threshold, and thus apply the length locking
>>> mechanism.
>>>
>>> If we are in the preliminary pass, or doing the single !increasing
>>> iteration,
>>> niter is set to -1, so *niter < iter_threshold is always true, so again we
>>> update the length immediately.
>>
>> Ok, I see.
>>
>> The patch is ok then.
>
> I should probably have piped up earlier, but I'm really not sure about it.
> ADJUST_INSN_LENGTH as defined now is telling us a property of the target.
> The patch instead ties the new hook directly into the shorten_branches
> algorithm, which I think is a bad idea.
>
> IMO, the length attributes and ADJUST_INSN_LENGTH should be treated as
> a fused process: every value returned by a length attribute (whether min,
> default or current) should be filtered through ADJUST_INSN_LENGTH before
> being used.  Whether ADJUST_INSN_LENGTH decreases or increases the original
> length doesn't matter, because only the output of ADJUST_INSN_LENGTH should
> be used.
>
> Joern's patch makes us use ADJUST_INSN_LENGTH for delay slot insns,
> which I agree is a good thing in itself.  It's the all-important
> iteration parameter that feels wrong.
>
> TBH, I still don't understand what property of the target we're trying
> to describe here.  I gather it has something to do with alignment.
> But I'd really like a description of that target property, independent
> of the shorten_branches implementation.

Yeah, the iteration parameter doesn't feel right, I agree.  But
ADJUST_INSN_LENGTH is only used in this code.

Which is why I originally said if the patch would not add the iteration
parameter it would be obvious ...

So, consider my approval revoked.

Richard.

> Richard
Joern Rennecke Oct. 31, 2012, 11:39 a.m. UTC | #6
Quoting Richard Sandiford <rdsandiford@googlemail.com>:

> I should probably have piped up earlier, but I'm really not sure about it.
> ADJUST_INSN_LENGTH as defined now is telling us a property of the target.
> The patch instead ties the new hook directly into the shorten_branches
> algorithm, which I think is a bad idea.
>
> IMO, the length attributes and ADJUST_INSN_LENGTH should be treated as
> a fused process: every value returned by a length attribute (whether min,
> default or current) should be filtered through ADJUST_INSN_LENGTH before
> being used.  Whether ADJUST_INSN_LENGTH decreases or increases the original
> length doesn't matter, because only the output of ADJUST_INSN_LENGTH should
> be used.
>
> Joern's patch makes us use ADJUST_INSN_LENGTH for delay slot insns,
> which I agree is a good thing in itself.  It's the all-important
> iteration parameter that feels wrong.
>
> TBH, I still don't understand what property of the target we're trying
> to describe here.  I gather it has something to do with alignment.
> But I'd really like a description of that target property, independent
> of the shorten_branches implementation.

It's about describing complex interactions of length adjustments that
derive from branch shortening and length added for (un)alignment for
scheduling purposes.  Expressed naively, these can lead to cycles.
I could manage with the combination of length / lock_length attribute,
but there was a fine line between tuning for optimal scheduling and
risking cycles.
So, as Richard Biener has pointed out, that design was not optimal.

What is really needed is a way to express priorities between instructions
for when to inhibit length shrinkage from one iteration to another.

Ports that have no issues with negative feedback in the length calculations
don't need to worry about priorities; the length locking mechanism
will be a no-op anyways, so the question when it is being applied is moot.

In the case you have negative feedback, the default, to apply length
locking immediately to all instructions, gives the fastest convergence,
although it will likely give a sub-optimal shortening solution.

The length variations for the ARC are not all alike: there are branches that
are subject to branch shortening in the usual way, but they might
shrink when other things shrink.  When we are iterating starting at
minimal length and increasing lengths, these branches should be stopped
from shrinking, as they likely will grow back again in the next iteration.
OTOH there are potentially short instructions that are lengthened to
achieve scheduling-dictated alignment or misalignment.  These should
be allowed to change freely back and forth.  Unless we have a rare
cycle that only depends on alignments.

So, to get this right without having the port writer worry too much about
when a length adjustment causes a cycle, we want all varying length
instructions subject to length locking to break cycles, but not with
the same priority.  Specifying this as an iteration count is an
easy way to implement this, without having to make shorten_branches
identify every cycle and the participating instructions that are
oscillating in length.  The latter would actually run up against
computability problems in distinguishing a single complex cycle and
multiple independent cycles.

If you like, we can describe the iter_threshold parameter in a different
way, just saying that it indicates a priority, 0 being the first to
get length locking applied, and the larger the number, the later.
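
For illustration only (a made-up sketch, not code from the actual ARC port),
the priority scheme could look like this in a hook implementation: range-driven
branches get locked immediately, while alignment padding is only locked if a
cycle persists for a couple of iterations.

static int
example_priority_adjust_insn_length (rtx insn, int length,
                                     bool in_delay_sequence,
                                     int *iteration_threshold)
{
  if (!in_delay_sequence && JUMP_P (insn))
    *iteration_threshold = 0;  /* Range-driven branches: lock first.  */
  else
    *iteration_threshold = 2;  /* Padding: allowed to oscillate for two
                                  more iterations before being locked.  */
  return length;
}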
Richard Sandiford Oct. 31, 2012, 11:56 a.m. UTC | #7
Joern Rennecke <joern.rennecke@embecosm.com> writes:
> Quoting Richard Sandiford <rdsandiford@googlemail.com>:
>> I should probably have piped up earlier, but I'm really not sure about it.
>> ADJUST_INSN_LENGTH as defined now is telling us a property of the target.
>> The patch instead ties the new hook directly into the shorten_branches
>> algorithm, which I think is a bad idea.
>>
>> IMO, the length attributes and ADJUST_INSN_LENGTH should be treated as
>> a fused process: every value returned by a length attribute (whether min,
>> default or current) should be filtered through ADJUST_INSN_LENGTH before
>> being used.  Whether ADJUST_INSN_LENGTH decreases or increases the original
>> length doesn't matter, because only the output of ADJUST_INSN_LENGTH should
>> be used.
>>
>> Joern's patch makes us use ADJUST_INSN_LENGTH for delay slot insns,
>> which I agree is a good thing in itself.  It's the all-important
>> iteration parameter that feels wrong.
>>
>> TBH, I still don't understand what property of the target we're trying
>> to describe here.  I gather it has something to do with alignment.
>> But I'd really like a description of that target property, independent
>> of the shorten_branches implementation.
>
> It's about describing complex interactions of length adjustments that
> derive from branch shortening and length added for (un)alignment for
> scheduling purposes.  Expressed naively, these can lead to cycles.

But shorten_branches should be written to avoid cycles, and I thought
your patch did that by making sure that the address of each instruction
only increases.

> I could manage with the combination of length / lock_length attribute,
> but there was a fine line between tuning for optimal scheduling and
> risking cycles.
> So, as Richard Biener has pointed out, that design was not optimal.
>
> What is really needed is a way to express priorities between instructions
> when to inhibit length shrinkage from one iteration to another.
>
> Ports that have no issues with negative feedback in the length calculations
> don't need to worry about priorities; the length locking mechanism
> will be a no-op anyways, so the question when it is being applied is moot.
>
> In the case you have negative feedback, the default, to apply length
> locking immediately to all instructions, gives the fastest convergence,
> although it will likely give a sub-optimal shortening solution.
>
> The length variations for the ARC are not all alike: there are branches that
> are subject to branch shortening in the usual way, but they might
> shrink when other things shrink.  When we are iterating starting at
> minimal length and increasing lengths, these branches should be stopped
> from shrinking, as they likely will grow back again in the next iteration.
> OTOH there are potentially short instructions that are lengthened to
> achieve scheduling-dictated alignment or misalignment.  These should
> be allowed to change freely back and forth.  Unless we have a rare
> cycle that only depends on alignments.

Hmm, but this is still talking in terms of shorten_branches, rather than
the target property that we're trying to model.  It sounds like the property
is something along the lines of: some instructions have different encodings
(or work-alikes with different encodings), and we want to make all the
encodings available to the optimisers.  Is that right?  If so, that sounds
like a useful thing to model, but I think it should be modelled directly,
rather than as a modification to the shorten_branches algorithm.

Is the alignment/misalignment explicitly modelled in the rtl?  If so, how?
Using labels, or something else?

Richard
Joern Rennecke Oct. 31, 2012, 12:33 p.m. UTC | #8
Quoting Richard Sandiford <rdsandiford@googlemail.com>:

>> It's about describing complex interactions of length adjustments that
>> derive from branch shortening and length added for (un)alignment for
>> scheduling purposes.  Expressed naively, these can lead to cycles.
>
> But shorten_branches should be written to avoid cycles, and I thought
> your patch did that by making sure that the address of each instruction
> only increases.

This actually gives a very poor branch shortening result for ARC.
What I did before with the lock_length attribute was only locking in increases
for the length of branches, and preserving a complex set of invariants to
avoid cycles from the (mis)alignment padding.

Saying that for some instructions, the length locking comes into effect
only after a few iterations of not permanently increasing the size of
other instructions is still cycle-safe.

>> The length variations for the ARC are not all alike: there are branches that
>> are subject to branch shortening in the usual way, but they might
>> shrink when other things shrink.  When we are iterating starting at
>> minimal length and increasing lengths, these branches should be stopped
>> from shrinking, as they likely will grow back again in the next iteration.
>> OTOH there are potentially short instructions that are lengthened to
>> achieve scheduling-dictated alignment or misalignment.  These should
>> be allowed to change freely back and forth.  Unless we have a rare
>> cycle that only depends on alignments.
>
> Hmm, but this is still talking in terms of shorten_branches, rather than
> the target property that we're trying to model.  It sounds like the property
> is something along the lines of: some instructions have different encodings
> (or work-alikes with different encodings), and we want to make all the
> encodings available to the optimisers.  Is that right?

No, not really.  There are lots of redundant encodings, but we have only
one favourite encoding for each size.  We usually want the shortest version,
unless we want to tweak the alignment of a following instruction, or in
the case of unaligned conditional branches with unfilled delay slot, to
hide the branch penalty (don't ask me why it has that effect...).

> If so, that sounds
> like a useful thing to model, but I think it should be modelled directly,
> rather than as a modification to the shorten_branches algorithm.

The shorten_branches algorithm can't give the correct result without
taking these things into account.  Also, instruction lengths are needed
to know the requirements.  So, do you want me to run a scheduling
pass in each iteration of shorten_branches?

> Is the alignment/misalignment explicitly modelled in the rtl?  If so, how?
> Using labels, or something else?

No, it works with instruction addresses during branch shortening, and
extra bits in cfun->machine during instruction output to keep track of
the current alignment.
Richard Biener Oct. 31, 2012, 12:49 p.m. UTC | #9
On Wed, Oct 31, 2012 at 1:33 PM, Joern Rennecke
<joern.rennecke@embecosm.com> wrote:
> Quoting Richard Sandiford <rdsandiford@googlemail.com>:
>
>>> It's about describing complex interactions of length adjustments that
>>> derive from branch shortening and length added for (un)alignment for
>>> scheduling purposes.  Expressed naively, these can lead to cycles.
>>
>>
>> But shorten_branches should be written to avoid cycles, and I thought
>> your patch did that by making sure that the address of each instruction
>> only increases.
>
>
> This actually gives a very poor branch shortening result for ARC.
> What I did before with the lock_length attribute was only locking in
> increases
> for the length of branches, and preserving a complex set of invariants to
> avoid cycles from the (mis)alignment padding.
>
> Saying that for some instructions, the length locking comes into effect
> only after a few iterations of not permanently increasing the size of
> other instructions is still cycle-safe.
>
>
>>> The length variations for the ARC are not all alike: there are branches that
>>> are subject to branch shortening in the usual way, but they might
>>> shrink when other things shrink.  When we are iterating starting at
>>> minimal length and increasing lengths, these branches should be stopped
>>> from shrinking, as they likely will grow back again in the next iteration.
>>> OTOH there are potentially short instructions that are lengthened to
>>> achieve scheduling-dictated alignment or misalignment.  These should
>>> be allowed to change freely back and forth.  Unless we have a rare
>>> cycle that only depends on alignments.
>>
>>
>> Hmm, but this is still talking in terms of shorten_branches, rather than
>> the target property that we're trying to model.  It sounds like the
>> property
>> is something along the lines of: some instructions have different
>> encodings
>> (or work-alikes with different encodings), and we want to make all the
>> encodings available to the optimisers.  Is that right?
>
>
> No, not really.  There are lots of redundant encodings, but we have only
> one favourite encoding for each size.  We usually want the shortest version,
> unless we want to tweak the alignment of a following instruction, or in
> the case of unaligned conditional branches with unfilled delay slot, to
> hide the branch penalty (don't ask me why it has that effect...).

Maybe we should split the thing then into a adjust_insn_length attribute
without the iteration parameter and a hook, similar to the various scheduler
hooks, that is special for branch shortening.

Richard.

>
>> If so, that sounds
>> like a useful thing to model, but I think it should be modelled directly,
>> rather than as a modification to the shorten_branches algorithm.
>
>
> The shorten_branches algorithm can't give the correct result without
> taking these things into account.  Also, instruction lengths are needed
> to know the requirements.  So, do you want me to run a scheduling
> pass in each iteration of shorten_branches?
>
>
>> Is the alignment/misalignment explicitly modelled in the rtl?  If so, how?
>> Using labels, or something else?
>
>
> No, it works with instruction addresses during branch shortening, and
> extra bits in cfun->machine during instruction output to keep track of
> the current alignment.
Joern Rennecke Oct. 31, 2012, 1:06 p.m. UTC | #10
Quoting Richard Biener <richard.guenther@gmail.com>:

> Maybe we should split the thing then into a adjust_insn_length attribute
> without the iteration parameter

Attributes don't get any parameter but the instruction, and don't apply
to delay slot SEQUENCEs.
Richard Biener Oct. 31, 2012, 1:11 p.m. UTC | #11
On Wed, Oct 31, 2012 at 2:06 PM, Joern Rennecke
<joern.rennecke@embecosm.com> wrote:
> Quoting Richard Biener <richard.guenther@gmail.com>:
>
>> Maybe we should split the thing then into a adjust_insn_length attribute
>> without the iteration parameter
>
>
> Attributes don't get any parameter but the instruction, and don't apply
> to delay slot SEQUENCEs.

hook I meant, not attribute.

Richard.
Joern Rennecke Oct. 31, 2012, 1:54 p.m. UTC | #12
Quoting Richard Biener <richard.guenther@gmail.com>:

> On Wed, Oct 31, 2012 at 2:06 PM, Joern Rennecke
> <joern.rennecke@embecosm.com> wrote:
>> Quoting Richard Biener <richard.guenther@gmail.com>:
>>
>>> Maybe we should split the thing then into a adjust_insn_length attribute
>>> without the iteration parameter
>>
>>
>> Attributes don't get any parameter but the instruction, and don't apply
>> to delay slot SEQUENCEs.
>
> hook I meant, not attribute.

All right.
So I gather the first one is again called adjust_insn_length.  What do we call
the one for the iteration count (or priority, if you want to put it that way)
at which the lock_length mechanism becomes effective?

int lock_length_iter_count (rtx_insn, bool in_delay_slot) ?
int lock_length_priority (rtx_insn, bool iter_count) ?
Richard Sandiford Oct. 31, 2012, 2:48 p.m. UTC | #13
Joern Rennecke <joern.rennecke@embecosm.com> writes:
> Quoting Richard Sandiford <rdsandiford@googlemail.com>:
>>> The length variations for the ARC are not all alike: there are branches that
>>> are subject to branch shortening in the usual way, but they might
>>> shrink when other things shrink.  When we are iterating starting at
>>> minimal length and increasing lengths, these branches should be stopped
>>> from shrinking, as they likely will grow back again in the next iteration.
>>> OTOH there are potentially short instructions that are lengthened to
>>> achieve scheduling-dictated alignment or misalignment.  These should
>>> be allowed to change freely back and forth.  Unless we have a rare
>>> cycle that only depends on alignments.
>>
>> Hmm, but this is still talking in terms of shorten_branches, rather than
>> the target property that we're trying to model.  It sounds like the property
>> is something along the lines of: some instructions have different encodings
>> (or work-alikes with different encodings), and we want to make all the
>> encodings available to the optimisers.  Is that right?
>
> No, not really.  There are lots of redundant encodings, but we have only
> one favourite encoding for each size.  We usually want the shortest version,
> unless we want to tweak the alignment of a following instruction, or in
> the case of unaligned conditional branches with unfilled delay slot, to
> hide the branch penalty (don't ask me why it has that effect...).

But what I'm trying to get at is: why can't the backend tell
shorten_branches about the amount of alignment/misalignment
that the target wants, and where?  Via an attribute or a hook,
I don't mind which.  But it should be declarative, rather than
a part of the shorten_branches algorithm.

Also...

>> If so, that sounds
>> like a useful thing to model, but I think it should be modelled directly,
>> rather than as a modification to the shorten_branches algorithm.
>
> The shorten_branches algorithm can't give the correct result without
> taking these things into account.  Also, instruction lengths are needed
> to know the requirements.  So, do you want me to run a scheduling
> pass in each iteration of shorten_branches?

...it looks like arc_reorg already runs shorten_branches in a loop.
Is that right?  So in a sense we already have iteration involving
shorten_branches and the backend.

I realise we want to cut down the number of iterations where possible,
but at the same time, it seems like you're trying to do conditional
execution conversion on the fly during shorten_branches, such that the
rtl doesn't actually represent the instructions we output.  And the
performance problem you're seeing is that those behind-the-scenes
changes are too confusing for the current shorten_branches code to
handle well.  E.g. sometimes you decide to "delete" branches that stay
in the rtl stream as branches, and that got a length in a previous
iteration.

At some point, as you say, you have to commit to using a particular
form (conditionally executed or not) in order to avoid cycles.  But that
seems like a conditional execution decision rather than a shorten_branches
decision.  (It'd also be good to represent it in the rtl if possible.)

Richard
Joern Rennecke Nov. 1, 2012, 5:41 a.m. UTC | #14
Quoting Richard Sandiford <rdsandiford@googlemail.com>:

> But what I'm trying to get at is: why can't the backend tell
> shorten_branches about the amount of alignment/misalignment
> that the target wants, and where?  Via an attribute or a hook,
> I don't mind which.  But it should be declarative, rather than
> a part of the shorten_branches algorithm.

So, if we want to go with a declarative approach, each instruction can have
a plurality of variants, each described by a size, an instruction fetch
alignment base, an instruction fetch alignment offset set, a cache line base,
a cache line offset set, a cost as a branch destination - cached,
a cost as a branch destination - uncached, a fall-through predicted cost,
a fall-through mispredicted cost, a branch-taken-predicted cost,
a branch-taken-mispredicted cost.
Or, if we want to get into more detail, we should actually have scheduler
instruction reservation descriptions instead of singular costs.
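
Purely to make that enumeration concrete: spelled out as a C structure, such
a per-variant description might look roughly like the sketch below.  Nothing
like this exists in the patch; the field names simply mirror the list above.

struct insn_variant
{
  int size;                        /* bytes */
  int fetch_align_base;            /* instruction fetch alignment base */
  unsigned fetch_align_offsets;    /* bitmask of acceptable fetch offsets */
  int cache_line_base;
  unsigned cache_line_offsets;     /* bitmask of acceptable line offsets */
  int cost_dest_cached;            /* as branch destination, cached */
  int cost_dest_uncached;          /* as branch destination, uncached */
  int cost_fallthru_predicted;
  int cost_fallthru_mispredicted;
  int cost_taken_predicted;
  int cost_taken_mispredicted;
};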

A variant's availability and costs will depend on branch offsets,
the presence, kind and size of delay slot instructions and further
length-sensitive context (like the set of paths from the function entry,
with aggregate instruction fetch sizes and probabilities), so you have
to call a target hook each iteration to get the currently available set
of variants.

Then you have to see this in the context of relative execution frequencies
(for cached/uncached branch destinations), and branch probabilities and
the reliability of the probability information, and to-be-added annotations
about branch predictability, to arrive at a near-optimal solution in
reasonable time.

Do you have any suggestions about what algorithm to use?

FWIW, the current ARC description with the adjust_insn_length hook patch
uses instruction fetch offsets 0 and 2 to base 4, branch-taken/non-taken
costs with implicit machine-specific heuristics about predictability
(which are not as good as I'd like them to be, but better than nothing),
length-sensitive context information, branch probabilities and their
reliability, and delay slot instruction presence and size.  There are
some canned rules about how different requirements interact with each
other to get at a converging solution quickly and with minimal compromises.
All of the other mentioned features would be in principle interesting, but not
enough so far to justify implementing them in the previous length/lock_length
attribute framework (which was prone to cycles if you were not very careful
while adding new rules).

I believe other ports would have different priorities, possibly within the
set I enumerated, maybe going beyond that.

> ...it looks like arc_reorg already runs shorten_branches in a loop.
> Is that right?  So in a sense we already have iteration involving
> shorten_branches and the backend.

Actually, that is mostly for instruction selection/splitting/combination
for compare-and-branch/test-bit-and-branch instructions, which have
very stringent offset requirements; these decisions need to be made
(mostly; there is code to handle rare out-of-range offsets later, but it
  makes code quality worse)
before delayed branch scheduling and final conditional execution decisions.

But the fine-tuning of instruction sizes and alignments happens later in
the original branch shortening pass.

> I realise we want to cut down the number of iterations where possible,
> but at the same time, it seems like you're trying to do conditional
> execution conversion on the fly during shorten_branches, such that the
> rtl doesn't actually represent the instructions we output.  And the
> performance problem you're seeing is that those behind-the-scenes
> changes are too confusing for the current shorten_branches code to
> handle well.  E.g. sometimes you decide to "delete" branches that stay
> in the rtl stream as branches, and that got a length in a previous
> iteration.

No, not really.  Within each branch shortening pass, these decisions
are static; they may change when the branches are modified after
early branch shortening.  They may also change after delay slot scheduling.

If we want to get really precise with rtl, I have to alter it in every
branch shortening iteration when the decision changes if a
compare-and-branch/ test-bit-and branch is within range or not.
If it's a single instruction, it is transparent to previous flag settings;
if it isn't, we might be able to use the flag setting result for a
successive branch or conditional execution.
So then we'd need a way to describe this rtl modification depending on
branch offset ranges.
But that still won't get rid of the ARC requirement of running an extra
branch shortening pass, unless you want to roll the delay slot scheduling
into branch shortening as well.
Others claim that delayed branch scheduling should be integrated with the
normal scheduler, while there are also good arguments to integrate the
scheduler with the register allocator.  Just roll it all into one big
pile of hair.

Going back to pragmatic issues...

The problem I had with the original GCC 4.4 framework was that branch
shortening would not terminate for some functions when there is non-linear
positive and negative feedback.
The problem I have with the framework in the current trunk is that it
applies length locking even to those instructions that it should not
apply to - or at least, not so quickly - namely the short insns that are
upsized for alignment purposes.  When the input alignment changes, all
the previous upsizing decisions are wrong.
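
With the adjust_insn_length hook from the patch below, the way I would
express that is roughly the following sketch; upsized_for_alignment_p is
a made-up predicate standing in for the port's real test:

/* Sketch of a port implementation of the proposed hook.  */
static int
arc_adjust_insn_length (rtx insn, int length, bool in_delay_sequence,
                        int *iteration_threshold)
{
  /* Length increases that only serve to align a following instruction
     should not be locked in immediately; demand a few iterations of
     stability first, so that range-driven growth takes priority.  */
  if (!in_delay_sequence && upsized_for_alignment_p (insn))
    *iteration_threshold = 3;
  /* The real hook would also compute the adjusted length here.  */
  return length;
}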

> At some point, as you say, you have to commit to using a particular
> form (conditionally executed or not) in order to avoid cycles.  But that
> seems like a conditional execution decision rather than a shorten_branches
> decision.  (It'd also be good to represent it in the rtl if possible.)

Well, I could do this with additional machine-dependent rtl passes, but
this is really a (costly, in terms of development time and maintenance)
solution looking for a problem.

The ARC ccfsm machinery was originally only run at instruction output
time, and I modified it so that I can run it earlier to tell what changes
it would make to the instructions and their lengths, to get better quality
length predictions (most importantly, when optimizing for speed, it can
at times increase code size slightly, which can be catastrophic when we
committed to using a short branch at the edge of its offset range).
Running the same code to make or predict decisions avoids the
out-of-sync issues I would have with a separate, rtl-modifying copy.
By not actually modifying rtl, I don't have to allocate new uids during branch
shortening (which would require branch shortening to restart from scratch or
reallocate arrays and somehow map different instructions and uids), nor do
I have to convert rtl back as the input changes, with the hazard of
growing the rtl each iteration.
Richard Sandiford Nov. 1, 2012, 11:34 a.m. UTC | #15
Joern Rennecke <joern.rennecke@embecosm.com> writes:
> Quoting Richard Sandiford <rdsandiford@googlemail.com>:
>
>> But what I'm trying to get at is: why can't the backend tell
>> shorten_branches about the amount of alignment/misalignment
>> that the target wants, and where?  Via an attribute or a hook,
>> I don't mind which.  But it should be declarative, rather than
>> a part of the shorten_branches algorithm.
>
> So, if we want to go with a declarative approach, each instruction can have
> a plurality of variants, each described by a size, an instruction fetch
> alignment base, an instruction fetch alignment offset set, a cache line base,
> a cache line offset set, a cost as a branch destination - cached,
> a cost as a branch destination - uncached, a fall-through predicted cost,
> a fall-through mispredicted cost, a branch-taken-predicted cost,
> a branch-taken-mispredicted cost.
> Or, if we want to get into more detail, we should actually have scheduler
> instruction reservation descriptions instead of singular costs.
>
> A variant's availability and costs will depend on branch offsets,
> the presence, kind and size of delay slot instructions and further
> length-sensitive context (like the set of paths from the function entry,
> with aggregate instruction fetch sizes and probabilities), so you have
> to call a target hook each iteration to get the currently available set
> of variants.
>
> Then you have to see this in the context of relative execution frequencies
> (for cached/uncached branch destinations), and branch probabilities and
> the reliability of the probability information, and to-be-added annotations
> about branch predictability, to arrive at a near-optimal solution in
> reasonable time.

Hey, I wasn't suggesting anything like that :-)  I was just asking
about alignment.  It seemed to me that part of the problem is that
ARC wants to increase the length of an instruction in order to align
a following instruction (which I agree is a useful thing to support).
But as things stand, shorten_branches has no way of distinguishing that from
an instruction that was forced to grow because of offsets being out of range.

If instead shorten_branches knows that an instruction prefers a particular
alignment/misalignment, it can cooperate with the backend to extend earlier
instructions to make that possible.  If we do it that way, shorten_branches
knows what's going on.
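
To illustrate the kind of declarative interface I mean - this is not an
existing hook, and both the function and the predicate are invented:

/* Hypothetical: report that INSN prefers to start OFFSET bytes into an
   ALIGN-byte unit; shorten_branches would then be free to grow whichever
   earlier instruction best provides that.  */
static bool
arc_preferred_insn_position (rtx insn, int *align, int *offset)
{
  if (arc_wants_misalignment_p (insn))  /* invented predicate */
    {
      *align = 4;
      *offset = 2;
      return true;
    }
  return false;
}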

You quote lots of complications above; that's also my concern.  Directly
hooking into ADJUST_INSN_LENGTH restricts us to a very local peephole
optimisation with no view wider than the current instruction.  E.g. for:

      insn1
      insn2
      insn3 -- wants to be misaligned by 2 bytes

hooking directly into ADJUST_INSN_LENGTH doesn't make it easy to consider
using insn1 rather than insn2 to provide the misalignment.

>> ...it looks like arc_reorg already runs shorten_branches in a loop.
>> Is that right?  So in a sense we already have iteration involving
>> shorten_branches and the backend.
>
> Actually, that is mostly for instruction selection/splitting/combination
> for compare-and-branch/test-bit-and-branch instructions, which have
> very stringent offset requirements; these decisions need to be made
> (mostly; there is code to handle rare out-of-range offsets later, but it
>   makes code quality worse)
> before delayed branch scheduling and final conditional execution decisions.
>
> But the fine-tuning of instruction sizes and alignments happens later in
> the original branch shortening pass.

OK, but the same kind of iteration seems acceptable after delayed-branch
scheduling too.

>> I realise we want to cut down the number of iterations where possible,
>> but at the same time, it seems like you're trying to do conditional
>> execution conversion on the fly during shorten_branches, such that the
>> rtl doesn't actually represent the instructions we output.  And the
>> performance problem you're seeing is that those behind-the-scenes
>> changes are too confusing for the current shorten_branches code to
>> handle well.  E.g. sometimes you decide to "delete" branches that stay
>> in the rtl stream as branches, and that got a length in a previous
>> iteration.
>
> No, not really.  Within each branch shortening pass, these decisions
> are static; they may change when the branches are modified after
> early branch shortening.  They may also change after delay slot scheduling.
>
> If we want to get really precise with rtl, I have to alter it in every
> branch shortening iteration when the decision changes if a
> compare-and-branch / test-bit-and-branch is within range or not.

I was suggesting that once you've committed to something (which I think
in your proposal corresponds to the iteration count becoming too high)
that commitment should be reflected in the rtl.  That also stops the
decision from being accidentally reversed.

>> At some point, as you say, you have to commit to using a particular
>> form (conditionally executed or not) in order to avoid cycles.  But that
>> seems like a conditional execution decision rather than a shorten_branches
>> decision.  (It'd also be good to represent it in the rtl if possible.)
>
> Well, I could do this with additional machine-dependent rtl passes, but
> this is really a (costly, in terms of development time and maintenance)
> solution looking for a problem.
>
> The ARC ccfsm machinery was originally only run at instruction output
> time, and I modified it so that I can run it earlier to tell what changes
> it would make to the instructions and their lengths, to get better quality
> length predictions (most importantly, when optimizing for speed, it can
> at times increase code size slightly, which can be catastrophic when we
> committed to using a short branch at the edge of its offset range).
> Running the same code to make or predict decisions avoids the
> out-of-sync issues I would have with a separate, rtl-modifying copy.
> By not actually modifying rtl, I don't have to allocate new uids during branch
> shortening (which would require branch shortening to restart from scratch or
> reallocate arrays and somehow map different instructions and uids), nor do
> I have to convert rtl back as the input changes, with the hazard of
> growing the rtl each iteration.

But I think extending the FSM from just being an on-the-fly
transformation during final (as for ARM) to something that happens
earlier is a pretty major step in terms of the interface between the
backend and the rest of GCC.  There are then undocumented restrictions
on what hooks can be called when, which if broken could lead to the FSM
producing different results during final than it did earlier.

E.g., personally, I don't think it's acceptable for ADJUST_INSN_LENGTH
to hard-code an assumption that all instructions are processed in
assembly order, which the ARC one seems to (irrespective of this patch).
That is, whenever ARC's ADJUST_INSN_LENGTH is called, it seems to
"simulate" the instructions between the current instruction and the
one previously passed to ADJUST_INSN_LENGTH, so it will break if we
ever try to take lengths in a different order.

I talked earlier about doing predication on the fly, but the ARC hook also
does alignment optimisations on the fly.  AIUI, the alignment optimisations
can increase offsets, and increasing offsets interferes with the predication.
As mentioned above, my main problem with this is that you seem to be saying
"there are lots of complications, so we have to do everything at once",
but at the same time you're suggesting something that forces both the
target-independent and ARC code to process things in assembly order and
look at each instruction in isolation.  It makes it very difficult to do
any of the three optimisations (branch shortening, predication, and alignment)
in a less peephole way in future, such as by taking execution frequency
into account.

OTOH, I realise I'm probably not being helpful here.  I've said my piece,
so I'd better bow out.

Richard
Joern Rennecke Nov. 1, 2012, 6:47 p.m. UTC | #16
Quoting Richard Sandiford <rdsandiford@googlemail.com>:

> But I think extending the FSM from just being an on-the-fly
> transformation during final (as for ARM) to something that happens
> earlier is a pretty major step in terms of the interface between the
> backend and the rest of GCC.  There are then undocumented restrictions
> on what hooks can be called when,

Indeed.  I found I have to call extract_constrain_insn_cached to restore
recog_data (the struct, even though gdb will never believe this and insist
on that being a type)  if I have recognized other insns (by evaluating
an attribute) from a splitter predicate.
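
Roughly like this - the helper and its name are made up, only the two
calls are real:

/* Evaluating an attribute of another insn re-runs recognition and
   overwrites recog_data, so restore it for the insn at hand.  */
static bool
arc_other_insn_short_p (rtx insn, rtx other_insn)
{
  /* This recognizes and extracts OTHER_INSN behind the scenes.  */
  int other_length = get_attr_length (other_insn);
  /* Restore recog_data for the insn the splitter is actually about.  */
  extract_constrain_insn_cached (insn);
  return other_length <= 2;
}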

> which if broken could lead to the FSM
> producing different results during final than it did earlier.

Well, actually, I did find that this happens, and added code to either
compensate for changes or inhibit them.  For the most part, these changes
are wanted; we just need to manage them to avoid out-of-range branches
or ADDR_DIFF_VECs.
You could think of this as simulated annealing for ice: some provisions need
to be made to avoid rupturing the vessel, either by anticipating size
increases, containing pressure to limit size increase, or a combination
of both.

> E.g., personally, I don't think it's acceptable for ADJUST_INSN_LENGTH
> to hard-code an assumption that all instructions are processed in
> assembly order, which the ARC one seems to (irrespective of this patch).

It doesn't, at least not for correctness; you might argue that it depends
on this for compilation time performance.  It does a scan in round-robin
fashion from the last insn it processed till it reaches the one that is
to be processed.

But I did develop this stuff when passes were much more rigid than they
are today; I'll look into the feasibility of running the ccfsm in a
machine-specific pass to finalize in rtl whatever COND_EXEC opportunities
are visible at this stage, and calling it after preliminary compare-and-branch
determination, and after delay branch scheduling.  Maybe at some other places
too.
One complication is that sometimes a compare-and-branch decision has to be
reversed, so the most precise ccfsm description available before instruction
output is that the insn is unlikely to clobber the flags.  Thus, a
conditionalizing that depends
on the flags surviving through a compare-and-branch cannot be finalized till
the branch is safe.  Still, the conditionalizing is likely and we should
take length changes into account, provided that the compare-and-branch stays
in range.
I suppose I could add a note
REG_DEP_TRUE (unspec [<list of brcc insns required to be short for
conditionalizing>] UNSPEC_CCFSM)
for any instructions subject to such uncertain conditionalizations.
Or probably rather three UNSPEC kinds: one to indicate ccfsm-predicated
deletion (for a branch), one for general conditionalizing
(for arithmetic, branches and calls), and one for equality conditionalizing
(for branches, where wider offset ranges are available for short variants,
and for register zeroing, which has a not-equal conditionalized short form).
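
Attaching such a note would then look roughly like this; UNSPEC_CCFSM and
brcc_insn are of course port-specific placeholders:

/* Sketch: record that INSN's conditionalization depends on BRCC_INSN
   staying short.  */
add_reg_note (insn, REG_DEP_TRUE,
              gen_rtx_UNSPEC (VOIDmode, gen_rtvec (1, brcc_insn),
                              UNSPEC_CCFSM));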
Joern Rennecke Nov. 3, 2012, 2:22 p.m. UTC | #17
Quoting Richard Sandiford <rdsandiford@googlemail.com>:

> Joern Rennecke <joern.rennecke@embecosm.com> writes:

>> So, if we want to go with a declarative approach, each instruction can have
>> a plurality of variants, each described by a size, an instruction fetch
>> alignment base, an instruction fetch alignment offset set, a cache   
>> line base,
>> a cache line offset set, a cost as a branch destination - cached,
>> a cost as a branch destination - uncached, a fall-through predicted cost,
>> a fall-through mispredicted cost, a branch-taken-predicted cost,
>> a branch-taken-mispredicted cost.
>> Or, if we want to get into more detail, we should actually have scheduler
>> instruction reservation descriptions instead of singular costs.
>>
>> A variant's availability and costs will depend on branch offsets,
>> the presence, kind and size of delay slot instructions and further
>> length-sensitive context (like the set of paths from the function entry,
>> with aggregate instruction fetch sizes and probabilities), so you have
>> to call a target hook each iteration to get the currently available set
>> of variants.
>>
>> Then you have to see this in the context of relative execution frequencies
>> (for cached/uncached branch destinations), and branch probabilities and
>> the reliability of the probability information, and to-be-added annotations
>> about branch predictability, to arrive at a near-optimal solution in
>> reasonable time.
>
> Hey, I wasn't suggesting anything like that :-)  I was just asking
> about alignment.  It seemed to me that part of the problem is that
> ARC wants to increase the length of an instruction in order to align
> a following instruction (which I agree is a useful thing to support).
> But as things stand, shorten_branches has no way of distinguishing that from
> an instruction that was forced to grow because of offsets being out of range.

It takes a lot of effort to change the interface of an existing hook
that requires in-depth target knowledge to understand the ramifications
of the target's settings.  OTOH it is impossible to anticipate all the
things that various targets - or even your own, as it evolves over time -
will want from the interface in the future.

Therefore, I think we want an interface that naturally supports extensibility
while supporting backward compatibility.

My current plan is to have one hook to be called at the start of
shorten_branches.

This hook gets passed a pointer to a zeroed-out data structure.
The target may fill in various fields, some of them, or leave them all
blank.  Some fields might indicate that other fields are valid, and thus
have to be populated with valid data.
I'll refer to this hook for now as TARGET_INSN_LENGTHS_PARAMETERS,
and to the type that the passed pointer points to as insn_length_parameters_t.

The return value is a pointer to a function to be used for adjusting
instruction lengths in the place of ADJUST_INSN_LENGTH - and where that
macro is missing - or zero if no such function is required.
(I first thought of having separate target hooks for everything, but then
I realized that'd just be a waste of space in struct gcc_target.)

I can make a simple ADJUST_INSN_LENGTH backward-compatible targhooks.c
implementation, or a full conversion - but please don't make me waste
another day implementing and testing the latter only to throw it away
again.

One setting that I want to support for ARC is an alignment unit size
and left shift, and a function pointer to be called on each instruction
to provide instruction variants with various alignment offsets
(0 or 1 times the unit size of 2), sizes, fall-through costs,
branch-taken costs, and branch-target costs.

The instruction alternatives would have to be allocated zeroed
(or provided zeroed - I haven't worked this out yet), and
flags set beforehand by TARGET_INSN_LENGTH_PARAMETERS might indicate
additional things that the port is cognizant of and fills in data
for in the instruction alternatives.
Thus, ports that provide additional information could be catered for
with refined algorithms, while other ports that don't provide this
information get sensible defaults for the data they don't provide.
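
In outline, what I have in mind looks something like the sketch below;
every name and field is provisional, and insn_variant stands for the kind
of per-variant record I described earlier in the thread:

/* Provisional sketch only.  */
typedef int (*adjust_length_func) (rtx, int, bool, int *);

typedef struct insn_length_parameters
{
  int align_unit_size;       /* e.g. 2 for ARC */
  int align_unit_log2;       /* the same, as a left shift count */
  unsigned int flags;        /* says which optional data the port supplies */
  /* Fill in up to MAX pre-zeroed variants (size, alignment offset,
     fall-through / branch-taken / branch-target costs) for INSN and
     return how many were provided.  */
  int (*get_variants) (rtx insn, struct insn_variant *variants, int max);
} insn_length_parameters_t;

/* What the ARC implementation of the hook might then do; the two
   functions it installs and returns are hypothetical.  */
static adjust_length_func
arc_insn_length_parameters (insn_length_parameters_t *params)
{
  params->align_unit_size = 2;
  params->align_unit_log2 = 1;
  params->get_variants = arc_get_insn_variants;
  return arc_adjust_insn_length;
}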
Joern Rennecke Nov. 4, 2012, 9:10 p.m. UTC | #18
Quoting Joern Rennecke <joern.rennecke@embecosm.com>:

> The return value is a pointer to a function to be used for adjusting
> instruction lengths in the place of ADJUST_INSN_LENGTH - and where that
> macro is missing - or zero if no such function is required.
> (I first thought of having separate target hooks for everything, but then
> I realized that'd just be a waste of space in struct gcc_target.)

Hmm, that bit doesn't work too well, because we have earlier optimizers that
want to know insn_min_length, and getting a pointer to the adjustment function
initialized would be messy.  So it's two hooks then, again with a separate
adjust_insn_length hook.  Although I'm not sure if it makes sense to apply
that during branch shortening on instructions with multiple variants.
It's probably better to let the target incorporate that in the variants
generation.
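
So the revised shape would be roughly this - still provisional names,
with the second entry matching the hook added by the patch below:

/* Stand-in for the two members that would be added to struct gcc_target.  */
struct length_hooks_sketch
{
  /* Called at the start of shorten_branches to fill in *PARAMS.  */
  void (*insn_length_parameters) (insn_length_parameters_t *params);
  /* Per-insn adjustment; must also be available to earlier passes that
     query insn lengths.  */
  int (*adjust_insn_length) (rtx insn, int length, bool in_delay_sequence,
                             int *iteration_threshold);
};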
diff mbox

Patch

diff -drup gcc-192879-haveattr/doc/tm.texi gcc/doc/tm.texi
--- gcc-192879-haveattr/doc/tm.texi	2012-10-28 01:07:38.463469923 +0000
+++ gcc/doc/tm.texi	2012-10-28 01:31:15.522227927 +0000
@@ -11341,3 +11341,7 @@  This value should be set if the result w
 @deftypevrx {Target Hook} {bool} TARGET_HAVE_ATTR_LENGTH
 These flags are automatically generated;  you should not override them in tm.c:
 @end deftypevr
+
+@deftypefn {Target Hook} int TARGET_ADJUST_INSN_LENGTH (rtx @var{insn}, int @var{length}, bool @var{in_delay_sequence}, int *@var{iteration_threshold})
+Return an adjusted length for @var{insn}.  @var{length} is the value that has been calculated using the @code{length} instruction attribute.  @var{in_delay_sequence} is true if @var{insn} forms part of a delay sequence.  *@var{iteration_threshold} specifies the number of branch shortening iterations before length decreases are inhibited.  The default implementation uses @code{ADJUST_INSN_LENGTH}, if defined.
+@end deftypefn
diff -drup gcc-192879-haveattr/doc/tm.texi.in gcc/doc/tm.texi.in
--- gcc-192879-haveattr/doc/tm.texi.in	2012-10-28 01:07:38.489470252 +0000
+++ gcc/doc/tm.texi.in	2012-10-28 01:31:55.578745701 +0000
@@ -11175,3 +11175,5 @@  memory model bits are allowed.
 @hook TARGET_ATOMIC_TEST_AND_SET_TRUEVAL
 
 @hook TARGET_HAVE_CC0
+
+@hook TARGET_ADJUST_INSN_LENGTH
diff -drup gcc-192879-haveattr/final.c gcc/final.c
--- gcc-192879-haveattr/final.c	2012-10-28 01:07:38.492470288 +0000
+++ gcc/final.c	2012-10-28 15:24:49.731169660 +0000
@@ -424,10 +424,8 @@  get_attr_length_1 (rtx insn, int (*fallb
 	break;
       }
 
-#ifdef ADJUST_INSN_LENGTH
-  ADJUST_INSN_LENGTH (insn, length);
-#endif
-  return length;
+  int dummy = -1;
+  return targetm.adjust_insn_length (insn, length, false, &dummy);
 }
 
 /* Obtain the current length of an insn.  If branch shortening has been done,
@@ -827,6 +825,56 @@  struct rtl_opt_pass pass_compute_alignme
 };
 
 
+/* Helper function for shorten_branches, to apply the adjust_insn_length
+   target hook, and lock in length increases in order to avoid infinite
+   iteration cycles.
+   INSN is the instruction being processed, and NEW_LENGTH is the length
+   that has been calculated for it from its length attribute.  UID is
+   its UID.  SEQ_P indicates if it is inside a delay slot sequence.
+   INSN_LENGTHS and UID_LOCK_LENGTH are the arrays holding the current lengths
+   and locked-in maximum lengths from the previous iterations, and are
+   updated for UID when appropriate.
+   *NITER contains the number of iterations since we last made a change
+   to uid_lock_length.
+   *SOMETHING_CHANGED will be set if we make a change to INSN_LENGTHS.
+   The return value is the new length taking the result of the
+   adjust_insn_length and uid_lock_length into account.
+   Note that when *NITER is negative, no length locking is effective, and
+   the new lengths are just assigned.  This is used in the initial pass,
+   and when only making a single pass to update instruction length downward.  */
+static int
+adjust_length (rtx insn, int new_length, int uid, int *insn_lengths,
+	       int *uid_lock_length, bool seq_p, int *niter,
+	       bool *something_changed)
+
+{
+  int iter_threshold = 0;
+  new_length
+    = targetm.adjust_insn_length (insn, new_length, seq_p, &iter_threshold);
+  if (new_length < 0)
+    fatal_insn ("negative insn length", insn);
+  if (iter_threshold < 0)
+    fatal_insn ("negative shorten_branches iteration threshold", insn);
+  if (new_length != insn_lengths[uid])
+    {
+      if (new_length < uid_lock_length[uid])
+	new_length = uid_lock_length[uid];
+      if (new_length == insn_lengths[uid])
+	; /* done here.  */
+      else if (*niter < iter_threshold
+	       || new_length > insn_lengths[uid])
+	{
+	  if (*niter >= iter_threshold)
+	    uid_lock_length[uid] = new_length, *niter = 0;
+	  insn_lengths[uid] = new_length;
+	  *something_changed = true;
+	}
+      else
+	new_length = insn_lengths[uid];
+    }
+  return new_length;
+}
+
 /* Make a pass over all insns and compute their actual lengths by shortening
    any branches of variable length if possible.  */
 
@@ -848,7 +896,7 @@  shorten_branches (rtx first)
   int max_skip;
 #define MAX_CODE_ALIGN 16
   rtx seq;
-  int something_changed = 1;
+  bool something_changed = true;
   char *varying_length;
   rtx body;
   int uid;
@@ -964,7 +1012,8 @@  shorten_branches (rtx first)
     return;
 
   /* Allocate the rest of the arrays.  */
-  insn_lengths = XNEWVEC (int, max_uid);
+  insn_lengths = XCNEWVEC (int, max_uid);
+  int *uid_lock_length = XCNEWVEC (int, max_uid);
   insn_lengths_max_uid = max_uid;
   /* Syntax errors can lead to labels being outside of the main insn stream.
      Initialize insn_addresses, so that we get reproducible results.  */
@@ -1065,6 +1114,7 @@  shorten_branches (rtx first)
 
   /* Compute initial lengths, addresses, and varying flags for each insn.  */
   int (*length_fun) (rtx) = increasing ? insn_min_length : insn_default_length;
+  int niter = increasing ? 0 : -1;
 
   for (insn_current_address = 0, insn = first;
        insn != 0;
@@ -1072,8 +1122,6 @@  shorten_branches (rtx first)
     {
       uid = INSN_UID (insn);
 
-      insn_lengths[uid] = 0;
-
       if (LABEL_P (insn))
 	{
 	  int log = LABEL_TO_ALIGNMENT (insn);
@@ -1093,6 +1141,8 @@  shorten_branches (rtx first)
       if (INSN_DELETED_P (insn))
 	continue;
 
+      int length = 0;
+
       body = PATTERN (insn);
       if (GET_CODE (body) == ADDR_VEC || GET_CODE (body) == ADDR_DIFF_VEC)
 	{
@@ -1100,13 +1150,12 @@  shorten_branches (rtx first)
 	     section.  */
 	  if (JUMP_TABLES_IN_TEXT_SECTION
 	      || readonly_data_section == text_section)
-	    insn_lengths[uid] = (XVECLEN (body,
-					  GET_CODE (body) == ADDR_DIFF_VEC)
-				 * GET_MODE_SIZE (GET_MODE (body)));
+	    length = (XVECLEN (body, GET_CODE (body) == ADDR_DIFF_VEC)
+		      * GET_MODE_SIZE (GET_MODE (body)));
 	  /* Alignment is handled by ADDR_VEC_ALIGN.  */
 	}
       else if (GET_CODE (body) == ASM_INPUT || asm_noperands (body) >= 0)
-	insn_lengths[uid] = asm_insn_count (body) * insn_default_length (insn);
+	length = asm_insn_count (body) * insn_default_length (insn);
       else if (GET_CODE (body) == SEQUENCE)
 	{
 	  int i;
@@ -1134,7 +1183,10 @@  shorten_branches (rtx first)
 	      else
 		inner_length = inner_length_fun (inner_insn);
 
-	      insn_lengths[inner_uid] = inner_length;
+	      inner_length
+		= adjust_length (insn, inner_length, inner_uid, insn_lengths,
+				 uid_lock_length, true, &niter,
+				 &something_changed);
 	      if (const_delay_slots)
 		{
 		  if ((varying_length[inner_uid]
@@ -1145,21 +1197,18 @@  shorten_branches (rtx first)
 		}
 	      else
 		varying_length[inner_uid] = 0;
-	      insn_lengths[uid] += inner_length;
+	      length += inner_length;
 	    }
 	}
       else if (GET_CODE (body) != USE && GET_CODE (body) != CLOBBER)
 	{
-	  insn_lengths[uid] = length_fun (insn);
+	  length = length_fun (insn);
 	  varying_length[uid] = insn_variable_length_p (insn);
 	}
-
+	
       /* If needed, do any adjustment.  */
-#ifdef ADJUST_INSN_LENGTH
-      ADJUST_INSN_LENGTH (insn, insn_lengths[uid]);
-      if (insn_lengths[uid] < 0)
-	fatal_insn ("negative insn length", insn);
-#endif
+      adjust_length (insn, length, uid, insn_lengths, uid_lock_length,
+		     false, &niter, &something_changed);
     }
 
   /* Now loop over all the insns finding varying length insns.  For each,
@@ -1168,16 +1217,16 @@  shorten_branches (rtx first)
 
   while (something_changed)
     {
-      something_changed = 0;
+      if (increasing)
+	niter++;
+
+      something_changed = false;
       insn_current_align = MAX_CODE_ALIGN - 1;
       for (insn_current_address = 0, insn = first;
 	   insn != 0;
 	   insn = NEXT_INSN (insn))
 	{
 	  int new_length;
-#ifdef ADJUST_INSN_LENGTH
-	  int tmp_length;
-#endif
 	  int length_align;
 
 	  uid = INSN_UID (insn);
@@ -1363,17 +1412,13 @@  shorten_branches (rtx first)
 		  if (! varying_length[inner_uid])
 		    inner_length = insn_lengths[inner_uid];
 		  else
-		    inner_length = insn_current_length (inner_insn);
-
-		  if (inner_length != insn_lengths[inner_uid])
 		    {
-		      if (!increasing || inner_length > insn_lengths[inner_uid])
-			{
-			  insn_lengths[inner_uid] = inner_length;
-			  something_changed = 1;
-			}
-		      else
-			inner_length = insn_lengths[inner_uid];
+		      inner_length = insn_current_length (inner_insn);
+
+		      inner_length
+			= adjust_length (insn, inner_length, inner_uid,
+					 insn_lengths, uid_lock_length, true,
+					 &niter, &something_changed);
 		    }
 		  insn_current_address += inner_length;
 		  new_length += inner_length;
@@ -1385,21 +1430,13 @@  shorten_branches (rtx first)
 	      insn_current_address += new_length;
 	    }
 
-#ifdef ADJUST_INSN_LENGTH
 	  /* If needed, do any adjustment.  */
-	  tmp_length = new_length;
-	  ADJUST_INSN_LENGTH (insn, new_length);
+	  int tmp_length = new_length;
+	  new_length
+	    = adjust_length (insn, new_length, uid, insn_lengths,
+			     uid_lock_length, false, &niter,
+			     &something_changed);
 	  insn_current_address += (new_length - tmp_length);
-#endif
-
-	  if (new_length != insn_lengths[uid]
-	      && (!increasing || new_length > insn_lengths[uid]))
-	    {
-	      insn_lengths[uid] = new_length;
-	      something_changed = 1;
-	    }
-	  else
-	    insn_current_address += insn_lengths[uid] - new_length;
 	}
       /* For a non-optimizing compile, do only a single pass.  */
       if (!increasing)
diff -drup gcc-192879-haveattr/target.def gcc/target.def
--- gcc-192879-haveattr/target.def	2012-10-28 01:07:38.517470606 +0000
+++ gcc/target.def	2012-10-28 01:31:15.525227979 +0000
@@ -2818,6 +2818,17 @@  DEFHOOK
  enum unwind_info_type, (void),
  default_debug_unwind_info)
 
+DEFHOOK
+(adjust_insn_length,
+ "Return an adjusted length for @var{insn}.  @var{length} is the value that\
+ has been calculated using the @code{length} instruction attribute. \
+ @var{in_delay_sequence} is true if @var{insn} forms part of a delay sequence. \
+ *@var{iteration_threshold} specifies the number of branch shortening\
+ iterations before length decreases are inhibited.  The default\
+ implementation uses @code{ADJUST_INSN_LENGTH}, if defined.",
+ int, (rtx insn, int length, bool in_delay_sequence, int *iteration_threshold),
+ default_adjust_insn_length)
+
 DEFHOOKPOD
 (atomic_test_and_set_trueval,
  "This value should be set if the result written by\
diff -drup gcc-192879-haveattr/targhooks.c gcc/targhooks.c
--- gcc-192879-haveattr/targhooks.c	2012-10-25 03:33:26.000000000 +0100
+++ gcc/targhooks.c	2012-10-28 01:41:35.082931024 +0000
@@ -1540,4 +1540,20 @@  default_member_type_forces_blk (const_tr
   return false;
 }
 
+/* Default version of adjust_insn_length.  */
+
+int
+default_adjust_insn_length (rtx insn ATTRIBUTE_UNUSED, int length,
+			    bool in_delay_sequence, int *iter_threshold)
+{
+  if (in_delay_sequence)
+    *iter_threshold = INT_MAX;
+#ifdef ADJUST_INSN_LENGTH
+  else
+    ADJUST_INSN_LENGTH (insn, length);
+#endif
+  return length;
+}
+
+
 #include "gt-targhooks.h"
diff -drup gcc-192879-haveattr/targhooks.h gcc/targhooks.h
--- gcc-192879-haveattr/targhooks.h	2012-10-25 03:33:26.000000000 +0100
+++ gcc/targhooks.h	2012-10-28 01:31:15.479227682 +0000
@@ -193,3 +193,4 @@  extern const char *default_pch_valid_p (
 extern void default_asm_output_ident_directive (const char*);
 
 extern bool default_member_type_forces_blk (const_tree, enum machine_mode);
+extern int default_adjust_insn_length (rtx, int, bool, int *);