
RFA: add lock_length attribute to break branch-shortening cycles

Message ID 20121004103317.5xf2l3q6g44o04kc-nzlynne@webmail.spamcop.net
State New

Commit Message

Joern Rennecke Oct. 4, 2012, 2:33 p.m. UTC
Quoting Richard Guenther <richard.guenther@gmail.com>:

> I miss a few things in this description:
> - what is the value of lock_length supposed to be?  From the "lock"
>   prefix it sounds like it is something unchanging, maybe even constant,
>   thus a maximum?
> - the length attribute still needs to be specified when lock_length is?
>   how do they relate?  Is lock_length always smaller / bigger than length?

I hope I have clarified this in the updated documentation:

  Usually, branch shortening is done assuming the worst case (i.e.
longest) lengths, and then iterating (if optimizing) to smaller lengths
until no further changes occur.  This does not work so well for
architectures that have very small minimum offsets and considerable
jumps in instruction lengths.

  If you define the `lock_length' attribute, branch shortening will work
the other way round: it starts out assuming minimum instruction lengths
and iterates from there.  `lock_length' specifies an instruction length
value that is calculated like `length' in every iteration, but if the
value at the last iteration was larger, that larger previous value will
be used instead.  The value computed for the `length' attribute will be
no smaller than that of the `lock_length' attribute, but you may still
specify a larger value, and in that case `length' can decrease in the
next iteration, but not below the value of `lock_length'.  The easiest
way to make sure branch shortening doesn't start looping indefinitely
is to define the `length' attribute to only a minimum instruction
length for varying length instructions and let the `lock_length'
attribute do all the heavy lifting of computing varying lengths.  On
the other hand, for some instruction sets you might get better
shortening outcomes if you use a more complex `length' attribute and
use `lock_length' only to the extent required to prevent indefinite
looping.  Note that `ADJUST_INSN_LENGTH' applies only to the `length'
attribute.

  Because defining `lock_length' makes branch shortening start with
optimistic assumptions, the iteration has to be seen through to the end
to eliminate all out-of-range branches; thus branch shortening might
take a bit longer at `-O0' than if you didn't define the `lock_length'
attribute.

  Using `lock_length' with varying delay slots and varying length delay
slot insns (all at once) is currently not supported.
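
For illustration (this sketch is not part of the patch; the mnemonics,
lengths and the 256-byte range are made up, and it assumes the port
declares `lock_length' as a numeric attribute, see below), a pattern
following the "easiest way" above would give `length' just the minimum
and put the actual range check into `lock_length':

    (define_insn "jump"
      [(set (pc)
            (label_ref (match_operand 0 "" "")))]
      ""
      "*
    {
      return (get_attr_length (insn) == 2
              ? \"bra %l0\" : \"jmp %l0\");
    }"
      [(set (attr "length") (const_int 2))      ; always the minimum
       (set (attr "lock_length")
            (if_then_else (lt (abs (minus (pc) (match_dup 0)))
                              (const_int 256))
                          (const_int 2)         ; short branch reaches
                          (const_int 6)))])     ; long form, never shrunk

Branch shortening starts out assuming 2 bytes here; once an iteration
finds the target out of range, `lock_length' (and hence `length')
becomes 6 and stays there.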

> - what happens if you have patterns with lock_length and patterns without?
> - what patterns does lock_length apply to?

These questions don't really make literal sense; instruction attributes
are defined for all instructions.  Of course, you can define the default
value to 0, which means you see no effect of this attribute on a pattern
unless some non-default definition applies.
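
For example (again just a sketch, not taken from the patch), a single
declaration gives every pattern a `lock_length' of 0 unless a pattern
overrides it:

    (define_attr "lock_length" "" (const_int 0))

Patterns that keep the default of 0 are never constrained by the
attribute; only patterns that override it take part in the
"never shrink again" behaviour.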

> In general optimistically attacking this kind of problem should be always
> better - did you try simply switching this for all targets?

No, I haven't.  The boundaries to be set for branch offsets can be different
when starting with optimistic assumptions than when starting with
pessimistic assumptions.  Moreover, there could be 'interesting'
interactions with ADJUST_INSN_LENGTH.  And there is the problem with
handling varying
delay slots.  I have modified the final.c patch to reduce the scope of this
issue so that lock_length should hopefully be useful for a wider range of
targets.

Also, ...

>  It shouldn't be
> slower and the only thing you need to guarantee is that during iteration
> you never make insn-lenghts smaller again.

At -O0 it is slower because we can't finish the loop early.

For all these reasons, I think it is better if each target maintainer
evaluates whether the better branch shortening outweighs the longer -O0
compilation time, and addresses any issues that arise if/when converting.

Bootstrapped on i686-pc-linux-gnu

Patch

Index: doc/md.texi
===================================================================
--- doc/md.texi	(revision 192036)
+++ doc/md.texi	(working copy)
@@ -8004,6 +8004,42 @@  (define_insn "jump"
                       (const_int 6)))])
 @end smallexample
 
+@cindex lock_length
+Usually, branch shortening is done assuming the worst case (i.e. longest)
+lengths, and then iterating (if optimizing) to smaller lengths until
+no further changes occur.  This does not work so well for architectures
+that have very small minimum offsets and considerable jumps in instruction
+lengths.
+
+If you define the @code{lock_length} attribute, branch shortening will
+work the other way round: it starts out assuming minimum instruction
+lengths and iterates from there.  @code{lock_length} specifies an instruction
+length value that is calculated like @code{length} in every iteration,
+but if the value at the last iteration was larger, that larger previous
+value will be used instead.
+The value computed for the @code{length} attribute will be no smaller
+than that of the @code{lock_length} attribute, but you may still specify
+a larger value, and in that case @code{length} can decrease in the next
+iteration, but not below the value of @code{lock_length}.
+The easiest way to make sure branch shortening doesn't start looping
+indefinitely is to define the @code{length} attribute to only a minimum
+instruction length for varying length instructions and let the
+@code{lock_length} attribute do all the heavy lifting of computing varying
+lengths.  On the other hand, for some instruction sets you might get
+better shortening outcomes if you use a more complex @code{length} attribute
+and use @code{lock_length} only to the extent required to prevent indefinite
+looping.  Note that @code{ADJUST_INSN_LENGTH} applies only to the
+@code{length} attribute.
+
+Because defining @code{lock_length} makes branch shortening start with
+optimistic assumptions, the iteration has to be seen through to the end
+to eliminate all out-of-range branches; thus branch shortening might take
+a bit longer at @option{-O0} than if you didn't define the
+@code{lock_length} attribute.
+
+Using @code{lock_length} with varying delay slots and varying length delay
+slot insns (all at once) is currently not supported.
+
 @end ifset
 @ifset INTERNALS
 @node Constant Attributes
Index: final.c
===================================================================
--- final.c	(revision 192036)
+++ final.c	(working copy)
@@ -312,6 +312,7 @@  dbr_sequence_length (void)
    `insn_current_length'.  */
 
 static int *insn_lengths;
+static char *uid_lock_length;
 
 VEC(int,heap) *insn_addresses_;
 
@@ -447,6 +448,20 @@  get_attr_length (rtx insn)
   return get_attr_length_1 (insn, insn_default_length);
 }
 
+#ifdef HAVE_ATTR_lock_length
+int
+get_attr_lock_length (rtx insn)
+{
+  if (uid_lock_length && insn_lengths_max_uid > INSN_UID (insn))
+    return uid_lock_length[INSN_UID (insn)];
+  return get_attr_length_1 (insn, insn_min_lock_length);
+}
+#define INSN_VARIABLE_LENGTH_P(INSN) \
+  (insn_variable_length_p (INSN) || insn_variable_lock_length_p (INSN))
+#else
+#define INSN_VARIABLE_LENGTH_P(INSN) (insn_variable_length_p (INSN))
+#endif
+
 /* Obtain the current length of an insn.  If branch shortening has been done,
    get its actual length.  Otherwise, get its minimum length.  */
 int
@@ -865,6 +880,7 @@  shorten_branches (rtx first ATTRIBUTE_UN
   rtx body;
   int uid;
   rtx align_tab[MAX_CODE_ALIGN];
+  int (*length_fun) (rtx);
 
 #endif
 
@@ -875,6 +891,10 @@  shorten_branches (rtx first ATTRIBUTE_UN
   free (uid_shuid);
 
   uid_shuid = XNEWVEC (int, max_uid);
+#ifdef HAVE_ATTR_lock_length
+  uid_lock_length = XNEWVEC (char, max_uid);
+  memset (uid_lock_length, 0, max_uid);
+#endif
 
   if (max_labelno != max_label_num ())
     {
@@ -1067,6 +1087,10 @@  shorten_branches (rtx first ATTRIBUTE_UN
 #endif /* CASE_VECTOR_SHORTEN_MODE */
 
   /* Compute initial lengths, addresses, and varying flags for each insn.  */
+  length_fun = insn_default_length;
+#ifdef HAVE_ATTR_lock_length
+  length_fun = insn_min_length;
+#endif
   for (insn_current_address = 0, insn = first;
        insn != 0;
        insn_current_address += insn_lengths[uid], insn = NEXT_INSN (insn))
@@ -1131,26 +1155,36 @@  shorten_branches (rtx first ATTRIBUTE_UN
 		inner_length = (asm_insn_count (PATTERN (inner_insn))
 				* insn_default_length (inner_insn));
 	      else
-		inner_length = insn_default_length (inner_insn);
+		inner_length = length_fun (inner_insn);
 
 	      insn_lengths[inner_uid] = inner_length;
 	      if (const_delay_slots)
 		{
 		  if ((varying_length[inner_uid]
-		       = insn_variable_length_p (inner_insn)) != 0)
+		       = INSN_VARIABLE_LENGTH_P (inner_insn)) != 0)
 		    varying_length[uid] = 1;
 		  INSN_ADDRESSES (inner_uid) = (insn_current_address
 						+ insn_lengths[uid]);
 		}
 	      else
-		varying_length[inner_uid] = 0;
+		{
+		  /* We'd need to make this code more complicated
+		     to properly support non-const-delay-slots with the
+		     lock_length attribute.
+		     We would have to make sure we don't end up with an
+		     over-optimistic length assumption that causes an
+		     out-of-range offset.  */
+		  if (INSN_VARIABLE_LENGTH_P (inner_insn) != 0)
+		    gcc_assert (length_fun == &insn_default_length);
+		  varying_length[inner_uid] = 0;
+		}
 	      insn_lengths[uid] += inner_length;
 	    }
 	}
       else if (GET_CODE (body) != USE && GET_CODE (body) != CLOBBER)
 	{
-	  insn_lengths[uid] = insn_default_length (insn);
-	  varying_length[uid] = insn_variable_length_p (insn);
+	  insn_lengths[uid] = length_fun (insn);
+	  varying_length[uid] = INSN_VARIABLE_LENGTH_P (insn);
 	}
 
       /* If needed, do any adjustment.  */
@@ -1220,6 +1254,7 @@  shorten_branches (rtx first ATTRIBUTE_UN
 	      rtx prev;
 	      int rel_align = 0;
 	      addr_diff_vec_flags flags;
+	      enum machine_mode vec_mode;
 
 	      /* Avoid automatic aggregate initialization.  */
 	      flags = ADDR_DIFF_VEC_FLAGS (body);
@@ -1298,9 +1333,15 @@  shorten_branches (rtx first ATTRIBUTE_UN
 		  else
 		    max_addr += align_fuzz (max_lab, rel_lab, 0, 0);
 		}
-	      PUT_MODE (body, CASE_VECTOR_SHORTEN_MODE (min_addr - rel_addr,
-							max_addr - rel_addr,
-							body));
+	      vec_mode = CASE_VECTOR_SHORTEN_MODE (min_addr - rel_addr,
+						   max_addr - rel_addr, body);
+	      if (!uid_lock_length
+		  || !uid_lock_length[uid]
+		  || (GET_MODE_SIZE (vec_mode)
+		      >= GET_MODE_SIZE (GET_MODE (body))))
+		PUT_MODE (body, vec_mode);
+	      if (uid_lock_length)
+		uid_lock_length[uid] = 1;
 	      if (JUMP_TABLES_IN_TEXT_SECTION
 		  || readonly_data_section == text_section)
 		{
@@ -1360,18 +1401,44 @@  shorten_branches (rtx first ATTRIBUTE_UN
 		  else
 		    inner_length = insn_current_length (inner_insn);
 
-		  if (inner_length != insn_lengths[inner_uid])
+		  /* We can't record lengths of delay slot insns.  */
+		  if (i == 0 && inner_length != insn_lengths[inner_uid])
 		    {
-		      insn_lengths[inner_uid] = inner_length;
-		      something_changed = 1;
+#ifdef HAVE_ATTR_lock_length
+		      int lock_length = insn_current_lock_length (inner_insn);
+
+		      if (lock_length > uid_lock_length[inner_uid])
+			uid_lock_length[inner_uid] = lock_length;
+		      else
+			lock_length = uid_lock_length[inner_uid];
+		      if (inner_length < lock_length)
+			inner_length = lock_length;
+		      if (inner_length != insn_lengths[inner_uid])
+#endif
+			{
+			  insn_lengths[inner_uid] = inner_length;
+			  something_changed = 1;
+			}
 		    }
-		  insn_current_address += insn_lengths[inner_uid];
+		  insn_current_address += inner_length;
 		  new_length += inner_length;
 		}
 	    }
 	  else
 	    {
 	      new_length = insn_current_length (insn);
+#ifdef HAVE_ATTR_lock_length
+		{
+		  int lock_length = insn_current_lock_length (insn);
+
+		  if (lock_length > uid_lock_length[uid])
+		    uid_lock_length[uid] = lock_length;
+		  else
+		    lock_length = uid_lock_length[uid];
+		  if (new_length < lock_length)
+		    new_length = lock_length;
+		}
+#endif
 	      insn_current_address += new_length;
 	    }
 
@@ -1388,12 +1455,17 @@  shorten_branches (rtx first ATTRIBUTE_UN
 	      something_changed = 1;
 	    }
 	}
+#ifndef HAVE_ATTR_lock_length
       /* For a non-optimizing compile, do only a single pass.  */
       if (!optimize)
 	break;
+#endif
     }
 
   free (varying_length);
+  if (uid_lock_length)
+    free (uid_lock_length);
+  uid_lock_length = 0;
 
 #endif /* HAVE_ATTR_length */
 }
Index: genattr.c
===================================================================
--- genattr.c	(revision 192036)
+++ genattr.c	(working copy)
@@ -71,6 +71,10 @@  extern int insn_default_length (rtx);\n\
 extern int insn_min_length (rtx);\n\
 extern int insn_variable_length_p (rtx);\n\
 extern int insn_current_length (rtx);\n\n\
+extern int insn_default_lock_length (rtx);\n\
+extern int insn_min_lock_length (rtx);\n\
+extern int insn_variable_lock_length_p (rtx);\n\
+extern int insn_current_lock_length (rtx);\n\n\
 #include \"insn-addr.h\"\n");
     }
 }
Index: genattrtab.c
===================================================================
--- genattrtab.c	(revision 192036)
+++ genattrtab.c	(working copy)
@@ -242,6 +242,7 @@  struct attr_value_list **insn_code_value
 
 static const char *alternative_name;
 static const char *length_str;
+static const char *lock_length_str;
 static const char *delay_type_str;
 static const char *delay_1_0_str;
 static const char *num_delay_slots_str;
@@ -1541,14 +1542,14 @@  substitute_address (rtx exp, rtx (*no_ad
   */
 
 static void
-make_length_attrs (void)
+make_length_attrs (const char **base)
 {
   static const char *new_names[] =
     {
-      "*insn_default_length",
-      "*insn_min_length",
-      "*insn_variable_length_p",
-      "*insn_current_length"
+      "*insn_default_%s",
+      "*insn_min_%s",
+      "*insn_variable_%s_p",
+      "*insn_current_%s"
     };
   static rtx (*const no_address_fn[]) (rtx)
     = {identity_fn,identity_fn, zero_fn, zero_fn};
@@ -1561,7 +1562,7 @@  make_length_attrs (void)
 
   /* See if length attribute is defined.  If so, it must be numeric.  Make
      it special so we don't output anything for it.  */
-  length_attr = find_attr (&length_str, 0);
+  length_attr = find_attr (base, 0);
   if (length_attr == 0)
     return;
 
@@ -1574,11 +1575,14 @@  make_length_attrs (void)
   /* Make each new attribute, in turn.  */
   for (i = 0; i < ARRAY_SIZE (new_names); i++)
     {
-      make_internal_attr (new_names[i],
+      const char *p = attr_printf (strlen (new_names[i]) - 2 + strlen (*base),
+				   new_names[i], *base);
+
+      make_internal_attr (p,
 			  substitute_address (length_attr->default_val->value,
 					      no_address_fn[i], address_fn[i]),
 			  ATTR_NONE);
-      new_attr = find_attr (&new_names[i], 0);
+      new_attr = find_attr (&p, 0);
       for (av = length_attr->first_value; av; av = av->next)
 	for (ie = av->first_insn; ie; ie = ie->next)
 	  {
@@ -5178,6 +5182,7 @@  main (int argc, char **argv)
 
   alternative_name = DEF_ATTR_STRING ("alternative");
   length_str = DEF_ATTR_STRING ("length");
+  lock_length_str = DEF_ATTR_STRING ("lock_length");
   delay_type_str = DEF_ATTR_STRING ("*delay_type");
   delay_1_0_str = DEF_ATTR_STRING ("*delay_1_0");
   num_delay_slots_str = DEF_ATTR_STRING ("*num_delay_slots");
@@ -5274,7 +5279,8 @@  main (int argc, char **argv)
       fill_attr (attr);
 
   /* Construct extra attributes for `length'.  */
-  make_length_attrs ();
+  make_length_attrs (&length_str);
+  make_length_attrs (&lock_length_str);
 
   /* Perform any possible optimizations to speed up compilation.  */
   optimize_attrs ();