Patchwork RFA: fix dbr_schedule to leave unique ids unique

login
register
mail settings
Submitter Joern Rennecke
Date Oct. 19, 2012, 3:22 p.m.
Message ID <20121019112258.f139lurwg0g8w8gk-nzlynne@webmail.spamcop.net>
Download mbox | patch
Permalink /patch/192735/
State New
Headers show

Comments

Joern Rennecke - Oct. 19, 2012, 3:22 p.m.
Quoting Richard Sandiford <rdsandiford@googlemail.com>:

> Joern Rennecke <joern.rennecke@embecosm.com> writes:
>> Quoting Richard Sandiford <rdsandiford@googlemail.com>:
>>> The fact that we even have shared unique ids is pretty bad -- and surely
>>> a contradiction in terms -- but I think both ways of handling them rely
>>> on the length being the same for all copies.  If we don't record a length
>>> (your version), we won't set something_changed if the length of this copy
>>> did in fact change, which could lead to branches remaining too short.
>>
>> That is not the case, as the current length of the inner insn is added to
>> new_length (of the outer insn).
>>
>>> If we do record a length (current version),
>>
>> In the current version, the length of the inner insn is calculated anew
>> each iteration, so only the out-of-sequence copy suffers from the bogosity.
>>
>>> we could end up changing
>>> the length of previous copies that have already been processed in
>>> this iteration.
>>
>> Both that, and also using the length or the previous insn for the inner
>> length calculation, and ratcheting them up together.
>> In fact, this can make delay slot instructions ineligible for the delay slot
>> they are in.  Also, the unwanted length cross-interference can violate
>> alignment invariants, and this leads to invalid code on ARCompact.
>
> But if you're saying that ARCompat wants different copies of the same insn
> (with the same uid) to have different lengths, then please fix dbr_schedule
> so that it uses different uids.  The "i == 0" thing is just too hackish IMO.

Implemented with the attached patch.
I've been wondering if I should use copy_insn, but that doesn't seem to make
any real difference after reload.
Also, copying only PATTERN, INSN_LOCATION, and REG_NOTES into the new
insn obtained from make_insn_raw had seemed a possibility, but there is no
firm assurance that we will only see insn with the code INSN.
In the end, continuing to use copy_rtx to make the copy seems least likely to
break something.  And now that the copying is in one place, it's easier to
change the way it is done if you want to try that.

We waste a bit of memory (temporarily, as it is garbage collected) by using
make_insn_raw just to increment cur_insn_uid.  Alternatives would be to
move the function inside emit-rtl.c, or have emit-rtl.h provide a means to
increment cur_insn_uid.  Well, or we could directly access  
crtl->emit.x_cur_insn_uid, but that would certainly be hackish.

regression tested on i686-pc-linux-gnu X cris-elf

(cris-elf has delayed branches, CASE_VECTOR_SHORTEN_MODE, is not  
currently affected by PR54938, and builds libgfortran.)

C / C++ / objc regression tests for mipsel-elf also passed.
2012-10-19  Joern Rennecke  <joern.rennecke@embecosm.com>

	* reorg.c (copy_delay_slot_insn): New function.
	(steal_delay_list_from_target, fill_slots_from_thread): Use it.
	(fill_simple_delay_slots): Likewise.
Richard Sandiford - Oct. 20, 2012, 9:07 a.m.
Joern Rennecke <joern.rennecke@embecosm.com> writes:
> Quoting Richard Sandiford <rdsandiford@googlemail.com>:
>> Joern Rennecke <joern.rennecke@embecosm.com> writes:
>>> Quoting Richard Sandiford <rdsandiford@googlemail.com>:
>>>> The fact that we even have shared unique ids is pretty bad -- and surely
>>>> a contradiction in terms -- but I think both ways of handling them rely
>>>> on the length being the same for all copies.  If we don't record a length
>>>> (your version), we won't set something_changed if the length of this copy
>>>> did in fact change, which could lead to branches remaining too short.
>>>
>>> That is not the case, as the current length of the inner insn is added to
>>> new_length (of the outer insn).
>>>
>>>> If we do record a length (current version),
>>>
>>> In the current version, the length of the inner insn is calculated anew
>>> each iteration, so only the out-of-sequence copy suffers from the bogosity.
>>>
>>>> we could end up changing
>>>> the length of previous copies that have already been processed in
>>>> this iteration.
>>>
>>> Both that, and also using the length or the previous insn for the inner
>>> length calculation, and ratcheting them up together.
>>> In fact, this can make delay slot instructions ineligible for the delay slot
>>> they are in.  Also, the unwanted length cross-interference can violate
>>> alignment invariants, and this leads to invalid code on ARCompact.
>>
>> But if you're saying that ARCompat wants different copies of the same insn
>> (with the same uid) to have different lengths, then please fix dbr_schedule
>> so that it uses different uids.  The "i == 0" thing is just too hackish IMO.
>
> Implemented with the attached patch.
> I've been wondering if I should use copy_insn, but that doesn't seem to make
> any real difference after reload.
> Also, copying only PATTERN, INSN_LOCATION, and REG_NOTES into the new
> insn obtained from make_insn_raw had seemed a possibility, but there is no
> firm assurance that we will only see insn with the code INSN.

Yeah, that's unfortunate.

> In the end, continuing to use copy_rtx to make the copy seems least likely to
> break something.  And now that the copying is in one place, it's easier to
> change the way it is done if you want to try that.

Agreed.

> We waste a bit of memory (temporarily, as it is garbage collected) by using
> make_insn_raw just to increment cur_insn_uid.  Alternatives would be to
> move the function inside emit-rtl.c, or have emit-rtl.h provide a means to
> increment cur_insn_uid.  Well, or we could directly access  
> crtl->emit.x_cur_insn_uid, but that would certainly be hackish.

I agree using crtl->emit.x_cur_insn_uid from reorg.c would be hackish.
I think it would be better to put the new function in emit-rtl.c
and just have:

/* Return a copy of INSN that can be used in a SEQUENCE delay slot,
   on that assumption that INSN itself remains in its original place.  */

rtx
copy_delay_slot_insn (rtx insn)
{
  /* Copy INSN with its rtx_code, all its notes, location etc.  */
  insn = copy_rtx (insn);
  INSN_UID (insn) = cur_insn_uid++;
  return insn;
}

OK with that change, thanks.

Richard

Patch

Index: reorg.c
===================================================================
--- reorg.c	(revision 192573)
+++ reorg.c	(working copy)
@@ -1179,6 +1179,19 @@  check_annul_list_true_false (int annul_t
   return 1;
 }
 
+/* Copy INSN, which is to remain in place (in fill_slots_from_thread parlance,
+   it is from a thread we don't own), for use in a delay slot.  */
+static rtx
+copy_delay_slot_insn (rtx insn)
+{
+  /* Copy INSN with its rtx_code, all its notes, location etc.  */
+  insn = copy_rtx (insn);
+  /* Get new UID.  */
+  rtx new_insn = make_insn_raw (PATTERN (insn));
+  INSN_UID (insn) = INSN_UID (new_insn);
+  return insn;
+}
+
 /* INSN branches to an insn whose pattern SEQ is a SEQUENCE.  Given that
    the condition tested by INSN is CONDITION and the resources shown in
    OTHER_NEEDED are needed after INSN, see whether INSN can take all the insns
@@ -1297,7 +1310,7 @@  steal_delay_list_from_target (rtx insn,
 	{
 	  if (must_annul)
 	    used_annul = 1;
-	  temp = copy_rtx (trial);
+	  temp = copy_delay_slot_insn (trial);
 	  INSN_FROM_TARGET_P (temp) = 1;
 	  new_delay_list = add_to_delay_list (temp, new_delay_list);
 	  total_slots_filled++;
@@ -2369,7 +2382,8 @@  fill_simple_delay_slots (int non_jumps_p
 	      if (new_label)
 	        {
 		  delay_list
-		    = add_to_delay_list (copy_rtx (next_trial), delay_list);
+		    = add_to_delay_list (copy_delay_slot_insn (next_trial),
+					 delay_list);
 		  slots_filled++;
 		  reorg_redirect_jump (trial, new_label);
 
@@ -2793,7 +2807,7 @@  fill_slots_from_thread (rtx insn, rtx co
 		  else
 		    new_thread = next_active_insn (trial);
 
-		  temp = own_thread ? trial : copy_rtx (trial);
+		  temp = own_thread ? trial : copy_delay_slot_insn (trial);
 		  if (thread_if_true)
 		    INSN_FROM_TARGET_P (temp) = 1;
 
@@ -2974,7 +2988,7 @@  fill_slots_from_thread (rtx insn, rtx co
 	  else
 	    new_thread = next_active_insn (trial);
 
-	  ninsn = own_thread ? trial : copy_rtx (trial);
+	  ninsn = own_thread ? trial : copy_delay_slot_insn (trial);
 	  if (thread_if_true)
 	    INSN_FROM_TARGET_P (ninsn) = 1;