Patchwork Fill more delay slots in conditional returns

login
register
mail settings
Submitter Eric Botcazou
Date March 25, 2013, 11:25 a.m.
Message ID <1769120.Os7kk4Zj9I@polaris>
Download mbox | patch
Permalink /patch/230637/
State New
Headers show

Comments

Eric Botcazou - March 25, 2013, 11:25 a.m.
Hi,

for a private port with conditional returns and delay slots, only the simple 
algorithm (fill_simple_delay_slots) is able to fill the slots.  It's because 
get_branch_condition just punts on conditional returns.

Fixed thusly.  While I investigated this, I realized that the block of code in 
fill_simple_delay_slots between line 2097 and line 2274 is dead for JUMP insns 
(and has been so for a long time, which is consistent with various comments in 
the code, for example the head comment of fill_eager_delay_slots) so the patch 
also cleans it up (modulo the formatting to make the patch readable).

Jeff, any objections?  Tested on SPARC/Solaris, no difference in the generated 
code at -O2 for the gcc.c-torture/compile testsuite.


2013-03-25  Eric Botcazou  <ebotcazou@adacore.com>

	* reorg.c (get_branch_condition): Deal with conditional returns.
	(fill_simple_delay_slots): Remove dead code dealing with jumps.
Jeff Law - March 29, 2013, 8:03 p.m.
On 03/25/2013 05:25 AM, Eric Botcazou wrote:
> Hi,
>
> for a private port with conditional returns and delay slots, only the simple
> algorithm (fill_simple_delay_slots) is able to fill the slots.  It's because
> get_branch_condition just punts on conditional returns.
>
> Fixed thusly.  While I investigated this, I realized that the block of code in
> fill_simple_delay_slots between line 2097 and line 2274 is dead for JUMP insns
> (and has been so for a long time, which is consistent with various comments in
> the code, for example the head comment of fill_eager_delay_slots) so the patch
> also cleans it up (modulo the formatting to make the patch readable).
>
> Jeff, any objections?  Tested on SPARC/Solaris, no difference in the generated
> code at -O2 for the gcc.c-torture/compile testsuite.
>
>
> 2013-03-25  Eric Botcazou<ebotcazou@adacore.com>
>
> 	* reorg.c (get_branch_condition): Deal with conditional returns.
> 	(fill_simple_delay_slots): Remove dead code dealing with jumps.
>
>
> -- Eric Botcazou
>
>
> p.diff
>
>
> Index: reorg.c
> ===================================================================
> --- reorg.c	(revision 196816)
> +++ reorg.c	(working copy)
> @@ -921,8 +921,8 @@ get_branch_condition (rtx insn, rtx targ
>     if (condjump_in_parallel_p (insn))
>       pat = XVECEXP (pat, 0, 0);
>
> -  if (ANY_RETURN_P (pat))
> -    return pat == target ? const_true_rtx : 0;
> +  if (ANY_RETURN_P (pat) && pat == target)
> +    return const_true_rtx;
I'm not sure what this is supposed to do.  How is pat == target ever 
going to be true when ANY_RETURN_P (pat) is true?  Isn't target supposed 
to be a CODE_LABEL or NULL?  What am I missing?


What does the RTL for your conditional return look like

(if_then_else (cond) (pc) (return))

Where the (pc) and (return) can be reversed as well?  That's what the 
later hunks in get_branch_condition seem to imply.


Maybe if I saw the RTL that first hunk would make sense.  It feels like 
I'm missing something.

The cleanup stuff is OK, check that in whenver you'd like.

jeff
Eric Botcazou - April 1, 2013, 7:47 p.m.
> I'm not sure what this is supposed to do.  How is pat == target ever
> going to be true when ANY_RETURN_P (pat) is true?  Isn't target supposed
> to be a CODE_LABEL or NULL?  What am I missing?

The simple_return change from Bernd.  The JUMP_LABEL of a JUMP_INSN is now 
either a CODE_LABEL or a RETURN or a SIMPLE_RETURN.

> What does the RTL for your conditional return look like
> 
> (if_then_else (cond) (pc) (return))
> 
> Where the (pc) and (return) can be reversed as well?  That's what the
> later hunks in get_branch_condition seem to imply.

It's the usual support for branches and inverted branches applied to returns, 
so I'm not sure what you're asking here. :-)

> The cleanup stuff is OK, check that in whenver you'd like.

Thanks.
Jeff Law - April 2, 2013, 3 a.m.
On 04/01/2013 01:47 PM, Eric Botcazou wrote:
>> I'm not sure what this is supposed to do.  How is pat == target ever
>> going to be true when ANY_RETURN_P (pat) is true?  Isn't target supposed
>> to be a CODE_LABEL or NULL?  What am I missing?
>
> The simple_return change from Bernd.  The JUMP_LABEL of a JUMP_INSN is now
> either a CODE_LABEL or a RETURN or a SIMPLE_RETURN.
Ah, OK, yea, it makes perfect sense now.  Approved.

If you wanted to get ambitious, rewriting reorg.c to compute and use 
dependency links to drive its candidate selection instead of the insane 
tracking code it uses would be a huge step forward, both in terms of 
cleanliness.  It'd also help compile-time performance; we do a lot of 
work to track resources that would be totally unnecessary.

I don't work on targets with delay slots anymore, so it's not something 
I'm likely to ever tackle myself.

jeff
Steven Bosscher - April 2, 2013, 4:56 p.m.
On Tue, Apr 2, 2013 at 5:00 AM, Jeff Law wrote:
> On 04/01/2013 01:47 PM, Eric Botcazou wrote:
>>>
>>> I'm not sure what this is supposed to do.  How is pat == target ever
>>> going to be true when ANY_RETURN_P (pat) is true?  Isn't target supposed
>>> to be a CODE_LABEL or NULL?  What am I missing?
>>
>>
>> The simple_return change from Bernd.  The JUMP_LABEL of a JUMP_INSN is now
>> either a CODE_LABEL or a RETURN or a SIMPLE_RETURN.
>
> Ah, OK, yea, it makes perfect sense now.  Approved.
>
> If you wanted to get ambitious, rewriting reorg.c to compute and use
> dependency links to drive its candidate selection instead of the insane
> tracking code it uses would be a huge step forward, both in terms of
> cleanliness.  It'd also help compile-time performance; we do a lot of work
> to track resources that would be totally unnecessary.
>
> I don't work on targets with delay slots anymore, so it's not something I'm
> likely to ever tackle myself.

For the last few, my "new development cycle resolutions" list has
included an item "do something about reorg.c" but every time the task
is just too daunting.

The following problems (at least these anyway) make it a difficult job:

* There is no CFG, and sched-deps.c doesn't workout one. Extending the
life of the CFG at least up to just before dbr_schedule() is a
chicken-and-egg problem. Some machine_reorg passes wreck the CFG
beyond repair. The MIPS back end has to run dbr_schedule() as part of
its machine_reorg pass before mips_reorg_process_insns. The SPARC back
end also calls dbr_schedule() in its machine_reorg pass to work around
errata in Atmel AT697F chips (LEON2-like, i.e. fairly new, see
r179921).
Ironically, there are also a few machine_reorgs *need* the CFG and
restore it just after pass_free_cfg...

* Even if there'd be a CFG, sched-deps.c doesn't handle sequences,
which may appear after the MIPS and SPARC machine_reorg passes, which
pass the delayed-branch scheduled insn stream through the rest of the
compiler passes pipeline, also through pass_delay_slots.

* sched-deps uses the DF machinery (DF_INSN_USES, DF_INSN_DEFS) but DF
doesn't look through SEQUENCEs so DF caches are invalid after, and
probably during, dbr_schedule().

* Generally the reorg.c code doesn't look inviting to someone willing
to spend some time editing it. A few functions are huge
(fill_slots_from_thread and relax_delay_slots ~500 each, 25% of
reorg.c). There are many recurring idioms that make the code somewhat
harder to read than necessary (e.g. NONJUMP_INSN_P && GET_CODE ==
SEQUENCE). And some conditionals look too complex to touch and apply
transformations in the middle of a condition, e.g. try_split in this
one:

          /* If there are slots left to fill and our search was stopped by an
             unconditional branch, try the insn at the branch target.  We can
             redirect the branch if it works.

             Don't do this if the insn at the branch target is a branch.  */
          if (slots_to_fill != slots_filled
              && trial
              && jump_to_label_p (trial)
              && simplejump_p (trial)
              && (target == 0 || JUMP_LABEL (trial) == target)
              && (next_trial = next_active_insn (JUMP_LABEL (trial))) != 0
              && ! (NONJUMP_INSN_P (next_trial)
                    && GET_CODE (PATTERN (next_trial)) == SEQUENCE)
              && !JUMP_P (next_trial)
              && ! insn_references_resource_p (next_trial, &set, true)
              && ! insn_sets_resource_p (next_trial, &set, true)
              && ! insn_sets_resource_p (next_trial, &needed, true)
#ifdef HAVE_cc0
              && ! reg_mentioned_p (cc0_rtx, PATTERN (next_trial))
#endif
              && ! (maybe_never && may_trap_or_fault_p (PATTERN (next_trial)))
              && (next_trial = try_split (PATTERN (next_trial), next_trial, 0))
              && eligible_for_delay (insn, slots_filled, next_trial, flags)
              && ! can_throw_internal (trial))
            {

* reorg.c is doing things itself that it should be using existing APIs
for, e.g. emit_delay_sequence should use
add_insn_before/add_insn_after instead of splicing seq_insn into the
chain "manually".


If only there'd be a better way to model delayed insn (better than
SEQUENCE) then it is probably easier to pick Muchnick off the shelf
and re-implement delayed-branch scheduling from scratch, rather than
"modernizing" reorg.c.

Ciao!
Steven

Patch

Index: reorg.c
===================================================================
--- reorg.c	(revision 196816)
+++ reorg.c	(working copy)
@@ -921,8 +921,8 @@  get_branch_condition (rtx insn, rtx targ
   if (condjump_in_parallel_p (insn))
     pat = XVECEXP (pat, 0, 0);
 
-  if (ANY_RETURN_P (pat))
-    return pat == target ? const_true_rtx : 0;
+  if (ANY_RETURN_P (pat) && pat == target)
+    return const_true_rtx;
 
   if (GET_CODE (pat) != SET || SET_DEST (pat) != pc_rtx)
     return 0;
@@ -933,14 +933,16 @@  get_branch_condition (rtx insn, rtx targ
 
   else if (GET_CODE (src) == IF_THEN_ELSE
 	   && XEXP (src, 2) == pc_rtx
-	   && GET_CODE (XEXP (src, 1)) == LABEL_REF
-	   && XEXP (XEXP (src, 1), 0) == target)
+	   && ((GET_CODE (XEXP (src, 1)) == LABEL_REF
+	        && XEXP (XEXP (src, 1), 0) == target)
+	       || (ANY_RETURN_P (XEXP (src, 1)) && XEXP (src, 1) == target)))
     return XEXP (src, 0);
 
   else if (GET_CODE (src) == IF_THEN_ELSE
 	   && XEXP (src, 1) == pc_rtx
-	   && GET_CODE (XEXP (src, 2)) == LABEL_REF
-	   && XEXP (XEXP (src, 2), 0) == target)
+	   && ((GET_CODE (XEXP (src, 2)) == LABEL_REF
+		&& XEXP (XEXP (src, 2), 0) == target)
+	       || (ANY_RETURN_P (XEXP (src, 2)) && XEXP (src, 2) == target)))
     {
       enum rtx_code rev;
       rev = reversed_comparison_code (XEXP (src, 0), insn);
@@ -2129,35 +2131,19 @@  fill_simple_delay_slots (int non_jumps_p
 	     Presumably, we should also check to see if we could get
 	     back to this function via `setjmp'.  */
 	  && ! can_throw_internal (insn)
-	  && (!JUMP_P (insn)
-	      || ((condjump_p (insn) || condjump_in_parallel_p (insn))
-		  && ! simplejump_p (insn)
-		  && !ANY_RETURN_P (JUMP_LABEL (insn)))))
+	  && !JUMP_P (insn))
 	{
-	  /* Invariant: If insn is a JUMP_INSN, the insn's jump
-	     label.  Otherwise, zero.  */
-	  rtx target = 0;
 	  int maybe_never = 0;
 	  rtx pat, trial_delay;
 
 	  CLEAR_RESOURCE (&needed);
 	  CLEAR_RESOURCE (&set);
+	  mark_set_resources (insn, &set, 0, MARK_SRC_DEST_CALL);
+	  mark_referenced_resources (insn, &needed, true);
 
 	  if (CALL_P (insn))
-	    {
-	      mark_set_resources (insn, &set, 0, MARK_SRC_DEST_CALL);
-	      mark_referenced_resources (insn, &needed, true);
-	      maybe_never = 1;
-	    }
-	  else
-	    {
-	      mark_set_resources (insn, &set, 0, MARK_SRC_DEST_CALL);
-	      mark_referenced_resources (insn, &needed, true);
-	      if (JUMP_P (insn))
-		target = JUMP_LABEL (insn);
-	    }
+	    maybe_never = 1;
 
-	  if (target == 0 || ANY_RETURN_P (target))
 	    for (trial = next_nonnote_insn (insn); !stop_search_p (trial, 1);
 		 trial = next_trial)
 	      {
@@ -2217,9 +2203,8 @@  fill_simple_delay_slots (int non_jumps_p
 		   slot since these insns could clobber the condition code.  */
 		set.cc = 1;
 
-		/* If this is a call or jump, we might not get here.  */
-		if (CALL_P (trial_delay)
-		    || JUMP_P (trial_delay))
+		/* If this is a call, we might not get here.  */
+		if (CALL_P (trial_delay))
 		  maybe_never = 1;
 	      }
 
@@ -2232,7 +2217,6 @@  fill_simple_delay_slots (int non_jumps_p
 	      && trial
 	      && jump_to_label_p (trial)
 	      && simplejump_p (trial)
-	      && (target == 0 || JUMP_LABEL (trial) == target)
 	      && (next_trial = next_active_insn (JUMP_LABEL (trial))) != 0
 	      && ! (NONJUMP_INSN_P (next_trial)
 		    && GET_CODE (PATTERN (next_trial)) == SEQUENCE)
@@ -2264,11 +2248,6 @@  fill_simple_delay_slots (int non_jumps_p
 					 delay_list);
 		  slots_filled++;
 		  reorg_redirect_jump (trial, new_label);
-
-		  /* If we merged because we both jumped to the same place,
-		     redirect the original insn also.  */
-		  if (target)
-		    reorg_redirect_jump (insn, new_label);
 		}
 	    }
 	}