diff mbox

Follow-up to PR51471

Message ID 1707878.8rQuAFM7vl@polaris
State New
Headers show

Commit Message

Eric Botcazou Nov. 11, 2014, 10:28 p.m. UTC
In https://gcc.gnu.org/ml/gcc-patches/2012-01/msg00363.html, Tom reported an 
ICE in the DWARF CFI pass because a frame-related insn was speculated by the 
reorg pass and, therefore, disabled the optimization.  It turns out that the 
same code (when condition == const_true_rtx) can also duplicate frame-related 
insns, leading to the same ICE in the DWARF CFI pass; we have run into that 
building newlib for a port we are about to contribute.

Hence the attached patch, which disables the optimization in this case too and 
adds a comment for both.  Tested on SPARC/Solaris, applied on the mainline.


2014-11-11  Eric Botcazou  <ebotcazou@adacore.com>

	* reorg.c (fill_slots_from_thread): Do not copy frame-related insns.

Comments

Jeff Law Nov. 12, 2014, 8:10 p.m. UTC | #1
On 11/11/14 15:28, Eric Botcazou wrote:
> In https://gcc.gnu.org/ml/gcc-patches/2012-01/msg00363.html, Tom reported an
> ICE in the DWARF CFI pass because a frame-related insn was speculated by the
> reorg pass and, therefore, disabled the optimization.  It turns out that the
> same code (when condition == const_true_rtx) can also duplicate frame-related
> insns, leading to the same ICE in the DWARF CFI pass; we have run into that
> building newlib for a port we are about to contribute.
>
> Hence the attached patch, which disables the optimization in this case too and
> adds a comment for both.  Tested on SPARC/Solaris, applied on the mainline.
>
>
> 2014-11-11  Eric Botcazou  <ebotcazou@adacore.com>
>
> 	* reorg.c (fill_slots_from_thread): Do not copy frame-related insns.
I wonder how many other problems of this nature are lurking in reorg.c. 
  For example steal_delay_list_from_{target,fallthrough} or the code 
which searches for arithmetic at the branch target, and puts the 
opposite insn in a delay slot.

In fact, I really wonder if we should be allowing anything frame related 
outside fill_simple_delay_slots.

jeff
Eric Botcazou Nov. 14, 2014, 10:44 a.m. UTC | #2
> I wonder how many other problems of this nature are lurking in reorg.c.
> For example steal_delay_list_from_{target,fallthrough} or the code
> which searches for arithmetic at the branch target, and puts the
> opposite insn in a delay slot.

Right, and the latter has already been dealt with by Richard:

2012-02-11  Richard Sandiford  <rdsandiford@googlemail.com>

	PR rtl-optimization/52175
	* reorg.c (fill_slots_from_thread): Don't apply add/sub optimization
	to frame-related instructions.

> In fact, I really wonder if we should be allowing anything frame related
> outside fill_simple_delay_slots.

That could well be the end result after a few more years of tweaking. :-)
Jeff Law Nov. 14, 2014, 5:16 p.m. UTC | #3
On 11/14/14 03:44, Eric Botcazou wrote:
>> I wonder how many other problems of this nature are lurking in reorg.c.
>> For example steal_delay_list_from_{target,fallthrough} or the code
>> which searches for arithmetic at the branch target, and puts the
>> opposite insn in a delay slot.
>
> Right, and the latter has already been dealt with by Richard:
>
> 2012-02-11  Richard Sandiford  <rdsandiford@googlemail.com>
>
> 	PR rtl-optimization/52175
> 	* reorg.c (fill_slots_from_thread): Don't apply add/sub optimization
> 	to frame-related instructions.
Ahhh, good.

>
>> In fact, I really wonder if we should be allowing anything frame related
>> outside fill_simple_delay_slots.
>
> That could well be the end result after a few more years of tweaking. :-)
IIRC, fill_eager and its related friends are all speculative in some way 
and aren't those precisely the ones that are causing us problems.   Also 
note we have backends working around this stuff in fairly blunt ways:

;; For conditional branches. Frame related instructions are not allowed
;; because they confuse the unwind support.
(define_attr "in_branch_delay" "false,true"
   (if_then_else (and (eq_attr "type" 
"!uncond_branch,branch,cbranch,fbranch,call,sibcall,dyncall,multi,milli,sh_func_adrs,parallel_branch,trap")
                      (eq_attr "length" "4")
                      (not (match_test "RTX_FRAME_RELATED_P (insn)")))
                 (const_string "true")
                 (const_string "false")))

;; Disallow instructions which use the FPU since they will tie up the FPU
;; even if the instruction is nullified.
(define_attr "in_nullified_branch_delay" "false,true"
   (if_then_else (and (eq_attr "type" 
"!uncond_branch,branch,cbranch,fbranch,call,sibcall,dyncall,multi,milli,sh_func_adrs,fpcc,fpalu,fpmulsgl,fpmuldbl,fpdivsgl,fpdivdbl,fpsqrtsgl,fpsqrtdbl,parallel_branch,trap")
                      (eq_attr "length" "4")
                      (not (match_test "RTX_FRAME_RELATED_P (insn)")))
                 (const_string "true")
                 (const_string "false")))

;; For calls and millicode calls.
(define_attr "in_call_delay" "false,true"
   (if_then_else (and (eq_attr "type" 
"!uncond_branch,branch,cbranch,fbranch,call,sibcall,dyncall,multi,milli,sh_func_adrs,parallel_branch,trap")
                      (eq_attr "length" "4")
                      (not (match_test "RTX_FRAME_RELATED_P (insn)")))
                 (const_string "true")
                 (const_string "false")))


Given architectural difficulties of delay slots on modern processors, 
would it be that painful to just not allow filling slots with frame 
insns and let dbr try to find something else or drop in a nop?  I 
wouldn't be all that surprised if there wasn't a measurable performance 
difference on something like a modern Sparc.


Jeff
Eric Botcazou Nov. 15, 2014, 9:39 a.m. UTC | #4
> IIRC, fill_eager and its related friends are all speculative in some way
> and aren't those precisely the ones that are causing us problems.   Also
> note we have backends working around this stuff in fairly blunt ways:

I'd say that the PA back-end went a bit too far here, especially if it marks 
some insns of the epilogue as frame-related.  dwarf2cfi.c has special code to 
handle delay slots (SEQUENCEs) so it's not an all-or-nothing game.

> Given architectural difficulties of delay slots on modern processors,
> would it be that painful to just not allow filling slots with frame
> insns and let dbr try to find something else or drop in a nop?  I
> wouldn't be all that surprised if there wasn't a measurable performance
> difference on something like a modern Sparc.

Yes, modern SPARCs have (short) branches without delay slots.  But the other 
big contender is MIPS here and the story might be different for it.
Matthew Fortune Nov. 15, 2014, 9:37 p.m. UTC | #5
Eric Botcazou <ebotcazou@adacore.com> writes:
> > IIRC, fill_eager and its related friends are all speculative in some
> way
> > and aren't those precisely the ones that are causing us problems.
> Also
> > note we have backends working around this stuff in fairly blunt ways:
> 
> I'd say that the PA back-end went a bit too far here, especially if it
> marks some insns of the epilogue as frame-related.  dwarf2cfi.c has
> special code to handle delay slots (SEQUENCEs) so it's not an all-or-
> nothing game.
> 
> > Given architectural difficulties of delay slots on modern processors,
> > would it be that painful to just not allow filling slots with frame
> > insns and let dbr try to find something else or drop in a nop?  I
> > wouldn't be all that surprised if there wasn't a measurable
> > performance difference on something like a modern Sparc.
> 
> Yes, modern SPARCs have (short) branches without delay slots.  But the
> other big contender is MIPS here and the story might be different for
> it.

MIPSr6 introduces 'compact' branches which do not have delay slots.

So the issues of filling delay slots will be less important from R6
onwards. However, delay slots remain important for now.

I haven't thought about the problem much but instinctively I'd be surprised
if a blanket restriction on frame-related instructions would lead to lots
of NOPs in delay slots.

Matthew
Jeff Law Nov. 17, 2014, 6:32 a.m. UTC | #6
On 11/15/14 14:37, Matthew Fortune wrote:
> Eric Botcazou <ebotcazou@adacore.com> writes:
>>> IIRC, fill_eager and its related friends are all speculative in some
>> way
>>> and aren't those precisely the ones that are causing us problems.
>> Also
>>> note we have backends working around this stuff in fairly blunt ways:
>>
>> I'd say that the PA back-end went a bit too far here, especially if it
>> marks some insns of the epilogue as frame-related.  dwarf2cfi.c has
>> special code to handle delay slots (SEQUENCEs) so it's not an all-or-
>> nothing game.
>>
>>> Given architectural difficulties of delay slots on modern processors,
>>> would it be that painful to just not allow filling slots with frame
>>> insns and let dbr try to find something else or drop in a nop?  I
>>> wouldn't be all that surprised if there wasn't a measurable
>>> performance difference on something like a modern Sparc.
>>
>> Yes, modern SPARCs have (short) branches without delay slots.  But the
>> other big contender is MIPS here and the story might be different for
>> it.
>
> MIPSr6 introduces 'compact' branches which do not have delay slots.
>
> So the issues of filling delay slots will be less important from R6
> onwards. However, delay slots remain important for now.
>
> I haven't thought about the problem much but instinctively I'd be surprised
> if a blanket restriction on frame-related instructions would lead to lots
> of NOPs in delay slots.
Possibly.  I'd be surprised if frame-related stuff is used that often 
for filling slots...  Combine that with the decrease in importance for 
filling delay slots when the exist, I wouldn't be terribly surprised if 
nobody could actually measure the change if we were to make it.

The PA port may have gone too far, but it's certainly conservatively 
correct and on every PA processor that "matters" (for a very liberal 
definition of matters), I doubt the difference is measurable due to the 
depth of the reorder buffers and the fact that a nop can retire anytime 
that's convenient.

Jeff
diff mbox

Patch

Index: reorg.c
===================================================================
--- reorg.c	(revision 217259)
+++ reorg.c	(working copy)
@@ -2501,9 +2501,11 @@  fill_slots_from_thread (rtx_insn *insn,
 
 	  /* There are two ways we can win:  If TRIAL doesn't set anything
 	     needed at the opposite thread and can't trap, or if it can
-	     go into an annulled delay slot.  */
+	     go into an annulled delay slot.  But we want neither to copy
+	     nor to speculate frame-related insns.  */
 	  if (!must_annul
-	      && (condition == const_true_rtx
+	      && ((condition == const_true_rtx
+		   && (own_thread || !RTX_FRAME_RELATED_P (trial)))
 	          || (! insn_sets_resource_p (trial, &opposite_needed, true)
 		      && ! may_trap_or_fault_p (pat)
 		      && ! RTX_FRAME_RELATED_P (trial))))