diff mbox

[i386] : Combine memory and indirect jump

Message ID CAEwic4bwpcge4neZHZuEn=RBN509pmDy5ocrqx8bk52sTM0xCg@mail.gmail.com
State New
Headers show

Commit Message

Kai Tietz June 20, 2014, 9:59 p.m. UTC
2014-06-20 20:14 GMT+02:00 Jeff Law <law@redhat.com>:
> On 06/20/14 12:07, Kai Tietz wrote:
>>
>> 2014-06-20 19:55 GMT+02:00 Richard Henderson <rth@redhat.com>:
>>>
>>> On 06/20/2014 10:52 AM, Kai Tietz wrote:
>>>>
>>>> 2014-06-20  Kai Tietz  <ktietz@redhat.com>
>>>>
>>>>      PR target/39284
>>>>      * passes.def (peephole2): Add second peephole2 pass before
>>>>      split before sched2 pass.
>>>>      * config/i386/i386.md (peehole2): To combine
>>>>      indirect jump with memory.
>>>>      (split2): Likewise.
>>>
>>>
>>> Why are we adding a second pass instead of just moving the one?
>>>
>>>
>>> r~
>>
>>
>> I told that in a prior mail in that thread to Jeff. IIRC there are
>> some conversion of impossible pushes then done too late, additional
>> some patterns about split & dieing register too.  Means we produce
>> weaker code.
>
> So let's dig into this deeper.  Examples & explanations would help.  I know
> it feels like a bit of a runaround, but avoiding adding the pass would be
> good.
>
> jeff

I dug into it a bit.  And couldn't find any significant difference for
x64 target for existing testcases.
I am still a bit concerned - I can't reproduce it for x86/x86_64
targets - that we might cause regressions for targets by moving
peephole2 pass too close before the sched2 pass.  Therefore I searched
for the closest place to the prior place of the peephole2 pass, which
solves still the indirect jump optimization on memory.
By testing for x86/x64 the pass needs to be run directly after the
"reorder blocks" pass.

So I suggest following change of passes.def:


Kai

Comments

Richard Henderson June 23, 2014, 2:13 p.m. UTC | #1
On 06/20/2014 02:59 PM, Kai Tietz wrote:
> So I suggest following change of passes.def:
> 
> Index: passes.def
> ===================================================================
> --- passes.def  (Revision 211850)
> +++ passes.def  (Arbeitskopie)
> @@ -384,7 +384,6 @@ along with GCC; see the file COPYING3.  If not see
>           NEXT_PASS (pass_rtl_dse2);
>           NEXT_PASS (pass_stack_adjustments);
>           NEXT_PASS (pass_jump2);
> -         NEXT_PASS (pass_peephole2);
>           NEXT_PASS (pass_if_after_reload);
>           NEXT_PASS (pass_regrename);
>           NEXT_PASS (pass_cprop_hardreg);
> @@ -391,6 +390,7 @@ along with GCC; see the file COPYING3.  If not see
>           NEXT_PASS (pass_fast_rtl_dce);
>           NEXT_PASS (pass_duplicate_computed_gotos);
>           NEXT_PASS (pass_reorder_blocks);
> +         NEXT_PASS (pass_peephole2);
>           NEXT_PASS (pass_branch_target_load_optimize2);
>           NEXT_PASS (pass_leaf_regs);
>           NEXT_PASS (pass_split_before_sched2);

Looks good to me.  I guess just keep an eye out for bug reports for other ports.


r~
Richard Biener June 23, 2014, 2:32 p.m. UTC | #2
On Mon, Jun 23, 2014 at 4:13 PM, Richard Henderson <rth@redhat.com> wrote:
> On 06/20/2014 02:59 PM, Kai Tietz wrote:
>> So I suggest following change of passes.def:
>>
>> Index: passes.def
>> ===================================================================
>> --- passes.def  (Revision 211850)
>> +++ passes.def  (Arbeitskopie)
>> @@ -384,7 +384,6 @@ along with GCC; see the file COPYING3.  If not see
>>           NEXT_PASS (pass_rtl_dse2);
>>           NEXT_PASS (pass_stack_adjustments);
>>           NEXT_PASS (pass_jump2);
>> -         NEXT_PASS (pass_peephole2);
>>           NEXT_PASS (pass_if_after_reload);
>>           NEXT_PASS (pass_regrename);
>>           NEXT_PASS (pass_cprop_hardreg);
>> @@ -391,6 +390,7 @@ along with GCC; see the file COPYING3.  If not see
>>           NEXT_PASS (pass_fast_rtl_dce);
>>           NEXT_PASS (pass_duplicate_computed_gotos);
>>           NEXT_PASS (pass_reorder_blocks);
>> +         NEXT_PASS (pass_peephole2);
>>           NEXT_PASS (pass_branch_target_load_optimize2);
>>           NEXT_PASS (pass_leaf_regs);
>>           NEXT_PASS (pass_split_before_sched2);
>
> Looks good to me.  I guess just keep an eye out for bug reports for other ports.

Maybe put a comment here because it looks like a random placement to me
which would be obvious to revert.  peepholing before if-after-reload sounds
good anyway.

Did you test effect on code-generation of this change on other targets?

Btw, there is now no DCE after peephole2?  Is peephole2 expected to
cleanup after itself?

Richard.

>
> r~
Jeff Law June 23, 2014, 4:22 p.m. UTC | #3
On 06/23/14 08:32, Richard Biener wrote:
> On Mon, Jun 23, 2014 at 4:13 PM, Richard Henderson <rth@redhat.com> wrote:
>> On 06/20/2014 02:59 PM, Kai Tietz wrote:
>>> So I suggest following change of passes.def:
>>>
>>> Index: passes.def
>>> ===================================================================
>>> --- passes.def  (Revision 211850)
>>> +++ passes.def  (Arbeitskopie)
>>> @@ -384,7 +384,6 @@ along with GCC; see the file COPYING3.  If not see
>>>            NEXT_PASS (pass_rtl_dse2);
>>>            NEXT_PASS (pass_stack_adjustments);
>>>            NEXT_PASS (pass_jump2);
>>> -         NEXT_PASS (pass_peephole2);
>>>            NEXT_PASS (pass_if_after_reload);
>>>            NEXT_PASS (pass_regrename);
>>>            NEXT_PASS (pass_cprop_hardreg);
>>> @@ -391,6 +390,7 @@ along with GCC; see the file COPYING3.  If not see
>>>            NEXT_PASS (pass_fast_rtl_dce);
>>>            NEXT_PASS (pass_duplicate_computed_gotos);
>>>            NEXT_PASS (pass_reorder_blocks);
>>> +         NEXT_PASS (pass_peephole2);
>>>            NEXT_PASS (pass_branch_target_load_optimize2);
>>>            NEXT_PASS (pass_leaf_regs);
>>>            NEXT_PASS (pass_split_before_sched2);
>>
>> Looks good to me.  I guess just keep an eye out for bug reports for other ports.
>
> Maybe put a comment here because it looks like a random placement to me
> which would be obvious to revert.  peepholing before if-after-reload sounds
> good anyway.
Definitely need a comment on the pass placement.

> Btw, there is now no DCE after peephole2?  Is peephole2 expected to
> cleanup after itself?
There were cases where we wanted to change the insns we would output to 
fit into the 4:1:1 issue model of the PPro, but to do so we needed to 
know what registers were live/dead so that we could rewrite the insns 
appropriately.  It didn't fit well into what we could do in the 
splitters and the old peephole ran too late.  Dead code wasn't ever 
really considered.  At least that's my recollection.  RTH might recall more.

I think it'd be worth an experiment here, but I think that can/should 
happen independently of Kai's patch.  Arguably the scheduler should have 
all the necessary dataflow information to quickly identify any dead code.

Jeff
Richard Henderson June 23, 2014, 4:42 p.m. UTC | #4
On 06/23/2014 09:22 AM, Jeff Law wrote:
> On 06/23/14 08:32, Richard Biener wrote:
>> Btw, there is now no DCE after peephole2?  Is peephole2 expected to
>> cleanup after itself?
> There were cases where we wanted to change the insns we would output to fit
> into the 4:1:1 issue model of the PPro, but to do so we needed to know what
> registers were live/dead so that we could rewrite the insns appropriately.  It
> didn't fit well into what we could do in the splitters and the old peephole ran
> too late.  Dead code wasn't ever really considered.  At least that's my
> recollection.  RTH might recall more.

Yes, peep2 was about doing what the old "peep1" rtl->text transformation did,
but as an rtl->rtl transformation so we can expose the result to the scheduler.

It's expected that all dead code be gone before sched2, so that the scheduler
sees exactly what needs to be scheduled, and can bundle insns appropriately.

I believe the peep2 pass to also want dead code to be gone, so that it gets an
accurate picture of what registers are live or dead at any point.  As far as I
know, there are no current transformations that create new garbage.



r~
diff mbox

Patch

Index: passes.def
===================================================================
--- passes.def  (Revision 211850)
+++ passes.def  (Arbeitskopie)
@@ -384,7 +384,6 @@  along with GCC; see the file COPYING3.  If not see
          NEXT_PASS (pass_rtl_dse2);
          NEXT_PASS (pass_stack_adjustments);
          NEXT_PASS (pass_jump2);
-         NEXT_PASS (pass_peephole2);
          NEXT_PASS (pass_if_after_reload);
          NEXT_PASS (pass_regrename);
          NEXT_PASS (pass_cprop_hardreg);
@@ -391,6 +390,7 @@  along with GCC; see the file COPYING3.  If not see
          NEXT_PASS (pass_fast_rtl_dce);
          NEXT_PASS (pass_duplicate_computed_gotos);
          NEXT_PASS (pass_reorder_blocks);
+         NEXT_PASS (pass_peephole2);
          NEXT_PASS (pass_branch_target_load_optimize2);
          NEXT_PASS (pass_leaf_regs);
          NEXT_PASS (pass_split_before_sched2);