mbox series

[0/2,IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

Message ID db00fba8-f392-510f-7f75-31dd1eed6f87@linux.ibm.com
Headers show
Series Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register | expand

Message

Peter Bergner Sept. 26, 2018, 9:14 p.m. UTC
PR86939 shows a problem where IRA (and LRA) adds an unneeded conflict between
a pseudo reg and a hard reg leading to an unnecessary copy.  The definition
of conflict that most register allocators use is that two live ranges conflict
if one is "live" at the definition of the other and they have different
values (here "live" means live and available).  In computing conflicts,
IRA/LRA do not try and represent the "...and they have different values"
which leads to the unneeded conflict.  They also add conflicts when they
handle insn operand uses, rather than at operand definitions that the
definition above implies and that other allocators I know about do.

The following 2 patches fixes (some) of the problems mentioned above
and is enough to fix the performance problem reported in the PR.
Firstly, PATCH 1 changes IRA and LRA to compute conflicts when handling
definitions, rather than at uses.  This change is required by the second
patch, since we cannot handle reg to reg copies specially (ie, skip the
addition of a conflict) if we've already made them conflict because we
noticed they're live simultaneously.  Technically, this should only
make a difference in the conflicts computed for two live ranges that
are both undefined.  Not really likely.  I also took it upon myself to
rename the *_born functions, since the live ranges are not born at the
point we call them.  They are actually deaths/last uses and we need to
add them to the "live" sets (we traverse the basic blocks in reverse).
Therefore, I changes the suffix from _born to _live.

PATCH 2 adds the special handling of pseudo to/from hard reg copies.
Specifically, we ignore the source reg of the copy when adding conflicts
for the reg that is being defined in the copy insn.  A few things I'd
like to point out.  Normally, in the other allocators I'm familiar with,
we handle conflicts for copies by removing the source reg from the "live"
set, even if it doesn't die in the copy.  Then when we process the
copies definitions, we add conflicts with everything that is live.
Since we've removed the source reg, then we've successfully not added a
conflict here.  Then we process the copies uses which will automatically
add the source reg back to the live set.  I tried that here too, but
given allocno ranges computation, I couldn't do that here.  I decided
just to create a global var that I can test for when adding conflicts.
Also, this patch does not handle pseudo to pseudo copies, since their
conflicts are computed by checking their allocno ranges for overlap
and I could not determine how to recognize and handle copies during
that process.  If someone has ideas, please let me know.  Finally, I
have explicitly disallowed special handling of copies of register pairs,
since that is difficult to get right.  I tried several solutions before
finally just giving up on them.  That could be a future work item along
with pseudo to pseudo handling if people want.

I'll note that I have bootstrapped and regtested PATCH 1 on both
powerpc64le-linux and x86_64-linux with not regressions.  I have
also bootstrapped and regtested PATCH 1 + PATCH 2 on the same two
platforms with no regressions.

Vlad, Jeff or anyone else, any comments on the approach used here
to fix PR86939?

If the patches are "ok" as is, I'd like to commit PATCH 1 first and
let it bake on trunk for a little while before committing PATCH 2.

Peter

Comments

Vladimir Makarov Sept. 28, 2018, 9:34 p.m. UTC | #1
On 09/26/2018 05:14 PM, Peter Bergner wrote:
> PR86939 shows a problem where IRA (and LRA) adds an unneeded conflict between
> a pseudo reg and a hard reg leading to an unnecessary copy.  The definition
> of conflict that most register allocators use is that two live ranges conflict
> if one is "live" at the definition of the other and they have different
> values (here "live" means live and available).  In computing conflicts,
> IRA/LRA do not try and represent the "...and they have different values"
> which leads to the unneeded conflict.  They also add conflicts when they
> handle insn operand uses, rather than at operand definitions that the
> definition above implies and that other allocators I know about do.
There is some handling simple cases of the same value in LRA but such 
handling is absent in IRA which is the source of the reported problem.

I remember I experimented with value numbering and using it in building 
conflicts (that time it was global.c) but it did not improve the code as 
I expected and made RA much slower.

We still can improve simple cases though.  So thank you, Perter, for 
working on this very non-trivial issue.


> The following 2 patches fixes (some) of the problems mentioned above
> and is enough to fix the performance problem reported in the PR.
> Firstly, PATCH 1 changes IRA and LRA to compute conflicts when handling
> definitions, rather than at uses.  This change is required by the second
> patch, since we cannot handle reg to reg copies specially (ie, skip the
> addition of a conflict) if we've already made them conflict because we
> noticed they're live simultaneously.  Technically, this should only
> make a difference in the conflicts computed for two live ranges that
> are both undefined.  Not really likely.  I also took it upon myself to
> rename the *_born functions, since the live ranges are not born at the
> point we call them.  They are actually deaths/last uses and we need to
> add them to the "live" sets (we traverse the basic blocks in reverse).
> Therefore, I changes the suffix from _born to _live.
We had some PRs with old RA code generation in case of using undefined 
values.  I don't remember what problems were exactly but I remember Ken 
Zadeck worked on this too.  That is why I probably chose the current 
scheme of conflict building.

May be I was wrong because you testing shows that it is not a problem 
anymore.


> PATCH 2 adds the special handling of pseudo to/from hard reg copies.
> Specifically, we ignore the source reg of the copy when adding conflicts
> for the reg that is being defined in the copy insn.  A few things I'd
> like to point out.  Normally, in the other allocators I'm familiar with,
> we handle conflicts for copies by removing the source reg from the "live"
> set, even if it doesn't die in the copy.  Then when we process the
> copies definitions, we add conflicts with everything that is live.
> Since we've removed the source reg, then we've successfully not added a
> conflict here.  Then we process the copies uses which will automatically
> add the source reg back to the live set.  I tried that here too, but
> given allocno ranges computation, I couldn't do that here.  I decided
> just to create a global var that I can test for when adding conflicts.
> Also, this patch does not handle pseudo to pseudo copies, since their
> conflicts are computed by checking their allocno ranges for overlap
> and I could not determine how to recognize and handle copies during
> that process.  If someone has ideas, please let me know.  Finally, I
> have explicitly disallowed special handling of copies of register pairs,
> since that is difficult to get right.  I tried several solutions before
> finally just giving up on them.  That could be a future work item along
> with pseudo to pseudo handling if people want.
Using live ranges speeds up significantly IRA because we don't need to 
build the interference graph.  It also simplifies the fast register 
allocation implementation.  On the other hand, dealing with live ranges 
can be sometimes complicated.
> I'll note that I have bootstrapped and regtested PATCH 1 on both
> powerpc64le-linux and x86_64-linux with not regressions.  I have
> also bootstrapped and regtested PATCH 1 + PATCH 2 on the same two
> platforms with no regressions.
>
> Vlad, Jeff or anyone else, any comments on the approach used here
> to fix PR86939?
>
> If the patches are "ok" as is, I'd like to commit PATCH 1 first and
> let it bake on trunk for a little while before committing PATCH 2.
>
Peter, thank you again for working on the PR.  It is always interesting 
to read your comments about other register allocators and your 
experience about working on RAs.

I am reviewing the patches.  The 1st patch is ok for me.  You can commit 
it to the trunk.  I'll review the second patch on the next week.
Peter Bergner Sept. 30, 2018, 8:20 p.m. UTC | #2
On 9/28/18 4:34 PM, Vladimir Makarov wrote:
> I remember I experimented with value numbering and using it in building
> conflicts (that time it was global.c) but it did not improve the code as
> I expected and made RA much slower.

Yes, value numbering is not cheap to compute for catching special cases 
like this that probably don't happen too often.  Handling copies is fairly
cheap and easy though.


> We had some PRs with old RA code generation in case of using undefined values.
> I don't remember what problems were exactly but I remember Ken Zadeck worked
> on this too.  That is why I probably chose the current scheme of conflict
> building.
> 
> May be I was wrong because you testing shows that it is not a problem anymore.

Maybe it was due to code relying on undefined behavior?  Anyway, it'll be good
to let PATCH 1 bake on trunk for a little while before we commit PATCH 2 to
catch any (if they exist) problems related to that change.


> The 1st patch is ok for me.  You can commit it to the trunk.

Ok, PATCH 1 is now committed, thanks!


> I'll review the second patch on the next week.

Sounds great as it gives any problems caused or exposed by PATCH 1 time to
be reported.  Thanks again!


Peter
H.J. Lu Oct. 1, 2018, 12:57 a.m. UTC | #3
On Sun, Sep 30, 2018 at 1:21 PM Peter Bergner <bergner@linux.ibm.com> wrote:
>
> On 9/28/18 4:34 PM, Vladimir Makarov wrote:
> > I remember I experimented with value numbering and using it in building
> > conflicts (that time it was global.c) but it did not improve the code as
> > I expected and made RA much slower.
>
> Yes, value numbering is not cheap to compute for catching special cases
> like this that probably don't happen too often.  Handling copies is fairly
> cheap and easy though.
>
>
> > We had some PRs with old RA code generation in case of using undefined values.
> > I don't remember what problems were exactly but I remember Ken Zadeck worked
> > on this too.  That is why I probably chose the current scheme of conflict
> > building.
> >
> > May be I was wrong because you testing shows that it is not a problem anymore.
>
> Maybe it was due to code relying on undefined behavior?  Anyway, it'll be good
> to let PATCH 1 bake on trunk for a little while before we commit PATCH 2 to
> catch any (if they exist) problems related to that change.
>
>
> > The 1st patch is ok for me.  You can commit it to the trunk.
>
> Ok, PATCH 1 is now committed, thanks!
>
>
> > I'll review the second patch on the next week.
>
> Sounds great as it gives any problems caused or exposed by PATCH 1 time to
> be reported.  Thanks again!

This caused:

FAIL: gcc.target/i386/pr63527.c scan-assembler-not movl[ \t]%[^,]+, %ebx
FAIL: gcc.target/i386/pr63534.c scan-assembler-not movl[ \t]%[^,]+, %ebx
FAIL: gcc.target/i386/pr64317.c scan-assembler addl[
\\t]+[$]_GLOBAL_OFFSET_TABLE_, %ebx
FAIL: gcc.target/i386/pr64317.c scan-assembler movl[ \\t]+c@GOTOFF[(]%ebx[)]

for Linux/i686.
Peter Bergner Oct. 1, 2018, 1:18 a.m. UTC | #4
On 9/30/18 7:57 PM, H.J. Lu wrote:
> This caused:
> 
> FAIL: gcc.target/i386/pr63527.c scan-assembler-not movl[ \t]%[^,]+, %ebx
> FAIL: gcc.target/i386/pr63534.c scan-assembler-not movl[ \t]%[^,]+, %ebx
> FAIL: gcc.target/i386/pr64317.c scan-assembler addl[
> \\t]+[$]_GLOBAL_OFFSET_TABLE_, %ebx
> FAIL: gcc.target/i386/pr64317.c scan-assembler movl[ \\t]+c@GOTOFF[(]%ebx[)]

Can you check whether the new generated code is at least as good
as the old generated code?  I'm assuming the code we generate now isn't
wrong, just different and maybe we just need to change what we expect
to see.

Peter
H.J. Lu Oct. 1, 2018, 12:44 p.m. UTC | #5
On Sun, Sep 30, 2018 at 6:18 PM Peter Bergner <bergner@linux.ibm.com> wrote:
>
> On 9/30/18 7:57 PM, H.J. Lu wrote:
> > This caused:
> >
> > FAIL: gcc.target/i386/pr63527.c scan-assembler-not movl[ \t]%[^,]+, %ebx
> > FAIL: gcc.target/i386/pr63534.c scan-assembler-not movl[ \t]%[^,]+, %ebx
> > FAIL: gcc.target/i386/pr64317.c scan-assembler addl[
> > \\t]+[$]_GLOBAL_OFFSET_TABLE_, %ebx
> > FAIL: gcc.target/i386/pr64317.c scan-assembler movl[ \\t]+c@GOTOFF[(]%ebx[)]
>
> Can you check whether the new generated code is at least as good
> as the old generated code?  I'm assuming the code we generate now isn't
> wrong, just different and maybe we just need to change what we expect
> to see.

I checked gcc.target/i386/pr63527.c and it has a regression.

Before:

00000000 <foo>:
   0: 53                    push   %ebx
   1: e8 fc ff ff ff        call   2 <foo+0x2>
   6: 81 c3 02 00 00 00    add    $0x2,%ebx
   c: 83 ec 08              sub    $0x8,%esp
   f: e8 fc ff ff ff        call   10 <foo+0x10>
  14: e8 fc ff ff ff        call   15 <foo+0x15>
  19: 83 c4 08              add    $0x8,%esp
  1c: 5b                    pop    %ebx
  1d: c3                    ret

Disassembly of section .text.__x86.get_pc_thunk.bx:

00000000 <__x86.get_pc_thunk.bx>:
   0: 8b 1c 24              mov    (%esp),%ebx
   3: c3                    ret

After:

00000000 <foo>:
   0: 56                    push   %esi
   1: e8 fc ff ff ff        call   2 <foo+0x2>
   6: 81 c6 02 00 00 00    add    $0x2,%esi
   c: 53                    push   %ebx
   d: 83 ec 04              sub    $0x4,%esp
  10: 89 f3                mov    %esi,%ebx
  12: e8 fc ff ff ff        call   13 <foo+0x13>
  17: e8 fc ff ff ff        call   18 <foo+0x18>
  1c: 83 c4 04              add    $0x4,%esp
  1f: 5b                    pop    %ebx
  20: 5e                    pop    %esi
  21: c3                    ret

Disassembly of section .text.__x86.get_pc_thunk.si:

00000000 <__x86.get_pc_thunk.si>:
   0: 8b 34 24              mov    (%esp),%esi
   3: c3                    ret
H.J. Lu Oct. 1, 2018, 12:45 p.m. UTC | #6
On Mon, Oct 1, 2018 at 5:44 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sun, Sep 30, 2018 at 6:18 PM Peter Bergner <bergner@linux.ibm.com> wrote:
> >
> > On 9/30/18 7:57 PM, H.J. Lu wrote:
> > > This caused:
> > >
> > > FAIL: gcc.target/i386/pr63527.c scan-assembler-not movl[ \t]%[^,]+, %ebx
> > > FAIL: gcc.target/i386/pr63534.c scan-assembler-not movl[ \t]%[^,]+, %ebx
> > > FAIL: gcc.target/i386/pr64317.c scan-assembler addl[
> > > \\t]+[$]_GLOBAL_OFFSET_TABLE_, %ebx
> > > FAIL: gcc.target/i386/pr64317.c scan-assembler movl[ \\t]+c@GOTOFF[(]%ebx[)]
> >
> > Can you check whether the new generated code is at least as good
> > as the old generated code?  I'm assuming the code we generate now isn't
> > wrong, just different and maybe we just need to change what we expect
> > to see.
>
> I checked gcc.target/i386/pr63527.c and it has a regression.
>
> Before:
>
> 00000000 <foo>:
>    0: 53                    push   %ebx
>    1: e8 fc ff ff ff        call   2 <foo+0x2>
>    6: 81 c3 02 00 00 00    add    $0x2,%ebx
>    c: 83 ec 08              sub    $0x8,%esp
>    f: e8 fc ff ff ff        call   10 <foo+0x10>
>   14: e8 fc ff ff ff        call   15 <foo+0x15>
>   19: 83 c4 08              add    $0x8,%esp
>   1c: 5b                    pop    %ebx
>   1d: c3                    ret
>
> Disassembly of section .text.__x86.get_pc_thunk.bx:
>
> 00000000 <__x86.get_pc_thunk.bx>:
>    0: 8b 1c 24              mov    (%esp),%ebx
>    3: c3                    ret
>
> After:
>
> 00000000 <foo>:
>    0: 56                    push   %esi
>    1: e8 fc ff ff ff        call   2 <foo+0x2>
>    6: 81 c6 02 00 00 00    add    $0x2,%esi
>    c: 53                    push   %ebx
>    d: 83 ec 04              sub    $0x4,%esp
>   10: 89 f3                mov    %esi,%ebx
>   12: e8 fc ff ff ff        call   13 <foo+0x13>
>   17: e8 fc ff ff ff        call   18 <foo+0x18>
>   1c: 83 c4 04              add    $0x4,%esp
>   1f: 5b                    pop    %ebx
>   20: 5e                    pop    %esi
>   21: c3                    ret
>
> Disassembly of section .text.__x86.get_pc_thunk.si:
>
> 00000000 <__x86.get_pc_thunk.si>:
>    0: 8b 34 24              mov    (%esp),%esi
>    3: c3                    ret
>

You may have undone:

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=218059
H.J. Lu Oct. 1, 2018, 12:50 p.m. UTC | #7
On Mon, Oct 1, 2018 at 5:45 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Oct 1, 2018 at 5:44 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sun, Sep 30, 2018 at 6:18 PM Peter Bergner <bergner@linux.ibm.com> wrote:
> > >
> > > On 9/30/18 7:57 PM, H.J. Lu wrote:
> > > > This caused:
> > > >
> > > > FAIL: gcc.target/i386/pr63527.c scan-assembler-not movl[ \t]%[^,]+, %ebx
> > > > FAIL: gcc.target/i386/pr63534.c scan-assembler-not movl[ \t]%[^,]+, %ebx
> > > > FAIL: gcc.target/i386/pr64317.c scan-assembler addl[
> > > > \\t]+[$]_GLOBAL_OFFSET_TABLE_, %ebx
> > > > FAIL: gcc.target/i386/pr64317.c scan-assembler movl[ \\t]+c@GOTOFF[(]%ebx[)]
> > >
> > > Can you check whether the new generated code is at least as good
> > > as the old generated code?  I'm assuming the code we generate now isn't
> > > wrong, just different and maybe we just need to change what we expect
> > > to see.
> >
> > I checked gcc.target/i386/pr63527.c and it has a regression.
> >
> > Before:
> >
> > 00000000 <foo>:
> >    0: 53                    push   %ebx
> >    1: e8 fc ff ff ff        call   2 <foo+0x2>
> >    6: 81 c3 02 00 00 00    add    $0x2,%ebx
> >    c: 83 ec 08              sub    $0x8,%esp
> >    f: e8 fc ff ff ff        call   10 <foo+0x10>
> >   14: e8 fc ff ff ff        call   15 <foo+0x15>
> >   19: 83 c4 08              add    $0x8,%esp
> >   1c: 5b                    pop    %ebx
> >   1d: c3                    ret
> >
> > Disassembly of section .text.__x86.get_pc_thunk.bx:
> >
> > 00000000 <__x86.get_pc_thunk.bx>:
> >    0: 8b 1c 24              mov    (%esp),%ebx
> >    3: c3                    ret
> >
> > After:
> >
> > 00000000 <foo>:
> >    0: 56                    push   %esi
> >    1: e8 fc ff ff ff        call   2 <foo+0x2>
> >    6: 81 c6 02 00 00 00    add    $0x2,%esi
> >    c: 53                    push   %ebx
> >    d: 83 ec 04              sub    $0x4,%esp
> >   10: 89 f3                mov    %esi,%ebx
> >   12: e8 fc ff ff ff        call   13 <foo+0x13>
> >   17: e8 fc ff ff ff        call   18 <foo+0x18>
> >   1c: 83 c4 04              add    $0x4,%esp
> >   1f: 5b                    pop    %ebx
> >   20: 5e                    pop    %esi
> >   21: c3                    ret
> >
> > Disassembly of section .text.__x86.get_pc_thunk.si:
> >
> > 00000000 <__x86.get_pc_thunk.si>:
> >    0: 8b 34 24              mov    (%esp),%esi
> >    3: c3                    ret
> >
>
> You may have undone:
>
> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=218059
>

I opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87479
Peter Bergner Oct. 1, 2018, 1:51 p.m. UTC | #8
On 10/1/18 7:50 AM, H.J. Lu wrote:
> On Mon, Oct 1, 2018 at 5:45 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>> You may have undone:
>>
>> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=218059
>>
> 
> I opened:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87479


Thanks for checking.  I'll have a look.

Peter
Peter Bergner Oct. 2, 2018, 3:52 a.m. UTC | #9
On 10/1/18 7:45 AM, H.J. Lu wrote:
> You may have undone:
> 
> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=218059

Yes, the code above also needed to be modified to handle conflicts being
added at definitions rather than at uses.  The patch below does that.
I don't really have access to a i686 (ie, 32-bit) system to test on and
I'm not sure how to force the test to be run in 32-bit mode on a 64-bit
build, but it does fix the assembler for the pr63534.c test case.

That said, looking at the rtl for the test case, I see the following
before RA:

(insn 5 2 6 2 (set (reg:SI 3 bx)
        (reg:SI 82)) "pr63534.c":10 85 {*movsi_internal}
     (nil))
(call_insn 6 5 7 2 (call (mem:QI (symbol_ref:SI ("bar") [flags 0x41]  <function_decl 0x7f30c7548000 bar>) [0 barD.1498 S1 A8])
        (const_int 0 [0])) "pr63534.c":10 687 {*call}
     (expr_list:REG_DEAD (reg:SI 3 bx)
        (expr_list:REG_CALL_DECL (symbol_ref:SI ("bar") [flags 0x41]  <function_decl 0x7f30c7548000 bar>)
            (nil)))
    (expr_list (use (reg:SI 3 bx))
        (nil)))
(insn 7 6 8 2 (set (reg:SI 3 bx)
        (reg:SI 82)) "pr63534.c":11 85 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 82)
        (nil)))
(call_insn 8 7 0 2 (call (mem:QI (symbol_ref:SI ("bar") [flags 0x41]  <function_decl 0x7f30c7548000 bar>) [0 barD.1498 S1 A8])
        (const_int 0 [0])) "pr63534.c":11 687 {*call}
     (expr_list:REG_DEAD (reg:SI 3 bx)
        (expr_list:REG_CALL_DECL (symbol_ref:SI ("bar") [flags 0x41]  <function_decl 0x7f30c7548000 bar>)
            (nil)))
    (expr_list (use (reg:SI 3 bx))
        (nil)))

Now that we handle conflicts at definitions and the pic hard reg
is set via a copy from the pic pseudo, my PATCH 2 is setup to
handle exactly this scenario (ie, a copy between a pseudo and
a hard reg).  I looked at the asm output from a build with both
PATCH 1 and PATCH 2, and yes, it also does not add the conflict
between the pic pseudo and pic hard reg, so our other option to
fix PR87479 is to apply PATCH 2.  However, since PATCH 2 handles
the pic pseudo and pic hard reg conflict itself, that means we
don't need the special pic conflict code and it can be removed!
I'm going to update PATCH 2 to remove that pic handling code
and send it through bootstrap and regtesting.

H.J., can you confirm that the following patch not only fixes
the bug you opened, but also doesn't introduce any more?
Once I've updated PATCH 2, I'd like you to test/bless that
one as well.  Thanks.

Peter


gcc/
	PR rtl-optimization/87479
	* ira-lives.c (process_bb_node_lives): Move handling of pic pseudo
	and pic hard reg conflict to the insn that sets pic hard reg.
	* lra-lives.c (mark_regno_dead) <check_pic_pseudo_p>: New function
	argument.  Use it.
	(process_bb_lives): Use new argument to mark_regno_dead.
	Don't handle pic pseudo and pic hard reg conflict when processing
	function call arguments.

Index: gcc/ira-lives.c
===================================================================
--- gcc/ira-lives.c	(revision 264758)
+++ gcc/ira-lives.c	(working copy)
@@ -1108,20 +1108,25 @@ process_bb_node_lives (ira_loop_tree_node_t loop_t
 
 	  call_p = CALL_P (insn);
 #ifdef REAL_PIC_OFFSET_TABLE_REGNUM
-	  int regno;
+	  unsigned int regno;
+	  rtx set;
 	  bool clear_pic_use_conflict_p = false;
-	  /* Processing insn usage in call insn can create conflict
-	     with pic pseudo and pic hard reg and that is wrong.
-	     Check this situation and fix it at the end of the insn
-	     processing.  */
-	  if (call_p && pic_offset_table_rtx != NULL_RTX
+	  /* Processing insn definition of REAL_PIC_OFFSET_TABLE_REGNUM
+	     can create a conflict between the pic pseudo and pic hard reg
+	     and that is wrong.  Check this situation and fix it at the end
+	     of the insn processing.  */
+	  if (pic_offset_table_rtx != NULL_RTX
 	      && (regno = REGNO (pic_offset_table_rtx)) >= FIRST_PSEUDO_REGISTER
-	      && (a = ira_curr_regno_allocno_map[regno]) != NULL)
+	      && (a = ira_curr_regno_allocno_map[regno]) != NULL
+	      && (set = single_set (insn)) != NULL_RTX
+	      && REG_P (SET_DEST (set))
+	      && REGNO (SET_DEST (set)) == REAL_PIC_OFFSET_TABLE_REGNUM
+	      && REG_P (SET_SRC (set))
+	      && REGNO (SET_SRC (set)) == regno)
 	    clear_pic_use_conflict_p
-		= (find_regno_fusage (insn, USE, REAL_PIC_OFFSET_TABLE_REGNUM)
-		   && ! TEST_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS
-					   (ALLOCNO_OBJECT (a, 0)),
-					   REAL_PIC_OFFSET_TABLE_REGNUM));
+		= ! TEST_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS
+				       (ALLOCNO_OBJECT (a, 0)),
+				       REAL_PIC_OFFSET_TABLE_REGNUM);
 #endif
 
 	  /* Mark each defined value as live.  We need to do this for
Index: gcc/lra-lives.c
===================================================================
--- gcc/lra-lives.c	(revision 264758)
+++ gcc/lra-lives.c	(working copy)
@@ -342,7 +342,8 @@ mark_regno_live (int regno, machine_mode mode, int
    BB_GEN_PSEUDOS and BB_KILLED_PSEUDOS.  Return TRUE if the liveness
    tracking sets were modified, or FALSE if nothing changed.  */
 static bool
-mark_regno_dead (int regno, machine_mode mode, int point)
+mark_regno_dead (int regno, machine_mode mode, int point,
+		 bool check_pic_pseudo_p)
 {
   int last;
   bool changed = false;
@@ -350,7 +351,7 @@ static bool
   if (regno < FIRST_PSEUDO_REGISTER)
     {
       for (last = end_hard_regno (mode, regno); regno < last; regno++)
-	make_hard_regno_dead (regno, false);
+	make_hard_regno_dead (regno, check_pic_pseudo_p);
     }
   else
     {
@@ -851,9 +852,11 @@ process_bb_lives (basic_block bb, int &curr_point,
       for (reg = curr_id->regs; reg != NULL; reg = reg->next)
 	if (reg->type == OP_OUT
 	    && ! reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
+	  /* Don't create a conflict between REAL_PIC_OFFSET_TABLE_REGNUM
+	     and the pic pseudo.  */
 	  need_curr_point_incr
 	    |= mark_regno_dead (reg->regno, reg->biggest_mode,
-				curr_point);
+				curr_point, true);
 
       for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
 	if (reg->type == OP_OUT
@@ -863,9 +866,7 @@ process_bb_lives (basic_block bb, int &curr_point,
       if (curr_id->arg_hard_regs != NULL)
 	for (i = 0; (regno = curr_id->arg_hard_regs[i]) >= 0; i++)
 	  if (regno >= FIRST_PSEUDO_REGISTER)
-	    /* It is a clobber.  Don't create conflict of used
-	       REAL_PIC_OFFSET_TABLE_REGNUM and the pic pseudo.  */
-	    make_hard_regno_dead (regno - FIRST_PSEUDO_REGISTER, true);
+	    make_hard_regno_dead (regno - FIRST_PSEUDO_REGISTER, false);
 
       if (call_p)
 	{
@@ -939,7 +940,7 @@ process_bb_lives (basic_block bb, int &curr_point,
 	    && reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
 	  need_curr_point_incr
 	    |= mark_regno_dead (reg->regno, reg->biggest_mode,
-				curr_point);
+				curr_point, false);
 
       for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
 	if (reg->type == OP_OUT
Jeff Law Oct. 2, 2018, 2:22 p.m. UTC | #10
On 10/1/18 9:52 PM, Peter Bergner wrote:
> On 10/1/18 7:45 AM, H.J. Lu wrote:
>> You may have undone:
>>
>> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=218059
> 
> Yes, the code above also needed to be modified to handle conflicts being
> added at definitions rather than at uses.  The patch below does that.
> I don't really have access to a i686 (ie, 32-bit) system to test on and
> I'm not sure how to force the test to be run in 32-bit mode on a 64-bit
> build, but it does fix the assembler for the pr63534.c test case.
> 
> That said, looking at the rtl for the test case, I see the following
> before RA:
> 
> (insn 5 2 6 2 (set (reg:SI 3 bx)
>         (reg:SI 82)) "pr63534.c":10 85 {*movsi_internal}
>      (nil))
> (call_insn 6 5 7 2 (call (mem:QI (symbol_ref:SI ("bar") [flags 0x41]  <function_decl 0x7f30c7548000 bar>) [0 barD.1498 S1 A8])
>         (const_int 0 [0])) "pr63534.c":10 687 {*call}
>      (expr_list:REG_DEAD (reg:SI 3 bx)
>         (expr_list:REG_CALL_DECL (symbol_ref:SI ("bar") [flags 0x41]  <function_decl 0x7f30c7548000 bar>)
>             (nil)))
>     (expr_list (use (reg:SI 3 bx))
>         (nil)))
> (insn 7 6 8 2 (set (reg:SI 3 bx)
>         (reg:SI 82)) "pr63534.c":11 85 {*movsi_internal}
>      (expr_list:REG_DEAD (reg:SI 82)
>         (nil)))
> (call_insn 8 7 0 2 (call (mem:QI (symbol_ref:SI ("bar") [flags 0x41]  <function_decl 0x7f30c7548000 bar>) [0 barD.1498 S1 A8])
>         (const_int 0 [0])) "pr63534.c":11 687 {*call}
>      (expr_list:REG_DEAD (reg:SI 3 bx)
>         (expr_list:REG_CALL_DECL (symbol_ref:SI ("bar") [flags 0x41]  <function_decl 0x7f30c7548000 bar>)
>             (nil)))
>     (expr_list (use (reg:SI 3 bx))
>         (nil)))
> 
> Now that we handle conflicts at definitions and the pic hard reg
> is set via a copy from the pic pseudo, my PATCH 2 is setup to
> handle exactly this scenario (ie, a copy between a pseudo and
> a hard reg).  I looked at the asm output from a build with both
> PATCH 1 and PATCH 2, and yes, it also does not add the conflict
> between the pic pseudo and pic hard reg, so our other option to
> fix PR87479 is to apply PATCH 2.  However, since PATCH 2 handles
> the pic pseudo and pic hard reg conflict itself, that means we
> don't need the special pic conflict code and it can be removed!
> I'm going to update PATCH 2 to remove that pic handling code
> and send it through bootstrap and regtesting.
> 
> H.J., can you confirm that the following patch not only fixes
> the bug you opened, but also doesn't introduce any more?
> Once I've updated PATCH 2, I'd like you to test/bless that
> one as well.  Thanks.
Haven't looked at the patch yet.  The easiest (but not fastest) way to
build i686 native is gcc45 in the build farm.

Jeff
Peter Bergner Oct. 2, 2018, 2:59 p.m. UTC | #11
On 10/1/18 10:52 PM, Peter Bergner wrote:
> Now that we handle conflicts at definitions and the pic hard reg
> is set via a copy from the pic pseudo, my PATCH 2 is setup to
> handle exactly this scenario (ie, a copy between a pseudo and
> a hard reg).  I looked at the asm output from a build with both
> PATCH 1 and PATCH 2, and yes, it also does not add the conflict
> between the pic pseudo and pic hard reg, so our other option to
> fix PR87479 is to apply PATCH 2.  However, since PATCH 2 handles
> the pic pseudo and pic hard reg conflict itself, that means we
> don't need the special pic conflict code and it can be removed!
> I'm going to update PATCH 2 to remove that pic handling code
> and send it through bootstrap and regtesting.

Here is an updated PATCH 2 that adds the generic handling of copies between
pseudos and hard regs which obsoletes the special conflict handling of the
REAL_PIC_OFFSET_TABLE_REGNUM with the pic pseudo.  I have confirmed the
assembler generated by this patch for test case pr63534.c matches the code
generated before PATCH 1 was committed, so we are successfully removing the
copy of the pic pseudo into the pic hard reg with this patch.

I'm currently performing bootstrap and regtesting on powerpc64le-linux and
x86_64-linux.  H.J., could you please test this patch on i686 to verify it
doesn't expose any other problems there?  Otherwise, I'll take Jeff's
suggestion and attempt a build on gcc45, but it sounds like the results
will take a while.

Is this patch version ok for trunk assuming no regressions are found in
the testing mentioned above?

Peter


gcc/
	PR rtl-optimization/86939
	PR rtl-optimization/87479
	* ira.h (copy_insn_p): New prototype.
	* ira-lives.c (ignore_reg_for_conflicts): New static variable.
	(make_hard_regno_dead): Don't add conflicts for register
	ignore_reg_for_conflicts.
	(make_object_dead): Likewise.
	(copy_insn_p): New function.
	(process_bb_node_lives): Set ignore_reg_for_conflicts for copies.
	Remove special conflict handling of REAL_PIC_OFFSET_TABLE_REGNUM.
	* lra-lives.c (ignore_reg_for_conflicts): New static variable.
	(make_hard_regno_dead): Don't add conflicts for register
	ignore_reg_for_conflicts.  Remove special conflict handling of
	REAL_PIC_OFFSET_TABLE_REGNUM.  Remove now unused argument
	check_pic_pseudo_p and update callers.
	(mark_pseudo_dead): Don't add conflicts for register
	ignore_reg_for_conflicts.
	(process_bb_lives): Set ignore_reg_for_conflicts for copies.

gcc/testsuite/
	* gcc.target/powerpc/pr86939.c: New test.

Index: gcc/ira.h
===================================================================
--- gcc/ira.h	(revision 264789)
+++ gcc/ira.h	(working copy)
@@ -210,6 +210,9 @@ extern void ira_adjust_equiv_reg_cost (u
 /* ira-costs.c */
 extern void ira_costs_c_finalize (void);
 
+/* ira-lives.c */
+extern rtx copy_insn_p (rtx_insn *);
+
 /* Spilling static chain pseudo may result in generation of wrong
    non-local goto code using frame-pointer to address saved stack
    pointer value after restoring old frame pointer value.  The
Index: gcc/ira-lives.c
===================================================================
--- gcc/ira-lives.c	(revision 264789)
+++ gcc/ira-lives.c	(working copy)
@@ -84,6 +84,10 @@ static int *allocno_saved_at_call;
    supplemental to recog_data.  */
 static alternative_mask preferred_alternatives;
 
+/* If non-NULL, the source operand of a register to register copy for which
+   we should not add a conflict with the copy's destination operand.  */
+static rtx ignore_reg_for_conflicts;
+
 /* Record hard register REGNO as now being live.  */
 static void
 make_hard_regno_live (int regno)
@@ -101,6 +105,11 @@ make_hard_regno_dead (int regno)
     {
       ira_object_t obj = ira_object_id_map[i];
 
+      if (ignore_reg_for_conflicts != NULL_RTX
+	  && REGNO (ignore_reg_for_conflicts)
+	     == (unsigned int) ALLOCNO_REGNO (OBJECT_ALLOCNO (obj)))
+	continue;
+
       SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), regno);
       SET_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), regno);
     }
@@ -154,12 +163,38 @@ static void
 make_object_dead (ira_object_t obj)
 {
   live_range_t lr;
+  int ignore_regno = -1;
+  int last_regno = -1;
 
   sparseset_clear_bit (objects_live, OBJECT_CONFLICT_ID (obj));
 
+  /* Check whether any part of IGNORE_REG_FOR_CONFLICTS already conflicts
+     with OBJ.  */
+  if (ignore_reg_for_conflicts != NULL_RTX
+      && REGNO (ignore_reg_for_conflicts) < FIRST_PSEUDO_REGISTER)
+    {
+      last_regno = END_REGNO (ignore_reg_for_conflicts);
+      int src_regno = ignore_regno = REGNO (ignore_reg_for_conflicts);
+
+      while (src_regno < last_regno)
+	{
+	  if (TEST_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), src_regno))
+	    {
+	      ignore_regno = last_regno = -1;
+	      break;
+	    }
+	  src_regno++;
+	}
+    }
+
   IOR_HARD_REG_SET (OBJECT_CONFLICT_HARD_REGS (obj), hard_regs_live);
   IOR_HARD_REG_SET (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), hard_regs_live);
 
+  /* If IGNORE_REG_FOR_CONFLICTS did not already conflict with OBJ, make
+     sure it still doesn't.  */
+  for (; ignore_regno < last_regno; ignore_regno++)
+    CLEAR_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), ignore_regno);
+
   lr = OBJECT_LIVE_RANGES (obj);
   ira_assert (lr != NULL);
   lr->finish = curr_point;
@@ -1022,6 +1057,38 @@ find_call_crossed_cheap_reg (rtx_insn *i
   return cheap_reg;
 }  
 
+/* Determine whether INSN is a register to register copy of the type where
+   we do not need to make the source and destiniation registers conflict.
+   If this is a copy instruction, then return the source reg.  Otherwise,
+   return NULL_RTX.  */
+rtx
+copy_insn_p (rtx_insn *insn)
+{
+  rtx set;
+
+  /* Disallow anything other than a simple register to register copy
+     that has no side effects.  */
+  if ((set = single_set (insn)) == NULL_RTX
+      || !REG_P (SET_DEST (set))
+      || !REG_P (SET_SRC (set))
+      || side_effects_p (set))
+    return NULL_RTX;
+
+  int dst_regno = REGNO (SET_DEST (set));
+  int src_regno = REGNO (SET_SRC (set));
+  machine_mode mode = GET_MODE (SET_DEST (set));
+
+  /* Computing conflicts for register pairs is difficult to get right, so
+     for now, disallow it.  */
+  if ((dst_regno < FIRST_PSEUDO_REGISTER
+       && hard_regno_nregs (dst_regno, mode) != 1)
+      || (src_regno < FIRST_PSEUDO_REGISTER
+	  && hard_regno_nregs (src_regno, mode) != 1))
+    return NULL_RTX;
+
+  return SET_SRC (set);
+}
+
 /* Process insns of the basic block given by its LOOP_TREE_NODE to
    update allocno live ranges, allocno hard register conflicts,
    intersected calls, and register pressure info for allocnos for the
@@ -1107,22 +1174,7 @@ process_bb_node_lives (ira_loop_tree_nod
 		     curr_point);
 
 	  call_p = CALL_P (insn);
-#ifdef REAL_PIC_OFFSET_TABLE_REGNUM
-	  int regno;
-	  bool clear_pic_use_conflict_p = false;
-	  /* Processing insn usage in call insn can create conflict
-	     with pic pseudo and pic hard reg and that is wrong.
-	     Check this situation and fix it at the end of the insn
-	     processing.  */
-	  if (call_p && pic_offset_table_rtx != NULL_RTX
-	      && (regno = REGNO (pic_offset_table_rtx)) >= FIRST_PSEUDO_REGISTER
-	      && (a = ira_curr_regno_allocno_map[regno]) != NULL)
-	    clear_pic_use_conflict_p
-		= (find_regno_fusage (insn, USE, REAL_PIC_OFFSET_TABLE_REGNUM)
-		   && ! TEST_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS
-					   (ALLOCNO_OBJECT (a, 0)),
-					   REAL_PIC_OFFSET_TABLE_REGNUM));
-#endif
+	  ignore_reg_for_conflicts = copy_insn_p (insn);
 
 	  /* Mark each defined value as live.  We need to do this for
 	     unused values because they still conflict with quantities
@@ -1275,20 +1327,9 @@ process_bb_node_lives (ira_loop_tree_nod
 		}
 	    }
 
-#ifdef REAL_PIC_OFFSET_TABLE_REGNUM
-	  if (clear_pic_use_conflict_p)
-	    {
-	      regno = REGNO (pic_offset_table_rtx);
-	      a = ira_curr_regno_allocno_map[regno];
-	      CLEAR_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (ALLOCNO_OBJECT (a, 0)),
-				  REAL_PIC_OFFSET_TABLE_REGNUM);
-	      CLEAR_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS
-				  (ALLOCNO_OBJECT (a, 0)),
-				  REAL_PIC_OFFSET_TABLE_REGNUM);
-	    }
-#endif
 	  curr_point++;
 	}
+      ignore_reg_for_conflicts = NULL_RTX;
 
       if (bb_has_eh_pred (bb))
 	for (j = 0; ; ++j)
Index: gcc/lra-lives.c
===================================================================
--- gcc/lra-lives.c	(revision 264789)
+++ gcc/lra-lives.c	(working copy)
@@ -96,6 +96,10 @@ static bitmap_head temp_bitmap;
 /* Pool for pseudo live ranges.	 */
 static object_allocator<lra_live_range> lra_live_range_pool ("live ranges");
 
+/* If non-NULL, the source operand of a register to register copy for which
+   we should not add a conflict with the copy's destination operand.  */
+static rtx ignore_reg_for_conflicts;
+
 /* Free live range list LR.  */
 static void
 free_live_range_list (lra_live_range_t lr)
@@ -239,11 +243,9 @@ make_hard_regno_live (int regno)
 
 /* Process the definition of hard register REGNO.  This updates
    hard_regs_live, START_DYING and conflict hard regs for living
-   pseudos.  Conflict hard regs for the pic pseudo is not updated if
-   REGNO is REAL_PIC_OFFSET_TABLE_REGNUM and CHECK_PIC_PSEUDO_P is
-   true.  */
+   pseudos.  */
 static void
-make_hard_regno_dead (int regno, bool check_pic_pseudo_p ATTRIBUTE_UNUSED)
+make_hard_regno_dead (int regno)
 {
   lra_assert (regno < FIRST_PSEUDO_REGISTER);
   if (! TEST_HARD_REG_BIT (hard_regs_live, regno))
@@ -251,13 +253,12 @@ make_hard_regno_dead (int regno, bool ch
   sparseset_set_bit (start_dying, regno);
   unsigned int i;
   EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i)
-#ifdef REAL_PIC_OFFSET_TABLE_REGNUM
-    if (! check_pic_pseudo_p
-	|| regno != REAL_PIC_OFFSET_TABLE_REGNUM
-	|| pic_offset_table_rtx == NULL
-	|| i != REGNO (pic_offset_table_rtx))
-#endif
+    {
+      if (ignore_reg_for_conflicts != NULL_RTX
+	  && REGNO (ignore_reg_for_conflicts) == i)
+	continue;
       SET_HARD_REG_BIT (lra_reg_info[i].conflict_hard_regs, regno);
+    }
   CLEAR_HARD_REG_BIT (hard_regs_live, regno);
   if (fixed_regs[regno] || TEST_HARD_REG_BIT (hard_regs_spilled_into, regno))
     {
@@ -294,14 +295,41 @@ static void
 mark_pseudo_dead (int regno, int point)
 {
   lra_live_range_t p;
+  int ignore_regno = -1;
+  int last_regno = -1;
 
   lra_assert (regno >= FIRST_PSEUDO_REGISTER);
   lra_assert (sparseset_bit_p (pseudos_live, regno));
   sparseset_clear_bit (pseudos_live, regno);
   sparseset_set_bit (start_dying, regno);
 
+  /* Check whether any part of IGNORE_REG_FOR_CONFLICTS already conflicts
+     with REGNO.  */
+  if (ignore_reg_for_conflicts != NULL_RTX
+      && REGNO (ignore_reg_for_conflicts) < FIRST_PSEUDO_REGISTER)
+    {
+      last_regno = END_REGNO (ignore_reg_for_conflicts);
+      int src_regno = ignore_regno = REGNO (ignore_reg_for_conflicts);
+
+      while (src_regno < last_regno)
+	{
+	  if (TEST_HARD_REG_BIT (lra_reg_info[regno].conflict_hard_regs,
+				 src_regno))
+	    {
+	      ignore_regno = last_regno = -1;
+	      break;
+	    }
+	  src_regno++;
+	}
+    }
+
   IOR_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs, hard_regs_live);
 
+  /* If IGNORE_REG_FOR_CONFLICTS did not already conflict with REGNO, make
+     sure it still doesn't.  */
+  for (; ignore_regno < last_regno; ignore_regno++)
+    CLEAR_HARD_REG_BIT (lra_reg_info[regno].conflict_hard_regs, ignore_regno);
+
   if (complete_info_p || lra_get_regno_hard_regno (regno) < 0)
     {
       p = lra_reg_info[regno].live_ranges;
@@ -350,7 +378,7 @@ mark_regno_dead (int regno, machine_mode
   if (regno < FIRST_PSEUDO_REGISTER)
     {
       for (last = end_hard_regno (mode, regno); regno < last; regno++)
-	make_hard_regno_dead (regno, false);
+	make_hard_regno_dead (regno);
     }
   else
     {
@@ -747,6 +775,7 @@ process_bb_lives (basic_block bb, int &c
 	}
 
       call_p = CALL_P (curr_insn);
+      ignore_reg_for_conflicts = copy_insn_p (curr_insn);
       src_regno = (set != NULL_RTX && REG_P (SET_SRC (set))
 		   ? REGNO (SET_SRC (set)) : -1);
       dst_regno = (set != NULL_RTX && REG_P (SET_DEST (set))
@@ -858,14 +887,13 @@ process_bb_lives (basic_block bb, int &c
       for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
 	if (reg->type == OP_OUT
 	    && ! reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
-	  make_hard_regno_dead (reg->regno, false);
+	  make_hard_regno_dead (reg->regno);
 
       if (curr_id->arg_hard_regs != NULL)
 	for (i = 0; (regno = curr_id->arg_hard_regs[i]) >= 0; i++)
 	  if (regno >= FIRST_PSEUDO_REGISTER)
-	    /* It is a clobber.  Don't create conflict of used
-	       REAL_PIC_OFFSET_TABLE_REGNUM and the pic pseudo.  */
-	    make_hard_regno_dead (regno - FIRST_PSEUDO_REGISTER, true);
+	    /* It is a clobber.  */
+	    make_hard_regno_dead (regno - FIRST_PSEUDO_REGISTER);
 
       if (call_p)
 	{
@@ -925,8 +953,7 @@ process_bb_lives (basic_block bb, int &c
 	  make_hard_regno_live (reg->regno);
 
       if (curr_id->arg_hard_regs != NULL)
-	/* Make argument hard registers live.  Don't create conflict
-	   of used REAL_PIC_OFFSET_TABLE_REGNUM and the pic pseudo.  */
+	/* Make argument hard registers live.  */
 	for (i = 0; (regno = curr_id->arg_hard_regs[i]) >= 0; i++)
 	  if (regno < FIRST_PSEUDO_REGISTER)
 	    make_hard_regno_live (regno);
@@ -954,7 +981,7 @@ process_bb_lives (basic_block bb, int &c
 	      if (reg2->type != OP_OUT && reg2->regno == reg->regno)
 		break;
 	    if (reg2 == NULL)
-	      make_hard_regno_dead (reg->regno, false);
+	      make_hard_regno_dead (reg->regno);
 	  }
 
       if (need_curr_point_incr)
@@ -989,6 +1016,7 @@ process_bb_lives (basic_block bb, int &c
       EXECUTE_IF_SET_IN_SPARSESET (unused_set, j)
 	add_reg_note (curr_insn, REG_UNUSED, regno_reg_rtx[j]);
     }
+  ignore_reg_for_conflicts = NULL_RTX;
 
   if (bb_has_eh_pred (bb))
     for (j = 0; ; ++j)
Index: gcc/testsuite/gcc.target/powerpc/pr86939.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr86939.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr86939.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-options "-O2" } */
+
+typedef long (*fptr_t) (void);
+long
+func (fptr_t *p)
+{
+  if (p)
+    return (*p) ();
+  return 0;
+}
+/* { dg-final { scan-assembler-not {mr %?r?12,} } } */
H.J. Lu Oct. 2, 2018, 3:32 p.m. UTC | #12
On Tue, Oct 2, 2018 at 7:59 AM Peter Bergner <bergner@linux.ibm.com> wrote:
>
> On 10/1/18 10:52 PM, Peter Bergner wrote:
> > Now that we handle conflicts at definitions and the pic hard reg
> > is set via a copy from the pic pseudo, my PATCH 2 is setup to
> > handle exactly this scenario (ie, a copy between a pseudo and
> > a hard reg).  I looked at the asm output from a build with both
> > PATCH 1 and PATCH 2, and yes, it also does not add the conflict
> > between the pic pseudo and pic hard reg, so our other option to
> > fix PR87479 is to apply PATCH 2.  However, since PATCH 2 handles
> > the pic pseudo and pic hard reg conflict itself, that means we
> > don't need the special pic conflict code and it can be removed!
> > I'm going to update PATCH 2 to remove that pic handling code
> > and send it through bootstrap and regtesting.
>
> Here is an updated PATCH 2 that adds the generic handling of copies between
> pseudos and hard regs which obsoletes the special conflict handling of the
> REAL_PIC_OFFSET_TABLE_REGNUM with the pic pseudo.  I have confirmed the
> assembler generated by this patch for test case pr63534.c matches the code
> generated before PATCH 1 was committed, so we are successfully removing the
> copy of the pic pseudo into the pic hard reg with this patch.
>
> I'm currently performing bootstrap and regtesting on powerpc64le-linux and
> x86_64-linux.  H.J., could you please test this patch on i686 to verify it
> doesn't expose any other problems there?  Otherwise, I'll take Jeff's

I am waiting for the result of your previous patch.  I will test this one after
that.

> suggestion and attempt a build on gcc45, but it sounds like the results
> will take a while.
>
> Is this patch version ok for trunk assuming no regressions are found in
> the testing mentioned above?
>
> Peter
>
>
> gcc/
>         PR rtl-optimization/86939
>         PR rtl-optimization/87479
>         * ira.h (copy_insn_p): New prototype.
>         * ira-lives.c (ignore_reg_for_conflicts): New static variable.
>         (make_hard_regno_dead): Don't add conflicts for register
>         ignore_reg_for_conflicts.
>         (make_object_dead): Likewise.
>         (copy_insn_p): New function.
>         (process_bb_node_lives): Set ignore_reg_for_conflicts for copies.
>         Remove special conflict handling of REAL_PIC_OFFSET_TABLE_REGNUM.
>         * lra-lives.c (ignore_reg_for_conflicts): New static variable.
>         (make_hard_regno_dead): Don't add conflicts for register
>         ignore_reg_for_conflicts.  Remove special conflict handling of
>         REAL_PIC_OFFSET_TABLE_REGNUM.  Remove now unused argument
>         check_pic_pseudo_p and update callers.
>         (mark_pseudo_dead): Don't add conflicts for register
>         ignore_reg_for_conflicts.
>         (process_bb_lives): Set ignore_reg_for_conflicts for copies.
>
Peter Bergner Oct. 2, 2018, 3:39 p.m. UTC | #13
On 10/2/18 10:32 AM, H.J. Lu wrote:
> On Tue, Oct 2, 2018 at 7:59 AM Peter Bergner <bergner@linux.ibm.com> wrote:
>> I'm currently performing bootstrap and regtesting on powerpc64le-linux and
>> x86_64-linux.  H.J., could you please test this patch on i686 to verify it
>> doesn't expose any other problems there?  Otherwise, I'll take Jeff's
> 
> I am waiting for the result of your previous patch.  I will test this one after
> that.

Great, thanks!

Peter
H.J. Lu Oct. 2, 2018, 5:03 p.m. UTC | #14
On Tue, Oct 2, 2018 at 8:39 AM Peter Bergner <bergner@linux.ibm.com> wrote:
>
> On 10/2/18 10:32 AM, H.J. Lu wrote:
> > On Tue, Oct 2, 2018 at 7:59 AM Peter Bergner <bergner@linux.ibm.com> wrote:
> >> I'm currently performing bootstrap and regtesting on powerpc64le-linux and
> >> x86_64-linux.  H.J., could you please test this patch on i686 to verify it
> >> doesn't expose any other problems there?  Otherwise, I'll take Jeff's
> >
> > I am waiting for the result of your previous patch.  I will test this one after
> > that.
>
> Great, thanks!

Your previous patch is OK.  I am testing the new patch now.
Peter Bergner Oct. 2, 2018, 7:44 p.m. UTC | #15
On 10/2/18 9:59 AM, Peter Bergner wrote:
> gcc/
> 	PR rtl-optimization/86939
> 	PR rtl-optimization/87479
> 	* ira.h (copy_insn_p): New prototype.
> 	* ira-lives.c (ignore_reg_for_conflicts): New static variable.
> 	(make_hard_regno_dead): Don't add conflicts for register
> 	ignore_reg_for_conflicts.
> 	(make_object_dead): Likewise.
> 	(copy_insn_p): New function.
> 	(process_bb_node_lives): Set ignore_reg_for_conflicts for copies.
> 	Remove special conflict handling of REAL_PIC_OFFSET_TABLE_REGNUM.
> 	* lra-lives.c (ignore_reg_for_conflicts): New static variable.
> 	(make_hard_regno_dead): Don't add conflicts for register
> 	ignore_reg_for_conflicts.  Remove special conflict handling of
> 	REAL_PIC_OFFSET_TABLE_REGNUM.  Remove now unused argument
> 	check_pic_pseudo_p and update callers.
> 	(mark_pseudo_dead): Don't add conflicts for register
> 	ignore_reg_for_conflicts.
> 	(process_bb_lives): Set ignore_reg_for_conflicts for copies.

So bootstrap and regtesting on powerpc64le-linux show no regressions.
Looking at the x86_64-linux results, I see one test suite regression:

  FAIL: gcc.target/i386/pr49095.c scan-assembler-times \\), % 8

I have included the function that is compiled differently with and
without the patch.  Looking at the IRA rtl dumps, there is pseudo
85 that is copied from and into hard reg 5 and we now no longer
report that they conflict.  That allows pesudo 85 to now be assigned
to hard reg 5, rather than hard reg 3 (old code) and that leads
to the code differences shown below.  I don't know x86_64 mnemonics
enough to say whether the code changes below are "better" or "similar"
or ???.  H.J., can you comment on the below code gen changes?
If they're better or similar to the old code, I could just modify
the expected results for pr49095.c.

Peter



[bergner@dagger1 PR87479]$ cat pr49095.i 
void foo (void *);

int *
f1 (int *x)
{
  if (!--*x)
    foo (x);
  return x;
}
[bergner@dagger1 PR87479]$ /data/bergner/gcc/build/gcc-fsf-mainline-pr87479-base-regtest/gcc/xgcc -B/data/bergner/gcc/build/gcc-fsf-mainline-pr87479-base-regtest/gcc/ -Os -fno-shrink-wrap -masm=att -ffat-lto-objects -fno-ident -S -o pr49095-base.s pr49095.i 
[bergner@dagger1 PR87479]$ /data/bergner/gcc/build/gcc-fsf-mainline-pr87479-regtest/gcc/xgcc -B/data/bergner/gcc/build/gcc-fsf-mainline-pr87479-regtest/gcc/ -Os -fno-shrink-wrap -masm=att -ffat-lto-objects -fno-ident -S -o pr49095-new.s pr49095.i 
[bergner@dagger1 PR87479]$ diff -u pr49095-base.s pr49095-new.s 
--- pr49095-base.s	2018-10-02 14:07:09.000000000 -0500
+++ pr49095-new.s	2018-10-02 14:07:40.000000000 -0500
@@ -5,16 +5,16 @@
 f1:
 .LFB0:
 	.cfi_startproc
+	subq	$24, %rsp
+	.cfi_def_cfa_offset 32
 	decl	(%rdi)
-	pushq	%rbx
-	.cfi_def_cfa_offset 16
-	.cfi_offset 3, -16
-	movq	%rdi, %rbx
 	jne	.L2
+	movq	%rdi, 8(%rsp)
 	call	foo
+	movq	8(%rsp), %rdi
 .L2:
-	movq	%rbx, %rax
-	popq	%rbx
+	movq	%rdi, %rax
+	addq	$24, %rsp
 	.cfi_def_cfa_offset 8
 	ret
 	.cfi_endproc
H.J. Lu Oct. 2, 2018, 9:52 p.m. UTC | #16
On Tue, Oct 2, 2018 at 12:44 PM Peter Bergner <bergner@linux.ibm.com> wrote:
>
> On 10/2/18 9:59 AM, Peter Bergner wrote:
> > gcc/
> >       PR rtl-optimization/86939
> >       PR rtl-optimization/87479
> >       * ira.h (copy_insn_p): New prototype.
> >       * ira-lives.c (ignore_reg_for_conflicts): New static variable.
> >       (make_hard_regno_dead): Don't add conflicts for register
> >       ignore_reg_for_conflicts.
> >       (make_object_dead): Likewise.
> >       (copy_insn_p): New function.
> >       (process_bb_node_lives): Set ignore_reg_for_conflicts for copies.
> >       Remove special conflict handling of REAL_PIC_OFFSET_TABLE_REGNUM.
> >       * lra-lives.c (ignore_reg_for_conflicts): New static variable.
> >       (make_hard_regno_dead): Don't add conflicts for register
> >       ignore_reg_for_conflicts.  Remove special conflict handling of
> >       REAL_PIC_OFFSET_TABLE_REGNUM.  Remove now unused argument
> >       check_pic_pseudo_p and update callers.
> >       (mark_pseudo_dead): Don't add conflicts for register
> >       ignore_reg_for_conflicts.
> >       (process_bb_lives): Set ignore_reg_for_conflicts for copies.
>
> So bootstrap and regtesting on powerpc64le-linux show no regressions.
> Looking at the x86_64-linux results, I see one test suite regression:
>
>   FAIL: gcc.target/i386/pr49095.c scan-assembler-times \\), % 8
>
> I have included the function that is compiled differently with and
> without the patch.  Looking at the IRA rtl dumps, there is pseudo
> 85 that is copied from and into hard reg 5 and we now no longer
> report that they conflict.  That allows pesudo 85 to now be assigned
> to hard reg 5, rather than hard reg 3 (old code) and that leads
> to the code differences shown below.  I don't know x86_64 mnemonics
> enough to say whether the code changes below are "better" or "similar"
> or ???.  H.J., can you comment on the below code gen changes?
> If they're better or similar to the old code, I could just modify
> the expected results for pr49095.c.
>
> Peter
>
>
>
> [bergner@dagger1 PR87479]$ cat pr49095.i
> void foo (void *);
>
> int *
> f1 (int *x)
> {
>   if (!--*x)
>     foo (x);
>   return x;
> }
> [bergner@dagger1 PR87479]$ /data/bergner/gcc/build/gcc-fsf-mainline-pr87479-base-regtest/gcc/xgcc -B/data/bergner/gcc/build/gcc-fsf-mainline-pr87479-base-regtest/gcc/ -Os -fno-shrink-wrap -masm=att -ffat-lto-objects -fno-ident -S -o pr49095-base.s pr49095.i
> [bergner@dagger1 PR87479]$ /data/bergner/gcc/build/gcc-fsf-mainline-pr87479-regtest/gcc/xgcc -B/data/bergner/gcc/build/gcc-fsf-mainline-pr87479-regtest/gcc/ -Os -fno-shrink-wrap -masm=att -ffat-lto-objects -fno-ident -S -o pr49095-new.s pr49095.i
> [bergner@dagger1 PR87479]$ diff -u pr49095-base.s pr49095-new.s
> --- pr49095-base.s      2018-10-02 14:07:09.000000000 -0500
> +++ pr49095-new.s       2018-10-02 14:07:40.000000000 -0500
> @@ -5,16 +5,16 @@
>  f1:
>  .LFB0:
>         .cfi_startproc
> +       subq    $24, %rsp
> +       .cfi_def_cfa_offset 32
>         decl    (%rdi)
> -       pushq   %rbx
> -       .cfi_def_cfa_offset 16
> -       .cfi_offset 3, -16
> -       movq    %rdi, %rbx
>         jne     .L2
> +       movq    %rdi, 8(%rsp)
>         call    foo
> +       movq    8(%rsp), %rdi
>  .L2:
> -       movq    %rbx, %rax
> -       popq    %rbx
> +       movq    %rdi, %rax
> +       addq    $24, %rsp
>         .cfi_def_cfa_offset 8
>         ret
>         .cfi_endproc
>

I saw the same failures:

FAIL: gcc.target/i386/pr49095.c scan-assembler-times \\), % 8
FAIL: gcc.target/i386/pr49095.c scan-assembler-times \\), % 8

I think the new ones are better, especially in 32-bit case:

Old:

[hjl@gnu-cfl-1 gcc]$ ./xgcc -B./
/export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.target/i386/pr49095.c
-m32 -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never -Os -fno-shrink-wrap -masm=att -mregparm=2
-ffat-lto-objects -fno-ident -S -o pr49095.s
[hjl@gnu-cfl-1 gcc]$ wc -l pr49095.s
2314 pr49095.s
[hjl@gnu-cfl-1 gcc]$

New:

[hjl@gnu-skl-1 gcc]$ ./xgcc -B./
/export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.target/i386/pr49095.c
-m32 -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never -Os -fno-shrink-wrap -masm=att -mregparm=2
-ffat-lto-objects -fno-ident -S -o pr49095.s
[hjl@gnu-skl-1 gcc]$ wc -l pr49095.s
2163 pr49095.s
[hjl@gnu-skl-1 gcc]$
Peter Bergner Oct. 2, 2018, 10:27 p.m. UTC | #17
On 10/2/18 4:52 PM, H.J. Lu wrote:
> I saw the same failures:
> 
> FAIL: gcc.target/i386/pr49095.c scan-assembler-times \\), % 8
> FAIL: gcc.target/i386/pr49095.c scan-assembler-times \\), % 8
> 
> I think the new ones are better, especially in 32-bit case:

Excellent!  Does the following test case patch make it so that
it PASSes again?

Peter


Index: gcc/testsuite/gcc.target/i386/pr49095.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr49095.c	(revision 264793)
+++ gcc/testsuite/gcc.target/i386/pr49095.c	(working copy)
@@ -73,4 +73,5 @@ G (long)
 /* { dg-final { scan-assembler-not "test\[lq\]" } } */
 /* The {f,h}{char,short,int,long}xor functions aren't optimized into
    a RMW instruction, so need load, modify and store.  FIXME eventually.  */
-/* { dg-final { scan-assembler-times "\\), %" 8 } } */
+/* { dg-final { scan-assembler-times "\\), %" 57 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "\\), %" 45 { target { lp64 } } } } */
H.J. Lu Oct. 3, 2018, 12:17 a.m. UTC | #18
On Tue, Oct 2, 2018 at 3:28 PM Peter Bergner <bergner@linux.ibm.com> wrote:
>
> On 10/2/18 4:52 PM, H.J. Lu wrote:
> > I saw the same failures:
> >
> > FAIL: gcc.target/i386/pr49095.c scan-assembler-times \\), % 8
> > FAIL: gcc.target/i386/pr49095.c scan-assembler-times \\), % 8
> >
> > I think the new ones are better, especially in 32-bit case:
>
> Excellent!  Does the following test case patch make it so that
> it PASSes again?
>
> Peter
>
>
> Index: gcc/testsuite/gcc.target/i386/pr49095.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/pr49095.c     (revision 264793)
> +++ gcc/testsuite/gcc.target/i386/pr49095.c     (working copy)
> @@ -73,4 +73,5 @@ G (long)
>  /* { dg-final { scan-assembler-not "test\[lq\]" } } */
>  /* The {f,h}{char,short,int,long}xor functions aren't optimized into
>     a RMW instruction, so need load, modify and store.  FIXME eventually.  */
> -/* { dg-final { scan-assembler-times "\\), %" 8 } } */
> +/* { dg-final { scan-assembler-times "\\), %" 57 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "\\), %" 45 { target { lp64 } } } } */

                      ^^^^^ This is wrong.

It should be not ia32.  Otherwise, it will skip x32.
Peter Bergner Oct. 3, 2018, 12:34 a.m. UTC | #19
On 10/2/18 7:17 PM, H.J. Lu wrote:
>> Index: gcc/testsuite/gcc.target/i386/pr49095.c
>> ===================================================================
>> --- gcc/testsuite/gcc.target/i386/pr49095.c     (revision 264793)
>> +++ gcc/testsuite/gcc.target/i386/pr49095.c     (working copy)
>> @@ -73,4 +73,5 @@ G (long)
>>  /* { dg-final { scan-assembler-not "test\[lq\]" } } */
>>  /* The {f,h}{char,short,int,long}xor functions aren't optimized into
>>     a RMW instruction, so need load, modify and store.  FIXME eventually.  */
>> -/* { dg-final { scan-assembler-times "\\), %" 8 } } */
>> +/* { dg-final { scan-assembler-times "\\), %" 57 { target { ia32 } } } } */
>> +/* { dg-final { scan-assembler-times "\\), %" 45 { target { lp64 } } } } */
> 
>                       ^^^^^ This is wrong.
> 
> It should be not ia32.  Otherwise, it will skip x32.

Ok, I changed it, thanks.

Peter


Index: gcc/testsuite/gcc.target/i386/pr49095.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr49095.c	(revision 264793)
+++ gcc/testsuite/gcc.target/i386/pr49095.c	(working copy)
@@ -73,4 +73,5 @@ G (long)
 /* { dg-final { scan-assembler-not "test\[lq\]" } } */
 /* The {f,h}{char,short,int,long}xor functions aren't optimized into
    a RMW instruction, so need load, modify and store.  FIXME eventually.  */
-/* { dg-final { scan-assembler-times "\\), %" 8 } } */
+/* { dg-final { scan-assembler-times "\\), %" 57 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "\\), %" 45 { target { ! ia32 } } } } */