diff mbox series

[2/2,v3,IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

Message ID 14bf79ef-9db2-e76b-df10-fcb2574d5ccb@linux.ibm.com
State New
Headers show
Series None | expand

Commit Message

Peter Bergner Oct. 3, 2018, 2:09 p.m. UTC
Here is another updated PATCH 2 that is the same as the previous version,
but includes the modification to the expected output of i386/pr49095.c
test case, as H.J. has confirmed the code gen changes we are seeing on
are a good thing.

This patch completed bootstrap and regtesting on powerpc64le-linux,
x86_64-linux and i686-linux (Thanks H.J.!) with no regressions.
Ok for trunk?

Peter


gcc/
	PR rtl-optimization/86939
	PR rtl-optimization/87479
	* ira.h (copy_insn_p): New prototype.
	* ira-lives.c (ignore_reg_for_conflicts): New static variable.
	(make_hard_regno_dead): Don't add conflicts for register
	ignore_reg_for_conflicts.
	(make_object_dead): Likewise.
	(copy_insn_p): New function.
	(process_bb_node_lives): Set ignore_reg_for_conflicts for copies.
	Remove special conflict handling of REAL_PIC_OFFSET_TABLE_REGNUM.
	* lra-lives.c (ignore_reg_for_conflicts): New static variable.
	(make_hard_regno_dead): Don't add conflicts for register
	ignore_reg_for_conflicts.  Remove special conflict handling of
	REAL_PIC_OFFSET_TABLE_REGNUM.  Remove now unused argument
	check_pic_pseudo_p and update callers.
	(mark_pseudo_dead): Don't add conflicts for register
	ignore_reg_for_conflicts.
	(process_bb_lives): Set ignore_reg_for_conflicts for copies.

gcc/testsuite/
	* gcc.target/powerpc/pr86939.c: New test.
	* gcc/testsuite/gcc.target/i386/pr49095.c: Fix expected results.

Comments

Vladimir Makarov Oct. 4, 2018, 8:01 p.m. UTC | #1
On 10/03/2018 10:09 AM, Peter Bergner wrote:
> Here is another updated PATCH 2 that is the same as the previous version,
> but includes the modification to the expected output of i386/pr49095.c
> test case, as H.J. has confirmed the code gen changes we are seeing on
> are a good thing.
>
> This patch completed bootstrap and regtesting on powerpc64le-linux,
> x86_64-linux and i686-linux (Thanks H.J.!) with no regressions.
> Ok for trunk?
>
The patch is ok for the trunk.  Only very minor comments.

IMHO, the name copy_insn_p is too common and confusing (we already have 
functions copy_insn and copy_insn_1 in GCC).  The name does not reflect 
its result meaning.  I would call it something like 
non_conflict_copy_source_reg although it is long.

Also I would rename last_regno to bound_regno because it is better 
reflect variable value meaning or at least to end_regno as it is a value 
of END_REGNO macro.

But you can ignore these insignificant comments.

Peter, thank you for working on the issue.  The patches may be solve 
more PRs you mentioned.
>
>
> gcc/
> 	PR rtl-optimization/86939
> 	PR rtl-optimization/87479
> 	* ira.h (copy_insn_p): New prototype.
> 	* ira-lives.c (ignore_reg_for_conflicts): New static variable.
> 	(make_hard_regno_dead): Don't add conflicts for register
> 	ignore_reg_for_conflicts.
> 	(make_object_dead): Likewise.
> 	(copy_insn_p): New function.
> 	(process_bb_node_lives): Set ignore_reg_for_conflicts for copies.
> 	Remove special conflict handling of REAL_PIC_OFFSET_TABLE_REGNUM.
> 	* lra-lives.c (ignore_reg_for_conflicts): New static variable.
> 	(make_hard_regno_dead): Don't add conflicts for register
> 	ignore_reg_for_conflicts.  Remove special conflict handling of
> 	REAL_PIC_OFFSET_TABLE_REGNUM.  Remove now unused argument
> 	check_pic_pseudo_p and update callers.
> 	(mark_pseudo_dead): Don't add conflicts for register
> 	ignore_reg_for_conflicts.
> 	(process_bb_lives): Set ignore_reg_for_conflicts for copies.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/pr86939.c: New test.
> 	* gcc/testsuite/gcc.target/i386/pr49095.c: Fix expected results.
>
Peter Bergner Oct. 5, 2018, 4:40 p.m. UTC | #2
On 10/4/18 3:01 PM, Vladimir Makarov wrote:
> IMHO, the name copy_insn_p is too common and confusing (we already have
> functions copy_insn and copy_insn_1 in GCC).  The name does not reflect its
> result meaning.  I would call it something like non_conflict_copy_source_reg
> although it is long.

I'm fine with renaming it.  I'm not sure I like the use of source reg in
the name even though it is what is returned.  That is just a convenience for
the caller of the function.  Its true purpose is recognizing whether INSN
is or is not a reg to reg copy for which we can ignore their interference.

How about is_reg_copy_insn_p() or is_reg_to_reg_copy_p() or ???



> Also I would rename last_regno to bound_regno because it is better reflect
> variable value meaning or at least to end_regno as it is a value of END_REGNO
> macro.

Ok, I went with end_regno, since that seems to be used elsewhere.


Peter
Vladimir Makarov Oct. 5, 2018, 6:32 p.m. UTC | #3
On 10/05/2018 12:40 PM, Peter Bergner wrote:
> On 10/4/18 3:01 PM, Vladimir Makarov wrote:
>> IMHO, the name copy_insn_p is too common and confusing (we already have
>> functions copy_insn and copy_insn_1 in GCC).  The name does not reflect its
>> result meaning.  I would call it something like non_conflict_copy_source_reg
>> although it is long.
> I'm fine with renaming it.  I'm not sure I like the use of source reg in
> the name even though it is what is returned.  That is just a convenience for
> the caller of the function.  Its true purpose is recognizing whether INSN
> is or is not a reg to reg copy for which we can ignore their interference.
OK.
> How about is_reg_copy_insn_p() or is_reg_to_reg_copy_p() or ???
>
Personally I like the first name more.  But it is up to you.  I don't 
want to bother you anymore.
>
>> Also I would rename last_regno to bound_regno because it is better reflect
>> variable value meaning or at least to end_regno as it is a value of END_REGNO
>> macro.
> Ok, I went with end_regno, since that seems to be used elsewhere.
>
Great.

Thank you, Peter.
Peter Bergner Oct. 5, 2018, 8 p.m. UTC | #4
On 10/5/18 1:32 PM, Vladimir Makarov wrote:
> On 10/05/2018 12:40 PM, Peter Bergner wrote:
>> On 10/4/18 3:01 PM, Vladimir Makarov wrote:
>>> IMHO, the name copy_insn_p is too common and confusing (we already have
>>> functions copy_insn and copy_insn_1 in GCC).  The name does not reflect its
>>> result meaning.  I would call it something like non_conflict_copy_source_reg
>>> although it is long.
>> How about is_reg_copy_insn_p() or is_reg_to_reg_copy_p() or ???
>>
> Personally I like the first name more.  But it is up to you.  I don't want
> to bother you anymore.

It's not a bother, so lets get something we both are ok with.
How about non_conflicting_reg_copy or non_conflicting_copy_insn?

Peter
Vladimir Makarov Oct. 5, 2018, 9:12 p.m. UTC | #5
On 10/05/2018 04:00 PM, Peter Bergner wrote:
> On 10/5/18 1:32 PM, Vladimir Makarov wrote:
>> On 10/05/2018 12:40 PM, Peter Bergner wrote:
>>> On 10/4/18 3:01 PM, Vladimir Makarov wrote:
>>>> IMHO, the name copy_insn_p is too common and confusing (we already have
>>>> functions copy_insn and copy_insn_1 in GCC).  The name does not reflect its
>>>> result meaning.  I would call it something like non_conflict_copy_source_reg
>>>> although it is long.
>>> How about is_reg_copy_insn_p() or is_reg_to_reg_copy_p() or ???
>>>
>> Personally I like the first name more.  But it is up to you.  I don't want
>> to bother you anymore.
> It's not a bother, so lets get something we both are ok with.
> How about non_conflicting_reg_copy or non_conflicting_copy_insn?
OK. I like the first name more.
Peter Bergner Oct. 6, 2018, 2:38 a.m. UTC | #6
On 10/5/18 4:12 PM, Vladimir Makarov wrote:
> On 10/05/2018 04:00 PM, Peter Bergner wrote:
>> How about non_conflicting_reg_copy or non_conflicting_copy_insn?
> OK. I like the first name more.

Ok, I committed the patch using the first function name.
Thank you very much for the patch reviews and approvals!

Peter
Christophe Lyon Oct. 8, 2018, 9:14 a.m. UTC | #7
On Sat, 6 Oct 2018 at 04:38, Peter Bergner <bergner@linux.ibm.com> wrote:
>
> On 10/5/18 4:12 PM, Vladimir Makarov wrote:
> > On 10/05/2018 04:00 PM, Peter Bergner wrote:
> >> How about non_conflicting_reg_copy or non_conflicting_copy_insn?
> > OK. I like the first name more.
>
> Ok, I committed the patch using the first function name.
> Thank you very much for the patch reviews and approvals!
>

Hi,

Since r264897, we are seeing lots of regressions when bootstrapping on arm.
There are execution errors as well as ICEs.
A detailed list can be found at:
https://ci.linaro.org/job/tcwg-compare-results/6498/artifact/artifacts/logs/1-diff-d05_32.tcwg-d05_32-build.txt

You can also see the results posted on gcc-testresults.

Christophe

> Peter
>
>
>
Peter Bergner Oct. 8, 2018, 2:13 p.m. UTC | #8
On 10/8/18 4:14 AM, Christophe Lyon wrote:
> Since r264897, we are seeing lots of regressions when bootstrapping on arm.
> There are execution errors as well as ICEs.
> A detailed list can be found at:
> https://ci.linaro.org/job/tcwg-compare-results/6498/artifact/artifacts/logs/1-diff-d05_32.tcwg-d05_32-build.txt
> 
> You can also see the results posted on gcc-testresults.

Sorry for the breakage.  I'll try and build a cross compiler to fix the
ICEs.  Hopefully all the other issues are related.

Peter
Christophe Lyon Oct. 8, 2018, 2:43 p.m. UTC | #9
On Mon, 8 Oct 2018 at 16:13, Peter Bergner <bergner@linux.ibm.com> wrote:
>
> On 10/8/18 4:14 AM, Christophe Lyon wrote:
> > Since r264897, we are seeing lots of regressions when bootstrapping on arm.
> > There are execution errors as well as ICEs.
> > A detailed list can be found at:
> > https://ci.linaro.org/job/tcwg-compare-results/6498/artifact/artifacts/logs/1-diff-d05_32.tcwg-d05_32-build.txt
> >
> > You can also see the results posted on gcc-testresults.
>
> Sorry for the breakage.  I'll try and build a cross compiler to fix the
> ICEs.  Hopefully all the other issues are related.
>

Note that I saw ICEs only when bootstrapping, not when testing a cross-compiler.

> Peter
>
>
Jeff Law Oct. 8, 2018, 2:52 p.m. UTC | #10
On 10/8/18 8:43 AM, Christophe Lyon wrote:
> On Mon, 8 Oct 2018 at 16:13, Peter Bergner <bergner@linux.ibm.com> wrote:
>>
>> On 10/8/18 4:14 AM, Christophe Lyon wrote:
>>> Since r264897, we are seeing lots of regressions when bootstrapping on arm.
>>> There are execution errors as well as ICEs.
>>> A detailed list can be found at:
>>> https://ci.linaro.org/job/tcwg-compare-results/6498/artifact/artifacts/logs/1-diff-d05_32.tcwg-d05_32-build.txt
>>>
>>> You can also see the results posted on gcc-testresults.
>>
>> Sorry for the breakage.  I'll try and build a cross compiler to fix the
>> ICEs.  Hopefully all the other issues are related.
>>
> 
> Note that I saw ICEs only when bootstrapping, not when testing a cross-compiler.
My tester is showing a variety of problems as well.  hppa, sh4, aarch64,
aarch64_be, alpha arm* and s390x bootstraps are all failing at various
points.   Others may trickle in over time, but clearly something went
wrong recently.  I'm going to start trying to pick them off after the
morning meetings are wrapped up.


jeff
Peter Bergner Oct. 11, 2018, 12:57 a.m. UTC | #11
On 10/8/18 9:52 AM, Jeff Law wrote:
> My tester is showing a variety of problems as well.  hppa, sh4, aarch64,
> aarch64_be, alpha arm* and s390x bootstraps are all failing at various
> points.   Others may trickle in over time, but clearly something went
> wrong recently.  I'm going to start trying to pick them off after the
> morning meetings are wrapped up.

Hi Vlad and Jeff,

I looked into the PA-RISC test case issue Jeff sent me that is caused by my
patch that handles conflicts wrt reg copies and I _think_ I have a handle
on what the problem is, but don't yet know how to fix.  Jeff, I agree with
the analysis you gave in your email to me, but to get Vlad up to speed, here
it is again along with some additional info.

Compiling the test case, we have the following RTL during IRA.  I have also
annotated the pseudos in the RTL to include their hard reg assignment:

(insn 30 19 32 3 (set (reg/f:SI 112 "%r19") ....))
(insn 32 30 20 3 (set (reg:SI 26 %r26)
		      (reg/f:SI 101 "%r26")))
[snip]
(insn 23 22 49 3 (set (reg:SI 109 "%r28")
		 (mem:SI (reg/f:SI 101 "%r26"))))
(insn 49 23 31 3 (set (reg/f:SI 100 "%r28")
	(if_then_else:SI (eq (reg:SI 109 "%r28") (const_int 15 [0xf]))
	    (reg/f:SI 101 "%r26")
	    (const_int 0 [0])))
     (expr_list:REG_DEAD (reg:SI 109 "%r28")
	(expr_list:REG_DEAD (reg/f:SI 101 "%r26"))))
(insn 31 49 33 3 (set (mem/f:SI (reg/f:SI 112 "%r19"))
		      (reg/f:SI 100 "%r28"))
     (expr_list:REG_DEAD (reg/f:SI 112 "%r19")
	(expr_list:REG_DEAD (reg/f:SI 100 "%r28"))))
(call_insn 33 31 36 3 (parallel [
	    (call (mem:SI (symbol_ref/v:SI ("@_Z3yowP11dw_cfi_node"))
		(const_int 16 [0x10]))
	    (clobber (reg:SI 1 %r1))
	    (clobber (reg:SI 2 %r2))
	    (use (const_int 0 [0]))])
     (expr_list:REG_DEAD (reg:SI 26 %r26))
    (expr_list:SI (use (reg:SI 26 %r26)))))

Before my patch, hard reg %r26 and pseudo 101 conflicted, but with my patch
they now (correctly) do not.  IRA is able to assign hard regs to all of the
pseudos, but the constraints in the pattern for insn 49 forces op0 (pseudo 100)
and op1 (pseudo 101) to have the same hard reg.  They are not, so LRA comes
along and spills insn 49 with the following reloads:

Reloads for insn # 49
Reload 0: reload_in (SI) = (reg/f:SI 26 %r26 [orig:101 _10 ] [101])
	  reload_out (SI) = (reg/f:SI 28 %r28 [orig:100 iftmp.2_5 ] [100])
	  GENERAL_REGS, RELOAD_OTHER (opnum = 0)
	  reload_in_reg: (reg/f:SI 26 %r26 [orig:101 _10 ] [101])
	  reload_out_reg: (reg/f:SI 28 %r28 [orig:100 iftmp.2_5 ] [100])
	  reload_reg_rtx: (reg/f:SI 26 %r26 [orig:101 _10 ] [101])

..giving us the following updated insn:

(insn 49 23 58 3 (set (reg/f:SI 26 %r26 [101])
	(if_then_else:SI (eq (reg:SI 28 %r28 [109])
		(const_int 15))
	    (reg/f:SI 26 %r26 [101])
	    (const_int 0 [0]))))

The problem is, that hard reg %r26 is defined in insn 32, to be used in
insn 33, so using %r26 as the reload reg is wrong, because it will clobber
the value we set in insn 32.  Looking thru LRA, it looks like LRA assumes
that for a reload, if one of the input pseudos dies in the insn, then the
hard reg it was assigned to is available to use.  That assumption is (now)
wrong, because another pseudo may be using that hard reg or in this case,
the hard reg itself is still live.

For this example, pseudo 109 also dies in insn 49 and since it's hard reg
%r28 isn't live thru the insn, we could have used that instead.  However,
we cannot just look at REG_DEAD notes for free hard regs to use for reload
regs.  We need to make sure that that hard reg isn't also assigned to another
pseudo that is live at that insn or even that the hard reg itself is live.

Vlad, you know the LRA code better than anyone.  Will it be easy to find
all the places where we create reload regs and fix them up so that we
look at more than just REG_DEAD notes?  Even though looking at REG_DEAD
notes isn't enough, I still think the majority of the time those regs
probably will be free to use, we just have to check for the special
cases like above where they are not.

Peter
Peter Bergner Oct. 11, 2018, 6:18 p.m. UTC | #12
On 10/10/18 7:57 PM, Peter Bergner wrote:
> The problem is, that hard reg %r26 is defined in insn 32, to be used in
> insn 33, so using %r26 as the reload reg is wrong, because it will clobber
> the value we set in insn 32.  Looking thru LRA, it looks like LRA assumes
> that for a reload, if one of the input pseudos dies in the insn, then the
> hard reg it was assigned to is available to use.  That assumption is (now)
> wrong, because another pseudo may be using that hard reg or in this case,
> the hard reg itself is still live.
> 
> For this example, pseudo 109 also dies in insn 49 and since it's hard reg
> %r28 isn't live thru the insn, we could have used that instead.  However,
> we cannot just look at REG_DEAD notes for free hard regs to use for reload
> regs.  We need to make sure that that hard reg isn't also assigned to another
> pseudo that is live at that insn or even that the hard reg itself is live.
> 
> Vlad, you know the LRA code better than anyone.  Will it be easy to find
> all the places where we create reload regs and fix them up so that we
> look at more than just REG_DEAD notes?  Even though looking at REG_DEAD
> notes isn't enough, I still think the majority of the time those regs
> probably will be free to use, we just have to check for the special
> cases like above where they are not.

Ok, after working in gdb, I see that the PA-RISC port still uses reload
and not LRA, but it too seems to have the same issue of reusing input
regs that have REG_DEAD notes, so the question still stands.  It's just
that whatever fix we come up with will have to be to both LRA and reload.

Peter
Peter Bergner Oct. 11, 2018, 7:23 p.m. UTC | #13
On 10/11/18 1:18 PM, Peter Bergner wrote:
> Ok, after working in gdb, I see that the PA-RISC port still uses reload
> and not LRA, but it too seems to have the same issue of reusing input
> regs that have REG_DEAD notes, so the question still stands.  It's just
> that whatever fix we come up with will have to be to both LRA and reload.

On second thought, I'm thinking we should just leave reload alone and
only fix this in LRA.  That means we'd have to disable the reg copy
handling when not using LRA though, which might be another reason to
get targets to move to LRA?  I've verified the following patch gets
the PA-RISC test case to pass again.  Thoughts?

If ok, I still have to dig into the fails we're seeing on LRA targets.

Peter

	* ira-lives (non_conflicting_reg_copy_p): Disable for non LRA targets.

Index: gcc/ira-lives.c
===================================================================
--- gcc/ira-lives.c	(revision 264897)
+++ gcc/ira-lives.c	(working copy)
@@ -1064,6 +1064,10 @@ find_call_crossed_cheap_reg (rtx_insn *i
 rtx
 non_conflicting_reg_copy_p (rtx_insn *insn)
 {
+  /* Disallow this for non LRA targets.  */
+  if (!targetm.lra_p ())
+    return NULL_RTX;
+
   rtx set = single_set (insn);
 
   /* Disallow anything other than a simple register to register copy
Jeff Law Oct. 11, 2018, 7:40 p.m. UTC | #14
On 10/11/18 1:23 PM, Peter Bergner wrote:
> On 10/11/18 1:18 PM, Peter Bergner wrote:
>> Ok, after working in gdb, I see that the PA-RISC port still uses reload
>> and not LRA, but it too seems to have the same issue of reusing input
>> regs that have REG_DEAD notes, so the question still stands.  It's just
>> that whatever fix we come up with will have to be to both LRA and reload.
> 
> On second thought, I'm thinking we should just leave reload alone and
> only fix this in LRA.  That means we'd have to disable the reg copy
> handling when not using LRA though, which might be another reason to
> get targets to move to LRA?  I've verified the following patch gets
> the PA-RISC test case to pass again.  Thoughts?
> 
> If ok, I still have to dig into the fails we're seeing on LRA targets.
Hmmm.  Interesting.  I wonder if all the failing targets were reload
targets.....  If so, this may be the way forward -- I certainly don't
want to spend much, if any, time fixing reload.

I'm in the middle of something, but will try to look at each of the
failing targets and confirm they use reload by default.

Jeff
Vladimir Makarov Oct. 11, 2018, 8:51 p.m. UTC | #15
On 10/11/2018 03:23 PM, Peter Bergner wrote:
> On 10/11/18 1:18 PM, Peter Bergner wrote:
>> Ok, after working in gdb, I see that the PA-RISC port still uses reload
>> and not LRA, but it too seems to have the same issue of reusing input
>> regs that have REG_DEAD notes, so the question still stands.  It's just
>> that whatever fix we come up with will have to be to both LRA and reload.
> On second thought, I'm thinking we should just leave reload alone and
> only fix this in LRA.  That means we'd have to disable the reg copy
> handling when not using LRA though, which might be another reason to
> get targets to move to LRA?  I've verified the following patch gets
> the PA-RISC test case to pass again.  Thoughts?
>
> If ok, I still have to dig into the fails we're seeing on LRA targets.
>
I think it has a sense because even if LRA has the same problem, it will 
be hard to fix it in reload and LRA.  Nobody worked on reload pass for a 
long time and it is not worth to fix it because we are moving from reload.

I suspect that LRA might be immune to these failures because it 
generates new reload pseudos if it is necessary for insn constraints.  
Plus there is some primitive value numbering in LRA which can avoid the 
problem.

In any case, the patch is ok for me.
>
> 	* ira-lives (non_conflicting_reg_copy_p): Disable for non LRA targets.
>
> Index: gcc/ira-lives.c
> ===================================================================
> --- gcc/ira-lives.c	(revision 264897)
> +++ gcc/ira-lives.c	(working copy)
> @@ -1064,6 +1064,10 @@ find_call_crossed_cheap_reg (rtx_insn *i
>   rtx
>   non_conflicting_reg_copy_p (rtx_insn *insn)
>   {
> +  /* Disallow this for non LRA targets.  */
> +  if (!targetm.lra_p ())
> +    return NULL_RTX;
> +
>     rtx set = single_set (insn);
>   
>     /* Disallow anything other than a simple register to register copy
>
Peter Bergner Oct. 11, 2018, 9:05 p.m. UTC | #16
On 10/11/18 2:40 PM, Jeff Law wrote:
> On 10/11/18 1:23 PM, Peter Bergner wrote:
>> On 10/11/18 1:18 PM, Peter Bergner wrote:
>>> Ok, after working in gdb, I see that the PA-RISC port still uses reload
>>> and not LRA, but it too seems to have the same issue of reusing input
>>> regs that have REG_DEAD notes, so the question still stands.  It's just
>>> that whatever fix we come up with will have to be to both LRA and reload.
>>
>> On second thought, I'm thinking we should just leave reload alone and
>> only fix this in LRA.  That means we'd have to disable the reg copy
>> handling when not using LRA though, which might be another reason to
>> get targets to move to LRA?  I've verified the following patch gets
>> the PA-RISC test case to pass again.  Thoughts?
>>
>> If ok, I still have to dig into the fails we're seeing on LRA targets.
> Hmmm.  Interesting.  I wonder if all the failing targets were reload
> targets.....  If so, this may be the way forward -- I certainly don't
> want to spend much, if any, time fixing reload.
> 
> I'm in the middle of something, but will try to look at each of the
> failing targets and confirm they use reload by default.

These are the easy ones (they default to reload):

bergner@pike:~/gcc/gcc-fsf-mainline/gcc/config$ grep -r TARGET_LRA_P | grep false | sort
alpha/alpha.c:#define TARGET_LRA_P hook_bool_void_false
avr/avr.c:#define TARGET_LRA_P hook_bool_void_false
bfin/bfin.c:#define TARGET_LRA_P hook_bool_void_false
c6x/c6x.c:#define TARGET_LRA_P hook_bool_void_false
cr16/cr16.c:#define TARGET_LRA_P hook_bool_void_false
cris/cris.c:#define TARGET_LRA_P hook_bool_void_false
epiphany/epiphany.c:#define TARGET_LRA_P hook_bool_void_false
fr30/fr30.c:#define TARGET_LRA_P hook_bool_void_false
frv/frv.c:#define TARGET_LRA_P hook_bool_void_false
h8300/h8300.c:#define TARGET_LRA_P hook_bool_void_false
ia64/ia64.c:#define TARGET_LRA_P hook_bool_void_false
iq2000/iq2000.c:#define TARGET_LRA_P hook_bool_void_false
lm32/lm32.c:#define TARGET_LRA_P hook_bool_void_false
m32c/m32c.c:#define TARGET_LRA_P hook_bool_void_false
m32r/m32r.c:#define TARGET_LRA_P hook_bool_void_false
m68k/m68k.c:#define TARGET_LRA_P hook_bool_void_false
mcore/mcore.c:#define TARGET_LRA_P hook_bool_void_false
microblaze/microblaze.c:#define TARGET_LRA_P hook_bool_void_false
mmix/mmix.c:#define TARGET_LRA_P hook_bool_void_false
mn10300/mn10300.c:#define TARGET_LRA_P hook_bool_void_false
moxie/moxie.c:#define TARGET_LRA_P hook_bool_void_false
msp430/msp430.c:#define TARGET_LRA_P hook_bool_void_false
nvptx/nvptx.c:#define TARGET_LRA_P hook_bool_void_false
pa/pa.c:#define TARGET_LRA_P hook_bool_void_false
rl78/rl78.c:#define TARGET_LRA_P hook_bool_void_false
spu/spu.c:#define TARGET_LRA_P hook_bool_void_false
stormy16/stormy16.c:#define TARGET_LRA_P hook_bool_void_false
tilegx/tilegx.c:#define TARGET_LRA_P hook_bool_void_false
tilepro/tilepro.c:#define TARGET_LRA_P hook_bool_void_false
vax/vax.c:#define TARGET_LRA_P hook_bool_void_false
visium/visium.c:#define TARGET_LRA_P hook_bool_void_false
xtensa/xtensa.c:#define TARGET_LRA_P hook_bool_void_false

These are harder since they support -mlra:

arc/arc.c:#define TARGET_LRA_P arc_lra_p
ft32/ft32.c:#define TARGET_LRA_P ft32_lra_p
mips/mips.c:#define TARGET_LRA_P mips_lra_p
pdp11/pdp11.c:#define TARGET_LRA_P pdp11_lra_p
powerpcspe/powerpcspe.c:#define TARGET_LRA_P rs6000_lra_p
rx/rx.c:#define TARGET_LRA_P 				rx_enable_lra
s390/s390.c:#define TARGET_LRA_P s390_lra_p
sh/sh.c:#define TARGET_LRA_P sh_lra_p
sparc/sparc.c:#define TARGET_LRA_P sparc_lra_p

Quickly looking into their *.opt files, the follwoing default to LRA:
  mips, s390
while these default to reload:
  ft32, sh4
and these I'm not sure of without looking deeper:
  arc, pdp11, powerpcspe, rx, sparc

...if that helps.

Peter
Jeff Law Oct. 11, 2018, 9:15 p.m. UTC | #17
On 10/11/18 3:05 PM, Peter Bergner wrote:
> On 10/11/18 2:40 PM, Jeff Law wrote:
>> On 10/11/18 1:23 PM, Peter Bergner wrote:
>>> On 10/11/18 1:18 PM, Peter Bergner wrote:
>>>> Ok, after working in gdb, I see that the PA-RISC port still uses reload
>>>> and not LRA, but it too seems to have the same issue of reusing input
>>>> regs that have REG_DEAD notes, so the question still stands.  It's just
>>>> that whatever fix we come up with will have to be to both LRA and reload.
>>>
>>> On second thought, I'm thinking we should just leave reload alone and
>>> only fix this in LRA.  That means we'd have to disable the reg copy
>>> handling when not using LRA though, which might be another reason to
>>> get targets to move to LRA?  I've verified the following patch gets
>>> the PA-RISC test case to pass again.  Thoughts?
>>>
>>> If ok, I still have to dig into the fails we're seeing on LRA targets.
>> Hmmm.  Interesting.  I wonder if all the failing targets were reload
>> targets.....  If so, this may be the way forward -- I certainly don't
>> want to spend much, if any, time fixing reload.
>>
>> I'm in the middle of something, but will try to look at each of the
>> failing targets and confirm they use reload by default.
> 
> These are the easy ones (they default to reload):
> 
> bergner@pike:~/gcc/gcc-fsf-mainline/gcc/config$ grep -r TARGET_LRA_P | grep false | sort
> alpha/alpha.c:#define TARGET_LRA_P hook_bool_void_false
> avr/avr.c:#define TARGET_LRA_P hook_bool_void_false
> bfin/bfin.c:#define TARGET_LRA_P hook_bool_void_false
> c6x/c6x.c:#define TARGET_LRA_P hook_bool_void_false
> cr16/cr16.c:#define TARGET_LRA_P hook_bool_void_false
> cris/cris.c:#define TARGET_LRA_P hook_bool_void_false
> epiphany/epiphany.c:#define TARGET_LRA_P hook_bool_void_false
> fr30/fr30.c:#define TARGET_LRA_P hook_bool_void_false
> frv/frv.c:#define TARGET_LRA_P hook_bool_void_false
> h8300/h8300.c:#define TARGET_LRA_P hook_bool_void_false
> ia64/ia64.c:#define TARGET_LRA_P hook_bool_void_false
> iq2000/iq2000.c:#define TARGET_LRA_P hook_bool_void_false
> lm32/lm32.c:#define TARGET_LRA_P hook_bool_void_false
> m32c/m32c.c:#define TARGET_LRA_P hook_bool_void_false
> m32r/m32r.c:#define TARGET_LRA_P hook_bool_void_false
> m68k/m68k.c:#define TARGET_LRA_P hook_bool_void_false
> mcore/mcore.c:#define TARGET_LRA_P hook_bool_void_false
> microblaze/microblaze.c:#define TARGET_LRA_P hook_bool_void_false
> mmix/mmix.c:#define TARGET_LRA_P hook_bool_void_false
> mn10300/mn10300.c:#define TARGET_LRA_P hook_bool_void_false
> moxie/moxie.c:#define TARGET_LRA_P hook_bool_void_false
> msp430/msp430.c:#define TARGET_LRA_P hook_bool_void_false
> nvptx/nvptx.c:#define TARGET_LRA_P hook_bool_void_false
> pa/pa.c:#define TARGET_LRA_P hook_bool_void_false
> rl78/rl78.c:#define TARGET_LRA_P hook_bool_void_false
> spu/spu.c:#define TARGET_LRA_P hook_bool_void_false
> stormy16/stormy16.c:#define TARGET_LRA_P hook_bool_void_false
> tilegx/tilegx.c:#define TARGET_LRA_P hook_bool_void_false
> tilepro/tilepro.c:#define TARGET_LRA_P hook_bool_void_false
> vax/vax.c:#define TARGET_LRA_P hook_bool_void_false
> visium/visium.c:#define TARGET_LRA_P hook_bool_void_false
> xtensa/xtensa.c:#define TARGET_LRA_P hook_bool_void_false
> 
> These are harder since they support -mlra:
> 
> arc/arc.c:#define TARGET_LRA_P arc_lra_p
> ft32/ft32.c:#define TARGET_LRA_P ft32_lra_p
> mips/mips.c:#define TARGET_LRA_P mips_lra_p
> pdp11/pdp11.c:#define TARGET_LRA_P pdp11_lra_p
> powerpcspe/powerpcspe.c:#define TARGET_LRA_P rs6000_lra_p
> rx/rx.c:#define TARGET_LRA_P 				rx_enable_lra
> s390/s390.c:#define TARGET_LRA_P s390_lra_p
> sh/sh.c:#define TARGET_LRA_P sh_lra_p
> sparc/sparc.c:#define TARGET_LRA_P sparc_lra_p
> 
> Quickly looking into their *.opt files, the follwoing default to LRA:
>   mips, s390
So the failing targets were aarch64, alpha, arm, sh4, s390, alpha and
hppa.  In theory your patch has a reasonable chance of fixing sh4, alpha
and hppa.  So I suspect we're still going to have the aarch64, arm and
s390 issues.


I've had my tester turned off while we sorted this out.   I'll put your
patch to disable the conflict pruning for non-LRA targets and see what
we get overnight.



jeff
Jeff Law Oct. 12, 2018, 3:40 a.m. UTC | #18
On 10/11/18 1:23 PM, Peter Bergner wrote:
> On 10/11/18 1:18 PM, Peter Bergner wrote:
>> Ok, after working in gdb, I see that the PA-RISC port still uses reload
>> and not LRA, but it too seems to have the same issue of reusing input
>> regs that have REG_DEAD notes, so the question still stands.  It's just
>> that whatever fix we come up with will have to be to both LRA and reload.
> 
> On second thought, I'm thinking we should just leave reload alone and
> only fix this in LRA.  That means we'd have to disable the reg copy
> handling when not using LRA though, which might be another reason to
> get targets to move to LRA?  I've verified the following patch gets
> the PA-RISC test case to pass again.  Thoughts?
> 
> If ok, I still have to dig into the fails we're seeing on LRA targets.
> 
> Peter
> 
> 	* ira-lives (non_conflicting_reg_copy_p): Disable for non LRA targets.
So this helped the alpha & hppa and sh4.

I'm still seeing failures on the aarch64, s390x.  No surprise on these
since they use LRA by default and would be unaffected by this patch.


You've got state on the aarch64 issue where we generate bogus code for
the kernel which (thankfully) the assembler complained about.

For s390x the stage3 compiler fails its selftest with:

/home/nfs/law/jenkins/workspace/s390x-linux-gnu/obj/gcc/./gcc/xgcc
-B/home/nfs/law/jenkins/workspace/s390x-linux-gnu/obj/gcc/./gcc/ -xc
-nostdinc /dev/null -S -o /dev/null
-fself-test=../../../gcc/gcc/testsuite/selftests
../../../gcc/gcc/input.c:2423: test_lexer_string_locations_simple: FAIL:
ASSERT_EQ (expected_start_col, actual_start_col)
cc1: internal compiler error: in fail, at selftest.c:47
0x2171da3 selftest::fail(selftest::location const&, char const*)
	../../../gcc/gcc/selftest.c:47
0x2192bb3 assert_char_at_range
	../../../gcc/gcc/input.c:2299
0x2198ddb test_lexer_string_locations_simple
	../../../gcc/gcc/input.c:2423
0x2194a57 selftest::for_each_line_table_case(void
(*)(selftest::line_table_case const&))
	../../../gcc/gcc/input.c:3533
0x2194dcf selftest::input_c_tests()
	../../../gcc/gcc/input.c:3556
0x21061e9 selftest::run_tests()
	../../../gcc/gcc/selftest-run-tests.c:80
0x1869f6d toplev::run_self_tests()
	../../../gcc/gcc/toplev.c:2234
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
make[3]: *** [Makefile:1979: s-selftest-c] Error 1


That's an indication that we've mis-compiled GCC along the way since the
selftest was successful with the stage1 and stage2 compiler.

Given you can see the aarch64 failure with a cross and we haven't done
any real analysis on the s390 failure, I'd focus on aarch64.  I'd ignore
any pathname weirdness and track down why we generate the bogus code the
assembler complains about.


There's also still an arm-linux-gnueabi failure, but I haven't bisected
it to your change.   GCC bootstraps, but fails with an odd error in the
linemap code while building the kernel.

jeff
Eric Botcazou Oct. 12, 2018, 9:02 a.m. UTC | #19
> These are the easy ones (they default to reload):
> 
> bergner@pike:~/gcc/gcc-fsf-mainline/gcc/config$ grep -r TARGET_LRA_P | grep
> false | sort alpha/alpha.c:#define TARGET_LRA_P hook_bool_void_false
> avr/avr.c:#define TARGET_LRA_P hook_bool_void_false
> bfin/bfin.c:#define TARGET_LRA_P hook_bool_void_false
> c6x/c6x.c:#define TARGET_LRA_P hook_bool_void_false
> cr16/cr16.c:#define TARGET_LRA_P hook_bool_void_false
> cris/cris.c:#define TARGET_LRA_P hook_bool_void_false
> epiphany/epiphany.c:#define TARGET_LRA_P hook_bool_void_false
> fr30/fr30.c:#define TARGET_LRA_P hook_bool_void_false
> frv/frv.c:#define TARGET_LRA_P hook_bool_void_false
> h8300/h8300.c:#define TARGET_LRA_P hook_bool_void_false
> ia64/ia64.c:#define TARGET_LRA_P hook_bool_void_false
> iq2000/iq2000.c:#define TARGET_LRA_P hook_bool_void_false
> lm32/lm32.c:#define TARGET_LRA_P hook_bool_void_false
> m32c/m32c.c:#define TARGET_LRA_P hook_bool_void_false
> m32r/m32r.c:#define TARGET_LRA_P hook_bool_void_false
> m68k/m68k.c:#define TARGET_LRA_P hook_bool_void_false
> mcore/mcore.c:#define TARGET_LRA_P hook_bool_void_false
> microblaze/microblaze.c:#define TARGET_LRA_P hook_bool_void_false
> mmix/mmix.c:#define TARGET_LRA_P hook_bool_void_false
> mn10300/mn10300.c:#define TARGET_LRA_P hook_bool_void_false
> moxie/moxie.c:#define TARGET_LRA_P hook_bool_void_false
> msp430/msp430.c:#define TARGET_LRA_P hook_bool_void_false
> nvptx/nvptx.c:#define TARGET_LRA_P hook_bool_void_false
> pa/pa.c:#define TARGET_LRA_P hook_bool_void_false
> rl78/rl78.c:#define TARGET_LRA_P hook_bool_void_false
> spu/spu.c:#define TARGET_LRA_P hook_bool_void_false
> stormy16/stormy16.c:#define TARGET_LRA_P hook_bool_void_false
> tilegx/tilegx.c:#define TARGET_LRA_P hook_bool_void_false
> tilepro/tilepro.c:#define TARGET_LRA_P hook_bool_void_false
> vax/vax.c:#define TARGET_LRA_P hook_bool_void_false
> visium/visium.c:#define TARGET_LRA_P hook_bool_void_false
> xtensa/xtensa.c:#define TARGET_LRA_P hook_bool_void_false
> 
> These are harder since they support -mlra:
> 
> arc/arc.c:#define TARGET_LRA_P arc_lra_p
> ft32/ft32.c:#define TARGET_LRA_P ft32_lra_p
> mips/mips.c:#define TARGET_LRA_P mips_lra_p
> pdp11/pdp11.c:#define TARGET_LRA_P pdp11_lra_p
> powerpcspe/powerpcspe.c:#define TARGET_LRA_P rs6000_lra_p
> rx/rx.c:#define TARGET_LRA_P 				rx_enable_lra
> s390/s390.c:#define TARGET_LRA_P s390_lra_p
> sh/sh.c:#define TARGET_LRA_P sh_lra_p
> sparc/sparc.c:#define TARGET_LRA_P sparc_lra_p
> 
> Quickly looking into their *.opt files, the follwoing default to LRA:
>   mips, s390
> while these default to reload:
>   ft32, sh4
> and these I'm not sure of without looking deeper:
>   arc, pdp11, powerpcspe, rx, sparc
> 
> ...if that helps.

See https://gcc.gnu.org/backends.html for a more precise summary.
Peter Bergner Oct. 12, 2018, 4:38 p.m. UTC | #20
On 10/11/18 3:51 PM, Vladimir Makarov wrote:
> I think it has a sense because even if LRA has the same problem, it will be
> hard to fix it in reload and LRA.  Nobody worked on reload pass for a long
> time and it is not worth to fix it because we are moving from reload.
[snip]
> In any case, the patch is ok for me.

Ok, I committed the patch as revision 265113 with a slightly longer comment
explaining why we're disabling it for reload.  Thanks!



> I suspect that LRA might be immune to these failures because it generates
> new reload pseudos if it is necessary for insn constraints.  Plus there is
> some primitive value numbering in LRA which can avoid the problem.

Maybe.  We still have some LRA targets with issues, but we haven't gotten
to the point of identifying what the problem is.  It could well be target
constraints, etc. that is the problem.

I built a newer binutils on our s390x box and I have now recreated the
selftest ICE in stage3 and I still have the largish aarch64 failure I'm
looking into.

Peter
Jeff Law Oct. 12, 2018, 5:12 p.m. UTC | #21
On 10/12/18 10:38 AM, Peter Bergner wrote:
> On 10/11/18 3:51 PM, Vladimir Makarov wrote:
>> I think it has a sense because even if LRA has the same problem, it will be
>> hard to fix it in reload and LRA.  Nobody worked on reload pass for a long
>> time and it is not worth to fix it because we are moving from reload.
> [snip]
>> In any case, the patch is ok for me.
> 
> Ok, I committed the patch as revision 265113 with a slightly longer comment
> explaining why we're disabling it for reload.  Thanks!
> 
> 
> 
>> I suspect that LRA might be immune to these failures because it generates
>> new reload pseudos if it is necessary for insn constraints.  Plus there is
>> some primitive value numbering in LRA which can avoid the problem.
> 
> Maybe.  We still have some LRA targets with issues, but we haven't gotten
> to the point of identifying what the problem is.  It could well be target
> constraints, etc. that is the problem.
> 
> I built a newer binutils on our s390x box and I have now recreated the
> selftest ICE in stage3 and I still have the largish aarch64 failure I'm
> looking into.
ACK.  In that case I'll look into the one arm issue and see if I can
conclusively rule in or out the IRA/LRA work.

jeff
Peter Bergner Oct. 16, 2018, 2:24 a.m. UTC | #22
On 10/11/18 10:40 PM, Jeff Law wrote:
> On 10/11/18 1:23 PM, Peter Bergner wrote:
>> 	* ira-lives (non_conflicting_reg_copy_p): Disable for non LRA targets.
> So this helped the alpha & hppa and sh4.
> 
> I'm still seeing failures on the aarch64, s390x.  No surprise on these
> since they use LRA by default and would be unaffected by this patch.

Ok, I was able to reduce the aarch64 test case down to the minimum test case
that still kept the kernel's __cmpxchg_double() function intact.  I tested
the patch you're currently running on your builders which changed some
of the "... == OP_OUT" to "... != OP_IN", etc and it doesn't fix the
following test case, like it seems to fix the s390 issue and segher's
small test case (both aarch64 and ppc64).

It's late here, so I'll start digging into this one in the morning.

Peter



bergner@pike:~/gcc/BUGS/PR87507/$ cat slub-min.c
long
__cmpxchg_double (unsigned long arg)
{
  unsigned long old1 = 0;
  unsigned long old2 = arg;
  unsigned long new1 = 0;
  unsigned long new2 = 0;
  volatile void *ptr = 0;

  unsigned long oldval1 = old1;
  unsigned long oldval2 = old2;
  register unsigned long x0 asm ("x0") = old1;
  register unsigned long x1 asm ("x1") = old2;
  register unsigned long x2 asm ("x2") = new1;
  register unsigned long x3 asm ("x3") = new2;
  register unsigned long x4 asm ("x4") = (unsigned long) ptr;
  asm volatile ("	casp    %[old1], %[old2], %[new1], %[new2], %[v]\n"
		"	eor	%[old1], %[old1], %[oldval1]\n"
		"	eor	%[old2], %[old2], %[oldval2]\n"
		"	orr	%[old1], %[old1], %[old2]\n"
		: [old1] "+&r" (x0), [old2] "+&r" (x1), [v] "+Q" (* (unsigned long *) ptr)
		: [new1] "r" (x2), [new2] "r" (x3), [ptr] "r" (x4), [oldval1] "r" (oldval1),[oldval2] "r" (oldval2)
		: "x16", "x17", "x30");
  return x0;
}
bergner@pike:~/gcc/BUGS/PR87507/$ /home/bergner/gcc/build/gcc-fsf-mainline-aarch64-r264897/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-mainline-aarch64-r264897/gcc -O2 -march=armv8.1-a -c slub-min.c
/tmp/ccQCkiSG.s: Assembler messages:
/tmp/ccQCkiSG.s:24: Error: reg pair must be contiguous at operand 2 -- `casp x0,x6,x2,x3,[x5]'
Renlin Li Nov. 1, 2018, 6:50 p.m. UTC | #23
Hi Peter,

Is there any update on this issues?
arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while.

I got the following dump from the test case.

x1 is an early clobber operand in the inline assembly statement,
r92 should conflict with x1?

;; a0(r93,l0) conflicts: a1(r92,l0)
;;     total conflict hard regs: 0-4 16 17 30
;;     conflict hard regs: 0-4 16 17 30


;; a1(r92,l0) conflicts: a0(r93,l0)
;;     total conflict hard regs: 0 2-4 16 17 30
;;     conflict hard regs: 0 2-4 16 17 30

Dump from ira.

(insn 2 8 6 2 (set (reg/v:DI 92 [ arg ])
         (reg:DI 97)) "test.c":3:1 47 {*movdi_aarch64}
      (expr_list:REG_DEAD (reg:DI 97)
         (nil)))
(insn 7 6 9 2 (set (reg/v:DI 1 x1 [ x1 ])
         (reg/v:DI 92 [ arg ])) "test.c":13:26 47 {*movdi_aarch64}
      (nil))
(insn 11 10 14 2 (set (reg/f:DI 93)
         (const_int 0 [0])) "test.c":17:3 47 {*movdi_aarch64}
      (expr_list:REG_EQUIV (const_int 0 [0])
         (nil)))
(insn 14 11 21 2 (parallel [
             (set (reg/v:DI 0 x0 [ x0 ])
                 (asm_operands/v:DI ("	casp    %0, %1, %3, %4, %2
	eor	%0, %0, %6
	eor	%1, %1, %7
	orr	%0, %0, %1
") ("=&r") 0 [
                         (reg/v:DI 2 x2 [ x2 ])
                         (reg/v:DI 3 x3 [ x3 ])
                         (reg/v:DI 4 x4 [ x4 ])
                         (reg/f:DI 93)
                         (reg/v:DI 92 [ arg ])
                         (reg/v:DI 0 x0 [ x0 ])
                         (reg/v:DI 1 x1 [ x1 ])
                         (mem:DI (reg/f:DI 93) [1 MEM[(long unsigned int *)0B]+0 S8 A128])
                     ]
                      [
                         (asm_input:DI ("r") test.c:17)
                         (asm_input:DI ("r") test.c:17)
                         (asm_input:DI ("r") test.c:17)
                         (asm_input:DI ("r") test.c:17)
                         (asm_input:DI ("r") test.c:17)
                         (asm_input:DI ("0") test.c:17)
                         (asm_input:DI ("1") test.c:17)
                         (asm_input:DI ("Q") test.c:17)
                     ]
                      [] test.c:17))
             (set (reg/v:DI 1 x1 [ x1 ])
                 (asm_operands/v:DI ("	casp    %0, %1, %3, %4, %2
	eor	%0, %0, %6
	eor	%1, %1, %7
	orr	%0, %0, %1
") ("=&r") 1 [
                         (reg/v:DI 2 x2 [ x2 ])
                         (reg/v:DI 3 x3 [ x3 ])
                         (reg/v:DI 4 x4 [ x4 ])
                         (reg/f:DI 93)
                         (reg/v:DI 92 [ arg ])
                         (reg/v:DI 0 x0 [ x0 ])
                         (reg/v:DI 1 x1 [ x1 ])
                         (mem:DI (reg/f:DI 93) [1 MEM[(long unsigned int *)0B]+0 S8 A128])
                     ]
                      [
                         (asm_input:DI ("r") test.c:17)
                         (asm_input:DI ("r") test.c:17)
                         (asm_input:DI ("r") test.c:17)
                         (asm_input:DI ("r") test.c:17)
                         (asm_input:DI ("r") test.c:17)
                         (asm_input:DI ("0") test.c:17)
                         (asm_input:DI ("1") test.c:17)
                         (asm_input:DI ("Q") test.c:17)
                     ]
                      [] test.c:17))
             (set (mem:DI (reg/f:DI 93) [1 MEM[(long unsigned int *)0B]+0 S8 A128])
                 (asm_operands/v:DI ("	casp    %0, %1, %3, %4, %2
	eor	%0, %0, %6
	eor	%1, %1, %7
	orr	%0, %0, %1
") ("=Q") 2 [
                         (reg/v:DI 2 x2 [ x2 ])
                         (reg/v:DI 3 x3 [ x3 ])
                         (reg/v:DI 4 x4 [ x4 ])
                         (reg/f:DI 93)
                         (reg/v:DI 92 [ arg ])
                         (reg/v:DI 0 x0 [ x0 ])
                         (reg/v:DI 1 x1 [ x1 ])
                         (mem:DI (reg/f:DI 93) [1 MEM[(long unsigned int *)0B]+0 S8 A128])
                     ]
                      [
                         (asm_input:DI ("r") test.c:17)
                         (asm_input:DI ("r") test.c:17)
                         (asm_input:DI ("r") test.c:17)
                         (asm_input:DI ("r") test.c:17)
                         (asm_input:DI ("r") test.c:17)
                         (asm_input:DI ("0") test.c:17)
                         (asm_input:DI ("1") test.c:17)
                         (asm_input:DI ("Q") test.c:17)
                     ]
                      [] test.c:17))
             (clobber (reg:DI 30 x30))
             (clobber (reg:DI 17 x17))
             (clobber (reg:DI 16 x16))
         ]) "test.c":17:3 -1
      (expr_list:REG_DEAD (reg/f:DI 93)
         (expr_list:REG_DEAD (reg/v:DI 92 [ arg ])
             (expr_list:REG_DEAD (reg/v:DI 4 x4 [ x4 ])
                 (expr_list:REG_DEAD (reg/v:DI 3 x3 [ x3 ])
                     (expr_list:REG_DEAD (reg/v:DI 2 x2 [ x2 ])
                         (expr_list:REG_UNUSED (reg:DI 30 x30)
                             (expr_list:REG_UNUSED (reg:DI 17 x17)
                                 (expr_list:REG_UNUSED (reg:DI 16 x16)
                                     (expr_list:REG_UNUSED (reg/v:DI 1 x1 [ x1 ])
                                         (nil)))))))))))


Regards,
Renlin


On 10/16/2018 03:24 AM, Peter Bergner wrote:
> On 10/11/18 10:40 PM, Jeff Law wrote:
>> On 10/11/18 1:23 PM, Peter Bergner wrote:
>>> 	* ira-lives (non_conflicting_reg_copy_p): Disable for non LRA targets.
>> So this helped the alpha & hppa and sh4.
>>
>> I'm still seeing failures on the aarch64, s390x.  No surprise on these
>> since they use LRA by default and would be unaffected by this patch.
> 
> Ok, I was able to reduce the aarch64 test case down to the minimum test case
> that still kept the kernel's __cmpxchg_double() function intact.  I tested
> the patch you're currently running on your builders which changed some
> of the "... == OP_OUT" to "... != OP_IN", etc and it doesn't fix the
> following test case, like it seems to fix the s390 issue and segher's
> small test case (both aarch64 and ppc64).
> 
> It's late here, so I'll start digging into this one in the morning.
> 
> Peter
> 
> 
> 
> bergner@pike:~/gcc/BUGS/PR87507/$ cat slub-min.c
> long
> __cmpxchg_double (unsigned long arg)
> {
>    unsigned long old1 = 0;
>    unsigned long old2 = arg;
>    unsigned long new1 = 0;
>    unsigned long new2 = 0;
>    volatile void *ptr = 0;
> 
>    unsigned long oldval1 = old1;
>    unsigned long oldval2 = old2;
>    register unsigned long x0 asm ("x0") = old1;
>    register unsigned long x1 asm ("x1") = old2;
>    register unsigned long x2 asm ("x2") = new1;
>    register unsigned long x3 asm ("x3") = new2;
>    register unsigned long x4 asm ("x4") = (unsigned long) ptr;
>    asm volatile ("	casp    %[old1], %[old2], %[new1], %[new2], %[v]\n"
> 		"	eor	%[old1], %[old1], %[oldval1]\n"
> 		"	eor	%[old2], %[old2], %[oldval2]\n"
> 		"	orr	%[old1], %[old1], %[old2]\n"
> 		: [old1] "+&r" (x0), [old2] "+&r" (x1), [v] "+Q" (* (unsigned long *) ptr)
> 		: [new1] "r" (x2), [new2] "r" (x3), [ptr] "r" (x4), [oldval1] "r" (oldval1),[oldval2] "r" (oldval2)
> 		: "x16", "x17", "x30");
>    return x0;
> }
> bergner@pike:~/gcc/BUGS/PR87507/$ /home/bergner/gcc/build/gcc-fsf-mainline-aarch64-r264897/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-mainline-aarch64-r264897/gcc -O2 -march=armv8.1-a -c slub-min.c
> /tmp/ccQCkiSG.s: Assembler messages:
> /tmp/ccQCkiSG.s:24: Error: reg pair must be contiguous at operand 2 -- `casp x0,x6,x2,x3,[x5]'
> 
> 
>
Segher Boessenkool Nov. 1, 2018, 8:35 p.m. UTC | #24
Hi Renlin,

On Thu, Nov 01, 2018 at 06:50:22PM +0000, Renlin Li wrote:
> I got the following dump from the test case.
> 
> x1 is an early clobber operand in the inline assembly statement,
> r92 should conflict with x1?

Yes, I think so.  Or LRA should not pick something that was already a
hard register to spill (this should work without the earlyclobbers, too...
In this testcase that works anyhow, but that seems to be by accident).


Segher
Peter Bergner Nov. 1, 2018, 10:07 p.m. UTC | #25
On 11/1/18 1:50 PM, Renlin Li wrote:
> Is there any update on this issues?
> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while.

From the analysis I've done, my commit is just exposing latent issues
in LRA.  Can you try the patch I submitted here to see if it helps?

  https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html

It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
Jeff threw it on his testers and said he saw an arm issue and was
trying to come up with a test case for me to debug.

The specific issue you mentioned with the inline asm and the casp insn
is a bug in LRA where is will spill a user defined hard register and
it shouldn't do that.  My patch above stops that.  The question is
whether we've quashed the rest of the latent bugs.

Peter
Renlin Li Nov. 2, 2018, 10:05 a.m. UTC | #26
Hi Peter,

On 11/01/2018 10:07 PM, Peter Bergner wrote:
> On 11/1/18 1:50 PM, Renlin Li wrote:
>> Is there any update on this issues?
>> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while.
> 
>  From the analysis I've done, my commit is just exposing latent issues
> in LRA.  

Yes, it looks like some latent issues are been exposed.


Can you try the patch I submitted here to see if it helps?
> 
>    https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html

Thanks for the patch! I'll help to test the patch and let you know the status.

Thanks,
Renlin

> 
> It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
> Jeff threw it on his testers and said he saw an arm issue and was
> trying to come up with a test case for me to debug.
> 
> The specific issue you mentioned with the inline asm and the casp insn
> is a bug in LRA where is will spill a user defined hard register and
> it shouldn't do that.  My patch above stops that.  The question is
> whether we've quashed the rest of the latent bugs.
> 
> Peter
> 
>
Jeff Law Nov. 5, 2018, 7:20 p.m. UTC | #27
On 11/1/18 4:07 PM, Peter Bergner wrote:
> On 11/1/18 1:50 PM, Renlin Li wrote:
>> Is there any update on this issues?
>> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while.
> 
> From the analysis I've done, my commit is just exposing latent issues
> in LRA.  Can you try the patch I submitted here to see if it helps?
> 
>   https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html
> 
> It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
> Jeff threw it on his testers and said he saw an arm issue and was
> trying to come up with a test case for me to debug.
So I don't think the ARM issues are related to your patch, they may have
been related the combiner changes that went in around the same time.

At this point your patch appears to be DTRT across the board.  The only
fallout is the bogus s390 asm it caught in the kernel.

Jeff
Peter Bergner Nov. 5, 2018, 7:36 p.m. UTC | #28
On 11/5/18 1:20 PM, Jeff Law wrote:
> On 11/1/18 4:07 PM, Peter Bergner wrote:
>> On 11/1/18 1:50 PM, Renlin Li wrote:
>>> Is there any update on this issues?
>>> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while.
>>
>> From the analysis I've done, my commit is just exposing latent issues
>> in LRA.  Can you try the patch I submitted here to see if it helps?
>>
>>   https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html
>>
>> It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
>> Jeff threw it on his testers and said he saw an arm issue and was
>> trying to come up with a test case for me to debug.
> So I don't think the ARM issues are related to your patch, they may have
> been related the combiner changes that went in around the same time.
> 
> At this point your patch appears to be DTRT across the board.  The only
> fallout is the bogus s390 asm it caught in the kernel.

Cool.  I will note that I contacted the s390 kernel guys and gave them a
fix to their broken constraints in that asm and they are going to fix it.

Is the above an approval to commit the patch mentioned above or do you
still want to wait until the ARM issues are fully resolved?

Peter
Jeff Law Nov. 5, 2018, 7:41 p.m. UTC | #29
On 11/5/18 12:36 PM, Peter Bergner wrote:
> On 11/5/18 1:20 PM, Jeff Law wrote:
>> On 11/1/18 4:07 PM, Peter Bergner wrote:
>>> On 11/1/18 1:50 PM, Renlin Li wrote:
>>>> Is there any update on this issues?
>>>> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while.
>>>
>>> From the analysis I've done, my commit is just exposing latent issues
>>> in LRA.  Can you try the patch I submitted here to see if it helps?
>>>
>>>   https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html
>>>
>>> It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
>>> Jeff threw it on his testers and said he saw an arm issue and was
>>> trying to come up with a test case for me to debug.
>> So I don't think the ARM issues are related to your patch, they may have
>> been related the combiner changes that went in around the same time.
>>
>> At this point your patch appears to be DTRT across the board.  The only
>> fallout is the bogus s390 asm it caught in the kernel.
> 
> Cool.  I will note that I contacted the s390 kernel guys and gave them a
> fix to their broken constraints in that asm and they are going to fix it.
Sounds good.  I've got a hack in my tester to "fix" that bogus asm until
the kernel folks do it right.


> 
> Is the above an approval to commit the patch mentioned above or do you
> still want to wait until the ARM issues are fully resolved?
I think knowing the patch addresses all the known issues related to the
earlier IRA/LRA change unblocks the review step.  I don't think we need
to wait for the other ARM issues to be resolved -- they seem to be
unrelated to the IRA/LRA changes.

jeff
Renlin Li Nov. 6, 2018, 10:52 a.m. UTC | #30
Hi Jeff & Peter,

On 11/05/2018 07:41 PM, Jeff Law wrote:
> On 11/5/18 12:36 PM, Peter Bergner wrote:
>> On 11/5/18 1:20 PM, Jeff Law wrote:
>>> On 11/1/18 4:07 PM, Peter Bergner wrote:
>>>> On 11/1/18 1:50 PM, Renlin Li wrote:
>>>>> Is there any update on this issues?
>>>>> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while.
>>>>
>>>>  From the analysis I've done, my commit is just exposing latent issues
>>>> in LRA.  Can you try the patch I submitted here to see if it helps?
>>>>
>>>>    https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html
>>>>
>>>> It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
>>>> Jeff threw it on his testers and said he saw an arm issue and was
>>>> trying to come up with a test case for me to debug.
>>> So I don't think the ARM issues are related to your patch, they may have
>>> been related the combiner changes that went in around the same time.
Yes, there are issues related to the combiner changes.

But the IRA/LRA change dose cause the arm-none-linux-gnueabihf bootstrap native toolchain mis-compiled.
And the new patch seems not fix this problem.

I am trying to extract a test case, but it is a little bit hard as the toolchain itself is mis-compiled.
And it ICEs when compile test case with it.

I created a bugzilla ticket for this, PR87899.

./gcc/cc1 ~/gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c  -O3
  test main
Analyzing compilation unit
Performing interprocedural optimizations
  <*free_lang_data> <visibility> <build_ssa_passes> <opt_local_passes> <targetclone> <free-fnsummary> <increase_alignment>Streaming LTO
  <whole-program> <profile_estimate> <icf> <devirt> <cp> <fnsummary> <inline> <pure-const> <free-fnsummary> <static-var> <single-use> 
<comdats>Assembling functions:
  <materialize-all-clones> testduring GIMPLE pass: ldist

gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c: In function ‘test’:
gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c:9:1: internal compiler error: Segmentation fault
     9 | test (void)
       | ^~~~
0x5c3a37 crash_signal
	../../gcc/gcc/toplev.c:325
0x63ef6b inchash::hash::add(void const*, unsigned int)
	../../gcc/gcc/inchash.h:100
0x63ef6b inchash::hash::add_ptr(void const*)
	../../gcc/gcc/inchash.h:94
0x63ef6b ddr_hasher::hash(data_dependence_relation const*)
	../../gcc/gcc/tree-loop-distribution.c:143
0x63ef6b hash_table<ddr_hasher, xcallocator>::find_slot(data_dependence_relation* const&, insert_option)
	../../gcc/gcc/hash-table.h:414
0x63ef6b get_data_dependence
	../../gcc/gcc/tree-loop-distribution.c:1184
0x63f2bd data_dep_in_cycle_p
	../../gcc/gcc/tree-loop-distribution.c:1210
0x63f2bd update_type_for_merge
	../../gcc/gcc/tree-loop-distribution.c:1255
0x64064b build_rdg_partition_for_vertex
	../../gcc/gcc/tree-loop-distribution.c:1302
0x64064b rdg_build_partitions
	../../gcc/gcc/tree-loop-distribution.c:1754
0x64064b distribute_loop
	../../gcc/gcc/tree-loop-distribution.c:2795
0x642299 execute
	../../gcc/gcc/tree-loop-distribution.c:3133
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.



Regards
Renlin



>>>
>>> At this point your patch appears to be DTRT across the board.  The only
>>> fallout is the bogus s390 asm it caught in the kernel.
>>
>> Cool.  I will note that I contacted the s390 kernel guys and gave them a
>> fix to their broken constraints in that asm and they are going to fix it.
> Sounds good.  I've got a hack in my tester to "fix" that bogus asm until
> the kernel folks do it right.
> 
> 
>>
>> Is the above an approval to commit the patch mentioned above or do you
>> still want to wait until the ARM issues are fully resolved?
> I think knowing the patch addresses all the known issues related to the
> earlier IRA/LRA change unblocks the review step.  I don't think we need
> to wait for the other ARM issues to be resolved -- they seem to be
> unrelated to the IRA/LRA changes.
> 
> jeff
>
Ramana Radhakrishnan Nov. 6, 2018, 10:57 a.m. UTC | #31
On Tue, Nov 6, 2018 at 10:52 AM Renlin Li <renlin.li@foss.arm.com> wrote:
>
> Hi Jeff & Peter,
>
> On 11/05/2018 07:41 PM, Jeff Law wrote:
> > On 11/5/18 12:36 PM, Peter Bergner wrote:
> >> On 11/5/18 1:20 PM, Jeff Law wrote:
> >>> On 11/1/18 4:07 PM, Peter Bergner wrote:
> >>>> On 11/1/18 1:50 PM, Renlin Li wrote:
> >>>>> Is there any update on this issues?
> >>>>> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while.
> >>>>
> >>>>  From the analysis I've done, my commit is just exposing latent issues
> >>>> in LRA.  Can you try the patch I submitted here to see if it helps?
> >>>>
> >>>>    https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html
> >>>>
> >>>> It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
> >>>> Jeff threw it on his testers and said he saw an arm issue and was
> >>>> trying to come up with a test case for me to debug.
> >>> So I don't think the ARM issues are related to your patch, they may have
> >>> been related the combiner changes that went in around the same time.
> Yes, there are issues related to the combiner changes.

But didn't the combiner changes come *after* these patches ? So IIUC,
Renlin has been trying to get these fixed *without* the combine
patches but just with your patch applied on top of the revision where
the problem started showing up .

Can you confirm that Renlin ?


Ramana
>
> But the IRA/LRA change dose cause the arm-none-linux-gnueabihf bootstrap native toolchain mis-compiled.
> And the new patch seems not fix this problem.
>
> I am trying to extract a test case, but it is a little bit hard as the toolchain itself is mis-compiled.
> And it ICEs when compile test case with it.
>
> I created a bugzilla ticket for this, PR87899.
>
> ./gcc/cc1 ~/gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c  -O3
>   test main
> Analyzing compilation unit
> Performing interprocedural optimizations
>   <*free_lang_data> <visibility> <build_ssa_passes> <opt_local_passes> <targetclone> <free-fnsummary> <increase_alignment>Streaming LTO
>   <whole-program> <profile_estimate> <icf> <devirt> <cp> <fnsummary> <inline> <pure-const> <free-fnsummary> <static-var> <single-use>
> <comdats>Assembling functions:
>   <materialize-all-clones> testduring GIMPLE pass: ldist
>
> gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c: In function ‘test’:
> gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c:9:1: internal compiler error: Segmentation fault
>      9 | test (void)
>        | ^~~~
> 0x5c3a37 crash_signal
>         ../../gcc/gcc/toplev.c:325
> 0x63ef6b inchash::hash::add(void const*, unsigned int)
>         ../../gcc/gcc/inchash.h:100
> 0x63ef6b inchash::hash::add_ptr(void const*)
>         ../../gcc/gcc/inchash.h:94
> 0x63ef6b ddr_hasher::hash(data_dependence_relation const*)
>         ../../gcc/gcc/tree-loop-distribution.c:143
> 0x63ef6b hash_table<ddr_hasher, xcallocator>::find_slot(data_dependence_relation* const&, insert_option)
>         ../../gcc/gcc/hash-table.h:414
> 0x63ef6b get_data_dependence
>         ../../gcc/gcc/tree-loop-distribution.c:1184
> 0x63f2bd data_dep_in_cycle_p
>         ../../gcc/gcc/tree-loop-distribution.c:1210
> 0x63f2bd update_type_for_merge
>         ../../gcc/gcc/tree-loop-distribution.c:1255
> 0x64064b build_rdg_partition_for_vertex
>         ../../gcc/gcc/tree-loop-distribution.c:1302
> 0x64064b rdg_build_partitions
>         ../../gcc/gcc/tree-loop-distribution.c:1754
> 0x64064b distribute_loop
>         ../../gcc/gcc/tree-loop-distribution.c:2795
> 0x642299 execute
>         ../../gcc/gcc/tree-loop-distribution.c:3133
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <https://gcc.gnu.org/bugs/> for instructions.
>
>
>
> Regards
> Renlin
>
>
>
> >>>
> >>> At this point your patch appears to be DTRT across the board.  The only
> >>> fallout is the bogus s390 asm it caught in the kernel.
> >>
> >> Cool.  I will note that I contacted the s390 kernel guys and gave them a
> >> fix to their broken constraints in that asm and they are going to fix it.
> > Sounds good.  I've got a hack in my tester to "fix" that bogus asm until
> > the kernel folks do it right.
> >
> >
> >>
> >> Is the above an approval to commit the patch mentioned above or do you
> >> still want to wait until the ARM issues are fully resolved?
> > I think knowing the patch addresses all the known issues related to the
> > earlier IRA/LRA change unblocks the review step.  I don't think we need
> > to wait for the other ARM issues to be resolved -- they seem to be
> > unrelated to the IRA/LRA changes.
> >
> > jeff
> >
Renlin Li Nov. 6, 2018, 12:23 p.m. UTC | #32
Hi Ramana,

On 11/06/2018 10:57 AM, Ramana Radhakrishnan wrote:
> On Tue, Nov 6, 2018 at 10:52 AM Renlin Li <renlin.li@foss.arm.com> wrote:
>>
>> Hi Jeff & Peter,
>>
>> On 11/05/2018 07:41 PM, Jeff Law wrote:
>>> On 11/5/18 12:36 PM, Peter Bergner wrote:
>>>> On 11/5/18 1:20 PM, Jeff Law wrote:
>>>>> On 11/1/18 4:07 PM, Peter Bergner wrote:
>>>>>> On 11/1/18 1:50 PM, Renlin Li wrote:
>>>>>>> Is there any update on this issues?
>>>>>>> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while.
>>>>>>
>>>>>>   From the analysis I've done, my commit is just exposing latent issues
>>>>>> in LRA.  Can you try the patch I submitted here to see if it helps?
>>>>>>
>>>>>>     https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html
>>>>>>
>>>>>> It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
>>>>>> Jeff threw it on his testers and said he saw an arm issue and was
>>>>>> trying to come up with a test case for me to debug.
>>>>> So I don't think the ARM issues are related to your patch, they may have
>>>>> been related the combiner changes that went in around the same time.
>> Yes, there are issues related to the combiner changes.
> 
> But didn't the combiner changes come *after* these patches ? So IIUC,
> Renlin has been trying to get these fixed *without* the combine
> patches but just with your patch applied on top of the revision where
> the problem started showing up .
> 
> Can you confirm that Renlin ?

I just did a bootstrap again with everything up to r264897 which is Oct 6.
it produce the ICE I mentioned on the PR87899.

The first combiner patch on Oct 22.

Regards,
Renlin

> 
> 
> Ramana
>>
>> But the IRA/LRA change dose cause the arm-none-linux-gnueabihf bootstrap native toolchain mis-compiled.
>> And the new patch seems not fix this problem.
>>
>> I am trying to extract a test case, but it is a little bit hard as the toolchain itself is mis-compiled.
>> And it ICEs when compile test case with it.
>>
>> I created a bugzilla ticket for this, PR87899.
>>
>> ./gcc/cc1 ~/gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c  -O3
>>    test main
>> Analyzing compilation unit
>> Performing interprocedural optimizations
>>    <*free_lang_data> <visibility> <build_ssa_passes> <opt_local_passes> <targetclone> <free-fnsummary> <increase_alignment>Streaming LTO
>>    <whole-program> <profile_estimate> <icf> <devirt> <cp> <fnsummary> <inline> <pure-const> <free-fnsummary> <static-var> <single-use>
>> <comdats>Assembling functions:
>>    <materialize-all-clones> testduring GIMPLE pass: ldist
>>
>> gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c: In function ‘test’:
>> gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c:9:1: internal compiler error: Segmentation fault
>>       9 | test (void)
>>         | ^~~~
>> 0x5c3a37 crash_signal
>>          ../../gcc/gcc/toplev.c:325
>> 0x63ef6b inchash::hash::add(void const*, unsigned int)
>>          ../../gcc/gcc/inchash.h:100
>> 0x63ef6b inchash::hash::add_ptr(void const*)
>>          ../../gcc/gcc/inchash.h:94
>> 0x63ef6b ddr_hasher::hash(data_dependence_relation const*)
>>          ../../gcc/gcc/tree-loop-distribution.c:143
>> 0x63ef6b hash_table<ddr_hasher, xcallocator>::find_slot(data_dependence_relation* const&, insert_option)
>>          ../../gcc/gcc/hash-table.h:414
>> 0x63ef6b get_data_dependence
>>          ../../gcc/gcc/tree-loop-distribution.c:1184
>> 0x63f2bd data_dep_in_cycle_p
>>          ../../gcc/gcc/tree-loop-distribution.c:1210
>> 0x63f2bd update_type_for_merge
>>          ../../gcc/gcc/tree-loop-distribution.c:1255
>> 0x64064b build_rdg_partition_for_vertex
>>          ../../gcc/gcc/tree-loop-distribution.c:1302
>> 0x64064b rdg_build_partitions
>>          ../../gcc/gcc/tree-loop-distribution.c:1754
>> 0x64064b distribute_loop
>>          ../../gcc/gcc/tree-loop-distribution.c:2795
>> 0x642299 execute
>>          ../../gcc/gcc/tree-loop-distribution.c:3133
>> Please submit a full bug report,
>> with preprocessed source if appropriate.
>> Please include the complete backtrace with any bug report.
>> See <https://gcc.gnu.org/bugs/> for instructions.
>>
>>
>>
>> Regards
>> Renlin
>>
>>
>>
>>>>>
>>>>> At this point your patch appears to be DTRT across the board.  The only
>>>>> fallout is the bogus s390 asm it caught in the kernel.
>>>>
>>>> Cool.  I will note that I contacted the s390 kernel guys and gave them a
>>>> fix to their broken constraints in that asm and they are going to fix it.
>>> Sounds good.  I've got a hack in my tester to "fix" that bogus asm until
>>> the kernel folks do it right.
>>>
>>>
>>>>
>>>> Is the above an approval to commit the patch mentioned above or do you
>>>> still want to wait until the ARM issues are fully resolved?
>>> I think knowing the patch addresses all the known issues related to the
>>> earlier IRA/LRA change unblocks the review step.  I don't think we need
>>> to wait for the other ARM issues to be resolved -- they seem to be
>>> unrelated to the IRA/LRA changes.
>>>
>>> jeff
>>>
Peter Bergner Nov. 6, 2018, 6:46 p.m. UTC | #33
On 11/6/18 6:23 AM, Renlin Li wrote:
> I just did a bootstrap again with everything up to r264897 which is Oct 6.
> it produce the ICE I mentioned on the PR87899.
> 
> The first combiner patch on Oct 22.

Do the testsuite results (for disable-bootstrap builds) differ between
r264896 and r264897?  If so, that would be much easier to track down.

If not, maybe the following patch could help to narrow down which gcc
source file(s) are being miscompiled by allowing you to disable the
special handling of copy conflicts with an option?  The option default
(ie, not using the option or -fno-ira-copies-conflict) is the same behavior
as now and -fira-copies-conflict would make things behave like they did
before my patch.

Peter


Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 265402)
+++ gcc/common.opt	(working copy)
@@ -1761,6 +1761,10 @@ Enum(ira_region) String(all) Value(IRA_R
 EnumValue
 Enum(ira_region) String(mixed) Value(IRA_REGION_MIXED)
 
+fira-copies-conflict
+Common Report Var(flag_ira_copies_conflict) Init(0) Optimization
+Make pseudos connected by a copy conflict
+
 fira-hoist-pressure
 Common Report Var(flag_ira_hoist_pressure) Init(1) Optimization
 Use IRA based register pressure calculation
Index: gcc/ira-lives.c
===================================================================
--- gcc/ira-lives.c	(revision 265402)
+++ gcc/ira-lives.c	(working copy)
@@ -1066,7 +1066,7 @@ non_conflicting_reg_copy_p (rtx_insn *in
 {
   /* Reload has issues with overlapping pseudos being assigned to the
      same hard register, so don't allow it.  See PR87600 for details.  */
-  if (!targetm.lra_p ())
+  if (flag_ira_copies_conflict || !targetm.lra_p ())
     return NULL_RTX;
 
   rtx set = single_set (insn);
Jeff Law Nov. 6, 2018, 6:58 p.m. UTC | #34
On 11/6/18 3:52 AM, Renlin Li wrote:
> Hi Jeff & Peter,
> 
> On 11/05/2018 07:41 PM, Jeff Law wrote:
>> On 11/5/18 12:36 PM, Peter Bergner wrote:
>>> On 11/5/18 1:20 PM, Jeff Law wrote:
>>>> On 11/1/18 4:07 PM, Peter Bergner wrote:
>>>>> On 11/1/18 1:50 PM, Renlin Li wrote:
>>>>>> Is there any update on this issues?
>>>>>> arm-none-linux-gnueabihf native toolchain has been mis-compiled
>>>>>> for a while.
>>>>>
>>>>>  From the analysis I've done, my commit is just exposing latent issues
>>>>> in LRA.  Can you try the patch I submitted here to see if it helps?
>>>>>
>>>>>    https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html
>>>>>
>>>>> It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
>>>>> Jeff threw it on his testers and said he saw an arm issue and was
>>>>> trying to come up with a test case for me to debug.
>>>> So I don't think the ARM issues are related to your patch, they may
>>>> have
>>>> been related the combiner changes that went in around the same time.
> Yes, there are issues related to the combiner changes.
> 
> But the IRA/LRA change dose cause the arm-none-linux-gnueabihf bootstrap
> native toolchain mis-compiled.
> And the new patch seems not fix this problem.
That's strange.  I'm bootstrapping arm-linux-gnueabihf daily with qemu +
a suitable root filesystem using Peter's most recent testing patch.


> 
> I am trying to extract a test case, but it is a little bit hard as the
> toolchain itself is mis-compiled.
> And it ICEs when compile test case with it.
What I would suggest doing is to first start with running the testsuite
against the stage1 compiler before/after Peter's changes.  Sometimes
that'll turn up something useful and you can avoid debuging things
through stage2/stage3.


jeff
Renlin Li Nov. 8, 2018, 10:57 a.m. UTC | #35
Hi,

On 11/06/2018 06:58 PM, Jeff Law wrote:
> On 11/6/18 3:52 AM, Renlin Li wrote:
>> Hi Jeff & Peter,
>>
>> On 11/05/2018 07:41 PM, Jeff Law wrote:
>>> On 11/5/18 12:36 PM, Peter Bergner wrote:
>>>> On 11/5/18 1:20 PM, Jeff Law wrote:
>>>>> On 11/1/18 4:07 PM, Peter Bergner wrote:
>>>>>> On 11/1/18 1:50 PM, Renlin Li wrote:
>>>>>>> Is there any update on this issues?
>>>>>>> arm-none-linux-gnueabihf native toolchain has been mis-compiled
>>>>>>> for a while.
>>>>>>
>>>>>>   From the analysis I've done, my commit is just exposing latent issues
>>>>>> in LRA.  Can you try the patch I submitted here to see if it helps?
>>>>>>
>>>>>>     https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html
>>>>>>
>>>>>> It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
>>>>>> Jeff threw it on his testers and said he saw an arm issue and was
>>>>>> trying to come up with a test case for me to debug.
>>>>> So I don't think the ARM issues are related to your patch, they may
>>>>> have
>>>>> been related the combiner changes that went in around the same time.
>> Yes, there are issues related to the combiner changes.
>>
>> But the IRA/LRA change dose cause the arm-none-linux-gnueabihf bootstrap
>> native toolchain mis-compiled.
>> And the new patch seems not fix this problem.
> That's strange.  I'm bootstrapping arm-linux-gnueabihf daily with qemu +
> a suitable root filesystem using Peter's most recent testing patch.
> 
> 
>>
>> I am trying to extract a test case, but it is a little bit hard as the
>> toolchain itself is mis-compiled.
>> And it ICEs when compile test case with it.
> What I would suggest doing is to first start with running the testsuite
> against the stage1 compiler before/after Peter's changes.  Sometimes
> that'll turn up something useful and you can avoid debuging things
> through stage2/stage3.

Hi Jeff,
Thanks for the suggestion! I could reproduce it with stage1 compiler.



Hi Peter,

I think I found the problem!

As described in the PR, a hard register is used in
an pre/post modify expression. The hard register is live, but updated.
In this case, we should make it conflicting with all pseudos live at
that point.  Does it make sense?





It fixes the ICE of mis-compiled arm-linux-gnueabihf toolchain described in the
PR.

I attached the patch for discussion.  I haven't give a complete test on arm or
any other targets, yet. (Probably need more adjusting)

I will run arm and aarch64 regression test, cross and native.

Regards,
Renlin

BTW, The pre/post modify expression is generated by auto_inc/auto_dec pass.
somehow, it merges with hard register,  for example function argument registers.
This optimization make the life for RA harder. Probably we don't want that pass
too aggressive. @Wilco.
(This IRA/LRA and the combiner change reveals a lot of issues,
force us to work on it and improve the compiler :) .)

gcc/ChangeLog:

2018-11-08  Renlin Li  <renlin.li@arm.com>
         PR middle-end/87899
	* lra-lives.c (process_bb_lives): Make hard register of INOUT
	type conflict with all live pseudo.







> 
> 
> jeff
>
diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index 0bf8cd06a302c8a6fcb914b94f953cdaa86597a2..370a7254cac7dbde4e320424e09274cee66c50b9 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -878,11 +878,25 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 
       /* See which defined values die here.  */
       for (reg = curr_id->regs; reg != NULL; reg = reg->next)
-	if (reg->type == OP_OUT
-	    && ! reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
-	  need_curr_point_incr
-	    |= mark_regno_dead (reg->regno, reg->biggest_mode,
-				curr_point);
+	if (! reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
+	  {
+	    if (reg->type == OP_OUT)
+	      need_curr_point_incr
+		|= mark_regno_dead (reg->regno, reg->biggest_mode,
+				    curr_point);
+
+	    // This is a hard register, and it must be live.  Keep it live and
+	    // make it conflict with all live pseudo registers.
+	    else if (reg->type == OP_INOUT && reg->regno < FIRST_PSEUDO_REGISTER)
+	      {
+		lra_assert (TEST_HARD_REG_BIT (hard_regs_live, reg->regno));
+
+		unsigned int i;
+		EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i)
+		  SET_HARD_REG_BIT (lra_reg_info[i].conflict_hard_regs,
+				    reg->regno);
+	      }
+	  }
 
       for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
 	if (reg->type == OP_OUT
Richard Biener Nov. 8, 2018, 11:48 a.m. UTC | #36
On Thu, Nov 8, 2018 at 11:57 AM Renlin Li <renlin.li@foss.arm.com> wrote:
>
> Hi,
>
> On 11/06/2018 06:58 PM, Jeff Law wrote:
> > On 11/6/18 3:52 AM, Renlin Li wrote:
> >> Hi Jeff & Peter,
> >>
> >> On 11/05/2018 07:41 PM, Jeff Law wrote:
> >>> On 11/5/18 12:36 PM, Peter Bergner wrote:
> >>>> On 11/5/18 1:20 PM, Jeff Law wrote:
> >>>>> On 11/1/18 4:07 PM, Peter Bergner wrote:
> >>>>>> On 11/1/18 1:50 PM, Renlin Li wrote:
> >>>>>>> Is there any update on this issues?
> >>>>>>> arm-none-linux-gnueabihf native toolchain has been mis-compiled
> >>>>>>> for a while.
> >>>>>>
> >>>>>>   From the analysis I've done, my commit is just exposing latent issues
> >>>>>> in LRA.  Can you try the patch I submitted here to see if it helps?
> >>>>>>
> >>>>>>     https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html
> >>>>>>
> >>>>>> It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
> >>>>>> Jeff threw it on his testers and said he saw an arm issue and was
> >>>>>> trying to come up with a test case for me to debug.
> >>>>> So I don't think the ARM issues are related to your patch, they may
> >>>>> have
> >>>>> been related the combiner changes that went in around the same time.
> >> Yes, there are issues related to the combiner changes.
> >>
> >> But the IRA/LRA change dose cause the arm-none-linux-gnueabihf bootstrap
> >> native toolchain mis-compiled.
> >> And the new patch seems not fix this problem.
> > That's strange.  I'm bootstrapping arm-linux-gnueabihf daily with qemu +
> > a suitable root filesystem using Peter's most recent testing patch.
> >
> >
> >>
> >> I am trying to extract a test case, but it is a little bit hard as the
> >> toolchain itself is mis-compiled.
> >> And it ICEs when compile test case with it.
> > What I would suggest doing is to first start with running the testsuite
> > against the stage1 compiler before/after Peter's changes.  Sometimes
> > that'll turn up something useful and you can avoid debuging things
> > through stage2/stage3.
>
> Hi Jeff,
> Thanks for the suggestion! I could reproduce it with stage1 compiler.
>
>
>
> Hi Peter,
>
> I think I found the problem!
>
> As described in the PR, a hard register is used in
> an pre/post modify expression. The hard register is live, but updated.
> In this case, we should make it conflicting with all pseudos live at
> that point.  Does it make sense?

Err, that looks very much like a hack that manages to hide the issue.

Esp. adding conflicts in a loop that says "See which defined values die here."
is quite fishy.

Richard.

>
>
>
>
> It fixes the ICE of mis-compiled arm-linux-gnueabihf toolchain described in the
> PR.
>
> I attached the patch for discussion.  I haven't give a complete test on arm or
> any other targets, yet. (Probably need more adjusting)
>
> I will run arm and aarch64 regression test, cross and native.
>
> Regards,
> Renlin
>
> BTW, The pre/post modify expression is generated by auto_inc/auto_dec pass.
> somehow, it merges with hard register,  for example function argument registers.
> This optimization make the life for RA harder. Probably we don't want that pass
> too aggressive. @Wilco.
> (This IRA/LRA and the combiner change reveals a lot of issues,
> force us to work on it and improve the compiler :) .)
>
> gcc/ChangeLog:
>
> 2018-11-08  Renlin Li  <renlin.li@arm.com>
>          PR middle-end/87899
>         * lra-lives.c (process_bb_lives): Make hard register of INOUT
>         type conflict with all live pseudo.
>
>
>
>
>
>
>
> >
> >
> > jeff
> >
Peter Bergner Nov. 8, 2018, 12:35 p.m. UTC | #37
On 11/8/18 4:57 AM, Renlin Li wrote:
> I think I found the problem!
> 
> As described in the PR, a hard register is used in
> an pre/post modify expression. The hard register is live, but updated.
> In this case, we should make it conflicting with all pseudos live at
> that point.  Does it make sense?

Do you have a reproducer test case I can look at?  I'd like to see the
problematical rtl to help me determine whether your patch is correct
or not.  ...and thank you for debugging this!

Peter
Renlin Li Nov. 8, 2018, 1:42 p.m. UTC | #38
Hi Peter,

On 11/08/2018 12:35 PM, Peter Bergner wrote:
> On 11/8/18 4:57 AM, Renlin Li wrote:
>> I think I found the problem!
>>
>> As described in the PR, a hard register is used in
>> an pre/post modify expression. The hard register is live, but updated.
>> In this case, we should make it conflicting with all pseudos live at
>> that point.  Does it make sense?
> 
> Do you have a reproducer test case I can look at?  I'd like to see the
> problematical rtl to help me determine whether your patch is correct
> or not.  ...and thank you for debugging this!
> 
> Peter
> 

Sure! (I was trying to send the mail, but it failed with large attachment.)
I attached the dump file in the bugzilla ticket: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87899
I remove the unrelated dump (tar -xzvf xxx.tgz)

The code you want to check is the following in ira pass:
insn 10905: r1 = r2040
insn 208: use and update r1 with pre_modify
insn 191: use pseudo r2040

I could not create a test case. This dump is created with stage1 compiler compiling next stage compiler.
The (not helpful) command line is:


/home/renlin/try-new/./prev-gcc/cc1plus -quiet -nostdinc++ -v -I 
/home/renlin/try-new/prev-arm-none-linux-gnueabihf/libstdc++-v3/include/arm-none-linux-gnueabihf -I 
/home/renlin/try-new/prev-arm-none-linux-gnueabihf/libstdc++-v3/include -I /home/renlin/gcc/libstdc++-v3/libsupc++ -I . -I . -I ../../gcc/gcc -I 
../../gcc/gcc/. -I ../../gcc/gcc/../include -I ../../gcc/gcc/../libcpp/include -I ../../gcc/gcc/../libdecnumber -I ../../gcc/gcc/../libdecnumber/dpd 
-I ../libdecnumber -I ../../gcc/gcc/../libbacktrace -imultilib . -imultiarch arm-linux-gnueabihf -iprefix 
/home/renlin/try-new/prev-gcc/../lib/gcc/arm-none-linux-gnueabihf/9.0.0/ -isystem /home/renlin/try-new/./prev-gcc/include -isystem 
/home/renlin/try-new/./prev-gcc/include-fixed -MMD tree-loop-distribution.d -MF ./.deps/tree-loop-distribution.TPo -MP -MT tree-loop-distribution.o 
-D_GNU_SOURCE -D IN_GCC -D HAVE_CONFIG_H ../../gcc/gcc/tree-loop-distribution.c -quiet -dumpbase tree-loop-distribution.c -mfloat-abi=hard -mfpu=neon 
-mthumb -mtls-dialect=gnu -march=armv7-a+simd -auxbase-strip tree-loop-distribution.s -g -gtoggle -O2 -Wextra -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wsuggest-attribute=format -Woverloaded-virtual -Wpedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -version 
-fno-PIE -fno-checking -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -fno-common -o tree-loop-distribution.s

Regards,
Renlin
Peter Bergner Nov. 8, 2018, 2:29 p.m. UTC | #39
On 11/8/18 5:48 AM, Richard Biener wrote:
> Err, that looks very much like a hack that manages to hide the issue.

It's true we do not want to hide the issue by adding unneeded conflicts,
since that can lead to unnecessary spills.  However, ...


> Esp. adding conflicts in a loop that says "See which defined values die here."
> is quite fishy.

..the original loop is dealing with some of the gory details you never read
about in academic RA papers.  This code is used to catch the case where an insn
defines a register(s) that is never used.  Because it is never used, it never
ends up in the "live" (ie, live and available) set, which can cause us to miss
some required conflicts.

That said, I still need to look at the RTL from the bad program before
determining whether the patch is correct or not.  Computing accurate
conflict information is a delicate thing.

Peter
Peter Bergner Nov. 8, 2018, 3:21 p.m. UTC | #40
On 11/8/18 4:57 AM, Renlin Li wrote:
> I think I found the problem!
> 
> As described in the PR, a hard register is used in
> an pre/post modify expression. The hard register is live, but updated.
> In this case, we should make it conflicting with all pseudos live at
> that point.  Does it make sense?
[snip]
> It fixes the ICE of mis-compiled arm-linux-gnueabihf toolchain described in the
> PR.
> 
> I attached the patch for discussion.  I haven't give a complete test on arm or
> any other targets, yet. (Probably need more adjusting)

Yes, this is the problem.  We see from the dump, that r2040 does not conflict with
hard reg r1:

;; a2040(r1597,l0) conflicts: <list of pseudo regs>
;;     total conflict hard regs:
;;     conflict hard regs:

...and we have the following RTL:

(insn 10905 179 199 2 (set (reg:SI 1 r1)
        (reg/f:SI 2040)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp}
     (nil))

...

(insn 208 202 182 2 (set (mem/f/c:SI (pre_modify:SI (reg:SI 1 r1)
                (plus:SI (reg:SI 1 r1)
                    (const_int 12 [0xc]))) [129 loop_nest.m_vec+0 S4 A32])
        (reg:SI 1048)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp}
     (expr_list:REG_INC (reg:SI 1 r1)
        (nil)))

...

(insn 191 189 192 2 (set (mem/f/c:SI (plus:SI (reg/f:SI 2040)
                (const_int 8 [0x8])) [367 ddrs_table+0 S4 A32])
        (reg/f:SI 1047)) "../../gcc/gcc/tree-loop-distribution.c":2741 647 {*thumb2_movsi_vfp}
     (nil))

So my patch caused us to (correctly) skip adding a conflict between r1 and
r2040 due to the register copy in insn 10905.  However, they really should
conflict as you found due to the definition of r1 in insn 208 and the fact
we don't add one is a latent bug in LRA.  I think your patch is on the right
track, but not totally there yet.  Imagine instead that the references to r1
and r2040 were swapped, so instead we have:

(insn 10905 179 199 2 (set (reg:SI 2040)
        (reg/f:SI 1 r1)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp}
     (nil))

...

(insn 208 202 182 2 (set (mem/f/c:SI (pre_modify:SI (reg:SI 2040)
                (plus:SI (reg:SI 2040)
                    (const_int 12 [0xc]))) [129 loop_nest.m_vec+0 S4 A32])
        (reg:SI 1048)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp}
     (expr_list:REG_INC (reg:SI 2040)
        (nil)))

...

(insn 191 189 192 2 (set (mem/f/c:SI (plus:SI (reg/f:SI 1 r1)
                (const_int 8 [0x8])) [367 ddrs_table+0 S4 A32])
        (reg/f:SI 1047)) "../../gcc/gcc/tree-loop-distribution.c":2741 647 {*thumb2_movsi_vfp}
     (nil))

Even with your patch, we'd miss adding the conflict between r1 and r2040.
Let me think about how we should solve this one.

And a *BIG* thank you for tracking down the problem!!!

Peter
Renlin Li Nov. 8, 2018, 4:19 p.m. UTC | #41
Hi Peter,

On 11/08/2018 03:21 PM, Peter Bergner wrote:
> On 11/8/18 4:57 AM, Renlin Li wrote:
>> I think I found the problem!
>>
>> As described in the PR, a hard register is used in
>> an pre/post modify expression. The hard register is live, but updated.
>> In this case, we should make it conflicting with all pseudos live at
>> that point.  Does it make sense?
> [snip]
>> It fixes the ICE of mis-compiled arm-linux-gnueabihf toolchain described in the
>> PR.
>>
>> I attached the patch for discussion.  I haven't give a complete test on arm or
>> any other targets, yet. (Probably need more adjusting)
> 
> Yes, this is the problem.  We see from the dump, that r2040 does not conflict with
> hard reg r1:
> 
> ;; a2040(r1597,l0) conflicts: <list of pseudo regs>
> ;;     total conflict hard regs:
> ;;     conflict hard regs:
I think you should look for axxx(r2040, ..)?

Maybe I am wrong (not an expert of RA), from what I observed, it is the LRA
makes the code more complex. It decides to split the live range and spill r2040.
It creates multiple instructions to reload it.
r2944 in LRA dump is the register which starts to go wrong. It is assigned as r1.


       Creating newreg=2944 from oldreg=2040, assigning class GENERAL_REGS to inheritance r2944
     Original reg change 2040->2944 (bb2):
  10905: r1:SI=r2944:SI
     Add inheritance<-original before:
  12868: r2944:SI=r2040:SI

The dump is the final state of LRA. I debug it with gdb, and there are some temporary steps
which is not observable in the final dump.

> 
> ...and we have the following RTL:
> 
> (insn 10905 179 199 2 (set (reg:SI 1 r1)
>          (reg/f:SI 2040)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp}
>       (nil))
> 
> ...
> 
> (insn 208 202 182 2 (set (mem/f/c:SI (pre_modify:SI (reg:SI 1 r1)
>                  (plus:SI (reg:SI 1 r1)
>                      (const_int 12 [0xc]))) [129 loop_nest.m_vec+0 S4 A32])
>          (reg:SI 1048)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp}
>       (expr_list:REG_INC (reg:SI 1 r1)
>          (nil)))
> 
> ...
> 
> (insn 191 189 192 2 (set (mem/f/c:SI (plus:SI (reg/f:SI 2040)
>                  (const_int 8 [0x8])) [367 ddrs_table+0 S4 A32])
>          (reg/f:SI 1047)) "../../gcc/gcc/tree-loop-distribution.c":2741 647 {*thumb2_movsi_vfp}
>       (nil))
> 
> So my patch caused us to (correctly) skip adding a conflict between r1 and
> r2040 due to the register copy in insn 10905.  However, they really should
> conflict as you found due to the definition of r1 in insn 208 and the fact
> we don't add one is a latent bug in LRA.  I think your patch is on the right
> track, but not totally there yet.  Imagine instead that the references to r1
> and r2040 were swapped, so instead we have:
> 
> (insn 10905 179 199 2 (set (reg:SI 2040)
>          (reg/f:SI 1 r1)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp}
>       (nil))
> 
> ...
> 
> (insn 208 202 182 2 (set (mem/f/c:SI (pre_modify:SI (reg:SI 2040)
>                  (plus:SI (reg:SI 2040)
>                      (const_int 12 [0xc]))) [129 loop_nest.m_vec+0 S4 A32])
>          (reg:SI 1048)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp}
>       (expr_list:REG_INC (reg:SI 2040)
>          (nil)))
> 
> ...
> 
> (insn 191 189 192 2 (set (mem/f/c:SI (plus:SI (reg/f:SI 1 r1)
>                  (const_int 8 [0x8])) [367 ddrs_table+0 S4 A32])
>          (reg/f:SI 1047)) "../../gcc/gcc/tree-loop-distribution.c":2741 647 {*thumb2_movsi_vfp}
>       (nil))
> 
> Even with your patch, we'd miss adding the conflict between r1 and r2040.
> Let me think about how we should solve this one.

Yes, I am not confident the patch will be the ultimate fix to the problem.

> 
> And a *BIG* thank you for tracking down the problem!!!
> 
Nop.

Regards,
Renlin
> Peter
>
Peter Bergner Nov. 8, 2018, 5:52 p.m. UTC | #42
On 11/8/18 10:19 AM, Renlin Li wrote:
>> Yes, this is the problem.  We see from the dump, that r2040 does not conflict with
>> hard reg r1:
>>
>> ;; a2040(r1597,l0) conflicts: <list of pseudo regs>
>> ;;     total conflict hard regs:
>> ;;     conflict hard regs:
> I think you should look for axxx(r2040, ..)?
> 
> Maybe I am wrong (not an expert of RA), from what I observed, it is the LRA
> makes the code more complex. It decides to split the live range and spill r2040.
> It creates multiple instructions to reload it.
> r2944 in LRA dump is the register which starts to go wrong. It is assigned as r1.

Yes, IRA and LRA have similar code to compute conflicts.  We need them both to
compute that r2040 (and the reload pseudo(s) generated for it by LRA) conflict
with r1.

Peter
Peter Bergner Nov. 8, 2018, 6:59 p.m. UTC | #43
On 11/8/18 8:29 AM, Peter Bergner wrote:
> On 11/8/18 5:48 AM, Richard Biener wrote:
>> Esp. adding conflicts in a loop that says "See which defined values die here."
>> is quite fishy.
> 
> ..the original loop is dealing with some of the gory details you never read
> about in academic RA papers.  This code is used to catch the case where an insn
> defines a register(s) that is never used.  Because it is never used, it never
> ends up in the "live" (ie, live and available) set, which can cause us to miss
> some required conflicts.

Heh, of course I confused this loop with the one I described which is
earlier in the function.  This loop is actually just the normal loop
over all definitions, removing them from the live set and adding conflicts
with all pseudos that are currently live.

Sorry for the confusion. :-(

Peter
diff mbox series

Patch

Index: gcc/ira.h
===================================================================
--- gcc/ira.h	(revision 264789)
+++ gcc/ira.h	(working copy)
@@ -210,6 +210,9 @@  extern void ira_adjust_equiv_reg_cost (u
 /* ira-costs.c */
 extern void ira_costs_c_finalize (void);
 
+/* ira-lives.c */
+extern rtx copy_insn_p (rtx_insn *);
+
 /* Spilling static chain pseudo may result in generation of wrong
    non-local goto code using frame-pointer to address saved stack
    pointer value after restoring old frame pointer value.  The
Index: gcc/ira-lives.c
===================================================================
--- gcc/ira-lives.c	(revision 264789)
+++ gcc/ira-lives.c	(working copy)
@@ -84,6 +84,10 @@  static int *allocno_saved_at_call;
    supplemental to recog_data.  */
 static alternative_mask preferred_alternatives;
 
+/* If non-NULL, the source operand of a register to register copy for which
+   we should not add a conflict with the copy's destination operand.  */
+static rtx ignore_reg_for_conflicts;
+
 /* Record hard register REGNO as now being live.  */
 static void
 make_hard_regno_live (int regno)
@@ -101,6 +105,11 @@  make_hard_regno_dead (int regno)
     {
       ira_object_t obj = ira_object_id_map[i];
 
+      if (ignore_reg_for_conflicts != NULL_RTX
+	  && REGNO (ignore_reg_for_conflicts)
+	     == (unsigned int) ALLOCNO_REGNO (OBJECT_ALLOCNO (obj)))
+	continue;
+
       SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), regno);
       SET_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), regno);
     }
@@ -154,12 +163,38 @@  static void
 make_object_dead (ira_object_t obj)
 {
   live_range_t lr;
+  int ignore_regno = -1;
+  int last_regno = -1;
 
   sparseset_clear_bit (objects_live, OBJECT_CONFLICT_ID (obj));
 
+  /* Check whether any part of IGNORE_REG_FOR_CONFLICTS already conflicts
+     with OBJ.  */
+  if (ignore_reg_for_conflicts != NULL_RTX
+      && REGNO (ignore_reg_for_conflicts) < FIRST_PSEUDO_REGISTER)
+    {
+      last_regno = END_REGNO (ignore_reg_for_conflicts);
+      int src_regno = ignore_regno = REGNO (ignore_reg_for_conflicts);
+
+      while (src_regno < last_regno)
+	{
+	  if (TEST_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), src_regno))
+	    {
+	      ignore_regno = last_regno = -1;
+	      break;
+	    }
+	  src_regno++;
+	}
+    }
+
   IOR_HARD_REG_SET (OBJECT_CONFLICT_HARD_REGS (obj), hard_regs_live);
   IOR_HARD_REG_SET (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), hard_regs_live);
 
+  /* If IGNORE_REG_FOR_CONFLICTS did not already conflict with OBJ, make
+     sure it still doesn't.  */
+  for (; ignore_regno < last_regno; ignore_regno++)
+    CLEAR_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), ignore_regno);
+
   lr = OBJECT_LIVE_RANGES (obj);
   ira_assert (lr != NULL);
   lr->finish = curr_point;
@@ -1022,6 +1057,38 @@  find_call_crossed_cheap_reg (rtx_insn *i
   return cheap_reg;
 }  
 
+/* Determine whether INSN is a register to register copy of the type where
+   we do not need to make the source and destiniation registers conflict.
+   If this is a copy instruction, then return the source reg.  Otherwise,
+   return NULL_RTX.  */
+rtx
+copy_insn_p (rtx_insn *insn)
+{
+  rtx set;
+
+  /* Disallow anything other than a simple register to register copy
+     that has no side effects.  */
+  if ((set = single_set (insn)) == NULL_RTX
+      || !REG_P (SET_DEST (set))
+      || !REG_P (SET_SRC (set))
+      || side_effects_p (set))
+    return NULL_RTX;
+
+  int dst_regno = REGNO (SET_DEST (set));
+  int src_regno = REGNO (SET_SRC (set));
+  machine_mode mode = GET_MODE (SET_DEST (set));
+
+  /* Computing conflicts for register pairs is difficult to get right, so
+     for now, disallow it.  */
+  if ((dst_regno < FIRST_PSEUDO_REGISTER
+       && hard_regno_nregs (dst_regno, mode) != 1)
+      || (src_regno < FIRST_PSEUDO_REGISTER
+	  && hard_regno_nregs (src_regno, mode) != 1))
+    return NULL_RTX;
+
+  return SET_SRC (set);
+}
+
 /* Process insns of the basic block given by its LOOP_TREE_NODE to
    update allocno live ranges, allocno hard register conflicts,
    intersected calls, and register pressure info for allocnos for the
@@ -1107,22 +1174,7 @@  process_bb_node_lives (ira_loop_tree_nod
 		     curr_point);
 
 	  call_p = CALL_P (insn);
-#ifdef REAL_PIC_OFFSET_TABLE_REGNUM
-	  int regno;
-	  bool clear_pic_use_conflict_p = false;
-	  /* Processing insn usage in call insn can create conflict
-	     with pic pseudo and pic hard reg and that is wrong.
-	     Check this situation and fix it at the end of the insn
-	     processing.  */
-	  if (call_p && pic_offset_table_rtx != NULL_RTX
-	      && (regno = REGNO (pic_offset_table_rtx)) >= FIRST_PSEUDO_REGISTER
-	      && (a = ira_curr_regno_allocno_map[regno]) != NULL)
-	    clear_pic_use_conflict_p
-		= (find_regno_fusage (insn, USE, REAL_PIC_OFFSET_TABLE_REGNUM)
-		   && ! TEST_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS
-					   (ALLOCNO_OBJECT (a, 0)),
-					   REAL_PIC_OFFSET_TABLE_REGNUM));
-#endif
+	  ignore_reg_for_conflicts = copy_insn_p (insn);
 
 	  /* Mark each defined value as live.  We need to do this for
 	     unused values because they still conflict with quantities
@@ -1275,20 +1327,9 @@  process_bb_node_lives (ira_loop_tree_nod
 		}
 	    }
 
-#ifdef REAL_PIC_OFFSET_TABLE_REGNUM
-	  if (clear_pic_use_conflict_p)
-	    {
-	      regno = REGNO (pic_offset_table_rtx);
-	      a = ira_curr_regno_allocno_map[regno];
-	      CLEAR_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (ALLOCNO_OBJECT (a, 0)),
-				  REAL_PIC_OFFSET_TABLE_REGNUM);
-	      CLEAR_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS
-				  (ALLOCNO_OBJECT (a, 0)),
-				  REAL_PIC_OFFSET_TABLE_REGNUM);
-	    }
-#endif
 	  curr_point++;
 	}
+      ignore_reg_for_conflicts = NULL_RTX;
 
       if (bb_has_eh_pred (bb))
 	for (j = 0; ; ++j)
Index: gcc/lra-lives.c
===================================================================
--- gcc/lra-lives.c	(revision 264789)
+++ gcc/lra-lives.c	(working copy)
@@ -96,6 +96,10 @@  static bitmap_head temp_bitmap;
 /* Pool for pseudo live ranges.	 */
 static object_allocator<lra_live_range> lra_live_range_pool ("live ranges");
 
+/* If non-NULL, the source operand of a register to register copy for which
+   we should not add a conflict with the copy's destination operand.  */
+static rtx ignore_reg_for_conflicts;
+
 /* Free live range list LR.  */
 static void
 free_live_range_list (lra_live_range_t lr)
@@ -239,11 +243,9 @@  make_hard_regno_live (int regno)
 
 /* Process the definition of hard register REGNO.  This updates
    hard_regs_live, START_DYING and conflict hard regs for living
-   pseudos.  Conflict hard regs for the pic pseudo is not updated if
-   REGNO is REAL_PIC_OFFSET_TABLE_REGNUM and CHECK_PIC_PSEUDO_P is
-   true.  */
+   pseudos.  */
 static void
-make_hard_regno_dead (int regno, bool check_pic_pseudo_p ATTRIBUTE_UNUSED)
+make_hard_regno_dead (int regno)
 {
   lra_assert (regno < FIRST_PSEUDO_REGISTER);
   if (! TEST_HARD_REG_BIT (hard_regs_live, regno))
@@ -251,13 +253,12 @@  make_hard_regno_dead (int regno, bool ch
   sparseset_set_bit (start_dying, regno);
   unsigned int i;
   EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i)
-#ifdef REAL_PIC_OFFSET_TABLE_REGNUM
-    if (! check_pic_pseudo_p
-	|| regno != REAL_PIC_OFFSET_TABLE_REGNUM
-	|| pic_offset_table_rtx == NULL
-	|| i != REGNO (pic_offset_table_rtx))
-#endif
+    {
+      if (ignore_reg_for_conflicts != NULL_RTX
+	  && REGNO (ignore_reg_for_conflicts) == i)
+	continue;
       SET_HARD_REG_BIT (lra_reg_info[i].conflict_hard_regs, regno);
+    }
   CLEAR_HARD_REG_BIT (hard_regs_live, regno);
   if (fixed_regs[regno] || TEST_HARD_REG_BIT (hard_regs_spilled_into, regno))
     {
@@ -294,14 +295,41 @@  static void
 mark_pseudo_dead (int regno, int point)
 {
   lra_live_range_t p;
+  int ignore_regno = -1;
+  int last_regno = -1;
 
   lra_assert (regno >= FIRST_PSEUDO_REGISTER);
   lra_assert (sparseset_bit_p (pseudos_live, regno));
   sparseset_clear_bit (pseudos_live, regno);
   sparseset_set_bit (start_dying, regno);
 
+  /* Check whether any part of IGNORE_REG_FOR_CONFLICTS already conflicts
+     with REGNO.  */
+  if (ignore_reg_for_conflicts != NULL_RTX
+      && REGNO (ignore_reg_for_conflicts) < FIRST_PSEUDO_REGISTER)
+    {
+      last_regno = END_REGNO (ignore_reg_for_conflicts);
+      int src_regno = ignore_regno = REGNO (ignore_reg_for_conflicts);
+
+      while (src_regno < last_regno)
+	{
+	  if (TEST_HARD_REG_BIT (lra_reg_info[regno].conflict_hard_regs,
+				 src_regno))
+	    {
+	      ignore_regno = last_regno = -1;
+	      break;
+	    }
+	  src_regno++;
+	}
+    }
+
   IOR_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs, hard_regs_live);
 
+  /* If IGNORE_REG_FOR_CONFLICTS did not already conflict with REGNO, make
+     sure it still doesn't.  */
+  for (; ignore_regno < last_regno; ignore_regno++)
+    CLEAR_HARD_REG_BIT (lra_reg_info[regno].conflict_hard_regs, ignore_regno);
+
   if (complete_info_p || lra_get_regno_hard_regno (regno) < 0)
     {
       p = lra_reg_info[regno].live_ranges;
@@ -350,7 +378,7 @@  mark_regno_dead (int regno, machine_mode
   if (regno < FIRST_PSEUDO_REGISTER)
     {
       for (last = end_hard_regno (mode, regno); regno < last; regno++)
-	make_hard_regno_dead (regno, false);
+	make_hard_regno_dead (regno);
     }
   else
     {
@@ -747,6 +775,7 @@  process_bb_lives (basic_block bb, int &c
 	}
 
       call_p = CALL_P (curr_insn);
+      ignore_reg_for_conflicts = copy_insn_p (curr_insn);
       src_regno = (set != NULL_RTX && REG_P (SET_SRC (set))
 		   ? REGNO (SET_SRC (set)) : -1);
       dst_regno = (set != NULL_RTX && REG_P (SET_DEST (set))
@@ -858,14 +887,13 @@  process_bb_lives (basic_block bb, int &c
       for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
 	if (reg->type == OP_OUT
 	    && ! reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
-	  make_hard_regno_dead (reg->regno, false);
+	  make_hard_regno_dead (reg->regno);
 
       if (curr_id->arg_hard_regs != NULL)
 	for (i = 0; (regno = curr_id->arg_hard_regs[i]) >= 0; i++)
 	  if (regno >= FIRST_PSEUDO_REGISTER)
-	    /* It is a clobber.  Don't create conflict of used
-	       REAL_PIC_OFFSET_TABLE_REGNUM and the pic pseudo.  */
-	    make_hard_regno_dead (regno - FIRST_PSEUDO_REGISTER, true);
+	    /* It is a clobber.  */
+	    make_hard_regno_dead (regno - FIRST_PSEUDO_REGISTER);
 
       if (call_p)
 	{
@@ -925,8 +953,7 @@  process_bb_lives (basic_block bb, int &c
 	  make_hard_regno_live (reg->regno);
 
       if (curr_id->arg_hard_regs != NULL)
-	/* Make argument hard registers live.  Don't create conflict
-	   of used REAL_PIC_OFFSET_TABLE_REGNUM and the pic pseudo.  */
+	/* Make argument hard registers live.  */
 	for (i = 0; (regno = curr_id->arg_hard_regs[i]) >= 0; i++)
 	  if (regno < FIRST_PSEUDO_REGISTER)
 	    make_hard_regno_live (regno);
@@ -954,7 +981,7 @@  process_bb_lives (basic_block bb, int &c
 	      if (reg2->type != OP_OUT && reg2->regno == reg->regno)
 		break;
 	    if (reg2 == NULL)
-	      make_hard_regno_dead (reg->regno, false);
+	      make_hard_regno_dead (reg->regno);
 	  }
 
       if (need_curr_point_incr)
@@ -989,6 +1016,7 @@  process_bb_lives (basic_block bb, int &c
       EXECUTE_IF_SET_IN_SPARSESET (unused_set, j)
 	add_reg_note (curr_insn, REG_UNUSED, regno_reg_rtx[j]);
     }
+  ignore_reg_for_conflicts = NULL_RTX;
 
   if (bb_has_eh_pred (bb))
     for (j = 0; ; ++j)
Index: gcc/testsuite/gcc.target/powerpc/pr86939.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr86939.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr86939.c	(working copy)
@@ -0,0 +1,12 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-options "-O2" } */
+
+typedef long (*fptr_t) (void);
+long
+func (fptr_t *p)
+{
+  if (p)
+    return (*p) ();
+  return 0;
+}
+/* { dg-final { scan-assembler-not {mr %?r?12,} } } */
Index: gcc/testsuite/gcc.target/i386/pr49095.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr49095.c	(revision 264793)
+++ gcc/testsuite/gcc.target/i386/pr49095.c	(working copy)
@@ -73,4 +73,5 @@  G (long)
 /* { dg-final { scan-assembler-not "test\[lq\]" } } */
 /* The {f,h}{char,short,int,long}xor functions aren't optimized into
    a RMW instruction, so need load, modify and store.  FIXME eventually.  */
-/* { dg-final { scan-assembler-times "\\), %" 8 } } */
+/* { dg-final { scan-assembler-times "\\), %" 57 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "\\), %" 45 { target { ! ia32 } } } } */