Message ID | 14bf79ef-9db2-e76b-df10-fcb2574d5ccb@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | None | expand |
On 10/03/2018 10:09 AM, Peter Bergner wrote: > Here is another updated PATCH 2 that is the same as the previous version, > but includes the modification to the expected output of i386/pr49095.c > test case, as H.J. has confirmed the code gen changes we are seeing on > are a good thing. > > This patch completed bootstrap and regtesting on powerpc64le-linux, > x86_64-linux and i686-linux (Thanks H.J.!) with no regressions. > Ok for trunk? > The patch is ok for the trunk. Only very minor comments. IMHO, the name copy_insn_p is too common and confusing (we already have functions copy_insn and copy_insn_1 in GCC). The name does not reflect its result meaning. I would call it something like non_conflict_copy_source_reg although it is long. Also I would rename last_regno to bound_regno because it is better reflect variable value meaning or at least to end_regno as it is a value of END_REGNO macro. But you can ignore these insignificant comments. Peter, thank you for working on the issue. The patches may be solve more PRs you mentioned. > > > gcc/ > PR rtl-optimization/86939 > PR rtl-optimization/87479 > * ira.h (copy_insn_p): New prototype. > * ira-lives.c (ignore_reg_for_conflicts): New static variable. > (make_hard_regno_dead): Don't add conflicts for register > ignore_reg_for_conflicts. > (make_object_dead): Likewise. > (copy_insn_p): New function. > (process_bb_node_lives): Set ignore_reg_for_conflicts for copies. > Remove special conflict handling of REAL_PIC_OFFSET_TABLE_REGNUM. > * lra-lives.c (ignore_reg_for_conflicts): New static variable. > (make_hard_regno_dead): Don't add conflicts for register > ignore_reg_for_conflicts. Remove special conflict handling of > REAL_PIC_OFFSET_TABLE_REGNUM. Remove now unused argument > check_pic_pseudo_p and update callers. > (mark_pseudo_dead): Don't add conflicts for register > ignore_reg_for_conflicts. > (process_bb_lives): Set ignore_reg_for_conflicts for copies. > > gcc/testsuite/ > * gcc.target/powerpc/pr86939.c: New test. > * gcc/testsuite/gcc.target/i386/pr49095.c: Fix expected results. >
On 10/4/18 3:01 PM, Vladimir Makarov wrote: > IMHO, the name copy_insn_p is too common and confusing (we already have > functions copy_insn and copy_insn_1 in GCC). The name does not reflect its > result meaning. I would call it something like non_conflict_copy_source_reg > although it is long. I'm fine with renaming it. I'm not sure I like the use of source reg in the name even though it is what is returned. That is just a convenience for the caller of the function. Its true purpose is recognizing whether INSN is or is not a reg to reg copy for which we can ignore their interference. How about is_reg_copy_insn_p() or is_reg_to_reg_copy_p() or ??? > Also I would rename last_regno to bound_regno because it is better reflect > variable value meaning or at least to end_regno as it is a value of END_REGNO > macro. Ok, I went with end_regno, since that seems to be used elsewhere. Peter
On 10/05/2018 12:40 PM, Peter Bergner wrote: > On 10/4/18 3:01 PM, Vladimir Makarov wrote: >> IMHO, the name copy_insn_p is too common and confusing (we already have >> functions copy_insn and copy_insn_1 in GCC). The name does not reflect its >> result meaning. I would call it something like non_conflict_copy_source_reg >> although it is long. > I'm fine with renaming it. I'm not sure I like the use of source reg in > the name even though it is what is returned. That is just a convenience for > the caller of the function. Its true purpose is recognizing whether INSN > is or is not a reg to reg copy for which we can ignore their interference. OK. > How about is_reg_copy_insn_p() or is_reg_to_reg_copy_p() or ??? > Personally I like the first name more. But it is up to you. I don't want to bother you anymore. > >> Also I would rename last_regno to bound_regno because it is better reflect >> variable value meaning or at least to end_regno as it is a value of END_REGNO >> macro. > Ok, I went with end_regno, since that seems to be used elsewhere. > Great. Thank you, Peter.
On 10/5/18 1:32 PM, Vladimir Makarov wrote: > On 10/05/2018 12:40 PM, Peter Bergner wrote: >> On 10/4/18 3:01 PM, Vladimir Makarov wrote: >>> IMHO, the name copy_insn_p is too common and confusing (we already have >>> functions copy_insn and copy_insn_1 in GCC). The name does not reflect its >>> result meaning. I would call it something like non_conflict_copy_source_reg >>> although it is long. >> How about is_reg_copy_insn_p() or is_reg_to_reg_copy_p() or ??? >> > Personally I like the first name more. But it is up to you. I don't want > to bother you anymore. It's not a bother, so lets get something we both are ok with. How about non_conflicting_reg_copy or non_conflicting_copy_insn? Peter
On 10/05/2018 04:00 PM, Peter Bergner wrote: > On 10/5/18 1:32 PM, Vladimir Makarov wrote: >> On 10/05/2018 12:40 PM, Peter Bergner wrote: >>> On 10/4/18 3:01 PM, Vladimir Makarov wrote: >>>> IMHO, the name copy_insn_p is too common and confusing (we already have >>>> functions copy_insn and copy_insn_1 in GCC). The name does not reflect its >>>> result meaning. I would call it something like non_conflict_copy_source_reg >>>> although it is long. >>> How about is_reg_copy_insn_p() or is_reg_to_reg_copy_p() or ??? >>> >> Personally I like the first name more. But it is up to you. I don't want >> to bother you anymore. > It's not a bother, so lets get something we both are ok with. > How about non_conflicting_reg_copy or non_conflicting_copy_insn? OK. I like the first name more.
On 10/5/18 4:12 PM, Vladimir Makarov wrote: > On 10/05/2018 04:00 PM, Peter Bergner wrote: >> How about non_conflicting_reg_copy or non_conflicting_copy_insn? > OK. I like the first name more. Ok, I committed the patch using the first function name. Thank you very much for the patch reviews and approvals! Peter
On Sat, 6 Oct 2018 at 04:38, Peter Bergner <bergner@linux.ibm.com> wrote: > > On 10/5/18 4:12 PM, Vladimir Makarov wrote: > > On 10/05/2018 04:00 PM, Peter Bergner wrote: > >> How about non_conflicting_reg_copy or non_conflicting_copy_insn? > > OK. I like the first name more. > > Ok, I committed the patch using the first function name. > Thank you very much for the patch reviews and approvals! > Hi, Since r264897, we are seeing lots of regressions when bootstrapping on arm. There are execution errors as well as ICEs. A detailed list can be found at: https://ci.linaro.org/job/tcwg-compare-results/6498/artifact/artifacts/logs/1-diff-d05_32.tcwg-d05_32-build.txt You can also see the results posted on gcc-testresults. Christophe > Peter > > >
On 10/8/18 4:14 AM, Christophe Lyon wrote: > Since r264897, we are seeing lots of regressions when bootstrapping on arm. > There are execution errors as well as ICEs. > A detailed list can be found at: > https://ci.linaro.org/job/tcwg-compare-results/6498/artifact/artifacts/logs/1-diff-d05_32.tcwg-d05_32-build.txt > > You can also see the results posted on gcc-testresults. Sorry for the breakage. I'll try and build a cross compiler to fix the ICEs. Hopefully all the other issues are related. Peter
On Mon, 8 Oct 2018 at 16:13, Peter Bergner <bergner@linux.ibm.com> wrote: > > On 10/8/18 4:14 AM, Christophe Lyon wrote: > > Since r264897, we are seeing lots of regressions when bootstrapping on arm. > > There are execution errors as well as ICEs. > > A detailed list can be found at: > > https://ci.linaro.org/job/tcwg-compare-results/6498/artifact/artifacts/logs/1-diff-d05_32.tcwg-d05_32-build.txt > > > > You can also see the results posted on gcc-testresults. > > Sorry for the breakage. I'll try and build a cross compiler to fix the > ICEs. Hopefully all the other issues are related. > Note that I saw ICEs only when bootstrapping, not when testing a cross-compiler. > Peter > >
On 10/8/18 8:43 AM, Christophe Lyon wrote: > On Mon, 8 Oct 2018 at 16:13, Peter Bergner <bergner@linux.ibm.com> wrote: >> >> On 10/8/18 4:14 AM, Christophe Lyon wrote: >>> Since r264897, we are seeing lots of regressions when bootstrapping on arm. >>> There are execution errors as well as ICEs. >>> A detailed list can be found at: >>> https://ci.linaro.org/job/tcwg-compare-results/6498/artifact/artifacts/logs/1-diff-d05_32.tcwg-d05_32-build.txt >>> >>> You can also see the results posted on gcc-testresults. >> >> Sorry for the breakage. I'll try and build a cross compiler to fix the >> ICEs. Hopefully all the other issues are related. >> > > Note that I saw ICEs only when bootstrapping, not when testing a cross-compiler. My tester is showing a variety of problems as well. hppa, sh4, aarch64, aarch64_be, alpha arm* and s390x bootstraps are all failing at various points. Others may trickle in over time, but clearly something went wrong recently. I'm going to start trying to pick them off after the morning meetings are wrapped up. jeff
On 10/8/18 9:52 AM, Jeff Law wrote: > My tester is showing a variety of problems as well. hppa, sh4, aarch64, > aarch64_be, alpha arm* and s390x bootstraps are all failing at various > points. Others may trickle in over time, but clearly something went > wrong recently. I'm going to start trying to pick them off after the > morning meetings are wrapped up. Hi Vlad and Jeff, I looked into the PA-RISC test case issue Jeff sent me that is caused by my patch that handles conflicts wrt reg copies and I _think_ I have a handle on what the problem is, but don't yet know how to fix. Jeff, I agree with the analysis you gave in your email to me, but to get Vlad up to speed, here it is again along with some additional info. Compiling the test case, we have the following RTL during IRA. I have also annotated the pseudos in the RTL to include their hard reg assignment: (insn 30 19 32 3 (set (reg/f:SI 112 "%r19") ....)) (insn 32 30 20 3 (set (reg:SI 26 %r26) (reg/f:SI 101 "%r26"))) [snip] (insn 23 22 49 3 (set (reg:SI 109 "%r28") (mem:SI (reg/f:SI 101 "%r26")))) (insn 49 23 31 3 (set (reg/f:SI 100 "%r28") (if_then_else:SI (eq (reg:SI 109 "%r28") (const_int 15 [0xf])) (reg/f:SI 101 "%r26") (const_int 0 [0]))) (expr_list:REG_DEAD (reg:SI 109 "%r28") (expr_list:REG_DEAD (reg/f:SI 101 "%r26")))) (insn 31 49 33 3 (set (mem/f:SI (reg/f:SI 112 "%r19")) (reg/f:SI 100 "%r28")) (expr_list:REG_DEAD (reg/f:SI 112 "%r19") (expr_list:REG_DEAD (reg/f:SI 100 "%r28")))) (call_insn 33 31 36 3 (parallel [ (call (mem:SI (symbol_ref/v:SI ("@_Z3yowP11dw_cfi_node")) (const_int 16 [0x10])) (clobber (reg:SI 1 %r1)) (clobber (reg:SI 2 %r2)) (use (const_int 0 [0]))]) (expr_list:REG_DEAD (reg:SI 26 %r26)) (expr_list:SI (use (reg:SI 26 %r26))))) Before my patch, hard reg %r26 and pseudo 101 conflicted, but with my patch they now (correctly) do not. IRA is able to assign hard regs to all of the pseudos, but the constraints in the pattern for insn 49 forces op0 (pseudo 100) and op1 (pseudo 101) to have the same hard reg. They are not, so LRA comes along and spills insn 49 with the following reloads: Reloads for insn # 49 Reload 0: reload_in (SI) = (reg/f:SI 26 %r26 [orig:101 _10 ] [101]) reload_out (SI) = (reg/f:SI 28 %r28 [orig:100 iftmp.2_5 ] [100]) GENERAL_REGS, RELOAD_OTHER (opnum = 0) reload_in_reg: (reg/f:SI 26 %r26 [orig:101 _10 ] [101]) reload_out_reg: (reg/f:SI 28 %r28 [orig:100 iftmp.2_5 ] [100]) reload_reg_rtx: (reg/f:SI 26 %r26 [orig:101 _10 ] [101]) ..giving us the following updated insn: (insn 49 23 58 3 (set (reg/f:SI 26 %r26 [101]) (if_then_else:SI (eq (reg:SI 28 %r28 [109]) (const_int 15)) (reg/f:SI 26 %r26 [101]) (const_int 0 [0])))) The problem is, that hard reg %r26 is defined in insn 32, to be used in insn 33, so using %r26 as the reload reg is wrong, because it will clobber the value we set in insn 32. Looking thru LRA, it looks like LRA assumes that for a reload, if one of the input pseudos dies in the insn, then the hard reg it was assigned to is available to use. That assumption is (now) wrong, because another pseudo may be using that hard reg or in this case, the hard reg itself is still live. For this example, pseudo 109 also dies in insn 49 and since it's hard reg %r28 isn't live thru the insn, we could have used that instead. However, we cannot just look at REG_DEAD notes for free hard regs to use for reload regs. We need to make sure that that hard reg isn't also assigned to another pseudo that is live at that insn or even that the hard reg itself is live. Vlad, you know the LRA code better than anyone. Will it be easy to find all the places where we create reload regs and fix them up so that we look at more than just REG_DEAD notes? Even though looking at REG_DEAD notes isn't enough, I still think the majority of the time those regs probably will be free to use, we just have to check for the special cases like above where they are not. Peter
On 10/10/18 7:57 PM, Peter Bergner wrote: > The problem is, that hard reg %r26 is defined in insn 32, to be used in > insn 33, so using %r26 as the reload reg is wrong, because it will clobber > the value we set in insn 32. Looking thru LRA, it looks like LRA assumes > that for a reload, if one of the input pseudos dies in the insn, then the > hard reg it was assigned to is available to use. That assumption is (now) > wrong, because another pseudo may be using that hard reg or in this case, > the hard reg itself is still live. > > For this example, pseudo 109 also dies in insn 49 and since it's hard reg > %r28 isn't live thru the insn, we could have used that instead. However, > we cannot just look at REG_DEAD notes for free hard regs to use for reload > regs. We need to make sure that that hard reg isn't also assigned to another > pseudo that is live at that insn or even that the hard reg itself is live. > > Vlad, you know the LRA code better than anyone. Will it be easy to find > all the places where we create reload regs and fix them up so that we > look at more than just REG_DEAD notes? Even though looking at REG_DEAD > notes isn't enough, I still think the majority of the time those regs > probably will be free to use, we just have to check for the special > cases like above where they are not. Ok, after working in gdb, I see that the PA-RISC port still uses reload and not LRA, but it too seems to have the same issue of reusing input regs that have REG_DEAD notes, so the question still stands. It's just that whatever fix we come up with will have to be to both LRA and reload. Peter
On 10/11/18 1:18 PM, Peter Bergner wrote: > Ok, after working in gdb, I see that the PA-RISC port still uses reload > and not LRA, but it too seems to have the same issue of reusing input > regs that have REG_DEAD notes, so the question still stands. It's just > that whatever fix we come up with will have to be to both LRA and reload. On second thought, I'm thinking we should just leave reload alone and only fix this in LRA. That means we'd have to disable the reg copy handling when not using LRA though, which might be another reason to get targets to move to LRA? I've verified the following patch gets the PA-RISC test case to pass again. Thoughts? If ok, I still have to dig into the fails we're seeing on LRA targets. Peter * ira-lives (non_conflicting_reg_copy_p): Disable for non LRA targets. Index: gcc/ira-lives.c =================================================================== --- gcc/ira-lives.c (revision 264897) +++ gcc/ira-lives.c (working copy) @@ -1064,6 +1064,10 @@ find_call_crossed_cheap_reg (rtx_insn *i rtx non_conflicting_reg_copy_p (rtx_insn *insn) { + /* Disallow this for non LRA targets. */ + if (!targetm.lra_p ()) + return NULL_RTX; + rtx set = single_set (insn); /* Disallow anything other than a simple register to register copy
On 10/11/18 1:23 PM, Peter Bergner wrote: > On 10/11/18 1:18 PM, Peter Bergner wrote: >> Ok, after working in gdb, I see that the PA-RISC port still uses reload >> and not LRA, but it too seems to have the same issue of reusing input >> regs that have REG_DEAD notes, so the question still stands. It's just >> that whatever fix we come up with will have to be to both LRA and reload. > > On second thought, I'm thinking we should just leave reload alone and > only fix this in LRA. That means we'd have to disable the reg copy > handling when not using LRA though, which might be another reason to > get targets to move to LRA? I've verified the following patch gets > the PA-RISC test case to pass again. Thoughts? > > If ok, I still have to dig into the fails we're seeing on LRA targets. Hmmm. Interesting. I wonder if all the failing targets were reload targets..... If so, this may be the way forward -- I certainly don't want to spend much, if any, time fixing reload. I'm in the middle of something, but will try to look at each of the failing targets and confirm they use reload by default. Jeff
On 10/11/2018 03:23 PM, Peter Bergner wrote: > On 10/11/18 1:18 PM, Peter Bergner wrote: >> Ok, after working in gdb, I see that the PA-RISC port still uses reload >> and not LRA, but it too seems to have the same issue of reusing input >> regs that have REG_DEAD notes, so the question still stands. It's just >> that whatever fix we come up with will have to be to both LRA and reload. > On second thought, I'm thinking we should just leave reload alone and > only fix this in LRA. That means we'd have to disable the reg copy > handling when not using LRA though, which might be another reason to > get targets to move to LRA? I've verified the following patch gets > the PA-RISC test case to pass again. Thoughts? > > If ok, I still have to dig into the fails we're seeing on LRA targets. > I think it has a sense because even if LRA has the same problem, it will be hard to fix it in reload and LRA. Nobody worked on reload pass for a long time and it is not worth to fix it because we are moving from reload. I suspect that LRA might be immune to these failures because it generates new reload pseudos if it is necessary for insn constraints. Plus there is some primitive value numbering in LRA which can avoid the problem. In any case, the patch is ok for me. > > * ira-lives (non_conflicting_reg_copy_p): Disable for non LRA targets. > > Index: gcc/ira-lives.c > =================================================================== > --- gcc/ira-lives.c (revision 264897) > +++ gcc/ira-lives.c (working copy) > @@ -1064,6 +1064,10 @@ find_call_crossed_cheap_reg (rtx_insn *i > rtx > non_conflicting_reg_copy_p (rtx_insn *insn) > { > + /* Disallow this for non LRA targets. */ > + if (!targetm.lra_p ()) > + return NULL_RTX; > + > rtx set = single_set (insn); > > /* Disallow anything other than a simple register to register copy >
On 10/11/18 2:40 PM, Jeff Law wrote: > On 10/11/18 1:23 PM, Peter Bergner wrote: >> On 10/11/18 1:18 PM, Peter Bergner wrote: >>> Ok, after working in gdb, I see that the PA-RISC port still uses reload >>> and not LRA, but it too seems to have the same issue of reusing input >>> regs that have REG_DEAD notes, so the question still stands. It's just >>> that whatever fix we come up with will have to be to both LRA and reload. >> >> On second thought, I'm thinking we should just leave reload alone and >> only fix this in LRA. That means we'd have to disable the reg copy >> handling when not using LRA though, which might be another reason to >> get targets to move to LRA? I've verified the following patch gets >> the PA-RISC test case to pass again. Thoughts? >> >> If ok, I still have to dig into the fails we're seeing on LRA targets. > Hmmm. Interesting. I wonder if all the failing targets were reload > targets..... If so, this may be the way forward -- I certainly don't > want to spend much, if any, time fixing reload. > > I'm in the middle of something, but will try to look at each of the > failing targets and confirm they use reload by default. These are the easy ones (they default to reload): bergner@pike:~/gcc/gcc-fsf-mainline/gcc/config$ grep -r TARGET_LRA_P | grep false | sort alpha/alpha.c:#define TARGET_LRA_P hook_bool_void_false avr/avr.c:#define TARGET_LRA_P hook_bool_void_false bfin/bfin.c:#define TARGET_LRA_P hook_bool_void_false c6x/c6x.c:#define TARGET_LRA_P hook_bool_void_false cr16/cr16.c:#define TARGET_LRA_P hook_bool_void_false cris/cris.c:#define TARGET_LRA_P hook_bool_void_false epiphany/epiphany.c:#define TARGET_LRA_P hook_bool_void_false fr30/fr30.c:#define TARGET_LRA_P hook_bool_void_false frv/frv.c:#define TARGET_LRA_P hook_bool_void_false h8300/h8300.c:#define TARGET_LRA_P hook_bool_void_false ia64/ia64.c:#define TARGET_LRA_P hook_bool_void_false iq2000/iq2000.c:#define TARGET_LRA_P hook_bool_void_false lm32/lm32.c:#define TARGET_LRA_P hook_bool_void_false m32c/m32c.c:#define TARGET_LRA_P hook_bool_void_false m32r/m32r.c:#define TARGET_LRA_P hook_bool_void_false m68k/m68k.c:#define TARGET_LRA_P hook_bool_void_false mcore/mcore.c:#define TARGET_LRA_P hook_bool_void_false microblaze/microblaze.c:#define TARGET_LRA_P hook_bool_void_false mmix/mmix.c:#define TARGET_LRA_P hook_bool_void_false mn10300/mn10300.c:#define TARGET_LRA_P hook_bool_void_false moxie/moxie.c:#define TARGET_LRA_P hook_bool_void_false msp430/msp430.c:#define TARGET_LRA_P hook_bool_void_false nvptx/nvptx.c:#define TARGET_LRA_P hook_bool_void_false pa/pa.c:#define TARGET_LRA_P hook_bool_void_false rl78/rl78.c:#define TARGET_LRA_P hook_bool_void_false spu/spu.c:#define TARGET_LRA_P hook_bool_void_false stormy16/stormy16.c:#define TARGET_LRA_P hook_bool_void_false tilegx/tilegx.c:#define TARGET_LRA_P hook_bool_void_false tilepro/tilepro.c:#define TARGET_LRA_P hook_bool_void_false vax/vax.c:#define TARGET_LRA_P hook_bool_void_false visium/visium.c:#define TARGET_LRA_P hook_bool_void_false xtensa/xtensa.c:#define TARGET_LRA_P hook_bool_void_false These are harder since they support -mlra: arc/arc.c:#define TARGET_LRA_P arc_lra_p ft32/ft32.c:#define TARGET_LRA_P ft32_lra_p mips/mips.c:#define TARGET_LRA_P mips_lra_p pdp11/pdp11.c:#define TARGET_LRA_P pdp11_lra_p powerpcspe/powerpcspe.c:#define TARGET_LRA_P rs6000_lra_p rx/rx.c:#define TARGET_LRA_P rx_enable_lra s390/s390.c:#define TARGET_LRA_P s390_lra_p sh/sh.c:#define TARGET_LRA_P sh_lra_p sparc/sparc.c:#define TARGET_LRA_P sparc_lra_p Quickly looking into their *.opt files, the follwoing default to LRA: mips, s390 while these default to reload: ft32, sh4 and these I'm not sure of without looking deeper: arc, pdp11, powerpcspe, rx, sparc ...if that helps. Peter
On 10/11/18 3:05 PM, Peter Bergner wrote: > On 10/11/18 2:40 PM, Jeff Law wrote: >> On 10/11/18 1:23 PM, Peter Bergner wrote: >>> On 10/11/18 1:18 PM, Peter Bergner wrote: >>>> Ok, after working in gdb, I see that the PA-RISC port still uses reload >>>> and not LRA, but it too seems to have the same issue of reusing input >>>> regs that have REG_DEAD notes, so the question still stands. It's just >>>> that whatever fix we come up with will have to be to both LRA and reload. >>> >>> On second thought, I'm thinking we should just leave reload alone and >>> only fix this in LRA. That means we'd have to disable the reg copy >>> handling when not using LRA though, which might be another reason to >>> get targets to move to LRA? I've verified the following patch gets >>> the PA-RISC test case to pass again. Thoughts? >>> >>> If ok, I still have to dig into the fails we're seeing on LRA targets. >> Hmmm. Interesting. I wonder if all the failing targets were reload >> targets..... If so, this may be the way forward -- I certainly don't >> want to spend much, if any, time fixing reload. >> >> I'm in the middle of something, but will try to look at each of the >> failing targets and confirm they use reload by default. > > These are the easy ones (they default to reload): > > bergner@pike:~/gcc/gcc-fsf-mainline/gcc/config$ grep -r TARGET_LRA_P | grep false | sort > alpha/alpha.c:#define TARGET_LRA_P hook_bool_void_false > avr/avr.c:#define TARGET_LRA_P hook_bool_void_false > bfin/bfin.c:#define TARGET_LRA_P hook_bool_void_false > c6x/c6x.c:#define TARGET_LRA_P hook_bool_void_false > cr16/cr16.c:#define TARGET_LRA_P hook_bool_void_false > cris/cris.c:#define TARGET_LRA_P hook_bool_void_false > epiphany/epiphany.c:#define TARGET_LRA_P hook_bool_void_false > fr30/fr30.c:#define TARGET_LRA_P hook_bool_void_false > frv/frv.c:#define TARGET_LRA_P hook_bool_void_false > h8300/h8300.c:#define TARGET_LRA_P hook_bool_void_false > ia64/ia64.c:#define TARGET_LRA_P hook_bool_void_false > iq2000/iq2000.c:#define TARGET_LRA_P hook_bool_void_false > lm32/lm32.c:#define TARGET_LRA_P hook_bool_void_false > m32c/m32c.c:#define TARGET_LRA_P hook_bool_void_false > m32r/m32r.c:#define TARGET_LRA_P hook_bool_void_false > m68k/m68k.c:#define TARGET_LRA_P hook_bool_void_false > mcore/mcore.c:#define TARGET_LRA_P hook_bool_void_false > microblaze/microblaze.c:#define TARGET_LRA_P hook_bool_void_false > mmix/mmix.c:#define TARGET_LRA_P hook_bool_void_false > mn10300/mn10300.c:#define TARGET_LRA_P hook_bool_void_false > moxie/moxie.c:#define TARGET_LRA_P hook_bool_void_false > msp430/msp430.c:#define TARGET_LRA_P hook_bool_void_false > nvptx/nvptx.c:#define TARGET_LRA_P hook_bool_void_false > pa/pa.c:#define TARGET_LRA_P hook_bool_void_false > rl78/rl78.c:#define TARGET_LRA_P hook_bool_void_false > spu/spu.c:#define TARGET_LRA_P hook_bool_void_false > stormy16/stormy16.c:#define TARGET_LRA_P hook_bool_void_false > tilegx/tilegx.c:#define TARGET_LRA_P hook_bool_void_false > tilepro/tilepro.c:#define TARGET_LRA_P hook_bool_void_false > vax/vax.c:#define TARGET_LRA_P hook_bool_void_false > visium/visium.c:#define TARGET_LRA_P hook_bool_void_false > xtensa/xtensa.c:#define TARGET_LRA_P hook_bool_void_false > > These are harder since they support -mlra: > > arc/arc.c:#define TARGET_LRA_P arc_lra_p > ft32/ft32.c:#define TARGET_LRA_P ft32_lra_p > mips/mips.c:#define TARGET_LRA_P mips_lra_p > pdp11/pdp11.c:#define TARGET_LRA_P pdp11_lra_p > powerpcspe/powerpcspe.c:#define TARGET_LRA_P rs6000_lra_p > rx/rx.c:#define TARGET_LRA_P rx_enable_lra > s390/s390.c:#define TARGET_LRA_P s390_lra_p > sh/sh.c:#define TARGET_LRA_P sh_lra_p > sparc/sparc.c:#define TARGET_LRA_P sparc_lra_p > > Quickly looking into their *.opt files, the follwoing default to LRA: > mips, s390 So the failing targets were aarch64, alpha, arm, sh4, s390, alpha and hppa. In theory your patch has a reasonable chance of fixing sh4, alpha and hppa. So I suspect we're still going to have the aarch64, arm and s390 issues. I've had my tester turned off while we sorted this out. I'll put your patch to disable the conflict pruning for non-LRA targets and see what we get overnight. jeff
On 10/11/18 1:23 PM, Peter Bergner wrote: > On 10/11/18 1:18 PM, Peter Bergner wrote: >> Ok, after working in gdb, I see that the PA-RISC port still uses reload >> and not LRA, but it too seems to have the same issue of reusing input >> regs that have REG_DEAD notes, so the question still stands. It's just >> that whatever fix we come up with will have to be to both LRA and reload. > > On second thought, I'm thinking we should just leave reload alone and > only fix this in LRA. That means we'd have to disable the reg copy > handling when not using LRA though, which might be another reason to > get targets to move to LRA? I've verified the following patch gets > the PA-RISC test case to pass again. Thoughts? > > If ok, I still have to dig into the fails we're seeing on LRA targets. > > Peter > > * ira-lives (non_conflicting_reg_copy_p): Disable for non LRA targets. So this helped the alpha & hppa and sh4. I'm still seeing failures on the aarch64, s390x. No surprise on these since they use LRA by default and would be unaffected by this patch. You've got state on the aarch64 issue where we generate bogus code for the kernel which (thankfully) the assembler complained about. For s390x the stage3 compiler fails its selftest with: /home/nfs/law/jenkins/workspace/s390x-linux-gnu/obj/gcc/./gcc/xgcc -B/home/nfs/law/jenkins/workspace/s390x-linux-gnu/obj/gcc/./gcc/ -xc -nostdinc /dev/null -S -o /dev/null -fself-test=../../../gcc/gcc/testsuite/selftests ../../../gcc/gcc/input.c:2423: test_lexer_string_locations_simple: FAIL: ASSERT_EQ (expected_start_col, actual_start_col) cc1: internal compiler error: in fail, at selftest.c:47 0x2171da3 selftest::fail(selftest::location const&, char const*) ../../../gcc/gcc/selftest.c:47 0x2192bb3 assert_char_at_range ../../../gcc/gcc/input.c:2299 0x2198ddb test_lexer_string_locations_simple ../../../gcc/gcc/input.c:2423 0x2194a57 selftest::for_each_line_table_case(void (*)(selftest::line_table_case const&)) ../../../gcc/gcc/input.c:3533 0x2194dcf selftest::input_c_tests() ../../../gcc/gcc/input.c:3556 0x21061e9 selftest::run_tests() ../../../gcc/gcc/selftest-run-tests.c:80 0x1869f6d toplev::run_self_tests() ../../../gcc/gcc/toplev.c:2234 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. make[3]: *** [Makefile:1979: s-selftest-c] Error 1 That's an indication that we've mis-compiled GCC along the way since the selftest was successful with the stage1 and stage2 compiler. Given you can see the aarch64 failure with a cross and we haven't done any real analysis on the s390 failure, I'd focus on aarch64. I'd ignore any pathname weirdness and track down why we generate the bogus code the assembler complains about. There's also still an arm-linux-gnueabi failure, but I haven't bisected it to your change. GCC bootstraps, but fails with an odd error in the linemap code while building the kernel. jeff
> These are the easy ones (they default to reload): > > bergner@pike:~/gcc/gcc-fsf-mainline/gcc/config$ grep -r TARGET_LRA_P | grep > false | sort alpha/alpha.c:#define TARGET_LRA_P hook_bool_void_false > avr/avr.c:#define TARGET_LRA_P hook_bool_void_false > bfin/bfin.c:#define TARGET_LRA_P hook_bool_void_false > c6x/c6x.c:#define TARGET_LRA_P hook_bool_void_false > cr16/cr16.c:#define TARGET_LRA_P hook_bool_void_false > cris/cris.c:#define TARGET_LRA_P hook_bool_void_false > epiphany/epiphany.c:#define TARGET_LRA_P hook_bool_void_false > fr30/fr30.c:#define TARGET_LRA_P hook_bool_void_false > frv/frv.c:#define TARGET_LRA_P hook_bool_void_false > h8300/h8300.c:#define TARGET_LRA_P hook_bool_void_false > ia64/ia64.c:#define TARGET_LRA_P hook_bool_void_false > iq2000/iq2000.c:#define TARGET_LRA_P hook_bool_void_false > lm32/lm32.c:#define TARGET_LRA_P hook_bool_void_false > m32c/m32c.c:#define TARGET_LRA_P hook_bool_void_false > m32r/m32r.c:#define TARGET_LRA_P hook_bool_void_false > m68k/m68k.c:#define TARGET_LRA_P hook_bool_void_false > mcore/mcore.c:#define TARGET_LRA_P hook_bool_void_false > microblaze/microblaze.c:#define TARGET_LRA_P hook_bool_void_false > mmix/mmix.c:#define TARGET_LRA_P hook_bool_void_false > mn10300/mn10300.c:#define TARGET_LRA_P hook_bool_void_false > moxie/moxie.c:#define TARGET_LRA_P hook_bool_void_false > msp430/msp430.c:#define TARGET_LRA_P hook_bool_void_false > nvptx/nvptx.c:#define TARGET_LRA_P hook_bool_void_false > pa/pa.c:#define TARGET_LRA_P hook_bool_void_false > rl78/rl78.c:#define TARGET_LRA_P hook_bool_void_false > spu/spu.c:#define TARGET_LRA_P hook_bool_void_false > stormy16/stormy16.c:#define TARGET_LRA_P hook_bool_void_false > tilegx/tilegx.c:#define TARGET_LRA_P hook_bool_void_false > tilepro/tilepro.c:#define TARGET_LRA_P hook_bool_void_false > vax/vax.c:#define TARGET_LRA_P hook_bool_void_false > visium/visium.c:#define TARGET_LRA_P hook_bool_void_false > xtensa/xtensa.c:#define TARGET_LRA_P hook_bool_void_false > > These are harder since they support -mlra: > > arc/arc.c:#define TARGET_LRA_P arc_lra_p > ft32/ft32.c:#define TARGET_LRA_P ft32_lra_p > mips/mips.c:#define TARGET_LRA_P mips_lra_p > pdp11/pdp11.c:#define TARGET_LRA_P pdp11_lra_p > powerpcspe/powerpcspe.c:#define TARGET_LRA_P rs6000_lra_p > rx/rx.c:#define TARGET_LRA_P rx_enable_lra > s390/s390.c:#define TARGET_LRA_P s390_lra_p > sh/sh.c:#define TARGET_LRA_P sh_lra_p > sparc/sparc.c:#define TARGET_LRA_P sparc_lra_p > > Quickly looking into their *.opt files, the follwoing default to LRA: > mips, s390 > while these default to reload: > ft32, sh4 > and these I'm not sure of without looking deeper: > arc, pdp11, powerpcspe, rx, sparc > > ...if that helps. See https://gcc.gnu.org/backends.html for a more precise summary.
On 10/11/18 3:51 PM, Vladimir Makarov wrote: > I think it has a sense because even if LRA has the same problem, it will be > hard to fix it in reload and LRA. Nobody worked on reload pass for a long > time and it is not worth to fix it because we are moving from reload. [snip] > In any case, the patch is ok for me. Ok, I committed the patch as revision 265113 with a slightly longer comment explaining why we're disabling it for reload. Thanks! > I suspect that LRA might be immune to these failures because it generates > new reload pseudos if it is necessary for insn constraints. Plus there is > some primitive value numbering in LRA which can avoid the problem. Maybe. We still have some LRA targets with issues, but we haven't gotten to the point of identifying what the problem is. It could well be target constraints, etc. that is the problem. I built a newer binutils on our s390x box and I have now recreated the selftest ICE in stage3 and I still have the largish aarch64 failure I'm looking into. Peter
On 10/12/18 10:38 AM, Peter Bergner wrote: > On 10/11/18 3:51 PM, Vladimir Makarov wrote: >> I think it has a sense because even if LRA has the same problem, it will be >> hard to fix it in reload and LRA. Nobody worked on reload pass for a long >> time and it is not worth to fix it because we are moving from reload. > [snip] >> In any case, the patch is ok for me. > > Ok, I committed the patch as revision 265113 with a slightly longer comment > explaining why we're disabling it for reload. Thanks! > > > >> I suspect that LRA might be immune to these failures because it generates >> new reload pseudos if it is necessary for insn constraints. Plus there is >> some primitive value numbering in LRA which can avoid the problem. > > Maybe. We still have some LRA targets with issues, but we haven't gotten > to the point of identifying what the problem is. It could well be target > constraints, etc. that is the problem. > > I built a newer binutils on our s390x box and I have now recreated the > selftest ICE in stage3 and I still have the largish aarch64 failure I'm > looking into. ACK. In that case I'll look into the one arm issue and see if I can conclusively rule in or out the IRA/LRA work. jeff
On 10/11/18 10:40 PM, Jeff Law wrote: > On 10/11/18 1:23 PM, Peter Bergner wrote: >> * ira-lives (non_conflicting_reg_copy_p): Disable for non LRA targets. > So this helped the alpha & hppa and sh4. > > I'm still seeing failures on the aarch64, s390x. No surprise on these > since they use LRA by default and would be unaffected by this patch. Ok, I was able to reduce the aarch64 test case down to the minimum test case that still kept the kernel's __cmpxchg_double() function intact. I tested the patch you're currently running on your builders which changed some of the "... == OP_OUT" to "... != OP_IN", etc and it doesn't fix the following test case, like it seems to fix the s390 issue and segher's small test case (both aarch64 and ppc64). It's late here, so I'll start digging into this one in the morning. Peter bergner@pike:~/gcc/BUGS/PR87507/$ cat slub-min.c long __cmpxchg_double (unsigned long arg) { unsigned long old1 = 0; unsigned long old2 = arg; unsigned long new1 = 0; unsigned long new2 = 0; volatile void *ptr = 0; unsigned long oldval1 = old1; unsigned long oldval2 = old2; register unsigned long x0 asm ("x0") = old1; register unsigned long x1 asm ("x1") = old2; register unsigned long x2 asm ("x2") = new1; register unsigned long x3 asm ("x3") = new2; register unsigned long x4 asm ("x4") = (unsigned long) ptr; asm volatile (" casp %[old1], %[old2], %[new1], %[new2], %[v]\n" " eor %[old1], %[old1], %[oldval1]\n" " eor %[old2], %[old2], %[oldval2]\n" " orr %[old1], %[old1], %[old2]\n" : [old1] "+&r" (x0), [old2] "+&r" (x1), [v] "+Q" (* (unsigned long *) ptr) : [new1] "r" (x2), [new2] "r" (x3), [ptr] "r" (x4), [oldval1] "r" (oldval1),[oldval2] "r" (oldval2) : "x16", "x17", "x30"); return x0; } bergner@pike:~/gcc/BUGS/PR87507/$ /home/bergner/gcc/build/gcc-fsf-mainline-aarch64-r264897/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-mainline-aarch64-r264897/gcc -O2 -march=armv8.1-a -c slub-min.c /tmp/ccQCkiSG.s: Assembler messages: /tmp/ccQCkiSG.s:24: Error: reg pair must be contiguous at operand 2 -- `casp x0,x6,x2,x3,[x5]'
Hi Peter, Is there any update on this issues? arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while. I got the following dump from the test case. x1 is an early clobber operand in the inline assembly statement, r92 should conflict with x1? ;; a0(r93,l0) conflicts: a1(r92,l0) ;; total conflict hard regs: 0-4 16 17 30 ;; conflict hard regs: 0-4 16 17 30 ;; a1(r92,l0) conflicts: a0(r93,l0) ;; total conflict hard regs: 0 2-4 16 17 30 ;; conflict hard regs: 0 2-4 16 17 30 Dump from ira. (insn 2 8 6 2 (set (reg/v:DI 92 [ arg ]) (reg:DI 97)) "test.c":3:1 47 {*movdi_aarch64} (expr_list:REG_DEAD (reg:DI 97) (nil))) (insn 7 6 9 2 (set (reg/v:DI 1 x1 [ x1 ]) (reg/v:DI 92 [ arg ])) "test.c":13:26 47 {*movdi_aarch64} (nil)) (insn 11 10 14 2 (set (reg/f:DI 93) (const_int 0 [0])) "test.c":17:3 47 {*movdi_aarch64} (expr_list:REG_EQUIV (const_int 0 [0]) (nil))) (insn 14 11 21 2 (parallel [ (set (reg/v:DI 0 x0 [ x0 ]) (asm_operands/v:DI (" casp %0, %1, %3, %4, %2 eor %0, %0, %6 eor %1, %1, %7 orr %0, %0, %1 ") ("=&r") 0 [ (reg/v:DI 2 x2 [ x2 ]) (reg/v:DI 3 x3 [ x3 ]) (reg/v:DI 4 x4 [ x4 ]) (reg/f:DI 93) (reg/v:DI 92 [ arg ]) (reg/v:DI 0 x0 [ x0 ]) (reg/v:DI 1 x1 [ x1 ]) (mem:DI (reg/f:DI 93) [1 MEM[(long unsigned int *)0B]+0 S8 A128]) ] [ (asm_input:DI ("r") test.c:17) (asm_input:DI ("r") test.c:17) (asm_input:DI ("r") test.c:17) (asm_input:DI ("r") test.c:17) (asm_input:DI ("r") test.c:17) (asm_input:DI ("0") test.c:17) (asm_input:DI ("1") test.c:17) (asm_input:DI ("Q") test.c:17) ] [] test.c:17)) (set (reg/v:DI 1 x1 [ x1 ]) (asm_operands/v:DI (" casp %0, %1, %3, %4, %2 eor %0, %0, %6 eor %1, %1, %7 orr %0, %0, %1 ") ("=&r") 1 [ (reg/v:DI 2 x2 [ x2 ]) (reg/v:DI 3 x3 [ x3 ]) (reg/v:DI 4 x4 [ x4 ]) (reg/f:DI 93) (reg/v:DI 92 [ arg ]) (reg/v:DI 0 x0 [ x0 ]) (reg/v:DI 1 x1 [ x1 ]) (mem:DI (reg/f:DI 93) [1 MEM[(long unsigned int *)0B]+0 S8 A128]) ] [ (asm_input:DI ("r") test.c:17) (asm_input:DI ("r") test.c:17) (asm_input:DI ("r") test.c:17) (asm_input:DI ("r") test.c:17) (asm_input:DI ("r") test.c:17) (asm_input:DI ("0") test.c:17) (asm_input:DI ("1") test.c:17) (asm_input:DI ("Q") test.c:17) ] [] test.c:17)) (set (mem:DI (reg/f:DI 93) [1 MEM[(long unsigned int *)0B]+0 S8 A128]) (asm_operands/v:DI (" casp %0, %1, %3, %4, %2 eor %0, %0, %6 eor %1, %1, %7 orr %0, %0, %1 ") ("=Q") 2 [ (reg/v:DI 2 x2 [ x2 ]) (reg/v:DI 3 x3 [ x3 ]) (reg/v:DI 4 x4 [ x4 ]) (reg/f:DI 93) (reg/v:DI 92 [ arg ]) (reg/v:DI 0 x0 [ x0 ]) (reg/v:DI 1 x1 [ x1 ]) (mem:DI (reg/f:DI 93) [1 MEM[(long unsigned int *)0B]+0 S8 A128]) ] [ (asm_input:DI ("r") test.c:17) (asm_input:DI ("r") test.c:17) (asm_input:DI ("r") test.c:17) (asm_input:DI ("r") test.c:17) (asm_input:DI ("r") test.c:17) (asm_input:DI ("0") test.c:17) (asm_input:DI ("1") test.c:17) (asm_input:DI ("Q") test.c:17) ] [] test.c:17)) (clobber (reg:DI 30 x30)) (clobber (reg:DI 17 x17)) (clobber (reg:DI 16 x16)) ]) "test.c":17:3 -1 (expr_list:REG_DEAD (reg/f:DI 93) (expr_list:REG_DEAD (reg/v:DI 92 [ arg ]) (expr_list:REG_DEAD (reg/v:DI 4 x4 [ x4 ]) (expr_list:REG_DEAD (reg/v:DI 3 x3 [ x3 ]) (expr_list:REG_DEAD (reg/v:DI 2 x2 [ x2 ]) (expr_list:REG_UNUSED (reg:DI 30 x30) (expr_list:REG_UNUSED (reg:DI 17 x17) (expr_list:REG_UNUSED (reg:DI 16 x16) (expr_list:REG_UNUSED (reg/v:DI 1 x1 [ x1 ]) (nil))))))))))) Regards, Renlin On 10/16/2018 03:24 AM, Peter Bergner wrote: > On 10/11/18 10:40 PM, Jeff Law wrote: >> On 10/11/18 1:23 PM, Peter Bergner wrote: >>> * ira-lives (non_conflicting_reg_copy_p): Disable for non LRA targets. >> So this helped the alpha & hppa and sh4. >> >> I'm still seeing failures on the aarch64, s390x. No surprise on these >> since they use LRA by default and would be unaffected by this patch. > > Ok, I was able to reduce the aarch64 test case down to the minimum test case > that still kept the kernel's __cmpxchg_double() function intact. I tested > the patch you're currently running on your builders which changed some > of the "... == OP_OUT" to "... != OP_IN", etc and it doesn't fix the > following test case, like it seems to fix the s390 issue and segher's > small test case (both aarch64 and ppc64). > > It's late here, so I'll start digging into this one in the morning. > > Peter > > > > bergner@pike:~/gcc/BUGS/PR87507/$ cat slub-min.c > long > __cmpxchg_double (unsigned long arg) > { > unsigned long old1 = 0; > unsigned long old2 = arg; > unsigned long new1 = 0; > unsigned long new2 = 0; > volatile void *ptr = 0; > > unsigned long oldval1 = old1; > unsigned long oldval2 = old2; > register unsigned long x0 asm ("x0") = old1; > register unsigned long x1 asm ("x1") = old2; > register unsigned long x2 asm ("x2") = new1; > register unsigned long x3 asm ("x3") = new2; > register unsigned long x4 asm ("x4") = (unsigned long) ptr; > asm volatile (" casp %[old1], %[old2], %[new1], %[new2], %[v]\n" > " eor %[old1], %[old1], %[oldval1]\n" > " eor %[old2], %[old2], %[oldval2]\n" > " orr %[old1], %[old1], %[old2]\n" > : [old1] "+&r" (x0), [old2] "+&r" (x1), [v] "+Q" (* (unsigned long *) ptr) > : [new1] "r" (x2), [new2] "r" (x3), [ptr] "r" (x4), [oldval1] "r" (oldval1),[oldval2] "r" (oldval2) > : "x16", "x17", "x30"); > return x0; > } > bergner@pike:~/gcc/BUGS/PR87507/$ /home/bergner/gcc/build/gcc-fsf-mainline-aarch64-r264897/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-mainline-aarch64-r264897/gcc -O2 -march=armv8.1-a -c slub-min.c > /tmp/ccQCkiSG.s: Assembler messages: > /tmp/ccQCkiSG.s:24: Error: reg pair must be contiguous at operand 2 -- `casp x0,x6,x2,x3,[x5]' > > >
Hi Renlin, On Thu, Nov 01, 2018 at 06:50:22PM +0000, Renlin Li wrote: > I got the following dump from the test case. > > x1 is an early clobber operand in the inline assembly statement, > r92 should conflict with x1? Yes, I think so. Or LRA should not pick something that was already a hard register to spill (this should work without the earlyclobbers, too... In this testcase that works anyhow, but that seems to be by accident). Segher
On 11/1/18 1:50 PM, Renlin Li wrote: > Is there any update on this issues? > arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while. From the analysis I've done, my commit is just exposing latent issues in LRA. Can you try the patch I submitted here to see if it helps? https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html It survives on powerpc64le-linux, x86_64-linux and s390x-linux. Jeff threw it on his testers and said he saw an arm issue and was trying to come up with a test case for me to debug. The specific issue you mentioned with the inline asm and the casp insn is a bug in LRA where is will spill a user defined hard register and it shouldn't do that. My patch above stops that. The question is whether we've quashed the rest of the latent bugs. Peter
Hi Peter, On 11/01/2018 10:07 PM, Peter Bergner wrote: > On 11/1/18 1:50 PM, Renlin Li wrote: >> Is there any update on this issues? >> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while. > > From the analysis I've done, my commit is just exposing latent issues > in LRA. Yes, it looks like some latent issues are been exposed. Can you try the patch I submitted here to see if it helps? > > https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html Thanks for the patch! I'll help to test the patch and let you know the status. Thanks, Renlin > > It survives on powerpc64le-linux, x86_64-linux and s390x-linux. > Jeff threw it on his testers and said he saw an arm issue and was > trying to come up with a test case for me to debug. > > The specific issue you mentioned with the inline asm and the casp insn > is a bug in LRA where is will spill a user defined hard register and > it shouldn't do that. My patch above stops that. The question is > whether we've quashed the rest of the latent bugs. > > Peter > >
On 11/1/18 4:07 PM, Peter Bergner wrote: > On 11/1/18 1:50 PM, Renlin Li wrote: >> Is there any update on this issues? >> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while. > > From the analysis I've done, my commit is just exposing latent issues > in LRA. Can you try the patch I submitted here to see if it helps? > > https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html > > It survives on powerpc64le-linux, x86_64-linux and s390x-linux. > Jeff threw it on his testers and said he saw an arm issue and was > trying to come up with a test case for me to debug. So I don't think the ARM issues are related to your patch, they may have been related the combiner changes that went in around the same time. At this point your patch appears to be DTRT across the board. The only fallout is the bogus s390 asm it caught in the kernel. Jeff
On 11/5/18 1:20 PM, Jeff Law wrote: > On 11/1/18 4:07 PM, Peter Bergner wrote: >> On 11/1/18 1:50 PM, Renlin Li wrote: >>> Is there any update on this issues? >>> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while. >> >> From the analysis I've done, my commit is just exposing latent issues >> in LRA. Can you try the patch I submitted here to see if it helps? >> >> https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html >> >> It survives on powerpc64le-linux, x86_64-linux and s390x-linux. >> Jeff threw it on his testers and said he saw an arm issue and was >> trying to come up with a test case for me to debug. > So I don't think the ARM issues are related to your patch, they may have > been related the combiner changes that went in around the same time. > > At this point your patch appears to be DTRT across the board. The only > fallout is the bogus s390 asm it caught in the kernel. Cool. I will note that I contacted the s390 kernel guys and gave them a fix to their broken constraints in that asm and they are going to fix it. Is the above an approval to commit the patch mentioned above or do you still want to wait until the ARM issues are fully resolved? Peter
On 11/5/18 12:36 PM, Peter Bergner wrote: > On 11/5/18 1:20 PM, Jeff Law wrote: >> On 11/1/18 4:07 PM, Peter Bergner wrote: >>> On 11/1/18 1:50 PM, Renlin Li wrote: >>>> Is there any update on this issues? >>>> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while. >>> >>> From the analysis I've done, my commit is just exposing latent issues >>> in LRA. Can you try the patch I submitted here to see if it helps? >>> >>> https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html >>> >>> It survives on powerpc64le-linux, x86_64-linux and s390x-linux. >>> Jeff threw it on his testers and said he saw an arm issue and was >>> trying to come up with a test case for me to debug. >> So I don't think the ARM issues are related to your patch, they may have >> been related the combiner changes that went in around the same time. >> >> At this point your patch appears to be DTRT across the board. The only >> fallout is the bogus s390 asm it caught in the kernel. > > Cool. I will note that I contacted the s390 kernel guys and gave them a > fix to their broken constraints in that asm and they are going to fix it. Sounds good. I've got a hack in my tester to "fix" that bogus asm until the kernel folks do it right. > > Is the above an approval to commit the patch mentioned above or do you > still want to wait until the ARM issues are fully resolved? I think knowing the patch addresses all the known issues related to the earlier IRA/LRA change unblocks the review step. I don't think we need to wait for the other ARM issues to be resolved -- they seem to be unrelated to the IRA/LRA changes. jeff
Hi Jeff & Peter, On 11/05/2018 07:41 PM, Jeff Law wrote: > On 11/5/18 12:36 PM, Peter Bergner wrote: >> On 11/5/18 1:20 PM, Jeff Law wrote: >>> On 11/1/18 4:07 PM, Peter Bergner wrote: >>>> On 11/1/18 1:50 PM, Renlin Li wrote: >>>>> Is there any update on this issues? >>>>> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while. >>>> >>>> From the analysis I've done, my commit is just exposing latent issues >>>> in LRA. Can you try the patch I submitted here to see if it helps? >>>> >>>> https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html >>>> >>>> It survives on powerpc64le-linux, x86_64-linux and s390x-linux. >>>> Jeff threw it on his testers and said he saw an arm issue and was >>>> trying to come up with a test case for me to debug. >>> So I don't think the ARM issues are related to your patch, they may have >>> been related the combiner changes that went in around the same time. Yes, there are issues related to the combiner changes. But the IRA/LRA change dose cause the arm-none-linux-gnueabihf bootstrap native toolchain mis-compiled. And the new patch seems not fix this problem. I am trying to extract a test case, but it is a little bit hard as the toolchain itself is mis-compiled. And it ICEs when compile test case with it. I created a bugzilla ticket for this, PR87899. ./gcc/cc1 ~/gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c -O3 test main Analyzing compilation unit Performing interprocedural optimizations <*free_lang_data> <visibility> <build_ssa_passes> <opt_local_passes> <targetclone> <free-fnsummary> <increase_alignment>Streaming LTO <whole-program> <profile_estimate> <icf> <devirt> <cp> <fnsummary> <inline> <pure-const> <free-fnsummary> <static-var> <single-use> <comdats>Assembling functions: <materialize-all-clones> testduring GIMPLE pass: ldist gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c: In function ‘test’: gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c:9:1: internal compiler error: Segmentation fault 9 | test (void) | ^~~~ 0x5c3a37 crash_signal ../../gcc/gcc/toplev.c:325 0x63ef6b inchash::hash::add(void const*, unsigned int) ../../gcc/gcc/inchash.h:100 0x63ef6b inchash::hash::add_ptr(void const*) ../../gcc/gcc/inchash.h:94 0x63ef6b ddr_hasher::hash(data_dependence_relation const*) ../../gcc/gcc/tree-loop-distribution.c:143 0x63ef6b hash_table<ddr_hasher, xcallocator>::find_slot(data_dependence_relation* const&, insert_option) ../../gcc/gcc/hash-table.h:414 0x63ef6b get_data_dependence ../../gcc/gcc/tree-loop-distribution.c:1184 0x63f2bd data_dep_in_cycle_p ../../gcc/gcc/tree-loop-distribution.c:1210 0x63f2bd update_type_for_merge ../../gcc/gcc/tree-loop-distribution.c:1255 0x64064b build_rdg_partition_for_vertex ../../gcc/gcc/tree-loop-distribution.c:1302 0x64064b rdg_build_partitions ../../gcc/gcc/tree-loop-distribution.c:1754 0x64064b distribute_loop ../../gcc/gcc/tree-loop-distribution.c:2795 0x642299 execute ../../gcc/gcc/tree-loop-distribution.c:3133 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. Regards Renlin >>> >>> At this point your patch appears to be DTRT across the board. The only >>> fallout is the bogus s390 asm it caught in the kernel. >> >> Cool. I will note that I contacted the s390 kernel guys and gave them a >> fix to their broken constraints in that asm and they are going to fix it. > Sounds good. I've got a hack in my tester to "fix" that bogus asm until > the kernel folks do it right. > > >> >> Is the above an approval to commit the patch mentioned above or do you >> still want to wait until the ARM issues are fully resolved? > I think knowing the patch addresses all the known issues related to the > earlier IRA/LRA change unblocks the review step. I don't think we need > to wait for the other ARM issues to be resolved -- they seem to be > unrelated to the IRA/LRA changes. > > jeff >
On Tue, Nov 6, 2018 at 10:52 AM Renlin Li <renlin.li@foss.arm.com> wrote: > > Hi Jeff & Peter, > > On 11/05/2018 07:41 PM, Jeff Law wrote: > > On 11/5/18 12:36 PM, Peter Bergner wrote: > >> On 11/5/18 1:20 PM, Jeff Law wrote: > >>> On 11/1/18 4:07 PM, Peter Bergner wrote: > >>>> On 11/1/18 1:50 PM, Renlin Li wrote: > >>>>> Is there any update on this issues? > >>>>> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while. > >>>> > >>>> From the analysis I've done, my commit is just exposing latent issues > >>>> in LRA. Can you try the patch I submitted here to see if it helps? > >>>> > >>>> https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html > >>>> > >>>> It survives on powerpc64le-linux, x86_64-linux and s390x-linux. > >>>> Jeff threw it on his testers and said he saw an arm issue and was > >>>> trying to come up with a test case for me to debug. > >>> So I don't think the ARM issues are related to your patch, they may have > >>> been related the combiner changes that went in around the same time. > Yes, there are issues related to the combiner changes. But didn't the combiner changes come *after* these patches ? So IIUC, Renlin has been trying to get these fixed *without* the combine patches but just with your patch applied on top of the revision where the problem started showing up . Can you confirm that Renlin ? Ramana > > But the IRA/LRA change dose cause the arm-none-linux-gnueabihf bootstrap native toolchain mis-compiled. > And the new patch seems not fix this problem. > > I am trying to extract a test case, but it is a little bit hard as the toolchain itself is mis-compiled. > And it ICEs when compile test case with it. > > I created a bugzilla ticket for this, PR87899. > > ./gcc/cc1 ~/gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c -O3 > test main > Analyzing compilation unit > Performing interprocedural optimizations > <*free_lang_data> <visibility> <build_ssa_passes> <opt_local_passes> <targetclone> <free-fnsummary> <increase_alignment>Streaming LTO > <whole-program> <profile_estimate> <icf> <devirt> <cp> <fnsummary> <inline> <pure-const> <free-fnsummary> <static-var> <single-use> > <comdats>Assembling functions: > <materialize-all-clones> testduring GIMPLE pass: ldist > > gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c: In function ‘test’: > gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c:9:1: internal compiler error: Segmentation fault > 9 | test (void) > | ^~~~ > 0x5c3a37 crash_signal > ../../gcc/gcc/toplev.c:325 > 0x63ef6b inchash::hash::add(void const*, unsigned int) > ../../gcc/gcc/inchash.h:100 > 0x63ef6b inchash::hash::add_ptr(void const*) > ../../gcc/gcc/inchash.h:94 > 0x63ef6b ddr_hasher::hash(data_dependence_relation const*) > ../../gcc/gcc/tree-loop-distribution.c:143 > 0x63ef6b hash_table<ddr_hasher, xcallocator>::find_slot(data_dependence_relation* const&, insert_option) > ../../gcc/gcc/hash-table.h:414 > 0x63ef6b get_data_dependence > ../../gcc/gcc/tree-loop-distribution.c:1184 > 0x63f2bd data_dep_in_cycle_p > ../../gcc/gcc/tree-loop-distribution.c:1210 > 0x63f2bd update_type_for_merge > ../../gcc/gcc/tree-loop-distribution.c:1255 > 0x64064b build_rdg_partition_for_vertex > ../../gcc/gcc/tree-loop-distribution.c:1302 > 0x64064b rdg_build_partitions > ../../gcc/gcc/tree-loop-distribution.c:1754 > 0x64064b distribute_loop > ../../gcc/gcc/tree-loop-distribution.c:2795 > 0x642299 execute > ../../gcc/gcc/tree-loop-distribution.c:3133 > Please submit a full bug report, > with preprocessed source if appropriate. > Please include the complete backtrace with any bug report. > See <https://gcc.gnu.org/bugs/> for instructions. > > > > Regards > Renlin > > > > >>> > >>> At this point your patch appears to be DTRT across the board. The only > >>> fallout is the bogus s390 asm it caught in the kernel. > >> > >> Cool. I will note that I contacted the s390 kernel guys and gave them a > >> fix to their broken constraints in that asm and they are going to fix it. > > Sounds good. I've got a hack in my tester to "fix" that bogus asm until > > the kernel folks do it right. > > > > > >> > >> Is the above an approval to commit the patch mentioned above or do you > >> still want to wait until the ARM issues are fully resolved? > > I think knowing the patch addresses all the known issues related to the > > earlier IRA/LRA change unblocks the review step. I don't think we need > > to wait for the other ARM issues to be resolved -- they seem to be > > unrelated to the IRA/LRA changes. > > > > jeff > >
Hi Ramana, On 11/06/2018 10:57 AM, Ramana Radhakrishnan wrote: > On Tue, Nov 6, 2018 at 10:52 AM Renlin Li <renlin.li@foss.arm.com> wrote: >> >> Hi Jeff & Peter, >> >> On 11/05/2018 07:41 PM, Jeff Law wrote: >>> On 11/5/18 12:36 PM, Peter Bergner wrote: >>>> On 11/5/18 1:20 PM, Jeff Law wrote: >>>>> On 11/1/18 4:07 PM, Peter Bergner wrote: >>>>>> On 11/1/18 1:50 PM, Renlin Li wrote: >>>>>>> Is there any update on this issues? >>>>>>> arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while. >>>>>> >>>>>> From the analysis I've done, my commit is just exposing latent issues >>>>>> in LRA. Can you try the patch I submitted here to see if it helps? >>>>>> >>>>>> https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html >>>>>> >>>>>> It survives on powerpc64le-linux, x86_64-linux and s390x-linux. >>>>>> Jeff threw it on his testers and said he saw an arm issue and was >>>>>> trying to come up with a test case for me to debug. >>>>> So I don't think the ARM issues are related to your patch, they may have >>>>> been related the combiner changes that went in around the same time. >> Yes, there are issues related to the combiner changes. > > But didn't the combiner changes come *after* these patches ? So IIUC, > Renlin has been trying to get these fixed *without* the combine > patches but just with your patch applied on top of the revision where > the problem started showing up . > > Can you confirm that Renlin ? I just did a bootstrap again with everything up to r264897 which is Oct 6. it produce the ICE I mentioned on the PR87899. The first combiner patch on Oct 22. Regards, Renlin > > > Ramana >> >> But the IRA/LRA change dose cause the arm-none-linux-gnueabihf bootstrap native toolchain mis-compiled. >> And the new patch seems not fix this problem. >> >> I am trying to extract a test case, but it is a little bit hard as the toolchain itself is mis-compiled. >> And it ICEs when compile test case with it. >> >> I created a bugzilla ticket for this, PR87899. >> >> ./gcc/cc1 ~/gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c -O3 >> test main >> Analyzing compilation unit >> Performing interprocedural optimizations >> <*free_lang_data> <visibility> <build_ssa_passes> <opt_local_passes> <targetclone> <free-fnsummary> <increase_alignment>Streaming LTO >> <whole-program> <profile_estimate> <icf> <devirt> <cp> <fnsummary> <inline> <pure-const> <free-fnsummary> <static-var> <single-use> >> <comdats>Assembling functions: >> <materialize-all-clones> testduring GIMPLE pass: ldist >> >> gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c: In function ‘test’: >> gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c:9:1: internal compiler error: Segmentation fault >> 9 | test (void) >> | ^~~~ >> 0x5c3a37 crash_signal >> ../../gcc/gcc/toplev.c:325 >> 0x63ef6b inchash::hash::add(void const*, unsigned int) >> ../../gcc/gcc/inchash.h:100 >> 0x63ef6b inchash::hash::add_ptr(void const*) >> ../../gcc/gcc/inchash.h:94 >> 0x63ef6b ddr_hasher::hash(data_dependence_relation const*) >> ../../gcc/gcc/tree-loop-distribution.c:143 >> 0x63ef6b hash_table<ddr_hasher, xcallocator>::find_slot(data_dependence_relation* const&, insert_option) >> ../../gcc/gcc/hash-table.h:414 >> 0x63ef6b get_data_dependence >> ../../gcc/gcc/tree-loop-distribution.c:1184 >> 0x63f2bd data_dep_in_cycle_p >> ../../gcc/gcc/tree-loop-distribution.c:1210 >> 0x63f2bd update_type_for_merge >> ../../gcc/gcc/tree-loop-distribution.c:1255 >> 0x64064b build_rdg_partition_for_vertex >> ../../gcc/gcc/tree-loop-distribution.c:1302 >> 0x64064b rdg_build_partitions >> ../../gcc/gcc/tree-loop-distribution.c:1754 >> 0x64064b distribute_loop >> ../../gcc/gcc/tree-loop-distribution.c:2795 >> 0x642299 execute >> ../../gcc/gcc/tree-loop-distribution.c:3133 >> Please submit a full bug report, >> with preprocessed source if appropriate. >> Please include the complete backtrace with any bug report. >> See <https://gcc.gnu.org/bugs/> for instructions. >> >> >> >> Regards >> Renlin >> >> >> >>>>> >>>>> At this point your patch appears to be DTRT across the board. The only >>>>> fallout is the bogus s390 asm it caught in the kernel. >>>> >>>> Cool. I will note that I contacted the s390 kernel guys and gave them a >>>> fix to their broken constraints in that asm and they are going to fix it. >>> Sounds good. I've got a hack in my tester to "fix" that bogus asm until >>> the kernel folks do it right. >>> >>> >>>> >>>> Is the above an approval to commit the patch mentioned above or do you >>>> still want to wait until the ARM issues are fully resolved? >>> I think knowing the patch addresses all the known issues related to the >>> earlier IRA/LRA change unblocks the review step. I don't think we need >>> to wait for the other ARM issues to be resolved -- they seem to be >>> unrelated to the IRA/LRA changes. >>> >>> jeff >>>
On 11/6/18 6:23 AM, Renlin Li wrote: > I just did a bootstrap again with everything up to r264897 which is Oct 6. > it produce the ICE I mentioned on the PR87899. > > The first combiner patch on Oct 22. Do the testsuite results (for disable-bootstrap builds) differ between r264896 and r264897? If so, that would be much easier to track down. If not, maybe the following patch could help to narrow down which gcc source file(s) are being miscompiled by allowing you to disable the special handling of copy conflicts with an option? The option default (ie, not using the option or -fno-ira-copies-conflict) is the same behavior as now and -fira-copies-conflict would make things behave like they did before my patch. Peter Index: gcc/common.opt =================================================================== --- gcc/common.opt (revision 265402) +++ gcc/common.opt (working copy) @@ -1761,6 +1761,10 @@ Enum(ira_region) String(all) Value(IRA_R EnumValue Enum(ira_region) String(mixed) Value(IRA_REGION_MIXED) +fira-copies-conflict +Common Report Var(flag_ira_copies_conflict) Init(0) Optimization +Make pseudos connected by a copy conflict + fira-hoist-pressure Common Report Var(flag_ira_hoist_pressure) Init(1) Optimization Use IRA based register pressure calculation Index: gcc/ira-lives.c =================================================================== --- gcc/ira-lives.c (revision 265402) +++ gcc/ira-lives.c (working copy) @@ -1066,7 +1066,7 @@ non_conflicting_reg_copy_p (rtx_insn *in { /* Reload has issues with overlapping pseudos being assigned to the same hard register, so don't allow it. See PR87600 for details. */ - if (!targetm.lra_p ()) + if (flag_ira_copies_conflict || !targetm.lra_p ()) return NULL_RTX; rtx set = single_set (insn);
On 11/6/18 3:52 AM, Renlin Li wrote: > Hi Jeff & Peter, > > On 11/05/2018 07:41 PM, Jeff Law wrote: >> On 11/5/18 12:36 PM, Peter Bergner wrote: >>> On 11/5/18 1:20 PM, Jeff Law wrote: >>>> On 11/1/18 4:07 PM, Peter Bergner wrote: >>>>> On 11/1/18 1:50 PM, Renlin Li wrote: >>>>>> Is there any update on this issues? >>>>>> arm-none-linux-gnueabihf native toolchain has been mis-compiled >>>>>> for a while. >>>>> >>>>> From the analysis I've done, my commit is just exposing latent issues >>>>> in LRA. Can you try the patch I submitted here to see if it helps? >>>>> >>>>> https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html >>>>> >>>>> It survives on powerpc64le-linux, x86_64-linux and s390x-linux. >>>>> Jeff threw it on his testers and said he saw an arm issue and was >>>>> trying to come up with a test case for me to debug. >>>> So I don't think the ARM issues are related to your patch, they may >>>> have >>>> been related the combiner changes that went in around the same time. > Yes, there are issues related to the combiner changes. > > But the IRA/LRA change dose cause the arm-none-linux-gnueabihf bootstrap > native toolchain mis-compiled. > And the new patch seems not fix this problem. That's strange. I'm bootstrapping arm-linux-gnueabihf daily with qemu + a suitable root filesystem using Peter's most recent testing patch. > > I am trying to extract a test case, but it is a little bit hard as the > toolchain itself is mis-compiled. > And it ICEs when compile test case with it. What I would suggest doing is to first start with running the testsuite against the stage1 compiler before/after Peter's changes. Sometimes that'll turn up something useful and you can avoid debuging things through stage2/stage3. jeff
Hi, On 11/06/2018 06:58 PM, Jeff Law wrote: > On 11/6/18 3:52 AM, Renlin Li wrote: >> Hi Jeff & Peter, >> >> On 11/05/2018 07:41 PM, Jeff Law wrote: >>> On 11/5/18 12:36 PM, Peter Bergner wrote: >>>> On 11/5/18 1:20 PM, Jeff Law wrote: >>>>> On 11/1/18 4:07 PM, Peter Bergner wrote: >>>>>> On 11/1/18 1:50 PM, Renlin Li wrote: >>>>>>> Is there any update on this issues? >>>>>>> arm-none-linux-gnueabihf native toolchain has been mis-compiled >>>>>>> for a while. >>>>>> >>>>>> From the analysis I've done, my commit is just exposing latent issues >>>>>> in LRA. Can you try the patch I submitted here to see if it helps? >>>>>> >>>>>> https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html >>>>>> >>>>>> It survives on powerpc64le-linux, x86_64-linux and s390x-linux. >>>>>> Jeff threw it on his testers and said he saw an arm issue and was >>>>>> trying to come up with a test case for me to debug. >>>>> So I don't think the ARM issues are related to your patch, they may >>>>> have >>>>> been related the combiner changes that went in around the same time. >> Yes, there are issues related to the combiner changes. >> >> But the IRA/LRA change dose cause the arm-none-linux-gnueabihf bootstrap >> native toolchain mis-compiled. >> And the new patch seems not fix this problem. > That's strange. I'm bootstrapping arm-linux-gnueabihf daily with qemu + > a suitable root filesystem using Peter's most recent testing patch. > > >> >> I am trying to extract a test case, but it is a little bit hard as the >> toolchain itself is mis-compiled. >> And it ICEs when compile test case with it. > What I would suggest doing is to first start with running the testsuite > against the stage1 compiler before/after Peter's changes. Sometimes > that'll turn up something useful and you can avoid debuging things > through stage2/stage3. Hi Jeff, Thanks for the suggestion! I could reproduce it with stage1 compiler. Hi Peter, I think I found the problem! As described in the PR, a hard register is used in an pre/post modify expression. The hard register is live, but updated. In this case, we should make it conflicting with all pseudos live at that point. Does it make sense? It fixes the ICE of mis-compiled arm-linux-gnueabihf toolchain described in the PR. I attached the patch for discussion. I haven't give a complete test on arm or any other targets, yet. (Probably need more adjusting) I will run arm and aarch64 regression test, cross and native. Regards, Renlin BTW, The pre/post modify expression is generated by auto_inc/auto_dec pass. somehow, it merges with hard register, for example function argument registers. This optimization make the life for RA harder. Probably we don't want that pass too aggressive. @Wilco. (This IRA/LRA and the combiner change reveals a lot of issues, force us to work on it and improve the compiler :) .) gcc/ChangeLog: 2018-11-08 Renlin Li <renlin.li@arm.com> PR middle-end/87899 * lra-lives.c (process_bb_lives): Make hard register of INOUT type conflict with all live pseudo. > > > jeff > diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c index 0bf8cd06a302c8a6fcb914b94f953cdaa86597a2..370a7254cac7dbde4e320424e09274cee66c50b9 100644 --- a/gcc/lra-lives.c +++ b/gcc/lra-lives.c @@ -878,11 +878,25 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p) /* See which defined values die here. */ for (reg = curr_id->regs; reg != NULL; reg = reg->next) - if (reg->type == OP_OUT - && ! reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p) - need_curr_point_incr - |= mark_regno_dead (reg->regno, reg->biggest_mode, - curr_point); + if (! reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p) + { + if (reg->type == OP_OUT) + need_curr_point_incr + |= mark_regno_dead (reg->regno, reg->biggest_mode, + curr_point); + + // This is a hard register, and it must be live. Keep it live and + // make it conflict with all live pseudo registers. + else if (reg->type == OP_INOUT && reg->regno < FIRST_PSEUDO_REGISTER) + { + lra_assert (TEST_HARD_REG_BIT (hard_regs_live, reg->regno)); + + unsigned int i; + EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i) + SET_HARD_REG_BIT (lra_reg_info[i].conflict_hard_regs, + reg->regno); + } + } for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next) if (reg->type == OP_OUT
On Thu, Nov 8, 2018 at 11:57 AM Renlin Li <renlin.li@foss.arm.com> wrote: > > Hi, > > On 11/06/2018 06:58 PM, Jeff Law wrote: > > On 11/6/18 3:52 AM, Renlin Li wrote: > >> Hi Jeff & Peter, > >> > >> On 11/05/2018 07:41 PM, Jeff Law wrote: > >>> On 11/5/18 12:36 PM, Peter Bergner wrote: > >>>> On 11/5/18 1:20 PM, Jeff Law wrote: > >>>>> On 11/1/18 4:07 PM, Peter Bergner wrote: > >>>>>> On 11/1/18 1:50 PM, Renlin Li wrote: > >>>>>>> Is there any update on this issues? > >>>>>>> arm-none-linux-gnueabihf native toolchain has been mis-compiled > >>>>>>> for a while. > >>>>>> > >>>>>> From the analysis I've done, my commit is just exposing latent issues > >>>>>> in LRA. Can you try the patch I submitted here to see if it helps? > >>>>>> > >>>>>> https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html > >>>>>> > >>>>>> It survives on powerpc64le-linux, x86_64-linux and s390x-linux. > >>>>>> Jeff threw it on his testers and said he saw an arm issue and was > >>>>>> trying to come up with a test case for me to debug. > >>>>> So I don't think the ARM issues are related to your patch, they may > >>>>> have > >>>>> been related the combiner changes that went in around the same time. > >> Yes, there are issues related to the combiner changes. > >> > >> But the IRA/LRA change dose cause the arm-none-linux-gnueabihf bootstrap > >> native toolchain mis-compiled. > >> And the new patch seems not fix this problem. > > That's strange. I'm bootstrapping arm-linux-gnueabihf daily with qemu + > > a suitable root filesystem using Peter's most recent testing patch. > > > > > >> > >> I am trying to extract a test case, but it is a little bit hard as the > >> toolchain itself is mis-compiled. > >> And it ICEs when compile test case with it. > > What I would suggest doing is to first start with running the testsuite > > against the stage1 compiler before/after Peter's changes. Sometimes > > that'll turn up something useful and you can avoid debuging things > > through stage2/stage3. > > Hi Jeff, > Thanks for the suggestion! I could reproduce it with stage1 compiler. > > > > Hi Peter, > > I think I found the problem! > > As described in the PR, a hard register is used in > an pre/post modify expression. The hard register is live, but updated. > In this case, we should make it conflicting with all pseudos live at > that point. Does it make sense? Err, that looks very much like a hack that manages to hide the issue. Esp. adding conflicts in a loop that says "See which defined values die here." is quite fishy. Richard. > > > > > It fixes the ICE of mis-compiled arm-linux-gnueabihf toolchain described in the > PR. > > I attached the patch for discussion. I haven't give a complete test on arm or > any other targets, yet. (Probably need more adjusting) > > I will run arm and aarch64 regression test, cross and native. > > Regards, > Renlin > > BTW, The pre/post modify expression is generated by auto_inc/auto_dec pass. > somehow, it merges with hard register, for example function argument registers. > This optimization make the life for RA harder. Probably we don't want that pass > too aggressive. @Wilco. > (This IRA/LRA and the combiner change reveals a lot of issues, > force us to work on it and improve the compiler :) .) > > gcc/ChangeLog: > > 2018-11-08 Renlin Li <renlin.li@arm.com> > PR middle-end/87899 > * lra-lives.c (process_bb_lives): Make hard register of INOUT > type conflict with all live pseudo. > > > > > > > > > > > > > jeff > >
On 11/8/18 4:57 AM, Renlin Li wrote: > I think I found the problem! > > As described in the PR, a hard register is used in > an pre/post modify expression. The hard register is live, but updated. > In this case, we should make it conflicting with all pseudos live at > that point. Does it make sense? Do you have a reproducer test case I can look at? I'd like to see the problematical rtl to help me determine whether your patch is correct or not. ...and thank you for debugging this! Peter
Hi Peter, On 11/08/2018 12:35 PM, Peter Bergner wrote: > On 11/8/18 4:57 AM, Renlin Li wrote: >> I think I found the problem! >> >> As described in the PR, a hard register is used in >> an pre/post modify expression. The hard register is live, but updated. >> In this case, we should make it conflicting with all pseudos live at >> that point. Does it make sense? > > Do you have a reproducer test case I can look at? I'd like to see the > problematical rtl to help me determine whether your patch is correct > or not. ...and thank you for debugging this! > > Peter > Sure! (I was trying to send the mail, but it failed with large attachment.) I attached the dump file in the bugzilla ticket: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87899 I remove the unrelated dump (tar -xzvf xxx.tgz) The code you want to check is the following in ira pass: insn 10905: r1 = r2040 insn 208: use and update r1 with pre_modify insn 191: use pseudo r2040 I could not create a test case. This dump is created with stage1 compiler compiling next stage compiler. The (not helpful) command line is: /home/renlin/try-new/./prev-gcc/cc1plus -quiet -nostdinc++ -v -I /home/renlin/try-new/prev-arm-none-linux-gnueabihf/libstdc++-v3/include/arm-none-linux-gnueabihf -I /home/renlin/try-new/prev-arm-none-linux-gnueabihf/libstdc++-v3/include -I /home/renlin/gcc/libstdc++-v3/libsupc++ -I . -I . -I ../../gcc/gcc -I ../../gcc/gcc/. -I ../../gcc/gcc/../include -I ../../gcc/gcc/../libcpp/include -I ../../gcc/gcc/../libdecnumber -I ../../gcc/gcc/../libdecnumber/dpd -I ../libdecnumber -I ../../gcc/gcc/../libbacktrace -imultilib . -imultiarch arm-linux-gnueabihf -iprefix /home/renlin/try-new/prev-gcc/../lib/gcc/arm-none-linux-gnueabihf/9.0.0/ -isystem /home/renlin/try-new/./prev-gcc/include -isystem /home/renlin/try-new/./prev-gcc/include-fixed -MMD tree-loop-distribution.d -MF ./.deps/tree-loop-distribution.TPo -MP -MT tree-loop-distribution.o -D_GNU_SOURCE -D IN_GCC -D HAVE_CONFIG_H ../../gcc/gcc/tree-loop-distribution.c -quiet -dumpbase tree-loop-distribution.c -mfloat-abi=hard -mfpu=neon -mthumb -mtls-dialect=gnu -march=armv7-a+simd -auxbase-strip tree-loop-distribution.s -g -gtoggle -O2 -Wextra -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wsuggest-attribute=format -Woverloaded-virtual -Wpedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -version -fno-PIE -fno-checking -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -fno-common -o tree-loop-distribution.s Regards, Renlin
On 11/8/18 5:48 AM, Richard Biener wrote: > Err, that looks very much like a hack that manages to hide the issue. It's true we do not want to hide the issue by adding unneeded conflicts, since that can lead to unnecessary spills. However, ... > Esp. adding conflicts in a loop that says "See which defined values die here." > is quite fishy. ..the original loop is dealing with some of the gory details you never read about in academic RA papers. This code is used to catch the case where an insn defines a register(s) that is never used. Because it is never used, it never ends up in the "live" (ie, live and available) set, which can cause us to miss some required conflicts. That said, I still need to look at the RTL from the bad program before determining whether the patch is correct or not. Computing accurate conflict information is a delicate thing. Peter
On 11/8/18 4:57 AM, Renlin Li wrote: > I think I found the problem! > > As described in the PR, a hard register is used in > an pre/post modify expression. The hard register is live, but updated. > In this case, we should make it conflicting with all pseudos live at > that point. Does it make sense? [snip] > It fixes the ICE of mis-compiled arm-linux-gnueabihf toolchain described in the > PR. > > I attached the patch for discussion. I haven't give a complete test on arm or > any other targets, yet. (Probably need more adjusting) Yes, this is the problem. We see from the dump, that r2040 does not conflict with hard reg r1: ;; a2040(r1597,l0) conflicts: <list of pseudo regs> ;; total conflict hard regs: ;; conflict hard regs: ...and we have the following RTL: (insn 10905 179 199 2 (set (reg:SI 1 r1) (reg/f:SI 2040)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp} (nil)) ... (insn 208 202 182 2 (set (mem/f/c:SI (pre_modify:SI (reg:SI 1 r1) (plus:SI (reg:SI 1 r1) (const_int 12 [0xc]))) [129 loop_nest.m_vec+0 S4 A32]) (reg:SI 1048)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp} (expr_list:REG_INC (reg:SI 1 r1) (nil))) ... (insn 191 189 192 2 (set (mem/f/c:SI (plus:SI (reg/f:SI 2040) (const_int 8 [0x8])) [367 ddrs_table+0 S4 A32]) (reg/f:SI 1047)) "../../gcc/gcc/tree-loop-distribution.c":2741 647 {*thumb2_movsi_vfp} (nil)) So my patch caused us to (correctly) skip adding a conflict between r1 and r2040 due to the register copy in insn 10905. However, they really should conflict as you found due to the definition of r1 in insn 208 and the fact we don't add one is a latent bug in LRA. I think your patch is on the right track, but not totally there yet. Imagine instead that the references to r1 and r2040 were swapped, so instead we have: (insn 10905 179 199 2 (set (reg:SI 2040) (reg/f:SI 1 r1)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp} (nil)) ... (insn 208 202 182 2 (set (mem/f/c:SI (pre_modify:SI (reg:SI 2040) (plus:SI (reg:SI 2040) (const_int 12 [0xc]))) [129 loop_nest.m_vec+0 S4 A32]) (reg:SI 1048)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp} (expr_list:REG_INC (reg:SI 2040) (nil))) ... (insn 191 189 192 2 (set (mem/f/c:SI (plus:SI (reg/f:SI 1 r1) (const_int 8 [0x8])) [367 ddrs_table+0 S4 A32]) (reg/f:SI 1047)) "../../gcc/gcc/tree-loop-distribution.c":2741 647 {*thumb2_movsi_vfp} (nil)) Even with your patch, we'd miss adding the conflict between r1 and r2040. Let me think about how we should solve this one. And a *BIG* thank you for tracking down the problem!!! Peter
Hi Peter, On 11/08/2018 03:21 PM, Peter Bergner wrote: > On 11/8/18 4:57 AM, Renlin Li wrote: >> I think I found the problem! >> >> As described in the PR, a hard register is used in >> an pre/post modify expression. The hard register is live, but updated. >> In this case, we should make it conflicting with all pseudos live at >> that point. Does it make sense? > [snip] >> It fixes the ICE of mis-compiled arm-linux-gnueabihf toolchain described in the >> PR. >> >> I attached the patch for discussion. I haven't give a complete test on arm or >> any other targets, yet. (Probably need more adjusting) > > Yes, this is the problem. We see from the dump, that r2040 does not conflict with > hard reg r1: > > ;; a2040(r1597,l0) conflicts: <list of pseudo regs> > ;; total conflict hard regs: > ;; conflict hard regs: I think you should look for axxx(r2040, ..)? Maybe I am wrong (not an expert of RA), from what I observed, it is the LRA makes the code more complex. It decides to split the live range and spill r2040. It creates multiple instructions to reload it. r2944 in LRA dump is the register which starts to go wrong. It is assigned as r1. Creating newreg=2944 from oldreg=2040, assigning class GENERAL_REGS to inheritance r2944 Original reg change 2040->2944 (bb2): 10905: r1:SI=r2944:SI Add inheritance<-original before: 12868: r2944:SI=r2040:SI The dump is the final state of LRA. I debug it with gdb, and there are some temporary steps which is not observable in the final dump. > > ...and we have the following RTL: > > (insn 10905 179 199 2 (set (reg:SI 1 r1) > (reg/f:SI 2040)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp} > (nil)) > > ... > > (insn 208 202 182 2 (set (mem/f/c:SI (pre_modify:SI (reg:SI 1 r1) > (plus:SI (reg:SI 1 r1) > (const_int 12 [0xc]))) [129 loop_nest.m_vec+0 S4 A32]) > (reg:SI 1048)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp} > (expr_list:REG_INC (reg:SI 1 r1) > (nil))) > > ... > > (insn 191 189 192 2 (set (mem/f/c:SI (plus:SI (reg/f:SI 2040) > (const_int 8 [0x8])) [367 ddrs_table+0 S4 A32]) > (reg/f:SI 1047)) "../../gcc/gcc/tree-loop-distribution.c":2741 647 {*thumb2_movsi_vfp} > (nil)) > > So my patch caused us to (correctly) skip adding a conflict between r1 and > r2040 due to the register copy in insn 10905. However, they really should > conflict as you found due to the definition of r1 in insn 208 and the fact > we don't add one is a latent bug in LRA. I think your patch is on the right > track, but not totally there yet. Imagine instead that the references to r1 > and r2040 were swapped, so instead we have: > > (insn 10905 179 199 2 (set (reg:SI 2040) > (reg/f:SI 1 r1)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp} > (nil)) > > ... > > (insn 208 202 182 2 (set (mem/f/c:SI (pre_modify:SI (reg:SI 2040) > (plus:SI (reg:SI 2040) > (const_int 12 [0xc]))) [129 loop_nest.m_vec+0 S4 A32]) > (reg:SI 1048)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp} > (expr_list:REG_INC (reg:SI 2040) > (nil))) > > ... > > (insn 191 189 192 2 (set (mem/f/c:SI (plus:SI (reg/f:SI 1 r1) > (const_int 8 [0x8])) [367 ddrs_table+0 S4 A32]) > (reg/f:SI 1047)) "../../gcc/gcc/tree-loop-distribution.c":2741 647 {*thumb2_movsi_vfp} > (nil)) > > Even with your patch, we'd miss adding the conflict between r1 and r2040. > Let me think about how we should solve this one. Yes, I am not confident the patch will be the ultimate fix to the problem. > > And a *BIG* thank you for tracking down the problem!!! > Nop. Regards, Renlin > Peter >
On 11/8/18 10:19 AM, Renlin Li wrote: >> Yes, this is the problem. We see from the dump, that r2040 does not conflict with >> hard reg r1: >> >> ;; a2040(r1597,l0) conflicts: <list of pseudo regs> >> ;; total conflict hard regs: >> ;; conflict hard regs: > I think you should look for axxx(r2040, ..)? > > Maybe I am wrong (not an expert of RA), from what I observed, it is the LRA > makes the code more complex. It decides to split the live range and spill r2040. > It creates multiple instructions to reload it. > r2944 in LRA dump is the register which starts to go wrong. It is assigned as r1. Yes, IRA and LRA have similar code to compute conflicts. We need them both to compute that r2040 (and the reload pseudo(s) generated for it by LRA) conflict with r1. Peter
On 11/8/18 8:29 AM, Peter Bergner wrote: > On 11/8/18 5:48 AM, Richard Biener wrote: >> Esp. adding conflicts in a loop that says "See which defined values die here." >> is quite fishy. > > ..the original loop is dealing with some of the gory details you never read > about in academic RA papers. This code is used to catch the case where an insn > defines a register(s) that is never used. Because it is never used, it never > ends up in the "live" (ie, live and available) set, which can cause us to miss > some required conflicts. Heh, of course I confused this loop with the one I described which is earlier in the function. This loop is actually just the normal loop over all definitions, removing them from the live set and adding conflicts with all pseudos that are currently live. Sorry for the confusion. :-( Peter
Index: gcc/ira.h =================================================================== --- gcc/ira.h (revision 264789) +++ gcc/ira.h (working copy) @@ -210,6 +210,9 @@ extern void ira_adjust_equiv_reg_cost (u /* ira-costs.c */ extern void ira_costs_c_finalize (void); +/* ira-lives.c */ +extern rtx copy_insn_p (rtx_insn *); + /* Spilling static chain pseudo may result in generation of wrong non-local goto code using frame-pointer to address saved stack pointer value after restoring old frame pointer value. The Index: gcc/ira-lives.c =================================================================== --- gcc/ira-lives.c (revision 264789) +++ gcc/ira-lives.c (working copy) @@ -84,6 +84,10 @@ static int *allocno_saved_at_call; supplemental to recog_data. */ static alternative_mask preferred_alternatives; +/* If non-NULL, the source operand of a register to register copy for which + we should not add a conflict with the copy's destination operand. */ +static rtx ignore_reg_for_conflicts; + /* Record hard register REGNO as now being live. */ static void make_hard_regno_live (int regno) @@ -101,6 +105,11 @@ make_hard_regno_dead (int regno) { ira_object_t obj = ira_object_id_map[i]; + if (ignore_reg_for_conflicts != NULL_RTX + && REGNO (ignore_reg_for_conflicts) + == (unsigned int) ALLOCNO_REGNO (OBJECT_ALLOCNO (obj))) + continue; + SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), regno); SET_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), regno); } @@ -154,12 +163,38 @@ static void make_object_dead (ira_object_t obj) { live_range_t lr; + int ignore_regno = -1; + int last_regno = -1; sparseset_clear_bit (objects_live, OBJECT_CONFLICT_ID (obj)); + /* Check whether any part of IGNORE_REG_FOR_CONFLICTS already conflicts + with OBJ. */ + if (ignore_reg_for_conflicts != NULL_RTX + && REGNO (ignore_reg_for_conflicts) < FIRST_PSEUDO_REGISTER) + { + last_regno = END_REGNO (ignore_reg_for_conflicts); + int src_regno = ignore_regno = REGNO (ignore_reg_for_conflicts); + + while (src_regno < last_regno) + { + if (TEST_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), src_regno)) + { + ignore_regno = last_regno = -1; + break; + } + src_regno++; + } + } + IOR_HARD_REG_SET (OBJECT_CONFLICT_HARD_REGS (obj), hard_regs_live); IOR_HARD_REG_SET (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), hard_regs_live); + /* If IGNORE_REG_FOR_CONFLICTS did not already conflict with OBJ, make + sure it still doesn't. */ + for (; ignore_regno < last_regno; ignore_regno++) + CLEAR_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), ignore_regno); + lr = OBJECT_LIVE_RANGES (obj); ira_assert (lr != NULL); lr->finish = curr_point; @@ -1022,6 +1057,38 @@ find_call_crossed_cheap_reg (rtx_insn *i return cheap_reg; } +/* Determine whether INSN is a register to register copy of the type where + we do not need to make the source and destiniation registers conflict. + If this is a copy instruction, then return the source reg. Otherwise, + return NULL_RTX. */ +rtx +copy_insn_p (rtx_insn *insn) +{ + rtx set; + + /* Disallow anything other than a simple register to register copy + that has no side effects. */ + if ((set = single_set (insn)) == NULL_RTX + || !REG_P (SET_DEST (set)) + || !REG_P (SET_SRC (set)) + || side_effects_p (set)) + return NULL_RTX; + + int dst_regno = REGNO (SET_DEST (set)); + int src_regno = REGNO (SET_SRC (set)); + machine_mode mode = GET_MODE (SET_DEST (set)); + + /* Computing conflicts for register pairs is difficult to get right, so + for now, disallow it. */ + if ((dst_regno < FIRST_PSEUDO_REGISTER + && hard_regno_nregs (dst_regno, mode) != 1) + || (src_regno < FIRST_PSEUDO_REGISTER + && hard_regno_nregs (src_regno, mode) != 1)) + return NULL_RTX; + + return SET_SRC (set); +} + /* Process insns of the basic block given by its LOOP_TREE_NODE to update allocno live ranges, allocno hard register conflicts, intersected calls, and register pressure info for allocnos for the @@ -1107,22 +1174,7 @@ process_bb_node_lives (ira_loop_tree_nod curr_point); call_p = CALL_P (insn); -#ifdef REAL_PIC_OFFSET_TABLE_REGNUM - int regno; - bool clear_pic_use_conflict_p = false; - /* Processing insn usage in call insn can create conflict - with pic pseudo and pic hard reg and that is wrong. - Check this situation and fix it at the end of the insn - processing. */ - if (call_p && pic_offset_table_rtx != NULL_RTX - && (regno = REGNO (pic_offset_table_rtx)) >= FIRST_PSEUDO_REGISTER - && (a = ira_curr_regno_allocno_map[regno]) != NULL) - clear_pic_use_conflict_p - = (find_regno_fusage (insn, USE, REAL_PIC_OFFSET_TABLE_REGNUM) - && ! TEST_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS - (ALLOCNO_OBJECT (a, 0)), - REAL_PIC_OFFSET_TABLE_REGNUM)); -#endif + ignore_reg_for_conflicts = copy_insn_p (insn); /* Mark each defined value as live. We need to do this for unused values because they still conflict with quantities @@ -1275,20 +1327,9 @@ process_bb_node_lives (ira_loop_tree_nod } } -#ifdef REAL_PIC_OFFSET_TABLE_REGNUM - if (clear_pic_use_conflict_p) - { - regno = REGNO (pic_offset_table_rtx); - a = ira_curr_regno_allocno_map[regno]; - CLEAR_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (ALLOCNO_OBJECT (a, 0)), - REAL_PIC_OFFSET_TABLE_REGNUM); - CLEAR_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS - (ALLOCNO_OBJECT (a, 0)), - REAL_PIC_OFFSET_TABLE_REGNUM); - } -#endif curr_point++; } + ignore_reg_for_conflicts = NULL_RTX; if (bb_has_eh_pred (bb)) for (j = 0; ; ++j) Index: gcc/lra-lives.c =================================================================== --- gcc/lra-lives.c (revision 264789) +++ gcc/lra-lives.c (working copy) @@ -96,6 +96,10 @@ static bitmap_head temp_bitmap; /* Pool for pseudo live ranges. */ static object_allocator<lra_live_range> lra_live_range_pool ("live ranges"); +/* If non-NULL, the source operand of a register to register copy for which + we should not add a conflict with the copy's destination operand. */ +static rtx ignore_reg_for_conflicts; + /* Free live range list LR. */ static void free_live_range_list (lra_live_range_t lr) @@ -239,11 +243,9 @@ make_hard_regno_live (int regno) /* Process the definition of hard register REGNO. This updates hard_regs_live, START_DYING and conflict hard regs for living - pseudos. Conflict hard regs for the pic pseudo is not updated if - REGNO is REAL_PIC_OFFSET_TABLE_REGNUM and CHECK_PIC_PSEUDO_P is - true. */ + pseudos. */ static void -make_hard_regno_dead (int regno, bool check_pic_pseudo_p ATTRIBUTE_UNUSED) +make_hard_regno_dead (int regno) { lra_assert (regno < FIRST_PSEUDO_REGISTER); if (! TEST_HARD_REG_BIT (hard_regs_live, regno)) @@ -251,13 +253,12 @@ make_hard_regno_dead (int regno, bool ch sparseset_set_bit (start_dying, regno); unsigned int i; EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i) -#ifdef REAL_PIC_OFFSET_TABLE_REGNUM - if (! check_pic_pseudo_p - || regno != REAL_PIC_OFFSET_TABLE_REGNUM - || pic_offset_table_rtx == NULL - || i != REGNO (pic_offset_table_rtx)) -#endif + { + if (ignore_reg_for_conflicts != NULL_RTX + && REGNO (ignore_reg_for_conflicts) == i) + continue; SET_HARD_REG_BIT (lra_reg_info[i].conflict_hard_regs, regno); + } CLEAR_HARD_REG_BIT (hard_regs_live, regno); if (fixed_regs[regno] || TEST_HARD_REG_BIT (hard_regs_spilled_into, regno)) { @@ -294,14 +295,41 @@ static void mark_pseudo_dead (int regno, int point) { lra_live_range_t p; + int ignore_regno = -1; + int last_regno = -1; lra_assert (regno >= FIRST_PSEUDO_REGISTER); lra_assert (sparseset_bit_p (pseudos_live, regno)); sparseset_clear_bit (pseudos_live, regno); sparseset_set_bit (start_dying, regno); + /* Check whether any part of IGNORE_REG_FOR_CONFLICTS already conflicts + with REGNO. */ + if (ignore_reg_for_conflicts != NULL_RTX + && REGNO (ignore_reg_for_conflicts) < FIRST_PSEUDO_REGISTER) + { + last_regno = END_REGNO (ignore_reg_for_conflicts); + int src_regno = ignore_regno = REGNO (ignore_reg_for_conflicts); + + while (src_regno < last_regno) + { + if (TEST_HARD_REG_BIT (lra_reg_info[regno].conflict_hard_regs, + src_regno)) + { + ignore_regno = last_regno = -1; + break; + } + src_regno++; + } + } + IOR_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs, hard_regs_live); + /* If IGNORE_REG_FOR_CONFLICTS did not already conflict with REGNO, make + sure it still doesn't. */ + for (; ignore_regno < last_regno; ignore_regno++) + CLEAR_HARD_REG_BIT (lra_reg_info[regno].conflict_hard_regs, ignore_regno); + if (complete_info_p || lra_get_regno_hard_regno (regno) < 0) { p = lra_reg_info[regno].live_ranges; @@ -350,7 +378,7 @@ mark_regno_dead (int regno, machine_mode if (regno < FIRST_PSEUDO_REGISTER) { for (last = end_hard_regno (mode, regno); regno < last; regno++) - make_hard_regno_dead (regno, false); + make_hard_regno_dead (regno); } else { @@ -747,6 +775,7 @@ process_bb_lives (basic_block bb, int &c } call_p = CALL_P (curr_insn); + ignore_reg_for_conflicts = copy_insn_p (curr_insn); src_regno = (set != NULL_RTX && REG_P (SET_SRC (set)) ? REGNO (SET_SRC (set)) : -1); dst_regno = (set != NULL_RTX && REG_P (SET_DEST (set)) @@ -858,14 +887,13 @@ process_bb_lives (basic_block bb, int &c for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next) if (reg->type == OP_OUT && ! reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p) - make_hard_regno_dead (reg->regno, false); + make_hard_regno_dead (reg->regno); if (curr_id->arg_hard_regs != NULL) for (i = 0; (regno = curr_id->arg_hard_regs[i]) >= 0; i++) if (regno >= FIRST_PSEUDO_REGISTER) - /* It is a clobber. Don't create conflict of used - REAL_PIC_OFFSET_TABLE_REGNUM and the pic pseudo. */ - make_hard_regno_dead (regno - FIRST_PSEUDO_REGISTER, true); + /* It is a clobber. */ + make_hard_regno_dead (regno - FIRST_PSEUDO_REGISTER); if (call_p) { @@ -925,8 +953,7 @@ process_bb_lives (basic_block bb, int &c make_hard_regno_live (reg->regno); if (curr_id->arg_hard_regs != NULL) - /* Make argument hard registers live. Don't create conflict - of used REAL_PIC_OFFSET_TABLE_REGNUM and the pic pseudo. */ + /* Make argument hard registers live. */ for (i = 0; (regno = curr_id->arg_hard_regs[i]) >= 0; i++) if (regno < FIRST_PSEUDO_REGISTER) make_hard_regno_live (regno); @@ -954,7 +981,7 @@ process_bb_lives (basic_block bb, int &c if (reg2->type != OP_OUT && reg2->regno == reg->regno) break; if (reg2 == NULL) - make_hard_regno_dead (reg->regno, false); + make_hard_regno_dead (reg->regno); } if (need_curr_point_incr) @@ -989,6 +1016,7 @@ process_bb_lives (basic_block bb, int &c EXECUTE_IF_SET_IN_SPARSESET (unused_set, j) add_reg_note (curr_insn, REG_UNUSED, regno_reg_rtx[j]); } + ignore_reg_for_conflicts = NULL_RTX; if (bb_has_eh_pred (bb)) for (j = 0; ; ++j) Index: gcc/testsuite/gcc.target/powerpc/pr86939.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/pr86939.c (nonexistent) +++ gcc/testsuite/gcc.target/powerpc/pr86939.c (working copy) @@ -0,0 +1,12 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-options "-O2" } */ + +typedef long (*fptr_t) (void); +long +func (fptr_t *p) +{ + if (p) + return (*p) (); + return 0; +} +/* { dg-final { scan-assembler-not {mr %?r?12,} } } */ Index: gcc/testsuite/gcc.target/i386/pr49095.c =================================================================== --- gcc/testsuite/gcc.target/i386/pr49095.c (revision 264793) +++ gcc/testsuite/gcc.target/i386/pr49095.c (working copy) @@ -73,4 +73,5 @@ G (long) /* { dg-final { scan-assembler-not "test\[lq\]" } } */ /* The {f,h}{char,short,int,long}xor functions aren't optimized into a RMW instruction, so need load, modify and store. FIXME eventually. */ -/* { dg-final { scan-assembler-times "\\), %" 8 } } */ +/* { dg-final { scan-assembler-times "\\), %" 57 { target { ia32 } } } } */ +/* { dg-final { scan-assembler-times "\\), %" 45 { target { ! ia32 } } } } */