Message ID | D2095106-2E30-440C-888E-95EE0ED51D7E@sandoe.co.uk |
---|---|
State | New |
Headers | show |
Series | [RFC,Darwin,PPC] Fix PR 65342. | expand |
Hi! On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote: > (this is a bug reported against Fortran, but actually is a generic insn > selection problem for m64 powerpc-darwin. In fact, I’d say it renders > the 64b multilib unusable). > > The solution proposed is Darwin-local (it's a long-standing bug and it > would be intended to back-port it). However, I'd welcome views on it. > > The current Darwin load/store lo_sum patterns have neither predicate nor > constraint. This means that most parts of the backend, which rely on > recog() to validate the rtx, can produce invalid combinations/selections. > > For 32bit cases this isn't a problem since we can load/store to unaligned > addresses using D-mode insns. Can you? -m32 -mpowerpc64? We did have a bug with this before, maybe six years ago or so... Alan, do you remember? It required some assembler work IIRC. > Conversely, for 64bit instructions that use DS mode, this can manifest as > assemble errors (for an assembler that checks the LO14 relocations), or as > crashes caused by wrong offsets (or worse, wrong content for the two LSBs). Resulting in a different insn than intended. Yeah. > Trying to find the right place to fix this has been tricky, there's > discussion in the PR trail. There doesn't seem to be any way to deal with > this in legitimize_address (since most of the damage is done by the wide open > patterns that are in effect after the legitimizer has run). Likewise, > extending the checks in legitimate_address_p() and mode_dependent_address > doesn't help if the constraint is not accurate in the end. > > What we want to check for "Y" on Darwin is: > - that the alignment of the Symbols' target is sufficient for DS mode > - that the offset is suitable for DS mode. > (while looking through the Mach-O PIC unspecs). > > Proposed: > > 1) Drop the Darwin-specific lo_sum patterns. Actually, this produces an > immediate progression in quite a number of tests, we begin using the > movdi_internal64 patterns. However there are also regressions caused by > instructions unable to satisfy their constraints. And code quality regressions? Or does it even improve? (One can dream...) > 2) To resolve this we need to extend the handling of the mem_operand_gpr to > allow looking through Mach-O PIC UNSPECs in the lo_sum cases. > > - note, that rs6000_offsettable_memref_p () will not handle these so that > would return early, producing the issue with unsatisfiable constraints. > > - I do wonder if that's also the case for some non-Darwin lo_sum cases. Not sure what exactly you mean here? Non-Darwin doesn't have issues with not looking through Darwin-specific unspecs, anywhere, so you mean something more general no doubt -- but what? > (some things might be hard to detect, since the code will generally fall > back to doing " la Rx xxx@l ; ld Ry 0(Rx)" so it won't fail - just be > less efficient than it could be). Oh, you are saying mem_operand_gpr does not allow lo_sum *at all*? > + /* If we don't know the alignment of the thing to which the symbol refers, > + we assume optimistically it is "enough". > + ??? maybe we should be pessimistic instead. */ > + unsigned align = 0; If you guess it is aligned enough but it isn't, the compile will fail. Not good. OTOH, when don't we know the alignment? Only for globals? Does the ABI guarantee alignment in that case? I'll have another looke through this (esp. the generic part) when I'm fresh awake (but not before coffee!). Alan, can you have a look as well please? Segher
Hi, Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote: >> (this is a bug reported against Fortran, but actually is a generic insn >> selection problem for m64 powerpc-darwin. In fact, I’d say it renders >> the 64b multilib unusable). >> >> The solution proposed is Darwin-local (it's a long-standing bug and it >> would be intended to back-port it). However, I'd welcome views on it. >> >> The current Darwin load/store lo_sum patterns have neither predicate nor >> constraint. This means that most parts of the backend, which rely on >> recog() to validate the rtx, can produce invalid combinations/selections. >> >> For 32bit cases this isn't a problem since we can load/store to unaligned >> addresses using D-mode insns. > > Can you? -m32 -mpowerpc64? In principle, this ought to be supported (there’s a pattern for the specific case in darwin.md) however, I don’t think it’s tested and there seem to be a number of places where the conditional is TARGET_64BIT instead of TARGET_POWERPC64 so my guess is that there would be some work to do to make it happen, > We did have a bug with this before, maybe > six years ago or so... Alan, do you remember? It required some assembler > work IIRC. > >> Conversely, for 64bit instructions that use DS mode, this can manifest as >> assemble errors (for an assembler that checks the LO14 relocations), or as >> crashes caused by wrong offsets (or worse, wrong content for the two >> LSBs). > > Resulting in a different insn than intended. Yeah. > >> Trying to find the right place to fix this has been tricky, there's >> discussion in the PR trail. There doesn't seem to be any way to deal with >> this in legitimize_address (since most of the damage is done by the wide >> open >> patterns that are in effect after the legitimizer has run). Likewise, >> extending the checks in legitimate_address_p() and mode_dependent_address >> doesn't help if the constraint is not accurate in the end. >> >> What we want to check for "Y" on Darwin is: >> - that the alignment of the Symbols' target is sufficient for DS mode >> - that the offset is suitable for DS mode. >> (while looking through the Mach-O PIC unspecs). >> >> Proposed: >> >> 1) Drop the Darwin-specific lo_sum patterns. Actually, this produces an >> immediate progression in quite a number of tests, we begin using the >> movdi_internal64 patterns. However there are also regressions caused by >> instructions unable to satisfy their constraints. > > And code quality regressions? Or does it even improve? (One can dream…) The code quality for the testcases I made for this is good when we do the part below - because, unless we cater for the indirections, we generate quite a few unnecessary loads for the PIC case. >> 2) To resolve this we need to extend the handling of the >> mem_operand_gpr to >> allow looking through Mach-O PIC UNSPECs in the lo_sum cases. >> >> - note, that rs6000_offsettable_memref_p () will not handle these so that >> would return early, producing the issue with unsatisfiable constraints. >> >> - I do wonder if that's also the case for some non-Darwin lo_sum cases. > > Not sure what exactly you mean here? Non-Darwin doesn't have issues with > not looking through Darwin-specific unspecs, anywhere, so you mean > something more general no doubt -- but what? LO_SUM appears to be handled in the case that it refers to a constant pool address, but I wonder if there are any other circumstances that it could be relevant. >> (some things might be hard to detect, since the code will generally fall >> back to doing " la Rx xxx@l ; ld Ry 0(Rx)" so it won't fail - just be >> less efficient than it could be). > > Oh, you are saying mem_operand_gpr does not allow lo_sum *at all*? No, I think it will let LO_SUM through for constant pool entries, I’m not sure about any other case tho (perhaps there is no other case that matters). The absence of something is much harder to reason about / test for than the presence of something - but I am sure that you and Alan know what you expect to be there. >> + /* If we don't know the alignment of the thing to which the symbol >> refers, >> + we assume optimistically it is "enough". >> + ??? maybe we should be pessimistic instead. */ >> + unsigned align = 0; > > If you guess it is aligned enough but it isn't, the compile will fail. Not > good. OTOH, when don't we know the alignment? Only for globals? Does the > ABI guarantee alignment in that case? We usually know the alignment for globals too - I’m not sure under what circumstance there is no decl attached to the symbol_ref There is code rs6000.c:7627 or so, that suggests that section anchors can produce the effect, so that’s something I can look at once the basic problem is solved. > I'll have another looke through this (esp. the generic part) when I'm fresh > awake (but not before coffee!). Alan, can you have a look as well please? thanks! Iain
On Sat, Oct 12, 2019 at 05:39:51PM -0500, Segher Boessenkool wrote: > On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote: > > For 32bit cases this isn't a problem since we can load/store to unaligned > > addresses using D-mode insns. > > Can you? -m32 -mpowerpc64? We did have a bug with this before, maybe > six years ago or so... Alan, do you remember? It required some assembler > work IIRC. Yes, the ppc32 ABI doesn't have the relocs to support DS fields. Rather than defining a whole series of _DS (and _DQ!) relocs, the linker inspects the instruction being relocated and complains if the relocation would modify opcode bits. See is_insn_ds_form in bfd/elf32-ppc.c. We do the same on ppc64 for DQ field insns. > I'll have another looke through this (esp. the generic part) when I'm fresh > awake (but not before coffee!). Alan, can you have a look as well please? It looks reasonable to me.
Thanks for the reviews, Segher, Alan, Alan Modra <amodra@gmail.com> wrote: > On Sat, Oct 12, 2019 at 05:39:51PM -0500, Segher Boessenkool wrote: >> On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote: >>> For 32bit cases this isn't a problem since we can load/store to unaligned >>> addresses using D-mode insns. >> >> Can you? -m32 -mpowerpc64? We did have a bug with this before, maybe >> six years ago or so... Alan, do you remember? It required some assembler >> work IIRC. > > Yes, the ppc32 ABI doesn't have the relocs to support DS fields. > Rather than defining a whole series of _DS (and _DQ!) relocs, the > linker inspects the instruction being relocated and complains if the > relocation would modify opcode bits. See is_insn_ds_form in > bfd/elf32-ppc.c. We do the same on ppc64 for DQ field insns. Ah, that makes a lot of sense - and ld64 also makes this check (for the same underlying reason, I am sure - we are short of reloc space anyway in Mach-O). >> I'll have another looke through this (esp. the generic part) when I'm fresh >> awake (but not before coffee!). Alan, can you have a look as well please? > > It looks reasonable to me. So, OK for trunk? (and backports after some bake time)? thanks Iain
Okay for trunk. For backports maybe wait a bit longer than usual? So ask again in two weeks, maybe? I know it's important for the darwin port, but the generic part is a little scary. On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote: > 2) To resolve this we need to extend the handling of the mem_operand_gpr to > allow looking through Mach-O PIC UNSPECs in the lo_sum cases. > > - note, that rs6000_offsettable_memref_p () will not handle these so that > would return early, producing the issue with unsatisfiable constraints. > > - I do wonder if that's also the case for some non-Darwin lo_sum cases. > > (some things might be hard to detect, since the code will generally fall > back to doing " la Rx xxx@l ; ld Ry 0(Rx)" so it won't fail - just be > less efficient than it could be). I'm putting this on the Big List of things I may some day have time to look at ;-) > * config/rs6000/darwin.md (movdi_low, movsi_low_st): Delete Full stop. > + /* We only care if the access(es) would cause a change to the high part. */ > + offset = ((offset & 0xffff) ^ 0x8000) - 0x8000; > + return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra); So this works because the "extra" part only is relevant for positive offsets. Okay. Tricky. > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -6894,13 +6894,6 @@ > ;; do the load 16-bits at a time. We could do this by loading from memory, > ;; and this is even supposed to be faster, but it is simpler not to get > ;; integers in the TOC. > -(define_insn "movsi_low" Should the preceding comment be moved elsewhere / changed / deleted? Segher
On Thu, Oct 17, 2019 at 01:46:41PM +1030, Alan Modra wrote: > On Sat, Oct 12, 2019 at 05:39:51PM -0500, Segher Boessenkool wrote: > > On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote: > > > For 32bit cases this isn't a problem since we can load/store to unaligned > > > addresses using D-mode insns. > > > > Can you? -m32 -mpowerpc64? We did have a bug with this before, maybe > > six years ago or so... Alan, do you remember? It required some assembler > > work IIRC. > > Yes, the ppc32 ABI doesn't have the relocs to support DS fields. > Rather than defining a whole series of _DS (and _DQ!) relocs, the > linker inspects the instruction being relocated and complains if the > relocation would modify opcode bits. See is_insn_ds_form in > bfd/elf32-ppc.c. We do the same on ppc64 for DQ field insns. Ah right, that was it. So it uses the D reloc but with DS or DQ restrictions. Gotcha. For the compiler this is just as if those DS and DQ relocs *do* exist. > > I'll have another looke through this (esp. the generic part) when I'm fresh > > awake (but not before coffee!). Alan, can you have a look as well please? > > It looks reasonable to me. Thanks Alan! Segher
Segher Boessenkool <segher@kernel.crashing.org> wrote: > Okay for trunk. For backports maybe wait a bit longer than usual? So ask > again in two weeks, maybe? I know it's important for the darwin port, but > the generic part is a little scary. No problem (I would like to get this in to the final issue of 7, if possible). FWIW, this implementation is completely guarded on TARGET_MACHO. I made some comments about “maybe the generic code would care about similar things” because when debugging and tracking through stuff, as you note below, it wasn’t obvious. > On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote: >> 2) To resolve this we need to extend the handling of the mem_operand_gpr to >> allow looking through Mach-O PIC UNSPECs in the lo_sum cases. >> >> - note, that rs6000_offsettable_memref_p () will not handle these so that >> would return early, producing the issue with unsatisfiable constraints. >> >> - I do wonder if that's also the case for some non-Darwin lo_sum cases. >> >> (some things might be hard to detect, since the code will generally fall >> back to doing " la Rx xxx@l ; ld Ry 0(Rx)" so it won't fail - just be >> less efficient than it could be). > > I'm putting this on the Big List of things I may some day have time to > look at ;-) > >> * config/rs6000/darwin.md (movdi_low, movsi_low_st): Delete > > Full stop. > >> + /* We only care if the access(es) would cause a change to the high part. */ >> + offset = ((offset & 0xffff) ^ 0x8000) - 0x8000; >> + return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra); > > So this works because the "extra" part only is relevant for positive > offsets. Okay. Tricky. > >> --- a/gcc/config/rs6000/rs6000.md >> +++ b/gcc/config/rs6000/rs6000.md >> @@ -6894,13 +6894,6 @@ >> ;; do the load 16-bits at a time. We could do this by loading from memory, >> ;; and this is even supposed to be faster, but it is simpler not to get >> ;; integers in the TOC. >> -(define_insn "movsi_low" > > Should the preceding comment be moved elsewhere / changed / deleted? It seemed to be a comment about the following code - or something that should have been deleted long ago - it mentions the TOC, which Darwin does not use so not a Darwin-related thing. Happy to do a separate patch to delete it that’s desired. cheers Iain
On Thu, Oct 17, 2019 at 10:37:33AM +0100, Iain Sandoe wrote: > Segher Boessenkool <segher@kernel.crashing.org> wrote: > > > Okay for trunk. For backports maybe wait a bit longer than usual? So ask > > again in two weeks, maybe? I know it's important for the darwin port, but > > the generic part is a little scary. > > No problem (I would like to get this in to the final issue of 7, if possible). > > FWIW, this implementation is completely guarded on TARGET_MACHO. Yeah, I misread that, somehow I thought the big change was inside mem_operand_gpr. It isn't, and it is perfectly safe elsewhere, so no special care is needed here at all, for backports. > >> --- a/gcc/config/rs6000/rs6000.md > >> +++ b/gcc/config/rs6000/rs6000.md > >> @@ -6894,13 +6894,6 @@ > >> ;; do the load 16-bits at a time. We could do this by loading from memory, > >> ;; and this is even supposed to be faster, but it is simpler not to get > >> ;; integers in the TOC. > >> -(define_insn "movsi_low" > > > > Should the preceding comment be moved elsewhere / changed / deleted? > > It seemed to be a comment about the following code - or something that should > have been deleted long ago - it mentions the TOC, which Darwin does not use > so not a Darwin-related thing. > > Happy to do a separate patch to delete it that’s desired. I think the comment is more confusing than helpful, currently. So sure, removing it is pre-approved. Segher
diff --git a/gcc/config/rs6000/darwin.md b/gcc/config/rs6000/darwin.md index 3994447f38..16f710b28b 100644 --- a/gcc/config/rs6000/darwin.md +++ b/gcc/config/rs6000/darwin.md @@ -122,33 +122,6 @@ You should have received a copy of the GNU General Public License [(set_attr "type" "store")]) ;; 64-bit MachO load/store support -(define_insn "movdi_low" - [(set (match_operand:DI 0 "gpc_reg_operand" "=r,*!d") - (mem:DI (lo_sum:DI (match_operand:DI 1 "gpc_reg_operand" "b,b") - (match_operand 2 "" ""))))] - "TARGET_MACHO && TARGET_64BIT" - "@ - ld %0,lo16(%2)(%1) - lfd %0,lo16(%2)(%1)" - [(set_attr "type" "load")]) - -(define_insn "movsi_low_st" - [(set (mem:SI (lo_sum:SI (match_operand:SI 1 "gpc_reg_operand" "b") - (match_operand 2 "" ""))) - (match_operand:SI 0 "gpc_reg_operand" "r"))] - "TARGET_MACHO && ! TARGET_64BIT" - "stw %0,lo16(%2)(%1)" - [(set_attr "type" "store")]) - -(define_insn "movdi_low_st" - [(set (mem:DI (lo_sum:DI (match_operand:DI 1 "gpc_reg_operand" "b,b") - (match_operand 2 "" ""))) - (match_operand:DI 0 "gpc_reg_operand" "r,*!d"))] - "TARGET_MACHO && TARGET_64BIT" - "@ - std %0,lo16(%2)(%1) - stfd %0,lo16(%2)(%1)" - [(set_attr "type" "store")]) ;; Mach-O PIC. diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index d1434a9f74..bc22573174 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -7329,6 +7329,103 @@ address_offset (rtx op) return NULL_RTX; } +/* This tests that a lo_sum {constant, symbol, symbol+offset} is valid for + the mode. If we can't find (or don't know) the alignment of the symbol + we assume (optimistically) that it's sufficiently aligned [??? maybe we + should be pessimistic]. Offsets are validated in the same way as for + reg + offset. */ +static bool +darwin_rs6000_legitimate_lo_sum_const_p (rtx x, machine_mode mode) +{ + /* We should not get here with this. */ + gcc_checking_assert (! mode_supports_dq_form (mode)); + + if (GET_CODE (x) == CONST) + x = XEXP (x, 0); + + if (GET_CODE (x) == UNSPEC && XINT (x, 1) == UNSPEC_MACHOPIC_OFFSET) + x = XVECEXP (x, 0, 0); + + rtx sym = NULL_RTX; + unsigned HOST_WIDE_INT offset = 0; + + if (GET_CODE (x) == PLUS) + { + sym = XEXP (x, 0); + if (! SYMBOL_REF_P (sym)) + return false; + if (!CONST_INT_P (XEXP (x, 1))) + return false; + offset = INTVAL (XEXP (x, 1)); + } + else if (SYMBOL_REF_P (x)) + sym = x; + else if (CONST_INT_P (x)) + offset = INTVAL (x); + else if (GET_CODE (x) == LABEL_REF) + offset = 0; // We assume code labels are suitably aligned + else + return false; // not sure what we have here. + + /* If we don't know the alignment of the thing to which the symbol refers, + we assume optimistically it is "enough". + ??? maybe we should be pessimistic instead. */ + unsigned align = 0; + + if (sym) + { + tree decl = SYMBOL_REF_DECL (sym); +#if TARGET_MACHO + if (MACHO_SYMBOL_INDIRECTION_P (sym)) + /* The decl in an indirection symbol is the original one, which might + be less aligned than the indirection. Our indirections are always + pointer-aligned. */ + ; + else +#endif + if (decl && DECL_ALIGN (decl)) + align = DECL_ALIGN_UNIT (decl); + } + + unsigned int extra = 0; + switch (mode) + { + case E_DFmode: + case E_DDmode: + case E_DImode: + /* If we are using VSX scalar loads, restrict ourselves to reg+reg + addressing. */ + if (VECTOR_MEM_VSX_P (mode)) + return false; + + if (!TARGET_POWERPC64) + extra = 4; + else if ((offset & 3) || (align & 3)) + return false; + break; + + case E_TFmode: + case E_IFmode: + case E_KFmode: + case E_TDmode: + case E_TImode: + case E_PTImode: + extra = 8; + if (!TARGET_POWERPC64) + extra = 12; + else if ((offset & 3) || (align & 3)) + return false; + break; + + default: + break; + } + + /* We only care if the access(es) would cause a change to the high part. */ + offset = ((offset & 0xffff) ^ 0x8000) - 0x8000; + return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra); +} + /* Return true if the MEM operand is a memory operand suitable for use with a (full width, possibly multiple) gpr load/store. On powerpc64 this means the offset must be divisible by 4. @@ -7366,7 +7463,13 @@ mem_operand_gpr (rtx op, machine_mode mode) if (address_is_prefixed (addr, mode, NON_PREFIXED_DS)) return true; - /* Don't allow non-offsettable addresses. See PRs 83969 and 84279. */ + /* We need to look through Mach-O PIC unspecs to determine if a lo_sum is + really OK. Doing this early avoids teaching all the other machinery + about them. */ + if (TARGET_MACHO && GET_CODE (addr) == LO_SUM) + return darwin_rs6000_legitimate_lo_sum_const_p (XEXP (addr, 1), mode); + + /* Only allow offsettable addresses. See PRs 83969 and 84279. */ if (!rs6000_offsettable_memref_p (op, mode, false)) return false; diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 2dca269744..29dd616520 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -6894,13 +6894,6 @@ ;; do the load 16-bits at a time. We could do this by loading from memory, ;; and this is even supposed to be faster, but it is simpler not to get ;; integers in the TOC. -(define_insn "movsi_low" - [(set (match_operand:SI 0 "gpc_reg_operand" "=r") - (mem:SI (lo_sum:SI (match_operand:SI 1 "gpc_reg_operand" "b") - (match_operand 2 "" ""))))] - "TARGET_MACHO && ! TARGET_64BIT" - "lwz %0,lo16(%2)(%1)" - [(set_attr "type" "load")]) ;; MR LA LWZ LFIWZX LXSIWZX ;; STW STFIWX STXSIWX LI LIS