diff mbox series

[RFC,Darwin,PPC] Fix PR 65342.

Message ID D2095106-2E30-440C-888E-95EE0ED51D7E@sandoe.co.uk
State New
Headers show
Series [RFC,Darwin,PPC] Fix PR 65342. | expand

Commit Message

Iain Sandoe Oct. 12, 2019, 9:13 p.m. UTC
Hi Folks,

(this is a bug reported against Fortran, but actually is a generic insn
 selection problem for m64 powerpc-darwin.  In fact, I’d say it renders
 the 64b multilib unusable).

The solution proposed is Darwin-local (it's a long-standing bug and it
would be intended to back-port it).  However, I'd welcome views on it.

The current Darwin load/store lo_sum patterns have neither predicate nor
constraint.  This means that most parts of the backend, which rely on
recog() to validate the rtx, can produce invalid combinations/selections.

For 32bit cases this isn't a problem since we can load/store to unaligned
addresses using D-mode insns.

Conversely, for 64bit instructions that use DS mode, this can manifest as
assemble errors (for an assembler that checks the LO14 relocations), or as
crashes caused by wrong offsets (or worse, wrong content for the two LSBs).

Trying to find the right place to fix this has been tricky, there's
discussion in the PR trail.  There doesn't seem to be any way to deal with
this in legitimize_address (since most of the damage is done by the wide open
patterns that are in effect after the legitimizer has run).  Likewise,
extending the checks in legitimate_address_p() and mode_dependent_address
doesn't help if the constraint is not accurate in the end.

What we want to check for "Y" on Darwin is:
  - that the alignment of the Symbols' target is sufficient for DS mode
  - that the offset is suitable for DS mode.
(while looking through the Mach-O PIC unspecs).

Proposed:

1) Drop the Darwin-specific lo_sum patterns.  Actually, this produces an
immediate progression in quite a number of tests, we begin using the
movdi_internal64 patterns.  However there are also regressions caused by
instructions unable to satisfy their constraints.

2) To resolve this we need to extend the handling of the  mem_operand_gpr to
allow looking through Mach-O PIC UNSPECs in the lo_sum cases.

 - note, that rs6000_offsettable_memref_p () will not handle these so that
   would return early, producing the issue with unsatisfiable constraints.

  - I do wonder if that's also the case for some non-Darwin lo_sum cases.

(some things might be hard to detect, since the code will generally fall
 back to doing " la  Rx xxx@l ; ld Ry 0(Rx)" so it won't fail - just be
 less efficient than it could be).

I’ve tested this on powerpc-darwin9, and powerpc64-linux-gnu.
It progresses around 120 tests across the testsuite (many in Fortran
but also in C and C++)

thoughts?
Iain

P.S. Probably the other crufty old lo_sum patterns can go too - but for another
day.

gcc/ChangeLog:

2019-10-12  Iain Sandoe  <iain@sandoe.co.uk>

	* config/rs6000/darwin.md (movdi_low, movsi_low_st): Delete
	(movdi_low_st): Delete.
	* config/rs6000/rs6000.c
	(darwin_rs6000_legitimate_lo_sum_const_p): New.
	(mem_operand_gpr): Validate Mach-O LO_SUM cases separately.
	* config/rs6000/rs6000.md (movsi_low): Delete.

Comments

Segher Boessenkool Oct. 12, 2019, 10:39 p.m. UTC | #1
Hi!

On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote:
> (this is a bug reported against Fortran, but actually is a generic insn
>  selection problem for m64 powerpc-darwin.  In fact, I’d say it renders
>  the 64b multilib unusable).
> 
> The solution proposed is Darwin-local (it's a long-standing bug and it
> would be intended to back-port it).  However, I'd welcome views on it.
> 
> The current Darwin load/store lo_sum patterns have neither predicate nor
> constraint.  This means that most parts of the backend, which rely on
> recog() to validate the rtx, can produce invalid combinations/selections.
> 
> For 32bit cases this isn't a problem since we can load/store to unaligned
> addresses using D-mode insns.

Can you?  -m32 -mpowerpc64?  We did have a bug with this before, maybe
six years ago or so...  Alan, do you remember?  It required some assembler
work IIRC.

> Conversely, for 64bit instructions that use DS mode, this can manifest as
> assemble errors (for an assembler that checks the LO14 relocations), or as
> crashes caused by wrong offsets (or worse, wrong content for the two LSBs).

Resulting in a different insn than intended.  Yeah.

> Trying to find the right place to fix this has been tricky, there's
> discussion in the PR trail.  There doesn't seem to be any way to deal with
> this in legitimize_address (since most of the damage is done by the wide open
> patterns that are in effect after the legitimizer has run).  Likewise,
> extending the checks in legitimate_address_p() and mode_dependent_address
> doesn't help if the constraint is not accurate in the end.
> 
> What we want to check for "Y" on Darwin is:
>   - that the alignment of the Symbols' target is sufficient for DS mode
>   - that the offset is suitable for DS mode.
> (while looking through the Mach-O PIC unspecs).
> 
> Proposed:
> 
> 1) Drop the Darwin-specific lo_sum patterns.  Actually, this produces an
> immediate progression in quite a number of tests, we begin using the
> movdi_internal64 patterns.  However there are also regressions caused by
> instructions unable to satisfy their constraints.

And code quality regressions?  Or does it even improve?  (One can dream...)

> 2) To resolve this we need to extend the handling of the  mem_operand_gpr to
> allow looking through Mach-O PIC UNSPECs in the lo_sum cases.
> 
>  - note, that rs6000_offsettable_memref_p () will not handle these so that
>    would return early, producing the issue with unsatisfiable constraints.
> 
>   - I do wonder if that's also the case for some non-Darwin lo_sum cases.

Not sure what exactly you mean here?  Non-Darwin doesn't have issues with
not looking through Darwin-specific unspecs, anywhere, so you mean
something more general no doubt -- but what?

> (some things might be hard to detect, since the code will generally fall
>  back to doing " la  Rx xxx@l ; ld Ry 0(Rx)" so it won't fail - just be
>  less efficient than it could be).

Oh, you are saying mem_operand_gpr does not allow lo_sum *at all*?


> +  /* If we don't know the alignment of the thing to which the symbol refers,
> +     we assume optimistically it is "enough".
> +     ??? maybe we should be pessimistic instead.  */
> +  unsigned align = 0;

If you guess it is aligned enough but it isn't, the compile will fail.  Not
good.  OTOH, when don't we know the alignment?  Only for globals?  Does the
ABI guarantee alignment in that case?


I'll have another looke through this (esp. the generic part) when I'm fresh
awake (but not before coffee!).  Alan, can you have a look as well please?


Segher
Iain Sandoe Oct. 12, 2019, 11:17 p.m. UTC | #2
Hi,

Segher Boessenkool <segher@kernel.crashing.org> wrote:
>
> On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote:
>> (this is a bug reported against Fortran, but actually is a generic insn
>> selection problem for m64 powerpc-darwin.  In fact, I’d say it renders
>> the 64b multilib unusable).
>>
>> The solution proposed is Darwin-local (it's a long-standing bug and it
>> would be intended to back-port it).  However, I'd welcome views on it.
>>
>> The current Darwin load/store lo_sum patterns have neither predicate nor
>> constraint.  This means that most parts of the backend, which rely on
>> recog() to validate the rtx, can produce invalid combinations/selections.
>>
>> For 32bit cases this isn't a problem since we can load/store to unaligned
>> addresses using D-mode insns.
>
> Can you?  -m32 -mpowerpc64?

In principle, this ought to be supported (there’s a pattern for the  
specific case
in darwin.md) however, I don’t think it’s tested and there seem to be a  
number
of places where the conditional is TARGET_64BIT instead of
TARGET_POWERPC64
so my guess is that there would be some work to do to make it happen,

> We did have a bug with this before, maybe
> six years ago or so...  Alan, do you remember?  It required some assembler
> work IIRC.
>
>> Conversely, for 64bit instructions that use DS mode, this can manifest as
>> assemble errors (for an assembler that checks the LO14 relocations), or as
>> crashes caused by wrong offsets (or worse, wrong content for the two  
>> LSBs).
>
> Resulting in a different insn than intended.  Yeah.
>
>> Trying to find the right place to fix this has been tricky, there's
>> discussion in the PR trail.  There doesn't seem to be any way to deal with
>> this in legitimize_address (since most of the damage is done by the wide  
>> open
>> patterns that are in effect after the legitimizer has run).  Likewise,
>> extending the checks in legitimate_address_p() and mode_dependent_address
>> doesn't help if the constraint is not accurate in the end.
>>
>> What we want to check for "Y" on Darwin is:
>>  - that the alignment of the Symbols' target is sufficient for DS mode
>>  - that the offset is suitable for DS mode.
>> (while looking through the Mach-O PIC unspecs).
>>
>> Proposed:
>>
>> 1) Drop the Darwin-specific lo_sum patterns.  Actually, this produces an
>> immediate progression in quite a number of tests, we begin using the
>> movdi_internal64 patterns.  However there are also regressions caused by
>> instructions unable to satisfy their constraints.
>
> And code quality regressions?  Or does it even improve?  (One can dream…)

The code quality for the testcases I made for this is good when we do the  
part
below - because, unless we cater for the indirections, we generate quite a  
few
unnecessary loads for the PIC case.

>> 2) To resolve this we need to extend the handling of the   
>> mem_operand_gpr to
>> allow looking through Mach-O PIC UNSPECs in the lo_sum cases.
>>
>> - note, that rs6000_offsettable_memref_p () will not handle these so that
>>   would return early, producing the issue with unsatisfiable constraints.
>>
>>  - I do wonder if that's also the case for some non-Darwin lo_sum cases.
>
> Not sure what exactly you mean here?  Non-Darwin doesn't have issues with
> not looking through Darwin-specific unspecs, anywhere, so you mean
> something more general no doubt -- but what?

LO_SUM appears to be handled in the case that it refers to a constant pool
address, but I wonder if there are any other circumstances that it could be
relevant.

>> (some things might be hard to detect, since the code will generally fall
>> back to doing " la  Rx xxx@l ; ld Ry 0(Rx)" so it won't fail - just be
>> less efficient than it could be).
>
> Oh, you are saying mem_operand_gpr does not allow lo_sum *at all*?

No, I think it will let LO_SUM through for constant pool entries, I’m not  
sure
about any other case tho (perhaps there is no other case that matters).

The absence of something is much harder to reason about / test for than the
presence of something - but I am sure that you and Alan know what you
expect to be there.

>> +  /* If we don't know the alignment of the thing to which the symbol  
>> refers,
>> +     we assume optimistically it is "enough".
>> +     ??? maybe we should be pessimistic instead.  */
>> +  unsigned align = 0;
>
> If you guess it is aligned enough but it isn't, the compile will fail.  Not
> good.  OTOH, when don't we know the alignment?  Only for globals?  Does the
> ABI guarantee alignment in that case?

We usually know the alignment for globals too - I’m not sure under what
circumstance there is no decl attached to the symbol_ref
There is code rs6000.c:7627 or so, that suggests that section anchors can
produce the effect, so that’s something I can look at once the basic problem
is solved.

> I'll have another looke through this (esp. the generic part) when I'm fresh
> awake (but not before coffee!).  Alan, can you have a look as well please?
thanks!
Iain
Alan Modra Oct. 17, 2019, 3:16 a.m. UTC | #3
On Sat, Oct 12, 2019 at 05:39:51PM -0500, Segher Boessenkool wrote:
> On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote:
> > For 32bit cases this isn't a problem since we can load/store to unaligned
> > addresses using D-mode insns.
> 
> Can you?  -m32 -mpowerpc64?  We did have a bug with this before, maybe
> six years ago or so...  Alan, do you remember?  It required some assembler
> work IIRC.

Yes, the ppc32 ABI doesn't have the relocs to support DS fields.
Rather than defining a whole series of _DS (and _DQ!) relocs, the
linker inspects the instruction being relocated and complains if the
relocation would modify opcode bits.  See is_insn_ds_form in
bfd/elf32-ppc.c.  We do the same on ppc64 for DQ field insns.

> I'll have another looke through this (esp. the generic part) when I'm fresh
> awake (but not before coffee!).  Alan, can you have a look as well please?

It looks reasonable to me.
Iain Sandoe Oct. 17, 2019, 7:43 a.m. UTC | #4
Thanks for the reviews, Segher, Alan,

Alan Modra <amodra@gmail.com> wrote:

> On Sat, Oct 12, 2019 at 05:39:51PM -0500, Segher Boessenkool wrote:
>> On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote:
>>> For 32bit cases this isn't a problem since we can load/store to unaligned
>>> addresses using D-mode insns.
>> 
>> Can you?  -m32 -mpowerpc64?  We did have a bug with this before, maybe
>> six years ago or so...  Alan, do you remember?  It required some assembler
>> work IIRC.
> 
> Yes, the ppc32 ABI doesn't have the relocs to support DS fields.
> Rather than defining a whole series of _DS (and _DQ!) relocs, the
> linker inspects the instruction being relocated and complains if the
> relocation would modify opcode bits.  See is_insn_ds_form in
> bfd/elf32-ppc.c.  We do the same on ppc64 for DQ field insns.

Ah, that makes a lot of sense - and ld64 also makes this check (for the same
underlying reason, I am sure - we are short of reloc space anyway in Mach-O).

>> I'll have another looke through this (esp. the generic part) when I'm fresh
>> awake (but not before coffee!).  Alan, can you have a look as well please?
> 
> It looks reasonable to me.

So, OK for trunk?
(and backports after some bake time)?

thanks
Iain
Segher Boessenkool Oct. 17, 2019, 9:23 a.m. UTC | #5
Okay for trunk.  For backports maybe wait a bit longer than usual?  So ask
again in two weeks, maybe?  I know it's important for the darwin port, but
the generic part is a little scary.

On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote:
> 2) To resolve this we need to extend the handling of the  mem_operand_gpr to
> allow looking through Mach-O PIC UNSPECs in the lo_sum cases.
> 
>  - note, that rs6000_offsettable_memref_p () will not handle these so that
>    would return early, producing the issue with unsatisfiable constraints.
> 
>   - I do wonder if that's also the case for some non-Darwin lo_sum cases.
> 
> (some things might be hard to detect, since the code will generally fall
>  back to doing " la  Rx xxx@l ; ld Ry 0(Rx)" so it won't fail - just be
>  less efficient than it could be).

I'm putting this on the Big List of things I may some day have time to
look at ;-)

> 	* config/rs6000/darwin.md (movdi_low, movsi_low_st): Delete

Full stop.

> +  /* We only care if the access(es) would cause a change to the high part.  */
> +  offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
> +  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);

So this works because the "extra" part only is relevant for positive
offsets.  Okay.  Tricky.

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6894,13 +6894,6 @@
>  ;; do the load 16-bits at a time.  We could do this by loading from memory,
>  ;; and this is even supposed to be faster, but it is simpler not to get
>  ;; integers in the TOC.
> -(define_insn "movsi_low"

Should the preceding comment be moved elsewhere / changed / deleted?


Segher
Segher Boessenkool Oct. 17, 2019, 9:32 a.m. UTC | #6
On Thu, Oct 17, 2019 at 01:46:41PM +1030, Alan Modra wrote:
> On Sat, Oct 12, 2019 at 05:39:51PM -0500, Segher Boessenkool wrote:
> > On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote:
> > > For 32bit cases this isn't a problem since we can load/store to unaligned
> > > addresses using D-mode insns.
> > 
> > Can you?  -m32 -mpowerpc64?  We did have a bug with this before, maybe
> > six years ago or so...  Alan, do you remember?  It required some assembler
> > work IIRC.
> 
> Yes, the ppc32 ABI doesn't have the relocs to support DS fields.
> Rather than defining a whole series of _DS (and _DQ!) relocs, the
> linker inspects the instruction being relocated and complains if the
> relocation would modify opcode bits.  See is_insn_ds_form in
> bfd/elf32-ppc.c.  We do the same on ppc64 for DQ field insns.

Ah right, that was it.  So it uses the D reloc but with DS or DQ
restrictions.  Gotcha.  For the compiler this is just as if those DS and
DQ relocs *do* exist.

> > I'll have another looke through this (esp. the generic part) when I'm fresh
> > awake (but not before coffee!).  Alan, can you have a look as well please?
> 
> It looks reasonable to me.

Thanks Alan!


Segher
Iain Sandoe Oct. 17, 2019, 9:37 a.m. UTC | #7
Segher Boessenkool <segher@kernel.crashing.org> wrote:

> Okay for trunk.  For backports maybe wait a bit longer than usual?  So ask
> again in two weeks, maybe?  I know it's important for the darwin port, but
> the generic part is a little scary.

No problem (I would like to get this in to the final issue of 7, if possible).

FWIW, this implementation is completely guarded on TARGET_MACHO.

I made some comments about “maybe the generic code would care about 
similar things” because when debugging and tracking through stuff, as you
note below, it wasn’t obvious.

> On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote:
>> 2) To resolve this we need to extend the handling of the  mem_operand_gpr to
>> allow looking through Mach-O PIC UNSPECs in the lo_sum cases.
>> 
>> - note, that rs6000_offsettable_memref_p () will not handle these so that
>>   would return early, producing the issue with unsatisfiable constraints.
>> 
>>  - I do wonder if that's also the case for some non-Darwin lo_sum cases.
>> 
>> (some things might be hard to detect, since the code will generally fall
>> back to doing " la  Rx xxx@l ; ld Ry 0(Rx)" so it won't fail - just be
>> less efficient than it could be).
> 
> I'm putting this on the Big List of things I may some day have time to
> look at ;-)
> 
>> * config/rs6000/darwin.md (movdi_low, movsi_low_st): Delete
> 
> Full stop.
> 
>> +  /* We only care if the access(es) would cause a change to the high part.  */
>> +  offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
>> +  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
> 
> So this works because the "extra" part only is relevant for positive
> offsets.  Okay.  Tricky.
> 
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -6894,13 +6894,6 @@
>> ;; do the load 16-bits at a time.  We could do this by loading from memory,
>> ;; and this is even supposed to be faster, but it is simpler not to get
>> ;; integers in the TOC.
>> -(define_insn "movsi_low"
> 
> Should the preceding comment be moved elsewhere / changed / deleted?

It seemed to be a comment about the following code - or something that should
have been deleted long ago - it mentions the TOC, which Darwin does not use
so not a Darwin-related thing.

Happy to do a separate patch to delete it that’s desired.

cheers
Iain
Segher Boessenkool Oct. 17, 2019, 10:01 a.m. UTC | #8
On Thu, Oct 17, 2019 at 10:37:33AM +0100, Iain Sandoe wrote:
> Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> > Okay for trunk.  For backports maybe wait a bit longer than usual?  So ask
> > again in two weeks, maybe?  I know it's important for the darwin port, but
> > the generic part is a little scary.
> 
> No problem (I would like to get this in to the final issue of 7, if possible).
> 
> FWIW, this implementation is completely guarded on TARGET_MACHO.

Yeah, I misread that, somehow I thought the big change was inside
mem_operand_gpr.  It isn't, and it is perfectly safe elsewhere, so no
special care is needed here at all, for backports.

> >> --- a/gcc/config/rs6000/rs6000.md
> >> +++ b/gcc/config/rs6000/rs6000.md
> >> @@ -6894,13 +6894,6 @@
> >> ;; do the load 16-bits at a time.  We could do this by loading from memory,
> >> ;; and this is even supposed to be faster, but it is simpler not to get
> >> ;; integers in the TOC.
> >> -(define_insn "movsi_low"
> > 
> > Should the preceding comment be moved elsewhere / changed / deleted?
> 
> It seemed to be a comment about the following code - or something that should
> have been deleted long ago - it mentions the TOC, which Darwin does not use
> so not a Darwin-related thing.
> 
> Happy to do a separate patch to delete it that’s desired.

I think the comment is more confusing than helpful, currently.  So sure,
removing it is pre-approved.


Segher
diff mbox series

Patch

diff --git a/gcc/config/rs6000/darwin.md b/gcc/config/rs6000/darwin.md
index 3994447f38..16f710b28b 100644
--- a/gcc/config/rs6000/darwin.md
+++ b/gcc/config/rs6000/darwin.md
@@ -122,33 +122,6 @@  You should have received a copy of the GNU General Public License
   [(set_attr "type" "store")])
 
 ;; 64-bit MachO load/store support
-(define_insn "movdi_low"
-  [(set (match_operand:DI 0 "gpc_reg_operand" "=r,*!d")
-        (mem:DI (lo_sum:DI (match_operand:DI 1 "gpc_reg_operand" "b,b")
-                           (match_operand 2 "" ""))))]
-  "TARGET_MACHO && TARGET_64BIT"
-  "@
-   ld %0,lo16(%2)(%1)
-   lfd %0,lo16(%2)(%1)"
-  [(set_attr "type" "load")])
-
-(define_insn "movsi_low_st"
-  [(set (mem:SI (lo_sum:SI (match_operand:SI 1 "gpc_reg_operand" "b")
-                           (match_operand 2 "" "")))
-	(match_operand:SI 0 "gpc_reg_operand" "r"))]
-  "TARGET_MACHO && ! TARGET_64BIT"
-  "stw %0,lo16(%2)(%1)"
-  [(set_attr "type" "store")])
-
-(define_insn "movdi_low_st"
-  [(set (mem:DI (lo_sum:DI (match_operand:DI 1 "gpc_reg_operand" "b,b")
-                           (match_operand 2 "" "")))
-	(match_operand:DI 0 "gpc_reg_operand" "r,*!d"))]
-  "TARGET_MACHO && TARGET_64BIT"
-  "@
-   std %0,lo16(%2)(%1)
-   stfd %0,lo16(%2)(%1)"
-  [(set_attr "type" "store")])
 
 ;; Mach-O PIC.
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d1434a9f74..bc22573174 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -7329,6 +7329,103 @@  address_offset (rtx op)
   return NULL_RTX;
 }
 
+/* This tests that a lo_sum {constant, symbol, symbol+offset} is valid for
+   the mode.  If we can't find (or don't know) the alignment of the symbol
+   we assume (optimistically) that it's sufficiently aligned [??? maybe we
+   should be pessimistic].  Offsets are validated in the same way as for
+   reg + offset.  */
+static bool
+darwin_rs6000_legitimate_lo_sum_const_p (rtx x, machine_mode mode)
+{
+  /* We should not get here with this.  */
+  gcc_checking_assert (! mode_supports_dq_form (mode));
+
+  if (GET_CODE (x) == CONST)
+    x = XEXP (x, 0);
+
+  if (GET_CODE (x) == UNSPEC && XINT (x, 1) == UNSPEC_MACHOPIC_OFFSET)
+    x =  XVECEXP (x, 0, 0);
+
+  rtx sym = NULL_RTX;
+  unsigned HOST_WIDE_INT offset = 0;
+
+  if (GET_CODE (x) == PLUS)
+    {
+      sym = XEXP (x, 0);
+      if (! SYMBOL_REF_P (sym))
+	return false;
+      if (!CONST_INT_P (XEXP (x, 1)))
+	return false;
+      offset = INTVAL (XEXP (x, 1));
+    }
+  else if (SYMBOL_REF_P (x))
+    sym = x;
+  else if (CONST_INT_P (x))
+    offset = INTVAL (x);
+  else if (GET_CODE (x) == LABEL_REF)
+    offset = 0; // We assume code labels are suitably aligned
+  else
+    return false; // not sure what we have here.
+
+  /* If we don't know the alignment of the thing to which the symbol refers,
+     we assume optimistically it is "enough".
+     ??? maybe we should be pessimistic instead.  */
+  unsigned align = 0;
+
+  if (sym)
+    {
+      tree decl = SYMBOL_REF_DECL (sym);
+#if TARGET_MACHO
+      if (MACHO_SYMBOL_INDIRECTION_P (sym))
+      /* The decl in an indirection symbol is the original one, which might
+	 be less aligned than the indirection.  Our indirections are always
+	 pointer-aligned.  */
+	;
+      else
+#endif
+      if (decl && DECL_ALIGN (decl))
+	align = DECL_ALIGN_UNIT (decl);
+   }
+
+  unsigned int extra = 0;
+  switch (mode)
+    {
+    case E_DFmode:
+    case E_DDmode:
+    case E_DImode:
+      /* If we are using VSX scalar loads, restrict ourselves to reg+reg
+	 addressing.  */
+      if (VECTOR_MEM_VSX_P (mode))
+	return false;
+
+      if (!TARGET_POWERPC64)
+	extra = 4;
+      else if ((offset & 3) || (align & 3))
+	return false;
+      break;
+
+    case E_TFmode:
+    case E_IFmode:
+    case E_KFmode:
+    case E_TDmode:
+    case E_TImode:
+    case E_PTImode:
+      extra = 8;
+      if (!TARGET_POWERPC64)
+	extra = 12;
+      else if ((offset & 3) || (align & 3))
+	return false;
+      break;
+
+    default:
+      break;
+    }
+
+  /* We only care if the access(es) would cause a change to the high part.  */
+  offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
+  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
+}
+
 /* Return true if the MEM operand is a memory operand suitable for use
    with a (full width, possibly multiple) gpr load/store.  On
    powerpc64 this means the offset must be divisible by 4.
@@ -7366,7 +7463,13 @@  mem_operand_gpr (rtx op, machine_mode mode)
   if (address_is_prefixed (addr, mode, NON_PREFIXED_DS))
     return true;
 
-  /* Don't allow non-offsettable addresses.  See PRs 83969 and 84279.  */
+  /* We need to look through Mach-O PIC unspecs to determine if a lo_sum is
+     really OK.  Doing this early avoids teaching all the other machinery
+     about them.  */
+  if (TARGET_MACHO && GET_CODE (addr) == LO_SUM)
+    return darwin_rs6000_legitimate_lo_sum_const_p (XEXP (addr, 1), mode);
+
+  /* Only allow offsettable addresses.  See PRs 83969 and 84279.  */
   if (!rs6000_offsettable_memref_p (op, mode, false))
     return false;
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2dca269744..29dd616520 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6894,13 +6894,6 @@ 
 ;; do the load 16-bits at a time.  We could do this by loading from memory,
 ;; and this is even supposed to be faster, but it is simpler not to get
 ;; integers in the TOC.
-(define_insn "movsi_low"
-  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
-        (mem:SI (lo_sum:SI (match_operand:SI 1 "gpc_reg_operand" "b")
-                           (match_operand 2 "" ""))))]
-  "TARGET_MACHO && ! TARGET_64BIT"
-  "lwz %0,lo16(%2)(%1)"
-  [(set_attr "type" "load")])
 
 ;;		MR           LA           LWZ          LFIWZX       LXSIWZX
 ;;		STW          STFIWX       STXSIWX      LI           LIS