Patchwork PR53914, rs6000 constraints and reload queries

Submitter Alan Modra
Date July 19, 2012, 3:04 a.m.
Message ID <20120719030425.GE27761@bubble.grove.modra.org>
Permalink /patch/171863/
State New

Comments

Alan Modra - July 19, 2012, 3:04 a.m.
Thanks very much Uli for verifying my conclusions about reload,
operand predicates and constraints, and particularly the general
unusability of the "o" constraint.

Re http://gcc.gnu.org/ml/gcc/2012-07/msg00142.html, this patch adds
the missing secondary reload patterns, corrects constraints I got
wrong ("?*d", not "*?d"), and fixes pr54009.

Uli said:
> An address involving pseudos should be
> considered "legitimate" if there exists an assignment of hard
> registers that makes it strictly legitimate (not if *any* such
> assignment would be strictly legitimate).  [ It might make sense
> in some cases to make the check stricter; for example if we know
> that an address would nearly always require a reload, we might
> choose to completely reject it if that actually increases performance.
> But that would be just performance tuning, not required for
> correctness ... ]
So there is quite a bit more work in rs6000.c to fully implement this.
See ??? comments that I added on code handling lo_sum, and I'll admit
to not even trying to relax rs6000_legitimate_offset_address_p
conditions for e500.  That can wait for another day.  The patch is
large enough already.
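
To illustrate the best-case versus worst-case distinction, here is a
minimal standalone sketch.  It is not GCC code: the name offset_ok,
its parameters, and the final range test are assumptions modelled on
the "extra" handling in the patch, and the e500 and VSX special cases
are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a reg+offset legitimacy check for 8- and
   16-byte modes.  IS_64BIT stands in for TARGET_POWERPC64.  */
bool
offset_ok (int64_t offset, unsigned mode_size, bool is_64bit, bool worst_case)
{
  unsigned extra = 0;

  assert (mode_size == 8 || mode_size == 16);

  /* Best case: a 16-byte value in fprs is accessed at offset and
     offset + 8.  */
  if (mode_size == 16)
    extra = 8;

  if (worst_case)
    {
      if (!is_64bit)
        /* 32-bit gprs: the last word sits at offset + size - 4.  */
        extra = mode_size - 4;
      else if (offset & 3)
        /* 64-bit gpr ld/std need a displacement divisible by 4.  */
        return false;
    }

  /* Every displacement used must fit in a signed 16-bit field.  */
  return (uint64_t) offset + 0x8000 < 0x10000u - extra;
}
```

Under these assumptions offset_ok (32764, 8, false, false) holds but
offset_ok (32764, 8, false, true) does not: some register assignment
(an fpr) makes the address valid, so it can be treated as legitimate
and the gpr case is left to insn constraints and reload.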

Some notes:
- word_offset_memref_operand isn't used as a predicate and as both Uli
and I noted, constraints calling predicates lead to trouble with
reload_legitimize_address output.  So move it out of predicates.md to
rs6000.c (renamed as mem_operand_gpr and without checks more suited to
predicates).
- where I changed a bunch of mode tests to GET_MODE_SIZE checks, the
original mode list missing TImode is irrelevant for 32-bit, since
TImode isn't supported on 32-bit.  (Why do we have 32-bit TImode insns?)
- reordering insn alternatives in some cases is cosmetic.  As the
comments say, putting r->Y and Y->r before r->r is necessary, but
reordering d->m,m->d,d->d isn't strictly necessary.  I did that for
consistency, and future proofing should the m constraint need to be
changed.  Putting r->Y before Y->r is also cosmetic, but I prefer it
that way for insns that land in reload as pseudo->pseudo, i.e. mem->mem,
where both load and store alternatives match with reloading.  I think
it's nicer to choose input reloads rather than output reloads, so put
the store first.
- I haven't actually seen the 32-bit gpr secondary reload patterns
trigger (it's hard to make a testcase), so that code is largely
untested.  Fortunately the code is very similar to the 64-bit gpr
secondary reload code.
- movdf_hardfloat32 insn lengths looked wrong to me, so I fixed that.
gpr load and store ought to be just two insns, not four.  I also took
out the ?? kludge since the offsettable address problem is now fixed.
- I don't really like disparaging fprs in a number of DImode insns,
but without that reload prefers to reload inputs.  So you get code
like stw 10,xxx(1); stw 11,xxx+4(1); lfd 0,xxx(1); stfd 0,32764(9)
rather than addi 9,9,32764; stw 10,0(9); stw 11,4(9).  The former is
slower and requires a stack frame.
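
The offset 32764 in the last note shows why gpr and fpr accesses need
different checks.  A rough standalone model of the two range tests
follows.  The function names are hypothetical, and word_size == 8 is
assumed to imply a 64-bit target; the real functions take rtx
operands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the gpr check.  WORD_SIZE plays the role
   of UNITS_PER_WORD (4 on 32-bit, 8 on 64-bit).  */
bool
gpr_offset_ok (int64_t offset, unsigned mode_size, unsigned word_size)
{
  unsigned extra = 0;

  assert (word_size == 4 || word_size == 8);

  if (mode_size >= word_size)
    {
      /* Distance from the first word to the last word accessed.  */
      extra = mode_size - word_size;
      /* 64-bit ld/std displacements must be divisible by 4.  */
      if (word_size == 8 && (offset & 3) != 0)
        return false;
    }
  return (uint64_t) offset + 0x8000 < 0x10000u - extra;
}

/* Hypothetical stand-in for the fpr check: fprs move 8 bytes at a time.  */
bool
fpr_offset_ok (int64_t offset, unsigned mode_size)
{
  unsigned extra = mode_size > 8 ? mode_size - 8 : 0;
  return (uint64_t) offset + 0x8000 < 0x10000u - extra;
}
```

On 32-bit, fpr_offset_ok (32764, 8) holds, so stfd 0,32764(9) is
fine, while gpr_offset_ok (32764, 8, 4) fails because the second stw
would need displacement 32768.  Hence the addi in the preferred
sequence.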

Bootstrapped and regression tested powerpc-linux.  OK to apply?

	PR target/53914
	PR target/54009
	* config/rs6000/constraints.md (Y): Use mem_operand_gpr.
	(wY): New constraint using mem_operand_fpr.
	* config/rs6000/predicates.md (word_offset_memref_operand): Delete.
	Adjust all rs6000_legitimate_offset_address_p calls.
	* config/rs6000/rs6000-protos.h (mem_operand_gpr): Declare.
	(mem_operand_fpr, rs6000_secondary_reload_gpr): Declare.
	(rs6000_legitimate_offset_address_p): Update prototype.
	(rs6000_offsettable_memref_p): Delete.
	(rs6000_secondary_reload_ppc64): Delete.
	* config/rs6000/rs6000.c (address_offset): New function.
	(mem_operand_gpr, mem_operand_fpr): Likewise.
	(rs6000_legitimate_offset_address_p): Add worst_case param.  When
	not worst_case assume class of regs with least restrictive offsets.
	Adjust all calls.
	(legitimate_lo_sum_address_p): Simplify register mode tests.
	(rs6000_legitimize_address): Likewise.  Assume best case offset
	addressing.  Combine ELF and MACHO lo_sum code.
	(rs6000_mode_dependent_address): Correct offset addressing limits.
	(rs6000_offsettable_memref_p): Make static, add reg_mode param.
	Use reg_mode to help rs6000_legitimate_offset_address_p.
	(rs6000_secondary_reload): Use address_offset.  Handle 32-bit multi
	gpr load/store when offset too large.
	(rs6000_secondary_reload_gpr): Rename from rs6000_secondary_reload_ppc64.
	(rs6000_split_multireg_move): Adjust rs6000_offsettable_memref_p calls.
	* config/rs6000/rs6000.md (movdf_hardfloat32): Use 'Y' constraint
	for gpr load/store.  Order alternatives as r->Y,Y->r,r->r and
	d->m,m->d,d->d.  Correct size of gpr load/store.
	(movdf_softfloat32): Use 'Y' constraint for gpr load/store.  Order
	alternatives.
	(movti_ppc64): Likewise.
	(movdi_internal32): Likewise.  Also disparage fprs.
	(movdi_mfpgpr, movdi_internal64): Likewise.
	(movtf_internal): Use 'wY' for fpr load/store.  Order alternatives.
	(movtf_softfloat): Order alternatives.
	(extenddftf2_internal): Use 'wY' and 'Y' for store.
	(movti_power, movti_string): Use 'Y' for gpr load/store.  Order.
	(stack_protect_setdi, stack_protect_testdi): Likewise.
	(movdf_hardfloat64_mfpgpr, movdf_hardfloat64): Order alternatives.
	(movdf_softfloat64): Likewise.
	(reload_<mode>_store): Adjust reload_di_store to provide
	reload_si_store as well.
	(reload_<mode>_load): Likewise.

Alan Modra - July 19, 2012, 2:35 p.m.
On Thu, Jul 19, 2012 at 12:34:25PM +0930, Alan Modra wrote:
> and fixes pr54009.

David, in looking over this today, I realised that this bug isn't
completely fixed.  I stopped gcc emitting an offset of 32768, but that
isn't enough.  lo_sum addresses can't be offset, except when you know
the alignment of the object being addressed guarantees that no part
crosses a 32k boundary.  For example, given lis 9,x@ha; lwz 3,x@l(9);
lwz 4,x+4@l(9); we run into trouble if x happens to reside at
n*64k + 32764.  The final address matters, not just the offset in the
insn.  So I have some changes to mem_operand_gpr and
rs6000_secondary_reload.  I'll post a patch on top of the previous
one.
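
The wrap can be checked with a small standalone sketch of the @ha/@l
arithmetic.  The helper names are hypothetical; the code only models
the address computation of the lis/lwz sequence on a 32-bit target.

```c
#include <assert.h>
#include <stdint.h>

/* @l: low 16 bits of ADDR, sign-extended as a D-form displacement.  */
int32_t
lo16 (uint32_t addr)
{
  return (int32_t) ((addr & 0xffff) ^ 0x8000) - 0x8000;
}

/* @ha: high 16 bits, adjusted so (ha << 16) + lo16 (addr) == addr.  */
uint32_t
ha16 (uint32_t addr)
{
  return ((addr + 0x8000) >> 16) & 0xffff;
}

/* Address formed by "lis 9,x@ha; lwz rN,x+delta@l(9)".  */
uint32_t
lo_sum_address (uint32_t x, uint32_t delta)
{
  uint32_t base = ha16 (x) << 16;             /* lis 9,x@ha */
  return base + (uint32_t) lo16 (x + delta);  /* x+delta@l(9) */
}
```

With x = 0x10000 + 32764 = 0x17ffc, lo_sum_address (x, 0) gives
0x17ffc as expected, but lo_sum_address (x, 4) gives 0x8000 instead
of 0x18000: x+4@l sign-extends to -32768 while the high part was
computed for x, not x+4.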

Another thing.  I reckon we can do without the 'wY' constraint.  I
implemented it because movtf_internal currently uses an "o"
constraint, but it seems to me that rs6000_legitimate_address_p
already prevents non-offsettable TFmode addresses for mems accessed by
fprs.  Geoff introduced the "o" on movtf here:
http://gcc.gnu.org/ml/gcc-patches/2003-12/msg00803.html
Thoughts?

Alan Modra - July 20, 2012, 12:45 a.m.
On Fri, Jul 20, 2012 at 12:05:28AM +0930, Alan Modra wrote:
> On Thu, Jul 19, 2012 at 12:34:25PM +0930, Alan Modra wrote:
> > and fixes pr54009.
> 
> David, in looking over this today, I realised that this bug isn't
> completely fixed.  I stopped gcc emitting an offset of 32768, but that
> isn't enough.  lo_sum addresses can't be offset, except when you know
> the alignment of the object being addressed guarantees that no part
> crosses a 32k boundary.  For example, given lis 9,x@ha; lwz 3,x@l(9);
> lwz 4,x+4@l(9); we run into trouble if x happens to reside at
> n*64k + 32764.  The final address matters, not just the offset in the
> insn.  So I have some changes to mem_operand_gpr and
> rs6000_secondary_reload.  I'll post a patch on top of the previous
> one.

Actually, maybe I won't.  When I disabled offsetting LO_SUMs, except
for tocrefs (which we know are aligned), I ran into an "insn does not
satisfy its constraints" ICE on gcc.dg/pr28795-2.  The cause: code in
rs6000_legitimize_reload_address near the comment starting "/* Don't do
this for TFmode".  There we have a list of modes from which we make
assumptions about the regs used, and get it wrong for ppc32.  Reload
decided to use 2 gprs for a DFmode.

We've been doing this forever, and almost always the mem is aligned.
So fixing an esoteric testcase like the one in pr54009 would come at
the expense of normal code quality.  The only way around that would be
to reliably figure out the regs used and the alignment of syms in
rs6000_legitimize_reload_address, and as far as I can tell that isn't
possible.

> Another thing.  I reckon we can do without the 'wY' constraint.  I
> implemented it because movtf_internal currently uses an "o"
> constraint, but it seems to me that rs6000_legitimate_address_p
> already prevents non-offsettable TFmode addresses for mems accessed by
> fprs.  Geoff introduced the "o" on movtf here:
> http://gcc.gnu.org/ml/gcc-patches/2003-12/msg00803.html
> Thoughts?

This on the other hand works.  Please consider the patch amended to
remove mem_operand_fpr, the 'wY' constraint, and uses thereof replaced
with "m".

David Edelsohn - July 21, 2012, 7:52 p.m.
On Thu, Jul 19, 2012 at 8:45 PM, Alan Modra <amodra@gmail.com> wrote:

> This on the other hand works.  Please consider the patch amended to
> remove mem_operand_fpr, the 'wY' constraint, and uses thereof replaced
> with "m".

What, exactly, is the current proposed patch? Remove offsettable
address constraint from movXX patterns but not from LO_SUM, and not
adding wY constraint?

Thanks, David

Patch

Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(revision 189420)
+++ gcc/config/rs6000/constraints.md	(working copy)
@@ -150,9 +150,15 @@  to use @samp{m} or @samp{es} in @code{asm} stateme
        (match_test "GET_CODE (XEXP (op, 0)) == REG")))
 
 (define_memory_constraint "Y"
-  "Indexed or word-aligned displacement memory operand"
-  (match_operand 0 "word_offset_memref_operand"))
+  "memory operand for 8 byte and 16 byte gpr load/store"
+  (and (match_code "mem")
+       (match_operand 0 "mem_operand_gpr")))
 
+(define_memory_constraint "wY"
+  "memory operand for 8 byte and 16 byte fpr load/store"
+  (and (match_code "mem")
+       (match_operand 0 "mem_operand_fpr")))
+
 (define_memory_constraint "Z"
   "Memory operand that is an indexed or indirect from a register (it is
 usually better to use @samp{m} or @samp{es} in @code{asm} statements)"
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 189420)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -432,29 +432,6 @@ 
   (and (match_operand 0 "memory_operand")
        (match_test "offsettable_nonstrict_memref_p (op)")))
 
-;; Return 1 if the operand is a memory operand with an address divisible by 4
-(define_predicate "word_offset_memref_operand"
-  (match_operand 0 "memory_operand")
-{
-  /* Address inside MEM.  */
-  op = XEXP (op, 0);
-
-  /* Extract address from auto-inc/dec.  */
-  if (GET_CODE (op) == PRE_INC
-      || GET_CODE (op) == PRE_DEC)
-    op = XEXP (op, 0);
-  else if (GET_CODE (op) == PRE_MODIFY)
-    op = XEXP (op, 1);
-  else if (GET_CODE (op) == LO_SUM
-	   && GET_CODE (XEXP (op, 0)) == REG
-	   && GET_CODE (XEXP (op, 1)) == CONST)
-    op = XEXP (XEXP (op, 1), 0);
-
-  return (GET_CODE (op) != PLUS
-	  || GET_CODE (XEXP (op, 1)) != CONST_INT
-	  || INTVAL (XEXP (op, 1)) % 4 == 0);
-})
-
 ;; Return 1 if the operand is an indexed or indirect memory operand.
 (define_predicate "indexed_or_indirect_operand"
   (match_code "mem")
@@ -892,7 +869,8 @@ 
   return input_operand (op, mode);
 })
 
-;; Return true if OP is an invalid SUBREG operation on the e500.
+;; Return true if OP is a non-immediate operand and not an invalid
+;; SUBREG operation on the e500.
 (define_predicate "rs6000_nonimmediate_operand"
   (match_code "reg,subreg,mem")
 {
@@ -1325,7 +1303,7 @@ 
       if (base_regno == 0)
 	return 0;
     }
-  else if (rs6000_legitimate_offset_address_p (SImode, src_addr, 0))
+  else if (rs6000_legitimate_offset_address_p (SImode, src_addr, false, false))
     {
       offset = INTVAL (XEXP (src_addr, 1));
       base_regno = REGNO (XEXP (src_addr, 0));
@@ -1353,7 +1331,7 @@ 
 	  newoffset = 0;
 	  addr_reg = newaddr;
 	}
-      else if (rs6000_legitimate_offset_address_p (SImode, newaddr, 0))
+      else if (rs6000_legitimate_offset_address_p (SImode, newaddr, false, false))
 	{
 	  addr_reg = XEXP (newaddr, 0);
 	  newoffset = INTVAL (XEXP (newaddr, 1));
@@ -1400,7 +1378,7 @@ 
       if (base_regno == 0)
 	return 0;
     }
-  else if (rs6000_legitimate_offset_address_p (SImode, dest_addr, 0))
+  else if (rs6000_legitimate_offset_address_p (SImode, dest_addr, false, false))
     {
       offset = INTVAL (XEXP (dest_addr, 1));
       base_regno = REGNO (XEXP (dest_addr, 0));
@@ -1428,7 +1406,7 @@ 
 	  newoffset = 0;
 	  addr_reg = newaddr;
 	}
-      else if (rs6000_legitimate_offset_address_p (SImode, newaddr, 0))
+      else if (rs6000_legitimate_offset_address_p (SImode, newaddr, false, false))
 	{
 	  addr_reg = XEXP (newaddr, 0);
 	  newoffset = INTVAL (XEXP (newaddr, 1));
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 189420)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -38,6 +38,8 @@  extern bool macho_lo_sum_memory_operand (rtx, enum
 extern int num_insns_constant (rtx, enum machine_mode);
 extern int num_insns_constant_wide (HOST_WIDE_INT);
 extern int small_data_operand (rtx, enum machine_mode);
+extern bool mem_operand_gpr (rtx, enum machine_mode);
+extern bool mem_operand_fpr (rtx, enum machine_mode);
 extern bool toc_relative_expr_p (const_rtx, bool);
 extern bool invalid_e500_subreg (rtx, enum machine_mode);
 extern void validate_condition_mode (enum rtx_code, enum machine_mode);
@@ -83,7 +85,7 @@  extern bool (*rs6000_cannot_change_mode_class_ptr)
 						    enum machine_mode,
 						    enum reg_class);
 extern void rs6000_secondary_reload_inner (rtx, rtx, rtx, bool);
-extern void rs6000_secondary_reload_ppc64 (rtx, rtx, rtx, bool);
+extern void rs6000_secondary_reload_gpr (rtx, rtx, rtx, bool);
 extern int paired_emit_vector_cond_expr (rtx, rtx, rtx,
                                          rtx, rtx, rtx);
 extern void paired_expand_vector_move (rtx operands[]);
@@ -121,9 +123,9 @@  extern void rs6000_emit_move (rtx, rtx, enum machi
 extern rtx rs6000_secondary_memory_needed_rtx (enum machine_mode);
 extern rtx (*rs6000_legitimize_reload_address_ptr) (rtx, enum machine_mode,
 						    int, int, int, int *);
-extern bool rs6000_legitimate_offset_address_p (enum machine_mode, rtx, int);
+extern bool rs6000_legitimate_offset_address_p (enum machine_mode, rtx,
+						bool, bool);
 extern rtx rs6000_find_base_term (rtx);
-extern bool rs6000_offsettable_memref_p (rtx);
 extern rtx rs6000_return_addr (int, rtx);
 extern void rs6000_output_symbol_ref (FILE*, rtx);
 extern HOST_WIDE_INT rs6000_initial_elimination_offset (int, int);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 189420)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -4956,6 +4956,82 @@  gpr_or_gpr_p (rtx op0, rtx op1)
 	  || (REG_P (op1) && INT_REGNO_P (REGNO (op1))));
 }
 
+/* Given an address, return a constant offset term if one exists.  */
+
+static rtx
+address_offset (rtx op)
+{
+  if (GET_CODE (op) == PRE_INC
+      || GET_CODE (op) == PRE_DEC)
+    op = XEXP (op, 0);
+  else if (GET_CODE (op) == PRE_MODIFY
+	   || GET_CODE (op) == LO_SUM)
+    op = XEXP (op, 1);
+
+  if (GET_CODE (op) == CONST)
+    op = XEXP (op, 0);
+
+  if (GET_CODE (op) == PLUS)
+    op = XEXP (op, 1);
+
+  if (CONST_INT_P (op))
+    return op;
+
+  return NULL_RTX;
+}
+
+/* Return true if the MEM operand is a memory operand suitable for use
+   with a (full width, possibly multiple) gpr load/store.  On
+   powerpc64 this means the offset must be divisible by 4.
+   Implements 'Y' constraint.
+
+   Accept direct, indexed, offset, lo_sum and tocref.  Since this is
+   a constraint function we know the operand has satisfied a suitable
+   memory predicate.  Also accept some odd rtl generated by reload
+   (see rs6000_legitimize_reload_address for various forms).  It is
+   important that reload rtl be accepted by appropriate constraints
+   but not by the operand predicate.  */
+
+bool
+mem_operand_gpr (rtx op, enum machine_mode mode)
+{
+  unsigned HOST_WIDE_INT offset;
+  unsigned int extra;
+
+  op = address_offset (XEXP (op, 0));
+  if (op == NULL_RTX)
+    return true;
+
+  offset = INTVAL (op);
+  extra = GET_MODE_SIZE (mode) - UNITS_PER_WORD;
+  if ((int) extra < 0)
+    extra = 0;
+  else if (TARGET_POWERPC64 && (offset & 3) != 0)
+    return false;
+  return offset + 0x8000 < 0x10000 - extra;
+}
+
+/* Return true if the MEM operand is a memory operand suitable for use
+   with a (possibly multiple) fpr load/store.
+   Implements 'wY' constraint.  */
+
+bool
+mem_operand_fpr (rtx op, enum machine_mode mode)
+{
+  unsigned HOST_WIDE_INT offset;
+  unsigned int extra;
+
+  op = address_offset (XEXP (op, 0));
+  if (op == NULL_RTX)
+    return true;
+
+  offset = INTVAL (op);
+  extra = GET_MODE_SIZE (mode) - 8;
+  if ((int) extra < 0)
+    extra = 0;
+  return offset + 0x8000 < 0x10000 - extra;
+}
+
 
 /* Subroutines of rs6000_legitimize_address and rs6000_legitimate_address_p.  */
 
@@ -5173,13 +5249,15 @@  legitimate_small_data_p (enum machine_mode mode, r
 #define SPE_CONST_OFFSET_OK(x) (((x) & ~0xf8) == 0)
 
 bool
-rs6000_legitimate_offset_address_p (enum machine_mode mode, rtx x, int strict)
+rs6000_legitimate_offset_address_p (enum machine_mode mode, rtx x,
+				    bool strict, bool worst_case)
 {
-  unsigned HOST_WIDE_INT offset, extra;
+  unsigned HOST_WIDE_INT offset;
+  unsigned int extra;
 
   if (GET_CODE (x) != PLUS)
     return false;
-  if (GET_CODE (XEXP (x, 0)) != REG)
+  if (!REG_P (XEXP (x, 0)))
     return false;
   if (!INT_REG_OK_FOR_BASE_P (XEXP (x, 0), strict))
     return false;
@@ -5202,14 +5280,6 @@  bool
       return SPE_CONST_OFFSET_OK (offset);
 
     case DFmode:
-      if (TARGET_E500_DOUBLE)
-	return SPE_CONST_OFFSET_OK (offset);
-
-      /* If we are using VSX scalar loads, restrict ourselves to reg+reg
-	 addressing.  */
-      if (VECTOR_MEM_VSX_P (DFmode))
-	return false;
-
     case DDmode:
     case DImode:
       /* On e500v2, we may have:
@@ -5220,25 +5290,33 @@  bool
       if (TARGET_E500_DOUBLE)
 	return SPE_CONST_OFFSET_OK (offset);
 
-      if (mode == DFmode || mode == DDmode || !TARGET_POWERPC64)
+      /* If we are using VSX scalar loads, restrict ourselves to reg+reg
+	 addressing.  */
+      if (mode == DFmode && VECTOR_MEM_VSX_P (DFmode))
+	return false;
+
+      if (!worst_case)
+	break;
+      if (!TARGET_POWERPC64)
 	extra = 4;
       else if (offset & 3)
 	return false;
       break;
 
     case TFmode:
+    case TDmode:
+    case TImode:
       if (TARGET_E500_DOUBLE)
 	return (SPE_CONST_OFFSET_OK (offset)
 		&& SPE_CONST_OFFSET_OK (offset + 8));
 
-    case TDmode:
-    case TImode:
-      if (mode == TFmode || mode == TDmode || !TARGET_POWERPC64)
+      extra = 8;
+      if (!worst_case)
+	break;
+      if (!TARGET_POWERPC64)
 	extra = 12;
       else if (offset & 3)
 	return false;
-      else
-	extra = 8;
       break;
 
     default:
@@ -5318,9 +5396,7 @@  legitimate_lo_sum_address_p (enum machine_mode mod
   if (!INT_REG_OK_FOR_BASE_P (XEXP (x, 0), strict))
     return false;
   /* Restrict addressing for DI because of our SUBREG hackery.  */
-  if (TARGET_E500_DOUBLE && (mode == DFmode || mode == TFmode
-			     || mode == DDmode || mode == TDmode
-			     || mode == DImode))
+  if (TARGET_E500_DOUBLE && GET_MODE_SIZE (mode) > UNITS_PER_WORD)
     return false;
   x = XEXP (x, 1);
 
@@ -5332,10 +5408,10 @@  legitimate_lo_sum_address_p (enum machine_mode mod
 	return false;
       if (GET_MODE_NUNITS (mode) != 1)
 	return false;
-      if (GET_MODE_BITSIZE (mode) > 64
-	  || (GET_MODE_BITSIZE (mode) > 32 && !TARGET_POWERPC64
-	      && !(TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
-		   && (mode == DFmode || mode == DDmode))))
+      if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
+	  && !(/* ??? Assume floating point reg based on mode?  */
+	       TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
+	       && (mode == DFmode || mode == DDmode)))
 	return false;
 
       return CONSTANT_P (x);
@@ -5370,7 +5446,7 @@  static rtx
 rs6000_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
 			   enum machine_mode mode)
 {
-  unsigned int extra = 0;
+  unsigned int extra;
 
   if (!reg_offset_addressing_ok_p (mode))
     {
@@ -5397,22 +5473,18 @@  rs6000_legitimize_address (rtx x, rtx oldx ATTRIBU
 	return rs6000_legitimize_tls_address (x, model);
     }
 
+  extra = 0;
   switch (mode)
     {
-    case DFmode:
-    case DDmode:
-      extra = 4;
-      break;
-    case DImode:
-      if (!TARGET_POWERPC64)
-	extra = 4;
-      break;
     case TFmode:
     case TDmode:
-      extra = 12;
-      break;
     case TImode:
-      extra = TARGET_POWERPC64 ? 8 : 12;
+      /* As in legitimate_offset_address_p we do not assume
+	 worst-case.  The mode here is just a hint as to the registers
+	 used.  A TImode is usually in gprs, but may actually be in
+	 fprs.  Leave worst-case scenario for reload to handle via
+	 insn constraints.  */
+      extra = 8;
       break;
     default:
       break;
@@ -5423,13 +5495,8 @@  rs6000_legitimize_address (rtx x, rtx oldx ATTRIBU
       && GET_CODE (XEXP (x, 1)) == CONST_INT
       && ((unsigned HOST_WIDE_INT) (INTVAL (XEXP (x, 1)) + 0x8000)
 	  >= 0x10000 - extra)
-      && !((TARGET_POWERPC64
-	    && (mode == DImode || mode == TImode)
-	    && (INTVAL (XEXP (x, 1)) & 3) != 0)
-	   || SPE_VECTOR_MODE (mode)
-	   || (TARGET_E500_DOUBLE && (mode == DFmode || mode == TFmode
-				      || mode == DImode || mode == DDmode
-				      || mode == TDmode))))
+      && !(SPE_VECTOR_MODE (mode)
+	   || (TARGET_E500_DOUBLE && GET_MODE_SIZE (mode) > UNITS_PER_WORD)))
     {
       HOST_WIDE_INT high_int, low_int;
       rtx sum;
@@ -5445,23 +5512,17 @@  rs6000_legitimize_address (rtx x, rtx oldx ATTRIBU
 	   && GET_CODE (XEXP (x, 0)) == REG
 	   && GET_CODE (XEXP (x, 1)) != CONST_INT
 	   && GET_MODE_NUNITS (mode) == 1
-	   && ((TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT)
-	       || TARGET_POWERPC64
-	       || ((mode != DImode && mode != DFmode && mode != DDmode)
-		   || (TARGET_E500_DOUBLE && mode != DDmode)))
-	   && (TARGET_POWERPC64 || mode != DImode)
-	   && !avoiding_indexed_address_p (mode)
-	   && mode != TImode
-	   && mode != TFmode
-	   && mode != TDmode)
+	   && (GET_MODE_SIZE (mode) <= UNITS_PER_WORD
+	       || (/* ??? Assume floating point reg based on mode?  */
+		   (TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT)
+		   && (mode == DFmode || mode == DDmode)))
+	   && !avoiding_indexed_address_p (mode))
     {
       return gen_rtx_PLUS (Pmode, XEXP (x, 0),
 			   force_reg (Pmode, force_operand (XEXP (x, 1), 0)));
     }
   else if (SPE_VECTOR_MODE (mode)
-	   || (TARGET_E500_DOUBLE && (mode == DFmode || mode == TFmode
-				      || mode == DDmode || mode == TDmode
-				      || mode == DImode)))
+	   || (TARGET_E500_DOUBLE && GET_MODE_SIZE (mode) > UNITS_PER_WORD))
     {
       if (mode == DImode)
 	return x;
@@ -5494,7 +5555,11 @@  rs6000_legitimize_address (rtx x, rtx oldx ATTRIBU
 
       return force_reg (Pmode, x);
     }
-  else if (TARGET_ELF
+  else if ((TARGET_ELF
+#if TARGET_MACHO
+	    || !MACHO_DYNAMIC_NO_PIC_P
+#endif
+	    )
 	   && TARGET_32BIT
 	   && TARGET_NO_TOC
 	   && ! flag_pic
@@ -5502,32 +5567,18 @@  rs6000_legitimize_address (rtx x, rtx oldx ATTRIBU
 	   && GET_CODE (x) != CONST_DOUBLE
 	   && CONSTANT_P (x)
 	   && GET_MODE_NUNITS (mode) == 1
-	   && (GET_MODE_BITSIZE (mode) <= 32
-	       || ((TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT)
+	   && (GET_MODE_SIZE (mode) <= UNITS_PER_WORD
+	       || (/* ??? Assume floating point reg based on mode?  */
+		   (TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT)
 		   && (mode == DFmode || mode == DDmode))))
     {
       rtx reg = gen_reg_rtx (Pmode);
-      emit_insn (gen_elf_high (reg, x));
+      if (TARGET_ELF)
+	emit_insn (gen_elf_high (reg, x));
+      else
+	emit_insn (gen_macho_high (reg, x));
       return gen_rtx_LO_SUM (Pmode, reg, x);
     }
-  else if (TARGET_MACHO && TARGET_32BIT && TARGET_NO_TOC
-	   && ! flag_pic
-#if TARGET_MACHO
-	   && ! MACHO_DYNAMIC_NO_PIC_P
-#endif
-	   && GET_CODE (x) != CONST_INT
-	   && GET_CODE (x) != CONST_DOUBLE
-	   && CONSTANT_P (x)
-	   && GET_MODE_NUNITS (mode) == 1
-	   && ((TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT)
-	       || (mode != DFmode && mode != DDmode))
-	   && mode != DImode
-	   && mode != TImode)
-    {
-      rtx reg = gen_reg_rtx (Pmode);
-      emit_insn (gen_macho_high (reg, x));
-      return gen_rtx_LO_SUM (Pmode, reg, x);
-    }
   else if (TARGET_TOC
 	   && GET_CODE (x) == SYMBOL_REF
 	   && constant_pool_expr_p (x)
@@ -6266,7 +6317,7 @@  rs6000_legitimate_address_p (enum machine_mode mod
 	  || XEXP (x, 0) == arg_pointer_rtx)
       && GET_CODE (XEXP (x, 1)) == CONST_INT)
     return 1;
-  if (rs6000_legitimate_offset_address_p (mode, x, reg_ok_strict))
+  if (rs6000_legitimate_offset_address_p (mode, x, reg_ok_strict, false))
     return 1;
   if (mode != TImode
       && mode != TFmode
@@ -6294,7 +6345,8 @@  rs6000_legitimate_address_p (enum machine_mode mod
 	   && (mode == DFmode || mode == DDmode || mode == DImode))
       && TARGET_UPDATE
       && legitimate_indirect_address_p (XEXP (x, 0), reg_ok_strict)
-      && (rs6000_legitimate_offset_address_p (mode, XEXP (x, 1), reg_ok_strict)
+      && (rs6000_legitimate_offset_address_p (mode, XEXP (x, 1),
+					      reg_ok_strict, false)
 	  || (!avoiding_indexed_address_p (mode)
 	      && legitimate_indexed_address_p (XEXP (x, 1), reg_ok_strict)))
       && rtx_equal_p (XEXP (XEXP (x, 1), 0), XEXP (x, 0)))
@@ -6358,7 +6410,7 @@  rs6000_mode_dependent_address (const_rtx addr)
 	  && GET_CODE (XEXP (addr, 1)) == CONST_INT)
 	{
 	  unsigned HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
-	  return val + 12 + 0x8000 >= 0x10000;
+	  return val + 0x8000 >= 0x10000 - (TARGET_POWERPC64 ? 8 : 12);
 	}
       break;
 
@@ -6428,14 +6480,16 @@  rs6000_find_base_term (rtx op)
 
    in 32-bit mode, that the recog predicate rejects.  */
 
-bool
-rs6000_offsettable_memref_p (rtx op)
+static bool
+rs6000_offsettable_memref_p (rtx op, enum machine_mode reg_mode)
 {
+  bool worst_case;
+
   if (!MEM_P (op))
     return false;
 
   /* First mimic offsettable_memref_p.  */
-  if (offsettable_address_p (1, GET_MODE (op), XEXP (op, 0)))
+  if (offsettable_address_p (true, GET_MODE (op), XEXP (op, 0)))
     return true;
 
   /* offsettable_address_p invokes rs6000_mode_dependent_address, but
@@ -6443,8 +6497,13 @@  rs6000_find_base_term (rtx op)
      reference and, therefore, assumes that it is the largest supported
      mode (TFmode).  As a consequence, legitimate offsettable memory
      references are rejected.  rs6000_legitimate_offset_address_p contains
-     the correct logic for the PLUS case of rs6000_mode_dependent_address.  */
-  return rs6000_legitimate_offset_address_p (GET_MODE (op), XEXP (op, 0), 1);
+     the correct logic for the PLUS case of rs6000_mode_dependent_address,
+     at least with a little bit of help here given that we know the
+     actual registers used.  */
+  worst_case = ((TARGET_POWERPC64 && GET_MODE_CLASS (reg_mode) == MODE_INT)
+		|| GET_MODE_SIZE (reg_mode) == 4);
+  return rs6000_legitimate_offset_address_p (GET_MODE (op), XEXP (op, 0),
+					     true, worst_case);
 }
 
 /* Change register usage conditional on target flags.  */
@@ -13455,7 +13514,8 @@  rs6000_secondary_reload (bool in_p,
 	  if (rclass == GENERAL_REGS || rclass == BASE_REGS)
 	    {
 	      if (!legitimate_indirect_address_p (addr, false)
-		  && !rs6000_legitimate_offset_address_p (TImode, addr, false))
+		  && !rs6000_legitimate_offset_address_p (TImode, addr,
+							  false, true))
 		{
 		  sri->icode = icode;
 		  /* account for splitting the loads, and converting the
@@ -13528,18 +13588,9 @@  rs6000_secondary_reload (bool in_p,
 	   && MEM_P (x)
 	   && GET_MODE_SIZE (GET_MODE (x)) >= UNITS_PER_WORD)
     {
-      rtx addr = XEXP (x, 0);
+      rtx off = address_offset (XEXP (x, 0));
 
-      if (GET_CODE (addr) == PRE_MODIFY)
-	addr = XEXP (addr, 1);
-      else if (GET_CODE (addr) == LO_SUM
-	       && GET_CODE (XEXP (addr, 0)) == REG
-	       && GET_CODE (XEXP (addr, 1)) == CONST)
-	addr = XEXP (XEXP (addr, 1), 0);
-
-      if (GET_CODE (addr) == PLUS
-	  && GET_CODE (XEXP (addr, 1)) == CONST_INT
-	  && (INTVAL (XEXP (addr, 1)) & 3) != 0)
+      if (off != NULL_RTX && (INTVAL (off) & 3) != 0)
 	{
 	  if (in_p)
 	    sri->icode = CODE_FOR_reload_di_load;
@@ -13551,6 +13602,27 @@  rs6000_secondary_reload (bool in_p,
       else
 	default_p = true;
     }
+  else if (!TARGET_POWERPC64
+	   && rs6000_reload_register_type (rclass) == GPR_REGISTER_TYPE
+	   && MEM_P (x)
+	   && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD)
+    {
+      rtx off = address_offset (XEXP (x, 0));
+
+      if (off != NULL_RTX
+	  && ((unsigned HOST_WIDE_INT) INTVAL (off) + 0x8000
+	      >= 0x10000u - (GET_MODE_SIZE (GET_MODE (x)) - UNITS_PER_WORD)))
+	{
+	  if (in_p)
+	    sri->icode = CODE_FOR_reload_si_load;
+	  else
+	    sri->icode = CODE_FOR_reload_si_store;
+	  sri->extra_cost = 2;
+	  ret = NO_REGS;
+	}
+      else
+	default_p = true;
+    }
   else
     default_p = true;
 
@@ -13639,8 +13711,9 @@  rs6000_secondary_reload_inner (rtx reg, rtx mem, r
 	}
 
       if (GET_CODE (addr) == PLUS
-	  && (!rs6000_legitimate_offset_address_p (TImode, addr, false)
-	      || and_op2 != NULL_RTX))
+	  && (and_op2 != NULL_RTX
+	      || !rs6000_legitimate_offset_address_p (TImode, addr,
+						      false, true)))
 	{
 	  addr_op1 = XEXP (addr, 0);
 	  addr_op2 = XEXP (addr, 1);
@@ -13672,7 +13745,8 @@  rs6000_secondary_reload_inner (rtx reg, rtx mem, r
 	  scratch_or_premodify = scratch;
 	}
       else if (!legitimate_indirect_address_p (addr, false)
-	       && !rs6000_legitimate_offset_address_p (TImode, addr, false))
+	       && !rs6000_legitimate_offset_address_p (TImode, addr,
+						       false, true))
 	{
 	  if (TARGET_DEBUG_ADDR)
 	    {
@@ -13731,7 +13805,7 @@  rs6000_secondary_reload_inner (rtx reg, rtx mem, r
 	      && GET_MODE_SIZE (mode) == 8
 	      && and_op2 == NULL_RTX
 	      && scratch_or_premodify == scratch
-	      && rs6000_legitimate_offset_address_p (mode, addr, false)))
+	      && rs6000_legitimate_offset_address_p (mode, addr, false, false)))
 	;
 
       else if (GET_CODE (addr) == PLUS)
@@ -13840,10 +13914,11 @@  rs6000_secondary_reload_inner (rtx reg, rtx mem, r
 }
 
 /* Convert reloads involving 64-bit gprs and misaligned offset
-   addressing to use indirect addressing.  */
+   addressing, or multiple 32-bit gprs and offsets that are too large,
+   to use indirect addressing.  */
 
 void
-rs6000_secondary_reload_ppc64 (rtx reg, rtx mem, rtx scratch, bool store_p)
+rs6000_secondary_reload_gpr (rtx reg, rtx mem, rtx scratch, bool store_p)
 {
   int regno = true_regnum (reg);
   enum reg_class rclass;
@@ -13852,7 +13927,7 @@  void
 
   if (TARGET_DEBUG_ADDR)
     {
-      fprintf (stderr, "\nrs6000_secondary_reload_ppc64, type = %s\n",
+      fprintf (stderr, "\nrs6000_secondary_reload_gpr, type = %s\n",
 	       store_p ? "store" : "load");
       fprintf (stderr, "reg:\n");
       debug_rtx (reg);
@@ -16932,7 +17007,7 @@  rs6000_split_multireg_move (rtx dst, rtx src)
 	      emit_insn (gen_add3_insn (breg, breg, delta_rtx));
 	      src = replace_equiv_address (src, breg);
 	    }
-	  else if (! rs6000_offsettable_memref_p (src))
+	  else if (! rs6000_offsettable_memref_p (src, reg_mode))
 	    {
 	      if (GET_CODE (XEXP (src, 0)) == PRE_MODIFY)
 		{
@@ -16998,7 +17073,7 @@  rs6000_split_multireg_move (rtx dst, rtx src)
 		emit_insn (gen_add3_insn (breg, breg, delta_rtx));
 	      dst = replace_equiv_address (dst, breg);
 	    }
-	  else if (!rs6000_offsettable_memref_p (dst)
+	  else if (!rs6000_offsettable_memref_p (dst, reg_mode)
 		   && GET_CODE (XEXP (dst, 0)) != LO_SUM)
 	    {
 	      if (GET_CODE (XEXP (dst, 0)) == PRE_MODIFY)
@@ -17036,7 +17111,7 @@  rs6000_split_multireg_move (rtx dst, rtx src)
 		}
 	    }
 	  else if (GET_CODE (XEXP (dst, 0)) != LO_SUM)
-	    gcc_assert (rs6000_offsettable_memref_p (dst));
+	    gcc_assert (rs6000_offsettable_memref_p (dst, reg_mode));
 	}
 
       for (i = 0; i < nregs; i++)
@@ -27874,7 +27949,7 @@  rs6000_allocate_stack_temp (enum machine_mode mode
   if (!legitimate_indirect_address_p (addr, strict_p))
     {
       if (offsettable_p
-	  && !rs6000_legitimate_offset_address_p (mode, addr, strict_p))
+	  && !rs6000_legitimate_offset_address_p (mode, addr, strict_p, true))
 	stack = replace_equiv_address (stack, copy_addr_to_reg (addr));
 
       else if (reg_reg_p && !legitimate_indexed_address_p (addr, strict_p))
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 189420)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -9659,15 +9659,12 @@ 
 #endif
 }")
 
-;; Don't have reload use general registers to load a constant.  First,
-;; it might not work if the output operand is the equivalent of
-;; a non-offsettable memref, but also it is less efficient than loading
-;; the constant into an FP register, since it will probably be used there.
-;; The "??" is a kludge until we can figure out a more reasonable way
-;; of handling these non-offsettable values.
+;; Don't have reload use general registers to load a constant.  It is
+;; less efficient than loading the constant into an FP register, since
+;; it will probably be used there.
 (define_insn "*movdf_hardfloat32"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=!r,??r,m,ws,?wa,ws,?wa,Z,?Z,d,d,m,wa,!r,!r,!r")
-	(match_operand:DF 1 "input_operand" "r,m,r,ws,wa,Z,Z,ws,wa,d,m,d,j,G,H,F"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,r,!r,ws,?wa,ws,?wa,Z,?Z,m,d,d,wa,!r,!r,!r")
+	(match_operand:DF 1 "input_operand" "r,Y,r,ws,wa,Z,Z,ws,wa,d,m,d,j,G,H,F"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT 
    && (gpc_reg_operand (operands[0], DFmode)
        || gpc_reg_operand (operands[1], DFmode))"
@@ -9691,11 +9688,11 @@ 
     case 8:
       return \"stxsd%U0x %x1,%y0\";
     case 9:
-      return \"fmr %0,%1\";
+      return \"stfd%U0%X0 %1,%0\";
     case 10:
       return \"lfd%U1%X1 %0,%1\";
     case 11:
-      return \"stfd%U0%X0 %1,%0\";
+      return \"fmr %0,%1\";
     case 12:
       return \"xxlxor %x0,%x0,%x0\";
     case 13:
@@ -9704,46 +9701,47 @@ 
       return \"#\";
     }
 }"
-  [(set_attr "type" "two,load,store,fp,fp,fpload,fpload,fpstore,fpstore,fp,fpload,fpstore,vecsimple,*,*,*")
-   (set_attr "length" "8,16,16,4,4,4,4,4,4,4,4,4,4,8,12,16")])
+  [(set_attr "type" "store,load,two,fp,fp,fpload,fpload,fpstore,fpstore,fpstore,fpload,fp,vecsimple,*,*,*")
+   (set_attr "length" "8,8,8,4,4,4,4,4,4,4,4,4,4,8,12,16")])
 
 (define_insn "*movdf_softfloat32"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=r,r,m,r,r,r")
-	(match_operand:DF 1 "input_operand" "r,m,r,G,H,F"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,r,r,r,r,r")
+	(match_operand:DF 1 "input_operand" "r,Y,r,G,H,F"))]
   "! TARGET_POWERPC64 
    && ((TARGET_FPRS && TARGET_SINGLE_FLOAT) 
        || TARGET_SOFT_FLOAT || TARGET_E500_SINGLE)
    && (gpc_reg_operand (operands[0], DFmode)
        || gpc_reg_operand (operands[1], DFmode))"
   "#"
-  [(set_attr "type" "two,load,store,*,*,*")
+  [(set_attr "type" "store,load,two,*,*,*")
    (set_attr "length" "8,8,8,8,12,16")])
 
 ;; Reload patterns to support gpr load/store with misaligned mem.
-(define_expand "reload_di_store"
+;; Also used for multiple gpr load/store at offset >= 0xfffc.
+(define_expand "reload_<mode>_store"
   [(parallel [(match_operand 0 "memory_operand" "=m")
               (match_operand 1 "gpc_reg_operand" "r")
-              (match_operand:DI 2 "register_operand" "=&b")])]
-  "TARGET_POWERPC64"
+              (match_operand:GPR 2 "register_operand" "=&b")])]
+  ""
 {
-  rs6000_secondary_reload_ppc64 (operands[1], operands[0], operands[2], true);
+  rs6000_secondary_reload_gpr (operands[1], operands[0], operands[2], true);
   DONE;
 })
 
-(define_expand "reload_di_load"
+(define_expand "reload_<mode>_load"
   [(parallel [(match_operand 0 "gpc_reg_operand" "=r")
               (match_operand 1 "memory_operand" "m")
-              (match_operand:DI 2 "register_operand" "=b")])]
-  "TARGET_POWERPC64"
+              (match_operand:GPR 2 "register_operand" "=b")])]
+  ""
 {
-  rs6000_secondary_reload_ppc64 (operands[0], operands[1], operands[2], false);
+  rs6000_secondary_reload_gpr (operands[0], operands[1], operands[2], false);
   DONE;
 })
 
 ; ld/std require word-aligned displacements -> 'Y' constraint.
 ; List Y->r and r->Y before r->r for reload.
 (define_insn "*movdf_hardfloat64_mfpgpr"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,r,!r,ws,?wa,ws,?wa,Z,?Z,d,d,m,wa,*c*l,!r,*h,!r,!r,!r,r,d")
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,r,!r,ws,?wa,ws,?wa,Z,?Z,m,d,d,wa,*c*l,!r,*h,!r,!r,!r,r,d")
 	(match_operand:DF 1 "input_operand" "r,Y,r,ws,?wa,Z,Z,ws,wa,d,m,d,j,r,h,0,G,H,F,d,r"))]
   "TARGET_POWERPC64 && TARGET_MFPGPR && TARGET_HARD_FLOAT && TARGET_FPRS 
    && TARGET_DOUBLE_FLOAT
@@ -9759,9 +9757,9 @@ 
    lxsd%U1x %x0,%y1
    stxsd%U0x %x1,%y0
    stxsd%U0x %x1,%y0
+   stfd%U0%X0 %1,%0
+   lfd%U1%X1 %0,%1
    fmr %0,%1
-   lfd%U1%X1 %0,%1
-   stfd%U0%X0 %1,%0
    xxlxor %x0,%x0,%x0
    mt%0 %1
    mf%1 %0
@@ -9771,13 +9769,13 @@ 
    #
    mftgpr %0,%1
    mffgpr %0,%1"
-  [(set_attr "type" "store,load,*,fp,fp,fpload,fpload,fpstore,fpstore,fp,fpload,fpstore,vecsimple,mtjmpr,mfjmpr,*,*,*,*,mftgpr,mffgpr")
+  [(set_attr "type" "store,load,*,fp,fp,fpload,fpload,fpstore,fpstore,fpstore,fpload,fp,vecsimple,mtjmpr,mfjmpr,*,*,*,*,mftgpr,mffgpr")
    (set_attr "length" "4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,8,12,16,4,4")])
 
 ; ld/std require word-aligned displacements -> 'Y' constraint.
 ; List Y->r and r->Y before r->r for reload.
 (define_insn "*movdf_hardfloat64"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,r,!r,ws,?wa,ws,?wa,Z,?Z,d,d,m,wa,*c*l,!r,*h,!r,!r,!r")
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,r,!r,ws,?wa,ws,?wa,Z,?Z,m,d,d,wa,*c*l,!r,*h,!r,!r,!r")
 	(match_operand:DF 1 "input_operand" "r,Y,r,ws,wa,Z,Z,ws,wa,d,m,d,j,r,h,0,G,H,F"))]
   "TARGET_POWERPC64 && !TARGET_MFPGPR && TARGET_HARD_FLOAT && TARGET_FPRS 
    && TARGET_DOUBLE_FLOAT
@@ -9793,9 +9791,9 @@ 
    lxsd%U1x %x0,%y1
    stxsd%U0x %x1,%y0
    stxsd%U0x %x1,%y0
+   stfd%U0%X0 %1,%0
+   lfd%U1%X1 %0,%1
    fmr %0,%1
-   lfd%U1%X1 %0,%1
-   stfd%U0%X0 %1,%0
    xxlxor %x0,%x0,%x0
    mt%0 %1
    mf%1 %0
@@ -9803,18 +9801,18 @@ 
    #
    #
    #"
-  [(set_attr "type" "store,load,*,fp,fp,fpload,fpload,fpstore,fpstore,fp,fpload,fpstore,vecsimple,mtjmpr,mfjmpr,*,*,*,*")
+  [(set_attr "type" "store,load,*,fp,fp,fpload,fpload,fpstore,fpstore,fpstore,fpload,fp,vecsimple,mtjmpr,mfjmpr,*,*,*,*")
    (set_attr "length" "4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,8,12,16")])
 
 (define_insn "*movdf_softfloat64"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=r,Y,r,cl,r,r,r,r,*h")
-	(match_operand:DF 1 "input_operand" "Y,r,r,r,h,G,H,F,0"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,r,r,cl,r,r,r,r,*h")
+	(match_operand:DF 1 "input_operand" "r,Y,r,r,h,G,H,F,0"))]
   "TARGET_POWERPC64 && (TARGET_SOFT_FLOAT || !TARGET_FPRS)
    && (gpc_reg_operand (operands[0], DFmode)
        || gpc_reg_operand (operands[1], DFmode))"
   "@
+   std%U0%X0 %1,%0
    ld%U1%X1 %0,%1
-   std%U0%X0 %1,%0
    mr %0,%1
    mt%0 %1
    mf%1 %0
@@ -9822,7 +9820,7 @@ 
    #
    #
    {cror 0,0,0|nop}"
-  [(set_attr "type" "load,store,*,mtjmpr,mfjmpr,*,*,*,*")
+  [(set_attr "type" "store,load,*,mtjmpr,mfjmpr,*,*,*,*")
    (set_attr "length" "4,4,4,4,4,8,12,16,4")])
 
 (define_expand "movtf"
@@ -9831,12 +9829,12 @@ 
   "!TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128"
   "{ rs6000_emit_move (operands[0], operands[1], TFmode); DONE; }")
 
-; It's important to list the o->f and f->o moves before f->f because
-; otherwise reload, given m->f, will try to pick f->f and reload it,
-; which doesn't make progress.  Likewise r->Y must be before r->r.
+; It's important to list the wY->d and d->wY moves before d->d because
+; otherwise reload, given m->d, will try to pick d->d and reload it,
+; which doesn't make progress.  Likewise Y->r and r->Y must be before r->r.
 (define_insn_and_split "*movtf_internal"
-  [(set (match_operand:TF 0 "nonimmediate_operand" "=o,d,d,r,Y,r")
-	(match_operand:TF 1 "input_operand"         "d,o,d,YGHF,r,r"))]
+  [(set (match_operand:TF 0 "nonimmediate_operand" "=wY,d,d,Y,r,r")
+	(match_operand:TF 1 "input_operand" "d,wY,d,r,YGHF,r"))]
   "!TARGET_IEEEQUAD
    && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_LONG_DOUBLE_128
    && (gpc_reg_operand (operands[0], TFmode)
@@ -9848,8 +9846,8 @@ 
   [(set_attr "length" "8,8,8,20,20,16")])
 
 (define_insn_and_split "*movtf_softfloat"
-  [(set (match_operand:TF 0 "rs6000_nonimmediate_operand" "=r,Y,r")
-	(match_operand:TF 1 "input_operand"         "YGHF,r,r"))]
+  [(set (match_operand:TF 0 "rs6000_nonimmediate_operand" "=Y,r,r")
+	(match_operand:TF 1 "input_operand"         "r,YGHF,r"))]
   "!TARGET_IEEEQUAD
    && (TARGET_SOFT_FLOAT || !TARGET_FPRS) && TARGET_LONG_DOUBLE_128
    && (gpc_reg_operand (operands[0], TFmode)
@@ -9890,9 +9888,9 @@ 
 })
 
 (define_insn_and_split "*extenddftf2_internal"
-  [(set (match_operand:TF 0 "nonimmediate_operand" "=o,d,&d,r")
-       (float_extend:TF (match_operand:DF 1 "input_operand" "dr,md,md,rmGHF")))
-   (use (match_operand:DF 2 "zero_reg_mem_operand" "rd,m,d,n"))]
+  [(set (match_operand:TF 0 "nonimmediate_operand" "=wY,Y,d,&d,r")
+       (float_extend:TF (match_operand:DF 1 "input_operand" "d,r,md,md,rmGHF")))
+   (use (match_operand:DF 2 "zero_reg_mem_operand" "d,r,m,d,n"))]
   "!TARGET_IEEEQUAD
    && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT 
    && TARGET_LONG_DOUBLE_128"
@@ -10145,11 +10143,13 @@ 
 ;; Next come the multi-word integer load and store and the load and store
 ;; multiple insns.
 
-; List r->r after r->"o<>", otherwise reload will try to reload a
-; non-offsettable address by using r->r which won't make progress.
+;; List r->r after r->Y, otherwise reload will try to reload a
+;; non-offsettable address by using r->r which won't make progress.
+;; Use of fprs is disparaged slightly, otherwise reload prefers to reload
+;; a gpr into a fpr instead of reloading an invalid 'Y' address.
 (define_insn "*movdi_internal32"
-  [(set (match_operand:DI 0 "rs6000_nonimmediate_operand" "=o<>,r,r,*d,*d,m,r,?wa")
-	(match_operand:DI 1 "input_operand" "r,r,m,d,m,d,IJKnGHF,O"))]
+  [(set (match_operand:DI 0 "rs6000_nonimmediate_operand" "=Y,r,r,?m,?*d,?*d,r,?wa")
+	(match_operand:DI 1 "input_operand" "r,Y,r,d,m,d,IJKnGHF,O"))]
   "! TARGET_POWERPC64
    && (gpc_reg_operand (operands[0], DImode)
        || gpc_reg_operand (operands[1], DImode))"
@@ -10157,12 +10157,12 @@ 
    #
    #
    #
+   stfd%U0%X0 %1,%0
+   lfd%U1%X1 %0,%1
    fmr %0,%1
-   lfd%U1%X1 %0,%1
-   stfd%U0%X0 %1,%0
    #
    xxlxor %x0,%x0,%x0"
-  [(set_attr "type" "load,*,store,fp,fpload,fpstore,*,vecsimple")])
+  [(set_attr "type" "store,load,*,fpstore,fpload,fp,*,vecsimple")])
 
 (define_split
   [(set (match_operand:DI 0 "gpc_reg_operand" "")
@@ -10195,50 +10195,50 @@ 
 { rs6000_split_multireg_move (operands[0], operands[1]); DONE; })
 
 (define_insn "*movdi_mfpgpr"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,m,r,r,r,*d,*d,m,r,*h,*h,r,*d")
-	(match_operand:DI 1 "input_operand" "r,m,r,I,L,nF,d,m,d,*h,r,0,*d,r"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=Y,r,r,r,r,r,?m,?*d,?*d,r,*h,*h,r,?*d")
+	(match_operand:DI 1 "input_operand" "r,Y,r,I,L,nF,d,m,d,*h,r,0,*d,r"))]
   "TARGET_POWERPC64 && TARGET_MFPGPR && TARGET_HARD_FLOAT && TARGET_FPRS
    && (gpc_reg_operand (operands[0], DImode)
        || gpc_reg_operand (operands[1], DImode))"
   "@
+   std%U0%X0 %1,%0
+   ld%U1%X1 %0,%1
    mr %0,%1
-   ld%U1%X1 %0,%1
-   std%U0%X0 %1,%0
    li %0,%1
    lis %0,%v1
    #
+   stfd%U0%X0 %1,%0
+   lfd%U1%X1 %0,%1
    fmr %0,%1
-   lfd%U1%X1 %0,%1
-   stfd%U0%X0 %1,%0
    mf%1 %0
    mt%0 %1
    {cror 0,0,0|nop}
    mftgpr %0,%1
    mffgpr %0,%1"
-  [(set_attr "type" "*,load,store,*,*,*,fp,fpload,fpstore,mfjmpr,mtjmpr,*,mftgpr,mffgpr")
+  [(set_attr "type" "store,load,*,*,*,*,fpstore,fpload,fp,mfjmpr,mtjmpr,*,mftgpr,mffgpr")
    (set_attr "length" "4,4,4,4,4,20,4,4,4,4,4,4,4,4")])
 
 (define_insn "*movdi_internal64"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,m,r,r,r,*d,*d,m,r,*h,*h,?wa")
-	(match_operand:DI 1 "input_operand" "r,m,r,I,L,nF,d,m,d,*h,r,0,O"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=Y,r,r,r,r,r,?m,?*d,?*d,r,*h,*h,?wa")
+	(match_operand:DI 1 "input_operand" "r,Y,r,I,L,nF,d,m,d,*h,r,0,O"))]
   "TARGET_POWERPC64 && (!TARGET_MFPGPR || !TARGET_HARD_FLOAT || !TARGET_FPRS)
    && (gpc_reg_operand (operands[0], DImode)
        || gpc_reg_operand (operands[1], DImode))"
   "@
+   std%U0%X0 %1,%0
+   ld%U1%X1 %0,%1
    mr %0,%1
-   ld%U1%X1 %0,%1
-   std%U0%X0 %1,%0
    li %0,%1
    lis %0,%v1
    #
+   stfd%U0%X0 %1,%0
+   lfd%U1%X1 %0,%1
    fmr %0,%1
-   lfd%U1%X1 %0,%1
-   stfd%U0%X0 %1,%0
    mf%1 %0
    mt%0 %1
    {cror 0,0,0|nop}
    xxlxor %x0,%x0,%x0"
-  [(set_attr "type" "*,load,store,*,*,*,fp,fpload,fpstore,mfjmpr,mtjmpr,*,vecsimple")
+  [(set_attr "type" "store,load,*,*,*,*,fpstore,fpload,fp,mfjmpr,mtjmpr,*,vecsimple")
    (set_attr "length" "4,4,4,4,4,20,4,4,4,4,4,4,4")])
 
 ;; immediate value valid for a single instruction hiding in a const_double
@@ -10313,8 +10313,8 @@ 
 ;; giving the SCRATCH mq.
 
 (define_insn "*movti_power"
-  [(set (match_operand:TI 0 "reg_or_mem_operand" "=Q,m,????r,????r,????r,r")
-	(match_operand:TI 1 "input_operand" "r,r,r,Q,m,n"))
+  [(set (match_operand:TI 0 "reg_or_mem_operand" "=Q,Y,????r,????r,????r,r")
+	(match_operand:TI 1 "input_operand" "r,r,Q,Y,r,n"))
    (clobber (match_scratch:SI 2 "=q,q#X,X,X,X,X"))]
   "TARGET_POWER && ! TARGET_POWERPC64
    && (gpc_reg_operand (operands[0], TImode) || gpc_reg_operand (operands[1], TImode))"
@@ -10329,25 +10329,25 @@ 
       if (TARGET_STRING)
         return \"{stsi|stswi} %1,%P0,16\";
     case 1:
+      return \"#\";
     case 2:
-      return \"#\";
-    case 3:
       /* If the address is not used in the output, we can use lsi.  Otherwise,
 	 fall through to generating four loads.  */
       if (TARGET_STRING
 	  && ! reg_overlap_mentioned_p (operands[0], operands[1]))
 	return \"{lsi|lswi} %0,%P1,16\";
       /* ... fall through ...  */
+    case 3:
     case 4:
     case 5:
       return \"#\";
     }
 }"
-  [(set_attr "type" "store,store,*,load,load,*")])
+  [(set_attr "type" "store,store,load,load,*,*")])
 
 (define_insn "*movti_string"
-  [(set (match_operand:TI 0 "reg_or_mem_operand" "=Q,o<>,????r,????r,????r,r")
-	(match_operand:TI 1 "input_operand" "r,r,r,Q,m,n"))]
+  [(set (match_operand:TI 0 "reg_or_mem_operand" "=Q,Y,????r,????r,????r,r")
+	(match_operand:TI 1 "input_operand" "r,r,Q,Y,r,n"))]
   "! TARGET_POWER && ! TARGET_POWERPC64
    && (gpc_reg_operand (operands[0], TImode) || gpc_reg_operand (operands[1], TImode))"
   "*
@@ -10360,33 +10360,33 @@ 
       if (TARGET_STRING)
         return \"{stsi|stswi} %1,%P0,16\";
     case 1:
+      return \"#\";
     case 2:
-      return \"#\";
-    case 3:
       /* If the address is not used in the output, we can use lsi.  Otherwise,
 	 fall through to generating four loads.  */
       if (TARGET_STRING
           && ! reg_overlap_mentioned_p (operands[0], operands[1]))
 	return \"{lsi|lswi} %0,%P1,16\";
       /* ... fall through ...  */
+    case 3:
     case 4:
     case 5:
       return \"#\";
     }
 }"
-  [(set_attr "type" "store_ux,store_ux,*,load_ux,load_ux,*")
+  [(set_attr "type" "store_ux,store_ux,load_ux,load_ux,*,*")
    (set (attr "cell_micro") (if_then_else (match_test "TARGET_STRING")
    			                  (const_string "always")
 					  (const_string "conditional")))])
 
 (define_insn "*movti_ppc64"
-  [(set (match_operand:TI 0 "nonimmediate_operand" "=r,o<>,r")
-	(match_operand:TI 1 "input_operand" "r,r,m"))]
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=Y,r,r")
+	(match_operand:TI 1 "input_operand" "r,Y,r"))]
   "(TARGET_POWERPC64 && (gpc_reg_operand (operands[0], TImode)
     || gpc_reg_operand (operands[1], TImode)))
    && VECTOR_MEM_NONE_P (TImode)"
   "#"
-  [(set_attr "type" "*,store,load")])
+  [(set_attr "type" "store,load,*")])
 
 (define_split
   [(set (match_operand:TI 0 "gpc_reg_operand" "")
@@ -13215,8 +13215,8 @@ 
    (set_attr "length" "12")])
 
 (define_insn "stack_protect_setdi"
-  [(set (match_operand:DI 0 "memory_operand" "=m")
-	(unspec:DI [(match_operand:DI 1 "memory_operand" "m")] UNSPEC_SP_SET))
+  [(set (match_operand:DI 0 "memory_operand" "=Y")
+	(unspec:DI [(match_operand:DI 1 "memory_operand" "Y")] UNSPEC_SP_SET))
    (set (match_scratch:DI 2 "=&r") (const_int 0))]
   "TARGET_64BIT"
   "ld%U1%X1 %2,%1\;std%U0%X0 %2,%0\;{lil|li} %2,0"
@@ -13257,8 +13257,8 @@ 
 
 (define_insn "stack_protect_testdi"
   [(set (match_operand:CCEQ 0 "cc_reg_operand" "=x,?y")
-        (unspec:CCEQ [(match_operand:DI 1 "memory_operand" "m,m")
-		      (match_operand:DI 2 "memory_operand" "m,m")]
+        (unspec:CCEQ [(match_operand:DI 1 "memory_operand" "Y,Y")
+		      (match_operand:DI 2 "memory_operand" "Y,Y")]
 		     UNSPEC_SP_TEST))
    (set (match_scratch:DI 4 "=r,r") (const_int 0))
    (clobber (match_scratch:DI 3 "=&r,&r"))]