Add PowerPC ISA 3.0 vector d-form addressing

Message ID 20160503223955.GA12329@ibm-tiger.the-meissners.org
State New

Commit Message

Michael Meissner May 3, 2016, 10:39 p.m. UTC
This patch implements the new instructions added in ISA 3.0 (power9) to allow
d-form (register + offset) memory loads and stores to/from vector registers.
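
As a rough illustration of the kind of source this affects, the sketch below accesses 16-byte vectors at small constant offsets from a base pointer. It assumes the GCC vector extension (`vector_size`); the function names are hypothetical and are not taken from the patch or its tests. Whether a given compiler actually selects a d-form instruction for these accesses depends on the target options in effect.

```c
/* 16-byte vector of four ints, using the GCC vector extension.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Load the vector located 32 bytes past P (register + constant offset).  */
v4si
load_at_offset_32 (const v4si *p)
{
  return p[2];
}

/* Store V to the location 16 bytes past P.  */
void
store_at_offset_16 (v4si *p, v4si v)
{
  p[1] = v;
}
```

Both offsets are multiples of 16, which is the form a register + offset vector access needs.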

I split the previous -mpower9-dform switch to -mpower9-dform-vector and
-mpower9-dform-scalar in case you need to disable one or both forms.  The
vector d-form instructions are more restricted than the scalar instructions in
that they use a 12-bit offset (like the lq/stq instructions that operate on GPR
registers).
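
The offset restriction can be sketched as a standalone check. This mirrors the test that `quad_address_offset_p` performs in the patch (signed 16-bit range, low four bits zero), which corresponds to a 12-bit field scaled by 16. The name `quad_offset_ok` is hypothetical and the function is only an illustration outside of GCC.

```c
#include <stdbool.h>

/* Return true if OFFSET fits the restricted quad d-form encoding:
   within the signed 16-bit range and a multiple of 16.  */
static bool
quad_offset_ok (long offset)
{
  return offset >= -32768 && offset <= 32767 && (offset & 0xf) == 0;
}
```

Under this check, offsets such as 0, 16, and -32768 are accepted, while 24 (misaligned) and 32768 (out of range) are rejected.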

Note, -mpower9-dform-vector is not compatible with RELOAD. I believe the
problem is that we are missing some push_reloads, and it winds up with an
invalid memory address that includes another memory address. Given that we plan
to move from RELOAD to LRA, I stopped trying to debug it, and enabled it only
if LRA is used.

Right now, we cannot move to LRA as the default until the performance
degradations that LRA causes in the 403.gcc SPEC benchmark are fixed:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69847

With this patch, I enable -mlra if the user did not specify either -mlra or
-mno-lra on the command line, and -mcpu=power9 or -mpower9-dform-vector were
used. I also enable -mvsx-timode when LRA is used, since it likewise has
problems under RELOAD but works with LRA.
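
The defaulting rules can be sketched with a pair of bit masks, one for the current flags and one recording which options were given explicitly. This is a simplified model of the logic in `rs6000_option_override_internal`, not the GCC code itself: the `OPT_*` names and `resolve_defaults` are hypothetical, and the warnings and the `TARGET_VSX` check from the patch are omitted.

```c
/* Hypothetical option bits for the sketch.  */
#define OPT_LRA           0x1u
#define OPT_DFORM_VECTOR  0x2u
#define OPT_VSX_TIMODE    0x4u

/* FLAGS holds the options currently on; EXPLICIT_MASK marks options the
   user set on the command line (either -mfoo or -mno-foo).  */
static unsigned
resolve_defaults (unsigned flags, unsigned explicit_mask)
{
  if (!(flags & OPT_LRA))
    {
      if (!(explicit_mask & OPT_LRA))
        {
          /* LRA was not mentioned: turn it on if vector d-form is wanted.  */
          if (flags & OPT_DFORM_VECTOR)
            flags |= OPT_LRA;
        }
      else
        {
          /* Explicit -mno-lra: drop the options that misbehave under
             RELOAD, unless the user asked for them explicitly.  */
          if (!(explicit_mask & OPT_VSX_TIMODE))
            flags &= ~OPT_VSX_TIMODE;
          if (!(explicit_mask & OPT_DFORM_VECTOR))
            flags &= ~OPT_DFORM_VECTOR;
        }
    }

  /* With LRA on, enable vsx-timode unless the user set it either way.  */
  if ((flags & OPT_LRA) && !(flags & OPT_VSX_TIMODE)
      && !(explicit_mask & OPT_VSX_TIMODE))
    flags |= OPT_VSX_TIMODE;

  return flags;
}
```

In this model, requesting vector d-form with no LRA option ends up with all three bits set, while an explicit -mno-lra combined with an implicit vector d-form default clears the d-form bit.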

I have built the SPEC 2006 CPU benchmarks with this option, and all benchmarks
generate vector d-form instructions (mcf only generates stores, not loads).

This patch bootstraps fine on a little endian Power8 compiler and has no
regressions. Is it ok to install in the trunk?

How about in the gcc 6.2 branch after a burn-in period?

[gcc]
2016-05-03  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_SERVER): Make -mlra
	an option mask instead of setting a separate word.  Add -mlra and
	-mvsx-timode as defaults for power9. Split -mpower9-dform into
	-mpower9-dform-scalar and -mpower9-dform-vector. Add support for
	ISA 3.0 vector d-form instructions. Set -mlra by default if
	-mpower9-dform-vector. Set -mvsx-timode if -mlra. Add more debug
	printouts. If we have ISA 3.0 d-form vector instructions use them
	for the epilog and prolog. Add wO constraint for ISA 3.0 vector
	d-form instructions. Rewrite quad memory support to support both
	lq/stq for GPRs and ISA 3.0 vector d-forms for vector registers.
	Delete p9_vecload_<mode> and p9_vecstore_<mode> in favor of
	folding the ISA 3.0 endian load/store into the general mov<mode>
	insns.
	(POWERPC_MASKS): Likewise.
	* config/rs6000/rs6000.opt (-mlra): Likewise.
	(-mpower9-dform): Likewise.
	(-mpower9-dform-scalar): Likewise.
	(-mpower9-dform-vector): Likewise.
	* config/rs6000/rs6000.c (RELOAD_REG_QUAD_OFFSET): Likewise.
	(mode_supports_vsx_dform_quad): Likewise.
	(rs6000_debug_addr_mask): Likewise.
	(rs6000_setup_reg_addr_masks): Likewise.
	(rs6000_option_override_internal): Likewise.
	(quad_address_offset_p): Likewise.
	(mem_operand_gpr): Likewise.
	(reg_offset_addressing_ok_p): Likewise.
	(offsettable_ok_by_alignment): Likewise.
	(rs6000_legitimate_offset_address_p): Likewise.
	(legitimate_lo_sum_address_p): Likewise.
	(rs6000_legitimize_address): Likewise.
	(rs6000_legitimize_reload_address): Likewise.
	(rs6000_legitimate_address_p): Likewise.
	(rs6000_secondary_reload_memory): Likewise.
	(rs6000_secondary_reload_inner): Likewise.
	(rs6000_preferred_reload_class): Likewise.
	(rs6000_output_move_128bit): Likewise.
	(rs6000_emit_prologue): Likewise.
	(rs6000_emit_epilogue): Likewise.
	(rs6000_lra_p): Likewise.
	(rs6000_opt_masks): Likewise.
	(rs6000_print_options_internal): Likewise.
	* config/rs6000/constraints.md (wO constraint): Likewise.
	* config/rs6000/predicates.md (quad_memory_operand): Likewise.
	(vsx_quad_dform_memory_operand): Likewise.
	* config/rs6000/rs6000-protos.h (quad_address_p): Likewise.
	* config/rs6000/vsx.md (p9_vecload_<mode>): Likewise.
	(p9_vecstore_<mode>): Likewise.
	(vsx_mov<mode>): Likewise.
	(vsx_movti_64bit): Likewise.
	(vsx_movti_32bit): Likewise.
	* doc/invoke.texi (RS/6000 and PowerPC Options): Likewise.
	* doc/md.texi (wO constraint): Likewise.

[gcc/testsuite]
2016-05-03  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/dform-1.c: Add -mlra to options.
	* gcc.target/powerpc/dform-2.c: Likewise.
	* gcc.target/powerpc/dform-3.c: New test for ISA 3.0 vector d-form
	instructions.

Comments

Segher Boessenkool May 4, 2016, 4:16 p.m. UTC | #1
Hi Mike,

On Tue, May 03, 2016 at 06:39:55PM -0400, Michael Meissner wrote:
> With this patch, I enable -mlra if the user did not specify either -mlra or
> -mno-lra on the command line, and -mcpu=power9 or -mpower9-dform-vector were
> used. I also enabled -mvsx-timode if LRA was used, which also is a RELOAD
> issue, that works with LRA.

I don't like enabling LRA if the user didn't ask for it; it is a bit too
surprising.  What do you do if there is -mno-lra explicitly?  You can just
do the same if no-lra is implicit?

> 	* doc/md.texi (wO constraint): Likewise.

Everything is "likewise", that isn't very helpful.  Writing big changelogs
is annoying, I totally agree, but please try a bit harder.

> --- gcc/config/rs6000/rs6000.opt	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 235831)
> +++ gcc/config/rs6000/rs6000.opt	(.../gcc/config/rs6000)	(working copy)
> @@ -470,8 +470,8 @@ Target RejectNegative Joined UInteger Va
>  -mlong-double-<n>	Specify size of long double (64 or 128 bits).
>  
>  mlra
> -Target Report Var(rs6000_lra_flag) Init(0) Save
> -Use LRA instead of reload.
> +Target Undocumented Mask(LRA) Var(rs6000_isa_flags)
> +Use the LRA register allocator instead of the reload register allocator.

It wasn't "undocumented" before?  Why the change to a mask bit btw?

> +mpower9-dform-scalar
> +Target Report Mask(P9_DFORM_SCALAR) Var(rs6000_isa_flags)
> +Use/do not use scalar register+offset memory instructions added in ISA 3.0.
> +
> +mpower9-dform-vector
> +Target Report Mask(P9_DFORM_VECTOR) Var(rs6000_isa_flags)
> +Use/do not use vector register+offset memory instructions added in ISA 3.0.
> +
>  mpower9-dform
> -Target Undocumented Mask(P9_DFORM) Var(rs6000_isa_flags)
> -Use/do not use vector and scalar instructions added in ISA 3.0.
> +Target Report Var(TARGET_P9_DFORM_BOTH) Init(-1) Save
> +Use/do not use register+offset memory instructions added in ISA 3.0.

These should probably all be undocumented, though (they're not something
users should use).

> +/* Return true if the ADDR is an acceptiable address for a quad memory
                                    ^ spelling

> +	  if (((addr_mask & RELOAD_REG_QUAD_OFFSET) == 0)
> +	      || !quad_address_p (addr, mode, false))

You can lose some parens here, i.e.

+	  if ((addr_mask & RELOAD_REG_QUAD_OFFSET) == 0
+	      || !quad_address_p (addr, mode, false))


Segher

Patch

Index: gcc/config/rs6000/rs6000-cpus.def
===================================================================
--- gcc/config/rs6000/rs6000-cpus.def	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 235831)
+++ gcc/config/rs6000/rs6000-cpus.def	(.../gcc/config/rs6000)	(working copy)
@@ -60,14 +60,17 @@ 
 				 | OPTION_MASK_UPPER_REGS_SF)
 
 /* Add ISEL back into ISA 3.0, since it is supposed to be a win.  Do not add
-   P9_DFORM or P9_MINMAX until they are fully debugged.  */
+   P9_MINMAX until the hardware that supports it is available.  */
 #define ISA_3_0_MASKS_SERVER	(ISA_2_7_MASKS_SERVER			\
 				 | OPTION_MASK_FLOAT128_HW		\
 				 | OPTION_MASK_ISEL			\
+				 | OPTION_MASK_LRA			\
 				 | OPTION_MASK_MODULO			\
 				 | OPTION_MASK_P9_FUSION		\
-				 | OPTION_MASK_P9_DFORM			\
-				 | OPTION_MASK_P9_VECTOR)
+				 | OPTION_MASK_P9_DFORM_SCALAR		\
+				 | OPTION_MASK_P9_DFORM_VECTOR		\
+				 | OPTION_MASK_P9_VECTOR		\
+				 | OPTION_MASK_VSX_TIMODE)
 
 #define POWERPC_7400_MASK	(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_ALTIVEC)
 
@@ -94,6 +97,7 @@ 
 				 | OPTION_MASK_FPRND			\
 				 | OPTION_MASK_HTM			\
 				 | OPTION_MASK_ISEL			\
+				 | OPTION_MASK_LRA			\
 				 | OPTION_MASK_MFCRF			\
 				 | OPTION_MASK_MFPGPR			\
 				 | OPTION_MASK_MODULO			\
@@ -101,7 +105,8 @@ 
 				 | OPTION_MASK_NO_UPDATE		\
 				 | OPTION_MASK_P8_FUSION		\
 				 | OPTION_MASK_P8_VECTOR		\
-				 | OPTION_MASK_P9_DFORM			\
+				 | OPTION_MASK_P9_DFORM_SCALAR		\
+				 | OPTION_MASK_P9_DFORM_VECTOR		\
 				 | OPTION_MASK_P9_FUSION		\
 				 | OPTION_MASK_P9_MINMAX		\
 				 | OPTION_MASK_P9_VECTOR		\
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 235831)
+++ gcc/config/rs6000/rs6000.opt	(.../gcc/config/rs6000)	(working copy)
@@ -470,8 +470,8 @@  Target RejectNegative Joined UInteger Va
 -mlong-double-<n>	Specify size of long double (64 or 128 bits).
 
 mlra
-Target Report Var(rs6000_lra_flag) Init(0) Save
-Use LRA instead of reload.
+Target Undocumented Mask(LRA) Var(rs6000_isa_flags)
+Use the LRA register allocator instead of the reload register allocator.
 
 msched-costly-dep=
 Target RejectNegative Joined Var(rs6000_sched_costly_dep_str)
@@ -609,9 +609,17 @@  mpower9-vector
 Target Report Mask(P9_VECTOR) Var(rs6000_isa_flags)
 Use/do not use vector and scalar instructions added in ISA 3.0.
 
+mpower9-dform-scalar
+Target Report Mask(P9_DFORM_SCALAR) Var(rs6000_isa_flags)
+Use/do not use scalar register+offset memory instructions added in ISA 3.0.
+
+mpower9-dform-vector
+Target Report Mask(P9_DFORM_VECTOR) Var(rs6000_isa_flags)
+Use/do not use vector register+offset memory instructions added in ISA 3.0.
+
 mpower9-dform
-Target Undocumented Mask(P9_DFORM) Var(rs6000_isa_flags)
-Use/do not use vector and scalar instructions added in ISA 3.0.
+Target Report Var(TARGET_P9_DFORM_BOTH) Init(-1) Save
+Use/do not use register+offset memory instructions added in ISA 3.0.
 
 mpower9-minmax
 Target Undocumented Mask(P9_MINMAX) Var(rs6000_isa_flags)
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 235831)
+++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000)	(working copy)
@@ -452,6 +452,7 @@  typedef unsigned char addr_mask_type;
 #define RELOAD_REG_PRE_INCDEC	0x10	/* PRE_INC/PRE_DEC valid.  */
 #define RELOAD_REG_PRE_MODIFY	0x20	/* PRE_MODIFY valid.  */
 #define RELOAD_REG_AND_M16	0x40	/* AND -16 addressing.  */
+#define RELOAD_REG_QUAD_OFFSET	0x80	/* quad offset is limited.  */
 
 /* Register type masks based on the type, of valid addressing modes.  */
 struct rs6000_reg_addr {
@@ -499,6 +500,16 @@  mode_supports_vmx_dform (machine_mode mo
   return ((reg_addr[mode].addr_mask[RELOAD_REG_VMX] & RELOAD_REG_OFFSET) != 0);
 }
 
+/* Return true if we have D-form addressing in VSX registers.  This addressing
+   is more limited than normal d-form addressing in that the offset must be
+   aligned on a 16-byte boundary.  */
+static inline bool
+mode_supports_vsx_dform_quad (machine_mode mode)
+{
+  return ((reg_addr[mode].addr_mask[RELOAD_REG_ANY] & RELOAD_REG_QUAD_OFFSET)
+	  != 0);
+}
+
 
 /* Target cpu costs.  */
 
@@ -2108,7 +2119,9 @@  rs6000_debug_addr_mask (addr_mask_type m
   else if (keep_spaces)
     *p++ = ' ';
 
-  if ((mask & RELOAD_REG_OFFSET) != 0)
+  if ((mask & RELOAD_REG_QUAD_OFFSET) != 0)
+    *p++ = 'O';
+  else if ((mask & RELOAD_REG_OFFSET) != 0)
     *p++ = 'o';
   else if (keep_spaces)
     *p++ = ' ';
@@ -2645,9 +2658,6 @@  rs6000_debug_reg_global (void)
   if (TARGET_LINK_STACK)
     fprintf (stderr, DEBUG_FMT_S, "link_stack", "true");
 
-  if (targetm.lra_p ())
-    fprintf (stderr, DEBUG_FMT_S, "lra", "true");
-
   if (TARGET_P8_FUSION)
     {
       char options[80];
@@ -2781,17 +2791,31 @@  rs6000_setup_reg_addr_masks (void)
 	    }
 
 	  /* GPR and FPR registers can do REG+OFFSET addressing, except
-	     possibly for SDmode.  ISA 3.0 (i.e. power9) adds D-form
-	     addressing for scalars to altivec registers.  */
+	     possibly for SDmode.  ISA 3.0 (i.e. power9) adds D-form addressing
+	     for 64-bit scalars and 32-bit SFmode to altivec registers.  */
 	  if ((addr_mask != 0) && !indexed_only_p
 	      && msize <= 8
 	      && (rc == RELOAD_REG_GPR
-		  || rc == RELOAD_REG_FPR
-		  || (rc == RELOAD_REG_VMX
-		      && TARGET_P9_DFORM
-		      && (m2 == DFmode || m2 == SFmode))))
+		  || ((msize == 8 || m2 == SFmode)
+		      && (rc == RELOAD_REG_FPR
+			  || (rc == RELOAD_REG_VMX
+			      && TARGET_P9_DFORM_SCALAR)))))
 	    addr_mask |= RELOAD_REG_OFFSET;
 
+	  /* VSX registers can do REG+OFFSET addresssing if ISA 3.0
+	     instructions are enabled.  The offset for 128-bit VSX registers is
+	     only 12-bits.  While GPRs can handle the full offset range, VSX
+	     registers can only handle the restricted range.  */
+	  else if ((addr_mask != 0) && !indexed_only_p
+		   && msize == 16 && TARGET_P9_DFORM_VECTOR
+		   && (ALTIVEC_OR_VSX_VECTOR_MODE (m2)
+		       || (m2 == TImode && TARGET_VSX_TIMODE)))
+	    {
+	      addr_mask |= RELOAD_REG_OFFSET;
+	      if (rc == RELOAD_REG_FPR || rc == RELOAD_REG_VMX)
+		addr_mask |= RELOAD_REG_QUAD_OFFSET;
+	    }
+
 	  /* VMX registers can do (REG & -16) and ((REG+REG) & -16)
 	     addressing on 128-bit types.  */
 	  if (rc == RELOAD_REG_VMX && msize == 16
@@ -3114,7 +3138,7 @@  rs6000_init_hard_regno_mode_ok (bool glo
     }
 
   /* Support for new D-form instructions.  */
-  if (TARGET_P9_DFORM)
+  if (TARGET_P9_DFORM_SCALAR)
     rs6000_constraints[RS6000_CONSTRAINT_wb] = ALTIVEC_REGS;
 
   /* Support for ISA 3.0 (power9) vectors.  */
@@ -3987,7 +4011,8 @@  rs6000_option_override_internal (bool gl
 
   /* For the newer switches (vsx, dfp, etc.) set some of the older options,
      unless the user explicitly used the -mno-<option> to disable the code.  */
-  if (TARGET_P9_VECTOR || TARGET_MODULO || TARGET_P9_DFORM || TARGET_P9_MINMAX)
+  if (TARGET_P9_VECTOR || TARGET_MODULO || TARGET_P9_DFORM_SCALAR
+      || TARGET_P9_DFORM_VECTOR || TARGET_P9_DFORM_BOTH > 0 || TARGET_P9_MINMAX)
     rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~rs6000_isa_flags_explicit);
   else if (TARGET_P8_VECTOR || TARGET_DIRECT_MOVE || TARGET_CRYPTO)
     rs6000_isa_flags |= (ISA_2_7_MASKS_SERVER & ~rs6000_isa_flags_explicit);
@@ -4201,26 +4226,54 @@  rs6000_option_override_internal (bool gl
       && !(rs6000_isa_flags_explicit & OPTION_MASK_TOC_FUSION))
     rs6000_isa_flags |= OPTION_MASK_TOC_FUSION;
 
-  /* ISA 3.0 D-form instructions require p9-vector and upper-regs.  */
-  if (TARGET_P9_DFORM && !TARGET_P9_VECTOR)
+  /* -mpower9-dform turns on both -mpower9-dform-scalar and
+      -mpower9-dform-vector.  */
+  if (TARGET_P9_DFORM_BOTH > 0)
+    {
+      if (!(rs6000_isa_flags_explicit & OPTION_MASK_P9_DFORM_VECTOR))
+	rs6000_isa_flags |= OPTION_MASK_P9_DFORM_VECTOR;
+
+      if (!(rs6000_isa_flags_explicit & OPTION_MASK_P9_DFORM_SCALAR))
+	rs6000_isa_flags |= OPTION_MASK_P9_DFORM_SCALAR;
+    }
+  else if (TARGET_P9_DFORM_BOTH == 0)
+    {
+      if (!(rs6000_isa_flags_explicit & OPTION_MASK_P9_DFORM_VECTOR))
+	rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM_VECTOR;
+
+      if (!(rs6000_isa_flags_explicit & OPTION_MASK_P9_DFORM_SCALAR))
+	rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM_SCALAR;
+    }
+
+  /* ISA 3.0 D-form scalar instructions require p9-vector and upper-regs.  */
+  if (TARGET_P9_DFORM_SCALAR && !TARGET_P9_VECTOR)
     {
       if (rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR)
-	error ("-mpower9-dform requires -mpower9-vector");
-      rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM;
+	error ("-mpower9-dform-scalar requires -mpower9-vector");
+      rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM_SCALAR;
     }
 
-  if (TARGET_P9_DFORM && !TARGET_UPPER_REGS_DF)
+  if (TARGET_P9_DFORM_SCALAR && !TARGET_UPPER_REGS_DF)
     {
       if (rs6000_isa_flags_explicit & OPTION_MASK_UPPER_REGS_DF)
-	error ("-mpower9-dform requires -mupper-regs-df");
-      rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM;
+	error ("-mpower9-dform-scalar requires -mupper-regs-df");
+      rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM_SCALAR;
     }
 
-  if (TARGET_P9_DFORM && !TARGET_UPPER_REGS_SF)
+  if (TARGET_P9_DFORM_SCALAR && !TARGET_UPPER_REGS_SF)
     {
       if (rs6000_isa_flags_explicit & OPTION_MASK_UPPER_REGS_SF)
-	error ("-mpower9-dform requires -mupper-regs-sf");
-      rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM;
+	error ("-mpower9-dform-scalar requires -mupper-regs-sf");
+      rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM_SCALAR;
+    }
+
+  /* ISA 3.0 D-form instructions require p9-vector.  */
+  if (TARGET_P9_DFORM_VECTOR && !TARGET_P9_VECTOR)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR)
+	  && (rs6000_isa_flags_explicit & OPTION_MASK_P9_DFORM_VECTOR))
+	error ("-mpower9-dform-vector requires -mpower9-vector");
+      rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM_VECTOR;
     }
 
   /* ISA 3.0 vector instructions include ISA 2.07.  */
@@ -4231,6 +4284,51 @@  rs6000_option_override_internal (bool gl
       rs6000_isa_flags &= ~OPTION_MASK_P9_VECTOR;
     }
 
+  /* There have been bugs with both -mvsx-timode and -mpower9-dform-vector that
+     don't show up with -mlra, but do show up with -mno-lra.  Given -mlra will
+     become the default once PR 69847 is fixed, turn off the options with
+     problems by default if -mno-lra was used, and warn if the user explicitly
+     asked for the option. Set -mlra if it wasn't set and we want to generate
+     ISA 3.0 vector d-form instructions. Enable vsx-timode by default if
+     LRA.  */
+  if (!TARGET_LRA)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_LRA) == 0)
+	{
+	  if (TARGET_P9_DFORM_VECTOR)
+	    rs6000_isa_flags |= OPTION_MASK_LRA;
+	}
+
+      else
+	{
+	  if (TARGET_VSX_TIMODE)
+	    {
+	      if (rs6000_isa_flags_explicit & OPTION_MASK_VSX_TIMODE)
+		warning (0, "-mno-lra and -mvsx-timode might be incompatible");
+	      else
+		rs6000_isa_flags &= ~OPTION_MASK_VSX_TIMODE;
+	    }
+
+	  if (TARGET_P9_DFORM_VECTOR)
+	    {
+	      if (TARGET_P9_DFORM_BOTH > 0)
+		warning (0, "-mno-lra and -mpower9-dform might be "
+			 "incompatible");
+
+	      else if (rs6000_isa_flags_explicit & OPTION_MASK_P9_DFORM_VECTOR)
+		warning (0, "-mno-lra and -mpower9-dform-vector might be "
+			 "incompatible");
+
+	      else
+		rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM_VECTOR;
+	    }
+	}
+    }
+
+  if (TARGET_LRA && TARGET_VSX && !TARGET_VSX_TIMODE
+      && (rs6000_isa_flags_explicit & OPTION_MASK_VSX_TIMODE) == 0)
+    rs6000_isa_flags |= OPTION_MASK_VSX_TIMODE;
+
   /* Set -mallow-movmisalign to explicitly on if we have full ISA 2.07
      support. If we only have ISA 2.06 support, and the user did not specify
      the switch, leave it set to -1 so the movmisalign patterns are enabled,
@@ -6915,6 +7013,59 @@  direct_move_p (rtx op0, rtx op1)
   return false;
 }
 
+/* Return true if the OFFSET is valid for the quad address instructions that
+   use d-form (register + offset) addressing.  */
+
+static inline bool
+quad_address_offset_p (HOST_WIDE_INT offset)
+{
+  return (IN_RANGE (offset, -32768, 32767) && ((offset) & 0xf) == 0);
+}
+
+/* Return true if the ADDR is an acceptiable address for a quad memory
+   operation of mode MODE (either LQ/STQ for general purpose registers, or
+   LXV/STXV for vector registers under ISA 3.0.  GPR_P is true if this address
+   is intended for LQ/STQ.  If it is false, the address is intended for the ISA
+   3.0 LXV/STXV instruction.  */
+
+bool
+quad_address_p (rtx addr, machine_mode mode, bool gpr_p)
+{
+  rtx op0, op1;
+
+  if (GET_MODE_SIZE (mode) != 16)
+    return false;
+
+  if (gpr_p)
+    {
+      if (!TARGET_QUAD_MEMORY && !TARGET_SYNC_TI)
+	return false;
+
+      /* LQ/STQ can handle indirect addresses.  */
+      if (base_reg_operand (addr, Pmode))
+	return true;
+    }
+
+  else
+    {
+      if (!mode_supports_vsx_dform_quad (mode))
+	return false;
+    }
+
+  if (GET_CODE (addr) != PLUS)
+    return false;
+
+  op0 = XEXP (addr, 0);
+  if (!base_reg_operand (op0, Pmode))
+    return false;
+
+  op1 = XEXP (addr, 1);
+  if (!CONST_INT_P (op1))
+    return false;
+
+  return quad_address_offset_p (INTVAL (op1));
+}
+
 /* Return true if this is a load or store quad operation.  This function does
    not handle the atomic quad memory instructions.  */
 
@@ -7007,6 +7158,10 @@  mem_operand_gpr (rtx op, machine_mode mo
   if (TARGET_POWERPC64 && (offset & 3) != 0)
     return false;
 
+  if (mode_supports_vsx_dform_quad (mode)
+      && !quad_address_offset_p (offset))
+    return false;
+
   extra = GET_MODE_SIZE (mode) - UNITS_PER_WORD;
   if (extra < 0)
     extra = 0;
@@ -7036,13 +7191,14 @@  reg_offset_addressing_ok_p (machine_mode
     case TImode:
     case TFmode:
     case KFmode:
-      /* AltiVec/VSX vector modes.  Only reg+reg addressing is valid.  While
-	 TImode is not a vector mode, if we want to use the VSX registers to
-	 move it around, we need to restrict ourselves to reg+reg addressing.
-	 Similarly for IEEE 128-bit floating point that is passed in a single
-	 vector register.  */
+      /* AltiVec/VSX vector modes.  Only reg+reg addressing was valid until the
+	 ISA 3.0 vector d-form addressing mode was added.  While TImode is not
+	 a vector mode, if we want to use the VSX registers to move it around,
+	 we need to restrict ourselves to reg+reg addressing.  Similarly for
+	 IEEE 128-bit floating point that is passed in a single vector
+	 register.  */
       if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode))
-	return false;
+	return mode_supports_vsx_dform_quad (mode);
       break;
 
     case V4HImode:
@@ -7109,6 +7265,11 @@  offsettable_ok_by_alignment (rtx op, HOS
   if (GET_CODE (op) != SYMBOL_REF)
     return false;
 
+  /* ISA 3.0 vector d-form addressing is restricted, don't allow
+     SYMBOL_REF.  */
+  if (mode_supports_vsx_dform_quad (mode))
+    return false;
+
   dsize = GET_MODE_SIZE (mode);
   decl = SYMBOL_REF_DECL (op);
   if (!decl)
@@ -7263,6 +7424,9 @@  rs6000_legitimate_offset_address_p (mach
     return false;
   if (!INT_REG_OK_FOR_BASE_P (XEXP (x, 0), strict))
     return false;
+  if (mode_supports_vsx_dform_quad (mode))
+    return (virtual_stack_registers_memory_p (x)
+	    || quad_address_p (x, mode, false));
   if (!reg_offset_addressing_ok_p (mode))
     return virtual_stack_registers_memory_p (x);
   if (legitimate_constant_pool_address_p (x, mode, strict || lra_in_progress))
@@ -7401,6 +7565,9 @@  legitimate_lo_sum_address_p (machine_mod
     return false;
   if (!INT_REG_OK_FOR_BASE_P (XEXP (x, 0), strict))
     return false;
+  /* quad word addresses are restricted, and we can't use LO_SUM.  */
+  if (mode_supports_vsx_dform_quad (mode))
+    return false;
   /* Restrict addressing for DI because of our SUBREG hackery.  */
   if (TARGET_E500_DOUBLE && GET_MODE_SIZE (mode) > UNITS_PER_WORD)
     return false;
@@ -7412,7 +7579,7 @@  legitimate_lo_sum_address_p (machine_mod
 
       if (DEFAULT_ABI == ABI_V4 && flag_pic)
 	return false;
-      /* LRA don't use LEGITIMIZE_RELOAD_ADDRESS as it usually calls
+      /* LRA doesn't use LEGITIMIZE_RELOAD_ADDRESS as it usually calls
 	 push_reload from reload pass code.  LEGITIMIZE_RELOAD_ADDRESS
 	 recognizes some LO_SUM addresses as valid although this
 	 function says opposite.  In most cases, LRA through different
@@ -7466,7 +7633,8 @@  rs6000_legitimize_address (rtx x, rtx ol
 {
   unsigned int extra;
 
-  if (!reg_offset_addressing_ok_p (mode))
+  if (!reg_offset_addressing_ok_p (mode)
+      || mode_supports_vsx_dform_quad (mode))
     {
       if (virtual_stack_registers_memory_p (x))
 	return x;
@@ -8177,6 +8345,11 @@  rs6000_legitimize_reload_address (rtx x,
       && GET_CODE (XEXP (XEXP (x, 0), 1)) == CONST_INT
       && GET_CODE (XEXP (x, 1)) == CONST_INT)
     {
+      if (TARGET_DEBUG_ADDR)
+	{
+	  fprintf (stderr, "\nlegitimize_reload_address push_reload #1:\n");
+	  debug_rtx (x);
+	}
       push_reload (XEXP (x, 0), NULL_RTX, &XEXP (x, 0), NULL,
 		   BASE_REG_CLASS, GET_MODE (x), VOIDmode, 0, 0,
 		   opnum, (enum reload_type) type);
@@ -8188,6 +8361,11 @@  rs6000_legitimize_reload_address (rtx x,
   if (GET_CODE (x) == LO_SUM
       && GET_CODE (XEXP (x, 0)) == HIGH)
     {
+      if (TARGET_DEBUG_ADDR)
+	{
+	  fprintf (stderr, "\nlegitimize_reload_address push_reload #2:\n");
+	  debug_rtx (x);
+	}
       push_reload (XEXP (x, 0), NULL_RTX, &XEXP (x, 0), NULL,
 		   BASE_REG_CLASS, Pmode, VOIDmode, 0, 0,
 		   opnum, (enum reload_type) type);
@@ -8220,6 +8398,11 @@  rs6000_legitimize_reload_address (rtx x,
     {
       rtx hi = gen_rtx_HIGH (Pmode, copy_rtx (x));
       x = gen_rtx_LO_SUM (Pmode, hi, x);
+      if (TARGET_DEBUG_ADDR)
+	{
+	  fprintf (stderr, "\nlegitimize_reload_address push_reload #3:\n");
+	  debug_rtx (x);
+	}
       push_reload (XEXP (x, 0), NULL_RTX, &XEXP (x, 0), NULL,
 		   BASE_REG_CLASS, Pmode, VOIDmode, 0, 0,
 		   opnum, (enum reload_type) type);
@@ -8257,6 +8440,11 @@  rs6000_legitimize_reload_address (rtx x,
 				      GEN_INT (high)),
 			GEN_INT (low));
 
+      if (TARGET_DEBUG_ADDR)
+	{
+	  fprintf (stderr, "\nlegitimize_reload_address push_reload #4:\n");
+	  debug_rtx (x);
+	}
       push_reload (XEXP (x, 0), NULL_RTX, &XEXP (x, 0), NULL,
 		   BASE_REG_CLASS, GET_MODE (x), VOIDmode, 0, 0,
 		   opnum, (enum reload_type) type);
@@ -8317,6 +8505,11 @@  rs6000_legitimize_reload_address (rtx x,
 	x = gen_rtx_LO_SUM (GET_MODE (x),
 	      gen_rtx_HIGH (Pmode, x), x);
 
+      if (TARGET_DEBUG_ADDR)
+	{
+	  fprintf (stderr, "\nlegitimize_reload_address push_reload #5:\n");
+	  debug_rtx (x);
+	}
       push_reload (XEXP (x, 0), NULL_RTX, &XEXP (x, 0), NULL,
 		   BASE_REG_CLASS, Pmode, VOIDmode, 0, 0,
 		   opnum, (enum reload_type) type);
@@ -8350,9 +8543,16 @@  rs6000_legitimize_reload_address (rtx x,
     {
       x = create_TOC_reference (x, NULL_RTX);
       if (TARGET_CMODEL != CMODEL_SMALL)
-	push_reload (XEXP (x, 0), NULL_RTX, &XEXP (x, 0), NULL,
-		     BASE_REG_CLASS, Pmode, VOIDmode, 0, 0,
-		     opnum, (enum reload_type) type);
+	{
+	  if (TARGET_DEBUG_ADDR)
+	    {
+	      fprintf (stderr, "\nlegitimize_reload_address push_reload #6:\n");
+	      debug_rtx (x);
+	    }
+	  push_reload (XEXP (x, 0), NULL_RTX, &XEXP (x, 0), NULL,
+		       BASE_REG_CLASS, Pmode, VOIDmode, 0, 0,
+		       opnum, (enum reload_type) type);
+	}
       *win = 1;
       return x;
     }
@@ -8408,6 +8608,7 @@  static bool
 rs6000_legitimate_address_p (machine_mode mode, rtx x, bool reg_ok_strict)
 {
   bool reg_offset_p = reg_offset_addressing_ok_p (mode);
+  bool quad_offset_p = mode_supports_vsx_dform_quad (mode);
 
   /* If this is an unaligned stvx/ldvx type address, discard the outer AND.  */
   if (VECTOR_MEM_ALTIVEC_P (mode)
@@ -8427,15 +8628,26 @@  rs6000_legitimate_address_p (machine_mod
     return 1;
   if (virtual_stack_registers_memory_p (x))
     return 1;
-  if (reg_offset_p && legitimate_small_data_p (mode, x))
-    return 1;
-  if (reg_offset_p
-      && legitimate_constant_pool_address_p (x, mode,
+
+  /* Handle restricted vector d-form offsets in ISA 3.0.  */
+  if (quad_offset_p)
+    {
+      if (quad_address_p (x, mode, false))
+	return 1;
+    }
+
+  else if (reg_offset_p)
+    {
+      if (legitimate_small_data_p (mode, x))
+	return 1;
+      if (legitimate_constant_pool_address_p (x, mode,
 					     reg_ok_strict || lra_in_progress))
-    return 1;
-  if (reg_offset_p && reg_addr[mode].fused_toc && GET_CODE (x) == UNSPEC
-      && XINT (x, 1) == UNSPEC_FUSION_ADDIS)
-    return 1;
+	return 1;
+      if (reg_addr[mode].fused_toc && GET_CODE (x) == UNSPEC
+	  && XINT (x, 1) == UNSPEC_FUSION_ADDIS)
+	return 1;
+    }
+
   /* For TImode, if we have load/store quad and TImode in VSX registers, only
      allow register indirect addresses.  This will allow the values to go in
      either GPRs or VSX registers without reloading.  The vector types would
@@ -8474,7 +8686,8 @@  rs6000_legitimate_address_p (machine_mod
 	      && legitimate_indexed_address_p (XEXP (x, 1), reg_ok_strict)))
       && rtx_equal_p (XEXP (XEXP (x, 1), 0), XEXP (x, 0)))
     return 1;
-  if (reg_offset_p && legitimate_lo_sum_address_p (mode, x, reg_ok_strict))
+  if (reg_offset_p && !quad_offset_p
+      && legitimate_lo_sum_address_p (mode, x, reg_ok_strict))
     return 1;
   return 0;
 }
@@ -18368,13 +18581,23 @@  rs6000_secondary_reload_memory (rtx addr
 	    }
 	}
 
+      else if ((addr_mask & RELOAD_REG_QUAD_OFFSET) != 0
+	       && CONST_INT_P (plus_arg1))
+	{
+	  if (!quad_address_offset_p (INTVAL (plus_arg1)))
+	    {
+	      extra_cost = 1;
+	      type = "vector d-form offset";
+	    }
+	}
+
       /* Make sure the register class can handle offset addresses.  */
       else if (rs6000_legitimate_offset_address_p (mode, addr, false, true))
 	{
 	  if ((addr_mask & RELOAD_REG_OFFSET) == 0)
 	    {
 	      extra_cost = 1;
-	      type = "offset";
+	      type = "offset #2";
 	    }
 	}
 
@@ -18387,7 +18610,14 @@  rs6000_secondary_reload_memory (rtx addr
       break;
 
     case LO_SUM:
-      if (!legitimate_lo_sum_address_p (mode, addr, false))
+      /* Quad offsets are restricted and can't handle normal addresses.  */
+      if ((addr_mask & RELOAD_REG_QUAD_OFFSET) != 0)
+	{
+	  extra_cost = -1;
+	  type = "vector d-form lo_sum";
+	}
+
+      else if (!legitimate_lo_sum_address_p (mode, addr, false))
 	{
 	  fail_msg = "bad LO_SUM";
 	  extra_cost = -1;
@@ -18404,8 +18634,17 @@  rs6000_secondary_reload_memory (rtx addr
     case CONST:
     case SYMBOL_REF:
     case LABEL_REF:
-      type = "address";
-      extra_cost = rs6000_secondary_reload_toc_costs (addr_mask);
+      if ((addr_mask & RELOAD_REG_QUAD_OFFSET) != 0)
+	{
+	  extra_cost = -1;
+	  type = "vector d-form lo_sum #2";
+	}
+
+      else
+	{
+	  type = "address";
+	  extra_cost = rs6000_secondary_reload_toc_costs (addr_mask);
+	}
       break;
 
       /* TOC references look like offsetable memory.  */
@@ -18416,6 +18655,12 @@  rs6000_secondary_reload_memory (rtx addr
 	  extra_cost = -1;
 	}
 
+      else if ((addr_mask & RELOAD_REG_QUAD_OFFSET) != 0)
+	{
+	  extra_cost = -1;
+	  type = "vector d-form lo_sum #3";
+	}
+
       else if ((addr_mask & RELOAD_REG_OFFSET) == 0)
 	{
 	  extra_cost = 1;
@@ -19046,6 +19291,16 @@  rs6000_secondary_reload_inner (rtx reg, 
 	    }
 	}
 
+      else if (mode_supports_vsx_dform_quad (mode) && CONST_INT_P (op1))
+	{
+	  if (((addr_mask & RELOAD_REG_QUAD_OFFSET) == 0)
+	      || !quad_address_p (addr, mode, false))
+	    {
+	      emit_insn (gen_rtx_SET (scratch, addr));
+	      new_addr = scratch;
+	    }
+	}
+
       /* Make sure the register class can handle offset addresses.  */
       else if (rs6000_legitimate_offset_address_p (mode, addr, false, true))
 	{
@@ -19076,6 +19331,13 @@  rs6000_secondary_reload_inner (rtx reg, 
 	    }
 	}
 
+      /* Quad offsets are restricted and can't handle normal addresses.  */
+      else if (mode_supports_vsx_dform_quad (mode))
+	{
+	  emit_insn (gen_rtx_SET (scratch, addr));
+	  new_addr = scratch;
+	}
+
       /* Make sure the register class can handle offset addresses.  */
       else if (legitimate_lo_sum_address_p (mode, addr, false))
 	{
@@ -19296,7 +19558,8 @@  rs6000_preferred_reload_class (rtx x, en
 	}
 
       /* D-form addressing can easily reload the value.  */
-      if (mode_supports_vmx_dform (mode))
+      if (mode_supports_vmx_dform (mode)
+	  || mode_supports_vsx_dform_quad (mode))
 	return rclass;
 
       /* If this is a scalar floating point value and we don't have D-form
@@ -19731,8 +19994,16 @@  rs6000_output_move_128bit (rtx operands[
 
       else if (TARGET_VSX && dest_vsx_p)
 	{
-	  if (mode == V16QImode || mode == V8HImode || mode == V4SImode)
+	  if (mode_supports_vsx_dform_quad (mode)
+	      && quad_address_p (XEXP (src, 0), mode, false))
+	    return "lxv %x0,%1";
+
+	  else if (TARGET_P9_VECTOR)
+	    return "lxvx %x0,%y1";
+
+	  else if (mode == V16QImode || mode == V8HImode || mode == V4SImode)
 	    return "lxvw4x %x0,%y1";
+
 	  else
 	    return "lxvd2x %x0,%y1";
 	}
@@ -19761,8 +20032,16 @@  rs6000_output_move_128bit (rtx operands[
 
       else if (TARGET_VSX && src_vsx_p)
 	{
-	  if (mode == V16QImode || mode == V8HImode || mode == V4SImode)
+	  if (mode_supports_vsx_dform_quad (mode)
+	      && quad_address_p (XEXP (dest, 0), mode, false))
+	    return "stxv %x1,%0";
+
+	  else if (TARGET_P9_VECTOR)
+	    return "stxvx %x1,%y0";
+
+	  else if (mode == V16QImode || mode == V8HImode || mode == V4SImode)
 	    return "stxvw4x %x1,%y0";
+
 	  else
 	    return "stxvd2x %x1,%y0";
 	}
@@ -26198,25 +26477,37 @@  rs6000_emit_prologue (void)
 	if (info->vrsave_mask & ALTIVEC_REG_BIT (i))
 	  {
 	    rtx areg, savereg, mem;
-	    int offset;
+	    HOST_WIDE_INT offset;
 
 	    offset = (info->altivec_save_offset + frame_off
 		      + 16 * (i - info->first_altivec_reg_save));
 
 	    savereg = gen_rtx_REG (V4SImode, i);
 
-	    NOT_INUSE (0);
-	    areg = gen_rtx_REG (Pmode, 0);
-	    emit_move_insn (areg, GEN_INT (offset));
-
-	    /* AltiVec addressing mode is [reg+reg].  */
-	    mem = gen_frame_mem (V4SImode,
-				 gen_rtx_PLUS (Pmode, frame_reg_rtx, areg));
-
-	    /* Rather than emitting a generic move, force use of the stvx
-	       instruction, which we always want.  In particular we don't
-	       want xxpermdi/stxvd2x for little endian.  */
-	    insn = emit_insn (gen_altivec_stvx_v4si_internal (mem, savereg));
+	    if (TARGET_P9_DFORM_VECTOR && quad_address_offset_p (offset))
+	      {
+		mem = gen_frame_mem (V4SImode,
+				     gen_rtx_PLUS (Pmode, frame_reg_rtx,
+						   GEN_INT (offset)));
+		insn = emit_insn (gen_rtx_SET (mem, savereg));
+		areg = NULL_RTX;
+	      }
+	    else
+	      {
+		NOT_INUSE (0);
+		areg = gen_rtx_REG (Pmode, 0);
+		emit_move_insn (areg, GEN_INT (offset));
+
+		/* AltiVec addressing mode is [reg+reg].  */
+		mem = gen_frame_mem (V4SImode,
+				     gen_rtx_PLUS (Pmode, frame_reg_rtx, areg));
+
+		/* Rather than emitting a generic move, force use of the stvx
+		   instruction, which we always want on ISA 2.07 (power8) systems.
+		   In particular we don't want xxpermdi/stxvd2x for little
+		   endian.  */
+		insn = emit_insn (gen_altivec_stvx_v4si_internal (mem, savereg));
+	      }
 
 	    rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
 				  areg, GEN_INT (offset));
@@ -26936,23 +27227,35 @@  rs6000_emit_epilogue (int sibcall)
 	  for (i = info->first_altivec_reg_save; i <= LAST_ALTIVEC_REGNO; ++i)
 	    if (info->vrsave_mask & ALTIVEC_REG_BIT (i))
 	      {
-		rtx addr, areg, mem, reg;
+		rtx addr, areg, mem, insn;
+		rtx reg = gen_rtx_REG (V4SImode, i);
+		HOST_WIDE_INT offset
+		  = (info->altivec_save_offset + frame_off
+		     + 16 * (i - info->first_altivec_reg_save));
 
-		areg = gen_rtx_REG (Pmode, 0);
-		emit_move_insn
-		  (areg, GEN_INT (info->altivec_save_offset
-				  + frame_off
-				  + 16 * (i - info->first_altivec_reg_save)));
+		if (TARGET_P9_DFORM_VECTOR && quad_address_offset_p (offset))
+		  {
+		    mem = gen_frame_mem (V4SImode,
+					 gen_rtx_PLUS (Pmode, frame_reg_rtx,
+						       GEN_INT (offset)));
+		    insn = gen_rtx_SET (reg, mem);
+		  }
+		else
+		  {
+		    areg = gen_rtx_REG (Pmode, 0);
+		    emit_move_insn (areg, GEN_INT (offset));
 
-		/* AltiVec addressing mode is [reg+reg].  */
-		addr = gen_rtx_PLUS (Pmode, frame_reg_rtx, areg);
-		mem = gen_frame_mem (V4SImode, addr);
+		    /* AltiVec addressing mode is [reg+reg].  */
+		    addr = gen_rtx_PLUS (Pmode, frame_reg_rtx, areg);
+		    mem = gen_frame_mem (V4SImode, addr);
+
+		    /* Rather than emitting a generic move, force use of the
+		       lvx instruction, which we always want.  In particular we
+		       don't want lxvd2x/xxpermdi for little endian.  */
+		    insn = gen_altivec_lvx_v4si_internal (reg, mem);
+		  }
 
-		reg = gen_rtx_REG (V4SImode, i);
-		/* Rather than emitting a generic move, force use of the
-		   lvx instruction, which we always want.  In particular
-		   we don't want lxvd2x/xxpermdi for little endian.  */
-		(void) emit_insn (gen_altivec_lvx_v4si_internal (reg, mem));
+		(void) emit_insn (insn);
 	      }
 	}
 
@@ -27139,23 +27442,35 @@  rs6000_emit_epilogue (int sibcall)
 	  for (i = info->first_altivec_reg_save; i <= LAST_ALTIVEC_REGNO; ++i)
 	    if (info->vrsave_mask & ALTIVEC_REG_BIT (i))
 	      {
-		rtx addr, areg, mem, reg;
+		rtx addr, areg, mem, insn;
+		rtx reg = gen_rtx_REG (V4SImode, i);
+		HOST_WIDE_INT offset
+		  = (info->altivec_save_offset + frame_off
+		     + 16 * (i - info->first_altivec_reg_save));
 
-		areg = gen_rtx_REG (Pmode, 0);
-		emit_move_insn
-		  (areg, GEN_INT (info->altivec_save_offset
-				  + frame_off
-				  + 16 * (i - info->first_altivec_reg_save)));
+		if (TARGET_P9_DFORM_VECTOR && quad_address_offset_p (offset))
+		  {
+		    mem = gen_frame_mem (V4SImode,
+					 gen_rtx_PLUS (Pmode, frame_reg_rtx,
+						       GEN_INT (offset)));
+		    insn = gen_rtx_SET (reg, mem);
+		  }
+		else
+		  {
+		    areg = gen_rtx_REG (Pmode, 0);
+		    emit_move_insn (areg, GEN_INT (offset));
 
-		/* AltiVec addressing mode is [reg+reg].  */
-		addr = gen_rtx_PLUS (Pmode, frame_reg_rtx, areg);
-		mem = gen_frame_mem (V4SImode, addr);
+		    /* AltiVec addressing mode is [reg+reg].  */
+		    addr = gen_rtx_PLUS (Pmode, frame_reg_rtx, areg);
+		    mem = gen_frame_mem (V4SImode, addr);
+
+		    /* Rather than emitting a generic move, force use of the
+		       lvx instruction, which we always want.  In particular we
+		       don't want lxvd2x/xxpermdi for little endian.  */
+		    insn = gen_altivec_lvx_v4si_internal (reg, mem);
+		  }
 
-		reg = gen_rtx_REG (V4SImode, i);
-		/* Rather than emitting a generic move, force use of the
-		   lvx instruction, which we always want.  In particular
-		   we don't want lxvd2x/xxpermdi for little endian.  */
-		(void) emit_insn (gen_altivec_lvx_v4si_internal (reg, mem));
+		(void) emit_insn (insn);
 	      }
 	}
 
@@ -34311,7 +34626,7 @@  rs6000_libcall_value (machine_mode mode)
 static bool
 rs6000_lra_p (void)
 {
-  return rs6000_lra_flag;
+  return TARGET_LRA;
 }
 
 /* Given FROM and TO register numbers, say whether this elimination is allowed.
@@ -34662,6 +34977,7 @@  static struct rs6000_opt_mask const rs60
   { "hard-dfp",			OPTION_MASK_DFP,		false, true  },
   { "htm",			OPTION_MASK_HTM,		false, true  },
   { "isel",			OPTION_MASK_ISEL,		false, true  },
+  { "lra",			OPTION_MASK_LRA,		false, false },
   { "mfcrf",			OPTION_MASK_MFCRF,		false, true  },
   { "mfpgpr",			OPTION_MASK_MFPGPR,		false, true  },
   { "modulo",			OPTION_MASK_MODULO,		false, true  },
@@ -34672,7 +34988,8 @@  static struct rs6000_opt_mask const rs60
   { "power8-fusion",		OPTION_MASK_P8_FUSION,		false, true  },
   { "power8-fusion-sign",	OPTION_MASK_P8_FUSION_SIGN,	false, true  },
   { "power8-vector",		OPTION_MASK_P8_VECTOR,		false, true  },
-  { "power9-dform",		OPTION_MASK_P9_DFORM,		false, true  },
+  { "power9-dform-scalar",	OPTION_MASK_P9_DFORM_SCALAR,	false, true  },
+  { "power9-dform-vector",	OPTION_MASK_P9_DFORM_VECTOR,	false, true  },
   { "power9-fusion",		OPTION_MASK_P9_FUSION,		false, true  },
   { "power9-minmax",		OPTION_MASK_P9_MINMAX,		false, true  },
   { "power9-vector",		OPTION_MASK_P9_VECTOR,		false, true  },
@@ -35305,7 +35622,9 @@  rs6000_print_options_internal (FILE *fil
   size_t i;
   size_t start_column = 0;
   size_t cur_column;
-  size_t max_column = 76;
+  size_t max_column = 120;
+  size_t prefix_len = strlen (prefix);
+  size_t comma_len = 0;
   const char *comma = "";
 
   if (indent)
@@ -35323,27 +35642,45 @@  rs6000_print_options_internal (FILE *fil
   cur_column = start_column;
   for (i = 0; i < num_elements; i++)
     {
-      if ((flags & opts[i].mask) != 0)
+      bool invert = opts[i].invert;
+      const char *name = opts[i].name;
+      const char *no_str = "";
+      HOST_WIDE_INT mask = opts[i].mask;
+      size_t len = comma_len + prefix_len + strlen (name);
+
+      if (!invert)
 	{
-	  const char *no_str = rs6000_opt_masks[i].invert ? "no-" : "";
-	  size_t len = (strlen (comma)
-			+ strlen (prefix)
-			+ strlen (no_str)
-			+ strlen (rs6000_opt_masks[i].name));
+	  if ((flags & mask) == 0)
+	    {
+	      no_str = "no-";
+	      len += sizeof ("no-") - 1;
+	    }
 
-	  cur_column += len;
-	  if (cur_column > max_column)
+	  flags &= ~mask;
+	}
+
+      else
+	{
+	  if ((flags & mask) != 0)
 	    {
-	      fprintf (stderr, ", \\\n%*s", (int)start_column, "");
-	      cur_column = start_column + len;
-	      comma = "";
+	      no_str = "no-";
+	      len += sizeof ("no-") - 1;
 	    }
 
-	  fprintf (file, "%s%s%s%s", comma, prefix, no_str,
-		   rs6000_opt_masks[i].name);
-	  flags &= ~ opts[i].mask;
-	  comma = ", ";
+	  flags |= mask;
 	}
+
+      cur_column += len;
+      if (cur_column > max_column)
+	{
+	  fprintf (file, ", \\\n%*s", (int)start_column, "");
+	  cur_column = start_column + len;
+	  comma = "";
+	}
+
+      fprintf (file, "%s%s%s%s", comma, prefix, no_str, name);
+      comma = ", ";
+      comma_len = sizeof (", ") - 1;
     }
 
   fputs ("\n", file);
Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 235831)
+++ gcc/config/rs6000/constraints.md	(.../gcc/config/rs6000)	(working copy)
@@ -156,6 +156,11 @@  (define_constraint "wL"
        (and (match_test "TARGET_DIRECT_MOVE_128")
 	    (match_test "(ival == VECTOR_ELEMENT_MFVSRLD_64BIT)"))))
 
+;; ISA 3.0 vector d-form addresses
+(define_memory_constraint "wO"
+  "Memory operand suitable for the ISA 3.0 vector d-form instructions."
+  (match_operand 0 "vsx_quad_dform_memory_operand"))
+
 ;; Lq/stq validates the address for load/store quad
 (define_memory_constraint "wQ"
   "Memory operand suitable for the load/store quad instructions"
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 235831)
+++ gcc/config/rs6000/predicates.md	(.../gcc/config/rs6000)	(working copy)
@@ -698,48 +698,25 @@  (define_predicate "offsettable_mem_opera
 (define_predicate "quad_memory_operand"
   (match_code "mem")
 {
-  rtx addr, op0, op1;
-  int ret;
-
   if (!TARGET_QUAD_MEMORY && !TARGET_SYNC_TI)
-    ret = 0;
-
-  else if (!memory_operand (op, mode))
-    ret = 0;
-
-  else if (GET_MODE_SIZE (GET_MODE (op)) != 16)
-    ret = 0;
-
-  else if (MEM_ALIGN (op) < 128)
-    ret = 0;
-
-  else
-    {
-      addr = XEXP (op, 0);
-      if (int_reg_operand (addr, Pmode))
-	ret = 1;
+    return false;
 
-      else if (GET_CODE (addr) != PLUS)
-	ret = 0;
+  if (GET_MODE_SIZE (mode) != 16 || !MEM_P (op) || MEM_ALIGN (op) < 128)
+    return false;
 
-      else
-	{
-	  op0 = XEXP (addr, 0);
-	  op1 = XEXP (addr, 1);
-	  ret = (int_reg_operand (op0, Pmode)
-		 && GET_CODE (op1) == CONST_INT
-		 && IN_RANGE (INTVAL (op1), -32768, 32767)
-		 && (INTVAL (op1) & 15) == 0);
-	}
-    }
+  return quad_address_p (XEXP (op, 0), mode, true);
+})
 
-  if (TARGET_DEBUG_ADDR)
-    {
-      fprintf (stderr, "\nquad_memory_operand, ret = %s\n", ret ? "true" : "false");
-      debug_rtx (op);
-    }
+;; Return 1 if the operand is suitable for load/store to vector registers with
+;; d-form addressing (register+offset), which was added in ISA 3.0.
+;; Unlike quad_memory_operand, we do not have to check for alignment.
+(define_predicate "vsx_quad_dform_memory_operand"
+  (match_code "mem")
+{
+  if (!TARGET_P9_DFORM_VECTOR || !MEM_P (op) || GET_MODE_SIZE (mode) != 16)
+    return false;
 
-  return ret;
+  return quad_address_p (XEXP (op, 0), mode, false);
 })
 
 ;; Return 1 if the operand is an indexed or indirect memory operand.
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 235831)
+++ gcc/config/rs6000/rs6000-protos.h	(.../gcc/config/rs6000)	(working copy)
@@ -86,6 +86,7 @@  extern int registers_ok_for_quad_peep (r
 extern int mems_ok_for_quad_peep (rtx, rtx);
 extern bool gpr_or_gpr_p (rtx, rtx);
 extern bool direct_move_p (rtx, rtx);
+extern bool quad_address_p (rtx, machine_mode, bool);
 extern bool quad_load_store_p (rtx, rtx);
 extern bool fusion_gpr_load_p (rtx, rtx, rtx, rtx);
 extern void expand_fusion_gpr_load (rtx *);
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 235831)
+++ gcc/config/rs6000/vsx.md	(.../gcc/config/rs6000)	(working copy)
@@ -301,24 +301,6 @@  (define_c_enum "unspec"
    UNSPEC_VSX_XVCVDPUXDS
   ])
 
-;; VSX (P9) moves
-
-(define_insn "*p9_vecload_<mode>"
-  [(set (match_operand:VSX_M2 0 "vsx_register_operand" "=<VSa>")
-        (match_operand:VSX_M2 1 "memory_operand" "Z"))]
-  "TARGET_P9_VECTOR"
-  "lxvx %x0,%y1"
-  [(set_attr "type" "vecload")
-   (set_attr "length" "4")])
-
-(define_insn "*p9_vecstore_<mode>"
-  [(set (match_operand:VSX_M2 0 "memory_operand" "=Z")
-        (match_operand:VSX_M2 1 "vsx_register_operand" "<VSa>"))]
-  "TARGET_P9_VECTOR"
-  "stxvx %x1,%y0"
-  [(set_attr "type" "vecstore")
-   (set_attr "length" "4")])
-
 ;; VSX moves
 
 ;; The patterns for LE permuted loads and stores come before the general
@@ -788,8 +770,8 @@  (define_split
   "")
 
 (define_insn "*vsx_mov<mode>"
-  [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?<VSa>,?<VSa>,r,we,wQ,?&r,??Y,??r,??r,<VSr>,?<VSa>,*r,v,wZ,v")
-	(match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,<VSa>,Z,<VSa>,we,b,r,wQ,r,Y,r,j,j,j,W,v,wZ"))]
+  [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=ZwO,<VSr>,<VSr>,?ZwO,?<VSa>,?<VSa>,r,we,wQ,?&r,??Y,??r,??r,<VSr>,?<VSa>,*r,v,wZ,v")
+	(match_operand:VSX_M 1 "input_operand" "<VSr>,ZwO,<VSr>,<VSa>,ZwO,<VSa>,we,b,r,wQ,r,Y,r,j,j,j,W,v,wZ"))]
   "VECTOR_MEM_VSX_P (<MODE>mode)
    && (register_operand (operands[0], <MODE>mode) 
        || register_operand (operands[1], <MODE>mode))"
@@ -803,8 +785,8 @@  (define_insn "*vsx_mov<mode>"
 ;; use of TImode is for unions.  However for plain data movement, slightly
 ;; favor the vector loads
 (define_insn "*vsx_movti_64bit"
-  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,r,we,v,v,wZ,wQ,&r,Y,r,r,?r")
-	(match_operand:TI 1 "input_operand" "wa,Z,wa,O,we,b,W,wZ,v,r,wQ,r,Y,r,n"))]
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=ZwO,wa,wa,wa,r,we,v,v,wZ,wQ,&r,Y,r,r,?r")
+	(match_operand:TI 1 "input_operand" "wa,ZwO,wa,O,we,b,W,wZ,v,r,wQ,r,Y,r,n"))]
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (TImode)
    && (register_operand (operands[0], TImode) 
        || register_operand (operands[1], TImode))"
@@ -815,8 +797,8 @@  (define_insn "*vsx_movti_64bit"
    (set_attr "length" "4,4,4,4,8,4,16,4,4,8,8,8,8,8,8")])
 
 (define_insn "*vsx_movti_32bit"
-  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v, v,wZ,Q,Y,????r,????r,????r,r")
-	(match_operand:TI 1 "input_operand"        "wa, Z,wa, O,W,wZ, v,r,r,    Q,    Y,    r,n"))]
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=ZwO,wa,wa,wa,v,v,wZ,Q,Y,????r,????r,????r,r")
+	(match_operand:TI 1 "input_operand"        "wa,ZwO,wa,O,W,wZ,v,r,r,Q,Y,r,n"))]
   "! TARGET_POWERPC64 && VECTOR_MEM_VSX_P (TImode)
    && (register_operand (operands[0], TImode)
        || register_operand (operands[1], TImode))"
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/doc)	(revision 235831)
+++ gcc/doc/invoke.texi	(.../gcc/doc)	(working copy)
@@ -1004,7 +1004,10 @@  See RS/6000 and PowerPC Options.
 -mupper-regs-df -mno-upper-regs-df -mupper-regs-sf -mno-upper-regs-sf @gol
 -mupper-regs -mno-upper-regs -mmodulo -mno-modulo @gol
 -mfloat128 -mno-float128 -mfloat128-hardware -mno-float128-hardware @gol
--mpower9-fusion -mno-mpower9-fusion -mpower9-vector -mno-power9-vector}
+-mpower9-fusion -mno-power9-fusion -mpower9-vector -mno-power9-vector @gol
+-mpower9-dform-scalar -mno-power9-dform-scalar @gol
+-mpower9-dform-vector -mno-power9-dform-vector @gol
+-mpower9-dform -mno-power9-dform}
 
 @emph{RX Options}
 @gccoptlist{-m64bit-doubles  -m32bit-doubles  -fpu  -nofpu@gol
@@ -19909,7 +19912,7 @@  following options:
 -msimple-fpu -mstring  -mmulhw  -mdlmzb  -mmfpgpr -mvsx @gol
 -mcrypto -mdirect-move -mpower8-fusion -mpower8-vector @gol
 -mquad-memory -mquad-memory-atomic -mmodulo -mfloat128 -mfloat128-hardware @gol
--mpower9-fusion -mpower9-vector}
+-mpower9-fusion -mpower9-vector -mpower9-dform-scalar -mpower9-dform-vector}
 
 The particular options set for any particular CPU varies between
 compiler versions, depending on what setting seems to produce optimal
@@ -20182,10 +20185,33 @@  processors.
 @opindex mpower9-vector
 @opindex mno-power9-vector
 Generate code that uses (does not use) the vector and scalar
-instructions that were added in version 2.07 of the PowerPC ISA.  Also
+instructions that were added in version 3.0 of the PowerPC ISA.  Also
 enable the use of built-in functions that allow more direct access to
 the vector instructions.
 
+@item -mpower9-dform-scalar
+@itemx -mno-power9-dform-scalar
+@opindex mpower9-dform-scalar
+@opindex mno-power9-dform-scalar
+Generate code that uses (does not use) the scalar register + offset
+(d-form) load and store instructions added in ISA 3.0 that access the
+traditional Altivec registers.
+
+@item -mpower9-dform-vector
+@itemx -mno-power9-dform-vector
+@opindex mpower9-dform-vector
+@opindex mno-power9-dform-vector
+Generate code that uses (does not use) the vector register + offset
+(d-form) load and store instructions added in ISA 3.0 that access the
+VSX registers.
+
+@item -mpower9-dform
+@itemx -mno-power9-dform
+@opindex mpower9-dform
+@opindex mno-power9-dform
+Enable (disable) both scalar and vector register + offset memory
+instructions that were added in ISA 3.0.
+
 @item -mfloat-gprs=@var{yes/single/double/no}
 @itemx -mfloat-gprs
 @opindex mfloat-gprs
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/doc)	(revision 235831)
+++ gcc/doc/md.texi	(.../gcc/doc)	(working copy)
@@ -3224,6 +3224,9 @@  Memory operand suitable for TOC fusion m
 Int constant that is the element number that the MFVSRLD instruction
 targets.
 
+@item wO
+A memory operand suitable for the ISA 3.0 vector d-form instructions.
+
 @item wQ
 A memory address that will work with the @code{lq} and @code{stq}
 instructions.
Index: gcc/testsuite/gcc.target/powerpc/dform-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/dform-3.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/dform-3.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 235838)
@@ -0,0 +1,39 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -mpower9-dform -O2 -mlra" } */
+
+#ifndef TYPE
+#define TYPE vector double
+#endif
+
+struct foo {
+  TYPE a, b, c, d;
+};
+
+/* Test whether ISA 3.0 vector d-form instructions are implemented.  */
+void
+add (struct foo *p)
+{
+  p->b = p->c + p->d;
+}
+
+/* Make sure we don't use direct moves to get values into GPRs.  */
+void
+gpr (struct foo *p)
+{
+  TYPE x = p->c;
+
+  __asm__ (" # reg = %0" : "+r" (x));
+
+  p->b = x;
+}
+
+/* { dg-final { scan-assembler     "lxv "      } } */
+/* { dg-final { scan-assembler     "stxv "     } } */
+/* { dg-final { scan-assembler-not "lxvx "     } } */
+/* { dg-final { scan-assembler-not "stxvx "    } } */
+/* { dg-final { scan-assembler-not "mfvsrd "   } } */
+/* { dg-final { scan-assembler-not "mfvsrld "  } } */
+/* { dg-final { scan-assembler     "l\[dq\] "  } } */
+/* { dg-final { scan-assembler     "st\[dq\] " } } */
Index: gcc/testsuite/gcc.target/powerpc/dform-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/dform-1.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 235831)
+++ gcc/testsuite/gcc.target/powerpc/dform-1.c	(.../gcc/testsuite/gcc.target/powerpc)	(working copy)
@@ -1,7 +1,7 @@ 
 /* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
 /* { dg-require-effective-target powerpc_p9vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
-/* { dg-options "-mcpu=power9 -mpower9-dform -O2" } */
+/* { dg-options "-mcpu=power9 -mpower9-dform -O2 -mlra" } */
 
 #ifndef TYPE
 #define TYPE double
Index: gcc/testsuite/gcc.target/powerpc/dform-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/dform-2.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 235831)
+++ gcc/testsuite/gcc.target/powerpc/dform-2.c	(.../gcc/testsuite/gcc.target/powerpc)	(working copy)
@@ -1,7 +1,7 @@ 
 /* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
 /* { dg-require-effective-target powerpc_p9vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
-/* { dg-options "-mcpu=power9 -mpower9-dform -O2" } */
+/* { dg-options "-mcpu=power9 -mpower9-dform -O2 -mlra" } */
 
 #ifndef TYPE
 #define TYPE float