
[V3,#4,of,10] , Add general prefixed/pcrel support

Message ID 20190826204337.GD11790@ibm-toto.the-meissners.org
State New
Series [V3,#4,of,10] , Add general prefixed/pcrel support

Commit Message

Michael Meissner Aug. 26, 2019, 8:43 p.m. UTC
This patch (V3 patch #4) is a rework of the V1 patches #3 and #4.  It
adds support to generate prefixed (and local pc-relative) instructions
for all modes, except SDmode.  SDmode can't be used with a prefixed
offset instruction, because the default method to load up an SDmode
value is to use the LFIWZX instruction, which only has an indexed
format.

For the stack_protect_setdi and stack_protect_testdi insns, I reworked
them so that the expander will copy the prefixed memory address to a
register and use the indexed instruction format.  I added a new
predicate (non_prefixed_mem_operand) to make sure nothing recombines the
insn into a prefixed insn.

I changed the logic previously using insn_form to now use trad_insn.
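For readers following the trad_insn change, a rough plain-C model of the offset rules that the patch's comments describe may help. Traditional D-form, DS-form and DQ-form instructions all encode a signed 16-bit displacement, but DS-form also needs the low two bits clear and DQ-form the low four, while a prefixed instruction accepts any signed 34-bit displacement. The enum and function names are hypothetical stand-ins, and the sketch ignores pc-relative addresses entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for TRAD_INSN_DEFAULT, TRAD_INSN_DS and
   TRAD_INSN_DQ.  */
enum trad_form { FORM_D, FORM_DS, FORM_DQ };

/* True if OFFSET fits in a signed field of BITS bits.  */
static bool
fits_signed (int64_t offset, int bits)
{
  int64_t limit = INT64_C (1) << (bits - 1);
  return offset >= -limit && offset < limit;
}

/* True if the traditional (non-prefixed) encoding of FORM can hold
   OFFSET: a signed 16-bit field, plus alignment for DS and DQ.  */
static bool
trad_offset_ok (int64_t offset, enum trad_form form)
{
  if (!fits_signed (offset, 16))
    return false;

  switch (form)
    {
    case FORM_D:
      return true;
    case FORM_DS:
      return (offset & 3) == 0;		/* low 2 bits must be zero */
    case FORM_DQ:
      return (offset & 15) == 0;	/* low 4 bits must be zero */
    default:
      assert (0 && "unknown instruction form");
      return false;
    }
}

/* True if OFFSET needs a prefixed instruction: the traditional form
   cannot encode it, but a signed 34-bit prefixed displacement can.  */
static bool
needs_prefixed (int64_t offset, enum trad_form form, bool have_prefixed)
{
  return (have_prefixed
	  && !trad_offset_ok (offset, form)
	  && fits_signed (offset, 34));
}
```

With this model, an offset of 6 fits an ordinary D-form load but not the DS-form LWA, so needs_prefixed (6, FORM_DS, true) is true.  That is the PLWA case the lwa_operand change is after.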

I think I misspoke in the previous patch: the logic for pc-relative
vector extract is in this patch, not the previous one.

I have built a bootstrap compiler on a little endian power8 system, and
there were no regressions when I ran make check.  Once the previous
patches are checked in, can I check in this patch?

2019-08-26  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/predicates.md (add_operand): Add support for the
	PADDI instruction.
	(non_add_cint_operand): Add support for the PADDI instruction.
	(lwa_operand): Add support for the PLWA instruction.
	(non_prefixed_mem_operand): New predicate.
	* config/rs6000/rs6000-protos.h (make_memory_non_prefixed): New
	declaration.
	* config/rs6000/rs6000.c (num_insns_constant_gpr): Add support for
	the PADDI instruction.
	(rs6000_adjust_vec_address): Add support for optimizing prefixed
	and pc-relative extracts with constant extraction elements.  Add a
	failure when we use pc-relative addressing and non-constant
	extraction elements.  Use SIGNED_16BIT_OFFSET_P.
	(quad_address_p): Add support for prefixed memory instructions.
	(mem_operand_gpr): Add support for prefixed memory instructions.
	Use SIGNED_16BIT_OFFSET_EXTRA_P.
	(mem_operand_ds_form): Add support for prefixed memory
	instructions.  Use SIGNED_16BIT_OFFSET_EXTRA_P.
	(rs6000_legitimate_offset_address_p): Add support for prefixed
	memory instructions.
	(rs6000_legitimate_address_p): Add support for prefixed memory
	instructions.
	(rs6000_mode_dependent_address): Add support for prefixed memory
	instructions.
	(make_memory_non_prefixed): New function.
	(prefixed_paddi_p): Fix thinkos in last patch.
	(rs6000_rtx_costs): Add support for the PADDI instruction.
	(rs6000_num_insns): Don't treat prefixed instructions as being
	slower because they have a larger length.
	(rs6000_insn_cost): Call rs6000_num_insns.
	* config/rs6000/rs6000.md (add<mode>3): Add support for the PADDI
	instruction.
	(movsi_low): Add support for the PADDI instruction.
	(movsi const int splitter): Add support for the PADDI
	instruction.
	(mov<mode>_64bit_dm): Add support for prefixed memory
	instructions. Split alternatives that had merged loading a
	constant with register moves.
	(movtd_64bit_nodm): Add support for prefixed memory instructions.
	(movdi_internal64): Add support for prefixed memory instructions.
	(movdi const int splitter): Add comment.
	(mov<mode>_ppc64): Add support for prefixed memory instructions.
	(stack_protect_setdi): Do not allow prefixed instructions.
	(stack_protect_testdi): Do not allow prefixed instructions.
	* config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
	prefixed memory instructions.

Comments

Segher Boessenkool Aug. 30, 2019, 4:35 p.m. UTC | #1
Hi!

(Please split off paddi to a separate patch?)

On Mon, Aug 26, 2019 at 04:43:37PM -0400, Michael Meissner wrote:
> 	(prefixed_paddi_p): Fix thinkos in last patch.

Do that separately please.  Don't hide this in another patch like this.

Hrm, this is not in this patch at all?  Fix the changelog, then :-)

> --- gcc/config/rs6000/predicates.md	(revision 274870)
> +++ gcc/config/rs6000/predicates.md	(working copy)
> @@ -839,7 +839,8 @@ (define_special_predicate "indexed_addre
>  (define_predicate "add_operand"
>    (if_then_else (match_code "const_int")
>      (match_test "satisfies_constraint_I (op)
> -		 || satisfies_constraint_L (op)")
> +		 || satisfies_constraint_L (op)
> +		 || satisfies_constraint_eI (op)")
>      (match_operand 0 "gpc_reg_operand")))
>  
>  ;; Return 1 if the operand is either a non-special register, or 0, or -1.
> @@ -852,7 +853,8 @@ (define_predicate "adde_operand"
>  (define_predicate "non_add_cint_operand"
>    (and (match_code "const_int")
>         (match_test "!satisfies_constraint_I (op)
> -		    && !satisfies_constraint_L (op)")))
> +		    && !satisfies_constraint_L (op)
> +		    && !satisfies_constraint_eI (op)")))

(define_predicate "non_add_cint_operand"
  (and (match_code "const_int")
       (not (match_operand 0 "add_operand"))))

?  You can do that *now*, and it is pre-approved.  (This could use a better
name btw., I always have to look up what it means; a longer name is fine as
well of course, it is used only once or so).

> @@ -933,6 +935,13 @@ (define_predicate "lwa_operand"
>      return false;
>  
>    addr = XEXP (inner, 0);
> +
> +  /* The LWA instruction uses the DS-form format where the bottom two bits of
> +     the offset must be 0.  The prefixed PLWA does not have this
> +     restriction.  */
> +  if (prefixed_local_addr_p (addr, mode, TRAD_INSN_DS))
> +    return true;

Why does the decision whether something is a valid prefixed lwa_operand
need to know the non-prefixed lwa is a DS-form instruction?

And "local" is a head-scratcher for this condition, too.

> +;; Return 1 if op is a memory operand that is not prefixed.
> +(define_predicate "non_prefixed_mem_operand"
> +  (match_code "mem")
> +{
> +  if (!memory_operand (op, mode))
> +    return false;
> +
> +  return !prefixed_local_addr_p (XEXP (op, 0), GET_MODE (op),
> +				 TRAD_INSN_DEFAULT);
> +})

Use match_operand for the first condition please (and then match_test for
the second?)

This does make it seem like we need a prefixed_local_mem_p as well?  So
that we need neither that XEXP nor that GET_MODE.

> @@ -5735,6 +5735,10 @@ num_insns_constant_gpr (HOST_WIDE_INT va
>  	   && (value >> 31 == -1 || value >> 31 == 0))
>      return 1;
>  
> +  /* PADDI can support up to 34 bit signed integers.  */
> +  else if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (value))
> +    return 1;

Write this earlier, together with the 16BIT one?

> @@ -6905,6 +6909,7 @@ rs6000_adjust_vec_address (rtx scalar_re
>    rtx element_offset;
>    rtx new_addr;
>    bool valid_addr_p;
> +  bool pcrel_p = TARGET_PCREL && pcrel_local_address (addr, Pmode);

This is used 159 lines later.  Please refactor things.  That would make
a separate patch *before* this one.

> +  /* Optimize pc-relative addresses.  */
> +  else if (pcrel_p)
> +    {
> +      if (CONST_INT_P (element_offset))
> +	{
> +	  rtx addr2 = addr;

This var needs a better name and/or comments.  Or maybe just factoring.
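One way to factor the constant-element case is sketched below on a toy symbol+offset representation instead of RTL; the struct and function names are hypothetical.  In the real code, stripping the CONST and PLUS is what produces the symbol and its existing offset.  The helper folds the element offset into that existing offset and returns failure instead of asserting when the sum leaves the signed 34-bit range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for (const (plus (symbol_ref SYM) OFFSET)), or for a bare
   (symbol_ref SYM) when OFFSET is 0.  */
struct pcrel_addr
{
  const char *symbol;
  int64_t offset;
};

static bool
signed_34bit_p (int64_t v)
{
  return v >= -(INT64_C (1) << 33) && v < (INT64_C (1) << 33);
}

/* Add ELEMENT_OFFSET to ADDR.  Return false, leaving *OUT untouched, if
   the combined displacement no longer fits a prefixed instruction.  */
static bool
pcrel_add_element_offset (struct pcrel_addr addr, int64_t element_offset,
			  struct pcrel_addr *out)
{
  assert (out != NULL);
  int64_t offset = addr.offset + element_offset;
  if (!signed_34bit_p (offset))
    return false;

  out->symbol = addr.symbol;
  out->offset = offset;
  return true;
}
```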

> @@ -7007,9 +7050,8 @@ rs6000_adjust_vec_address (rtx scalar_re
>  
>    /* If we have a PLUS, we need to see whether the particular register class
>       allows for D-FORM or X-FORM addressing.  */
> -  if (GET_CODE (new_addr) == PLUS)
> +  if (GET_CODE (new_addr) == PLUS || pcrel_p)

That second condition needs a comment.

> @@ -7609,7 +7675,7 @@ mem_operand_ds_form (rtx op, machine_mod
>         causes a wrap, so test only the low 16 bits.  */
>      offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
>  
> -  return offset + 0x8000 < 0x10000u - extra;
> +  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);

Please do all these things first too, as a separate patch.

> -  offset += 0x8000;
> -  return offset < 0x10000 - extra;
> +  if (TARGET_PREFIXED_ADDR)
> +    return SIGNED_34BIT_OFFSET_EXTRA_P (offset, extra);
> +  else
> +    return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);

So this you could just do the 16BIT first, and then *this* patch will add
the 34BIT thing, in an easy-to-read patch.
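Once the 16-bit spelling is in, the prefixed change really is just a choice of field width.  The sketch below shows that the old unsigned idiom and a range-check macro agree.  The macro bodies are an assumption, since SIGNED_16BIT_OFFSET_EXTRA_P and SIGNED_34BIT_OFFSET_EXTRA_P are defined outside this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed shape of the helpers: OFFSET and OFFSET + EXTRA (the last word
   touched by a multi-register access) must both fit the signed field.  */
#define IN_RANGE_P(v, lo, hi) ((v) >= (lo) && (v) <= (hi))
#define SIGNED_16BIT_OFFSET_EXTRA_P(v, extra) \
  IN_RANGE_P ((v), -(INT64_C (1) << 15), (INT64_C (1) << 15) - 1 - (extra))
#define SIGNED_34BIT_OFFSET_EXTRA_P(v, extra) \
  IN_RANGE_P ((v), -(INT64_C (1) << 33), (INT64_C (1) << 33) - 1 - (extra))

/* The pre-patch idiom: unsigned wrap-around folds the two-sided range
   check into a single comparison.  */
static bool
old_16bit_extra_p (int64_t offset, int64_t extra)
{
  return (uint64_t) offset + 0x8000 < 0x10000 - (uint64_t) extra;
}

/* Shape of the patched rs6000_legitimate_offset_address_p tail: only the
   field width depends on whether prefixed addressing is available.  */
static bool
offset_ok (int64_t offset, int64_t extra, bool have_prefixed)
{
  return (have_prefixed
	  ? SIGNED_34BIT_OFFSET_EXTRA_P (offset, extra)
	  : SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra));
}
```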

> +    {
> +      /* There is no prefixed version of the load/store with update.  */
> +      return !prefixed_local_addr_p (XEXP (x, 1), mode, TRAD_INSN_DEFAULT);
> +    }

If you pass the actual MEM, the prefixed_local_mem_p function can return
false, itself.

> -	  unsigned HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
> -	  return val + 0x8000 >= 0x10000 - (TARGET_POWERPC64 ? 8 : 12);
> +	  HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
> +	  HOST_WIDE_INT extra = TARGET_POWERPC64 ? 8 : 12;

The 8 vs. 12 could use a comment (yes, I know it was there already).  Do you
know what this is about, why it is 8 and 12?

> +/* Make a memory address non-prefixed if it is prefixed.  */

"Return an RTX that is like MEM but does not need prefixed instructions
to access."?

> +rtx
> +make_memory_non_prefixed (rtx mem)
> +{
> +  gcc_assert (MEM_P (mem));
> +  if (prefixed_local_addr_p (XEXP (mem, 0), GET_MODE (mem), TRAD_INSN_DEFAULT))

Swap the condition and do an early-out, please.

> +    {
> +      rtx old_addr = XEXP (mem, 0);
> +      rtx new_addr;

Do you also need to strip CONST from around the address here?
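The early-out shape Segher asks for is sketched below on a toy base+offset address rather than RTL.  Every name is hypothetical, and the 16-bit test stands in for prefixed_local_addr_p.  The function returns the operand unchanged when no prefix is needed, so the rewriting logic is not nested inside the condition.  When a prefix would be needed, the offset goes into a scratch register and the access becomes indexed (X-form), which is what the stack_protect expanders rely on.  The sketch covers only the base+offset case, not the fallback that forces the whole address into a register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy address: BASE + OFFSET (D-form) or BASE + INDEX (X-form).  */
struct toy_addr
{
  int base_reg;
  bool indexed;
  int index_reg;	/* meaningful only when INDEXED */
  int64_t offset;	/* meaningful only when !INDEXED */
};

/* Stand-in for "does this address need a prefixed instruction?".  */
static bool
toy_needs_prefix (const struct toy_addr *a)
{
  return !a->indexed && (a->offset < -32768 || a->offset > 32767);
}

/* Return an address equivalent to A that needs no prefixed instruction.
   In GCC, force_reg would emit the move of the offset into SCRATCH_REG;
   this toy only records which register holds it.  */
static struct toy_addr
toy_make_non_prefixed (struct toy_addr a, int scratch_reg)
{
  if (!toy_needs_prefix (&a))
    return a;			/* early out: already non-prefixed */

  assert (scratch_reg != a.base_reg);
  struct toy_addr r = { a.base_reg, true, scratch_reg, 0 };
  return r;
}
```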

> @@ -21060,7 +21168,8 @@ rs6000_rtx_costs (rtx x, machine_mode mo
>  	    || outer_code == PLUS
>  	    || outer_code == MINUS)
>  	   && (satisfies_constraint_I (x)
> -	       || satisfies_constraint_L (x)))
> +	       || satisfies_constraint_L (x)
> +	       || satisfies_constraint_eI (x)))

Just use add_operand here, maybe?

OTOH, do we want to count prefixed insns as not more expensive than
non-prefixed ones?

> +/* How many real instructions are generated for this insn?  This is slightly

"How many machine instructions are generated for INSN".

> +static int
> +rs6000_num_insns (rtx_insn *insn)
> +{
> +  /* Try to figure it out based on the length and whether there are prefixed
> +     instructions.  While prefixed instructions are only 8 bytes, we have to
> +     use 12 as the size of the first prefixed instruction in case the
> +     instruction needs to be aligned.  Back to back prefixed instructions would
> +     only take 20 bytes, since it is guaranteed that one of the prefixed
> +     instructions does not need the alignment.  */
> +  int length = get_attr_length (insn);
> +
> +  if (length >= 12 && TARGET_PREFIXED_ADDR
> +      && get_attr_prefixed (insn) == PREFIXED_YES)
> +    {
> +      /* Single prefixed instruction.  */
> +      if (length == 12)
> +	return 1;
> +
> +      /* A normal instruction and a prefixed instruction (16) or two back
> +	 to back prefixed instructions (20).  */
> +      if (length == 16 || length == 20)
> +	return 2;
> +
> +      /* Guess for larger instruction sizes.  */
> +      return 2 + (length - 20) / 4;
> +    }
> +
> +  return length / 4;
> +}

Yuck.  It only needs an approximate answer, but why then handle all kinds
of cases that you cannot test (because they do not currently happen)?

Instead, handle prefixed insns one step up, in insn_cost itself?  It
knows more, it can make better estimates.  It can do it per instruction
type, importantly.
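To make the length arithmetic in rs6000_num_insns concrete, here is the same mapping as a pure function of the length attribute and the prefixed flag.  This is a hypothetical standalone copy, with TARGET_PREFIXED_ADDR folded into PREFIXED.  Only lengths 12, 16 and 20 are covered by the reasoning in the patch's comment.  Anything longer falls into the guess branch, which handles the cases Segher says do not currently happen.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the patch's rs6000_num_insns: a prefixed insn is 8 bytes, but
   its length attribute includes a possible 4-byte alignment NOP.  */
static int
num_insns_from_length (int length, bool prefixed)
{
  assert (length >= 0 && length % 4 == 0);
  if (length >= 12 && prefixed)
    {
      if (length == 12)			/* one prefixed insn (+ NOP) */
	return 1;
      if (length == 16 || length == 20)	/* normal + prefixed, or two prefixed */
	return 2;
      return 2 + (length - 20) / 4;	/* guess for longer sequences */
    }
  return length / 4;
}
```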

>  (define_insn "*add<mode>3"
> -  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r")
> -	(plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b")
> -		  (match_operand:GPR 2 "add_operand" "r,I,L")))]
> +  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r,r")
> +	(plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b,b")
> +		  (match_operand:GPR 2 "add_operand" "r,I,L,eI")))]
>    ""
>    "@
>     add %0,%1,%2
>     addi %0,%1,%2
> -   addis %0,%1,%v2"
> -  [(set_attr "type" "add")])
> +   addis %0,%1,%v2
> +   addi %0,%1,%2"
> +  [(set_attr "type" "add")
> +   (set_attr "isa" "*,*,*,fut")])

Okay.

> @@ -6909,22 +6911,22 @@ (define_insn "movsi_low"
>  
>  ;;		MR           LA           LWZ          LFIWZX       LXSIWZX
>  ;;		STW          STFIWX       STXSIWX      LI           LIS
> -;;		#            XXLOR        XXSPLTIB 0   XXSPLTIB -1  VSPLTISW
> -;;		XXLXOR 0     XXLORC -1    P9 const     MTVSRWZ      MFVSRWZ
> -;;		MF%1         MT%0         NOP
> +;;		PLI          #            XXLOR        XXSPLTIB 0   XXSPLTIB -1
> +;;		VSPLTISW     XXLXOR 0     XXLORC -1    P9 const     MTVSRWZ
> +;;		MFVSRWZ      MF%1         MT%0         NOP

So this is adding just the PLI?  Put it on a line of its own then?  And
don't reformat the existing stuff.

It would be nice if this all can be formatted a bit nicer eventually, in
more logical groups, not strictly five per line, which isn't needed for
anything and not helpful at all either.  So for this maybe

  mr la
  lwz lfiwzx lxsiwzx
  stw stfiwx stxsiwx
  li lis pli #

etc.  You also have the restriction that the order matters somewhat, but
there still is a lot of room to make it easier to read.

> -;; Split a load of a large constant into the appropriate two-insn
> -;; sequence.
> +;; Split a load of a large constant into the appropriate two-insn sequence.  On
> +;; systems that support PADDI (PLI), we can use PLI to load any 32-bit constant
> +;; in one instruction.
>  
>  (define_split
>    [(set (match_operand:SI 0 "gpc_reg_operand")
>  	(match_operand:SI 1 "const_int_operand"))]
>    "(unsigned HOST_WIDE_INT) (INTVAL (operands[1]) + 0x8000) >= 0x10000

Use the 16BIT thing here?
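Spelled with the helper, the splitter condition reads as "not a signed 16-bit constant".  The sketch below checks that equivalence and shows why PLI makes the split unnecessary for any SImode constant.  signed_16bit_p is a hypothetical stand-in for SIGNED_16BIT_OFFSET_P.  The instruction counts assume the usual lis + ori fallback when PLI is not available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool
signed_16bit_p (int64_t v)
{
  return v >= -32768 && v <= 32767;
}

/* The splitter's current test, true when V is not a signed 16-bit value.
   The cast happens before the add here so the sketch has no signed
   overflow; for SImode constants the result is the same.  */
static bool
old_split_cond (int64_t v)
{
  return (uint64_t) v + 0x8000 >= 0x10000;
}

/* Instructions needed to load the SImode constant V into a GPR.  */
static int
si_const_insns (int32_t v, bool have_pli)
{
  if (signed_16bit_p (v))	/* li */
    return 1;
  if ((v & 0xffff) == 0)	/* lis */
    return 1;
  if (have_pli)			/* pli takes any signed 34-bit value */
    return 1;
  return 2;			/* lis + ori */
}
```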

> @@ -7769,9 +7782,13 @@ (define_insn_and_split "*mov<mode>_64bit
>    "#"
>    "&& reload_completed"
>    [(pc)]
> -{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
> -  [(set_attr "length" "8,8,8,8,12,12,8,8,8")
> -   (set_attr "isa" "*,*,*,*,*,*,*,p8v,p8v")])
> +{
> +  rs6000_split_multireg_move (operands[0], operands[1]);
> +  DONE;
> +}
> +  [(set_attr "isa" "*,*,*,*,*,*,*,*,p8v,p8v")
> +   (set_attr "non_prefixed_length" "8")
> +   (set_attr "prefixed_length" "20")])

Should this have a separate alternative for prefixed addressing?

What happened to the 12's?

> @@ -1149,10 +1149,30 @@ (define_insn "vsx_mov<mode>_64bit"
>                 "vecstore,  vecload,   vecsimple, mffgpr,    mftgpr,    load,
>                  store,     load,      store,     *,         vecsimple, vecsimple,
>                  vecsimple, *,         *,         vecstore,  vecload")
> -   (set_attr "length"
> -               "*,         *,         *,         8,         *,         8,
> -                8,         8,         8,         8,         *,         *,
> -                *,         20,        8,         *,         *")
> +   (set (attr "non_prefixed_length")
> +	(cond [(and (eq_attr "alternative" "4")		;; MTVSRDD
> +		    (match_test "TARGET_P9_VECTOR"))
> +	       (const_string "4")
> +
> +	       (eq_attr "alternative" "3,4")		;; GPR <-> VSX
> +	       (const_string "8")
> +
> +	       (eq_attr "alternative" "5,6,7,8")	;; GPR load/store
> +	       (const_string "8")]
> +	      (const_string "*")))

Why handle alternative 4 separately like this?  Shouldn't there just be
a separate alternative for the p9 version?


Segher
Alan Modra Aug. 31, 2019, 1:06 a.m. UTC | #2
On Fri, Aug 30, 2019 at 11:35:11AM -0500, Segher Boessenkool wrote:
> > -	  unsigned HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
> > -	  return val + 0x8000 >= 0x10000 - (TARGET_POWERPC64 ? 8 : 12);
> > +	  HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
> > +	  HOST_WIDE_INT extra = TARGET_POWERPC64 ? 8 : 12;
> 
> The 8 vs. 12 could use a comment (yes, I know it was there already).  Do you
> know what this is about, why it is 8 and 12?

"extra" here covers the increase in offset needed to access the memory
using multiple registers.  For example, when loading a TImode mem to
gprs you will load at offset+0 and offset+8 when powerpc64, and
offset+0, offset+4, offset+8, and offset+12 when powerpc32.
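Alan's explanation can be checked numerically.  A 16-byte value accessed one word at a time touches offset, offset + word size, and so on.  The last word lands at offset + (16 - word size), which is exactly the 8 or 12.  The small sketch below compares that closed form with a brute-force walk over the word accesses; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest displacement added to OFFSET when a MODE_SIZE-byte value is
   accessed WORD_SIZE bytes at a time.  For TImode (16 bytes) this is
   16 - 8 = 8 on powerpc64 and 16 - 4 = 12 on powerpc32.  */
static int64_t
multireg_extra (int64_t mode_size, int64_t word_size)
{
  assert (mode_size > 0 && word_size > 0);
  return mode_size > word_size ? mode_size - word_size : 0;
}

/* Brute-force check: does every word-sized access of the value have a
   displacement that fits a signed 16-bit D-form field?  */
static bool
all_words_reachable (int64_t offset, int64_t mode_size, int64_t word_size)
{
  for (int64_t off = offset; off < offset + mode_size; off += word_size)
    if (off < -32768 || off > 32767)
      return false;
  return true;
}
```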
Segher Boessenkool Aug. 31, 2019, 12:03 p.m. UTC | #3
On Sat, Aug 31, 2019 at 10:36:00AM +0930, Alan Modra wrote:
> On Fri, Aug 30, 2019 at 11:35:11AM -0500, Segher Boessenkool wrote:
> > > -	  unsigned HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
> > > -	  return val + 0x8000 >= 0x10000 - (TARGET_POWERPC64 ? 8 : 12);
> > > +	  HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
> > > +	  HOST_WIDE_INT extra = TARGET_POWERPC64 ? 8 : 12;
> > 
> > The 8 vs. 12 could use a comment (yes, I know it was there already).  Do you
> > know what this is about, why it is 8 and 12?
> 
> "extra" here covers the increase in offset needed to access the memory
> using multiple registers.  For example, when loading a TImode mem to
> gprs you will load at offset+0 and offset+8 when powerpc64, and
> offset+0, offset+4, offset+8, and offset+12 when powerpc32.

Ah, so it is the size of the mode minus the size of the accesses done to
get it...  16 - UNITS_PER_WORD may be a good way to express it?  I don't
see a way to say it that isn't still helped by a comment though.

Thanks,


Segher

Patch

Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 274870)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -839,7 +839,8 @@  (define_special_predicate "indexed_addre
 (define_predicate "add_operand"
   (if_then_else (match_code "const_int")
     (match_test "satisfies_constraint_I (op)
-		 || satisfies_constraint_L (op)")
+		 || satisfies_constraint_L (op)
+		 || satisfies_constraint_eI (op)")
     (match_operand 0 "gpc_reg_operand")))
 
 ;; Return 1 if the operand is either a non-special register, or 0, or -1.
@@ -852,7 +853,8 @@  (define_predicate "adde_operand"
 (define_predicate "non_add_cint_operand"
   (and (match_code "const_int")
        (match_test "!satisfies_constraint_I (op)
-		    && !satisfies_constraint_L (op)")))
+		    && !satisfies_constraint_L (op)
+		    && !satisfies_constraint_eI (op)")))
 
 ;; Return 1 if the operand is a constant that can be used as the operand
 ;; of an AND, OR or XOR.
@@ -933,6 +935,13 @@  (define_predicate "lwa_operand"
     return false;
 
   addr = XEXP (inner, 0);
+
+  /* The LWA instruction uses the DS-form format where the bottom two bits of
+     the offset must be 0.  The prefixed PLWA does not have this
+     restriction.  */
+  if (prefixed_local_addr_p (addr, mode, TRAD_INSN_DS))
+    return true;
+
   if (GET_CODE (addr) == PRE_INC
       || GET_CODE (addr) == PRE_DEC
       || (GET_CODE (addr) == PRE_MODIFY
@@ -1686,6 +1695,17 @@  (define_predicate "pcrel_ext_address"
   return (SYMBOL_REF_P (op) && !SYMBOL_REF_LOCAL_P (op));
 })
 
+;; Return 1 if op is a memory operand that is not prefixed.
+(define_predicate "non_prefixed_mem_operand"
+  (match_code "mem")
+{
+  if (!memory_operand (op, mode))
+    return false;
+
+  return !prefixed_local_addr_p (XEXP (op, 0), GET_MODE (op),
+				 TRAD_INSN_DEFAULT);
+})
+
 ;; Match the first insn (addis) in fusing the combination of addis and loads to
 ;; GPR registers on power8.
 (define_predicate "fusion_gpr_addis"
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 274872)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -170,6 +170,7 @@  typedef enum {
 } trad_insn_type;
 
 extern bool prefixed_local_addr_p (rtx, machine_mode, trad_insn_type);
+extern rtx make_memory_non_prefixed (rtx);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 274872)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -5727,7 +5727,7 @@  static int
 num_insns_constant_gpr (HOST_WIDE_INT value)
 {
   /* signed constant loadable with addi */
-  if (((unsigned HOST_WIDE_INT) value + 0x8000) < 0x10000)
+  if (SIGNED_16BIT_OFFSET_P (value))
     return 1;
 
   /* constant loadable with addis */
@@ -5735,6 +5735,10 @@  num_insns_constant_gpr (HOST_WIDE_INT va
 	   && (value >> 31 == -1 || value >> 31 == 0))
     return 1;
 
+  /* PADDI can support up to 34 bit signed integers.  */
+  else if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (value))
+    return 1;
+
   else if (TARGET_POWERPC64)
     {
       HOST_WIDE_INT low  = ((value & 0xffffffff) ^ 0x80000000) - 0x80000000;
@@ -6905,6 +6909,7 @@  rs6000_adjust_vec_address (rtx scalar_re
   rtx element_offset;
   rtx new_addr;
   bool valid_addr_p;
+  bool pcrel_p = TARGET_PCREL && pcrel_local_address (addr, Pmode);
 
   /* Vector addresses should not have PRE_INC, PRE_DEC, or PRE_MODIFY.  */
   gcc_assert (GET_RTX_CLASS (GET_CODE (addr)) != RTX_AUTOINC);
@@ -6942,6 +6947,41 @@  rs6000_adjust_vec_address (rtx scalar_re
   else if (REG_P (addr) || SUBREG_P (addr))
     new_addr = gen_rtx_PLUS (Pmode, addr, element_offset);
 
+
+  /* Optimize pc-relative addresses.  */
+  else if (pcrel_p)
+    {
+      if (CONST_INT_P (element_offset))
+	{
+	  rtx addr2 = addr;
+	  HOST_WIDE_INT offset = INTVAL (element_offset);
+
+	  if (GET_CODE (addr2) == CONST)
+	    addr2 = XEXP (addr2, 0);
+
+	  if (GET_CODE (addr2) == PLUS)
+	    {
+	      offset += INTVAL (XEXP (addr2, 1));
+	      addr2 = XEXP (addr2, 0);
+	    }
+
+	  gcc_assert (SIGNED_34BIT_OFFSET_P (offset));
+	  if (offset)
+	    {
+	      addr2 = gen_rtx_PLUS (Pmode, addr2, GEN_INT (offset));
+	      new_addr = gen_rtx_CONST (Pmode, addr2);
+	    }
+	  else
+	    new_addr = addr2;
+	}
+
+      /* Right now, the pc-relative support needs to be re-thought if you have
+	 a pc-relative address and a variable extract, due to having only
+	 one base register tmp to use.  Fail until this is rewritten.  */
+      else
+	gcc_unreachable ();
+    }
+
   /* Optimize D-FORM addresses with constant offset with a constant element, to
      include the element offset in the address directly.  */
   else if (GET_CODE (addr) == PLUS)
@@ -6956,8 +6996,11 @@  rs6000_adjust_vec_address (rtx scalar_re
 	  HOST_WIDE_INT offset = INTVAL (op1) + INTVAL (element_offset);
 	  rtx offset_rtx = GEN_INT (offset);
 
-	  if (IN_RANGE (offset, -32768, 32767)
-	      && (scalar_size < 8 || (offset & 0x3) == 0))
+	  if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (offset))
+	    new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
+
+	  else if (SIGNED_16BIT_OFFSET_P (offset)
+		   && (scalar_size < 8 || (offset & 0x3) == 0))
 	    new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
 	  else
 	    {
@@ -7007,9 +7050,8 @@  rs6000_adjust_vec_address (rtx scalar_re
 
   /* If we have a PLUS, we need to see whether the particular register class
      allows for D-FORM or X-FORM addressing.  */
-  if (GET_CODE (new_addr) == PLUS)
+  if (GET_CODE (new_addr) == PLUS || pcrel_p)
     {
-      rtx op1 = XEXP (new_addr, 1);
       addr_mask_type addr_mask;
       unsigned int scalar_regno = reg_or_subregno (scalar_reg);
 
@@ -7026,7 +7068,10 @@  rs6000_adjust_vec_address (rtx scalar_re
       else
 	gcc_unreachable ();
 
-      if (REG_P (op1) || SUBREG_P (op1))
+      if (pcrel_p)
+	valid_addr_p = (addr_mask & RELOAD_REG_OFFSET) != 0;
+      else if (REG_P (XEXP (new_addr, 1))
+	       || SUBREG_P (XEXP (new_addr, 1)))
 	valid_addr_p = (addr_mask & RELOAD_REG_INDEXED) != 0;
       else
 	valid_addr_p = (addr_mask & RELOAD_REG_OFFSET) != 0;
@@ -7454,6 +7499,13 @@  quad_address_p (rtx addr, machine_mode m
   if (VECTOR_MODE_P (mode) && !mode_supports_dq_form (mode))
     return false;
 
+  /* Is this a valid prefixed address?  If the bottom four bits of the offset
+     are non-zero, we could use a prefixed instruction (which does not have the
+     DQ-form constraint that the traditional instruction had) instead of
+     forcing the unaligned offset to a GPR.  */
+  if (prefixed_local_addr_p (addr, mode, TRAD_INSN_DQ))
+    return true;
+
   if (GET_CODE (addr) != PLUS)
     return false;
 
@@ -7555,6 +7607,13 @@  mem_operand_gpr (rtx op, machine_mode mo
       && legitimate_indirect_address_p (XEXP (addr, 0), false))
     return true;
 
+  /* Allow prefixed instructions if supported.  If the bottom two bits of the
+     offset are non-zero, we could use a prefixed instruction (which does not
+     have the DS-form constraint that the traditional instruction had) instead
+     of forcing the unaligned offset to a GPR.  */
+  if (prefixed_local_addr_p (addr, mode, TRAD_INSN_DS))
+    return true;
+
   /* Don't allow non-offsettable addresses.  See PRs 83969 and 84279.  */
   if (!rs6000_offsettable_memref_p (op, mode, false))
     return false;
@@ -7576,7 +7635,7 @@  mem_operand_gpr (rtx op, machine_mode mo
        causes a wrap, so test only the low 16 bits.  */
     offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
 
-  return offset + 0x8000 < 0x10000u - extra;
+  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
 }
 
 /* As above, but for DS-FORM VSX insns.  Unlike mem_operand_gpr,
@@ -7589,6 +7648,13 @@  mem_operand_ds_form (rtx op, machine_mod
   int extra;
   rtx addr = XEXP (op, 0);
 
+  /* Allow prefixed instructions if supported.  If the bottom two bits of the
+     offset are non-zero, we could use a prefixed instruction (which does not
+     have the DS-form constraint that the traditional instruction had) instead
+     of forcing the unaligned offset to a GPR.  */
+  if (prefixed_local_addr_p (addr, mode, TRAD_INSN_DS))
+    return true;
+
   if (!offsettable_address_p (false, mode, addr))
     return false;
 
@@ -7609,7 +7675,7 @@  mem_operand_ds_form (rtx op, machine_mod
        causes a wrap, so test only the low 16 bits.  */
     offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
 
-  return offset + 0x8000 < 0x10000u - extra;
+  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
 }
 
 /* Subroutines of rs6000_legitimize_address and rs6000_legitimate_address_p.  */
@@ -7958,8 +8024,10 @@  rs6000_legitimate_offset_address_p (mach
       break;
     }
 
-  offset += 0x8000;
-  return offset < 0x10000 - extra;
+  if (TARGET_PREFIXED_ADDR)
+    return SIGNED_34BIT_OFFSET_EXTRA_P (offset, extra);
+  else
+    return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
 }
 
 bool
@@ -8856,6 +8924,11 @@  rs6000_legitimate_address_p (machine_mod
       && mode_supports_pre_incdec_p (mode)
       && legitimate_indirect_address_p (XEXP (x, 0), reg_ok_strict))
     return 1;
+
+  /* Handle prefixed addresses (pc-relative or 34-bit offset).  */
+  if (prefixed_local_addr_p (x, mode, TRAD_INSN_DEFAULT))
+    return 1;
+
   /* Handle restricted vector d-form offsets in ISA 3.0.  */
   if (quad_offset_p)
     {
@@ -8914,7 +8987,10 @@  rs6000_legitimate_address_p (machine_mod
 	  || (!avoiding_indexed_address_p (mode)
 	      && legitimate_indexed_address_p (XEXP (x, 1), reg_ok_strict)))
       && rtx_equal_p (XEXP (XEXP (x, 1), 0), XEXP (x, 0)))
-    return 1;
+    {
+      /* There is no prefixed version of the load/store with update.  */
+      return !prefixed_local_addr_p (XEXP (x, 1), mode, TRAD_INSN_DEFAULT);
+    }
   if (reg_offset_p && !quad_offset_p
       && legitimate_lo_sum_address_p (mode, x, reg_ok_strict))
     return 1;
@@ -8976,8 +9052,12 @@  rs6000_mode_dependent_address (const_rtx
 	  && XEXP (addr, 0) != arg_pointer_rtx
 	  && CONST_INT_P (XEXP (addr, 1)))
 	{
-	  unsigned HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
-	  return val + 0x8000 >= 0x10000 - (TARGET_POWERPC64 ? 8 : 12);
+	  HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
+	  HOST_WIDE_INT extra = TARGET_POWERPC64 ? 8 : 12;
+	  if (TARGET_PREFIXED_ADDR)
+	    return !SIGNED_34BIT_OFFSET_EXTRA_P (val, extra);
+	  else
+	    return !SIGNED_16BIT_OFFSET_EXTRA_P (val, extra);
 	}
       break;
 
@@ -13950,6 +14030,34 @@  prefixed_local_addr_p (rtx addr,
 
   return false;
 }
+
+/* Make a memory address non-prefixed if it is prefixed.  */
+
+rtx
+make_memory_non_prefixed (rtx mem)
+{
+  gcc_assert (MEM_P (mem));
+  if (prefixed_local_addr_p (XEXP (mem, 0), GET_MODE (mem), TRAD_INSN_DEFAULT))
+    {
+      rtx old_addr = XEXP (mem, 0);
+      rtx new_addr;
+
+      if (GET_CODE (old_addr) == PLUS
+	  && (REG_P (XEXP (old_addr, 0)) || SUBREG_P (XEXP (old_addr, 0)))
+	  && CONST_INT_P (XEXP (old_addr, 1)))
+	{
+	  rtx tmp_reg = force_reg (Pmode, XEXP (old_addr, 1));
+	  new_addr = gen_rtx_PLUS (Pmode, XEXP (old_addr, 0), tmp_reg);
+	}
+      else
+	new_addr = force_reg (Pmode, old_addr);
+
+      mem = change_address (mem, VOIDmode, new_addr);
+    }
+
+  return mem;
+}
+
 
 /* Whether a load instruction is a prefixed instruction.  This is called from
    the prefixed attribute processing.  */
@@ -21060,7 +21168,8 @@  rs6000_rtx_costs (rtx x, machine_mode mo
 	    || outer_code == PLUS
 	    || outer_code == MINUS)
 	   && (satisfies_constraint_I (x)
-	       || satisfies_constraint_L (x)))
+	       || satisfies_constraint_L (x)
+	       || satisfies_constraint_eI (x)))
 	  || (outer_code == AND
 	      && (satisfies_constraint_K (x)
 		  || (mode == SImode
@@ -21440,6 +21549,42 @@  rs6000_debug_rtx_costs (rtx x, machine_m
   return ret;
 }
 
+/* How many real instructions are generated for this insn?  This is slightly
+   different from the length attribute, in that the length attribute counts the
+   number of bytes.  With prefixed instructions, we don't want to count a
+   prefixed instruction (length 12 bytes including possible NOP) as taking 3
+   instructions, but just one.  */
+
+static int
+rs6000_num_insns (rtx_insn *insn)
+{
+  /* Try to figure it out based on the length and whether there are prefixed
+     instructions.  While prefixed instructions are only 8 bytes, we have to
+     use 12 as the size of the first prefixed instruction in case the
+     instruction needs to be aligned.  Back to back prefixed instructions would
+     only take 20 bytes, since it is guaranteed that one of the prefixed
+     instructions does not need the alignment.  */
+  int length = get_attr_length (insn);
+
+  if (length >= 12 && TARGET_PREFIXED_ADDR
+      && get_attr_prefixed (insn) == PREFIXED_YES)
+    {
+      /* Single prefixed instruction.  */
+      if (length == 12)
+	return 1;
+
+      /* A normal instruction and a prefixed instruction (16) or two back
+	 to back prefixed instructions (20).  */
+      if (length == 16 || length == 20)
+	return 2;
+
+      /* Guess for larger instruction sizes.  */
+      return 2 + (length - 20) / 4;
+    }
+
+  return length / 4;
+}
+
 static int
 rs6000_insn_cost (rtx_insn *insn, bool speed)
 {
@@ -21453,7 +21598,7 @@  rs6000_insn_cost (rtx_insn *insn, bool s
   if (cost > 0)
     return cost;
 
-  int n = get_attr_length (insn) / 4;
+  int n = rs6000_num_insns (insn);
   enum attr_type type = get_attr_type (insn);
 
   switch (type)
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 274872)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -1761,15 +1761,17 @@  (define_expand "add<mode>3"
 })
 
 (define_insn "*add<mode>3"
-  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r")
-	(plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b")
-		  (match_operand:GPR 2 "add_operand" "r,I,L")))]
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r,r")
+	(plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b,b")
+		  (match_operand:GPR 2 "add_operand" "r,I,L,eI")))]
   ""
   "@
    add %0,%1,%2
    addi %0,%1,%2
-   addis %0,%1,%v2"
-  [(set_attr "type" "add")])
+   addis %0,%1,%v2
+   addi %0,%1,%2"
+  [(set_attr "type" "add")
+   (set_attr "isa" "*,*,*,fut")])
 
 (define_insn "*addsi3_high"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=b")
@@ -6909,22 +6911,22 @@  (define_insn "movsi_low"
 
 ;;		MR           LA           LWZ          LFIWZX       LXSIWZX
 ;;		STW          STFIWX       STXSIWX      LI           LIS
-;;		#            XXLOR        XXSPLTIB 0   XXSPLTIB -1  VSPLTISW
-;;		XXLXOR 0     XXLORC -1    P9 const     MTVSRWZ      MFVSRWZ
-;;		MF%1         MT%0         NOP
+;;		PLI          #            XXLOR        XXSPLTIB 0   XXSPLTIB -1
+;;		VSPLTISW     XXLXOR 0     XXLORC -1    P9 const     MTVSRWZ
+;;		MFVSRWZ      MF%1         MT%0         NOP
 (define_insn "*movsi_internal1"
   [(set (match_operand:SI 0 "nonimmediate_operand"
 		"=r,         r,           r,           d,           v,
 		 m,          Z,           Z,           r,           r,
-		 r,          wa,          wa,          wa,          v,
-		 wa,         v,           v,           wa,          r,
-		 r,          *h,          *h")
+		 r,          r,           wa,          wa,          wa,
+		 v,          wa,          v,           v,           wa,
+		 r,          r,           *h,          *h")
 	(match_operand:SI 1 "input_operand"
 		"r,          U,           m,           Z,           Z,
 		 r,          d,           v,           I,           L,
-		 n,          wa,          O,           wM,          wB,
-		 O,          wM,          wS,          r,           wa,
-		 *h,         r,           0"))]
+		 eI,         n,           wa,          O,           wM,
+		 wB,         O,           wM,          wS,          r,
+		 wa,         *h,          r,           0"))]
   "gpc_reg_operand (operands[0], SImode)
    || gpc_reg_operand (operands[1], SImode)"
   "@
@@ -6938,6 +6940,7 @@  (define_insn "*movsi_internal1"
    stxsiwx %x1,%y0
    li %0,%1
    lis %0,%v1
+   li %0,%1
    #
    xxlor %x0,%x1,%x1
    xxspltib %x0,0
@@ -6954,21 +6957,21 @@  (define_insn "*movsi_internal1"
   [(set_attr "type"
 		"*,          *,           load,        fpload,      fpload,
 		 store,      fpstore,     fpstore,     *,           *,
-		 *,          veclogical,  vecsimple,   vecsimple,   vecsimple,
-		 veclogical, veclogical,  vecsimple,   mffgpr,      mftgpr,
-		 *,          *,           *")
+		 *,          *,           veclogical,  vecsimple,   vecsimple,
+		 vecsimple,  veclogical,  veclogical,  vecsimple,   mffgpr,
+		 mftgpr,     *,           *,           *")
    (set_attr "length"
 		"*,          *,           *,           *,           *,
 		 *,          *,           *,           *,           *,
-		 8,          *,           *,           *,           *,
-		 *,          *,           8,           *,           *,
-		 *,          *,           *")
+		 *,          8,           *,           *,           *,
+		 *,          *,           *,           8,           *,
+		 *,          *,           *,           *")
    (set_attr "isa"
 		"*,          *,           *,           p8v,         p8v,
 		 *,          p8v,         p8v,         *,           *,
-		 *,          p8v,         p9v,         p9v,         p8v,
-		 p9v,        p8v,         p9v,         p8v,         p8v,
-		 *,          *,           *")])
+		 fut,        *,           p8v,         p9v,         p9v,
+		 p8v,        p9v,         p8v,         p9v,         p8v,
+		 p8v,        *,           *,           *")])
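*movsi_internal1 now has a PLI alternative (constraint `eI`, ISA `fut`) ahead of the `#` alternative, and the split of a large SImode constant into two instructions is disabled when TARGET_PREFIXED_ADDR is set. A minimal C sketch of that decision follows. `needs_two_insns` mirrors the split condition; the LIS-then-ORI shape of the two-insn sequence is an assumption, because the split's replacement body is cut off in this hunk. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ways to materialize an SImode constant.  */
enum si_const_kind { SI_LI, SI_LIS, SI_PLI, SI_LIS_ORI };

/* Mirrors the split condition: V is outside the signed 16-bit range and
   its low 16 bits are nonzero, so neither LI nor LIS alone can load it.  */
static bool
needs_two_insns (int64_t v)
{
  return (uint64_t) (v + 0x8000) >= 0x10000 && (v & 0xffff) != 0;
}

static enum si_const_kind
classify_si_constant (int32_t v, bool have_pli)
{
  if (v >= -0x8000 && v <= 0x7fff)
    return SI_LI;
  if ((v & 0xffff) == 0)
    return SI_LIS;
  /* Any 32-bit value fits in PLI's signed 34-bit immediate.  */
  if (have_pli)
    return SI_PLI;
  return SI_LIS_ORI;
}

/* ASSUMPTION: the two-insn sequence is LIS of the upper 16 bits followed
   by ORI of the lower 16 bits.  */
static void
split_lis_ori (int32_t v, uint16_t *hi, uint16_t *lo)
{
  assert (hi && lo);
  uint32_t bits = (uint32_t) v;
  *hi = (uint16_t) (bits >> 16);     /* LIS immediate.  */
  *lo = (uint16_t) (bits & 0xffff);  /* ORI immediate (zero-extended).  */
}
```

Under this model, 0x12345678 needs `lis 0x1234; ori 0x5678` without prefixed instructions, but a single PLI with them.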
 
 ;; Like movsi, but adjust a SF value to be used in a SI context, i.e.
 ;; (set (reg:SI ...) (subreg:SI (reg:SF ...) 0))
@@ -7113,14 +7116,15 @@  (define_insn "*movsi_from_df"
   "xscvdpsp %x0,%x1"
   [(set_attr "type" "fp")])
 
-;; Split a load of a large constant into the appropriate two-insn
-;; sequence.
+;; Split a load of a large constant into the appropriate two-insn sequence.  On
+;; systems that support PADDI (PLI), we can use PLI to load any 32-bit constant
+;; in one instruction.
 
 (define_split
   [(set (match_operand:SI 0 "gpc_reg_operand")
 	(match_operand:SI 1 "const_int_operand"))]
   "(unsigned HOST_WIDE_INT) (INTVAL (operands[1]) + 0x8000) >= 0x10000
-   && (INTVAL (operands[1]) & 0xffff) != 0"
+   && (INTVAL (operands[1]) & 0xffff) != 0 && !TARGET_PREFIXED_ADDR"
   [(set (match_dup 0)
 	(match_dup 2))
    (set (match_dup 0)
@@ -7759,9 +7763,18 @@  (define_expand "mov<mode>"
 ;; not swapped like they are for TImode or TFmode.  Subregs therefore are
 ;; problematical.  Don't allow direct move for this case.
 
+;;		FPR store   FPR load    FPR move    FPR zero    GPR store
+;;		GPR zero    GPR load    GPR move    MFVSRD      MTVSRD
+
 (define_insn_and_split "*mov<mode>_64bit_dm"
-  [(set (match_operand:FMOVE128_FPR 0 "nonimmediate_operand" "=m,d,d,d,Y,r,r,r,d")
-	(match_operand:FMOVE128_FPR 1 "input_operand" "d,m,d,<zero_fp>,r,<zero_fp>Y,r,d,r"))]
+  [(set (match_operand:FMOVE128_FPR 0 "nonimmediate_operand"
+		"=m,        d,          d,          d,          Y,
+		 r,         r,          r,          r,          d")
+
+	(match_operand:FMOVE128_FPR 1 "input_operand"
+		"d,         m,          d,          <zero_fp>,  r,
+		 <zero_fp>, Y,          r,          d,          r"))]
+
   "TARGET_HARD_FLOAT && TARGET_POWERPC64 && FLOAT128_2REG_P (<MODE>mode)
    && (<MODE>mode != TDmode || WORDS_BIG_ENDIAN)
    && (gpc_reg_operand (operands[0], <MODE>mode)
@@ -7769,9 +7782,13 @@  (define_insn_and_split "*mov<mode>_64bit
   "#"
   "&& reload_completed"
   [(pc)]
-{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
-  [(set_attr "length" "8,8,8,8,12,12,8,8,8")
-   (set_attr "isa" "*,*,*,*,*,*,*,p8v,p8v")])
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
+  [(set_attr "isa" "*,*,*,*,*,*,*,*,p8v,p8v")
+   (set_attr "non_prefixed_length" "8")
+   (set_attr "prefixed_length" "20")])
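*mov<mode>_64bit_dm now gives a single non_prefixed_length (8) and prefixed_length (20) instead of a per-alternative length. Both numbers fit a simple model in which the insn splits into two instructions: 2 × 4 bytes without prefixes, and 12 + 8 bytes with prefixes, budgeting the first prefixed instruction 12 bytes as the 16- and 20-byte cases in rs6000_num_insns do. The helper below is a hypothetical illustration of that arithmetic, not the mechanism the port uses to choose between the two attributes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: bytes reserved for N_INSNS consecutive insns.
   ASSUMPTION: a non-prefixed insn is 4 bytes; a prefixed insn is 8 bytes,
   and the first one in the sequence gets 4 extra bytes for a possible
   alignment nop.  */
static int
sequence_length (int n_insns, bool prefixed)
{
  assert (n_insns > 0);
  if (!prefixed)
    return 4 * n_insns;
  return 12 + 8 * (n_insns - 1);
}
```

For the two-register 128-bit moves, `sequence_length (2, false)` is 8 and `sequence_length (2, true)` is 20, matching the attribute values in the pattern.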
 
 (define_insn_and_split "*movtd_64bit_nodm"
   [(set (match_operand:TD 0 "nonimmediate_operand" "=m,d,d,Y,r,r")
@@ -7782,8 +7799,12 @@  (define_insn_and_split "*movtd_64bit_nod
   "#"
   "&& reload_completed"
   [(pc)]
-{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
-  [(set_attr "length" "8,8,8,12,12,8")])
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
+  [(set_attr "non_prefixed_length" "8")
+   (set_attr "prefixed_length" "20")])
 
 (define_insn_and_split "*mov<mode>_32bit"
   [(set (match_operand:FMOVE128_FPR 0 "nonimmediate_operand" "=m,d,d,d,Y,r,r")
@@ -8793,24 +8814,24 @@  (define_split
   [(pc)]
 { rs6000_split_multireg_move (operands[0], operands[1]); DONE; })
 
-;;              GPR store  GPR load   GPR move   GPR li     GPR lis     GPR #
-;;              FPR store  FPR load   FPR move   AVX store  AVX store   AVX load
-;;              AVX load   VSX move   P9 0       P9 -1      AVX 0/-1    VSX 0
-;;              VSX -1     P9 const   AVX const  From SPR   To SPR      SPR<->SPR
-;;              VSX->GPR   GPR->VSX
+;;              GPR store  GPR load   GPR move   GPR li     GPR lis     GPR pli
+;;              GPR #      FPR store  FPR load   FPR move   AVX store   AVX store
+;;              AVX load   AVX load   VSX move   P9 0       P9 -1       AVX 0/-1
+;;              VSX 0      VSX -1     P9 const   AVX const  From SPR    To SPR
+;;              SPR<->SPR  VSX->GPR   GPR->VSX
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
                "=YZ,       r,         r,         r,         r,          r,
-                m,         ^d,        ^d,        wY,        Z,          $v,
-                $v,        ^wa,       wa,        wa,        v,          wa,
-                wa,        v,         v,         r,         *h,         *h,
-                ?r,        ?wa")
+                r,         m,         ^d,        ^d,        wY,         Z,
+                $v,        $v,        ^wa,       wa,        wa,         v,
+                wa,        wa,        v,         v,         r,          *h,
+                *h,        ?r,        ?wa")
 	(match_operand:DI 1 "input_operand"
-               "r,         YZ,        r,         I,         L,          nF,
-                ^d,        m,         ^d,        ^v,        $v,         wY,
-                Z,         ^wa,       Oj,        wM,        OjwM,       Oj,
-                wM,        wS,        wB,        *h,        r,          0,
-                wa,        r"))]
+               "r,         YZ,        r,         I,         L,          eI,
+                nF,        ^d,        m,         ^d,        ^v,         $v,
+                wY,        Z,         ^wa,       Oj,        wM,         OjwM,
+                Oj,        wM,        wS,        wB,        *h,         r,
+                0,         wa,        r"))]
   "TARGET_POWERPC64
    && (gpc_reg_operand (operands[0], DImode)
        || gpc_reg_operand (operands[1], DImode))"
@@ -8820,6 +8841,7 @@  (define_insn "*movdi_internal64"
    mr %0,%1
    li %0,%1
    lis %0,%v1
+   li %0,%1
    #
    stfd%U0%X0 %1,%0
    lfd%U1%X1 %0,%1
@@ -8843,26 +8865,28 @@  (define_insn "*movdi_internal64"
    mtvsrd %x0,%1"
   [(set_attr "type"
                "store,      load,	*,         *,         *,         *,
-                fpstore,    fpload,     fpsimple,  fpstore,   fpstore,   fpload,
-                fpload,     veclogical, vecsimple, vecsimple, vecsimple, veclogical,
-                veclogical, vecsimple,  vecsimple, mfjmpr,    mtjmpr,    *,
-                mftgpr,    mffgpr")
+                *,          fpstore,    fpload,     fpsimple,   fpstore,    fpstore,
+                fpload,     fpload,     veclogical, vecsimple,  vecsimple,  vecsimple,
+                veclogical, veclogical, vecsimple,  vecsimple,  mfjmpr,     mtjmpr,
+                *,          mftgpr,     mffgpr")
    (set_attr "size" "64")
    (set_attr "length"
-               "*,         *,         *,         *,         *,          20,
-                *,         *,         *,         *,         *,          *,
+               "*,         *,         *,         *,         *,          *,
+                20,        *,         *,         *,         *,          *,
                 *,         *,         *,         *,         *,          *,
-                *,         8,         *,         *,         *,          *,
-                *,         *")
+                *,         *,         8,         *,         *,          *,
+                *,         *,         *")
    (set_attr "isa"
-               "*,         *,         *,         *,         *,          *,
-                *,         *,         *,         p9v,       p7v,        p9v,
-                p7v,       *,         p9v,       p9v,       p7v,        *,
-                *,         p7v,       p7v,       *,         *,          *,
-                p8v,       p8v")])
+               "*,         *,         *,         *,         *,          fut,
+                *,         *,         *,         *,         p9v,        p7v,
+                p9v,       p7v,       *,         p9v,       p9v,        p7v,
+                *,         *,         p7v,       p7v,       *,          *,
+                *,         p8v,       p8v")])
 
 ; Some DImode loads are best done as a load of -1 followed by a mask
-; instruction.
+; instruction.  On systems that support the PADDI (PLI) instruction,
+; num_insns_constant returns 1, so this splitter is not used for constants
+; that can be loaded with PLI.
 (define_split
   [(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
 	(match_operand:DI 1 "const_int_operand"))]
@@ -8980,7 +9004,8 @@  (define_insn "*mov<mode>_ppc64"
   return rs6000_output_move_128bit (operands);
 }
   [(set_attr "type" "store,store,load,load,*,*")
-   (set_attr "length" "8")])
+   (set_attr "non_prefixed_length" "8,8,8,8,8,40")
+   (set_attr "prefixed_length" "20,20,20,20,8,40")])
 
 (define_split
   [(set (match_operand:TI2 0 "int_reg_operand")
@@ -11497,9 +11522,25 @@  (define_insn "stack_protect_setsi"
   [(set_attr "type" "three")
    (set_attr "length" "12")])
 
-(define_insn "stack_protect_setdi"
-  [(set (match_operand:DI 0 "memory_operand" "=Y")
-	(unspec:DI [(match_operand:DI 1 "memory_operand" "Y")] UNSPEC_SP_SET))
+(define_expand "stack_protect_setdi"
+  [(parallel [(set (match_operand:DI 0 "memory_operand")
+		   (unspec:DI [(match_operand:DI 1 "memory_operand")]
+		   UNSPEC_SP_SET))
+	      (set (match_scratch:DI 2)
+		   (const_int 0))])]
+  "TARGET_64BIT"
+{
+  if (TARGET_PREFIXED_ADDR)
+    {
+      operands[0] = make_memory_non_prefixed (operands[0]);
+      operands[1] = make_memory_non_prefixed (operands[1]);
+    }
+})
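The stack_protect_setdi expander rewrites both memory operands with make_memory_non_prefixed when TARGET_PREFIXED_ADDR is set, so the LD/STD sequence in *stack_protect_setdi never needs a prefixed form. As an illustration only, the sketch classifies a base-plus-offset DImode address: LD and STD are DS-form, so the offset must be a signed 16-bit multiple of 4; an offset that only fits in 34 bits, or a PC-relative address, would need PLD/PSTD; the non-prefixed fallback is to compute the address into a register and use the indexed LDX/STDX form, as the commit message describes. The enum and function names are hypothetical, and the body of make_memory_non_prefixed is not part of this hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address forms for a DImode GPR load or store.  */
enum di_addr_form { FORM_DS, FORM_PREFIXED, FORM_INDEXED };

static enum di_addr_form
classify_di_address (int64_t offset, bool pcrel, bool have_prefixed)
{
  /* LD/STD (DS-form): signed 16-bit offset with the low two bits zero.  */
  if (!pcrel && offset >= -0x8000 && offset <= 0x7fff && (offset & 3) == 0)
    return FORM_DS;

  /* PLD/PSTD: signed 34-bit offset, or a PC-relative address.  */
  if (have_prefixed && offset >= -(INT64_C (1) << 33)
      && offset < (INT64_C (1) << 33))
    return FORM_PREFIXED;

  /* Otherwise put the address in a register and use LDX/STDX.  */
  return FORM_INDEXED;
}

/* ASSUMPTION: models the intent of make_memory_non_prefixed, i.e. any
   address that would need a prefixed insn is turned into indexed form.  */
static enum di_addr_form
classify_non_prefixed (int64_t offset, bool pcrel)
{
  enum di_addr_form f = classify_di_address (offset, pcrel, true);
  return f == FORM_PREFIXED ? FORM_INDEXED : f;
}
```

Under this model, an offset of 0x100000 would use PLD/PSTD in ordinary code, while the stack-protect patterns fall back to the indexed form accepted by the new `YZ` constraints and non_prefixed_mem_operand.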
+
+(define_insn "*stack_protect_setdi"
+  [(set (match_operand:DI 0 "non_prefixed_mem_operand" "=YZ")
+	(unspec:DI [(match_operand:DI 1 "non_prefixed_mem_operand" "YZ")]
+		   UNSPEC_SP_SET))
    (set (match_scratch:DI 2 "=&r") (const_int 0))]
   "TARGET_64BIT"
   "ld%U1%X1 %2,%1\;std%U0%X0 %2,%0\;li %2,0"
@@ -11543,10 +11584,27 @@  (define_insn "stack_protect_testsi"
    lwz%U1%X1 %3,%1\;lwz%U2%X2 %4,%2\;cmplw %0,%3,%4\;li %3,0\;li %4,0"
   [(set_attr "length" "16,20")])
 
-(define_insn "stack_protect_testdi"
+(define_expand "stack_protect_testdi"
+  [(parallel [(set (match_operand:CCEQ 0 "cc_reg_operand")
+		   (unspec:CCEQ [(match_operand:DI 1 "memory_operand")
+				 (match_operand:DI 2 "memory_operand")]
+				UNSPEC_SP_TEST))
+	      (set (match_scratch:DI 4)
+		   (const_int 0))
+	      (clobber (match_scratch:DI 3))])]
+  "TARGET_64BIT"
+{
+  if (TARGET_PREFIXED_ADDR)
+    {
+      operands[1] = make_memory_non_prefixed (operands[1]);
+      operands[2] = make_memory_non_prefixed (operands[2]);
+    }
+})
+
+(define_insn "*stack_protect_testdi"
   [(set (match_operand:CCEQ 0 "cc_reg_operand" "=x,?y")
-        (unspec:CCEQ [(match_operand:DI 1 "memory_operand" "Y,Y")
-		      (match_operand:DI 2 "memory_operand" "Y,Y")]
+        (unspec:CCEQ [(match_operand:DI 1 "non_prefixed_mem_operand" "YZ,YZ")
+		      (match_operand:DI 2 "non_prefixed_mem_operand" "YZ,YZ")]
 		     UNSPEC_SP_TEST))
    (set (match_scratch:DI 4 "=r,r") (const_int 0))
    (clobber (match_scratch:DI 3 "=&r,&r"))]
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 274864)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -1149,10 +1149,30 @@  (define_insn "vsx_mov<mode>_64bit"
                "vecstore,  vecload,   vecsimple, mffgpr,    mftgpr,    load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
                 vecsimple, *,         *,         vecstore,  vecload")
-   (set_attr "length"
-               "*,         *,         *,         8,         *,         8,
-                8,         8,         8,         8,         *,         *,
-                *,         20,        8,         *,         *")
+   (set (attr "non_prefixed_length")
+	(cond [(and (eq_attr "alternative" "4")		;; MTVSRDD
+		    (match_test "TARGET_P9_VECTOR"))
+	       (const_string "4")
+
+	       (eq_attr "alternative" "3,4")		;; GPR <-> VSX
+	       (const_string "8")
+
+	       (eq_attr "alternative" "5,6,7,8")	;; GPR load/store
+	       (const_string "8")]
+	      (const_string "*")))
+
+   (set (attr "prefixed_length")
+	(cond [(and (eq_attr "alternative" "4")		;; MTVSRDD
+		    (match_test "TARGET_P9_VECTOR"))
+	       (const_string "4")
+
+	       (eq_attr "alternative" "3,4")		;; GPR <-> VSX
+	       (const_string "8")
+
+	       (eq_attr "alternative" "5,6,7,8")	;; GPR load/store
+	       (const_string "20")]
+	      (const_string "*")))
+
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,