diff mbox series

, V4, patch #4.1: Enable prefixed/pc-rel addressing (revised)

Message ID 20190930141254.GC5390@ibm-tinman.the-meissners.org
State New
Headers show
Series , V4, patch #4.1: Enable prefixed/pc-rel addressing (revised) | expand

Commit Message

Michael Meissner Sept. 30, 2019, 2:12 p.m. UTC
As we discussed privately on Friday, I had a few nits to patch #4 that came up
after I submitted the patch.  The changes between that patch and this patch are:

1) I needed to change the enum non_prefixed_form to its new spelling in the
revised patch #1 that was committed.

2) In building glibc for 'future', we discovered the stack protector insns did
not use a constraint that prevented the register allocator from creating a
prefixed address.  Even though the predicate did not allow prefixed
instructions, the compiler still generated the address due to the constraint.
The previous patch used 'ZY', but 'Y' now allows prefixed addresses.  I needed
to add a second memory constraint ('eM') that prevents using a prefixed
address.

3) The function rs6000_adjust_vec_address did not have the optimizations I had
in my private branch that allowed folding extracting a constant element from a
vector in memory into a single memory instruction.  I have put those
optimizations into this patch.

4) In the previous patch, I missed setting the prefixed size and non prefixed
size for mov<mode>_ppc64 in the insn.  This pattern is used for moving PTImode
in GPR registers (on non-VSX systems, it would move TImode also).  By the time
it gets to final, it will have been split, but it is still useful to get the
sizes correct before the mode is split.

Note, patches #5-8 do not need any modification (other than adjusting the line
numbers, which patch does).  As I stated in other mail, patches #1-3 have been
checked in.

I have done a bootstrap/make check on a little endian power8 system and there
were no regressions.  I have also built both Spec 2017 rate and Spec 2006 CPU
benchmarks for each of -mcpu=power8, -mcpu=power9, and -mcpu=future, and there
were no build failures.

Can I check this into the trunk?

2019-09-30  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/constraints.md (em constraint): New constraint for
	non PC-relative memory.
	(eM constraint): New constraint for non-prefixed memory.
	* config/rs6000/predicates.md (lwa_operand): Allow odd offsets if
	we have prefixed addressing.
	(non_prefixed_memory): New predicate.
	(non_pcrel_memory): New predicate.
	(reg_or_non_pcrel_memory): New predicate.
	* config/rs6000/rs6000-protos.h (make_memory_non_prefixed): New
	declaration.
	* config/rs6000/rs6000.c (rs6000_adjust_vec_address): Optimize
	PC-relative addresses with constant offsets.  Signal an error if
	we have a PC-relative address and a variable offset.  Use the
	SIGNED_16BIT_OFFSET_P macro.
	(rs6000_split_vec_extract_var): Signal an error if we have a
	PC-relative address and a variable offset.
	(quad_address_p): Add support for prefixed addresses.
	(mem_operand_gpr): Add support for prefixed addresses.
	(mem_operand_ds_form): Add support for prefixed addresses.
	(rs6000_legitimate_offset_address_p): Add support for prefixed
	addresses.
	(rs6000_legitimate_address_p): Add support for prefixed
	addresses.
	(rs6000_mode_dependent_address): Add support for prefixed
	addresses.
	(rs6000_num_insns): New helper function.
	(rs6000_insn_cost): Treat prefixed instructions as having the same
	cost as non prefixed instructions, even though the prefixed
	instructions are larger.
	(make_memory_non_prefixed): New function to make a non-prefixed
	memory operand.
	* config/rs6000/rs6000.md (mov<mode>_64bit_dm): Add support for
	prefixed addresses.
	(movtd_64bit_nodm): Add support for prefixed addresses.
	(mov<mode>_ppc64): Add support for prefixed addresses.
	(stack_protect_setdi): Convert prefixed addresses to non-prefixed
	addresses.  Allow for indexed addressing as well as offsettable.
	(stack_protect_testdi): Convert prefixed addresses to non-prefixed
	addresses.  Allow for indexed addressing as well as offsettable.
	* config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
	prefixed addresses.
	(vsx_extract_<mode>_var, VSX_D iterator): Do not allow a vector in
	memory with a prefixed address to combine with variable offsets.
	(vsx_extract_v4sf_var): Do not allow a vector in memory with a
	prefixed address to combine with variable offsets.
	(vsx_extract_<mode>_var, VSX_EXTRACT_I iterator): Do not allow a
	vector in memory with a prefixed address to combine with variable
	offsets.
	(vsx_extract_<mode>_<VS_scalar>mode_var): Do not allow a vector in
	memory with a prefixed address to combine with variable offsets.
	* doc/md.texi (PowerPC constraints): Document the 'em' and 'eM'
	constraints.

Comments

Segher Boessenkool Oct. 1, 2019, 11:56 p.m. UTC | #1
Hi Mike,

On Mon, Sep 30, 2019 at 10:12:54AM -0400, Michael Meissner wrote:
> I needed
> to add a second memory constraint ('eM') that prevents using a prefixed
> address.

Do we need both em and eM?  Do we ever want to allow prefixed insns but not
pcrel?  Or, alternatively, this only uses eM for some insns where the "p"
prefix trick won't work (because there are multiple insns in the template);
we could solve that some other way (by inserting the p's manually for
example).

But what should inline asm users do wrt prefixed/pcrel memory?  Should
they not use prefixed memory at all?  That means for asm we should always
disallow prefixed for "m".

Having both em and eM names is a bit confusing (which is what?)  The eM
one should not be documented in the user manual, probably.

Maybe just using wY here will work just as well?  That is also not ideal
of course, but we already have that one anyway.

> 4) In the previous patch, I missed setting the prefixed size and non prefixed
> size for mov<mode>_ppc64 in the insn.  This pattern is used for moving PTImode
> in GPR registers (on non-VSX systems, it would move TImode also).  By the time
> it gets to final, it will have been split, but it is still useful to get the
> sizes correct before the mode is split.

So that is a separate patch?  Please send it as one, then?

> +  /* The LWA instruction uses the DS-form format where the bottom two bits of
> +     the offset must be 0.  The prefixed PLWA does not have this
> +     restriction.  */
> +  if (TARGET_PREFIXED_ADDR
> +      && address_is_prefixed (addr, DImode, NON_PREFIXED_DS))
> +    return true;

Should TARGET_PREFIXED_ADDR be part of address_is_prefixed, instead of
part of all its callers?

> +;; Return 1 if op is a memory operand that is not prefixed.
> +(define_predicate "non_prefixed_memory"
> +  (match_code "mem")
> +{
> +  if (!memory_operand (op, mode))
> +    return false;
> +
> +  enum insn_form iform
> +    = address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> +
> +  return (iform != INSN_FORM_PREFIXED_NUMERIC
> +          && iform != INSN_FORM_PCREL_LOCAL
> +          && iform != INSN_FORM_BAD);
> +})
> +
> +(define_predicate "non_pcrel_memory"
> +  (match_code "mem")
> +{
> +  if (!memory_operand (op, mode))
> +    return false;
> +
> +  enum insn_form iform
> +    = address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> +
> +  return (iform != INSN_FORM_PCREL_EXTERNAL
> +          && iform != INSN_FORM_PCREL_LOCAL
> +          && iform != INSN_FORM_BAD);
> +})

Why does non_prefixed_memory not check INSN_FORM_PCREL_EXTERNAL?  Why does
non_prefixed_memory not use non_pcrel_memory, instead of open-coding it?

What is INSN_FORM_BAD about, in both functions?

> +;; Return 1 if op is either a register operand or a memory operand that does
> +;; not use a PC-relative address.
> +(define_predicate "reg_or_non_pcrel_memory"
> +  (match_code "reg,subreg,mem")
> +{
> +  if (REG_P (op) || SUBREG_P (op))
> +    return register_operand (op, mode);
> +
> +  return non_pcrel_memory (op, mode);
> +})

Why do we need this predicate?  Should it use register_operand like this,
or should it use gpc_reg_operand?

> +  bool pcrel_p = TARGET_PCREL && pcrel_local_address (addr, Pmode);

Similar as above: should TARGET_PCREL be part of pcrel_local_address?

> @@ -6860,6 +6904,12 @@ rs6000_split_vec_extract_var (rtx dest,
>       systems.  */
>    if (MEM_P (src))
>      {
> +      /* If this is a PC-relative address, we would need another register to
> +	 hold the address of the vector along with the variable offset.  The
> +	 callers should use reg_or_non_pcrel_memory to make sure we don't
> +	 get a PC-relative address here.  */

I don't understand this comment, nor the problem.  Please expand?

> +  /* Allow prefixed instructions if supported.  If the bottom two bits of the
> +     offset are non-zero, we could use a prefixed instruction (which does not
> +     have the DS-form constraint that the traditional instruction had) instead
> +     of forcing the unaligned offset to a GPR.  */
> +  if (address_is_prefixed (addr, mode, NON_PREFIXED_DS))
> +    return true;

Here (and for DQ) you aren't testing TARGET_PREFIXED?

> @@ -7371,7 +7435,7 @@ mem_operand_gpr (rtx op, machine_mode mo
>         causes a wrap, so test only the low 16 bits.  */
>      offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
>  
> -  return offset + 0x8000 < 0x10000u - extra;
> +  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);

This is a separate patch (and pre-approved).

> @@ -7404,7 +7475,7 @@ mem_operand_ds_form (rtx op, machine_mod
>         causes a wrap, so test only the low 16 bits.  */
>      offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
>  
> -  return offset + 0x8000 < 0x10000u - extra;
> +  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);

Together with this.

> -  offset += 0x8000;
> -  return offset < 0x10000 - extra;
> +  if (TARGET_PREFIXED_ADDR)
> +    return SIGNED_34BIT_OFFSET_EXTRA_P (offset, extra);
> +  else
> +    return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);

And the 16-bit part of this.

> -	  unsigned HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
> -	  return val + 0x8000 >= 0x10000 - (TARGET_POWERPC64 ? 8 : 12);
> +	  HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
> +	  HOST_WIDE_INT extra = TARGET_POWERPC64 ? 8 : 12;
> +	  if (TARGET_PREFIXED_ADDR)
> +	    return !SIGNED_34BIT_OFFSET_EXTRA_P (val, extra);
> +	  else
> +	    return !SIGNED_16BIT_OFFSET_EXTRA_P (val, extra);

And this.

> +/* How many real instructions are generated for this insn?  This is slightly
> +   different from the length attribute, in that the length attribute counts the
> +   number of bytes.  With prefixed instructions, we don't want to count a
> +   prefixed instruction (length 12 bytes including possible NOP) as taking 3
> +   instructions, but just one.  */
> +
> +static int
> +rs6000_num_insns (rtx_insn *insn)

Separate patch please.  This is a whole different issue.

> --- gcc/config/rs6000/rs6000.md	(revision 276284)
> +++ gcc/config/rs6000/rs6000.md	(working copy)
> @@ -7774,8 +7774,9 @@ (define_insn_and_split "*mov<mode>_64bit
>    "&& reload_completed"
>    [(pc)]
>  { rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
> -  [(set_attr "length" "8")
> -   (set_attr "isa" "*,*,*,*,*,*,*,*,p8v,p8v")])
> +  [(set_attr "isa" "*,*,*,*,*,*,*,*,p8v,p8v")
> +   (set_attr "non_prefixed_length" "8")
> +   (set_attr "prefixed_length" "20")])

This one is fine.

> @@ -7786,8 +7787,12 @@ (define_insn_and_split "*movtd_64bit_nod
>    "#"
>    "&& reload_completed"
>    [(pc)]
> -{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
> -  [(set_attr "length" "8,8,8,12,12,8")])
> +{
> +  rs6000_split_multireg_move (operands[0], operands[1]);
> +  DONE;
> +}
> +  [(set_attr "non_prefixed_length" "8")
> +   (set_attr "prefixed_length" "20")])

But this one I don't get...  Various alternatives were 12 before, what
changed?  Or was it wrong?

> @@ -8984,7 +8989,8 @@ (define_insn "*mov<mode>_ppc64"
>    return rs6000_output_move_128bit (operands);
>  }
>    [(set_attr "type" "store,store,load,load,*,*")
> -   (set_attr "length" "8")])
> +   (set_attr "non_prefixed_length" "8,8,8,8,8,40")
> +   (set_attr "prefixed_length" "20,20,20,20,8,40")])

What about the 40 here?

> -(define_insn "stack_protect_setdi"
> -  [(set (match_operand:DI 0 "memory_operand" "=Y")
> -	(unspec:DI [(match_operand:DI 1 "memory_operand" "Y")] UNSPEC_SP_SET))
> +(define_expand "stack_protect_setdi"
> +  [(parallel [(set (match_operand:DI 0 "memory_operand")
> +		   (unspec:DI [(match_operand:DI 1 "memory_operand")]
> +		   UNSPEC_SP_SET))
> +	      (set (match_scratch:DI 2)
> +		   (const_int 0))])]
> +  "TARGET_64BIT"
> +{
> +  if (TARGET_PREFIXED_ADDR)
> +    {
> +      operands[0] = make_memory_non_prefixed (operands[0]);
> +      operands[1] = make_memory_non_prefixed (operands[1]);
> +    }
> +})

I don't understand why this is needed...  Won't a better predicate do
this easier, safer, simpler?  Better than memory_operand, which of course
allows pretty much anything.

> @@ -1149,10 +1149,30 @@ (define_insn "vsx_mov<mode>_64bit"
>                 "vecstore,  vecload,   vecsimple, mffgpr,    mftgpr,    load,
>                  store,     load,      store,     *,         vecsimple, vecsimple,
>                  vecsimple, *,         *,         vecstore,  vecload")
> -   (set_attr "length"
> -               "*,         *,         *,         8,         *,         8,
> -                8,         8,         8,         8,         *,         *,
> -                *,         20,        8,         *,         *")
> +   (set (attr "non_prefixed_length")
> +	(cond [(and (eq_attr "alternative" "4")		;; MTVSRDD
> +		    (match_test "TARGET_P9_VECTOR"))
> +	       (const_string "4")
> +
> +	       (eq_attr "alternative" "3,4")		;; GPR <-> VSX
> +	       (const_string "8")

TARGET_P9_VECTOR is always true for alt 4 (it uses the "we" constraint).
And exactly the same is true for alt 3?  And MTVSRDD has nothing to do
with it?

> +
> +	       (eq_attr "alternative" "5,6,7,8")	;; GPR load/store
> +	       (const_string "8")]
> +	      (const_string "*")))

This loses that alt 13 was len 20, what happened there?  And alts 9 and 14?

> @@ -3235,9 +3255,10 @@ (define_insn "vsx_vslo_<mode>"
>  ;; Variable V2DI/V2DF extract
>  (define_insn_and_split "vsx_extract_<mode>_var"
>    [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v,wa,r")
> -	(unspec:<VS_scalar> [(match_operand:VSX_D 1 "input_operand" "v,m,m")
> -			     (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
> -			    UNSPEC_VSX_EXTRACT))
> +	(unspec:<VS_scalar>
> +	 [(match_operand:VSX_D 1 "reg_or_non_pcrel_memory" "v,em,em")
> +	  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
> +	 UNSPEC_VSX_EXTRACT))

Please explain why this needs to be non-pcrel (and split this to a separate
patch if possible).


Segher
Michael Meissner Oct. 2, 2019, 7:04 p.m. UTC | #2
On Tue, Oct 01, 2019 at 06:56:01PM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Mon, Sep 30, 2019 at 10:12:54AM -0400, Michael Meissner wrote:
> > I needed
> > to add a second memory constraint ('eM') that prevents using a prefixed
> > address.
> 
> Do we need both em and eM?  Do we ever want to allow prefixed insns but not
> pcrel?  Or, alternatively, this only uses eM for some insns where the "p"
> prefix trick won't work (because there are multiple insns in the template);
> we could solve that some other way (by inserting the p's manually for
> example).

No right now we need one (no prefix) and the other (no pcrel) is desirable, but
if we can only have one, it will mean extra instructions being generated.

In the case of no prefix, we need this for stack_protect_testdi and
stack_protect_setdi.  There if we have a large stack frame and enable stack
protection, we don't want the register allocator from re-combining the insns to
an insn with a large offset, which it will do, even though the predicate does
not allow this case.  This was discovered when we tried to build glibc, and one
module (vfwprintf-internal.c) has a large stack frame and is built with
-fstack-protector-all.

In the case of no pc-rel, this occurs in optimizing vector extracts where the
vector is in memory.  In this code, we only have one temporary base register.
In the case where the address is PC-relative, and the element number being
extracted is variable, we would need two temporary base registers, one to hold
the PC-relative address, and the other to hold the offset from the start of the
vector.  So here, we disallow PC-relative addresses, but numeric addresses that
result in a prefixed instruction are fine, since the code calculates the
offset, adds in the offset, and then does the memory operation.

Aaron found this bug where it previously tried to use the single temporary with
two different uses.

If we only had a no prefixed constraint, the code would not combine the vector
extract from memory, and instead, it would load the whole vector into a
register, and then do the appropriate shifts, VSLO, etc. to extract the
element.  I imagine the case shows up when you have large stack frames (or very
large structures).

> But what should inline asm users do wrt prefixed/pcrel memory?  Should
> they not use prefixed memory at all?  That means for asm we should always
> disallow prefixed for "m".

Yes, I've been thinking that for some time.  But I'm not going to worry about
that until the patches are in.

> Having both em and eM names is a bit confusing (which is what?)  The eM
> one should not be documented in the user manual, probably.
> 
> Maybe just using wY here will work just as well?  That is also not ideal
> of course, but we already have that one anyway.

Well, wY does allow prefixed addresses (because it calls mem_operand_ds_form
which also was modified to support prefixed addresses), so it wouldn't help.

> > 4) In the previous patch, I missed setting the prefixed size and non prefixed
> > size for mov<mode>_ppc64 in the insn.  This pattern is used for moving PTImode
> > in GPR registers (on non-VSX systems, it would move TImode also).  By the time
> > it gets to final, it will have been split, but it is still useful to get the
> > sizes correct before the mode is split.
> 
> So that is a separate patch?  Please send it as one, then?

No, it needs to be part of this patch.  It was just missing from the patch I
sent out.

> > +  /* The LWA instruction uses the DS-form format where the bottom two bits of
> > +     the offset must be 0.  The prefixed PLWA does not have this
> > +     restriction.  */
> > +  if (TARGET_PREFIXED_ADDR
> > +      && address_is_prefixed (addr, DImode, NON_PREFIXED_DS))
> > +    return true;
> 
> Should TARGET_PREFIXED_ADDR be part of address_is_prefixed, instead of
> part of all its callers?

Just trying to do the test before the call, so the default case (not 'future')
won't have the call/return.

> > +;; Return 1 if op is a memory operand that is not prefixed.
> > +(define_predicate "non_prefixed_memory"
> > +  (match_code "mem")
> > +{
> > +  if (!memory_operand (op, mode))
> > +    return false;
> > +
> > +  enum insn_form iform
> > +    = address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> > +
> > +  return (iform != INSN_FORM_PREFIXED_NUMERIC
> > +          && iform != INSN_FORM_PCREL_LOCAL
> > +          && iform != INSN_FORM_BAD);
> > +})
> > +
> > +(define_predicate "non_pcrel_memory"
> > +  (match_code "mem")
> > +{
> > +  if (!memory_operand (op, mode))
> > +    return false;
> > +
> > +  enum insn_form iform
> > +    = address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> > +
> > +  return (iform != INSN_FORM_PCREL_EXTERNAL
> > +          && iform != INSN_FORM_PCREL_LOCAL
> > +          && iform != INSN_FORM_BAD);
> > +})
> 
> Why does non_prefixed_memory not check INSN_FORM_PCREL_EXTERNAL?  Why does
> non_prefixed_memory not use non_pcrel_memory, instead of open-coding it?

Again, I'm trying not to have the extra call/return in the pipeline.

> What is INSN_FORM_BAD about, in both functions?

Because I think it is a bad idea to return true in the case where the memory is
not correct (for example, if somebody created an address with 35-bit offsets,
the address_to_insn_form would return INSN_FORM_BAD, but you would get a false
positive if non_pcrel_memory returned true for it).

> > +;; Return 1 if op is either a register operand or a memory operand that does
> > +;; not use a PC-relative address.
> > +(define_predicate "reg_or_non_pcrel_memory"
> > +  (match_code "reg,subreg,mem")
> > +{
> > +  if (REG_P (op) || SUBREG_P (op))
> > +    return register_operand (op, mode);
> > +
> > +  return non_pcrel_memory (op, mode);
> > +})
> 
> Why do we need this predicate?  Should it use register_operand like this,
> or should it use gpc_reg_operand?

Again, just micro-optimizing.

> > +  bool pcrel_p = TARGET_PCREL && pcrel_local_address (addr, Pmode);
> 
> Similar as above: should TARGET_PCREL be part of pcrel_local_address?

Sure it can.

> > @@ -6860,6 +6904,12 @@ rs6000_split_vec_extract_var (rtx dest,
> >       systems.  */
> >    if (MEM_P (src))
> >      {
> > +      /* If this is a PC-relative address, we would need another register to
> > +	 hold the address of the vector along with the variable offset.  The
> > +	 callers should use reg_or_non_pcrel_memory to make sure we don't
> > +	 get a PC-relative address here.  */
> 
> I don't understand this comment, nor the problem.  Please expand?

If you have:

	static vector double v;

	double
	get_v_n (size_t n)
	{
	  return vec_extract (v, n);
	}

It calls rs6000_split_vec_extract_var with:

   dest		= (mem:V2DF (SYMBOL_REF:DImode "v"))
   element	= (reg:DI <xxx>)
   tmp_gpr	= (reg:DI <yyy>)
   tmp_altivec	= (scratch:V2DF)

So to turn this into a (MEM:DF ...) to access the element, it needs to
calculate the offset from the beginning of the vector into tmp_gpr.  But since
the hardware doesn't allow indexed PC-relative instructions, we have to load
the address into a register to do an indexed load.

Because that combination is not allowed, the compiler then loads up the address
into a temporary register, and calls rs6000_split_vec_extract_var with:
	(MEM:V2DF (reg:DI <zzz>))

I.e.

        rldicl 3,3,0,63
        pla 9,.LANCHOR0@pcrel
        sldi 10,3,3
        lfdx 1,9,10


> > +  /* Allow prefixed instructions if supported.  If the bottom two bits of the
> > +     offset are non-zero, we could use a prefixed instruction (which does not
> > +     have the DS-form constraint that the traditional instruction had) instead
> > +     of forcing the unaligned offset to a GPR.  */
> > +  if (address_is_prefixed (addr, mode, NON_PREFIXED_DS))
> > +    return true;
> 
> Here (and for DQ) you aren't testing TARGET_PREFIXED?

Probably it should be (or remove the tests elsewhere).

> > @@ -7371,7 +7435,7 @@ mem_operand_gpr (rtx op, machine_mode mo
> >         causes a wrap, so test only the low 16 bits.  */
> >      offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
> >  
> > -  return offset + 0x8000 < 0x10000u - extra;
> > +  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
> 
> This is a separate patch (and pre-approved).
> 
> > @@ -7404,7 +7475,7 @@ mem_operand_ds_form (rtx op, machine_mod
> >         causes a wrap, so test only the low 16 bits.  */
> >      offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
> >  
> > -  return offset + 0x8000 < 0x10000u - extra;
> > +  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
> 
> Together with this.
> 
> > -  offset += 0x8000;
> > -  return offset < 0x10000 - extra;
> > +  if (TARGET_PREFIXED_ADDR)
> > +    return SIGNED_34BIT_OFFSET_EXTRA_P (offset, extra);
> > +  else
> > +    return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
> 
> And the 16-bit part of this.
>
> > -	  unsigned HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
> > -	  return val + 0x8000 >= 0x10000 - (TARGET_POWERPC64 ? 8 : 12);
> > +	  HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
> > +	  HOST_WIDE_INT extra = TARGET_POWERPC64 ? 8 : 12;
> > +	  if (TARGET_PREFIXED_ADDR)
> > +	    return !SIGNED_34BIT_OFFSET_EXTRA_P (val, extra);
> > +	  else
> > +	    return !SIGNED_16BIT_OFFSET_EXTRA_P (val, extra);
> 
> And this.

Ok.

> > +/* How many real instructions are generated for this insn?  This is slightly
> > +   different from the length attribute, in that the length attribute counts the
> > +   number of bytes.  With prefixed instructions, we don't want to count a
> > +   prefixed instruction (length 12 bytes including possible NOP) as taking 3
> > +   instructions, but just one.  */
> > +
> > +static int
> > +rs6000_num_insns (rtx_insn *insn)
> 
> Separate patch please.  This is a whole different issue.

Ok.

> > --- gcc/config/rs6000/rs6000.md	(revision 276284)
> > +++ gcc/config/rs6000/rs6000.md	(working copy)
> > @@ -7774,8 +7774,9 @@ (define_insn_and_split "*mov<mode>_64bit
> >    "&& reload_completed"
> >    [(pc)]
> >  { rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
> > -  [(set_attr "length" "8")
> > -   (set_attr "isa" "*,*,*,*,*,*,*,*,p8v,p8v")])
> > +  [(set_attr "isa" "*,*,*,*,*,*,*,*,p8v,p8v")
> > +   (set_attr "non_prefixed_length" "8")
> > +   (set_attr "prefixed_length" "20")])
> 
> This one is fine.
> 
> > @@ -7786,8 +7787,12 @@ (define_insn_and_split "*movtd_64bit_nod
> >    "#"
> >    "&& reload_completed"
> >    [(pc)]
> > -{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
> > -  [(set_attr "length" "8,8,8,12,12,8")])
> > +{
> > +  rs6000_split_multireg_move (operands[0], operands[1]);
> > +  DONE;
> > +}
> > +  [(set_attr "non_prefixed_length" "8")
> > +   (set_attr "prefixed_length" "20")])
> 
> But this one I don't get...  Various alternatives were 12 before, what
> changed?  Or was it wrong?

As far as I can tell it was wrong.  But I can add 4 to the values if you think
it might be needed.

> > @@ -8984,7 +8989,8 @@ (define_insn "*mov<mode>_ppc64"
> >    return rs6000_output_move_128bit (operands);
> >  }
> >    [(set_attr "type" "store,store,load,load,*,*")
> > -   (set_attr "length" "8")])
> > +   (set_attr "non_prefixed_length" "8,8,8,8,8,40")
> > +   (set_attr "prefixed_length" "20,20,20,20,8,40")])
> 
> What about the 40 here?

Dunno.  The original code just used 8, which is wrong.  I was trying to guess
what the maximum # of insns to load up a 128-bit value in two GPRs was.  I can
certainly use 8 like we have now.  Fortunately for the time it matters, it will
have been split and then the normal counts for loading up a single register
will be correct.

> > -(define_insn "stack_protect_setdi"
> > -  [(set (match_operand:DI 0 "memory_operand" "=Y")
> > -	(unspec:DI [(match_operand:DI 1 "memory_operand" "Y")] UNSPEC_SP_SET))
> > +(define_expand "stack_protect_setdi"
> > +  [(parallel [(set (match_operand:DI 0 "memory_operand")
> > +		   (unspec:DI [(match_operand:DI 1 "memory_operand")]
> > +		   UNSPEC_SP_SET))
> > +	      (set (match_scratch:DI 2)
> > +		   (const_int 0))])]
> > +  "TARGET_64BIT"
> > +{
> > +  if (TARGET_PREFIXED_ADDR)
> > +    {
> > +      operands[0] = make_memory_non_prefixed (operands[0]);
> > +      operands[1] = make_memory_non_prefixed (operands[1]);
> > +    }
> > +})
> 
> I don't understand why this is needed...  Won't a better predicate do
> this easier, safer, simpler?  Better than memory_operand, which of course
> allows pretty much anything.

For the define_expand, you need it to be more general, so that you can call
make_memory_non_prefixed.

> > @@ -1149,10 +1149,30 @@ (define_insn "vsx_mov<mode>_64bit"
> >                 "vecstore,  vecload,   vecsimple, mffgpr,    mftgpr,    load,
> >                  store,     load,      store,     *,         vecsimple, vecsimple,
> >                  vecsimple, *,         *,         vecstore,  vecload")
> > -   (set_attr "length"
> > -               "*,         *,         *,         8,         *,         8,
> > -                8,         8,         8,         8,         *,         *,
> > -                *,         20,        8,         *,         *")
> > +   (set (attr "non_prefixed_length")
> > +	(cond [(and (eq_attr "alternative" "4")		;; MTVSRDD
> > +		    (match_test "TARGET_P9_VECTOR"))
> > +	       (const_string "4")
> > +
> > +	       (eq_attr "alternative" "3,4")		;; GPR <-> VSX
> > +	       (const_string "8")
> 
> TARGET_P9_VECTOR is always true for alt 4 (it uses the "we" constraint).
> And exactly the same is true for alt 3?  And MTVSRDD has nothing to do
> with it?

I thought it was P8 code, not P9.  On P8, you would have to do two MTVSRD's and
a combine insn, which you can't express in a normal move.

> 
> > +
> > +	       (eq_attr "alternative" "5,6,7,8")	;; GPR load/store
> > +	       (const_string "8")]
> > +	      (const_string "*")))
> 
> This loses that alt 13 was len 20, what happened there?  And alts 9 and 14?
> 
> > @@ -3235,9 +3255,10 @@ (define_insn "vsx_vslo_<mode>"
> >  ;; Variable V2DI/V2DF extract
> >  (define_insn_and_split "vsx_extract_<mode>_var"
> >    [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v,wa,r")
> > -	(unspec:<VS_scalar> [(match_operand:VSX_D 1 "input_operand" "v,m,m")
> > -			     (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
> > -			    UNSPEC_VSX_EXTRACT))
> > +	(unspec:<VS_scalar>
> > +	 [(match_operand:VSX_D 1 "reg_or_non_pcrel_memory" "v,em,em")
> > +	  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
> > +	 UNSPEC_VSX_EXTRACT))
> 
> Please explain why this needs to be non-pcrel (and split this to a separate
> patch if possible).

See above.  If is is a separate patch, it will at least need to have the
gcc_unreachable's to flag when it is used.
Segher Boessenkool Oct. 2, 2019, 10:52 p.m. UTC | #3
On Wed, Oct 02, 2019 at 03:04:35PM -0400, Michael Meissner wrote:
> On Tue, Oct 01, 2019 at 06:56:01PM -0500, Segher Boessenkool wrote:
> > On Mon, Sep 30, 2019 at 10:12:54AM -0400, Michael Meissner wrote:
> > > I needed
> > > to add a second memory constraint ('eM') that prevents using a prefixed
> > > address.
> > 
> > Do we need both em and eM?  Do we ever want to allow prefixed insns but not
> > pcrel?  Or, alternatively, this only uses eM for some insns where the "p"
> > prefix trick won't work (because there are multiple insns in the template);
> > we could solve that some other way (by inserting the p's manually for
> > example).
> 
> No right now we need one (no prefix) and the other (no pcrel) is desirable, but
> if we can only have one, it will mean extra instructions being generated.

We can have both if we need to, but it should have less confusing names at
a minimum.

> In the case of no prefix, we need this for stack_protect_testdi and
> stack_protect_setdi.  There if we have a large stack frame and enable stack
> protection, we don't want the register allocator from re-combining the insns to
> an insn with a large offset, which it will do, even though the predicate does
> not allow this case.  This was discovered when we tried to build glibc, and one
> module (vfwprintf-internal.c) has a large stack frame and is built with
> -fstack-protector-all.

Yes, but why does it not allow prefixed insns anyway?  It does not currently
*handle* prefixed insns properly, but that can be fixed.  It won't be
pretty, but it won't be horrible either.

Anyway, we need to be able to handle non-prefixed anyway (for asm, as I
mentioned later), so yes we want a constraint for that, and have "m" in
inline asm mean that (just like right now it actually means "m but not
update form").

> In the case of no pc-rel, this occurs in optimizing vector extracts where the
> vector is in memory.  In this code, we only have one temporary base register.

Why?

> In the case where the address is PC-relative, and the element number being
> extracted is variable, we would need two temporary base registers, one to hold
> the PC-relative address, and the other to hold the offset from the start of the
> vector.  So here, we disallow PC-relative addresses, but numeric addresses that
> result in a prefixed instruction are fine, since the code calculates the
> offset, adds in the offset, and then does the memory operation.

So it should have more scratch registers here?

Or, alternatively, we can just disallow all prefixed addressing here?  Will
that really degrade anything?

> If we only had a no prefixed constraint, the code would not combine the vector
> extract from memory, and instead, it would load the whole vector into a
> register, and then do the appropriate shifts, VSLO, etc. to extract the
> element.  I imagine the case shows up when you have large stack frames (or very
> large structures).

I don't understand.

> > But what should inline asm users do wrt prefixed/pcrel memory?  Should
> > they not use prefixed memory at all?  That means for asm we should always
> > disallow prefixed for "m".
> 
> Yes, I've been thinking that for some time.  But I'm not going to worry about
> that until the patches are in.

Please do worry about it.

> > > 4) In the previous patch, I missed setting the prefixed size and non prefixed
> > > size for mov<mode>_ppc64 in the insn.  This pattern is used for moving PTImode
> > > in GPR registers (on non-VSX systems, it would move TImode also).  By the time
> > > it gets to final, it will have been split, but it is still useful to get the
> > > sizes correct before the mode is split.
> > 
> > So that is a separate patch?  Please send it as one, then?
> 
> No, it needs to be part of this patch.  It was just missing from the patch I
> sent out.

This patch does a whole lot of separate things.  It needs to be split up,
it took ages to review it like this.

> > > +  /* The LWA instruction uses the DS-form format where the bottom two bits of
> > > +     the offset must be 0.  The prefixed PLWA does not have this
> > > +     restriction.  */
> > > +  if (TARGET_PREFIXED_ADDR
> > > +      && address_is_prefixed (addr, DImode, NON_PREFIXED_DS))
> > > +    return true;
> > 
> > Should TARGET_PREFIXED_ADDR be part of address_is_prefixed, instead of
> > part of all its callers?
> 
> Just trying to do the test before the call, so the default case (not 'future')
> won't have the call/return.

Don't micro-optimise like this please, the compiler can do this better
than humans can.  And humans *can* forget it in places, which gives ICEs.

> > What is INSN_FORM_BAD about, in both functions?
> 
> Because I think it is a bad idea to return true in the case where the memory is
> not correct (for example, if somebody created an address with 35-bit offsets,
> the address_to_insn_form would return INSN_FORM_BAD, but you would get a false
> positive if non_pcrel_memory returned true for it).

Add an assert that this doesn't happen, if you are worried that it could.

> > > +;; Return 1 if op is either a register operand or a memory operand that does
> > > +;; not use a PC-relative address.
> > > +(define_predicate "reg_or_non_pcrel_memory"
> > > +  (match_code "reg,subreg,mem")
> > > +{
> > > +  if (REG_P (op) || SUBREG_P (op))
> > > +    return register_operand (op, mode);
> > > +
> > > +  return non_pcrel_memory (op, mode);
> > > +})
> > 
> > Why do we need this predicate?  Should it use register_operand like this,
> > or should it use gpc_reg_operand?
> 
> Again, just micro-optimizing.

Please don't?  Also, that doesn't answer the second question.

> > > @@ -6860,6 +6904,12 @@ rs6000_split_vec_extract_var (rtx dest,
> > >       systems.  */
> > >    if (MEM_P (src))
> > >      {
> > > +      /* If this is a PC-relative address, we would need another register to
> > > +	 hold the address of the vector along with the variable offset.  The
> > > +	 callers should use reg_or_non_pcrel_memory to make sure we don't
> > > +	 get a PC-relative address here.  */
> > 
> > I don't understand this comment, nor the problem.  Please expand?
> 
> If you have:
> 
> 	static vector double v;
> 
> 	double
> 	get_v_n (size_t n)
> 	{
> 	  return vec_extract (v, n);
> 	}
> 
> It calls rs6000_split_vec_extract_var with:
> 
>    dest		= (mem:V2DF (SYMBOL_REF:DImode "v"))
>    element	= (reg:DI <xxx>)
>    tmp_gpr	= (reg:DI <yyy>)
>    tmp_altivec	= (scratch:V2DF)
> 
> So to turn this into a (MEM:DF ...) to access the element, it needs to
> calculate the offset from the beginning of the vector into tmp_gpr.  But since
> the hardware doesn't allow indexed PC-relative instructions, we have to load
> the address into a register to do an indexed load.
> 
> Because that combination is not allowed, the compiler then loads up the address
> into a temporary register, and calls rs6000_split_vec_extract_var with:
> 	(MEM:V2DF (reg:DI <zzz>))
> 
> I.e.
> 
>         rldicl 3,3,0,63
>         pla 9,.LANCHOR0@pcrel
>         sldi 10,3,3
>         lfdx 1,9,10

Where is that rldicl generated?  You can solve this problem there (already
shift it there, saving an instruction as well!)

Or you can do
  sldi 3,3,3
  lfdx 1,9,3
  srdi 3,3,3
(and that last insn will be deleted later, if r3 isn't used after this).

> > > -{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
> > > -  [(set_attr "length" "8,8,8,12,12,8")])
> > > +{
> > > +  rs6000_split_multireg_move (operands[0], operands[1]);
> > > +  DONE;
> > > +}
> > > +  [(set_attr "non_prefixed_length" "8")
> > > +   (set_attr "prefixed_length" "20")])
> > 
> > But this one I don't get...  Various alternatives were 12 before, what
> > changed?  Or was it wrong?
> 
> As far as I can tell it was wrong.  But I can add 4 to the values if you think
> it might be needed.

Please check if it was correct, and send a patch fixing it *before* this
one if not.

> > > @@ -8984,7 +8989,8 @@ (define_insn "*mov<mode>_ppc64"
> > >    return rs6000_output_move_128bit (operands);
> > >  }
> > >    [(set_attr "type" "store,store,load,load,*,*")
> > > -   (set_attr "length" "8")])
> > > +   (set_attr "non_prefixed_length" "8,8,8,8,8,40")
> > > +   (set_attr "prefixed_length" "20,20,20,20,8,40")])
> > 
> > What about the 40 here?
> 
> Dunno.  The original code just used 8, which is wrong.

Same as above then, please: separate patch.


Segher
diff mbox series

Patch

Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(revision 276284)
+++ gcc/config/rs6000/constraints.md	(working copy)
@@ -210,6 +210,16 @@  several times, or that might not access
   (and (match_code "mem")
        (match_test "GET_RTX_CLASS (GET_CODE (XEXP (op, 0))) != RTX_AUTOINC")))
 
+(define_memory_constraint "em"
+  "A memory operand that does not contain a PC-relative reference."
+  (and (match_code "mem")
+       (match_test "non_pcrel_memory (op, mode)")))
+
+(define_memory_constraint "eM"
+  "A memory operand that does not contain a prefixed address."
+  (and (match_code "mem")
+       (match_test "non_prefixed_memory (op, mode)")))
+
 (define_memory_constraint "Q"
   "Memory operand that is an offset from a register (it is usually better
 to use @samp{m} or @samp{es} in @code{asm} statements)"
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 276284)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -932,6 +932,14 @@  (define_predicate "lwa_operand"
     return false;
 
   addr = XEXP (inner, 0);
+
+  /* The LWA instruction uses the DS-form format where the bottom two bits of
+     the offset must be 0.  The prefixed PLWA does not have this
+     restriction.  */
+  if (TARGET_PREFIXED_ADDR
+      && address_is_prefixed (addr, DImode, NON_PREFIXED_DS))
+    return true;
+
   if (GET_CODE (addr) == PRE_INC
       || GET_CODE (addr) == PRE_DEC
       || (GET_CODE (addr) == PRE_MODIFY
@@ -1807,3 +1815,43 @@  (define_predicate "pcrel_external_addres
 (define_predicate "pcrel_local_or_external_address"
   (ior (match_operand 0 "pcrel_local_address")
        (match_operand 0 "pcrel_external_address")))
+
+;; Return 1 if op is a memory operand that is not prefixed.
+(define_predicate "non_prefixed_memory"
+  (match_code "mem")
+{
+  if (!memory_operand (op, mode))
+    return false;
+
+  enum insn_form iform
+    = address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
+
+  return (iform != INSN_FORM_PREFIXED_NUMERIC
+          && iform != INSN_FORM_PCREL_LOCAL
+          && iform != INSN_FORM_BAD);
+})
+
+(define_predicate "non_pcrel_memory"
+  (match_code "mem")
+{
+  if (!memory_operand (op, mode))
+    return false;
+
+  enum insn_form iform
+    = address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
+
+  return (iform != INSN_FORM_PCREL_EXTERNAL
+          && iform != INSN_FORM_PCREL_LOCAL
+          && iform != INSN_FORM_BAD);
+})
+
+;; Return 1 if op is either a register operand or a memory operand that does
+;; not use a PC-relative address.
+(define_predicate "reg_or_non_pcrel_memory"
+  (match_code "reg,subreg,mem")
+{
+  if (REG_P (op) || SUBREG_P (op))
+    return register_operand (op, mode);
+
+  return non_pcrel_memory (op, mode);
+})
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 276284)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -192,6 +192,7 @@  extern enum insn_form address_to_insn_fo
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern rtx make_memory_non_prefixed (rtx);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
 
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 276284)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -6700,6 +6700,7 @@  rs6000_adjust_vec_address (rtx scalar_re
   rtx element_offset;
   rtx new_addr;
   bool valid_addr_p;
+  bool pcrel_p = TARGET_PCREL && pcrel_local_address (addr, Pmode);
 
   /* Vector addresses should not have PRE_INC, PRE_DEC, or PRE_MODIFY.  */
   gcc_assert (GET_RTX_CLASS (GET_CODE (addr)) != RTX_AUTOINC);
@@ -6737,6 +6738,40 @@  rs6000_adjust_vec_address (rtx scalar_re
   else if (REG_P (addr) || SUBREG_P (addr))
     new_addr = gen_rtx_PLUS (Pmode, addr, element_offset);
 
+  /* Optimize PC-relative addresses.  */
+  else if (pcrel_p)
+    {
+      if (CONST_INT_P (element_offset))
+	{
+	  rtx addr2 = addr;
+	  HOST_WIDE_INT offset = INTVAL (element_offset);
+
+	  if (GET_CODE (addr2) == CONST)
+	    addr2 = XEXP (addr2, 0);
+
+	  if (GET_CODE (addr2) == PLUS)
+	    {
+	      offset += INTVAL (XEXP (addr2, 1));
+	      addr2 = XEXP (addr2, 0);
+	    }
+
+	  gcc_assert (SIGNED_34BIT_OFFSET_P (offset));
+	  if (offset)
+	    {
+	      addr2 = gen_rtx_PLUS (Pmode, addr2, GEN_INT (offset));
+	      new_addr = gen_rtx_CONST (Pmode, addr2);
+	    }
+	  else
+	    new_addr = addr2;
+	}
+
+      /* Make sure we do not have a PC-relative address with a variable offset,
+	 since we only have one temporary base register, and we would need two
+	 registers in that case.  */
+      else
+	gcc_unreachable ();
+    }
+
   /* Optimize D-FORM addresses with constant offset with a constant element, to
      include the element offset in the address directly.  */
   else if (GET_CODE (addr) == PLUS)
@@ -6751,8 +6786,11 @@  rs6000_adjust_vec_address (rtx scalar_re
 	  HOST_WIDE_INT offset = INTVAL (op1) + INTVAL (element_offset);
 	  rtx offset_rtx = GEN_INT (offset);
 
-	  if (IN_RANGE (offset, -32768, 32767)
-	      && (scalar_size < 8 || (offset & 0x3) == 0))
+	  if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (offset))
+	    new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
+
+	  else if (SIGNED_16BIT_OFFSET_P (offset)
+		   && (scalar_size < 8 || (offset & 0x3) == 0))
 	    new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
 	  else
 	    {
@@ -6800,11 +6838,11 @@  rs6000_adjust_vec_address (rtx scalar_re
       new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
     }
 
-  /* If we have a PLUS, we need to see whether the particular register class
-     allows for D-FORM or X-FORM addressing.  */
-  if (GET_CODE (new_addr) == PLUS)
+  /* If we have a PLUS or a PC-relative address without the PLUS, we need to
+     see whether the particular register class allows for D-FORM or X-FORM
+     addressing.  */
+  if (GET_CODE (new_addr) == PLUS || pcrel_p)
     {
-      rtx op1 = XEXP (new_addr, 1);
       addr_mask_type addr_mask;
       unsigned int scalar_regno = reg_or_subregno (scalar_reg);
 
@@ -6821,10 +6859,16 @@  rs6000_adjust_vec_address (rtx scalar_re
       else
 	gcc_unreachable ();
 
-      if (REG_P (op1) || SUBREG_P (op1))
-	valid_addr_p = (addr_mask & RELOAD_REG_INDEXED) != 0;
-      else
+      if (pcrel_p)
 	valid_addr_p = (addr_mask & RELOAD_REG_OFFSET) != 0;
+      else
+	{
+	  rtx op1 = XEXP (new_addr, 1);
+	  if (REG_P (op1) || SUBREG_P (op1))
+	    valid_addr_p = (addr_mask & RELOAD_REG_INDEXED) != 0;
+	  else
+	    valid_addr_p = (addr_mask & RELOAD_REG_OFFSET) != 0;
+	}
     }
 
   else if (REG_P (new_addr) || SUBREG_P (new_addr))
@@ -6860,6 +6904,12 @@  rs6000_split_vec_extract_var (rtx dest,
      systems.  */
   if (MEM_P (src))
     {
+      /* If this is a PC-relative address, we would need another register to
+	 hold the address of the vector along with the variable offset.  The
+	 callers should use reg_or_non_pcrel_memory to make sure we don't
+	 get a PC-relative address here.  */
+      gcc_assert (non_pcrel_memory (src, mode));
+
       int num_elements = GET_MODE_NUNITS (mode);
       rtx num_ele_m1 = GEN_INT (num_elements - 1);
 
@@ -7249,6 +7299,13 @@  quad_address_p (rtx addr, machine_mode m
   if (VECTOR_MODE_P (mode) && !mode_supports_dq_form (mode))
     return false;
 
+  /* Is this a valid prefixed address?  If the bottom four bits of the offset
+     are non-zero, we could use a prefixed instruction (which does not have the
+     DQ-form constraint that the traditional instruction had) instead of
+     forcing the unaligned offset to a GPR.  */
+  if (address_is_prefixed (addr, mode, NON_PREFIXED_DQ))
+    return true;
+
   if (GET_CODE (addr) != PLUS)
     return false;
 
@@ -7350,6 +7407,13 @@  mem_operand_gpr (rtx op, machine_mode mo
       && legitimate_indirect_address_p (XEXP (addr, 0), false))
     return true;
 
+  /* Allow prefixed instructions if supported.  If the bottom two bits of the
+     offset are non-zero, we could use a prefixed instruction (which does not
+     have the DS-form constraint that the traditional instruction had) instead
+     of forcing the unaligned offset to a GPR.  */
+  if (address_is_prefixed (addr, mode, NON_PREFIXED_DS))
+    return true;
+
   /* Don't allow non-offsettable addresses.  See PRs 83969 and 84279.  */
   if (!rs6000_offsettable_memref_p (op, mode, false))
     return false;
@@ -7371,7 +7435,7 @@  mem_operand_gpr (rtx op, machine_mode mo
        causes a wrap, so test only the low 16 bits.  */
     offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
 
-  return offset + 0x8000 < 0x10000u - extra;
+  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
 }
 
 /* As above, but for DS-FORM VSX insns.  Unlike mem_operand_gpr,
@@ -7384,6 +7448,13 @@  mem_operand_ds_form (rtx op, machine_mod
   int extra;
   rtx addr = XEXP (op, 0);
 
+  /* Allow prefixed instructions if supported.  If the bottom two bits of the
+     offset are non-zero, we could use a prefixed instruction (which does not
+     have the DS-form constraint that the traditional instruction had) instead
+     of forcing the unaligned offset to a GPR.  */
+  if (address_is_prefixed (addr, mode, NON_PREFIXED_DS))
+    return true;
+
   if (!offsettable_address_p (false, mode, addr))
     return false;
 
@@ -7404,7 +7475,7 @@  mem_operand_ds_form (rtx op, machine_mod
        causes a wrap, so test only the low 16 bits.  */
     offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
 
-  return offset + 0x8000 < 0x10000u - extra;
+  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
 }
 
 /* Subroutines of rs6000_legitimize_address and rs6000_legitimate_address_p.  */
@@ -7753,8 +7824,10 @@  rs6000_legitimate_offset_address_p (mach
       break;
     }
 
-  offset += 0x8000;
-  return offset < 0x10000 - extra;
+  if (TARGET_PREFIXED_ADDR)
+    return SIGNED_34BIT_OFFSET_EXTRA_P (offset, extra);
+  else
+    return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
 }
 
 bool
@@ -8651,6 +8724,11 @@  rs6000_legitimate_address_p (machine_mod
       && mode_supports_pre_incdec_p (mode)
       && legitimate_indirect_address_p (XEXP (x, 0), reg_ok_strict))
     return 1;
+
+  /* Handle prefixed addresses (PC-relative or 34-bit offset).  */
+  if (address_is_prefixed (x, mode, NON_PREFIXED_DEFAULT))
+    return 1;
+
   /* Handle restricted vector d-form offsets in ISA 3.0.  */
   if (quad_offset_p)
     {
@@ -8709,7 +8787,11 @@  rs6000_legitimate_address_p (machine_mod
 	  || (!avoiding_indexed_address_p (mode)
 	      && legitimate_indexed_address_p (XEXP (x, 1), reg_ok_strict)))
       && rtx_equal_p (XEXP (XEXP (x, 1), 0), XEXP (x, 0)))
-    return 1;
+    {
+      /* There is no prefixed version of the load/store with update.  */
+      rtx addr = XEXP (x, 1);
+      return !address_is_prefixed (addr, mode, NON_PREFIXED_DEFAULT);
+    }
   if (reg_offset_p && !quad_offset_p
       && legitimate_lo_sum_address_p (mode, x, reg_ok_strict))
     return 1;
@@ -8771,8 +8853,12 @@  rs6000_mode_dependent_address (const_rtx
 	  && XEXP (addr, 0) != arg_pointer_rtx
 	  && CONST_INT_P (XEXP (addr, 1)))
 	{
-	  unsigned HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
-	  return val + 0x8000 >= 0x10000 - (TARGET_POWERPC64 ? 8 : 12);
+	  HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
+	  HOST_WIDE_INT extra = TARGET_POWERPC64 ? 8 : 12;
+	  if (TARGET_PREFIXED_ADDR)
+	    return !SIGNED_34BIT_OFFSET_EXTRA_P (val, extra);
+	  else
+	    return !SIGNED_16BIT_OFFSET_EXTRA_P (val, extra);
 	}
       break;
 
@@ -20926,6 +21012,42 @@  rs6000_debug_rtx_costs (rtx x, machine_m
   return ret;
 }
 
+/* How many real instructions are generated for this insn?  This is slightly
+   different from the length attribute, in that the length attribute counts the
+   number of bytes.  With prefixed instructions, we don't want to count a
+   prefixed instruction (length 12 bytes including possible NOP) as taking 3
+   instructions, but just one.  */
+
+static int
+rs6000_num_insns (rtx_insn *insn)
+{
+  /* Try to figure it out based on the length and whether there are prefixed
+     instructions.  While prefixed instructions are only 8 bytes, we have to
+     use 12 as the size of the first prefixed instruction in case the
+     instruction needs to be aligned.  Back to back prefixed instructions would
+     only take 20 bytes, since it is guaranteed that one of the prefixed
+     instructions does not need the alignment.  */
+  int length = get_attr_length (insn);
+
+  if (length >= 12 && TARGET_PREFIXED_ADDR
+      && get_attr_prefixed (insn) == PREFIXED_YES)
+    {
+      /* Single prefixed instruction.  */
+      if (length == 12)
+	return 1;
+
+      /* A normal instruction and a prefixed instruction (16) or two back
+	 to back prefixed instructions (20).  */
+      if (length == 16 || length == 20)
+	return 2;
+
+      /* Guess for larger instruction sizes.  */
+      return 2 + (length - 20) / 4;
+    }
+
+  return length / 4;
+}
+
 static int
 rs6000_insn_cost (rtx_insn *insn, bool speed)
 {
@@ -20939,7 +21061,7 @@  rs6000_insn_cost (rtx_insn *insn, bool s
   if (cost > 0)
     return cost;
 
-  int n = get_attr_length (insn) / 4;
+  int n = rs6000_num_insns (insn);
   enum attr_type type = get_attr_type (insn);
 
   switch (type)
@@ -24908,6 +25030,34 @@  prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Make a memory address non-prefixed if it is prefixed.  */
+
+rtx
+make_memory_non_prefixed (rtx mem)
+{
+  gcc_assert (MEM_P (mem));
+
+  rtx old_addr = XEXP (mem, 0);
+  if (address_is_prefixed (old_addr, GET_MODE (mem), NON_PREFIXED_DEFAULT))
+    {
+      rtx new_addr;
+
+      if (GET_CODE (old_addr) == PLUS
+	  && (REG_P (XEXP (old_addr, 0)) || SUBREG_P (XEXP (old_addr, 0)))
+	  && CONST_INT_P (XEXP (old_addr, 1)))
+	{
+	  rtx tmp_reg = force_reg (Pmode, XEXP (old_addr, 1));
+	  new_addr = gen_rtx_PLUS (Pmode, XEXP (old_addr, 0), tmp_reg);
+	}
+      else
+	new_addr = force_reg (Pmode, old_addr);
+
+      mem = change_address (mem, VOIDmode, new_addr);
+    }
+
+  return mem;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool next_insn_prefixed_p;
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 276284)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -7774,8 +7774,9 @@  (define_insn_and_split "*mov<mode>_64bit
   "&& reload_completed"
   [(pc)]
 { rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
-  [(set_attr "length" "8")
-   (set_attr "isa" "*,*,*,*,*,*,*,*,p8v,p8v")])
+  [(set_attr "isa" "*,*,*,*,*,*,*,*,p8v,p8v")
+   (set_attr "non_prefixed_length" "8")
+   (set_attr "prefixed_length" "20")])
 
 (define_insn_and_split "*movtd_64bit_nodm"
   [(set (match_operand:TD 0 "nonimmediate_operand" "=m,d,d,Y,r,r")
@@ -7786,8 +7787,12 @@  (define_insn_and_split "*movtd_64bit_nod
   "#"
   "&& reload_completed"
   [(pc)]
-{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
-  [(set_attr "length" "8,8,8,12,12,8")])
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
+  [(set_attr "non_prefixed_length" "8")
+   (set_attr "prefixed_length" "20")])
 
 (define_insn_and_split "*mov<mode>_32bit"
   [(set (match_operand:FMOVE128_FPR 0 "nonimmediate_operand" "=m,d,d,d,Y,r,r")
@@ -8984,7 +8989,8 @@  (define_insn "*mov<mode>_ppc64"
   return rs6000_output_move_128bit (operands);
 }
   [(set_attr "type" "store,store,load,load,*,*")
-   (set_attr "length" "8")])
+   (set_attr "non_prefixed_length" "8,8,8,8,8,40")
+   (set_attr "prefixed_length" "20,20,20,20,8,40")])
 
 (define_split
   [(set (match_operand:TI2 0 "int_reg_operand")
@@ -11502,9 +11508,25 @@  (define_insn "stack_protect_setsi"
   [(set_attr "type" "three")
    (set_attr "length" "12")])
 
-(define_insn "stack_protect_setdi"
-  [(set (match_operand:DI 0 "memory_operand" "=Y")
-	(unspec:DI [(match_operand:DI 1 "memory_operand" "Y")] UNSPEC_SP_SET))
+(define_expand "stack_protect_setdi"
+  [(parallel [(set (match_operand:DI 0 "memory_operand")
+		   (unspec:DI [(match_operand:DI 1 "memory_operand")]
+		   UNSPEC_SP_SET))
+	      (set (match_scratch:DI 2)
+		   (const_int 0))])]
+  "TARGET_64BIT"
+{
+  if (TARGET_PREFIXED_ADDR)
+    {
+      operands[0] = make_memory_non_prefixed (operands[0]);
+      operands[1] = make_memory_non_prefixed (operands[1]);
+    }
+})
+
+(define_insn "*stack_protect_setdi"
+  [(set (match_operand:DI 0 "non_prefixed_memory" "=eM")
+	(unspec:DI [(match_operand:DI 1 "non_prefixed_memory" "eM")]
+		   UNSPEC_SP_SET))
    (set (match_scratch:DI 2 "=&r") (const_int 0))]
   "TARGET_64BIT"
   "ld%U1%X1 %2,%1\;std%U0%X0 %2,%0\;li %2,0"
@@ -11548,10 +11570,27 @@  (define_insn "stack_protect_testsi"
    lwz%U1%X1 %3,%1\;lwz%U2%X2 %4,%2\;cmplw %0,%3,%4\;li %3,0\;li %4,0"
   [(set_attr "length" "16,20")])
 
-(define_insn "stack_protect_testdi"
+(define_expand "stack_protect_testdi"
+  [(parallel [(set (match_operand:CCEQ 0 "cc_reg_operand")
+		   (unspec:CCEQ [(match_operand:DI 1 "memory_operand")
+				 (match_operand:DI 2 "memory_operand")]
+				UNSPEC_SP_TEST))
+	      (set (match_scratch:DI 4)
+		   (const_int 0))
+	      (clobber (match_scratch:DI 3))])]
+  "TARGET_64BIT"
+{
+  if (TARGET_PREFIXED_ADDR)
+    {
+      operands[1] = make_memory_non_prefixed (operands[1]);
+      operands[2] = make_memory_non_prefixed (operands[2]);
+    }
+})
+
+(define_insn "*stack_protect_testdi"
   [(set (match_operand:CCEQ 0 "cc_reg_operand" "=x,?y")
-        (unspec:CCEQ [(match_operand:DI 1 "memory_operand" "Y,Y")
-		      (match_operand:DI 2 "memory_operand" "Y,Y")]
+        (unspec:CCEQ [(match_operand:DI 1 "non_prefixed_memory" "eM,eM")
+		      (match_operand:DI 2 "non_prefixed_memory" "eM,eM")]
 		     UNSPEC_SP_TEST))
    (set (match_scratch:DI 4 "=r,r") (const_int 0))
    (clobber (match_scratch:DI 3 "=&r,&r"))]
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 276284)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -1149,10 +1149,30 @@  (define_insn "vsx_mov<mode>_64bit"
                "vecstore,  vecload,   vecsimple, mffgpr,    mftgpr,    load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
                 vecsimple, *,         *,         vecstore,  vecload")
-   (set_attr "length"
-               "*,         *,         *,         8,         *,         8,
-                8,         8,         8,         8,         *,         *,
-                *,         20,        8,         *,         *")
+   (set (attr "non_prefixed_length")
+	(cond [(and (eq_attr "alternative" "4")		;; MTVSRDD
+		    (match_test "TARGET_P9_VECTOR"))
+	       (const_string "4")
+
+	       (eq_attr "alternative" "3,4")		;; GPR <-> VSX
+	       (const_string "8")
+
+	       (eq_attr "alternative" "5,6,7,8")	;; GPR load/store
+	       (const_string "8")]
+	      (const_string "*")))
+
+   (set (attr "prefixed_length")
+	(cond [(and (eq_attr "alternative" "4")		;; MTVSRDD
+		    (match_test "TARGET_P9_VECTOR"))
+	       (const_string "4")
+
+	       (eq_attr "alternative" "3,4")		;; GPR <-> VSX
+	       (const_string "8")
+
+	       (eq_attr "alternative" "5,6,7,8")	;; GPR load/store
+	       (const_string "20")]
+	      (const_string "*")))
+
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
@@ -3235,9 +3255,10 @@  (define_insn "vsx_vslo_<mode>"
 ;; Variable V2DI/V2DF extract
 (define_insn_and_split "vsx_extract_<mode>_var"
   [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v,wa,r")
-	(unspec:<VS_scalar> [(match_operand:VSX_D 1 "input_operand" "v,m,m")
-			     (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
-			    UNSPEC_VSX_EXTRACT))
+	(unspec:<VS_scalar>
+	 [(match_operand:VSX_D 1 "reg_or_non_pcrel_memory" "v,em,em")
+	  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+	 UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,&b,&b"))
    (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
   "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
@@ -3305,9 +3326,10 @@  (define_insn_and_split "*vsx_extract_v4s
 ;; Variable V4SF extract
 (define_insn_and_split "vsx_extract_v4sf_var"
   [(set (match_operand:SF 0 "gpc_reg_operand" "=wa,wa,?r")
-	(unspec:SF [(match_operand:V4SF 1 "input_operand" "v,m,m")
-		    (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
-		   UNSPEC_VSX_EXTRACT))
+	(unspec:SF
+	 [(match_operand:V4SF 1 "reg_or_non_pcrel_memory" "v,em,em")
+	  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+	 UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,&b,&b"))
    (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
   "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_DIRECT_MOVE_64BIT"
@@ -3668,7 +3690,7 @@  (define_insn_and_split "*vsx_extract_<mo
 (define_insn_and_split "vsx_extract_<mode>_var"
   [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r")
 	(unspec:<VS_scalar>
-	 [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m")
+	 [(match_operand:VSX_EXTRACT_I 1 "reg_or_non_pcrel_memory" "v,v,em")
 	  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
 	 UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,r,&b"))
@@ -3688,7 +3710,7 @@  (define_insn_and_split "*vsx_extract_<mo
   [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r")
 	(zero_extend:<VS_scalar>
 	 (unspec:<VSX_EXTRACT_I:VS_scalar>
-	  [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m")
+	  [(match_operand:VSX_EXTRACT_I 1 "reg_or_non_pcrel_memory" "v,v,em")
 	   (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
 	  UNSPEC_VSX_EXTRACT)))
    (clobber (match_scratch:DI 3 "=r,r,&b"))
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(revision 276284)
+++ gcc/doc/md.texi	(working copy)
@@ -3373,6 +3373,12 @@  asm ("st %1,%0" : "=m<>" (mem) : "r" (va
 
 is not.
 
+@item em
+A memory operand that does not contain a PC-relative address.
+
+@item eM
+A memory operand that does not contain a prefixed address.
+
 @item es
 A ``stable'' memory operand; that is, one which does not include any
 automodification of the base register.  This used to be useful when