diff mbox series

, V7, #6 of 7, Fix issues with vector extract and prefixed instructions

Message ID 20191114230909.GF7528@ibm-toto.the-meissners.org
State New
Headers show
Series , V7, #6 of 7, Fix issues with vector extract and prefixed instructions | expand

Commit Message

Michael Meissner Nov. 14, 2019, 11:09 p.m. UTC
This patch fixes two issues with vector extracts and prefixed instructions.

The first is if you use a vector extract on a vector that is located in memory
and to access the vector, you use a PC-relative address with a veriable index.
I.e.:

	#include <altivec.h>

	static vector int vi;

	int get (int n)
	{
	  return vec_extract (vi, n);
	}

In this case, the current code re-uses the temporary for calculating the offset
of the element to load up the address of the vector, losing the offset.  This
code prevents the combiner from combining loading the vector from memory and
the vector extract if the vector is accessed via a PC-relative address.
Instead, the vector is loaded up into a register, and the variable extract from
a register is done.

I needed to add a new constraint (em) in addition to new predicate functions.
I discovered that with the predicate function alone, the register allocator
would re-create the address.  The constraint prevents this combination.

I also modified the vector extract code to generate a single PC-relative load
if the vector has a PC-relative address and the offset is constant.

I have built a bootstrap compiler with this change, and there were no
regressions in the test suite.  Can I check this into the FSF trunk?

2019-11-14  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/constraints.md (em constraint): New constraint for
	non-prefixed memory.
	* config/rs6000/predicates.md (non_prefixed_memory): New
	predicate.
	(reg_or_non_prefixed_memory): New predicate.
	* config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add support
	for optimizing extracting a constant vector element from a vector
	that uses a prefixed address.  If the element number is variable
	and the address uses a prefixed address, abort.
	* config/rs6000/vsx.md (vsx_extract_<mode>_var, VSX_D iterator):
	Do not allow combining prefixed memory with a variable vector
	extract.
	(vsx_extract_v4sf_var): Do not allow combining prefixed memory
	with a variable vector extract.
	(vsx_extract_<mode>_var, VSX_EXTRACT_I iterator): Do not allow
	combining prefixed memory with a variable vector extract.
	(vsx_extract_<mode>_<VS_scalar>mode_var): Do not allow combining
	prefixed memory with a variable vector extract.
	* doc/md.texi (PowerPC constraints): Document the em constraint.

Comments

Segher Boessenkool Nov. 26, 2019, 7:20 p.m. UTC | #1
Hi!

On Thu, Nov 14, 2019 at 06:09:09PM -0500, Michael Meissner wrote:
> In this case, the current code re-uses the temporary for calculating the offset
> of the element to load up the address of the vector, losing the offset.

Reusing the same pseudo is *always* a bad idea.  You get better
optimisation if most code is "SSA-like": write to every pseudo only
once.  Of course you need to violate this where you woule have PHIs in
actual SSA.

> I needed to add a new constraint (em) in addition to new predicate functions.
> I discovered that with the predicate function alone, the register allocator
> would re-create the address.  The constraint prevents this combination.

It's a reasonable thing to have as a contraint, too.

Once again, how should things work in inline assembler?  Can we allow
prefixed addressing there, or should "m" in inline asm mean "em"?

> I also modified the vector extract code to generate a single PC-relative load
> if the vector has a PC-relative address and the offset is constant.

That should be handled by the RTL optimisers anyway, no?  Or is this
for post-reload splitters (a bad idea always).

> 	* config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add support
> 	for optimizing extracting a constant vector element from a vector
> 	that uses a prefixed address.  If the element number is variable
> 	and the address uses a prefixed address, abort.

It doesn't abort.  Erm, wait, it *does* ICE.  Please make that more
prominent (in the code).  It's not clear why you mention it in the
changelog while allt he other details are just "add support"?

I find the control flow very hard to read here.

> +  /* Optimize PC-relative addresses with a constant offset.  */
> +  else if (pcrel_p && CONST_INT_P (element_offset))
> +    {
> +      rtx addr2 = addr;

addr isn't used after this anyway, so you can clobber it just fine.

> +  /* With only one temporary base register, we can't support a PC-relative
> +     address added to a variable offset.  This is because the PADDI instruction
> +     requires RA to be 0 when doing a PC-relative add (i.e. no register to add
> +     to).  */
> +  else if (pcrel_p)
> +    gcc_unreachable ();

That comment suggests we just ICE when we get unwanted input.  Such a
comment belongs where we prevent such code from being formed in the
first place (or nowhere, if it is obvious).


Segher
Michael Meissner Dec. 3, 2019, 6:20 p.m. UTC | #2
On Tue, Nov 26, 2019 at 01:20:20PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Nov 14, 2019 at 06:09:09PM -0500, Michael Meissner wrote:
> > In this case, the current code re-uses the temporary for calculating the offset
> > of the element to load up the address of the vector, losing the offset.
> 
> Reusing the same pseudo is *always* a bad idea.  You get better
> optimisation if most code is "SSA-like": write to every pseudo only
> once.  Of course you need to violate this where you woule have PHIs in
> actual SSA.

Yes.  I was describing what the current code does (and why I'm fixing it).  It
is a bug.

> > I needed to add a new constraint (em) in addition to new predicate functions.
> > I discovered that with the predicate function alone, the register allocator
> > would re-create the address.  The constraint prevents this combination.
> 
> It's a reasonable thing to have as a contraint, too.
> 
> Once again, how should things work in inline assembler?  Can we allow
> prefixed addressing there, or should "m" in inline asm mean "em"?

At the moment, I think we should just not allow prefixed addresses in asm
constructs at all.

> 
> > I also modified the vector extract code to generate a single PC-relative load
> > if the vector has a PC-relative address and the offset is constant.
> 
> That should be handled by the RTL optimisers anyway, no?  Or is this
> for post-reload splitters (a bad idea always).

You have it backwards.  It is the RTL optimizers that combines the vector
extract with the load in the first place.  My code allows the combination and
generates a single instruction.  Otherwise, it would do a PLA and then do the
vector extract with the address loaded.

> > 	* config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add support
> > 	for optimizing extracting a constant vector element from a vector
> > 	that uses a prefixed address.  If the element number is variable
> > 	and the address uses a prefixed address, abort.
> 
> It doesn't abort.  Erm, wait, it *does* ICE.  Please make that more
> prominent (in the code).  It's not clear why you mention it in the
> changelog while allt he other details are just "add support"?
> 
> I find the control flow very hard to read here.
> 
> > +  /* Optimize PC-relative addresses with a constant offset.  */
> > +  else if (pcrel_p && CONST_INT_P (element_offset))
> > +    {
> > +      rtx addr2 = addr;
> 
> addr isn't used after this anyway, so you can clobber it just fine.

Generally I prefer not to reuse variables.

> > +  /* With only one temporary base register, we can't support a PC-relative
> > +     address added to a variable offset.  This is because the PADDI instruction
> > +     requires RA to be 0 when doing a PC-relative add (i.e. no register to add
> > +     to).  */
> > +  else if (pcrel_p)
> > +    gcc_unreachable ();
> 
> That comment suggests we just ICE when we get unwanted input.  Such a
> comment belongs where we prevent such code from being formed in the
> first place (or nowhere, if it is obvious).

The constraint and predicate is where we prevent it from occuring.  The
gcc_unreachable is just there as insurance that it didn't get recreated later.
During testing before I added the constraint, I found the register allocator
combining the two, even though the predicate didn't allow the combination.  So
the test is just to ICE out if the combination took place.
Segher Boessenkool Dec. 3, 2019, 8:36 p.m. UTC | #3
On Tue, Dec 03, 2019 at 01:20:04PM -0500, Michael Meissner wrote:
> On Tue, Nov 26, 2019 at 01:20:20PM -0600, Segher Boessenkool wrote:
> > > I needed to add a new constraint (em) in addition to new predicate functions.
> > > I discovered that with the predicate function alone, the register allocator
> > > would re-create the address.  The constraint prevents this combination.
> > 
> > It's a reasonable thing to have as a contraint, too.
> > 
> > Once again, how should things work in inline assembler?  Can we allow
> > prefixed addressing there, or should "m" in inline asm mean "em"?
> 
> At the moment, I think we should just not allow prefixed addresses in asm
> constructs at all.

We need to think about this *before GCC 10*.  And have patches for it.

So you are saying that yes, "m" in inline asm should mean "em".  This
probably makes sense, there is so much existing asm that does not work
with prefixed addresses -- this is somewhet similar to how "m" in inline
asm is not the same as "m<>" (as it is in the machine description).

But then we need an extra constraint to _do_ allow prefixed addresses:
there needs to be a way to write inline asm for such memory as well.

> > > I also modified the vector extract code to generate a single PC-relative load
> > > if the vector has a PC-relative address and the offset is constant.
> > 
> > That should be handled by the RTL optimisers anyway, no?  Or is this
> > for post-reload splitters (a bad idea always).
> 
> You have it backwards.  It is the RTL optimizers that combines the vector
> extract with the load in the first place.  My code allows the combination and
> generates a single instruction.  Otherwise, it would do a PLA and then do the
> vector extract with the address loaded.

So, separate patch please, for a separate improvement.  So that I can
understand what is what.

> > > +  /* Optimize PC-relative addresses with a constant offset.  */
> > > +  else if (pcrel_p && CONST_INT_P (element_offset))
> > > +    {
> > > +      rtx addr2 = addr;
> > 
> > addr isn't used after this anyway, so you can clobber it just fine.
> 
> Generally I prefer not to reuse variables.

You shouldn't fear that.  Instead, you should factor big routines, and/or
simplify control flow.

"Not reusing variables" (for the *same purpose* btw) is not the problem.
The problem is that the code is too big and complex and irregular to
simply understand.

> > > +  /* With only one temporary base register, we can't support a PC-relative
> > > +     address added to a variable offset.  This is because the PADDI instruction
> > > +     requires RA to be 0 when doing a PC-relative add (i.e. no register to add
> > > +     to).  */
> > > +  else if (pcrel_p)
> > > +    gcc_unreachable ();
> > 
> > That comment suggests we just ICE when we get unwanted input.  Such a
> > comment belongs where we prevent such code from being formed in the
> > first place (or nowhere, if it is obvious).
> 
> The constraint and predicate is where we prevent it from occuring.  The
> gcc_unreachable is just there as insurance that it didn't get recreated later.
> During testing before I added the constraint, I found the register allocator
> combining the two, even though the predicate didn't allow the combination.  So
> the test is just to ICE out if the combination took place.

So there should be no comment here?  A gcc_assert could be useful, but
that is all?  And the assert should be as early as possible, in general,
like (almost) immediately after the function start.  If you find you want
to do it later, congratulations, you found a good place to factor this
routine :-)


Segher
diff mbox series

Patch

Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(revision 278173)
+++ gcc/config/rs6000/constraints.md	(working copy)
@@ -202,6 +202,11 @@  (define_constraint "H"
 
 ;; Memory constraints
 
+(define_memory_constraint "em"
+  "A memory operand that does not contain a prefixed address."
+  (and (match_code "mem")
+       (match_test "non_prefixed_memory (op, mode)")))
+
 (define_memory_constraint "es"
   "A ``stable'' memory operand; that is, one which does not include any
 automodification of the base register.  Unlike @samp{m}, this constraint
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 278177)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -1836,3 +1836,24 @@  (define_predicate "prefixed_memory"
 {
   return address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
 })
+
+;; Return true if the operand is a memory address that does not use a prefixed
+;; address.
+(define_predicate "non_prefixed_memory"
+  (match_code "mem")
+{
+  /* If the operand is not a valid memory operand even if it is not prefixed,
+     do not return true.  */
+  if (!memory_operand (op, mode))
+    return false;
+
+  return !address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
+})
+
+;; Return true if the operand is either a register or it is a non-prefixed
+;; memory operand.
+(define_predicate "reg_or_non_prefixed_memory"
+  (match_code "reg,subreg,mem")
+{
+  return gpc_reg_operand (op, mode) || non_prefixed_memory (op, mode);
+})
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 278178)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -6734,6 +6734,7 @@  rs6000_adjust_vec_address (rtx scalar_re
   rtx element_offset;
   rtx new_addr;
   bool valid_addr_p;
+  bool pcrel_p = pcrel_local_address (addr, Pmode);
 
   /* Vector addresses should not have PRE_INC, PRE_DEC, or PRE_MODIFY.  */
   gcc_assert (GET_RTX_CLASS (GET_CODE (addr)) != RTX_AUTOINC);
@@ -6771,6 +6772,38 @@  rs6000_adjust_vec_address (rtx scalar_re
   else if (REG_P (addr) || SUBREG_P (addr))
     new_addr = gen_rtx_PLUS (Pmode, addr, element_offset);
 
+  /* Optimize PC-relative addresses with a constant offset.  */
+  else if (pcrel_p && CONST_INT_P (element_offset))
+    {
+      rtx addr2 = addr;
+      HOST_WIDE_INT offset = INTVAL (element_offset);
+
+      if (GET_CODE (addr2) == CONST)
+	addr2 = XEXP (addr2, 0);
+
+      if (GET_CODE (addr2) == PLUS)
+	{
+	  offset += INTVAL (XEXP (addr2, 1));
+	  addr2 = XEXP (addr2, 0);
+	}
+
+      gcc_assert (SIGNED_34BIT_OFFSET_P (offset));
+      if (offset)
+	{
+	  addr2 = gen_rtx_PLUS (Pmode, addr2, GEN_INT (offset));
+	  new_addr = gen_rtx_CONST (Pmode, addr2);
+	}
+      else
+	new_addr = addr2;
+    }
+
+  /* With only one temporary base register, we can't support a PC-relative
+     address added to a variable offset.  This is because the PADDI instruction
+     requires RA to be 0 when doing a PC-relative add (i.e. no register to add
+     to).  */
+  else if (pcrel_p)
+    gcc_unreachable ();
+
   /* Optimize D-FORM addresses with constant offset with a constant element, to
      include the element offset in the address directly.  */
   else if (GET_CODE (addr) == PLUS)
@@ -6785,8 +6818,11 @@  rs6000_adjust_vec_address (rtx scalar_re
 	  HOST_WIDE_INT offset = INTVAL (op1) + INTVAL (element_offset);
 	  rtx offset_rtx = GEN_INT (offset);
 
-	  if (IN_RANGE (offset, -32768, 32767)
-	      && (scalar_size < 8 || (offset & 0x3) == 0))
+	  if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (offset))
+	    new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
+
+	  else if (SIGNED_16BIT_OFFSET_P (offset)
+		   && (scalar_size < 8 || (offset & 0x3) == 0))
 	    new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
 	  else
 	    {
@@ -6834,11 +6870,11 @@  rs6000_adjust_vec_address (rtx scalar_re
       new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
     }
 
-  /* If we have a PLUS, we need to see whether the particular register class
-     allows for D-FORM or X-FORM addressing.  */
-  if (GET_CODE (new_addr) == PLUS)
+  /* If we have a PLUS or a PC-relative address without the PLUS, we need to
+     see whether the particular register class allows for D-FORM or X-FORM
+     addressing.  */
+  if (GET_CODE (new_addr) == PLUS || pcrel_p)
     {
-      rtx op1 = XEXP (new_addr, 1);
       addr_mask_type addr_mask;
       unsigned int scalar_regno = reg_or_subregno (scalar_reg);
 
@@ -6855,10 +6891,16 @@  rs6000_adjust_vec_address (rtx scalar_re
       else
 	gcc_unreachable ();
 
-      if (REG_P (op1) || SUBREG_P (op1))
-	valid_addr_p = (addr_mask & RELOAD_REG_INDEXED) != 0;
-      else
+      if (pcrel_p)
 	valid_addr_p = (addr_mask & RELOAD_REG_OFFSET) != 0;
+      else
+	{
+	  rtx op1 = XEXP (new_addr, 1);
+	  if (REG_P (op1) || SUBREG_P (op1))
+	    valid_addr_p = (addr_mask & RELOAD_REG_INDEXED) != 0;
+	  else
+	    valid_addr_p = (addr_mask & RELOAD_REG_OFFSET) != 0;
+	}
     }
 
   else if (REG_P (new_addr) || SUBREG_P (new_addr))
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 278173)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -3248,9 +3248,10 @@  (define_insn "vsx_vslo_<mode>"
 ;; Variable V2DI/V2DF extract
 (define_insn_and_split "vsx_extract_<mode>_var"
   [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v,wa,r")
-	(unspec:<VS_scalar> [(match_operand:VSX_D 1 "input_operand" "v,m,m")
-			     (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
-			    UNSPEC_VSX_EXTRACT))
+	(unspec:<VS_scalar>
+	 [(match_operand:VSX_D 1 "reg_or_non_prefixed_memory" "v,em,em")
+	  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+	 UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,&b,&b"))
    (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
   "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
@@ -3318,9 +3319,10 @@  (define_insn_and_split "*vsx_extract_v4s
 ;; Variable V4SF extract
 (define_insn_and_split "vsx_extract_v4sf_var"
   [(set (match_operand:SF 0 "gpc_reg_operand" "=wa,wa,?r")
-	(unspec:SF [(match_operand:V4SF 1 "input_operand" "v,m,m")
-		    (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
-		   UNSPEC_VSX_EXTRACT))
+	(unspec:SF
+	 [(match_operand:V4SF 1 "reg_or_non_prefixed_memory" "v,em,em")
+	  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+	 UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,&b,&b"))
    (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
   "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_DIRECT_MOVE_64BIT"
@@ -3681,7 +3683,7 @@  (define_insn_and_split "*vsx_extract_<mo
 (define_insn_and_split "vsx_extract_<mode>_var"
   [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r")
 	(unspec:<VS_scalar>
-	 [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m")
+	 [(match_operand:VSX_EXTRACT_I 1 "reg_or_non_prefixed_memory" "v,v,em")
 	  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
 	 UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,r,&b"))
@@ -3701,7 +3703,7 @@  (define_insn_and_split "*vsx_extract_<mo
   [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r")
 	(zero_extend:<VS_scalar>
 	 (unspec:<VSX_EXTRACT_I:VS_scalar>
-	  [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m")
+	  [(match_operand:VSX_EXTRACT_I 1 "reg_or_non_prefixed_memory" "v,v,em")
 	   (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
 	  UNSPEC_VSX_EXTRACT)))
    (clobber (match_scratch:DI 3 "=r,r,&b"))
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(revision 278173)
+++ gcc/doc/md.texi	(working copy)
@@ -3373,6 +3373,9 @@  asm ("st %1,%0" : "=m<>" (mem) : "r" (va
 
 is not.
 
+@item em
+A memory operand that does not contain a prefixed address.
+
 @item es
 A ``stable'' memory operand; that is, one which does not include any
 automodification of the base register.  This used to be useful when