
V11 patch #2 of 15, Use prefixed load for vector extract with large offset

Message ID 20191220233832.GB28993@ibm-toto.the-meissners.org
State New

Commit Message

Michael Meissner Dec. 20, 2019, 11:38 p.m. UTC
This patch adds support for large offsets with -mcpu=future when we optimize a
vector extract from memory and the memory address was already a prefixed
address with a large offset.

The current code loads the constant into a temporary and then does an indexed
load.  Successive passes would eventually optimize that back into the form we
want (the base register plus a large offset), but it is better to generate the
optimal code sooner.
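
As an illustration only, the choice among the three address forms described
above can be sketched in standalone C.  This is not the GCC code: the names
addr_form, fits_signed and classify_offset are hypothetical, and have_prefixed
stands in for the TARGET_PREFIXED_ADDR test in the patch.  The alignment test
mirrors the (offset & 0x3) check that the patch keeps for 8-byte scalars.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three address forms discussed above.  */
enum addr_form
{
  ADDR_D_FORM_16BIT,	/* base register + 16-bit displacement */
  ADDR_PREFIXED_34BIT,	/* base register + 34-bit displacement (prefixed) */
  ADDR_X_FORM_INDEXED	/* offset moved to a temporary, then base + index */
};

/* Return true if VALUE fits in a signed two's-complement field of BITS
   bits.  This is an assumed equivalent of the range macros in the patch,
   not their actual definition.  */
static bool
fits_signed (int64_t value, unsigned bits)
{
  assert (bits >= 1 && bits <= 63);
  int64_t limit = (int64_t) 1 << (bits - 1);
  return value >= -limit && value <= limit - 1;
}

/* Pick an address form for OFFSET, following the order of tests in the
   patch: 16-bit offset first, then 34-bit prefixed, then indexed.  */
static enum addr_form
classify_offset (int64_t offset, unsigned scalar_size, bool have_prefixed)
{
  if (fits_signed (offset, 16)
      && (scalar_size < 8 || (offset & 0x3) == 0))
    return ADDR_D_FORM_16BIT;

  if (have_prefixed && fits_signed (offset, 34))
    return ADDR_PREFIXED_34BIT;

  return ADDR_X_FORM_INDEXED;
}
```

Under these assumptions, an offset of 32768 with a 4-byte scalar falls through
to the indexed form without prefixed addressing, and to the 34-bit form with
it.  A small but misaligned offset for an 8-byte scalar (for example 6) also
takes the 34-bit form when prefixed addressing is available.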

I have bootstrapped this change on a little endian power8 system and there were
no regressions.  Can I check this into the trunk?

2019-12-20  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add support
	for the offset being 34-bits when -mcpu=future is used.

Comments

Segher Boessenkool Dec. 22, 2019, 5:10 p.m. UTC | #1
Hi!

On Fri, Dec 20, 2019 at 06:38:32PM -0500, Michael Meissner wrote:
> --- gcc/config/rs6000/rs6000.c	(revision 279553)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -6792,9 +6792,17 @@ rs6000_adjust_vec_address (rtx scalar_re
>  	  HOST_WIDE_INT offset = INTVAL (op1) + INTVAL (element_offset);
>  	  rtx offset_rtx = GEN_INT (offset);
>  
> -	  if (IN_RANGE (offset, -32768, 32767)
> +	  /* 16-bit offset.  */
> +	  if (SIGNED_INTEGER_16BIT_P (offset)
>  	      && (scalar_size < 8 || (offset & 0x3) == 0))
>  	    new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);

We probably should have a macro for this, hrm.  The
reg_or_aligned_short_operand predicate is the closest we have right now.

> +	  /* 34-bit offset if we have prefixed addresses.  */
> +	  else if (TARGET_PREFIXED_ADDR && SIGNED_INTEGER_34BIT_P (offset))
> +	    new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);

This is cint34_operand.

And maybe we want both in one, for convenience?

(Something for the future of course, not this patch).

> +	  /* Offset overflowed, move offset to the temporary (which will likely
> +	     be split), and do X-FORM addressing.  */
>  	  else
>  	    {

The comment should go here, instead (after the {).

> +	  /* Make sure we don't overwrite the temporary if the element being
> +	     extracted is variable, and we've put the offset into base_tmp
> +	     previously.  */
> +	  else if (rtx_equal_p (base_tmp, element_offset))
> +	    emit_insn (gen_add2_insn (base_tmp, op1));

Register equality (in the same mode, as we have here) is just "==".  Is
that what we need here, or should it be reg_mentioned_p?
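
To make the distinction in that question concrete, here is a toy model in
standalone C.  It does not use GCC's rtx type or its real rtx_equal_p and
reg_mentioned_p functions; struct node, node_equal and reg_mentioned are
hypothetical names that only illustrate the difference between "is exactly
this register" and "contains this register somewhere".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy expression node: either a register or a PLUS of two operands.
   Hypothetical; this is not GCC's rtx.  */
enum node_kind { NODE_REG, NODE_PLUS };

struct node
{
  enum node_kind kind;
  int regno;			/* Valid when kind == NODE_REG.  */
  const struct node *lhs;	/* Valid when kind == NODE_PLUS.  */
  const struct node *rhs;	/* Valid when kind == NODE_PLUS.  */
};

/* Structural equality: both trees have the same shape and registers.  */
static bool
node_equal (const struct node *a, const struct node *b)
{
  assert (a != NULL && b != NULL);
  if (a->kind != b->kind)
    return false;
  if (a->kind == NODE_REG)
    return a->regno == b->regno;
  return node_equal (a->lhs, b->lhs) && node_equal (a->rhs, b->rhs);
}

/* Return true if register REG occurs anywhere inside EXPR.  */
static bool
reg_mentioned (const struct node *reg, const struct node *expr)
{
  assert (reg != NULL && reg->kind == NODE_REG && expr != NULL);
  if (expr->kind == NODE_REG)
    return expr->regno == reg->regno;
  return reg_mentioned (reg, expr->lhs) || reg_mentioned (reg, expr->rhs);
}
```

In this model, a register compared against an expression that merely contains
it is not equal to it, but it is mentioned in it.  A "mentioned" test therefore
also catches the case where the temporary appears inside a larger expression.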


This whole function is too complex (and it writes TARGET_POWERPC64 where
it needs TARGET_64BIT, for example).


The patch is okay for trunk (with the comment moved, and the rtx_equal_p
fixed).  Thanks!


Segher

Michael Meissner Jan. 7, 2020, 1:41 a.m. UTC | #2
On Sun, Dec 22, 2019 at 11:10:09AM -0600, Segher Boessenkool wrote:
> The patch is okay for trunk (with the comment moved, and the rtx_equal_p
> fixed).  Thanks!

Here is the patch I committed (subversion id 279937):

2020-01-06  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add support
	for the offset being 34-bits when -mcpu=future is used.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 279910)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -6797,11 +6797,19 @@ rs6000_adjust_vec_address (rtx scalar_re
 	  HOST_WIDE_INT offset = INTVAL (op1) + INTVAL (element_offset);
 	  rtx offset_rtx = GEN_INT (offset);
 
-	  if (IN_RANGE (offset, -32768, 32767)
+	  /* 16-bit offset.  */
+	  if (SIGNED_INTEGER_16BIT_P (offset)
 	      && (scalar_size < 8 || (offset & 0x3) == 0))
 	    new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
+
+	  /* 34-bit offset if we have prefixed addresses.  */
+	  else if (TARGET_PREFIXED_ADDR && SIGNED_INTEGER_34BIT_P (offset))
+	    new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
+
 	  else
 	    {
+	      /* Offset overflowed, move offset to the temporary (which will
+		 likely be split), and do X-FORM addressing.  */
 	      emit_move_insn (base_tmp, offset_rtx);
 	      new_addr = gen_rtx_PLUS (Pmode, op0, base_tmp);
 	    }
@@ -6830,6 +6838,12 @@ rs6000_adjust_vec_address (rtx scalar_re
 	      emit_insn (insn);
 	    }
 
+	  /* Make sure we don't overwrite the temporary if the element being
+	     extracted is variable, and we've put the offset into base_tmp
+	     previously.  */
+	  else if (reg_mentioned_p (base_tmp, element_offset))
+	    emit_insn (gen_add2_insn (base_tmp, op1));
+
 	  else
 	    {
 	      emit_move_insn (base_tmp, op1);

Patch

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 279553)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -6792,9 +6792,17 @@  rs6000_adjust_vec_address (rtx scalar_re
 	  HOST_WIDE_INT offset = INTVAL (op1) + INTVAL (element_offset);
 	  rtx offset_rtx = GEN_INT (offset);
 
-	  if (IN_RANGE (offset, -32768, 32767)
+	  /* 16-bit offset.  */
+	  if (SIGNED_INTEGER_16BIT_P (offset)
 	      && (scalar_size < 8 || (offset & 0x3) == 0))
 	    new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
+
+	  /* 34-bit offset if we have prefixed addresses.  */
+	  else if (TARGET_PREFIXED_ADDR && SIGNED_INTEGER_34BIT_P (offset))
+	    new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
+
+	  /* Offset overflowed, move offset to the temporary (which will likely
+	     be split), and do X-FORM addressing.  */
 	  else
 	    {
 	      emit_move_insn (base_tmp, offset_rtx);
@@ -6825,6 +6833,12 @@  rs6000_adjust_vec_address (rtx scalar_re
 	      emit_insn (insn);
 	    }
 
+	  /* Make sure we don't overwrite the temporary if the element being
+	     extracted is variable, and we've put the offset into base_tmp
+	     previously.  */
+	  else if (rtx_equal_p (base_tmp, element_offset))
+	    emit_insn (gen_add2_insn (base_tmp, op1));
+
 	  else
 	    {
 	      /* Make sure base_tmp is not the same as element_offset.  This