diff mbox

Fix PR 78243 on PowerPC

Message ID 20161111002115.GA30042@ibm-tiger.the-meissners.org
State New
Headers show

Commit Message

Michael Meissner Nov. 11, 2016, 12:21 a.m. UTC
Aaron Sawdey has been running the GCC testsuite on the power9 simulator and he
noticed that:

	gcc.c-torture/execute/pr68532.c

does not run, and opened bug 78243 for this failure.

Now, if you compile pr68532 with normal options (-O2/-O3 and -mcpu=power8 or
-mcpu=power9) it works because with the normal powerpc vectorization costs, the
vectorize only generates scalar code.  However, the options in the PR
explicitly turns on -fno-vect-cost-model, which forces the loop to be be
vectorized.  In doing so, it generates pretty bad code.

The vectorizer generates a vector add loop, and at the end it does a vector
reduction to get the total added.  When -mcpu=power9 is used it generates a
VEXTRACTUH instruction to extract the HImode from the V8HImode vector.
Unfortunately, on little endian (with little endian element ordering), it gets
the wrong element.

This patch fixes that problem.  I did a bootstrap and regression test, and
there were no regressions.  I ran the test on the power9 simulator for both big
endian and little endian options and it passed.  I also ran the following
executable tests from the testsuite which exercise vector init, set, and
extract for each of the basic types:

	vec-init-1.c	element type: int
	vec-init-2.c	element type: long
	vec-init-4.c	element type: short
	vec-init-5.c	element type: signed char
	vec-init-8.c	element type: float
	vec-init-9.c	element type: double

All tests passed on both little endian and big endian simulator runs.  Can I
check this patch into the trunk?

2016-11-10  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/78243
	* config/rs6000/vsx.md (vsx_extract_<mode>_p9): Correct the
	element order for little endian ordering.

	* config/rs6000/altivec.md (reduc_plus_scal_<mode>): Use
	VECTOR_ELT_ORDER_BIG and not BYTES_BIG_ENDIAN to adjust element
	number.

Comments

Segher Boessenkool Nov. 11, 2016, 1:03 a.m. UTC | #1
On Thu, Nov 10, 2016 at 07:21:15PM -0500, Michael Meissner wrote:
> --- gcc/config/rs6000/vsx.md	(revision 242048)
> +++ gcc/config/rs6000/vsx.md	(revision 242049)
> @@ -2542,10 +2542,13 @@ (define_insn "vsx_extract_<mode>_p9"
>    "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_VEXTRACTUB
>     && TARGET_VSX_SMALL_INTEGER"
>  {
> -  /* Note, the element number has already been adjusted for endianness, so we
> -     don't have to adjust it here.  */
> -  int unit_size = GET_MODE_UNIT_SIZE (<MODE>mode);
> -  HOST_WIDE_INT offset = unit_size * INTVAL (operands[2]);
> +  HOST_WIDE_INT elt = INTVAL (operands[2]);
> +  HOST_WIDE_INT elt_adj = ((!VECTOR_ELT_ORDER_BIG)
> +			   ? (GET_MODE_NUNITS (<MODE>mode) - 1 - elt)
> +			   : elt);

Two unnecessary pairs of parens (three, but emacs likes the outer pair?)

Otherwise looks fine.  Okay for trunk with that fix, thanks,


Segher
diff mbox

Patch

Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 242048)
+++ gcc/config/rs6000/vsx.md	(revision 242049)
@@ -2542,10 +2542,13 @@  (define_insn "vsx_extract_<mode>_p9"
   "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_VEXTRACTUB
    && TARGET_VSX_SMALL_INTEGER"
 {
-  /* Note, the element number has already been adjusted for endianness, so we
-     don't have to adjust it here.  */
-  int unit_size = GET_MODE_UNIT_SIZE (<MODE>mode);
-  HOST_WIDE_INT offset = unit_size * INTVAL (operands[2]);
+  HOST_WIDE_INT elt = INTVAL (operands[2]);
+  HOST_WIDE_INT elt_adj = ((!VECTOR_ELT_ORDER_BIG)
+			   ? (GET_MODE_NUNITS (<MODE>mode) - 1 - elt)
+			   : elt);
+
+  HOST_WIDE_INT unit_size = GET_MODE_UNIT_SIZE (<MODE>mode);
+  HOST_WIDE_INT offset = unit_size * elt_adj;
 
   operands[2] = GEN_INT (offset);
   if (unit_size == 4)
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 242048)
+++ gcc/config/rs6000/altivec.md	(revision 242049)
@@ -2785,7 +2785,7 @@  (define_expand "reduc_plus_scal_<mode>"
   rtx vtmp1 = gen_reg_rtx (V4SImode);
   rtx vtmp2 = gen_reg_rtx (<MODE>mode);
   rtx dest = gen_lowpart (V4SImode, vtmp2);
-  int elt = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (<MODE>mode) - 1 : 0;
+  int elt = VECTOR_ELT_ORDER_BIG ? GET_MODE_NUNITS (<MODE>mode) - 1 : 0;
 
   emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
   emit_insn (gen_altivec_vsum4s<VI_char>s (vtmp1, operands[1], vzero));