
[AArch64_be] Fix vec_select hi/lo mask confusions.

Message ID 1406715057-23501-1-git-send-email-james.greenhalgh@arm.com
State New

Commit Message

James Greenhalgh July 30, 2014, 10:10 a.m. UTC
Hi,

A vec_select mask exists in GCC's world-view of lane ordering. The
"low-half" of the vector { a, b, c, d } is { a, b }, which on big-endian
will be in the high bits of the architectural register. On little-endian,
these lanes will be in the low bits of the architectural register.
We therefore need different masks depending on our target endian-ness.
The diagram below may help.

We must draw the distinction when building masks which select one half of the
vector.  An instruction selecting architectural low-lanes for a big-endian
target must be described using a mask selecting GCC high-lanes.

                 Big-Endian             Little-Endian

GCC             0   1   2   3           3   2   1   0
              | x | x | x | x |       | x | x | x | x |
Architecture    3   2   1   0           3   2   1   0

Low Mask:         { 2, 3 }                { 0, 1 }
High Mask:        { 0, 1 }                { 2, 3 }

The way we implement this requires some "there is no spoon" thinking to avoid
pattern duplication. We define a vec_par_cnst_lo_half mask to always
refer to the low architectural lanes. I gave some thought to renaming this
vec_par_cnst_arch_lo_half, but it didn't add much meaning. I'm happy to
take bike-shedding towards a more self-documenting naming scheme.
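
To make the selection rule concrete, here is a minimal standalone C sketch
of the index computation (the rtx plumbing is stubbed out and the plain-int
interface is illustrative only, not the GCC API):

  #include <stdio.h>
  #include <stdbool.h>

  /* Standalone model of aarch64_simd_vect_par_cnst_half: fill OUT with
     the lane indices selecting one architectural half of a vector with
     NUNITS lanes.  HIGH selects the architectural high half; under
     big-endian those lanes carry GCC's *low* numbers, hence the swap.  */
  static void
  half_mask (int nunits, bool high, bool bytes_big_endian, int *out)
  {
    int high_base = nunits / 2;
    int low_base = 0;
    int base;

    if (bytes_big_endian)
      base = high ? low_base : high_base;
    else
      base = high ? high_base : low_base;

    for (int i = 0; i < nunits / 2; i++)
      out[i] = base + i;
  }

  int
  main (void)
  {
    int mask[2];
    /* Architectural low half, big-endian, 4 lanes: prints { 2, 3 },
       matching the "Low Mask" row of the diagram above.  */
    half_mask (4, false, true, mask);
    printf ("{ %d, %d }\n", mask[0], mask[1]);
    return 0;
  }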

No regressions spotted on aarch64_be-none-elf or aarch64-none-elf.

OK for trunk?

Thanks,
James

---
gcc/

2014-07-30  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64.c (aarch64_simd_vect_par_cnst_half): Vary
	the generated mask based on BYTES_BIG_ENDIAN.
	(aarch64_simd_check_vect_par_cnst_half): New.
	* config/aarch64/aarch64-protos.h
	(aarch64_simd_check_vect_par_cnst_half): New.
	* config/aarch64/predicates.md (vect_par_cnst_hi_half): Refactor
	the check out to aarch64_simd_check_vect_par_cnst_half.
	(vect_par_cnst_lo_half): Likewise.
	* config/aarch64/aarch64-simd.md
	(aarch64_simd_move_hi_quad_<mode>): Always use vec_par_cnst_lo_half.
	(move_hi_quad_<mode>): Always generate a low mask.

Comments

Richard Biener July 30, 2014, 10:21 a.m. UTC | #1
On Wed, Jul 30, 2014 at 12:10 PM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
>
> Hi,
>
> A vec_select mask exists in GCC's world-view of lane ordering. The
> "low-half" of the vector { a, b, c, d } is { a, b }, which on big-endian
> will be in the high bits of the architectural register. On little-endian,
> these lanes will be in the low bits of the architectural register.
> We therefore need different masks depending on our target endian-ness.
> The diagram below may help.
>
> We must draw the distinction when building masks which select one half of the
> vector.  An instruction selecting architectural low-lanes for a big-endian
> target must be described using a mask selecting GCC high-lanes.
>
>                  Big-Endian             Little-Endian
>
> GCC             0   1   2   3           3   2   1   0
>               | x | x | x | x |       | x | x | x | x |
> Architecture    3   2   1   0           3   2   1   0
>
> Low Mask:         { 2, 3 }                { 0, 1 }
> High Mask:        { 0, 1 }                { 2, 3 }
>
> The way we implement this requires some "there is no spoon" thinking to avoid
> pattern duplication. We define a vec_par_cnst_lo_half mask to always
> refer to the low architectural lanes. I gave some thought to renaming this
> vec_par_cnst_arch_lo_half, but it didn't add much meaning. I'm happy to
> take bike-shedding towards a more self-documenting naming scheme.
>
> No regressions spotted on aarch64_be-none-elf or aarch64-none-elf.
>
> OK for trunk?

Please make sure the above is still correct if you rip out all
if (BYTES_BIG_ENDIAN) cases from tree-vect*.c.

Richard.

> Thanks,
> James
>
> ---
> gcc/
>
> 2014-07-30  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64.c (aarch64_simd_vect_par_cnst_half): Vary
>         the generated mask based on BYTES_BIG_ENDIAN.
>         (aarch64_simd_check_vect_par_cnst_half): New.
>         * config/aarch64/aarch64-protos.h
>         (aarch64_simd_check_vect_par_cnst_half): New.
>         * config/aarch64/predicates.md (vect_par_cnst_hi_half): Refactor
>         the check out to aarch64_simd_check_vect_par_cnst_half.
>         (vect_par_cnst_lo_half): Likewise.
>         * config/aarch64/aarch64-simd.md
>         (aarch64_simd_move_hi_quad_<mode>): Always use vec_par_cnst_lo_half.
>         (move_hi_quad_<mode>): Always generate a low mask.
James Greenhalgh July 30, 2014, 10:45 a.m. UTC | #2
On Wed, Jul 30, 2014 at 11:21:40AM +0100, Richard Biener wrote:
> On Wed, Jul 30, 2014 at 12:10 PM, James Greenhalgh
> <james.greenhalgh@arm.com> wrote:
> >
> > Hi,
> >
> > A vec_select mask exists in GCC's world-view of lane ordering. The
> > "low-half" of the vector { a, b, c, d } is { a, b }, which on big-endian
> > will be in the high bits of the architectural register. On little-endian,
> > these lanes will be in the low bits of the architectural register.
> > We therefore need different masks depending on our target endian-ness.
> > The diagram below may help.
> >
> > We must draw the distinction when building masks which select one half of the
> > vector.  An instruction selecting architectural low-lanes for a big-endian
> > target must be described using a mask selecting GCC high-lanes.
> >
> >                  Big-Endian             Little-Endian
> >
> > GCC             0   1   2   3           3   2   1   0
> >               | x | x | x | x |       | x | x | x | x |
> > Architecture    3   2   1   0           3   2   1   0
> >
> > Low Mask:         { 2, 3 }                { 0, 1 }
> > High Mask:        { 0, 1 }                { 2, 3 }
> >
> > The way we implement this requires some "there is no spoon" thinking to avoid
> > pattern duplication. We define a vec_par_cnst_lo_half mask to always
> > refer to the low architectural lanes. I gave some thought to renaming this
> > vec_par_cnst_arch_lo_half, but it didn't add much meaning. I'm happy to
> > take bike-shedding towards a more self-documenting naming scheme.
> >
> > No regressions spotted on aarch64_be-none-elf or aarch64-none-elf.
> >
> > OK for trunk?
> 
> Please make sure the above is still correct if you rip out all
> if (BYTES_BIG_ENDIAN) cases from tree-vect*.c.

It will be, yes.

The RTL and Tree/Gimple level representations will have a consistent view
that still won't match up with the way the architecture thinks of lanes and
elements. On our big-endian systems, the lowest address in memory gets loaded
to the highest-numbered lane in the register. GCC thinks of the lowest address
in memory as the lowest element in its vectors, which makes sense, but causes
a mismatch. So when GCC wants bits 0-31 of a V4SI vector extracted, it really
means it wants what would be array element 0, so we need to map that to
extracting bits 96-127 from the register.
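
As a quick sanity check of that mapping, a few lines of C (a hypothetical
helper, not code from the patch) compute which architectural register bits
a given GCC lane occupies on big-endian:

  #include <stdio.h>

  /* On big-endian, GCC lane LANE of an NUNITS-lane vector with
     ELT_BITS-bit elements lands in architectural lane NUNITS - 1 - LANE.  */
  static void
  be_lane_bits (int lane, int nunits, int elt_bits)
  {
    int arch_lane = nunits - 1 - lane;
    int lo = arch_lane * elt_bits;
    printf ("GCC lane %d -> register bits %d-%d\n",
            lane, lo, lo + elt_bits - 1);
  }

  int
  main (void)
  {
    be_lane_bits (0, 4, 32);  /* V4SI element 0 -> bits 96-127.  */
    return 0;
  }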

This is all just back-end magic to keep the mid-end endianness agnostic.

There will, of course, be some patterns we'll have to clean up after we fix
tree-vect*.c, but this fundamental mismatch of lane numbering won't change.
We'll just have the same pain as the other big-endian backends adjusting
patterns as needed.

Cheers,
James

> 
> Richard.
> 
> > Thanks,
> > James
> >
> > ---
> > gcc/
> >
> > 2014-07-30  James Greenhalgh  <james.greenhalgh@arm.com>
> >
> >         * config/aarch64/aarch64.c (aarch64_simd_vect_par_cnst_half): Vary
> >         the generated mask based on BYTES_BIG_ENDIAN.
> >         (aarch64_simd_check_vect_par_cnst_half): New.
> >         * config/aarch64/aarch64-protos.h
> >         (aarch64_simd_check_vect_par_cnst_half): New.
> >         * config/aarch64/predicates.md (vect_par_cnst_hi_half): Refactor
> >         the check out to aarch64_simd_check_vect_par_cnst_half.
> >         (vect_par_cnst_lo_half): Likewise.
> >         * config/aarch64/aarch64-simd.md
> >         (aarch64_simd_move_hi_quad_<mode>): Always use vec_par_cnst_lo_half.
> >         (move_hi_quad_<mode>): Always generate a low mask.
>
Marcus Shawcroft July 31, 2014, 1:34 p.m. UTC | #3
On 30 July 2014 11:10, James Greenhalgh <james.greenhalgh@arm.com> wrote:

> 2014-07-30  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64.c (aarch64_simd_vect_par_cnst_half): Vary
>         the generated mask based on BYTES_BIG_ENDIAN.
>         (aarch64_simd_check_vect_par_cnst_half): New.
>         * config/aarch64/aarch64-protos.h
>         (aarch64_simd_check_vect_par_cnst_half): New.
>         * config/aarch64/predicates.md (vect_par_cnst_hi_half): Refactor
>         the check out to aarch64_simd_check_vect_par_cnst_half.
>         (vect_par_cnst_lo_half): Likewise.
>         * config/aarch64/aarch64-simd.md
>         (aarch64_simd_move_hi_quad_<mode>): Always use vec_par_cnst_lo_half.
>         (move_hi_quad_<mode>): Always generate a low mask.

OK with me /Marcus

Patch

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 53023ba..927a20e 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -200,6 +200,8 @@  bool aarch64_pad_arg_upward (enum machine_mode, const_tree);
 bool aarch64_pad_reg_upward (enum machine_mode, const_tree, bool);
 bool aarch64_regno_ok_for_base_p (int, bool);
 bool aarch64_regno_ok_for_index_p (int, bool);
+bool aarch64_simd_check_vect_par_cnst_half (rtx op, enum machine_mode mode,
+					    bool high);
 bool aarch64_simd_imm_scalar_p (rtx x, enum machine_mode mode);
 bool aarch64_simd_imm_zero_p (rtx, enum machine_mode);
 bool aarch64_simd_scalar_immediate_valid_for_move (rtx, enum machine_mode);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6300b9b6c7ac06384d2e59bbac1a0d5445975bb6..0d4b37e53b758919aea382c22d1c600f113a8b54 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1022,7 +1022,7 @@  (define_insn "aarch64_simd_move_hi_quad_
 	  (match_operand:<VHALF> 1 "register_operand" "w,r")
           (vec_select:<VHALF>
                 (match_dup 0)
-                (match_operand:VQ 2 "vect_par_cnst_hi_half" ""))))]
+                (match_operand:VQ 2 "vect_par_cnst_lo_half" ""))))]
   "TARGET_SIMD && BYTES_BIG_ENDIAN"
   "@
    ins\\t%0.d[1], %1.d[0]
@@ -1035,7 +1035,7 @@  (define_expand "move_hi_quad_<mode>"
   (match_operand:<VHALF> 1 "register_operand" "")]
  "TARGET_SIMD"
 {
-  rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, BYTES_BIG_ENDIAN);
+  rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, false);
   if (BYTES_BIG_ENDIAN)
     emit_insn (gen_aarch64_simd_move_hi_quad_be_<mode> (operands[0],
 		    operands[1], p));
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ed80269..2ea55e8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7900,23 +7900,81 @@  aarch64_simd_scalar_immediate_valid_for_move (rtx op, enum machine_mode mode)
   return aarch64_simd_valid_immediate (op_v, vmode, false, NULL);
 }
 
-/* Construct and return a PARALLEL RTX vector.  */
+/* Construct and return a PARALLEL RTX vector with elements numbering the
+   lanes of either the high (HIGH == TRUE) or low (HIGH == FALSE) half of
+   the vector - from the perspective of the architecture.  This does not
+   line up with GCC's perspective on lane numbers, so we end up with
+   different masks depending on our target endian-ness.  The diagram
+   below may help.  We must draw the distinction when building masks
+   which select one half of the vector.  An instruction selecting
+   architectural low-lanes for a big-endian target must be described using
+   a mask selecting GCC high-lanes.
+
+                 Big-Endian             Little-Endian
+
+GCC             0   1   2   3           3   2   1   0
+              | x | x | x | x |       | x | x | x | x |
+Architecture    3   2   1   0           3   2   1   0
+
+Low Mask:         { 2, 3 }                { 0, 1 }
+High Mask:        { 0, 1 }                { 2, 3 }
+*/
+
 rtx
 aarch64_simd_vect_par_cnst_half (enum machine_mode mode, bool high)
 {
   int nunits = GET_MODE_NUNITS (mode);
   rtvec v = rtvec_alloc (nunits / 2);
-  int base = high ? nunits / 2 : 0;
+  int high_base = nunits / 2;
+  int low_base = 0;
+  int base;
   rtx t1;
   int i;
 
-  for (i=0; i < nunits / 2; i++)
+  if (BYTES_BIG_ENDIAN)
+    base = high ? low_base : high_base;
+  else
+    base = high ? high_base : low_base;
+
+  for (i = 0; i < nunits / 2; i++)
     RTVEC_ELT (v, i) = GEN_INT (base + i);
 
   t1 = gen_rtx_PARALLEL (mode, v);
   return t1;
 }
 
+/* Check OP for validity as a PARALLEL RTX vector with elements
+   numbering the lanes of either the high (HIGH == TRUE) or low lanes,
+   from the perspective of the architecture.  See the diagram above
+   aarch64_simd_vect_par_cnst_half for more details.  */
+
+bool
+aarch64_simd_check_vect_par_cnst_half (rtx op, enum machine_mode mode,
+				       bool high)
+{
+  rtx ideal = aarch64_simd_vect_par_cnst_half (mode, high);
+  HOST_WIDE_INT count_op = XVECLEN (op, 0);
+  HOST_WIDE_INT count_ideal = XVECLEN (ideal, 0);
+  int i = 0;
+
+  if (!VECTOR_MODE_P (mode))
+    return false;
+
+  if (count_op != count_ideal)
+    return false;
+
+  for (i = 0; i < count_ideal; i++)
+    {
+      rtx elt_op = XVECEXP (op, 0, i);
+      rtx elt_ideal = XVECEXP (ideal, 0, i);
+
+      if (GET_CODE (elt_op) != CONST_INT
+	  || INTVAL (elt_ideal) != INTVAL (elt_op))
+	return false;
+    }
+  return true;
+}
+
 /* Bounds-check lanes.  Ensure OPERAND lies between LOW (inclusive) and
    HIGH (exclusive).  */
 void
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 2702a3c8d830963197a933eac3c0cf3869ef29db..95d1910ccece59fcc1c0a56d27ed084362710d10 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -207,62 +207,15 @@  (define_predicate "aarch64_sync_memory_o
 (define_special_predicate "vect_par_cnst_hi_half"
   (match_code "parallel")
 {
-  HOST_WIDE_INT count = XVECLEN (op, 0);
-  int nunits = GET_MODE_NUNITS (mode);
-  int i;
-
-  if (count < 1
-      || count != nunits / 2)
-    return false;
- 
-  if (!VECTOR_MODE_P (mode))
-    return false;
-
-  for (i = 0; i < count; i++)
-   {
-     rtx elt = XVECEXP (op, 0, i);
-     int val;
-
-     if (GET_CODE (elt) != CONST_INT)
-       return false;
-
-     val = INTVAL (elt);
-     if (val != (nunits / 2) + i)
-       return false;
-   }
-  return true;
+  return aarch64_simd_check_vect_par_cnst_half (op, mode, true);
 })
 
 (define_special_predicate "vect_par_cnst_lo_half"
   (match_code "parallel")
 {
-  HOST_WIDE_INT count = XVECLEN (op, 0);
-  int nunits = GET_MODE_NUNITS (mode);
-  int i;
-
-  if (count < 1
-      || count != nunits / 2)
-    return false;
-
-  if (!VECTOR_MODE_P (mode))
-    return false;
-
-  for (i = 0; i < count; i++)
-   {
-     rtx elt = XVECEXP (op, 0, i);
-     int val;
-
-     if (GET_CODE (elt) != CONST_INT)
-       return false;
-
-     val = INTVAL (elt);
-     if (val != i)
-       return false;
-   }
-  return true;
+  return aarch64_simd_check_vect_par_cnst_half (op, mode, false);
 })
 
-
 (define_special_predicate "aarch64_simd_lshift_imm"
   (match_code "const_vector")
 {
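
To see how the refactored predicate pairs with the mask generator, here is
a simplified standalone model in plain C (int arrays stand in for the
PARALLEL rtx; the names and interfaces are illustrative, not GCC's own API):

  #include <stdio.h>
  #include <stdbool.h>

  /* Model of aarch64_simd_vect_par_cnst_half: build the ideal half mask.  */
  static void
  ideal_half_mask (int nunits, bool high, bool big_endian, int *out)
  {
    /* Big-endian swaps which GCC lane numbers form each architectural
       half.  */
    int base = (big_endian ? !high : high) ? nunits / 2 : 0;
    for (int i = 0; i < nunits / 2; i++)
      out[i] = base + i;
  }

  /* Model of aarch64_simd_check_vect_par_cnst_half: compare a candidate
     mask against the ideal one, as both predicates now do.  */
  static bool
  check_half_mask (const int *op, int count, int nunits, bool high,
                   bool big_endian)
  {
    int ideal[16];
    if (count != nunits / 2 || count > 16)
      return false;
    ideal_half_mask (nunits, high, big_endian, ideal);
    for (int i = 0; i < count; i++)
      if (op[i] != ideal[i])
        return false;
    return true;
  }

  int
  main (void)
  {
    int lo_mask_be[] = { 2, 3 };  /* Architectural low half, big-endian.  */
    printf ("%s\n",
            check_half_mask (lo_mask_be, 2, 4, false, true)
            ? "valid" : "invalid");
    return 0;
  }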