Patchwork [testsuite] require arm_little_endian in two tests

login
register
mail settings
Submitter Janis Johnson
Date Oct. 12, 2011, 11:21 p.m.
Message ID <4E9620F3.5010007@mentor.com>
Download mbox | patch
Permalink /patch/119327/
State New
Headers show

Comments

Janis Johnson - Oct. 12, 2011, 11:21 p.m.
Tests gcc.target/arm/pr48252.c and gcc.target/arm/neon-vset_lanes8.c
expect little-endian code and fail when compiled with -mbig-endian.
This patch skips the test if the current multilib does not generate
little-endian code.

I'm not able to run execution tests for -mbig-endian for GCC mainline
but have tested this patch with CodeSourcery's GCC 4.6.  OK for trunk?
2011-10-12  Janis Johnson  <janisjo@codesourcery.com>

	* gcc.target/arm/pr48252.c: Require arm_little_endian.
	* gcc.target/arm/neon-vset_lanes8.c: Likewise.
Richard Earnshaw - Oct. 13, 2011, 1:12 p.m.
On 13/10/11 00:21, Janis Johnson wrote:
> Tests gcc.target/arm/pr48252.c and gcc.target/arm/neon-vset_lanes8.c
> expect little-endian code and fail when compiled with -mbig-endian.
> This patch skips the test if the current multilib does not generate
> little-endian code.
> 
> I'm not able to run execution tests for -mbig-endian for GCC mainline
> but have tested this patch with CodeSourcery's GCC 4.6.  OK for trunk?
> 
> 
> gcc-20111012-003
> 
> 
> 2011-10-12  Janis Johnson  <janisjo@codesourcery.com>
> 
> 	* gcc.target/arm/pr48252.c: Require arm_little_endian.
> 	* gcc.target/arm/neon-vset_lanes8.c: Likewise.
> 
> Index: gcc/testsuite/gcc.target/arm/pr48252.c
> ===================================================================
> --- gcc/testsuite/gcc.target/arm/pr48252.c	(revision 344214)
> +++ gcc/testsuite/gcc.target/arm/pr48252.c	(working copy)
> @@ -1,5 +1,6 @@
>  /* { dg-do run } */
>  /* { dg-require-effective-target arm_neon_hw } */
> +/* { dg-require-effective-target arm_little_endian } */
>  /* { dg-options "-O2" } */
>  /* { dg-add-options arm_neon } */
>  

I can't think of any obvious reason why this should fail in big-endian.

> Index: gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c
> ===================================================================
> --- gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c	(revision 344214)
> +++ gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c	(working copy)
> @@ -2,6 +2,7 @@
>  
>  /* { dg-do run } */
>  /* { dg-require-effective-target arm_neon_hw } */
> +/* { dg-require-effective-target arm_little_endian } */
>  /* { dg-options "-O0" } */
>  /* { dg-add-options arm_neon } */
>  

I can see why this fails at present, the test is based on the assumption
that

int8x8_t x = {...}
puts the first element in lane 0 and subsequent elements in consecutive
lanes, *and* that this is equivalent to casting char[8] into a vector.
However, this isn't the case for big-endian.

There's two ways this could be sorted.

1) Change the testcase to:

#include "arm_neon.h"
#include <stdlib.h>
#include <string.h>

signed char x_init[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
signed char y_init[8] = { 1, 2, 3, 16, 5, 6, 7, 8 };

int main (void)
{
  int8x8_t x = vld1_s8(x_init);
  int8x8_t y = vld1_s8(y_init);

  x = vset_lane_s8 (16, x, 3);
  if (memcmp (&x, &y, sizeof (x)) != 0)
    abort();
  return 0;
}

2) Change the compiler to make initializers of vectors assign elements
of initializers to consecutive lanes in a vector, rather than the
current behaviour of 'casting' an array of elements to a vector.

While the second would be my preferred change, I suspect it's too hard
to fix, and may well cause code written for other targets to break on
big-endian (altivec for example).

R.
Joseph S. Myers - Oct. 13, 2011, 2:56 p.m.
On Thu, 13 Oct 2011, Richard Earnshaw wrote:

> 2) Change the compiler to make initializers of vectors assign elements
> of initializers to consecutive lanes in a vector, rather than the
> current behaviour of 'casting' an array of elements to a vector.
> 
> While the second would be my preferred change, I suspect it's too hard
> to fix, and may well cause code written for other targets to break on
> big-endian (altivec for example).

Indeed, vector initializers are part of the target-independent GNU C 
language and have target-independent semantics that the elements go in 
memory order, corresponding to the target-independent semantics of lane 
numbers where they appear in GENERIC, GIMPLE and (non-UNSPEC) RTL and any 
target-independent built-in functions that use such numbers.  (The issue 
here being, as you saw, that the lane numbers used in ARM-specific NEON 
intrinsics are for big-endian not the same as those used in 
target-independent features of GNU C and target-independent internal 
representations in GCC - hence various code to translate them between the 
two conventions when processing intrinsics into non-UNSPEC RTL, and to 
translate back when generating assembly instructions that encode lane 
numbers with the ARM conventions, as expounded at greater length at 
<http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html>.)
Richard Earnshaw - Oct. 13, 2011, 3:12 p.m.
On 13/10/11 15:56, Joseph S. Myers wrote:
> On Thu, 13 Oct 2011, Richard Earnshaw wrote:
> 
>> 2) Change the compiler to make initializers of vectors assign elements
>> of initializers to consecutive lanes in a vector, rather than the
>> current behaviour of 'casting' an array of elements to a vector.
>>
>> While the second would be my preferred change, I suspect it's too hard
>> to fix, and may well cause code written for other targets to break on
>> big-endian (altivec for example).
> 
> Indeed, vector initializers are part of the target-independent GNU C 
> language and have target-independent semantics that the elements go in 
> memory order, corresponding to the target-independent semantics of lane 
> numbers where they appear in GENERIC, GIMPLE and (non-UNSPEC) RTL and any 
> target-independent built-in functions that use such numbers.  (The issue 
> here being, as you saw, that the lane numbers used in ARM-specific NEON 
> intrinsics are for big-endian not the same as those used in 
> target-independent features of GNU C and target-independent internal 
> representations in GCC - hence various code to translate them between the 
> two conventions when processing intrinsics into non-UNSPEC RTL, and to 
> translate back when generating assembly instructions that encode lane 
> numbers with the ARM conventions, as expounded at greater length at 
> <http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html>.)
> 

This is all rather horrible, and leads to THREE different layouts for a
128-bit vector for big-endian Neon.

GCC format
'VLD1.n' format
'ABI' format

GCC format and 'ABI' format differ in that the 64-bit words of the
128-bit vector are swapped.

All this and they are all expected to share a single machine mode.

Furthermore, the definitions in GCC are broken, in that the types
defined in arm_neon.h (eg int8x16_t) are supposed to be ABI format, not
GCC format.

Eukkkkkk! :-(

R.
Julian Brown - Oct. 14, 2011, 10:42 a.m.
On Thu, 13 Oct 2011 16:12:17 +0100
Richard Earnshaw <rearnsha@arm.com> wrote:

> On 13/10/11 15:56, Joseph S. Myers wrote:
> > Indeed, vector initializers are part of the target-independent GNU
> > C language and have target-independent semantics that the elements
> > go in memory order, corresponding to the target-independent
> > semantics of lane numbers where they appear in GENERIC, GIMPLE and
> > (non-UNSPEC) RTL and any target-independent built-in functions that
> > use such numbers.  (The issue here being, as you saw, that the lane
> > numbers used in ARM-specific NEON intrinsics are for big-endian not
> > the same as those used in target-independent features of GNU C and
> > target-independent internal representations in GCC - hence various
> > code to translate them between the two conventions when processing
> > intrinsics into non-UNSPEC RTL, and to translate back when
> > generating assembly instructions that encode lane numbers with the
> > ARM conventions, as expounded at greater length at
> > <http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html>.)
> > 
> 
> This is all rather horrible, and leads to THREE different layouts for
> a 128-bit vector for big-endian Neon.
> 
> GCC format
> 'VLD1.n' format
> 'ABI' format
> 
> GCC format and 'ABI' format differ in that the 64-bit words of the
> 128-bit vector are swapped.
> 
> All this and they are all expected to share a single machine mode.
> 
> Furthermore, the definitions in GCC are broken, in that the types
> defined in arm_neon.h (eg int8x16_t) are supposed to be ABI format,
> not GCC format.
> 
> Eukkkkkk! :-(

FWIW, I thought long and hard about this problem, and eventually gave
up trying to solve it. Note that many operations which depend on the
ordering of vectors are now disabled entirely (at least for Q regs) in
neon.md in big-endian mode to try and limit the damage. NEON is
basically only supported properly in little-endian mode, IMO.

I'd love to see this resolved properly. Some random observations:

 * The vectorizer can use whatever layout it wants for vectors in
   either endianness. Vectorizer vectors never interact with either
   GCC generic (source-level) vectors, nor the NEON intrinsics. Also
   they never cross ABI boundaries.

 * GCC generic vectors aren't specified very formally, particularly wrt.
   their interaction with NEON intrinsics. If you stick *entirely* to
   accessing vectors via NEON intrinsics, the problems in big-endian
   mode (I think) don't ever materialise. This includes not using
   indirection to load/store vectors, and (of course) not constructing
   vectors using { x, y, z... } syntax. One possibility might be to
   detect and *disallow* code which attempts to mix vector operations
   like that.

I don't quite understand your comment about the GCC definitions of
int8x16_t etc. being broken, tbh...

Cheers,

Julian
Richard Earnshaw - Oct. 14, 2011, 2:57 p.m.
On 14/10/11 11:42, Julian Brown wrote:
> On Thu, 13 Oct 2011 16:12:17 +0100
> Richard Earnshaw <rearnsha@arm.com> wrote:
> 
>> On 13/10/11 15:56, Joseph S. Myers wrote:
>>> Indeed, vector initializers are part of the target-independent GNU
>>> C language and have target-independent semantics that the elements
>>> go in memory order, corresponding to the target-independent
>>> semantics of lane numbers where they appear in GENERIC, GIMPLE and
>>> (non-UNSPEC) RTL and any target-independent built-in functions that
>>> use such numbers.  (The issue here being, as you saw, that the lane
>>> numbers used in ARM-specific NEON intrinsics are for big-endian not
>>> the same as those used in target-independent features of GNU C and
>>> target-independent internal representations in GCC - hence various
>>> code to translate them between the two conventions when processing
>>> intrinsics into non-UNSPEC RTL, and to translate back when
>>> generating assembly instructions that encode lane numbers with the
>>> ARM conventions, as expounded at greater length at
>>> <http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html>.)
>>>
>>
>> This is all rather horrible, and leads to THREE different layouts for
>> a 128-bit vector for big-endian Neon.
>>
>> GCC format
>> 'VLD1.n' format
>> 'ABI' format
>>
>> GCC format and 'ABI' format differ in that the 64-bit words of the
>> 128-bit vector are swapped.
>>
>> All this and they are all expected to share a single machine mode.
>>
>> Furthermore, the definitions in GCC are broken, in that the types
>> defined in arm_neon.h (eg int8x16_t) are supposed to be ABI format,
>> not GCC format.
>>
>> Eukkkkkk! :-(
> 
> FWIW, I thought long and hard about this problem, and eventually gave
> up trying to solve it. Note that many operations which depend on the
> ordering of vectors are now disabled entirely (at least for Q regs) in
> neon.md in big-endian mode to try and limit the damage. NEON is
> basically only supported properly in little-endian mode, IMO.
> 
> I'd love to see this resolved properly. Some random observations:
> 
>  * The vectorizer can use whatever layout it wants for vectors in
>    either endianness. Vectorizer vectors never interact with either
>    GCC generic (source-level) vectors, nor the NEON intrinsics. Also
>    they never cross ABI boundaries.
> 
>  * GCC generic vectors aren't specified very formally, particularly wrt.
>    their interaction with NEON intrinsics. If you stick *entirely* to
>    accessing vectors via NEON intrinsics, the problems in big-endian
>    mode (I think) don't ever materialise. This includes not using
>    indirection to load/store vectors, and (of course) not constructing
>    vectors using { x, y, z... } syntax. One possibility might be to
>    detect and *disallow* code which attempts to mix vector operations
>    like that.
> 
> I don't quite understand your comment about the GCC definitions of
> int8x16_t etc. being broken, tbh...
> 

the 128-bit vectors are loaded as a pair of D regs, with D<n> holding
the lower addressed D-word and D<n+1> holding the higher addressed
D-word; but these are treated in a Q reg as {D<n+1>:D<n>}. On a
big-endian machine that means D<n> contains the most significant lanes
of the vector and D<n+1> the least significant lanes.  For a big-endian
view we really need to see these as {D<n>:D<n+1>} (read {a:b} as
bit-wise concatenation of a and b).

One way we might address this is to redefine our 128-bit vector types as
structs of low/high Dwords.  Each Dword remains a vector (apart from
64-bit lane types), but the Dword order then matches the ABI
specification correctly.  For example, the definition of uint8x16_t becomes

	typedef struct { uint8x8_t _val[2]; } uint8x16_t;

that is we consider this to be a pair of 64-bit vectors.  Obviously
there would be similar definitions for the other vector types.  This
then gives the correct view on the world because D<n> is always _val[0]
and D<n+1> is always _val[1].

Secondly, all vector loads/stores should really be changed to use
vld1.64 (with {d<n>, d<n+1>} as the register list for 128-bit accesses)
rather than vldm; this then sorts out any issues with unaligned accesses
without changing the memory format.

> Cheers,
> 
> Julian
>
Joseph S. Myers - Oct. 14, 2011, 3:11 p.m.
On Fri, 14 Oct 2011, Julian Brown wrote:

>  * The vectorizer can use whatever layout it wants for vectors in
>    either endianness. Vectorizer vectors never interact with either
>    GCC generic (source-level) vectors, nor the NEON intrinsics. Also
>    they never cross ABI boundaries.

I don't think it makes sense to refer to the vectorizer as using a layout.  
The vectorizer transforms GIMPLE to GIMPLE, and both the input and output 
GIMPLE have target-independent semantics that may be relied upon anywhere 
that processes GIMPLE (meaning the transformations should be valid as 
target-independent transformations of GIMPLE), except insofar as built-in 
functions are used.  Of course which transformations are made depends on 
what operations can be implemented efficiently on the target processor.
Joseph S. Myers - Oct. 14, 2011, 3:21 p.m.
On Fri, 14 Oct 2011, Richard Earnshaw wrote:

> One way we might address this is to redefine our 128-bit vector types as
> structs of low/high Dwords.  Each Dword remains a vector (apart from
> 64-bit lane types), but the Dword order then matches the ABI
> specification correctly.  For example, the definition of uint8x16_t becomes
> 
> 	typedef struct { uint8x8_t _val[2]; } uint8x16_t;

Those types have different ABIs for argument passing and return, so you'd 
need some magic for special handling of the uint8x16_t type as defined in 
the header....

> Secondly, all vector loads/stores should really be changed to use
> vld1.64 (with {d<n>, d<n+1>} as the register list for 128-bit accesses)
> rather than vldm; this then sorts out any issues with unaligned accesses
> without changing the memory format.

vld1 runs into problems for big-endian of not being able to do core 
register loads / stores / transfers between core and NEON registers that 
way, and needing to convert to the other format for argument passing / 
return.
Richard Earnshaw - Oct. 14, 2011, 4:23 p.m.
On 14/10/11 16:21, Joseph S. Myers wrote:
> On Fri, 14 Oct 2011, Richard Earnshaw wrote:
> 
>> One way we might address this is to redefine our 128-bit vector types as
>> structs of low/high Dwords.  Each Dword remains a vector (apart from
>> 64-bit lane types), but the Dword order then matches the ABI
>> specification correctly.  For example, the definition of uint8x16_t becomes
>>
>> 	typedef struct { uint8x8_t _val[2]; } uint8x16_t;
> 
> Those types have different ABIs for argument passing and return, so you'd 
> need some magic for special handling of the uint8x16_t type as defined in 
> the header....
> 

Yes, it's not a simple substitution, but it more correctly describes the
data type that the architecture supports.  It might be necessary to
create a special internal type to distinguish it from user types that
are equivalent.


>> Secondly, all vector loads/stores should really be changed to use
>> vld1.64 (with {d<n>, d<n+1>} as the register list for 128-bit accesses)
>> rather than vldm; this then sorts out any issues with unaligned accesses
>> without changing the memory format.
> 
> vld1 runs into problems for big-endian of not being able to do core 
> register loads / stores / transfers between core and NEON registers that 
> way, and needing to convert to the other format for argument passing / 
> return.
> 

Note that I said vld1.64 (not vld1.<lane-size>.  That has the same
memory format as vldm, except that it can also deal with unaligned accesses.

Patch

Index: gcc/testsuite/gcc.target/arm/pr48252.c
===================================================================
--- gcc/testsuite/gcc.target/arm/pr48252.c	(revision 344214)
+++ gcc/testsuite/gcc.target/arm/pr48252.c	(working copy)
@@ -1,5 +1,6 @@ 
 /* { dg-do run } */
 /* { dg-require-effective-target arm_neon_hw } */
+/* { dg-require-effective-target arm_little_endian } */
 /* { dg-options "-O2" } */
 /* { dg-add-options arm_neon } */
 
Index: gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c
===================================================================
--- gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c	(revision 344214)
+++ gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c	(working copy)
@@ -2,6 +2,7 @@ 
 
 /* { dg-do run } */
 /* { dg-require-effective-target arm_neon_hw } */
+/* { dg-require-effective-target arm_little_endian } */
 /* { dg-options "-O0" } */
 /* { dg-add-options arm_neon } */