Message ID | 4E9620F3.5010007@mentor.com |
---|---|
State | New |
Headers | show |
On 13/10/11 00:21, Janis Johnson wrote: > Tests gcc.target/arm/pr48252.c and gcc.target/arm/neon-vset_lanes8.c > expect little-endian code and fail when compiled with -mbig-endian. > This patch skips the test if the current multilib does not generate > little-endian code. > > I'm not able to run execution tests for -mbig-endian for GCC mainline > but have tested this patch with CodeSourcery's GCC 4.6. OK for trunk? > > > gcc-20111012-003 > > > 2011-10-12 Janis Johnson <janisjo@codesourcery.com> > > * gcc.target/arm/pr48252.c: Require arm_little_endian. > * gcc.target/arm/neon-vset_lanes8.c: Likewise. > > Index: gcc/testsuite/gcc.target/arm/pr48252.c > =================================================================== > --- gcc/testsuite/gcc.target/arm/pr48252.c (revision 344214) > +++ gcc/testsuite/gcc.target/arm/pr48252.c (working copy) > @@ -1,5 +1,6 @@ > /* { dg-do run } */ > /* { dg-require-effective-target arm_neon_hw } */ > +/* { dg-require-effective-target arm_little_endian } */ > /* { dg-options "-O2" } */ > /* { dg-add-options arm_neon } */ > I can't think of any obvious reason why this should fail in big-endian. > Index: gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c > =================================================================== > --- gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c (revision 344214) > +++ gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c (working copy) > @@ -2,6 +2,7 @@ > > /* { dg-do run } */ > /* { dg-require-effective-target arm_neon_hw } */ > +/* { dg-require-effective-target arm_little_endian } */ > /* { dg-options "-O0" } */ > /* { dg-add-options arm_neon } */ > I can see why this fails at present, the test is based on the assumption that int8x8_t x = {...} puts the first element in lane 0 and subsequent elements in consecutive lanes, *and* that this is equivalent to casting char[8] into a vector. However, this isn't the case for big-endian. There's two ways this could be sorted. 1) Change the testcase to: #include "arm_neon.h" #include <stdlib.h> #include <string.h> signed char x_init[8] = { 1, 2, 3, 4, 5, 6, 7, 8 }; signed char y_init[8] = { 1, 2, 3, 16, 5, 6, 7, 8 }; int main (void) { int8x8_t x = vld1_s8(x_init); int8x8_t y = vld1_s8(y_init); x = vset_lane_s8 (16, x, 3); if (memcmp (&x, &y, sizeof (x)) != 0) abort(); return 0; } 2) Change the compiler to make initializers of vectors assign elements of initializers to consecutive lanes in a vector, rather than the current behaviour of 'casting' an array of elements to a vector. While the second would be my preferred change, I suspect it's too hard to fix, and may well cause code written for other targets to break on big-endian (altivec for example). R.
On Thu, 13 Oct 2011, Richard Earnshaw wrote: > 2) Change the compiler to make initializers of vectors assign elements > of initializers to consecutive lanes in a vector, rather than the > current behaviour of 'casting' an array of elements to a vector. > > While the second would be my preferred change, I suspect it's too hard > to fix, and may well cause code written for other targets to break on > big-endian (altivec for example). Indeed, vector initializers are part of the target-independent GNU C language and have target-independent semantics that the elements go in memory order, corresponding to the target-independent semantics of lane numbers where they appear in GENERIC, GIMPLE and (non-UNSPEC) RTL and any target-independent built-in functions that use such numbers. (The issue here being, as you saw, that the lane numbers used in ARM-specific NEON intrinsics are for big-endian not the same as those used in target-independent features of GNU C and target-independent internal representations in GCC - hence various code to translate them between the two conventions when processing intrinsics into non-UNSPEC RTL, and to translate back when generating assembly instructions that encode lane numbers with the ARM conventions, as expounded at greater length at <http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html>.)
On 13/10/11 15:56, Joseph S. Myers wrote: > On Thu, 13 Oct 2011, Richard Earnshaw wrote: > >> 2) Change the compiler to make initializers of vectors assign elements >> of initializers to consecutive lanes in a vector, rather than the >> current behaviour of 'casting' an array of elements to a vector. >> >> While the second would be my preferred change, I suspect it's too hard >> to fix, and may well cause code written for other targets to break on >> big-endian (altivec for example). > > Indeed, vector initializers are part of the target-independent GNU C > language and have target-independent semantics that the elements go in > memory order, corresponding to the target-independent semantics of lane > numbers where they appear in GENERIC, GIMPLE and (non-UNSPEC) RTL and any > target-independent built-in functions that use such numbers. (The issue > here being, as you saw, that the lane numbers used in ARM-specific NEON > intrinsics are for big-endian not the same as those used in > target-independent features of GNU C and target-independent internal > representations in GCC - hence various code to translate them between the > two conventions when processing intrinsics into non-UNSPEC RTL, and to > translate back when generating assembly instructions that encode lane > numbers with the ARM conventions, as expounded at greater length at > <http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html>.) > This is all rather horrible, and leads to THREE different layouts for a 128-bit vector for big-endian Neon. GCC format 'VLD1.n' format 'ABI' format GCC format and 'ABI' format differ in that the 64-bit words of the 128-bit vector are swapped. All this and they are all expected to share a single machine mode. Furthermore, the definitions in GCC are broken, in that the types defined in arm_neon.h (eg int8x16_t) are supposed to be ABI format, not GCC format. Eukkkkkk! :-( R.
On Thu, 13 Oct 2011 16:12:17 +0100 Richard Earnshaw <rearnsha@arm.com> wrote: > On 13/10/11 15:56, Joseph S. Myers wrote: > > Indeed, vector initializers are part of the target-independent GNU > > C language and have target-independent semantics that the elements > > go in memory order, corresponding to the target-independent > > semantics of lane numbers where they appear in GENERIC, GIMPLE and > > (non-UNSPEC) RTL and any target-independent built-in functions that > > use such numbers. (The issue here being, as you saw, that the lane > > numbers used in ARM-specific NEON intrinsics are for big-endian not > > the same as those used in target-independent features of GNU C and > > target-independent internal representations in GCC - hence various > > code to translate them between the two conventions when processing > > intrinsics into non-UNSPEC RTL, and to translate back when > > generating assembly instructions that encode lane numbers with the > > ARM conventions, as expounded at greater length at > > <http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html>.) > > > > This is all rather horrible, and leads to THREE different layouts for > a 128-bit vector for big-endian Neon. > > GCC format > 'VLD1.n' format > 'ABI' format > > GCC format and 'ABI' format differ in that the 64-bit words of the > 128-bit vector are swapped. > > All this and they are all expected to share a single machine mode. > > Furthermore, the definitions in GCC are broken, in that the types > defined in arm_neon.h (eg int8x16_t) are supposed to be ABI format, > not GCC format. > > Eukkkkkk! :-( FWIW, I thought long and hard about this problem, and eventually gave up trying to solve it. Note that many operations which depend on the ordering of vectors are now disabled entirely (at least for Q regs) in neon.md in big-endian mode to try and limit the damage. NEON is basically only supported properly in little-endian mode, IMO. I'd love to see this resolved properly. Some random observations: * The vectorizer can use whatever layout it wants for vectors in either endianness. Vectorizer vectors never interact with either GCC generic (source-level) vectors, nor the NEON intrinsics. Also they never cross ABI boundaries. * GCC generic vectors aren't specified very formally, particularly wrt. their interaction with NEON intrinsics. If you stick *entirely* to accessing vectors via NEON intrinsics, the problems in big-endian mode (I think) don't ever materialise. This includes not using indirection to load/store vectors, and (of course) not constructing vectors using { x, y, z... } syntax. One possibility might be to detect and *disallow* code which attempts to mix vector operations like that. I don't quite understand your comment about the GCC definitions of int8x16_t etc. being broken, tbh... Cheers, Julian
On 14/10/11 11:42, Julian Brown wrote: > On Thu, 13 Oct 2011 16:12:17 +0100 > Richard Earnshaw <rearnsha@arm.com> wrote: > >> On 13/10/11 15:56, Joseph S. Myers wrote: >>> Indeed, vector initializers are part of the target-independent GNU >>> C language and have target-independent semantics that the elements >>> go in memory order, corresponding to the target-independent >>> semantics of lane numbers where they appear in GENERIC, GIMPLE and >>> (non-UNSPEC) RTL and any target-independent built-in functions that >>> use such numbers. (The issue here being, as you saw, that the lane >>> numbers used in ARM-specific NEON intrinsics are for big-endian not >>> the same as those used in target-independent features of GNU C and >>> target-independent internal representations in GCC - hence various >>> code to translate them between the two conventions when processing >>> intrinsics into non-UNSPEC RTL, and to translate back when >>> generating assembly instructions that encode lane numbers with the >>> ARM conventions, as expounded at greater length at >>> <http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html>.) >>> >> >> This is all rather horrible, and leads to THREE different layouts for >> a 128-bit vector for big-endian Neon. >> >> GCC format >> 'VLD1.n' format >> 'ABI' format >> >> GCC format and 'ABI' format differ in that the 64-bit words of the >> 128-bit vector are swapped. >> >> All this and they are all expected to share a single machine mode. >> >> Furthermore, the definitions in GCC are broken, in that the types >> defined in arm_neon.h (eg int8x16_t) are supposed to be ABI format, >> not GCC format. >> >> Eukkkkkk! :-( > > FWIW, I thought long and hard about this problem, and eventually gave > up trying to solve it. Note that many operations which depend on the > ordering of vectors are now disabled entirely (at least for Q regs) in > neon.md in big-endian mode to try and limit the damage. NEON is > basically only supported properly in little-endian mode, IMO. > > I'd love to see this resolved properly. Some random observations: > > * The vectorizer can use whatever layout it wants for vectors in > either endianness. Vectorizer vectors never interact with either > GCC generic (source-level) vectors, nor the NEON intrinsics. Also > they never cross ABI boundaries. > > * GCC generic vectors aren't specified very formally, particularly wrt. > their interaction with NEON intrinsics. If you stick *entirely* to > accessing vectors via NEON intrinsics, the problems in big-endian > mode (I think) don't ever materialise. This includes not using > indirection to load/store vectors, and (of course) not constructing > vectors using { x, y, z... } syntax. One possibility might be to > detect and *disallow* code which attempts to mix vector operations > like that. > > I don't quite understand your comment about the GCC definitions of > int8x16_t etc. being broken, tbh... > the 128-bit vectors are loaded as a pair of D regs, with D<n> holding the lower addressed D-word and D<n+1> holding the higher addressed D-word; but these are treated in a Q reg as {D<n+1>:D<n>}. On a big-endian machine that means D<n> contains the most significant lanes of the vector and D<n+1> the least significant lanes. For a big-endian view we really need to see these as {D<n>:D<n+1>} (read {a:b} as bit-wise concatenation of a and b). One way we might address this is to redefine our 128-bit vector types as structs of low/high Dwords. Each Dword remains a vector (apart from 64-bit lane types), but the Dword order then matches the ABI specification correctly. For example, the definition of uint8x16_t becomes typedef struct { uint8x8_t _val[2]; } uint8x16_t; that is we consider this to be a pair of 64-bit vectors. Obviously there would be similar definitions for the other vector types. This then gives the correct view on the world because D<n> is always _val[0] and D<n+1> is always _val[1]. Secondly, all vector loads/stores should really be changed to use vld1.64 (with {d<n>, d<n+1>} as the register list for 128-bit accesses) rather than vldm; this then sorts out any issues with unaligned accesses without changing the memory format. > Cheers, > > Julian >
On Fri, 14 Oct 2011, Julian Brown wrote: > * The vectorizer can use whatever layout it wants for vectors in > either endianness. Vectorizer vectors never interact with either > GCC generic (source-level) vectors, nor the NEON intrinsics. Also > they never cross ABI boundaries. I don't think it makes sense to refer to the vectorizer as using a layout. The vectorizer transforms GIMPLE to GIMPLE, and both the input and output GIMPLE have target-independent semantics that may be relied upon anywhere that processes GIMPLE (meaning the transformations should be valid as target-independent transformations of GIMPLE), except insofar as built-in functions are used. Of course which transformations are made depends on what operations can be implemented efficiently on the target processor.
On Fri, 14 Oct 2011, Richard Earnshaw wrote: > One way we might address this is to redefine our 128-bit vector types as > structs of low/high Dwords. Each Dword remains a vector (apart from > 64-bit lane types), but the Dword order then matches the ABI > specification correctly. For example, the definition of uint8x16_t becomes > > typedef struct { uint8x8_t _val[2]; } uint8x16_t; Those types have different ABIs for argument passing and return, so you'd need some magic for special handling of the uint8x16_t type as defined in the header.... > Secondly, all vector loads/stores should really be changed to use > vld1.64 (with {d<n>, d<n+1>} as the register list for 128-bit accesses) > rather than vldm; this then sorts out any issues with unaligned accesses > without changing the memory format. vld1 runs into problems for big-endian of not being able to do core register loads / stores / transfers between core and NEON registers that way, and needing to convert to the other format for argument passing / return.
On 14/10/11 16:21, Joseph S. Myers wrote: > On Fri, 14 Oct 2011, Richard Earnshaw wrote: > >> One way we might address this is to redefine our 128-bit vector types as >> structs of low/high Dwords. Each Dword remains a vector (apart from >> 64-bit lane types), but the Dword order then matches the ABI >> specification correctly. For example, the definition of uint8x16_t becomes >> >> typedef struct { uint8x8_t _val[2]; } uint8x16_t; > > Those types have different ABIs for argument passing and return, so you'd > need some magic for special handling of the uint8x16_t type as defined in > the header.... > Yes, it's not a simple substitution, but it more correctly describes the data type that the architecture supports. It might be necessary to create a special internal type to distinguish it from user types that are equivalent. >> Secondly, all vector loads/stores should really be changed to use >> vld1.64 (with {d<n>, d<n+1>} as the register list for 128-bit accesses) >> rather than vldm; this then sorts out any issues with unaligned accesses >> without changing the memory format. > > vld1 runs into problems for big-endian of not being able to do core > register loads / stores / transfers between core and NEON registers that > way, and needing to convert to the other format for argument passing / > return. > Note that I said vld1.64 (not vld1.<lane-size>. That has the same memory format as vldm, except that it can also deal with unaligned accesses.
Index: gcc/testsuite/gcc.target/arm/pr48252.c =================================================================== --- gcc/testsuite/gcc.target/arm/pr48252.c (revision 344214) +++ gcc/testsuite/gcc.target/arm/pr48252.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-require-effective-target arm_neon_hw } */ +/* { dg-require-effective-target arm_little_endian } */ /* { dg-options "-O2" } */ /* { dg-add-options arm_neon } */ Index: gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c =================================================================== --- gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c (revision 344214) +++ gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c (working copy) @@ -2,6 +2,7 @@ /* { dg-do run } */ /* { dg-require-effective-target arm_neon_hw } */ +/* { dg-require-effective-target arm_little_endian } */ /* { dg-options "-O0" } */ /* { dg-add-options arm_neon } */