From patchwork Wed Feb 27 17:29:47 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Brown X-Patchwork-Id: 223660 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id D8D1F2C0090 for ; Thu, 28 Feb 2013 04:30:37 +1100 (EST) Comment: DKIM? See http://www.dkim.org DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=gcc.gnu.org; s=default; x=1362591038; h=Comment: DomainKey-Signature:Received:Received:Received:Received:Received: Received:Date:From:To:CC:Subject:Message-ID:MIME-Version: Content-Type:Mailing-List:Precedence:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:Sender:Delivered-To; bh=/08WvXv mTkMIn58tOdmLNRupyss=; b=Pf7tu5LrkuguUeBqcF5TbNnTq7fOIe4XN5JBJ9v /h36rRVBv7hJpB4LXghUyNkOOUka2R7gsFkBFskwHLtPTym1ASNHJq8Rm/ksZK+E kijWol9etSee2P44NRwKN5SF0l3INcvhTb/EyGTERMvclMFtokLPdMAWBT2pWIrx Ql0g= Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=gcc.gnu.org; h=Received:Received:X-SWARE-Spam-Status:X-Spam-Check-By:Received:Received:Received:Received:Date:From:To:CC:Subject:Message-ID:MIME-Version:Content-Type:X-IsSubscribed:Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:Delivered-To; b=efAkrcG2hamWY2hG/pKaCfOOOi1mVA4e5GpoBDHoH5gUsZAby/3GD8lS5dvLCB F25OvV94T4BxlvtEeeaEFxv9Wb+F1y4qKMUT7yno4usl0C1Qu0d4Zarn+paVJxP2 HnFqaeaeAVEcEzVft+DQxOil/xY/R9j+LZv8jzIwu7Ip4=; Received: (qmail 19086 invoked by alias); 27 Feb 2013 17:30:18 -0000 Received: (qmail 19076 invoked by uid 22791); 27 Feb 2013 17:30:15 -0000 X-SWARE-Spam-Status: No, hits=-3.4 required=5.0 tests=AWL, BAYES_00, KHOP_RCVD_UNTRUST, RCVD_IN_HOSTKARMA_W, RCVD_IN_HOSTKARMA_WL, TW_TM X-Spam-Check-By: sourceware.org Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 27 Feb 2013 17:30:01 +0000 Received: from svr-orw-exc-10.mgc.mentorg.com ([147.34.98.58]) by relay1.mentorg.com with esmtp id 1UAkpL-0001GA-Bz from Julian_Brown@mentor.com ; Wed, 27 Feb 2013 09:29:59 -0800 Received: from SVR-IES-FEM-01.mgc.mentorg.com ([137.202.0.104]) by SVR-ORW-EXC-10.mgc.mentorg.com with Microsoft SMTPSVC(6.0.3790.4675); Wed, 27 Feb 2013 09:29:59 -0800 Received: from octopus (137.202.0.76) by SVR-IES-FEM-01.mgc.mentorg.com (137.202.0.104) with Microsoft SMTP Server id 14.1.289.1; Wed, 27 Feb 2013 17:29:56 +0000 Date: Wed, 27 Feb 2013 17:29:47 +0000 From: Julian Brown To: CC: Ramana Radhakrishnan , Richard Earnshaw Subject: [PATCH, ARM, RFC] Fix vect.exp failures for NEON in big-endian mode Message-ID: <20130227172947.31fa279c@octopus> MIME-Version: 1.0 X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Hi, Several new (ish?) autovectorizer features have apparently caused NEON support for same to regress quite heavily in big-endian mode. This patch is an attempt to fix things up, but is not without problems -- maybe someone will have a suggestion as to how we should proceed. The problem (as ever) is that the ARM backend must lie to the middle-end about the layout of NEON vectors in big-endian mode (due to ABI requirements, VFP compatibility, and the middle-end semantics of vector indices being equivalent to those of an array with the same type of elements when stored in memory). A few years ago when the vectorizer was relatively less sophisticated, the ordering of vector elements could be ignored to some extent by disabling certain instruction patterns used by the vectorizer in big-endian mode which were sensitive to the ordering of elements: in fact this is still the strategy we're using, but it is clearly becoming less and less tenable as time progresses. Quad-word registers (being composed of two double-word registers, loaded/stored the "wrong way round" in big-endian mode) arguably cause more problems than double-word registers. So, the idea behind the attached patch was supposed to be to limit the autovectorizer to using double-word registers only, and to disable a few additional (or newly-used by the vectorizer) patterns in big-endian mode. That, plus several testsuite tweaks, gets us down to zero failures for vect.exp, which is good. The problem is that at the same time quite a large set of neon.exp tests regress (vzip/vuzp/vtrn): one of the new patterns which is disabled because it causes trouble (i.e. execution failures) for the vectorizer is vec_perm_const. However __builtin_shuffle (which uses that pattern) is used for arm_neon.h now -- so disabling it means that the proper instructions aren't generated for intrinsics any more in big-endian mode. I think we have a problem here. The vectorizer also tries to use __builtin_shuffle (for scatter/gather operations, when lane loading/storing ops aren't available), but does not understand the "special tweaks" that arm_evpc_neon_{vuzp,vzip,vtrn} does to try to hide the true element ordering of vectors from the middle-end. So, I'm left wondering: * Given our funky element ordering in BE mode, are the __builtin_shuffle lists in arm_neon.h actually an accurate representation of what the given intrinsic should do? (The fallback code might or might not do the same thing, I'm not sure.) * The vectorizer tries to use VEC_PERM_EXPR (equivalent to __builtin_shuffle) with e.g. pairs of doubleword registers loaded from adjacent memory locations. Are the semantics required for this (again, with our funky element ordering) even the same as those required for the intrinsics? Including quad-word registers for the latter? (My suspicion is "no", in which case there's a fundamental incompatibility here that needs to be resolved somehow.) Anyway: the tl;dr is "fixing NEON vect tests breaks intrinsics". Any ideas for what to do about that? (FAOD, I don't think I'm in a position to do the kind of middle-end surgery required to fix the problem "properly" at this point :-p). (It's arguably more important for the vectorizer to not generate bad code than it is for intrinsics to work properly, in which case: OK to apply? Tested cross to ARM EABI with configury modifications to build LE/BE multilibs.) Thanks, Julian ChangeLog gcc/ * config/arm/arm.c (arm_array_mode_supported_p): No array modes for big-endian NEON. (arm_preferred_simd_mode): Always prefer 64-bit modes for big-endian NEON. (arm_autovectorize_vector_sizes): Use 8-byte vectors only for NEON. (arm_vectorize_vec_perm_const_ok): No permutations are OK in big-endian mode. * config/arm/neon.md (vec_load_lanes): Disable in big-endian mode. (vec_store_lanes, vec_load_lanesti) (vec_load_lanesoi, vec_store_lanesti) (vec_store_lanesoi, vec_load_lanesei) (vec_load_lanesci, vec_store_lanesei) (vec_store_lanesci, vec_load_lanesxi) (vec_store_lanesxi): Likewise. (vec_widen_shiftl_lo_, vec_widen_shiftl_hi_) (vec_widen_mult_hi_, vec_widen_mult_lo_): Likewise. gcc/testsuite/ * gcc.dg/vect/slp-cond-3.c: XFAIL for !vect_unpack. * gcc.dg/vect/slp-cond-4.c: Likewise. * gcc.dg/vect/vect-1.c: Likewise. * gcc.dg/vect/vect-1-big-array.c: Likewise. * gcc.dg/vect/vect-35.c: Likewise. * gcc.dg/vect/vect-35-big-array.c: Likewise. * gcc.dg/vect/bb-slp-11.c: Likewise. * gcc.dg/vect/bb-slp-26.c: Likewise. * gcc.dg/vect/vect-over-widen-3-big-array.c: XFAIL for !vect_element_align. * gcc.dg/vect/vect-over-widen-1.c: Likewise. * gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise. * gcc.dg/vect/vect-over-widen-2.c: Likewise. * gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise. * gcc.dg/vect/vect-over-widen-3.c: Likewise. * gcc.dg/vect/vect-over-widen-4.c: Likewise. * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise. * gcc.dg/vect/pr43430-2.c: Likewise. * gcc.dg/vect/vect-widen-shift-u16.c: XFAIL for !vect_widen_shift && !vect_unpack. * gcc.dg/vect/vect-widen-shift-s8.c: Likewise. * gcc.dg/vect/vect-widen-shift-u8.c: Likewise. * gcc.dg/vect/vect-widen-shift-s16.c: Likewise. * gcc.dg/vect/vect-93.c: Only run if !vect_intfloat_cvt. * gcc.dg/vect/vect-intfloat-conversion-4a.c: Only run if vect_unpack. * gcc.dg/vect/vect-intfloat-conversion-4b.c: Likewise. * lib/target-supports.exp (check_effective_target_vect_perm): Only enable for NEON little-endian. (check_effective_target_vect_widen_sum_qi_to_hi): Likewise. (check_effective_target_vect_widen_mult_qi_to_hi): Likewise. (check_effective_target_vect_widen_mult_hi_to_si): Likewise. (check_effective_target_vect_widen_shift): Likewise. (check_effective_target_vect_extract_even_odd): Likewise. (check_effective_target_vect_interleave): Likewise. (check_effective_target_vect_stridedN): Likewise. (check_effective_target_vect_multiple_sizes): Likewise. (check_effective_target_vect64): Enable for any NEON. Index: gcc/testsuite/gcc.dg/vect/slp-cond-3.c =================================================================== --- gcc/testsuite/gcc.dg/vect/slp-cond-3.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/slp-cond-3.c (working copy) @@ -79,6 +79,6 @@ int main () return 0; } -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { ! vect_unpack } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-1.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-1.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-1.c (working copy) @@ -86,5 +86,5 @@ foo (int n) } /* { dg-final { scan-tree-dump-times "vectorized 6 loops" 1 "vect" { target vect_strided2 } } } */ -/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail vect_strided2 } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail { vect_strided2 || { ! vect_unpack } } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/slp-cond-4.c =================================================================== --- gcc/testsuite/gcc.dg/vect/slp-cond-4.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/slp-cond-4.c (working copy) @@ -82,5 +82,5 @@ int main () return 0; } -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { ! vect_unpack } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-1-big-array.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-1-big-array.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-1-big-array.c (working copy) @@ -86,5 +86,5 @@ foo (int n) } /* { dg-final { scan-tree-dump-times "vectorized 6 loops" 1 "vect" { target vect_strided2 } } } */ -/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail vect_strided2 } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail { vect_strided2 || { ! vect_unpack } } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-35.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-35.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-35.c (working copy) @@ -45,6 +45,6 @@ int main (void) } -/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { xfail { ia64-*-* sparc*-*-* } } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { xfail { { ia64-*-* sparc*-*-* } || { ! vect_unpack } } } } } */ /* { dg-final { scan-tree-dump "can't determine dependence between" "vect" } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c (working copy) @@ -59,6 +59,6 @@ int main (void) } /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" } } */ -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c (working copy) @@ -53,6 +53,6 @@ int main (void) } /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 1 "vect" { target vect_widen_shift } } } */ -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_widen_shift } && { ! vect_unpack } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/bb-slp-26.c =================================================================== --- gcc/testsuite/gcc.dg/vect/bb-slp-26.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/bb-slp-26.c (working copy) @@ -55,6 +55,6 @@ int main (void) return 0; } -/* { dg-final { scan-tree-dump-times "basic block vectorized using SLP" 1 "slp" { target vect64 } } } */ +/* { dg-final { scan-tree-dump-times "basic block vectorized using SLP" 1 "slp" { target vect64 xfail { ! vect_unpack } } } } */ /* { dg-final { cleanup-tree-dump "slp" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c (working copy) @@ -62,6 +62,6 @@ int main (void) /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */ /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */ /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */ -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-35-big-array.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-35-big-array.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-35-big-array.c (working copy) @@ -45,6 +45,6 @@ int main (void) } -/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { xfail { ia64-*-* sparc*-*-* } } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { xfail { { ia64-*-* sparc*-*-* } || { ! vect_unpack } } } } } */ /* { dg-final { scan-tree-dump-times "can't determine dependence between" 1 "vect" } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c (working copy) @@ -60,6 +60,6 @@ int main (void) /* Final value stays in int, so no over-widening is detected at the moment. */ /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */ -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/pr43430-2.c =================================================================== --- gcc/testsuite/gcc.dg/vect/pr43430-2.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/pr43430-2.c (working copy) @@ -12,5 +12,5 @@ vsad16_c (void *c, uint8_t * s1, uint8_t return score; } -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_condition } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_condition && vect_element_align } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c (working copy) @@ -53,6 +53,6 @@ int main (void) } /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 1 "vect" { target vect_widen_shift } } } */ -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_widen_shift } && { ! vect_unpack } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c (working copy) @@ -61,6 +61,6 @@ int main (void) /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */ /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */ /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */ -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c (working copy) @@ -59,6 +59,6 @@ int main (void) } /* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */ -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c (working copy) @@ -66,6 +66,6 @@ int main (void) /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */ /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */ /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */ -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c (working copy) @@ -65,6 +65,6 @@ int main (void) /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */ /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */ /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */ -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-93.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-93.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-93.c (working copy) @@ -79,7 +79,7 @@ int main (void) /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_no_align } } } */ /* in main: */ -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target vect_no_align } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { vect_no_align && { ! vect_intfloat_cvt } } } } } */ /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect" { xfail { vect_no_align } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c (working copy) @@ -60,5 +60,5 @@ int main (void) } /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */ -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_widen_shift } && { ! vect_unpack } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c (working copy) @@ -35,5 +35,5 @@ int main (void) return main1 (); } -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_intfloat_cvt } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_intfloat_cvt && vect_unpack } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c (working copy) @@ -35,5 +35,5 @@ int main (void) return main1 (); } -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_intfloat_cvt } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_intfloat_cvt && vect_unpack } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c (working copy) @@ -60,6 +60,6 @@ int main (void) /* Final value stays in int, so no over-widening is detected at the moment. */ /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */ -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/bb-slp-11.c =================================================================== --- gcc/testsuite/gcc.dg/vect/bb-slp-11.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/bb-slp-11.c (working copy) @@ -48,6 +48,6 @@ int main (void) return 0; } -/* { dg-final { scan-tree-dump-times "basic block vectorized using SLP" 1 "slp" { target vect64 } } } */ +/* { dg-final { scan-tree-dump-times "basic block vectorized using SLP" 1 "slp" { target vect64 xfail { ! vect_unpack } } } } */ /* { dg-final { cleanup-tree-dump "slp" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c (revision 196170) +++ gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c (working copy) @@ -102,6 +102,6 @@ int main (void) } /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 8 "vect" { target vect_widen_shift } } } */ -/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { xfail { ! vect_widen_shift } && { ! vect_unpack } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/lib/target-supports.exp =================================================================== --- gcc/testsuite/lib/target-supports.exp (revision 196170) +++ gcc/testsuite/lib/target-supports.exp (working copy) @@ -3089,7 +3089,8 @@ proc check_effective_target_vect_perm { verbose "check_effective_target_vect_perm: using cached result" 2 } else { set et_vect_perm_saved 0 - if { [is-effective-target arm_neon_ok] + if { ([is-effective-target arm_neon_ok] + && [is-effective-target arm_little_endian]) || [istarget aarch64*-*-*] || [istarget powerpc*-*-*] || [istarget spu-*-*] @@ -3211,7 +3212,8 @@ proc check_effective_target_vect_widen_s } else { set et_vect_widen_sum_qi_to_hi_saved 0 if { [check_effective_target_vect_unpack] - || [check_effective_target_arm_neon_ok] + || ([check_effective_target_arm_neon_ok] + && [check_effective_target_arm_little_endian]) || [istarget ia64-*-*] } { set et_vect_widen_sum_qi_to_hi_saved 1 } @@ -3263,7 +3265,8 @@ proc check_effective_target_vect_widen_m } if { [istarget powerpc*-*-*] || [istarget aarch64*-*-*] - || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]) } { + || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok] + && [check_effective_target_arm_little_endian]) } { set et_vect_widen_mult_qi_to_hi_saved 1 } } @@ -3298,7 +3301,8 @@ proc check_effective_target_vect_widen_m || [istarget aarch64*-*-*] || [istarget i?86-*-*] || [istarget x86_64-*-*] - || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]) } { + || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok] + && [check_effective_target_arm_little_endian]) } { set et_vect_widen_mult_hi_to_si_saved 1 } } @@ -3368,7 +3372,8 @@ proc check_effective_target_vect_widen_s verbose "check_effective_target_vect_widen_shift: using cached result" 2 } else { set et_vect_widen_shift_saved 0 - if { ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]) } { + if { ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok] + && [check_effective_target_arm_little_endian]) } { set et_vect_widen_shift_saved 1 } } @@ -3859,7 +3864,8 @@ proc check_effective_target_vect_extract set et_vect_extract_even_odd_saved 0 if { [istarget aarch64*-*-*] || [istarget powerpc*-*-*] - || [is-effective-target arm_neon_ok] + || ([is-effective-target arm_neon_ok] + && [is-effective-target arm_little_endian]) || [istarget i?86-*-*] || [istarget x86_64-*-*] || [istarget ia64-*-*] @@ -3885,7 +3891,8 @@ proc check_effective_target_vect_interle set et_vect_interleave_saved 0 if { [istarget aarch64*-*-*] || [istarget powerpc*-*-*] - || [is-effective-target arm_neon_ok] + || ([is-effective-target arm_neon_ok] + && [is-effective-target arm_little_endian]) || [istarget i?86-*-*] || [istarget x86_64-*-*] || [istarget ia64-*-*] @@ -3915,7 +3922,8 @@ foreach N {2 3 4 8} { && [check_effective_target_vect_extract_even_odd] } { set et_vect_stridedN_saved 1 } - if { ([istarget arm*-*-*] + if { (([istarget arm*-*-*] && [is-effective-target arm_neon_ok] + && [is-effective-target arm_little_endian]) || [istarget aarch64*-*-*]) && N >= 2 && N <= 4 } { set et_vect_stridedN_saved 1 } @@ -3934,7 +3942,8 @@ proc check_effective_target_vect_multipl set et_vect_multiple_sizes_saved 0 if { ([istarget aarch64*-*-*] - || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok])) } { + || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok] + && [check_effective_target_arm_little_endian])) } { set et_vect_multiple_sizes_saved 1 } if { ([istarget x86_64-*-*] || [istarget i?86-*-*]) } { @@ -3957,8 +3966,7 @@ proc check_effective_target_vect64 { } { } else { set et_vect64_saved 0 if { ([istarget arm*-*-*] - && [check_effective_target_arm_neon_ok] - && [check_effective_target_arm_little_endian]) } { + && [check_effective_target_arm_neon_ok]) } { set et_vect64_saved 1 } } Index: gcc/config/arm/arm.c =================================================================== --- gcc/config/arm/arm.c (revision 196170) +++ gcc/config/arm/arm.c (working copy) @@ -25041,7 +25041,7 @@ static bool arm_array_mode_supported_p (enum machine_mode mode, unsigned HOST_WIDE_INT nelems) { - if (TARGET_NEON + if (TARGET_NEON && !BYTES_BIG_ENDIAN && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode)) && (nelems >= 2 && nelems <= 4)) return true; @@ -25057,23 +25057,27 @@ static enum machine_mode arm_preferred_simd_mode (enum machine_mode mode) { if (TARGET_NEON) - switch (mode) - { - case SFmode: - return TARGET_NEON_VECTORIZE_DOUBLE ? V2SFmode : V4SFmode; - case SImode: - return TARGET_NEON_VECTORIZE_DOUBLE ? V2SImode : V4SImode; - case HImode: - return TARGET_NEON_VECTORIZE_DOUBLE ? V4HImode : V8HImode; - case QImode: - return TARGET_NEON_VECTORIZE_DOUBLE ? V8QImode : V16QImode; - case DImode: - if (!TARGET_NEON_VECTORIZE_DOUBLE) - return V2DImode; - break; + { + bool double_only = BYTES_BIG_ENDIAN || TARGET_NEON_VECTORIZE_DOUBLE; - default:; - } + switch (mode) + { + case SFmode: + return double_only ? V2SFmode : V4SFmode; + case SImode: + return double_only ? V2SImode : V4SImode; + case HImode: + return double_only ? V4HImode : V8HImode; + case QImode: + return double_only ? V8QImode : V16QImode; + case DImode: + if (!double_only) + return V2DImode; + break; + + default:; + } + } if (TARGET_REALLY_IWMMXT) switch (mode) @@ -25974,6 +25978,11 @@ arm_vector_alignment (const_tree type) static unsigned int arm_autovectorize_vector_sizes (void) { + /* Use of quad-word registers for autovectorization for NEON is fraught with + difficulties. Just don't do that. */ + if (TARGET_NEON && BYTES_BIG_ENDIAN) + return 8; + return TARGET_NEON_VECTORIZE_DOUBLE ? 0 : (16 | 8); } @@ -27008,6 +27017,12 @@ arm_vectorize_vec_perm_const_ok (enum ma unsigned int i, nelt, which; bool ret; + /* FIXME: There appear to be element-numbering problems with vector + permutations in big-endian mode that cause the vectorizer to produce bad + code. Disable for now. */ + if (BYTES_BIG_ENDIAN) + return false; + d.vmode = vmode; d.nelt = nelt = GET_MODE_NUNITS (d.vmode); d.testing_p = true; Index: gcc/config/arm/neon.md =================================================================== --- gcc/config/arm/neon.md (revision 196170) +++ gcc/config/arm/neon.md (working copy) @@ -4506,7 +4506,7 @@ [(set (match_operand:VDQX 0 "s_register_operand") (unspec:VDQX [(match_operand:VDQX 1 "neon_struct_operand")] UNSPEC_VLD1))] - "TARGET_NEON") + "TARGET_NEON && !BYTES_BIG_ENDIAN") (define_insn "neon_vld1" [(set (match_operand:VDQX 0 "s_register_operand" "=w") @@ -4618,7 +4618,7 @@ [(set (match_operand:VDQX 0 "neon_struct_operand") (unspec:VDQX [(match_operand:VDQX 1 "s_register_operand")] UNSPEC_VST1))] - "TARGET_NEON") + "TARGET_NEON && !BYTES_BIG_ENDIAN") (define_insn "neon_vst1" [(set (match_operand:VDQX 0 "neon_struct_operand" "=Um") @@ -4683,7 +4683,7 @@ (unspec:TI [(match_operand:TI 1 "neon_struct_operand") (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD2))] - "TARGET_NEON") + "TARGET_NEON && !BYTES_BIG_ENDIAN") (define_insn "neon_vld2" [(set (match_operand:TI 0 "s_register_operand" "=w") @@ -4708,7 +4708,7 @@ (unspec:OI [(match_operand:OI 1 "neon_struct_operand") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD2))] - "TARGET_NEON") + "TARGET_NEON && !BYTES_BIG_ENDIAN") (define_insn "neon_vld2" [(set (match_operand:OI 0 "s_register_operand" "=w") @@ -4797,7 +4797,7 @@ (unspec:TI [(match_operand:TI 1 "s_register_operand") (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VST2))] - "TARGET_NEON") + "TARGET_NEON && !BYTES_BIG_ENDIAN") (define_insn "neon_vst2" [(set (match_operand:TI 0 "neon_struct_operand" "=Um") @@ -4822,7 +4822,7 @@ (unspec:OI [(match_operand:OI 1 "s_register_operand") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VST2))] - "TARGET_NEON") + "TARGET_NEON && !BYTES_BIG_ENDIAN") (define_insn "neon_vst2" [(set (match_operand:OI 0 "neon_struct_operand" "=Um") @@ -4894,7 +4894,7 @@ (unspec:EI [(match_operand:EI 1 "neon_struct_operand") (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD3))] - "TARGET_NEON") + "TARGET_NEON && !BYTES_BIG_ENDIAN") (define_insn "neon_vld3" [(set (match_operand:EI 0 "s_register_operand" "=w") @@ -4918,7 +4918,7 @@ [(match_operand:CI 0 "s_register_operand") (match_operand:CI 1 "neon_struct_operand") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { emit_insn (gen_neon_vld3 (operands[0], operands[1])); DONE; @@ -5068,7 +5068,7 @@ (unspec:EI [(match_operand:EI 1 "s_register_operand") (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VST3))] - "TARGET_NEON") + "TARGET_NEON && !BYTES_BIG_ENDIAN") (define_insn "neon_vst3" [(set (match_operand:EI 0 "neon_struct_operand" "=Um") @@ -5091,7 +5091,7 @@ [(match_operand:CI 0 "neon_struct_operand") (match_operand:CI 1 "s_register_operand") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { emit_insn (gen_neon_vst3 (operands[0], operands[1])); DONE; @@ -5213,7 +5213,7 @@ (unspec:OI [(match_operand:OI 1 "neon_struct_operand") (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD4))] - "TARGET_NEON") + "TARGET_NEON && !BYTES_BIG_ENDIAN") (define_insn "neon_vld4" [(set (match_operand:OI 0 "s_register_operand" "=w") @@ -5237,7 +5237,7 @@ [(match_operand:XI 0 "s_register_operand") (match_operand:XI 1 "neon_struct_operand") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { emit_insn (gen_neon_vld4 (operands[0], operands[1])); DONE; @@ -5394,7 +5394,7 @@ (unspec:OI [(match_operand:OI 1 "s_register_operand") (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VST4))] - "TARGET_NEON") + "TARGET_NEON && !BYTES_BIG_ENDIAN") (define_insn "neon_vst4" [(set (match_operand:OI 0 "neon_struct_operand" "=Um") @@ -5418,7 +5418,7 @@ [(match_operand:XI 0 "neon_struct_operand") (match_operand:XI 1 "s_register_operand") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { emit_insn (gen_neon_vst4 (operands[0], operands[1])); DONE; @@ -5725,7 +5725,7 @@ [(set (match_operand: 0 "register_operand" "=w") (SE: (ashift:VW (match_operand:VW 1 "register_operand" "w") (match_operand: 2 "const_neon_scalar_shift_amount_operand" ""))))] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { return "vshll. %q0, %P1, %2"; } @@ -5771,7 +5771,7 @@ (define_expand "vec_unpack_lo_" [(match_operand: 0 "register_operand" "") (SE:(match_operand:VDI 1 "register_operand"))] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { rtx tmpreg = gen_reg_rtx (mode); emit_insn (gen_neon_unpack_ (tmpreg, operands[1])); @@ -5784,7 +5784,7 @@ (define_expand "vec_unpack_hi_" [(match_operand: 0 "register_operand" "") (SE:(match_operand:VDI 1 "register_operand"))] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { rtx tmpreg = gen_reg_rtx (mode); emit_insn (gen_neon_unpack_ (tmpreg, operands[1])); @@ -5800,7 +5800,7 @@ (match_operand:VDI 1 "register_operand" "w")) (SE: (match_operand:VDI 2 "register_operand" "w"))))] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" "vmull. %q0, %P1, %P2" [(set_attr "neon_type" "neon_shift_1")] ) @@ -5809,7 +5809,7 @@ [(match_operand: 0 "register_operand" "") (SE: (match_operand:VDI 1 "register_operand" "")) (SE: (match_operand:VDI 2 "register_operand" ""))] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { rtx tmpreg = gen_reg_rtx (mode); emit_insn (gen_neon_vec_mult_ (tmpreg, operands[1], operands[2])); @@ -5824,7 +5824,7 @@ [(match_operand: 0 "register_operand" "") (SE: (match_operand:VDI 1 "register_operand" "")) (SE: (match_operand:VDI 2 "register_operand" ""))] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { rtx tmpreg = gen_reg_rtx (mode); emit_insn (gen_neon_vec_mult_ (tmpreg, operands[1], operands[2])); @@ -5839,7 +5839,7 @@ [(match_operand: 0 "register_operand" "") (SE: (match_operand:VDI 1 "register_operand" "")) (match_operand:SI 2 "immediate_operand" "i")] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { rtx tmpreg = gen_reg_rtx (mode); emit_insn (gen_neon_vec_shiftl_ (tmpreg, operands[1], operands[2])); @@ -5853,7 +5853,7 @@ [(match_operand: 0 "register_operand" "") (SE: (match_operand:VDI 1 "register_operand" "")) (match_operand:SI 2 "immediate_operand" "i")] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { rtx tmpreg = gen_reg_rtx (mode); emit_insn (gen_neon_vec_shiftl_ (tmpreg, operands[1], operands[2]));