From patchwork Thu Nov 29 17:53:07 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Earnshaw X-Patchwork-Id: 202800 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id AEB392C008F for ; Fri, 30 Nov 2012 04:53:31 +1100 (EST) Comment: DKIM? See http://www.dkim.org DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=gcc.gnu.org; s=default; x=1354816413; h=Comment: DomainKey-Signature:Received:Received:Received:Received:Received: Message-ID:Date:From:User-Agent:MIME-Version:To:Subject: Content-Type:Mailing-List:Precedence:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:Sender:Delivered-To; bh=99uP06m DUGN2z+ZgMOG5pezowTw=; b=hS4476Q15c6sAXw6kqopP1cpNBXGjNIoOX4RoEk 9/jKPlMB5VnqafVUgh0npynbbocD1JadcTXzsgTzG3CR/NBYiKKVQ5px7Txi0sQR YSuDUmd6S8pCRZTpHHkhTK3IIJQL8R31MBzAh/EUAv/u24XGwuMHtgZ94F5nALkx rX3c= Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=gcc.gnu.org; h=Received:Received:X-SWARE-Spam-Status:X-Spam-Check-By:Received:Received:Received:Message-ID:Date:From:User-Agent:MIME-Version:To:Subject:X-MC-Unique:Content-Type:X-IsSubscribed:Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:Delivered-To; b=OjC/Peq5WHCzHJg2tHn96P44C/q1Q6Rn7WBmNjJb1sn2UKo24ZgRnKEmoCmS0f hQLGrk0vSBRg5MOnPHUIavXNeiL0/C3rWVgPzq7+/zpVFrI03+i/5BfA6VWi5Q0S SfAGhrXb+SKaVL+46tSMFDCH3t0Y5h4WFwsc7y1Q5ATdM=; Received: (qmail 17296 invoked by alias); 29 Nov 2012 17:53:20 -0000 Received: (qmail 17279 invoked by uid 22791); 29 Nov 2012 17:53:18 -0000 X-SWARE-Spam-Status: No, hits=-0.8 required=5.0 tests=AWL, BAYES_00, KHOP_RCVD_UNTRUST, RCVD_IN_DNSWL_LOW, TW_XF X-Spam-Check-By: sourceware.org Received: from service87.mimecast.com (HELO service87.mimecast.com) (91.220.42.44) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 29 Nov 2012 17:53:11 +0000 Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.21]) by service87.mimecast.com; Thu, 29 Nov 2012 17:53:09 +0000 Received: from [10.1.69.67] ([10.1.255.212]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Thu, 29 Nov 2012 17:53:08 +0000 Message-ID: <50B7A103.8000701@arm.com> Date: Thu, 29 Nov 2012 17:53:07 +0000 From: Richard Earnshaw User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:15.0) Gecko/20120907 Thunderbird/15.0.1 MIME-Version: 1.0 To: gcc-patches Subject: [patch, ARM] Fix pr55073 X-MC-Unique: 112112917530900401 X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org PR 55073 is a case where scheduling appears to mess up the order of instructions to the extent that they no-longer give the correct results. However, looking at the patterns, I think they are ill-defined. IIRC tied operands should tie a source to a destination, rather than a destination to a source. Failing to get this right can essentially result in an input value being clobbered and the compiler failing to detect this. Because all the patterns that appear to have this violation are named patterns that are called during expand (mainly of intrinsics), it's not trivial to simply rename the operands, or we get bogus rtl. Instead, I've taken the approach of splitting expand from match. gcc: * arm/neon.md (neon_vtrn_internal): Split into expand and insn patterns. Re-order insn arguments to tie inputs to outputs. (neon_vzip_internal): Likewise. (neon_vuzp_internal): Likewise. testsuite: * gcc.target/arm/pr55073.C: New test. --- config/arm/neon.md (revision 193005) +++ config/arm/neon.md (local) @@ -4225,16 +4225,29 @@ (define_insn "neon_vtbx4v8qi" [(set_attr "neon_type" "neon_bp_3cycle")] ) -(define_insn "neon_vtrn_internal" +(define_expand "neon_vtrn_internal" + [(parallel + [(set (match_operand:VDQW 0 "s_register_operand" "") + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "") + (match_operand:VDQW 2 "s_register_operand" "")] + UNSPEC_VTRN1)) + (set (match_operand:VDQW 3 "s_register_operand" "") + (unspec:VDQW [(match_dup 1) (match_dup 2)] UNSPEC_VTRN2))])] + "TARGET_NEON" + "" +) + +;; Note: Different operand numbering to handle tied registers correctly. +(define_insn "*neon_vtrn_insn" [(set (match_operand:VDQW 0 "s_register_operand" "=w") (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") - (match_operand:VDQW 2 "s_register_operand" "w")] + (match_operand:VDQW 3 "s_register_operand" "2")] UNSPEC_VTRN1)) - (set (match_operand:VDQW 3 "s_register_operand" "=2") - (unspec:VDQW [(match_dup 1) (match_dup 2)] + (set (match_operand:VDQW 2 "s_register_operand" "=w") + (unspec:VDQW [(match_dup 1) (match_dup 3)] UNSPEC_VTRN2))] "TARGET_NEON" - "vtrn.\t%0, %3" + "vtrn.\t%0, %2" [(set (attr "neon_type") (if_then_else (match_test "") (const_string "neon_bp_simple") @@ -4252,16 +4265,29 @@ (define_expand "neon_vtrn" DONE; }) -(define_insn "neon_vzip_internal" +(define_expand "neon_vzip_internal" + [(parallel + [(set (match_operand:VDQW 0 "s_register_operand" "") + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "") + (match_operand:VDQW 2 "s_register_operand" "")] + UNSPEC_VZIP1)) + (set (match_operand:VDQW 3 "s_register_operand" "") + (unspec:VDQW [(match_dup 1) (match_dup 2)] UNSPEC_VZIP2))])] + "TARGET_NEON" + "" +) + +;; Note: Different operand numbering to handle tied registers correctly. +(define_insn "*neon_vzip_insn" [(set (match_operand:VDQW 0 "s_register_operand" "=w") (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") - (match_operand:VDQW 2 "s_register_operand" "w")] + (match_operand:VDQW 3 "s_register_operand" "2")] UNSPEC_VZIP1)) - (set (match_operand:VDQW 3 "s_register_operand" "=2") - (unspec:VDQW [(match_dup 1) (match_dup 2)] + (set (match_operand:VDQW 2 "s_register_operand" "=w") + (unspec:VDQW [(match_dup 1) (match_dup 3)] UNSPEC_VZIP2))] "TARGET_NEON" - "vzip.\t%0, %3" + "vzip.\t%0, %2" [(set (attr "neon_type") (if_then_else (match_test "") (const_string "neon_bp_simple") @@ -4279,16 +4305,29 @@ (define_expand "neon_vzip" DONE; }) -(define_insn "neon_vuzp_internal" +(define_expand "neon_vuzp_internal" + [(parallel + [(set (match_operand:VDQW 0 "s_register_operand" "") + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "") + (match_operand:VDQW 2 "s_register_operand" "")] + UNSPEC_VUZP1)) + (set (match_operand:VDQW 3 "s_register_operand" "") + (unspec:VDQW [(match_dup 1) (match_dup 2)] UNSPEC_VUZP2))])] + "TARGET_NEON" + "" +) + +;; Note: Different operand numbering to handle tied registers correctly. +(define_insn "*neon_vuzp_insn" [(set (match_operand:VDQW 0 "s_register_operand" "=w") (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") - (match_operand:VDQW 2 "s_register_operand" "w")] + (match_operand:VDQW 3 "s_register_operand" "2")] UNSPEC_VUZP1)) - (set (match_operand:VDQW 3 "s_register_operand" "=2") - (unspec:VDQW [(match_dup 1) (match_dup 2)] + (set (match_operand:VDQW 2 "s_register_operand" "=w") + (unspec:VDQW [(match_dup 1) (match_dup 3)] UNSPEC_VUZP2))] "TARGET_NEON" - "vuzp.\t%0, %3" + "vuzp.\t%0, %2" [(set (attr "neon_type") (if_then_else (match_test "") (const_string "neon_bp_simple") --- testsuite/gcc.target/arm/pr55073.C (revision 193005) +++ testsuite/gcc.target/arm/pr55073.C (local) @@ -0,0 +1,74 @@ +/* { dg-do run } */ +/* { dg-require-effective-target arm_neon } */ +/* { dg-options "-O2" } */ +/* { dg-add-options arm_neon } */ + +#include +#include + +struct __attribute__((aligned(16))) _v16u8_ { + uint8x16_t val; + _v16u8_() { } + + _v16u8_( const uint8x16_t &src) { val = src; } + _v16u8_( const int16x8_t &src) { val = vreinterpretq_u8_s16(src); } + _v16u8_( const uint32x4_t &src) { val = vreinterpretq_u8_u32(src); } + + operator uint8x16_t () const { return val; } + operator int8x16_t () const { return vreinterpretq_s8_u8 (val); } + operator int16x8_t () const { return vreinterpretq_s16_u8(val); } + operator uint32x4_t () const { return vreinterpretq_u32_u8(val); } + operator int32x4_t () const { return vreinterpretq_s32_u8(val); } +}; +typedef struct _v16u8_ v16u8; +typedef const v16u8 cv16u8; + +typedef v16u8 v16i8; +typedef v16u8 v8i16; +typedef v16u8 v4u32; + +inline v16u8 __attribute__((always_inline)) mergelo( const v16u8 & s, const v16u8 & t ) +{ + uint8x8x2_t r = vzip_u8( vget_low_u8(s), vget_low_u8(t) ); + return vcombine_u8( r.val[0], r.val[1] ); +} + +inline v8i16 __attribute__((always_inline)) unpacklo(const v16i8 & s) +{ + return vmovl_s8( vget_low_s8( s ) ); +} + +const uint32_t __attribute__((aligned(16))) _InA [4] = { 0xFF020001, 0xFF020001, 0xFF000101, 0xFF000101 } ; +const uint32_t __attribute__((aligned(16))) _InB [4] = { 0xFF050002, 0xFF050002, 0xFF000303, 0xFF000203 } ; + +__attribute__((noinline)) v16i8 test_func(void) +{ + v16u8 A = vld1q_u8( (uint8_t*) _InA ); + v16u8 B = vld1q_u8( (uint8_t*) _InB ); + v8i16 r = vdupq_n_s16(2); + + v16u8 _0 = mergelo( A, B ); + v16u8 _1 = mergelo( B, A ); + + v16u8 _2 = mergelo( _0, _1 ); + v16u8 _3 = mergelo( _1, _0 ); + + v8i16 _4 = vsubq_s16( unpacklo( _2 ), r ); + v8i16 _5 = vsubq_s16( unpacklo( _3 ), r ); + + v8i16 ret = vaddq_s16( _4, _5 ); + + return ( ret ); +} + +int main (int argc, char **argv) +{ + v16u8 val = test_func(); + + if (vgetq_lane_u32( val, 0 ) != 0xffffffff + || vgetq_lane_u32( val, 1 ) != 0xffffffff + || vgetq_lane_u32( val, 2 ) != 0xfffcfffc + || vgetq_lane_u32( val, 3 ) != 0xfffcfffc) + abort (); + exit (0); +}