From patchwork Sat May 26 08:27:53 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ramana Radhakrishnan X-Patchwork-Id: 161465 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 39F61B6FA2 for ; Sat, 26 May 2012 18:28:27 +1000 (EST) Comment: DKIM? See http://www.dkim.org DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=gcc.gnu.org; s=default; x=1338625708; h=Comment: DomainKey-Signature:Received:Received:Received:Received: MIME-Version:Received:Received:Date:Message-ID:Subject:From:To: Cc:Content-Type:Mailing-List:Precedence:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:Sender:Delivered-To; bh=j167TAv StbnonPG/dYwNJpYT5nI=; b=VzGtI6s3R/b3sy6QrkakMFG+ofrpHI4yolZyAaY XeV6CqdERhbxafENZyTjfScQQ4BcOSjyK3UHh0odglvUojJiVGB9X9dwzVqffF1a F0MaAw/Bkpn22r81yAMyKiMaOn84/XOpcw7pRWr1wlWHu05gE6EgkVOmGC9kzzKH 2zyU= Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=gcc.gnu.org; h=Received:Received:X-SWARE-Spam-Status:X-Spam-Check-By:Received:Received:X-Google-DKIM-Signature:MIME-Version:Received:Received:Date:Message-ID:Subject:From:To:Cc:Content-Type:X-Gm-Message-State:X-IsSubscribed:Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:Delivered-To; b=tM9pa+Efe1F3mcgIPAtKQpQH+UAej6fTZiRFGXhUa7MaBjwkg/CZPcz/3MM5yx 1luE60flg4Ypsw+oko6cpfzCttW5tCSXyb2jEuSweZNGtODgmynMUBR9rEFR8Q7T lYWRKLxMRSl77W636hpni69J39M/M1MkU0XV7gyUAozbQ=; Received: (qmail 2577 invoked by alias); 26 May 2012 08:28:15 -0000 Received: (qmail 2567 invoked by uid 22791); 26 May 2012 08:28:07 -0000 X-SWARE-Spam-Status: No, hits=-2.0 required=5.0 tests=AWL, BAYES_00, KHOP_RCVD_UNTRUST, RCVD_IN_DNSWL_LOW, RCVD_IN_HOSTKARMA_YE, TW_EV, TW_VP X-Spam-Check-By: sourceware.org Received: from mail-wg0-f51.google.com (HELO mail-wg0-f51.google.com) (74.125.82.51) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sat, 26 May 2012 08:27:54 +0000 Received: by wgbed3 with SMTP id ed3so1367609wgb.8 for ; Sat, 26 May 2012 01:27:53 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:date:message-id:subject:from:to:cc:content-type :x-gm-message-state; bh=a6HLfbXC41svDqJNaQPzPrIYQgCPGbRhyZlp1xN9NRc=; b=KoQ0vYI5cxjEJ104MQBK0iJp56PlVVakcm95+MkIx2Wc1XeuH9uvjkOeR5afCmbZRu t1lMREvYLm11lh5D6rE51ZfTm0Fu0KkL1TtZIWfmB0cYEUoRCHCIT0RjS3Mf58P+ztyY un0bKQEXh+tPSpkni4+OX0K3rfxxfqzkukBk86UDEunYFpZ5Qs4Hbjv8MM4qvrnOo7ku 3YL8hS4tq40RCuwVs8gLtBbHpk+ZViRdnpwxP1ln27NJhCnCXT+WFcBr5BCvlvY4xOmI MIq2ROBH+lxjLVtOl72ZSR4mXGXTBnfOXhs93TfmrwAuXidjWy0jUra3CdJ2zHSR9Fkt J8Bg== MIME-Version: 1.0 Received: by 10.180.91.225 with SMTP id ch1mr2135923wib.18.1338020873067; Sat, 26 May 2012 01:27:53 -0700 (PDT) Received: by 10.216.132.97 with HTTP; Sat, 26 May 2012 01:27:53 -0700 (PDT) Date: Sat, 26 May 2012 09:27:53 +0100 Message-ID: Subject: [Patch ARM] Fix off by one error in neon_evpc_vrev. From: Ramana Radhakrishnan To: "gcc-patches\"" Cc: Richard Henderson , Patch Tracking , Richard Earnshaw X-Gm-Message-State: ALoCoQmFScFcHrrVcnZE8gSeILUHjHvTzSbR6+2b92S03hNhuZNgRiAahfypJETbh38htjLitMxE X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Hi, There is an off by one error in neon_evpc_vrev which means it rarely gets triggerred. The problem is that if you are looking at d->perm [i +j] and you increment i by just diff you end up starting looking where you looked at the end of the last place where you checked. Given this I think this patch should make it to trunk and to the 4.7 branch after appropriate testing and if the release managers don't object to such a change. The point is that this helps generate proper vect_reverse style instructions for arm_neon. There is scope for a follow up patch that fixes up the intrinsics essentially identical to all the functions in the test that I've added (except for a couple of missing v2sf and v4sf type operations which should boil down to the same implementation) . It's a permute so the type really doesn't matter. However it requires some ML magic with permutations for the masks which will prove to be good fun on a plane trip. Testing currently in progress and if there are no regressions I intend to commit this to trunk and (after a while I would like to backport this to the 4.7 branch if there are no objections from the release managers) RichardH , since you did the original patch would you mind having a quick look to see if I haven't missed anything obvious. * config/arm/arm.c (arm_evpc_neon_vrev): Fix off by one error. * gcc.target/arm/neon-vrev.c: New. regards, Ramana 1. For one example from the testcase you can see the effects before and after for this : (left is new compiler and right is old compiler) vrev16q2_u8: vrev16q2_u8: @ args = 0, pretend = 0, frame = 0 @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. @ link register save eliminated. vmov d16, r0, r1 @ v16qi | vmov d20, r0, r1 @ v16qi vmov d17, r2, r3 | vmov d21, r2, r3 vrev16.8 q8, q8 | vldr d16, .L41 vmov r0, r1, d16 @ v16qi | vldr d17, .L41+8 vmov r2, r3, d17 | vtbl.8 d18, {d20, d21}, d16 bx lr | vtbl.8 d19, {d20, d21}, d17 > vmov r0, r1, d18 @ v16qi > vmov r2, r3, d19 > bx lr > .L42: > .align 3 > .L41: > .byte 1 > .byte 0 > .byte 3 > .byte 2 > .byte 5 > .byte 4 > .byte 7 > .byte 6 > .byte 9 > .byte 8 > .byte 11 > .byte 10 > .byte 13 > .byte 12 > .byte 15 > .byte 14 .size vrev16q2_u8, .-vrev16q2_u8 .size vrev16q2_u8, .-vrev16q2_u8 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 321e6b5..bcad0b9 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -25668,10 +25668,18 @@ arm_evpc_neon_vrev (struct expand_vec_perm_d *d) return false; } - for (i = 0; i < nelt; i += diff) + for (i = 0; i < nelt ; i += (diff + 1)) for (j = 0; j <= diff; j += 1) - if (d->perm[i + j] != i + diff - j) - return false; + { + /* This is guaranteed to be true as the value of diff + is 7, 3, 1 and we should have enough elements in the + queue to generate this. Getting a vector mask with a + value of diff other than these values implies that + something is wrong by the time we get here. */ + gcc_assert ((i + j) < nelt); + if (d->perm[i + j] != i + diff - j) + return false; + } /* Success! */ if (d->testing_p) --- /dev/null 2012-05-25 12:51:24.801630363 +0100 +++ gcc/testsuite/gcc.target/arm/neon-vrev.c 2012-05-26 03:18:02.095635775 +0100 @@ -0,0 +1,105 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-options "-O2" } */ +/* { dg-add-options arm_neon } */ + +#include + +uint16x4_t +tst_vrev642_u16 (uint16x4_t __a) +{ + uint16x4_t __rv; + uint16x4_t __mask1 = { 3, 2, 1, 0}; + return __builtin_shuffle ( __a, __mask1) ; +} + +uint16x8_t +tst_vrev64q2_u16 (uint16x8_t __a) +{ + uint16x8_t __rv; + uint16x8_t __mask1 = {3, 2, 1, 0, 7, 6, 5, 4 }; + return __builtin_shuffle ( __a, __mask1) ; +} + +uint8x8_t +tst_vrev642_u8 (uint8x8_t __a) +{ + uint8x8_t __rv; + uint8x8_t __mask1 = { 7, 6, 5, 4, 3, 2, 1, 0}; + return __builtin_shuffle ( __a, __mask1) ; +} + +uint8x16_t +tst_vrev64q2_u8 (uint8x16_t __a) +{ + uint8x16_t __rv; + uint8x16_t __mask1 = {7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8}; + return __builtin_shuffle ( __a, __mask1) ; + +} + +uint32x2_t +tst_vrev642_u32 (uint32x2_t __a) +{ + uint32x2_t __rv; + uint32x2_t __mask1 = {1, 0}; + return __builtin_shuffle ( __a, __mask1) ; + +} + +uint32x4_t +tst_vrev64q2_u32 (uint32x4_t __a) +{ + uint32x4_t __rv; + uint32x4_t __mask1 = {1, 0, 3, 2}; + return __builtin_shuffle ( __a, __mask1) ; +} + +uint16x4_t +tst_vrev322_u16 (uint16x4_t __a) +{ + uint16x4_t __mask1 = { 1, 0, 3, 2 }; + return __builtin_shuffle (__a, __mask1); +} + +uint16x8_t +tst_vrev32q2_u16 (uint16x8_t __a) +{ + uint16x8_t __mask1 = { 1, 0, 3, 2, 5, 4, 7, 6 }; + return __builtin_shuffle (__a, __mask1); +} + +uint8x8_t +tst_vrev322_u8 (uint8x8_t __a) +{ + uint8x8_t __mask1 = { 3, 2, 1, 0, 7, 6, 5, 4}; + return __builtin_shuffle (__a, __mask1); +} + +uint8x16_t +tst_vrev32q2_u8 (uint8x16_t __a) +{ + uint8x16_t __mask1 = { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12}; + return __builtin_shuffle (__a, __mask1); +} + +uint8x8_t +tst_vrev162_u8 (uint8x8_t __a) +{ + uint8x8_t __mask = { 1, 0, 3, 2, 5, 4, 7, 6}; + return __builtin_shuffle (__a, __mask); +} + +uint8x16_t +tst_vrev16q2_u8 (uint8x16_t __a) +{ + uint8x16_t __mask = { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14}; + return __builtin_shuffle (__a, __mask); +} + +/* { dg-final {scan-assembler-times "vrev32\.16\\t" 2} } */ +/* { dg-final {scan-assembler-times "vrev32\.8\\t" 2} } */ +/* { dg-final {scan-assembler-times "vrev16\.8\\t" 2} } */ +/* { dg-final {scan-assembler-times "vrev64\.8\\t" 2} } */ +/* { dg-final {scan-assembler-times "vrev64\.32\\t" 2} } */ +/* { dg-final {scan-assembler-times "vrev64\.16\\t" 2} } */