From patchwork Tue Feb 9 17:07:18 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Charles Baylis X-Patchwork-Id: 580973 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id AB9641409C4 for ; Wed, 10 Feb 2016 04:07:30 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=Kv28QWFj; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; q=dns; s=default; b=u+4nT4PatqtX+tKhcu 9mPfKNGxLgPNBcdRop33gXw/Yx7mIIWVRDWijde52NobzFJIqdJ0SJsQkCEtPrcE FSw2Jx2kAaLETNcI/e5DGnkZ/czFw2HTvwnPr7HVWFc8Nf90fRy9bIhSxcjfXawd MHRrlRJKQq/WnXHhVwugR+K6o= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; s=default; bh=etA025I6J2lkNus7y5v7c9w6 UQs=; b=Kv28QWFjbTf4KygWA6Old10L5enl95wfnIsa9CRYYPWXYT06kBikW+bN kpvt8t2PBIHui2MvQWEuPA8/PljywVubhLFCzw4HY5zeoCEQimZX9GeBuIlMJ8OT wROjyImQW/lOpSi4Bp6ZvfGIS42U+bjDW6zNDamqT5qDyhJTHW4= Received: (qmail 86018 invoked by alias); 9 Feb 2016 17:07:23 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 86004 invoked by uid 89); 9 Feb 2016 17:07:22 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=4.4 required=5.0 tests=AWL, BAYES_50, RCVD_IN_DNSWL_LOW, SPF_PASS, ZIP_ATTACHED autolearn=no version=3.3.2 spammy=threw, 28318, 21, 8488, 8, 8517, 9 X-HELO: mail-oi0-f43.google.com Received: from mail-oi0-f43.google.com (HELO mail-oi0-f43.google.com) (209.85.218.43) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 09 Feb 2016 17:07:20 +0000 Received: by mail-oi0-f43.google.com with SMTP id j125so18969516oih.1 for ; Tue, 09 Feb 2016 09:07:20 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=72yqtbQdpNAgJuMFYzLlqp/8ichDpJ8ToJW0lzoQQ1o=; b=R3gP+U42usXFZcbpwFdaPHU6/2x8eRj24EXlVtla44WsIEXjxlvp2TWgQQKR0GX/5P bGfJfajwj+2zzo9j94UxMMplIJLe33QcKsflcey0Q4LaJyjRIaDAupIreXUG2A7fooiG Q8cvGjgV7wZPrWg4KIoOZAxDX9EA/hEZaJ0q8rycLddqsRAYVkVCobIUGg6sM8iNJx1M 4pzsRDYTTSowfJtr7tD380nizhzKuanNpDIf91UD4x4APm075TQjobhy+Dn/rXTKpZwP VIQb801UFniarlYHPLmht9sM20oINx3t+oq/xQtPdwTCjlHWJsXwq0aZ4VJ+b4s09CAC to8Q== X-Gm-Message-State: AG10YOSumQudaxlbumOvK0VRHFsnBFUdcxaMuZ8p9BV/7MYNqkK8vg0W0mTP/UVR4Rx1jbBepTvDJ6TwfxqHfEY2 MIME-Version: 1.0 X-Received: by 10.202.242.86 with SMTP id q83mr4672021oih.24.1455037638829; Tue, 09 Feb 2016 09:07:18 -0800 (PST) Received: by 10.202.224.4 with HTTP; Tue, 9 Feb 2016 09:07:18 -0800 (PST) In-Reply-To: <56B87F23.4030906@foss.arm.com> References: <1454525947-14690-1-git-send-email-charles.baylis@linaro.org> <1454525947-14690-3-git-send-email-charles.baylis@linaro.org> <56B87F23.4030906@foss.arm.com> Date: Tue, 9 Feb 2016 17:07:18 +0000 Message-ID: Subject: Re: [PATCH 2/2] [ARM] PR68532 Fix up vzip recognition for big endian From: Charles Baylis To: Kyrill Tkachov Cc: Ramana Radhakrishnan , Richard Earnshaw , Richard Earnshaw , GCC Patches , Michael Collison X-IsSubscribed: yes On 8 February 2016 at 11:42, Kyrill Tkachov wrote: > On 03/02/16 18:59, charles.baylis@linaro.org wrote: >> --- a/gcc/config/arm/arm.c >> +++ b/gcc/config/arm/arm.c >> @@ -28318,15 +28318,21 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d) >> unsigned int i, high, mask, nelt = d->nelt; >> rtx out0, out1, in0, in1; >> rtx (*gen)(rtx, rtx, rtx, rtx); >> + int first_elem; >> + bool is_swapped; >> if (GET_MODE_UNIT_SIZE (d->vmode) >= 8) >> return false; >> + is_swapped = BYTES_BIG_ENDIAN ? true : false; > > > This is just "is_swapped = BYTES_BIG_ENDIAN;" Done. >> + >> /* Note that these are little-endian tests. Adjust for big-endian >> later. */ > > > I think you can remove this comment now, like in patch 1/2 Done. >> + first_elem = d->perm[neon_endian_lane_map (d->vmode, 0) ^ is_swapped]; >> + >> high = nelt / 2; >> - if (d->perm[0] == high) >> + if (first_elem == neon_endian_lane_map (d->vmode, high)) >> ; >> - else if (d->perm[0] == 0) >> + else if (first_elem == neon_endian_lane_map (d->vmode, 0)) >> high = 0; >> else >> return false; >> @@ -28334,11 +28340,16 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d) >> for (i = 0; i < nelt / 2; i++) >> { >> - unsigned elt = (i + high) & mask; >> - if (d->perm[i * 2] != elt) >> + unsigned elt = >> + neon_pair_endian_lane_map (d->vmode, i + high) & mask; >> + if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i + >> is_swapped)] >> + != elt) >> return false; >> - elt = (elt + nelt) & mask; >> - if (d->perm[i * 2 + 1] != elt) >> + elt = >> + neon_pair_endian_lane_map (d->vmode, i + nelt + high) >> + & mask; > > > The "& mask" can go on the previous line. Done >> + if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i + >> !is_swapped)] >> + != elt) >> return false; >> } >> @@ -28362,10 +28373,9 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d >> *d) >> in0 = d->op0; >> in1 = d->op1; >> - if (BYTES_BIG_ENDIAN) >> + if (is_swapped) >> { >> std::swap (in0, in1); >> - high = !high; >> } > > > remove the braces around the std::swap. Done. > Ok with these changes. > I've tried out both patch and they do fix execution failures on big-endian > and don't break any NEON intrinsics tests that I threw at them. Attached for completeness, will commit once the VUZP patch is OKd. From 469f82610a4e70284bf23c373b8a73685cad0ec1 Mon Sep 17 00:00:00 2001 From: Charles Baylis Date: Tue, 9 Feb 2016 15:18:44 +0000 Subject: [PATCH 2/2] [ARM] PR68532 Fix up vzip recognition for big endian gcc/ChangeLog: 2016-02-09 Charles Baylis PR target/68532 * config/arm/arm.c (arm_evpc_neon_vzip): Allow for big endian lane order. * config/arm/arm_neon.h (vzipq_s8): Adjust shuffle patterns for big endian. (vzipq_s16): Likewise. (vzipq_s32): Likewise. (vzipq_f32): Likewise. (vzipq_u8): Likewise. (vzipq_u16): Likewise. (vzipq_u32): Likewise. (vzipq_p8): Likewise. (vzipq_p16): Likewise. Change-Id: I327678f5e73c1de2f413c1d22769ab42ce1d6c16 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 95ee9a5..5562baa 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -28318,15 +28318,20 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d) unsigned int i, high, mask, nelt = d->nelt; rtx out0, out1, in0, in1; rtx (*gen)(rtx, rtx, rtx, rtx); + int first_elem; + bool is_swapped; if (GET_MODE_UNIT_SIZE (d->vmode) >= 8) return false; - /* Note that these are little-endian tests. Adjust for big-endian later. */ + is_swapped = BYTES_BIG_ENDIAN; + + first_elem = d->perm[neon_endian_lane_map (d->vmode, 0) ^ is_swapped]; + high = nelt / 2; - if (d->perm[0] == high) + if (first_elem == neon_endian_lane_map (d->vmode, high)) ; - else if (d->perm[0] == 0) + else if (first_elem == neon_endian_lane_map (d->vmode, 0)) high = 0; else return false; @@ -28334,11 +28339,15 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d) for (i = 0; i < nelt / 2; i++) { - unsigned elt = (i + high) & mask; - if (d->perm[i * 2] != elt) + unsigned elt = + neon_pair_endian_lane_map (d->vmode, i + high) & mask; + if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i + is_swapped)] + != elt) return false; - elt = (elt + nelt) & mask; - if (d->perm[i * 2 + 1] != elt) + elt = + neon_pair_endian_lane_map (d->vmode, i + nelt + high) & mask; + if (d->perm[neon_pair_endian_lane_map (d->vmode, 2 * i + !is_swapped)] + != elt) return false; } @@ -28362,11 +28371,8 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d) in0 = d->op0; in1 = d->op1; - if (BYTES_BIG_ENDIAN) - { - std::swap (in0, in1); - high = !high; - } + if (is_swapped) + std::swap (in0, in1); out0 = d->target; out1 = gen_reg_rtx (d->vmode); diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 2e014b6..aa17f49 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -8453,9 +8453,9 @@ vzipq_s8 (int8x16_t __a, int8x16_t __b) int8x16x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 24, 8, 25, 9, 26, 10, 27, 11, 28, 12, 29, 13, 30, 14, 31, 15 }); + { 20, 4, 21, 5, 22, 6, 23, 7, 16, 0, 17, 1, 18, 2, 19, 3 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 16, 0, 17, 1, 18, 2, 19, 3, 20, 4, 21, 5, 22, 6, 23, 7 }); + { 28, 12, 29, 13, 30, 14, 31, 15, 24, 8, 25, 9, 26, 10, 27, 11 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 }); @@ -8471,9 +8471,9 @@ vzipq_s16 (int16x8_t __a, int16x8_t __b) int16x8x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 12, 4, 13, 5, 14, 6, 15, 7 }); + { 10, 2, 11, 3, 8, 0, 9, 1 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 8, 0, 9, 1, 10, 2, 11, 3 }); + { 14, 6, 15, 7, 12, 4, 13, 5 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) { 0, 8, 1, 9, 2, 10, 3, 11 }); @@ -8488,8 +8488,8 @@ vzipq_s32 (int32x4_t __a, int32x4_t __b) { int32x4x2_t __rv; #ifdef __ARM_BIG_ENDIAN - __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 6, 2, 7, 3 }); - __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4, 0, 5, 1 }); + __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 5, 1, 4, 0 }); + __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 7, 3, 6, 2 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 0, 4, 1, 5 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 2, 6, 3, 7 }); @@ -8502,8 +8502,8 @@ vzipq_f32 (float32x4_t __a, float32x4_t __b) { float32x4x2_t __rv; #ifdef __ARM_BIG_ENDIAN - __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 6, 2, 7, 3 }); - __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4, 0, 5, 1 }); + __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 5, 1, 4, 0 }); + __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 7, 3, 6, 2 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 0, 4, 1, 5 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 2, 6, 3, 7 }); @@ -8517,9 +8517,9 @@ vzipq_u8 (uint8x16_t __a, uint8x16_t __b) uint8x16x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 24, 8, 25, 9, 26, 10, 27, 11, 28, 12, 29, 13, 30, 14, 31, 15 }); + { 20, 4, 21, 5, 22, 6, 23, 7, 16, 0, 17, 1, 18, 2, 19, 3 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 16, 0, 17, 1, 18, 2, 19, 3, 20, 4, 21, 5, 22, 6, 23, 7 }); + { 28, 12, 29, 13, 30, 14, 31, 15, 24, 8, 25, 9, 26, 10, 27, 11 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 }); @@ -8535,9 +8535,9 @@ vzipq_u16 (uint16x8_t __a, uint16x8_t __b) uint16x8x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 12, 4, 13, 5, 14, 6, 15, 7 }); + { 10, 2, 11, 3, 8, 0, 9, 1 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 8, 0, 9, 1, 10, 2, 11, 3 }); + { 14, 6, 15, 7, 12, 4, 13, 5 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) { 0, 8, 1, 9, 2, 10, 3, 11 }); @@ -8552,8 +8552,8 @@ vzipq_u32 (uint32x4_t __a, uint32x4_t __b) { uint32x4x2_t __rv; #ifdef __ARM_BIG_ENDIAN - __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 6, 2, 7, 3 }); - __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4, 0, 5, 1 }); + __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 5, 1, 4, 0 }); + __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 7, 3, 6, 2 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 0, 4, 1, 5 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 2, 6, 3, 7 }); @@ -8567,9 +8567,9 @@ vzipq_p8 (poly8x16_t __a, poly8x16_t __b) poly8x16x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 24, 8, 25, 9, 26, 10, 27, 11, 28, 12, 29, 13, 30, 14, 31, 15 }); + { 20, 4, 21, 5, 22, 6, 23, 7, 16, 0, 17, 1, 18, 2, 19, 3 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 16, 0, 17, 1, 18, 2, 19, 3, 20, 4, 21, 5, 22, 6, 23, 7 }); + { 28, 12, 29, 13, 30, 14, 31, 15, 24, 8, 25, 9, 26, 10, 27, 11 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 }); @@ -8585,9 +8585,9 @@ vzipq_p16 (poly16x8_t __a, poly16x8_t __b) poly16x8x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 12, 4, 13, 5, 14, 6, 15, 7 }); + { 10, 2, 11, 3, 8, 0, 9, 1 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 8, 0, 9, 1, 10, 2, 11, 3 }); + { 14, 6, 15, 7, 12, 4, 13, 5 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) { 0, 8, 1, 9, 2, 10, 3, 11 }); -- 1.9.1