From patchwork Sun Oct 6 17:32:50 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 280890
Message-ID: <1381080770.6275.11.camel@gnopaine>
Subject: [PATCH, rs6000] Correct vector permute for little endian
From: Bill Schmidt
To: gcc-patches@gcc.gnu.org
Cc: dje.gcc@gmail.com
Date: Sun, 06 Oct 2013 12:32:50 -0500

This patch corrects the expansion of vec_perm_constv16qi for
powerpc64le.  The explanation of the problem, with a detailed example,
appears in the commentary.  The patch corrects for what I found to be
surprising behavior in the implementation of the vperm instruction, and
I don't want any of us to spend time figuring that out again.
(We may want to add a programming note in the next version of the ISA.)

This corrects 18 failing tests in the test suite for the powerpc64le
target, without affecting the big-endian targets.  Bootstrapped and
tested with no new regressions on powerpc64le-unknown-linux-gnu and
powerpc64-unknown-linux-gnu.  Ok for trunk?

Thanks,
Bill


2013-10-06  Bill Schmidt

	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const_le): New.
	(altivec_expand_vec_perm_const): Call it.


Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 203018)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -28426,6 +28526,88 @@ rs6000_emit_parity (rtx dst, rtx src)
     }
 }
 
+/* Expand an Altivec constant permutation for little endian mode.
+   There are two issues: First, the two input operands must be
+   swapped so that together they form a double-wide array in LE
+   order.  Second, the vperm instruction has surprising behavior
+   in LE mode:  it interprets the elements of the source vectors
+   in BE mode ("left to right") and interprets the elements of
+   the destination vector in LE mode ("right to left").  To
+   correct for this, we must subtract each element of the permute
+   control vector from 31.
+
+   For example, suppose we want to concatenate vr10 = {0, 1, 2, 3}
+   with vr11 = {4, 5, 6, 7} and extract {0, 2, 4, 6} using a vperm.
+   We place {0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27} in vr12 to
+   serve as the permute control vector.  Then, in BE mode,
+
+     vperm 9,10,11,12
+
+   places the desired result in vr9.  However, in LE mode the
+   vector contents will be
+
+     vr10 = 00000003 00000002 00000001 00000000
+     vr11 = 00000007 00000006 00000005 00000004
+
+   The result of the vperm using the same permute control vector is
+
+     vr9  = 05000000 07000000 01000000 03000000
+
+   That is, the leftmost 4 bytes of vr10 are interpreted as the
+   source for the rightmost 4 bytes of vr9, and so on.
+
+   If we change the permute control vector to
+
+     vr12 = {31,30,29,28,23,22,21,20,15,14,13,12,7,6,5,4}
+
+   and issue
+
+     vperm 9,11,10,12
+
+   we get the desired
+
+     vr9  = 00000006 00000004 00000002 00000000.  */
+
+void
+altivec_expand_vec_perm_const_le (rtx operands[4])
+{
+  unsigned int i;
+  rtx perm[16];
+  rtx constv, unspec;
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx op1 = operands[2];
+  rtx sel = operands[3];
+
+  /* Unpack and adjust the constant selector.  */
+  for (i = 0; i < 16; ++i)
+    {
+      rtx e = XVECEXP (sel, 0, i);
+      unsigned int elt = 31 - (INTVAL (e) & 31);
+      perm[i] = GEN_INT (elt);
+    }
+
+  /* Expand to a permute, swapping the inputs and using the
+     adjusted selector.  */
+  if (!REG_P (op0))
+    op0 = force_reg (V16QImode, op0);
+  if (!REG_P (op1))
+    op1 = force_reg (V16QImode, op1);
+
+  constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
+  constv = force_reg (V16QImode, constv);
+  unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, constv),
+			   UNSPEC_VPERM);
+  if (!REG_P (target))
+    {
+      rtx tmp = gen_reg_rtx (V16QImode);
+      emit_move_insn (tmp, unspec);
+      unspec = tmp;
+    }
+
+  emit_move_insn (target, unspec);
+}
+
 /* Expand an Altivec constant permutation.  Return true if we match
    an efficient implementation; false to fall back to VPERM.  */
 
@@ -28606,6 +28788,12 @@ altivec_expand_vec_perm_const (rtx operands[4])
 	}
     }
 
+  if (!BYTES_BIG_ENDIAN)
+    {
+      altivec_expand_vec_perm_const_le (operands);
+      return true;
+    }
+
   return false;
 }
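
For anyone who wants to check the "swap the inputs and subtract from 31"
rule without hardware, here is a minimal standalone sketch.  It is not
part of the patch.  The names vperm_model, reverse16, le_permute and
le_permute_selfcheck are made up for illustration, and the model assumes
that in LE mode element j of a 16-byte vector sits in register byte
15 - j, with byte 0 being the leftmost byte as vperm numbers them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of "vperm vD,vA,vB,vC".  Index 0 is the leftmost
   register byte, which is how the instruction numbers bytes.  */
static void
vperm_model (uint8_t vd[16], const uint8_t va[16],
             const uint8_t vb[16], const uint8_t vc[16])
{
  uint8_t cat[32], tmp[16];

  memcpy (cat, va, 16);
  memcpy (cat + 16, vb, 16);
  for (int i = 0; i < 16; ++i)
    tmp[i] = cat[vc[i] & 31];
  memcpy (vd, tmp, 16);
}

/* Assumed LE register layout: element j lives in register byte 15 - j.
   The same reversal maps a register image back to element order.  */
static void
reverse16 (uint8_t dst[16], const uint8_t src[16])
{
  for (int j = 0; j < 16; ++j)
    dst[15 - j] = src[j];
}

/* Permute in LE element order: out[j] = concat (op0, op1)[sel[j]].
   As in the patch, the inputs are swapped and each selector element
   is subtracted from 31 before the modeled vperm is issued.  */
void
le_permute (uint8_t out[16], const uint8_t op0[16],
            const uint8_t op1[16], const uint8_t sel[16])
{
  uint8_t adj[16], ra[16], rb[16], rc[16], rd[16];

  for (int j = 0; j < 16; ++j)
    adj[j] = 31 - (sel[j] & 31);

  reverse16 (ra, op1);		/* Swapped: op1 is the first input.  */
  reverse16 (rb, op0);
  reverse16 (rc, adj);
  vperm_model (rd, ra, rb, rc);
  reverse16 (out, rd);
}

/* With op0[j] = j and op1[j] = 16 + j, the result must equal the
   selector itself.  Uses the selector from the commentary.  */
void
le_permute_selfcheck (void)
{
  static const uint8_t sel[16]
    = { 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27 };
  uint8_t op0[16], op1[16], out[16];

  for (int j = 0; j < 16; ++j)
    {
      op0[j] = j;
      op1[j] = 16 + j;
    }
  le_permute (out, op0, op1, sel);
  for (int j = 0; j < 16; ++j)
    assert (out[j] == sel[j]);
}
```

Calling le_permute_selfcheck () from a small driver should run without
tripping the assertion if the model and the adjustment agree.  This only
checks the arithmetic under the stated layout assumption.  It says
nothing about the RTL the patch generates.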