From patchwork Sun Oct  6 17:32:50 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
X-Patchwork-Id: 280890
Message-ID: <1381080770.6275.11.camel@gnopaine>
Subject: [PATCH, rs6000] Correct vector permute for little endian
From: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
To: gcc-patches@gcc.gnu.org
Cc: dje.gcc@gmail.com
Date: Sun, 06 Oct 2013 12:32:50 -0500

This patch corrects the expansion of vec_perm_constv16qi for
powerpc64le.  A detailed explanation of the problem, with a worked
example, appears in the commentary: the fix compensates for what I
found to be surprising behavior in the implementation of the vperm
instruction, and I don't want any of us to spend time figuring that
out again.  (We may want to add a programming note to the next
version of the ISA.)
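
To make the byte-level behavior easier to check, here is a small standalone C model of the example in the commentary. It is only a sketch for illustration, not part of the patch and not GCC code, and the helper names (vperm, le_load_words, le_load_bytes, le_word, le_vperm_fixed, print_vreg) are hypothetical. It assumes, as the commentary does, that vperm always numbers the 32 source bytes left to right, and that in LE mode a register holds the byte-reversed little-endian memory image, so element 0 sits at the right end.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A 128-bit vector register modelled as 16 bytes in register order:
   b[0] is the leftmost byte as the register is displayed.  */
typedef struct { uint8_t b[16]; } vreg;

/* Hardware vperm: result byte i is byte (vrc.b[i] & 31) of the 32-byte
   concatenation vra||vrb, with bytes always numbered left to right.  */
static vreg vperm (vreg vra, vreg vrb, vreg vrc)
{
  uint8_t src[32];
  vreg vrt;
  memcpy (src, vra.b, 16);
  memcpy (src + 16, vrb.b, 16);
  for (int i = 0; i < 16; i++)
    vrt.b[i] = src[vrc.b[i] & 31];
  return vrt;
}

/* Assumed LE load of four 32-bit elements: store them little-endian,
   then byte-reverse the image, so element 0 ends up rightmost.  */
static vreg le_load_words (const uint32_t w[4])
{
  uint8_t mem[16];
  vreg r;
  for (int k = 0; k < 4; k++)
    for (int j = 0; j < 4; j++)
      mem[4 * k + j] = (uint8_t) (w[k] >> (8 * j));
  for (int i = 0; i < 16; i++)
    r.b[i] = mem[15 - i];
  return r;
}

/* Assumed LE load of sixteen byte elements (the permute control).  */
static vreg le_load_bytes (const uint8_t e[16])
{
  vreg r;
  for (int i = 0; i < 16; i++)
    r.b[i] = e[15 - i];
  return r;
}

/* Read element k of an LE register back as a 32-bit value.  */
static uint32_t le_word (vreg r, int k)
{
  uint32_t v = 0;
  assert (k >= 0 && k < 4);
  for (int j = 0; j < 4; j++)
    v |= (uint32_t) r.b[15 - (4 * k + j)] << (8 * j);
  return v;
}

/* Mirror of the patch's LE correction: subtract each selector element
   from 31 and swap the two inputs.  */
static vreg le_vperm_fixed (vreg op0, vreg op1, const uint8_t sel[16])
{
  uint8_t adj[16];
  for (int i = 0; i < 16; i++)
    adj[i] = (uint8_t) (31 - (sel[i] & 31));
  return vperm (op1, op0, le_load_bytes (adj));
}

/* Print a register left to right in 32-bit groups, as in the comment.  */
static void print_vreg (const char *name, vreg r)
{
  printf ("%-4s =", name);
  for (int i = 0; i < 16; i++)
    printf ("%s%02x", i % 4 == 0 ? " " : "", r.b[i]);
  putchar ('\n');
}
```

With vr10 = le_load_words ({0, 1, 2, 3}), vr11 = le_load_words ({4, 5, 6, 7}) and the selector {0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27}, print_vreg ("vr9", vperm (vr10, vr11, le_load_bytes (sel))) prints the surprising "vr9  = 05000000 07000000 01000000 03000000", while le_vperm_fixed (vr10, vr11, sel) yields LE elements {0, 2, 4, 6}.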

This corrects 18 failing tests in the test suite for the powerpc64le
target, without affecting the big-endian targets.  Bootstrapped and
tested with no new regressions on powerpc64le-unknown-linux-gnu and
powerpc64-unknown-linux-gnu.  Ok for trunk?

Thanks,
Bill


2013-10-06  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const_le): New.
	(altivec_expand_vec_perm_const): Call it.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 203018)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -28426,6 +28526,88 @@ rs6000_emit_parity (rtx dst, rtx src)
     }
 }
 
+/* Expand an Altivec constant permutation for little endian mode.
+   There are two issues: First, the two input operands must be
+   swapped so that together they form a double-wide array in LE
+   order.  Second, the vperm instruction has surprising behavior
+   in LE mode:  it interprets the elements of the source vectors
+   in BE mode ("left to right") and interprets the elements of
+   the destination vector in LE mode ("right to left").  To
+   correct for this, we must subtract each element of the permute
+   control vector from 31.
+
+   For example, suppose we want to concatenate vr10 = {0, 1, 2, 3}
+   with vr11 = {4, 5, 6, 7} and extract {0, 2, 4, 6} using a vperm.
+   We place {0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27} in vr12 to
+   serve as the permute control vector.  Then, in BE mode,
+
+     vperm 9,10,11,12
+
+   places the desired result in vr9.  However, in LE mode the
+   vector contents will be
+
+     vr10 = 00000003 00000002 00000001 00000000
+     vr11 = 00000007 00000006 00000005 00000004
+
+   The result of the vperm using the same permute control vector is
+
+     vr9  = 05000000 07000000 01000000 03000000
+
+   That is, the leftmost 4 bytes of vr10 are interpreted as the
+   source for the rightmost 4 bytes of vr9, and so on.
+
+   If we change the permute control vector to
+
+     vr12 = {31,30,29,28,23,22,21,20,15,14,13,12,7,6,5,4}
+
+   and issue
+
+     vperm 9,11,10,12
+
+   we get the desired
+
+     vr9  = 00000006 00000004 00000002 00000000.  */
+
+void
+altivec_expand_vec_perm_const_le (rtx operands[4])
+{
+  unsigned int i;
+  rtx perm[16];
+  rtx constv, unspec;
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx op1 = operands[2];
+  rtx sel = operands[3];
+
+  /* Unpack and adjust the constant selector.  */
+  for (i = 0; i < 16; ++i)
+    {
+      rtx e = XVECEXP (sel, 0, i);
+      unsigned int elt = 31 - (INTVAL (e) & 31);
+      perm[i] = GEN_INT (elt);
+    }
+
+  /* Expand to a permute, swapping the inputs and using the
+     adjusted selector.  */
+  if (!REG_P (op0))
+    op0 = force_reg (V16QImode, op0);
+  if (!REG_P (op1))
+    op1 = force_reg (V16QImode, op1);
+
+  constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
+  constv = force_reg (V16QImode, constv);
+  unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, constv),
+			   UNSPEC_VPERM);
+  if (!REG_P (target))
+    {
+      rtx tmp = gen_reg_rtx (V16QImode);
+      emit_move_insn (tmp, unspec);
+      unspec = tmp;
+    }
+
+  emit_move_insn (target, unspec);
+}
+
 /* Expand an Altivec constant permutation.  Return true if we match
    an efficient implementation; false to fall back to VPERM.  */
 
@@ -28606,6 +28788,12 @@ altivec_expand_vec_perm_const (rtx operands[4])
 	}
     }
 
+  if (!BYTES_BIG_ENDIAN)
+    {
+      altivec_expand_vec_perm_const_le (operands);
+      return true;
+    }
+
   return false;
 }