From patchwork Thu Mar 6 23:33:55 2014
X-Patchwork-Submitter: Alexander Graf
X-Patchwork-Id: 327656
From: Alexander Graf
To: qemu-devel@nongnu.org
Cc: Peter Maydell, Tom Musta, blauwirbel@gmail.com, qemu-ppc@nongnu.org,
    aliguori@amazon.com, aurelien@aurel32.net
Date: Fri, 7 Mar 2014 00:33:55 +0100
Message-Id: <1394148857-19607-109-git-send-email-agraf@suse.de>
In-Reply-To: <1394148857-19607-1-git-send-email-agraf@suse.de>
References: <1394148857-19607-1-git-send-email-agraf@suse.de>
Subject: [Qemu-devel] [PULL 108/130] target-ppc: Altivec 2.07: Change Bit Masks to Support 64-bit Rotates and Shifts

From: Tom Musta

The existing code in the VROTATE, VSL and VSR macros for the Altivec rotate
and shift helpers uses a formula to compute the bit mask that extracts the
rotate/shift amount from the VRB register.  What is desired is:

    mask = (1 << (3 + log2(sizeof(element)))) - 1

but what is implemented is:

    mask = (1 << (3 + (sizeof(element)/2))) - 1

This produces correct answers when "element" is uint8_t, uint16_t or
uint32_t, but it breaks down when "element" is uint64_t.

This patch corrects the situation.  Since the mask is known at compile time,
the macros are changed to simply accept the mask as an argument.

Subsequent patches in this series will add double-word variants of rotates
and shifts and will thus take advantage of this fix.
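
For illustration only (this little program is not part of the patch and its
names are mine), the two formulas can be compared directly for each element
width.  They agree for 1-, 2- and 4-byte elements but diverge for 8-byte
elements, where the old formula yields 0x7F instead of the required 0x3F:

    /* Standalone sketch: compare the desired mask with the one the old
     * formula produced, for each Altivec element width. */
    #include <stdio.h>

    int main(void)
    {
        static const unsigned int sizes[] = { 1, 2, 4, 8 };  /* u8 .. u64 */
        unsigned int i;

        for (i = 0; i < 4; i++) {
            unsigned int sz = sizes[i];
            unsigned int log2sz = 0;
            unsigned int desired, old;

            /* log2 of a power-of-two size: 1->0, 2->1, 4->2, 8->3 */
            while ((1u << log2sz) < sz) {
                log2sz++;
            }

            desired = (1u << (3 + log2sz)) - 1;   /* mask that is wanted   */
            old = (1u << (3 + (sz >> 1))) - 1;    /* mask the old code got */

            printf("sizeof(element) = %u: desired 0x%02X, old 0x%02X%s\n",
                   sz, desired, old, desired == old ? "" : "  <== mismatch");
        }
        return 0;
    }
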
Signed-off-by: Tom Musta
Signed-off-by: Alexander Graf
---
 target-ppc/int_helper.c | 40 +++++++++++++++-------------------------
 1 file changed, 15 insertions(+), 25 deletions(-)

diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 56e8d9a..59b5a1f 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -1128,23 +1128,20 @@ VRFI(p, float_round_up)
 VRFI(z, float_round_to_zero)
 #undef VRFI
 
-#define VROTATE(suffix, element) \
+#define VROTATE(suffix, element, mask) \
     void helper_vrl##suffix(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) \
     { \
         int i; \
 \
         for (i = 0; i < ARRAY_SIZE(r->element); i++) { \
-            unsigned int mask = ((1 << \
-                                  (3 + (sizeof(a->element[0]) >> 1))) \
-                                 - 1); \
             unsigned int shift = b->element[i] & mask; \
             r->element[i] = (a->element[i] << shift) | \
                 (a->element[i] >> (sizeof(a->element[0]) * 8 - shift)); \
         } \
     }
-VROTATE(b, u8)
-VROTATE(h, u16)
-VROTATE(w, u32)
+VROTATE(b, u8, 0x7)
+VROTATE(h, u16, 0xF)
+VROTATE(w, u32, 0x1F)
 #undef VROTATE
 
 void helper_vrsqrtefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
@@ -1225,23 +1222,20 @@ VSHIFT(r, RIGHT)
 #undef LEFT
 #undef RIGHT
 
-#define VSL(suffix, element) \
+#define VSL(suffix, element, mask) \
     void helper_vsl##suffix(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) \
     { \
         int i; \
 \
         for (i = 0; i < ARRAY_SIZE(r->element); i++) { \
-            unsigned int mask = ((1 << \
-                                  (3 + (sizeof(a->element[0]) >> 1))) \
-                                 - 1); \
             unsigned int shift = b->element[i] & mask; \
 \
             r->element[i] = a->element[i] << shift; \
         } \
     }
-VSL(b, u8)
-VSL(h, u16)
-VSL(w, u32)
+VSL(b, u8, 0x7)
+VSL(h, u16, 0x0F)
+VSL(w, u32, 0x1F)
 #undef VSL
 
 void helper_vsldoi(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t shift)
@@ -1325,26 +1319,22 @@ VSPLTI(h, s16, int16_t)
 VSPLTI(w, s32, int32_t)
 #undef VSPLTI
 
-#define VSR(suffix, element) \
+#define VSR(suffix, element, mask) \
     void helper_vsr##suffix(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) \
     { \
         int i; \
 \
         for (i = 0; i < ARRAY_SIZE(r->element); i++) { \
-            unsigned int mask = ((1 << \
-                                  (3 + (sizeof(a->element[0]) >> 1))) \
-                                 - 1); \
             unsigned int shift = b->element[i] & mask; \
- \
             r->element[i] = a->element[i] >> shift; \
         } \
     }
-VSR(ab, s8)
-VSR(ah, s16)
-VSR(aw, s32)
-VSR(b, u8)
-VSR(h, u16)
-VSR(w, u32)
+VSR(ab, s8, 0x7)
+VSR(ah, s16, 0xF)
+VSR(aw, s32, 0x1F)
+VSR(b, u8, 0x7)
+VSR(h, u16, 0xF)
+VSR(w, u32, 0x1F)
 #undef VSR
 
 void helper_vsro(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
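
As an aside, with the mask now passed in as a macro argument, the double-word
variants that the commit message says later patches in this series add would
only need to supply the 6-bit mask 0x3F (shift amounts 0..63).  A rough,
hypothetical sketch of such an instantiation, assuming a u64 element field
named after the existing u8/u16/u32 pattern (this is not the actual
follow-up patch):

    /* Hypothetical illustration only: 64-bit rotate with a 6-bit mask. */
    VROTATE(d, u64, 0x3F)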