From patchwork Thu Mar 6 23:32:52 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Graf X-Patchwork-Id: 327712 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id D38BD2C0331 for ; Fri, 7 Mar 2014 12:52:36 +1100 (EST) Received: from localhost ([::1]:33007 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WLhwx-00031v-TK for incoming@patchwork.ozlabs.org; Thu, 06 Mar 2014 18:43:39 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:58547) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WLhoE-00015y-VE for qemu-devel@nongnu.org; Thu, 06 Mar 2014 18:35:17 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WLhnx-0003JV-0s for qemu-devel@nongnu.org; Thu, 06 Mar 2014 18:34:38 -0500 Received: from cantor2.suse.de ([195.135.220.15]:55846 helo=mx2.suse.de) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WLhnw-0003Ef-Ls; Thu, 06 Mar 2014 18:34:20 -0500 Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 0F547ACF3; Thu, 6 Mar 2014 23:34:20 +0000 (UTC) From: Alexander Graf To: qemu-devel@nongnu.org Date: Fri, 7 Mar 2014 00:32:52 +0100 Message-Id: <1394148857-19607-46-git-send-email-agraf@suse.de> X-Mailer: git-send-email 1.7.12.4 In-Reply-To: <1394148857-19607-1-git-send-email-agraf@suse.de> References: <1394148857-19607-1-git-send-email-agraf@suse.de> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 195.135.220.15 Cc: Peter Maydell , Tom Musta , blauwirbel@gmail.com, qemu-ppc@nongnu.org, aliguori@amazon.com, aurelien@aurel32.net Subject: [Qemu-devel] [PULL 045/130] target-ppc: VSX Stage 4: Add Scalar SP Fused Multiply-Adds X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org From: Tom Musta This patch adds the Single Precision VSX Scalar Fused Multiply-Add instructions: xsmaddasp, xsmaddmsp, xssubasp, xssubmsp, xsnmaddasp, xsnmaddmsp, xsnmsubasp, xsnmsubmsp. The existing VSX_MADD() macro is modified to support rounding of the intermediate double precision result to single precision. Signed-off-by: Tom Musta Reviewed-by: Richard Henderson Signed-off-by: Alexander Graf --- target-ppc/fpu_helper.c | 82 ++++++++++++++++++++++++++++++++----------------- target-ppc/helper.h | 8 +++++ target-ppc/translate.c | 16 ++++++++++ 3 files changed, 77 insertions(+), 29 deletions(-) diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c index 33da462..7e5003a 100644 --- a/target-ppc/fpu_helper.c +++ b/target-ppc/fpu_helper.c @@ -2192,7 +2192,7 @@ VSX_TSQRT(xvtsqrtsp, 4, float32, f32, -126, 23) * afrm - A form (1=A, 0=M) * sfprf - set FPRF */ -#define VSX_MADD(op, nels, tp, fld, maddflgs, afrm, sfprf) \ +#define VSX_MADD(op, nels, tp, fld, maddflgs, afrm, sfprf, r2sp) \ void helper_##op(CPUPPCState *env, uint32_t opcode) \ { \ ppc_vsr_t xt_in, xa, xb, xt_out; \ @@ -2218,8 +2218,18 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ for (i = 0; i < nels; i++) { \ float_status tstat = env->fp_status; \ set_float_exception_flags(0, &tstat); \ - xt_out.fld[i] = tp##_muladd(xa.fld[i], b->fld[i], c->fld[i], \ - maddflgs, &tstat); \ + if (r2sp && (tstat.float_rounding_mode == float_round_nearest_even)) {\ + /* Avoid double rounding errors by rounding the intermediate */ \ + /* result to odd. */ \ + set_float_rounding_mode(float_round_to_zero, &tstat); \ + xt_out.fld[i] = tp##_muladd(xa.fld[i], b->fld[i], c->fld[i], \ + maddflgs, &tstat); \ + xt_out.fld[i] |= (get_float_exception_flags(&tstat) & \ + float_flag_inexact) != 0; \ + } else { \ + xt_out.fld[i] = tp##_muladd(xa.fld[i], b->fld[i], c->fld[i], \ + maddflgs, &tstat); \ + } \ env->fp_status.float_exception_flags |= tstat.float_exception_flags; \ \ if (unlikely(tstat.float_exception_flags & float_flag_invalid)) { \ @@ -2242,6 +2252,11 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXISI, sfprf); \ } \ } \ + \ + if (r2sp) { \ + xt_out.fld[i] = helper_frsp(env, xt_out.fld[i]); \ + } \ + \ if (sfprf) { \ helper_compute_fprf(env, xt_out.fld[i], sfprf); \ } \ @@ -2255,32 +2270,41 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ #define NMADD_FLGS float_muladd_negate_result #define NMSUB_FLGS (float_muladd_negate_c | float_muladd_negate_result) -VSX_MADD(xsmaddadp, 1, float64, f64, MADD_FLGS, 1, 1) -VSX_MADD(xsmaddmdp, 1, float64, f64, MADD_FLGS, 0, 1) -VSX_MADD(xsmsubadp, 1, float64, f64, MSUB_FLGS, 1, 1) -VSX_MADD(xsmsubmdp, 1, float64, f64, MSUB_FLGS, 0, 1) -VSX_MADD(xsnmaddadp, 1, float64, f64, NMADD_FLGS, 1, 1) -VSX_MADD(xsnmaddmdp, 1, float64, f64, NMADD_FLGS, 0, 1) -VSX_MADD(xsnmsubadp, 1, float64, f64, NMSUB_FLGS, 1, 1) -VSX_MADD(xsnmsubmdp, 1, float64, f64, NMSUB_FLGS, 0, 1) - -VSX_MADD(xvmaddadp, 2, float64, f64, MADD_FLGS, 1, 0) -VSX_MADD(xvmaddmdp, 2, float64, f64, MADD_FLGS, 0, 0) -VSX_MADD(xvmsubadp, 2, float64, f64, MSUB_FLGS, 1, 0) -VSX_MADD(xvmsubmdp, 2, float64, f64, MSUB_FLGS, 0, 0) -VSX_MADD(xvnmaddadp, 2, float64, f64, NMADD_FLGS, 1, 0) -VSX_MADD(xvnmaddmdp, 2, float64, f64, NMADD_FLGS, 0, 0) -VSX_MADD(xvnmsubadp, 2, float64, f64, NMSUB_FLGS, 1, 0) -VSX_MADD(xvnmsubmdp, 2, float64, f64, NMSUB_FLGS, 0, 0) - -VSX_MADD(xvmaddasp, 4, float32, f32, MADD_FLGS, 1, 0) -VSX_MADD(xvmaddmsp, 4, float32, f32, MADD_FLGS, 0, 0) -VSX_MADD(xvmsubasp, 4, float32, f32, MSUB_FLGS, 1, 0) -VSX_MADD(xvmsubmsp, 4, float32, f32, MSUB_FLGS, 0, 0) -VSX_MADD(xvnmaddasp, 4, float32, f32, NMADD_FLGS, 1, 0) -VSX_MADD(xvnmaddmsp, 4, float32, f32, NMADD_FLGS, 0, 0) -VSX_MADD(xvnmsubasp, 4, float32, f32, NMSUB_FLGS, 1, 0) -VSX_MADD(xvnmsubmsp, 4, float32, f32, NMSUB_FLGS, 0, 0) +VSX_MADD(xsmaddadp, 1, float64, f64, MADD_FLGS, 1, 1, 0) +VSX_MADD(xsmaddmdp, 1, float64, f64, MADD_FLGS, 0, 1, 0) +VSX_MADD(xsmsubadp, 1, float64, f64, MSUB_FLGS, 1, 1, 0) +VSX_MADD(xsmsubmdp, 1, float64, f64, MSUB_FLGS, 0, 1, 0) +VSX_MADD(xsnmaddadp, 1, float64, f64, NMADD_FLGS, 1, 1, 0) +VSX_MADD(xsnmaddmdp, 1, float64, f64, NMADD_FLGS, 0, 1, 0) +VSX_MADD(xsnmsubadp, 1, float64, f64, NMSUB_FLGS, 1, 1, 0) +VSX_MADD(xsnmsubmdp, 1, float64, f64, NMSUB_FLGS, 0, 1, 0) + +VSX_MADD(xsmaddasp, 1, float64, f64, MADD_FLGS, 1, 1, 1) +VSX_MADD(xsmaddmsp, 1, float64, f64, MADD_FLGS, 0, 1, 1) +VSX_MADD(xsmsubasp, 1, float64, f64, MSUB_FLGS, 1, 1, 1) +VSX_MADD(xsmsubmsp, 1, float64, f64, MSUB_FLGS, 0, 1, 1) +VSX_MADD(xsnmaddasp, 1, float64, f64, NMADD_FLGS, 1, 1, 1) +VSX_MADD(xsnmaddmsp, 1, float64, f64, NMADD_FLGS, 0, 1, 1) +VSX_MADD(xsnmsubasp, 1, float64, f64, NMSUB_FLGS, 1, 1, 1) +VSX_MADD(xsnmsubmsp, 1, float64, f64, NMSUB_FLGS, 0, 1, 1) + +VSX_MADD(xvmaddadp, 2, float64, f64, MADD_FLGS, 1, 0, 0) +VSX_MADD(xvmaddmdp, 2, float64, f64, MADD_FLGS, 0, 0, 0) +VSX_MADD(xvmsubadp, 2, float64, f64, MSUB_FLGS, 1, 0, 0) +VSX_MADD(xvmsubmdp, 2, float64, f64, MSUB_FLGS, 0, 0, 0) +VSX_MADD(xvnmaddadp, 2, float64, f64, NMADD_FLGS, 1, 0, 0) +VSX_MADD(xvnmaddmdp, 2, float64, f64, NMADD_FLGS, 0, 0, 0) +VSX_MADD(xvnmsubadp, 2, float64, f64, NMSUB_FLGS, 1, 0, 0) +VSX_MADD(xvnmsubmdp, 2, float64, f64, NMSUB_FLGS, 0, 0, 0) + +VSX_MADD(xvmaddasp, 4, float32, f32, MADD_FLGS, 1, 0, 0) +VSX_MADD(xvmaddmsp, 4, float32, f32, MADD_FLGS, 0, 0, 0) +VSX_MADD(xvmsubasp, 4, float32, f32, MSUB_FLGS, 1, 0, 0) +VSX_MADD(xvmsubmsp, 4, float32, f32, MSUB_FLGS, 0, 0, 0) +VSX_MADD(xvnmaddasp, 4, float32, f32, NMADD_FLGS, 1, 0, 0) +VSX_MADD(xvnmaddmsp, 4, float32, f32, NMADD_FLGS, 0, 0, 0) +VSX_MADD(xvnmsubasp, 4, float32, f32, NMSUB_FLGS, 1, 0, 0) +VSX_MADD(xvnmsubmsp, 4, float32, f32, NMSUB_FLGS, 0, 0, 0) #define VSX_SCALAR_CMP(op, ordered) \ void helper_##op(CPUPPCState *env, uint32_t opcode) \ diff --git a/target-ppc/helper.h b/target-ppc/helper.h index 84c6ee1..655b670 100644 --- a/target-ppc/helper.h +++ b/target-ppc/helper.h @@ -293,6 +293,14 @@ DEF_HELPER_2(xsdivsp, void, env, i32) DEF_HELPER_2(xsresp, void, env, i32) DEF_HELPER_2(xssqrtsp, void, env, i32) DEF_HELPER_2(xsrsqrtesp, void, env, i32) +DEF_HELPER_2(xsmaddasp, void, env, i32) +DEF_HELPER_2(xsmaddmsp, void, env, i32) +DEF_HELPER_2(xsmsubasp, void, env, i32) +DEF_HELPER_2(xsmsubmsp, void, env, i32) +DEF_HELPER_2(xsnmaddasp, void, env, i32) +DEF_HELPER_2(xsnmaddmsp, void, env, i32) +DEF_HELPER_2(xsnmsubasp, void, env, i32) +DEF_HELPER_2(xsnmsubmsp, void, env, i32) DEF_HELPER_2(xvadddp, void, env, i32) DEF_HELPER_2(xvsubdp, void, env, i32) diff --git a/target-ppc/translate.c b/target-ppc/translate.c index fcf9517..b2a610c 100644 --- a/target-ppc/translate.c +++ b/target-ppc/translate.c @@ -7365,6 +7365,14 @@ GEN_VSX_HELPER_2(xsdivsp, 0x00, 0x03, 0, PPC2_VSX207) GEN_VSX_HELPER_2(xsresp, 0x14, 0x01, 0, PPC2_VSX207) GEN_VSX_HELPER_2(xssqrtsp, 0x16, 0x00, 0, PPC2_VSX207) GEN_VSX_HELPER_2(xsrsqrtesp, 0x14, 0x00, 0, PPC2_VSX207) +GEN_VSX_HELPER_2(xsmaddasp, 0x04, 0x00, 0, PPC2_VSX207) +GEN_VSX_HELPER_2(xsmaddmsp, 0x04, 0x01, 0, PPC2_VSX207) +GEN_VSX_HELPER_2(xsmsubasp, 0x04, 0x02, 0, PPC2_VSX207) +GEN_VSX_HELPER_2(xsmsubmsp, 0x04, 0x03, 0, PPC2_VSX207) +GEN_VSX_HELPER_2(xsnmaddasp, 0x04, 0x10, 0, PPC2_VSX207) +GEN_VSX_HELPER_2(xsnmaddmsp, 0x04, 0x11, 0, PPC2_VSX207) +GEN_VSX_HELPER_2(xsnmsubasp, 0x04, 0x12, 0, PPC2_VSX207) +GEN_VSX_HELPER_2(xsnmsubmsp, 0x04, 0x13, 0, PPC2_VSX207) GEN_VSX_HELPER_2(xvadddp, 0x00, 0x0C, 0, PPC2_VSX) GEN_VSX_HELPER_2(xvsubdp, 0x00, 0x0D, 0, PPC2_VSX) @@ -10179,6 +10187,14 @@ GEN_XX3FORM(xsdivsp, 0x00, 0x03, PPC2_VSX207), GEN_XX2FORM(xsresp, 0x14, 0x01, PPC2_VSX207), GEN_XX2FORM(xssqrtsp, 0x16, 0x00, PPC2_VSX207), GEN_XX2FORM(xsrsqrtesp, 0x14, 0x00, PPC2_VSX207), +GEN_XX3FORM(xsmaddasp, 0x04, 0x00, PPC2_VSX207), +GEN_XX3FORM(xsmaddmsp, 0x04, 0x01, PPC2_VSX207), +GEN_XX3FORM(xsmsubasp, 0x04, 0x02, PPC2_VSX207), +GEN_XX3FORM(xsmsubmsp, 0x04, 0x03, PPC2_VSX207), +GEN_XX3FORM(xsnmaddasp, 0x04, 0x10, PPC2_VSX207), +GEN_XX3FORM(xsnmaddmsp, 0x04, 0x11, PPC2_VSX207), +GEN_XX3FORM(xsnmsubasp, 0x04, 0x12, PPC2_VSX207), +GEN_XX3FORM(xsnmsubmsp, 0x04, 0x13, PPC2_VSX207), GEN_XX3FORM(xvadddp, 0x00, 0x0C, PPC2_VSX), GEN_XX3FORM(xvsubdp, 0x00, 0x0D, PPC2_VSX),