From patchwork Fri Jan 3 18:22:09 2014
X-Patchwork-Submitter: Tom Musta
X-Patchwork-Id: 306661
From: Tom Musta <tommusta@gmail.com>
To: qemu-devel@nongnu.org
Cc: Tom Musta <tommusta@gmail.com>, qemu-ppc@nongnu.org
Date: Fri, 3 Jan 2014 12:22:09 -0600
Message-Id: <1388773331-787-13-git-send-email-tommusta@gmail.com>
In-Reply-To: <1388773331-787-1-git-send-email-tommusta@gmail.com>
References: <1388773331-787-1-git-send-email-tommusta@gmail.com>
Subject: [Qemu-devel] [V5 PATCH 12/14] target-ppc: VSX Stage 4: Add Scalar SP Fused Multiply-Adds

This patch adds the Single Precision VSX Scalar Fused Multiply-Add
instructions: xsmaddasp, xsmaddmsp, xsmsubasp, xsmsubmsp, xsnmaddasp,
xsnmaddmsp, xsnmsubasp, xsnmsubmsp.

The existing VSX_MADD() macro is modified to support rounding of the
intermediate double-precision result to single precision.
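For background on the round-to-odd technique used by this patch: rounding
an exact result to double precision and then to single precision can give
a different answer than rounding once directly to single precision
("double rounding"). Performing the intermediate multiply-add in
round-toward-zero and OR-ing the inexact flag into the least significant
bit of the significand preserves enough information for the final
narrowing to round correctly. The following standalone C sketch is
illustrative only and is not part of the patch (the fma_round_to_odd
helper and the operand values are constructed for this example); it
reproduces the failure and the fix on a host with C99 fenv/math support:

/* Compile with, e.g.:  gcc -std=c99 -frounding-math demo.c -lm */
#include <fenv.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* fma computed in round-toward-zero, with the inexact flag OR-ed into
 * the least significant bit of the result ("round to odd"); this
 * mirrors the float_round_to_zero/float_flag_inexact logic added to
 * VSX_MADD below. */
double fma_round_to_odd(double a, double b, double c)
{
    int save = fegetround();
    fesetround(FE_TOWARDZERO);
    feclearexcept(FE_INEXACT);
    double r = fma(a, b, c);
    if (fetestexcept(FE_INEXACT)) {
        uint64_t bits;
        memcpy(&bits, &r, sizeof(bits));
        bits |= 1;                    /* force the significand odd */
        memcpy(&r, &bits, sizeof(r));
    }
    fesetround(save);
    return r;
}

int main(void)
{
    /* Operands chosen so that a*b + c is exactly halfway between two
     * single-precision values plus a remainder too small for a double:
     * a*b + c = 1 + 2^-24 + 2^-70. */
    float a = 0x1.000002p0f;    /* 1 + 2^-23     */
    float b = 0x1.fffffep-1f;   /* 1 - 2^-24     */
    float c = 0x1.000002p-47f;  /* 2^-47 + 2^-70 */

    float naive = (float)fma(a, b, c);              /* double-rounds */
    float odd   = (float)fma_round_to_odd(a, b, c); /* corrected     */
    float exact = fmaf(a, b, c);                    /* reference     */

    printf("naive=%a  round-to-odd=%a  fmaf=%a\n", naive, odd, exact);
    return 0;
}

On a round-to-nearest host this prints naive=0x1p+0, while the
round-to-odd path and fmaf both give 0x1.000002p+0. This is also why the
helper below applies the workaround only when the guest rounding mode is
round-to-nearest-even: the directed rounding modes cannot double-round,
since rounding twice in the same direction is the same as rounding once.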
Signed-off-by: Tom Musta <tommusta@gmail.com>
Reviewed-by: Richard Henderson
---
V2: Re-implemented per feedback from Richard Henderson. In order to avoid
double rounding and incorrect results, the operands must be converted to
true single-precision values and the single-precision fused multiply/add
routine used.

V3: Re-implemented per feedback from Richard Henderson (I did not fully
understand his comment when I implemented V2).

V4: Changed to use helper_frsp (inadvertently re-injected when I used an
earlier patch). Thanks to Richard Henderson for catching this.

 target-ppc/fpu_helper.c | 82 ++++++++++++++++++++++++++++++----------------
 target-ppc/helper.h     |  8 ++++
 target-ppc/translate.c  | 16 +++++++++
 3 files changed, 77 insertions(+), 29 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index 33da462..7e5003a 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -2192,7 +2192,7 @@ VSX_TSQRT(xvtsqrtsp, 4, float32, f32, -126, 23)
  *   afrm  - A form (1=A, 0=M)
  *   sfprf - set FPRF
  */
-#define VSX_MADD(op, nels, tp, fld, maddflgs, afrm, sfprf)                    \
+#define VSX_MADD(op, nels, tp, fld, maddflgs, afrm, sfprf, r2sp)              \
 void helper_##op(CPUPPCState *env, uint32_t opcode)                          \
 {                                                                            \
     ppc_vsr_t xt_in, xa, xb, xt_out;                                         \
@@ -2218,8 +2218,18 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                          \
     for (i = 0; i < nels; i++) {                                             \
         float_status tstat = env->fp_status;                                 \
         set_float_exception_flags(0, &tstat);                                \
-        xt_out.fld[i] = tp##_muladd(xa.fld[i], b->fld[i], c->fld[i],         \
-                                    maddflgs, &tstat);                       \
+        if (r2sp && (tstat.float_rounding_mode == float_round_nearest_even)) {\
+            /* Avoid double rounding errors by rounding the intermediate */  \
+            /* result to odd.                                            */  \
+            set_float_rounding_mode(float_round_to_zero, &tstat);            \
+            xt_out.fld[i] = tp##_muladd(xa.fld[i], b->fld[i], c->fld[i],     \
+                                        maddflgs, &tstat);                   \
+            xt_out.fld[i] |= (get_float_exception_flags(&tstat) &            \
+                              float_flag_inexact) != 0;                      \
+        } else {                                                             \
+            xt_out.fld[i] = tp##_muladd(xa.fld[i], b->fld[i], c->fld[i],     \
+                                        maddflgs, &tstat);                   \
+        }                                                                    \
         env->fp_status.float_exception_flags |= tstat.float_exception_flags; \
                                                                              \
         if (unlikely(tstat.float_exception_flags & float_flag_invalid)) {    \
@@ -2242,6 +2252,11 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                          \
                 fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXISI, sfprf);    \
             }                                                                \
         }                                                                    \
+                                                                             \
+        if (r2sp) {                                                          \
+            xt_out.fld[i] = helper_frsp(env, xt_out.fld[i]);                 \
+        }                                                                    \
+                                                                             \
         if (sfprf) {                                                         \
             helper_compute_fprf(env, xt_out.fld[i], sfprf);                  \
         }                                                                    \
@@ -2255,32 +2270,41 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                          \
 #define NMADD_FLGS float_muladd_negate_result
 #define NMSUB_FLGS (float_muladd_negate_c | float_muladd_negate_result)
 
-VSX_MADD(xsmaddadp, 1, float64, f64, MADD_FLGS, 1, 1)
-VSX_MADD(xsmaddmdp, 1, float64, f64, MADD_FLGS, 0, 1)
-VSX_MADD(xsmsubadp, 1, float64, f64, MSUB_FLGS, 1, 1)
-VSX_MADD(xsmsubmdp, 1, float64, f64, MSUB_FLGS, 0, 1)
-VSX_MADD(xsnmaddadp, 1, float64, f64, NMADD_FLGS, 1, 1)
-VSX_MADD(xsnmaddmdp, 1, float64, f64, NMADD_FLGS, 0, 1)
-VSX_MADD(xsnmsubadp, 1, float64, f64, NMSUB_FLGS, 1, 1)
-VSX_MADD(xsnmsubmdp, 1, float64, f64, NMSUB_FLGS, 0, 1)
-
-VSX_MADD(xvmaddadp, 2, float64, f64, MADD_FLGS, 1, 0)
-VSX_MADD(xvmaddmdp, 2, float64, f64, MADD_FLGS, 0, 0)
-VSX_MADD(xvmsubadp, 2, float64, f64, MSUB_FLGS, 1, 0)
-VSX_MADD(xvmsubmdp, 2, float64, f64, MSUB_FLGS, 0, 0)
-VSX_MADD(xvnmaddadp, 2, float64, f64, NMADD_FLGS, 1, 0)
-VSX_MADD(xvnmaddmdp, 2, float64, f64, NMADD_FLGS, 0, 0)
-VSX_MADD(xvnmsubadp, 2, float64, f64, NMSUB_FLGS, 1, 0)
-VSX_MADD(xvnmsubmdp, 2, float64, f64, NMSUB_FLGS, 0, 0)
-
-VSX_MADD(xvmaddasp, 4, float32, f32, MADD_FLGS, 1, 0)
-VSX_MADD(xvmaddmsp, 4, float32, f32, MADD_FLGS, 0, 0)
-VSX_MADD(xvmsubasp, 4, float32, f32, MSUB_FLGS, 1, 0)
-VSX_MADD(xvmsubmsp, 4, float32, f32, MSUB_FLGS, 0, 0)
-VSX_MADD(xvnmaddasp, 4, float32, f32, NMADD_FLGS, 1, 0)
-VSX_MADD(xvnmaddmsp, 4, float32, f32, NMADD_FLGS, 0, 0)
-VSX_MADD(xvnmsubasp, 4, float32, f32, NMSUB_FLGS, 1, 0)
-VSX_MADD(xvnmsubmsp, 4, float32, f32, NMSUB_FLGS, 0, 0)
+VSX_MADD(xsmaddadp, 1, float64, f64, MADD_FLGS, 1, 1, 0)
+VSX_MADD(xsmaddmdp, 1, float64, f64, MADD_FLGS, 0, 1, 0)
+VSX_MADD(xsmsubadp, 1, float64, f64, MSUB_FLGS, 1, 1, 0)
+VSX_MADD(xsmsubmdp, 1, float64, f64, MSUB_FLGS, 0, 1, 0)
+VSX_MADD(xsnmaddadp, 1, float64, f64, NMADD_FLGS, 1, 1, 0)
+VSX_MADD(xsnmaddmdp, 1, float64, f64, NMADD_FLGS, 0, 1, 0)
+VSX_MADD(xsnmsubadp, 1, float64, f64, NMSUB_FLGS, 1, 1, 0)
+VSX_MADD(xsnmsubmdp, 1, float64, f64, NMSUB_FLGS, 0, 1, 0)
+
+VSX_MADD(xsmaddasp, 1, float64, f64, MADD_FLGS, 1, 1, 1)
+VSX_MADD(xsmaddmsp, 1, float64, f64, MADD_FLGS, 0, 1, 1)
+VSX_MADD(xsmsubasp, 1, float64, f64, MSUB_FLGS, 1, 1, 1)
+VSX_MADD(xsmsubmsp, 1, float64, f64, MSUB_FLGS, 0, 1, 1)
+VSX_MADD(xsnmaddasp, 1, float64, f64, NMADD_FLGS, 1, 1, 1)
+VSX_MADD(xsnmaddmsp, 1, float64, f64, NMADD_FLGS, 0, 1, 1)
+VSX_MADD(xsnmsubasp, 1, float64, f64, NMSUB_FLGS, 1, 1, 1)
+VSX_MADD(xsnmsubmsp, 1, float64, f64, NMSUB_FLGS, 0, 1, 1)
+
+VSX_MADD(xvmaddadp, 2, float64, f64, MADD_FLGS, 1, 0, 0)
+VSX_MADD(xvmaddmdp, 2, float64, f64, MADD_FLGS, 0, 0, 0)
+VSX_MADD(xvmsubadp, 2, float64, f64, MSUB_FLGS, 1, 0, 0)
+VSX_MADD(xvmsubmdp, 2, float64, f64, MSUB_FLGS, 0, 0, 0)
+VSX_MADD(xvnmaddadp, 2, float64, f64, NMADD_FLGS, 1, 0, 0)
+VSX_MADD(xvnmaddmdp, 2, float64, f64, NMADD_FLGS, 0, 0, 0)
+VSX_MADD(xvnmsubadp, 2, float64, f64, NMSUB_FLGS, 1, 0, 0)
+VSX_MADD(xvnmsubmdp, 2, float64, f64, NMSUB_FLGS, 0, 0, 0)
+
+VSX_MADD(xvmaddasp, 4, float32, f32, MADD_FLGS, 1, 0, 0)
+VSX_MADD(xvmaddmsp, 4, float32, f32, MADD_FLGS, 0, 0, 0)
+VSX_MADD(xvmsubasp, 4, float32, f32, MSUB_FLGS, 1, 0, 0)
+VSX_MADD(xvmsubmsp, 4, float32, f32, MSUB_FLGS, 0, 0, 0)
+VSX_MADD(xvnmaddasp, 4, float32, f32, NMADD_FLGS, 1, 0, 0)
+VSX_MADD(xvnmaddmsp, 4, float32, f32, NMADD_FLGS, 0, 0, 0)
+VSX_MADD(xvnmsubasp, 4, float32, f32, NMSUB_FLGS, 1, 0, 0)
+VSX_MADD(xvnmsubmsp, 4, float32, f32, NMSUB_FLGS, 0, 0, 0)
 
 #define VSX_SCALAR_CMP(op, ordered)                                      \
 void helper_##op(CPUPPCState *env, uint32_t opcode)                      \
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 84c6ee1..655b670 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -293,6 +293,14 @@ DEF_HELPER_2(xsdivsp, void, env, i32)
 DEF_HELPER_2(xsresp, void, env, i32)
 DEF_HELPER_2(xssqrtsp, void, env, i32)
 DEF_HELPER_2(xsrsqrtesp, void, env, i32)
+DEF_HELPER_2(xsmaddasp, void, env, i32)
+DEF_HELPER_2(xsmaddmsp, void, env, i32)
+DEF_HELPER_2(xsmsubasp, void, env, i32)
+DEF_HELPER_2(xsmsubmsp, void, env, i32)
+DEF_HELPER_2(xsnmaddasp, void, env, i32)
+DEF_HELPER_2(xsnmaddmsp, void, env, i32)
+DEF_HELPER_2(xsnmsubasp, void, env, i32)
+DEF_HELPER_2(xsnmsubmsp, void, env, i32)
 
 DEF_HELPER_2(xvadddp, void, env, i32)
 DEF_HELPER_2(xvsubdp, void, env, i32)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 950c02e..5b68c7e 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -7365,6 +7365,14 @@ GEN_VSX_HELPER_2(xsdivsp, 0x00, 0x03, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xsresp, 0x14, 0x01, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xssqrtsp, 0x16, 0x00, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xsrsqrtesp, 0x14, 0x00, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsmaddasp, 0x04, 0x00, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsmaddmsp, 0x04, 0x01, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsmsubasp, 0x04, 0x02, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsmsubmsp, 0x04, 0x03, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsnmaddasp, 0x04, 0x10, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsnmaddmsp, 0x04, 0x11, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsnmsubasp, 0x04, 0x12, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsnmsubmsp, 0x04, 0x13, 0, PPC2_VSX207)
 
 GEN_VSX_HELPER_2(xvadddp, 0x00, 0x0C, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xvsubdp, 0x00, 0x0D, 0, PPC2_VSX)
@@ -10179,6 +10187,14 @@ GEN_XX3FORM(xsdivsp, 0x00, 0x03, PPC2_VSX207),
 GEN_XX2FORM(xsresp, 0x14, 0x01, PPC2_VSX207),
 GEN_XX2FORM(xssqrtsp, 0x16, 0x00, PPC2_VSX207),
 GEN_XX2FORM(xsrsqrtesp, 0x14, 0x00, PPC2_VSX207),
+GEN_XX3FORM(xsmaddasp, 0x04, 0x00, PPC2_VSX207),
+GEN_XX3FORM(xsmaddmsp, 0x04, 0x01, PPC2_VSX207),
+GEN_XX3FORM(xsmsubasp, 0x04, 0x02, PPC2_VSX207),
+GEN_XX3FORM(xsmsubmsp, 0x04, 0x03, PPC2_VSX207),
+GEN_XX3FORM(xsnmaddasp, 0x04, 0x10, PPC2_VSX207),
+GEN_XX3FORM(xsnmaddmsp, 0x04, 0x11, PPC2_VSX207),
+GEN_XX3FORM(xsnmsubasp, 0x04, 0x12, PPC2_VSX207),
+GEN_XX3FORM(xsnmsubmsp, 0x04, 0x13, PPC2_VSX207),
 GEN_XX3FORM(xvadddp, 0x00, 0x0C, PPC2_VSX),
 GEN_XX3FORM(xvsubdp, 0x00, 0x0D, PPC2_VSX),
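As a sanity check on the identity the new helpers rely on (round-to-zero
muladd, sticky least-significant bit, then helper_frsp narrowing), the
narrowed round-to-odd double can be compared against a correctly rounded
single-precision FMA over random operands. The harness below is a
hypothetical test, not part of the patch; it reuses fma_round_to_odd from
the earlier sketch (compile the two files together) and assumes the
host's fmaf is correctly rounded:

/* Brute-force check: narrowing the round-to-odd double must equal a
 * single-rounded fmaf for all finite float operands. */
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

double fma_round_to_odd(double a, double b, double c); /* earlier sketch */

/* Random finite float built from random bit patterns. */
static float random_float(void)
{
    for (;;) {
        uint32_t u = ((uint32_t)rand() << 16) ^ (uint32_t)rand();
        float f;
        memcpy(&f, &u, sizeof(f));
        if (isfinite(f)) {
            return f;
        }
    }
}

int main(void)
{
    srand(1);
    int failures = 0;
    for (int i = 0; i < 1000000; i++) {
        float a = random_float(), b = random_float(), c = random_float();
        float via_odd = (float)fma_round_to_odd(a, b, c);
        float ref = fmaf(a, b, c);
        if (memcmp(&via_odd, &ref, sizeof(float)) != 0) {
            printf("mismatch: a=%a b=%a c=%a got=%a want=%a\n",
                   a, b, c, via_odd, ref);
            failures++;
        }
    }
    printf("%d failures\n", failures);
    return failures != 0;
}

Note that the real helpers additionally handle NaN and invalid-operation
cases and set FPRF, which this harness does not model.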