From patchwork Tue Jul 19 21:02:51 2011
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 105542
Date: Tue, 19 Jul 2011 17:02:51 -0400
From: Michael Meissner
To: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com
Subject: [PATCH], Add 4 operand FMA support back into power7
Message-ID: <20110719210251.GA9554@hungry-tiger.westford.ibm.com>

(I had an emacs failure when sending out this message, and I may have sent out
a blank message by accident; sorry if I did.)

When I did the original power7 support, I switched all of the floating point
operations to use VSX instructions instead of the traditional instructions.
The traditional fused multiply and add instructions (FMA) have 4 operands, so
the destination can be a separate register from the inputs, while the VSX
encoding requires the output to overlap the addend or one of the multiplicands.
Occasionally you would get better code if the register allocator had the
freedom to use a different output register.

This patch adds 4 operand FMAs back for scalar double precision. It also adds
the Altivec 4 operand FMAs for vector single precision where the Altivec
instruction set provides suitable instructions. It also adjusts the tests that
depended on the VSX form of the instructions being generated.

I have bootstrapped the patches and rerun the test suite with no regressions.
In addition, I have built and run all of SPEC 2006 with the patches. Are these
patches ok to install in GCC 4.7?

[gcc]
2011-07-19  Michael Meissner

	* config/rs6000/vsx.md (vsx_fma*): Use 4 argument fma instructions
	where we can use them from the standard and altivec instruction sets,
	instead of always using the 3 operand VSX forms that require the
	destination to overlap one of the inputs.
	(vsx_fms*): Ditto.
	(vsx_fnma*): Ditto.
	(vsx_fnms*): Ditto.
	* config/rs6000/rs6000.md (fmadf4_fpr): Set fp_type fp_maddsub_d
	for DF types.
	(fmsdf4_fpr): Ditto.
	(nfmadf4_fpr): Ditto.
	(nfmsdf4_fpr): Ditto.

[gcc/testsuite]
2011-07-12  Michael Meissner

	* gcc.target/powerpc/ppc-fma-1.c: Adjust to allow non-VSX fmas to be
	generated.
	* gcc.target/powerpc/ppc-fma-2.c: Ditto.
	* gcc.target/powerpc/recip-3.c: Ditto.

Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md (revision 176207)
+++ gcc/config/rs6000/vsx.md (working copy)
@@ -524,46 +524,112 @@ (define_insn "*vsx_tsqrt2_internal
   [(set_attr "type" "")
    (set_attr "fp_type" "")])
 
-;; Fused vector multiply/add instructions
+;; Fused vector multiply/add instructions Support the classical DF versions of
+;; fma, which allows the target to be a separate register from the 3 inputs.
+;; Under VSX, the target must be either the addend or the first multiply.
+;; Where we can, also do the same for the Altivec V4SF fmas.
+
+(define_insn "*vsx_fmadf4"
+  [(set (match_operand:DF 0 "vsx_register_operand" "=ws,ws,?wa,?wa,d")
+        (fma:DF
+          (match_operand:DF 1 "vsx_register_operand" "%ws,ws,wa,wa,d")
+          (match_operand:DF 2 "vsx_register_operand" "ws,0,wa,0,d")
+          (match_operand:DF 3 "vsx_register_operand" "0,ws,0,wa,d")))]
+  "VECTOR_UNIT_VSX_P (DFmode)"
+  "@
+   xsmaddadp %x0,%x1,%x2
+   xsmaddmdp %x0,%x1,%x3
+   xsmaddadp %x0,%x1,%x2
+   xsmaddmdp %x0,%x1,%x3
+   {fma|fmadd} %0,%1,%2,%3"
+  [(set_attr "type" "fp")
+   (set_attr "fp_type" "fp_maddsub_d")])
+
+(define_insn "*vsx_fmav4sf4"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=ws,ws,?wa,?wa,v")
+        (fma:V4SF
+          (match_operand:V4SF 1 "vsx_register_operand" "%ws,ws,wa,wa,v")
+          (match_operand:V4SF 2 "vsx_register_operand" "ws,0,wa,0,v")
+          (match_operand:V4SF 3 "vsx_register_operand" "0,ws,0,wa,v")))]
+  "VECTOR_UNIT_VSX_P (V4SFmode)"
+  "@
+   xvmaddasp %x0,%x1,%x2
+   xvmaddmsp %x0,%x1,%x3
+   xvmaddasp %x0,%x1,%x2
+   xvmaddmsp %x0,%x1,%x3
+   vmaddfp %0,%1,%2,%3"
+  [(set_attr "type" "vecfloat")])
 
-(define_insn "*vsx_fma4"
-  [(set (match_operand:VSX_B 0 "vsx_register_operand" "=,,?wa,?wa")
-        (fma:VSX_B
-          (match_operand:VSX_B 1 "vsx_register_operand" "%,,wa,wa")
-          (match_operand:VSX_B 2 "vsx_register_operand" ",0,wa,0")
-          (match_operand:VSX_B 3 "vsx_register_operand" "0,,0,wa")))]
-  "VECTOR_UNIT_VSX_P (mode)"
+(define_insn "*vsx_fmav2df4"
+  [(set (match_operand:V2DF 0 "vsx_register_operand" "=ws,ws,?wa,?wa")
+        (fma:V2DF
+          (match_operand:V2DF 1 "vsx_register_operand" "%ws,ws,wa,wa")
+          (match_operand:V2DF 2 "vsx_register_operand" "ws,0,wa,0")
+          (match_operand:V2DF 3 "vsx_register_operand" "0,ws,0,wa")))]
+  "VECTOR_UNIT_VSX_P (V2DFmode)"
   "@
-   xmadda %x0,%x1,%x2
-   xmaddm %x0,%x1,%x3
-   xmadda %x0,%x1,%x2
-   xmaddm %x0,%x1,%x3"
-  [(set_attr "type" "")
-   (set_attr "fp_type" "")])
+   xvmaddadp %x0,%x1,%x2
+   xvmaddmdp %x0,%x1,%x3
+   xvmaddadp %x0,%x1,%x2
+   xvmaddmdp %x0,%x1,%x3"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "*vsx_fmsdf4"
+  [(set (match_operand:DF 0 "vsx_register_operand" "=ws,ws,?wa,?wa,d")
+        (fma:DF
+          (match_operand:DF 1 "vsx_register_operand" "%ws,ws,wa,wa,d")
+          (match_operand:DF 2 "vsx_register_operand" "ws,0,wa,0,d")
+          (neg:DF
+            (match_operand:DF 3 "vsx_register_operand" "0,ws,0,wa,d"))))]
+  "VECTOR_UNIT_VSX_P (DFmode)"
+  "@
+   xsmsubadp %x0,%x1,%x2
+   xsmsubmdp %x0,%x1,%x3
+   xsmsubadp %x0,%x1,%x2
+   xsmsubmdp %x0,%x1,%x3
+   {fms|fmsub} %0,%1,%2,%3"
+  [(set_attr "type" "fp")
+   (set_attr "fp_type" "fp_maddsub_d")])
 
 (define_insn "*vsx_fms4"
-  [(set (match_operand:VSX_B 0 "vsx_register_operand" "=,,?wa,?wa")
-        (fma:VSX_B
-          (match_operand:VSX_B 1 "vsx_register_operand" "%,,wa,wa")
-          (match_operand:VSX_B 2 "vsx_register_operand" ",0,wa,0")
-          (neg:VSX_B
-            (match_operand:VSX_B 3 "vsx_register_operand" "0,,0,wa"))))]
+  [(set (match_operand:VSX_F 0 "vsx_register_operand" "=,,?wa,?wa")
+        (fma:VSX_F
+          (match_operand:VSX_F 1 "vsx_register_operand" "%,,wa,wa")
+          (match_operand:VSX_F 2 "vsx_register_operand" ",0,wa,0")
+          (neg:VSX_F
+            (match_operand:VSX_F 3 "vsx_register_operand" "0,,0,wa"))))]
   "VECTOR_UNIT_VSX_P (mode)"
   "@
    xmsuba %x0,%x1,%x2
    xmsubm %x0,%x1,%x3
    xmsuba %x0,%x1,%x2
    xmsubm %x0,%x1,%x3"
-  [(set_attr "type" "")
-   (set_attr "fp_type" "")])
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "*vsx_nfmadf4"
+  [(set (match_operand:DF 0 "vsx_register_operand" "=ws,ws,?wa,?wa,d")
+        (neg:DF
+         (fma:DF
+          (match_operand:DF 1 "vsx_register_operand" "ws,ws,wa,wa,d")
+          (match_operand:DF 2 "vsx_register_operand" "ws,0,wa,0,d")
+          (match_operand:DF 3 "vsx_register_operand" "0,ws,0,wa,d"))))]
+  "VECTOR_UNIT_VSX_P (DFmode)"
+  "@
+   xsnmaddadp %x0,%x1,%x2
+   xsnmaddmdp %x0,%x1,%x3
+   xsnmaddadp %x0,%x1,%x2
+   xsnmaddmdp %x0,%x1,%x3
+   {fnma|fnmadd} %0,%1,%2,%3"
+  [(set_attr "type" "fp")
+   (set_attr "fp_type" "fp_maddsub_d")])
 
 (define_insn "*vsx_nfma4"
-  [(set (match_operand:VSX_B 0 "vsx_register_operand" "=,,?wa,?wa")
-        (neg:VSX_B
-         (fma:VSX_B
-          (match_operand:VSX_B 1 "vsx_register_operand" ",,wa,wa")
-          (match_operand:VSX_B 2 "vsx_register_operand" ",0,wa,0")
-          (match_operand:VSX_B 3 "vsx_register_operand" "0,,0,wa"))))]
+  [(set (match_operand:VSX_F 0 "vsx_register_operand" "=,,?wa,?wa")
+        (neg:VSX_F
+         (fma:VSX_F
+          (match_operand:VSX_F 1 "vsx_register_operand" ",,wa,wa")
+          (match_operand:VSX_F 2 "vsx_register_operand" ",0,wa,0")
+          (match_operand:VSX_F 3 "vsx_register_operand" "0,,0,wa"))))]
   "VECTOR_UNIT_VSX_P (mode)"
   "@
    xnmadda %x0,%x1,%x2
@@ -573,22 +639,56 @@ (define_insn "*vsx_nfma4"
   [(set_attr "type" "")
    (set_attr "fp_type" "")])
 
-(define_insn "*vsx_nfms4"
-  [(set (match_operand:VSX_B 0 "vsx_register_operand" "=,,?wa,?wa")
-        (neg:VSX_B
-         (fma:VSX_B
-           (match_operand:VSX_B 1 "vsx_register_operand" "%,,wa,wa")
-           (match_operand:VSX_B 2 "vsx_register_operand" ",0,wa,0")
-           (neg:VSX_B
-             (match_operand:VSX_B 3 "vsx_register_operand" "0,,0,wa")))))]
-  "VECTOR_UNIT_VSX_P (mode)"
+(define_insn "*vsx_nfmsdf4"
+  [(set (match_operand:DF 0 "vsx_register_operand" "=ws,ws,?wa,?wa,d")
+        (neg:DF
+         (fma:DF
+           (match_operand:DF 1 "vsx_register_operand" "%ws,ws,wa,wa,d")
+           (match_operand:DF 2 "vsx_register_operand" "ws,0,wa,0,d")
+           (neg:DF
+             (match_operand:DF 3 "vsx_register_operand" "0,ws,0,wa,d")))))]
+  "VECTOR_UNIT_VSX_P (DFmode)"
   "@
-   xnmsuba %x0,%x1,%x2
-   xnmsubm %x0,%x1,%x3
-   xnmsuba %x0,%x1,%x2
-   xnmsubm %x0,%x1,%x3"
-  [(set_attr "type" "")
-   (set_attr "fp_type" "")])
+   xsnmsubadp %x0,%x1,%x2
+   xsnmsubmdp %x0,%x1,%x3
+   xsnmsubadp %x0,%x1,%x2
+   xsnmsubmdp %x0,%x1,%x3
+   {fnms|fnmsub} %0,%1,%2,%3"
+  [(set_attr "type" "fp")
+   (set_attr "fp_type" "fp_maddsub_d")])
+
+(define_insn "*vsx_nfmsv4sf4"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wf,wf,?wa,?wa,v")
+        (neg:V4SF
+         (fma:V4SF
+           (match_operand:V4SF 1 "vsx_register_operand" "%wf,wf,wa,wa,v")
+           (match_operand:V4SF 2 "vsx_register_operand" "wf,0,wa,0,v")
+           (neg:V4SF
+             (match_operand:V4SF 3 "vsx_register_operand" "0,wf,0,wa,v")))))]
+  "VECTOR_UNIT_VSX_P (V4SFmode)"
+  "@
+   xvnmsubasp %x0,%x1,%x2
+   xvnmsubmsp %x0,%x1,%x3
+   xvnmsubasp %x0,%x1,%x2
+   xvnmsubmsp %x0,%x1,%x3
+   vnmsubfp %0,%1,%2,%3"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "*vsx_nfmsv2df4"
+  [(set (match_operand:V2DF 0 "vsx_register_operand" "=wd,wd,?wa,?wa")
+        (neg:V2DF
+         (fma:V2DF
+           (match_operand:V2DF 1 "vsx_register_operand" "%wd,wd,wa,wa")
+           (match_operand:V2DF 2 "vsx_register_operand" "wd,0,wa,0")
+           (neg:V2DF
+             (match_operand:V2DF 3 "vsx_register_operand" "0,wd,0,wa")))))]
+  "VECTOR_UNIT_VSX_P (V2DFmode)"
+  "@
+   xvnmsubadp %x0,%x1,%x2
+   xvnmsubmdp %x0,%x1,%x3
+   xvnmsubadp %x0,%x1,%x2
+   xvnmsubmdp %x0,%x1,%x3"
+  [(set_attr "type" "vecfloat")])
 
 ;; Vector conditional expressions (no scalar version for these instructions)
 (define_insn "vsx_eq"
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md (revision 176207)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -6288,7 +6288,7 @@ (define_insn "*fmadf4_fpr"
    && VECTOR_UNIT_NONE_P (DFmode)"
   "{fma|fmadd} %0,%1,%2,%3"
   [(set_attr "type" "fp")
-   (set_attr "fp_type" "fp_maddsub_s")])
+   (set_attr "fp_type" "fp_maddsub_d")])
 
 (define_insn "*fmsdf4_fpr"
   [(set (match_operand:DF 0 "gpc_reg_operand" "=f")
@@ -6299,7 +6299,7 @@ (define_insn "*fmsdf4_fpr"
    && VECTOR_UNIT_NONE_P (DFmode)"
   "{fms|fmsub} %0,%1,%2,%3"
   [(set_attr "type" "fp")
-   (set_attr "fp_type" "fp_maddsub_s")])
+   (set_attr "fp_type" "fp_maddsub_d")])
 
 (define_insn "*nfmadf4_fpr"
   [(set (match_operand:DF 0 "gpc_reg_operand" "=f")
@@ -6310,7 +6310,7 @@ (define_insn "*nfmadf4_fpr"
    && VECTOR_UNIT_NONE_P (DFmode)"
   "{fnma|fnmadd} %0,%1,%2,%3"
   [(set_attr "type" "fp")
-   (set_attr "fp_type" "fp_maddsub_s")])
+   (set_attr "fp_type" "fp_maddsub_d")])
 
 (define_insn "*nfmsdf4_fpr"
   [(set (match_operand:DF 0 "gpc_reg_operand" "=f")
@@ -6321,7 +6321,7 @@ (define_insn "*nfmsdf4_fpr"
    && VECTOR_UNIT_NONE_P (DFmode)"
   "{fnms|fnmsub} %0,%1,%2,%3"
   [(set_attr "type" "fp")
-   (set_attr "fp_type" "fp_maddsub_s")])
+   (set_attr "fp_type" "fp_maddsub_d")])
 
 (define_expand "sqrtdf2"
   [(set (match_operand:DF 0 "gpc_reg_operand" "")
Index: gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c (revision 176207)
+++ gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c (working copy)
@@ -3,16 +3,16 @@
 /* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-O3 -ftree-vectorize -mcpu=power7 -ffast-math -ffp-contract=off" } */
 /* { dg-final { scan-assembler-times "xvmadd" 2 } } */
-/* { dg-final { scan-assembler-times "xsmadd" 1 } } */
+/* { dg-final { scan-assembler-times "xsmadd\|fmadd\ " 1 } } */
 /* { dg-final { scan-assembler-times "fmadds" 1 } } */
 /* { dg-final { scan-assembler-times "xvmsub" 2 } } */
-/* { dg-final { scan-assembler-times "xsmsub" 1 } } */
+/* { dg-final { scan-assembler-times "xsmsub\|fmsub\ " 1 } } */
 /* { dg-final { scan-assembler-times "fmsubs" 1 } } */
 /* { dg-final { scan-assembler-times "xvnmadd" 2 } } */
-/* { dg-final { scan-assembler-times "xsnmadd" 1 } } */
+/* { dg-final { scan-assembler-times "xsnmadd\|fnmadd\ " 1 } } */
 /* { dg-final { scan-assembler-times "fnmadds" 1 } } */
 /* { dg-final { scan-assembler-times "xvnmsub" 2 } } */
-/* { dg-final { scan-assembler-times "xsnmsub" 1 } } */
+/* { dg-final { scan-assembler-times "xsnmsub\|fnmsub\ " 1 } } */
 /* { dg-final { scan-assembler-times "fnmsubs" 1 } } */
 
 /* Only the functions calling the bulitin should generate an appropriate (a *
Index: gcc/testsuite/gcc.target/powerpc/recip-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/recip-3.c (revision 176207)
+++ gcc/testsuite/gcc.target/powerpc/recip-3.c (working copy)
@@ -1,9 +1,9 @@
 /* { dg-do compile { target { { powerpc*-*-* } && { ! powerpc*-apple-darwin* } } } } */
 /* { dg-options "-O2 -mrecip -ffast-math -mcpu=power7" } */
 /* { dg-final { scan-assembler-times "xsrsqrtedp" 1 } } */
-/* { dg-final { scan-assembler-times "xsmsub.dp" 1 } } */
+/* { dg-final { scan-assembler-times "xsmsub.dp\|fmsub\ " 1 } } */
 /* { dg-final { scan-assembler-times "xsmuldp" 4 } } */
-/* { dg-final { scan-assembler-times "xsnmsub.dp" 2 } } */
+/* { dg-final { scan-assembler-times "xsnmsub.dp\|fnmsub\ " 2 } } */
 /* { dg-final { scan-assembler-times "frsqrtes" 1 } } */
 /* { dg-final { scan-assembler-times "fmsubs" 1 } } */
 /* { dg-final { scan-assembler-times "fmuls" 4 } } */
Index: gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c (revision 176207)
+++ gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c (working copy)
@@ -3,16 +3,16 @@
 /* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-O3 -ftree-vectorize -mcpu=power7 -ffast-math" } */
 /* { dg-final { scan-assembler-times "xvmadd" 4 } } */
-/* { dg-final { scan-assembler-times "xsmadd" 2 } } */
+/* { dg-final { scan-assembler-times "xsmadd\|fmadd\ " 2 } } */
 /* { dg-final { scan-assembler-times "fmadds" 2 } } */
 /* { dg-final { scan-assembler-times "xvmsub" 2 } } */
-/* { dg-final { scan-assembler-times "xsmsub" 1 } } */
+/* { dg-final { scan-assembler-times "xsmsub\|fmsub\ " 1 } } */
 /* { dg-final { scan-assembler-times "fmsubs" 1 } } */
 /* { dg-final { scan-assembler-times "xvnmadd" 2 } } */
-/* { dg-final { scan-assembler-times "xsnmadd" 1 } } */
+/* { dg-final { scan-assembler-times "xsnmadd\|fnmadd " 1 } } */
 /* { dg-final { scan-assembler-times "fnmadds" 1 } } */
 /* { dg-final { scan-assembler-times "xvnmsub" 2 } } */
-/* { dg-final { scan-assembler-times "xsnmsub" 1 } } */
+/* { dg-final { scan-assembler-times "xsnmsub\|fnmsub " 1 } } */
 /* { dg-final { scan-assembler-times "fnmsubs" 1 } } */
 
 /* All functions should generate an appropriate (a * b) + c instruction