From patchwork Wed Jul 12 23:08:20 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Carl Love
X-Patchwork-Id: 787450
Subject: [PATCH, rs6000] Add support for vec_extract_fp_from_shorth() and vec_extract_fp_from_short
From: Carl Love
To: gcc-patches@gcc.gnu.org, David Edelsohn, Segher Boessenkool
Cc: Bill Schmidt, cel@us.ibm.com
Date: Wed, 12 Jul 2017 16:08:20 -0700
Message-Id: <1499900900.14462.15.camel@us.ibm.com>

GCC Maintainers:

The following patch adds support for the vec_extract_fp_from_shorth()
and vec_extract_fp_from_shortl() builtin functions.
The patch has been tested on powerpc64le-unknown-linux-gnu (Power 8 LE)
and powerpc64le-unknown-linux-gnu (Power 9 LE).  The new test is reported
as unsupported on Power 8 (1 unsupported result) and passes on Power 9
(2 passes).

Please let me know if the following patch is acceptable.  Thanks.

                       Carl Love

----------------------------------------------------

gcc/ChangeLog:

2017-07-12  Carl Love

	* config/rs6000/rs6000-c.c: Add support for built-in functions
	vector float vec_extract_fp32_from_shorth (vector unsigned short);
	vector float vec_extract_fp32_from_shortl (vector unsigned short);
	* config/rs6000/altivec.h (vec_extract_fp_from_shorth,
	vec_extract_fp_from_shortl): Add defines for the two builtins.
	* config/rs6000/rs6000-builtin.def (VEXTRACT_FP_FROM_SHORTH,
	VEXTRACT_FP_FROM_SHORTL): Add BU_P9V_OVERLOAD_1 and BU_P9V_VSX_1
	new builtins.
	* config/rs6000/vsx.md (vsx_xvcvhpsp): Add define_insn.
	(vextract_fp_from_shorth, vextract_fp_from_shortl): Add define_expands.
	* doc/extend.texi: Update the built-in documentation file for the
	new built-in functions.

gcc/testsuite/ChangeLog:

2017-07-12  Carl Love

	* gcc.target/powerpc/builtins-3-p9-runnable.c: Add new test file for
	the new built-ins.
---
 gcc/config/rs6000/altivec.h                        |  3 +
 gcc/config/rs6000/rs6000-builtin.def               |  5 ++
 gcc/config/rs6000/rs6000-c.c                       |  5 ++
 gcc/config/rs6000/vsx.md                           | 70 +++++++++++++++++++++-
 gcc/doc/extend.texi                                |  3 +
 .../gcc.target/powerpc/builtins-3-p9-runnable.c    | 36 +++++++++++
 6 files changed, 121 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-3-p9-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 71cdca5..4d34a97 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -449,6 +449,9 @@
 #define vec_insert_exp __builtin_vec_insert_exp
 #define vec_test_data_class __builtin_vec_test_data_class
 
+#define vec_extract_fp_from_shorth __builtin_vec_vextract_fp_from_shorth
+#define vec_extract_fp_from_shortl __builtin_vec_vextract_fp_from_shortl
+
 #define scalar_extract_exp __builtin_vec_scalar_extract_exp
 #define scalar_extract_sig __builtin_vec_scalar_extract_sig
 #define scalar_insert_exp __builtin_vec_scalar_insert_exp
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index e098e1c..400189e 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2057,6 +2057,9 @@ BU_P9V_OVERLOAD_1 (VSTDCNSP, "scalar_test_neg_sp")
 
 BU_P9V_OVERLOAD_1 (REVB, "revb")
 
+BU_P9V_OVERLOAD_1 (VEXTRACT_FP_FROM_SHORTH, "vextract_fp_from_shorth")
+BU_P9V_OVERLOAD_1 (VEXTRACT_FP_FROM_SHORTL, "vextract_fp_from_shortl")
+
 /* ISA 3.0 vector scalar overloaded 2 argument functions.  */
 BU_P9V_OVERLOAD_2 (VSIEDP, "scalar_insert_exp")
 
@@ -2074,6 +2077,8 @@ BU_P9V_VSX_1 (VEEDP, "extract_exp_dp", CONST, xvxexpdp)
 BU_P9V_VSX_1 (VEESP, "extract_exp_sp", CONST, xvxexpsp)
 BU_P9V_VSX_1 (VESDP, "extract_sig_dp", CONST, xvxsigdp)
 BU_P9V_VSX_1 (VESSP, "extract_sig_sp", CONST, xvxsigsp)
+BU_P9V_VSX_1 (VEXTRACT_FP_FROM_SHORTH, "vextract_fp_from_shorth", CONST, vextract_fp_from_shorth)
+BU_P9V_VSX_1 (VEXTRACT_FP_FROM_SHORTL, "vextract_fp_from_shortl", CONST, vextract_fp_from_shortl)
 
 /* 2 argument vsx vector functions added in ISA 3.0 (power9).  */
 BU_P9V_VSX_2 (VIEDP, "insert_exp_dp", CONST, xviexpdp)
diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index c769442..a1d09ba 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -5164,6 +5164,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { P9V_BUILTIN_VEC_VEXTRACT4B, P9V_BUILTIN_VEXTRACT4B,
     RS6000_BTI_INTDI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTSI, 0 },
 
+  { P9V_BUILTIN_VEC_VEXTRACT_FP_FROM_SHORTH, P9V_BUILTIN_VEXTRACT_FP_FROM_SHORTH,
+    RS6000_BTI_V4SF, RS6000_BTI_unsigned_V8HI, 0, 0 },
+  { P9V_BUILTIN_VEC_VEXTRACT_FP_FROM_SHORTL, P9V_BUILTIN_VEXTRACT_FP_FROM_SHORTL,
+    RS6000_BTI_V4SF, RS6000_BTI_unsigned_V8HI, 0, 0 },
+
   { P9V_BUILTIN_VEC_VEXTULX, P9V_BUILTIN_VEXTUBLX,
     RS6000_BTI_INTQI, RS6000_BTI_UINTSI,
     RS6000_BTI_V16QI, 0 },
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 2ddfae5..573eb3f 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -326,6 +326,7 @@
    UNSPEC_VSX_CVDPSXWS
    UNSPEC_VSX_CVDPUXWS
    UNSPEC_VSX_CVSPDP
+   UNSPEC_VSX_CVHPSP
    UNSPEC_VSX_CVSPDPN
    UNSPEC_VSX_CVDPSPN
    UNSPEC_VSX_CVSXWDP
@@ -367,6 +368,8 @@
    UNSPEC_VSX_SIEXPDP
    UNSPEC_VSX_SCMPEXPDP
    UNSPEC_VSX_STSTDC
+   UNSPEC_VSX_VEXTRACT_FP_FROM_SHORTH
+   UNSPEC_VSX_VEXTRACT_FP_FROM_SHORTL
    UNSPEC_VSX_VXEXP
    UNSPEC_VSX_VXSIG
    UNSPEC_VSX_VIEXP
@@ -1745,6 +1748,15 @@
   "xscvspdp %x0,%x1"
   [(set_attr "type" "fp")])
 
+;; Generate xvcvhpsp instruction
+(define_insn "vsx_xvcvhpsp"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
+        (unspec:V4SF [(match_operand: V8HI 1 "vsx_register_operand" "f")]
+                     UNSPEC_VSX_CVHPSP))]
+  "VECTOR_UNIT_VSX_P (V4SFmode)"
+  "xvcvhpsp %x0,%x1"
+  [(set_attr "type" "fp")])
+
 ;; xscvdpsp used for splat'ing a scalar to V4SF, knowing that the internal SF
 ;; format of scalars is actually DF.
 (define_insn "vsx_xscvdpsp_scalar"
@@ -4419,7 +4431,63 @@
   "xxinsertw %x0,%x1,%3"
   [(set_attr "type" "vecperm")])
 
-
+;; Generate vector extract four float 32 values from left four elements
+;; of eight element vector of float 16 values.
+(define_expand "vextract_fp_from_shorth"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+        (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "v")]
+         UNSPEC_VSX_VEXTRACT_FP_FROM_SHORTH))]
+  "TARGET_P9_VECTOR"
+{
+  int vals[16] = {0, 1, 0 ,0, 2, 3, 0, 0, 4, 5, 0, 0, 6, 7, 0, 8};
+  int i;
+
+  rtx rtx_tmp = gen_reg_rtx (V8HImode);
+  rtx rvals[16];
+  rtx mask = gen_reg_rtx (V16QImode);
+  rtvec v;
+
+  for (i = 0; i < 16; i++)
+    rvals[i] = GEN_INT (vals[i]);
+
+  /* xvcvhpsp - vector convert F16 to vector F32 requires the four F16
+     inputs in half words 1,3,5,7 (IBM numbering).  Use xxperm to move
+     src half words 0,1,2,3 for the conversion instruction.  */
+  v = gen_rtvec_v (16, rvals);
+  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_altivec_vperm_v8hi (rtx_tmp, operands[1], operands[1], mask));
+  emit_insn (gen_vsx_xvcvhpsp (operands[0], rtx_tmp));
+  DONE;
+})
+
+;; Generate vector extract four float 32 values from right four elements
+;; of eight element vector of float 16 values.
+(define_expand "vextract_fp_from_shortl"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+        (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "v")]
+         UNSPEC_VSX_VEXTRACT_FP_FROM_SHORTL))]
+  "TARGET_P9_VECTOR"
+{
+  int vals[16] = {8, 9, 0, 0, 10, 11, 0, 0, 12, 13, 0, 0, 14, 15, 0, 0};
+  int i;
+  rtx rtx_tmp = gen_reg_rtx (V8HImode);
+  rtx rvals[16];
+  rtx mask = gen_reg_rtx (V16QImode);
+  rtvec v;
+
+  for (i = 0; i < 16; i++)
+    rvals[i] = GEN_INT (vals[i]);
+
+  /* xvcvhpsp - vector convert F16 to vector F32 requires the four F16
+     inputs in half words 1,3,5,7 (IBM numbering).  Use xxperm to move
+     src half words 4,5,6,7 for the conversion instruction.  */
+  v = gen_rtvec_v (16, rvals);
+  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_altivec_vperm_v8hi (rtx_tmp, operands[1], operands[1], mask));
+  emit_insn (gen_vsx_xvcvhpsp (operands[0], rtx_tmp));
+  DONE;
+})
+
 ;; Support for ISA 3.0 vector byte reverse
 
 ;; Swap all bytes with in a vector
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 530a82d..0135fc7 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -18258,6 +18258,9 @@ vector bool short vec_cmpne (vector bool short, vector bool short);
 vector bool int vec_cmpne (vector bool int, vector bool int);
 vector bool long long vec_cmpne (vector bool long long, vector bool long long);
 
+vector float vec_extract_fp32_from_shorth (vector unsigned short);
+vector float vec_extract_fp32_from_shortl (vector unsigned short);
+
 vector long long vec_vctz (vector long long);
 vector unsigned long long vec_vctz (vector unsigned long long);
 vector int vec_vctz (vector int);
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3-p9-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-3-p9-runnable.c
new file mode 100644
index 0000000..ce1a2ce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-3-p9-runnable.c
@@ -0,0 +1,36 @@
+/* { dg-do run { target { powerpc64*-*-* && { lp64 && p9vector_hw } } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2 -mupper-regs-di" } */
+
+#include <altivec.h> // vector
+
+void abort (void);
+
+int main() {
+  int i;
+  vector float vfr, vfexpt;
+  vector unsigned short vusha;
+
+  /* 1.0, -2.0, 0.0, 8.5, 1.5, 0.5, 1.25, -0.25 */
+  vusha = (vector unsigned short){0B011110000000000, 0B1100000000000000,
+                                  0B000000000000000, 0B0100100001000000,
+                                  0B011111000000000, 0B0011100000000000,
+                                  0B011110100000000, 0B1011010000000000};
+
+  vfexpt = (vector float){1.0, -2.0, 0.0, 8.5};
+  vfr = vec_extract_fp_from_shorth(vusha);
+
+  for (i=0; i<4; i++) {
+    if (vfr[i] != vfexpt[i])
+      abort();
+  }
+
+  vfexpt = (vector float){1.5, 0.5, 1.25, -0.25};
+  vfr = vec_extract_fp_from_shortl(vusha);
+
+  for (i=0; i<4; i++) {
+    if (vfr[i] != vfexpt[i])
+      abort();
+  }
+}
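
The half-precision bit patterns used in builtins-3-p9-runnable.c can be
sanity-checked on any host, without Power9 hardware.  The following is a
minimal sketch in portable C and is not part of the patch: half_to_float
and check_test_vectors are hypothetical helper names, and the decoder
assumes the inputs are IEEE 754 binary16 values and that float is IEEE 754
binary32.  The inputs are the test's binary literals rewritten in hex.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): decode one IEEE 754
   binary16 value stored in a uint16_t into a float.  Assumes float is
   IEEE 754 binary32.  */
float
half_to_float (uint16_t h)
{
  uint32_t sign = (uint32_t) (h >> 15) << 31;
  uint32_t exp = (h >> 10) & 0x1f;
  uint32_t frac = h & 0x3ff;
  uint32_t bits;
  float f;

  if (exp == 0 && frac == 0)
    bits = sign;                                /* Signed zero.  */
  else if (exp == 0)
    {
      /* Subnormal input: shift the fraction up until the implicit
         leading one appears, adjusting the exponent to match.  */
      int shift = 0;
      while ((frac & 0x400) == 0)
        {
          frac <<= 1;
          shift++;
        }
      frac &= 0x3ff;
      bits = sign | ((uint32_t) (127 - 15 + 1 - shift) << 23) | (frac << 13);
    }
  else if (exp == 0x1f)
    bits = sign | 0x7f800000u | (frac << 13);   /* Infinity or NaN.  */
  else
    bits = sign | ((exp + (127 - 15)) << 23) | (frac << 13);  /* Normal.  */

  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Decode the eight inputs from the new test and compare them with the
   values listed in the test's comment.  */
void
check_test_vectors (void)
{
  static const uint16_t in[8] = {0x3C00, 0xC000, 0x0000, 0x4840,
                                 0x3E00, 0x3800, 0x3D00, 0xB400};
  static const float expected[8] = {1.0f, -2.0f, 0.0f, 8.5f,
                                    1.5f, 0.5f, 1.25f, -0.25f};
  int i;

  for (i = 0; i < 8; i++)
    assert (half_to_float (in[i]) == expected[i]);
}
```

Calling check_test_vectors () from a small driver confirms that the first
four inputs decode to the values the test expects from
vec_extract_fp_from_shorth and the last four to the values it expects from
vec_extract_fp_from_shortl.  It says nothing about the permute masks or the
element ordering on real hardware.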