From: Michael Meissner
Date: Tue, 24 May 2016 17:05:14 -0400
To: gcc-patches@gcc.gnu.org, Segher Boessenkool, David Edelsohn, Bill Schmidt, Kelvin Nilsen
X-Patchwork-Id: 625934
Subject: [PATCH], Add PowerPC
ISA 3.0 vector count trailing zeros and vector parity support
Message-ID: <20160524210514.GA6775@ibm-tiger.the-meissners.org>

This patch adds support for two sets of new instructions in ISA 3.0: vector
count trailing zeros and vector parity.  In addition, it defines many of the
support macros that will be used by other built-in functions that will be added
shortly.

I have bootstrapped this and there were no regressions.  Is it ok to apply to
the trunk?  Assuming it is, is it ok to backport to the GCC 6.2 branch?

[gcc]
2016-05-24  Michael Meissner

        * config/rs6000/altivec.md (VParity): New mode iterator for vector
        parity built-in functions.
        (p9v_ctz<mode>2): Add support for ISA 3.0 vector count trailing
        zeros.
        (p9v_parity<mode>2): Likewise.
        * config/rs6000/vector.md (VEC_IP): New mode iterator for vector
        parity.
        (ctz<mode>2): ISA 3.0 expander for vector count trailing zeros.
        (parity<mode>2): ISA 3.0 expander for vector parity.
        * config/rs6000/rs6000-builtin.def (BU_P9_MISC_1): New macros for
        power9 built-ins.
        (BU_P9_64BIT_MISC_0): Likewise.
        (BU_P9_MISC_0): Likewise.
        (BU_P9V_AV_1): Likewise.
        (BU_P9V_AV_2): Likewise.
        (BU_P9V_AV_3): Likewise.
        (BU_P9V_AV_P): Likewise.
        (BU_P9V_VSX_1): Likewise.
        (BU_P9V_OVERLOAD_1): Likewise.
        (BU_P9V_OVERLOAD_2): Likewise.
        (BU_P9V_OVERLOAD_3): Likewise.
        (VCTZB): Add vector count trailing zeros support.
        (VCTZH): Likewise.
        (VCTZW): Likewise.
        (VCTZD): Likewise.
        (VPRTYBD): Add vector parity support.
        (VPRTYBQ): Likewise.
        (VPRTYBW): Likewise.
        (VCTZ): Add overloaded vector count trailing zeros support.
        (VPRTYB): Add overloaded vector parity support.
        * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
        overloaded vector count trailing zeros and parity instructions.
        * config/rs6000/rs6000.md (wd mode attribute): Add V1TI and TI for
        vector parity support.
        * config/rs6000/altivec.h (vec_vctz): Add ISA 3.0 vector count
        trailing zeros support.
        (vec_cntlz): Likewise.
        (vec_vctzb): Likewise.
        (vec_vctzd): Likewise.
        (vec_vctzh): Likewise.
        (vec_vctzw): Likewise.
        (vec_vprtyb): Add ISA 3.0 vector parity support.
        (vec_vprtybd): Likewise.
        (vec_vprtybw): Likewise.
        (vec_vprtybq): Likewise.
        * doc/extend.texi (PowerPC AltiVec Built-in Functions): Document
        the ISA 3.0 vector count trailing zeros and vector parity built-in
        functions.

[gcc/testsuite]
2016-05-24  Michael Meissner

        * gcc.target/powerpc/p9-vparity.c: New file to check ISA 3.0 vector
        parity built-in functions.
        * gcc.target/powerpc/ctz-3.c: New file to check ISA 3.0 vector count
        trailing zeros automatic vectorization.
        * gcc.target/powerpc/ctz-4.c: New file to check ISA 3.0 vector count
        trailing zeros built-in functions.

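For anyone reviewing without ISA 3.0 hardware, the per-element operations that the new patterns are written in terms of can be modelled in portable C. This is only a sketch under stated assumptions: the `ref_*` names are hypothetical helpers and are not part of the patch; returning 32 for a zero input is an assumption (C's `__builtin_ctz` leaves that case undefined); and `ref_parity32` models GCC's generic `parity` RTL operation (population count modulo 2), which is how the patterns are expressed, rather than making a claim about the exact hardware definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helper (not in the patch): trailing zero count
   of one 32-bit element.  Returning 32 for zero is an assumption.  */
unsigned
ref_ctz32 (uint32_t x)
{
  unsigned n = 0;

  if (x == 0)
    return 32;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Hypothetical reference helper: parity in the sense of the RTL `parity'
   code, i.e. 1 if an odd number of bits are set, else 0.  */
unsigned
ref_parity32 (uint32_t x)
{
  x ^= x >> 16;
  x ^= x >> 8;
  x ^= x >> 4;
  x ^= x >> 2;
  x ^= x >> 1;
  return x & 1;
}

/* Apply the scalar model to each of the four word elements of a
   V4SI-sized array.  */
void
ref_ctz_v4si (const uint32_t in[4], uint32_t out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = ref_ctz32 (in[i]);
}

/* Small self-check of the model itself.  */
void
ref_selfcheck (void)
{
  assert (ref_ctz32 (8) == 3);
  assert (ref_parity32 (7) == 1);
}
```

A model like this is handy for comparing against the output of the scalar loops in ctz-3.c on a machine that cannot execute the new instructions.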
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 236663)
+++ gcc/config/rs6000/altivec.md (.../gcc/config/rs6000) (working copy)
@@ -193,6 +193,13 @@ (define_mode_iterator VM2 [V4SI
                            (KF "FLOAT128_VECTOR_P (KFmode)")
                            (TF "FLOAT128_VECTOR_P (TFmode)")])
 
+;; Specific iterator for parity which does not have a byte/half-word form, but
+;; does have a quad word form
+(define_mode_iterator VParity [V4SI
+                               V2DI
+                               V1TI
+                               (TI "TARGET_VSX_TIMODE")])
+
 (define_mode_attr VI_char [(V2DI "d") (V4SI "w") (V8HI "h") (V16QI "b")])
 (define_mode_attr VI_scalar [(V2DI "DI") (V4SI "SI") (V8HI "HI") (V16QI "QI")])
 (define_mode_attr VI_unit [(V16QI "VECTOR_UNIT_ALTIVEC_P (V16QImode)")
@@ -3415,7 +3422,7 @@ (define_expand "vec_unpacku_float_lo_v8h
 }")
 
 
-;; Power8 vector instructions encoded as Altivec instructions
+;; Power8/power9 vector instructions encoded as Altivec instructions
 
 ;; Vector count leading zeros
 (define_insn "*p8v_clz<mode>2"
@@ -3426,6 +3433,15 @@ (define_insn "*p8v_clz<mode>2"
   [(set_attr "length" "4")
    (set_attr "type" "vecsimple")])
 
+;; Vector count trailing zeros
+(define_insn "*p9v_ctz<mode>2"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+        (ctz:VI2 (match_operand:VI2 1 "register_operand" "v")))]
+  "TARGET_P9_VECTOR"
+  "vctz<wd> %0,%1"
+  [(set_attr "length" "4")
+   (set_attr "type" "vecsimple")])
+
 ;; Vector population count
 (define_insn "*p8v_popcount<mode>2"
   [(set (match_operand:VI2 0 "register_operand" "=v")
@@ -3435,6 +3451,15 @@ (define_insn "*p8v_popcount<mode>2"
   [(set_attr "length" "4")
    (set_attr "type" "vecsimple")])
 
+;; Vector parity
+(define_insn "*p9v_parity<mode>2"
+  [(set (match_operand:VParity 0 "register_operand" "=v")
+        (parity:VParity (match_operand:VParity 1 "register_operand" "v")))]
+  "TARGET_P9_VECTOR"
+  "vprtyb<wd> %0,%1"
+  [(set_attr "length" "4")
+   (set_attr "type" "vecsimple")])
+
 ;; Vector Gather Bits by Bytes by
Doubleword
 (define_insn "p8v_vgbbd"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 236663)
+++ gcc/config/rs6000/vector.md (.../gcc/config/rs6000) (working copy)
@@ -26,6 +26,13 @@
 ;; Vector int modes
 (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
 
+;; Vector int modes for parity
+(define_mode_iterator VEC_IP [V8HI
+                              V4SI
+                              V2DI
+                              V1TI
+                              (TI "TARGET_VSX_TIMODE")])
+
 ;; Vector float modes
 (define_mode_iterator VEC_F [V4SF V2DF])
 
@@ -752,12 +759,24 @@ (define_expand "clz<mode>2"
         (clz:VEC_I (match_operand:VEC_I 1 "register_operand" "")))]
   "TARGET_P8_VECTOR")
 
+;; Vector count trailing zeros
+(define_expand "ctz<mode>2"
+  [(set (match_operand:VEC_I 0 "register_operand" "")
+        (ctz:VEC_I (match_operand:VEC_I 1 "register_operand" "")))]
+  "TARGET_P9_VECTOR")
+
 ;; Vector population count
 (define_expand "popcount<mode>2"
   [(set (match_operand:VEC_I 0 "register_operand" "")
         (popcount:VEC_I (match_operand:VEC_I 1 "register_operand" "")))]
   "TARGET_P8_VECTOR")
 
+;; Vector parity
+(define_expand "parity<mode>2"
+  [(set (match_operand:VEC_IP 0 "register_operand" "")
+        (parity:VEC_IP (match_operand:VEC_IP 1 "register_operand" "")))]
+  "TARGET_P9_VECTOR")
+
 
 ;; Same size conversions
 (define_expand "float2"
Index: gcc/config/rs6000/rs6000-builtin.def
===================================================================
--- gcc/config/rs6000/rs6000-builtin.def (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 236663)
+++ gcc/config/rs6000/rs6000-builtin.def (.../gcc/config/rs6000) (working copy)
@@ -687,8 +687,113 @@
                      | RS6000_BTC_BINARY),                              \
                     CODE_FOR_ ## ICODE)         /* ICODE */
+
+/* Miscellaneous builtins for instructions added in ISA 3.0.
These
+   instructions don't require either the DFP or VSX options, just the basic
+   ISA 3.0 enablement since they operate on general purpose registers.  */
+#define BU_P9_MISC_1(ENUM, NAME, ATTR, ICODE)                           \
+  RS6000_BUILTIN_1 (MISC_BUILTIN_ ## ENUM,      /* ENUM */              \
+                    "__builtin_" NAME,          /* NAME */              \
+                    RS6000_BTM_MODULO,          /* MASK */              \
+                    (RS6000_BTC_ ## ATTR        /* ATTR */              \
+                     | RS6000_BTC_UNARY),                               \
+                    CODE_FOR_ ## ICODE)         /* ICODE */
+
+/* Miscellaneous builtins for instructions added in ISA 3.0.  These
+   instructions don't require either the DFP or VSX options, just the basic
+   ISA 3.0 enablement since they operate on general purpose registers,
+   and they require 64-bit addressing.  */
+#define BU_P9_64BIT_MISC_0(ENUM, NAME, ATTR, ICODE)                     \
+  RS6000_BUILTIN_0 (MISC_BUILTIN_ ## ENUM,      /* ENUM */              \
+                    "__builtin_" NAME,          /* NAME */              \
+                    RS6000_BTM_MODULO                                   \
+                    | RS6000_BTM_64BIT,         /* MASK */              \
+                    (RS6000_BTC_ ## ATTR        /* ATTR */              \
+                     | RS6000_BTC_SPECIAL),                             \
+                    CODE_FOR_ ## ICODE)         /* ICODE */
+
+/* Miscellaneous builtins for instructions added in ISA 3.0.  These
+   instructions don't require either the DFP or VSX options, just the basic
+   ISA 3.0 enablement since they operate on general purpose registers.  */
+#define BU_P9_MISC_0(ENUM, NAME, ATTR, ICODE)                           \
+  RS6000_BUILTIN_0 (MISC_BUILTIN_ ## ENUM,      /* ENUM */              \
+                    "__builtin_" NAME,          /* NAME */              \
+                    RS6000_BTM_MODULO,          /* MASK */              \
+                    (RS6000_BTC_ ## ATTR        /* ATTR */              \
+                     | RS6000_BTC_SPECIAL),                             \
+                    CODE_FOR_ ## ICODE)         /* ICODE */
+
+/* ISA 3.0 (power9) vector convenience macros.  */
+/* For the instructions that are encoded as altivec instructions use
+   __builtin_altivec_ as the builtin name.
*/
+#define BU_P9V_AV_1(ENUM, NAME, ATTR, ICODE)                            \
+  RS6000_BUILTIN_1 (P9V_BUILTIN_ ## ENUM,       /* ENUM */              \
+                    "__builtin_altivec_" NAME,  /* NAME */              \
+                    RS6000_BTM_P9_VECTOR,       /* MASK */              \
+                    (RS6000_BTC_ ## ATTR        /* ATTR */              \
+                     | RS6000_BTC_UNARY),                               \
+                    CODE_FOR_ ## ICODE)         /* ICODE */
+
+#define BU_P9V_AV_2(ENUM, NAME, ATTR, ICODE)                            \
+  RS6000_BUILTIN_2 (P9V_BUILTIN_ ## ENUM,       /* ENUM */              \
+                    "__builtin_altivec_" NAME,  /* NAME */              \
+                    RS6000_BTM_P9_VECTOR,       /* MASK */              \
+                    (RS6000_BTC_ ## ATTR        /* ATTR */              \
+                     | RS6000_BTC_BINARY),                              \
+                    CODE_FOR_ ## ICODE)         /* ICODE */
+
+#define BU_P9V_AV_3(ENUM, NAME, ATTR, ICODE)                            \
+  RS6000_BUILTIN_3 (P9V_BUILTIN_ ## ENUM,       /* ENUM */              \
+                    "__builtin_altivec_" NAME,  /* NAME */              \
+                    RS6000_BTM_P9_VECTOR,       /* MASK */              \
+                    (RS6000_BTC_ ## ATTR        /* ATTR */              \
+                     | RS6000_BTC_TERNARY),                             \
+                    CODE_FOR_ ## ICODE)         /* ICODE */
+
+#define BU_P9V_AV_P(ENUM, NAME, ATTR, ICODE)                            \
+  RS6000_BUILTIN_P (P9V_BUILTIN_ ## ENUM,       /* ENUM */              \
+                    "__builtin_altivec_" NAME,  /* NAME */              \
+                    RS6000_BTM_P9_VECTOR,       /* MASK */              \
+                    (RS6000_BTC_ ## ATTR        /* ATTR */              \
+                     | RS6000_BTC_PREDICATE),                           \
+                    CODE_FOR_ ## ICODE)         /* ICODE */
+
+/* For the instructions encoded as VSX instructions use __builtin_vsx as the
+   builtin name.
*/
+#define BU_P9V_VSX_1(ENUM, NAME, ATTR, ICODE)                           \
+  RS6000_BUILTIN_1 (P9V_BUILTIN_ ## ENUM,       /* ENUM */              \
+                    "__builtin_vsx_" NAME,      /* NAME */              \
+                    RS6000_BTM_P9_VECTOR,       /* MASK */              \
+                    (RS6000_BTC_ ## ATTR        /* ATTR */              \
+                     | RS6000_BTC_UNARY),                               \
+                    CODE_FOR_ ## ICODE)         /* ICODE */
+
+#define BU_P9V_OVERLOAD_1(ENUM, NAME)                                   \
+  RS6000_BUILTIN_1 (P9V_BUILTIN_VEC_ ## ENUM,   /* ENUM */              \
+                    "__builtin_vec_" NAME,      /* NAME */              \
+                    RS6000_BTM_P9_VECTOR,       /* MASK */              \
+                    (RS6000_BTC_OVERLOADED      /* ATTR */              \
+                     | RS6000_BTC_UNARY),                               \
+                    CODE_FOR_nothing)           /* ICODE */
+
+#define BU_P9V_OVERLOAD_2(ENUM, NAME)                                   \
+  RS6000_BUILTIN_2 (P9V_BUILTIN_VEC_ ## ENUM,   /* ENUM */              \
+                    "__builtin_vec_" NAME,      /* NAME */              \
+                    RS6000_BTM_P9_VECTOR,       /* MASK */              \
+                    (RS6000_BTC_OVERLOADED      /* ATTR */              \
+                     | RS6000_BTC_BINARY),                              \
+                    CODE_FOR_nothing)           /* ICODE */
+
+#define BU_P9V_OVERLOAD_3(ENUM, NAME)                                   \
+  RS6000_BUILTIN_3 (P9V_BUILTIN_VEC_ ## ENUM,   /* ENUM */              \
+                    "__builtin_vec_" NAME,      /* NAME */              \
+                    RS6000_BTM_P9_VECTOR,       /* MASK */              \
+                    (RS6000_BTC_OVERLOADED      /* ATTR */              \
+                     | RS6000_BTC_TERNARY),                             \
+                    CODE_FOR_nothing)           /* ICODE */
 #endif
 
+
 /* Insure 0 is not a legitimate index.  */
 BU_SPECIAL_X (RS6000_BUILTIN_NONE, NULL, 0, RS6000_BTC_MISC)
 
@@ -1704,6 +1809,26 @@ BU_LDBL128_2 (UNPACK_TF, "unpack_longdou
 BU_P7_MISC_2 (PACK_V1TI, "pack_vector_int128", CONST, packv1ti)
 BU_P7_MISC_2 (UNPACK_V1TI, "unpack_vector_int128", CONST, unpackv1ti)
 
+/* 1 argument vector functions added in ISA 3.0 (power9).  */
+BU_P9V_AV_1 (VCTZB, "vctzb", CONST, ctzv16qi2)
+BU_P9V_AV_1 (VCTZH, "vctzh", CONST, ctzv8hi2)
+BU_P9V_AV_1 (VCTZW, "vctzw", CONST, ctzv4si2)
+BU_P9V_AV_1 (VCTZD, "vctzd", CONST, ctzv2di2)
+BU_P9V_AV_1 (VPRTYBD, "vprtybd", CONST, parityv2di2)
+BU_P9V_AV_1 (VPRTYBQ, "vprtybq", CONST, parityv1ti2)
+BU_P9V_AV_1 (VPRTYBW, "vprtybw", CONST, parityv4si2)
+
+/* ISA 3.0 vector overloaded 1 argument functions.
*/
+BU_P9V_OVERLOAD_1 (VCTZ, "vctz")
+BU_P9V_OVERLOAD_1 (VCTZB, "vctzb")
+BU_P9V_OVERLOAD_1 (VCTZH, "vctzh")
+BU_P9V_OVERLOAD_1 (VCTZW, "vctzw")
+BU_P9V_OVERLOAD_1 (VCTZD, "vctzd")
+BU_P9V_OVERLOAD_1 (VPRTYB, "vprtyb")
+BU_P9V_OVERLOAD_1 (VPRTYBD, "vprtybd")
+BU_P9V_OVERLOAD_1 (VPRTYBQ, "vprtybq")
+BU_P9V_OVERLOAD_1 (VPRTYBW, "vprtybw")
+
 /* 1 argument crypto functions.  */
 BU_CRYPTO_1 (VSBOX, "vsbox", CONST, crypto_vsbox)
 
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 236663)
+++ gcc/config/rs6000/rs6000-c.c (.../gcc/config/rs6000) (working copy)
@@ -4210,6 +4210,43 @@ const struct altivec_builtin_types altiv
   { P8V_BUILTIN_VEC_VCLZD, P8V_BUILTIN_VCLZD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
 
+  { P9V_BUILTIN_VEC_VCTZ, P9V_BUILTIN_VCTZB,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VCTZ, P9V_BUILTIN_VCTZB,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VCTZ, P9V_BUILTIN_VCTZH,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
+  { P9V_BUILTIN_VEC_VCTZ, P9V_BUILTIN_VCTZH,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
+  { P9V_BUILTIN_VEC_VCTZ, P9V_BUILTIN_VCTZW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P9V_BUILTIN_VEC_VCTZ, P9V_BUILTIN_VCTZW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+  { P9V_BUILTIN_VEC_VCTZ, P9V_BUILTIN_VCTZD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P9V_BUILTIN_VEC_VCTZ, P9V_BUILTIN_VCTZD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
+
+  { P9V_BUILTIN_VEC_VCTZB, P9V_BUILTIN_VCTZB,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VCTZB, P9V_BUILTIN_VCTZB,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+
+  { P9V_BUILTIN_VEC_VCTZH, P9V_BUILTIN_VCTZH,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
+  {
P9V_BUILTIN_VEC_VCTZH, P9V_BUILTIN_VCTZH,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
+
+  { P9V_BUILTIN_VEC_VCTZW, P9V_BUILTIN_VCTZW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P9V_BUILTIN_VEC_VCTZW, P9V_BUILTIN_VCTZW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+
+  { P9V_BUILTIN_VEC_VCTZD, P9V_BUILTIN_VCTZD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P9V_BUILTIN_VEC_VCTZD, P9V_BUILTIN_VCTZD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
+
   { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
   { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
@@ -4339,6 +4376,42 @@ const struct altivec_builtin_types altiv
   { P8V_BUILTIN_VEC_VPOPCNTD, P8V_BUILTIN_VPOPCNTD,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
 
+  { P9V_BUILTIN_VEC_VPRTYB, P9V_BUILTIN_VPRTYBW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P9V_BUILTIN_VEC_VPRTYB, P9V_BUILTIN_VPRTYBW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+  { P9V_BUILTIN_VEC_VPRTYB, P9V_BUILTIN_VPRTYBD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P9V_BUILTIN_VEC_VPRTYB, P9V_BUILTIN_VPRTYBD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
+  { P9V_BUILTIN_VEC_VPRTYB, P9V_BUILTIN_VPRTYBQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0, 0 },
+  { P9V_BUILTIN_VEC_VPRTYB, P9V_BUILTIN_VPRTYBQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0, 0 },
+  { P9V_BUILTIN_VEC_VPRTYB, P9V_BUILTIN_VPRTYBQ,
+    RS6000_BTI_INTTI, RS6000_BTI_INTTI, 0, 0 },
+  { P9V_BUILTIN_VEC_VPRTYB, P9V_BUILTIN_VPRTYBQ,
+    RS6000_BTI_UINTTI, RS6000_BTI_UINTTI, 0, 0 },
+
+  { P9V_BUILTIN_VEC_VPRTYBW, P9V_BUILTIN_VPRTYBW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P9V_BUILTIN_VEC_VPRTYBW, P9V_BUILTIN_VPRTYBW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+
+  { P9V_BUILTIN_VEC_VPRTYBD, P9V_BUILTIN_VPRTYBD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P9V_BUILTIN_VEC_VPRTYBD, P9V_BUILTIN_VPRTYBD,
+    RS6000_BTI_unsigned_V2DI,
RS6000_BTI_unsigned_V2DI, 0, 0 },
+
+  { P9V_BUILTIN_VEC_VPRTYBQ, P9V_BUILTIN_VPRTYBQ,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0, 0 },
+  { P9V_BUILTIN_VEC_VPRTYBQ, P9V_BUILTIN_VPRTYBQ,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0, 0 },
+  { P9V_BUILTIN_VEC_VPRTYBQ, P9V_BUILTIN_VPRTYBQ,
+    RS6000_BTI_INTTI, RS6000_BTI_INTTI, 0, 0 },
+  { P9V_BUILTIN_VEC_VPRTYBQ, P9V_BUILTIN_VPRTYBQ,
+    RS6000_BTI_UINTTI, RS6000_BTI_UINTTI, 0, 0 },
+
   { P8V_BUILTIN_VEC_VPKUDUM, P8V_BUILTIN_VPKUDUM,
     RS6000_BTI_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { P8V_BUILTIN_VEC_VPKUDUM, P8V_BUILTIN_VPKUDUM,
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 236663)
+++ gcc/config/rs6000/rs6000.md (.../gcc/config/rs6000) (working copy)
@@ -577,7 +577,9 @@ (define_mode_attr wd [(QI "b")
                       (V16QI "b")
                       (V8HI "h")
                       (V4SI "w")
-                      (V2DI "d")])
+                      (V2DI "d")
+                      (V1TI "q")
+                      (TI "q")])
 
 ;; How many bits in this mode?
 (define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")])
Index: gcc/config/rs6000/altivec.h
===================================================================
--- gcc/config/rs6000/altivec.h (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 236663)
+++ gcc/config/rs6000/altivec.h (.../gcc/config/rs6000) (working copy)
@@ -384,6 +384,23 @@
 #define vec_vupklsw __builtin_vec_vupklsw
 #endif
 
+#ifdef _ARCH_PWR9
+/* Vector additions added in ISA 3.0.  */
+#define vec_vctz __builtin_vec_vctz
+#define vec_cntlz __builtin_vec_vctz
+#define vec_vctzb __builtin_vec_vctzb
+#define vec_vctzd __builtin_vec_vctzd
+#define vec_vctzh __builtin_vec_vctzh
+#define vec_vctzw __builtin_vec_vctzw
+#define vec_vprtyb __builtin_vec_vprtyb
+#define vec_vprtybd __builtin_vec_vprtybd
+#define vec_vprtybw __builtin_vec_vprtybw
+
+#ifdef _ARCH_PPC64
+#define vec_vprtybq __builtin_vec_vprtybq
+#endif
+#endif
+
 /* Predicates.
    For C++, we use templates in order to allow non-parenthesized arguments.
    For C, instead, we use macros since non-parenthesized arguments were
Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/doc) (revision 236663)
+++ gcc/doc/extend.texi (.../gcc/doc) (working copy)
@@ -17287,6 +17287,60 @@ int __builtin_bcdsub_gt (vector __int128
 int __builtin_bcdsub_ov (vector __int128_t, vector__int128_t);
 @end smallexample
 
+If the ISA 3.00 additions to the vector/scalar (power9-vector)
+instruction set are available:
+
+@smallexample
+vector long long vec_vctz (vector long long);
+vector unsigned long long vec_vctz (vector unsigned long long);
+vector int vec_vctz (vector int);
+vector unsigned int vec_vctz (vector unsigned int);
+vector short vec_vctz (vector short);
+vector unsigned short vec_vctz (vector unsigned short);
+vector signed char vec_vctz (vector signed char);
+vector unsigned char vec_vctz (vector unsigned char);
+
+vector signed char vec_vctzb (vector signed char);
+vector unsigned char vec_vctzb (vector unsigned char);
+
+vector long long vec_vctzd (vector long long);
+vector unsigned long long vec_vctzd (vector unsigned long long);
+
+vector short vec_vctzh (vector short);
+vector unsigned short vec_vctzh (vector unsigned short);
+
+vector int vec_vctzw (vector int);
+vector unsigned int vec_vctzw (vector unsigned int);
+
+vector int vec_vprtyb (vector int);
+vector unsigned int vec_vprtyb (vector unsigned int);
+vector long long vec_vprtyb (vector long long);
+vector unsigned long long vec_vprtyb (vector unsigned long long);
+
+vector int vec_vprtybw (vector int);
+vector unsigned int vec_vprtybw (vector unsigned int);
+
+vector long long vec_vprtybd (vector long long);
+vector unsigned long long vec_vprtybd (vector unsigned long long);
+@end smallexample
+
+
+If the ISA 3.00 additions to the vector/scalar (power9-vector)
+instruction set are available
for 64-bit targets:
+
+@smallexample
+vector long vec_vprtyb (vector long);
+vector unsigned long vec_vprtyb (vector unsigned long);
+vector __int128_t vec_vprtyb (vector __int128_t);
+vector __uint128_t vec_vprtyb (vector __uint128_t);
+
+vector long vec_vprtybd (vector long);
+vector unsigned long vec_vprtybd (vector unsigned long);
+
+vector __int128_t vec_vprtybq (vector __int128_t);
+vector __uint128_t vec_vprtybq (vector __uint128_t);
+@end smallexample
+
 If the cryptographic instructions are enabled (@option{-mcrypto} or
 @option{-mcpu=power8}), the following builtins are enabled.
 
Index: gcc/testsuite/gcc.target/powerpc/p9-vparity.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p9-vparity.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p9-vparity.c (.../gcc/testsuite/gcc.target/powerpc) (revision 236664)
@@ -0,0 +1,107 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2 -mlra -mvsx-timode" } */
+
+#include <altivec.h>
+
+vector int
+parity_v4si_1s (vector int a)
+{
+  return vec_vprtyb (a);
+}
+
+vector int
+parity_v4si_2s (vector int a)
+{
+  return vec_vprtybw (a);
+}
+
+vector unsigned int
+parity_v4si_1u (vector unsigned int a)
+{
+  return vec_vprtyb (a);
+}
+
+vector unsigned int
+parity_v4si_2u (vector unsigned int a)
+{
+  return vec_vprtybw (a);
+}
+
+vector long long
+parity_v2di_1s (vector long long a)
+{
+  return vec_vprtyb (a);
+}
+
+vector long long
+parity_v2di_2s (vector long long a)
+{
+  return vec_vprtybd (a);
+}
+
+vector unsigned long long
+parity_v2di_1u (vector unsigned long long a)
+{
+  return vec_vprtyb (a);
+}
+
+vector unsigned long long
+parity_v2di_2u (vector unsigned long long a)
+{
+  return vec_vprtybd (a);
+}
+
+vector __int128_t
+parity_v1ti_1s (vector __int128_t a)
+{
+  return vec_vprtyb (a);
+}
+
+vector __int128_t
+parity_v1ti_2s (vector __int128_t a)
+{
+  return vec_vprtybq (a);
+}
+
+__int128_t
+parity_ti_3s (__int128_t a)
+{
+  return vec_vprtyb (a);
+}
+
+__int128_t
+parity_ti_4s (__int128_t a)
+{
+  return vec_vprtybq (a);
+}
+
+vector __uint128_t
+parity_v1ti_1u (vector __uint128_t a)
+{
+  return vec_vprtyb (a);
+}
+
+vector __uint128_t
+parity_v1ti_2u (vector __uint128_t a)
+{
+  return vec_vprtybq (a);
+}
+
+__uint128_t
+parity_ti_3u (__uint128_t a)
+{
+  return vec_vprtyb (a);
+}
+
+__uint128_t
+parity_ti_4u (__uint128_t a)
+{
+  return vec_vprtybq (a);
+}
+
+/* { dg-final { scan-assembler "vprtybd" } } */
+/* { dg-final { scan-assembler "vprtybq" } } */
+/* { dg-final { scan-assembler "vprtybw" } } */
Index: gcc/testsuite/gcc.target/powerpc/ctz-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ctz-3.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ctz-3.c (.../gcc/testsuite/gcc.target/powerpc) (revision 236664)
@@ -0,0 +1,62 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2 -ftree-vectorize -fvect-cost-model=dynamic -fno-unroll-loops -fno-unroll-all-loops" } */
+
+#ifndef SIZE
+#define SIZE 1024
+#endif
+
+#ifndef ALIGN
+#define ALIGN 32
+#endif
+
+#define ALIGN_ATTR __attribute__((__aligned__(ALIGN)))
+
+#define DO_BUILTIN(PREFIX, TYPE, CTZ)                                   \
+TYPE PREFIX ## _a[SIZE] ALIGN_ATTR;                                     \
+TYPE PREFIX ## _b[SIZE] ALIGN_ATTR;                                     \
+                                                                        \
+void                                                                    \
+PREFIX ## _ctz (void)                                                   \
+{                                                                       \
+  unsigned long i;                                                      \
+                                                                        \
+  for (i = 0; i < SIZE; i++)                                            \
+    PREFIX ## _a[i] = CTZ (PREFIX ## _b[i]);                            \
+}
+
+#if !defined(DO_LONG_LONG) && !defined(DO_LONG) && !defined(DO_INT) && !defined(DO_SHORT) && !defined(DO_CHAR)
+#define DO_INT 1
+#endif
+
+#if DO_LONG_LONG
+/* At the moment, only int is auto vectorized.  */
+DO_BUILTIN (sll, long long, __builtin_ctzll)
+DO_BUILTIN (ull, unsigned long long, __builtin_ctzll)
+#endif
+
+#if defined(_ARCH_PPC64) && DO_LONG
+DO_BUILTIN (sl, long, __builtin_ctzl)
+DO_BUILTIN (ul, unsigned long, __builtin_ctzl)
+#endif
+
+#if DO_INT
+DO_BUILTIN (si, int, __builtin_ctz)
+DO_BUILTIN (ui, unsigned int, __builtin_ctz)
+#endif
+
+#if DO_SHORT
+DO_BUILTIN (ss, short, __builtin_ctz)
+DO_BUILTIN (us, unsigned short, __builtin_ctz)
+#endif
+
+#if DO_CHAR
+DO_BUILTIN (sc, signed char, __builtin_ctz)
+DO_BUILTIN (uc, unsigned char, __builtin_ctz)
+#endif
+
+/* { dg-final { scan-assembler-times "vctzw" 2 } } */
+/* { dg-final { scan-assembler-not "cnttzd" } } */
+/* { dg-final { scan-assembler-not "cnttzw" } } */
Index: gcc/testsuite/gcc.target/powerpc/ctz-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ctz-4.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ctz-4.c (.../gcc/testsuite/gcc.target/powerpc) (revision 236664)
@@ -0,0 +1,110 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+vector signed char
+count_trailing_zeros_v16qi_1s (vector signed char a)
+{
+  return vec_vctz (a);
+}
+
+vector signed char
+count_trailing_zeros_v16qi_2s (vector signed char a)
+{
+  return vec_vctzb (a);
+}
+
+vector unsigned char
+count_trailing_zeros_v16qi_1u
(vector unsigned char a)
+{
+  return vec_vctz (a);
+}
+
+vector unsigned char
+count_trailing_zeros_v16qi_2u (vector unsigned char a)
+{
+  return vec_vctzb (a);
+}
+
+vector short
+count_trailing_zeros_v8hi_1s (vector short a)
+{
+  return vec_vctz (a);
+}
+
+vector short
+count_trailing_zeros_v8hi_2s (vector short a)
+{
+  return vec_vctzh (a);
+}
+
+vector unsigned short
+count_trailing_zeros_v8hi_1u (vector unsigned short a)
+{
+  return vec_vctz (a);
+}
+
+vector unsigned short
+count_trailing_zeros_v8hi_2u (vector unsigned short a)
+{
+  return vec_vctzh (a);
+}
+
+vector int
+count_trailing_zeros_v4si_1s (vector int a)
+{
+  return vec_vctz (a);
+}
+
+vector int
+count_trailing_zeros_v4si_2s (vector int a)
+{
+  return vec_vctzw (a);
+}
+
+vector unsigned int
+count_trailing_zeros_v4si_1u (vector unsigned int a)
+{
+  return vec_vctz (a);
+}
+
+vector unsigned int
+count_trailing_zeros_v4si_2u (vector unsigned int a)
+{
+  return vec_vctzw (a);
+}
+
+vector long long
+count_trailing_zeros_v2di_1s (vector long long a)
+{
+  return vec_vctz (a);
+}
+
+vector long long
+count_trailing_zeros_v2di_2s (vector long long a)
+{
+  return vec_vctzd (a);
+}
+
+vector unsigned long long
+count_trailing_zeros_v2di_1u (vector unsigned long long a)
+{
+  return vec_vctz (a);
+}
+
+vector unsigned long long
+count_trailing_zeros_v2di_2u (vector unsigned long long a)
+{
+  return vec_vctzd (a);
+}
+
+/* { dg-final { scan-assembler "vctzb" } } */
+/* { dg-final { scan-assembler "vctzd" } } */
+/* { dg-final { scan-assembler "vctzh" } } */
+/* { dg-final { scan-assembler "vctzw" } } */
+/* { dg-final { scan-assembler-not "cnttzd" } } */
+/* { dg-final { scan-assembler-not "cnttzw" } } */
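
As a rough illustration of what the overload entries in rs6000-c.c accomplish (one generic name, `vec_vctz`, resolved to a width-specific builtin by argument type), here is a portable analogy using C11 `_Generic`. It is only a sketch: the `elt_*` names are hypothetical, scalar unsigned integer types stand in for the vector types, and GCC's actual overload resolution is table-driven rather than built on `_Generic`. Returning the element width for a zero input is likewise an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: count trailing zero bits in the low WIDTH bits
   of X.  Yields WIDTH when none of those bits is set (an assumption).  */
unsigned
elt_ctz_bits (uint64_t x, unsigned width)
{
  unsigned n = 0;

  while (n < width && ((x >> n) & 1) == 0)
    n++;
  return n;
}

/* Width-specific stand-ins, analogous to vctzb/vctzh/vctzw/vctzd.  */
uint8_t  elt_ctzb (uint8_t x)  { return (uint8_t) elt_ctz_bits (x, 8); }
uint16_t elt_ctzh (uint16_t x) { return (uint16_t) elt_ctz_bits (x, 16); }
uint32_t elt_ctzw (uint32_t x) { return elt_ctz_bits (x, 32); }
uint64_t elt_ctzd (uint64_t x) { return elt_ctz_bits (x, 64); }

/* Generic name that picks the width-specific function from the static
   type of its argument, in the spirit of the vec_vctz overloads.  */
#define elt_ctz(x) _Generic ((x),       \
    uint8_t:  elt_ctzb,                 \
    uint16_t: elt_ctzh,                 \
    uint32_t: elt_ctzw,                 \
    uint64_t: elt_ctzd) (x)

/* Small self-check of the dispatch.  */
void
elt_ctz_selfcheck (void)
{
  assert (elt_ctz ((uint8_t) 0x10) == 4);
  assert (elt_ctz ((uint32_t) 0) == 32);
}
```

The same value therefore gives different answers depending on its type: `elt_ctz ((uint8_t) 0)` yields 8 while `elt_ctz ((uint32_t) 0)` yields 32, which is why ctz-4.c scans for each of the four mnemonics separately.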