From patchwork Mon Jun 15 23:37:47 2020
X-Patchwork-Submitter: Carl Love
X-Patchwork-Id: 1309875
Subject: [PATCH 1/6 ver 2] rs6000, Update support for vec_extract
To: segher@gcc.gnu.org, dje.gcc@gmail.com, gcc-patches@gcc.gnu.org
Date: Mon, 15 Jun 2020 16:37:47 -0700
List-Id: Gcc-patches mailing list
From: Carl Love

v2 changes

config/rs6000/altivec.md log entry for move from changed as suggested.

config/rs6000/vsx.md log entry for moved to here changed as suggested.

define_mode_iterator VI2 also moved, included in both change log entries
--------------------------------------------
GCC maintainers:

Move the existing vector extract support in altivec.md to vsx.md so all of the vector insert and extract support is in the same file.
The patch also updates the name of the builtins and descriptions for the builtins in the documentation file so they match the approved builtin names and descriptions. The patch does not make any functional changes. Please let me know if the changes are acceptable. Thanks. Carl Love ------------------------------------------------------ gcc/ChangeLog 2020-06-15 Carl Love * config/rs6000/altivec.md: (UNSPEC_EXTRACTL, UNSPEC_EXTRACTR) (vextractl, vextractr) (vextractl_internal, vextractr_internal) (VI2): Move to gcc/config/rs6000/vsx.md. * config/rs6000/vsx.md: (UNSPEC_EXTRACTL, UNSPEC_EXTRACTR) (vextractl, vextractr) (vextractl_internal, vextractr_internal) (VI2): Code was moved from config/rs6000/altivec.md. * gcc/doc/extend.texi: Update documentation for vec_extractl. Replace builtin name vec_extractr with vec_extracth. Update description of vec_extracth. --- gcc/config/rs6000/altivec.md | 64 ------------------------------- gcc/config/rs6000/vsx.md | 66 ++++++++++++++++++++++++++++++++ gcc/doc/extend.texi | 73 +++++++++++++++++------------------- 3 files changed, 101 insertions(+), 102 deletions(-) diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 159f24ebc10..0b0b49ee056 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -171,8 +171,6 @@ UNSPEC_XXEVAL UNSPEC_VSTRIR UNSPEC_VSTRIL - UNSPEC_EXTRACTL - UNSPEC_EXTRACTR ]) (define_c_enum "unspecv" @@ -183,8 +181,6 @@ UNSPECV_DSS ]) -;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops -(define_mode_iterator VI2 [V4SI V8HI V16QI V2DI]) ;; Short vec int modes (define_mode_iterator VIshort [V8HI V16QI]) ;; Longer vec int modes for rotate/mask ops @@ -785,66 +781,6 @@ DONE; }) -(define_expand "vextractl" - [(set (match_operand:V2DI 0 "altivec_register_operand") - (unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand") - (match_operand:VI2 2 "altivec_register_operand") - (match_operand:SI 3 "register_operand")] - UNSPEC_EXTRACTL))] - 
"TARGET_FUTURE" -{ - if (BYTES_BIG_ENDIAN) - { - emit_insn (gen_vextractl_internal (operands[0], operands[1], - operands[2], operands[3])); - emit_insn (gen_xxswapd_v2di (operands[0], operands[0])); - } - else - emit_insn (gen_vextractr_internal (operands[0], operands[2], - operands[1], operands[3])); - DONE; -}) - -(define_insn "vextractl_internal" - [(set (match_operand:V2DI 0 "altivec_register_operand" "=v") - (unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v") - (match_operand:VEC_I 2 "altivec_register_operand" "v") - (match_operand:SI 3 "register_operand" "r")] - UNSPEC_EXTRACTL))] - "TARGET_FUTURE" - "vextvlx %0,%1,%2,%3" - [(set_attr "type" "vecsimple")]) - -(define_expand "vextractr" - [(set (match_operand:V2DI 0 "altivec_register_operand") - (unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand") - (match_operand:VI2 2 "altivec_register_operand") - (match_operand:SI 3 "register_operand")] - UNSPEC_EXTRACTR))] - "TARGET_FUTURE" -{ - if (BYTES_BIG_ENDIAN) - { - emit_insn (gen_vextractr_internal (operands[0], operands[1], - operands[2], operands[3])); - emit_insn (gen_xxswapd_v2di (operands[0], operands[0])); - } - else - emit_insn (gen_vextractl_internal (operands[0], operands[2], - operands[1], operands[3])); - DONE; -}) - -(define_insn "vextractr_internal" - [(set (match_operand:V2DI 0 "altivec_register_operand" "=v") - (unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v") - (match_operand:VEC_I 2 "altivec_register_operand" "v") - (match_operand:SI 3 "register_operand" "r")] - UNSPEC_EXTRACTR))] - "TARGET_FUTURE" - "vextvrx %0,%1,%2,%3" - [(set_attr "type" "vecsimple")]) - (define_expand "vstrir_" [(set (match_operand:VIshort 0 "altivec_register_operand") (unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")] diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 2a28215ac5b..51ffe2d2000 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -344,8 +344,13 @@ 
UNSPEC_VSX_FIRST_MISMATCH_INDEX UNSPEC_VSX_FIRST_MISMATCH_EOS_INDEX UNSPEC_XXGENPCV + UNSPEC_EXTRACTL + UNSPEC_EXTRACTR ]) +;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops +(define_mode_iterator VI2 [V4SI V8HI V16QI V2DI]) + ;; VSX moves ;; The patterns for LE permuted loads and stores come before the general @@ -3781,6 +3786,67 @@ } [(set_attr "type" "load")]) +;; ISA 3.1 extract +(define_expand "vextractl" + [(set (match_operand:V2DI 0 "altivec_register_operand") + (unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand") + (match_operand:VI2 2 "altivec_register_operand") + (match_operand:SI 3 "register_operand")] + UNSPEC_EXTRACTL))] + "TARGET_FUTURE" +{ + if (BYTES_BIG_ENDIAN) + { + emit_insn (gen_vextractl_internal (operands[0], operands[1], + operands[2], operands[3])); + emit_insn (gen_xxswapd_v2di (operands[0], operands[0])); + } + else + emit_insn (gen_vextractr_internal (operands[0], operands[2], + operands[1], operands[3])); + DONE; +}) + +(define_insn "vextractl_internal" + [(set (match_operand:V2DI 0 "altivec_register_operand" "=v") + (unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v") + (match_operand:VEC_I 2 "altivec_register_operand" "v") + (match_operand:SI 3 "register_operand" "r")] + UNSPEC_EXTRACTL))] + "TARGET_FUTURE" + "vextvlx %0,%1,%2,%3" + [(set_attr "type" "vecsimple")]) + +(define_expand "vextractr" + [(set (match_operand:V2DI 0 "altivec_register_operand") + (unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand") + (match_operand:VI2 2 "altivec_register_operand") + (match_operand:SI 3 "register_operand")] + UNSPEC_EXTRACTR))] + "TARGET_FUTURE" +{ + if (BYTES_BIG_ENDIAN) + { + emit_insn (gen_vextractr_internal (operands[0], operands[1], + operands[2], operands[3])); + emit_insn (gen_xxswapd_v2di (operands[0], operands[0])); + } + else + emit_insn (gen_vextractl_internal (operands[0], operands[2], + operands[1], operands[3])); + DONE; +}) + +(define_insn "vextractr_internal" + [(set 
(match_operand:V2DI 0 "altivec_register_operand" "=v") + (unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v") + (match_operand:VEC_I 2 "altivec_register_operand" "v") + (match_operand:SI 3 "register_operand" "r")] + UNSPEC_EXTRACTR))] + "TARGET_FUTURE" + "vextvrx %0,%1,%2,%3" + [(set_attr "type" "vecsimple")]) + ;; VSX_EXTRACT optimizations ;; Optimize double d = (double) vec_extract (vi, ) ;; Get the element into the top position and use XVCVSWDP/XVCVUWDP diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index e656e66a80c..5549a695b42 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -20919,6 +20919,9 @@ Perform a 128-bit vector gather operation, as if implemented by the Future integer value between 2 and 7 inclusive. @findex vec_gnb + +Vector Extract + @smallexample @exdent vector unsigned long long int @exdent vec_extractl (vector unsigned char, vector unsigned char, unsigned int) @@ -20929,51 +20932,45 @@ integer value between 2 and 7 inclusive. @exdent vector unsigned long long int @exdent vec_extractl (vector unsigned long long, vector unsigned long long, unsigned int) @end smallexample -Extract a single element from the vector formed by catenating this function's -first two arguments at the byte offset specified by this function's -third argument. On big-endian targets, this function behaves as if -implemented by the Future @code{vextdubvlx}, @code{vextduhvlx}, -@code{vextduwvlx}, or @code{vextddvlx} instructions, depending on the -types of the function's first two arguments. On little-endian -targets, this function behaves as if implemented by the Future -@code{vextdubvrx}, @code{vextduhvrx}, -@code{vextduwvrx}, or @code{vextddvrx} instructions. -The byte offset of the element to be extracted is calculated -by computing the remainder of dividing the third argument by 32. 
-If this reminader value is not a multiple of the vector element size, -or if its value added to the vector element size exceeds 32, the -result is undefined. +Extract an element from two concatenated vectors starting at the given byte index +in natural-endian order, and place it zero-extended in doubleword 1 of the result +according to natural element order. If the byte index is out of range for the +data type, the intrinsic will be rejected. +For little-endian, this output will match the placement by the hardware +instruction, i.e., dword[0] in RTL notation. For big-endian, an additional +instruction is needed to move it from the "left" doubleword to the "right" one. +For little-endian, semantics matching the vextdu*vrx instruction will be +generated, while for big-endian, semantics matching the vextdu*vlx instruction +will be generated. Note that some fairly anomalous results can be generated if +the byte index is not aligned on an element boundary for the element being +extracted. This is a limitation of the bi-endian vector programming model +consistent with the limitation on vec_perm, for example.
@findex vec_extractl @smallexample @exdent vector unsigned long long int -@exdent vec_extractr (vector unsigned char, vector unsigned char, unsigned int) +@exdent vec_extracth (vector unsigned char, vector unsigned char, unsigned int) @exdent vector unsigned long long int -@exdent vec_extractr (vector unsigned short, vector unsigned short, unsigned int) +@exdent vec_extracth (vector unsigned short, vector unsigned short, unsigned int) @exdent vector unsigned long long int -@exdent vec_extractr (vector unsigned int, vector unsigned int, unsigned int) +@exdent vec_extracth (vector unsigned int, vector unsigned int, unsigned int) @exdent vector unsigned long long int -@exdent vec_extractr (vector unsigned long long, vector unsigned long long, unsigned int) -@end smallexample -Extract a single element from the vector formed by catenating this function's -first two arguments at the byte offset calculated by subtracting this -function's third argument from 31. On big-endian targets, this -function behaves as if -implemented by the Future -@code{vextdubvrx}, @code{vextduhvrx}, -@code{vextduwvrx}, or @code{vextddvrx} instructions, depending on the -types of the function's first two arguments. -On little-endian -targets, this function behaves as if implemented by the Future -@code{vextdubvlx}, @code{vextduhvlx}, -@code{vextduwvlx}, or @code{vextddvlx} instructions. -The byte offset of the element to be extracted, measured from the -right end of the catenation of the two vector arguments, is calculated -by computing the remainder of dividing the third argument by 32. -If this reminader value is not a multiple of the vector element size, -or if its value added to the vector element size exceeds 32, the -result is undefined. 
-@findex vec_extractr +@exdent vec_extracth (vector unsigned long long, vector unsigned long long, unsigned int) +@end smallexample +Extract an element from two concatenated vectors starting at the given byte index +in opposite-endian order, and place it zero-extended in doubleword 1 according to +natural element order. If the byte index is out of range for the data type, +the intrinsic will be rejected. For little-endian, +this output will match the placement by the hardware instruction, i.e., dword[0] +in RTL notation. For big-endian, an additional instruction is needed to move it +from the "left" doubleword to the "right" one. For little-endian, semantics +matching the vextdu*vlx instruction will be generated, while for big-endian, +semantics matching the vextdu*vrx instruction will be generated. Note that some +fairly anomalous results can be generated if the byte index is not aligned on the +element boundary for the element being extracted. This is a +limitation of the bi-endian vector programming model consistent with the +limitation on vec_perm, for example. 
+@findex vec_extracth @smallexample @exdent vector unsigned long long int

From patchwork Mon Jun 15 23:37:53 2020
X-Patchwork-Submitter: Carl Love
X-Patchwork-Id: 1309876
Subject: [PATCH 2/6 ver 2] rs6000 Add vector insert builtin support
To: segher@gcc.gnu.org, dje.gcc@gmail.com, gcc-patches@gcc.gnu.org
Date: Mon, 15 Jun 2020 16:37:53 -0700
List-Id: Gcc-patches mailing list
From: Carl Love

v2 changes

Fix change log entry for config/rs6000/altivec.h
Fix change log entry for config/rs6000/rs6000-builtin.def
Fix change log entry for config/rs6000/rs6000-call.c
vsx.md: Fixed if (BYTES_BIG_ENDIAN) else statements. Porting error from pu branch.
--------------------------------------------------------------- GCC maintainers: This patch adds support for vec_insertl and vec_inserth builtins. The patch has been compiled and tested on powerpc64le-unknown-linux-gnu (Power 9 LE) and mambo with no regression errors. Please let me know if this patch is acceptable for the mainline branch. Thanks. Carl Love -------------------------------------------------------------- gcc/ChangeLog 2020-06-15 Carl Love * config/rs6000/altivec.h (vec_insertl, vec_inserth): New defines. * config/rs6000/rs6000-builtin.def (VINSERTGPRBL, VINSERTGPRHL, VINSERTGPRWL, VINSERTGPRDL, VINSERTVPRBL, VINSERTVPRHL, VINSERTVPRWL, VINSERTGPRBR, VINSERTGPRHR, VINSERTGPRWR, VINSERTGPRDR, VINSERTVPRBR, VINSERTVPRHR, VINSERTVPRWR): New builtins. (INSERTL, INSERTH): New builtins. * config/rs6000/rs6000-call.c (FUTURE_BUILTIN_VEC_INSERTL, FUTURE_BUILTIN_VEC_INSERTH): New Overloaded definitions. (FUTURE_BUILTIN_VINSERTGPRBL, FUTURE_BUILTIN_VINSERTGPRHL, FUTURE_BUILTIN_VINSERTGPRWL, FUTURE_BUILTIN_VINSERTGPRDL, FUTURE_BUILTIN_VINSERTVPRBL, FUTURE_BUILTIN_VINSERTVPRHL, FUTURE_BUILTIN_VINSERTVPRWL): Add case entries. * config/rs6000/vsx.md (define_c_enum): Add UNSPEC_INSERTL, UNSPEC_INSERTR. (define_expand): Add vinsertvl_, vinsertvr_, vinsertgl_, vinsertgr_, mode is VI2. (define_insn): vinsertvl_internal_, vinsertvr_internal_, vinsertgl_internal_, vinsertgr_internal_, mode VEC_I. * doc/extend.texi: Add documentation for vec_insertl, vec_inserth. gcc/testsuite/ChangeLog 2020-06-15 Carl Love * gcc.target/powerpc/vec-insert-word-runnable.c: New test case.
--- gcc/config/rs6000/altivec.h | 2 + gcc/config/rs6000/rs6000-builtin.def | 18 + gcc/config/rs6000/rs6000-call.c | 51 +++ gcc/config/rs6000/vsx.md | 110 ++++++ gcc/doc/extend.texi | 73 ++++ .../powerpc/vec-insert-word-runnable.c | 345 ++++++++++++++++++ 6 files changed, 599 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h index 0a7e8ab3647..936aeb1ee09 100644 --- a/gcc/config/rs6000/altivec.h +++ b/gcc/config/rs6000/altivec.h @@ -699,6 +699,8 @@ __altivec_scalar_pred(vec_any_nle, /* Overloaded built-in functions for future architecture. */ #define vec_extractl(a, b, c) __builtin_vec_extractl (a, b, c) #define vec_extracth(a, b, c) __builtin_vec_extracth (a, b, c) +#define vec_insertl(a, b, c) __builtin_vec_insertl (a, b, c) +#define vec_inserth(a, b, c) __builtin_vec_inserth (a, b, c) #define vec_gnb(a, b) __builtin_vec_gnb (a, b) #define vec_clrl(a, b) __builtin_vec_clrl (a, b) diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index 8b1ddb00045..c5bd4f86555 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -2627,6 +2627,22 @@ BU_FUTURE_V_3 (VEXTRACTHR, "vextduhvhx", CONST, vextractrv8hi) BU_FUTURE_V_3 (VEXTRACTWR, "vextduwvhx", CONST, vextractrv4si) BU_FUTURE_V_3 (VEXTRACTDR, "vextddvhx", CONST, vextractrv2di) +BU_FUTURE_V_3 (VINSERTGPRBL, "vinsgubvlx", CONST, vinsertgl_v16qi) +BU_FUTURE_V_3 (VINSERTGPRHL, "vinsguhvlx", CONST, vinsertgl_v8hi) +BU_FUTURE_V_3 (VINSERTGPRWL, "vinsguwvlx", CONST, vinsertgl_v4si) +BU_FUTURE_V_3 (VINSERTGPRDL, "vinsgudvlx", CONST, vinsertgl_v2di) +BU_FUTURE_V_3 (VINSERTVPRBL, "vinsvubvlx", CONST, vinsertvl_v16qi) +BU_FUTURE_V_3 (VINSERTVPRHL, "vinsvuhvlx", CONST, vinsertvl_v8hi) +BU_FUTURE_V_3 (VINSERTVPRWL, "vinsvuwvlx", CONST, vinsertvl_v4si) + +BU_FUTURE_V_3 (VINSERTGPRBR, "vinsgubvrx", CONST, vinsertgr_v16qi) +BU_FUTURE_V_3 
(VINSERTGPRHR, "vinsguhvrx", CONST, vinsertgr_v8hi) +BU_FUTURE_V_3 (VINSERTGPRWR, "vinsguwvrx", CONST, vinsertgr_v4si) +BU_FUTURE_V_3 (VINSERTGPRDR, "vinsgudvrx", CONST, vinsertgr_v2di) +BU_FUTURE_V_3 (VINSERTVPRBR, "vinsvubvrx", CONST, vinsertvr_v16qi) +BU_FUTURE_V_3 (VINSERTVPRHR, "vinsvuhvrx", CONST, vinsertvr_v8hi) +BU_FUTURE_V_3 (VINSERTVPRWR, "vinsvuwvrx", CONST, vinsertvr_v4si) + BU_FUTURE_V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi) BU_FUTURE_V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi) BU_FUTURE_V_1 (VSTRIBL, "vstribl", CONST, vstril_v16qi) @@ -2646,6 +2662,8 @@ BU_FUTURE_OVERLOAD_2 (XXGENPCVM, "xxgenpcvm") BU_FUTURE_OVERLOAD_3 (EXTRACTL, "extractl") BU_FUTURE_OVERLOAD_3 (EXTRACTH, "extracth") +BU_FUTURE_OVERLOAD_3 (INSERTL, "insertl") +BU_FUTURE_OVERLOAD_3 (INSERTH, "inserth") BU_FUTURE_OVERLOAD_1 (VSTRIR, "strir") BU_FUTURE_OVERLOAD_1 (VSTRIL, "stril") diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c index 817a14c9c0d..abbe00030ea 100644 --- a/gcc/config/rs6000/rs6000-call.c +++ b/gcc/config/rs6000/rs6000-call.c @@ -5567,6 +5567,28 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = { RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_INSERTL, FUTURE_BUILTIN_VINSERTGPRBL, + RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI, + RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTSI }, + { FUTURE_BUILTIN_VEC_INSERTL, FUTURE_BUILTIN_VINSERTGPRHL, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTHI, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTSI }, + { FUTURE_BUILTIN_VEC_INSERTL, FUTURE_BUILTIN_VINSERTGPRWL, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTSI, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTSI }, + { FUTURE_BUILTIN_VEC_INSERTL, FUTURE_BUILTIN_VINSERTGPRDL, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTDI, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTSI }, + { FUTURE_BUILTIN_VEC_INSERTL, FUTURE_BUILTIN_VINSERTVPRBL, + RS6000_BTI_unsigned_V16QI, 
RS6000_BTI_unsigned_V16QI, + RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_INSERTL, FUTURE_BUILTIN_VINSERTVPRHL, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_INSERTL, FUTURE_BUILTIN_VINSERTVPRWL, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_EXTRACTH, FUTURE_BUILTIN_VEXTRACTBR, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI }, @@ -5580,6 +5602,28 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = { RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_INSERTH, FUTURE_BUILTIN_VINSERTGPRBR, + RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI, + RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTSI }, + { FUTURE_BUILTIN_VEC_INSERTH, FUTURE_BUILTIN_VINSERTGPRHR, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTHI, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTSI }, + { FUTURE_BUILTIN_VEC_INSERTH, FUTURE_BUILTIN_VINSERTGPRWR, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTSI, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTSI }, + { FUTURE_BUILTIN_VEC_INSERTH, FUTURE_BUILTIN_VINSERTGPRDR, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTDI, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTSI }, + { FUTURE_BUILTIN_VEC_INSERTH, FUTURE_BUILTIN_VINSERTVPRBR, + RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, + RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_INSERTH, FUTURE_BUILTIN_VINSERTVPRHR, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_INSERTH, FUTURE_BUILTIN_VINSERTVPRWR, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_VSTRIL, FUTURE_BUILTIN_VSTRIBL, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 }, 
{ FUTURE_BUILTIN_VEC_VSTRIL, FUTURE_BUILTIN_VSTRIBL, @@ -13291,6 +13335,13 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0, case FUTURE_BUILTIN_VEXTRACTHR: case FUTURE_BUILTIN_VEXTRACTWR: case FUTURE_BUILTIN_VEXTRACTDR: + case FUTURE_BUILTIN_VINSERTGPRBL: + case FUTURE_BUILTIN_VINSERTGPRHL: + case FUTURE_BUILTIN_VINSERTGPRWL: + case FUTURE_BUILTIN_VINSERTGPRDL: + case FUTURE_BUILTIN_VINSERTVPRBL: + case FUTURE_BUILTIN_VINSERTVPRHL: + case FUTURE_BUILTIN_VINSERTVPRWL: h.uns_p[0] = 1; h.uns_p[1] = 1; h.uns_p[2] = 1; diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 51ffe2d2000..6ce93f14dec 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -346,6 +346,8 @@ UNSPEC_XXGENPCV UNSPEC_EXTRACTL UNSPEC_EXTRACTR + UNSPEC_INSERTL + UNSPEC_INSERTR ]) ;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops @@ -3847,6 +3849,114 @@ "vextvrx %0,%1,%2,%3" [(set_attr "type" "vecsimple")]) +(define_expand "vinsertvl_" + [(set (match_operand:VI2 0 "altivec_register_operand") + (unspec:VI2 [(match_operand:VI2 1 "altivec_register_operand") + (match_operand:VI2 2 "altivec_register_operand") + (match_operand:SI 3 "register_operand" "r")] + UNSPEC_INSERTL))] + "TARGET_FUTURE" +{ + if (BYTES_BIG_ENDIAN) + emit_insn (gen_vinsertvl_internal_ (operands[0], operands[3], + operands[1], operands[2])); + else + emit_insn (gen_vinsertvr_internal_ (operands[0], operands[3], + operands[1], operands[2])); + DONE; +}) + +(define_insn "vinsertvl_internal_" + [(set (match_operand:VEC_I 0 "altivec_register_operand" "=v") + (unspec:VEC_I [(match_operand:SI 1 "register_operand" "r") + (match_operand:VEC_I 2 "altivec_register_operand" "v") + (match_operand:VEC_I 3 "altivec_register_operand" "0")] + UNSPEC_INSERTL))] + "TARGET_FUTURE" + "vinsvlx %0,%1,%2" + [(set_attr "type" "vecsimple")]) + +(define_expand "vinsertvr_" + [(set (match_operand:VI2 0 "altivec_register_operand") + (unspec:VI2 [(match_operand:VI2 1 
"altivec_register_operand") + (match_operand:VI2 2 "altivec_register_operand") + (match_operand:SI 3 "register_operand" "r")] + UNSPEC_INSERTR))] + "TARGET_FUTURE" +{ + if (BYTES_BIG_ENDIAN) + emit_insn (gen_vinsertvr_internal_ (operands[0], operands[3], + operands[1], operands[2])); + else + emit_insn (gen_vinsertvl_internal_ (operands[0], operands[3], + operands[1], operands[2])); + DONE; +}) + +(define_insn "vinsertvr_internal_" + [(set (match_operand:VEC_I 0 "altivec_register_operand" "=v") + (unspec:VEC_I [(match_operand:SI 1 "register_operand" "r") + (match_operand:VEC_I 2 "altivec_register_operand" "v") + (match_operand:VEC_I 3 "altivec_register_operand" "0")] + UNSPEC_INSERTR))] + "TARGET_FUTURE" + "vinsvrx %0,%1,%2" + [(set_attr "type" "vecsimple")]) + +(define_expand "vinsertgl_" + [(set (match_operand:VI2 0 "altivec_register_operand") + (unspec:VI2 [(match_operand:SI 1 "register_operand") + (match_operand:VI2 2 "altivec_register_operand") + (match_operand:SI 3 "register_operand")] + UNSPEC_INSERTL))] + "TARGET_FUTURE" +{ + if (BYTES_BIG_ENDIAN) + emit_insn (gen_vinsertgl_internal_ (operands[0], operands[3], + operands[1], operands[2])); + else + emit_insn (gen_vinsertgr_internal_ (operands[0], operands[3], + operands[1], operands[2])); + DONE; + }) + +(define_insn "vinsertgl_internal_" + [(set (match_operand:VEC_I 0 "altivec_register_operand" "=v") + (unspec:VEC_I [(match_operand:SI 1 "register_operand" "r") + (match_operand:SI 2 "register_operand" "r") + (match_operand:VEC_I 3 "altivec_register_operand" "0")] + UNSPEC_INSERTL))] + "TARGET_FUTURE" + "vinslx %0,%1,%2" + [(set_attr "type" "vecsimple")]) + +(define_expand "vinsertgr_" + [(set (match_operand:VI2 0 "altivec_register_operand") + (unspec:VI2 [(match_operand:SI 1 "register_operand") + (match_operand:VI2 2 "altivec_register_operand") + (match_operand:SI 3 "register_operand")] + UNSPEC_INSERTR))] + "TARGET_FUTURE" +{ + if (BYTES_BIG_ENDIAN) + emit_insn (gen_vinsertgr_internal_ (operands[0], 
operands[3], + operands[1], operands[2])); + else + emit_insn (gen_vinsertgl_internal_ (operands[0], operands[3], + operands[1], operands[2])); + DONE; + }) + +(define_insn "vinsertgr_internal_" + [(set (match_operand:VEC_I 0 "altivec_register_operand" "=v") + (unspec:VEC_I [(match_operand:SI 1 "register_operand" "r") + (match_operand:SI 2 "register_operand" "r") + (match_operand:VEC_I 3 "altivec_register_operand" "0")] + UNSPEC_INSERTR))] + "TARGET_FUTURE" + "vinsrx %0,%1,%2" + [(set_attr "type" "vecsimple")]) + ;; VSX_EXTRACT optimizations ;; Optimize double d = (double) vec_extract (vi, ) ;; Get the element into the top position and use XVCVSWDP/XVCVUWDP diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 5549a695b42..8931c7950f6 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -20972,6 +20972,79 @@ limitation of the bi-endian vector programming model consistent with the limitation on vec_perm, for example. @findex vec_extracth +Vector Insert + +@smallexample +@exdent vector unsigned char +@exdent vec_insertl (unsigned char, vector unsigned char, unsigned int); +@exdent vector unsigned short +@exdent vec_insertl (unsigned short, vector unsigned short, unsigned int); +@exdent vector unsigned int +@exdent vec_insertl (unsigned int, vector unsigned int, unsigned int); +@exdent vector unsigned long long +@exdent vec_insertl (unsigned long long, vector unsigned long long, +unsigned int); +@exdent vector unsigned char +@exdent vec_insertl (vector unsigned char, vector unsigned char, unsigned int); +@exdent vector unsigned short +@exdent vec_insertl (vector unsigned short, vector unsigned short, +unsigned int); +@exdent vector unsigned int +@exdent vec_insertl (vector unsigned int, vector unsigned int, unsigned int); +@end smallexample + +Let src be the first argument, when the first argument is a scalar, or the +rightmost element of the left doubleword of the first argument, when the first +argument is a vector.
Insert src into the second argument at the position +identified by the third argument, using natural element order in the second +argument, and leaving the rest of the second argument unchanged. If the byte +index is greater than 14 for halfwords, 12 for words, or 8 for doublewords, +the intrinsic will be rejected. Note that the underlying hardware instruction +uses the same register for the second argument and the result, but this is +hidden by the built-in. For little-endian, the generated code will be +semantically equivalent to vins*rx, while for big-endian it will be +semantically equivalent to vins*lx. Note that some fairly anomalous results +can be generated if the byte index is not aligned on an element boundary for +the sort of element being inserted. This is a limitation of the bi-endian +vector programming model consistent with the limitation on vec_perm, +for example. +@findex vec_insertl + +@smallexample +@exdent vector unsigned char +@exdent vec_inserth (unsigned char, vector unsigned char, unsigned int); +@exdent vector unsigned short +@exdent vec_inserth (unsigned short, vector unsigned short, unsigned int); +@exdent vector unsigned int +@exdent vec_inserth (unsigned int, vector unsigned int, unsigned int); +@exdent vector unsigned long long +@exdent vec_inserth (unsigned long long, vector unsigned long long, +unsigned int); +@exdent vector unsigned char +@exdent vec_inserth (vector unsigned char, vector unsigned char, unsigned int); +@exdent vector unsigned short +@exdent vec_inserth (vector unsigned short, vector unsigned short, +unsigned int); +@exdent vector unsigned int +@exdent vec_inserth (vector unsigned int, vector unsigned int, unsigned int); +@end smallexample + +Let src be the first argument, when the first argument is a scalar, or the +rightmost element of the first argument, when the first argument is a vector.
+Insert src into the second argument at the position identified by the third +argument, using opposite element order in the second argument, and leaving the +rest of the second argument unchanged. If the byte index is greater than 14 +for halfwords, 12 for words, or 8 for doublewords, the intrinsic will be +rejected. Note that the underlying hardware instruction uses the same register +for the second argument and the result, but this is hidden by the built-in. +For little-endian, the code generation will be semantically equivalent to +vins*lx, while for big-endian it will be semantically equivalent to vins*rx. +Note that some fairly anomalous results can be generated if the byte index is +not aligned on an element boundary for the sort of element being inserted. +This is a limitation of the bi-endian vector programming model consistent with +the limitation on vec_perm, for example. +@findex vec_inserth + @smallexample @exdent vector unsigned long long int @exdent vec_pdep (vector unsigned long long int, vector unsigned long long int) diff --git a/gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c new file mode 100644 index 00000000000..3fc68e9d7c7 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c @@ -0,0 +1,345 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_future_hw } */ +/* { dg-options "-mdejagnu-cpu=future" } */ +#include + +#define DEBUG 1 + +#ifdef DEBUG +#include +#endif + +extern void abort (void); + +int +main (int argc, char *argv []) +{ + int i; + unsigned int index; + vector unsigned char vresult_ch; + vector unsigned char expected_vresult_ch; + vector unsigned char src_va_ch; + vector unsigned char src_vb_ch; + unsigned char src_a_ch; + + vector unsigned short vresult_sh; + vector unsigned short expected_vresult_sh; + vector unsigned short src_va_sh; + vector unsigned short src_vb_sh; + unsigned short int src_a_sh; + + vector unsigned 
int vresult_int; + vector unsigned int expected_vresult_int; + vector unsigned int src_va_int; + vector unsigned int src_vb_int; + unsigned int src_a_int; + + vector unsigned long long vresult_ll; + vector unsigned long long expected_vresult_ll; + vector unsigned long long src_va_ll; + unsigned long long int src_a_ll; + + /* Vector insert, low index, from GPR */ + src_a_ch = 79; + index = 2; + src_va_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15 }; + vresult_ch = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_ch = (vector unsigned char) { 0, 1, 79, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15 }; + + vresult_ch = vec_insertl (src_a_ch, src_va_ch, index); + + if (!vec_all_eq (vresult_ch, expected_vresult_ch)) { +#if DEBUG + printf("ERROR, vec_insertl (src_a_ch, src_va_ch, index)\n"); + for(i = 0; i < 16; i++) + printf(" vresult_ch[%d] = %d, expected_vresult_ch[%d] = %d\n", + i, vresult_ch[i], i, expected_vresult_ch[i]); +#else + abort(); +#endif + } + + src_a_sh = 79; + index = 10; + src_va_sh = (vector unsigned short int) { 0, 1, 2, 3, 4, 5, 6, 7 }; + vresult_sh = (vector unsigned short int) { 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_sh = (vector unsigned short int) { 0, 1, 2, 3, + 4, 79, 6, 7 }; + + vresult_sh = vec_insertl (src_a_sh, src_va_sh, index); + + if (!vec_all_eq (vresult_sh, expected_vresult_sh)) { +#if DEBUG + printf("ERROR, vec_insertl (src_a_sh, src_va_sh, index)\n"); + for(i = 0; i < 8; i++) + printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n", + i, vresult_sh[i], i, expected_vresult_sh[i]); +#else + abort(); +#endif + } + + src_a_int = 79; + index = 8; + src_va_int = (vector unsigned int) { 0, 1, 2, 3 }; + vresult_int = (vector unsigned int) { 0, 0, 0, 0 }; + expected_vresult_int = (vector unsigned int) { 0, 1, 79, 3 }; + + vresult_int = vec_insertl (src_a_int, src_va_int, index); + + if (!vec_all_eq (vresult_int, expected_vresult_int)) { +#if 
DEBUG + printf("ERROR, vec_insertl (src_a_int, src_va_int, index)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n", + i, vresult_int[i], i, expected_vresult_int[i]); +#else + abort(); +#endif + } + + src_a_ll = 79; + index = 8; + src_va_ll = (vector unsigned long long) { 0, 1 }; + vresult_ll = (vector unsigned long long) { 0, 0 }; + expected_vresult_ll = (vector unsigned long long) { 0, 79 }; + + vresult_ll = vec_insertl (src_a_ll, src_va_ll, index); + + if (!vec_all_eq (vresult_ll, expected_vresult_ll)) { +#if DEBUG + printf("ERROR, vec_insertl (src_a_ll, src_va_ll, index)\n"); + for(i = 0; i < 2; i++) + printf(" vresult_ll[%d] = %d, expected_vresult_ll[%d] = %d\n", + i, vresult_ll[i], i, expected_vresult_ll[i]); +#else + abort(); +#endif + } + + /* Vector insert, low index, from vector */ + index = 2; + src_va_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15 }; + src_vb_ch = (vector unsigned char) { 10, 11, 12, 13, 14, 15, 16, 17, + 18, 19, 20, 21, 22, 23, 24, 25 }; + vresult_ch = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_ch = (vector unsigned char) { 0, 1, 18, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15 }; + + vresult_ch = vec_insertl (src_vb_ch, src_va_ch, index); + + if (!vec_all_eq (vresult_ch, expected_vresult_ch)) { +#if DEBUG + printf("ERROR, vec_insertl (src_vb_ch, src_va_ch, index)\n"); + for(i = 0; i < 16; i++) + printf(" vresult_ch[%d] = %d, expected_vresult_ch[%d] = %d\n", + i, vresult_ch[i], i, expected_vresult_ch[i]); +#else + abort(); +#endif + } + + index = 4; + src_va_sh = (vector unsigned short) { 0, 1, 2, 3, 4, 5, 6, 7 }; + src_vb_sh = (vector unsigned short) { 10, 11, 12, 13, 14, 15, 16, 17 }; + vresult_sh = (vector unsigned short) { 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_sh = (vector unsigned short) { 0, 1, 14, 3, 4, 5, 6, 7 }; + + vresult_sh = vec_insertl (src_vb_sh, src_va_sh, index); + + if 
(!vec_all_eq (vresult_sh, expected_vresult_sh)) { +#if DEBUG + printf("ERROR, vec_insertl (src_vb_sh, src_va_sh, index)\n"); + for(i = 0; i < 8; i++) + printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n", + i, vresult_sh[i], i, expected_vresult_sh[i]); +#else + abort(); +#endif + } + + index = 8; + src_va_int = (vector unsigned int) { 0, 1, 2, 3 }; + src_vb_int = (vector unsigned int) { 10, 11, 12, 13 }; + vresult_int = (vector unsigned int) { 0, 0, 0, 0 }; + expected_vresult_int = (vector unsigned int) { 0, 1, 12, 3 }; + + vresult_int = vec_insertl (src_vb_int, src_va_int, index); + + if (!vec_all_eq (vresult_int, expected_vresult_int)) { +#if DEBUG + printf("ERROR, vec_insertl (src_vb_int, src_va_int, index)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n", + i, vresult_int[i], i, expected_vresult_int[i]); +#else + abort(); +#endif + } + + /* Vector insert, high index, from GPR */ + src_a_ch = 79; + index = 2; + src_va_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15 }; + vresult_ch = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 79, 14, 15 }; + + vresult_ch = vec_inserth (src_a_ch, src_va_ch, index); + + if (!vec_all_eq (vresult_ch, expected_vresult_ch)) { +#if DEBUG + printf("ERROR, vec_inserth (src_a_ch, src_va_ch, index)\n"); + for(i = 0; i < 16; i++) + printf(" vresult_ch[%d] = %d, expected_vresult_ch[%d] = %d\n", + i, vresult_ch[i], i, expected_vresult_ch[i]); +#else + abort(); +#endif + } + + src_a_sh = 79; + index = 10; + src_va_sh = (vector unsigned short int) { 0, 1, 2, 3, 4, 5, 6, 7 }; + vresult_sh = (vector unsigned short int) { 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_sh = (vector unsigned short int) { 0, 1, 79, 3, + 4, 5, 6, 7 }; + + vresult_sh = vec_inserth (src_a_sh, src_va_sh, index); + + if (!vec_all_eq (vresult_sh, 
expected_vresult_sh)) { +#if DEBUG + printf("ERROR, vec_inserth (src_a_sh, src_va_sh, index)\n"); + for(i = 0; i < 8; i++) + printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n", + i, vresult_sh[i], i, expected_vresult_sh[i]); +#else + abort(); +#endif + } + + src_a_int = 79; + index = 8; + src_va_int = (vector unsigned int) { 0, 1, 2, 3 }; + vresult_int = (vector unsigned int) { 0, 0, 0, 0 }; + expected_vresult_int = (vector unsigned int) { 0, 79, 2, 3 }; + + vresult_int = vec_inserth (src_a_int, src_va_int, index); + + if (!vec_all_eq (vresult_int, expected_vresult_int)) { +#if DEBUG + printf("ERROR, vec_inserth (src_a_int, src_va_int, index)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n", + i, vresult_int[i], i, expected_vresult_int[i]); +#else + abort(); +#endif + } + + src_a_ll = 79; + index = 8; + src_va_ll = (vector unsigned long long) { 0, 1 }; + vresult_ll = (vector unsigned long long) { 0, 0 }; + expected_vresult_ll = (vector unsigned long long) { 79, 1 }; + + vresult_ll = vec_inserth (src_a_ll, src_va_ll, index); + + if (!vec_all_eq (vresult_ll, expected_vresult_ll)) { +#if DEBUG + printf("ERROR, vec_inserth (src_a_ll, src_va_ll, index)\n"); + for(i = 0; i < 2; i++) + printf(" vresult_ll[%d] = %d, expected_vresult_ll[%d] = %d\n", + i, vresult_ll[i], i, expected_vresult_ll[i]); +#else + abort(); +#endif + } + + /* Vector insert, left index, from vector */ + index = 2; + src_va_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15 }; + src_vb_ch = (vector unsigned char) { 10, 11, 12, 13, 14, 15, 16, 17, + 18, 19, 20, 21, 22, 23, 24, 25 }; + vresult_ch = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_ch = (vector unsigned char) { 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 18, 14, 15 }; + + vresult_ch = vec_inserth (src_vb_ch, src_va_ch, index); + + if (!vec_all_eq (vresult_ch, expected_vresult_ch)) { +#if DEBUG + 
printf("ERROR, vec_inserth (src_vb_ch, src_va_ch, index)\n"); + for(i = 0; i < 16; i++) + printf(" vresult_ch[%d] = %d, expected_vresult_ch[%d] = %d\n", + i, vresult_ch[i], i, expected_vresult_ch[i]); +#else + abort(); +#endif + } + + index = 4; + src_va_sh = (vector unsigned short) { 0, 1, 2, 3, 4, 5, 6, 7 }; + src_vb_sh = (vector unsigned short) { 10, 11, 12, 13, 14, 15, 16, 17 }; + vresult_sh = (vector unsigned short) { 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_sh = (vector unsigned short) { 0, 1, 2, 3, 4, 14, 6, 7 }; + + vresult_sh = vec_inserth (src_vb_sh, src_va_sh, index); + + if (!vec_all_eq (vresult_sh, expected_vresult_sh)) { +#if DEBUG + printf("ERROR, vec_inserth (src_vb_sh, src_va_sh, index)\n"); + for(i = 0; i < 8; i++) + printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n", + i, vresult_sh[i], i, expected_vresult_sh[i]); +#else + abort(); +#endif + } + + index = 8; + src_va_int = (vector unsigned int) { 0, 1, 2, 3 }; + src_vb_int = (vector unsigned int) { 10, 11, 12, 13 }; + vresult_int = (vector unsigned int) { 0, 0, 0, 0 }; + expected_vresult_int = (vector unsigned int) { 0, 12, 2, 3 }; + + vresult_int = vec_inserth (src_vb_int, src_va_int, index); + + if (!vec_all_eq (vresult_int, expected_vresult_int)) { +#if DEBUG + printf("ERROR, vec_inserth (src_vb_int, src_va_int, index)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n", + i, vresult_int[i], i, expected_vresult_int[i]); +#else + abort(); +#endif + } + return 0; +} + +/* { dg-final { scan-assembler {\mvinsblx\M} } } */ +/* { dg-final { scan-assembler {\mvinshlx\M} } } */ +/* { dg-final { scan-assembler {\mvinswlx\M} } } */ +/* { dg-final { scan-assembler {\mvinsdlx\M} } } */ +/* { dg-final { scan-assembler {\mvinsbvlx\M} } } */ +/* { dg-final { scan-assembler {\mvinshvlx\M} } } */ +/* { dg-final { scan-assembler {\mvinswvlx\M} } } */ + +/* { dg-final { scan-assembler {\mvinsbrx\M} } } */ +/* { dg-final { scan-assembler {\mvinshrx\M} } } 
*/ +/* { dg-final { scan-assembler {\mvinswrx\M} } } */ +/* { dg-final { scan-assembler {\mvinsdrx\M} } } */ +/* { dg-final { scan-assembler {\mvinsbvrx\M} } } */ +/* { dg-final { scan-assembler {\mvinshvrx\M} } } */ +/* { dg-final { scan-assembler {\mvinswvrx\M} } } */ + From patchwork Mon Jun 15 23:37:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Carl Love X-Patchwork-Id: 1309877 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=gcc.gnu.org Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=l4xWKajs; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49m7782WkQz9sRN for ; Tue, 16 Jun 2020 09:38:12 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id EA3BA383F878; Mon, 15 Jun 2020 23:38:09 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org EA3BA383F878 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1592264290; bh=PPNUIsDswmxq11yj4XncYzK17WTIP9peneGg9vYkYSU=; h=Subject:To:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=l4xWKajs1b3d3hruazKv9lVb19T5J00Z3+jcS5Hjhbl9d+qOQjk5dfGR1Of45GOuO 
GoPHU7ZNzlLj1Xlg6HZEq4tsGmSl36Ouo/k6nbRWkbnjpX5dI7H40RAwSCdqrn56+z tLEhmersP8G+t9pa60UEEEw2sGiDhJlbRAvD4nG0= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 4A8E8386F45A; Mon, 15 Jun 2020 23:38:05 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 4A8E8386F45A Received: from pps.filterd (m0127361.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 05FNW4v5020879; Mon, 15 Jun 2020 19:38:05 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 31p5euy6k1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 15 Jun 2020 19:38:04 -0400 Received: from m0127361.ppops.net (m0127361.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.36/8.16.0.36) with SMTP id 05FNWL3G021524; Mon, 15 Jun 2020 19:38:04 -0400 Received: from ppma02dal.us.ibm.com (a.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.10]) by mx0a-001b2d01.pphosted.com with ESMTP id 31p5euy6ju-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 15 Jun 2020 19:38:03 -0400 Received: from pps.filterd (ppma02dal.us.ibm.com [127.0.0.1]) by ppma02dal.us.ibm.com (8.16.0.42/8.16.0.42) with SMTP id 05FNVGlt009197; Mon, 15 Jun 2020 23:38:03 GMT Received: from b03cxnp08028.gho.boulder.ibm.com (b03cxnp08028.gho.boulder.ibm.com [9.17.130.20]) by ppma02dal.us.ibm.com with ESMTP id 31pe8ntywk-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 15 Jun 2020 23:38:03 +0000 Received: from b03ledav003.gho.boulder.ibm.com (b03ledav003.gho.boulder.ibm.com [9.17.130.234]) by b03cxnp08028.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 05FNc2E227590940 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 15 Jun 2020 23:38:02 GMT Received: from 
b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id EE99C6A057; Mon, 15 Jun 2020 23:38:01 +0000 (GMT) Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D838B6A047; Mon, 15 Jun 2020 23:38:00 +0000 (GMT) Received: from sig-9-65-250-81.ibm.com (unknown [9.65.250.81]) by b03ledav003.gho.boulder.ibm.com (Postfix) with ESMTP; Mon, 15 Jun 2020 23:38:00 +0000 (GMT) Message-ID: Subject: [PATCH 3/6 ver 2] rs6000, Add vector replace builtin support To: segher@gcc.gnu.org, dje.gcc@gmail.com, gcc-patches@gcc.gnu.org Date: Mon, 15 Jun 2020 16:37:59 -0700 X-Mailer: Evolution 3.28.5 (3.28.5-5.el7) Mime-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.216, 18.0.687 definitions=2020-06-15_11:2020-06-15, 2020-06-15 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 cotscore=-2147483648 adultscore=0 mlxlogscore=999 mlxscore=0 priorityscore=1501 phishscore=0 lowpriorityscore=0 impostorscore=0 clxscore=1015 malwarescore=0 suspectscore=4 bulkscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2004280000 definitions=main-2006150164 X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Carl Love via Gcc-patches From: Carl Love Reply-To: Carl Love Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" v2 fixes: change log entries config/rs6000/vsx.md, config/rs6000/rs6000-builtin.def, config/rs6000/rs6000-call.c. 
gcc/config/rs6000/rs6000-call.c: fixed if check for 3rd arg between 0 and 3 fixed if check for 3rd arg between 0 and 12 gcc/config/rs6000/vsx.md: removed REPLACE_ELT_atr definition and used VS_scalar instead. removed REPLACE_ELT_inst definition and used instead fixed spelling mistake on Endianness. fixed indenting for vreplace_elt_ ----------------------------------- GCC maintainers: The following patch adds support for builtins vec_replace_elt and vec_replace_unaligned. The patch has been compiled and tested on powerpc64le-unknown-linux-gnu (Power 9 LE) and mambo with no regression errors. Please let me know if this patch is acceptable for the pu branch. Thanks. Carl Love ------------------------------------------------------- gcc/ChangeLog 2020-06-15 Carl Love * config/rs6000/altivec.h: Add define for vec_replace_elt and vec_replace_unaligned. * config/rs6000/vsx.md (UNSPEC_REPLACE_ELT, UNSPEC_REPLACE_UN): New. (REPLACE_ELT): New mode iterator. (REPLACE_ELT_atr, REPLACE_ELT_inst, REPLACE_ELT_char, REPLACE_ELT_sh, REPLACE_ELT_max): New mode attributes. (vreplace_un_, vreplace_elt__inst): New. * config/rs6000/rs6000-builtin.def (VREPLACE_ELT_V4SI, VREPLACE_ELT_UV4SI, VREPLACE_ELT_V4SF, VREPLACE_ELT_UV2DI, VREPLACE_ELT_V2DF, VREPLACE_UN_V4SI, VREPLACE_UN_UV4SI, VREPLACE_UN_V4SF, VREPLACE_UN_V2DI, VREPLACE_UN_UV2DI, VREPLACE_UN_V2DF): New. (REPLACE_ELT, REPLACE_UN): New. * config/rs6000/rs6000-call.c (FUTURE_BUILTIN_VEC_REPLACE_ELT, FUTURE_BUILTIN_VEC_REPLACE_UN): New. (rs6000_expand_ternop_builtin): Add 3rd argument checks for CODE_FOR_vreplace_elt_v4si, CODE_FOR_vreplace_elt_v4sf, CODE_FOR_vreplace_un_v4si, CODE_FOR_vreplace_un_v4sf. (builtin_function_type) [FUTURE_BUILTIN_VREPLACE_ELT_UV4SI, FUTURE_BUILTIN_VREPLACE_ELT_UV2DI, FUTURE_BUILTIN_VREPLACE_UN_UV4SI, FUTURE_BUILTIN_VREPLACE_UN_UV2DI]: New cases. * doc/extend.texi: Add description for vec_replace_elt and vec_replace_unaligned builtins.
gcc/testsuite/ChangeLog 2020-06-15 Carl Love * gcc.target/powerpc/vec-replace-word-runnable.c: Add new test. --- gcc/config/rs6000/altivec.h | 2 + gcc/config/rs6000/rs6000-builtin.def | 16 + gcc/config/rs6000/rs6000-call.c | 61 ++++ gcc/config/rs6000/vsx.md | 60 ++++ gcc/doc/extend.texi | 50 +++ .../powerpc/vec-replace-word-runnable.c | 289 ++++++++++++++++++ 6 files changed, 478 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h index 936aeb1ee09..435ffb8158f 100644 --- a/gcc/config/rs6000/altivec.h +++ b/gcc/config/rs6000/altivec.h @@ -701,6 +701,8 @@ __altivec_scalar_pred(vec_any_nle, #define vec_extracth(a, b, c) __builtin_vec_extracth (a, b, c) #define vec_insertl(a, b, c) __builtin_vec_insertl (a, b, c) #define vec_inserth(a, b, c) __builtin_vec_inserth (a, b, c) +#define vec_replace_elt(a, b, c) __builtin_vec_replace_elt (a, b, c) +#define vec_replace_unaligned(a, b, c) __builtin_vec_replace_un (a, b, c) #define vec_gnb(a, b) __builtin_vec_gnb (a, b) #define vec_clrl(a, b) __builtin_vec_clrl (a, b) diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index c5bd4f86555..91821f29a6f 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -2643,6 +2643,20 @@ BU_FUTURE_V_3 (VINSERTVPRBR, "vinsvubvrx", CONST, vinsertvr_v16qi) BU_FUTURE_V_3 (VINSERTVPRHR, "vinsvuhvrx", CONST, vinsertvr_v8hi) BU_FUTURE_V_3 (VINSERTVPRWR, "vinsvuwvrx", CONST, vinsertvr_v4si) +BU_FUTURE_V_3 (VREPLACE_ELT_V4SI, "vreplace_v4si", CONST, vreplace_elt_v4si) +BU_FUTURE_V_3 (VREPLACE_ELT_UV4SI, "vreplace_uv4si", CONST, vreplace_elt_v4si) +BU_FUTURE_V_3 (VREPLACE_ELT_V4SF, "vreplace_v4sf", CONST, vreplace_elt_v4sf) +BU_FUTURE_V_3 (VREPLACE_ELT_V2DI, "vreplace_v2di", CONST, vreplace_elt_v2di) +BU_FUTURE_V_3 (VREPLACE_ELT_UV2DI, "vreplace_uv2di", CONST, vreplace_elt_v2di) +BU_FUTURE_V_3 (VREPLACE_ELT_V2DF,
"vreplace_v2df", CONST, vreplace_elt_v2df) + +BU_FUTURE_V_3 (VREPLACE_UN_V4SI, "vreplace_un_v4si", CONST, vreplace_un_v4si) +BU_FUTURE_V_3 (VREPLACE_UN_UV4SI, "vreplace_un_uv4si", CONST, vreplace_un_v4si) +BU_FUTURE_V_3 (VREPLACE_UN_V4SF, "vreplace_un_v4sf", CONST, vreplace_un_v4sf) +BU_FUTURE_V_3 (VREPLACE_UN_V2DI, "vreplace_un_v2di", CONST, vreplace_un_v2di) +BU_FUTURE_V_3 (VREPLACE_UN_UV2DI, "vreplace_un_uv2di", CONST, vreplace_un_v2di) +BU_FUTURE_V_3 (VREPLACE_UN_V2DF, "vreplace_un_v2df", CONST, vreplace_un_v2df) + BU_FUTURE_V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi) BU_FUTURE_V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi) BU_FUTURE_V_1 (VSTRIBL, "vstribl", CONST, vstril_v16qi) @@ -2664,6 +2678,8 @@ BU_FUTURE_OVERLOAD_3 (EXTRACTL, "extractl") BU_FUTURE_OVERLOAD_3 (EXTRACTH, "extracth") BU_FUTURE_OVERLOAD_3 (INSERTL, "insertl") BU_FUTURE_OVERLOAD_3 (INSERTH, "inserth") +BU_FUTURE_OVERLOAD_3 (REPLACE_ELT, "replace_elt") +BU_FUTURE_OVERLOAD_3 (REPLACE_UN, "replace_un") BU_FUTURE_OVERLOAD_1 (VSTRIR, "strir") BU_FUTURE_OVERLOAD_1 (VSTRIL, "stril") diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c index abbe00030ea..2653222ced0 100644 --- a/gcc/config/rs6000/rs6000-call.c +++ b/gcc/config/rs6000/rs6000-call.c @@ -5624,6 +5624,36 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = { RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_REPLACE_ELT, FUTURE_BUILTIN_VREPLACE_ELT_UV4SI, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, + RS6000_BTI_UINTSI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_REPLACE_ELT, FUTURE_BUILTIN_VREPLACE_ELT_V4SI, + RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_INTSI, RS6000_BTI_INTQI }, + { FUTURE_BUILTIN_VEC_REPLACE_ELT, FUTURE_BUILTIN_VREPLACE_ELT_V4SF, + RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_float, RS6000_BTI_INTQI }, + { FUTURE_BUILTIN_VEC_REPLACE_ELT, FUTURE_BUILTIN_VREPLACE_ELT_UV2DI, + 
RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, + RS6000_BTI_UINTDI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_REPLACE_ELT, FUTURE_BUILTIN_VREPLACE_ELT_V2DI, + RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_INTDI, RS6000_BTI_INTQI }, + { FUTURE_BUILTIN_VEC_REPLACE_ELT, FUTURE_BUILTIN_VREPLACE_ELT_V2DF, + RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_double, RS6000_BTI_INTQI }, + + { FUTURE_BUILTIN_VEC_REPLACE_UN, FUTURE_BUILTIN_VREPLACE_UN_UV4SI, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, + RS6000_BTI_UINTSI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_REPLACE_UN, FUTURE_BUILTIN_VREPLACE_UN_V4SI, + RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_INTSI, RS6000_BTI_INTQI }, + { FUTURE_BUILTIN_VEC_REPLACE_UN, FUTURE_BUILTIN_VREPLACE_UN_V4SF, + RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_float, RS6000_BTI_INTQI }, + { FUTURE_BUILTIN_VEC_REPLACE_UN, FUTURE_BUILTIN_VREPLACE_UN_UV2DI, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, + RS6000_BTI_UINTDI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_REPLACE_UN, FUTURE_BUILTIN_VREPLACE_UN_V2DI, + RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_INTDI, RS6000_BTI_INTQI }, + { FUTURE_BUILTIN_VEC_REPLACE_UN, FUTURE_BUILTIN_VREPLACE_UN_V2DF, + RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_double, RS6000_BTI_INTQI }, + { FUTURE_BUILTIN_VEC_VSTRIL, FUTURE_BUILTIN_VSTRIBL, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 }, { FUTURE_BUILTIN_VEC_VSTRIL, FUTURE_BUILTIN_VSTRIBL, @@ -9987,6 +10017,33 @@ rs6000_expand_ternop_builtin (enum insn_code icode, tree exp, rtx target) return CONST0_RTX (tmode); } } + else if (icode == CODE_FOR_vreplace_elt_v4si + || icode == CODE_FOR_vreplace_elt_v4sf) + { + /* Check whether the 3rd argument is an integer constant in the range + 0 to 3 inclusive. 
*/ + STRIP_NOPS (arg2); + if (TREE_CODE (arg2) != INTEGER_CST + || !IN_RANGE (TREE_INT_CST_LOW (arg2), 0, 3)) + { + error ("argument 3 must be in the range 0 to 3"); + return CONST0_RTX (tmode); + } + } + + else if (icode == CODE_FOR_vreplace_un_v4si + || icode == CODE_FOR_vreplace_un_v4sf) + { + /* Check whether the 3rd argument is an integer constant in the range + 0 to 12 inclusive. */ + STRIP_NOPS (arg2); + if (TREE_CODE (arg2) != INTEGER_CST + || !IN_RANGE(TREE_INT_CST_LOW (arg2), 0, 12)) + { + error ("argument 3 must be in the range 0 to 12"); + return CONST0_RTX (tmode); + } + } if (target == 0 || GET_MODE (target) != tmode @@ -13342,6 +13399,10 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0, case FUTURE_BUILTIN_VINSERTVPRBL: case FUTURE_BUILTIN_VINSERTVPRHL: case FUTURE_BUILTIN_VINSERTVPRWL: + case FUTURE_BUILTIN_VREPLACE_ELT_UV4SI: + case FUTURE_BUILTIN_VREPLACE_ELT_UV2DI: + case FUTURE_BUILTIN_VREPLACE_UN_UV4SI: + case FUTURE_BUILTIN_VREPLACE_UN_UV2DI: h.uns_p[0] = 1; h.uns_p[1] = 1; h.uns_p[2] = 1; diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 6ce93f14dec..57607998c42 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -348,11 +348,22 @@ UNSPEC_EXTRACTR UNSPEC_INSERTL UNSPEC_INSERTR + UNSPEC_REPLACE_ELT + UNSPEC_REPLACE_UN ]) ;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops (define_mode_iterator VI2 [V4SI V8HI V16QI V2DI]) +;; Vector extract_elt iterator/attr for 32-bit and 64-bit elements +(define_mode_iterator REPLACE_ELT [V4SI V4SF V2DI V2DF]) +(define_mode_attr REPLACE_ELT_char [(V4SI "w") (V4SF "w") + (V2DI "d") (V2DF "d")]) +(define_mode_attr REPLACE_ELT_sh [(V4SI "2") (V4SF "2") + (V2DI "3") (V2DF "3")]) +(define_mode_attr REPLACE_ELT_max [(V4SI "12") (V4SF "12") + (V2DI "8") (V2DF "8")]) + ;; VSX moves ;; The patterns for LE permuted loads and stores come before the general @@ -3957,6 +3968,55 @@ "vinsrx %0,%1,%2" [(set_attr "type" "vecsimple")]) 
+(define_expand "vreplace_elt_<mode>"
+  [(set (match_operand:REPLACE_ELT 0 "register_operand")
+   (unspec:REPLACE_ELT [(match_operand:REPLACE_ELT 1 "register_operand")
+                        (match_operand:<VS_scalar> 2 "register_operand")
+                        (match_operand:QI 3 "const_0_to_3_operand")]
+                       UNSPEC_REPLACE_ELT))]
+ "TARGET_FUTURE"
+{
+   int index;
+   /* Immediate value is the word index, convert to byte index and adjust for
+      Endianness if needed.  */
+   if (BYTES_BIG_ENDIAN)
+     index = INTVAL (operands[3]) << <REPLACE_ELT_sh>;
+
+   else
+     index = <REPLACE_ELT_max> - (INTVAL (operands[3]) << <REPLACE_ELT_sh>);
+
+   emit_insn (gen_vreplace_elt_<mode>_inst (operands[0], operands[1],
+                                            operands[2],
+                                            GEN_INT (index)));
+   DONE;
+ }
+[(set_attr "type" "vecsimple")])
+
+(define_expand "vreplace_un_<mode>"
+ [(set (match_operand:REPLACE_ELT 0 "register_operand")
+  (unspec:REPLACE_ELT [(match_operand:REPLACE_ELT 1 "register_operand")
+                       (match_operand:<VS_scalar> 2 "register_operand")
+                       (match_operand:QI 3 "const_0_to_12_operand")]
+                      UNSPEC_REPLACE_UN))]
+ "TARGET_FUTURE"
+{
+   /* Immediate value is the byte index Big Endian numbering.  */
+   emit_insn (gen_vreplace_elt_<mode>_inst (operands[0], operands[1],
+                                            operands[2], operands[3]));
+   DONE;
+ }
+[(set_attr "type" "vecsimple")])
+
+(define_insn "vreplace_elt_<mode>_inst"
+ [(set (match_operand:REPLACE_ELT 0 "register_operand" "=v")
+  (unspec:REPLACE_ELT [(match_operand:REPLACE_ELT 1 "register_operand" "0")
+                       (match_operand:<VS_scalar> 2 "register_operand" "r")
+                       (match_operand:QI 3 "const_0_to_12_operand" "n")]
+                      UNSPEC_REPLACE_ELT))]
+ "TARGET_FUTURE"
+ "vins<REPLACE_ELT_char> %0,%2,%3"
+ [(set_attr "type" "vecsimple")])
+
 ;; VSX_EXTRACT optimizations
 ;; Optimize double d = (double) vec_extract (vi, <n>)
 ;; Get the element into the top position and use XVCVSWDP/XVCVUWDP
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 8931c7950f6..00c17be1851 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21045,6 +21045,56 @@
 This is a limitation of the bi-endian vector programming model
 consistent with the limitation on vec_perm, for example.
 @findex vec_inserth
+Vector Replace Element
+@smallexample
+@exdent vector signed int vec_replace_elt (vector signed int, signed int,
+const int);
+@exdent vector unsigned int vec_replace_elt (vector unsigned int,
+unsigned int, const int);
+@exdent vector float vec_replace_elt (vector float, float, const int);
+@exdent vector signed long long vec_replace_elt (vector signed long long,
+signed long long, const int);
+@exdent vector unsigned long long vec_replace_elt (vector unsigned long long,
+unsigned long long, const int);
+@exdent vector double vec_replace_elt (vector double, double, const int);
+@end smallexample
+The third argument (constrained to [0,3]) identifies the natural-endian
+element number of the first argument that will be replaced by the second
+argument to produce the result.  The other elements of the first argument will
+remain unchanged in the result.
+
+If it's desirable to insert a word at an unaligned position, use
+vec_replace_unaligned instead.
+
+@findex vec_replace_elt
+
+Vector Replace Unaligned
+@smallexample
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+signed int, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+unsigned int, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+float, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+signed long long, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+unsigned long long, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+double, const int);
+@end smallexample
+
+The second argument replaces a portion of the first argument to produce the
+result, with the rest of the first argument unchanged in the result.
The
+third argument identifies the byte index (using left-to-right, or big-endian
+order) where the high-order byte of the second argument will be placed, with
+the remaining bytes of the second argument placed naturally "to the right"
+of the high-order byte.
+
+The programmer is responsible for understanding the endianness issues involved
+with the first argument and the result.
+@findex vec_replace_unaligned
+
 @smallexample
 @exdent vector unsigned long long int
 @exdent vec_pdep (vector unsigned long long int, vector unsigned long long int)
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
new file mode 100644
index 00000000000..1fe23d5f912
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
@@ -0,0 +1,289 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_future_hw } */
+/* { dg-options "-mdejagnu-cpu=future" } */
+
+#include <altivec.h>
+
+#define DEBUG 1
+
+#ifdef DEBUG
+#include <stdio.h>
+#endif
+
+extern void abort (void);
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+  unsigned char ch;
+  unsigned int index;
+
+  vector unsigned int vresult_uint;
+  vector unsigned int expected_vresult_uint;
+  vector unsigned int src_va_uint;
+  vector unsigned int src_vb_uint;
+  unsigned int src_a_uint;
+
+  vector int vresult_int;
+  vector int expected_vresult_int;
+  vector int src_va_int;
+  vector int src_vb_int;
+  int src_a_int;
+
+  vector unsigned long long int vresult_ullint;
+  vector unsigned long long int expected_vresult_ullint;
+  vector unsigned long long int src_va_ullint;
+  vector unsigned long long int src_vb_ullint;
+  unsigned int long long src_a_ullint;
+
+  vector long long int vresult_llint;
+  vector long long int expected_vresult_llint;
+  vector long long int src_va_llint;
+  vector long long int src_vb_llint;
+  long long int src_a_llint;
+
+  vector float vresult_float;
+  vector float expected_vresult_float;
+  vector float src_va_float;
+  float
src_a_float; + + vector double vresult_double; + vector double expected_vresult_double; + vector double src_va_double; + double src_a_double; + + /* Vector replace 32-bit element */ + src_a_uint = 345; + src_va_uint = (vector unsigned int) { 0, 1, 2, 3 }; + vresult_uint = (vector unsigned int) { 0, 0, 0, 0 }; + expected_vresult_uint = (vector unsigned int) { 0, 1, 345, 3 }; + + vresult_uint = vec_replace_elt (src_va_uint, src_a_uint, 2); + + if (!vec_all_eq (vresult_uint, expected_vresult_uint)) { +#if DEBUG + printf("ERROR, vec_replace_elt (src_vb_uint, src_va_uint, index)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n", + i, vresult_uint[i], i, expected_vresult_uint[i]); +#else + abort(); +#endif + } + + src_a_int = 234; + src_va_int = (vector int) { 0, 1, 2, 3 }; + vresult_int = (vector int) { 0, 0, 0, 0 }; + expected_vresult_int = (vector int) { 0, 234, 2, 3 }; + + vresult_int = vec_replace_elt (src_va_int, src_a_int, 1); + + if (!vec_all_eq (vresult_int, expected_vresult_int)) { +#if DEBUG + printf("ERROR, vec_replace_elt (src_vb_int, src_va_int, index)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n", + i, vresult_int[i], i, expected_vresult_int[i]); +#else + abort(); +#endif + } + + src_a_float = 34.0; + src_va_float = (vector float) { 0.0, 10.0, 20.0, 30.0 }; + vresult_float = (vector float) { 0.0, 0.0, 0.0, 0.0 }; + expected_vresult_float = (vector float) { 0.0, 34.0, 20.0, 30.0 }; + + vresult_float = vec_replace_elt (src_va_float, src_a_float, 1); + + if (!vec_all_eq (vresult_float, expected_vresult_float)) { +#if DEBUG + printf("ERROR, vec_replace_elt (src_vb_float, src_va_float, index)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_float[%d] = %f, expected_vresult_float[%d] = %f\n", + i, vresult_float[i], i, expected_vresult_float[i]); +#else + abort(); +#endif + } + + /* Vector replace 64-bit element */ + src_a_ullint = 456; + src_va_ullint = (vector 
unsigned long long int) { 0, 1 }; + vresult_ullint = (vector unsigned long long int) { 0, 0 }; + expected_vresult_ullint = (vector unsigned long long int) { 0, 456 }; + + vresult_ullint = vec_replace_elt (src_va_ullint, src_a_ullint, 1); + + if (!vec_all_eq (vresult_ullint, expected_vresult_ullint)) { +#if DEBUG + printf("ERROR, vec_replace_elt (src_vb_ullint, src_va_ullint, index)\n"); + for(i = 0; i < 2; i++) + printf(" vresult_ullint[%d] = %d, expected_vresult_ullint[%d] = %d\n", + i, vresult_ullint[i], i, expected_vresult_ullint[i]); +#else + abort(); +#endif + } + + src_a_llint = 678; + src_va_llint = (vector long long int) { 0, 1 }; + vresult_llint = (vector long long int) { 0, 0 }; + expected_vresult_llint = (vector long long int) { 0, 678 }; + + vresult_llint = vec_replace_elt (src_va_llint, src_a_llint, 1); + + if (!vec_all_eq (vresult_llint, expected_vresult_llint)) { +#if DEBUG + printf("ERROR, vec_replace_elt (src_vb_llint, src_va_llint, index)\n"); + for(i = 0; i < 2; i++) + printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d] = %d\n", + i, vresult_llint[i], i, expected_vresult_llint[i]); +#else + abort(); +#endif + } + + src_a_double = 678.0; + src_va_double = (vector double) { 0.0, 50.0 }; + vresult_double = (vector double) { 0.0, 0.0 }; + expected_vresult_double = (vector double) { 0.0, 678.0 }; + + vresult_double = vec_replace_elt (src_va_double, src_a_double, 1); + + if (!vec_all_eq (vresult_double, expected_vresult_double)) { +#if DEBUG + printf("ERROR, vec_replace_elt (src_vb_double, src_va_double, index)\n"); + for(i = 0; i < 2; i++) + printf(" vresult_double[%d] = %f, expected_vresult_double[%d] = %f\n", + i, vresult_double[i], i, expected_vresult_double[i]); +#else + abort(); +#endif + } + + + /* Vector replace 32-bit element, unaligned */ + src_a_uint = 345; + src_va_uint = (vector unsigned int) { 1, 2, 0, 0 }; + vresult_uint = (vector unsigned int) { 0, 0, 0, 0 }; + /* Byte index 7 will overwrite part of elements 2 and 3 */ + 
expected_vresult_uint = (vector unsigned int) { 1, 2, 345*256, 0 }; + + vresult_uint = vec_replace_unaligned (src_va_uint, src_a_uint, 3); + + if (!vec_all_eq (vresult_uint, expected_vresult_uint)) { +#if DEBUG + printf("ERROR, vec_replace_unaligned (src_vb_uint, src_va_uint, index)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n", + i, vresult_uint[i], i, expected_vresult_uint[i]); +#else + abort(); +#endif + } + + src_a_int = 234; + src_va_int = (vector int) { 1, 0, 3, 4 }; + vresult_int = (vector int) { 0, 0, 0, 0 }; + /* Byte index 7 will over write part of elements 1 and 2 */ + expected_vresult_int = (vector int) { 1, 234*256, 0, 4 }; + + vresult_int = vec_replace_unaligned (src_va_int, src_a_int, 7); + + if (!vec_all_eq (vresult_int, expected_vresult_int)) { +#if DEBUG + printf("ERROR, vec_replace_unaligned (src_vb_int, src_va_int, index)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n", + i, vresult_int[i], i, expected_vresult_int[i]); +#else + abort(); +#endif + } + + src_a_float = 34.0; + src_va_float = (vector float) { 0.0, 10.0, 20.0, 30.0 }; + vresult_float = (vector float) { 0.0, 0.0, 0.0, 0.0 }; + expected_vresult_float = (vector float) { 0.0, 34.0, 20.0, 30.0 }; + + vresult_float = vec_replace_unaligned (src_va_float, src_a_float, 8); + + if (!vec_all_eq (vresult_float, expected_vresult_float)) { +#if DEBUG + printf("ERROR, vec_replace_unaligned (src_vb_float, src_va_float, index)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_float[%d] = %f, expected_vresult_float[%d] = %f\n", + i, vresult_float[i], i, expected_vresult_float[i]); +#else + abort(); +#endif + } + + /* Vector replace 64-bit element, unaligned */ + src_a_ullint = 456; + src_va_ullint = (vector unsigned long long int) { 0, 0x222 }; + vresult_ullint = (vector unsigned long long int) { 0, 0 }; + expected_vresult_ullint = (vector unsigned long long int) { 456*256, + 0x200 }; + + /* Byte index 7 
will over write least significant byte of element 0 */ + vresult_ullint = vec_replace_unaligned (src_va_ullint, src_a_ullint, 7); + + if (!vec_all_eq (vresult_ullint, expected_vresult_ullint)) { +#if DEBUG + printf("ERROR, vec_replace_unaligned (src_vb_ullint, src_va_ullint, index)\n"); + for(i = 0; i < 2; i++) + printf(" vresult_ullint[%d] = %d, expected_vresult_ullint[%d] = %d\n", + i, vresult_ullint[i], i, expected_vresult_ullint[i]); +#else + abort(); +#endif + } + + src_a_llint = 678; + src_va_llint = (vector long long int) { 0, 0x101 }; + vresult_llint = (vector long long int) { 0, 0 }; + /* Byte index 7 will over write least significant byte of element 0 */ + expected_vresult_llint = (vector long long int) { 678*256, 0x100 }; + + vresult_llint = vec_replace_unaligned (src_va_llint, src_a_llint, 7); + + if (!vec_all_eq (vresult_llint, expected_vresult_llint)) { +#if DEBUG + printf("ERROR, vec_replace_unaligned (src_vb_llint, src_va_llint, index)\n"); + for(i = 0; i < 2; i++) + printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d] = %d\n", + i, vresult_llint[i], i, expected_vresult_llint[i]); +#else + abort(); +#endif + } + + src_a_double = 678.0; + src_va_double = (vector double) { 0.0, 50.0 }; + vresult_double = (vector double) { 0.0, 0.0 }; + expected_vresult_double = (vector double) { 0.0, 678.0 }; + + vresult_double = vec_replace_unaligned (src_va_double, src_a_double, 0); + + if (!vec_all_eq (vresult_double, expected_vresult_double)) { +#if DEBUG + printf("ERROR, vec_replace_unaligned (src_vb_double, src_va_double, index)\ +n"); + for(i = 0; i < 2; i++) + printf(" vresult_double[%d] = %f, expected_vresult_double[%d] = %f\n", + i, vresult_double[i], i, expected_vresult_double[i]); +#else + abort(); +#endif + } + + return 0; +} + +/* { dg-final { scan-assembler-times {\mvinsw\M} 6 } } */ +/* { dg-final { scan-assembler-times {\mvinsd\M} 6 } } */ + + From patchwork Mon Jun 15 23:38:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Carl Love
X-Patchwork-Id: 1309878
Message-ID: <5c4280910407eae085542e4411bddd5d622de0dd.camel@us.ibm.com>
Subject: [PATCH 4/6 ver 2] rs6000, Add vector shift double builtin support
From: Carl Love
To: segher@gcc.gnu.org, dje.gcc@gmail.com, gcc-patches@gcc.gnu.org
Date: Mon, 15 Jun 2020 16:38:04 -0700

v2 fixes:

 change logs redone

  gcc/config/rs6000/rs6000-call.c - added spaces before parenthesis around args.

-----------------------------------------------------------------
GCC maintainers:

The following patch adds support for the vector shift double builtins for
RFC2609.

The patch has been compiled and tested on

  powerpc64le-unknown-linux-gnu (Power 9 LE)

and Mambo with no regression errors.
Please let me know if this patch is acceptable for the pu branch.  Thanks.

                         Carl Love

-------------------------------------------------------
gcc/ChangeLog

2020-06-15  Carl Love

        * config/rs6000/altivec.h (vec_sldb, vec_srdb): New defines.
        * config/rs6000/altivec.md (UNSPEC_SLDB, UNSPEC_SRDB): New.
        (SLDB_LR attribute): New.
        (VSHIFT_DBL_LR iterator): New.
        (vs<SLDB_LR>db_<mode>): New define_insn.
        * config/rs6000/rs6000-builtin.def (VSLDB_V16QI, VSLDB_V8HI,
        VSLDB_V4SI, VSLDB_V2DI, VSRDB_V16QI, VSRDB_V8HI, VSRDB_V4SI,
        VSRDB_V2DI): New BU_FUTURE_V_3 definitions.
        (SLDB, SRDB): New BU_FUTURE_OVERLOAD_3 definitions.
        * config/rs6000/rs6000-call.c (FUTURE_BUILTIN_VEC_SLDB,
        FUTURE_BUILTIN_VEC_SRDB): New definitions.
        (rs6000_expand_ternop_builtin) [CODE_FOR_vsldb_v16qi,
        CODE_FOR_vsldb_v8hi, CODE_FOR_vsldb_v4si, CODE_FOR_vsldb_v2di,
        CODE_FOR_vsrdb_v16qi, CODE_FOR_vsrdb_v8hi, CODE_FOR_vsrdb_v4si,
        CODE_FOR_vsrdb_v2di]: Add else if clauses.
        * doc/extend.texi: Add description for vec_sldb and vec_srdb.

gcc/testsuite/ChangeLog

2020-06-15  Carl Love

        * gcc.target/powerpc/vec-shift-double-runnable.c: New test file.
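
For readers without access to future hardware or Mambo, the shift semantics
that the extend.texi text in this patch describes can be approximated in
portable C.  The sketch below is only a reading aid under stated assumptions:
model_sldb and model_srdb are hypothetical helper names that are not part of
the patch or of GCC, each vector is modelled as 16 bytes in left-to-right
(big-endian) order, and the shift count is masked to its low-order three bits
as the documentation text says.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar models of vec_sldb / vec_srdb; not part of the patch.
   Bytes are held left-to-right: index 0 is the most significant byte.  */

/* Shift the 256-bit concatenation a:b left by (sh & 7) bits and keep the
   leftmost 128 bits.  */
static void
model_sldb (const uint8_t a[16], const uint8_t b[16], unsigned int sh,
            uint8_t out[16])
{
  uint8_t cat[32];
  int i;

  sh &= 7;                      /* Only the low-order three bits count.  */
  memcpy (cat, a, 16);
  memcpy (cat + 16, b, 16);
  for (i = 0; i < 16; i++)
    out[i] = (uint8_t) ((cat[i] << sh) | (cat[i + 1] >> (8 - sh)));
}

/* Shift the 256-bit concatenation a:b right by (sh & 7) bits and keep the
   rightmost 128 bits.  */
static void
model_srdb (const uint8_t a[16], const uint8_t b[16], unsigned int sh,
            uint8_t out[16])
{
  uint8_t cat[32];
  int i;

  sh &= 7;                      /* Only the low-order three bits count.  */
  memcpy (cat, a, 16);
  memcpy (cat + 16, b, 16);
  for (i = 0; i < 16; i++)
    out[i] = (uint8_t) ((cat[i + 16] >> sh) | (cat[i + 15] << (8 - sh)));
}
```

With a shift count of 0 the left model returns its first input and the right
model returns its second input, which is a quick sanity check when comparing
against the expected vectors in the runnable test.  The model says nothing
about how element values appear on a little-endian target; that part remains
the caller's responsibility, as the documentation notes.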
--- gcc/config/rs6000/altivec.h | 2 + gcc/config/rs6000/altivec.md | 18 + gcc/config/rs6000/rs6000-builtin.def | 11 + gcc/config/rs6000/rs6000-call.c | 70 ++++ gcc/doc/extend.texi | 53 +++ .../powerpc/vec-shift-double-runnable.c | 384 ++++++++++++++++++ 6 files changed, 538 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable.c diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h index 435ffb8158f..0be68892aad 100644 --- a/gcc/config/rs6000/altivec.h +++ b/gcc/config/rs6000/altivec.h @@ -703,6 +703,8 @@ __altivec_scalar_pred(vec_any_nle, #define vec_inserth(a, b, c) __builtin_vec_inserth (a, b, c) #define vec_replace_elt(a, b, c) __builtin_vec_replace_elt (a, b, c) #define vec_replace_unaligned(a, b, c) __builtin_vec_replace_un (a, b, c) +#define vec_sldb(a, b, c) __builtin_vec_sldb (a, b, c) +#define vec_srdb(a, b, c) __builtin_vec_srdb (a, b, c) #define vec_gnb(a, b) __builtin_vec_gnb (a, b) #define vec_clrl(a, b) __builtin_vec_clrl (a, b) diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 0b0b49ee056..832a35cdaa9 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -171,6 +171,8 @@ UNSPEC_XXEVAL UNSPEC_VSTRIR UNSPEC_VSTRIL + UNSPEC_SLDB + UNSPEC_SRDB ]) (define_c_enum "unspecv" @@ -781,6 +783,22 @@ DONE; }) +;; Map UNSPEC_SLDB to "l" and UNSPEC_SRDB to "r". 
+(define_int_attr SLDB_LR [(UNSPEC_SLDB "l")
+                          (UNSPEC_SRDB "r")])
+
+(define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])
+
+(define_insn "vs<SLDB_LR>db_<mode>"
+ [(set (match_operand:VI2 0 "register_operand" "=v")
+  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
+               (match_operand:VI2 2 "register_operand" "v")
+               (match_operand:QI 3 "const_0_to_12_operand" "n")]
+              VSHIFT_DBL_LR))]
+  "TARGET_FUTURE"
+  "vs<SLDB_LR>dbi %0,%1,%2,%3"
+  [(set_attr "type" "vecsimple")])
+
 (define_expand "vstrir_<mode>"
   [(set (match_operand:VIshort 0 "altivec_register_operand")
         (unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 91821f29a6f..2b198177ef0 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2657,6 +2657,15 @@
 BU_FUTURE_V_3 (VREPLACE_UN_V2DI, "vreplace_un_v2di", CONST, vreplace_un_v2di)
 BU_FUTURE_V_3 (VREPLACE_UN_UV2DI, "vreplace_un_uv2di", CONST, vreplace_un_v2di)
 BU_FUTURE_V_3 (VREPLACE_UN_V2DF, "vreplace_un_v2df", CONST, vreplace_un_v2df)
+BU_FUTURE_V_3 (VSLDB_V16QI, "vsldb_v16qi", CONST, vsldb_v16qi)
+BU_FUTURE_V_3 (VSLDB_V8HI, "vsldb_v8hi", CONST, vsldb_v8hi)
+BU_FUTURE_V_3 (VSLDB_V4SI, "vsldb_v4si", CONST, vsldb_v4si)
+BU_FUTURE_V_3 (VSLDB_V2DI, "vsldb_v2di", CONST, vsldb_v2di)
+
+BU_FUTURE_V_3 (VSRDB_V16QI, "vsrdb_v16qi", CONST, vsrdb_v16qi)
+BU_FUTURE_V_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, vsrdb_v8hi)
+BU_FUTURE_V_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
+BU_FUTURE_V_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
 BU_FUTURE_V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi)
 BU_FUTURE_V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi)
 BU_FUTURE_V_1 (VSTRIBL, "vstribl", CONST, vstril_v16qi)
@@ -2680,6 +2689,8 @@ BU_FUTURE_OVERLOAD_3 (INSERTL, "insertl")
 BU_FUTURE_OVERLOAD_3 (INSERTH, "inserth")
 BU_FUTURE_OVERLOAD_3 (REPLACE_ELT, "replace_elt")
 BU_FUTURE_OVERLOAD_3 (REPLACE_UN, "replace_un")
+BU_FUTURE_OVERLOAD_3 (SLDB, "sldb")
+BU_FUTURE_OVERLOAD_3 (SRDB, "srdb") BU_FUTURE_OVERLOAD_1 (VSTRIR, "strir") BU_FUTURE_OVERLOAD_1 (VSTRIL, "stril") diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c index 2653222ced0..092e6c1cc2c 100644 --- a/gcc/config/rs6000/rs6000-call.c +++ b/gcc/config/rs6000/rs6000-call.c @@ -5654,6 +5654,56 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = { { FUTURE_BUILTIN_VEC_REPLACE_UN, FUTURE_BUILTIN_VREPLACE_UN_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_double, RS6000_BTI_INTQI }, + { FUTURE_BUILTIN_VEC_SLDB, FUTURE_BUILTIN_VSLDB_V16QI, + RS6000_BTI_V16QI, RS6000_BTI_V16QI, + RS6000_BTI_V16QI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_SLDB, FUTURE_BUILTIN_VSLDB_V16QI, + RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, + RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_SLDB, FUTURE_BUILTIN_VSLDB_V8HI, + RS6000_BTI_V8HI, RS6000_BTI_V8HI, + RS6000_BTI_V8HI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_SLDB, FUTURE_BUILTIN_VSLDB_V8HI, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_SLDB, FUTURE_BUILTIN_VSLDB_V4SI, + RS6000_BTI_V4SI, RS6000_BTI_V4SI, + RS6000_BTI_V4SI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_SLDB, FUTURE_BUILTIN_VSLDB_V4SI, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_SLDB, FUTURE_BUILTIN_VSLDB_V2DI, + RS6000_BTI_V2DI, RS6000_BTI_V2DI, + RS6000_BTI_V2DI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_SLDB, FUTURE_BUILTIN_VSLDB_V2DI, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTQI }, + + { FUTURE_BUILTIN_VEC_SRDB, FUTURE_BUILTIN_VSRDB_V16QI, + RS6000_BTI_V16QI, RS6000_BTI_V16QI, + RS6000_BTI_V16QI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_SRDB, FUTURE_BUILTIN_VSRDB_V16QI, + RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, + RS6000_BTI_unsigned_V16QI, 
RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_SRDB, FUTURE_BUILTIN_VSRDB_V8HI, + RS6000_BTI_V8HI, RS6000_BTI_V8HI, + RS6000_BTI_V8HI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_SRDB, FUTURE_BUILTIN_VSRDB_V8HI, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_SRDB, FUTURE_BUILTIN_VSRDB_V4SI, + RS6000_BTI_V4SI, RS6000_BTI_V4SI, + RS6000_BTI_V4SI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_SRDB, FUTURE_BUILTIN_VSRDB_V4SI, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_SRDB, FUTURE_BUILTIN_VSRDB_V2DI, + RS6000_BTI_V2DI, RS6000_BTI_V2DI, + RS6000_BTI_V2DI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_SRDB, FUTURE_BUILTIN_VSRDB_V2DI, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_VSTRIL, FUTURE_BUILTIN_VSTRIBL, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 }, { FUTURE_BUILTIN_VEC_VSTRIL, FUTURE_BUILTIN_VSTRIBL, @@ -10045,6 +10095,26 @@ rs6000_expand_ternop_builtin (enum insn_code icode, tree exp, rtx target) } } + else if (icode == CODE_FOR_vsldb_v16qi + || icode == CODE_FOR_vsldb_v8hi + || icode == CODE_FOR_vsldb_v4si + || icode == CODE_FOR_vsldb_v2di + || icode == CODE_FOR_vsrdb_v16qi + || icode == CODE_FOR_vsrdb_v8hi + || icode == CODE_FOR_vsrdb_v4si + || icode == CODE_FOR_vsrdb_v2di) + { + /* Check whether the 3rd argument is an integer constant in the range + 0 to 7 inclusive. */ + STRIP_NOPS (arg2); + if (TREE_CODE (arg2) != INTEGER_CST + || !IN_RANGE (TREE_INT_CST_LOW (arg2), 0, 7)) + { + error ("argument 3 must be in the range 0 to 7"); + return CONST0_RTX (tmode); + } + } + if (target == 0 || GET_MODE (target) != tmode || ! 
(*insn_data[icode].operand[0].predicate) (target, tmode)) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 00c17be1851..6926c866492 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -21095,6 +21095,59 @@ The programmer is responsible for understanding the endianness issues involved with the first argument and the result. @findex vec_replace_unaligned +Vector Shift Left Double Bit Immediate +@smallexample +@exdent vector signed char vec_sldb (vector signed char, vector signed char, +const unsigned int); +@exdent vector unsigned char vec_sldb (vector unsigned char, +vector unsigned char, const unsigned int); +@exdent vector signed short vec_sldb (vector signed short, vector signed short, +const unsigned int); +@exdent vector unsigned short vec_sldb (vector unsigned short, +vector unsigned short, const unsigned int); +@exdent vector signed int vec_sldb (vector signed int, vector signed int, +const unsigned int); +@exdent vector unsigned int vec_sldb (vector unsigned int, vector unsigned int, +const unsigned int); +@exdent vector signed long long vec_sldb (vector signed long long, +vector signed long long, const unsigned int); +@exdent vector unsigned long long vec_sldb (vector unsigned long long, +vector unsigned long long, const unsigned int); +@end smallexample + +Shift the combined input vectors left by the amount specified by the low-order +three bits of the third argument, and return the leftmost remaining 128 bits. +Code using this instruction must be endian-aware. 
+ +@findex vec_sldb + +Vector Shift Right Double Bit Immediate + +@smallexample +@exdent vector signed char vec_srdb (vector signed char, vector signed char, +const unsigned int); +@exdent vector unsigned char vec_srdb (vector unsigned char, vector unsigned char, +const unsigned int); +@exdent vector signed short vec_srdb (vector signed short, vector signed short, +const unsigned int); +@exdent vector unsigned short vec_srdb (vector unsigned short, vector unsigned short, +const unsigned int); +@exdent vector signed int vec_srdb (vector signed int, vector signed int, +const unsigned int); +@exdent vector unsigned int vec_srdb (vector unsigned int, vector unsigned int, +const unsigned int); +@exdent vector signed long long vec_srdb (vector signed long long, +vector signed long long, const unsigned int); +@exdent vector unsigned long long vec_srdb (vector unsigned long long, +vector unsigned long long, const unsigned int); +@end smallexample + +Shift the combined input vectors right by the amount specified by the low-order three +bits of the third argument, and return the rightmost remaining 128 bits. Code using +this instruction must be endian-aware. 
+
+@findex vec_srdb
+
 @smallexample
 @exdent vector unsigned long long int
 @exdent vec_pdep (vector unsigned long long int, vector unsigned long long int)
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable.c
new file mode 100644
index 00000000000..8093c33ba1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable.c
@@ -0,0 +1,384 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_future_hw } */
+/* { dg-options "-mdejagnu-cpu=future" } */
+#include <altivec.h>
+
+#define DEBUG 0
+
+#ifdef DEBUG
+#include <stdio.h>
+#endif
+
+extern void abort (void);
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+
+  vector signed char vresult_char;
+  vector signed char expected_vresult_char;
+  vector signed char src_va_char;
+  vector signed char src_vb_char;
+
+  vector unsigned char vresult_uchar;
+  vector unsigned char expected_vresult_uchar;
+  vector unsigned char src_va_uchar;
+  vector unsigned char src_vb_uchar;
+
+  vector short int vresult_sh;
+  vector short int expected_vresult_sh;
+  vector short int src_va_sh;
+  vector short int src_vb_sh;
+
+  vector short unsigned int vresult_ush;
+  vector short unsigned int expected_vresult_ush;
+  vector short unsigned int src_va_ush;
+  vector short unsigned int src_vb_ush;
+
+  vector int vresult_int;
+  vector int expected_vresult_int;
+  vector int src_va_int;
+  vector int src_vb_int;
+  int src_a_int;
+
+  vector unsigned int vresult_uint;
+  vector unsigned int expected_vresult_uint;
+  vector unsigned int src_va_uint;
+  vector unsigned int src_vb_uint;
+  unsigned int src_a_uint;
+
+  vector long long int vresult_llint;
+  vector long long int expected_vresult_llint;
+  vector long long int src_va_llint;
+  vector long long int src_vb_llint;
+  long long int src_a_llint;
+
+  vector unsigned long long int vresult_ullint;
+  vector unsigned long long int expected_vresult_ullint;
+  vector unsigned long long int src_va_ullint;
+  vector unsigned
long long int src_vb_ullint; + unsigned int long long src_a_ullint; + + /* Vector shift double left */ + src_va_char = (vector signed char) { 0, 2, 4, 6, 8, 10, 12, 14, + 16, 18, 20, 22, 24, 26, 28, 30 }; + src_vb_char = (vector signed char) { 10, 20, 30, 40, 50, 60, 70, 80, 90, + 100, 110, 120, 130, 140, 150, 160 }; + vresult_char = (vector signed char) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_char = (vector signed char) { 80, 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14 }; + + vresult_char = vec_sldb (src_va_char, src_vb_char, 7); + + if (!vec_all_eq (vresult_char, expected_vresult_char)) { +#if DEBUG + printf("ERROR, vec_sldb (src_va_char_, src_vb_char, 7)\n"); + for(i = 0; i < 16; i++) + printf(" vresult_char[%d] = %d, expected_vresult_char[%d] = %d\n", + i, vresult_char[i], i, expected_vresult_char[i]); +#else + abort(); +#endif + } + + src_va_uchar = (vector unsigned char) { 0, 2, 4, 6, 8, 10, 12, 14, + 16, 18, 20, 22, 24, 26, 28, 30 }; + src_vb_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }; + vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_uchar = (vector unsigned char) { 0, 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14 }; + + vresult_uchar = vec_sldb (src_va_uchar, src_vb_uchar, 7); + + if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) { +#if DEBUG + printf("ERROR, vec_sldb (src_va_uchar_, src_vb_uchar, 7)\n"); + for(i = 0; i < 16; i++) + printf(" vresult_uchar[%d] = %d, expected_vresult_uchar[%d] = %d\n", + i, vresult_uchar[i], i, expected_vresult_uchar[i]); +#else + abort(); +#endif + } + + src_va_sh = (vector short int) { 0, 2, 4, 6, 8, 10, 12, 14 }; + src_vb_sh = (vector short int) { 0, 0, 0, 0, 0, 0, 0, 0 }; + vresult_sh = (vector short int) { 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_sh = (vector short int) { 0, 2*128, 4*128, 6*128, + 8*128, 10*128, 12*128, 14*128 }; + + vresult_sh = vec_sldb 
(src_va_sh, src_vb_sh, 7); + + if (!vec_all_eq (vresult_sh, expected_vresult_sh)) { +#if DEBUG + printf("ERROR, vec_sldb (src_va_sh_, src_vb_sh, 7)\n"); + for(i = 0; i < 8; i++) + printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n", + i, vresult_sh[i], i, expected_vresult_sh[i]); +#else + abort(); +#endif + } + + src_va_ush = (vector short unsigned int) { 0, 2, 4, 6, 8, 10, 12, 14 }; + src_vb_ush = (vector short unsigned int) { 10, 20, 30, 40, 50, 60, 70, 80 }; + vresult_ush = (vector short unsigned int) { 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_ush = (vector short unsigned int) { 0, 2*128, 4*128, 6*128, + 8*128, 10*128, 12*128, + 14*128 }; + + vresult_ush = vec_sldb (src_va_ush, src_vb_ush, 7); + + if (!vec_all_eq (vresult_ush, expected_vresult_ush)) { +#if DEBUG + printf("ERROR, vec_sldb (src_va_ush_, src_vb_ush, 7)\n"); + for(i = 0; i < 8; i++) + printf(" vresult_ush[%d] = %d, expected_vresult_ush[%d] = %d\n", + i, vresult_ush[i], i, expected_vresult_ush[i]); +#else + abort(); +#endif + } + + src_va_int = (vector signed int) { 0, 2, 3, 1 }; + src_vb_int = (vector signed int) { 0, 0, 0, 0 }; + vresult_int = (vector signed int) { 0, 0, 0, 0 }; + expected_vresult_int = (vector signed int) { 0, 2*128, 3*128, 1*128 }; + + vresult_int = vec_sldb (src_va_int, src_vb_int, 7); + + if (!vec_all_eq (vresult_int, expected_vresult_int)) { +#if DEBUG + printf("ERROR, vec_sldb (src_va_int_, src_vb_int, 7)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n", + i, vresult_int[i], i, expected_vresult_int[i]); +#else + abort(); +#endif + } + + src_va_uint = (vector unsigned int) { 0, 2, 4, 6 }; + src_vb_uint = (vector unsigned int) { 10, 20, 30, 40 }; + vresult_uint = (vector unsigned int) { 0, 0, 0, 0 }; + expected_vresult_uint = (vector unsigned int) { 0, 2*128, 4*128, 6*128 }; + + vresult_uint = vec_sldb (src_va_uint, src_vb_uint, 7); + + if (!vec_all_eq (vresult_uint, expected_vresult_uint)) { +#if DEBUG + 
printf("ERROR, vec_sldb (src_va_uint_, src_vb_uint, 7)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n", + i, vresult_uint[i], i, expected_vresult_uint[i]); +#else + abort(); +#endif + } + + src_va_llint = (vector signed long long int) { 5, 6 }; + src_vb_llint = (vector signed long long int) { 0, 0 }; + vresult_llint = (vector signed long long int) { 0, 0 }; + expected_vresult_llint = (vector signed long long int) { 5*128, 6*128 }; + + vresult_llint = vec_sldb (src_va_llint, src_vb_llint, 7); + + if (!vec_all_eq (vresult_llint, expected_vresult_llint)) { +#if DEBUG + printf("ERROR, vec_sldb (src_va_llint_, src_vb_llint, 7)\n"); + for(i = 0; i < 2; i++) + printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d] = %d\n", + i, vresult_llint[i], i, expected_vresult_llint[i]); +#else + abort(); +#endif + } + + src_va_ullint = (vector unsigned long long int) { 54, 26 }; + src_vb_ullint = (vector unsigned long long int) { 10, 20 }; + vresult_ullint = (vector unsigned long long int) { 0, 0 }; + expected_vresult_ullint = (vector unsigned long long int) { 54*128, + 26*128 }; + + vresult_ullint = vec_sldb (src_va_ullint, src_vb_ullint, 7); + + if (!vec_all_eq (vresult_ullint, expected_vresult_ullint)) { +#if DEBUG + printf("ERROR, vec_sldb (src_va_ullint_, src_vb_ullint, 7)\n"); + for(i = 0; i < 2; i++) + printf(" vresult_ullint[%d] = %d, expected_vresult_ullint[%d] = %d\n", + i, vresult_ullint[i], i, expected_vresult_ullint[i]); +#else + abort(); +#endif + } + + /* Vector shift double right */ + src_va_char = (vector signed char) { 0, 2, 4, 6, 8, 10, 12, 14, + 16, 18, 20, 22, 24, 26, 28, 30 }; + src_vb_char = (vector signed char) { 10, 12, 14, 16, 18, 20, 22, 24, 26, + 28, 30, 32, 34, 36, 38, 40 }; + vresult_char = (vector signed char) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_char = (vector signed char) { 24, 28, 32, 36, 40, 44, 48, + 52, 56, 60, 64, 68, 72, 76, + 80, 0 }; + + vresult_char = 
vec_srdb (src_va_char, src_vb_char, 7); + + if (!vec_all_eq (vresult_char, expected_vresult_char)) { +#if DEBUG + printf("ERROR, vec_srdb (src_va_char_, src_vb_char, 7)\n"); + for(i = 0; i < 16; i++) + printf(" vresult_char[%d] = %d, expected_vresult_char[%d] = %d\n", + i, vresult_char[i], i, expected_vresult_char[i]); +#else + abort(); +#endif + } + + src_va_uchar = (vector unsigned char) { 100, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }; + src_vb_uchar = (vector unsigned char) { 0, 2, 4, 6, 8, 10, 12, 14, + 16, 18, 20, 22, 24, 26, 28, 30 }; + vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_uchar = (vector unsigned char) { 4, 8, 12, 16, 20, 24, 28, + 32, 36, 40, 44, 48, 52, + 56, 60, 200 }; + + vresult_uchar = vec_srdb (src_va_uchar, src_vb_uchar, 7); + + if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) { +#if DEBUG + printf("ERROR, vec_srdb (src_va_uchar_, src_vb_uchar, 7)\n"); + for(i = 0; i < 16; i++) + printf(" vresult_uchar[%d] = %d, expected_vresult_uchar[%d] = %d\n", + i, vresult_uchar[i], i, expected_vresult_uchar[i]); +#else + abort(); +#endif + } + + src_va_sh = (vector short int) { 0, 0, 0, 0, 0, 0, 0, 0 }; + src_vb_sh = (vector short int) { 0, 2*128, 4*128, 6*128, + 8*128, 10*128, 12*128, 14*128 }; + vresult_sh = (vector short int) { 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_sh = (vector short int) { 0, 2, 4, 6, 8, 10, 12, 14 }; + + vresult_sh = vec_srdb (src_va_sh, src_vb_sh, 7); + + if (!vec_all_eq (vresult_sh, expected_vresult_sh)) { +#if DEBUG + printf("ERROR, vec_srdb (src_va_sh_, src_vb_sh, 7)\n"); + for(i = 0; i < 8; i++) + printf(" vresult_sh[%d] = %d, expected_vresult_sh[%d] = %d\n", + i, vresult_sh[i], i, expected_vresult_sh[i]); +#else + abort(); +#endif + } + + src_va_ush = (vector short unsigned int) { 0, 20, 30, 40, 50, 60, 70, 80 }; + src_vb_ush = (vector short unsigned int) { 0, 2*128, 4*128, 6*128, + 8*128, 10*128, 12*128, 14*128 }; + vresult_ush = (vector 
short unsigned int) { 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_ush = (vector short unsigned int) { 0, 2, 4, 6, 8, 10, + 12, 14 }; + + vresult_ush = vec_srdb (src_va_ush, src_vb_ush, 7); + + if (!vec_all_eq (vresult_ush, expected_vresult_ush)) { +#if DEBUG + printf("ERROR, vec_srdb (src_va_ush_, src_vb_ush, 7)\n"); + for(i = 0; i < 8; i++) + printf(" vresult_ush[%d] = %d, expected_vresult_ush[%d] = %d\n", + i, vresult_ush[i], i, expected_vresult_ush[i]); +#else + abort(); +#endif + } + + src_va_int = (vector signed int) { 0, 0, 0, 0 }; + src_vb_int = (vector signed int) { 0, 2*128, 3*128, 1*128 }; + vresult_int = (vector signed int) { 0, 0, 0, 0 }; + expected_vresult_int = (vector signed int) { 0, 2, 3, 1 }; + + vresult_int = vec_srdb (src_va_int, src_vb_int, 7); + + if (!vec_all_eq (vresult_int, expected_vresult_int)) { +#if DEBUG + printf("ERROR, vec_srdb (src_va_int_, src_vb_int, 7)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n", + i, vresult_int[i], i, expected_vresult_int[i]); +#else + abort(); +#endif + } + + src_va_uint = (vector unsigned int) { 0, 20, 30, 40 }; + src_vb_uint = (vector unsigned int) { 128, 2*128, 4*128, 6*128 }; + vresult_uint = (vector unsigned int) { 0, 0, 0, 0 }; + expected_vresult_uint = (vector unsigned int) { 1, 2, 4, 6 }; + + vresult_uint = vec_srdb (src_va_uint, src_vb_uint, 7); + + if (!vec_all_eq (vresult_uint, expected_vresult_uint)) { +#if DEBUG + printf("ERROR, vec_srdb (src_va_uint_, src_vb_uint, 7)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n", + i, vresult_uint[i], i, expected_vresult_uint[i]); +#else + abort(); +#endif + } + + src_va_llint = (vector signed long long int) { 0, 0 }; + src_vb_llint = (vector signed long long int) { 5*128, 6*128 }; + vresult_llint = (vector signed long long int) { 0, 0 }; + expected_vresult_llint = (vector signed long long int) { 5, 6 }; + + vresult_llint = vec_srdb (src_va_llint, 
src_vb_llint, 7); + + if (!vec_all_eq (vresult_llint, expected_vresult_llint)) { +#if DEBUG + printf("ERROR, vec_srdb (src_va_llint_, src_vb_llint, 7)\n"); + for(i = 0; i < 2; i++) + printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d] = %d\n", + i, vresult_llint[i], i, expected_vresult_llint[i]); +#else + abort(); +#endif + } + + src_va_ullint = (vector unsigned long long int) { 0, 0 }; + src_vb_ullint = (vector unsigned long long int) { 54*128, 26*128 }; + vresult_ullint = (vector unsigned long long int) { 0, 0 }; + expected_vresult_ullint = (vector unsigned long long int) { 54, 26 }; + + vresult_ullint = vec_srdb (src_va_ullint, src_vb_ullint, 7); + + if (!vec_all_eq (vresult_ullint, expected_vresult_ullint)) { +#if DEBUG + printf("ERROR, vec_srdb (src_va_ullint_, src_vb_ullint, 7)\n"); + for(i = 0; i < 2; i++) + printf(" vresult_ullint[%d] = %d, expected_vresult_ullint[%d] = %d\n", + i, vresult_ullint[i], i, expected_vresult_ullint[i]); +#else + abort(); +#endif + } + + return 0; +} + +/* { dg-final { scan-assembler-times {\msldbi\M} 6 } } */ +/* { dg-final { scan-assembler-times {\msrdbi\M} 6 } } */ + + From patchwork Mon Jun 15 23:38:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Carl Love X-Patchwork-Id: 1309880 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=gcc.gnu.org Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=pTWPQI0r; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with 
Message-ID: <6f6c9730b32ceb721155bf3f20d1db13a26a4713.camel@us.ibm.com>
Subject: [PATCH 5/6 ver 2] rs6000, Add vector splat builtin support
To: segher@gcc.gnu.org, dje.gcc@gmail.com, gcc-patches@gcc.gnu.org
Date: Mon, 15 Jun 2020 16:38:11 -0700
From: Carl Love

v2 changes:

  Change log fixes.

  gcc/config/rs6000/altivec.md: changed name of define_insn and
  define_expand for vxxspltiw... to xxspltiw...  Fixed spaces in
  gen_xxsplti32dx_v4sf_inst (operands[0], GEN_INT

  gcc/config/rs6000/rs6000-builtin.def: propagated the name changes above
  where they are used.

  Updated the s32bit_cint_operand, c32bit_cint_operand and
  f32bit_const_operand predicate definitions.

  Changed name of rs6000_constF32toI32 to rs6000_const_f32_to_i32,
  propagated name change as needed.  Replaced if test with gcc_assert().

  Fixed description of vec_splatid() in documentation.

-----------------------

GCC maintainers:

The following patch adds support for the vec_splati, vec_splatid and
vec_splati_ins builtins.

Note, this patch adds support for instructions that take a 32-bit
immediate value that represents a floating point value.  This support
adds new predicates and a support function to properly handle the
immediate value.

The patch has been compiled and tested on

  powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regression errors.

The test case was compiled on a Power 9 system and then tested on
Mambo.

Please let me know if this patch is acceptable for the pu branch.  Thanks.
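For readers without access to "future" hardware, the following is a rough, portable C sketch of what the new builtins are described as doing. It is not the code in this patch. The names float_to_imm32, model_splatid and model_splati_ins are hypothetical and exist only for illustration. The sketch assumes that float is IEEE-754 binary32, and it numbers elements the way the test case does (the expander in altivec.md is what adjusts the instruction index for endianness).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the 32-bit pattern of VALUE, i.e. the
   kind of immediate a float constant is turned into before it is
   handed to an instruction that takes a 32-bit immediate.
   Assumption: float is IEEE-754 binary32.  */
uint32_t
float_to_imm32 (float value)
{
  uint32_t bits;
  memcpy (&bits, &value, sizeof bits);
  return bits;
}

/* Hypothetical model of vec_splatid: widen VALUE to double precision
   and copy it into both elements of the result.  */
void
model_splatid (double result[2], float value)
{
  result[0] = (double) value;
  result[1] = (double) value;
}

/* Hypothetical model of the signed int form of vec_splati_ins: copy
   SRC, then store VALUE into word INDEX of each doubleword.  The
   other word of each doubleword is left unchanged.  */
void
model_splati_ins (int32_t result[4], const int32_t src[4],
                  unsigned int index, int32_t value)
{
  /* The documentation requires argument 2 to be 0 or 1.  */
  assert (index == 0 || index == 1);

  for (int i = 0; i < 4; i++)
    result[i] = src[i];

  result[index] = value;      /* first doubleword, elements 0 and 1 */
  result[2 + index] = value;  /* second doubleword, elements 2 and 3 */
}
```

With these models, a source of { 2, 3, 4, 5 }, an index of 1 and a value of 20 give { 2, 20, 4, 20 }, which is the result the new test expects from vec_splati_ins.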
                   Carl Love

--------------------------------------------------------

gcc/ChangeLog

2020-06-15  Carl Love

        * config/rs6000/altivec.h (vec_splati, vec_splatid, vec_splati_ins):
        Add defines.
        * config/rs6000/altivec.md (UNSPEC_XXSPLTIW, UNSPEC_XXSPLTID,
        UNSPEC_XXSPLTI32DX): New.
        (xxspltiw_v4si, xxspltiw_v4sf_inst, xxspltidp_v2df_inst,
        xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst): New define_insn.
        (xxspltiw_v4sf, xxspltidp_v2df, xxsplti32dx_v4si,
        xxsplti32dx_v4sf): New define_expands.
        * config/rs6000/predicates.md (u1bit_cint_operand,
        s32bit_cint_operand, c32bit_cint_operand,
        f32bit_const_operand): New predicates.
        * config/rs6000/rs6000-builtin.def (VXXSPLTIW_V4SI, VXXSPLTIW_V4SF,
        VXXSPLTID): New BU_FUTURE_V_1 definitions.
        (VXXSPLTI32DX_V4SI, VXXSPLTI32DX_V4SF): New BU_FUTURE_V_3
        definitions.
        (XXSPLTIW, XXSPLTID): New BU_FUTURE_OVERLOAD_1 definitions.
        (XXSPLTI32DX): Add BU_FUTURE_OVERLOAD_3 definition.
        * config/rs6000/rs6000-call.c (FUTURE_BUILTIN_VEC_XXSPLTIW,
        FUTURE_BUILTIN_VEC_XXSPLTID, FUTURE_BUILTIN_VEC_XXSPLTI32DX):
        New definitions.
        * config/rs6000/rs6000-protos.h (rs6000_const_f32_to_i32): New
        extern declaration.
        * config/rs6000/rs6000.c (rs6000_const_f32_to_i32): New function.
        * doc/extend.texi: Add documentation for vec_splati, vec_splatid,
        and vec_splati_ins.

gcc/testsuite/ChangeLog

2020-06-15  Carl Love

        * gcc.target/powerpc/vec-splati-runnable.c: New test.
--- gcc/config/rs6000/altivec.h | 3 + gcc/config/rs6000/altivec.md | 109 +++++++++++++ gcc/config/rs6000/predicates.md | 33 ++++ gcc/config/rs6000/rs6000-builtin.def | 13 ++ gcc/config/rs6000/rs6000-call.c | 19 +++ gcc/config/rs6000/rs6000-protos.h | 1 + gcc/config/rs6000/rs6000.c | 11 ++ gcc/doc/extend.texi | 35 +++++ .../gcc.target/powerpc/vec-splati-runnable.c | 145 ++++++++++++++++++ 9 files changed, 369 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h index 0be68892aad..9ed41b1cbf1 100644 --- a/gcc/config/rs6000/altivec.h +++ b/gcc/config/rs6000/altivec.h @@ -705,6 +705,9 @@ __altivec_scalar_pred(vec_any_nle, #define vec_replace_unaligned(a, b, c) __builtin_vec_replace_un (a, b, c) #define vec_sldb(a, b, c) __builtin_vec_sldb (a, b, c) #define vec_srdb(a, b, c) __builtin_vec_srdb (a, b, c) +#define vec_splati(a) __builtin_vec_xxspltiw (a) +#define vec_splatid(a) __builtin_vec_xxspltid (a) +#define vec_splati_ins(a, b, c) __builtin_vec_xxsplti32dx (a, b, c) #define vec_gnb(a, b) __builtin_vec_gnb (a, b) #define vec_clrl(a, b) __builtin_vec_clrl (a, b) diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 832a35cdaa9..25f6b9b2f07 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -173,6 +173,9 @@ UNSPEC_VSTRIL UNSPEC_SLDB UNSPEC_SRDB + UNSPEC_XXSPLTIW + UNSPEC_XXSPLTID + UNSPEC_XXSPLTI32DX ]) (define_c_enum "unspecv" @@ -799,6 +802,112 @@ "vsdbi %0,%1,%2,%3" [(set_attr "type" "vecsimple")]) +(define_insn "xxspltiw_v4si" + [(set (match_operand:V4SI 0 "register_operand" "=wa") + (unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")] + UNSPEC_XXSPLTIW))] + "TARGET_FUTURE" + "xxspltiw %x0,%1" + [(set_attr "type" "vecsimple")]) + +(define_expand "xxspltiw_v4sf" + [(set (match_operand:V4SF 0 "register_operand" "=wa") + (unspec:V4SF [(match_operand:SF 1 "f32bit_const_operand" "n")] + 
UNSPEC_XXSPLTIW))] + "TARGET_FUTURE" +{ + long long value = rs6000_const_f32_to_i32 (operands[1]); + emit_insn (gen_xxspltiw_v4sf_inst (operands[0], GEN_INT (value))); + DONE; +}) + +(define_insn "xxspltiw_v4sf_inst" + [(set (match_operand:V4SF 0 "register_operand" "=wa") + (unspec:V4SF [(match_operand:SI 1 "c32bit_cint_operand" "n")] + UNSPEC_XXSPLTIW))] + "TARGET_FUTURE" + "xxspltiw %x0,%c1" + [(set_attr "type" "vecsimple")]) + +(define_expand "xxspltidp_v2df" + [(set (match_operand:V2DF 0 "register_operand" ) + (unspec:V2DF [(match_operand:SF 1 "f32bit_const_operand")] + UNSPEC_XXSPLTID))] + "TARGET_FUTURE" +{ + long value = rs6000_const_f32_to_i32 (operands[1]); + emit_insn (gen_xxspltidp_v2df_inst (operands[0], GEN_INT (value))); + DONE; +}) + +(define_insn "xxspltidp_v2df_inst" + [(set (match_operand:V2DF 0 "register_operand" "=wa") + (unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")] + UNSPEC_XXSPLTID))] + "TARGET_FUTURE" + "xxspltidp %x0,%c1" + [(set_attr "type" "vecsimple")]) + +(define_expand "xxsplti32dx_v4si" + [(set (match_operand:V4SI 0 "register_operand" "=wa") + (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "wa") + (match_operand:QI 2 "u1bit_cint_operand" "n") + (match_operand:SI 3 "s32bit_cint_operand" "n")] + UNSPEC_XXSPLTI32DX))] + "TARGET_FUTURE" +{ + int index = INTVAL (operands[2]); + + if (!BYTES_BIG_ENDIAN) + index = 1 - index; + + /* Instruction uses destination as a source. Do not overwrite source. 
*/ + emit_move_insn (operands[0], operands[1]); + + emit_insn (gen_xxsplti32dx_v4si_inst (operands[0], GEN_INT (index), + operands[3])); + DONE; +} + [(set_attr "type" "vecsimple")]) + +(define_insn "xxsplti32dx_v4si_inst" + [(set (match_operand:V4SI 0 "register_operand" "+wa") + (unspec:V4SI [(match_operand:QI 1 "u1bit_cint_operand" "n") + (match_operand:SI 2 "s32bit_cint_operand" "n")] + UNSPEC_XXSPLTI32DX))] + "TARGET_FUTURE" + "xxsplti32dx %x0,%1,%2" + [(set_attr "type" "vecsimple")]) + +(define_expand "xxsplti32dx_v4sf" + [(set (match_operand:V4SF 0 "register_operand" "=wa") + (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "wa") + (match_operand:QI 2 "u1bit_cint_operand" "n") + (match_operand:SF 3 "f32bit_const_operand" "n")] + UNSPEC_XXSPLTI32DX))] + "TARGET_FUTURE" +{ + int index = INTVAL (operands[2]); + long value = rs6000_const_f32_to_i32 (operands[3]); + if (!BYTES_BIG_ENDIAN) + index = 1 - index; + + /* Instruction uses destination as a source. Do not overwrite source. */ + emit_move_insn (operands[0], operands[1]); + emit_insn (gen_xxsplti32dx_v4sf_inst (operands[0], GEN_INT (index), + GEN_INT (value))); + DONE; +}) + +(define_insn "xxsplti32dx_v4sf_inst" + [(set (match_operand:V4SF 0 "register_operand" "+wa") + (unspec:V4SF [(match_operand:QI 1 "u1bit_cint_operand" "n") + (match_operand:SI 2 "s32bit_cint_operand" "n")] + UNSPEC_XXSPLTI32DX))] + "TARGET_FUTURE" + "xxsplti32dx %x0,%1,%2" + [(set_attr "type" "vecsimple")]) + (define_expand "vstrir_" [(set (match_operand:VIshort 0 "altivec_register_operand") (unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")] diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index c3f460face2..48f913c5718 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -214,6 +214,11 @@ (and (match_code "const_int") (match_test "INTVAL (op) >= -16 && INTVAL (op) <= 15"))) +;; Return 1 if op is a unsigned 1-bit constant integer. 
+(define_predicate "u1bit_cint_operand" + (and (match_code "const_int") + (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 1"))) + ;; Return 1 if op is a unsigned 3-bit constant integer. (define_predicate "u3bit_cint_operand" (and (match_code "const_int") @@ -272,6 +277,34 @@ (match_test "(unsigned HOST_WIDE_INT) (INTVAL (op) + 0x8000) >= 0x10000"))) +;; Return 1 if op is a 32-bit constant signed integer +(define_predicate "s32bit_cint_operand" + (and (match_code "const_int") + (match_test "(unsigned HOST_WIDE_INT) + (0x80000000 + UINTVAL (op)) >> 32 == 0"))) + +;; Return 1 if op is a constant 32-bit unsigned +(define_predicate "c32bit_cint_operand" + (and (match_code "const_int") + (match_test "((UINTVAL (op) >> 32) == 0)"))) + +;; Return 1 if op is a constant 32-bit floating point value +(define_predicate "f32bit_const_operand" + (match_code "const_double") +{ + if (GET_MODE (op) == SFmode) + return 1; + + else if ((GET_MODE (op) == DFmode) && ((UINTVAL (op) >> 32) == 0)) + { + /* Value fits in 32-bits */ + return 1; + } + else + /* Not the expected mode. */ + return 0; +}) + ;; Return 1 if op is a positive constant integer that is an exact power of 2. 
(define_predicate "exact_log2_cint_operand" (and (match_code "const_int") diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index 2b198177ef0..c85326de7f2 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -2666,6 +2666,15 @@ BU_FUTURE_V_3 (VSRDB_V16QI, "vsrdb_v16qi", CONST, vsrdb_v16qi) BU_FUTURE_V_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, vsrdb_v8hi) BU_FUTURE_V_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si) BU_FUTURE_V_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di) + +BU_FUTURE_V_1 (VXXSPLTIW_V4SI, "vxxspltiw_v4si", CONST, xxspltiw_v4si) +BU_FUTURE_V_1 (VXXSPLTIW_V4SF, "vxxspltiw_v4sf", CONST, xxspltiw_v4sf) + +BU_FUTURE_V_1 (VXXSPLTID, "vxxspltidp", CONST, xxspltidp_v2df) + +BU_FUTURE_V_3 (VXXSPLTI32DX_V4SI, "vxxsplti32dx_v4si", CONST, xxsplti32dx_v4si) +BU_FUTURE_V_3 (VXXSPLTI32DX_V4SF, "vxxsplti32dx_v4sf", CONST, xxsplti32dx_v4sf) + BU_FUTURE_V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi) BU_FUTURE_V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi) BU_FUTURE_V_1 (VSTRIBL, "vstribl", CONST, vstril_v16qi) @@ -2697,6 +2706,10 @@ BU_FUTURE_OVERLOAD_1 (VSTRIL, "stril") BU_FUTURE_OVERLOAD_1 (VSTRIR_P, "strir_p") BU_FUTURE_OVERLOAD_1 (VSTRIL_P, "stril_p") + +BU_FUTURE_OVERLOAD_1 (XXSPLTIW, "xxspltiw") +BU_FUTURE_OVERLOAD_1 (XXSPLTID, "xxspltid") +BU_FUTURE_OVERLOAD_3 (XXSPLTI32DX, "xxsplti32dx") /* 1 argument crypto functions. 
*/ BU_CRYPTO_1 (VSBOX, "vsbox", CONST, crypto_vsbox_v2di) diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c index 092e6c1cc2c..e36aafaf71c 100644 --- a/gcc/config/rs6000/rs6000-call.c +++ b/gcc/config/rs6000/rs6000-call.c @@ -5679,6 +5679,22 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = { RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTQI }, + { FUTURE_BUILTIN_VEC_XXSPLTIW, FUTURE_BUILTIN_VXXSPLTIW_V4SI, + RS6000_BTI_V4SI, RS6000_BTI_INTSI, 0, 0 }, + { FUTURE_BUILTIN_VEC_XXSPLTIW, FUTURE_BUILTIN_VXXSPLTIW_V4SF, + RS6000_BTI_V4SF, RS6000_BTI_float, 0, 0 }, + + { FUTURE_BUILTIN_VEC_XXSPLTID, FUTURE_BUILTIN_VXXSPLTID, + RS6000_BTI_V2DF, RS6000_BTI_float, 0, 0 }, + + { FUTURE_BUILTIN_VEC_XXSPLTI32DX, FUTURE_BUILTIN_VXXSPLTI32DX_V4SI, + RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_UINTQI, RS6000_BTI_INTSI }, + { FUTURE_BUILTIN_VEC_XXSPLTI32DX, FUTURE_BUILTIN_VXXSPLTI32DX_V4SI, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI, + RS6000_BTI_UINTSI }, + { FUTURE_BUILTIN_VEC_XXSPLTI32DX, FUTURE_BUILTIN_VXXSPLTI32DX_V4SF, + RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_UINTQI, RS6000_BTI_float }, + { FUTURE_BUILTIN_VEC_SRDB, FUTURE_BUILTIN_VSRDB_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_UINTQI }, @@ -13539,6 +13555,9 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0, case ALTIVEC_BUILTIN_VSRH: case ALTIVEC_BUILTIN_VSRW: case P8V_BUILTIN_VSRD: + /* Vector splat immediate insert */ + case FUTURE_BUILTIN_VXXSPLTI32DX_V4SI: + case FUTURE_BUILTIN_VXXSPLTI32DX_V4SF: h.uns_p[2] = 1; break; diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index 5508484ba19..c6158874ce9 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -274,6 +274,7 @@ extern void rs6000_asm_output_dwarf_pcrel (FILE *file, int size, const char *label); extern void 
rs6000_asm_output_dwarf_datarel (FILE *file, int size, const char *label); +extern long long rs6000_const_f32_to_i32 (rtx operand); /* Declare functions in rs6000-c.c */ diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 58f5d780603..89fcc99df0a 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -26494,6 +26494,17 @@ rs6000_cannot_substitute_mem_equiv_p (rtx mem) return false; } +long long +rs6000_const_f32_to_i32 (rtx operand) +{ + long long value; + const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (operand); + + gcc_assert (GET_MODE (operand) == SFmode); + REAL_VALUE_TO_TARGET_SINGLE (*rv, value); + return value; +} + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-rs6000.h" diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 6926c866492..dfdffead903 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -21148,6 +21148,41 @@ this instruction must be endian-aware. @findex vec_srdb +Vector Splat + +@smallexample +@exdent vector signed int vec_splati (const signed int); +@exdent vector float vec_splati (const float); +@end smallexample + +Splat a 32-bit immediate into a vector of words. + +@findex vec_splati + +@smallexample +@exdent vector double vec_splatid (const float); +@end smallexample + +Convert a single precision floating-point value to double-precision and splat +the result to a vector of double-precision floats. + +@findex vec_splatid + +@smallexample +@exdent vector signed int vec_splati_ins (vector signed int, +const unsigned int, const signed int); +@exdent vector unsigned int vec_splati_ins (vector unsigned int, +const unsigned int, const unsigned int); +@exdent vector float vec_splati_ins (vector float, const unsigned int, +const float); +@end smallexample + +Argument 2 must be either 0 or 1. Splat the value of argument 3 into the word +identified by argument 2 of each doubleword of argument 1 and return the +result. The other words of argument 1 are unchanged. 
+ +@findex vec_splati_ins + @smallexample @exdent vector unsigned long long int @exdent vec_pdep (vector unsigned long long int, vector unsigned long long int) diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c new file mode 100644 index 00000000000..f9fa55ae0d4 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c @@ -0,0 +1,145 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_future_hw } */ +/* { dg-options "-mdejagnu-cpu=future" } */ +#include + +#define DEBUG 0 + +#ifdef DEBUG +#include +#endif + +extern void abort (void); + +int +main (int argc, char *argv []) +{ + int i; + vector int vsrc_a_int; + vector int vresult_int; + vector int expected_vresult_int; + int src_a_int = 13; + + vector unsigned int vsrc_a_uint; + vector unsigned int vresult_uint; + vector unsigned int expected_vresult_uint; + unsigned int src_a_uint = 7; + + vector float vresult_f; + vector float expected_vresult_f; + vector float vsrc_a_f; + float src_a_f = 23.0; + + vector double vsrc_a_d; + vector double vresult_d; + vector double expected_vresult_d; + + /* Vector splati word */ + vresult_int = (vector signed int) { 1, 2, 3, 4 }; + expected_vresult_int = (vector signed int) { -13, -13, -13, -13 }; + + vresult_int = vec_splati ( -13 ); + + if (!vec_all_eq (vresult_int, expected_vresult_int)) { +#if DEBUG + printf("ERROR, vec_splati (src_a_int)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n", + i, vresult_int[i], i, expected_vresult_int[i]); +#else + abort(); +#endif + } + + vresult_f = (vector float) { 1.0, 2.0, 3.0, 4.0 }; + expected_vresult_f = (vector float) { 23.0, 23.0, 23.0, 23.0 }; + + vresult_f = vec_splati (23.0f); + + if (!vec_all_eq (vresult_f, expected_vresult_f)) { +#if DEBUG + printf("ERROR, vec_splati (src_a_f)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_f[%d] = %f, expected_vresult_f[%d] = %f\n", + i, 
vresult_f[i], i, expected_vresult_f[i]); +#else + abort(); +#endif + } + + /* Vector splati double */ + vresult_d = (vector double) { 2.0, 3.0 }; + expected_vresult_d = (vector double) { -31.0, -31.0 }; + + vresult_d = vec_splatid (-31.0f); + + if (!vec_all_eq (vresult_d, expected_vresult_d)) { +#if DEBUG + printf("ERROR, vec_splati (-31.0f)\n"); + for(i = 0; i < 2; i++) + printf(" vresult_d[%i] = %f, expected_vresult_d[%i] = %f\n", + i, vresult_d[i], i, expected_vresult_d[i]); +#else + abort(); +#endif + } + + /* Vector splat immediate */ + vsrc_a_int = (vector int) { 2, 3, 4, 5 }; + vresult_int = (vector int) { 1, 1, 1, 1 }; + expected_vresult_int = (vector int) { 2, 20, 4, 20 }; + + vresult_int = vec_splati_ins (vsrc_a_int, 1, 20); + + if (!vec_all_eq (vresult_int, expected_vresult_int)) { +#if DEBUG + printf("ERROR, vec_splati_ins (vsrc_a_int, 1, 20)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_int[%i] = %d, expected_vresult_int[%i] = %d\n", + i, vresult_int[i], i, expected_vresult_int[i]); +#else + abort(); +#endif + } + + vsrc_a_uint = (vector unsigned int) { 4, 5, 6, 7 }; + vresult_uint = (vector unsigned int) { 1, 1, 1, 1 }; + expected_vresult_uint = (vector unsigned int) { 4, 40, 6, 40 }; + + vresult_uint = vec_splati_ins (vsrc_a_uint, 1, 40); + + if (!vec_all_eq (vresult_uint, expected_vresult_uint)) { +#if DEBUG + printf("ERROR, vec_splati_ins (vsrc_a_uint, 1, 40)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_uint[%i] = %d, expected_vresult_uint[%i] = %d\n", + i, vresult_uint[i], i, expected_vresult_uint[i]); +#else + abort(); +#endif + } + + vsrc_a_f = (vector float) { 2.0, 3.0, 4.0, 5.0 }; + vresult_f = (vector float) { 1.0, 1.0, 1.0, 1.0 }; + expected_vresult_f = (vector float) { 2.0, 20.1, 4.0, 20.1 }; + + vresult_f = vec_splati_ins (vsrc_a_f, 1, 20.1f); + + if (!vec_all_eq (vresult_f, expected_vresult_f)) { +#if DEBUG + printf("ERROR, vec_splati_ins (vsrc_a_f, 1, 20.1)\n"); + for(i = 0; i < 4; i++) + printf(" vresult_f[%i] = %f, 
expected_vresult_f[%i] = %f\n", + i, vresult_f[i], i, expected_vresult_f[i]); +#else + abort(); +#endif + } + + return 0; +} + +/* { dg-final { scan-assembler-times {\msplati\M} 6 } } */ +/* { dg-final { scan-assembler-times {\msrdbi\M} 6 } } */ + + 
From patchwork Mon Jun 15 23:38:16 2020 X-Patchwork-Submitter: Carl Love X-Patchwork-Id: 1309879 Message-ID: <82eed57332d04f6e453286fcb99c18f561aa2dd6.camel@us.ibm.com> Subject: [PATCH 6/6 ver 2] rs6000 Add vector blend, permute builtin support To: segher@gcc.gnu.org, dje.gcc@gmail.com, gcc-patches@gcc.gnu.org Date: Mon, 15 Jun 2020 16:38:16 -0700 From: Carl Love Reply-To: Carl Love 
v2 changes: Updated ChangeLog per comments. 
define_expand "xxpermx": updated implementation to use XOR. (icode == CODE_FOR_xxpermx): fixed comments and check for 3-bit immediate field. gcc/doc/extend.texi: in response to the comment "Maybe it should say it is related to vsel/xxsel, but per bigger element?", added a comment. I took the description directly from the spec and don't really want to mess with the approved description. fixed typo for Vector Permute Extendedextracth ---------- GCC maintainers: The following patch adds support for the vec_blendv and vec_permx builtins. The patch has been compiled and tested on powerpc64le-unknown-linux-gnu (Power 9 LE) with no regression errors. The test cases were compiled on a Power 9 system and then tested on Mambo. Carl Love --------------------------------------------------------------- rs6000 RFC2609 vector blend, permute instructions gcc/ChangeLog 2020-06-15 Carl Love * config/rs6000/altivec.h (vec_blendv, vec_permx): Add define. * config/rs6000/altivec.md (UNSPEC_XXBLEND, UNSPEC_XXPERMX): New unspecs. (VM3): New define_mode_iterator. (VM3_char): New define_mode_attr. (xxblend_<mode> mode VM3): New define_insn. (xxpermx): New define_expand. (xxpermx_inst): New define_insn. * config/rs6000/rs6000-builtin.def (VXXBLEND_V16QI, VXXBLEND_V8HI, VXXBLEND_V4SI, VXXBLEND_V2DI, VXXBLEND_V4SF, VXXBLEND_V2DF): New BU_FUTURE_V_3 definitions. (XXBLEND): New BU_FUTURE_OVERLOAD_3 definition. (XXPERMX): New BU_FUTURE_OVERLOAD_4 definition. * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): (FUTURE_BUILTIN_VXXPERMX): Add if case support. * config/rs6000/rs6000-call.c (FUTURE_BUILTIN_VXXBLEND_V16QI, FUTURE_BUILTIN_VXXBLEND_V8HI, FUTURE_BUILTIN_VXXBLEND_V4SI, FUTURE_BUILTIN_VXXBLEND_V2DI, FUTURE_BUILTIN_VXXBLEND_V4SF, FUTURE_BUILTIN_VXXBLEND_V2DF, FUTURE_BUILTIN_VXXPERMX): Define overloaded arguments. (rs6000_expand_quaternop_builtin): Add if case for CODE_FOR_xxpermx. 
(builtin_quaternary_function_type): Add v16uqi_type and xxpermx_type variables, add case statement for FUTURE_BUILTIN_VXXPERMX. (builtin_function_type)[FUTURE_BUILTIN_VXXBLEND_V16QI, FUTURE_BUILTIN_VXXBLEND_V8HI, FUTURE_BUILTIN_VXXBLEND_V4SI, FUTURE_BUILTIN_VXXBLEND_V2DI]: Add case statements. * doc/extend.texi: Add documentation for vec_blendv and vec_permx. gcc/testsuite/ChangeLog 2020-06-15 Carl Love gcc.target/powerpc/vec-blend-runnable.c: New test. gcc.target/powerpc/vec-permute-ext-runnable.c: New test. --- gcc/config/rs6000/altivec.h | 2 + gcc/config/rs6000/altivec.md | 71 +++++ gcc/config/rs6000/rs6000-builtin.def | 13 + gcc/config/rs6000/rs6000-c.c | 25 +- gcc/config/rs6000/rs6000-call.c | 94 ++++++ gcc/doc/extend.texi | 63 ++++ .../gcc.target/powerpc/vec-blend-runnable.c | 276 ++++++++++++++++ .../powerpc/vec-permute-ext-runnable.c | 294 ++++++++++++++++++ 8 files changed, 833 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-blend-runnable.c create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-permute-ext-runnable.c diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h index 9ed41b1cbf1..1b532effebe 100644 --- a/gcc/config/rs6000/altivec.h +++ b/gcc/config/rs6000/altivec.h @@ -708,6 +708,8 @@ __altivec_scalar_pred(vec_any_nle, #define vec_splati(a) __builtin_vec_xxspltiw (a) #define vec_splatid(a) __builtin_vec_xxspltid (a) #define vec_splati_ins(a, b, c) __builtin_vec_xxsplti32dx (a, b, c) +#define vec_blendv(a, b, c) __builtin_vec_xxblend (a, b, c) +#define vec_permx(a, b, c, d) __builtin_vec_xxpermx (a, b, c, d) #define vec_gnb(a, b) __builtin_vec_gnb (a, b) #define vec_clrl(a, b) __builtin_vec_clrl (a, b) diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 25f6b9b2f07..fd221bb21f6 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -176,6 +176,8 @@ UNSPEC_XXSPLTIW UNSPEC_XXSPLTID UNSPEC_XXSPLTI32DX + UNSPEC_XXBLEND + UNSPEC_XXPERMX ]) 
(define_c_enum "unspecv" @@ -218,6 +220,21 @@ (KF "FLOAT128_VECTOR_P (KFmode)") (TF "FLOAT128_VECTOR_P (TFmode)")]) +;; Like VM2, just do char, short, int, long, float and double +(define_mode_iterator VM3 [V4SI + V8HI + V16QI + V4SF + V2DF + V2DI]) + +(define_mode_attr VM3_char [(V2DI "d") + (V4SI "w") + (V8HI "h") + (V16QI "b") + (V2DF "d") + (V4SF "w")]) + ;; Map the Vector convert single precision to double precision for integer ;; versus floating point (define_mode_attr VS_sxwsp [(V4SI "sxw") (V4SF "sp")]) @@ -908,6 +925,60 @@ "xxsplti32dx %x0,%1,%2" [(set_attr "type" "vecsimple")]) +(define_insn "xxblend_<mode>" + [(set (match_operand:VM3 0 "register_operand" "=wa") + (unspec:VM3 [(match_operand:VM3 1 "register_operand" "wa") + (match_operand:VM3 2 "register_operand" "wa") + (match_operand:VM3 3 "register_operand" "wa")] + UNSPEC_XXBLEND))] + "TARGET_FUTURE" + "xxblendv<VM3_char> %x0,%x1,%x2,%x3" + [(set_attr "type" "vecsimple")]) + +(define_expand "xxpermx" + [(set (match_operand:V2DI 0 "register_operand" "+wa") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "wa") + (match_operand:V2DI 2 "register_operand" "wa") + (match_operand:V16QI 3 "register_operand" "wa") + (match_operand:QI 4 "u8bit_cint_operand" "n")] + UNSPEC_XXPERMX))] + "TARGET_FUTURE" +{ + if (BYTES_BIG_ENDIAN) + emit_insn (gen_xxpermx_inst (operands[0], operands[1], + operands[2], operands[3], + operands[4])); + else + { + /* Reverse value of byte element index eidx by XORing with 0xFF. + Reverse the 32-byte section identifier match by subtracting bits [0:2] + of element from 7. 
*/ + int value = INTVAL (operands[4]); + rtx vreg = gen_reg_rtx (V16QImode); + + emit_insn (gen_xxspltib_v16qi (vreg, GEN_INT (-1))); + emit_insn (gen_xorv16qi3 (operands[3], operands[3], vreg)); + value = 7 - value; + emit_insn (gen_xxpermx_inst (operands[0], operands[2], + operands[1], operands[3], + GEN_INT (value))); + } + + DONE; +} + [(set_attr "type" "vecsimple")]) + +(define_insn "xxpermx_inst" + [(set (match_operand:V2DI 0 "register_operand" "+v") + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v") + (match_operand:V2DI 2 "register_operand" "v") + (match_operand:V16QI 3 "register_operand" "v") + (match_operand:QI 4 "u3bit_cint_operand" "n")] + UNSPEC_XXPERMX))] + "TARGET_FUTURE" + "xxpermx %x0,%x1,%x2,%x3,%4" + [(set_attr "type" "vecsimple")]) + (define_expand "vstrir_" [(set (match_operand:VIshort 0 "altivec_register_operand") (unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")] diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index c85326de7f2..d1d04f013bb 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -2675,6 +2675,15 @@ BU_FUTURE_V_1 (VXXSPLTID, "vxxspltidp", CONST, xxspltidp_v2df) BU_FUTURE_V_3 (VXXSPLTI32DX_V4SI, "vxxsplti32dx_v4si", CONST, xxsplti32dx_v4si) BU_FUTURE_V_3 (VXXSPLTI32DX_V4SF, "vxxsplti32dx_v4sf", CONST, xxsplti32dx_v4sf) +BU_FUTURE_V_3 (VXXBLEND_V16QI, "xxblend_v16qi", CONST, xxblend_v16qi) +BU_FUTURE_V_3 (VXXBLEND_V8HI, "xxblend_v8hi", CONST, xxblend_v8hi) +BU_FUTURE_V_3 (VXXBLEND_V4SI, "xxblend_v4si", CONST, xxblend_v4si) +BU_FUTURE_V_3 (VXXBLEND_V2DI, "xxblend_v2di", CONST, xxblend_v2di) +BU_FUTURE_V_3 (VXXBLEND_V4SF, "xxblend_v4sf", CONST, xxblend_v4sf) +BU_FUTURE_V_3 (VXXBLEND_V2DF, "xxblend_v2df", CONST, xxblend_v2df) + +BU_FUTURE_V_4 (VXXPERMX, "xxpermx", CONST, xxpermx) + BU_FUTURE_V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi) BU_FUTURE_V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi) BU_FUTURE_V_1 (VSTRIBL, 
"vstribl", CONST, vstril_v16qi) @@ -2710,6 +2719,10 @@ BU_FUTURE_OVERLOAD_1 (VSTRIL_P, "stril_p") BU_FUTURE_OVERLOAD_1 (XXSPLTIW, "xxspltiw") BU_FUTURE_OVERLOAD_1 (XXSPLTID, "xxspltid") BU_FUTURE_OVERLOAD_3 (XXSPLTI32DX, "xxsplti32dx") + +BU_FUTURE_OVERLOAD_3 (XXBLEND, "xxblend") +BU_FUTURE_OVERLOAD_4 (XXPERMX, "xxpermx") + /* 1 argument crypto functions. */ BU_CRYPTO_1 (VSBOX, "vsbox", CONST, crypto_vsbox_v2di) diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c index 07ca33a89b4..88bbc52de6b 100644 --- a/gcc/config/rs6000/rs6000-c.c +++ b/gcc/config/rs6000/rs6000-c.c @@ -1796,22 +1796,37 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl, unsupported_builtin = true; } } - else if (fcode == FUTURE_BUILTIN_VEC_XXEVAL) + else if ((fcode == FUTURE_BUILTIN_VEC_XXEVAL) + || (fcode == FUTURE_BUILTIN_VXXPERMX)) { - /* Need to special case __builtin_vec_xxeval because this takes + signed char op3_type; + + /* Need to special case these builtins because they take 4 arguments, and the existing infrastructure handles no more than three. */ if (nargs != 4) { - error ("builtin %qs requires 4 arguments", - "__builtin_vec_xxeval"); + if (fcode == FUTURE_BUILTIN_VEC_XXEVAL) + error ("builtin %qs requires 4 arguments", + "__builtin_vec_xxeval"); + else + error ("builtin %qs requires 4 arguments", + "__builtin_vec_xxpermx"); + return error_mark_node; } + + /* Set value for vec_xxpermx here as it is a constant. 
*/ + op3_type = RS6000_BTI_V16QI; + for ( ; desc->code == fcode; desc++) { + if (fcode == FUTURE_BUILTIN_VEC_XXEVAL) + op3_type = desc->op3; + if (rs6000_builtin_type_compatible (types[0], desc->op1) && rs6000_builtin_type_compatible (types[1], desc->op2) - && rs6000_builtin_type_compatible (types[2], desc->op3) + && rs6000_builtin_type_compatible (types[2], op3_type) && rs6000_builtin_type_compatible (types[3], RS6000_BTI_UINTSI)) { diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c index e36aafaf71c..6770a7f05a2 100644 --- a/gcc/config/rs6000/rs6000-call.c +++ b/gcc/config/rs6000/rs6000-call.c @@ -5554,6 +5554,39 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = { RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI }, + /* The overloaded XXPERMX definitions are handled specially because the + fourth unsigned char operand is not encoded in this table. */ + { FUTURE_BUILTIN_VEC_XXPERMX, FUTURE_BUILTIN_VXXPERMX, + RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, + RS6000_BTI_unsigned_V16QI }, + { FUTURE_BUILTIN_VEC_XXPERMX, FUTURE_BUILTIN_VXXPERMX, + RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, + RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI }, + { FUTURE_BUILTIN_VEC_XXPERMX, FUTURE_BUILTIN_VXXPERMX, + RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, + RS6000_BTI_unsigned_V16QI }, + { FUTURE_BUILTIN_VEC_XXPERMX, FUTURE_BUILTIN_VXXPERMX, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI }, + { FUTURE_BUILTIN_VEC_XXPERMX, FUTURE_BUILTIN_VXXPERMX, + RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, + RS6000_BTI_unsigned_V16QI }, + { FUTURE_BUILTIN_VEC_XXPERMX, FUTURE_BUILTIN_VXXPERMX, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V16QI }, + { FUTURE_BUILTIN_VEC_XXPERMX, FUTURE_BUILTIN_VXXPERMX, + RS6000_BTI_V2DI, RS6000_BTI_V2DI, 
RS6000_BTI_V2DI, + RS6000_BTI_unsigned_V16QI }, + { FUTURE_BUILTIN_VEC_XXPERMX, FUTURE_BUILTIN_VXXPERMX, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V16QI }, + { FUTURE_BUILTIN_VEC_XXPERMX, FUTURE_BUILTIN_VXXPERMX, + RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, + RS6000_BTI_unsigned_V16QI }, + { FUTURE_BUILTIN_VEC_XXPERMX, FUTURE_BUILTIN_VXXPERMX, + RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, + RS6000_BTI_unsigned_V16QI }, + { FUTURE_BUILTIN_VEC_EXTRACTL, FUTURE_BUILTIN_VEXTRACTBL, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI }, @@ -5695,6 +5728,37 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = { { FUTURE_BUILTIN_VEC_XXSPLTI32DX, FUTURE_BUILTIN_VXXSPLTI32DX_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_UINTQI, RS6000_BTI_float }, + { FUTURE_BUILTIN_VEC_XXBLEND, FUTURE_BUILTIN_VXXBLEND_V16QI, + RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, + RS6000_BTI_unsigned_V16QI }, + { FUTURE_BUILTIN_VEC_XXBLEND, FUTURE_BUILTIN_VXXBLEND_V16QI, + RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, + RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI }, + { FUTURE_BUILTIN_VEC_XXBLEND, FUTURE_BUILTIN_VXXBLEND_V8HI, + RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, + RS6000_BTI_unsigned_V8HI }, + { FUTURE_BUILTIN_VEC_XXBLEND, FUTURE_BUILTIN_VXXBLEND_V8HI, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, + RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI }, + { FUTURE_BUILTIN_VEC_XXBLEND, FUTURE_BUILTIN_VXXBLEND_V4SI, + RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, + RS6000_BTI_unsigned_V4SI }, + { FUTURE_BUILTIN_VEC_XXBLEND, FUTURE_BUILTIN_VXXBLEND_V4SI, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI }, + { FUTURE_BUILTIN_VEC_XXBLEND, FUTURE_BUILTIN_VXXBLEND_V2DI, + RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, + RS6000_BTI_unsigned_V2DI }, + 
{ FUTURE_BUILTIN_VEC_XXBLEND, FUTURE_BUILTIN_VXXBLEND_V2DI, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI }, + { FUTURE_BUILTIN_VEC_XXBLEND, FUTURE_BUILTIN_VXXBLEND_V4SF, + RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, + RS6000_BTI_unsigned_V4SI }, + { FUTURE_BUILTIN_VEC_XXBLEND, FUTURE_BUILTIN_VXXBLEND_V2DF, + RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, + RS6000_BTI_unsigned_V2DI }, + { FUTURE_BUILTIN_VEC_SRDB, FUTURE_BUILTIN_VSRDB_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_UINTQI }, @@ -9911,6 +9975,18 @@ rs6000_expand_quaternop_builtin (enum insn_code icode, tree exp, rtx target) } } + else if (icode == CODE_FOR_xxpermx) + { + /* Only allow 3-bit unsigned literals. */ + STRIP_NOPS (arg3); + if (TREE_CODE (arg3) != INTEGER_CST + || TREE_INT_CST_LOW (arg3) & ~0x7) + { + error ("argument 4 must be a 3-bit unsigned literal"); + return CONST0_RTX (tmode); + } + } + if (target == 0 || GET_MODE (target) != tmode || ! 
(*insn_data[icode].operand[0].predicate) (target, tmode)) @@ -13293,12 +13369,17 @@ builtin_quaternary_function_type (machine_mode mode_ret, tree function_type = NULL; static tree v2udi_type = builtin_mode_to_type[V2DImode][1]; + static tree v16uqi_type = builtin_mode_to_type[V16QImode][1]; static tree uchar_type = builtin_mode_to_type[QImode][1]; static tree xxeval_type = build_function_type_list (v2udi_type, v2udi_type, v2udi_type, v2udi_type, uchar_type, NULL_TREE); + static tree xxpermx_type = + build_function_type_list (v2udi_type, v2udi_type, v2udi_type, + v16uqi_type, uchar_type, NULL_TREE); + switch (builtin) { case FUTURE_BUILTIN_XXEVAL: @@ -13310,6 +13391,15 @@ builtin_quaternary_function_type (machine_mode mode_ret, function_type = xxeval_type; break; + case FUTURE_BUILTIN_VXXPERMX: + gcc_assert ((mode_ret == V2DImode) + && (mode_arg0 == V2DImode) + && (mode_arg1 == V2DImode) + && (mode_arg2 == V16QImode) + && (mode_arg3 == QImode)); + function_type = xxpermx_type; + break; + default: /* A case for each quaternary built-in must be provided above. */ gcc_unreachable (); @@ -13489,6 +13579,10 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0, case FUTURE_BUILTIN_VREPLACE_ELT_UV2DI: case FUTURE_BUILTIN_VREPLACE_UN_UV4SI: case FUTURE_BUILTIN_VREPLACE_UN_UV2DI: + case FUTURE_BUILTIN_VXXBLEND_V16QI: + case FUTURE_BUILTIN_VXXBLEND_V8HI: + case FUTURE_BUILTIN_VXXBLEND_V4SI: + case FUTURE_BUILTIN_VXXBLEND_V2DI: h.uns_p[0] = 1; h.uns_p[1] = 1; h.uns_p[2] = 1; diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index dfdffead903..4529d43a7cc 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -21183,6 +21183,69 @@ result. The other words of argument 1 are unchanged. 
@findex vec_splati_ins +Vector Blend Variable + +@smallexample +@exdent vector signed char vec_blendv (vector signed char, vector signed char, +vector unsigned char); +@exdent vector unsigned char vec_blendv (vector unsigned char, +vector unsigned char, vector unsigned char); +@exdent vector signed short vec_blendv (vector signed short, +vector signed short, vector unsigned short); +@exdent vector unsigned short vec_blendv (vector unsigned short, +vector unsigned short, vector unsigned short); +@exdent vector signed int vec_blendv (vector signed int, vector signed int, +vector unsigned int); +@exdent vector unsigned int vec_blendv (vector unsigned int, +vector unsigned int, vector unsigned int); +@exdent vector signed long long vec_blendv (vector signed long long, +vector signed long long, vector unsigned long long); +@exdent vector unsigned long long vec_blendv (vector unsigned long long, +vector unsigned long long, vector unsigned long long); +@exdent vector float vec_blendv (vector float, vector float, +vector unsigned int); +@exdent vector double vec_blendv (vector double, vector double, +vector unsigned long long); +@end smallexample + +Blend the first and second argument vectors according to the sign bits of the +corresponding elements of the third argument vector. This is similar to the +vsel and xxsel instructions but for bigger elements. 
+ +@findex vec_blendv + +Vector Permute Extended + +@smallexample +@exdent vector signed char vec_permx (vector signed char, vector signed char, +vector unsigned char, const int); +@exdent vector unsigned char vec_permx (vector unsigned char, +vector unsigned char, vector unsigned char, const int); +@exdent vector signed short vec_permx (vector signed short, +vector signed short, vector unsigned char, const int); +@exdent vector unsigned short vec_permx (vector unsigned short, +vector unsigned short, vector unsigned char, const int); +@exdent vector signed int vec_permx (vector signed int, vector signed int, +vector unsigned char, const int); +@exdent vector unsigned int vec_permx (vector unsigned int, +vector unsigned int, vector unsigned char, const int); +@exdent vector signed long long vec_permx (vector signed long long, +vector signed long long, vector unsigned char, const int); +@exdent vector unsigned long long vec_permx (vector unsigned long long, +vector unsigned long long, vector unsigned char, const int); +@exdent vector float vec_permx (vector float, vector float, vector unsigned char, +const int); +@exdent vector double vec_permx (vector double, vector double, vector unsigned char, +const int); +@end smallexample + +Perform a partial permute of the first two arguments, which form a 32-byte +section of an emulated vector up to 256 bytes wide, using the partial permute +control vector in the third argument. The fourth argument (constrained to +values of 0-7) identifies which 32-byte section of the emulated vector is +contained in the first two arguments. 
+@findex vec_permx + @smallexample @exdent vector unsigned long long int @exdent vec_pdep (vector unsigned long long int, vector unsigned long long int) diff --git a/gcc/testsuite/gcc.target/powerpc/vec-blend-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-blend-runnable.c new file mode 100644 index 00000000000..70b25be3bcb --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-blend-runnable.c @@ -0,0 +1,276 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_future_hw } */ +/* { dg-options "-mdejagnu-cpu=future" } */ +#include <altivec.h> + +#define DEBUG 0 + +#if DEBUG +#include <stdio.h> +#endif + +extern void abort (void); + +int +main (int argc, char *argv []) +{ + int i; + vector signed char vsrc_a_char, vsrc_b_char; + vector signed char vresult_char; + vector signed char expected_vresult_char; + + vector unsigned char vsrc_a_uchar, vsrc_b_uchar, vsrc_c_uchar; + vector unsigned char vresult_uchar; + vector unsigned char expected_vresult_uchar; + + vector signed short vsrc_a_short, vsrc_b_short, vsrc_c_short; + vector signed short vresult_short; + vector signed short expected_vresult_short; + + vector unsigned short vsrc_a_ushort, vsrc_b_ushort, vsrc_c_ushort; + vector unsigned short vresult_ushort; + vector unsigned short expected_vresult_ushort; + + vector int vsrc_a_int, vsrc_b_int, vsrc_c_int; + vector int vresult_int; + vector int expected_vresult_int; + + vector unsigned int vsrc_a_uint, vsrc_b_uint, vsrc_c_uint; + vector unsigned int vresult_uint; + vector unsigned int expected_vresult_uint; + + vector long long int vsrc_a_ll, vsrc_b_ll, vsrc_c_ll; + vector long long int vresult_ll; + vector long long int expected_vresult_ll; + + vector unsigned long long int vsrc_a_ull, vsrc_b_ull, vsrc_c_ull; + vector unsigned long long int vresult_ull; + vector unsigned long long int expected_vresult_ull; + + vector float vresult_f; + vector float expected_vresult_f; + vector float vsrc_a_f, vsrc_b_f; + + vector double vsrc_a_d, vsrc_b_d; + vector double vresult_d; + 
vector double expected_vresult_d; + + /* Vector blend */ + vsrc_c_uchar = (vector unsigned char) { 0, 0x80, 0, 0x80, 0, 0x80, 0, 0x80, + 0, 0x80, 0, 0x80, 0, 0x80, 0, 0x80 }; + + vsrc_a_char = (vector signed char) { -1, 3, 5, 7, 9, 11, 13, 15, + 17, 19, 21, 23, 25, 27, 29 }; + vsrc_b_char = (vector signed char) { 2, -4, 6, 8, 10, 12, 14, 16, + 18, 20, 22, 24, 26, 28, 30, 32 }; + vsrc_c_uchar = (vector unsigned char) { 0, 0x80, 0, 0x80, 0, 0x80, 0, 0x80, + 0, 0x80, 0, 0x80, 0, 0x80, 0, 0x80 }; + vresult_char = (vector signed char) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_char = (vector signed char) { -1, -4, 5, 8, + 9, 12, 13, 16, + 17, 20, 21, 24, + 25, 28, 29, 32 }; + + vresult_char = vec_blendv (vsrc_a_char, vsrc_b_char, vsrc_c_uchar); + + if (!vec_all_eq (vresult_char, expected_vresult_char)) { +#if DEBUG + printf("ERROR, vec_blendv (vsrc_a_char, vsrc_b_char, vsrc_c_uchar)\n"); + for(i = 0; i < 16; i++) + printf(" vresult_char[%d] = %d, expected_vresult_char[%d] = %d\n", + i, vresult_char[i], i, expected_vresult_char[i]); +#else + abort(); +#endif + } + + vsrc_a_uchar = (vector unsigned char) { 1, 3, 5, 7, 9, 11, 13, 15, + 17, 19, 21, 23, 25, 27, 29 }; + vsrc_b_uchar = (vector unsigned char) { 2, 4, 6, 8, 10, 12, 14, 16, + 18, 20, 22, 24, 26, 28, 30, 32 }; + vsrc_c_uchar = (vector unsigned char) { 0, 0x80, 0, 0x80, 0, 0x80, 0, 0x80, + 0, 0x80, 0, 0x80, 0, 0x80, 0, 0x80 }; + vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_uchar = (vector unsigned char) { 1, 4, 5, 8, + 9, 12, 13, 16, + 17, 20, 21, 24, + 25, 28, 29, 32 }; + + vresult_uchar = vec_blendv (vsrc_a_uchar, vsrc_b_uchar, vsrc_c_uchar); + + if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) { +#if DEBUG + printf("ERROR, vec_blendv (vsrc_a_uchar, vsrc_b_uchar, vsrc_c_uchar)\n"); + for(i = 0; i < 16; i++) + printf(" vresult_uchar[%d] = %d, expected_vresult_uchar[%d] = %d\n", + i, vresult_uchar[i], i, 
expected_vresult_uchar[i]); +#else + abort(); +#endif + } + + vsrc_a_short = (vector signed short) { -1, 3, 5, 7, 9, 11, 13, 15 }; + vsrc_b_short = (vector signed short) { 2, -4, 6, 8, 10, 12, 14, 16 }; + vsrc_c_ushort = (vector unsigned short) { 0, 0x8000, 0, 0x8000, + 0, 0x8000, 0, 0x8000 }; + vresult_short = (vector signed short) { 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_short = (vector signed short) { -1, -4, 5, 8, + 9, 12, 13, 16 }; + + vresult_short = vec_blendv (vsrc_a_short, vsrc_b_short, vsrc_c_ushort); + + if (!vec_all_eq (vresult_short, expected_vresult_short)) { +#if DEBUG + printf("ERROR, vec_blendv (vsrc_a_short, vsrc_b_short, vsrc_c_ushort)\n"); + for(i = 0; i < 8; i++) + printf(" vresult_short[%d] = %d, expected_vresult_short[%d] = %d\n", + i, vresult_short[i], i, expected_vresult_short[i]); +#else + abort(); +#endif + } + + vsrc_a_ushort = (vector unsigned short) { 1, 3, 5, 7, 9, 11, 13, 15 }; + vsrc_b_ushort = (vector unsigned short) { 2, 4, 6, 8, 10, 12, 14, 16 }; + vsrc_c_ushort = (vector unsigned short) { 0, 0x8000, 0, 0x8000, + 0, 0x8000, 0, 0x8000 }; + vresult_ushort = (vector unsigned short) { 0, 0, 0, 0, 0, 0, 0, 0 }; + expected_vresult_ushort = (vector unsigned short) { 1, 4, 5, 8, + 9, 12, 13, 16 }; + + vresult_ushort = vec_blendv (vsrc_a_ushort, vsrc_b_ushort, vsrc_c_ushort); + + if (!vec_all_eq (vresult_ushort, expected_vresult_ushort)) { +#if DEBUG + printf("ERROR, vec_blendv (vsrc_a_ushort, vsrc_b_ushort, vsrc_c_ushort)\n"); + for(i = 0; i < 8; i++) + printf(" vresult_ushort[%d] = %d, expected_vresult_ushort[%d] = %d\n", + i, vresult_ushort[i], i, expected_vresult_ushort[i]); +#else + abort(); +#endif + } + + vsrc_a_int = (vector signed int) { -1, -3, -5, -7 }; + vsrc_b_int = (vector signed int) { 2, 4, 6, 8 }; + vsrc_c_uint = (vector unsigned int) { 0, 0x80000000, 0, 0x80000000}; + vresult_int = (vector signed int) { 0, 0, 0, 0 }; + expected_vresult_int = (vector signed int) { -1, 4, -5, 8 }; + + vresult_int = vec_blendv 
(vsrc_a_int, vsrc_b_int, vsrc_c_uint);
+
+  if (!vec_all_eq (vresult_int, expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_blendv (vsrc_a_int, vsrc_b_int, vsrc_c_uint)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+             i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_uint = (vector unsigned int) { 1, 3, 5, 7 };
+  vsrc_b_uint = (vector unsigned int) { 2, 4, 6, 8 };
+  vsrc_c_uint = (vector unsigned int) { 0, 0x80000000, 0, 0x80000000 };
+  vresult_uint = (vector unsigned int) { 0, 0, 0, 0 };
+  expected_vresult_uint = (vector unsigned int) { 1, 4, 5, 8 };
+
+  vresult_uint = vec_blendv (vsrc_a_uint, vsrc_b_uint, vsrc_c_uint);
+
+  if (!vec_all_eq (vresult_uint, expected_vresult_uint)) {
+#if DEBUG
+    printf("ERROR, vec_blendv (vsrc_a_uint, vsrc_b_uint, vsrc_c_uint)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n",
+             i, vresult_uint[i], i, expected_vresult_uint[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_ll = (vector signed long long int) { -1, -3 };
+  vsrc_b_ll = (vector signed long long int) { 2, 4, };
+  vsrc_c_ull = (vector unsigned long long int) { 0, 0x8000000000000000ULL };
+  vresult_ll = (vector signed long long int) { 0, 0 };
+  expected_vresult_ll = (vector signed long long int) { -1, 4 };
+
+  vresult_ll = vec_blendv (vsrc_a_ll, vsrc_b_ll, vsrc_c_ull);
+
+  if (!vec_all_eq (vresult_ll, expected_vresult_ll)) {
+#if DEBUG
+    printf("ERROR, vec_blendv (vsrc_a_ll, vsrc_b_ll, vsrc_c_ull)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_ll[%d] = %lld, expected_vresult_ll[%d] = %lld\n",
+             i, vresult_ll[i], i, expected_vresult_ll[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_ull = (vector unsigned long long) { 1, 3 };
+  vsrc_b_ull = (vector unsigned long long) { 2, 4 };
+  vsrc_c_ull = (vector unsigned long long int) { 0, 0x8000000000000000ULL };
+  vresult_ull = (vector unsigned long long) { 0, 0 };
+  expected_vresult_ull = (vector
unsigned long long) { 1, 4 };
+
+  vresult_ull = vec_blendv (vsrc_a_ull, vsrc_b_ull, vsrc_c_ull);
+
+  if (!vec_all_eq (vresult_ull, expected_vresult_ull)) {
+#if DEBUG
+    printf("ERROR, vec_blendv (vsrc_a_ull, vsrc_b_ull, vsrc_c_ull)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_ull[%d] = %llu, expected_vresult_ull[%d] = %llu\n",
+             i, vresult_ull[i], i, expected_vresult_ull[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_f = (vector float) { -1.0, -3.0, -5.0, -7.0 };
+  vsrc_b_f = (vector float) { 2.0, 4.0, 6.0, 8.0 };
+  vsrc_c_uint = (vector unsigned int) { 0, 0x80000000, 0, 0x80000000};
+  vresult_f = (vector float) { 0, 0, 0, 0 };
+  expected_vresult_f = (vector float) { -1, 4, -5, 8 };
+
+  vresult_f = vec_blendv (vsrc_a_f, vsrc_b_f, vsrc_c_uint);
+
+  if (!vec_all_eq (vresult_f, expected_vresult_f)) {
+#if DEBUG
+    printf("ERROR, vec_blendv (vsrc_a_f, vsrc_b_f, vsrc_c_uint)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_f[%d] = %f, expected_vresult_f[%d] = %f\n",
+             i, vresult_f[i], i, expected_vresult_f[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_d = (vector double) { -1.0, -3.0 };
+  vsrc_b_d = (vector double) { 2.0, 4.0 };
+  vsrc_c_ull = (vector unsigned long long int) { 0, 0x8000000000000000ULL };
+  vresult_d = (vector double) { 0, 0 };
+  expected_vresult_d = (vector double) { -1, 4 };
+
+  vresult_d = vec_blendv (vsrc_a_d, vsrc_b_d, vsrc_c_ull);
+
+  if (!vec_all_eq (vresult_d, expected_vresult_d)) {
+#if DEBUG
+    printf("ERROR, vec_blendv (vsrc_a_d, vsrc_b_d, vsrc_c_ull)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_d[%d] = %f, expected_vresult_d[%d] = %f\n",
+             i, vresult_d[i], i, expected_vresult_d[i]);
+#else
+    abort();
+#endif
+  }
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {\msplati\M} 6 } } */
+/* { dg-final { scan-assembler-times {\msrdbi\M} 6 } } */
+
+
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-permute-ext-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-permute-ext-runnable.c
new file mode 100644
index 00000000000..f5d223d0530
---
/dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-permute-ext-runnable.c
@@ -0,0 +1,294 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_future_hw } */
+/* { dg-options "-mdejagnu-cpu=future" } */
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#endif
+
+extern void abort (void);
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+  vector signed char vsrc_a_char, vsrc_b_char;
+  vector signed char vresult_char;
+  vector signed char expected_vresult_char;
+
+  vector unsigned char vsrc_a_uchar, vsrc_b_uchar, vsrc_c_uchar;
+  vector unsigned char vresult_uchar;
+  vector unsigned char expected_vresult_uchar;
+
+  vector signed short vsrc_a_short, vsrc_b_short, vsrc_c_short;
+  vector signed short vresult_short;
+  vector signed short expected_vresult_short;
+
+  vector unsigned short vsrc_a_ushort, vsrc_b_ushort, vsrc_c_ushort;
+  vector unsigned short vresult_ushort;
+  vector unsigned short expected_vresult_ushort;
+
+  vector int vsrc_a_int, vsrc_b_int, vsrc_c_int;
+  vector int vresult_int;
+  vector int expected_vresult_int;
+
+  vector unsigned int vsrc_a_uint, vsrc_b_uint, vsrc_c_uint;
+  vector unsigned int vresult_uint;
+  vector unsigned int expected_vresult_uint;
+
+  vector long long int vsrc_a_ll, vsrc_b_ll, vsrc_c_ll;
+  vector long long int vresult_ll;
+  vector long long int expected_vresult_ll;
+
+  vector unsigned long long int vsrc_a_ull, vsrc_b_ull, vsrc_c_ull;
+  vector unsigned long long int vresult_ull;
+  vector unsigned long long int expected_vresult_ull;
+
+  vector float vresult_f;
+  vector float expected_vresult_f;
+  vector float vsrc_a_f, vsrc_b_f;
+
+  vector double vsrc_a_d, vsrc_b_d;
+  vector double vresult_d;
+  vector double expected_vresult_d;
+
+  /* Vector permx */
+  vsrc_a_char = (vector signed char) { -1, 3, 5, 7, 9, 11, 13, 15,
+                                       17, 19, 21, 23, 25, 27, 29, 31 };
+  vsrc_b_char = (vector signed char) { 2, -4, 6, 8, 10, 12, 14, 16,
+                                       18, 20, 22, 24, 26, 28, 30, 32 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x7, 0, 0x5, 0,
0x3, 0, 0x1,
+                                          0, 0x2, 0, 0x4, 0, 0x6, 0, 0x0 };
+  vresult_char = (vector signed char) { 0, 0, 0, 0, 0, 0, 0, 0,
+                                        0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_char = (vector signed char) { -1, 15, -1, 11,
+                                                 -1, 7, -1, 3,
+                                                 -1, 5, -1, 9,
+                                                 -1, 13, -1, -1 };
+
+  vresult_char = vec_permx (vsrc_a_char, vsrc_b_char, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_char, expected_vresult_char)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_char, vsrc_b_char, vsrc_c_uchar)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_char[%d] = %d, expected_vresult_char[%d] = %d\n",
+             i, vresult_char[i], i, expected_vresult_char[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_uchar = (vector unsigned char) { 1, 3, 5, 7, 9, 11, 13, 15,
+                                          17, 19, 21, 23, 25, 27, 29, 31 };
+  vsrc_b_uchar = (vector unsigned char) { 2, 4, 6, 8, 10, 12, 14, 16,
+                                          18, 20, 22, 24, 26, 28, 30, 32 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x7, 0, 0x5, 0, 0x3, 0, 0x1,
+                                          0, 0x2, 0, 0x4, 0, 0x6, 0, 0x0 };
+  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+                                           0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_uchar = (vector unsigned char) { 1, 15, 1, 11,
+                                                    1, 7, 1, 3,
+                                                    1, 5, 1, 9,
+                                                    1, 13, 1, 1 };
+
+  vresult_uchar = vec_permx (vsrc_a_uchar, vsrc_b_uchar, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_uchar, vsrc_b_uchar, vsrc_c_uchar)\n");
+    for(i = 0; i < 16; i++)
+      printf(" vresult_uchar[%d] = %d, expected_vresult_uchar[%d] = %d\n",
+             i, vresult_uchar[i], i, expected_vresult_uchar[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_short = (vector signed short int) { 1, -3, 5, 7, 9, 11, 13, 15 };
+  vsrc_b_short = (vector signed short int) { 2, 4, -6, 8, 10, 12, 14, 16 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x1, 0x2, 0x3,
+                                          0x4, 0x5, 0x2, 0x3,
+                                          0x8, 0x9, 0x2, 0x3,
+                                          0x1E, 0x1F, 0x2, 0x3 };
+  vresult_short = (vector signed short int) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_short = (vector signed short int) {
1, -3, 5, -3,
+                                                       9, -3, 16, -3 };
+
+  vresult_short = vec_permx (vsrc_a_short, vsrc_b_short, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_short, expected_vresult_short)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_short, vsrc_b_short, vsrc_c_uchar)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_short[%d] = %d, expected_vresult_short[%d] = %d\n",
+             i, vresult_short[i], i, expected_vresult_short[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_ushort = (vector unsigned short int) { 1, 3, 5, 7, 9, 11, 13, 15 };
+  vsrc_b_ushort = (vector unsigned short int) { 2, 4, 6, 8, 10, 12, 14, 16 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x1, 0x2, 0x3,
+                                          0x4, 0x5, 0x2, 0x3,
+                                          0x8, 0x9, 0x2, 0x3,
+                                          0x1E, 0x1F, 0x2, 0x3 };
+  vresult_ushort = (vector unsigned short int) { 0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_ushort = (vector unsigned short int) { 1, 3, 5, 3,
+                                                          9, 3, 16, 3 };
+
+  vresult_ushort = vec_permx (vsrc_a_ushort, vsrc_b_ushort, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_ushort, expected_vresult_ushort)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_ushort, vsrc_b_ushort, vsrc_c_uchar)\n");
+    for(i = 0; i < 8; i++)
+      printf(" vresult_ushort[%d] = %d, expected_vresult_ushort[%d] = %d\n",
+             i, vresult_ushort[i], i, expected_vresult_ushort[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_int = (vector signed int) { 1, -3, 5, 7 };
+  vsrc_b_int = (vector signed int) { 2, 4, -6, 8 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x1, 0x2, 0x3,
+                                          0x4, 0x5, 0x6, 0x7,
+                                          0x18, 0x19, 0x1A, 0x1B,
+                                          0x1C, 0x1D, 0x1E, 0x1F };
+  vresult_int = (vector signed int) { 0, 0, 0, 0 };
+  expected_vresult_int = (vector signed int) { 1, -3, -6, 8 };
+
+  vresult_int = vec_permx (vsrc_a_int, vsrc_b_int, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_int, expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_int, vsrc_b_int, vsrc_c_uchar)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+             i, vresult_int[i], i,
expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_uint = (vector unsigned int) { 1, 3, 5, 7 };
+  vsrc_b_uint = (vector unsigned int) { 10, 12, 14, 16 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x1, 0x2, 0x3,
+                                          0x4, 0x5, 0x6, 0x7,
+                                          0x18, 0x19, 0x1A, 0x1B,
+                                          0x1C, 0x1D, 0x1E, 0x1F };
+  vresult_uint = (vector unsigned int) { 0, 0, 0, 0 };
+  expected_vresult_uint = (vector unsigned int) { 1, 3, 14, 16 };
+
+  vresult_uint = vec_permx (vsrc_a_uint, vsrc_b_uint, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_uint, expected_vresult_uint)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_uint, vsrc_b_uint, vsrc_c_uchar)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n",
+             i, vresult_uint[i], i, expected_vresult_uint[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_ll = (vector signed long long int) { 1, -3 };
+  vsrc_b_ll = (vector signed long long int) { 2, -4 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x1, 0x2, 0x3,
+                                          0x4, 0x5, 0x6, 0x7,
+                                          0x18, 0x19, 0x1A, 0x1B,
+                                          0x1C, 0x1D, 0x1E, 0x1F };
+  vresult_ll = (vector signed long long int) { 0, 0};
+  expected_vresult_ll = (vector signed long long int) { 1, -4 };
+
+  vresult_ll = vec_permx (vsrc_a_ll, vsrc_b_ll, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_ll, expected_vresult_ll)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_ll, vsrc_b_ll, vsrc_c_uchar)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_ll[%d] = %lld, expected_vresult_ll[%d] = %lld\n",
+             i, vresult_ll[i], i, expected_vresult_ll[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_ull = (vector unsigned long long int) { 1, 3 };
+  vsrc_b_ull = (vector unsigned long long int) { 10, 12 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x1, 0x2, 0x3,
+                                          0x4, 0x5, 0x6, 0x7,
+                                          0x18, 0x19, 0x1A, 0x1B,
+                                          0x1C, 0x1D, 0x1E, 0x1F };
+  vresult_ull = (vector unsigned long long int) { 0, 0 };
+  expected_vresult_ull = (vector unsigned long long int) { 1, 12 };
+
+  vresult_ull = vec_permx (vsrc_a_ull,
vsrc_b_ull, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_ull, expected_vresult_ull)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_ull, vsrc_b_ull, vsrc_c_uchar)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_ull[%d] = %llu, expected_vresult_ull[%d] = %llu\n",
+             i, vresult_ull[i], i, expected_vresult_ull[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_f = (vector float) { -3.0, 5.0, 7.0, 9.0 };
+  vsrc_b_f = (vector float) { 2.0, 4.0, 6.0, 8.0 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x1, 0x2, 0x3,
+                                          0x4, 0x5, 0x6, 0x7,
+                                          0x18, 0x19, 0x1A, 0x1B,
+                                          0x1C, 0x1D, 0x1E, 0x1F };
+  vresult_f = (vector float) { 0.0, 0.0, 0.0, 0.0 };
+  expected_vresult_f = (vector float) { -3.0, 5.0, 6.0, 8.0 };
+
+  vresult_f = vec_permx (vsrc_a_f, vsrc_b_f, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_f, expected_vresult_f)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_f, vsrc_b_f, vsrc_c_uchar)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_f[%d] = %f, expected_vresult_f[%d] = %f\n",
+             i, vresult_f[i], i, expected_vresult_f[i]);
+#else
+    abort();
+#endif
+  }
+
+  vsrc_a_d = (vector double) { 1.0, -3.0 };
+  vsrc_b_d = (vector double) { 2.0, -4.0 };
+  vsrc_c_uchar = (vector unsigned char) { 0x0, 0x1, 0x2, 0x3,
+                                          0x4, 0x5, 0x6, 0x7,
+                                          0x18, 0x19, 0x1A, 0x1B,
+                                          0x1C, 0x1D, 0x1E, 0x1F };
+  vresult_d = (vector double) { 0.0, 0.0 };
+  expected_vresult_d = (vector double) { 1.0, -4.0 };
+
+  vresult_d = vec_permx (vsrc_a_d, vsrc_b_d, vsrc_c_uchar, 0);
+
+  if (!vec_all_eq (vresult_d, expected_vresult_d)) {
+#if DEBUG
+    printf("ERROR, vec_permx (vsrc_a_d, vsrc_b_d, vsrc_c_uchar)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_d[%d] = %f, expected_vresult_d[%d] = %f\n",
+             i, vresult_d[i], i, expected_vresult_d[i]);
+#else
+    abort();
+#endif
+  }
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermx\M} 6 } } */
+
+
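Reviewer note, not part of the patch: the expected values in the vec_blendv tests are consistent with a per-element select, where the result takes the element from the second source when the most significant bit of the mask element is set and from the first source otherwise. The portable C sketch below models only the byte-element case so the test vectors can be checked by hand without future hardware. The selection rule is an assumption inferred from the test data, and blendv_u8_model is a hypothetical helper name, not something provided by GCC or altivec.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the byte-element blend exercised by the
   vec_blendv tests.  Assumption (inferred from the test vectors, not
   taken from the ISA text): result[i] is b[i] when the most significant
   bit of mask[i] is set, otherwise a[i].  */
void
blendv_u8_model (const uint8_t *a, const uint8_t *b, const uint8_t *mask,
                 uint8_t *result, size_t n)
{
  assert (a != NULL && b != NULL && mask != NULL && result != NULL);

  for (size_t i = 0; i < n; i++)
    result[i] = (mask[i] & 0x80u) ? b[i] : a[i];
}
```

Calling the helper with the first four unsigned char inputs from the test (a = 1, 3, 5, 7; b = 2, 4, 6, 8; mask = 0, 0x80, 0, 0x80) reproduces the first four expected bytes. Wider element types would test the top bit of the wider mask element instead.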