From patchwork Thu Nov 9 08:22:09 2023
X-Patchwork-Submitter: Stefan Schulze Frielinghaus
X-Patchwork-Id: 1861879
From: Stefan Schulze Frielinghaus
To: krebbel@linux.ibm.com, gcc-patches@gcc.gnu.org
Cc: Stefan Schulze Frielinghaus
Subject: [PATCH 1/3] s390: Recognize further vpdi and vmr{l,h} pattern
Date: Thu, 9 Nov 2023 09:22:09 +0100
Message-ID: <20231109082211.2505-1-stefansf@linux.ibm.com>

Deal with cases where vpdi and vmr{l,h} are still applicable if the
operands of those instructions are swapped.  For example, currently for

V2DI
foo (V2DI x)
{
  return (V2DI) {x[1], x[0]};
}

the assembler sequence

        vlgvg   %r1,%v24,1
        vzero   %v0
        vlvgg   %v0,%r1,0
        vmrhg   %v24,%v0,%v24

is emitted.  With this patch a single vpdi is emitted.

Extensive tests are included in a subsequent patch of this series where
more cases are covered.

Bootstrapped and regtested on s390.  Ok for mainline?

gcc/ChangeLog:

        * config/s390/s390.cc (expand_perm_with_merge): Deal with cases
        where vmr{l,h} are still applicable if the operands are swapped.
        (expand_perm_with_vpdi): Likewise for vpdi.
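As an illustration of the operand-swap idea described above, here is a
minimal sketch in C; it is not code from s390.cc.  It assumes the usual
two-input permute convention in which selector indices 0..nelt-1 pick
elements of the first operand and nelt..2*nelt-1 pick elements of the
second.  Under that assumption, exchanging the operands turns index k
into (k + nelt) % (2 * nelt).  The helper names swap_operand_indices and
same_perm are hypothetical, and only the two-element (V2DI) selectors
from this patch are considered.

```c
#include <stddef.h>

/* Hypothetical helper, not part of s390.cc: rewrite a two-input
   permutation selector so that it still names the same elements after
   op0 and op1 have traded places.  Assumes indices 0..nelt-1 refer to
   op0 and nelt..2*nelt-1 refer to op1.  */
static void
swap_operand_indices (const unsigned char *perm, unsigned char *out,
                      size_t nelt)
{
  for (size_t i = 0; i < nelt; i++)
    out[i] = (unsigned char) ((perm[i] + nelt) % (2 * nelt));
}

/* Hypothetical helper: return 1 if both selectors are identical,
   0 otherwise.  */
static int
same_perm (const unsigned char *a, const unsigned char *b, size_t nelt)
{
  for (size_t i = 0; i < nelt; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

For nelt = 2 this maps {0, 2} to {2, 0} and {1, 3} to {3, 1}, which are
the V2DI merge selectors and their *_swap counterparts in the patch, and
it maps {0, 3} to {2, 1} and {1, 2} to {3, 0}, which are the two vpdi
selectors the patch recognizes with swap_operands_p set.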
---
 gcc/config/s390/s390.cc | 118 ++++++++++++++++++++++++++++++----------
 1 file changed, 90 insertions(+), 28 deletions(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 64f56d8effa..185eb59f8b8 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -17532,40 +17532,86 @@ struct expand_vec_perm_d
 static bool
 expand_perm_with_merge (const struct expand_vec_perm_d &d)
 {
-  bool merge_lo_p = true;
-  bool merge_hi_p = true;
-
-  if (d.nelt % 2)
+  static const unsigned char hi_perm_di[2] = {0, 2};
+  static const unsigned char hi_perm_si[4] = {0, 4, 1, 5};
+  static const unsigned char hi_perm_hi[8] = {0, 8, 1, 9, 2, 10, 3, 11};
+  static const unsigned char hi_perm_qi[16]
+    = {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23};
+
+  static const unsigned char hi_perm_di_swap[2] = {2, 0};
+  static const unsigned char hi_perm_si_swap[4] = {4, 0, 6, 2};
+  static const unsigned char hi_perm_hi_swap[8] = {8, 0, 10, 2, 12, 4, 14, 6};
+  static const unsigned char hi_perm_qi_swap[16]
+    = {16, 0, 18, 2, 20, 4, 22, 6, 24, 8, 26, 10, 28, 12, 30, 14};
+
+  static const unsigned char lo_perm_di[2] = {1, 3};
+  static const unsigned char lo_perm_si[4] = {2, 6, 3, 7};
+  static const unsigned char lo_perm_hi[8] = {4, 12, 5, 13, 6, 14, 7, 15};
+  static const unsigned char lo_perm_qi[16]
+    = {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31};
+
+  static const unsigned char lo_perm_di_swap[2] = {3, 1};
+  static const unsigned char lo_perm_si_swap[4] = {5, 1, 7, 3};
+  static const unsigned char lo_perm_hi_swap[8] = {9, 1, 11, 3, 13, 5, 15, 7};
+  static const unsigned char lo_perm_qi_swap[16]
+    = {17, 1, 19, 3, 21, 5, 23, 7, 25, 9, 27, 11, 29, 13, 31, 15};
+
+  bool merge_lo_p = false;
+  bool merge_hi_p = false;
+  bool swap_operands_p = false;
+
+  if ((d.nelt == 2 && memcmp (d.perm, hi_perm_di, 2) == 0)
+      || (d.nelt == 4 && memcmp (d.perm, hi_perm_si, 4) == 0)
+      || (d.nelt == 8 && memcmp (d.perm, hi_perm_hi, 8) == 0)
+      || (d.nelt == 16 && memcmp (d.perm, hi_perm_qi, 16) == 0))
+    {
+      merge_hi_p = true;
+    }
+  else if ((d.nelt == 2 && memcmp (d.perm, hi_perm_di_swap, 2) == 0)
+           || (d.nelt == 4 && memcmp (d.perm, hi_perm_si_swap, 4) == 0)
+           || (d.nelt == 8 && memcmp (d.perm, hi_perm_hi_swap, 8) == 0)
+           || (d.nelt == 16 && memcmp (d.perm, hi_perm_qi_swap, 16) == 0))
+    {
+      merge_hi_p = true;
+      swap_operands_p = true;
+    }
+  else if ((d.nelt == 2 && memcmp (d.perm, lo_perm_di, 2) == 0)
+           || (d.nelt == 4 && memcmp (d.perm, lo_perm_si, 4) == 0)
+           || (d.nelt == 8 && memcmp (d.perm, lo_perm_hi, 8) == 0)
+           || (d.nelt == 16 && memcmp (d.perm, lo_perm_qi, 16) == 0))
+    {
+      merge_lo_p = true;
+    }
+  else if ((d.nelt == 2 && memcmp (d.perm, lo_perm_di_swap, 2) == 0)
+           || (d.nelt == 4 && memcmp (d.perm, lo_perm_si_swap, 4) == 0)
+           || (d.nelt == 8 && memcmp (d.perm, lo_perm_hi_swap, 8) == 0)
+           || (d.nelt == 16 && memcmp (d.perm, lo_perm_qi_swap, 16) == 0))
+    {
+      merge_lo_p = true;
+      swap_operands_p = true;
+    }
+
+  if (!merge_lo_p && !merge_hi_p)
     return false;
 
-  // For V4SI this checks for: { 0, 4, 1, 5 }
-  for (int telt = 0; telt < d.nelt; telt++)
-    if (d.perm[telt] != telt / 2 + (telt % 2) * d.nelt)
-      {
-        merge_hi_p = false;
-        break;
-      }
+  if (d.testing_p)
+    return merge_lo_p || merge_hi_p;
 
-  if (!merge_hi_p)
+  rtx op0, op1;
+  if (swap_operands_p)
     {
-      // For V4SI this checks for: { 2, 6, 3, 7 }
-      for (int telt = 0; telt < d.nelt; telt++)
-        if (d.perm[telt] != (telt + d.nelt) / 2 + (telt % 2) * d.nelt)
-          {
-            merge_lo_p = false;
-            break;
-          }
+      op0 = d.op1;
+      op1 = d.op0;
     }
   else
-    merge_lo_p = false;
-
-  if (d.testing_p)
-    return merge_lo_p || merge_hi_p;
+    {
+      op0 = d.op0;
+      op1 = d.op1;
+    }
 
-  if (merge_lo_p || merge_hi_p)
-    s390_expand_merge (d.target, d.op0, d.op1, merge_hi_p);
+  s390_expand_merge (d.target, op0, op1, merge_hi_p);
 
-  return merge_lo_p || merge_hi_p;
+  return true;
 }
 
 /* Try to expand the vector permute operation described by D using the
@@ -17582,6 +17628,7 @@ expand_perm_with_vpdi (const struct expand_vec_perm_d &d)
 {
   bool vpdi1_p = false;
   bool vpdi4_p = false;
+  bool swap_operands_p = false;
   rtx op0_reg, op1_reg;
 
   // Only V2DI and V2DF are supported here.
@@ -17590,11 +17637,20 @@ expand_perm_with_vpdi (const struct expand_vec_perm_d &d)
 
   if (d.perm[0] == 0 && d.perm[1] == 3)
     vpdi1_p = true;
-
-  if ((d.perm[0] == 1 && d.perm[1] == 2)
+  else if (d.perm[0] == 2 && d.perm[1] == 1)
+    {
+      vpdi1_p = true;
+      swap_operands_p = true;
+    }
+  else if ((d.perm[0] == 1 && d.perm[1] == 2)
       || (d.perm[0] == 1 && d.perm[1] == 0)
       || (d.perm[0] == 3 && d.perm[1] == 2))
     vpdi4_p = true;
+  else if (d.perm[0] == 3 && d.perm[1] == 0)
+    {
+      vpdi4_p = true;
+      swap_operands_p = true;
+    }
 
   if (!vpdi1_p && !vpdi4_p)
     return false;
@@ -17611,6 +17667,12 @@ expand_perm_with_vpdi (const struct expand_vec_perm_d &d)
     op1_reg = op0_reg;
   else if (d.only_op1)
     op0_reg = op1_reg;
+  else if (swap_operands_p)
+    {
+      rtx tmp = op0_reg;
+      op0_reg = op1_reg;
+      op1_reg = tmp;
+    }
 
   if (vpdi1_p)
     emit_insn (gen_vpdi1 (d.vmode, d.target, op0_reg, op1_reg));

From patchwork Thu Nov 9 08:22:10 2023
X-Patchwork-Submitter: Stefan Schulze Frielinghaus
X-Patchwork-Id: 1861883
From: Stefan Schulze Frielinghaus
To: krebbel@linux.ibm.com, gcc-patches@gcc.gnu.org
Cc: Stefan Schulze Frielinghaus
Subject: [PATCH 2/3] s390: Add expand_perm_reverse_elements
Date: Thu, 9 Nov 2023 09:22:10 +0100
Message-ID: <20231109082211.2505-2-stefansf@linux.ibm.com>
In-Reply-To: <20231109082211.2505-1-stefansf@linux.ibm.com>
References: <20231109082211.2505-1-stefansf@linux.ibm.com>

Replace expand_perm_with_rot, expand_perm_with_vster, and
expand_perm_with_vstbrq with a general implementation
expand_perm_reverse_elements.

Bootstrapped and regtested on s390.  Ok for mainline?

gcc/ChangeLog:

        * config/s390/s390.cc (expand_perm_with_rot): Remove.
        (expand_perm_reverse_elements): New.
        (expand_perm_with_vster): Remove.
        (expand_perm_with_vstbrq): Remove.
        (vectorize_vec_perm_const_1): Replace removed functions with new
        one.
---
 gcc/config/s390/s390.cc | 88 ++++++++---------------------------------
 1 file changed, 16 insertions(+), 72 deletions(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 185eb59f8b8..e36efec8ddc 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -17693,78 +17693,28 @@ is_reverse_perm_mask (const struct expand_vec_perm_d &d)
   return true;
 }
 
-/* The case of reversing a four-element vector [0, 1, 2, 3]
-   can be handled by first permuting the doublewords
-   [2, 3, 0, 1] and subsequently rotating them by 32 bits.  */
 static bool
-expand_perm_with_rot (const struct expand_vec_perm_d &d)
+expand_perm_reverse_elements (const struct expand_vec_perm_d &d)
 {
-  if (d.nelt != 4)
+  if (d.op0 != d.op1 || !is_reverse_perm_mask (d))
     return false;
 
-  if (d.op0 == d.op1 && is_reverse_perm_mask (d))
-    {
-      if (d.testing_p)
-        return true;
-
-      rtx tmp = gen_reg_rtx (d.vmode);
-      rtx op0_reg = force_reg (GET_MODE (d.op0), d.op0);
-
-      emit_insn (gen_vpdi4_2 (d.vmode, tmp, op0_reg, op0_reg));
-      if (d.vmode == V4SImode)
-        emit_insn (gen_rotlv4si3_di (d.target, tmp));
-      else if (d.vmode == V4SFmode)
-        emit_insn (gen_rotlv4sf3_di (d.target, tmp));
-
-      return true;
-    }
-
-  return false;
-}
+  if (d.testing_p)
+    return true;
 
-/* If we just reverse the elements, emit an eltswap if we have
-   vler/vster.  */
-static bool
-expand_perm_with_vster (const struct expand_vec_perm_d &d)
-{
-  if (TARGET_VXE2 && d.op0 == d.op1 && is_reverse_perm_mask (d)
-      && (d.vmode == V2DImode || d.vmode == V2DFmode
-          || d.vmode == V4SImode || d.vmode == V4SFmode
-          || d.vmode == V8HImode))
+  switch (d.vmode)
     {
-      if (d.testing_p)
-        return true;
-
-      if (d.vmode == V2DImode)
-        emit_insn (gen_eltswapv2di (d.target, d.op0));
-      else if (d.vmode == V2DFmode)
-        emit_insn (gen_eltswapv2df (d.target, d.op0));
-      else if (d.vmode == V4SImode)
-        emit_insn (gen_eltswapv4si (d.target, d.op0));
-      else if (d.vmode == V4SFmode)
-        emit_insn (gen_eltswapv4sf (d.target, d.op0));
-      else if (d.vmode == V8HImode)
-        emit_insn (gen_eltswapv8hi (d.target, d.op0));
-      return true;
+    case V1TImode: emit_move_insn (d.target, d.op0); break;
+    case V2DImode: emit_insn (gen_eltswapv2di (d.target, d.op0)); break;
+    case V4SImode: emit_insn (gen_eltswapv4si (d.target, d.op0)); break;
+    case V8HImode: emit_insn (gen_eltswapv8hi (d.target, d.op0)); break;
+    case V16QImode: emit_insn (gen_eltswapv16qi (d.target, d.op0)); break;
+    case V2DFmode: emit_insn (gen_eltswapv2df (d.target, d.op0)); break;
+    case V4SFmode: emit_insn (gen_eltswapv4sf (d.target, d.op0)); break;
+    default: gcc_unreachable();
     }
-  return false;
-}
 
-/* If we reverse a byte-vector this is the same as
-   byte reversing it which can be done with vstbrq.  */
-static bool
-expand_perm_with_vstbrq (const struct expand_vec_perm_d &d)
-{
-  if (TARGET_VXE2 && d.op0 == d.op1 && is_reverse_perm_mask (d)
-      && d.vmode == V16QImode)
-    {
-      if (d.testing_p)
-        return true;
-
-      emit_insn (gen_eltswapv16qi (d.target, d.op0));
-      return true;
-    }
-  return false;
+  return true;
 }
 
 /* Try to emit vlbr/vstbr.  Note, this is only a candidate insn since
@@ -17826,21 +17776,15 @@ expand_perm_as_a_vlbr_vstbr_candidate (const struct expand_vec_perm_d &d)
 static bool
 vectorize_vec_perm_const_1 (const struct expand_vec_perm_d &d)
 {
-  if (expand_perm_with_merge (d))
-    return true;
-
-  if (expand_perm_with_vster (d))
+  if (expand_perm_reverse_elements (d))
     return true;
 
-  if (expand_perm_with_vstbrq (d))
+  if (expand_perm_with_merge (d))
     return true;
 
   if (expand_perm_with_vpdi (d))
     return true;
 
-  if (expand_perm_with_rot (d))
-    return true;
-
   if (expand_perm_as_a_vlbr_vstbr_candidate (d))
     return true;
 

From patchwork Thu Nov 9 08:22:11 2023
X-Patchwork-Submitter: Stefan Schulze Frielinghaus
X-Patchwork-Id: 1861880
From: Stefan Schulze Frielinghaus
To: krebbel@linux.ibm.com, gcc-patches@gcc.gnu.org
Cc: Stefan Schulze Frielinghaus
Subject: [PATCH 3/3] s390: Revise vector reverse elements
Date: Thu, 9 Nov 2023 09:22:11 +0100
Message-ID: <20231109082211.2505-3-stefansf@linux.ibm.com>
In-Reply-To: <20231109082211.2505-1-stefansf@linux.ibm.com>
References: <20231109082211.2505-1-stefansf@linux.ibm.com>

Replace UNSPEC_VEC_ELTSWAP with a vec_select implementation.

Furthermore, for a vector reverse elements operation between registers
of mode V8HI perform three rotates instead of a vperm operation since
the latter involves loading the permutation vector from the literal
pool.

Prior to z15, instead of
  larl + vl + vl + vperm
prefer
  vl + vpdi (+ verllg (+ verllf))
for a load operation.

Likewise, prior to z15, instead of
  larl + vl + vperm + vst
prefer
  vpdi (+ verllg (+ verllf)) + vst
for a store operation.

Bootstrapped and regtested on s390.  Ok for mainline?

gcc/ChangeLog:

        * config/s390/s390.md: Remove UNSPEC_VEC_ELTSWAP.
        * config/s390/vector.md (eltswapv16qi): New expander.
        (*eltswapv16qi): New insn and splitter.
        (eltswapv8hi): New insn and splitter.
        (eltswap): New insn and splitter for modes V_HW_4 as well as
        V_HW_2.
        * config/s390/vx-builtins.md (eltswap): Remove.
        (*eltswapv16qi): Remove.
        (*eltswap): Remove.
        (*eltswap_emu): Remove.

gcc/testsuite/ChangeLog:

        * gcc.target/s390/zvector/vec-reve-load-halfword-z14.c: Remove
        vperm and substitute by vpdi et al.
        * gcc.target/s390/zvector/vec-reve-load-halfword.c: Likewise.
        * gcc.target/s390/vector/reverse-elements-1.c: New test.
        * gcc.target/s390/vector/reverse-elements-2.c: New test.
        * gcc.target/s390/vector/reverse-elements-3.c: New test.
        * gcc.target/s390/vector/reverse-elements-4.c: New test.
        * gcc.target/s390/vector/reverse-elements-5.c: New test.
        * gcc.target/s390/vector/reverse-elements-6.c: New test.
        * gcc.target/s390/vector/reverse-elements-7.c: New test.
---
 gcc/config/s390/s390.md                       |   2 -
 gcc/config/s390/vector.md                     | 146 ++++++++++++++++++
 gcc/config/s390/vx-builtins.md                | 143 -----------------
 .../s390/vector/reverse-elements-1.c          |  46 ++++++
 .../s390/vector/reverse-elements-2.c          |  16 ++
 .../s390/vector/reverse-elements-3.c          |  56 +++++++
 .../s390/vector/reverse-elements-4.c          |  67 ++++++++
 .../s390/vector/reverse-elements-5.c          |  56 +++++++
 .../s390/vector/reverse-elements-6.c          |  67 ++++++++
 .../s390/vector/reverse-elements-7.c          |  67 ++++++++
 .../s390/zvector/vec-reve-load-halfword-z14.c |   4 +-
 .../s390/zvector/vec-reve-load-halfword.c     |   4 +-
 12 files changed, 527 insertions(+), 147 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-2.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-3.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-4.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-5.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-6.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-7.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 3f29ba21442..f5e559c1ba4 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -241,8 +241,6 @@
    UNSPEC_VEC_VFMIN
    UNSPEC_VEC_VFMAX
 
-   UNSPEC_VEC_ELTSWAP
-
    UNSPEC_NNPA_VCLFNHS_V8HI
    UNSPEC_NNPA_VCLFNLS_V8HI
    UNSPEC_NNPA_VCRNFS_V8HI
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 7d1eb36e844..c478fce09df 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -948,6 +948,152 @@
   operands[5] = simplify_gen_subreg (DFmode, operands[1], TFmode, 8);
 })
 
+;; VECTOR REVERSE ELEMENTS V16QI
+
+(define_expand "eltswapv16qi"
+  [(parallel
+    [(set (match_operand:V16QI 0 "nonimmediate_operand")
+          (vec_select:V16QI
+           (match_operand:V16QI 1 "nonimmediate_operand")
+           (match_dup 2)))
+     (use (match_dup 3))])]
+  "TARGET_VX"
+{
+  rtvec vec = rtvec_alloc (16);
+  for (int i = 0; i < 16; ++i)
+    RTVEC_ELT (vec, i) = GEN_INT (15 - i);
+  operands[2] = gen_rtx_PARALLEL (VOIDmode, vec);
+  operands[3] = gen_rtx_CONST_VECTOR (V16QImode, vec);
+})
+
+(define_insn_and_split "*eltswapv16qi"
+  [(set (match_operand:V16QI 0 "nonimmediate_operand" "=v,^R,^v")
+        (vec_select:V16QI
+         (match_operand:V16QI 1 "nonimmediate_operand" "v,^v,^R")
+         (parallel [(const_int 15)
+                    (const_int 14)
+                    (const_int 13)
+                    (const_int 12)
+                    (const_int 11)
+                    (const_int 10)
+                    (const_int 9)
+                    (const_int 8)
+                    (const_int 7)
+                    (const_int 6)
+                    (const_int 5)
+                    (const_int 4)
+                    (const_int 3)
+                    (const_int 2)
+                    (const_int 1)
+                    (const_int 0)])))
+   (use (match_operand:V16QI 2 "permute_pattern_operand" "v,X,X"))]
+  "TARGET_VX"
+  "@
+   #
+   vstbrq\t%v1,%0
+   vlbrq\t%v0,%1"
+  "&& reload_completed && REG_P (operands[0]) && REG_P (operands[1])"
+  [(set (match_dup 0)
+        (unspec:V16QI [(match_dup 1)
+                       (match_dup 1)
+                       (match_dup 2)]
+                      UNSPEC_VEC_PERM))]
+  ""
+  [(set_attr "cpu_facility" "*,vxe2,vxe2")
+   (set_attr "op_type" "*,VRX,VRX")])
+
+;; VECTOR REVERSE ELEMENTS V8HI
+
+(define_insn_and_split "eltswapv8hi"
+  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=v,R,v")
+        (vec_select:V8HI
+         (match_operand:V8HI 1 "nonimmediate_operand"
"v,v,R") + (parallel [(const_int 7) + (const_int 6) + (const_int 5) + (const_int 4) + (const_int 3) + (const_int 2) + (const_int 1) + (const_int 0)]))) + (clobber (match_scratch:V2DI 2 "=&v,X,X")) + (clobber (match_scratch:V4SI 3 "=&v,X,X"))] + "TARGET_VX" + "@ + # + vsterh\t%v1,%0 + vlerh\t%v0,%1" + "&& reload_completed && REG_P (operands[0]) && REG_P (operands[1])" + [(set (match_dup 2) + (subreg:V2DI (match_dup 1) 0)) + (set (match_dup 2) + (vec_select:V2DI + (match_dup 2) + (parallel [(const_int 1) (const_int 0)]))) + (set (match_dup 2) + (rotate:V2DI + (match_dup 2) + (const_int 32))) + (set (match_dup 3) + (subreg:V4SI (match_dup 2) 0)) + (set (match_dup 3) + (rotate:V4SI + (match_dup 3) + (const_int 16))) + (set (match_dup 0) + (subreg:V8HI (match_dup 3) 0))] + "" + [(set_attr "cpu_facility" "*,vxe2,vxe2") + (set_attr "op_type" "*,VRX,VRX")]) + +;; VECTOR REVERSE ELEMENTS V4SI / V4SF + +(define_insn_and_split "eltswap" + [(set (match_operand:V_HW_4 0 "nonimmediate_operand" "=v,R,v") + (vec_select:V_HW_4 + (match_operand:V_HW_4 1 "nonimmediate_operand" "v,v,R") + (parallel [(const_int 3) + (const_int 2) + (const_int 1) + (const_int 0)]))) + (clobber (match_scratch:V2DI 2 "=&v,X,X"))] + "TARGET_VX" + "@ + # + vsterf\t%v1,%0 + vlerf\t%v0,%1" + "&& reload_completed && REG_P (operands[0]) && REG_P (operands[1])" + [(set (match_dup 2) + (subreg:V2DI (match_dup 1) 0)) + (set (match_dup 2) + (vec_select:V2DI + (match_dup 2) + (parallel [(const_int 1) (const_int 0)]))) + (set (match_dup 2) + (rotate:V2DI + (match_dup 2) + (const_int 32))) + (set (match_dup 0) + (subreg:V_HW_4 (match_dup 2) 0))] + "" + [(set_attr "cpu_facility" "*,vxe2,vxe2") + (set_attr "op_type" "*,VRX,VRX")]) + +;; VECTOR REVERSE ELEMENTS V2DI / V2DF + +(define_insn "eltswap" + [(set (match_operand:V_HW_2 0 "nonimmediate_operand" "=v,R,v") + (vec_select:V_HW_2 + (match_operand:V_HW_2 1 "nonimmediate_operand" "v,v,R") + (parallel [(const_int 1) + (const_int 0)])))] + "TARGET_VX" + "@ + 
vpdi\t%v0,%v1,%v1,4 + vsterg\t%v1,%0 + vlerg\t%v0,%1" + [(set_attr "cpu_facility" "vx,vxe2,vxe2") + (set_attr "op_type" "VRR,VRX,VRX")]) ;; ;; Vector integer arithmetic instructions diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md index 10eae76777f..6f42c91e8ae 100644 --- a/gcc/config/s390/vx-builtins.md +++ b/gcc/config/s390/vx-builtins.md @@ -2163,149 +2163,6 @@ "fmaxb\t%v0,%v1,%v2,%b3" [(set_attr "op_type" "VRR")]) -; The element reversal builtins introduced with z15 have been made -; available also for older CPUs down to z13. -(define_expand "eltswap" - [(set (match_operand:VEC_HW 0 "nonimmediate_operand" "") - (unspec:VEC_HW [(match_operand:VEC_HW 1 "nonimmediate_operand" "")] - UNSPEC_VEC_ELTSWAP))] - "TARGET_VX") - -; The byte element reversal is implemented as 128 bit byte swap. -; Alternatively this could be emitted as bswap:V1TI but the required -; subregs appear to confuse combine. -(define_insn "*eltswapv16qi" - [(set (match_operand:V16QI 0 "nonimmediate_operand" "=v,v,R") - (unspec:V16QI [(match_operand:V16QI 1 "nonimmediate_operand" "v,R,v")] - UNSPEC_VEC_ELTSWAP))] - "TARGET_VXE2" - "@ - # - vlbrq\t%v0,%v1 - vstbrq\t%v1,%v0" - [(set_attr "op_type" "*,VRX,VRX")]) - -; vlerh, vlerf, vlerg, vsterh, vsterf, vsterg -(define_insn "*eltswap" - [(set (match_operand:V_HW_HSD 0 "nonimmediate_operand" "=v,v,R") - (unspec:V_HW_HSD [(match_operand:V_HW_HSD 1 "nonimmediate_operand" "v,R,v")] - UNSPEC_VEC_ELTSWAP))] - "TARGET_VXE2" - "@ - # - vler\t%v0,%v1 - vster\t%v1,%v0" - [(set_attr "op_type" "*,VRX,VRX")]) - -; The emulation pattern below will also accept -; vst (eltswap (vl)) -; i.e. both operands in memory, which reload needs to fix. -; Split into -; vl -; vster (=vst (eltswap)) -; since we prefer vster over vler as long as the latter -; does not support alignment hints. 
-(define_split - [(set (match_operand:VEC_HW 0 "memory_operand" "") - (unspec:VEC_HW [(match_operand:VEC_HW 1 "memory_operand" "")] - UNSPEC_VEC_ELTSWAP))] - "TARGET_VXE2 && can_create_pseudo_p ()" - [(set (match_dup 2) (match_dup 1)) - (set (match_dup 0) - (unspec:VEC_HW [(match_dup 2)] UNSPEC_VEC_ELTSWAP))] -{ - operands[2] = gen_reg_rtx (mode); -}) - - -; Swapping v2df/v2di can be done via vpdi on z13 and z14. -(define_split - [(set (match_operand:V_HW_2 0 "register_operand" "") - (unspec:V_HW_2 [(match_operand:V_HW_2 1 "register_operand" "")] - UNSPEC_VEC_ELTSWAP))] - "TARGET_VX && can_create_pseudo_p ()" - [(set (match_operand:V_HW_2 0 "register_operand" "=v") - (vec_select:V_HW_2 - (vec_concat: - (match_operand:V_HW_2 1 "register_operand" "v") - (match_dup 1)) - (parallel [(const_int 1) (const_int 2)])))] -) - - -; Swapping v4df/v4si can be done via vpdi and rot. -(define_split - [(set (match_operand:V_HW_4 0 "register_operand" "") - (unspec:V_HW_4 [(match_operand:V_HW_4 1 "register_operand" "")] - UNSPEC_VEC_ELTSWAP))] - "TARGET_VX && can_create_pseudo_p ()" - [(set (match_dup 2) - (vec_select:V_HW_4 - (vec_concat: - (match_dup 1) - (match_dup 1)) - (parallel [(const_int 2) (const_int 3) (const_int 4) (const_int 5)]))) - (set (match_dup 3) - (subreg:V2DI (match_dup 2) 0)) - (set (match_dup 4) - (rotate:V2DI - (match_dup 3) - (const_int 32))) - (set (match_operand:V_HW_4 0) - (subreg:V_HW_4 (match_dup 4) 0))] -{ - operands[2] = gen_reg_rtx (mode); - operands[3] = gen_reg_rtx (V2DImode); - operands[4] = gen_reg_rtx (V2DImode); -}) - -; z15 has instructions for doing element reversal from mem to reg -; or the other way around. For reg to reg or on pre z15 machines -; we have to emulate it with vector permute. 
-(define_insn_and_split "*eltswap_emu" - [(set (match_operand:VEC_HW 0 "nonimmediate_operand" "=vR") - (unspec:VEC_HW [(match_operand:VEC_HW 1 "nonimmediate_operand" "vR")] - UNSPEC_VEC_ELTSWAP))] - "TARGET_VX && can_create_pseudo_p ()" - "#" - "&& ((!memory_operand (operands[0], mode) - && !memory_operand (operands[1], mode)) - || !TARGET_VXE2)" - [(set (match_dup 3) - (unspec:V16QI [(match_dup 4) - (match_dup 4) - (match_dup 2)] - UNSPEC_VEC_PERM)) - (set (match_dup 0) (subreg:VEC_HW (match_dup 3) 0))] -{ - static char p[4][16] = - { { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }, /* Q */ - { 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1 }, /* H */ - { 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3 }, /* S */ - { 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7 } }; /* D */ - char *perm; - rtx perm_rtx[16], constv; - - switch (GET_MODE_SIZE (GET_MODE_INNER (mode))) - { - case 1: perm = p[0]; break; - case 2: perm = p[1]; break; - case 4: perm = p[2]; break; - case 8: perm = p[3]; break; - default: gcc_unreachable (); - } - - for (int i = 0; i < 16; i++) - perm_rtx[i] = GEN_INT (perm[i]); - - operands[1] = force_reg (mode, operands[1]); - operands[2] = gen_reg_rtx (V16QImode); - operands[3] = gen_reg_rtx (V16QImode); - operands[4] = simplify_gen_subreg (V16QImode, operands[1], mode, 0); - constv = force_const_mem (V16QImode, gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm_rtx))); - emit_move_insn (operands[2], constv); -}) - ; vec_insert (__builtin_bswap32 (*a), b, 1) set-element-bswap-2.c ; b[1] = __builtin_bswap32 (*a) set-element-bswap-3.c ; vlebrh, vlebrf, vlebrg diff --git a/gcc/testsuite/gcc.target/s390/vector/reverse-elements-1.c b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-1.c new file mode 100644 index 00000000000..4a2541b7ae6 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-1.c @@ -0,0 +1,46 @@ +/* { dg-compile } */ +/* { dg-options "-O3 -mzarch -march=z13" } */ +/* { 
dg-require-effective-target s390_vx } */ +/* { dg-final { scan-assembler-times {\tvpdi\t} 4 } } */ +/* { dg-final { scan-assembler-not {\tvperm\t} } } */ + +typedef short __attribute__ ((vector_size (16))) V8HI; +typedef int __attribute__ ((vector_size (16))) V4SI; +typedef long long __attribute__ ((vector_size (16))) V2DI; +typedef double __attribute__ ((vector_size (16))) V2DF; + +V8HI +v8hi (V8HI x) +{ + V8HI y; + for (int i = 0; i < 8; ++i) + y[i] = x[7 - i]; + return y; +} + +V4SI +v4si (V4SI x) +{ + V4SI y; + for (int i = 0; i < 4; ++i) + y[i] = x[3 - i]; + return y; +} + +V2DI +v2di (V2DI x) +{ + V2DI y; + for (int i = 0; i < 2; ++i) + y[i] = x[1 - i]; + return y; +} + +V2DF +v2df (V2DF x) +{ + V2DF y; + for (int i = 0; i < 2; ++i) + y[i] = x[1 - i]; + return y; +} diff --git a/gcc/testsuite/gcc.target/s390/vector/reverse-elements-2.c b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-2.c new file mode 100644 index 00000000000..ec0d1da7d57 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-2.c @@ -0,0 +1,16 @@ +/* { dg-compile } */ +/* { dg-options "-O3 -mzarch -march=z14" } */ +/* { dg-require-effective-target s390_vxe } */ +/* { dg-final { scan-assembler-times {\tvpdi\t} 1 } } */ +/* { dg-final { scan-assembler-not {\tvperm\t} } } */ + +typedef float __attribute__ ((vector_size (16))) V4SF; + +V4SF +v4sf (V4SF x) +{ + V4SF y; + for (int i = 0; i < 4; ++i) + y[i] = x[3 - i]; + return y; +} diff --git a/gcc/testsuite/gcc.target/s390/vector/reverse-elements-3.c b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-3.c new file mode 100644 index 00000000000..3f69db8831c --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-3.c @@ -0,0 +1,56 @@ +/* { dg-compile } */ +/* { dg-options "-O3 -mzarch -march=z14" } */ +/* { dg-require-effective-target s390_vxe } */ +/* { dg-final { scan-assembler-times {\tvpdi\t} 5 } } */ +/* { dg-final { scan-assembler-not {\tvperm\t} } } */ + +typedef short __attribute__ 
((vector_size (16))) V8HI; +typedef int __attribute__ ((vector_size (16))) V4SI; +typedef long long __attribute__ ((vector_size (16))) V2DI; +typedef float __attribute__ ((vector_size (16))) V4SF; +typedef double __attribute__ ((vector_size (16))) V2DF; + +V8HI +v8hi (V8HI *x) +{ + V8HI y; + for (int i = 0; i < 8; ++i) + y[i] = (*x)[7 - i]; + return y; +} + +V4SI +v4si (V4SI *x) +{ + V4SI y; + for (int i = 0; i < 4; ++i) + y[i] = (*x)[3 - i]; + return y; +} + +V2DI +v2di (V2DI *x) +{ + V2DI y; + for (int i = 0; i < 2; ++i) + y[i] = (*x)[1 - i]; + return y; +} + +V4SF +v4sf (V4SF *x) +{ + V4SF y; + for (int i = 0; i < 4; ++i) + y[i] = (*x)[3 - i]; + return y; +} + +V2DF +v2df (V2DF *x) +{ + V2DF y; + for (int i = 0; i < 2; ++i) + y[i] = (*x)[1 - i]; + return y; +} diff --git a/gcc/testsuite/gcc.target/s390/vector/reverse-elements-4.c b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-4.c new file mode 100644 index 00000000000..5027ed55f50 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-4.c @@ -0,0 +1,67 @@ +/* { dg-compile } */ +/* { dg-options "-O3 -mzarch -march=z15" } */ +/* { dg-require-effective-target s390_vxe2 } */ +/* { dg-final { scan-assembler-times {\tvlbrq\t} 1 } } */ +/* { dg-final { scan-assembler-times {\tvler[hfg]\t} 5 } } */ +/* { dg-final { scan-assembler-not {\tvperm\t} } } */ + +typedef signed char __attribute__ ((vector_size (16))) V16QI; +typedef short __attribute__ ((vector_size (16))) V8HI; +typedef int __attribute__ ((vector_size (16))) V4SI; +typedef long long __attribute__ ((vector_size (16))) V2DI; +typedef float __attribute__ ((vector_size (16))) V4SF; +typedef double __attribute__ ((vector_size (16))) V2DF; + +V16QI +v16qi (V16QI *x) +{ + V16QI y; + for (int i = 0; i < 16; ++i) + y[i] = (*x)[15 - i]; + return y; +} + +V8HI +v8hi (V8HI *x) +{ + V8HI y; + for (int i = 0; i < 8; ++i) + y[i] = (*x)[7 - i]; + return y; +} + +V4SI +v4si (V4SI *x) +{ + V4SI y; + for (int i = 0; i < 4; ++i) + y[i] = (*x)[3 - i]; + 
return y; +} + +V2DI +v2di (V2DI *x) +{ + V2DI y; + for (int i = 0; i < 2; ++i) + y[i] = (*x)[1 - i]; + return y; +} + +V4SF +v4sf (V4SF *x) +{ + V4SF y; + for (int i = 0; i < 4; ++i) + y[i] = (*x)[3 - i]; + return y; +} + +V2DF +v2df (V2DF *x) +{ + V2DF y; + for (int i = 0; i < 2; ++i) + y[i] = (*x)[1 - i]; + return y; +} diff --git a/gcc/testsuite/gcc.target/s390/vector/reverse-elements-5.c b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-5.c new file mode 100644 index 00000000000..8c250aa681b --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-5.c @@ -0,0 +1,56 @@ +/* { dg-compile } */ +/* { dg-options "-O3 -mzarch -march=z14" } */ +/* { dg-require-effective-target s390_vxe } */ +/* { dg-final { scan-assembler-times {\tvpdi\t} 5 } } */ +/* { dg-final { scan-assembler-not {\tvperm\t} } } */ + +typedef short __attribute__ ((vector_size (16))) V8HI; +typedef int __attribute__ ((vector_size (16))) V4SI; +typedef long long __attribute__ ((vector_size (16))) V2DI; +typedef float __attribute__ ((vector_size (16))) V4SF; +typedef double __attribute__ ((vector_size (16))) V2DF; + +void +v8hi (V8HI *x, V8HI y) +{ + V8HI z; + for (int i = 0; i < 8; ++i) + z[i] = y[7 - i]; + *x = z; +} + +void +v4si (V4SI *x, V4SI y) +{ + V4SI z; + for (int i = 0; i < 4; ++i) + z[i] = y[3 - i]; + *x = z; +} + +void +v2di (V2DI *x, V2DI y) +{ + V2DI z; + for (int i = 0; i < 2; ++i) + z[i] = y[1 - i]; + *x = z; +} + +void +v4sf (V4SF *x, V4SF y) +{ + V4SF z; + for (int i = 0; i < 4; ++i) + z[i] = y[3 - i]; + *x = z; +} + +void +v2df (V2DF *x, V2DF y) +{ + V2DF z; + for (int i = 0; i < 2; ++i) + z[i] = y[1 - i]; + *x = z; +} diff --git a/gcc/testsuite/gcc.target/s390/vector/reverse-elements-6.c b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-6.c new file mode 100644 index 00000000000..7e2b2356788 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-6.c @@ -0,0 +1,67 @@ +/* { dg-compile } */ +/* { dg-options "-O3 -mzarch -march=z15" } */ 
+/* { dg-require-effective-target s390_vxe2 } */ +/* { dg-final { scan-assembler-times {\tvstbrq\t} 1 } } */ +/* { dg-final { scan-assembler-times {\tvster[hfg]\t} 5 } } */ +/* { dg-final { scan-assembler-not {\tvperm\t} } } */ + +typedef signed char __attribute__ ((vector_size (16))) V16QI; +typedef short __attribute__ ((vector_size (16))) V8HI; +typedef int __attribute__ ((vector_size (16))) V4SI; +typedef long long __attribute__ ((vector_size (16))) V2DI; +typedef float __attribute__ ((vector_size (16))) V4SF; +typedef double __attribute__ ((vector_size (16))) V2DF; + +void +v16qi (V16QI *x, V16QI y) +{ + V16QI z; + for (int i = 0; i < 16; ++i) + z[i] = y[15 - i]; + *x = z; +} + +void +v8hi (V8HI *x, V8HI y) +{ + V8HI z; + for (int i = 0; i < 8; ++i) + z[i] = y[7 - i]; + *x = z; +} + +void +v4si (V4SI *x, V4SI y) +{ + V4SI z; + for (int i = 0; i < 4; ++i) + z[i] = y[3 - i]; + *x = z; +} + +void +v2di (V2DI *x, V2DI y) +{ + V2DI z; + for (int i = 0; i < 2; ++i) + z[i] = y[1 - i]; + *x = z; +} + +void +v4sf (V4SF *x, V4SF y) +{ + V4SF z; + for (int i = 0; i < 4; ++i) + z[i] = y[3 - i]; + *x = z; +} + +void +v2df (V2DF *x, V2DF y) +{ + V2DF z; + for (int i = 0; i < 2; ++i) + z[i] = y[1 - i]; + *x = z; +} diff --git a/gcc/testsuite/gcc.target/s390/vector/reverse-elements-7.c b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-7.c new file mode 100644 index 00000000000..046fcc0790a --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-7.c @@ -0,0 +1,67 @@ +/* { dg-compile } */ +/* { dg-options "-O3 -mzarch -march=z15" } */ +/* { dg-require-effective-target s390_vxe2 } */ +/* { dg-final { scan-assembler-times {\tvstbrq\t} 1 } } */ +/* { dg-final { scan-assembler-times {\tvster[hfg]\t} 5 } } */ +/* { dg-final { scan-assembler-not {\tvperm\t} } } */ + +typedef signed char __attribute__ ((vector_size (16))) V16QI; +typedef short __attribute__ ((vector_size (16))) V8HI; +typedef int __attribute__ ((vector_size (16))) V4SI; +typedef long long 
__attribute__ ((vector_size (16))) V2DI; +typedef float __attribute__ ((vector_size (16))) V4SF; +typedef double __attribute__ ((vector_size (16))) V2DF; + +void +v16qi (V16QI *x, V16QI *y) +{ + V16QI z; + for (int i = 0; i < 16; ++i) + z[i] = (*y)[15 - i]; + *x = z; +} + +void +v8hi (V8HI *x, V8HI *y) +{ + V8HI z; + for (int i = 0; i < 8; ++i) + z[i] = (*y)[7 - i]; + *x = z; +} + +void +v4si (V4SI *x, V4SI *y) +{ + V4SI z; + for (int i = 0; i < 4; ++i) + z[i] = (*y)[3 - i]; + *x = z; +} + +void +v2di (V2DI *x, V2DI *y) +{ + V2DI z; + for (int i = 0; i < 2; ++i) + z[i] = (*y)[1 - i]; + *x = z; +} + +void +v4sf (V4SF *x, V4SF *y) +{ + V4SF z; + for (int i = 0; i < 4; ++i) + z[i] = (*y)[3 - i]; + *x = z; +} + +void +v2df (V2DF *x, V2DF *y) +{ + V2DF z; + for (int i = 0; i < 2; ++i) + z[i] = (*y)[1 - i]; + *x = z; +} diff --git a/gcc/testsuite/gcc.target/s390/zvector/vec-reve-load-halfword-z14.c b/gcc/testsuite/gcc.target/s390/zvector/vec-reve-load-halfword-z14.c index 4938ac20613..3c1e9338f80 100644 --- a/gcc/testsuite/gcc.target/s390/zvector/vec-reve-load-halfword-z14.c +++ b/gcc/testsuite/gcc.target/s390/zvector/vec-reve-load-halfword-z14.c @@ -21,4 +21,6 @@ baz (signed short *x) return vec_reve (vec_xl (0, x)); } -/* { dg-final { scan-assembler-times "vperm\t" 3 } } */ +/* { dg-final { scan-assembler-times "vpdi\t" 3 } } */ +/* { dg-final { scan-assembler-times "verllg\t" 3 } } */ +/* { dg-final { scan-assembler-times "verllf\t" 3 } } */ diff --git a/gcc/testsuite/gcc.target/s390/zvector/vec-reve-load-halfword.c b/gcc/testsuite/gcc.target/s390/zvector/vec-reve-load-halfword.c index 3c9229922ec..7b1c3f885cd 100644 --- a/gcc/testsuite/gcc.target/s390/zvector/vec-reve-load-halfword.c +++ b/gcc/testsuite/gcc.target/s390/zvector/vec-reve-load-halfword.c @@ -9,7 +9,9 @@ foo (vector signed short x) return vec_reve (x); } -/* { dg-final { scan-assembler-times "vperm\t" 1 } } */ +/* { dg-final { scan-assembler-times "vpdi\t" 1 } } */ +/* { dg-final { scan-assembler-times 
"verllg\t" 1 } } */ +/* { dg-final { scan-assembler-times "verllf\t" 1 } } */ vector signed short