From patchwork Tue Apr 9 14:31:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Juergen Christ
X-Patchwork-Id: 1921424
From: Juergen Christ
To: gcc-patches@gcc.gnu.org
Cc: krebbel@linux.ibm.com
Subject: [PATCH v2] s390x: Optimize vector permute with constant indexes
Date: Tue, 9 Apr 2024 16:31:40 +0200
Message-Id: <20240409143140.22429-1-jchrist@linux.ibm.com>
X-Mailer: git-send-email 2.39.3

Loop vectorizer can generate vector permutes with constant indexes
where all indexes are equal.  Optimize this case to use vector
replicate instead of vector permute.

gcc/ChangeLog:

	* config/s390/s390.cc (expand_perm_as_replicate): Implement.
	(vectorize_vec_perm_const_1): Call new function.
	* config/s390/vx-builtins.md (vec_splat): Change to...
	(@vec_splat): ...this.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/vector/vec-expand-replicate.c: New test.

Bootstrapped and regtested on s390x.  Ok for trunk?

Signed-off-by: Juergen Christ
---
 gcc/config/s390/s390.cc                       | 33 ++++++++++
 gcc/config/s390/vx-builtins.md                |  2 +-
 .../s390/vector/vec-expand-replicate.c        | 60 +++++++++++++++++++
 3 files changed, 94 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 372a23244032..3148f163627c 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -17923,6 +17923,36 @@ expand_perm_as_a_vlbr_vstbr_candidate (const struct expand_vec_perm_d &d)
   return false;
 }
 
+static bool
+expand_perm_as_replicate (const struct expand_vec_perm_d &d)
+{
+  unsigned char i;
+  unsigned char elem;
+  rtx base = d.op0;
+  rtx insn;
+  /* Needed to silence maybe-uninitialized warning.  */
+  gcc_assert (d.nelt > 0);
+  elem = d.perm[0];
+  for (i = 1; i < d.nelt; ++i)
+    if (d.perm[i] != elem)
+      return false;
+  if (!d.testing_p)
+    {
+      if (elem >= d.nelt)
+        {
+          base = d.op1;
+          elem -= d.nelt;
+        }
+      insn = maybe_gen_vec_splat (d.vmode, d.target, base, GEN_INT (elem));
+      if (insn == NULL_RTX)
+        return false;
+      emit_insn (insn);
+      return true;
+    }
+  else
+    return maybe_code_for_vec_splat (d.vmode) != CODE_FOR_nothing;
+}
+
 /* Try to find the best sequence for the vector permute operation
    described by D.  Return true if the operation could be expanded.  */
 
@@ -17941,6 +17971,9 @@ vectorize_vec_perm_const_1 (const struct expand_vec_perm_d &d)
   if (expand_perm_as_a_vlbr_vstbr_candidate (d))
     return true;
 
+  if (expand_perm_as_replicate (d))
+    return true;
+
   return false;
 }
 
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index 432d81a719fc..93c0d408a43e 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -424,7 +424,7 @@
 
 ; Replicate from vector element
 
-(define_expand "vec_splat"
+(define_expand "@vec_splat"
   [(set (match_operand:V_HW 0 "register_operand" "")
         (vec_duplicate:V_HW (vec_select:
                              (match_operand:V_HW 1 "register_operand" "")
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c b/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c
new file mode 100644
index 000000000000..872b1c9321cd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c
@@ -0,0 +1,60 @@
+/* Check that the vectorize_vec_perm_const expander correctly deals with
+   replication.  Extracted from spec "nab".  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -mzarch -march=z13 -fvect-cost-model=unlimited" } */
+
+typedef double POINT_T[3];
+typedef double MATRIX_T[][4];
+typedef struct {
+  POINT_T a_pos;
+} ATOM_T;
+typedef struct {
+  ATOM_T *r_atoms;
+} RESIDUE_T;
+typedef struct strand_t {
+  RESIDUE_T *s_residues;
+} STRAND_T;
+typedef struct strand_t MOLECULE_T;
+double xfm_xyz_oxyz4[4];
+MOLECULE_T add_he2o3transformmol_mol, add_he2o3transformmol_sp;
+RESIDUE_T add_he2o3transformmol_res;
+int add_he2o3transformmol_r, add_he2o3transformmol_a, add_he2o3transformmol_i;
+ATOM_T *add_he2o3transformmol_ap;
+POINT_T add_he2o3transformmol_xyz, add_he2o3transformmol_nxyz;
+static void xfm_xyz(POINT_T oxyz, MATRIX_T mat, POINT_T nxyz) {
+  int i, j;
+  double nxyz4[4];
+  for (i = 0; i < 3; i++)
+    xfm_xyz_oxyz4[i] = oxyz[i];
+  xfm_xyz_oxyz4[3] = 1.0;
+  for (i = 0; i < 4; i++) {
+    nxyz4[i] = 0.0;
+    for (j = 0; j < 4; j++)
+      nxyz4[i] += xfm_xyz_oxyz4[j] * mat[j][i];
+  }
+  for (i = 0; i < 3; i++)
+    nxyz[i] = nxyz4[i];
+}
+void add_he2o3transformmol(MATRIX_T mat, int n) {
+  for (add_he2o3transformmol_sp = add_he2o3transformmol_mol;;)
+    for (add_he2o3transformmol_r = 0;;) {
+      add_he2o3transformmol_res =
+          add_he2o3transformmol_sp.s_residues[add_he2o3transformmol_r];
+      for (add_he2o3transformmol_a = 0; add_he2o3transformmol_a < n; add_he2o3transformmol_a++) {
+        add_he2o3transformmol_ap =
+            &add_he2o3transformmol_res.r_atoms[add_he2o3transformmol_a];
+        for (add_he2o3transformmol_i = 0; add_he2o3transformmol_i < 3;
+             add_he2o3transformmol_i++)
+          add_he2o3transformmol_xyz[add_he2o3transformmol_i] =
+              add_he2o3transformmol_ap->a_pos[add_he2o3transformmol_i];
+        xfm_xyz(add_he2o3transformmol_xyz, mat, add_he2o3transformmol_nxyz);
+        for (add_he2o3transformmol_i = 0; add_he2o3transformmol_i < 3;
+             add_he2o3transformmol_i++)
+          add_he2o3transformmol_ap->a_pos[add_he2o3transformmol_i] =
+              add_he2o3transformmol_nxyz[add_he2o3transformmol_i];
+      }
+    }
+}
+
+/* { dg-final { scan-assembler-not "vperm" } } */
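
Illustration only, not part of the patch: the selector check in expand_perm_as_replicate can be sketched in standalone C for readers without the backend at hand. The names `splat_info` and `perm_is_splat` are hypothetical and are not GCC API. The sketch assumes, as the patch does, that a selector value of nelt or more refers to the second input vector.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type for this sketch; GCC itself works on
   struct expand_vec_perm_d and emits RTL instead.  */
struct splat_info
{
  int operand;        /* 0 = first input vector, 1 = second input vector */
  unsigned char elem; /* element index within the chosen operand */
};

/* Return true if all NELT selector entries in PERM are equal, i.e. the
   permute is really a replicate.  On success, describe the replicated
   element in *OUT.  Selector values >= NELT are assumed to index into
   the second operand, mirroring the patch.  */
static bool
perm_is_splat (const unsigned char *perm, size_t nelt,
               struct splat_info *out)
{
  if (nelt == 0)
    return false;

  unsigned char elem = perm[0];
  for (size_t i = 1; i < nelt; ++i)
    if (perm[i] != elem)
      return false;

  if (elem >= nelt)
    {
      out->operand = 1;
      out->elem = (unsigned char) (elem - nelt);
    }
  else
    {
      out->operand = 0;
      out->elem = elem;
    }
  return true;
}
```

For a two-element vector, the selector {3, 3} would be reported as element 1 of the second operand, while {0, 1} is rejected and left to the other expanders.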