From patchwork Tue Apr  2 07:56:01 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Juergen Christ
X-Patchwork-Id: 1918669
From: Juergen Christ
To: gcc-patches@gcc.gnu.org
Cc: krebbel@linux.ibm.com
Subject: [PATCH] s390x: Optimize vector permute with constant indexes
Date: Tue,  2 Apr 2024 09:56:01 +0200
Message-Id: <20240402075601.7733-1-jchrist@linux.ibm.com>
X-Mailer: git-send-email 2.39.3

The loop vectorizer can generate vector permutes with constant indexes
where all indexes are equal.  Optimize this case to use vector
replicate instead of vector permute.

gcc/ChangeLog:

        * config/s390/s390.cc (expand_perm_as_replicate): Implement.
        (vectorize_vec_perm_const_1): Call new function.
        * config/s390/vx-builtins.md (vec_splat): Change to...
        (@vec_splat): ...this.

gcc/testsuite/ChangeLog:

        * gcc.target/s390/vector/vec-expand-replicate.c: New test.

Bootstrapped and regtested on s390x.  Ok for trunk?

Signed-off-by: Juergen Christ
---
 gcc/config/s390/s390.cc                       | 32 +++++++++++++++++++
 gcc/config/s390/vx-builtins.md                |  2 +-
 .../s390/vector/vec-expand-replicate.c        | 30 +++++++++++++++++
 3 files changed, 63 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 372a23244032..4b4014ebe444 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -17923,6 +17923,35 @@ expand_perm_as_a_vlbr_vstbr_candidate (const struct expand_vec_perm_d &d)
   return false;
 }
 
+static bool expand_perm_as_replicate (const struct expand_vec_perm_d &d)
+{
+  unsigned char i;
+  unsigned char elem;
+  rtx base = d.op0;
+  rtx insn;
+  /* Needed to silence maybe-uninitialized warning.  */
+  gcc_assert(d.nelt > 0);
+  elem = d.perm[0];
+  for (i = 1; i < d.nelt; ++i)
+    if (d.perm[i] != elem)
+      return false;
+  if (!d.testing_p)
+    {
+      if (elem >= d.nelt)
+        {
+          base = d.op1;
+          elem -= d.nelt;
+        }
+      insn = maybe_gen_vec_splat (d.vmode, d.target, base, GEN_INT (elem));
+      if (insn == NULL_RTX)
+        return false;
+      emit_insn (insn);
+      return true;
+    }
+  else
+    return maybe_code_for_vec_splat (d.vmode) != CODE_FOR_nothing;
+}
+
 /* Try to find the best sequence for the vector permute operation
    described by D.  Return true if the operation could be
    expanded.  */
@@ -17941,6 +17970,9 @@ vectorize_vec_perm_const_1 (const struct expand_vec_perm_d &d)
   if (expand_perm_as_a_vlbr_vstbr_candidate (d))
     return true;
 
+  if (expand_perm_as_replicate(d))
+    return true;
+
   return false;
 }
 
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index 432d81a719fc..93c0d408a43e 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -424,7 +424,7 @@
 
 
 ; Replicate from vector element
-(define_expand "vec_splat"
+(define_expand "@vec_splat"
   [(set (match_operand:V_HW 0 "register_operand" "")
         (vec_duplicate:V_HW (vec_select:
                              (match_operand:V_HW 1 "register_operand" "")
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c b/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c
new file mode 100644
index 000000000000..27563a00f22b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c
@@ -0,0 +1,30 @@
+/* Check that the vectorize_vec_perm_const expander correctly deals with
+   replication.  Extracted from spec "nab".  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -mzarch -march=z13 -fvect-cost-model=unlimited" } */
+
+
+#define REAL_T double
+typedef REAL_T MATRIX_T[ 4 ][ 4 ];
+
+int concat_mat_i, concat_mat_j;
+static void concat_mat(MATRIX_T m1, MATRIX_T, MATRIX_T m3);
+MATRIX_T *rot4p() {
+  MATRIX_T mat3, mat4;
+  static MATRIX_T mat5;
+  concat_mat(mat4, mat3, mat5);
+}
+void concat_mat(MATRIX_T m1, MATRIX_T, MATRIX_T m3) {
+  int k;
+  for (;; concat_mat_i++) {
+    concat_mat_j = 0;
+    for (; 4; concat_mat_j++) {
+      k = 0;
+      for (; k < 4; k++)
+        m3[concat_mat_i][concat_mat_j] += m1[concat_mat_i][k];
+    }
+  }
+}
+
+/* { dg-final { scan-assembler-not "vperm" } } */
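
For readers who want to see the selector check in isolation, here is a
minimal standalone sketch in plain C.  It is not GCC code and is not part
of the patch: the names perm_is_splat and splat_choice are hypothetical,
and the sketch only assumes what the patch itself shows, namely that a
selector value of nelt or more refers to the second input operand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the outcome of the check: which input
   vector to replicate from (0 or 1) and which of its elements.  */
struct splat_choice
{
  unsigned operand;
  unsigned element;
};

/* Return true if every selector in PERM[0..NELT-1] names the same
   element of the concatenation of the two NELT-element inputs, and
   describe that element in *OUT.  Selectors >= NELT refer to the
   second input, mirroring the op0/op1 choice in the patch.  *OUT is
   left untouched when the selectors differ.  */
static bool
perm_is_splat (const unsigned char *perm, size_t nelt,
               struct splat_choice *out)
{
  assert (nelt > 0);
  unsigned char elem = perm[0];
  for (size_t i = 1; i < nelt; ++i)
    if (perm[i] != elem)
      return false;

  out->operand = elem >= nelt ? 1 : 0;
  out->element = elem >= nelt ? (unsigned) (elem - nelt) : elem;
  return true;
}
```

With four-element vectors, the selector {5, 5, 5, 5} would be reported as
element 1 of the second operand, while {0, 1, 0, 1} is rejected and would
fall through to whatever other expansion strategies remain.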