From patchwork Wed Mar 20 10:53:56 2024
From: Juergen Christ
To: gcc-patches@gcc.gnu.org
Cc: krebbel@linux.ibm.com
Subject: [PATCH] s390x: Implement vector cost model
Date: Wed, 20 Mar 2024 11:53:56 +0100
Message-Id: <20240320105356.16494-1-jchrist@linux.ibm.com>

Hi,

s390x has so far used the basic cost model, which does not correctly model
the cost of moving data between register files, nor the availability of
instructions that simplify reversed operations (vector load/store
reversed).  Implement a target-specific cost model to better control when
to vectorize.

gcc/ChangeLog:

        * config/s390/s390.cc (class s390_vector_costs): Implement.
        (s390_vector_costs::s390_vector_costs): Ditto.
        (s390_vector_costs::add_stmt_cost): Ditto.
        (s390_vectorize_create_costs): Ditto.
        (TARGET_VECTORIZE_CREATE_COSTS): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/s390/vector/loop-1.c: New test.
        * gcc.target/s390/vector/slp-1.c: New test.
        * gcc.target/s390/vector/slp-2.c: New test.

Signed-off-by: Juergen Christ

Bootstrapped and tested on s390x.  Ok for master?

---
 gcc/config/s390/s390.cc                       | 127 ++++++++++++++++++
 gcc/testsuite/gcc.target/s390/vector/loop-1.c |  82 +++++++++++
 gcc/testsuite/gcc.target/s390/vector/slp-1.c  |  68 ++++++++++
 gcc/testsuite/gcc.target/s390/vector/slp-2.c  |  31 +++++
 4 files changed, 308 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/loop-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/slp-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/slp-2.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 372a23244032..b9dab1cf8a85 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -88,6 +88,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-prop.h"
 #include "ipa-fnsummary.h"
 #include "sched-int.h"
+#include "tree-vectorizer.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -4199,6 +4200,130 @@ s390_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
     }
 }
 
+/* s390-specific vector costs.  */
+class s390_vector_costs : public vector_costs
+{
+  stmt_vec_info skipfinalpart;
+public:
+  s390_vector_costs (vec_info *, bool);
+
+  unsigned int add_stmt_cost (int count, vect_cost_for_stmt kind,
+                              stmt_vec_info stmt_info, slp_tree node,
+                              tree vectype, int misalign,
+                              vect_cost_model_location where) override;
+};
+
+s390_vector_costs::s390_vector_costs (vec_info *vinfo, bool costing_for_scalar)
+  : vector_costs (vinfo, costing_for_scalar), skipfinalpart (nullptr)
+{
+}
+
+unsigned int
+s390_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
+                                  stmt_vec_info stmt_info, slp_tree node,
+                                  tree vectype, int misalign,
+                                  vect_cost_model_location where)
+{
+  bool fp = false;
+  int costs = s390_builtin_vectorization_cost (kind, vectype, misalign);
+
+  if (vectype != NULL)
+    fp = FLOAT_TYPE_P (vectype);
+
+  if ((kind == scalar_to_vec || kind == vec_construct)
+      && node
+      && SLP_TREE_DEF_TYPE (node) == vect_external_def)
+    {
+      unsigned int i;
+      tree op;
+      FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (node), i, op)
+        if (TREE_CODE (op) == SSA_NAME)
+          TREE_VISITED (op) = 0;
+      FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (node), i, op)
+        {
+          if (TREE_CODE (op) != SSA_NAME
+              || TREE_VISITED (op))
+            continue;
+          TREE_VISITED (op) = 1;
+          gimple *def = SSA_NAME_DEF_STMT (op);
+          tree temp;
+          if (is_gimple_assign (def)
+              && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def))
+              && (temp = gimple_assign_rhs1 (def))
+              && TREE_CODE (temp) == SSA_NAME
+              && tree_nop_conversion_p (TREE_TYPE (gimple_assign_lhs (def)),
+                                        TREE_TYPE (temp)))
+            def = SSA_NAME_DEF_STMT (temp);
+          if (!gimple_assign_load_p (def))
+            {
+              /* For scalar_to_vec from an FP register, we might not
+                 cross the register files.  So keep the penalty small.
+                 ??? If we have to cross, we actually cross twice,
+                 leading to a huge runtime penalty.  Should we reflect
+                 this here?  */
+              if (kind == scalar_to_vec && fp)
+                costs += 2;
+              else
+                costs += 3;
+            }
+        }
+      FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (node), i, op)
+        if (TREE_CODE (op) == SSA_NAME)
+          TREE_VISITED (op) = 0;
+    }
+  if (kind == scalar_stmt && stmt_info && is_gimple_assign (stmt_info->stmt))
+    {
+      const gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
+      tree comptype = NULL_TREE;
+      if (gimple_assign_rhs_code (assign) == BIT_INSERT_EXPR)
+        comptype = TREE_TYPE (gimple_assign_rhs1 (assign));
+      if (gimple_assign_rhs_code (assign) == BIT_FIELD_REF)
+        comptype = TREE_TYPE (TREE_OPERAND (gimple_assign_rhs1 (assign), 0));
+      if (comptype != NULL_TREE && VECTOR_TYPE_P (comptype))
+        {
+          /* This will be a vlvg or vlgv that crosses the register files.  */
+          costs += 3;
+        }
+    }
+  if (stmt_info
+      && (STMT_VINFO_TYPE (stmt_info) == store_vec_info_type
+          || STMT_VINFO_TYPE (stmt_info) == load_vec_info_type))
+    {
+      if (STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_ELEMENTWISE)
+        {
+          /* gimple represents elementwise storing as two steps
+             (vec_to_scalar followed by scalar_store).  s390 stores
+             lanes to memory in one operation.  Similarly, elementwise
+             loading is represented as scalar_load for each lane
+             followed by a vec_construct.  s390 loads directly into the
+             appropriate lanes.  The second operation does not
+             exist.  */
+          if (kind == vec_to_scalar || kind == scalar_load)
+            skipfinalpart = stmt_info;
+          if ((kind == scalar_store || kind == vec_construct)
+              && skipfinalpart == stmt_info)
+            return 0;
+        }
+      else if (STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_CONTIGUOUS_REVERSE)
+        {
+          /* gimple represents reversal via a vec_perm followed by the
+             load/store.  s390 has vector load/store reversed
+             instructions.  The permute operation does not exist.  */
+          if (kind == vec_perm)
+            return 0;
+        }
+    }
+  costs *= count;
+  return record_stmt_cost (stmt_info, where, (unsigned int) costs);
+}
+
+/* Implement targetm.vectorize.create_costs.  */
+static vector_costs *
+s390_vectorize_create_costs (vec_info *vinfo, bool costing_for_scalar)
+{
+  return new s390_vector_costs (vinfo, costing_for_scalar);
+}
+
 /* If OP is a SYMBOL_REF of a thread-local symbol, return its TLS mode,
    otherwise return 0.  */
 
@@ -18088,6 +18213,8 @@ s390_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
 #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
 #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
   s390_builtin_vectorization_cost
+#undef TARGET_VECTORIZE_CREATE_COSTS
+#define TARGET_VECTORIZE_CREATE_COSTS s390_vectorize_create_costs
 
 #undef TARGET_MACHINE_DEPENDENT_REORG
 #define TARGET_MACHINE_DEPENDENT_REORG s390_reorg
diff --git a/gcc/testsuite/gcc.target/s390/vector/loop-1.c b/gcc/testsuite/gcc.target/s390/vector/loop-1.c
new file mode 100644
index 000000000000..4a75fe2c2c0e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/loop-1.c
@@ -0,0 +1,82 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-all -march=z13" } */
+
+#define N 32
+
+void contiguous
+(int *restrict out, int *restrict in, int m)
+{
+  int i;
+
+  for (i = 0; i < N; ++i)
+    out[i] = in[i] * m;
+}
+
+void contiguous_permute__load
+(int *restrict out, int *restrict in, int m)
+{
+  int i;
+
+  for (i = 0; i < N; ++i)
+    out[i] = in[2 * i] * m;
+}
+
+void contiguous_permute__store
+(int *restrict out, int *restrict in, int m)
+{
+  int i;
+
+  for (i = 0; i < N; ++i)
+    out[2 * i] = in[i] * m;
+}
+
+void elementwise
+(int *restrict out, int *restrict in, int m, int s)
+{
+  int i;
+
+  for (i = 0; i < N; ++i)
+    out[i] = in[s * i] * m;
+}
+
+void contiguous_reverse
+(int *restrict out, int *restrict in, int m)
+{
+  int i;
+
+  for (i = N - 1; i >= 0; --i)
+    out[i] = in[i] * m;
+}
+
+#if 0
+/* This does not work currently.
+   => "not falling back to elementwise accesses" */
+void contiguous_permute__load_reversed
+(int *restrict out, int *restrict in, int m)
+{
+  int i;
+
+  for (i = N - 1; i >= 0; --i)
+    out[i] = in[2 * i] * m;
+}
+#endif
+
+void contiguous_permute__store_reversed
+(int *restrict out, int *restrict in, int m)
+{
+  int i;
+
+  for (i = N - 1; i >= 0; --i)
+    out[2 * i] = in[i] * m;
+}
+
+void elementwise__reversed
+(int *restrict out, int *restrict in, int m, int s)
+{
+  int i;
+
+  for (i = N - 1; i >= 0; --i)
+    out[i] = in[s * i] * m;
+}
+
+/* { dg-final { scan-tree-dump-not "couldn't vectorize loop" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/s390/vector/slp-1.c b/gcc/testsuite/gcc.target/s390/vector/slp-1.c
new file mode 100644
index 000000000000..5ee93b649cc7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/slp-1.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-slp-all -march=z15" } */
+
+void vrep
+(int *x)
+{
+  x[0] = 42;
+  x[1] = 42;
+  x[2] = 42;
+  x[3] = 42;
+}
+
+void vgbm
+(int *x)
+{
+  x[0] = 0xff00;
+  x[1] = 0xff00;
+  x[2] = 0xff00;
+  x[3] = 0xff00;
+}
+
+void vgm
+(int *x)
+{
+  x[0] = 0x7e;
+  x[1] = 0x7e;
+  x[2] = 0x7e;
+  x[3] = 0x7e;
+}
+
+void vl
+(int *x)
+{
+  x[0] = 42;
+  x[1] = 0xff00;
+  x[2] = 0x7e;
+  x[3] = 0;
+}
+
+void vl_vst
+(int *restrict o, int *restrict i)
+{
+  o[0] = i[0];
+  o[1] = i[1];
+  o[2] = i[2];
+  o[3] = i[3];
+}
+
+void vlrepf
+(int *restrict o, int *restrict i)
+{
+  o[0] = i[0];
+  o[1] = i[0];
+  o[2] = i[0];
+  o[3] = i[0];
+}
+
+// Needs z15
+void vcefb
+(float *restrict o, int *restrict i)
+{
+  o[0] = i[0];
+  o[1] = i[1];
+  o[2] = i[2];
+  o[3] = i[3];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 7 "slp2" } } */
diff --git a/gcc/testsuite/gcc.target/s390/vector/slp-2.c b/gcc/testsuite/gcc.target/s390/vector/slp-2.c
new file mode 100644
index 000000000000..b0dc44319922
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/slp-2.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-slp-all -march=z13" } */
+
+void elementwise
+(int *o, int i0, int i1, int i2, unsigned int i3)
+{
+  o[0] = i0;
+  o[1] = i1;
+  o[2] = i2;
+  o[3] = i3;
+}
+
+void elementreplicate
+(int *o, int i)
+{
+  o[0] = i;
+  o[1] = i;
+  o[2] = i;
+  o[3] = i;
+}
+
+void mult
+(int *o, int i0, int i1, int i2, int i3, int m)
+{
+  o[0] = i0 * m;
+  o[1] = i1 * m;
+  o[2] = i2 * m;
+  o[3] = i3 * m;
+}
+
+/* { dg-final { scan-tree-dump-times "not vectorized: vectorization is not profitable" 3 "slp2" } } */
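
The register-file crossing penalty in add_stmt_cost can be illustrated outside of GCC.  The following is a standalone C++ sketch, not GCC code: BuildKind, Operand and crossing_penalty are hypothetical stand-ins for vect_cost_for_stmt, the scalar operands of an external SLP node and the first block of the hook, and a std::unordered_set replaces the TREE_VISITED marking.  It assumes each operand's defining statement has already been looked through a no-op conversion, as the patch does.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the two cost kinds the penalty applies to.
enum class BuildKind { scalar_to_vec, vec_construct, other };

// Hypothetical description of one scalar operand of an external SLP node.
struct Operand {
  std::string ssa_name;  // identity of the SSA name
  bool defined_by_load;  // def stmt (after looking through a nop
                         // conversion) is a load
};

// Extra cost added on top of the builtin cost when a vector is built from
// external scalar operands: every distinct operand that is not loaded from
// memory has to be moved from a scalar register into a vector register.
int crossing_penalty(BuildKind kind, bool fp, const std::vector<Operand>& ops) {
  if (kind != BuildKind::scalar_to_vec && kind != BuildKind::vec_construct)
    return 0;
  std::unordered_set<std::string> visited;  // plays the role of TREE_VISITED
  int penalty = 0;
  for (const Operand& op : ops) {
    if (!visited.insert(op.ssa_name).second)
      continue;  // this SSA name has already been charged
    if (!op.defined_by_load)
      // An FP scalar splat may stay in the FP/vector register file.
      penalty += (kind == BuildKind::scalar_to_vec && fp) ? 2 : 3;
  }
  return penalty;
}
```

For example, four distinct non-load operands add 4 * 3 = 12 to a vec_construct, while one operand repeated in every lane (the shape of elementreplicate in slp-2.c) is charged only once.  Deduplicating by SSA name is what keeps a replicated scalar from looking four times as expensive as it is.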
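
The memory-access adjustments at the end of add_stmt_cost boil down to a small piece of state that remembers which statement started an elementwise access.  The sketch below models that state in standalone C++ under the assumption that statements can be identified by an integer; CostKind, AccessType and SkipModel are hypothetical names, and the store_vec_info_type/load_vec_info_type check of the real hook is omitted.  It follows the comment in the patch (vec_to_scalar or scalar_load starts the pair) and starts from an explicit "no statement yet" value.

```cpp
#include <cassert>

// Hypothetical stand-ins for the cost kinds and access types the hook inspects.
enum class CostKind { scalar_load, scalar_store, vec_to_scalar,
                      vec_construct, vec_perm, vector_load };
enum class AccessType { contiguous, contiguous_reverse, elementwise };

// Minimal model of the skip logic; a statement is an int, not a stmt_vec_info.
class SkipModel {
  int skip_final_part_ = -1;  // -1 means "no statement recorded yet"

public:
  // Returns the cost to record for COUNT copies with per-copy cost BASE.
  int add(int stmt, CostKind kind, AccessType access, int count, int base) {
    if (access == AccessType::elementwise) {
      // First half of an elementwise access: remember the statement.
      if (kind == CostKind::vec_to_scalar || kind == CostKind::scalar_load)
        skip_final_part_ = stmt;
      // Second half of the same statement is free: lanes are stored from,
      // or loaded into, the vector register directly.
      if ((kind == CostKind::scalar_store || kind == CostKind::vec_construct)
          && skip_final_part_ == stmt)
        return 0;
    } else if (access == AccessType::contiguous_reverse
               && kind == CostKind::vec_perm) {
      // Reversed loads/stores make the separate permute unnecessary.
      return 0;
    }
    return base * count;
  }
};
```

A vec_construct is only discounted when the same statement previously recorded a scalar_load, so an unrelated vec_construct (for example one built from external operands) keeps its full cost.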