From patchwork Fri Dec 1 02:38:50 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1870454
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, hjl.tools@gmail.com
Subject: [PATCH] Take register pressure into account for
 vec_construct/scalar_to_vec when the components are not loaded from memory.
Date: Fri, 1 Dec 2023 10:38:50 +0800
Message-Id: <20231201023850.1118763-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.31.1
List-Id: Gcc-patches mailing list

> Hmm, I would suggest you put reg_needed into the class and accumulate
> over all vec_construct, with your patch you pessimize a single v32qi
> over two separate v16qi for example.  Also currently the whole block is
> gated with INTEGRAL_TYPE_P but register pressure would be also
> a concern for floating point vectors.  finish_cost would then apply an
> adjustment.

Changed.

> 'target_avail_regs' is for GENERAL_REGS, does that include APX regs?
> I don't see anything similar for FP regs, but I guess the target should know
> or maybe there's a #regs in regclass query already.

I haven't seen any, so I use the setting below.

unsigned target_avail_sse = TARGET_64BIT ? (TARGET_AVX512F ? 32 : 16) : 8;

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
No big impact on SPEC2017.
Observed one big improvement on another benchmark from avoiding
vectorization with vec_construct v32qi, which caused lots of spills.

Ok for trunk?

For vec_construct, the components must be live at the same time if
they're not loaded from memory; when the number of those components
exceeds the available registers, spills happen.  Try to account for
that with a rough estimation.
??? Ideally, we should have an overall estimation of register pressure
if we know the live range of all variables.

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
	Count sse_reg/gpr_regs for components not loaded from memory.
	(ix86_vector_costs::ix86_vector_costs): New constructor.
	(ix86_vector_costs::m_num_gpr_needed[3]): New private member.
	(ix86_vector_costs::m_num_sse_needed[3]): Ditto.
	(ix86_vector_costs::finish_cost): Estimate overall register
	pressure cost.
	(ix86_vector_costs::ix86_vect_estimate_reg_pressure): New
	function.
---
 gcc/config/i386/i386.cc | 54 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 50 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 9390f525b99..dcaea6c2096 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -24562,15 +24562,34 @@ ix86_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
 /* x86-specific vector costs.  */
 class ix86_vector_costs : public vector_costs
 {
-  using vector_costs::vector_costs;
+public:
+  ix86_vector_costs (vec_info *, bool);
 
   unsigned int add_stmt_cost (int count, vect_cost_for_stmt kind,
                               stmt_vec_info stmt_info, slp_tree node,
                               tree vectype, int misalign,
                               vect_cost_model_location where) override;
   void finish_cost (const vector_costs *) override;
+
+private:
+
+  /* Estimate register pressure of the vectorized code.  */
+  void ix86_vect_estimate_reg_pressure ();
+  /* Number of GENERAL_REGS/SSE_REGS used in the vectorizer, it's used for
+     estimation of register pressure.
+     ??? Currently it's only used by vec_construct/scalar_to_vec
+     where we know it's not loaded from memory.  */
+  unsigned m_num_gpr_needed[3];
+  unsigned m_num_sse_needed[3];
 };
 
+ix86_vector_costs::ix86_vector_costs (vec_info* vinfo, bool costing_for_scalar)
+  : vector_costs (vinfo, costing_for_scalar),
+    m_num_gpr_needed (),
+    m_num_sse_needed ()
+{
+}
+
 /* Implement targetm.vectorize.create_costs.  */
 
 static vector_costs *
@@ -24748,8 +24767,7 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
     }
   else if ((kind == vec_construct || kind == scalar_to_vec)
            && node
-           && SLP_TREE_DEF_TYPE (node) == vect_external_def
-           && INTEGRAL_TYPE_P (TREE_TYPE (vectype)))
+           && SLP_TREE_DEF_TYPE (node) == vect_external_def)
     {
       stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
       unsigned i;
@@ -24785,7 +24803,15 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
                   && (gimple_assign_rhs_code (def) != BIT_FIELD_REF
                       || !VECTOR_TYPE_P (TREE_TYPE
                                 (TREE_OPERAND (gimple_assign_rhs1 (def), 0))))))
-            stmt_cost += ix86_cost->sse_to_integer;
+            {
+              if (fp)
+                m_num_sse_needed[where]++;
+              else
+                {
+                  m_num_gpr_needed[where]++;
+                  stmt_cost += ix86_cost->sse_to_integer;
+                }
+            }
         }
       FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (node), i, op)
         if (TREE_CODE (op) == SSA_NAME)
@@ -24821,6 +24847,24 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
   return retval;
 }
 
+void
+ix86_vector_costs::ix86_vect_estimate_reg_pressure ()
+{
+  unsigned gpr_spill_cost = COSTS_N_INSNS (ix86_cost->int_store [2]) / 2;
+  unsigned sse_spill_cost = COSTS_N_INSNS (ix86_cost->sse_store[0]) / 2;
+
+  /* Any better way to have target available fp registers, currently use SSE_REGS.  */
+  unsigned target_avail_sse = TARGET_64BIT ? (TARGET_AVX512F ? 32 : 16) : 8;
+  for (unsigned i = 0; i != 3; i++)
+    {
+      if (m_num_gpr_needed[i] > target_avail_regs)
+        m_costs[i] += gpr_spill_cost * (m_num_gpr_needed[i] - target_avail_regs);
+      /* Only measure sse registers pressure.  */
+      if (TARGET_SSE && (m_num_sse_needed[i] > target_avail_sse))
+        m_costs[i] += sse_spill_cost * (m_num_sse_needed[i] - target_avail_sse);
+    }
+}
+
 void
 ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
 {
@@ -24843,6 +24887,8 @@ ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
         m_costs[vect_body] = INT_MAX;
     }
 
+  ix86_vect_estimate_reg_pressure ();
+
   vector_costs::finish_cost (scalar_costs);
 }
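
Illustration only, not part of the patch: a minimal standalone sketch of the
spill-penalty arithmetic used by the estimate above, so that the formula can
be tried outside of GCC.  The names pressure_model, spill_penalty and
bucket_cost are hypothetical, and the register counts and spill costs are
made-up inputs rather than values taken from any ix86_cost table.

```cpp
// Hypothetical model of one target's register budget; none of these
// names or values come from GCC.
struct pressure_model
{
  unsigned avail_gpr;       // assumed number of usable general registers
  unsigned avail_sse;       // assumed number of usable vector registers
  unsigned gpr_spill_cost;  // assumed cost of spilling one general register
  unsigned sse_spill_cost;  // assumed cost of spilling one vector register
};

// Penalty for one register class: zero while the demand fits, otherwise
// one spill cost for every register needed beyond those available.
constexpr unsigned
spill_penalty (unsigned needed, unsigned avail, unsigned cost_per_spill)
{
  return needed > avail ? cost_per_spill * (needed - avail) : 0;
}

// Cost of one bucket (prologue, body or epilogue) after adding the
// penalties of both register classes to its base cost.
constexpr unsigned
bucket_cost (unsigned base, unsigned gpr_needed, unsigned sse_needed,
             const pressure_model &m)
{
  return base
         + spill_penalty (gpr_needed, m.avail_gpr, m.gpr_spill_cost)
         + spill_penalty (sse_needed, m.avail_sse, m.sse_spill_cost);
}
```

Because the counts are accumulated per bucket, two separate 16-component
constructs in the same bucket add up to 32 needed registers and are penalized
exactly like a single 32-component one, which is the point raised in the
review comment quoted at the top.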