From patchwork Tue May 23 15:18:02 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 1785295
Date: Tue, 23 May 2023 17:18:02 +0200 (CEST)
To: gcc-patches@gcc.gnu.org
cc: richard.sandiford@arm.com
Subject: [PATCH] tree-optimization/109747 - SLP cost of CTORs
Message-Id: <20230523151803.3428B13A10@imap2.suse-dmz.suse.de>
List-Id: Gcc-patches mailing list
From: Richard Biener
Reply-To: Richard Biener

The x86 backend looks at the SLP node passed to the add_stmt_cost
hook when costing vec_construct,
looking for elements that require a move from a GPR to a vector
register and costs that.  But since vect_prologue_cost_for_slp
decomposes the cost for an external SLP node into individual pieces,
this cost gets applied N times (once per piece), without a chance for
the backend to know it is dealing with just a part of the SLP node.
Looking at just a part is also not perfect, since the GPR to XMM move
cost applies only once per distinct element, so handling the whole SLP
node at once reflects the cost more correctly (albeit without
considering other external SLP nodes).

The following addresses the issue by passing down the SLP node only
for one piece and nullptr for the rest.  The x86 backend is currently
the only one looking at it.

In the future the cost of external elements is something to deal
with globally, but that would require the full SLP tree to be
available to costing.

It's difficult to write a testcase: at the tipping point not
vectorizing is better, so I'll follow up with x86-specific adjustments
and will try to add a testcase later.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard, we talked about this issue two weeks ago and I was looking
for a solution that would be OK for backporting if the need arises.
The following is what I could come up with that retains the whole
SLP-node wide "CSE" of the element move cost.

Is that OK until we come up with a better plan for trunk at some point?

Thanks,
Richard.

        PR tree-optimization/109747
        * tree-vect-slp.cc (vect_prologue_cost_for_slp): Pass down the
        SLP node only once to the cost hook.
---
 gcc/tree-vect-slp.cc | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index e5c9d7e766e..a6f277c5e21 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -6069,6 +6069,7 @@ vect_prologue_cost_for_slp (slp_tree node,
     }
   /* ??? We're just tracking whether vectors in a single node are the same.
      Ideally we'd do something more global.  */
+  bool passed = false;
   for (unsigned int start : starts)
     {
       vect_cost_for_stmt kind;
@@ -6078,7 +6079,15 @@ vect_prologue_cost_for_slp (slp_tree node,
         kind = scalar_to_vec;
       else
         kind = vec_construct;
-      record_stmt_cost (cost_vec, 1, kind, node, vectype, 0, vect_prologue);
+      /* The target cost hook has no idea which part of the SLP node
+         we are costing so avoid passing it down more than once.  Pass
+         it to the first vec_construct or scalar_to_vec part since for those
+         the x86 backend tries to account for GPR to XMM register moves.  */
+      record_stmt_cost (cost_vec, 1, kind,
+                        (kind != vector_load && !passed) ? node : nullptr,
+                        vectype, 0, vect_prologue);
+      if (kind != vector_load)
+        passed = true;
     }
 }
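
To illustrate the overcounting described above, here is a minimal
standalone sketch.  It is not GCC code: ToyNode, toy_add_stmt_cost and
the two cost constants are hypothetical stand-ins, and the vector_load
case that the patch excludes is left out for brevity.  The toy hook
charges one GPR to XMM move per distinct scalar of the whole node
whenever it is handed a node, which is roughly the behavior the message
attributes to the x86 backend.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for an external SLP node: each integer names one
// scalar operand that would have to be moved from a GPR.
struct ToyNode {
  std::vector<int> scalar_ids;
};

// Hypothetical cost constants, not taken from any real target.
constexpr int kConstructCost = 1;
constexpr int kGprToXmmCost = 2;

// Toy cost hook: if a node is passed, charge one move per distinct scalar
// in the whole node, because the hook cannot tell which piece is costed.
int toy_add_stmt_cost(const ToyNode *node) {
  int cost = kConstructCost;
  if (node) {
    std::set<int> distinct(node->scalar_ids.begin(), node->scalar_ids.end());
    cost += kGprToXmmCost * static_cast<int>(distinct.size());
  }
  return cost;
}

// Behavior before the patch: every piece passes the node down.
int cost_passing_node_always(const ToyNode &node, unsigned pieces) {
  assert(pieces > 0);
  int total = 0;
  for (unsigned i = 0; i < pieces; ++i)
    total += toy_add_stmt_cost(&node);
  return total;
}

// Behavior after the patch: only the first piece passes the node down,
// the remaining pieces pass nullptr.
int cost_passing_node_once(const ToyNode &node, unsigned pieces) {
  assert(pieces > 0);
  int total = 0;
  bool passed = false;
  for (unsigned i = 0; i < pieces; ++i) {
    total += toy_add_stmt_cost(passed ? nullptr : &node);
    passed = true;
  }
  return total;
}
```

With a node of eight scalars of which four are distinct, costed as two
pieces, the first variant charges the four moves twice while the second
charges them once, so the move cost no longer scales with the number of
pieces.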