From patchwork Sun Jun 18 15:54:41 2023
X-Patchwork-Submitter: Jan Hubicka
X-Patchwork-Id: 1796256
From: Jan Hubicka
Date: Sun, 18 Jun 2023 17:54:41 +0200
To: gcc-patches@gcc.gnu.org, rguenther@suse.cz
Subject: Optimize std::max early

Hi,
we currently produce very bad code on loops using std::vector as a stack, since we fail to inline push_back, which in turn prevents SRA, and we fail to optimize out some store-to-load pairs (PR109849).

I looked into why this function is not inlined by GCC while it is inlined by clang. We currently estimate it at 66 instructions, and the inline limits are 15 at -O2 and 30 at -O3. Clang has a similar estimate, but still decides to inline at -O2.

I looked into the reason why the body is so large, and one problem I spotted is the way std::max is implemented: it takes and returns references to the values.

  const T& max( const T& a, const T& b );

This makes it necessary to store the values to memory and load them later. (max is used by the code computing the new size of the vector on resize.)

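To make the cost of the reference-based signature concrete, here is a small sketch. It is not part of the patch, and max_by_ref, max_by_value and demo are names made up for illustration. max_by_ref mirrors the shape of the std::max declaration above, while max_by_value is a by-value equivalent of the kind that maps directly onto a MAX_EXPR.

```cpp
#include <cassert>

// Hypothetical helper with the same shape as std::max: the result is a
// reference, so until the call is inlined and cleaned up, both arguments
// need an address and the winner is loaded through the returned reference.
template <typename T>
const T &max_by_ref (const T &a, const T &b)
{
  return (a < b) ? b : a;
}

// Hypothetical by-value variant: no addresses are involved, so the body
// is just a select between two values.
template <typename T>
T max_by_value (T a, T b)
{
  return (a < b) ? b : a;
}

// Both forms pick the same value; only the by-reference form hands back
// the address of one of its arguments.
void demo ()
{
  int x = 3, y = 7;
  assert (max_by_ref (x, y) == max_by_value (x, y));
  assert (&max_by_ref (x, y) == &y);
}
```

Calling demo () should pass both assertions; the second one shows that the by-reference form really returns the address of y, which is what keeps the arguments in memory.
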
Two stores, a conditional and a load account for 8 instructions, while MAX_EXPR counts as 1 and has a much better chance to fold with the surrounding code.

We optimize this to MAX_EXPR, but only during late optimizations. I think this is a common enough coding pattern that we ought to make it transparent to early opts and IPA. The following is the easiest fix: it simply adds a phiprop pass that turns the PHI of address values into a PHI of values, so that later FRE can propagate values across memory, phiopt can discover the MAX_EXPR pattern and DSE can remove the memory stores.

Bootstrapped/regtested x86_64-linux. Does this look like a reasonable thing to do?

Looking into how expensive the pass is, I think it is very cheap, except that it computes postdominators and updates SSA even if no patterns are matched. I will send a patch to avoid that.

gcc/ChangeLog:

	PR tree-optimization/109811
	PR tree-optimization/109849
	* passes.def: Add phiprop to early optimization passes.
	* tree-ssa-phiprop.cc: Allow cloning.

gcc/testsuite/ChangeLog:

	PR tree-optimization/109811
	PR tree-optimization/109849
	* gcc.dg/tree-ssa/phiprop-1.c: New test.

diff --git a/gcc/passes.def b/gcc/passes.def
index c9a8f19747b..faa5208b26b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -88,6 +88,8 @@ along with GCC; see the file COPYING3.  If not see
 	  /* pass_build_ealias is a dummy pass that ensures that we
 	     execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
+	  /* Do phiprop before FRE so we optimize std::min and std::max well.  */
+	  NEXT_PASS (pass_phiprop);
 	  NEXT_PASS (pass_fre, true /* may_iterate */);
 	  NEXT_PASS (pass_early_vrp);
 	  NEXT_PASS (pass_merge_phi);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c
new file mode 100644
index 00000000000..9f52c2a7298
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-phiprop1-details -fdump-tree-release_ssa" } */
+int max(int a, int b)
+{
+        int *ptr;
+        if (a > b)
+          ptr = &a;
+        else
+          ptr = &b;
+        return *ptr;
+}
+
+/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 "phiprop1"} } */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "release_ssa"} } */
diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
index 3cb4900b6be..5dc505df420 100644
--- a/gcc/tree-ssa-phiprop.cc
+++ b/gcc/tree-ssa-phiprop.cc
@@ -476,6 +476,7 @@ public:
   {}
 
   /* opt_pass methods: */
+  opt_pass * clone () final override { return new pass_phiprop (m_ctxt); }
   bool gate (function *) final override { return flag_tree_phiprop; }
   unsigned int execute (function *) final override;
 