From patchwork Mon Oct 7 09:20:33 2019
Date: Mon, 7 Oct 2019 11:20:33 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Cc: Jan Hubicka
Subject: [PATCH] Fix PR91975, tame PRE some more

The following tries to address the issue that PRE is quite happy to
introduce new IVs in loops just because it can compute some constant
value on the loop entry edge.  In principle there's already code that
should work against that, but it simply searches for an
optimize_edge_for_speed_p () edge.  That still considers the loop
entry edge worth optimizing because for -O2 it ends up as
maybe_hot_edge_p (e), which compares the edge count against the entry
block count.
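
To illustrate, here is a reduced sketch of the kind of loop this is
about (in the spirit of the testcase below, not the PR reproducer
itself):

  const int a[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  int b[sizeof a / sizeof *a];

  void
  copy (void)
  {
    /* a[i] at i == 0 is a known constant, so PRE can insert that
       constant on the loop entry edge and carry the loaded value in
       a new IV to make the load partially redundant -- which then
       keeps ldist from turning the loop into a memcpy.  */
    for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
      b[i] = a[i];
  }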
For PRE we want something more local than maybe_hot_edge_p, comparing
against the destination block count instead.  For the simple
testcases this shouldn't make a difference, but the hot/cold
machinery uses PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION), which isn't
the same as profile_probability's likely or very_likely...

Still, one key part of the patch is that we compare the sum of the
edge counts on which the value is available (and thus the redundancy
elimination happens) with the count of the block we have to insert
into, rather than looking for a single optimize_edge_for_speed_p
edge.  For that I've used

  if (avail_count < block->count.apply_probability
		      (profile_probability::unlikely ()))

so we block the insertion if the redundancies would overall be
"unlikely".  (A back-of-the-envelope model of this check follows the
patch below.)

I'm also not sure why maybe_hot_count_p uses
HOT_BB_FREQUENCY_FRACTION while there exists HOT_BB_COUNT_FRACTION
(with a ten-fold larger default value), which seems the better match
for scaling a profile_count?  Honza?

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Does the above predicate look sane, or am I on the wrong track with
using the destination block count here?  (I realize even the "locally
cold" entries into the block might be quite hot globally.)  For a 1:1
translation of the existing code to something using the original
predicate but summing over edges I could use
!maybe_hot_count_p (cfun, avail_count)?  But then we're back to PRE
doing the unwanted insertions.  Changing maybe_hot_count_p to use
HOT_BB_COUNT_FRACTION makes no difference there (obviously).

Thanks,
Richard.

2019-10-06  Richard Biener

	PR tree-optimization/91975
	* tree-ssa-pre.c (do_pre_regular_insertion): Adjust profitability
	check to use the sum of all edge counts the value is available
	on and check against unlikely execution of the block.

	* gcc.dg/tree-ssa/ldist-39.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
new file mode 100644
index 00000000000..a63548979ea
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ldist-details" } */
+
+#define T int
+
+const T a[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
+T b[sizeof a / sizeof *a];
+
+void f0 (void)
+{
+  const T *s = a;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+    d[i] = s[i];
+}
+
+void g0 (void)
+{
+  const T *s = a;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+    *d++ = *s++;
+}
+
+extern const T c[sizeof a / sizeof *a];
+
+void f1 (void)
+{
+  const T *s = c;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+    d[i] = s[i];
+}
+
+void g1 (void)
+{
+  const T *s = c;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+    *d++ = *s++;
+}
+
+/* { dg-final { scan-tree-dump-times "generated memcpy" 4 "ldist" } } */
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index c618601a184..af49ba388c1 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -3195,7 +3195,7 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
       pre_expr eprime = NULL;
       edge_iterator ei;
       pre_expr edoubleprime = NULL;
-      bool do_insertion = false;
+      profile_count avail_count = profile_count::zero ();
 
       val = get_expr_value_id (expr);
       if (bitmap_set_contains_value (PHI_GEN (block), val))
@@ -3250,10 +3250,7 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
 	    {
 	      avail[pred->dest_idx] = edoubleprime;
 	      by_some = true;
-	      /* We want to perform insertions to remove a redundancy on
-		 a path in the CFG we want to optimize for speed.  */
-	      if (optimize_edge_for_speed_p (pred))
-		do_insertion = true;
+	      avail_count += pred->count ();
 	      if (first_s == NULL)
 		first_s = edoubleprime;
 	      else if (!pre_expr_d::equal (first_s, edoubleprime))
@@ -3266,7 +3263,11 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
 	 partially redundant.  */
       if (!cant_insert && !all_same && by_some)
 	{
-	  if (!do_insertion)
+	  /* We want to perform insertions to remove a redundancy on
+	     a path in the CFG that is somewhat likely.  Avoid inserting
+	     when we'd only remove a redundancy on unlikely paths.  */
+	  if (avail_count < block->count.apply_probability
+			      (profile_probability::unlikely ()))
 	    {
 	      if (dump_file && (dump_flags & TDF_DETAILS))
 		{
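
For reference, a back-of-the-envelope model of the new check, with
plain integers standing in for profile_count; the 1/50 "unlikely"
fraction is a made-up stand-in for profile_probability::unlikely (),
not GCC's actual value:

  #include <stdint.h>
  #include <stdbool.h>

  /* Hypothetical stand-in for profile_probability::unlikely (),
     expressed as a fraction of the destination block count.  */
  #define UNLIKELY_NUM 1
  #define UNLIKELY_DEN 50

  static bool
  insertion_profitable_p (const uint64_t *edge_counts,
			  unsigned n_avail, uint64_t block_count)
  {
    /* Sum the counts of the predecessor edges on which the value
       is available, i.e. where redundancy elimination happens.  */
    uint64_t avail_count = 0;
    for (unsigned i = 0; i < n_avail; ++i)
      avail_count += edge_counts[i];

    /* Model of
	 avail_count < block->count.apply_probability
			 (profile_probability::unlikely ())
       i.e. block the insertion when the summed redundancies are
       only an "unlikely" fraction of the block's count.  */
    return avail_count >= block_count * UNLIKELY_NUM / UNLIKELY_DEN;
  }

So with available edge counts { 2, 3 } into a block executed 10000
times the insertion is rejected (5 < 200 under the assumed fraction),
while a single available edge with count 1000 would allow it.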