From patchwork Fri Jul 10 23:00:01 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bernhard Reutner-Fischer X-Patchwork-Id: 493888 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 9BFD91402B4 for ; Sat, 11 Jul 2015 09:00:17 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=e0TOtLhR; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; q=dns; s=default; b=OdhRwUSMDMJ2/npuX4 PmLQS0++Mfdrq/9qV8cUSPxRciM2HKYksoqR2yCrzmBBFVD14dI5xHDIj7QYxCMQ KhZVKZtj/zPMBu83NynwONK9YxAdkdRoqlxOeoi5cDdVckq+xqzBgkXsB250i8xq ncpKTlEzAzIudZ4o25oB85X7E= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; s=default; bh=Sdox18evP08qmLGJ36U0q3Mv qn0=; b=e0TOtLhRQvD+FJqzpbFNs7hpDHvpptNgeNTAy13cADxGf/R6nLiai3Oo MurEDxRO+KQYL5WKV1wlmUkf+72kyIbODr6CwKO8PRKACL+Xgw1cR7toee4OTRjB N0yAPxEGZlUvI4rnsXD7/Yi0bMWWuGBOZglX1d8pgAqBQRUD+YQ= Received: (qmail 112912 invoked by alias); 10 Jul 2015 23:00:07 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 112897 invoked by uid 89); 10 Jul 2015 23:00:07 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-lb0-f178.google.com Received: from mail-lb0-f178.google.com (HELO mail-lb0-f178.google.com) (209.85.217.178) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Fri, 10 Jul 2015 23:00:05 +0000 Received: by lbbpo10 with SMTP id po10so99967701lbb.3 for ; Fri, 10 Jul 2015 16:00:01 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.152.224.162 with SMTP id rd2mr21801018lac.43.1436569201671; Fri, 10 Jul 2015 16:00:01 -0700 (PDT) Received: by 10.152.90.230 with HTTP; Fri, 10 Jul 2015 16:00:01 -0700 (PDT) In-Reply-To: <559FBB13.80009@arm.com> References: <559FBB13.80009@arm.com> Date: Sat, 11 Jul 2015 01:00:01 +0200 Message-ID: Subject: Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive From: Bernhard Reutner-Fischer To: Kyrill Tkachov Cc: GCC Patches X-IsSubscribed: yes On 10 July 2015 at 14:31, Kyrill Tkachov wrote: > Hi all, > > This patch makes if-conversion more aggressive when handling code of the > form: > if (test) > x := a //THEN > else > x := b //ELSE > The current code adds the costs of both the THEN and ELSE blocks and proceeds if they don't > exceed the branch cost. I don't think that's quite a right calculation. > We're going to be executing at least one of the basic blocks anyway. > This patch we instead check the *maximum* of the two blocks against the branch cost. > This should still catch cases where a high latency instruction appears in one or both of > the paths. Shouldn't this maximum also take probability into account? Or maybe not, would have to think about it tomorrow. $ contrib/check_GNU_style.sh rtl-ifcvt.00.patch Blocks of 8 spaces should be replaced with tabs. 783:+ return FALSE; Generally ifcvt.c (resp. the whole tree) could use a sed -i -e "s/\([[:space:]]\)FALSE/\1false/g" gcc/ifcvt.c Maybe some of the int predicates could then become bools. +/* Return iff the registers that the insns in BB_A set do not + get used in BB_B. */ Return true iff Did you include go in your testing? I see: Unexpected results in this build (new failures) FAIL: encoding/json FAIL: go/printer FAIL: go/scanner FAIL: html/template FAIL: log FAIL: net/http FAIL: net/http/cgi FAIL: net/http/cookiejar FAIL: os FAIL: text/template bbs_ok_for_cmove_arith() looks costly but i guess you looked if there's some pre-existing cleverness you could have used instead? noce_emit_bb() could use a better comment. Likewise insn_valid_noce_process_p(). insn_rtx_cost() should return an unsigned int, then_cost, else_cost should thus be unsigned int too. copy_of_a versus copy_of_insn_b; I'd shorten the latter. bb_valid_for_noce_process_p() suggests that there is a JOIN_BB param but there is none? Also should document the return value (and should not clobber the OUT params upon failure, no?). As for the testcases, it would be nice to have at least a tiny bit for x86_64, too. PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;) PPS: attached meant to illustrate comments above. Untested. cheers, From 1b7d8f9b61eb538cc4338e2073d04a66518f13c2 Mon Sep 17 00:00:00 2001 From: Bernhard Reutner-Fischer Date: Fri, 10 Jul 2015 21:25:30 +0200 Subject: [PATCH] rtl-ifcvt typos fix some typos on top of Kyrill's patch. should mv the testcases to common ground. --- gcc/ifcvt.c | 49 ++++++++++------------- gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c | 2 +- gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c | 2 +- gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c | 7 ++-- 4 files changed, 27 insertions(+), 33 deletions(-) diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 3d324257..0bf6645 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -1784,8 +1784,7 @@ noce_try_cmove_arith (struct noce_if_info *if_info) /* We're going to execute one of the basic blocks anyway, so bail out if the most expensive of the two blocks is unacceptable. */ - if (MAX (then_cost, else_cost) - > COSTS_N_INSNS (if_info->branch_cost)) + if (MAX (then_cost, else_cost) > COSTS_N_INSNS (if_info->branch_cost)) return FALSE; /* Possibly rearrange operands to make things come out more natural. */ @@ -2730,28 +2729,26 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond, rtx cc = cc_in_cond (cond); if (!insn_valid_noce_process_p (last_insn, cc)) - return FALSE; + return false; last_set = single_set (last_insn); rtx x = SET_DEST (last_set); - rtx_insn *first_insn = first_active_insn (test_bb); rtx first_set = single_set (first_insn); - bool speed_p = optimize_bb_for_speed_p (test_bb); - *cost = insn_rtx_cost (last_set, speed_p); if (!first_set) return false; + /* We have a single simple set, that's okay. */ - else if (first_insn == last_insn) + bool speed_p = optimize_bb_for_speed_p (test_bb); + + if (first_insn == last_insn) { *simple_p = noce_operand_ok (SET_DEST (first_set)); *cost = insn_rtx_cost (first_set, speed_p); return *simple_p; } - *simple_p = false; - rtx_insn *prev_last_insn = PREV_INSN (last_insn); gcc_assert (prev_last_insn); @@ -2764,6 +2761,7 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond, /* The regs that are live out of test_bb. */ bitmap test_bb_live_out = df_get_live_out (test_bb); + int potential_cost = insn_rtx_cost (last_set, speed_p); rtx_insn *insn; FOR_BB_INSNS (test_bb, insn) { @@ -2781,7 +2779,7 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond, if (MEM_P (SET_SRC (sset)) || MEM_P (SET_DEST (sset))) goto free_bitmap_and_fail; - *cost += insn_rtx_cost (sset, speed_p); + potential_cost += insn_rtx_cost (sset, speed_p); bitmap_set_bit (test_bb_temps, REGNO (SET_DEST (sset))); } } @@ -2792,11 +2790,13 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond, goto free_bitmap_and_fail; BITMAP_FREE (test_bb_temps); + *cost = potential_cost; + *simple_p = false; return true; - free_bitmap_and_fail: - BITMAP_FREE (test_bb_temps); - return false; + free_bitmap_and_fail: + BITMAP_FREE (test_bb_temps); + return false; } /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert @@ -2828,21 +2828,14 @@ noce_process_if_block (struct noce_if_info *if_info) to calculate a value for x. ??? For future expansion, look for multiple X in such patterns. */ - bool then_bb_valid - = bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost, - &if_info->then_simple); - - bool else_bb_valid = false; - if (else_bb) - else_bb_valid - = bb_valid_for_noce_process_p (else_bb, cond, &if_info->else_cost, - &if_info->else_simple); - - if (!then_bb_valid) - return FALSE; + if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost, + &if_info->then_simple)) + return false; - if (else_bb && !else_bb_valid) - return FALSE; + if (else_bb + && ! bb_valid_for_noce_process_p (else_bb, cond, &if_info->else_cost, + &if_info->else_simple)) + return false; insn_a = last_active_insn (then_bb, FALSE); set_a = single_set (insn_a); @@ -2866,7 +2859,7 @@ noce_process_if_block (struct noce_if_info *if_info) gcc_assert (set_b); if (!rtx_interchangeable_p (x, SET_DEST (set_b))) - return FALSE; + return FALSE; } else { diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c index 1836f57..92bc17a 100644 --- a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c +++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */ /* { dg-options "-fdump-rtl-ce1 -O2" } */ int diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c index 8c48270..e0e1728 100644 --- a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c +++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */ /* { dg-options "-fdump-rtl-ce1 -O2" } */ diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c index 1aecbc9..0441357 100644 --- a/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c +++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c @@ -1,5 +1,5 @@ -/* { dg-do compile } */ -/* { dg-options "-save-temps -O2" } */ +/* { dg-do assemble { target aarch64*-*-* x86_64-*-* } } */ +/* { dg-options "-fdump-rtl-ce1 -O2" } */ typedef long long s64; @@ -16,4 +16,5 @@ foo (s64 a, s64 b, s64 c) } /* This test can be reduced to just return a + c; */ -/* { dg-final { scan-assembler-not "sub\.*\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+\.*" } } */ +/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */ +/* { dg-final { scan-assembler-not "sub\.*\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+\.*" { target { aarch64*-*-* } } } } */ -- 1.7.10.4