From: Ilya Leoshkevich
To: gcc-patches@gcc.gnu.org
Cc: krebbel@linux.ibm.com, rdapp@linux.ibm.com, segher@kernel.crashing.org, Ilya Leoshkevich
Subject: [PATCH v4] Repeat jump threading after combine
Date: Mon, 26 Nov 2018 13:11:40 +0100
Message-Id: <20181126121140.7176-1-iii@linux.ibm.com>
X-Patchwork-Submitter: Ilya Leoshkevich
X-Patchwork-Id: 1003214

Bootstrapped and regtested on x86_64-redhat-linux, s390x-redhat-linux and
ppc64le-redhat-linux.

Previous iteration:
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00495.html

In the end, the main question was: does this make the code better on
architectures other than s390?
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00993.html

Not sure whether it's already too late for this one, but I'd like to at
least post the updated code, my observations and SPEC CPU results.

- Code size decreases in most cases.
  In general, the main side effect of this patch is that, after jump
  threading, the bbro pass builds different traces and reorders and
  merges basic blocks differently:

  # x86_64-redhat-linux:
  436.cactusADM     274479 insns  -528 smaller  # maximum decrease
  526.blender_r    2773303 insns  -203 smaller
  502.gcc_r        2262388 insns  -142 smaller
  403.gcc           815367 insns  -106 smaller
  ...
  525.x264_r        174450 insns   +10 bigger   # maximum increase

  # ppc64le-redhat-linux:
  526.blender_r    3422613 insns  -276 smaller  # maximum decrease
  521.wrf_r        6008722 insns  -228 smaller
  520.omnetpp_r     612626 insns   -52 smaller
  ...
  435.gromacs       338597 insns   +16 bigger   # maximum increase

- Compilation performance does not seem to have been affected in a
  measurable way.  According to -ftime-report, the total user time of
  the SPEC CPU build used to be 26018s, and now it is 25985s, the
  difference being -0.12%.

- Run time differences are all over the place:

  # x86_64-redhat-linux:
  548.exchange2_r   -1.82%
  541.leela_r       -1.59%
  538.imagick_r     -0.95%
  520.omnetpp_r     -0.94%
  403.gcc           -0.76%
  447.dealII        -0.58%
  526.blender_r     -0.56%
  450.soplex        -0.51%
  # skip |dt| < 0.5%
  523.xalancbmk_r   +0.52%
  416.gamess        +0.61%
  503.bwaves_r      +0.62%
  445.gobmk         +0.66%
  456.hmmer         +0.70%
  549.fotonik3d_r   +0.74%
  471.omnetpp       +0.99%
  459.GemsFDTD      +1.09%
  554.roms_r        +1.30%
  500.perlbench_r   +1.56%
  483.xalancbmk     +1.60%

  # ppc64le-redhat-linux:
  511.povray_r      -1.29%
  482.sphinx3       -0.65%
  456.hmmer         -0.53%
  519.lbm_r         -0.51%
  # skip |dt| < 0.5%
  549.fotonik3d_r   +1.13%
  403.gcc           +1.76%
  500.perlbench_r   +2.35%

I've investigated the 483.xalancbmk and 500.perlbench_r regressions on
x86_64.  Even though the total 483.xalancbmk size slightly decreases, we
get 4% more icache misses and, as a result, 25% more stalls.  I couldn't
pin this down to a particular function or line of code; could it be due
to generally worsened locality?  500.perlbench_r has 25% more indirect
branch mispredicts, particularly when perl_run ends up calling
Perl_pp_rv2av, Perl_pp_gvsv and Perl_pp_nextstate.
I have to admit I don't know what could have caused that.

Consider the following RTL:

(insn (set (reg 65) (if_then_else (eq %cc 0) 1 0)))
(insn (parallel [(set %cc (compare (reg 65) 0)) (clobber %scratch)]))
(jump_insn (set %pc (if_then_else (ne %cc 0) (label_ref 23) %pc)))

Combine simplifies this into:

(note NOTE_INSN_DELETED)
(note NOTE_INSN_DELETED)
(jump_insn (set %pc (if_then_else (eq %cc 0) (label_ref 23) %pc)))

opening up the possibility to perform jump threading.

gcc/ChangeLog:

2018-09-19  Ilya Leoshkevich

	PR target/80080
	* cfgcleanup.c (class pass_postreload_jump): New pass.
	(pass_postreload_jump::execute): Likewise.
	(make_pass_postreload_jump): Likewise.
	* passes.def: Add pass_postreload_jump before
	pass_postreload_cse.
	* tree-pass.h (make_pass_postreload_jump): New pass.

gcc/testsuite/ChangeLog:

2018-09-05  Ilya Leoshkevich

	PR target/80080
	* gcc.target/s390/pr80080-4.c: New test.
---
 gcc/cfgcleanup.c                          | 42 +++++++++++++++++++++++
 gcc/passes.def                            |  1 +
 gcc/testsuite/gcc.target/s390/pr80080-4.c | 16 +++++++++
 gcc/tree-pass.h                           |  1 +
 4 files changed, 60 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr80080-4.c

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 4a5dc29d14f..bc4a78889db 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -3259,6 +3259,48 @@ make_pass_jump (gcc::context *ctxt)
 
 namespace {
 
+const pass_data pass_data_postreload_jump =
+{
+  RTL_PASS, /* type */
+  "postreload_jump", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_JUMP, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_postreload_jump : public rtl_opt_pass
+{
+public:
+  pass_postreload_jump (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_postreload_jump, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual unsigned int execute (function *);
+
+}; // class pass_postreload_jump
+
+unsigned int
+pass_postreload_jump::execute (function *)
+{
+  cleanup_cfg (flag_thread_jumps ? CLEANUP_THREADING : 0);
+  return 0;
+}
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_postreload_jump (gcc::context *ctxt)
+{
+  return new pass_postreload_jump (ctxt);
+}
+
+namespace {
+
 const pass_data pass_data_jump2 =
 {
   RTL_PASS, /* type */
diff --git a/gcc/passes.def b/gcc/passes.def
index 82ad9404b9e..0079fecef32 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -458,6 +458,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_reload);
       NEXT_PASS (pass_postreload);
       PUSH_INSERT_PASSES_WITHIN (pass_postreload)
+          NEXT_PASS (pass_postreload_jump);
           NEXT_PASS (pass_postreload_cse);
           NEXT_PASS (pass_gcse2);
           NEXT_PASS (pass_split_after_reload);
diff --git a/gcc/testsuite/gcc.target/s390/pr80080-4.c b/gcc/testsuite/gcc.target/s390/pr80080-4.c
new file mode 100644
index 00000000000..5fc6a558008
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/pr80080-4.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { lp64 } } } */
+/* { dg-options "-march=z196 -O2" } */
+
+extern void bar(int *mem);
+
+void foo4(int *mem)
+{
+  int oldval = 0;
+  if (!__atomic_compare_exchange_n (mem, (void *) &oldval, 1,
+                                    1, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    {
+      bar (mem);
+    }
+}
+
+/* { dg-final { scan-assembler {(?n)\n\tlt\t.*\n\tjne\t(\.L\d+)\n(.*\n)*\tcs\t.*\n\tber\t%r14\n\1:\n\tjg\tbar\n} } } */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 2f8779ee4b8..b20d34c15e9 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -579,6 +579,7 @@ extern rtl_opt_pass *make_pass_clean_state (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_branch_prob (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_value_profile_transformations (gcc::context
                                                               *ctxt);
+extern rtl_opt_pass *make_pass_postreload_jump (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_postreload_cse (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_gcse2 (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_split_after_reload (gcc::context *ctxt);