From patchwork Sun Nov 17 23:35:26 2019
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1196471
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: Add a new combine pass
Date: Sun, 17 Nov 2019 23:35:26 +0000

(It's 23:35 local time, so it's still just about stage 1.
:-))

While working on SVE, I've noticed several cases in which we fail to combine instructions because the combined form would need to be placed earlier in the instruction stream than the last of the instructions being combined.  This includes one very important case in the handling of the first fault register (FFR).

Combine currently requires the combined instruction to live at the same location as i3.  I thought about trying to relax that restriction, but it would be difficult to do with the current pass structure while keeping everything linear-ish time.

So this patch instead goes for an option that has been talked about several times over the years: writing a new combine pass that just does instruction combination, and not all the other optimisations that have been bolted onto combine over time.  E.g. it deliberately doesn't do things like nonzero-bits tracking, since that really ought to be a separate, more global, optimisation.

This is still far from being a realistic replacement for even the combine parts of the current combine pass.  E.g.:

- it only handles combinations that can be built up from individual
  two-instruction combinations.

- it doesn't allow new hard register clobbers to be added.

- it doesn't have the special treatment of CC operations.

- etc.

But we have to start somewhere.

On a more positive note, the pass handles things that the current combine pass doesn't:

- the main motivating feature mentioned above: it works out where the
  combined instruction could validly live and moves it there if necessary.
  If there is a range of valid places, it tries to pick the best one
  based on register pressure (although only with a simple heuristic
  for now).

- once it has combined two instructions, it can try combining the result
  with both later and earlier code, i.e. it can combine in both directions.

- it tries using REG_EQUAL notes for the final instruction.

- it can parallelise two independent instructions that both read from the
  same register or both read from memory.

This last feature is useful for generating more load-pair combinations on AArch64.  In some cases it can also produce more store-pair combinations, but only for consecutive stores.  However, since the pass currently does this in a very greedy, peephole way, it only allows load/store-pair combinations if the first memory access has a higher alignment than the second, i.e. if we can be sure that the combined access is naturally aligned.  This should help it to make better decisions than the post-RA peephole pass in some cases while not being too aggressive.

The pass is supposed to be linear time without debug insns.  It only tries a constant number C of combinations per instruction and its bookkeeping updates are constant-time.  Once it has combined two instructions, it'll try up to C combinations on the result, but this can be counted against the instruction that was deleted by the combination and so effectively just doubles the constant.  (Note that C depends on MAX_RECOG_OPERANDS and the new NUM_RANGE_USERS constant.)

Unfortunately, debug updates via propagate_for_debug are more expensive.  This could probably be fixed if the pass did more to track debug insns itself, but using propagate_for_debug matches combine's behaviour.

The patch adds two instances of the new pass: one before combine and one after it.
By default both are disabled, but this can be changed using the new 3-bit run-combine param, where:

- bit 0 selects the new pre-combine pass
- bit 1 selects the main combine pass
- bit 2 selects the new post-combine pass

The idea is that run-combine=3 can be used to see which combinations are missed by the new pass, while run-combine=6 (which I hope to be the production setting for AArch64 at -O2+) just uses the new pass to mop up cases that normal combine misses.  Maybe in some distant future, the pass will be good enough for run-combine=[14] to be a realistic option.

I ended up having to add yet another validate_simplify_* routine, this time to do the equivalent of:

  newx = simplify_replace_rtx (*loc, old_rtx, new_rtx);
  validate_change (insn, loc, newx, 1);

but in a more memory-efficient way.  validate_replace_rtx isn't suitable because it deliberately only tries simplifications in limited cases:

  /* Do changes needed to keep rtx consistent.  Don't do any other
     simplifications, as it is not our job.  */

And validate_simplify_insn isn't useful for this case because it works on patterns that have already had changes made to them and expects those patterns to be valid rtxes.  simplify-replace operations instead need to simplify as they go, when the original modes are still to hand.

As far as compile-time goes, I tried compiling optabs.ii at -O2 with an --enable-checking=release compiler:

  run-combine=2 (normal combine):  100.0% (baseline)
  run-combine=4 (new pass only)     98.0%
  run-combine=6 (both passes)      100.3%

where the results are easily outside the noise.  So the pass on its own is quicker than combine, but that's not a fair comparison when it doesn't do everything combine does.  Running both passes only has a slight overhead.

To get a feel for the effect on multiple targets, I did my usual bogo-comparison of number of lines of asm for gcc.c-torture, gcc.dg and g++.dg, this time comparing run-combine=2 and run-combine=6 using -O2 -ftree-vectorize:

Target                 Tests   Delta    Best   Worst  Median
======                 =====   =====    ====   =====  ======
aarch64-linux-gnu       3974  -39393   -2275      90      -2
aarch64_be-linux-gnu    3389  -36683   -2275     165      -2
alpha-linux-gnu         4154  -62860   -2132     335      -2
amdgcn-amdhsa           4818    9079   -7987   51850      -2
arc-elf                 2868  -63710  -18998     286      -1
arm-linux-gnueabi       4053  -80404  -10019     605      -2
arm-linux-gnueabihf     4053  -80404  -10019     605      -2
avr-elf                 3620   38513   -2386   23364       2
bfin-elf                2691  -32973   -1483    1127      -2
bpf-elf                 5581  -78105  -11064     113      -3
c6x-elf                 3915  -31710   -2441    1560      -2
cr16-elf                6030  192102   -1757   60009      12
cris-elf                2217  -30794   -1716     294      -2
csky-elf                2003  -24989   -9999    1468      -2
epiphany-elf            3345  -19416   -1803    4594      -2
fr30-elf                3562  -15077   -1921    2334      -1
frv-linux-gnu           2423  -16589   -1736     999      -1
ft32-elf                2246  -46337  -15988     433      -2
h8300-elf               2581  -33553   -1403     168      -2
hppa64-hp-hpux11.23     3926 -120876  -50134    1056      -2
i686-apple-darwin       3562  -46851   -1764     310      -2
i686-pc-linux-gnu       2902   -3639   -4809    6848      -2
ia64-linux-gnu          2900 -158870  -14006     428      -7
iq2000-elf              2929  -54690   -2904    2576      -3
lm32-elf                5265  162519   -1918    8004       5
m32r-elf                1861  -25296   -2713    1004      -2
m68k-linux-gnu          2520 -241573  -21879     200      -3
mcore-elf               2378  -28532   -1810    1635      -2
microblaze-elf          2782 -137363   -9516    1986      -2
mipsel-linux-gnu        2443  -38422   -8331     458      -1
mipsisa64-linux-gnu     2287  -60294  -12214     432      -2
mmix                    4910 -136549  -13616     599      -2
mn10300-elf             2944  -29151   -2488     132      -1
moxie-rtems             1935  -12364   -1002     125      -1
msp430-elf              2379  -37007   -2163     176      -2
nds32le-elf             2356  -27551   -2126     163      -1
nios2-linux-gnu         1572  -44828  -23613      92      -2
nvptx-none              1014  -17337   -1590      16      -3
or1k-elf                2724  -92816  -14144      56      -3
pdp11                   1897  -27296   -1370     534      -2
powerpc-ibm-aix7.0      2909  -58829  -10026    2001      -2
powerpc64-linux-gnu     3685  -60551  -12158    2001      -1
powerpc64le-linux-gnu   3501  -61846  -10024     765      -2
pru-elf                 1574  -29734  -19998    1718      -1
riscv32-elf             2357  -22506  -10002   10175      -1
riscv64-elf             3320  -56777  -10002     226      -2
rl78-elf                2113 -232328  -18607    4065      -3
rx-elf                  2800  -38515    -896     491      -2
s390-linux-gnu          3582  -75626  -12098    3999      -2
s390x-linux-gnu         3761  -73473  -13748    3999      -2
sh-linux-gnu            2350  -26401   -1003     522      -2
sparc-linux-gnu         3279  -49518   -2175    2223      -2
sparc64-linux-gnu       3849 -123084  -30200    2141      -2
tilepro-linux-gnu       2737  -35562   -3458    2848      -2
v850-elf                9002 -169126  -49996      76      -4
vax-netbsdelf           3325  -57734  -10000    1989      -2
visium-elf              1860  -17006   -1006    1066      -2
x86_64-darwin           3278  -48933   -9999    1408      -2
x86_64-linux-gnu        3008  -43887   -9999    3248      -2
xstormy16-elf           2497  -26569   -2051      89      -2
xtensa-elf              2161  -31231   -6910     138      -2

So running both passes does seem to have a significant benefit on most targets, but there are some nasty-looking outliers.  The usual caveat applies: number of lines is a very poor measurement; it's just to get a feel.

Bootstrapped & regression-tested on aarch64-linux-gnu and x86_64-linux-gnu, both with run-combine=3 as the default (so that the new pass runs first) and with run-combine=6 as the default (so that the new pass runs second).  There were no new execution failures.  A couple of guality.exp tests that already failed for most options started failing for a couple more.

Enabling the pass fixes the XFAILs in:

  gcc.target/aarch64/sve/acle/general/ptrue_pat_[234].c

Inevitably there was some scan-assembler fallout for other tests.  E.g. in gcc.target/aarch64/vmov_n_1.c:

  #define INHIB_OPTIMIZATION asm volatile ("" : : : "memory")
  ...
    INHIB_OPTIMIZATION;						\
    (a) = TEST (test, data_len);				\
    INHIB_OPTIMIZATION;						\
    (b) = VMOV_OBSCURE_INST (reg_len, data_len, data_type) (&(a));	\

is no longer effective at preventing the move to (a) from being merged into (b), because the pass can do the merge at the point of (a).  I think this is a valid thing to do -- the asm semantics are still satisfied, and asm volatile ("" : : : "memory") never acted as a register barrier.  But perhaps we should deal with this as a special case?

Richard


2019-11-17  Richard Sandiford

gcc/
	* Makefile.in (OBJS): Add combine2.o.
	* params.opt (--param=run-combine): New option.
	* doc/invoke.texi: Document it.
	* tree-pass.h (make_pass_combine2_before): Declare.
	(make_pass_combine2_after): Likewise.
	* passes.def: Add them.
	* timevar.def (TV_COMBINE2): New timevar.
	* cfgrtl.h (update_cfg_for_uncondjump): Declare.
	* combine.c (update_cfg_for_uncondjump): Move to...
	* cfgrtl.c (update_cfg_for_uncondjump): ...here.
	* simplify-rtx.c (simplify_truncation): Handle comparisons.
	* recog.h (validate_simplify_replace_rtx): Declare.
	* recog.c (validate_simplify_replace_rtx_1): New function.
	(validate_simplify_replace_rtx_uses): Likewise.
	(validate_simplify_replace_rtx): Likewise.
	* combine2.c: New file.
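For anyone who wants to experiment, here's a rough sketch of how the passes could be exercised from the command line once the patch is applied.  The test.c filename is just a placeholder; the --param=run-combine values follow the bit assignments described above (2 is the default, i.e. the main combine pass only):

  # Baseline: only the existing combine pass (the default setting).
  gcc -O2 --param=run-combine=2 -S test.c -o test-combine.s

  # New pass before combine, to see which combinations the new pass misses.
  gcc -O2 --param=run-combine=3 -S test.c -o test-pre.s

  # New pass after combine, to mop up cases that normal combine misses.
  gcc -O2 --param=run-combine=6 -S test.c -o test-post.s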
Index: gcc/Makefile.in =================================================================== --- gcc/Makefile.in 2019-11-14 14:34:27.599783740 +0000 +++ gcc/Makefile.in 2019-11-17 23:15:31.188500613 +0000 @@ -1261,6 +1261,7 @@ OBJS = \ cgraphunit.o \ cgraphclones.o \ combine.o \ + combine2.o \ combine-stack-adj.o \ compare-elim.o \ context.o \ Index: gcc/params.opt =================================================================== --- gcc/params.opt 2019-11-14 14:34:26.339792215 +0000 +++ gcc/params.opt 2019-11-17 23:15:31.200500531 +0000 @@ -768,6 +768,10 @@ Use internal function id in profile look Common Joined UInteger Var(param_rpo_vn_max_loop_depth) Init(7) IntegerRange(2, 65536) Param Maximum depth of a loop nest to fully value-number optimistically. +-param=run-combine= +Target Joined UInteger Var(param_run_combine) Init(2) IntegerRange(0, 7) Param +Choose which of the 3 available combine passes to run: bit 1 for the main combine pass, bit 0 for an earlier variant of the combine pass, and bit 2 for a later variant of the combine pass. + -param=sccvn-max-alias-queries-per-access= Common Joined UInteger Var(param_sccvn_max_alias_queries_per_access) Init(1000) Param Maximum number of disambiguations to perform per memory access. Index: gcc/doc/invoke.texi =================================================================== --- gcc/doc/invoke.texi 2019-11-16 10:43:45.597105823 +0000 +++ gcc/doc/invoke.texi 2019-11-17 23:15:31.200500531 +0000 @@ -11807,6 +11807,11 @@ in combiner for a pseudo register as las @item max-combine-insns The maximum number of instructions the RTL combiner tries to combine. +@item run-combine +Choose which of the 3 available combine passes to run: bit 1 for the main +combine pass, bit 0 for an earlier variant of the combine pass, and bit 2 +for a later variant of the combine pass. + @item integer-share-limit Small integer constants can use a shared data structure, reducing the compiler's memory usage and increasing its speed. This sets the maximum Index: gcc/tree-pass.h =================================================================== --- gcc/tree-pass.h 2019-10-29 08:29:03.096444049 +0000 +++ gcc/tree-pass.h 2019-11-17 23:15:31.204500501 +0000 @@ -562,7 +562,9 @@ extern rtl_opt_pass *make_pass_reginfo_i extern rtl_opt_pass *make_pass_inc_dec (gcc::context *ctxt); extern rtl_opt_pass *make_pass_stack_ptr_mod (gcc::context *ctxt); extern rtl_opt_pass *make_pass_initialize_regs (gcc::context *ctxt); +extern rtl_opt_pass *make_pass_combine2_before (gcc::context *ctxt); extern rtl_opt_pass *make_pass_combine (gcc::context *ctxt); +extern rtl_opt_pass *make_pass_combine2_after (gcc::context *ctxt); extern rtl_opt_pass *make_pass_if_after_combine (gcc::context *ctxt); extern rtl_opt_pass *make_pass_jump_after_combine (gcc::context *ctxt); extern rtl_opt_pass *make_pass_ree (gcc::context *ctxt); Index: gcc/passes.def =================================================================== --- gcc/passes.def 2019-10-29 08:29:03.224443133 +0000 +++ gcc/passes.def 2019-11-17 23:15:31.200500531 +0000 @@ -437,7 +437,9 @@ along with GCC; see the file COPYING3. 
NEXT_PASS (pass_inc_dec); NEXT_PASS (pass_initialize_regs); NEXT_PASS (pass_ud_rtl_dce); + NEXT_PASS (pass_combine2_before); NEXT_PASS (pass_combine); + NEXT_PASS (pass_combine2_after); NEXT_PASS (pass_if_after_combine); NEXT_PASS (pass_jump_after_combine); NEXT_PASS (pass_partition_blocks); Index: gcc/timevar.def =================================================================== --- gcc/timevar.def 2019-10-11 15:43:53.403498517 +0100 +++ gcc/timevar.def 2019-11-17 23:15:31.204500501 +0000 @@ -251,6 +251,7 @@ DEFTIMEVAR (TV_AUTO_INC_DEC , " DEFTIMEVAR (TV_CSE2 , "CSE 2") DEFTIMEVAR (TV_BRANCH_PROB , "branch prediction") DEFTIMEVAR (TV_COMBINE , "combiner") +DEFTIMEVAR (TV_COMBINE2 , "second combiner") DEFTIMEVAR (TV_IFCVT , "if-conversion") DEFTIMEVAR (TV_MODE_SWITCH , "mode switching") DEFTIMEVAR (TV_SMS , "sms modulo scheduling") Index: gcc/cfgrtl.h =================================================================== --- gcc/cfgrtl.h 2019-03-08 18:15:39.320730391 +0000 +++ gcc/cfgrtl.h 2019-11-17 23:15:31.192500584 +0000 @@ -47,6 +47,7 @@ extern void fixup_partitions (void); extern bool purge_dead_edges (basic_block); extern bool purge_all_dead_edges (void); extern bool fixup_abnormal_edges (void); +extern void update_cfg_for_uncondjump (rtx_insn *); extern rtx_insn *unlink_insn_chain (rtx_insn *, rtx_insn *); extern void relink_block_chain (bool); extern rtx_insn *duplicate_insn_chain (rtx_insn *, rtx_insn *); Index: gcc/combine.c =================================================================== --- gcc/combine.c 2019-11-13 08:42:45.537368745 +0000 +++ gcc/combine.c 2019-11-17 23:15:31.192500584 +0000 @@ -2530,42 +2530,6 @@ reg_subword_p (rtx x, rtx reg) && GET_MODE_CLASS (GET_MODE (x)) == MODE_INT; } -/* Delete the unconditional jump INSN and adjust the CFG correspondingly. - Note that the INSN should be deleted *after* removing dead edges, so - that the kept edge is the fallthrough edge for a (set (pc) (pc)) - but not for a (set (pc) (label_ref FOO)). */ - -static void -update_cfg_for_uncondjump (rtx_insn *insn) -{ - basic_block bb = BLOCK_FOR_INSN (insn); - gcc_assert (BB_END (bb) == insn); - - purge_dead_edges (bb); - - delete_insn (insn); - if (EDGE_COUNT (bb->succs) == 1) - { - rtx_insn *insn; - - single_succ_edge (bb)->flags |= EDGE_FALLTHRU; - - /* Remove barriers from the footer if there are any. */ - for (insn = BB_FOOTER (bb); insn; insn = NEXT_INSN (insn)) - if (BARRIER_P (insn)) - { - if (PREV_INSN (insn)) - SET_NEXT_INSN (PREV_INSN (insn)) = NEXT_INSN (insn); - else - BB_FOOTER (bb) = NEXT_INSN (insn); - if (NEXT_INSN (insn)) - SET_PREV_INSN (NEXT_INSN (insn)) = PREV_INSN (insn); - } - else if (LABEL_P (insn)) - break; - } -} - /* Return whether PAT is a PARALLEL of exactly N register SETs followed by an arbitrary number of CLOBBERs. */ static bool @@ -15096,7 +15060,10 @@ const pass_data pass_data_combine = {} /* opt_pass methods: */ - virtual bool gate (function *) { return (optimize > 0); } + virtual bool gate (function *) + { + return optimize > 0 && (param_run_combine & 2) != 0; + } virtual unsigned int execute (function *) { return rest_of_handle_combine (); Index: gcc/cfgrtl.c =================================================================== --- gcc/cfgrtl.c 2019-10-17 14:22:55.523309009 +0100 +++ gcc/cfgrtl.c 2019-11-17 23:15:31.188500613 +0000 @@ -3409,6 +3409,42 @@ fixup_abnormal_edges (void) return inserted; } +/* Delete the unconditional jump INSN and adjust the CFG correspondingly. 
+ Note that the INSN should be deleted *after* removing dead edges, so + that the kept edge is the fallthrough edge for a (set (pc) (pc)) + but not for a (set (pc) (label_ref FOO)). */ + +void +update_cfg_for_uncondjump (rtx_insn *insn) +{ + basic_block bb = BLOCK_FOR_INSN (insn); + gcc_assert (BB_END (bb) == insn); + + purge_dead_edges (bb); + + delete_insn (insn); + if (EDGE_COUNT (bb->succs) == 1) + { + rtx_insn *insn; + + single_succ_edge (bb)->flags |= EDGE_FALLTHRU; + + /* Remove barriers from the footer if there are any. */ + for (insn = BB_FOOTER (bb); insn; insn = NEXT_INSN (insn)) + if (BARRIER_P (insn)) + { + if (PREV_INSN (insn)) + SET_NEXT_INSN (PREV_INSN (insn)) = NEXT_INSN (insn); + else + BB_FOOTER (bb) = NEXT_INSN (insn); + if (NEXT_INSN (insn)) + SET_PREV_INSN (NEXT_INSN (insn)) = PREV_INSN (insn); + } + else if (LABEL_P (insn)) + break; + } +} + /* Cut the insns from FIRST to LAST out of the insns stream. */ rtx_insn * Index: gcc/simplify-rtx.c =================================================================== --- gcc/simplify-rtx.c 2019-11-16 15:33:36.642840131 +0000 +++ gcc/simplify-rtx.c 2019-11-17 23:15:31.204500501 +0000 @@ -851,6 +851,12 @@ simplify_truncation (machine_mode mode, && trunc_int_for_mode (INTVAL (XEXP (op, 1)), mode) == -1) return constm1_rtx; + /* (truncate:A (cmp X Y)) is (cmp:A X Y): we can compute the result + in a narrower mode if useful. */ + if (COMPARISON_P (op)) + return simplify_gen_relational (GET_CODE (op), mode, VOIDmode, + XEXP (op, 0), XEXP (op, 1)); + return NULL_RTX; } Index: gcc/recog.h =================================================================== --- gcc/recog.h 2019-09-09 18:58:28.860430363 +0100 +++ gcc/recog.h 2019-11-17 23:15:31.204500501 +0000 @@ -111,6 +111,7 @@ extern int validate_replace_rtx_part_nos extern void validate_replace_rtx_group (rtx, rtx, rtx_insn *); extern void validate_replace_src_group (rtx, rtx, rtx_insn *); extern bool validate_simplify_insn (rtx_insn *insn); +extern bool validate_simplify_replace_rtx (rtx_insn *, rtx *, rtx, rtx); extern int num_changes_pending (void); extern int next_insn_tests_no_inequality (rtx_insn *); extern bool reg_fits_class_p (const_rtx, reg_class_t, int, machine_mode); Index: gcc/recog.c =================================================================== --- gcc/recog.c 2019-10-01 09:55:35.150088599 +0100 +++ gcc/recog.c 2019-11-17 23:15:31.204500501 +0000 @@ -922,6 +922,226 @@ validate_simplify_insn (rtx_insn *insn) } return ((num_changes_pending () > 0) && (apply_change_group () > 0)); } + +/* A subroutine of validate_simplify_replace_rtx. Apply the replacement + described by R to LOC. Return true on success; leave the caller + to clean up on failure. */ + +static bool +validate_simplify_replace_rtx_1 (validate_replace_src_data &r, rtx *loc) +{ + rtx x = *loc; + enum rtx_code code = GET_CODE (x); + machine_mode mode = GET_MODE (x); + + if (rtx_equal_p (x, r.from)) + { + validate_unshare_change (r.insn, loc, r.to, 1); + return true; + } + + /* Recursively apply the substitution and see if we can simplify + the result. This specifically shouldn't use simplify_gen_*, + since we want to avoid generating new expressions where possible. 
*/ + int old_num_changes = num_validated_changes (); + rtx newx = NULL_RTX; + bool recurse_p = false; + switch (GET_RTX_CLASS (code)) + { + case RTX_UNARY: + { + machine_mode op0_mode = GET_MODE (XEXP (x, 0)); + if (!validate_simplify_replace_rtx_1 (r, &XEXP (x, 0))) + return false; + + newx = simplify_unary_operation (code, mode, XEXP (x, 0), op0_mode); + break; + } + + case RTX_BIN_ARITH: + case RTX_COMM_ARITH: + { + if (!validate_simplify_replace_rtx_1 (r, &XEXP (x, 0)) + || !validate_simplify_replace_rtx_1 (r, &XEXP (x, 1))) + return false; + + newx = simplify_binary_operation (code, mode, + XEXP (x, 0), XEXP (x, 1)); + break; + } + + case RTX_COMPARE: + case RTX_COMM_COMPARE: + { + machine_mode op_mode = (GET_MODE (XEXP (x, 0)) != VOIDmode + ? GET_MODE (XEXP (x, 0)) + : GET_MODE (XEXP (x, 1))); + if (!validate_simplify_replace_rtx_1 (r, &XEXP (x, 0)) + || !validate_simplify_replace_rtx_1 (r, &XEXP (x, 1))) + return false; + + newx = simplify_relational_operation (code, mode, op_mode, + XEXP (x, 0), XEXP (x, 1)); + break; + } + + case RTX_TERNARY: + case RTX_BITFIELD_OPS: + { + machine_mode op0_mode = GET_MODE (XEXP (x, 0)); + if (!validate_simplify_replace_rtx_1 (r, &XEXP (x, 0)) + || !validate_simplify_replace_rtx_1 (r, &XEXP (x, 1)) + || !validate_simplify_replace_rtx_1 (r, &XEXP (x, 2))) + return false; + + newx = simplify_ternary_operation (code, mode, op0_mode, + XEXP (x, 0), XEXP (x, 1), + XEXP (x, 2)); + break; + } + + case RTX_EXTRA: + if (code == SUBREG) + { + machine_mode inner_mode = GET_MODE (SUBREG_REG (x)); + if (!validate_simplify_replace_rtx_1 (r, &SUBREG_REG (x))) + return false; + + rtx inner = SUBREG_REG (x); + newx = simplify_subreg (mode, inner, inner_mode, SUBREG_BYTE (x)); + /* Reject the same cases that simplify_gen_subreg would. */ + if (!newx + && (GET_CODE (inner) == SUBREG + || GET_CODE (inner) == CONCAT + || GET_MODE (inner) == VOIDmode + || !validate_subreg (mode, inner_mode, + inner, SUBREG_BYTE (x)))) + return false; + break; + } + else + recurse_p = true; + break; + + case RTX_OBJ: + if (code == LO_SUM) + { + if (!validate_simplify_replace_rtx_1 (r, &XEXP (x, 0)) + || !validate_simplify_replace_rtx_1 (r, &XEXP (x, 1))) + return false; + + /* (lo_sum (high x) y) -> y where x and y have the same base. */ + rtx op0 = XEXP (x, 0); + rtx op1 = XEXP (x, 1); + if (GET_CODE (op0) == HIGH) + { + rtx base0, base1, offset0, offset1; + split_const (XEXP (op0, 0), &base0, &offset0); + split_const (op1, &base1, &offset1); + if (rtx_equal_p (base0, base1)) + newx = op1; + } + } + else if (code == REG) + { + if (REG_P (r.from) && reg_overlap_mentioned_p (x, r.from)) + return false; + } + else + recurse_p = true; + break; + + case RTX_CONST_OBJ: + break; + + case RTX_AUTOINC: + if (reg_overlap_mentioned_p (XEXP (x, 0), r.from)) + return false; + recurse_p = true; + break; + + case RTX_MATCH: + case RTX_INSN: + gcc_unreachable (); + } + + if (recurse_p) + { + const char *fmt = GET_RTX_FORMAT (code); + for (int i = 0; fmt[i]; i++) + switch (fmt[i]) + { + case 'E': + for (int j = 0; j < XVECLEN (x, i); j++) + if (!validate_simplify_replace_rtx_1 (r, &XVECEXP (x, i, j))) + return false; + break; + + case 'e': + if (XEXP (x, i) + && !validate_simplify_replace_rtx_1 (r, &XEXP (x, i))) + return false; + break; + } + } + + if (newx && !rtx_equal_p (x, newx)) + { + /* There's no longer any point unsharing the substitutions made + for subexpressions, since we'll just copy this one instead. 
*/ + for (int i = old_num_changes; i < num_changes; ++i) + changes[i].unshare = false; + validate_unshare_change (r.insn, loc, newx, 1); + } + + return true; +} + +/* A note_uses callback for validate_simplify_replace_rtx. + DATA points to a validate_replace_src_data object. */ + +static void +validate_simplify_replace_rtx_uses (rtx *loc, void *data) +{ + validate_replace_src_data &r = *(validate_replace_src_data *) data; + if (r.insn && !validate_simplify_replace_rtx_1 (r, loc)) + r.insn = NULL; +} + +/* Try to perform the equivalent of: + + newx = simplify_replace_rtx (*loc, OLD_RTX, NEW_RTX); + validate_change (INSN, LOC, newx, 1); + + but without generating as much garbage rtl when the resulting + pattern doesn't match. + + Return true if we were able to replace all uses of OLD_RTX in *LOC + and if the result conforms to general rtx rules (e.g. for whether + subregs are meaningful). + + When returning true, add all replacements to the current validation group, + leaving the caller to test it in the normal way. Leave both *LOC and the + validation group unchanged on failure. */ + +bool +validate_simplify_replace_rtx (rtx_insn *insn, rtx *loc, + rtx old_rtx, rtx new_rtx) +{ + validate_replace_src_data r; + r.from = old_rtx; + r.to = new_rtx; + r.insn = insn; + + unsigned int num_changes = num_validated_changes (); + note_uses (loc, validate_simplify_replace_rtx_uses, &r); + if (!r.insn) + { + cancel_changes (num_changes); + return false; + } + return true; +} /* Return 1 if the insn using CC0 set by INSN does not contain any ordered tests applied to the condition codes. Index: gcc/combine2.c =================================================================== --- /dev/null 2019-09-17 11:41:18.176664108 +0100 +++ gcc/combine2.c 2019-11-17 23:15:31.196500559 +0000 @@ -0,0 +1,1576 @@ +/* Combine instructions + Copyright (C) 2019 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "rtl.h" +#include "df.h" +#include "tree-pass.h" +#include "memmodel.h" +#include "emit-rtl.h" +#include "insn-config.h" +#include "recog.h" +#include "print-rtl.h" +#include "rtl-iter.h" +#include "predict.h" +#include "cfgcleanup.h" +#include "cfghooks.h" +#include "cfgrtl.h" +#include "alias.h" +#include "valtrack.h" + +/* This pass tries to combine instructions in the following ways: + + (1) If we have two dependent instructions: + + I1: (set DEST1 SRC1) + I2: (...DEST1...) + + and I2 is the only user of DEST1, the pass tries to combine them into: + + I2: (...SRC1...) + + (2) If we have two dependent instructions: + + I1: (set DEST1 SRC1) + I2: (...DEST1...) + + the pass tries to combine them into: + + I2: (parallel [(set DEST1 SRC1) (...SRC1...)]) + + or: + + I2: (parallel [(...SRC1...) 
(set DEST1 SRC1)]) + + (3) If we have two independent instructions: + + I1: (set DEST1 SRC1) + I2: (set DEST2 SRC2) + + that read from memory or from the same register, the pass tries to + combine them into: + + I2: (parallel [(set DEST1 SRC1) (set DEST2 SRC2)]) + + or: + + I2: (parallel [(set DEST2 SRC2) (set DEST1 SRC1)]) + + If the combined form is a valid instruction, the pass tries to find a + place between I1 and I2 inclusive for the new instruction. If there + are multiple valid locations, it tries to pick the best one by taking + the effect on register pressure into account. + + If a combination succeeds and produces a single set, the pass tries to + combine the new form with earlier or later instructions. + + The pass currently optimizes each basic block separately. It walks + the instructions in reverse order, building up live ranges for registers + and memory. It then uses these live ranges to look for possible + combination opportunities and to decide where the combined instructions + could be placed. + + The pass represents positions in the block using point numbers, + with higher numbers indicating earlier instructions. The numbering + scheme is that: + + - the end of the current instruction sequence has an even base point B. + + - instructions initially have odd-numbered points B + 1, B + 3, etc. + with B + 1 being the final instruction in the sequence. + + - even points after B represent gaps between instructions where combined + instructions could be placed. + + Thus even points initially represent no instructions and odd points + initially represent single instructions. However, when picking a + place for a combined instruction, the pass may choose somewhere + inbetween the original two instructions, so that over time a point + may come to represent several instructions. When this happens, + the pass maintains the invariant that all instructions with the same + point number are independent of each other and thus can be treated as + acting in parallel (or as acting in any arbitrary sequence). + + TODOs: + + - Handle 3-instruction combinations, and possibly more. + + - Handle existing clobbers more efficiently. At the moment we can't + move an instruction that clobbers R across another instruction that + clobbers R. + + - Allow hard register clobbers to be added, like combine does. + + - Perhaps work on EBBs, or SESE regions. */ + +namespace { + +/* The number of explicit uses to record in a live range. */ +const unsigned int NUM_RANGE_USERS = 4; + +/* The maximum number of instructions that we can combine at once. */ +const unsigned int MAX_COMBINE_INSNS = 2; + +/* A fake cost for instructions that we haven't costed yet. */ +const unsigned int UNKNOWN_COST = ~0U; + +class combine2 +{ +public: + combine2 (function *); + ~combine2 (); + + void execute (); + +private: + struct insn_info_rec; + + /* Describes the live range of a register or of memory. For simplicity, + we treat memory as a single entity. + + If we had a fully-accurate live range, updating it to account for a + moved instruction would be a linear-time operation. Doing this for + each combination would then make the pass quadratic. We therefore + just maintain a list of NUM_RANGE_USERS use insns and use simple, + conservatively-correct behavior for the rest. */ + struct live_range_rec + { + /* Which instruction provides the dominating definition, or null if + we don't know yet. */ + insn_info_rec *producer; + + /* A selection of instructions that use the resource, in program order. 
*/ + insn_info_rec *users[NUM_RANGE_USERS]; + + /* An inclusive range of points that covers instructions not mentioned + in USERS. Both values are zero if there are no such instructions. + + Once we've included a use U at point P in this range, we continue + to assume that some kind of use exists at P whatever happens to U + afterwards. */ + unsigned int first_extra_use; + unsigned int last_extra_use; + + /* The register number this range describes, or INVALID_REGNUM + for memory. */ + unsigned int regno; + + /* Forms a linked list of ranges for the same resource, in program + order. */ + live_range_rec *prev_range; + live_range_rec *next_range; + }; + + /* Pass-specific information about an instruction. */ + struct insn_info_rec + { + /* The instruction itself. */ + rtx_insn *insn; + + /* A null-terminated list of live ranges for the things that this + instruction defines. */ + live_range_rec **defs; + + /* A null-terminated list of live ranges for the things that this + instruction uses. */ + live_range_rec **uses; + + /* The point at which the instruction appears. */ + unsigned int point; + + /* The cost of the instruction, or UNKNOWN_COST if we haven't + measured it yet. */ + unsigned int cost; + }; + + /* Describes one attempt to combine instructions. */ + struct combination_attempt_rec + { + /* The instruction that we're currently trying to optimize. + If the combination succeeds, we'll use this insn_info_rec + to describe the new instruction. */ + insn_info_rec *new_home; + + /* The instructions we're combining, in program order. */ + insn_info_rec *sequence[MAX_COMBINE_INSNS]; + + /* If we're substituting SEQUENCE[0] into SEQUENCE[1], this is the + live range that describes the substituted register. */ + live_range_rec *def_use_range; + + /* The earliest and latest points at which we could insert the + combined instruction. */ + unsigned int earliest_point; + unsigned int latest_point; + + /* The cost of the new instruction, once we have a successful match. */ + unsigned int new_cost; + }; + + /* Pass-specific information about a register. */ + struct reg_info_rec + { + /* The live range associated with the last reference to the register. */ + live_range_rec *range; + + /* The point at which the last reference occurred. */ + unsigned int next_ref; + + /* True if the register is currently live. We record this here rather + than in a separate bitmap because (a) there's a natural hole for + it on LP64 hosts and (b) we only refer to it when updating the + other fields, and so recording it here should give better locality. 
*/ + unsigned int live_p : 1; + }; + + live_range_rec *new_live_range (unsigned int, live_range_rec *); + live_range_rec *reg_live_range (unsigned int); + live_range_rec *mem_live_range (); + bool add_range_use (live_range_rec *, insn_info_rec *); + void remove_range_use (live_range_rec *, insn_info_rec *); + bool has_single_use_p (live_range_rec *); + bool known_last_use_p (live_range_rec *, insn_info_rec *); + unsigned int find_earliest_point (insn_info_rec *, insn_info_rec *); + unsigned int find_latest_point (insn_info_rec *, insn_info_rec *); + bool start_combination (combination_attempt_rec &, insn_info_rec *, + insn_info_rec *, live_range_rec * = NULL); + bool verify_combination (combination_attempt_rec &); + int estimate_reg_pressure_delta (insn_info_rec *); + void commit_combination (combination_attempt_rec &, bool); + bool try_parallel_sets (combination_attempt_rec &, rtx, rtx); + bool try_parallelize_insns (combination_attempt_rec &); + bool try_combine_def_use_1 (combination_attempt_rec &, rtx, rtx, bool); + bool try_combine_def_use (combination_attempt_rec &, rtx, rtx); + bool try_combine_two_uses (combination_attempt_rec &); + bool try_combine (insn_info_rec *, rtx, unsigned int); + bool optimize_insn (insn_info_rec *); + void record_defs (insn_info_rec *); + void record_reg_use (insn_info_rec *, df_ref); + void record_uses (insn_info_rec *); + void process_insn (insn_info_rec *); + void start_sequence (); + + /* The function we're optimizing. */ + function *m_fn; + + /* The highest pseudo register number plus one. */ + unsigned int m_num_regs; + + /* The current basic block. */ + basic_block m_bb; + + /* True if we should optimize the current basic block for speed. */ + bool m_optimize_for_speed_p; + + /* The point number to allocate to the next instruction we visit + in the backward traversal. */ + unsigned int m_point; + + /* The point number corresponding to the end of the current + instruction sequence, i.e. the lowest point number about which + we still have valid information. */ + unsigned int m_end_of_sequence; + + /* The point number corresponding to the end of the current basic block. + This is the same as M_END_OF_SEQUENCE when processing the last + instruction sequence in a basic block. */ + unsigned int m_end_of_bb; + + /* The memory live range, or null if we haven't yet found a memory + reference in the current instruction sequence. */ + live_range_rec *m_mem_range; + + /* Gives information about each register. We track both hard and + pseudo registers. */ + auto_vec m_reg_info; + + /* A bitmap of registers whose entry in m_reg_info is valid. */ + auto_sbitmap m_valid_regs; + + /* If nonnuull, an unused 2-element PARALLEL that we can use to test + instruction combinations. */ + rtx m_spare_parallel; + + /* A bitmap of instructions that we've already tried to combine with. */ + auto_bitmap m_tried_insns; + + /* A temporary bitmap used to hold register numbers. */ + auto_bitmap m_true_deps; + + /* An obstack used for allocating insn_info_recs and for building + up their lists of definitions and uses. */ + obstack m_insn_obstack; + + /* An obstack used for allocating live_range_recs. */ + obstack m_range_obstack; + + /* Start-of-object pointers for the two obstacks. */ + char *m_insn_obstack_start; + char *m_range_obstack_start; + + /* A list of instructions that we've optimized and whose new forms + change the cfg. */ + auto_vec m_cfg_altering_insns; + + /* The INSN_UIDs of all instructions in M_CFG_ALTERING_INSNS. 
*/ + auto_bitmap m_cfg_altering_insn_ids; + + /* We can insert new instructions at point P * 2 by inserting them + after M_POINTS[P - M_END_OF_SEQUENCE / 2]. We can insert new + instructions at point P * 2 + 1 by inserting them before + M_POINTS[P - M_END_OF_SEQUENCE / 2]. */ + auto_vec m_points; +}; + +combine2::combine2 (function *fn) + : m_fn (fn), + m_num_regs (max_reg_num ()), + m_bb (NULL), + m_optimize_for_speed_p (false), + m_point (2), + m_end_of_sequence (m_point), + m_end_of_bb (m_point), + m_mem_range (NULL), + m_reg_info (m_num_regs), + m_valid_regs (m_num_regs), + m_spare_parallel (NULL_RTX) +{ + gcc_obstack_init (&m_insn_obstack); + gcc_obstack_init (&m_range_obstack); + m_reg_info.quick_grow (m_num_regs); + bitmap_clear (m_valid_regs); + m_insn_obstack_start = XOBNEWVAR (&m_insn_obstack, char, 0); + m_range_obstack_start = XOBNEWVAR (&m_range_obstack, char, 0); +} + +combine2::~combine2 () +{ + obstack_free (&m_insn_obstack, NULL); + obstack_free (&m_range_obstack, NULL); +} + +/* Return true if it's possible in principle to combine INSN with + other instructions. ALLOW_ASMS_P is true if the caller can cope + with asm statements. */ + +static bool +combinable_insn_p (rtx_insn *insn, bool allow_asms_p) +{ + rtx pattern = PATTERN (insn); + + if (GET_CODE (pattern) == USE || GET_CODE (pattern) == CLOBBER) + return false; + + if (JUMP_P (insn) && find_reg_note (insn, REG_NON_LOCAL_GOTO, NULL_RTX)) + return false; + + if (!allow_asms_p && asm_noperands (PATTERN (insn)) >= 0) + return false; + + return true; +} + +/* Return true if it's possible in principle to move INSN somewhere else, + as long as all dependencies are satisfied. */ + +static bool +movable_insn_p (rtx_insn *insn) +{ + if (JUMP_P (insn)) + return false; + + if (volatile_refs_p (PATTERN (insn))) + return false; + + return true; +} + +/* Create and return a new live range for REGNO. NEXT is the next range + in program order, or null if this is the first live range in the + sequence. */ + +combine2::live_range_rec * +combine2::new_live_range (unsigned int regno, live_range_rec *next) +{ + live_range_rec *range = XOBNEW (&m_range_obstack, live_range_rec); + memset (range, 0, sizeof (*range)); + + range->regno = regno; + range->next_range = next; + if (next) + next->prev_range = range; + return range; +} + +/* Return the current live range for register REGNO, creating a new + one if necessary. */ + +combine2::live_range_rec * +combine2::reg_live_range (unsigned int regno) +{ + /* Initialize the liveness flag, if it isn't already valid for this BB. */ + bool first_ref_p = !bitmap_bit_p (m_valid_regs, regno); + if (first_ref_p || m_reg_info[regno].next_ref < m_end_of_bb) + m_reg_info[regno].live_p = bitmap_bit_p (df_get_live_out (m_bb), regno); + + /* See if we already have a live range associated with the current + instruction sequence. */ + live_range_rec *range = NULL; + if (!first_ref_p && m_reg_info[regno].next_ref >= m_end_of_sequence) + range = m_reg_info[regno].range; + + /* Create a new range if this is the first reference to REGNO in the + current instruction sequence or if the current range has been closed + off by a definition. */ + if (!range || range->producer) + { + range = new_live_range (regno, range); + + /* If the register is live after the current sequence, treat that + as a fake use at the end of the sequence. */ + if (!range->next_range && m_reg_info[regno].live_p) + range->first_extra_use = range->last_extra_use = m_end_of_sequence; + + /* Record that this is now the current range for REGNO. 
*/ + if (first_ref_p) + bitmap_set_bit (m_valid_regs, regno); + m_reg_info[regno].range = range; + m_reg_info[regno].next_ref = m_point; + } + return range; +} + +/* Return the current live range for memory, treating memory as a single + entity. Create a new live range if necessary. */ + +combine2::live_range_rec * +combine2::mem_live_range () +{ + if (!m_mem_range || m_mem_range->producer) + m_mem_range = new_live_range (INVALID_REGNUM, m_mem_range); + return m_mem_range; +} + +/* Record that instruction USER uses the resource described by RANGE. + Return true if this is new information. */ + +bool +combine2::add_range_use (live_range_rec *range, insn_info_rec *user) +{ + /* See if we've already recorded the instruction, or if there's a + spare use slot we can use. */ + unsigned int i = 0; + for (; i < NUM_RANGE_USERS && range->users[i]; ++i) + if (range->users[i] == user) + return false; + + if (i == NUM_RANGE_USERS) + { + /* Since we've processed USER recently, assume that it's more + interesting to record explicitly than the last user in the + current list. Evict that last user and describe it in the + overflow "extra use" range instead. */ + insn_info_rec *ousted_user = range->users[--i]; + if (range->first_extra_use < ousted_user->point) + range->first_extra_use = ousted_user->point; + if (range->last_extra_use > ousted_user->point) + range->last_extra_use = ousted_user->point; + } + + /* Insert USER while keeping the list sorted. */ + for (; i > 0 && range->users[i - 1]->point < user->point; --i) + range->users[i] = range->users[i - 1]; + range->users[i] = user; + return true; +} + +/* Remove USER from the uses recorded for RANGE, if we can. + There's nothing we can do if USER was described in the + overflow "extra use" range. */ + +void +combine2::remove_range_use (live_range_rec *range, insn_info_rec *user) +{ + for (unsigned int i = 0; i < NUM_RANGE_USERS; ++i) + if (range->users[i] == user) + { + for (unsigned int j = i; j < NUM_RANGE_USERS - 1; ++j) + range->users[j] = range->users[j + 1]; + range->users[NUM_RANGE_USERS - 1] = NULL; + break; + } +} + +/* Return true if RANGE has a single known user. */ + +bool +combine2::has_single_use_p (live_range_rec *range) +{ + return range->users[0] && !range->users[1] && !range->first_extra_use; +} + +/* Return true if we know that USER is the last user of RANGE. */ + +bool +combine2::known_last_use_p (live_range_rec *range, insn_info_rec *user) +{ + if (range->last_extra_use <= user->point) + return false; + + for (unsigned int i = 0; i < NUM_RANGE_USERS && range->users[i]; ++i) + if (range->users[i] == user) + return i == NUM_RANGE_USERS - 1 || !range->users[i + 1]; + else if (range->users[i]->point == user->point) + return false; + + gcc_unreachable (); +} + +/* Find the earliest point that we could move I2 up in order to combine + it with I1. Ignore any dependencies between I1 and I2; leave the + caller to deal with those instead. */ + +unsigned int +combine2::find_earliest_point (insn_info_rec *i2, insn_info_rec *i1) +{ + if (!movable_insn_p (i2->insn)) + return i2->point; + + /* Start by optimistically assuming that we can move the instruction + all the way up to I1. */ + unsigned int point = i1->point; + + /* Make sure that the new position preserves all necessary true dependencies + on earlier instructions. 
*/ + for (live_range_rec **use = i2->uses; *use; ++use) + { + live_range_rec *range = *use; + if (range->producer + && range->producer != i1 + && point >= range->producer->point) + point = range->producer->point - 1; + } + + /* Make sure that the new position preserves all necessary output and + anti dependencies on earlier instructions. */ + for (live_range_rec **def = i2->defs; *def; ++def) + if (live_range_rec *range = (*def)->prev_range) + { + if (range->producer + && range->producer != i1 + && point >= range->producer->point) + point = range->producer->point - 1; + + for (unsigned int i = NUM_RANGE_USERS - 1; i-- > 0;) + if (range->users[i] && range->users[i] != i1) + { + if (point >= range->users[i]->point) + point = range->users[i]->point - 1; + break; + } + + if (range->last_extra_use && point >= range->last_extra_use) + point = range->last_extra_use - 1; + } + + return point; +} + +/* Find the latest point that we could move I1 down in order to combine + it with I2. Ignore any dependencies between I1 and I2; leave the + caller to deal with those instead. */ + +unsigned int +combine2::find_latest_point (insn_info_rec *i1, insn_info_rec *i2) +{ + if (!movable_insn_p (i1->insn)) + return i1->point; + + /* Start by optimistically assuming that we can move the instruction + all the way down to I2. */ + unsigned int point = i2->point; + + /* Make sure that the new position preserves all necessary anti dependencies + on later instructions. */ + for (live_range_rec **use = i1->uses; *use; ++use) + if (live_range_rec *range = (*use)->next_range) + if (range->producer != i2 && point <= range->producer->point) + point = range->producer->point + 1; + + /* Make sure that the new position preserves all necessary output and + true dependencies on later instructions. */ + for (live_range_rec **def = i1->defs; *def; ++def) + { + live_range_rec *range = *def; + + for (unsigned int i = 0; i < NUM_RANGE_USERS; ++i) + if (range->users[i] != i2) + { + if (range->users[i] && point <= range->users[i]->point) + point = range->users[i]->point + 1; + break; + } + + if (range->first_extra_use && point <= range->first_extra_use) + point = range->first_extra_use + 1; + + live_range_rec *next_range = range->next_range; + if (next_range + && next_range->producer != i2 + && point <= next_range->producer->point) + point = next_range->producer->point + 1; + } + + return point; +} + +/* Initialize ATTEMPT for an attempt to combine instructions I1 and I2, + where I1 is the instruction that we're currently trying to optimize. + If DEF_USE_RANGE is nonnull, I1 defines the value described by + DEF_USE_RANGE and I2 uses it. */ + +bool +combine2::start_combination (combination_attempt_rec &attempt, + insn_info_rec *i1, insn_info_rec *i2, + live_range_rec *def_use_range) +{ + attempt.new_home = i1; + attempt.sequence[0] = i1; + attempt.sequence[1] = i2; + if (attempt.sequence[0]->point < attempt.sequence[1]->point) + std::swap (attempt.sequence[0], attempt.sequence[1]); + attempt.def_use_range = def_use_range; + + /* Check that the instructions have no true dependencies other than + DEF_USE_RANGE. */ + bitmap_clear (m_true_deps); + for (live_range_rec **def = attempt.sequence[0]->defs; *def; ++def) + if (*def != def_use_range) + bitmap_set_bit (m_true_deps, (*def)->regno); + for (live_range_rec **use = attempt.sequence[1]->uses; *use; ++use) + if (*use != def_use_range && bitmap_bit_p (m_true_deps, (*use)->regno)) + return false; + + /* Calculate the range of points at which the combined instruction + could live. 
*/ + attempt.earliest_point = find_earliest_point (attempt.sequence[1], + attempt.sequence[0]); + attempt.latest_point = find_latest_point (attempt.sequence[0], + attempt.sequence[1]); + if (attempt.earliest_point < attempt.latest_point) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "cannot combine %d and %d: no suitable" + " location for combined insn\n", + INSN_UID (attempt.sequence[0]->insn), + INSN_UID (attempt.sequence[1]->insn)); + return false; + } + + /* Make sure we have valid costs for the original instructions before + we start changing their patterns. */ + for (unsigned int i = 0; i < MAX_COMBINE_INSNS; ++i) + if (attempt.sequence[i]->cost == UNKNOWN_COST) + attempt.sequence[i]->cost = insn_cost (attempt.sequence[i]->insn, + m_optimize_for_speed_p); + return true; +} + +/* Check whether the combination attempt described by ATTEMPT matches + an .md instruction (or matches its constraints, in the case of an + asm statement). If so, calculate the cost of the new instruction + and check whether it's cheap enough. */ + +bool +combine2::verify_combination (combination_attempt_rec &attempt) +{ + rtx_insn *insn = attempt.sequence[1]->insn; + + bool ok_p = verify_changes (0); + if (dump_file && (dump_flags & TDF_DETAILS)) + { + if (!ok_p) + fprintf (dump_file, "failed to match this instruction:\n"); + else if (const char *name = get_insn_name (INSN_CODE (insn))) + fprintf (dump_file, "successfully matched this instruction to %s:\n", + name); + else + fprintf (dump_file, "successfully matched this instruction:\n"); + print_rtl_single (dump_file, PATTERN (insn)); + } + if (!ok_p) + return false; + + unsigned int cost1 = attempt.sequence[0]->cost; + unsigned int cost2 = attempt.sequence[1]->cost; + attempt.new_cost = insn_cost (insn, m_optimize_for_speed_p); + ok_p = (attempt.new_cost <= cost1 + cost2); + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "original cost = %d + %d, replacement cost = %d; %s\n", + cost1, cost2, attempt.new_cost, + ok_p ? "keeping replacement" : "rejecting replacement"); + if (!ok_p) + return false; + + confirm_change_group (); + return true; +} + +/* Return true if we should consider register REGNO when calculating + register pressure estimates. */ + +static bool +count_reg_pressure_p (unsigned int regno) +{ + if (regno == INVALID_REGNUM) + return false; + + /* Unallocatable registers aren't interesting. */ + if (HARD_REGISTER_NUM_P (regno) && fixed_regs[regno]) + return false; + + return true; +} + +/* Try to estimate the effect that the original form of INSN_INFO + had on register pressure, in the form "born - dying". */ + +int +combine2::estimate_reg_pressure_delta (insn_info_rec *insn_info) +{ + int delta = 0; + + for (live_range_rec **def = insn_info->defs; *def; ++def) + if (count_reg_pressure_p ((*def)->regno)) + delta += 1; + + for (live_range_rec **use = insn_info->uses; *use; ++use) + if (count_reg_pressure_p ((*use)->regno) + && known_last_use_p (*use, insn_info)) + delta -= 1; + + return delta; +} + +/* We've moved FROM_INSN's pattern to TO_INSN and are about to delete + FROM_INSN. Copy any useful information to TO_INSN before doing that. */ + +static void +transfer_insn (rtx_insn *to_insn, rtx_insn *from_insn) +{ + INSN_LOCATION (to_insn) = INSN_LOCATION (from_insn); + INSN_CODE (to_insn) = INSN_CODE (from_insn); + REG_NOTES (to_insn) = REG_NOTES (from_insn); +} + +/* The combination attempt in ATTEMPT has succeeded and is currently + part of an open validate_change group. 
Commit to making the change + and decide where the new instruction should go. + + KEPT_DEF_P is true if the new instruction continues to perform + the definition described by ATTEMPT.def_use_range. */ + +void +combine2::commit_combination (combination_attempt_rec &attempt, + bool kept_def_p) +{ + insn_info_rec *new_home = attempt.new_home; + rtx_insn *old_insn = attempt.sequence[0]->insn; + rtx_insn *new_insn = attempt.sequence[1]->insn; + + /* Remove any notes that are no longer relevant. */ + bool single_set_p = single_set (new_insn); + for (rtx *note_ptr = ®_NOTES (new_insn); *note_ptr; ) + { + rtx note = *note_ptr; + bool keep_p = true; + switch (REG_NOTE_KIND (note)) + { + case REG_EQUAL: + case REG_EQUIV: + case REG_NOALIAS: + keep_p = single_set_p; + break; + + case REG_UNUSED: + keep_p = false; + break; + + default: + break; + } + if (keep_p) + note_ptr = &XEXP (*note_ptr, 1); + else + { + *note_ptr = XEXP (*note_ptr, 1); + free_EXPR_LIST_node (note); + } + } + + /* Complete the open validate_change group. */ + confirm_change_group (); + + /* Decide where the new instruction should go. */ + unsigned int new_point = attempt.latest_point; + if (new_point != attempt.earliest_point + && prev_real_insn (new_insn) != old_insn) + { + /* Prefer the earlier point if the combined instruction reduces + register pressure and the latest point if it increases register + pressure. + + The choice isn't obvious in the event of a tie, but picking + the earliest point should reduce the number of times that + we need to invalidate debug insns. */ + int delta1 = estimate_reg_pressure_delta (attempt.sequence[0]); + int delta2 = estimate_reg_pressure_delta (attempt.sequence[1]); + bool move_up_p = (delta1 + delta2 <= 0); + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, + "register pressure delta = %d + %d; using %s position\n", + delta1, delta2, move_up_p ? "earliest" : "latest"); + if (move_up_p) + new_point = attempt.earliest_point; + } + + /* Translate inserting at NEW_POINT into inserting before or after + a particular insn. */ + rtx_insn *anchor = NULL; + bool before_p = (new_point & 1); + if (new_point != attempt.sequence[1]->point + && new_point != attempt.sequence[0]->point) + { + anchor = m_points[(new_point - m_end_of_sequence) / 2]; + rtx_insn *other_side = (before_p + ? prev_real_insn (anchor) + : next_real_insn (anchor)); + /* Inserting next to an insn X and then deleting X is just a + roundabout way of using X as the insertion point. */ + if (anchor == new_insn || other_side == new_insn) + new_point = attempt.sequence[1]->point; + else if (anchor == old_insn || other_side == old_insn) + new_point = attempt.sequence[0]->point; + } + + /* Actually perform the move. */ + if (new_point == attempt.sequence[1]->point) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "using insn %d to hold the combined pattern\n", + INSN_UID (new_insn)); + set_insn_deleted (old_insn); + } + else if (new_point == attempt.sequence[0]->point) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "using insn %d to hold the combined pattern\n", + INSN_UID (old_insn)); + PATTERN (old_insn) = PATTERN (new_insn); + transfer_insn (old_insn, new_insn); + std::swap (old_insn, new_insn); + set_insn_deleted (old_insn); + } + else + { + /* We need to insert a new instruction. We can't simply move + NEW_INSN because it acts as an insertion anchor in m_points. 
*/ + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "inserting combined insn %s insn %d\n", + before_p ? "before" : "after", INSN_UID (anchor)); + + rtx_insn *added_insn = (before_p + ? emit_insn_before (PATTERN (new_insn), anchor) + : emit_insn_after (PATTERN (new_insn), anchor)); + transfer_insn (added_insn, new_insn); + set_insn_deleted (old_insn); + set_insn_deleted (new_insn); + new_insn = added_insn; + } + df_insn_rescan (new_insn); + + /* Unlink the old uses. */ + for (unsigned int i = 0; i < MAX_COMBINE_INSNS; ++i) + for (live_range_rec **use = attempt.sequence[i]->uses; *use; ++use) + remove_range_use (*use, attempt.sequence[i]); + + /* Work out which registers the new pattern uses. */ + bitmap_clear (m_true_deps); + df_ref use; + FOR_EACH_INSN_USE (use, new_insn) + { + rtx reg = DF_REF_REAL_REG (use); + bitmap_set_range (m_true_deps, REGNO (reg), REG_NREGS (reg)); + } + FOR_EACH_INSN_EQ_USE (use, new_insn) + { + rtx reg = DF_REF_REAL_REG (use); + bitmap_set_range (m_true_deps, REGNO (reg), REG_NREGS (reg)); + } + + /* Describe the combined instruction in NEW_HOME. */ + new_home->insn = new_insn; + new_home->point = new_point; + new_home->cost = attempt.new_cost; + + /* Build up a list of definitions for the combined instructions + and update all the ranges accordingly. It shouldn't matter + which order we do this in. */ + for (unsigned int i = 0; i < MAX_COMBINE_INSNS; ++i) + for (live_range_rec **def = attempt.sequence[i]->defs; *def; ++def) + if (kept_def_p || *def != attempt.def_use_range) + { + obstack_ptr_grow (&m_insn_obstack, *def); + (*def)->producer = new_home; + } + obstack_ptr_grow (&m_insn_obstack, NULL); + new_home->defs = (live_range_rec **) obstack_finish (&m_insn_obstack); + + /* Build up a list of uses for the combined instructions and update + all the ranges accordingly. Again, it shouldn't matter which + order we do this in. */ + for (unsigned int i = 0; i < MAX_COMBINE_INSNS; ++i) + for (live_range_rec **use = attempt.sequence[i]->uses; *use; ++use) + if (*use != attempt.def_use_range + && add_range_use (*use, new_home)) + obstack_ptr_grow (&m_insn_obstack, *use); + obstack_ptr_grow (&m_insn_obstack, NULL); + new_home->uses = (live_range_rec **) obstack_finish (&m_insn_obstack); + + /* There shouldn't be any remaining references to other instructions + in the combination. Invalidate their contents to make lingering + references a noisy failure. */ + for (unsigned int i = 0; i < MAX_COMBINE_INSNS; ++i) + if (attempt.sequence[i] != new_home) + { + attempt.sequence[i]->insn = NULL; + attempt.sequence[i]->point = ~0U; + } + + /* Unlink the def-use range. */ + if (!kept_def_p && attempt.def_use_range) + { + live_range_rec *range = attempt.def_use_range; + if (range->prev_range) + range->prev_range->next_range = range->next_range; + else + m_reg_info[range->regno].range = range->next_range; + if (range->next_range) + range->next_range->prev_range = range->prev_range; + } + + /* Record instructions whose new form alters the cfg. */ + rtx pattern = PATTERN (new_insn); + if ((returnjump_p (new_insn) + || any_uncondjump_p (new_insn) + || (GET_CODE (pattern) == TRAP_IF && XEXP (pattern, 0) == const1_rtx)) + && bitmap_set_bit (m_cfg_altering_insn_ids, INSN_UID (new_insn))) + m_cfg_altering_insns.safe_push (new_insn); +} + +/* Return true if X1 and X2 are memories and if X1 does not have + a higher alignment than X2. 
+/* Return true if X1 and X2 are memories and if X1 does not have
+   a higher alignment than X2.  */
+
+static bool
+dubious_mem_pair_p (rtx x1, rtx x2)
+{
+  return MEM_P (x1) && MEM_P (x2) && MEM_ALIGN (x1) <= MEM_ALIGN (x2);
+}
+
+/* Try to implement ATTEMPT using (parallel [SET1 SET2]).  */
+
+bool
+combine2::try_parallel_sets (combination_attempt_rec &attempt,
+                             rtx set1, rtx set2)
+{
+  rtx_insn *insn = attempt.sequence[1]->insn;
+
+  /* Combining two loads or two stores can be useful on targets that
+     allow them to be treated as a single access.  However, we use a
+     very greedy, peephole approach to picking the pairs, so we need
+     to be relatively confident that we're making a good choice.
+
+     For now just aim for cases in which the memory references are
+     consecutive and the first reference has a higher alignment.
+     We can leave the target to test the consecutive part; whatever test
+     we added here might be different from the target's, and in any case
+     it's fine if the target accepts other well-aligned cases too.  */
+  if (dubious_mem_pair_p (SET_DEST (set1), SET_DEST (set2))
+      || dubious_mem_pair_p (SET_SRC (set1), SET_SRC (set2)))
+    return false;
+
+  /* Cache the PARALLEL rtx between attempts so that we don't generate
+     too much garbage rtl.  */
+  if (!m_spare_parallel)
+    {
+      rtvec vec = gen_rtvec (2, set1, set2);
+      m_spare_parallel = gen_rtx_PARALLEL (VOIDmode, vec);
+    }
+  else
+    {
+      XVECEXP (m_spare_parallel, 0, 0) = set1;
+      XVECEXP (m_spare_parallel, 0, 1) = set2;
+    }
+
+  unsigned int num_changes = num_validated_changes ();
+  validate_change (insn, &PATTERN (insn), m_spare_parallel, true);
+  if (verify_combination (attempt))
+    {
+      m_spare_parallel = NULL_RTX;
+      return true;
+    }
+  cancel_changes (num_changes);
+  return false;
+}
+
+/* Try to parallelize the two instructions in ATTEMPT.  */
+
+bool
+combine2::try_parallelize_insns (combination_attempt_rec &attempt)
+{
+  rtx_insn *i1_insn = attempt.sequence[0]->insn;
+  rtx_insn *i2_insn = attempt.sequence[1]->insn;
+
+  /* Can't parallelize asm statements.  */
+  if (asm_noperands (PATTERN (i1_insn)) >= 0
+      || asm_noperands (PATTERN (i2_insn)) >= 0)
+    return false;
+
+  /* For now, just handle the case in which both instructions are
+     single sets.  We could handle more than 2 sets as well, but few
+     targets support that anyway.  */
+  rtx set1 = single_set (i1_insn);
+  if (!set1)
+    return false;
+  rtx set2 = single_set (i2_insn);
+  if (!set2)
+    return false;
+
+  /* Make sure that we have structural proof that the destinations
+     are independent.  Things like alias analysis rely on semantic
+     information and assume no undefined behavior, which is rarely a
+     good enough guarantee to allow a useful instruction combination.  */
+  rtx dest1 = SET_DEST (set1);
+  rtx dest2 = SET_DEST (set2);
+  if (MEM_P (dest1)
+      ? MEM_P (dest2) && !nonoverlapping_memrefs_p (dest1, dest2, false)
+      : !MEM_P (dest2) && reg_overlap_mentioned_p (dest1, dest2))
+    return false;
+
+  /* Try the sets in both orders.  */
+  if (try_parallel_sets (attempt, set1, set2)
+      || try_parallel_sets (attempt, set2, set1))
+    {
+      commit_combination (attempt, true);
+      if (MAY_HAVE_DEBUG_BIND_INSNS
+          && attempt.new_home->insn != i1_insn)
+        propagate_for_debug (i1_insn, attempt.new_home->insn,
+                             SET_DEST (set1), SET_SRC (set1), m_bb);
+      return true;
+    }
+  return false;
+}
+
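+/* To illustrate the kind of pattern that try_parallel_sets builds
+   (register numbers and modes here are made up for the example):
+   two independent loads such as
+
+     (set (reg:DI 100) (mem:DI (reg:DI 101)))
+     (set (reg:DI 102) (mem:DI (plus:DI (reg:DI 101) (const_int 8))))
+
+   can be rewritten as the single pattern
+
+     (parallel [(set (reg:DI 100) (mem:DI (reg:DI 101)))
+                (set (reg:DI 102) (mem:DI (plus:DI (reg:DI 101)
+                                                   (const_int 8))))])
+
+   which a target with load-pair instructions (e.g. AArch64 LDP) can
+   then match as one insn, provided the pair passes the alignment
+   check above and the target's own consecutive-address check.  */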
+/* Replace DEST with SRC in the register notes for INSN.  */
+
+static void
+substitute_into_note (rtx_insn *insn, rtx dest, rtx src)
+{
+  for (rtx *note_ptr = &REG_NOTES (insn); *note_ptr; )
+    {
+      rtx note = *note_ptr;
+      bool keep_p = true;
+      switch (REG_NOTE_KIND (note))
+        {
+        case REG_EQUAL:
+        case REG_EQUIV:
+          keep_p = validate_simplify_replace_rtx (insn, &XEXP (note, 0),
+                                                  dest, src);
+          break;
+
+        default:
+          break;
+        }
+      if (keep_p)
+        note_ptr = &XEXP (*note_ptr, 1);
+      else
+        {
+          *note_ptr = XEXP (*note_ptr, 1);
+          free_EXPR_LIST_node (note);
+        }
+    }
+}
+
+/* A subroutine of try_combine_def_use.  Try replacing DEST with SRC
+   in ATTEMPT.  SRC might be either the original SET_SRC passed to the
+   parent routine or a value pulled from a note; SRC_IS_NOTE_P is true
+   in the latter case.  */
+
+bool
+combine2::try_combine_def_use_1 (combination_attempt_rec &attempt,
+                                 rtx dest, rtx src, bool src_is_note_p)
+{
+  rtx_insn *def_insn = attempt.sequence[0]->insn;
+  rtx_insn *use_insn = attempt.sequence[1]->insn;
+
+  /* Mimic combine's behavior by not combining moves from allocatable hard
+     registers (e.g. when copying parameters or function return values).  */
+  if (REG_P (src) && HARD_REGISTER_P (src) && !fixed_regs[REGNO (src)])
+    return false;
+
+  /* Don't mess with volatile references.  For one thing, we don't yet
+     know how many copies of SRC we'll need.  */
+  if (volatile_refs_p (src))
+    return false;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "trying to combine %d and %d%s:\n",
+               INSN_UID (def_insn), INSN_UID (use_insn),
+               src_is_note_p ? " using equal/equiv note" : "");
+      dump_insn_slim (dump_file, def_insn);
+      dump_insn_slim (dump_file, use_insn);
+    }
+
+  unsigned int num_changes = num_validated_changes ();
+  if (!validate_simplify_replace_rtx (use_insn, &PATTERN (use_insn),
+                                      dest, src))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        fprintf (dump_file, "combination failed -- unable to substitute"
+                 " all uses\n");
+      return false;
+    }
+
+  /* Try matching the instruction on its own if DEST isn't used elsewhere.  */
+  if (has_single_use_p (attempt.def_use_range)
+      && verify_combination (attempt))
+    {
+      live_range_rec *next_range = attempt.def_use_range->next_range;
+      substitute_into_note (use_insn, dest, src);
+      commit_combination (attempt, false);
+      if (MAY_HAVE_DEBUG_BIND_INSNS)
+        {
+          rtx_insn *end_of_range = (next_range
+                                    ? next_range->producer->insn
+                                    : BB_END (m_bb));
+          propagate_for_debug (def_insn, end_of_range, dest, src, m_bb);
+        }
+      return true;
+    }
+
+  /* Try doing the new USE_INSN pattern in parallel with the DEF_INSN
+     pattern.  */
+  if (try_parallelize_insns (attempt))
+    return true;
+
+  cancel_changes (num_changes);
+  return false;
+}
+
+/* ATTEMPT describes an attempt to substitute the result of the first
+   instruction into the second instruction.  Try to implement it,
+   given that the first instruction sets DEST to SRC.  */
+
+bool
+combine2::try_combine_def_use (combination_attempt_rec &attempt,
+                               rtx dest, rtx src)
+{
+  rtx_insn *def_insn = attempt.sequence[0]->insn;
+  rtx_insn *use_insn = attempt.sequence[1]->insn;
+  rtx def_note = find_reg_equal_equiv_note (def_insn);
+
+  /* First try combining the instructions in their original form.  */
+  if (try_combine_def_use_1 (attempt, dest, src, false))
+    return true;
+
+  /* Try to replace DEST with a REG_EQUAL/EQUIV value instead.  */
+  if (def_note
+      && try_combine_def_use_1 (attempt, dest, XEXP (def_note, 0), true))
+    return true;
+
+  /* If USE_INSN has a REG_EQUAL/EQUIV note that refers to DEST, try
+     using that instead of the main pattern.  */
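+  /* For example (purely illustrative): if USE_INSN is a plain copy,
+     say (set (reg R) (reg T)), but carries a REG_EQUAL note of the
+     form (plus (reg S) (const_int 16)), and DEST is S, then moving
+     the note value into SET_SRC first may allow the definition of S
+     to fold into a single address-style operation, even though no
+     useful substitution is possible in the original pattern.  */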
+  for (rtx *link_ptr = &REG_NOTES (use_insn); *link_ptr;
+       link_ptr = &XEXP (*link_ptr, 1))
+    {
+      rtx use_note = *link_ptr;
+      if (REG_NOTE_KIND (use_note) != REG_EQUAL
+          && REG_NOTE_KIND (use_note) != REG_EQUIV)
+        continue;
+
+      rtx use_set = single_set (use_insn);
+      if (!use_set)
+        break;
+
+      if (!reg_overlap_mentioned_p (dest, XEXP (use_note, 0)))
+        continue;
+
+      /* Try snipping out the note and putting it in the SET instead.  */
+      validate_change (use_insn, link_ptr, XEXP (use_note, 1), 1);
+      validate_change (use_insn, &SET_SRC (use_set), XEXP (use_note, 0), 1);
+
+      if (try_combine_def_use_1 (attempt, dest, src, false))
+        return true;
+
+      if (def_note
+          && try_combine_def_use_1 (attempt, dest, XEXP (def_note, 0), true))
+        return true;
+
+      cancel_changes (0);
+    }
+
+  return false;
+}
+
+/* ATTEMPT describes an attempt to combine two instructions that use
+   the same resource.  Try to implement it, returning true on success.  */
+
+bool
+combine2::try_combine_two_uses (combination_attempt_rec &attempt)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "trying to parallelize %d and %d:\n",
+               INSN_UID (attempt.sequence[0]->insn),
+               INSN_UID (attempt.sequence[1]->insn));
+      dump_insn_slim (dump_file, attempt.sequence[0]->insn);
+      dump_insn_slim (dump_file, attempt.sequence[1]->insn);
+    }
+
+  return try_parallelize_insns (attempt);
+}
+
+/* Try to optimize instruction I1.  Return true on success.  */
+
+bool
+combine2::optimize_insn (insn_info_rec *i1)
+{
+  combination_attempt_rec attempt;
+
+  if (!combinable_insn_p (i1->insn, false))
+    return false;
+
+  rtx set = single_set (i1->insn);
+  if (!set)
+    return false;
+
+  /* First try combining I1 with a user of its result.  */
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  if (REG_P (dest) && REG_NREGS (dest) == 1)
+    for (live_range_rec **def = i1->defs; *def; ++def)
+      if ((*def)->regno == REGNO (dest))
+        {
+          for (unsigned int i = 0; i < NUM_RANGE_USERS; ++i)
+            {
+              insn_info_rec *use = (*def)->users[i];
+              if (use
+                  && combinable_insn_p (use->insn, has_single_use_p (*def))
+                  && start_combination (attempt, i1, use, *def)
+                  && try_combine_def_use (attempt, dest, src))
+                return true;
+            }
+          break;
+        }
+
+  /* Try parallelizing I1 and another instruction that uses the same
+     resource.  */
+  bitmap_clear (m_tried_insns);
+  for (live_range_rec **use = i1->uses; *use; ++use)
+    for (unsigned int i = 0; i < NUM_RANGE_USERS; ++i)
+      {
+        insn_info_rec *i2 = (*use)->users[i];
+        if (i2
+            && i2 != i1
+            && combinable_insn_p (i2->insn, false)
+            && bitmap_set_bit (m_tried_insns, INSN_UID (i2->insn))
+            && start_combination (attempt, i1, i2)
+            && try_combine_two_uses (attempt))
+          return true;
+      }
+
+  return false;
+}
+
+/* A note_stores callback.  Set the bool at *DATA to true if DEST is in
+   memory.  */
+
+static void
+find_mem_def (rtx dest, const_rtx, void *data)
+{
+  /* note_stores has stripped things like subregs and zero_extracts,
+     so we don't need to worry about them here.  */
+  if (MEM_P (dest))
+    *(bool *) data = true;
+}
+
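+/* Both the "defs" and "uses" arrays built by the routines below are
+   null-terminated arrays of live_range_rec pointers: entries are
+   accumulated on m_insn_obstack with obstack_ptr_grow and the array
+   is closed off with a final null pointer before obstack_finish.  */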
+/* Record all register and memory definitions in INSN_INFO and fill in its
+   "defs" list.  */
+
+void
+combine2::record_defs (insn_info_rec *insn_info)
+{
+  rtx_insn *insn = insn_info->insn;
+
+  /* Record register definitions.  */
+  df_ref def;
+  FOR_EACH_INSN_DEF (def, insn)
+    {
+      rtx reg = DF_REF_REAL_REG (def);
+      unsigned int end_regno = END_REGNO (reg);
+      for (unsigned int regno = REGNO (reg); regno < end_regno; ++regno)
+        {
+          live_range_rec *range = reg_live_range (regno);
+          range->producer = insn_info;
+          m_reg_info[regno].live_p = false;
+          obstack_ptr_grow (&m_insn_obstack, range);
+        }
+    }
+
+  /* If the instruction writes to memory, record that too.  */
+  bool saw_mem_p = false;
+  note_stores (insn, find_mem_def, &saw_mem_p);
+  if (saw_mem_p)
+    {
+      live_range_rec *range = mem_live_range ();
+      range->producer = insn_info;
+      obstack_ptr_grow (&m_insn_obstack, range);
+    }
+
+  /* Complete the list of definitions.  */
+  obstack_ptr_grow (&m_insn_obstack, NULL);
+  insn_info->defs = (live_range_rec **) obstack_finish (&m_insn_obstack);
+}
+
+/* Record that INSN_INFO contains register use USE.  If this requires
+   new entries to be added to INSN_INFO->uses, add those entries to the
+   list we're building in m_insn_obstack.  */
+
+void
+combine2::record_reg_use (insn_info_rec *insn_info, df_ref use)
+{
+  rtx reg = DF_REF_REAL_REG (use);
+  unsigned int end_regno = END_REGNO (reg);
+  for (unsigned int regno = REGNO (reg); regno < end_regno; ++regno)
+    {
+      live_range_rec *range = reg_live_range (regno);
+      if (add_range_use (range, insn_info))
+        obstack_ptr_grow (&m_insn_obstack, range);
+      m_reg_info[regno].live_p = true;
+    }
+}
+
+/* A note_uses callback.  Set the bool at *DATA to true if *LOC reads
+   from variable memory.  */
+
+static void
+find_mem_use (rtx *loc, void *data)
+{
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, *loc, NONCONST)
+    if (MEM_P (*iter) && !MEM_READONLY_P (*iter))
+      {
+        *(bool *) data = true;
+        break;
+      }
+}
+
+/* Record all register and memory uses in INSN_INFO and fill in its
+   "uses" list.  */
+
+void
+combine2::record_uses (insn_info_rec *insn_info)
+{
+  rtx_insn *insn = insn_info->insn;
+
+  /* Record register uses in the main pattern.  */
+  df_ref use;
+  FOR_EACH_INSN_USE (use, insn)
+    record_reg_use (insn_info, use);
+
+  /* Treat REG_EQUAL uses as first-class uses.  We don't lose much
+     by doing that, since it's rare for a REG_EQUAL note to mention
+     registers that the main pattern doesn't.  It also gives us the
+     maximum freedom to use REG_EQUAL notes in place of the main pattern.  */
+  FOR_EACH_INSN_EQ_USE (use, insn)
+    record_reg_use (insn_info, use);
+
+  /* Record a memory use if either the pattern or the notes read from
+     memory.  */
+  bool saw_mem_p = false;
+  note_uses (&PATTERN (insn), find_mem_use, &saw_mem_p);
+  for (rtx note = REG_NOTES (insn); !saw_mem_p && note; note = XEXP (note, 1))
+    if (REG_NOTE_KIND (note) == REG_EQUAL
+        || REG_NOTE_KIND (note) == REG_EQUIV)
+      note_uses (&XEXP (note, 0), find_mem_use, &saw_mem_p);
+  if (saw_mem_p)
+    {
+      live_range_rec *range = mem_live_range ();
+      if (add_range_use (range, insn_info))
+        obstack_ptr_grow (&m_insn_obstack, range);
+    }
+
+  /* Complete the list of uses.  */
+  obstack_ptr_grow (&m_insn_obstack, NULL);
+  insn_info->uses = (live_range_rec **) obstack_finish (&m_insn_obstack);
+}
+
+/* Start a new instruction sequence, discarding all information about
+   the previous one.  */
+
+void
+combine2::start_sequence (void)
+{
+  m_end_of_sequence = m_point;
+  m_mem_range = NULL;
+  m_points.truncate (0);
+  obstack_free (&m_insn_obstack, m_insn_obstack_start);
+  obstack_free (&m_range_obstack, m_range_obstack_start);
+}
+
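+/* Note that execute () starts a fresh sequence for each basic block
+   and restarts it at every call instruction it encounters, so
+   combination attempts never cross a call or a block boundary.  */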
+/* Run the pass on the current function.  */
+
+void
+combine2::execute (void)
+{
+  df_analyze ();
+  FOR_EACH_BB_FN (m_bb, cfun)
+    {
+      m_optimize_for_speed_p = optimize_bb_for_speed_p (m_bb);
+      m_end_of_bb = m_point;
+      start_sequence ();
+
+      rtx_insn *insn, *prev;
+      FOR_BB_INSNS_REVERSE_SAFE (m_bb, insn, prev)
+        {
+          if (!NONDEBUG_INSN_P (insn))
+            continue;
+
+          /* The current m_point represents the end of the sequence if
+             INSN is the last instruction in the sequence, otherwise it
+             represents the gap between INSN and the next instruction.
+             m_point + 1 represents INSN itself.
+
+             Instructions can be added to m_point by inserting them
+             after INSN.  They can be added to m_point + 1 by inserting
+             them before INSN.  */
+          m_points.safe_push (insn);
+          m_point += 1;
+
+          insn_info_rec *insn_info = XOBNEW (&m_insn_obstack, insn_info_rec);
+          insn_info->insn = insn;
+          insn_info->point = m_point;
+          insn_info->cost = UNKNOWN_COST;
+
+          record_defs (insn_info);
+          record_uses (insn_info);
+
+          /* Set up m_point for the next instruction.  */
+          m_point += 1;
+
+          if (CALL_P (insn))
+            start_sequence ();
+          else
+            while (optimize_insn (insn_info))
+              gcc_assert (insn_info->insn);
+        }
+    }
+
+  /* If an instruction changes the cfg, update the containing block
+     accordingly.  */
+  rtx_insn *insn;
+  unsigned int i;
+  FOR_EACH_VEC_ELT (m_cfg_altering_insns, i, insn)
+    if (JUMP_P (insn))
+      {
+        mark_jump_label (PATTERN (insn), insn, 0);
+        update_cfg_for_uncondjump (insn);
+      }
+    else
+      {
+        remove_edge (split_block (BLOCK_FOR_INSN (insn), insn));
+        emit_barrier_after_bb (BLOCK_FOR_INSN (insn));
+      }
+
+  /* Propagate the above block-local cfg changes to the rest of the cfg.  */
+  if (!m_cfg_altering_insns.is_empty ())
+    {
+      if (dom_info_available_p (CDI_DOMINATORS))
+        free_dominance_info (CDI_DOMINATORS);
+      timevar_push (TV_JUMP);
+      rebuild_jump_labels (get_insns ());
+      cleanup_cfg (0);
+      timevar_pop (TV_JUMP);
+    }
+}
+
+const pass_data pass_data_combine2 =
+{
+  RTL_PASS, /* type */
+  "combine2", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_COMBINE2, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_df_finish, /* todo_flags_finish */
+};
+
+class pass_combine2 : public rtl_opt_pass
+{
+public:
+  pass_combine2 (gcc::context *ctxt, int flag)
+    : rtl_opt_pass (pass_data_combine2, ctxt), m_flag (flag)
+  {}
+
+  bool
+  gate (function *) OVERRIDE
+  {
+    return optimize && (param_run_combine & m_flag) != 0;
+  }
+
+  unsigned int
+  execute (function *f) OVERRIDE
+  {
+    combine2 (f).execute ();
+    return 0;
+  }
+
+private:
+  unsigned int m_flag;
+}; // class pass_combine2
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_combine2_before (gcc::context *ctxt)
+{
+  return new pass_combine2 (ctxt, 1);
+}
+
+rtl_opt_pass *
+make_pass_combine2_after (gcc::context *ctxt)
+{
+  return new pass_combine2 (ctxt, 4);
+}