From patchwork Mon Apr 27 10:20:30 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 464931 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 5817614007D for ; Mon, 27 Apr 2015 20:20:49 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass reason="1024-bit key; unprotected key" header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=xSpJfUeb; dkim-adsp=none (unprotected policy); dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; q=dns; s= default; b=v76KIMZNKPrL0aY0mHVGW2p1BrsSKScyIWaAvtoIHNs+IRxHLHqtB qWg44Ky6HNY2XDd4CairRs9eXjd5MZ4r5lQwc0FZtoXN2XfqLuHYM7TBlTP2+DFn 4AaLICKEpxIebjM3hlJ6V4YR/8rqo7D4zdceqp3A3NcfTgLMeRIkmo= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; s= default; bh=EHg5ufkA/HwzRLtEvFDG+2+Al5Q=; b=xSpJfUebfK4aTXFEknnh p8HRmqCE2QeLLtaMiZSv4iszsSDkK++ObSF+vAXy5X6O7E1yUIZ+Bv9n6ITQ5NVL 02fcWrvHz8lJvetUiQ7MSSQBZ6yB9QuRvPTTSA1MY69YW5dJAM+T43AR2aSQacV0 maO/hjO7zsS6TeEHPGAZNyk= Received: (qmail 42058 invoked by alias); 27 Apr 2015 10:20:41 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 41490 invoked by uid 89); 27 Apr 2015 10:20:38 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.5 required=5.0 tests=AWL, BAYES_50, SPF_PASS autolearn=ham version=3.3.2 X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (207.82.80.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 27 Apr 2015 10:20:34 +0000 Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by uk-mta-4.uk.mimecast.lan; Mon, 27 Apr 2015 11:20:31 +0100 Received: from localhost ([10.1.2.79]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Mon, 27 Apr 2015 11:20:30 +0100 From: Richard Sandiford To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com Subject: Mostly rewrite genrecog Date: Mon, 27 Apr 2015 11:20:30 +0100 Message-ID: <87egn5yis1.fsf@e105548-lin.cambridge.arm.com> User-Agent: Gnus/5.130012 (Ma Gnus v0.12) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 X-MC-Unique: waWyUAneS_CqxksFf1pgMw-1 I think it's been the case for a while that parallel builds of GCC tend to serialise around the compilation of insn-recog.c, especially with higher --enable-checking settings. This patch tries to speed that up by replacing most of genrecog with a new algorithm. I think the main problems with the current code are: 1. Vector architectures have added lots of new instructions that have a similar shape and differ only in mode, code or unspec number. The current algorithm doesn't have any way of factoring out those similarities. 2. When matching a particular instruction, the current code examines everything about a SET_DEST before moving on to the SET_SRC. This has two subproblems: 2a. The destination of a SET isn't very distinctive. It's usually just a register_operand, a memory_operand, a nonimmediate_operand or a flags register. We therefore tend to backtrack to the SET_DEST a lot, oscillating between groups of instructions with the same kind of destination. 2b. Backtracking through predicate checks is relatively expensive. It would be good to narrow down the "shape" of the instruction first and only then check the predicates. (The backtracking is expensive in terms of insn-recog.o compile time too, both because we need to copy into argument registers and out of the result register, and because it adds more sites where spills are needed.) 3. The code keeps one local variable per rtx depth, so it ends up loading the same rtx many times over (mostly when backtracking). This is very expensive in rtl-checking builds because each XEXP includes a code check and a line-specific failure call. In principle the idea of having one local variable per depth is good. But it was originally written that way when all optimisations were done at the rtl level and I imagine each local variable mapped to one pseudo register. These days the statements that reload the value needed on backtracking lead to many more SSA names and phi statements than you'd get with just a single variable per position (loaded once, so naturally SSA). There is still the potential benefit of avoiding having sibling rtxes live at once, but fixing (2) above reduces that problem. Also, the code is all goto-based, which makes it rather hard to step through. The patch deals with these as follows: 1. Detect subpatterns that differ only by mode, code and/or integer (e.g. unspec number) and split them out into a common routine. 2. Match the "shape" of the instruction first, in terms of codes, integers and vector lengths, and only then check the modes, predicates and dups. When checking the shape, handle SET_SRCs before SET_DESTs. In practice this seems to greatly reduce the amount of backtracking. 3. Have one local variable per rtx position. I tested the patch with and without the change and it helped a lot with rtl-checking builds without seeming to affect release builds much either way. As far as debuggability goes, the new code avoids gotos and just uses "natural" control flow. The headline stat is that a stage 3 --enable-checking=yes,rtl,df build of insn-recog.c on my box goes from 7m43s to 2m2s (using the same stage 2 compiler). The corresponding --enable-checking=release change is from 49s to 24s (less impressive, as expected). The patch seems to speed up recog. E.g. the time taken to build fold-const.ii goes from 6.74s before the patch to 6.69s after it; not a big speed-up, but reproducible. Here's a comparison of the number of lines of code in insn-recog.c before and after the patch on one target per config/ CPU: aarch64-linux-gnueabi 115526 38169 : 33.04% alpha-linux-gnu 24479 10740 : 43.87% arm-linux-gnueabi 169208 67759 : 40.04% avr-rtems 55647 22127 : 39.76% bfin-elf 13928 6498 : 46.65% c6x-elf 29928 13324 : 44.52% cr16-elf 2650 1419 : 53.55% cris-elf 18669 7257 : 38.87% epiphany-elf 19308 6131 : 31.75% fr30-elf 2204 1112 : 50.45% frv-linux-gnu 13541 5950 : 43.94% h8300-elf 19584 9327 : 47.63% hppa64-hp-hpux11.23 18299 8549 : 46.72% ia64-linux-gnu 37629 17101 : 45.45% iq2000-elf 2752 1609 : 58.47% lm32-elf 1536 872 : 56.77% m32c-elf 10040 4145 : 41.28% m32r-elf 4436 2307 : 52.01% m68k-linux-gnu 15739 8147 : 51.76% mcore-elf 4816 2577 : 53.51% mep-elf 67335 15929 : 23.66% microblaze-elf 2656 1587 : 59.75% mips-linux-gnu 54543 24186 : 44.34% mmix 2597 1487 : 57.26% mn10300-elf 6384 3294 : 51.60% moxie-elf 1311 659 : 50.27% msp430-elf 6054 2382 : 39.35% nds32le-elf 5953 3152 : 52.95% nios2-linux-gnu 3735 2143 : 57.38% pdp11 2137 1157 : 54.14% powerpc-eabispe 109322 40582 : 37.12% powerpc-linux-gnu 82976 32192 : 38.80% rl78-elf 5321 2432 : 45.71% rx-elf 14454 7534 : 52.12% s390-linux-gnu 48487 20454 : 42.18% sh-linux-gnu 104087 41820 : 40.18% sparc-linux-gnu 21912 10509 : 47.96% spu-elf 19937 8182 : 41.04% tilegx-elf 15412 6970 : 45.22% tilepro-elf 11998 5479 : 45.67% v850-elf 8725 4438 : 50.87% vax-netbsdelf 4537 2410 : 53.12% visium-elf 15190 7224 : 47.56% x86_64-darwin 346396 119593 : 34.52% xstormy16-elf 4660 2229 : 47.83% xtensa-elf 2799 1514 : 54.09% Here's the loadable size of insn-recog.o in an --enable-checking=release build on an x86_64-linux-gnu box: aarch64-linux-gnueabi 443955 298026 : 67.13% alpha-linux-gnu 97194 80893 : 83.23% arm-linux-gnueabi 782325 632248 : 80.82% avr-rtems 226827 159763 : 70.43% bfin-elf 52563 42376 : 80.62% c6x-elf 112512 90142 : 80.12% cr16-elf 10446 10006 : 95.79% cris-elf 74771 52634 : 70.39% epiphany-elf 87577 52284 : 59.70% fr30-elf 8041 7713 : 95.92% frv-linux-gnu 53699 47543 : 88.54% h8300-elf 70708 66274 : 93.73% hppa64-hp-hpux11.23 71597 57484 : 80.29% ia64-linux-gnu 147286 130632 : 88.69% iq2000-elf 11002 11774 : 107.02% lm32-elf 5894 5798 : 98.37% m32c-elf 36563 28855 : 78.92% m32r-elf 17252 16910 : 98.02% m68k-linux-gnu 58248 59781 : 102.63% mcore-elf 17708 18948 : 107.00% mep-elf 314466 150771 : 47.95% microblaze-elf 10257 10534 : 102.70% mips-linux-gnu 230991 191155 : 82.75% mmix 10782 10678 : 99.04% mn10300-elf 24035 22802 : 94.87% moxie-elf 4622 4198 : 90.83% msp430-elf 21707 16539 : 76.19% nds32le-elf 22041 19444 : 88.22% nios2-linux-gnu 15595 13238 : 84.89% pdp11 7630 8254 : 108.18% powerpc-eabispe 430816 308481 : 71.60% powerpc-linux-gnu 317738 248534 : 78.22% rl78-elf 18904 16329 : 86.38% rx-elf 55015 56632 : 102.94% s390-linux-gnu 190584 148961 : 78.16% sh-linux-gnu 408446 307927 : 75.39% sparc-linux-gnu 91016 80640 : 88.60% spu-elf 80387 69151 : 86.02% tilegx-elf 63815 49977 : 78.32% tilepro-elf 51763 39252 : 75.83% v850-elf 32812 28462 : 86.74% vax-netbsdelf 18350 18259 : 99.50% visium-elf 56872 46790 : 82.27% x86_64-darwin 1306182 883169 : 67.61% xstormy16-elf 17044 14430 : 84.66% xtensa-elf 10780 9678 : 89.78% The same for --enable-checking=yes,rtl: aarch64-linux-gnueabi 1790272 507488 : 28.35% alpha-linux-gnu 440058 193826 : 44.05% arm-linux-gnueabi 2845568 1299337 : 45.66% avr-rtems 885672 294387 : 33.24% bfin-elf 280606 142836 : 50.90% c6x-elf 486345 259256 : 53.31% cr16-elf 46626 20044 : 42.99% cris-elf 426813 144414 : 33.84% epiphany-elf 353078 122166 : 34.60% fr30-elf 40414 21042 : 52.07% frv-linux-gnu 259550 111335 : 42.90% h8300-elf 355199 158411 : 44.60% hppa64-hp-hpux11.23 340584 149009 : 43.75% ia64-linux-gnu 661364 293710 : 44.41% iq2000-elf 41123 26709 : 64.95% lm32-elf 20370 14781 : 72.56% m32c-elf 174344 62000 : 35.56% m32r-elf 74357 41773 : 56.18% m68k-linux-gnu 275733 117445 : 42.59% mcore-elf 85180 48018 : 56.37% mep-elf 1450168 376020 : 25.93% microblaze-elf 44189 26295 : 59.51% mips-linux-gnu 876650 375753 : 42.86% mmix 49882 25363 : 50.85% mn10300-elf 128148 66768 : 52.10% moxie-elf 23388 9011 : 38.53% msp430-elf 114200 34426 : 30.15% nds32le-elf 101416 73677 : 72.65% nios2-linux-gnu 58799 29825 : 50.72% pdp11 32836 18557 : 56.51% powerpc-eabispe 1976098 626942 : 31.73% powerpc-linux-gnu 1510652 526841 : 34.88% rl78-elf 93675 40538 : 43.28% rx-elf 279748 137284 : 49.07% s390-linux-gnu 857009 316494 : 36.93% sh-linux-gnu 2154337 806571 : 37.44% sparc-linux-gnu 367682 169019 : 45.97% spu-elf 341945 135629 : 39.66% tilegx-elf 235480 112103 : 47.61% tilepro-elf 246231 104137 : 42.29% v850-elf 158028 72875 : 46.12% vax-netbsdelf 85057 37578 : 44.18% visium-elf 257148 103331 : 40.18% x86_64-darwin 5514235 1721777 : 31.22% xstormy16-elf 83456 46128 : 55.27% xtensa-elf 52652 29880 : 56.75% Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-none-eabi. Also tested by building the testsuite for each of the targets above and making sure there were no assembly differences. Made sure that no objects in spec2k6 changed for aarch64-linux-gnu (except for perlbench perl.o and cactusADM datestamp.o, where the differences are timestamps). OK to install? Thanks, Richard PS. I've attached the new genrecog.c since the diff version is unreadable. gcc/ * Makefile.in (build/genrecog.o): Depend on inchash.h. (build/genrecog$(build_exeext): Depend on build/hash-table.o and build/inchash.o * genrecog.c: Rewrite most of the code except for the third page. Index: gcc/Makefile.in =================================================================== --- gcc/Makefile.in 2015-04-27 10:42:57.783191573 +0100 +++ gcc/Makefile.in 2015-04-27 10:43:42.878643078 +0100 @@ -2527,7 +2527,8 @@ build/genpeep.o : genpeep.c $(RTL_BASE_H build/genpreds.o : genpreds.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \ coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h $(OBSTACK_H) build/genrecog.o : genrecog.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \ - coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h + coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h \ + $(HASH_TABLE_H) inchash.h build/genhooks.o : genhooks.c $(TARGET_DEF) $(C_TARGET_DEF) \ $(COMMON_TARGET_DEF) $(BCONFIG_H) $(SYSTEM_H) errors.h build/genmddump.o : genmddump.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \ @@ -2559,6 +2560,8 @@ genprog = $(genprogerr) check checksum c # These programs need libs over and above what they get from the above list. build/genautomata$(build_exeext) : BUILD_LIBS += -lm +build/genrecog$(build_exeext) : build/hash-table.o build/inchash.o + # For stage1 and when cross-compiling use the build libcpp which is # built with NLS disabled. For stage2+ use the host library and # its dependencies.