From patchwork Mon Apr 27 10:20:30 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford <richard.sandiford@arm.com>
X-Patchwork-Id: 464931
From: Richard Sandiford <richard.sandiford@arm.com>
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: Mostly rewrite genrecog
Date: Mon, 27 Apr 2015 11:20:30 +0100
Message-ID: <87egn5yis1.fsf@e105548-lin.cambridge.arm.com>

I think it's been the case for a while that parallel builds of GCC tend
to serialise around the compilation of insn-recog.c, especially with
higher --enable-checking settings.  This patch tries to speed that
up by replacing most of genrecog with a new algorithm.

I think the main problems with the current code are:

1. Vector architectures have added lots of new instructions that have
   a similar shape and differ only in mode, code or unspec number.
   The current algorithm doesn't have any way of factoring out those
   similarities.

2. When matching a particular instruction, the current code examines
   everything about a SET_DEST before moving on to the SET_SRC.  This has
   two subproblems:

   2a. The destination of a SET isn't very distinctive.  It's usually
       just a register_operand, a memory_operand, a nonimmediate_operand
       or a flags register.  We therefore tend to backtrack to the
       SET_DEST a lot, oscillating between groups of instructions with
       the same kind of destination.

   2b. Backtracking through predicate checks is relatively expensive.
       It would be good to narrow down the "shape" of the instruction
       first and only then check the predicates.  (The backtracking is
       expensive in terms of insn-recog.o compile time too, both because
       we need to copy into argument registers and out of the result
       register, and because it adds more sites where spills are needed.)

3. The code keeps one local variable per rtx depth, so it ends up
   loading the same rtx many times over (mostly when backtracking).
   This is very expensive in rtl-checking builds because each XEXP
   includes a code check and a line-specific failure call.

   In principle the idea of having one local variable per depth
   is good.  But it was originally written that way when all optimisations
   were done at the rtl level, and I imagine each local variable mapped
   to a single pseudo register.  These days the statements that reload the
   value needed on backtracking lead to many more SSA names and phi
   statements than you'd get with just a single variable per position
   (loaded once, so naturally SSA).  There is still the potential benefit
   of avoiding having sibling rtxes live at once, but fixing (2) above
   reduces that problem.
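A toy sketch of problem 3, using made-up toy_* types rather than GCC's rtl.h (the layout and the three patterns are assumptions for illustration only).  toy_xexp counts operand loads as a stand-in for a checked XEXP, and the same three alternatives are matched dest-first with one variable per depth, then with one variable per position.

```c
/* Hypothetical toy types; not GCC's rtl.h and not real genrecog output.  */
enum toy_code { TOY_REG, TOY_MEM, TOY_PLUS, TOY_MINUS, TOY_SET };

struct toy_rtx
{
  enum toy_code code;
  struct toy_rtx *ops[2];   /* For TOY_SET: ops[0] = dest, ops[1] = src.  */
};

/* Number of operand loads so far; each models one checked XEXP.  */
static int load_count;

static struct toy_rtx *
toy_xexp (struct toy_rtx *x, int n)
{
  load_count++;
  return x->ops[n];
}

/* Dest-first matching with one variable per depth.  X0 is assumed to be
   a TOY_SET.  Returns 1 for (set (mem) (plus)), 2 for (set (reg) (plus)),
   3 for (set (reg) (minus)) and 0 otherwise.  */
static int
match_per_depth (struct toy_rtx *x0)
{
  struct toy_rtx *x1;

  x1 = toy_xexp (x0, 0);          /* Depth-1 variable holds SET_DEST...  */
  if (x1->code == TOY_MEM)
    {
      x1 = toy_xexp (x0, 1);      /* ...then is overwritten by SET_SRC.  */
      if (x1->code == TOY_PLUS)
        return 1;
      x1 = toy_xexp (x0, 0);      /* Backtracking: reload SET_DEST.  */
    }
  if (x1->code == TOY_REG)
    {
      x1 = toy_xexp (x0, 1);
      if (x1->code == TOY_PLUS)
        return 2;
      if (x1->code == TOY_MINUS)
        return 3;
    }
  return 0;
}

/* Same decisions with one variable per position: each operand is
   loaded exactly once, however the alternatives are tried.  */
static int
match_per_position (struct toy_rtx *x0)
{
  struct toy_rtx *dest = toy_xexp (x0, 0);
  struct toy_rtx *src = toy_xexp (x0, 1);

  if (dest->code == TOY_MEM && src->code == TOY_PLUS)
    return 1;
  if (dest->code == TOY_REG)
    {
      if (src->code == TOY_PLUS)
        return 2;
      if (src->code == TOY_MINUS)
        return 3;
    }
  return 0;
}
```

On (set (mem) (minus)), which matches nothing, the per-depth version loads SET_DEST twice (three loads in all) while the per-position version makes just two.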

Also, the code is all goto-based, which makes it rather hard to step through.

The patch deals with these as follows:

1. Detect subpatterns that differ only by mode, code and/or integer
   (e.g. unspec number) and split them out into a common routine.

2. Match the "shape" of the instruction first, in terms of codes,
   integers and vector lengths, and only then check the modes, predicates
   and dups.  When checking the shape, handle SET_SRCs before SET_DESTs.
   In practice this seems to greatly reduce the amount of backtracking.

3. Have one local variable per rtx position.  I tested the patch with
   and without this change; it helped a lot with rtl-checking builds
   without seeming to affect release builds much either way.
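A toy sketch of point 1, assuming four hypothetical patterns (addsi3, adddi3, subsi3, subdi3) that share the shape (CODE:M (reg:M) (reg:M)).  The operand checks live in one routine that takes the mode as a parameter, and only the final (code, mode) pair selects the pattern.  The toy_* names are illustrative, not GCC's API, and real genrecog output will look different.

```c
/* Hypothetical toy types; not GCC's rtl.h and not real genrecog output.  */
enum toy_code { TOY_REG, TOY_PLUS, TOY_MINUS };
enum toy_mode { TOY_SImode, TOY_DImode };

struct toy_rtx
{
  enum toy_code code;
  enum toy_mode mode;
  struct toy_rtx *ops[2];
};

/* Hypothetical insn numbers for four patterns that differ only in
   rtx code (plus/minus) and mode (SI/DI).  */
enum toy_insn
{
  TOY_NO_MATCH = -1, TOY_ADDSI3, TOY_ADDDI3, TOY_SUBSI3, TOY_SUBDI3
};

/* Common routine shared by all four patterns: checks that both operands
   of X are registers in mode M.  Without factoring, this check would be
   emitted once per pattern.  */
static int
reg_reg_operands_p (const struct toy_rtx *x, enum toy_mode m)
{
  return (x->ops[0]->code == TOY_REG && x->ops[0]->mode == m
          && x->ops[1]->code == TOY_REG && x->ops[1]->mode == m);
}

/* Matches (CODE:M (reg:M) (reg:M)), then maps the (code, mode) pair
   that distinguishes the patterns onto an insn number.  */
static int
recog_toy (const struct toy_rtx *x)
{
  int is_di;

  if (x->code != TOY_PLUS && x->code != TOY_MINUS)
    return TOY_NO_MATCH;
  if (!reg_reg_operands_p (x, x->mode))
    return TOY_NO_MATCH;
  is_di = (x->mode == TOY_DImode);
  if (x->code == TOY_PLUS)
    return is_di ? TOY_ADDDI3 : TOY_ADDSI3;
  return is_di ? TOY_SUBDI3 : TOY_SUBSI3;
}
```

Adding an HI or QI variant of the same shape would then only extend the final mapping, not duplicate the operand checks.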

As far as debuggability goes, the new code avoids gotos and just
uses "natural" control flow.
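A toy sketch of the shape-first order from point 2, written with plain switch/if and no gotos.  The three patterns and the toy_* names are hypothetical; toy_register_operand only stands in for a predicate like register_operand and counts its calls.  Phase 1 looks only at rtx codes, SET_SRC before SET_DEST; phase 2 runs predicates once a single candidate is left.

```c
/* Hypothetical toy types; not GCC's rtl.h and not real genrecog output.  */
enum toy_code { TOY_REG, TOY_MEM, TOY_PLUS, TOY_CONST_INT, TOY_SET };
enum toy_mode { TOY_SImode, TOY_DImode };

struct toy_rtx
{
  enum toy_code code;
  enum toy_mode mode;
  const struct toy_rtx *ops[2];   /* For TOY_SET: ops[0] = dest, ops[1] = src.  */
};

/* Number of predicate calls, the part that is costly to backtrack over.  */
static int predicate_calls;

/* Stand-in for a predicate such as register_operand.  */
static int
toy_register_operand (const struct toy_rtx *x, enum toy_mode m)
{
  predicate_calls++;
  return x->code == TOY_REG && x->mode == m;
}

/* Returns 0 for (set (reg) (plus (reg) (const_int))), 1 for a load
   (set (reg) (mem)), 2 for a store (set (mem) (reg)), -1 otherwise.  */
static int
recog_shape_first (const struct toy_rtx *x)
{
  const struct toy_rtx *dest, *src;

  if (x->code != TOY_SET)
    return -1;
  dest = x->ops[0];
  src = x->ops[1];
  /* Phase 1: dispatch on SET_SRC's code, then check SET_DEST's code.  */
  switch (src->code)
    {
    case TOY_PLUS:
      if (dest->code != TOY_REG
          || src->ops[0]->code != TOY_REG
          || src->ops[1]->code != TOY_CONST_INT)
        return -1;
      /* Phase 2: the shape is unique, so run the predicates.  */
      return (toy_register_operand (dest, src->mode)
              && toy_register_operand (src->ops[0], src->mode)) ? 0 : -1;
    case TOY_MEM:
      if (dest->code != TOY_REG)
        return -1;
      return toy_register_operand (dest, src->mode) ? 1 : -1;
    case TOY_REG:
      if (dest->code != TOY_MEM)
        return -1;
      return toy_register_operand (src, dest->mode) ? 2 : -1;
    default:
      return -1;
    }
}
```

A candidate with the wrong shape, such as (set (mem) (plus (reg) (const_int))), is rejected without calling any predicate.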

The headline stat is that a stage 3 --enable-checking=yes,rtl,df
build of insn-recog.c on my box goes from 7m43s to 2m2s (using the
same stage 2 compiler).  The corresponding --enable-checking=release
change is from 49s to 24s (less impressive, as expected).

The patch seems to speed up recog.  E.g. the time taken to build
fold-const.ii goes from 6.74s before the patch to 6.69s after it;
not a big speed-up, but reproducible.
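For scale: 7m43s is 463 s and 2m2s is 122 s, so the checking build of insn-recog.c is roughly 3.8 times faster, the release build (49 s to 24 s) is roughly 2 times faster, and the fold-const.ii time drops by about 0.7%.  The helpers below only reproduce that arithmetic; their names are made up for illustration.

```c
/* Hypothetical helpers; they only restate the timing arithmetic.  */

/* Converts a duration of MINUTES minutes and SECONDS seconds to seconds.  */
static int
duration_seconds (int minutes, int seconds)
{
  return minutes * 60 + seconds;
}

/* Returns how many times faster a job became when its time dropped
   from BEFORE to AFTER (both in the same unit).  */
static double
speedup_factor (double before, double after)
{
  return before / after;
}

/* Returns the time saved as a percentage of the original time.  */
static double
percent_saved (double before, double after)
{
  return 100.0 * (before - after) / before;
}
```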

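Here's a comparison of the number of lines of code in insn-recog.c
before and after the patch on one target per CPU directory in config/.
The last column gives the "after" count as a percentage of "before":

Target                                          before    after : after/before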
aarch64-linux-gnueabi                           115526    38169 :   33.04%
alpha-linux-gnu                                  24479    10740 :   43.87%
arm-linux-gnueabi                               169208    67759 :   40.04%
avr-rtems                                        55647    22127 :   39.76%
bfin-elf                                         13928     6498 :   46.65%
c6x-elf                                          29928    13324 :   44.52%
cr16-elf                                          2650     1419 :   53.55%
cris-elf                                         18669     7257 :   38.87%
epiphany-elf                                     19308     6131 :   31.75%
fr30-elf                                          2204     1112 :   50.45%
frv-linux-gnu                                    13541     5950 :   43.94%
h8300-elf                                        19584     9327 :   47.63%
hppa64-hp-hpux11.23                              18299     8549 :   46.72%
ia64-linux-gnu                                   37629    17101 :   45.45%
iq2000-elf                                        2752     1609 :   58.47%
lm32-elf                                          1536      872 :   56.77%
m32c-elf                                         10040     4145 :   41.28%
m32r-elf                                          4436     2307 :   52.01%
m68k-linux-gnu                                   15739     8147 :   51.76%
mcore-elf                                         4816     2577 :   53.51%
mep-elf                                          67335    15929 :   23.66%
microblaze-elf                                    2656     1587 :   59.75%
mips-linux-gnu                                   54543    24186 :   44.34%
mmix                                              2597     1487 :   57.26%
mn10300-elf                                       6384     3294 :   51.60%
moxie-elf                                         1311      659 :   50.27%
msp430-elf                                        6054     2382 :   39.35%
nds32le-elf                                       5953     3152 :   52.95%
nios2-linux-gnu                                   3735     2143 :   57.38%
pdp11                                             2137     1157 :   54.14%
powerpc-eabispe                                 109322    40582 :   37.12%
powerpc-linux-gnu                                82976    32192 :   38.80%
rl78-elf                                          5321     2432 :   45.71%
rx-elf                                           14454     7534 :   52.12%
s390-linux-gnu                                   48487    20454 :   42.18%
sh-linux-gnu                                    104087    41820 :   40.18%
sparc-linux-gnu                                  21912    10509 :   47.96%
spu-elf                                          19937     8182 :   41.04%
tilegx-elf                                       15412     6970 :   45.22%
tilepro-elf                                      11998     5479 :   45.67%
v850-elf                                          8725     4438 :   50.87%
vax-netbsdelf                                     4537     2410 :   53.12%
visium-elf                                       15190     7224 :   47.56%
x86_64-darwin                                   346396   119593 :   34.52%
xstormy16-elf                                     4660     2229 :   47.83%
xtensa-elf                                        2799     1514 :   54.09%
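The last column of each table is the "after" figure divided by the "before" figure, shown as a percentage to two decimal places.  A one-line C helper (hypothetical name) that reproduces it in hundredths of a percent, so 3304 means 33.04%:

```c
/* Returns AFTER as a percentage of BEFORE in hundredths of a percent,
   rounded to nearest (3304 means 33.04%).  Hypothetical helper.  */
static long
percent_of_before_x100 (long before, long after)
{
  return (long) (10000.0 * (double) after / (double) before + 0.5);
}
```

For example, aarch64-linux-gnueabi (115526 to 38169) gives 3304 and cr16-elf (2650 to 1419) gives 5355.  In the object-size tables, a figure above 100% (for example iq2000-elf at 107.02%) means insn-recog.o grew.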

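Here's the loadable size of insn-recog.o before and after the patch in
an --enable-checking=release build on an x86_64-linux-gnu box.  The last
column gives the "after" size as a percentage of "before":

Target                                          before    after : after/before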
aarch64-linux-gnueabi                           443955   298026 :   67.13%
alpha-linux-gnu                                  97194    80893 :   83.23%
arm-linux-gnueabi                               782325   632248 :   80.82%
avr-rtems                                       226827   159763 :   70.43%
bfin-elf                                         52563    42376 :   80.62%
c6x-elf                                         112512    90142 :   80.12%
cr16-elf                                         10446    10006 :   95.79%
cris-elf                                         74771    52634 :   70.39%
epiphany-elf                                     87577    52284 :   59.70%
fr30-elf                                          8041     7713 :   95.92%
frv-linux-gnu                                    53699    47543 :   88.54%
h8300-elf                                        70708    66274 :   93.73%
hppa64-hp-hpux11.23                              71597    57484 :   80.29%
ia64-linux-gnu                                  147286   130632 :   88.69%
iq2000-elf                                       11002    11774 :  107.02%
lm32-elf                                          5894     5798 :   98.37%
m32c-elf                                         36563    28855 :   78.92%
m32r-elf                                         17252    16910 :   98.02%
m68k-linux-gnu                                   58248    59781 :  102.63%
mcore-elf                                        17708    18948 :  107.00%
mep-elf                                         314466   150771 :   47.95%
microblaze-elf                                   10257    10534 :  102.70%
mips-linux-gnu                                  230991   191155 :   82.75%
mmix                                             10782    10678 :   99.04%
mn10300-elf                                      24035    22802 :   94.87%
moxie-elf                                         4622     4198 :   90.83%
msp430-elf                                       21707    16539 :   76.19%
nds32le-elf                                      22041    19444 :   88.22%
nios2-linux-gnu                                  15595    13238 :   84.89%
pdp11                                             7630     8254 :  108.18%
powerpc-eabispe                                 430816   308481 :   71.60%
powerpc-linux-gnu                               317738   248534 :   78.22%
rl78-elf                                         18904    16329 :   86.38%
rx-elf                                           55015    56632 :  102.94%
s390-linux-gnu                                  190584   148961 :   78.16%
sh-linux-gnu                                    408446   307927 :   75.39%
sparc-linux-gnu                                  91016    80640 :   88.60%
spu-elf                                          80387    69151 :   86.02%
tilegx-elf                                       63815    49977 :   78.32%
tilepro-elf                                      51763    39252 :   75.83%
v850-elf                                         32812    28462 :   86.74%
vax-netbsdelf                                    18350    18259 :   99.50%
visium-elf                                       56872    46790 :   82.27%
x86_64-darwin                                  1306182   883169 :   67.61%
xstormy16-elf                                    17044    14430 :   84.66%
xtensa-elf                                       10780     9678 :   89.78%

The same for --enable-checking=yes,rtl:

Target                                          before    after : after/before
aarch64-linux-gnueabi                          1790272   507488 :   28.35%
alpha-linux-gnu                                 440058   193826 :   44.05%
arm-linux-gnueabi                              2845568  1299337 :   45.66%
avr-rtems                                       885672   294387 :   33.24%
bfin-elf                                        280606   142836 :   50.90%
c6x-elf                                         486345   259256 :   53.31%
cr16-elf                                         46626    20044 :   42.99%
cris-elf                                        426813   144414 :   33.84%
epiphany-elf                                    353078   122166 :   34.60%
fr30-elf                                         40414    21042 :   52.07%
frv-linux-gnu                                   259550   111335 :   42.90%
h8300-elf                                       355199   158411 :   44.60%
hppa64-hp-hpux11.23                             340584   149009 :   43.75%
ia64-linux-gnu                                  661364   293710 :   44.41%
iq2000-elf                                       41123    26709 :   64.95%
lm32-elf                                         20370    14781 :   72.56%
m32c-elf                                        174344    62000 :   35.56%
m32r-elf                                         74357    41773 :   56.18%
m68k-linux-gnu                                  275733   117445 :   42.59%
mcore-elf                                        85180    48018 :   56.37%
mep-elf                                        1450168   376020 :   25.93%
microblaze-elf                                   44189    26295 :   59.51%
mips-linux-gnu                                  876650   375753 :   42.86%
mmix                                             49882    25363 :   50.85%
mn10300-elf                                     128148    66768 :   52.10%
moxie-elf                                        23388     9011 :   38.53%
msp430-elf                                      114200    34426 :   30.15%
nds32le-elf                                     101416    73677 :   72.65%
nios2-linux-gnu                                  58799    29825 :   50.72%
pdp11                                            32836    18557 :   56.51%
powerpc-eabispe                                1976098   626942 :   31.73%
powerpc-linux-gnu                              1510652   526841 :   34.88%
rl78-elf                                         93675    40538 :   43.28%
rx-elf                                          279748   137284 :   49.07%
s390-linux-gnu                                  857009   316494 :   36.93%
sh-linux-gnu                                   2154337   806571 :   37.44%
sparc-linux-gnu                                 367682   169019 :   45.97%
spu-elf                                         341945   135629 :   39.66%
tilegx-elf                                      235480   112103 :   47.61%
tilepro-elf                                     246231   104137 :   42.29%
v850-elf                                        158028    72875 :   46.12%
vax-netbsdelf                                    85057    37578 :   44.18%
visium-elf                                      257148   103331 :   40.18%
x86_64-darwin                                  5514235  1721777 :   31.22%
xstormy16-elf                                    83456    46128 :   55.27%
xtensa-elf                                       52652    29880 :   56.75%

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-none-eabi.
Also tested by building the testsuite for each of the targets above
and making sure there were no assembly differences.  Made sure that no
objects in spec2k6 changed for aarch64-linux-gnu (except for perlbench
perl.o and cactusADM datestamp.o, where the differences are timestamps).
OK to install?

Thanks,
Richard

PS. I've attached the new genrecog.c since the diff version is unreadable.


gcc/
	* Makefile.in (build/genrecog.o): Depend on $(HASH_TABLE_H) and
	inchash.h.
	(build/genrecog$(build_exeext)): Depend on build/hash-table.o and
	build/inchash.o.
	* genrecog.c: Rewrite most of the code except for the third page.

Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	2015-04-27 10:42:57.783191573 +0100
+++ gcc/Makefile.in	2015-04-27 10:43:42.878643078 +0100
@@ -2527,7 +2527,8 @@ build/genpeep.o : genpeep.c $(RTL_BASE_H
 build/genpreds.o : genpreds.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H)	\
   coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h $(OBSTACK_H)
 build/genrecog.o : genrecog.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H)	\
-  coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h
+  coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h		\
+  $(HASH_TABLE_H) inchash.h
 build/genhooks.o : genhooks.c $(TARGET_DEF) $(C_TARGET_DEF)		\
   $(COMMON_TARGET_DEF) $(BCONFIG_H) $(SYSTEM_H) errors.h
 build/genmddump.o : genmddump.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H)	\
@@ -2559,6 +2560,8 @@ genprog = $(genprogerr) check checksum c
 # These programs need libs over and above what they get from the above list.
 build/genautomata$(build_exeext) : BUILD_LIBS += -lm
 
+build/genrecog$(build_exeext) : build/hash-table.o build/inchash.o
+
 # For stage1 and when cross-compiling use the build libcpp which is
 # built with NLS disabled.  For stage2+ use the host library and
 # its dependencies.