diff mbox

Mostly rewrite genrecog

Message ID 87egn5yis1.fsf@e105548-lin.cambridge.arm.com
State New
Headers show

Commit Message

Richard Sandiford April 27, 2015, 10:20 a.m. UTC
I think it's been the case for a while that parallel builds of GCC tend
to serialise around the compilation of insn-recog.c, especially with
higher --enable-checking settings.  This patch tries to speed that
up by replacing most of genrecog with a new algorithm.

I think the main problems with the current code are:

1. Vector architectures have added lots of new instructions that have
   a similar shape and differ only in mode, code or unspec number.
   The current algorithm doesn't have any way of factoring out those
   similarities.

2. When matching a particular instruction, the current code examines
   everything about a SET_DEST before moving on to the SET_SRC.  This has
   two subproblems:

   2a. The destination of a SET isn't very distinctive.  It's usually
       just a register_operand, a memory_operand, a nonimmediate_operand
       or a flags register.  We therefore tend to backtrack to the
       SET_DEST a lot, oscillating between groups of instructions with
       the same kind of destination.

   2b. Backtracking through predicate checks is relatively expensive.
       It would be good to narrow down the "shape" of the instruction
       first and only then check the predicates.  (The backtracking is
       expensive in terms of insn-recog.o compile time too, both because
       we need to copy into argument registers and out of the result
       register, and because it adds more sites where spills are needed.)

3. The code keeps one local variable per rtx depth, so it ends up
   loading the same rtx many times over (mostly when backtracking).
   This is very expensive in rtl-checking builds because each XEXP
   includes a code check and a line-specific failure call.

   In principle the idea of having one local variable per depth
   is good.  But it was originally written that way when all optimisations
   were done at the rtl level and I imagine each local variable mapped
   to one pseudo register.  These days the statements that reload the
   value needed on backtracking lead to many more SSA names and phi
   statements than you'd get with just a single variable per position
   (loaded once, so naturally SSA).  There is still the potential benefit
   of avoiding having sibling rtxes live at once, but fixing (2) above
   reduces that problem.

Also, the code is all goto-based, which makes it rather hard to step through.

The patch deals with these as follows:

1. Detect subpatterns that differ only by mode, code and/or integer
   (e.g. unspec number) and split them out into a common routine.

2. Match the "shape" of the instruction first, in terms of codes,
   integers and vector lengths, and only then check the modes, predicates
   and dups.  When checking the shape, handle SET_SRCs before SET_DESTs.
   In practice this seems to greatly reduce the amount of backtracking.

3. Have one local variable per rtx position.  I tested the patch with
   and without the change and it helped a lot with rtl-checking builds
   without seeming to affect release builds much either way.

As far as debuggability goes, the new code avoids gotos and just
uses "natural" control flow.

The headline stat is that a stage 3 --enable-checking=yes,rtl,df
build of insn-recog.c on my box goes from 7m43s to 2m2s (using the
same stage 2 compiler).  The corresponding --enable-checking=release
change is from 49s to 24s (less impressive, as expected).

The patch seems to speed up recog.  E.g. the time taken to build
fold-const.ii goes from 6.74s before the patch to 6.69s after it;
not a big speed-up, but reproducible.

Here's a comparison of the number of lines of code in insn-recog.c
before and after the patch on one target per config/ CPU:

aarch64-linux-gnueabi                           115526    38169 :   33.04%
alpha-linux-gnu                                  24479    10740 :   43.87%
arm-linux-gnueabi                               169208    67759 :   40.04%
avr-rtems                                        55647    22127 :   39.76%
bfin-elf                                         13928     6498 :   46.65%
c6x-elf                                          29928    13324 :   44.52%
cr16-elf                                          2650     1419 :   53.55%
cris-elf                                         18669     7257 :   38.87%
epiphany-elf                                     19308     6131 :   31.75%
fr30-elf                                          2204     1112 :   50.45%
frv-linux-gnu                                    13541     5950 :   43.94%
h8300-elf                                        19584     9327 :   47.63%
hppa64-hp-hpux11.23                              18299     8549 :   46.72%
ia64-linux-gnu                                   37629    17101 :   45.45%
iq2000-elf                                        2752     1609 :   58.47%
lm32-elf                                          1536      872 :   56.77%
m32c-elf                                         10040     4145 :   41.28%
m32r-elf                                          4436     2307 :   52.01%
m68k-linux-gnu                                   15739     8147 :   51.76%
mcore-elf                                         4816     2577 :   53.51%
mep-elf                                          67335    15929 :   23.66%
microblaze-elf                                    2656     1587 :   59.75%
mips-linux-gnu                                   54543    24186 :   44.34%
mmix                                              2597     1487 :   57.26%
mn10300-elf                                       6384     3294 :   51.60%
moxie-elf                                         1311      659 :   50.27%
msp430-elf                                        6054     2382 :   39.35%
nds32le-elf                                       5953     3152 :   52.95%
nios2-linux-gnu                                   3735     2143 :   57.38%
pdp11                                             2137     1157 :   54.14%
powerpc-eabispe                                 109322    40582 :   37.12%
powerpc-linux-gnu                                82976    32192 :   38.80%
rl78-elf                                          5321     2432 :   45.71%
rx-elf                                           14454     7534 :   52.12%
s390-linux-gnu                                   48487    20454 :   42.18%
sh-linux-gnu                                    104087    41820 :   40.18%
sparc-linux-gnu                                  21912    10509 :   47.96%
spu-elf                                          19937     8182 :   41.04%
tilegx-elf                                       15412     6970 :   45.22%
tilepro-elf                                      11998     5479 :   45.67%
v850-elf                                          8725     4438 :   50.87%
vax-netbsdelf                                     4537     2410 :   53.12%
visium-elf                                       15190     7224 :   47.56%
x86_64-darwin                                   346396   119593 :   34.52%
xstormy16-elf                                     4660     2229 :   47.83%
xtensa-elf                                        2799     1514 :   54.09%

Here's the loadable size of insn-recog.o in an --enable-checking=release
build on an x86_64-linux-gnu box:

aarch64-linux-gnueabi                           443955   298026 :   67.13%
alpha-linux-gnu                                  97194    80893 :   83.23%
arm-linux-gnueabi                               782325   632248 :   80.82%
avr-rtems                                       226827   159763 :   70.43%
bfin-elf                                         52563    42376 :   80.62%
c6x-elf                                         112512    90142 :   80.12%
cr16-elf                                         10446    10006 :   95.79%
cris-elf                                         74771    52634 :   70.39%
epiphany-elf                                     87577    52284 :   59.70%
fr30-elf                                          8041     7713 :   95.92%
frv-linux-gnu                                    53699    47543 :   88.54%
h8300-elf                                        70708    66274 :   93.73%
hppa64-hp-hpux11.23                              71597    57484 :   80.29%
ia64-linux-gnu                                  147286   130632 :   88.69%
iq2000-elf                                       11002    11774 :  107.02%
lm32-elf                                          5894     5798 :   98.37%
m32c-elf                                         36563    28855 :   78.92%
m32r-elf                                         17252    16910 :   98.02%
m68k-linux-gnu                                   58248    59781 :  102.63%
mcore-elf                                        17708    18948 :  107.00%
mep-elf                                         314466   150771 :   47.95%
microblaze-elf                                   10257    10534 :  102.70%
mips-linux-gnu                                  230991   191155 :   82.75%
mmix                                             10782    10678 :   99.04%
mn10300-elf                                      24035    22802 :   94.87%
moxie-elf                                         4622     4198 :   90.83%
msp430-elf                                       21707    16539 :   76.19%
nds32le-elf                                      22041    19444 :   88.22%
nios2-linux-gnu                                  15595    13238 :   84.89%
pdp11                                             7630     8254 :  108.18%
powerpc-eabispe                                 430816   308481 :   71.60%
powerpc-linux-gnu                               317738   248534 :   78.22%
rl78-elf                                         18904    16329 :   86.38%
rx-elf                                           55015    56632 :  102.94%
s390-linux-gnu                                  190584   148961 :   78.16%
sh-linux-gnu                                    408446   307927 :   75.39%
sparc-linux-gnu                                  91016    80640 :   88.60%
spu-elf                                          80387    69151 :   86.02%
tilegx-elf                                       63815    49977 :   78.32%
tilepro-elf                                      51763    39252 :   75.83%
v850-elf                                         32812    28462 :   86.74%
vax-netbsdelf                                    18350    18259 :   99.50%
visium-elf                                       56872    46790 :   82.27%
x86_64-darwin                                  1306182   883169 :   67.61%
xstormy16-elf                                    17044    14430 :   84.66%
xtensa-elf                                       10780     9678 :   89.78%

The same for --enable-checking=yes,rtl:

aarch64-linux-gnueabi                          1790272   507488 :   28.35%
alpha-linux-gnu                                 440058   193826 :   44.05%
arm-linux-gnueabi                              2845568  1299337 :   45.66%
avr-rtems                                       885672   294387 :   33.24%
bfin-elf                                        280606   142836 :   50.90%
c6x-elf                                         486345   259256 :   53.31%
cr16-elf                                         46626    20044 :   42.99%
cris-elf                                        426813   144414 :   33.84%
epiphany-elf                                    353078   122166 :   34.60%
fr30-elf                                         40414    21042 :   52.07%
frv-linux-gnu                                   259550   111335 :   42.90%
h8300-elf                                       355199   158411 :   44.60%
hppa64-hp-hpux11.23                             340584   149009 :   43.75%
ia64-linux-gnu                                  661364   293710 :   44.41%
iq2000-elf                                       41123    26709 :   64.95%
lm32-elf                                         20370    14781 :   72.56%
m32c-elf                                        174344    62000 :   35.56%
m32r-elf                                         74357    41773 :   56.18%
m68k-linux-gnu                                  275733   117445 :   42.59%
mcore-elf                                        85180    48018 :   56.37%
mep-elf                                        1450168   376020 :   25.93%
microblaze-elf                                   44189    26295 :   59.51%
mips-linux-gnu                                  876650   375753 :   42.86%
mmix                                             49882    25363 :   50.85%
mn10300-elf                                     128148    66768 :   52.10%
moxie-elf                                        23388     9011 :   38.53%
msp430-elf                                      114200    34426 :   30.15%
nds32le-elf                                     101416    73677 :   72.65%
nios2-linux-gnu                                  58799    29825 :   50.72%
pdp11                                            32836    18557 :   56.51%
powerpc-eabispe                                1976098   626942 :   31.73%
powerpc-linux-gnu                              1510652   526841 :   34.88%
rl78-elf                                         93675    40538 :   43.28%
rx-elf                                          279748   137284 :   49.07%
s390-linux-gnu                                  857009   316494 :   36.93%
sh-linux-gnu                                   2154337   806571 :   37.44%
sparc-linux-gnu                                 367682   169019 :   45.97%
spu-elf                                         341945   135629 :   39.66%
tilegx-elf                                      235480   112103 :   47.61%
tilepro-elf                                     246231   104137 :   42.29%
v850-elf                                        158028    72875 :   46.12%
vax-netbsdelf                                    85057    37578 :   44.18%
visium-elf                                      257148   103331 :   40.18%
x86_64-darwin                                  5514235  1721777 :   31.22%
xstormy16-elf                                    83456    46128 :   55.27%
xtensa-elf                                       52652    29880 :   56.75%

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-none-eabi.
Also tested by building the testsuite for each of the targets above
and making sure there were no assembly differences.  Made sure that no
objects in spec2k6 changed for aarch64-linux-gnu (except for perlbench
perl.o and cactusADM datestamp.o, where the differences are timestamps).
OK to install?

Thanks,
Richard

PS. I've attached the new genrecog.c since the diff version is unreadable.


gcc/
	* Makefile.in (build/genrecog.o): Depend on inchash.h.
	(build/genrecog$(build_exeext): Depend on build/hash-table.o and
	build/inchash.o
	* genrecog.c: Rewrite most of the code except for the third page.

Comments

Jeff Law April 28, 2015, 11:12 p.m. UTC | #1
On 04/27/2015 04:20 AM, Richard Sandiford wrote:
> I think it's been the case for a while that parallel builds of GCC tend
> to serialise around the compilation of insn-recog.c, especially with
> higher --enable-checking settings.  This patch tries to speed that
> up by replacing most of genrecog with a new algorithm.
>
> I think the main problems with the current code are:
>
> 1. Vector architectures have added lots of new instructions that have
>     a similar shape and differ only in mode, code or unspec number.
>     The current algorithm doesn't have any way of factoring out those
>     similarities.
>
> 2. When matching a particular instruction, the current code examines
>     everything about a SET_DEST before moving on to the SET_SRC.  This has
>     two subproblems:
>
>     2a. The destination of a SET isn't very distinctive.  It's usually
>         just a register_operand, a memory_operand, a nonimmediate_operand
>         or a flags register.  We therefore tend to backtrack to the
>         SET_DEST a lot, oscillating between groups of instructions with
>         the same kind of destination.
>
>     2b. Backtracking through predicate checks is relatively expensive.
>         It would be good to narrow down the "shape" of the instruction
>         first and only then check the predicates.  (The backtracking is
>         expensive in terms of insn-recog.o compile time too, both because
>         we need to copy into argument registers and out of the result
>         register, and because it adds more sites where spills are needed.)
>
> 3. The code keeps one local variable per rtx depth, so it ends up
>     loading the same rtx many times over (mostly when backtracking).
>     This is very expensive in rtl-checking builds because each XEXP
>     includes a code check and a line-specific failure call.
>
>     In principle the idea of having one local variable per depth
>     is good.  But it was originally written that way when all optimisations
>     were done at the rtl level and I imagine each local variable mapped
>     to one pseudo register.  These days the statements that reload the
>     value needed on backtracking lead to many more SSA names and phi
>     statements than you'd get with just a single variable per position
>     (loaded once, so naturally SSA).  There is still the potential benefit
>     of avoiding having sibling rtxes live at once, but fixing (2) above
>     reduces that problem.
>
> Also, the code is all goto-based, which makes it rather hard to step through.
>
> The patch deals with these as follows:
>
> 1. Detect subpatterns that differ only by mode, code and/or integer
>     (e.g. unspec number) and split them out into a common routine.
>
> 2. Match the "shape" of the instruction first, in terms of codes,
>     integers and vector lengths, and only then check the modes, predicates
>     and dups.  When checking the shape, handle SET_SRCs before SET_DESTs.
>     In practice this seems to greatly reduce the amount of backtracking.
>
> 3. Have one local variable per rtx position.  I tested the patch with
>     and without the change and it helped a lot with rtl-checking builds
>     without seeming to affect release builds much either way.
>
> As far as debuggability goes, the new code avoids gotos and just
> uses "natural" control flow.
>
> The headline stat is that a stage 3 --enable-checking=yes,rtl,df
> build of insn-recog.c on my box goes from 7m43s to 2m2s (using the
> same stage 2 compiler).  The corresponding --enable-checking=release
> change is from 49s to 24s (less impressive, as expected).
>
> The patch seems to speed up recog.  E.g. the time taken to build
> fold-const.ii goes from 6.74s before the patch to 6.69s after it;
> not a big speed-up, but reproducible.
[ big snip ]

I don't see anything that makes me think we don't want this :-)

It's interesting to note that the regressions in loadable size of a 
release-checking insn-recog aren't any mainstream ports.  Hell, I'd 
probably claim they're all fringe ports (iq2000, m68k, mcore, 
microblaze, pdp11, rx)


> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-none-eabi.
> Also tested by building the testsuite for each of the targets above
> and making sure there were no assembly differences.  Made sure that no
> objects in spec2k6 changed for aarch64-linux-gnu (except for perlbench
> perl.o and cactusADM datestamp.o, where the differences are timestamps).
> OK to install?
To be very blunt, I'm probably not capable of reviewing the new code. 
You're going to know it best and you should probably own it.

Given your history with gcc and attention to detail, I'm comfortable 
with approving this knowing you'll deal with any issues as they arise.

The one thing I would ask is that you make sure to include the recently 
added matching constraint checking bits in genrecog.  I'm assuming all 
the older sanity checks that have been in genrecog for a longer period 
of time have been kept.

Jeff
Eric Botcazou April 29, 2015, 8:42 a.m. UTC | #2
> I think it's been the case for a while that parallel builds of GCC tend
> to serialise around the compilation of insn-recog.c, especially with
> higher --enable-checking settings.  This patch tries to speed that
> up by replacing most of genrecog with a new algorithm.

I can confirm this, especially with --enable-checking=rtl.

> Also, the code is all goto-based, which makes it rather hard to step
> through.

Do you mean the code in genrecog.c or the generated code in insn-recog.c?

> The patch deals with these as follows:
> 
> 1. Detect subpatterns that differ only by mode, code and/or integer
>    (e.g. unspec number) and split them out into a common routine.
> 
> 2. Match the "shape" of the instruction first, in terms of codes,
>    integers and vector lengths, and only then check the modes, predicates
>    and dups.  When checking the shape, handle SET_SRCs before SET_DESTs.
>    In practice this seems to greatly reduce the amount of backtracking.
> 
> 3. Have one local variable per rtx position.  I tested the patch with
>    and without the change and it helped a lot with rtl-checking builds
>    without seeming to affect release builds much either way.
> 
> As far as debuggability goes, the new code avoids gotos and just
> uses "natural" control flow.

See below.

> The headline stat is that a stage 3 --enable-checking=yes,rtl,df
> build of insn-recog.c on my box goes from 7m43s to 2m2s (using the
> same stage 2 compiler).  The corresponding --enable-checking=release
> change is from 49s to 24s (less impressive, as expected).

The first figure is quite impressive, great work!

> PS. I've attached the new genrecog.c since the diff version is unreadable.

Small request: you didn't change a single line of the head comment, yet the 
size of the file is doubled.  Could you add a sketch of the algorithm to the 
head comment, e.g. by copy-and-pasting the above part of your message?

The old code contained a couple of DEBUG_FUNCTIONs but the new one doesn't.
Richard Sandiford April 29, 2015, 1:45 p.m. UTC | #3
Eric Botcazou <ebotcazou@adacore.com> writes:
>> Also, the code is all goto-based, which makes it rather hard to step
>> through.
>
> Do you mean the code in genrecog.c or the generated code in insn-recog.c?

The generated code.  genrecog.c itself isn't bad. :-)

>> PS. I've attached the new genrecog.c since the diff version is unreadable.
>
> Small request: you didn't change a single line of the head comment, yet the 
> size of the file is doubled.  Could you add a sketch of the algorithm to the 
> head comment, e.g. by copy-and-pasting the above part of your message?

OK.  I'd left the head comment alone because it just described the
interface, which hasn't changed.  But I suppose past lack of commentary
doesn't justify future lack of commentary.  Here's what I added:

   At a high level, the algorithm used in this file is as follows:

   1. Build up a decision tree for each routine, using the following
      approach to matching an rtx:

      - First determine the "shape" of the rtx, based on GET_CODE,
	XVECLEN and XINT.  This phase examines SET_SRCs before SET_DESTs
	since SET_SRCs tend to be more distinctive.  It examines other
	operands in numerical order, since the canonicalization rules
	prefer putting complex operands of commutative operators first.

      - Next check modes and predicates.  This phase examines all
	operands in numerical order, even for SETs, since the mode of a
	SET_DEST is exact while the mode of a SET_SRC can be VOIDmode
	for constant integers.

      - Next check match_dups.

      - Finally check the C condition and (where appropriate) pnum_clobbers.

   2. Try to optimize the tree by removing redundant tests, CSEing tests,
      folding tests together, etc.

   3. Look for common subtrees and split them out into "pattern" routines.
      These common subtrees can be identical or they can differ in mode,
      code, or integer (usually an UNSPEC or UNSPEC_VOLATILE code).
      In the latter case the users of the pattern routine pass the
      appropriate mode, etc., as argument.  For example, if two patterns
      contain:

         (plus:SI (match_operand:SI 1 "register_operand")
	          (match_operand:SI 2 "register_operand"))

      we can split the associated matching code out into a subroutine.
      If a pattern contains:

         (minus:DI (match_operand:DI 1 "register_operand")
	           (match_operand:DI 2 "register_operand"))

      then we can consider using the same matching routine for both
      the plus and minus expressions, passing PLUS and SImode in the
      former case and MINUS and DImode in the latter case.

      The main aim of this phase is to reduce the compile time of the
      insn-recog.c code and to reduce the amount of object code in
      insn-recog.o.

   4. Split the matching trees into functions, trying to limit the
      size of each function to a sensible amount.

      Again, the main aim of this phase is to reduce the compile time
      of insn-recog.c.  (It doesn't help with the size of insn-recog.o.)

   5. Write out C++ code for each function.

BTW, hope at least part of the doubling in size is due to more commentary
in the code itself.

> The old code contained a couple of DEBUG_FUNCTIONs but the new one doesn't.

Yeah, but I don't know how useful they were.  I had to debug the old
code several times and never used them.

I'd rather leave stuff like that to someone who wants it rather than try
to write routines speculatively in the hope that someone would find them
useful.

Thanks,
Richard
Richard Sandiford April 29, 2015, 1:49 p.m. UTC | #4
Jeff Law <law@redhat.com> writes:
> On 04/27/2015 04:20 AM, Richard Sandiford wrote:
>> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-none-eabi.
>> Also tested by building the testsuite for each of the targets above
>> and making sure there were no assembly differences.  Made sure that no
>> objects in spec2k6 changed for aarch64-linux-gnu (except for perlbench
>> perl.o and cactusADM datestamp.o, where the differences are timestamps).
>> OK to install?
> To be very blunt, I'm probably not capable of reviewing the new code. 
> You're going to know it best and you should probably own it.
>
> Given your history with gcc and attention to detail, I'm comfortable 
> with approving this knowing you'll deal with any issues as they arise.

Thanks, applied.

> The one thing I would ask is that you make sure to include the recently 
> added matching constraint checking bits in genrecog.  I'm assuming all 
> the older sanity checks that have been in genrecog for a longer period 
> of time have been kept.

Yeah, Chen Gang's changes are all still in there.  All the older checks
should still be in there too.

Richard
Bin.Cheng April 30, 2015, 6:39 a.m. UTC | #5
On Mon, Apr 27, 2015 at 6:20 PM, Richard Sandiford
<richard.sandiford@arm.com> wrote:
> I think it's been the case for a while that parallel builds of GCC tend
> to serialise around the compilation of insn-recog.c, especially with
> higher --enable-checking settings.  This patch tries to speed that
> up by replacing most of genrecog with a new algorithm.
>
> I think the main problems with the current code are:
>
> 1. Vector architectures have added lots of new instructions that have
>    a similar shape and differ only in mode, code or unspec number.
>    The current algorithm doesn't have any way of factoring out those
>    similarities.
>
> 2. When matching a particular instruction, the current code examines
>    everything about a SET_DEST before moving on to the SET_SRC.  This has
>    two subproblems:
>
>    2a. The destination of a SET isn't very distinctive.  It's usually
>        just a register_operand, a memory_operand, a nonimmediate_operand
>        or a flags register.  We therefore tend to backtrack to the
>        SET_DEST a lot, oscillating between groups of instructions with
>        the same kind of destination.
>
>    2b. Backtracking through predicate checks is relatively expensive.
>        It would be good to narrow down the "shape" of the instruction
>        first and only then check the predicates.  (The backtracking is
>        expensive in terms of insn-recog.o compile time too, both because
>        we need to copy into argument registers and out of the result
>        register, and because it adds more sites where spills are needed.)
>
> 3. The code keeps one local variable per rtx depth, so it ends up
>    loading the same rtx many times over (mostly when backtracking).
>    This is very expensive in rtl-checking builds because each XEXP
>    includes a code check and a line-specific failure call.
>
>    In principle the idea of having one local variable per depth
>    is good.  But it was originally written that way when all optimisations
>    were done at the rtl level and I imagine each local variable mapped
>    to one pseudo register.  These days the statements that reload the
>    value needed on backtracking lead to many more SSA names and phi
>    statements than you'd get with just a single variable per position
>    (loaded once, so naturally SSA).  There is still the potential benefit
>    of avoiding having sibling rtxes live at once, but fixing (2) above
>    reduces that problem.
>
> Also, the code is all goto-based, which makes it rather hard to step through.
>
> The patch deals with these as follows:
>
> 1. Detect subpatterns that differ only by mode, code and/or integer
>    (e.g. unspec number) and split them out into a common routine.
>
> 2. Match the "shape" of the instruction first, in terms of codes,
>    integers and vector lengths, and only then check the modes, predicates
>    and dups.  When checking the shape, handle SET_SRCs before SET_DESTs.
>    In practice this seems to greatly reduce the amount of backtracking.
>
> 3. Have one local variable per rtx position.  I tested the patch with
>    and without the change and it helped a lot with rtl-checking builds
>    without seeming to affect release builds much either way.
>
> As far as debuggability goes, the new code avoids gotos and just
> uses "natural" control flow.
>
> The headline stat is that a stage 3 --enable-checking=yes,rtl,df
> build of insn-recog.c on my box goes from 7m43s to 2m2s (using the
> same stage 2 compiler).  The corresponding --enable-checking=release
> change is from 49s to 24s (less impressive, as expected).
>
> The patch seems to speed up recog.  E.g. the time taken to build
> fold-const.ii goes from 6.74s before the patch to 6.69s after it;
> not a big speed-up, but reproducible.
>
> Here's a comparison of the number of lines of code in insn-recog.c
> before and after the patch on one target per config/ CPU:
>
> aarch64-linux-gnueabi                           115526    38169 :   33.04%
> alpha-linux-gnu                                  24479    10740 :   43.87%
> arm-linux-gnueabi                               169208    67759 :   40.04%
> avr-rtems                                        55647    22127 :   39.76%
> bfin-elf                                         13928     6498 :   46.65%
> c6x-elf                                          29928    13324 :   44.52%
> cr16-elf                                          2650     1419 :   53.55%
> cris-elf                                         18669     7257 :   38.87%
> epiphany-elf                                     19308     6131 :   31.75%
> fr30-elf                                          2204     1112 :   50.45%
> frv-linux-gnu                                    13541     5950 :   43.94%
> h8300-elf                                        19584     9327 :   47.63%
> hppa64-hp-hpux11.23                              18299     8549 :   46.72%
> ia64-linux-gnu                                   37629    17101 :   45.45%
> iq2000-elf                                        2752     1609 :   58.47%
> lm32-elf                                          1536      872 :   56.77%
> m32c-elf                                         10040     4145 :   41.28%
> m32r-elf                                          4436     2307 :   52.01%
> m68k-linux-gnu                                   15739     8147 :   51.76%
> mcore-elf                                         4816     2577 :   53.51%
> mep-elf                                          67335    15929 :   23.66%
> microblaze-elf                                    2656     1587 :   59.75%
> mips-linux-gnu                                   54543    24186 :   44.34%
> mmix                                              2597     1487 :   57.26%
> mn10300-elf                                       6384     3294 :   51.60%
> moxie-elf                                         1311      659 :   50.27%
> msp430-elf                                        6054     2382 :   39.35%
> nds32le-elf                                       5953     3152 :   52.95%
> nios2-linux-gnu                                   3735     2143 :   57.38%
> pdp11                                             2137     1157 :   54.14%
> powerpc-eabispe                                 109322    40582 :   37.12%
> powerpc-linux-gnu                                82976    32192 :   38.80%
> rl78-elf                                          5321     2432 :   45.71%
> rx-elf                                           14454     7534 :   52.12%
> s390-linux-gnu                                   48487    20454 :   42.18%
> sh-linux-gnu                                    104087    41820 :   40.18%
> sparc-linux-gnu                                  21912    10509 :   47.96%
> spu-elf                                          19937     8182 :   41.04%
> tilegx-elf                                       15412     6970 :   45.22%
> tilepro-elf                                      11998     5479 :   45.67%
> v850-elf                                          8725     4438 :   50.87%
> vax-netbsdelf                                     4537     2410 :   53.12%
> visium-elf                                       15190     7224 :   47.56%
> x86_64-darwin                                   346396   119593 :   34.52%
> xstormy16-elf                                     4660     2229 :   47.83%
> xtensa-elf                                        2799     1514 :   54.09%
>
> Here's the loadable size of insn-recog.o in an --enable-checking=release
> build on an x86_64-linux-gnu box:
>
> aarch64-linux-gnueabi                           443955   298026 :   67.13%
> alpha-linux-gnu                                  97194    80893 :   83.23%
> arm-linux-gnueabi                               782325   632248 :   80.82%
> avr-rtems                                       226827   159763 :   70.43%
> bfin-elf                                         52563    42376 :   80.62%
> c6x-elf                                         112512    90142 :   80.12%
> cr16-elf                                         10446    10006 :   95.79%
> cris-elf                                         74771    52634 :   70.39%
> epiphany-elf                                     87577    52284 :   59.70%
> fr30-elf                                          8041     7713 :   95.92%
> frv-linux-gnu                                    53699    47543 :   88.54%
> h8300-elf                                        70708    66274 :   93.73%
> hppa64-hp-hpux11.23                              71597    57484 :   80.29%
> ia64-linux-gnu                                  147286   130632 :   88.69%
> iq2000-elf                                       11002    11774 :  107.02%
> lm32-elf                                          5894     5798 :   98.37%
> m32c-elf                                         36563    28855 :   78.92%
> m32r-elf                                         17252    16910 :   98.02%
> m68k-linux-gnu                                   58248    59781 :  102.63%
> mcore-elf                                        17708    18948 :  107.00%
> mep-elf                                         314466   150771 :   47.95%
> microblaze-elf                                   10257    10534 :  102.70%
> mips-linux-gnu                                  230991   191155 :   82.75%
> mmix                                             10782    10678 :   99.04%
> mn10300-elf                                      24035    22802 :   94.87%
> moxie-elf                                         4622     4198 :   90.83%
> msp430-elf                                       21707    16539 :   76.19%
> nds32le-elf                                      22041    19444 :   88.22%
> nios2-linux-gnu                                  15595    13238 :   84.89%
> pdp11                                             7630     8254 :  108.18%
> powerpc-eabispe                                 430816   308481 :   71.60%
> powerpc-linux-gnu                               317738   248534 :   78.22%
> rl78-elf                                         18904    16329 :   86.38%
> rx-elf                                           55015    56632 :  102.94%
> s390-linux-gnu                                  190584   148961 :   78.16%
> sh-linux-gnu                                    408446   307927 :   75.39%
> sparc-linux-gnu                                  91016    80640 :   88.60%
> spu-elf                                          80387    69151 :   86.02%
> tilegx-elf                                       63815    49977 :   78.32%
> tilepro-elf                                      51763    39252 :   75.83%
> v850-elf                                         32812    28462 :   86.74%
> vax-netbsdelf                                    18350    18259 :   99.50%
> visium-elf                                       56872    46790 :   82.27%
> x86_64-darwin                                  1306182   883169 :   67.61%
> xstormy16-elf                                    17044    14430 :   84.66%
> xtensa-elf                                       10780     9678 :   89.78%
>
> The same for --enable-checking=yes,rtl:
>
> aarch64-linux-gnueabi                          1790272   507488 :   28.35%
> alpha-linux-gnu                                 440058   193826 :   44.05%
> arm-linux-gnueabi                              2845568  1299337 :   45.66%
> avr-rtems                                       885672   294387 :   33.24%
> bfin-elf                                        280606   142836 :   50.90%
> c6x-elf                                         486345   259256 :   53.31%
> cr16-elf                                         46626    20044 :   42.99%
> cris-elf                                        426813   144414 :   33.84%
> epiphany-elf                                    353078   122166 :   34.60%
> fr30-elf                                         40414    21042 :   52.07%
> frv-linux-gnu                                   259550   111335 :   42.90%
> h8300-elf                                       355199   158411 :   44.60%
> hppa64-hp-hpux11.23                             340584   149009 :   43.75%
> ia64-linux-gnu                                  661364   293710 :   44.41%
> iq2000-elf                                       41123    26709 :   64.95%
> lm32-elf                                         20370    14781 :   72.56%
> m32c-elf                                        174344    62000 :   35.56%
> m32r-elf                                         74357    41773 :   56.18%
> m68k-linux-gnu                                  275733   117445 :   42.59%
> mcore-elf                                        85180    48018 :   56.37%
> mep-elf                                        1450168   376020 :   25.93%
> microblaze-elf                                   44189    26295 :   59.51%
> mips-linux-gnu                                  876650   375753 :   42.86%
> mmix                                             49882    25363 :   50.85%
> mn10300-elf                                     128148    66768 :   52.10%
> moxie-elf                                        23388     9011 :   38.53%
> msp430-elf                                      114200    34426 :   30.15%
> nds32le-elf                                     101416    73677 :   72.65%
> nios2-linux-gnu                                  58799    29825 :   50.72%
> pdp11                                            32836    18557 :   56.51%
> powerpc-eabispe                                1976098   626942 :   31.73%
> powerpc-linux-gnu                              1510652   526841 :   34.88%
> rl78-elf                                         93675    40538 :   43.28%
> rx-elf                                          279748   137284 :   49.07%
> s390-linux-gnu                                  857009   316494 :   36.93%
> sh-linux-gnu                                   2154337   806571 :   37.44%
> sparc-linux-gnu                                 367682   169019 :   45.97%
> spu-elf                                         341945   135629 :   39.66%
> tilegx-elf                                      235480   112103 :   47.61%
> tilepro-elf                                     246231   104137 :   42.29%
> v850-elf                                        158028    72875 :   46.12%
> vax-netbsdelf                                    85057    37578 :   44.18%
> visium-elf                                      257148   103331 :   40.18%
> x86_64-darwin                                  5514235  1721777 :   31.22%
> xstormy16-elf                                    83456    46128 :   55.27%
> xtensa-elf                                       52652    29880 :   56.75%
>
> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-none-eabi.
> Also tested by building the testsuite for each of the targets above
> and making sure there were no assembly differences.  Made sure that no
> objects in spec2k6 changed for aarch64-linux-gnu (except for perlbench
> perl.o and cactusADM datestamp.o, where the differences are timestamps).
> OK to install?
>
> Thanks,
> Richard
>
> PS. I've attached the new genrecog.c since the diff version is unreadable.
>
>
> gcc/
>         * Makefile.in (build/genrecog.o): Depend on inchash.h.
>         (build/genrecog$(build_exeext): Depend on build/hash-table.o and
>         build/inchash.o
>         * genrecog.c: Rewrite most of the code except for the third page.
>
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in     2015-04-27 10:42:57.783191573 +0100
> +++ gcc/Makefile.in     2015-04-27 10:43:42.878643078 +0100
> @@ -2527,7 +2527,8 @@ build/genpeep.o : genpeep.c $(RTL_BASE_H
>  build/genpreds.o : genpreds.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H)   \
>    coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h $(OBSTACK_H)
>  build/genrecog.o : genrecog.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H)   \
> -  coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h
> +  coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h              \
> +  $(HASH_TABLE_H) inchash.h
>  build/genhooks.o : genhooks.c $(TARGET_DEF) $(C_TARGET_DEF)            \
>    $(COMMON_TARGET_DEF) $(BCONFIG_H) $(SYSTEM_H) errors.h
>  build/genmddump.o : genmddump.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \
> @@ -2559,6 +2560,8 @@ genprog = $(genprogerr) check checksum c
>  # These programs need libs over and above what they get from the above list.
>  build/genautomata$(build_exeext) : BUILD_LIBS += -lm
>
> +build/genrecog$(build_exeext) : build/hash-table.o build/inchash.o
> +
>  # For stage1 and when cross-compiling use the build libcpp which is
>  # built with NLS disabled.  For stage2+ use the host library and
>  # its dependencies.
>
Hi Richard,
I noticed that this patch caused ICE for gcc.target/arm/mmx-2.c on
arm-none-linux-gnueabi.  Could you please have a look at it?

The log message is as below,
/projects/.../src/gcc/gcc/testsuite/gcc.target/arm/mmx-2.c: In function 'foo':
/projects/.../src/gcc/gcc/testsuite/gcc.target/arm/mmx-2.c:166:1:
error: unrecognizable insn:
(insn 541 540 542 2 (set (reg:V4HI 512 [ D.4809 ])
        (vec_merge:V4HI (vec_select:V4HI (reg:V4HI 510 [ D.4806 ])
                (parallel [
                        (const_int 2 [0x2])
                        (const_int 0 [0])
                        (const_int 3 [0x3])
                        (const_int 1 [0x1])
                    ]))
            (vec_select:V4HI (reg:V4HI 511 [ D.4806 ])
                (parallel [
                        (const_int 0 [0])
                        (const_int 2 [0x2])
                        (const_int 1 [0x1])
                        (const_int 3 [0x3])
                    ]))
            (const_int 5 [0x5])))
/projects/.../src/gcc/gcc/testsuite/gcc.target/arm/mmx-2.c:159 -1
     (nil))
/projects/.../src/gcc/gcc/testsuite/gcc.target/arm/mmx-2.c:166:1:
internal compiler error: in extract_insn, at recog.c:2341
0xa42d2a _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
    /projects/.../src/gcc/gcc/rtl-error.c:110
0xa42d59 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
    /projects/.../src/gcc/gcc/rtl-error.c:118
0xa15ff7 extract_insn(rtx_insn*)
    /projects/.../src/gcc/gcc/recog.c:2341
0x7ffb42 instantiate_virtual_regs_in_insn
    /projects/.../src/gcc/gcc/function.c:1598
0x7ffb42 instantiate_virtual_regs
    /projects/.../src/gcc/gcc/function.c:1966
0x7ffb42 execute
    /projects/.../src/gcc/gcc/function.c:2015
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

GCC is configured with

gcc/configure --target=arm-none-linux-gnueabi --prefix=
--with-sysroot=... --enable-shared --disable-libsanitizer
--disable-libssp --disable-libmudflap
--with-plugin-ld=arm-none-linux-gnueabi-ld --enable-checking=yes
--enable-languages=c,c++,fortran --with-gmp=... --with-mpfr=...
--with-mpc=... --with-isl=... --with-cloog=... --with-arch=armv7-a
--with-fpu=vfpv3-d16 --with-float=softfp --with-arch=armv7-a

Thanks,
bin
Andreas Schwab April 30, 2015, 6:58 a.m. UTC | #6
Richard Sandiford <richard.sandiford@arm.com> writes:

> /* Represents a test and the action that should be taken on the result.
>    If a transition exists for the test outcome, the machine switches
>    to the transition's target state.  If no suitable transition exists,
>    the machine either falls through to the next decision or, if there are no
>    more decisions to try, fails the match.  */
> struct decision : list_head <transition>
> {
>   decision (const test &);
>
>   void set_parent (list_head <decision> *s);
>   bool if_statement_p (uint64_t * = 0) const;
>
>   /* The state to which this decision belongs.  */
>   state *s;
>
>   /* Links to other decisions in the same state.  */
>   decision *prev, *next;
>
>   /* The test to perform.  */
>   struct test test;
> };

../../gcc/genrecog.c:1467: error: declaration of 'test decision::test'
../../gcc/genrecog.c:1051: error: changes meaning of 'test' from 'struct test'

Bootstrap compiler is gcc 4.3.4.

Andreas.
Richard Sandiford April 30, 2015, 7:43 a.m. UTC | #7
Andreas Schwab <schwab@linux-m68k.org> writes:
> Richard Sandiford <richard.sandiford@arm.com> writes:
>
>> /* Represents a test and the action that should be taken on the result.
>>    If a transition exists for the test outcome, the machine switches
>>    to the transition's target state.  If no suitable transition exists,
>>    the machine either falls through to the next decision or, if there are no
>>    more decisions to try, fails the match.  */
>> struct decision : list_head <transition>
>> {
>>   decision (const test &);
>>
>>   void set_parent (list_head <decision> *s);
>>   bool if_statement_p (uint64_t * = 0) const;
>>
>>   /* The state to which this decision belongs.  */
>>   state *s;
>>
>>   /* Links to other decisions in the same state.  */
>>   decision *prev, *next;
>>
>>   /* The test to perform.  */
>>   struct test test;
>> };
>
> ../../gcc/genrecog.c:1467: error: declaration of 'test decision::test'
> ../../gcc/genrecog.c:1051: error: changes meaning of 'test' from 'struct test'
>
> Bootstrap compiler is gcc 4.3.4.

Bah.  Does it like "::test test" instead of "struct test test"?

Richard
Eric Botcazou April 30, 2015, 10:24 a.m. UTC | #8
> The generated code.  genrecog.c itself isn't bad. :-)

Nice work then.

> OK.  I'd left the head comment alone because it just described the
> interface, which hasn't changed.  But I suppose past lack of commentary
> doesn't justify future lack of commentary.  Here's what I added:
> [...]
> BTW, hope at least part of the doubling in size is due to more commentary
> in the code itself.

I see.  Thanks a lot for writing down the description of the algorithm!

> I'd rather leave stuff like that to someone who wants it rather than try
> to write routines speculatively in the hope that someone would find them
> useful.

OK.
Andreas Schwab April 30, 2015, 12:08 p.m. UTC | #9
Richard Sandiford <richard.sandiford@arm.com> writes:

> Andreas Schwab <schwab@linux-m68k.org> writes:
>> Richard Sandiford <richard.sandiford@arm.com> writes:
>>
>>> /* Represents a test and the action that should be taken on the result.
>>>    If a transition exists for the test outcome, the machine switches
>>>    to the transition's target state.  If no suitable transition exists,
>>>    the machine either falls through to the next decision or, if there are no
>>>    more decisions to try, fails the match.  */
>>> struct decision : list_head <transition>
>>> {
>>>   decision (const test &);
>>>
>>>   void set_parent (list_head <decision> *s);
>>>   bool if_statement_p (uint64_t * = 0) const;
>>>
>>>   /* The state to which this decision belongs.  */
>>>   state *s;
>>>
>>>   /* Links to other decisions in the same state.  */
>>>   decision *prev, *next;
>>>
>>>   /* The test to perform.  */
>>>   struct test test;
>>> };
>>
>> ../../gcc/genrecog.c:1467: error: declaration of 'test decision::test'
>> ../../gcc/genrecog.c:1051: error: changes meaning of 'test' from 'struct test'
>>
>> Bootstrap compiler is gcc 4.3.4.
>
> Bah.  Does it like "::test test" instead of "struct test test"?

Same error.

Andreas.
Richard Biener April 30, 2015, 12:22 p.m. UTC | #10
On Thu, Apr 30, 2015 at 2:08 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> Richard Sandiford <richard.sandiford@arm.com> writes:
>
>> Andreas Schwab <schwab@linux-m68k.org> writes:
>>> Richard Sandiford <richard.sandiford@arm.com> writes:
>>>
>>>> /* Represents a test and the action that should be taken on the result.
>>>>    If a transition exists for the test outcome, the machine switches
>>>>    to the transition's target state.  If no suitable transition exists,
>>>>    the machine either falls through to the next decision or, if there are no
>>>>    more decisions to try, fails the match.  */
>>>> struct decision : list_head <transition>
>>>> {
>>>>   decision (const test &);
>>>>
>>>>   void set_parent (list_head <decision> *s);
>>>>   bool if_statement_p (uint64_t * = 0) const;
>>>>
>>>>   /* The state to which this decision belongs.  */
>>>>   state *s;
>>>>
>>>>   /* Links to other decisions in the same state.  */
>>>>   decision *prev, *next;
>>>>
>>>>   /* The test to perform.  */
>>>>   struct test test;
>>>> };
>>>
>>> ../../gcc/genrecog.c:1467: error: declaration of 'test decision::test'
>>> ../../gcc/genrecog.c:1051: error: changes meaning of 'test' from 'struct test'
>>>
>>> Bootstrap compiler is gcc 4.3.4.
>>
>> Bah.  Does it like "::test test" instead of "struct test test"?
>
> Same error.

You have to use a different name I believe (or -fpermissive).

Richard.

> Andreas.
>
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."
Richard Sandiford April 30, 2015, 4:24 p.m. UTC | #11
Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, Apr 30, 2015 at 2:08 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>> Richard Sandiford <richard.sandiford@arm.com> writes:
>>
>>> Andreas Schwab <schwab@linux-m68k.org> writes:
>>>> Richard Sandiford <richard.sandiford@arm.com> writes:
>>>>
>>>>> /* Represents a test and the action that should be taken on the result.
>>>>>    If a transition exists for the test outcome, the machine switches
>>>>>    to the transition's target state.  If no suitable transition exists,
>>>>>    the machine either falls through to the next decision or, if there are no
>>>>>    more decisions to try, fails the match.  */
>>>>> struct decision : list_head <transition>
>>>>> {
>>>>>   decision (const test &);
>>>>>
>>>>>   void set_parent (list_head <decision> *s);
>>>>>   bool if_statement_p (uint64_t * = 0) const;
>>>>>
>>>>>   /* The state to which this decision belongs.  */
>>>>>   state *s;
>>>>>
>>>>>   /* Links to other decisions in the same state.  */
>>>>>   decision *prev, *next;
>>>>>
>>>>>   /* The test to perform.  */
>>>>>   struct test test;
>>>>> };
>>>>
>>>> ../../gcc/genrecog.c:1467: error: declaration of 'test decision::test'
>>>> ../../gcc/genrecog.c:1051: error: changes meaning of 'test' from
>>>> struct test'
>>>>
>>>> Bootstrap compiler is gcc 4.3.4.
>>>
>>> Bah.  Does it like "::test test" instead of "struct test test"?
>>
>> Same error.
>
> You have to use a different name I believe (or -fpermissive).

Hmm, but then why does it work with more recent compilers?

Thanks,
Richard
Thomas Schwinge May 7, 2015, 8:59 a.m. UTC | #12
Hi!

On Mon, 27 Apr 2015 11:20:30 +0100, Richard Sandiford <richard.sandiford@arm.com> wrote:
> I think it's been the case for a while that parallel builds of GCC tend
> to serialise around the compilation of insn-recog.c, especially with
> higher --enable-checking settings.  This patch tries to speed that
> up by replacing most of genrecog with a new algorithm.  [...]

> Here's a comparison of the number of lines of code in insn-recog.c
> before and after the patch on one target per config/ CPU:
> 
> aarch64-linux-gnueabi                           115526    38169 :   33.04%
> alpha-linux-gnu                                  24479    10740 :   43.87%
> arm-linux-gnueabi                               169208    67759 :   40.04%
> avr-rtems                                        55647    22127 :   39.76%
> bfin-elf                                         13928     6498 :   46.65%
> c6x-elf                                          29928    13324 :   44.52%
> [...]

Nice work!  For a nvptx-none build of r222446 and r222860, respectively:

    $ wc -l {prev/,}build-gcc/gcc/insn-recog.c | head -n -1
      5042 prev/build-gcc/gcc/insn-recog.c
      2570 build-gcc/gcc/insn-recog.c


I'm mostly illiterate regarding GCC's machine description files, and
genrecog, but noticed one thing: with the genrecog rewrite, I get the
following diff compared to a previous build log:

     build/genrecog [...]/source-gcc/gcc/common.md [...]/source-gcc/gcc/config/nvptx/nvptx.md \
               insn-conditions.md > tmp-recog.c
    -[...]/source-gcc/gcc/config/nvptx/nvptx.md:1206: warning: operand 0 missing mode?
    -[...]/source-gcc/gcc/config/nvptx/nvptx.md:1206: warning: operand 1 missing mode?
    +Statistics for recog:
    +  Number of decisions:    799
    +  longest path:            28 (code:    208)
    +  longest backtrack:        2 (code:    136)
    +Statistics for split_insns:
    +  Number of decisions:      0
    +  longest path:             0 (code:     -1)
    +  longest backtrack:        0 (code:     -1)
    +Statistics for peephole2_insns:
    +  Number of decisions:      0
    +  longest path:             0 (code:     -1)
    +  longest backtrack:        0 (code:     -1)
    +Shared 655 out of 1350 states by creating 153 new states, saving 502

gcc/config/nvptx/nvptx.md:

    1206 (define_insn "allocate_stack"
    1207   [(set (match_operand 0 "nvptx_register_operand" "=R")
    1208         (unspec [(match_operand 1 "nvptx_register_operand" "R")]
    1209                   UNSPEC_ALLOCA))]
    1210   ""
    1211   "%.\\tcall (%0), %%alloca, (%1);")

Are these two (former) warnings a) something that should still be
reported by genrecog, and b) something that should be addressed (Bernd)?


Grüße,
 Thomas
Jakub Jelinek May 7, 2015, 9:14 a.m. UTC | #13
On Thu, May 07, 2015 at 10:59:01AM +0200, Thomas Schwinge wrote:
>      build/genrecog [...]/source-gcc/gcc/common.md [...]/source-gcc/gcc/config/nvptx/nvptx.md \
>                insn-conditions.md > tmp-recog.c
>     -[...]/source-gcc/gcc/config/nvptx/nvptx.md:1206: warning: operand 0 missing mode?
>     -[...]/source-gcc/gcc/config/nvptx/nvptx.md:1206: warning: operand 1 missing mode?
> 
> gcc/config/nvptx/nvptx.md:
> 
>     1206 (define_insn "allocate_stack"
>     1207   [(set (match_operand 0 "nvptx_register_operand" "=R")
>     1208         (unspec [(match_operand 1 "nvptx_register_operand" "R")]
>     1209                   UNSPEC_ALLOCA))]
>     1210   ""
>     1211   "%.\\tcall (%0), %%alloca, (%1);")
> 
> Are these two (former) warnings a) something that should still be
> reported by genrecog, 

Yes.

> and b) something that should be addressed (Bernd)?

Yes.  Supposedly you want :P on both match_operand and unspec too, but
as this serves not just as an insn pattern, but also as expander that
needs to have this particular name, supposedly you want:

(define_expand "allocate_stack"
  [(match_operand 0 "nvptx_register_operand")
   (match_operand 1 "nvptx_register_operand")]
  ""
{
  if (TARGET_ABI64)
    emit_insn (gen_allocate_stack_di (operands[0], operands[1]));
  else
    emit_insn (gen_allocate_stack_si (operands[0], operands[1]));
  DONE;
})

(define_insn "allocate_stack_<mode>"
  [(set (match_operand:P 0 "nvptx_register_operand" "=R")
	(unspec:P [(match_operand:P 1 "nvptx_register_operand" "R")]
		   UNSPEC_ALLOCA))]
  ""
  "%.\\tcall (%0), %%alloca, (%1);")

rr so.  Of course, as even latest Cuda drop doesn't support alloca, this is
quite dubious, perhaps better would be sorry on it.

BTW, with Cuda 7.0, even printf doesn't work anymore, is that known?

	Jakub
Andreas Krebbel May 16, 2015, 6:09 a.m. UTC | #14
Hi Richard,

I see regressions with the current IBM z13 vector patchset which appear to be related to the new
genrecog.

The following two insn definitions only differ in the mode and predicate of the shift count operand.

(define_insn "lshr<mode>3"
  [(set (match_operand:VI              0 "register_operand"             "=v")
        (lshiftrt:VI (match_operand:VI 1 "register_operand"              "v")
                     (match_operand:SI 2 "shift_count_or_setmem_operand" "Y")))]
  "TARGET_VX"
  "vesrl<bhfgq>\t%v0,%v1,%Y2"
  [(set_attr "op_type" "VRS")])

(define_insn "vlshr<mode>3"
  [(set (match_operand:VI              0 "register_operand" "=v")
        (lshiftrt:VI (match_operand:VI 1 "register_operand"  "v")
                     (match_operand:VI 2 "register_operand"  "v")))]
  "TARGET_VX"
  "vesrlv<bhfgq>\t%v0,%v1,%v2"
  [(set_attr "op_type" "VRR")])


However, the insn-recog.c code only seem to check the predicate. This is a problem since
shift_count_or_setmem_operand does not check the mode.

          if (shift_count_or_setmem_operand (operands[2], SImode)
              &&
#line 717 "/home3/andreas/patched/../gcc/gcc/config/s390/vector.md"
(TARGET_VX))
            return 600; /* lshrv2qi3 */
          if (register_operand (operands[2], V2QImode)
              &&
#line 747 "/home3/andreas/patched/../gcc/gcc/config/s390/vector.md"
(TARGET_VX))
            return 630; /* vlshrv2qi3 */
          break;

I could add a mode check to the predicate. However, I just wanted to check whether this change was
intentional.

Bye,

-Andreas-
Richard Sandiford May 17, 2015, 9:12 p.m. UTC | #15
Andreas Krebbel <krebbel@linux.vnet.ibm.com> writes:
> Hi Richard,
>
> I see regressions with the current IBM z13 vector patchset which appear to be related to the new
> genrecog.
>
> The following two insn definitions only differ in the mode and predicate of the shift count operand.
>
> (define_insn "lshr<mode>3"
>   [(set (match_operand:VI              0 "register_operand"             "=v")
>         (lshiftrt:VI (match_operand:VI 1 "register_operand"              "v")
>                      (match_operand:SI 2 "shift_count_or_setmem_operand" "Y")))]
>   "TARGET_VX"
>   "vesrl<bhfgq>\t%v0,%v1,%Y2"
>   [(set_attr "op_type" "VRS")])
>
> (define_insn "vlshr<mode>3"
>   [(set (match_operand:VI              0 "register_operand" "=v")
>         (lshiftrt:VI (match_operand:VI 1 "register_operand"  "v")
>                      (match_operand:VI 2 "register_operand"  "v")))]
>   "TARGET_VX"
>   "vesrlv<bhfgq>\t%v0,%v1,%v2"
>   [(set_attr "op_type" "VRR")])
>
>
> However, the insn-recog.c code only seem to check the predicate. This is a problem since
> shift_count_or_setmem_operand does not check the mode.

Yeah, it's a bug if a "non-special" predicate doesn't check the mode.
Even old genreog relied on that:

/* After factoring, try to simplify the tests on any one node.
   Tests that are useful for switch statements are recognizable
   by having only a single test on a node -- we'll be manipulating
   nodes with multiple tests:

   If we have mode tests or code tests that are redundant with
   predicates, remove them.  */

although it sounds like the old optimisation didn't trigger for your case.

genpreds.c:mark_mode_tests is supposed to add these tests automatically
if needed.  I suppose it isn't doing so here because the predicate
accepts const_int and because of:

/* Given an RTL expression EXP, find all subexpressions which we may
   assume to perform mode tests.  Normal MATCH_OPERAND does;
   MATCH_CODE does if it applies to the whole expression and accepts
   CONST_INT or CONST_DOUBLE; and we have to assume that MATCH_TEST
   does not.  [...]
*/
static void
mark_mode_tests (rtx exp)
{
  switch (GET_CODE (exp))
    {
[...]
    case MATCH_CODE:
      if (XSTR (exp, 1)[0] != '\0'
        || (!strstr (XSTR (exp, 0), "const_int")
              && !strstr (XSTR (exp, 0), "const_double")))
              NO_MODE_TEST (exp) = 1;
      break;

The code matches the comment, but it doesn't look right.  Perhaps it
was supposed to mean match_codes that _only_ contain const_int and
const_double?  Knowing that the rtx is one of those codes guarantees
that the mode is VOIDmode, but a match_code that includes other rtxes
as well doesn't itself test the mode of those rtxes.

Even then, a predicate that matches const_ints and is passed SImode
mustn't accept const_ints outside the SImode range.  I suppose we
just rely on all predicates to perform some kind of range check.
(The standard ones do of course.)

As a quick workaround, try replacing the case above with:

    case MATCH_CODE:
      if (XSTR (exp, 1)[0] != '\0')
        NO_MODE_TEST (exp) = 1;
      break;

I'll try to come up with a better fix in the meantime.

Thanks,
Richard
Andreas Krebbel May 22, 2015, 3:48 p.m. UTC | #16
On 05/17/2015 11:12 PM, Richard Sandiford wrote:
> Andreas Krebbel <krebbel@linux.vnet.ibm.com> writes:
>> Hi Richard,
>>
>> I see regressions with the current IBM z13 vector patchset which appear to be related to the new
>> genrecog.
>>
>> The following two insn definitions only differ in the mode and predicate of the shift count operand.
>>
>> (define_insn "lshr<mode>3"
>>   [(set (match_operand:VI              0 "register_operand"             "=v")
>>         (lshiftrt:VI (match_operand:VI 1 "register_operand"              "v")
>>                      (match_operand:SI 2 "shift_count_or_setmem_operand" "Y")))]
>>   "TARGET_VX"
>>   "vesrl<bhfgq>\t%v0,%v1,%Y2"
>>   [(set_attr "op_type" "VRS")])
>>
>> (define_insn "vlshr<mode>3"
>>   [(set (match_operand:VI              0 "register_operand" "=v")
>>         (lshiftrt:VI (match_operand:VI 1 "register_operand"  "v")
>>                      (match_operand:VI 2 "register_operand"  "v")))]
>>   "TARGET_VX"
>>   "vesrlv<bhfgq>\t%v0,%v1,%v2"
>>   [(set_attr "op_type" "VRR")])
>>
>>
>> However, the insn-recog.c code only seem to check the predicate. This is a problem since
>> shift_count_or_setmem_operand does not check the mode.
> 
> Yeah, it's a bug if a "non-special" predicate doesn't check the mode.
> Even old genreog relied on that:
> 
> /* After factoring, try to simplify the tests on any one node.
>    Tests that are useful for switch statements are recognizable
>    by having only a single test on a node -- we'll be manipulating
>    nodes with multiple tests:
> 
>    If we have mode tests or code tests that are redundant with
>    predicates, remove them.  */
> 
> although it sounds like the old optimisation didn't trigger for your case.
> 
> genpreds.c:mark_mode_tests is supposed to add these tests automatically
> if needed.  I suppose it isn't doing so here because the predicate
> accepts const_int and because of:
> 
> /* Given an RTL expression EXP, find all subexpressions which we may
>    assume to perform mode tests.  Normal MATCH_OPERAND does;
>    MATCH_CODE does if it applies to the whole expression and accepts
>    CONST_INT or CONST_DOUBLE; and we have to assume that MATCH_TEST
>    does not.  [...]
> */
> static void
> mark_mode_tests (rtx exp)
> {
>   switch (GET_CODE (exp))
>     {
> [...]
>     case MATCH_CODE:
>       if (XSTR (exp, 1)[0] != '\0'
>         || (!strstr (XSTR (exp, 0), "const_int")
>               && !strstr (XSTR (exp, 0), "const_double")))
>               NO_MODE_TEST (exp) = 1;
>       break;
> 
> The code matches the comment, but it doesn't look right.  Perhaps it
> was supposed to mean match_codes that _only_ contain const_int and
> const_double?  Knowing that the rtx is one of those codes guarantees
> that the mode is VOIDmode, but a match_code that includes other rtxes
> as well doesn't itself test the mode of those rtxes.
> 
> Even then, a predicate that matches const_ints and is passed SImode
> mustn't accept const_ints outside the SImode range.  I suppose we
> just rely on all predicates to perform some kind of range check.
> (The standard ones do of course.)
> 
> As a quick workaround, try replacing the case above with:
> 
>     case MATCH_CODE:
>       if (XSTR (exp, 1)[0] != '\0')
>         NO_MODE_TEST (exp) = 1;
>       break;
> 
> I'll try to come up with a better fix in the meantime.

Thanks for looking into this!
I've applied a patch adding a mode check to shift_count_or_setmem_operand which as expected fixes
the issue for me.

-Andreas-
Richard Sandiford May 22, 2015, 4:05 p.m. UTC | #17
Andreas Krebbel <krebbel@linux.vnet.ibm.com> writes:
> On 05/17/2015 11:12 PM, Richard Sandiford wrote:
>> Andreas Krebbel <krebbel@linux.vnet.ibm.com> writes:
>>> Hi Richard,
>>>
>>> I see regressions with the current IBM z13 vector patchset which appear to be related to the new
>>> genrecog.
>>>
>>> The following two insn definitions only differ in the mode and predicate of the shift count operand.
>>>
>>> (define_insn "lshr<mode>3"
>>>   [(set (match_operand:VI              0 "register_operand"             "=v")
>>>         (lshiftrt:VI (match_operand:VI 1 "register_operand"              "v")
>>>                      (match_operand:SI 2 "shift_count_or_setmem_operand" "Y")))]
>>>   "TARGET_VX"
>>>   "vesrl<bhfgq>\t%v0,%v1,%Y2"
>>>   [(set_attr "op_type" "VRS")])
>>>
>>> (define_insn "vlshr<mode>3"
>>>   [(set (match_operand:VI              0 "register_operand" "=v")
>>>         (lshiftrt:VI (match_operand:VI 1 "register_operand"  "v")
>>>                      (match_operand:VI 2 "register_operand"  "v")))]
>>>   "TARGET_VX"
>>>   "vesrlv<bhfgq>\t%v0,%v1,%v2"
>>>   [(set_attr "op_type" "VRR")])
>>>
>>>
>>> However, the insn-recog.c code only seem to check the predicate. This is a problem since
>>> shift_count_or_setmem_operand does not check the mode.
>> 
>> Yeah, it's a bug if a "non-special" predicate doesn't check the mode.
>> Even old genreog relied on that:
>> 
>> /* After factoring, try to simplify the tests on any one node.
>>    Tests that are useful for switch statements are recognizable
>>    by having only a single test on a node -- we'll be manipulating
>>    nodes with multiple tests:
>> 
>>    If we have mode tests or code tests that are redundant with
>>    predicates, remove them.  */
>> 
>> although it sounds like the old optimisation didn't trigger for your case.
>> 
>> genpreds.c:mark_mode_tests is supposed to add these tests automatically
>> if needed.  I suppose it isn't doing so here because the predicate
>> accepts const_int and because of:
>> 
>> /* Given an RTL expression EXP, find all subexpressions which we may
>>    assume to perform mode tests.  Normal MATCH_OPERAND does;
>>    MATCH_CODE does if it applies to the whole expression and accepts
>>    CONST_INT or CONST_DOUBLE; and we have to assume that MATCH_TEST
>>    does not.  [...]
>> */
>> static void
>> mark_mode_tests (rtx exp)
>> {
>>   switch (GET_CODE (exp))
>>     {
>> [...]
>>     case MATCH_CODE:
>>       if (XSTR (exp, 1)[0] != '\0'
>>         || (!strstr (XSTR (exp, 0), "const_int")
>>               && !strstr (XSTR (exp, 0), "const_double")))
>>               NO_MODE_TEST (exp) = 1;
>>       break;
>> 
>> The code matches the comment, but it doesn't look right.  Perhaps it
>> was supposed to mean match_codes that _only_ contain const_int and
>> const_double?  Knowing that the rtx is one of those codes guarantees
>> that the mode is VOIDmode, but a match_code that includes other rtxes
>> as well doesn't itself test the mode of those rtxes.
>> 
>> Even then, a predicate that matches const_ints and is passed SImode
>> mustn't accept const_ints outside the SImode range.  I suppose we
>> just rely on all predicates to perform some kind of range check.
>> (The standard ones do of course.)
>> 
>> As a quick workaround, try replacing the case above with:
>> 
>>     case MATCH_CODE:
>>       if (XSTR (exp, 1)[0] != '\0')
>>         NO_MODE_TEST (exp) = 1;
>>       break;
>> 
>> I'll try to come up with a better fix in the meantime.
>
> Thanks for looking into this!
> I've applied a patch adding a mode check to
> shift_count_or_setmem_operand which as expected fixes
> the issue for me.

OK.  After a false start earlier in the week, I've now got a patch
that I think fixes genpreds.c "properly".  Hope to post it soon.

Thanks,
Richard
diff mbox

Patch

Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	2015-04-27 10:42:57.783191573 +0100
+++ gcc/Makefile.in	2015-04-27 10:43:42.878643078 +0100
@@ -2527,7 +2527,8 @@  build/genpeep.o : genpeep.c $(RTL_BASE_H
 build/genpreds.o : genpreds.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H)	\
   coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h $(OBSTACK_H)
 build/genrecog.o : genrecog.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H)	\
-  coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h
+  coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h		\
+  $(HASH_TABLE_H) inchash.h
 build/genhooks.o : genhooks.c $(TARGET_DEF) $(C_TARGET_DEF)		\
   $(COMMON_TARGET_DEF) $(BCONFIG_H) $(SYSTEM_H) errors.h
 build/genmddump.o : genmddump.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H)	\
@@ -2559,6 +2560,8 @@  genprog = $(genprogerr) check checksum c
 # These programs need libs over and above what they get from the above list.
 build/genautomata$(build_exeext) : BUILD_LIBS += -lm
 
+build/genrecog$(build_exeext) : build/hash-table.o build/inchash.o
+
 # For stage1 and when cross-compiling use the build libcpp which is
 # built with NLS disabled.  For stage2+ use the host library and
 # its dependencies.