New rematerialization sub-pass in LRA

Here is a new rematerialization sub-pass of LRA.

   This summer, I tried to implement a separate register pressure
release pass based on rematerialization described in Simpson's PhD.
Although this approach is very attractive as implementation of a
separate pass is simpler, it did not work in GCC.  I saw no code
improvement but mostly big performance degradations.  We have pretty
decent register pressure evaluation infrastructure but apparently it
is not enough to make right rematerialization based only on this info
as IRA/LRA make many other optimizations which can be hard to take
into account before the RA.  So I guess Simpson's approach works only
for basic/simple RA for regular file architectures (as register
pressure decrease pass described in Morgan's book which I also tried).

   So I had to implement rematerialization in LRA itself where all
necessary info is available.  The implementation is more complicated
than Simpson's one but it is worth it as there is no performance
degradation in vast majority cases.

   The new LRA rematerialization sub-pass works right before spilling
subpass and tries to rematerialize values of spilled pseudos.  To
implement the new sub-pass, some important changes in LRA were done.
First, lra-lives.c updates live info about all registers (not only
allocatable ones) and, second, lra-constraints.c was modified to
permit to check that insn satisfies all constraints in strict sense
even if it still contains pseudos.

   I've tested and benchmarked the sub-pass on x86-64 and ARM.  The
sub-pass permits to generate a smaller code in average on both
architecture (although improvement no-significant), adds < 0.4%
additional compilation time in -O2 mode of release GCC (according user
time of compilation of 500K lines fortran program and valgrind lakey #
insns in combine.i compilation) and about 0.7% in -O0 mode.  As the
performance result, the best I found is 1% SPECFP2000 improvement on
ARM Ecynos 5410 (973 vs 963) but for Intel Haswell the performance
results are practically the same (Haswell has a very good
sophisticated memory sub-system).

  There is a room for the pass improvements.  I wrote some ideas at the
top level comment of file lra-remat.c

  Rematerialization sub-pass will work at -O2 and higher and new option
-flra-remat is introduced.

  The patch was successfully tested on x86-64 and ARM.  I am going to
submit it on next week.  Any comments are appreciated.

2014-10-10  Vladimir Makarov  <vmakarov@redhat.com>

         * common.opt (flra-remat): New.
         * opts.c (default_options_table): Add entry for flra_remat.
         * timevar_def (TV_LRA_REMAT): New.
         * doc/invoke.texi (-flra-remat): Add description of the new
         option.
         * doc/passes.texi (-flra-remat): Remove lra-equivs.c and
         lra-saves.c.  Add lra-remat.c.
         * Makefile.in (OBJS): Add lra-remat.o.
         * lra-remat.c: New file.
         * lra.c: Add info about the rematerialization pass in the top
         comment.
         (collect_non_operand_hard_regs, add_regs_to_insn_regno_info):
         Process unallocatable regs too.
         (lra_constraint_new_insn_uid_start): Remove.
         (lra): Add code for calling rematerialization sub-pass.
         * lra-int.h (lra_constraint_new_insn_uid_start): Remove.
         (lra_constrain_insn, lra_remat): New prototypes.
         (lra_eliminate_regs_1): Add parameter.
         * lra-lives.c (make_hard_regno_born, make_hard_regno_dead):
         Process unallocatable hard regs too.
         (process_bb_lives): Ditto.
         * lra-spills.c (remove_pseudos): Add argument to
         lra_eliminate_regs_1 call.
         * lra-eliminations.c (lra_eliminate_regs_1): Add parameter.
         Use it for sp offset calculation.
         (lra_eliminate_regs): Add argument for lra_eliminate_regs_1
         call.
         (eliminate_regs_in_insn): Add parameter.  Use it for sp offset
         calculation.
         (process_insn_for_elimination): Add argument for
         eliminate_regs_in_insn call.
         * lra-constraints.c (get_equiv_with_elimination):  Add argument
         for lra_eliminate_regs_1 call.
         (process_addr_reg): Add parameter.  Use it.
         (process_address_1): Ditto.  Add argument for process_addr_reg
         call.
         (process_address): Ditto.
         (curr_insn_transform): Add parameter.  Use it.  Add argument for
         process_address calls.
         (lra_constrain_insn): New function.
         (lra_constraints): Add argument for curr_insn_transform call.

New rematerialization sub-pass in LRA

Commit Message

Comments

Patch