Patchwork RFA: enable LRA for rs6000

Submitter Vladimir Makarov
Date April 11, 2013, 5:30 p.m.
Message ID <5166F34C.30901@redhat.com>
Permalink /patch/235844/
State New

Comments

Vladimir Makarov - April 11, 2013, 5:30 p.m.
Here is a patch to enable LRA for rs6000.  The patch includes code
changes in the rs6000 machine-dependent parts and in the LRA parts.

   I've added a new switch, -mlra, for rs6000 to make debugging LRA for
rs6000 easier, but I have not documented it because it will be gone at
the end of stage1 (maybe along with the ability to use LRA for rs6000,
if the results are unsatisfactory).  I have only one question, about
its default value.  Currently LRA is used by default, but if you prefer
the opposite I can reverse it.  Please just let me know your opinion.

   The patch was successfully bootstrapped and tested on ppc64 and
x86/x86-64.

   Are rs6000 parts ok for trunk?

2013-04-11  Vladimir Makarov  <vmakarov@redhat.com>

         * rtl.h (struct rtx_def): Add comment for field jump.
         (LRA_SUBREG_P): New macro.
         * recog.c (register_operand): Check LRA_SUBREG_P.
         * lra-constraints.c (match_reload, simplify_operand_subreg): Use
         LRA_SUBREG_P.
         (emit_spill_move): Set up LRA_SUBREG_P.
         * lra.c (lra): Add note at the end of RTL code. Align non-empty
         stack frame.
         * lra-spills.c (lra_spill): Align stack after spilling pseudos.
         (lra_final_code_change): Skip subreg change for operators.
         * lra-eliminations.c (eliminate_regs_in_insn): Make return earlier
         if there are no operand changes.
         * lra-constraints.c (curr_insn_set): New.
         (match_reload): Set LRA_SUBREG_P.
         (emit_spill_move): Ditto.
         (check_and_process_move): Use curr_insn_set. Process only single
         set insns.  Don't initialize sec_mem_p and change_p.
         (simplify_operand_subreg): Use LRA_SUBREG_P.
         (reg_in_class_p): New function.
         (process_alt_operands): Use it.  Use #if HAVE_ATTR_enabled instead
         of #ifdef.  Add code to remove cycling.
         (process_address): Check EXTRA_CONSTRAINT_STR. Process even if
         non-null disp.  Reload inner instead of disp when base and index
         are null.
         (EBB_PROBABILITY_CUTOFF): Redefine probability in percents.
         (curr_insn_transform): Initialize sec_mem_p and change_p.  Set up
         curr_insn_set.  Call check_and_process_move only for single set
         insns.
         * config/rs6000/rs6000.opt (mlra): New option.
         * config/rs6000/rs6000.h (SECONDARY_MEMORY_NEEDED_MODE): New macro.
         * config/rs6000/rs6000-protos.h
         (rs6000_secondary_memory_needed_mode): New prototype.
         * config/rs6000/rs6000.c: Include ira.h.
         (TARGET_LRA_P): Redefine.
         (legitimate_lo_sum_address_p): Permit modes bigger than word for LRA.
         (rs6000_emit_move): Add movsd generation code for LRA.
         (rs6000_secondary_memory_needed_mode): New function.
         (rs6000_lra_p): Ditto.
         (rs6000_alloc_sdmode_stack_slot): Ignore code for LRA.
         (rs6000_secondary_reload_class): Return NO_REGS for LRA in case
         of constants, memory, or FP regs.
David Edelsohn - April 11, 2013, 6:32 p.m.
Vlad,

Thanks for your work on LRA and your work to convert the rs6000 port
to use the new feature.

My main question about the rs6000 parts of the patch is: what is
toc_ok_p supposed to mean?

-      if (TARGET_TOC)
+      toc_ok_p = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
+                        && small_toc_ref (x, VOIDmode));
+      if (TARGET_TOC && ! toc_ok_p)
  return false;

This is a change to the meaning of the test with no explanation or
comment.  At least the name is confusing because it seems to be
selecting a subset of valid TOC addressing forms.  Medium and large
model TOC references before reload?

Also, the suffixes "_ok" and "_p" seem redundant.

Maybe your intent is a boolean for "large" TOC that could be named
large_toc_p or large_toc_ok?

Thanks, David
Vladimir Makarov - April 11, 2013, 9:04 p.m.
This is a hard question for me.  Thanks for pointing this out, David.  
The code is 9 months old.  It was introduced with lo_sum support for ppc:

http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00260.html

Without this change, there are more than 3000 failures in the GCC
testsuite.  If the ppc target code says that the address

(lo_sum:DI (reg:DI <some pseudo>)
                 (plus:DI (unspec:DI [
                             (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
                             (reg:DI 2 2)
                         ] UNSPEC_TOCREL)
                     (const_int 10 [0xa])))

is not a legitimate lo_sum address, LRA tries to reload the address by
generating the insn

<a pseudo> = (plus:DI (unspec:DI [
                          (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
                         (reg:DI 2 2)
                        ] UNSPEC_TOCREL)
               (const_int 10 [0xa]))

and that fails.

I believe it was a quick hack for the old address decomposition code,
which processed some addresses wrongly.  I guess I forgot about this
hack.  Looking at it now, I don't like it.

We probably still need a more informative address decomposition; that
code was rewritten by Richard Sandiford (and is out of my control now).
I'll investigate this more, but another solution will probably be ready
next week at best.
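
[Editorial sketch, not part of the original message.]  For readers who
do not work with lo_sum daily: the address above carries the low half
of a value whose high half was already added into the base pseudo.  On
PowerPC that is conventionally an addis for the high-adjusted part plus
a sign-extended 16-bit displacement for the low part.  The C below
illustrates only that arithmetic on a plain 32-bit offset.  The function
names (high_adjusted, low_part, recombine, check_split_round_trips) are
hypothetical helpers made up for this note, not GCC functions, and the
sketch says nothing about how LRA or rs6000.c actually decompose the
UNSPEC_TOCREL expression.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* High-adjusted half: the 16 bits that would be shifted left by 16 and
   added to the base.  Adding 0x8000 first compensates for the sign
   extension applied to the low half.  */
static uint16_t
high_adjusted (uint32_t value)
{
  return (uint16_t) ((value + 0x8000u) >> 16);
}

/* Low half: the 16-bit displacement, returned already sign-extended.  */
static int32_t
low_part (uint32_t value)
{
  int32_t low = (int32_t) (value & 0xffffu);
  return low >= 0x8000 ? low - 0x10000 : low;
}

/* Put the two halves back together, modulo 2^32.  */
static uint32_t
recombine (uint16_t high, int32_t low)
{
  return ((uint32_t) high << 16) + (uint32_t) low;
}

/* Check that the split loses nothing for a few boundary values.  */
static void
check_split_round_trips (void)
{
  static const uint32_t samples[] =
    { 0u, 0xau, 0x7fffu, 0x8000u, 0x12348765u, 0xffff8000u };

  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (recombine (high_adjusted (samples[i]), low_part (samples[i]))
            == samples[i]);
}
```

The point of the +0x8000 adjustment is that a low half of 0x8000 or more
is seen as negative by the displacement field, so the high half has to
be one larger to compensate.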
Vladimir Makarov - April 12, 2013, 4:57 p.m.
After thorough investigation I found that this code is still ok.  The
explanations are in the comment.  Here is the modified version of the
code, taking your proposals into account:

@@ -5701,19 +5705,31 @@ legitimate_lo_sum_address_p (enum machin

    if (TARGET_ELF || TARGET_MACHO)
      {
+      bool large_toc_ok;
+
        if (DEFAULT_ABI != ABI_AIX && DEFAULT_ABI != ABI_DARWIN && flag_pic)
         return false;
-      if (TARGET_TOC)
+      /* LRA doesn't use LEGITIMIZE_RELOAD_ADDRESS as it usually calls
+        push_reload from reload pass code.  LEGITIMIZE_RELOAD_ADDRESS
+        recognizes some LO_SUM addresses as valid although this
+        function says the opposite.  In most cases, LRA can generate
+        correct code for address reloads through different
+        transformations; it cannot manage only some LO_SUM cases.  So
+        we add code here, analogous to the LO_SUM code in
+        rs6000_legitimize_reload_address, saying some are still valid.  */
+      large_toc_ok = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
+                     && small_toc_ref (x, VOIDmode));
+      if (TARGET_TOC && ! large_toc_ok)
         return false;
        if (GET_MODE_NUNITS (mode) != 1)
         return false;
-      if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
+      if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD
           && !(/* ??? Assume floating point reg based on mode?  */
                TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
                && (mode == DFmode || mode == DDmode)))
         return false;

-      return CONSTANT_P (x);
+      return CONSTANT_P (x) || large_toc_ok;
      }

    return false;
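
[Editorial sketch, not part of the original message.]  The control flow
of the revised hunk may be easier to follow as a truth function over
plain booleans.  The C below is a minimal model of only the lines shown
in the hunk above, assuming every GCC macro and predicate it reads is
replaced by a flag.  The struct lo_sum_query and the function
lo_sum_operand_ok are hypothetical names invented for this note; the
real legitimate_lo_sum_address_p also runs checks before this point
that are not visible in the hunk and are not modelled here.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the state the hunk consults.  */
struct lo_sum_query
{
  bool elf_or_macho;        /* TARGET_ELF || TARGET_MACHO  */
  bool pic_non_toc_abi;     /* not ABI_AIX, not ABI_DARWIN, and flag_pic  */
  bool lra_in_progress;     /* LRA rather than reload is running  */
  bool cmodel_small;        /* TARGET_CMODEL == CMODEL_SMALL  */
  bool target_toc;          /* TARGET_TOC  */
  bool is_small_toc_ref;    /* small_toc_ref (x, VOIDmode)  */
  bool is_constant;         /* CONSTANT_P (x)  */
  bool single_unit_mode;    /* GET_MODE_NUNITS (mode) == 1  */
  bool wider_than_word;     /* GET_MODE_SIZE (mode) > UNITS_PER_WORD  */
  bool fpr_double_mode;     /* the hard-float DFmode/DDmode exception  */
};

/* Model of the decision made by the revised hunk.  */
static bool
lo_sum_operand_ok (const struct lo_sum_query *q)
{
  if (!q->elf_or_macho)
    return false;
  if (q->pic_non_toc_abi)
    return false;

  /* Only during LRA, and only for the medium and large code models,
     is a TOC-relative operand accepted.  */
  bool large_toc_ok = (q->lra_in_progress && !q->cmodel_small
                       && q->is_small_toc_ref);

  if (q->target_toc && !large_toc_ok)
    return false;
  if (!q->single_unit_mode)
    return false;

  /* The size restriction applies only outside LRA.  */
  if (!q->lra_in_progress && q->wider_than_word && !q->fpr_double_mode)
    return false;

  return q->is_constant || large_toc_ok;
}
```

Read this way, the patch changes two things: a TOC target no longer
rejects every lo_sum operand while LRA runs, and the mode-size
restriction is lifted under LRA.  Outside LRA the model gives the same
answers as the old code.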
David Edelsohn - April 12, 2013, 11:38 p.m.
Okay, this makes more sense.

I think that Mike is running some experiments with LRA to see what the
impact is.  I would like to understand that a little more before
we apply the patch to trunk.

Thanks, David
Michael Meissner - April 15, 2013, 10:48 p.m.
I built the SPEC 2006 suite with and without Vlad's patches that enable the
LRA register allocator for PowerPC.  Because of the count register bug in the
version I checked out, I built everything with the -fno-branch-count-reg
option.

I created a branch off of subversion id 197925 and applied Vlad's initial
patches:
svn+ssh://gcc.gnu.org/svn/gcc/branches/ibm/meissner-lra

I can't post the SPEC files to a general mailing list, but I will make them
available to Vlad as needed.

On the 64-bit side, the wrf benchmark does not build:

/home/meissner/fsf-install-ppc64/meissner-lra/bin/gfortran -c -o module_diffusion_em.fppized.o -I. -I./netcdf/include -g -save-temps=obj -ffast-math -O3 -mveclibabi=mass -mcpu=power7 -mrecip=rsqrt -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -msave-toc-indirect -fno-aggressive-loop-optimizations -fno-branch-count-reg -mno-pointers-to-nested-functions -mlra -m64 module_diffusion_em.fppized.f90
module_diffusion_em.fppized.f90: In function 'compute_diff_metrics':
module_diffusion_em.fppized.f90:5069:0: internal compiler error: in check_rtl, at lra.c:1999
     END SUBROUTINE compute_diff_metrics
 ^
0x1055e1bf check_rtl		 /home/meissner/fsf-src/meissner-lra/gcc/lra.c:1999
0x105604c3 lra(_IO_FILE*)	 /home/meissner/fsf-src/meissner-lra/gcc/lra.c:2374
0x10512f4b do_reload		 /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4619
0x10512f4b rest_of_handle_reload /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4731
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [module_diffusion_em.fppized.o] Error 1
specmake: *** Waiting for unfinished jobs....

On the 32-bit side, both wrf and dealII benchmarks do not build.  The wrf
failure looks like the 64-bit failure, but the file being compiled is
different:

/home/meissner/fsf-install-ppc64/meissner-lra/bin/gfortran -c -o ESMF_Alarm.fppized.o -I. -I./netcdf/include -g -save-temps=obj -ffast-math -Ofast -mveclibabi=mass -mcpu=power7 -mrecip=rsqrt -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -fno-aggressive-loop-optimizations -fno-branch-count-reg -mlra -m32 ESMF_Alarm.fppized.f90
module_soil_pre.fppized.f90:1184:0: internal compiler error: in check_rtl, at lra.c:1999
    END SUBROUTINE init_soil_3_real
 ^
0x1055e1bf check_rtl		 /home/meissner/fsf-src/meissner-lra/gcc/lra.c:1999
0x105604c3 lra(_IO_FILE*)	 /home/meissner/fsf-src/meissner-lra/gcc/lra.c:2374
0x10512f4b do_reload		 /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4619
0x10512f4b rest_of_handle_reload /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4731
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [module_soil_pre.fppized.o] Error 1
specmake: *** Waiting for unfinished jobs....
Error with make 'specmake -j40 build': check file '/home/meissner/spec-build/spec-2006-base-dev49-power7-vsx-svn197925-nocountreg-lra-shared-at6.0-32bit/benchspec/CPU2006/481.wrf/build/build_base_dev49-power7-vsx-32bit.0000/make.err'
  Command returned exit code 2
  Error with make!
*** Error building 481.wrf

In dealII, quadrature_lib.cc and polynomial.cc don't build.

/home/meissner/fsf-install-ppc64/meissner-lra/bin/g++ -c -o quadrature_lib.o -DSPEC_CPU -DNDEBUG  -Iinclude -DBOOST_DISABLE_THREADS -Ddeal_II_dimension=3 -g -save-temps=obj -ffast-math -Ofast -mveclibabi=mass -mcpu=power7 -mrecip=rsqrt -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -fno-aggressive-loop-optimizations -fno-branch-count-reg -mlra  -m32     -DSPEC_CPU_LINUX -include cstddef      quadrature_lib.cc
quadrature_lib.cc: In constructor 'QGauss<dim>::QGauss(unsigned int) [with int dim = 1]':
quadrature_lib.cc:95:1: internal compiler error: in check_rtl, at lra.c:1999
 }
 ^
0x106cb2bf check_rtl		 /home/meissner/fsf-src/meissner-lra/gcc/lra.c:1999
0x106cd5c3 lra(_IO_FILE*)        /home/meissner/fsf-src/meissner-lra/gcc/lra.c:2374
0x1068004b do_reload		 /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4619
0x1068004b rest_of_handle_reload /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4731
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [quadrature_lib.o] Error 1

/home/meissner/fsf-install-ppc64/meissner-lra/bin/g++ -c -o polynomial.o -DSPEC_CPU -DNDEBUG  -Iinclude -DBOOST_DISABLE_THREADS -Ddeal_II_dimension=3 -g -save-temps=obj -ffast-math -Ofast -mveclibabi=mass -mcpu=power7 -mrecip=rsqrt -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -fno-aggressive-loop-optimizations -fno-branch-count-reg -mlra  -m32     -DSPEC_CPU_LINUX -include cstddef      polynomial.cc
polynomial.cc: In member function 'Polynomials::Polynomial<number> Polynomials::Polynomial<number>::derivative() const [with number = long double]':
polynomial.cc:282:3: internal compiler error: in check_rtl, at lra.c:1999
   }
   ^
0x106cb2bf check_rtl		 /home/meissner/fsf-src/meissner-lra/gcc/lra.c:1999
0x106cd5c3 lra(_IO_FILE*)	 /home/meissner/fsf-src/meissner-lra/gcc/lra.c:2374
0x1068004b do_reload		 /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4619
0x1068004b rest_of_handle_reload /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4731
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [polynomial.o] Error 1
Steven Bosscher - April 15, 2013, 11:03 p.m.
On Tue, Apr 16, 2013 at 12:48 AM, Michael Meissner wrote:
> 0x1055e1bf check_rtl             /home/meissner/fsf-src/meissner-lra/gcc/lra.c:1999

These are all cases of insns not satisfying their constraints. There
are no PRs for this, and there are no test suite failures of this kind
in the logs of my powerpc lra-branch test bot. I hope you can extract
test cases and file PRs...

Ciao!
Steven
Michael Meissner - April 15, 2013, 11:11 p.m.
On Tue, Apr 16, 2013 at 01:03:35AM +0200, Steven Bosscher wrote:
> On Tue, Apr 16, 2013 at 12:48 AM, Michael Meissner wrote:
> > 0x1055e1bf check_rtl             /home/meissner/fsf-src/meissner-lra/gcc/lra.c:1999
> 
> These are all cases of insns not satisfying their constraints. There
> are no PRs for this, and there are no test suite failures of this kind
> in the logs of my powerpc lra-branch test bot. I hope you can extract
> test cases and file PRs...

Yes of course, but I wanted to give Vlad a heads up, ASAP.

Patch

Index: config/rs6000/rs6000-protos.h
===================================================================
--- config/rs6000/rs6000-protos.h	(revision 197640)
+++ config/rs6000/rs6000-protos.h	(working copy)
@@ -118,6 +118,8 @@  extern rtx create_TOC_reference (rtx, rt
 extern void rs6000_split_multireg_move (rtx, rtx);
 extern void rs6000_emit_move (rtx, rtx, enum machine_mode);
 extern rtx rs6000_secondary_memory_needed_rtx (enum machine_mode);
+extern enum machine_mode rs6000_secondary_memory_needed_mode (enum
+							      machine_mode);
 extern rtx (*rs6000_legitimize_reload_address_ptr) (rtx, enum machine_mode,
 						    int, int, int, int *);
 extern bool rs6000_legitimate_offset_address_p (enum machine_mode, rtx,
Index: config/rs6000/rs6000.c
===================================================================
--- config/rs6000/rs6000.c	(revision 197640)
+++ config/rs6000/rs6000.c	(working copy)
@@ -56,6 +56,7 @@ 
 #include "intl.h"
 #include "params.h"
 #include "tm-constrs.h"
+#include "ira.h"
 #include "opts.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -1425,6 +1426,9 @@  static const struct attribute_spec rs600
 #undef TARGET_MODE_DEPENDENT_ADDRESS_P
 #define TARGET_MODE_DEPENDENT_ADDRESS_P rs6000_mode_dependent_address_p
 
+#undef TARGET_LRA_P
+#define TARGET_LRA_P rs6000_lra_p
+
 #undef TARGET_CAN_ELIMINATE
 #define TARGET_CAN_ELIMINATE rs6000_can_eliminate
 
@@ -5561,7 +5565,7 @@  rs6000_legitimate_offset_address_p (enum
     return false;
   if (!reg_offset_addressing_ok_p (mode))
     return virtual_stack_registers_memory_p (x);
-  if (legitimate_constant_pool_address_p (x, mode, strict))
+  if (legitimate_constant_pool_address_p (x, mode, strict || lra_in_progress))
     return true;
   if (GET_CODE (XEXP (x, 1)) != CONST_INT)
     return false;
@@ -5701,19 +5705,23 @@  legitimate_lo_sum_address_p (enum machin
 
   if (TARGET_ELF || TARGET_MACHO)
     {
+      bool toc_ok_p;
+
       if (DEFAULT_ABI != ABI_AIX && DEFAULT_ABI != ABI_DARWIN && flag_pic)
 	return false;
-      if (TARGET_TOC)
+      toc_ok_p = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
+		  && small_toc_ref (x, VOIDmode));
+      if (TARGET_TOC && ! toc_ok_p)
 	return false;
       if (GET_MODE_NUNITS (mode) != 1)
 	return false;
-      if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
+      if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD
 	  && !(/* ??? Assume floating point reg based on mode?  */
 	       TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
 	       && (mode == DFmode || mode == DDmode)))
 	return false;
 
-      return CONSTANT_P (x);
+      return CONSTANT_P (x) || toc_ok_p;
     }
 
   return false;
@@ -6711,7 +6719,8 @@  rs6000_legitimate_address_p (enum machin
   if (reg_offset_p && legitimate_small_data_p (mode, x))
     return 1;
   if (reg_offset_p
-      && legitimate_constant_pool_address_p (x, mode, reg_ok_strict))
+      && legitimate_constant_pool_address_p (x, mode,
+					     reg_ok_strict || lra_in_progress))
     return 1;
   /* If not REG_OK_STRICT (before reload) let pass any stack offset.  */
   if (! reg_ok_strict
@@ -7000,6 +7009,7 @@  rs6000_conditional_register_usage (void)
 	  fixed_regs[i] = call_used_regs[i] = call_really_used_regs[i] = 1;
     }
 }
+
 
 /* Try to output insns to set TARGET equal to the constant C if it can
    be done in less than N insns.  Do all computations in MODE.
@@ -7331,6 +7341,68 @@  rs6000_emit_move (rtx dest, rtx source,
     cfun->machine->sdmode_stack_slot =
       eliminate_regs (cfun->machine->sdmode_stack_slot, VOIDmode, NULL_RTX);
 
+
+  if (lra_in_progress
+      && mode == SDmode
+      && REG_P (operands[0]) && REGNO (operands[0]) >= FIRST_PSEUDO_REGISTER
+      && reg_preferred_class (REGNO (operands[0])) == NO_REGS
+      && (REG_P (operands[1])
+	  || (GET_CODE (operands[1]) == SUBREG
+	      && REG_P (SUBREG_REG (operands[1])))))
+    {
+      int regno = REGNO (GET_CODE (operands[1]) == SUBREG
+			 ? SUBREG_REG (operands[1]) : operands[1]);
+      enum reg_class cl;
+
+      if (regno >= FIRST_PSEUDO_REGISTER)
+	{
+	  cl = reg_preferred_class (regno);
+	  gcc_assert (cl != NO_REGS);
+	  regno = ira_class_hard_regs[cl][0];
+	}
+      if (FP_REGNO_P (regno))
+	{
+	  if (GET_MODE (operands[0]) != DDmode)
+	    operands[0] = gen_rtx_SUBREG (DDmode, operands[0], 0);
+	  emit_insn (gen_movsd_store (operands[0], operands[1]));
+	}
+      else if (INT_REGNO_P (regno))
+	emit_insn (gen_movsd_hardfloat (operands[0], operands[1]));
+      else
+	gcc_unreachable();
+      return;
+    }
+  if (lra_in_progress
+      && mode == SDmode
+      && (REG_P (operands[0])
+	  || (GET_CODE (operands[0]) == SUBREG
+	      && REG_P (SUBREG_REG (operands[0]))))
+      && REG_P (operands[1]) && REGNO (operands[1]) >= FIRST_PSEUDO_REGISTER
+      && reg_preferred_class (REGNO (operands[1])) == NO_REGS)
+    {
+      int regno = REGNO (GET_CODE (operands[0]) == SUBREG
+			 ? SUBREG_REG (operands[0]) : operands[0]);
+      enum reg_class cl;
+
+      if (regno >= FIRST_PSEUDO_REGISTER)
+	{
+	  cl = reg_preferred_class (regno);
+	  gcc_assert (cl != NO_REGS);
+	  regno = ira_class_hard_regs[cl][0];
+	}
+      if (FP_REGNO_P (regno))
+	{
+	  if (GET_MODE (operands[1]) != DDmode)
+	    operands[1] = gen_rtx_SUBREG (DDmode, operands[1], 0);
+	  emit_insn (gen_movsd_load (operands[0], operands[1]));
+	}
+      else if (INT_REGNO_P (regno))
+	emit_insn (gen_movsd_hardfloat (operands[0], operands[1]));
+      else
+	gcc_unreachable();
+      return;
+    }
+
   if (reload_in_progress
       && mode == SDmode
       && cfun->machine->sdmode_stack_slot != NULL_RTX
@@ -13848,6 +13920,17 @@  rs6000_secondary_memory_needed_rtx (enum
   return ret;
 }
 
+/* Return the mode to be used for memory when a secondary memory
+   location is needed.  For SDmode values we need to use DDmode, in
+   all other cases we can use the same mode.  */
+enum machine_mode
+rs6000_secondary_memory_needed_mode (enum machine_mode mode)
+{
+  if (mode == SDmode)
+    return DDmode;
+  return mode;
+}
+
 static tree
 rs6000_check_sdmode (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED)
 {
@@ -14511,6 +14594,10 @@  rs6000_alloc_sdmode_stack_slot (void)
   gimple_stmt_iterator gsi;
 
   gcc_assert (cfun->machine->sdmode_stack_slot == NULL_RTX);
+  /* We use a different approach for dealing with the secondary
+     memory in LRA.  */
+  if (ira_use_lra_p)
+    return;
 
   if (TARGET_NO_SDMODE_STACK)
     return;
@@ -14747,7 +14834,7 @@  rs6000_secondary_reload_class (enum reg_
   /* Constants, memory, and FP registers can go into FP registers.  */
   if ((regno == -1 || FP_REGNO_P (regno))
       && (rclass == FLOAT_REGS || rclass == NON_SPECIAL_REGS))
-    return (mode != SDmode) ? NO_REGS : GENERAL_REGS;
+    return (mode != SDmode || lra_in_progress) ? NO_REGS : GENERAL_REGS;
 
   /* Memory, and FP/altivec registers can go into fp/altivec registers under
      VSX.  However, for scalar variables, use the traditional floating point
@@ -27666,6 +27753,13 @@  rs6000_libcall_value (enum machine_mode
 }
 
 
+/* Return true if we use LRA instead of reload pass.  */
+static bool
+rs6000_lra_p (void)
+{
+  return rs6000_lra_flag;
+}
+
 /* Given FROM and TO register numbers, say whether this elimination is allowed.
    Frame pointer elimination is automatically handled.
 
Index: config/rs6000/rs6000.h
===================================================================
--- config/rs6000/rs6000.h	(revision 197640)
+++ config/rs6000/rs6000.h	(working copy)
@@ -1391,6 +1391,13 @@  extern enum reg_class rs6000_constraints
 #define SECONDARY_MEMORY_NEEDED_RTX(MODE) \
   rs6000_secondary_memory_needed_rtx (MODE)
 
+/* Specify the mode to be used for memory when a secondary memory
+   location is needed.  For cpus that cannot load/store SDmode values
+   from the 64-bit FP registers without using a full 64-bit
+   load/store, we need a wider mode.  */
+#define SECONDARY_MEMORY_NEEDED_MODE(MODE)		\
+  rs6000_secondary_memory_needed_mode (MODE)
+
 /* Return the maximum number of consecutive registers
    needed to represent mode MODE in a register of class CLASS.
 
Index: config/rs6000/rs6000.opt
===================================================================
--- config/rs6000/rs6000.opt	(revision 197640)
+++ config/rs6000/rs6000.opt	(working copy)
@@ -443,6 +443,10 @@  mlong-double-
 Target RejectNegative Joined UInteger Var(rs6000_long_double_type_size) Save
 -mlong-double-<n>	Specify size of long double (64 or 128 bits)
 
+mlra
+Target Report Var(rs6000_lra_flag) Init(1) Save
+Use LRA instead of reload
+
 msched-costly-dep=
 Target RejectNegative Joined Var(rs6000_sched_costly_dep_str)
 Determine which dependences between insns are considered costly
Index: lra-constraints.c
===================================================================
--- lra-constraints.c	(revision 197640)
+++ lra-constraints.c	(working copy)
@@ -135,10 +135,11 @@ 
    reload insns.  */
 static int bb_reload_num;
 
-/* The current insn being processed and corresponding its data (basic
-   block, the insn data, the insn static data, and the mode of each
-   operand).  */
+/* The current insn being processed and corresponding its single set
+   (NULL otherwise), its data (basic block, the insn data, the insn
+   static data, and the mode of each operand).  */
 static rtx curr_insn;
+static rtx curr_insn_set;
 static basic_block curr_bb;
 static lra_insn_recog_data_t curr_id;
 static struct lra_static_insn_data *curr_static_id;
@@ -698,6 +699,7 @@  match_reload (signed char out, signed ch
 	    new_out_reg = gen_lowpart_SUBREG (outmode, reg);
 	  else
 	    new_out_reg = gen_rtx_SUBREG (outmode, reg, 0);
+	  LRA_SUBREG_P (new_out_reg) = 1;
 	  /* If the input reg is dying here, we can use the same hard
 	     register for REG and IN_RTX.  We do it only for original
 	     pseudos as reload pseudos can die although original
@@ -721,6 +723,7 @@  match_reload (signed char out, signed ch
 	     it at the end of LRA work.  */
 	  clobber = emit_clobber (new_out_reg);
 	  LRA_TEMP_CLOBBER_P (PATTERN (clobber)) = 1;
+	  LRA_SUBREG_P (new_in_reg) = 1;
 	  if (GET_CODE (in_rtx) == SUBREG)
 	    {
 	      rtx subreg_reg = SUBREG_REG (in_rtx);
@@ -856,32 +859,34 @@  static rtx
 emit_spill_move (bool to_p, rtx mem_pseudo, rtx val)
 {
   if (GET_MODE (mem_pseudo) != GET_MODE (val))
-    val = gen_rtx_SUBREG (GET_MODE (mem_pseudo),
-			  GET_CODE (val) == SUBREG ? SUBREG_REG (val) : val,
-			  0);
+    {
+      val = gen_rtx_SUBREG (GET_MODE (mem_pseudo),
+			    GET_CODE (val) == SUBREG ? SUBREG_REG (val) : val,
+			    0);
+      LRA_SUBREG_P (val) = 1;
+    }
   return (to_p
 	  ? gen_move_insn (mem_pseudo, val)
 	  : gen_move_insn (val, mem_pseudo));
 }
 
 /* Process a special case insn (register move), return true if we
-   don't need to process it anymore.  Return that RTL was changed
-   through CHANGE_P and macro SECONDARY_MEMORY_NEEDED says to use
-   secondary memory through SEC_MEM_P.	*/
+   don't need to process it anymore.  INSN should be a single set
+   insn.  Report through CHANGE_P that RTL was changed and through
+   SEC_MEM_P that macro SECONDARY_MEMORY_NEEDED says to use secondary
+   memory.  */
 static bool
-check_and_process_move (bool *change_p, bool *sec_mem_p)
+check_and_process_move (bool *change_p, bool *sec_mem_p ATTRIBUTE_UNUSED)
 {
   int sregno, dregno;
-  rtx set, dest, src, dreg, sreg, old_sreg, new_reg, before, scratch_reg;
+  rtx dest, src, dreg, sreg, old_sreg, new_reg, before, scratch_reg;
   enum reg_class dclass, sclass, secondary_class;
   enum machine_mode sreg_mode;
   secondary_reload_info sri;
 
-  *sec_mem_p = *change_p = false;
-  if ((set = single_set (curr_insn)) == NULL)
-    return false;
-  dreg = dest = SET_DEST (set);
-  sreg = src = SET_SRC (set);
+  lra_assert (curr_insn_set != NULL_RTX);
+  dreg = dest = SET_DEST (curr_insn_set);
+  sreg = src = SET_SRC (curr_insn_set);
   /* Quick check on the right move insn which does not need
      reloads.  */
   if ((dclass = get_op_class (dest)) != NO_REGS
@@ -1008,7 +1013,7 @@  check_and_process_move (bool *change_p,
       if (GET_CODE (src) == SUBREG)
 	SUBREG_REG (src) = new_reg;
       else
-	SET_SRC (set) = new_reg;
+	SET_SRC (curr_insn_set) = new_reg;
     }
   else
     {
@@ -1205,7 +1210,10 @@  simplify_operand_subreg (int nop, enum m
        && (hard_regno_nregs[hard_regno][GET_MODE (reg)]
 	   >= hard_regno_nregs[hard_regno][mode])
        && simplify_subreg_regno (hard_regno, GET_MODE (reg),
-				 SUBREG_BYTE (operand), mode) < 0)
+				 SUBREG_BYTE (operand), mode) < 0
+       /* Don't reload a subreg for a matching reload.  It is actually
+	  a valid subreg in LRA.  */
+       && ! LRA_SUBREG_P (operand))
       || CONSTANT_P (reg) || GET_CODE (reg) == PLUS || MEM_P (reg))
     {
       enum op_type type = curr_static_id->operand[nop].type;
@@ -1312,6 +1320,14 @@  general_constant_p (rtx x)
   return CONSTANT_P (x) && (! flag_pic || LEGITIMATE_PIC_OPERAND_P (x));
 }
 
+static bool
+reg_in_class_p (rtx reg, enum reg_class cl)
+{
+  if (cl == NO_REGS)
+    return get_reg_class (REGNO (reg)) == NO_REGS;
+  return in_class_p (reg, cl, NULL);
+}
+
 /* Major function to choose the current insn alternative and what
    operands should be reloaded and how.	 If ONLY_ALTERNATIVE is not
    negative we should consider only this alternative.  Return false if
@@ -1391,7 +1407,7 @@  process_alt_operands (int only_alternati
   for (nalt = 0; nalt < n_alternatives; nalt++)
     {
       /* Loop over operands for one constraint alternative.  */
-#ifdef HAVE_ATTR_enabled
+#if HAVE_ATTR_enabled
       if (curr_id->alternative_enabled_p != NULL
 	  && ! curr_id->alternative_enabled_p[nalt])
 	continue;
@@ -2048,6 +2064,31 @@  process_alt_operands (int only_alternati
 	  if (early_clobber_p && operand_reg[nop] != NULL_RTX)
 	    early_clobbered_nops[early_clobbered_regs_num++] = nop;
 	}
+      if (curr_insn_set != NULL_RTX && n_operands == 2
+	  && ((! curr_alt_win[0] && ! curr_alt_win[1]
+	       && REG_P (no_subreg_reg_operand[0])
+	       && REG_P (no_subreg_reg_operand[1])
+	       && (reg_in_class_p (no_subreg_reg_operand[0], curr_alt[1])
+		   || reg_in_class_p (no_subreg_reg_operand[1], curr_alt[0])))
+	      || (! curr_alt_win[0] && curr_alt_win[1]
+		  && REG_P (no_subreg_reg_operand[1])
+		  && reg_in_class_p (no_subreg_reg_operand[1], curr_alt[0]))
+	      || (curr_alt_win[0] && ! curr_alt_win[1]
+		  && REG_P (no_subreg_reg_operand[0])
+		  && reg_in_class_p (no_subreg_reg_operand[0], curr_alt[1])
+		  && (! CONST_POOL_OK_P (curr_operand_mode[1],
+					 no_subreg_reg_operand[1])
+		      || (targetm.preferred_reload_class
+			  (no_subreg_reg_operand[1],
+			   (enum reg_class) curr_alt[1]) != NO_REGS))
+		  /* If it is the result of a recent elimination in a move
+		     insn, we can still transform it into an add by
+		     using this alternative.  */
+		  && GET_CODE (no_subreg_reg_operand[1]) != PLUS)))
+	/* We have a move insn and a new reload insn will be similar
+	   to the current insn.  We should avoid such a situation as it
+	   results in LRA cycling.  */
+	overall += LRA_MAX_REJECT;
       ok_p = true;
       curr_alt_dont_inherit_ops_num = 0;
       for (nop = 0; nop < early_clobbered_regs_num; nop++)
@@ -2419,27 +2460,35 @@  process_address (int nop, rtx *before, r
       && process_addr_reg (ad.index_term, before, NULL, INDEX_REG_CLASS))
     change_p = true;
 
+#ifdef EXTRA_CONSTRAINT_STR
+  /* Target hooks sometimes reject extra constraint addresses -- use
+     EXTRA_CONSTRAINT_STR for the validation.  */
+  if (constraint[0] != 'p'
+      && EXTRA_ADDRESS_CONSTRAINT (constraint[0], constraint)
+      && EXTRA_CONSTRAINT_STR (op, constraint[0], constraint))
+    return change_p;
+#endif
+
   /* There are three cases where the shape of *AD.INNER may now be invalid:
 
      1) the original address was valid, but either elimination or
-	equiv_address_substitution applied a displacement that made
-	it invalid.
+	equiv_address_substitution was applied and that made
+	the address invalid.
 
      2) the address is an invalid symbolic address created by
 	force_const_to_mem.
 
      3) the address is a frame address with an invalid offset.
 
-     All these cases involve a displacement and a non-autoinc address,
-     so there is no point revalidating other types.  */
-  if (ad.disp == NULL || ad.autoinc_p || valid_address_p (&ad))
+     All these cases involve a non-autoinc address, so there is no
+     point revalidating other types.  */
+  if (ad.autoinc_p || valid_address_p (&ad))
     return change_p;
 
   /* Any index existed before LRA started, so we can assume that the
      presence and shape of the index is valid.  */
   push_to_sequence (*before);
-  gcc_assert (ad.segment == NULL);
-  gcc_assert (ad.disp == ad.disp_term);
+  lra_assert (ad.disp == ad.disp_term);
   if (ad.base == NULL)
     {
       if (ad.index == NULL)
@@ -2447,25 +2496,25 @@  process_address (int nop, rtx *before, r
 	  int code = -1;
 	  enum reg_class cl = base_reg_class (ad.mode, ad.as,
 					      SCRATCH, SCRATCH);
-	  rtx disp = *ad.disp;
+	  rtx addr = *ad.inner;
 
-	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, "disp");
+	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, "addr");
 #ifdef HAVE_lo_sum
 	  {
 	    rtx insn;
 	    rtx last = get_last_insn ();
 
-	    /* disp => lo_sum (new_base, disp), case (2) above.  */
+	    /* addr => lo_sum (new_base, addr), case (2) above.  */
 	    insn = emit_insn (gen_rtx_SET
 			      (VOIDmode, new_reg,
-			       gen_rtx_HIGH (Pmode, copy_rtx (disp))));
+			       gen_rtx_HIGH (Pmode, copy_rtx (addr))));
 	    code = recog_memoized (insn);
 	    if (code >= 0)
 	      {
-		*ad.disp = gen_rtx_LO_SUM (Pmode, new_reg, disp);
+		*ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
 		if (! valid_address_p (ad.mode, *ad.outer, ad.as))
 		  {
-		    *ad.disp = disp;
+		    *ad.inner = addr;
 		    code = -1;
 		  }
 	      }
@@ -2475,9 +2524,9 @@  process_address (int nop, rtx *before, r
 #endif
 	  if (code < 0)
 	    {
-	      /* disp => new_base, case (2) above.  */
-	      lra_emit_move (new_reg, disp);
-	      *ad.disp = new_reg;
+	      /* addr => new_base, case (2) above.  */
+	      lra_emit_move (new_reg, addr);
+	      *ad.inner = new_reg;
 	    }
 	}
       else
@@ -2690,7 +2739,10 @@  curr_insn_transform (void)
   no_input_reloads_p = no_output_reloads_p = false;
   goal_alt_number = -1;
 
-  if (check_and_process_move (&change_p, &sec_mem_p))
+  change_p = sec_mem_p = false;
+  curr_insn_set = single_set (curr_insn);
+  if (curr_insn_set != NULL_RTX
+      && check_and_process_move (&change_p, &sec_mem_p))
     return change_p;
 
   /* JUMP_INSNs and CALL_INSNs are not allowed to have any output
@@ -4806,7 +4858,7 @@  inherit_in_ebb (rtx head, rtx tail)
 /* This value affects EBB forming.  If probability of edge from EBB to
    a BB is not greater than the following value, we don't add the BB
    to EBB.  */
-#define EBB_PROBABILITY_CUTOFF (REG_BR_PROB_BASE / 2)
+#define EBB_PROBABILITY_CUTOFF ((REG_BR_PROB_BASE * 50) / 100)
 
 /* Current number of inheritance/split iteration.  */
 int lra_inheritance_iter;
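
The rewritten EBB_PROBABILITY_CUTOFF above expresses the cutoff as a percentage of REG_BR_PROB_BASE; at 50 percent it gives the same value as the old REG_BR_PROB_BASE / 2. A small stand-alone sketch of that arithmetic, assuming a probability base of 10000 (all `TOY_` and `toy_` names are hypothetical and exist only for this illustration):

```c
#include <assert.h>

/* Assumed probability base for this sketch.  */
#define TOY_PROB_BASE 10000

/* Old form: half of the base.  */
#define TOY_CUTOFF_OLD (TOY_PROB_BASE / 2)

/* New form: a percentage of the base, which is easier to retune.  */
#define TOY_CUTOFF_PERCENT 50
#define TOY_CUTOFF_NEW ((TOY_PROB_BASE * TOY_CUTOFF_PERCENT) / 100)

/* Return nonzero if an edge with probability PROB is likely enough
   for its destination block to be added to the EBB.  An edge whose
   probability is not greater than the cutoff does not extend it.  */
static int
toy_edge_extends_ebb (int prob)
{
  return prob > TOY_CUTOFF_NEW;
}
```

With the percentage form, trying a different cutoff only means changing one number, which may be why the define was rewritten.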
Index: lra-eliminations.c
===================================================================
--- lra-eliminations.c	(revision 197640)
+++ lra-eliminations.c	(working copy)
@@ -975,6 +975,9 @@  eliminate_regs_in_insn (rtx insn, bool r
 	}
     }
 
+  if (! validate_p)
+    return;
+
   /* Substitute the operands; the new values are in the substed_operand
      array.  */
   for (i = 0; i < static_id->n_operands; i++)
@@ -982,16 +985,13 @@  eliminate_regs_in_insn (rtx insn, bool r
   for (i = 0; i < static_id->n_dups; i++)
     *id->dup_loc[i] = substed_operand[(int) static_id->dup_num[i]];
 
-  if (validate_p)
-    {
-      /* If we had a move insn but now we don't, re-recognize it.
-	 This will cause spurious re-recognition if the old move had a
-	 PARALLEL since the new one still will, but we can't call
-	 single_set without having put new body into the insn and the
-	 re-recognition won't hurt in this rare case.  */
-      id = lra_update_insn_recog_data (insn);
-      static_id = id->insn_static_data;
-    }
+  /* If we had a move insn but now we don't, re-recognize it.
+     This will cause spurious re-recognition if the old move had a
+     PARALLEL since the new one still will, but we can't call
+     single_set without having put new body into the insn and the
+     re-recognition won't hurt in this rare case.  */
+  id = lra_update_insn_recog_data (insn);
+  static_id = id->insn_static_data;
 }
 
 /* Spill pseudos which are assigned to hard registers in SET.  Add
Index: lra-spills.c
===================================================================
--- lra-spills.c	(revision 197640)
+++ lra-spills.c	(working copy)
@@ -548,6 +548,11 @@  lra_spill (void)
   for (i = 0; i < n; i++)
     if (pseudo_slots[pseudo_regnos[i]].mem == NULL_RTX)
       assign_mem_slot (pseudo_regnos[i]);
+  if (n > 0 && crtl->stack_alignment_needed)
+    /* If we have a stack frame, we must align it now.  The stack size
+       may be a part of the offset computation for register
+       elimination.  */
+    assign_stack_local (BLKmode, 0, crtl->stack_alignment_needed);
   if (lra_dump_file != NULL)
     {
       for (i = 0; i < slots_num; i++)
@@ -644,10 +649,12 @@  lra_final_code_change (void)
 	    }
 
 	  lra_insn_recog_data_t id = lra_get_insn_recog_data (insn);
+	  struct lra_static_insn_data *static_id = id->insn_static_data;
 	  bool insn_change_p = false;
 
 	  for (i = id->insn_static_data->n_operands - 1; i >= 0; i--)
-	    if (alter_subregs (id->operand_loc[i], ! DEBUG_INSN_P (insn)))
+	    if ((DEBUG_INSN_P (insn) || ! static_id->operand[i].is_operator)
+		&& alter_subregs (id->operand_loc[i], ! DEBUG_INSN_P (insn)))
 	      {
 		lra_update_dup (id, i);
 		insn_change_p = true;
Index: lra.c
===================================================================
--- lra.c	(revision 197640)
+++ lra.c	(working copy)
@@ -2202,6 +2202,10 @@  lra (FILE *f)
 
   timevar_push (TV_LRA);
 
+  /* Make sure that the last insn is a note.  Some subsequent passes
+     need it.  */
+  emit_note (NOTE_INSN_DELETED);
+
   COPY_HARD_REG_SET (lra_no_alloc_regs, ira_no_alloc_regs);
 
   init_reg_info ();
@@ -2258,6 +2262,11 @@  lra (FILE *f)
   bitmap_initialize (&lra_split_regs, &reg_obstack);
   bitmap_initialize (&lra_optional_reload_pseudos, &reg_obstack);
   live_p = false;
+  if (get_frame_size () != 0 && crtl->stack_alignment_needed)
+    /* If we have a stack frame, we must align it now.  The stack size
+       may be a part of the offset computation for register
+       elimination.  */
+    assign_stack_local (BLKmode, 0, crtl->stack_alignment_needed);
   for (;;)
     {
       for (;;)
Index: recog.c
===================================================================
--- recog.c	(revision 197640)
+++ recog.c	(working copy)
@@ -1065,7 +1065,8 @@  register_operand (rtx op, enum machine_m
 	  && REGNO (sub) < FIRST_PSEUDO_REGISTER
 	  && REG_CANNOT_CHANGE_MODE_P (REGNO (sub), GET_MODE (sub), mode)
 	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_INT
-	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_FLOAT)
+	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_FLOAT
+	  && ! LRA_SUBREG_P (op))
 	return 0;
 #endif
 
Index: rtl.h
===================================================================
--- rtl.h	(revision 197640)
+++ rtl.h	(working copy)
@@ -265,7 +265,8 @@  struct GTY((chain_next ("RTX_NEXT (&%h)"
      1 in a SET that is for a return.
      In a CODE_LABEL, part of the two-bit alternate entry field.
      1 in a CONCAT is VAL_EXPR_IS_COPIED in var-tracking.c.
-     1 in a VALUE is SP_BASED_VALUE_P in cselib.c.  */
+     1 in a VALUE is SP_BASED_VALUE_P in cselib.c.
+     1 in a SUBREG generated by LRA for reload insns.  */
   unsigned int jump : 1;
   /* In a CODE_LABEL, part of the two-bit alternate entry field.
      1 in a MEM if it cannot trap.
@@ -1411,6 +1412,11 @@  do {									\
   ((RTL_FLAG_CHECK1("SUBREG_PROMOTED_UNSIGNED_P", (RTX), SUBREG)->volatil) \
    ? -1 : (int) (RTX)->unchanging)
 
+/* True if the subreg was generated by LRA for reload insns.  Such
+   subregs are valid only during LRA.  */
+#define LRA_SUBREG_P(RTX)	\
+  (RTL_FLAG_CHECK1("LRA_SUBREG_P", (RTX), SUBREG)->jump)
+
 /* Access various components of an ASM_OPERANDS rtx.  */
 
 #define ASM_OPERANDS_TEMPLATE(RTX) XCSTR (RTX, 0, ASM_OPERANDS)
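
For readers less familiar with RTL flag accessors: LRA_SUBREG_P above reuses the existing one-bit `jump` field and reaches it through a code-checked accessor, so the same macro can both test and set the flag. A minimal stand-alone sketch of that pattern, assuming a toy node type (`toy_rtx`, `toy_flag_check`, `toy_make_subreg`, and `TOY_LRA_SUBREG_P` are hypothetical names, not GCC definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for rtx_def: a code plus a one-bit flag.  */
enum toy_code { TOY_REG, TOY_SUBREG };

struct toy_rtx
{
  enum toy_code code;
  unsigned int jump : 1;
};

/* Return X after checking that it has the expected code, so the
   result can be used to reach a flag bit.  This plays the role that
   RTL_FLAG_CHECK1 plays in the patch.  */
static struct toy_rtx *
toy_flag_check (struct toy_rtx *x, enum toy_code code)
{
  assert (x != NULL && x->code == code);
  return x;
}

/* Lvalue accessor: usable both in conditions and as an assignment
   target, like LRA_SUBREG_P (new_out_reg) = 1 in the patch.  */
#define TOY_LRA_SUBREG_P(RTX) (toy_flag_check ((RTX), TOY_SUBREG)->jump)

/* Build a subreg node with the flag clear.  */
static struct toy_rtx
toy_make_subreg (void)
{
  struct toy_rtx x = { TOY_SUBREG, 0 };
  return x;
}
```

In this sketch the code check is always on; applying the accessor to a node that is not a subreg aborts instead of silently reading an unrelated bit.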