patch to enable LRA for ppc

Message ID 524DDB71.6040703@redhat.com
State New

Commit Message

Vladimir Makarov Oct. 3, 2013, 9:02 p.m. UTC
The following patch permits today's trunk to use LRA for ppc by default.
To switch it off, -mno-lra can be used.

The patch was bootstrapped on ppc64.  The GCC testsuite shows no
regressions either (in comparison with reload).  The change in rs6000.md
fixes an LRA failure on a recently added ppc test.

Comments

David Edelsohn Oct. 18, 2013, 3:26 p.m. UTC | #1
On Thu, Oct 3, 2013 at 5:02 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> The following patch permits today trunk to use LRA for ppc by default.
> To switch it off -mno-lra can be used.
>
> The patch was bootstrapped on ppc64.  GCC testsuite does not have
> regressions too (in comparison with reload).  The change in rs6000.md is
> for fix LRA failure on a recently added ppc test.

Vlad,

I have not forgotten this patch. We are trying to figure out the right
timeframe to make this change. The patch does affect performance --
both positively and negatively; most are in the noise but not all. And
there still are some SPEC benchmarks that fail to build with the
patch, at least in Mike's tests. And Mike is implementing some patches
to utilize reload to improve use of VSX registers, which would need to
be mirrored in LRA for the equivalent functionality.

Thanks, David
Vladimir Makarov Oct. 21, 2013, 2:48 a.m. UTC | #2
On 13-10-18 11:26 AM, David Edelsohn wrote:
> On Thu, Oct 3, 2013 at 5:02 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>> The following patch permits today trunk to use LRA for ppc by default.
>> To switch it off -mno-lra can be used.
>>
>> The patch was bootstrapped on ppc64.  GCC testsuite does not have
>> regressions too (in comparison with reload).  The change in rs6000.md is
>> for fix LRA failure on a recently added ppc test.
> Vlad,
>
> I have not forgotten this patch. We are trying to figure out the right
> timeframe to make this change. The patch does affect performance --
> both positively and negatively; most are in the noise but not all. And
> there still are some SPEC benchmarks that fail to build with the
> patch, at least in Mike's tests. And Mike is implementing some patches
> to utilize reload to improve use of VSX registers, which would need to
> be mirrored in LRA for the equivalent functionality.
Thanks for informing me, David.

I am ready to work on any LRA ppc issues once it is in the trunk.
It would be easier for me to work on LRA for ppc if the patch is committed
to the trunk with, of course, LRA as the non-default local RA.

I don't know what Mike is doing in reload to use VSX registers.  I guess
it is the use of VSX regs as spill locations for GENERAL regs instead
of memory.  If so, it is about two days of work to add this functionality to
LRA (it already has analogous functionality for Intel processors, which
gave a nice SPECFP2000 improvement for them), and probably more work
on resolving issues, especially as I have no power8.
Michael Meissner Oct. 21, 2013, 3:51 p.m. UTC | #3
On Sun, Oct 20, 2013 at 10:48:08PM -0400, Vladimir Makarov wrote:
> On 13-10-18 11:26 AM, David Edelsohn wrote:
> >On Thu, Oct 3, 2013 at 5:02 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> >>The following patch permits today trunk to use LRA for ppc by default.
> >>To switch it off -mno-lra can be used.
> >>
> >>The patch was bootstrapped on ppc64.  GCC testsuite does not have
> >>regressions too (in comparison with reload).  The change in rs6000.md is
> >>for fix LRA failure on a recently added ppc test.
> >Vlad,
> >
> >I have not forgotten this patch. We are trying to figure out the right
> >timeframe to make this change. The patch does affect performance --
> >both positively and negatively; most are in the noise but not all. And
> >there still are some SPEC benchmarks that fail to build with the
> >patch, at least in Mike's tests. And Mike is implementing some patches
> >to utilize reload to improve use of VSX registers, which would need to
> >be mirrored in LRA for the equivalent functionality.
> Thanks for informing me, David.
> 
> I am ready to work on any LRA ppc issues when it will be in the
> trunk.  It would be easier for me to work on LRA ppc if the patch is
> committed to the trunk and of course LRA is used as non-default
> local RA.
> 
> I don't know what Mike is doing on reload to use VSX registers.  I
> guess it is usage of  VSX regs as spilled locations for GENERAL regs
> instead of memory.  If it is so, it is 2 day work to add this
> functionality in LRA (as it already has analogous functionality for
> Intel processors and that gave a nice SPECFP2000 improvement for
> them) and probably more work on resolving issues especially as I
> have no power8.

I would say let's add -mlra, but make the default OFF for the time being.  We
can always switch the default later.

Vladimir, I thought I included you in the list when I gave status.  The big
thing is several of the Spec 2006 benchmarks don't work in 32-bit mode, and I
get a lot of Fortran errors, again in 32-bit.  I also saw some decimal floating
point problems.

What I'm doing is adding secondary reload support so that up until reload time,
we can represent VSX addresses as reg+offset, and in secondary reload, create
the addition instructions to put the offset in a base register.  I haven't made
any changes to the machine independent portions of the compiler.  As long as
IRA uses the secondary reload interface, it should be ok.  However, right now,
I need to focus most of my attention on getting the secondary reload support to
work.

One thing that I've asked for before, but to remind you, is I really, really
wish secondary reload could allocate two scratch registers if it is given an
insn that takes 4 arguments.  Right now, I'm allocating a TFmode scratch, since
that gives 2 registers, but future changes will want TFmode to go into a single
vector register, and I will need to create another type, like V4DI that does
take 2 registers.  The case that this is needed for is moving an item from GPRs
to VSX registers that takes 2 GPR registers, such as moving 128-bit items in
64-bit mode, or 64-bit items in 32-bit mode.  I need two registers to do the
move into, and then I will do the combine operation.
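
To make the two-scratch case concrete, here is a minimal Python sketch of the data flow described above: a 128-bit item held in two 64-bit GPRs is moved half by half into two scratch registers and then combined. The function name and the integer model of registers are illustrative assumptions; the message does not give the actual instruction sequence.

```python
MASK64 = (1 << 64) - 1  # width of one GPR in 64-bit mode


def move_gpr_pair_to_vector(gpr_hi, gpr_lo):
    """Model moving a 128-bit item from two 64-bit GPRs into one vector register.

    Hypothetical helper: each half first lands in its own scratch register
    (hence the wish for two scratches), then the halves are combined.
    """
    scratch_hi = gpr_hi & MASK64  # first scratch receives the high half
    scratch_lo = gpr_lo & MASK64  # second scratch receives the low half
    return (scratch_hi << 64) | scratch_lo  # the "combine operation"


value = move_gpr_pair_to_vector(0x0123456789ABCDEF, 0xFEDCBA9876543210)
print(hex(value))
```

The same shape applies to 64-bit items in 32-bit mode, with 32-bit halves instead of 64-bit ones.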
Steven Bosscher Oct. 21, 2013, 6:18 p.m. UTC | #4
On Mon, Oct 21, 2013 at 5:51 PM, Michael Meissner wrote:
> What I'm doing is adding secondary reload support so that up until reload time,
> we can represent VSX addresses as reg+offset, and in secondary reload, create
> the addition instructions to put the offset in a base register.  I haven't made
> any changes to the machine independent portions of the compiler.  As long as
> IRA uses the secondary reload interface, it should be ok.  However, right now,
> I need to focus most of my attention on getting the secondary reload support to
> work.
>
> One thing that I've asked for before, but to remind you, is I really, really
> wish secondary reload could allocate two scratch registers if it is given an
> insn that takes 4 arguments.  Right now, I'm allocating a TFmode scratch, since
> that gives 2 registers, but future changes will want TFmode to go into a single
> vector register, and I will need to create another type, like V4DI that does
> take 2 registers.  The case that this is needed for is moving an item from GPRs
> to VSX registers that takes 2 GPR registers, such as moving 128-bit items in
> 64-bit mode, or 64-bit items in 32-bit mode.  I need two registers to do the
> move into, and then I will do the combine operation.


Eh, perhaps I'm missing something, but...

Isn't one of the great advantages of LRA over reload, that LRA allows
you to create new pseudos so that you shouldn't ever need secondary
reloads??

Ciao!
Steven
Michael Meissner Oct. 21, 2013, 6:55 p.m. UTC | #5
On Mon, Oct 21, 2013 at 08:18:22PM +0200, Steven Bosscher wrote:
> On Mon, Oct 21, 2013 at 5:51 PM, Michael Meissner wrote:
> > What I'm doing is adding secondary reload support so that up until reload time,
> > we can represent VSX addresses as reg+offset, and in secondary reload, create
> > the addition instructions to put the offset in a base register.  I haven't made
> > any changes to the machine independent portions of the compiler.  As long as
> > IRA uses the secondary reload interface, it should be ok.  However, right now,
> > I need to focus most of my attention on getting the secondary reload support to
> > work.
> >
> > One thing that I've asked for before, but to remind you, is I really, really
> > wish secondary reload could allocate two scratch registers if it is given an
> > insn that takes 4 arguments.  Right now, I'm allocating a TFmode scratch, since
> > that gives 2 registers, but future changes will want TFmode to go into a single
> > vector register, and I will need to create another type, like V4DI that does
> > take 2 registers.  The case that this is needed for is moving an item from GPRs
> > to VSX registers that takes 2 GPR registers, such as moving 128-bit items in
> > 64-bit mode, or 64-bit items in 32-bit mode.  I need two registers to do the
> > move into, and then I will do the combine operation.
> 
> 
> Eh, perhaps I'm missing something, but...
> 
> Isn't one of the great advantages of LRA over reload, that LRA allows
> you to create new pseudos so that you shouldn't ever need secondary
> reloads??

You still need secondary reload.

For example, on the powerpc, you have 5 addressing modes for GPRs and FPRs:

    1) base register
    2) base register + index register
    3) base register + offset
    4) auto-update for base register + index register
    5) auto-update for base register + offset register

Now for VSX registers you only have:

    1) base register
    2) base register + index register

So, in the work I'm doing right now, I want to allow reg + offset addressing,
but if the register being loaded is an Altivec register (high part of the VSX
registers), I need to create a secondary reload to load the offset into and
then convert the address to indirect or indexed addressing.  You don't want to
always disallow offset based addressing, but you want to create the secondary
reload when you need to.

Similarly, vector types can do indexed addressing (reg+reg) but if you are
loading or storing the value into GPR registers, you can't do reg+reg
addressing on multi-word items.
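
The two rules above can be sketched as a small decision function. This is a simplified Python model, not the rs6000 back end: the enum names and `needs_secondary_reload` are hypothetical, and only the cases mentioned in this message are covered.

```python
from enum import Enum, auto


class RegClass(Enum):
    GPR = auto()
    FPR = auto()
    ALTIVEC = auto()  # high part of the VSX registers


class AddrForm(Enum):
    BASE = auto()         # base register
    BASE_INDEX = auto()   # base register + index register
    BASE_OFFSET = auto()  # base register + constant offset


def needs_secondary_reload(reg_class, addr_form, multiword=False):
    """Return True when the address must be rewritten through a scratch register."""
    if reg_class is RegClass.ALTIVEC:
        # Altivec registers only support indirect and indexed addressing, so
        # reg+offset must become reg+reg with the offset loaded into a scratch.
        return addr_form is AddrForm.BASE_OFFSET
    if reg_class is RegClass.GPR and multiword:
        # Multi-word items in GPRs cannot use reg+reg addressing.
        return addr_form is AddrForm.BASE_INDEX
    return False


print(needs_secondary_reload(RegClass.ALTIVEC, AddrForm.BASE_OFFSET))
print(needs_secondary_reload(RegClass.GPR, AddrForm.BASE_OFFSET))
```

The point of deciding this late is that reg+offset stays available for every register class that supports it, and the rewrite happens only for the cases that cannot encode it.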

Finally, one of the features of ISA 2.07 is the notion of load fusion, where
the hardware will fuse together a load immediate with an adjacent load (for
GPRs, you need the load immediate shifted to be the register that is being
loaded, for VSX registers, you just need the instructions adjacent).  In this
case, before reload we will want to pretend that the machine has addressing to
include the fusion forms, and in secondary reload, you will generate the
combined insn that will become the fusion instruction.
Vladimir Makarov Oct. 22, 2013, 2:42 a.m. UTC | #6
On 13-10-21 11:51 AM, Michael Meissner wrote:
> On Sun, Oct 20, 2013 at 10:48:08PM -0400, Vladimir Makarov wrote:
>> On 13-10-18 11:26 AM, David Edelsohn wrote:
>>> On Thu, Oct 3, 2013 at 5:02 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>>> The following patch permits today trunk to use LRA for ppc by default.
>>>> To switch it off -mno-lra can be used.
>>>>
>>>> The patch was bootstrapped on ppc64.  GCC testsuite does not have
>>>> regressions too (in comparison with reload).  The change in rs6000.md is
>>>> for fix LRA failure on a recently added ppc test.
>>> Vlad,
>>>
>>> I have not forgotten this patch. We are trying to figure out the right
>>> timeframe to make this change. The patch does affect performance --
>>> both positively and negatively; most are in the noise but not all. And
>>> there still are some SPEC benchmarks that fail to build with the
>>> patch, at least in Mike's tests. And Mike is implementing some patches
>>> to utilize reload to improve use of VSX registers, which would need to
>>> be mirrored in LRA for the equivalent functionality.
>> Thanks for informing me, David.
>>
>> I am ready to work on any LRA ppc issues when it will be in the
>> trunk.  It would be easier for me to work on LRA ppc if the patch is
>> committed to the trunk and of course LRA is used as non-default
>> local RA.
>>
>> I don't know what Mike is doing on reload to use VSX registers.  I
>> guess it is usage of  VSX regs as spilled locations for GENERAL regs
>> instead of memory.  If it is so, it is 2 day work to add this
>> functionality in LRA (as it already has analogous functionality for
>> Intel processors and that gave a nice SPECFP2000 improvement for
>> them) and probably more work on resolving issues especially as I
>> have no power8.
> I would say let's add -mlra, but make the default OFF for the time being.  We
> can always switch the default later.
Sure, if you know of some LRA problems, it should not be on by default.
Moreover, if we still have the problems when releasing gcc4.9, I think
we should exclude any possibility for a user to use LRA for ppc.  I
don't want to have GCC-4.9 users blaming LRA.

But adding LRA to PPC on the trunk (switched OFF by default) earlier 
could help me a lot to work on the issues.
> Vladimir, I thought I included you in the list when I gave status.  The big
> thing is several of the Spec 2006 benchmarks don't work in 32-bit mode, and I
> get a lot of Fortran errors, again in 32-bit.  I also saw some decimal floating
> point problems.
No, I did not see the message (or maybe I missed it).  I need to check.
> What I'm doing is adding secondary reload support so that up until reload time,
> we can represent VSX addresses as reg+offset, and in secondary reload, create
> the addition instructions to put the offset in a base register.  I haven't made
> any changes to the machine independent portions of the compiler.  As long as
> IRA uses the secondary reload interface, it should be ok.  However, right now,
> I need to focus most of my attention on getting the secondary reload support to
> work.
I completely understand.  You are quite busy at the moment, as am I, rushing
some stuff into gcc-4.9.
> One thing that I've asked for before, but to remind you, is I really, really
> wish secondary reload could allocate two scratch registers if it is given an
> insn that takes 4 arguments.  Right now, I'm allocating a TFmode scratch, since
> that gives 2 registers, but future changes will want TFmode to go into a single
> vector register, and I will need to create another type, like V4DI that does
> take 2 registers.  The case that this is needed for is moving an item from GPRs
> to VSX registers that takes 2 GPR registers, such as moving 128-bit items in
> 64-bit mode, or 64-bit items in 32-bit mode.  I need two registers to do the
> move into, and then I will do the combine operation.
>
Ok.  I guess LRA can be adapted to some new secondary_reload hook 
returning two scratch registers.
Vladimir Makarov Oct. 22, 2013, 2:43 a.m. UTC | #7
On 13-10-21 2:55 PM, Michael Meissner wrote:
> On Mon, Oct 21, 2013 at 08:18:22PM +0200, Steven Bosscher wrote:
>> On Mon, Oct 21, 2013 at 5:51 PM, Michael Meissner wrote:
>>> What I'm doing is adding secondary reload support so that up until reload time,
>>> we can represent VSX addresses as reg+offset, and in secondary reload, create
>>> the addition instructions to put the offset in a base register.  I haven't made
>>> any changes to the machine independent portions of the compiler.  As long as
>>> IRA uses the secondary reload interface, it should be ok.  However, right now,
>>> I need to focus most of my attention on getting the secondary reload support to
>>> work.
>>>
>>> One thing that I've asked for before, but to remind you, is I really, really
>>> wish secondary reload could allocate two scratch registers if it is given an
>>> insn that takes 4 arguments.  Right now, I'm allocating a TFmode scratch, since
>>> that gives 2 registers, but future changes will want TFmode to go into a single
>>> vector register, and I will need to create another type, like V4DI that does
>>> take 2 registers.  The case that this is needed for is moving an item from GPRs
>>> to VSX registers that takes 2 GPR registers, such as moving 128-bit items in
>>> 64-bit mode, or 64-bit items in 32-bit mode.  I need two registers to do the
>>> move into, and then I will do the combine operation.
>>
>> Eh, perhaps I'm missing something, but...
>>
>> Isn't one of the great advantages of LRA over reload, that LRA allows
>> you to create new pseudos so that you shouldn't ever need secondary
>> reloads??
> You still need secondary reload.
As I understand it, Mike is talking about the secondary_reload hook.  LRA can
generate a chain of reloads as long as needed.  This is achieved by
subsequently processing the generated reload insns in one or more
lra-constraints subpasses.  Porting LRA frequently consists of removing the
secondary reload hook, as in many cases LRA is smart enough to find the
necessary reloads just from the insn constraints.  But there are still
really complicated situations where LRA cannot do this and still
needs directions from the secondary_reload hook.  I am sure PPC has a real
need to use this hook even for LRA.
> For example, on the powerpc, you have 5 addressing modes for GPRs and FPRS:
>
>      1) base register
>      2) base register + index register
>      3) base register + offset
>      4) auto-update for base register + index register
>      5) auto-update for base register + offset register
>
> Now for VSX registers you only have:
>
>      1) base register
>      2) base register + index register
>
> So, in the work I'm doing right now, I want to allow reg + offset addressing,
> but if the register being loaded is an Altivec register (high part of the VSX
> registers), I need to create a secondary reload to load the offset into and
> then convert the address to indirect or indexed addressing.  You don't want to
> always disallow offset based addressing, but you want to create the secondary
> reload when you need to.
>
> Similarly, vector types can do indexed addressing (reg+reg) but if you are
> loading or storing the value into GPR registers, you can't do reg+reg
> addressing on multi-word items.
>
> Finally, one of the features of ISA 2.07 is the notion of load fusion, where
> the hardware will fuse together a load immediate with an adjacent load (for
> GPRs, you need the load immediate shifted to be the register that is being
> loaded, for VSX registers, you just need the instructions adjacent).  In this
> case, before reload we will want to pretend that the machine has addressing to
> include the fusion forms, and in secondary reload, you will generate the
> combined insn that will become the fusion instruction.
>
David Edelsohn Oct. 22, 2013, 2:21 p.m. UTC | #8
On Mon, Oct 21, 2013 at 10:42 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:

>> I would say let's add -mlra, but make the default OFF for the time being.
>> We
>> can always switch the default later.
>
> Sure, if you know some LRA problems it should not be on default. Moreover,
> if we still have the problems when releasing gcc4.9, I think we should
> exclude any possibility for a user to use LRA for ppc.  I don't want to have
> GCC-4.9 users blaming LRA.
>
> But adding LRA to PPC on the trunk (switched OFF by default) earlier could
> help me a lot to work on the issues.

My main concern was disrupting Mike. If Mike is comfortable with
adding LRA disabled by default, it is okay with me.

The patch mostly adds lra_in_progress, which will not have any effect
while LRA remains disabled.

My one question about the patch is:

-  [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,??&r")
+  [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,&r")

which may cause register preferencing problems for bswap when LRA is not used.
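
For readers less familiar with constraint strings: alternatives are separated by commas, and each `?` disparages the alternative it appears in. The following Python sketch (the helper name `disparagement` is made up for illustration) shows what the hunk above changes, namely that the third alternative loses its `??` penalty and is no longer ranked below the first.

```python
def disparagement(constraint):
    """Count the '?' characters in each comma-separated alternative."""
    return [alternative.count("?") for alternative in constraint.split(",")]


old_constraint = "=&r,Z,??&r"  # before the patch
new_constraint = "=&r,Z,&r"    # after the patch

print(disparagement(old_constraint))  # [0, 0, 2]
print(disparagement(new_constraint))  # [0, 0, 0]
```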

The rest of the patch is okay.

Thanks, David
Michael Meissner Oct. 22, 2013, 3:33 p.m. UTC | #9
On Tue, Oct 22, 2013 at 10:21:32AM -0400, David Edelsohn wrote:
> On Mon, Oct 21, 2013 at 10:42 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> 
> >> I would say let's add -mlra, but make the default OFF for the time being.
> >> We
> >> can always switch the default later.
> >
> > Sure, if you know some LRA problems it should not be on default. Moreover,
> > if we still have the problems when releasing gcc4.9, I think we should
> > exclude any possibility for a user to use LRA for ppc.  I don't want to have
> > GCC-4.9 users blaming LRA.
> >
> > But adding LRA to PPC on the trunk (switched OFF by default) earlier could
> > help me a lot to work on the issues.
> 
> My main concern was disrupting Mike. If Mike is comfortable with
> adding LRA disabled by default, it is okay with me.
> 
> The patch mostly adds lra_in_progress, which will not have any effect
> while LRA remains disabled.
> 
> My one question about the patch is:
> 
> -  [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,??&r")
> +  [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,&r")
> 
> which may cause register preferencing problems for bswap when LRA is not used.
> 
> The rest of the patch is okay.
> 
> Thanks, David

Yeah, I can see a whole round of tuning issues, and everywhere
reload_in_progress is used, add lra_in_progress.  Because of the Advance
Toolchain, RHEL, and SLES, we will need to still deal with the original
register allocator.

Vlad, this is part of a message I had sent David, and I thought you were on the
CC list about LRA.

I haven't looked in detail what the changes are at this point.  I did do some
builds and comparisons.  It looks like there are definitely problems with
32-bit fortran and decimal floating point (and likely long double using IBM's
double double format).  If somebody has some cycles, it may be useful digging
into why we get these failures.

Note, I have some sort of configuration problem in running dealII, so it
isn't run right now:

Spec 2006, 64-bit, 3 runs, picking the middle, power7 options:

Benchmark                   Type    Percent
400.perlbench               int      96.74%
401.bzip2                   int     100.09%
403.gcc                     int      99.94%
429.mcf                     int      99.21%
445.gobmk                   int      99.33%
456.hmmer                   int      98.34%
458.sjeng                   int      99.68%
462.libquantum              int     101.48%
464.h264ref                 int     101.40%
471.omnetpp                 int     100.28%
473.astar                   int     100.09%
483.xalancbmk               int      98.28%
410.bwaves                  fp       98.11%
416.gamess                  fp      101.31%
433.milc                    fp       99.43%
434.zeusmp                  fp      103.53%
435.gromacs                 fp      109.63%
436.cactusADM               fp       99.53%
437.leslie3d                fp      101.23%
444.namd                    fp      103.42%
447.dealII                  fp      ------
450.soplex                  fp       99.14%
453.povray                  fp       99.66%
454.calculix                fp       97.17%
459.GemsFDTD                fp      100.88%
465.tonto                   fp      101.18%
470.lbm                     fp       99.83%
481.wrf                     fp       93.38%
482.sphinx3                 fp      100.82%
Spec INT                    int      99.57%
Spec FP except 447.dealII   fp      100.43%

Perlbench, calculix, and wrf are slower.  Zeusmp, gromacs, and Namd are
faster.
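
As a cross-check of that summary, here is a small Python sketch that classifies the table entries. It assumes a percentage above 100 means faster with LRA, and it uses a 2.5% noise band; that threshold is my own choice, not something stated in the message, picked because it reproduces the six benchmarks named above.

```python
# Percent column from the 64-bit table above; 447.dealII is omitted (no result).
RESULTS = {
    "400.perlbench": 96.74, "401.bzip2": 100.09, "403.gcc": 99.94,
    "429.mcf": 99.21, "445.gobmk": 99.33, "456.hmmer": 98.34,
    "458.sjeng": 99.68, "462.libquantum": 101.48, "464.h264ref": 101.40,
    "471.omnetpp": 100.28, "473.astar": 100.09, "483.xalancbmk": 98.28,
    "410.bwaves": 98.11, "416.gamess": 101.31, "433.milc": 99.43,
    "434.zeusmp": 103.53, "435.gromacs": 109.63, "436.cactusADM": 99.53,
    "437.leslie3d": 101.23, "444.namd": 103.42, "450.soplex": 99.14,
    "453.povray": 99.66, "454.calculix": 97.17, "459.GemsFDTD": 100.88,
    "465.tonto": 101.18, "470.lbm": 99.83, "481.wrf": 93.38,
    "482.sphinx3": 100.82,
}

NOISE_BAND = 2.5  # assumed threshold in percentage points


def classify(results, band=NOISE_BAND):
    """Split benchmarks into those outside the noise band on either side."""
    slower = sorted(name for name, pct in results.items() if pct < 100.0 - band)
    faster = sorted(name for name, pct in results.items() if pct > 100.0 + band)
    return slower, faster


slower, faster = classify(RESULTS)
print("slower:", slower)
print("faster:", faster)
```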

Unfortunately, the profiling tools on my system seem to abort when I run 32-bit
benchmarks, so I haven't gotten the numbers recently (nor had time to get the
tools team to look at it).

In terms of building 32-bit, 3 benchmarks don't build with LRA: gamess, dealII
(note in 64-bit dealII builds, it just doesn't run correctly), and wrf.

Let's see.  In gamess, I see:

/home/meissner/fsf-install-ppc64/gcc-4_9-lra/bin/gfortran -c -o ormas1.fppized.o -g -save-temps=obj -ffast-math -Ofast -mveclibabi=mass -mcpu=power7 -mrecip=rsqrt -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -fno-aggressive-loop-optimizations -mlra -m32 ormas1.fppized.f
ormas1.fppized.f: In function 'maktabs':
ormas1.fppized.f:2281:0: internal compiler error: in check_rtl, at lra.c:2036
       END
 ^
0x105a08ef check_rtl
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/lra.c:2036
0x105a2bcb lra(_IO_FILE*)
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/lra.c:2432
0x10552933 do_reload
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4686
0x10552933 rest_of_handle_reload
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4815
0x10552933 execute
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4844
Please submit a full bug report,


In dealII we see:

/home/meissner/fsf-install-ppc64/gcc-4_9-lra/bin/g++ -c -o sparse_matrix_ez.float.o -DSPEC_CPU -DNDEBUG  -Iinclude -DBOOST_DISABLE_THREADS -Ddeal_II_dimension=3 -g -save-temps=obj -ffast-math -Ofast -mveclibabi=mass -mcpu=power7 -mrecip=rsqrt -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -fno-aggressive-loop-optimizations -mlra  -m32     -DSPEC_CPU_LINUX -include cstddef      sparse_matrix_ez.float.cc
quadrature_lib.cc: In constructor 'QGauss<dim>::QGauss(unsigned int) [with int dim = 1]':
quadrature_lib.cc:95:1: internal compiler error: in check_rtl, at lra.c:2036
 }
 ^
0x1073272f check_rtl
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/lra.c:2036
0x10734a0b lra(_IO_FILE*)
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/lra.c:2432
0x106e4773 do_reload
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4686
0x106e4773 rest_of_handle_reload
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4815
0x106e4773 execute
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4844
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [quadrature_lib.o] Error 1
specmake: *** Waiting for unfinished jobs....
polynomial.cc: In member function 'Polynomials::Polynomial<number> Polynomials::Polynomial<number>::derivative() const [with number = long double]':
polynomial.cc:282:3: internal compiler error: in check_rtl, at lra.c:2036
   }
   ^
0x1073272f check_rtl
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/lra.c:2036
0x10734a0b lra(_IO_FILE*)
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/lra.c:2432
0x106e4773 do_reload
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4686
0x106e4773 rest_of_handle_reload
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4815
0x106e4773 execute
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4844
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [polynomial.o] Error 1

In wrf, we get:
/home/meissner/fsf-install-ppc64/gcc-4_9-lra/bin/gfortran -c -o module_radiation_driver.fppized.o -I. -I./netcdf/include -g -save-temps=obj -ffast-math -Ofast -mveclibabi=mass -mcpu=power7 -mrecip=rsqrt -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -fno-aggressive-loop-optimizations -mlra -m32 module_radiation_driver.fppized.f90
module_diffusion_em.fppized.f90: In function 'cal_deform_and_div':
module_diffusion_em.fppized.f90:829:0: internal compiler error: in check_rtl, at lra.c:2036
     END SUBROUTINE cal_deform_and_div
 ^
0x105a08ef check_rtl
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/lra.c:2036
0x105a2bcb lra(_IO_FILE*)
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/lra.c:2432
0x10552933 do_reload
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4686
0x10552933 rest_of_handle_reload
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4815
0x10552933 execute
        /home/meissner/fsf-src/gcc-4_9-lra/gcc/ira.c:4844
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [module_diffusion_em.fppized.o] Error 1
specmake: *** Waiting for unfinished jobs....
Error with make 'specmake -j40 build': check file '/home/meissner/spec-build/spec-2006-base-dev49-power7-vsx-svn203459-lra-shared-at7.0-32bit/benchspec/CPU2006/481.wrf/build/build_base_dev49-power7-vsx-32bit.0000/make.err'
  Command returned exit code 2
  Error with make!
*** Error building 481.wrf

I checked the LRA changes into a branch, and it is based off of subversion id
203569.
svn+ssh://gcc.gnu.org/svn/gcc/branches/ibm/gcc-4_9-lra

Let's see, in terms of make check regressions:

Unexpected tests for gcc -m64:

Test                                            | gcc-4 #1 | trunk #2
=============================================== | ======== | ========
gcc.target/powerpc/p8vector-ldst.c              | fail     | ---
gcc.target/powerpc/pr57744.c                    | fail     | ---

Unexpected tests for gcc -m32:

Test                                            | gcc-4 #1 | trunk #2
=============================================== | ======== | ========
c-c++-common/dfp/cast.c                         | fail     | ---
c-c++-common/dfp/convert-bfp-10.c               | fail     | ---
c-c++-common/dfp/convert-bfp-11.c               | fail     | ---
c-c++-common/dfp/convert-bfp-2.c                | fail     | ---
c-c++-common/dfp/convert-bfp-3.c                | fail     | ---
c-c++-common/dfp/convert-bfp-4.c                | fail     | ---
c-c++-common/dfp/convert-bfp-5.c                | fail     | ---
c-c++-common/dfp/convert-bfp-6.c                | fail     | ---
c-c++-common/dfp/convert-bfp-7.c                | fail     | ---
c-c++-common/dfp/convert-bfp.c                  | fail     | ---
c-c++-common/dfp/inf-1.c                        | fail     | ---
gcc.target/powerpc/bswap64-4.c                  | fail     | ---
gcc.target/powerpc/p8vector-ldst.c              | fail     | ---
gcc.target/powerpc/pr53199.c                    | fail     | ---

Unexpected tests for g++ -m32:

Test                              | gcc-4 #1 | trunk #2
================================= | ======== | ========
c-c++-common/dfp/cast.c           | fail     | ---
c-c++-common/dfp/convert-bfp-10.c | fail     | ---
c-c++-common/dfp/convert-bfp-11.c | fail     | ---
c-c++-common/dfp/convert-bfp-2.c  | fail     | ---
c-c++-common/dfp/convert-bfp-3.c  | fail     | ---
c-c++-common/dfp/convert-bfp-4.c  | fail     | ---
c-c++-common/dfp/convert-bfp-5.c  | fail     | ---
c-c++-common/dfp/convert-bfp-6.c  | fail     | ---
c-c++-common/dfp/convert-bfp-7.c  | fail     | ---
c-c++-common/dfp/convert-bfp.c    | fail     | ---
c-c++-common/dfp/inf-1.c          | fail     | ---

Unexpected tests for gfortran -m32:

Test                                                        | gcc-4 #1 | trunk #2
=========================================================== | ======== | ========
gfortran.dg/PR19872.f                                       | fail     | ---
gfortran.dg/advance_1.f90                                   | fail     | ---
gfortran.dg/advance_4.f90                                   | fail     | ---
gfortran.dg/advance_5.f90                                   | fail     | ---
gfortran.dg/advance_6.f90                                   | fail     | ---
gfortran.dg/append_1.f90                                    | fail     | ---
gfortran.dg/associated_2.f90                                | fail     | ---
gfortran.dg/assumed_rank_1.f90                              | fail     | ---
gfortran.dg/assumed_rank_2.f90                              | fail     | ---
gfortran.dg/assumed_rank_7.f90                              | fail     | ---
gfortran.dg/assumed_type_2.f90                              | fail     | ---
gfortran.dg/backspace_10.f90                                | fail     | ---
gfortran.dg/backspace_2.f                                   | fail     | ---
gfortran.dg/backspace_8.f                                   | fail     | ---
gfortran.dg/backspace_9.f                                   | fail     | ---
gfortran.dg/bound_2.f90                                     | fail     | ---
gfortran.dg/bound_7.f90                                     | fail     | ---
gfortran.dg/char_cshift_1.f90                               | fail     | ---
gfortran.dg/char_cshift_2.f90                               | fail     | ---
gfortran.dg/char_cshift_3.f90                               | fail     | ---
gfortran.dg/char_eoshift_1.f90                              | fail     | ---
gfortran.dg/char_eoshift_2.f90                              | fail     | ---
gfortran.dg/char_eoshift_3.f90                              | fail     | ---
gfortran.dg/char_eoshift_4.f90                              | fail     | ---
gfortran.dg/char_eoshift_5.f90                              | fail     | ---
gfortran.dg/char_length_8.f90                               | fail     | ---
gfortran.dg/chmod_1.f90                                     | fail     | ---
gfortran.dg/chmod_2.f90                                     | fail     | ---
gfortran.dg/chmod_3.f90                                     | fail     | ---
gfortran.dg/comma.f                                         | fail     | ---
gfortran.dg/convert_2.f90                                   | fail     | ---
gfortran.dg/convert_implied_open.f90                        | fail     | ---
gfortran.dg/cr_lf.f90                                       | fail     | ---
gfortran.dg/cshift_bounds_1.f90                             | fail     | ---
gfortran.dg/cshift_bounds_2.f90                             | fail     | ---
gfortran.dg/cshift_bounds_3.f90                             | fail     | ---
gfortran.dg/cshift_bounds_4.f90                             | fail     | ---
gfortran.dg/cshift_nan_1.f90                                | fail     | ---
gfortran.dg/defined_assignment_9.f90                        | fail     | ---
gfortran.dg/dev_null.F90                                    | fail     | ---
gfortran.dg/direct_io_1.f90                                 | fail     | ---
gfortran.dg/direct_io_11.f90                                | fail     | ---
gfortran.dg/direct_io_12.f90                                | fail     | ---
gfortran.dg/direct_io_2.f90                                 | fail     | ---
gfortran.dg/direct_io_3.f90                                 | fail     | ---
gfortran.dg/direct_io_5.f90                                 | fail     | ---
gfortran.dg/direct_io_8.f90                                 | fail     | ---
gfortran.dg/endfile.f90                                     | fail     | ---
gfortran.dg/endfile_2.f90                                   | fail     | ---
gfortran.dg/eof_4.f90                                       | fail     | ---
gfortran.dg/eoshift.f90                                     | fail     | ---
gfortran.dg/eoshift_bounds_1.f90                            | fail     | ---
gfortran.dg/error_format.f90                                | fail     | ---
gfortran.dg/f2003_inquire_1.f03                             | fail     | ---
gfortran.dg/f2003_io_1.f03                                  | fail     | ---
gfortran.dg/f2003_io_5.f03                                  | fail     | ---
gfortran.dg/f2003_io_7.f03                                  | fail     | ---
gfortran.dg/fmt_cache_1.f                                   | fail     | ---
gfortran.dg/fmt_error_4.f90                                 | fail     | ---
gfortran.dg/fmt_error_5.f90                                 | fail     | ---
gfortran.dg/fmt_t_5.f90                                     | fail     | ---
gfortran.dg/fmt_t_7.f                                       | fail     | ---
gfortran.dg/ftell_3.f90                                     | fail     | ---
gfortran.dg/hollerith4.f90                                  | fail     | ---
gfortran.dg/inquire_10.f90                                  | fail     | ---
gfortran.dg/inquire_13.f90                                  | fail     | ---
gfortran.dg/inquire_15.f90                                  | fail     | ---
gfortran.dg/inquire_9.f90                                   | fail     | ---
gfortran.dg/inquire_size.f90                                | fail     | ---
gfortran.dg/iomsg_1.f90                                     | fail     | ---
gfortran.dg/iostat_2.f90                                    | fail     | ---
gfortran.dg/list_read_10.f90                                | fail     | ---
gfortran.dg/list_read_11.f90                                | fail     | ---
gfortran.dg/list_read_6.f90                                 | fail     | ---
gfortran.dg/list_read_7.f90                                 | fail     | ---
gfortran.dg/list_read_9.f90                                 | fail     | ---
gfortran.dg/matmul_1.f90                                    | fail     | ---
gfortran.dg/matmul_5.f90                                    | fail     | ---
gfortran.dg/maxloc_bounds_1.f90                             | fail     | ---
gfortran.dg/maxloc_bounds_2.f90                             | fail     | ---
gfortran.dg/maxloc_bounds_3.f90                             | fail     | ---
gfortran.dg/maxloc_bounds_6.f90                             | fail     | ---
gfortran.dg/maxloc_bounds_8.f90                             | fail     | ---
gfortran.dg/namelist_44.f90                                 | fail     | ---
gfortran.dg/namelist_45.f90                                 | fail     | ---
gfortran.dg/namelist_46.f90                                 | fail     | ---
gfortran.dg/namelist_66.f90                                 | fail     | ---
gfortran.dg/namelist_72.f                                   | fail     | ---
gfortran.dg/namelist_82.f90                                 | fail     | ---
gfortran.dg/negative_automatic_size.f90                     | fail     | ---
gfortran.dg/negative_unit.f                                 | fail     | ---
gfortran.dg/negative_unit_int8.f                            | fail     | ---
gfortran.dg/newunit_1.f90                                   | fail     | ---
gfortran.dg/newunit_3.f90                                   | fail     | ---
gfortran.dg/open_access_append_1.f90                        | fail     | ---
gfortran.dg/open_errors.f90                                 | fail     | ---
gfortran.dg/open_negative_unit_1.f90                        | fail     | ---
gfortran.dg/open_new.f90                                    | fail     | ---
gfortran.dg/open_readonly_1.f90                             | fail     | ---
gfortran.dg/open_status_1.f90                               | fail     | ---
gfortran.dg/open_status_2.f90                               | fail     | ---
gfortran.dg/open_status_3.f90                               | fail     | ---
gfortran.dg/optional_dim_2.f90                              | fail     | ---
gfortran.dg/optional_dim_3.f90                              | fail     | ---
gfortran.dg/overwrite_1.f                                   | fail     | ---
gfortran.dg/pointer_assign_8.f90                            | fail     | ---
gfortran.dg/pr16597.f90                                     | fail     | ---
gfortran.dg/pr16935.f90                                     | fail     | ---
gfortran.dg/pr20954.f                                       | fail     | ---
gfortran.dg/pr39865.f90                                     | fail     | ---
gfortran.dg/pr46804.f90                                     | fail     | ---
gfortran.dg/pr47878.f90                                     | fail     | ---
gfortran.dg/read_comma.f                                    | fail     | ---
gfortran.dg/read_eof_4.f90                                  | fail     | ---
gfortran.dg/read_eof_8.f90                                  | fail     | ---
gfortran.dg/read_eof_all.f90                                | fail     | ---
gfortran.dg/read_list_eof_1.f90                             | fail     | ---
gfortran.dg/read_many_1.f                                   | fail     | ---
gfortran.dg/read_no_eor.f90                                 | fail     | ---
gfortran.dg/readwrite_unf_direct_eor_1.f90                  | fail     | ---
gfortran.dg/realloc_on_assign_11.f90                        | fail     | ---
gfortran.dg/realloc_on_assign_7.f03                         | fail     | ---
gfortran.dg/record_marker_1.f90                             | fail     | ---
gfortran.dg/record_marker_3.f90                             | fail     | ---
gfortran.dg/runtime_warning_1.f90                           | fail     | ---
gfortran.dg/selected_char_kind_1.f90                        | fail     | ---
gfortran.dg/selected_char_kind_4.f90                        | fail     | ---
gfortran.dg/shift-alloc.f90                                 | fail     | ---
gfortran.dg/shift-kind_2.f90                                | fail     | ---
gfortran.dg/stat_1.f90                                      | fail     | ---
gfortran.dg/stat_2.f90                                      | fail     | ---
gfortran.dg/streamio_1.f90                                  | fail     | ---
gfortran.dg/streamio_10.f90                                 | fail     | ---
gfortran.dg/streamio_12.f90                                 | fail     | ---
gfortran.dg/streamio_14.f90                                 | fail     | ---
gfortran.dg/streamio_15.f90                                 | fail     | ---
gfortran.dg/streamio_16.f90                                 | fail     | ---
gfortran.dg/streamio_2.f90                                  | fail     | ---
gfortran.dg/streamio_3.f90                                  | fail     | ---
gfortran.dg/streamio_4.f90                                  | fail     | ---
gfortran.dg/streamio_5.f90                                  | fail     | ---
gfortran.dg/streamio_6.f90                                  | fail     | ---
gfortran.dg/streamio_7.f90                                  | fail     | ---
gfortran.dg/streamio_8.f90                                  | fail     | ---
gfortran.dg/streamio_9.f90                                  | fail     | ---
gfortran.dg/tl_editing.f90                                  | fail     | ---
gfortran.dg/unf_io_convert_1.f90                            | fail     | ---
gfortran.dg/unf_io_convert_2.f90                            | fail     | ---
gfortran.dg/unf_io_convert_3.f90                            | fail     | ---
gfortran.dg/unf_io_convert_4.f90                            | fail     | ---
gfortran.dg/unf_read_corrupted_1.f90                        | fail     | ---
gfortran.dg/unf_short_record_1.f90                          | fail     | ---
gfortran.dg/unformatted_subrecord_1.f90                     | fail     | ---
gfortran.dg/unpack_bounds_1.f90                             | fail     | ---
gfortran.dg/unpack_bounds_2.f90                             | fail     | ---
gfortran.dg/unpack_bounds_3.f90                             | fail     | ---
gfortran.dg/widechar_intrinsics_10.f90                      | fail     | ---
gfortran.dg/widechar_intrinsics_5.f90                       | fail     | ---
gfortran.dg/write_back.f                                    | fail     | ---
gfortran.dg/write_check.f90                                 | fail     | ---
gfortran.dg/write_check3.f90                                | fail     | ---
gfortran.dg/write_direct_eor.f90                            | fail     | ---
gfortran.dg/write_rewind_1.f                                | fail     | ---
gfortran.dg/write_rewind_2.f                                | fail     | ---
gfortran.dg/write_to_null.F90                               | fail     | ---
gfortran.dg/x_slash_2.f                                     | fail     | ---
gfortran.dg/zero_sized_1.f90                                | fail     | ---
gfortran.fortran-torture/execute/backspace.f90              | fail     | ---
gfortran.fortran-torture/execute/direct_io.f90              | fail     | ---
gfortran.fortran-torture/execute/inquire_1.f90              | fail     | ---
gfortran.fortran-torture/execute/inquire_2.f90              | fail     | ---
gfortran.fortran-torture/execute/inquire_3.f90              | fail     | ---
gfortran.fortran-torture/execute/inquire_4.f90              | fail     | ---
gfortran.fortran-torture/execute/inquire_5.f90              | fail     | ---
gfortran.fortran-torture/execute/intrinsic_associated.f90   | fail     | ---
gfortran.fortran-torture/execute/intrinsic_associated_2.f90 | fail     | ---
gfortran.fortran-torture/execute/intrinsic_cshift.f90       | fail     | ---
gfortran.fortran-torture/execute/intrinsic_eoshift.f90      | fail     | ---
gfortran.fortran-torture/execute/intrinsic_size.f90         | fail     | ---
gfortran.fortran-torture/execute/list_read_1.f90            | fail     | ---
gfortran.fortran-torture/execute/open_replace.f90           | fail     | ---
gfortran.fortran-torture/execute/seq_io.f90                 | fail     | ---
gfortran.fortran-torture/execute/slash_edit.f90             | fail     | ---
gfortran.fortran-torture/execute/unopened_unit_1.f90        | fail     | ---
Vladimir Makarov Oct. 23, 2013, 2:16 a.m. UTC | #10
On 13-10-22 10:21 AM, David Edelsohn wrote:
> On Mon, Oct 21, 2013 at 10:42 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>>> I would say let's add -mlra, but make the default OFF for the time being.
>>> We can always switch the default later.
>> Sure, if you know of some LRA problems it should not be on by default. Moreover,
>> if we still have the problems when releasing gcc4.9, I think we should
>> exclude any possibility for a user to use LRA for ppc.  I don't want to have
>> GCC-4.9 users blaming LRA.
>>
>> But adding LRA to PPC on the trunk (switched OFF by default) earlier could
>> help me a lot to work on the issues.
> My main concern was disrupting Mike. If Mike is comfortable with
> adding LRA disabled by default, it is okay with me.
>
> The patch mostly adds lra_in_progress, which will not have any effect
> while LRA remains disabled.
>
> My one question about the patch is:
>
> -  [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,??&r")
> +  [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,&r")
>
> which may cause register preferencing problems for bswap when LRA is not used.
>
> The rest of the patch is okay.
>
>
Thanks, David.  I'll commit the patch this week without this change (and
with LRA active only when -mlra is given).  The change was made to fix
a testsuite failure caused by bad code generation.
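
To illustrate why the lra_in_progress checks are inert while the option is off, here is a small self-contained C model.  It is only a sketch: lra_flag_model, lra_in_progress_model and the model_* functions are hypothetical stand-ins, not GCC's real variables or hooks, and the default of 0 reflects the plan stated above rather than the Init(1) in the posted patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the variable behind -mlra; 0 models "off by default".  */
static int lra_flag_model = 0;

/* Stand-in for the global that is true only while LRA is running.  */
static bool lra_in_progress_model = false;

/* Models the target hook: LRA is used only when the flag is set.  */
static bool
model_lra_p (void)
{
  return lra_flag_model != 0;
}

/* Models entering register allocation: LRA is marked as in progress
   only if the hook selected it; otherwise reload would run.  */
static void
model_start_reg_alloc (void)
{
  lra_in_progress_model = model_lra_p ();
}

/* Models the strictness argument the patch passes to
   legitimate_constant_pool_address_p: strict || lra_in_progress.  */
static bool
model_effective_strict (bool strict)
{
  /* LRA can be in progress only if the hook selected it.  */
  assert (!lra_in_progress_model || model_lra_p ());
  return strict || lra_in_progress_model;
}
```

With the flag left at 0, model_effective_strict returns its argument unchanged, which matches the observation that the added checks have no effect while LRA remains disabled.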

It can be fixed in another way that does not affect reload: by adding a modified
copy of the insn definition that is active only when LRA is used and making the
original definition active only when reload is used.  But I'll do it later.
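
The idea of two mutually exclusive insn definitions can be modeled roughly in C as below.  This is a sketch under stated assumptions, not the machine-description change itself: the name "*bswapdi2_64bit_lra", the struct, and the use_lra variable are hypothetical, and a real fix would presumably be expressed through insn conditions in rs6000.md.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the -mlra setting.  */
static bool use_lra = false;

/* One variant of the pattern, with the constraints of operand 0.  */
struct insn_variant
{
  const char *name;
  const char *op0_constraints;
  bool enabled_with_lra;	/* Variant is active only in this mode.  */
};

/* The original definition keeps "??" for reload; the copy (its name
   is made up for this sketch) drops it for LRA.  */
static const struct insn_variant bswapdi2_variants[] = {
  { "*bswapdi2_64bit",     "=&r,Z,??&r", false },
  { "*bswapdi2_64bit_lra", "=&r,Z,&r",   true  },
};

/* Return the single variant that is active for the current setting.  */
static const struct insn_variant *
active_bswapdi2_variant (void)
{
  const struct insn_variant *found = NULL;
  size_t n = sizeof bswapdi2_variants / sizeof bswapdi2_variants[0];

  for (size_t i = 0; i < n; i++)
    if (bswapdi2_variants[i].enabled_with_lra == use_lra)
      {
	/* Exactly one variant may match.  */
	assert (found == NULL);
	found = &bswapdi2_variants[i];
      }
  assert (found != NULL);
  return found;
}
```

Because the two conditions are complementary, reload keeps seeing the original constraints, so its register preferencing is unchanged.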

Patch

2013-10-03  Vladimir Makarov  <vmakarov@redhat.com>

	* config/rs6000/rs6000-protos.h
	(rs6000_secondary_memory_needed_mode): New prototype.
	* config/rs6000/rs6000.c: Include ira.h.
	(TARGET_LRA_P): Redefine.
	(rs6000_legitimate_offset_address_p): Call
	legitimate_constant_pool_address_p in strict mode for LRA.
	(rs6000_legitimate_address_p): Ditto.
	(legitimate_lo_sum_address_p): Add code for LRA.
	Use lra_in_progress.
	(rs6000_emit_move): Add LRA version of code to generate load/store
	of SDmode values.
	(rs6000_secondary_memory_needed_mode): New.
	(rs6000_alloc_sdmode_stack_slot): Do nothing for LRA.
	(rs6000_secondary_reload_class): Return NO_REGS for LRA for
	constants, memory, and FP registers.
	(rs6000_lra_p): New.
	* config/rs6000/rs6000.h (SECONDARY_MEMORY_NEEDED_MODE): New
	macro.
	* config/rs6000/rs6000.md (*bswapdi2_64bit): Remove ?? from 3rd
	alternative.
	* config/rs6000/rs6000.opt (mlra): New option.

Index: config/rs6000/rs6000-protos.h
===================================================================
--- config/rs6000/rs6000-protos.h	(revision 203164)
+++ config/rs6000/rs6000-protos.h	(working copy)
@@ -124,6 +124,8 @@  extern rtx create_TOC_reference (rtx, rt
 extern void rs6000_split_multireg_move (rtx, rtx);
 extern void rs6000_emit_move (rtx, rtx, enum machine_mode);
 extern rtx rs6000_secondary_memory_needed_rtx (enum machine_mode);
+extern enum machine_mode rs6000_secondary_memory_needed_mode (enum
+							      machine_mode);
 extern rtx (*rs6000_legitimize_reload_address_ptr) (rtx, enum machine_mode,
 						    int, int, int, int *);
 extern bool rs6000_legitimate_offset_address_p (enum machine_mode, rtx,
Index: config/rs6000/rs6000.c
===================================================================
--- config/rs6000/rs6000.c	(revision 203164)
+++ config/rs6000/rs6000.c	(working copy)
@@ -56,6 +56,7 @@ 
 #include "intl.h"
 #include "params.h"
 #include "tm-constrs.h"
+#include "ira.h"
 #include "opts.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -1493,6 +1494,9 @@  static const struct attribute_spec rs600
 #undef TARGET_MODE_DEPENDENT_ADDRESS_P
 #define TARGET_MODE_DEPENDENT_ADDRESS_P rs6000_mode_dependent_address_p
 
+#undef TARGET_LRA_P
+#define TARGET_LRA_P rs6000_lra_p
+
 #undef TARGET_CAN_ELIMINATE
 #define TARGET_CAN_ELIMINATE rs6000_can_eliminate
 
@@ -6030,7 +6034,7 @@  rs6000_legitimate_offset_address_p (enum
     return false;
   if (!reg_offset_addressing_ok_p (mode))
     return virtual_stack_registers_memory_p (x);
-  if (legitimate_constant_pool_address_p (x, mode, strict))
+  if (legitimate_constant_pool_address_p (x, mode, strict || lra_in_progress))
     return true;
   if (GET_CODE (XEXP (x, 1)) != CONST_INT)
     return false;
@@ -6170,19 +6174,31 @@  legitimate_lo_sum_address_p (enum machin
 
   if (TARGET_ELF || TARGET_MACHO)
     {
+      bool large_toc_ok;
+
       if (DEFAULT_ABI != ABI_AIX && DEFAULT_ABI != ABI_DARWIN && flag_pic)
 	return false;
-      if (TARGET_TOC)
+      /* LRA does not use LEGITIMIZE_RELOAD_ADDRESS as it usually calls
+	 push_reload from reload pass code.  LEGITIMIZE_RELOAD_ADDRESS
+	 recognizes some LO_SUM addresses as valid although this
+	 function says the opposite.  In most cases, LRA through different
+	 transformations can generate correct code for address reloads.
+	 Only some LO_SUM cases it cannot manage.  So we need to add
+	 code analogous to that in rs6000_legitimize_reload_address for
+	 LO_SUM here, saying that some addresses are still valid.  */
+      large_toc_ok = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
+		      && small_toc_ref (x, VOIDmode));
+      if (TARGET_TOC && ! large_toc_ok)
 	return false;
       if (GET_MODE_NUNITS (mode) != 1)
 	return false;
-      if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
+      if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD
 	  && !(/* ??? Assume floating point reg based on mode?  */
 	       TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
 	       && (mode == DFmode || mode == DDmode)))
 	return false;
 
-      return CONSTANT_P (x);
+      return CONSTANT_P (x) || large_toc_ok;
     }
 
   return false;
@@ -7180,7 +7196,8 @@  rs6000_legitimate_address_p (enum machin
   if (reg_offset_p && legitimate_small_data_p (mode, x))
     return 1;
   if (reg_offset_p
-      && legitimate_constant_pool_address_p (x, mode, reg_ok_strict))
+      && legitimate_constant_pool_address_p (x, mode,
+					     reg_ok_strict || lra_in_progress))
     return 1;
   /* For TImode, if we have load/store quad, only allow register indirect
      addresses.  This will allow the values to go in either GPRs or VSX
@@ -7479,6 +7496,7 @@  rs6000_conditional_register_usage (void)
 	  fixed_regs[i] = call_used_regs[i] = call_really_used_regs[i] = 1;
     }
 }
+
 
 /* Try to output insns to set TARGET equal to the constant C if it can
    be done in less than N insns.  Do all computations in MODE.
@@ -7783,6 +7801,68 @@  rs6000_emit_move (rtx dest, rtx source,
     cfun->machine->sdmode_stack_slot =
       eliminate_regs (cfun->machine->sdmode_stack_slot, VOIDmode, NULL_RTX);
 
+
+  if (lra_in_progress
+      && mode == SDmode
+      && REG_P (operands[0]) && REGNO (operands[0]) >= FIRST_PSEUDO_REGISTER
+      && reg_preferred_class (REGNO (operands[0])) == NO_REGS
+      && (REG_P (operands[1])
+	  || (GET_CODE (operands[1]) == SUBREG
+	      && REG_P (SUBREG_REG (operands[1])))))
+    {
+      int regno = REGNO (GET_CODE (operands[1]) == SUBREG
+			 ? SUBREG_REG (operands[1]) : operands[1]);
+      enum reg_class cl;
+
+      if (regno >= FIRST_PSEUDO_REGISTER)
+	{
+	  cl = reg_preferred_class (regno);
+	  gcc_assert (cl != NO_REGS);
+	  regno = ira_class_hard_regs[cl][0];
+	}
+      if (FP_REGNO_P (regno))
+	{
+	  if (GET_MODE (operands[0]) != DDmode)
+	    operands[0] = gen_rtx_SUBREG (DDmode, operands[0], 0);
+	  emit_insn (gen_movsd_store (operands[0], operands[1]));
+	}
+      else if (INT_REGNO_P (regno))
+	emit_insn (gen_movsd_hardfloat (operands[0], operands[1]));
+      else
+	gcc_unreachable();
+      return;
+    }
+  if (lra_in_progress
+      && mode == SDmode
+      && (REG_P (operands[0])
+	  || (GET_CODE (operands[0]) == SUBREG
+	      && REG_P (SUBREG_REG (operands[0]))))
+      && REG_P (operands[1]) && REGNO (operands[1]) >= FIRST_PSEUDO_REGISTER
+      && reg_preferred_class (REGNO (operands[1])) == NO_REGS)
+    {
+      int regno = REGNO (GET_CODE (operands[0]) == SUBREG
+			 ? SUBREG_REG (operands[0]) : operands[0]);
+      enum reg_class cl;
+
+      if (regno >= FIRST_PSEUDO_REGISTER)
+	{
+	  cl = reg_preferred_class (regno);
+	  gcc_assert (cl != NO_REGS);
+	  regno = ira_class_hard_regs[cl][0];
+	}
+      if (FP_REGNO_P (regno))
+	{
+	  if (GET_MODE (operands[1]) != DDmode)
+	    operands[1] = gen_rtx_SUBREG (DDmode, operands[1], 0);
+	  emit_insn (gen_movsd_load (operands[0], operands[1]));
+	}
+      else if (INT_REGNO_P (regno))
+	emit_insn (gen_movsd_hardfloat (operands[0], operands[1]));
+      else
+	gcc_unreachable();
+      return;
+    }
+
   if (reload_in_progress
       && mode == SDmode
       && cfun->machine->sdmode_stack_slot != NULL_RTX
@@ -14630,6 +14710,17 @@  rs6000_secondary_memory_needed_rtx (enum
   return ret;
 }
 
+/* Return the mode to be used for memory when a secondary memory
+   location is needed.  For SDmode values we need to use DDmode, in
+   all other cases we can use the same mode.  */
+enum machine_mode
+rs6000_secondary_memory_needed_mode (enum machine_mode mode)
+{
+  if (mode == SDmode)
+    return DDmode;
+  return mode;
+}
+
 static tree
 rs6000_check_sdmode (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED)
 {
@@ -15523,6 +15614,10 @@  rs6000_alloc_sdmode_stack_slot (void)
   gimple_stmt_iterator gsi;
 
   gcc_assert (cfun->machine->sdmode_stack_slot == NULL_RTX);
+  /* We use a different approach for dealing with the secondary
+     memory in LRA.  */
+  if (ira_use_lra_p)
+    return;
 
   if (TARGET_NO_SDMODE_STACK)
     return;
@@ -15744,7 +15839,7 @@  rs6000_secondary_reload_class (enum reg_
   /* Constants, memory, and FP registers can go into FP registers.  */
   if ((regno == -1 || FP_REGNO_P (regno))
       && (rclass == FLOAT_REGS || rclass == NON_SPECIAL_REGS))
-    return (mode != SDmode) ? NO_REGS : GENERAL_REGS;
+    return (mode != SDmode || lra_in_progress) ? NO_REGS : GENERAL_REGS;
 
   /* Memory, and FP/altivec registers can go into fp/altivec registers under
      VSX.  However, for scalar variables, use the traditional floating point
@@ -28936,6 +29031,13 @@  rs6000_libcall_value (enum machine_mode
 }
 
 
+/* Return true if we use LRA instead of reload pass.  */
+static bool
+rs6000_lra_p (void)
+{
+  return rs6000_lra_flag;
+}
+
 /* Given FROM and TO register numbers, say whether this elimination is allowed.
    Frame pointer elimination is automatically handled.
 
Index: config/rs6000/rs6000.h
===================================================================
--- config/rs6000/rs6000.h	(revision 203164)
+++ config/rs6000/rs6000.h	(working copy)
@@ -1491,6 +1491,13 @@  extern enum reg_class rs6000_constraints
 #define SECONDARY_MEMORY_NEEDED_RTX(MODE) \
   rs6000_secondary_memory_needed_rtx (MODE)
 
+/* Specify the mode to be used for memory when a secondary memory
+   location is needed.  For cpus that cannot load/store SDmode values
+   from the 64-bit FP registers without using a full 64-bit
+   load/store, we need a wider mode.  */
+#define SECONDARY_MEMORY_NEEDED_MODE(MODE)		\
+  rs6000_secondary_memory_needed_mode (MODE)
+
 /* Return the maximum number of consecutive registers
    needed to represent mode MODE in a register of class CLASS.
 
Index: config/rs6000/rs6000.md
===================================================================
--- config/rs6000/rs6000.md	(revision 203164)
+++ config/rs6000/rs6000.md	(working copy)
@@ -2391,7 +2391,7 @@ 
 
 ;; Non-power7/cell, fall back to use lwbrx/stwbrx
 (define_insn "*bswapdi2_64bit"
-  [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,??&r")
+  [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,&r")
 	(bswap:DI (match_operand:DI 1 "reg_or_mem_operand" "Z,r,r")))
    (clobber (match_scratch:DI 2 "=&b,&b,&r"))
    (clobber (match_scratch:DI 3 "=&r,&r,&r"))
Index: config/rs6000/rs6000.opt
===================================================================
--- config/rs6000/rs6000.opt	(revision 203164)
+++ config/rs6000/rs6000.opt	(working copy)
@@ -446,6 +446,10 @@  mlong-double-
 Target RejectNegative Joined UInteger Var(rs6000_long_double_type_size) Save
 -mlong-double-<n>	Specify size of long double (64 or 128 bits)
 
+mlra
+Target Report Var(rs6000_lra_flag) Init(1) Save
+Use LRA instead of reload
+
 msched-costly-dep=
 Target RejectNegative Joined Var(rs6000_sched_costly_dep_str)
 Determine which dependences between insns are considered costly
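
The SDmode handling in the patch can be summarized by the following simplified C model.  It is a sketch only: the MODEL_* enums and model_* functions are hypothetical stand-ins for GCC's machine modes and register classes, and the second function models just the return expression changed in rs6000_secondary_reload_class, not the surrounding register-number and class checks.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a few machine modes.  */
enum model_mode { MODEL_SImode, MODEL_SDmode, MODEL_DDmode, MODEL_DFmode };

/* Hypothetical stand-ins for a few register classes.  */
enum model_reg_class { MODEL_NO_REGS, MODEL_GENERAL_REGS, MODEL_FLOAT_REGS };

/* Mirrors rs6000_secondary_memory_needed_mode: SDmode values go
   through a DDmode memory slot, every other mode is kept as is.  */
static enum model_mode
model_secondary_memory_needed_mode (enum model_mode mode)
{
  return mode == MODEL_SDmode ? MODEL_DDmode : mode;
}

/* Mirrors the changed return expression for the FP register case:
   under reload, SDmode needs a general register as an intermediate;
   under LRA no secondary reload class is requested.  */
static enum model_reg_class
model_fp_secondary_reload_class (enum model_mode mode, bool lra_in_progress)
{
  return (mode != MODEL_SDmode || lra_in_progress)
	 ? MODEL_NO_REGS : MODEL_GENERAL_REGS;
}
```

In this model, LRA handles SDmode through the wider secondary memory mode instead of the dedicated stack slot and general-register intermediate that reload uses.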