
Combine four insns

Message ID 4C5FEF18.7010504@codesourcery.com
State New

Commit Message

Bernd Schmidt Aug. 9, 2010, 12:05 p.m. UTC
On 08/07/2010 10:10 AM, Eric Botcazou wrote:
> So I think that the patch shouldn't go in at this point.

Richard has approved it.  I'll wait a few more days to see if anyone
else agrees with your position.

> Combining Steven and Bernd's figures, 1% of a bootstrap time is 37% of the 
> combiner's time.  The result is 0.18% more combined insns.  It seems to me 
> that we are already very far in the territory of diminishing returns.

Better to look at actual code generation results IMO.  Do you have an
opinion on the examples I included with the patch?

> Bernd is essentially of the opinion that compilation time doesn't matter.

In a sense, the fact that a single CPU can bootstrap gcc in under 15
minutes is evidence that it doesn't matter.
However, what I'm actually saying is that we shouldn't prioritize
compile time over producing good code, based on what I think users want
more.

> It  seems to me that, even if we were to adopt this position, this shouldn't
> mean wasting compilation time, which I think is the case here.

Compile time is wasted only when it's spent on something that has no
user-visible impact.  For all the talk about how important it is, no one
seems to have made an effort to eliminate some fairly obvious sources of
waste, such as excessive use of ggc.  I suspect that some of the time
lost in combine is simply due to inefficient allocation and collection
of all the patterns it creates.

The following crude proof-of-concept patch moves rtl generation back to
obstacks.  (You may need --disable-werror which I just noticed I have in
the build tree).

Three runs with ggc:
real 14m8.202s  user 99m23.408s  sys 3m4.175s
real 14m25.045s user 100m14.608s sys 3m7.654s
real 14m2.115s  user 99m9.492s sys 3m4.461s

Three runs with obstacks:
real 13m49.718s user 97m10.766s sys 3m4.311s
real 13m42.406s user 96m39.082s sys 3m3.908s
real 13m49.806s user 97m1.344s sys 3m2.731s

Combiner patch on top of the obstacks patch:
real 13m51.508s user 97m25.865s sys 3m5.938s
real 13m47.367s user 97m28.612s sys 3m7.298s

(The numbers are not comparable to the ones included with the combiner
patch last week, as that tree contained some i386 backend changes as
well which I've removed for this test.)

Even if you take the 96m39s outlier, I think it shows that the overhead
of the combine-4 patch is somewhat reduced when RTL allocation is
restored to sanity.

Since I didn't know what kinds of problems to expect, I've only tried to
find some kind of fix for whatever showed up, not necessarily the best
possible one.  A second pass over everything would be necessary to clean
it up a little.  I'm somewhat disinclined to spend much more than the
one weekend on this; after all I don't care about compile time.


Bernd

Comments

Steven Bosscher Aug. 9, 2010, 12:33 p.m. UTC | #1
On Mon, Aug 9, 2010 at 2:05 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> The following crude proof-of-concept patch moves rtl generation back to
> obstacks.  (You may need --disable-werror which I just noticed I have in
> the build tree).

This is interesting. Years ago (7 years?) Zack Weinberg suggested that
GCC should move RTL back onto obstacks. The overhead should be
relatively small compared to keeping entire functions-as-trees in
memory. This is even more true today, now we keep entire translation
units (and more) in memory as GIMPLE (with SSA). Memory spent on RTL
is marginal compared to that.

I passed this idea to Laurynas a year ago
(http://gcc.gnu.org/ml/gcc/2009-08/msg00386.html). I don't know if he
played with the idea or not.

I've starred your prototype patch in gmail ;-)

Ciao!
Steven
Michael Matz Aug. 9, 2010, 12:38 p.m. UTC | #2
Hi,

On Mon, 9 Aug 2010, Bernd Schmidt wrote:

> On 08/07/2010 10:10 AM, Eric Botcazou wrote:
> > So I think that the patch shouldn't go in at this point.
> 
> Richard has approved it.  I'll wait a few more days to see if anyone
> else agrees with your position.

[I'm not a reviewer, but FWIW:]

I'm also not too thrilled about using 1% more compilation time for the 
result.  It's not so much a matter of which sequences are actually 
replaced (your examples showed quite abysmal initial sequences, no doubt), 
but rather a matter of how often they occur in practice, and there it 
doesn't look too bright:

> $ grep Trying.four log |wc -l
> 307743
> $ grep Succeeded.four.into.two log |wc -l
> 244
> $ grep Succeeded.four.into.one log |wc -l
> 140

So out of 1230972 insns it was able to remove 908.  That's 0.07%.  Not too 
exciting, no matter how bad these 384 initial sequences looked.

Especially for the bitmap manipulation stuff Richi once had something for 
the tree level that would enable better initial code generation.  Apart 
from that I unfortunately have no better idea for getting the results you got 
without paying so high a price.

Perhaps you can limit the cases in which it tries to match four 
instructions to some simpler ones?  Like requiring that at least two of 
the insns have "leaves" (expressions not depending on pseudos) as one 
of their operands?


Ciao,
Michael.
Bernd Schmidt Aug. 9, 2010, 1:47 p.m. UTC | #3
On 08/09/2010 02:33 PM, Steven Bosscher wrote:
> This is interesting. Years ago (7 years?) Zack Weinberg suggested that
> GCC should move RTL back onto obstacks. The overhead should be
> relatively small compared to keeping entire functions-as-trees in
> memory. This is even more true today, now we keep entire translation
> units (and more) in memory as GIMPLE (with SSA). Memory spent on RTL
> is marginal compared to that.

I think the idea is quite well-known.  The reason we moved to garbage
collection was just the fact that our previous obstack-based memory
management was unmaintainable, and we wanted to be able to introduce
things like tree-ssa.  Now that we've managed that transition, the
original reason to use garbage collection is lost, at least for RTL.  We
no longer do crazy things like starting to expand a nested function in
the middle of compiling its containing function.

The combine.c portion of the patch shows that obstacks can give us more
control over memory access patterns: we can create a pattern and then
immediately and cheaply discard the memory if we don't use it; the next
one we create reuses the same memory.  This should be better for locality.
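
To make the idiom concrete, here is a minimal sketch of the mark/release
pattern the fwprop.c hunks below rely on (an illustration only:
combine_obstack, make_tentative_pattern and pattern_is_useful are
made-up names, while XOBNEWVAR and obstack_free are the real libiberty
macros):

#define obstack_chunk_alloc xmalloc
#define obstack_chunk_free free
#include "obstack.h"
#include "libiberty.h"

/* Sketch only; assumes rtl.h for rtx and that obstack_init has been
   called on combine_obstack.  */
static struct obstack combine_obstack;

static rtx
try_one_combination (void)
{
  /* A zero-size allocation returns the current top of the obstack,
     which serves as a mark.  */
  char *mark = XOBNEWVAR (&combine_obstack, char, 0);
  rtx pat = make_tentative_pattern ();	/* Allocates on the obstack.  */

  if (!pattern_is_useful (pat))
    {
      /* Release everything allocated after MARK; the next pattern we
	 create reuses exactly the same memory, which is what helps
	 locality.  */
      obstack_free (&combine_obstack, mark);
      return NULL_RTX;
    }
  return pat;
}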


Bernd
Bernd Schmidt Aug. 9, 2010, 1:48 p.m. UTC | #4
On 08/09/2010 02:38 PM, Michael Matz wrote:
> Perhaps you can limit the cases in which it tries to match four 
> instructions to some simpler ones?  Like requiring that at least two of 
> the insns have "leaves" (expressions not depending on pseudos) as one 
> of their operands?

That might be workable.  I could require that one of them sets a
register to a constant; I expect that would still catch some interesting
cases and eliminate most of the ones that couldn't be combined anyway.
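
Something like the following, perhaps (a hypothetical sketch; the
function name and where it would hook into combine are made up, but
single_set, REG_P, SET_DEST, SET_SRC and CONSTANT_P are the standard
RTL accessors from rtl.h):

/* Only try a four-insn combination when at least one of the candidate
   insns simply sets a register to a constant.  */
static bool
worth_trying_four_insns (rtx i0, rtx i1, rtx i2, rtx i3)
{
  rtx insns[4];
  int i;

  insns[0] = i0, insns[1] = i1, insns[2] = i2, insns[3] = i3;
  for (i = 0; i < 4; i++)
    {
      rtx set = single_set (insns[i]);
      if (set && REG_P (SET_DEST (set)) && CONSTANT_P (SET_SRC (set)))
	return true;
    }
  return false;
}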


Bernd
Toon Moene Aug. 9, 2010, 2:34 p.m. UTC | #5
Bernd Schmidt wrote:

> On 08/07/2010 10:10 AM, Eric Botcazou wrote:

>> Combining Steven and Bernd's figures, 1% of a bootstrap time is 37% of the 
>> combiner's time.  The result is 0.18% more combined insns.  It seems to me 
>> that we are already very far in the territory of diminishing returns.
> 
> Better to look at actual code generation results IMO.  Do you have an
> opinion on the examples I included with the patch?

Well, one of the limitations of this analysis is that it is static - it 
doesn't say what the run-time influence of the simplification is.

If it is inside a heavily used loop, it might be far more important 
than it looks at first ...
Steven Bosscher Aug. 9, 2010, 2:39 p.m. UTC | #6
On Mon, Aug 9, 2010 at 4:34 PM, Toon Moene <toon@moene.org> wrote:
> Bernd Schmidt wrote:
>
>> On 08/07/2010 10:10 AM, Eric Botcazou wrote:
>
>>> Combining Steven and Bernd's figures, 1% of a bootstrap time is 37% of
>>> the combiner's time.  The result is 0.18% more combined insns.  It seems to
>>> me that we are already very far in the territory of diminishing returns.
>>
>> Better to look at actual code generation results IMO.  Do you have an
>> opinion on the examples I included with the patch?
>
> Well, one of the limitations of this analysis is that it is static - it
> doesn't say what the run-time influence of the simplification is.
>
> If it is inside a heavily used loop, it might be far more important
> than it looks at first ...

With that argument alone, you can justify a superoptimizer at -O2 :-P

There has to be a trade-off at some point.

Ciao!
Steven
Toon Moene Aug. 9, 2010, 2:54 p.m. UTC | #7
Steven Bosscher wrote:

> On Mon, Aug 9, 2010 at 4:34 PM, Toon Moene <toon@moene.org> wrote:

>> If it is inside a heavily used loop, it might be far more important
>> than it looks at first ...
> 
> With that argument alone, you can justify a superoptimizer at -O2 :-P
> 
> There has to be a trade-off at some point.

OK, let's get serious then about (improving) the speed of compilation.

In by far most of the examples I have seen over the last decade, speed 
of compilation is a concern when developers are in the compile->fix 
bug->compile cycle.

Most of these "compiles" are -O0 -g compiles, for obvious reasons (why 
spend time on optimization when you don't even know the code is correct ?)

So all of these complaints fall outside of the "this optimization adds 
1% of compile time when compiling with -O2" arguments.

When compilation time with -On (n > 1) becomes a *real* problem, I'd 
like to hear about it again (and then, Bernd's analysis of where time 
goes in compilation still holds).
Paul Koning Aug. 9, 2010, 2:58 p.m. UTC | #8
>> ...
> 
> OK, let's get serious then about (improving) the speed of compilation.
> 
> In by far most of the examples I have seen over the last decade, speed of compilation is a concern when developers are in the compile->fix bug->compile cycle.
> 
> Most of these "compiles" are -O0 -g compiles, for obvious reasons (why spend time on optimization when you don't even know the code is correct ?)

Actually, I tend to use -O1 for debug compiles, for two reasons. (1) a number of useful gcc warnings don't appear unless optimization is used; (2) if I have to step through instructions it's a whole lot more efficient.

	paul
Diego Novillo Aug. 9, 2010, 3:26 p.m. UTC | #9
On 10-08-09 10:54 , Toon Moene wrote:

> Most of these "compiles" are -O0 -g compiles, for obvious reasons (why
> spend time on optimization when you don't even know the code is correct ?)

Internally, we have been working on build time improvements for a few 
months.  We are not looking at just the compiler, but the entire toolchain.

As I've posted in other threads, our main consumer of compilation time 
is the C++ front end.  Hands down.  It consistently takes between 70% 
and 80% of compilation time across the board.  Furthermore, this is 
independent of the optimization level.  The optimizers never take more 
than 10-15% of total compilation time, on average.

We are also looking at incremental linking and addressing performance 
problems in the build system itself.  But for the compiler, our focus is 
just the C++ front end.  We are trying to incorporate more caching into 
the system.  We already have ccache-style caches, so we are trying 
finer-grained approaches.

We tried using pre-compiled headers earlier this year, but it just 
didn't work.  PCH caches the transitive closure of the translation unit, 
so if just one file changes, you need to re-compile the whole TU again. 
It was also space-prohibitive (caching all the different versions required).

Currently, we are prototyping front end changes for caching tokens and 
parsing results.  We are not yet up to the point where we can tell 
whether it will work, though.

The one interesting result we found so far is that, in looking at the 
whole toolchain, it may be worth *slowing* down some components to 
speed up the whole process.  In particular, we have found that using -Os 
as our default build setting actually decreases build time by 15% to 
20%.  The compiler spends a bit more time optimizing (almost 
unnoticeable in general), but the resulting smaller objects and binaries 
more than compensate for it.  It speeds up linking, I/O, transmission 
times, etc.

Additionally, the very worst offender in terms of compile time is -g. 
The size of debugging information is such that I/O and communication 
times increase significantly.


Diego.
Mark Mitchell Aug. 9, 2010, 3:33 p.m. UTC | #10
Diego Novillo wrote:

> Additionally, the very worst offender in terms of compile time is -g.
> The size of debugging information is such that I/O and communication
> times increase significantly.

And, in the case of C++, this is especially true, since you end up with
information about the same classes in multiple places.
Toon Moene Aug. 9, 2010, 5:07 p.m. UTC | #11
Diego Novillo wrote:

> On 10-08-09 10:54 , Toon Moene wrote:
> 
>> Most of these "compiles" are -O0 -g compiles, for obvious reasons (why
>> spend time on optimization when you don't even know the code is 
>> correct ?)
> 
> Internally, we have been working on build time improvements for a few 
> months.  We are not looking at just the compiler, but the entire toolchain.
> 
> As I've posted in other threads, our main consumer of compilation time 
> is the C++ front end.  Hands down.  It consistently takes between 70% 
> and 80% of compilation time across the board.  Furthermore, this is 
> independent of the optimization level.  The optimizers never take more 
> than 10-15% of total compilation time, on average.

As I pointed out in my 2007 GCC Summit talk, the Fortran Front End 
*already* (i.e., before anyone approved it) performs optimization on the 
Front End internal representation of a Fortran program / subroutine.

Is this also true for C++ ?  In that case it might be useful to curb 
Front End optimizations when -O0 is given ...

Or is there a reason why the C++ Front End has to do so much work (even 
when not bothering with language specific optimizations) ?
Diego Novillo Aug. 9, 2010, 5:14 p.m. UTC | #12
On 10-08-09 13:07 , Toon Moene wrote:

> Is this also true for C++ ? In that case it might be useful to curb
> Front End optimizations when -O0 is given ...

Not really, the amount of optimization is quite minimal to non-existent.

Much of the slowness is due to the inherent nature of C++ parsing. 
There is some performance to be gained by tweaking the various data 
structures and algorithms, but no order-of-magnitude opportunities seem 
to exist.


Diego.
Joseph Myers Aug. 9, 2010, 5:19 p.m. UTC | #13
On Mon, 9 Aug 2010, Diego Novillo wrote:

> Additionally, the very worst offender in terms of compile time is -g. The size
> of debugging information is such that I/O and communication times increase
> significantly.

If communication between the compiler and assembler is an important part 
of the cost there, it's possible that a binary interface between them as 
suggested by Ian at <http://www.airs.com/blog/archives/268> would help.  
I would imagine it should be possible to get the assembler to accept some 
form of mixed text/binary input so you could just transmit debug info that 
way and transition to a more efficient interface incrementally (assembler 
input goes through a rather complicated sequence of preprocessing / 
processing steps, but cleaning them up to work with such input should be 
possible).
Toon Moene Aug. 9, 2010, 5:28 p.m. UTC | #14
Diego Novillo wrote:

> On 10-08-09 13:07 , Toon Moene wrote:
> 
>> Is this also true for C++ ? In that case it might be useful to curb
>> Front End optimizations when -O0 is given ...
> 
> Not really, the amount of optimization is quite minimal to non-existent.
> 
> Much of the slowness is due to the inherent nature of C++ parsing. There 
> is some performance to be gained by tweaking the various data structures 
> and algorithms, but no order-of-magnitude opportunities seem to exist.

Perhaps Chris can add something to this discussion - after all, LLVM is 
written mostly in C++, no ?

Certainly, that must have provided him (and his team) with boatloads of 
performance data ....
Steven Bosscher Aug. 9, 2010, 5:29 p.m. UTC | #15
On Mon, Aug 9, 2010 at 5:26 PM, Diego Novillo <dnovillo@google.com> wrote:
> Additionally, the very worst offender in terms of compile time is -g. The
> size of debugging information is such that I/O and communication times
> increase significantly.

I assume you already made -pipe the default, and verified that the
piping to the assembler works properly?

It'd be interesting to know if / how much this helps...

Ciao!
Steven
Diego Novillo Aug. 9, 2010, 5:34 p.m. UTC | #16
On 10-08-09 13:29 , Steven Bosscher wrote:
> On Mon, Aug 9, 2010 at 5:26 PM, Diego Novillo<dnovillo@google.com>  wrote:
>> Additionally, the very worst offender in terms of compile time is -g. The
>> size of debugging information is such that I/O and communication times
>> increase significantly.
>
> I assume you already made -pipe the default, and verified that the
> piping to the assembler works properly?

Yes.  Cary (CC'd) can provide more details, but the core of the issue is 
the massive size of the debug info.  This causes machines to run out of 
memory, increases transmission times, etc.  Builds already occur in 
tmpfs, so I/O is not an issue.  Transmission costs are, however.


Diego.
Diego Novillo Aug. 9, 2010, 5:35 p.m. UTC | #17
On 10-08-09 13:19 , Joseph S. Myers wrote:
> On Mon, 9 Aug 2010, Diego Novillo wrote:
>
>> Additionally, the very worst offender in terms of compile time is -g. The size
>> of debugging information is such that I/O and communication times increase
>> significantly.
>
> If communication between the compiler and assembler is an important part
> of the cost there, it's possible that a binary interface between them as
> suggested by Ian at<http://www.airs.com/blog/archives/268>  would help.

Well, builds are already done in tmpfs, and we have not found assembly 
times to show up on the radar.  But as other pieces of the toolchain 
speed up, perhaps they will start being noticed.


Diego.
Ralf Wildenhues Aug. 9, 2010, 6:55 p.m. UTC | #18
* Diego Novillo wrote on Mon, Aug 09, 2010 at 05:26:00PM CEST:
> We are also looking at incremental linking and addressing
> performance problems in the build system itself.

Do you have numbers for configure, make, and libtool overhead in GCC?

There is some room for improvement left in using better shell features
in configure before its missing parallelism becomes the bottleneck in
GCC.  Parallelism is a bit harder.

Cheers,
Ralf
Diego Novillo Aug. 9, 2010, 6:59 p.m. UTC | #19
On 10-08-09 14:55 , Ralf Wildenhues wrote:
> * Diego Novillo wrote on Mon, Aug 09, 2010 at 05:26:00PM CEST:
>> We are also looking at incremental linking and addressing
>> performance problems in the build system itself.
>
> Do you have numbers for configure, make, and libtool overhead in GCC?

No, sorry.  All the measurements I have were taken within our build 
environment.  In previous threads, the profiles I've seen posted were 
very different, since most of the code in GCC is C, not C++.


Diego.
Mike Stump Aug. 9, 2010, 7:59 p.m. UTC | #20
On Aug 9, 2010, at 8:26 AM, Diego Novillo wrote:
> Additionally, the very worst offender in terms of compile time is -g. The size of debugging information is such that I/O and communication times increase significantly.

Well, if one uses a technology to engineer out the possibility of creating/moving/copying/assembling that information...  Apple found it beneficial to leave it behind in the .o files and has the debugger go look in the .o files...  more can be done.

For example, the front-end could tap into a live database directly and avoid much of the cost.  Instead of writing out the same information 100 times for 100 translation units, the first one writes it and the next ones just punt to the first.  Of course, you'd have to be willing to sign up for the downside of this sort of scheme.  Another possibility would be to create the data very lazily (little of the debug information ever created is ever used)...
Cary Coutant Aug. 9, 2010, 11:06 p.m. UTC | #21
>>> Additionally, the very worst offender in terms of compile time is -g. The
>>> size of debugging information is such that I/O and communication times
>>> increase significantly.
>>
>> I assume you already made -pipe the default, and verified that the
>> piping to the assembler works properly?
>
> Yes.  Cary (CC'd) can provide more details, but the core of the issue is the
> massive size of the debug info.  This causes machines to run out of memory,
> increases transmission times, etc. Builds already occur in tmpfs, so I/O is
> not an issue.  Transmission costs are, however.

As Diego mentioned in a follow-up, we haven't found the cost of the
assembler intermediate to be a problem (at least not yet). What hurts
is the size of the object files themselves, with all of the duplicate
debug info that hasn't yet been eliminated by the linker (we're using
-gdwarf-4 and its ability to put debug type info into comdat
sections). Adding the --compress-debug-sections option to gas helped
quite a bit there.
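
For reference, the setup described above amounts to something like this
on the command line (a sketch; -gdwarf-4 and gas's
--compress-debug-sections are the options named in this thread, passed
through the gcc driver with the usual -Wa mechanism):

# DWARF-4 with type info in comdat sections; let gas compress the
# debug sections it writes out.
gcc -c -g -gdwarf-4 -Wa,--compress-debug-sections foo.c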

I'd rephrase Diego's last two sentences, however, as "I/O is the
issue, but with tmpfs, the I/O happens outside the compiler and
linker."

-cary
Chris Lattner Aug. 9, 2010, 11:13 p.m. UTC | #22
On Aug 9, 2010, at 10:28 AM, Toon Moene wrote:

> Diego Novillo wrote:
> 
>> On 10-08-09 13:07 , Toon Moene wrote:
>>> Is this also true for C++ ? In that case it might be useful to curb
>>> Front End optimizations when -O0 is given ...
>> Not really, the amount of optimization is quite minimal to non-existent.
>> Much of the slowness is due to the inherent nature of C++ parsing. There is some performance to be gained by tweaking the various data structures and algorithms, but no order-of-magnitude opportunities seem to exist.
> 
> Perhaps Chris can add something to this discussion - after all, LLVM is written mostly in C++, no ?
> 
> Certainly, that must have provided him (and his team) with boatloads of performance data ....

I'm not sure what you mean here.  The single biggest win I've got in my personal development was switching from llvm-g++ to clang++.  It is substantially faster, uses much less memory and has better QoI than G++.  I assume that's not the option that you're suggesting though. :-)


If you want to speed up GCC, I don't have any particular special sauce.  I'd recommend the obvious approach:

1. Decide what sort of builds you care about.  Different scenarios require completely different approaches to improve:
  a. "Few core" laptops, "many core" workstations, massive distributed builds
  b. -O0 -g or -O3 builds.
  c. Memory constrained or memory unconstrained
  d. C, ObjC, C++, other?

2. Measure and evaluate.  You can see some of the (really old by now) measurements that we did for Clang here:
  http://clang.llvm.org/performance.html

3. Act on something, depending on what you think is important and what deficiency is most impactful to that scenario.  For example, we've found and tackled:
  a. Memory use.  On memory constrained systems (e.g. Apple just started shipping shiny new 24-thread machines that default to 6G of ram), this is the single biggest thing you can do to speed up builds.

  b. Debug info: As others have pointed out, in the ELF world, this is a huge sink for link times.  Apple defined this away long ago by changing the debug info model.

  c. PCH: For Objective-C and C++ apps that were built for it, PCH is an amazing win.  If you care about these use cases, it might be worthwhile to reevaluate GCC's PCH model, it "lacks optimality".

  d. Integrated assembler: For C apps, we've got a 10-20% speedup at -O0 -g.  The rest of GCC is probably not fast enough yet for this to start to matter much. See http://blog.llvm.org/2010/04/intro-to-llvm-mc-project.html

  e. General speedups: Clang's preprocessor is roughly 2x faster than GCC's and the frontend is generally much faster.  For example, it uses hash tables instead of lists where appropriate, so it doesn't get N^2 cases in silly situations as often.  I don't know what else GCC is doing wrong; I haven't looked at its frontends much.

  f. Optimizer/backend problems.  LLVM was designed to not have many of the issues GCC has suffered from, but we've made substantial improvements to the memory use of debug info (within the compiler) for example.   GCC seems to continue suffering from a host of historic issues with the RTL backend, and (my impression is) the GIMPLE optimizers are starting to slow down substantially as new stuff gets bolted in without a larger architectural view of the problem.

I'm also interested in looking at more aggressive techniques going forward, such as using a compiler server approach to get sharing across invocations of the compiler etc.  This would speed up apps using PCH in particular.

At root, if you're interested in speeding up GCC, you need to decide what's important and stop the 1% performance regressions.  It may not be obvious at first, but a continuous series of 1% slow-downs is actually an exponential regression.

-Chris
Cary Coutant Aug. 9, 2010, 11:23 p.m. UTC | #23
> Well, if one uses a technology to engineer out the possibility of
> creating/moving/copying/assembling that information...  Apple found it
> beneficial to leave it behind in the .o files and has the debugger go look in
> the .o files...  more can be done.
>
> For example, the front-end could tap into a live database directly and avoid
> much of the cost.  Instead of writing out the same information 100 times for
> 100 translation units, the first one writes it and the next ones just punt to the first.
>  Of course, you'd have to be willing to sign up for the downside of this sort
> of scheme.  Another possibility would be to create the data very lazily
> (little of the debug information ever created is ever used)...

Leaving the debug info in the original .o files is a possibility, but
it has some serious drawbacks. Keeping the .o files around while
you're debugging doesn't fit every workflow, and if you've got
thousands of .o files, gdb performance is going to suffer.

With the DWARF-4 feature of type information in comdat sections, I'm
hoping to achieve a hybrid solution, where the non-type debug
information (which typically contains lots of relocatable content) is
processed normally and gets copied into the linker output, while the
type information (which typically requires little or no relocation)
can be served by that live database you mentioned. With DWARF-4 today,
for each type that the compiler generates debug info for, it forms a
signature and builds a comdat section with that signature as the key.
With a repository, the compiler would offer that signature to the
repository. Nine times out of ten, the repository would already have
the type info; when it doesn't, the compiler could store the type info
there instead of writing it into the object file. The debugger would
then just have one additional place to look when trying to resolve a
reference to a type signature (DW_FORM_ref_sig8).

-cary
Laurynas Biveinis Aug. 10, 2010, 2:49 a.m. UTC | #24
2010/8/9 Steven Bosscher <stevenb.gcc@gmail.com>:
> This is interesting. Years ago (7 years?) Zack Weinberg suggested that
> GCC should move RTL back onto obstacks. The overhead should be
> relatively small compared to keeping entire functions-as-trees in
> memory. This is even more true today, now we keep entire translation
> units (and more) in memory as GIMPLE (with SSA). Memory spent on RTL
> is marginal compared to that.
>
> I passed this idea to Laurynas a year ago
> (http://gcc.gnu.org/ml/gcc/2009-08/msg00386.html). I don't know if he
> played with the idea or not.

IMHO it's a very good idea and it is now at the top of my GCC TODO
list. I don't know when I'll get around to doing this (busy in the final
months of grad school), or whether somebody will beat me to it.
Anyway, I also made a note of this patch :)
Chiheng Xu Aug. 10, 2010, 5:57 a.m. UTC | #25
On Tue, Aug 10, 2010 at 1:19 AM, Joseph S. Myers
<joseph@codesourcery.com> wrote:
> On Mon, 9 Aug 2010, Diego Novillo wrote:
>
>> Additionally, the very worst offender in terms of compile time is -g. The size
>> of debugging information is such that I/O and communication times increase
>> significantly.
>
> If communication between the compiler and assembler is an important part
> of the cost there, it's possible that a binary interface between them as
> suggested by Ian at <http://www.airs.com/blog/archives/268> would help.
> I would imagine it should be possible to get the assembler to accept some
> form of mixed text/binary input so you could just transmit debug info that
> way and transition to a more efficient interface incrementally (assembler
> input goes through a rather complicated sequence of preprocessing /
> processing steps, but cleaning them up to work with such input should be
> possible).
>
Ian Lance Taylor wrote:
"What does make sense is using a structured data format, rather than
text, to communicate between the compiler and the assembler. In gcc’s
terms, the compiler should generate insn patterns with associated
operands. The assembler should piece those together into its own
internal data structures. In fact, of course, ideally gcc and the
assembler would use the same internal data structure. It would be
interesting to design such a structure so that it could be transmitted
in a file, or over a pipe, or in shared memory."


Using shared memory is by far the most efficient way to transmit large
amounts of data between processes.  Its I/O and communication cost is
roughly zero if you have enough physical memory.

Using a temp file or a pipe is less efficient.  This is because syscalls
like read() and write() do a large amount of user-space-to-kernel-space
and kernel-space-to-user-space data copying.  Using a pipe consumes less
memory, but because the pipe's ring buffer in the Linux kernel is 4k
bytes (isn't it?), using a pipe to transmit large data will also consume
a large amount of CPU time scheduling processes, besides the cost of
read() and write().


Using a structured data format, rather than text, to communicate
between the compiler and the assembler may require big changes to the
current compiler architecture, rendering the compiler even harder to
maintain.  For a given machine instruction, its gcc representation is
simple RTL, but its assembly representation is complex and varying.
And, after all, you must provide a text "dump" of the assembly for
debugging purposes.  Separation of compiler and assembler conforms to
modern software engineering practice.
Chiheng Xu Aug. 10, 2010, 6:20 a.m. UTC | #26
On Tue, Aug 10, 2010 at 1:57 PM, Chiheng Xu <chiheng.xu@gmail.com> wrote:
>
> Using shared memory is by far the most efficient way to transmit large
> amounts of data between processes.  Its I/O and communication cost is
> roughly zero if you have enough physical memory.

I mean using a memory-mapped temp file to transmit the data.  This is
essentially the same as shared memory, except that it does not consume
swap space.
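
As a concrete illustration of this approach (a POSIX sketch, not from
any GCC code; the file name convention shared by the two processes is
assumed):

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Producer side: transmit LEN bytes from BUF through a memory-mapped
   temp file at PATH, which the consumer mmaps in turn.  Returns 0 on
   success, -1 on failure.  */
static int
send_via_mmap (const char *path, const void *buf, size_t len)
{
  void *p;
  int fd = open (path, O_RDWR | O_CREAT | O_TRUNC, 0600);

  if (fd < 0)
    return -1;
  if (ftruncate (fd, len) < 0)
    {
      close (fd);
      return -1;
    }
  p = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close (fd);			/* The mapping stays valid after close.  */
  if (p == MAP_FAILED)
    return -1;
  /* The data goes straight into shared pages; no read()/write()
     copying between user and kernel space.  */
  memcpy (p, buf, len);
  return munmap (p, len);
}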
Toon Moene Aug. 10, 2010, 12:30 p.m. UTC | #27
Chris Lattner wrote:

> On Aug 9, 2010, at 10:28 AM, Toon Moene wrote:
> 
>> Diego Novillo wrote:
>>
>>> On 10-08-09 13:07 , Toon Moene wrote:
>>>> Is this also true for C++ ? In that case it might be useful to curb
>>>> Front End optimizations when -O0 is given ...
>>> Not really, the amount of optimization is quite minimal to non-existent.
>>> Much of the slowness is due to the inherent nature of C++ parsing. There is some performance to be gained by tweaking the various data structures and algorithms, but no order-of-magnitude opportunities seem to exist.
>> Perhaps Chris can add something to this discussion - after all, LLVM is written mostly in C++, no ?
>>
>> Certainly, that must have provided him (and his team) with boatloads of performance data ....
> 
> I'm not sure what you mean here.  The single biggest win I've got in my personal development was
> switching from llvm-g++ to clang++.  It is substantially faster, uses much less memory and
> has better QoI than G++.  I assume that's not the option that you're suggesting though. :-)

Well, I just hoped for a list of things where clang++ was faster than 
llvm-g++ and why, but the issues you addressed are probably just as useful ...

Thanks,

[ It would probably also help if we started to build GCC with C++ by
   default, although I imagine that the code isn't C++-like enough
   to guide us through all the issues ]
Chris Lattner Aug. 10, 2010, 3:35 p.m. UTC | #28
On Aug 10, 2010, at 5:30 AM, Toon Moene wrote:

> Chris Lattner wrote:
> 
>> On Aug 9, 2010, at 10:28 AM, Toon Moene wrote:
>>> Diego Novillo wrote:
>>> 
>>>> On 10-08-09 13:07 , Toon Moene wrote:
>>>>> Is this also true for C++ ? In that case it might be useful to curb
>>>>> Front End optimizations when -O0 is given ...
>>>> Not really, the amount of optimization is quite minimal to non-existent.
>>>> Much of the slowness is due to the inherent nature of C++ parsing. There is some performance to be gained by tweaking the various data structures and algorithms, but no order-of-magnitude opportunities seem to exist.
>>> Perhaps Chris can add something to this discussion - after all, LLVM is written mostly in C++, no ?
>>> 
>>> Certainly, that must have provided him (and his team) with boatloads of performance data ....
>> I'm not sure what you mean here.  The single biggest win I've got in my personal development was
>> switching from llvm-g++ to clang++.  It is substantially faster, uses much less memory and
>> has better QoI than G++.  I assume that's not the option that you're suggesting though. :-)
> 
> Well, I just hoped for a list of things where clang++ was faster than llvm-g++ and why, but the issues you addressed are probably just as useful ...

Ah ok.  We haven't started performance tuning clang++ yet.  Only C/ObjC have seen a focus on compile time so far.

-Chris

Patch

Index: fwprop.c
===================================================================
--- fwprop.c	(revision 162821)
+++ fwprop.c	(working copy)
@@ -116,6 +116,8 @@  along with GCC; see the file COPYING3.  
    all that is needed by fwprop.  */
 
 
+static char *fwprop_firstobj;
+
 static int num_changes;
 
 DEF_VEC_P(df_ref);
@@ -1002,6 +1004,7 @@  try_fwprop_subst (df_ref use, rtx *loc, 
     {
       confirm_change_group ();
       num_changes++;
+      fwprop_firstobj = XOBNEWVAR (rtl_obstack, char, 0);
 
       df_ref_remove (use);
       if (!CONSTANT_P (new_rtx))
@@ -1023,6 +1026,7 @@  try_fwprop_subst (df_ref use, rtx *loc, 
 	    fprintf (dump_file, " Setting REG_EQUAL note\n");
 
 	  set_unique_reg_note (insn, REG_EQUAL, copy_rtx (new_rtx));
+	  fwprop_firstobj = XOBNEWVAR (rtl_obstack, char, 0);
 
 	  /* ??? Is this still necessary if we add the note through
 	     set_unique_reg_note?  */
@@ -1035,6 +1039,7 @@  try_fwprop_subst (df_ref use, rtx *loc, 
 			 type, DF_REF_IN_NOTE);
 	    }
 	}
+      obstack_free (rtl_obstack, fwprop_firstobj);
     }
 
   return ok;
@@ -1212,8 +1217,11 @@  forward_propagate_asm (df_ref use, rtx d
     }
 
   if (num_changes_pending () == 0 || !apply_change_group ())
-    return false;
-
+    {
+      obstack_free (rtl_obstack, fwprop_firstobj);
+      return false;
+    }
+  fwprop_firstobj = XOBNEWVAR (rtl_obstack, char, 0);
   num_changes++;
   return true;
 }
@@ -1392,6 +1400,7 @@  fwprop_init (void)
 
   build_single_def_use_links ();
   df_set_flags (DF_DEFER_INSN_RESCAN);
+  fwprop_firstobj = XOBNEWVAR (rtl_obstack, char, 0);
 }
 
 static void
Index: cgraph.h
===================================================================
--- cgraph.h	(revision 162821)
+++ cgraph.h	(working copy)
@@ -871,7 +871,7 @@  varpool_node_set_size (varpool_node_set 
 
 struct GTY(()) constant_descriptor_tree {
   /* A MEM for the constant.  */
-  rtx rtl;
+  rtx GTY((skip)) rtl;
 
   /* The value of the constant.  */
   tree value;
Index: libfuncs.h
===================================================================
--- libfuncs.h	(revision 162821)
+++ libfuncs.h	(working copy)
@@ -50,23 +50,23 @@  enum libfunc_index
 /* Information about an optab-related libfunc.  We use the same hashtable
    for normal optabs and conversion optabs.  In the first case mode2
    is unused.  */
-struct GTY(()) libfunc_entry {
+struct libfunc_entry {
   size_t optab;
   enum machine_mode mode1, mode2;
   rtx libfunc;
 };
 
 /* Target-dependent globals.  */
-struct GTY(()) target_libfuncs {
+struct target_libfuncs {
   /* SYMBOL_REF rtx's for the library functions that are called
      implicitly and not via optabs.  */
   rtx x_libfunc_table[LTI_MAX];
 
   /* Hash table used to convert declarations into nodes.  */
-  htab_t GTY((param_is (struct libfunc_entry))) x_libfunc_hash;
+  htab_t x_libfunc_hash;
 };
 
-extern GTY(()) struct target_libfuncs default_target_libfuncs;
+extern struct target_libfuncs default_target_libfuncs;
 #if SWITCHABLE_TARGET
 extern struct target_libfuncs *this_target_libfuncs;
 #else
Index: optabs.c
===================================================================
--- optabs.c	(revision 162821)
+++ optabs.c	(working copy)
@@ -6075,7 +6075,7 @@  set_optab_libfunc (optab optable, enum m
     val = 0;
   slot = (struct libfunc_entry **) htab_find_slot (libfunc_hash, &e, INSERT);
   if (*slot == NULL)
-    *slot = ggc_alloc_libfunc_entry ();
+    *slot = XNEW (struct libfunc_entry);
   (*slot)->optab = (size_t) (optable - &optab_table[0]);
   (*slot)->mode1 = mode;
   (*slot)->mode2 = VOIDmode;
@@ -6102,7 +6102,7 @@  set_conv_libfunc (convert_optab optable,
     val = 0;
   slot = (struct libfunc_entry **) htab_find_slot (libfunc_hash, &e, INSERT);
   if (*slot == NULL)
-    *slot = ggc_alloc_libfunc_entry ();
+    *slot = XNEW (struct libfunc_entry);
   (*slot)->optab = (size_t) (optable - &convert_optab_table[0]);
   (*slot)->mode1 = tmode;
   (*slot)->mode2 = fmode;
@@ -6123,7 +6123,7 @@  init_optabs (void)
       init_insn_codes ();
     }
   else
-    libfunc_hash = htab_create_ggc (10, hash_libfunc, eq_libfunc, NULL);
+    libfunc_hash = htab_create (10, hash_libfunc, eq_libfunc, NULL);
 
   init_optab (add_optab, PLUS);
   init_optabv (addv_optab, PLUS);
Index: gengenrtl.c
===================================================================
--- gengenrtl.c	(revision 162821)
+++ gengenrtl.c	(working copy)
@@ -125,6 +125,7 @@  static int
 special_rtx (int idx)
 {
   return (strcmp (defs[idx].enumname, "CONST_INT") == 0
+	  || strcmp (defs[idx].enumname, "CONST") == 0
 	  || strcmp (defs[idx].enumname, "REG") == 0
 	  || strcmp (defs[idx].enumname, "SUBREG") == 0
 	  || strcmp (defs[idx].enumname, "MEM") == 0
Index: tree.h
===================================================================
--- tree.h	(revision 162821)
+++ tree.h	(working copy)
@@ -2816,7 +2816,7 @@  extern void decl_value_expr_insert (tree
 
 struct GTY(()) tree_decl_with_rtl {
   struct tree_decl_common common;
-  rtx rtl;
+  rtx GTY((skip)) rtl;
 };
 
 /* In a FIELD_DECL, this is the field position, counting in bytes, of the
@@ -2937,7 +2937,7 @@  struct GTY(()) tree_const_decl {
 
 struct GTY(()) tree_parm_decl {
   struct tree_decl_with_rtl common;
-  rtx incoming_rtl;
+  rtx GTY((skip)) incoming_rtl;
   struct var_ann_d *ann;
 };
 
Index: reload.h
===================================================================
--- reload.h	(revision 162821)
+++ reload.h	(working copy)
@@ -206,7 +206,7 @@  extern struct target_reload *this_target
 #define caller_save_initialized_p \
   (this_target_reload->x_caller_save_initialized_p)
 
-extern GTY (()) VEC(rtx,gc) *reg_equiv_memory_loc_vec;
+extern VEC(rtx,gc) *reg_equiv_memory_loc_vec;
 extern rtx *reg_equiv_constant;
 extern rtx *reg_equiv_invariant;
 extern rtx *reg_equiv_memory_loc;
@@ -216,10 +216,10 @@  extern rtx *reg_equiv_alt_mem_list;
 
 /* Element N is the list of insns that initialized reg N from its equivalent
    constant or memory slot.  */
-extern GTY((length("reg_equiv_init_size"))) rtx *reg_equiv_init;
+extern rtx *reg_equiv_init;
 
 /* The size of the previous array, for GC purposes.  */
-extern GTY(()) int reg_equiv_init_size;
+extern int reg_equiv_init_size;
 
 /* All the "earlyclobber" operands of the current insn
    are recorded here.  */
Index: final.c
===================================================================
--- final.c	(revision 162821)
+++ final.c	(working copy)
@@ -4467,6 +4467,8 @@  rest_of_clean_state (void)
   /* We're done with this function.  Free up memory if we can.  */
   free_after_parsing (cfun);
   free_after_compilation (cfun);
+  free_function_rtl ();
+  discard_rtx_lists ();
   return 0;
 }
 
Index: builtins.c
===================================================================
--- builtins.c	(revision 162821)
+++ builtins.c	(working copy)
@@ -1885,11 +1885,12 @@  expand_errno_check (tree exp, rtx target
   /* If this built-in doesn't throw an exception, set errno directly.  */
   if (TREE_NOTHROW (TREE_OPERAND (CALL_EXPR_FN (exp), 0)))
     {
+      rtx errno_rtx;
 #ifdef GEN_ERRNO_RTX
-      rtx errno_rtx = GEN_ERRNO_RTX;
+      errno_rtx = GEN_ERRNO_RTX;
 #else
-      rtx errno_rtx
-	  = gen_rtx_MEM (word_mode, gen_rtx_SYMBOL_REF (Pmode, "errno"));
+      errno_rtx
+	= gen_rtx_MEM (word_mode, gen_rtx_SYMBOL_REF (Pmode, "errno"));
 #endif
       emit_move_insn (errno_rtx, GEN_INT (TARGET_EDOM));
       emit_label (lab);
Index: lists.c
===================================================================
--- lists.c	(revision 162821)
+++ lists.c	(working copy)
@@ -26,17 +26,16 @@  along with GCC; see the file COPYING3.  
 #include "diagnostic-core.h"
 #include "toplev.h"
 #include "rtl.h"
-#include "ggc.h"
 
 static void free_list (rtx *, rtx *);
 
 /* Functions for maintaining cache-able lists of EXPR_LIST and INSN_LISTs.  */
 
 /* An INSN_LIST containing all INSN_LISTs allocated but currently unused.  */
-static GTY ((deletable)) rtx unused_insn_list;
+static rtx unused_insn_list;
 
 /* An EXPR_LIST containing all EXPR_LISTs allocated but currently unused.  */
-static GTY ((deletable)) rtx unused_expr_list;
+static rtx unused_expr_list;
 
 /* This function will free an entire list of either EXPR_LIST, INSN_LIST
    or DEPS_LIST nodes.  This is to be used only on lists that consist
@@ -216,4 +215,9 @@  remove_free_EXPR_LIST_node (rtx *listp)
   return elem;
 }
 
-#include "gt-lists.h"
+void
+discard_rtx_lists (void)
+{
+  unused_insn_list = unused_expr_list = 0;
+}
+
Index: gensupport.c
===================================================================
--- gensupport.c	(revision 162821)
+++ gensupport.c	(working copy)
@@ -35,9 +35,6 @@  int target_flags;
 
 int insn_elision = 1;
 
-static struct obstack obstack;
-struct obstack *rtl_obstack = &obstack;
-
 static int sequence_num;
 
 static int predicable_default;
@@ -788,10 +785,10 @@  bool
 init_rtx_reader_args_cb (int argc, char **argv,
 			 bool (*parse_opt) (const char *))
 {
+  init_rtl ();
   /* Prepare to read input.  */
   condition_table = htab_create (500, hash_c_test, cmp_c_test, NULL);
   init_predicate_table ();
-  obstack_init (rtl_obstack);
   sequence_num = 0;
 
   read_md_files (argc, argv, parse_opt, rtx_handle_directive);
Index: toplev.c
===================================================================
--- toplev.c	(revision 162821)
+++ toplev.c	(working copy)
@@ -2080,6 +2080,7 @@  backend_init_target (void)
 static void
 backend_init (void)
 {
+  init_rtl ();
   init_emit_once ();
 
   init_rtlanal ();
@@ -2090,6 +2091,7 @@  backend_init (void)
   /* Initialize the target-specific back end pieces.  */
   ira_init_once ();
   backend_init_target ();
+  preserve_rtl ();
 }
 
 /* Initialize excess precision settings.  */
Index: dojump.c
===================================================================
--- dojump.c	(revision 162821)
+++ dojump.c	(working copy)
@@ -33,7 +33,6 @@  along with GCC; see the file COPYING3.  
 #include "expr.h"
 #include "optabs.h"
 #include "langhooks.h"
-#include "ggc.h"
 #include "basic-block.h"
 #include "output.h"
 
@@ -132,9 +131,9 @@  jumpif_1 (enum tree_code code, tree op0,
 
 /* Used internally by prefer_and_bit_test.  */
 
-static GTY(()) rtx and_reg;
-static GTY(()) rtx and_test;
-static GTY(()) rtx shift_test;
+static rtx and_reg;
+static rtx and_test;
+static rtx shift_test;
 
 /* Compare the relative costs of "(X & (1 << BITNUM))" and "(X >> BITNUM) & 1",
    where X is an arbitrary register of mode MODE.  Return true if the former
@@ -145,12 +144,14 @@  prefer_and_bit_test (enum machine_mode m
 {
   if (and_test == 0)
     {
+      rtl_on_permanent_obstack ();
       /* Set up rtxes for the two variations.  Use NULL as a placeholder
 	 for the BITNUM-based constants.  */
       and_reg = gen_rtx_REG (mode, FIRST_PSEUDO_REGISTER);
       and_test = gen_rtx_AND (mode, and_reg, NULL);
       shift_test = gen_rtx_AND (mode, gen_rtx_ASHIFTRT (mode, and_reg, NULL),
 				const1_rtx);
+      rtl_pop_obstack ();
     }
   else
     {
@@ -1185,5 +1186,3 @@  do_compare_and_jump (tree treeop0, tree 
                             ? expr_size (treeop0) : NULL_RTX),
 			   if_false_label, if_true_label, prob);
 }
-
-#include "gt-dojump.h"
Index: caller-save.c
===================================================================
--- caller-save.c	(revision 162821)
+++ caller-save.c	(working copy)
@@ -39,7 +39,6 @@  along with GCC; see the file COPYING3.  
 #include "tm_p.h"
 #include "addresses.h"
 #include "output.h"
-#include "ggc.h"
 
 #define MOVE_MAX_WORDS (MOVE_MAX / UNITS_PER_WORD)
 
@@ -101,12 +100,9 @@  static void add_stored_regs (rtx, const_
 
 
 
-static GTY(()) rtx savepat;
-static GTY(()) rtx restpat;
-static GTY(()) rtx test_reg;
-static GTY(()) rtx test_mem;
-static GTY(()) rtx saveinsn;
-static GTY(()) rtx restinsn;
+static rtx savepat, saveinsn;
+static rtx restpat, restinsn;
+static rtx test_reg, test_mem;
 
 /* Return the INSN_CODE used to save register REG in mode MODE.  */
 static int
@@ -190,7 +186,10 @@  init_caller_save (void)
 
   caller_save_initialized_p = true;
 
+  rtl_on_permanent_obstack ();
+
   CLEAR_HARD_REG_SET (no_caller_save_reg_set);
+  
   /* First find all the registers that we need to deal with and all
      the modes that they can have.  If we can't find a mode to use,
      we can't have the register live over calls.  */
@@ -278,6 +277,7 @@  init_caller_save (void)
 		SET_HARD_REG_BIT (no_caller_save_reg_set, i);
 	    }
 	}
+  rtl_pop_obstack ();
 }
 
 
@@ -1405,4 +1405,3 @@  insert_one_insn (struct insn_chain *chai
   INSN_CODE (new_chain->insn) = code;
   return new_chain;
 }
-#include "gt-caller-save.h"
Index: dwarf2out.c
===================================================================
--- dwarf2out.c	(revision 162821)
+++ dwarf2out.c	(working copy)
@@ -199,10 +199,6 @@  dwarf2out_do_cfi_asm (void)
 #define PTR_SIZE (POINTER_SIZE / BITS_PER_UNIT)
 #endif
 
-/* Array of RTXes referenced by the debugging information, which therefore
-   must be kept around forever.  */
-static GTY(()) VEC(rtx,gc) *used_rtx_array;
-
 /* A pointer to the base of a list of incomplete types which might be
    completed at some later time.  incomplete_types_list needs to be a
    VEC(tree,gc) because we want to tell the garbage collector about
@@ -233,7 +229,7 @@  static GTY(()) section *debug_frame_sect
 
 /* Personality decl of current unit.  Used only when assembler does not support
    personality CFI.  */
-static GTY(()) rtx current_unit_personality;
+static rtx current_unit_personality;
 
 /* How to start an assembler comment.  */
 #ifndef ASM_COMMENT_START
@@ -1662,17 +1658,17 @@  dwarf2out_notice_stack_adjust (rtx insn,
    of the prologue or (b) the register is clobbered.  This clusters
    register saves so that there are fewer pc advances.  */
 
-struct GTY(()) queued_reg_save {
+struct queued_reg_save {
   struct queued_reg_save *next;
   rtx reg;
   HOST_WIDE_INT cfa_offset;
   rtx saved_reg;
 };
 
-static GTY(()) struct queued_reg_save *queued_reg_saves;
+static struct queued_reg_save *queued_reg_saves;
 
 /* The caller's ORIG_REG is saved in SAVED_IN_REG.  */
-struct GTY(()) reg_saved_in_data {
+struct reg_saved_in_data {
   rtx orig_reg;
   rtx saved_in_reg;
 };
@@ -1681,8 +1677,8 @@  struct GTY(()) reg_saved_in_data {
    The list intentionally has a small maximum capacity of 4; if your
    port needs more than that, you might consider implementing a
    more efficient data structure.  */
-static GTY(()) struct reg_saved_in_data regs_saved_in_regs[4];
-static GTY(()) size_t num_regs_saved_in_regs;
+static struct reg_saved_in_data regs_saved_in_regs[4];
+static size_t num_regs_saved_in_regs;
 
 #if defined (DWARF2_DEBUGGING_INFO) || defined (DWARF2_UNWIND_INFO)
 static const char *last_reg_save_label;
@@ -1704,7 +1700,7 @@  queue_reg_save (const char *label, rtx r
 
   if (q == NULL)
     {
-      q = ggc_alloc_queued_reg_save ();
+      q = XNEW (struct queued_reg_save);
       q->next = queued_reg_saves;
       queued_reg_saves = q;
     }
@@ -1721,13 +1717,14 @@  queue_reg_save (const char *label, rtx r
 static void
 flush_queued_reg_saves (void)
 {
-  struct queued_reg_save *q;
+  struct queued_reg_save *q, *next;
 
-  for (q = queued_reg_saves; q; q = q->next)
+  for (q = queued_reg_saves; q; q = next)
     {
       size_t i;
       unsigned int reg, sreg;
 
+      next = q->next;
       for (i = 0; i < num_regs_saved_in_regs; i++)
 	if (REGNO (regs_saved_in_regs[i].orig_reg) == REGNO (q->reg))
 	  break;
@@ -1748,6 +1745,8 @@  flush_queued_reg_saves (void)
       else
 	sreg = INVALID_REGNUM;
       reg_save (last_reg_save_label, reg, sreg, q->cfa_offset);
+
+      free (q);
     }
 
   queued_reg_saves = NULL;
@@ -4278,7 +4277,7 @@  typedef struct GTY(()) dw_val_struct {
   enum dw_val_class val_class;
   union dw_val_struct_union
     {
-      rtx GTY ((tag ("dw_val_class_addr"))) val_addr;
+      rtx GTY ((skip,tag ("dw_val_class_addr"))) val_addr;
       unsigned HOST_WIDE_INT GTY ((tag ("dw_val_class_offset"))) val_offset;
       dw_loc_list_ref GTY ((tag ("dw_val_class_loc_list"))) val_loc_list;
       dw_loc_descr_ref GTY ((tag ("dw_val_class_loc"))) val_loc;
@@ -5859,7 +5858,7 @@  struct GTY ((chain_next ("%h.next"))) va
      mode is 0 and first operand is a CONCAT with bitsize
      as first CONCAT operand and NOTE_INSN_VAR_LOCATION resp.
      NULL as second operand.  */
-  rtx GTY (()) loc;
+  rtx GTY ((skip)) loc;
   const char * GTY (()) label;
   struct var_loc_node * GTY (()) next;
 };
@@ -7378,9 +7377,12 @@  add_AT_addr (dw_die_ref die, enum dwarf_
 {
   dw_attr_node attr;
 
+  rtl_on_permanent_obstack ();
+  addr = copy_rtx (addr);
+  rtl_pop_obstack ();
   attr.dw_attr = attr_kind;
   attr.dw_attr_val.val_class = dw_val_class_addr;
-  attr.dw_attr_val.v.val_addr = addr;
+  attr.dw_attr_val.v.val_addr = permanent_copy_rtx (addr);
   add_dwarf_attr (die, &attr);
 }
 
@@ -13620,7 +13622,7 @@  mem_loc_descriptor (rtx rtl, enum machin
 	  temp = new_loc_descr (DWARF2_ADDR_SIZE == 4
 				? DW_OP_const4u : DW_OP_const8u, 0, 0);
 	  temp->dw_loc_oprnd1.val_class = dw_val_class_addr;
-	  temp->dw_loc_oprnd1.v.val_addr = rtl;
+	  temp->dw_loc_oprnd1.v.val_addr = permanent_copy_rtx (rtl);
 	  temp->dtprel = true;
 
 	  mem_loc_result = new_loc_descr (DW_OP_GNU_push_tls_address, 0, 0);
@@ -13633,10 +13635,12 @@  mem_loc_descriptor (rtx rtl, enum machin
 	break;
 
     symref:
+      rtl_on_permanent_obstack ();
+      rtl = copy_rtx (rtl);
+      rtl_pop_obstack ();
       mem_loc_result = new_loc_descr (DW_OP_addr, 0, 0);
       mem_loc_result->dw_loc_oprnd1.val_class = dw_val_class_addr;
-      mem_loc_result->dw_loc_oprnd1.v.val_addr = rtl;
-      VEC_safe_push (rtx, gc, used_rtx_array, rtl);
+      mem_loc_result->dw_loc_oprnd1.v.val_addr = permanent_copy_rtx (rtl);
       break;
 
     case CONCAT:
@@ -14442,9 +14446,11 @@  loc_descriptor (rtx rtl, enum machine_mo
 	{
 	  loc_result = new_loc_descr (DW_OP_addr, 0, 0);
 	  loc_result->dw_loc_oprnd1.val_class = dw_val_class_addr;
-	  loc_result->dw_loc_oprnd1.v.val_addr = rtl;
+	  loc_result->dw_loc_oprnd1.v.val_addr = permanent_copy_rtx (rtl);
 	  add_loc_descr (&loc_result, new_loc_descr (DW_OP_stack_value, 0, 0));
-	  VEC_safe_push (rtx, gc, used_rtx_array, rtl);
+	  rtl_on_permanent_obstack ();
+	  rtl = copy_rtx (rtl);
+	  rtl_pop_obstack ();
 	}
       break;
 
@@ -15134,7 +15140,7 @@  loc_list_from_tree (tree loc, int want_a
 
 	  ret = new_loc_descr (first_op, 0, 0);
 	  ret->dw_loc_oprnd1.val_class = dw_val_class_addr;
-	  ret->dw_loc_oprnd1.v.val_addr = rtl;
+	  ret->dw_loc_oprnd1.v.val_addr = permanent_copy_rtx (rtl);
 	  ret->dtprel = dtprel;
 
 	  ret1 = new_loc_descr (second_op, 0, 0);
@@ -15185,7 +15191,7 @@  loc_list_from_tree (tree loc, int want_a
 	  {
 	    ret = new_loc_descr (DW_OP_addr, 0, 0);
 	    ret->dw_loc_oprnd1.val_class = dw_val_class_addr;
-	    ret->dw_loc_oprnd1.v.val_addr = rtl;
+	    ret->dw_loc_oprnd1.v.val_addr = permanent_copy_rtx (rtl);
 	  }
 	else
 	  {
@@ -16111,10 +16117,12 @@  add_const_value_attribute (dw_die_ref di
 	rtl_addr:
 	  loc_result = new_loc_descr (DW_OP_addr, 0, 0);
 	  loc_result->dw_loc_oprnd1.val_class = dw_val_class_addr;
-	  loc_result->dw_loc_oprnd1.v.val_addr = rtl;
+	  loc_result->dw_loc_oprnd1.v.val_addr = permanent_copy_rtx (rtl);
 	  add_loc_descr (&loc_result, new_loc_descr (DW_OP_stack_value, 0, 0));
 	  add_AT_loc (die, DW_AT_location, loc_result);
-	  VEC_safe_push (rtx, gc, used_rtx_array, rtl);
+	  rtl_on_permanent_obstack ();
+	  rtl = copy_rtx (rtl);
+	  rtl_pop_obstack ();
 	  return true;
 	}
       return false;
@@ -17498,9 +17506,11 @@  add_name_and_src_coords_attributes (dw_d
      from the DECL_NAME name used in the source file.  */
   if (TREE_CODE (decl) == FUNCTION_DECL && TREE_ASM_WRITTEN (decl))
     {
-      add_AT_addr (die, DW_AT_VMS_rtnbeg_pd_address,
-		   XEXP (DECL_RTL (decl), 0));
-      VEC_safe_push (rtx, gc, used_rtx_array, XEXP (DECL_RTL (decl), 0));
+      rtx rtl = XEXP (DECL_RTL (decl), 0);
+      rtl_on_permanent_obstack ();
+      rtl = copy_rtx (rtl);
+      rtl_pop_obstack ();
+      add_AT_addr (die, DW_AT_VMS_rtnbeg_pd_address, rtl);
     }
 #endif
 }
@@ -19036,9 +19046,13 @@  gen_variable_die (tree decl, tree origin
 			  && loc->expr->dw_loc_next == NULL
 			  && GET_CODE (loc->expr->dw_loc_oprnd1.v.val_addr)
 			     == SYMBOL_REF)
-			loc->expr->dw_loc_oprnd1.v.val_addr
-			  = plus_constant (loc->expr->dw_loc_oprnd1.v.val_addr, off);
-			else
+			{
+			  rtx t = plus_constant (loc->expr->dw_loc_oprnd1.v.val_addr,
+						 off);
+			  t = permanent_copy_rtx (t);
+			  loc->expr->dw_loc_oprnd1.v.val_addr = t;
+			}
+		      else
 			  loc_list_plus_const (loc, off);
 		    }
 		  add_AT_location_description (var_die, DW_AT_location, loc);
@@ -19099,8 +19113,12 @@  gen_variable_die (tree decl, tree origin
 		  && loc->expr->dw_loc_opc == DW_OP_addr
 		  && loc->expr->dw_loc_next == NULL
 		  && GET_CODE (loc->expr->dw_loc_oprnd1.v.val_addr) == SYMBOL_REF)
-		loc->expr->dw_loc_oprnd1.v.val_addr
-		  = plus_constant (loc->expr->dw_loc_oprnd1.v.val_addr, off);
+		{
+		  rtx t = plus_constant (loc->expr->dw_loc_oprnd1.v.val_addr,
+					 off);
+		  t = permanent_copy_rtx (t);
+		  loc->expr->dw_loc_oprnd1.v.val_addr = t;
+		}
 	      else
 		loc_list_plus_const (loc, off);
 	    }
@@ -21582,8 +21600,6 @@  dwarf2out_init (const char *filename ATT
 
   incomplete_types = VEC_alloc (tree, gc, 64);
 
-  used_rtx_array = VEC_alloc (rtx, gc, 32);
-
   debug_info_section = get_section (DEBUG_INFO_SECTION,
 				    SECTION_DEBUG, NULL);
   debug_abbrev_section = get_section (DEBUG_ABBREV_SECTION,
@@ -22126,8 +22142,9 @@  resolve_one_addr (rtx *addr, void *data 
       rtl = lookup_constant_def (t);
       if (!rtl || !MEM_P (rtl))
 	return 1;
-      rtl = XEXP (rtl, 0);
-      VEC_safe_push (rtx, gc, used_rtx_array, rtl);
+      rtl_on_permanent_obstack ();
+      rtl = copy_rtx (XEXP (rtl, 0));
+      rtl_pop_obstack ();
       *addr = rtl;
       return 0;
     }
Index: tree-ssa-address.c
===================================================================
--- tree-ssa-address.c	(revision 162821)
+++ tree-ssa-address.c	(working copy)
@@ -43,7 +43,6 @@  along with GCC; see the file COPYING3.  
 #include "rtl.h"
 #include "recog.h"
 #include "expr.h"
-#include "ggc.h"
 #include "target.h"
 
 /* TODO -- handling of symbols (according to Richard Hendersons
@@ -73,22 +72,22 @@  along with GCC; see the file COPYING3.  
 /* A "template" for memory address, used to determine whether the address is
    valid for mode.  */
 
-typedef struct GTY (()) mem_addr_template {
+typedef struct mem_addr_template {
   rtx ref;			/* The template.  */
-  rtx * GTY ((skip)) step_p;	/* The point in template where the step should be
-				   filled in.  */
-  rtx * GTY ((skip)) off_p;	/* The point in template where the offset should
-				   be filled in.  */
+  rtx *step_p;	/* The point in template where the step should be
+		   filled in.  */
+  rtx *off_p;	/* The point in template where the offset should
+		   be filled in.  */
 } mem_addr_template;
 
 DEF_VEC_O (mem_addr_template);
-DEF_VEC_ALLOC_O (mem_addr_template, gc);
+DEF_VEC_ALLOC_O (mem_addr_template, heap);
 
 /* The templates.  Each of the low five bits of the index corresponds to one
    component of TARGET_MEM_REF being present, while the high bits identify
    the address space.  See TEMPL_IDX.  */
 
-static GTY(()) VEC (mem_addr_template, gc) *mem_addr_template_list;
+static VEC (mem_addr_template, heap) *mem_addr_template_list;
 
 #define TEMPL_IDX(AS, SYMBOL, BASE, INDEX, STEP, OFFSET) \
   (((int) (AS) << 5) \
@@ -115,6 +114,8 @@  gen_addr_rtx (enum machine_mode address_
   if (offset_p)
     *offset_p = NULL;
 
+  rtl_on_permanent_obstack ();
+
   if (index)
     {
       act_elem = index;
@@ -176,6 +177,8 @@  gen_addr_rtx (enum machine_mode address_
 
   if (!*addr)
     *addr = const0_rtx;
+
+  rtl_pop_obstack ();
 }
 
 /* Returns address for TARGET_MEM_REF with parameters given by ADDR
@@ -209,13 +212,14 @@  addr_for_mem_ref (struct mem_address *ad
 
       if (templ_index
 	  >= VEC_length (mem_addr_template, mem_addr_template_list))
-	VEC_safe_grow_cleared (mem_addr_template, gc, mem_addr_template_list,
+	VEC_safe_grow_cleared (mem_addr_template, heap, mem_addr_template_list,
 			       templ_index + 1);
 
       /* Reuse the templates for addresses, so that we do not waste memory.  */
       templ = VEC_index (mem_addr_template, mem_addr_template_list, templ_index);
       if (!templ->ref)
 	{
+	  rtl_on_permanent_obstack ();
 	  sym = (addr->symbol ?
 		 gen_rtx_SYMBOL_REF (address_mode, ggc_strdup ("test_symbol"))
 		 : NULL_RTX);
@@ -232,6 +236,7 @@  addr_for_mem_ref (struct mem_address *ad
 			&templ->ref,
 			&templ->step_p,
 			&templ->off_p);
+	  rtl_pop_obstack ();
 	}
 
       if (st)
@@ -921,4 +926,3 @@  dump_mem_address (FILE *file, struct mem
     }
 }
 
-#include "gt-tree-ssa-address.h"
Index: function.c
===================================================================
--- function.c	(revision 162821)
+++ function.c	(working copy)
@@ -124,10 +124,8 @@  struct machine_function * (*init_machine
 struct function *cfun = 0;
 
 /* These hashes record the prologue and epilogue insns.  */
-static GTY((if_marked ("ggc_marked_p"), param_is (struct rtx_def)))
-  htab_t prologue_insn_hash;
-static GTY((if_marked ("ggc_marked_p"), param_is (struct rtx_def)))
-  htab_t epilogue_insn_hash;
+static htab_t prologue_insn_hash;
+static htab_t epilogue_insn_hash;
 
 
 htab_t types_used_by_vars_hash = NULL;
@@ -208,6 +206,10 @@  free_after_parsing (struct function *f)
 void
 free_after_compilation (struct function *f)
 {
+  if (prologue_insn_hash)
+    htab_delete (prologue_insn_hash);
+  if (epilogue_insn_hash)
+    htab_delete (epilogue_insn_hash);
   prologue_insn_hash = NULL;
   epilogue_insn_hash = NULL;
 
@@ -219,6 +221,8 @@  free_after_compilation (struct function 
   f->machine = NULL;
   f->cfg = NULL;
 
+  if (regno_reg_rtx)
+    free (regno_reg_rtx);
   regno_reg_rtx = NULL;
   insn_locators_free ();
 }
@@ -339,7 +343,8 @@  try_fit_stack_local (HOST_WIDE_INT start
 static void
 add_frame_space (HOST_WIDE_INT start, HOST_WIDE_INT end)
 {
-  struct frame_space *space = ggc_alloc_frame_space ();
+  struct frame_space *space;
+  space = (struct frame_space *)obstack_alloc (rtl_obstack, sizeof *space);
   space->next = crtl->frame_space_list;
   crtl->frame_space_list = space;
   space->start = start;
@@ -516,7 +521,6 @@  assign_stack_local (enum machine_mode mo
   return assign_stack_local_1 (mode, size, align, false);
 }
 
-
 /* In order to evaluate some expressions, such as function calls returning
    structures in memory, we need to temporarily allocate stack locations.
    We record each allocated temporary in the following structure.
@@ -535,13 +539,13 @@  assign_stack_local (enum machine_mode mo
    level where they are defined.  They are marked a "kept" so that
    free_temp_slots will not free them.  */
 
-struct GTY(()) temp_slot {
+struct temp_slot {
   /* Points to next temporary slot.  */
   struct temp_slot *next;
   /* Points to previous temporary slot.  */
   struct temp_slot *prev;
   /* The rtx used to reference the slot.  */
-  rtx slot;
+  rtx GTY(()) slot;
   /* The size, in units, of the slot.  */
   HOST_WIDE_INT size;
   /* The type of the object in the slot, or zero if it doesn't correspond
@@ -569,10 +573,10 @@  struct GTY(()) temp_slot {
 
 /* A table of addresses that represent a stack slot.  The table is a mapping
    from address RTXen to a temp slot.  */
-static GTY((param_is(struct temp_slot_address_entry))) htab_t temp_slot_address_table;
+static htab_t temp_slot_address_table;
 
 /* Entry for the above hash table.  */
-struct GTY(()) temp_slot_address_entry {
+struct temp_slot_address_entry {
   hashval_t hash;
   rtx address;
   struct temp_slot *temp_slot;
@@ -682,7 +686,8 @@  static void
 insert_temp_slot_address (rtx address, struct temp_slot *temp_slot)
 {
   void **slot;
-  struct temp_slot_address_entry *t = ggc_alloc_temp_slot_address_entry ();
+  struct temp_slot_address_entry *t;
+  t = (struct temp_slot_address_entry *)obstack_alloc (rtl_obstack, sizeof *t);
   t->address = address;
   t->temp_slot = temp_slot;
   t->hash = temp_slot_address_compute_hash (t);
@@ -834,7 +839,7 @@  assign_stack_temp_for_type (enum machine
 
 	  if (best_p->size - rounded_size >= alignment)
 	    {
-	      p = ggc_alloc_temp_slot ();
+	      p = (struct temp_slot *)obstack_alloc (rtl_obstack, sizeof *p);
 	      p->in_use = p->addr_taken = 0;
 	      p->size = best_p->size - rounded_size;
 	      p->base_offset = best_p->base_offset + rounded_size;
@@ -858,7 +863,7 @@  assign_stack_temp_for_type (enum machine
     {
       HOST_WIDE_INT frame_offset_old = frame_offset;
 
-      p = ggc_alloc_temp_slot ();
+      p = (struct temp_slot *)obstack_alloc (rtl_obstack, sizeof *p);
 
       /* We are passing an explicit alignment request to assign_stack_local.
 	 One side effect of that is assign_stack_local will not round SIZE
@@ -1311,10 +1316,10 @@  init_temp_slots (void)
 
   /* Set up the table to map addresses to temp slots.  */
   if (! temp_slot_address_table)
-    temp_slot_address_table = htab_create_ggc (32,
-					       temp_slot_address_hash,
-					       temp_slot_address_eq,
-					       NULL);
+    temp_slot_address_table = htab_create (32,
+					   temp_slot_address_hash,
+					   temp_slot_address_eq,
+					   NULL);
   else
     htab_empty (temp_slot_address_table);
 }
@@ -1836,7 +1841,9 @@  instantiate_decls (tree fndecl)
   FOR_EACH_LOCAL_DECL (cfun, ix, decl)
     if (DECL_RTL_SET_P (decl))
       instantiate_decl_rtl (DECL_RTL (decl));
+#if 0
   VEC_free (tree, gc, cfun->local_decls);
+#endif
 }
 
 /* Pass through the INSNS of function FNDECL and convert virtual register
@@ -4779,8 +4786,6 @@  do_warn_unused_parameter (tree fn)
       warning (OPT_Wunused_parameter, "unused parameter %q+D", decl);
 }
 
-static GTY(()) rtx initial_trampoline;
-
 /* Generate RTL for the end of the current function.  */
 
 void
@@ -5068,7 +5073,7 @@  record_insns (rtx insns, rtx end, htab_t
 
   if (hash == NULL)
     *hashp = hash
-      = htab_create_ggc (17, htab_hash_pointer, htab_eq_pointer, NULL);
+      = htab_create (17, htab_hash_pointer, htab_eq_pointer, NULL);
 
   for (tmp = insns; tmp != end; tmp = NEXT_INSN (tmp))
     {
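
The function.c hash tables show the general recipe for taking a table out
of GC: htab_create_ggc becomes a plain htab_create, and an explicit
htab_delete appears where the collector used to reclaim the table, here in
free_after_compilation.  A self-contained sketch of that lifecycle; the
table name is invented:

#include "hashtab.h"

static htab_t insn_hash;

static void
create_table (void)
{
  /* Formerly htab_create_ggc; the table is now plain heap memory.  */
  insn_hash = htab_create (17, htab_hash_pointer, htab_eq_pointer, NULL);
}

static void
destroy_table (void)
{
  /* Without the GC, the table must be freed explicitly when the
     function's RTL is discarded.  */
  if (insn_hash)
    htab_delete (insn_hash);
  insn_hash = NULL;
}
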
Index: function.h
===================================================================
--- function.h	(revision 162821)
+++ function.h	(working copy)
@@ -33,14 +33,14 @@  along with GCC; see the file COPYING3.  
    The main insn-chain is saved in the last element of the chain,
    unless the chain is empty.  */
 
-struct GTY(()) sequence_stack {
+struct sequence_stack {
   /* First and last insns in the chain of the saved sequence.  */
   rtx first;
   rtx last;
   struct sequence_stack *next;
 };
 
-struct GTY(()) emit_status {
+struct emit_status {
   /* This is reset to LAST_VIRTUAL_REGISTER + 1 at the start of each function.
      After rtl generation, it is 1 plus the largest register number used.  */
   int x_reg_rtx_no;
@@ -61,6 +61,10 @@  struct GTY(()) emit_status {
      The main insn-chain is saved in the last element of the chain,
      unless the chain is empty.  */
   struct sequence_stack *sequence_stack;
+
+  /* Entries released by end_sequence, kept for reuse within this
+     function; they live on the function obstack.  */
+  struct sequence_stack *free_sequence_stack;
 
   /* INSN_UID for next insn emitted.
      Reset to 1 for each function compiled.  */
@@ -83,7 +84,7 @@  struct GTY(()) emit_status {
   /* Indexed by pseudo register number, if nonzero gives the known alignment
      for that pseudo (if REG_POINTER is set in x_regno_reg_rtx).
      Allocated in parallel with x_regno_reg_rtx.  */
-  unsigned char * GTY((skip)) regno_pointer_align;
+  unsigned char *regno_pointer_align;
 };
 
 
@@ -92,7 +93,7 @@  struct GTY(()) emit_status {
    FIXME: We could put it into the emit_status struct, but gengtype is not
    able to deal with a length attribute nested in top-level structures.  */
 
-extern GTY ((length ("crtl->emit.x_reg_rtx_no"))) rtx * regno_reg_rtx;
+extern rtx *regno_reg_rtx;
 
 /* For backward compatibility... eventually these should all go away.  */
 #define reg_rtx_no (crtl->emit.x_reg_rtx_no)
@@ -100,7 +101,7 @@  extern GTY ((length ("crtl->emit.x_reg_r
 
 #define REGNO_POINTER_ALIGN(REGNO) (crtl->emit.regno_pointer_align[REGNO])
 
-struct GTY(()) expr_status {
+struct expr_status {
   /* Number of units that we should eventually pop off the stack.
      These are the arguments to function calls that have already returned.  */
   int x_pending_stack_adjust;
@@ -145,7 +146,7 @@  DEF_VEC_P(call_site_record);
 DEF_VEC_ALLOC_P(call_site_record, gc);
 
 /* RTL representation of exception handling.  */
-struct GTY(()) rtl_eh {
+struct rtl_eh {
   rtx ehr_stackadj;
   rtx ehr_handler;
   rtx ehr_label;
@@ -178,7 +179,7 @@  typedef struct ipa_opt_pass_d *ipa_opt_p
 DEF_VEC_P(ipa_opt_pass);
 DEF_VEC_ALLOC_P(ipa_opt_pass,heap);
 
-struct GTY(()) varasm_status {
+struct varasm_status {
   /* If we're using a per-function constant pool, this is it.  */
   struct rtx_constant_pool *pool;
 
@@ -188,7 +189,7 @@  struct GTY(()) varasm_status {
 };
 
 /* Information maintained about the RTL representation of incoming arguments.  */
-struct GTY(()) incoming_args {
+struct incoming_args {
   /* Number of bytes of args popped by function being compiled on its return.
      Zero if no bytes are to be popped.
      May affect compilation of return insn or of function epilogue.  */
@@ -217,7 +218,7 @@  struct GTY(()) incoming_args {
 };
 
 /* Data for function partitioning.  */
-struct GTY(()) function_subsections {
+struct function_subsections {
   /* Assembly labels for the hot and cold text sections, to
      be used by debugger functions for determining the size of text
      sections.  */
@@ -236,7 +237,7 @@  struct GTY(()) function_subsections {
 /* Describe an empty area of space in the stack frame.  These can be chained
    into a list; this is used to keep track of space wasted for alignment
    reasons.  */
-struct GTY(()) frame_space
+struct frame_space
 {
   struct frame_space *next;
 
@@ -245,7 +246,7 @@  struct GTY(()) frame_space
 };
 
 /* Datastructures maintained for currently processed function in RTL form.  */
-struct GTY(()) rtl_data {
+struct rtl_data {
   struct expr_status expr;
   struct emit_status emit;
   struct varasm_status varasm;
@@ -461,7 +462,7 @@  struct GTY(()) rtl_data {
 #define stack_realign_fp (crtl->stack_realign_needed && !crtl->need_drap)
 #define stack_realign_drap (crtl->stack_realign_needed && crtl->need_drap)
 
-extern GTY(()) struct rtl_data x_rtl;
+extern struct rtl_data x_rtl;
 
 /* Accessor to RTL datastructures.  We keep them statically allocated now since
    we never keep multiple functions.  For threaded compiler we might however
@@ -505,11 +506,18 @@  struct GTY(()) function {
 
   /* Vector of function local variables, functions, types and constants.  */
   VEC(tree,gc) *local_decls;
+
+  /* Trees that are referenced only from RTL.  They are recorded here
+     so that the collector keeps them alive; it no longer traces RTL.  */
+  VEC(tree,gc) *stack_vars;
+
+  /* A fake decl that is used as the MEM_EXPR of spill slots.  */
+  tree spill_slot_decl;
 
   /* For md files.  */
 
   /* tm.h can use this to store whatever it likes.  */
-  struct machine_function * GTY ((maybe_undef)) machine;
+  struct machine_function * GTY ((skip)) machine;
 
   /* Language-specific code can use this to store whatever it likes.  */
   struct language_function * language;
Index: gcse.c
===================================================================
--- gcse.c	(revision 162821)
+++ gcse.c	(working copy)
@@ -159,7 +159,6 @@  along with GCC; see the file COPYING3.  
 #include "function.h"
 #include "expr.h"
 #include "except.h"
-#include "ggc.h"
 #include "params.h"
 #include "cselib.h"
 #include "intl.h"
@@ -849,7 +848,7 @@  want_to_gcse_p (rtx x, int *max_distance
 
 /* Used internally by can_assign_to_reg_without_clobbers_p.  */
 
-static GTY(()) rtx test_insn;
+static rtx test_insn;
 
 /* Return true if we can assign X to a pseudo register such that the
    resulting insn does not result in clobbering a hard register as a
@@ -879,12 +878,14 @@  can_assign_to_reg_without_clobbers_p (rt
      our test insn if we haven't already.  */
   if (test_insn == 0)
     {
+      rtl_on_permanent_obstack ();
       test_insn
 	= make_insn_raw (gen_rtx_SET (VOIDmode,
 				      gen_rtx_REG (word_mode,
 						   FIRST_PSEUDO_REGISTER * 2),
 				      const0_rtx));
       NEXT_INSN (test_insn) = PREV_INSN (test_insn) = 0;
+      rtl_pop_obstack ();
     }
 
   /* Now make an insn like the one we would make when GCSE'ing and see if
@@ -5361,4 +5362,3 @@  struct rtl_opt_pass pass_rtl_hoist =
  }
 };
 
-#include "gt-gcse.h"
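
test_insn is built once and reused for every function, so it cannot live on
the function obstack that is now freed after each compilation; its one-time
construction is bracketed by the permanent-obstack switch.  The same lazily
initialized singleton shape recurs with callmem in cselib.c and
stack_check_libfunc in explow.c below.  In generic form (the helper and its
body are illustrative only):

static rtx
get_shared_rtx (rtx *cache, enum machine_mode mode)
{
  if (!*cache)
    {
      rtl_on_permanent_obstack ();	/* must survive free_function_rtl */
      *cache = gen_rtx_SCRATCH (mode);	/* stands in for the real construction */
      rtl_pop_obstack ();
    }
  return *cache;
}
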
Index: alias.c
===================================================================
--- alias.c	(revision 162821)
+++ alias.c	(working copy)
@@ -204,13 +204,13 @@  static void memory_modified_1 (rtx, cons
    current function performs nonlocal memory references for the
    purposes of marking the function as a constant function.  */
 
-static GTY(()) VEC(rtx,gc) *reg_base_value;
+static VEC(rtx,heap) *reg_base_value;
 static rtx *new_reg_base_value;
 
 /* We keep a copy of the old array around to reduce the amount of
    garbage produced.  About 8% of the garbage produced was attributed
    to this array.  */
-static GTY((deletable)) VEC(rtx,gc) *old_reg_base_value;
+static VEC(rtx,heap) *old_reg_base_value;
 
 #define static_reg_base_value \
   (this_target_rtl->x_static_reg_base_value)
@@ -222,10 +222,10 @@  static GTY((deletable)) VEC(rtx,gc) *old
 /* Vector indexed by N giving the initial (unchanging) value known for
    pseudo-register N.  This array is initialized in init_alias_analysis,
    and does not change until end_alias_analysis is called.  */
-static GTY((length("reg_known_value_size"))) rtx *reg_known_value;
+static rtx *reg_known_value;
 
 /* Indicates number of valid entries in reg_known_value.  */
-static GTY(()) unsigned int reg_known_value_size;
+static unsigned int reg_known_value_size;
 
 /* Vector recording for each reg_known_value whether it is due to a
    REG_EQUIV note.  Future passes (viz., reload) may replace the
@@ -2625,7 +2625,7 @@  init_alias_analysis (void)
   timevar_push (TV_ALIAS_ANALYSIS);
 
   reg_known_value_size = maxreg - FIRST_PSEUDO_REGISTER;
-  reg_known_value = ggc_alloc_cleared_vec_rtx (reg_known_value_size);
+  reg_known_value = XCNEWVEC (rtx, reg_known_value_size);
   reg_known_equiv_p = XCNEWVEC (bool, reg_known_value_size);
 
   /* If we have memory allocated from the previous run, use it.  */
@@ -2635,7 +2635,7 @@  init_alias_analysis (void)
   if (reg_base_value)
     VEC_truncate (rtx, reg_base_value, 0);
 
-  VEC_safe_grow_cleared (rtx, gc, reg_base_value, maxreg);
+  VEC_safe_grow_cleared (rtx, heap, reg_base_value, maxreg);
 
   new_reg_base_value = XNEWVEC (rtx, maxreg);
   reg_seen = XNEWVEC (char, maxreg);
@@ -2800,7 +2800,7 @@  void
 end_alias_analysis (void)
 {
   old_reg_base_value = reg_base_value;
-  ggc_free (reg_known_value);
+  free (reg_known_value);
   reg_known_value = 0;
   reg_known_value_size = 0;
   free (reg_known_equiv_p);
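
alias.c needs no obstack tricks at all: its arrays are rebuilt by
init_alias_analysis and torn down by end_alias_analysis, so GC allocations
become plain heap allocations with matching explicit frees.  Schematically,
under the patch's assumptions:

static rtx *reg_known_value;
static unsigned int reg_known_value_size;

static void
alloc_reg_known_value (unsigned int maxreg)
{
  reg_known_value_size = maxreg - FIRST_PSEUDO_REGISTER;
  reg_known_value = XCNEWVEC (rtx, reg_known_value_size);  /* zero-filled */
}

static void
free_reg_known_value (void)
{
  free (reg_known_value);	/* was ggc_free */
  reg_known_value = NULL;
  reg_known_value_size = 0;
}
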
Index: except.c
===================================================================
--- except.c	(revision 162821)
+++ except.c	(working copy)
@@ -2338,7 +2338,7 @@  add_call_site (rtx landing_pad, int acti
 {
   call_site_record record;
 
-  record = ggc_alloc_call_site_record_d ();
+  record = (call_site_record)obstack_alloc (rtl_obstack, sizeof *record);
   record->landing_pad = landing_pad;
   record->action = action;
 
Index: coverage.c
===================================================================
--- coverage.c	(revision 162821)
+++ coverage.c	(working copy)
@@ -106,7 +106,7 @@  static GTY(()) tree tree_ctr_tables[GCOV
 
 /* The names of the counter tables.  Not used if we're
    generating counters at tree level.  */
-static GTY(()) rtx ctr_labels[GCOV_COUNTERS];
+static rtx ctr_labels[GCOV_COUNTERS];
 
 /* The names of merge functions for counters.  */
 static const char *const ctr_merge_functions[GCOV_COUNTERS] = GCOV_MERGE_FUNCTIONS;
Index: except.h
===================================================================
--- except.h	(revision 162821)
+++ except.h	(working copy)
@@ -90,7 +90,7 @@  struct GTY(()) eh_landing_pad_d
      EXCEPTION_RECEIVER pattern will be expanded here, as well as other
      bookkeeping specific to exceptions.  There must not be normal edges
      into the block containing the landing-pad label.  */
-  rtx landing_pad;
+  rtx GTY((skip)) landing_pad;
 
   /* The index of this landing pad within fun->eh->lp_array.  */
   int index;
@@ -178,7 +178,8 @@  struct GTY(()) eh_region_d
   /* EXC_PTR and FILTER values copied from the runtime for this region.
      Each region gets its own pseudos so that if there are nested exceptions
      we do not overwrite the values of the first exception.  */
-  rtx exc_ptr_reg, filter_reg;
+  rtx GTY((skip)) exc_ptr_reg;
+  rtx GTY((skip)) filter_reg;
 
   /* True if this region should use __cxa_end_cleanup instead
      of _Unwind_Resume.  */
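
The GTY((skip)) annotations here are the flip side of un-GTYing rtx_def:
eh_landing_pad_d and eh_region_d are still garbage collected, but their rtx
fields now point into obstack memory that gengtype must neither trace nor
try to free.  The pattern in isolation, with invented names:

struct GTY(()) gc_node
{
  tree decl;			/* traced by the collector as usual */
  rtx GTY((skip)) insn;		/* obstack-allocated; the collector ignores it */
};

The skip is only safe as long as the RTL obstack outlives every use of the
skipped fields, which should hold here because the EH data is rebuilt for
each function.
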
Index: emit-rtl.c
===================================================================
--- emit-rtl.c	(revision 162821)
+++ emit-rtl.c	(working copy)
@@ -119,24 +119,19 @@  rtx const_int_rtx[MAX_SAVED_CONST_INT * 
 /* A hash table storing CONST_INTs whose absolute value is greater
    than MAX_SAVED_CONST_INT.  */
 
-static GTY ((if_marked ("ggc_marked_p"), param_is (struct rtx_def)))
-     htab_t const_int_htab;
+static htab_t const_int_htab;
 
 /* A hash table storing memory attribute structures.  */
-static GTY ((if_marked ("ggc_marked_p"), param_is (struct mem_attrs)))
-     htab_t mem_attrs_htab;
+static htab_t mem_attrs_htab;
 
 /* A hash table storing register attribute structures.  */
-static GTY ((if_marked ("ggc_marked_p"), param_is (struct reg_attrs)))
-     htab_t reg_attrs_htab;
+static htab_t reg_attrs_htab;
 
 /* A hash table storing all CONST_DOUBLEs.  */
-static GTY ((if_marked ("ggc_marked_p"), param_is (struct rtx_def)))
-     htab_t const_double_htab;
+static htab_t const_double_htab;
 
 /* A hash table storing all CONST_FIXEDs.  */
-static GTY ((if_marked ("ggc_marked_p"), param_is (struct rtx_def)))
-     htab_t const_fixed_htab;
+static htab_t const_fixed_htab;
 
 #define cur_insn_uid (crtl->emit.x_cur_insn_uid)
 #define cur_debug_insn_uid (crtl->emit.x_cur_debug_insn_uid)
@@ -281,6 +276,10 @@  mem_attrs_htab_eq (const void *x, const 
 		  && operand_equal_p (p->expr, q->expr, 0))));
 }
 
+/* Trees referenced from the shared mem_attrs/reg_attrs structures,
+   recorded here so that the collector keeps them alive.  */
+static GTY(()) VEC(tree,gc) *saved_decls;
+
 /* Allocate a new mem_attrs structure and insert it into the hash table if
    one identical to it is not already in the table.  We are doing this for
    MEM of mode MODE.  */
@@ -302,6 +299,7 @@  get_mem_attrs (alias_set_type alias, tre
 	  ? align == GET_MODE_ALIGNMENT (mode) : align == BITS_PER_UNIT))
     return 0;
 
+  VEC_safe_push (tree, gc, saved_decls, expr);
   attrs.alias = alias;
   attrs.expr = expr;
   attrs.offset = offset;
@@ -312,7 +310,7 @@  get_mem_attrs (alias_set_type alias, tre
   slot = htab_find_slot (mem_attrs_htab, &attrs, INSERT);
   if (*slot == 0)
     {
-      *slot = ggc_alloc_mem_attrs ();
+      *slot = XNEW (mem_attrs);
       memcpy (*slot, &attrs, sizeof (mem_attrs));
     }
 
@@ -355,13 +353,18 @@  get_reg_attrs (tree decl, int offset)
   if (decl == 0 && offset == 0)
     return 0;
 
+  if (cfun)
+    VEC_safe_push (tree, gc, cfun->stack_vars, decl);
+  else
+    VEC_safe_push (tree, gc, saved_decls, decl);
+
   attrs.decl = decl;
   attrs.offset = offset;
 
   slot = htab_find_slot (reg_attrs_htab, &attrs, INSERT);
   if (*slot == 0)
     {
-      *slot = ggc_alloc_reg_attrs ();
+      *slot = XNEW (reg_attrs);
       memcpy (*slot, &attrs, sizeof (reg_attrs));
     }
 
@@ -395,6 +398,26 @@  gen_raw_REG (enum machine_mode mode, int
   return x;
 }
 
+rtx
+gen_rtx_CONST (enum machine_mode mode, rtx arg)
+{
+  rtx x;
+  /* A CONST containing a SYMBOL_REF can be shared across functions;
+     one containing a LABEL_REF cannot.  */
+  bool shared = (GET_CODE (arg) == PLUS
+		 && GET_CODE (XEXP (arg, 0)) == SYMBOL_REF
+		 && CONST_INT_P (XEXP (arg, 1)));
+  if (shared)
+    {
+      rtl_on_permanent_obstack ();
+      arg = copy_rtx (arg);
+    }
+  x = gen_rtx_raw_CONST (mode, arg);
+  if (shared)
+    rtl_pop_obstack ();
+  return x;
+}
+
 /* There are some RTL codes that require special attention; the generation
    functions do the raw handling.  If you add to this list, modify
    special_rtx in gengenrtl.c as well.  */
@@ -416,7 +438,11 @@  gen_rtx_CONST_INT (enum machine_mode mod
   slot = htab_find_slot_with_hash (const_int_htab, &arg,
 				   (hashval_t) arg, INSERT);
   if (*slot == 0)
-    *slot = gen_rtx_raw_CONST_INT (VOIDmode, arg);
+    {
+      rtl_on_permanent_obstack ();
+      *slot = gen_rtx_raw_CONST_INT (VOIDmode, arg);
+      rtl_pop_obstack ();
+    }
 
   return (rtx) *slot;
 }
@@ -449,8 +475,12 @@  lookup_const_double (rtx real)
 rtx
 const_double_from_real_value (REAL_VALUE_TYPE value, enum machine_mode mode)
 {
-  rtx real = rtx_alloc (CONST_DOUBLE);
+  rtx real;
+
+  rtl_on_permanent_obstack ();
+  real = rtx_alloc (CONST_DOUBLE);
   PUT_MODE (real, mode);
+  rtl_pop_obstack ();
 
   real->u.rv = value;
 
@@ -477,8 +507,12 @@  lookup_const_fixed (rtx fixed)
 rtx
 const_fixed_from_fixed_value (FIXED_VALUE_TYPE value, enum machine_mode mode)
 {
-  rtx fixed = rtx_alloc (CONST_FIXED);
+  rtx fixed;
+
+  rtl_on_permanent_obstack ();
+  fixed = rtx_alloc (CONST_FIXED);
   PUT_MODE (fixed, mode);
+  rtl_pop_obstack ();
 
   fixed->u.fv = value;
 
@@ -555,8 +589,10 @@  immed_double_const (HOST_WIDE_INT i0, HO
     return GEN_INT (i0);
 
   /* We use VOIDmode for integers.  */
+  rtl_on_permanent_obstack ();
   value = rtx_alloc (CONST_DOUBLE);
   PUT_MODE (value, VOIDmode);
+  rtl_pop_obstack ();
 
   CONST_DOUBLE_LOW (value) = i0;
   CONST_DOUBLE_HIGH (value) = i1;
@@ -902,7 +938,7 @@  gen_reg_rtx (enum machine_mode mode)
       memset (tmp + old_size, 0, old_size);
       crtl->emit.regno_pointer_align = (unsigned char *) tmp;
 
-      new1 = GGC_RESIZEVEC (rtx, regno_reg_rtx, old_size * 2);
+      new1 = XRESIZEVEC (rtx, regno_reg_rtx, old_size * 2);
       memset (new1 + old_size, 0, old_size * sizeof (rtx));
       regno_reg_rtx = new1;
 
@@ -921,7 +957,7 @@  static void
 update_reg_offset (rtx new_rtx, rtx reg, int offset)
 {
   REG_ATTRS (new_rtx) = get_reg_attrs (REG_EXPR (reg),
-				   REG_OFFSET (reg) + offset);
+				       REG_OFFSET (reg) + offset);
 }
 
 /* Generate a register with same attributes as REG, but with OFFSET
@@ -1814,7 +1850,7 @@  set_mem_attributes_minus_bitpos (rtx ref
   MEM_ATTRS (ref)
     = get_mem_attrs (alias, expr, offset, size, align,
 		     TYPE_ADDR_SPACE (type), GET_MODE (ref));
-
+
   /* If this is already known to be a scalar or aggregate, we are done.  */
   if (MEM_IN_STRUCT_P (ref) || MEM_SCALAR_P (ref))
     return;
@@ -2222,13 +2258,10 @@  widen_memory_access (rtx memref, enum ma
   return new_rtx;
 }
 
-/* A fake decl that is used as the MEM_EXPR of spill slots.  */
-static GTY(()) tree spill_slot_decl;
-
 tree
 get_spill_slot_decl (bool force_build_p)
 {
-  tree d = spill_slot_decl;
+  tree d = cfun->spill_slot_decl;
   rtx rd;
 
   if (d || !force_build_p)
@@ -2240,7 +2273,7 @@  get_spill_slot_decl (bool force_build_p)
   DECL_IGNORED_P (d) = 1;
   TREE_USED (d) = 1;
   TREE_THIS_NOTRAP (d) = 1;
-  spill_slot_decl = d;
+  cfun->spill_slot_decl = d;
 
   rd = gen_rtx_MEM (BLKmode, frame_pointer_rtx);
   MEM_NOTRAP_P (rd) = 1;
@@ -2248,6 +2281,7 @@  get_spill_slot_decl (bool force_build_p)
 				  NULL_RTX, 0, ADDR_SPACE_GENERIC, BLKmode);
   SET_DECL_RTL (d, rd);
 
+  VEC_safe_push (tree, gc, cfun->stack_vars, d);
   return d;
 }
 
@@ -5238,9 +5272,6 @@  emit (rtx x)
     }
 }
 
-/* Space for free sequence stack entries.  */
-static GTY ((deletable)) struct sequence_stack *free_sequence_stack;
-
 /* Begin emitting insns to a sequence.  If this sequence will contain
    something that might cause the compiler to pop arguments to function
    calls (because those pops have previously been deferred; see
@@ -5253,13 +5284,13 @@  start_sequence (void)
 {
   struct sequence_stack *tem;
 
-  if (free_sequence_stack != NULL)
+  if (crtl->emit.free_sequence_stack != NULL)
     {
-      tem = free_sequence_stack;
-      free_sequence_stack = tem->next;
+      tem = crtl->emit.free_sequence_stack;
+      crtl->emit.free_sequence_stack = tem->next;
     }
   else
-    tem = ggc_alloc_sequence_stack ();
+    tem = XOBNEW (rtl_obstack, struct sequence_stack);
 
   tem->next = seq_stack;
   tem->first = get_insns ();
@@ -5357,8 +5388,8 @@  end_sequence (void)
   seq_stack = tem->next;
 
   memset (tem, 0, sizeof (*tem));
-  tem->next = free_sequence_stack;
-  free_sequence_stack = tem;
+  tem->next = crtl->emit.free_sequence_stack;
+  crtl->emit.free_sequence_stack = tem;
 }
 
 /* Return 1 if currently emitting into a sequence.  */
@@ -5567,6 +5598,8 @@  init_emit (void)
   first_label_num = label_num;
   seq_stack = NULL;
 
+  crtl->emit.free_sequence_stack = NULL;
+
   /* Init the tables that describe all the pseudo regs.  */
 
   crtl->emit.regno_pointer_align_length = LAST_VIRTUAL_REGISTER + 101;
@@ -5574,7 +5607,7 @@  init_emit (void)
   crtl->emit.regno_pointer_align
     = XCNEWVEC (unsigned char, crtl->emit.regno_pointer_align_length);
 
-  regno_reg_rtx = ggc_alloc_vec_rtx (crtl->emit.regno_pointer_align_length);
+  regno_reg_rtx = XNEWVEC (rtx, crtl->emit.regno_pointer_align_length);
 
   /* Put copies of all the hard registers into regno_reg_rtx.  */
   memcpy (regno_reg_rtx,
@@ -5669,7 +5702,9 @@  gen_rtx_CONST_VECTOR (enum machine_mode 
 	return CONST1_RTX (mode);
     }
 
-  return gen_rtx_raw_CONST_VECTOR (mode, v);
+  x = gen_rtx_raw_CONST_VECTOR (mode, v);
+
+  return x;
 }
 
 /* Initialise global register information required by all functions.  */
@@ -5727,21 +5762,23 @@  init_emit_once (void)
   enum machine_mode mode;
   enum machine_mode double_mode;
 
+  rtl_on_permanent_obstack ();
+
   /* Initialize the CONST_INT, CONST_DOUBLE, CONST_FIXED, and memory attribute
      hash tables.  */
-  const_int_htab = htab_create_ggc (37, const_int_htab_hash,
-				    const_int_htab_eq, NULL);
+  const_int_htab = htab_create (37, const_int_htab_hash,
+				const_int_htab_eq, NULL);
 
-  const_double_htab = htab_create_ggc (37, const_double_htab_hash,
-				       const_double_htab_eq, NULL);
+  const_double_htab = htab_create (37, const_double_htab_hash,
+				   const_double_htab_eq, NULL);
 
-  const_fixed_htab = htab_create_ggc (37, const_fixed_htab_hash,
-				      const_fixed_htab_eq, NULL);
+  const_fixed_htab = htab_create (37, const_fixed_htab_hash,
+				  const_fixed_htab_eq, NULL);
 
-  mem_attrs_htab = htab_create_ggc (37, mem_attrs_htab_hash,
-				    mem_attrs_htab_eq, NULL);
-  reg_attrs_htab = htab_create_ggc (37, reg_attrs_htab_hash,
-				    reg_attrs_htab_eq, NULL);
+  mem_attrs_htab = htab_create (37, mem_attrs_htab_hash,
+				mem_attrs_htab_eq, NULL);
+  reg_attrs_htab = htab_create (37, reg_attrs_htab_hash,
+				reg_attrs_htab_eq, NULL);
 
   /* Compute the word and byte modes.  */
 
@@ -5972,6 +6009,7 @@  init_emit_once (void)
   const_tiny_rtx[0][(int) BImode] = const0_rtx;
   if (STORE_FLAG_VALUE == 1)
     const_tiny_rtx[1][(int) BImode] = const1_rtx;
+  rtl_pop_obstack ();
 }
 
 /* Produce exact duplicate of insn INSN after AFTER.
@@ -6039,15 +6077,18 @@  emit_copy_of_insn_after (rtx insn, rtx a
   return new_rtx;
 }
 
-static GTY((deletable)) rtx hard_reg_clobbers [NUM_MACHINE_MODES][FIRST_PSEUDO_REGISTER];
+static rtx hard_reg_clobbers [NUM_MACHINE_MODES][FIRST_PSEUDO_REGISTER];
 rtx
 gen_hard_reg_clobber (enum machine_mode mode, unsigned int regno)
 {
-  if (hard_reg_clobbers[mode][regno])
-    return hard_reg_clobbers[mode][regno];
-  else
-    return (hard_reg_clobbers[mode][regno] =
-	    gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (mode, regno)));
+  if (!hard_reg_clobbers[mode][regno])
+    {
+      rtl_on_permanent_obstack ();
+      hard_reg_clobbers[mode][regno]
+	= gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (mode, regno));
+      rtl_pop_obstack ();
+    }
+  return hard_reg_clobbers[mode][regno];
 }
 
 #include "gt-emit-rtl.h"
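
Two details of the emit-rtl.c conversion are worth spelling out.  The
constant caches (const_int_htab and friends) are filled on the permanent
obstack because those rtxes are shared by all functions.  And the
sequence_stack free list moves from a GTY((deletable)) global into
crtl->emit: its nodes now come from the function obstack, so a recycled
node must never outlive the function that allocated it, and keeping the
list head per-function ties the two lifetimes together.  The free-list
shape, reduced to a generic sketch:

/* Nodes come from an obstack that is freed wholesale, so the free list
   must be reset whenever the obstack is; init_emit does that above.
   Names are illustrative.  */
struct node { struct node *next; /* ... payload ... */ };

static struct node *
node_alloc (struct obstack *ob, struct node **freelist)
{
  struct node *n = *freelist;
  if (n)
    *freelist = n->next;		/* recycle without growing the obstack */
  else
    n = XOBNEW (ob, struct node);
  return n;
}

static void
node_release (struct node *n, struct node **freelist)
{
  n->next = *freelist;	/* memory stays on the obstack; the free list
			   just makes it reusable */
  *freelist = n;
}
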
Index: cfgexpand.c
===================================================================
--- cfgexpand.c	(revision 162821)
+++ cfgexpand.c	(working copy)
@@ -1234,6 +1234,7 @@  init_vars_expansion (void)
 {
   tree t;
   unsigned ix;
+
   /* Set TREE_USED on all variables in the local_decls.  */
   FOR_EACH_LOCAL_DECL (cfun, ix, t)
     TREE_USED (t) = 1;
@@ -1252,8 +1253,11 @@  fini_vars_expansion (void)
 {
   size_t i, n = stack_vars_num;
   for (i = 0; i < n; i++)
-    BITMAP_FREE (stack_vars[i].conflicts);
-  XDELETEVEC (stack_vars);
+    {
+      VEC_safe_push (tree, gc, cfun->stack_vars, stack_vars[i].decl);
+      BITMAP_FREE (stack_vars[i].conflicts);
+    }
+  XDELETEVEC (stack_vars);
   XDELETEVEC (stack_vars_sorted);
   stack_vars = NULL;
   stack_vars_alloc = stack_vars_num = 0;
@@ -2337,9 +2341,11 @@  expand_debug_expr (tree exp)
       if (op0)
 	return op0;
 
+      rtl_on_permanent_obstack ();
       op0 = gen_rtx_DEBUG_EXPR (mode);
       DEBUG_EXPR_TREE_DECL (op0) = exp;
       SET_DECL_RTL (exp, op0);
+      rtl_pop_obstack ();
 
       return op0;
 
@@ -3108,6 +3114,7 @@  expand_debug_locations (void)
 	  val = NULL_RTX;
 	else
 	  {
+	    VEC_safe_push (tree, gc, cfun->stack_vars, value);
 	    val = expand_debug_expr (value);
 	    gcc_assert (last == get_last_insn ());
 	  }
@@ -3287,6 +3294,9 @@  expand_gimple_basic_block (basic_block b
 		    rtx val;
 		    enum machine_mode mode;
 
+		    VEC_safe_push (tree, gc, cfun->stack_vars, vexpr);
+		    VEC_safe_push (tree, gc, cfun->stack_vars, value);
+
 		    set_curr_insn_source_location (gimple_location (def));
 		    set_curr_insn_block (gimple_block (def));
 
@@ -3303,6 +3313,7 @@  expand_gimple_basic_block (basic_block b
 
 		    val = emit_debug_insn (val);
 
+		    VEC_safe_push (tree, gc, cfun->stack_vars, vexpr);
 		    FOR_EACH_IMM_USE_STMT (debugstmt, imm_iter, op)
 		      {
 			if (!gimple_debug_bind_p (debugstmt))
@@ -3357,6 +3368,7 @@  expand_gimple_basic_block (basic_block b
 	      else
 		mode = TYPE_MODE (TREE_TYPE (var));
 
+	      VEC_safe_push (tree, gc, cfun->stack_vars, value);
 	      val = gen_rtx_VAR_LOCATION
 		(mode, var, (rtx)value, VAR_INIT_STATUS_INITIALIZED);
 
Index: cselib.c
===================================================================
--- cselib.c	(revision 162821)
+++ cselib.c	(working copy)
@@ -36,7 +36,6 @@  along with GCC; see the file COPYING3.  
 #include "diagnostic-core.h"
 #include "toplev.h"
 #include "output.h"
-#include "ggc.h"
 #include "hashtab.h"
 #include "tree-pass.h"
 #include "cselib.h"
@@ -165,7 +164,7 @@  static unsigned int n_used_regs;
 
 /* We pass this to cselib_invalidate_mem to invalidate all of
    memory for a non-const call instruction.  */
-static GTY(()) rtx callmem;
+static rtx callmem;
 
 /* Set by discard_useless_locs if it deleted the last location of any
    value.  */
@@ -2241,8 +2240,10 @@  cselib_init (int record_what)
 
   /* (mem:BLK (scratch)) is a special mechanism to conflict with everything,
      see canon_true_dependence.  This is only created once.  */
+  rtl_on_permanent_obstack ();
   if (! callmem)
     callmem = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode));
+  rtl_pop_obstack ();
 
   cselib_nregs = max_reg_num ();
 
@@ -2377,4 +2378,3 @@  dump_cselib_table (FILE *out)
   fprintf (out, "next uid %i\n", next_uid);
 }
 
-#include "gt-cselib.h"
Index: explow.c
===================================================================
--- explow.c	(revision 162821)
+++ explow.c	(working copy)
@@ -37,7 +37,6 @@  along with GCC; see the file COPYING3.  
 #include "libfuncs.h"
 #include "hard-reg-set.h"
 #include "insn-config.h"
-#include "ggc.h"
 #include "recog.h"
 #include "langhooks.h"
 #include "target.h"
@@ -1344,13 +1343,15 @@  allocate_dynamic_stack_space (rtx size, 
    run-time routine to call to check the stack, so provide a mechanism for
    calling that routine.  */
 
-static GTY(()) rtx stack_check_libfunc;
+static rtx stack_check_libfunc;
 
 void
 set_stack_check_libfunc (const char *libfunc_name)
 {
   gcc_assert (stack_check_libfunc == NULL_RTX);
+  rtl_on_permanent_obstack ();
   stack_check_libfunc = gen_rtx_SYMBOL_REF (Pmode, libfunc_name);
+  rtl_pop_obstack ();
 }
 
 /* Emit one stack probe at ADDRESS, an address within the stack.  */
@@ -1759,4 +1760,3 @@  rtx_to_tree_code (enum rtx_code code)
   return ((int) tcode);
 }
 
-#include "gt-explow.h"
Index: cfglayout.h
===================================================================
--- cfglayout.h	(revision 162821)
+++ cfglayout.h	(working copy)
@@ -22,8 +22,8 @@ 
 
 #include "basic-block.h"
 
-extern GTY(()) rtx cfg_layout_function_footer;
-extern GTY(()) rtx cfg_layout_function_header;
+extern rtx cfg_layout_function_footer;
+extern rtx cfg_layout_function_header;
 
 extern void cfg_layout_initialize (unsigned int);
 extern void cfg_layout_finalize (void);
Index: varasm.c
===================================================================
--- varasm.c	(revision 162821)
+++ varasm.c	(working copy)
@@ -178,13 +178,13 @@  static GTY(()) section *unnamed_sections
 static GTY((param_is (section))) htab_t section_htab;
 
 /* A table of object_blocks, indexed by section.  */
-static GTY((param_is (struct object_block))) htab_t object_block_htab;
+static htab_t object_block_htab;
 
 /* The next number to use for internal anchor labels.  */
 static GTY(()) int anchor_labelno;
 
 /* A pool of constants that can be shared between functions.  */
-static GTY(()) struct rtx_constant_pool *shared_constant_pool;
+static struct rtx_constant_pool *shared_constant_pool;
 
 /* Helper routines for maintaining section_htab.  */
 
@@ -280,7 +280,7 @@  get_section (const char *name, unsigned 
     {
       sect = ggc_alloc_section ();
       sect->named.common.flags = flags;
-      sect->named.name = ggc_strdup (name);
+      sect->named.name = rtl_strdup (name);
       sect->named.decl = decl;
       *slot = sect;
     }
@@ -327,7 +327,7 @@  get_block_for_section (section *sect)
   block = (struct object_block *) *slot;
   if (block == NULL)
     {
-      block = ggc_alloc_cleared_object_block ();
+      block = XCNEW (struct object_block);
       block->sect = sect;
       *slot = block;
     }
@@ -347,7 +347,7 @@  create_block_symbol (const char *label, 
 
   /* Create the extended SYMBOL_REF.  */
   size = RTX_HDR_SIZE + sizeof (struct block_symbol);
-  symbol = ggc_alloc_zone_rtx_def (size, &rtl_zone);
+  symbol = (rtx)obstack_alloc (rtl_obstack, size);
 
   /* Initialize the normal SYMBOL_REF fields.  */
   memset (symbol, 0, size);
@@ -383,7 +383,7 @@  initialize_cold_section_name (void)
       stripped_name = targetm.strip_name_encoding (name);
 
       buffer = ACONCAT ((stripped_name, "_unlikely", NULL));
-      crtl->subsections.unlikely_text_section_name = ggc_strdup (buffer);
+      crtl->subsections.unlikely_text_section_name = rtl_strdup (buffer);
     }
   else
     crtl->subsections.unlikely_text_section_name =  UNLIKELY_EXECUTED_TEXT_SECTION_NAME;
@@ -1038,7 +1038,14 @@  make_decl_rtl (tree decl)
       /* If the old RTL had the wrong mode, fix the mode.  */
       x = DECL_RTL (decl);
       if (GET_MODE (x) != DECL_MODE (decl))
-	SET_DECL_RTL (decl, adjust_address_nv (x, DECL_MODE (decl), 0));
+	{
+	  rtx newx;
+
+	  rtl_on_permanent_obstack ();
+	  newx = adjust_address_nv (x, DECL_MODE (decl), 0);
+	  rtl_pop_obstack ();
+	  SET_DECL_RTL (decl, newx);
+	}
 
       if (TREE_CODE (decl) != FUNCTION_DECL && DECL_REGISTER (decl))
 	return;
@@ -1112,6 +1119,9 @@  make_decl_rtl (tree decl)
 		     "optimization may eliminate reads and/or "
 		     "writes to register variables");
 
+	  if (TREE_STATIC (decl))
+	    rtl_on_permanent_obstack ();
+
 	  /* If the user specified one of the eliminables registers here,
 	     e.g., FRAME_POINTER_REGNUM, we don't want to get this variable
 	     confused with that register and be eliminated.  This usage is
@@ -1122,6 +1132,9 @@  make_decl_rtl (tree decl)
 	  REG_USERVAR_P (DECL_RTL (decl)) = 1;
 
 	  if (TREE_STATIC (decl))
+	    rtl_pop_obstack ();
+
+	  if (TREE_STATIC (decl))
 	    {
 	      /* Make this register global, so not usable for anything
 		 else.  */
@@ -1168,6 +1181,8 @@  make_decl_rtl (tree decl)
   if (TREE_CODE (decl) == VAR_DECL && DECL_WEAK (decl))
     DECL_COMMON (decl) = 0;
 
+  rtl_on_permanent_obstack ();
+
   if (use_object_blocks_p () && use_blocks_for_decl_p (decl))
     x = create_block_symbol (name, get_block_for_decl (decl), -1);
   else
@@ -1188,6 +1203,8 @@  make_decl_rtl (tree decl)
     set_mem_attributes (x, decl, 1);
   SET_DECL_RTL (decl, x);
 
+  rtl_pop_obstack ();
+
   /* Optionally set flags or add text to the name to record information
      such as that it is a function name.
      If the name is changed, the macro ASM_OUTPUT_LABELREF
@@ -1424,13 +1441,13 @@  assemble_start_function (tree decl, cons
   if (flag_reorder_blocks_and_partition)
     {
       ASM_GENERATE_INTERNAL_LABEL (tmp_label, "LHOTB", const_labelno);
-      crtl->subsections.hot_section_label = ggc_strdup (tmp_label);
+      crtl->subsections.hot_section_label = rtl_strdup (tmp_label);
       ASM_GENERATE_INTERNAL_LABEL (tmp_label, "LCOLDB", const_labelno);
-      crtl->subsections.cold_section_label = ggc_strdup (tmp_label);
+      crtl->subsections.cold_section_label = rtl_strdup (tmp_label);
       ASM_GENERATE_INTERNAL_LABEL (tmp_label, "LHOTE", const_labelno);
-      crtl->subsections.hot_section_end_label = ggc_strdup (tmp_label);
+      crtl->subsections.hot_section_end_label = rtl_strdup (tmp_label);
       ASM_GENERATE_INTERNAL_LABEL (tmp_label, "LCOLDE", const_labelno);
-      crtl->subsections.cold_section_end_label = ggc_strdup (tmp_label);
+      crtl->subsections.cold_section_end_label = rtl_strdup (tmp_label);
       const_labelno++;
     }
   else
@@ -2199,7 +2216,7 @@  assemble_static_space (unsigned HOST_WID
 
   ASM_GENERATE_INTERNAL_LABEL (name, "LF", const_labelno);
   ++const_labelno;
-  namestring = ggc_strdup (name);
+  namestring = rtl_strdup (name);
 
   x = gen_rtx_SYMBOL_REF (Pmode, namestring);
   SYMBOL_REF_FLAGS (x) = SYMBOL_FLAG_LOCAL;
@@ -2230,7 +2247,7 @@  assemble_static_space (unsigned HOST_WID
    This is done at most once per compilation.
    Returns an RTX for the address of the template.  */
 
-static GTY(()) rtx initial_trampoline;
+static rtx initial_trampoline;
 
 rtx
 assemble_trampoline_template (void)
@@ -2245,6 +2262,8 @@  assemble_trampoline_template (void)
   if (initial_trampoline)
     return initial_trampoline;
 
+  rtl_on_permanent_obstack ();
+
   /* By default, put trampoline templates in read-only data section.  */
 
 #ifdef TRAMPOLINE_SECTION
@@ -2263,7 +2282,7 @@  assemble_trampoline_template (void)
 
   /* Record the rtl to refer to it.  */
   ASM_GENERATE_INTERNAL_LABEL (label, "LTRAMP", 0);
-  name = ggc_strdup (label);
+  name = rtl_strdup (label);
   symbol = gen_rtx_SYMBOL_REF (Pmode, name);
   SYMBOL_REF_FLAGS (symbol) = SYMBOL_FLAG_LOCAL;
 
@@ -2271,6 +2290,8 @@  assemble_trampoline_template (void)
   set_mem_align (initial_trampoline, TRAMPOLINE_ALIGNMENT);
   set_mem_size (initial_trampoline, GEN_INT (TRAMPOLINE_SIZE));
 
+  rtl_pop_obstack ();
+
   return initial_trampoline;
 }
 
@@ -2443,7 +2464,7 @@  assemble_real (REAL_VALUE_TYPE d, enum m
    Store them both in the structure *VALUE.
    EXP must be reducible.  */
 
-struct GTY(()) addr_const {
+struct addr_const {
   rtx base;
   HOST_WIDE_INT offset;
 };
@@ -2903,6 +2924,9 @@  copy_constant (tree exp)
     }
 }
 
+/* Decls behind constant pool SYMBOL_REFs; referenced only from RTL,
+   so recorded here to keep them alive for the collector.  */
+static GTY(()) VEC(tree,gc) *saved_constant_decls;
+
 /* Return the section into which constant EXP should be placed.  */
 
 static section *
@@ -2977,18 +3000,21 @@  build_constant_desc (tree exp)
   else
     align_variable (decl, 0);
 
+  rtl_on_permanent_obstack ();
+
   /* Now construct the SYMBOL_REF and the MEM.  */
   if (use_object_blocks_p ())
     {
       section *sect = get_constant_section (exp, DECL_ALIGN (decl));
-      symbol = create_block_symbol (ggc_strdup (label),
+      symbol = create_block_symbol (rtl_strdup (label),
 				    get_block_for_section (sect), -1);
     }
   else
-    symbol = gen_rtx_SYMBOL_REF (Pmode, ggc_strdup (label));
+    symbol = gen_rtx_SYMBOL_REF (Pmode, rtl_strdup (label));
   SYMBOL_REF_FLAGS (symbol) |= SYMBOL_FLAG_LOCAL;
   SET_SYMBOL_REF_DECL (symbol, decl);
   TREE_CONSTANT_POOL_ADDRESS_P (symbol) = 1;
+  VEC_safe_push (tree, gc, saved_constant_decls, decl);
 
   rtl = gen_rtx_MEM (TYPE_MODE (TREE_TYPE (exp)), symbol);
   set_mem_attributes (rtl, exp, 1);
@@ -3008,6 +3034,8 @@  build_constant_desc (tree exp)
 
   desc->rtl = rtl;
 
+  rtl_pop_obstack ();
+
   return desc;
 }
 
@@ -3188,7 +3216,7 @@  tree_output_constant_def (tree exp)
    can use one per-file pool.  Should add a targetm bit to tell the
    difference.  */
 
-struct GTY(()) rtx_constant_pool {
+struct rtx_constant_pool {
   /* Pointers to first and last constant in pool, as ordered by offset.  */
   struct constant_descriptor_rtx *first;
   struct constant_descriptor_rtx *last;
@@ -3197,14 +3225,14 @@  struct GTY(()) rtx_constant_pool {
      It is used on RISC machines where immediate integer arguments and
      constant addresses are restricted so that such constants must be stored
      in memory.  */
-  htab_t GTY((param_is (struct constant_descriptor_rtx))) const_rtx_htab;
+  htab_t const_rtx_htab;
 
   /* Current offset in constant pool (does not include any
      machine-specific header).  */
   HOST_WIDE_INT offset;
 };
 
-struct GTY((chain_next ("%h.next"))) constant_descriptor_rtx {
+struct constant_descriptor_rtx {
   struct constant_descriptor_rtx *next;
   rtx mem;
   rtx sym;
@@ -3337,8 +3365,8 @@  create_constant_pool (void)
 {
   struct rtx_constant_pool *pool;
 
-  pool = ggc_alloc_rtx_constant_pool ();
-  pool->const_rtx_htab = htab_create_ggc (31, const_desc_rtx_hash,
+  pool = XNEW (struct rtx_constant_pool);
+  pool->const_rtx_htab = htab_create (31, const_desc_rtx_hash,
-					  const_desc_rtx_eq, NULL);
+				      const_desc_rtx_eq, NULL);
   pool->first = NULL;
   pool->last = NULL;
@@ -3403,7 +3431,7 @@  force_const_mem (enum machine_mode mode,
     return copy_rtx (desc->mem);
 
   /* Otherwise, create a new descriptor.  */
-  desc = ggc_alloc_constant_descriptor_rtx ();
+  desc = XNEW (struct constant_descriptor_rtx);
   *slot = desc;
 
   /* Align the location counter as required by EXP's data type.  */
@@ -3435,6 +3463,9 @@  force_const_mem (enum machine_mode mode,
     pool->first = pool->last = desc;
   pool->last = desc;
 
+  if (pool == shared_constant_pool)
+    rtl_on_permanent_obstack ();
+
   /* Create a string containing the label name, in LABEL.  */
   ASM_GENERATE_INTERNAL_LABEL (label, "LC", const_labelno);
   ++const_labelno;
@@ -3444,11 +3475,11 @@  force_const_mem (enum machine_mode mode,
   if (use_object_blocks_p () && targetm.use_blocks_for_constant_p (mode, x))
     {
       section *sect = targetm.asm_out.select_rtx_section (mode, x, align);
-      symbol = create_block_symbol (ggc_strdup (label),
+      symbol = create_block_symbol (rtl_strdup (label),
 				    get_block_for_section (sect), -1);
     }
   else
-    symbol = gen_rtx_SYMBOL_REF (Pmode, ggc_strdup (label));
+    symbol = gen_rtx_SYMBOL_REF (Pmode, rtl_strdup (label));
   desc->sym = symbol;
   SYMBOL_REF_FLAGS (symbol) |= SYMBOL_FLAG_LOCAL;
   CONSTANT_POOL_ADDRESS_P (symbol) = 1;
@@ -3464,7 +3495,11 @@  force_const_mem (enum machine_mode mode,
   if (GET_CODE (x) == LABEL_REF)
     LABEL_PRESERVE_P (XEXP (x, 0)) = 1;
 
-  return copy_rtx (def);
+  def = copy_rtx (def);
+  if (pool == shared_constant_pool)
+    rtl_pop_obstack ();
+
+  return def;
 }
 
 /* Given a constant pool SYMBOL_REF, return the corresponding constant.  */
@@ -5640,7 +5675,7 @@  init_varasm_once (void)
 {
   section_htab = htab_create_ggc (31, section_entry_hash,
 				  section_entry_eq, NULL);
-  object_block_htab = htab_create_ggc (31, object_block_entry_hash,
+  object_block_htab = htab_create (31, object_block_entry_hash,
-				       object_block_entry_eq, NULL);
+				   object_block_entry_eq, NULL);
   const_desc_htab = htab_create_ggc (1009, const_desc_hash,
 				     const_desc_eq, NULL);
@@ -6685,7 +6720,7 @@  get_section_anchor (struct object_block 
 
   /* Create a new anchor with a unique label.  */
   ASM_GENERATE_INTERNAL_LABEL (label, "LANCHOR", anchor_labelno++);
-  anchor = create_block_symbol (ggc_strdup (label), block, offset);
+  anchor = create_block_symbol (rtl_strdup (label), block, offset);
   SYMBOL_REF_FLAGS (anchor) |= SYMBOL_FLAG_LOCAL | SYMBOL_FLAG_ANCHOR;
   SYMBOL_REF_FLAGS (anchor) |= model << SYMBOL_FLAG_TLS_SHIFT;
 
@@ -6884,6 +6919,7 @@  make_debug_expr_from_rtl (const_rtx exp)
   enum machine_mode mode = GET_MODE (exp);
   rtx dval;
 
+  VEC_safe_push (tree, gc, cfun->stack_vars, ddecl);
   DECL_ARTIFICIAL (ddecl) = 1;
   if (REG_P (exp) && REG_EXPR (exp))
     type = TREE_TYPE (REG_EXPR (exp));
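
varasm.c is the one file where both obstacks are live within a single
function: force_const_mem serves both the per-function constant pool and
the file-scope shared_constant_pool, so it switches to the permanent
obstack only when pool == shared_constant_pool; the label strings,
SYMBOL_REFs, and the MEM returned for a shared constant must survive every
function.  The decision, boiled down to a sketch built on the patch's
helpers:

static rtx
pool_copy_rtx (struct rtx_constant_pool *pool, rtx x)
{
  /* Shared-pool RTL outlives cfun; per-function pool RTL may die with
     the function obstack.  */
  if (pool == shared_constant_pool)
    return permanent_copy_rtx (x);
  return copy_rtx (x);
}
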
Index: ira.c
===================================================================
--- ira.c	(revision 162821)
+++ ira.c	(working copy)
@@ -1693,7 +1693,7 @@  fix_reg_equiv_init (void)
 
   if (reg_equiv_init_size < max_regno)
     {
-      reg_equiv_init = GGC_RESIZEVEC (rtx, reg_equiv_init, max_regno);
+      reg_equiv_init = XRESIZEVEC (rtx, reg_equiv_init, max_regno);
       while (reg_equiv_init_size < max_regno)
 	reg_equiv_init[reg_equiv_init_size++] = NULL_RTX;
       for (i = FIRST_PSEUDO_REGISTER; i < reg_equiv_init_size; i++)
@@ -2272,7 +2272,7 @@  update_equiv_regs (void)
   recorded_label_ref = 0;
 
   reg_equiv = XCNEWVEC (struct equivalence, max_regno);
-  reg_equiv_init = ggc_alloc_cleared_vec_rtx (max_regno);
+  reg_equiv_init = XCNEWVEC (rtx, max_regno);
   reg_equiv_init_size = max_regno;
 
   init_alias_analysis ();
Index: rtl.c
===================================================================
--- rtl.c	(revision 162821)
+++ rtl.c	(working copy)
@@ -31,12 +31,19 @@  along with GCC; see the file COPYING3.  
 #include "tm.h"
 #include "rtl.h"
 #include "ggc.h"
+#include "obstack.h"
 #ifdef GENERATOR_FILE
 # include "errors.h"
 #else
 # include "diagnostic-core.h"
 #endif
 
+/* Obstack used for allocating RTL objects.  */
+
+static struct obstack function_obstack, permanent_obstack;
+struct obstack *rtl_obstack = &function_obstack;
+static char *rtl_firstobj;
+static int permanent_nesting;
 
 /* Indexed by rtx code, gives number of operands for an rtx with that code.
    Does NOT include rtx header data (code and links).  */
@@ -139,7 +146,6 @@  static int rtx_alloc_sizes[(int) LAST_AN
 static int rtvec_alloc_counts;
 static int rtvec_alloc_sizes;
 #endif
-
 
 /* Allocate an rtx vector of N elements.
    Store the length, and initialize all elements to zero.  */
@@ -149,7 +155,9 @@  rtvec_alloc (int n)
 {
   rtvec rt;
 
-  rt = ggc_alloc_rtvec_sized (n);
+  rt = (rtvec) obstack_alloc (rtl_obstack,
+			      sizeof (struct rtvec_def)
+			      + ((n - 1) * sizeof (rtunion)));
   /* Clear out the vector.  */
   memset (&rt->elem[0], 0, n * sizeof (rtx));
 
@@ -193,8 +201,24 @@  rtx_size (const_rtx x)
 rtx
 rtx_alloc_stat (RTX_CODE code MEM_STAT_DECL)
 {
-  rtx rt = ggc_alloc_zone_rtx_def_stat (&rtl_zone, RTX_CODE_SIZE (code)
-                                        PASS_MEM_STAT);
+  rtx rt;
+  struct obstack *ob = rtl_obstack;
+  register int length = RTX_CODE_SIZE (code);
+
+  /* This function is called very frequently, so we manipulate the
+     obstack directly.
+
+     Even though rtx objects are word aligned, we may be sharing an obstack
+     with tree nodes, which may have to be double-word aligned.  So align
+     our length to the alignment mask in the obstack.  */
+
+  length = (length + ob->alignment_mask) & ~ ob->alignment_mask;
+
+  if (ob->chunk_limit - ob->next_free < length)
+    _obstack_newchunk (ob, length);
+  rt = (rtx)ob->object_base;
+  ob->next_free += length;
+  ob->object_base = ob->next_free;
 
   /* We want to clear everything up to the FLD array.  Normally, this
      is one int, but we don't want to assume that and it isn't very
@@ -333,10 +357,10 @@  copy_rtx (rtx orig)
 /* Create a new copy of an rtx.  Only copy just one level.  */
 
 rtx
-shallow_copy_rtx_stat (const_rtx orig MEM_STAT_DECL)
+shallow_copy_rtx (const_rtx orig)
 {
   const unsigned int size = rtx_size (orig);
-  rtx const copy = ggc_alloc_zone_rtx_def_stat (&rtl_zone, size PASS_MEM_STAT);
+  rtx const copy = rtx_alloc (GET_CODE (orig));
   return (rtx) memcpy (copy, orig, size);
 }
 
@@ -721,3 +745,211 @@  rtl_check_failed_flag (const char *name,
      name, GET_RTX_NAME (GET_CODE (r)), func, trim_filename (file), line);
 }
 #endif /* ENABLE_RTL_FLAG_CHECKING */
+
+#if 0
+/* Allocate an rtx vector of N elements.
+   Store the length, and initialize all elements to zero.  */
+
+rtvec
+ggc_rtvec_alloc (int n)
+{
+  rtvec rt;
+
+  rt = ggc_alloc_rtvec_sized (n);
+  /* Clear out the vector.  */
+  memset (&rt->elem[0], 0, n * sizeof (rtx));
+
+  PUT_NUM_ELEM (rt, n);
+
+#ifdef GATHER_STATISTICS
+  rtvec_alloc_counts++;
+  rtvec_alloc_sizes += n * sizeof (rtx);
+#endif
+
+  return rt;
+}
+
+/* Allocate an rtx of code CODE.  The CODE is stored in the rtx;
+   all the rest is initialized to zero.  */
+
+rtx
+ggc_rtx_alloc_stat (RTX_CODE code MEM_STAT_DECL)
+{
+  rtx rt = ggc_alloc_zone_rtx_def_stat (&rtl_zone, RTX_CODE_SIZE (code)
+                                        PASS_MEM_STAT);
+
+  /* We want to clear everything up to the FLD array.  Normally, this
+     is one int, but we don't want to assume that and it isn't very
+     portable anyway; this is.  */
+
+  memset (rt, 0, RTX_HDR_SIZE);
+  PUT_CODE (rt, code);
+
+#ifdef GATHER_STATISTICS
+  rtx_alloc_counts[code]++;
+  rtx_alloc_sizes[code] += RTX_CODE_SIZE (code);
+#endif
+
+  return rt;
+}
+
+/* Create a new copy of an rtx.  Only copy just one level.  */
+
+static rtx
+ggc_shallow_copy_rtx_stat (const_rtx orig MEM_STAT_DECL)
+{
+  const unsigned int size = rtx_size (orig);
+  rtx const copy = ggc_alloc_zone_rtx_def_stat (&rtl_zone, size PASS_MEM_STAT);
+  return (rtx) memcpy (copy, orig, size);
+}
+
+#define ggc_shallow_copy_rtx(a) ggc_shallow_copy_rtx_stat (a MEM_STAT_INFO)
+
+rtx
+copy_rtx_to_ggc (rtx orig)
+{
+  rtx copy;
+  int i, j;
+  RTX_CODE code;
+  const char *format_ptr;
+
+  code = GET_CODE (orig);
+
+  switch (code)
+    {
+    case REG:
+    case CC0:
+    case SCRATCH:
+    case INSN:
+    case SET:
+    case CLOBBER:
+    case JUMP_INSN:
+    case CALL_INSN:
+    case DEBUG_INSN:
+    case DEBUG_EXPR:
+      gcc_unreachable ();
+
+    case CONST_INT:
+    case CONST_DOUBLE:
+    case CONST_FIXED:
+    case CONST_VECTOR:
+      /* These are shared and in ggc memory already.  */
+      return orig;
+
+    default:
+      break;
+    }
+
+  /* Copy the various flags, fields, and other information.  We assume
+     that all fields need copying, and then clear the fields that should
+     not be copied.  That is the sensible default behavior, and forces
+     us to explicitly document why we are *not* copying a flag.  */
+  copy = ggc_shallow_copy_rtx (orig);
+
+  /* We do not copy the USED flag, which is used as a mark bit during
+     walks over the RTL.  */
+  RTX_FLAG (copy, used) = 0;
+
+  RTX_FLAG (copy, jump) = RTX_FLAG (orig, jump);
+  RTX_FLAG (copy, call) = RTX_FLAG (orig, call);
+
+  format_ptr = GET_RTX_FORMAT (GET_CODE (copy));
+
+  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (copy)); i++)
+    switch (*format_ptr++)
+      {
+      case 'e':
+	if (XEXP (orig, i) != NULL)
+	  XEXP (copy, i) = copy_rtx_to_ggc (XEXP (orig, i));
+	break;
+
+      case 'E':
+      case 'V':
+	if (XVEC (orig, i) != NULL)
+	  {
+	    XVEC (copy, i) = ggc_rtvec_alloc (XVECLEN (orig, i));
+	    for (j = 0; j < XVECLEN (copy, i); j++)
+	      XVECEXP (copy, i, j) = copy_rtx_to_ggc (XVECEXP (orig, i, j));
+	  }
+	break;
+
+      case 't':
+      case 'w':
+      case 'i':
+      case 's':
+      case 'S':
+      case 'T':
+      case 'u':
+      case 'B':
+      case '0':
+	/* These are left unchanged.  */
+	break;
+
+      default:
+	gcc_unreachable ();
+      }
+  return copy;
+}
+#endif
+
+rtx
+permanent_copy_rtx (rtx x)
+{
+  rtl_on_permanent_obstack ();
+  x = copy_rtx (x);
+  rtl_pop_obstack ();
+  return x;
+}
+
+char *
+rtl_strdup (const char *s)
+{
+  size_t len = strlen (s);
+  char *t = XOBNEWVAR (&permanent_obstack, char, len + 1);
+  memcpy (t, s, len + 1);
+  return t;
+}
+
+/* Mark everything currently on the function obstack as preserved:
+   free_function_rtl will only release RTL allocated after this call.  */
+void
+preserve_rtl (void)
+{
+  rtl_firstobj = (char *) obstack_alloc (&function_obstack, 0);
+}
+
+/* Initialize the RTL obstacks, before any RTL is allocated.  */
+void
+init_rtl (void)
+{
+  gcc_obstack_init (&permanent_obstack);
+  gcc_obstack_init (&function_obstack);
+  rtl_firstobj = (char *) obstack_alloc (rtl_obstack, 0);
+}
+
+/* Free all RTL allocated on the function obstack since the last call
+   to init_rtl or preserve_rtl.  */
+void
+free_function_rtl (void)
+{
+  gcc_assert (permanent_nesting == 0);
+  obstack_free (&function_obstack, rtl_firstobj);
+}
+
+/* Switch RTL allocation to the permanent obstack.  Calls nest.  */
+void
+rtl_on_permanent_obstack (void)
+{
+  permanent_nesting++;
+  rtl_obstack = &permanent_obstack;
+}
+
+/* Undo one rtl_on_permanent_obstack; the function obstack is restored
+   only when the outermost level is popped.  */
+void
+rtl_pop_obstack (void)
+{
+  gcc_assert (permanent_nesting > 0);
+  if (--permanent_nesting == 0)
+    rtl_obstack = &function_obstack;
+}
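
The permanent_nesting counter is what makes the switch composable: a helper
that brackets its own allocations with the push/pop pair can be called from
code that is already on the permanent obstack, and only the outermost pop
restores the function obstack.  For instance:

static void
example (rtx y)
{
  rtx x ATTRIBUTE_UNUSED;

  rtl_on_permanent_obstack ();	/* permanent_nesting == 1 */
  x = permanent_copy_rtx (y);	/* nests to 2, pops back to 1 */
  x = gen_rtx_REG (Pmode, 1);	/* still on the permanent obstack */
  rtl_pop_obstack ();		/* nesting == 0: function obstack again */
}

The asserts in free_function_rtl and rtl_pop_obstack catch the two likely
misuses: freeing per-function RTL while a permanent switch is still open,
and popping more often than was pushed.
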
Index: rtl.h
===================================================================
--- rtl.h	(revision 162821)
+++ rtl.h	(working copy)
@@ -143,8 +143,8 @@  typedef struct
 typedef struct GTY(()) mem_attrs
 {
   tree expr;			/* expr corresponding to MEM.  */
-  rtx offset;			/* Offset from start of DECL, as CONST_INT.  */
-  rtx size;			/* Size in bytes, as a CONST_INT.  */
+  rtx GTY((skip)) offset;	/* Offset from start of DECL, as CONST_INT.  */
+  rtx GTY((skip)) size;		/* Size in bytes, as a CONST_INT.  */
   alias_set_type alias;		/* Memory alias set.  */
   unsigned int align;		/* Alignment of MEM in bits.  */
   unsigned char addrspace;	/* Address space (0 for generic).  */
@@ -185,9 +185,9 @@  typedef union rtunion_def rtunion;
 /* This structure remembers the position of a SYMBOL_REF within an
    object_block structure.  A SYMBOL_REF only provides this information
    if SYMBOL_REF_HAS_BLOCK_INFO_P is true.  */
-struct GTY(()) block_symbol {
+struct block_symbol {
   /* The usual SYMBOL_REF fields.  */
-  rtunion GTY ((skip)) fld[3];
+  rtunion fld[3];
 
   /* The block that contains this object.  */
   struct object_block *block;
@@ -199,7 +199,7 @@  struct GTY(()) block_symbol {
 
 /* Describes a group of objects that are to be placed together in such
    a way that their relative positions are known.  */
-struct GTY(()) object_block {
+struct object_block {
   /* The section in which these objects should be placed.  */
   section *sect;
 
@@ -232,8 +232,7 @@  struct GTY(()) object_block {
 
 /* RTL expression ("rtx").  */
 
-struct GTY((chain_next ("RTX_NEXT (&%h)"),
-	    chain_prev ("RTX_PREV (&%h)"), variable_size)) rtx_def {
+struct rtx_def {
   /* The kind of expression this is.  */
   ENUM_BITFIELD(rtx_code) code: 16;
 
@@ -314,7 +313,7 @@  struct GTY((chain_next ("RTX_NEXT (&%h)"
     struct block_symbol block_sym;
     struct real_value rv;
     struct fixed_value fv;
-  } GTY ((special ("rtx_def"), desc ("GET_CODE (&%0)"))) u;
+  } u;
 };
 
 /* The size in bytes of an rtx header (code, mode and flags).  */
@@ -352,7 +351,7 @@  struct GTY((chain_next ("RTX_NEXT (&%h)"
    for a variable number of things.  The principle use is inside
    PARALLEL expressions.  */
 
-struct GTY((variable_size)) rtvec_def {
+struct rtvec_def {
   int num_elem;		/* number of elements */
   rtx GTY ((length ("%h.num_elem"))) elem[1];
 };
@@ -1543,6 +1542,12 @@  extern int generating_concat_p;
 /* Nonzero when we are expanding trees to RTL.  */
 extern int currently_expanding_to_rtl;
 
+extern struct obstack *rtl_obstack;
+
+extern void rtl_on_permanent_obstack (void);
+extern void rtl_pop_obstack (void);
+extern void free_function_rtl (void);
+
 /* Generally useful functions.  */
 
 /* In expmed.c */
@@ -1555,10 +1560,15 @@  extern rtx plus_constant (rtx, HOST_WIDE
 /* In rtl.c */
 extern rtx rtx_alloc_stat (RTX_CODE MEM_STAT_DECL);
 #define rtx_alloc(c) rtx_alloc_stat (c MEM_STAT_INFO)
+extern rtx ggc_rtx_alloc_stat (RTX_CODE MEM_STAT_DECL);
+#define ggc_rtx_alloc(c) ggc_rtx_alloc_stat (c MEM_STAT_INFO)
+extern char *rtl_strdup (const char *);
 
+extern rtvec ggc_rtvec_alloc (int);
 extern rtvec rtvec_alloc (int);
 extern rtvec shallow_copy_rtvec (rtvec);
 extern bool shared_const_p (const_rtx);
+extern rtx permanent_copy_rtx (rtx);
 extern rtx copy_rtx (rtx);
 extern void dump_rtx_statistics (void);
 
@@ -1566,9 +1576,11 @@  extern void dump_rtx_statistics (void);
 extern rtx copy_rtx_if_shared (rtx);
 
 /* In rtl.c */
+extern void init_rtl (void);
+extern void preserve_rtl (void);
+
 extern unsigned int rtx_size (const_rtx);
-extern rtx shallow_copy_rtx_stat (const_rtx MEM_STAT_DECL);
-#define shallow_copy_rtx(a) shallow_copy_rtx_stat (a MEM_STAT_INFO)
+extern rtx shallow_copy_rtx (const_rtx);
 extern int rtx_equal_p (const_rtx, const_rtx);
 
 /* In emit-rtl.c */
@@ -1917,7 +1929,7 @@  extern void remove_free_INSN_LIST_elem (
 extern rtx remove_list_elem (rtx, rtx *);
 extern rtx remove_free_INSN_LIST_node (rtx *);
 extern rtx remove_free_EXPR_LIST_node (rtx *);
-
+extern void discard_rtx_lists (void);
 
 /* reginfo.c */
 
@@ -1946,15 +1958,15 @@  extern void split_all_insns (void);
 extern unsigned int split_all_insns_noflow (void);
 
 #define MAX_SAVED_CONST_INT 64
-extern GTY(()) rtx const_int_rtx[MAX_SAVED_CONST_INT * 2 + 1];
+extern rtx const_int_rtx[MAX_SAVED_CONST_INT * 2 + 1];
 
 #define const0_rtx	(const_int_rtx[MAX_SAVED_CONST_INT])
 #define const1_rtx	(const_int_rtx[MAX_SAVED_CONST_INT+1])
 #define const2_rtx	(const_int_rtx[MAX_SAVED_CONST_INT+2])
 #define constm1_rtx	(const_int_rtx[MAX_SAVED_CONST_INT-1])
-extern GTY(()) rtx const_true_rtx;
+extern rtx const_true_rtx;
 
-extern GTY(()) rtx const_tiny_rtx[3][(int) MAX_MACHINE_MODE];
+extern rtx const_tiny_rtx[3][(int) MAX_MACHINE_MODE];
 
 /* Returns a constant 0 rtx in mode MODE.  Integer modes are treated the
    same as VOIDmode.  */
@@ -2011,7 +2023,7 @@  enum global_rtl_index
 };
 
 /* Target-dependent globals.  */
-struct GTY(()) target_rtl {
+struct target_rtl {
   /* All references to the hard registers in global_rtl_index go through
      these unique rtl objects.  On machines where the frame-pointer and
      arg-pointer are the same register, they use the same unique object.
@@ -2051,7 +2063,7 @@  struct GTY(()) target_rtl {
   rtx x_static_reg_base_value[FIRST_PSEUDO_REGISTER];
 };
 
-extern GTY(()) struct target_rtl default_target_rtl;
+extern struct target_rtl default_target_rtl;
 #if SWITCHABLE_TARGET
 extern struct target_rtl *this_target_rtl;
 #else
@@ -2437,7 +2449,7 @@  extern int stack_regs_mentioned (const_r
 #endif
 
 /* In toplev.c */
-extern GTY(()) rtx stack_limit_rtx;
+extern rtx stack_limit_rtx;
 
 /* In predict.c */
 extern void invert_br_probabilities (rtx);
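
The intended lifetime model is one bulk free per function, with a
driver shaped roughly like this (sketch only; compile_one_function is
hypothetical, and I'm assuming discard_rtx_lists has to run first so
the INSN_LIST/EXPR_LIST free lists don't point into freed memory):

  void
  compile_one_function (void)
  {
    /* ... expand to RTL and run the RTL passes; everything they
       allocate lands on function_obstack unless explicitly made
       permanent ...  */

    discard_rtx_lists ();   /* drop the free lists first */
    free_function_rtl ();   /* then bulk-free all function-local RTL */
  }
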
Index: integrate.c
===================================================================
--- integrate.c	(revision 162821)
+++ integrate.c	(working copy)
@@ -42,7 +42,6 @@  along with GCC; see the file COPYING3.  
 #include "toplev.h"
 #include "intl.h"
 #include "params.h"
-#include "ggc.h"
 #include "target.h"
 #include "langhooks.h"
 #include "tree-pass.h"
@@ -53,14 +52,14 @@  along with GCC; see the file COPYING3.  
 
 
 /* Private type used by {get/has}_hard_reg_initial_val.  */
-typedef struct GTY(()) initial_value_pair {
+typedef struct initial_value_pair {
   rtx hard_reg;
   rtx pseudo;
 } initial_value_pair;
 typedef struct GTY(()) initial_value_struct {
   int num_entries;
   int max_entries;
-  initial_value_pair * GTY ((length ("%h.num_entries"))) entries;
+  initial_value_pair *entries;
 } initial_value_struct;
 
 static void set_block_origin_self (tree);
@@ -247,18 +246,22 @@  get_hard_reg_initial_val (enum machine_m
   ivs = crtl->hard_reg_initial_vals;
   if (ivs == 0)
     {
-      ivs = ggc_alloc_initial_value_struct ();
+      ivs = XOBNEW (rtl_obstack, struct initial_value_struct);
       ivs->num_entries = 0;
       ivs->max_entries = 5;
-      ivs->entries = ggc_alloc_vec_initial_value_pair (5);
+      ivs->entries = XOBNEWVEC (rtl_obstack, struct initial_value_pair, 5);
       crtl->hard_reg_initial_vals = ivs;
     }
 
   if (ivs->num_entries >= ivs->max_entries)
     {
+      struct initial_value_pair *newvec;
       ivs->max_entries += 5;
-      ivs->entries = GGC_RESIZEVEC (initial_value_pair, ivs->entries,
-				    ivs->max_entries);
+      newvec = XOBNEWVEC (rtl_obstack, struct initial_value_pair,
+			  ivs->max_entries);
+      memcpy (newvec, ivs->entries,
+	      sizeof (struct initial_value_pair) * (ivs->max_entries - 5));
+      ivs->entries = newvec;
     }
 
   ivs->entries[ivs->num_entries].hard_reg = gen_rtx_REG (mode, regno);
@@ -372,5 +375,3 @@  allocate_initial_values (rtx *reg_equiv_
 	}
     }
 }
-
-#include "gt-integrate.h"
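
An obstack cannot resize an object in place once later allocations sit
above it, which is why the code above copies the vector instead of
using GGC_RESIZEVEC; the old vector simply stays allocated until the
next bulk free.  Generically (a sketch with a hypothetical helper
name):

  static void *
  obstack_copy_grow (struct obstack *ob, const void *old,
                     size_t old_size, size_t new_size)
  {
    void *copy = obstack_alloc (ob, new_size);
    memcpy (copy, old, old_size);
    return copy;  /* OLD stays on the obstack, unused, until bulk-freed */
  }
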
Index: combine.c
===================================================================
--- combine.c	(revision 162821)
+++ combine.c	(working copy)
@@ -367,6 +367,8 @@  struct undobuf
 
 static struct undobuf undobuf;
 
+static char *combine_firstobj;
+
 /* Number of times the pseudo being substituted for
    was found and replaced.  */
 
@@ -1147,6 +1149,8 @@  combine_instructions (rtx f, unsigned in
 		 into SUBREGs.  */
 	      note_uses (&PATTERN (insn), record_truncated_values, NULL);
 
+	      combine_firstobj = (char *) obstack_alloc (rtl_obstack, 0);
+
 	      /* Try this insn with each insn it links back to.  */
 
 	      for (links = LOG_LINKS (insn); links; links = XEXP (links, 1))
@@ -3508,6 +3512,7 @@  try_combine (rtx i3, rtx i2, rtx i1, int
        && (! check_asm_operands (newpat) || added_sets_1 || added_sets_2)))
     {
       undo_all ();
+      obstack_free (rtl_obstack, combine_firstobj);
       return 0;
     }
 
@@ -3523,6 +3528,7 @@  try_combine (rtx i3, rtx i2, rtx i1, int
       if (other_code_number < 0 && ! check_asm_operands (other_pat))
 	{
 	  undo_all ();
+	  obstack_free (rtl_obstack, combine_firstobj);
 	  return 0;
 	}
     }
@@ -3536,6 +3542,7 @@  try_combine (rtx i3, rtx i2, rtx i1, int
 	&& sets_cc0_p (newi2pat))
       {
 	undo_all ();
+	obstack_free (rtl_obstack, combine_firstobj);
 	return 0;
       }
   }
@@ -3546,6 +3553,7 @@  try_combine (rtx i3, rtx i2, rtx i1, int
   if (!combine_validate_cost (i1, i2, i3, newpat, newi2pat, other_pat))
     {
       undo_all ();
+      obstack_free (rtl_obstack, combine_firstobj);
       return 0;
     }
 
@@ -4039,6 +4047,8 @@  try_combine (rtx i3, rtx i2, rtx i1, int
   combine_successes++;
   undo_commit ();
 
+  combine_firstobj = (char *) obstack_alloc (rtl_obstack, 0);
+
   if (added_links_insn
       && (newi2pat == 0 || DF_INSN_LUID (added_links_insn) < DF_INSN_LUID (i2))
       && DF_INSN_LUID (added_links_insn) < DF_INSN_LUID (i3))
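
The combine_firstobj dance is the usual obstack mark/release idiom: a
zero-length allocation records the current top of the obstack, and
freeing that mark releases every later allocation in one step.
Roughly (sketch; speculative_rtl is a made-up name):

  static void
  speculative_rtl (bool keep)
  {
    char *mark = (char *) obstack_alloc (rtl_obstack, 0); /* zero-size mark */

    /* ... gen_rtx_* calls build trial patterns on rtl_obstack ...  */

    if (!keep)
      obstack_free (rtl_obstack, mark); /* everything since MARK is gone */
  }
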
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 162821)
+++ Makefile.in	(working copy)
@@ -2540,8 +2540,7 @@  tree-ssa-address.o : tree-ssa-address.c 
    $(SYSTEM_H) $(RTL_H) $(TREE_H) $(TM_P_H) \
    output.h $(DIAGNOSTIC_H) $(TIMEVAR_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
    $(TREE_PASS_H) $(FLAGS_H) $(TREE_INLINE_H) $(RECOG_H) insn-config.h \
-   $(EXPR_H) gt-tree-ssa-address.h $(GGC_H) tree-affine.h $(TARGET_H) \
-   tree-pretty-print.h
+   $(EXPR_H) tree-affine.h $(TARGET_H) tree-pretty-print.h
 tree-ssa-loop-niter.o : tree-ssa-loop-niter.c $(TREE_FLOW_H) $(CONFIG_H) \
    $(SYSTEM_H) $(TREE_H) $(TM_P_H) $(CFGLOOP_H) $(PARAMS_H) \
    $(TREE_INLINE_H) output.h $(DIAGNOSTIC_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
@@ -2907,7 +2906,7 @@  expr.o : expr.c $(CONFIG_H) $(SYSTEM_H) 
    $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H)
 dojump.o : dojump.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
    $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) $(OPTABS_H) $(INSN_ATTR_H) insn-config.h \
-   langhooks.h $(GGC_H) gt-dojump.h vecprim.h $(BASIC_BLOCK_H) output.h
+   langhooks.h vecprim.h $(BASIC_BLOCK_H) output.h
 builtins.o : builtins.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(TREE_H) $(GIMPLE_H) $(FLAGS_H) $(TARGET_H) $(FUNCTION_H) $(REGS_H) \
    $(EXPR_H) $(OPTABS_H) insn-config.h $(RECOG_H) output.h typeclass.h \
@@ -2926,7 +2925,7 @@  expmed.o : expmed.c $(CONFIG_H) $(SYSTEM
    expmed.h
 explow.o : explow.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
    $(FLAGS_H) hard-reg-set.h insn-config.h $(EXPR_H) $(OPTABS_H) $(RECOG_H) \
-   $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) $(EXCEPT_H) $(FUNCTION_H) $(GGC_H) $(TM_P_H) langhooks.h gt-explow.h \
+   $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) $(EXCEPT_H) $(FUNCTION_H) $(TM_P_H) langhooks.h \
    $(TARGET_H) output.h
 optabs.o : optabs.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(TREE_H) $(FLAGS_H) insn-config.h $(EXPR_H) $(OPTABS_H) $(LIBFUNCS_H) \
@@ -2972,7 +2971,7 @@  integrate.o : integrate.c $(CONFIG_H) $(
    $(RTL_H) $(TREE_H) $(FLAGS_H) debug.h $(INTEGRATE_H) insn-config.h \
    $(EXPR_H) $(REGS_H) intl.h $(FUNCTION_H) output.h $(RECOG_H) \
    $(EXCEPT_H) $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) $(PARAMS_H) $(TM_P_H) $(TARGET_H) langhooks.h \
-   gt-integrate.h $(GGC_H) $(TREE_PASS_H) $(DF_H)
+   $(TREE_PASS_H) $(DF_H)
 jump.o : jump.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(FLAGS_H) hard-reg-set.h $(REGS_H) insn-config.h $(RECOG_H) $(EXPR_H) \
    $(EXCEPT_H) $(FUNCTION_H) $(BASIC_BLOCK_H) $(TREE_PASS_H) \
@@ -3067,7 +3066,7 @@  coverage.o : coverage.c $(GCOV_IO_H) $(C
 cselib.o : cselib.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(RECOG_H) \
    $(EMIT_RTL_H) $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) output.h $(FUNCTION_H) $(TREE_PASS_H) \
-   cselib.h gt-cselib.h $(GGC_H) $(TM_P_H) $(PARAMS_H) alloc-pool.h \
+   cselib.h $(TM_P_H) $(PARAMS_H) alloc-pool.h \
    $(HASHTAB_H) $(TARGET_H) $(BITMAP_H)
 cse.o : cse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
    hard-reg-set.h $(FLAGS_H) insn-config.h $(RECOG_H) $(EXPR_H) $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) \
@@ -3096,9 +3095,9 @@  implicit-zee.o : implicit-zee.c $(CONFIG
    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) \
    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
 gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
-   $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
+   $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h \
    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) \
-   $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) gt-gcse.h $(TREE_H) $(TIMEVAR_H) \
+   $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
    intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
    $(DF_H) gcse.h
 store-motion.o : store-motion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
@@ -3301,7 +3300,7 @@  postreload-gcse.o : postreload-gcse.c $(
 caller-save.o : caller-save.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(FLAGS_H) $(REGS_H) hard-reg-set.h insn-config.h $(BASIC_BLOCK_H) $(FUNCTION_H) \
    addresses.h $(RECOG_H) reload.h $(EXPR_H) $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) $(TM_P_H) $(DF_H) \
-   output.h gt-caller-save.h $(GGC_H)
+   output.h
 bt-load.o : bt-load.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(EXCEPT_H) \
    $(RTL_H) hard-reg-set.h $(REGS_H) $(TM_P_H) $(FIBHEAP_H) output.h $(EXPR_H) \
    $(TARGET_H) $(FLAGS_H) $(INSN_ATTR_H) $(FUNCTION_H) $(TREE_PASS_H) \
@@ -3434,7 +3433,7 @@  predict.o: predict.c $(CONFIG_H) $(SYSTE
    $(COVERAGE_H) $(SCEV_H) $(GGC_H) predict.def $(TIMEVAR_H) $(TREE_DUMP_H) \
    $(TREE_FLOW_H) $(TREE_PASS_H) $(EXPR_H) pointer-set.h
 lists.o: lists.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) \
-   $(RTL_H) $(GGC_H) gt-lists.h
+   $(RTL_H)
 bb-reorder.o : bb-reorder.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
    $(RTL_H) $(FLAGS_H) $(TIMEVAR_H) output.h $(CFGLAYOUT_H) $(FIBHEAP_H) \
    $(TARGET_H) $(FUNCTION_H) $(TM_P_H) $(OBSTACK_H) $(EXPR_H) $(REGS_H) \
Index: basic-block.h
===================================================================
--- basic-block.h	(revision 162821)
+++ basic-block.h	(working copy)
@@ -41,7 +41,7 @@  struct GTY(()) edge_def {
   /* Instructions queued on the edge.  */
   union edge_def_insns {
     gimple_seq GTY ((tag ("true"))) g;
-    rtx GTY ((tag ("false"))) r;
+    rtx GTY ((skip, tag ("false"))) r;
   } GTY ((desc ("current_ir_type () == IR_GIMPLE"))) insns;
 
   /* Auxiliary info specific to a pass.  */
@@ -147,9 +147,9 @@  struct GTY((chain_next ("%h.next_bb"), c
   struct basic_block_def *next_bb;
 
   union basic_block_il_dependent {
-      struct gimple_bb_info * GTY ((tag ("0"))) gimple;
-      struct rtl_bb_info * GTY ((tag ("1"))) rtl;
-    } GTY ((desc ("((%1.flags & BB_RTL) != 0)"))) il;
+    struct gimple_bb_info * GTY ((tag ("0"))) gimple;
+    struct rtl_bb_info * GTY ((skip, tag ("1"))) rtl;
+  } GTY ((desc ("((%1.flags & BB_RTL) != 0)"))) il;
 
   /* Expected number of executions: calculated in profile.c.  */
   gcov_type count;
Index: config/i386/i386.h
===================================================================
--- config/i386/i386.h	(revision 162821)
+++ config/i386/i386.h	(working copy)
@@ -2298,13 +2298,13 @@  enum ix86_stack_slot
 /* Machine specific CFA tracking during prologue/epilogue generation.  */
 
 #ifndef USED_FOR_TARGET
-struct GTY(()) machine_cfa_state
+struct machine_cfa_state
 {
   rtx reg;
   HOST_WIDE_INT offset;
 };
 
-struct GTY(()) machine_function {
+struct machine_function {
   struct stack_local_entry *stack_locals;
   const char *some_ld_name;
   int varargs_gpr_size;
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 162821)
+++ config/i386/i386.c	(working copy)
@@ -8537,7 +8537,7 @@  ix86_emit_save_sse_regs_using_mov (rtx p
       }
 }
 
-static GTY(()) rtx queued_cfa_restores;
+static rtx queued_cfa_restores;
 
 /* Add a REG_CFA_RESTORE REG note to INSN or queue them until next stack
    manipulation insn.  Don't add it if the previously
@@ -20130,7 +20130,8 @@  ix86_init_machine_status (void)
 {
   struct machine_function *f;
 
-  f = ggc_alloc_cleared_machine_function ();
+  f = XNEW (struct machine_function);
+  memset (f, 0, sizeof *f);
   f->use_fast_prologue_epilogue_nregs = -1;
   f->tls_descriptor_call_expanded_p = 0;
   f->call_abi = ix86_abi;
@@ -20158,7 +20159,7 @@  assign_386_stack_local (enum machine_mod
     if (s->mode == mode && s->n == n)
       return copy_rtx (s->rtl);
 
-  s = ggc_alloc_stack_local_entry ();
+  s = XOBNEW (rtl_obstack, struct stack_local_entry);
   s->n = n;
   s->mode = mode;
   s->rtl = assign_stack_local (mode, GET_MODE_SIZE (mode), 0);
@@ -20170,18 +20171,20 @@  assign_386_stack_local (enum machine_mod
 
 /* Construct the SYMBOL_REF for the tls_get_addr function.  */
 
-static GTY(()) rtx ix86_tls_symbol;
+static rtx ix86_tls_symbol;
 rtx
 ix86_tls_get_addr (void)
 {
 
   if (!ix86_tls_symbol)
     {
+      rtl_on_permanent_obstack ();
       ix86_tls_symbol = gen_rtx_SYMBOL_REF (Pmode,
 					    (TARGET_ANY_GNU_TLS
 					     && !TARGET_64BIT)
 					    ? "___tls_get_addr"
 					    : "__tls_get_addr");
+      rtl_pop_obstack ();
     }
 
   return ix86_tls_symbol;
@@ -20189,17 +20192,19 @@  ix86_tls_get_addr (void)
 
 /* Construct the SYMBOL_REF for the _TLS_MODULE_BASE_ symbol.  */
 
-static GTY(()) rtx ix86_tls_module_base_symbol;
+static rtx ix86_tls_module_base_symbol;
 rtx
 ix86_tls_module_base (void)
 {
 
   if (!ix86_tls_module_base_symbol)
     {
+      rtl_on_permanent_obstack ();
       ix86_tls_module_base_symbol = gen_rtx_SYMBOL_REF (Pmode,
 							"_TLS_MODULE_BASE_");
       SYMBOL_REF_FLAGS (ix86_tls_module_base_symbol)
 	|= TLS_MODEL_GLOBAL_DYNAMIC << SYMBOL_FLAG_TLS_SHIFT;
+      rtl_pop_obstack ();
     }
 
   return ix86_tls_module_base_symbol;
@@ -20982,6 +20987,7 @@  static rtx
 ix86_static_chain (const_tree fndecl, bool incoming_p)
 {
   unsigned regno;
+  rtx t;
 
   if (!DECL_STATIC_CHAIN (fndecl))
     return NULL;
@@ -21031,7 +21037,10 @@  ix86_static_chain (const_tree fndecl, bo
 	}
     }
 
-  return gen_rtx_REG (Pmode, regno);
+  rtl_on_permanent_obstack ();
+  t = gen_rtx_REG (Pmode, regno);
+  rtl_pop_obstack ();
+  return t;
 }
 
 /* Emit RTL insns to initialize the variable parts of a trampoline.
Index: cfgrtl.c
===================================================================
--- cfgrtl.c	(revision 162821)
+++ cfgrtl.c	(working copy)
@@ -3079,7 +3079,8 @@  void
 init_rtl_bb_info (basic_block bb)
 {
   gcc_assert (!bb->il.rtl);
-  bb->il.rtl = ggc_alloc_cleared_rtl_bb_info ();
+  bb->il.rtl = XOBNEW (rtl_obstack, struct rtl_bb_info);
+  memset (bb->il.rtl, 0, sizeof (struct rtl_bb_info));
 }
 
 
Index: stmt.c
===================================================================
--- stmt.c	(revision 162821)
+++ stmt.c	(working copy)
@@ -140,7 +140,10 @@  label_rtx (tree label)
 
   if (!DECL_RTL_SET_P (label))
     {
-      rtx r = gen_label_rtx ();
+      rtx r;
+      rtl_on_permanent_obstack ();
+      r = gen_label_rtx ();
+      rtl_pop_obstack ();
       SET_DECL_RTL (label, r);
       if (FORCED_LABEL (label) || DECL_NONLOCAL (label))
 	LABEL_PRESERVE_P (r) = 1;
Index: reload1.c
===================================================================
--- reload1.c	(revision 162821)
+++ reload1.c	(working copy)
@@ -1457,7 +1457,9 @@  calculate_needs_all_insns (int global)
 {
   struct insn_chain **pprev_reload = &insns_need_reload;
   struct insn_chain *chain, *next = 0;
-
+#if 0
+  char *reload_rtl_firstobj = XOBNEWVAR (rtl_obstack, char, 0);
+#endif
   something_needs_elimination = 0;
 
   reload_insn_firstobj = XOBNEWVAR (&reload_obstack, char, 0);
@@ -1560,7 +1562,12 @@  calculate_needs_all_insns (int global)
 	  /* Discard any register replacements done.  */
 	  if (did_elimination)
 	    {
-	      obstack_free (&reload_obstack, reload_insn_firstobj);
+#if 0
+	      if (n_reloads != 0)
+		reload_rtl_firstobj = XOBNEWVAR (rtl_obstack, char, 0);
+	      else
+		obstack_free (rtl_obstack, reload_rtl_firstobj);
+#endif
 	      PATTERN (insn) = old_body;
 	      INSN_CODE (insn) = old_code;
 	      REG_NOTES (insn) = old_notes;
@@ -3221,9 +3228,14 @@  eliminate_regs_in_insn (rtx insn, int re
 		  || GET_CODE (PATTERN (insn)) == ASM_INPUT
 		  || DEBUG_INSN_P (insn));
       if (DEBUG_INSN_P (insn))
-	INSN_VAR_LOCATION_LOC (insn)
-	  = eliminate_regs (INSN_VAR_LOCATION_LOC (insn), VOIDmode, insn);
-      return 0;
+	{
+	  new_body
+	    = eliminate_regs (INSN_VAR_LOCATION_LOC (insn), VOIDmode, insn);
+	  if (new_body != INSN_VAR_LOCATION_LOC (insn))
+	    val = 1;
+	  INSN_VAR_LOCATION_LOC (insn) = new_body;
+	}
+      return val;
     }
 
   if (old_set != 0 && REG_P (SET_DEST (old_set))