Avoid some unnecessary set_cfun calls

Message ID	20131113094909.GI27813@tucnak.zalov.cz
State	New

Jakub Jelinek Nov. 13, 2013, 9:49 a.m. UTC

Hi!

void f1 (void) {}
__attribute__((target ("avx"))) void f2 (void) {}
__attribute__((target ("avx2"))) void f3 (void) {}
__attribute__((target ("sse3"))) void f4 (void) {}
__attribute__((target ("ssse3"))) void f5 (void) {}
__attribute__((target ("sse4"))) void f6 (void) {}
takes about 3 seconds to compile at -O2, because set_cfun is terribly
expensive and there are hundreds of such calls.
The following patch is just a quick change to avoid some of them:
execute_function_todo starts with:
  unsigned int flags = (size_t)data;
  flags &= ~cfun->last_verified;
  if (!flags)
    return;
and if flags is initially zero, it does nothing.
Similarly, execute_function_dump has the whole body surrounded by
  if (dump_file && current_function_decl)
and thus if dump_file is NULL, there is nothing to do.
So IMHO in neither case (and both happen pretty frequently) do we need to
set_cfun to every function during IPA.
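
To make the argument concrete, here is a minimal toy model in C, not GCC
code.  Every toy_* name is hypothetical and only mirrors the shape of
set_cfun, do_per_function, execute_function_todo and execute_todo as
described above.  It counts how many function switches happen with the
guard in place.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; the names mirror the thread but none of this is GCC code.  */
struct toy_function { unsigned int last_verified; };

static struct toy_function *toy_cfun;   /* models the cfun global */
static unsigned long toy_switch_count;  /* how many switches were performed */

/* Models set_cfun: in GCC the switch can trigger costly re-initialization;
   here it is only counted.  */
static void toy_set_cfun (struct toy_function *fn)
{
  toy_cfun = fn;
  toy_switch_count++;
}

/* Models execute_function_todo: returns early when no flags remain.  */
static void toy_function_todo (void *data)
{
  assert (toy_cfun != NULL);
  unsigned int flags = (unsigned int) (size_t) data;
  flags &= ~toy_cfun->last_verified;
  if (!flags)
    return;
  /* Stand-in for the real per-function work.  */
  toy_cfun->last_verified |= flags;
}

/* Models do_per_function: switches to every function before the callback.  */
static void toy_do_per_function (struct toy_function *fns, size_t n,
                                 void (*callback) (void *), void *data)
{
  for (size_t i = 0; i < n; i++)
    {
      toy_set_cfun (&fns[i]);
      callback (data);
    }
}

/* The guarded caller, shaped like the patched execute_todo: with zero
   flags the loop, and therefore every switch, is skipped.  */
static void toy_execute_todo (struct toy_function *fns, size_t n,
                              unsigned int flags)
{
  if (flags)
    toy_do_per_function (fns, n, toy_function_todo, (void *) (size_t) flags);
}
```

In this model, calling toy_execute_todo with flags == 0 over six functions
performs no switches at all, while any nonzero flags value performs one
switch per function.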

Also, I wonder if we couldn't defer the expensive ira_init.  If the info
computed by it is used only during RTL optimization passes (haven't verified
it yet), then supposedly we could just remember, using some target hook,
what the state was when we last did ira_init, and call ira_init again at
the start of expansion or so if it is different from the last time.
For i?86/x86_64/ppc* this would be whether the current function's
DECL_FUNCTION_SPECIFIC_TARGET is the same as one for which ira_init has been
called, for rx whether interrupt attribute is the same and for mips whatever
is needed.
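
A rough sketch of that deferral idea, as a toy model in C rather than a
patch.  All toy_* names are hypothetical, and the integer key merely stands
in for whatever a target hook would report (for example the function's
DECL_FUNCTION_SPECIFIC_TARGET).

```c
#include <assert.h>

/* Hypothetical key describing the per-function target state.  */
typedef int toy_target_key;

static toy_target_key toy_current_key;      /* state chosen by the last switch */
static toy_target_key toy_initialized_key;  /* state the tables were built for */
static int toy_tables_valid;                /* zero until the first init */
static unsigned long toy_init_count;        /* counts expensive re-inits */

/* Cheap switch: only records which target state is now current.  */
static void toy_switch_function (toy_target_key key)
{
  toy_current_key = key;
}

/* Stand-in for the expensive ira_init-style recomputation.  */
static void toy_expensive_init (toy_target_key key)
{
  toy_initialized_key = key;
  toy_tables_valid = 1;
  toy_init_count++;
}

/* Called where the tables are first needed (the start of expansion in the
   suggestion above); re-initializes only when the state differs.  */
static void toy_ensure_initialized (void)
{
  if (!toy_tables_valid || toy_initialized_key != toy_current_key)
    toy_expensive_init (toy_current_key);
  assert (toy_initialized_key == toy_current_key);
}
```

Switching between functions any number of times costs nothing in this
model; the expensive step runs only when toy_ensure_initialized sees a
state different from the one it last initialized for.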

2013-11-13  Jakub Jelinek  <jakub@redhat.com>

	* passes.c (execute_todo): Don't call do_per_function if
	flags are zero.
	(execute_one_ipa_transform_pass, execute_one_pass): Don't call
	execute_function_dump if dump_file is NULL.


	Jakub

Richard Biener Nov. 13, 2013, 10:27 a.m. UTC | #1

On Wed, 13 Nov 2013, Jakub Jelinek wrote:

> Hi!
> 
> void f1 (void) {}
> __attribute__((target ("avx"))) void f2 (void) {}
> __attribute__((target ("avx2"))) void f3 (void) {}
> __attribute__((target ("sse3"))) void f4 (void) {}
> __attribute__((target ("ssse3"))) void f5 (void) {}
> __attribute__((target ("sse4"))) void f6 (void) {}
> takes about 3 seconds to compile at -O2, because set_cfun is terribly
> expensive and there are hundreds of such calls.
> The following patch is just a quick change to avoid some of them:
> execute_function_todo starts with:
>   unsigned int flags = (size_t)data;
>   flags &= ~cfun->last_verified;
>   if (!flags)
>     return;
> and if flags is initially zero, it does nothing.
> Similarly, execute_function_dump has the whole body surrounded by
>   if (dump_file && current_function_decl)
> and thus if dump_file is NULL, there is nothing to do.
> So IMHO in neither case (which happens pretty frequently) we need to
> set_cfun to every function during IPA.

Ok, but eventually all the TODO-called stuff should be made to work
with a NULL cfun (and execute () should get a struct function argument).

> Also, I wonder if we couldn't defer the expensive ira_init, if the info
> computed by it is used only during RTL optimization passes (haven't verified
> it yet), then supposedly we could just remember using some target hook
> what the last state was when we did ira_init last time, and call ira_init
> again at the start of expansion or so if it is different from the last time.
> For i?86/x86_64/ppc* this would be whether the current function's
> DECL_FUNCTION_SPECIFIC_TARGET is the same as one for which ira_init has been
> called, for rx whether interrupt attribute is the same and for mips whatever
> is needed.

I wonder why we cannot move all the stuff we re-init to a member
of struct function (or rather have a pointer to that info there
to cache it across functions with the same options).  That is,
get rid of more global state?  That would make switching back
and forth cheaper.

Thanks,
Richard.

> 2013-11-13  Jakub Jelinek  <jakub@redhat.com>
> 
> 	* passes.c (execute_todo): Don't call do_per_function if
> 	flags are zero.
> 	(execute_one_ipa_transform_pass, execute_one_pass): Don't call
> 	execute_function_dump if dump_file is NULL.
> 
> --- gcc/passes.c.jj	2013-11-12 11:31:30.000000000 +0100
> +++ gcc/passes.c	2013-11-12 18:52:40.590727542 +0100
> @@ -1875,7 +1875,8 @@ execute_todo (unsigned int flags)
>  
>    statistics_fini_pass ();
>  
> -  do_per_function (execute_function_todo, (void *)(size_t) flags);
> +  if (flags)
> +    do_per_function (execute_function_todo, (void *)(size_t) flags);
>  
>    /* Always remove functions just as before inlining: IPA passes might be
>       interested to see bodies of extern inline functions that are not inlined
> @@ -2065,7 +2066,8 @@ execute_one_ipa_transform_pass (struct c
>    if (profile_report && cfun && (cfun->curr_properties & PROP_cfg))
>      check_profile_consistency (pass->static_pass_number, 1, true);
>  
> -  do_per_function (execute_function_dump, NULL);
> +  if (dump_file)
> +    do_per_function (execute_function_dump, NULL);
>    pass_fini_dump_file (pass);
>  
>    current_pass = NULL;
> @@ -2231,7 +2233,8 @@ execute_one_pass (struct opt_pass *pass)
>      check_profile_consistency (pass->static_pass_number, 1, true);
>  
>    verify_interpass_invariants ();
> -  do_per_function (execute_function_dump, NULL);
> +  if (dump_file)
> +    do_per_function (execute_function_dump, NULL);
>    if (pass->type == IPA_PASS)
>      {
>        struct cgraph_node *node;
> 
> 	Jakub
> 
>

Jakub Jelinek Nov. 13, 2013, 10:48 a.m. UTC | #2

On Wed, Nov 13, 2013 at 11:27:10AM +0100, Richard Biener wrote:
> > Also, I wonder if we couldn't defer the expensive ira_init, if the info
> > computed by it is used only during RTL optimization passes (haven't verified
> > it yet), then supposedly we could just remember using some target hook
> > what the last state was when we did ira_init last time, and call ira_init
> > again at the start of expansion or so if it is different from the last time.
> > For i?86/x86_64/ppc* this would be whether the current function's
> > DECL_FUNCTION_SPECIFIC_TARGET is the same as one for which ira_init has been
> > called, for rx whether interrupt attribute is the same and for mips whatever
> > is needed.
> 
> I wonder why we cannot move all the stuff we re-init to a member
> of struct function (or rather have a pointer to that info there
> to cache it across functions with the same options).  That is,
> get rid of more global state?  That would make switching back
> and forth cheaper.

Isn't that what the SWITCHABLE_TARGET stuff is all about?
So, perhaps we should just define SWITCHABLE_TARGET on i?86/x86_64/powerpc*
(and rx if maintainer cares) and tweak it to somehow attach
struct target_globals * to TARGET_OPTION_NODE.
A problem might be that lots of the save_target_globals
allocated structures are heap allocated rather than GC, so we might leak
memory.  Wonder if save_target_globals couldn't just compute the
aggregate size of all the structures it allocates with XCNEW right now
(plus required alignment if needed) and just allocate them together
with the ggc_alloc_target_globals after the target_globals structure
itself.
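
The single-allocation idea could look roughly like the following sketch.
It is not the real save_target_globals: the toy_* structures are
hypothetical, and calloc stands in for the GC allocator so the example is
self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sub-structures; the real target_globals members differ.  */
struct toy_regs  { int mode_count[8]; };
struct toy_costs { double move_cost[4]; };

struct toy_globals
{
  struct toy_regs *regs;
  struct toy_costs *costs;
};

/* Round OFFSET up to a multiple of ALIGN, which must be a power of two.  */
static size_t toy_align_up (size_t offset, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (offset + align - 1) & ~(align - 1);
}

/* Allocate the header and both sub-structures in one zeroed block.
   calloc stands in for a GC allocator; release with a single free.  */
static struct toy_globals *toy_save_globals (void)
{
  size_t regs_off = toy_align_up (sizeof (struct toy_globals),
                                  _Alignof (struct toy_regs));
  size_t costs_off = toy_align_up (regs_off + sizeof (struct toy_regs),
                                   _Alignof (struct toy_costs));
  size_t total = costs_off + sizeof (struct toy_costs);

  char *block = calloc (1, total);
  if (block == NULL)
    return NULL;

  /* The sub-structures live after the header, at aligned offsets.  */
  struct toy_globals *g = (struct toy_globals *) block;
  g->regs = (struct toy_regs *) (block + regs_off);
  g->costs = (struct toy_costs *) (block + costs_off);
  return g;
}
```

Because everything lives in one block, the lifetime of the sub-structures
is tied to the header, which is the property that would let a collector
reclaim them together.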

	Jakub

Richard Biener Nov. 13, 2013, 10:53 a.m. UTC | #3

On Wed, 13 Nov 2013, Jakub Jelinek wrote:

> On Wed, Nov 13, 2013 at 11:27:10AM +0100, Richard Biener wrote:
> > > Also, I wonder if we couldn't defer the expensive ira_init, if the info
> > > computed by it is used only during RTL optimization passes (haven't verified
> > > it yet), then supposedly we could just remember using some target hook
> > > what the last state was when we did ira_init last time, and call ira_init
> > > again at the start of expansion or so if it is different from the last time.
> > > For i?86/x86_64/ppc* this would be whether the current function's
> > > DECL_FUNCTION_SPECIFIC_TARGET is the same as one for which ira_init has been
> > > called, for rx whether interrupt attribute is the same and for mips whatever
> > > is needed.
> > 
> > I wonder why we cannot move all the stuff we re-init to a member
> > of struct function (or rather have a pointer to that info there
> > to cache it across functions with the same options).  That is,
> > get rid of more global state?  That would make switching back
> > and forth cheaper.
> 
> Isn't that what the SWITCHABLE_TARGET stuff is all about?

Maybe - I didn't follow all the changes in this area.

> So, perhaps we should just define SWITCHABLE_TARGET on i?86/x86_64/powerpc*
> (and rx if maintainer cares) and tweak it to attach somehow
> struct target_globals * to TARGET_OPTION_NODE somehow.
> A problem might be that lots of the save_target_globals
> allocated structures are heap allocated rather than GC, so we might leak
> memory.  Wonder if save_target_globals couldn't just compute the
> aggregate size of all the structures it allocates with XCNEW right now
> (plus required alignment if needed) and just allocate them together
> with the ggc_alloc_target_globals after the target_globals structure
> itself.

If you want to re-use it from functions with same options don't you
have a hashtable anyway?  You could add a reference count.

Richard.

Jakub Jelinek Nov. 13, 2013, 10:57 a.m. UTC | #4

On Wed, Nov 13, 2013 at 11:53:32AM +0100, Richard Biener wrote:
> > So, perhaps we should just define SWITCHABLE_TARGET on i?86/x86_64/powerpc*
> > (and rx if maintainer cares) and tweak it to attach somehow
> > struct target_globals * to TARGET_OPTION_NODE somehow.
> > A problem might be that lots of the save_target_globals
> > allocated structures are heap allocated rather than GC, so we might leak
> > memory.  Wonder if save_target_globals couldn't just compute the
> > aggregate size of all the structures it allocates with XCNEW right now
> > (plus required alignment if needed) and just allocate them together
> > with the ggc_alloc_target_globals after the target_globals structure
> > itself.
> 
> If you want to re-use it from functions with same options don't you
> have a hashtable anyway?  You could add a reference count.

build_target_option_node is such a hash table for that.

	Jakub

Richard Biener Nov. 13, 2013, 11:08 a.m. UTC | #5

On Wed, 13 Nov 2013, Jakub Jelinek wrote:

> On Wed, Nov 13, 2013 at 11:53:32AM +0100, Richard Biener wrote:
> > > So, perhaps we should just define SWITCHABLE_TARGET on i?86/x86_64/powerpc*
> > > (and rx if maintainer cares) and tweak it to attach somehow
> > > struct target_globals * to TARGET_OPTION_NODE somehow.
> > > A problem might be that lots of the save_target_globals
> > > allocated structures are heap allocated rather than GC, so we might leak
> > > memory.  Wonder if save_target_globals couldn't just compute the
> > > aggregate size of all the structures it allocates with XCNEW right now
> > > (plus required alignment if needed) and just allocate them together
> > > with the ggc_alloc_target_globals after the target_globals structure
> > > itself.
> > 
> > If you want to re-use it from functions with same options don't you
> > have a hashtable anyway?  You could add a reference count.
> 
> build_target_option_node is such a hash table for that.

Ah, and we already have some custom pointers in the tree node.  Looks
like a suitable place to put in memory management then.
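
As an illustration only, a reference-counted cache keyed by the option set
might be shaped like this.  The toy_* names are hypothetical, a linear scan
stands in for the hash table behind build_target_option_node, and calloc
stands in for building the saved target state.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_NODES 16

/* Hypothetical cached state attached to one distinct option string.  */
struct toy_option_node
{
  char options[32];       /* key, e.g. "avx2" */
  void *globals;          /* lazily created per-options state */
  unsigned int refcount;  /* number of functions currently using it */
};

static struct toy_option_node toy_nodes[TOY_MAX_NODES];
static size_t toy_node_count;
static unsigned long toy_globals_built;  /* counts expensive constructions */

/* Return the node for OPTIONS, creating it on first use, and take a
   reference.  A linear scan stands in for a real hash table.  */
static struct toy_option_node *toy_acquire (const char *options)
{
  struct toy_option_node *node = NULL;
  for (size_t i = 0; i < toy_node_count; i++)
    if (strcmp (toy_nodes[i].options, options) == 0)
      node = &toy_nodes[i];

  if (node == NULL)
    {
      assert (toy_node_count < TOY_MAX_NODES);
      assert (strlen (options) < sizeof toy_nodes[0].options);
      node = &toy_nodes[toy_node_count++];
      strcpy (node->options, options);
    }

  if (node->globals == NULL)
    {
      node->globals = calloc (1, 64);  /* stand-in for the saved state */
      assert (node->globals != NULL);
      toy_globals_built++;
    }
  node->refcount++;
  return node;
}

/* Drop one reference; free the cached state when the last user is gone.  */
static void toy_release (struct toy_option_node *node)
{
  assert (node->refcount > 0);
  if (--node->refcount == 0)
    {
      free (node->globals);
      node->globals = NULL;
    }
}
```

Two functions compiled with the same options share one node and one copy
of the state in this model, and the state is freed once the last of them
releases it.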

Richard.

Martin Jambor Nov. 13, 2013, 12:33 p.m. UTC | #6

Hi,

On Wed, Nov 13, 2013 at 10:49:09AM +0100, Jakub Jelinek wrote:
> Hi!
> 
> void f1 (void) {}
> __attribute__((target ("avx"))) void f2 (void) {}
> __attribute__((target ("avx2"))) void f3 (void) {}
> __attribute__((target ("sse3"))) void f4 (void) {}
> __attribute__((target ("ssse3"))) void f5 (void) {}
> __attribute__((target ("sse4"))) void f6 (void) {}
> takes about 3 seconds to compile at -O2, because set_cfun is terribly
> expensive and there are hundreds of such calls.
> The following patch is just a quick change to avoid some of them:
> execute_function_todo starts with:
>   unsigned int flags = (size_t)data;
>   flags &= ~cfun->last_verified;
>   if (!flags)
>     return;
> and if flags is initially zero, it does nothing.
> Similarly, execute_function_dump has the whole body surrounded by
>   if (dump_file && current_function_decl)
> and thus if dump_file is NULL, there is nothing to do.
> So IMHO in neither case (which happens pretty frequently) we need to
> set_cfun to every function during IPA.
> 
> Also, I wonder if we couldn't defer the expensive ira_init, if the info
> computed by it is used only during RTL optimization passes (haven't verified
> it yet), then supposedly we could just remember using some target hook
> what the last state was when we did ira_init last time, and call ira_init
> again at the start of expansion or so if it is different from the
> last time.

I was wondering whether the expensive parts of set_cfun could only be
run in pass_all_optimizations (and the -Og equivalent) but not when
changing functions in early and IPA passes.

Martin

Richard Biener Nov. 13, 2013, 12:53 p.m. UTC | #7

On Wed, 13 Nov 2013, Martin Jambor wrote:

> Hi,
> 
> On Wed, Nov 13, 2013 at 10:49:09AM +0100, Jakub Jelinek wrote:
> > Hi!
> > 
> > void f1 (void) {}
> > __attribute__((target ("avx"))) void f2 (void) {}
> > __attribute__((target ("avx2"))) void f3 (void) {}
> > __attribute__((target ("sse3"))) void f4 (void) {}
> > __attribute__((target ("ssse3"))) void f5 (void) {}
> > __attribute__((target ("sse4"))) void f6 (void) {}
> > takes about 3 seconds to compile at -O2, because set_cfun is terribly
> > expensive and there are hundreds of such calls.
> > The following patch is just a quick change to avoid some of them:
> > execute_function_todo starts with:
> >   unsigned int flags = (size_t)data;
> >   flags &= ~cfun->last_verified;
> >   if (!flags)
> >     return;
> > and if flags is initially zero, it does nothing.
> > Similarly, execute_function_dump has the whole body surrounded by
> >   if (dump_file && current_function_decl)
> > and thus if dump_file is NULL, there is nothing to do.
> > So IMHO in neither case (which happens pretty frequently) we need to
> > set_cfun to every function during IPA.
> > 
> > Also, I wonder if we couldn't defer the expensive ira_init, if the info
> > computed by it is used only during RTL optimization passes (haven't verified
> > it yet), then supposedly we could just remember using some target hook
> > what the last state was when we did ira_init last time, and call ira_init
> > again at the start of expansion or so if it is different from the
> > last time.
> 
> I was wondering whether the expensive parts of set_cfun could only be
> run in pass_all_optimizations (and the -Og equivalent) but not when
> changing functions in early and IPA passes.

Sounds like a hack ;)

Better get things working without the cfun/current_function_decl globals.
Wasn't there someone replacing all implicit uses with explicit ones
for stuff like n_basic_blocks?

Richard.

Martin Jambor Nov. 13, 2013, 1:07 p.m. UTC | #8

On Wed, Nov 13, 2013 at 01:53:00PM +0100, Richard Biener wrote:
> On Wed, 13 Nov 2013, Martin Jambor wrote:
> 
> > Hi,
> > 
> > On Wed, Nov 13, 2013 at 10:49:09AM +0100, Jakub Jelinek wrote:
> > > Hi!
> > > 
> > > void f1 (void) {}
> > > __attribute__((target ("avx"))) void f2 (void) {}
> > > __attribute__((target ("avx2"))) void f3 (void) {}
> > > __attribute__((target ("sse3"))) void f4 (void) {}
> > > __attribute__((target ("ssse3"))) void f5 (void) {}
> > > __attribute__((target ("sse4"))) void f6 (void) {}
> > > takes about 3 seconds to compile at -O2, because set_cfun is terribly
> > > expensive and there are hundreds of such calls.
> > > The following patch is just a quick change to avoid some of them:
> > > execute_function_todo starts with:
> > >   unsigned int flags = (size_t)data;
> > >   flags &= ~cfun->last_verified;
> > >   if (!flags)
> > >     return;
> > > and if flags is initially zero, it does nothing.
> > > Similarly, execute_function_dump has the whole body surrounded by
> > >   if (dump_file && current_function_decl)
> > > and thus if dump_file is NULL, there is nothing to do.
> > > So IMHO in neither case (which happens pretty frequently) we need to
> > > set_cfun to every function during IPA.
> > > 
> > > Also, I wonder if we couldn't defer the expensive ira_init, if the info
> > > computed by it is used only during RTL optimization passes (haven't verified
> > > it yet), then supposedly we could just remember using some target hook
> > > what the last state was when we did ira_init last time, and call ira_init
> > > again at the start of expansion or so if it is different from the
> > > last time.
> > 
> > I was wondering whether the expensive parts of set_cfun could only be
> > run in pass_all_optimizations (and the -Og equivalent) but not when
> > changing functions in early and IPA passes.
> 
> Sounds like a hack ;)

Well, a little bit.

> 
> Better get things working without the cfun/current_function_decl globals.
> Wasn't there someone replacing all implicit uses with explicit ones
> for stuff like n_basic_blocks?

I'm not so sure, I think that having an implicit value for the
function parameter makes sense in all intraprocedural passes.  But it
would be great if it was no more than an implicit parameter.

One item on my TODO list is to try to make at least the
summary-building stages of IPA passes not depend on cfun.  That should
be easy if they did not modify the function bodies.  But PR 54477
shows that they do and the bug has so far scared me away.

Martin

David Malcolm Nov. 13, 2013, 1:22 p.m. UTC | #9

On Wed, 2013-11-13 at 13:53 +0100, Richard Biener wrote:
> On Wed, 13 Nov 2013, Martin Jambor wrote:
> 
> > Hi,
> > 
> > On Wed, Nov 13, 2013 at 10:49:09AM +0100, Jakub Jelinek wrote:
> > > Hi!
> > > 
> > > void f1 (void) {}
> > > __attribute__((target ("avx"))) void f2 (void) {}
> > > __attribute__((target ("avx2"))) void f3 (void) {}
> > > __attribute__((target ("sse3"))) void f4 (void) {}
> > > __attribute__((target ("ssse3"))) void f5 (void) {}
> > > __attribute__((target ("sse4"))) void f6 (void) {}
> > > takes about 3 seconds to compile at -O2, because set_cfun is terribly
> > > expensive and there are hundreds of such calls.
> > > The following patch is just a quick change to avoid some of them:
> > > execute_function_todo starts with:
> > >   unsigned int flags = (size_t)data;
> > >   flags &= ~cfun->last_verified;
> > >   if (!flags)
> > >     return;
> > > and if flags is initially zero, it does nothing.
> > > Similarly, execute_function_dump has the whole body surrounded by
> > >   if (dump_file && current_function_decl)
> > > and thus if dump_file is NULL, there is nothing to do.
> > > So IMHO in neither case (which happens pretty frequently) we need to
> > > set_cfun to every function during IPA.
> > > 
> > > Also, I wonder if we couldn't defer the expensive ira_init, if the info
> > > computed by it is used only during RTL optimization passes (haven't verified
> > > it yet), then supposedly we could just remember using some target hook
> > > what the last state was when we did ira_init last time, and call ira_init
> > > again at the start of expansion or so if it is different from the
> > > last time.
> > 
> > I was wondering whether the expensive parts of set_cfun could only be
> > run in pass_all_optimizations (and the -Og equivalent) but not when
> > changing functions in early and IPA passes.
> 
> Sounds like a hack ;)
> 
> Better get things working without the cfun/current_function_decl globals.
> Wasn't there someone replacing all implicit uses with explicit ones
> for stuff like n_basic_blocks?

I was working on this:
http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00780.html
though I switched to other tasks I felt were higher priority; sorry.

Do you still want me to go ahead and commit the series of changes you
pre-approved there?

i.e. the "n_basic_blocks" macro goes away in favor of:
   n_basic_blocks_for_fn (cfun)
as a renaming of the existing n_basic_blocks_for_function macro,
followed up by analogous changes to the other macros.
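
For readers unfamiliar with the two macro styles, here is a self-contained
toy version.  The toy_* structures and macros are hypothetical and only
imitate the shape of n_basic_blocks and n_basic_blocks_for_fn; they are not
the GCC definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy structures; the field name imitates GCC's style but is hypothetical.  */
struct toy_cfg { int x_n_basic_blocks; };
struct toy_function { struct toy_cfg *cfg; };

static struct toy_function *toy_cfun;  /* models the cfun global */

/* Implicit style: silently reads the global.  */
#define toy_n_basic_blocks (toy_cfun->cfg->x_n_basic_blocks)

/* Explicit style: the function is a visible argument.  */
#define toy_n_basic_blocks_for_fn(FN) ((FN)->cfg->x_n_basic_blocks)

/* With the explicit form, code can inspect many functions without
   switching the global to each one first.  */
static int toy_total_blocks (struct toy_function **fns, int n)
{
  int total = 0;
  for (int i = 0; i < n; i++)
    {
      assert (fns[i] != NULL && fns[i]->cfg != NULL);
      total += toy_n_basic_blocks_for_fn (fns[i]);
    }
  return total;
}
```

In this sketch toy_total_blocks never touches toy_cfun, which is the kind
of independence from the global that the explicit macros are meant to
enable.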

Or should I repost before committing?

Dave

Richard Biener Nov. 13, 2013, 1:44 p.m. UTC | #10

On Wed, 13 Nov 2013, David Malcolm wrote:

> On Wed, 2013-11-13 at 13:53 +0100, Richard Biener wrote:
> > On Wed, 13 Nov 2013, Martin Jambor wrote:
> > 
> > > Hi,
> > > 
> > > On Wed, Nov 13, 2013 at 10:49:09AM +0100, Jakub Jelinek wrote:
> > > > Hi!
> > > > 
> > > > void f1 (void) {}
> > > > __attribute__((target ("avx"))) void f2 (void) {}
> > > > __attribute__((target ("avx2"))) void f3 (void) {}
> > > > __attribute__((target ("sse3"))) void f4 (void) {}
> > > > __attribute__((target ("ssse3"))) void f5 (void) {}
> > > > __attribute__((target ("sse4"))) void f6 (void) {}
> > > > takes about 3 seconds to compile at -O2, because set_cfun is terribly
> > > > expensive and there are hundreds of such calls.
> > > > The following patch is just a quick change to avoid some of them:
> > > > execute_function_todo starts with:
> > > >   unsigned int flags = (size_t)data;
> > > >   flags &= ~cfun->last_verified;
> > > >   if (!flags)
> > > >     return;
> > > > and if flags is initially zero, it does nothing.
> > > > Similarly, execute_function_dump has the whole body surrounded by
> > > >   if (dump_file && current_function_decl)
> > > > and thus if dump_file is NULL, there is nothing to do.
> > > > So IMHO in neither case (which happens pretty frequently) we need to
> > > > set_cfun to every function during IPA.
> > > > 
> > > > Also, I wonder if we couldn't defer the expensive ira_init, if the info
> > > > computed by it is used only during RTL optimization passes (haven't verified
> > > > it yet), then supposedly we could just remember using some target hook
> > > > what the last state was when we did ira_init last time, and call ira_init
> > > > again at the start of expansion or so if it is different from the
> > > > last time.
> > > 
> > > I was wondering whether the expensive parts of set_cfun could only be
> > > run in pass_all_optimizations (and the -Og equivalent) but not when
> > > changing functions in early and IPA passes.
> > 
> > Sounds like a hack ;)
> > 
> > Better get things working without the cfun/current_function_decl globals.
> > Wasn't there someone replacing all implicit uses with explicit ones
> > for stuff like n_basic_blocks?
> 
> I was working on this:
> http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00780.html
> though I switched to other tasks I felt were higher priority; sorry.
> 
> Do you still want me to go ahead and commit the series of changes you
> pre-approved there?
> 
> i.e. the "n_basic_blocks" macro goes away in favor of:
>    n_basic_blocks_for_fn (cfun)
> as a renaming of the existing n_basic_blocks_for_function macro,
> followed up by analogous changes to the other macros.
> 
> Or should I repost before committing?

I'd say create the n_basic_blocks patch and post it, that gives
people a chance to object.  If nobody chimes in I approve it
and pre-approve the rest ;)

Using n_basic_blocks_for_fn (cfun) might feel backwards if
eventually we'd want to C++-ify struct function and make
n_basic_blocks a member function which would make it
cfun->n_basic_blocks () instead.  Ok, I think that will get
us into C++ bikeshedding again ;)

Thanks,
Richard.

Richard Sandiford Nov. 16, 2013, 11:25 a.m. UTC | #11

Jakub Jelinek <jakub@redhat.com> writes:
> On Wed, Nov 13, 2013 at 11:27:10AM +0100, Richard Biener wrote:
>> > Also, I wonder if we couldn't defer the expensive ira_init, if the info
>> > computed by it is used only during RTL optimization passes (haven't verified
>> > it yet), then supposedly we could just remember using some target hook
>> > what the last state was when we did ira_init last time, and call ira_init
>> > again at the start of expansion or so if it is different from the last time.
>> > For i?86/x86_64/ppc* this would be whether the current function's
>> > DECL_FUNCTION_SPECIFIC_TARGET is the same as one for which ira_init has been
>> > called, for rx whether interrupt attribute is the same and for mips whatever
>> > is needed.
>> 
>> I wonder why we cannot move all the stuff we re-init to a member
>> of struct function (or rather have a pointer to that info there
>> to cache it across functions with the same options).  That is,
>> get rid of more global state?  That would make switching back
>> and forth cheaper.
>
> Isn't that what the SWITCHABLE_TARGET stuff is all about?
> So, perhaps we should just define SWITCHABLE_TARGET on i?86/x86_64/powerpc*
> (and rx if maintainer cares) and tweak it to attach somehow
> struct target_globals * to TARGET_OPTION_NODE somehow.
> A problem might be that lots of the save_target_globals
> allocated structures are heap allocated rather than GC, so we might leak
> memory.  Wonder if save_target_globals couldn't just compute the
> aggregate size of all the structures it allocates with XCNEW right now
> (plus required alignment if needed) and just allocate them together
> with the ggc_alloc_target_globals after the target_globals structure
> itself.

Yeah, that might be worth doing.  I think the only non-GCed structures
with subpointers are target_ira_int and target_lra_int, but we could
probably convert them to GCed structures.  (And perhaps use the same
technique recursively.  E.g. LRA could work out the maximum number of
operand_alternative structures needed and allocate them in one go.)

Thanks,
Richard
