diff mbox

Eliminate write-only variables

Message ID CAFiYyc1R+AX6PYM2edpuQ9fpw89TFLZ3=wiXXaDT31CRZJaYxA@mail.gmail.com
State New
Headers show

Commit Message

Richard Biener May 20, 2014, 10:56 a.m. UTC
On Tue, May 20, 2014 at 6:29 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On 05/18/2014 08:45 PM, Sandra Loosemore wrote:
>> >On 05/18/2014 02:59 PM, Jan Hubicka wrote:
>> >>For cases like local-statics-7 your approach can be "saved" by adding
>> >>simple IPA analysis
>> >>to look for static vars that are used only by one function and keeping
>> >>your DSE code active
>> >>for them, so we can still get rid of this special case assignments
>> >>during late compilation.
>> >>I am however not quite convinced it is worth the effort - do you have
>> >>some real world
>> >>cases where it helps?
>> >
>> >Um, the well-known benchmark.  ;-)
>>
>> I looked at the generated code for this benchmark and see that your
>> patch is indeed not getting rid of all the pointless static
>> variables, while ours is, so this is quite disappointing.  I'm
>> thinking now that we will still need to retain our patch at least
>> locally, although we'd really like to get it on trunk if possible.
>
> Yours patch can really be improved by adding simple IPA analysis to work
> out what variables are refernced by one function only instead of using
> not-longer-that-relevant local static info.
> As last IPA pass for each static variable with no address taken, look at all
> references and see if they all come from one function or functions inlined to
> it.
>>
>> Here is another testcase vaguely based on the same kind of
>> signal-processing algorithm as the benchmark, but coded without
>> reference to it.  I have arm-none-eabi builds around for both 4.9.0
>> with our remove_local_statics patch applied, and trunk with your
>> patch.  With -O2, our optimization produces 23 instructions and gets
>> rid of all 3 statics, yours produces 33 and only gets rid of 1 of
>> them.
>>
>> Of course it's lame to use static variables in this way, but OTOH
>> it's lame if GCC can't recognize them and optimize them away, too.
>>
>> -Sandra
>>
>
>> void
>> fir (int *coeffs, int coeff_length, int *input, int input_length, int *output)
>> {
>>   static int *coeffp;
>>   static int *inputp;
>>   static int *outputp;
>
> Here inputp read is rmeoved only at 101.dceloop1 pass. That is really late.
> coeffp is removed late, too.
>
>>   int i, c, acc;
>>
>>   for (i = 0; i < input_length; i++)
>>     {
>>       coeffp = coeffs;
>>       inputp = input + i;
>>       outputp = output + i;
>>       acc = 0;
>>       for (c = 0; c < coeff_length; c++)
>>       {
>>         acc += *coeffp * *inputp;
>>         coeffp++;
>>         inputp--;
>>       }
>
> It seems like AA can not work out the fact that inputp is unchanged until that
> late.  Richi, any ideas?

Well, AA is perfectly fine - it's just that this boils down to a partial
redundancy problem.  The stores can be removed by DCE even with


but I don't have a good way of checking the ??? prerequesite
(even without taking its address the statics may be refered to
by a) inline copies, b) ipa-split function parts, c) nested functions).
I'm sure IPA references are not kept up-to-date.

The last reads go away with PRE at the expense of two
additional induction variables.

If we know that a static variable does not have its address taken
and is only refered to from a single function we can in theory
simply rewrite it into SSA form during early opts (before inlining
its body), for example by SRA.  That isn't even restricted to
function-local statics (for statics used in multiple functions but
not having address-taken we can do the same with doing
function entry / exit load / store).  I think that would be a much
better IPA enabled optimization than playing games with
looking at individual stmts.

Richard.

> Honza
>>       *outputp = acc;
>>     }
>> }
>>
>>
>

Comments

Sandra Loosemore May 31, 2014, 6:21 p.m. UTC | #1
On 05/20/2014 04:56 AM, Richard Biener wrote:
> On Tue, May 20, 2014 at 6:29 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> On 05/18/2014 08:45 PM, Sandra Loosemore wrote:
>>>> On 05/18/2014 02:59 PM, Jan Hubicka wrote:
>>>>> For cases like local-statics-7 your approach can be "saved" by adding
>>>>> simple IPA analysis
>>>>> to look for static vars that are used only by one function and keeping
>>>>> your DSE code active
>>>>> for them, so we can still get rid of this special case assignments
>>>>> during late compilation.
>>>>> I am however not quite convinced it is worth the effort - do you have
>>>>> some real world
>>>>> cases where it helps?
>>>>
>>>> Um, the well-known benchmark.  ;-)
>>>
>>> I looked at the generated code for this benchmark and see that your
>>> patch is indeed not getting rid of all the pointless static
>>> variables, while ours is, so this is quite disappointing.  I'm
>>> thinking now that we will still need to retain our patch at least
>>> locally, although we'd really like to get it on trunk if possible.
>>
>> Yours patch can really be improved by adding simple IPA analysis to work
>> out what variables are refernced by one function only instead of using
>> not-longer-that-relevant local static info.
>> As last IPA pass for each static variable with no address taken, look at all
>> references and see if they all come from one function or functions inlined to
>> it.
>>>
>>> Here is another testcase vaguely based on the same kind of
>>> signal-processing algorithm as the benchmark, but coded without
>>> reference to it.  I have arm-none-eabi builds around for both 4.9.0
>>> with our remove_local_statics patch applied, and trunk with your
>>> patch.  With -O2, our optimization produces 23 instructions and gets
>>> rid of all 3 statics, yours produces 33 and only gets rid of 1 of
>>> them.
>>>
>>> Of course it's lame to use static variables in this way, but OTOH
>>> it's lame if GCC can't recognize them and optimize them away, too.
>>>
>>> -Sandra
>>>
>>
>>> void
>>> fir (int *coeffs, int coeff_length, int *input, int input_length, int *output)
>>> {
>>>    static int *coeffp;
>>>    static int *inputp;
>>>    static int *outputp;
>>
>> Here inputp read is rmeoved only at 101.dceloop1 pass. That is really late.
>> coeffp is removed late, too.
>>
>>>    int i, c, acc;
>>>
>>>    for (i = 0; i < input_length; i++)
>>>      {
>>>        coeffp = coeffs;
>>>        inputp = input + i;
>>>        outputp = output + i;
>>>        acc = 0;
>>>        for (c = 0; c < coeff_length; c++)
>>>        {
>>>          acc += *coeffp * *inputp;
>>>          coeffp++;
>>>          inputp--;
>>>        }
>>
>> It seems like AA can not work out the fact that inputp is unchanged until that
>> late.  Richi, any ideas?
>
> Well, AA is perfectly fine - it's just that this boils down to a partial
> redundancy problem.  The stores can be removed by DCE even with
>
> Index: gcc/tree-ssa-dce.c
> ===================================================================
> --- gcc/tree-ssa-dce.c  (revision 210635)
> +++ gcc/tree-ssa-dce.c  (working copy)
> @@ -278,10 +278,20 @@ mark_stmt_if_obviously_necessary (gimple
>         break;
>
>       case GIMPLE_ASSIGN:
> -      if (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
> -         && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt)))
> -       return;
> -      break;
> +      {
> +       tree lhs = gimple_assign_lhs (stmt);
> +       if (TREE_CODE (lhs) == SSA_NAME
> +           && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt)))
> +         return;
> +       if (TREE_CODE (lhs) == VAR_DECL
> +           && !TREE_ADDRESSABLE (lhs)
> +           && TREE_STATIC (lhs)
> +           && !TREE_PUBLIC (lhs) && !DECL_EXTERNAL (lhs)
> +           /* ???  Make sure lhs is only refered to from cfun->decl.  */
> +           && decl_function_context (lhs) == cfun->decl)
> +         return;
> +       break;
> +      }
>
>       default:
>         break;
>
> but I don't have a good way of checking the ??? prerequesite
> (even without taking its address the statics may be refered to
> by a) inline copies, b) ipa-split function parts, c) nested functions).
> I'm sure IPA references are not kept up-to-date.
>
> The last reads go away with PRE at the expense of two
> additional induction variables.
>
> If we know that a static variable does not have its address taken
> and is only refered to from a single function we can in theory
> simply rewrite it into SSA form during early opts (before inlining
> its body), for example by SRA.  That isn't even restricted to
> function-local statics (for statics used in multiple functions but
> not having address-taken we can do the same with doing
> function entry / exit load / store).  I think that would be a much
> better IPA enabled optimization than playing games with
> looking at individual stmts.

FAOD, I'm not currently planning to work on this any more myself.  I 
understand Bernd's patch pretty well and could make some minor changes 
to clean it up if that's what it takes to get it approved, but it sounds 
like what's wanted is a completely different approach, again, and I 
don't have any confidence that I could implement something that wouldn't 
just get stuck in another round of "why don't you try X instead" or 
"maybe it would be better to do this in Y".  :-(

Richard, FWIW I think both of the objections you raised to Bernd's patch 
last year have been resolved now.

https://gcc.gnu.org/ml/gcc-patches/2013-06/msg00322.html

We know that an IPA-only implementation (Jan's patch) isn't sufficient 
to catch the important cases, and the Bernd previously answered the 
question about inlining here:

https://gcc.gnu.org/ml/gcc-patches/2013-06/msg00327.html

-Sandra
Richard Biener June 2, 2014, 9:51 a.m. UTC | #2
On Sat, May 31, 2014 at 8:21 PM, Sandra Loosemore
<sandra@codesourcery.com> wrote:
> On 05/20/2014 04:56 AM, Richard Biener wrote:
>>
>> On Tue, May 20, 2014 at 6:29 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>
>>>> On 05/18/2014 08:45 PM, Sandra Loosemore wrote:
>>>>>
>>>>> On 05/18/2014 02:59 PM, Jan Hubicka wrote:
>>>>>>
>>>>>> For cases like local-statics-7 your approach can be "saved" by adding
>>>>>> simple IPA analysis
>>>>>> to look for static vars that are used only by one function and keeping
>>>>>> your DSE code active
>>>>>> for them, so we can still get rid of this special case assignments
>>>>>> during late compilation.
>>>>>> I am however not quite convinced it is worth the effort - do you have
>>>>>> some real world
>>>>>> cases where it helps?
>>>>>
>>>>>
>>>>> Um, the well-known benchmark.  ;-)
>>>>
>>>>
>>>> I looked at the generated code for this benchmark and see that your
>>>> patch is indeed not getting rid of all the pointless static
>>>> variables, while ours is, so this is quite disappointing.  I'm
>>>> thinking now that we will still need to retain our patch at least
>>>> locally, although we'd really like to get it on trunk if possible.
>>>
>>>
>>> Yours patch can really be improved by adding simple IPA analysis to work
>>> out what variables are refernced by one function only instead of using
>>> not-longer-that-relevant local static info.
>>> As last IPA pass for each static variable with no address taken, look at
>>> all
>>> references and see if they all come from one function or functions
>>> inlined to
>>> it.
>>>>
>>>>
>>>> Here is another testcase vaguely based on the same kind of
>>>> signal-processing algorithm as the benchmark, but coded without
>>>> reference to it.  I have arm-none-eabi builds around for both 4.9.0
>>>> with our remove_local_statics patch applied, and trunk with your
>>>> patch.  With -O2, our optimization produces 23 instructions and gets
>>>> rid of all 3 statics, yours produces 33 and only gets rid of 1 of
>>>> them.
>>>>
>>>> Of course it's lame to use static variables in this way, but OTOH
>>>> it's lame if GCC can't recognize them and optimize them away, too.
>>>>
>>>> -Sandra
>>>>
>>>
>>>> void
>>>> fir (int *coeffs, int coeff_length, int *input, int input_length, int
>>>> *output)
>>>> {
>>>>    static int *coeffp;
>>>>    static int *inputp;
>>>>    static int *outputp;
>>>
>>>
>>> Here inputp read is rmeoved only at 101.dceloop1 pass. That is really
>>> late.
>>> coeffp is removed late, too.
>>>
>>>>    int i, c, acc;
>>>>
>>>>    for (i = 0; i < input_length; i++)
>>>>      {
>>>>        coeffp = coeffs;
>>>>        inputp = input + i;
>>>>        outputp = output + i;
>>>>        acc = 0;
>>>>        for (c = 0; c < coeff_length; c++)
>>>>        {
>>>>          acc += *coeffp * *inputp;
>>>>          coeffp++;
>>>>          inputp--;
>>>>        }
>>>
>>>
>>> It seems like AA can not work out the fact that inputp is unchanged until
>>> that
>>> late.  Richi, any ideas?
>>
>>
>> Well, AA is perfectly fine - it's just that this boils down to a partial
>> redundancy problem.  The stores can be removed by DCE even with
>>
>> Index: gcc/tree-ssa-dce.c
>> ===================================================================
>> --- gcc/tree-ssa-dce.c  (revision 210635)
>> +++ gcc/tree-ssa-dce.c  (working copy)
>> @@ -278,10 +278,20 @@ mark_stmt_if_obviously_necessary (gimple
>>         break;
>>
>>       case GIMPLE_ASSIGN:
>> -      if (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
>> -         && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt)))
>> -       return;
>> -      break;
>> +      {
>> +       tree lhs = gimple_assign_lhs (stmt);
>> +       if (TREE_CODE (lhs) == SSA_NAME
>> +           && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt)))
>> +         return;
>> +       if (TREE_CODE (lhs) == VAR_DECL
>> +           && !TREE_ADDRESSABLE (lhs)
>> +           && TREE_STATIC (lhs)
>> +           && !TREE_PUBLIC (lhs) && !DECL_EXTERNAL (lhs)
>> +           /* ???  Make sure lhs is only refered to from cfun->decl.  */
>> +           && decl_function_context (lhs) == cfun->decl)
>> +         return;
>> +       break;
>> +      }
>>
>>       default:
>>         break;
>>
>> but I don't have a good way of checking the ??? prerequesite
>> (even without taking its address the statics may be refered to
>> by a) inline copies, b) ipa-split function parts, c) nested functions).
>> I'm sure IPA references are not kept up-to-date.
>>
>> The last reads go away with PRE at the expense of two
>> additional induction variables.
>>
>> If we know that a static variable does not have its address taken
>> and is only refered to from a single function we can in theory
>> simply rewrite it into SSA form during early opts (before inlining
>> its body), for example by SRA.  That isn't even restricted to
>> function-local statics (for statics used in multiple functions but
>> not having address-taken we can do the same with doing
>> function entry / exit load / store).  I think that would be a much
>> better IPA enabled optimization than playing games with
>> looking at individual stmts.
>
>
> FAOD, I'm not currently planning to work on this any more myself.  I
> understand Bernd's patch pretty well and could make some minor changes to
> clean it up if that's what it takes to get it approved, but it sounds like
> what's wanted is a completely different approach, again, and I don't have
> any confidence that I could implement something that wouldn't just get stuck
> in another round of "why don't you try X instead" or "maybe it would be
> better to do this in Y".  :-(
>
> Richard, FWIW I think both of the objections you raised to Bernd's patch
> last year have been resolved now.
>
> https://gcc.gnu.org/ml/gcc-patches/2013-06/msg00322.html
>
> We know that an IPA-only implementation (Jan's patch) isn't sufficient to
> catch the important cases, and the Bernd previously answered the question
> about inlining here:
>
> https://gcc.gnu.org/ml/gcc-patches/2013-06/msg00327.html

Well, I'm hesitant to add a new pass just to optimize a (irrelevant in practice)
benchmark.  I'm ok with strengthening existing infrastructure or enhancing
existing passes.

The issue with these mini-passes is that they are very placement sensitive
and you don't easily get secondary effects.  See the comment (and
"patch") about DCE - the ??? comment can be addressed the same
way Bernd addressed it (use cgraph_function_possibly_inlined_p).
Would that optimize the testcase?

Note that the issue with nested function use prevails (not sure how to
solve that - same issue in Bernds patch).

Thanks,
Richard.

> -Sandra
>
Jan Hubicka June 2, 2014, 3:44 p.m. UTC | #3
> 
> Well, I'm hesitant to add a new pass just to optimize a (irrelevant in practice)
> benchmark.  I'm ok with strengthening existing infrastructure or enhancing
> existing passes.
> 
> The issue with these mini-passes is that they are very placement sensitive
> and you don't easily get secondary effects.  See the comment (and
> "patch") about DCE - the ??? comment can be addressed the same
> way Bernd addressed it (use cgraph_function_possibly_inlined_p).
> Would that optimize the testcase?
> 
> Note that the issue with nested function use prevails (not sure how to
> solve that - same issue in Bernds patch).

I think this part of the patch can be made much cleaner by simply adding flag
"used by one function only" to variables.  We can compute it at the end of IPA
queue and remove the code playing with local statics, nested functions and inlines.

I can easily implement this analysis - perhaps it would be useful for AA or
something else, too?

Honza
> 
> Thanks,
> Richard.
> 
> > -Sandra
> >
Richard Biener June 2, 2014, 6:54 p.m. UTC | #4
On June 2, 2014 5:44:13 PM CEST, Jan Hubicka <hubicka@ucw.cz> wrote:
>> 
>> Well, I'm hesitant to add a new pass just to optimize a (irrelevant
>in practice)
>> benchmark.  I'm ok with strengthening existing infrastructure or
>enhancing
>> existing passes.
>> 
>> The issue with these mini-passes is that they are very placement
>sensitive
>> and you don't easily get secondary effects.  See the comment (and
>> "patch") about DCE - the ??? comment can be addressed the same
>> way Bernd addressed it (use cgraph_function_possibly_inlined_p).
>> Would that optimize the testcase?
>> 
>> Note that the issue with nested function use prevails (not sure how
>to
>> solve that - same issue in Bernds patch).
>
>I think this part of the patch can be made much cleaner by simply
>adding flag
>"used by one function only" to variables.  We can compute it at the end
>of IPA
>queue and remove the code playing with local statics, nested functions
>and inlines.
>
>I can easily implement this analysis - perhaps it would be useful for
>AA or
>something else, too?

Yeah, I discussed this with martin today on irc.  For aliasing we'd like to know whether a decl possibly has its address taken. Currently we only trust TREE_ADDRESSABLE for statics - and lto might change those to hidden visibility public :(

So we want deck_referred_to_by_single_function and deck_may_have_address_taken.

Richard.

>Honza
>> 
>> Thanks,
>> Richard.
>> 
>> > -Sandra
>> >
Jan Hubicka June 2, 2014, 6:59 p.m. UTC | #5
> 
> Yeah, I discussed this with martin today on irc.  For aliasing we'd like to know whether a decl possibly has its address taken. Currently we only trust TREE_ADDRESSABLE for statics - and lto might change those to hidden visibility public :(
> 
> So we want deck_referred_to_by_single_function 

OK, I will implement this one in IPA mini-pass.  It is easy.
This property changes by clonning and inlining. What Martin wants to use it for?
(I.e. my plan would be to compute it as last IPA pass making it useless for
IPA analysis&propagation)

> and deck_may_have_address_taken.

We currently clear TREE_ADDRESSABLE for statics that have no address taken at
WPA time and that ought to keep the flag cleared at ltrans (I think I even made
testcase for this)

What I think we could improve is to not consider string functions as ADDR operations
for this analysis, i.e. it is common to only memset to 0.

How may_have_address_taken would differ here?

Honza
Richard Biener June 3, 2014, 9:15 a.m. UTC | #6
On Mon, Jun 2, 2014 at 8:59 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>
>> Yeah, I discussed this with martin today on irc.  For aliasing we'd like to know whether a decl possibly has its address taken. Currently we only trust TREE_ADDRESSABLE for statics - and lto might change those to hidden visibility public :(
>>
>> So we want deck_referred_to_by_single_function
>
> OK, I will implement this one in IPA mini-pass.  It is easy.
> This property changes by clonning and inlining. What Martin wants to use it for?
> (I.e. my plan would be to compute it as last IPA pass making it useless for
> IPA analysis&propagation)

He doesn't want to use it but we talked and he said he'd have a look.

Note that it's important the decls are not refered to by global initializers
either.

Why is a new pass necessary - can't the IPA reference maintaining code
update some flag in the varpool magically?

>> and deck_may_have_address_taken.
>
> We currently clear TREE_ADDRESSABLE for statics that have no address taken at
> WPA time and that ought to keep the flag cleared at ltrans (I think I even made
> testcase for this)
>
> What I think we could improve is to not consider string functions as ADDR operations
> for this analysis, i.e. it is common to only memset to 0.
>
> How may_have_address_taken would differ here?

I want tree.h:may_be_aliased to return false if a decl doesn't have its address
taken.  We don't trust TREE_ADDRESSABLE for TREE_PUBLIC/DECL_EXTERNAL
decls because other TUs may take the address.  I want the check to be replaced
with sth more symtab aware, that is, bring in benefits from LTO (and at the same
time be not confused by statics brought public with hidden visibility).

Richard.

> Honza
Martin Jambor June 3, 2014, 9:17 a.m. UTC | #7
On Mon, Jun 02, 2014 at 08:59:35PM +0200, Jan Hubicka wrote:
> > 
> > Yeah, I discussed this with martin today on irc.  For aliasing we'd like to know whether a decl possibly has its address taken. Currently we only trust TREE_ADDRESSABLE for statics - and lto might change those to hidden visibility public :(
> > 
> > So we want deck_referred_to_by_single_function 
> 
> OK, I will implement this one in IPA mini-pass.  It is easy.
> This property changes by clonning and inlining. What Martin wants to use it for?
> (I.e. my plan would be to compute it as last IPA pass making it useless for
> IPA analysis&propagation)

That is a misunderstanding, I don't plan to use it for anything.  It
was just a part of a discussion about this thread where I simply
proposed exactly the same thing as you did now.

Martin

> 
> > and deck_may_have_address_taken.
> 
> We currently clear TREE_ADDRESSABLE for statics that have no address taken at
> WPA time and that ought to keep the flag cleared at ltrans (I think I even made
> testcase for this)
> 
> What I think we could improve is to not consider string functions as ADDR operations
> for this analysis, i.e. it is common to only memset to 0.
> 
> How may_have_address_taken would differ here?
> 
> Honza
Jan Hubicka June 3, 2014, 4:19 p.m. UTC | #8
> On Mon, Jun 2, 2014 at 8:59 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >>
> >> Yeah, I discussed this with martin today on irc.  For aliasing we'd like to know whether a decl possibly has its address taken. Currently we only trust TREE_ADDRESSABLE for statics - and lto might change those to hidden visibility public :(
> >>
> >> So we want deck_referred_to_by_single_function
> >
> > OK, I will implement this one in IPA mini-pass.  It is easy.
> > This property changes by clonning and inlining. What Martin wants to use it for?
> > (I.e. my plan would be to compute it as last IPA pass making it useless for
> > IPA analysis&propagation)
> 
> He doesn't want to use it but we talked and he said he'd have a look.
> 
> Note that it's important the decls are not refered to by global initializers
> either.
> 
> Why is a new pass necessary - can't the IPA reference maintaining code
> update some flag in the varpool magically?

If the code to add/remove reference was updating the flag, it would became out of
date as we remove callgraph during the late compilation (we do not keep references for
functions we already compiled).
I do not want the flags to be computed before end of IPA queue so they won't become
out of date as we clone/inline.

We basically need to walk references and check that they are all functions &
either one given function F or a clones inlined to it.  Assuming that function
does not have unbounded number of references to given variale, this is
basically constant time test.
> 
> >> and deck_may_have_address_taken.
> >
> > We currently clear TREE_ADDRESSABLE for statics that have no address taken at
> > WPA time and that ought to keep the flag cleared at ltrans (I think I even made
> > testcase for this)
> >
> > What I think we could improve is to not consider string functions as ADDR operations
> > for this analysis, i.e. it is common to only memset to 0.
> >
> > How may_have_address_taken would differ here?
> 
> I want tree.h:may_be_aliased to return false if a decl doesn't have its address
> taken.  We don't trust TREE_ADDRESSABLE for TREE_PUBLIC/DECL_EXTERNAL
> decls because other TUs may take the address.  I want the check to be replaced
> with sth more symtab aware, that is, bring in benefits from LTO (and at the same
> time be not confused by statics brought public with hidden visibility).


I see, are you sure you need to ignore TREE_ADDRESSABLE for PUBLIC/EXTERNAL?
I belive frontends are resposible to set them for all data that may have address
taken (even externally) and IPA code maintains it - we clear the flagonly for
variables that are static.

Or we can just set the flag to true for externally visible vars ealry in IPA
queue if this is false.

Honza
> 
> Richard.
> 
> > Honza
Richard Biener June 4, 2014, 9:12 a.m. UTC | #9
On Tue, Jun 3, 2014 at 6:19 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Mon, Jun 2, 2014 at 8:59 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >>
>> >> Yeah, I discussed this with martin today on irc.  For aliasing we'd like to know whether a decl possibly has its address taken. Currently we only trust TREE_ADDRESSABLE for statics - and lto might change those to hidden visibility public :(
>> >>
>> >> So we want deck_referred_to_by_single_function
>> >
>> > OK, I will implement this one in IPA mini-pass.  It is easy.
>> > This property changes by clonning and inlining. What Martin wants to use it for?
>> > (I.e. my plan would be to compute it as last IPA pass making it useless for
>> > IPA analysis&propagation)
>>
>> He doesn't want to use it but we talked and he said he'd have a look.
>>
>> Note that it's important the decls are not refered to by global initializers
>> either.
>>
>> Why is a new pass necessary - can't the IPA reference maintaining code
>> update some flag in the varpool magically?
>
> If the code to add/remove reference was updating the flag, it would became out of
> date as we remove callgraph during the late compilation (we do not keep references for
> functions we already compiled).
> I do not want the flags to be computed before end of IPA queue so they won't become
> out of date as we clone/inline.
>
> We basically need to walk references and check that they are all functions &
> either one given function F or a clones inlined to it.  Assuming that function
> does not have unbounded number of references to given variale, this is
> basically constant time test.
>>
>> >> and deck_may_have_address_taken.
>> >
>> > We currently clear TREE_ADDRESSABLE for statics that have no address taken at
>> > WPA time and that ought to keep the flag cleared at ltrans (I think I even made
>> > testcase for this)
>> >
>> > What I think we could improve is to not consider string functions as ADDR operations
>> > for this analysis, i.e. it is common to only memset to 0.
>> >
>> > How may_have_address_taken would differ here?
>>
>> I want tree.h:may_be_aliased to return false if a decl doesn't have its address
>> taken.  We don't trust TREE_ADDRESSABLE for TREE_PUBLIC/DECL_EXTERNAL
>> decls because other TUs may take the address.  I want the check to be replaced
>> with sth more symtab aware, that is, bring in benefits from LTO (and at the same
>> time be not confused by statics brought public with hidden visibility).
>
>
> I see, are you sure you need to ignore TREE_ADDRESSABLE for PUBLIC/EXTERNAL?
> I belive frontends are resposible to set them for all data that may have address
> taken (even externally) and IPA code maintains it - we clear the flagonly for
> variables that are static.

Not sure - we've always not trusted this flag on public/externals.  Probably
because there could be aliases to them that have their address taken?
(well, that's true for locals as well ...)

That said, having a varpool way of querying whether a decl may be aliased
by a pointer would be nice (with bonus points of handling the alias case).

> Or we can just set the flag to true for externally visible vars ealry in IPA
> queue if this is false.

Well, let me do a simple check (remove the restriction and bootstrap/test).

Richard.

> Honza
>>
>> Richard.
>>
>> > Honza
diff mbox

Patch

Index: gcc/tree-ssa-dce.c
===================================================================
--- gcc/tree-ssa-dce.c  (revision 210635)
+++ gcc/tree-ssa-dce.c  (working copy)
@@ -278,10 +278,20 @@  mark_stmt_if_obviously_necessary (gimple
       break;

     case GIMPLE_ASSIGN:
-      if (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
-         && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt)))
-       return;
-      break;
+      {
+       tree lhs = gimple_assign_lhs (stmt);
+       if (TREE_CODE (lhs) == SSA_NAME
+           && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt)))
+         return;
+       if (TREE_CODE (lhs) == VAR_DECL
+           && !TREE_ADDRESSABLE (lhs)
+           && TREE_STATIC (lhs)
+           && !TREE_PUBLIC (lhs) && !DECL_EXTERNAL (lhs)
+           /* ???  Make sure lhs is only refered to from cfun->decl.  */
+           && decl_function_context (lhs) == cfun->decl)
+         return;
+       break;
+      }

     default:
       break;