Patchwork Symtab cleanup 10/17 remove unnecesary DECL_ARGUMENTS and DECL_RESULT

login
register
mail settings
Submitter Jan Hubicka
Date Sept. 2, 2013, 10:52 a.m.
Message ID <20130902105208.GB5201@kam.mff.cuni.cz>
Download mbox | patch
Permalink /patch/271871/
State New
Headers show

Comments

Jan Hubicka - Sept. 2, 2013, 10:52 a.m.
> On Aug 21, 2013, at 11:47 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> The problem is that DECL_ARGUMENTS of the thunk (aka _ZThn528_N1D3fooEv) is used during thunk code-generation, and thunk code-generation happens during the output of D::foo.
> 
> > I see, I will try to modify i386 backend to not output thunks.  The problem
> > indeed is that thunks' arguments are built by the front-end and they are no
> > longer streamed.  I am surprised i386 survives this given that it also
> > produces some gimple thunks.
> > 
> > I guess easiest way around is to make them to be streamed same way as we stream
> > functions that are used as abstract origin.  I have different plans in this
> > direction - I want to lower thunks to gimple form early so they go through the
> > usual channel and get i.e. the profile read correctly.
> 
> So, any news on this?
I was bid dragged into other debuggning. Sorry for the delay.
This patch seems to fix the problems on ppc64 hacked to not use asm thunks.
Can you, please, test it that it solves problems on your target?

Honza
Richard Guenther - Sept. 2, 2013, 11:18 a.m.
On Mon, Sep 2, 2013 at 12:52 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Aug 21, 2013, at 11:47 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> The problem is that DECL_ARGUMENTS of the thunk (aka _ZThn528_N1D3fooEv) is used during thunk code-generation, and thunk code-generation happens during the output of D::foo.
>>
>> > I see, I will try to modify i386 backend to not output thunks.  The problem
>> > indeed is that thunks' arguments are built by the front-end and they are no
>> > longer streamed.  I am surprised i386 survives this given that it also
>> > produces some gimple thunks.
>> >
>> > I guess easiest way around is to make them to be streamed same way as we stream
>> > functions that are used as abstract origin.  I have different plans in this
>> > direction - I want to lower thunks to gimple form early so they go through the
>> > usual channel and get i.e. the profile read correctly.
>>
>> So, any news on this?
> I was bid dragged into other debuggning. Sorry for the delay.
> This patch seems to fix the problems on ppc64 hacked to not use asm thunks.
> Can you, please, test it that it solves problems on your target?
>
> Honza
>
> Index: cgraphunit.c
> ===================================================================
> --- cgraphunit.c        (revision 202153)
> +++ cgraphunit.c        (working copy)
> @@ -1414,14 +1414,18 @@ expand_thunk (struct cgraph_node *node)
>    tree virtual_offset = NULL;
>    tree alias = node->callees->callee->symbol.decl;
>    tree thunk_fndecl = node->symbol.decl;
> -  tree a = DECL_ARGUMENTS (thunk_fndecl);
> +  tree a;
> +
> +  if (in_lto_p)
> +    cgraph_get_body (node);

That looks gross.  It cannot possibly be the correct fix for this.

Richard.

> +  a = DECL_ARGUMENTS (thunk_fndecl);
>
>    current_function_decl = thunk_fndecl;
>
>    /* Ensure thunks are emitted in their correct sections.  */
>    resolve_unique_section (thunk_fndecl, 0, flag_function_sections);
>
> -  if (this_adjusting
> +  if (this_adjusting
>        && targetm.asm_out.can_output_mi_thunk (thunk_fndecl, fixed_offset,
>                                               virtual_value, alias))
>      {
> Index: lto-streamer-in.c
> ===================================================================
> --- lto-streamer-in.c   (revision 202153)
> +++ lto-streamer-in.c   (working copy)
> @@ -998,6 +998,7 @@ input_function (tree fn_decl, struct dat
>    free_dominance_info (CDI_DOMINATORS);
>    free_dominance_info (CDI_POST_DOMINATORS);
>    free (stmts);
> +  pop_cfun ();
>  }
>
>
> @@ -1086,8 +1087,6 @@ lto_read_body (struct lto_file_decl_data
>
>        /* Restore decl state */
>        file_data->current_decl_state = file_data->global_decl_state;
> -
> -      pop_cfun ();
>      }
>
>    lto_data_in_delete (data_in);
> Index: lto-streamer-out.c
> ===================================================================
> --- lto-streamer-out.c  (revision 202153)
> +++ lto-streamer-out.c  (working copy)
> @@ -1982,8 +1982,7 @@ lto_output (void)
>        cgraph_node *node = dyn_cast <cgraph_node> (snode);
>        if (node
>           && lto_symtab_encoder_encode_body_p (encoder, node)
> -         && !node->symbol.alias
> -         && !node->thunk.thunk_p)
> +         && !node->symbol.alias)
>         {
>  #ifdef ENABLE_CHECKING
>           gcc_assert (!bitmap_bit_p (output, DECL_UID (node->symbol.decl)));
Jan Hubicka - Sept. 2, 2013, 11:43 a.m.
> > +  tree a;
> > +
> > +  if (in_lto_p)
> > +    cgraph_get_body (node);
> 
> That looks gross.  It cannot possibly be the correct fix for this.
DECL_ARGUMENTS/DECL_RESULT are now part of function body.  cgraph_get_body is
there to load it for you when you need it. We are going to expand the function
so it makes sense to get it.

The same is done by the passmanager when function is going to be expanded. Only
difference here is that thunks do not go through the passmanager.

I can drop in_lto_p (the function does nothing when body is already there)

Honza
Richard Guenther - Sept. 2, 2013, 11:48 a.m.
On Mon, Sep 2, 2013 at 1:43 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > +  tree a;
>> > +
>> > +  if (in_lto_p)
>> > +    cgraph_get_body (node);
>>
>> That looks gross.  It cannot possibly be the correct fix for this.
> DECL_ARGUMENTS/DECL_RESULT are now part of function body.

Well, there is still fallout from this change so I'm not convinced
this will stay
this way.  Also we stream the function-decl that refers to these fields in
the global section which means we have a layering violation.  Which means
DECL_ARGUMENTS and DECL_RESULT should be moved to struct function?
Of course frontends may not be happy with that (given DECL_ARGUMENTS
is also used in function declarations)

Please consider reverting these changes (at least the DECL_ARGUMENTS one).

>  cgraph_get_body is
> there to load it for you when you need it. We are going to expand the function
> so it makes sense to get it.

Then all DECL_ARGUMENTS/DECL_RESULT users need to be audited, no?

Richard.

> The same is done by the passmanager when function is going to be expanded. Only
> difference here is that thunks do not go through the passmanager.
>
> I can drop in_lto_p (the function does nothing when body is already there)
>
> Honza
Jan Hubicka - Sept. 2, 2013, 12:58 p.m.
> 
> Well, there is still fallout from this change so I'm not convinced
> this will stay
> this way.  Also we stream the function-decl that refers to these fields in

As far as I know there are two problems
 1) problem with the thunk expansion not getting DECL_ARGUMENTS/DECL_RESULT
    addressed by this patch
 2) ICE in gcc.dg/torture/pr8081.c
    Here the problem is in variable type of return value that gets streamed
    twice and we end up with duplicate.

    As mentioned earlier, I want to handle this by introducing
    variable_sized_function_type_p that will make those go specially the
    old way (with DECL_ARGUMENTS/DECL_RESULT in the global stream).

    I did not send the patch, because I think the variable sized parameters
    were handled incorrectly before my changes, too, and the fix is not
    so trivial.  It just makes us
    not ICE the same way as before.  The testcase is:
	/* { dg-do run } */

	extern void abort (void);
	int
	main (int argc, char **argv)
	{
	  int size = 10;
	  typedef struct
	    {
	      char val[size];
	    }
	  block;
	  block a, b;
	  block __attribute__((noinline))
	  retframe_block ()
	    {
	      return *(block *) &b;
	    }
	  b.val[0] = 1;
	  b.val[9] = 2;
	  a=retframe_block ();
	  if (a.val[0] != 1
	      || a.val[9] != 2)
	    abort ();
	  return 0;
	}
    So here the size is not parameter of the function, it is a local vairable
    (later turned into structure) that gets into local function stream.  

    The type is:
	 <record_type 0x7ffff7511930 block sizes-gimplified type_0 type_1 BLK
	    size <var_decl 0x7ffff752f428 D.1738
		type <integer_type 0x7ffff740d0a8 bitsizetype public unsigned TI
		    size <integer_cst 0x7ffff73f7fc0 constant 128>
		    unit size <integer_cst 0x7ffff73f7fe0 constant 16>
		    align 128 symtab 0 alias set -1 canonical type 0x7ffff740d0a8 precision 128 min <integer_cst 0x7ffff740f000 0> max <integer_cst 0x7ffff73f7fa0 -1>>
		used unsigned ignored TI file t.c line 8 col 19 size <integer_cst 0x7ffff73f7fc0 128> unit size <integer_cst 0x7ffff73f7fe0 16>
		align 128 context <function_decl 0x7ffff7510f00 main>
		chain <var_decl 0x7ffff752f4c0 D.1739 type <integer_type 0x7ffff740d738 long int>
		    used ignored DI file t.c line 10 col 20
		    size <integer_cst 0x7ffff73f7f40 constant 64>
		    unit size <integer_cst 0x7ffff73f7f60 constant 8>
		    align 64 context <function_decl 0x7ffff7510f00 main> chain <var_decl 0x7ffff752f558 D.1740>>>
	    unit size <var_decl 0x7ffff752f2f8 D.1736
		type <integer_type 0x7ffff740d000 sizetype public unsigned DI size <integer_cst 0x7ffff73f7f40 64> unit size <integer_cst 0x7ffff73f7f60 8>
		    align 64 symtab 0 alias set -1 canonical type 0x7ffff740d000 precision 64 min <integer_cst 0x7ffff73f7f80 0> max <integer_cst 0x7ffff73f7000 18446744073709551615>>
		used unsigned ignored DI file t.c line 8 col 19 size <integer_cst 0x7ffff73f7f40 64> unit size <integer_cst 0x7ffff73f7f60 8>
		align 64 context <function_decl 0x7ffff7510f00 main>
		chain <var_decl 0x7ffff752f390 D.1737 type <integer_type 0x7ffff740d0a8 bitsizetype>
		    used unsigned ignored TI file t.c line 8 col 19 size <integer_cst 0x7ffff73f7fc0 128> unit size <integer_cst 0x7ffff73f7fe0 16>
		    align 128 context <function_decl 0x7ffff7510f00 main> chain <var_decl 0x7ffff752f428 D.1738>>>
	    align 8 symtab 0 alias set -1 canonical type 0x7ffff7511738
	    fields <field_decl 0x7ffff752f098 val
		type <array_type 0x7ffff7511888 type <integer_type 0x7ffff740d3f0 char>
		    sizes-gimplified type_1 BLK size <var_decl 0x7ffff752f428 D.1738> unit size <var_decl 0x7ffff752f2f8 D.1736>
		    align 8 symtab 0 alias set -1 structural equality domain <integer_type 0x7ffff75117e0>>
		decl_0 BLK file t.c line 10 col 20 size <var_decl 0x7ffff752f428 D.1738> unit size <var_decl 0x7ffff752f2f8 D.1736>
		align 8 offset_align 128
		offset <integer_cst 0x7ffff73f7f80 constant 0>
		bit offset <integer_cst 0x7ffff740f000 constant 0> context <record_type 0x7ffff7511738>> context <function_decl 0x7ffff7510f00 main>
	    pointer_to_this <pointer_type 0x7ffff7511bd0> chain <type_decl 0x7ffff7530000 D.1726>>

    My understanding is that we end up with silently putting automatic variable
    D.1738 into global stream and we happen to not explode later.  At stream
    in time we will have duplicated D.1738 parameters and it is only becuase
    the one in TYPE_SIZE is unused making us to not ICE. 

    I also do not see how we can produce valid debug info for the nested function.

    Only scenario the patch will solve is where the size of type is given
    by the parm_decl alone, since parm_decl will go to global stream.

    If the size of array is something like parm_decl * 4, i think gimplifier
    will put in an temporary re-introducing the problem, but I did not double
    check.

> the global section which means we have a layering violation.  Which means
> DECL_ARGUMENTS and DECL_RESULT should be moved to struct function?
> Of course frontends may not be happy with that (given DECL_ARGUMENTS
> is also used in function declarations)

It is the same as DECL_INITIAL that is already considered to be part of
function body by LTO, but it is not in DECL_STRUCT_FUNCTION because the way
frontends builds it.

I think we can modify frotnends to put those into function bodies, but it is
bit harder.
> 
> Please consider reverting these changes (at least the DECL_ARGUMENTS one).

It is certainly more involved change than I hoped for.  I am fine with
 a) fixing the two issues and getting it work
 b) reverting it and trying to get it working off-tree
 c) declaring the approach flawed and giving up.

The change brings noticeable improvements for firefox memory footprint. Two
important things has changed since then however
 1) I now remove DECL_ARGUMENTS/DECL_RESULT for external declarations that
    is ortoghonal to this problem.  
 2) We now have 600000 instead of 3700000 function bodies to deal with at
    WPA stage as a result of early virtual function removal.

If you are interested, I can get memory/streaming time stats for the change
on the current mainline.

> 
> >  cgraph_get_body is
> > there to load it for you when you need it. We are going to expand the function
> > so it makes sense to get it.
> 
> Then all DECL_ARGUMENTS/DECL_RESULT users need to be audited, no?

Yes, I went through this and explained my findings in the original email
(by greping DECL_ARGUMENT/DECL_RESULT and trying to understand for all the uses).

Local gimple passes use them, but those are safe.  For non-local things it
is primarily dwarf2out that needs them also for abstract origins of functions
where I explicitely introduced the abstract origin tracking for (there is
already bug with DECL_INITIAL not being there) and this early cgraph
machinery - thunks and clone matherialization.  I tought I fixed thunks,
but I pugged in only the return value issue (that is easy since DECL_RESULT
is not always provided by the FE)

Honza
> 
> Richard.
> 
> > The same is done by the passmanager when function is going to be expanded. Only
> > difference here is that thunks do not go through the passmanager.
> >
> > I can drop in_lto_p (the function does nothing when body is already there)
> >
> > Honza
Jan Hubicka - Sept. 2, 2013, 1:02 p.m.
> > 
> > Well, there is still fallout from this change so I'm not convinced
> > this will stay
> > this way.  Also we stream the function-decl that refers to these fields in
> 
> As far as I know there are two problems
>  1) problem with the thunk expansion not getting DECL_ARGUMENTS/DECL_RESULT
>     addressed by this patch
>  2) ICE in gcc.dg/torture/pr8081.c
>     Here the problem is in variable type of return value that gets streamed
>     twice and we end up with duplicate.
> 
>     As mentioned earlier, I want to handle this by introducing
>     variable_sized_function_type_p that will make those go specially the
>     old way (with DECL_ARGUMENTS/DECL_RESULT in the global stream).
> 
>     I did not send the patch, because I think the variable sized parameters
>     were handled incorrectly before my changes, too, and the fix is not
>     so trivial.  It just makes us
>     not ICE the same way as before.  The testcase is:
> 	/* { dg-do run } */
> 
> 	extern void abort (void);
> 	int
> 	main (int argc, char **argv)
> 	{
> 	  int size = 10;
> 	  typedef struct
> 	    {
> 	      char val[size];
> 	    }
> 	  block;
> 	  block a, b;
> 	  block __attribute__((noinline))
> 	  retframe_block ()
> 	    {
> 	      return *(block *) &b;
> 	    }
> 	  b.val[0] = 1;
> 	  b.val[9] = 2;
> 	  a=retframe_block ();
> 	  if (a.val[0] != 1
> 	      || a.val[9] != 2)
> 	    abort ();
> 	  return 0;
> 	}
>     So here the size is not parameter of the function, it is a local vairable
>     (later turned into structure) that gets into local function stream.  

	^^^ reading my email, perhaps in this case the correct fix is to make
	tree-nested to fix-up the TYPE_SIZE?  It knows how to access the
	frame variables from outer function via the invisible link pointer
	passed and it knows how to update gimple.  That may also fix the issue
	with debug info.
Richard Guenther - Sept. 2, 2013, 1:48 p.m.
On Mon, Sep 2, 2013 at 3:02 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >
>> > Well, there is still fallout from this change so I'm not convinced
>> > this will stay
>> > this way.  Also we stream the function-decl that refers to these fields in
>>
>> As far as I know there are two problems
>>  1) problem with the thunk expansion not getting DECL_ARGUMENTS/DECL_RESULT
>>     addressed by this patch
>>  2) ICE in gcc.dg/torture/pr8081.c
>>     Here the problem is in variable type of return value that gets streamed
>>     twice and we end up with duplicate.
>>
>>     As mentioned earlier, I want to handle this by introducing
>>     variable_sized_function_type_p that will make those go specially the
>>     old way (with DECL_ARGUMENTS/DECL_RESULT in the global stream).
>>
>>     I did not send the patch, because I think the variable sized parameters
>>     were handled incorrectly before my changes, too, and the fix is not
>>     so trivial.  It just makes us
>>     not ICE the same way as before.  The testcase is:
>>       /* { dg-do run } */
>>
>>       extern void abort (void);
>>       int
>>       main (int argc, char **argv)
>>       {
>>         int size = 10;
>>         typedef struct
>>           {
>>             char val[size];
>>           }
>>         block;
>>         block a, b;
>>         block __attribute__((noinline))
>>         retframe_block ()
>>           {
>>             return *(block *) &b;
>>           }
>>         b.val[0] = 1;
>>         b.val[9] = 2;
>>         a=retframe_block ();
>>         if (a.val[0] != 1
>>             || a.val[9] != 2)
>>           abort ();
>>         return 0;
>>       }
>>     So here the size is not parameter of the function, it is a local vairable
>>     (later turned into structure) that gets into local function stream.
>
>         ^^^ reading my email, perhaps in this case the correct fix is to make
>         tree-nested to fix-up the TYPE_SIZE?  It knows how to access the
>         frame variables from outer function via the invisible link pointer
>         passed and it knows how to update gimple.  That may also fix the issue
>         with debug info.

But we still refer to the local entity from TREE_TYPE of the function decl,
no?

As for debug info we probably have to insert proper DEBUG_STMTs that
tell us how to construct those.  Don't we do this kind of dance already?

All the gimplfied type-sizes in TYPE_SIZE are basically 'useless', they
exist only for the gimplfiers sake and as 'marker' that the type has
variable size.
All references to the size are explicit in the code (and I believe
debug info has
to be emitted before gimplifying or doesn't properly handle such variadic types
at all).  That's some huge cleanup opportunity (I suppose NULLifying TYPE_SIZE
doesn't work).

Richard.
Jan Hubicka - Sept. 2, 2013, 3:19 p.m.
> 
> But we still refer to the local entity from TREE_TYPE of the function decl,
> no?

depending on definition of local entity.  I tought we was discussing if PARM_DECL
is local or not.

I spent some time thining about the whole streaming scheme and I think we do have
important side cases handled incorrectly.  Lets try to discuss things.  I will try
to summarize my understanding to make sure we are on sync and I do not miss something
important.

So we now have two types of streams
  - the global stream
    this stream is one per compilation unit and it gets load into WPA
    and merged with other streams by tree mergina and LTO-symtab resolution.
  - local stream.
    this stream is local for function (and hope for variable initializer soon, too),
    it is read when we need the function body and it is not in any way merged with
    global stream, except that the references to global streams are resolved
    after merging

We do have way to express references from local stream to global stream.  We however
can not express
  A) references from global stream to local stream 
  B) references in between two different local streams.

The decision what should go to local or global stream is basically motivated by
  1) everything needed for interprocedural optimization has to be global
  2) everything related to function bodies should be local.

For tree nodes, the decision about where the node goes is made by
tree_is_indexable.  The current set of rules is
 - parm_decl/result_decl is local
   (before my change resut_decl was global and parm_decl was local, why those
    was not the same escapes me, but it needed ugly fixup in ipa-prop that
    needed to reconnect global DECL pointers to local DECL pointers after
    streaming in)
 - function variables are local if they are not static.
   (Even external declarations are local, not only automatic vars.)
 - Types are global unless they are variably modified types
   (The variably modified type code walks into subtypes, so even pointers
   to variably modified type are local or function declarations mentioning
   these.)
 - FIELD_DECL is global unless it is part of variably modified type.
 - SSA_NAME is local despite logic of tree_is_indexable because it is handled
   separately as part of gimple and gimple is always local.
 - LABEL_DECLs are global even when their address is not taken and it would make
   more sense to hide it in the body sections

Now I think we are safe only when our references never violate A) and B) and moreover 
C) same entity that needs to be unique (such as declarations by one decl rule
   or BLOCK) is not streamed into multiple streams.  We never merge things
   between streams and duplicating those is forbidden by design.

It is an irritating property of our streaming machinery that it is not
diagnozing violations of A/B/C. Violations of type A and B are basically hidden
by introducing violations of type C and compiler tends to be resistant to those
in a sense that is usually does not ICE.

Now I only once added sanity check for C and larnt that

 1) variably sized types may end up being twice - in global stream because they are
    mentioned by FUNCTION_DECL/VARIABLE_DECL or TYPE_DECL
    and local stream because of tree_is_indexable rule

    Moreover they violate A after landing in global stream because they refer
    to temporaries in the gimple body.
    As you mentioned, we may want to clear TYPE_SIZE and variable DECL_FIELD_OFFSET.
    But sadly I think dwarf2out needs both and gimple land needs DECL_FIELD_OFFSET.
 2) TYPE_DECLS of local types (common in C++) tends to drag into global stream
    BLOCK that later brings to global stream the local variables.

    Making function local TYPE_DECLs local seems cool, but it break A because
    function/local static variable types may reffer to them.
 3) _ORIGIN pointers violate B by design.
 4) LABEL_DECL with address taken violate A and B.

I am sure this list is not complette.

Now we can play about what can be global or local.  Like putting PARM_DECL into global
stream only, or returning RESULT_DECL into global decl stream.  But
I do not really see how much chance it has to fix 1-4.  I think the general flaw
is that functions/static variables (we absolutely need in global stream) may reffer
to local entities by contextes and variably sized types.  

Either we need to consistently not stream pointers causing the violations above
- i.e. clear them in free_lang_data and make backend not dependent on them. Or
we need to find way to make references of type A and perhaps B possible.  I can
think of <local_ref function_decl localid> tree construct that we patch in and
replace by actual declaration after reading function_decl body when it is
needed (by dwarf2out or tree optimizers), but it does not seem cool to me either.
We will like soon notice that with comdats localid needs to be unique across
the merged comdat bodies or somehting similarly nasty.

Another alternative I can think of is to declare duplication in between streams
sane and allowed, but teach streaming in code to patch in the in-memory
representation from global stream by local stream one when this happens.  It
seems bit tricky to implement - the decl states of functions would need to
explicitly record the duplications in between local and global streams.

Honza
Jan Hubicka - Sept. 2, 2013, 3:40 p.m.
Hi,
the following testcase illustrate problem with the offset.  Sadly it ICEs even w/o
LTO:
evans:/abuild/jh/trunk-3/build-inst12/gcc/:[1]# ./xgcc -B ./ -O2 ~/tt.c      
/root/tt.c: In function 'main':
/root/tt.c:24:11: warning: overflow in implicit constant conversion [-Woverflow]
           b.b=0xdead;
           ^
/root/tt.c:25:12: internal compiler error: in assign_stack_temp_for_type, at function.c:762
           a=retframe_block ();
            ^
0x7ae808 assign_stack_temp_for_type(machine_mode, long, tree_node*)
        ../../gcc/function.c:762


        /* { dg-do run } */

        extern void abort (void);
        int
        main (int argc, char **argv)
        { 
          int size = 10;
          typedef struct
            { 
              char val[size];
              char b;
            }
          block;
          block a, b;
          block __attribute__((noinline))
          retframe_block ()
            { 
            if (b.b != 0xdead)
                abort ();
              return *(block *) &b;
            }
          b.val[0] = 1;
          b.val[9] = 2;
          b.b=0xdead;
          a=retframe_block ();
          if (a.val[0] != 1
              || a.val[9] != 2)
            abort ();
          return 0;
        }
Honza
Richard Guenther - Sept. 3, 2013, 8:27 a.m.
On Mon, Sep 2, 2013 at 5:19 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>
>> But we still refer to the local entity from TREE_TYPE of the function decl,
>> no?
>
> depending on definition of local entity.  I tought we was discussing if PARM_DECL
> is local or not.
>
> I spent some time thining about the whole streaming scheme and I think we do have
> important side cases handled incorrectly.  Lets try to discuss things.  I will try
> to summarize my understanding to make sure we are on sync and I do not miss something
> important.
>
> So we now have two types of streams
>   - the global stream
>     this stream is one per compilation unit and it gets load into WPA
>     and merged with other streams by tree mergina and LTO-symtab resolution.
>   - local stream.
>     this stream is local for function (and hope for variable initializer soon, too),
>     it is read when we need the function body and it is not in any way merged with
>     global stream, except that the references to global streams are resolved
>     after merging
>
> We do have way to express references from local stream to global stream.  We however
> can not express
>   A) references from global stream to local stream
>   B) references in between two different local streams.

Correct.

> The decision what should go to local or global stream is basically motivated by
>   1) everything needed for interprocedural optimization has to be global
>   2) everything related to function bodies should be local.

I'd rather formulate it as "everything not needed at WPA time should be
local if it can be local"

> For tree nodes, the decision about where the node goes is made by
> tree_is_indexable.  The current set of rules is
>  - parm_decl/result_decl is local
>    (before my change resut_decl was global and parm_decl was local, why those
>     was not the same escapes me, but it needed ugly fixup in ipa-prop that
>     needed to reconnect global DECL pointers to local DECL pointers after
>     streaming in)
>  - function variables are local if they are not static.
>    (Even external declarations are local, not only automatic vars.)
>  - Types are global unless they are variably modified types
>    (The variably modified type code walks into subtypes, so even pointers
>    to variably modified type are local or function declarations mentioning
>    these.)
>  - FIELD_DECL is global unless it is part of variably modified type.
>  - SSA_NAME is local despite logic of tree_is_indexable because it is handled
>    separately as part of gimple and gimple is always local.
>  - LABEL_DECLs are global even when their address is not taken and it would make
>    more sense to hide it in the body sections
>
> Now I think we are safe only when our references never violate A) and B) and moreover
> C) same entity that needs to be unique (such as declarations by one decl rule
>    or BLOCK) is not streamed into multiple streams.  We never merge things
>    between streams and duplicating those is forbidden by design.
>
> It is an irritating property of our streaming machinery that it is not
> diagnozing violations of A/B/C. Violations of type A and B are basically hidden
> by introducing violations of type C and compiler tends to be resistant to those
> in a sense that is usually does not ICE.

Yes, and with the re-organized streaming we can now do much better and
much faster!  This is because with the SCC finding we can record in O(SCC size)
whether a) the SCC contains entities that are local (and to what function) or
global, b) the SCC refers to other SCCs that are local (and in what function)
We can then detect whether we run into the situation that the new SCC refers
to SCCs that are in different local sections (which would mean those
SCCs in turn
would need to be globalized or that the current SCC needs to be duplicated into
all local sections it refers to).

I would not be surprised if our current tree structures, when strictly following
restrictions A) and B) would need to be either all global or all duplicated into
the local sections :/

At least we can now more easily formalize this (and can compute
"scc_is_indexable" which is the property we need now)

> Now I only once added sanity check for C and larnt that
>
>  1) variably sized types may end up being twice - in global stream because they are
>     mentioned by FUNCTION_DECL/VARIABLE_DECL or TYPE_DECL
>     and local stream because of tree_is_indexable rule
>
>     Moreover they violate A after landing in global stream because they refer
>     to temporaries in the gimple body.
>     As you mentioned, we may want to clear TYPE_SIZE and variable DECL_FIELD_OFFSET.
>     But sadly I think dwarf2out needs both and gimple land needs DECL_FIELD_OFFSET.
>  2) TYPE_DECLS of local types (common in C++) tends to drag into global stream
>     BLOCK that later brings to global stream the local variables.
>
>     Making function local TYPE_DECLs local seems cool, but it break A because
>     function/local static variable types may reffer to them.
>  3) _ORIGIN pointers violate B by design.
>  4) LABEL_DECL with address taken violate A and B.
>
> I am sure this list is not complette.
>
> Now we can play about what can be global or local.  Like putting PARM_DECL into global
> stream only, or returning RESULT_DECL into global decl stream.  But
> I do not really see how much chance it has to fix 1-4.  I think the general flaw
> is that functions/static variables (we absolutely need in global stream) may reffer
> to local entities by contextes and variably sized types.
>
> Either we need to consistently not stream pointers causing the violations above
> - i.e. clear them in free_lang_data and make backend not dependent on them. Or
> we need to find way to make references of type A and perhaps B possible.  I can
> think of <local_ref function_decl localid> tree construct that we patch in and
> replace by actual declaration after reading function_decl body when it is
> needed (by dwarf2out or tree optimizers), but it does not seem cool to me either.
> We will like soon notice that with comdats localid needs to be unique across
> the merged comdat bodies or somehting similarly nasty.
>
> Another alternative I can think of is to declare duplication in between streams
> sane and allowed, but teach streaming in code to patch in the in-memory
> representation from global stream by local stream one when this happens.  It
> seems bit tricky to implement - the decl states of functions would need to
> explicitly record the duplications in between local and global streams.

I think duplication is what we do right now, otherwise we couldn't fill in all
tree references at streaming time.

What we need to do is compute the decision once per SCC during the
SCC build and record statistics on how many trees we'll duplicate that
way (not sure if we even properly duplicate them now with the SCC streaming ...)

Richard.

> Honza
Jan Hubicka - Sept. 3, 2013, 1:42 p.m.
> > The decision what should go to local or global stream is basically motivated by
> >   1) everything needed for interprocedural optimization has to be global
> >   2) everything related to function bodies should be local.
> 
> I'd rather formulate it as "everything not needed at WPA time should be
> local if it can be local"

OK.  With additional rule that we want to have at least something local,
or the local sections are pointless.
> 
> > For tree nodes, the decision about where the node goes is made by
> > tree_is_indexable.  The current set of rules is
> >  - parm_decl/result_decl is local
> >    (before my change resut_decl was global and parm_decl was local, why those
> >     was not the same escapes me, but it needed ugly fixup in ipa-prop that
> >     needed to reconnect global DECL pointers to local DECL pointers after
> >     streaming in)
> >  - function variables are local if they are not static.
> >    (Even external declarations are local, not only automatic vars.)
> >  - Types are global unless they are variably modified types
> >    (The variably modified type code walks into subtypes, so even pointers
> >    to variably modified type are local or function declarations mentioning
> >    these.)
> >  - FIELD_DECL is global unless it is part of variably modified type.
> >  - SSA_NAME is local despite logic of tree_is_indexable because it is handled
> >    separately as part of gimple and gimple is always local.
> >  - LABEL_DECLs are global even when their address is not taken and it would make
> >    more sense to hide it in the body sections
> >
> > Now I think we are safe only when our references never violate A) and B) and moreover
> > C) same entity that needs to be unique (such as declarations by one decl rule
> >    or BLOCK) is not streamed into multiple streams.  We never merge things
> >    between streams and duplicating those is forbidden by design.
> >
> > It is an irritating property of our streaming machinery that it is not
> > diagnozing violations of A/B/C. Violations of type A and B are basically hidden
> > by introducing violations of type C and compiler tends to be resistant to those
> > in a sense that is usually does not ICE.
> 
> Yes, and with the re-organized streaming we can now do much better and
> much faster!  This is because with the SCC finding we can record in O(SCC size)
> whether a) the SCC contains entities that are local (and to what function) or
> global, b) the SCC refers to other SCCs that are local (and in what function)
> We can then detect whether we run into the situation that the new SCC refers
> to SCCs that are in different local sections (which would mean those
> SCCs in turn
> would need to be globalized or that the current SCC needs to be duplicated into
> all local sections it refers to).
> 
> I would not be surprised if our current tree structures, when strictly following
> restrictions A) and B) would need to be either all global or all duplicated into
> the local sections :/
> 
> At least we can now more easily formalize this (and can compute
> "scc_is_indexable" which is the property we need now)

I did not think of this.  SCC code can definitely help us to not put into local
section an SCC component that contains something that needs to be global.  This
is pretty cool in a way that tree_is_indexable don't need to do the closure
itself (as it does for variadic types currently).

I think the common case however is that you have variable A that is local.  SCC
streaming will make it local since it see no must be global entities in that
var.  Later you stream separate SCC component containing a type whose size
depends on A.  SCC already made A local and you get a violation.


> > Another alternative I can think of is to declare duplication in between streams
> > sane and allowed, but teach streaming in code to patch in the in-memory
> > representation from global stream by local stream one when this happens.  It
> > seems bit tricky to implement - the decl states of functions would need to
> > explicitly record the duplications in between local and global streams.
> 
> I think duplication is what we do right now, otherwise we couldn't fill in all
> tree references at streaming time.

Yes, we duplicate and that leads to illformed IL.
> 
> What we need to do is compute the decision once per SCC during the
> SCC build and record statistics on how many trees we'll duplicate that
> way (not sure if we even properly duplicate them now with the SCC streaming ...)

Except that we need plan for the scenario above.

Honza
> 
> Richard.
> 
> > Honza
Richard Guenther - Sept. 3, 2013, 1:47 p.m.
On Tue, Sep 3, 2013 at 3:42 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > The decision what should go to local or global stream is basically motivated by
>> >   1) everything needed for interprocedural optimization has to be global
>> >   2) everything related to function bodies should be local.
>>
>> I'd rather formulate it as "everything not needed at WPA time should be
>> local if it can be local"
>
> OK.  With additional rule that we want to have at least something local,
> or the local sections are pointless.
>>
>> > For tree nodes, the decision about where the node goes is made by
>> > tree_is_indexable.  The current set of rules is
>> >  - parm_decl/result_decl is local
>> >    (before my change resut_decl was global and parm_decl was local, why those
>> >     was not the same escapes me, but it needed ugly fixup in ipa-prop that
>> >     needed to reconnect global DECL pointers to local DECL pointers after
>> >     streaming in)
>> >  - function variables are local if they are not static.
>> >    (Even external declarations are local, not only automatic vars.)
>> >  - Types are global unless they are variably modified types
>> >    (The variably modified type code walks into subtypes, so even pointers
>> >    to variably modified type are local or function declarations mentioning
>> >    these.)
>> >  - FIELD_DECL is global unless it is part of variably modified type.
>> >  - SSA_NAME is local despite logic of tree_is_indexable because it is handled
>> >    separately as part of gimple and gimple is always local.
>> >  - LABEL_DECLs are global even when their address is not taken and it would make
>> >    more sense to hide it in the body sections
>> >
>> > Now I think we are safe only when our references never violate A) and B) and moreover
>> > C) same entity that needs to be unique (such as declarations by one decl rule
>> >    or BLOCK) is not streamed into multiple streams.  We never merge things
>> >    between streams and duplicating those is forbidden by design.
>> >
>> > It is an irritating property of our streaming machinery that it is not
>> > diagnozing violations of A/B/C. Violations of type A and B are basically hidden
>> > by introducing violations of type C and compiler tends to be resistant to those
>> > in a sense that is usually does not ICE.
>>
>> Yes, and with the re-organized streaming we can now do much better and
>> much faster!  This is because with the SCC finding we can record in O(SCC size)
>> whether a) the SCC contains entities that are local (and to what function) or
>> global, b) the SCC refers to other SCCs that are local (and in what function)
>> We can then detect whether we run into the situation that the new SCC refers
>> to SCCs that are in different local sections (which would mean those
>> SCCs in turn
>> would need to be globalized or that the current SCC needs to be duplicated into
>> all local sections it refers to).
>>
>> I would not be surprised if our current tree structures, when strictly following
>> restrictions A) and B) would need to be either all global or all duplicated into
>> the local sections :/
>>
>> At least we can now more easily formalize this (and can compute
>> "scc_is_indexable" which is the property we need now)
>
> I did not think of this.  SCC code can definitely help us to not put into local
> section an SCC component that contains something that needs to be global.  This
> is pretty cool in a way that tree_is_indexable don't need to do the closure
> itself (as it does for variadic types currently).
>
> I think the common case however is that you have variable A that is local.  SCC
> streaming will make it local since it see no must be global entities in that
> var.  Later you stream separate SCC component containing a type whose size
> depends on A.  SCC already made A local and you get a violation.

You get a violation if you cannot make the separate SCC local in the same
section as A or global.  Once the separate SCC refers to a local entity it
has to be local itself.  The only problem is if it refers to two local entities
in different local sections ... in which case we need to duplicate (or figure
out a way to break one of the references).

>> > Another alternative I can think of is to declare duplication in between streams
>> > sane and allowed, but teach streaming in code to patch in the in-memory
>> > representation from global stream by local stream one when this happens.  It
>> > seems bit tricky to implement - the decl states of functions would need to
>> > explicitly record the duplications in between local and global streams.
>>
>> I think duplication is what we do right now, otherwise we couldn't fill in all
>> tree references at streaming time.
>
> Yes, we duplicate and that leads to illformed IL.
>>
>> What we need to do is compute the decision once per SCC during the
>> SCC build and record statistics on how many trees we'll duplicate that
>> way (not sure if we even properly duplicate them now with the SCC streaming ...)
>
> Except that we need plan for the scenario above.

Change the tree layout ;)

Richard.

> Honza
>>
>> Richard.
>>
>> > Honza

Patch

Index: cgraphunit.c
===================================================================
--- cgraphunit.c	(revision 202153)
+++ cgraphunit.c	(working copy)
@@ -1414,14 +1414,18 @@  expand_thunk (struct cgraph_node *node)
   tree virtual_offset = NULL;
   tree alias = node->callees->callee->symbol.decl;
   tree thunk_fndecl = node->symbol.decl;
-  tree a = DECL_ARGUMENTS (thunk_fndecl);
+  tree a;
+
+  if (in_lto_p)
+    cgraph_get_body (node);
+  a = DECL_ARGUMENTS (thunk_fndecl);
 
   current_function_decl = thunk_fndecl;
 
   /* Ensure thunks are emitted in their correct sections.  */
   resolve_unique_section (thunk_fndecl, 0, flag_function_sections);
 
-  if (this_adjusting
+  if (this_adjusting
       && targetm.asm_out.can_output_mi_thunk (thunk_fndecl, fixed_offset,
 					      virtual_value, alias))
     {
Index: lto-streamer-in.c
===================================================================
--- lto-streamer-in.c	(revision 202153)
+++ lto-streamer-in.c	(working copy)
@@ -998,6 +998,7 @@  input_function (tree fn_decl, struct dat
   free_dominance_info (CDI_DOMINATORS);
   free_dominance_info (CDI_POST_DOMINATORS);
   free (stmts);
+  pop_cfun ();
 }
 
 
@@ -1086,8 +1087,6 @@  lto_read_body (struct lto_file_decl_data
 
       /* Restore decl state */
       file_data->current_decl_state = file_data->global_decl_state;
-
-      pop_cfun ();
     }
 
   lto_data_in_delete (data_in);
Index: lto-streamer-out.c
===================================================================
--- lto-streamer-out.c	(revision 202153)
+++ lto-streamer-out.c	(working copy)
@@ -1982,8 +1982,7 @@  lto_output (void)
       cgraph_node *node = dyn_cast <cgraph_node> (snode);
       if (node
 	  && lto_symtab_encoder_encode_body_p (encoder, node)
-	  && !node->symbol.alias
-	  && !node->thunk.thunk_p)
+	  && !node->symbol.alias)
 	{
 #ifdef ENABLE_CHECKING
 	  gcc_assert (!bitmap_bit_p (output, DECL_UID (node->symbol.decl)));