Patchwork Combine location with block using block_locations

login
register
mail settings
Submitter Dehao Chen
Date Sept. 12, 2012, 5:20 p.m.
Message ID <CAO2gOZWKDceuDo_pdxvn4-9o0SdEvaNSVHjJ-sJwp56GqnD=oA@mail.gmail.com>
Download mbox | patch
Permalink /patch/183418/
State New
Headers show

Comments

Dehao Chen - Sept. 12, 2012, 5:20 p.m.
There is another bug in the patch (not covered by unittests,
discovered through spec benchmarks).

When we remove unused locals, we do not mark the block as used for
debug stmt, but gimple-streamer-out will still stream out blocks for
debug stmt. There can be 2 fixes:

1.

Either fix could work. Any suggests which one should we go?

Thanks,
Dehao

On Wed, Sep 12, 2012 at 10:05 AM, Dehao Chen <dehao@google.com> wrote:
> There are two parts that needs memory management:
>
> 1. The BLOCK structure. This is managed by GC. I originally thought
> that removing blocks from tree.gsbase would paralyze GC. This turned
> out not to be a concern because DECL_INITIAL will still mark those
> used tree nodes. This patch may decrease the memory consumption by
> removing blocks from tree/gimple. However, as it makes more blocks
> become used, they also increase the memory consumption.
> 2. The data structure in libcpp that maintains the hashtable for the
> location->block mapping. This is relatively minor because for the
> largest source I've seen, it only maintains less than 100K entries in
> the array (less than 1M total memory consumption). However, as it is a
> global data structure, it may make LTO unhappy. Honza is helping
> testing the memory consumption on LTO (but we first need to make this
> patch work for LTO). If the LTO result turns out ok, we probably don't
> want to put these under GC because: 1. it'll make things much more
> complicated. 2. using self managed memory is more efficient (as this
> is frequently used in many passes). 3. not using GC actually saves
> memory because even though the block is in the map, it can still be
> GCed as soon as it's not reachable from DECL_INITIAL.
>
> I've tested this on some very large C++ files (each one takes more
> than 10s to build), the memory consumption does not see noticeable
> increase/decrease.
>
> Thanks,
> Dehao
>
> On Wed, Sep 12, 2012 at 9:39 AM, Xinliang David Li <davidxl@google.com> wrote:
>> On Wed, Sep 12, 2012 at 2:13 AM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Wed, Sep 12, 2012 at 7:06 AM, Dehao Chen <dehao@google.com> wrote:
>>>> Now I think we are facing a more complex problem. The data structure
>>>> we use to store the location_adhoc_data are file-static in linemap.c
>>>> in libcpp. These data structures are not guarded by GTY(()).
>>>> Meanwhile, as we have removed the block data structure from
>>>> gimple.gsbase as well as tree.exp (encoding them into an location_t).
>>>> This could cause block being GCed and the LOCATION_BLOCK becoming
>>>> dangling pointers.
>>>
>>> Uh.  Note that it is quite important that we are able to garbage-collect unused
>>> BLOCKs, this is the whole point of removing unused BLOCK scopes in
>>> remove_unused_locals.  So this indeed becomes much more complicated ...
>>> What would be desired is that the garbage collector can NULL an entry in
>>> the mapping table when it is not referenced in any other way (that other
>>> reference would be the BLOCK tree as stored in a FUNCTION_DECLs DECL_INITIAL).
>>
>> It would be nice to GC those unused BLOCKS. I wonder how many BLOCKS
>> are created for a large C++ program. This patch saves memory by
>> shrinking tree size, is it a net win or loss without GC those BLOCKS?
>>
>> thanks,
>>
>> David
>>
>>
>>>
>>>> I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from
>>>> gimple.gsbase.location. However, neigher nested_ptr nor mark_hook can
>>>> help me.
>>>>
>>>> Another approach would be guard the location_adhoc_data and related
>>>> data structures in GTY(()). However, this is non-trivial because tree
>>>> is not visible in libcpp. At the same time, my implementation heavily
>>>> relies on hashtable to make the code efficient, thus it's quite tricky
>>>> to make "param_is" and "use_params" work.
>>>>
>>>> The final approach, which I'll try tomorrow, would be move all my
>>>> implementation from libcpp to gcc, and guard them with GTY(()). I
>>>> still haven't thought of any potential problem of this approach. Any
>>>> comments?
>>>
>>> I think moving the mapping to GC in a lazy manner as I described above
>>> would be the way to go.  For hashtables GC already supports if_marked,
>>> not sure if similar support is available for arrays/vecs.
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> Dehao
>>>>
>>>> On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen <dehao@google.com> wrote:
>>>>> I saw comments in tree-streamer-out.c:
>>>>>
>>>>>   /* Do not stream BLOCK_SOURCE_LOCATION.  We cannot handle debug information
>>>>>      for early inlining so drop it on the floor instead of ICEing in
>>>>>      dwarf2out.c.  */
>>>>>   streamer_write_chain (ob, BLOCK_VARS (expr), ref_p);
>>>>>
>>>>> However, what the code is doing seemed contradictory with the comment.
>>>>> Or am I missing something?
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz <matz@suse.de> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Tue, 11 Sep 2012, Dehao Chen wrote:
>>>>>>
>>>>>>> Looks like we have two choices:
>>>>>>>
>>>>>>> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t)
>>>>>>
>>>>>> This will actually not work correctly in some cases.  The problem is, if
>>>>>> the prevailing decl is already part of another chain (say in another
>>>>>> block_var list) you would break the current chain.  Hence block vars need
>>>>>> special handling in the lto streamer (another reason why tree_chain is not
>>>>>> the most clever think to use for this chain).  This problem area needs to
>>>>>> be solved somehow if block info is to be preserved correctly.
>>>>>>
>>>>>>> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL
>>>>>>> (TREE_CHAIN (t)).
>>>>>>
>>>>>> That's also a large hammer as it basically will mean no debug info after
>>>>>> LTO :-/ Sigh, at this point I have no good solution that doesn't involve
>>>>>> quite some work, perhaps your hack is good enough for the time being,
>>>>>> though I hate it :)
>>>>>
>>>>> I got it. Then I'll keep the patch as it is (remove the
>>>>> LTO_NO_PREVAIL), and work with Honza to resolve the issue he had, and
>>>>> then we should be good to check in?
>>>>>
>>>>> Thanks,
>>>>> Dehao
>>>>>
>>>>>>
>>>>>>
>>>>>> Ciao,
>>>>>> Michael.
Richard Guenther - Sept. 13, 2012, noon
On Wed, Sep 12, 2012 at 7:20 PM, Dehao Chen <dehao@google.com> wrote:
> There is another bug in the patch (not covered by unittests,
> discovered through spec benchmarks).
>
> When we remove unused locals, we do not mark the block as used for
> debug stmt, but gimple-streamer-out will still stream out blocks for
> debug stmt. There can be 2 fixes:

Because doing so would create code generation differences -g vs. -g0.

> 1.
> --- a/gcc/gimple-streamer-out.c
> +++ b/gcc/gimple-streamer-out.c
> @@ -77,7 +77,8 @@ output_gimple_stmt (struct output_block *ob, gimple stmt)
>    lto_output_location (ob, LOCATION_LOCUS (gimple_location (stmt)));
>
>    /* Emit the lexical block holding STMT.  */
> -  stream_write_tree (ob, gimple_block (stmt), true);
> +  if (!is_gimple_debug (stmt))
> +    stream_write_tree (ob, gimple_block (stmt), true);
>
>    /* Emit the operands.  */
>    switch (gimple_code (stmt))
>
> 2.
> --- a/gcc/tree-ssa-live.c
> +++ b/gcc/tree-ssa-live.c
> @@ -726,9 +726,6 @@ remove_unused_locals (void)
>           gimple stmt = gsi_stmt (gsi);
>           tree b = gimple_block (stmt);
>
> -         if (is_gimple_debug (stmt))
> -           continue;
> -
>           if (gimple_clobber_p (stmt))
>             {
>               have_local_clobbers = true;
>
> Either fix could work. Any suggests which one should we go?

The 2nd one will not work and is not acceptable.  The 1st one - well ...
what happens on trunk right now?  The debug stmt points to a
BLOCK that is possibly removed from the BLOCK tree?  In this case
I think the fix is 3. make sure remove_unused_scope_block_p will
clear BLOCKs from all stmts / expressions that have been removed.

Richard.

> Thanks,
> Dehao
>
> On Wed, Sep 12, 2012 at 10:05 AM, Dehao Chen <dehao@google.com> wrote:
>> There are two parts that needs memory management:
>>
>> 1. The BLOCK structure. This is managed by GC. I originally thought
>> that removing blocks from tree.gsbase would paralyze GC. This turned
>> out not to be a concern because DECL_INITIAL will still mark those
>> used tree nodes. This patch may decrease the memory consumption by
>> removing blocks from tree/gimple. However, as it makes more blocks
>> become used, they also increase the memory consumption.
>> 2. The data structure in libcpp that maintains the hashtable for the
>> location->block mapping. This is relatively minor because for the
>> largest source I've seen, it only maintains less than 100K entries in
>> the array (less than 1M total memory consumption). However, as it is a
>> global data structure, it may make LTO unhappy. Honza is helping
>> testing the memory consumption on LTO (but we first need to make this
>> patch work for LTO). If the LTO result turns out ok, we probably don't
>> want to put these under GC because: 1. it'll make things much more
>> complicated. 2. using self managed memory is more efficient (as this
>> is frequently used in many passes). 3. not using GC actually saves
>> memory because even though the block is in the map, it can still be
>> GCed as soon as it's not reachable from DECL_INITIAL.
>>
>> I've tested this on some very large C++ files (each one takes more
>> than 10s to build), the memory consumption does not see noticeable
>> increase/decrease.
>>
>> Thanks,
>> Dehao
>>
>> On Wed, Sep 12, 2012 at 9:39 AM, Xinliang David Li <davidxl@google.com> wrote:
>>> On Wed, Sep 12, 2012 at 2:13 AM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Wed, Sep 12, 2012 at 7:06 AM, Dehao Chen <dehao@google.com> wrote:
>>>>> Now I think we are facing a more complex problem. The data structure
>>>>> we use to store the location_adhoc_data are file-static in linemap.c
>>>>> in libcpp. These data structures are not guarded by GTY(()).
>>>>> Meanwhile, as we have removed the block data structure from
>>>>> gimple.gsbase as well as tree.exp (encoding them into an location_t).
>>>>> This could cause block being GCed and the LOCATION_BLOCK becoming
>>>>> dangling pointers.
>>>>
>>>> Uh.  Note that it is quite important that we are able to garbage-collect unused
>>>> BLOCKs, this is the whole point of removing unused BLOCK scopes in
>>>> remove_unused_locals.  So this indeed becomes much more complicated ...
>>>> What would be desired is that the garbage collector can NULL an entry in
>>>> the mapping table when it is not referenced in any other way (that other
>>>> reference would be the BLOCK tree as stored in a FUNCTION_DECLs DECL_INITIAL).
>>>
>>> It would be nice to GC those unused BLOCKS. I wonder how many BLOCKS
>>> are created for a large C++ program. This patch saves memory by
>>> shrinking tree size, is it a net win or loss without GC those BLOCKS?
>>>
>>> thanks,
>>>
>>> David
>>>
>>>
>>>>
>>>>> I tried to manipulate GTY to make it recognize the LOCATION_BLOCK from
>>>>> gimple.gsbase.location. However, neigher nested_ptr nor mark_hook can
>>>>> help me.
>>>>>
>>>>> Another approach would be guard the location_adhoc_data and related
>>>>> data structures in GTY(()). However, this is non-trivial because tree
>>>>> is not visible in libcpp. At the same time, my implementation heavily
>>>>> relies on hashtable to make the code efficient, thus it's quite tricky
>>>>> to make "param_is" and "use_params" work.
>>>>>
>>>>> The final approach, which I'll try tomorrow, would be move all my
>>>>> implementation from libcpp to gcc, and guard them with GTY(()). I
>>>>> still haven't thought of any potential problem of this approach. Any
>>>>> comments?
>>>>
>>>> I think moving the mapping to GC in a lazy manner as I described above
>>>> would be the way to go.  For hashtables GC already supports if_marked,
>>>> not sure if similar support is available for arrays/vecs.
>>>>
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> Dehao
>>>>>
>>>>> On Tue, Sep 11, 2012 at 9:00 AM, Dehao Chen <dehao@google.com> wrote:
>>>>>> I saw comments in tree-streamer-out.c:
>>>>>>
>>>>>>   /* Do not stream BLOCK_SOURCE_LOCATION.  We cannot handle debug information
>>>>>>      for early inlining so drop it on the floor instead of ICEing in
>>>>>>      dwarf2out.c.  */
>>>>>>   streamer_write_chain (ob, BLOCK_VARS (expr), ref_p);
>>>>>>
>>>>>> However, what the code is doing seemed contradictory with the comment.
>>>>>> Or am I missing something?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 11, 2012 at 8:32 AM, Michael Matz <matz@suse.de> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Tue, 11 Sep 2012, Dehao Chen wrote:
>>>>>>>
>>>>>>>> Looks like we have two choices:
>>>>>>>>
>>>>>>>> 1. Stream out block info, and use LTO_SET_PREVAIL for TREE_CHAIN(t)
>>>>>>>
>>>>>>> This will actually not work correctly in some cases.  The problem is, if
>>>>>>> the prevailing decl is already part of another chain (say in another
>>>>>>> block_var list) you would break the current chain.  Hence block vars need
>>>>>>> special handling in the lto streamer (another reason why tree_chain is not
>>>>>>> the most clever think to use for this chain).  This problem area needs to
>>>>>>> be solved somehow if block info is to be preserved correctly.
>>>>>>>
>>>>>>>> 2. Don't stream out block info for LTO, and still call LTO_NO_PREVAIL
>>>>>>>> (TREE_CHAIN (t)).
>>>>>>>
>>>>>>> That's also a large hammer as it basically will mean no debug info after
>>>>>>> LTO :-/ Sigh, at this point I have no good solution that doesn't involve
>>>>>>> quite some work, perhaps your hack is good enough for the time being,
>>>>>>> though I hate it :)
>>>>>>
>>>>>> I got it. Then I'll keep the patch as it is (remove the
>>>>>> LTO_NO_PREVAIL), and work with Honza to resolve the issue he had, and
>>>>>> then we should be good to check in?
>>>>>>
>>>>>> Thanks,
>>>>>> Dehao
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Ciao,
>>>>>>> Michael.
Robert Dewar - Sept. 13, 2012, 1:33 p.m.
On 9/13/2012 8:00 AM, Richard Guenther wrote:

> Because doing so would create code generation differences -g vs. -g0.

Sometimes I wonder whether the insistence on -g not changing code
generation is warranted. In practice, gdb for me is so weak in handling
-O1 or -O2, that if I want to debug something I have to recompile
with -O0 -g, which causes quite a bit of code generation change :-)
Jakub Jelinek - Sept. 13, 2012, 1:38 p.m.
On Thu, Sep 13, 2012 at 09:33:20AM -0400, Robert Dewar wrote:
> On 9/13/2012 8:00 AM, Richard Guenther wrote:
> 
> >Because doing so would create code generation differences -g vs. -g0.
> 
> Sometimes I wonder whether the insistence on -g not changing code
> generation is warranted. In practice, gdb for me is so weak in handling

It is.  IMHO the most important reason is not that somebody would build
first with just -O2 and then later on to debug the code would build it again
with -g -O2 and hope the code is the same, but by making sure -g vs. -g0
doesn't change generate code we ensure -g doesn't pessimize the generated
code, and really many people compile even production code with -g -O2
or similar.  The debug info is then either stripped, or stripped into
separate files/not shipped or only optionally shipped with the product.

	Jakub
Robert Dewar - Sept. 13, 2012, 1:52 p.m.
On 9/13/2012 9:38 AM, Jakub Jelinek wrote:
> On Thu, Sep 13, 2012 at 09:33:20AM -0400, Robert Dewar wrote:
>> On 9/13/2012 8:00 AM, Richard Guenther wrote:
>>
>>> Because doing so would create code generation differences -g vs. -g0.
>>
>> Sometimes I wonder whether the insistence on -g not changing code
>> generation is warranted. In practice, gdb for me is so weak in handling
>
> It is.  IMHO the most important reason is not that somebody would build
> first with just -O2 and then later on to debug the code would build it again
> with -g -O2 and hope the code is the same, but by making sure -g vs. -g0
> doesn't change generate code we ensure -g doesn't pessimize the generated
> code, and really many people compile even production code with -g -O2
> or similar.  The debug info is then either stripped, or stripped into
> separate files/not shipped or only optionally shipped with the product.
>
> 	Jakub
>
Sure, it is obvious that you don't want -g to affect -O1 or -O2 code,
but I think if you have -Og (if and when we have that), it would not
be a bad thing for -g to affect that. I can even imagine that what
-Og means is -O1 if you don't have -g, and something good for
debugging if you do have -g.
Xinliang David Li - Sept. 13, 2012, 4:07 p.m.
It is very important to make sure -g does not affect code gen ---
people do release build with -g with optimization, and strip the
binary before sending it to production machines ..

David

On Thu, Sep 13, 2012 at 6:33 AM, Robert Dewar <dewar@adacore.com> wrote:
> On 9/13/2012 8:00 AM, Richard Guenther wrote:
>
>> Because doing so would create code generation differences -g vs. -g0.
>
>
> Sometimes I wonder whether the insistence on -g not changing code
> generation is warranted. In practice, gdb for me is so weak in handling
> -O1 or -O2, that if I want to debug something I have to recompile
> with -O0 -g, which causes quite a bit of code generation change :-)
>
Robert Dewar - Sept. 13, 2012, 4:29 p.m.
On 9/13/2012 12:07 PM, Xinliang David Li wrote:
> It is very important to make sure -g does not affect code gen ---
> people do release build with -g with optimization, and strip the
> binary before sending it to production machines ..

Yes, of course, and for sure -g cannot affect optimized code, see
my follow on message.
>
> David
>
> On Thu, Sep 13, 2012 at 6:33 AM, Robert Dewar <dewar@adacore.com> wrote:
>> On 9/13/2012 8:00 AM, Richard Guenther wrote:
>>
>>> Because doing so would create code generation differences -g vs. -g0.
>>
>>
>> Sometimes I wonder whether the insistence on -g not changing code
>> generation is warranted. In practice, gdb for me is so weak in handling
>> -O1 or -O2, that if I want to debug something I have to recompile
>> with -O0 -g, which causes quite a bit of code generation change :-)
>>
Tom Tromey - Sept. 13, 2012, 4:46 p.m.
>>>>> "Robert" == Robert Dewar <dewar@adacore.com> writes:

Robert> Sometimes I wonder whether the insistence on -g not changing code
Robert> generation is warranted. In practice, gdb for me is so weak in handling
Robert> -O1 or -O2, that if I want to debug something I have to recompile
Robert> with -O0 -g, which causes quite a bit of code generation change :-)

If those are gdb bugs, please file them.

Tom
Robert Dewar - Sept. 13, 2012, 4:51 p.m.
On 9/13/2012 12:46 PM, Tom Tromey wrote:
>>>>>> "Robert" == Robert Dewar <dewar@adacore.com> writes:
>
> Robert> Sometimes I wonder whether the insistence on -g not changing code
> Robert> generation is warranted. In practice, gdb for me is so weak in handling
> Robert> -O1 or -O2, that if I want to debug something I have to recompile
> Robert> with -O0 -g, which causes quite a bit of code generation change :-)
>
> If those are gdb bugs, please file them.

Well I think everyone knows about the failings of gdb in -O1 mode, they
have been much discussed, and they are not really gdb bugs, more an 
issue of it being basically hard to debug optimized code. Things used
to be a LOT better, I routinely debugged code at -O1, but then the
compiler got better at optimization, and things deteriorated so much
at -O1 that now I don't even attempt it.
>
> Tom
>
Mike Stump - Sept. 13, 2012, 5:37 p.m.
On Sep 13, 2012, at 6:52 AM, Robert Dewar <dewar@adacore.com> wrote:
> Sure, it is obvious that you don't want -g to affect -O1 or -O2 code,
> but I think if you have -Og (if and when we have that), it would not
> be a bad thing for -g to affect that.

No, instead think of -Og as affecting the -g output itself.  If it does, then there is nothing for -g to affect when used with -Og.  So, I agree, -g -Og can have any impact on code-gen that we want, I just dis-agree that -Og should be any different; I just don't see the need.
Mike Stump - Sept. 13, 2012, 5:44 p.m.
On Sep 13, 2012, at 9:51 AM, Robert Dewar <dewar@adacore.com> wrote:
> I routinely debugged code at -O1, but then the
> compiler got better at optimization, and things deteriorated so much
> at -O1 that now I don't even attempt it.

An example of a non-feature for me would be the reordering of instructions by scheduling when there is no benefit, only reorder, if there is a non-zero benefit.  This causes the jumpy debug experience, and needlessly so in some cases.  This is also an example that gdb can't fix.  The fix would be to compute costs as tuple and have the second part of the cost be the original instruction ordering.  Then, the scheduler actively works to preserve order, unless the upper case of the cost indicates there is a win to be had.
Richard Guenther - Sept. 14, 2012, 6:49 a.m.
On Thu, Sep 13, 2012 at 7:37 PM, Mike Stump <mikestump@comcast.net> wrote:
> On Sep 13, 2012, at 6:52 AM, Robert Dewar <dewar@adacore.com> wrote:
>> Sure, it is obvious that you don't want -g to affect -O1 or -O2 code,
>> but I think if you have -Og (if and when we have that), it would not
>> be a bad thing for -g to affect that.
>
> No, instead think of -Og as affecting the -g output itself.  If it does, then there is nothing for -g to affect when used with -Og.  So, I agree, -g -Og can have any impact on code-gen that we want, I just dis-agree that -Og should be any different; I just don't see the need.

I think it's going to make GCC harder to maintain if we drop the -g0
vs. -g no-code-difference requirement for just some optimization
levels.

Richard.
Eric Botcazou - Sept. 14, 2012, 8:59 a.m.
> I think it's going to make GCC harder to maintain if we drop the -g0
> vs. -g no-code-difference requirement for just some optimization
> levels.

Seconded, this is surely going to open yet another can of worms.
Diego Novillo - Sept. 14, 2012, 10:30 a.m.
On 2012-09-14 04:59 , Eric Botcazou wrote:
>> I think it's going to make GCC harder to maintain if we drop the -g0
>> vs. -g no-code-difference requirement for just some optimization
>> levels.
>
> Seconded, this is surely going to open yet another can of worms.

Agreed.


Diego.

Patch

--- a/gcc/gimple-streamer-out.c
+++ b/gcc/gimple-streamer-out.c
@@ -77,7 +77,8 @@  output_gimple_stmt (struct output_block *ob, gimple stmt)
   lto_output_location (ob, LOCATION_LOCUS (gimple_location (stmt)));

   /* Emit the lexical block holding STMT.  */
-  stream_write_tree (ob, gimple_block (stmt), true);
+  if (!is_gimple_debug (stmt))
+    stream_write_tree (ob, gimple_block (stmt), true);

   /* Emit the operands.  */
   switch (gimple_code (stmt))

2.
--- a/gcc/tree-ssa-live.c
+++ b/gcc/tree-ssa-live.c
@@ -726,9 +726,6 @@  remove_unused_locals (void)
          gimple stmt = gsi_stmt (gsi);
          tree b = gimple_block (stmt);

-         if (is_gimple_debug (stmt))
-           continue;
-
          if (gimple_clobber_p (stmt))
            {
              have_local_clobbers = true;