
keep scope blocks for all inlined functions (PR 98664)

Message ID 0f27b6c3-f6ab-ec4f-52c6-6e544684f751@gmail.com
State New

Commit Message

Martin Sebor Jan. 14, 2021, 7:13 p.m. UTC
One aspect of PR 98465 - Bogus warning stringop-overread for std::string
is the inconsistency between -g and -g0 which turns out to be due to
GCC eliminating apparently unused scope blocks from inlined functions
that aren't explicitly declared inline and artificial.  PR 98664 tracks
just this part of PR 98465.

To resolve just the PR 98664 subset the attached change has
the tree-ssa-live.c pass preserve these blocks for all inlined
functions, not just artificial ones.  Besides avoiding the interaction
between -g and warnings it also seems to improve the inlining context
by including more inlined call sites.  This can be seen in the adjusted
tests.  (Its effect on PR 98465 is that the false positive is issued
consistently, regardless of -g.  Avoiding the false positive is my
next step.)
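
As a rough illustration of the change in the pruning condition, here is a minimal sketch using a toy model rather than GCC's tree representation.  ToyOrigin, keep_before, keep_after and count_kept are hypothetical names made up for this example; the three fields only stand in for the checks the patch touches (TREE_CODE (ao) == FUNCTION_DECL, DECL_DECLARED_INLINE_P (ao) and the "artificial" attribute).

```cpp
#include <vector>

// Toy stand-in for the origin of an inlined-function scope block.
// These fields are illustrative only, not GCC's tree representation.
struct ToyOrigin {
  bool is_function_decl;  // models TREE_CODE (ao) == FUNCTION_DECL
  bool declared_inline;   // models DECL_DECLARED_INLINE_P (ao)
  bool artificial;        // models the "artificial" attribute
};

// Before the patch: at -g0 only scopes of functions that are both
// declared inline and artificial are kept.
bool keep_before(const ToyOrigin &ao) {
  return ao.is_function_decl && ao.declared_inline && ao.artificial;
}

// After the patch: the outer scope of every inlined function is kept.
bool keep_after(const ToyOrigin &ao) {
  return ao.is_function_decl;
}

// Count how many scopes a given predicate keeps.
template <typename Pred>
int count_kept(const std::vector<ToyOrigin> &scopes, Pred keep) {
  int n = 0;
  for (const ToyOrigin &ao : scopes)
    if (keep(ao))
      ++n;
  return n;
}
```

With one scope from an inline artificial function, one from a plain inline function and one from a function not declared inline, count_kept with keep_before keeps one scope while keep_after keeps all three.  The kept scopes are what later diagnostics can use to report where a function was inlined from.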

Jakub, you raised a concern yesterday in PR 98465 c#13 about the memory
footprint of this change.  Can you please comment on whether it's in
line with what you were suggesting?

Martin

Comments

Richard Biener Jan. 15, 2021, 7:44 a.m. UTC | #1
On Thu, Jan 14, 2021 at 8:13 PM Martin Sebor via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> One aspect of PR 98465 - Bogus warning stringop-overread for std::string
> is the inconsistency between -g and -g0 which turns out to be due to
> GCC eliminating apparently unused scope blocks from inlined functions
> that aren't explicitly declared inline and artificial.  PR 98664 tracks
> just this part of PR 98465.
>
> To resolve just the PR 98664 subset the attached change has
> the tree-ssa-live.c pass preserve these blocks for all inlined
> functions, not just artificial ones.  Besides avoiding the interaction
> between -g and warnings it also seems to improve the inlining context
> by including more inlined call sites.  This can be seen in the adjusted
> tests.  (Its effect on PR 98465 is that the false positive is issued
> consistently, regardless of -g.  Avoiding the false positive is my
> next step.)
>
> Jakub, you raised a concern yesterday in PR 98465 c#13 about the memory
> footprint of this change.  Can you please comment on whether it's in
> line with what you were suggesting?

     {
       tree ao = BLOCK_ABSTRACT_ORIGIN (block);
-      if (TREE_CODE (ao) == FUNCTION_DECL)
-       loc = BLOCK_SOURCE_LOCATION (block);
-      else if (TREE_CODE (ao) != BLOCK)
-       break;
+       if (TREE_CODE (ao) == FUNCTION_DECL)
+        loc = BLOCK_SOURCE_LOCATION (block);
+       else if (TREE_CODE (ao) != BLOCK)
+        break;

you are replacing tabs with spaces?

@@ -558,16 +558,13 @@ remove_unused_scope_block_p (tree scope, bool in_ctor_dtor_block)
    else if (!flag_auto_profile && debug_info_level == DINFO_LEVEL_NONE
            && !optinfo_wants_inlining_info_p ())
      {
-       /* Even for -g0 don't prune outer scopes from artificial
-         functions, otherwise diagnostics using tree_nonartificial_location
-         will not be emitted properly.  */
+       /* Even for -g0 don't prune outer scopes from inlined functions,
+         otherwise late diagnostics from such functions will not be
+         emitted or suppressed properly.  */
        if (inlined_function_outer_scope_p (scope))
         {
           tree ao = BLOCK_ORIGIN (scope);
-          if (ao
-              && TREE_CODE (ao) == FUNCTION_DECL
-              && DECL_DECLARED_INLINE_P (ao)
-              && lookup_attribute ("artificial", DECL_ATTRIBUTES (ao)))
+          if (ao && TREE_CODE (ao) == FUNCTION_DECL)
             unused = false;
         }
      }

so which inlined_function_outer_scope_p are you _not_ marking now?
BLOCK_ORIGIN is never NULL and all inlined scopes should have
an abstract origin - I believe always a FUNCTION_DECL.  Which means
you could have simplified it further?

And yes, the main reason for the code above is memory use for
C++ with lots of inlining.  I suggest to try the patch on tramp3d
for example (there's about 10 inline instances per emitted
assembly op).

Richard.

> Martin

Martin Sebor Jan. 17, 2021, 12:46 a.m. UTC | #2
On 1/15/21 12:44 AM, Richard Biener wrote:
> On Thu, Jan 14, 2021 at 8:13 PM Martin Sebor via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> One aspect of PR 98465 - Bogus warning stringop-overread for std::string
>> is the inconsistency between -g and -g0 which turns out to be due to
>> GCC eliminating apparently unused scope blocks from inlined functions
>> that aren't explicitly declared inline and artificial.  PR 98664 tracks
>> just this part of PR 98465.
>>
>> To resolve just the PR 98664 subset the attached change has
>> the tree-ssa-live.c pass preserve these blocks for all inlined
>> functions, not just artificial ones.  Besides avoiding the interaction
>> between -g and warnings it also seems to improve the inlining context
>> by including more inlined call sites.  This can be seen in the adjusted
>> tests.  (Its effect on PR 98465 is that the false positive is issued
>> consistently, regardless of -g.  Avoiding the false positive is my
>> next step.)
>>
>> Jakub, you raised a concern yesterday in PR 98465 c#13 about the memory
>> footprint of this change.  Can you please comment on whether it's in
>> line with what you were suggesting?
> 
>       {
>         tree ao = BLOCK_ABSTRACT_ORIGIN (block);
> -      if (TREE_CODE (ao) == FUNCTION_DECL)
> -       loc = BLOCK_SOURCE_LOCATION (block);
> -      else if (TREE_CODE (ao) != BLOCK)
> -       break;
> +       if (TREE_CODE (ao) == FUNCTION_DECL)
> +        loc = BLOCK_SOURCE_LOCATION (block);
> +       else if (TREE_CODE (ao) != BLOCK)
> +        break;
> 
> you are replacing tabs with spaces?
> 
> @@ -558,16 +558,13 @@ remove_unused_scope_block_p (tree scope, bool in_ctor_dtor_block)
>      else if (!flag_auto_profile && debug_info_level == DINFO_LEVEL_NONE
>              && !optinfo_wants_inlining_info_p ())
>        {
> -       /* Even for -g0 don't prune outer scopes from artificial
> -         functions, otherwise diagnostics using tree_nonartificial_location
> -         will not be emitted properly.  */
> +       /* Even for -g0 don't prune outer scopes from inlined functions,
> +         otherwise late diagnostics from such functions will not be
> +         emitted or suppressed properly.  */
>          if (inlined_function_outer_scope_p (scope))
>           {
>             tree ao = BLOCK_ORIGIN (scope);
> -          if (ao
> -              && TREE_CODE (ao) == FUNCTION_DECL
> -              && DECL_DECLARED_INLINE_P (ao)
> -              && lookup_attribute ("artificial", DECL_ATTRIBUTES (ao)))
> +          if (ao && TREE_CODE (ao) == FUNCTION_DECL)
>               unused = false;
>           }
>        }
> 
> so which inlined_function_outer_scope_p are you _not_ marking now?
> BLOCK_ORIGIN is never NULL and all inlined scopes should have
> an abstract origin - I believe always a FUNCTION_DECL.  Which means
> you could have simplified it further?

Quite possibly.  I could find no documentation for these macros so
I tried to keep my changes conservative.  I did put together some
notes to document what I saw the macros evaluate to in my testing
(below).  If/when it's close to accurate I'd like to add them to
tree.h and to the internals manual.

> And yes, the main reason for the code above is memory use for
> C++ with lots of inlining.  I suggest to try the patch on tramp3d
> for example (there's about 10 inline instances per emitted
> assembly op).

This one:
https://github.com/llvm-mirror/test-suite/tree/master/MultiSource/Benchmarks/tramp3d-v4
?

With the patch, 69,022 more blocks with distinct numbers are kept
than without it.  I see some small differences in -fmem-report
and -ftime-report output:

   Total (Allocated / Used / Overhead):
     286M -> 288M   210M -> 211M   3993k -> 4019k

I'm not really sure what to look at so I attach the two reports
for you to judge for yourself.
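
To put the "Total" numbers above in perspective, the relative growth can be worked out directly.  The sketch below is only arithmetic on the three before/after pairs quoted above; percent_growth and print_total_growth are hypothetical helper names, not part of GCC.

```cpp
#include <cstdio>

// Relative growth in percent from a baseline value to a patched value.
double percent_growth(double before, double after) {
  return (after - before) / before * 100.0;
}

// Print the growth for the three "Total" columns of the -fmem-report table.
void print_total_growth() {
  std::printf("Allocated: %.2f%%\n", percent_growth(286.0, 288.0));    // in M
  std::printf("Used:      %.2f%%\n", percent_growth(210.0, 211.0));    // in M
  std::printf("Overhead:  %.2f%%\n", percent_growth(3993.0, 4019.0));  // in k
}
```

Each of the three totals grows by less than 1%.  This covers only memory still allocated at the end of compilation, so it says nothing about peak use.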

I also attach an updated patch with the slight simplification you
suggested.

Martin

PS Here are my notes on the macros and the two related functions:

BLOCK: Denotes a lexical scope.  Contains BLOCK_VARS of variables
declared in it, BLOCK_SUBBLOCKS of scopes nested in it, and
BLOCK_CHAIN pointing to the next BLOCK.  Its BLOCK_SUPERCONTEXT
points to the BLOCK of the enclosing scope.  May have
a BLOCK_ABSTRACT_ORIGIN and a BLOCK_SOURCE_LOCATION.

BLOCK_SUPERCONTEXT: The scope of the enclosing block, or FUNCTION_DECL
for the "outermost" function scope.  Inlined functions are chained by
this so that given expression E and its TREE_BLOCK(E) B,
BLOCK_SUPERCONTEXT(B) is the scope (BLOCK) in which E has been made
or into which E has been inlined.  In the latter case,

BLOCK_ORIGIN(B) evaluates either to the enclosing BLOCK or to
the enclosing function DECL.  It's never null.

BLOCK_ABSTRACT_ORIGIN(B) is the FUNCTION_DECL of the function into
which it has been inlined, or null if B is not inlined.

BLOCK_ABSTRACT_ORIGIN: A BLOCK, or FUNCTION_DECL of the function
into which a block has been inlined.  In a BLOCK immediately enclosing
an inlined leaf expression points to the outermost BLOCK into which it
has been inlined (thus bypassing all intermediate BLOCK_SUPERCONTEXTs).

BLOCK_FRAGMENT_ORIGIN: ???
BLOCK_FRAGMENT_CHAIN: ???

bool inlined_function_outer_scope_p(BLOCK)   [tree.h]
   Returns true if a BLOCK has a source location.
   True for all but the innermost (no SUBBLOCKs?) and outermost blocks
   into which an expression has been inlined. (Is this always true?)

tree block_ultimate_origin(BLOCK)   [tree.c]
   Returns BLOCK_ABSTRACT_ORIGIN(BLOCK), AO, after asserting that
   ((DECL_P(AO) && DECL_ORIGIN(AO) == AO) || BLOCK_ORIGIN(AO) == AO).

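To make the notes above a little more concrete, here is a minimal sketch of how an inlining context could be collected by following the enclosing-scope links.  It is a toy model under stated assumptions, not GCC code.  ToyBlock and inlining_context are hypothetical names, and the outermost scope is modeled with a null pointer, whereas in GCC the BLOCK_SUPERCONTEXT of the outermost block is the FUNCTION_DECL.

```cpp
#include <string>
#include <vector>

// Toy model of a lexical scope; not GCC's tree BLOCK.
struct ToyBlock {
  const ToyBlock *supercontext;  // enclosing scope, null at the outermost one
  bool inlined_outer_scope;      // models inlined_function_outer_scope_p
  std::string call_site;         // models BLOCK_SOURCE_LOCATION, e.g. "a.c:12"
};

// Walk from the innermost block outwards and collect the call site of
// every inlined-function scope on the way: the "inlined from" chain.
std::vector<std::string> inlining_context(const ToyBlock *block) {
  std::vector<std::string> sites;
  for (; block != nullptr; block = block->supercontext)
    if (block->inlined_outer_scope)
      sites.push_back(block->call_site);
  return sites;
}
```

In this model, a scope that has been pruned from the chain simply contributes no call site, which is the toy equivalent of an inlined call site missing from a diagnostic's context.
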
$ /build/gcc-master/gcc/xg++ -B /build/gcc-master/gcc -nostdinc++ -I /build/gcc-master/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I /build/gcc-master/x86_64-pc-linux-gnu/libstdc++-v3/include -I /src/gcc/master/libstdc++-v3/libsupc++ -I /src/gcc/master/libstdc++-v3/include/backward -I /src/gcc/master/libstdc++-v3/testsuite/util -L /build/gcc-master/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -O2 -c -fdump-tree-cfg=tramp3d-v4.keep_blocks.cfg /src/tramp3d-v4.cpp -fmem-report -ftime-report

################################################################################
# Final                                                                        #
################################################################################

Number of expanded macros:                     18928
Average number of tokens per macro expansion:      7

Line Table allocations during the compilation process
Number of ordinary maps used:          915 
Ordinary map used size:                 21k
Number of ordinary maps allocated:    1365 
Ordinary maps allocated size:           31k
Number of macro maps used:              16k
Macro maps used size:                  525k
Macro maps locations size:            1124k
Macro maps size:                      1650k
Duplicated maps locations size:        378k
Total allocated maps size:            3204k
Total used maps size:                 1671k
Ad-hoc table size:                      12M
Ad-hoc table entries used:             472k
optimized_ranges:                     1171k
unoptimized_ranges:                    134k

Memory still allocated at the end of the compilation process
Size      Allocated        Used    Overhead
8               116k         95k       3480 
16             4120k       2271k         88k
32               23M         11M        418k
64               10M       8427k        166k
256              30M         25M        423k
512            1500k       1166k         20k
1024           3856k       1708k         52k
2048           5040k       4986k         68k
4096            148k        148k       2072 
8192             48k         48k        336 
16384            64k         64k        224 
32768           128k        128k        224 
65536           384k        384k        336 
131072          128k        128k         56 
262144          768k        768k        168 
524288          512k        512k         56 
1048576        2048k       2048k        112 
2097152        2048k       2048k         56 
16777216         16M         16M         56 
24               11M       5294k        208k
40               25M         17M        413k
48               14M       6606k        236k
56             4812k       1825k         75k
72             3276k        775k         44k
80              596k        217k       8344 
88              364k        177k       5096 
96               11M       6110k        154k
112            3492k       2075k         47k
120            6172k       4542k         84k
152              16M         15M        237k
128              30M         26M        430k
144            6484k       1881k         88k
168              42M         40M        601k
184            3348k       2045k         45k
104            1176k       1036k         16k
272            3536k        904k         48k
280             164k        103k       2296 
Total           286M        210M       3993k

String pool
entries:                        79454
identifiers:                    34933 (43.97%)
slots:                          131072
deleted:                        36676
GGC bytes:                      2620k
table size:                     1024k
coll/search:                    0.8217
ins/search:                     0.1222
avg. entry:                     33.77 bytes (+/- 71.45)
longest entry:                  496
(No per-node statistics)
Type hash: size 131071, 67199 elements, 1.112287 collisions
DECL_DEBUG_EXPR  hash: size 1021, 0 elements, 0.856842 collisions
DECL_VALUE_EXPR  hash: size 1021, 30 elements, 0.146046 collisions
decl_specializations: size 131071, 50624 elements, 1.386344 collisions
type_specializations: size 32749, 23184 elements, 2.504205 collisions
No GIMPLE statistics
No RTX statistics

--------------------------------------------------------------------------------------------------------------------------------------------
Heap vectors                                      sizeof(T)       Leak            Peak     Times       Leak items Peak items
--------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------
Heap vectors                                      sizeof(T)       Leak            Peak     Times       Leak items Peak items
--------------------------------------------------------------------------------------------------------------------------------------------
Total                                                               0                         0                0 
--------------------------------------------------------------------------------------------------------------------------------------------


Alias oracle query stats:
  refs_may_alias_p: 2798174 disambiguations, 3068078 queries
  ref_maybe_used_by_call_p: 23818 disambiguations, 2836845 queries
  call_may_clobber_ref_p: 2754 disambiguations, 2764 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 3493 queries
  nonoverlapping_refs_since_match_p: 376 disambiguations, 9008 must overlaps, 9415 queries
  aliasing_component_refs_p: 807 disambiguations, 30842 queries
  TBAA oracle: 1041789 disambiguations 1976625 queries
               189831 are in alias set 0
               513994 queries asked about the same object
               0 queries asked about the same alias set
               0 access volatile
               230715 are dependent in the DAG
               296 are aritificially in conflict with void *

Modref stats:
  modref use: 537 disambiguations, 6371 queries
  modref clobber: 37429 disambiguations, 352194 queries
  119124 tbaa queries (0.338234 per modref query)
  21518 base compares (0.061097 per modref query)

PTA query stats:
  pt_solution_includes: 559296 disambiguations, 744735 queries
  pt_solutions_intersect: 154078 disambiguations, 422490 queries

Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  1554k (  0%)
 phase parsing                      :   4.56 (  4%)   0.60 ( 21%)   5.16 (  5%)   213M ( 20%)
 phase lang. deferred               :   4.29 (  4%)   0.29 ( 10%)   4.59 (  4%)   189M ( 17%)
 phase opt and generate             :  97.18 ( 91%)   2.00 ( 69%)  99.38 ( 91%)   687M ( 63%)
 phase finalize                     :   0.27 (  0%)   0.01 (  0%)   0.28 (  0%)     0  (  0%)
 |name lookup                       :   0.88 (  1%)   0.08 (  3%)   0.76 (  1%)    13M (  1%)
 |overload resolution               :   2.36 (  2%)   0.23 (  8%)   2.42 (  2%)   110M ( 10%)
 garbage collection                 :   3.26 (  3%)   0.01 (  0%)   3.27 (  3%)     0  (  0%)
 dump files                         :   0.29 (  0%)   0.02 (  1%)   0.49 (  0%)  2945k (  0%)
 callgraph construction             :   1.30 (  1%)   0.14 (  5%)   1.43 (  1%)    37M (  3%)
 callgraph optimization             :   0.73 (  1%)   0.03 (  1%)   0.83 (  1%)   154k (  0%)
 callgraph functions expansion      :  67.29 ( 63%)   0.69 ( 24%)  68.12 ( 62%)   362M ( 33%)
 callgraph ipa passes               :  26.35 ( 25%)   0.98 ( 34%)  27.39 ( 25%)   226M ( 21%)
 ipa function summary               :   0.23 (  0%)   0.00 (  0%)   0.23 (  0%)  4663k (  0%)
 ipa dead code removal              :   0.08 (  0%)   0.01 (  0%)   0.08 (  0%)    56  (  0%)
 ipa devirtualization               :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)  4944  (  0%)
 ipa cp                             :   0.55 (  1%)   0.01 (  0%)   0.52 (  0%)  4862k (  0%)
 ipa inlining heuristics            :   0.70 (  1%)   0.02 (  1%)   0.83 (  1%)    22M (  2%)
 ipa function splitting             :   0.19 (  0%)   0.01 (  0%)   0.18 (  0%)   705k (  0%)
 ipa comdats                        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 ipa reference                      :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 ipa profile                        :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 ipa pure const                     :   0.14 (  0%)   0.01 (  0%)   0.22 (  0%)   437k (  0%)
 ipa icf                            :   0.16 (  0%)   0.00 (  0%)   0.16 (  0%)    44k (  0%)
 ipa SRA                            :   0.24 (  0%)   0.01 (  0%)   0.22 (  0%)  6229k (  1%)
 ipa free lang data                 :   0.03 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 ipa free inline summary            :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)     0  (  0%)
 ipa modref                         :   0.11 (  0%)   0.00 (  0%)   0.11 (  0%)  1858k (  0%)
 cfg construction                   :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)  1187k (  0%)
 cfg cleanup                        :   0.45 (  0%)   0.01 (  0%)   0.35 (  0%)  1464k (  0%)
 CFG verifier                       :   4.15 (  4%)   0.18 (  6%)   4.37 (  4%)     0  (  0%)
 trivially dead code                :   0.10 (  0%)   0.00 (  0%)   0.09 (  0%)     0  (  0%)
 df scan insns                      :   0.29 (  0%)   0.00 (  0%)   0.23 (  0%)    43k (  0%)
 df reaching defs                   :   0.32 (  0%)   0.00 (  0%)   0.38 (  0%)     0  (  0%)
 df live regs                       :   1.02 (  1%)   0.00 (  0%)   0.96 (  1%)     0  (  0%)
 df live&initialized regs           :   0.35 (  0%)   0.00 (  0%)   0.28 (  0%)     0  (  0%)
 df must-initialized regs           :   0.05 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 df use-def / def-use chains        :   0.15 (  0%)   0.00 (  0%)   0.20 (  0%)     0  (  0%)
 df reg dead/unused notes           :   0.43 (  0%)   0.00 (  0%)   0.41 (  0%)  4205k (  0%)
 register information               :   0.08 (  0%)   0.00 (  0%)   0.12 (  0%)     0  (  0%)
 alias analysis                     :   0.29 (  0%)   0.00 (  0%)   0.37 (  0%)    11M (  1%)
 alias stmt walking                 :   4.25 (  4%)   0.04 (  1%)   4.36 (  4%)  1366k (  0%)
 register scan                      :   0.03 (  0%)   0.00 (  0%)   0.01 (  0%)   117k (  0%)
 rebuild jump labels                :   0.03 (  0%)   0.00 (  0%)   0.06 (  0%)     0  (  0%)
 preprocessing                      :   0.32 (  0%)   0.10 (  3%)   0.39 (  0%)  5611k (  1%)
 parser (global)                    :   0.60 (  1%)   0.23 (  8%)   0.76 (  1%)    57M (  5%)
 parser struct body                 :   0.73 (  1%)   0.03 (  1%)   0.69 (  1%)    37M (  3%)
 parser enumerator list             :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)   357k (  0%)
 parser function body               :   0.39 (  0%)   0.02 (  1%)   0.37 (  0%)  9857k (  1%)
 parser inl. func. body             :   0.24 (  0%)   0.01 (  0%)   0.25 (  0%)  5970k (  1%)
 parser inl. meth. body             :   0.43 (  0%)   0.11 (  4%)   0.71 (  1%)    25M (  2%)
 template instantiation             :   4.92 (  5%)   0.38 ( 13%)   5.35 (  5%)   261M ( 24%)
 constant expression evaluation     :   0.09 (  0%)   0.01 (  0%)   0.14 (  0%)  1563k (  0%)
 early inlining heuristics          :   0.33 (  0%)   0.01 (  0%)   0.35 (  0%)    10M (  1%)
 inline parameters                  :   0.53 (  0%)   0.05 (  2%)   0.71 (  1%)    15M (  1%)
 integration                        :   2.03 (  2%)   0.07 (  2%)   2.14 (  2%)   126M ( 12%)
 tree gimplify                      :   0.53 (  0%)   0.07 (  2%)   0.51 (  0%)    38M (  4%)
 tree eh                            :   0.15 (  0%)   0.00 (  0%)   0.22 (  0%)    10M (  1%)
 tree CFG construction              :   0.15 (  0%)   0.01 (  0%)   0.18 (  0%)    17M (  2%)
 tree CFG cleanup                   :   1.05 (  1%)   0.03 (  1%)   1.32 (  1%)   625k (  0%)
 tree tail merge                    :   0.05 (  0%)   0.00 (  0%)   0.07 (  0%)  1772k (  0%)
 tree VRP                           :   2.20 (  2%)   0.02 (  1%)   2.26 (  2%)    14M (  1%)
 tree Early VRP                     :   1.53 (  1%)   0.06 (  2%)   1.46 (  1%)    12M (  1%)
 tree copy propagation              :   0.31 (  0%)   0.00 (  0%)   0.26 (  0%)   188k (  0%)
 tree PTA                           :   1.90 (  2%)   0.05 (  2%)   2.16 (  2%)  5309k (  0%)
 tree PHI insertion                 :   0.03 (  0%)   0.00 (  0%)   0.04 (  0%)  1826k (  0%)
 tree SSA rewrite                   :   0.46 (  0%)   0.01 (  0%)   0.45 (  0%)    15M (  1%)
 tree SSA other                     :   0.12 (  0%)   0.02 (  1%)   0.16 (  0%)  1597k (  0%)
 tree SSA incremental               :   0.42 (  0%)   0.03 (  1%)   0.43 (  0%)  3434k (  0%)
 tree operand scan                  :   0.82 (  1%)   0.05 (  2%)   0.77 (  1%)    43M (  4%)
 dominator optimization             :   2.53 (  2%)   0.04 (  1%)   2.60 (  2%)    12M (  1%)
 backwards jump threading           :   0.13 (  0%)   0.00 (  0%)   0.14 (  0%)   347k (  0%)
 tree SRA                           :   0.26 (  0%)   0.00 (  0%)   0.25 (  0%)  1084k (  0%)
 isolate eroneous paths             :   0.03 (  0%)   0.00 (  0%)   0.06 (  0%)  1584  (  0%)
 tree CCP                           :   1.43 (  1%)   0.03 (  1%)   1.74 (  2%)  4451k (  0%)
 tree split crit edges              :   0.04 (  0%)   0.00 (  0%)   0.05 (  0%)  1578k (  0%)
 tree reassociation                 :   0.08 (  0%)   0.00 (  0%)   0.07 (  0%)    18k (  0%)
 tree PRE                           :   2.07 (  2%)   0.03 (  1%)   1.87 (  2%)    11M (  1%)
 tree FRE                           :   2.57 (  2%)   0.02 (  1%)   2.82 (  3%)  6890k (  1%)
 tree code sinking                  :   0.06 (  0%)   0.00 (  0%)   0.09 (  0%)  1307k (  0%)
 tree linearize phis                :   0.17 (  0%)   0.00 (  0%)   0.20 (  0%)   993k (  0%)
 tree backward propagate            :   0.01 (  0%)   0.00 (  0%)   0.05 (  0%)     0  (  0%)
 tree forward propagate             :   0.93 (  1%)   0.04 (  1%)   0.80 (  1%)  2244k (  0%)
 tree phiprop                       :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)    19k (  0%)
 tree conservative DCE              :   0.24 (  0%)   0.00 (  0%)   0.23 (  0%)   449k (  0%)
 tree aggressive DCE                :   0.39 (  0%)   0.00 (  0%)   0.39 (  0%)    12M (  1%)
 tree buildin call DCE              :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)  3504  (  0%)
 tree DSE                           :   1.16 (  1%)   0.00 (  0%)   1.08 (  1%)   339k (  0%)
 PHI merge                          :   0.05 (  0%)   0.00 (  0%)   0.08 (  0%)   128k (  0%)
 tree loop optimization             :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 tree loop invariant motion         :   0.32 (  0%)   0.00 (  0%)   0.25 (  0%)   117k (  0%)
 tree canonical iv                  :   0.24 (  0%)   0.00 (  0%)   0.14 (  0%)  2168k (  0%)
 scev constant prop                 :   0.03 (  0%)   0.00 (  0%)   0.09 (  0%)   537k (  0%)
 complete unrolling                 :   1.56 (  1%)   0.01 (  0%)   1.32 (  1%)    14M (  1%)
 tree loop distribution             :   0.33 (  0%)   0.00 (  0%)   0.35 (  0%)  3488k (  0%)
 tree iv optimization               :   1.42 (  1%)   0.01 (  0%)   1.38 (  1%)    19M (  2%)
 tree copy headers                  :   0.11 (  0%)   0.00 (  0%)   0.10 (  0%)  1438k (  0%)
 tree SSA uncprop                   :   0.05 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 tree NRV optimization              :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)  1824  (  0%)
 tree SSA verifier                  :  11.24 ( 11%)   0.29 ( 10%)  12.00 ( 11%)     0  (  0%)
 tree STMT verifier                 :  17.17 ( 16%)   0.27 (  9%)  16.90 ( 15%)     0  (  0%)
 tree switch conversion             :   0.04 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 tree switch lowering               :   0.04 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 gimple CSE sin/cos                 :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 gimple widening/fma detection      :   0.05 (  0%)   0.00 (  0%)   0.01 (  0%)    13k (  0%)
 tree strlen optimization           :   0.63 (  1%)   0.00 (  0%)   0.55 (  1%)  4021k (  0%)
 tree modref                        :   0.24 (  0%)   0.00 (  0%)   0.33 (  0%)  5652k (  1%)
 callgraph verifier                 :   0.58 (  1%)   0.02 (  1%)   0.53 (  0%)     0  (  0%)
 dominance frontiers                :   0.00 (  0%)   0.00 (  0%)   0.07 (  0%)     0  (  0%)
 dominance computation              :   1.19 (  1%)   0.00 (  0%)   1.36 (  1%)     0  (  0%)
 control dependences                :   0.02 (  0%)   0.00 (  0%)   0.06 (  0%)     0  (  0%)
 out of ssa                         :   0.11 (  0%)   0.00 (  0%)   0.10 (  0%)    60k (  0%)
 expand vars                        :   0.19 (  0%)   0.00 (  0%)   0.09 (  0%)  3108k (  0%)
 expand                             :   0.59 (  1%)   0.00 (  0%)   0.68 (  1%)    30M (  3%)
 post expand cleanups               :   0.08 (  0%)   0.00 (  0%)   0.03 (  0%)  2280k (  0%)
 varconst                           :   0.03 (  0%)   0.00 (  0%)   0.01 (  0%)    16k (  0%)
 lower subreg                       :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)    21k (  0%)
 jump                               :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 forward prop                       :   0.59 (  1%)   0.00 (  0%)   0.69 (  1%)   284k (  0%)
 CSE                                :   0.53 (  0%)   0.01 (  0%)   0.59 (  1%)  1772k (  0%)
 dead code elimination              :   0.11 (  0%)   0.00 (  0%)   0.05 (  0%)     0  (  0%)
 dead store elim1                   :   0.19 (  0%)   0.00 (  0%)   0.17 (  0%)  2756k (  0%)
 dead store elim2                   :   0.36 (  0%)   0.00 (  0%)   0.34 (  0%)  3976k (  0%)
 loop analysis                      :   0.04 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 loop init                          :   1.85 (  2%)   0.02 (  1%)   1.81 (  2%)    25M (  2%)
 loop invariant motion              :   0.10 (  0%)   0.00 (  0%)   0.08 (  0%)   145k (  0%)
 loop fini                          :   0.05 (  0%)   0.01 (  0%)   0.05 (  0%)    14k (  0%)
 CPROP                              :   0.51 (  0%)   0.01 (  0%)   0.64 (  1%)  4218k (  0%)
 PRE                                :   0.47 (  0%)   0.00 (  0%)   0.56 (  1%)  1545k (  0%)
 CSE 2                              :   0.25 (  0%)   0.00 (  0%)   0.38 (  0%)   736k (  0%)
 branch prediction                  :   0.37 (  0%)   0.03 (  1%)   0.34 (  0%)  3272k (  0%)
 combiner                           :   0.84 (  1%)   0.01 (  0%)   0.71 (  1%)  7613k (  1%)
 if-conversion                      :   0.05 (  0%)   0.00 (  0%)   0.04 (  0%)   312k (  0%)
 integrated RA                      :   2.17 (  2%)   0.03 (  1%)   2.15 (  2%)    44M (  4%)
 LRA non-specific                   :   0.54 (  1%)   0.00 (  0%)   0.54 (  0%)  4028k (  0%)
 LRA virtuals elimination           :   0.16 (  0%)   0.01 (  0%)   0.14 (  0%)  2129k (  0%)
 LRA reload inheritance             :   0.13 (  0%)   0.00 (  0%)   0.14 (  0%)   391k (  0%)
 LRA create live ranges             :   0.49 (  0%)   0.00 (  0%)   0.60 (  1%)   417k (  0%)
 LRA hard reg assignment            :   0.10 (  0%)   0.00 (  0%)   0.09 (  0%)     0  (  0%)
 LRA rematerialization              :   0.15 (  0%)   0.00 (  0%)   0.07 (  0%)  2128  (  0%)
 reload                             :   0.00 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 reload CSE regs                    :   0.57 (  1%)   0.00 (  0%)   0.68 (  1%)  4376k (  0%)
 ree                                :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)    48k (  0%)
 thread pro- & epilogue             :   0.44 (  0%)   0.01 (  0%)   0.29 (  0%)  2591k (  0%)
 if-conversion 2                    :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)    11k (  0%)
 combine stack adjustments          :   0.04 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 peephole 2                         :   0.08 (  0%)   0.00 (  0%)   0.07 (  0%)   479k (  0%)
 hard reg cprop                     :   0.18 (  0%)   0.01 (  0%)   0.16 (  0%)    36k (  0%)
 scheduling 2                       :   1.49 (  1%)   0.00 (  0%)   1.37 (  1%)  1902k (  0%)
 machine dep reorg                  :   0.08 (  0%)   0.00 (  0%)   0.07 (  0%)     0  (  0%)
 reorder blocks                     :   0.16 (  0%)   0.00 (  0%)   0.17 (  0%)  1340k (  0%)
 shorten branches                   :   0.05 (  0%)   0.00 (  0%)   0.06 (  0%)     0  (  0%)
 final                              :   0.13 (  0%)   0.01 (  0%)   0.18 (  0%)  6736k (  1%)
 variable output                    :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   196k (  0%)
 symout                             :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 tree if-combine                    :   0.00 (  0%)   0.00 (  0%)   0.03 (  0%)    17k (  0%)
 if to switch conversion            :   0.04 (  0%)   0.00 (  0%)   0.06 (  0%)     0  (  0%)
 straight-line strength reduction   :   0.12 (  0%)   0.00 (  0%)   0.07 (  0%)    56k (  0%)
 store merging                      :   0.38 (  0%)   0.00 (  0%)   0.29 (  0%)  1107k (  0%)
 initialize rtl                     :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)    12k (  0%)
 address lowering                   :   0.07 (  0%)   0.01 (  0%)   0.12 (  0%)  1586k (  0%)
 early local passes                 :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 unaccounted optimizations          :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 rest of compilation                :   0.82 (  1%)   0.02 (  1%)   0.94 (  1%)  4036k (  0%)
 remove unused locals               :   0.35 (  0%)   0.01 (  0%)   0.26 (  0%)    30k (  0%)
 address taken                      :   0.22 (  0%)   0.02 (  1%)   0.26 (  0%)     0  (  0%)
 verify loop closed                 :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 verify RTL sharing                 :   1.71 (  2%)   0.04 (  1%)   1.54 (  1%)     0  (  0%)
 rebuild frequencies                :   0.02 (  0%)   0.00 (  0%)   0.05 (  0%)    31k (  0%)
 repair loop structures             :   0.17 (  0%)   0.00 (  0%)   0.19 (  0%)  9456  (  0%)
 TOTAL                              : 106.30          2.90        109.42         1092M
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
$ keep_blocks=1 /build/gcc-master/gcc/xg++ -B /build/gcc-master/gcc -nostdinc++ -I /build/gcc-master/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I /build/gcc-master/x86_64-pc-linux-gnu/libstdc++-v3/include -I /src/gcc/master/libstdc++-v3/libsupc++ -I /src/gcc/master/libstdc++-v3/include/backward -I /src/gcc/master/libstdc++-v3/testsuite/util -L /build/gcc-master/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -O2 -c -fdump-tree-cfg=tramp3d-v4.keep_blocks.cfg /src/tramp3d-v4.cpp -fmem-report -ftime-report

################################################################################
# Final                                                                        #
################################################################################

Number of expanded macros:                     18928
Average number of tokens per macro expansion:      7

Line Table allocations during the compilation process
Number of ordinary maps used:          915 
Ordinary map used size:                 21k
Number of ordinary maps allocated:    1365 
Ordinary maps allocated size:           31k
Number of macro maps used:              16k
Macro maps used size:                  525k
Macro maps locations size:            1124k
Macro maps size:                      1650k
Duplicated maps locations size:        378k
Total allocated maps size:            3204k
Total used maps size:                 1671k
Ad-hoc table size:                      12M
Ad-hoc table entries used:             473k
optimized_ranges:                     1171k
unoptimized_ranges:                    134k

Memory still allocated at the end of the compilation process
Size      Allocated        Used    Overhead
8               116k         95k       3480 
16             4056k       2271k         87k
32               23M         11M        417k
64               10M       8427k        167k
256              30M         25M        424k
512            1504k       1166k         20k
1024           3852k       1708k         52k
2048           5040k       4986k         68k
4096            148k        148k       2072 
8192             48k         48k        336 
16384            64k         64k        224 
32768           128k        128k        224 
65536           320k        320k        280 
131072          128k        128k         56 
262144         1024k       1024k        224 
524288          512k        512k         56 
1048576        2048k       2048k        112 
2097152        2048k       2048k         56 
16777216         16M         16M         56 
24               11M       5294k        209k
40               26M         17M        416k
48               14M       6606k        236k
56             4828k       1825k         75k
72             3288k        775k         44k
80              600k        217k       8400 
88              360k        177k       5040 
96               12M       7050k        174k
112            3492k       2075k         47k
120            6172k       4542k         84k
152              17M         15M        238k
128              30M         26M        429k
144            6516k       1881k         89k
168              43M         40M        602k
184            3340k       2045k         45k
104            1164k       1036k         15k
272            3520k        904k         48k
280             172k        103k       2408 
Total           288M        211M       4019k

String pool
entries:                        79453
identifiers:                    34933 (43.97%)
slots:                          131072
deleted:                        36683
GGC bytes:                      2620k
table size:                     1024k
coll/search:                    0.8234
ins/search:                     0.1222
avg. entry:                     33.77 bytes (+/- 71.45)
longest entry:                  496
(No per-node statistics)
Type hash: size 131071, 67199 elements, 1.112279 collisions
DECL_DEBUG_EXPR  hash: size 1021, 0 elements, 0.859474 collisions
DECL_VALUE_EXPR  hash: size 1021, 30 elements, 0.148597 collisions
decl_specializations: size 131071, 50624 elements, 1.386344 collisions
type_specializations: size 32749, 23184 elements, 2.504205 collisions
No GIMPLE statistics
No RTX statistics

--------------------------------------------------------------------------------------------------------------------------------------------
Heap vectors                                      sizeof(T)       Leak            Peak     Times       Leak items Peak items
--------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------
Heap vectors                                      sizeof(T)       Leak            Peak     Times       Leak items Peak items
--------------------------------------------------------------------------------------------------------------------------------------------
Total                                                               0                         0                0 
--------------------------------------------------------------------------------------------------------------------------------------------


Alias oracle query stats:
  refs_may_alias_p: 2798174 disambiguations, 3068078 queries
  ref_maybe_used_by_call_p: 23818 disambiguations, 2836845 queries
  call_may_clobber_ref_p: 2754 disambiguations, 2764 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 3493 queries
  nonoverlapping_refs_since_match_p: 376 disambiguations, 9008 must overlaps, 9415 queries
  aliasing_component_refs_p: 807 disambiguations, 30842 queries
  TBAA oracle: 1041789 disambiguations 1976625 queries
               189831 are in alias set 0
               513994 queries asked about the same object
               0 queries asked about the same alias set
               0 access volatile
               230715 are dependent in the DAG
               296 are aritificially in conflict with void *

Modref stats:
  modref use: 537 disambiguations, 6371 queries
  modref clobber: 37429 disambiguations, 352194 queries
  119124 tbaa queries (0.338234 per modref query)
  21518 base compares (0.061097 per modref query)

PTA query stats:
  pt_solution_includes: 559296 disambiguations, 744735 queries
  pt_solutions_intersect: 154078 disambiguations, 422490 queries

Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  1554k (  0%)
 phase parsing                      :   4.52 (  4%)   0.60 ( 21%)   5.12 (  5%)   213M ( 19%)
 phase lang. deferred               :   4.33 (  4%)   0.28 ( 10%)   4.63 (  4%)   189M ( 17%)
 phase opt and generate             :  98.18 ( 91%)   2.00 ( 69%) 100.38 ( 91%)   692M ( 63%)
 phase last asm                     :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    32k (  0%)
 phase finalize                     :   0.29 (  0%)   0.01 (  0%)   0.29 (  0%)     0  (  0%)
 |name lookup                       :   0.73 (  1%)   0.04 (  1%)   0.68 (  1%)    13M (  1%)
 |overload resolution               :   2.29 (  2%)   0.22 (  8%)   2.39 (  2%)   110M ( 10%)
 garbage collection                 :   3.34 (  3%)   0.02 (  1%)   3.32 (  3%)     0  (  0%)
 dump files                         :   0.50 (  0%)   0.05 (  2%)   0.43 (  0%)  2945k (  0%)
 callgraph construction             :   1.26 (  1%)   0.11 (  4%)   1.39 (  1%)    37M (  3%)
 callgraph optimization             :   0.80 (  1%)   0.06 (  2%)   0.81 (  1%)   154k (  0%)
 callgraph functions expansion      :  68.01 ( 63%)   0.68 ( 24%)  68.84 ( 62%)   365M ( 33%)
 callgraph ipa passes               :  26.57 ( 25%)   1.03 ( 36%)  27.65 ( 25%)   228M ( 21%)
 ipa function summary               :   0.26 (  0%)   0.01 (  0%)   0.21 (  0%)  4663k (  0%)
 ipa dead code removal              :   0.10 (  0%)   0.00 (  0%)   0.11 (  0%)    56  (  0%)
 ipa inheritance graph              :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    20k (  0%)
 ipa virtual call target            :   0.00 (  0%)   0.00 (  0%)   0.03 (  0%)  3264  (  0%)
 ipa cp                             :   0.55 (  1%)   0.00 (  0%)   0.48 (  0%)  4862k (  0%)
 ipa inlining heuristics            :   0.80 (  1%)   0.00 (  0%)   0.81 (  1%)    22M (  2%)
 ipa function splitting             :   0.20 (  0%)   0.01 (  0%)   0.24 (  0%)   724k (  0%)
 ipa comdats                        :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 ipa reference                      :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 ipa profile                        :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 ipa pure const                     :   0.23 (  0%)   0.01 (  0%)   0.14 (  0%)   437k (  0%)
 ipa icf                            :   0.17 (  0%)   0.00 (  0%)   0.17 (  0%)    44k (  0%)
 ipa SRA                            :   0.24 (  0%)   0.00 (  0%)   0.24 (  0%)  6229k (  1%)
 ipa free lang data                 :   0.05 (  0%)   0.00 (  0%)   0.04 (  0%)     0  (  0%)
 ipa free inline summary            :   0.03 (  0%)   0.01 (  0%)   0.03 (  0%)     0  (  0%)
 ipa modref                         :   0.11 (  0%)   0.00 (  0%)   0.10 (  0%)  1858k (  0%)
 cfg construction                   :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)  1187k (  0%)
 cfg cleanup                        :   0.36 (  0%)   0.00 (  0%)   0.53 (  0%)  1464k (  0%)
 CFG verifier                       :   3.90 (  4%)   0.04 (  1%)   4.71 (  4%)     0  (  0%)
 trivially dead code                :   0.12 (  0%)   0.00 (  0%)   0.16 (  0%)     0  (  0%)
 df scan insns                      :   0.31 (  0%)   0.02 (  1%)   0.22 (  0%)    43k (  0%)
 df reaching defs                   :   0.31 (  0%)   0.00 (  0%)   0.30 (  0%)     0  (  0%)
 df live regs                       :   0.89 (  1%)   0.01 (  0%)   0.95 (  1%)     0  (  0%)
 df live&initialized regs           :   0.48 (  0%)   0.00 (  0%)   0.33 (  0%)     0  (  0%)
 df must-initialized regs           :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 df use-def / def-use chains        :   0.25 (  0%)   0.00 (  0%)   0.16 (  0%)     0  (  0%)
 df reg dead/unused notes           :   0.43 (  0%)   0.00 (  0%)   0.47 (  0%)  4205k (  0%)
 register information               :   0.15 (  0%)   0.00 (  0%)   0.12 (  0%)     0  (  0%)
 alias analysis                     :   0.39 (  0%)   0.01 (  0%)   0.40 (  0%)    11M (  1%)
 alias stmt walking                 :   4.37 (  4%)   0.06 (  2%)   4.22 (  4%)  1366k (  0%)
 register scan                      :   0.05 (  0%)   0.00 (  0%)   0.03 (  0%)   373k (  0%)
 rebuild jump labels                :   0.05 (  0%)   0.00 (  0%)   0.04 (  0%)     0  (  0%)
 preprocessing                      :   0.29 (  0%)   0.14 (  5%)   0.52 (  0%)  5611k (  0%)
 parser (global)                    :   0.53 (  0%)   0.13 (  4%)   0.77 (  1%)    57M (  5%)
 parser struct body                 :   0.76 (  1%)   0.03 (  1%)   0.84 (  1%)    37M (  3%)
 parser enumerator list             :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)   357k (  0%)
 parser function body               :   0.35 (  0%)   0.08 (  3%)   0.30 (  0%)  9857k (  1%)
 parser inl. func. body             :   0.21 (  0%)   0.02 (  1%)   0.27 (  0%)  5970k (  1%)
 parser inl. meth. body             :   0.56 (  1%)   0.05 (  2%)   0.58 (  1%)    25M (  2%)
 template instantiation             :   4.89 (  5%)   0.41 ( 14%)   5.22 (  5%)   261M ( 24%)
 constant expression evaluation     :   0.12 (  0%)   0.01 (  0%)   0.13 (  0%)  1563k (  0%)
 early inlining heuristics          :   0.28 (  0%)   0.00 (  0%)   0.29 (  0%)    10M (  1%)
 inline parameters                  :   0.55 (  1%)   0.03 (  1%)   0.70 (  1%)    15M (  1%)
 integration                        :   1.70 (  2%)   0.09 (  3%)   2.05 (  2%)   130M ( 12%)
 tree gimplify                      :   0.63 (  1%)   0.04 (  1%)   0.64 (  1%)    38M (  4%)
 tree eh                            :   0.27 (  0%)   0.01 (  0%)   0.11 (  0%)    10M (  1%)
 tree CFG construction              :   0.10 (  0%)   0.02 (  1%)   0.18 (  0%)    17M (  2%)
 tree CFG cleanup                   :   1.23 (  1%)   0.00 (  0%)   1.46 (  1%)   633k (  0%)
 tree tail merge                    :   0.17 (  0%)   0.00 (  0%)   0.12 (  0%)  1772k (  0%)
 tree VRP                           :   2.34 (  2%)   0.01 (  0%)   2.53 (  2%)    14M (  1%)
 tree Early VRP                     :   1.63 (  2%)   0.08 (  3%)   1.44 (  1%)    12M (  1%)
 tree copy propagation              :   0.39 (  0%)   0.00 (  0%)   0.34 (  0%)   189k (  0%)
 tree PTA                           :   1.71 (  2%)   0.06 (  2%)   1.81 (  2%)  5305k (  0%)
 tree PHI insertion                 :   0.04 (  0%)   0.00 (  0%)   0.06 (  0%)  1826k (  0%)
 tree SSA rewrite                   :   0.42 (  0%)   0.02 (  1%)   0.42 (  0%)    15M (  1%)
 tree SSA other                     :   0.23 (  0%)   0.00 (  0%)   0.17 (  0%)  1597k (  0%)
 tree SSA incremental               :   0.50 (  0%)   0.00 (  0%)   0.54 (  0%)  3429k (  0%)
 tree operand scan                  :   0.60 (  1%)   0.01 (  0%)   0.65 (  1%)    43M (  4%)
 dominator optimization             :   2.56 (  2%)   0.04 (  1%)   2.72 (  2%)    12M (  1%)
 backwards jump threading           :   0.15 (  0%)   0.00 (  0%)   0.21 (  0%)   347k (  0%)
 tree SRA                           :   0.24 (  0%)   0.00 (  0%)   0.14 (  0%)  1084k (  0%)
 isolate eroneous paths             :   0.03 (  0%)   0.00 (  0%)   0.02 (  0%)  1584  (  0%)
 tree CCP                           :   1.54 (  1%)   0.02 (  1%)   1.29 (  1%)  4451k (  0%)
 tree split crit edges              :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)  1578k (  0%)
 tree reassociation                 :   0.08 (  0%)   0.00 (  0%)   0.06 (  0%)    18k (  0%)
 tree PRE                           :   1.87 (  2%)   0.04 (  1%)   2.22 (  2%)    11M (  1%)
 tree FRE                           :   2.53 (  2%)   0.02 (  1%)   2.66 (  2%)  6890k (  1%)
 tree code sinking                  :   0.13 (  0%)   0.00 (  0%)   0.13 (  0%)  1307k (  0%)
 tree linearize phis                :   0.18 (  0%)   0.00 (  0%)   0.11 (  0%)   993k (  0%)
 tree backward propagate            :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 tree forward propagate             :   0.76 (  1%)   0.01 (  0%)   0.90 (  1%)  2245k (  0%)
 tree phiprop                       :   0.04 (  0%)   0.00 (  0%)   0.02 (  0%)    19k (  0%)
 tree conservative DCE              :   0.29 (  0%)   0.01 (  0%)   0.38 (  0%)   449k (  0%)
 tree aggressive DCE                :   0.39 (  0%)   0.01 (  0%)   0.36 (  0%)    12M (  1%)
 tree buildin call DCE              :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)  3504  (  0%)
 tree DSE                           :   1.04 (  1%)   0.00 (  0%)   1.16 (  1%)   339k (  0%)
 PHI merge                          :   0.06 (  0%)   0.00 (  0%)   0.11 (  0%)   128k (  0%)
 tree loop optimization             :   0.07 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 tree loop invariant motion         :   0.32 (  0%)   0.00 (  0%)   0.33 (  0%)   117k (  0%)
 tree canonical iv                  :   0.19 (  0%)   0.00 (  0%)   0.18 (  0%)  2167k (  0%)
 scev constant prop                 :   0.10 (  0%)   0.00 (  0%)   0.04 (  0%)   537k (  0%)
 complete unrolling                 :   1.43 (  1%)   0.02 (  1%)   1.44 (  1%)    14M (  1%)
 tree loop distribution             :   0.38 (  0%)   0.00 (  0%)   0.35 (  0%)  3488k (  0%)
 tree iv optimization               :   1.36 (  1%)   0.02 (  1%)   1.36 (  1%)    19M (  2%)
 tree copy headers                  :   0.09 (  0%)   0.00 (  0%)   0.07 (  0%)  1438k (  0%)
 tree SSA uncprop                   :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)     0  (  0%)
 tree NRV optimization              :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)  1824  (  0%)
 tree SSA verifier                  :  11.82 ( 11%)   0.24 (  8%)  12.05 ( 11%)     0  (  0%)
 tree STMT verifier                 :  17.14 ( 16%)   0.47 ( 16%)  16.96 ( 15%)     0  (  0%)
 tree switch conversion             :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)     0  (  0%)
 tree switch lowering               :   0.03 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 gimple CSE sin/cos                 :   0.01 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 gimple widening/fma detection      :   0.03 (  0%)   0.00 (  0%)   0.05 (  0%)    13k (  0%)
 tree strlen optimization           :   0.60 (  1%)   0.01 (  0%)   0.61 (  1%)  4021k (  0%)
 tree modref                        :   0.35 (  0%)   0.02 (  1%)   0.27 (  0%)  5652k (  1%)
 callgraph verifier                 :   0.61 (  1%)   0.03 (  1%)   0.59 (  1%)     0  (  0%)
 dominance frontiers                :   0.01 (  0%)   0.00 (  0%)   0.05 (  0%)     0  (  0%)
 dominance computation              :   1.20 (  1%)   0.09 (  3%)   1.41 (  1%)     0  (  0%)
 control dependences                :   0.03 (  0%)   0.00 (  0%)   0.04 (  0%)     0  (  0%)
 out of ssa                         :   0.12 (  0%)   0.00 (  0%)   0.12 (  0%)    59k (  0%)
 expand vars                        :   0.08 (  0%)   0.00 (  0%)   0.12 (  0%)  2916k (  0%)
 expand                             :   0.66 (  1%)   0.01 (  0%)   0.64 (  1%)    30M (  3%)
 post expand cleanups               :   0.02 (  0%)   0.00 (  0%)   0.04 (  0%)  2280k (  0%)
 varconst                           :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)    16k (  0%)
 lower subreg                       :   0.03 (  0%)   0.00 (  0%)   0.01 (  0%)    21k (  0%)
 jump                               :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 forward prop                       :   0.55 (  1%)   0.00 (  0%)   0.72 (  1%)   284k (  0%)
 CSE                                :   0.46 (  0%)   0.00 (  0%)   0.45 (  0%)  1772k (  0%)
 dead code elimination              :   0.08 (  0%)   0.00 (  0%)   0.10 (  0%)     0  (  0%)
 dead store elim1                   :   0.26 (  0%)   0.00 (  0%)   0.21 (  0%)  2756k (  0%)
 dead store elim2                   :   0.39 (  0%)   0.00 (  0%)   0.35 (  0%)  3977k (  0%)
 loop analysis                      :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 loop init                          :   1.83 (  2%)   0.04 (  1%)   1.90 (  2%)    25M (  2%)
 loop invariant motion              :   0.07 (  0%)   0.00 (  0%)   0.10 (  0%)   145k (  0%)
 loop fini                          :   0.09 (  0%)   0.00 (  0%)   0.06 (  0%)    16k (  0%)
 CPROP                              :   0.67 (  1%)   0.00 (  0%)   0.50 (  0%)  4218k (  0%)
 PRE                                :   0.56 (  1%)   0.00 (  0%)   0.54 (  0%)  1545k (  0%)
 CSE 2                              :   0.33 (  0%)   0.00 (  0%)   0.29 (  0%)   736k (  0%)
 branch prediction                  :   0.36 (  0%)   0.02 (  1%)   0.37 (  0%)  3272k (  0%)
 combiner                           :   0.68 (  1%)   0.01 (  0%)   0.85 (  1%)  7613k (  1%)
 if-conversion                      :   0.08 (  0%)   0.00 (  0%)   0.04 (  0%)   312k (  0%)
 integrated RA                      :   2.03 (  2%)   0.01 (  0%)   2.05 (  2%)    44M (  4%)
 LRA non-specific                   :   0.55 (  1%)   0.00 (  0%)   0.57 (  1%)  4029k (  0%)
 LRA virtuals elimination           :   0.21 (  0%)   0.00 (  0%)   0.16 (  0%)  2129k (  0%)
 LRA reload inheritance             :   0.09 (  0%)   0.00 (  0%)   0.09 (  0%)   391k (  0%)
 LRA create live ranges             :   0.53 (  0%)   0.00 (  0%)   0.53 (  0%)   417k (  0%)
 LRA hard reg assignment            :   0.06 (  0%)   0.00 (  0%)   0.16 (  0%)     0  (  0%)
 LRA rematerialization              :   0.06 (  0%)   0.00 (  0%)   0.05 (  0%)  2128  (  0%)
 reload                             :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 reload CSE regs                    :   0.69 (  1%)   0.00 (  0%)   0.59 (  1%)  4376k (  0%)
 ree                                :   0.02 (  0%)   0.01 (  0%)   0.05 (  0%)    47k (  0%)
 thread pro- & epilogue             :   0.31 (  0%)   0.00 (  0%)   0.47 (  0%)  2591k (  0%)
 if-conversion 2                    :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)    11k (  0%)
 combine stack adjustments          :   0.05 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 peephole 2                         :   0.05 (  0%)   0.00 (  0%)   0.07 (  0%)   479k (  0%)
 hard reg cprop                     :   0.23 (  0%)   0.00 (  0%)   0.17 (  0%)    36k (  0%)
 scheduling 2                       :   1.32 (  1%)   0.02 (  1%)   1.48 (  1%)  1897k (  0%)
 machine dep reorg                  :   0.08 (  0%)   0.00 (  0%)   0.12 (  0%)     0  (  0%)
 reorder blocks                     :   0.15 (  0%)   0.00 (  0%)   0.11 (  0%)  1340k (  0%)
 shorten branches                   :   0.13 (  0%)   0.01 (  0%)   0.06 (  0%)     0  (  0%)
 final                              :   0.19 (  0%)   0.00 (  0%)   0.22 (  0%)  6736k (  1%)
 variable output                    :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   196k (  0%)
 symout                             :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 tree if-combine                    :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)    17k (  0%)
 if to switch conversion            :   0.07 (  0%)   0.00 (  0%)   0.04 (  0%)     0  (  0%)
 straight-line strength reduction   :   0.11 (  0%)   0.00 (  0%)   0.07 (  0%)    56k (  0%)
 store merging                      :   0.38 (  0%)   0.01 (  0%)   0.26 (  0%)  1107k (  0%)
 initialize rtl                     :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)    12k (  0%)
 address lowering                   :   0.07 (  0%)   0.00 (  0%)   0.08 (  0%)  1586k (  0%)
 early local passes                 :   0.04 (  0%)   0.01 (  0%)   0.05 (  0%)     0  (  0%)
 unaccounted optimizations          :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 rest of compilation                :   0.99 (  1%)   0.01 (  0%)   1.00 (  1%)  4036k (  0%)
 unaccounted late compilation       :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 remove unused locals               :   0.21 (  0%)   0.01 (  0%)   0.31 (  0%)    30k (  0%)
 address taken                      :   0.20 (  0%)   0.00 (  0%)   0.25 (  0%)     0  (  0%)
 verify loop closed                 :   0.07 (  0%)   0.00 (  0%)   0.05 (  0%)     0  (  0%)
 verify RTL sharing                 :   1.63 (  2%)   0.01 (  0%)   1.62 (  1%)     0  (  0%)
 rebuild frequencies                :   0.07 (  0%)   0.00 (  0%)   0.06 (  0%)    31k (  0%)
 repair loop structures             :   0.14 (  0%)   0.00 (  0%)   0.15 (  0%)  9456  (  0%)
 TOTAL                              : 107.32          2.89        110.44         1097M
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
tmp$
Richard Biener Jan. 18, 2021, 1:25 p.m. UTC | #3
On Sun, Jan 17, 2021 at 1:46 AM Martin Sebor <msebor@gmail.com> wrote:
>
> On 1/15/21 12:44 AM, Richard Biener wrote:
> > On Thu, Jan 14, 2021 at 8:13 PM Martin Sebor via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> One aspect of PR 98465 - Bogus warning stringop-overread for std::string
> >> is the inconsistency between -g and -g0 which turns out to be due to
> >> GCC eliminating apparently unused scope blocks from inlined functions
> >> that aren't explicitly declared inline and artificial.  PR 98664 tracks
> >> just this part of PR 98465.
> >>
> >> To resolve just the PR 98664 subset the attached change has
> >> the tree-ssa-live.c pass preserve these blocks for all inlined
> >> functions, not just artificial ones.  Besides avoiding the interaction
> >> between -g and warnings it also seems to improve the inlining context
> >> by including more inlined call sites.  This can be seen in the adjusted
> >> tests.  (Its effect on PR 98465 is that the false positive is issued
> >> consistently, regardless of -g.  Avoiding the false positive is my
> >> next step.)
> >>
> >> Jakub, you raised a concern yesterday in PR 98465 c#13 about the memory
> >> footprint of this change.  Can you please comment on whether it's in
> >> line with what you were suggesting?
> >
> >       {
> >         tree ao = BLOCK_ABSTRACT_ORIGIN (block);
> > -      if (TREE_CODE (ao) == FUNCTION_DECL)
> > -       loc = BLOCK_SOURCE_LOCATION (block);
> > -      else if (TREE_CODE (ao) != BLOCK)
> > -       break;
> > +       if (TREE_CODE (ao) == FUNCTION_DECL)
> > +        loc = BLOCK_SOURCE_LOCATION (block);
> > +       else if (TREE_CODE (ao) != BLOCK)
> > +        break;
> >
> > you are replacing tabs with spaces?
> >
> > @@ -558,16 +558,13 @@ remove_unused_scope_block_p (tree scope, bool
> > in_ctor_dtor_block)
> >      else if (!flag_auto_profile && debug_info_level == DINFO_LEVEL_NONE
> >              && !optinfo_wants_inlining_info_p ())
> >        {
> > -       /* Even for -g0 don't prune outer scopes from artificial
> > -         functions, otherwise diagnostics using tree_nonartificial_location
> > -         will not be emitted properly.  */
> > +       /* Even for -g0 don't prune outer scopes from inlined functions,
> > +         otherwise late diagnostics from such functions will not be
> > +         emitted or suppressed properly.  */
> >          if (inlined_function_outer_scope_p (scope))
> >           {
> >             tree ao = BLOCK_ORIGIN (scope);
> > -          if (ao
> > -              && TREE_CODE (ao) == FUNCTION_DECL
> > -              && DECL_DECLARED_INLINE_P (ao)
> > -              && lookup_attribute ("artificial", DECL_ATTRIBUTES (ao)))
> > +          if (ao && TREE_CODE (ao) == FUNCTION_DECL)
> >               unused = false;
> >           }
> >        }
> >
> > so which inlined_function_outer_scope_p are you _not_ marking now?
> > BLOCK_ORIGIN is never NULL and all inlined scopes should have
> > an abstract origin - I believe always a FUNCTION_DECL.  Which means
> > you could have simplified it further?
>
> Quite possibly.  I could find no documentation for these macros so
> I tried to keep my changes conservative.  I did put together some
> notes to document what I saw the macros evaluate to in my testing
> (below).  If/when it's close to accurate I'd like to add them to
> tree.h and to the internals manual.
>
> > And yes, the main reason for the code above is memory use for
> > C++ with lots of inlining.  I suggest to try the patch on tramp3d
> > for example (there's about 10 inline instances per emitted
> > assembly op).
>
> This one:
> https://github.com/llvm-mirror/test-suite/tree/master/MultiSource/Benchmarks/tramp3d-v4
> ?

yeah

> With the patch, 69,022 more blocks with distinct numbers are kept
> than without it.  I see some small differences in -fmem-report
> and -ftime-report output:
>
>    Total allocated: 286M -> 288M  used: 210M -> 211M  overhead: 3993k -> 4019k
>
> I'm not really sure what to look at so I attach the two reports
> for you to judge for yourself.

A build with --enable-gather-detailed-mem-stats would have given
statistics on BLOCK trees, I think.  Without that, -fmem-report is
not too useful, but I guess the overall stats above tell us the
overhead is manageable.
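
For a rough sense of scale, the Allocated, Used and Overhead totals
quoted above each grow by well under one percent.  A minimal C sketch
of that arithmetic follows; pct_increase is a made-up helper for
illustration (not part of GCC), and the "before" figures are assumed
to be in the same units as the "after" figures.

```c
#include <assert.h>

/* Relative growth of a report total, in percent.
   Hypothetical helper for illustration only; not a GCC function.  */
double
pct_increase (double before, double after)
{
  /* A zero or negative baseline would make the ratio meaningless.  */
  assert (before > 0.0);
  return (after - before) / before * 100.0;
}
```

With the numbers from the two reports, pct_increase (286, 288) is
about 0.70, pct_increase (210, 211) about 0.48, and
pct_increase (3993, 4019) about 0.65.  The GGC TOTAL lines of the two
-ftime-report runs (1092M and 1097M) give about 0.46.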

> I also attach an updated patch with the slight simplification you
> suggested.

So I was even suggesting to do

  if (inlined_function_outer_scope_p (scope))
    unused = false;

and maybe gcc_assert (TREE_CODE (orig) == FUNCTION_DECL)
but I think the patch is OK as updated.
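
To make the three variants discussed here easier to compare, below is
a toy C model of just the -g0 branch of remove_unused_scope_block_p.
It is a sketch under assumptions, not GCC code: struct scope_info and
the keep_scope_* functions are invented for illustration, and each
boolean merely stands in for the real query named in its comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the pass asks about a scope block.  */
struct scope_info
{
  bool inlined_outer_scope;  /* inlined_function_outer_scope_p (scope) */
  bool origin_is_function;   /* TREE_CODE (ao) == FUNCTION_DECL */
  bool declared_inline;      /* DECL_DECLARED_INLINE_P (ao) */
  bool artificial;           /* "artificial" attribute present on ao */
};

/* Before the patch: keep only scopes of inline, artificial functions.  */
bool
keep_scope_before (const struct scope_info *s)
{
  return s->inlined_outer_scope && s->origin_is_function
         && s->declared_inline && s->artificial;
}

/* With the patch: keep the outer scope of every inlined function.  */
bool
keep_scope_after (const struct scope_info *s)
{
  return s->inlined_outer_scope && s->origin_is_function;
}

/* The further simplification suggested in the review, with the
   FUNCTION_DECL expectation turned into an assertion.  */
bool
keep_scope_suggested (const struct scope_info *s)
{
  assert (!s->inlined_outer_scope || s->origin_is_function);
  return s->inlined_outer_scope;
}
```

For an inlined member function that is neither declared inline nor
artificial, keep_scope_before returns false while the other two
return true, which is the behavior change the patch is after.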

> Martin
>
> PS Here are my notes on the macros and the two related functions:
>
> BLOCK: Denotes a lexical scope.  Contains BLOCK_VARS of variables
> declared in it, BLOCK_SUBBLOCKS of scopes nested in it, and
> BLOCK_CHAIN pointing to the next BLOCK.  Its BLOCK_SUPERCONTEXT
> points to the BLOCK of the enclosing scope.  May have
> a BLOCK_ABSTRACT_ORIGIN and a BLOCK_SOURCE_LOCATION.
>
> BLOCK_SUPERCONTEXT: The scope of the enclosing block, or FUNCTION_DECL
> for the "outermost" function scope.  Inlined functions are chained by
> this so that given expression E and its TREE_BLOCK(E) B,
> BLOCK_SUPERCONTEXT(B) is the scope (BLOCK) in which E has been made
> or into which E has been inlined.  In the latter case,
>
> BLOCK_ORIGIN(B) evaluates either to the enclosing BLOCK or to
> the enclosing function DECL.  It's never null.
>
> BLOCK_ABSTRACT_ORIGIN(B) is the FUNCTION_DECL of the function into
> which it has been inlined, or null if B is not inlined.

It's the BLOCK or FUNCTION it was inlined _from_, not where it was inlined to.
It's the "ultimate" source, thus the abstract copy of the block or function decl
(for the outermost scope, aka inlined_function_outer_scope_p).  It corresponds
to what you'd expect for the DWARF abstract origin.

BLOCK_ABSTRACT_ORIGIN can be NULL (in case it isn't an inline instance).
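
As a rough illustration of how the two links differ, here is a toy C
model of a scope chain.  It is only a sketch: struct toy_block and
inline_chain are invented names, the abstract origin is reduced to a
function name (or NULL for a scope that is not an inline instance),
and the real walkers in GCC stop at the first such scope rather than
skipping it as this simplified loop does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a BLOCK.  */
struct toy_block
{
  /* Function this scope was inlined from (models a FUNCTION_DECL
     BLOCK_ABSTRACT_ORIGIN), or NULL if not an inline instance.  */
  const char *abstract_origin_fn;
  /* Enclosing scope (models BLOCK_SUPERCONTEXT); NULL at the top.  */
  const struct toy_block *supercontext;
};

/* Walk outward from B and store, innermost first, the names of the
   functions whose inlined bodies enclose B.  Returns the number of
   names stored, at most MAX.  */
size_t
inline_chain (const struct toy_block *b, const char **out, size_t max)
{
  assert (out != NULL || max == 0);
  size_t n = 0;
  for (; b != NULL; b = b->supercontext)
    if (b->abstract_origin_fn != NULL && n < max)
      out[n++] = b->abstract_origin_fn;
  return n;
}
```

For a chain shaped like the A1 case in the new test (the body of
A1::g0 inside the body of A1::g1 inside warn_g1), the walk yields
"A1::g0" then "A1::g1".  If the A1::g1 scope is dropped from the
chain, its name no longer appears, which is the kind of missing
context the patch is meant to avoid.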

> BLOCK_ABSTRACT_ORIGIN: A BLOCK, or FUNCTION_DECL of the function
> into which a block has been inlined.  In a BLOCK immediately enclosing
> an inlined leaf expression points to the outermost BLOCK into which it
> has been inlined (thus bypassing all intermediate BLOCK_SUPERCONTEXTs).
>
> BLOCK_FRAGMENT_ORIGIN: ???
> BLOCK_FRAGMENT_CHAIN: ???

That's for scope blocks split by hot/cold partitioning; the fields are
only temporarily populated.

> bool inlined_function_outer_scope_p(BLOCK)   [tree.h]
>    Returns true if a BLOCK has a source location.
>    True for all but the innermost (no SUBBLOCKs?) and outermost blocks
>    into which an expression has been inlined. (Is this always true?)
>
> tree block_ultimate_origin(BLOCK)   [tree.c]
>    Returns BLOCK_ABSTRACT_ORIGIN(BLOCK), AO, after asserting that
>    ((DECL_P(AO) && DECL_ORIGIN(AO) == AO) || BLOCK_ORIGIN(AO) == AO).

Patch

PR middle-end/98664 - inconsistent -Wfree-nonheap-object for inlined calls to system headers

gcc/ChangeLog:

	PR middle-end/98664
	* tree-ssa-live.c (remove_unused_scope_block_p): Keep scopes for
	all functions, even if they're not declared artificial or inline.
	* tree.c (tree_inlined_location): Use macro expansion location
	only if scope traversal fails to expose one.

gcc/testsuite/ChangeLog:

	PR middle-end/98664
	* gcc.dg/Wvla-larger-than-4.c: Adjust expected output.
	* gcc.dg/plugin/diagnostic-test-inlining-3.c: Same.
	* g++.dg/warn/Wfree-nonheap-object-5.C: New test.
	* gcc.dg/Wfree-nonheap-object-4.c: New test.

diff --git a/gcc/testsuite/g++.dg/warn/Wfree-nonheap-object-5.C b/gcc/testsuite/g++.dg/warn/Wfree-nonheap-object-5.C
new file mode 100644
index 00000000000..742dba0cf58
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wfree-nonheap-object-5.C
@@ -0,0 +1,129 @@ 
+/* PR middle-end/98664 - inconsistent -Wfree-nonheap-object for inlined
+   calls to system headers
+   { dg-do compile }
+   { dg-options "-O2 -Wall" } */
+
+# 7 "Wfree-nonheap-object-5.h" 1 3
+
+struct A0
+{
+  void *p;
+
+  void f0 (void *q) { p = q; }
+  void g0 (void) {
+    __builtin_free (p);       // { dg-warning "\\\[-Wfree-nonheap-object" }
+  }
+};
+
+struct A1
+{
+  void *p;
+
+  void f0 (void *q) { p = q; }
+  void f1 (void *q) { f0 (q); }
+
+  void g0 (void) {
+    __builtin_free (p);       // { dg-warning "\\\[-Wfree-nonheap-object" }
+  }
+  void g1 (void) { g0 (); }
+};
+
+struct A2
+{
+  void *p;
+
+  void f0 (void *q) { p = q; }
+  void f1 (void *q) { f0 (q); }
+  void f2 (void *q) { f1 (q); }
+
+  void g0 (void) {
+    __builtin_free (p);       // { dg-warning "\\\[-Wfree-nonheap-object" }
+  }
+  void g1 (void) { g0 (); }
+  void g2 (void) { g1 (); }
+};
+
+# 47 "Wfree-nonheap-object-5.C"
+
+#define NOIPA __attribute__ ((noipa))
+
+extern int array[];
+
+/* Verify the warning is issued even for calls in a system header inlined
+   into a function outside the header.  */
+
+NOIPA void warn_g0 (struct A0 *p)
+{
+  int *q = array + 1;
+
+  p->f0 (q);
+  p->g0 ();
+}
+
+// { dg-message "inlined from 'void warn_g0\\(A0\\*\\)'" "" { target *-*-* } 0 }
+
+
+/* Also verify the warning can be suppressed.  */
+
+NOIPA void nowarn_g0 (struct A0 *p)
+{
+  int *q = array + 2;
+
+  p->f0 (q);
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wfree-nonheap-object"
+  p->g0 ();
+#pragma GCC diagnostic pop
+}
+
+
+NOIPA void warn_g1 (struct A1 *p)
+{
+  int *q = array + 3;
+
+  p->f1 (q);
+  p->g1 ();
+}
+
+// { dg-message "inlined from 'void A1::g1\\(\\)'" "" { target *-*-* } 0 }
+// { dg-message "inlined from 'void warn_g1\\(A1\\*\\)'" "" { target *-*-* } 0 }
+
+
+NOIPA void nowarn_g1 (struct A2 *p)
+{
+  int *q = array + 4;
+
+  p->f1 (q);
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wfree-nonheap-object"
+  p->g1 ();
+#pragma GCC diagnostic pop
+}
+
+
+NOIPA void warn_g2 (struct A2 *p)
+{
+  int *q = array + 5;
+
+  p->f2 (q);
+  p->g2 ();
+}
+
+// { dg-message "inlined from 'void A2::g1\\(\\)'" "" { target *-*-* } 0 }
+// { dg-message "inlined from 'void A2::g2\\(\\)'" "" { target *-*-* } 0 }
+// { dg-message "inlined from 'void warn_g2\\(A2\\*\\)'" "" { target *-*-* } 0 }
+
+
+NOIPA void nowarn_g2 (struct A2 *p)
+{
+  int *q = array + 6;
+
+  p->f2 (q);
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wfree-nonheap-object"
+  p->g2 ();
+#pragma GCC diagnostic pop
+}
diff --git a/gcc/testsuite/gcc.dg/Wfree-nonheap-object-4.c b/gcc/testsuite/gcc.dg/Wfree-nonheap-object-4.c
new file mode 100644
index 00000000000..a7d921248c4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wfree-nonheap-object-4.c
@@ -0,0 +1,107 @@ 
+/* PR middle-end/98664 - inconsistent -Wfree-nonheap-object for inlined
+   calls to system headers
+   { dg-do compile }
+   { dg-options "-O2 -Wall" } */
+
+# 7 "Wfree-nonheap-object-4.h" 1 3
+
+struct A
+{
+  void *p;
+};
+
+void f0 (struct A *p, void *q) { p->p = q; }
+void f1 (struct A *p, void *q) { f0 (p, q); }
+void f2 (struct A *p, void *q) { f1 (p, q); }
+
+void g0 (struct A *p)
+{
+  __builtin_free (p->p);      // { dg-warning "\\\[-Wfree-nonheap-object" }
+}
+
+void g1 (struct A *p) { g0 (p); }
+void g2 (struct A *p) { g1 (p); }
+
+# 26 "Wfree-nonheap-object-4.c"
+
+#define NOIPA __attribute__ ((noipa))
+
+extern int array[];
+
+/* Verify the warning is issued even for calls in a system header inlined
+   into a function outside the header.  */
+
+NOIPA void warn_g0 (struct A *p)
+{
+  int *q = array + 1;
+
+  f0 (p, q);
+  g0 (p);
+}
+
+// { dg-message "inlined from 'warn_g0'" "" { target *-*-* } 0 }
+
+
+/* Also verify the warning can be suppressed.  */
+
+NOIPA void nowarn_g0 (struct A *p)
+{
+  int *q = array + 2;
+
+  f0 (p, q);
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wfree-nonheap-object"
+  g0 (p);
+#pragma GCC diagnostic pop
+}
+
+
+NOIPA void warn_g1 (struct A *p)
+{
+  int *q = array + 3;
+
+  f1 (p, q);
+  g1 (p);
+}
+
+// { dg-message "inlined from 'g1'" "" { target *-*-* } 0 }
+// { dg-message "inlined from 'warn_g1'" "" { target *-*-* } 0 }
+
+
+NOIPA void nowarn_g1 (struct A *p)
+{
+  int *q = array + 4;
+
+  f1 (p, q);
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wfree-nonheap-object"
+  g1 (p);
+#pragma GCC diagnostic pop
+}
+
+
+NOIPA void warn_g2 (struct A *p)
+{
+  int *q = array + 5;
+
+  f2 (p, q);
+  g2 (p);
+}
+
+// { dg-message "inlined from 'g2'" "" { target *-*-* } 0 }
+// { dg-message "inlined from 'warn_g2'" "" { target *-*-* } 0 }
+
+
+NOIPA void nowarn_g2 (struct A *p)
+{
+  int *q = array + 6;
+
+  f2 (p, q);
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wfree-nonheap-object"
+  g2 (p);
+#pragma GCC diagnostic pop
+}
diff --git a/gcc/testsuite/gcc.dg/Wvla-larger-than-4.c b/gcc/testsuite/gcc.dg/Wvla-larger-than-4.c
index de99afbe56e..7d27829736f 100644
--- a/gcc/testsuite/gcc.dg/Wvla-larger-than-4.c
+++ b/gcc/testsuite/gcc.dg/Wvla-larger-than-4.c
@@ -17,14 +17,14 @@  static inline void inline_use_vla (unsigned n)
 static inline void use_inlined_vla (unsigned n)
 {
   inline_use_vla (n);         // this call is okay
-  inline_use_vla (n + 1);     // this one is not
+  inline_use_vla (n + 1);     // this one is not (line 20)
 }
 
 void call_inline (void)
 {
-  use_inlined_vla (31);
+  use_inlined_vla (31);       // line 25
 }
 
 /* Verify that the inlining context is included and that it points
    to the correct line number in the inlined function:
-   { dg-message "function 'inline_use_vla'..*inlined from 'call_inline' .*:20:" "" { target *-*-* } 0 }  */
+   { dg-message "function 'inline_use_vla'.*inlined from 'use_inlined_vla'.*:20:.*inlined from 'call_inline' .*:25:" "" { target *-*-* } 0 }  */
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-inlining-3.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-inlining-3.c
index e1a4fca2cb4..b7df063c52c 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-inlining-3.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-inlining-3.c
@@ -35,7 +35,8 @@  int main()
    This test case captures this behavior.  */
 
 /* { dg-regexp "In function 'foo'," "" } */
-/* { dg-regexp "    inlined from 'main' at .+/diagnostic-test-inlining-3.c:15:3:" "" } */
+/* { dg-regexp "    inlined from 'bar' at .+/diagnostic-test-inlining-3.c:15:3:" "" } */
+/* { dg-regexp "    inlined from 'main' at .+/diagnostic-test-inlining-3.c:20:3:" "" } */
 /* { dg-warning "3: message" "" { target *-*-* } 9 } */
 /* { dg-begin-multiline-output "" }
    __emit_warning ("message");
diff --git a/gcc/tree-ssa-live.c b/gcc/tree-ssa-live.c
index 02a7a56f0f9..a7464369d73 100644
--- a/gcc/tree-ssa-live.c
+++ b/gcc/tree-ssa-live.c
@@ -558,16 +558,13 @@  remove_unused_scope_block_p (tree scope, bool in_ctor_dtor_block)
    else if (!flag_auto_profile && debug_info_level == DINFO_LEVEL_NONE
 	    && !optinfo_wants_inlining_info_p ())
      {
-       /* Even for -g0 don't prune outer scopes from artificial
-	  functions, otherwise diagnostics using tree_nonartificial_location
-	  will not be emitted properly.  */
+       /* Even for -g0 don't prune outer scopes from inlined functions,
+	  otherwise late diagnostics from such functions will not be
+	  emitted or suppressed properly.  */
        if (inlined_function_outer_scope_p (scope))
 	 {
 	   tree ao = BLOCK_ORIGIN (scope);
-	   if (ao
-	       && TREE_CODE (ao) == FUNCTION_DECL
-	       && DECL_DECLARED_INLINE_P (ao)
-	       && lookup_attribute ("artificial", DECL_ATTRIBUTES (ao)))
+	   if (ao && TREE_CODE (ao) == FUNCTION_DECL)
 	     unused = false;
 	 }
      }
diff --git a/gcc/tree.c b/gcc/tree.c
index e0a1d512019..909551a73f9 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -12626,19 +12626,22 @@  tree_inlined_location (tree exp, bool system_header /* = true */)
 	 && BLOCK_ABSTRACT_ORIGIN (block))
     {
       tree ao = BLOCK_ABSTRACT_ORIGIN (block);
       if (TREE_CODE (ao) == FUNCTION_DECL)
 	loc = BLOCK_SOURCE_LOCATION (block);
       else if (TREE_CODE (ao) != BLOCK)
 	break;
 
       block = BLOCK_SUPERCONTEXT (block);
     }
 
   if (loc == UNKNOWN_LOCATION)
-    loc = EXPR_LOCATION (exp);
-
-  if (system_header)
-    return expansion_point_location_if_in_system_header (loc);
+    {
+      loc = EXPR_LOCATION (exp);
+      if (system_header)
+	/* Only consider macro expansion when the block traversal failed
+	   to find a location.  Otherwise it's not relevant.  */
+	return expansion_point_location_if_in_system_header (loc);
+    }
 
   return loc;
 }
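
For readers who want to see the effect of the tree.c hunk in isolation, here is a minimal standalone sketch of the scope walk in tree_inlined_location as it reads after the patch.  It is not GCC code: struct block, enum origin_kind, the location typedef, and the toy expansion_point helper (which pretends that locations of 1000 and above sit in a system-header macro) are all hypothetical stand-ins for the real tree nodes, BLOCK_ABSTRACT_ORIGIN, BLOCK_SOURCE_LOCATION, BLOCK_SUPERCONTEXT, and expansion_point_location_if_in_system_header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the kind of node BLOCK_ABSTRACT_ORIGIN yields.  */
enum origin_kind
{
  ORIGIN_NONE,      /* no abstract origin: the walk stops here */
  ORIGIN_FUNCTION,  /* FUNCTION_DECL: outer scope of an inlined call */
  ORIGIN_BLOCK,     /* BLOCK: nested lexical scope of an inlined body */
  ORIGIN_OTHER      /* anything else: the walk gives up */
};

typedef unsigned location;   /* toy stand-in for location_t */
#define UNKNOWN_LOC 0u       /* toy stand-in for UNKNOWN_LOCATION */

struct block
{
  enum origin_kind origin;    /* models BLOCK_ABSTRACT_ORIGIN */
  location source_loc;        /* models BLOCK_SOURCE_LOCATION */
  const struct block *super;  /* models BLOCK_SUPERCONTEXT */
};

/* Toy assumption: locations >= 1000 are inside a system-header macro
   and map back to their expansion point by subtracting 1000.  */
static location
expansion_point (location loc)
{
  return loc >= 1000 ? loc - 1000 : loc;
}

/* Walk outward from BLK, remembering the call site of each inlined
   function.  Fall back on the expression's own location (and, when
   SYSTEM_HEADER is set, on its macro expansion point) only if the walk
   finds no inlined scope.  */
location
inlined_location (const struct block *blk, location expr_loc,
                  int system_header)
{
  location loc = UNKNOWN_LOC;

  while (blk && blk->origin != ORIGIN_NONE)
    {
      if (blk->origin == ORIGIN_FUNCTION)
        loc = blk->source_loc;
      else if (blk->origin != ORIGIN_BLOCK)
        break;

      blk = blk->super;
    }

  if (loc == UNKNOWN_LOC)
    {
      loc = expr_loc;
      if (system_header)
        return expansion_point (loc);
    }

  return loc;
}

/* Usage sketch mirroring Wvla-larger-than-4.c: inline_use_vla is
   inlined at line 20, which in turn is inlined at line 25.  */
void
self_check (void)
{
  struct block top = { ORIGIN_NONE, UNKNOWN_LOC, NULL };
  struct block outer = { ORIGIN_FUNCTION, 25, &top };
  struct block mid = { ORIGIN_FUNCTION, 20, &outer };
  struct block inner = { ORIGIN_BLOCK, UNKNOWN_LOC, &mid };

  /* The outermost inlined call site wins; macro mapping is skipped.  */
  assert (inlined_location (&inner, 1007, 1) == 25);
  /* With no inlined scope the expression location is used instead.  */
  assert (inlined_location (&top, 1007, 1) == 7);
}
```

The point of the sketch is the last conditional: before the patch the macro expansion point was consulted unconditionally, whereas now it is consulted only when the scope walk comes up empty.  That in turn depends on the tree-ssa-live.c hunk keeping the intermediate scopes (mid and outer in the model) alive at -g0.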