
[RFC,0/3] Support for CTF in GCC

Message ID 1558374855-19140-1-git-send-email-indu.bhagat@oracle.com

Message

Indu Bhagat May 20, 2019, 5:54 p.m. UTC
Background :
CTF is the Compact ANSI-C Type Format. It is designed to express some
characteristics (specifically type information) of the data types in a C
program. The CTF format is compact and fast; it was originally designed for
use cases such as dynamic tracing and online in-application debugging.

A patch to the binutils mailing list to add libctf is currently under
review (https://sourceware.org/ml/binutils/2019-05/msg00154.html and
	https://sourceware.org/ml/binutils/2019-05/msg00212.html).  libctf
provides means to create, update, read and manipulate CTF information.

This GCC patch set is preliminary work and the purpose is to gather comments and
feedback about CTF support in GCC.

(For a technical introduction to the CTF format, the CTF header or
https://sourceware.org/ml/binutils/2019-04/msg00277.html will be useful.)

Project Details :
The project aims to add support for CTF to the GNU toolchain. Adding CTF
support to the GNU toolchain will help the community develop and converge the
tools and use cases that need a lightweight debug format.

De-duplication is a key aspect of the CTF format which ensures its compactness.
A parallel effort is ongoing to support de-duplication of CTF types at
link time.

In phase 1, we are making the compiler, linker and the debugger (GDB) capable
of handling the CTF format.

CTF format, in its present form, does not have callsite information.  We are
working on this as well. Once the CTF format extensions are agreed upon, the 
-gt1 option (see below) will begin to take form, in phase 2 of the project.

GCC RFC patch set :
Patch 1 is a simple addition of a new function lang_GNU_GIMPLE to check for
GIMPLE frontend.
Patch 2 and Patch 3 set up the framework for CTF support in GCC :
-- Patch 2 adds the new command line option for generating CTF. CTF generation
   is enabled in the compiler by specifying an explicit -gt or
   -gtLEVEL[LEVEL=1,2] :
    
    -gtLEVEL

    This requests CTF debug information and specifies how much of it to
    generate; LEVEL can be 0, 1 or 2. If -gt is specified (with no LEVEL),
    the default value of LEVEL is 2.

    -gt0 (Level 0) produces no CTF debug information at all. Thus, -gt0
    negates -gt.

    -gt1 (Level 1) produces CTF information for tracebacks only. This includes
    CTF callsite information, but does not include type information for other
    entities.

    -gt2 (Level 2) produces type information for entities (functions, variables
    etc.) at file-scope or global-scope only. This level of information can be
    used by dynamic tracers like DTrace.

--  Patch 3 adds the CTF debug hooks and initializes them if the required 
    user-level options are specified. 
    CTF debug hooks are wrappers around the DWARF debug hooks.
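
As a rough illustration of the -gtLEVEL semantics described under Patch 2, the
following Python sketch models how a list of options could map to an effective
CTF level. It is not the option handling from the patch: the helper name
ctf_level_from_args and the "last option wins" rule are assumptions made for
the example.

```python
import re

# Default level when -gt is given without LEVEL, as stated in the description.
DEFAULT_CTF_LEVEL = 2

# Matches "-gt", "-gt0", "-gt1" and "-gt2" only.
_GT_OPTION = re.compile(r"-gt([0-2]?)")


def ctf_level_from_args(args):
    """Return the effective CTF level (0, 1 or 2) for a list of options.

    Hypothetical helper: the last -gt/-gtLEVEL option wins, and the absence
    of any such option means level 0 (no CTF).
    """
    level = 0
    for arg in args:
        match = _GT_OPTION.fullmatch(arg)
        if match is None:
            continue  # not a CTF option, ignore it
        digit = match.group(1)
        level = int(digit) if digit else DEFAULT_CTF_LEVEL
    return level


print(ctf_level_from_args(["-O2", "-gt"]))
print(ctf_level_from_args(["-gt", "-gt0"]))
```

Under these assumptions a trailing -gt0 cancels an earlier -gt, which matches
the statement that -gt0 negates -gt.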

One of the main high-level design requirements relevant to the current GCC
patch set is that CTF and DWARF must be able to co-exist. A user may want CTF
debug information in isolation or together with other debug formats. A .ctf
section is small and, unlike other debug sections, ideally should not need to
be stripped out of the binary/executable.

High-level proposed plan (phase 1) :
In the next few patches, the functionality to generate contents of the CTF
section (.ctf) for a single compilation unit will be added.
Once CTF generation for a single compilation unit stabilizes, LTO and CTF
generation will be looked at.

Feedback and suggestions welcome.

Thanks

Indu Bhagat (3):
  Add new function lang_GNU_GIMPLE
  Add CTF command line options : -gtLEVEL
  Create CTF debug hooks

 gcc/ChangeLog                                   |  24 ++
 gcc/Makefile.in                                 |   3 +
 gcc/common.opt                                  |   9 +
 gcc/ctfout.c                                    | 171 +++++++++
 gcc/ctfout.h                                    |  41 +++
 gcc/debug.h                                     |   4 +
 gcc/doc/invoke.texi                             |  16 +
 gcc/flag-types.h                                |  13 +
 gcc/gengtype.c                                  |   4 +-
 gcc/langhooks.c                                 |   9 +
 gcc/langhooks.h                                 |   1 +
 gcc/opts.c                                      |  26 ++
 gcc/testsuite/ChangeLog                         |   7 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-1.c          |   6 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf-preamble-1.c |  11 +
 gcc/testsuite/gcc.dg/debug/ctf/ctf.exp          |  41 +++
 gcc/testsuite/gcc.dg/debug/dwarf2-ctf-1.c       |   7 +
 gcc/toplev.c                                    |  24 ++
 include/ChangeLog                               |   4 +
 include/ctf.h                                   | 471 ++++++++++++++++++++++++
 20 files changed, 890 insertions(+), 2 deletions(-)
 create mode 100644 gcc/ctfout.c
 create mode 100644 gcc/ctfout.h
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-preamble-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf.exp
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2-ctf-1.c
 create mode 100644 include/ctf.h

Comments

Richard Biener May 21, 2019, 10:28 a.m. UTC | #1
On Mon, May 20, 2019 at 7:56 PM Indu Bhagat <indu.bhagat@oracle.com> wrote:
>
> Background :
> CTF is the Compact Ansi-C Type Format. It is a format designed to express some
> characteristics (specifically Type information) of the data types in a C
> program. CTF format is compact and fast; It was originally designed for
> use-cases like dynamic tracing, online in-application debugging among others.
>
> A patch to the binutils mailing list to add libctf is currently under
> review (https://sourceware.org/ml/binutils/2019-05/msg00154.html and
>         https://sourceware.org/ml/binutils/2019-05/msg00212.html).  libctf
> provides means to create, update, read and manipulate CTF information.
>
> This GCC patch set is preliminary work and the purpose is to gather comments and
> feedback about CTF support in GCC.
>
> (For technical introduction into the CTF format, the CTF header or
> https://sourceware.org/ml/binutils/2019-04/msg00277.html will be useful.)
>
> Project Details :
> The project aims to add the support for CTF in the GNU toolchain. Adding CTF
> support in the GNU toolchain will help the community in developing and
> converging the tools and use-cases where a light-weight debug format is needed.
>
> De-duplication is a key aspect of the CTF format which ensures its compactness.
> A parallel effort is ongoing to support de-duplication of CTF types at the
> link-time.
>
> In phase 1, we are making the compiler, linker and the debugger (GDB) capable
> of handling the CTF format.
>
> CTF format, in its present form, does not have callsite information.  We are
> working on this as well. Once the CTF format extensions are agreed upon, the
> -gt1 option (see below) will begin to take form, in phase 2 of the project.
>
> GCC RFC patch set :
> Patch 1 is a simple addition of a new function lang_GNU_GIMPLE to check for
> GIMPLE frontend.

I don't think you should need this - the GIMPLE "frontend" is intended for
unit testing only, I wouldn't like it to be exposed more.

> Patch 2 and Patch 3 set up the framework for CTF support in GCC :
> -- Patch 2 adds the new command line option for generating CTF. CTF generation
>    is enabled in the compiler by specifying an explicit -gt or
>    -gtLEVEL[LEVEL=1,2] :
>
>     -gtLEVEL
>
>     This is used to request CTF debug information and to specify how much CTF
>     debug information, LEVEL[=0,1,2] can be specified. If -gt is specified
>     (with no LEVEL), the default value of LEVEL is 2.
>
>     -gt0 (Level 0) produces no CTF debug information at all. Thus, -gt0
>     negates -gt.
>
>     -gt1 (Level 1) produces CTF information for tracebacks only. This includes
>     CTF callsite information, but does not include type information for other
>     entities.
>
>     -gt2 (Level 2) produces type information for entities (functions, variables
>     etc.) at file-scope or global-scope only. This level of information can be
>     used by dynamic tracers like DTrace.
>
> --  Patch 3 adds the CTF debug hooks and initializes them if the required
>     user-level options are specified.
>     CTF debug hooks are wrappers around the DWARF debug hooks.
>
> One of the main high-level design requirements that is relevant in the context
> of the current GCC patch set is that - CTF and DWARF must be able to co-exist.
> A user may want CTF debug information in isolation or with other debug formats.
> A .ctf section is small and unlike other debug sections, ideally should not
> need to be stripped out of the binary/executable.
>
> High-level proposed plan (phase 1) :
> In the next few patches, the functionality to generate contents of the CTF
> section (.ctf) for a single compilation unit will be added.
> Once CTF generation for a single compilation unit stabilizes, LTO and CTF
> generation will be looked at.
>
> Feedback and suggestions welcome.

You probably got asked this question multiple times already, but,
can CTF information be generated from DWARF instead?

The meaning of the CTF acronym suggests that there's nothing
like locations, registers, etc. but just a representation of the
types?

Generally we are trying to walk away from supporting multiple
debug info formats because that gets in the way of being
more precise from the frontend side.  Since DWARF is the
de facto standard, extensible and with a rich feature set, the
line of thinking is that other formats (like STABS) can be
generated by "post-processing" DWARF.  Such
post-processing could happen on the object files or
on the GCC internal DWARF data structures by
providing alternate output routines.  That is, the mid-term
design goal is to make DWARF generation the "API"
for GCC frontends to use when creating high-level
debug information rather than trying to abstract from
the debuginfo format via the current debug-hooks or
the other way around via language-hooks.

Richard.

> Thanks
>
> Indu Bhagat (3):
>   Add new function lang_GNU_GIMPLE
>   Add CTF command line options : -gtLEVEL
>   Create CTF debug hooks
>
>  gcc/ChangeLog                                   |  24 ++
>  gcc/Makefile.in                                 |   3 +
>  gcc/common.opt                                  |   9 +
>  gcc/ctfout.c                                    | 171 +++++++++
>  gcc/ctfout.h                                    |  41 +++
>  gcc/debug.h                                     |   4 +
>  gcc/doc/invoke.texi                             |  16 +
>  gcc/flag-types.h                                |  13 +
>  gcc/gengtype.c                                  |   4 +-
>  gcc/langhooks.c                                 |   9 +
>  gcc/langhooks.h                                 |   1 +
>  gcc/opts.c                                      |  26 ++
>  gcc/testsuite/ChangeLog                         |   7 +
>  gcc/testsuite/gcc.dg/debug/ctf/ctf-1.c          |   6 +
>  gcc/testsuite/gcc.dg/debug/ctf/ctf-preamble-1.c |  11 +
>  gcc/testsuite/gcc.dg/debug/ctf/ctf.exp          |  41 +++
>  gcc/testsuite/gcc.dg/debug/dwarf2-ctf-1.c       |   7 +
>  gcc/toplev.c                                    |  24 ++
>  include/ChangeLog                               |   4 +
>  include/ctf.h                                   | 471 ++++++++++++++++++++++++
>  20 files changed, 890 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/ctfout.c
>  create mode 100644 gcc/ctfout.h
>  create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-preamble-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf.exp
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2-ctf-1.c
>  create mode 100644 include/ctf.h
>
> --
> 1.8.3.1
>
Indu Bhagat May 21, 2019, 10:44 p.m. UTC | #2
Thanks for your feedback. Comments inline.


On 05/21/2019 03:28 AM, Richard Biener wrote:
>> GCC RFC patch set :
>> Patch 1 is a simple addition of a new function lang_GNU_GIMPLE to check for
>> GIMPLE frontend.
> I don't think you should need this - the GIMPLE "frontend" is intended for
> unit testing only, I wouldn't like it to be exposed more.

When using -gt with -flto, I would still like the CTF hooks to be initialized
so that CTF can be generated when -flto is used. So the check in toplev.c is
done to allow only C and GNU GIMPLE.  I am fine with doing a string compare
with the language.hooks string if you suggest going that way.

>
>> One of the main high-level design requirements that is relevant in the context
>> of the current GCC patch set is that - CTF and DWARF must be able to co-exist.
>> A user may want CTF debug information in isolation or with other debug formats.
>> A .ctf section is small and unlike other debug sections, ideally should not
>> need to be stripped out of the binary/executable.
>>
>> High-level proposed plan (phase 1) :
>> In the next few patches, the functionality to generate contents of the CTF
>> section (.ctf) for a single compilation unit will be added.
>> Once CTF generation for a single compilation unit stabilizes, LTO and CTF
>> generation will be looked at.
>>
>> Feedback and suggestions welcome.
> You probably got asked this question multiple times already, but,
> can CTF information be generated from DWARF instead?

Yes and no :) And that is indeed one of the motivations of the project: to
allow CTF generation where it is best suited, i.e. in the toolchain.

There do exist utilities for generating CTF from DWARF. For example, one of
them is dwarf2ctf in the DTrace Linux distribution. dwarf2ctf works offline
to transform DWARF generated by the compiler into CTF.

A dependency on an external conversion utility for "post-processing" DWARF
offline poses several problems:

1. Deployment problems: the converter has to be distributed and integrated into
    the build system of the program.  This can, on occasion, be intrusive.  For
    example, in terms of dependencies: the dwarf2ctf converter depends on
    libdwarf from the elfutils suite, glib2 (used for the GHashTable), zlib
    (used to compress the CTF information) and libctf (which both reads and
    writes the CTF data).

2. Performance problems: the conversion from DWARF to CTF can take a long time,
    especially in big programs such as the Linux kernel.

3. Maintainability problems: the converter has to be maintained in order to
    reflect potential changes in the DWARF generated by the compiler.

4. Adoption problem: it is difficult for applications to adopt CTF, even if it
    happens to provide what they need, since that would require writing a
    conversion utility or integrating DTrace's.


>
> The meaning of the CTF acronym suggests that there's nothing
> like locations, registers, etc. but just a representation of the
> types?

Yes. CTF is, simply put, type information; no locations, registers, etc.

>
> Generally we are trying to walk away from supporting multiple
> debug info formats because that gets in the way of being
> more precise from the frontend side.  Since DWARF is the

With regard to whether support for CTF imposes infeasible or distinct
requirements on the frontend: that does not appear to be the case (I have
been working on CTF generation in GCC for a SINGLE compilation unit; more on
this below). I agree that CTF debug information generation should ideally not
impose additional requirements on the frontend.

> de facto standard, extensible and with a rich feature set, the
> line of thinking is that other formats (like STABS) can be
> generated by "post-processing" DWARF.  Such
> post-processing could happen on the object files or
> on the GCC internal DWARF data structures by
> providing alternate output routines.  That is, the mid-term
> design goal is to make DWARF generation the "API"
> for GCC frontends to use when creating high-level
> debug information rather than trying to abstract from
> the debuginfo format via the current debug-hooks or
> the other way around via language-hooks.

I am not sure I understood the last part very well, so I will state how CTF
generation is intended to work. Does the following fit the design goal you
state?

( Caveat : I have been working on the functionality to generate CTF for a SINGLE
   compilation unit. LTO bits remain. )

So far, there are no additional requirements on the frontend side. CTF hooks
are wrappers around DWARF debug hooks (much like go-dump hooks, and vms dbg
hooks).  We did notice that GCC does not have the infrastructure to register or
enlist multiple debug hooks; and now from your comments it is clear that this
is by design. Thanks for clarifying that.

Having said that, I use the CTF hooks to go from TREE to either updating the
CTF internal structures or running the CTF output routines, depending on the
hook (e.g., type_decl or finish respectively), rather than changing the dwarf*
files with CTF APIs. The CTF debug hooks relay control to the DWARF debug hooks
at an appropriate point. TREE inputs passed to the CTF debug hooks are
read-only in the context of CTF generation.

The CTF debug information is kept in a CTF container distinct from the frontend
structures.  Hash maps are used to avoid generating duplicate CTF and to keep
track of the generated CTF.
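
The arrangement described above (CTF hooks that wrap the DWARF hooks, plus a
separate container that de-duplicates types through a hash map) can be sketched
roughly as follows. This is a Python model for illustration only, not the code
in ctfout.c: the class names, the use of plain strings as type keys, and ids
counting from 1 are all assumptions.

```python
class DwarfHooks:
    """Stand-in for the DWARF debug hooks; it only records the calls it gets."""

    def __init__(self):
        self.calls = []

    def type_decl(self, decl):
        self.calls.append(("type_decl", decl))

    def finish(self, filename):
        self.calls.append(("finish", filename))


class CtfContainer:
    """Holds generated CTF records, separate from any frontend structures."""

    def __init__(self):
        self._ids = {}     # type key -> CTF type id (the de-duplication map)
        self.records = []  # records in creation order

    def add_type(self, key):
        if key in self._ids:
            return self._ids[key]  # already generated, reuse the id
        type_id = len(self.records) + 1
        self._ids[key] = type_id
        self.records.append(key)
        return type_id


class CtfHooks:
    """Wraps the DWARF hooks: do the CTF work, then relay to DWARF."""

    def __init__(self, dwarf_hooks):
        self.dwarf = dwarf_hooks
        self.container = CtfContainer()
        self.emitted = None

    def type_decl(self, decl):
        self.container.add_type(decl)  # the input is only read, never changed
        self.dwarf.type_decl(decl)

    def finish(self, filename):
        self.emitted = list(self.container.records)  # stands in for output
        self.dwarf.finish(filename)


hooks = CtfHooks(DwarfHooks())
for decl in ["int", "struct foo", "int"]:
    hooks.type_decl(decl)
hooks.finish("a.c")
print(hooks.emitted)
print(len(hooks.dwarf.calls))
```

In this model the duplicate "int" is recorded once on the CTF side, while the
DWARF hooks still see every call, so DWARF generation is unaffected.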
Richard Biener May 22, 2019, 9:04 a.m. UTC | #3
On Wed, May 22, 2019 at 12:34 AM Indu Bhagat <indu.bhagat@oracle.com> wrote:
>
> Thanks for your feedback. Comments inline.
>
>
> On 05/21/2019 03:28 AM, Richard Biener wrote:
>
> GCC RFC patch set :
> Patch 1 is a simple addition of a new function lang_GNU_GIMPLE to check for
> GIMPLE frontend.
>
> I don't think you should need this - the GIMPLE "frontend" is intended for
> unit testing only, I wouldn't like it to be exposed more.
>
> When using -gt with -flto, I would still like the CTF hooks to be initialized
> so that CTF can be generated when -flto is used. So the check in toplev.c is
> done to allow only C and GNU GIMPLE.  I am fine with doing a string compare
> with the language.hooks string if you suggest to go that way.

I see.  Note that starting with GCC 8, type and declaration debug information
(the only kind you are interested in, it seems) for LTO is generated at
compile time, while LTO ("GNU GIMPLE") only amends it with location
information.  So I believe you do not want to emit CTF from LTO but from the
compilation stage driven by the appropriate language frontend, which is where
debug information for types and decls is emitted.

>
> One of the main high-level design requirements that is relevant in the context
> of the current GCC patch set is that - CTF and DWARF must be able to co-exist.
> A user may want CTF debug information in isolation or with other debug formats.
> A .ctf section is small and unlike other debug sections, ideally should not
> need to be stripped out of the binary/executable.
>
> High-level proposed plan (phase 1) :
> In the next few patches, the functionality to generate contents of the CTF
> section (.ctf) for a single compilation unit will be added.
> Once CTF generation for a single compilation unit stabilizes, LTO and CTF
> generation will be looked at.
>
> Feedback and suggestions welcome.
>
> You probably got asked this question multiple times already, but,
> can CTF information be generated from DWARF instead?
>
> Yes and No :) And that is indeed one of the motivation of the project - to
> allow CTF generation where it's most suited aka the toolchain.
>
> There do exist utilities for generation of CTF from DWARF. For example, one of
> them is the dwarf2ctf in the DTrace Linux distribution. dwarf2ctf works offline
> to transform DWARF generated by the compiler into CTF.
>
> A dependency of an external conversion utility for "post-processing" DWARF
> offline poses several problems:
>
> 1. Deployment problems: the converter should be distributed and integrated in
>    the build system of the program.  This, on occasions, can be intrusive.  For
>    example, in terms of dependencies: the dwarf2ctf converter depends on
>    libdwarf from elfutils suite, glib2 (used for the GHashTable), zlib (used to
>    compress the CTF information) and libctf (which both reads and writes the
>    CTF data).
>
> 2. Performance problems: the conversion from DWARF to CTF can take a long time,
>    especially in big programs such as the Linux kernel.
>
> 3. Maintainability problems: the converter should be maintained in order to
>    reflect potential changes in the DWARF generated by the compiler.
>
> 4. Adoption problem: it is difficult for applications to adopt the usage of
>    CTF, even if it happens to provide what they need, since it would require to
>    write a conversion utility or integrate DTrace's.
>
>
>
> The meaning of the CTF acronym suggests that there's nothing
> like locations, registers, etc. but just a representation of the
> types?
>
> Yes. CTF is simply put Type information; no locations, registers etc.
>
>
> Generally we are trying to walk away from supporting multiple
> debug info formats because that gets in the way of being
> more precise from the frontend side.  Since DWARF is the
>
>
> With regard to whether the support for CTF imposes infeasible or distinct
> requirements on the frontend - it does not appear to be the case (I have
> been working on CTF generation in GCC for a SINGLE compilation unit; More see
> below). I agree that CTF debug information generation should ideally not impose
> additional requirements on the frontend.
>
> de facto standard, extensible and with a rich feature set, the
> line of thinking is that other formats (like STABS) can be
> generated by "post-processing" DWARF.  Such
> post-processing could happen on the object files or
> on the GCC internal DWARF data structures by
> providing alternate output routines.  That is, the mid-term
> design goal is to make DWARF generation the "API"
> for GCC frontends to use when creating high-level
> debug information rather than trying to abstract from
> the debuginfo format via the current debug-hooks or
> the other way around via language-hooks.
>
> I am not sure if I understood the last part very well, so I will state how CTF
> generation is intended to work. Does the following fit the design goal you
> state ?
>
> ( Caveat : I have been working on the functionality to generate CTF for a SINGLE
>   compilation unit. LTO bits remain. )
>
> So far, there are no additional requirements on the frontend side. CTF hooks
> are wrappers around DWARF debug hooks (much like go-dump hooks, and vms dbg
> hooks).  We did notice that GCC does not have the infrastructure to register or
> enlist multiple debug hooks; and now from your comments it is clear that this
> is by design. Thanks for clarifying that.
>
> Having said that, I use CTF hooks to go from TREE --> update CTF internal
> structures or output CTF routines depending on the hook (e.g., type_decl or
> finish respectively), rather than changing the dwarf* files with CTF APIs. The
> CTF debug hooks relay control to the DWARF debug hooks at an appropriate point.
> TREE input references to the CTF debug hooks are readonly in the context of CTF
> generation.
>
> The CTF debug information is kept in a CTF container distinct from the frontend
> structures.  HashMaps are used to avoid generation of duplicate CTF and to
> book-keep the generated CTF.

OK.  So I wonder how difficult it is to emit CTF by walking dwarf2out's own
data structures?  That is, in my view CTF should be emitted by
dwarf2out_early_finish ()  (which is also the point LTO type/decl debug
is generated from).  It would be nice to avoid extra bookkeeping data structures
for CTF since those of DWARF should include all necessary information already.

Btw, do I read the CTF document posted to the binutils list (but not
cross-referenced here :/) correctly in that you only want CTF debug for
objects defined in the file and type information for the types referred to
from that?  At dwarf2out_early_finish time it isn't fully known which symbols
will end up being emitted (and with LTO you only would know at link time).
It seems to me that linker support to garbage collect unused entries would be
the way to go forward (probably easy for the declarations but not so for the
types)?

Richard.

>
Indu Bhagat May 23, 2019, 8:40 p.m. UTC | #4
On 05/22/2019 02:04 AM, Richard Biener wrote:
>> The CTF debug information is kept in a CTF container distinct from the frontend
>> structures.  HashMaps are used to avoid generation of duplicate CTF and to
>> book-keep the generated CTF.
> OK.  So I wonder how difficult it is to emit CTF by walking dwarf2out's own
> data structures?  That is, in my view CTF should be emitted by
> dwarf2out_early_finish ()  (which is also the point LTO type/decl debug
> is generated from).  It would be nice to avoid extra bookkeeping data structures
> for CTF since those of DWARF should include all necessary information already.

The CTF format has some characteristics which make it necessary to "pre-process"
the generated CTF data before asm'ing it out into a section. A few reasons why
"pre-processing" CTF is required before asm'ing out :
  1. CTF types do need to be emitted in "some" order :
     CTF types can have references to other CTF types. This consequently implies
     that the referenced type must appear BEFORE the referring type.
  2. CTF preamble holds offsets to the various subsections - function info,
     variables, data types and CTF string table. To calculate the offsets, the
     compiler needs to know the size in bytes of these sub-sections.  CTF
     representation for some types like structures, functions, enums have
     variable length of bytes trailing them (depending on the definition of the
     type).
  3. CTF variable entries need to be placed in the order of the names.
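
Points 1 and 3 amount to two ordering passes over the collected data. A minimal
Python sketch, assuming types are identified by name and that the reference
graph has no cycles (a simplification made for this example; the function names
are hypothetical):

```python
def order_types(refs):
    """Return type names so every referenced type precedes its referrers.

    `refs` maps a type name to the list of type names it references.
    Assumes the reference graph is acyclic.
    """
    ordered = []
    done = set()

    def visit(name):
        if name in done:
            return
        done.add(name)
        for target in refs.get(name, []):
            visit(target)      # emit what this type refers to first
        ordered.append(name)   # then the type itself

    for name in refs:
        visit(name)
    return ordered


def order_variables(variables):
    """Return (name, type) pairs sorted by variable name."""
    return sorted(variables, key=lambda var: var[0])


refs = {
    "struct node": ["int", "char *"],
    "char *": ["char"],
    "int": [],
    "char": [],
}
types = order_types(refs)
print(types)
print(order_variables([("zeta", "int"), ("alpha", "struct node")]))
```

Both passes need the complete set of types and variables, which is why they
cannot run while individual hooks are still being called.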

Because of these "features" of the CTF format, the compiler needs a separate
step that converts the CTF data it builds up while compiling into the CTF
binary data format; this keeps the code clean and readable.
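
Point 2 in the list above can be illustrated with a small Python sketch: the
offset of a subsection is only known once every earlier subsection has been
encoded, and some records have a variable-length tail. The record layout
(two 32-bit fields per head and per member) and the subsection order used here
are invented for the example and are not the real CTF encoding.

```python
import struct


def encode_struct_type(name_offset, members):
    """Hypothetical record: a fixed 8-byte head plus 8 bytes per member."""
    head = struct.pack("<II", name_offset, len(members))
    tail = b"".join(
        struct.pack("<II", member_name, member_type)
        for member_name, member_type in members
    )
    return head + tail


def section_offsets(subsections):
    """Return ({name: byte offset}, total size) for (name, bytes) pairs."""
    offsets = {}
    position = 0
    for name, data in subsections:
        offsets[name] = position
        position += len(data)  # the next offset depends on this size
    return offsets, position


# String table: "foo" at offset 1, "a" at 5, "b" at 7, "bar" at 9.
strings = b"\0foo\0a\0b\0bar\0"
types = encode_struct_type(1, [(5, 1), (7, 2)]) + encode_struct_type(9, [])
subsections = [
    ("function info", b"\x00" * 4),
    ("variables", b"\x00" * 16),
    ("data types", types),
    ("string table", strings),
]
offsets, total = section_offsets(subsections)
print(offsets, total)
```

Adding one more member to the first struct would shift the string table offset
by 8 bytes, so the offsets cannot be fixed before all types have been seen.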

So, I think the needs are different enough to warrant an implementation
segregated from the dwarf* codebase.


> Btw, do I read the CTF document posted to the binutils list (but not
> cross-referenced
> here :/) correctly in that you only want CTF debug for objects defined
> in the file and
> type information for the types referred to from that?  At

Yes. CTF is emitted for types at file-scope and global-scope only.  Types, vars
at function-scope should be skipped.

> dwarf2out_early_finish time
> it isn't fully known which symbols will end up being emitted (and with
> LTO you only
> would know at link time).

In a nutshell, I am processing all decls at early_global_decl () time except
TYPE_DECL (similar to DWARF, based on the thinking that if they are required
they will be reached via other DECLs).
In addition, I process all decls at type_decl () time except function-scope
decls, unnamed decls and builtins.

Currently, it does look like CTF for possibly to-be-omitted symbols will be
generated... I assume even DWARF needs to handle this case. Can you point me to
how DWARF does this?

> It seems to me that linker support to garbage collect
> unused entries would be the way to go forward (probably easy for the
> declarations
> but not so for the types)?

Hmm, garbage collecting unused types in linker - Let me get back to you on
this. It does not look easy. Decl should be doable though.
Richard Biener May 24, 2019, 9:26 a.m. UTC | #5
On Thu, May 23, 2019 at 10:31 PM Indu Bhagat <indu.bhagat@oracle.com> wrote:
>
>
>
> On 05/22/2019 02:04 AM, Richard Biener wrote:
>
> The CTF debug information is kept in a CTF container distinct from the frontend
> structures.  HashMaps are used to avoid generation of duplicate CTF and to
> book-keep the generated CTF.
>
> OK.  So I wonder how difficult it is to emit CTF by walking dwarf2out's own
> data structures?  That is, in my view CTF should be emitted by
> dwarf2out_early_finish ()  (which is also the point LTO type/decl debug
> is generated from).  It would be nice to avoid extra bookkeeping data structures
> for CTF since those of DWARF should include all necessary information already.
>
> CTF format has some characteristics which make it necessary to "pre-process"
> the generated CTF data before asm'ing out into a section. E.g. few cases of why
> "pre-processing" CTF is required before asm'ing out :
>  1. CTF types do need to be emitted in "some" order :
>     CTF types can have references to other CTF types. This consequently implies
>     that the referenced type must appear BEFORE the referring type.
>  2. CTF preamble holds offsets to the various subsections - function info,
>     variables, data types and CTF string table. To calculate the offsets, the
>     compiler needs to know the size in bytes of these sub-sections.  CTF
>     representation for some types like structures, functions, enums have
>     variable length of bytes trailing them (depending on the definition of the
>     type).
>  3. CTF variable entries need to be placed in the order of the names.
>
> Because of some of these "features" of the CTF format, the compiler does need
> to do a transition from runtime CTF generated data --> CTF binary data format
> for a clean and readable code.
>
> So, I think the needs are different enough to vouch for an implementation
> segregated from dwarf* codebase.
>
>
> Btw, do I read the CTF document posted to the binutils list (but not
> cross-referenced
> here :/) correctly in that you only want CTF debug for objects defined
> in the file and
> type information for the types referred to from that?  At
>
> Yes. CTF is emitted for types at file-scope and global-scope only.  Types, vars
> at function-scope should be skipped.
>
> dwarf2out_early_finish time
> it isn't fully known which symbols will end up being emitted (and with
> LTO you only
> would know at link time).
>
> In nutshell, I am processing all decl at early_global_decl () time except
> TYPE_DECL (Similar to DWARF, based on the thinking that if they are required
> they will be reached at via other DECL).
> In addition, I process all decl at type_decl () time except function-scope,
> no-name decl, builtins.
>
> Currently, it does look like CTF for possibly to-be-omitted symbols will be
> generated... I assume even DWARF needs to handle this case. Can you point me to
> how DWARF does this ?

It emits the debug information.  DWARF outputs a representation of the source,
not only emitted objects.  We prune some "unused" bits if the user prefers us
to do that but we do not omit information on types or decls that are used in
the source but later eventually optimized away.

> It seems to me that linker support to garbage collect
> unused entries would be the way to go forward (probably easy for the
> declarations
> but not so for the types)?
>
> Hmm, garbage collecting unused types in linker - Let me get back to you on
> this. It does not look easy. Decl should be doable though.

For example, DWARF has something like type units that can be referred
to via hashes.  GCC can output those into separate sections and I can
envision outputting separate debug (CTF) sections for each declaration.
The linker could then merge sections for declarations that survived
and pick up all referenced type sections.  Restrictions on ordering
for CTF may make this a bit difficult though, essentially forcing a
separate intermediate "unlinked" format and the linker regenerating
the final one.  OTOH CTF probably simply concatenates data from
different CUs?

Richard.
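
The garbage-collection idea sketched in the message above (keep the
declarations that survive, then pull in every type they reference) is
essentially a reachability walk. A minimal Python model follows; the data
layout and the function name live_types are assumptions for illustration, and
the model ignores the CTF ordering restrictions mentioned there.

```python
def live_types(surviving_decls, decl_type, type_refs):
    """Return the set of types reachable from the surviving declarations.

    `decl_type` maps a declaration name to its type; `type_refs` maps a type
    to the types it references. Both are hypothetical inputs for this sketch.
    """
    live = set()
    worklist = [decl_type[decl] for decl in surviving_decls]
    while worklist:
        type_name = worklist.pop()
        if type_name in live:
            continue  # already marked
        live.add(type_name)
        worklist.extend(type_refs.get(type_name, []))
    return live


decl_type = {
    "main": "int ()",
    "unused_helper": "struct tmp *",
    "counter": "long",
}
type_refs = {
    "int ()": ["int"],
    "struct tmp *": ["struct tmp"],
    "struct tmp": ["int"],
}
print(sorted(live_types(["main", "counter"], decl_type, type_refs)))
```

Types referenced only by the discarded declaration ("struct tmp *" and
"struct tmp" here) are never marked and could be dropped.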
Michael Matz May 24, 2019, 1:04 p.m. UTC | #6
Hello,

On Thu, 23 May 2019, Indu Bhagat wrote:

> > OK.  So I wonder how difficult it is to emit CTF by walking dwarf2out's own
> > data structures?  That is, in my view CTF should be emitted by
> > dwarf2out_early_finish ()  (which is also the point LTO type/decl debug
> > is generated from).  It would be nice to avoid extra bookkeeping data
> > structures
> > for CTF since those of DWARF should include all necessary information
> > already.
> 
> CTF format has some characteristics which make it necessary to "pre-process"
> the generated CTF data before asm'ing out into a section. E.g. few cases of
> why "pre-processing" CTF is required before asm'ing out :
>  1. CTF types do need to be emitted in "some" order :
>     CTF types can have references to other CTF types. This consequently
>     implies
>     that the referenced type must appear BEFORE the referring type.
>  2. CTF preamble holds offsets to the various subsections - function info,
>     variables, data types and CTF string table. To calculate the offsets, the
>     compiler needs to know the size in bytes of these sub-sections.  CTF
>     representation for some types like structures, functions, enums have
>     variable length of bytes trailing them (depending on the definition of the
>     type).
>  3. CTF variable entries need to be placed in the order of the names.
> 
> Because of some of these "features" of the CTF format, the compiler does 
> need to do a transition from runtime CTF generated data --> CTF binary 
> data format for a clean and readable code.

Sure, but this whole process could still be triggered from within 
dwarf2out_early_finish, by walking the DWARF tree (instead of getting fed 
types and decls via hooks) and generating the appropriate CTF data 
structures.  (It's just a possibility, it might end up uglier than using 
GCC trees)
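
As an illustration of the quoted constraints 1 and 3, the following is a
rough sketch in C of the kind of pre-processing involved, whichever place
triggers it: a post-order walk that places referenced types before referring
ones, and a sort of variable names.  Everything here (sketch_type,
order_types, sort_var_names) is a hypothetical stand-in; it assumes the type
references form no cycles and that "order of the names" means plain strcmp
order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_REFS 4

/* Hypothetical type record: indices of the types this type refers to.  */
struct sketch_type
{
  size_t refs[SKETCH_MAX_REFS];
  size_t nrefs;
};

/* Post-order visit: every type T refers to is appended to ORDER before
   T itself.  */
static void
visit (const struct sketch_type *types, size_t t,
       unsigned char *done, size_t *order, size_t *pos)
{
  if (done[t])
    return;
  done[t] = 1;
  assert (types[t].nrefs <= SKETCH_MAX_REFS);
  for (size_t i = 0; i < types[t].nrefs; i++)
    visit (types, types[t].refs[i], done, order, pos);
  order[(*pos)++] = t;
}

/* Fill ORDER (NTYPES entries) with an emission order in which each
   referenced type precedes the types referring to it.  DONE is scratch
   space and must be zero-initialized.  Only valid for acyclic input.  */
void
order_types (const struct sketch_type *types, size_t ntypes,
             unsigned char *done, size_t *order)
{
  size_t pos = 0;
  for (size_t t = 0; t < ntypes; t++)
    visit (types, t, done, order, &pos);
}

static int
cmp_names (const void *a, const void *b)
{
  return strcmp (*(const char *const *) a, *(const char *const *) b);
}

/* Sort variable names in place (byte-wise order, an assumption).  */
void
sort_var_names (const char **names, size_t n)
{
  qsort (names, n, sizeof *names, cmp_names);
}
```

Neither step depends on whether the input comes from GCC trees or from the
DWARF tree; only the code that fills in the records would differ.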

Imagine a world where debug hooks wouldn't exist (which is where we would 
like to end up in a far away future), how would you then add CTF debug 
info to the compiler (assuming it already emits DWARF)?  You would hook 
yourself either into the DWARF routines that currently are fed the 
entities or you would hook yourself into somewhere late in the pipeline 
where the DWARF debug info is complete and you would generate CTF from 
that.

> So, I think the needs are different enough to vouch for an implementation
> segregated from dwarf* codebase.

Of course.  We are merely discussing where the triggering of processing 
starts: debug hooks, or something like:

dwarf2out_early_finish() {
  ...
  if (ctf)
    ctf_emit();
}

(and then in addition if the current DWARF info would be the source of CTF 
info, or if it'd be whatever the compiler gives you as trees)

The thing is, with debug hooks you'd have to invent a scheme of stacking 
hooks on top of each other (because we want to generate DWARF and CTF from 
the same compilation).  That seems like a wasted effort when our wish is 
for the hooks to go away altogether.


Ciao,
Michael.
Jakub Jelinek May 24, 2019, 1:24 p.m. UTC | #7
On Tue, May 21, 2019 at 03:44:47PM -0700, Indu Bhagat wrote:
> Yes and No :) And that is indeed one of the motivation of the project - to
> allow CTF generation where it's most suited aka the toolchain.
> 
> There do exist utilities for generation of CTF from DWARF. For example, one of
> them is the dwarf2ctf in the DTrace Linux distribution. dwarf2ctf works offline
> to transform DWARF generated by the compiler into CTF.

So, if there is a conversion utility, why don't we just change gcc so that
if some new option is passed on the gcc driver link line, then that
post-processing utility will be invoked too?

> A dependency of an external conversion utility for "post-processing" DWARF
> offline poses several problems:
> 
> 1. Deployment problems: the converter should be distributed and integrated in
>    the build system of the program.  This, on occasions, can be intrusive.  For
>    example, in terms of dependencies: the dwarf2ctf converter depends on
>    libdwarf from elfutils suite, glib2 (used for the GHashTable), zlib (used to
>    compress the CTF information) and libctf (which both reads and writes the
>    CTF data).

I don't see this as a problem.

> 2. Performance problems: the conversion from DWARF to CTF can take a long time,
>    especially in big programs such as the Linux kernel.

So optimize it?  Linux kernel certainly doesn't have extra large debug
information, compared to other projects.

> 3. Maintainability problems: the converter should be maintained in order to
>    reflect potential changes in the DWARF generated by the compiler.

If you integrate the support into GCC, then it will need to be maintained
there as well, I bet it will be more work than on the conversion utility.

	Jakub
Jakub Jelinek May 24, 2019, 1:25 p.m. UTC | #8
On Fri, May 24, 2019 at 03:24:34PM +0200, Jakub Jelinek wrote:
> On Tue, May 21, 2019 at 03:44:47PM -0700, Indu Bhagat wrote:
> > Yes and No :) And that is indeed one of the motivation of the project - to
> > allow CTF generation where it's most suited aka the toolchain.
> > 
> > There do exist utilities for generation of CTF from DWARF. For example, one of
> > them is the dwarf2ctf in the DTrace Linux distribution. dwarf2ctf works offline
> > to transform DWARF generated by the compiler into CTF.
> 
> So, if there is a conversion utility, why don't we just change gcc so that
> if some new option is passed on the gcc driver link line, then that
> post-processing utility will be invoked too?

It could be even written as a linker plugin.

	Jakub
Indu Bhagat May 27, 2019, 6:22 p.m. UTC | #9
Hi Michael,

On 05/24/2019 06:04 AM, Michael Matz wrote:
> Hello,
>
> On Thu, 23 May 2019, Indu Bhagat wrote:
>
>>> OK.  So I wonder how difficult it is to emit CTF by walking dwarf2outs own
>>> data structures?  That is, in my view CTF should be emitted by
>>> dwarf2out_early_finish ()  (which is also the point LTO type/decl debug
>>> is generated from).  It would be nice to avoid extra bookkeeping data
>>> structures
>>> for CTF since those of DWARF should include all necessary information
>>> already.
>> CTF format has some characteristics which make it necessary to "pre-process"
>> the generated CTF data before asm'ing out into a section. E.g. few cases of
>> why "pre-processing" CTF is required before asm'ing out :
>>   1. CTF types do need to be emitted in "some" order :
>>      CTF types can have references to other CTF types. This consequently
>>      implies
>>      that the referenced type must appear BEFORE the referring type.
>>   2. CTF preamble holds offsets to the various subsections - function info,
>>      variables, data types and CTF string table. To calculate the offsets, the
>>      compiler needs to know the size in bytes of these sub-sections.  CTF
>>      representation for some types like structures, functions, enums have
>>      variable length of bytes trailing them (depending on the definition of the
>>      type).
>>   3. CTF variable entries need to be placed in the order of the names.
>>
>> Because of some of these "features" of the CTF format, the compiler does
>> need to do a transition from runtime CTF generated data --> CTF binary
>> data format for a clean and readable code.
> Sure, but this whole process could still be triggered from within
> dwarf2out_early_finish, by walking the DWARF tree (instead of getting fed
> types and decls via hooks) and generating the appropriate CTF data
> structures.  (It's just a possibility, it might end up uglier than using
> GCC trees)

I think not only is the code messier, but it's also wasted effort if the user
only wants to generate CTF.

> Imagine a world where debug hooks wouldn't exist (which is where we would
> like to end up in a far away future), how would you then add CTF debug
> info to the compiler (assuming it already emits DWARF)?  You would hook
> yourself either into the DWARF routines that currently are fed the
> entities or you would hook yourself into somewhere late in the pipeline
> where the DWARF debug info is complete and you would generate CTF from
> that.
>
>> So, I think the needs are different enough to vouch for an implementation
>> segregated from dwarf* codebase.
> Of course.  We are merely discussing of where the triggering of processing
> starts: debug hooks, or something like:
>
> dwarf2out_early_finish() {
>    ...
>    if (ctf)
>      ctf_emit();
> }
>
> (and then in addition if the current DWARF info would be the source of CTF
> info, or if it'd be whatever the compiler gives you as trees)
>
> The thing is, with debug hooks you'd have to invent a scheme of stacking
> hooks on top of each other (because we want to generate DWARF and CTF from
> the same compilation).  That seems like a wasted effort when our wish is
> for the hooks to go away altogether.
>
When the debug hooks go away, the functionality can be folded in. Much like
above, the proposed CTF implementation will do:

static void
ctf_early_global_decl (tree decl)
{
   /* Record CTF for the decl, then forward to the wrapped hooks.  */
   ctf_decl (decl);

   real_debug_hooks->early_global_decl (decl);
}

These ctf_* debug hooks wrappers are as lean as shown above.
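
For readers unfamiliar with the pattern, here is a self-contained sketch in C
of what such wrapping amounts to.  It deliberately uses toy stand-ins
(toy_decl, toy_debug_hooks and the toy_* functions are all hypothetical) and
not GCC's actual debug hooks structure, which has many more entries; only the
name real_debug_hooks is borrowed from the snippet above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for GCC's `tree'; the real type is far richer.  */
typedef const char *toy_decl;

/* Toy stand-in for a debug hooks table with a single hook.  */
struct toy_debug_hooks
{
  void (*early_global_decl) (toy_decl);
};

static int ctf_seen;    /* Decls recorded by the CTF layer.  */
static int dwarf_seen;  /* Decls seen by the wrapped ("real") hooks.  */

static void
toy_ctf_decl (toy_decl decl)
{
  (void) decl;
  ctf_seen++;
}

static void
toy_dwarf_early_global_decl (toy_decl decl)
{
  (void) decl;
  dwarf_seen++;
}

static const struct toy_debug_hooks toy_dwarf_hooks
  = { toy_dwarf_early_global_decl };

/* The hooks that would have been active without CTF.  */
static const struct toy_debug_hooks *real_debug_hooks = &toy_dwarf_hooks;

/* Wrapper: do the CTF work, then forward to the wrapped hook so the
   other debug format still sees every decl.  */
static void
toy_ctf_early_global_decl (toy_decl decl)
{
  assert (real_debug_hooks->early_global_decl != NULL);
  toy_ctf_decl (decl);
  real_debug_hooks->early_global_decl (decl);
}

/* The table installed as the active hooks when CTF is requested.  */
static const struct toy_debug_hooks toy_ctf_hooks
  = { toy_ctf_early_global_decl };
```

Installing toy_ctf_hooks as the active table means each decl reaches both the
CTF layer and the wrapped hooks.  Every hook in the table needs such a
forwarding wrapper, which is the stacking cost mentioned above.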

I do understand now that if debug hooks are destined to go away, all the
implementations which wrap debug hooks (go dump hooks, vms debug hooks,
and now the proposed ctf debug hooks) will need some merging. But to generate
CTF, I think working on types or decls instead of DWARF DIEs is a better
implementation because if the user wants only CTF, no DWARF trees need to be
created.

This way we keep DWARF and CTF generation independent of each other (as the
user may want either one of these or both).

> Ciao,
> Michael.
Richard Biener May 29, 2019, 7:15 a.m. UTC | #10
On Mon, May 27, 2019 at 8:12 PM Indu Bhagat <indu.bhagat@oracle.com> wrote:
>
> Hi Michael,
>
> On 05/24/2019 06:04 AM, Michael Matz wrote:
> > Hello,
> >
> > On Thu, 23 May 2019, Indu Bhagat wrote:
> >
> >>> OK.  So I wonder how difficult it is to emit CTF by walking dwarf2outs own
> >>> data structures?  That is, in my view CTF should be emitted by
> >>> dwarf2out_early_finish ()  (which is also the point LTO type/decl debug
> >>> is generated from).  It would be nice to avoid extra bookkeeping data
> >>> structures
> >>> for CTF since those of DWARF should include all necessary information
> >>> already.
> >> CTF format has some characteristics which make it necessary to "pre-process"
> >> the generated CTF data before asm'ing out into a section. E.g. few cases of
> >> why "pre-processing" CTF is required before asm'ing out :
> >>   1. CTF types do need to be emitted in "some" order :
> >>      CTF types can have references to other CTF types. This consequently
> >>      implies
> >>      that the referenced type must appear BEFORE the referring type.
> >>   2. CTF preamble holds offsets to the various subsections - function info,
> >>      variables, data types and CTF string table. To calculate the offsets, the
> >>      compiler needs to know the size in bytes of these sub-sections.  CTF
> >>      representation for some types like structures, functions, enums have
> >>      variable length of bytes trailing them (depending on the definition of the
> >>      type).
> >>   3. CTF variable entries need to be placed in the order of the names.
> >>
> >> Because of some of these "features" of the CTF format, the compiler does
> >> need to do a transition from runtime CTF generated data --> CTF binary
> >> data format for a clean and readable code.
> > Sure, but this whole process could still be triggered from within
> > dwarf2out_early_finish, by walking the DWARF tree (instead of getting fed
> > types and decls via hooks) and generating the appropriate CTF data
> > structures.  (It's just a possibility, it might end up uglier than using
> > GCC trees)
>
> I think not only is the code messier, but it's also wasted effort if user only
> wants to generate CTF.
>
> > Imagine a world where debug hooks wouldn't exist (which is where we would
> > like to end up in a far away future), how would you then add CTF debug
> > info to the compiler (assuming it already emits DWARF)?  You would hook
> > yourself either into the DWARF routines that currently are fed the
> > entities or you would hook yourself into somewhere late in the pipeline
> > where the DWARF debug info is complete and you would generate CTF from
> > that.
> >
> >> So, I think the needs are different enough to vouch for an implementation
> >> segregated from dwarf* codebase.
> > Of course.  We are merely discussing of where the triggering of processing
> > starts: debug hooks, or something like:
> >
> > dwarf2out_early_finish() {
> >    ...
> >    if (ctf)
> >      ctf_emit();
> > }
> >
> > (and then in addition if the current DWARF info would be the source of CTF
> > info, or if it'd be whatever the compiler gives you as trees)
> >
> > The thing is, with debug hooks you'd have to invent a scheme of stacking
> > hooks on top of each other (because we want to generate DWARF and CTF from
> > the same compilation).  That seems like a wasted effort when our wish is
> > for the hooks to go away altogether.
> >
> When the debug hooks go away, the functionality can be folded in. Much like
> above, the ctf proposed implementation will do :
>
> ctf_early_global_decl (tree decl)
> {
>    ctf_decl (decl);
>
>    real_debug_hooks->early_global_decl (decl);
> }
>
> These ctf_* debug hooks wrappers are as lean as shown above.
>
> I do understand now that if debug hooks are destined to go away, all the
> implementation which wraps debug hooks (go dump hooks, vms debug hooks,
> and now the proposed ctf debug hooks) will need some merging. But to generate
> CTF, I think working on type or decl instead of DWARF dies to is a better
> implementation because if user wants only CTF, no DWARF trees need to be
> created.
>
> This way we keep DWARF and CTF generation independent of each other (as the
> user may want either one of these or both).

The user currently can't have both DWARF and STABS either.  That things like
godump use debug hooks is just (convenient?) abuse.

In the end frontends will not call something like dwarf2out_decl but maybe
gen_subroutine_die () or gen_template_die ().  So how do you expect
the "wrapping" to work there?

I understand you want CTF for "actually emitted" decls so I propose you
instead hook into the symtab code which would end up calling the
early_global_decl debug hook.  But please don't add new debug hook
users.

Richard.

> > Ciao,
> > Michael.
>
Indu Bhagat May 31, 2019, 7:28 p.m. UTC | #11
On 05/24/2019 02:26 AM, Richard Biener wrote:
>> Currently, it does look like CTF for possibly to-be-omitted symbols will be
>> generated... I assume even DWARF needs to handle this case. Can you point me to
>> how DWARF does this ?
> It emits the debug information.  DWARF outputs a representation of the source,
> not only emitted objects.  We prune some "unused" bits if the user prefers us
> to do that but we do not omit information on types or decls that are used in
> the source but later eventually optimized away.
>
>> It seems to me that linker support to garbage collect
>> unused entries would be the way to go forward (probably easy for the
>> declarations
>> but not so for the types)?
>>
>> Hmm, garbage collecting unused types in linker - Let me get back to you on
>> this. It does not look easy. Decl should be doable though.
> For example DWARF has something like type units that can be referred
> to via hashes.  GCC can output those into separate sections and I can
> envision outputting separate debug (CTF) sections for each declaration.
> The linker could then merge sections for declarations that survived
> and pick up all referenced type sections.  Restrictions on ordering
> for CTF may make this a bit difficult though, essentially forcing a
> separate intermediate "unlinked" format and the linker regenerating
> the final one.  OTOH CTF probably simply concatenates data from
> different CUs?

Yes, I cannot see this happening with CTF easily without some format changes.

At link-time, there needs to be de-duplication of CTF types across CUs. This
linker component needs work at this time, although we do have a working
prototype.

Regarding the type units in DWARF, are the shared/common types duplicated
across the type units ? If not duplicated, how are the referenced types
maintained/denoted across type units ?

Thanks!
Indu
Indu Bhagat June 1, 2019, 12:25 a.m. UTC | #12
On 05/29/2019 12:15 AM, Richard Biener wrote:
>>> Of course.  We are merely discussing of where the triggering of processing
>>> starts: debug hooks, or something like:
>>>
>>> dwarf2out_early_finish() {
>>>     ...
>>>     if (ctf)
>>>       ctf_emit();
>>> }
>>>
>>> (and then in addition if the current DWARF info would be the source of CTF
>>> info, or if it'd be whatever the compiler gives you as trees)
>>>
>>> The thing is, with debug hooks you'd have to invent a scheme of stacking
>>> hooks on top of each other (because we want to generate DWARF and CTF from
>>> the same compilation).  That seems like a wasted effort when our wish is
>>> for the hooks to go away altogether.
>>>
>> When the debug hooks go away, the functionality can be folded in. Much like
>> above, the ctf proposed implementation will do :
>>
>> ctf_early_global_decl (tree decl)
>> {
>>     ctf_decl (decl);
>>
>>     real_debug_hooks->early_global_decl (decl);
>> }
>>
>> These ctf_* debug hooks wrappers are as lean as shown above.
>>
>> I do understand now that if debug hooks are destined to go away, all the
>> implementation which wraps debug hooks (go dump hooks, vms debug hooks,
>> and now the proposed ctf debug hooks) will need some merging. But to generate
>> CTF, I think working on type or decl instead of DWARF dies to is a better
>> implementation because if user wants only CTF, no DWARF trees need to be
>> created.
>>
>> This way we keep DWARF and CTF generation independent of each other (as the
>> user may want either one of these or both).
> The user currently can't have both DWARF and STABS either.  That things like
> godump uses debug hooks is just (convenient?) abuse.
>
> In the end frontends will not call sth like dwarf2out_decl but maybe
> gen_subroutine_die () or gen_template_die ().  So how do you expect
> the "wrapping" to work there?
>
> I understand you want CTF for "actually emitted" decls so I propose you
> instead hook into the symtab code which would end up calling the
> early_global_decl debug hook.  But please don't add new debug hook
> users.

OK.

Will I need to tap both the callsites of the early_global_decl () debug hook ? :
   1. symbol_table::finalize_compilation_unit () in cgraphunit.c
   2. rest_of_decl_compilation () in passes.c
Or is the last one for something specific to godump debug hooks and C++ ?

I guess the above will take care of the CTF generation bit. For emission,
something similar should be done because DWARF hooks will not be initialized if
DWARF debuginfo is not requested by the user. So I cannot have the CTF emission
code in the dwarf2out*finish () debug hooks as suggested earlier.

Curious to know how the current debug hook users like dbx debug hooks will be
taken care of in the future design ? Is it just the wrapping/stacking of debug
hooks that's problematic and not the clean instances like dbx debug hooks ?
Richard Biener June 3, 2019, 10 a.m. UTC | #13
On Sat, Jun 1, 2019 at 2:14 AM Indu Bhagat <indu.bhagat@oracle.com> wrote:
>
>
>
> On 05/29/2019 12:15 AM, Richard Biener wrote:
>
> Of course.  We are merely discussing of where the triggering of processing
> starts: debug hooks, or something like:
>
> dwarf2out_early_finish() {
>    ...
>    if (ctf)
>      ctf_emit();
> }
>
> (and then in addition if the current DWARF info would be the source of CTF
> info, or if it'd be whatever the compiler gives you as trees)
>
> The thing is, with debug hooks you'd have to invent a scheme of stacking
> hooks on top of each other (because we want to generate DWARF and CTF from
> the same compilation).  That seems like a wasted effort when our wish is
> for the hooks to go away altogether.
>
> When the debug hooks go away, the functionality can be folded in. Much like
> above, the ctf proposed implementation will do :
>
> ctf_early_global_decl (tree decl)
> {
>    ctf_decl (decl);
>
>    real_debug_hooks->early_global_decl (decl);
> }
>
> These ctf_* debug hooks wrappers are as lean as shown above.
>
> I do understand now that if debug hooks are destined to go away, all the
> implementation which wraps debug hooks (go dump hooks, vms debug hooks,
> and now the proposed ctf debug hooks) will need some merging. But to generate
> CTF, I think working on type or decl instead of DWARF dies to is a better
> implementation because if user wants only CTF, no DWARF trees need to be
> created.
>
> This way we keep DWARF and CTF generation independent of each other (as the
> user may want either one of these or both).
>
> The user currently can't have both DWARF and STABS either.  That things like
> godump uses debug hooks is just (convenient?) abuse.
>
> In the end frontends will not call sth like dwarf2out_decl but maybe
> gen_subroutine_die () or gen_template_die ().  So how do you expect
> the "wrapping" to work there?
>
> I understand you want CTF for "actually emitted" decls so I propose you
> instead hook into the symtab code which would end up calling the
> early_global_decl debug hook.  But please don't add new debug hook
> users.
>
> OK.
>
> Will I need to tap both the callsites of the early_global_decl () debug hook ? :
>   1. symbol_table::finalize_compilation_unit () in cgraphunit.c
>   2. rest_of_decl_compilation () in passes.c
> Or is the last one for something specific to godump debug hooks and C++ ?

You need to handle both, one is for variables and one for functions.

> I guess the above will take care of the CTF generation bit. For emission,
> something similar should be done because DWARF hooks will not be initialized if
> DWARF debuginfo is not requested by the user. So I cannot have the CTF emission
> code in the dwarf2out*finish () debug hooks as suggested earlier.

True.

> Curious to know how the current debug hook users like dbx debug hooks will be
> taken care of in the future design ? Is it just the wrapping/stacking of debug
> hooks that's problematic and not the clean instances like dbx debug hooks ?

dbx format support will go away; it's unmaintained and not up to the
task of handling modern languages.

Richard.

>
Richard Biener June 3, 2019, 10:01 a.m. UTC | #14
On Fri, May 31, 2019 at 9:17 PM Indu Bhagat <indu.bhagat@oracle.com> wrote:
>
>
>
> On 05/24/2019 02:26 AM, Richard Biener wrote:
>
> Currently, it does look like CTF for possibly to-be-omitted symbols will be
> generated... I assume even DWARF needs to handle this case. Can you point me to
> how DWARF does this ?
>
> It emits the debug information.  DWARF outputs a representation of the source,
> not only emitted objects.  We prune some "unused" bits if the user prefers us
> to do that but we do not omit information on types or decls that are used in
> the source but later eventually optimized away.
>
> It seems to me that linker support to garbage collect
> unused entries would be the way to go forward (probably easy for the
> declarations
> but not so for the types)?
>
> Hmm, garbage collecting unused types in linker - Let me get back to you on
> this. It does not look easy. Decl should be doable though.
>
> For example DWARF has something like type units that can be referred
> to via hashes.  GCC can output those into separate sections and I can
> envision outputting separate debug (CTF) sections for each declaration.
> The linker could then merge sections for declarations that survived
> and pick up all referenced type sections.  Restrictions on ordering
> for CTF may make this a bit difficult though, essentially forcing a
> separate intermediate "unlinked" format and the linker regenerating
> the final one.  OTOH CTF probably simply concatenates data from
> different CUs?
>
> Yes, I cannot see this happening with CTF easily without some format changes.
>
> At link-time, there needs to be de-duplication of CTF types across CUs. This
> linker component needs work at this time, although we do have a working
> prototype.
>
> Regarding the type units in DWARF, are the shared/common types duplicated
> across the type units ? If not duplicated, how are the referenced types
> maintained/denoted across type units ?

I think type units can refer to each other just fine but the linker will not
split them up further, just throw away duplicates.
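
As a rough model of "just throw away duplicates": each type unit carries an
8-byte type signature, and units with the same signature are interchangeable,
so only one copy needs to be kept.  The sketch below in C shows that idea on
a bare array of signatures; the function name is made up for illustration
and the quadratic scan is chosen for brevity, not because any linker works
this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Keep the first unit seen for each signature and drop later copies.
   SIGS is compacted in place, preserving first-occurrence order.
   Returns the number of units kept.  */
size_t
drop_duplicate_units (uint64_t *sigs, size_t n)
{
  size_t kept = 0;

  assert (sigs != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      size_t j;

      /* Look for this signature among the units already kept.  */
      for (j = 0; j < kept; j++)
        if (sigs[j] == sigs[i])
          break;

      /* Not seen before: keep it.  */
      if (j == kept)
        sigs[kept++] = sigs[i];
    }
  return kept;
}
```

Note that whole units are kept or dropped here; nothing inside a unit is
split up or merged, which is the contrast with CTF's cross-CU type
de-duplication.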

Richard.

> Thanks!
> Indu
>
Indu Bhagat June 4, 2019, 12:15 a.m. UTC | #15
Hello,


On 05/24/2019 06:24 AM, Jakub Jelinek wrote:
> On Tue, May 21, 2019 at 03:44:47PM -0700, Indu Bhagat wrote:
>> Yes and No :) And that is indeed one of the motivation of the project - to
>> allow CTF generation where it's most suited aka the toolchain.
>>
>> There do exist utilities for generation of CTF from DWARF. For example, one of
>> them is the dwarf2ctf in the DTrace Linux distribution. dwarf2ctf works offline
>> to transform DWARF generated by the compiler into CTF.
> So, if there is a conversion utility, why don't we just change gcc so that
> if some new option is passed on the gcc driver link line, then that
> post-processing utility will be invoked too?

Performing DWARF to CTF conversion at link-time is not recommended because for
any large project, it is time-consuming to generate, read in, and convert DWARF
to CTF in a post-processing step. These costs are high enough to hinder CTF
adoption.

Data for some projects below.

>> A dependency of an external conversion utility for "post-processing" DWARF
>> offline poses several problems:
>>
>> 1. Deployment problems: the converter should be distributed and integrated in
>>     the build system of the program.  This, on occasions, can be intrusive.  For
>>     example, in terms of dependencies: the dwarf2ctf converter depends on
>>     libdwarf from elfutils suite, glib2 (used for the GHashTable), zlib (used to
>>     compress the CTF information) and libctf (which both reads and writes the
>>     CTF data).
> I don't see this as a problem.
>
>> 2. Performance problems: the conversion from DWARF to CTF can take a long time,
>>     especially in big programs such as the Linux kernel.
> So optimize it?  Linux kernel certainly doesn't have extra large debug
> information, compared to other projects.

Generating DWARF only for it to be transformed into something compact like CTF
and eventually stripped away is not an efficient workflow. Further, only
a subset of this is amenable to "optimizations", i.e. compile-time generation
of DWARF is unavoidable in this flow.

For example, for the Linux kernel, the debuginfo in the object files is ~9GiB
(the object files without debuginfo are ~1.5GiB). That's a 500% increase in the
space requirements. Generating the DWARF at compile time adds 20% to the compile
time. Next, reading in this DWARF using libdwarf from elfutils takes ~1 min 10
seconds. A conversion utility like dwarf2ctf will then need to perform
de-duplication on this DWARF, which, as you can see, is rather voluminous for
the purpose at hand.

Not just the kernel: for another relatively large internal application we
measured a 19% and 23% increase in compile time for generating DWARF for
randomly chosen subsets of C and C++ components respectively. For the entire
application, it already takes on the order of 3-4 hours to perform a parallel
build. The space requirements for building the entire application with -g also
increase to about 5.5x.

The numbers I state above are from sufficiently beefy platforms. On the
"wimpier" platforms, or for single-threaded builds for that matter, the increase
in time costs to generate DWARF and then use an external utility (at link-time
or offline) just aggravates the pain for developers, who may even need to do
multiple builds a day.

So, in summary, this increase in build times and space costs is noticeable
(even prohibitive in some cases) and does impact a subset of kernel and
application developers who need to use CTF. Above all, effectively it limits
CTF adoption.

>
>> 3. Maintainability problems: the converter should be maintained in order to
>>     reflect potential changes in the DWARF generated by the compiler.
> If you integrate the support into GCC, then it will need to be maintained
> there as well, I bet it will be more work than on the conversion utility.

Yes, work is needed on all components of the toolchain to integrate CTF
generation. We are working on the compiler, linker and debugger (GDB) currently.

We are committed to maintaining the CTF generation in GCC, just as we are doing
for Binutils, and will be doing for GDB.

>
> 	Jakub