Patchwork v2 of GDB hooks for debugging GCC

Submitter David Malcolm
Date Aug. 21, 2013, 8:07 p.m.
Message ID <1377115676.9927.25.camel@surprise>
Permalink /patch/268898/
State New

Comments

David Malcolm - Aug. 21, 2013, 8:07 p.m.
On Mon, 2013-08-05 at 08:26 -0600, Tom Tromey wrote:
> >>>>> "David" == David Malcolm <dmalcolm@redhat.com> writes:
> 
> David> GDB 7.0 onwards supports hooks written in Python to improve the
> David> quality-of-life within the debugger.  The best known are the
> David> pretty-printing hooks [1], which we already use within libstdc++ for
> David> printing better representations of STL containers.
> 
> Nice!
Thanks.

> A few suggestions.
> 
> David>   (note how the rtx_def* is printed inline.  This last one is actually a
> David> kludge; all the other pretty-printers work inside gdb, whereas this one
> David> injects a call into print-rtl.c into the inferior).
> 
> Naughty.
We chatted about this at Cauldron; I haven't yet had a chance to
implement the magic bullet approach we discussed there.  In the
meantime, is there an API I can call to determine how safe this kludge
is?

> David>   * it hardcoded values in a few places rather than correctly looking up
> David> enums
> 
> If you have a new-enough gdb (I don't recall the exact version -- I can
> look if you want, but recall that gcc changes mean that gcc developers
> generally have to use a very recent gdb) you can use
> gdb.types.make_enum_dict to get this very easily.

Thanks, I've rewritten to use these; works great on this box (with
gdb-7.4.50.20120120-54.fc17.x86_64 fwiw).


> David> You may see a message from gdb of the form:
> David>   cc1-gdb.py auto-loading has been declined by your `auto-load safe-path'
> David> as a protection against untrustworthy python scripts.  See
> David>   http://sourceware.org/gdb/onlinedocs/gdb/Auto_002dloading-safe-path.html
> 
> I think you could set up the safe-path in the gcc .gdbinit.

Interesting idea - but .gdbinit itself seems to get declined, so I don't
think this can help.

> David> Note that you can suppress pretty-printers using /r (for "raw"):
> David>   (gdb) p /r pass
> David>   $3 = (opt_pass *) 0x188b600
> David> and dereference the pointer in the normal way:
> David>   (gdb) p *pass
> David>   $4 = {type = RTL_PASS, name = 0x120a312 "expand",
> David>   [etc, ...snipped...]
> 
> I just wanted to clarify here that you can "p *pass" *without* first
> using "p/r".  Pretty-printing applies just to printing -- it does not
> affect what is in the value history.  The values there still have the
> correct type and such.

I've reversed the order of these in the docstring to make this more
clear.


> David> def pretty_printer_lookup(gdbval):
> [...]
> 
> David> def register (obj):
> David>     if obj is None:
> David>         obj = gdb
> 
> David>     # Wire up the pretty-printer
> David>     obj.pretty_printers.append(pretty_printer_lookup)
> 
> It's better to use the gdb.printing facilities now.  These let user
> disable pretty-printers if they prefer.  The libstdc++ printers go out
> of their way to use gdb.printing only if available; but you can probably
> just assume it exists.

I initially tried using gdb.printing.RegexpCollectionPrettyPrinter, with
code like this:

    pp.add_printer('rtx_def', r'^rtx_def \*$', RtxPrinter)

but it didn't work.  On debugging, I note that an "rtx_def*" has a
pointer type, and hence this code in
RegexpCollectionPrettyPrinter.__call__ fails:

        # Get the type name.
        typename = gdb.types.get_basic_type(val.type).tag
        if not typename:
            return None

since the basic_type has a None tag.

So I implemented a similar scheme, with all the prettyprinters in a
top-level "gcc" holder, but doing precise string matching on the
"unqualified" type, like in the original patch.

This works as before, and presumably works with the pretty-printer
management facilities; running this gives sane-looking output:

(gdb) info pretty-printer 
  objfile /home/david/coding/gcc-python/gcc-git-prettyprinters/build/gcc/cc1 pretty-printers:
  gcc
    basic_block
    cgraph_node
    edge
    gimple
    opt_pass
    rtx_def
    tree
  objfile /lib64/libstdc++.so.6 pretty-printers:
  libstdc++-v6
    __gnu_cxx::_Slist_iterator
    __gnu_cxx::__7::_Slist_iterator
---Type <return> to continue, or q <return> to quit---

How would one go about toggling the enabledness of a prettyprinter?  Is
this something that can only be done from python?

> David> print('Successfully loaded GDB hooks for GCC')
> 
> I wonder whether gdb ought to do this.

FWIW given all of the different gdb and python builds, the vagaries of
getting the hooks into the correct location for autoload to work,
setting up auto-load safe-path, and ensuring that the script actually
parses and runs to completion, I've found it very useful to have an
explicit "yes I'm working" message like this at the end of such scripts,
especially when dealing with bug reports: it's very useful to know
whether or not this message was printed when diagnosing problems
reported by 3rd parties.

I did see references to gdb.parameter("verbose") in gdb.printing - how
would one go about setting this?  (then again, the final print is a good
sanity check, for the reasons noted above).

I'm attaching the latest version (v2).

The main TODO is to hack up the Makefile.in machinery so that the hooks
are automatically copied into place (which is why I want this in the
"gdb" dir, rather than "contrib", I think).
Tom Tromey - Aug. 21, 2013, 9:01 p.m.
>>>>> "David" == David Malcolm <dmalcolm@redhat.com> writes:

Tom> Naughty.

David> We chatted about this at Cauldron; I haven't yet had a chance to
David> implement the magic bullet approach we discussed there.  In the
David> meantime, is there an API I can call to determine how safe this kludge
David> is?

Not right now.  You can just call the function and catch the exception
that occurs if it can't be done.

I think you can still run into trouble sometimes.  For example if the
user puts a breakpoint in one of the functions used by the
pretty-printer, and then does "bt", hitting the breakpoint while
printing the backtrace... not sure what happens then, maybe a crash.
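
A minimal sketch of the "call the function and catch the exception"
approach described above.  The helper name try_inferior_call is
hypothetical, and the "execute" parameter stands in for gdb.execute so
that the sketch can be exercised outside gdb.  It relies on gdb.error
being derived from RuntimeError, as the gdb Python documentation states.
It does not address the breakpoint-during-backtrace case.

```python
def try_inferior_call(execute, command):
    """Run `command` through `execute`; report whether it succeeded.

    Inside gdb, `execute` would be gdb.execute.  gdb.error derives from
    RuntimeError, so this catches failures such as trying to call a
    function when there is no live process (e.g. a core dump).
    """
    try:
        execute(command)
    except RuntimeError:
        return False
    return True


# Hypothetical use inside a printer's to_string (assumes `import gdb`):
#
#   cmd = 'call print_inline_rtx (stderr, (const_rtx) %s, 0)' % addr
#   if not try_inferior_call(gdb.execute, cmd):
#       return '<rtx_def 0x%x>' % addr
#   return ''
```

The fallback string keeps the printer usable when the inferior call is
refused, at the cost of showing less detail.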

Tom> I think you could set up the safe-path in the gcc .gdbinit.

David> Interesting idea - but .gdbinit itself seems to get declined, so I don't
David> think this can help.

Haha, I didn't think of that :-)

David> So I implemented a similar scheme, with all the prettyprinters in a
David> top-level "gcc" holder, but doing precise string matching on the
David> "unqualified" type, like in the original patch.

David> This works as before, and presumably works with the pretty-printer
David> management facilities; running this gives sane-looking output:
[...]

Nice.

David> How would one go about toggling the enabledness of a prettyprinter?  Is
David> this something that can only be done from python?

You can use "enable pretty-printer" and "disable pretty-printer".

David> I did see references to gdb.parameter("verbose") in gdb.printing
David> - how would one go about setting this?

"set verbose on"

I think few people use this setting though; probably best to do what
you're doing now.

David> +# Convert "enum tree_code" (tree.def and tree.h) to a dict:
David> +tree_code_dict = gdb.types.make_enum_dict(gdb.lookup_type('enum tree_code'))

One other subtlety is that this doesn't interact well with all kinds of
uses of gdb.  For example if you have a running gdb, then modify enum
tree_code and rebuild, then the pretty-printers won't notice this
change.

I guess it would be nice if we had pre-built caches for this kind of
thing available upstream.  But meanwhile, if you care, you can roll your
own using events to notice when to invalidate data.
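
One possible shape for a roll-your-own cache, as a sketch only.  The
class name InvalidatableCache is hypothetical.  Which gdb event to
connect is left open here; the commented usage picks gdb.events.exited
purely as an illustration.

```python
class InvalidatableCache(object):
    """A lazily computed value that can be thrown away and rebuilt."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument callable producing the value
        self._valid = False
        self._value = None

    def get(self):
        # Compute on first use, and again after each invalidate().
        if not self._valid:
            self._value = self._compute()
            self._valid = True
        return self._value

    def invalidate(self, event=None):
        # gdb passes an event object to handlers; it is ignored here.
        self._valid = False
        self._value = None


# Hypothetical use inside gdb (assumes `import gdb, gdb.types`):
#
#   tree_codes = InvalidatableCache(
#       lambda: gdb.types.make_enum_dict(gdb.lookup_type('enum tree_code')))
#   gdb.events.exited.connect(tree_codes.invalidate)
#   ...
#   IDENTIFIER_NODE = tree_codes.get()['IDENTIFIER_NODE']
```

Looking the enum up lazily also avoids doing the lookup at script load
time, before it is known whether any printer will be used.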

David> +    def __call__(self, gdbval):
David> +        type_ = gdbval.type.unqualified()
David> +        str_type_ = str(type_)

FWIW I think for RegexpCollectionPrettyPrinter you could write a
subclass whose __call__ first dereferenced a pointer, then called
super's __call__.  But your approach is just fine.
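
A sketch of the subclass idea above, under stated assumptions.  The
factory make_pointer_aware is a hypothetical name.  Inside gdb,
base_class would be gdb.printing.RegexpCollectionPrettyPrinter and
ptr_code would be gdb.TYPE_CODE_PTR; they are parameters only so that the
sketch imports nothing.

```python
def make_pointer_aware(base_class, ptr_code):
    """Return a subclass of `base_class` that looks through one pointer."""

    class PointerAwareCollection(base_class):
        def __call__(self, val):
            # strip_typedefs() handles typedefs that name a pointer type
            # (e.g. "tree"); without it only plain "T *" would match.
            if val.type.strip_typedefs().code == ptr_code:
                val = val.dereference()
            return super(PointerAwareCollection, self).__call__(val)

    return PointerAwareCollection


# Hypothetical use inside gdb (assumes `import gdb, gdb.printing`):
#
#   Collection = make_pointer_aware(
#       gdb.printing.RegexpCollectionPrettyPrinter, gdb.TYPE_CODE_PTR)
#   pp = Collection('gcc')
#   pp.add_printer('rtx_def', r'^rtx_def$', RtxPrinter)
```

Note that with this scheme the printer receives the pointed-to value
rather than the pointer, so the regexp has to match the struct tag, and a
printer that wants the address would need to take it from the value it is
given (e.g. via its "address" attribute).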

Tom
David Malcolm - Aug. 26, 2013, 6:28 p.m.
On Wed, 2013-08-21 at 15:01 -0600, Tom Tromey wrote:
> >>>>> "David" == David Malcolm <dmalcolm@redhat.com> writes:
[...]

> David> How would one go about toggling the enabledness of a prettyprinter?  Is
> David> this something that can only be done from python?
> 
> You can use "enable pretty-printer" and "disable pretty-printer".

Yes, using .* to match the current executable:

  (gdb) disable pretty-printer .* gcc
  7 printers disabled
  128 of 135 printers enabled

and this does indeed disable/enable the prettyprinters.


> David> I did see references to gdb.parameter("verbose") in gdb.printing
> David> - how would one go about setting this?
> 
> "set verbose on"
> 
> I think few people use this setting though; probably best to do what
> you're doing now.
> 
> David> +# Convert "enum tree_code" (tree.def and tree.h) to a dict:
> David> +tree_code_dict = gdb.types.make_enum_dict(gdb.lookup_type('enum tree_code'))
> 
> One other subtlety is that this doesn't interact well with all kinds of
> uses of gdb.  For example if you have a running gdb, then modify enum
> tree_code and rebuild, then the pretty-printers won't notice this
> change.
> 
> I guess it would be nice if we had pre-built caches for this kind of
> this available upstream.  But meanwhile, if you care, you can roll your
> own using events to notice when to invalidate data.

As they say, the two fundamental problems in Computer Science are cache
invalidation, naming things, and off-by-one errors.

I'm inclined not to care for now: if you've rebuilt gcc with a new enum
tree code, then you should restart gdb.

Is there a precanned event provided by gdb that I can connect to for
when the underlying code has changed and my caches need to be
invalidated?


> David> +    def __call__(self, gdbval):
> David> +        type_ = gdbval.type.unqualified()
> David> +        str_type_ = str(type_)
> 
> FWIW I think for RegexpCollectionPrettyPrinter you could write a
> subclass whose __call__ first dereferenced a pointer, then called
> super's __call__.  But your approach is just fine.

Thanks
Tom Tromey - Aug. 27, 2013, 4:06 p.m.
>>>>> "David" == David Malcolm <dmalcolm@redhat.com> writes:

David> Is there a precanned event provided by gdb that I can connect to for
David> when the underlying code has changed and my caches need to be
David> invalidated?

Maybe not :(

You could use the "exited" event as a decent approximation.

Also, and I think we're really getting into the obscure stuff here, if
you want to let users debug multiple versions of gcc at the same time,
then your caches have to be parameterized by the inferior.  I'm
reasonably confident that nobody actually does this.
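
For completeness, a sketch of what "parameterized by the inferior" could
look like.  PerInferiorCache is a hypothetical name, and current_key is
passed in so that the sketch runs outside gdb; inside gdb it could be
something like "lambda: gdb.selected_inferior().num".

```python
class PerInferiorCache(object):
    """Keep one lazily computed value per inferior."""

    def __init__(self, compute, current_key):
        self._compute = compute          # builds the value for the current inferior
        self._current_key = current_key  # returns a hashable id for the current inferior
        self._values = {}

    def get(self):
        key = self._current_key()
        if key not in self._values:
            self._values[key] = self._compute()
        return self._values[key]

    def drop(self, key):
        # Forget one inferior's value, e.g. after that inferior has exited.
        self._values.pop(key, None)


# Hypothetical use inside gdb (assumes `import gdb, gdb.types`):
#
#   tree_codes = PerInferiorCache(
#       lambda: gdb.types.make_enum_dict(gdb.lookup_type('enum tree_code')),
#       lambda: gdb.selected_inferior().num)
```

Each inferior gets its own enum dictionary, so two differently built cc1
binaries in one gdb session would not share stale values.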

Tom

Patch

commit 640380ed8aa8a36dc3b51c0e19815cfecd1f6027
Author: David Malcolm <dmalcolm@redhat.com>
Date:   Wed Aug 21 15:45:55 2013 -0400

    initial version of gdb hooks

diff --git a/gcc/gdb-hooks.py b/gcc/gdb-hooks.py
new file mode 100644
index 0000000..c167971
--- /dev/null
+++ b/gcc/gdb-hooks.py
@@ -0,0 +1,375 @@ 
+"""
+Installation
+------------
+gdb automatically looks for python files with a -gdb.py suffix relative
+to executables and DSOs.
+
+Currently you need to copy this file to e.g. "cc1-gdb.py":
+
+  cp ../../src/gcc/gdb-hooks.py cc1-gdb.py
+
+You may see a message from gdb of the form:
+  cc1-gdb.py auto-loading has been declined by your `auto-load safe-path'
+as a protection against untrustworthy python scripts.  See
+  http://sourceware.org/gdb/onlinedocs/gdb/Auto_002dloading-safe-path.html
+
+A workaround is to add:
+  -iex "add-auto-load-safe-path ."
+to the gdb invocation, to mark the current directory as trustworthy.
+
+During development, I've been manually invoking the code in this way, as a
+precanned way of printing a variety of different kinds of value:
+
+  cp ../../src/gcc/gdb-hooks.py ./cc1-gdb.py \
+     && gdb \
+          -iex "add-auto-load-safe-path ." \
+          -ex "break expand_gimple_stmt" \
+          -ex "run" \
+          -ex "bt" \
+          --args \
+            ./cc1 foo.c -O3
+
+Examples of output using the pretty-printers
+--------------------------------------------
+Pointer values are generally shown in the form:
+  <type address extra_info>
+
+For example, an opt_pass* might appear as:
+  (gdb) p pass
+  $2 = <opt_pass* 0x188b600 "expand"(170)>
+
+The name of the pass is given ("expand"), together with the
+static_pass_number.
+
+Note that you can dereference the pointer in the normal way:
+  (gdb) p *pass
+  $4 = {type = RTL_PASS, name = 0x120a312 "expand",
+  [etc, ...snipped...]
+
+and you can suppress pretty-printers using /r (for "raw"):
+  (gdb) p /r pass
+  $3 = (opt_pass *) 0x188b600
+
+Basic blocks are shown with their index in parentheses, apart from the
+CFG's entry and exit blocks, which are given as "ENTRY" and "EXIT":
+  (gdb) p bb
+  $9 = <basic_block 0x7ffff041f1a0 (2)>
+  (gdb) p cfun->cfg->x_entry_block_ptr
+  $10 = <basic_block 0x7ffff041f0d0 (ENTRY)>
+  (gdb) p cfun->cfg->x_exit_block_ptr
+  $11 = <basic_block 0x7ffff041f138 (EXIT)>
+
+CFG edges are shown with the src and dest blocks given in parentheses:
+  (gdb) p e
+  $1 = <edge 0x7ffff043f118 (ENTRY -> 6)>
+
+Tree nodes are printed using Python code that emulates print_node_brief,
+running in gdb, rather than in the inferior:
+  (gdb) p cfun->decl
+  $1 = <function_decl 0x7ffff0420b00 foo>
+For usability, the type is printed first (e.g. "function_decl"), rather
+than just "tree".
+
+RTL expressions use a kludge: they are pretty-printed by injecting
+calls to print-rtl.c functions into the inferior:
+  Value returned is $1 = (note 9 8 10 [bb 3] NOTE_INSN_BASIC_BLOCK)
+  (gdb) p $1
+  $2 = (note 9 8 10 [bb 3] NOTE_INSN_BASIC_BLOCK)
+  (gdb) p /r $1
+  $3 = (rtx_def *) 0x7ffff043e140
+This won't work for core dumps, and probably fails in other circumstances,
+but it's a quick way of getting lots of debuggability.
+
+Callgraph nodes are printed with the name of the function decl, if
+available:
+  (gdb) frame 5
+  #5  0x00000000006c288a in expand_function (node=<cgraph_node* 0x7ffff0312720 "foo">) at ../../src/gcc/cgraphunit.c:1594
+  1594	  execute_pass_list (g->get_passes ()->all_passes);
+  (gdb) p node
+  $1 = <cgraph_node* 0x7ffff0312720 "foo">
+"""
+import re
+
+import gdb
+import gdb.printing
+import gdb.types
+
+# Convert "enum tree_code" (tree.def and tree.h) to a dict:
+tree_code_dict = gdb.types.make_enum_dict(gdb.lookup_type('enum tree_code'))
+
+# ...and look up specific values for use later:
+IDENTIFIER_NODE = tree_code_dict['IDENTIFIER_NODE']
+TYPE_DECL = tree_code_dict['TYPE_DECL']
+
+# Similarly for "enum tree_code_class" (tree.h):
+tree_code_class_dict = gdb.types.make_enum_dict(gdb.lookup_type('enum tree_code_class'))
+tcc_type = tree_code_class_dict['tcc_type']
+tcc_declaration = tree_code_class_dict['tcc_declaration']
+
+class Tree:
+    """
+    Wrapper around a gdb.Value for a tree, with various methods
+    corresponding to macros in gcc/tree.h
+    """
+    def __init__(self, gdbval):
+        self.gdbval = gdbval
+
+    def is_nonnull(self):
+        return long(self.gdbval) != 0
+
+    def TREE_CODE(self):
+        """
+        Get gdb.Value corresponding to TREE_CODE (self)
+        as per:
+          #define TREE_CODE(NODE) ((enum tree_code) (NODE)->base.code)
+        """
+        return self.gdbval['base']['code']
+
+    def DECL_NAME(self):
+        """
+        Get Tree instance corresponding to DECL_NAME (self)
+        """
+        return Tree(self.gdbval['decl_minimal']['name'])
+
+    def TYPE_NAME(self):
+        """
+        Get Tree instance corresponding to result of TYPE_NAME (self)
+        """
+        return Tree(self.gdbval['type_common']['name'])
+
+    def IDENTIFIER_POINTER(self):
+        """
+        Get str corresponding to result of IDENTIFIER_POINTER (self)
+        """
+        return self.gdbval['identifier']['id']['str'].string()
+
+class TreePrinter:
+    "Prints a tree"
+
+    def __init__ (self, gdbval):
+        self.gdbval = gdbval
+        self.node = Tree(gdbval)
+
+    def to_string (self):
+        # like gcc/print-tree.c:print_node_brief
+        # #define TREE_CODE(NODE) ((enum tree_code) (NODE)->base.code)
+        # tree_code_name[(int) TREE_CODE (node)])
+        if long(self.gdbval) == 0:
+            return '<tree 0x0>'
+
+        val_TREE_CODE = self.node.TREE_CODE()
+
+        # extern const enum tree_code_class tree_code_type[];
+        # #define TREE_CODE_CLASS(CODE)	tree_code_type[(int) (CODE)]
+
+        val_tree_code_type = gdb.parse_and_eval('tree_code_type')
+        val_tclass = val_tree_code_type[val_TREE_CODE]
+
+        val_tree_code_name = gdb.parse_and_eval('tree_code_name')
+        val_code_name = val_tree_code_name[long(val_TREE_CODE)]
+        #print val_code_name.string()
+
+        result = '<%s 0x%x' % (val_code_name.string(), long(self.gdbval))
+        if long(val_tclass) == tcc_declaration:
+            tree_DECL_NAME = self.node.DECL_NAME()
+            if tree_DECL_NAME.is_nonnull():
+                result += ' %s' % tree_DECL_NAME.IDENTIFIER_POINTER()
+            else:
+                pass # TODO: labels etc
+        elif long(val_tclass) == tcc_type:
+            tree_TYPE_NAME = self.node.TYPE_NAME()
+            if tree_TYPE_NAME.is_nonnull():
+                if tree_TYPE_NAME.TREE_CODE() == IDENTIFIER_NODE:
+                    result += ' %s' % tree_TYPE_NAME.IDENTIFIER_POINTER()
+                elif tree_TYPE_NAME.TREE_CODE() == TYPE_DECL:
+                    if tree_TYPE_NAME.DECL_NAME().is_nonnull():
+                        result += ' %s' % tree_TYPE_NAME.DECL_NAME().IDENTIFIER_POINTER()
+        if self.node.TREE_CODE() == IDENTIFIER_NODE:
+            result += ' %s' % self.node.IDENTIFIER_POINTER()
+        # etc
+        result += '>'
+        return result
+
+######################################################################
+# Callgraph pretty-printers
+######################################################################
+
+class CGraphNodePrinter:
+    def __init__(self, gdbval):
+        self.gdbval = gdbval
+
+    def to_string (self):
+        result = '<cgraph_node* 0x%x' % long(self.gdbval)
+        if long(self.gdbval):
+            # symtab_node_name calls lang_hooks.decl_printable_name
+            # default implementation (lhd_decl_printable_name) is:
+            #    return IDENTIFIER_POINTER (DECL_NAME (decl));
+            symbol = self.gdbval['symbol']
+            tree_decl = Tree(symbol['decl'])
+            result += ' "%s"' % tree_decl.DECL_NAME().IDENTIFIER_POINTER()
+        result += '>'
+        return result
+
+######################################################################
+
+class GimplePrinter:
+    def __init__(self, gdbval):
+        self.gdbval = gdbval
+
+    def to_string (self):
+        if long(self.gdbval) == 0:
+            return '<gimple 0x0>'
+        val_gimple_code = self.gdbval['gsbase']['code']
+        val_gimple_code_name = gdb.parse_and_eval('gimple_code_name')
+        val_code_name = val_gimple_code_name[long(val_gimple_code)]
+        result = '<%s 0x%x' % (val_code_name.string(),
+                               long(self.gdbval))
+        result += '>'
+        return result
+
+######################################################################
+# CFG pretty-printers
+######################################################################
+
+def bb_index_to_str(index):
+    if index == 0:
+        return 'ENTRY'
+    elif index == 1:
+        return 'EXIT'
+    else:
+        return '%i' % index
+
+class BasicBlockPrinter:
+    def __init__(self, gdbval):
+        self.gdbval = gdbval
+
+    def to_string (self):
+        result = '<basic_block 0x%x' % long(self.gdbval)
+        if long(self.gdbval):
+            result += ' (%s)' % bb_index_to_str(long(self.gdbval['index']))
+        result += '>'
+        return result
+
+class CfgEdgePrinter:
+    def __init__(self, gdbval):
+        self.gdbval = gdbval
+
+    def to_string (self):
+        result = '<edge 0x%x' % long(self.gdbval)
+        if long(self.gdbval):
+            src = bb_index_to_str(long(self.gdbval['src']['index']))
+            dest = bb_index_to_str(long(self.gdbval['dest']['index']))
+            result += ' (%s -> %s)' % (src, dest)
+        result += '>'
+        return result
+
+######################################################################
+
+class Rtx:
+    def __init__(self, gdbval):
+        self.gdbval = gdbval
+
+    def GET_CODE(self):
+        return self.gdbval['code']
+
+def GET_RTX_LENGTH(code):
+    val_rtx_length = gdb.parse_and_eval('rtx_length')
+    return long(val_rtx_length[code])
+
+def GET_RTX_NAME(code):
+    val_rtx_name = gdb.parse_and_eval('rtx_name')
+    return val_rtx_name[code].string()
+
+def GET_RTX_FORMAT(code):
+    val_rtx_format = gdb.parse_and_eval('rtx_format')
+    return val_rtx_format[code].string()
+
+class RtxPrinter:
+    def __init__(self, gdbval):
+        self.gdbval = gdbval
+        self.rtx = Rtx(gdbval)
+
+    def to_string (self):
+        """
+        For now, a cheap kludge: invoke the inferior's print
+        function to show the value to the user, and return an empty
+        string for gdb
+        """
+        # We use print_inline_rtx to avoid a trailing newline
+        gdb.execute('call print_inline_rtx (stderr, (const_rtx) %s, 0)'
+                    % long(self.gdbval))
+        return ''
+
+        # or by hand; based on gcc/print-rtl.c:print_rtx
+        result = ('<rtx_def 0x%x'
+                  % (long(self.gdbval)))
+        code = self.rtx.GET_CODE()
+        result += ' (%s' % GET_RTX_NAME(code)
+        format_ = GET_RTX_FORMAT(code)
+        for i in range(GET_RTX_LENGTH(code)):
+            print(format_[i])
+        result += ')>'
+        return result
+
+######################################################################
+
+class PassPrinter:
+    def __init__(self, gdbval):
+        self.gdbval = gdbval
+
+    def to_string (self):
+        result = '<opt_pass* 0x%x' % long(self.gdbval)
+        if long(self.gdbval):
+            result += (' "%s"(%i)'
+                       % (self.gdbval['name'].string(),
+                          long(self.gdbval['static_pass_number'])))
+        result += '>'
+        return result
+
+######################################################################
+
+# TODO:
+#   * vec
+#   * hashtab
+#   * location_t
+
+class GdbSubprinter(gdb.printing.SubPrettyPrinter):
+    def __init__(self, name, str_type_, class_):
+        super(GdbSubprinter, self).__init__(name)
+        self.str_type_ = str_type_
+        self.class_ = class_
+
+class GdbPrettyPrinters(gdb.printing.PrettyPrinter):
+    def __init__(self, name):
+        super(GdbPrettyPrinters, self).__init__(name, [])
+
+    def add_printer(self, name, exp, class_):
+        self.subprinters.append(GdbSubprinter(name, exp, class_))
+
+    def __call__(self, gdbval):
+        type_ = gdbval.type.unqualified()
+        str_type_ = str(type_)
+        for printer in self.subprinters:
+            if printer.enabled and str_type_ == printer.str_type_:
+                return printer.class_(gdbval)
+
+        # Couldn't find a pretty printer (or it was disabled):
+        return None
+
+
+def build_pretty_printer():
+    pp = GdbPrettyPrinters('gcc')
+    pp.add_printer('tree', 'tree', TreePrinter)
+    pp.add_printer('cgraph_node', 'cgraph_node *', CGraphNodePrinter)
+    pp.add_printer('gimple', 'gimple', GimplePrinter)
+    pp.add_printer('basic_block', 'basic_block', BasicBlockPrinter)
+    pp.add_printer('edge', 'edge', CfgEdgePrinter)
+    pp.add_printer('rtx_def', 'rtx_def *', RtxPrinter)
+    pp.add_printer('opt_pass', 'opt_pass *', PassPrinter)
+    return pp
+
+gdb.printing.register_pretty_printer(
+    gdb.current_objfile(),
+    build_pretty_printer())
+
+print('Successfully loaded GDB hooks for GCC')