Patchwork Support multiline GTY markers in Doxygen filter; add unittests

Submitter David Malcolm
Date Oct. 31, 2013, 8:06 p.m.
Message ID <1383249987-14379-1-git-send-email-dmalcolm@redhat.com>
Permalink /patch/287598/
State New

Comments

David Malcolm - Oct. 31, 2013, 8:06 p.m.
It's possible to run GCC's sources through Doxygen by setting
	INPUT_FILTER           = contrib/filter_gcc_for_doxygen
within contrib/gcc.doxy and invoking doxygen on the latter file.

The script filters out various preprocessor constructs from GCC sources
before Doxygen tries to parse them.

However, as written, it works one line at a time, and thus can't cope
with GTY markers that span multiple lines.  I added such a marker
in r204170 (symtab_node_base), and I'd like to add more such markers
(see http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02703.html).

So the attached patch rewrites the filtering script to cope with multiline
GTY markers.
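The difference can be sketched with a minimal Python fragment; the marker below is a shortened, hypothetical version of the symtab_node_base one, not code from the patch:

```python
import re

# Shortened, hypothetical version of the symtab_node_base GTY marker.
src = ('class GTY((desc ("%h.type"), tag ("SYMTAB_SYMBOL"),\n'
       '           chain_next ("%h.next")))\n'
       '  symtab_node_base\n'
       '{\n')

# A line-at-a-time substitution never sees the closing "))",
# so the marker survives untouched.
per_line = ''.join(re.sub(r'GTY\s*\(\(.*?\)\)\s*', '', line)
                   for line in src.splitlines(True))
assert 'GTY' in per_line

# Matching the whole buffer with DOTALL lets '.' cross newlines,
# which is the approach the rewritten script takes.
whole = re.sub(r'GTY\s*\(\(.*?\)\)\s+', '', src, flags=re.DOTALL)
assert whole == 'class symtab_node_base\n{\n'
```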

I'm not familiar with Perl, so I rewrote the script in Python.

I expanded the regexes for readability, and added a unit-testing suite.
I also tweaked how comments get laid out with @verbatim
to avoid inconsistent indentation between the first line and subsequent
lines within a comment.

Tested with Python 2.7; you can see a sample of the output (from my
gimple-as-C++-inheritance working copy) at:
http://dmalcolm.fedorapeople.org/gcc/2013-10-31/doxygen/html/structgimple__statement__base.html

In particular, with this patch Doxygen is able to cope with the symtab
GTY marker, and thus parse the symtab class hierarchy, giving the output
viewable here:
http://dmalcolm.fedorapeople.org/gcc/2013-10-31/doxygen/html/classsymtab__node__base.html

OK for trunk?

contrib/
	* filter_gcc_for_doxygen: Use filter_params.py rather than
	filter_params.pl.
	* filter_params.pl: Delete in favor of...
	* filter_params.py: New, porting the perl script to python in
	order to cope with GTY markers that span multiple lines,
	tweaking the layout of comments, and adding a test suite.
---
 contrib/filter_gcc_for_doxygen |   2 +-
 contrib/filter_params.pl       |  14 ---
 contrib/filter_params.py       | 191 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 192 insertions(+), 15 deletions(-)
 delete mode 100755 contrib/filter_params.pl
 create mode 100644 contrib/filter_params.py
Diego Novillo - Nov. 1, 2013, 3:16 p.m.
On Thu, Oct 31, 2013 at 4:06 PM, David Malcolm <dmalcolm@redhat.com> wrote:
> It's possible to run GCC's sources through Doxygen by setting
>         INPUT_FILTER           = contrib/filter_gcc_for_doxygen
> within contrib/gcc.doxy and invoking doxygen on the latter file.
>
> The script filters out various preprocessor constructs from GCC sources
> before Doxygen tries to parse them.
>
> However, as written, it works one line at a time, and thus can't cope
> with GTY markers that span multiple lines.  I added such a marker
> in r204170 (symtab_node_base), and I'd like to add more such markers
> (see http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02703.html).
>
> So the attached patch rewrites the filtering script to cope with multiline
> GTY markers.
>
> I'm not familiar with Perl, so I rewrote the script in Python.
>
> I expanded the regexes for readability, and added a unit-testing suite.
> I also tweaked how comments get laid out with @verbatim
> to avoid inconsistent indentation between the first line and subsequent
> lines within a comment.
>
> Tested with Python 2.7; you can see a sample of the output (from my
> gimple-as-C++-inheritance working copy) at:
> http://dmalcolm.fedorapeople.org/gcc/2013-10-31/doxygen/html/structgimple__statement__base.html
>
> In particular, with this patch Doxygen is able to cope with the symtab
> GTY marker, and thus parse the symtab class hierarchy, giving the output
> viewable here:
> http://dmalcolm.fedorapeople.org/gcc/2013-10-31/doxygen/html/classsymtab__node__base.html
>
> OK for trunk?
>
> contrib/
>         * filter_gcc_for_doxygen: Use filter_params.py rather than
>         filter_params.pl.
>         * filter_params.pl: Delete in favor of...
>         * filter_params.py: New, porting the perl script to python in
>         order to cope with GTY markers that span multiple lines,
>         tweaking the layout of comments, and adding a test suite.
> ---
>  contrib/filter_gcc_for_doxygen |   2 +-
>  contrib/filter_params.pl       |  14 ---
>  contrib/filter_params.py       | 191 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 192 insertions(+), 15 deletions(-)
>  delete mode 100755 contrib/filter_params.pl
>  create mode 100644 contrib/filter_params.py
>
> diff --git a/contrib/filter_gcc_for_doxygen b/contrib/filter_gcc_for_doxygen
> index 3787eeb..ca1db31 100755
> --- a/contrib/filter_gcc_for_doxygen
> +++ b/contrib/filter_gcc_for_doxygen
> @@ -8,5 +8,5 @@
>  # process is put on stdout.
>
>  dir=`dirname $0`
> -perl $dir/filter_params.pl < $1 | perl $dir/filter_knr2ansi.pl
> +python $dir/filter_params.py $1 | perl $dir/filter_knr2ansi.pl
>  exit 0
> diff --git a/contrib/filter_params.pl b/contrib/filter_params.pl
> deleted file mode 100755
> index 22dae6c..0000000
> --- a/contrib/filter_params.pl
> +++ /dev/null
> @@ -1,14 +0,0 @@
> -#!/usr/bin/perl
> -
> -# Filters out some of the #defines used throughout the GCC sources:
> -# - GTY(()) marks declarations for gengtype.c
> -# - PARAMS(()) is used for K&R compatibility. See ansidecl.h.
> -
> -while (<>) {
> -    s/^\/\* /\/\*\* \@verbatim /;
> -    s/\*\// \@endverbatim \*\//;
> -    s/GTY[ \t]*\(\(.*\)\)//g;
> -    s/[ \t]ATTRIBUTE_UNUSED//g;
> -    s/PARAMS[ \t]*\(\((.*?)\)\)/\($1\)/sg;
> -    print;
> -}
> diff --git a/contrib/filter_params.py b/contrib/filter_params.py
> new file mode 100644
> index 0000000..d5092f6
> --- /dev/null
> +++ b/contrib/filter_params.py
> @@ -0,0 +1,191 @@
> +#!/usr/bin/python
> +"""
> +Filters out some of the #defines used throughout the GCC sources:
> +- GTY(()) marks declarations for gengtype.c
> +- PARAMS(()) is used for K&R compatibility. See ansidecl.h.
> +
> +When passed one or more filenames, acts on those files and prints the
> +results to stdout.
> +
> +When run without a filename, runs a unit-testing suite.
> +"""
> +import re
> +import sys
> +import unittest
> +
> +# Optional whitespace
> +opt_ws = '\s*'
> +
> +def filter_src(text):
> +    """
> +    str -> str.  We operate on the whole of the source file at once
> +    (rather than individual lines) so that we can have multiline
> +    regexes.
> +    """
> +
> +    # First, convert tabs to spaces so that we can line things up
> +    # sanely.
> +    text = text.expandtabs(8)

Does it really matter?  I thought doxygen reformatted output anyway.

> +
> +    # Convert C comments from GNU coding convention of:
> +    #    /* FIRST_LINE
> +    #       NEXT_LINE
> +    #       FINAL_LINE.  */
> +    # to:
> +    #    /** @verbatim
> +    #       FIRST_LINE
> +    #       NEXT_LINE
> +    #       FINAL_LINE.  @endverbatim */
> +    # so that doxygen will parse them, and so the first line
> +    # lines up correctly with subsequent lines.
> +    # Single-line comments turn from:
> +    #    /* TEXT.  */
> +    # into:
> +    #    /** @verbatim
> +    #        TEXT.  @endverbatim */

Oh, for @verbatim.  But why @verbatim?  One trick we could do here is
recognize ALL CAPS parameters and turn them into \p PARAM. Later on,
we just emit \param.  Though I guess it would not be easy to pick up
the description. Anyway, we can think about this for later.

At the same time, I would like to encourage the use of doxygen markers
in the code. If we start adding all these @verbatim markers, those
markers will be canceled out. In fact, I would like to set up some
expectations for doxygen markup in the coding guidelines themselves.
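A rough sketch of that ALL-CAPS idea; the word pattern and the two-capital minimum below are guesses for illustration, not anything in the patch:

```python
import re

def mark_params(comment):
    # Rewrite ALL-CAPS words (the GNU convention for referring to
    # parameters in comments) as doxygen \p references.  Requiring at
    # least two capitals avoids catching ordinary words like "A".
    return re.sub(r'\b([A-Z][A-Z_]+)\b', r'\\p \1', comment)

print(mark_params('/* Returns the string representing CLASS.  */'))
# -> /* Returns the string representing \p CLASS.  */
```

As Diego notes, attaching a description to emit a full \param would be the harder part.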


> +
> +    text = re.sub(r'^([ \t]*)/\*+ ',
> +                  (r'\1/** @verbatim\n'
> +                   r'\1   '),
> +                  text,
> +                  flags=re.MULTILINE)
> +    text = re.sub(r'\*+/',
> +                  r'@endverbatim */',
> +                  text)
> +
> +    # Remove GTY markings (potentially multiline ones):
> +    text = re.sub('GTY' + opt_ws + r'\(\(.*?\)\)\s+',
> +                  '',
> +                  text,
> +                  flags=(re.MULTILINE|re.DOTALL))
> +
> +    # Strip out 'ATTRIBUTE_UNUSED'
> +    text = re.sub('\sATTRIBUTE_UNUSED',
> +                  '',
> +                  text)
> +
> +    # PARAMS(()) is used for K&R compatibility. See ansidecl.h.
> +    text = re.sub('PARAMS' + opt_ws + r'\(\((.*?)\)\)',
> +                  r'(\1)',
> +                  text)
> +
> +    return text
> +
> +class FilteringTests(unittest.TestCase):
> +    '''
> +    Unit tests for filter_src.
> +    '''
> +    def assert_filters_to(self, src_input, expected_result):
> +        # assertMultiLineEqual was added to unittest in 2.7/3.1
> +        if hasattr(self, 'assertMultiLineEqual'):
> +            assertion = self.assertMultiLineEqual
> +        else:
> +            assertion = self.assertEqual
> +        assertion(expected_result, filter_src(src_input))
> +
> +    def test_comment_example(self):
> +        self.assert_filters_to(
> +            ('    /* FIRST_LINE\n'
> +             '       NEXT_LINE\n'
> +             '       FINAL_LINE.  */\n'),
> +            ('    /** @verbatim\n'
> +             '       FIRST_LINE\n'
> +             '       NEXT_LINE\n'
> +             '       FINAL_LINE.  @endverbatim */\n'))
> +
> +    def test_oneliner_comment(self):
> +        self.assert_filters_to(
> +            '/* Returns the string representing CLASS.  */\n',
> +            ('/** @verbatim\n'
> +             '   Returns the string representing CLASS.  @endverbatim */\n'))
> +
> +    def test_multiline_comment_without_initial_indent(self):
> +        self.assert_filters_to(
> +            ('/* The thread-local storage model associated with a given VAR_DECL\n'
> +             "   or SYMBOL_REF.  This isn't used much, but both trees and RTL refer\n"
> +             "   to it, so it's here.  */\n"),
> +            ('/** @verbatim\n'
> +             '   The thread-local storage model associated with a given VAR_DECL\n'
> +             "   or SYMBOL_REF.  This isn't used much, but both trees and RTL refer\n"
> +             "   to it, so it's here.  @endverbatim */\n"))
> +
> +    def test_multiline_comment_with_initial_indent(self):
> +         self.assert_filters_to(
> +            ('  /* Set once the definition was analyzed.  The list of references and\n'
> +             '     other properties are built during analysis.  */\n'),
> +            ('  /** @verbatim\n'
> +             '     Set once the definition was analyzed.  The list of references and\n'
> +             '     other properties are built during analysis.  @endverbatim */\n'))
> +
> +    def test_multiline_comment_with_tabs(self):
> +        # Ensure that multiline comments containing tabs look sane, in
> +        # this case, one with initial indentation.
> +        self.assert_filters_to(
> +            ('  /* All the anchor SYMBOL_REFs used to address these objects, sorted\n'
> +             '     in order of increasing offset, and then increasing TLS model.\n'
> +             '     The following conditions will hold for each element X in this vector:\n'
> +             '\n'
> +             '\t SYMBOL_REF_HAS_BLOCK_INFO_P (X)\n'
> +             '\t SYMBOL_REF_ANCHOR_P (X)\n'
> +             '\t SYMBOL_REF_BLOCK (X) == [address of this structure]\n'
> +             '\t SYMBOL_REF_BLOCK_OFFSET (X) >= 0.  */\n'
> +             '  vec<rtx, va_gc> *anchors;\n'),
> +            ('  /** @verbatim\n'
> +             '     All the anchor SYMBOL_REFs used to address these objects, sorted\n'
> +             '     in order of increasing offset, and then increasing TLS model.\n'
> +             '     The following conditions will hold for each element X in this vector:\n'
> +             '\n'
> +             '         SYMBOL_REF_HAS_BLOCK_INFO_P (X)\n'
> +             '         SYMBOL_REF_ANCHOR_P (X)\n'
> +             '         SYMBOL_REF_BLOCK (X) == [address of this structure]\n'
> +             '         SYMBOL_REF_BLOCK_OFFSET (X) >= 0.  @endverbatim */\n'
> +             '  vec<rtx, va_gc> *anchors;\n'))
> +
> +    def test_simple_GTY(self):
> +        # Ensure that a simple GTY is filtered out.
> +        self.assert_filters_to(
> +            ('typedef struct GTY(()) alias_pair {\n'
> +             '  tree decl;\n'
> +             '  tree target;\n'
> +             '} alias_pair;\n'),
> +            ('typedef struct alias_pair {\n'
> +             '  tree decl;\n'
> +             '  tree target;\n'
> +             '} alias_pair;\n'))
> +
> +    def test_multiline_GTY(self):
> +        # Ensure that a multiline GTY is filtered out.
> +        self.assert_filters_to(
> +            ('class GTY((desc ("%h.type"), tag ("SYMTAB_SYMBOL"),\n'
> +             '\t   chain_next ("%h.next"), chain_prev ("%h.previous")))\n'
> +             '  symtab_node_base\n'
> +             '{\n'),
> +            ('class symtab_node_base\n'
> +             '{\n'))
> +
> +    def test_ATTRIBUTE_UNUSED(self):
> +        # Ensure that ATTRIBUTE_UNUSED is filtered out.
> +        self.assert_filters_to(
> +            ('static void\n'
> +             'record_set (rtx dest, const_rtx set, void *data ATTRIBUTE_UNUSED)\n'
> +             '{\n'),
> +            ('static void\n'
> +             'record_set (rtx dest, const_rtx set, void *data)\n'
> +             '{\n'))
> +
> +    def test_PARAMS(self):
> +        self.assert_filters_to(
> +            'char *strcpy PARAMS ((char *dest, char *source));\n',
> +            'char *strcpy (char *dest, char *source);\n')
> +
> +def act_on_files(argv):
> +    for filename in argv[1:]:
> +        with open(filename) as f:
> +            text = f.read()
> +            print(filter_src(text))
> +
> +if __name__ == '__main__':
> +    if len(sys.argv) > 1:
> +        act_on_files(sys.argv)

> +    else:
> +        unittest.main()

TESTS!  Thanks.  This is so very refreshing :)

Other than the @verbatim markers, the rest looks fine.


Diego.
