diff mbox

[committed] libcpp: add callback for comment-handling

Message ID 1496698274-14253-1-git-send-email-dmalcolm@redhat.com
State New
Headers show

Commit Message

David Malcolm June 5, 2017, 9:31 p.m. UTC
On Fri, 2017-05-12 at 12:47 -0600, Jeff Law wrote:
> On 05/02/2017 01:08 PM, David Malcolm wrote:
> > Currently the C/C++ frontends discard comments when parsing.
> > It's possible to set up libcpp to capture comments as tokens,
> > by setting CPP_OPTION (pfile, discard_comments) to false),
> > and this can be enabled using the -C command line option (see
> > also -CC), but c-family/c-lex.c then discards any CPP_COMMENT
> > tokens it sees, so they're not seen by the frontend parser.
> > 
> > The following patch adds an (optional) callback to libcpp
> > for handling comments, giving the comment content, and the
> > location it was seen at.  This approach allows arbitrary
> > logic to be wired up to comments, and avoids having to
> > copy the comment content to a new buffer (which the CPP_COMMENT
> > approach does).
> > 
> > This could be used by plugins to chain up on the callback
> > e.g. to parse specially-formatted comments, e.g. for
> > documentation generation, or e.g. for GObject introspection
> > annotations [1].
> > 
> > As a proof of concept, the patch uses this to add a spellchecker
> > for comments.  It uses the Enchant meta-library:
> >     https://abiword.github.io/enchant/
> > (essentially a wrapper around 8 different spellchecking libraries).
> > I didn't bother with the autotool detection for enchant, and
> > just hacked it in for now.
> > 
> > Example output:
> > 
> > test.c:3:46: warning: spellcheck_word: "evaulate"
> >      When NONCONST_PRED is false the code will evaulate to constant
> > and
> >                                                ^~~~~~~~
> > test.c:3:46: note: suggestion: "evaluate"
> >      When NONCONST_PRED is false the code will evaulate to constant
> > and
> >                                                ^~~~~~~~
> >                                                evaluate
> > test.c:3:46: note: suggestion: "ululate"
> >      When NONCONST_PRED is false the code will evaulate to constant
> > and
> >                                                ^~~~~~~~
> >                                                ululate
> > test.c:3:46: note: suggestion: "elevate"
> >      When NONCONST_PRED is false the code will evaulate to constant
> > and
> >                                                ^~~~~~~~
> >                                                elevate
> > 
> > License-wise, Enchant is LGPL 2.1 "or (at your option) any
> > later version." with a special exception to allow non-LGPL
> > spellchecking providers (e.g. to allow linking against an
> > OS-provided spellchecker).
> > 
> > Various FIXMEs are present (e.g. hardcoded "en_US" for the
> > language to spellcheck against).
> > Also, the spellchecker has a lot of false positives e.g.
> > it doesn't grok URLs (and thus complains when it seens them);
> > similar for DejaGnu directives etc.
> > 
> > Does enchant seem like a reasonable dependency for the compiler?
> > (it pulls in libpthread.so.0, libglib-2.0.so.0, libgmodule
> > -2.0.so.0).
> > Or would this be better pursued as a plugin?  (if so, I'd
> > prefer the plugin to live in the source tree as an example,
> > rather than out-of-tree).
> > 
> > Unrelated to spellchecking, I also added two new options:
> > -Wfixme and -Wtodo, for warning when comments containing
> > "FIXME" or "TODO" are encountered.
> > I use such comments a lot during development.  I thought some
> > people might want a warning about them (I tend to just use grep
> > though).  [TODO: document these in invoke.texi, add test cases]
> > 
> > Thoughts?  Does any of this sound useful?
> > 
> > [not yet bootstrapped; as noted above, I haven't yet done
> > the autoconf stuff for handling Enchant]
> > 
> > [1] 
> > https://wiki.gnome.org/Projects/GObjectIntrospection/Annotations
> > 
> > gcc/ChangeLog:
> > 	* Makefile.in (LIBS): Hack in -lenchant for now.
> > 	(OBJS): Add spellcheck-enchant.o.
> > 	* common.opt (Wfixme): New option.
> > 	(Wtodo): New option.
> > 	* spellcheck-enchant.c: New file.
> > 	* spellcheck-enchant.h: New file.
> > 
> > gcc/c-family/ChangeLog:
> > 	* c-lex.c: Include spellcheck-enchant.h.
> > 	(init_c_lex): Wire up spellcheck_enchant_check_comment to the
> > 	comment callback.
> > 	* c-opts.c: Include spellcheck-enchant.h.
> > 	(c_common_post_options): Call spellcheck_enchant_init.
> > 	(c_common_finish): Call spellcheck_enchant_finish.
> > 
> > libcpp/ChangeLog:
> > 	* include/cpplib.h (struct cpp_callbacks): Add "comment"
> > 	callback.
> > 	* lex.c (_cpp_lex_direct): Call the comment callback if non
> > -NULL.
> enchant seems a bit out of the sweet spot, particular just to catch
> mis-spellings in comments.  But it might make an interesting plugin.
> 
> IIRC from our meeting earlier this week, you had another use case
> that
> might have been more compelling, but I can't remember what it was.

(FWIW I think we were chatting about the GObjectIntrospection annotations
use-case)

> I do like the ability to at least capture the comments better and
> while
> we don't have a strong need for that capability now, we might in the
> future.
> 
> Jeff

I stripped out all of the spellchecking stuff, and restricted the
scope of the patch to just adding a callback for plugins to hook into
if they're interested in handling comments.  I added an example of such
a plugin in the testsuite.

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.

Committed to trunk as r248901.

gcc/testsuite/ChangeLog:
	* g++.dg/plugin/comment_plugin.c: New test plugin.
	* g++.dg/plugin/comments-1.C: New test file.
	* g++.dg/plugin/plugin.exp (plugin_test_list): Add the above.

libcpp/ChangeLog:
	* include/cpplib.h (struct cpp_callbacks): Add "comment"
	callback.
	* lex.c (_cpp_lex_direct): Call the comment callback if non-NULL.
---
 gcc/testsuite/g++.dg/plugin/comment_plugin.c | 63 ++++++++++++++++++++++++++++
 gcc/testsuite/g++.dg/plugin/comments-1.C     | 49 ++++++++++++++++++++++
 gcc/testsuite/g++.dg/plugin/plugin.exp       |  1 +
 libcpp/include/cpplib.h                      |  9 ++++
 libcpp/lex.c                                 |  7 ++++
 5 files changed, 129 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/plugin/comment_plugin.c
 create mode 100644 gcc/testsuite/g++.dg/plugin/comments-1.C
diff mbox

Patch

diff --git a/gcc/testsuite/g++.dg/plugin/comment_plugin.c b/gcc/testsuite/g++.dg/plugin/comment_plugin.c
new file mode 100644
index 0000000..c3b08e3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/comment_plugin.c
@@ -0,0 +1,63 @@ 
+/* Test of cpp_callbacks::comments.  */
+
+#include "gcc-plugin.h"
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "cpplib.h"
+#include "diagnostic.h"
+#include "c-family/c-pragma.h"
+
+int plugin_is_GPL_compatible;
+
+/* Test callback for cpp_callbacks::comments.  */
+
+void
+my_comment_cb (cpp_reader *, source_location loc,
+	       const unsigned char *content, size_t len)
+{
+  if (in_system_header_at (loc))
+    return;
+
+  /* CONTENT contains the opening slash-star (or slash-slash),
+     and for C-style comments contains the closing star-slash.  */
+  gcc_assert (len >= 2);
+  gcc_assert (content[0] == '/');
+  gcc_assert (content[1] == '*' || content[1] == '/');
+  bool c_style = (content[1] == '*');
+  if (c_style)
+    {
+      gcc_assert (content[len - 2] == '*');
+      gcc_assert (content[len - 1] == '/');
+    }
+
+  if (c_style)
+    inform (loc, "got C-style comment; length=%i", len);
+  else
+    inform (loc, "got C++-style comment; length=%i", len);
+
+  /* Print the content of the comment.
+     For a C-style comment, the buffer CONTENT contains the opening
+     slash-star and closing star-slash, so we can't directly verify
+     it in the DejaGnu test without adding another comment, which
+     would trigger this callback again.
+     Hence we skip the syntactically-significant parts of the comment
+     when printing it.  */
+  fprintf (stderr, "stripped content of comment: >");
+  /* Avoid printing trailing star-slash.  */
+  if (c_style)
+    len -= 2;
+  for (size_t i = 2; i < len; i++)
+    fputc (content[i], stderr);
+  fprintf (stderr, "<\n");
+}
+
+int
+plugin_init (struct plugin_name_args *plugin_info,
+	     struct plugin_gcc_version *version)
+{
+  cpp_callbacks *cb = cpp_get_callbacks (parse_in);
+  cb->comment = my_comment_cb;
+
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/plugin/comments-1.C b/gcc/testsuite/g++.dg/plugin/comments-1.C
new file mode 100644
index 0000000..0821b14
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/comments-1.C
@@ -0,0 +1,49 @@ 
+/* Example of a one-line C-style comment.  */
+#if 0
+{ dg-message "1: got C-style comment; length=45" "" { target *-*-* } .-2 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: > Example of a one-line C-style comment.  <
+{ dg-end-multiline-output "" }
+#endif
+
+     /*Another example of a one-line C-style comment.*/
+#if 0
+{ dg-message "6: got C-style comment; length=50" "" { target *-*-* } .-2 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: >Another example of a one-line C-style comment.<
+{ dg-end-multiline-output "" }
+#endif
+
+/**/
+#if 0
+{ dg-message "1: got C-style comment; length=4" "" { target *-*-* } .-2 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: ><
+{ dg-end-multiline-output "" }
+#endif
+
+/* Example of a
+   multi-line C-style comment.  */
+#if 0
+{ dg-message "1: got C-style comment; length=50" "" { target *-*-* } .-3 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: > Example of a
+   multi-line C-style comment.  <
+{ dg-end-multiline-output "" }
+#endif
+
+// Example of a C++-style comment
+#if 0
+{ dg-message "1: got C\\+\\+-style comment; length=33" "" { target *-*-* } .-2 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: > Example of a C++-style comment<
+{ dg-end-multiline-output "" }
+#endif
+
+//
+#if 0
+{ dg-message "1: got C\\+\\+-style comment; length=2" "" { target *-*-* } .-2 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: ><
+{ dg-end-multiline-output "" }
+#endif
diff --git a/gcc/testsuite/g++.dg/plugin/plugin.exp b/gcc/testsuite/g++.dg/plugin/plugin.exp
index 94ebe93..e40cba3 100644
--- a/gcc/testsuite/g++.dg/plugin/plugin.exp
+++ b/gcc/testsuite/g++.dg/plugin/plugin.exp
@@ -68,6 +68,7 @@  set plugin_test_list [list \
     { show_template_tree_color_plugin.c \
     	  show-template-tree-color.C \
     	  show-template-tree-color-no-elide-type.C } \
+    { comment_plugin.c comments-1.C } \
 ]
 
 foreach plugin_test $plugin_test_list {
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index b843992..66ef4d6 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -609,6 +609,15 @@  struct cpp_callbacks
 
   /* Callback for providing suggestions for misspelled directives.  */
   const char *(*get_suggestion) (cpp_reader *, const char *, const char *const *);
+
+  /* Callback for when a comment is encountered, giving the location
+     of the opening slash, a pointer to the content (which is not
+     necessarily 0-terminated), and the length of the content.
+     The content contains the opening slash-star (or slash-slash),
+     and for C-style comments contains the closing star-slash.  For
+     C++-style comments it does not include the terminating newline.  */
+  void (*comment) (cpp_reader *, source_location, const unsigned char *,
+		   size_t);
 };
 
 #ifdef VMS
diff --git a/libcpp/lex.c b/libcpp/lex.c
index 9edd2a6..40ff801 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -2889,6 +2889,13 @@  _cpp_lex_direct (cpp_reader *pfile)
       if (fallthrough_comment_p (pfile, comment_start))
 	fallthrough_comment = true;
 
+      if (pfile->cb.comment)
+	{
+	  size_t len = pfile->buffer->cur - comment_start;
+	  pfile->cb.comment (pfile, result->src_loc, comment_start - 1,
+			     len + 1);
+	}
+
       if (!pfile->state.save_comments)
 	{
 	  result->flags |= PREV_WHITE;