
[c++] provide intelligent error messages for missing semicolon after class definition

Message ID 20101109133016.GA7991@nightcrawler
State New

Commit Message

Nathan Froyd Nov. 9, 2010, 1:30 p.m. UTC
The patch below addresses PR c++/45331 (and related PRs) by providing a
more helpful error message (in some cases) when the user forgets a
semicolon at the end of a class definition.  The basic idea is that, at
the end of the class definition, we check to see whether the next token
is in the FOLLOW set of the class definition rule.  If it is not, then
there should have been a semicolon: we complain and silently insert a
semicolon to avoid cascading errors.
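The FOLLOW-set test can be pictured as a plain predicate over the kind of the token after the closing brace. The sketch below is a simplified model, not GCC code: `enum tok_kind` and `follows_class_specifier` are hypothetical names. It covers only the punctuation and name cases the patch accepts and treats every keyword as "semicolon missing". The patch refines the keyword case with one token of lookahead.

```c
#include <stdbool.h>

/* Hypothetical, simplified token kinds; GCC's real cpp_ttype
   enumeration is much larger.  */
enum tok_kind
{
  TOK_NAME, TOK_SEMICOLON, TOK_MULT, TOK_AND, TOK_OPEN_PAREN,
  TOK_CLOSE_PAREN, TOK_COMMA, TOK_KEYWORD, TOK_OPEN_BRACE, TOK_EOF
};

/* Return true if a token of kind KIND may legitimately follow the
   closing brace of a class definition:

     struct S { } s;       name
     struct S { } *p;      '*'
     struct S { } &r = x;  '&'
     struct S { } (v);     '('
     struct S { };         ';'

   plus ',' and ')', which can follow a class defined inside an
   argument list (for example in __builtin_offsetof).  Every other
   kind, including every keyword and end of file, means a semicolon
   is assumed to be missing.  */
static bool
follows_class_specifier (enum tok_kind kind)
{
  switch (kind)
    {
    case TOK_NAME:
    case TOK_SEMICOLON:
    case TOK_MULT:
    case TOK_AND:
    case TOK_OPEN_PAREN:
    case TOK_CLOSE_PAREN:
    case TOK_COMMA:
      return true;

    default:
      return false;
    }
}
```

For example, `follows_class_specifier (TOK_MULT)` is true (`struct S { } *p;`), while `TOK_EOF` yields false, which is the "class definition is the last thing in the file" case.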

The bulk of the patch comes from fixing up testcases.

Tested on x86_64-unknown-linux-gnu.  OK to commit?

-Nathan

gcc/cp/
	PR c++/16189
	PR c++/36888
	PR c++/45331
	* parser.c (cp_lexer_set_token_position): New function.
	(cp_lexer_previous_token_position): New function.
	(cp_lexer_previous_token): Call it.
	(cp_parser_class_specifier): Try to gracefully handle a missing
	semicolon.

gcc/testsuite/
	PR c++/16189
	PR c++/36888
	PR c++/45331
	* g++.dg/parse/semicolon3.C: New test.
	* g++.dg/debug/pr22514.C: Adjust.
	* g++.dg/init/error1.C: Adjust.
	* g++.dg/other/bitfield3.C: Adjust.
	* g++.dg/other/semicolon.C: Adjust.
	* g++.dg/parse/error14.C: Adjust.
	* g++.dg/parse/error5.C: Adjust.
	* g++.dg/parse/parameter-declaration-1.C: Adjust.
	* g++.dg/template/pr23510.C: Adjust.
	* g++.dg/template/pr39425.C: Adjust.
	* g++.old-deja/g++.robertl/eb125.C: Adjust.

Comments

Mark Mitchell Nov. 17, 2010, 2:39 a.m. UTC | #1
On 11/9/2010 5:30 AM, Nathan Froyd wrote:

> The patch below addresses PR c++/45331 (and related PRs) by providing a
> more helpful error message (in some cases) when the user forgets a
> semicolon at the end of a class definition.

Nice.

> +      case CPP_NAME:
> +      case CPP_SEMICOLON:
> +      case CPP_MULT:
> +      case CPP_AND:
> +      case CPP_OPEN_PAREN:
> +      case CPP_CLOSE_PAREN:
> +      case CPP_COMMA:
> +        want_semicolon = false;
> +        break;
> +        /* While it's legal for type qualifiers and storage class

Usual GNU style is a blank line after the "break", especially before a
big comment.  Also, I'm a little worried that we'll add something
somewhere and forget to update this list, or the list of

> +	  case RID_CONST:
> +	  case RID_VOLATILE:

specifiers and types (RID_INT, etc).  Would you please factor into a
predicate function?  I guess the best possible thing would be a .def
file or other table where when adding new RID_* things you'd be forced
to fill in the appropriate blanks here, but maybe that's an insane amount of effort?

Thank you,
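The ".def file or other table" idea corresponds to the X-macro idiom that GCC's `.def` files use. The following is a minimal sketch of that idiom under stated assumptions: `KEYWORD_TABLE`, `DEF_KEYWORD`, `enum kw_class`, the `KWD_*` names and the two predicates are all hypothetical, and the table is inlined rather than living in its own file. Each entry must name a classification. Omitting one is a preprocessor error because the macro takes two arguments, which is what forces whoever adds a keyword to fill in the blank.

```c
#include <stdbool.h>

/* Hypothetical keyword table.  In a real tree this would sit in its
   own .def file and be pulled in with #include.  */
#define KEYWORD_TABLE                          \
  DEF_KEYWORD (CONST,    KC_CV_QUALIFIER)      \
  DEF_KEYWORD (VOLATILE, KC_CV_QUALIFIER)      \
  DEF_KEYWORD (STATIC,   KC_STORAGE_CLASS)     \
  DEF_KEYWORD (EXTERN,   KC_STORAGE_CLASS)     \
  DEF_KEYWORD (INT,      KC_TYPE_SPECIFIER)    \
  DEF_KEYWORD (CHAR,     KC_TYPE_SPECIFIER)    \
  DEF_KEYWORD (RETURN,   KC_OTHER)

enum kw_class
{
  KC_CV_QUALIFIER, KC_STORAGE_CLASS, KC_TYPE_SPECIFIER, KC_OTHER
};

/* First expansion: the keyword enumeration.  */
#define DEF_KEYWORD(NAME, KIND) KWD_##NAME,
enum keyword { KEYWORD_TABLE KWD_MAX };
#undef DEF_KEYWORD

/* Second expansion: a classification array that cannot drift out of
   step with the enumeration.  */
#define DEF_KEYWORD(NAME, KIND) KIND,
static const enum kw_class keyword_class[KWD_MAX] = { KEYWORD_TABLE };
#undef DEF_KEYWORD

static bool
keyword_is_type_specifier (enum keyword kw)
{
  return keyword_class[kw] == KC_TYPE_SPECIFIER;
}

/* True for keywords that may start the *next* declaration.  */
static bool
keyword_is_decl_prefix (enum keyword kw)
{
  return (keyword_class[kw] == KC_CV_QUALIFIER
          || keyword_class[kw] == KC_STORAGE_CLASS);
}
```

The cost is one more file to maintain, but switch statements such as the ones in this patch then collapse into table lookups.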
Paolo Bonzini Nov. 18, 2010, 12:18 p.m. UTC | #2
On 11/09/2010 02:30 PM, Nathan Froyd wrote:
> The patch below addresses PR c++/45331 (and related PRs) by providing a
> more helpful error message (in some cases) when the user forgets a
> semicolon at the end of a class definition.  The basic idea is that, at
> the end of the class definition, we check to see whether the next token
> is in the FOLLOW set of the class definition rule.  If it is not, then
> there should have been a semicolon: we complain and silently insert a
> semicolon to avoid cascading errors.

Thanks very much for this!

Paolo
Nathan Froyd Nov. 18, 2010, 5:58 p.m. UTC | #3
On Tue, Nov 16, 2010 at 06:39:20PM -0800, Mark Mitchell wrote:
> On 11/9/2010 5:30 AM, Nathan Froyd wrote:
> > +      case CPP_NAME:
> > +      case CPP_SEMICOLON:
> > +      case CPP_MULT:
> > +      case CPP_AND:
> > +      case CPP_OPEN_PAREN:
> > +      case CPP_CLOSE_PAREN:
> > +      case CPP_COMMA:
> > +        want_semicolon = false;
> > +        break;
> > +        /* While it's legal for type qualifiers and storage class
> 
> Usual GNU style is a blank line after the "break", especially before a
> big comment.  Also, I'm a little worried that we'll add something
> somewhere and forget to update this list, or the list of
> 
> > +	  case RID_CONST:
> > +	  case RID_VOLATILE:
> 
> specifiers and types (RID_INT, etc).  Would you please factor into a
> predicate function?

I think it would be reasonable to provide functions for the RID_*
constants (keyword_is_{storage_class,type}_specifier, perhaps); such
lists occur all over the place.  But I'm not sure it's worthwhile for
the CPP_* constants in this case.  A function
cpp_token_in_follow_set_for_class_specifier doesn't seem like it'd be
reusable.  WDYT?

-Nathan
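Here is one way the keyword half of the check might read once the `RID_*` lists become predicates, with the `CPP_*` switch left inline as Nathan proposes. This is an illustrative sketch only. `keyword_is_storage_class_specifier` and `keyword_is_type_specifier` take the names floated above, but their bodies are assumptions. `keyword_is_cv_qualifier`, `want_semicolon_before_keyword`, `enum tok_kind` and the abbreviated `enum rid` are hypothetical. The decision logic follows the patch's nested switches.

```c
#include <stdbool.h>

/* Hypothetical, abbreviated keyword codes named after GCC's RID_*.  */
enum rid
{
  RID_CONST, RID_VOLATILE, RID_RESTRICT, RID_INLINE,
  RID_STATIC, RID_EXTERN, RID_TYPEDEF, RID_REGISTER, RID_AUTO,
  RID_MUTABLE, RID_INT, RID_CHAR, RID_DOUBLE, RID_BOOL, RID_CLASS,
  RID_MAX
};

/* Hypothetical kinds for the lookahead token.  */
enum tok_kind { TOK_NAME, TOK_KEYWORD, TOK_OTHER };

/* typedef is grouped here as in C's grammar and the patch's list.  */
static bool
keyword_is_storage_class_specifier (enum rid kw)
{
  switch (kw)
    {
    case RID_STATIC: case RID_EXTERN: case RID_TYPEDEF:
    case RID_REGISTER: case RID_AUTO: case RID_MUTABLE:
      return true;
    default:
      return false;
    }
}

static bool
keyword_is_cv_qualifier (enum rid kw)
{
  return kw == RID_CONST || kw == RID_VOLATILE || kw == RID_RESTRICT;
}

static bool
keyword_is_type_specifier (enum rid kw)
{
  switch (kw)
    {
    case RID_INT: case RID_CHAR: case RID_DOUBLE: case RID_BOOL:
      return true;
    default:
      return false;
    }
}

/* Decide, as the patch does, whether "} KW NEXT" after a class
   definition means a semicolon was forgotten.  */
static bool
want_semicolon_before_keyword (enum rid kw, enum tok_kind next_kind,
                               enum rid next_kw)
{
  if (!keyword_is_storage_class_specifier (kw)
      && !keyword_is_cv_qualifier (kw)
      && kw != RID_INLINE)
    /* Some other keyword, e.g. "} int x;": complain.  */
    return true;

  if (next_kind == TOK_KEYWORD)
    /* "} const int x;": the qualifier most likely begins the next
       declaration, so complain only if a built-in type follows.  */
    return keyword_is_type_specifier (next_kw);

  /* "} static v;" declares V with the class type; as in the patch,
     any other kind of lookahead token still triggers the complaint.  */
  return next_kind != TOK_NAME;
}
```

With this split, the `CPP_*` cases stay a short local switch, and the keyword lists become reusable wherever similar classifications recur.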
Mark Mitchell Nov. 18, 2010, 6:14 p.m. UTC | #4
On 11/18/2010 9:58 AM, Nathan Froyd wrote:

> I think it would be reasonable to provide functions for the RID_*
> constants (keyword_is_{storage_class,type}_specifier, perhaps); such
> lists occur all over the place.  But I'm not sure it's worthwhile for
> the CPP_* constants in this case.  A function
> cpp_token_in_follow_set_for_class_specifier doesn't seem like it'd be
> reusable.  WDYT?

Yes, that sounds reasonable.

Thank you,

Patch

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 6a9e4d7..05a36b6 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -502,15 +502,25 @@  cp_lexer_token_at (cp_lexer *lexer ATTRIBUTE_UNUSED, cp_token_position pos)
   return pos;
 }
 
-static inline cp_token *
-cp_lexer_previous_token (cp_lexer *lexer)
+static inline void
+cp_lexer_set_token_position (cp_lexer *lexer, cp_token_position pos)
 {
-  cp_token_position tp;
+  lexer->next_token = cp_lexer_token_at (lexer, pos);
+}
 
+static inline cp_token_position
+cp_lexer_previous_token_position (cp_lexer *lexer)
+{
   if (lexer->next_token == &eof_token)
-    tp = lexer->last_token - 1;
+    return lexer->last_token - 1;
   else
-    tp = cp_lexer_token_position (lexer, true);
+    return cp_lexer_token_position (lexer, true);
+}
+
+static inline cp_token *
+cp_lexer_previous_token (cp_lexer *lexer)
+{
+  cp_token_position tp = cp_lexer_previous_token_position (lexer);
 
   return cp_lexer_token_at (lexer, tp);
 }
@@ -16866,6 +16876,131 @@  cp_parser_class_specifier (cp_parser* parser)
     type = finish_struct (type, attributes);
   if (nested_name_specifier_p)
     pop_inner_scope (old_scope, scope);
+
+  /* We've finished a type definition.  Check for the common syntax
+     error of forgetting a semicolon after the definition.  We need to
+     be careful, as we can't just check for not-a-semicolon and be done
+     with it; the user might have typed:
+
+     class X { } c = ...;
+     class X { } *p = ...;
+
+     and so forth.  Instead, enumerate all the possible tokens that
+     might follow this production; if we don't see one of them, then
+     complain and silently insert the semicolon.  */
+  {
+    cp_token *token = cp_lexer_peek_token (parser->lexer);
+    bool want_semicolon = true;
+
+    switch (token->type)
+      {
+      case CPP_NAME:
+      case CPP_SEMICOLON:
+      case CPP_MULT:
+      case CPP_AND:
+      case CPP_OPEN_PAREN:
+      case CPP_CLOSE_PAREN:
+      case CPP_COMMA:
+        want_semicolon = false;
+        break;
+        /* While it's legal for type qualifiers and storage class
+           specifiers to follow type definitions in the grammar, only
+           compiler testsuites contain code like that.  Assume that if
+           we see such code, then what we're really seeing is a case
+           like:
+
+           class X { }
+           const <type> var = ...;
+
+           or
+
+           class Y { }
+           static <type> var = ...;
+
+           i.e. the qualifier or specifier applies to the next
+           declaration.  To do so, however, we need to look ahead one
+           more token to see if *that* token is a type specifier.
+
+	   This code could be improved to handle:
+
+	   class Z { }
+	   static const <type> var = ...;  */
+      case CPP_KEYWORD:
+	switch (token->keyword)
+	  {
+	  case RID_CONST:
+	  case RID_VOLATILE:
+	  case RID_INLINE:
+	  case RID_RESTRICT:
+	  case RID_STATIC:
+	  case RID_EXTERN:
+	  case RID_TYPEDEF:
+	  case RID_REGISTER:
+	  case RID_AUTO:
+	  case RID_MUTABLE:
+	    {
+	      cp_token *lookahead = cp_lexer_peek_nth_token (parser->lexer, 2);
+
+	      if (lookahead->type == CPP_KEYWORD)
+		switch (lookahead->keyword)
+		  {
+		  case RID_INT:
+		  case RID_CHAR:
+		  case RID_FLOAT:
+		  case RID_DOUBLE:
+		  case RID_VOID:
+		  case RID_INT128:
+		  case RID_UNSIGNED:
+		  case RID_LONG:
+		  case RID_SHORT:
+		  case RID_SIGNED:
+		  case RID_DFLOAT32:
+		  case RID_DFLOAT64:
+		  case RID_DFLOAT128:
+		  case RID_FRACT:
+		  case RID_ACCUM:
+		  case RID_BOOL:
+		  case RID_WCHAR:
+		  case RID_CHAR16:
+		  case RID_CHAR32:
+		  case RID_SAT:
+		    /* These are all type specifiers.  */
+		    break;
+		  default:
+		    want_semicolon = false;
+		    break;
+		  }
+	      else if (lookahead->type == CPP_NAME)
+		/* Handling user-defined types here would be nice, but
+		   very tricky.  */
+		want_semicolon = false;
+	    }
+	    break;
+	  default:
+	    break;
+	  }
+      default:
+	break;
+      }
+
+    if (want_semicolon)
+      {
+	cp_token_position prev
+	  = cp_lexer_previous_token_position (parser->lexer);
+	cp_token *prev_token = cp_lexer_token_at (parser->lexer, prev);
+
+	error_at (prev_token->location,
+		  "expected %<;%> after class specifier");
+
+	/* Unget one token and smash it to look as though we encountered
+	   a semicolon in the input stream.  */
+	cp_lexer_set_token_position (parser->lexer, prev);
+	token = cp_lexer_peek_token (parser->lexer);
+	token->type = CPP_SEMICOLON;
+	token->keyword = RID_MAX;
+      }
+  }
+
   /* If this class is not itself within the scope of another class,
      then we need to parse the bodies of all of the queued function
      definitions.  Note that the queued functions defined in a class
diff --git a/gcc/testsuite/g++.dg/debug/pr22514.C b/gcc/testsuite/g++.dg/debug/pr22514.C
index 3df9e23..6d2d820 100644
--- a/gcc/testsuite/g++.dg/debug/pr22514.C
+++ b/gcc/testsuite/g++.dg/debug/pr22514.C
@@ -8,6 +8,6 @@  namespace s
   template<int i> struct list : _List_base<i>
   {
     using _List_base<i>::_M_impl;
-  }
-}  /* { dg-error "expected unqualified-id before '\}'" } */
+  } // { dg-error "after class specifier" }
+}
 s::list<1> OutputModuleListType;
diff --git a/gcc/testsuite/g++.dg/init/error1.C b/gcc/testsuite/g++.dg/init/error1.C
index dd12e4c..d6177e9 100644
--- a/gcc/testsuite/g++.dg/init/error1.C
+++ b/gcc/testsuite/g++.dg/init/error1.C
@@ -2,6 +2,6 @@ 
 
 struct A {
   static float b[10];
-}
+} // { dg-error "after class specifier" }
 
-float A::b[] = {1,2,3}; // { dg-error "" }
+float A::b[] = {1,2,3};
diff --git a/gcc/testsuite/g++.dg/other/bitfield3.C b/gcc/testsuite/g++.dg/other/bitfield3.C
index b9726c2..befd7f8 100644
--- a/gcc/testsuite/g++.dg/other/bitfield3.C
+++ b/gcc/testsuite/g++.dg/other/bitfield3.C
@@ -3,13 +3,15 @@ 
 
 template<int> struct A
 {
-  struct {} : 2;	// { dg-error "with non-integral type" }
+  // multiple errors below: missing semicolon, no anonymous structs, etc.
+  struct {} : 2;	// { dg-error "" }
 };
 
 template<int> struct B
 {
   int a;
-  struct {} : 2;	// { dg-error "with non-integral type" }
+  // multiple errors below: missing semicolon, no anonymous structs, etc.
+  struct {} : 2;	// { dg-error "" }
   int b;
 };
 
diff --git a/gcc/testsuite/g++.dg/other/semicolon.C b/gcc/testsuite/g++.dg/other/semicolon.C
index efbae8b..00229c4 100644
--- a/gcc/testsuite/g++.dg/other/semicolon.C
+++ b/gcc/testsuite/g++.dg/other/semicolon.C
@@ -5,7 +5,6 @@ 
 
 struct A
 {
-  struct B { int i; } // { dg-error "3:new types may not be defined in a return type" }
-                      // { dg-message "perhaps a semicolon is missing" "note" { target *-*-* } 8 }
-  void foo();   // { dg-error "12:two or more" }
+  struct B { int i; } // { dg-error "after class specifier" }
+  void foo();
 };
diff --git a/gcc/testsuite/g++.dg/parse/error14.C b/gcc/testsuite/g++.dg/parse/error14.C
index 9e672c2..e9f1da9 100644
--- a/gcc/testsuite/g++.dg/parse/error14.C
+++ b/gcc/testsuite/g++.dg/parse/error14.C
@@ -21,6 +21,6 @@  struct X
 
 }; // { dg-error "2:expected '.' at end of input" "at end of input" }
    // { dg-error "1:expected primary-expression before '.' token" "primary" { target *-*-* } 22 }
-   // { dg-error "1:expected ';' before '.' token" "semicolon" { target *-*-* } 22 }
-   // { dg-error "1:expected unqualified-id at end of input" "unqual" { target *-*-* } 22 }
+   // { dg-error "2:expected ';' after class specifier" "semicolon" { target *-*-* } 22 }
+   // { dg-error "1:expected ';' before '.' token" "function" { target *-*-* } 22 }
 
diff --git a/gcc/testsuite/g++.dg/parse/error5.C b/gcc/testsuite/g++.dg/parse/error5.C
index 6ebb087..dfa3eb2 100644
--- a/gcc/testsuite/g++.dg/parse/error5.C
+++ b/gcc/testsuite/g++.dg/parse/error5.C
@@ -3,17 +3,17 @@ 
 
 class Foo { int foo() return 0; } };
 
-// { dg-error "30:expected identifier before numeric constant" "" { target *-*-* } 4 }
+// { dg-error "30:expected identifier before numeric constant" "identifier" { target *-*-* } 4 }
 
-// { dg-error "23:named return values are no longer supported" "" { target *-*-* } 4 }
+// { dg-error "23:named return values are no longer supported" "named return" { target *-*-* } 4 }
 
 // the column number info of this error output is still wrong because the error
 // message has been generated by cp_parser_error() which does not
 // necessarily allow accurate column number display. At some point, we will
 // need make cp_parser_error() report more accurate column numbers.
-// { dg-error "30:expected '\{' at end of input" "" { target *-*-* } 4 }
+// { dg-error "30:expected '\{' at end of input" "brace" { target *-*-* } 4 }
 
-// { dg-error "35:expected unqualified-id before '\}' token" "" {target *-*-* } 4 }
+// { dg-error "33:expected ';' after class specifier" "semicolon" {target *-*-* } 4 }
 
-// { dg-error "35:expected declaration before '\}' token" "" {target *-*-* } 4 }
+// { dg-error "35:expected declaration before '\}' token" "declaration" {target *-*-* } 4 }
 
diff --git a/gcc/testsuite/g++.dg/parse/parameter-declaration-1.C b/gcc/testsuite/g++.dg/parse/parameter-declaration-1.C
index 22d6f21..58f6799 100644
--- a/gcc/testsuite/g++.dg/parse/parameter-declaration-1.C
+++ b/gcc/testsuite/g++.dg/parse/parameter-declaration-1.C
@@ -2,5 +2,5 @@ 
 // Origin: Robert Schiele; PR C++/8799
 // { dg-do compile }
 
-struct {
+struct {			// { dg-error "" }
    a(void = 0; a(0), a(0)	// { dg-error "" "" { target *-*-* } }
diff --git a/gcc/testsuite/g++.dg/parse/semicolon3.C b/gcc/testsuite/g++.dg/parse/semicolon3.C
new file mode 100644
index 0000000..a919750
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/semicolon3.C
@@ -0,0 +1,135 @@ 
+// PR c++/45331
+// { dg-do compile }
+
+struct OK1
+{
+  int a;
+} // no complaints
+  *s5;
+
+struct OK2
+{
+  int a;
+} // no complaints
+  &s6 = *(new OK2());
+
+struct OK3
+{
+  int a;
+} // no complaints
+  (s7);
+
+__SIZE_TYPE__
+test_offsetof (void)
+{
+  // no complaints about a missing semicolon
+  return __builtin_offsetof (struct OK4 { int a; int b; }, b);
+}
+
+struct OK5
+{
+  int a;
+} ok5_var;			// no complaints
+
+struct OK6
+{
+  int a;
+} static ok6_var;		// no complaints
+
+class OK7
+{
+public:
+  OK7() { };
+  int a;
+} const ok7_var;		// no complaints
+
+class OK8
+{
+  int a;
+} extern ok8_var;		// no complaints
+
+class OK9
+{
+  class OK9sub { int a; } mutable ok9sub; // no complaints
+  int a;
+};
+
+int
+autotest (void)
+{
+  struct OK10 { int a; } auto ok10 = { 0 }; // no complaints
+
+  return ok10.a;
+}
+
+struct E1
+{
+  int a;
+} // { dg-error "after class specifier" }
+
+typedef float BAR;
+
+struct E2
+{
+  int a;
+} // { dg-error "after class specifier" }
+
+const int i0 = 1;
+
+struct E3
+{
+  int a;
+} // { dg-error "after class specifier" }
+
+volatile long l0 = 1;
+
+struct E4
+{
+  int a;
+} // { dg-error "after class specifier" }
+
+extern char c0;
+
+struct E5
+{
+  int a;
+} // { dg-error "after class specifier" }
+
+static wchar_t wc0;
+
+struct E6
+{
+  int a;
+} // { dg-error "after class specifier" }
+
+bool b0;
+
+class E7
+{
+  int a;
+} // { dg-error "after class specifier" }
+
+extern double d0;
+
+class E8
+{
+  int a;
+} // { dg-error "after class specifier" }
+
+inline short f(void)
+{
+  return 2;
+}
+
+/* This was the original test from the PR.  */
+
+class C0
+{
+public:
+ int a;
+} // { dg-error "after class specifier" }
+
+const int foo(const C0 &x)
+{
+ return x.a;
+}
diff --git a/gcc/testsuite/g++.dg/template/pr23510.C b/gcc/testsuite/g++.dg/template/pr23510.C
index b9e9889..e63ddc0 100644
--- a/gcc/testsuite/g++.dg/template/pr23510.C
+++ b/gcc/testsuite/g++.dg/template/pr23510.C
@@ -6,13 +6,13 @@  struct Factorial
   enum { nValue = nFactor * Factorial<nFactor - 1>::nValue }; // { dg-error "depth exceeds maximum" } 
   // { dg-message "recursively instantiated" "" { target *-*-* } 6 } 
   // { dg-error "incomplete type" "" { target *-*-* } 6 } 
-} 
+}
 
-  template<> // { dg-error "expected" } 
+  template<>
   struct Factorial<0>
   {
     enum { nValue = 1 };
-  }
+  };
 
     static const unsigned int FACTOR = 20;
 
diff --git a/gcc/testsuite/g++.dg/template/pr39425.C b/gcc/testsuite/g++.dg/template/pr39425.C
index a063e05..db0423b 100644
--- a/gcc/testsuite/g++.dg/template/pr39425.C
+++ b/gcc/testsuite/g++.dg/template/pr39425.C
@@ -15,4 +15,4 @@  class a {
 
   static const unsigned int value = _rec < 1 >::size;
 
-}		// { dg-error "unqualified-id" }
+} // { dg-error "after class specifier" }
diff --git a/gcc/testsuite/g++.old-deja/g++.robertl/eb125.C b/gcc/testsuite/g++.old-deja/g++.robertl/eb125.C
index b068236..ed4a34b 100644
--- a/gcc/testsuite/g++.old-deja/g++.robertl/eb125.C
+++ b/gcc/testsuite/g++.old-deja/g++.robertl/eb125.C
@@ -10,13 +10,13 @@  void test<class BOX> (test_box *);   // { dg-error "" } illegal code
 class test_square
     {
       friend void test<class BOX> (test_box *); // { dg-error "" } does not match
-    }
+    }						// { dg-error "after class specifier" }
 
 
 
-template <class BOX> void test(BOX *the_box)  // { dg-error "" } semicolon missing
-    {x
-    the_box->print();
-    };
+template <class BOX> void test(BOX *the_box)
+    {x				// { dg-error "not declared in this scope" }
+    the_box->print();		// { dg-error "before" }
+    }
 
-template void test<> (test_box *); // { dg-error "" }
+template void test<> (test_box *);
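The recovery step in `cp_parser_class_specifier` does not insert a new token. It rewinds the lexer by one position and overwrites the `}` it has just consumed so that it reads as `;`. That is why `cp_lexer_previous_token_position` and `cp_lexer_set_token_position` were split out. Below is a toy model of that mechanism; `struct token`, `struct lexer`, and all function names are hypothetical, and GCC's real `cp_lexer` carries more state (purged tokens, full source locations). As in GCC, end of input is a separate shared object, so "next minus one" does not work once the cursor sits on it.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical token model; only what the recovery needs.  */
enum tok_kind { TOK_NAME, TOK_KEYWORD, TOK_SEMICOLON, TOK_CLOSE_BRACE, TOK_EOF };

struct token
{
  enum tok_kind kind;
  int keyword;   /* keyword code, or -1 for none */
  int line;      /* stands in for the token's source location */
};

/* Shared end-of-input token, outside the buffer (cf. eof_token).  */
static struct token eof_token = { TOK_EOF, -1, 0 };

/* Toy lexer: tokens in [FIRST, LAST); NEXT is the next token to read.  */
struct lexer
{
  struct token *first, *last, *next;
};

static void
lexer_init (struct lexer *lx, struct token *toks, size_t n)
{
  lx->first = toks;
  lx->last = toks + n;
  lx->next = n ? toks : &eof_token;
}

static struct token *
consume_token (struct lexer *lx)
{
  struct token *tok = lx->next;
  if (tok != &eof_token && ++lx->next == lx->last)
    lx->next = &eof_token;
  return tok;
}

/* Counterpart of cp_lexer_previous_token_position.  Assumes at least
   one token has been consumed.  */
static struct token *
previous_token_position (struct lexer *lx)
{
  if (lx->next == &eof_token)
    return lx->last - 1;
  return lx->next - 1;
}

/* Report the missing ';' at the '}' just consumed, then rewind one
   token and rewrite that '}' in place as ';'.  */
static void
recover_missing_semicolon (struct lexer *lx)
{
  struct token *prev = previous_token_position (lx);

  fprintf (stderr, "%d: error: expected ';' after class specifier\n",
           prev->line);
  lx->next = prev;
  prev->kind = TOK_SEMICOLON;
  prev->keyword = -1;
}
```

Because the diagnostic is reported at the previous token, it points at the class's closing brace rather than at whatever follows it. This matches the adjusted expectation in `error5.C`, which moves from column 35 (the stray `}`) to column 33 (the brace that closes `Foo`).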