diff mbox

Better error recovery for merge-conflict markers (v4)

Message ID 1450290181.1916.41.camel@surprise
State New
Headers show

Commit Message

David Malcolm Dec. 16, 2015, 6:23 p.m. UTC
On Wed, 2015-12-09 at 18:44 +0100, Bernd Schmidt wrote:
> On 12/09/2015 05:58 PM, David Malcolm wrote:
> > On Wed, 2015-11-04 at 14:56 +0100, Bernd Schmidt wrote:
> >>
> >> This seems like fairly low impact but also low cost, so I'm fine with it
> >> in principle. I wonder whether the length of the marker is the same
> >> across all versions of patch (and VC tools)?
> >
> > It's hardcoded for GNU patch:
> [...]
>  >From what I can tell, Perforce is the outlier here.
> 
> Thanks for checking all that.
> 
> >> Just thinking out loud - I guess it would be too much to hope for to
> >> share lexers between frontends so that we need only one copy of this?
> >
> > Probably :(
> 
> Someone slap sense into me, I just thought of deriving C and C++ parsers 
> from a common base class... (no this is not a suggestion for this patch).
> 
> > Would a better wording be:
> >
> > extern short some_var; /* This line would lead to a warning due to the
> >                            duplicate name, but it is skipped when handling
> >                            the conflict marker.  */
> 
> I think so, yes.
> 
> > That said, it's not clear they're always at the beginning of a line;
> > this bazaar bug indicates that CVS (and bazaar) can emit them
> > mid-line:
> >    https://bugs.launchpad.net/bzr/+bug/36399
> 
> Ok. CVS I think we shouldn't worry about, and it looks like this is one 
> particular bug/corner case where the conflict end marker is the last 
> thing in the file. I think on the whole it's best to check for beginning 
> of the line as you've done.
> 
> > Wording-wise, should it be "merge conflict marker", rather
> > than "patch conflict marker"?
> >
> > Clang spells it:
> > "error: version control conflict marker in file"
> > http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#merge_conflicts
> 
> Yeah, if another compiler has a similar/identical diagnostic I think we 
> should just copy that unless there's a very good reason not to.
> 
> > Rebased on top of r231445 (from yesterday).
> > Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.
> > Adds 82 new PASSes to g++.sum and 27 new PASSes to gcc.sum.
> >
> > OK for trunk?
> 
> I'm inclined to say yes since it was originally submitted in time and 
> it's hard to imagine how the change could be risky (I'll point out right 
> away that there are one or two other patches in the queue that were also 
> submitted in time which I feel should not be considered for gcc-6 at 
> this point due to risk).
> 
> Let's wait until the end of the week for objections, commit then.

Thanks.  I updated it based on the feedback above, including changing
the wording to match clang's:
     "error: version control conflict marker in file"
I replaced "patch conflict marker" with "conflict marker" in the code
and names of test cases.

I took the liberty of adding:
      gcc_assert (n > 0);
to c_parser_peek_nth_token based on Martin's feedback.

Having verified bootstrap&regrtest (on x86_64-pc-linux-gnu), I've
committed it to trunk as r231712.

I'm attaching what I committed, for reference.
diff mbox

Patch

Index: gcc/c-family/ChangeLog
===================================================================
--- gcc/c-family/ChangeLog	(revision 231711)
+++ gcc/c-family/ChangeLog	(revision 231712)
@@ -1,3 +1,8 @@ 
+2015-12-16  David Malcolm  <dmalcolm@redhat.com>
+
+	* c-common.h (conflict_marker_get_final_tok_kind): New prototype.
+	* c-lex.c (conflict_marker_get_final_tok_kind): New function.
+
 2015-12-15  Ilya Verbin  <ilya.verbin@intel.com>
 
 	* c-common.c (c_common_attribute_table): Handle "omp declare target
Index: gcc/c-family/c-lex.c
===================================================================
--- gcc/c-family/c-lex.c	(revision 231711)
+++ gcc/c-family/c-lex.c	(revision 231712)
@@ -1263,3 +1263,29 @@ 
 
   return value;
 }
+
+/* Helper function for c_parser_peek_conflict_marker
+   and cp_lexer_peek_conflict_marker.
+   Given a possible conflict marker token of kind TOK1_KIND
+   consisting of a pair of characters, get the token kind for the
+   standalone final character.  */
+
+enum cpp_ttype
+conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind)
+{
+  switch (tok1_kind)
+    {
+    default: gcc_unreachable ();
+    case CPP_LSHIFT:
+      /* "<<" and '<' */
+      return CPP_LESS;
+
+    case CPP_EQ_EQ:
+      /* "==" and '=' */
+      return CPP_EQ;
+
+    case CPP_RSHIFT:
+      /* ">>" and '>' */
+      return CPP_GREATER;
+    }
+}
Index: gcc/c-family/c-common.h
===================================================================
--- gcc/c-family/c-common.h	(revision 231711)
+++ gcc/c-family/c-common.h	(revision 231712)
@@ -1089,6 +1089,10 @@ 
 extern int c_gimplify_expr (tree *, gimple_seq *, gimple_seq *);
 extern tree c_build_bind_expr (location_t, tree, tree);
 
+/* In c-lex.c.  */
+extern enum cpp_ttype
+conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind);
+
 /* In c-pch.c  */
 extern void pch_init (void);
 extern void pch_cpp_save_state (void);
Index: gcc/c/ChangeLog
===================================================================
--- gcc/c/ChangeLog	(revision 231711)
+++ gcc/c/ChangeLog	(revision 231712)
@@ -1,5 +1,13 @@ 
 2015-12-16  David Malcolm  <dmalcolm@redhat.com>
 
+	* c-parser.c (struct c_parser): Expand array "tokens_buf" from 2
+	to 4.
+	(c_parser_peek_nth_token): New function.
+	(c_parser_peek_conflict_marker): New function.
+	(c_parser_error): Detect conflict markers and report them as such.
+
+2015-12-16  David Malcolm  <dmalcolm@redhat.com>
+
 	* c-parser.c (c_parser_postfix_expression): Use EXPR_LOC_OR_LOC
 	to preserve range information for the primary expression
 	in the call to c_parser_postfix_expression_after_primary.
Index: gcc/c/c-parser.c
===================================================================
--- gcc/c/c-parser.c	(revision 231711)
+++ gcc/c/c-parser.c	(revision 231712)
@@ -202,8 +202,8 @@ 
   /* The look-ahead tokens.  */
   c_token * GTY((skip)) tokens;
   /* Buffer for look-ahead tokens.  */
-  c_token tokens_buf[2];
-  /* How many look-ahead tokens are available (0, 1 or 2, or
+  c_token tokens_buf[4];
+  /* How many look-ahead tokens are available (0 - 4, or
      more if parsing from pre-lexed tokens).  */
   unsigned int tokens_avail;
   /* True if a syntax error is being recovered from; false otherwise.
@@ -492,6 +492,23 @@ 
   return &parser->tokens[1];
 }
 
+/* Return a pointer to the Nth token from PARSER, reading it
+   in if necessary.  The N-1th token is already read in.  */
+
+static c_token *
+c_parser_peek_nth_token (c_parser *parser, unsigned int n)
+{
+  /* N is 1-based, not zero-based.  */
+  gcc_assert (n > 0);
+
+  if (parser->tokens_avail >= n)
+    return &parser->tokens[n - 1];
+  gcc_assert (parser->tokens_avail == n - 1);
+  c_lex_one_token (parser, &parser->tokens[n - 1]);
+  parser->tokens_avail = n;
+  return &parser->tokens[n - 1];
+}
+
 /* Return true if TOKEN can start a type name,
    false otherwise.  */
 static bool
@@ -829,6 +846,46 @@ 
     }
 }
 
+/* Helper function for c_parser_error.
+   Having peeked a token of kind TOK1_KIND that might signify
+   a conflict marker, peek successor tokens to determine
+   if we actually do have a conflict marker.
+   Specifically, we consider a run of 7 '<', '=' or '>' characters
+   at the start of a line as a conflict marker.
+   These come through the lexer as three pairs and a single,
+   e.g. three CPP_LSHIFT ("<<") and a CPP_LESS ('<').
+   If it returns true, *OUT_LOC is written to with the location/range
+   of the marker.  */
+
+static bool
+c_parser_peek_conflict_marker (c_parser *parser, enum cpp_ttype tok1_kind,
+			       location_t *out_loc)
+{
+  c_token *token2 = c_parser_peek_2nd_token (parser);
+  if (token2->type != tok1_kind)
+    return false;
+  c_token *token3 = c_parser_peek_nth_token (parser, 3);
+  if (token3->type != tok1_kind)
+    return false;
+  c_token *token4 = c_parser_peek_nth_token (parser, 4);
+  if (token4->type != conflict_marker_get_final_tok_kind (tok1_kind))
+    return false;
+
+  /* It must be at the start of the line.  */
+  location_t start_loc = c_parser_peek_token (parser)->location;
+  if (LOCATION_COLUMN (start_loc) != 1)
+    return false;
+
+  /* We have a conflict marker.  Construct a location of the form:
+       <<<<<<<
+       ^~~~~~~
+     with start == caret, finishing at the end of the marker.  */
+  location_t finish_loc = get_finish (token4->location);
+  *out_loc = make_location (start_loc, start_loc, finish_loc);
+
+  return true;
+}
+
 /* Issue a diagnostic of the form
       FILE:LINE: MESSAGE before TOKEN
    where TOKEN is the next token in the input stream of PARSER.
@@ -850,6 +907,20 @@ 
   parser->error = true;
   if (!gmsgid)
     return;
+
+  /* If this is actually a conflict marker, report it as such.  */
+  if (token->type == CPP_LSHIFT
+      || token->type == CPP_RSHIFT
+      || token->type == CPP_EQ_EQ)
+    {
+      location_t loc;
+      if (c_parser_peek_conflict_marker (parser, token->type, &loc))
+	{
+	  error_at (loc, "version control conflict marker in file");
+	  return;
+	}
+    }
+
   /* This diagnostic makes more sense if it is tagged to the line of
      the token we just peeked at.  */
   c_parser_set_source_position_from_token (token);
Index: gcc/testsuite/ChangeLog
===================================================================
--- gcc/testsuite/ChangeLog	(revision 231711)
+++ gcc/testsuite/ChangeLog	(revision 231712)
@@ -1,5 +1,20 @@ 
 2015-12-16  David Malcolm  <dmalcolm@redhat.com>
 
+	* c-c++-common/conflict-markers-1.c: New testcase.
+	* c-c++-common/conflict-markers-2.c: Likewise.
+	* c-c++-common/conflict-markers-3.c: Likewise.
+	* c-c++-common/conflict-markers-4.c: Likewise.
+	* c-c++-common/conflict-markers-5.c: Likewise.
+	* c-c++-common/conflict-markers-6.c: Likewise.
+	* c-c++-common/conflict-markers-7.c: Likewise.
+	* c-c++-common/conflict-markers-8.c: Likewise.
+	* c-c++-common/conflict-markers-9.c: Likewise.
+	* c-c++-common/conflict-markers-10.c: Likewise.
+	* c-c++-common/conflict-markers-11.c: Likewise.
+	* g++.dg/conflict-markers-1.C: Likewise.
+
+2015-12-16  David Malcolm  <dmalcolm@redhat.com>
+
 	* gcc.dg/cast-function-1.c (bar): Update column numbers.
 	* gcc.dg/diagnostic-range-bad-called-object.c: New test case.
 
Index: gcc/testsuite/g++.dg/conflict-markers-1.C
===================================================================
--- gcc/testsuite/g++.dg/conflict-markers-1.C	(revision 0)
+++ gcc/testsuite/g++.dg/conflict-markers-1.C	(revision 231712)
@@ -0,0 +1,13 @@ 
+/* Ensure that we don't complain about conflict markers on
+   valid template argument lists, valid in C++11 onwards.  */
+// { dg-options "-std=c++11" }
+
+template <typename T>
+struct foo
+{
+  T t;
+};
+
+foo <foo <foo <foo <foo <foo <foo <int
+>>>>>>> f;
+// The above line is valid C++11, and isn't a conflict marker
Index: gcc/testsuite/c-c++-common/conflict-markers-11.c
===================================================================
--- gcc/testsuite/c-c++-common/conflict-markers-11.c	(revision 0)
+++ gcc/testsuite/c-c++-common/conflict-markers-11.c	(revision 231712)
@@ -0,0 +1,14 @@ 
+/* Verify that we only report conflict markers at the start of lines.  */
+int p;
+
+ <<<<<<< HEAD /* { dg-error "expected identifier|expected unqualified-id" } */
+
+int q;
+
+ =======      /* { dg-error "expected identifier|expected unqualified-id" } */
+
+int r;
+
+ >>>>>>> Some commit message  /* { dg-error "expected identifier|expected unqualified-id" } */
+
+int s;
Index: gcc/testsuite/c-c++-common/conflict-markers-4.c
===================================================================
--- gcc/testsuite/c-c++-common/conflict-markers-4.c	(revision 0)
+++ gcc/testsuite/c-c++-common/conflict-markers-4.c	(revision 231712)
@@ -0,0 +1,11 @@ 
+/* Ensure we can handle mismatched conflict markers.  */
+
+int p;
+
+>>>>>>> Some commit message  /* { dg-error "conflict marker" } */
+
+int q;
+
+>>>>>>> Some other commit message  /* { dg-error "conflict marker" } */
+
+int r;
Index: gcc/testsuite/c-c++-common/conflict-markers-5.c
===================================================================
--- gcc/testsuite/c-c++-common/conflict-markers-5.c	(revision 0)
+++ gcc/testsuite/c-c++-common/conflict-markers-5.c	(revision 231712)
@@ -0,0 +1,11 @@ 
+/* Ensure we can handle mismatched conflict markers.  */
+
+int p;
+
+=======  /* { dg-error "conflict marker" } */
+
+int q;
+
+=======  /* { dg-error "conflict marker" } */
+
+int r;
Index: gcc/testsuite/c-c++-common/conflict-markers-6.c
===================================================================
--- gcc/testsuite/c-c++-common/conflict-markers-6.c	(revision 0)
+++ gcc/testsuite/c-c++-common/conflict-markers-6.c	(revision 231712)
@@ -0,0 +1,38 @@ 
+/* Branch coverage of conflict marker detection:
+   none of these should be reported as conflict markers.  */
+
+int a0;
+
+<< HEAD  /* { dg-error "expected" } */
+
+int a1;
+
+<<<< HEAD  /* { dg-error "expected" } */
+
+int a2;
+
+<<<<<< HEAD  /* { dg-error "expected" } */
+
+int b0;
+
+== HEAD  /* { dg-error "expected" } */
+
+int b1;
+
+==== HEAD  /* { dg-error "expected" } */
+
+int b2;
+
+====== HEAD  /* { dg-error "expected" } */
+
+int c0;
+
+>> HEAD  /* { dg-error "expected" } */
+
+int c1;
+
+>>>> HEAD  /* { dg-error "expected" } */
+
+int c2;
+
+>>>>>> HEAD  /* { dg-error "expected" } */
Index: gcc/testsuite/c-c++-common/conflict-markers-7.c
===================================================================
--- gcc/testsuite/c-c++-common/conflict-markers-7.c	(revision 0)
+++ gcc/testsuite/c-c++-common/conflict-markers-7.c	(revision 231712)
@@ -0,0 +1,6 @@ 
+/* It's valid to stringize the "<<<<<<<"; don't
+   report it as a conflict marker.  */
+#define str(s) #s
+const char *s = str(
+<<<<<<<
+);
Index: gcc/testsuite/c-c++-common/conflict-markers-8.c
===================================================================
--- gcc/testsuite/c-c++-common/conflict-markers-8.c	(revision 0)
+++ gcc/testsuite/c-c++-common/conflict-markers-8.c	(revision 231712)
@@ -0,0 +1,4 @@ 
+/* A macro that's never expanded shouldn't be reported as a
+   conflict marker.  */
+#define foo \
+<<<<<<<
Index: gcc/testsuite/c-c++-common/conflict-markers-1.c
===================================================================
--- gcc/testsuite/c-c++-common/conflict-markers-1.c	(revision 0)
+++ gcc/testsuite/c-c++-common/conflict-markers-1.c	(revision 231712)
@@ -0,0 +1,11 @@ 
+int p;
+
+<<<<<<< HEAD /* { dg-error "conflict marker" } */
+extern int some_var;
+=======      /* { dg-error "conflict marker" } */
+extern short some_var; /* This line would lead to a warning due to the
+			  duplicate name, but it is skipped when handling
+			  the conflict marker.  */
+>>>>>>> Some commit message  /* { dg-error "conflict marker" } */
+
+int q;
Index: gcc/testsuite/c-c++-common/conflict-markers-9.c
===================================================================
--- gcc/testsuite/c-c++-common/conflict-markers-9.c	(revision 0)
+++ gcc/testsuite/c-c++-common/conflict-markers-9.c	(revision 231712)
@@ -0,0 +1,8 @@ 
+/* It's valid to have
+<<<<<<<
+   inside both
+   comments (as above), and within string literals.  */
+const char *s = "\
+<<<<<<<";
+
+/* The above shouldn't be reported as errors.  */
Index: gcc/testsuite/c-c++-common/conflict-markers-2.c
===================================================================
--- gcc/testsuite/c-c++-common/conflict-markers-2.c	(revision 0)
+++ gcc/testsuite/c-c++-common/conflict-markers-2.c	(revision 231712)
@@ -0,0 +1,2 @@ 
+/* This should not be flagged as a conflict marker.  */
+const char *msg = "<<<<<<< ";
Index: gcc/testsuite/c-c++-common/conflict-markers-10.c
===================================================================
--- gcc/testsuite/c-c++-common/conflict-markers-10.c	(revision 0)
+++ gcc/testsuite/c-c++-common/conflict-markers-10.c	(revision 231712)
@@ -0,0 +1,25 @@ 
+/* { dg-options "-fdiagnostics-show-caret" } */
+
+<<<<<<< HEAD /* { dg-error "conflict marker" } */
+/* { dg-begin-multiline-output "" }
+ <<<<<<< HEAD
+ ^~~~~~~
+   { dg-end-multiline-output "" } */
+
+extern int some_var;
+
+=======      /* { dg-error "conflict marker" } */
+/* { dg-begin-multiline-output "" }
+ =======
+ ^~~~~~~
+   { dg-end-multiline-output "" } */
+
+extern short some_var; /* This line would lead to a warning due to the
+			  duplicate name, but it is skipped when handling
+			  the conflict marker.  */
+
+>>>>>>> Some commit message  /* { dg-error "conflict marker" } */
+/* { dg-begin-multiline-output "" }
+ >>>>>>>
+ ^~~~~~~
+   { dg-end-multiline-output "" } */
Index: gcc/testsuite/c-c++-common/conflict-markers-3.c
===================================================================
--- gcc/testsuite/c-c++-common/conflict-markers-3.c	(revision 0)
+++ gcc/testsuite/c-c++-common/conflict-markers-3.c	(revision 231712)
@@ -0,0 +1,11 @@ 
+/* Ensure we can handle unterminated conflict markers.  */
+
+int p;
+
+<<<<<<< HEAD  /* { dg-error "conflict marker" } */
+
+int q;
+
+<<<<<<< HEAD  /* { dg-error "conflict marker" } */
+
+int r;
Index: gcc/cp/ChangeLog
===================================================================
--- gcc/cp/ChangeLog	(revision 231711)
+++ gcc/cp/ChangeLog	(revision 231712)
@@ -1,3 +1,9 @@ 
+2015-12-16  David Malcolm  <dmalcolm@redhat.com>
+
+	* parser.c (cp_lexer_peek_conflict_marker): New function.
+	(cp_parser_error): Detect conflict markers and report them as
+	such.
+
 2015-12-15  Martin Sebor  <msebor@redhat.com>
 
 	c++/42121
Index: gcc/cp/parser.c
===================================================================
--- gcc/cp/parser.c	(revision 231711)
+++ gcc/cp/parser.c	(revision 231712)
@@ -2689,6 +2689,46 @@ 
   return token->keyword == keyword;
 }
 
+/* Helper function for cp_parser_error.
+   Having peeked a token of kind TOK1_KIND that might signify
+   a conflict marker, peek successor tokens to determine
+   if we actually do have a conflict marker.
+   Specifically, we consider a run of 7 '<', '=' or '>' characters
+   at the start of a line as a conflict marker.
+   These come through the lexer as three pairs and a single,
+   e.g. three CPP_LSHIFT tokens ("<<") and a CPP_LESS token ('<').
+   If it returns true, *OUT_LOC is written to with the location/range
+   of the marker.  */
+
+static bool
+cp_lexer_peek_conflict_marker (cp_lexer *lexer, enum cpp_ttype tok1_kind,
+			       location_t *out_loc)
+{
+  cp_token *token2 = cp_lexer_peek_nth_token (lexer, 2);
+  if (token2->type != tok1_kind)
+    return false;
+  cp_token *token3 = cp_lexer_peek_nth_token (lexer, 3);
+  if (token3->type != tok1_kind)
+    return false;
+  cp_token *token4 = cp_lexer_peek_nth_token (lexer, 4);
+  if (token4->type != conflict_marker_get_final_tok_kind (tok1_kind))
+    return false;
+
+  /* It must be at the start of the line.  */
+  location_t start_loc = cp_lexer_peek_token (lexer)->location;
+  if (LOCATION_COLUMN (start_loc) != 1)
+    return false;
+
+  /* We have a conflict marker.  Construct a location of the form:
+       <<<<<<<
+       ^~~~~~~
+     with start == caret, finishing at the end of the marker.  */
+  location_t finish_loc = get_finish (token4->location);
+  *out_loc = make_location (start_loc, start_loc, finish_loc);
+
+  return true;
+}
+
 /* If not parsing tentatively, issue a diagnostic of the form
       FILE:LINE: MESSAGE before TOKEN
    where TOKEN is the next token in the input stream.  MESSAGE
@@ -2713,6 +2753,19 @@ 
 	  return;
 	}
 
+      /* If this is actually a conflict marker, report it as such.  */
+      if (token->type == CPP_LSHIFT
+	  || token->type == CPP_RSHIFT
+	  || token->type == CPP_EQ_EQ)
+	{
+	  location_t loc;
+	  if (cp_lexer_peek_conflict_marker (parser->lexer, token->type, &loc))
+	    {
+	      error_at (loc, "version control conflict marker in file");
+	      return;
+	    }
+	}
+
       c_parse_error (gmsgid,
 		     /* Because c_parser_error does not understand
 			CPP_KEYWORD, keywords are treated like