diff mbox

[committed] Fix for PR preprocessor/78680

Message ID 1481566454-12915-1-git-send-email-dmalcolm@redhat.com
State New
Headers show

Commit Message

David Malcolm Dec. 12, 2016, 6:14 p.m. UTC
PR preprocessor/78680 identifies a crash when attempting to issue
a -Wformat warning, where the format string includes a string token
split across multiple physical source lines via backslash-continued
lines.

The issue is that libcpp is generating bogus range information for
such tokens.

For example, in:

void fn1() {
  __builtin_printf("\
     %ld.\n\
        2\n"); };

the range of the string token is printed as:

   __builtin_printf("\
                    ^~

whereas the range ought to be:

  __builtin_printf("\
                   ^~
     %ld.\n\
     ~~~~~~~
        2\n"); };
        ~~~~

The root cause is that the line notes expressing the update
of the buffer in lex.c aren't yet updated when the end-point of
the token is computed

3095	    tok_range.m_finish
3096	      = linemap_position_for_column (pfile->line_table,
3097					     CPP_BUF_COLUMN (buffer, buffer->cur));

so that the physical line is still regarded as that of the start
of the token, and, where CPP_BUF_COLUMN uses (BUF)->line_base,
line_base is still the location of the first physical line in the
and hence the column information is too large (as if it were the
offset in the *logical* line).

(the printed range is somewhat misleading; the actual buggy range
extends beyond the "\ in the line, but within diagnostic-show-locus.c
layout::print_annotation_line only prints up to the xbound set by
layout::print_source_line and so truncates most of the buggy range).

The fix is to ensure that line notes are handled before calculating
the end-point of the token range.

This leads to the range for the string token being correctly
computed, as:

  __builtin_printf("\
                   ^~
     %ld.\n\
     ~~~~~~~
        2\n"); };
        ~~~~

and this leads to get_substring_ranges_for_loc failing gracefully,
rather than crashing.

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu; adds
10 PASS results to gcc.sum.

Committed to trunk as r243567.

gcc/testsuite/ChangeLog:
	PR preprocessor/78680
	* gcc.dg/format/pr78680.c: New test case.
	* gcc.dg/plugin/diagnostic-test-expressions-1.c
	(test_multiline_token): New function.
	* gcc.dg/plugin/diagnostic-test-string-literals-1.c
	(test_backslash_continued_logical_lines): New function.

libcpp/ChangeLog:
	PR preprocessor/78680
	* lex.c (_cpp_lex_direct): Ensure line notes are processed before
	computing the end-point of the token.
---
 gcc/testsuite/gcc.dg/format/pr78680.c                | 16 ++++++++++++++++
 .../gcc.dg/plugin/diagnostic-test-expressions-1.c    | 19 +++++++++++++++++++
 .../plugin/diagnostic-test-string-literals-1.c       | 20 ++++++++++++++++++++
 libcpp/lex.c                                         |  7 +++++++
 4 files changed, 62 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/format/pr78680.c

Comments

Andreas Schwab Dec. 14, 2016, 10:54 a.m. UTC | #1
This crashes the compiler on ia64.

FAIL: c-c++-common/raw-string-6.c  -Wc++-compat  (test for excess errors)
Excess errors:
/usr/local/gcc/gcc-20161214/gcc/testsuite/c-c++-common/raw-string-6.c:5:1: internal compiler error: Segmentation fault
0x400000000108571f crash_signal
	../../gcc/toplev.c:333
0x4000000002024dcf _cpp_process_line_notes
	../../libcpp/lex.c:1108
0x400000000202b5af _cpp_lex_direct
	../../libcpp/lex.c:3097
0x400000000202f0ff _cpp_lex_token
	../../libcpp/lex.c:2554
0x400000000204169f cpp_get_token_1
	../../libcpp/macro.c:2462
0x40000000004252af c_lex_with_flags(tree_node**, unsigned int*, unsigned char*, int)
	../../gcc/c-family/c-lex.c:398
0x40000000002e4eaf c_lex_one_token
	../../gcc/c/c-parser.c:244
0x40000000003404ff c_parser_peek_token(c_parser*)
	../../gcc/c/c-parser.c:431
0x40000000003404ff c_parser_declaration_or_fndef
	../../gcc/c/c-parser.c:1907
0x40000000003574ef c_parser_external_declaration
	../../gcc/c/c-parser.c:1463
0x400000000035905f c_parser_translation_unit
	../../gcc/c/c-parser.c:1343
0x400000000035905f c_parse_file()
	../../gcc/c/c-parser.c:18140
0x400000000043e5ef c_common_parse_file()
	../../gcc/c-family/c-opts.c:1098

Andreas.
James Greenhalgh Dec. 14, 2016, 12:28 p.m. UTC | #2
On Wed, Dec 14, 2016 at 11:54:21AM +0100, Andreas Schwab wrote:
> This crashes the compiler on ia64.
> 
> FAIL: c-c++-common/raw-string-6.c  -Wc++-compat  (test for excess errors)
> Excess errors:
> /usr/local/gcc/gcc-20161214/gcc/testsuite/c-c++-common/raw-string-6.c:5:1: internal compiler error: Segmentation fault
> 0x400000000108571f crash_signal
> 	../../gcc/toplev.c:333
> 0x4000000002024dcf _cpp_process_line_notes
> 	../../libcpp/lex.c:1108
> 0x400000000202b5af _cpp_lex_direct
> 	../../libcpp/lex.c:3097
> 0x400000000202f0ff _cpp_lex_token
> 	../../libcpp/lex.c:2554
> 0x400000000204169f cpp_get_token_1
> 	../../libcpp/macro.c:2462
> 0x40000000004252af c_lex_with_flags(tree_node**, unsigned int*, unsigned char*, int)
> 	../../gcc/c-family/c-lex.c:398
> 0x40000000002e4eaf c_lex_one_token
> 	../../gcc/c/c-parser.c:244
> 0x40000000003404ff c_parser_peek_token(c_parser*)
> 	../../gcc/c/c-parser.c:431
> 0x40000000003404ff c_parser_declaration_or_fndef
> 	../../gcc/c/c-parser.c:1907
> 0x40000000003574ef c_parser_external_declaration
> 	../../gcc/c/c-parser.c:1463
> 0x400000000035905f c_parser_translation_unit
> 	../../gcc/c/c-parser.c:1343
> 0x400000000035905f c_parse_file()
> 	../../gcc/c/c-parser.c:18140
> 0x400000000043e5ef c_common_parse_file()
> 	../../gcc/c-family/c-opts.c:1098

I'm seeing the same failure, with the same backtrace on aarch64*-none-elf
and arm*-none-eabi targets when this test is run with a c++ compiler. 

Thanks,
James
David Malcolm Dec. 14, 2016, 3:14 p.m. UTC | #3
On Wed, 2016-12-14 at 11:54 +0100, Andreas Schwab wrote:
> This crashes the compiler on ia64.

Sorry about the breakage.

I'm able to reproduce this under valgrind on x86_64 (looks like a read
after free); I'm working on a fix.

Thanks
Dave

> FAIL: c-c++-common/raw-string-6.c  -Wc++-compat  (test for excess
> errors)
> Excess errors:
> /usr/local/gcc/gcc-20161214/gcc/testsuite/c-c++-common/raw-string
> -6.c:5:1: internal compiler error: Segmentation fault
> 0x400000000108571f crash_signal
> 	../../gcc/toplev.c:333
> 0x4000000002024dcf _cpp_process_line_notes
> 	../../libcpp/lex.c:1108
> 0x400000000202b5af _cpp_lex_direct
> 	../../libcpp/lex.c:3097
> 0x400000000202f0ff _cpp_lex_token
> 	../../libcpp/lex.c:2554
> 0x400000000204169f cpp_get_token_1
> 	../../libcpp/macro.c:2462
> 0x40000000004252af c_lex_with_flags(tree_node**, unsigned int*,
> unsigned char*, int)
> 	../../gcc/c-family/c-lex.c:398
> 0x40000000002e4eaf c_lex_one_token
> 	../../gcc/c/c-parser.c:244
> 0x40000000003404ff c_parser_peek_token(c_parser*)
> 	../../gcc/c/c-parser.c:431
> 0x40000000003404ff c_parser_declaration_or_fndef
> 	../../gcc/c/c-parser.c:1907
> 0x40000000003574ef c_parser_external_declaration
> 	../../gcc/c/c-parser.c:1463
> 0x400000000035905f c_parser_translation_unit
> 	../../gcc/c/c-parser.c:1343
> 0x400000000035905f c_parse_file()
> 	../../gcc/c/c-parser.c:18140
> 0x400000000043e5ef c_common_parse_file()
> 	../../gcc/c-family/c-opts.c:1098
> 
> Andreas.
>
diff mbox

Patch

diff --git a/gcc/testsuite/gcc.dg/format/pr78680.c b/gcc/testsuite/gcc.dg/format/pr78680.c
new file mode 100644
index 0000000..0c599f3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/format/pr78680.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wall -Wextra -fdiagnostics-show-caret" } */
+
+void fn1() {
+  __builtin_printf("\
+     %ld.\n\
+        2\n"); };
+/* { dg-warning "expects a matching" "" { target *-*-* } .-3 } */
+/* { dg-begin-multiline-output "" }
+   __builtin_printf("\
+                    ^~
+      %ld.\n\
+      ~~~~~~~        
+         2\n"); };
+         ~~~~        
+   { dg-end-multiline-output "" } */
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-1.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-1.c
index 9372936..afbe0f7 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-1.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-1.c
@@ -689,3 +689,22 @@  void test_multiple_ordinary_maps (void)
         ~~                      
    { dg-end-multiline-output "" } */
 }
+
+/* Verify that we correctly handle a token that spans multiple
+   physical lines.  */
+
+const char *test_multiline_token (void)
+{
+  __emit_expression_range (0, "foo\
+bar\
+baz");
+/* { dg-warning "range" "" { target *-*-* } .-3 } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, "foo\
+                               ^~~~~
+ bar\
+ ~~~~                           
+ baz");
+ ~~~~                           
+   { dg-end-multiline-output "" } */
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c
index 76a085e..03f042a 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c
@@ -272,3 +272,23 @@  test_terminator_location (void)
                                            ^
    { dg-end-multiline-output "" } */
 }
+
+/* Verify that we fail gracefully when a string literal token is split
+   across multiple physical lines.  */
+
+void
+test_backslash_continued_logical_lines (void)
+{
+  __emit_string_literal_range ("\
+01234\
+56789", 6, 6, 7);
+  /* { dg-error "unable to read substring location: range endpoints are on different lines" "" { target *-*-* } .-3 } */
+  /* { dg-begin-multiline-output "" }
+   __emit_string_literal_range ("\
+                                ^~
+ 01234\
+ ~~~~~~                          
+ 56789", 6, 6, 7);
+ ~~~~~~                          
+   { dg-end-multiline-output "" } */
+}
diff --git a/libcpp/lex.c b/libcpp/lex.c
index cea8848..ae45892 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -3089,6 +3089,13 @@  _cpp_lex_direct (cpp_reader *pfile)
       break;
     }
 
+  /* Ensure that any line notes are processed, so that we have the
+     correct physical line/column for the end-point of the token even
+     when a logical line is split via one or more backslashes.  */
+  if (buffer->cur >= buffer->notes[buffer->cur_note].pos
+      && !pfile->overlaid_buffer)
+    _cpp_process_line_notes (pfile, false);
+
   source_range tok_range;
   tok_range.m_start = result->src_loc;
   if (result->src_loc >= RESERVED_LOCATION_COUNT)