Message ID | 877g3uo6cq.fsf@redhat.com |
---|---|
State | New |
Headers | show |
Hello Dodji, I found time this morning to run your changes through our system. I patched our gcc-4.8.1 with your latest change, and ran it through our folly testsuite. One thing that I immediately noticed was that this increased the preprocessed size substantially. When preprocessing my favorite .cpp file, its .ii grew from 137k lines to 145k, a 5% increase. All the folly code compiled and ran successfully under the changes. I looked at some of the preprocessed output. I was pleased to see that consecutive macros that expanded entirely to system tokens did not insert unnecessary line directives between them. I did, however, notice that __LINE__ was treated as belonging to the calling file, even when its token appears in the system file. That is to say: CODE: // system macro #define FOO() sys_token __LINE__ sys_token // non-system callsite FOO() // preprocessed output # 3 "test.cpp" 3 4 sys_token # 3 "test.cpp" 3 # 3 "test.cpp" 3 4 sys_token :CODE This seems to generalize to other builtin macros, like __FILE__. Otherwise, the code looks fine. There is only one thing I noticed: > + if (do_line_adjustments > + && !in_pragma > + && !line_marker_emitted > + && print.prev_was_system_token != !!in_system_header_at(loc)) > + /* The system-ness of this token is different from the one > + of the previous token. Let's emit a line change to > + mark the new system-ness before we emit the token. */ > + line_marker_emitted = do_line_change (pfile, token, loc, false); This line_marker_emitted assignment is immediately overwritten, two lines below. However, from a maintainability perspective, this is probably a good assignment to keep. > cpp_output_token (token, print.outf); > + line_marker_emitted = false; > } Thanks for this diff! Cheers, Nicholas
Hello Nicholas, Nicholas Ormrod <nicholas.ormrod@hotmail.com> writes: > I found time this morning to run your changes through our system. I > patched our gcc-4.8.1 with your latest change, and ran it through our > folly testsuite. Thanks! > One thing that I immediately noticed was that this increased the preprocessed size substantially. > When preprocessing my favorite .cpp file, its .ii grew from 137k lines > to 145k, a 5% increase. Yeah, the growth is expected. It's interesting to have actual numbers, thank you for that. > All the folly code compiled and ran successfully under the changes. > > I looked at some of the preprocessed output. I was pleased to see that > consecutive macros that expanded entirely to system tokens did not > insert unnecessary line directives between them. Cool! > I did, however, notice that __LINE__ was treated as belonging to the > calling file, even when its token appears in the system file. That is > to say: > > CODE: > > // system macro > #define FOO() sys_token __LINE__ sys_token > > // non-system callsite > FOO() > > // preprocessed output > # 3 "test.cpp" 3 4 > sys_token > # 3 "test.cpp" > 3 > # 3 "test.cpp" 3 4 > sys_token > > :CODE Yeah. For Built-in tokens that are expanded like that we only do track their the location of their expansion, not their spelling location. So this behaviour is expected as well. > This seems to generalize to other builtin macros, like __FILE__. Indeed. > > Otherwise, the code looks fine. There is only one thing I noticed: > >> + if (do_line_adjustments >> + && !in_pragma >> + && !line_marker_emitted >> + && print.prev_was_system_token != !!in_system_header_at(loc)) >> + /* The system-ness of this token is different from the one >> + of the previous token. Let's emit a line change to >> + mark the new system-ness before we emit the token. */ >> + line_marker_emitted = do_line_change (pfile, token, loc, false); > > This line_marker_emitted assignment is immediately overwritten, two lines below. However, from a > maintainability perspective, this is probably a good assignment to > keep. Yeah, maintainability is why I kept it. But if the maintainers feel strongly about it I can, certainly just remove that assignment. Thank you for your time on this! Cheers.
On 07/04/2014 05:13 AM, Dodji Seketeli wrote: >> >// preprocessed output >> ># 3 "test.cpp" 3 4 >> >sys_token >> ># 3 "test.cpp" >> >3 >> ># 3 "test.cpp" 3 4 >> >sys_token > Yeah. For Built-in tokens that are expanded like that we only do track > their the location of their expansion, not their spelling location. So > this behaviour is expected as well. But surely you can do something to avoid this useless line marker in this case? A built-in token should never require a line change. Jason
Jason Merrill <jason@redhat.com> writes: > On 07/04/2014 05:13 AM, Dodji Seketeli wrote: >>> >// preprocessed output >>> ># 3 "test.cpp" 3 4 >>> >sys_token >>> ># 3 "test.cpp" >>> >3 >>> ># 3 "test.cpp" 3 4 >>> >sys_token > >> Yeah. For Built-in tokens that are expanded like that we only do track >> their the location of their expansion, not their spelling location. So >> this behaviour is expected as well. > > But surely you can do something to avoid this useless line marker in > this case? A built-in token should never require a line change. What triggers the line change is the *expansion* of the built-in macro __LINE__. That is, the token "3". As the existing location tracking facility doesn't track the locations for tokens originating from the expansion of built-in macros, the relationship "3" -> __LINE__ is lost. My understanding is that to avoid emitting the line marker in this case, we'd need to track the "3" -> __LINE__ relationship so that we can detect that "3" is the expansion of the built-in macro__LINE__. This would be a separate patch that I'd need to work on and test separately I guess. Should that prevent this patch to go in now?
On 07/10/2014 08:13 AM, Dodji Seketeli wrote: > Jason Merrill <jason@redhat.com> writes: > >> On 07/04/2014 05:13 AM, Dodji Seketeli wrote: >>>>> // preprocessed output >>>>> # 3 "test.cpp" 3 4 >>>>> sys_token >>>>> # 3 "test.cpp" >>>>> 3 >>>>> # 3 "test.cpp" 3 4 >>>>> sys_token >> >>> Yeah. For Built-in tokens that are expanded like that we only do track >>> their the location of their expansion, not their spelling location. So >>> this behaviour is expected as well. >> >> But surely you can do something to avoid this useless line marker in >> this case? A built-in token should never require a line change. > > What triggers the line change is the *expansion* of the built-in macro > __LINE__. > > That is, the token "3". As the existing location tracking facility > doesn't track the locations for tokens originating from the expansion of > built-in macros, the relationship "3" -> __LINE__ is lost. So what is the location of "3" in this case? It seems to me that it doesn't really have its own location, and so we should be able to use whatever the current location is and not emit a line note. Jason
The fact that extra line directives are inserted around built-in tokens isn't ideal, but I must concur with Dodji's assessment that such a fix belongs in a separate patch. The purpose of this patch is to resolve a discrepancy between integrated-cpp and non-integrated-cpp. The locations of built-in tokens is odd, but consistent. It is an orthogonal issue, and therefore should not be fixed as part of this diff. In terms of practical considerations, the impact of the extra line directives is minimal. The main concern of the two reporters (see gcc.gnu.org/bugzilla/show_bug.cgi?id=60723) is resolved by this patch, and is completely unaffected by the extra directives. I think that this patch is good to go. Regards, Nicholas P.S. Dodji, if you are also going to fix the locations of built-in tokens, it would also be worth adjusting their expansion locations. As mentioned in the bug report, built-in tokens are expanded at the closing paren of a macro call, whereas non-built-in tokens are expanded at the opening paren. This is weird.
Hello, Jason Merrill <jason@redhat.com> writes: >>>>>> // preprocessed output >>>>>> # 3 "test.cpp" 3 4 >>>>>> sys_token >>>>>> # 3 "test.cpp" >>>>>> 3 >>>>>> # 3 "test.cpp" 3 4 >>>>>> sys_token >>> >>>> Yeah. For Built-in tokens that are expanded like that we only do track >>>> their the location of their expansion, not their spelling location. So >>>> this behaviour is expected as well. >>> >>> But surely you can do something to avoid this useless line marker in >>> this case? A built-in token should never require a line change. >> >> What triggers the line change is the *expansion* of the built-in macro >> __LINE__. >> >> That is, the token "3". As the existing location tracking facility >> doesn't track the locations for tokens originating from the expansion of >> built-in macros, the relationship "3" -> __LINE__ is lost. > > So what is the location of "3" in this case? The location of the token "3" is 3, in that case. It's the line number of the expansion point of the __LINE__ built-in macro. The token "3" does not have a virtual location that allows us to "unwind" back to the special built-in spelling location 1, that would mean that "3" originates from a built-in macro. > It seems to me that it doesn't really have its own location, and so > we should be able to use whatever the current location is and not emit > a line note. The issue is that the location for "3" is not virtual, so there is not much we can do about it. But then I have just wrote the support for making that "3" token have a virtual location. I am sending a small patchset as a follow-up to this message. Cheers.
Hello Nicholas, Nicholas Ormrod <nicholas.ormrod@hotmail.com> writes: [...] > If you are also going to fix the locations of built-in tokens, it > would also be worth adjusting their expansion locations. As mentioned > in the bug report, built-in tokens are expanded at the closing paren > of a macro call, whereas non-built-in tokens are expanded at the > opening paren. This is weird. Yes it is weird. From what I understood while looking at this, this is also a separate issue that ought to be addressed in a separate patch with a separate test case. I'll look at that. Cheers.
diff --git a/gcc/c-family/c-ppoutput.c b/gcc/c-family/c-ppoutput.c index f3b5fa4..400d3a7 100644 --- a/gcc/c-family/c-ppoutput.c +++ b/gcc/c-family/c-ppoutput.c @@ -36,6 +36,8 @@ static struct unsigned char printed; /* Nonzero if something output at line. */ bool first_time; /* pp_file_change hasn't been called yet. */ const char *src_file; /* Current source file. */ + bool prev_was_system_token; /* True if the previous token was a + system token.*/ } print; /* Defined and undefined macros being queued for output with -dU at @@ -58,11 +60,11 @@ static void account_for_newlines (const unsigned char *, size_t); static int dump_macro (cpp_reader *, cpp_hashnode *, void *); static void dump_queued_macros (cpp_reader *); -static void print_line_1 (source_location, const char*, FILE *); -static void print_line (source_location, const char *); -static void maybe_print_line_1 (source_location, FILE *); -static void maybe_print_line (source_location); -static void do_line_change (cpp_reader *, const cpp_token *, +static bool print_line_1 (source_location, const char*, FILE *); +static bool print_line (source_location, const char *); +static bool maybe_print_line_1 (source_location, FILE *); +static bool maybe_print_line (source_location); +static bool do_line_change (cpp_reader *, const cpp_token *, source_location, int); /* Callback routines for the parser. Most of these are active only @@ -156,6 +158,7 @@ init_pp_output (FILE *out_stream) print.outf = out_stream; print.first_time = 1; print.src_file = ""; + print.prev_was_system_token = false; } /* Writes out the preprocessed file, handling spacing and paste @@ -168,6 +171,7 @@ scan_translation_unit (cpp_reader *pfile) = cpp_get_options (parse_in)->lang != CLK_ASM && !flag_no_line_commands; bool in_pragma = false; + bool line_marker_emitted = false; print.source = NULL; for (;;) @@ -200,7 +204,7 @@ scan_translation_unit (cpp_reader *pfile) && do_line_adjustments && !in_pragma) { - do_line_change (pfile, token, loc, false); + line_marker_emitted = do_line_change (pfile, token, loc, false); putc (' ', print.outf); } else if (print.source->flags & PREV_WHITE @@ -216,7 +220,7 @@ scan_translation_unit (cpp_reader *pfile) if (src_line != print.src_line && do_line_adjustments && !in_pragma) - do_line_change (pfile, token, loc, false); + line_marker_emitted = do_line_change (pfile, token, loc, false); putc (' ', print.outf); } @@ -228,7 +232,7 @@ scan_translation_unit (cpp_reader *pfile) const char *space; const char *name; - maybe_print_line (token->src_loc); + line_marker_emitted = maybe_print_line (token->src_loc); fputs ("#pragma ", print.outf); c_pp_lookup_pragma (token->val.pragma, &space, &name); if (space) @@ -248,9 +252,20 @@ scan_translation_unit (cpp_reader *pfile) if (cpp_get_options (parse_in)->debug) linemap_dump_location (line_table, token->src_loc, print.outf); + + if (do_line_adjustments + && !in_pragma + && !line_marker_emitted + && print.prev_was_system_token != !!in_system_header_at(loc)) + /* The system-ness of this token is different from the one + of the previous token. Let's emit a line change to + mark the new system-ness before we emit the token. */ + line_marker_emitted = do_line_change (pfile, token, loc, false); cpp_output_token (token, print.outf); + line_marker_emitted = false; } + print.prev_was_system_token = !!in_system_header_at(loc); /* CPP_COMMENT tokens and raw-string literal tokens can have embedded new-line characters. Rather than enumerating all the possible token types just check if token uses @@ -275,7 +290,7 @@ scan_translation_unit_directives_only (cpp_reader *pfile) struct _cpp_dir_only_callbacks cb; cb.print_lines = print_lines_directives_only; - cb.maybe_print_line = maybe_print_line; + cb.maybe_print_line = (void (*) (source_location)) maybe_print_line; _cpp_preprocess_dir_only (pfile, &cb); } @@ -306,11 +321,13 @@ scan_translation_unit_trad (cpp_reader *pfile) /* If the token read on logical line LINE needs to be output on a different line to the current one, output the required newlines or - a line marker, and return 1. Otherwise return 0. */ + a line marker. If a line marker was emitted, return TRUE otherwise + return FALSE. */ -static void +static bool maybe_print_line_1 (source_location src_loc, FILE *stream) { + bool emitted_line_marker = false; int src_line = LOCATION_LINE (src_loc); const char *src_file = LOCATION_FILE (src_loc); @@ -334,29 +351,34 @@ maybe_print_line_1 (source_location src_loc, FILE *stream) } } else - print_line_1 (src_loc, "", stream); + emitted_line_marker = print_line_1 (src_loc, "", stream); + return emitted_line_marker; } /* If the token read on logical line LINE needs to be output on a different line to the current one, output the required newlines or - a line marker, and return 1. Otherwise return 0. */ + a line marker. If a line marker was emitted, return TRUE otherwise + return FALSE. */ -static void +static bool maybe_print_line (source_location src_loc) { if (cpp_get_options (parse_in)->debug) linemap_dump_location (line_table, src_loc, print.outf); - maybe_print_line_1 (src_loc, print.outf); + return maybe_print_line_1 (src_loc, print.outf); } /* Output a line marker for logical line LINE. Special flags are "1" - or "2" indicating entering or leaving a file. */ + or "2" indicating entering or leaving a file. If the line marker + was effectively emitted, return TRUE otherwise return FALSE. */ -static void +static bool print_line_1 (source_location src_loc, const char *special_flags, FILE *stream) { + bool emitted_line_marker = false; + /* End any previous line of text. */ if (print.printed) putc ('\n', stream); @@ -391,33 +413,39 @@ print_line_1 (source_location src_loc, const char *special_flags, FILE *stream) fputs (" 3", stream); putc ('\n', stream); + emitted_line_marker = true; } + + return emitted_line_marker; } /* Output a line marker for logical line LINE. Special flags are "1" - or "2" indicating entering or leaving a file. */ + or "2" indicating entering or leaving a file. Return TRUE if a + line marker was effectively emitted, FALSE otherwise. */ -static void +static bool print_line (source_location src_loc, const char *special_flags) { if (cpp_get_options (parse_in)->debug) linemap_dump_location (line_table, src_loc, print.outf); - print_line_1 (src_loc, special_flags, print.outf); + return print_line_1 (src_loc, special_flags, print.outf); } -/* Helper function for cb_line_change and scan_translation_unit. */ -static void +/* Helper function for cb_line_change and scan_translation_unit. + Return TRUE if a line marker is emitted, FALSE otherwise. */ +static bool do_line_change (cpp_reader *pfile, const cpp_token *token, source_location src_loc, int parsing_args) { + bool emitted_line_marker = false; if (define_queue || undef_queue) dump_queued_macros (pfile); if (token->type == CPP_EOF || parsing_args) - return; + return false; - maybe_print_line (src_loc); + emitted_line_marker = maybe_print_line (src_loc); print.prev = 0; print.source = 0; @@ -434,6 +462,8 @@ do_line_change (cpp_reader *pfile, const cpp_token *token, while (-- spaces >= 0) putc (' ', print.outf); } + + return emitted_line_marker; } /* Called when a line of output is started. TOKEN is the first token diff --git a/gcc/testsuite/gcc.dg/cpp/syshdr4.c b/gcc/testsuite/gcc.dg/cpp/syshdr4.c new file mode 100644 index 0000000..fe001d2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/cpp/syshdr4.c @@ -0,0 +1,24 @@ +/* Contributed by Nicholas Ormrod */ +/* Origin: PR preprocessor/60723 */ + +/* This tests that multi-line macro callsites, which are defined + in system headers and whose expansion contains a builtin followed + by a non-builtin token, do not generate a line directive that + mark the current file as being a system file, when performing + non-integrated preprocessing. */ +/* System files suppress div-by-zero warnings, so the presence of + such indicates the lack of the bug. + + { dg-do compile } + { dg-options -no-integrated-cpp } */ + +#include "syshdr4.h" +FOO( +) + +int +foo() +{ + return 1 / 0; /* { dg-warning "div-by-zero" } */ + return 0; +} diff --git a/gcc/testsuite/gcc.dg/cpp/syshdr4.h b/gcc/testsuite/gcc.dg/cpp/syshdr4.h new file mode 100644 index 0000000..c464f6e --- /dev/null +++ b/gcc/testsuite/gcc.dg/cpp/syshdr4.h @@ -0,0 +1,8 @@ +/* Contributed by Nicholas Ormrod + Origin: PR preprocessor/60723. + + This file is to be included by the syshdr4.c file. */ + +#pragma GCC system_header + +#define FOO() int line = __LINE__ ; diff --git a/gcc/testsuite/gcc.dg/cpp/syshdr5.c b/gcc/testsuite/gcc.dg/cpp/syshdr5.c new file mode 100644 index 0000000..42c6263 --- /dev/null +++ b/gcc/testsuite/gcc.dg/cpp/syshdr5.c @@ -0,0 +1,14 @@ +/* Origin: PR preprocessor/60723 + + { dg-do compile } + { dg-options -no-integrated-cpp } */ + +#include "syshdr5.h" + +int +main() +{ + FOO(1/0 /* { dg-warning "division by zero" } */ + ); + return 0; +} diff --git a/gcc/testsuite/gcc.dg/cpp/syshdr5.h b/gcc/testsuite/gcc.dg/cpp/syshdr5.h new file mode 100644 index 0000000..300d6c3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/cpp/syshdr5.h @@ -0,0 +1,6 @@ +/* Origin: PR preprocessor/60723 + + This header file is to be included by the syshdr5.c file. */ + +#pragma GCC system_header +#define FOO(A)do {int line = __LINE__ ; A;} while(0)