Message ID | CAH6eHdTfKxbucGxh7FGjRcWeNKhAqfQQO+k0pC7CjAaDKfy8Vw@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | PR preprocessor/84517 allow double-underscore macros after string literals | expand |
On Tue, Feb 27, 2018 at 9:41 AM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote: > Since the fix for PR c++/80955 any suffix on a string literal that > begins with an underscore is assumed to be a user-defined literal > suffix, not a macro. This assumption is invalid for a suffix beginning > with two underscores, because such names are reserved and can't be used > for UDLs anyway. [lex.name]/3 reserves all identifiers containing double underscore, but the spaceless one-token form does not actually use such an identifier. See the (equally reserved) _Bq example in [over.literal]/8: double operator""_Bq(long double); // OK: does not use the reserved identifier _Bq ([lex.name]) double operator"" _Bq(long double); // uses the reserved identifier _Bq ([lex.name])
On 27 February 2018 at 16:49, Tim Song <t.canens.cpp@gmail.com> wrote: > On Tue, Feb 27, 2018 at 9:41 AM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote: >> Since the fix for PR c++/80955 any suffix on a string literal that >> begins with an underscore is assumed to be a user-defined literal >> suffix, not a macro. This assumption is invalid for a suffix beginning >> with two underscores, because such names are reserved and can't be used >> for UDLs anyway. > > [lex.name]/3 reserves all identifiers containing double underscore, > but the spaceless one-token form does not actually use such an > identifier. > > See the (equally reserved) _Bq example in [over.literal]/8: > > double operator""_Bq(long double); // OK: does not > use the reserved identifier _Bq ([lex.name]) > double operator"" _Bq(long double); // uses the > reserved identifier _Bq ([lex.name]) I know, but GCC doesn't implement the rule accurately. I reported PR 80955 because GCC's UDL parsing meant we reject valid programs. The fix for that bug was a bit of a hack, simply adding a special case so that a suffix starting with an underscore is never expanded as a macro, which makes the above examples do the right thing. But that introduces a regression where we no longer accept ""__FILE__ (because it starts with an underscore), where previously that was accepted as an extension, just like "%"PRIu64 is accepted (with a warning). After the hack for 80955 we still do the wrong thing for: #define foo int operator""foo(); But that can't appear in a valid program, so we get away with it. But it's still a hack. My patch doesn't try to change how we parse operator""_Bq it just ensures we accept ""__FILE__ and similar cases. I think my patch means we reject: int operator""__X(unsigned long long); But that also can't appear in a valid program, so again we get away with it.
On 27 February 2018 at 16:59, Jonathan Wakely <jwakely.gcc@gmail.com> wrote: > On 27 February 2018 at 16:49, Tim Song <t.canens.cpp@gmail.com> wrote: >> On Tue, Feb 27, 2018 at 9:41 AM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote: >>> Since the fix for PR c++/80955 any suffix on a string literal that >>> begins with an underscore is assumed to be a user-defined literal >>> suffix, not a macro. This assumption is invalid for a suffix beginning >>> with two underscores, because such names are reserved and can't be used >>> for UDLs anyway. >> >> [lex.name]/3 reserves all identifiers containing double underscore, >> but the spaceless one-token form does not actually use such an >> identifier. >> >> See the (equally reserved) _Bq example in [over.literal]/8: >> >> double operator""_Bq(long double); // OK: does not >> use the reserved identifier _Bq ([lex.name]) >> double operator"" _Bq(long double); // uses the >> reserved identifier _Bq ([lex.name]) > > I know, but GCC doesn't implement the rule accurately. I reported PR > 80955 because GCC's UDL parsing meant we reject valid programs. The > fix for that bug was a bit of a hack, simply adding a special case so > that a suffix starting with an underscore is never expanded as a > macro, which makes the above examples do the right thing. But that > introduces a regression where we no longer accept ""__FILE__ (because > it starts with an underscore), where previously that was accepted as > an extension, just like "%"PRIu64 is accepted (with a warning). > > After the hack for 80955 we still do the wrong thing for: > > #define foo > int operator""foo(); > > But that can't appear in a valid program, so we get away with it. But > it's still a hack. > > My patch doesn't try to change how we parse operator""_Bq it just > ensures we accept ""__FILE__ and similar cases. I think my patch means > we reject: > > int operator""__X(unsigned long long); > > But that also can't appear in a valid program, so again we get away with it. Sorry for being inaccurate, what I mean is the problem case where a macro clashes with a UDL can't appear (because defining the macro would be invalid): #define __X int operator""__X(unsigned long long); However, that doesn't help for this case: int operator""__FILE__(unsigned long long); This is allowed as a UDL, and doesn't use a reserved identifier (because it doesn't define __FILE__, the implementation does), but would fail after my patch because __FILE__ gets incorrectly expanded as a macro. I don't have much sympathy for anybody defining such a UDL.
OK. On Tue, Feb 27, 2018 at 9:41 AM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote: > Since the fix for PR c++/80955 any suffix on a string literal that > begins with an underscore is assumed to be a user-defined literal > suffix, not a macro. This assumption is invalid for a suffix beginning > with two underscores, because such names are reserved and can't be used > for UDLs anyway. Checking for exactly one underscore restores support > for macro expansion in cases like "File: "__FILE__ or "Date: "__DATE__ > (which are formally ill-formed but accepted with a warning, as a > conforming extension). > > gcc/testsuite: > > PR preprocessor/84517 > * g++.dg/cpp0x/udlit-macros.C: Expect a warning for ""__FILE__. > > libcpp: > > PR preprocessor/84517 > * lex.c (is_macro_not_literal_suffix): New function. > (lex_raw_string, lex_string): Use is_macro_not_literal_suffix to > decide when to issue -Wliteral-suffix warnings. > > > Tested powerpc64le-linux, OK for trunk?
diff --git a/gcc/testsuite/g++.dg/cpp0x/udlit-macros.C b/gcc/testsuite/g++.dg/cpp0x/udlit-macros.C index fb518281811..7ef324b7e04 100644 --- a/gcc/testsuite/g++.dg/cpp0x/udlit-macros.C +++ b/gcc/testsuite/g++.dg/cpp0x/udlit-macros.C @@ -16,7 +16,7 @@ int operator""_ID(const char*, size_t) { return 0; } int main() { long i64 = 123; - char buf[100]; + char buf[] = "xxxxxx"__FILE__; // { dg-warning "invalid suffix on literal" } sprintf(buf, "%"PRId64"abc", i64); // { dg-warning "invalid suffix on literal" } return strcmp(buf, "123abc") + ""_zero diff --git a/libcpp/lex.c b/libcpp/lex.c index 92c62517a4d..37c365a3560 100644 --- a/libcpp/lex.c +++ b/libcpp/lex.c @@ -1630,6 +1630,21 @@ is_macro(cpp_reader *pfile, const uchar *base) return !result ? false : (result->type == NT_MACRO); } +/* Returns true if a literal suffix does not have the expected form + and is defined as a macro. */ + +static bool +is_macro_not_literal_suffix(cpp_reader *pfile, const uchar *base) +{ + /* User-defined literals outside of namespace std must start with a single + underscore, so assume anything of that form really is a UDL suffix. + We don't need to worry about UDLs defined inside namespace std because + their names are reserved, so cannot be used as macro names in valid + programs. */ + if (base[0] == '_' && base[1] != '_') + return false; + return is_macro (pfile, base); +} /* Lexes a raw string. The stored string contains the spelling, including double quotes, delimiter string, '(' and ')', any leading @@ -1900,10 +1915,8 @@ lex_raw_string (cpp_reader *pfile, cpp_token *token, const uchar *base, { /* If a string format macro, say from inttypes.h, is placed touching a string literal it could be parsed as a C++11 user-defined string - literal thus breaking the program. - Try to identify macros with is_macro. A warning is issued. - The macro name should not start with '_' for this warning. */ - if ((*cur != '_') && is_macro (pfile, cur)) + literal thus breaking the program. */ + if (is_macro_not_literal_suffix (pfile, cur)) { /* Raise a warning, but do not consume subsequent tokens. */ if (CPP_OPTION (pfile, warn_literal_suffix) && !pfile->state.skipping) @@ -2031,10 +2044,8 @@ lex_string (cpp_reader *pfile, cpp_token *token, const uchar *base) { /* If a string format macro, say from inttypes.h, is placed touching a string literal it could be parsed as a C++11 user-defined string - literal thus breaking the program. - Try to identify macros with is_macro. A warning is issued. - The macro name should not start with '_' for this warning. */ - if ((*cur != '_') && is_macro (pfile, cur)) + literal thus breaking the program. */ + if (is_macro_not_literal_suffix (pfile, cur)) { /* Raise a warning, but do not consume subsequent tokens. */ if (CPP_OPTION (pfile, warn_literal_suffix) && !pfile->state.skipping)