diff mbox series

[C++] PR 80955 (Macros expanded in definition of user-defined literals)

Message ID 6bd671aa-97cf-1dcc-ca90-5948ab533113@oracle.com
State New
Headers show
Series [C++] PR 80955 (Macros expanded in definition of user-defined literals) | expand

Commit Message

Mukesh Kapoor Oct. 20, 2017, 4:37 p.m. UTC
Hi,

This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80955.
Handle user-defined literals correctly in lex_string(). An empty string 
followed by an identifier is
a valid user-defined literal. Don't issue a warning for this case.

Bootstrapped and tested with 'make check' on x86_64-linux. New test case 
added. Ok for trunk?

Mukesh

/libcpp
2017-10-20  Mukesh Kapoor   <mukesh.kapoor@oracle.com>

	PR c++/80955
	* lex.c (lex_string): An empty string followed by an identifier is
	a valid user-defined literal. Don't issue a warning for this case.

/testsuite
2017-10-20  Mukesh Kapoor   <mukesh.kapoor@oracle.com>

	PR c++/80955
	* g++.dg/cpp0x/udlit-macros.C: New.

Comments

Nathan Sidwell Oct. 20, 2017, 5:45 p.m. UTC | #1
On 10/20/2017 12:37 PM, Mukesh Kapoor wrote:
> Hi,
> 
> This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80955.
> Handle user-defined literals correctly in lex_string(). An empty string 
> followed by an identifier is
> a valid user-defined literal. Don't issue a warning for this case.

a) why do we trigger on the definition of the operator function, and not 
on the use site?

b) Why is the empty string special cased?  Doesn't the same logic apply to:

int operator "bob"_zero (const char *, size_t) { return 0;}

(that'd be a syntactic error in the C++ parser of course)

nathan
Mukesh Kapoor Oct. 20, 2017, 6 p.m. UTC | #2
Hi,

On 10/20/2017 10:45 AM, Nathan Sidwell wrote:
> On 10/20/2017 12:37 PM, Mukesh Kapoor wrote:
>> Hi,
>>
>> This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80955.
>> Handle user-defined literals correctly in lex_string(). An empty 
>> string followed by an identifier is
>> a valid user-defined literal. Don't issue a warning for this case.
>
> a) why do we trigger on the definition of the operator function, and 
> not on the use site?

Actually, the current compiler issues an error (incorrectly) at both 
places: at the definition as well as at its use.

>
> b) Why is the empty string special cased?  Doesn't the same logic 
> apply to:
>
> int operator "bob"_zero (const char *, size_t) { return 0;}

This is not a valid user-defined literal and is already reported as an 
error by the compiler. After my changes it's still reported as an error.
The empty string immediately followed by an identifier is a special case 
because it's a valid user-defined literal in C++. ""_zero is a valid 
user-defined literal.

Mukesh

>
> (that'd be a syntactic error in the C++ parser of course)
>
> nathan
>
Mukesh Kapoor Oct. 20, 2017, 7:52 p.m. UTC | #3
On 10/20/2017 11:00 AM, Mukesh Kapoor wrote:
> Hi,
>
> On 10/20/2017 10:45 AM, Nathan Sidwell wrote:
>> On 10/20/2017 12:37 PM, Mukesh Kapoor wrote:
>>> Hi,
>>>
>>> This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80955.
>>> Handle user-defined literals correctly in lex_string(). An empty 
>>> string followed by an identifier is
>>> a valid user-defined literal. Don't issue a warning for this case.
>>
>> a) why do we trigger on the definition of the operator function, and 
>> not on the use site?
>
> Actually, the current compiler issues an error (incorrectly) at both 
> places: at the definition as well as at its use.
>
>>
>> b) Why is the empty string special cased?  Doesn't the same logic 
>> apply to:
>>
>> int operator "bob"_zero (const char *, size_t) { return 0;}
>
> This is not a valid user-defined literal and is already reported as an 
> error by the compiler. After my changes it's still reported as an error.
> The empty string immediately followed by an identifier is a special 
> case because it's a valid user-defined literal in C++. ""_zero is a 
> valid user-defined literal.

Sorry, I used incorrect terminology here. An empty string immediately 
followed by an identifier is a valid name for a literal operator; 
""_zero is a valid name for a literal operator.

Mukesh

>
> Mukesh
>
>>
>> (that'd be a syntactic error in the C++ parser of course)
>>
>> nathan
>>
>
Jason Merrill Oct. 24, 2017, 2:05 p.m. UTC | #4
On Fri, Oct 20, 2017 at 3:52 PM, Mukesh Kapoor <mukesh.kapoor@oracle.com> wrote:
> On 10/20/2017 11:00 AM, Mukesh Kapoor wrote:
>> On 10/20/2017 10:45 AM, Nathan Sidwell wrote:
>>> On 10/20/2017 12:37 PM, Mukesh Kapoor wrote:
>>>>
>>>> This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80955.
>>>> Handle user-defined literals correctly in lex_string(). An empty string
>>>> followed by an identifier is
>>>> a valid user-defined literal. Don't issue a warning for this case.
>>>
>>> a) why do we trigger on the definition of the operator function, and not
>>> on the use site?
>>
>> Actually, the current compiler issues an error (incorrectly) at both
>> places: at the definition as well as at its use.
>>
>>> b) Why is the empty string special cased?  Doesn't the same logic apply
>>> to:
>>>
>>> int operator "bob"_zero (const char *, size_t) { return 0;}
>>
>> This is not a valid user-defined literal and is already reported as an
>> error by the compiler. After my changes it's still reported as an error.
>> The empty string immediately followed by an identifier is a special case
>> because it's a valid user-defined literal in C++. ""_zero is a valid
>> user-defined literal.
>
> Sorry, I used incorrect terminology here. An empty string immediately
> followed by an identifier is a valid name for a literal operator; ""_zero is
> a valid name for a literal operator.

Yes, and indeed "bob"_zero isn't.  But it would be fine for the
testcase to return "bob"_zero, a valid user-defined string literal
which calls operator ""_zero, and that will still break after your
patch.

It seems like we need to handle this error recovery in the front end,
where we can look to see if there's a matching operator, rather than
in libcpp.

Jason
Mukesh Kapoor Oct. 25, 2017, 4:03 a.m. UTC | #5
On 10/24/2017 7:05 AM, Jason Merrill wrote:
> On Fri, Oct 20, 2017 at 3:52 PM, Mukesh Kapoor <mukesh.kapoor@oracle.com> wrote:
>> On 10/20/2017 11:00 AM, Mukesh Kapoor wrote:
>>> On 10/20/2017 10:45 AM, Nathan Sidwell wrote:
>>>> On 10/20/2017 12:37 PM, Mukesh Kapoor wrote:
>>>>> This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80955.
>>>>> Handle user-defined literals correctly in lex_string(). An empty string
>>>>> followed by an identifier is
>>>>> a valid user-defined literal. Don't issue a warning for this case.
>>>> a) why do we trigger on the definition of the operator function, and not
>>>> on the use site?
>>> Actually, the current compiler issues an error (incorrectly) at both
>>> places: at the definition as well as at its use.
>>>
>>>> b) Why is the empty string special cased?  Doesn't the same logic apply
>>>> to:
>>>>
>>>> int operator "bob"_zero (const char *, size_t) { return 0;}
>>> This is not a valid user-defined literal and is already reported as an
>>> error by the compiler. After my changes it's still reported as an error.
>>> The empty string immediately followed by an identifier is a special case
>>> because it's a valid user-defined literal in C++. ""_zero is a valid
>>> user-defined literal.
>> Sorry, I used incorrect terminology here. An empty string immediately
>> followed by an identifier is a valid name for a literal operator; ""_zero is
>> a valid name for a literal operator.
> Yes, and indeed "bob"_zero isn't.  But it would be fine for the
> testcase to return "bob"_zero, a valid user-defined string literal
> which calls operator ""_zero, and that will still break after your
> patch.
>
> It seems like we need to handle this error recovery in the front end,
> where we can look to see if there's a matching operator, rather than
> in libcpp.

Thanks for pointing this out. Checking in the front end will be 
difficult because the front end gets tokens after macro expansion. I 
think the difficulty of fixing this bug comes because of the requirement 
to maintain backward compatibility with the option -Wliteral-suffix for 
-std=c++11.

Mukesh

>
> Jason
Nathan Sidwell Oct. 25, 2017, 11:20 a.m. UTC | #6
On 10/25/2017 12:03 AM, Mukesh Kapoor wrote:

> Thanks for pointing this out. Checking in the front end will be 
> difficult because the front end gets tokens after macro expansion. I 
> think the difficulty of fixing this bug comes because of the requirement 
> to maintain backward compatibility with the option -Wliteral-suffix for 
> -std=c++11.

IIUC the warning's intent is to catch cases of:
printf ("some format"PRIx64 ..., ...);
where there's no space between the string literals and the PRIx64 macro. 
  I suspect it's very common for there to be a following string-literal, 
so perhaps the preprocessor could detect:

<string-literal>NON-FN-MACRO<maybe-space><string-literal>

and warn on that sequence?

nathan
Jason Merrill Oct. 25, 2017, 4:06 p.m. UTC | #7
On Wed, Oct 25, 2017 at 7:20 AM, Nathan Sidwell <nathan@acm.org> wrote:
> On 10/25/2017 12:03 AM, Mukesh Kapoor wrote:
>
>> Thanks for pointing this out. Checking in the front end will be difficult
>> because the front end gets tokens after macro expansion. I think the
>> difficulty of fixing this bug comes because of the requirement to maintain
>> backward compatibility with the option -Wliteral-suffix for -std=c++11.
>
>
> IIUC the warning's intent is to catch cases of:
> printf ("some format"PRIx64 ..., ...);
> where there's no space between the string literals and the PRIx64 macro.  I
> suspect it's very common for there to be a following string-literal, so
> perhaps the preprocessor could detect:
>
> <string-literal>NON-FN-MACRO<maybe-space><string-literal>
>
> and warn on that sequence?

Or perhaps we could limit the current behavior to when the macro
expands to a string literal?

Jason
Mukesh Kapoor Oct. 25, 2017, 4:13 p.m. UTC | #8
On 10/25/2017 9:06 AM, Jason Merrill wrote:
> On Wed, Oct 25, 2017 at 7:20 AM, Nathan Sidwell <nathan@acm.org> wrote:
>> On 10/25/2017 12:03 AM, Mukesh Kapoor wrote:
>>
>>> Thanks for pointing this out. Checking in the front end will be difficult
>>> because the front end gets tokens after macro expansion. I think the
>>> difficulty of fixing this bug comes because of the requirement to maintain
>>> backward compatibility with the option -Wliteral-suffix for -std=c++11.
>>
>> IIUC the warning's intent is to catch cases of:
>> printf ("some format"PRIx64 ..., ...);
>> where there's no space between the string literals and the PRIx64 macro.  I
>> suspect it's very common for there to be a following string-literal, so
>> perhaps the preprocessor could detect:
>>
>> <string-literal>NON-FN-MACRO<maybe-space><string-literal>
>>
>> and warn on that sequence?
> Or perhaps we could limit the current behavior to when the macro
> expands to a string literal?

To check this the macro will have to be expanded completely. A macro can 
be defined using other macros. The header <inttypes.h> has the following 
definition for PRId64:

#  define __PRI64_PREFIX        "l"
# define PRId64         __PRI64_PREFIX "d"

Mukesh

>
> Jason
Mukesh Kapoor Oct. 26, 2017, 1:44 a.m. UTC | #9
On 10/25/2017 4:20 AM, Nathan Sidwell wrote:
> On 10/25/2017 12:03 AM, Mukesh Kapoor wrote:
>
>> Thanks for pointing this out. Checking in the front end will be 
>> difficult because the front end gets tokens after macro expansion. I 
>> think the difficulty of fixing this bug comes because of the 
>> requirement to maintain backward compatibility with the option 
>> -Wliteral-suffix for -std=c++11.
>
> IIUC the warning's intent is to catch cases of:
> printf ("some format"PRIx64 ..., ...);
> where there's no space between the string literals and the PRIx64 
> macro.  I suspect it's very common for there to be a following 
> string-literal, so perhaps the preprocessor could detect:
>
> <string-literal>NON-FN-MACRO<maybe-space><string-literal>
>
> and warn on that sequence?

Yes, this can be done easily and this is also the usage mentioned in the 
man page. I made this change in the compiler, bootstrapped it and ran 
the tests. The following two tests fail after the fix:

g++.dg/cpp0x/Wliteral-suffix.C
g++.dg/cpp0x/warn_cxx0x4.C

Both tests have code similar to the following (from Wliteral-suffix.C):

#define BAR "bar"
#define PLUS_ONE + 1

   char c = '3'PLUS_ONE;   // { dg-warning "invalid suffix on literal" }
   char s[] = "foo"BAR;    // { dg-warning "invalid suffix on literal" }

Other compilers don't accept this code. Maybe I should just modify these 
tests to have error messages instead of warnings and submit my revised fix?

Mukesh

>
> nathan
>
Mukesh Kapoor Oct. 31, 2017, 4:17 p.m. UTC | #10
On 10/25/2017 6:44 PM, Mukesh Kapoor wrote:
> On 10/25/2017 4:20 AM, Nathan Sidwell wrote:
>> On 10/25/2017 12:03 AM, Mukesh Kapoor wrote:
>>
>>> Thanks for pointing this out. Checking in the front end will be 
>>> difficult because the front end gets tokens after macro expansion. I 
>>> think the difficulty of fixing this bug comes because of the 
>>> requirement to maintain backward compatibility with the option 
>>> -Wliteral-suffix for -std=c++11.
>>
>> IIUC the warning's intent is to catch cases of:
>> printf ("some format"PRIx64 ..., ...);
>> where there's no space between the string literals and the PRIx64 
>> macro.  I suspect it's very common for there to be a following 
>> string-literal, so perhaps the preprocessor could detect:
>>
>> <string-literal>NON-FN-MACRO<maybe-space><string-literal>
>>
>> and warn on that sequence?
>
> Yes, this can be done easily and this is also the usage mentioned in 
> the man page. I made this change in the compiler, bootstrapped it and 
> ran the tests. The following two tests fail after the fix:
>
> g++.dg/cpp0x/Wliteral-suffix.C
> g++.dg/cpp0x/warn_cxx0x4.C
>
> Both tests have code similar to the following (from Wliteral-suffix.C):
>
> #define BAR "bar"
> #define PLUS_ONE + 1
>
>   char c = '3'PLUS_ONE;   // { dg-warning "invalid suffix on literal" }
>   char s[] = "foo"BAR;    // { dg-warning "invalid suffix on literal" }
>
> Other compilers don't accept this code. Maybe I should just modify 
> these tests to have error messages instead of warnings and submit my 
> revised fix?

Actually, according to the man page for -Wliteral-suffix, only macro 
names that don't start with an underscore should be considered when 
issuing a warning:

        -Wliteral-suffix (C++ and Objective-C++ only)
            Warn when a string or character literal is followed by a 
ud-suffix
            which does not begin with an underscore...

So the fix is simply to check if the macro name in is_macro() starts 
with an underscore. The function is_macro() is called only at three 
places. At two places it's used to check for the warning related to 
-Wliteral-suffix and the check for underscore should be made for these 
two cases; at one place it is used to check for the warning related to 
-Wc++11-compat and there is no need to check for underscore for this case.

The fix is simply to pass a bool flag as an additional argument to 
is_macro() to decide whether the macro name starts with an underscore or 
not. I have tested the attached patch on x86_64-linux. Thanks.

Mukesh
/libcpp
2017-10-31  Mukesh Kapoor   <mukesh.kapoor@oracle.com>

	PR c++/80955
	* lex.c (lex_string): Add an argument, 'bool no_underscore', to
	is_macro(). If no_underscore is true, check if the macro name
	starts with an underscore and return false if it does.
	If no_underscore is false, don't check for underscore. 
	is_macro() is called at three places. At two places it's used to
	check for the warning related to -Wliteral-suffix and the check for
	underscore is made for these two cases. At one place it's used to
	check for the warning related to -Wc++11-compat and the check for
	underscore is not made for this case.

/testsuite
2017-10-31  Mukesh Kapoor   <mukesh.kapoor@oracle.com>

	PR c++/80955
	* g++.dg/cpp0x/udlit-macros.C: New.
Index: gcc/testsuite/g++.dg/cpp0x/udlit-macros.C
===================================================================
--- gcc/testsuite/g++.dg/cpp0x/udlit-macros.C	(revision 0)
+++ gcc/testsuite/g++.dg/cpp0x/udlit-macros.C	(working copy)
@@ -0,0 +1,31 @@
+// PR c++/80955
+// { dg-do compile { target c++11 } }
+
+#define __STDC_FORMAT_MACROS
+#include <inttypes.h>
+#include <stdio.h>
+#include <string.h>
+
+using size_t = decltype(sizeof(0));
+#define _zero
+#define _ID _xx
+int operator""_zero(const char*, size_t) { return 0; }
+int operator""_ID(const char*, size_t) { return 0; }
+
+int main()
+{
+  int64_t i64 = 123;
+  char buf[100];
+  sprintf(buf, "%"PRId64"abc", i64);  // { dg-warning "invalid suffix on literal" }
+  return strcmp(buf, "123abc")
+	 + ""_zero
+	 + "bob"_zero
+	 + R"#(raw
+	       string)#"_zero
+	 + "xx"_ID
+	 + ""_ID
+	 + R"AA(another
+		raw
+		string)AA"_ID;
+}
+
Index: libcpp/lex.c
===================================================================
--- libcpp/lex.c	(revision 254048)
+++ libcpp/lex.c	(working copy)
@@ -1576,14 +1576,17 @@
 
 
 /* Returns true if a macro has been defined.
+   If no_underscore is true, check that the macro
+   name does not start with underscore.
    This might not work if compile with -save-temps,
    or preprocess separately from compilation.  */
 
 static bool
-is_macro(cpp_reader *pfile, const uchar *base)
+is_macro(cpp_reader *pfile, const uchar *base, bool no_underscore)
 {
   const uchar *cur = base;
-  if (! ISIDST (*cur))
+  bool invalid_ident = (no_underscore ? ! ISALPHA (*cur) : ! ISIDST (*cur));
+  if (invalid_ident)
     return false;
   unsigned int hash = HT_HASHSTEP (0, *cur);
   ++cur;
@@ -1872,7 +1875,7 @@
 	 a string literal it could be parsed as a C++11 user-defined string
 	 literal thus breaking the program.
 	 Try to identify macros with is_macro. A warning is issued. */
-      if (is_macro (pfile, cur))
+      if (is_macro (pfile, cur, true))
 	{
 	  /* Raise a warning, but do not consume subsequent tokens.  */
 	  if (CPP_OPTION (pfile, warn_literal_suffix) && !pfile->state.skipping)
@@ -2002,7 +2005,7 @@
 	 a string literal it could be parsed as a C++11 user-defined string
 	 literal thus breaking the program.
 	 Try to identify macros with is_macro. A warning is issued. */
-      if (is_macro (pfile, cur))
+      if (is_macro (pfile, cur, true))
 	{
 	  /* Raise a warning, but do not consume subsequent tokens.  */
 	  if (CPP_OPTION (pfile, warn_literal_suffix) && !pfile->state.skipping)
@@ -2023,7 +2026,7 @@
 	}
     }
   else if (CPP_OPTION (pfile, cpp_warn_cxx11_compat)
-	   && is_macro (pfile, cur)
+	   && is_macro (pfile, cur, false)
 	   && !pfile->state.skipping)
     cpp_warning_with_line (pfile, CPP_W_CXX11_COMPAT,
 			   token->src_loc, 0, "C++11 requires a space "
Jason Merrill Nov. 1, 2017, 8:02 p.m. UTC | #11
On Tue, Oct 31, 2017 at 12:17 PM, Mukesh Kapoor
<mukesh.kapoor@oracle.com> wrote:
> On 10/25/2017 6:44 PM, Mukesh Kapoor wrote:
>>
>> On 10/25/2017 4:20 AM, Nathan Sidwell wrote:
>>>
>>> On 10/25/2017 12:03 AM, Mukesh Kapoor wrote:
>>>
>>>> Thanks for pointing this out. Checking in the front end will be
>>>> difficult because the front end gets tokens after macro expansion. I think
>>>> the difficulty of fixing this bug comes because of the requirement to
>>>> maintain backward compatibility with the option -Wliteral-suffix for
>>>> -std=c++11.
>>>
>>>
>>> IIUC the warning's intent is to catch cases of:
>>> printf ("some format"PRIx64 ..., ...);
>>> where there's no space between the string literals and the PRIx64 macro.
>>> I suspect it's very common for there to be a following string-literal, so
>>> perhaps the preprocessor could detect:
>>>
>>> <string-literal>NON-FN-MACRO<maybe-space><string-literal>
>>>
>>> and warn on that sequence?
>>
>>
>> Yes, this can be done easily and this is also the usage mentioned in the
>> man page. I made this change in the compiler, bootstrapped it and ran the
>> tests. The following two tests fail after the fix:
>>
>> g++.dg/cpp0x/Wliteral-suffix.C
>> g++.dg/cpp0x/warn_cxx0x4.C
>>
>> Both tests have code similar to the following (from Wliteral-suffix.C):
>>
>> #define BAR "bar"
>> #define PLUS_ONE + 1
>>
>>   char c = '3'PLUS_ONE;   // { dg-warning "invalid suffix on literal" }
>>   char s[] = "foo"BAR;    // { dg-warning "invalid suffix on literal" }
>>
>> Other compilers don't accept this code. Maybe I should just modify these
>> tests to have error messages instead of warnings and submit my revised fix?
>
>
> Actually, according to the man page for -Wliteral-suffix, only macro names
> that don't start with an underscore should be considered when issuing a
> warning:
>
>        -Wliteral-suffix (C++ and Objective-C++ only)
>            Warn when a string or character literal is followed by a
> ud-suffix
>            which does not begin with an underscore...
>
> So the fix is simply to check if the macro name in is_macro() starts with an
> underscore. The function is_macro() is called only at three places. At two
> places it's used to check for the warning related to -Wliteral-suffix and
> the check for underscore should be made for these two cases; at one place it
> is used to check for the warning related to -Wc++11-compat and there is no
> need to check for underscore for this case.
>
> The fix is simply to pass a bool flag as an additional argument to
> is_macro() to decide whether the macro name starts with an underscore or
> not. I have tested the attached patch on x86_64-linux. Thanks.

Rather than add a mysterious parameter to is_macro, how about checking
*cur != '_' before we call it?

Jason
Mukesh Kapoor Nov. 1, 2017, 8:45 p.m. UTC | #12
On 11/1/2017 1:02 PM, Jason Merrill wrote:
> On Tue, Oct 31, 2017 at 12:17 PM, Mukesh Kapoor
> <mukesh.kapoor@oracle.com> wrote:
>> On 10/25/2017 6:44 PM, Mukesh Kapoor wrote:
>>> On 10/25/2017 4:20 AM, Nathan Sidwell wrote:
>>>> On 10/25/2017 12:03 AM, Mukesh Kapoor wrote:
>>>>
>>>>> Thanks for pointing this out. Checking in the front end will be
>>>>> difficult because the front end gets tokens after macro expansion. I think
>>>>> the difficulty of fixing this bug comes because of the requirement to
>>>>> maintain backward compatibility with the option -Wliteral-suffix for
>>>>> -std=c++11.
>>>>
>>>> IIUC the warning's intent is to catch cases of:
>>>> printf ("some format"PRIx64 ..., ...);
>>>> where there's no space between the string literals and the PRIx64 macro.
>>>> I suspect it's very common for there to be a following string-literal, so
>>>> perhaps the preprocessor could detect:
>>>>
>>>> <string-literal>NON-FN-MACRO<maybe-space><string-literal>
>>>>
>>>> and warn on that sequence?
>>>
>>> Yes, this can be done easily and this is also the usage mentioned in the
>>> man page. I made this change in the compiler, bootstrapped it and ran the
>>> tests. The following two tests fail after the fix:
>>>
>>> g++.dg/cpp0x/Wliteral-suffix.C
>>> g++.dg/cpp0x/warn_cxx0x4.C
>>>
>>> Both tests have code similar to the following (from Wliteral-suffix.C):
>>>
>>> #define BAR "bar"
>>> #define PLUS_ONE + 1
>>>
>>>    char c = '3'PLUS_ONE;   // { dg-warning "invalid suffix on literal" }
>>>    char s[] = "foo"BAR;    // { dg-warning "invalid suffix on literal" }
>>>
>>> Other compilers don't accept this code. Maybe I should just modify these
>>> tests to have error messages instead of warnings and submit my revised fix?
>>
>> Actually, according to the man page for -Wliteral-suffix, only macro names
>> that don't start with an underscore should be considered when issuing a
>> warning:
>>
>>         -Wliteral-suffix (C++ and Objective-C++ only)
>>             Warn when a string or character literal is followed by a
>> ud-suffix
>>             which does not begin with an underscore...
>>
>> So the fix is simply to check if the macro name in is_macro() starts with an
>> underscore. The function is_macro() is called only at three places. At two
>> places it's used to check for the warning related to -Wliteral-suffix and
>> the check for underscore should be made for these two cases; at one place it
>> is used to check for the warning related to -Wc++11-compat and there is no
>> need to check for underscore for this case.
>>
>> The fix is simply to pass a bool flag as an additional argument to
>> is_macro() to decide whether the macro name starts with an underscore or
>> not. I have tested the attached patch on x86_64-linux. Thanks.
> Rather than add a mysterious parameter to is_macro, how about checking
> *cur != '_' before we call it?

This is a good suggestion. I have attached the revised patch. Thanks.

Mukesh
/libcpp
2017-10-31  Mukesh Kapoor   <mukesh.kapoor@oracle.com>

	PR c++/80955
	* lex.c (lex_string): When checking for a valid macro for the
	warning related to -Wliteral-suffix (CPP_W_LITERAL_SUFFIX),
	check that the macro name does not start with an underscore
	before calling is_macro().

/testsuite
2017-10-31  Mukesh Kapoor   <mukesh.kapoor@oracle.com>

	PR c++/80955
	* g++.dg/cpp0x/udlit-macros.C: New.
Index: gcc/testsuite/g++.dg/cpp0x/udlit-macros.C
===================================================================
--- gcc/testsuite/g++.dg/cpp0x/udlit-macros.C	(revision 0)
+++ gcc/testsuite/g++.dg/cpp0x/udlit-macros.C	(working copy)
@@ -0,0 +1,31 @@
+// PR c++/80955
+// { dg-do compile { target c++11 } }
+
+#define __STDC_FORMAT_MACROS
+#include <inttypes.h>
+#include <stdio.h>
+#include <string.h>
+
+using size_t = decltype(sizeof(0));
+#define _zero
+#define _ID _xx
+int operator""_zero(const char*, size_t) { return 0; }
+int operator""_ID(const char*, size_t) { return 0; }
+
+int main()
+{
+  int64_t i64 = 123;
+  char buf[100];
+  sprintf(buf, "%"PRId64"abc", i64);  // { dg-warning "invalid suffix on literal" }
+  return strcmp(buf, "123abc")
+	 + ""_zero
+	 + "bob"_zero
+	 + R"#(raw
+	       string)#"_zero
+	 + "xx"_ID
+	 + ""_ID
+	 + R"AA(another
+		raw
+		string)AA"_ID;
+}
+
Index: libcpp/lex.c
===================================================================
--- libcpp/lex.c	(revision 254048)
+++ libcpp/lex.c	(working copy)
@@ -1871,8 +1871,9 @@
       /* If a string format macro, say from inttypes.h, is placed touching
 	 a string literal it could be parsed as a C++11 user-defined string
 	 literal thus breaking the program.
-	 Try to identify macros with is_macro. A warning is issued. */
-      if (is_macro (pfile, cur))
+	 Try to identify macros with is_macro. A warning is issued.
+	 The macro name should not start with '_' for this warning. */
+      if ((*cur != '_') && is_macro (pfile, cur))
 	{
 	  /* Raise a warning, but do not consume subsequent tokens.  */
 	  if (CPP_OPTION (pfile, warn_literal_suffix) && !pfile->state.skipping)
@@ -2001,8 +2002,9 @@
       /* If a string format macro, say from inttypes.h, is placed touching
 	 a string literal it could be parsed as a C++11 user-defined string
 	 literal thus breaking the program.
-	 Try to identify macros with is_macro. A warning is issued. */
-      if (is_macro (pfile, cur))
+	 Try to identify macros with is_macro. A warning is issued.
+	 The macro name should not start with '_' for this warning. */
+      if ((*cur != '_') && is_macro (pfile, cur))
 	{
 	  /* Raise a warning, but do not consume subsequent tokens.  */
 	  if (CPP_OPTION (pfile, warn_literal_suffix) && !pfile->state.skipping)
Jason Merrill Nov. 2, 2017, 2:42 p.m. UTC | #13
On Wed, Nov 1, 2017 at 4:45 PM, Mukesh Kapoor <mukesh.kapoor@oracle.com> wrote:
> On 11/1/2017 1:02 PM, Jason Merrill wrote:
>>
>> On Tue, Oct 31, 2017 at 12:17 PM, Mukesh Kapoor
>> <mukesh.kapoor@oracle.com> wrote:
>>>
>>> On 10/25/2017 6:44 PM, Mukesh Kapoor wrote:
>>>>
>>>> On 10/25/2017 4:20 AM, Nathan Sidwell wrote:
>>>>>
>>>>> On 10/25/2017 12:03 AM, Mukesh Kapoor wrote:
>>>>>
>>>>>> Thanks for pointing this out. Checking in the front end will be
>>>>>> difficult because the front end gets tokens after macro expansion. I
>>>>>> think
>>>>>> the difficulty of fixing this bug comes because of the requirement to
>>>>>> maintain backward compatibility with the option -Wliteral-suffix for
>>>>>> -std=c++11.
>>>>>
>>>>>
>>>>> IIUC the warning's intent is to catch cases of:
>>>>> printf ("some format"PRIx64 ..., ...);
>>>>> where there's no space between the string literals and the PRIx64
>>>>> macro.
>>>>> I suspect it's very common for there to be a following string-literal,
>>>>> so
>>>>> perhaps the preprocessor could detect:
>>>>>
>>>>> <string-literal>NON-FN-MACRO<maybe-space><string-literal>
>>>>>
>>>>> and warn on that sequence?
>>>>
>>>>
>>>> Yes, this can be done easily and this is also the usage mentioned in the
>>>> man page. I made this change in the compiler, bootstrapped it and ran
>>>> the
>>>> tests. The following two tests fail after the fix:
>>>>
>>>> g++.dg/cpp0x/Wliteral-suffix.C
>>>> g++.dg/cpp0x/warn_cxx0x4.C
>>>>
>>>> Both tests have code similar to the following (from Wliteral-suffix.C):
>>>>
>>>> #define BAR "bar"
>>>> #define PLUS_ONE + 1
>>>>
>>>>    char c = '3'PLUS_ONE;   // { dg-warning "invalid suffix on literal" }
>>>>    char s[] = "foo"BAR;    // { dg-warning "invalid suffix on literal" }
>>>>
>>>> Other compilers don't accept this code. Maybe I should just modify these
>>>> tests to have error messages instead of warnings and submit my revised
>>>> fix?
>>>
>>>
>>> Actually, according to the man page for -Wliteral-suffix, only macro
>>> names
>>> that don't start with an underscore should be considered when issuing a
>>> warning:
>>>
>>>         -Wliteral-suffix (C++ and Objective-C++ only)
>>>             Warn when a string or character literal is followed by a
>>> ud-suffix
>>>             which does not begin with an underscore...
>>>
>>> So the fix is simply to check if the macro name in is_macro() starts with
>>> an
>>> underscore. The function is_macro() is called only at three places. At
>>> two
>>> places it's used to check for the warning related to -Wliteral-suffix and
>>> the check for underscore should be made for these two cases; at one place
>>> it
>>> is used to check for the warning related to -Wc++11-compat and there is
>>> no
>>> need to check for underscore for this case.
>>>
>>> The fix is simply to pass a bool flag as an additional argument to
>>> is_macro() to decide whether the macro name starts with an underscore or
>>> not. I have tested the attached patch on x86_64-linux. Thanks.
>>
>> Rather than add a mysterious parameter to is_macro, how about checking
>> *cur != '_' before we call it?
>
> This is a good suggestion. I have attached the revised patch. Thanks.

OK, thanks!

Jason
Paolo Carlini Nov. 3, 2017, 2:31 p.m. UTC | #14
Hi,

On 02/11/2017 15:42, Jason Merrill wrote:
>
>> This is a good suggestion. I have attached the revised patch. Thanks.
> OK, thanks!
Thanks Jason.

I was about to volunteer committing the patch but then noticed that the 
testcase includes quite a lot, eg, <inttypes.h> too, which we never 
include in the whole C++ testsuite. Can we have something simpler? Also, 
we don't need to include the whole <stdio.h> and <string.h> for a couple 
of declarations, we can simply provide by hand the declarations of 
sprintf and strcmp at the beginning of the file (plenty of examples in 
the testsuite). Mukesh, can you please work on that? Also, please watch 
out trailing blank lines.

Thanks,
Paolo.
Mukesh Kapoor Nov. 4, 2017, 10:37 p.m. UTC | #15
On 11/3/2017 7:31 AM, Paolo Carlini wrote:
> Hi,
>
> On 02/11/2017 15:42, Jason Merrill wrote:
>>
>>> This is a good suggestion. I have attached the revised patch. Thanks.
>> OK, thanks!
> Thanks Jason.
>
> I was about to volunteer committing the patch but then noticed that 
> the testcase includes quite a lot, eg, <inttypes.h> too, which we 
> never include in the whole C++ testsuite. Can we have something 
> simpler? Also, we don't need to include the whole <stdio.h> and 
> <string.h> for a couple of declarations, we can simply provide by hand 
> the declarations of sprintf and strcmp at the beginning of the file 
> (plenty of examples in the testsuite). Mukesh, can you please work on 
> that? Also, please watch out trailing blank lines.

I had included <inttypes.h> to get the definition of macro PRId64. I 
have now modified the test case to remove all includes. I have added the 
definition of the macro in the test case and also added declarations of 
functions sprintf() strcmp(). I have attached the revised patch. Thanks.

Mukesh
Index: gcc/testsuite/g++.dg/cpp0x/udlit-macros.C
===================================================================
--- gcc/testsuite/g++.dg/cpp0x/udlit-macros.C	(revision 0)
+++ gcc/testsuite/g++.dg/cpp0x/udlit-macros.C	(working copy)
@@ -0,0 +1,32 @@
+// PR c++/80955
+// { dg-do compile { target c++11 } }
+
+extern "C" int sprintf(char *__restrict s,
+		       const char *__restrict format, ...);
+extern "C" int strcmp(const char *s1, const char *s2);
+
+#  define __PRI64_PREFIX        "l"
+# define PRId64         __PRI64_PREFIX "d"
+
+using size_t = decltype(sizeof(0));
+#define _zero
+#define _ID _xx
+int operator""_zero(const char*, size_t) { return 0; }
+int operator""_ID(const char*, size_t) { return 0; }
+
+int main()
+{
+  int i64 = 123;
+  char buf[100];
+  sprintf(buf, "%"PRId64"abc", i64);  // { dg-warning "invalid suffix on literal" }
+  return strcmp(buf, "123abc")
+	 + ""_zero
+	 + "bob"_zero
+	 + R"#(raw
+	       string)#"_zero
+	 + "xx"_ID
+	 + ""_ID
+	 + R"AA(another
+		raw
+		string)AA"_ID;
+}
Index: libcpp/lex.c
===================================================================
--- libcpp/lex.c	(revision 254048)
+++ libcpp/lex.c	(working copy)
@@ -1871,8 +1871,9 @@
       /* If a string format macro, say from inttypes.h, is placed touching
 	 a string literal it could be parsed as a C++11 user-defined string
 	 literal thus breaking the program.
-	 Try to identify macros with is_macro. A warning is issued. */
-      if (is_macro (pfile, cur))
+	 Try to identify macros with is_macro. A warning is issued.
+	 The macro name should not start with '_' for this warning. */
+      if ((*cur != '_') && is_macro (pfile, cur))
 	{
 	  /* Raise a warning, but do not consume subsequent tokens.  */
 	  if (CPP_OPTION (pfile, warn_literal_suffix) && !pfile->state.skipping)
@@ -2001,8 +2002,9 @@
       /* If a string format macro, say from inttypes.h, is placed touching
 	 a string literal it could be parsed as a C++11 user-defined string
 	 literal thus breaking the program.
-	 Try to identify macros with is_macro. A warning is issued. */
-      if (is_macro (pfile, cur))
+	 Try to identify macros with is_macro. A warning is issued.
+	 The macro name should not start with '_' for this warning. */
+      if ((*cur != '_') && is_macro (pfile, cur))
 	{
 	  /* Raise a warning, but do not consume subsequent tokens.  */
 	  if (CPP_OPTION (pfile, warn_literal_suffix) && !pfile->state.skipping)
Paolo Carlini Nov. 6, 2017, 10:30 a.m. UTC | #16
Hi,

On 04/11/2017 23:37, Mukesh Kapoor wrote:
> I had included <inttypes.h> to get the definition of macro PRId64. I 
> have now modified the test case to remove all includes. I have added 
> the definition of the macro in the test case and also added 
> declarations of functions sprintf() strcmp(). I have attached the 
> revised patch. Thanks.
Excellent. I'm now committing the patch with a couple of additional 
tweaks to the testcase: 1- Since you carefully constructed it to return 
zero at run time if we do the right thing, you want a 'dg-do run' not a 
'dg-do compile'; 2- You also want a long ia64 variable, otherwise the 
testscase triggers a -Wformat warning.

Thanks,
Paolo.

////////////////////
Index: gcc/testsuite/g++.dg/cpp0x/udlit-macros.C
===================================================================
--- gcc/testsuite/g++.dg/cpp0x/udlit-macros.C	(nonexistent)
+++ gcc/testsuite/g++.dg/cpp0x/udlit-macros.C	(working copy)
@@ -0,0 +1,31 @@
+// PR c++/80955
+// { dg-do run { target c++11 } }
+
+extern "C" int sprintf (char *s, const char *format, ...);
+extern "C" int strcmp (const char *s1, const char *s2);
+
+#define __PRI64_PREFIX        "l"
+#define PRId64         __PRI64_PREFIX "d"
+
+using size_t = decltype(sizeof(0));
+#define _zero
+#define _ID _xx
+int operator""_zero(const char*, size_t) { return 0; }
+int operator""_ID(const char*, size_t) { return 0; }
+
+int main()
+{
+  long i64 = 123;
+  char buf[100];
+  sprintf(buf, "%"PRId64"abc", i64);  // { dg-warning "invalid suffix on literal" }
+  return strcmp(buf, "123abc")
+	 + ""_zero
+	 + "bob"_zero
+	 + R"#(raw
+	       string)#"_zero
+	 + "xx"_ID
+	 + ""_ID
+	 + R"AA(another
+		raw
+		string)AA"_ID;
+}
Index: libcpp/lex.c
===================================================================
--- libcpp/lex.c	(revision 254432)
+++ libcpp/lex.c	(working copy)
@@ -1871,8 +1871,9 @@ lex_raw_string (cpp_reader *pfile, cpp_token *toke
       /* If a string format macro, say from inttypes.h, is placed touching
 	 a string literal it could be parsed as a C++11 user-defined string
 	 literal thus breaking the program.
-	 Try to identify macros with is_macro. A warning is issued. */
-      if (is_macro (pfile, cur))
+	 Try to identify macros with is_macro. A warning is issued.
+	 The macro name should not start with '_' for this warning. */
+      if ((*cur != '_') && is_macro (pfile, cur))
 	{
 	  /* Raise a warning, but do not consume subsequent tokens.  */
 	  if (CPP_OPTION (pfile, warn_literal_suffix) && !pfile->state.skipping)
@@ -2001,8 +2002,9 @@ lex_string (cpp_reader *pfile, cpp_token *token, c
       /* If a string format macro, say from inttypes.h, is placed touching
 	 a string literal it could be parsed as a C++11 user-defined string
 	 literal thus breaking the program.
-	 Try to identify macros with is_macro. A warning is issued. */
-      if (is_macro (pfile, cur))
+	 Try to identify macros with is_macro. A warning is issued.
+	 The macro name should not start with '_' for this warning. */
+      if ((*cur != '_') && is_macro (pfile, cur))
 	{
 	  /* Raise a warning, but do not consume subsequent tokens.  */
 	  if (CPP_OPTION (pfile, warn_literal_suffix) && !pfile->state.skipping)
diff mbox series

Patch

Index: gcc/testsuite/g++.dg/cpp0x/udlit-macros.C
===================================================================
--- gcc/testsuite/g++.dg/cpp0x/udlit-macros.C	(revision 0)
+++ gcc/testsuite/g++.dg/cpp0x/udlit-macros.C	(working copy)
@@ -0,0 +1,7 @@ 
+// PR c++/80955
+// { dg-do compile { target c++11 } }
+
+using size_t = decltype(sizeof(0));
+#define _zero
+int operator""_zero(const char*, size_t) { return 0; }
+int main() { return ""_zero; }
Index: libcpp/lex.c
===================================================================
--- libcpp/lex.c	(revision 253775)
+++ libcpp/lex.c	(working copy)
@@ -2001,8 +2001,11 @@ 
       /* If a string format macro, say from inttypes.h, is placed touching
 	 a string literal it could be parsed as a C++11 user-defined string
 	 literal thus breaking the program.
-	 Try to identify macros with is_macro. A warning is issued. */
-      if (is_macro (pfile, cur))
+	 Try to identify macros with is_macro. A warning is issued.
+	 Don't do this for a user-defined literal, i.e. an
+	 empty string followed by an identifier.
+	 For an empty string "", (cur-base)==2. Bug 80955 */
+      if (is_macro (pfile, cur) && ((cur-base) != 2))
 	{
 	  /* Raise a warning, but do not consume subsequent tokens.  */
 	  if (CPP_OPTION (pfile, warn_literal_suffix) && !pfile->state.skipping)