[libcpp] PR53690 - wrong code for C++11 UCN

Message ID CABu31nMUcv8BVemBdKBjYZL9UiHa6niizL_s1nmZ4SHm158Pwg@mail.gmail.com
State New

Commit Message

Steven Bosscher July 8, 2012, 10:29 p.m. UTC
Hello,

In PR53690, a UCN is incorrectly interpreted in C++11 mode: the value
of U'\U00000000' should be 0, but libcpp converts it to 1.

The reason is that _cpp_valid_ucn unconditionally converts every 0
result to 1. I am not 100% sure why (there is no comment and the code
has been like that since the initial checkin). In C99, characters below
0xa0 are not allowed, so perhaps _cpp_valid_ucn returned 1 for a 0 UCN
because it is invalid in C, and returning a non-NUL character as the
error value was deemed better than returning '\0'.

In any case, it's valid for C++11. Jason modified charset.c to
implement the C++11 change and handle the differences between C99 and
C++11 (*) but I think he overlooked the two lines at the bottom that
convert a 0 result to 1.
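
For illustration only (this sketch is not part of the patch), the
expected C++11 behaviour can be stated as compile-time checks. The
helper name check_nul_spellings is hypothetical; the literals are the
same ones used in the PR. On a compiler with the bug, the assertions
involving a UCN would be expected to fail.

```cpp
#include <cassert>

// Compile-time checks: every spelling of the null character must be 0.
static_assert(U'\U00000000' == 0, "8-digit UCN for U+0000 must be 0");
static_assert(U'\u0000' == 0, "4-digit UCN for U+0000 must be 0");
static_assert(U'\x00' == 0, "hex escape must be 0");
static_assert(U'\0' == 0, "octal escape must be 0");
static_assert(u'\U00000000' == 0, "char16_t UCN for U+0000 must be 0");

// Run-time variant (hypothetical helper name) covering the same
// eight literals as the test case in the PR.
void check_nul_spellings() {
    const char32_t wide[] = { U'\U00000000', U'\u0000', U'\x00', U'\0' };
    const char16_t narrow[] = { u'\U00000000', u'\u0000', u'\x00', u'\0' };
    for (char32_t c : wide) assert(c == 0);
    for (char16_t c : narrow) assert(c == 0);
}
```

Compiling with -std=c++11 is enough to exercise the static_assert
lines; calling check_nul_spellings() repeats the check at run time.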

The attached patch was bootstrapped and tested on
powerpc64-unknown-linux-gnu. Does it make sense enough for an OK? :-)

Ciao!
Steven



(*) see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2170.html
and http://gcc.gnu.org/viewcvs?view=revision&revision=152614

libcpp/
        PR preprocessor/53690
        * charset.c (_cpp_valid_ucn): If result == 0, return 0; the C99
        path for values < 0xa0 is handled earlier since r152614.

testsuite/
        PR preprocessor/53690
        * g++.dg/pr53690.C: New test.

Comments

Andreas Schwab July 9, 2012, 7:07 a.m. UTC | #1
Steven Bosscher <stevenb.gcc@gmail.com> writes:

> In any case, it's valid for C++11. Jason modified charset.c to
> implement the C++11 change and handle the differences between C99 and
> C++11 (*) but I think he overlooked the two lines at the bottom that
> convert a 0 result to 1.

You also get 0 for a partial UCN, which is significant for
forms_identifier_p.

Also this part of the comment needs to be adjusted:

   Otherwise the nonzero value of the UCN, whether valid or invalid,
   is returned.

Andreas.
Jason Merrill July 9, 2012, 8:04 a.m. UTC | #2
On 07/09/2012 09:07 AM, Andreas Schwab wrote:
> You also get 0 for a partial UCN, which is significant for
> forms_identifier_p.

Right.  Since 0 is a valid UCN, we can't use the return value both for 
the UCN and the predicate; I'd suggest returning the UCN by pointer 
parameter.

Jason
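
Editorial illustration (not code from the thread or from libcpp): a
minimal sketch of the "return the UCN by pointer parameter" idea,
assuming a simplified parser. The names parse_ucn and hex_digit_value
are hypothetical, and the real _cpp_valid_ucn also validates ranges and
issues diagnostics, which is omitted here. The point is only that a
boolean return can signal "complete UCN seen" while the value, which
may legitimately be 0, travels through the out-parameter.

```cpp
#include <cstdint>

// Value of one hex digit, or -1 if c is not a hex digit.
static int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical simplified interface: s points at a backslash.
// Returns true and stores the code point in *value if s starts with a
// complete \uXXXX or \UXXXXXXXX sequence; returns false and leaves
// *value untouched for anything else, including a partial UCN.
bool parse_ucn(const char *s, std::uint32_t *value) {
    if (s[0] != '\\' || (s[1] != 'u' && s[1] != 'U'))
        return false;
    const int digits = (s[1] == 'u') ? 4 : 8;
    std::uint32_t result = 0;
    for (int i = 0; i < digits; ++i) {
        const int d = hex_digit_value(s[2 + i]);
        if (d < 0)
            return false;  // partial UCN; also stops at the terminator
        result = (result << 4) | static_cast<std::uint32_t>(d);
    }
    *value = result;  // 0 is a legitimate result here
    return true;
}
```

With this shape, a caller such as an identifier check can test the
boolean alone, and "\U00000000" yields true with a stored value of 0
instead of being confused with the partial-UCN case.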
Patch

Index: libcpp/charset.c
===================================================================
--- libcpp/charset.c    (revision 189358)
+++ libcpp/charset.c    (working copy)
@@ -1071,9 +1071,6 @@  _cpp_valid_ucn (cpp_reader *pfile, const uchar **p
                   (int) (str - base), base);
     }

-  if (result == 0)
-    result = 1;
-
   return result;
 }

Index: gcc/testsuite/g++.dg/pr53690.C
===================================================================
--- gcc/testsuite/g++.dg/pr53690.C      (revision 0)
+++ gcc/testsuite/g++.dg/pr53690.C      (revision 0)
@@ -0,0 +1,25 @@ 
+// { dg-do compile }
+// { dg-options "-std=c++11 -fdump-tree-original" }
+
+extern "C" int printf (__const char *__restrict __format, ...);
+
+typedef unsigned short uint16_t;
+typedef unsigned int uint32_t;
+
+int main() {
+    uint32_t a = U'\U00000000';
+    uint32_t b = U'\u0000';
+    uint32_t c = U'\x00';
+    uint32_t d = U'\0';
+
+    uint16_t e = u'\U00000000';
+    uint16_t f = u'\u0000';
+    uint16_t g = u'\x00';
+    uint16_t h = u'\0';
+
+    printf("%x %x %x %x %x %x %x %x\n", a, b, c, d, e, f, g, h);
+
+    return 0;
+}
+
+// { dg-final { scan-tree-dump-not "= 1" "original" } }