| Submitter | Steven Bosscher |
|---|---|
| Date | July 8, 2012, 10:29 p.m. |
| Message ID | <CABu31nMUcv8BVemBdKBjYZL9UiHa6niizL_s1nmZ4SHm158Pwg@mail.gmail.com> |
| Permalink | /patch/169671/ |
| State | New |
Comments
Steven Bosscher <stevenb.gcc@gmail.com> writes:

> In any case, it's valid for C++11. Jason modified charset.c to
> implement the C++11 change and handle the differences between C99 and
> C++11 (*) but I think he overlooked the two lines at the bottom that
> convert a 0 result to 1.

You also get 0 for a partial UCN, which is significant for forms_identifier_p. Also this part of the comment needs to be adjusted:

> Otherwise the nonzero value of the UCN, whether valid or invalid,
> is returned.

Andreas.
On 07/09/2012 09:07 AM, Andreas Schwab wrote:

> You also get 0 for a partial UCN, which is significant for
> forms_identifier_p.

Right. Since 0 is a valid UCN, we can't use the return value both for the UCN and the predicate; I'd suggest returning the UCN by pointer parameter.

Jason
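To illustrate the suggestion of returning the UCN by pointer parameter, here is a minimal, self-contained sketch. It is not the libcpp code: `parse_ucn` and `hex_digit_value` are hypothetical names, the signature is invented for illustration, and the C99/C++11 range checks that `_cpp_valid_ucn` performs are deliberately left out. The point is only that a boolean return can say "complete UCN or not" while the value, including 0, travels through the out parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of libcpp): value of one hex digit,
   or -1 if C is not a hex digit.  */
static int
hex_digit_value (char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  if (c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

/* Hypothetical sketch (not libcpp's signature): parse the hex digits
   that follow "\u" (4 digits) or "\U" (8 digits).  KIND is 'u' or 'U',
   DIGITS points just past that letter and LEN is the number of
   characters available.

   Returns true only for a complete UCN and stores its value in *VALUE.
   A partial UCN returns false and leaves *VALUE untouched, so a UCN
   whose value is 0 is no longer confused with "not a UCN".  */
bool
parse_ucn (char kind, const char *digits, size_t len, uint32_t *value)
{
  assert (digits != NULL && value != NULL);

  if (kind != 'u' && kind != 'U')
    return false;

  size_t want = (kind == 'U') ? 8 : 4;
  if (len < want)
    return false;               /* Partial UCN: too few characters.  */

  uint32_t result = 0;
  for (size_t i = 0; i < want; i++)
    {
      int d = hex_digit_value (digits[i]);
      if (d < 0)
        return false;           /* Partial UCN: non-hex character.  */
      result = (result << 4) | (uint32_t) d;
    }

  *value = result;
  return true;
}
```

With this shape, a caller in the position of forms_identifier_p would test only the boolean, and a caller converting a character literal would read `*value`; `parse_ucn ('U', "00000000", 8, &v)` succeeds with `v` set to 0, while `parse_ucn ('u', "00", 2, &v)` fails without touching `v`.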
Patch
Index: libcpp/charset.c
===================================================================
--- libcpp/charset.c	(revision 189358)
+++ libcpp/charset.c	(working copy)
@@ -1071,9 +1071,6 @@ _cpp_valid_ucn (cpp_reader *pfile, const uchar **p
 		 (int) (str - base), base);
     }
 
-  if (result == 0)
-    result = 1;
-
   return result;
 }
 
Index: gcc/testsuite/g++.dg/pr53690.C
===================================================================
--- gcc/testsuite/g++.dg/pr53690.C	(revision 0)
+++ gcc/testsuite/g++.dg/pr53690.C	(revision 0)
@@ -0,0 +1,25 @@
+// { dg-do compile }
+// { dg-options "-std=c++11" }
+
+extern "C" int printf (__const char *__restrict __format, ...);
+
+typedef unsigned short uint16_t;
+typedef unsigned int uint32_t;
+
+int main() {
+    uint32_t a = U'\U00000000';
+    uint32_t b = U'\u0000';
+    uint32_t c = U'\x00';
+    uint32_t d = U'\0';
+
+    uint16_t e = u'\U00000000';
+    uint16_t f = u'\u0000';
+    uint16_t g = u'\x00';
+    uint16_t h = u'\0';
+
+    printf("%x %x %x %x %x %x %x %x\n", a, b, c, d, e, f, g, h);
+
+    return 0;
+}
+
+// { dg-final { scan-tree-dump-not "= 1" "original" } }
Hello,

In PR53690, a UCN is incorrectly interpreted in C++11 mode: the value of U'\U00000000' should be 0 but is converted to 1 by libcpp.

The reason is that _cpp_valid_ucn converts all 0 results to 1 by default. I am not 100% sure why that is (there is no comment and the code has been like that since the initial checkin). In C99, characters below 0xa0 are not allowed, so perhaps _cpp_valid_ucn returned 1 for a 0 UCN because it's invalid in C and it was deemed better to return a non-NUL character as an error than '\0'.

In any case, it's valid for C++11. Jason modified charset.c to implement the C++11 change and handle the differences between C99 and C++11 (*) but I think he overlooked the two lines at the bottom that convert a 0 result to 1.

The attached patch was bootstrapped and tested on powerpc64-unknown-linux-gnu. Does it make sense enough for an OK? :-)

Ciao!
Steven

(*) see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2170.html and http://gcc.gnu.org/viewcvs?view=revision&revision=152614

libcpp/
	PR preprocessor/53690
	* charset.c (_cpp_valid_ucn): If result == 0, return 0; the C99 path
	for values < 0xa0 is handled earlier since r152614.

testsuite/
	PR preprocessor/53690
	* g++.dg/pr53690.C: New test.