Message ID | f093d853-707c-af6f-70b2-06f9d90aa587@honermann.net |
---|---|
State | New |
Series | : C N2653 char8_t implementation |
On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:

> When -fchar8_t support is enabled for non-C++ modes, the _CHAR8_T_SOURCE macro
> is predefined. This is the mechanism proposed to glibc to opt-in to
> declarations of the char8_t typedef and c8rtomb and mbrtoc8 functions proposed
> in N2653. See [2].

I don't think glibc should have such a feature test macro, and I don't
think GCC should define such feature test macros either - _*_SOURCE macros
are generally for the *user* to define to decide what namespace they want
visible, not for the compiler to define. Without proliferating new
language dialects, __STDC_VERSION__ ought to be sufficient to communicate
from the compiler to the library (including to GCC's own headers such as
stdatomic.h).
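A minimal sketch of the alternative described above, in which a library header keys its declarations off __STDC_VERSION__ alone. The names example_char8_t and EXAMPLE_HAVE_CHAR8_T are hypothetical, and the 202311L threshold is an assumption made for illustration (it is the value C23 compilers report); this is not claimed to be what glibc or GCC's headers actually test.

```c
/* Hypothetical header fragment.  example_char8_t and EXAMPLE_HAVE_CHAR8_T
   are illustrative names, not glibc or GCC identifiers.  The version
   threshold is an assumption, see the text above.  */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
/* The language version alone tells the header that char8_t is wanted;
   no _*_SOURCE macro from the compiler is involved.  */
typedef unsigned char example_char8_t;
# define EXAMPLE_HAVE_CHAR8_T 1
#else
/* Older language versions: the identifier stays free for user code.  */
# define EXAMPLE_HAVE_CHAR8_T 0
#endif
```

With this shape the declaration appears or disappears with the selected language version, so no separate opt-in macro has to be communicated from the compiler to the library.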
Also, it seems odd to add a new field to cpp_options without any code in libcpp that uses the value of that field.
On 6/7/21 5:11 PM, Joseph Myers wrote:
> On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:
>
>> When -fchar8_t support is enabled for non-C++ modes, the _CHAR8_T_SOURCE macro
>> is predefined. This is the mechanism proposed to glibc to opt-in to
>> declarations of the char8_t typedef and c8rtomb and mbrtoc8 functions proposed
>> in N2653. See [2].
> I don't think glibc should have such a feature test macro, and I don't
> think GCC should define such feature test macros either - _*_SOURCE macros
> are generally for the *user* to define to decide what namespace they want
> visible, not for the compiler to define. Without proliferating new
> language dialects, __STDC_VERSION__ ought to be sufficient to communicate
> from the compiler to the library (including to GCC's own headers such as
> stdatomic.h).
>
In general I agree, but I think an exception is warranted in this case for a
few reasons:

1. The feature includes both core language changes (the change of type
   for u8 string literals) and library changes. The library changes
   are not actually dependent on the core language change, but they are
   intended to be used together.
2. Existing use of the char8_t identifier can be found in existing open
   source projects and likely exists in some closed source projects as
   well. An opt-in approach avoids conflict and the need to
   conditionalize code based on gcc version.
3. An opt-in approach enables evaluation of the feature prior to any
   WG14 approval.

Tom.
On 6/7/21 5:12 PM, Joseph Myers wrote:
> Also, it seems odd to add a new field to cpp_options without any code in
> libcpp that uses the value of that field.
>
Ah, thank you. That appears to be leftover code from prior experimentation
and I failed to identify it as such when preparing the patch. I'll provide
a revised patch.

Tom.
On Fri, Jun 11, 2021 at 11:52:41AM -0400, Tom Honermann via Gcc-patches wrote:
> On 6/7/21 5:11 PM, Joseph Myers wrote:
> > On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:
> >
> > > When -fchar8_t support is enabled for non-C++ modes, the _CHAR8_T_SOURCE macro
> > > is predefined. This is the mechanism proposed to glibc to opt-in to
> > > declarations of the char8_t typedef and c8rtomb and mbrtoc8 functions proposed
> > > in N2653. See [2].
> > I don't think glibc should have such a feature test macro, and I don't
> > think GCC should define such feature test macros either - _*_SOURCE macros
> > are generally for the *user* to define to decide what namespace they want
> > visible, not for the compiler to define. Without proliferating new
> > language dialects, __STDC_VERSION__ ought to be sufficient to communicate
> > from the compiler to the library (including to GCC's own headers such as
> > stdatomic.h).
> >
> In general I agree, but I think an exception is warranted in this case for a
> few reasons:
>
> 1. The feature includes both core language changes (the change of type
>    for u8 string literals) and library changes. The library changes
>    are not actually dependent on the core language change, but they are
>    intended to be used together.
> 2. Existing use of the char8_t identifier can be found in existing open
>    source projects and likely exists in some closed source projects as
>    well. An opt-in approach avoids conflict and the need to
>    conditionalize code based on gcc version.
> 3. An opt-in approach enables evaluation of the feature prior to any
>    WG14 approval.

But calling it _CHAR8_T_SOURCE is weird and inconsistent with everything
else.
In C++, there is __cpp_char8_t 201811L predefined macro for char8_t.
Using that in C is not right, sure.
Often we use __SIZEOF_type__ macros not just for sizeof(), but also for
presence check of the types, like
#ifdef __SIZEOF_INT128__
__int128 i;
#else
long long i;
#endif
etc., while char8_t has sizeof (char8_t) == 1, perhaps predefining
__SIZEOF_CHAR8_T__ 1
instead of _CHAR8_T_SOURCE would be better?

	Jakub
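The presence-check idiom mentioned above can be sketched as a self-contained fragment. The first half uses __SIZEOF_INT128__, which GCC predefines only on targets that provide __int128. The second half is hypothetical: __SIZEOF_CHAR8_T__ is merely the name floated in this message and is not assumed to be predefined by any compiler, and widest_int, example_char8_t and EXAMPLE_HAVE_CHAR8_T are illustrative names.

```c
/* Presence check: the macro exists only when the type does, so #ifdef
   doubles as a feature test and no compiler-version check is needed.  */
#ifdef __SIZEOF_INT128__
typedef __int128 widest_int;
#else
typedef long long widest_int;
#endif

/* Hypothetical analogue for char8_t.  __SIZEOF_CHAR8_T__ is only the
   suggestion from this thread, not a macro known to be predefined.  */
#ifdef __SIZEOF_CHAR8_T__
typedef unsigned char example_char8_t;
# define EXAMPLE_HAVE_CHAR8_T 1
#else
# define EXAMPLE_HAVE_CHAR8_T 0
#endif
```

The value of such a macro would always be 1 for char8_t, so only its presence carries information.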
On 6/11/21 12:01 PM, Jakub Jelinek wrote:
> On Fri, Jun 11, 2021 at 11:52:41AM -0400, Tom Honermann via Gcc-patches wrote:
>> On 6/7/21 5:11 PM, Joseph Myers wrote:
>>> On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:
>>>
>>>> When -fchar8_t support is enabled for non-C++ modes, the _CHAR8_T_SOURCE macro
>>>> is predefined. This is the mechanism proposed to glibc to opt-in to
>>>> declarations of the char8_t typedef and c8rtomb and mbrtoc8 functions proposed
>>>> in N2653. See [2].
>>> I don't think glibc should have such a feature test macro, and I don't
>>> think GCC should define such feature test macros either - _*_SOURCE macros
>>> are generally for the *user* to define to decide what namespace they want
>>> visible, not for the compiler to define. Without proliferating new
>>> language dialects, __STDC_VERSION__ ought to be sufficient to communicate
>>> from the compiler to the library (including to GCC's own headers such as
>>> stdatomic.h).
>>>
>> In general I agree, but I think an exception is warranted in this case for a
>> few reasons:
>>
>> 1. The feature includes both core language changes (the change of type
>>    for u8 string literals) and library changes. The library changes
>>    are not actually dependent on the core language change, but they are
>>    intended to be used together.
>> 2. Existing use of the char8_t identifier can be found in existing open
>>    source projects and likely exists in some closed source projects as
>>    well. An opt-in approach avoids conflict and the need to
>>    conditionalize code based on gcc version.
>> 3. An opt-in approach enables evaluation of the feature prior to any
>>    WG14 approval.
> But calling it _CHAR8_T_SOURCE is weird and inconsistent with everything
> else.
> In C++, there is __cpp_char8_t 201811L predefined macro for char8_t.
> Using that in C is not right, sure.
> Often we use __SIZEOF_type__ macros not just for sizeof(), but also for
> presence check of the types, like
> #ifdef __SIZEOF_INT128__
> __int128 i;
> #else
> long long i;
> #endif
> etc., while char8_t has sizeof (char8_t) == 1, perhaps predefining
> __SIZEOF_CHAR8_T__ 1
> instead of _CHAR8_T_SOURCE would be better?

I'm open to whatever signaling mechanism would be preferred. It took me a
while to settle on _CHAR8_T_SOURCE as the mechanism to propose as I didn't
find much for other precedents.

I agree that having _CHAR8_T_SOURCE be implied by the -fchar8_t option is
unusual with respect to other feature test macros. Is that what you find to
be weird and inconsistent?

Predefining __SIZEOF_CHAR8_T__ would be consistent with __SIZEOF_WCHAR_T__,
but kind of strange too since the size is always 1.

Perhaps a better approach would be to follow the __CHAR16_TYPE__ and
__CHAR32_TYPE__ precedent and define __CHAR8_TYPE__ to unsigned char. That
is likewise a bit strange since the type would always be unsigned char, but
it does provide a bit more symmetry. That could potentially have some use
as well; for C++, it could be defined as char8_t and thereby reflect the
difference between the two languages. Perhaps it could be useful in the
future as well if WG14 were to add distinct char8_t, char16_t, and char32_t
types as C++ did (I'm not offering any prediction regarding the likelihood
of that happening).

Tom.

>
> 	Jakub
>
On Fri, Jun 11, 2021 at 12:20:48PM -0400, Tom Honermann wrote:
> I'm open to whatever signaling mechanism would be preferred. It took me a
> while to settle on _CHAR8_T_SOURCE as the mechanism to propose as I didn't
> find much for other precedents.
>
> I agree that having _CHAR8_T_SOURCE be implied by the -fchar8_t option is
> unusual with respect to other feature test macros. Is that what you find to
> be weird and inconsistent?
>
> Predefining __SIZEOF_CHAR8_T__ would be consistent with __SIZEOF_WCHAR_T__,
> but kind of strange too since the size is always 1.
>
> Perhaps a better approach would be to follow the __CHAR16_TYPE__ and
> __CHAR32_TYPE__ precedent and define __CHAR8_TYPE__ to unsigned char. That
> is likewise a bit strange since the type would always be unsigned char, but
> it does provide a bit more symmetry. That could potentially have some use
> as well; for C++, it could be defined as char8_t and thereby reflect the
> difference between the two languages. Perhaps it could be useful in the
> future as well if WG14 were to add distinct char8_t, char16_t, and char32_t
> types as C++ did (I'm not offering any prediction regarding the likelihood
> of that happening).

C++ already predefines
#define __CHAR8_TYPE__ unsigned char
#define __CHAR16_TYPE__ short unsigned int
#define __CHAR32_TYPE__ unsigned int
for -std={c,gnu}++2{0,a,3,b} or -fchar8_t (unless -fno-char8_t), so I agree
just making sure __CHAR8_TYPE__ is defined to unsigned char even for C
is best.
And you probably don't need to do anything in the C patch for it,
void
c_stddef_cpp_builtins(void)
{
  builtin_define_with_value ("__SIZE_TYPE__", SIZE_TYPE, 0);
...
  if (flag_char8_t)
    builtin_define_with_value ("__CHAR8_TYPE__", CHAR8_TYPE, 0);
  builtin_define_with_value ("__CHAR16_TYPE__", CHAR16_TYPE, 0);
  builtin_define_with_value ("__CHAR32_TYPE__", CHAR32_TYPE, 0);
will do that.

	Jakub
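As a rough illustration of how a header could consume __CHAR8_TYPE__ in place of _CHAR8_T_SOURCE, the fragment below typedefs from the macro when the compiler predefines it and falls back to unsigned char otherwise. The names example_char8_t and EXAMPLE_CHAR8_FROM_COMPILER are hypothetical, and the fallback branch is an assumption added so the sketch compiles everywhere; a real header would more likely omit the declaration when the macro is absent.

```c
/* Hypothetical consumer of the __CHAR8_TYPE__ predefined macro.  */
#ifdef __CHAR8_TYPE__
/* The compiler states both that char8_t support is on and what the
   underlying type is.  */
typedef __CHAR8_TYPE__ example_char8_t;
# define EXAMPLE_CHAR8_FROM_COMPILER 1
#else
/* Assumption for this sketch only: fall back to unsigned char.  */
typedef unsigned char example_char8_t;
# define EXAMPLE_CHAR8_FROM_COMPILER 0
#endif

/* Either way the code unit is one byte wide and unsigned.  */
_Static_assert(sizeof(example_char8_t) == 1, "char8_t is one byte");
_Static_assert((example_char8_t)-1 > 0, "char8_t is unsigned");
```

This mirrors the way __CHAR16_TYPE__ and __CHAR32_TYPE__ are already consumed by the stdatomic.h typedefs in the patch.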
On 6/11/21 12:53 PM, Jakub Jelinek wrote:
> On Fri, Jun 11, 2021 at 12:20:48PM -0400, Tom Honermann wrote:
>> I'm open to whatever signaling mechanism would be preferred. It took me a
>> while to settle on _CHAR8_T_SOURCE as the mechanism to propose as I didn't
>> find much for other precedents.
>>
>> I agree that having _CHAR8_T_SOURCE be implied by the -fchar8_t option is
>> unusual with respect to other feature test macros. Is that what you find to
>> be weird and inconsistent?
>>
>> Predefining __SIZEOF_CHAR8_T__ would be consistent with __SIZEOF_WCHAR_T__,
>> but kind of strange too since the size is always 1.
>>
>> Perhaps a better approach would be to follow the __CHAR16_TYPE__ and
>> __CHAR32_TYPE__ precedent and define __CHAR8_TYPE__ to unsigned char. That
>> is likewise a bit strange since the type would always be unsigned char, but
>> it does provide a bit more symmetry. That could potentially have some use
>> as well; for C++, it could be defined as char8_t and thereby reflect the
>> difference between the two languages. Perhaps it could be useful in the
>> future as well if WG14 were to add distinct char8_t, char16_t, and char32_t
>> types as C++ did (I'm not offering any prediction regarding the likelihood
>> of that happening).
> C++ already predefines
> #define __CHAR8_TYPE__ unsigned char
> #define __CHAR16_TYPE__ short unsigned int
> #define __CHAR32_TYPE__ unsigned int
> for -std={c,gnu}++2{0,a,3,b} or -fchar8_t (unless -fno-char8_t), so I agree
> just making sure __CHAR8_TYPE__ is defined to unsigned char even for C
> is best.
> And you probably don't need to do anything in the C patch for it,
> void
> c_stddef_cpp_builtins(void)
> {
>   builtin_define_with_value ("__SIZE_TYPE__", SIZE_TYPE, 0);
> ...
>   if (flag_char8_t)
>     builtin_define_with_value ("__CHAR8_TYPE__", CHAR8_TYPE, 0);
>   builtin_define_with_value ("__CHAR16_TYPE__", CHAR16_TYPE, 0);
>   builtin_define_with_value ("__CHAR32_TYPE__", CHAR32_TYPE, 0);
> will do that.

Thank you; I had forgotten that I had already done that work. I confirmed
that the proposed changes result in __CHAR8_TYPE__ being defined (the tests
included with the patch already enforced it).

Tom.

>
> 	Jakub
>
commit c4260c7c49822522945377cc2fb93ee9830cefc8
Author: Tom Honermann <tom@honermann.net>
Date:   Sat Feb 13 09:02:34 2021 -0500

    N2653 char8_t for C: Language support

    This patch implements the core language and compiler dependent library
    changes proposed in WG14 N2653 for C. The changes include:
    - Use of the existing -fchar8_t and -fno-char8_t options to opt-in to (or
      opt-out of) the following changes when compiling C code.
    - Change of type for UTF-8 string literals from array of const char to
      array of const char8_t (unsigned char).
    - A new atomic_char8_t typedef.
    - A new ATOMIC_CHAR8_T_LOCK_FREE macro defined in terms of the predefined
      __GCC_ATOMIC_CHAR8_T_LOCK_FREE macro.

    When -fchar8_t support is enabled for non-C++ modes, the _CHAR8_T_SOURCE
    macro is predefined. This is the mechanism proposed to glibc to opt-in to
    declarations of the char8_t typedef and c8rtomb and mbrtoc8 functions
    proposed in N2653.

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 42b7604c9ac..3e944ec2b86 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1467,6 +1467,11 @@ c_cpp_builtins (cpp_reader *pfile)
   if (flag_iso)
     cpp_define (pfile, "__STRICT_ANSI__");
 
+  /* Express intent for char8_t support in C (not C++) to the C library if
+     requested.  */
+  if (!c_dialect_cxx () && flag_char8_t)
+    cpp_define (pfile, "_CHAR8_T_SOURCE");
+
   if (!flag_signed_char)
     cpp_define (pfile, "__CHAR_UNSIGNED__");
 
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index c44e7a13489..e30e44e9f5c 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -1335,7 +1335,14 @@ lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate)
         default:
         case CPP_STRING:
         case CPP_UTF8STRING:
-          value = build_string (1, "");
+          if (type == CPP_UTF8STRING && flag_char8_t)
+            {
+              value = build_string (TYPE_PRECISION (char8_type_node)
+                                    / TYPE_PRECISION (char_type_node),
+                                    "");  /* char8_t is 8 bits */
+            }
+          else
+            value = build_string (1, "");
           break;
         case CPP_STRING16:
           value = build_string (TYPE_PRECISION (char16_type_node)
diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 60b5802722c..eefc607dac6 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -718,6 +718,10 @@ c_common_handle_option (size_t scode, const char *arg, HOST_WIDE_INT value,
     case OPT_v:
       verbose = true;
       break;
+
+    case OPT_fchar8_t:
+      cpp_opts->char8 = value;
+      break;
     }
 
   switch (c_language)
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 91929706aff..eadb2468aa9 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1451,8 +1451,8 @@ C ObjC C++ ObjC++
 Where shorter, use canonicalized paths to systems headers.
 
 fchar8_t
-C++ ObjC++ Var(flag_char8_t) Init(-1)
-Enable the char8_t fundamental type and use it as the type for UTF-8 string
+C ObjC C++ ObjC++ Var(flag_char8_t) Init(-1)
+Enable the char8_t type and use it as the type for UTF-8 string
 and character literals.
 
 fcheck-pointer-bounds
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index d71fd0abe90..501253d0ffe 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -7425,7 +7425,14 @@ c_parser_string_literal (c_parser *parser, bool translate, bool wide_ok)
         default:
         case CPP_STRING:
         case CPP_UTF8STRING:
-          value = build_string (1, "");
+          if (type == CPP_UTF8STRING && flag_char8_t)
+            {
+              value = build_string (TYPE_PRECISION (char8_type_node)
+                                    / TYPE_PRECISION (char_type_node),
+                                    "");  /* char8_t is 8 bits */
+            }
+          else
+            value = build_string (1, "");
           break;
         case CPP_STRING16:
           value = build_string (TYPE_PRECISION (char16_type_node)
@@ -7450,9 +7457,14 @@ c_parser_string_literal (c_parser *parser, bool translate, bool wide_ok)
         {
         default:
         case CPP_STRING:
-        case CPP_UTF8STRING:
           TREE_TYPE (value) = char_array_type_node;
           break;
+        case CPP_UTF8STRING:
+          if (flag_char8_t)
+            TREE_TYPE (value) = char8_array_type_node;
+          else
+            TREE_TYPE (value) = char_array_type_node;
+          break;
         case CPP_STRING16:
           TREE_TYPE (value) = char16_array_type_node;
           break;
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 5f322874423..1fa95949919 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -7979,7 +7979,8 @@ digest_init (location_t init_loc, tree type, tree init, tree origtype,
 
           if (char_array)
             {
-              if (typ2 != char_type_node)
+              if (typ2 != char_type_node
+                  && typ2 != unsigned_char_type_node) /* char8_t literal */
                 incompat_string_cst = true;
             }
           else if (!comptypes (typ1, typ2))
diff --git a/gcc/ginclude/stdatomic.h b/gcc/ginclude/stdatomic.h
index 23c07be2a48..6629902a666 100644
--- a/gcc/ginclude/stdatomic.h
+++ b/gcc/ginclude/stdatomic.h
@@ -49,6 +49,9 @@ typedef _Atomic long atomic_long;
 typedef _Atomic unsigned long atomic_ulong;
 typedef _Atomic long long atomic_llong;
 typedef _Atomic unsigned long long atomic_ullong;
+#if defined(_CHAR8_T_SOURCE)
+typedef _Atomic __CHAR8_TYPE__ atomic_char8_t;
+#endif
 typedef _Atomic __CHAR16_TYPE__ atomic_char16_t;
 typedef _Atomic __CHAR32_TYPE__ atomic_char32_t;
 typedef _Atomic __WCHAR_TYPE__ atomic_wchar_t;
@@ -97,6 +100,9 @@ extern void atomic_signal_fence (memory_order);
 
 #define ATOMIC_BOOL_LOCK_FREE __GCC_ATOMIC_BOOL_LOCK_FREE
 #define ATOMIC_CHAR_LOCK_FREE __GCC_ATOMIC_CHAR_LOCK_FREE
+#if defined(_CHAR8_T_SOURCE)
+#define ATOMIC_CHAR8_T_LOCK_FREE __GCC_ATOMIC_CHAR8_T_LOCK_FREE
+#endif
 #define ATOMIC_CHAR16_T_LOCK_FREE __GCC_ATOMIC_CHAR16_T_LOCK_FREE
 #define ATOMIC_CHAR32_T_LOCK_FREE __GCC_ATOMIC_CHAR32_T_LOCK_FREE
 #define ATOMIC_WCHAR_T_LOCK_FREE __GCC_ATOMIC_WCHAR_T_LOCK_FREE
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index 7e840635a38..4c90f8bbbda 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -358,6 +358,9 @@ struct cpp_options
   /* Nonzero means process u8 prefixed character literals (UTF-8).  */
   unsigned char utf8_char_literals;
 
+  /* Nonzero means char8_t support is enabled.  */
+  unsigned char char8;
+
   /* Nonzero means process r/R raw strings.  If this is set, uliterals
      must be set as well.  */
   unsigned char rliterals;
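The user-visible effect of the c-parser.c and c-typeck.c hunks can be probed from ordinary C code. The sketch below is illustrative and not part of the patch: U8_ELEMENT_KIND, plain_buf and utf8_buf are hypothetical names. It uses _Generic to report whether u8 string literals have char or unsigned char elements, and shows that both array initializations stay valid either way, which is what the digest_init change is meant to preserve.

```c
/* Classify the element type of a u8 string literal at compile time.
   &u8""[0] has type `char *` without the change and `unsigned char *`
   with it; the controlling expression is never evaluated.  */
#define U8_ELEMENT_KIND \
  _Generic(&u8""[0], unsigned char *: 1, char *: 0, default: -1)

/* An array of character type may be initialized from a UTF-8 string
   literal regardless of which element type the literal has.  */
static const char plain_buf[] = u8"abc";
static const unsigned char utf8_buf[] = u8"abc";

/* Each buffer holds three code units plus the terminating null.  */
_Static_assert(sizeof plain_buf == 4, "char array from u8 literal");
_Static_assert(sizeof utf8_buf == 4, "unsigned char array from u8 literal");
```

U8_ELEMENT_KIND is expected to be 1 when the compiler applies the char8_t literal type and 0 otherwise, so code that must build in both configurations can branch on it.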