From patchwork Mon Jun 7 02:32:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Honermann X-Patchwork-Id: 1488376 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=RvxqnWP+; dkim-atps=neutral Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4Fyy922NLGz9sPf for ; Mon, 7 Jun 2021 12:33:26 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 564E33851C13 for ; Mon, 7 Jun 2021 02:33:24 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 564E33851C13 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1623033204; bh=e7afRO1yLGLakTa78tTISEj6bJ/2DtkrsqetDG28LDg=; h=Subject:To:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=RvxqnWP+7EG8iY/J21gi5DEIHGunDPZEZ+JBDXTjj88BDeRaEqfc7+5hVNEfcOZiR CTOgE4QzpYHoSmS3xr2mis6nO1eO8oIbk33nAhVeurlpCSlxdZAtgSxdnC4PvsAUMC QnIE646yepercmnv57AzplDaUTqkUYWC4pwHjQYc= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtp110.ord1d.emailsrvr.com (smtp110.ord1d.emailsrvr.com [184.106.54.110]) by sourceware.org (Postfix) with ESMTPS id 168813857400 for ; Mon, 7 Jun 2021 02:32:02 +0000 (GMT) 
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 168813857400 X-Auth-ID: tom@honermann.net Received: by smtp6.relay.ord1d.emailsrvr.com (Authenticated sender: tom-AT-honermann.net) with ESMTPSA id 76CE2E00A8 for ; Sun, 6 Jun 2021 22:32:01 -0400 (EDT) Subject: [PATCH 1/3]: C N2653 char8_t: Language support To: gcc-patches Message-ID: Date: Sun, 6 Jun 2021 22:32:01 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 Content-Language: en-US X-Classification-ID: 4ccf99f1-4cd0-4d7b-86a6-b1b6b9c8b0fb-1-1 X-Spam-Status: No, score=-13.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tom Honermann via Gcc-patches From: Tom Honermann Reply-To: Tom Honermann Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches"

This patch implements the core language and compiler dependent library
changes proposed in WG14 N2653 [1] for C. The changes include:

- Use of the existing -fchar8_t and -fno-char8_t options to opt in to
  (or opt out of) the following changes when compiling C code.
- Change of type for UTF-8 string literals from array of char to array
  of char8_t (unsigned char).
- A new atomic_char8_t typedef.
- A new ATOMIC_CHAR8_T_LOCK_FREE macro defined in terms of a new
  predefined __GCC_ATOMIC_CHAR8_T_LOCK_FREE macro.

When -fchar8_t support is enabled for non-C++ modes, the _CHAR8_T_SOURCE
macro is predefined.
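As a rough illustration of the user-visible effect (a sketch only, not part of the patch), the following C11 code reports the element type of a UTF-8 string literal and whether _CHAR8_T_SOURCE was predefined. The names LITERAL_ELEMENT_KIND, u8_literal_kind, char8_t_source_predefined and report_char8_t_mode are hypothetical helpers invented for this example. It assumes a compiler with this patch applied when -fchar8_t is given; without the option it should still compile and report the existing behavior.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the patch): classify a string literal
   by the pointer type it decays to in a generic selection.  */
#define LITERAL_ELEMENT_KIND(s) \
  _Generic ((s), char *: 1, unsigned char *: 2, default: 0)

/* Returns 1 if u8 literals are arrays of char (the -fno-char8_t behavior)
   and 2 if they are arrays of unsigned char (the -fchar8_t behavior).  */
int
u8_literal_kind (void)
{
  return LITERAL_ELEMENT_KIND (u8"text");
}

/* Returns 1 if the compiler predefined _CHAR8_T_SOURCE, 0 otherwise.  */
int
char8_t_source_predefined (void)
{
#ifdef _CHAR8_T_SOURCE
  return 1;
#else
  return 0;
#endif
}

/* Prints a short summary of the mode this file was compiled in.  */
void
report_char8_t_mode (void)
{
  printf ("u8 literal element type: %s\n",
          u8_literal_kind () == 2 ? "unsigned char" : "char");
  printf ("_CHAR8_T_SOURCE predefined: %s\n",
          char8_t_source_predefined () ? "yes" : "no");
}
```

Calling report_char8_t_mode () from main and building once with -std=c11 -fchar8_t and once with -std=c11 -fno-char8_t should show the two modes side by side; the generic selection mirrors the one used by the char8_t-string-literal tests in patch 2/3.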
Predefining _CHAR8_T_SOURCE is the mechanism proposed to glibc to opt in
to declarations of the char8_t typedef and the c8rtomb and mbrtoc8
functions proposed in N2653. See [2].

Tested on Linux x86_64.

gcc/ChangeLog:

2021-05-31  Tom Honermann

    * ginclude/stdatomic.h (atomic_char8_t,
    ATOMIC_CHAR8_T_LOCK_FREE): New typedef and macro.

gcc/c/ChangeLog:

2021-05-31  Tom Honermann

    * c-parser.c (c_parser_string_literal): Use char8_t as the type
    of CPP_UTF8STRING when char8_t support is enabled.
    * c-typeck.c (digest_init): Handle initialization of an array of
    character type by a string literal with type array of unsigned
    char.

gcc/c-family/ChangeLog:

2021-05-31  Tom Honermann

    * c-cppbuiltin.c (c_cpp_builtins): Define _CHAR8_T_SOURCE if
    char8_t support is enabled in non-C++ language modes.
    * c-lex.c (lex_string): Use char8_t as the type of
    CPP_UTF8STRING when char8_t support is enabled.
    * c-opts.c (c_common_handle_option): Inform the preprocessor if
    char8_t support is enabled.
    * c.opt (fchar8_t): Enable for C language modes.

libcpp/ChangeLog:

2021-05-31  Tom Honermann

    * include/cpplib.h (cpp_options): Add char8.

Tom.

[1]: WG14 N2653
     "char8_t: A type for UTF-8 characters and strings (Revision 1)"
     http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm

[2]: C++20 P0482R6 and C2X N2653: support for char8_t, mbrtoc8(), and
     c8rtomb().
     [Patch 0]: https://sourceware.org/pipermail/libc-alpha/2021-June/127230.html
     [Patch 1]: https://sourceware.org/pipermail/libc-alpha/2021-June/127231.html
     [Patch 2]: https://sourceware.org/pipermail/libc-alpha/2021-June/127232.html
     [Patch 3]: https://sourceware.org/pipermail/libc-alpha/2021-June/127233.html

commit c4260c7c49822522945377cc2fb93ee9830cefc8
Author: Tom Honermann
Date:   Sat Feb 13 09:02:34 2021 -0500

    N2653 char8_t for C: Language support

    This patch implements the core language and compiler dependent library
    changes proposed in WG14 N2653 for C.
    The changes include:
    - Use of the existing -fchar8_t and -fno-char8_t options to opt in to
      (or opt out of) the following changes when compiling C code.
    - Change of type for UTF-8 string literals from array of char to array
      of char8_t (unsigned char).
    - A new atomic_char8_t typedef.
    - A new ATOMIC_CHAR8_T_LOCK_FREE macro defined in terms of a new
      predefined __GCC_ATOMIC_CHAR8_T_LOCK_FREE macro.

    When -fchar8_t support is enabled for non-C++ modes, the _CHAR8_T_SOURCE
    macro is predefined. This is the mechanism proposed to glibc to opt in to
    declarations of the char8_t typedef and the c8rtomb and mbrtoc8 functions
    proposed in N2653.

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c index 42b7604c9ac..3e944ec2b86 100644 --- a/gcc/c-family/c-cppbuiltin.c +++ b/gcc/c-family/c-cppbuiltin.c @@ -1467,6 +1467,11 @@ c_cpp_builtins (cpp_reader *pfile) if (flag_iso) cpp_define (pfile, "__STRICT_ANSI__"); + /* Express intent for char8_t support in C (not C++) to the C library if + requested. 
*/ + if (!c_dialect_cxx () && flag_char8_t) + cpp_define (pfile, "_CHAR8_T_SOURCE"); + if (!flag_signed_char) cpp_define (pfile, "__CHAR_UNSIGNED__"); diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c index c44e7a13489..e30e44e9f5c 100644 --- a/gcc/c-family/c-lex.c +++ b/gcc/c-family/c-lex.c @@ -1335,7 +1335,14 @@ lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate) default: case CPP_STRING: case CPP_UTF8STRING: - value = build_string (1, ""); + if (type == CPP_UTF8STRING && flag_char8_t) + { + value = build_string (TYPE_PRECISION (char8_type_node) + / TYPE_PRECISION (char_type_node), + ""); /* char8_t is 8 bits */ + } + else + value = build_string (1, ""); break; case CPP_STRING16: value = build_string (TYPE_PRECISION (char16_type_node) diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c index 60b5802722c..eefc607dac6 100644 --- a/gcc/c-family/c-opts.c +++ b/gcc/c-family/c-opts.c @@ -718,6 +718,10 @@ c_common_handle_option (size_t scode, const char *arg, HOST_WIDE_INT value, case OPT_v: verbose = true; break; + + case OPT_fchar8_t: + cpp_opts->char8 = value; + break; } switch (c_language) diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 91929706aff..eadb2468aa9 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -1451,8 +1451,8 @@ C ObjC C++ ObjC++ Where shorter, use canonicalized paths to systems headers. fchar8_t -C++ ObjC++ Var(flag_char8_t) Init(-1) -Enable the char8_t fundamental type and use it as the type for UTF-8 string +C ObjC C++ ObjC++ Var(flag_char8_t) Init(-1) +Enable the char8_t type and use it as the type for UTF-8 string and character literals. 
fcheck-pointer-bounds diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index d71fd0abe90..501253d0ffe 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -7425,7 +7425,14 @@ c_parser_string_literal (c_parser *parser, bool translate, bool wide_ok) default: case CPP_STRING: case CPP_UTF8STRING: - value = build_string (1, ""); + if (type == CPP_UTF8STRING && flag_char8_t) + { + value = build_string (TYPE_PRECISION (char8_type_node) + / TYPE_PRECISION (char_type_node), + ""); /* char8_t is 8 bits */ + } + else + value = build_string (1, ""); break; case CPP_STRING16: value = build_string (TYPE_PRECISION (char16_type_node) @@ -7450,9 +7457,14 @@ c_parser_string_literal (c_parser *parser, bool translate, bool wide_ok) { default: case CPP_STRING: - case CPP_UTF8STRING: TREE_TYPE (value) = char_array_type_node; break; + case CPP_UTF8STRING: + if (flag_char8_t) + TREE_TYPE (value) = char8_array_type_node; + else + TREE_TYPE (value) = char_array_type_node; + break; case CPP_STRING16: TREE_TYPE (value) = char16_array_type_node; break; diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index 5f322874423..1fa95949919 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -7979,7 +7979,8 @@ digest_init (location_t init_loc, tree type, tree init, tree origtype, if (char_array) { - if (typ2 != char_type_node) + if (typ2 != char_type_node + && typ2 != unsigned_char_type_node) /* char8_t literal */ incompat_string_cst = true; } else if (!comptypes (typ1, typ2)) diff --git a/gcc/ginclude/stdatomic.h b/gcc/ginclude/stdatomic.h index 23c07be2a48..6629902a666 100644 --- a/gcc/ginclude/stdatomic.h +++ b/gcc/ginclude/stdatomic.h @@ -49,6 +49,9 @@ typedef _Atomic long atomic_long; typedef _Atomic unsigned long atomic_ulong; typedef _Atomic long long atomic_llong; typedef _Atomic unsigned long long atomic_ullong; +#if defined(_CHAR8_T_SOURCE) +typedef _Atomic __CHAR8_TYPE__ atomic_char8_t; +#endif typedef _Atomic __CHAR16_TYPE__ atomic_char16_t; typedef _Atomic __CHAR32_TYPE__ 
atomic_char32_t; typedef _Atomic __WCHAR_TYPE__ atomic_wchar_t; @@ -97,6 +100,9 @@ extern void atomic_signal_fence (memory_order); #define ATOMIC_BOOL_LOCK_FREE __GCC_ATOMIC_BOOL_LOCK_FREE #define ATOMIC_CHAR_LOCK_FREE __GCC_ATOMIC_CHAR_LOCK_FREE +#if defined(_CHAR8_T_SOURCE) +#define ATOMIC_CHAR8_T_LOCK_FREE __GCC_ATOMIC_CHAR8_T_LOCK_FREE +#endif #define ATOMIC_CHAR16_T_LOCK_FREE __GCC_ATOMIC_CHAR16_T_LOCK_FREE #define ATOMIC_CHAR32_T_LOCK_FREE __GCC_ATOMIC_CHAR32_T_LOCK_FREE #define ATOMIC_WCHAR_T_LOCK_FREE __GCC_ATOMIC_WCHAR_T_LOCK_FREE diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h index 7e840635a38..4c90f8bbbda 100644 --- a/libcpp/include/cpplib.h +++ b/libcpp/include/cpplib.h @@ -358,6 +358,9 @@ struct cpp_options /* Nonzero means process u8 prefixed character literals (UTF-8). */ unsigned char utf8_char_literals; + /* Nonzero means char8_t support is enabled. */ + unsigned char char8; + /* Nonzero means process r/R raw strings. If this is set, uliterals must be set as well. 
*/ unsigned char rliterals; From patchwork Mon Jun 7 02:32:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Honermann X-Patchwork-Id: 1488377 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=y253+QOK; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FyyBj0cHsz9sPf for ; Mon, 7 Jun 2021 12:34:53 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id AFD403860C3F for ; Mon, 7 Jun 2021 02:34:50 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org AFD403860C3F DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1623033290; bh=qeSqFU4JnOEFCQQ9a6M+LGzjB0zQuZdoN4+m+fv6QA8=; h=Subject:To:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=y253+QOKFBEjdelhjzueTvMhZdbgi3IgwWkC9f2RsUmKWcX+0bZRa96usuFhJ7g5e SBg9jQX2Atwh2IXehxZIk+BQRvEL41KeI2za62HTf5Dqv7s+LOqZEi5I9zOLidn/5M AcxS5S0yntBWG6pwUKo9nANZCuEQNpOl7bMLnhVY= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtp73.ord1d.emailsrvr.com (smtp73.ord1d.emailsrvr.com [184.106.54.73]) by sourceware.org (Postfix) with ESMTPS id 
3824D38515D5 for ; Mon, 7 Jun 2021 02:32:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3824D38515D5 X-Auth-ID: tom@honermann.net Received: by smtp18.relay.ord1d.emailsrvr.com (Authenticated sender: tom-AT-honermann.net) with ESMTPSA id 97DFFA006F for ; Sun, 6 Jun 2021 22:32:08 -0400 (EDT) Subject: [PATCH 2/3]: C N2653 char8_t: New tests To: gcc-patches Message-ID: Date: Sun, 6 Jun 2021 22:32:07 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 Content-Language: en-US X-Classification-ID: 605c5316-eb05-42c8-9ef3-d9e27027c33c-1-1 X-Spam-Status: No, score=-13.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tom Honermann via Gcc-patches From: Tom Honermann Reply-To: Tom Honermann Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches"

This patch provides new tests for the core language and compiler
dependent library changes proposed in WG14 N2653 [1] for C. Most of the
tests are provided in both a positive (-fchar8_t) and negative
(-fno-char8_t) form to ensure behaviors are appropriately present or
absent in each mode.

Tested on Linux x86_64.

gcc/testsuite/ChangeLog:

2021-05-31  Tom Honermann

    * gcc.dg/atomic/stdatomic-lockfree-char8_t.c: New test.
    * gcc.dg/char8_t-init-string-literal-1.c: New test.
    * gcc.dg/char8_t-predefined-macros-1.c: New test.
    * gcc.dg/char8_t-predefined-macros-2.c: New test.
    * gcc.dg/char8_t-string-literal-1.c: New test.
    * gcc.dg/char8_t-string-literal-2.c: New test.

Tom.

[1]: WG14 N2653
     "char8_t: A type for UTF-8 characters and strings (Revision 1)"
     http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm

commit 900aa3507defd80339828e5791c215a28efd9fea
Author: Tom Honermann
Date:   Sat Feb 13 10:02:41 2021 -0500

    N2653 char8_t for C: New tests

    This change provides new tests for the core language and compiler
    dependent library changes proposed in WG14 N2653 for C. Some of the
    tests are provided in both a positive (-fchar8_t) and negative
    (-fno-char8_t) form to ensure behaviors are appropriately present or
    absent in each mode.

diff --git a/gcc/testsuite/gcc.dg/atomic/stdatomic-lockfree-char8_t.c b/gcc/testsuite/gcc.dg/atomic/stdatomic-lockfree-char8_t.c new file mode 100644 index 00000000000..bb9eae84e83 --- /dev/null +++ b/gcc/testsuite/gcc.dg/atomic/stdatomic-lockfree-char8_t.c @@ -0,0 +1,42 @@ +/* Test atomic_is_lock_free for char8_t. */ +/* { dg-do run } */ +/* { dg-options "-std=c11 -fchar8_t -pedantic-errors" } */ + +#include +#include + +extern void abort (void); + +_Atomic __CHAR8_TYPE__ ac8a; +atomic_char8_t ac8t; + +#define CHECK_TYPE(MACRO, V1, V2) \ + do \ + { \ + int r1 = MACRO; \ + int r2 = atomic_is_lock_free (&V1); \ + int r3 = atomic_is_lock_free (&V2); \ + if (r1 != 0 && r1 != 1 && r1 != 2) \ + abort (); \ + if (r2 != 0 && r2 != 1) \ + abort (); \ + if (r3 != 0 && r3 != 1) \ + abort (); \ + if (r1 == 2 && r2 != 1) \ + abort (); \ + if (r1 == 2 && r3 != 1) \ + abort (); \ + if (r1 == 0 && r2 != 0) \ + abort (); \ + if (r1 == 0 && r3 != 0) \ + abort (); \ + } \ + while (0) + +int +main () +{ + CHECK_TYPE (ATOMIC_CHAR8_T_LOCK_FREE, ac8a, ac8t); + + return 0; +} diff --git a/gcc/testsuite/gcc.dg/char8_t-init-string-literal-1.c b/gcc/testsuite/gcc.dg/char8_t-init-string-literal-1.c new file mode 100644 index 00000000000..4d587e90a26 --- /dev/null +++ b/gcc/testsuite/gcc.dg/char8_t-init-string-literal-1.c @@ -0,0 +1,13 @@ +/* Test that char, signed char, and unsigned char 
arrays can still be + initialized by UTF-8 string literals if -fchar8_t is enabled. */ +/* { dg-do compile } */ +/* { dg-options "-fchar8_t" } */ + +char cbuf1[] = u8"text"; +char cbuf2[] = { u8"text" }; + +signed char scbuf1[] = u8"text"; +signed char scbuf2[] = { u8"text" }; + +unsigned char ucbuf1[] = u8"text"; +unsigned char ucbuf2[] = { u8"text" }; diff --git a/gcc/testsuite/gcc.dg/char8_t-predefined-macros-1.c b/gcc/testsuite/gcc.dg/char8_t-predefined-macros-1.c new file mode 100644 index 00000000000..884c634990d --- /dev/null +++ b/gcc/testsuite/gcc.dg/char8_t-predefined-macros-1.c @@ -0,0 +1,16 @@ +// Test that char8_t related predefined macros are not present when -fchar8_t is +// not enabled. +// { dg-do compile } +// { dg-options "-fno-char8_t" } + +#if defined(_CHAR8_T_SOURCE) +# error _CHAR8_T_SOURCE is defined! +#endif + +#if defined(__CHAR8_TYPE__) +# error __CHAR8_TYPE__ is defined! +#endif + +#if defined(__GCC_ATOMIC_CHAR8_T_LOCK_FREE) +# error __GCC_ATOMIC_CHAR8_T_LOCK_FREE is defined! +#endif diff --git a/gcc/testsuite/gcc.dg/char8_t-predefined-macros-2.c b/gcc/testsuite/gcc.dg/char8_t-predefined-macros-2.c new file mode 100644 index 00000000000..7f425357f57 --- /dev/null +++ b/gcc/testsuite/gcc.dg/char8_t-predefined-macros-2.c @@ -0,0 +1,16 @@ +// Test that char8_t related predefined macros are present when -fchar8_t is +// enabled. +// { dg-do compile } +// { dg-options "-fchar8_t" } + +#if !defined(_CHAR8_T_SOURCE) +# error _CHAR8_T_SOURCE is not defined! +#endif + +#if !defined(__CHAR8_TYPE__) +# error __CHAR8_TYPE__ is not defined! +#endif + +#if !defined(__GCC_ATOMIC_CHAR8_T_LOCK_FREE) +# error __GCC_ATOMIC_CHAR8_T_LOCK_FREE is not defined! 
+#endif diff --git a/gcc/testsuite/gcc.dg/char8_t-string-literal-1.c b/gcc/testsuite/gcc.dg/char8_t-string-literal-1.c new file mode 100644 index 00000000000..df94582ac1d --- /dev/null +++ b/gcc/testsuite/gcc.dg/char8_t-string-literal-1.c @@ -0,0 +1,6 @@ +// Test that UTF-8 string literals have type char[] if -fchar8_t is not enabled. +// { dg-do compile } +// { dg-options "-std=c11 -fno-char8_t" } + +_Static_assert (_Generic (u8"text", char*: 1, unsigned char*: 2) == 1, "UTF-8 string literals have an unexpected type"); +_Static_assert (_Generic (u8"x"[0], char: 1, unsigned char: 2) == 1, "UTF-8 string literal elements have an unexpected type"); diff --git a/gcc/testsuite/gcc.dg/char8_t-string-literal-2.c b/gcc/testsuite/gcc.dg/char8_t-string-literal-2.c new file mode 100644 index 00000000000..e7fd21f1067 --- /dev/null +++ b/gcc/testsuite/gcc.dg/char8_t-string-literal-2.c @@ -0,0 +1,6 @@ +// Test that UTF-8 string literals have type unsigned char[] if -fchar8_t is enabled. +// { dg-do compile } +// { dg-options "-std=c11 -fchar8_t" } + +_Static_assert (_Generic (u8"text", char*: 1, unsigned char*: 2) == 2, "UTF-8 string literals have an unexpected type"); +_Static_assert (_Generic (u8"x"[0], char: 1, unsigned char: 2) == 2, "UTF-8 string literal elements have an unexpected type"); From patchwork Mon Jun 7 02:32:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Honermann X-Patchwork-Id: 1488378 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 
header.s=default header.b=sQEy8ExG; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FyyCP6GKkz9sPf for ; Mon, 7 Jun 2021 12:35:29 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 757F438515F7 for ; Mon, 7 Jun 2021 02:35:27 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 757F438515F7 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1623033327; bh=0TLeY+wFWoMmR1+3eU33XU/DzXJcBYIL2hLhbxoqLwA=; h=Subject:To:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=sQEy8ExGHOFHk4Cswjtnvu9zb8SKIYl+x9HTqsIZ72aSle6kGUToy5nVCJVULzuuk vHPDrmZ18j8Wv+WTh8VZdrYBsyI+0xqGyd+wyaoAYrYNmUYpNTxJJTjztvklUQeOkL 8IeaFHu/vVGeuUfA91ksgFnAU1yTeaQ9BGxA5aXM= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtp82.ord1d.emailsrvr.com (smtp82.ord1d.emailsrvr.com [184.106.54.82]) by sourceware.org (Postfix) with ESMTPS id 32FDE38515D6 for ; Mon, 7 Jun 2021 02:32:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 32FDE38515D6 X-Auth-ID: tom@honermann.net Received: by smtp19.relay.ord1d.emailsrvr.com (Authenticated sender: tom-AT-honermann.net) with ESMTPSA id 88988600AF for ; Sun, 6 Jun 2021 22:32:14 -0400 (EDT) Subject: [PATCH 3/3]: C N2653 char8_t: Documentation updates To: gcc-patches Message-ID: Date: Sun, 6 Jun 2021 22:32:14 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 Content-Language: en-US X-Classification-ID: 3915660f-4cc8-4519-807b-6de6f5e2fabf-1-1 X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, 
DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tom Honermann via Gcc-patches From: Tom Honermann Reply-To: Tom Honermann Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches"

This patch updates documentation for the -fchar8_t and -fno-char8_t
options to describe their effect on C code as proposed in WG14 N2653 [1].

Tested on Linux x86_64.

2021-05-31  Tom Honermann

    * doc/invoke.texi (-fchar8_t): Update for char8_t support for C.

Tom.

[1]: WG14 N2653
     "char8_t: A type for UTF-8 characters and strings (Revision 1)"
     http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm

commit d3cb3c6648cc15fe1beea6c9799e044cb722148a
Author: Tom Honermann
Date:   Sun May 30 16:57:09 2021 -0400

    N2653 char8_t for C: Documentation updates

    This change updates documentation for the -fchar8_t option to describe
    its effect on C code as proposed in WG14 N2653 for C.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 5cd4e2d993c..ba4c60a6179 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -2884,14 +2884,27 @@ This flag is enabled by default for @option{-std=c++17}. @itemx -fno-char8_t @opindex fchar8_t @opindex fno-char8_t -Enable support for @code{char8_t} as adopted for C++20. This includes -the addition of a new @code{char8_t} fundamental type, changes to the -types of UTF-8 string and character literals, new signatures for -user-defined literals, associated standard library updates, and new -@code{__cpp_char8_t} and @code{__cpp_lib_char8_t} feature test macros. 
+Enable support for @code{char8_t} for C as proposed in N2653, and for +C++ as adopted for C++20. + +For C, this changes the type of UTF-8 string literals from array of +@code{char} to array of @code{unsigned char} and defines the +@code{_CHAR8_T_SOURCE} macro to inform the C standard library that the +@code{char8_t} typedef name and the @code{mbrtoc8} and @code{c8rtomb} +functions should be declared by @code{<uchar.h>}, and that the +@code{atomic_char8_t} typedef name and the @code{ATOMIC_CHAR8_T_LOCK_FREE} +macro should be defined by @code{<stdatomic.h>}. + +For C++, this enables the @code{char8_t} fundamental type, changes the +type of UTF-8 string literals from array of @code{char} to array of +@code{char8_t}, changes the type of UTF-8 character literals from @code{char} +to @code{char8_t}, adds @code{char8_t}-based signatures for +user-defined literals, enables associated standard library updates, and +defines the @code{__cpp_char8_t} and @code{__cpp_lib_char8_t} feature +test macros. This option enables functions to be overloaded for ordinary and UTF-8 -strings: +strings in C++: @smallexample int f(const char *); // #1