From patchwork Mon Jun 7 02:07:55 2021
X-Patchwork-Submitter: Tom Honermann
X-Patchwork-Id: 1488370
Subject: [PATCH 1/3]: C++20 P0482R6 and C2X N2653: Fix for bug 25744, mbrtowc with Big5-HKSCS
To: libc-alpha
Date: Sun, 6 Jun 2021 22:07:55 -0400
From: Tom Honermann

This patch for bug 25744 [1] updates the Big5-HKSCS converter to properly
maintain the lowest 3 bits of the mbstate_t __count data member. This change
is necessary to ensure that state is correctly preserved when the converter
encounters an incomplete multibyte character. More details are available in
bug 25744 [1]. The code changes are styled to match how these bits are
maintained by converters such as iso-2022-jp.c, ibm930.c, and others.
Running 'grep __count' in the 'iconvdata' directory suggests that a number of
other converters, euc-jisx0213.c for example, also fail to preserve these bits
in some cases, though it may be that negative effects are not observed for
those converters. This patch does not attempt to address such issues with
other converters.

This fix was previously posted to this mailing list on April 7th, 2020 [2],
but was not followed up on.

Tested on Linux x86_64.

Tom.

[1]: Bug 25744
     "mbrtowc with Big5-HKSCS returns 2 instead of 1 when consuming the
     second byte of certain double byte characters"
     https://sourceware.org/bugzilla/show_bug.cgi?id=25744

[2]: "[PATCH] Correct the Big5-HKSCS converter to preserve low order state
     bits (bug 25744)"
     https://sourceware.org/pipermail/libc-alpha/2020-April/112595.html

commit 2c8959f68c1ac6a04c870932bc61693606a1ee48
Author: Tom Honermann
Date:   Fri Feb 12 18:19:58 2021 -0500

    Correct the Big5-HKSCS converter to preserve low order state bits.

    BZ: https://sourceware.org/bugzilla/show_bug.cgi?id=25744

diff --git a/iconvdata/big5hkscs.c b/iconvdata/big5hkscs.c
index 2f8534c0e7..8912c610d6 100644
--- a/iconvdata/big5hkscs.c
+++ b/iconvdata/big5hkscs.c
@@ -17771,7 +17771,7 @@ static struct
    the output state to the initial state.  This has to be done during the
    flushing.  */
 #define EMIT_SHIFT_TO_INIT \
-  if (data->__statep->__count != 0) \
+  if ((data->__statep->__count >> 3) != 0) \
     { \
       if (FROM_DIRECTION) \
         { \
@@ -17780,7 +17780,7 @@ static struct
               /* Write out the last character.  */ \
               *((uint32_t *) outbuf) = data->__statep->__count >> 3; \
               outbuf += sizeof (uint32_t); \
-              data->__statep->__count = 0; \
+              data->__statep->__count &= 7; \
             } \
           else \
             /* We don't have enough room in the output buffer.  */ \
@@ -17794,7 +17794,7 @@ static struct
               uint32_t lasttwo = data->__statep->__count >> 3; \
               *outbuf++ = (lasttwo >> 8) & 0xff; \
               *outbuf++ = lasttwo & 0xff; \
-              data->__statep->__count = 0; \
+              data->__statep->__count &= 7; \
             } \
           else \
             /* We don't have enough room in the output buffer.  */ \
@@ -17880,7 +17880,7 @@ static struct
  \
                     /* Otherwise store only the first character now, and \
                        put the second one into the queue.  */ \
-                    *statep = ch2 << 3; \
+                    *statep = (ch2 << 3) | (*statep & 7); \
                     /* Tell the caller why we terminate the loop.  */ \
                     result = __GCONV_FULL_OUTPUT; \
                     break; \
@@ -17897,7 +17897,7 @@ static struct
               } \
             else \
               /* Clear the queue and proceed to output the saved character.  */ \
-              *statep = 0; \
+              *statep &= 7; \
  \
             put32 (outptr, ch); \
             outptr += 4; \
@@ -17948,7 +17948,7 @@ static struct
               } \
             *outptr++ = (ch >> 8) & 0xff; \
             *outptr++ = ch & 0xff; \
-            *statep = 0; \
+            *statep &= 7; \
             inptr += 4; \
             continue; \
  \
@@ -17961,7 +17961,7 @@ static struct
               } \
             *outptr++ = (lasttwo >> 8) & 0xff; \
             *outptr++ = lasttwo & 0xff; \
-            *statep = 0; \
+            *statep &= 7; \
             continue; \
           } \
  \
@@ -17998,7 +17998,7 @@ static struct
         /* Check for possible combining character.  */ \
         if (__glibc_unlikely (ch == 0xca || ch == 0xea)) \
           { \
-            *statep = ((cp[0] << 8) | cp[1]) << 3; \
+            *statep = (((cp[0] << 8) | cp[1]) << 3) | (*statep & 7); \
             inptr += 4; \
             continue; \
           } \
diff --git a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
index 91d7d3a552..12697ebe23 100644
--- a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
+++ b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
@@ -128,6 +128,71 @@ check_conversion (struct testdata test)
       printf ("error: Result of third conversion was wrong.\n");
       err++;
     }
+
+  /* Now perform the same test as above consuming one byte at a time.  */
+  mbs = test.input;
+  memset (&st, 0, sizeof (st));
+
+  /* Consume the first byte; expect an incomplete multibyte character.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != -2)
+    {
+      printf ("error: First byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the first consumed byte.  */
+  mbs += 1;
+  /* Consume the second byte; expect the first wchar_t.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 1)
+    {
+      printf ("error: Second byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the second consumed byte.  */
+  mbs += 1;
+  if (wc != test.expected[0])
+    {
+      printf ("error: Result of first wchar_t conversion was wrong.\n");
+      err++;
+    }
+  /* Consume no bytes; expect the second wchar_t.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 0)
+    {
+      printf ("error: First attempt of third byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Do not advance past the third byte.  */
+  mbs += 0;
+  if (wc != test.expected[1])
+    {
+      printf ("error: Result of second wchar_t conversion was wrong.\n");
+      err++;
+    }
+  /* After the second wchar_t conversion, the converter should be in
+     the initial state since the two input BIG5-HKSCS bytes have been
+     consumed and the two wchar_t's have been output.  */
+  if (mbsinit (&st) == 0)
+    {
+      printf ("error: Converter not in initial state.\n");
+      err++;
+    }
+  /* Consume the third byte; expect the third wchar_t.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 1)
+    {
+      printf ("error: Third byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the third consumed byte.  */
+  mbs += 1;
+  if (wc != test.expected[2])
+    {
+      printf ("error: Result of third wchar_t conversion was wrong.\n");
+      err++;
+    }
+
   /* Return 0 if we saw no errors.  */
   return err;
 }

From patchwork Mon Jun 7 02:08:03 2021
X-Patchwork-Submitter: Tom Honermann
X-Patchwork-Id: 1488371
Subject: [PATCH 2/3]: C++20 P0482R6 and C2X N2653: Implement mbrtoc8, c8rtomb, char8_t
To: libc-alpha
Message-ID: <40868945-29d9-3099-fb11-260796a2a022@honermann.net>
Date: Sun, 6 Jun 2021 22:08:03 -0400
From: Tom Honermann

This patch provides implementations for the mbrtoc8 and c8rtomb functions
adopted for C++20 via WG21 P0482R6 [1] and proposed for C2X via WG14 N2653 [2].
It also provides the char8_t typedef from WG14 N2653 [2] and introduces a
_CHAR8_T_SOURCE feature test macro to opt in to the new declarations.

The mbrtoc8 and c8rtomb functions are declared in uchar.h if either the C++20
__cpp_char8_t feature test macro or the _CHAR8_T_SOURCE feature test macro is
defined. The char8_t typedef is declared in uchar.h if _CHAR8_T_SOURCE is
defined and __cpp_char8_t is not defined (if __cpp_char8_t is defined, then
char8_t is a builtin type).

Additionally, in features.h, missing comments for the __GLIBC_USE_ISOC2X,
__GLIBC_USE_DEPRECATED_GETS, and __GLIBC_USE_DEPRECATED_SCANF macros are
added.

Tested on Linux x86_64.

Tom.

[1]: WG21 P0482R6
     "char8_t: A type for UTF-8 characters and strings (Revision 6)"
     https://wg21.link/p0482r6

[2]: WG14 N2653
     "char8_t: A type for UTF-8 characters and strings (Revision 1)"
     http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm

commit 9b79320a42146cda02338941606cbf4ee0ca0db8
Author: Tom Honermann
Date:   Fri Feb 12 22:00:26 2021 -0500

    Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.

    This change provides implementations for the mbrtoc8 and c8rtomb
    functions adopted for C++20 via WG21 P0482R6 and proposed for C2X
    via WG14 N2653. It also provides the char8_t typedef from N2653 and
    introduces a _CHAR8_T_SOURCE feature test macro to opt in to the new
    declarations.

    The mbrtoc8 and c8rtomb functions are declared in uchar.h if either
    the C++20 __cpp_char8_t feature test macro or the _CHAR8_T_SOURCE
    feature test macro is defined. The char8_t typedef is declared in
    uchar.h if _CHAR8_T_SOURCE is defined and __cpp_char8_t is not
    defined (if __cpp_char8_t is defined, then char8_t is a builtin
    type).

    In features.h, missing comments for the __GLIBC_USE_ISOC2X,
    __GLIBC_USE_DEPRECATED_GETS, and __GLIBC_USE_DEPRECATED_SCANF macros
    are added.

diff --git a/NEWS b/NEWS
index 266837bf2d..67793fc324 100644
--- a/NEWS
+++ b/NEWS
@@ -25,6 +25,15 @@ Major new features:
 
 * The ISO C2X function timespec_getres has been added.
 
+* The mbrtoc8 and c8rtomb functions are added for implementation of the
+  C++20 P0482R6 and C2X N2653 proposals.  These functions perform conversions
+  between multibyte sequences and the UTF-8 character encoding.  A char8_t
+  typedef is added for the C2X N2653 proposal.  The functions are declared
+  in uchar.h if the C++20 __cpp_char8_t feature test macro or the
+  _CHAR8_T_SOURCE feature test macro is defined.  The char8_t typedef is
+  declared in uchar.h if _CHAR8_T_SOURCE is defined and __cpp_char8_t is
+  not defined.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * The function pthread_mutex_consistent_np has been deprecated; programs
diff --git a/include/features.h b/include/features.h
index eb97470afa..9437f0f0b8 100644
--- a/include/features.h
+++ b/include/features.h
@@ -60,6 +60,8 @@
    _REENTRANT, _THREAD_SAFE
                         Obsolete; equivalent to _POSIX_C_SOURCE=199506L.
 
+   _CHAR8_T_SOURCE      Extensions for char8_t as specified in WG14 N2231.
+
    The `-ansi' switch to the GNU C compiler, and standards conformance
    options such as `-std=c99', define __STRICT_ANSI__.  If none of
    these are defined, or if _DEFAULT_SOURCE is defined, the default is
@@ -100,6 +102,12 @@
                         MINSIGSTKSZ and SIGSTKSZ.
    __USE_GNU            Define GNU extensions.
    __USE_FORTIFY_LEVEL  Additional security measures used, according to level.
+   __GLIBC_USE_ISOC2X   Define ISO C2X things.
+   __GLIBC_USE_DEPRECATED_GETS
+                        Define deprecated gets()
+   __GLIBC_USE_DEPRECATED_SCANF
+                        Define deprecated scanf()
+   __GLIBC_USE_CHAR8_T  Define extensions for char8_t as specified in WG14 N2231.
 
    The macros `__GNU_LIBRARY__', `__GLIBC__', and `__GLIBC_MINOR__' are
    defined by this file unconditionally.  `__GNU_LIBRARY__' is provided
@@ -148,6 +156,7 @@
 #undef __GLIBC_USE_ISOC2X
 #undef __GLIBC_USE_DEPRECATED_GETS
 #undef __GLIBC_USE_DEPRECATED_SCANF
+#undef __GLIBC_USE_CHAR8_T
 
 /* Suppress kernel-name space pollution unless user expressedly asks
    for it.  */
@@ -457,6 +466,16 @@
 # define __GLIBC_USE_DEPRECATED_SCANF 0
 #endif
 
+/* The char8_t related c8rtomb and mbrtoc8 functions are declared if the
+   C++ __cpp_char8_t feature test macro is defined or if _CHAR8_T_SOURCE
+   is defined.  The char8_t typedef is declared if _CHAR8_T_SOURCE is
+   defined and the C++ __cpp_char8_t feature test macro is not defined.  */
+#if defined _CHAR8_T_SOURCE || defined __cpp_char8_t
+# define __GLIBC_USE_CHAR8_T 1
+#else
+# define __GLIBC_USE_CHAR8_T 0
+#endif
+
 /* Get definitions of __STDC_* predefined macros, if the compiler has
    not preincluded this header automatically.  */
 #include
diff --git a/sysdeps/mach/hurd/i386/libc.abilist b/sysdeps/mach/hurd/i386/libc.abilist
index 49aa809366..f536073108 100644
--- a/sysdeps/mach/hurd/i386/libc.abilist
+++ b/sysdeps/mach/hurd/i386/libc.abilist
@@ -2207,7 +2207,9 @@ GLIBC_2.33 stat64 F
 GLIBC_2.34 __isnanf128 F
 GLIBC_2.34 __libc_start_main F
 GLIBC_2.34 _hurd_libc_proc_init F
+GLIBC_2.34 c8rtomb F
 GLIBC_2.34 execveat F
+GLIBC_2.34 mbrtoc8 F
 GLIBC_2.34 timespec_getres F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index d22c7da7ef..54eb6473ff 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2335,6 +2335,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F
 GLIBC_2.34 __pthread_unregister_cancel F
 GLIBC_2.34 __pthread_unregister_cancel_restore F
 GLIBC_2.34 __pthread_unwind_next F
+GLIBC_2.34 c8rtomb F
 GLIBC_2.34 call_once F
 GLIBC_2.34 cnd_broadcast F
 GLIBC_2.34 cnd_destroy F
@@ -2343,6 +2344,7 @@ GLIBC_2.34 cnd_signal F
 GLIBC_2.34 cnd_timedwait F
 GLIBC_2.34 cnd_wait F
 GLIBC_2.34 execveat F
+GLIBC_2.34 mbrtoc8 F
 GLIBC_2.34 mtx_destroy F
 GLIBC_2.34 mtx_init F
 GLIBC_2.34 mtx_lock F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index cefff3bf36..288ed25ab1 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2427,6 +2427,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F
 GLIBC_2.34 __pthread_unregister_cancel F
 GLIBC_2.34
__pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2435,6 +2436,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist index 91a90f8ca4..3ef5eaf40f 100644 --- a/sysdeps/unix/sysv/linux/arc/libc.abilist +++ b/sysdeps/unix/sysv/linux/arc/libc.abilist @@ -2094,6 +2094,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2102,6 +2103,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist index 120288d766..5553b89bc1 100644 --- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist +++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist @@ -200,6 +200,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -208,6 +209,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist index be987da77e..9813f6ca87 100644 --- 
a/sysdeps/unix/sysv/linux/arm/le/libc.abilist +++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist @@ -197,6 +197,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -205,6 +206,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist index adb4e15cb8..d457c6e42a 100644 --- a/sysdeps/unix/sysv/linux/csky/libc.abilist +++ b/sysdeps/unix/sysv/linux/csky/libc.abilist @@ -2278,6 +2278,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2286,6 +2287,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist index bd022276e8..23ea3705ba 100644 --- a/sysdeps/unix/sysv/linux/hppa/libc.abilist +++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist @@ -2231,6 +2231,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2239,6 +2240,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy 
F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist index 9e37e1cb38..39f6611ee0 100644 --- a/sysdeps/unix/sysv/linux/i386/libc.abilist +++ b/sysdeps/unix/sysv/linux/i386/libc.abilist @@ -2415,6 +2415,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2423,6 +2424,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist index b8089b0b0c..4bb6092441 100644 --- a/sysdeps/unix/sysv/linux/ia64/libc.abilist +++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist @@ -2267,6 +2267,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2275,6 +2276,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist index 093854ad85..63a8877c38 100644 --- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist +++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist @@ -201,6 +201,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 
call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -209,6 +210,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist index 87554f1468..f3c8644e51 100644 --- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist +++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist @@ -2358,6 +2358,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2366,6 +2367,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist index e9340671c5..5897810a1f 100644 --- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist +++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist @@ -2329,6 +2329,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2337,6 +2338,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist index 6ddc0e90cf..7d1672eb0d 100644 --- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist 
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist @@ -2326,6 +2326,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2334,6 +2335,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist index 8582c9c371..4a4a43aad7 100644 --- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist @@ -2323,6 +2323,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2331,6 +2332,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist index b0849bec98..fa33fc92d1 100644 --- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist @@ -2321,6 +2321,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2329,6 +2330,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F 
GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist index 386660a5a1..3dcdfa2b39 100644 --- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist @@ -2329,6 +2329,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2337,6 +2338,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist index 4d05128f21..e4685a7e10 100644 --- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist @@ -2323,6 +2323,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2331,6 +2332,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist index bd305f440f..3204a54838 100644 --- a/sysdeps/unix/sysv/linux/nios2/libc.abilist +++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist @@ -2368,6 +2368,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F 
GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2376,6 +2377,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist index c2665624aa..b0122085c9 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist @@ -2385,6 +2385,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2393,6 +2394,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist index 13ef6ef39e..c7e4209cf5 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist @@ -2418,6 +2418,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2426,6 +2427,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git 
a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist index b21072e313..0e14ea72ca 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist @@ -2232,6 +2232,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2240,6 +2241,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist index 62af65536c..2ff5fbc8a5 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist @@ -2531,6 +2531,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2539,6 +2540,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist index a63aec3379..dbe5fcf2ae 100644 --- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist +++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist @@ -2096,6 +2096,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 
__pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2104,6 +2105,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist index b52efaf5ee..9b9afda286 100644 --- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist +++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist @@ -2296,6 +2296,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2304,6 +2305,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist index b699dedcc1..1ca55714a6 100644 --- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist +++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist @@ -2383,6 +2383,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2391,6 +2392,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist index 94209858b1..7348c3aa2a 100644 --- 
a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist +++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist @@ -2269,6 +2269,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2277,6 +2278,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist index 0fab90e1e3..279f3388ac 100644 --- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist +++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist @@ -2238,6 +2238,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2246,6 +2247,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist index 2f3a64b580..cce9c390da 100644 --- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist +++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist @@ -2235,6 +2235,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2243,6 +2244,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F 
GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist index e6fe453f50..54e64f4654 100644 --- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist +++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist @@ -2376,6 +2376,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2384,6 +2385,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist index 4327cf5eb3..11f9bc8db2 100644 --- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist +++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist @@ -2288,6 +2288,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2296,6 +2297,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist index 318a6d50f9..b9cae9f7ab 100644 --- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist @@ -2247,6 +2247,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 
__pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2255,6 +2256,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist index 0bcf898d4d..cf633eefa4 100644 --- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist @@ -2350,6 +2350,7 @@ GLIBC_2.34 __pthread_register_cancel_defer F GLIBC_2.34 __pthread_unregister_cancel F GLIBC_2.34 __pthread_unregister_cancel_restore F GLIBC_2.34 __pthread_unwind_next F +GLIBC_2.34 c8rtomb F GLIBC_2.34 call_once F GLIBC_2.34 cnd_broadcast F GLIBC_2.34 cnd_destroy F @@ -2358,6 +2359,7 @@ GLIBC_2.34 cnd_signal F GLIBC_2.34 cnd_timedwait F GLIBC_2.34 cnd_wait F GLIBC_2.34 execveat F +GLIBC_2.34 mbrtoc8 F GLIBC_2.34 mtx_destroy F GLIBC_2.34 mtx_init F GLIBC_2.34 mtx_lock F diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile index f38eb5cfe1..e0c8acf591 100644 --- a/wcsmbs/Makefile +++ b/wcsmbs/Makefile @@ -42,7 +42,7 @@ routines := wcscat wcschr wcscmp wcscpy wcscspn wcsdup wcslen wcsncat \ wcsmbsload mbsrtowcs_l \ isoc99_wscanf isoc99_vwscanf isoc99_fwscanf isoc99_vfwscanf \ isoc99_swscanf isoc99_vswscanf \ - mbrtoc16 c16rtomb mbrtoc32 c32rtomb + mbrtoc8 c8rtomb mbrtoc16 c16rtomb mbrtoc32 c32rtomb strop-tests := wcscmp wcsncmp wmemcmp wcslen wcschr wcsrchr wcscpy wcsnlen \ wcpcpy wcsncpy wcpncpy wcscat wcsncat wcschrnul wcsspn wcspbrk \ diff --git a/wcsmbs/Versions b/wcsmbs/Versions index 0b31c1b940..acf6c3b705 100644 --- a/wcsmbs/Versions +++ b/wcsmbs/Versions @@ -49,4 +49,7 @@ libc { wcstof32; wcstof64; wcstof32x; wcstof32_l; wcstof64_l; wcstof32x_l; } + GLIBC_2.34 { + c8rtomb; mbrtoc8; + } } diff --git a/wcsmbs/c8rtomb.c 
b/wcsmbs/c8rtomb.c new file mode 100644 index 0000000000..ebb1e73e5c --- /dev/null +++ b/wcsmbs/c8rtomb.c @@ -0,0 +1,137 @@ +/* Copyright (C) 2020 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Tom Honermann , 2020. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* Ensure that char8_t support is enabled so that the char8_t typedef is + declared. */ +#define _CHAR8_T_SOURCE + +#include +#include +#include +#include + + +/* This is the private state used if PS is NULL. */ +static mbstate_t state; + +size_t +c8rtomb (char *s, char8_t c8, mbstate_t *ps) +{ + /* This implementation depends on the converter invoked by wcrtomb not + needing to retain state in either the top most bit of ps->__count or + in ps->__value between invocations. This implementation uses the + top most bit of ps->__count to indicate that trailing code units are + expected and uses ps->__value to store previously seen code units. */ + + wchar_t wc; + + if (ps == NULL) + ps = &state; + + if (s == NULL) + { + /* if 's' is a null pointer, behave as if u8'\0' was passed as 'c8'. If + this occurs for an incomplete code unit sequence, then an error will + be reported below. */ + c8 = u8""[0]; + } + + if (! (ps->__count & 0x80000000)) + { + /* Initial state. */ + if ((c8 >= 0x80 && c8 <= 0xC1) || c8 >= 0xF5) + { + /* An invalid lead code unit. 
*/ + __set_errno (EILSEQ); + return -1; + } + if (c8 >= 0xC2) + { + /* A valid lead code unit. */ + ps->__count |= 0x80000000; + ps->__value.__wchb[0] = c8; + ps->__value.__wchb[3] = 1; + return 0; + } + /* A single byte (ASCII) code unit. */ + wc = c8; + } + else + { + char8_t cu1 = ps->__value.__wchb[0]; + if (ps->__value.__wchb[3] == 1) + { + /* A single lead code unit was previously seen. */ + if ((c8 < 0x80 || c8 > 0xBF) || + (cu1 == 0xE0 && c8 < 0xA0) || + (cu1 == 0xED && c8 > 0x9F) || + (cu1 == 0xF0 && c8 < 0x90) || + (cu1 == 0xF4 && c8 > 0x8F)) + { + /* An invalid second code unit. */ + __set_errno (EILSEQ); + return -1; + } + if (cu1 >= 0xE0) + { + /* A three or four code unit sequence. */ + ps->__value.__wchb[1] = c8; + ++ps->__value.__wchb[3]; + return 0; + } + wc = ((cu1 & 0x1F) << 6) + + (c8 & 0x3F); + } + else + { + char8_t cu2 = ps->__value.__wchb[1]; + /* A three or four byte code unit sequence. */ + if (c8 < 0x80 || c8 > 0xBF) + { + /* An invalid third or fourth code unit. */ + __set_errno (EILSEQ); + return -1; + } + if (ps->__value.__wchb[3] == 2 && cu1 >= 0xF0) + { + /* A four code unit sequence. */ + ps->__value.__wchb[2] = c8; + ++ps->__value.__wchb[3]; + return 0; + } + if (cu1 < 0xF0) + { + wc = ((cu1 & 0x0F) << 12) + + ((cu2 & 0x3F) << 6) + + (c8 & 0x3F); + } + else + { + char8_t cu3 = ps->__value.__wchb[2]; + wc = ((cu1 & 0x07) << 18) + + ((cu2 & 0x3F) << 12) + + ((cu3 & 0x3F) << 6) + + (c8 & 0x3F); + } + } + ps->__count &= 0x7fffffff; + ps->__value.__wch = 0; + } + + return wcrtomb (s, wc, ps); +} diff --git a/wcsmbs/mbrtoc8.c b/wcsmbs/mbrtoc8.c new file mode 100644 index 0000000000..c112216de5 --- /dev/null +++ b/wcsmbs/mbrtoc8.c @@ -0,0 +1,131 @@ +/* Copyright (C) 2020 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Tom Honermann , 2020. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* Ensure that char8_t support is enabled so that the char8_t typedef is + declared. */ +#define _CHAR8_T_SOURCE + +#include +#include +#include +#include +#include +#include +#include + +#include + +#ifndef EILSEQ +# define EILSEQ EINVAL +#endif + + +/* This is the private state used if PS is NULL. */ +static mbstate_t state; + +size_t +mbrtoc8 (char8_t *pc8, const char *s, size_t n, mbstate_t *ps) +{ + /* This implementation depends on the converter invoked by mbrtowc() not + needing to retain state in either the top most bit of ps->__count or + in ps->__value between invocations. This implementation uses the + top most bit of ps->__count to indicate that trailing code units are + yet to be written and uses ps->__value to store those code units. */ + + if (ps == NULL) + ps = &state; + + /* If state indicates that trailing code units are yet to be written, write + those first regardless of whether 's' is a null pointer. */ + if (ps->__count & 0x80000000) + { + /* ps->__value.__wchb[3] stores the index of the next code unit to + write. Code units are stored in reverse order. 
*/ + size_t i = ps->__value.__wchb[3]; + if (pc8 != NULL) + { + *pc8 = ps->__value.__wchb[i]; + } + if (i == 0) + { + ps->__count &= 0x7fffffff; + ps->__value.__wch = 0; + } + else + --ps->__value.__wchb[3]; + return -3; + } + + if (s == NULL) + { + /* if 's' is a null pointer, behave as if a null pointer was passed for + 'pc8', an empty string was passed for 's', and 1 passed for 'n'. */ + pc8 = NULL; + s = ""; + n = 1; + } + + wchar_t wc; + size_t result; + + result = mbrtowc(&wc, s, n, ps); + if (result <= n) + { + if (wc <= 0x7F) + { + if (pc8 != NULL) + *pc8 = wc; + } + else if (wc <= 0x7FF) + { + if (pc8 != NULL) + *pc8 = 0xC0 + ((wc >> 6) & 0x1F); + ps->__value.__wchb[0] = 0x80 + (wc & 0x3F); + ps->__value.__wchb[3] = 0; + ps->__count |= 0x80000000; + } + else if (wc <= 0xFFFF) + { + if (pc8 != NULL) + *pc8 = 0xE0 + ((wc >> 12) & 0x0F); + ps->__value.__wchb[1] = 0x80 + ((wc >> 6) & 0x3F); + ps->__value.__wchb[0] = 0x80 + (wc & 0x3F); + ps->__value.__wchb[3] = 1; + ps->__count |= 0x80000000; + } + else if (wc <= 0x10FFFF) + { + if (pc8 != NULL) + *pc8 = 0xF0 + ((wc >> 18) & 0x07); + ps->__value.__wchb[2] = 0x80 + ((wc >> 12) & 0x3F); + ps->__value.__wchb[1] = 0x80 + ((wc >> 6) & 0x3F); + ps->__value.__wchb[0] = 0x80 + (wc & 0x3F); + ps->__value.__wchb[3] = 2; + ps->__count |= 0x80000000; + } + } + if (result == 0 && wc != 0) + { + /* mbrtowc() never returns -3. When a MB sequence converts to multiple + WCs, no input is consumed when writing the subsequent WCs resulting + in a result of 0 even if a null character wasn't written. */ + result = -3; + } + + return result; +} diff --git a/wcsmbs/uchar.h b/wcsmbs/uchar.h index 6020f66cf6..b23ff1a5ac 100644 --- a/wcsmbs/uchar.h +++ b/wcsmbs/uchar.h @@ -31,6 +31,14 @@ #include #include +/* Define the char8_t typedef if support is enabled, but only if the C++ + predefined feature test macro that indicates char8_t is a builtin type + is not defined. 
*/
+#if __GLIBC_USE (CHAR8_T) && !defined __cpp_char8_t
+/* Define the 8-bit character type. */
+typedef unsigned char char8_t;
+#endif
+
 #ifndef __USE_ISOCXX11
 /* Define the 16-bit and 32-bit character types. */
 typedef __uint_least16_t char16_t;
@@ -40,6 +48,18 @@ typedef __uint_least32_t char32_t;
 
 __BEGIN_DECLS
 
+#if __GLIBC_USE (CHAR8_T)
+/* Write char8_t representation of multibyte character pointed
+   to by S to PC8. */
+extern size_t mbrtoc8 (char8_t *__restrict __pc8,
+                       const char *__restrict __s, size_t __n,
+                       mbstate_t *__restrict __p) __THROW;
+
+/* Write multibyte representation of char8_t C8 to S. */
+extern size_t c8rtomb (char *__restrict __s, char8_t __c8,
+                       mbstate_t *__restrict __ps) __THROW;
+#endif
+
 /* Write char16_t representation of multibyte character pointed
    to by S to PC16. */
 extern size_t mbrtoc16 (char16_t *__restrict __pc16,

From patchwork Mon Jun 7 02:08:08 2021
X-Patchwork-Submitter: Tom Honermann
X-Patchwork-Id: 1488372
Subject: [PATCH 3/3]: C++20 P0482R6 and C2X N2653: Tests for mbrtoc8, c8rtomb, char8_t
To: libc-alpha
Message-ID: <895dc28e-d2d6-6c6a-1637-f84184e84190@honermann.net>
Date: Sun, 6 Jun 2021 22:08:08 -0400
From: Tom Honermann

This patch adds tests for the mbrtoc8 and c8rtomb functions adopted for
C++20 via WG21 P0482R6 [1] and proposed for C2X via WG14 N2653 [2], and
for the char8_t typedef from WG14 N2653 [2].

The tests for mbrtoc8 and c8rtomb specifically exercise conversion
from/to Big5-HKSCS because of special cases that arise with that
encoding. Big5-HKSCS defines some double-byte sequences that convert to
more than one Unicode code point. In order to test this, the locale
dependencies for running tests under wcsmbs are expanded to include
zh_HK.BIG5-HKSCS.

Tested on Linux x86_64.

Tom.

[1]: WG21 P0482R6 "char8_t: A type for UTF-8 characters and strings (Revision 6)"
     https://wg21.link/p0482r6
[2]: WG14 N2653 "char8_t: A type for UTF-8 characters and strings (Revision 1)"
     http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm

commit d3cb93e0dffbd32b24307b416a3f3bf1de23dcde
Author: Tom Honermann
Date:   Fri Feb 12 23:09:41 2021 -0500

    Tests for mbrtoc8(), c8rtomb(), and the char8_t typedef.

    This change adds tests for the mbrtoc8 and c8rtomb functions adopted for
    C++20 via WG21 P0482R6 and proposed for C2X via WG14 N2653, and for the
    char8_t typedef from WG14 N2653.

    The tests for mbrtoc8 and c8rtomb specifically exercise conversion
    from/to Big5-HKSCS because of special cases that arise with that
    encoding. Big5-HKSCS defines some double-byte sequences that convert to
    more than one Unicode code point. In order to test this, the locale
    dependencies for running tests under wcsmbs are expanded to include
    zh_HK.BIG5-HKSCS.
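For readers unfamiliar with why a single multibyte character can yield
several char8_t code units, the following is a minimal standalone sketch.
It does not call mbrtoc8 or c8rtomb and is not part of this patch; the
helper names encode_utf8 and encode_utf8_sequence are hypothetical, and
the code point pair U+00CA U+0304 is used purely as an illustration of
one input character that expands to two code points. It only shows the
same range-based split into one to four code units that the conversion
has to emit one unit per call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of glibc: encode one Unicode scalar
   value as UTF-8.  Returns the number of code units written to OUT
   (1 to 4), or 0 if CP is a surrogate or lies above U+10FFFF.  */
static size_t
encode_utf8 (uint32_t cp, unsigned char out[4])
{
  if (cp <= 0x7F)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp <= 0x7FF)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp <= 0xFFFF)
    {
      /* Surrogates are not scalar values and have no UTF-8 form.  */
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      out[0] = (unsigned char) (0xF0 | (cp >> 18));
      out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;
}

/* Hypothetical helper: encode COUNT code points into OUT, which must
   have room for 4 * COUNT code units.  Returns the total number of
   code units written, or 0 if any code point is not encodable.  */
static size_t
encode_utf8_sequence (const uint32_t *cps, size_t count, unsigned char *out)
{
  size_t total = 0;
  for (size_t i = 0; i < count; i++)
    {
      size_t n = encode_utf8 (cps[i], out + total);
      if (n == 0)
        return 0;
      total += n;
    }
  return total;
}
```

Under these assumptions, a character that maps to two code points in the
U+0080..U+07FF range produces four code units in total, so a caller of a
one-unit-at-a-time interface would expect one call that consumes input
followed by three calls that consume none.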
diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile index e0c8acf591..a0716069ce 100644 --- a/wcsmbs/Makefile +++ b/wcsmbs/Makefile @@ -52,13 +52,14 @@ tests := tst-wcstof wcsmbs-tst1 tst-wcsnlen tst-btowc tst-mbrtowc \ tst-c16c32-1 wcsatcliff tst-wcstol-locale tst-wcstod-nan-locale \ tst-wcstod-round test-char-types tst-fgetwc-after-eof \ tst-wcstod-nan-sign tst-c16-surrogate tst-c32-state \ + test-char8-type test-mbrtoc8 test-c8rtomb \ $(addprefix test-,$(strop-tests)) tst-mbstowcs include ../Rules ifeq ($(run-built-tests),yes) LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 hr_HR.ISO-8859-2 \ - ja_JP.EUC-JP zh_TW.EUC-TW tr_TR.UTF-8 tr_TR.ISO-8859-9 + ja_JP.EUC-JP zh_TW.EUC-TW tr_TR.UTF-8 tr_TR.ISO-8859-9 zh_HK.BIG5-HKSCS include ../gen-locales.mk $(objpfx)tst-btowc.out: $(gen-locales) diff --git a/wcsmbs/test-c8rtomb.c b/wcsmbs/test-c8rtomb.c new file mode 100644 index 0000000000..2a65fb8b9c --- /dev/null +++ b/wcsmbs/test-c8rtomb.c @@ -0,0 +1,552 @@ +/* Test c8rtomb. + Copyright (C) 2020 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Tom Honermann , 2020. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* Ensure that char8_t support is enabled so that the char8_t typedef and + c8rtomb function are declared. */ +#define _CHAR8_T_SOURCE + +/* We always want assert to be fully defined. 
*/ +#undef NDEBUG + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static int +test_truncated_code_unit_sequence (void) +{ + const char8_t *u8s; + char buf[MB_LEN_MAX]; + mbstate_t s; + + /* Missing trailing code unit for a two byte code unit sequence. */ + u8s = (const char8_t*) u8"\xC2"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + errno = 0; + assert (c8rtomb (buf, u8s[1], &s) == (size_t) -1); /* No trailing code unit */ + assert (errno == EILSEQ); + + /* Missing first trailing code unit for a three byte code unit sequence. */ + u8s = (const char8_t*) u8"\xE0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + errno = 0; + assert (c8rtomb (buf, u8s[1], &s) == (size_t) -1); /* No trailing code unit */ + assert (errno == EILSEQ); + + /* Missing second trailing code unit for a three byte code unit sequence. */ + u8s = (const char8_t*) u8"\xE0\xA0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 0); /* 2nd byte processed */ + errno = 0; + assert (c8rtomb (buf, u8s[2], &s) == (size_t) -1); /* No trailing code unit */ + assert (errno == EILSEQ); + + /* Missing first trailing code unit for a four byte code unit sequence. */ + u8s = (const char8_t*) u8"\xF0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + errno = 0; + assert (c8rtomb (buf, u8s[1], &s) == (size_t) -1); /* No trailing code unit */ + assert (errno == EILSEQ); + + /* Missing second trailing code unit for a four byte code unit sequence.
*/ + u8s = (const char8_t*) u8"\xF0\x90"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 0); /* 2nd byte processed */ + errno = 0; + assert (c8rtomb (buf, u8s[2], &s) == (size_t) -1); /* No trailing code unit */ + assert (errno == EILSEQ); + + /* Missing third trailing code unit for a four byte code unit sequence. */ + u8s = (const char8_t*) u8"\xF0\x90\x80"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 0); /* 2nd byte processed */ + assert (c8rtomb (buf, u8s[2], &s) == (size_t) 0); /* 3rd byte processed */ + errno = 0; + assert (c8rtomb (buf, u8s[3], &s) == (size_t) -1); /* No trailing code unit */ + assert (errno == EILSEQ); + + return 0; +} + +static int +test_invalid_trailing_code_unit_sequence (void) +{ + const char8_t *u8s; + char buf[MB_LEN_MAX]; + mbstate_t s; + + /* Invalid trailing code unit for a two byte code unit sequence. */ + u8s = (const char8_t*) u8"\xC2\xC0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + errno = 0; + assert (c8rtomb (buf, u8s[1], &s) == (size_t) -1); /* Invalid trailing code unit */ + assert (errno == EILSEQ); + + /* Invalid first trailing code unit for a three byte code unit sequence. */ + u8s = (const char8_t*) u8"\xE0\xC0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + errno = 0; + assert (c8rtomb (buf, u8s[1], &s) == (size_t) -1); /* Invalid trailing code unit */ + assert (errno == EILSEQ); + + /* Invalid second trailing code unit for a three byte code unit sequence.
*/ + u8s = (const char8_t*) u8"\xE0\xA0\xC0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 0); /* 2nd byte processed */ + errno = 0; + assert (c8rtomb (buf, u8s[2], &s) == (size_t) -1); /* Invalid trailing code unit */ + assert (errno == EILSEQ); + + /* Invalid first trailing code unit for a four byte code unit sequence. */ + u8s = (const char8_t*) u8"\xF0\xC0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + errno = 0; + assert (c8rtomb (buf, u8s[1], &s) == (size_t) -1); /* Invalid trailing code unit */ + assert (errno == EILSEQ); + + /* Invalid second trailing code unit for a four byte code unit sequence. */ + u8s = (const char8_t*) u8"\xF0\x90\xC0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 0); /* 2nd byte processed */ + errno = 0; + assert (c8rtomb (buf, u8s[2], &s) == (size_t) -1); /* Invalid trailing code unit */ + assert (errno == EILSEQ); + + /* Invalid third trailing code unit for a four byte code unit sequence. */ + u8s = (const char8_t*) u8"\xF0\x90\x80\xC0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 0); /* 2nd byte processed */ + assert (c8rtomb (buf, u8s[2], &s) == (size_t) 0); /* 3rd byte processed */ + errno = 0; + assert (c8rtomb (buf, u8s[3], &s) == (size_t) -1); /* Invalid trailing code unit */ + assert (errno == EILSEQ); + + return 0; +} + +static int +test_lone_trailing_code_units (void) +{ + const char8_t *u8s; + char buf[MB_LEN_MAX]; + mbstate_t s; + + /* Lone trailing code unit. 
*/ + u8s = (const char8_t*) u8"\x80"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + errno = 0; + assert (c8rtomb (buf, u8s[0], &s) == (size_t) -1); /* Lone trailing code unit */ + assert (errno == EILSEQ); + + return 0; +} + +static int +test_overlong_encoding (void) +{ + const char8_t *u8s; + char buf[MB_LEN_MAX]; + mbstate_t s; + + /* Two byte overlong encoding. */ + u8s = (const char8_t*) u8"\xC0\x80"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + errno = 0; + assert (c8rtomb (buf, u8s[0], &s) == (size_t) -1); /* Invalid lead code unit */ + assert (errno == EILSEQ); + + /* Two byte overlong encoding. */ + u8s = (const char8_t*) u8"\xC1\x80"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + errno = 0; + assert (c8rtomb (buf, u8s[0], &s) == (size_t) -1); /* Invalid lead code unit */ + assert (errno == EILSEQ); + + /* Three byte overlong encoding. */ + u8s = (const char8_t*) u8"\xE0\x9F\xBF"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* First byte processed */ + errno = 0; + assert (c8rtomb (buf, u8s[1], &s) == (size_t) -1); /* Invalid trailing code unit */ + assert (errno == EILSEQ); + + /* Four byte overlong encoding. */ + u8s = (const char8_t*) u8"\xF0\x8F\xBF\xBF"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* First byte processed */ + errno = 0; + assert (c8rtomb (buf, u8s[1], &s) == (size_t) -1); /* Invalid trailing code unit */ + assert (errno == EILSEQ); + + return 0; +} + +static int +test_surrogate_range (void) +{ + const char8_t *u8s; + char buf[MB_LEN_MAX]; + mbstate_t s; + + /* Would encode U+D800. 
*/ + u8s = (const char8_t*) u8"\xED\xA0\x80"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* First byte processed */ + errno = 0; + assert (c8rtomb (buf, u8s[1], &s) == (size_t) -1); /* Invalid trailing code unit */ + assert (errno == EILSEQ); + + /* Would encode U+DFFF. */ + u8s = (const char8_t*) u8"\xED\xBF\xBF"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* First byte processed */ + errno = 0; + assert (c8rtomb (buf, u8s[1], &s) == (size_t) -1); /* Invalid trailing code unit */ + assert (errno == EILSEQ); + + return 0; +} + +static int +test_out_of_range_encoding (void) +{ + const char8_t *u8s; + char buf[MB_LEN_MAX]; + mbstate_t s; + + /* Would encode U+00110000. */ + u8s = (const char8_t*) u8"\xF4\x90\x80\x80"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* First byte processed */ + errno = 0; + assert (c8rtomb (buf, u8s[1], &s) == (size_t) -1); /* Invalid trailing code unit */ + assert (errno == EILSEQ); + + /* Would encode U+00150000. */ + u8s = (const char8_t*) u8"\xF5\x90\x80\x80"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + errno = 0; + assert (c8rtomb (buf, u8s[0], &s) == (size_t) -1); /* Invalid lead code unit */ + assert (errno == EILSEQ); + + return 0; +} + +static int +test_invalid_utf8 (void) +{ + int result = 0; + + result |= test_truncated_code_unit_sequence (); + result |= test_invalid_trailing_code_unit_sequence (); + result |= test_lone_trailing_code_units (); + result |= test_overlong_encoding (); + result |= test_surrogate_range (); + result |= test_out_of_range_encoding (); + + return result; +} + +static int +test_null_output_buffer (void) +{ + char buf[MB_LEN_MAX]; + mbstate_t s; + + /* Null buffer with an initial state. 
*/ + memset (&s, 0, sizeof (s)); + assert (c8rtomb (NULL, u8"X"[0], &s) == (size_t) 1); /* null byte processed */ + assert (mbsinit (&s)); /* Assert the state is now an initial state. */ + + /* Null buffer with a state corresponding to an incompletely read code + unit sequence. In this case, an error occurs since insufficient + information is available to complete the already started code unit + sequence and return to the initial state. */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8"\xC2"[0], &s) == (size_t) 0); /* 1st byte processed */ + errno = 0; + assert (c8rtomb (NULL, u8"\x80"[0], &s) == (size_t) -1); /* No trailing code unit */ + assert (errno == EILSEQ); + + return 0; +} + +static int +test_utf8 (void) +{ + const char *locale = "de_DE.UTF-8"; + const char8_t *u8s; + char buf[MB_LEN_MAX]; + mbstate_t s; + + if (!setlocale (LC_ALL, locale)) + { + fprintf (stderr, "locale '%s' not available!\n", locale); + exit (1); + } + + /* Null character. */ + u8s = (const char8_t*) u8"\x00"; /* U+0000 => 0x00 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 1); /* 1st byte processed */ + assert (buf[0] == (char) 0x00); + assert (mbsinit (&s)); + + /* First non-null character in the code point range that maps to a single + code unit. */ + u8s = (const char8_t*) u8"\x01"; /* U+0001 => 0x01 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 1); /* 1st byte processed */ + assert (buf[0] == (char) 0x01); + assert (mbsinit (&s)); + + /* Last character in the code point range that maps to a single code unit. 
*/ + u8s = (const char8_t*) u8"\x7F"; /* U+007F => 0x7F */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 1); /* 1st byte processed */ + assert (buf[0] == (char) 0x7F); + assert (mbsinit (&s)); + + /* First character in the code point range that maps to two code units. */ + u8s = (const char8_t*) u8"\xC2\x80"; /* U+0080 => 0xC2 0x80 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 2); /* 2nd byte processed */ + assert (buf[0] == (char) 0xC2); + assert (buf[1] == (char) 0x80); + assert (mbsinit (&s)); + + /* Last character in the code point range that maps to two code units. */ + u8s = (const char8_t*) u8"\u07FF"; /* U+07FF => 0xDF 0xBF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 2); /* 2nd byte processed */ + assert (buf[0] == (char) 0xDF); + assert (buf[1] == (char) 0xBF); + assert (mbsinit (&s)); + + /* First character in the code point range that maps to three code units. */ + u8s = (const char8_t*) u8"\u0800"; /* U+0800 => 0xE0 0xA0 0x80 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 0); /* 2nd byte processed */ + assert (c8rtomb (buf, u8s[2], &s) == (size_t) 3); /* 3rd byte processed */ + assert (buf[0] == (char) 0xE0); + assert (buf[1] == (char) 0xA0); + assert (buf[2] == (char) 0x80); + assert (mbsinit (&s)); + + /* Last character in the code point range that maps to three code units + before the surrogate code point range. 
*/ + u8s = (const char8_t*) u8"\uD7FF"; /* U+D7FF => 0xED 0x9F 0xBF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 0); /* 2nd byte processed */ + assert (c8rtomb (buf, u8s[2], &s) == (size_t) 3); /* 3rd byte processed */ + assert (buf[0] == (char) 0xED); + assert (buf[1] == (char) 0x9F); + assert (buf[2] == (char) 0xBF); + assert (mbsinit (&s)); + + /* First character in the code point range that maps to three code units + after the surrogate code point range. */ + u8s = (const char8_t*) u8"\uE000"; /* U+E000 => 0xEE 0x80 0x80 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 0); /* 2nd byte processed */ + assert (c8rtomb (buf, u8s[2], &s) == (size_t) 3); /* 3rd byte processed */ + assert (buf[0] == (char) 0xEE); + assert (buf[1] == (char) 0x80); + assert (buf[2] == (char) 0x80); + assert (mbsinit (&s)); + + /* Not a BOM. */ + u8s = (const char8_t*) u8"\uFEFF"; /* U+FEFF => 0xEF 0xBB 0xBF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 0); /* 2nd byte processed */ + assert (c8rtomb (buf, u8s[2], &s) == (size_t) 3); /* 3rd byte processed */ + assert (buf[0] == (char) 0xEF); + assert (buf[1] == (char) 0xBB); + assert (buf[2] == (char) 0xBF); + assert (mbsinit (&s)); + + /* Replacement character. 
*/ + u8s = (const char8_t*) u8"\uFFFD"; /* U+FFFD => 0xEF 0xBF 0xBD */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 0); /* 2nd byte processed */ + assert (c8rtomb (buf, u8s[2], &s) == (size_t) 3); /* 3rd byte processed */ + assert (buf[0] == (char) 0xEF); + assert (buf[1] == (char) 0xBF); + assert (buf[2] == (char) 0xBD); + assert (mbsinit (&s)); + + /* Last character in the code point range that maps to three code units. */ + u8s = (const char8_t*) u8"\uFFFF"; /* U+FFFF => 0xEF 0xBF 0xBF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 0); /* 2nd byte processed */ + assert (c8rtomb (buf, u8s[2], &s) == (size_t) 3); /* 3rd byte processed */ + assert (buf[0] == (char) 0xEF); + assert (buf[1] == (char) 0xBF); + assert (buf[2] == (char) 0xBF); + assert (mbsinit (&s)); + + /* First character in the code point range that maps to four code units. */ + u8s = (const char8_t*) u8"\U00010000"; /* U+10000 => 0xF0 0x90 0x80 0x80 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 0); /* 2nd byte processed */ + assert (c8rtomb (buf, u8s[2], &s) == (size_t) 0); /* 3rd byte processed */ + assert (c8rtomb (buf, u8s[3], &s) == (size_t) 4); /* 4th byte processed */ + assert (buf[0] == (char) 0xF0); + assert (buf[1] == (char) 0x90); + assert (buf[2] == (char) 0x80); + assert (buf[3] == (char) 0x80); + assert (mbsinit (&s)); + + /* Last character in the code point range that maps to four code units. 
*/ + u8s = (const char8_t*) u8"\U0010FFFF"; /* U+10FFFF => 0xF4 0x8F 0xBF 0xBF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 0); /* 2nd byte processed */ + assert (c8rtomb (buf, u8s[2], &s) == (size_t) 0); /* 3rd byte processed */ + assert (c8rtomb (buf, u8s[3], &s) == (size_t) 4); /* 4th byte processed */ + assert (buf[0] == (char) 0xF4); + assert (buf[1] == (char) 0x8F); + assert (buf[2] == (char) 0xBF); + assert (buf[3] == (char) 0xBF); + assert (mbsinit (&s)); + + return 0; +} + +static int +test_big5_hkscs (void) +{ + const char *locale = "zh_HK.BIG5-HKSCS"; + const char8_t *u8s; + char buf[MB_LEN_MAX]; + mbstate_t s; + + if (!setlocale (LC_ALL, locale)) + { + fprintf (stderr, "locale '%s' not available!\n", locale); + exit (1); + } + + /* A pair of two byte UTF-8 code unit sequences that map a Unicode code + point and combining character to a single double byte character. */ + u8s = (const char8_t*) u8"\u00CA\u0304"; /* U+00CA U+0304 => 0x88 0x62 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 0); /* 2nd byte processed */ + assert (c8rtomb (buf, u8s[2], &s) == (size_t) 0); /* 3rd byte processed */ + assert (c8rtomb (buf, u8s[3], &s) == (size_t) 2); /* 4th byte processed */ + assert (buf[0] == (char) 0x88); + assert (buf[1] == (char) 0x62); + assert (mbsinit (&s)); + + /* Another pair of two byte UTF-8 code unit sequences that map a Unicode code + point and combining character to a single double byte character. 
*/ + u8s = (const char8_t*) u8"\u00EA\u030C"; /* U+00EA U+030C => 0x88 0xA5 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (c8rtomb (buf, u8s[0], &s) == (size_t) 0); /* 1st byte processed */ + assert (c8rtomb (buf, u8s[1], &s) == (size_t) 0); /* 2nd byte processed */ + assert (c8rtomb (buf, u8s[2], &s) == (size_t) 0); /* 3rd byte processed */ + assert (c8rtomb (buf, u8s[3], &s) == (size_t) 2); /* 4th byte processed */ + assert (buf[0] == (char) 0x88); + assert (buf[1] == (char) 0xA5); + assert (mbsinit (&s)); + + return 0; +} + +static int +do_test (void) +{ + int result = 0; + + result |= test_invalid_utf8 (); + result |= test_null_output_buffer (); + result |= test_utf8 (); + result |= test_big5_hkscs (); + + return result; +} + +#define TEST_FUNCTION do_test () +#include "../test-skeleton.c" diff --git a/wcsmbs/test-char8-type.c b/wcsmbs/test-char8-type.c new file mode 100644 index 0000000000..220dea787e --- /dev/null +++ b/wcsmbs/test-char8-type.c @@ -0,0 +1,36 @@ +/* Test char8_t types consistent with compiler. + Copyright (C) 2020 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* Ensure that char8_t support is enabled so that the char8_t typedef is + declared. */ +#define _CHAR8_T_SOURCE + +#include +#include + +/* Verify that the char8_t type is recognized. 
*/ +char8_t c8; + +static int +do_test (void) +{ + /* This is a compilation test. */ + return 0; +} + +#include diff --git a/wcsmbs/test-mbrtoc8.c b/wcsmbs/test-mbrtoc8.c new file mode 100644 index 0000000000..7bacf48166 --- /dev/null +++ b/wcsmbs/test-mbrtoc8.c @@ -0,0 +1,485 @@ +/* Test mbrtoc8. + Copyright (C) 2020 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Tom Honermann , 2020. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* Ensure that char8_t support is enabled so that the char8_t typedef and + mbrtoc8 function are declared. */ +#define _CHAR8_T_SOURCE + +/* We always want assert to be fully defined. */ +#undef NDEBUG + +#include +#include +#include +#include +#include +#include +#include +#include + +static int +test_utf8 (void) +{ + const char *locale = "de_DE.UTF-8"; + const char *mbs; + char8_t buf[1]; + mbstate_t s; + + if (!setlocale (LC_ALL, locale)) + { + fprintf (stderr, "locale '%s' not available!\n", locale); + exit (1); + } + + /* No inputs. */ + mbs = ""; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, 0, &s) == (size_t) -2); /* no input */ + assert (mbsinit (&s)); + + /* Null character. 
*/ + mbs = "\x00"; /* 0x00 => U+0000 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) 0); /* null byte written */ + mbs += 1; + assert (buf[0] == 0x00); + assert (mbsinit (&s)); + + /* First non-null character in the code point range that maps to a single + code unit. */ + mbs = "\x01"; /* 0x01 => U+0001 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) 1); /* 1st byte processed */ + mbs += 1; + assert (buf[0] == 0x01); + assert (mbsinit (&s)); + + /* Last character in the code point range that maps to a single code unit. */ + mbs = "\x7F"; /* 0x7F => U+007F */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) 1); /* 1st byte processed */ + mbs += 1; + assert (buf[0] == 0x7F); + assert (mbsinit (&s)); + + /* First character in the code point range that maps to two code units. */ + mbs = "\xC2\x80"; /* 0xC2 0x80 => U+0080 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) 2); /* 1st byte written */ + mbs += 2; + assert (buf[0] == 0xC2); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0x80); + assert (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. */ + mbs = "\xC2\x80"; /* 0xC2 0x80 => U+0080 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) 1); /* 1st byte written */ + mbs += 1; + assert (buf[0] == 0xC2); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0x80); + assert (mbsinit (&s)); + + /* Last character in the code point range that maps to two code units. 
*/ + mbs = "\xDF\xBF"; /* 0xDF 0xBF => U+07FF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) 2); /* 1st byte written */ + mbs += 2; + assert (buf[0] == 0xDF); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0xBF); + assert (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. */ + mbs = "\xDF\xBF"; /* 0xDF 0xBF => U+07FF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) 1); /* 1st byte written */ + mbs += 1; + assert (buf[0] == 0xDF); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0xBF); + assert (mbsinit (&s)); + + /* First character in the code point range that maps to three code units. */ + mbs = u8"\xE0\xA0\x80"; /* 0xE0 0xA0 0x80 => U+0800 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) 3); /* 1st byte written */ + mbs += 3; + assert (buf[0] == 0xE0); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0xA0); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0x80); + assert (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. 
*/ + mbs = u8"\xE0\xA0\x80"; /* 0xE0 0xA0 0x80 => U+0800 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) 1); /* 1st byte written */ + mbs += 1; + assert (buf[0] == 0xE0); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0xA0); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0x80); + assert (mbsinit (&s)); + + /* Last character in the code point range that maps to three code units + before the surrogate code point range. */ + mbs = "\xED\x9F\xBF"; /* 0xED 0x9F 0xBF => U+D7FF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) 3); /* 1st byte written */ + mbs += 3; + assert (buf[0] == 0xED); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0x9F); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0xBF); + assert (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. 
*/ + mbs = "\xED\x9F\xBF"; /* 0xED 0x9F 0xBF => U+D7FF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) 1); /* 1st byte written */ + mbs += 1; + assert (buf[0] == 0xED); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0x9F); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0xBF); + assert (mbsinit (&s)); + + /* First character in the code point range that maps to three code units + after the surrogate code point range. */ + mbs = "\xEE\x80\x80"; /* 0xEE 0x80 0x80 => U+E000 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) 3); /* 1st byte written */ + mbs += 3; + assert (buf[0] == 0xEE); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0x80); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0x80); + assert (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. */ + mbs = "\xEE\x80\x80"; /* 0xEE 0x80 0x80 => U+E000 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) 1); /* 1st byte written */ + mbs += 1; + assert (buf[0] == 0xEE); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0x80); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0x80); + assert (mbsinit (&s)); + + /* Not a BOM. 
*/ + mbs = "\xEF\xBB\xBF"; /* 0xEF 0xBB 0xBF => U+FEFF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) 3); /* 1st byte written */ + mbs += 3; + assert (buf[0] == 0xEF); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0xBB); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0xBF); + assert (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. */ + mbs = "\xEF\xBB\xBF"; /* 0xEF 0xBB 0xBF => U+FEFF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) 1); /* 1st byte written */ + mbs += 1; + assert (buf[0] == 0xEF); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0xBB); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0xBF); + assert (mbsinit (&s)); + + /* Replacement character. */ + mbs = "\xEF\xBF\xBD"; /* 0xEF 0xBF 0xBD => U+FFFD */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) 3); /* 1st byte written */ + mbs += 3; + assert (buf[0] == 0xEF); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0xBF); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0xBD); + assert (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. 
*/ + mbs = "\xEF\xBF\xBD"; /* 0xEF 0xBF 0xBD => U+FFFD */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) 1); /* 1st byte written */ + mbs += 1; + assert (buf[0] == 0xEF); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0xBF); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0xBD); + assert (mbsinit (&s)); + + /* Last character in the code point range that maps to three code units. */ + mbs = "\xEF\xBF\xBF"; /* 0xEF 0xBF 0xBF => U+FFFF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) 3); /* 1st byte processed */ + mbs += 3; + assert (buf[0] == 0xEF); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 2nd byte processed */ + assert (buf[0] == 0xBF); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 3rd byte processed */ + assert (buf[0] == 0xBF); + assert (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. */ + mbs = "\xEF\xBF\xBF"; /* 0xEF 0xBF 0xBF => U+FFFF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) 1); /* 1st byte written */ + mbs += 1; + assert (buf[0] == 0xEF); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0xBF); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0xBF); + assert (mbsinit (&s)); + + /* First character in the code point range that maps to four code units. 
*/ + mbs = "\xF0\x90\x80\x80"; /* 0xF0 0x90 0x80 0x80 => U+10000 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) 4); /* 1st byte written */ + mbs += 4; + assert (buf[0] == 0xF0); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0x90); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0x80); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 4th byte written */ + assert (buf[0] == 0x80); + assert (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. */ + mbs = "\xF0\x90\x80\x80"; /* 0xF0 0x90 0x80 0x80 => U+10000 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) 1); /* 1st byte written */ + mbs += 1; + assert (buf[0] == 0xF0); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0x90); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0x80); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 4th byte written */ + assert (buf[0] == 0x80); + assert (mbsinit (&s)); + + /* Last character in the code point range that maps to four code units. 
*/ + mbs = "\xF4\x8F\xBF\xBF"; /* 0xF4 0x8F 0xBF 0xBF => U+10FFFF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) 4); /* 1st byte written */ + mbs += 4; + assert (buf[0] == 0xF4); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0x8F); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0xBF); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 4th byte written */ + assert (buf[0] == 0xBF); + assert (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. */ + mbs = "\xF4\x8F\xBF\xBF"; /* 0xF4 0x8F 0xBF 0xBF => U+10FFFF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) 1); /* 1st byte written */ + mbs += 1; + assert (buf[0] == 0xF4); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0x8F); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0xBF); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 4th byte written */ + assert (buf[0] == 0xBF); + assert (mbsinit (&s)); + + return 0; +} + +static int +test_big5_hkscs (void) +{ + const char *locale = "zh_HK.BIG5-HKSCS"; + const char *mbs; + char8_t buf[1]; + mbstate_t s; + + if (!setlocale (LC_ALL, locale)) + { + fprintf (stderr, "locale '%s' not available!\n", locale); + exit (1); + } + + /* A double byte character that maps to a pair of two byte UTF-8 code unit + sequences. 
*/ + mbs = "\x88\x62"; /* 0x88 0x62 => U+00CA U+0304 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) 2); /* 1st byte written */ + mbs += 2; + assert (buf[0] == 0xC3); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0x8A); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0xCC); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 4th byte written */ + assert (buf[0] == 0x84); + assert (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. */ + mbs = "\x88\x62"; /* 0x88 0x62 => U+00CA U+0304 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) 1); /* 1st byte written */ + mbs += 1; + assert (buf[0] == 0xC3); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0x8A); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0xCC); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 4th byte written */ + assert (buf[0] == 0x84); + assert (mbsinit (&s)); + + /* Another double byte character that maps to a pair of two byte UTF-8 code + unit sequences. 
*/ + mbs = "\x88\xA5"; /* 0x88 0xA5 => U+00EA U+030C */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) 2); /* 1st byte written */ + mbs += 2; + assert (buf[0] == 0xC3); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0xAA); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0xCC); + assert (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s) == (size_t) -3); /* 4th byte written */ + assert (buf[0] == 0x8C); + assert (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. */ + mbs = "\x88\xA5"; /* 0x88 0xA5 => U+00EA U+030C */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -2); /* incomplete */ + mbs += 1; + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) 1); /* 1st byte written */ + mbs += 1; + assert (buf[0] == 0xC3); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 2nd byte written */ + assert (buf[0] == 0xAA); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 3rd byte written */ + assert (buf[0] == 0xCC); + assert (mbrtoc8 (buf, mbs, 1, &s) == (size_t) -3); /* 4th byte written */ + assert (buf[0] == 0x8C); + assert (mbsinit (&s)); + + return 0; +} + +static int +do_test (void) +{ + int result = 0; + + result |= test_utf8 (); + result |= test_big5_hkscs (); + + return result; +} + +#define TEST_FUNCTION do_test () +#include "../test-skeleton.c"