From patchwork Tue Jan 16 14:06:54 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Wakely X-Patchwork-Id: 1887083 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=RhdiFWqa; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TDrSD3dNrz1yPJ for ; Wed, 17 Jan 2024 01:07:40 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 684E13858CDB for ; Tue, 16 Jan 2024 14:07:38 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 6A4B23858D1E for ; Tue, 16 Jan 2024 14:06:58 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6A4B23858D1E Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 6A4B23858D1E Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1705414029; cv=none; b=Bzrwxg2sv3sMGW/Y+XisI1IJZxpAJyYvn1QydJFLpXUnJEonx8vGpUYkknkCSyNl614I7jenTXlmDFvtQ0S5eBUYhs69EFTSQDcWJbk8/bgtjzxgLhgjLcwpNSzlPndj84cQy78q8qJuuXqnlhkxTUShXRS/bSRJoCY6h3oW3f4= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1705414029; c=relaxed/simple; bh=B6jZEUC7V4Erv5fvfMnvhv4Z34BDiIGoS4hu+JygbH4=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=d+m12b/ZTjooWUUUkVQgkVdKMUjH9U6DLc5gznqso7H6QZvsRcT8IcC+2+xXqX/ZO1RHiLSg3VUD85ITUeFvsAa12RDSd/LaFFMTEQg/qofVRvM8X8HLaFb1pxmdCFOy8ZyYQr3lt3Zuo1K0XKkfx4fqgrUIIuiDiEHQ0SK4uJA= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1705414018; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=NlKQcOHWWp3LM5itTJuW3P2PhMrbOZZpi1VlTSnwK/0=; b=RhdiFWqadbE9Occi2PNFaqNfe2n7DLiQ3omnM0WQipdQUb8N/nXk33ZOcqlVZWaQrzcv8x LuDD9joIRRiCXOZphP2h2NIdHdWygMfELchLiBcl56L7A40h90rQPunzVTZgQfLwIfCJmJ /WbFYvO/t3eMdnu8Dpnrfni0TB6waog= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-272-tV9mbOINMWabvU4NvHq_tA-1; Tue, 16 Jan 2024 09:06:56 -0500 X-MC-Unique: tV9mbOINMWabvU4NvHq_tA-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id D97FB85CBA7; Tue, 16 Jan 2024 14:06:55 +0000 (UTC) Received: from localhost (unknown [10.42.28.185]) by smtp.corp.redhat.com (Postfix) with ESMTP id 58D1440C6EB9; Tue, 16 Jan 2024 14:06:55 +0000 (UTC) Date: Tue, 16 Jan 2024 14:06:54 +0000 From: Jonathan Wakely To: Patrick Palka Cc: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org Subject: [PATCH v4] libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318] Message-ID: References: <20240113124834.1296437-1-jwakely@redhat.com> <20240115204803.1550804-1-jwakely@redhat.com> MIME-Version: 1.0 In-Reply-To: X-Clacks-Overhead: GNU Terry Pratchett X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.2 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Disposition: inline X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org On 15/01/24 19:09 -0500, Patrick Palka wrote: >On Mon, 15 Jan 2024, Jonathan Wakely wrote: > >> I think I'm happy with this now. It has tests for all the new functions, >> and the performance of the charset alias match algorithm is improved by >> reusing part of . >> >> Tested x86_64-linux. >> >> -- >8 -- >> >> This is another C++26 change, approved in Varna 2022. We require a new > >2023? Oops, yes. >> @@ -1056,6 +1057,54 @@ inline namespace __v15_1_0 >> __literal_encoding_is_utf8() >> { return __literal_encoding_is_unicode(); } >> >> + consteval bool >> + __literal_encoding_is_extended_ascii() >> + { >> + return '0' == 0x30 && 'A' == 0x41 && 'Z' == 0x5a >> + && 'a' == 0x61 && 'z' == 0x7a; >> + } > >This function seems unused now. It is. I thought it might be a useful utility to have in the unicode file, but it's easy enough to recreate if needed. I'll remove it. >> + >> + // https://www.unicode.org/reports/tr22/tr22-8.html#Charset_Alias_Matching >> + constexpr bool >> + __charset_alias_match(string_view __a, string_view __b) >> + { >> + // Map alphanumeric chars to their base 64 value, everything else to 127. >> + auto __map = [](char __c, bool& __num) -> unsigned char { >> + using __detail::__from_chars_alnum_to_val_table; >> + if (__c == '0') [[unlikely]] >> + return __num ? 0 : 127; >> + auto __v = __from_chars_alnum_to_val_table::value.__data[__c]; > >Maybe it'd be more concise to use the accessor function >__from_chars_alnum_to_val here (so e.g. the caller doesn't have to pass >_DecOnly=false explicitly)? Yup, done. > + >> + class _Iterator >> + { >> + public: >> + using value_type = const char*; >> + using reference = const char*; >> + using difference_type = int; >> + constexpr value_type operator*() const; > >Are these defined out-of-line to avoid excessive indentation? Perhaps >we could define _Iterator (and aliases_view::begin) out-of-line instead? Not due to indentation, but because the text_encoding class itself became even more unreadable with this code in the middle of it. I realise that the huge list of enumerators in the id enum already makes it unwieldy, but at least that's just a repetitive list of constants. The iterator function bodies don't really add anything to the understanding of the std::text_encoding API (it's an iterator, it has the expected iterator operations, the details of how they're implemented don't matter to most casual readers). But moving the whole _Iterator class to the end serves the same purpose. I didn't try moving it to the end because I was concerned about Clang giving bogus errors about constexpr functions not being defined, like this recent one: https://github.com/llvm/llvm-project/issues/73232 Although if that was going to be a problem, defining the members out-of-line would probably already fail. I've moved it (and the begin() function that uses it). > >> + constexpr _Iterator& operator++(); >> + constexpr _Iterator& operator--(); >> + constexpr _Iterator operator++(int); >> + constexpr _Iterator operator--(int); >> + constexpr value_type operator[](difference_type) const; >> + constexpr _Iterator& operator+=(difference_type); >> + constexpr _Iterator& operator-=(difference_type); >> + constexpr difference_type operator-(const _Iterator&) const; >> + constexpr bool operator==(const _Iterator&) const = default; >> + constexpr bool operator==(_Sentinel) const noexcept; >> + constexpr strong_ordering operator<=>(const _Iterator&) const; >> + >> + friend _Iterator >> + operator+(_Iterator __i, difference_type __n) > >constexpr? Fixed. I've added tests that all iterator ops are usable in constant expressions, which found a bug in operator+= (it didn't let you increment one past the end of the range). Thanks for the review. Patch v4 attached. I'm testing it now. commit 175ab4d9474d6d6f835714dc4c261249b75f98a5 Author: Jonathan Wakely Date: Mon Jan 15 15:42:50 2024 libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318] This is another C++26 change, approved in Varna 2023. We require a new static array of data that is extracted from the IANA Character Sets database. A new Python script to generate a header from the IANA CSV file is added. The text_encoding class is basically just a pointer to an {ID,name} pair in the static array. The aliases view is also just the same pointer (or empty), and the view's iterator moves forwards and backwards in the array while the array elements have the same ID (or to one element further, for a past-the-end iterator). Because those iterators refer to a global array that never goes out of scope, there's no reason they should every produce undefined behaviour or indeterminate values. They should either have well-defined behaviour, or abort. The overhead of ensuring those properties is pretty low, so seems worth it. This means that an aliases_view iterator should never be able to access out-of-bounds. A non-value-initialized iterator always points to an element of the static array even when not dereferenceable (the array has unreachable entries at the start and end, which means that even a past-the-end iterator for the last encoding in the array still points to valid memory). Dereferencing an iterator can always return a valid array element, or "" for a non-dereferenceable iterator (but doing so will abort when assertions are enabled). In the language being proposed for C++26, dereferencing an invalid iterator erroneously returns "". Attempting to increment/decrement past the last/first element in the view is erroneously a no-op, so aborts when assertions are enabled, and doesn't change value otherwise. libstdc++-v3/ChangeLog: PR libstdc++/113318 * acinclude.m4 (GLIBCXX_CONFIGURE): Add c++26 directory. (GLIBCXX_CHECK_TEXT_ENCODING): Define. * config.h.in: Regenerate. * configure: Regenerate. * configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING. * include/Makefile.am: Add new headers. * include/Makefile.in: Regenerate. * include/bits/locale_classes.h (locale::encoding): Declare new member function. * include/bits/unicode.h (__charset_alias_match): New function. * include/bits/text_encoding-data.h: New file. * include/bits/version.def (text_encoding): Define. * include/bits/version.h: Regenerate. * include/std/text_encoding: New file. * src/Makefile.am: Add new subdirectory. * src/Makefile.in: Regenerate. * src/c++26/Makefile.am: New file. * src/c++26/Makefile.in: New file. * src/c++26/text_encoding.cc: New file. * src/experimental/Makefile.am: Include c++26 convenience library. * src/experimental/Makefile.in: Regenerate. * python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New printer. * scripts/gen_text_encoding_data.py: New file. * testsuite/22_locale/locale/encoding.cc: New test. * testsuite/ext/unicode/charset_alias_match.cc: New test. * testsuite/std/text_encoding/cons.cc: New test. * testsuite/std/text_encoding/members.cc: New test. * testsuite/std/text_encoding/requirements.cc: New test. diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4 index e7cbf0fcf96..f9ba7ef744b 100644 --- a/libstdc++-v3/acinclude.m4 +++ b/libstdc++-v3/acinclude.m4 @@ -49,7 +49,7 @@ AC_DEFUN([GLIBCXX_CONFIGURE], [ # Keep these sync'd with the list in Makefile.am. The first provides an # expandable list at autoconf time; the second provides an expandable list # (i.e., shell variable) at configure time. - m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 src/c++17 src/c++20 src/c++23 src/filesystem src/libbacktrace src/experimental doc po testsuite python]) + m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 src/c++17 src/c++20 src/c++23 src/c++26 src/filesystem src/libbacktrace src/experimental doc po testsuite python]) SUBDIRS='glibcxx_SUBDIRS' # These need to be absolute paths, yet at the same time need to @@ -5821,6 +5821,34 @@ AC_LANG_SAVE AC_LANG_RESTORE ]) +dnl +dnl Check whether the dependencies for std::text_encoding are available. +dnl +dnl Defines: +dnl _GLIBCXX_USE_NL_LANGINFO_L if nl_langinfo_l is in . +dnl +AC_DEFUN([GLIBCXX_CHECK_TEXT_ENCODING], [ +AC_LANG_SAVE + AC_LANG_CPLUSPLUS + + AC_MSG_CHECKING([whether nl_langinfo_l is defined in ]) + AC_TRY_COMPILE([ + #include + #include + ],[ + locale_t loc = newlocale(LC_ALL_MASK, "", (locale_t)0); + const char* enc = nl_langinfo_l(CODESET, loc); + freelocale(loc); + ], [ac_nl_langinfo_l=yes], [ac_nl_langinfo_l=no]) + AC_MSG_RESULT($ac_nl_langinfo_l) + if test "$ac_nl_langinfo_l" = yes; then + AC_DEFINE_UNQUOTED(_GLIBCXX_USE_NL_LANGINFO_L, 1, + [Define if nl_langinfo_l should be used for std::text_encoding.]) + fi + + AC_LANG_RESTORE +]) + # Macros from the top-level gcc directory. m4_include([../config/gc++filt.m4]) m4_include([../config/tls.m4]) diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac index c8b36333019..c68cac4f345 100644 --- a/libstdc++-v3/configure.ac +++ b/libstdc++-v3/configure.ac @@ -557,6 +557,9 @@ GLIBCXX_CHECK_INIT_PRIORITY # For __basic_file::native_handle() GLIBCXX_CHECK_FILEBUF_NATIVE_HANDLES +# For std::text_encoding +GLIBCXX_CHECK_TEXT_ENCODING + # Define documentation rules conditionally. # See if makeinfo has been installed and is modern enough diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am index c6d6a24eb9e..64152351ed0 100644 --- a/libstdc++-v3/include/Makefile.am +++ b/libstdc++-v3/include/Makefile.am @@ -104,6 +104,7 @@ std_headers = \ ${std_srcdir}/streambuf \ ${std_srcdir}/string \ ${std_srcdir}/system_error \ + ${std_srcdir}/text_encoding \ ${std_srcdir}/thread \ ${std_srcdir}/unordered_map \ ${std_srcdir}/unordered_set \ @@ -159,6 +160,7 @@ bits_freestanding = \ ${bits_srcdir}/stl_raw_storage_iter.h \ ${bits_srcdir}/stl_relops.h \ ${bits_srcdir}/stl_uninitialized.h \ + ${bits_srcdir}/text_encoding-data.h \ ${bits_srcdir}/version.h \ ${bits_srcdir}/string_view.tcc \ ${bits_srcdir}/unicode.h \ diff --git a/libstdc++-v3/include/bits/locale_classes.h b/libstdc++-v3/include/bits/locale_classes.h index 621f2a29f50..a2e94217006 100644 --- a/libstdc++-v3/include/bits/locale_classes.h +++ b/libstdc++-v3/include/bits/locale_classes.h @@ -40,6 +40,10 @@ #include #include +#ifdef __glibcxx_text_encoding +#include +#endif + namespace std _GLIBCXX_VISIBILITY(default) { _GLIBCXX_BEGIN_NAMESPACE_VERSION @@ -248,6 +252,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION string name() const; +#ifdef __glibcxx_text_encoding +# if __CHAR_BIT__ == 8 + text_encoding + encoding() const; +# else + text_encoding + encoding() const = delete; +# endif +#endif + /** * @brief Locale equality. * diff --git a/libstdc++-v3/include/bits/text_encoding-data.h b/libstdc++-v3/include/bits/text_encoding-data.h new file mode 100644 index 00000000000..7ac2e9dc3d9 --- /dev/null +++ b/libstdc++-v3/include/bits/text_encoding-data.h @@ -0,0 +1,902 @@ +// Generated by gen_text_encoding_data.py, do not edit. + +#ifndef _GLIBCXX_GET_ENCODING_DATA +# error "This is not a public header, do not include it directly" +#endif + + { 3, "US-ASCII" }, + { 3, "iso-ir-6" }, + { 3, "ANSI_X3.4-1968" }, + { 3, "ANSI_X3.4-1986" }, + { 3, "ISO_646.irv:1991" }, + { 3, "ISO646-US" }, + { 3, "us" }, + { 3, "IBM367" }, + { 3, "cp367" }, + { 3, "csASCII" }, + { 4, "ISO_8859-1:1987" }, + { 4, "iso-ir-100" }, + { 4, "ISO_8859-1" }, + { 4, "ISO-8859-1" }, + { 4, "latin1" }, + { 4, "l1" }, + { 4, "IBM819" }, + { 4, "CP819" }, + { 4, "csISOLatin1" }, + { 5, "ISO_8859-2:1987" }, + { 5, "iso-ir-101" }, + { 5, "ISO_8859-2" }, + { 5, "ISO-8859-2" }, + { 5, "latin2" }, + { 5, "l2" }, + { 5, "csISOLatin2" }, + { 6, "ISO_8859-3:1988" }, + { 6, "iso-ir-109" }, + { 6, "ISO_8859-3" }, + { 6, "ISO-8859-3" }, + { 6, "latin3" }, + { 6, "l3" }, + { 6, "csISOLatin3" }, + { 7, "ISO_8859-4:1988" }, + { 7, "iso-ir-110" }, + { 7, "ISO_8859-4" }, + { 7, "ISO-8859-4" }, + { 7, "latin4" }, + { 7, "l4" }, + { 7, "csISOLatin4" }, + { 8, "ISO_8859-5:1988" }, + { 8, "iso-ir-144" }, + { 8, "ISO_8859-5" }, + { 8, "ISO-8859-5" }, + { 8, "cyrillic" }, + { 8, "csISOLatinCyrillic" }, + { 9, "ISO_8859-6:1987" }, + { 9, "iso-ir-127" }, + { 9, "ISO_8859-6" }, + { 9, "ISO-8859-6" }, + { 9, "ECMA-114" }, + { 9, "ASMO-708" }, + { 9, "arabic" }, + { 9, "csISOLatinArabic" }, + { 10, "ISO_8859-7:1987" }, + { 10, "iso-ir-126" }, + { 10, "ISO_8859-7" }, + { 10, "ISO-8859-7" }, + { 10, "ELOT_928" }, + { 10, "ECMA-118" }, + { 10, "greek" }, + { 10, "greek8" }, + { 10, "csISOLatinGreek" }, + { 11, "ISO_8859-8:1988" }, + { 11, "iso-ir-138" }, + { 11, "ISO_8859-8" }, + { 11, "ISO-8859-8" }, + { 11, "hebrew" }, + { 11, "csISOLatinHebrew" }, + { 12, "ISO_8859-9:1989" }, + { 12, "iso-ir-148" }, + { 12, "ISO_8859-9" }, + { 12, "ISO-8859-9" }, + { 12, "latin5" }, + { 12, "l5" }, + { 12, "csISOLatin5" }, + { 13, "ISO-8859-10" }, + { 13, "iso-ir-157" }, + { 13, "l6" }, + { 13, "ISO_8859-10:1992" }, + { 13, "csISOLatin6" }, + { 13, "latin6" }, + { 14, "ISO_6937-2-add" }, + { 14, "iso-ir-142" }, + { 14, "csISOTextComm" }, + { 15, "JIS_X0201" }, + { 15, "X0201" }, + { 15, "csHalfWidthKatakana" }, + { 16, "JIS_Encoding" }, + { 16, "csJISEncoding" }, + { 17, "Shift_JIS" }, + { 17, "MS_Kanji" }, + { 17, "csShiftJIS" }, + { 18, "Extended_UNIX_Code_Packed_Format_for_Japanese" }, + { 18, "csEUCPkdFmtJapanese" }, + { 18, "EUC-JP" }, + { 19, "Extended_UNIX_Code_Fixed_Width_for_Japanese" }, + { 19, "csEUCFixWidJapanese" }, + { 20, "BS_4730" }, + { 20, "iso-ir-4" }, + { 20, "ISO646-GB" }, + { 20, "gb" }, + { 20, "uk" }, + { 20, "csISO4UnitedKingdom" }, + { 21, "SEN_850200_C" }, + { 21, "iso-ir-11" }, + { 21, "ISO646-SE2" }, + { 21, "se2" }, + { 21, "csISO11SwedishForNames" }, + { 22, "IT" }, + { 22, "iso-ir-15" }, + { 22, "ISO646-IT" }, + { 22, "csISO15Italian" }, + { 23, "ES" }, + { 23, "iso-ir-17" }, + { 23, "ISO646-ES" }, + { 23, "csISO17Spanish" }, + { 24, "DIN_66003" }, + { 24, "iso-ir-21" }, + { 24, "de" }, + { 24, "ISO646-DE" }, + { 24, "csISO21German" }, + { 25, "NS_4551-1" }, + { 25, "iso-ir-60" }, + { 25, "ISO646-NO" }, + { 25, "no" }, + { 25, "csISO60DanishNorwegian" }, + { 25, "csISO60Norwegian1" }, + { 26, "NF_Z_62-010" }, + { 26, "iso-ir-69" }, + { 26, "ISO646-FR" }, + { 26, "fr" }, + { 26, "csISO69French" }, + { 27, "ISO-10646-UTF-1" }, + { 27, "csISO10646UTF1" }, + { 28, "ISO_646.basic:1983" }, + { 28, "ref" }, + { 28, "csISO646basic1983" }, + { 29, "INVARIANT" }, + { 29, "csINVARIANT" }, + { 30, "ISO_646.irv:1983" }, + { 30, "iso-ir-2" }, + { 30, "irv" }, + { 30, "csISO2IntlRefVersion" }, + { 31, "NATS-SEFI" }, + { 31, "iso-ir-8-1" }, + { 31, "csNATSSEFI" }, + { 32, "NATS-SEFI-ADD" }, + { 32, "iso-ir-8-2" }, + { 32, "csNATSSEFIADD" }, + { 35, "SEN_850200_B" }, + { 35, "iso-ir-10" }, + { 35, "FI" }, + { 35, "ISO646-FI" }, + { 35, "ISO646-SE" }, + { 35, "se" }, + { 35, "csISO10Swedish" }, + { 36, "KS_C_5601-1987" }, + { 36, "iso-ir-149" }, + { 36, "KS_C_5601-1989" }, + { 36, "KSC_5601" }, + { 36, "korean" }, + { 36, "csKSC56011987" }, + { 37, "ISO-2022-KR" }, + { 37, "csISO2022KR" }, + { 38, "EUC-KR" }, + { 38, "csEUCKR" }, + { 39, "ISO-2022-JP" }, + { 39, "csISO2022JP" }, + { 40, "ISO-2022-JP-2" }, + { 40, "csISO2022JP2" }, + { 41, "JIS_C6220-1969-jp" }, + { 41, "JIS_C6220-1969" }, + { 41, "iso-ir-13" }, + { 41, "katakana" }, + { 41, "x0201-7" }, + { 41, "csISO13JISC6220jp" }, + { 42, "JIS_C6220-1969-ro" }, + { 42, "iso-ir-14" }, + { 42, "jp" }, + { 42, "ISO646-JP" }, + { 42, "csISO14JISC6220ro" }, + { 43, "PT" }, + { 43, "iso-ir-16" }, + { 43, "ISO646-PT" }, + { 43, "csISO16Portuguese" }, + { 44, "greek7-old" }, + { 44, "iso-ir-18" }, + { 44, "csISO18Greek7Old" }, + { 45, "latin-greek" }, + { 45, "iso-ir-19" }, + { 45, "csISO19LatinGreek" }, + { 46, "NF_Z_62-010_(1973)" }, + { 46, "iso-ir-25" }, + { 46, "ISO646-FR1" }, + { 46, "csISO25French" }, + { 47, "Latin-greek-1" }, + { 47, "iso-ir-27" }, + { 47, "csISO27LatinGreek1" }, + { 48, "ISO_5427" }, + { 48, "iso-ir-37" }, + { 48, "csISO5427Cyrillic" }, + { 49, "JIS_C6226-1978" }, + { 49, "iso-ir-42" }, + { 49, "csISO42JISC62261978" }, + { 50, "BS_viewdata" }, + { 50, "iso-ir-47" }, + { 50, "csISO47BSViewdata" }, + { 51, "INIS" }, + { 51, "iso-ir-49" }, + { 51, "csISO49INIS" }, + { 52, "INIS-8" }, + { 52, "iso-ir-50" }, + { 52, "csISO50INIS8" }, + { 53, "INIS-cyrillic" }, + { 53, "iso-ir-51" }, + { 53, "csISO51INISCyrillic" }, + { 54, "ISO_5427:1981" }, + { 54, "iso-ir-54" }, + { 54, "ISO5427Cyrillic1981" }, + { 54, "csISO54271981" }, + { 55, "ISO_5428:1980" }, + { 55, "iso-ir-55" }, + { 55, "csISO5428Greek" }, + { 56, "GB_1988-80" }, + { 56, "iso-ir-57" }, + { 56, "cn" }, + { 56, "ISO646-CN" }, + { 56, "csISO57GB1988" }, + { 57, "GB_2312-80" }, + { 57, "iso-ir-58" }, + { 57, "chinese" }, + { 57, "csISO58GB231280" }, + { 58, "NS_4551-2" }, + { 58, "ISO646-NO2" }, + { 58, "iso-ir-61" }, + { 58, "no2" }, + { 58, "csISO61Norwegian2" }, + { 59, "videotex-suppl" }, + { 59, "iso-ir-70" }, + { 59, "csISO70VideotexSupp1" }, + { 60, "PT2" }, + { 60, "iso-ir-84" }, + { 60, "ISO646-PT2" }, + { 60, "csISO84Portuguese2" }, + { 61, "ES2" }, + { 61, "iso-ir-85" }, + { 61, "ISO646-ES2" }, + { 61, "csISO85Spanish2" }, + { 62, "MSZ_7795.3" }, + { 62, "iso-ir-86" }, + { 62, "ISO646-HU" }, + { 62, "hu" }, + { 62, "csISO86Hungarian" }, + { 63, "JIS_C6226-1983" }, + { 63, "iso-ir-87" }, + { 63, "x0208" }, + { 63, "JIS_X0208-1983" }, + { 63, "csISO87JISX0208" }, + { 64, "greek7" }, + { 64, "iso-ir-88" }, + { 64, "csISO88Greek7" }, + { 65, "ASMO_449" }, + { 65, "ISO_9036" }, + { 65, "arabic7" }, + { 65, "iso-ir-89" }, + { 65, "csISO89ASMO449" }, + { 66, "iso-ir-90" }, + { 66, "csISO90" }, + { 67, "JIS_C6229-1984-a" }, + { 67, "iso-ir-91" }, + { 67, "jp-ocr-a" }, + { 67, "csISO91JISC62291984a" }, + { 68, "JIS_C6229-1984-b" }, + { 68, "iso-ir-92" }, + { 68, "ISO646-JP-OCR-B" }, + { 68, "jp-ocr-b" }, + { 68, "csISO92JISC62991984b" }, + { 69, "JIS_C6229-1984-b-add" }, + { 69, "iso-ir-93" }, + { 69, "jp-ocr-b-add" }, + { 69, "csISO93JIS62291984badd" }, + { 70, "JIS_C6229-1984-hand" }, + { 70, "iso-ir-94" }, + { 70, "jp-ocr-hand" }, + { 70, "csISO94JIS62291984hand" }, + { 71, "JIS_C6229-1984-hand-add" }, + { 71, "iso-ir-95" }, + { 71, "jp-ocr-hand-add" }, + { 71, "csISO95JIS62291984handadd" }, + { 72, "JIS_C6229-1984-kana" }, + { 72, "iso-ir-96" }, + { 72, "csISO96JISC62291984kana" }, + { 73, "ISO_2033-1983" }, + { 73, "iso-ir-98" }, + { 73, "e13b" }, + { 73, "csISO2033" }, + { 74, "ANSI_X3.110-1983" }, + { 74, "iso-ir-99" }, + { 74, "CSA_T500-1983" }, + { 74, "NAPLPS" }, + { 74, "csISO99NAPLPS" }, + { 75, "T.61-7bit" }, + { 75, "iso-ir-102" }, + { 75, "csISO102T617bit" }, + { 76, "T.61-8bit" }, + { 76, "T.61" }, + { 76, "iso-ir-103" }, + { 76, "csISO103T618bit" }, + { 77, "ECMA-cyrillic" }, + { 77, "iso-ir-111" }, + { 77, "KOI8-E" }, + { 77, "csISO111ECMACyrillic" }, + { 78, "CSA_Z243.4-1985-1" }, + { 78, "iso-ir-121" }, + { 78, "ISO646-CA" }, + { 78, "csa7-1" }, + { 78, "csa71" }, + { 78, "ca" }, + { 78, "csISO121Canadian1" }, + { 79, "CSA_Z243.4-1985-2" }, + { 79, "iso-ir-122" }, + { 79, "ISO646-CA2" }, + { 79, "csa7-2" }, + { 79, "csa72" }, + { 79, "csISO122Canadian2" }, + { 80, "CSA_Z243.4-1985-gr" }, + { 80, "iso-ir-123" }, + { 80, "csISO123CSAZ24341985gr" }, + { 81, "ISO_8859-6-E" }, + { 81, "csISO88596E" }, + { 81, "ISO-8859-6-E" }, + { 82, "ISO_8859-6-I" }, + { 82, "csISO88596I" }, + { 82, "ISO-8859-6-I" }, + { 83, "T.101-G2" }, + { 83, "iso-ir-128" }, + { 83, "csISO128T101G2" }, + { 84, "ISO_8859-8-E" }, + { 84, "csISO88598E" }, + { 84, "ISO-8859-8-E" }, + { 85, "ISO_8859-8-I" }, + { 85, "csISO88598I" }, + { 85, "ISO-8859-8-I" }, + { 86, "CSN_369103" }, + { 86, "iso-ir-139" }, + { 86, "csISO139CSN369103" }, + { 87, "JUS_I.B1.002" }, + { 87, "iso-ir-141" }, + { 87, "ISO646-YU" }, + { 87, "js" }, + { 87, "yu" }, + { 87, "csISO141JUSIB1002" }, + { 88, "IEC_P27-1" }, + { 88, "iso-ir-143" }, + { 88, "csISO143IECP271" }, + { 89, "JUS_I.B1.003-serb" }, + { 89, "iso-ir-146" }, + { 89, "serbian" }, + { 89, "csISO146Serbian" }, + { 90, "JUS_I.B1.003-mac" }, + { 90, "macedonian" }, + { 90, "iso-ir-147" }, + { 90, "csISO147Macedonian" }, + { 91, "greek-ccitt" }, + { 91, "iso-ir-150" }, + { 91, "csISO150" }, + { 91, "csISO150GreekCCITT" }, + { 92, "NC_NC00-10:81" }, + { 92, "cuba" }, + { 92, "iso-ir-151" }, + { 92, "ISO646-CU" }, + { 92, "csISO151Cuba" }, + { 93, "ISO_6937-2-25" }, + { 93, "iso-ir-152" }, + { 93, "csISO6937Add" }, + { 94, "GOST_19768-74" }, + { 94, "ST_SEV_358-88" }, + { 94, "iso-ir-153" }, + { 94, "csISO153GOST1976874" }, + { 95, "ISO_8859-supp" }, + { 95, "iso-ir-154" }, + { 95, "latin1-2-5" }, + { 95, "csISO8859Supp" }, + { 96, "ISO_10367-box" }, + { 96, "iso-ir-155" }, + { 96, "csISO10367Box" }, + { 97, "latin-lap" }, + { 97, "lap" }, + { 97, "iso-ir-158" }, + { 97, "csISO158Lap" }, + { 98, "JIS_X0212-1990" }, + { 98, "x0212" }, + { 98, "iso-ir-159" }, + { 98, "csISO159JISX02121990" }, + { 99, "DS_2089" }, + { 99, "DS2089" }, + { 99, "ISO646-DK" }, + { 99, "dk" }, + { 99, "csISO646Danish" }, + { 100, "us-dk" }, + { 100, "csUSDK" }, + { 101, "dk-us" }, + { 101, "csDKUS" }, + { 102, "KSC5636" }, + { 102, "ISO646-KR" }, + { 102, "csKSC5636" }, + { 103, "UNICODE-1-1-UTF-7" }, + { 103, "csUnicode11UTF7" }, + { 104, "ISO-2022-CN" }, + { 104, "csISO2022CN" }, + { 105, "ISO-2022-CN-EXT" }, + { 105, "csISO2022CNEXT" }, +#define _GLIBCXX_TEXT_ENCODING_UTF8_OFFSET 413 + { 106, "UTF-8" }, + { 106, "csUTF8" }, + { 109, "ISO-8859-13" }, + { 109, "csISO885913" }, + { 110, "ISO-8859-14" }, + { 110, "iso-ir-199" }, + { 110, "ISO_8859-14:1998" }, + { 110, "ISO_8859-14" }, + { 110, "latin8" }, + { 110, "iso-celtic" }, + { 110, "l8" }, + { 110, "csISO885914" }, + { 111, "ISO-8859-15" }, + { 111, "ISO_8859-15" }, + { 111, "Latin-9" }, + { 111, "csISO885915" }, + { 112, "ISO-8859-16" }, + { 112, "iso-ir-226" }, + { 112, "ISO_8859-16:2001" }, + { 112, "ISO_8859-16" }, + { 112, "latin10" }, + { 112, "l10" }, + { 112, "csISO885916" }, + { 113, "GBK" }, + { 113, "CP936" }, + { 113, "MS936" }, + { 113, "windows-936" }, + { 113, "csGBK" }, + { 114, "GB18030" }, + { 114, "csGB18030" }, + { 115, "OSD_EBCDIC_DF04_15" }, + { 115, "csOSDEBCDICDF0415" }, + { 116, "OSD_EBCDIC_DF03_IRV" }, + { 116, "csOSDEBCDICDF03IRV" }, + { 117, "OSD_EBCDIC_DF04_1" }, + { 117, "csOSDEBCDICDF041" }, + { 118, "ISO-11548-1" }, + { 118, "ISO_11548-1" }, + { 118, "ISO_TR_11548-1" }, + { 118, "csISO115481" }, + { 119, "KZ-1048" }, + { 119, "STRK1048-2002" }, + { 119, "RK1048" }, + { 119, "csKZ1048" }, + { 1000, "ISO-10646-UCS-2" }, + { 1000, "csUnicode" }, + { 1001, "ISO-10646-UCS-4" }, + { 1001, "csUCS4" }, + { 1002, "ISO-10646-UCS-Basic" }, + { 1002, "csUnicodeASCII" }, + { 1003, "ISO-10646-Unicode-Latin1" }, + { 1003, "csUnicodeLatin1" }, + { 1003, "ISO-10646" }, + { 1004, "ISO-10646-J-1" }, + { 1004, "csUnicodeJapanese" }, + { 1005, "ISO-Unicode-IBM-1261" }, + { 1005, "csUnicodeIBM1261" }, + { 1006, "ISO-Unicode-IBM-1268" }, + { 1006, "csUnicodeIBM1268" }, + { 1007, "ISO-Unicode-IBM-1276" }, + { 1007, "csUnicodeIBM1276" }, + { 1008, "ISO-Unicode-IBM-1264" }, + { 1008, "csUnicodeIBM1264" }, + { 1009, "ISO-Unicode-IBM-1265" }, + { 1009, "csUnicodeIBM1265" }, + { 1010, "UNICODE-1-1" }, + { 1010, "csUnicode11" }, + { 1011, "SCSU" }, + { 1011, "csSCSU" }, + { 1012, "UTF-7" }, + { 1012, "csUTF7" }, + { 1013, "UTF-16BE" }, + { 1013, "csUTF16BE" }, + { 1014, "UTF-16LE" }, + { 1014, "csUTF16LE" }, + { 1015, "UTF-16" }, + { 1015, "csUTF16" }, + { 1016, "CESU-8" }, + { 1016, "csCESU8" }, + { 1016, "csCESU-8" }, + { 1017, "UTF-32" }, + { 1017, "csUTF32" }, + { 1018, "UTF-32BE" }, + { 1018, "csUTF32BE" }, + { 1019, "UTF-32LE" }, + { 1019, "csUTF32LE" }, + { 1020, "BOCU-1" }, + { 1020, "csBOCU1" }, + { 1020, "csBOCU-1" }, + { 1021, "UTF-7-IMAP" }, + { 1021, "csUTF7IMAP" }, + { 2000, "ISO-8859-1-Windows-3.0-Latin-1" }, + { 2000, "csWindows30Latin1" }, + { 2001, "ISO-8859-1-Windows-3.1-Latin-1" }, + { 2001, "csWindows31Latin1" }, + { 2002, "ISO-8859-2-Windows-Latin-2" }, + { 2002, "csWindows31Latin2" }, + { 2003, "ISO-8859-9-Windows-Latin-5" }, + { 2003, "csWindows31Latin5" }, + { 2004, "hp-roman8" }, + { 2004, "roman8" }, + { 2004, "r8" }, + { 2004, "csHPRoman8" }, + { 2005, "Adobe-Standard-Encoding" }, + { 2005, "csAdobeStandardEncoding" }, + { 2006, "Ventura-US" }, + { 2006, "csVenturaUS" }, + { 2007, "Ventura-International" }, + { 2007, "csVenturaInternational" }, + { 2008, "DEC-MCS" }, + { 2008, "dec" }, + { 2008, "csDECMCS" }, + { 2009, "IBM850" }, + { 2009, "cp850" }, + { 2009, "850" }, + { 2009, "csPC850Multilingual" }, + { 2010, "IBM852" }, + { 2010, "cp852" }, + { 2010, "852" }, + { 2010, "csPCp852" }, + { 2011, "IBM437" }, + { 2011, "cp437" }, + { 2011, "437" }, + { 2011, "csPC8CodePage437" }, + { 2012, "PC8-Danish-Norwegian" }, + { 2012, "csPC8DanishNorwegian" }, + { 2013, "IBM862" }, + { 2013, "cp862" }, + { 2013, "862" }, + { 2013, "csPC862LatinHebrew" }, + { 2014, "PC8-Turkish" }, + { 2014, "csPC8Turkish" }, + { 2015, "IBM-Symbols" }, + { 2015, "csIBMSymbols" }, + { 2016, "IBM-Thai" }, + { 2016, "csIBMThai" }, + { 2017, "HP-Legal" }, + { 2017, "csHPLegal" }, + { 2018, "HP-Pi-font" }, + { 2018, "csHPPiFont" }, + { 2019, "HP-Math8" }, + { 2019, "csHPMath8" }, + { 2020, "Adobe-Symbol-Encoding" }, + { 2020, "csHPPSMath" }, + { 2021, "HP-DeskTop" }, + { 2021, "csHPDesktop" }, + { 2022, "Ventura-Math" }, + { 2022, "csVenturaMath" }, + { 2023, "Microsoft-Publishing" }, + { 2023, "csMicrosoftPublishing" }, + { 2024, "Windows-31J" }, + { 2024, "csWindows31J" }, + { 2025, "GB2312" }, + { 2025, "csGB2312" }, + { 2026, "Big5" }, + { 2026, "csBig5" }, + { 2027, "macintosh" }, + { 2027, "mac" }, + { 2027, "csMacintosh" }, + { 2028, "IBM037" }, + { 2028, "cp037" }, + { 2028, "ebcdic-cp-us" }, + { 2028, "ebcdic-cp-ca" }, + { 2028, "ebcdic-cp-wt" }, + { 2028, "ebcdic-cp-nl" }, + { 2028, "csIBM037" }, + { 2029, "IBM038" }, + { 2029, "EBCDIC-INT" }, + { 2029, "cp038" }, + { 2029, "csIBM038" }, + { 2030, "IBM273" }, + { 2030, "CP273" }, + { 2030, "csIBM273" }, + { 2031, "IBM274" }, + { 2031, "EBCDIC-BE" }, + { 2031, "CP274" }, + { 2031, "csIBM274" }, + { 2032, "IBM275" }, + { 2032, "EBCDIC-BR" }, + { 2032, "cp275" }, + { 2032, "csIBM275" }, + { 2033, "IBM277" }, + { 2033, "EBCDIC-CP-DK" }, + { 2033, "EBCDIC-CP-NO" }, + { 2033, "csIBM277" }, + { 2034, "IBM278" }, + { 2034, "CP278" }, + { 2034, "ebcdic-cp-fi" }, + { 2034, "ebcdic-cp-se" }, + { 2034, "csIBM278" }, + { 2035, "IBM280" }, + { 2035, "CP280" }, + { 2035, "ebcdic-cp-it" }, + { 2035, "csIBM280" }, + { 2036, "IBM281" }, + { 2036, "EBCDIC-JP-E" }, + { 2036, "cp281" }, + { 2036, "csIBM281" }, + { 2037, "IBM284" }, + { 2037, "CP284" }, + { 2037, "ebcdic-cp-es" }, + { 2037, "csIBM284" }, + { 2038, "IBM285" }, + { 2038, "CP285" }, + { 2038, "ebcdic-cp-gb" }, + { 2038, "csIBM285" }, + { 2039, "IBM290" }, + { 2039, "cp290" }, + { 2039, "EBCDIC-JP-kana" }, + { 2039, "csIBM290" }, + { 2040, "IBM297" }, + { 2040, "cp297" }, + { 2040, "ebcdic-cp-fr" }, + { 2040, "csIBM297" }, + { 2041, "IBM420" }, + { 2041, "cp420" }, + { 2041, "ebcdic-cp-ar1" }, + { 2041, "csIBM420" }, + { 2042, "IBM423" }, + { 2042, "cp423" }, + { 2042, "ebcdic-cp-gr" }, + { 2042, "csIBM423" }, + { 2043, "IBM424" }, + { 2043, "cp424" }, + { 2043, "ebcdic-cp-he" }, + { 2043, "csIBM424" }, + { 2044, "IBM500" }, + { 2044, "CP500" }, + { 2044, "ebcdic-cp-be" }, + { 2044, "ebcdic-cp-ch" }, + { 2044, "csIBM500" }, + { 2045, "IBM851" }, + { 2045, "cp851" }, + { 2045, "851" }, + { 2045, "csIBM851" }, + { 2046, "IBM855" }, + { 2046, "cp855" }, + { 2046, "855" }, + { 2046, "csIBM855" }, + { 2047, "IBM857" }, + { 2047, "cp857" }, + { 2047, "857" }, + { 2047, "csIBM857" }, + { 2048, "IBM860" }, + { 2048, "cp860" }, + { 2048, "860" }, + { 2048, "csIBM860" }, + { 2049, "IBM861" }, + { 2049, "cp861" }, + { 2049, "861" }, + { 2049, "cp-is" }, + { 2049, "csIBM861" }, + { 2050, "IBM863" }, + { 2050, "cp863" }, + { 2050, "863" }, + { 2050, "csIBM863" }, + { 2051, "IBM864" }, + { 2051, "cp864" }, + { 2051, "csIBM864" }, + { 2052, "IBM865" }, + { 2052, "cp865" }, + { 2052, "865" }, + { 2052, "csIBM865" }, + { 2053, "IBM868" }, + { 2053, "CP868" }, + { 2053, "cp-ar" }, + { 2053, "csIBM868" }, + { 2054, "IBM869" }, + { 2054, "cp869" }, + { 2054, "869" }, + { 2054, "cp-gr" }, + { 2054, "csIBM869" }, + { 2055, "IBM870" }, + { 2055, "CP870" }, + { 2055, "ebcdic-cp-roece" }, + { 2055, "ebcdic-cp-yu" }, + { 2055, "csIBM870" }, + { 2056, "IBM871" }, + { 2056, "CP871" }, + { 2056, "ebcdic-cp-is" }, + { 2056, "csIBM871" }, + { 2057, "IBM880" }, + { 2057, "cp880" }, + { 2057, "EBCDIC-Cyrillic" }, + { 2057, "csIBM880" }, + { 2058, "IBM891" }, + { 2058, "cp891" }, + { 2058, "csIBM891" }, + { 2059, "IBM903" }, + { 2059, "cp903" }, + { 2059, "csIBM903" }, + { 2060, "IBM904" }, + { 2060, "cp904" }, + { 2060, "904" }, + { 2060, "csIBBM904" }, + { 2061, "IBM905" }, + { 2061, "CP905" }, + { 2061, "ebcdic-cp-tr" }, + { 2061, "csIBM905" }, + { 2062, "IBM918" }, + { 2062, "CP918" }, + { 2062, "ebcdic-cp-ar2" }, + { 2062, "csIBM918" }, + { 2063, "IBM1026" }, + { 2063, "CP1026" }, + { 2063, "csIBM1026" }, + { 2064, "EBCDIC-AT-DE" }, + { 2064, "csIBMEBCDICATDE" }, + { 2065, "EBCDIC-AT-DE-A" }, + { 2065, "csEBCDICATDEA" }, + { 2066, "EBCDIC-CA-FR" }, + { 2066, "csEBCDICCAFR" }, + { 2067, "EBCDIC-DK-NO" }, + { 2067, "csEBCDICDKNO" }, + { 2068, "EBCDIC-DK-NO-A" }, + { 2068, "csEBCDICDKNOA" }, + { 2069, "EBCDIC-FI-SE" }, + { 2069, "csEBCDICFISE" }, + { 2070, "EBCDIC-FI-SE-A" }, + { 2070, "csEBCDICFISEA" }, + { 2071, "EBCDIC-FR" }, + { 2071, "csEBCDICFR" }, + { 2072, "EBCDIC-IT" }, + { 2072, "csEBCDICIT" }, + { 2073, "EBCDIC-PT" }, + { 2073, "csEBCDICPT" }, + { 2074, "EBCDIC-ES" }, + { 2074, "csEBCDICES" }, + { 2075, "EBCDIC-ES-A" }, + { 2075, "csEBCDICESA" }, + { 2076, "EBCDIC-ES-S" }, + { 2076, "csEBCDICESS" }, + { 2077, "EBCDIC-UK" }, + { 2077, "csEBCDICUK" }, + { 2078, "EBCDIC-US" }, + { 2078, "csEBCDICUS" }, + { 2079, "UNKNOWN-8BIT" }, + { 2079, "csUnknown8BiT" }, + { 2080, "MNEMONIC" }, + { 2080, "csMnemonic" }, + { 2081, "MNEM" }, + { 2081, "csMnem" }, + { 2082, "VISCII" }, + { 2082, "csVISCII" }, + { 2083, "VIQR" }, + { 2083, "csVIQR" }, + { 2084, "KOI8-R" }, + { 2084, "csKOI8R" }, + { 2085, "HZ-GB-2312" }, + { 2086, "IBM866" }, + { 2086, "cp866" }, + { 2086, "866" }, + { 2086, "csIBM866" }, + { 2087, "IBM775" }, + { 2087, "cp775" }, + { 2087, "csPC775Baltic" }, + { 2088, "KOI8-U" }, + { 2088, "csKOI8U" }, + { 2089, "IBM00858" }, + { 2089, "CCSID00858" }, + { 2089, "CP00858" }, + { 2089, "PC-Multilingual-850+euro" }, + { 2089, "csIBM00858" }, + { 2090, "IBM00924" }, + { 2090, "CCSID00924" }, + { 2090, "CP00924" }, + { 2090, "ebcdic-Latin9--euro" }, + { 2090, "csIBM00924" }, + { 2091, "IBM01140" }, + { 2091, "CCSID01140" }, + { 2091, "CP01140" }, + { 2091, "ebcdic-us-37+euro" }, + { 2091, "csIBM01140" }, + { 2092, "IBM01141" }, + { 2092, "CCSID01141" }, + { 2092, "CP01141" }, + { 2092, "ebcdic-de-273+euro" }, + { 2092, "csIBM01141" }, + { 2093, "IBM01142" }, + { 2093, "CCSID01142" }, + { 2093, "CP01142" }, + { 2093, "ebcdic-dk-277+euro" }, + { 2093, "ebcdic-no-277+euro" }, + { 2093, "csIBM01142" }, + { 2094, "IBM01143" }, + { 2094, "CCSID01143" }, + { 2094, "CP01143" }, + { 2094, "ebcdic-fi-278+euro" }, + { 2094, "ebcdic-se-278+euro" }, + { 2094, "csIBM01143" }, + { 2095, "IBM01144" }, + { 2095, "CCSID01144" }, + { 2095, "CP01144" }, + { 2095, "ebcdic-it-280+euro" }, + { 2095, "csIBM01144" }, + { 2096, "IBM01145" }, + { 2096, "CCSID01145" }, + { 2096, "CP01145" }, + { 2096, "ebcdic-es-284+euro" }, + { 2096, "csIBM01145" }, + { 2097, "IBM01146" }, + { 2097, "CCSID01146" }, + { 2097, "CP01146" }, + { 2097, "ebcdic-gb-285+euro" }, + { 2097, "csIBM01146" }, + { 2098, "IBM01147" }, + { 2098, "CCSID01147" }, + { 2098, "CP01147" }, + { 2098, "ebcdic-fr-297+euro" }, + { 2098, "csIBM01147" }, + { 2099, "IBM01148" }, + { 2099, "CCSID01148" }, + { 2099, "CP01148" }, + { 2099, "ebcdic-international-500+euro" }, + { 2099, "csIBM01148" }, + { 2100, "IBM01149" }, + { 2100, "CCSID01149" }, + { 2100, "CP01149" }, + { 2100, "ebcdic-is-871+euro" }, + { 2100, "csIBM01149" }, + { 2101, "Big5-HKSCS" }, + { 2101, "csBig5HKSCS" }, + { 2102, "IBM1047" }, + { 2102, "IBM-1047" }, + { 2102, "csIBM1047" }, + { 2103, "PTCP154" }, + { 2103, "csPTCP154" }, + { 2103, "PT154" }, + { 2103, "CP154" }, + { 2103, "Cyrillic-Asian" }, + { 2104, "Amiga-1251" }, + { 2104, "Ami1251" }, + { 2104, "Amiga1251" }, + { 2104, "Ami-1251" }, + { 2104, "csAmiga1251" }, + { 2104, "(Aliases" }, + { 2104, "are" }, + { 2104, "provided" }, + { 2104, "for" }, + { 2104, "historical" }, + { 2104, "reasons" }, + { 2104, "and" }, + { 2104, "should" }, + { 2104, "not" }, + { 2104, "be" }, + { 2104, "used)" }, + { 2104, "[Malyshev]" }, + { 2105, "KOI7-switched" }, + { 2105, "csKOI7switched" }, + { 2106, "BRF" }, + { 2106, "csBRF" }, + { 2107, "TSCII" }, + { 2107, "csTSCII" }, + { 2108, "CP51932" }, + { 2108, "csCP51932" }, + { 2109, "windows-874" }, + { 2109, "cswindows874" }, + { 2250, "windows-1250" }, + { 2250, "cswindows1250" }, + { 2251, "windows-1251" }, + { 2251, "cswindows1251" }, + { 2252, "windows-1252" }, + { 2252, "cswindows1252" }, + { 2253, "windows-1253" }, + { 2253, "cswindows1253" }, + { 2254, "windows-1254" }, + { 2254, "cswindows1254" }, + { 2255, "windows-1255" }, + { 2255, "cswindows1255" }, + { 2256, "windows-1256" }, + { 2256, "cswindows1256" }, + { 2257, "windows-1257" }, + { 2257, "cswindows1257" }, + { 2258, "windows-1258" }, + { 2258, "cswindows1258" }, + { 2259, "TIS-620" }, + { 2259, "csTIS620" }, + { 2259, "ISO-8859-11" }, + { 2260, "CP50220" }, + { 2260, "csCP50220" }, + +#undef _GLIBCXX_GET_ENCODING_DATA diff --git a/libstdc++-v3/include/bits/unicode.h b/libstdc++-v3/include/bits/unicode.h index f1b2b359bdf..048f02a4bcd 100644 --- a/libstdc++-v3/include/bits/unicode.h +++ b/libstdc++-v3/include/bits/unicode.h @@ -32,7 +32,8 @@ #if __cplusplus >= 202002L #include -#include +#include // bit_width +#include // __detail::__from_chars_alnum_to_val_table #include #include #include @@ -986,7 +987,7 @@ inline namespace __v15_1_0 return __n; } - template + template consteval bool __literal_encoding_is_unicode() { @@ -1056,6 +1057,53 @@ inline namespace __v15_1_0 __literal_encoding_is_utf8() { return __literal_encoding_is_unicode(); } + consteval bool + __literal_encoding_is_extended_ascii() + { + return '0' == 0x30 && 'A' == 0x41 && 'Z' == 0x5a + && 'a' == 0x61 && 'z' == 0x7a; + } + + // https://www.unicode.org/reports/tr22/tr22-8.html#Charset_Alias_Matching + constexpr bool + __charset_alias_match(string_view __a, string_view __b) + { + // Map alphanumeric chars to their base 64 value, everything else to 127. + auto __map = [](char __c, bool& __num) -> unsigned char { + if (__c == '0') [[unlikely]] + return __num ? 0 : 127; + const auto __v = __detail::__from_chars_alnum_to_val(__c); + __num = __v < 10; + return __v; + }; + + auto __ptr_a = __a.begin(), __end_a = __a.end(); + auto __ptr_b = __b.begin(), __end_b = __b.end(); + bool __num_a = false, __num_b = false; + + while (true) + { + // Find the value of the next alphanumeric character in each string. + unsigned char __val_a, __val_b; + while (__ptr_a != __end_a + && (__val_a = __map(*__ptr_a, __num_a)) == 127) + ++__ptr_a; + while (__ptr_b != __end_b + && (__val_b = __map(*__ptr_b, __num_b)) == 127) + ++__ptr_b; + // Stop when we reach the end of a string, or get a mismatch. + if (__ptr_a == __end_a) + return __ptr_b == __end_b; + else if (__ptr_b == __end_b) + return false; + else if (__val_a != __val_b) + return false; // Found non-matching characters. + ++__ptr_a; + ++__ptr_b; + } + return true; + } + } // namespace __unicode _GLIBCXX_END_NAMESPACE_VERSION diff --git a/libstdc++-v3/include/bits/version.def b/libstdc++-v3/include/bits/version.def index afbec6c3e6a..8fb8a2877ee 100644 --- a/libstdc++-v3/include/bits/version.def +++ b/libstdc++-v3/include/bits/version.def @@ -1751,6 +1751,16 @@ ftms = { }; }; +ftms = { + name = text_encoding; + values = { + v = 202306; + cxxmin = 26; + hosted = yes; + extra_cond = "_GLIBCXX_USE_NL_LANGINFO_L"; + }; +}; + ftms = { name = to_string; values = { diff --git a/libstdc++-v3/include/bits/version.h b/libstdc++-v3/include/bits/version.h index 9688b246ef4..9ba99deeda6 100644 --- a/libstdc++-v3/include/bits/version.h +++ b/libstdc++-v3/include/bits/version.h @@ -2137,6 +2137,17 @@ #undef __glibcxx_want_saturation_arithmetic // from version.def line 1755 +#if !defined(__cpp_lib_text_encoding) +# if (__cplusplus > 202302L) && _GLIBCXX_HOSTED && (_GLIBCXX_USE_NL_LANGINFO_L) +# define __glibcxx_text_encoding 202306L +# if defined(__glibcxx_want_all) || defined(__glibcxx_want_text_encoding) +# define __cpp_lib_text_encoding 202306L +# endif +# endif +#endif /* !defined(__cpp_lib_text_encoding) && defined(__glibcxx_want_text_encoding) */ +#undef __glibcxx_want_text_encoding + +// from version.def line 1765 #if !defined(__cpp_lib_to_string) # if (__cplusplus > 202302L) && _GLIBCXX_HOSTED && (__glibcxx_to_chars) # define __glibcxx_to_string 202306L @@ -2147,7 +2158,7 @@ #endif /* !defined(__cpp_lib_to_string) && defined(__glibcxx_want_to_string) */ #undef __glibcxx_want_to_string -// from version.def line 1765 +// from version.def line 1775 #if !defined(__cpp_lib_generator) # if (__cplusplus >= 202100L) && (__glibcxx_coroutine) # define __glibcxx_generator 202207L diff --git a/libstdc++-v3/include/std/text_encoding b/libstdc++-v3/include/std/text_encoding new file mode 100644 index 00000000000..934103f5d88 --- /dev/null +++ b/libstdc++-v3/include/std/text_encoding @@ -0,0 +1,677 @@ +// -*- C++ -*- + +// Copyright The GNU Toolchain Authors. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// . + +/** @file include/text_encoding + * This is a Standard C++ Library header. + */ + +#ifndef _GLIBCXX_TEXT_ENCODING +#define _GLIBCXX_TEXT_ENCODING + +#pragma GCC system_header + +#include + +#define __glibcxx_want_text_encoding +#include + +#ifdef __cpp_lib_text_encoding +#include +#include +#include // hash +#include // view_interface +#include // __charset_alias_match +#include // __int_traits + +namespace std _GLIBCXX_VISIBILITY(default) +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION + + /** + * @brief An interface for accessing the IANA Character Sets registry. + * @ingroup locales + * @since C++23 + */ + struct text_encoding + { + private: + struct _Rep + { + using id = __INT_LEAST32_TYPE__; + id _M_id; + const char* _M_name; + + friend constexpr bool + operator<(const _Rep& __r, id __m) noexcept + { return __r._M_id < __m; } + + friend constexpr bool + operator==(const _Rep& __r, string_view __name) noexcept + { return __r._M_name == __name; } + }; + + public: + static constexpr size_t max_name_length = 63; + + enum class id : _Rep::id + { + other = 1, + unknown = 2, + ASCII = 3, + ISOLatin1 = 4, + ISOLatin2 = 5, + ISOLatin3 = 6, + ISOLatin4 = 7, + ISOLatinCyrillic = 8, + ISOLatinArabic = 9, + ISOLatinGreek = 10, + ISOLatinHebrew = 11, + ISOLatin5 = 12, + ISOLatin6 = 13, + ISOTextComm = 14, + HalfWidthKatakana = 15, + JISEncoding = 16, + ShiftJIS = 17, + EUCPkdFmtJapanese = 18, + EUCFixWidJapanese = 19, + ISO4UnitedKingdom = 20, + ISO11SwedishForNames = 21, + ISO15Italian = 22, + ISO17Spanish = 23, + ISO21German = 24, + ISO60DanishNorwegian = 25, + ISO69French = 26, + ISO10646UTF1 = 27, + ISO646basic1983 = 28, + INVARIANT = 29, + ISO2IntlRefVersion = 30, + NATSSEFI = 31, + NATSSEFIADD = 32, + ISO10Swedish = 35, + KSC56011987 = 36, + ISO2022KR = 37, + EUCKR = 38, + ISO2022JP = 39, + ISO2022JP2 = 40, + ISO13JISC6220jp = 41, + ISO14JISC6220ro = 42, + ISO16Portuguese = 43, + ISO18Greek7Old = 44, + ISO19LatinGreek = 45, + ISO25French = 46, + ISO27LatinGreek1 = 47, + ISO5427Cyrillic = 48, + ISO42JISC62261978 = 49, + ISO47BSViewdata = 50, + ISO49INIS = 51, + ISO50INIS8 = 52, + ISO51INISCyrillic = 53, + ISO54271981 = 54, + ISO5428Greek = 55, + ISO57GB1988 = 56, + ISO58GB231280 = 57, + ISO61Norwegian2 = 58, + ISO70VideotexSupp1 = 59, + ISO84Portuguese2 = 60, + ISO85Spanish2 = 61, + ISO86Hungarian = 62, + ISO87JISX0208 = 63, + ISO88Greek7 = 64, + ISO89ASMO449 = 65, + ISO90 = 66, + ISO91JISC62291984a = 67, + ISO92JISC62991984b = 68, + ISO93JIS62291984badd = 69, + ISO94JIS62291984hand = 70, + ISO95JIS62291984handadd = 71, + ISO96JISC62291984kana = 72, + ISO2033 = 73, + ISO99NAPLPS = 74, + ISO102T617bit = 75, + ISO103T618bit = 76, + ISO111ECMACyrillic = 77, + ISO121Canadian1 = 78, + ISO122Canadian2 = 79, + ISO123CSAZ24341985gr = 80, + ISO88596E = 81, + ISO88596I = 82, + ISO128T101G2 = 83, + ISO88598E = 84, + ISO88598I = 85, + ISO139CSN369103 = 86, + ISO141JUSIB1002 = 87, + ISO143IECP271 = 88, + ISO146Serbian = 89, + ISO147Macedonian = 90, + ISO150 = 91, + ISO151Cuba = 92, + ISO6937Add = 93, + ISO153GOST1976874 = 94, + ISO8859Supp = 95, + ISO10367Box = 96, + ISO158Lap = 97, + ISO159JISX02121990 = 98, + ISO646Danish = 99, + USDK = 100, + DKUS = 101, + KSC5636 = 102, + Unicode11UTF7 = 103, + ISO2022CN = 104, + ISO2022CNEXT = 105, + UTF8 = 106, + ISO885913 = 109, + ISO885914 = 110, + ISO885915 = 111, + ISO885916 = 112, + GBK = 113, + GB18030 = 114, + OSDEBCDICDF0415 = 115, + OSDEBCDICDF03IRV = 116, + OSDEBCDICDF041 = 117, + ISO115481 = 118, + KZ1048 = 119, + UCS2 = 1000, + UCS4 = 1001, + UnicodeASCII = 1002, + UnicodeLatin1 = 1003, + UnicodeJapanese = 1004, + UnicodeIBM1261 = 1005, + UnicodeIBM1268 = 1006, + UnicodeIBM1276 = 1007, + UnicodeIBM1264 = 1008, + UnicodeIBM1265 = 1009, + Unicode11 = 1010, + SCSU = 1011, + UTF7 = 1012, + UTF16BE = 1013, + UTF16LE = 1014, + UTF16 = 1015, + CESU8 = 1016, + UTF32 = 1017, + UTF32BE = 1018, + UTF32LE = 1019, + BOCU1 = 1020, + UTF7IMAP = 1021, + Windows30Latin1 = 2000, + Windows31Latin1 = 2001, + Windows31Latin2 = 2002, + Windows31Latin5 = 2003, + HPRoman8 = 2004, + AdobeStandardEncoding = 2005, + VenturaUS = 2006, + VenturaInternational = 2007, + DECMCS = 2008, + PC850Multilingual = 2009, + PC8DanishNorwegian = 2012, + PC862LatinHebrew = 2013, + PC8Turkish = 2014, + IBMSymbols = 2015, + IBMThai = 2016, + HPLegal = 2017, + HPPiFont = 2018, + HPMath8 = 2019, + HPPSMath = 2020, + HPDesktop = 2021, + VenturaMath = 2022, + MicrosoftPublishing = 2023, + Windows31J = 2024, + GB2312 = 2025, + Big5 = 2026, + Macintosh = 2027, + IBM037 = 2028, + IBM038 = 2029, + IBM273 = 2030, + IBM274 = 2031, + IBM275 = 2032, + IBM277 = 2033, + IBM278 = 2034, + IBM280 = 2035, + IBM281 = 2036, + IBM284 = 2037, + IBM285 = 2038, + IBM290 = 2039, + IBM297 = 2040, + IBM420 = 2041, + IBM423 = 2042, + IBM424 = 2043, + PC8CodePage437 = 2011, + IBM500 = 2044, + IBM851 = 2045, + PCp852 = 2010, + IBM855 = 2046, + IBM857 = 2047, + IBM860 = 2048, + IBM861 = 2049, + IBM863 = 2050, + IBM864 = 2051, + IBM865 = 2052, + IBM868 = 2053, + IBM869 = 2054, + IBM870 = 2055, + IBM871 = 2056, + IBM880 = 2057, + IBM891 = 2058, + IBM903 = 2059, + IBM904 = 2060, + IBM905 = 2061, + IBM918 = 2062, + IBM1026 = 2063, + IBMEBCDICATDE = 2064, + EBCDICATDEA = 2065, + EBCDICCAFR = 2066, + EBCDICDKNO = 2067, + EBCDICDKNOA = 2068, + EBCDICFISE = 2069, + EBCDICFISEA = 2070, + EBCDICFR = 2071, + EBCDICIT = 2072, + EBCDICPT = 2073, + EBCDICES = 2074, + EBCDICESA = 2075, + EBCDICESS = 2076, + EBCDICUK = 2077, + EBCDICUS = 2078, + Unknown8BiT = 2079, + Mnemonic = 2080, + Mnem = 2081, + VISCII = 2082, + VIQR = 2083, + KOI8R = 2084, + HZGB2312 = 2085, + IBM866 = 2086, + PC775Baltic = 2087, + KOI8U = 2088, + IBM00858 = 2089, + IBM00924 = 2090, + IBM01140 = 2091, + IBM01141 = 2092, + IBM01142 = 2093, + IBM01143 = 2094, + IBM01144 = 2095, + IBM01145 = 2096, + IBM01146 = 2097, + IBM01147 = 2098, + IBM01148 = 2099, + IBM01149 = 2100, + Big5HKSCS = 2101, + IBM1047 = 2102, + PTCP154 = 2103, + Amiga1251 = 2104, + KOI7switched = 2105, + BRF = 2106, + TSCII = 2107, + CP51932 = 2108, + windows874 = 2109, + windows1250 = 2250, + windows1251 = 2251, + windows1252 = 2252, + windows1253 = 2253, + windows1254 = 2254, + windows1255 = 2255, + windows1256 = 2256, + windows1257 = 2257, + windows1258 = 2258, + TIS620 = 2259, + CP50220 = 2260 + }; + using enum id; + + constexpr text_encoding() = default; + + constexpr explicit + text_encoding(string_view __enc) noexcept + : _M_rep(_S_find_name(__enc)) + { + __enc.copy(_M_name, max_name_length); + } + + // @pre i has the value of one of the enumerators of id. + constexpr + text_encoding(id __i) noexcept + : _M_rep(_S_find_id(__i)) + { + if (string_view __name(_M_rep->_M_name); !__name.empty()) + __name.copy(_M_name, max_name_length); + } + + constexpr id mib() const noexcept { return id(_M_rep->_M_id); } + + constexpr const char* name() const noexcept { return _M_name; } + + struct aliases_view : ranges::view_interface + { + private: + class _Iterator; + struct _Sentinel { }; + + public: + constexpr _Iterator begin() const noexcept; + constexpr _Sentinel end() const noexcept { return {}; } + + private: + friend struct text_encoding; + + constexpr explicit aliases_view(const _Rep* __r) : _M_begin(__r) { } + + const _Rep* _M_begin = nullptr; + }; + + constexpr aliases_view + aliases() const noexcept + { + return _M_rep->_M_name[0] ? aliases_view(_M_rep) : aliases_view{nullptr}; + } + + friend constexpr bool + operator==(const text_encoding& __a, + const text_encoding& __b) noexcept + { + if (__a.mib() == id::other && __b.mib() == id::other) [[unlikely]] + return _S_comp(__a._M_name, __b._M_name); + else + return __a.mib() == __b.mib(); + } + + friend constexpr bool + operator==(const text_encoding& __encoding, id __i) noexcept + { return __encoding.mib() == __i; } + +#if __CHAR_BIT__ == 8 + static consteval text_encoding + literal() noexcept + { +#ifdef __GNUC_EXECUTION_CHARSET_NAME + return text_encoding(__GNUC_EXECUTION_CHARSET_NAME); +#elif defined __clang_literal_encoding__ + return text_encoding(__clang_literal_encoding__); +#else + return text_encoding(); +#endif + } + + static text_encoding + environment(); + + template + static bool + environment_is() + { return text_encoding(_Id)._M_is_environment(); } +#else + static text_encoding literal() = delete; + static text_encoding environment() = delete; + template static bool environment_is() = delete; +#endif + + private: + const _Rep* _M_rep = _S_reps + 1; // id::unknown + char _M_name[max_name_length + 1] = {0}; + + bool + _M_is_environment() const; + + static inline constexpr _Rep _S_reps[] = { + { 1, "" }, { 2, "" }, +#define _GLIBCXX_GET_ENCODING_DATA +#include +#ifdef _GLIBCXX_GET_ENCODING_DATA +# error "Invalid text_encoding data" +#endif + { 9999, nullptr }, // sentinel + }; + + static constexpr bool + _S_comp(string_view __a, string_view __b) + { return __unicode::__charset_alias_match(__a, __b); } + + static constexpr const _Rep* + _S_find_name(string_view __name) noexcept + { +#ifdef _GLIBCXX_TEXT_ENCODING_UTF8_OFFSET + // Optimize the common UTF-8 case to avoid a linear search through all + // strings in the table using the _S_comp function. + if (__name == "UTF-8") + return _S_reps + 2 + _GLIBCXX_TEXT_ENCODING_UTF8_OFFSET; +#endif + + // The first two array elements (other and unknown) don't have names. + // The last element is a sentinel that can never match anything. + const auto __first = _S_reps + 2, __end = std::end(_S_reps) - 1; + for (auto __r = __first; __r != __end; ++__r) + if (_S_comp(__r->_M_name, __name)) + { + // Might have matched an alias. Find the first entry for this ID. + const auto __id = __r->_M_id; + while (__r[-1]._M_id == __id) + --__r; + return __r; + } + return _S_reps; // id::other + } + + static constexpr const _Rep* + _S_find_id(id __id) noexcept + { + const auto __i = (_Rep::id)__id; + const auto __r = std::lower_bound(_S_reps, std::end(_S_reps) - 1, __i); + if (__r->_M_id == __i) [[likely]] + return __r; + else + { + // Preconditions: i has the value of one of the enumerators of id. + __glibcxx_assert(__r->_M_id == __i); + return _S_reps + 1; // id::unknown + } + } + }; + + template<> + struct hash + { + size_t + operator()(const text_encoding& __enc) const noexcept + { return std::hash()(__enc.mib()); } + }; + + class text_encoding::aliases_view::_Iterator + { + public: + using value_type = const char*; + using reference = const char*; + using difference_type = int; + + constexpr _Iterator() = default; + + constexpr value_type + operator*() const + { + if (_M_dereferenceable()) [[likely]] + return _M_rep->_M_name; + else + { + __glibcxx_assert(_M_dereferenceable()); + return ""; + } + } + + constexpr _Iterator& + operator++() + { + if (_M_dereferenceable()) [[likely]] + ++_M_rep; + else + { + __glibcxx_assert(_M_dereferenceable()); + *this = _Iterator{}; + } + return *this; + } + + constexpr _Iterator& + operator--() + { + const bool __decrementable + = _M_rep != nullptr && _M_rep[-1]._M_id == _M_id; + if (__decrementable) [[likely]] + --_M_rep; + else + { + __glibcxx_assert(__decrementable); + *this = _Iterator{}; + } + return *this; + } + + constexpr _Iterator + operator++(int) + { + auto __it = *this; + ++*this; + return __it; + } + + constexpr _Iterator + operator--(int) + { + auto __it = *this; + --*this; + return __it; + } + + constexpr value_type + operator[](difference_type __n) const + { return *(*this + __n); } + + constexpr _Iterator& + operator+=(difference_type __n) + { + if (_M_rep != nullptr) + { + if (__n > 0) + { + if (__n < (std::end(_S_reps) - _M_rep) + && _M_rep[__n - 1]._M_id == _M_id) [[likely]] + _M_rep += __n; + else + *this == _Iterator{}; + } + else if (__n < 0) + { + if (__n > (_S_reps - _M_rep) + && _M_rep[__n]._M_id == _M_id) [[likely]] + _M_rep += __n; + else + *this == _Iterator{}; + } + } + __glibcxx_assert(_M_rep != nullptr); + return *this; + } + + constexpr _Iterator& + operator-=(difference_type __n) + { + using _Traits = __gnu_cxx::__int_traits; + if (__n == _Traits::__min) [[unlikely]] + return operator+=(_Traits::__max); + return operator+=(-__n); + } + + constexpr difference_type + operator-(const _Iterator& __i) const + { + if (_M_id == __i._M_id) + return _M_rep - __i._M_rep; + __glibcxx_assert(_M_id == __i._M_id); + return __gnu_cxx::__int_traits::__max; + } + + constexpr bool + operator==(const _Iterator&) const = default; + + constexpr bool + operator==(_Sentinel) const noexcept + { return !_M_dereferenceable(); } + + constexpr strong_ordering + operator<=>(const _Iterator& __i) const + { + __glibcxx_assert(_M_id == __i._M_id); + return _M_rep <=> __i._M_rep; + } + + friend constexpr _Iterator + operator+(_Iterator __i, difference_type __n) + { + __i += __n; + return __i; + } + + friend constexpr _Iterator + operator+(difference_type __n, _Iterator __i) + { + __i += __n; + return __i; + } + + friend constexpr _Iterator + operator-(_Iterator __i, difference_type __n) + { + __i -= __n; + return __i; + } + + private: + friend class text_encoding; + + constexpr explicit + _Iterator(const _Rep* __r) noexcept + : _M_rep(__r), _M_id(__r ? __r->_M_id : 0) + { } + + constexpr bool + _M_dereferenceable() const noexcept + { return _M_rep != nullptr && _M_rep->_M_id == _M_id; } + + const _Rep* _M_rep = nullptr; + _Rep::id _M_id = 0; + }; + + constexpr auto + text_encoding::aliases_view::begin() const noexcept + -> _Iterator + { return _Iterator(_M_begin); } + +namespace ranges +{ + // Opt-in to borrowed_range concept + template<> + inline constexpr bool + enable_borrowed_range = true; +} + +_GLIBCXX_END_NAMESPACE_VERSION +} // namespace std + +#endif // __cpp_lib_text_encoding +#endif // _GLIBCXX_TEXT_ENCODING diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py index 032a7aa58a2..a6c2ed4599f 100644 --- a/libstdc++-v3/python/libstdcxx/v6/printers.py +++ b/libstdc++-v3/python/libstdcxx/v6/printers.py @@ -2324,6 +2324,21 @@ class StdIntegralConstantPrinter(printer_base): typename = strip_versioned_namespace(self._typename) return "{}<{}, {}>".format(typename, value_type, value) +class StdTextEncodingPrinter(printer_base): + """Print a std::text_encoding.""" + + def __init__(self, typename, val): + self._val = val + self._typename = typename + + def to_string(self): + rep = self._val['_M_rep'].dereference() + if rep['_M_id'] == 1: + return self._val['_M_name'] + if rep['_M_id'] == 2: + return 'unknown' + return rep['_M_name'] + # A "regular expression" printer which conforms to the # "SubPrettyPrinter" protocol from gdb.printing. class RxPrinter(object): @@ -2807,6 +2822,8 @@ def build_libstdcxx_dictionary(): libstdcxx_printer.add_version('std::', 'integral_constant', StdIntegralConstantPrinter) + libstdcxx_printer.add_version('std::', 'text_encoding', + StdTextEncodingPrinter) if hasattr(gdb.Value, 'dynamic_type'): libstdcxx_printer.add_version('std::', 'error_code', diff --git a/libstdc++-v3/scripts/gen_text_encoding_data.py b/libstdc++-v3/scripts/gen_text_encoding_data.py new file mode 100755 index 00000000000..2d6f3e4077a --- /dev/null +++ b/libstdc++-v3/scripts/gen_text_encoding_data.py @@ -0,0 +1,70 @@ +#!/usr/bin/env python3 +# +# Script to generate tables for libstdc++ std::text_encoding. +# +# This file is part of GCC. +# +# GCC is free software; you can redistribute it and/or modify it under +# the terms of the GNU General Public License as published by the Free +# Software Foundation; either version 3, or (at your option) any later +# version. +# +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY +# WARRANTY; without even the implied warranty of MERCHANTABILITY or +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +# for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# . + +# To update the Libstdc++ static data in download +# the latest: +# https://www.iana.org/assignments/character-sets/character-sets-1.csv +# Then run this script and save the output to +# include/bits/text_encoding-data.h + +import sys +import csv + +if len(sys.argv) != 2: + print("Usage: %s " % sys.argv[0], file=sys.stderr) + sys.exit(1) + +print("// Generated by gen_text_encoding_data.py, do not edit.\n") +print("#ifndef _GLIBCXX_GET_ENCODING_DATA") +print('# error "This is not a public header, do not include it directly"') +print("#endif\n") + + +charsets = {} +with open(sys.argv[1], newline='') as f: + reader = csv.reader(f) + next(reader) # skip header row + for row in reader: + mib = int(row[2]) + if mib in charsets: + raise ValueError("Multiple rows for mibEnum={}".format(mib)) + name = row[1] + aliases = row[5].split() + # Ensure primary name comes first + if name in aliases: + aliases.remove(name) + charsets[mib] = [name] + aliases + +# Remove "NATS-DANO" and "NATS-DANO-ADD" +charsets.pop(33, None) +charsets.pop(34, None) + +count = 0 +for mib in sorted(charsets.keys()): + names = charsets[mib] + if names[0] == "UTF-8": + print("#define _GLIBCXX_TEXT_ENCODING_UTF8_OFFSET {}".format(count)) + for name in names: + print(' {{ {:4}, "{}" }},'.format(mib, name)) + count += len(names) + +# gives an error if this macro is left defined. +# Do this last, so that the generated output is not usable unless we reach here. +print("\n#undef _GLIBCXX_GET_ENCODING_DATA") diff --git a/libstdc++-v3/src/Makefile.am b/libstdc++-v3/src/Makefile.am index 7292ae70f81..37ba1491dea 100644 --- a/libstdc++-v3/src/Makefile.am +++ b/libstdc++-v3/src/Makefile.am @@ -43,7 +43,7 @@ experimental_dir = endif ## Keep this list sync'd with acinclude.m4:GLIBCXX_CONFIGURE. -SUBDIRS = c++98 c++11 c++17 c++20 c++23 \ +SUBDIRS = c++98 c++11 c++17 c++20 c++23 c++26 \ $(filesystem_dir) $(backtrace_dir) $(experimental_dir) # Cross compiler support. @@ -77,6 +77,7 @@ vpath % $(top_srcdir)/src/c++11 vpath % $(top_srcdir)/src/c++17 vpath % $(top_srcdir)/src/c++20 vpath % $(top_srcdir)/src/c++23 +vpath % $(top_srcdir)/src/c++26 if ENABLE_FILESYSTEM_TS vpath % $(top_srcdir)/src/filesystem endif diff --git a/libstdc++-v3/src/c++26/Makefile.am b/libstdc++-v3/src/c++26/Makefile.am new file mode 100644 index 00000000000..000ced1f501 --- /dev/null +++ b/libstdc++-v3/src/c++26/Makefile.am @@ -0,0 +1,109 @@ +## Makefile for the C++26 sources of the GNU C++ Standard library. +## +## Copyright (C) 1997-2023 Free Software Foundation, Inc. +## +## This file is part of the libstdc++ version 3 distribution. +## Process this file with automake to produce Makefile.in. + +## This file is part of the GNU ISO C++ Library. This library is free +## software; you can redistribute it and/or modify it under the +## terms of the GNU General Public License as published by the +## Free Software Foundation; either version 3, or (at your option) +## any later version. + +## This library is distributed in the hope that it will be useful, +## but WITHOUT ANY WARRANTY; without even the implied warranty of +## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +## GNU General Public License for more details. + +## You should have received a copy of the GNU General Public License along +## with this library; see the file COPYING3. If not see +## . + +include $(top_srcdir)/fragment.am + +# Convenience library for C++26 runtime. +noinst_LTLIBRARIES = libc++26convenience.la + +headers = + +if ENABLE_EXTERN_TEMPLATE +# XTEMPLATE_FLAGS = -fno-implicit-templates +inst_sources = +else +# XTEMPLATE_FLAGS = +inst_sources = +endif + +sources = text_encoding.cc + +vpath % $(top_srcdir)/src/c++26 + + +if GLIBCXX_HOSTED +libc__26convenience_la_SOURCES = $(sources) $(inst_sources) +else +libc__26convenience_la_SOURCES = +endif + +# AM_CXXFLAGS needs to be in each subdirectory so that it can be +# modified in a per-library or per-sub-library way. Need to manually +# set this option because CONFIG_CXXFLAGS has to be after +# OPTIMIZE_CXXFLAGS on the compile line so that -O2 can be overridden +# as the occasion calls for it. +AM_CXXFLAGS = \ + -std=gnu++26 \ + $(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \ + $(XTEMPLATE_FLAGS) $(VTV_CXXFLAGS) \ + $(WARN_CXXFLAGS) $(OPTIMIZE_CXXFLAGS) $(CONFIG_CXXFLAGS) \ + -fimplicit-templates + +AM_MAKEFLAGS = \ + "gxx_include_dir=$(gxx_include_dir)" + +# Libtool notes + +# 1) In general, libtool expects an argument such as `--tag=CXX' when +# using the C++ compiler, because that will enable the settings +# detected when C++ support was being configured. However, when no +# such flag is given in the command line, libtool attempts to figure +# it out by matching the compiler name in each configuration section +# against a prefix of the command line. The problem is that, if the +# compiler name and its initial flags stored in the libtool +# configuration file don't match those in the command line, libtool +# can't decide which configuration to use, and it gives up. The +# correct solution is to add `--tag CXX' to LTCXXCOMPILE and maybe +# CXXLINK, just after $(LIBTOOL), so that libtool doesn't have to +# attempt to infer which configuration to use. +# +# The second tag argument, `--tag disable-shared` means that libtool +# only compiles each source once, for static objects. In actuality, +# glibcxx_lt_pic_flag and glibcxx_compiler_shared_flag are added to +# the libtool command that is used create the object, which is +# suitable for shared libraries. The `--tag disable-shared` must be +# placed after --tag CXX lest things CXX undo the affect of +# disable-shared. + +# 2) Need to explicitly set LTCXXCOMPILE so that EXTRA_CXX_FLAGS is +# last. (That way, things like -O2 passed down from the toplevel can +# be overridden by --enable-debug.) +LTCXXCOMPILE = \ + $(LIBTOOL) --tag CXX --tag disable-shared \ + $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \ + --mode=compile $(CXX) $(TOPLEVEL_INCLUDES) \ + $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CXXFLAGS) $(CXXFLAGS) $(EXTRA_CXX_FLAGS) + +LTLDFLAGS = $(shell $(SHELL) $(top_srcdir)/../libtool-ldflags $(LDFLAGS)) + +# 3) We'd have a problem when building the shared libstdc++ object if +# the rules automake generates would be used. We cannot allow g++ to +# be used since this would add -lstdc++ to the link line which of +# course is problematic at this point. So, we get the top-level +# directory to configure libstdc++-v3 to use gcc as the C++ +# compilation driver. +CXXLINK = \ + $(LIBTOOL) --tag CXX --tag disable-shared \ + $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \ + --mode=link $(CXX) \ + $(VTV_CXXLINKFLAGS) \ + $(OPT_LDFLAGS) $(SECTION_LDFLAGS) $(AM_CXXFLAGS) $(LTLDFLAGS) -o $@ diff --git a/libstdc++-v3/src/c++26/Makefile.in b/libstdc++-v3/src/c++26/Makefile.in new file mode 100644 index 00000000000..77e73b2b265 diff --git a/libstdc++-v3/src/c++26/text_encoding.cc b/libstdc++-v3/src/c++26/text_encoding.cc new file mode 100644 index 00000000000..9a7df07db29 --- /dev/null +++ b/libstdc++-v3/src/c++26/text_encoding.cc @@ -0,0 +1,91 @@ +// Definitions for -*- C++ -*- + +// Copyright The GNU Toolchain Authors. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// . + +#include +#include + +#ifdef _GLIBCXX_USE_NL_LANGINFO_L +#include +#include + +#if __CHAR_BIT__ == 8 +namespace std +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION + +text_encoding +__locale_encoding(const char* name) +{ + text_encoding enc; + if (locale_t loc = ::newlocale(LC_ALL_MASK, name, (locale_t)0)) + { + if (const char* codeset = ::nl_langinfo_l(CODESET, loc)) + { + string_view s(codeset); + if (s.size() < text_encoding::max_name_length) + enc = text_encoding(s); + } + ::freelocale(loc); + } + return enc; +} + +_GLIBCXX_END_NAMESPACE_VERSION +} // namespace std + +std::text_encoding +std::text_encoding::environment() +{ + return std::__locale_encoding(""); +} + +bool +std::text_encoding::_M_is_environment() const +{ + bool matched = false; + if (locale_t loc = ::newlocale(LC_ALL_MASK, "", (locale_t)0)) + { + if (const char* codeset = ::nl_langinfo_l(CODESET, loc)) + { + string_view sv(codeset); + for (auto alias : aliases()) + if (__unicode::__charset_alias_match(alias, sv)) + { + matched = true; + break; + } + } + ::freelocale(loc); + } + return matched; +} + +std::text_encoding +std::locale::encoding() const +{ + return std::__locale_encoding(name().c_str()); +} +#endif // CHAR_BIT == 8 + +#endif // _GLIBCXX_USE_NL_LANGINFO_L diff --git a/libstdc++-v3/src/experimental/Makefile.am b/libstdc++-v3/src/experimental/Makefile.am index 8259f986d95..6241430988e 100644 --- a/libstdc++-v3/src/experimental/Makefile.am +++ b/libstdc++-v3/src/experimental/Makefile.am @@ -47,10 +47,12 @@ libstdc__exp_la_SOURCES = $(sources) libstdc__exp_la_LIBADD = \ $(top_builddir)/src/c++23/libc++23convenience.la \ + $(top_builddir)/src/c++26/libc++26convenience.la \ $(filesystem_lib) $(backtrace_lib) libstdc__exp_la_DEPENDENCIES = \ $(top_builddir)/src/c++23/libc++23convenience.la \ + $(top_builddir)/src/c++26/libc++26convenience.la \ $(filesystem_lib) $(backtrace_lib) # AM_CXXFLAGS needs to be in each subdirectory so that it can be diff --git a/libstdc++-v3/testsuite/22_locale/locale/encoding.cc b/libstdc++-v3/testsuite/22_locale/locale/encoding.cc new file mode 100644 index 00000000000..18825fb88b9 --- /dev/null +++ b/libstdc++-v3/testsuite/22_locale/locale/encoding.cc @@ -0,0 +1,36 @@ +// { dg-options "-lstdc++exp" } +// { dg-do run { target c++26 } } +// { dg-require-namedlocale "en_US.ISO8859-1" } +// { dg-require-namedlocale "fr_FR.ISO8859-15" } + +#include +#include + +void +test_encoding() +{ + const std::locale c = std::locale::classic(); + std::text_encoding c_enc = c.encoding(); + VERIFY( c_enc == std::text_encoding::ASCII ); + + const std::locale fr = std::locale(ISO_8859(15, fr_FR)); + std::text_encoding fr_enc = fr.encoding(); + VERIFY( fr_enc == std::text_encoding::ISO885915 ); + + const std::locale en = std::locale(ISO_8859(1, en_US)); + std::text_encoding en_enc = en.encoding(); + VERIFY( en_enc == std::text_encoding::ISOLatin1 ); + +#if __cpp_exceptions + try { + const std::locale c_utf8 = std::locale("C.UTF-8"); + VERIFY( c_utf8.encoding() == std::text_encoding::UTF8 ); + } catch (...) { + } +#endif +} + +int main() +{ + test_encoding(); +} diff --git a/libstdc++-v3/testsuite/ext/unicode/charset_alias_match.cc b/libstdc++-v3/testsuite/ext/unicode/charset_alias_match.cc new file mode 100644 index 00000000000..f6272ae998b --- /dev/null +++ b/libstdc++-v3/testsuite/ext/unicode/charset_alias_match.cc @@ -0,0 +1,18 @@ +// { dg-do compile { target c++20 } } +#include + +using std::__unicode::__charset_alias_match; +static_assert( __charset_alias_match("UTF-8", "utf8") == true ); +static_assert( __charset_alias_match("UTF-8", "u.t.f-008") == true ); +static_assert( __charset_alias_match("UTF-8", "utf-80") == false ); +static_assert( __charset_alias_match("UTF-8", "ut8") == false ); + +static_assert( __charset_alias_match("iso8859_1", "ISO-8859-1") == true ); + +static_assert( __charset_alias_match("", "") == true ); +static_assert( __charset_alias_match("", ".") == true ); +static_assert( __charset_alias_match("--", "...") == true ); +static_assert( __charset_alias_match("--a", "a...") == true ); +static_assert( __charset_alias_match("--a010", "a..10.") == true ); +static_assert( __charset_alias_match("--a010", "a..1.0") == false ); +static_assert( __charset_alias_match("aaaa", "000.00.0a0a)0aa...") == true ); diff --git a/libstdc++-v3/testsuite/std/text_encoding/cons.cc b/libstdc++-v3/testsuite/std/text_encoding/cons.cc new file mode 100644 index 00000000000..b9d93641de4 --- /dev/null +++ b/libstdc++-v3/testsuite/std/text_encoding/cons.cc @@ -0,0 +1,113 @@ +// { dg-do run { target c++26 } } + +#include +#include +#include + +using namespace std::string_view_literals; + +constexpr void +test_default_construct() +{ + std::text_encoding e0; + VERIFY( e0.mib() == std::text_encoding::unknown ); + VERIFY( e0.name()[0] == '\0' ); // P2862R1 name() should never return null + VERIFY( e0.aliases().empty() ); +} + +constexpr void +test_construct_by_name() +{ + std::string_view s; + std::text_encoding e0(s); + VERIFY( e0.mib() == std::text_encoding::other ); + VERIFY( e0.name() == s ); + VERIFY( e0.aliases().empty() ); + + s = "not a real encoding"; + std::text_encoding e1(s); + VERIFY( e1.mib() == std::text_encoding::other ); + VERIFY( e1.name() == s ); + VERIFY( e1.aliases().empty() ); + + VERIFY( e1 != e0 ); + VERIFY( e1 == e0.mib() ); + + s = "utf8"; + std::text_encoding e2(s); + VERIFY( e2.mib() == std::text_encoding::UTF8 ); + VERIFY( e2.name() == s ); + VERIFY( ! e2.aliases().empty() ); + VERIFY( e2.aliases().front() == "UTF-8"sv ); + + s = "Latin-1"; // matches "latin1" + std::text_encoding e3(s); + VERIFY( e3.mib() == std::text_encoding::ISOLatin1 ); + VERIFY( e3.name() == s ); + VERIFY( ! e3.aliases().empty() ); + VERIFY( e3.aliases().front() == "ISO_8859-1:1987"sv ); // primary name + + s = "U.S."; // matches "us" + std::text_encoding e4(s); + VERIFY( e4.mib() == std::text_encoding::ASCII ); + VERIFY( e4.name() == s ); + VERIFY( ! e4.aliases().empty() ); + VERIFY( e4.aliases().front() == "US-ASCII"sv ); // primary name +} + +constexpr void +test_construct_by_id() +{ + std::text_encoding e0(std::text_encoding::other); + VERIFY( e0.mib() == std::text_encoding::other ); + VERIFY( e0.name() == ""sv ); + VERIFY( e0.aliases().empty() ); + + std::text_encoding e1(std::text_encoding::unknown); + VERIFY( e1.mib() == std::text_encoding::unknown ); + VERIFY( e1.name() == ""sv ); + VERIFY( e1.aliases().empty() ); + + std::text_encoding e2(std::text_encoding::UTF8); + VERIFY( e2.mib() == std::text_encoding::UTF8 ); + VERIFY( e2.name() == "UTF-8"sv ); + VERIFY( ! e2.aliases().empty() ); + VERIFY( e2.aliases().front() == std::string_view(e2.name()) ); + bool found = false; + for (auto alias : e2.aliases()) + if (alias == "csUTF8"sv) + { + found = true; + break; + } + VERIFY( found ); +} + +constexpr void +test_copy_construct() +{ + std::text_encoding e0; + std::text_encoding e1 = e0; + VERIFY( e1 == e0 ); + + std::text_encoding e2(std::text_encoding::UTF8); + auto e3 = e2; + VERIFY( e3 == e2 ); + + e1 = e3; + VERIFY( e1 == e2 ); +} + +int main() +{ + auto run_tests = [] { + test_default_construct(); + test_construct_by_name(); + test_construct_by_id(); + test_copy_construct(); + return true; + }; + + run_tests(); + static_assert( run_tests() ); +} diff --git a/libstdc++-v3/testsuite/std/text_encoding/members.cc b/libstdc++-v3/testsuite/std/text_encoding/members.cc new file mode 100644 index 00000000000..0b0d6bd0c96 --- /dev/null +++ b/libstdc++-v3/testsuite/std/text_encoding/members.cc @@ -0,0 +1,41 @@ +// { dg-options "-lstdc++exp" } +// { dg-do run { target c++26 } } +// { dg-require-namedlocale "en_US.ISO8859-1" } +// { dg-require-namedlocale "fr_FR.ISO8859-15" } + +#include +#include +#include +#include + +using namespace std::string_view_literals; + +void +test_literal() +{ + const std::text_encoding lit = std::text_encoding::literal(); + VERIFY( lit.name() == std::string_view(__GNUC_EXECUTION_CHARSET_NAME) ); +} + +void +test_env() +{ + const std::text_encoding env = std::text_encoding::environment(); + + if (env.mib() == std::text_encoding::UTF8) + VERIFY( std::text_encoding::environment_is() ); + + ::setlocale(LC_ALL, ISO_8859(1, en_US)); + const std::text_encoding env1 = std::text_encoding::environment(); + VERIFY( env1 == env ); + + ::setlocale(LC_ALL, ISO_8859(15, fr_FR)); + const std::text_encoding env2 = std::text_encoding::environment(); + VERIFY( env2 == env ); +} + +int main() +{ + test_literal(); + test_env(); +} diff --git a/libstdc++-v3/testsuite/std/text_encoding/requirements.cc b/libstdc++-v3/testsuite/std/text_encoding/requirements.cc new file mode 100644 index 00000000000..e22ab729d4f --- /dev/null +++ b/libstdc++-v3/testsuite/std/text_encoding/requirements.cc @@ -0,0 +1,72 @@ +// { dg-do compile { target c++26 } } +// { dg-add-options no_pch } + +#include +#ifndef __cpp_lib_text_encoding +# error "Feature-test macro for text_encoding missing in " +#elif __cpp_lib_text_encoding != 202306L +# error "Feature-test macro for text_encoding has wrong value in " +#endif + +#undef __cpp_lib_expected +#include +#ifndef __cpp_lib_text_encoding +# error "Feature-test macro for text_encoding missing in " +#elif __cpp_lib_text_encoding != 202306L +# error "Feature-test macro for text_encoding has wrong value in " +#endif + +#include +#include +static_assert( std::is_trivially_copyable_v ); + +using aliases_view = std::text_encoding::aliases_view; +static_assert( std::copyable ); +static_assert( std::ranges::view ); +static_assert( std::ranges::random_access_range ); +static_assert( std::ranges::borrowed_range ); +static_assert( std::same_as, + const char*> ); +static_assert( std::same_as, + const char*> ); + +#include + +constexpr bool +test_constexpr_iterator() +{ + // This encoding has two aliases, "UTF-8" and "csUTF8". + const auto a = std::text_encoding(std::text_encoding::UTF8).aliases(); + const auto begin = a.begin(); + const auto end = a.end(); + auto iter = begin; + VERIFY( *iter == *begin ); + VERIFY( *iter == iter[0] ); + VERIFY( iter[1] == begin[1] ); + --++iter; + VERIFY( iter == begin ); + auto iter2 = iter++; + VERIFY( iter2 == begin ); + VERIFY( iter != begin ); + iter2 = iter--; + VERIFY( iter2 != begin ); + VERIFY( iter == begin ); + iter++++; // Increments prvalue returned by operator++(int) instead of iter. + VERIFY( iter == iter2 ); + iter----; // Decrements prvalue returned by operator++(int) instead of iter. + VERIFY( iter == begin ); + const auto d = std::ranges::distance(a); + iter += d; + VERIFY( iter == end ); + VERIFY( (iter - begin) == d ); + VERIFY( (begin + 2) == iter ); + VERIFY( (2 + begin) == iter ); + VERIFY( iter[-1] == begin[1] ); + iter -= d; + VERIFY( iter == begin ); + VERIFY( *(iter + 1) == iter[1] ); + VERIFY( (1 + iter - 1) == begin ); + VERIFY( (-1 + (iter - -2) + -1) == begin ); + return true; +} +static_assert( test_constexpr_iterator() );