From patchwork Tue Mar 14 18:46:12 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jonathan Wakely
X-Patchwork-Id: 738883
Date: Tue, 14 Mar 2017 18:46:12 +0000
From: Jonathan Wakely
To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Various fixes for <codecvt> facets
Message-ID: <20170314184612.GC3501@redhat.com>
References: <20170313193547.GW3501@redhat.com>
In-Reply-To: <20170313193547.GW3501@redhat.com>

On 13/03/17 19:35 +0000, Jonathan Wakely wrote:
>This is a series of patches to fix various bugs in the Unicode
>character conversion facets.
>
>The first patch fixes a silly < versus <= bug that meant that 0xffff
>got written as a surrogate pair instead of as simply 0xffff, and an
>endianness bug for the internal representation of UTF-16 code units
>stored in char32_t or wchar_t values. That's PR 79511.
>
>The second patch fixes some incorrect bitwise operations (because I
>confused & and |) and some incorrect limits (because I confused max
>and min).
>That fixes determining the endianness of the external
>representation bytes when they start with a Byte Order Mark, and
>correctly reports errors on invalid UCS2. It also fixes
>wstring_convert so that it reports the number of characters that were
>converted prior to an error. That's PR 79980.
>
>The third patch fixes the output of the encoding() and max_length()
>member functions on the codecvt facets, because I wasn't correctly
>accounting for a BOM or for the differences between UTF-16 and UCS2.
>
>I plan to commit these for all branches, but I'll wait until after GCC
>7.1 is released, and fix it for 7.2 instead. These bugs aren't
>important enough to rush into trunk now.

One more patch for a problem found by the libc++ testsuite. Now we
pass all the libc++ tests, and we even pass a test that libc++ fails.

With this, I hope our <codecvt> is 100% conforming. Just in time to be
deprecated for C++17 :-)

commit 3118704bc37cd771b9fc5bf83230f38a16a7c5c3
Author: Jonathan Wakely
Date:   Tue Mar 14 17:47:12 2017 +0000

    PR libstdc++/80041 fix codecvt_utf16 to use UTF-16 not UTF-8

    	PR libstdc++/80041
    	* src/c++11/codecvt.cc (__codecvt_utf16_base<wchar_t>::do_out)
    	(__codecvt_utf16_base<wchar_t>::do_in): Convert char arguments to
    	char16_t to work with UTF-16 instead of UTF-8.
    	* testsuite/22_locale/codecvt/codecvt_utf16/80041.cc: New test.

diff --git a/libstdc++-v3/src/c++11/codecvt.cc b/libstdc++-v3/src/c++11/codecvt.cc
index 9c91725..ef38267 100644
--- a/libstdc++-v3/src/c++11/codecvt.cc
+++ b/libstdc++-v3/src/c++11/codecvt.cc
@@ -1217,7 +1217,10 @@ do_out(state_type&, const intern_type* __from, const intern_type* __from_end,
        extern_type* __to, extern_type* __to_end,
        extern_type*& __to_next) const
 {
-  range<char> to{ __to, __to_end };
+  range<char16_t> to{
+    reinterpret_cast<char16_t*>(__to),
+    reinterpret_cast<char16_t*>(__to_end)
+  };
 #if __SIZEOF_WCHAR_T__ == 2
   range<const char16_t> from{
     reinterpret_cast<const char16_t*>(__from),
@@ -1234,7 +1237,7 @@ do_out(state_type&, const intern_type* __from, const intern_type* __from_end,
   return codecvt_base::error;
 #endif
   __from_next = reinterpret_cast<const wchar_t*>(from.next);
-  __to_next = to.next;
+  __to_next = reinterpret_cast<char*>(to.next);
   return res;
 }
 
@@ -1254,7 +1257,10 @@ do_in(state_type&, const extern_type* __from, const extern_type* __from_end,
       intern_type* __to, intern_type* __to_end,
       intern_type*& __to_next) const
 {
-  range<const char> from{ __from, __from_end };
+  range<const char16_t> from{
+    reinterpret_cast<const char16_t*>(__from),
+    reinterpret_cast<const char16_t*>(__from_end)
+  };
 #if __SIZEOF_WCHAR_T__ == 2
   range<char16_t> to{
     reinterpret_cast<char16_t*>(__to),
@@ -1270,7 +1276,7 @@ do_in(state_type&, const extern_type* __from, const extern_type* __from_end,
 #else
   return codecvt_base::error;
 #endif
-  __from_next = from.next;
+  __from_next = reinterpret_cast<const char*>(from.next);
   __to_next = reinterpret_cast<wchar_t*>(to.next);
   return res;
 }
diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf16/80041.cc b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf16/80041.cc
new file mode 100644
index 0000000..a78b194
--- /dev/null
+++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf16/80041.cc
@@ -0,0 +1,87 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.
+// This library is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do run { target c++11 } }
+
+#include <codecvt>
+#include <testsuite_hooks.h>
+
+void
+test01()
+{
+#ifdef _GLIBCXX_USE_WCHAR_T
+  std::codecvt_utf16<wchar_t> conv;
+  const wchar_t wc = 0x6557;
+  char bytes[2] = {0};
+  const wchar_t* wcnext;
+  std::mbstate_t st{};
+  char* next = nullptr;
+  auto res = conv.out(st, &wc, &wc+ 1, wcnext, bytes, std::end(bytes), next);
+  VERIFY( res == std::codecvt_base::ok );
+  VERIFY( wcnext == &wc + 1 );
+  VERIFY( next == std::end(bytes) );
+  VERIFY( bytes[0] == 0x65 );
+  VERIFY( bytes[1] == 0x57 );
+  VERIFY( conv.length(st, bytes, next, 1) == (next - bytes) );
+
+  wchar_t w;
+  wchar_t* wnext;
+  const char* cnext;
+  st = {};
+  res = conv.in(st, bytes, next, cnext, &w, &w + 1, wnext);
+  VERIFY( res == std::codecvt_base::ok );
+  VERIFY( wnext == &w + 1 );
+  VERIFY( cnext == next );
+  VERIFY( w == wc );
+#endif
+}
+
+void
+test02()
+{
+#ifdef _GLIBCXX_USE_WCHAR_T
+  std::codecvt_utf16<wchar_t, 0x10FFFF, std::little_endian> conv;
+  wchar_t wc = 0x6557;
+  char bytes[2] = {0};
+  const wchar_t* wcnext;
+  std::mbstate_t st{};
+  char* next = nullptr;
+  auto res = conv.out(st, &wc, &wc+ 1, wcnext, bytes, std::end(bytes), next);
+  VERIFY( res == std::codecvt_base::ok );
+  VERIFY( wcnext == &wc + 1 );
+  VERIFY( next == std::end(bytes) );
+  VERIFY( bytes[0] == 0x57 );
+  VERIFY( bytes[1] == 0x65 );
+  VERIFY( conv.length(st, bytes, next, 1) == (next - bytes) );
+
+  wchar_t w;
+  wchar_t* wnext;
+  const char* cnext;
+  st = {};
+  res = conv.in(st, bytes, next, cnext, &w, &w + 1, wnext);
+  VERIFY( res == std::codecvt_base::ok );
+  VERIFY( wnext == &w + 1 );
+  VERIFY( cnext == next );
+  VERIFY( w == wc );
+#endif
+}
+
+int main()
+{
+  test01();
+  test02();
+}
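
For anyone who wants to check the expected byte sequences without going
through the facet, here is a rough standalone sketch of what UTF-16
output should look like for a single code point. It is not the
libstdc++ implementation: the names ByteOrder, put_unit and
encode_utf16 are made up for illustration. It covers the two points
discussed in this thread: 0xffff must be a single code unit (the
comparison is <=, not <), and the external bytes are 16-bit code units
in the selected byte order, not UTF-8.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper names for illustration; not part of libstdc++.
enum class ByteOrder { big, little };

// Append one 16-bit code unit as two bytes in the requested byte order.
inline void
put_unit(std::vector<unsigned char>& out, std::uint16_t unit, ByteOrder order)
{
  const unsigned char hi = static_cast<unsigned char>(unit >> 8);
  const unsigned char lo = static_cast<unsigned char>(unit & 0xFF);
  if (order == ByteOrder::big)
    {
      out.push_back(hi);
      out.push_back(lo);
    }
  else
    {
      out.push_back(lo);
      out.push_back(hi);
    }
}

// Encode one code point as UTF-16 bytes.
// Returns an empty vector for values that are not valid code points.
inline std::vector<unsigned char>
encode_utf16(char32_t c, ByteOrder order)
{
  std::vector<unsigned char> out;
  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return out; // out of range, or a lone surrogate

  if (c <= 0xFFFF) // note <=: 0xFFFF itself is a single code unit
    put_unit(out, static_cast<std::uint16_t>(c), order);
  else
    {
      // Split the 20 bits above 0x10000 into a surrogate pair.
      const std::uint32_t v = static_cast<std::uint32_t>(c) - 0x10000;
      put_unit(out, static_cast<std::uint16_t>(0xD800 + (v >> 10)), order);
      put_unit(out, static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)), order);
    }
  return out;
}
```

With this sketch, encode_utf16(0x6557, ByteOrder::big) gives the bytes
0x65 0x57 and ByteOrder::little gives 0x57 0x65, matching what test01
and test02 above check. The UTF-8 encoding of the same character would
be the three bytes 0xE6 0x95 0x97, which cannot fit in the two-byte
buffer the tests use.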