From patchwork Tue Feb 9 10:02:42 2016
X-Patchwork-Submitter: Carlos O'Donell
X-Patchwork-Id: 580687
Subject: Re: [PATCH] BZ #19575: Clarify status of entries in GB 18030-2005.
To: Andreas Schwab
Cc: GNU C Library
From: "Carlos O'Donell"
Message-ID: <56B9B942.2030203@redhat.com>
Date: Tue, 9 Feb 2016 05:02:42 -0500
References: <56B8FA69.8030508@redhat.com> <87mvrakhab.fsf@linux-m68k.org>
 <56B90D0C.7090000@redhat.com> <87a8nakfq6.fsf@linux-m68k.org>
 <56B92BC9.7010103@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

On 02/09/2016 03:55 AM, Andreas Schwab wrote:
> "Carlos O'Donell" writes:
>
>> On 02/08/2016 05:19 PM, Andreas Schwab wrote:
>>> "Carlos O'Donell" writes:
>>>
>>>> This patch is only to clarify why these entries are being mapped
>>>> differently than in the original GB 18030-2005 standard.
>>>
>>> They aren't.
>>
>> Do you have a copy of the standard to verify that?
>
> See charset/data/ucm/gb-18030-2005.ucm in ICU.

That's not a copy of the standard.

"CJKV Information Processing" by Dr. Ken Lunde on page 108 explicitly
states that GB-18030-2005 has 24 PUA mappings that with Unicode 4.1
or newer can be mapped to non-PUA equivalents and he describes the 24
characters, and the ICU ucm data does exactly that. This does not match
the published standard, but that is OK, it's best practice not to use
PUA mappings if you can avoid it when later Unicode versions include
non-PUA equivalents (as we do also in glibc).

All I want to clarify in the glibc version of these files is that the
data is not identical to the standard as published.

v2 of the patch follows.

OK to checkin?

Cheers,
Carlos.

2016-02-09  Carlos O'Donell

	[BZ #19575]
	* charmaps/GB18030: Document PUA to non-PUA equivalents.

diff --git a/localedata/charmaps/GB18030 b/localedata/charmaps/GB18030
index 863a123..85a15fe 100644
--- a/localedata/charmaps/GB18030
+++ b/localedata/charmaps/GB18030
@@ -57234,6 +57234,22 @@ CHARMAP
  /xa6/xbe
  /xa6/xbf
  /xa6/xc0
+% The newest GB 18030-2005 standard still uses some private use area
+% code points. Any implementation which has Unicode 4.1 or newer
+% support should not use these PUA code points, and instead should
+% map these entries to their equivalent non-PUA code points. There
+% are 24 idiograms in GB 18030-2005 which have non-PUA equivalents.
+% In glibc we only support roundtrip code points, and so must choose
+% between supporting the old PUA code points, or using the newer
+% non-PUA code points. We choose to use the non-PUA code points to
+% be compatible with ICU's similar choice. In choosing the non-PUA
+% code points we can no longer convert the old PUA code points back
+% to GB-18030-2005 (technically only fixable if we added support
+% for non-roundtrip code points e.g. ICU's "fallback mapping").
+% The recommendation to use the non-PUA code points, where available,
+% is based on "CJKV Information Processing" 2nd Ed. by Dr. Ken Lunde.
+%
+% These 10 PUA mappings use equivalents from to .
 % /xa6/xd9
 % /xa6/xda
 % /xa6/xdb
@@ -57371,6 +57387,7 @@ CHARMAP
  /xd7/xfd
  /xd7/xfe
  /x83/x36/xc9/x34
+% These 3 PUA mappings use equivalents , and .
 % /xfe/x51
 % /xfe/x52
 % /xfe/x53
@@ -57379,6 +57396,7 @@ CHARMAP
  /x83/x36/xc9/x37
  /x83/x36/xc9/x38
  /x83/x36/xc9/x39
+% This 1 PUA mapping uses the equivalent .
 % /xfe/x59
  /x83/x36/xca/x30
  /x83/x36/xca/x31
@@ -57387,17 +57405,20 @@ CHARMAP
  /x83/x36/xca/x34
  /x83/x36/xca/x35
  /x83/x36/xca/x36
+% This 1 PUA mapping uses the equivalent .
 % /xfe/x61
  /x83/x36/xca/x37
  /x83/x36/xca/x38
  /x83/x36/xca/x39
  /x83/x36/xcb/x30
+% These 2 PUA mappings use the equivalents and .
 % /xfe/x66
 % /xfe/x67
  /x83/x36/xcb/x31
  /x83/x36/xcb/x32
  /x83/x36/xcb/x33
  /x83/x36/xcb/x34
+% These 2 PUA mappings use the equivalents and .
 % /xfe/x6c
 % /xfe/x6d
  /x83/x36/xcb/x35
@@ -57408,6 +57429,7 @@ CHARMAP
  /x83/x36/xcc/x30
  /x83/x36/xcc/x31
  /x83/x36/xcc/x32
+% This 1 PUA mapping uses the equivalent .
 % /xfe/x76
  /x83/x36/xcc/x33
  /x83/x36/xcc/x34
@@ -57416,6 +57438,7 @@ CHARMAP
  /x83/x36/xcc/x37
  /x83/x36/xcc/x38
  /x83/x36/xcc/x39
+% This 1 PUA mapping uses the equivalent .
 % /xfe/x7e
  /x83/x36/xcd/x30
  /x83/x36/xcd/x31
@@ -57433,6 +57456,7 @@ CHARMAP
  /x83/x36/xce/x33
  /x83/x36/xce/x34
  /x83/x36/xce/x35
+% These 2 PUA mappings use the equivalents and .
 % /xfe/x90
 % /xfe/x91
  /x83/x36/xce/x36
@@ -57449,6 +57473,7 @@ CHARMAP
  /x83/x36/xcf/x37
  /x83/x36/xcf/x38
  /x83/x36/xcf/x39
+% This 1 PUA mapping uses the equivalent .
 % /xfe/xa0
  /x83/x36/xd0/x30
  /x83/x36/xd0/x31
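
The roundtrip constraint described in the charmap comment can be illustrated with a small sketch. This is not glibc's or ICU's implementation. The helper names (`build_roundtrip_charmap`, `encode_with_fallback`) are hypothetical, and the two code points paired with the byte sequence /xa6/xd9 are assumptions chosen for illustration, not values taken from the patch. The sketch shows why a strictly one-to-one table has to pick either the PUA or the non-PUA code point, and how a separate one-way fallback table could restore conversion of the old PUA code point.

```python
def build_roundtrip_charmap(entries):
    """Build decode/encode tables from (code point, byte sequence) pairs.

    Raises ValueError if a byte sequence or a code point appears twice,
    because that would break the one-to-one (roundtrip) property.
    """
    decode = {}  # byte sequence -> code point
    encode = {}  # code point -> byte sequence
    for code_point, seq in entries:
        if seq in decode:
            raise ValueError(
                f"{seq.hex()} is already mapped to U+{decode[seq]:04X}")
        if code_point in encode:
            raise ValueError(f"U+{code_point:04X} is already mapped")
        decode[seq] = code_point
        encode[code_point] = seq
    return decode, encode


SEQ = bytes([0xA6, 0xD9])  # byte sequence /xa6/xd9 from the charmap
PUA_CP = 0xE78D            # assumed PUA code point (illustrative)
NON_PUA_CP = 0xFE10        # assumed non-PUA equivalent (illustrative)

# Roundtrip-only table that chooses the non-PUA code point.
decode, encode = build_roundtrip_charmap([(NON_PUA_CP, SEQ)])


def both_mappings_accepted():
    """Try to map both code points to the same byte sequence."""
    try:
        build_roundtrip_charmap([(NON_PUA_CP, SEQ), (PUA_CP, SEQ)])
    except ValueError:
        return False
    return True


# One-way fallback table, loosely in the spirit of a "fallback mapping":
# used only when encoding, never when decoding.
fallback_encode = {PUA_CP: SEQ}


def encode_with_fallback(code_point):
    """Encode via the roundtrip table first, then the one-way fallback."""
    if code_point in encode:
        return encode[code_point]
    return fallback_encode[code_point]
```

With only the roundtrip table, `PUA_CP` has no entry in `encode`, which corresponds to the loss of conversion for the old PUA code points that the comment mentions. The fallback table adds that direction back without affecting what `decode` returns.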