Message ID | 56B8FA69.8030508@redhat.com |
---|---|
State | New |
On 02/08/2016 09:28 PM, Carlos O'Donell wrote:
> In bug 19575 Florian Weimer asks about the status of the glibc
> support for GB 18030-2005, since ICU and Emacs produce slightly
> different results than glibc.
>
> The following patch adds clarifying comments to GB 18030-2005's
> character map to explain why glibc has the following mapping and
> why it is best-practice.

The comments would probably have helped me to understand the situation.

Thanks,
Florian
"Carlos O'Donell" <carlos@redhat.com> writes:

> In bug 19575 Florian Weimer asks about the status of the glibc
> support for GB 18030-2005, since ICU and Emacs produce slightly
> different results than glibc.

Emacs uses the same table as glibc.

> +% The code points from <UFE10> to <UFE19> are a adjustment
> +% of the GB 18030-2005 standard to account for the fact that
> +% with Unicode 4.1 support we can now correctly represent those
> +% entries, which in the standard, used PUA code points.

There are more differences between GB18030-2000 and GB18030-2005.

Andreas.
On 02/08/2016 04:45 PM, Andreas Schwab wrote:
> "Carlos O'Donell" <carlos@redhat.com> writes:
>
>> In bug 19575 Florian Weimer asks about the status of the glibc
>> support for GB 18030-2005, since ICU and Emacs produce slightly
>> different results than glibc.
>
> Emacs uses the same table as glibc.

Good.

>> +% The code points from <UFE10> to <UFE19> are a adjustment
>> +% of the GB 18030-2005 standard to account for the fact that
>> +% with Unicode 4.1 support we can now correctly represent those
>> +% entries, which in the standard, used PUA code points.
>
> There are more differences between GB18030-2000 and GB18030-2005.

Agreed. This patch is only to clarify why these entries are being mapped
differently than in the original GB 18030-2005 standard.

Does the patch seem suitable to you?

Cheers,
Carlos.
"Carlos O'Donell" <carlos@redhat.com> writes:

> This patch is only to clarify why these entries are being mapped
> differently than in the original GB 18030-2005 standard.

They aren't.

Andreas.
On 02/08/2016 05:19 PM, Andreas Schwab wrote:
> "Carlos O'Donell" <carlos@redhat.com> writes:
>
>> This patch is only to clarify why these entries are being mapped
>> differently than in the original GB 18030-2005 standard.
>
> They aren't.

Do you have a copy of the standard to verify that?

Cheers,
Carlos.
"Carlos O'Donell" <carlos@redhat.com> writes:

> On 02/08/2016 05:19 PM, Andreas Schwab wrote:
>> "Carlos O'Donell" <carlos@redhat.com> writes:
>>
>>> This patch is only to clarify why these entries are being mapped
>>> differently than in the original GB 18030-2005 standard.
>>
>> They aren't.
>
> Do you have a copy of the standard to verify that?

See charset/data/ucm/gb-18030-2005.ucm in ICU.

Andreas.
diff --git a/localedata/charmaps/GB18030 b/localedata/charmaps/GB18030
index 863a123..c48276e 100644
--- a/localedata/charmaps/GB18030
+++ b/localedata/charmaps/GB18030
@@ -57234,6 +57234,12 @@ CHARMAP
 <UE78A> /xa6/xbe <Private Use>
 <UE78B> /xa6/xbf <Private Use>
 <UE78C> /xa6/xc0 <Private Use>
+% The newest GB 18030-2005 standard still uses some private use area
+% code points. Any implementation which has Unicode 4.1 or newer
+% support should not use these PUA code points, and instead should
+% map these entries to their equivalent non-PUA code points which
+% in this case map from <UFE10> to <UFE19>. This recommendation is
+% based on "CJKV Processing" by Dr. Ken Lunde.
 % <UE78D> /xa6/xd9 <Private Use>
 % <UE78E> /xa6/xda <Private Use>
 % <UE78F> /xa6/xdb <Private Use>
@@ -62997,6 +63003,10 @@ CHARMAP
 <UFE0D> /x84/x31/x82/x33 VARIATION SELECTOR-14
 <UFE0E> /x84/x31/x82/x34 VARIATION SELECTOR-15
 <UFE0F> /x84/x31/x82/x35 VARIATION SELECTOR-16
+% The code points from <UFE10> to <UFE19> are a adjustment
+% of the GB 18030-2005 standard to account for the fact that
+% with Unicode 4.1 support we can now correctly represent those
+% entries, which in the standard, used PUA code points.
 <UFE10> /xa6/xd9 PRESENTATION FORM FOR VERTICAL COMMA
 <UFE11> /xa6/xdb PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC COMMA
 <UFE12> /xa6/xda PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP
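
For readers who want to inspect the entries this patch touches, the following is a minimal Python sketch that reads charmap lines of the form shown in the diff and reports whether the mapped code point lies in the BMP Private Use Area (U+E000 to U+F8FF). The helper names `parse_charmap_line` and `is_bmp_private_use` are hypothetical and not part of glibc, and the sketch assumes `%` is the comment character, as it appears to be in the hunks above.

```python
import re

# Hypothetical helper, not part of glibc. Matches entries such as
#   <UFE10> /xa6/xd9 PRESENTATION FORM FOR VERTICAL COMMA
_ENTRY = re.compile(
    r"^<U([0-9A-Fa-f]{4,8})>\s+((?:/x[0-9A-Fa-f]{2})+)\s+(.*)$"
)


def parse_charmap_line(line, comment_char="%"):
    """Return (code_point, encoded_bytes, name), or None for non-entries.

    Assumes '%' introduces a comment, as in the hunks of this patch.
    """
    line = line.strip()
    if not line or line.startswith(comment_char):
        return None
    match = _ENTRY.match(line)
    if match is None:
        return None
    code_point = int(match.group(1), 16)
    # "/xa6/xd9" -> ["", "a6", "d9"] -> b"\xa6\xd9"
    encoded = bytes(int(h, 16) for h in match.group(2).split("/x")[1:])
    return code_point, encoded, match.group(3)


def is_bmp_private_use(code_point):
    """True if the code point is in the BMP Private Use Area."""
    return 0xE000 <= code_point <= 0xF8FF


if __name__ == "__main__":
    sample = [
        "<UE78C> /xa6/xc0 <Private Use>",
        "% <UE78D> /xa6/xd9 <Private Use>",
        "<UFE10> /xa6/xd9 PRESENTATION FORM FOR VERTICAL COMMA",
    ]
    for text in sample:
        entry = parse_charmap_line(text)
        if entry is None:
            continue  # comment or non-entry line
        cp, raw, name = entry
        kind = "PUA" if is_bmp_private_use(cp) else "non-PUA"
        print(f"U+{cp:04X} {raw.hex()} {kind} {name}")
```

In the sample, the commented-out `<UE78D>` line is skipped, so the byte sequence `/xa6/xd9` is reported only for `<UFE10>`, which matches what the charmap in the diff defines.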