Message ID | or8ufscg8p.fsf@livre.home |
---|---|
State | New |
On 02/20/2015 06:31 PM, Alexandre Oliva wrote:
> On Feb 20, 2015, "Carlos O'Donell" <carlos@redhat.com> wrote:
>
>> Thus __STDC_ISO_10646__ should be 201304L (the date that ISO/IEC 10646:2012
>> Amd.1 was published).
>
> Fixed in the patch below.

This change looks good to me. OK to commit.

> On Feb 19, 2015, Mike FABIAN <mfabian@redhat.com> wrote:
>
>> Mike Frysinger <vapier@gentoo.org> wrote:
>
>>> module level constants should really be in CAPS. and use a tuple to make it
>>> const.
>>> -mike
>
>> https://github.com/pravins/glibc-i18n/commit/53b81c58d220bfbb0e8faf8d4313c705826f4543
>
> Thanks, integrated. I also adjusted the copyright notices to use year
> ranges, as requested.

Thanks.

> On Feb 20, 2015, "Carlos O'Donell" <carlos@redhat.com> wrote:
>
>> One nit:
>
>> -% Character width according to Unicode 5.0.0.
>> +% Character width according to Unicode 7.0.0.
>> % - Default width is 1.
>> % - Double-width characters have width 2; generated from
>> %   "grep '^[^;]*;[WF]' EastAsianWidth.txt"
>> -%   and "grep '^[^;]*;[^WF]' EastAsianWidth.txt"
>> % - Non-spacing characters have width 0; generated from PropList.txt or
>> %   "grep '^[^;]*;[^;]*;[^;]*;[^;]*;NSM;' UnicodeData.txt"
>> % - Format control characters have width 0; generated from
>> %   "grep '^[^;]*;[^;]*;Cf;' UnicodeData.txt"
>> -% - Zero width characters have width 0; generated from
>> -%   "grep '^[^;]*;ZERO WIDTH ' UnicodeData.txt"
>
>> Why even mention the `grep` to be used to generate this data?
>> It should just say to use the scripts. Nobody should be confused
>> that this data was actually generated by this method. Nor do I want
>> anyone doing it this way ever again.
>
>> Thus shouldn't `write_header_width` simply not output any of this
>> stuff? I understand we're trying to minimize the initial diff, but
>> in cleanup, we should remove all of this and just say:
>
>> "% Character width according to Unicode 7.0.0."
>
> I don't know enough about Unicode to tell whether we've extracted all of
> the width information encoded in it, but I have verified that behavior
> encoded in the python script is equivalent to what is described in the
> comments, so I decided not to act on this right away. I guess we might
> want to tweak the comments to make what's going on clearer, instead of
> just dropping the info, although I wouldn't oppose that either.
>
> Does anyone else have thoughts to share on this?
>
> Mike FABIAN, should you want to tackle this, would you please submit a
> patch to this list, with a proper ChangeLog entry, so that it can be
> installed as written by yourself?

Yes, please take this up with Mike and make sure we clean it up.
My preference is to remove the comment entirely.

> Here's the patch I'm testing. Ok to install?

Yes, OK to install.

> Amendments to Unicode 7 update.
>
> From: Alexandre Oliva <aoliva@redhat.com>
>
> for ChangeLog
>
> 	* include/stdc-predef.h (__STDC_ISO_10646__): Update to
> 	201304L, for Unicode 7.

OK.

> for localedata/ChangeLog
>
> 	* unicode-gen/ctype_compatibility.py: Use date ranges in
> 	copyright notice.
> 	* unicode-gen/ctype_compatibility_test_cases.py: Likewise.
> 	* unicode-gen/gen_unicode_ctype.py: Likewise.
> 	* unicode-gen/utf8_compatibility.py: Likewise.
> 	* unicode-gen/utf8_gen.py: Likewise. Use upper case for
> 	global variables, use tuples for global constant arrays. From
> 	Mike FABIAN. Suggested by Mike Frysinger <vapier@gentoo.org>.
> ---
>  include/stdc-predef.h                             | 11 ++++++++---
>  localedata/unicode-gen/ctype_compatibility.py     |  2 +-
>  .../unicode-gen/ctype_compatibility_test_cases.py |  2 +-
>  localedata/unicode-gen/gen_unicode_ctype.py       |  2 +-
>  localedata/unicode-gen/utf8_compatibility.py      |  2 +-
>  localedata/unicode-gen/utf8_gen.py                | 20 ++++++++++----------
>  6 files changed, 22 insertions(+), 17 deletions(-)
>
> diff --git a/include/stdc-predef.h b/include/stdc-predef.h
> index 1d6a4eb..e5f1139 100644
> --- a/include/stdc-predef.h
> +++ b/include/stdc-predef.h
> @@ -49,9 +49,14 @@
>  # define __STDC_IEC_559_COMPLEX__ 1
>  #endif
>
> -/* wchar_t uses ISO/IEC 10646 (2nd ed., published 2011-03-15) /
> -   Unicode 6.0.  */
> -#define __STDC_ISO_10646__ 201103L
> +/* wchar_t uses Unicode 7.0.0.  Version 7.0 of the Unicode Standard is
> +   synchronized with ISO/IEC 10646:2012, plus Amendments 1 (published
> +   on April, 2013) and 2 (not yet published as of February, 2015).
> +   Additionally, it includes the accelerated publication of U+20BD
> +   RUBLE SIGN.  Therefore Unicode 7.0.0 is between 10646:2012 and
> +   10646:2014, and so we use the date ISO/IEC 10646:2012 Amd.1 was
> +   published.  */

OK. Excellent comment.

> +#define __STDC_ISO_10646__ 201304L
>
>  /* We do not support C11 <threads.h>.  */
>  #define __STDC_NO_THREADS__ 1
> diff --git a/localedata/unicode-gen/ctype_compatibility.py b/localedata/unicode-gen/ctype_compatibility.py
> index 19e9ee5..0d67f29 100755
> --- a/localedata/unicode-gen/ctype_compatibility.py
> +++ b/localedata/unicode-gen/ctype_compatibility.py
> @@ -1,6 +1,6 @@
>  #!/usr/bin/python3
>  # -*- coding: utf-8 -*-
> -# Copyright (C) 2014, 2015 Free Software Foundation, Inc.
> +# Copyright (C) 2014-2015 Free Software Foundation, Inc.
>  # This file is part of the GNU C Library.
>  #
>  # The GNU C Library is free software; you can redistribute it and/or
> diff --git a/localedata/unicode-gen/ctype_compatibility_test_cases.py b/localedata/unicode-gen/ctype_compatibility_test_cases.py
> index ab7f6dd..34e6de4 100644
> --- a/localedata/unicode-gen/ctype_compatibility_test_cases.py
> +++ b/localedata/unicode-gen/ctype_compatibility_test_cases.py
> @@ -1,5 +1,5 @@
>  # -*- coding: utf-8 -*-
> -# Copyright (C) 2014, 2015 Free Software Foundation, Inc.
> +# Copyright (C) 2014-2015 Free Software Foundation, Inc.
>  # This file is part of the GNU C Library.
>  #
>  # The GNU C Library is free software; you can redistribute it and/or
> diff --git a/localedata/unicode-gen/gen_unicode_ctype.py b/localedata/unicode-gen/gen_unicode_ctype.py
> index 559af79..0c74f2a 100755
> --- a/localedata/unicode-gen/gen_unicode_ctype.py
> +++ b/localedata/unicode-gen/gen_unicode_ctype.py
> @@ -1,7 +1,7 @@
>  #!/usr/bin/python3
>  #
>  # Generate a Unicode conforming LC_CTYPE category from a UnicodeData file.
> -# Copyright (C) 2014, 2015 Free Software Foundation, Inc.
> +# Copyright (C) 2014-2015 Free Software Foundation, Inc.
>  # This file is part of the GNU C Library.
>  # Based on gen-unicode-ctype.c by Bruno Haible <haible@clisp.cons.org>, 2000.
>  #
> diff --git a/localedata/unicode-gen/utf8_compatibility.py b/localedata/unicode-gen/utf8_compatibility.py
> index e11327b..b84a1eb 100755
> --- a/localedata/unicode-gen/utf8_compatibility.py
> +++ b/localedata/unicode-gen/utf8_compatibility.py
> @@ -1,6 +1,6 @@
>  #!/usr/bin/python3
>  # -*- coding: utf-8 -*-
> -# Copyright (C) 2014, 2015 Free Software Foundation, Inc.
> +# Copyright (C) 2014-2015 Free Software Foundation, Inc.
>  # This file is part of the GNU C Library.
>  #
>  # The GNU C Library is free software; you can redistribute it and/or
> diff --git a/localedata/unicode-gen/utf8_gen.py b/localedata/unicode-gen/utf8_gen.py
> index 670a628..f1b88f5 100755
> --- a/localedata/unicode-gen/utf8_gen.py
> +++ b/localedata/unicode-gen/utf8_gen.py
> @@ -1,6 +1,6 @@
>  #!/usr/bin/python3
>  # -*- coding: utf-8 -*-
> -# Copyright (C) 2014, 2015 Free Software Foundation, Inc.
> +# Copyright (C) 2014-2015 Free Software Foundation, Inc.
>  # This file is part of the GNU C Library.
>  #
>  # The GNU C Library is free software; you can redistribute it and/or
> @@ -33,21 +33,21 @@ import re
>  # Auxiliary tables for Hangul syllable names, see the Unicode 3.0 book,
>  # sections 3.11 and 4.4.
>
> -jamo_initial_short_name = [
> +JAMO_INITIAL_SHORT_NAME = (
>      'G', 'GG', 'N', 'D', 'DD', 'R', 'M', 'B', 'BB', 'S', 'SS', '', 'J', 'JJ',
>      'C', 'K', 'T', 'P', 'H'
> -]
> +)
>
> -jamo_medial_short_name = [
> +JAMO_MEDIAL_SHORT_NAME = (
>      'A', 'AE', 'YA', 'YAE', 'EO', 'E', 'YEO', 'YE', 'O', 'WA', 'WAE', 'OE',
>      'YO', 'U', 'WEO', 'WE', 'WI', 'YU', 'EU', 'YI', 'I'
> -]
> +)
>
> -jamo_final_short_name = [
> +JAMO_FINAL_SHORT_NAME = (
>      '', 'G', 'GG', 'GS', 'N', 'NI', 'NH', 'D', 'L', 'LG', 'LM', 'LB', 'LS',
>      'LT', 'LP', 'LH', 'M', 'B', 'BS', 'S', 'SS', 'NG', 'J', 'C', 'K', 'T',
>      'P', 'H'
> -]
> +)
>
>  def ucs_symbol(code_point):
>      '''Return the UCS symbol string for a Unicode character.'''
> @@ -74,9 +74,9 @@ def process_range(start, end, outfile, name):
>              index2, index3 = divmod(i - 0xaC00, 28)
>              index1, index2 = divmod(index2, 21)
>              hangul_syllable_name = 'HANGUL SYLLABLE ' \
> -                + jamo_initial_short_name[index1] \
> -                + jamo_medial_short_name[index2] \
> -                + jamo_final_short_name[index3]
> +                + JAMO_INITIAL_SHORT_NAME[index1] \
> +                + JAMO_MEDIAL_SHORT_NAME[index2] \
> +                + JAMO_FINAL_SHORT_NAME[index3]
>              outfile.write('{:<11s} {:<12s} {:s}\n'.format(
>                  ucs_symbol(i), convert_to_hex(i),
>                  hangul_syllable_name))
>

OK.

Cheers,
Carlos.
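The width rules in the header comment Carlos quotes can be restated as a small Python function. This is a hedged sketch, not the logic in `utf8_gen.py`. The name `char_width` is hypothetical. The function reads Python's built-in `unicodedata` tables, whose Unicode version depends on the interpreter and is not necessarily 7.0.0, instead of `UnicodeData.txt` and `EastAsianWidth.txt`. It also assumes that the zero-width rules take precedence over the double-width rule.

```python
import unicodedata

def char_width(ch):
    '''Return 0, 1 or 2 following the rules in the WIDTH header comment.

    Hypothetical helper: it uses the interpreter's unicodedata tables,
    not the Unicode 7.0.0 data files, and assumes the zero-width rules
    win over the double-width rule.
    '''
    # Non-spacing marks: bidi class NSM (field 5 of UnicodeData.txt).
    if unicodedata.bidirectional(ch) == 'NSM':
        return 0
    # Format control characters: general category Cf (field 3).
    if unicodedata.category(ch) == 'Cf':
        return 0
    # East Asian Width W (wide) or F (fullwidth): double width.
    if unicodedata.east_asian_width(ch) in ('W', 'F'):
        return 2
    # Everything else gets the default width.
    return 1

for ch in ('A', '\u4e2d', '\uff21', '\u0301', '\u200b'):
    print('U+{:04X} width {}'.format(ord(ch), char_width(ch)))
```

The bidi-class check corresponds to the `NSM` field matched by the quoted `grep` over `UnicodeData.txt`, the category check to the `Cf` match, and the East Asian Width check to the `[WF]` match over `EastAsianWidth.txt`.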
diff --git a/include/stdc-predef.h b/include/stdc-predef.h
index 1d6a4eb..e5f1139 100644
--- a/include/stdc-predef.h
+++ b/include/stdc-predef.h
@@ -49,9 +49,14 @@
 # define __STDC_IEC_559_COMPLEX__ 1
 #endif

-/* wchar_t uses ISO/IEC 10646 (2nd ed., published 2011-03-15) /
-   Unicode 6.0.  */
-#define __STDC_ISO_10646__ 201103L
+/* wchar_t uses Unicode 7.0.0.  Version 7.0 of the Unicode Standard is
+   synchronized with ISO/IEC 10646:2012, plus Amendments 1 (published
+   on April, 2013) and 2 (not yet published as of February, 2015).
+   Additionally, it includes the accelerated publication of U+20BD
+   RUBLE SIGN.  Therefore Unicode 7.0.0 is between 10646:2012 and
+   10646:2014, and so we use the date ISO/IEC 10646:2012 Amd.1 was
+   published.  */
+#define __STDC_ISO_10646__ 201304L

 /* We do not support C11 <threads.h>.  */
 #define __STDC_NO_THREADS__ 1
diff --git a/localedata/unicode-gen/ctype_compatibility.py b/localedata/unicode-gen/ctype_compatibility.py
index 19e9ee5..0d67f29 100755
--- a/localedata/unicode-gen/ctype_compatibility.py
+++ b/localedata/unicode-gen/ctype_compatibility.py
@@ -1,6 +1,6 @@
 #!/usr/bin/python3
 # -*- coding: utf-8 -*-
-# Copyright (C) 2014, 2015 Free Software Foundation, Inc.
+# Copyright (C) 2014-2015 Free Software Foundation, Inc.
 # This file is part of the GNU C Library.
 #
 # The GNU C Library is free software; you can redistribute it and/or
diff --git a/localedata/unicode-gen/ctype_compatibility_test_cases.py b/localedata/unicode-gen/ctype_compatibility_test_cases.py
index ab7f6dd..34e6de4 100644
--- a/localedata/unicode-gen/ctype_compatibility_test_cases.py
+++ b/localedata/unicode-gen/ctype_compatibility_test_cases.py
@@ -1,5 +1,5 @@
 # -*- coding: utf-8 -*-
-# Copyright (C) 2014, 2015 Free Software Foundation, Inc.
+# Copyright (C) 2014-2015 Free Software Foundation, Inc.
 # This file is part of the GNU C Library.
 #
 # The GNU C Library is free software; you can redistribute it and/or
diff --git a/localedata/unicode-gen/gen_unicode_ctype.py b/localedata/unicode-gen/gen_unicode_ctype.py
index 559af79..0c74f2a 100755
--- a/localedata/unicode-gen/gen_unicode_ctype.py
+++ b/localedata/unicode-gen/gen_unicode_ctype.py
@@ -1,7 +1,7 @@
 #!/usr/bin/python3
 #
 # Generate a Unicode conforming LC_CTYPE category from a UnicodeData file.
-# Copyright (C) 2014, 2015 Free Software Foundation, Inc.
+# Copyright (C) 2014-2015 Free Software Foundation, Inc.
 # This file is part of the GNU C Library.
 # Based on gen-unicode-ctype.c by Bruno Haible <haible@clisp.cons.org>, 2000.
 #
diff --git a/localedata/unicode-gen/utf8_compatibility.py b/localedata/unicode-gen/utf8_compatibility.py
index e11327b..b84a1eb 100755
--- a/localedata/unicode-gen/utf8_compatibility.py
+++ b/localedata/unicode-gen/utf8_compatibility.py
@@ -1,6 +1,6 @@
 #!/usr/bin/python3
 # -*- coding: utf-8 -*-
-# Copyright (C) 2014, 2015 Free Software Foundation, Inc.
+# Copyright (C) 2014-2015 Free Software Foundation, Inc.
 # This file is part of the GNU C Library.
 #
 # The GNU C Library is free software; you can redistribute it and/or
diff --git a/localedata/unicode-gen/utf8_gen.py b/localedata/unicode-gen/utf8_gen.py
index 670a628..f1b88f5 100755
--- a/localedata/unicode-gen/utf8_gen.py
+++ b/localedata/unicode-gen/utf8_gen.py
@@ -1,6 +1,6 @@
 #!/usr/bin/python3
 # -*- coding: utf-8 -*-
-# Copyright (C) 2014, 2015 Free Software Foundation, Inc.
+# Copyright (C) 2014-2015 Free Software Foundation, Inc.
 # This file is part of the GNU C Library.
 #
 # The GNU C Library is free software; you can redistribute it and/or
@@ -33,21 +33,21 @@ import re
 # Auxiliary tables for Hangul syllable names, see the Unicode 3.0 book,
 # sections 3.11 and 4.4.

-jamo_initial_short_name = [
+JAMO_INITIAL_SHORT_NAME = (
     'G', 'GG', 'N', 'D', 'DD', 'R', 'M', 'B', 'BB', 'S', 'SS', '', 'J', 'JJ',
     'C', 'K', 'T', 'P', 'H'
-]
+)

-jamo_medial_short_name = [
+JAMO_MEDIAL_SHORT_NAME = (
     'A', 'AE', 'YA', 'YAE', 'EO', 'E', 'YEO', 'YE', 'O', 'WA', 'WAE', 'OE',
     'YO', 'U', 'WEO', 'WE', 'WI', 'YU', 'EU', 'YI', 'I'
-]
+)

-jamo_final_short_name = [
+JAMO_FINAL_SHORT_NAME = (
     '', 'G', 'GG', 'GS', 'N', 'NI', 'NH', 'D', 'L', 'LG', 'LM', 'LB', 'LS',
     'LT', 'LP', 'LH', 'M', 'B', 'BS', 'S', 'SS', 'NG', 'J', 'C', 'K', 'T',
     'P', 'H'
-]
+)

 def ucs_symbol(code_point):
     '''Return the UCS symbol string for a Unicode character.'''
@@ -74,9 +74,9 @@ def process_range(start, end, outfile, name):
             index2, index3 = divmod(i - 0xaC00, 28)
             index1, index2 = divmod(index2, 21)
             hangul_syllable_name = 'HANGUL SYLLABLE ' \
-                + jamo_initial_short_name[index1] \
-                + jamo_medial_short_name[index2] \
-                + jamo_final_short_name[index3]
+                + JAMO_INITIAL_SHORT_NAME[index1] \
+                + JAMO_MEDIAL_SHORT_NAME[index2] \
+                + JAMO_FINAL_SHORT_NAME[index3]
             outfile.write('{:<11s} {:<12s} {:s}\n'.format(
                 ucs_symbol(i), convert_to_hex(i),
                 hangul_syllable_name))
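The hunk at line 74 builds a Hangul syllable name arithmetically from the three jamo tables. Below is a standalone sketch of that computation. `hangul_syllable_name` is a hypothetical helper name. The tables are copied from the patch with one deliberate difference: index 5 of the final table is `'NJ'`, the short name that the Unicode Standard's Jamo.txt assigns to U+11AC HANGUL JONGSEONG NIEUN-CIEUC. The tuple in the patch reads `'NI'`, which would produce, for example, HANGUL SYLLABLE GANI instead of GANJ for U+AC05. The last lines cross-check every precomposed syllable against Python's `unicodedata.name()`. They also show that the tuples reject item assignment, which is the point of Mike Frysinger's suggestion.

```python
import unicodedata

# Jamo short-name tables, copied from the patch's utf8_gen.py except for
# index 5 of the final table ('NJ' here, 'NI' in the patch).
JAMO_INITIAL_SHORT_NAME = (
    'G', 'GG', 'N', 'D', 'DD', 'R', 'M', 'B', 'BB', 'S', 'SS', '', 'J', 'JJ',
    'C', 'K', 'T', 'P', 'H'
)
JAMO_MEDIAL_SHORT_NAME = (
    'A', 'AE', 'YA', 'YAE', 'EO', 'E', 'YEO', 'YE', 'O', 'WA', 'WAE', 'OE',
    'YO', 'U', 'WEO', 'WE', 'WI', 'YU', 'EU', 'YI', 'I'
)
JAMO_FINAL_SHORT_NAME = (
    '', 'G', 'GG', 'GS', 'N', 'NJ', 'NH', 'D', 'L', 'LG', 'LM', 'LB', 'LS',
    'LT', 'LP', 'LH', 'M', 'B', 'BS', 'S', 'SS', 'NG', 'J', 'C', 'K', 'T',
    'P', 'H'
)

def hangul_syllable_name(code_point):
    '''Return the name of a precomposed Hangul syllable (hypothetical helper).'''
    if not 0xAC00 <= code_point <= 0xD7A3:
        raise ValueError('U+{:04X} is not a precomposed Hangul syllable'
                         .format(code_point))
    # Same decomposition as process_range(): 28 finals (including none)
    # per initial/medial pair, and 21 medials per initial.
    index2, index3 = divmod(code_point - 0xAC00, 28)
    index1, index2 = divmod(index2, 21)
    return ('HANGUL SYLLABLE '
            + JAMO_INITIAL_SHORT_NAME[index1]
            + JAMO_MEDIAL_SHORT_NAME[index2]
            + JAMO_FINAL_SHORT_NAME[index3])

print(hangul_syllable_name(0xAC00))  # HANGUL SYLLABLE GA
print(hangul_syllable_name(0xAC05))  # HANGUL SYLLABLE GANJ
print(hangul_syllable_name(0xD7A3))  # HANGUL SYLLABLE HIH

# Cross-check all 11172 syllables against Python's own name database.
mismatches = [cp for cp in range(0xAC00, 0xD7A4)
              if hangul_syllable_name(cp) != unicodedata.name(chr(cp))]
print('mismatches:', len(mismatches))

# Tuples are immutable, so an accidental write raises TypeError.
try:
    JAMO_FINAL_SHORT_NAME[5] = 'NI'
except TypeError as exc:
    print('TypeError:', exc)
```

A list would have accepted the assignment in the last step silently. A tuple turns an accidental write to a module-level table into an immediate error.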