[v11] Locales: Cyrillic -> ASCII transliteration [BZ #2872]

Message ID 80bb3d3a-bd89-2306-2772-85b5fdcb93c2@kobylkin.com
State New

Commit Message

Diego (Egor) Kobylkin Dec. 8, 2018, 10:28 p.m. UTC
Changelog v11:
* Re-targeted the patch against locale/C-translit.h.in as the proper
file for the ASCII translit table.
* Correspondingly, the patch now contains only the additional
Cyrillic-ASCII strings in the format of the locale/C-translit.h.in table.
The 'include "translit_cyrillic";""' directives are not necessary in the
locale files, which are now all left untouched.
* The file translit_cyrillic is also no longer needed and is omitted.
* Edited the email and commit message below.

Changelog v10:
* Removed ISO 9:1995 / GOST 7.79-2000 System A (transliteration to Latin
with diacritics) as conflicting with System B within glibc mechanics and
not solving BZ #2872.
* Edited the email below, the commit message, and the comment in
translit_cyrillic to reflect the System A removal.
* Removed <U0423><U0301> and <U0443><U0301> (Cyrillic U with acute,
using composition) as composition is not covered by the current glibc
conversion mechanics.

Changelog v9:
* Fixed formatting (trailing spaces etc.)
* Put commit summary in the patch file, now it is generated completely
by git format-patch

Changelog v8:
* Re-added translit_cyrillic, which was missing from patch v7 (due to a
missing "git add" in the script).

Changelog v7:
* Generated against git://sourceware.org/git/glibc.git master with git
format-patch.
* The 'include "translit_cyrillic";""' now immediately follows last
'include "translit_XXX";""' string (was inserted just before
translit_end previously.)
* Only the locales already having 'include .*translit.*;""' are patched
(see the list for manual exclusions below, full list of included locales
at the end of the email in the commit section.)
* Excluded az_AZ completely to avoid circular reference from tr_TR via
“copy "tr_TR"”.

Changelog v6:
* Locales removed from the patch: C and sd_PK.
* Added locales: az_AZ and ky_KG.
* Consistently transliterate single uppercase Cyrillic letters
  to sequences of all uppercase Latin letters in all languages (whenever
  a Cyrillic letter is transliterated to more than one Latin letter),
  for example "Ї" is now transliterated as "YI" rather than "Yi".

Dear locale maintainers,

This patch fixes glibc bug 2872, "Transliteration Cyrillic -> ASCII fails":

https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]

It adds the Cyrillic transliteration rows to locale/C-translit.h.in.

The patch is attached.


Current bug effect:

The glibc wiki explicitly lists this use case as the test example and
currently it fails on Cyrillic texts [1] [8] [9]:

iconv -f UTF-8 -t ASCII//TRANSLIT < translit-test-input.txt |grep CYRILLIC

CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.

That is, it produces a string of question marks and spaces.

This is what it should produce, and does once the patch is applied:

CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
chayu.


The root problem and the fix:

The root problem is the missing transliteration table that I am
supplying here.


COMMIT MESSAGE:
This Cyrillic table enables conversion (e.g. with iconv) of UTF-8
encoded text written in a Cyrillic alphabet to ASCII//TRANSLIT text.

Example: iconv -f UTF-8 -t ASCII//TRANSLIT will produce an
ASCII-compatible transcription.

While UTF-8 encoded Cyrillic text requires Cyrillic fonts, the result of
a transliteration/transcription contains only Latin/ASCII characters yet
can still be read by a native speaker. Among other things, it is useful
for processing Cyrillic texts and filenames with programs or on systems
that are not specifically prepared to work with Cyrillic, do not have
the corresponding fonts installed, or cannot handle UTF-8.

The patch content (mapping) is based on the ISO 9:1995 standard [10] and
its derivative GOST 7.79-2000 System B, whose official source is the
Federal Agency on Technical Regulating and Metrology of the Russian
Federation [2]. In practice, an independent but mostly identical source
[3] was used, and the mapping was prepared in a spreadsheet [6].

The transliteration of Cyrillic to ASCII according to GOST 7.79-2000
System B is, strictly speaking, a transcription (preserving phonemes),
while System A is a transliteration (preserving graphemes). There is no
meaningful way to preserve graphemes when converting Cyrillic to ASCII,
and thus System B is chosen [11]. To be clear, System A has nothing to
do with this bug, even though it is a transliteration.

Those interested in implementing System A for transliteration of
Cyrillic to Latin with diacritics as a new feature are welcome to use the
spreadsheet in [6] as a starting point.

Links:

[1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
[2] GOST 7.79-2000 official source
http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
available in low quality gif format)
[3] http://transliteration.ru/gost-7-79-2000/ and
http://www.yfermer.ru/specifications/285821.html
[4] Wikipedia article on Cyrillic transliteration with Latin alphabet
https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
[5] http://man7.org/linux/man-pages/man5/locale.5.html
[6] Spreadsheet for generating translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?bugid=2872&action=viewall&hide_obsolete=1
[8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
[9] translit-test-input.txt
https://sourceware.org/bugzilla/attachment.cgi?id=11304
[10] https://en.wikipedia.org/wiki/ISO_9#GOST_7.79_System_B
[11]
https://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=gslmka8xq3

Best regards,
Egor Kobylkin

Comments

Diego (Egor) Kobylkin Dec. 19, 2018, 11:16 p.m. UTC | #1
Freeze ping.

I'd like to ping the list on this patch and to have some discussion on
moving ASCII transliteration to locale/C-translit.h.in before the freeze.

The wiki page for 2.29 [12] is set as "immutable" for newly registered
users; I am not sure that is intended. I could not add this patch there
as "desired".
I have added the 2.29 keyword to the bug entry.

Bests,
Egor Kobylkin


[12] https://sourceware.org/glibc/wiki/Release/2.29

From b9cd550028ecf7c875c9d7250c8598433b1fc474 Mon Sep 17 00:00:00 2001
From: Egor Kobylkin <egor@kobylkin.com>
Date: Sat, 8 Dec 2018 22:08:59 +0100
Subject: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]

	[BZ #2872]
	* locale/C-translit.h.in: Add Cyrillic transliteration.
---
 locale/C-translit.h.in | 170 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 170 insertions(+)

diff --git a/locale/C-translit.h.in b/locale/C-translit.h.in
index e27f39e8fe..bd64edc609 100644
--- a/locale/C-translit.h.in
+++ b/locale/C-translit.h.in
@@ -2,6 +2,7 @@
    Copyright (C) 2000-2018 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
    Contributed by Ulrich Drepper <drepper@redhat.com>, 2000.
+   0401-04f9 contributed by Egor Kobylkin <Egor@Kobylkin.com>, 2018.
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -56,6 +57,175 @@
 "\x02cd"	"_"	/* <U02CD> MODIFIER LETTER LOW MACRON */
 "\x02d0"	":"	/* <U02D0> MODIFIER LETTER TRIANGULAR COLON */
 "\x02dc"	"~"	/* <U02DC> SMALL TILDE */
+"\x0401"	"YO"	/* <U0401> CYRILLIC CAPITAL LETTER IO */
+"\x0402"	"DJ"	/* <U0402> CYRILLIC CAPITAL LETTER DJE */
+"\x0403"	"G`"	/* <U0403> CYRILLIC CAPITAL LETTER GJE */
+"\x0404"	"YE"	/* <U0404> CYRILLIC CAPITAL LETTER UKRAINIAN IE */
+"\x0405"	"Z`"	/* <U0405> CYRILLIC CAPITAL LETTER DZE */
+"\x0406"	"I"	/* <U0406> CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I */
+"\x0407"	"YI"	/* <U0407> CYRILLIC CAPITAL LETTER YI */
+"\x0408"	"J"	/* <U0408> CYRILLIC CAPITAL LETTER JE */
+"\x0409"	"L`"	/* <U0409> CYRILLIC CAPITAL LETTER LJE */
+"\x040a"	"N`"	/* <U040A> CYRILLIC CAPITAL LETTER NJE */
+"\x040b"	"TSH"	/* <U040B> CYRILLIC CAPITAL LETTER TSHE */
+"\x040c"	"K`"	/* <U040C> CYRILLIC CAPITAL LETTER KJE */
+"\x040e"	"U`"	/* <U040E> CYRILLIC CAPITAL LETTER SHORT U */
+"\x040f"	"DH"	/* <U040F> CYRILLIC CAPITAL LETTER DZHE */
+"\x0410"	"A"	/* <U0410> CYRILLIC CAPITAL LETTER A */
+"\x0411"	"B"	/* <U0411> CYRILLIC CAPITAL LETTER BE */
+"\x0412"	"V"	/* <U0412> CYRILLIC CAPITAL LETTER VE */
+"\x0413"	"G"	/* <U0413> CYRILLIC CAPITAL LETTER GHE */
+"\x0414"	"D"	/* <U0414> CYRILLIC CAPITAL LETTER DE */
+"\x0415"	"E"	/* <U0415> CYRILLIC CAPITAL LETTER IE */
+"\x0416"	"ZH"	/* <U0416> CYRILLIC CAPITAL LETTER ZHE */
+"\x0417"	"Z"	/* <U0417> CYRILLIC CAPITAL LETTER ZE */
+"\x0418"	"I"	/* <U0418> CYRILLIC CAPITAL LETTER I */
+"\x0419"	"J"	/* <U0419> CYRILLIC CAPITAL LETTER SHORT I */
+"\x041a"	"K"	/* <U041A> CYRILLIC CAPITAL LETTER KA */
+"\x041b"	"L"	/* <U041B> CYRILLIC CAPITAL LETTER EL */
+"\x041c"	"M"	/* <U041C> CYRILLIC CAPITAL LETTER EM */
+"\x041d"	"N"	/* <U041D> CYRILLIC CAPITAL LETTER EN */
+"\x041e"	"O"	/* <U041E> CYRILLIC CAPITAL LETTER O */
+"\x041f"	"P"	/* <U041F> CYRILLIC CAPITAL LETTER PE */
+"\x0420"	"R"	/* <U0420> CYRILLIC CAPITAL LETTER ER */
+"\x0421"	"S"	/* <U0421> CYRILLIC CAPITAL LETTER ES */
+"\x0422"	"T"	/* <U0422> CYRILLIC CAPITAL LETTER TE */
+"\x0423"	"U"	/* <U0423> CYRILLIC CAPITAL LETTER U */
+"\x0424"	"F"	/* <U0424> CYRILLIC CAPITAL LETTER EF */
+"\x0425"	"X"	/* <U0425> CYRILLIC CAPITAL LETTER HA */
+"\x0426"	"CZ"	/* <U0426> CYRILLIC CAPITAL LETTER TSE */
+"\x0427"	"CH"	/* <U0427> CYRILLIC CAPITAL LETTER CHE */
+"\x0428"	"SH"	/* <U0428> CYRILLIC CAPITAL LETTER SHA */
+"\x0429"	"SHH"	/* <U0429> CYRILLIC CAPITAL LETTER SHCHA */
+"\x042a"	"``"	/* <U042A> CYRILLIC CAPITAL LETTER HARD SIGN */
+"\x042b"	"Y`"	/* <U042B> CYRILLIC CAPITAL LETTER YERU */
+"\x042c"	"`"	/* <U042C> CYRILLIC CAPITAL LETTER SOFT SIGN */
+"\x042d"	"E`"	/* <U042D> CYRILLIC CAPITAL LETTER E */
+"\x042e"	"YU"	/* <U042E> CYRILLIC CAPITAL LETTER YU */
+"\x042f"	"YA"	/* <U042F> CYRILLIC CAPITAL LETTER YA */
+"\x0430"	"a"	/* <U0430> CYRILLIC SMALL LETTER A */
+"\x0431"	"b"	/* <U0431> CYRILLIC SMALL LETTER BE */
+"\x0432"	"v"	/* <U0432> CYRILLIC SMALL LETTER VE */
+"\x0433"	"g"	/* <U0433> CYRILLIC SMALL LETTER GHE */
+"\x0434"	"d"	/* <U0434> CYRILLIC SMALL LETTER DE */
+"\x0435"	"e"	/* <U0435> CYRILLIC SMALL LETTER IE */
+"\x0436"	"zh"	/* <U0436> CYRILLIC SMALL LETTER ZHE */
+"\x0437"	"z"	/* <U0437> CYRILLIC SMALL LETTER ZE */
+"\x0438"	"i"	/* <U0438> CYRILLIC SMALL LETTER I */
+"\x0439"	"j"	/* <U0439> CYRILLIC SMALL LETTER SHORT I */
+"\x043a"	"k"	/* <U043A> CYRILLIC SMALL LETTER KA */
+"\x043b"	"l"	/* <U043B> CYRILLIC SMALL LETTER EL */
+"\x043c"	"m"	/* <U043C> CYRILLIC SMALL LETTER EM */
+"\x043d"	"n"	/* <U043D> CYRILLIC SMALL LETTER EN */
+"\x043e"	"o"	/* <U043E> CYRILLIC SMALL LETTER O */
+"\x043f"	"p"	/* <U043F> CYRILLIC SMALL LETTER PE */
+"\x0440"	"r"	/* <U0440> CYRILLIC SMALL LETTER ER */
+"\x0441"	"s"	/* <U0441> CYRILLIC SMALL LETTER ES */
+"\x0442"	"t"	/* <U0442> CYRILLIC SMALL LETTER TE */
+"\x0443"	"u"	/* <U0443> CYRILLIC SMALL LETTER U */
+"\x0444"	"f"	/* <U0444> CYRILLIC SMALL LETTER EF */
+"\x0445"	"x"	/* <U0445> CYRILLIC SMALL LETTER HA */
+"\x0446"	"cz"	/* <U0446> CYRILLIC SMALL LETTER TSE */
+"\x0447"	"ch"	/* <U0447> CYRILLIC SMALL LETTER CHE */
+"\x0448"	"sh"	/* <U0448> CYRILLIC SMALL LETTER SHA */
+"\x0449"	"shh"	/* <U0449> CYRILLIC SMALL LETTER SHCHA */
+"\x044a"	"``"	/* <U044A> CYRILLIC SMALL LETTER HARD SIGN */
+"\x044b"	"y`"	/* <U044B> CYRILLIC SMALL LETTER YERU */
+"\x044c"	"`"	/* <U044C> CYRILLIC SMALL LETTER SOFT SIGN */
+"\x044d"	"e`"	/* <U044D> CYRILLIC SMALL LETTER E */
+"\x044e"	"yu"	/* <U044E> CYRILLIC SMALL LETTER YU */
+"\x044f"	"ya"	/* <U044F> CYRILLIC SMALL LETTER YA */
+"\x0451"	"yo"	/* <U0451> CYRILLIC SMALL LETTER IO */
+"\x0452"	"dj"	/* <U0452> CYRILLIC SMALL LETTER DJE */
+"\x0453"	"g`"	/* <U0453> CYRILLIC SMALL LETTER GJE */
+"\x0454"	"ye"	/* <U0454> CYRILLIC SMALL LETTER UKRAINIAN IE */
+"\x0455"	"z`"	/* <U0455> CYRILLIC SMALL LETTER DZE */
+"\x0456"	"i"	/* <U0456> CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I */
+"\x0457"	"yi"	/* <U0457> CYRILLIC SMALL LETTER YI */
+"\x0458"	"j"	/* <U0458> CYRILLIC SMALL LETTER JE */
+"\x0459"	"l`"	/* <U0459> CYRILLIC SMALL LETTER LJE */
+"\x045a"	"n`"	/* <U045A> CYRILLIC SMALL LETTER NJE */
+"\x045b"	"tsh"	/* <U045B> CYRILLIC SMALL LETTER TSHE */
+"\x045c"	"k`"	/* <U045C> CYRILLIC SMALL LETTER KJE */
+"\x045e"	"u`"	/* <U045E> CYRILLIC SMALL LETTER SHORT U */
+"\x045f"	"dh"	/* <U045F> CYRILLIC SMALL LETTER DZHE */
+"\x046a"	"O`"	/* <U046A> CYRILLIC CAPITAL LETTER BIG YUS */
+"\x046b"	"o`"	/* <U046B> CYRILLIC SMALL LETTER BIG YUS */
+"\x0472"	"FH"	/* <U0472> CYRILLIC CAPITAL LETTER FITA */
+"\x0473"	"fh"	/* <U0473> CYRILLIC SMALL LETTER FITA */
+"\x0474"	"YH"	/* <U0474> CYRILLIC CAPITAL LETTER IZHITSA */
+"\x0475"	"yh"	/* <U0475> CYRILLIC SMALL LETTER IZHITSA */
+"\x048c"	"E`"	/* <U048C> CYRILLIC CAPITAL LETTER SEMISOFT SIGN */
+"\x048d"	"e`"	/* <U048D> CYRILLIC SMALL LETTER SEMISOFT SIGN */
+"\x0490"	"G`"	/* <U0490> CYRILLIC CAPITAL LETTER GHE WITH UPTURN */
+"\x0491"	"g`"	/* <U0491> CYRILLIC SMALL LETTER GHE WITH UPTURN */
+"\x0492"	"GH"	/* <U0492> CYRILLIC CAPITAL LETTER GHE WITH STROKE */
+"\x0493"	"gh"	/* <U0493> CYRILLIC SMALL LETTER GHE WITH STROKE */
+"\x0494"	"GH"	/* <U0494> CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK */
+"\x0495"	"gh"	/* <U0495> CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK */
+"\x0496"	"ZH`"	/* <U0496> CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER */
+"\x0497"	"zh`"	/* <U0497> CYRILLIC SMALL LETTER ZHE WITH DESCENDER */
+"\x049a"	"K`"	/* <U049A> CYRILLIC CAPITAL LETTER KA WITH DESCENDER */
+"\x049b"	"k`"	/* <U049B> CYRILLIC SMALL LETTER KA WITH DESCENDER */
+"\x049e"	"K`"	/* <U049E> CYRILLIC CAPITAL LETTER KA WITH STROKE */
+"\x049f"	"k`"	/* <U049F> CYRILLIC SMALL LETTER KA WITH STROKE */
+"\x04a2"	"N`"	/* <U04A2> CYRILLIC CAPITAL LETTER EN WITH DESCENDER */
+"\x04a3"	"n`"	/* <U04A3> CYRILLIC SMALL LETTER EN WITH DESCENDER */
+"\x04a4"	"NG"	/* <U04A4> CYRILLIC CAPITAL LIGATURE EN GHE */
+"\x04a5"	"ng"	/* <U04A5> CYRILLIC SMALL LIGATURE EN GHE */
+"\x04a6"	"P`"	/* <U04A6> CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK */
+"\x04a7"	"p`"	/* <U04A7> CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK */
+"\x04a8"	"O`"	/* <U04A8> CYRILLIC CAPITAL LETTER ABKHASIAN HA */
+"\x04a9"	"o`"	/* <U04A9> CYRILLIC SMALL LETTER ABKHASIAN HA */
+"\x04aa"	"C`"	/* <U04AA> CYRILLIC CAPITAL LETTER ES WITH DESCENDER */
+"\x04ab"	"c`"	/* <U04AB> CYRILLIC SMALL LETTER ES WITH DESCENDER */
+"\x04ac"	"T`"	/* <U04AC> CYRILLIC CAPITAL LETTER TE WITH DESCENDER */
+"\x04ad"	"t`"	/* <U04AD> CYRILLIC SMALL LETTER TE WITH DESCENDER */
+"\x04ae"	"U"	/* <U04AE> CYRILLIC CAPITAL LETTER STRAIGHT U */
+"\x04af"	"u"	/* <U04AF> CYRILLIC SMALL LETTER STRAIGHT U */
+"\x04b2"	"H`"	/* <U04B2> CYRILLIC CAPITAL LETTER HA WITH DESCENDER */
+"\x04b3"	"h`"	/* <U04B3> CYRILLIC SMALL LETTER HA WITH DESCENDER */
+"\x04b4"	"TCZ"	/* <U04B4> CYRILLIC CAPITAL LIGATURE TE TSE */
+"\x04b5"	"tcz"	/* <U04B5> CYRILLIC SMALL LIGATURE TE TSE */
+"\x04ba"	"SH`"	/* <U04BA> CYRILLIC CAPITAL LETTER SHHA */
+"\x04bb"	"sh`"	/* <U04BB> CYRILLIC SMALL LETTER SHHA */
+"\x04bc"	"CH`"	/* <U04BC> CYRILLIC CAPITAL LETTER ABKHASIAN CHE */
+"\x04bd"	"ch`"	/* <U04BD> CYRILLIC SMALL LETTER ABKHASIAN CHE */
+"\x04be"	"CH`"	/* <U04BE> CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER */
+"\x04bf"	"ch`"	/* <U04BF> CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER */
+"\x04c0"	"i"	/* <U04C0> CYRILLIC LETTER PALOCHKA */
+"\x04c1"	"ZH`"	/* <U04C1> CYRILLIC CAPITAL LETTER ZHE WITH BREVE */
+"\x04c2"	"zh`"	/* <U04C2> CYRILLIC SMALL LETTER ZHE WITH BREVE */
+"\x04cb"	"CH`"	/* <U04CB> CYRILLIC CAPITAL LETTER KHAKASSIAN CHE */
+"\x04cc"	"ch`"	/* <U04CC> CYRILLIC SMALL LETTER KHAKASSIAN CHE */
+"\x04d0"	"A`"	/* <U04D0> CYRILLIC CAPITAL LETTER A WITH BREVE */
+"\x04d1"	"a`"	/* <U04D1> CYRILLIC SMALL LETTER A WITH BREVE */
+"\x04d2"	"A`"	/* <U04D2> CYRILLIC CAPITAL LETTER A WITH DIAERESIS */
+"\x04d3"	"a`"	/* <U04D3> CYRILLIC SMALL LETTER A WITH DIAERESIS */
+"\x04d6"	"E`"	/* <U04D6> CYRILLIC CAPITAL LETTER IE WITH BREVE */
+"\x04d7"	"e`"	/* <U04D7> CYRILLIC SMALL LETTER IE WITH BREVE */
+"\x04d8"	"A`"	/* <U04D8> CYRILLIC CAPITAL LETTER SCHWA */
+"\x04d9"	"a`"	/* <U04D9> CYRILLIC SMALL LETTER SCHWA */
+"\x04dc"	"ZH`"	/* <U04DC> CYRILLIC CAPITAL LETTER ZHE WITH DIAERESIS */
+"\x04dd"	"zh`"	/* <U04DD> CYRILLIC SMALL LETTER ZHE WITH DIAERESIS */
+"\x04de"	"Z`"	/* <U04DE> CYRILLIC CAPITAL LETTER ZE WITH DIAERESIS */
+"\x04df"	"z`"	/* <U04DF> CYRILLIC SMALL LETTER ZE WITH DIAERESIS */
+"\x04e0"	"Z`"	/* <U04E0> CYRILLIC CAPITAL LETTER ABKHASIAN DZE */
+"\x04e1"	"z`"	/* <U04E1> CYRILLIC SMALL LETTER ABKHASIAN DZE */
+"\x04e4"	"I`"	/* <U04E4> CYRILLIC CAPITAL LETTER I WITH DIAERESIS */
+"\x04e5"	"i`"	/* <U04E5> CYRILLIC SMALL LETTER I WITH DIAERESIS */
+"\x04e6"	"O`"	/* <U04E6> CYRILLIC CAPITAL LETTER O WITH DIAERESIS */
+"\x04e7"	"o`"	/* <U04E7> CYRILLIC SMALL LETTER O WITH DIAERESIS */
+"\x04e8"	"O`"	/* <U04E8> CYRILLIC CAPITAL LETTER BARRED O */
+"\x04e9"	"o`"	/* <U04E9> CYRILLIC SMALL LETTER BARRED O */
+"\x04f0"	"U`"	/* <U04F0> CYRILLIC CAPITAL LETTER U WITH DIAERESIS */
+"\x04f1"	"u`"	/* <U04F1> CYRILLIC SMALL LETTER U WITH DIAERESIS */
+"\x04f2"	"U`"	/* <U04F2> CYRILLIC CAPITAL LETTER U WITH DOUBLE ACUTE */
+"\x04f3"	"u`"	/* <U04F3> CYRILLIC SMALL LETTER U WITH DOUBLE ACUTE */
+"\x04f4"	"CH`"	/* <U04F4> CYRILLIC CAPITAL LETTER CHE WITH DIAERESIS */
+"\x04f5"	"ch`"	/* <U04F5> CYRILLIC SMALL LETTER CHE WITH DIAERESIS */
+"\x04f8"	"Y`"	/* <U04F8> CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS */
+"\x04f9"	"y`"	/* <U04F9> CYRILLIC SMALL LETTER YERU WITH DIAERESIS */
 "\x2002"	" "	/* <U2002> EN SPACE */
 "\x2003"	" "	/* <U2003> EM SPACE */
 "\x2004"	" "	/* <U2004> THREE-PER-EM SPACE */
Siddhesh Poyarekar Dec. 26, 2018, 10:07 a.m. UTC | #2
On 20/12/18 4:46 AM, Egor Kobylkin wrote:
> Freeze ping.
> 
> I'd like to ping the list on this patch and to have some discussion on
> moving ASCII transliteration to locale/C-translit.h.in before the freeze.
> 
> The wiki page for 2.29 [12] is set as "immutable" for newly registered
> users, not sure it is so desired. I could not add this patch there as
> "desired".
> I have added 2.29 keyword to the bug entry.
> 
> Bests,
> Egor Kobylkin
> 
> 
> [12] https://sourceware.org/glibc/wiki/Release/2.29

cc'd Rafal since I am not equipped to review this.  Only nit I can point 
out is that you need to remove the "Contributed by" line that you added; 
we don't do that any more.  You can remove the earlier contributed by 
line too since it's no longer part of our process.

Also, if you'd like edit access to the wiki then please tell me your 
username (assuming you've created an account on the wiki, please do if 
you haven't) and I'll add you to the editor group.  It's a measure we 
added to counter the high amounts of spam we faced on the wiki.

Thanks,
Siddhesh
Diego (Egor) Kobylkin Dec. 26, 2018, 12:13 p.m. UTC | #3

Thanks, Siddhesh, yes, please could you add my username EgorKobylkin to
the editors group.

Rafal has requested help and guidance about this patch in another email
to this list [1]. I hope other members would chime in on that in time
for 2.29. I understand we need input from those involved in C locale
that is compiled into the libc binaries (as opposed to the rest of
locales that are shipped in plain text, not compiled).

@Rafal - I know you have asked to drop your email from To: as you are
getting them through your list subscription and so twice. But I guess
To: is still helpful to see who is involved. I am not subscribed to the
list myself, so I would like my email to be kept on To: or CC: for this.

Bests,
Egor

[1] https://sourceware.org/ml/libc-alpha/2018-12/msg00787.html
Siddhesh Poyarekar Dec. 27, 2018, 1:30 a.m. UTC | #4
On 26/12/18 5:43 PM, Egor Kobylkin wrote:
> Thanks, Siddhesh, yes, please could you add my username EgorKobylkin to
> the editors group.

Done.  Here's a weird statistic: you're the first user on that wiki with 
name starting with E!

> Rafal has requested help and guidance about this patch in another email
> to this list [1]. I hope other members would chime in on that in time
> for 2.29. I understand we need input from those involved in C locale
> that is compiled into the libc binaries (as opposed to the rest of
> locales that are shipped in plain text, not compiled).

Ah OK, I missed that email.  It'll have to wait for more inputs though 
because like I said, I don't have enough experience in locales to make 
an intelligent comment, definitely not for Cyrillic.

> @Rafal - I know you have asked to drop your email from To: as you are
> getting them through your list subscription and so twice. But I guess
> To: is still helpful to see who is involved. I am not subscribed to the
> list myself, so I would like my email to be kept on To: or CC: for this.

I added @Rafal because it's kinda standard practice to do that to get an 
individual's attention since otherwise an email could get lost in the 
traffic.  @Rafal, I'll remove it if you object.

Siddhesh
Rafal Luzynski Dec. 27, 2018, 11:28 a.m. UTC | #5
27.12.2018 02:30 Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> 
> On 26/12/18 5:43 PM, Egor Kobylkin wrote:
> [...]
> > Rafal has requested help and guidance about this patch in another email
> > to this list [1]. I hope other members would chime in on that in time
> > for 2.29. I understand we need input from those involved in C locale
> > that is compiled into the libc binaries (as opposed to the rest of
> > locales that are shipped in plain text, not compiled).
> 
> Ah OK, I missed that email.  It'll have to wait for more inputs though 
> because like I said, I don't have enough experience in locales to make 
> an intelligent comment, definitely not for Cyrillic.

My email is here:

https://sourceware.org/ml/libc-alpha/2018-12/msg00787.html

My questions are not specific to Cyrillic; they concern how
transliteration should be implemented in general.  You may replace
"Cyrillic" with any other script you know and ask yourself "how would I
implement transliteration from Foo Alphabet to ASCII".

I think that so far there was no transliteration common for all
locales except translit_combine which just removes the combining
diacritic characters.

Can we have any live meeting, like on IRC?  I think that we could have
more questions answered in direct conversation.  By email we can have
little more than one question and answer per day.

> > @Rafal - I know you have asked to drop your email from To: as you are
> > getting them through your list subscription and so twice. But I guess
> > To: is still helpful to see who is involved. I am not subscribed to the
> > list myself, so I would like my email to be kept on To: or CC: for this.
> 
> I added @Rafal because it's kinda standard practice to do that to get an 
> individual's attention since otherwise an email could get lost in the 
> traffic.  @Rafal, I'll remove it if you object.

I don't object here.  Previously I was complaining about large patches
which arrive in two copies and tend to exceed my email quota.  Regular
conversation does not cause much problem for me.

Regards,

Rafal

Patch

From b9cd550028ecf7c875c9d7250c8598433b1fc474 Mon Sep 17 00:00:00 2001
From: Egor Kobylkin <egor@kobylkin.com>
Date: Sat, 8 Dec 2018 22:08:59 +0100
Subject: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]

	[BZ #2872]
	* locale/C-translit.h.in: Add Cyrillic transliteration.
---
 locale/C-translit.h.in | 170 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 170 insertions(+)

diff --git a/locale/C-translit.h.in b/locale/C-translit.h.in
index e27f39e8fe..bd64edc609 100644
--- a/locale/C-translit.h.in
+++ b/locale/C-translit.h.in
@@ -2,6 +2,7 @@ 
    Copyright (C) 2000-2018 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
    Contributed by Ulrich Drepper <drepper@redhat.com>, 2000.
+   0401-04f9 contributed by Egor Kobylkin <Egor@Kobylkin.com>, 2018.
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -56,6 +57,175 @@ 
 "\x02cd"	"_"	/* <U02CD> MODIFIER LETTER LOW MACRON */
 "\x02d0"	":"	/* <U02D0> MODIFIER LETTER TRIANGULAR COLON */
 "\x02dc"	"~"	/* <U02DC> SMALL TILDE */
+"\x0401"	"YO"	/* <U0401> CYRILLIC CAPITAL LETTER IO */
+"\x0402"	"DJ"	/* <U0402> CYRILLIC CAPITAL LETTER DJE */
+"\x0403"	"G`"	/* <U0403> CYRILLIC CAPITAL LETTER GJE */
+"\x0404"	"YE"	/* <U0404> CYRILLIC CAPITAL LETTER UKRAINIAN IE */
+"\x0405"	"Z`"	/* <U0405> CYRILLIC CAPITAL LETTER DZE */
+"\x0406"	"I"	/* <U0406> CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I */
+"\x0407"	"YI"	/* <U0407> CYRILLIC CAPITAL LETTER YI */
+"\x0408"	"J"	/* <U0408> CYRILLIC CAPITAL LETTER JE */
+"\x0409"	"L`"	/* <U0409> CYRILLIC CAPITAL LETTER LJE */
+"\x040a"	"N`"	/* <U040A> CYRILLIC CAPITAL LETTER NJE */
+"\x040b"	"TSH"	/* <U040B> CYRILLIC CAPITAL LETTER TSHE */
+"\x040c"	"K`"	/* <U040C> CYRILLIC CAPITAL LETTER KJE */
+"\x040e"	"U`"	/* <U040E> CYRILLIC CAPITAL LETTER SHORT U */
+"\x040f"	"DH"	/* <U040F> CYRILLIC CAPITAL LETTER DZHE */
+"\x0410"	"A"	/* <U0410> CYRILLIC CAPITAL LETTER A */
+"\x0411"	"B"	/* <U0411> CYRILLIC CAPITAL LETTER BE */
+"\x0412"	"V"	/* <U0412> CYRILLIC CAPITAL LETTER VE */
+"\x0413"	"G"	/* <U0413> CYRILLIC CAPITAL LETTER GHE */
+"\x0414"	"D"	/* <U0414> CYRILLIC CAPITAL LETTER DE */
+"\x0415"	"E"	/* <U0415> CYRILLIC CAPITAL LETTER IE */
+"\x0416"	"ZH"	/* <U0416> CYRILLIC CAPITAL LETTER ZHE */
+"\x0417"	"Z"	/* <U0417> CYRILLIC CAPITAL LETTER ZE */
+"\x0418"	"I"	/* <U0418> CYRILLIC CAPITAL LETTER I */
+"\x0419"	"J"	/* <U0419> CYRILLIC CAPITAL LETTER SHORT I */
+"\x041a"	"K"	/* <U041A> CYRILLIC CAPITAL LETTER KA */
+"\x041b"	"L"	/* <U041B> CYRILLIC CAPITAL LETTER EL */
+"\x041c"	"M"	/* <U041C> CYRILLIC CAPITAL LETTER EM */
+"\x041d"	"N"	/* <U041D> CYRILLIC CAPITAL LETTER EN */
+"\x041e"	"O"	/* <U041E> CYRILLIC CAPITAL LETTER O */
+"\x041f"	"P"	/* <U041F> CYRILLIC CAPITAL LETTER PE */
+"\x0420"	"R"	/* <U0420> CYRILLIC CAPITAL LETTER ER */
+"\x0421"	"S"	/* <U0421> CYRILLIC CAPITAL LETTER ES */
+"\x0422"	"T"	/* <U0422> CYRILLIC CAPITAL LETTER TE */
+"\x0423"	"U"	/* <U0423> CYRILLIC CAPITAL LETTER U */
+"\x0424"	"F"	/* <U0424> CYRILLIC CAPITAL LETTER EF */
+"\x0425"	"X"	/* <U0425> CYRILLIC CAPITAL LETTER HA */
+"\x0426"	"CZ"	/* <U0426> CYRILLIC CAPITAL LETTER TSE */
+"\x0427"	"CH"	/* <U0427> CYRILLIC CAPITAL LETTER CHE */
+"\x0428"	"SH"	/* <U0428> CYRILLIC CAPITAL LETTER SHA */
+"\x0429"	"SHH"	/* <U0429> CYRILLIC CAPITAL LETTER SHCHA */
+"\x042a"	"A`"	/* <U042A> CYRILLIC CAPITAL LETTER HARD SIGN */
+"\x042b"	"Y`"	/* <U042B> CYRILLIC CAPITAL LETTER YERU */
+"\x042c"	"`"	/* <U042C> CYRILLIC CAPITAL LETTER SOFT SIGN */
+"\x042d"	"E`"	/* <U042D> CYRILLIC CAPITAL LETTER E */
+"\x042e"	"YU"	/* <U042E> CYRILLIC CAPITAL LETTER YU */
+"\x042f"	"YA"	/* <U042F> CYRILLIC CAPITAL LETTER YA */
+"\x0430"	"a"	/* <U0430> CYRILLIC SMALL LETTER A */
+"\x0431"	"b"	/* <U0431> CYRILLIC SMALL LETTER BE */
+"\x0432"	"v"	/* <U0432> CYRILLIC SMALL LETTER VE */
+"\x0433"	"g"	/* <U0433> CYRILLIC SMALL LETTER GHE */
+"\x0434"	"d"	/* <U0434> CYRILLIC SMALL LETTER DE */
+"\x0435"	"e"	/* <U0435> CYRILLIC SMALL LETTER IE */
+"\x0436"	"zh"	/* <U0436> CYRILLIC SMALL LETTER ZHE */
+"\x0437"	"z"	/* <U0437> CYRILLIC SMALL LETTER ZE */
+"\x0438"	"i"	/* <U0438> CYRILLIC SMALL LETTER I */
+"\x0439"	"j"	/* <U0439> CYRILLIC SMALL LETTER SHORT I */
+"\x043a"	"k"	/* <U043A> CYRILLIC SMALL LETTER KA */
+"\x043b"	"l"	/* <U043B> CYRILLIC SMALL LETTER EL */
+"\x043c"	"m"	/* <U043C> CYRILLIC SMALL LETTER EM */
+"\x043d"	"n"	/* <U043D> CYRILLIC SMALL LETTER EN */
+"\x043e"	"o"	/* <U043E> CYRILLIC SMALL LETTER O */
+"\x043f"	"p"	/* <U043F> CYRILLIC SMALL LETTER PE */
+"\x0440"	"r"	/* <U0440> CYRILLIC SMALL LETTER ER */
+"\x0441"	"s"	/* <U0441> CYRILLIC SMALL LETTER ES */
+"\x0442"	"t"	/* <U0442> CYRILLIC SMALL LETTER TE */
+"\x0443"	"u"	/* <U0443> CYRILLIC SMALL LETTER U */
+"\x0444"	"f"	/* <U0444> CYRILLIC SMALL LETTER EF */
+"\x0445"	"x"	/* <U0445> CYRILLIC SMALL LETTER HA */
+"\x0446"	"cz"	/* <U0446> CYRILLIC SMALL LETTER TSE */
+"\x0447"	"ch"	/* <U0447> CYRILLIC SMALL LETTER CHE */
+"\x0448"	"sh"	/* <U0448> CYRILLIC SMALL LETTER SHA */
+"\x0449"	"shh"	/* <U0449> CYRILLIC SMALL LETTER SHCHA */
+"\x044a"	"``"	/* <U044A> CYRILLIC SMALL LETTER HARD SIGN */
+"\x044b"	"y`"	/* <U044B> CYRILLIC SMALL LETTER YERU */
+"\x044c"	"`"	/* <U044C> CYRILLIC SMALL LETTER SOFT SIGN */
+"\x044d"	"e`"	/* <U044D> CYRILLIC SMALL LETTER E */
+"\x044e"	"yu"	/* <U044E> CYRILLIC SMALL LETTER YU */
+"\x044f"	"ya"	/* <U044F> CYRILLIC SMALL LETTER YA */
+"\x0451"	"yo"	/* <U0451> CYRILLIC SMALL LETTER IO */
+"\x0452"	"dj"	/* <U0452> CYRILLIC SMALL LETTER DJE */
+"\x0453"	"g`"	/* <U0453> CYRILLIC SMALL LETTER GJE */
+"\x0454"	"ye"	/* <U0454> CYRILLIC SMALL LETTER UKRAINIAN IE */
+"\x0455"	"z`"	/* <U0455> CYRILLIC SMALL LETTER DZE */
+"\x0456"	"i"	/* <U0456> CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I */
+"\x0457"	"yi"	/* <U0457> CYRILLIC SMALL LETTER YI */
+"\x0458"	"j"	/* <U0458> CYRILLIC SMALL LETTER JE */
+"\x0459"	"l`"	/* <U0459> CYRILLIC SMALL LETTER LJE */
+"\x045a"	"n`"	/* <U045A> CYRILLIC SMALL LETTER NJE */
+"\x045b"	"tsh"	/* <U045B> CYRILLIC SMALL LETTER TSHE */
+"\x045c"	"k`"	/* <U045C> CYRILLIC SMALL LETTER KJE */
+"\x045e"	"u`"	/* <U045E> CYRILLIC SMALL LETTER SHORT U */
+"\x045f"	"dh"	/* <U045F> CYRILLIC SMALL LETTER DZHE */
+"\x046a"	"O`"	/* <U046A> CYRILLIC CAPITAL LETTER BIG YUS */
+"\x046b"	"o`"	/* <U046B> CYRILLIC SMALL LETTER BIG YUS */
+"\x0472"	"FH"	/* <U0472> CYRILLIC CAPITAL LETTER FITA */
+"\x0473"	"fh"	/* <U0473> CYRILLIC SMALL LETTER FITA */
+"\x0474"	"YH"	/* <U0474> CYRILLIC CAPITAL LETTER IZHITSA */
+"\x0475"	"yh"	/* <U0475> CYRILLIC SMALL LETTER IZHITSA */
+"\x048c"	"E`"	/* <U048C> CYRILLIC CAPITAL LETTER SEMISOFT SIGN */
+"\x048d"	"e`"	/* <U048D> CYRILLIC SMALL LETTER SEMISOFT SIGN */
+"\x0490"	"G`"	/* <U0490> CYRILLIC CAPITAL LETTER GHE WITH UPTURN */
+"\x0491"	"g`"	/* <U0491> CYRILLIC SMALL LETTER GHE WITH UPTURN */
+"\x0492"	"GH"	/* <U0492> CYRILLIC CAPITAL LETTER GHE WITH STROKE */
+"\x0493"	"gh"	/* <U0493> CYRILLIC SMALL LETTER GHE WITH STROKE */
+"\x0494"	"GH"	/* <U0494> CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK */
+"\x0495"	"gh"	/* <U0495> CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK */
+"\x0496"	"ZH`"	/* <U0496> CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER */
+"\x0497"	"zh`"	/* <U0497> CYRILLIC SMALL LETTER ZHE WITH DESCENDER */
+"\x049a"	"K`"	/* <U049A> CYRILLIC CAPITAL LETTER KA WITH DESCENDER */
+"\x049b"	"k`"	/* <U049B> CYRILLIC SMALL LETTER KA WITH DESCENDER */
+"\x049e"	"K`"	/* <U049E> CYRILLIC CAPITAL LETTER KA WITH STROKE */
+"\x049f"	"k`"	/* <U049F> CYRILLIC SMALL LETTER KA WITH STROKE */
+"\x04a2"	"N`"	/* <U04A2> CYRILLIC CAPITAL LETTER EN WITH DESCENDER */
+"\x04a3"	"n`"	/* <U04A3> CYRILLIC SMALL LETTER EN WITH DESCENDER */
+"\x04a4"	"NG"	/* <U04A4> CYRILLIC CAPITAL LIGATURE EN GHE */
+"\x04a5"	"ng"	/* <U04A5> CYRILLIC SMALL LIGATURE EN GHE */
+"\x04a6"	"P`"	/* <U04A6> CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK */
+"\x04a7"	"p`"	/* <U04A7> CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK */
+"\x04a8"	"O`"	/* <U04A8> CYRILLIC CAPITAL LETTER ABKHASIAN HA */
+"\x04a9"	"o`"	/* <U04A9> CYRILLIC SMALL LETTER ABKHASIAN HA */
+"\x04aa"	"C`"	/* <U04AA> CYRILLIC CAPITAL LETTER ES WITH DESCENDER */
+"\x04ab"	"c`"	/* <U04AB> CYRILLIC SMALL LETTER ES WITH DESCENDER */
+"\x04ac"	"T`"	/* <U04AC> CYRILLIC CAPITAL LETTER TE WITH DESCENDER */
+"\x04ad"	"t`"	/* <U04AD> CYRILLIC SMALL LETTER TE WITH DESCENDER */
+"\x04ae"	"U"	/* <U04AE> CYRILLIC CAPITAL LETTER STRAIGHT U */
+"\x04af"	"u"	/* <U04AF> CYRILLIC SMALL LETTER STRAIGHT U */
+"\x04b2"	"H`"	/* <U04B2> CYRILLIC CAPITAL LETTER HA WITH DESCENDER */
+"\x04b3"	"h`"	/* <U04B3> CYRILLIC SMALL LETTER HA WITH DESCENDER */
+"\x04b4"	"TCZ"	/* <U04B4> CYRILLIC CAPITAL LIGATURE TE TSE */
+"\x04b5"	"tcz"	/* <U04B5> CYRILLIC SMALL LIGATURE TE TSE */
+"\x04ba"	"SH`"	/* <U04BA> CYRILLIC CAPITAL LETTER SHHA */
+"\x04bb"	"sh`"	/* <U04BB> CYRILLIC SMALL LETTER SHHA */
+"\x04bc"	"CH`"	/* <U04BC> CYRILLIC CAPITAL LETTER ABKHASIAN CHE */
+"\x04bd"	"ch`"	/* <U04BD> CYRILLIC SMALL LETTER ABKHASIAN CHE */
+"\x04be"	"CH`"	/* <U04BE> CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER */
+"\x04bf"	"ch`"	/* <U04BF> CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER */
+"\x04c0"	"i"	/* <U04C0> CYRILLIC LETTER PALOCHKA */
+"\x04c1"	"ZH`"	/* <U04C1> CYRILLIC CAPITAL LETTER ZHE WITH BREVE */
+"\x04c2"	"zh`"	/* <U04C2> CYRILLIC SMALL LETTER ZHE WITH BREVE */
+"\x04cb"	"CH`"	/* <U04CB> CYRILLIC CAPITAL LETTER KHAKASSIAN CHE */
+"\x04cc"	"ch`"	/* <U04CC> CYRILLIC SMALL LETTER KHAKASSIAN CHE */
+"\x04d0"	"A`"	/* <U04D0> CYRILLIC CAPITAL LETTER A WITH BREVE */
+"\x04d1"	"a`"	/* <U04D1> CYRILLIC SMALL LETTER A WITH BREVE */
+"\x04d2"	"A`"	/* <U04D2> CYRILLIC CAPITAL LETTER A WITH DIAERESIS */
+"\x04d3"	"a`"	/* <U04D3> CYRILLIC SMALL LETTER A WITH DIAERESIS */
+"\x04d6"	"E`"	/* <U04D6> CYRILLIC CAPITAL LETTER IE WITH BREVE */
+"\x04d7"	"e`"	/* <U04D7> CYRILLIC SMALL LETTER IE WITH BREVE */
+"\x04d8"	"A`"	/* <U04D8> CYRILLIC CAPITAL LETTER SCHWA */
+"\x04d9"	"a`"	/* <U04D9> CYRILLIC SMALL LETTER SCHWA */
+"\x04dc"	"ZH`"	/* <U04DC> CYRILLIC CAPITAL LETTER ZHE WITH DIAERESIS */
+"\x04dd"	"zh`"	/* <U04DD> CYRILLIC SMALL LETTER ZHE WITH DIAERESIS */
+"\x04de"	"Z`"	/* <U04DE> CYRILLIC CAPITAL LETTER ZE WITH DIAERESIS */
+"\x04df"	"z`"	/* <U04DF> CYRILLIC SMALL LETTER ZE WITH DIAERESIS */
+"\x04e0"	"Z`"	/* <U04E0> CYRILLIC CAPITAL LETTER ABKHASIAN DZE */
+"\x04e1"	"z`"	/* <U04E1> CYRILLIC SMALL LETTER ABKHASIAN DZE */
+"\x04e4"	"I`"	/* <U04E4> CYRILLIC CAPITAL LETTER I WITH DIAERESIS */
+"\x04e5"	"i`"	/* <U04E5> CYRILLIC SMALL LETTER I WITH DIAERESIS */
+"\x04e6"	"O`"	/* <U04E6> CYRILLIC CAPITAL LETTER O WITH DIAERESIS */
+"\x04e7"	"o`"	/* <U04E7> CYRILLIC SMALL LETTER O WITH DIAERESIS */
+"\x04e8"	"O`"	/* <U04E8> CYRILLIC CAPITAL LETTER BARRED O */
+"\x04e9"	"o`"	/* <U04E9> CYRILLIC SMALL LETTER BARRED O */
+"\x04f0"	"U`"	/* <U04F0> CYRILLIC CAPITAL LETTER U WITH DIAERESIS */
+"\x04f1"	"u`"	/* <U04F1> CYRILLIC SMALL LETTER U WITH DIAERESIS */
+"\x04f2"	"U`"	/* <U04F2> CYRILLIC CAPITAL LETTER U WITH DOUBLE ACUTE */
+"\x04f3"	"u`"	/* <U04F3> CYRILLIC SMALL LETTER U WITH DOUBLE ACUTE */
+"\x04f4"	"CH`"	/* <U04F4> CYRILLIC CAPITAL LETTER CHE WITH DIAERESIS */
+"\x04f5"	"ch`"	/* <U04F5> CYRILLIC SMALL LETTER CHE WITH DIAERESIS */
+"\x04f8"	"Y`"	/* <U04F8> CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS */
+"\x04f9"	"y`"	/* <U04F9> CYRILLIC SMALL LETTER YERU WITH DIAERESIS */
 "\x2002"	" "	/* <U2002> EN SPACE */
 "\x2003"	" "	/* <U2003> EM SPACE */
 "\x2004"	" "	/* <U2004> THREE-PER-EM SPACE */
-- 
2.17.1
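
For readers unfamiliar with how a table like the one in this patch is consumed, here is a minimal sketch of a code-point-to-ASCII lookup. It is not glibc's implementation: glibc generates its own data structures from locale/C-translit.h.in at build time. The names `translit_entry`, `cyrillic_sample` and `translit_to_ascii` are hypothetical, the table holds only five rows copied from the patch, and the '?' fallback for unmapped code points is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one table row.  */
struct translit_entry
{
  uint32_t from;   /* Unicode code point.  */
  const char *to;  /* ASCII replacement string.  */
};

/* A few rows taken from the patch.  Must stay sorted by code point,
   because the lookup below uses bsearch.  */
static const struct translit_entry cyrillic_sample[] =
{
  { 0x0416, "ZH" },   /* CYRILLIC CAPITAL LETTER ZHE */
  { 0x0438, "i" },    /* CYRILLIC SMALL LETTER I */
  { 0x043a, "k" },    /* CYRILLIC SMALL LETTER KA */
  { 0x0443, "u" },    /* CYRILLIC SMALL LETTER U */
  { 0x0449, "shh" },  /* CYRILLIC SMALL LETTER SHCHA */
};

/* bsearch comparator: KEY points to a code point, ELEM to a table row.  */
static int
compare_entry (const void *key, const void *elem)
{
  uint32_t cp = *(const uint32_t *) key;
  const struct translit_entry *e = elem;
  return (cp > e->from) - (cp < e->from);
}

/* Transliterate N code points from IN into OUT, which has room for
   OUTSIZE bytes including the terminating NUL.  ASCII code points pass
   through unchanged; code points missing from the sample table become
   '?' (an assumption of this sketch).  Returns the number of bytes
   written, not counting the NUL, or (size_t) -1 if OUT is too small.  */
size_t
translit_to_ascii (const uint32_t *in, size_t n, char *out, size_t outsize)
{
  assert (in != NULL && out != NULL);
  if (outsize == 0)
    return (size_t) -1;

  size_t used = 0;
  for (size_t i = 0; i < n; i++)
    {
      char ascii[2] = { 0, 0 };
      const char *repl = "?";

      if (in[i] < 0x80)
        {
          ascii[0] = (char) in[i];
          repl = ascii;
        }
      else
        {
          const struct translit_entry *e
            = bsearch (&in[i], cyrillic_sample,
                       sizeof cyrillic_sample / sizeof cyrillic_sample[0],
                       sizeof cyrillic_sample[0], compare_entry);
          if (e != NULL)
            repl = e->to;
        }

      size_t len = strlen (repl);
      /* Always keep one byte free for the terminating NUL.  */
      if (used + len + 1 > outsize)
        return (size_t) -1;
      memcpy (out + used, repl, len);
      used += len;
    }

  out[used] = '\0';
  return used;
}
```

With this sketch, passing the three code points 0x0416, 0x0443, 0x043a and a 16-byte buffer yields the string "ZHuk". One input character can expand to several output bytes, which is why the function checks the remaining space on every step rather than assuming a one-to-one mapping.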