[COMMITTED] locale/C-translit.h.in: Cyrillic -> ASCII transliteration [BZ #2872]

Message ID 904613472.72769.1563652907282@poczta.nazwa.pl
State New

Commit Message

Rafal Luzynski July 20, 2019, 8:01 p.m. UTC
For the record, this is the patch I have just pushed to master.
The content is exactly the same as Egor's v12 patch; the only minor changes
are the reworded commit message and the added ChangeLog entry.

I am not closing the bug in Bugzilla yet because there may be a few
minor updates (e.g., should we add a NEWS entry?  I now lean toward
saying no.)

--- 8< ---

From: Egor Kobylkin <egor@kobylkin.com>
Date: Wed, 2 Jan 2019 05:50:13 +0100
Subject: [PATCH] locale/C-translit.h.in: Cyrillic -> ASCII transliteration
[BZ #2872]

This patch adds Cyrillic to plain ASCII transliteration table according
to GOST 7.79-2000 System B standard to the C locale.

	[BZ #2872]
	* locale/C-translit.h.in: Add Cyrillic transliteration.
---
 ChangeLog              |   5 ++
 locale/C-translit.h.in | 169 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 174 insertions(+)

 "\x2004"	" "	# <U2004> THREE-PER-EM SPACE

Comments

Rafal Luzynski July 22, 2019, 8:53 p.m. UTC | #1
Egor,

Here are my doubts and questions about the patch which I have
committed.  If they are resolved before the final release,
it will be fine.  If not - fine as well.

Sorry if they were discussed and answered before; I am losing
track of these discussions.


20.07.2019 22:01 Rafal Luzynski <digitalfreak@lingonborough.com> wrote:
>  [...]
>  	* sysdeps/unix/sysv/linux/syscall-names.list: Add system calls
> diff --git a/locale/C-translit.h.in b/locale/C-translit.h.in
> index d5f00df0f3..758171c394 100644
> --- a/locale/C-translit.h.in
> +++ b/locale/C-translit.h.in
> @@ -56,6 +56,175 @@
>  "\x02cd"	"_"	# <U02CD> MODIFIER LETTER LOW MACRON
>  "\x02d0"	":"	# <U02D0> MODIFIER LETTER TRIANGULAR COLON
>  "\x02dc"	"~"	# <U02DC> SMALL TILDE

There are gaps.  For example, here
<U0400> CYRILLIC CAPITAL LETTER IE WITH GRAVE (Ѐ)
is missing.  Should we add it and transliterate as, e.g., "E`"?

> +"\x0401"	"YO"	# <U0401> CYRILLIC CAPITAL LETTER IO
> +"\x0402"	"DJ"	# <U0402> CYRILLIC CAPITAL LETTER DJE
> +"\x0403"	"G`"	# <U0403> CYRILLIC CAPITAL LETTER GJE
> +"\x0404"	"YE"	# <U0404> CYRILLIC CAPITAL LETTER UKRAINIAN IE
> +"\x0405"	"Z`"	# <U0405> CYRILLIC CAPITAL LETTER DZE
> +"\x0406"	"I"	# <U0406> CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
> +"\x0407"	"YI"	# <U0407> CYRILLIC CAPITAL LETTER YI
> +"\x0408"	"J"	# <U0408> CYRILLIC CAPITAL LETTER JE
> +"\x0409"	"L`"	# <U0409> CYRILLIC CAPITAL LETTER LJE
> +"\x040a"	"N`"	# <U040A> CYRILLIC CAPITAL LETTER NJE

Isn't this ambiguous if we transliterate:

"Љ" -> "L`"
"Њ" -> "N`"

but also:

"Ль" -> "L`"
"Нь" -> "N`"

?
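
The collision asked about above can be reproduced with a small sketch. It copies only the five mappings involved from the patch and applies them character by character; the `translit` helper is hypothetical and does not model how glibc's iconv actually performs the lookup.

```python
# Excerpt of the table: targets copied from the patch, nothing else assumed.
TABLE = {
    "\u0409": "L`",  # CYRILLIC CAPITAL LETTER LJE
    "\u040a": "N`",  # CYRILLIC CAPITAL LETTER NJE
    "\u041b": "L",   # CYRILLIC CAPITAL LETTER EL
    "\u041d": "N",   # CYRILLIC CAPITAL LETTER EN
    "\u044c": "`",   # CYRILLIC SMALL LETTER SOFT SIGN
}


def translit(text):
    """Hypothetical helper: replace each character by its table entry,
    passing characters without an entry through unchanged."""
    return "".join(TABLE.get(ch, ch) for ch in text)


lje = translit("\u0409")            # single letter LJE
el_soft = translit("\u041b\u044c")  # EL followed by SOFT SIGN
print(lje, el_soft, lje == el_soft)
```

Both inputs yield the same ASCII string, so the original spelling cannot be recovered from the output alone.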

> +"\x040b"	"TSH"	# <U040B> CYRILLIC CAPITAL LETTER TSHE
> +"\x040c"	"K`"	# <U040C> CYRILLIC CAPITAL LETTER KJE
> +"\x040e"	"U`"	# <U040E> CYRILLIC CAPITAL LETTER SHORT U

<U040D> CYRILLIC CAPITAL LETTER I WITH GRAVE (Ѝ)
is missing here.  Shouldn't we add it?  "I`" maybe?
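
A minimal sketch for listing such gaps mechanically, assuming the set of mapped code points is taken from the quoted hunk (U+0401 to U+040C, U+040E, U+040F). The `gaps` function is an illustrative helper, not part of glibc.

```python
import unicodedata

# Code points in U+0400..U+040F that the quoted hunk provides a mapping for.
MAPPED = set(range(0x0401, 0x040D)) | {0x040E, 0x040F}


def gaps(mapped, first, last):
    """Return the code points in [first, last] that have no table entry."""
    return [cp for cp in range(first, last + 1) if cp not in mapped]


for cp in gaps(MAPPED, 0x0400, 0x040F):
    # unicodedata.name() supplies the official character name for the report.
    print(f"U+{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}")
```

For this range the sketch reports U+0400 and U+040D, the two letters with grave mentioned so far.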

> +"\x040f"	"DH"	# <U040F> CYRILLIC CAPITAL LETTER DZHE
> +"\x0410"	"A"	# <U0410> CYRILLIC CAPITAL LETTER A
> +"\x0411"	"B"	# <U0411> CYRILLIC CAPITAL LETTER BE
> [...]

> [...]
> +"\x042a"	"A`"	# <U042A> CYRILLIC CAPITAL LETTER HARD SIGN
> [...]
> +"\x044a"	"``"	# <U044A> CYRILLIC SMALL LETTER HARD SIGN
> [...]

This is slightly reordered to illustrate my question.  Isn't it a problem
that the uppercase hard sign is transliterated to "A`" while the lowercase
one is transliterated to "``"?  My doubt is that the transliterated graphemes
are not each other's upper/lower case variants.  If you look at the soft
sign:

> [...]
> +"\x042c"	"`"	# <U042C> CYRILLIC CAPITAL LETTER SOFT SIGN
> [...]
> +"\x044c"	"`"	# <U044C> CYRILLIC SMALL LETTER SOFT SIGN
> [...]

they don't have this problem.
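
The upper/lower case concern can be checked mechanically. The sketch below is only an illustration: it takes a few entries from the patch, pairs each capital letter with its lowercase form via Python's `str.lower()`, and reports pairs whose transliterations are not case variants of each other. `case_mismatches` is a hypothetical helper name.

```python
# Entries copied from the patch; the ES WITH DESCENDER pair is included
# because it shows the same kind of mismatch.
TABLE = {
    "\u042a": "A`",  # CYRILLIC CAPITAL LETTER HARD SIGN
    "\u044a": "``",  # CYRILLIC SMALL LETTER HARD SIGN
    "\u042c": "`",   # CYRILLIC CAPITAL LETTER SOFT SIGN
    "\u044c": "`",   # CYRILLIC SMALL LETTER SOFT SIGN
    "\u042d": "E`",  # CYRILLIC CAPITAL LETTER E
    "\u044d": "e`",  # CYRILLIC SMALL LETTER E
    "\u04aa": "C`",  # CYRILLIC CAPITAL LETTER ES WITH DESCENDER
    "\u04ab": "C`",  # CYRILLIC SMALL LETTER ES WITH DESCENDER
}


def case_mismatches(table):
    """Return (capital, capital_target, small_target) for every capital
    letter whose lowercased target differs from the small letter's target."""
    found = []
    for src, dst in sorted(table.items()):
        lower = src.lower()
        if lower != src and lower in table and table[lower] != dst.lower():
            found.append((src, dst, table[lower]))
    return found


for capital, upper_target, lower_target in case_mismatches(TABLE):
    print(f"U+{ord(capital):04X}: {upper_target!r} vs {lower_target!r}")
```

On this excerpt the check flags the hard sign pair and the ES WITH DESCENDER pair, while the soft sign and E pairs pass.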

> [...]
> +"\x042d"	"E`"	# <U042D> CYRILLIC CAPITAL LETTER E
> [...]
> +"\x044d"	"e`"	# <U044D> CYRILLIC SMALL LETTER E
> [...]
> +"\x048c"	"E`"	# <U048C> CYRILLIC CAPITAL LETTER SEMISOFT SIGN
> +"\x048d"	"e`"	# <U048D> CYRILLIC SMALL LETTER SEMISOFT SIGN
> [...]

Isn't this again an ambiguity problem?

> +"\x045c"	"k`"	# <U045C> CYRILLIC SMALL LETTER KJE
> +"\x045e"	"u`"	# <U045E> CYRILLIC SMALL LETTER SHORT U
> +"\x045f"	"dh"	# <U045F> CYRILLIC SMALL LETTER DZHE

There is a gap here which is not critical, because this range holds some
archaic letters which are hardly used and for which it is probably difficult
to find correct transliterations.  But somehow you have managed to
find a transliteration for this:

> +"\x046a"	"O`"	# <U046A> CYRILLIC CAPITAL LETTER BIG YUS
> +"\x046b"	"o`"	# <U046B> CYRILLIC SMALL LETTER BIG YUS

Similarly, is it possible to find and provide transliterations for:

- little yus (Ѧ/ѧ)?
- iotified big yus (Ѭ/ѭ) and little yus (Ѩ/ѩ)?

While at it, the transliteration of big yus ("O`"/"o`")
is again ambiguous because it is the same as that of Abkhasian Ha (Ҩ),
O with diaeresis (Ӧ), and barred O (Ө).

> [...]
> +"\x049a"	"K`"	# <U049A> CYRILLIC CAPITAL LETTER KA WITH DESCENDER
> +"\x049b"	"k`"	# <U049B> CYRILLIC SMALL LETTER KA WITH DESCENDER
> +"\x049e"	"K`"	# <U049E> CYRILLIC CAPITAL LETTER KA WITH STROKE
> +"\x049f"	"k`"	# <U049F> CYRILLIC SMALL LETTER KA WITH STROKE
> +"\x04a2"	"N`"	# <U04A2> CYRILLIC CAPITAL LETTER EN WITH DESCENDER
> +"\x04a3"	"n`"	# <U04A3> CYRILLIC SMALL LETTER EN WITH DESCENDER
> [...]

As you can see, there are many more ambiguities.  But while we are here,
wouldn't "K," be a better transliteration for Ka with descender (Қ), and
"N," for En with descender (Ң)?

> [...]
> +"\x04a8"	"O`"	# <U04A8> CYRILLIC CAPITAL LETTER ABKHASIAN HA
> +"\x04a9"	"o`"	# <U04A9> CYRILLIC SMALL LETTER ABKHASIAN HA

Is Abkhasian Ha (Ҩ) pronounced like "H"?  Then why is it transliterated
as "O" (with some additional punctuation character) instead of "H"?

There are more doubts about ambiguous transliterations and gaps which
I don't list here for the sake of brevity.  They can be easily found.
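
One way to find the remaining ambiguities is to group table lines by their ASCII target. The sketch below assumes the line shape shown in the patch (a quoted `\x` code point, a quoted target, then a `#` comment) and runs on a small sample copied from it; the regular expression and the `collisions` helper are illustrative and are not claimed to cover every entry format in the real file.

```python
import re
from collections import defaultdict

# A few lines in the format used by the patch (sample only, not the full file).
SAMPLE = r'''
"\x040c"	"K`"	# <U040C> CYRILLIC CAPITAL LETTER KJE
"\x0410"	"A"	# <U0410> CYRILLIC CAPITAL LETTER A
"\x046a"	"O`"	# <U046A> CYRILLIC CAPITAL LETTER BIG YUS
"\x049a"	"K`"	# <U049A> CYRILLIC CAPITAL LETTER KA WITH DESCENDER
"\x049e"	"K`"	# <U049E> CYRILLIC CAPITAL LETTER KA WITH STROKE
"\x04a8"	"O`"	# <U04A8> CYRILLIC CAPITAL LETTER ABKHASIAN HA
"\x04e6"	"O`"	# <U04E6> CYRILLIC CAPITAL LETTER O WITH DIAERESIS
"\x04e8"	"O`"	# <U04E8> CYRILLIC CAPITAL LETTER BARRED O
'''

# Captures the hex code point and the quoted ASCII target of one entry.
ENTRY = re.compile(r'"\\x([0-9a-fA-F]+)"\s+"([^"]*)"\s+#')


def collisions(text):
    """Map each target shared by more than one source to its code points."""
    by_target = defaultdict(list)
    for line in text.splitlines():
        match = ENTRY.search(line)
        if match:
            by_target[match.group(2)].append(int(match.group(1), 16))
    return {t: cps for t, cps in by_target.items() if len(cps) > 1}


for target, cps in collisions(SAMPLE).items():
    print(target, "<-", " ".join(f"U+{cp:04X}" for cp in cps))
```

Because `ENTRY.search` ignores any leading `+`, the same function could be pointed at the added lines of the diff, under the same format assumption.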

Regards,

Rafal
Diego (Egor) Kobylkin July 23, 2019, 1:12 a.m. UTC | #2
Rafal, 


Let's revisit this once there is more input after the release.

The letters that are in GOST 7.79 System B are already transliterated as we agreed. This is the only standard we considered for Cyrillic-to-ASCII transliteration.

The remaining letters you mention seem to be rare or outdated, and some are irregular in their respective languages. As you point out, we should generally aim for consistency, of course. Once the patch is in, maybe we will hear from the people directly concerned and can integrate their input as well.


Hope this helps,
Egor


Rafal Luzynski July 23, 2019, 7:02 p.m. UTC | #3
23.07.2019 03:12 "Diego (Egor) Kobylkin" <egor@kobylkin.com> wrote:
> 
> Rafal, 
> 
> let's revisit on more input after the release. 
> [...]

OK, so I read this as "no more input before the release"
and I am closing the bug report as FIXED.  Again, thank you Egor
and everyone else for your work.

Regards,

Rafal

Patch

diff --git a/ChangeLog b/ChangeLog
index a606c5fd60..a1fdef9cff 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@ 
+2019-07-20  Egor Kobylkin  <egor@kobylkin.com>
+
+	[BZ #2872]
+	* locale/C-translit.h.in: Add Cyrillic transliteration.
+
 2019-07-19  Florian Weimer  <fweimer@redhat.com>
 
 	* sysdeps/unix/sysv/linux/syscall-names.list: Add system calls
diff --git a/locale/C-translit.h.in b/locale/C-translit.h.in
index d5f00df0f3..758171c394 100644
--- a/locale/C-translit.h.in
+++ b/locale/C-translit.h.in
@@ -56,6 +56,175 @@ 
 "\x02cd"	"_"	# <U02CD> MODIFIER LETTER LOW MACRON
 "\x02d0"	":"	# <U02D0> MODIFIER LETTER TRIANGULAR COLON
 "\x02dc"	"~"	# <U02DC> SMALL TILDE
+"\x0401"	"YO"	# <U0401> CYRILLIC CAPITAL LETTER IO
+"\x0402"	"DJ"	# <U0402> CYRILLIC CAPITAL LETTER DJE
+"\x0403"	"G`"	# <U0403> CYRILLIC CAPITAL LETTER GJE
+"\x0404"	"YE"	# <U0404> CYRILLIC CAPITAL LETTER UKRAINIAN IE
+"\x0405"	"Z`"	# <U0405> CYRILLIC CAPITAL LETTER DZE
+"\x0406"	"I"	# <U0406> CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
+"\x0407"	"YI"	# <U0407> CYRILLIC CAPITAL LETTER YI
+"\x0408"	"J"	# <U0408> CYRILLIC CAPITAL LETTER JE
+"\x0409"	"L`"	# <U0409> CYRILLIC CAPITAL LETTER LJE
+"\x040a"	"N`"	# <U040A> CYRILLIC CAPITAL LETTER NJE
+"\x040b"	"TSH"	# <U040B> CYRILLIC CAPITAL LETTER TSHE
+"\x040c"	"K`"	# <U040C> CYRILLIC CAPITAL LETTER KJE
+"\x040e"	"U`"	# <U040E> CYRILLIC CAPITAL LETTER SHORT U
+"\x040f"	"DH"	# <U040F> CYRILLIC CAPITAL LETTER DZHE
+"\x0410"	"A"	# <U0410> CYRILLIC CAPITAL LETTER A
+"\x0411"	"B"	# <U0411> CYRILLIC CAPITAL LETTER BE
+"\x0412"	"V"	# <U0412> CYRILLIC CAPITAL LETTER VE
+"\x0413"	"G"	# <U0413> CYRILLIC CAPITAL LETTER GHE
+"\x0414"	"D"	# <U0414> CYRILLIC CAPITAL LETTER DE
+"\x0415"	"E"	# <U0415> CYRILLIC CAPITAL LETTER IE
+"\x0416"	"ZH"	# <U0416> CYRILLIC CAPITAL LETTER ZHE
+"\x0417"	"Z"	# <U0417> CYRILLIC CAPITAL LETTER ZE
+"\x0418"	"I"	# <U0418> CYRILLIC CAPITAL LETTER I
+"\x0419"	"J"	# <U0419> CYRILLIC CAPITAL LETTER SHORT I
+"\x041a"	"K"	# <U041A> CYRILLIC CAPITAL LETTER KA
+"\x041b"	"L"	# <U041B> CYRILLIC CAPITAL LETTER EL
+"\x041c"	"M"	# <U041C> CYRILLIC CAPITAL LETTER EM
+"\x041d"	"N"	# <U041D> CYRILLIC CAPITAL LETTER EN
+"\x041e"	"O"	# <U041E> CYRILLIC CAPITAL LETTER O
+"\x041f"	"P"	# <U041F> CYRILLIC CAPITAL LETTER PE
+"\x0420"	"R"	# <U0420> CYRILLIC CAPITAL LETTER ER
+"\x0421"	"S"	# <U0421> CYRILLIC CAPITAL LETTER ES
+"\x0422"	"T"	# <U0422> CYRILLIC CAPITAL LETTER TE
+"\x0423"	"U"	# <U0423> CYRILLIC CAPITAL LETTER U
+"\x0424"	"F"	# <U0424> CYRILLIC CAPITAL LETTER EF
+"\x0425"	"X"	# <U0425> CYRILLIC CAPITAL LETTER HA
+"\x0426"	"CZ"	# <U0426> CYRILLIC CAPITAL LETTER TSE
+"\x0427"	"CH"	# <U0427> CYRILLIC CAPITAL LETTER CHE
+"\x0428"	"SH"	# <U0428> CYRILLIC CAPITAL LETTER SHA
+"\x0429"	"SHH"	# <U0429> CYRILLIC CAPITAL LETTER SHCHA
+"\x042a"	"A`"	# <U042A> CYRILLIC CAPITAL LETTER HARD SIGN
+"\x042b"	"Y`"	# <U042B> CYRILLIC CAPITAL LETTER YERU
+"\x042c"	"`"	# <U042C> CYRILLIC CAPITAL LETTER SOFT SIGN
+"\x042d"	"E`"	# <U042D> CYRILLIC CAPITAL LETTER E
+"\x042e"	"YU"	# <U042E> CYRILLIC CAPITAL LETTER YU
+"\x042f"	"YA"	# <U042F> CYRILLIC CAPITAL LETTER YA
+"\x0430"	"a"	# <U0430> CYRILLIC SMALL LETTER A
+"\x0431"	"b"	# <U0431> CYRILLIC SMALL LETTER BE
+"\x0432"	"v"	# <U0432> CYRILLIC SMALL LETTER VE
+"\x0433"	"g"	# <U0433> CYRILLIC SMALL LETTER GHE
+"\x0434"	"d"	# <U0434> CYRILLIC SMALL LETTER DE
+"\x0435"	"e"	# <U0435> CYRILLIC SMALL LETTER IE
+"\x0436"	"zh"	# <U0436> CYRILLIC SMALL LETTER ZHE
+"\x0437"	"z"	# <U0437> CYRILLIC SMALL LETTER ZE
+"\x0438"	"i"	# <U0438> CYRILLIC SMALL LETTER I
+"\x0439"	"j"	# <U0439> CYRILLIC SMALL LETTER SHORT I
+"\x043a"	"k"	# <U043A> CYRILLIC SMALL LETTER KA
+"\x043b"	"l"	# <U043B> CYRILLIC SMALL LETTER EL
+"\x043c"	"m"	# <U043C> CYRILLIC SMALL LETTER EM
+"\x043d"	"n"	# <U043D> CYRILLIC SMALL LETTER EN
+"\x043e"	"o"	# <U043E> CYRILLIC SMALL LETTER O
+"\x043f"	"p"	# <U043F> CYRILLIC SMALL LETTER PE
+"\x0440"	"r"	# <U0440> CYRILLIC SMALL LETTER ER
+"\x0441"	"s"	# <U0441> CYRILLIC SMALL LETTER ES
+"\x0442"	"t"	# <U0442> CYRILLIC SMALL LETTER TE
+"\x0443"	"u"	# <U0443> CYRILLIC SMALL LETTER U
+"\x0444"	"f"	# <U0444> CYRILLIC SMALL LETTER EF
+"\x0445"	"x"	# <U0445> CYRILLIC SMALL LETTER HA
+"\x0446"	"cz"	# <U0446> CYRILLIC SMALL LETTER TSE
+"\x0447"	"ch"	# <U0447> CYRILLIC SMALL LETTER CHE
+"\x0448"	"sh"	# <U0448> CYRILLIC SMALL LETTER SHA
+"\x0449"	"shh"	# <U0449> CYRILLIC SMALL LETTER SHCHA
+"\x044a"	"``"	# <U044A> CYRILLIC SMALL LETTER HARD SIGN
+"\x044b"	"y`"	# <U044B> CYRILLIC SMALL LETTER YERU
+"\x044c"	"`"	# <U044C> CYRILLIC SMALL LETTER SOFT SIGN
+"\x044d"	"e`"	# <U044D> CYRILLIC SMALL LETTER E
+"\x044e"	"yu"	# <U044E> CYRILLIC SMALL LETTER YU
+"\x044f"	"ya"	# <U044F> CYRILLIC SMALL LETTER YA
+"\x0451"	"yo"	# <U0451> CYRILLIC SMALL LETTER IO
+"\x0452"	"dj"	# <U0452> CYRILLIC SMALL LETTER DJE
+"\x0453"	"g`"	# <U0453> CYRILLIC SMALL LETTER GJE
+"\x0454"	"ye"	# <U0454> CYRILLIC SMALL LETTER UKRAINIAN IE
+"\x0455"	"z`"	# <U0455> CYRILLIC SMALL LETTER DZE
+"\x0456"	"i"	# <U0456> CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
+"\x0457"	"yi"	# <U0457> CYRILLIC SMALL LETTER YI
+"\x0458"	"j"	# <U0458> CYRILLIC SMALL LETTER JE
+"\x0459"	"l`"	# <U0459> CYRILLIC SMALL LETTER LJE
+"\x045a"	"n`"	# <U045A> CYRILLIC SMALL LETTER NJE
+"\x045b"	"tsh"	# <U045B> CYRILLIC SMALL LETTER TSHE
+"\x045c"	"k`"	# <U045C> CYRILLIC SMALL LETTER KJE
+"\x045e"	"u`"	# <U045E> CYRILLIC SMALL LETTER SHORT U
+"\x045f"	"dh"	# <U045F> CYRILLIC SMALL LETTER DZHE
+"\x046a"	"O`"	# <U046A> CYRILLIC CAPITAL LETTER BIG YUS
+"\x046b"	"o`"	# <U046B> CYRILLIC SMALL LETTER BIG YUS
+"\x0472"	"FH"	# <U0472> CYRILLIC CAPITAL LETTER FITA
+"\x0473"	"fh"	# <U0473> CYRILLIC SMALL LETTER FITA
+"\x0474"	"YH"	# <U0474> CYRILLIC CAPITAL LETTER IZHITSA
+"\x0475"	"yh"	# <U0475> CYRILLIC SMALL LETTER IZHITSA
+"\x048c"	"E`"	# <U048C> CYRILLIC CAPITAL LETTER SEMISOFT SIGN
+"\x048d"	"e`"	# <U048D> CYRILLIC SMALL LETTER SEMISOFT SIGN
+"\x0490"	"G`"	# <U0490> CYRILLIC CAPITAL LETTER GHE WITH UPTURN
+"\x0491"	"g`"	# <U0491> CYRILLIC SMALL LETTER GHE WITH UPTURN
+"\x0492"	"GH"	# <U0492> CYRILLIC CAPITAL LETTER GHE WITH STROKE
+"\x0493"	"gh"	# <U0493> CYRILLIC SMALL LETTER GHE WITH STROKE
+"\x0494"	"GH"	# <U0494> CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK
+"\x0495"	"gh"	# <U0495> CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK
+"\x0496"	"ZH`"	# <U0496> CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER
+"\x0497"	"zh`"	# <U0497> CYRILLIC SMALL LETTER ZHE WITH DESCENDER
+"\x049a"	"K`"	# <U049A> CYRILLIC CAPITAL LETTER KA WITH DESCENDER
+"\x049b"	"k`"	# <U049B> CYRILLIC SMALL LETTER KA WITH DESCENDER
+"\x049e"	"K`"	# <U049E> CYRILLIC CAPITAL LETTER KA WITH STROKE
+"\x049f"	"k`"	# <U049F> CYRILLIC SMALL LETTER KA WITH STROKE
+"\x04a2"	"N`"	# <U04A2> CYRILLIC CAPITAL LETTER EN WITH DESCENDER
+"\x04a3"	"n`"	# <U04A3> CYRILLIC SMALL LETTER EN WITH DESCENDER
+"\x04a4"	"NG"	# <U04A4> CYRILLIC CAPITAL LIGATURE EN GHE
+"\x04a5"	"ng"	# <U04A5> CYRILLIC SMALL LIGATURE EN GHE
+"\x04a6"	"P`"	# <U04A6> CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK
+"\x04a7"	"p`"	# <U04A7> CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK
+"\x04a8"	"O`"	# <U04A8> CYRILLIC CAPITAL LETTER ABKHASIAN HA
+"\x04a9"	"o`"	# <U04A9> CYRILLIC SMALL LETTER ABKHASIAN HA
+"\x04aa"	"C`"	# <U04AA> CYRILLIC CAPITAL LETTER ES WITH DESCENDER
+"\x04ab"	"C`"	# <U04AB> CYRILLIC SMALL LETTER ES WITH DESCENDER
+"\x04ac"	"T`"	# <U04AC> CYRILLIC CAPITAL LETTER TE WITH DESCENDER
+"\x04ad"	"t`"	# <U04AD> CYRILLIC SMALL LETTER TE WITH DESCENDER
+"\x04ae"	"U"	# <U04AE> CYRILLIC CAPITAL LETTER STRAIGHT U
+"\x04af"	"u"	# <U04AF> CYRILLIC SMALL LETTER STRAIGHT U
+"\x04b2"	"H`"	# <U04B2> CYRILLIC CAPITAL LETTER HA WITH DESCENDER
+"\x04b3"	"h`"	# <U04B3> CYRILLIC SMALL LETTER HA WITH DESCENDER
+"\x04b4"	"TCZ"	# <U04B4> CYRILLIC CAPITAL LIGATURE TE TSE
+"\x04b5"	"tcz"	# <U04B5> CYRILLIC SMALL LIGATURE TE TSE
+"\x04ba"	"SH`"	# <U04BA> CYRILLIC CAPITAL LETTER SHHA
+"\x04bb"	"sh`"	# <U04BB> CYRILLIC SMALL LETTER SHHA
+"\x04bc"	"CH`"	# <U04BC> CYRILLIC CAPITAL LETTER ABKHASIAN CHE
+"\x04bd"	"ch`"	# <U04BD> CYRILLIC SMALL LETTER ABKHASIAN CHE
+"\x04be"	"CH`"	# <U04BE> CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER
+"\x04bf"	"ch`"	# <U04BF> CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER
+"\x04c0"	"i"	# <U04C0> CYRILLIC LETTER PALOCHKA
+"\x04c1"	"ZH`"	# <U04C1> CYRILLIC CAPITAL LETTER ZHE WITH BREVE
+"\x04c2"	"zh`"	# <U04C2> CYRILLIC SMALL LETTER ZHE WITH BREVE
+"\x04cb"	"CH`"	# <U04CB> CYRILLIC CAPITAL LETTER KHAKASSIAN CHE
+"\x04cc"	"ch`"	# <U04CC> CYRILLIC SMALL LETTER KHAKASSIAN CHE
+"\x04d0"	"A`"	# <U04D0> CYRILLIC CAPITAL LETTER A WITH BREVE
+"\x04d1"	"a`"	# <U04D1> CYRILLIC SMALL LETTER A WITH BREVE
+"\x04d2"	"A`"	# <U04D2> CYRILLIC CAPITAL LETTER A WITH DIAERESIS
+"\x04d3"	"a`"	# <U04D3> CYRILLIC SMALL LETTER A WITH DIAERESIS
+"\x04d6"	"E`"	# <U04D6> CYRILLIC CAPITAL LETTER IE WITH BREVE
+"\x04d7"	"e`"	# <U04D7> CYRILLIC SMALL LETTER IE WITH BREVE
+"\x04d8"	"A`"	# <U04D8> CYRILLIC CAPITAL LETTER SCHWA
+"\x04d9"	"a`"	# <U04D9> CYRILLIC SMALL LETTER SCHWA
+"\x04dc"	"ZH`"	# <U04DC> CYRILLIC CAPITAL LETTER ZHE WITH DIAERESIS
+"\x04dd"	"zh`"	# <U04DD> CYRILLIC SMALL LETTER ZHE WITH DIAERESIS
+"\x04de"	"Z`"	# <U04DE> CYRILLIC CAPITAL LETTER ZE WITH DIAERESIS
+"\x04df"	"z`"	# <U04DF> CYRILLIC SMALL LETTER ZE WITH DIAERESIS
+"\x04e0"	"Z`"	# <U04E0> CYRILLIC CAPITAL LETTER ABKHASIAN DZE
+"\x04e1"	"z`"	# <U04E1> CYRILLIC SMALL LETTER ABKHASIAN DZE
+"\x04e4"	"I`"	# <U04E4> CYRILLIC CAPITAL LETTER I WITH DIAERESIS
+"\x04e5"	"i`"	# <U04E5> CYRILLIC SMALL LETTER I WITH DIAERESIS
+"\x04e6"	"O`"	# <U04E6> CYRILLIC CAPITAL LETTER O WITH DIAERESIS
+"\x04e7"	"o`"	# <U04E7> CYRILLIC SMALL LETTER O WITH DIAERESIS
+"\x04e8"	"O`"	# <U04E8> CYRILLIC CAPITAL LETTER BARRED O
+"\x04e9"	"o`"	# <U04E9> CYRILLIC SMALL LETTER BARRED O
+"\x04f0"	"U`"	# <U04F0> CYRILLIC CAPITAL LETTER U WITH DIAERESIS
+"\x04f1"	"u`"	# <U04F1> CYRILLIC SMALL LETTER U WITH DIAERESIS
+"\x04f2"	"U`"	# <U04F2> CYRILLIC CAPITAL LETTER U WITH DOUBLE ACUTE
+"\x04f3"	"u`"	# <U04F3> CYRILLIC SMALL LETTER U WITH DOUBLE ACUTE
+"\x04f4"	"CH`"	# <U04F4> CYRILLIC CAPITAL LETTER CHE WITH DIAERESIS
+"\x04f5"	"ch`"	# <U04F5> CYRILLIC SMALL LETTER CHE WITH DIAERESIS
+"\x04f8"	"Y`"	# <U04F8> CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS
+"\x04f9"	"y`"	# <U04F9> CYRILLIC SMALL LETTER YERU WITH DIAERESIS
 "\x2002"	" "	# <U2002> EN SPACE
 "\x2003"	" "	# <U2003> EM SPACE