[PING] Re: [PATCH] locale/C-translit.h.in: Greek -> ASCII transliteration table [BZ #12031]

Message ID w1v0WJwiUmFTKAjmrkcDGhacNQe0iUS2ah_VyswN3BKeLFtC2hOIn7fuGGoENQnOGH-1wDO9JgaKlnMA9B2i4eI7RQey2JCIZqraknAQsUA=@kobylkin.com
State New

Commit Message

Diego (Egor) Kobylkin Oct. 15, 2019, 11:26 a.m. UTC
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, September 4, 2019 9:31 AM, Diego (Egor) Kobylkin <egor@kobylkin.com> wrote:

> Dear locale maintainers,
> 

> This patch fixes glibc bug 12031, "iconv -t ascii//translit with Greek characters" [1],
> by adding Greek transliteration rows to locale/C-translit.h.in.
> 

> This work follows on the heels of the successfully committed patch for
> virtually the same bug [BZ #2872], which concerned Cyrillic characters. [2]
> 

> AFAIK there are many versions of transcription tables for Greek to ASCII
> transcription. Given that the current iconv logic can only transliterate
> one-to-many but not many-to-many symbols, we take the "Standard" part of
> the Romanization_of_Greek#Modern_Greek table [3]
> 

> and only keep the single-letter Greek graphemes. That "standard" seems to be
> close to ELOT 743 but is not identical to it.
> 

> So we omit things like M and Μπ being transliterated as M and B respectively.
> Rather, Μπ will be treated as two separate graphemes and transliterated as Mp.
> 

> Here is the list of standards I have collected so far. There doesn't seem to be
> a way to harmonize them all into one. But if anyone wants to propose a solution,
> please do.
> 

> -   ΕΛΟΤ 743 https://www.teicrete.gr/users/kutrulis/Ergalia/ELOT743.htm Passports.
> -   ISO 843 https://en.wikipedia.org/wiki/ISO_843
> -   ALA-LC https://www.loc.gov/catdir/cpso/romanization/greek.pdf Book titles.
> -   BGN/PCGN http://libraries.ucsd.edu/bib/fed/USBGN_romanization.pdf
> -   http://geonames.nga.mil/gns/html/Romanization/Romanization_Greek.pdf Geographical names.
>     

> Furthermore, to cover the whole U0370-U03FF Greek/Coptic Unicode range, I have
> asked around and made a best-effort transliteration for the remaining characters
> not covered by the standards above.
>     

> Should you have better sources for the actual translit entries please make sure to
> send your feedback!
>     

> The patch is attached.
>     

> Best regards,
> Egor Kobylkin
>     

> https://sourceware.org/bugzilla/show_bug.cgi?id=12031 [1]
> https://sourceware.org/ml/libc-alpha/2019-07/msg00477.html [2]
> https://en.wikipedia.org/wiki/Romanization_of_Greek#Modern_Greek [3]
>
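
The per-character approach described in the quoted message can be illustrated with a small sketch. This is not how glibc implements //TRANSLIT; it is a hypothetical Python model, with an invented `translit` helper and a hand-picked subset of the proposed rows, that only shows why a one-to-many table turns Μπ into "Mp" rather than "B". The "?" fallback for unmapped characters is an assumption of the sketch.

```python
# Hypothetical model of a one-to-many transliteration table.
# Only a few rows from the proposed patch are reproduced here.
GREEK_TO_ASCII = {
    "\u039c": "M",    # GREEK CAPITAL LETTER MU
    "\u03c0": "p",    # GREEK SMALL LETTER PI
    "\u0398": "TH",   # GREEK CAPITAL LETTER THETA
    "\u03b8": "th",   # GREEK SMALL LETTER THETA
    "\u03b5": "e",    # GREEK SMALL LETTER EPSILON
    "\u03cc": "o",    # GREEK SMALL LETTER OMICRON WITH TONOS
    "\u03c2": "s",    # GREEK SMALL LETTER FINAL SIGMA
}


def translit(text, table=GREEK_TO_ASCII, fallback="?"):
    """Map each input character independently (one source -> many ASCII)."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)                        # already ASCII, keep as-is
        else:
            out.append(table.get(ch, fallback))   # assumed fallback marker
    return "".join(out)


# Each character is looked up on its own, so the digraph is not merged.
print(translit("\u039c\u03c0"))              # Mp
print(translit("\u03b8\u03b5\u03cc\u03c2"))  # theos
```

Because the lookup key is a single character, a digraph rule such as Μπ -> B cannot be expressed in this model, which mirrors the limitation the message describes.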

Patch

From d39f142b715d224cbb7e29c21d82ea1101fd2844 Mon Sep 17 00:00:00 2001
From: Egor Kobylkin <egor@kobylkin.com>
Date: Wed, 4 Sep 2019 08:24:04 +0200
Subject: [PATCH] Locales: Greek -> ASCII transliteration table [BZ #12031]

	[BZ #12031]
	* locale/C-translit.h.in: Add Greeklish transliteration.
---
 locale/C-translit.h.in | 135 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 135 insertions(+)

diff --git a/locale/C-translit.h.in b/locale/C-translit.h.in
index 758171c394..3c229c1813 100644
--- a/locale/C-translit.h.in
+++ b/locale/C-translit.h.in
@@ -56,6 +56,141 @@ 
 "\x02cd"	"_"	# <U02CD> MODIFIER LETTER LOW MACRON
 "\x02d0"	":"	# <U02D0> MODIFIER LETTER TRIANGULAR COLON
 "\x02dc"	"~"	# <U02DC> SMALL TILDE
+"\x0370"	"H"	# <U0370> GREEK CAPITAL LETTER HETA
+"\x0371"	"h"	# <U0371> GREEK SMALL LETTER HETA
+"\x0372"	"SS"	# <U0372> GREEK CAPITAL LETTER ARCHAIC SAMPI
+"\x0373"	"ss"	# <U0373> GREEK SMALL LETTER ARCHAIC SAMPI
+"\x0374"	"#"	# <U0374> GREEK NUMERAL SIGN
+"\x0375"	"#`"	# <U0375> GREEK LOWER NUMERAL SIGN
+"\x0376"	"W"	# <U0376> GREEK CAPITAL LETTER PAMPHYLIAN DIGAMMA
+"\x0377"	"w"	# <U0377> GREEK SMALL LETTER PAMPHYLIAN DIGAMMA
+"\x037a"	"i"	# <U037A> GREEK YPOGEGRAMMENI
+"\x037b"	"s"	# <U037B> GREEK SMALL REVERSED LUNATE SIGMA SYMBOL
+"\x037c"	"s"	# <U037C> GREEK SMALL DOTTED LUNATE SIGMA SYMBOL
+"\x037d"	"s"	# <U037D> GREEK SMALL REVERSED DOTTED LUNATE SIGMA SYMBOL
+"\x037e"	"?"	# <U037E> GREEK QUESTION MARK
+"\x037f"	"J"	# <U037F> GREEK CAPITAL LETTER YOT
+"\x0384"	"`"	# <U0384> GREEK TONOS
+"\x0385"	"`"	# <U0385> GREEK DIALYTIKA TONOS
+"\x0386"	"A"	# <U0386> GREEK CAPITAL LETTER ALPHA WITH TONOS
+"\x0387"	";"	# <U0387> GREEK ANO TELEIA
+"\x0388"	"E"	# <U0388> GREEK CAPITAL LETTER EPSILON WITH TONOS
+"\x0389"	"E"	# <U0389> GREEK CAPITAL LETTER ETA WITH TONOS
+"\x038a"	"I"	# <U038A> GREEK CAPITAL LETTER IOTA WITH TONOS
+"\x038c"	"O"	# <U038C> GREEK CAPITAL LETTER OMICRON WITH TONOS
+"\x038e"	"Y"	# <U038E> GREEK CAPITAL LETTER UPSILON WITH TONOS
+"\x038f"	"O"	# <U038F> GREEK CAPITAL LETTER OMEGA WITH TONOS
+"\x0390"	"i"	# <U0390> GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
+"\x0391"	"A"	# <U0391> GREEK CAPITAL LETTER ALPHA
+"\x0392"	"V"	# <U0392> GREEK CAPITAL LETTER BETA
+"\x0393"	"G"	# <U0393> GREEK CAPITAL LETTER GAMMA
+"\x0394"	"D"	# <U0394> GREEK CAPITAL LETTER DELTA
+"\x0395"	"E"	# <U0395> GREEK CAPITAL LETTER EPSILON
+"\x0396"	"Z"	# <U0396> GREEK CAPITAL LETTER ZETA
+"\x0397"	"I"	# <U0397> GREEK CAPITAL LETTER ETA
+"\x0398"	"TH"	# <U0398> GREEK CAPITAL LETTER THETA
+"\x0399"	"I"	# <U0399> GREEK CAPITAL LETTER IOTA
+"\x039a"	"K"	# <U039A> GREEK CAPITAL LETTER KAPPA
+"\x039b"	"L"	# <U039B> GREEK CAPITAL LETTER LAMDA
+"\x039c"	"M"	# <U039C> GREEK CAPITAL LETTER MU
+"\x039d"	"N"	# <U039D> GREEK CAPITAL LETTER NU
+"\x039e"	"X"	# <U039E> GREEK CAPITAL LETTER XI
+"\x039f"	"O"	# <U039F> GREEK CAPITAL LETTER OMICRON
+"\x03a0"	"P"	# <U03A0> GREEK CAPITAL LETTER PI
+"\x03a1"	"R"	# <U03A1> GREEK CAPITAL LETTER RHO
+"\x03a3"	"S"	# <U03A3> GREEK CAPITAL LETTER SIGMA
+"\x03a4"	"T"	# <U03A4> GREEK CAPITAL LETTER TAU
+"\x03a5"	"Y"	# <U03A5> GREEK CAPITAL LETTER UPSILON
+"\x03a6"	"F"	# <U03A6> GREEK CAPITAL LETTER PHI
+"\x03a7"	"CH"	# <U03A7> GREEK CAPITAL LETTER CHI
+"\x03a8"	"PS"	# <U03A8> GREEK CAPITAL LETTER PSI
+"\x03a9"	"O"	# <U03A9> GREEK CAPITAL LETTER OMEGA
+"\x03aa"	"I"	# <U03AA> GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
+"\x03ab"	"Y"	# <U03AB> GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
+"\x03ac"	"a"	# <U03AC> GREEK SMALL LETTER ALPHA WITH TONOS
+"\x03ad"	"e"	# <U03AD> GREEK SMALL LETTER EPSILON WITH TONOS
+"\x03ae"	"e"	# <U03AE> GREEK SMALL LETTER ETA WITH TONOS
+"\x03af"	"i"	# <U03AF> GREEK SMALL LETTER IOTA WITH TONOS
+"\x03b0"	"y"	# <U03B0> GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
+"\x03b1"	"a"	# <U03B1> GREEK SMALL LETTER ALPHA
+"\x03b2"	"v"	# <U03B2> GREEK SMALL LETTER BETA
+"\x03b3"	"g"	# <U03B3> GREEK SMALL LETTER GAMMA
+"\x03b4"	"d"	# <U03B4> GREEK SMALL LETTER DELTA
+"\x03b5"	"e"	# <U03B5> GREEK SMALL LETTER EPSILON
+"\x03b6"	"z"	# <U03B6> GREEK SMALL LETTER ZETA
+"\x03b7"	"i"	# <U03B7> GREEK SMALL LETTER ETA
+"\x03b8"	"th"	# <U03B8> GREEK SMALL LETTER THETA
+"\x03b9"	"i"	# <U03B9> GREEK SMALL LETTER IOTA
+"\x03ba"	"k"	# <U03BA> GREEK SMALL LETTER KAPPA
+"\x03bb"	"l"	# <U03BB> GREEK SMALL LETTER LAMDA
+"\x03bc"	"m"	# <U03BC> GREEK SMALL LETTER MU
+"\x03bd"	"n"	# <U03BD> GREEK SMALL LETTER NU
+"\x03be"	"x"	# <U03BE> GREEK SMALL LETTER XI
+"\x03bf"	"o"	# <U03BF> GREEK SMALL LETTER OMICRON
+"\x03c0"	"p"	# <U03C0> GREEK SMALL LETTER PI
+"\x03c1"	"r"	# <U03C1> GREEK SMALL LETTER RHO
+"\x03c2"	"s"	# <U03C2> GREEK SMALL LETTER FINAL SIGMA
+"\x03c3"	"s"	# <U03C3> GREEK SMALL LETTER SIGMA
+"\x03c4"	"t"	# <U03C4> GREEK SMALL LETTER TAU
+"\x03c5"	"y"	# <U03C5> GREEK SMALL LETTER UPSILON
+"\x03c6"	"f"	# <U03C6> GREEK SMALL LETTER PHI
+"\x03c7"	"ch"	# <U03C7> GREEK SMALL LETTER CHI
+"\x03c8"	"ps"	# <U03C8> GREEK SMALL LETTER PSI
+"\x03c9"	"o"	# <U03C9> GREEK SMALL LETTER OMEGA
+"\x03ca"	"i"	# <U03CA> GREEK SMALL LETTER IOTA WITH DIALYTIKA
+"\x03cb"	"y"	# <U03CB> GREEK SMALL LETTER UPSILON WITH DIALYTIKA
+"\x03cc"	"o"	# <U03CC> GREEK SMALL LETTER OMICRON WITH TONOS
+"\x03cd"	"y"	# <U03CD> GREEK SMALL LETTER UPSILON WITH TONOS
+"\x03ce"	"o"	# <U03CE> GREEK SMALL LETTER OMEGA WITH TONOS
+"\x03cf"	"&"	# <U03CF> GREEK CAPITAL KAI SYMBOL
+"\x03d0"	"b"	# <U03D0> GREEK BETA SYMBOL
+"\x03d1"	"th"	# <U03D1> GREEK THETA SYMBOL
+"\x03d2"	"Y`"	# <U03D2> GREEK UPSILON WITH HOOK SYMBOL
+"\x03d3"	"Y`"	# <U03D3> GREEK UPSILON WITH ACUTE AND HOOK SYMBOL
+"\x03d4"	"Y`"	# <U03D4> GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL
+"\x03d5"	"f"	# <U03D5> GREEK PHI SYMBOL
+"\x03d6"	"p"	# <U03D6> GREEK PI SYMBOL
+"\x03d7"	"&"	# <U03D7> GREEK KAI SYMBOL
+"\x03d8"	"Q"	# <U03D8> GREEK LETTER ARCHAIC KOPPA
+"\x03d9"	"q"	# <U03D9> GREEK SMALL LETTER ARCHAIC KOPPA
+"\x03da"	"6"	# <U03DA> GREEK LETTER STIGMA
+"\x03db"	"6"	# <U03DB> GREEK SMALL LETTER STIGMA
+"\x03dc"	"W"	# <U03DC> GREEK LETTER DIGAMMA
+"\x03dd"	"w"	# <U03DD> GREEK SMALL LETTER DIGAMMA
+"\x03de"	"90"	# <U03DE> GREEK LETTER KOPPA
+"\x03df"	"90"	# <U03DF> GREEK SMALL LETTER KOPPA
+"\x03e0"	"900"	# <U03E0> GREEK LETTER SAMPI
+"\x03e1"	"900"	# <U03E1> GREEK SMALL LETTER SAMPI
+"\x03e2"	"SH"	# <U03E2> COPTIC CAPITAL LETTER SHEI
+"\x03e3"	"sh"	# <U03E3> COPTIC SMALL LETTER SHEI
+"\x03e4"	"F"	# <U03E4> COPTIC CAPITAL LETTER FEI
+"\x03e5"	"f"	# <U03E5> COPTIC SMALL LETTER FEI
+"\x03e6"	"KH"	# <U03E6> COPTIC CAPITAL LETTER KHEI
+"\x03e7"	"kh"	# <U03E7> COPTIC SMALL LETTER KHEI
+"\x03e8"	"H"	# <U03E8> COPTIC CAPITAL LETTER HORI
+"\x03e9"	"h"	# <U03E9> COPTIC SMALL LETTER HORI
+"\x03ea"	"DJ"	# <U03EA> COPTIC CAPITAL LETTER GANGIA
+"\x03eb"	"dj"	# <U03EB> COPTIC SMALL LETTER GANGIA
+"\x03ec"	"GJ"	# <U03EC> COPTIC CAPITAL LETTER SHIMA
+"\x03ed"	"gj"	# <U03ED> COPTIC SMALL LETTER SHIMA
+"\x03ee"	"TI"	# <U03EE> COPTIC CAPITAL LETTER DEI
+"\x03ef"	"ti"	# <U03EF> COPTIC SMALL LETTER DEI
+"\x03f0"	"k"	# <U03F0> GREEK KAPPA SYMBOL
+"\x03f1"	"r"	# <U03F1> GREEK RHO SYMBOL
+"\x03f2"	"s"	# <U03F2> GREEK LUNATE SIGMA SYMBOL
+"\x03f3"	"j"	# <U03F3> GREEK LETTER YOT
+"\x03f4"	"TH"	# <U03F4> GREEK CAPITAL THETA SYMBOL
+"\x03f5"	"e"	# <U03F5> GREEK LUNATE EPSILON SYMBOL
+"\x03f6"	"e"	# <U03F6> GREEK REVERSED LUNATE EPSILON SYMBOL
+"\x03f7"	"SH"	# <U03F7> GREEK CAPITAL LETTER SHO
+"\x03f8"	"sh"	# <U03F8> GREEK SMALL LETTER SHO
+"\x03f9"	"S"	# <U03F9> GREEK CAPITAL LUNATE SIGMA SYMBOL
+"\x03fa"	"S"	# <U03FA> GREEK CAPITAL LETTER SAN
+"\x03fb"	"s"	# <U03FB> GREEK SMALL LETTER SAN
+"\x03fc"	"r"	# <U03FC> GREEK RHO WITH STROKE SYMBOL
+"\x03fd"	"S"	# <U03FD> GREEK CAPITAL REVERSED LUNATE SIGMA SYMBOL
+"\x03fe"	"S"	# <U03FE> GREEK CAPITAL DOTTED LUNATE SIGMA SYMBOL
+"\x03ff"	"S"	# <U03FF> GREEK CAPITAL REVERSED DOTTED LUNATE SIGMA SYMBOL
 "\x0401"	"YO"	# <U0401> CYRILLIC CAPITAL LETTER IO
 "\x0402"	"DJ"	# <U0402> CYRILLIC CAPITAL LETTER DJE
 "\x0403"	"G`"	# <U0403> CYRILLIC CAPITAL LETTER GJE
-- 
2.17.1
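
For reviewers who want to inspect the added rows programmatically, the following is a rough sketch of reading rows in the layout used above (quoted `\x` escape, tab, quoted replacement, comment). The `parse_rows` helper and its regular expression are hypothetical and only handle the simple single-character rows seen in this patch, not every construct that locale/C-translit.h.in may contain.

```python
import re

# Matches an optional diff marker, then "\xHHHH", whitespace, "replacement".
# Assumption: the replacement contains no escaped double quotes.
ROW = re.compile(r'^[+ ]?"\\x([0-9a-fA-F]+)"\s+"([^"]*)"')


def parse_rows(lines):
    """Build a {character: ASCII replacement} dict from table rows."""
    table = {}
    for line in lines:
        m = ROW.match(line)
        if m:
            table[chr(int(m.group(1), 16))] = m.group(2)
    return table


sample = [
    '+"\\x0398"\t"TH"\t# <U0398> GREEK CAPITAL LETTER THETA',
    '+"\\x03b8"\t"th"\t# <U03B8> GREEK SMALL LETTER THETA',
    ' "\\x0401"\t"YO"\t# <U0401> CYRILLIC CAPITAL LETTER IO',
]
table = parse_rows(sample)
print(table["\u0398"])  # TH
```

A dict built this way makes it easy to spot inconsistencies, for example a small letter whose replacement is upper case.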