[PATCHv3] locale/C-translit.h.in: Greek -> ASCII transliteration table [BZ #12031]

Message ID zphE-sfDKnYcOF3DuN0dPy88JzmyokV3ih5KliLP-fhlO0-2NsD7Crxo5BpCGnCsAmVJYUUQeqD0KIVJZn2Hzo50C0nfRbTYI9MHZH12xoM=@kobylkin.com
State New

Commit Message

Diego (Egor) Kobylkin Nov. 14, 2019, 1:14 p.m. UTC
Changelog:

v3
 * spurious change (https/http) removed

v2
 * ETA WITH TONOS is now transliterated as I/i, consistent with the rest of the table. Ancient Greek calls for E/e and Modern Greek for I/i; we follow the modern convention here.



‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, November 14, 2019 1:29 PM, Diego (Egor) Kobylkin <egor@kobylkin.com> wrote:

> Changelog:
>
> v2
>  * ETA WITH TONOS is now transliterated as I/i to be consistent throughout the table. Ancient Greek calls for E/e and modern for I/i which we are taking here.
>
> Thanks Florian for the feedback on this!
> Egor
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Wednesday, September 4, 2019 9:31 AM, Diego (Egor) Kobylkin <egor@kobylkin.com> wrote:
>
> > Dear locale maintainers,
> >
> > This patch fixes glibc bug 12031, "iconv -t ascii//translit with Greek
> > characters" [1], by adding Greek transliteration rows to
> > locale/C-translit.h.in.
> >
> > This work follows the successfully committed patch for virtually the same
> > bug [BZ #2872], which concerned Cyrillic characters. [2]
> >
> > AFAIK there are many versions of Greek-to-ASCII transcription tables.
> > Given that the current iconv logic can only transliterate one symbol to many,
> > but not many to many, we take the "Standard" part of
> > the Romanization_of_Greek#Modern_Greek table [3]
> > and keep only the single-letter Greek graphemes. That "standard" seems to be
> > close to ELOT 743, but not identical.
> > So we omit things like M and Μπ being transliterated as M and B respectively.
> > Instead, Μπ is treated as two separate graphemes and transliterated as Mp.
> >
> > Here is the list of standards I have collected so far. There doesn't seem
> > to be a way to harmonize them all into one, but if anyone wants to propose
> > a solution, please do.
> >
> > -   ΕΛΟΤ 743 https://www.teicrete.gr/users/kutrulis/Ergalia/ELOT743.htm Passports.
> > -   ISO 843 https://en.wikipedia.org/wiki/ISO_843
> > -   ALA-LC https://www.loc.gov/catdir/cpso/romanization/greek.pdf Book titles.
> > -   BGN/PCGN http://libraries.ucsd.edu/bib/fed/USBGN_romanization.pdf
> > -   http://geonames.nga.mil/gns/html/Romanization/Romanization_Greek.pdf Geographical names.
> >
> > Furthermore, to cover the whole U0370-U03FF Greek/Coptic Unicode range, I have
> > asked around and made a best-effort transliteration for the remaining
> > characters not covered by the above standards.
> > Should you have better sources for the actual translit entries, please
> > send your feedback!
> >
> > The patch is attached.
> >
> > Best regards,
> > Egor Kobylkin
> >
> > https://sourceware.org/bugzilla/show_bug.cgi?id=12031 [1]
> > https://sourceware.org/ml/libc-alpha/2019-07/msg00477.html [2]
> > https://en.wikipedia.org/wiki/Romanization_of_Greek#Modern_Greek [3]
> >
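The one-to-many constraint described in the message above can be illustrated with a small sketch. This is not glibc's iconv code. It is a minimal Python model that assumes each input code point is looked up on its own and replaced by zero or more ASCII characters. The dictionary holds only a handful of rows copied from the patch, and the `translit` helper and its `?` fallback are names chosen for this illustration.

```python
# A few rows copied from the patch; the full table covers U0370-U03FF.
GREEK_TO_ASCII = {
    "\u039c": "M",   # GREEK CAPITAL LETTER MU
    "\u03c0": "p",   # GREEK SMALL LETTER PI
    "\u0398": "TH",  # GREEK CAPITAL LETTER THETA
    "\u03b7": "i",   # GREEK SMALL LETTER ETA
    "\u03ae": "i",   # GREEK SMALL LETTER ETA WITH TONOS
    "\u03c8": "ps",  # GREEK SMALL LETTER PSI
}


def translit(text, table=GREEK_TO_ASCII, fallback="?"):
    """Transliterate one code point at a time (hypothetical helper).

    Each character is mapped independently, so a digraph such as
    MU + PI cannot be recognised as a unit and turned into "B".
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # already ASCII, pass through
        else:
            # Assumption: unmapped characters become the fallback string.
            out.append(table.get(ch, fallback))
    return "".join(out)


print(translit("\u039c\u03c0"))  # Mp
print(translit("\u03c8\u03ae"))  # psi
```

Because the lookup key is a single code point, supporting the ELOT 743 digraph rules would need a different matching strategy (for example longest-match over character sequences), which is outside what the table format in this patch expresses.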

Comments

Florian Weimer Nov. 26, 2019, 11:44 a.m. UTC | #1
* Diego Kobylkin:

> Changelog:
>
> v3
>  * spurious change (https/http) removed
>
> v2
>  * ETA WITH TONOS is now transliterated as I/i to be consistent
> throughout the table. Ancient Greek calls for E/e and modern for I/i
> which we are taking here.

Thanks, I have pushed this for you.

Patch

From 5a45778dabe368d4a8fce9ca69dbd9894fda0006 Mon Sep 17 00:00:00 2001
From: Egor Kobylkin <egor@kobylkin.com>
Date: Thu, 14 Nov 2019 13:59:39 +0100
Subject: [PATCH] Locales: Greek -> ASCII transliteration table [BZ #12031]

	[BZ #12031]
	* locale/C-translit.h.in: Add Greeklish transliteration.
---
 locale/C-translit.h.in | 135 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 135 insertions(+)

diff --git a/locale/C-translit.h.in b/locale/C-translit.h.in
index 12cbcd35be..5a3cf482e0 100644
--- a/locale/C-translit.h.in
+++ b/locale/C-translit.h.in
@@ -56,6 +56,141 @@ 
 "\x02cd"	"_"	# <U02CD> MODIFIER LETTER LOW MACRON
 "\x02d0"	":"	# <U02D0> MODIFIER LETTER TRIANGULAR COLON
 "\x02dc"	"~"	# <U02DC> SMALL TILDE
+"\x0370"	"H"	# <U0370> GREEK CAPITAL LETTER HETA
+"\x0371"	"h"	# <U0371> GREEK SMALL LETTER HETA
+"\x0372"	"SS"	# <U0372> GREEK CAPITAL LETTER ARCHAIC SAMPI
+"\x0373"	"ss"	# <U0373> GREEK SMALL LETTER ARCHAIC SAMPI
+"\x0374"	"#"	# <U0374> GREEK NUMERAL SIGN
+"\x0375"	"#`"	# <U0375> GREEK LOWER NUMERAL SIGN
+"\x0376"	"W"	# <U0376> GREEK CAPITAL LETTER PAMPHYLIAN DIGAMMA
+"\x0377"	"w"	# <U0377> GREEK SMALL LETTER PAMPHYLIAN DIGAMMA
+"\x037a"	"i"	# <U037A> GREEK YPOGEGRAMMENI
+"\x037b"	"s"	# <U037B> GREEK SMALL REVERSED LUNATE SIGMA SYMBOL
+"\x037c"	"s"	# <U037C> GREEK SMALL DOTTED LUNATE SIGMA SYMBOL
+"\x037d"	"s"	# <U037D> GREEK SMALL REVERSED DOTTED LUNATE SIGMA SYMBOL
+"\x037e"	"?"	# <U037E> GREEK QUESTION MARK
+"\x037f"	"J"	# <U037F> GREEK CAPITAL LETTER YOT
+"\x0384"	"`"	# <U0384> GREEK TONOS
+"\x0385"	"`"	# <U0385> GREEK DIALYTIKA TONOS
+"\x0386"	"A"	# <U0386> GREEK CAPITAL LETTER ALPHA WITH TONOS
+"\x0387"	";"	# <U0387> GREEK ANO TELEIA
+"\x0388"	"E"	# <U0388> GREEK CAPITAL LETTER EPSILON WITH TONOS
+"\x0389"	"I"	# <U0389> GREEK CAPITAL LETTER ETA WITH TONOS
+"\x038a"	"I"	# <U038A> GREEK CAPITAL LETTER IOTA WITH TONOS
+"\x038c"	"O"	# <U038C> GREEK CAPITAL LETTER OMICRON WITH TONOS
+"\x038e"	"Y"	# <U038E> GREEK CAPITAL LETTER UPSILON WITH TONOS
+"\x038f"	"O"	# <U038F> GREEK CAPITAL LETTER OMEGA WITH TONOS
+"\x0390"	"I"	# <U0390> GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
+"\x0391"	"A"	# <U0391> GREEK CAPITAL LETTER ALPHA
+"\x0392"	"V"	# <U0392> GREEK CAPITAL LETTER BETA
+"\x0393"	"G"	# <U0393> GREEK CAPITAL LETTER GAMMA
+"\x0394"	"D"	# <U0394> GREEK CAPITAL LETTER DELTA
+"\x0395"	"E"	# <U0395> GREEK CAPITAL LETTER EPSILON
+"\x0396"	"Z"	# <U0396> GREEK CAPITAL LETTER ZETA
+"\x0397"	"I"	# <U0397> GREEK CAPITAL LETTER ETA
+"\x0398"	"TH"	# <U0398> GREEK CAPITAL LETTER THETA
+"\x0399"	"I"	# <U0399> GREEK CAPITAL LETTER IOTA
+"\x039a"	"K"	# <U039A> GREEK CAPITAL LETTER KAPPA
+"\x039b"	"L"	# <U039B> GREEK CAPITAL LETTER LAMDA
+"\x039c"	"M"	# <U039C> GREEK CAPITAL LETTER MU
+"\x039d"	"N"	# <U039D> GREEK CAPITAL LETTER NU
+"\x039e"	"X"	# <U039E> GREEK CAPITAL LETTER XI
+"\x039f"	"O"	# <U039F> GREEK CAPITAL LETTER OMICRON
+"\x03a0"	"P"	# <U03A0> GREEK CAPITAL LETTER PI
+"\x03a1"	"R"	# <U03A1> GREEK CAPITAL LETTER RHO
+"\x03a3"	"S"	# <U03A3> GREEK CAPITAL LETTER SIGMA
+"\x03a4"	"T"	# <U03A4> GREEK CAPITAL LETTER TAU
+"\x03a5"	"Y"	# <U03A5> GREEK CAPITAL LETTER UPSILON
+"\x03a6"	"F"	# <U03A6> GREEK CAPITAL LETTER PHI
+"\x03a7"	"CH"	# <U03A7> GREEK CAPITAL LETTER CHI
+"\x03a8"	"PS"	# <U03A8> GREEK CAPITAL LETTER PSI
+"\x03a9"	"O"	# <U03A9> GREEK CAPITAL LETTER OMEGA
+"\x03aa"	"I"	# <U03AA> GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
+"\x03ab"	"Y"	# <U03AB> GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
+"\x03ac"	"a"	# <U03AC> GREEK SMALL LETTER ALPHA WITH TONOS
+"\x03ad"	"e"	# <U03AD> GREEK SMALL LETTER EPSILON WITH TONOS
+"\x03ae"	"i"	# <U03AE> GREEK SMALL LETTER ETA WITH TONOS
+"\x03af"	"i"	# <U03AF> GREEK SMALL LETTER IOTA WITH TONOS
+"\x03b0"	"y"	# <U03B0> GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
+"\x03b1"	"a"	# <U03B1> GREEK SMALL LETTER ALPHA
+"\x03b2"	"v"	# <U03B2> GREEK SMALL LETTER BETA
+"\x03b3"	"g"	# <U03B3> GREEK SMALL LETTER GAMMA
+"\x03b4"	"d"	# <U03B4> GREEK SMALL LETTER DELTA
+"\x03b5"	"e"	# <U03B5> GREEK SMALL LETTER EPSILON
+"\x03b6"	"z"	# <U03B6> GREEK SMALL LETTER ZETA
+"\x03b7"	"i"	# <U03B7> GREEK SMALL LETTER ETA
+"\x03b8"	"th"	# <U03B8> GREEK SMALL LETTER THETA
+"\x03b9"	"i"	# <U03B9> GREEK SMALL LETTER IOTA
+"\x03ba"	"k"	# <U03BA> GREEK SMALL LETTER KAPPA
+"\x03bb"	"l"	# <U03BB> GREEK SMALL LETTER LAMDA
+"\x03bc"	"m"	# <U03BC> GREEK SMALL LETTER MU
+"\x03bd"	"n"	# <U03BD> GREEK SMALL LETTER NU
+"\x03be"	"x"	# <U03BE> GREEK SMALL LETTER XI
+"\x03bf"	"o"	# <U03BF> GREEK SMALL LETTER OMICRON
+"\x03c0"	"p"	# <U03C0> GREEK SMALL LETTER PI
+"\x03c1"	"r"	# <U03C1> GREEK SMALL LETTER RHO
+"\x03c2"	"s"	# <U03C2> GREEK SMALL LETTER FINAL SIGMA
+"\x03c3"	"s"	# <U03C3> GREEK SMALL LETTER SIGMA
+"\x03c4"	"t"	# <U03C4> GREEK SMALL LETTER TAU
+"\x03c5"	"y"	# <U03C5> GREEK SMALL LETTER UPSILON
+"\x03c6"	"f"	# <U03C6> GREEK SMALL LETTER PHI
+"\x03c7"	"ch"	# <U03C7> GREEK SMALL LETTER CHI
+"\x03c8"	"ps"	# <U03C8> GREEK SMALL LETTER PSI
+"\x03c9"	"o"	# <U03C9> GREEK SMALL LETTER OMEGA
+"\x03ca"	"i"	# <U03CA> GREEK SMALL LETTER IOTA WITH DIALYTIKA
+"\x03cb"	"y"	# <U03CB> GREEK SMALL LETTER UPSILON WITH DIALYTIKA
+"\x03cc"	"o"	# <U03CC> GREEK SMALL LETTER OMICRON WITH TONOS
+"\x03cd"	"y"	# <U03CD> GREEK SMALL LETTER UPSILON WITH TONOS
+"\x03ce"	"o"	# <U03CE> GREEK SMALL LETTER OMEGA WITH TONOS
+"\x03cf"	"&"	# <U03CF> GREEK CAPITAL KAI SYMBOL
+"\x03d0"	"b"	# <U03D0> GREEK BETA SYMBOL
+"\x03d1"	"th"	# <U03D1> GREEK THETA SYMBOL
+"\x03d2"	"Y`"	# <U03D2> GREEK UPSILON WITH HOOK SYMBOL
+"\x03d3"	"Y`"	# <U03D3> GREEK UPSILON WITH ACUTE AND HOOK SYMBOL
+"\x03d4"	"Y`"	# <U03D4> GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL
+"\x03d5"	"f"	# <U03D5> GREEK PHI SYMBOL
+"\x03d6"	"p"	# <U03D6> GREEK PI SYMBOL
+"\x03d7"	"&"	# <U03D7> GREEK KAI SYMBOL
+"\x03d8"	"Q"	# <U03D8> GREEK LETTER ARCHAIC KOPPA
+"\x03d9"	"q"	# <U03D9> GREEK SMALL LETTER ARCHAIC KOPPA
+"\x03da"	"6"	# <U03DA> GREEK LETTER STIGMA
+"\x03db"	"6"	# <U03DB> GREEK SMALL LETTER STIGMA
+"\x03dc"	"W"	# <U03DC> GREEK LETTER DIGAMMA
+"\x03dd"	"w"	# <U03DD> GREEK SMALL LETTER DIGAMMA
+"\x03de"	"90"	# <U03DE> GREEK LETTER KOPPA
+"\x03df"	"90"	# <U03DF> GREEK SMALL LETTER KOPPA
+"\x03e0"	"900"	# <U03E0> GREEK LETTER SAMPI
+"\x03e1"	"900"	# <U03E1> GREEK SMALL LETTER SAMPI
+"\x03e2"	"SH"	# <U03E2> COPTIC CAPITAL LETTER SHEI
+"\x03e3"	"sh"	# <U03E3> COPTIC SMALL LETTER SHEI
+"\x03e4"	"F"	# <U03E4> COPTIC CAPITAL LETTER FEI
+"\x03e5"	"f"	# <U03E5> COPTIC SMALL LETTER FEI
+"\x03e6"	"KH"	# <U03E6> COPTIC CAPITAL LETTER KHEI
+"\x03e7"	"kh"	# <U03E7> COPTIC SMALL LETTER KHEI
+"\x03e8"	"H"	# <U03E8> COPTIC CAPITAL LETTER HORI
+"\x03e9"	"h"	# <U03E9> COPTIC SMALL LETTER HORI
+"\x03ea"	"DJ"	# <U03EA> COPTIC CAPITAL LETTER GANGIA
+"\x03eb"	"dj"	# <U03EB> COPTIC SMALL LETTER GANGIA
+"\x03ec"	"GJ"	# <U03EC> COPTIC CAPITAL LETTER SHIMA
+"\x03ed"	"gj"	# <U03ED> COPTIC SMALL LETTER SHIMA
+"\x03ee"	"TI"	# <U03EE> COPTIC CAPITAL LETTER DEI
+"\x03ef"	"ti"	# <U03EF> COPTIC SMALL LETTER DEI
+"\x03f0"	"k"	# <U03F0> GREEK KAPPA SYMBOL
+"\x03f1"	"r"	# <U03F1> GREEK RHO SYMBOL
+"\x03f2"	"s"	# <U03F2> GREEK LUNATE SIGMA SYMBOL
+"\x03f3"	"j"	# <U03F3> GREEK LETTER YOT
+"\x03f4"	"TH"	# <U03F4> GREEK CAPITAL THETA SYMBOL
+"\x03f5"	"e"	# <U03F5> GREEK LUNATE EPSILON SYMBOL
+"\x03f6"	"e"	# <U03F6> GREEK REVERSED LUNATE EPSILON SYMBOL
+"\x03f7"	"SH"	# <U03F7> GREEK CAPITAL LETTER SHO
+"\x03f8"	"sh"	# <U03F8> GREEK SMALL LETTER SHO
+"\x03f9"	"S"	# <U03F9> GREEK CAPITAL LUNATE SIGMA SYMBOL
+"\x03fa"	"S"	# <U03FA> GREEK CAPITAL LETTER SAN
+"\x03fb"	"s"	# <U03FB> GREEK SMALL LETTER SAN
+"\x03fc"	"r"	# <U03FC> GREEK RHO WITH STROKE SYMBOL
+"\x03fd"	"S"	# <U03FD> GREEK CAPITAL REVERSED LUNATE SIGMA SYMBOL
+"\x03fe"	"S"	# <U03FE> GREEK CAPITAL DOTTED LUNATE SIGMA SYMBOL
+"\x03ff"	"S"	# <U03FF> GREEK CAPITAL REVERSED DOTTED LUNATE SIGMA SYMBOL
 "\x0401"	"YO"	# <U0401> CYRILLIC CAPITAL LETTER IO
 "\x0402"	"DJ"	# <U0402> CYRILLIC CAPITAL LETTER DJE
 "\x0403"	"G`"	# <U0403> CYRILLIC CAPITAL LETTER GJE
-- 
2.17.1
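
For readers who want to experiment with the rows added above, the following sketch parses lines in the simple `"\xNNNN" "replacement" # comment` shape used throughout this hunk into a Python dictionary. It is not the generator glibc uses at build time. The `parse_rows` name and the regular expression are assumptions made for this example, and the pattern deliberately handles only single-code-point sources with unescaped replacements.

```python
import re

# Matches simple rows such as:  "\x0398"  "TH"  # <U0398> GREEK CAPITAL LETTER THETA
# Assumption: one 4-digit code point per row, no escapes in the replacement.
ROW = re.compile(r'^"\\x([0-9a-fA-F]{4})"\s+"([^"\\]*)"\s*(?:#.*)?$')


def parse_rows(text):
    """Return {character: ASCII replacement} for every row that matches."""
    table = {}
    for line in text.splitlines():
        # Added and context lines in a diff start with "+" or " "; drop that.
        match = ROW.match(line.lstrip("+ ").rstrip())
        if match:
            table[chr(int(match.group(1), 16))] = match.group(2)
    return table


SAMPLE = r'''
+"\x0398" "TH" # <U0398> GREEK CAPITAL LETTER THETA
+"\x03b7" "i" # <U03B7> GREEK SMALL LETTER ETA
+"\x03c4" "t" # <U03C4> GREEK SMALL LETTER TAU
+"\x03b1" "a" # <U03B1> GREEK SMALL LETTER ALPHA
'''

table = parse_rows(SAMPLE)
word = "\u0398\u03b7\u03c4\u03b1"  # theta, eta, tau, alpha
print("".join(table.get(ch, ch) for ch in word))  # THita
```

Rows that do not fit the assumed shape (multi-character sources or escaped quotes, for instance) are silently skipped, so this is only suitable for inspecting the entries in this patch.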