Message ID | b82fe65b-b880-a2b5-c97d-2a6aae9c1165@kobylkin.com |
---|---|
State | New |
Series | [v9] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] |
Thank you for working on this, Egor. Before I start reviewing I would like to summarize the things which I think are blocking for this patch.

1. I think we need tests for transliteration. Currently there is only one test program which is similar to what we need, localedata/bug-iconv-trans.c. It is old and it is not quite clear what bug it is trying to test. Therefore I think we need a new framework to test transliteration. Is it a good idea to base the test on the iconv(1) command line utility which is part of glibc?

2. I made a few tests on the command line and it seems to me that the transliteration from "З" to "Z" (and lowercase as well) in uk_UA does not work, and has not worked for some time: I've checked some older systems as well and the result is always the same. I think the reason is that uk_UA defines multiple transliteration rules for "З" depending on the letter following it. It does not seem to work. AFAIK the reason is that the syntax of transliteration rules says that a single non-Latin character may map to one or more Latin strings, each consisting of one or more characters. There cannot be a rule transliterating multiple source characters into one or multiple destination characters. Is it a bug in the transliteration implementation? Or maybe in the specification, including the POSIX standard?

   The definition of transliteration says that it is a one-to-one mapping of graphemes, while a grapheme may be one or multiple characters. It does not always have to be a one-to-one mapping of characters. Should we fix this bug first, make uk_UA transliteration work, and only then add a generic Cyrillic transliteration? Egor's patch already contains a transliteration of "У" + combining acute accent to "Ú" which most probably will not work.

I still think that in the longer term all existing custom transliterations of Cyrillic alphabets should be ported to a modification of your patch.

Egor, while at this, I was thinking about your idea to transliterate letters like "Ш" (uppercase) to "SH" (always uppercase) in order to distinguish between "Шема" (-> "SHema") and "Схема" (-> "Shema" or "Sxema"). You also include a rule to transliterate "Х" to "H" or "X" depending on which destination characters are available. As I told you already, that will not work because both "H" and "X" are always available and therefore only the first rule will ever be used.

I still don't like the idea of putting two uppercase letters at the beginning of a titlecase word only to indicate that there was originally a single letter. What if we:

* drop the rule of transliterating "Х" to "H" and always transliterate to "X",
* transliterate uppercase "Ш" to "Sh" (so it will work fine for titlecase words)?

As a result the Latin letter "h" will only appear as part of a digraph and never as a transliteration of "Х", and therefore will never cause a conflict. Examples:

* "Шема" -> "Shema",
* "Схема" -> "Sxema".

Will this solve the problem?

Regards,

Rafal
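The collision discussed in this message can be checked mechanically. The following is only a minimal sketch: the tables are hypothetical, cover just the letters of the two example words, and are not the tables from the patch. The scheme names are made up for illustration.

```python
# Hypothetical, heavily reduced letter tables: only the letters needed for
# the two example words. They are NOT the tables from the patch.
COMMON = {"С": "S", "е": "e", "м": "m", "а": "a"}

SCHEMES = {
    # "Ш" -> "Sh" combined with "х" -> "h": the two example words collide.
    "Sh+h": {**COMMON, "Ш": "Sh", "х": "h"},
    # "Ш" -> "SH" (always uppercase) combined with "х" -> "h".
    "SH+h": {**COMMON, "Ш": "SH", "х": "h"},
    # Proposal from this message: "Ш" -> "Sh", "х" always -> "x".
    "Sh+x": {**COMMON, "Ш": "Sh", "х": "x"},
}


def transliterate(text, table):
    """Replace each character by its table entry; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


def collides(scheme_name, first="Шема", second="Схема"):
    """True if both words produce the same Latin output under the named scheme."""
    table = SCHEMES[scheme_name]
    return transliterate(first, table) == transliterate(second, table)


if __name__ == "__main__":
    for name in SCHEMES:
        print(name, collides(name))
```

Under these assumed tables only the "Sh+h" combination maps both words to the same string; the other two keep them apart, each at a different cost (double capitals in one case, giving up "h" for "Х" in the other).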
Hi Rafal,

thanks for putting the SH/Sh problem into a clear issue statement. I'm totally with you on this being a good thing to discuss. It is orthogonal to the tests, so let me focus on the SH/Sh and System A/B problems here.

Looks like we have three issues:
1. lack of explicit control over which transformation to use (System A or System B) via //TRANSLIT
2. possibility of collision for System B if CAP/low transcription is used for capital letters
3. Cyrillic 'Х'/'х' (ha) never transcribes to 'H'/'h' as it should per System B because its equivalent 'X'/'x' from System A is always present and takes precedence.

As a solution, shouldn't we only keep System B in a new file transcribe_cyrillic and put it in place as the explicit ASCII transcription for targeted locales (as opposed to transliteration)?

We would keep System A as translit_cyrillic but won't include it in this patch. Once you have resolved the issue of having two conflicting rule-sets but only one key //TRANSLIT you could add System A back.

The SH/Sh question can be decided either way - seems like an easy change anyway.

Please see more discussion on your excellent points below:

On 16.11.18 23:17, Rafal Luzynski wrote:
> Egor, while at this I was thinking about your idea to transliterate
> letters like "Ш" (uppercase) to "SH" (always uppercase) in order to
> distinguish between "Шема" (-> "SHema") and "Схема" (-> "Shema" or
> "Sxema").

To clarify, this SH/Sh collision issue relates only to

iconv -f UTF-8 -t ASCII//TRANSLIT

(i.e. System B transcription).

But it's not only SH/Sh; the following combinations are used to transcribe capital letters:

YO, DJ, YE, TSH, DH, ZH, CZ, CH, SH, SHH, YU, YA, FH, YH, GH, NG, TCZ

Arguably any of them (if not in that CAP/CAP form) could collide with their CAP/low equivalent from a different word. (There may be language grammar rules that in fact prevent some of these collisions, but we don't know for sure.)

With transcription we are basically stripping information from the data, mapping it into a smaller character set. The idea of keeping them in CAP/CAP is to try to preserve as much information as possible.

> Also you include a rule to transliterate "Х" to "H" or "X" depending
> on which destination characters are available, which I told you
> already that will not work because both "H" and "X" are always
> available and therefore only the first rule will always be used.

Just to have this here for reference, the idea was to have both rules in one file so that

iconv -f UTF-8 -t ASCII//TRANSLIT

will produce ASCII-compatible _transcription_ (System B), and

iconv -f UTF-8 -t ISO-8859-15//TRANSLIT | iconv -f ISO-8859-15 -t UTF-8

will produce Latin _transliteration_ as per ISO 9:1995 (System A).

So in fact we have two rules for each letter in the same file (System A and System B), where System A takes precedence.

I have a question then: isn't this more like a hack than the right thing to do?

Shouldn't we have two explicit rules for transcription and transliteration that do not depend on the destination character set?

> I still don't like the idea to
> put two uppercase letters in a beginning of a word in titlecase only
> to indicate that there was originally a single letter. What if we:
>
> * drop the rule of transliterating "Х" to "H" and transliterate
> always to "X",

This would contradict ISO 9:1995 (System A).
System A was added on Marko's request (so setting him on TO:). I am neutral on keeping it or dropping it, just to be clear.

> * transliterate uppercase "Ш" to "Sh" (so it will work fine for
> titlecase words)?
>
> As a result the Latin letter "h" will only appear as part of a
> digraph and never as a transliteration of "Х" and therefore will
> never cause a conflict. Examples:
>
> * "Шема" -> "Shema", * "Схема" -> "Sxema".
>
> Will this solve the problem?

This particular rule with h/x would make sense on its own.
But again - it would contradict the standards.
On the other hand, for my personal needs I care less about standards than about current functionality and the data loss caused by transcription being missing altogether, which is BZ #2872.

Bests,
Egor
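The precedence behaviour described in this message (several candidate replacements per letter, the first one representable in the target charset wins) can be sketched as follows. This is an illustration of the behaviour as described in the thread, not glibc's implementation; the rule lists and the function name are hypothetical, and Python's codecs stand in for the iconv target charsets.

```python
def first_representable(candidates, charset):
    """Return the first candidate that can be encoded in `charset`, else None."""
    for candidate in candidates:
        try:
            candidate.encode(charset)
        except UnicodeEncodeError:
            continue  # not representable in the target, try the next rule
        return candidate
    return None


# Hypothetical rule lists, in the order they would appear in one table.
RULES = {
    "Ш": ["Š", "SH"],  # diacritic form first, ASCII digraph as fallback
    "Х": ["X", "H"],   # both candidates are plain ASCII
}

if __name__ == "__main__":
    for letter, candidates in RULES.items():
        for charset in ("ascii", "iso8859_15"):
            print(letter, charset, first_representable(candidates, charset))
```

For "Ш" the target charset selects between the two candidates, which is the effect the single-file approach relies on. For "Х" both candidates are ASCII, so the second one can never be reached regardless of the target, which is issue 3 in the list above.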
Hi,

On 17/11/2018 20.34, Egor Kobylkin wrote:
>
> Looks like we have three issues:
> 1. lack of explicit control which transformation to use (System A or
> System B) via //TRANSLIT
> 2. possibility of collision for System B if used CAP/low transcription
> for capital letters
> 3. Cyrillic 'Х'/'х' (ha) never transcribes to 'H'/'h' as it should per
> System B because its equivalent 'X'/'x' from System A is always present
> and takes precedence.
>
> As a solution shouldn't we only keep System B in a new file
> transcribe_cyrillic and put it in place as the explicit ASCII
> transcription for targeted locales (as opposed to transliteration)?
>
> We would keep System A as translit_cyrillic but won't include it into
> this patch. Once you have resolved an issue of having two conflicting
> rule-sets but only one key //TRANSLIT you could add the System A back.
>
> The SH/Sh can be decided on either way - seems like an easy change any way.
>
> I have a question then: isn't this more like a hack than a right thing
> to do?
>
> Shouldn't we have two explicit rules for transcription and
> transliteration not dependent on a destination character set?
>
> This would contradict ISO 9:1995 (System A).
> System A was added on Marko's request (so setting him on TO:) I am
> neutral on keeping it or dropping it, just to be clear.
>
> This particular rule with h/x would make sense on its own.
> But again - it would contradict the standards.
> On the other hand, for my personal needs I care less about standards but
> about current functionality and data loss because of missing
> transcription altogether due to the BZ #2872.

Given the number of questions above I think the way forward is to try to follow the relevant standards as closely as possible and also check what other implementations (e.g., uconv(1)) do. For example, checking the case mentioned earlier may or may not give some hints:

$ echo Шема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
Šema
$ echo Схема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
Shema
$ uconv -V
uconv v2.1 ICU 50.1.2

Thanks,
On 19.11.18 08:13, Marko Myllynen wrote:
> Hi,
>
> On 17/11/2018 20.34, Egor Kobylkin wrote:
>>
>> Shouldn't we have two explicit rules for transcription and
>> transliteration not dependent on a destination character set?
>>
>> This would contradict ISO 9:1995 (System A).
>> System A was added on Marko's request (so setting him on TO:) I am
>> neutral on keeping it or dropping it, just to be clear.
>>
>> This particular rule with h/x would make sense on its own.
>> But again - it would contradict the standards.
>> On the other hand, for my personal needs I care less about standards but
>> about current functionality and data loss because of missing
>> transcription altogether due to the BZ #2872.
>
> Given the amount of questions above I think the way forward is to try
> follow the relevant standards as closely as possible and also check what
> the other implementations (i.e., uconv(1)) do. For example, checking the
> case earlier mentioned case may or may not give some hints:
>
> $ echo Шема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
> Šema
> $ echo Схема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
> Shema
> $ uconv -V
> uconv v2.1 ICU 50.1.2

Marko,

Your example only covers _transliteration_ to Latin with diacritics:

iconv -f UTF-8 -t ISO-8859-15//TRANSLIT \
| iconv -f ISO-8859-15 -t UTF-8

while BZ #2872 is about _transcription_ to ASCII:

iconv -f UTF-8 -t ASCII//TRANSLIT

The glibc wiki explicitly lists this use case (ASCII) as the test example:
https://sourceware.org/glibc/wiki/Locales#Testing_Locales

So again, you are asking to have ISO 9:1995 System A but the bug is about ISO 9:1995 System B (GOST 7.79-2000).

Bests,
Egor
Hi,

On 19/11/2018 11.21, Egor Kobylkin wrote:
> On 19.11.18 08:13, Marko Myllynen wrote:
>> On 17/11/2018 20.34, Egor Kobylkin wrote:
>
> Your example only covers _transliteration_ to Latin Diacritics
> iconv -f UTF-8 -t ISO-8859-15//TRANSLIT \
> | iconv -f ISO-8859-15 -t UTF-8
>
> while BZ #2872 is about _transcription_ to ASCII
> iconv -f UTF-8 -t ASCII//TRANSLIT

AFAICS v9 (unlike v10) supported both of the above cases.

> The glibc wiki explicitly lists this use case (ASCII) as the test
> example https://sourceware.org/glibc/wiki/Locales#Testing_Locales

I wrote that section and I certainly wasn't considering Cyrillic aspects at that time (IIRC it was written even before Mike did the major update for transliteration rules at the end of 2015). The context back then was mostly about handling Latin letters like Å, Ä, Ö, Ø, etc.

> So again, you are asking to have ISO 9:1995 System A but the bug is
> about ISO 9:1995 System B (GOST 7.79-2000)

We can certainly decide here what the best course of action is; we do not have to slavishly follow some old bug report when deciding the direction for the implementation. But I think I've made my position clear by now so I'm not going to repeat it anymore.

In any case, once your patch lands I'm going to submit a follow-up patch for fi_FI to make it compliant with the applicable national standard (SFS 4900), which defines how to do Cyrillic transliteration / transcription in the context of Finnish.

Thanks,
19.11.2018 08:13 Marko Myllynen <myllynen@redhat.com> wrote:
> [...]
> Given the amount of questions above I think the way forward is to try
> follow the relevant standards as closely as possible and also check what
> the other implementations (i.e., uconv(1)) do. For example, checking the
> case earlier mentioned case may or may not give some hints:
>
> $ echo Шема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
> Šema
> $ echo Схема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
> Shema
> $ uconv -V
> uconv v2.1 ICU 50.1.2

I've played a little with uconv and unfortunately it does not look good to me.

It does not have any fallback transliteration to plain ASCII. When it says that 'Ш' is transliterated to 'Š' then it always uses 'Š', and if the target charset does not have this character then the conversion fails:

$ echo Шема | uconv -f UTF-8 -t ASCII -x cyrillic-latin
Conversion from Unicode to codepage failed at output byte position 0.
Unicode: 0160 Error: Invalid character found
$ echo Шема | uconv -f UTF-8 -t ISO-8859-1 -x cyrillic-latin
Conversion from Unicode to codepage failed at output byte position 0.
Unicode: 0160 Error: Invalid character found
$ echo Шема | uconv -f UTF-8 -t ISO-8859-2 -x cyrillic-latin
�ema
$ echo Шема | uconv -f UTF-8 -t ISO-8859-2 -x cyrillic-latin | uconv -f ISO-8859-2 -t UTF-8
Šema

It seems to follow ISO 9 (GOST 7.79) System A. However, the transliteration of the hard sign is rather strange:

$ echo нъе | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
nʺe

The above was correct but:

$ echo НЪЕ | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
Nʺ̱E
$ echo Ъ | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
ʺ̱
$ echo Ъ | uconv -f UTF-8 -t UTF-16 -x cyrillic-latin| hexdump -x
0000000 feff 02ba 0331 000a
0000008

So this generates:
02BA MODIFIER LETTER DOUBLE PRIME
0331 COMBINING MACRON BELOW

There are more transliteration methods, for example Russian-Latin/BGN:

$ echo Шема | uconv -f UTF-8 -t UTF-8 -x Russian-Latin/BGN
Shema
$ echo Схема | uconv -f UTF-8 -t UTF-8 -x Russian-Latin/BGN
Skhema

Converting 'х' to 'kh' seems to be common in English transliteration but it does not follow any ISO standard.

$ echo ХА ха | uconv -f UTF-8 -t UTF-8 -x Russian-Latin/BGN
KHA kha

This means that the choice of whether a digraph in the output should be all uppercase or upper+lower is context based, something which we probably cannot implement. But definitely a good thing.

Two more tests:

$ echo Ещё | uconv -f UTF-8 -t UTF-8 -x Russian-Latin/BGN
Yeshchë
$ echo Ещё | uconv -f UTF-8 -t ASCII -x Russian-Latin/BGN
Conversion from Unicode to codepage failed at output byte position 6.
Unicode: 00eb Error: Invalid character found

So the output is not plain ASCII.

$ echo е же ле не | uconv -f UTF-8 -t ASCII -x Russian-Latin/BGN
ye zhe le ne

Again this means that the transliteration of 'е' is context based: it is 'ye' at the beginning of a word and 'e' otherwise.

The version which I've tested:

$ uconv -V
uconv v2.1 ICU 60.2

It seems that uconv will not be a good hint about transliterating to plain ASCII.

Also, the difference between uconv and iconv is that in iconv we can provide multiple transliterations for any source character, but we can't group them into standards, so we can't tell iconv to use this or another system. It will just choose the one best fitting the current output character set, and the only thing we can choose is the locale.

This makes me think: should we add a locale like ru_RU@SystemA or ru_RU@SystemB?

Regards,

Rafal
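The context-based casing seen in the Russian-Latin/BGN output above ("ХА" -> "KHA", "ха" -> "kha") could be approximated by a rule like the one below. This is a guess at a plausible rule, not ICU's actual algorithm: the table is a hypothetical fragment covering only the letters used in the examples, and the "uppercase neighbour" test is an assumption.

```python
# Hypothetical BGN-like fragment: only the letters used in the examples.
LOWER = {"х": "kh", "ш": "sh", "а": "a", "е": "e", "м": "m"}


def transliterate_contextual(text):
    """Transliterate, choosing digraph casing from the neighbouring letters."""
    out = []
    for i, ch in enumerate(text):
        latin = LOWER.get(ch.lower())
        if latin is None:
            out.append(ch)  # not in the table: copy unchanged
            continue
        if not ch.isupper():
            out.append(latin)
            continue
        prev_upper = i > 0 and text[i - 1].isupper()
        next_upper = i + 1 < len(text) and text[i + 1].isupper()
        # Assumed rule: all caps only when a neighbouring letter is uppercase
        # too, otherwise capitalise just the first letter of the digraph.
        out.append(latin.upper() if prev_upper or next_upper else latin.capitalize())
    return "".join(out)


if __name__ == "__main__":
    print(transliterate_contextual("ХА ха"))
    print(transliterate_contextual("Шема"))
```

The point of the sketch is that the decision needs to look at adjacent characters, which a table of per-character rules cannot express.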
On 01.12.18 23:07, Rafal Luzynski wrote:
>
> Also, the difference between uconv and iconv is that we can provide
> multiple transliterations for any source character but we can't group
> them into standards so we can't tell iconv to use this or another
> system. It will just choose the best fitting the current output
> character set and the only thing we can choose is the locale.
>
> This makes me think: should we add a locale like ru_RU@SystemA or
> ru_RU@SystemB?

Wouldn't that require creating 3 versions of every locale that includes the translit_cyrillic file? I.e. en_US + en_US@SystemA, en_US@SystemB etc.? This in turn would make two of them optional (as Cyrillic fonts are at the moment).

The highest value is in having the default locale being able to transliterate, isn't it? So putting the transliteration into optional locales kind of defeats the purpose.

An example from my experience as a user: a networked device or host would often have en_US as the default (only?) locale with no viable way to change it or install Cyrillic fonts. This is the most dire situation, and the one where ASCII transliteration helps most. Having en_US@SystemA or en_US@SystemB theoretically available but not compiled by the distributor wouldn't help here, would it?

So the only useful scenario here would be to ship your locales with the transliteration already included by default in en_US. This way the distributor won't have to take action to include transliteration as en_US@SystemA or en_US@SystemB.

From my (however limited) point of view it is better to get System B in first, then see if some code needs to be changed to accommodate the System A/System B problem. Again, System B is _transcription_ to ASCII and System A is _transliteration_ to Latin, with different use cases.

It's insightful to see your comparison of uconv vs. iconv!
Similar to your checks, this is what I was using to see whether any locale fails the transliteration for any Cyrillic letter:

echo "ЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУУ́ФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуу́фхцчшщъыьэюяёђѓєѕіїјљњћќўџѪѫѲѳѴѵҌҍ ҐґҒғҔҕҖҗҚқҞҟҢңҤҥҦҧҨҩҪҫҬҭҮүҲҳҴҵҺһҼҽҾҿӀӁӂӋӌӐӑӒӓӖӗӘәӜӝӞӟӠӡӤӥӦӧӨөӰӱӲӳӴӵӸӹ’"|
LOCPATH=$workdir/compiled_locales/"$locale"/ LC_ALL="$locale".UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT

should give (can be asserted with bash string comparison):

AaOoUussYODJG`YeZ`IYiJL`N`TSHK`U`DhABVGDEZHZIJKLMNOPRSTUUFHCCHSHSHHA`Y`E`YUYAabvgdezhzijklmnoprstuufhcchshshh``y`e`yuyayodjg`yez`iyijl`n`tshk`u`dhO`o`FhfhYhyhE`e` G`g`GHghGHghZH`zh`K`k`K`k`N`n`NGngP`p`O`o`C`C`T`t`UuH`h`TCZtczSH`SH`CH`ch`CH`ch`iZH`zh`CH`ch`A`a`A`a`E`e`A`a`ZH`zh`Z`z`Z`z`I`i`O`o`O`o`U`u`U`u`CH`ch`Y`y`'

And I am attaching another file that has the Unicode code points next to the letters for easier identification of failures (like "U0401-Ё U0402-Ђ U0403-Ѓ" etc.).

Hope it will be helpful in creating the tests.

Best regards,
Egor Kobylkin

CYRILLIC RUSSIAN
Съешь ещё этих мягких французских булок, да выпей же чаю.
СЪЕШЬ ЕЩЁ ЭТИХ МЯГКИХ ФРАНЦУЗСКИХ БУЛОК? ДА ВЫПЕЙ ЖЕ ЧАЮ!
CYRILLIC COMPLETE U0401-Ё U0402-Ђ U0403-Ѓ U0404-Є U0405-Ѕ U0406-І U0407-Ї U0408-Ј U0409-Љ U040A-Њ U040B-Ћ U040C-Ќ U040E-Ў U040F-Џ U0410-А U0411-Б U0412-В U0413-Г U0414-Д U0415-Е U0416-Ж U0417-З U0418-И U0419-Й U041A-К U041B-Л U041C-М U041D-Н U041E-О U041F-П U0420-Р U0421-С U0422-Т U0423-У U0423 0301-У́ U0424-Ф U0425-Х U0426-Ц U0427-Ч U0428-Ш U0429-Щ U042A-ъ U042B-Ы U042C-ь U042D-Э U042E-Ю U042F-Я U0430-а U0431-б U0432-в U0433-г U0434-д U0435-е U0436-ж U0437-з U0438-и U0439-й U043A-к U043B-л U043C-м U043D-н U043E-о U043F-п U0440-р U0441-с U0442-т U0443-у U0443 0301-у́ U0444-ф U0445-х U0446-ц U0447-ч U0448-ш U0449-щ U044A-Ъ U044B-ы U044C-Ь U044D-э U044E-ю U044F-я U0451-ё U0452-ђ U0453-ѓ U0454-є U0455-ѕ U0456-і U0457-ї U0458-ј U0459-љ U045A-њ U045B-ћ U045C-ќ U045E-ў U045F-џ U046A-Ѫ U046B-ѫ U0472-Ѳ U0473-ѳ U0474-Ѵ U0475-ѵ U048C-Ҍ U048D-ҍ U0490-Ґ U0491-ґ U0492-Ғ U0493-ғ U0494-Ҕ U0495-ҕ U0496-Җ U0497-җ U049A-Қ U049B-қ U049E-Ҟ U049F-ҟ U04A2-Ң U04A3-ң U04A4-Ҥ U04A5-ҥ U04A6-Ҧ U04A7-ҧ U04A8-Ҩ U04A9-ҩ U04AA-Ҫ U04AB-ҫ U04AC-Ҭ U04AD-ҭ U04AE-Ү U04AF-ү U04B2-Ҳ U04B3-ҳ U04B4-Ҵ U04B5-ҵ U04BA-Һ U04BB-һ U04BC-Ҽ U04BD-ҽ U04BE-Ҿ U04BF-ҿ U04C0-Ӏ U04C1-Ӂ U04C2-ӂ U04CB-Ӌ U04CC-ӌ U04D0-Ӑ U04D1-ӑ U04D2-Ӓ U04D3-ӓ U04D6-Ӗ U04D7-ӗ U04D8-Ә U04D9-ә U04DC-Ӝ U04DD-ӝ U04DE-Ӟ U04DF-ӟ U04E0-Ӡ U04E1-ӡ U04E4-Ӥ U04E5-ӥ U04E6-Ӧ U04E7-ӧ U04E8-Ө U04E9-ө U04F0-Ӱ U04F1-ӱ U04F2-Ӳ U04F3-ӳ U04F4-Ӵ U04F5-ӵ U04F8-Ӹ U04F9-ӹ U2019-’ GREEK Ελληνικό Ίδρυμα Ευρωπαϊκής και Εξωτερικής. GERMAN Zwölf Boxkämpfer jagen Victor quer über den großen Sylter Deich. FRENCH Dès Noël où un zéphyr haï me vêt de glaçons würmiens je dîne d’exquis rôtis de bœuf au kir à l’aÿ d’âge mûr \& cætera. SPANISH El veloz murciélago hindú comía feliz cardillo y kiwi, la cigüeña tocaba el saxofón detrás del palenque de paja. END
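A labelled list in the "U0401-Ё" style of the attachment, and a check of actual against expected output, could be produced with helpers along these lines. This is only a sketch for building such tests: both function names are hypothetical, and running iconv itself is left out, so the helpers work on strings that have already been captured.

```python
import unicodedata


def label_codepoints(text):
    """Label each base character, plus any combining marks after it, with its code points."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the combining mark to the previous base character
        else:
            clusters.append(ch)
    return " ".join(
        "U" + " ".join(f"{ord(c):04X}" for c in cluster) + "-" + cluster
        for cluster in clusters
    )


def first_mismatch(actual, expected):
    """Return the index of the first differing character, or None if the strings are equal."""
    for i, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            return i
    if len(actual) != len(expected):
        return min(len(actual), len(expected))
    return None


if __name__ == "__main__":
    print(label_codepoints("ЁЂЃ"))
    print(first_mismatch("YODJ?", "YODJG"))
```

Reporting the position of the first mismatch, rather than only pass or fail, makes it easier to see which letter a given locale fails to transliterate.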
Rafal,

Just to touch base on this, what is the best way forward? Did you get any input/feedback on your questions below? Are you expecting input from anyone but myself?

On the blocking issue #2: I really don’t see the connection to the uk_UA locale, which has its transliteration table inline and is explicitly excluded from my patch. It may be revealing another issue you have with glibc, but wouldn’t that be better addressed in a new bug? Again, in v10 of my patch I have removed multicharacter source graphemes, so that issue is moot there.

If you’d like to overhaul the glibc translit system, wouldn’t it be better to commit the simple text file with the Cyrillic translit (transcription) table first, fix the bug from the year 2006, and then proceed from there with all due diligence?

The same goes for having both System A and System B. Initially I went along with the suggestion to include System A, but it is clear now that it doesn’t make fixing [BZ #2872] more straightforward. So I’d also propose to set it aside for the moment and use v10 without System A. That is the whole reason I have submitted it, to be super clear on that.

Now that you have seen that uconv transcribes «ХА» as KHA (cap/cap/cap), that should mitigate your concern about that issue too (somewhat, anyway). Making it context based would also mean adding new code, see above.

Let me know if there’s anything I can help with to make progress on the decision.

Bests,
Egor

On 16.11.18 23:17, Rafal Luzynski wrote:
> 2. I made few tests in the command line and it seems to me that the
> transliteration from "З" to "Z" (+ lowercase as well) in uk_UA does
> not work and has not been working for some time already because I've
> checked some older systems as well and the result is always the same.
> I think that the reason is that uk_UA defines multiple
> transliteration rules for "З" depending on what is the letter
> following it. It does not seem to work.
17.11.2018 19:34 Egor Kobylkin <egor@kobylkin.com> wrote:
> [...]
> Looks like we have three issues:
> 1. lack of explicit control which transformation to use (System A or
> System B) via //TRANSLIT
> 2. possibility of collision for System B if used CAP/low transcription
> for capital letters
> 3. Cyrillic 'Х'/'х' (ha) never transcribes to 'H'/'h' as it should per
> System B because its equivalent 'X'/'x' from System A is always present
> and takes precedence.

True.

> As a solution shouldn't we only keep System B in a new file
> transcribe_cyrillic and put it in place as the explicit ASCII
> transcription for targeted locales (as opposed to transliteration)?
>
> We would keep System A as translit_cyrillic but won't include it into
> this patch. Once you have resolved an issue of having two conflicting
> rule-sets but only one key //TRANSLIT you could add the System A back.

Sounds like a good idea to provide those two files:

* translit_cyrillic_system_a,
* translit_cyrillic_system_b,

(or any other pair of names) and let the individual locales choose whether they want to include System A or System B. As an optimization, the system_b file could include system_a and modify it.

> The SH/Sh can be decided on either way - seems like an easy change any
> way.

I'm in favor of "Sh" because it will work fine for titlecased words (where only the first letter is uppercase), but I'm aware it would be a problem for uppercased words. Unfortunately, I think we are unable to satisfy both cases.

> On 16.11.18 23:17, Rafal Luzynski wrote:
>
> > Egor, while at this I was thinking about your idea to transliterate
> > letters like "Ш" (uppercase) to "SH" (always uppercase) in order to
> > distinguish between "Шема" (-> "SHema") and "Схема" (-> "Shema" or
> > "Sxema").
>
> to clarify, this SH/Sh collision issue relates only to iconv -f UTF-8 -t
> ASCII//TRANSLIT (i.e. System B transcription).

True.

> But it's not only SH/Sh, there are following combinations used to
> transcribe capital letters:
>
> YO, DJ, YE, TSH, DH, ZH, CZ, CH, SH, SHH, YU, YA, FH, YH, GH, NG, TCZ

Absolutely true. I skipped the whole list only for brevity: if we find a solution for one letter, the same solution will work fine for all the others.

> [...]
> With transcription we are basically stripping information from the data,
> mapping it into a smaller character set. The idea to keep them in
> CAP/CAP is to try to preserve as much information as possible.

I'm only afraid that things like "TWo CApitals" or "CamelCase" are common among us computer geeks, while they do not look great when working with natural language and when displaying them to regular users and even non-computer people.

> [...]
> So in fact we have two rules for each letter in the same file (System A
> and System B), where System A takes precedence.
>
> I have a question then: isn't this more like a hack than a right thing
> to do?
>
> Shouldn't we have two explicit rules for transcription and
> transliteration not dependent on a destination character set?

It's impossible with the current API of iconv. Maybe it will become possible in the future, but that's a greater amount of work than what we are doing here now. Again, for now a different set of rules = a different locale.

I have another question: is it really the job of transliteration to preserve all original information, to ensure no collisions and to provide the ability to restore the original text? I'm afraid that as long as plain ASCII is the destination charset, whatever system we provide, it will always be possible to construct a malicious combination of Cyrillic characters proving that the system generates collisions.

> > I still don't like the idea to
> > put two uppercase letters in a beginning of a word in titlecase only
> > to indicate that there was originally a single letter.
> > What if we:
> >
> > * drop the rule of transliterating "Х" to "H" and transliterate
> > always to "X",
> This would contradict ISO 9:1995 (System A).

Yes, it would. I'm trying to find a solution here since I think we have proved that we can't implement a system which will handle System A, System B, and ensure no collisions at the same time. At least one requirement must be dropped (at least partially).

> System A was added on Marko's request (so setting him on TO:) I am
> neutral on keeping it or dropping it, just to be clear.

I think I didn't see this request from Marko, but I'm in favor of keeping System A, too. Marko, it would be good to hear your opinion about System A vs. System B again.

> [...]
> On the other hand, for my personal needs I care less about standards but
> about current functionality and data loss because of missing
> transcription altogether due to the BZ #2872.

I read this as saying that you are open to a solution which is inspired by some standards but does not implement them fully due to our technical limitations.

19.11.2018 10:21 Egor Kobylkin <egor@kobylkin.com> wrote:
> [...]
> Marko,
>
> Your example only covers _transliteration_ to Latin Diacritics
> [...]
> while BZ #2872 is about _transcription_ to ASCII
> [...]
>
> So again, you are asking to have ISO 9:1995 System A but the bug is
> about ISO 9:1995 System B (GOST 7.79-2000)

It's hard to say what the original bug reporter meant, but I think the problem is that there is no transliteration from Cyrillic to any variant of Latin, except in a few locales. If System A was implemented but System B was not, then at least some characters would be handled correctly. Currently no Cyrillic characters are handled.

19.11.2018 20:35 Marko Myllynen <myllynen@redhat.com> wrote:
> [...]
> In any case once your patch lands I'm going to submit a follow-up patch > for fi_FI to make it compliant with the applicable national standard > (SFS 4900) which defines how to do Cyrillic transliteration / > transcription in the context Finnish. I totally agree. As far as I can see, SFS 4900 is more similar to System A (ISO 9) rather than System B, that is, it transliterates to Latin characters with diacritics rather than plain ASCII. Marko, what is your opinion about possible implementation of SFS 4900 in these cases: * When the destination charset does not contain required Latin diacritic characters (e.g., it is plain ASCII)? * When the output is ambiguous, that means, when two different Cyrillic strings produce the same Latin (or ASCII) output? At the moment I am not curious about SFS 4900 but we are facing the same problems now with ISO 9 and GOST 7.79. 1.12.2018 23:07 Rafal Luzynski <digitalfreak@lingonborough.com> wrote: > [...] > $ echo ХА ха | uconv -f UTF-8 -t UTF-8 -x Russian-Latin/BGN > KHA kha > > This means that the choice whether a digraph in the output should be > all uppercase or maybe upper+lower is context based, something which we > probably cannot implement. But definitely a good thing. I forgot to include this test which is really interesting: $ echo ХА Ха ха | uconv -f UTF-8 -t UTF-8 -x Russian-Latin/BGN KHA Kha kha which again confirms that the choice of all uppercase or just the first letter uppercased is context based, a thing which we can't implement now. 1.12.2018 23:53 Egor Kobylkin <egor@kobylkin.com> wrote: > > On 01.12.18 23:07, Rafal Luzynski wrote: > > > > [...] > > This makes me think: should we add a locale like ru_RU@SystemA or > > ru_RU@SystemB? > > Wouldn't it require to create 3 versions of every locale that would > include the translit_cyrillic file then? I.e. en_US + en_US@SystemA, > en_US@SystemB etc.? OK, please read this as another brainstorming idea and let's just forget it. > [...] 
> An example from my experience as a user - a networked device or host > would often have the en_US as the default (only?) locale with no viable > way to change it or install cyrillic fonts. Anyway, this is the most > dire situation where the ASCII transliteration certainly helps most. > Having en_US@SystemA or en_US@SystemB theoretically available but not > compiled by the distributor wouldn't help here, would it? > > So the only useful scenario here would be to ship your locales with the > transliteration already included by default in en_US. This way the > distributor won't have to get active to include transliteration as > en_US@SystemA or en_US@SystemB. Having the idea of "@SystemA" and "@SystemB" dropped I don't think implementing any solution in glibc would be helpful for your use case. Two reasons: 1. I believe that sooner or later someone will develop a transliteration system for en_US which will follow English transliteration of Russian instead of any standard we are discussing here. That means, it would transliterate 'Х' as 'Kh' rather than 'H' or 'X'. 2. Currently there is a trend not to install even en_US locales and leave only C which is hardcoded into glibc binaries. OTOH, I wouldn't mind if ISO 9 was hardcoded into C as well. 3. That's beyond Russian language but transliteration according to Serbian or Bulgarian or Ukrainian or Kazakh rules still requires installing their proper locales. I think that requiring ru_RU to be installed could be reasonable especially if we end up with ru_RU somehow differing from the default "translit_cyrillic". BTW you don't need Cyrillic fonts to be installed on your server in order to process the Cyrillic text correctly unless your server renders the text. 3.12.2018 23:19 Egor Kobylkin <egor@kobylkin.com> wrote: > > Rafal, > > Just to touch base on this, what is the best way forward? Did you get > any input/feedback on your questions below? Are you expecting input from > anyone but myself? 
Yes, I expected some input from more experienced maintainers about whether and how to write the tests but I'd rather start another thread about it because this one is too long already. > On the blocking issue #2: I really don’t see the connection to the uk_UA > locale that has its transliteration table inline and is explicitly > excluded from my patch. It may be revealing another issue you have with > glibc but wouldn’t that be better addressed in a new bug? OK, I was not precise enough (I'm sorry about it) so I'd like to explain here: 1. In the long term goal I would like to convert those excluded locales to use your translit_cyrillic as well. 2. In order to ensure that change is not destructive for them I will need automatic tests to prove that their transliteration rules work the same good before the change and after the change. 3. It does not matter that converting those other locales is in a distant future because we need the same tests for Russian language now. 4. Even although I have not started writing any tests I can see they will be failing for uk_UA. The reason is that glibc transliteration rules can handle transliterating single characters into single characters, single characters into multiple characters but not multiple characters into multiple (or even single) characters. 5. We can ignore uk_UA but we will face the same case in ru_RU where you had a case of 'У́ ' ('У' + 'COMBINING ACUTE ACCENT'). 6. So the question was: how (and whether) to write the tests if we already know they would be failing? Skip them? Resolve the other issue first? Mark them as XFAIL? In the meantime, you have removed the controversial conversion rule of 'У' with the acute accent: > Again, in the v10 of my patch I have removed multicharacter source > graphemes, so that issue is moot there. so we can move to the next step. 
> If you’d like to overhaul the glibc translit system wouldn’t it be > better to commit the simple text file with the Cyrillic > translit(transcription) table first, fix the bug from the year 2006 and > then proceed from there all due diligence? I agree and we are now one step forward. > The same with having both System A and System B. Initially I went along > with the suggestion to include the system A but it is clear now that it > doesn’t make fixing [BZ #2872] more straightforward. So I’d also propose > to set it aside for the moment and use the v10 without the system A. > That is the whole reason I have submitted it, to be superclear on that. OK, I think that now I understand your reason to drop System A better. But still I'd like to rethink implementing System A somehow and drop (or rather: implement only partially) System B. > Now you saw that uconv is transcribing «ХА» as KHA (cap/cap/cap) that > should mitigate your concern about that issue too (somewhat, anyway). > Making it context based would also be about adding new code, see above. It would also require the changes in the syntax of the source code of locale data and possibly breaking the POSIX compatibility which I think would be unacceptable. > Let me know if there’s anything I can help with getting more progress > with the decision I'm afraid you can't help more. I'd like to hear some feedback from other people. Due to some minor obstacles we can't resolve this issue being only two here. Regards, Rafal
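For readers following the SH/Sh question, the context-based casing seen in the uconv output above ("ХА Ха ха" giving "KHA Kha kha") can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-entry table and the function name `transcribe` are hypothetical, the rule "use all caps when a neighbouring letter is uppercase" is a guess that happens to reproduce that single example, and nothing here claims to be how ICU or glibc work internally.

```python
# Hypothetical two-letter table, lowercase only; not taken from the patch.
TABLE = {"х": "kh", "а": "a"}


def transcribe(text):
    """Transcribe text, choosing the case of digraphs from the context."""
    out = []
    for i, ch in enumerate(text):
        latin = TABLE.get(ch.lower())
        if latin is None:
            # Characters outside the table (spaces, ASCII, ...) pass through.
            out.append(ch)
            continue
        if ch.isupper():
            # Assumed rule: an uppercase neighbour means an all-caps word.
            prev_upper = i > 0 and text[i - 1].isupper()
            next_upper = i + 1 < len(text) and text[i + 1].isupper()
            if prev_upper or next_upper:
                latin = latin.upper()
            else:
                latin = latin.capitalize()
        out.append(latin)
    return "".join(out)


print(transcribe("ХА Ха ха"))
```

The lookup needs the characters before and after the current one, which is exactly the kind of context that, as noted in the thread, the current locale source syntax cannot express.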
Hi,

On 08/12/2018 03.15, Rafal Luzynski wrote:
> 17.11.2018 19:34 Egor Kobylkin <egor@kobylkin.com> wrote:
>>
>> The SH/Sh can be decided on either way - seems like an easy change any
>> way.
>
> I'm in favor of "Sh" because it will work fine for titlecased words
> (where only the first letter is uppercase) but I'm aware it would be
> a problem for uppercased words. Unfortunately, I think we are unable
> to satisfy both cases.

I think I'm in favor of "Sh" as well, although not perfect I'd assume
it's probably going to be correct in more cases than SH.

>> System A was added on Marko's request (so setting him on TO:) I am
>> neutral on keeping it or dropping it, just to be clear.
>
> I think I didn't see this Marko's request but I'm in favor of keeping
> System A, too.
>
> Marko, it would be good to hear your opinion about System A vs. System B
> again.

I think System A is a better option as it should be the same as ISO 9
and perhaps also produces results in some cases which are more expected
than with System B (if the Wikipedia ISO 9 article is to be believed).

Wrt BZ #2872 I think it's good to keep it in mind but IMHO we can also
deviate from it if needed, however with System A + ASCII fallback
definitions the RFE should be satisfied as well?

> 19.11.2018 20:35 Marko Myllynen <myllynen@redhat.com> wrote:
>> [...]
>> In any case once your patch lands I'm going to submit a follow-up patch
>> for fi_FI to make it compliant with the applicable national standard
>> (SFS 4900) which defines how to do Cyrillic transliteration /
>> transcription in the context Finnish.
>
> I totally agree. As far as I can see, SFS 4900 is more similar to
> System A (ISO 9) rather than System B, that is, it transliterates to Latin
> characters with diacritics rather than plain ASCII. Marko, what is your
> opinion about possible implementation of SFS 4900 in these cases:
>
> * When the destination charset does not contain required Latin diacritic
> characters (e.g., it is plain ASCII)?

This would be according to http://jkorpela.fi/iso9.html8 so for example
instead of ž -> zh and instead of štš -> shtsh.

> * When the output is ambiguous, that means, when two different Cyrillic
> strings produce the same Latin (or ASCII) output?

This is a good point and one I haven't considered but I'm not sure is
there anything we can do about this (at least without major locale
system internals work)? Do you have any rough idea how frequently this
could happen or is this more a theoretical issue? (Sorry if I've missed
earlier comments about this, it's been a long thread.)

>> The same with having both System A and System B. Initially I went along
>> with the suggestion to include the system A but it is clear now that it
>> doesn’t make fixing [BZ #2872] more straightforward. So I’d also propose
>> to set it aside for the moment and use the v10 without the system A.
>> That is the whole reason I have submitted it, to be superclear on that.
>
> OK, I think that now I understand your reason to drop System A better.
> But still I'd like to rethink implementing System A somehow and drop
> (or rather: implement only partially) System B.

Yes, I also think System A AKA ISO 9 would be a better choice but I'll
leave the final decision for you two (and others who might weigh in).

Thanks,
10.12.2018 22:20 Marko Myllynen <myllynen@redhat.com> wrote:
>
> Hi,
>
> On 08/12/2018 03.15, Rafal Luzynski wrote:
> > [...]
> > Marko, it would be good to hear your opinion about System A vs. System B
> > again.
>
> I think System A is a better option as it should be the same as ISO 9
> and perhaps also produces results in some cases which are more expected
> than with System B (if the Wikipedia ISO 9 article is to be believed).
>
> Wrt BZ #2872 I think it's good to keep it in mind but IMHO we can also
> deviate from it if needed, however with System A + ASCII fallback
> definitions the RFE should be satisfied as well?

That's exactly what I meant (sorry if it was not clear before).

> > [...] Marko, what is your
> > opinion about possible implementation of SFS 4900 in these cases:
> >
> > * When the destination charset does not contain required Latin diacritic
> > characters (e.g., it is plain ASCII)?
>
> This would be according to http://jkorpela.fi/iso9.html8 so for example
> instead of ž -> zh and instead of štš -> shtsh.

Agree.

> > * When the output is ambiguous, that means, when two different Cyrillic
> > strings produce the same Latin (or ASCII) output?
>
> This is a good point and one I haven't considered but I'm not sure is
> there anything we can do about this (at least without major locale
> system internals work)?

I agree with the suggestion that we can't do much about it. I mean,
there are possibly solutions (like using more punctuation characters)
but they don't look natural to me.

> Do you have any rough idea how frequently this
> could happen or is this more a theoretical issue? (Sorry if I've missed
> earlier comments about this, it's been a long thread.)

Yes, Egor provided this example many times:

"схема" -> "shema" (if "с" -> "s" and "х" -> "h")
"шема" -> "shema" (if "ш" -> "sh")

I don't think that it matters how frequent are these cases. I think
that the question is if ambiguity is a bug because if yes then even one
corner case proves that the solution is wrong.

> [...]
> Yes, I also think System A AKA ISO 9 would be a better choice but I'll
> leave the final decision for you two (and others who might weigh in).

Egor is a native speaker so I respect his opinion even if I'm not fully
convinced for technical reasons. Sadly, nobody else provides any opinion
which could weigh. I am going to write a separate email about it.

Regards,

Rafal
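The "схема" / "шема" collision discussed above is easy to reproduce with a per-character table. The Python sketch below is illustrative only: the rule subset and the helper names `to_ascii` and `collisions` are assumptions made for this example and are not taken from translit_cyrillic.

```python
from collections import defaultdict

# Illustrative subset of single-letter rules, using "х" -> "h" and "ш" -> "sh"
# as in the example from the thread. Not the table from the patch.
RULES = {"с": "s", "х": "h", "ш": "sh", "е": "e", "м": "m", "а": "a"}


def to_ascii(word):
    """Map each character independently; unknown characters pass through."""
    return "".join(RULES.get(ch, ch) for ch in word)


def collisions(words):
    """Group source words by their ASCII output and keep only the clashes."""
    groups = defaultdict(list)
    for word in words:
        groups[to_ascii(word)].append(word)
    return {out: srcs for out, srcs in groups.items() if len(srcs) > 1}


print(collisions(["схема", "шема", "мама"]))
```

Because every rule looks at a single source character, no ordering of the rules can separate the two inputs; only a different output for "х" or "ш" would do so.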
On 19.12.18 23:25, Rafal Luzynski wrote:
> 10.12.2018 22:20 Marko Myllynen <myllynen@redhat.com> wrote:
>
>> [...]
>> Yes, I also think System A AKA ISO 9 would be a better choice but I'll
>> leave the final decision for you two (and others who might weigh in).
>
> Egor is a native speaker so I respect his opinion even if I'm not fully
> convinced for technical reasons. Sadly, nobody else provides any opinion
> which could weigh. I am going to write a separate email about it.
>
> Regards,
>
> Rafal
>

It's not about which letter should be used for a particular
transliteration. I couldn't care less about that just to be clear.

Maybe I am missing something, could you tell how do you want to fit
System A to ASCII exactly?

Let's take the very first example from the table:

CyrillicUnicode  CyrillicLetter  CyrillicUnicodeName         LatinUnicode  System A Latin Letter  System B ASCII Letter
0401             Ё               CYRILLIC CAPITAL LETTER IO  00CB          Ë                      YO

so:
Cyrillic Ё U0401
System A - Ë U00CB - _not_ ASCII
System B - YO (or Yo) "<U0059><U004F>" - ASCII

Could you explain how can we make System A "Ë" to be displayed or
processed somehow in a C locale? Or in a locale or program that doesn't
have "Ë" U00CB?

Bests,
Egor
19.12.2018 23:48 Egor Kobylkin <egor@kobylkin.com> wrote:
> [...]
> Maybe I am missing something, could you tell how do you want to fit
> System A to ASCII exactly?
>
> Let's take the very first example from the table:
> CyrillicUnicode CyrillicLetter CyrillicUnicodeName LatinUnicode System A
> Latin Letter System B ASCII Letter
> 0401 Ё CYRILLIC CAPITAL LETTER IO 00CB Ë YO
>
> so:
> Cyrillic Ё U0401
> System A - Ë U00CB - _not_ ASCII
> System B - YO (or Yo) "<U0059><U004F>" - ASCII
>
> Could you explain how can we make System A "Ë" to be displayed or
> processed somehow in a C locale? Or in a locale or program that doesn't
> have "Ë" U00CB?

It should be "YO" (or "Yo"). Exactly as you provided in your previous
patches.

I am afraid that my description "Cyrillic -> Latin -> ASCII" was too
ambiguous, I am sorry about it. Actually it is a list which says:

Convert Cyrillic "Ё" into Latin "Ë" if possible, otherwise to "YO" ("Yo").

We may stop using "Cyrillic -> Latin -> ASCII" picture as too ambiguous
and invent a better one.

Regards,

Rafal
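The "Ë if possible, otherwise YO" reading can be pictured as an ordered list of candidates per source character, where the first candidate that the destination charset can encode wins. The Python sketch below only illustrates that idea; the names `CANDIDATES`, `encodable` and `translit` are made up for this example, the "?" placeholder is an arbitrary choice, and the code does not claim to describe how glibc's iconv applies its rules.

```python
# Ordered candidates per source character: System A first, then the ASCII
# fallback from System B. Hypothetical two-entry table for illustration.
CANDIDATES = {"Ё": ["Ë", "YO"], "ё": ["ë", "yo"]}


def encodable(text, charset):
    """Return True if text can be represented in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def translit(text, charset):
    """Keep encodable characters; otherwise use the first usable candidate."""
    out = []
    for ch in text:
        if encodable(ch, charset):
            out.append(ch)
            continue
        for candidate in CANDIDATES.get(ch, []):
            if encodable(candidate, charset):
                out.append(candidate)
                break
        else:
            out.append("?")  # no rule fits the destination charset
    return "".join(out)


print(translit("Ё", "latin-1"), translit("Ё", "ascii"))
```

With a Latin-1 destination the first candidate is used, with ASCII the second, and with UTF-8 the Cyrillic letter is kept unchanged because no transliteration is needed.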
From a8ae30e0bf7484f4c0f034480110c81dd059b69e Mon Sep 17 00:00:00 2001 From: Egor Kobylkin <egor@kobylkin.com> Date: Wed, 14 Nov 2018 22:10:37 +0100 Subject: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] [BZ #2872] * localedata/locales/translit_cyrillic: New file. Supports ISO 9.1995, GOST 7.79 System A transliteration System B transcription table from Cyrillic to Latin/ASCII. * localedata/locales/aa_DJ: Add 'include "translit_cyrillic";""' to LC_CTYPE translit section. * localedata/locales/af_ZA: Likewise. * localedata/locales/ak_GH: Likewise. * localedata/locales/am_ET: Likewise. * localedata/locales/ar_EG: Likewise. * localedata/locales/be_BY: Likewise. * localedata/locales/bem_ZM: Likewise. * localedata/locales/ber_DZ: Likewise. * localedata/locales/ber_MA: Likewise. * localedata/locales/bg_BG: Likewise. * localedata/locales/bi_VU: Likewise. * localedata/locales/bn_BD: Likewise. * localedata/locales/bo_CN: Likewise. * localedata/locales/ca_ES: Likewise. * localedata/locales/ce_RU: Likewise. * localedata/locales/cmn_TW: Likewise. * localedata/locales/cs_CZ: Likewise. * localedata/locales/cv_RU: Likewise. * localedata/locales/cy_GB: Likewise. * localedata/locales/da_DK: Likewise. * localedata/locales/de_DE: Likewise. * localedata/locales/dv_MV: Likewise. * localedata/locales/dz_BT: Likewise. * localedata/locales/el_GR: Likewise. * localedata/locales/en_GB: Likewise. * localedata/locales/en_NG: Likewise. * localedata/locales/en_ZM: Likewise. * localedata/locales/es_CU: Likewise. * localedata/locales/es_ES: Likewise. * localedata/locales/et_EE: Likewise. * localedata/locales/fa_IR: Likewise. * localedata/locales/ff_SN: Likewise. * localedata/locales/fi_FI: Likewise. * localedata/locales/fr_FR: Likewise. * localedata/locales/ga_IE: Likewise. * localedata/locales/gd_GB: Likewise. * localedata/locales/gu_IN: Likewise. * localedata/locales/gv_GB: Likewise. * localedata/locales/he_IL: Likewise. * localedata/locales/hi_IN: Likewise. 
* localedata/locales/hif_FJ: Likewise. * localedata/locales/hr_HR: Likewise. * localedata/locales/ht_HT: Likewise. * localedata/locales/hu_HU: Likewise. * localedata/locales/hy_AM: Likewise. * localedata/locales/id_ID: Likewise. * localedata/locales/is_IS: Likewise. * localedata/locales/it_IT: Likewise. * localedata/locales/ja_JP: Likewise. * localedata/locales/kab_DZ: Likewise. * localedata/locales/kk_KZ: Likewise. * localedata/locales/km_KH: Likewise. * localedata/locales/kn_IN: Likewise. * localedata/locales/ko_KR: Likewise. * localedata/locales/ks_IN: Likewise. * localedata/locales/kw_GB: Likewise. * localedata/locales/ky_KG: Likewise. * localedata/locales/lb_LU: Likewise. * localedata/locales/lg_UG: Likewise. * localedata/locales/lij_IT: Likewise. * localedata/locales/ln_CD: Likewise. * localedata/locales/lo_LA: Likewise. * localedata/locales/lt_LT: Likewise. * localedata/locales/lv_LV: Likewise. * localedata/locales/mg_MG: Likewise. * localedata/locales/mhr_RU: Likewise. * localedata/locales/mk_MK: Likewise. * localedata/locales/ml_IN: Likewise. * localedata/locales/ms_MY: Likewise. * localedata/locales/mt_MT: Likewise. * localedata/locales/nan_TW@latin: Likewise. * localedata/locales/nb_NO: Likewise. * localedata/locales/ne_NP: Likewise. * localedata/locales/nhn_MX: Likewise. * localedata/locales/niu_NU: Likewise. * localedata/locales/niu_NZ: Likewise. * localedata/locales/nl_NL: Likewise. * localedata/locales/nr_ZA: Likewise. * localedata/locales/oc_FR: Likewise. * localedata/locales/om_KE: Likewise. * localedata/locales/or_IN: Likewise. * localedata/locales/os_RU: Likewise. * localedata/locales/pa_IN: Likewise. * localedata/locales/pa_PK: Likewise. * localedata/locales/pl_PL: Likewise. * localedata/locales/pt_PT: Likewise. * localedata/locales/quz_PE: Likewise. * localedata/locales/ro_RO: Likewise. * localedata/locales/ru_RU: Likewise. * localedata/locales/rw_RW: Likewise. * localedata/locales/sa_IN: Likewise. * localedata/locales/sd_IN: Likewise. 
* localedata/locales/sd_IN@devanagari: Likewise. * localedata/locales/se_NO: Likewise. * localedata/locales/sgs_LT: Likewise. * localedata/locales/shn_MM: Likewise. * localedata/locales/si_LK: Likewise. * localedata/locales/sk_SK: Likewise. * localedata/locales/sl_SI: Likewise. * localedata/locales/sm_WS: Likewise. * localedata/locales/so_SO: Likewise. * localedata/locales/sq_AL: Likewise. * localedata/locales/ss_ZA: Likewise. * localedata/locales/st_ZA: Likewise. * localedata/locales/sv_SE: Likewise. * localedata/locales/sw_KE: Likewise. * localedata/locales/ta_IN: Likewise. * localedata/locales/te_IN: Likewise. * localedata/locales/th_TH: Likewise. * localedata/locales/ti_ET: Likewise. * localedata/locales/tn_ZA: Likewise. * localedata/locales/to_TO: Likewise. * localedata/locales/tpi_PG: Likewise. * localedata/locales/tr_TR: Likewise. * localedata/locales/ts_ZA: Likewise. * localedata/locales/unm_US: Likewise. * localedata/locales/ur_IN: Likewise. * localedata/locales/ur_PK: Likewise. * localedata/locales/ve_ZA: Likewise. * localedata/locales/vi_VN: Likewise. * localedata/locales/wa_BE: Likewise. * localedata/locales/wo_SN: Likewise. * localedata/locales/xh_ZA: Likewise. * localedata/locales/yi_US: Likewise. * localedata/locales/yuw_PG: Likewise. * localedata/locales/zh_CN: Likewise. * localedata/locales/zu_ZA: Likewise. 
--- localedata/locales/aa_DJ | 1 + localedata/locales/af_ZA | 1 + localedata/locales/ak_GH | 1 + localedata/locales/am_ET | 1 + localedata/locales/ar_EG | 1 + localedata/locales/be_BY | 1 + localedata/locales/bem_ZM | 1 + localedata/locales/ber_DZ | 1 + localedata/locales/ber_MA | 1 + localedata/locales/bg_BG | 1 + localedata/locales/bi_VU | 1 + localedata/locales/bn_BD | 1 + localedata/locales/bo_CN | 1 + localedata/locales/ca_ES | 1 + localedata/locales/ce_RU | 1 + localedata/locales/cs_CZ | 1 + localedata/locales/cv_RU | 1 + localedata/locales/cy_GB | 1 + localedata/locales/da_DK | 1 + localedata/locales/de_DE | 1 + localedata/locales/dv_MV | 1 + localedata/locales/dz_BT | 1 + localedata/locales/el_GR | 1 + localedata/locales/en_GB | 1 + localedata/locales/en_NG | 1 + localedata/locales/en_ZM | 1 + localedata/locales/es_CU | 1 + localedata/locales/es_ES | 1 + localedata/locales/et_EE | 1 + localedata/locales/fa_IR | 1 + localedata/locales/ff_SN | 1 + localedata/locales/fi_FI | 1 + localedata/locales/fr_FR | 1 + localedata/locales/ga_IE | 1 + localedata/locales/gd_GB | 1 + localedata/locales/gu_IN | 1 + localedata/locales/gv_GB | 1 + localedata/locales/he_IL | 1 + localedata/locales/hi_IN | 1 + localedata/locales/hif_FJ | 1 + localedata/locales/hr_HR | 1 + localedata/locales/ht_HT | 1 + localedata/locales/hu_HU | 1 + localedata/locales/hy_AM | 1 + localedata/locales/id_ID | 1 + localedata/locales/is_IS | 1 + localedata/locales/it_IT | 1 + localedata/locales/ja_JP | 1 + localedata/locales/kab_DZ | 1 + localedata/locales/kk_KZ | 1 + localedata/locales/km_KH | 1 + localedata/locales/kn_IN | 1 + localedata/locales/ko_KR | 1 + localedata/locales/ks_IN | 1 + localedata/locales/kw_GB | 1 + localedata/locales/ky_KG | 1 + localedata/locales/lb_LU | 1 + localedata/locales/lg_UG | 1 + localedata/locales/lij_IT | 1 + localedata/locales/ln_CD | 1 + localedata/locales/lo_LA | 1 + localedata/locales/lt_LT | 1 + localedata/locales/lv_LV | 1 + localedata/locales/mg_MG | 1 + 
localedata/locales/mhr_RU | 1 + localedata/locales/mk_MK | 1 + localedata/locales/ml_IN | 1 + localedata/locales/ms_MY | 1 + localedata/locales/mt_MT | 1 + localedata/locales/nan_TW@latin | 1 + localedata/locales/nb_NO | 1 + localedata/locales/ne_NP | 1 + localedata/locales/nhn_MX | 1 + localedata/locales/niu_NU | 1 + localedata/locales/niu_NZ | 1 + localedata/locales/nl_NL | 1 + localedata/locales/nr_ZA | 1 + localedata/locales/oc_FR | 1 + localedata/locales/om_KE | 1 + localedata/locales/or_IN | 1 + localedata/locales/os_RU | 1 + localedata/locales/pa_IN | 1 + localedata/locales/pa_PK | 1 + localedata/locales/pl_PL | 1 + localedata/locales/pt_PT | 1 + localedata/locales/quz_PE | 1 + localedata/locales/ro_RO | 1 + localedata/locales/ru_RU | 1 + localedata/locales/rw_RW | 1 + localedata/locales/sa_IN | 1 + localedata/locales/sd_IN | 1 + localedata/locales/sd_IN@devanagari | 1 + localedata/locales/se_NO | 1 + localedata/locales/sgs_LT | 1 + localedata/locales/shn_MM | 1 + localedata/locales/si_LK | 1 + localedata/locales/sk_SK | 1 + localedata/locales/sl_SI | 1 + localedata/locales/sm_WS | 1 + localedata/locales/so_SO | 1 + localedata/locales/sq_AL | 1 + localedata/locales/ss_ZA | 1 + localedata/locales/st_ZA | 1 + localedata/locales/sv_SE | 1 + localedata/locales/sw_KE | 1 + localedata/locales/ta_IN | 1 + localedata/locales/te_IN | 1 + localedata/locales/th_TH | 1 + localedata/locales/ti_ET | 1 + localedata/locales/tn_ZA | 1 + localedata/locales/to_TO | 1 + localedata/locales/tpi_PG | 1 + localedata/locales/tr_TR | 1 + localedata/locales/translit_cyrillic | 383 +++++++++++++++++++++++++++ localedata/locales/ts_ZA | 1 + localedata/locales/unm_US | 1 + localedata/locales/ur_IN | 1 + localedata/locales/ur_PK | 1 + localedata/locales/ve_ZA | 1 + localedata/locales/vi_VN | 1 + localedata/locales/wa_BE | 1 + localedata/locales/wo_SN | 1 + localedata/locales/xh_ZA | 1 + localedata/locales/yi_US | 1 + localedata/locales/yuw_PG | 1 + localedata/locales/zh_CN | 1 + 
localedata/locales/zu_ZA | 1 + 127 files changed, 509 insertions(+) create mode 100644 localedata/locales/translit_cyrillic diff --git a/localedata/locales/aa_DJ b/localedata/locales/aa_DJ index fcb9af8abc..533e5b714e 100644 --- a/localedata/locales/aa_DJ +++ b/localedata/locales/aa_DJ @@ -68,6 +68,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/af_ZA b/localedata/locales/af_ZA index 2f45ddad63..d16bbcf707 100644 --- a/localedata/locales/af_ZA +++ b/localedata/locales/af_ZA @@ -70,6 +70,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/ak_GH b/localedata/locales/ak_GH index 926e4df343..d743ba48c7 100644 --- a/localedata/locales/ak_GH +++ b/localedata/locales/ak_GH @@ -54,6 +54,7 @@ LC_CTYPE copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/am_ET b/localedata/locales/am_ET index e5fe88a4cd..bee494be0a 100644 --- a/localedata/locales/am_ET +++ b/localedata/locales/am_ET @@ -96,6 +96,7 @@ copy "i18n" space <U1361> translit_start include "translit_combining";"" +include "translit_cyrillic";"" % hoy-sadis followed by a vowel <U1205><U12A0> <U0068><U0027><U0065> diff --git a/localedata/locales/ar_EG b/localedata/locales/ar_EG index c8cb3180bf..f2584cd7ad 100644 --- a/localedata/locales/ar_EG +++ b/localedata/locales/ar_EG @@ -44,6 +44,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/be_BY b/localedata/locales/be_BY index 324379b65a..4fb16d3540 100644 --- a/localedata/locales/be_BY +++ b/localedata/locales/be_BY @@ -91,6 +91,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git 
a/localedata/locales/bem_ZM b/localedata/locales/bem_ZM index fa43ad1610..7a8c3c3b77 100644 --- a/localedata/locales/bem_ZM +++ b/localedata/locales/bem_ZM @@ -41,6 +41,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/ber_DZ b/localedata/locales/ber_DZ index 79f3d289b1..137643873d 100644 --- a/localedata/locales/ber_DZ +++ b/localedata/locales/ber_DZ @@ -136,6 +136,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/ber_MA b/localedata/locales/ber_MA index b9bd64868c..fd79bf11d6 100644 --- a/localedata/locales/ber_MA +++ b/localedata/locales/ber_MA @@ -83,6 +83,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/bg_BG b/localedata/locales/bg_BG index 7a9cfa0a5d..504199a4d9 100644 --- a/localedata/locales/bg_BG +++ b/localedata/locales/bg_BG @@ -49,6 +49,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/bi_VU b/localedata/locales/bi_VU index 88bf70a61b..81d717b2f6 100755 --- a/localedata/locales/bi_VU +++ b/localedata/locales/bi_VU @@ -39,6 +39,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/bn_BD b/localedata/locales/bn_BD index 73efd1cbc3..bc82d611e0 100644 --- a/localedata/locales/bn_BD +++ b/localedata/locales/bn_BD @@ -61,6 +61,7 @@ map to_inpunct; / translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/bo_CN b/localedata/locales/bo_CN index 90cbc7807b..7779d3d99b 100644 --- a/localedata/locales/bo_CN +++ b/localedata/locales/bo_CN @@ -43,6 +43,7 @@ copy "i18n" 
translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/ca_ES b/localedata/locales/ca_ES index 0ba74ccf33..af72a1ab86 100644 --- a/localedata/locales/ca_ES +++ b/localedata/locales/ca_ES @@ -57,6 +57,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/ce_RU b/localedata/locales/ce_RU index 03e60f838a..75ef80498d 100644 --- a/localedata/locales/ce_RU +++ b/localedata/locales/ce_RU @@ -38,6 +38,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/cs_CZ b/localedata/locales/cs_CZ index 41fbd2be93..9450d22f2f 100644 --- a/localedata/locales/cs_CZ +++ b/localedata/locales/cs_CZ @@ -215,6 +215,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/cv_RU b/localedata/locales/cv_RU index e9247b39f8..253cbd63af 100644 --- a/localedata/locales/cv_RU +++ b/localedata/locales/cv_RU @@ -103,6 +103,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/cy_GB b/localedata/locales/cy_GB index 5f6fd7c87f..6d35d7c27e 100644 --- a/localedata/locales/cy_GB +++ b/localedata/locales/cy_GB @@ -65,6 +65,7 @@ LC_CTYPE copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/da_DK b/localedata/locales/da_DK index 05a2681bef..1b38e8af17 100644 --- a/localedata/locales/da_DK +++ b/localedata/locales/da_DK @@ -147,6 +147,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" % LATIN CAPITAL LETTER A WITH DIAERESIS -> "AE" <U00C4> "<U0041><U0308>";"<U0041><U0045>" diff --git 
a/localedata/locales/de_DE b/localedata/locales/de_DE index eaa9f7ff8e..85793437a5 100644 --- a/localedata/locales/de_DE +++ b/localedata/locales/de_DE @@ -44,6 +44,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" % German umlauts. % LATIN CAPITAL LETTER A WITH DIAERESIS. diff --git a/localedata/locales/dv_MV b/localedata/locales/dv_MV index 0d7842f39f..f9c8de4a50 100644 --- a/localedata/locales/dv_MV +++ b/localedata/locales/dv_MV @@ -49,6 +49,7 @@ LC_CTYPE copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end diff --git a/localedata/locales/dz_BT b/localedata/locales/dz_BT index 272fa7e78f..31d488ad0c 100644 --- a/localedata/locales/dz_BT +++ b/localedata/locales/dz_BT @@ -59,6 +59,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/el_GR b/localedata/locales/el_GR index 7362492fbd..994a4a913d 100644 --- a/localedata/locales/el_GR +++ b/localedata/locales/el_GR @@ -58,6 +58,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/en_GB b/localedata/locales/en_GB index 5b895574ac..2f1cc5904b 100644 --- a/localedata/locales/en_GB +++ b/localedata/locales/en_GB @@ -54,6 +54,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/en_NG b/localedata/locales/en_NG index 109201c2fe..fa70ffe943 100644 --- a/localedata/locales/en_NG +++ b/localedata/locales/en_NG @@ -49,6 +49,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/en_ZM b/localedata/locales/en_ZM index 8957d8e8aa..1fc5dfed65 100644 --- a/localedata/locales/en_ZM +++ b/localedata/locales/en_ZM @@ -41,6 +41,7 @@ copy "i18n" 
translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/es_CU b/localedata/locales/es_CU index d37d452b0f..90c714ea18 100644 --- a/localedata/locales/es_CU +++ b/localedata/locales/es_CU @@ -58,6 +58,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/es_ES b/localedata/locales/es_ES index aa919a2626..534152d0a8 100644 --- a/localedata/locales/es_ES +++ b/localedata/locales/es_ES @@ -107,6 +107,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/et_EE b/localedata/locales/et_EE index f5c47149a6..51e6a4ab13 100644 --- a/localedata/locales/et_EE +++ b/localedata/locales/et_EE @@ -113,6 +113,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/fa_IR b/localedata/locales/fa_IR index 3714a30932..fdeaf6312e 100644 --- a/localedata/locales/fa_IR +++ b/localedata/locales/fa_IR @@ -78,6 +78,7 @@ map to_outpunct; / translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/ff_SN b/localedata/locales/ff_SN index e4b18eba7b..32e2eb78d8 100644 --- a/localedata/locales/ff_SN +++ b/localedata/locales/ff_SN @@ -41,6 +41,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/fi_FI b/localedata/locales/fi_FI index eeb278316b..57eda9bff1 100644 --- a/localedata/locales/fi_FI +++ b/localedata/locales/fi_FI @@ -177,6 +177,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/fr_FR b/localedata/locales/fr_FR index a18c514f19..098be4906f 
100644 --- a/localedata/locales/fr_FR +++ b/localedata/locales/fr_FR @@ -57,6 +57,7 @@ translit_start % In France, accents are simply omitted if they cannot be represented. include "translit_combining";"" +include "translit_cyrillic";"" translit_end diff --git a/localedata/locales/ga_IE b/localedata/locales/ga_IE index 782adbaa5c..d430028b74 100644 --- a/localedata/locales/ga_IE +++ b/localedata/locales/ga_IE @@ -53,6 +53,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/gd_GB b/localedata/locales/gd_GB index 8d54593113..aaa41a0bda 100644 --- a/localedata/locales/gd_GB +++ b/localedata/locales/gd_GB @@ -45,6 +45,7 @@ LC_CTYPE copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/gu_IN b/localedata/locales/gu_IN index cd7e23a4be..00f00d4f8d 100644 --- a/localedata/locales/gu_IN +++ b/localedata/locales/gu_IN @@ -62,6 +62,7 @@ map to_inpunct; / translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/gv_GB b/localedata/locales/gv_GB index 473c043cba..3c6ba93629 100644 --- a/localedata/locales/gv_GB +++ b/localedata/locales/gv_GB @@ -56,6 +56,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/he_IL b/localedata/locales/he_IL index 52b5a6bff0..82a0760c10 100644 --- a/localedata/locales/he_IL +++ b/localedata/locales/he_IL @@ -58,6 +58,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/hi_IN b/localedata/locales/hi_IN index a94365519f..12a44e6689 100644 --- a/localedata/locales/hi_IN +++ b/localedata/locales/hi_IN @@ -61,6 +61,7 @@ map to_inpunct; / translit_start include 
"translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/hif_FJ b/localedata/locales/hif_FJ index 5433bb4a2a..005ac6d308 100644 --- a/localedata/locales/hif_FJ +++ b/localedata/locales/hif_FJ @@ -37,6 +37,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/hr_HR b/localedata/locales/hr_HR index 029a3794e2..8222d73ff0 100644 --- a/localedata/locales/hr_HR +++ b/localedata/locales/hr_HR @@ -46,6 +46,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" % Historicaly we used ISO-8869-2 and wrote digraphs % <U01C6> {dž}, <U01C9> {lj} and <U01CC> {nj} diff --git a/localedata/locales/ht_HT b/localedata/locales/ht_HT index 0e0a79d2f1..69688a401e 100644 --- a/localedata/locales/ht_HT +++ b/localedata/locales/ht_HT @@ -57,6 +57,7 @@ LC_CTYPE copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/hu_HU b/localedata/locales/hu_HU index 9d6bb85022..5e19e5b689 100644 --- a/localedata/locales/hu_HU +++ b/localedata/locales/hu_HU @@ -455,6 +455,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" <U00C1> "<U0041><U0301>";"<U0041><U00B4>";"<U0041><U0027>" <U00C9> "<U0045><U0301>";"<U0045><U00B4>";"<U0045><U0027>" diff --git a/localedata/locales/hy_AM b/localedata/locales/hy_AM index 74e1b77efb..5973c85f33 100644 --- a/localedata/locales/hy_AM +++ b/localedata/locales/hy_AM @@ -75,6 +75,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/id_ID b/localedata/locales/id_ID index 3ddd8d07da..af36159ca6 100644 --- a/localedata/locales/id_ID +++ b/localedata/locales/id_ID @@ -54,6 +54,7 @@ copy "i18n" translit_start include "translit_combining";"" +include 
"translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/is_IS b/localedata/locales/is_IS index 8d59b468d6..f614fea728 100644 --- a/localedata/locales/is_IS +++ b/localedata/locales/is_IS @@ -149,6 +149,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/it_IT b/localedata/locales/it_IT index 8a10545de0..7d4cda7fc6 100644 --- a/localedata/locales/it_IT +++ b/localedata/locales/it_IT @@ -58,6 +58,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/ja_JP b/localedata/locales/ja_JP index 1fd2fee44b..34ed430947 100644 --- a/localedata/locales/ja_JP +++ b/localedata/locales/ja_JP @@ -1680,6 +1680,7 @@ translit_start include "translit_combining";"" include "translit_cjk_variants";"" +include "translit_cyrillic";"" translit_end diff --git a/localedata/locales/kab_DZ b/localedata/locales/kab_DZ index a165f53f01..4cf468c6a5 100644 --- a/localedata/locales/kab_DZ +++ b/localedata/locales/kab_DZ @@ -41,6 +41,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/kk_KZ b/localedata/locales/kk_KZ index c29c84b46e..c4ceb28b27 100644 --- a/localedata/locales/kk_KZ +++ b/localedata/locales/kk_KZ @@ -99,6 +99,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/km_KH b/localedata/locales/km_KH index 0d8c9ce78d..acd9291346 100644 --- a/localedata/locales/km_KH +++ b/localedata/locales/km_KH @@ -42,6 +42,7 @@ LC_CTYPE copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/kn_IN b/localedata/locales/kn_IN index b6443d12c8..cffa4e4544 100644 --- a/localedata/locales/kn_IN 
+++ b/localedata/locales/kn_IN @@ -63,6 +63,7 @@ map to_inpunct; / translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/ko_KR b/localedata/locales/ko_KR index bd0d919218..31a8b105c5 100644 --- a/localedata/locales/ko_KR +++ b/localedata/locales/ko_KR @@ -6098,6 +6098,7 @@ translit_start include "translit_combining";"" include "translit_hangul";"" +include "translit_cyrillic";"" translit_end diff --git a/localedata/locales/ks_IN b/localedata/locales/ks_IN index 9ab8707922..0c1572b8fd 100644 --- a/localedata/locales/ks_IN +++ b/localedata/locales/ks_IN @@ -46,6 +46,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/kw_GB b/localedata/locales/kw_GB index c0433b3f07..1eb4cfd1c1 100644 --- a/localedata/locales/kw_GB +++ b/localedata/locales/kw_GB @@ -57,6 +57,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/ky_KG b/localedata/locales/ky_KG index 871b8a818b..f46b6979e2 100644 --- a/localedata/locales/ky_KG +++ b/localedata/locales/ky_KG @@ -82,6 +82,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/lb_LU b/localedata/locales/lb_LU index 92f1e22e1a..992d0f677d 100644 --- a/localedata/locales/lb_LU +++ b/localedata/locales/lb_LU @@ -44,6 +44,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" % German umlauts % LATIN CAPITAL LETTER A WITH DIAERESIS diff --git a/localedata/locales/lg_UG b/localedata/locales/lg_UG index 70dd1cad2e..57dd8c74e8 100644 --- a/localedata/locales/lg_UG +++ b/localedata/locales/lg_UG @@ -56,6 +56,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE 
diff --git a/localedata/locales/lij_IT b/localedata/locales/lij_IT index 2d6e5fcc5c..baec837196 100644 --- a/localedata/locales/lij_IT +++ b/localedata/locales/lij_IT @@ -47,6 +47,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/ln_CD b/localedata/locales/ln_CD index ed6404a1e5..a91441809c 100644 --- a/localedata/locales/ln_CD +++ b/localedata/locales/ln_CD @@ -39,6 +39,7 @@ LC_CTYPE copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/lo_LA b/localedata/locales/lo_LA index d60d157167..2abd680a6a 100644 --- a/localedata/locales/lo_LA +++ b/localedata/locales/lo_LA @@ -50,6 +50,7 @@ LC_CTYPE copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/lt_LT b/localedata/locales/lt_LT index e9834bd200..a58168dc45 100644 --- a/localedata/locales/lt_LT +++ b/localedata/locales/lt_LT @@ -163,6 +163,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/lv_LV b/localedata/locales/lv_LV index a20cbdde46..e3fb992562 100644 --- a/localedata/locales/lv_LV +++ b/localedata/locales/lv_LV @@ -125,6 +125,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/mg_MG b/localedata/locales/mg_MG index 266ff17e7d..ee1ed56fed 100644 --- a/localedata/locales/mg_MG +++ b/localedata/locales/mg_MG @@ -53,6 +53,7 @@ translit_start % Accents are simply omitted if they cannot be represented. 
include "translit_combining";"" +include "translit_cyrillic";"" translit_end diff --git a/localedata/locales/mhr_RU b/localedata/locales/mhr_RU index 85ac21b35a..b936253ebc 100644 --- a/localedata/locales/mhr_RU +++ b/localedata/locales/mhr_RU @@ -58,6 +58,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/mk_MK b/localedata/locales/mk_MK index 87bae1dc7c..210cfce05c 100644 --- a/localedata/locales/mk_MK +++ b/localedata/locales/mk_MK @@ -48,6 +48,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/ml_IN b/localedata/locales/ml_IN index d7a8f43f1e..794d59f923 100644 --- a/localedata/locales/ml_IN +++ b/localedata/locales/ml_IN @@ -60,6 +60,7 @@ map to_inpunct; / translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE % diff --git a/localedata/locales/ms_MY b/localedata/locales/ms_MY index 66b5dd98e9..4fa53adbc3 100644 --- a/localedata/locales/ms_MY +++ b/localedata/locales/ms_MY @@ -45,6 +45,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/mt_MT b/localedata/locales/mt_MT index a6ab7b1dad..4b6a08f4e1 100644 --- a/localedata/locales/mt_MT +++ b/localedata/locales/mt_MT @@ -47,6 +47,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/nan_TW@latin b/localedata/locales/nan_TW@latin index d4579a4cdf..99e2bd80ab 100644 --- a/localedata/locales/nan_TW@latin +++ b/localedata/locales/nan_TW@latin @@ -51,6 +51,7 @@ translit_start % accents are simply omitted if they cannot be represented. 
include "translit_combining";"" +include "translit_cyrillic";"" translit_end diff --git a/localedata/locales/nb_NO b/localedata/locales/nb_NO index a8675b6104..4c90307366 100644 --- a/localedata/locales/nb_NO +++ b/localedata/locales/nb_NO @@ -144,6 +144,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" % LATIN CAPITAL LETTER A WITH DIAERESIS -> "AE" <U00C4> "<U0041><U0308>";"<U0041><U0045>" diff --git a/localedata/locales/ne_NP b/localedata/locales/ne_NP index eb80eabbd8..3aecda7fd7 100644 --- a/localedata/locales/ne_NP +++ b/localedata/locales/ne_NP @@ -43,6 +43,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/nhn_MX b/localedata/locales/nhn_MX index 88a89765e8..a5e286bc4c 100644 --- a/localedata/locales/nhn_MX +++ b/localedata/locales/nhn_MX @@ -59,6 +59,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/niu_NU b/localedata/locales/niu_NU index 553c5d9edc..e34f33e0c6 100644 --- a/localedata/locales/niu_NU +++ b/localedata/locales/niu_NU @@ -58,6 +58,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/niu_NZ b/localedata/locales/niu_NZ index 560101b447..85acd3bc44 100644 --- a/localedata/locales/niu_NZ +++ b/localedata/locales/niu_NZ @@ -58,6 +58,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/nl_NL b/localedata/locales/nl_NL index 1ab3277aa0..6284728fe7 100644 --- a/localedata/locales/nl_NL +++ b/localedata/locales/nl_NL @@ -56,6 +56,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/nr_ZA 
b/localedata/locales/nr_ZA index 7de6420a6b..caf2aba2e4 100644 --- a/localedata/locales/nr_ZA +++ b/localedata/locales/nr_ZA @@ -64,6 +64,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/oc_FR b/localedata/locales/oc_FR index 707927ee26..f347c8c4d8 100644 --- a/localedata/locales/oc_FR +++ b/localedata/locales/oc_FR @@ -54,6 +54,7 @@ LC_CTYPE copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/om_KE b/localedata/locales/om_KE index 66cdcf5c45..a75a623053 100644 --- a/localedata/locales/om_KE +++ b/localedata/locales/om_KE @@ -156,6 +156,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/or_IN b/localedata/locales/or_IN index ef28b58895..5c7b9cf8ef 100644 --- a/localedata/locales/or_IN +++ b/localedata/locales/or_IN @@ -62,6 +62,7 @@ map to_inpunct; / translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/os_RU b/localedata/locales/os_RU index 9a4ce037cd..7ab0b7a9bc 100644 --- a/localedata/locales/os_RU +++ b/localedata/locales/os_RU @@ -71,6 +71,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/pa_IN b/localedata/locales/pa_IN index ca28f21162..93e17fa848 100644 --- a/localedata/locales/pa_IN +++ b/localedata/locales/pa_IN @@ -60,6 +60,7 @@ map to_inpunct; / translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/pa_PK b/localedata/locales/pa_PK index 1f49bdc90d..7782adb5d8 100644 --- a/localedata/locales/pa_PK +++ b/localedata/locales/pa_PK @@ -49,6 +49,7 @@ LC_CTYPE copy "i18n" translit_start include 
"translit_combining";"" +include "translit_cyrillic";"" % those two lettes are not in cp1256... diff --git a/localedata/locales/pl_PL b/localedata/locales/pl_PL index 4c1b2a869d..8caa5e8579 100644 --- a/localedata/locales/pl_PL +++ b/localedata/locales/pl_PL @@ -130,6 +130,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/pt_PT b/localedata/locales/pt_PT index 6225036edf..d52ac3ac26 100644 --- a/localedata/locales/pt_PT +++ b/localedata/locales/pt_PT @@ -58,6 +58,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/quz_PE b/localedata/locales/quz_PE index f6b1956b93..018cd9a7e5 100644 --- a/localedata/locales/quz_PE +++ b/localedata/locales/quz_PE @@ -55,6 +55,7 @@ LC_CTYPE copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/ro_RO b/localedata/locales/ro_RO index 39c4d09a07..6443d66d6a 100644 --- a/localedata/locales/ro_RO +++ b/localedata/locales/ro_RO @@ -129,6 +129,7 @@ copy "i18n" % translit_start include "translit_combining";"" +include "translit_cyrillic";"" % if t/scomma is not available, try first t/scedilla <U0218> "<U015E>";"<U0053>" diff --git a/localedata/locales/ru_RU b/localedata/locales/ru_RU index fdb2059fe7..1f6d2c6935 100644 --- a/localedata/locales/ru_RU +++ b/localedata/locales/ru_RU @@ -69,6 +69,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/rw_RW b/localedata/locales/rw_RW index e0bc763c5a..e12a3d83a3 100644 --- a/localedata/locales/rw_RW +++ b/localedata/locales/rw_RW @@ -45,6 +45,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/sa_IN 
b/localedata/locales/sa_IN index 4eaf6fe1fe..6ebb5e4f90 100644 --- a/localedata/locales/sa_IN +++ b/localedata/locales/sa_IN @@ -44,6 +44,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/sd_IN b/localedata/locales/sd_IN index e5ab80b062..23b7424d3b 100644 --- a/localedata/locales/sd_IN +++ b/localedata/locales/sd_IN @@ -46,6 +46,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/sd_IN@devanagari b/localedata/locales/sd_IN@devanagari index d57cea639b..0a122b95ac 100644 --- a/localedata/locales/sd_IN@devanagari +++ b/localedata/locales/sd_IN@devanagari @@ -44,6 +44,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/se_NO b/localedata/locales/se_NO index b50001139a..b423d93531 100644 --- a/localedata/locales/se_NO +++ b/localedata/locales/se_NO @@ -221,6 +221,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/sgs_LT b/localedata/locales/sgs_LT index 6b6ab1cac9..561c43b651 100644 --- a/localedata/locales/sgs_LT +++ b/localedata/locales/sgs_LT @@ -58,6 +58,7 @@ LC_CTYPE copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/shn_MM b/localedata/locales/shn_MM index 4212c50ec5..079506dafc 100644 --- a/localedata/locales/shn_MM +++ b/localedata/locales/shn_MM @@ -58,6 +58,7 @@ map to_inpunct; / translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/si_LK b/localedata/locales/si_LK index dc4a9eb04d..4d2fc8b3f0 100644 --- a/localedata/locales/si_LK +++ b/localedata/locales/si_LK @@ -44,6 +44,7 @@ 
copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/sk_SK b/localedata/locales/sk_SK index 94e6e12bb2..086499bb7e 100644 --- a/localedata/locales/sk_SK +++ b/localedata/locales/sk_SK @@ -67,6 +67,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/sl_SI b/localedata/locales/sl_SI index 6157b26d4f..dd9b516111 100644 --- a/localedata/locales/sl_SI +++ b/localedata/locales/sl_SI @@ -2120,6 +2120,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/sm_WS b/localedata/locales/sm_WS index 6058fbdc38..b9954ae30e 100644 --- a/localedata/locales/sm_WS +++ b/localedata/locales/sm_WS @@ -37,6 +37,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/so_SO b/localedata/locales/so_SO index 713bf79608..9ed4d68ce9 100644 --- a/localedata/locales/so_SO +++ b/localedata/locales/so_SO @@ -68,6 +68,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/sq_AL b/localedata/locales/sq_AL index b16a459c56..d9154d7f9e 100644 --- a/localedata/locales/sq_AL +++ b/localedata/locales/sq_AL @@ -45,6 +45,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/ss_ZA b/localedata/locales/ss_ZA index 7532a1940b..31c45321ce 100644 --- a/localedata/locales/ss_ZA +++ b/localedata/locales/ss_ZA @@ -66,6 +66,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/st_ZA b/localedata/locales/st_ZA index 
706ef3e50a..b62f478f5f 100644 --- a/localedata/locales/st_ZA +++ b/localedata/locales/st_ZA @@ -62,6 +62,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/sv_SE b/localedata/locales/sv_SE index aa28c23776..7443ee277c 100644 --- a/localedata/locales/sv_SE +++ b/localedata/locales/sv_SE @@ -151,6 +151,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" % LATIN CAPITAL LETTER A WITH DIAERESIS -> "AE" <U00C4> "<U0041><U0308>";"<U0041><U0045>" diff --git a/localedata/locales/sw_KE b/localedata/locales/sw_KE index 6c303da983..1e3f848e1d 100644 --- a/localedata/locales/sw_KE +++ b/localedata/locales/sw_KE @@ -43,6 +43,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/ta_IN b/localedata/locales/ta_IN index 5a083d2658..ec08739ebd 100644 --- a/localedata/locales/ta_IN +++ b/localedata/locales/ta_IN @@ -63,6 +63,7 @@ map to_inpunct; / translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/te_IN b/localedata/locales/te_IN index b70f320051..99ffb43bf5 100644 --- a/localedata/locales/te_IN +++ b/localedata/locales/te_IN @@ -63,6 +63,7 @@ map to_inpunct; / translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/th_TH b/localedata/locales/th_TH index 7a10376e80..148a1c632b 100644 --- a/localedata/locales/th_TH +++ b/localedata/locales/th_TH @@ -57,6 +57,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/ti_ET b/localedata/locales/ti_ET index 6c387604e9..2c2e32a702 100644 --- a/localedata/locales/ti_ET +++ b/localedata/locales/ti_ET @@ -864,6 +864,7 @@ translit_start 
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030> include "translit_combining";"" +include "translit_cyrillic";"" translit_end % END LC_CTYPE diff --git a/localedata/locales/tn_ZA b/localedata/locales/tn_ZA index 8473426eab..274336c8d3 100644 --- a/localedata/locales/tn_ZA +++ b/localedata/locales/tn_ZA @@ -67,6 +67,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/to_TO b/localedata/locales/to_TO index 7abe8685df..09e5e093d5 100644 --- a/localedata/locales/to_TO +++ b/localedata/locales/to_TO @@ -36,6 +36,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/tpi_PG b/localedata/locales/tpi_PG index 3315c27633..e625543fcb 100644 --- a/localedata/locales/tpi_PG +++ b/localedata/locales/tpi_PG @@ -44,6 +44,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/tr_TR b/localedata/locales/tr_TR index f7c13ddf4b..c751dc696a 100644 --- a/localedata/locales/tr_TR +++ b/localedata/locales/tr_TR @@ -2535,6 +2535,7 @@ class "combining_level3"; / translit_start include "translit_combining";"" +include "translit_cyrillic";"" % TURKISH LIRA SIGN <U20BA> "<U0054><U004C>" diff --git a/localedata/locales/translit_cyrillic b/localedata/locales/translit_cyrillic new file mode 100644 index 0000000000..82d9749e08 --- /dev/null +++ b/localedata/locales/translit_cyrillic @@ -0,0 +1,383 @@ +escape_char / +comment_char % + +% This file is part of the GNU C Library and contains locale data. +% The Free Software Foundation does not claim any copyright interest +% in the locale data contained in this file. The foregoing does not +% affect the license of the GNU C Library as a whole. 
It does not +% exempt you from the conditions of the license if your use would +% otherwise be governed by that license. + +% Transliterations of Cyrillic letters to Latin and/or ASCII symbols. +% Inspired by ISO 9:1995 / GOST 7.79-2000. +% Covers Unicode Range https://www.unicode.org/charts/PDF/U0400.pdf +% i.e. [U0401-U04F9, U2019] but only the letters covered by ISO 9:1995 +% It implements the GOST_7.79 System A (Latin Script) as a first +% option and System B Cyrillic (ASCII) as a second option. Check +% https://en.wikipedia.org/wiki/ISO_9 for reference. +% The System B is extended from GOST_7.79-Russian using open sources +% of the transliteration mappings and the "h/`" diacritics logic. + +% Usage examples: +% iconv -f UTF-8 -t ISO-8859-15//TRANSLIT \ +% | iconv -f ISO-8859-15 -t UTF-8 # System A +% iconv -f UTF-8 -t ASCII//TRANSLIT # System B. + +% Contributions welcome for the rest of Cyrillic script in Unicode +% https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode. +% Bugfix for https://sourceware.org/bugzilla/show_bug.cgi?id=2872. 
+% Generated from UnicodeData.txt with a spreadsheet referenced +% in that bug's doclet + +LC_CTYPE + +translit_start + +% CYRILLIC CAPITAL LETTER IO +<U0401> <U00CB>;"<U0059><U004F>" +% CYRILLIC CAPITAL LETTER DJE +<U0402> <U0110>;"<U0044><U004A>" +% CYRILLIC CAPITAL LETTER GJE +<U0403> <U01F4>;"<U0047><U0060>" +% CYRILLIC CAPITAL LETTER UKRAINIAN IE +<U0404> <U00CA>;"<U0059><U0045>" +% CYRILLIC CAPITAL LETTER DZE +<U0405> <U1E90>;"<U005A><U0060>" +% CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I +<U0406> <U00CC>;<U0049> +% CYRILLIC CAPITAL LETTER YI +<U0407> <U00CF>;"<U0059><U0049>" +% CYRILLIC CAPITAL LETTER JE +<U0408> "<U004A><U030C>";<U004A> +% CYRILLIC CAPITAL LETTER LJE +<U0409> "<U004C><U0302>";"<U004C><U0060>" +% CYRILLIC CAPITAL LETTER NJE +<U040A> "<U004E><U0302>";"<U004E><U0060>" +% CYRILLIC CAPITAL LETTER TSHE +<U040B> <U0106>;"<U0054><U0053><U0048>" +% CYRILLIC CAPITAL LETTER KJE +<U040C> <U1E30>;"<U004B><U0060>" +% CYRILLIC CAPITAL LETTER SHORT U +<U040E> <U016C>;"<U0055><U0060>" +% CYRILLIC CAPITAL LETTER DZHE +<U040F> "<U0044><U0302>";"<U0044><U0048>" +% CYRILLIC CAPITAL LETTER A +<U0410> <U0041> +% CYRILLIC CAPITAL LETTER BE +<U0411> <U0042> +% CYRILLIC CAPITAL LETTER VE +<U0412> <U0056> +% CYRILLIC CAPITAL LETTER GHE +<U0413> <U0047> +% CYRILLIC CAPITAL LETTER DE +<U0414> <U0044> +% CYRILLIC CAPITAL LETTER IE +<U0415> <U0045> +% CYRILLIC CAPITAL LETTER ZHE +<U0416> <U017D>;"<U005A><U0048>" +% CYRILLIC CAPITAL LETTER ZE +<U0417> <U005A> +% CYRILLIC CAPITAL LETTER I +<U0418> <U0049> +% CYRILLIC CAPITAL LETTER SHORT I +<U0419> <U004A> +% CYRILLIC CAPITAL LETTER KA +<U041A> <U004B> +% CYRILLIC CAPITAL LETTER EL +<U041B> <U004C> +% CYRILLIC CAPITAL LETTER EM +<U041C> <U004D> +% CYRILLIC CAPITAL LETTER EN +<U041D> <U004E> +% CYRILLIC CAPITAL LETTER O +<U041E> <U004F> +% CYRILLIC CAPITAL LETTER PE +<U041F> <U0050> +% CYRILLIC CAPITAL LETTER ER +<U0420> <U0052> +% CYRILLIC CAPITAL LETTER ES +<U0421> <U0053> +% CYRILLIC CAPITAL LETTER TE +<U0422> 
<U0054> +% CYRILLIC CAPITAL LETTER U +<U0423> <U0055> +% CYRILLIC CAPITAL LETTER U WITH COMBINING ACUTE ACCENT +<U0423><U0301> <U00DA>;"<U0055><U0060>" +% CYRILLIC CAPITAL LETTER EF +<U0424> <U0046> +% CYRILLIC CAPITAL LETTER HA +<U0425> <U0048>;<U0058> +% CYRILLIC CAPITAL LETTER TSE +<U0426> <U0043>;"<U0043><U005A>" +% CYRILLIC CAPITAL LETTER CHE +<U0427> <U010C>;"<U0043><U0048>" +% CYRILLIC CAPITAL LETTER SHA +<U0428> <U0160>;"<U0053><U0048>" +% CYRILLIC CAPITAL LETTER SHCHA +<U0429> <U015C>;"<U0053><U0048><U0048>" +% CYRILLIC CAPITAL LETTER HARD SIGN +<U042A> <U02BA>;"<U0060><U0060>" +% CYRILLIC CAPITAL LETTER YERU +<U042B> <U0059>;"<U0059><U0060>" +% CYRILLIC CAPITAL LETTER SOFT SIGN +<U042C> <U02B9>;<U0060> +% CYRILLIC CAPITAL LETTER E +<U042D> <U00C8>;"<U0045><U0060>" +% CYRILLIC CAPITAL LETTER YU +<U042E> <U00DB>;"<U0059><U0055>" +% CYRILLIC CAPITAL LETTER YA +<U042F> <U00C2>;"<U0059><U0041>" +% CYRILLIC SMALL LETTER A +<U0430> <U0061> +% CYRILLIC SMALL LETTER BE +<U0431> <U0062> +% CYRILLIC SMALL LETTER VE +<U0432> <U0076> +% CYRILLIC SMALL LETTER GHE +<U0433> <U0067> +% CYRILLIC SMALL LETTER DE +<U0434> <U0064> +% CYRILLIC SMALL LETTER IE +<U0435> <U0065> +% CYRILLIC SMALL LETTER ZHE +<U0436> <U017E>;"<U007A><U0068>" +% CYRILLIC SMALL LETTER ZE +<U0437> <U007A> +% CYRILLIC SMALL LETTER I +<U0438> <U0069> +% CYRILLIC SMALL LETTER SHORT I +<U0439> <U006A> +% CYRILLIC SMALL LETTER KA +<U043A> <U006B> +% CYRILLIC SMALL LETTER EL +<U043B> <U006C> +% CYRILLIC SMALL LETTER EM +<U043C> <U006D> +% CYRILLIC SMALL LETTER EN +<U043D> <U006E> +% CYRILLIC SMALL LETTER O +<U043E> <U006F> +% CYRILLIC SMALL LETTER PE +<U043F> <U0070> +% CYRILLIC SMALL LETTER ER +<U0440> <U0072> +% CYRILLIC SMALL LETTER ES +<U0441> <U0073> +% CYRILLIC SMALL LETTER TE +<U0442> <U0074> +% CYRILLIC SMALL LETTER U +<U0443> <U0075> +% CYRILLIC SMALL LETTER U WITH COMBINING ACUTE ACCENT +<U0443><U0301> <U00FA>;"<U0075><U0060>" +% CYRILLIC SMALL LETTER EF +<U0444> <U0066> +% CYRILLIC SMALL LETTER HA +<U0445> <U0068>;<U0078> +% CYRILLIC SMALL LETTER TSE +<U0446> 
<U0063>;"<U0063><U007A>" +% CYRILLIC SMALL LETTER CHE +<U0447> <U010D>;"<U0063><U0068>" +% CYRILLIC SMALL LETTER SHA +<U0448> <U0161>;"<U0073><U0068>" +% CYRILLIC SMALL LETTER SHCHA +<U0449> <U015D>;"<U0073><U0068><U0068>" +% CYRILLIC SMALL LETTER HARD SIGN +<U044A> <U02BA>;"<U0060><U0060>" +% CYRILLIC SMALL LETTER YERU +<U044B> <U0079>;"<U0079><U0060>" +% CYRILLIC SMALL LETTER SOFT SIGN +<U044C> <U02B9>;<U0060> +% CYRILLIC SMALL LETTER E +<U044D> <U00E8>;"<U0065><U0060>" +% CYRILLIC SMALL LETTER YU +<U044E> <U00FB>;"<U0079><U0075>" +% CYRILLIC SMALL LETTER YA +<U044F> <U00E2>;"<U0079><U0061>" +% CYRILLIC SMALL LETTER IO +<U0451> <U00EB>;"<U0079><U006F>" +% CYRILLIC SMALL LETTER DJE +<U0452> <U0111>;"<U0064><U006A>" +% CYRILLIC SMALL LETTER GJE +<U0453> <U01F5>;"<U0067><U0060>" +% CYRILLIC SMALL LETTER UKRAINIAN IE +<U0454> <U00EA>;"<U0079><U0065>" +% CYRILLIC SMALL LETTER DZE +<U0455> <U1E91>;"<U007A><U0060>" +% CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I +<U0456> <U00EC>;<U0069> +% CYRILLIC SMALL LETTER YI +<U0457> <U00EF>;"<U0079><U0069>" +% CYRILLIC SMALL LETTER JE +<U0458> <U01F0>;<U006A> +% CYRILLIC SMALL LETTER LJE +<U0459> "<U006C><U0302>";"<U006C><U0060>" +% CYRILLIC SMALL LETTER NJE +<U045A> "<U006E><U0302>";"<U006E><U0060>" +% CYRILLIC SMALL LETTER TSHE +<U045B> <U0107>;"<U0074><U0073><U0068>" +% CYRILLIC SMALL LETTER KJE +<U045C> <U1E31>;"<U006B><U0060>" +% CYRILLIC SMALL LETTER SHORT U +<U045E> <U016D>;"<U0075><U0060>" +% CYRILLIC SMALL LETTER DZHE +<U045F> "<U0064><U0302>";"<U0064><U0068>" +% CYRILLIC CAPITAL LETTER BIG YUS +<U046A> <U01CD>;"<U004F><U0060>" +% CYRILLIC SMALL LETTER BIG YUS +<U046B> <U01CE>;"<U006F><U0060>" +% CYRILLIC CAPITAL LETTER FITA +<U0472> "<U0046><U0300>";"<U0046><U0048>" +% CYRILLIC SMALL LETTER FITA +<U0473> "<U0066><U0300>";"<U0066><U0068>" +% CYRILLIC CAPITAL LETTER IZHITSA +<U0474> <U1EF2>;"<U0059><U0048>" +% CYRILLIC SMALL LETTER IZHITSA +<U0475> <U1EF3>;"<U0079><U0068>" +% CYRILLIC CAPITAL LETTER SEMISOFT SIGN 
+<U048C> <U011A>;"<U0045><U0060>" +% CYRILLIC SMALL LETTER SEMISOFT SIGN +<U048D> <U011B>;"<U0065><U0060>" +% CYRILLIC CAPITAL LETTER GHE WITH UPTURN +<U0490> "<U0047><U0300>";"<U0047><U0060>" +% CYRILLIC SMALL LETTER GHE WITH UPTURN +<U0491> "<U0067><U0300>";"<U0067><U0060>" +% CYRILLIC CAPITAL LETTER GHE WITH STROKE +<U0492> <U0120>;"<U0047><U0048>" +% CYRILLIC SMALL LETTER GHE WITH STROKE +<U0493> <U0121>;"<U0067><U0068>" +% CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK +<U0494> <U011E>;"<U0047><U0048>" +% CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK +<U0495> <U011F>;"<U0067><U0068>" +% CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER +<U0496> "<U017D><U0327>";"<U005A><U0048><U0060>" +% CYRILLIC SMALL LETTER ZHE WITH DESCENDER +<U0497> "<U017E><U0327>";"<U007A><U0068><U0060>" +% CYRILLIC CAPITAL LETTER KA WITH DESCENDER +<U049A> <U0136>;"<U004B><U0060>" +% CYRILLIC SMALL LETTER KA WITH DESCENDER +<U049B> <U0137>;"<U006B><U0060>" +% CYRILLIC CAPITAL LETTER KA WITH STROKE +<U049E> "<U004B><U0304>";"<U004B><U0060>" +% CYRILLIC SMALL LETTER KA WITH STROKE +<U049F> "<U006B><U0304>";"<U006B><U0060>" +% CYRILLIC CAPITAL LETTER EN WITH DESCENDER +<U04A2> <U1E46>;"<U004E><U0060>" +% CYRILLIC SMALL LETTER EN WITH DESCENDER +<U04A3> <U1E47>;"<U006E><U0060>" +% CYRILLIC CAPITAL LIGATURE EN GHE +<U04A4> <U1E44>;"<U004E><U0047>" +% CYRILLIC SMALL LIGATURE EN GHE +<U04A5> <U1E45>;"<U006E><U0067>" +% CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK +<U04A6> <U1E54>;"<U0050><U0060>" +% CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK +<U04A7> <U1E55>;"<U0070><U0060>" +% CYRILLIC CAPITAL LETTER ABKHASIAN HA +<U04A8> <U00D2>;"<U004F><U0060>" +% CYRILLIC SMALL LETTER ABKHASIAN HA +<U04A9> <U00F2>;"<U006F><U0060>" +% CYRILLIC CAPITAL LETTER ES WITH DESCENDER +<U04AA> <U00C7>;"<U0043><U0060>" +% CYRILLIC SMALL LETTER ES WITH DESCENDER +<U04AB> <U00E7>;"<U0063><U0060>" +% CYRILLIC CAPITAL LETTER TE WITH DESCENDER +<U04AC> <U0162>;"<U0054><U0060>" +% CYRILLIC SMALL LETTER TE WITH DESCENDER +<U04AD>
<U0163>;"<U0074><U0060>" +% CYRILLIC CAPITAL LETTER STRAIGHT U +<U04AE> <U00D9>;<U0055> +% CYRILLIC SMALL LETTER STRAIGHT U +<U04AF> <U00F9>;<U0075> +% CYRILLIC CAPITAL LETTER HA WITH DESCENDER +<U04B2> <U1E28>;"<U0048><U0060>" +% CYRILLIC SMALL LETTER HA WITH DESCENDER +<U04B3> <U1E29>;"<U0068><U0060>" +% CYRILLIC CAPITAL LIGATURE TE TSE +<U04B4> "<U0043><U0304>";"<U0054><U0043><U005A>" +% CYRILLIC SMALL LIGATURE TE TSE +<U04B5> "<U0063><U0304>";"<U0074><U0063><U007A>" +% CYRILLIC CAPITAL LETTER SHHA +<U04BA> <U1E24>;"<U0053><U0048><U0060>" +% CYRILLIC SMALL LETTER SHHA +<U04BB> <U1E25>;"<U0073><U0068><U0060>" +% CYRILLIC CAPITAL LETTER ABKHASIAN CHE +<U04BC> "<U0043><U0306>";"<U0043><U0048><U0060>" +% CYRILLIC SMALL LETTER ABKHASIAN CHE +<U04BD> "<U0063><U0306>";"<U0063><U0068><U0060>" +% CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER +<U04BE> "<U00C7><U0306>";"<U0043><U0048><U0060>" +% CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER +<U04BF> "<U00E7><U0306>";"<U0063><U0068><U0060>" +% CYRILLIC LETTER PALOCHKA +<U04C0> <U2021>;<U0069> +% CYRILLIC CAPITAL LETTER ZHE WITH BREVE +<U04C1> "<U005A><U0306>";"<U005A><U0048><U0060>" +% CYRILLIC SMALL LETTER ZHE WITH BREVE +<U04C2> "<U007A><U0306>";"<U007A><U0068><U0060>" +% CYRILLIC CAPITAL LETTER KHAKASSIAN CHE +<U04CB> <U00C7>;"<U0043><U0048><U0060>" +% CYRILLIC SMALL LETTER KHAKASSIAN CHE +<U04CC> <U00E7>;"<U0063><U0068><U0060>" +% CYRILLIC CAPITAL LETTER A WITH BREVE +<U04D0> <U0102>;"<U0041><U0060>" +% CYRILLIC SMALL LETTER A WITH BREVE +<U04D1> <U0103>;"<U0061><U0060>" +% CYRILLIC CAPITAL LETTER A WITH DIAERESIS +<U04D2> <U00C4>;"<U0041><U0060>" +% CYRILLIC SMALL LETTER A WITH DIAERESIS +<U04D3> <U00E4>;"<U0061><U0060>" +% CYRILLIC CAPITAL LETTER IE WITH BREVE +<U04D6> <U0114>;"<U0045><U0060>" +% CYRILLIC SMALL LETTER IE WITH BREVE +<U04D7> <U0115>;"<U0065><U0060>" +% CYRILLIC CAPITAL LETTER SCHWA +<U04D8> "<U0041><U030B>";"<U0041><U0060>" +% CYRILLIC SMALL LETTER SCHWA +<U04D9>
"<U0061><U030B>";"<U0061><U0060>" +% CYRILLIC CAPITAL LETTER ZHE WITH DIAERESIS +<U04DC> "<U005A><U0304>";"<U005A><U0048><U0060>" +% CYRILLIC SMALL LETTER ZHE WITH DIAERESIS +<U04DD> "<U007A><U0304>";"<U007A><U0068><U0060>" +% CYRILLIC CAPITAL LETTER ZE WITH DIAERESIS +<U04DE> "<U005A><U0308>";"<U005A><U0060>" +% CYRILLIC SMALL LETTER ZE WITH DIAERESIS +<U04DF> "<U007A><U0308>";"<U007A><U0060>" +% CYRILLIC CAPITAL LETTER ABKHASIAN DZE +<U04E0> <U0179>;"<U005A><U0060>" +% CYRILLIC SMALL LETTER ABKHASIAN DZE +<U04E1> <U017A>;"<U007A><U0060>" +% CYRILLIC CAPITAL LETTER I WITH DIAERESIS +<U04E4> <U00CE>;"<U0049><U0060>" +% CYRILLIC SMALL LETTER I WITH DIAERESIS +<U04E5> <U00EE>;"<U0069><U0060>" +% CYRILLIC CAPITAL LETTER O WITH DIAERESIS +<U04E6> <U00D6>;"<U004F><U0060>" +% CYRILLIC SMALL LETTER O WITH DIAERESIS +<U04E7> <U00F6>;"<U006F><U0060>" +% CYRILLIC CAPITAL LETTER BARRED O +<U04E8> <U00D4>;"<U004F><U0060>" +% CYRILLIC SMALL LETTER BARRED O +<U04E9> <U00F4>;"<U006F><U0060>" +% CYRILLIC CAPITAL LETTER U WITH DIAERESIS +<U04F0> <U00DC>;"<U0055><U0060>" +% CYRILLIC SMALL LETTER U WITH DIAERESIS +<U04F1> <U00FC>;"<U0075><U0060>" +% CYRILLIC CAPITAL LETTER U WITH DOUBLE ACUTE +<U04F2> <U0170>;"<U0055><U0060>" +% CYRILLIC SMALL LETTER U WITH DOUBLE ACUTE +<U04F3> <U0171>;"<U0075><U0060>" +% CYRILLIC CAPITAL LETTER CHE WITH DIAERESIS +<U04F4> "<U0043><U0308>";"<U0043><U0048><U0060>" +% CYRILLIC SMALL LETTER CHE WITH DIAERESIS +<U04F5> "<U0063><U0308>";"<U0063><U0068><U0060>" +% CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS +<U04F8> <U0178>;"<U0059><U0060>" +% CYRILLIC SMALL LETTER YERU WITH DIAERESIS +<U04F9> <U00FF>;"<U0079><U0060>" +% RIGHT SINGLE QUOTATION MARK +<U2019> <U2035>;<U0027> + +translit_end + +END LC_CTYPE diff --git a/localedata/locales/ts_ZA b/localedata/locales/ts_ZA index 0256e42979..8e16fc02ae 100644 --- a/localedata/locales/ts_ZA +++ b/localedata/locales/ts_ZA @@ -62,6 +62,7 @@ copy "i18n" translit_start include "translit_combining";"" +include 
"translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/unm_US b/localedata/locales/unm_US index 1e62c60443..66cb4f7210 100644 --- a/localedata/locales/unm_US +++ b/localedata/locales/unm_US @@ -48,6 +48,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/ur_IN b/localedata/locales/ur_IN index 062cbf0937..38675b8c6b 100644 --- a/localedata/locales/ur_IN +++ b/localedata/locales/ur_IN @@ -46,6 +46,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/ur_PK b/localedata/locales/ur_PK index aaf47fceb5..4ea9c56100 100644 --- a/localedata/locales/ur_PK +++ b/localedata/locales/ur_PK @@ -49,6 +49,7 @@ LC_CTYPE copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" % those two lettes are not in cp1256... diff --git a/localedata/locales/ve_ZA b/localedata/locales/ve_ZA index 6b80455c98..1964162cc4 100644 --- a/localedata/locales/ve_ZA +++ b/localedata/locales/ve_ZA @@ -65,6 +65,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/vi_VN b/localedata/locales/vi_VN index 7fac1fbbcc..8eac6f3ba9 100644 --- a/localedata/locales/vi_VN +++ b/localedata/locales/vi_VN @@ -53,6 +53,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" % dong sign -> d// -> dd <U20AB> "<U0111>";"<U0064><U0064>" diff --git a/localedata/locales/wa_BE b/localedata/locales/wa_BE index e97493089e..6349142ef7 100644 --- a/localedata/locales/wa_BE +++ b/localedata/locales/wa_BE @@ -54,6 +54,7 @@ LC_CTYPE copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" % A-bole -> A-circonflecse -> AU <U00C5> "A<U030A>";"A";"AU" diff --git a/localedata/locales/wo_SN 
b/localedata/locales/wo_SN index 47263d2eab..bd466d934a 100644 --- a/localedata/locales/wo_SN +++ b/localedata/locales/wo_SN @@ -53,6 +53,7 @@ translit_start % Accents are simply omitted if they cannot be represented. include "translit_combining";"" +include "translit_cyrillic";"" translit_end diff --git a/localedata/locales/xh_ZA b/localedata/locales/xh_ZA index 4564137e85..5bd3d5bd3c 100644 --- a/localedata/locales/xh_ZA +++ b/localedata/locales/xh_ZA @@ -64,6 +64,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/yi_US b/localedata/locales/yi_US index 95963830fc..edd55f77e9 100644 --- a/localedata/locales/yi_US +++ b/localedata/locales/yi_US @@ -60,6 +60,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" % if digraphs are not available (this is the case with iso-8859-8) % then use the single letters diff --git a/localedata/locales/yuw_PG b/localedata/locales/yuw_PG index 0cb3cadf4a..b9e393d354 100644 --- a/localedata/locales/yuw_PG +++ b/localedata/locales/yuw_PG @@ -40,6 +40,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE diff --git a/localedata/locales/zh_CN b/localedata/locales/zh_CN index 62a46415c1..00f2332dde 100644 --- a/localedata/locales/zh_CN +++ b/localedata/locales/zh_CN @@ -58,6 +58,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end class "hanzi"; / diff --git a/localedata/locales/zu_ZA b/localedata/locales/zu_ZA index cf93a63009..ab37a145b2 100644 --- a/localedata/locales/zu_ZA +++ b/localedata/locales/zu_ZA @@ -68,6 +68,7 @@ copy "i18n" translit_start include "translit_combining";"" +include "translit_cyrillic";"" translit_end END LC_CTYPE -- 2.17.1
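
For anyone who wants to sanity-check individual entries of the table without building glibc, the following is a minimal Python sketch. It is not how glibc or iconv(1) implement transliteration; it only models the idea that each rule lists candidates in order and the first candidate representable in the target charset is used. The names `RULES`, `decode`, `parse_rules` and `transliterate` are hypothetical, the four rules are copied from the table in this patch, and only the `<Uxxxx>` notation is handled (literal characters such as those in wa_BE are ignored).

```python
import re

# A few rules copied verbatim from translit_cyrillic above.
RULES = '''
% CYRILLIC CAPITAL LETTER ZHE
<U0416> <U017D>;"<U005A><U0048>"
<U0443> <U0075>
<U043A> <U006B>
<U0429> <U015C>;"<U0053><U0048><U0048>"
'''

UCS = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode(token):
    """Turn '<U005A><U0048>' (quoted or not) into the string 'ZH'."""
    return "".join(chr(int(h, 16)) for h in UCS.findall(token))


def parse_rules(text):
    """Map each source string to its ordered list of candidates."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        source, candidates = line.split(None, 1)
        table[decode(source)] = [decode(c) for c in candidates.split(";")]
    return table


def encodable(s, encoding):
    """Return True if s can be represented in the target charset."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def transliterate(text, table, encoding="ascii"):
    """Replace each unencodable character by its first encodable candidate.

    Assumption of this sketch: lookup is strictly per character, so
    multi-character source keys are never consulted.
    """
    out = []
    for ch in text:
        if encodable(ch, encoding):
            out.append(ch)
            continue
        for cand in table.get(ch, []):
            if encodable(cand, encoding):
                out.append(cand)
                break
        else:
            out.append("?")  # no rule, or no candidate fits the charset
    return "".join(out)


if __name__ == "__main__":
    table = parse_rules(RULES)
    print(transliterate("Жук", table))               # ASCII target
    print(transliterate("Жук", table, "iso8859-2"))  # Latin-2 target
```

With an ASCII target the second candidate of each rule is chosen, while a Latin-2 target can keep the first candidate for ZHE. Because the lookup is per character, a two-code-point source such as `<U0423><U0301>` is never matched by this model, which is the same kind of case the review discussion asks to have covered by real tests.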