Enable Transliteration for Individual Languages
In cases where you need a particular transliteration, you can set GenericTransliteration to False and use the per-language transliteration schemes.
- In the [MyLanguage] individual language configuration sections, set the Transliteration parameter to True or False.
When you set GenericTransliteration to False, the Content component always performs transliteration for the languages in the following table, even if you also set Transliteration to False.
| Language | Transliteration |
|---|---|
| Japanese | Half-width katakana to full-width katakana; full-width 0–9, A–Z, a–z to single-byte 0–9, A–Z, a–z |
| Chinese | Full width 0–9, A–Z, a–z to single byte 0–9, A–Z, a–z |
| Greek | Accented Greek characters to non-accented characters |
| Spanish | Accented vowels áéíóúü to non-accented vowels |
| Portuguese | Accented vowels àáâãçéêíòóôõúü to non-accented vowels |
| Russian | Some removal of characters |
| Arabic, Persian, Sindhi, Urdu, Malay, Malayalam, Pushto | Arabic character normalization |
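
As a rough illustration of the full-width to single-byte conversion listed for Japanese and Chinese, the following Python sketch maps the Unicode full-width digits and Latin letters onto their ASCII equivalents. It is not the Content component's implementation; the function and constant names are hypothetical, and it assumes the conversion is a plain one-to-one character mapping.

```python
# Illustrative sketch only: not how the Content component is implemented.
# Full-width 0-9, A-Z and a-z sit exactly 0xFEE0 code points above ASCII.
FULLWIDTH_OFFSET = 0xFEE0


def build_fullwidth_table():
    """Build a translation table for full-width 0-9, A-Z, a-z."""
    table = {}
    ranges = [(0xFF10, 0xFF19), (0xFF21, 0xFF3A), (0xFF41, 0xFF5A)]
    for start, end in ranges:
        for code_point in range(start, end + 1):
            table[code_point] = code_point - FULLWIDTH_OFFSET
    return table


FULLWIDTH_TO_ASCII = build_fullwidth_table()


def to_single_byte(text):
    """Replace full-width digits and Latin letters; leave the rest as is."""
    return text.translate(FULLWIDTH_TO_ASCII)
```

Under these assumptions, `to_single_byte("\uff21\uff22\uff23\uff11\uff12\uff13")` returns `"ABC123"`, and characters outside the three ranges, such as katakana, pass through unchanged.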
Transliteration is optional for the language groups in the following table.
| Language group | Transliteration |
|---|---|
| Western European | àáâãä=a å=aa ç=c èéêë=e ìíîï=i òóôõö=o ø=oe ùúûü=u œ(oe)=oe æ=ae ß=ss ñ=nh ý=y ð=d þ=th |
| German | Same as Western European apart from: ä=ae ö=oe ü=ue |
| Scandinavian | Same as Western European apart from: ä=ae ö=oe ü=ue |
| Catalan | Same as Western European apart from: ç=sz |
| Cyrillic | All characters mapped to A–Z. Transliteration scheme uses British Standard 2979:1958. |
| South Slavic | For transliteration scheme, refer to A Handbook of Bosnian, Serbian and Croatian by Brown & Alt. |
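
To make the Latin-script mappings in the table above easier to read, here is a small Python sketch that encodes them as lookup tables and applies them character by character. It only mirrors the table: the names `parse_scheme`, `SCHEMES`, and `transliterate` are hypothetical, the table lists lowercase characters only so uppercase input is left untouched, and the real Content component may behave differently in cases the table does not cover.

```python
import unicodedata

# Illustrative sketch only: it mirrors the table, not the product's code.


def parse_scheme(spec):
    """Turn a spec such as 'àáâãä=a å=aa' into a per-character mapping."""
    mapping = {}
    # Normalize to NFC so each accented letter is a single code point.
    for entry in unicodedata.normalize("NFC", spec).split():
        sources, target = entry.split("=")
        for char in sources:
            mapping[char] = target
    return mapping


WESTERN_EUROPEAN = parse_scheme(
    "àáâãä=a å=aa ç=c èéêë=e ìíîï=i òóôõö=o ø=oe ùúûü=u "
    "œ=oe æ=ae ß=ss ñ=nh ý=y ð=d þ=th"
)

# Each group starts from Western European and overrides a few entries.
SCHEMES = {
    "Western European": WESTERN_EUROPEAN,
    "German": {**WESTERN_EUROPEAN, **parse_scheme("ä=ae ö=oe ü=ue")},
    "Scandinavian": {**WESTERN_EUROPEAN, **parse_scheme("ä=ae ö=oe ü=ue")},
    "Catalan": {**WESTERN_EUROPEAN, **parse_scheme("ç=sz")},
}


def transliterate(text, group):
    """Apply the mapping for a language group; unmapped characters are kept."""
    scheme = SCHEMES[group]
    normalized = unicodedata.normalize("NFC", text)
    return "".join(scheme.get(char, char) for char in normalized)
```

For example, `transliterate("müller", "Western European")` gives `"muller"`, while `transliterate("müller", "German")` gives `"mueller"`, which is the difference the German row describes.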
The following table describes the languages that each of these language groups contain.
| Language group | Languages |
|---|---|
| Western European | Czech, Dutch, English, French, Hungarian, Italian, Maori, Mirandese, Polish, Portuguese, Romanian, Slovakian, Spanish, Turkish |
| German | German |
| Scandinavian | Danish, Finnish, Icelandic, Norwegian, Swedish |
| Catalan | Catalan |
| Cyrillic | Russian, Tajik |
| South Slavic | Bosnian, Serbian, Croatian |
For all other languages, transliteration does not apply, except for hyphen normalization.
For full details of the transliteration of different characters in different transliteration modes, refer to Knowledge Discovery Expert.