Cyrillic "й" is usually better represented with "y" and "ё" with "yo" #2
text-unidecode runs a Perl script to get a list of transliteration rules; the bug tracker for the Perl library is https://rt.cpan.org/Public/Dist/Display.html?Name=Text-Unidecode. You can also monkey-patch the rules; check the module source code, it is very short. I think you can do something like this:

```python
import text_unidecode

# _replaces is indexed by codepoint - 1
text_unidecode._replaces[ord('ё')-1] = 'yo'
text_unidecode._replaces[ord('Ё')-1] = 'Yo'
```

If you only work with Russian, there are Russian-specific transliteration packages available, e.g. https://github.com/j2a/pytils.
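As an alternative to assigning into the private `_replaces` list, a wrapper can apply Russian-specific overrides first and hand everything else to a general transliterator. This is a sketch under assumptions, not text-unidecode's API: `RU_TABLE` and `transliterate` are hypothetical names, the table uses the "й" → "y" and "ё" → "yo" choices requested in this issue, and the other letter choices (e.g. "х" → "kh") are just one common convention among several.

```python
from typing import Callable, Dict, Optional

# Hypothetical lowercase table; "й" -> "y" and "ё" -> "yo" follow this issue,
# the remaining choices are one common convention, not a standard.
_LOWER: Dict[str, str] = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def _build_table(lower: Dict[str, str]) -> Dict[str, str]:
    """Add uppercase letters, capitalizing the first Latin character."""
    table = dict(lower)
    for ch, latin in lower.items():
        table[ch.upper()] = latin[:1].upper() + latin[1:]
    return table


RU_TABLE = _build_table(_LOWER)


def transliterate(text: str,
                  fallback: Optional[Callable[[str], str]] = None) -> str:
    """Use RU_TABLE where it has an entry; otherwise call fallback (if any)
    on the character, or keep the character unchanged."""
    out = []
    for ch in text:
        if ch in RU_TABLE:
            out.append(RU_TABLE[ch])
        elif fallback is not None:
            out.append(fallback(ch))
        else:
            out.append(ch)
    return "".join(out)


print(transliterate("Ёж Василий"))  # Yozh Vasiliy
```

Passing `text_unidecode.unidecode` as `fallback` would route non-Russian characters through text-unidecode while leaving its data untouched. Capitalization is derived from the lowercase table, so all-caps input such as "ЩИ" comes out as "ShchI".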
Thanks @kmike! What if we go further than Perl's version? I suppose we may patch and regenerate …
@rudyryk could you please explain the approach you're thinking about in more detail? Fixing the Perl library, getting the fix merged upstream, and then regenerating the data.bin file looks like the ideal path.
@kmike I would try to dump the patched … Fixing the Perl library is also possible, I suppose, but I'm not sure where to submit fixes; the CPAN version seems to have been unmaintained for some years.
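For the "patch and regenerate" route, the data only needs a small round-trip helper. This sketch assumes the layout implied by the snippet earlier in the thread and the module source: `data.bin` holds UTF-8 replacement strings separated by NUL characters, and list entry `i` belongs to codepoint `i + 1`. All function names are hypothetical, and the demo builds a table in memory instead of reading the real file.

```python
from typing import Dict, List


def load_table(raw: bytes) -> List[str]:
    """Decode a NUL-separated UTF-8 blob into a list indexed by codepoint - 1."""
    return raw.decode("utf-8").split("\x00")


def dump_table(table: List[str]) -> bytes:
    """Inverse of load_table: join entries with NUL and encode as UTF-8."""
    return "\x00".join(table).encode("utf-8")


def patch_table(table: List[str], overrides: Dict[str, str]) -> List[str]:
    """Return a copy of table with overrides applied, growing it if needed."""
    patched = list(table)
    for ch, latin in overrides.items():
        if ch == "\x00" or "\x00" in latin:
            # NUL is the separator, so it cannot be a key or a replacement.
            raise ValueError("NUL cannot be patched or used in a replacement")
        index = ord(ch) - 1
        if index >= len(patched):
            patched.extend([""] * (index + 1 - len(patched)))
        patched[index] = latin
    return patched


def lookup(table: List[str], text: str) -> str:
    """Transliterate text with a codepoint - 1 lookup; characters beyond
    the end of the table are dropped."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            out.append("\x00")
        elif cp - 1 < len(table):
            out.append(table[cp - 1])
    return "".join(out)


table = patch_table([], {"ё": "yo", "Ё": "Yo", "й": "y"})
print(lookup(table, "ёЁй"))  # yoYoy
```

Writing `dump_table(patched)` back to `data.bin` would be the "regenerate" step; whether that beats an upstream fix to the Perl library is the trade-off being discussed in this thread.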
BTW, how did you generate …?
Now:
Better:
Yozh Vasiliy
BTW, is it possible to tweak the mapping manually?