Transliteration not proper for few characters in Tamil #11
Comments
Thanks for pointing this out. The extended ITRANS standard we defined probably does not have a mapping for this character. I will check this over the weekend.
I wonder how this transliteration compares to the open-tamil package. Anoop, will you be publishing this package on the Python package repository (PyPI)? Where are the unit tests for this project? I can't seem to find them.
The open-tamil package also has some problems handling certain Unicode sequences. You have to type the text out in Tamil explicitly to get the best results. The discrepancy I faced is like so -
I used the open-tamil package. In the two scenarios the letters came from different sources, i.e. different texts.
@vrindaprabhu - please create a suitable issue and we can address it.
@vrindaprabhu - I checked on Python 3 with Open-Tamil version 0.51, and I'm not seeing the issue you report. get_letters() returns just 1 letter as the only element of the list.
Strange. As I mentioned, it probably depends on how "தொ" is written. I did not face the issue all the time either, only with a few particular sentences in the corpus.
@vrindaprabhu - there are Unicode normalization issues, and these are fixed in version 0.65.
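For readers following along, here is a minimal sketch of the kind of normalization issue discussed above. It assumes the discrepancy comes from "தொ" being encoded in two different ways (one vowel sign versus two), which the thread suggests but does not confirm. It uses only Python's standard `unicodedata` module, not open-tamil itself, and `normalize_tamil` is a hypothetical helper name.

```python
import unicodedata

# Two encodings that render as the same syllable "தொ".
# Assumption: this is the difference in "how the letter is written" mentioned above.
precomposed = "\u0ba4\u0bca"        # TA + VOWEL SIGN O (single vowel sign)
decomposed = "\u0ba4\u0bc6\u0bbe"   # TA + VOWEL SIGN E + VOWEL SIGN AA (two vowel signs)


def normalize_tamil(text: str) -> str:
    """Hypothetical helper: bring text to NFC so equivalent sequences compare equal."""
    return unicodedata.normalize("NFC", text)


# The raw strings differ in code points and in length.
print(precomposed == decomposed)
print(len(precomposed), len(decomposed))

# After NFC normalization the two encodings are identical.
print(normalize_tamil(decomposed) == normalize_tamil(precomposed))
```

Normalizing corpus text this way before splitting it into letters may avoid a single syllable being reported as more than one letter, though whether it matches the fix in version 0.65 is not stated in this thread.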
Please find below the code for transliterating from Tamil to English.