
[QUESTION] Disambiguation using unfactored bert model does not yield same results as using the Camelira Web Interface #130

Open
amsu2 opened this issue Dec 16, 2023 · 1 comment

amsu2 commented Dec 16, 2023

I installed the project and completed the setup.

I used the example code from https://camel-tools.readthedocs.io/en/stable/api/disambig/bert.html#examples.
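
For reference, here is a minimal sketch of what I'm running, adapted from that docs example (the tokenizer, class names, and field names are as I understand them from the camel-tools docs; the sentence is just an illustration):

```python
from camel_tools.disambig.bert import BERTUnfactoredDisambiguator
from camel_tools.tokenizers.word import simple_word_tokenize

# Load the pretrained unfactored BERT disambiguator (default MSA model).
unfactored = BERTUnfactoredDisambiguator.pretrained()

# Tokenize the input sentence and disambiguate it.
sentence = simple_word_tokenize('وهي مدرسة')
disambig = unfactored.disambiguate(sentence)

# Take the diacritized form of the top-scoring analysis for each word.
diacritized = [d.analyses[0].analysis['diac'] for d in disambig]
print(' '.join(diacritized))
```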

I tried out various input sentences. In pretty much every sentence, often on verbs, the last letter is left without a diacritic.

More importantly, every so often a word gets disambiguated completely differently from what the Camelira website produces, and the weightings differ as well.

Example:
Input: وهي مدرسة
Output: وَهِيَ مَدْرَسَةٌ
Camelira Website Output: وَهِيَ مُدَرِّسَةٌ

For some words, not only are the weightings (or the choice between two analyses scored 1.0) different, but the analysis itself is completely different.

Example:
Input: مهمة
Output: مَهَمَّةً
Camelira Website Output: 30 variants of مُهِمَّةٌ; mahammah is not included even once.
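
This is roughly how I'm inspecting the candidate analyses and their weightings for a single word (again just a sketch; I'm assuming the ScoredAnalysis fields score and analysis and the 'diac'/'lex'/'pos' keys described in the docs):

```python
# Print the top candidate analyses for a single word with their scores.
word_result = unfactored.disambiguate(['مهمة'])[0]
for scored in word_result.analyses[:10]:
    a = scored.analysis
    print(scored.score, a['diac'], a['lex'], a['pos'])
```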

Thanks in advance for your help. I'm a CS student and have been interested in linguistics and Arabic for a few years now; I'm a big fan of your work. This would really help me.

Windows 10, Python 3.9

@Hamed1Hamed

I have the same issue!
