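By design, the z-score rescales scores per annotator (each annotator's scores have that annotator's mean subtracted and are divided by their standard deviation), so that there are no big discrepancies between lenient and harsh annotators. This should make scores more consistent across languages, but it is still not very meaningful to compare scores between languages.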
However, I used the same method as COMET in my estimator for EuroLLM here: https://medium.com/p/7dccfe167814. Since WMT24 provides the same English source for most language pairs, you can see that scores do not differ much across the traditional pairs. The real question is rather: do we really trust the DA scores year after year?
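To make the per-annotator rescaling concrete, here is a minimal sketch. It assumes each DA judgment is a dict with hypothetical fields `annotator`, `lp` (language pair) and `raw` (a 0-100 score), and it uses the population standard deviation; this is an illustration of z = (x - mean) / std computed within each group, not COMET's or WMT's actual code. The grouping key is a parameter, so per-annotator and per-direction normalization can be compared.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_by(records, key="annotator"):
    """Return copies of records with a "z" field: raw DA scores standardized
    within each group defined by `key` (e.g. "annotator" or "lp")."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec["raw"])
    # Per-group mean and population standard deviation (an assumption;
    # a sample standard deviation would also be a reasonable choice).
    stats = {g: (mean(s), pstdev(s)) for g, s in groups.items()}
    out = []
    for rec in records:
        mu, sigma = stats[rec[key]]
        # A group whose scores are all identical has zero spread; map it to 0.
        z = (rec["raw"] - mu) / sigma if sigma > 0 else 0.0
        out.append({**rec, "z": z})
    return out


def mean_z_by_pair(records):
    """Average z-scores per language pair, e.g. to compare en-de with en-cs."""
    by_pair = defaultdict(list)
    for rec in records:
        by_pair[rec["lp"]].append(rec["z"])
    return {lp: mean(zs) for lp, zs in by_pair.items()}


# Hypothetical toy data: annotator A is lenient, annotator B is harsh.
records = [
    {"annotator": "A", "lp": "en-de", "raw": 90},
    {"annotator": "A", "lp": "en-cs", "raw": 70},
    {"annotator": "B", "lp": "en-de", "raw": 60},
    {"annotator": "B", "lp": "en-cs", "raw": 40},
]
scored = zscore_by(records, key="annotator")
print([r["z"] for r in scored])  # [1.0, -1.0, 1.0, -1.0]
print(mean_z_by_pair(scored))    # {'en-de': 1.0, 'en-cs': -1.0}
```

Note that B's en-de judgment (raw 60) ends up with the same z as A's en-de judgment (raw 90), which is the point of normalizing per annotator. If you instead call `zscore_by(records, key="lp")`, every language pair is centered at 0 by construction, so per-pair averages can no longer show any difference between directions.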
Hi, I see that COMET is trained on the z-scores of Direct Assessment (DA) judgments. However, I am not sure about the implementation.
Is the rescaling done at the translation-direction level, or at some other level?