A library & tools to evaluate predictive language models.
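Since the library is not named in this listing, here is a minimal, generic sketch of what evaluating a predictive (autoregressive) language model typically involves, computing perplexity with Hugging Face transformers; the model name and sample text below are illustrative assumptions, not part of the listed library.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical checkpoint chosen for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Language models assign probabilities to token sequences."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels == input_ids, the model returns the mean
    # next-token cross-entropy loss over the sequence.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

# Perplexity is the exponential of the mean cross-entropy.
print(f"Perplexity: {torch.exp(loss).item():.2f}")
```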
Code and data for "KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark" (LREC-COLING 2024)
A data construction and evaluation framework to quantify privacy norm awareness of language models (LMs) and the emerging privacy risks of LM agents (NeurIPS 2024 D&B)
Curriculum is a new NLI benchmark format for evaluating broad-coverage linguistic phenomena. This linguistic-phenomena-driven benchmark can serve as an effective tool for diagnosing model behavior and verifying the quality of model learning.
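As a rough illustration of the NLI setup such a benchmark evaluates (not Curriculum's actual schema or loading code, which are not shown here), a premise-hypothesis pair can be scored with any MNLI-trained classifier; the checkpoint below is an assumption for illustration.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "facebook/bart-large-mnli"  # assumed MNLI-trained checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.eval()

premise = "A benchmark probes many linguistic phenomena."
hypothesis = "The benchmark covers a broad range of phenomena."

# NLI models take the premise and hypothesis as a sentence pair.
enc = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits

# Map the argmax logit to a label, e.g. entailment / neutral / contradiction.
print(model.config.id2label[logits.argmax(-1).item()])
```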
A thesis investigating the use of large language models for summarizing application logs.