A library & tools to evaluate predictive language models.
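Since the library is not named in this listing, here is a minimal, generic sketch of what evaluating a predictive (autoregressive) language model typically involves, computing perplexity with Hugging Face transformers; the model name and sample text below are illustrative assumptions, not part of the listed library.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical checkpoint chosen for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Language models assign probabilities to token sequences."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels == input_ids, the model returns the mean
    # next-token cross-entropy loss over the sequence.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

# Perplexity is the exponential of the mean cross-entropy.
print(f"Perplexity: {torch.exp(loss).item():.2f}")
```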
Code and data for "KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark" (LREC-COLING 2024)
A data construction and evaluation framework to quantify privacy norm awareness of language models (LMs) and the emerging privacy risks of LM agents (NeurIPS 2024 D&B)
Curriculum is a new NLI benchmark format for evaluating broad-coverage linguistic phenomena. This linguistic-phenomena-driven benchmark can serve as an effective tool for diagnosing model behavior and verifying the quality of model learning.
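As a rough illustration of the NLI setup such a benchmark evaluates (not Curriculum's actual schema or loading code, which are not shown here), a premise-hypothesis pair can be scored with any MNLI-trained classifier; the checkpoint below is an assumption for illustration.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "facebook/bart-large-mnli"  # assumed MNLI-trained checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.eval()

premise = "A benchmark probes many linguistic phenomena."
hypothesis = "The benchmark covers a broad range of phenomena."

# NLI models take the premise and hypothesis as a sentence pair.
enc = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits

# Map the argmax logit to a label, e.g. entailment / neutral / contradiction.
print(model.config.id2label[logits.argmax(-1).item()])
```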
A thesis investigating the use of large language models for summarizing application logs.