AutoJudge is an optimized LLM-as-a-Judge eval implementation

shisa-ai/autojudge

AutoJudge

AutoJudge is an optimized LLM-as-a-Judge eval implementation.

  • Simpler, more intuitive execution.

  • Runs completely standalone, so it can be called from scripted evals (e.g., run automatically after training).

  • Internal database/queue for resuming interrupted runs (useful when paying by the token).

  • High performance: vLLM inference, with an HF (Hugging Face) fallback, and batching.

  • Config support organized by evals, judges, and results. Results are organized by run, so earlier results are never overwritten and PRs stay easy.
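The resume behaviour described in the feature list can be illustrated with a small SQLite-backed queue. This is a minimal sketch only and does not reflect AutoJudge's actual schema or API: the table name `jobs`, the functions `open_queue`, `enqueue`, and `run_pending`, and the `judge` callable are all hypothetical names chosen for illustration.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open (or create) a hypothetical job queue; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " item_id TEXT PRIMARY KEY,"
        " prompt TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " score REAL)"
    )
    conn.commit()
    return conn


def enqueue(conn, items):
    """Add (item_id, prompt) pairs; rows left by an earlier run are kept untouched."""
    conn.executemany(
        "INSERT OR IGNORE INTO jobs (item_id, prompt) VALUES (?, ?)", items
    )
    conn.commit()


def run_pending(conn, judge):
    """Judge only the rows still marked pending; return how many were processed."""
    rows = conn.execute(
        "SELECT item_id, prompt FROM jobs WHERE status = 'pending' ORDER BY item_id"
    ).fetchall()
    for item_id, prompt in rows:
        score = judge(prompt)  # assumed to be the expensive, paid-per-token call
        conn.execute(
            "UPDATE jobs SET status = 'done', score = ? WHERE item_id = ?",
            (score, item_id),
        )
        conn.commit()  # commit per item so a crash loses at most one judgement
    return len(rows)
```

Because each result is committed as soon as it is produced, re-running the same script after an interruption only pays for the items that were not yet judged.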

Install

To install the required dependencies, run:

pip install packaging
pip install -r requirements.txt

To install the optional dependencies, run:

pip install -r requirements-optional.txt

Usage

autojudge evaluate --model <model-path> --dataset <dataset-path> --output <output-path> --user <user>
autojudge config

Development

To set up the development environment, run:

pip install -e ".[dev]"

TODO

Tests:

  • LLM-as-a-Judge Reliability Testing
    • Sample 10% of the eval items and run the judge on each 10 times
    • Plot a violin graph of the resulting score distribution
  • PoLL
