OSAIL (OpenSource AI Leaderboard) is a service that lets you evaluate LLMs by computing an Elo rating for a set of given models. OSAIL uses a Judge LLM (which can be configured) to evaluate the candidate responses.
Go to the releases page and download the release that best suits your operating system and architecture. Once downloaded, launch it using:

```shell
./osail
```

Then open your browser and go to `localhost:8080` to see the starting page.
To build from source, clone the repository:

```shell
git clone https://github.com/softmaxer/osail.git
```

Make sure to have Go > 1.21 installed, then install templ:

```shell
go install github.com/a-h/templ/cmd/templ@latest
```

Then build with:

```shell
make
```
- Before starting an experiment, make sure you have Ollama installed, as the judge model runs natively on your machine.
- The experiments and the ratings of the models are stored in an SQLite database. So go ahead and create a new `my_database.db` file and include it in your `.env` file at the root (or use an existing database file from anywhere on your system).
- The system prompt defines the general behaviour of the LLM, i.e., whether it should respond in a certain format or with a certain tone, etc.
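For reference, the `.env` file might look something like this (the variable name `DB_PATH` is an assumption for illustration; check the repository for the exact key OSAIL expects):

```shell
# .env at the repository root (variable name is illustrative)
DB_PATH=./my_database.db
```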
- The `PROMPT` file should be a `.txt` file that can contain multiple prompts separated by `----` (four dashes). NOTE: The field `system prompt` can be used as a prompt template, and the `prompts` can be the actual text, as they will be concatenated for the inference.
- Choose a judge model that either exists locally on your PC or is one of the models available from Ollama. NOTE: Since OSAIL uses Ollama as the main inference engine, any of the model tags present on their website should work out of the box (given you have enough resources on your machine)!
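As an illustration, a prompt `.txt` file with two prompts separated by the four-dash delimiter could look like this (the prompt text itself is just an example):

```text
Summarize the following article in two sentences.
----
Translate the following paragraph into French.
```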
- Refer to the example `multi-ollama.yml` file to see how to start up multiple Ollama instances as Docker containers.
- Configure the ports they should run on as you see fit.
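If you don't have the example file at hand, a minimal sketch of running two Ollama instances on different host ports with Docker Compose might look like this (service names and host ports are assumptions; the actual `multi-ollama.yml` in the repository is authoritative):

```yaml
# Illustrative only -- see the repository's multi-ollama.yml for the real config.
services:
  ollama-1:
    image: ollama/ollama   # official Ollama image
    ports:
      - "11434:11434"      # Ollama's default port
  ollama-2:
    image: ollama/ollama
    ports:
      - "11435:11434"      # second instance mapped to a different host port
```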
- Once you have filled out all the models that need to run for the experiment, press run and wait for it to finish. NOTE: This may take a very long time depending on the resources available on your machine; ideally it should be run with a GPU.
- Human-in-the-Loop (HIL) annotation
- Publishing an experiment / Sharing experiments
- UI changes
Any and all contributions are more than welcome, whether for upcoming features or bug fixes! To get started, clone the repository and make sure to have Go > 1.21 and templ installed. The app itself is built with templating and server-side rendering, meaning the API is written to respond in HTML to HTMX requests from the DOM elements. For any questions about the code or bugs, please raise an issue and wait for the maintainers to respond.