Use the Python scripts in this repo to run LLM model evaluation on the mmlu and tmmluplus datasets.
- Introduction on Papers with Code: Paper-with-code
- Introduction: Medium Article
- Hugging Face dataset: Huggingface Dataset
- Step 1: Download the model from Hugging Face. The following commands use the Mistral-7B-v0.1 model as an example:
```
git lfs install
git clone https://huggingface.co/mistralai/Mistral-7B-v0.1
```
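If `git lfs` is not available, the same snapshot can be fetched with the `huggingface_hub` Python library. This is a minimal sketch; the target directory `./models/Mistral-7B-v0.1` is an assumed layout, not one the scripts require:

```python
# Sketch: download the model with huggingface_hub instead of git-lfs.
# The local_dir path is an assumption; point it wherever --model expects.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="mistralai/Mistral-7B-v0.1",
    local_dir="./models/Mistral-7B-v0.1",
)
```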
- Step 2: Arrange the dataset from the tmmluplus `data` folder into the `data_arrange` folder, as sketched below.
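The exact layout that `data_arrange` expects depends on the repo's scripts; as an illustration only, here is a minimal sketch that mirrors every per-subject CSV from the tmmluplus `data` folder into `data_arrange`, preserving subfolders:

```python
# Sketch: mirror per-subject CSV files from data/ into data_arrange/.
# The source/destination paths and CSV-per-subject layout are assumptions.
import shutil
from pathlib import Path

src = Path("./llm_evaluation_tmmluplus/data")
dst = Path("./llm_evaluation_tmmluplus/data_arrange")

for csv_file in src.rglob("*.csv"):
    target = dst / csv_file.relative_to(src)
    target.parent.mkdir(parents=True, exist_ok=True)  # create split/subject dirs
    shutil.copy2(csv_file, target)
```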
- Step 3: Run the following command to generate predictions:
```
python3 evaluation_hf_testing.py \
  --model ./models/llama2-7b-hf \
  --data_dir ./llm_evaluation_tmmluplus/data_arrange/ \
  --save_dir ./llm_evaluation_tmmluplus/results/
```
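For orientation, the core of an MMLU-style prediction script is usually a loop that formats each multiple-choice question into a prompt and picks the choice letter whose token gets the highest next-token logit. The sketch below illustrates that pattern with the `transformers` API; the function name and prompt handling are illustrative, not the repo's actual code:

```python
# Sketch of a typical MMLU-style scoring step: choose the answer letter
# (A-D) with the highest next-token logit. Illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def predict_choice(model, tokenizer, prompt, choices=("A", "B", "C", "D")):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    # Take the last sub-token id of each choice letter.
    ids = [tokenizer(c, add_special_tokens=False).input_ids[-1] for c in choices]
    return choices[int(torch.argmax(logits[ids]))]

tokenizer = AutoTokenizer.from_pretrained("./models/llama2-7b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "./models/llama2-7b-hf", torch_dtype=torch.float16, device_map="auto"
)
print(predict_choice(model, tokenizer, "Question: 2 + 2 = ?\nA. 3\nB. 4\nC. 5\nD. 6\nAnswer:"))
```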
- Step 4: Run the evaluation code to produce the output JSON file:
```
!python /content/llm_model_evaluation/catogories_result_eval.py \
  --catogory "mmlu" \
  --model ./models/llama2-7b-hf \
  --save_dir "./results/results_llama2-7b-hf"
```
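The category step aggregates per-subject accuracies into the four MMLU groups (STEM, humanities, social sciences, other) plus a weighted overall score, which is how the result tables below are populated. A minimal sketch of that aggregation follows; the JSON filename and field names are assumptions:

```python
# Sketch: weight each subject's accuracy by its question count to get
# per-category scores and a weighted overall accuracy.
import json
from collections import defaultdict

# Assumed format: {"abstract_algebra": {"category": "STEM",
#                                       "correct": 30, "total": 100}, ...}
with open("./results/results_llama2-7b-hf/subject_scores.json") as f:
    subjects = json.load(f)

totals = defaultdict(lambda: [0, 0])
for stats in subjects.values():
    totals[stats["category"]][0] += stats["correct"]
    totals[stats["category"]][1] += stats["total"]

for category, (correct, total) in totals.items():
    print(f"{category}: {correct / total:.4f}")

overall = sum(c for c, _ in totals.values()) / sum(t for _, t in totals.values())
print(f"Weighted Accuracy: {overall:.4f}")
```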
- mmlu dataset:
  - Google Colab - mmlu
  - Google Colab - mmlu in the phi-2 model [the Colab free tier can run this example]
- tmmluplus dataset:
- Results on the mmlu dataset:
Model | Weighted Accuracy | STEM | Humanities | Social Sciences | Other | Inference Time (s)
---|---|---|---|---|---|---
Mistral-7B-v0.1 | 0.6254 | 0.5252 | 0.5637 | 0.7358 | 0.7036 | 15624.0
- Results on the tmmluplus dataset:
Model | Weighted Accuracy | STEM | Humanities | Social Sciences | Other | Inference Time (s)
---|---|---|---|---|---|---
Mistral-7B-v0.1 | - | - | - | - | - | -