- train.json, 4608 questions
- id_test.json, 421 questions
- ood_test.json, 390 questions
- data/en_preview: a preview of the English version. The official English version is still being processed, so the current version may contain errors.
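
The split sizes listed above can be checked with a short script. This is a minimal sketch, assuming each file is a JSON array (or object) with one top-level entry per question; the helper names `count_questions` and `check_splits` are hypothetical and not part of the repository, and the directory holding the files is passed in rather than assumed.

```python
import json
from pathlib import Path

# Expected sizes, taken from the list above.
EXPECTED_SIZES = {"train.json": 4608, "id_test.json": 421, "ood_test.json": 390}


def count_questions(path):
    """Return the number of top-level entries in a JSON file.

    Assumption: one top-level entry corresponds to one question.
    """
    with open(path, encoding="utf-8") as f:
        return len(json.load(f))


def check_splits(data_dir):
    """Map each split found in data_dir to (actual_count, expected_count)."""
    report = {}
    for name, expected in EXPECTED_SIZES.items():
        path = Path(data_dir) / name
        if path.exists():  # skip splits that are not present locally
            report[name] = (count_questions(path), expected)
    return report
```

Calling `check_splits` on the directory that holds the three files returns a dictionary in which any pair of unequal numbers points to a truncated or outdated download.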
- pytorch 2.0
- transformers
- zhipuai
- openai 0.28.0
- dashscope
Install the numbat tool from https://github.com/sharkdp/numbat.
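
Before running the baselines, it can help to confirm that the requirements above are in place. The following is a rough sketch, not part of the repository: the import names are assumptions (in particular, pytorch is imported as `torch`), the `numbat` executable is assumed to be on `PATH` after installation, and package versions are not checked.

```python
import importlib.util
import shutil

# Assumed import names for the packages listed above ("pytorch" -> "torch").
REQUIRED_MODULES = ["torch", "transformers", "zhipuai", "openai", "dashscope"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules from the list that cannot be found for import."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def numbat_available():
    """Return True if a `numbat` executable is found on PATH."""
    return shutil.which("numbat") is not None


if __name__ == "__main__":
    print("missing python packages:", missing_modules())
    print("numbat on PATH:", numbat_available())
```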
- GLM-4 series: baselines/LLMs/GLM/ChatGLM4_api.py
- GPT series: baselines/LLMs/GLM/ChatGPT_api.py
- Qwen series: baselines/LLMs/GLM/Qwen_api.py
- other LLMs: download the model files from Hugging Face, then run
cd baselines/LLMs/ && python run.py --model_name_or_path /path/to/llm --data_file datas/id_test_zero_shot.json
data_file can be one of [id_test_zero_shot, ood_test_zero_shot, id_test_5_shot, ood_test_5_shot].
- eval:
cd baselines/LLMs/ && python eval_results.py --id_results {id_result_file} --ood_results {ood_result_file}
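
To cover all four `data_file` settings for a local model, the `run.py` invocation can be generated in a loop. The sketch below only builds the command lines and does not execute them; the helper names are hypothetical, and the commands are assumed to be run from `baselines/LLMs/`, as in the example above.

```python
import shlex

# The four settings named above.
DATA_FILES = [
    "id_test_zero_shot",
    "ood_test_zero_shot",
    "id_test_5_shot",
    "ood_test_5_shot",
]


def build_run_command(model_path, data_file):
    """Build the run.py argument list for one model and one data file."""
    if data_file not in DATA_FILES:
        raise ValueError(f"unknown data_file: {data_file}")
    return [
        "python", "run.py",
        "--model_name_or_path", model_path,
        "--data_file", f"datas/{data_file}.json",
    ]


def build_all_commands(model_path):
    """Return one command per setting, in the order listed in DATA_FILES."""
    return [build_run_command(model_path, d) for d in DATA_FILES]


if __name__ == "__main__":
    for cmd in build_all_commands("/path/to/llm"):
        print(shlex.join(cmd))  # shell-quoted form, ready to paste
```

Each argument list can also be passed to `subprocess.run(cmd, cwd="baselines/LLMs", check=True)` to execute the runs one after another.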
- with calculator:
cd baselines/small_models && bash run_qwen.sh
- without calculator:
cd baselines/small_models && bash run_qwen_wo_cal.sh
- train formula retriever:
cd baselines/RAG/ && bash run.sh
- eval formula retriever:
cd baselines/RAG/ && python eval.py --model_path outputs_retriever