Build Llama inference from scratch, using only torch/numpy base ops
Inspired by karpathy's awesome nanoGPT repo, this project re-implements a simple and clear Llama model from scratch.
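To give a flavor of the "base ops only" style, here is a minimal sketch of Llama's RMSNorm written with plain torch primitives (the function name and eps value are illustrative, not necessarily the exact code in this repo):

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Llama-style RMSNorm: rescale by the reciprocal root mean square
    # over the last dimension, then apply the learned per-channel gain
    variance = x.pow(2).mean(-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight
```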
```bash
pip install "torch>=2.1.0"
# transformers is used to convert model weights and compare results
pip install "transformers>=4.35.2"
```
```bash
git clone https://github.com/silencelamb/naked_llama.git
```
```bash
# convert the Hugging Face model weights to a pickle (.pkl) file (default model_size is 7b)
python convert_hf_to_pkl.py
```
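A hedged sketch of what this conversion step amounts to; the model id and output filename below are assumptions, and `convert_hf_to_pkl.py` in the repo is the authoritative version:

```python
import pickle
from transformers import LlamaForCausalLM

# assumed Hugging Face checkpoint; adjust to the one you actually use
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# dump every parameter as a plain numpy array, keyed by its name
weights = {name: p.detach().cpu().numpy() for name, p in model.named_parameters()}
with open("llama_7b.pkl", "wb") as f:  # hypothetical output path
    pickle.dump(weights, f)
```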
```bash
# run the 7B model (the default model_size)
python naked_llama.py

# run the 70B model
python naked_llama.py --model_size 70b
```
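Since transformers is installed anyway, the from-scratch output can be sanity-checked against the reference implementation. A minimal sketch, assuming the same checkpoint as above; `my_logits` is a placeholder for whatever logits naked_llama.py actually produces:

```python
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

model_id = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)

input_ids = tokenizer("Hello, world", return_tensors="pt").input_ids
with torch.no_grad():
    ref_logits = model(input_ids).logits  # reference logits

# placeholder: substitute the logits from the from-scratch forward pass
my_logits = ref_logits
print(torch.allclose(my_logits, ref_logits, atol=1e-4))
```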