
Add LLama CPP Support #335

Closed
wants to merge 6 commits
Conversation

@bayedieng (Contributor) commented Oct 12, 2024

This PR adds llama.cpp support using the GGML inference engine. For simplicity, it accepts Q8 (8-bit) GGUF files for the LLaMA model and runs inference from their weights. The weights are dequantized to 32-bit floats to perform computations, then re-quantized to Q8 for memory efficiency.
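The dequantize/re-quantize round-trip described above can be sketched in NumPy. This is an illustrative model of llama.cpp's Q8_0 scheme (blocks of 32 int8 weights sharing one scale, stored as f16), not the PR's actual code; the function names are hypothetical.

```python
import numpy as np

QK8_0 = 32  # Q8_0 block size in llama.cpp: 32 weights share one scale


def quantize_q8_0(x: np.ndarray):
    """Quantize a float32 array (length a multiple of 32) to Q8_0:
    per-block scale d = max|x| / 127, int8 quants q = round(x / d)."""
    blocks = x.reshape(-1, QK8_0).astype(np.float32)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    d = amax / 127.0
    # Guard against division by zero for all-zero blocks.
    inv_d = np.where(d > 0, 1.0 / np.where(d > 0, d, 1.0), 0.0)
    q = np.rint(blocks * inv_d).astype(np.int8)
    return d.astype(np.float16), q  # scale is stored as f16


def dequantize_q8_0(d: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Recover float32 weights: x ~= d * q, applied per block."""
    return (d.astype(np.float32) * q.astype(np.float32)).reshape(-1)
```

The round-trip is lossy but bounded: each weight is off by at most half a quantization step (d / 2) plus the small error from storing the scale in f16.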

Steps

  • Parse model weights
  • Implement Tensor Operations to Perform Inference
  • Implement Full Model
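The first step above begins with reading the GGUF container. A minimal sketch of parsing the fixed-size file header, assuming the standard GGUF layout (little-endian: 4-byte magic "GGUF", uint32 version, uint64 tensor count, uint64 metadata key/value count); the function name is hypothetical and not from the PR:

```python
import struct


def parse_gguf_header(buf: bytes) -> dict:
    """Parse the fixed-size GGUF header at the start of the file.

    Layout (little-endian): 4-byte magic b'GGUF', uint32 version,
    uint64 tensor count, uint64 metadata key/value count.
    """
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "n_tensors": n_tensors, "n_kv": n_kv}
```

The metadata key/value pairs and tensor descriptors follow the header; parsing those would be the remainder of the "Parse model weights" step.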

Closes #167

@bayedieng bayedieng closed this Oct 25, 2024
Development

Successfully merging this pull request may close these issues.

[BOUNTY - $500] Llama.cpp inference engine