Guide: Getting llama.cpp to work on AMD RX 6600 series on Ubuntu via Vulkan for significant performance boost #9491
liquidscorpio started this conversation in General
This guide covers running the Python bindings for llama.cpp (llama-cpp-python) on an RDNA2-series GPU using the Vulkan backend, for a roughly 25x performance boost vs. OpenBLAS on the CPU.
Information on the topic seems very sparse, so I'm writing it up here. The steps should also work for vanilla builds of llama.cpp (without the Python bindings).
Context
AMD cards are a pain to get working properly in one shot with most LLM frameworks. The RX 6600 family is especially frustrating in this regard, as hipBLAS does not support it (LLVM target: gfx1032).
While the steps here were tested specifically on an RX 6600 XT, they should apply to other RDNA-series cards with Vulkan support.
Operating System
Ubuntu 22.04 LTS (newer or older releases should work too)
Install Vulkan
Install Vulkan and OpenBLAS (you may or may not need the latter).
More details about the APT packages, and packages for other versions, can be found here.
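A typical set of packages looks like the following. The exact package names are an assumption on my part; verify them with `apt search vulkan` for your release.

```shell
# Vulkan loader, headers, tools, and the GLSL compiler llama.cpp's
# Vulkan backend needs at build time (package names assumed)
sudo apt update
sudo apt install -y libvulkan-dev vulkan-tools glslc

# OpenBLAS, only needed if you also want a CPU BLAS backend
sudo apt install -y libopenblas-dev

# Sanity check: the summary should list your RX 6600 series GPU
vulkaninfo --summary
```

If `vulkaninfo` does not show the card, fix the driver setup (Mesa/RADV) before going further; the rest of the guide assumes a working Vulkan device.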
Install llama-cpp-python
Install llama-cpp-python with the relevant CMake flags. You can reuse the same flags if you are building llama.cpp from source and don't need Python. (This works inside a virtualenv too.)
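A sketch of the install command, assuming a llama.cpp version that uses the `GGML_VULKAN` CMake option (older releases used `LLAMA_VULKAN` instead):

```shell
# Force a local source build of the wheel with the Vulkan backend enabled.
# Swap -DGGML_VULKAN=on for -DLLAMA_VULKAN=on on older llama.cpp versions.
CMAKE_ARGS="-DGGML_VULKAN=on" pip install --no-cache-dir --force-reinstall llama-cpp-python
```

`--no-cache-dir --force-reinstall` ensures pip rebuilds the wheel with your flags rather than reusing a cached CPU-only build.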
Instantiate the Llama class (Python) and use it
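A minimal usage sketch (the model path, prompt, and generation parameters are placeholders): setting `n_gpu_layers` is what actually offloads work to the GPU through the Vulkan backend.

```python
from llama_cpp import Llama

# Model path below is a placeholder; point it at any GGUF model you have.
llm = Llama(
    model_path="./models/your-model.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU (Vulkan)
    n_ctx=2048,       # context window size
)

output = llm(
    "Q: Name the planets in the solar system. A: ",
    max_tokens=64,
    stop=["Q:", "\n"],
)
print(output["choices"][0]["text"])
```

On load, the logs should show the Vulkan device being picked up and layers being offloaded; if everything still runs on CPU, the wheel was likely built without the Vulkan flag.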
Rough Performance Comparison