Guide: Getting llama.cpp to work on AMD RX 6600 series on Ubuntu via Vulkan for significant performance boost #9491
liquidscorpio started this conversation in General
This guide covers running the Python bindings for llama.cpp (llama-cpp-python) on an RDNA2-series GPU using the Vulkan backend, for a roughly 25x performance boost vs. OpenBLAS on the CPU.
Information on the topic seems very sparse, so I'm writing it up here. The steps should also work for vanilla builds of llama.cpp (without the Python bindings).
Context
AMD cards are a pain to get working properly in one shot with most LLM frameworks. The RX 6600 family is especially frustrating in this regard, as hipBLAS does not support it (LLVM target: gfx1032).
While the steps here were tested specifically on an RX 6600 XT, they should apply to other RDNA-series cards with Vulkan support.
Operating System
Ubuntu 22.04 LTS (newer or older releases should work too)
Install Vulkan
Install Vulkan and OpenBLAS (you may or may not need the latter).
More details about the APT packages, and packages for other versions, can be found here.
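A typical set of packages looks like the following. The exact package names are an assumption on my part; verify them with `apt search vulkan` for your release.

```shell
# Vulkan loader, headers, tools, and the GLSL compiler llama.cpp's
# Vulkan backend needs at build time (package names assumed)
sudo apt update
sudo apt install -y libvulkan-dev vulkan-tools glslc

# OpenBLAS, only needed if you also want a CPU BLAS backend
sudo apt install -y libopenblas-dev

# Sanity check: the summary should list your RX 6600 series GPU
vulkaninfo --summary
```

If `vulkaninfo` does not show the card, fix the driver setup (Mesa/RADV) before going further; the rest of the guide assumes a working Vulkan device.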
Install llama-cpp-python
Install llama-cpp-python with the relevant CMake flags. You can reuse the same flags if you are building llama.cpp from source and don't need Python. (This works inside a virtualenv too.)
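A sketch of the install command, assuming a llama.cpp version that uses the `GGML_VULKAN` CMake option (older releases used `LLAMA_VULKAN` instead):

```shell
# Force a local source build of the wheel with the Vulkan backend enabled.
# Swap -DGGML_VULKAN=on for -DLLAMA_VULKAN=on on older llama.cpp versions.
CMAKE_ARGS="-DGGML_VULKAN=on" pip install --no-cache-dir --force-reinstall llama-cpp-python
```

`--no-cache-dir --force-reinstall` ensures pip rebuilds the wheel with your flags rather than reusing a cached CPU-only build.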
Instantiate the Llama class (Python) and use it
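A minimal usage sketch (the model path, prompt, and generation parameters are placeholders): setting `n_gpu_layers` is what actually offloads work to the GPU through the Vulkan backend.

```python
from llama_cpp import Llama

# Model path below is a placeholder; point it at any GGUF model you have.
llm = Llama(
    model_path="./models/your-model.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU (Vulkan)
    n_ctx=2048,       # context window size
)

output = llm(
    "Q: Name the planets in the solar system. A: ",
    max_tokens=64,
    stop=["Q:", "\n"],
)
print(output["choices"][0]["text"])
```

On load, the logs should show the Vulkan device being picked up and layers being offloaded; if everything still runs on CPU, the wheel was likely built without the Vulkan flag.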
Rough Performance Comparison