Commit 97c9b77

Add documentation about CLBlast (#1604)
Installing, compiling and using.
1 parent: 0ecb1bb

File tree: 1 file changed (+79, −6 lines)

README.md

````diff
@@ -240,11 +240,11 @@ In order to build llama.cpp you have three different options.
 Building the program with BLAS support may lead to some performance improvements in prompt processing using batch sizes higher than 32 (the default is 512). BLAS doesn't affect the normal generation performance. There are currently three different implementations of it:
 
-- Accelerate Framework:
+- **Accelerate Framework**:
 
   This is only available on Mac PCs and it's enabled by default. You can just build using the normal instructions.
 
-- OpenBLAS:
+- **OpenBLAS**:
 
   This provides BLAS acceleration using only the CPU. Make sure to have OpenBLAS installed on your machine.
````
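The batch-size threshold in the paragraph above can be illustrated with a small shell sketch. The 32-token cutoff and the 512 default come from the text; the decision logic here is illustrative only, not llama.cpp's actual dispatch code:

```shell
# Illustrative only: BLAS kicks in for prompt-processing batches larger
# than 32 tokens; token generation (a batch of 1) is unaffected.
for batch in 8 32 512; do
  if [ "$batch" -gt 32 ]; then
    echo "batch=$batch: BLAS path eligible"
  else
    echo "batch=$batch: regular path"
  fi
done
```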

````diff
@@ -278,11 +278,11 @@ Building the program with BLAS support may lead to some performance improvements
   cmake --build . --config Release
   ```
 
-- BLIS
+- **BLIS**
 
   Check [BLIS.md](BLIS.md) for more information.
 
-- Intel MKL
+- **Intel MKL**
 
   By default, `LLAMA_BLAS_VENDOR` is set to `Generic`, so if you have already sourced the Intel environment script and pass `-DLLAMA_BLAS=ON` to CMake, the MKL implementation of BLAS will be selected automatically. You may also specify it explicitly:
````
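A minimal sketch of the explicit vendor selection mentioned above, assuming `Intel10_64lp` is the CMake `FindBLAS` vendor name for an LP64 MKL install (verify against your MKL version and CMake's FindBLAS documentation):

```shell
# Sketch: compose the explicit vendor flag instead of relying on the
# Generic autodetection. Intel10_64lp is an assumption; check FindBLAS.
VENDOR=Intel10_64lp
echo "cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=$VENDOR"
```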

````diff
@@ -293,7 +293,7 @@ Building the program with BLAS support may lead to some performance improvements
   cmake --build . --config Release
   ```
 
-- cuBLAS
+- **cuBLAS**
 
   This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).
   - Using `make`:
````
````diff
@@ -308,8 +308,81 @@ Building the program with BLAS support may lead to some performance improvements
   cmake .. -DLLAMA_CUBLAS=ON
   cmake --build . --config Release
   ```
+
+  Note: Because llama.cpp uses multiple CUDA streams for matrix multiplication, results [are not guaranteed to be reproducible](https://docs.nvidia.com/cuda/cublas/index.html#results-reproducibility). If you need reproducibility, set `GGML_CUDA_MAX_STREAMS` in the file `ggml-cuda.cu` to 1.
 
-Note: Because llama.cpp uses multiple CUDA streams for matrix multiplication results [are not guaranteed to be reproducible](https://docs.nvidia.com/cuda/cublas/index.html#results-reproducibility). If you need reproducibility, set `GGML_CUDA_MAX_STREAMS` in the file `ggml-cuda.cu` to 1.
+
+- **CLBlast**
+
+  OpenCL acceleration is provided by the matrix multiplication kernels from the [CLBlast](https://github.com/CNugteren/CLBlast) project and custom kernels for ggml that can generate tokens on the GPU.
+
+  You will need the [OpenCL SDK](https://github.com/KhronosGroup/OpenCL-SDK).
+  - For Ubuntu or Debian, the packages `opencl-headers` and `ocl-icd` may be needed.
+
+  - <details>
+    <summary>Installing the OpenCL SDK from source</summary>
+
+    ```sh
+    git clone --recurse-submodules https://github.com/KhronosGroup/OpenCL-SDK.git
+    mkdir OpenCL-SDK/build
+    cd OpenCL-SDK/build
+    cmake .. -DBUILD_DOCS=OFF \
+      -DBUILD_EXAMPLES=OFF \
+      -DBUILD_TESTING=OFF \
+      -DOPENCL_SDK_BUILD_SAMPLES=OFF \
+      -DOPENCL_SDK_TEST_SAMPLES=OFF
+    cmake --build . --config Release
+    cmake --install . --prefix /some/path
+    ```
+    </details>
+
+  Installing CLBlast: it may be found in your operating system's packages.
+
+  - <details>
+    <summary>If not, then installing from source:</summary>
+
+    ```sh
+    git clone https://github.com/CNugteren/CLBlast.git
+    mkdir CLBlast/build
+    cd CLBlast/build
+    cmake .. -DBUILD_SHARED_LIBS=OFF -DTUNERS=OFF
+    cmake --build . --config Release
+    cmake --install . --prefix /some/path
+    ```
+
+    Where `/some/path` is where the built library will be installed (the default is `/usr/local`).
+    </details>
+
+  Building:
+
+  - Build with make:
+    ```sh
+    make LLAMA_CLBLAST=1
+    ```
+  - CMake:
+    ```sh
+    mkdir build
+    cd build
+    cmake .. -DLLAMA_CLBLAST=ON -DCLBlast_dir=/some/path
+    cmake --build . --config Release
+    ```
+
+  Running:
+
+  The CLBlast build supports `--gpu-layers|-ngl` like the CUDA version does.
+
+  To select the correct platform (driver) and device (GPU), you can use the environment variables `GGML_OPENCL_PLATFORM` and `GGML_OPENCL_DEVICE`.
+  The selection can be a number (starting from 0) or a text string to search:
+
+  ```sh
+  GGML_OPENCL_PLATFORM=1 ./main ...
+  GGML_OPENCL_DEVICE=2 ./main ...
+  GGML_OPENCL_PLATFORM=Intel ./main ...
+  GGML_OPENCL_PLATFORM=AMD GGML_OPENCL_DEVICE=1 ./main ...
+  ```
+
+  The default behavior is to find the first GPU device, but when it is an integrated GPU on a laptop, for instance, the selectors are useful.
+  Using the variables it is possible to select a CPU-based driver as well, if so desired.
+
+  You can get a list of platforms and devices from the `clinfo -l` command, among others.
 
 ### Prepare Data & Run
````
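To make the platform/device selection in the added text concrete, here is a small POSIX-shell sketch of the two selector styles. It is illustrative only; the actual matching is done inside ggml's OpenCL backend:

```shell
# Illustrative: a selector that is all digits acts as an index
# (starting from 0); anything else is searched for in the
# platform/device name reported by the OpenCL driver.
for sel in 1 Intel AMD; do
  case "$sel" in
    (*[!0-9]*) echo "$sel: matched by name" ;;
    (*)        echo "$sel: matched by index" ;;
  esac
done
```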

0 commit comments