README.md (+79 −6)
@@ -240,11 +240,11 @@ In order to build llama.cpp you have three different options.

Building the program with BLAS support may lead to some performance improvements in prompt processing when using batch sizes higher than 32 (the default is 512). BLAS doesn't affect the normal generation performance. There are currently several different implementations of it:
- **Accelerate Framework**:

  This is only available on Mac PCs, and it's enabled by default. You can just build using the normal instructions.
- **OpenBLAS**:

  This provides BLAS acceleration using only the CPU. Make sure to have OpenBLAS installed on your machine.
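  The accompanying build commands fall in the context this diff elides between hunks. A hedged sketch of that step, assuming the `LLAMA_OPENBLAS` make flag and the `LLAMA_BLAS`/`LLAMA_BLAS_VENDOR` CMake options from the contemporary llama.cpp build system (verify against your checkout):

  ```bash
  # Sketch only: building llama.cpp against OpenBLAS.
  # Using make (flag name assumed):
  make LLAMA_OPENBLAS=1

  # Or configuring with CMake (option names assumed); the build step
  # survives in the next hunk:
  mkdir build
  cd build
  cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS
  ```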
@@ -278,11 +278,11 @@ Building the program with BLAS support may lead to some performance improvements

  ```
  cmake --build . --config Release
  ```
- **BLIS**

  Check [BLIS.md](BLIS.md) for more information.
- **Intel MKL**

  By default, `LLAMA_BLAS_VENDOR` is set to `Generic`, so if you have already sourced the Intel environment script and pass `-DLLAMA_BLAS=ON` to cmake, the MKL version of BLAS will be selected automatically. You may also specify it explicitly:
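  The command block this colon introduces is elided until the next hunk, which preserves only its final line. A minimal sketch of the missing configure step, assuming CMake's standard `FindBLAS` vendor name for MKL LP64 (`Intel10_64lp`) and a default oneAPI install path:

  ```bash
  # Sketch: select MKL explicitly (vendor name and setvars.sh path are assumptions).
  source /opt/intel/oneapi/setvars.sh
  mkdir build
  cd build
  cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp
  ```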
@@ -293,7 +293,7 @@ Building the program with BLAS support may lead to some performance improvements

  ```
  cmake --build . --config Release
  ```
- **cuBLAS**

  This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).

  - Using `make`:
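    The make invocation itself is elided between hunks; a sketch, assuming the flag mirrors the `-DLLAMA_CUBLAS=ON` CMake option shown in the next hunk:

    ```bash
    # Sketch: cuBLAS build via make (flag name assumed).
    make LLAMA_CUBLAS=1
    ```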
@@ -308,8 +308,81 @@ Building the program with BLAS support may lead to some performance improvements

  ```
  cmake .. -DLLAMA_CUBLAS=ON
  cmake --build . --config Release
  ```
  Note: Because llama.cpp uses multiple CUDA streams for matrix multiplication, results [are not guaranteed to be reproducible](https://docs.nvidia.com/cuda/cublas/index.html#results-reproducibility). If you need reproducibility, set `GGML_CUDA_MAX_STREAMS` in the file `ggml-cuda.cu` to 1.
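  To apply that change, first locate where the constant is defined; how it is declared may vary between versions, so this is only a hedged starting point:

  ```bash
  # Find the stream-count constant mentioned above, then set its value to 1.
  grep -n "GGML_CUDA_MAX_STREAMS" ggml-cuda.cu
  ```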
- **CLBlast**

  OpenCL acceleration is provided by the matrix multiplication kernels from the [CLBlast](https://github.com/CNugteren/CLBlast) project and custom kernels for ggml that can generate tokens on the GPU.

  You will need the [OpenCL SDK](https://github.com/KhronosGroup/OpenCL-SDK).
  - For Ubuntu or Debian, the packages `opencl-headers`, `ocl-icd` may be needed.

  - <details>
    <summary>Installing the OpenCL SDK from source</summary>