
Commit 4cb20e5

feature: add blis support
1 parent 5ea4339 commit 4cb20e5

File tree: 4 files changed (+95, -0 lines)


BLIS.md (+67)

BLIS Installation Manual
------------------------

BLIS is a portable software framework for high-performance BLAS-like dense linear algebra libraries. It has received awards and recognition, including the 2023 James H. Wilkinson Prize for Numerical Software and the 2020 SIAM Activity Group on Supercomputing Best Paper Prize. BLIS provides a new BLAS-like API as well as a compatibility layer for traditional BLAS routine calls. Its features include an object-based API, a typed API, and BLAS and CBLAS compatibility layers.
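
Since llama.cpp only talks to BLIS through the CBLAS compatibility layer, that layer can be smoke-tested directly with a few lines of C. This is a sketch, not part of the commit: it assumes BLIS was configured with `--enable-cblas` and installed under the default `/usr/local` prefix, and the file name `blis_check.c` is just a placeholder.

```bash
# Write a tiny C program that multiplies two 2x2 identity matrices
# through the CBLAS interface exposed by BLIS.
cat > blis_check.c <<'EOF'
#include <stdio.h>
#include <cblas.h>   /* shipped by BLIS when configured with --enable-cblas */

int main(void) {
    float A[4] = {1, 0, 0, 1}, B[4] = {1, 0, 0, 1}, C[4] = {0};
    /* C = 1.0 * A * B + 0.0 * C */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2, 1.0f, A, 2, B, 2, 0.0f, C, 2);
    printf("C[0][0] = %.1f\n", C[0]);   /* identity times identity */
    return 0;
}
EOF
# Depending on how BLIS was configured, you may also need -fopenmp here.
cc blis_check.c -I/usr/local/include/blis -L/usr/local/lib -lblis -lm -lpthread -o blis_check
./blis_check
```

If this builds and prints `C[0][0] = 1.0`, the CBLAS layer is usable and llama.cpp can link against it the same way.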

Project URL: https://github.com/flame/blis

### Prepare:

Compile BLIS:

```bash
git clone https://github.com/flame/blis
cd blis
./configure --enable-cblas -t openmp,pthreads auto
# will install to /usr/local/ by default.
make -j
```

Install BLIS:

```bash
sudo make install
```
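
A quick sanity check after installing; the paths below assume the default `/usr/local` prefix, so adjust them if you passed `--prefix` to `configure`:

```bash
# The header and library should have landed under the default prefix:
ls /usr/local/include/blis/blis.h
ls /usr/local/lib/libblis.*
# Refresh the dynamic linker cache so that -lblis resolves at link/run time:
sudo ldconfig
```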

We recommend using OpenMP, since it makes it easier to adjust the number of cores being used.

### llama.cpp compilation

Makefile:

```bash
make LLAMA_BLIS=1 -j
# make LLAMA_BLIS=1 benchmark-matmult
```

CMake:

```bash
mkdir build
cd build
cmake -DLLAMA_BLIS=ON ..
make -j
```

### llama.cpp execution

According to the BLIS documentation, you can set the following environment variables to modify the behavior of OpenMP:

```bash
export GOMP_CPU_AFFINITY="0-19"
export BLIS_NUM_THREADS=14
```

And then run the binaries as normal.
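
For instance, a hypothetical run might look like the following; the model path, prompt, and thread count are placeholders, not values from this commit:

```bash
# Cap BLIS at 14 threads for this one invocation (placeholder values):
BLIS_NUM_THREADS=14 ./main -m ./models/7B/ggml-model-q4_0.bin \
    -p "Building a website can be done in 10 simple steps:" -n 64
```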

### Intel specific issue

Some users might get an error message saying that `libimf.so` cannot be found. If so, please follow this [Stack Overflow page](https://stackoverflow.com/questions/70687930/intel-oneapi-2022-libimf-so-no-such-file-or-directory-during-openmpi-compila).

### Reference:

1. https://github.com/flame/blis#getting-started
2. https://github.com/flame/blis/blob/master/docs/Multithreading.md

CMakeLists.txt (+19)

```diff
@@ -66,6 +66,7 @@ endif()
 # 3rd party libs
 option(LLAMA_ACCELERATE "llama: enable Accelerate framework" ON)
 option(LLAMA_OPENBLAS "llama: use OpenBLAS" OFF)
+option(LLAMA_BLIS "llama: use blis" OFF)
 option(LLAMA_CUBLAS "llama: use cuBLAS" OFF)
 option(LLAMA_CLBLAST "llama: use CLBlast" OFF)
```


```diff
@@ -178,6 +179,24 @@ if (LLAMA_OPENBLAS)
     endif()
 endif()

+if (LLAMA_BLIS)
+    add_compile_definitions(GGML_USE_BLIS)
+    # we don't directly call BLIS apis, use cblas wrapper instead
+    add_compile_definitions(GGML_USE_OPENBLAS)
+    set(BLIS_INCLUDE_SEARCH_PATHS
+        /usr/include
+        /usr/include/blis
+        /usr/local/include
+        /usr/local/include/blis
+        $ENV{BLIS_HOME}
+        $ENV{BLIS_HOME}/include
+    )
+    find_path(BLIS_INC NAMES blis.h PATHS ${BLIS_INCLUDE_SEARCH_PATHS})
+    add_compile_definitions(BLIS_ENABLE_CBLAS)
+    add_link_options(-lblis)
+    add_compile_options(-I${BLIS_INC})
+endif()
+
 if (LLAMA_CUBLAS)
     cmake_minimum_required(VERSION 3.17)
```

Makefile (+4)

```diff
@@ -122,6 +122,10 @@ ifdef LLAMA_OPENBLAS
 	LDFLAGS += -lopenblas
 endif
 endif
+ifdef LLAMA_BLIS
+	CFLAGS  += -DGGML_USE_OPENBLAS -DGGML_USE_BLIS -I/usr/local/include/blis -I/usr/include/blis
+	LDFLAGS += -lblis -L/usr/local/lib
+endif
 ifdef LLAMA_CUBLAS
 	CFLAGS    += -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/x86_64-linux/include
 	CXXFLAGS  += -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/x86_64-linux/include
```

README.md (+5)

````diff
@@ -57,6 +57,7 @@ The main goal of `llama.cpp` is to run the LLaMA model using 4-bit integer quant
 - Runs on the CPU
 - OpenBLAS support
 - cuBLAS and CLBlast support
+- BLIS support (cblas wrapper)

 The original implementation of `llama.cpp` was [hacked in an evening](https://github.com/ggerganov/llama.cpp/issues/33#issuecomment-1465108022).
 Since then, the project has improved significantly thanks to many contributions. This project is for educational purposes and serves
@@ -276,6 +277,10 @@ Building the program with BLAS support may lead to some performance improvements
     cmake --build . --config Release
     ```

+- BLIS
+
+  Check [BLIS.md](BLIS.md) for more information.
+
 - cuBLAS

   This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).
````
