This repository contains a number of examples written in Halide language which work on RISC-V CPU with RVV 0.7.1. Follow the steps below to reproduce the experiments or some of their parts.
NOTE: At this moment project relies on the Ahead-Of-Time (AOT) compilation of Halide kernels. However there are precompiled files for easy start.
Project uses OpenCV as a reference implementation for some algorithms. Also, we benchmark it to compare with Halide implemention. Fetch OpenCV source code by submodules update after clonning the repository.
git submodule update --init
-
Download THead toolchain: Xuantie-900-gcc-linux-5.10.4-glibc-x86_64-V2.6.1-20220906.tar.gz (registration needed)
-
Clone Halide source code (no build required)
git clone --depth 1 https://github.com/halide/Halide
-
Build a project for RISC-V CPU:
export PATH=$HOME/Xuantie-900-gcc-linux-5.10.4-glibc-x86_64-V2.6.1/bin/:$PATH cmake \ -DCMAKE_BUILD_TYPE=Release \ -DCMAKE_TOOLCHAIN_FILE=$HOME/halide_riscv/riscv64-071.toolchain.cmake \ -DHalide_INCLUDE_DIRS=$HOME/Halide/src/runtime \ -S halide_riscv -B build_rv64 cmake --build build_rv64 -j$(nproc --all)
-
Transfer build directory to the RISC-V board and run
test_algo
(accuracy tests) orperf_algo
(performance tests).scp test_algo perf_algo libalgos.so opencv-prefix/src/opencv-build/lib/* sipeed@x.x.x.x:/home/sipeed/
export LD_LIBRARY_PATH=./ ./test_algo ./perf_algo
HW: Sipeed Lichee RV Dock (Allwinner D1 aka XuanTie C906 CPU)
OS: 20211230_LicheeRV_debian_d1_hdmi_8723ds
BGR2Gray input: 1080x1920 |
Median time |
---|---|
Reference (interleaved) | 37.80ms |
OpenCV (interleaved), no RVV | 32.18ms |
Halide (interleaved) | 30.78ms |
Halide (planar) | 6.65ms |
Box filter input: 1080x1920 output: 1078x1918 |
Median time |
---|---|
OpenCV, no RVV | 75.17ms |
Halide | 62.89ms |
Histogram input: 1080x1920x3 output: 256x3 |
Median time |
---|---|
Reference | 72.06ms |
OpenCV, no RVV | 57.35ms |
Halide | 92.44ms |
Convolution FP32 input: 1x16x128x128 kernel: 32x16x3x3 |
Layout | Median time |
---|---|---|
OpenCV, no RVV | NCHW | 829.13ms |
Halide | NCHW | 698.27ms |
Halide | NHWC | 418.95ms |
If you want regenerate AOT artifacts or add new algorithms, build the project on x86:
-
Build LLVM from https://github.com/dkurt/llvm-rvv-071/tree/rvv-071
cmake -DCMAKE_BUILD_TYPE=Release \ -DLLVM_ENABLE_PROJECTS="clang;lld" \ -DLLVM_TARGETS_TO_BUILD="RISCV" \ -DLLVM_ENABLE_TERMINFO=OFF -DLLVM_ENABLE_ASSERTIONS=ON \ -DLLVM_ENABLE_EH=ON -DLLVM_ENABLE_RTTI=ON -DLLVM_BUILD_32_BITS=OFF \ -GNinja \ -S llvm-project/llvm -B llvm-build cmake --build llvm-build -j4
-
Build Halide with the following patch (tested on revision https://github.com/halide/Halide/commit/7963cd4e3c23856b82567c99e0a3d16035ffe895):
diff --git a/src/CodeGen_RISCV.cpp b/src/CodeGen_RISCV.cpp index ba9abe04d..454558d11 100644 --- a/src/CodeGen_RISCV.cpp +++ b/src/CodeGen_RISCV.cpp @@ -151,6 +151,7 @@ string CodeGen_RISCV::mattrs() const { arch_flags += ",+zvl" + std::to_string(target.vector_bits) + "b"; } #endif + arch_flags += ",-zve64x"; } return arch_flags; } diff --git a/src/autoschedulers/CMakeLists.txt b/src/autoschedulers/CMakeLists.txt index 9b88f0a66..10088bb9b 100644 --- a/src/autoschedulers/CMakeLists.txt +++ b/src/autoschedulers/CMakeLists.txt @@ -24,6 +24,6 @@ endfunction() add_subdirectory(common) -add_subdirectory(adams2019) +# add_subdirectory(adams2019) add_subdirectory(li2018) add_subdirectory(mullapudi2016)
export LLVM_ROOT=$HOME/llvm-build cmake -DLLVM_DIR=$LLVM_ROOT/lib/cmake/llvm \ -DClang_DIR=$LLVM_ROOT/lib/cmake/clang \ -DCMAKE_BUILD_TYPE=Release \ -DWITH_TESTS=OFF \ -DWITH_TUTORIALS=OFF \ -DWITH_PYTHON_BINDINGS=OFF \ -S Halide -B halide-build cmake --build halide-build -j4 cmake --install halide-build --prefix halide-install
-
Build a project on x86
export LD_LIBRARY_PATH=$HOME/halide-build/src/autoschedulers/mullapudi2016:$LD_LIBRARY_PATH cmake \ -DCMAKE_BUILD_TYPE=Release \ -DHalide_DIR=$HOME/halide-install/lib/cmake/Halide \ -S halide_riscv -B build cmake --build build -j$(nproc --all)
-
Run
perf_algo
once and find the generated*.h
and*.s
files in the working directory.