Comparison between RTL and HLS versions of an image processing kernel
The accompanying paper is "Comparison Between HLS and HDL Image Processing in FPGAs", ICCE-Asia 2020.
- HLS Project: sobel_edge
- Vivado Project: sobel_1080p
- Prebuilt Base Image: base
- Image samples: data
- Using HLS version IP Python Code: sobel_sw.ipynb
- Using RTL version IP Python Code: sobel_hw.ipynb
High-Level Synthesis (HLS) tools transform C/C++ functions into FPGA-synthesizable RTL modules.
Using HLS gives you:
- Rapid design speed
- Easy verification (testbench with C / RTL)
In this project, we're going to
- Create an image / video processing kernel of Sobel edge detection
- Verify it
- Export as an RTL IP
- Use it in Vivado
If you are new to HLS, read this article first.
The image processing kernel we're going to make contains the following steps.
- Color conversion (RGB2GRAY)
- Spatial low-pass filter (3x3 Gaussian filter)
- Sobel operation (3x3 Sobel)
- Weighted sum of the gradient magnitudes
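The four steps above can be sketched in plain Python as a software reference. This is only an illustration, not the HLS implementation: the 0.299/0.587/0.114 luma weights, the 1-2-1 Gaussian kernel, the replicated borders, and all function names are assumptions made for the example.

```python
# Software reference for the four stages on a tiny image stored as nested lists.
# Kernel values and border handling are assumptions chosen for illustration.

GAUSS_3X3 = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]    # normalised by 16
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient


def rgb_to_gray(img):
    """Convert rows of (r, g, b) tuples to 8-bit luma values."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in img]


def filter3x3(img, kernel, divisor=1):
    """Apply a 3x3 kernel; pixels outside the image repeat the nearest edge."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    yy = min(max(y + ky - 1, 0), h - 1)
                    xx = min(max(x + kx - 1, 0), w - 1)
                    acc += kernel[ky][kx] * img[yy][xx]
            out[y][x] = int(acc / divisor)
    return out


def abs_saturate(img):
    """Absolute value clipped to the 8-bit range."""
    return [[min(abs(v), 255) for v in row] for row in img]


def weighted_sum(a, b, alpha=0.5, beta=0.5):
    """Per-pixel alpha * a + beta * b, clipped to 255."""
    return [[min(255, int(alpha * p + beta * q)) for p, q in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def sobel_edges(rgb):
    gray = rgb_to_gray(rgb)
    blurred = filter3x3(gray, GAUSS_3X3, divisor=16)
    grad_x = abs_saturate(filter3x3(blurred, SOBEL_X))
    grad_y = abs_saturate(filter3x3(blurred, SOBEL_Y))
    return weighted_sum(grad_x, grad_y)


# A uniform image has no edges; a black-to-white step has a vertical edge.
flat = [[(100, 100, 100)] * 6 for _ in range(4)]
step = [[(0, 0, 0)] * 3 + [(255, 255, 255)] * 3 for _ in range(4)]
flat_edges = sobel_edges(flat)
step_edges = sobel_edges(step)
```

With equal 0.5 weights, a fully saturated horizontal gradient and no vertical gradient gives at most half of the 8-bit range, which is why edge pixels in a pure vertical edge do not reach 255.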
For optimization, the dataflow pragma is used to run the stages in parallel. It inserts a channel (FIFO buffer) between consecutive stages, so each stage can start processing as soon as data arrives from the previous stage instead of waiting for the whole frame.
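The idea behind dataflow can be mimicked in software with one thread per stage and a bounded queue between stages. The sketch below is an analogy only: Python threads show the structure (stages connected by FIFOs, all active at once), not the speed-up that hardware gets. The names `stage`, `run_pipeline`, and the FIFO depth are hypothetical.

```python
import queue
import threading

_DONE = object()  # end-of-stream marker passed down the pipeline


def stage(func, fifo_in, fifo_out):
    """Apply func to every item from fifo_in and push the result to fifo_out."""
    while True:
        item = fifo_in.get()
        if item is _DONE:
            fifo_out.put(_DONE)   # forward the marker so later stages stop too
            return
        fifo_out.put(func(item))


def run_pipeline(funcs, items, depth=2):
    """Connect funcs with bounded FIFOs and stream items through them in order."""
    fifos = [queue.Queue(maxsize=depth) for _ in range(len(funcs) + 1)]
    workers = [threading.Thread(target=stage, args=(f, fifos[i], fifos[i + 1]))
               for i, f in enumerate(funcs)]

    def feed():
        for item in items:
            fifos[0].put(item)    # blocks when the first FIFO is full
        fifos[0].put(_DONE)

    workers.append(threading.Thread(target=feed))
    for worker in workers:
        worker.start()

    results = []
    while True:
        item = fifos[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results


# Two toy "pixel" stages: add an offset, then scale.
pipeline_out = run_pipeline([lambda p: p + 1, lambda p: p * 2], range(5))
```

Because every stage is a single consumer of its input FIFO, the output order matches the input order, just as pixels leave the hardware pipeline in raster order.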
- Create a Vivado High-Level Synthesis project
Do not specify source or testbench files yet; they are added later.
Solution name: solution1
Clock Period: 10 (ns)
Part: select the Ultra96v2 board
- Create the source / header / testbench files and write the code.
-
sobel.cpp
    #include "sobel.h"

    void sobel_accel(AXI_STREAM& INPUT_STREAM, AXI_STREAM& OUTPUT_STREAM){
    // AXI4-Stream for video in/out, AXI4-Lite for block-level control
    #pragma HLS INTERFACE axis port=INPUT_STREAM
    #pragma HLS INTERFACE axis port=OUTPUT_STREAM
    #pragma HLS INTERFACE s_axilite port=return bundle=CONTROL_BUS

        int rows = MAX_HEIGHT;
        int cols = MAX_WIDTH;

        // Intermediate images passed between the stages
        BGR_IMG src(rows, cols);
        GRAY_IMG gray(rows, cols);
        GRAY_IMG blurred(rows, cols);
        GRAY_IMG gray1(rows, cols);
        GRAY_IMG gray2(rows, cols);
        GRAY_SIGNED_IMG sobel_x64(rows, cols);
        GRAY_SIGNED_IMG sobel_y64(rows, cols);
        GRAY_IMG sobel_x(rows, cols);
        GRAY_IMG sobel_y(rows, cols);
        GRAY_IMG sobel_g(rows, cols);
        BGR_IMG dst(rows, cols);

    #pragma HLS dataflow
        hls::AXIvideo2Mat(INPUT_STREAM, src);
        hls::CvtColor<HLS_BGR2GRAY>(src, gray);
        hls::GaussianBlur<3,3>(gray, blurred);
        // Each image can be consumed only once, so duplicate it for the two Sobel passes
        hls::Duplicate(blurred, gray1, gray2);
        hls::Sobel<1,0,3>(gray1, sobel_x64);
        hls::Sobel<0,1,3>(gray2, sobel_y64);
        hls::ConvertScaleAbs(sobel_x64, sobel_x);
        hls::ConvertScaleAbs(sobel_y64, sobel_y);
        hls::AddWeighted(sobel_x, 0.5, sobel_y, 0.5, 0.0, sobel_g);
        hls::CvtColor<HLS_GRAY2BGR>(sobel_g, dst);
        hls::Mat2AXIvideo(dst, OUTPUT_STREAM);
    }
-
sobel.h
    #include "hls_video.h"
    #include "ap_fixed.h"

    typedef hls::stream<ap_axiu<24,1,1,1> > AXI_STREAM;

    void sobel_accel(AXI_STREAM& stream_in, AXI_STREAM& stream_out);

    #define MAX_HEIGHT 1080
    #define MAX_WIDTH 1920

    typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC3> BGR_IMG;
    typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC1> GRAY_IMG;
    typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_16S> GRAY_SIGNED_IMG;
-
tb_sobel.cpp
    #include "sobel.h"
    #include "hls_opencv.h"

    int main(){
        cv::Mat img_src(cv::Size(MAX_WIDTH,MAX_HEIGHT),CV_8UC3);
        cv::Mat img_dst(cv::Size(MAX_WIDTH,MAX_HEIGHT),CV_8UC3);

        // Change this path to the location of your source image
        img_src = cv::imread("C:/work/zynq/hls/hls_projects/sobel_edge/src/FHD_1.jpg");
        if (img_src.empty()) {
            return 1;  // image not found or unreadable
        }

        AXI_STREAM stream_in, stream_out;
        cvMat2AXIvideo(img_src, stream_in);
        sobel_accel(stream_in, stream_out);
        AXIvideo2cvMat(stream_out, img_dst);

        // Change this path to the location of your destination image
        cv::imwrite("C:/work/zynq/hls/hls_projects/sobel_edge/src/sobel_FHD_1.jpg", img_dst);
        return 0;
    }
-
Add the pre-written source / testbench files to the HLS project,
or add empty files first and write the code afterwards.
-
Project Settings -> Synthesis -> Set Top Function as
A C/C++ file may contain many functions, so specify the top function here. The top function must not be named main(), because main() is used in the test bench to call the top function.
-
Synthesis
Click the C Synthesis button.
After C synthesis, the report shows the following information.
- Timing: The clock speed your IP can work with. In this report, the timing target is 10.00 ns, which corresponds to a 100 MHz fabric clock. The estimated timing of the sobel_accel IP is 8.544 ns, which is below the target, so the IP can be used in a system with a 100 MHz fabric clock.
- Latency (clock cycles): How many clock cycles are required to execute the IP.
- Utilization Estimates: The FPGA resources used.
- Interfaces: The protocols used for input/output.
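The timing numbers above translate into clock frequencies with simple arithmetic, sketched below. The frame-rate estimate additionally assumes the kernel accepts one pixel per clock cycle and ignores pipeline fill and any blanking; that assumption is not stated in the report, and the helper names are hypothetical.

```python
def period_ns_to_mhz(period_ns):
    """Clock frequency in MHz for a clock period given in nanoseconds."""
    return 1000.0 / period_ns


def frames_per_second(width, height, clock_mhz, pixels_per_clock=1.0):
    """Upper bound on frame rate for a streaming kernel (assumed pixel rate)."""
    cycles_per_frame = width * height / pixels_per_clock
    return clock_mhz * 1e6 / cycles_per_frame


target_mhz = period_ns_to_mhz(10.0)        # timing target from the project settings
estimated_mhz = period_ns_to_mhz(8.544)    # estimated timing from the report
slack_ns = 10.0 - 8.544                    # positive slack: the target is met
fps_at_target = frames_per_second(1920, 1080, target_mhz)

print(f"target clock:    {target_mhz:.1f} MHz")
print(f"estimated limit: {estimated_mhz:.1f} MHz")
print(f"slack:           {slack_ns:.3f} ns")
print(f"1080p bound:     {fps_at_target:.1f} fps")
```

The estimated limit is only a synthesis-time estimate; the achievable clock is confirmed after place and route in Vivado.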
-
Test Bench
Run the test bench. The left red box is C Simulation, and the right one is C/RTL Cosimulation. Before you run the test bench, you must change the paths of the source and destination images.
The result image looks correct, so the IP is ready to use.
-
Export to RTL
Now export the design as an RTL IP. You can choose either Verilog or VHDL as the language. The path to the IP is [project_name]/solution1/impl/ip
In this project, there is an image/video processing kernel (sobel_accel) between two VDMAs. The 'Image Processing Kernel' block is easy to find.
- PS reads a stored image from the external SD card (using OpenCV-Python).
- A NumPy array is used for the image frame.
- Configure the VDMAs' channel mode as VideoMode (Width 1920, Height 1080, Data Width 24 for R, G, B).
- Deep copy the image frame to a Contiguous Memory Array (CMA).
- The write channel VDMA sends the image frame stored in the input buffer (CMA).
- The read channel VDMA receives the image frame and stores it in the output buffer (CMA).
- PS displays the received frame image.
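The steps above could look roughly like the following on the board. This is a sketch under several assumptions, not the code from the notebooks: it assumes the PYNQ Python API (which the VideoMode and CMA wording suggests), the instance names `axi_vdma_0`, `axi_vdma_1`, and `sobel_accel_0` are hypothetical and depend on the block design, and writing 0x81 to offset 0x00 assumes the usual Vivado HLS control register layout (ap_start plus auto_restart).

```python
def expected_frame_shape(width=1920, height=1080, bits_per_pixel=24):
    """Shape of a NumPy frame matching the video mode: (rows, cols, channels)."""
    return (height, width, bits_per_pixel // 8)


def run_sobel_on_fpga(image_path, bitstream="hw_sobel.bit"):
    """Send one frame through the Sobel kernel and return the result."""
    # Imports are local so this file can be loaded on a machine without PYNQ.
    import cv2
    import numpy as np
    from pynq import Overlay
    from pynq.lib.video import VideoMode

    overlay = Overlay(bitstream)          # expects the .hwh next to the .bit
    vdma_send = overlay.axi_vdma_0        # hypothetical instance name
    vdma_recv = overlay.axi_vdma_1        # hypothetical instance name
    sobel = overlay.sobel_accel_0         # hypothetical instance name

    image = cv2.imread(image_path)        # NumPy array, or None on failure
    if image is None or image.shape != expected_frame_shape():
        raise ValueError("expected a readable 1920x1080 3-channel image")

    mode = VideoMode(1920, 1080, 24)
    vdma_send.writechannel.mode = mode
    vdma_recv.readchannel.mode = mode
    vdma_send.writechannel.start()
    vdma_recv.readchannel.start()

    sobel.write(0x00, 0x81)               # assumed: ap_start | auto_restart

    frame_in = vdma_send.writechannel.newframe()   # contiguous input buffer
    frame_in[:] = image                            # deep copy into the buffer
    vdma_send.writechannel.writeframe(frame_in)
    frame_out = vdma_recv.readchannel.readframe()  # contiguous output buffer
    result = np.array(frame_out)                   # copy out of the buffer

    vdma_send.writechannel.stop()
    vdma_recv.readchannel.stop()
    return result
```

Checking the frame shape before the copy avoids a shape mismatch when the image is assigned into the buffer returned by `newframe()`.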
Creating the Vivado project and the block design components is very similar to the former project, VDMA_PASS_THROUGH.
-
Create Vivado Project
- Set board as Ultra96
-
Add the pre-built Sobel Accel IP
IP Catalog - Add Repository: Add path to the pre-built Sobel Accel IP.
-
- Processing System: Apply board presets, Enable M_AXI_HPM0_FPD, S_AXI_HPC0_FPD
- AXI Video Direct Memory Access x 2
Frame Buffers: The number of frames stored in external DRAM (not important for this project).
Stream Data Width: At least 24, since each RGB pixel is 24 bits wide.
Read/Write Burst Size: This matters for bandwidth. For HD resolution images, a 32-bit width yields a throughput of 96 fps, and a 64-bit width yields 192 fps.
- Concat: Number of Ports 5
- AXI Interrupt Controller
- Sobel Accel IP: Pre-built Sobel accel IP.
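The throughput figures quoted for the VDMA can be reproduced with a back-of-the-envelope calculation. The sketch below assumes a 150 MHz AXI clock, 1920x1080 frames, and 3 bytes per pixel; the clock value is an assumption chosen because it matches the quoted 96 fps and 192 fps, and it is not stated in the project description.

```python
ASSUMED_CLOCK_HZ = 150_000_000   # assumption: not given in the project description


def vdma_fps(bus_width_bits, clock_hz, width=1920, height=1080, bytes_per_pixel=3):
    """Ideal frames per second when one bus word is transferred every clock."""
    bytes_per_second = (bus_width_bits // 8) * clock_hz
    frame_bytes = width * height * bytes_per_pixel
    return bytes_per_second / frame_bytes


fps_32 = vdma_fps(32, ASSUMED_CLOCK_HZ)
fps_64 = vdma_fps(64, ASSUMED_CLOCK_HZ)
print(f"32-bit: {fps_32:.1f} fps, 64-bit: {fps_64:.1f} fps")
```

These are ideal upper bounds; real transfers lose some bandwidth to burst overhead and DRAM arbitration.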
-
Generate Bitstream
Copy the bitstream (bit) and hardware handoff (hwh) files and rename them to hw_sobel.bit / hw_sobel.hwh:
- proj.runs/impl_1/design_1_wrapper.bit
- proj.srcs/sources_1/bd/design_1/hw_handoff/design_1.hwh
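The copy-and-rename step can be scripted. The sketch below is a minimal example using only the Python standard library; the function name and its parameters are hypothetical, and the default project and design names simply mirror the paths listed above.

```python
import shutil
from pathlib import Path


def collect_overlay_files(project_dir, out_dir, project="proj",
                          design="design_1", name="hw_sobel"):
    """Copy the .bit and .hwh files to out_dir under a common base name."""
    project_dir, out_dir = Path(project_dir), Path(out_dir)
    bit_src = project_dir / f"{project}.runs" / "impl_1" / f"{design}_wrapper.bit"
    hwh_src = (project_dir / f"{project}.srcs" / "sources_1" / "bd" / design
               / "hw_handoff" / f"{design}.hwh")

    out_dir.mkdir(parents=True, exist_ok=True)
    bit_dst = out_dir / f"{name}.bit"
    hwh_dst = out_dir / f"{name}.hwh"
    shutil.copyfile(bit_src, bit_dst)   # raises FileNotFoundError if missing
    shutil.copyfile(hwh_src, hwh_dst)
    return bit_dst, hwh_dst
```

Both files need the same base name and must sit in the same directory when the bitstream is loaded from the notebook.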
-
Run in the Jupyter Notebooks
- Python & OpenCV SW Version
- FPGA Accelerated HW Version