HLS RTL VIDEO PROCESSING

A comparison between the RTL and HLS versions of an image processing kernel.

The accompanying paper is "Comparison Between HLS and HDL Image Processing in FPGAs", ICCE-Asia 2020.

So What's Inside

HLS Project: sobel_edge
Vivado Project: sobel_1080p
Prebuilt Base Image: base
Image samples: data
Using HLS version IP Python Code: sobel_sw.ipynb
Using RTL version IP Python Code: sobel_hw.ipynb

High Level Synthesis

High-Level Synthesis (HLS) tools transform C/C++ functions into FPGA-synthesizable RTL modules.

Using HLS gives you:

Rapid design speed
Easy verification (testbench with C / RTL)

In this project, we're going to

Create an image / video processing kernel of Sobel edge detection
Verify it
Export as an RTL IP
Use it in Vivado

If you are new to HLS, read this article first.

Image Processing Kernel

The image processing kernel we're going to make contains the following steps.

Color conversion (RGB2GRAY)
Spatial low-pass filter (3x3 Gaussian filter)
Sobel operation (3x3 Sobel)
Sum of the gradients' magnitudes (weighted sum)
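The four steps above can be sketched as a plain C++ software reference model, which is handy for checking the hardware output. This is only an illustration and not the code used in this project: the names `GrayImage`, `bgr_to_gray`, `gaussian3x3`, `sobel_magnitude` and `sobel_reference` are hypothetical, and the integer gray weights, the 1-2-1 Gaussian kernel, the replicated borders and the truncating division are assumptions, so the result is not expected to be bit-exact with the `hls::` video library.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Single-channel 8-bit image stored row-major.
struct GrayImage {
    int rows;
    int cols;
    std::vector<std::uint8_t> px;
    GrayImage(int r, int c) : rows(r), cols(c), px(static_cast<std::size_t>(r) * c, 0) {}
    // Read a pixel with replicated borders (assumption, not the HLS border mode).
    int at(int r, int c) const {
        r = std::min(std::max(r, 0), rows - 1);
        c = std::min(std::max(c, 0), cols - 1);
        return px[static_cast<std::size_t>(r) * cols + c];
    }
};

// Step 1: color conversion with integer weights that sum to 256
// (approximately 0.114*B + 0.587*G + 0.299*R).
inline std::uint8_t bgr_to_gray(std::uint8_t b, std::uint8_t g, std::uint8_t r) {
    return static_cast<std::uint8_t>((29 * b + 150 * g + 77 * r) >> 8);
}

// Apply a 3x3 integer kernel centered at (r, c) and return the raw sum.
inline int conv3x3(const GrayImage& img, int r, int c, const int k[3][3]) {
    int acc = 0;
    for (int dr = -1; dr <= 1; ++dr)
        for (int dc = -1; dc <= 1; ++dc)
            acc += k[dr + 1][dc + 1] * img.at(r + dr, c + dc);
    return acc;
}

// Step 2: 3x3 Gaussian low-pass filter (1-2-1 kernel, divided by 16 with rounding).
GrayImage gaussian3x3(const GrayImage& in) {
    static const int K[3][3] = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};
    GrayImage out(in.rows, in.cols);
    for (int r = 0; r < in.rows; ++r)
        for (int c = 0; c < in.cols; ++c)
            out.px[static_cast<std::size_t>(r) * in.cols + c] =
                static_cast<std::uint8_t>((conv3x3(in, r, c, K) + 8) / 16);
    return out;
}

// Steps 3 and 4: Sobel in x and y, absolute value saturated to 8 bits,
// then the weighted sum 0.5*|gx| + 0.5*|gy| (truncated).
GrayImage sobel_magnitude(const GrayImage& in) {
    static const int KX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int KY[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
    GrayImage out(in.rows, in.cols);
    for (int r = 0; r < in.rows; ++r)
        for (int c = 0; c < in.cols; ++c) {
            int ax = std::min(std::abs(conv3x3(in, r, c, KX)), 255);
            int ay = std::min(std::abs(conv3x3(in, r, c, KY)), 255);
            out.px[static_cast<std::size_t>(r) * in.cols + c] =
                static_cast<std::uint8_t>((ax + ay) / 2);
        }
    return out;
}

// Whole pipeline on an already gray image.
GrayImage sobel_reference(const GrayImage& gray) {
    return sobel_magnitude(gaussian3x3(gray));
}
```

A flat image produces an all-zero result, while a sharp vertical step produces a bright line along the step, which is the behavior expected from an edge detector.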

For optimization, the dataflow pragma is used to parallelize the processing. It inserts a channel (a FIFO buffer) between consecutive stages, so each stage can start consuming pixels as soon as the previous stage produces them and all stages run concurrently.
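To see why FIFO channels between stages help, here is a toy cycle-count model in plain C++. It is a sketch under strong assumptions, not a description of what the HLS tool generates: every stage is assumed to handle exactly one pixel per cycle with no internal latency, and the function names `sequential_cycles` and `dataflow_cycles` are hypothetical.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Cycles needed when each stage must finish the whole frame
// before the next stage is allowed to start.
std::size_t sequential_cycles(std::size_t pixels, std::size_t stages) {
    return pixels * stages;
}

// Cycles needed when stages are connected by FIFOs and every stage
// forwards one pixel per cycle as soon as its input FIFO is non-empty.
std::size_t dataflow_cycles(std::size_t pixels, std::size_t stages) {
    // fifo[0] is the input stream, fifo[stages] collects the output.
    std::vector<std::queue<int>> fifo(stages + 1);
    for (std::size_t i = 0; i < pixels; ++i) fifo[0].push(static_cast<int>(i));

    std::size_t cycles = 0;
    while (fifo[stages].size() < pixels) {
        // Visit stages from last to first so that a pixel advances
        // by at most one stage per simulated cycle.
        for (std::size_t s = stages; s-- > 0;) {
            if (!fifo[s].empty()) {
                fifo[s + 1].push(fifo[s].front());
                fifo[s].pop();
            }
        }
        ++cycles;
    }
    return cycles;
}
```

In this model a frame of N pixels through S stages takes N + S - 1 cycles with FIFOs instead of N * S without them. Real stages such as the 3x3 filters also need line buffers to fill before they produce output, so actual latency is higher than the toy model suggests.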

Vivado High Level Synthesis

Create Vivado High-Level Synthesis project

Do not specify source or testbench files; they will be added later.
Solution name: solution1
Clock Period: 10
Part: select the Ultra96v2 board

Create the source, header, and testbench files and write the code.

sobel.cpp

  #include "sobel.h"
  void sobel_accel(AXI_STREAM& INPUT_STREAM, AXI_STREAM& OUTPUT_STREAM){
      #pragma HLS INTERFACE axis port=INPUT_STREAM
      #pragma HLS INTERFACE axis port=OUTPUT_STREAM
      #pragma HLS INTERFACE s_axilite port=return bundle=CONTROL_BUS

      int rows = MAX_HEIGHT;
      int cols = MAX_WIDTH;

      BGR_IMG src(rows, cols);
      GRAY_IMG gray(rows, cols);
      GRAY_IMG blurred(rows, cols);
      GRAY_IMG gray1(rows, cols);
      GRAY_IMG gray2(rows, cols);
      GRAY_SIGNED_IMG sobel_x64(rows, cols);
      GRAY_SIGNED_IMG sobel_y64(rows, cols);
      GRAY_IMG sobel_x(rows, cols);
      GRAY_IMG sobel_y(rows, cols);
      GRAY_IMG sobel_g(rows, cols);
      BGR_IMG dst(rows, cols);

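      // Run the stages below concurrently, connected by FIFO channels
      #pragma HLS dataflow
      hls::AXIvideo2Mat(INPUT_STREAM, src);                   // AXI4-Stream -> hls::Mat
      hls::CvtColor<HLS_BGR2GRAY>(src, gray);                 // color conversion
      hls::GaussianBlur<3,3>(gray,blurred);                   // 3x3 low-pass filter
      hls::Duplicate(blurred,gray1,gray2);                    // one copy per Sobel direction
      hls::Sobel<1,0,3>(gray1, sobel_x64);                    // 3x3 Sobel, x derivative
      hls::Sobel<0,1,3>(gray2, sobel_y64);                    // 3x3 Sobel, y derivative
      hls::ConvertScaleAbs(sobel_x64, sobel_x);               // |gx| as 8-bit
      hls::ConvertScaleAbs(sobel_y64, sobel_y);               // |gy| as 8-bit
      hls::AddWeighted(sobel_x,0.5,sobel_y,0.5,0.0,sobel_g);  // 0.5*|gx| + 0.5*|gy|
      hls::CvtColor<HLS_GRAY2BGR>(sobel_g, dst);              // back to 3 channels for the 24-bit stream
      hls::Mat2AXIvideo(dst, OUTPUT_STREAM);                  // hls::Mat -> AXI4-Stream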
  }

sobel.h

  #include "hls_video.h"
  #include "ap_fixed.h"

  typedef hls::stream<ap_axiu<24,1,1,1> > AXI_STREAM;

  void sobel_accel(AXI_STREAM& stream_in, AXI_STREAM& stream_out);
  #define MAX_HEIGHT 1080
  #define MAX_WIDTH 1920
  typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC3> BGR_IMG;
  typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC1> GRAY_IMG;
  typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_16S> GRAY_SIGNED_IMG;

tb_sobel.cpp

  #include "sobel.h"
  #include "hls_opencv.h"
  int main(){
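      cv::Mat img_src(cv::Size(MAX_WIDTH,MAX_HEIGHT),CV_8UC3);
      cv::Mat img_dst(cv::Size(MAX_WIDTH,MAX_HEIGHT),CV_8UC3);
      // Change both paths to match your environment.
      // The kernel processes a fixed MAX_WIDTH x MAX_HEIGHT frame.
      img_src = cv::imread("C:/work/zynq/hls/hls_projects/sobel_edge/src/FHD_1.jpg");
      if (img_src.empty()) return 1;        // fail if the input image could not be read
      AXI_STREAM stream_in, stream_out;
      cvMat2AXIvideo(img_src, stream_in);   // cv::Mat -> AXI4-Stream
      sobel_accel(stream_in, stream_out);   // run the kernel under test
      AXIvideo2cvMat(stream_out, img_dst);  // AXI4-Stream -> cv::Mat
      cv::imwrite("C:/work/zynq/hls/hls_projects/sobel_edge/src/sobel_FHD_1.jpg", img_dst);

      return 0;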
  }
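The testbench above only writes the result image, so the result has to be inspected by eye. One possible way to make it self-checking is to compare the output against a golden image pixel by pixel. The helper below is a minimal sketch in plain C++; `count_mismatches` and its `tolerance` parameter are hypothetical and not part of this project.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Count the pixels whose absolute difference exceeds `tolerance`.
// Buffers of different sizes are reported as fully mismatched.
std::size_t count_mismatches(const std::vector<std::uint8_t>& result,
                             const std::vector<std::uint8_t>& golden,
                             int tolerance) {
    if (result.size() != golden.size())
        return std::max(result.size(), golden.size());

    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < result.size(); ++i) {
        int diff = std::abs(static_cast<int>(result[i]) - static_cast<int>(golden[i]));
        if (diff > tolerance) ++mismatches;
    }
    return mismatches;
}
```

If such a check is added, main() should return a nonzero value when the mismatch count is above the chosen limit, because the HLS tool treats a nonzero testbench return value as a failed simulation. A small tolerance is usually needed since fixed-point hardware and a floating-point software model round differently.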

Add the pre-written source and testbench files to the HLS project, or add the files first and write the code afterwards.
Project Settings -> Synthesis -> Set Top Function: a C/C++ file may contain many functions, so specify the top function here (sobel_accel in this project). The top function must not be named main(), since main() is used in the testbench to call the top function.
Synthesis: Click the C Synthesis button. After C synthesis, the report shows the following information.

Timing: The clock speed at which your IP can run. In this report, the timing target is 10.00 ns, which corresponds to a 100 MHz fabric clock. The estimated timing of the sobel_accel IP is 8.544 ns, so the IP can be used in a system with a 100 MHz fabric clock.
Latency (clock cycles): The number of clock cycles required to execute the IP.
Utilization Estimates: The FPGA resources used.
Interfaces: The protocols used for input/output.
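The timing and latency figures above translate into clock frequency and frame rate with simple arithmetic. The sketch below uses hypothetical helper names, and the latency value in the usage note is an assumption (roughly one clock cycle per pixel), not a number taken from this project's synthesis report.

```cpp
// Convert a clock period in nanoseconds to a frequency in MHz.
constexpr double period_ns_to_mhz(double period_ns) {
    return 1000.0 / period_ns;
}

// True when the estimated clock period fits within the target period.
constexpr bool meets_timing(double estimated_ns, double target_ns) {
    return estimated_ns <= target_ns;
}

// Frames per second when one frame takes `latency_cycles` clock cycles.
constexpr double frames_per_second(double clock_mhz, long latency_cycles) {
    return clock_mhz * 1e6 / static_cast<double>(latency_cycles);
}
```

For example, `period_ns_to_mhz(10.0)` gives 100 MHz and `meets_timing(8.544, 10.0)` is true, matching the report described above. If the latency were about 1920 * 1080 cycles per frame, `frames_per_second(100.0, 1920L * 1080L)` would give roughly 48 frames per second at 100 MHz; use the latency from your own report for a real estimate.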

Test Bench: Run the test bench. In the toolbar, the left highlighted button runs C Simulation and the right one runs C/RTL Cosimulation. Before you run the test bench, you must change the paths of the source image and the destination image. The result image looks correct, so the IP is ready to use.
Export to RTL: Now export the design as an RTL IP. You can choose between Verilog and VHDL. The path to the IP is [project_name]/solution1/impl/ip

SYSTEM BLOCK DIAGRAM

In this project, an image/video processing kernel (sobel_accel) sits between two VDMAs. The image processing kernel is easy to find in the block diagram.

The PS reads a stored image from the external SD card (using OpenCV-Python).
A NumPy array is used for the image frame.
Configure the VDMA channels' mode as VideoMode (Width 1920, Height 1080, Data Width 24 for R, G, B).
Deep copy the image frame to a Contiguous Memory Array (CMA) buffer.
The write channel of the VDMA sends the image frame stored in the input buffer (CMA).
The read channel of the VDMA receives the processed frame and stores it in the output buffer (CMA).
The PS displays the received frame.

HW Implementation

Creating the Vivado project and the block design is very similar to the previous project, VDMA_PASS_THROUGH.

Create Vivado Project
- Set board as Ultra96
Add the pre-built Sobel Accel IP
- IP Catalog -> Add Repository: add the path to the pre-built Sobel Accel IP.
Create Block Design
- Processing System: Apply board presets, enable M_AXI_HPM0_FPD and S_AXI_HPC0_FPD
- AXI Video Direct Memory Access x 2
  - Frame Buffers: the number of frames stored in external DRAM (not important for this project)
  - Stream Data Width: at least 24, since each RGB pixel is 24 bits wide
  - Read/Write Burst Size: this matters for bandwidth. For HD resolution images, a 32-bit width yields a throughput of 96 fps and a 64-bit width yields 192 fps.
- Concat: Number of Ports 5
- AXI Interrupt Controller
- Sobel Accel IP: the pre-built Sobel accel IP
Generate Bitstream
- Copy and rename the bitstream (bit) and the hardware handoff file (hwh):
  - proj.runs/impl_1/design_1_wrapper.bit as hw_sobel.bit
  - proj.srcs/sources_1/bd/design_1/hw_handoff/design_1.hwh as hw_sobel.hwh
Run in the Jupyter Notebooks
- Python & OpenCV SW Version
- FPGA Accelerated HW Version
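The bandwidth note in the VDMA settings above can be illustrated with a rough upper-bound calculation. This is a sketch with hypothetical function names that assumes one data beat is transferred on every clock cycle with no stalls. The README does not state the clock frequency or pixel packing behind its 96 fps and 192 fps figures, so the absolute numbers from this sketch will generally differ; it only shows why doubling the data width doubles the frame-rate ceiling.

```cpp
// Bytes per second carried by a bus of `data_width_bits`,
// assuming one beat per clock cycle (idealized).
constexpr double bus_bytes_per_second(int data_width_bits, double clock_hz) {
    return data_width_bits / 8.0 * clock_hz;
}

// Size of one frame in bytes.
constexpr double frame_bytes(int width, int height, int bits_per_pixel) {
    return static_cast<double>(width) * height * bits_per_pixel / 8.0;
}

// Upper bound on frames per second that the bus can move.
constexpr double max_fps(int data_width_bits, double clock_hz,
                         int width, int height, int bits_per_pixel) {
    return bus_bytes_per_second(data_width_bits, clock_hz) /
           frame_bytes(width, height, bits_per_pixel);
}
```

For a 1920 x 1080 frame with 24 bits per pixel, `frame_bytes` gives 6,220,800 bytes, and `max_fps` with a 64-bit width is exactly twice the value obtained with a 32-bit width at the same clock.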
