OpenCL based FPGA Convolution Accelerator with Systolic Array and Winograd
aazz44ss/ConvFPGA
ConvFPGA


Parallelism

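Total MACs for a convolution layer = Oh x Ow x Fo x Fi x Fh x Fw, where Oh x Ow is the output feature map size, Fo and Fi are the numbers of output and input feature maps, and Fh x Fw is the filter size.
Parallelizing M ways over Fo and N ways over Fi lets the accelerator perform M x N MACs per cycle.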
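
The MAC count and the effect of M x N parallelism can be checked with a few lines of Python. This is only a rough model, not code from this repository: the function names and the example layer shape are assumptions made for illustration, and the cycle count ignores memory stalls and pipeline fill.

```python
import math

def total_macs(oh, ow, fo, fi, fh, fw):
    """Total multiply-accumulates for one convolution layer."""
    return oh * ow * fo * fi * fh * fw

def ideal_cycles(oh, ow, fo, fi, fh, fw, m, n):
    """Cycles needed when M output channels and N input channels run in parallel.

    Channel counts that are not multiples of M or N leave lanes idle,
    so the number of passes is rounded up.
    """
    fo_steps = math.ceil(fo / m)
    fi_steps = math.ceil(fi / n)
    return oh * ow * fh * fw * fo_steps * fi_steps

# Hypothetical layer: 56x56 output, 64 -> 64 channels, 3x3 filter, M = N = 16.
macs = total_macs(56, 56, 64, 64, 3, 3)
cycles = ideal_cycles(56, 56, 64, 64, 3, 3, m=16, n=16)
print("MACs:", macs, "cycles:", cycles, "MACs/cycle:", macs / cycles)
```

When Fo and Fi are multiples of M and N, the ratio is exactly M x N; otherwise it drops, which is the situation the next section addresses.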

Input Folding

N is often set to 16 (or larger), but the input feature map of the first layer is usually an image with Fi = 3 (RGB channels), so only 3/16 of the FPGA's MAC lanes are used on the first layer.
Folding the input feature map increases Fi in the first layer and improves parallelism.
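
One common way to fold an input is a space-to-depth rearrangement, sketched below. The README does not say which folding scheme this project uses, so the 2x2 block size and the names `fold_input` and `lane_utilization` are assumptions for illustration only. With this scheme the first-layer filters would need a matching rearrangement, which is not shown.

```python
def fold_input(image, s=2):
    """Space-to-depth: image[c][y][x] -> folded[c*s*s][y//s][x//s].

    Each s x s spatial block is moved into s*s separate channels,
    so 3 input channels become 3*s*s channels.
    """
    channels = len(image)
    height = len(image[0])
    width = len(image[0][0])
    assert height % s == 0 and width % s == 0, "size must be divisible by s"
    folded = []
    for c in range(channels):
        for dy in range(s):
            for dx in range(s):
                plane = [
                    [image[c][y * s + dy][x * s + dx] for x in range(width // s)]
                    for y in range(height // s)
                ]
                folded.append(plane)
    return folded

def lane_utilization(fi, n):
    """Fraction of the N input-channel lanes doing useful work."""
    steps = -(-fi // n)  # ceiling division
    return fi / (steps * n)

# Small 3-channel 4x4 test image; each value encodes (channel, y, x).
img = [[[c * 100 + y * 10 + x for x in range(4)] for y in range(4)] for c in range(3)]
folded = fold_input(img, s=2)
print(len(folded), "channels after folding")
print(lane_utilization(3, 16), "->", lane_utilization(len(folded), 16))
```

In this sketch, folding 2x2 blocks raises Fi from 3 to 12, so utilization of a 16-lane array goes from 3/16 to 12/16.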

Fixed Point 8 or Fixed Point 6

The Arria 10 FPGA has 1518 DSP blocks with 18x19 multipliers.
It can do 759 FP32 multiplications per cycle, because each FP32 multiplication needs 2 DSPs (the 23 fraction bits exceed the 18-bit multiplier width).
It can do 1518 Fixed Point 8 multiplications per cycle (8 bits fit within 18).
It can do 3036 Fixed Point 6 multiplications per cycle by packing 2 Fixed Point 6 values into one 18-bit integer, laid out as (FP6_0, 6'0, FP6_1): the first value, 6 zero bits, then the second value.
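
The packing trick can be modeled in software to see why one multiplier yields two products. The sketch below assumes unsigned 6-bit operands and a shared 6-bit second operand. Signed values would need an extra correction step that is not shown, and the README does not say how the project handles sign. The function name `packed_multiply` is hypothetical.

```python
DSP_COUNT = 1518  # figure quoted in the text above

def packed_multiply(a0, a1, b):
    """Compute a0*b and a1*b with a single wide multiplication.

    a0 and a1 are packed as {a0, 6 zero bits, a1} into 18 bits.
    Because a1*b < 2**12 for 6-bit unsigned inputs, the low product
    never spills into the bits that hold the high product.
    """
    assert all(0 <= v < 64 for v in (a0, a1, b)), "operands must be unsigned 6-bit"
    packed = (a0 << 12) | a1
    assert packed < (1 << 18)
    product = packed * b          # one multiplier operation
    low = product & 0xFFF         # a1 * b
    high = product >> 12          # a0 * b
    return high, low

# Multiplications per cycle under the text's assumptions.
fp32_per_cycle = DSP_COUNT // 2   # 2 DSPs per FP32 multiply
fx8_per_cycle = DSP_COUNT         # 1 DSP per 8-bit multiply
fx6_per_cycle = DSP_COUNT * 2     # 2 packed 6-bit multiplies per DSP

print(packed_multiply(5, 7, 9), fp32_per_cycle, fx8_per_cycle, fx6_per_cycle)
```

The six zero bits act as guard bits: they are what keeps the two partial products separable after the multiplication.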

Architecture of conv_core

Systolic Array

Broadcasting data from DDR to every MAC unit is unfriendly to hardware layout: the long, high-fanout routes force a very low clock frequency (20-30 MHz) in order to meet timing requirements.
With a systolic array, data passes from DDR to PE(0) and then from PE to PE up to PE(M-1), so each connection is short and the design can meet timing at a much higher clock frequency (130-160 MHz). This increases throughput by about 5 times compared to broadcasting data from DDR.
Each PE consists of N multipliers and 4 shift registers.
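
The data movement described above can be illustrated with a small cycle-level model. This is a simplified sketch, not the project's conv_core: it assumes each PE holds the weights for one output channel and forwards its input vector to the next PE every cycle, and it does not model the 4 shift registers or the DDR interface. The name `systolic_run` is hypothetical.

```python
def systolic_run(inputs, weights):
    """Simulate a 1D systolic chain of M PEs, each with N multipliers.

    inputs:  list of T input vectors of length N, fed to PE(0) one per cycle
    weights: list of M weight vectors of length N, one held by each PE
    Returns (out, cycles) where out[t][m] = dot(inputs[t], weights[m]).
    """
    m_count = len(weights)
    t_count = len(inputs)
    pipeline = [None] * m_count  # data register in front of each PE
    out = [[0] * m_count for _ in range(t_count)]
    cycles = 0
    while cycles < t_count + m_count - 1:
        # Each PE takes the vector its left neighbor held in the previous cycle.
        for pe in range(m_count - 1, 0, -1):
            pipeline[pe] = pipeline[pe - 1]
        pipeline[0] = (cycles, inputs[cycles]) if cycles < t_count else None
        # Every PE with valid data performs N multiplications and sums them.
        for pe, slot in enumerate(pipeline):
            if slot is not None:
                t, vec = slot
                out[t][pe] = sum(x * w for x, w in zip(vec, weights[pe]))
        cycles += 1
    return out, cycles

# Tiny example: T = 3 input vectors, M = 2 PEs, N = 4 multipliers per PE.
inputs = [[1, 2, 3, 4], [0, 1, 0, 1], [2, 2, 2, 2]]
weights = [[1, 1, 1, 1], [1, -1, 1, -1]]
result, cycles = systolic_run(inputs, weights)
print(result, "in", cycles, "cycles")
```

The chain adds only M - 1 cycles of fill latency, while every register-to-register hop stays local, which is what allows the higher clock frequency.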

Winograd

Use 1D Winograd to increase throughput: it produces each group of outputs with fewer multiplications than direct convolution.
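
As an illustration of the saving, the sketch below implements the standard 1D Winograd F(2,3) algorithm, which produces 2 outputs of a 3-tap filter with 4 multiplications instead of 6. The README does not state which tile size the project uses, so F(2,3) is an assumption here, and the function names are illustrative.

```python
from fractions import Fraction

def winograd_f23(d, g):
    """1D Winograd F(2,3): 2 outputs from 4 inputs and a 3-tap filter."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform; it depends only on the weights, so it can be precomputed.
    u0 = g0
    u1 = Fraction(g0 + g1 + g2, 2)
    u2 = Fraction(g0 - g1 + g2, 2)
    u3 = g2
    # The only 4 multiplications that involve input data.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform uses additions only.
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_f23(d, g):
    """Reference: direct sliding-window computation with 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

d = [1, 2, 3, 4]
g = [1, 0, -1]
print(winograd_f23(d, g), direct_f23(d, g))
```

Exact fractions are used so that the comparison with the direct result is not affected by rounding. A fixed-point hardware implementation would have to account for the division by 2 in the filter transform.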

[Figure: conv_core architecture]

Resource Usage on Intel Arria10 FPGA

[Figure: resource usage]
