Prerequisites for running inference on a new network with CHaiDNN:

- Network files (.prototxt, .caffemodel & mean file)
- All the layers of the network must be supported by CHaiDNN
- Quantization parameters for all layers
- CHaiDNN repository & pre-built binaries / SD-Card image
CHaiDNN supports the following bit-widths for weights/bias and activations:

Weights/Bias | Activations
---|---
8-bit | 8-bit
6-bit | 6-bit
📌 NOTE: For all the layers of a given network, the bit-width of activations must be the same. The same applies to the bit-widths of weights and biases.
Supported Layers | | |
---|---|---|---
Convolution | BatchNorm | Power | Scale
Deconvolution | ReLU | Pooling (Max, Avg) | InnerProduct
Dropout | Softmax | Crop | Concat
Permute | Normalize (L2 Norm) | Argmax | Flatten
PriorBox | Reshape | NMS | Eltwise
CReLU* | Depthwise Separable Convolution | Software Layer Plugin** | Input/Data
Dilated Convolution | | |
📌 NOTE: The above layers are verified only for the layer configurations/parameters used in the six provided example networks.
Step 1: Update the prototxt with precision parameters

CHaiDNN works in the fixed-point domain for better performance. All feature maps and trained parameters are converted from single precision to fixed point before the computation starts. The steps to obtain the updated deploy.prototxt with precision parameters are described in the Quantization User Guide.
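As a rough illustration of the conversion (a minimal sketch of generic dynamic fixed-point quantization, not CHaiDNN's actual implementation), a single-precision value can be mapped to an 8-bit integer together with a per-layer fractional bit-width `fl`:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

//# Generic dynamic fixed-point sketch: value ≈ q * 2^(-fl), with q an 8-bit integer
//# and fl the per-layer fractional bit-width (function names are illustrative, not CHaiDNN APIs)
std::int8_t quantize_fixed(float value, int fl)
{
    float scaled = std::round(value * std::pow(2.0f, (float)fl));
    scaled = std::max(-128.0f, std::min(127.0f, scaled));   // saturate to the 8-bit range
    return (std::int8_t)scaled;
}

float dequantize_fixed(std::int8_t q, int fl)
{
    return (float)q * std::pow(2.0f, (float)(-fl));
}
```

For example, with `fl = 4` the value 1.3 quantizes to round(1.3 × 16) = 21, which dequantizes to 21 / 16 = 1.3125. The precision parameters added to the prototxt describe the per-layer fixed-point settings CHaiDNN should use.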
Step 2: Build the application with the new network

To implement network inference using the CHaiDNN APIs, follow these steps.
- Create a cpp file, say `MyNet_ex.cpp`.

- Include standard headers and OpenCV header files.

```cpp
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#undef __ARM_NEON__
#undef __ARM_NEON
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#define __ARM_NEON__
#define __ARM_NEON

#include <iostream>

using namespace std;
using namespace cv;
```
- Include the SDSoC header.

```cpp
#include "sds_lib.h"
```
- Include the interface headers, which contain the CHaiDNN API prototypes.

```cpp
#include "../api/xi.hpp"
#include "../api/xi_readwrite_util.hpp"
#include "../checkers/checkers.hpp"
```
- Define start/end time macros for performance measurement.

```cpp
//# Performance check
long long int clock_start, clock_end, frequency;

#define TIME_STAMP_INIT  clock_start = sds_clock_counter();
#define TIME_STAMP       { \
    clock_end = sds_clock_counter(); \
    frequency = sds_clock_frequency(); \
}
```
- Write `main()`. The steps below go inside `main()`.

- Create a structure to hold info about the input/output layers.

```cpp
io_layer_info io_layer_info_ptr;
```
- Define variables to hold the network directory/file paths.

```cpp
char *dirpath    = "/mnt/models/MyNet";   /* Path to the network model directory */
char *prototxt   = "deploy.prototxt";     /* Prototxt file name residing in the network model directory */
char *caffemodel = "MyNet.caffemodel";    /* caffemodel file name residing in the network model directory */
```
- Define a variable to hold the input image path.

```cpp
char *img_path = "/mnt/models/MyNet/input/camel.jpg";
```
- Define the start and end layers of the network.

```cpp
//# start/end layer in the graph
string start_layer = "";
string end_layer = "";
```

`start_layer` represents the name of the first layer of the network. If it is set to an empty string, the name of the first layer in the prototxt is taken by default. `end_layer` represents the name of the last layer of the network. If it is set to an empty string, the name of the last layer in the prototxt is taken by default.
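For example, to run only the portion of the graph between two named layers (the layer names below are hypothetical and must match names in the deploy.prototxt):

```cpp
//# Example: run only the sub-graph between two named layers (illustrative layer names)
start_layer = "conv1";
end_layer   = "fc8";
```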
- Initialize the data using the `xiInit()` API. This API parses the network, initializes the job queue with memory, and stores the network parameters in buffers.

```cpp
void *chai_handle = xiInit(dirpath, prototxt, caffemodel, &io_layer_info_ptr,
                           numImg_to_process, is_first_layer, start_layer, end_layer);
```
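The call above also references `numImg_to_process` and `is_first_layer`, which are not defined elsewhere in this walkthrough. A minimal sketch of plausible definitions is shown below; the exact types and values are assumptions, so check the example applications shipped with CHaiDNN for the definitive usage.

```cpp
//# Assumed definitions (illustrative only)
int numImg_to_process = XBATCH_SIZE;   /* images processed per call; XBATCH_SIZE also appears in the latency calculation below */
int is_first_layer    = 1;             /* assumed flag: non-zero when execution starts from the network's first layer */
```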
- Read and pre-process the input image. This includes resizing the input image and subtracting the mean if the mean is pixel-wise. Two example utility functions are provided to make preprocessing easier.

```cpp
int status = inputNormalization(normalizeInput, resize_h, resize_w, img_path1, img_path2,
                                inp_mode, mean_path, numImg_to_process, io_layer_info_ptr);
```
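`inputNormalization()` likewise takes several arguments not declared earlier (`normalizeInput`, `resize_h`, `resize_w`, `img_path1`, `img_path2`, `inp_mode`, `mean_path`). The sketch below shows one plausible set of definitions; the values are assumptions for illustration, and `normalizeInput` is the destination buffer whose type is given by the prototype in `xi_readwrite_util.hpp`.

```cpp
//# Illustrative definitions for the pre-processing call (values are assumptions)
int resize_h = 224;                              /* network input height, per deploy.prototxt */
int resize_w = 224;                              /* network input width,  per deploy.prototxt */
char *img_path1 = img_path;                      /* first image of the batch */
char *img_path2 = img_path;                      /* second image of the batch (re-used here for illustration) */
int inp_mode = 0;                                /* input mode flag expected by the utility */
char *mean_path = "/mnt/models/MyNet/mean.txt";  /* mean file for the network */
```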
- Create the input buffers.

```cpp
int in_size = io_layer_info_ptr.inlayer_sizebytes;

//# Create input/output Buffers
vector<void *> input;
void *ptr;
for(int i = 0; i < io_layer_info_ptr.num_in_bufs; i++)
{
    if(io_layer_info_ptr.inlayer_exectype.compare("hardware") == 0)
        ptr = sds_alloc_non_cacheable(in_size);
    else
        ptr = malloc(in_size);
    input.push_back(ptr);
}
```
- Create the output buffers.

```cpp
int out_size = io_layer_info_ptr.outlayer_sizebytes;

vector<void *> output;
for(int i = 0; i < io_layer_info_ptr.num_out_bufs; i++)
{
    if(io_layer_info_ptr.outlayer_exectype.compare("hardware") == 0)
        ptr = sds_alloc_non_cacheable(out_size);
    else
        ptr = malloc(out_size);
    output.push_back(ptr);
}
```
- Pack the mean-subtracted input into the input buffers.

```cpp
xiInputRead(normalizeInput, input, numImg_to_process, io_layer_info_ptr);
```
- Call `xiExec()` to run inference.

```cpp
TIME_STAMP_INIT
xiExec(chai_handle, input, output);
TIME_STAMP
```

📌 NOTE: `TIME_STAMP_INIT` and `TIME_STAMP` store the start and end cycle counts, which can be used to check the performance of the network.

- Check the latency.

```cpp
//# Total time for the API in Images/Second
double tot_time = (((double)(clock_end-clock_start)/(double)frequency)*1000)*(double)XBATCH_SIZE;
fprintf(stderr, "\n[PERFM] Performance : %lf Images/second\n", (double)(1000)/tot_time);
fprintf(stderr, "\n\n");
```
- Unpack the output and write it to an output file (optional).

```cpp
int unpack_out_size = io_layer_info_ptr.outlayer_sizebytes;

//# Create memory for unpack output data
vector<void *> unpack_output;
for(int batch_id = 0; batch_id < numImg_to_process; batch_id++)
{
    void *ptr = malloc(unpack_out_size);
    unpack_output.push_back(ptr);
}

//# Loading required params for unpack function
kernel_type_e out_kerType = io_layer_info_ptr.out_kerType;
int out_layer_size = io_layer_info_ptr.out_size;

//# unpacks the output data
xiUnpackOutput(output, unpack_output, out_kerType, out_layer_size, numImg_to_process);

//# Write the output data to txt file
outputWrite(dirpath, img_path1, unpack_output, numImg_to_process, io_layer_info_ptr, 0);
```
- Release memory.

```cpp
xiRelease(chai_handle);   //# Release before exiting application
```
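`xiRelease()` frees the resources created by `xiInit()`, but the input/output/unpack buffers allocated earlier are owned by the application. A minimal cleanup sketch, mirroring the allocation logic above (hardware buffers were allocated with `sds_alloc_non_cacheable()`, the rest with `malloc()`):

```cpp
//# Free application-owned buffers, matching each allocator with its deallocator
for(size_t i = 0; i < input.size(); i++)
{
    if(io_layer_info_ptr.inlayer_exectype.compare("hardware") == 0)
        sds_free(input[i]);
    else
        free(input[i]);
}
for(size_t i = 0; i < output.size(); i++)
{
    if(io_layer_info_ptr.outlayer_exectype.compare("hardware") == 0)
        sds_free(output[i]);
    else
        free(output[i]);
}
for(size_t i = 0; i < unpack_output.size(); i++)
    free(unpack_output[i]);
```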
Next, write a Makefile to build the application.

📌 NOTE: The paths provided for libs/includes in the Makefile example below might change based on where the Makefile is located. Use relative/absolute paths to libs/includes based on your directory structure. These instructions assume that all the libraries are already built and kept in the `SD_Card` directory.
- Set the ARM compiler and the SDx install path.

```make
ARM_CXX = aarch64-linux-gnu-g++

# Provide Correct SDx Path
SDx_BUILD_PATH = /proj/xbuilds/2017.4_released/installs/lin64/SDx/2017.4
```
- Set the include path.

```make
IDIRS = -I$(SDx_BUILD_PATH)/target/aarch64-linux/include
```
- Set the OpenCV, Protobuf, and CBLAS paths.

```make
PB_ARM_DIR    = ../../SD_Card/protobuf_arm64
OPENCV_DIR    = ../../SD_Card/opencv_arm64
CBLAS_ARM_DIR = ../../SD_Card/cblas_arm64
```
- Set the required libraries.

```make
OPENCV_LIBS = -lopencv_core -llzma -ltiff -lpng16 -lz -ljpeg -lopencv_imgproc -lopencv_imgcodecs -ldl -lrt -lwebp
LDIRS = -L../../SD_Card/lib
LLIBS = -lprotobuf -lpthread -lxstack -lxlnxdnn -lparser_arm
```
- Set the compilation flags.

```make
CFLAGS_ARM = -std=c++11 -D__SDSOC=1 -Wno-write-strings

.PHONY: all
```
- Set the compilation commands using the variables above.

```make
MyNet.elf : ./MyNet_ex.cpp
	$(ARM_CXX) $(CFLAGS_ARM) \
		-L$(PB_ARM_DIR)/lib -I$(PB_ARM_DIR)/include \
		-L$(OPENCV_DIR)/lib -I$(OPENCV_DIR)/include \
		-L$(CBLAS_ARM_DIR)/lib -I$(CBLAS_ARM_DIR)/include \
		$(IDIRS) $(LDIRS) $(LLIBS) $(OPENCV_LIBS) $^ -o $@
```
- Save the Makefile and run make.

```sh
make MyNet.elf
```

This will generate an executable `MyNet.elf` to run the network inference.
Copyright © 2018 Xilinx