Prerequisites for running inference on a new network with CHaiDNN:

- Network files (.prototxt, .caffemodel & mean file)
- All the layers of the network must be supported by CHaiDNN
- Quantization parameters for all layers
- CHaiDNN repository & pre-built binaries / SD-Card image
CHaiDNN supports the following bit-widths for weights/bias and activations:

Weights/Bias | Activations
---|---
8-bit | 8-bit
6-bit | 6-bit
📌 NOTE: For all the layers of a given network, the bit-width of activations must be the same. The same applies to the bit-widths of weights and biases.
Supported Layers | | |
---|---|---|---
Convolution | BatchNorm | Power | Scale
Deconvolution | ReLU | Pooling (Max, Avg) | InnerProduct
Dropout | Softmax | Crop | Concat
Permute | Normalize (L2 Norm) | Argmax | Flatten
PriorBox | Reshape | NMS | Eltwise
CReLU* | Depthwise Separable Convolution | Software Layer Plugin** | Input/Data
Dilated Convolution | | |
📌 NOTE: The above layers are verified only for the layer configurations/parameters used in the six provided example networks.
Step 1: Update the prototxt with precision parameters

CHaiDNN works in the fixed-point domain for better performance. All feature maps and trained parameters are converted from single precision to fixed point before the computation starts. The steps to obtain the updated deploy.prototxt with precision parameters are described in the Quantization User Guide.
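As a rough illustration of the conversion (a minimal sketch of generic dynamic fixed-point quantization, not CHaiDNN's actual implementation), a single-precision value can be mapped to an 8-bit integer together with a per-layer fractional bit-width `fl`:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

//# Generic dynamic fixed-point sketch: value ≈ q * 2^(-fl), with q an 8-bit integer
//# and fl the per-layer fractional bit-width (function names are illustrative, not CHaiDNN APIs)
std::int8_t quantize_fixed(float value, int fl)
{
    float scaled = std::round(value * std::pow(2.0f, (float)fl));
    scaled = std::max(-128.0f, std::min(127.0f, scaled));   // saturate to the 8-bit range
    return (std::int8_t)scaled;
}

float dequantize_fixed(std::int8_t q, int fl)
{
    return (float)q * std::pow(2.0f, (float)(-fl));
}
```

For example, with `fl = 4` the value 1.3 quantizes to round(1.3 × 16) = 21, which dequantizes to 21 / 16 = 1.3125. The precision parameters added to the prototxt describe the per-layer fixed-point settings CHaiDNN should use.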
Step 2: Build the application with the new network

To implement network inference using the CHaiDNN APIs, follow these steps.
- Create a cpp file, say `MyNet_ex.cpp`.

- Include standard headers and OpenCV header files.

```cpp
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#undef __ARM_NEON__
#undef __ARM_NEON
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#define __ARM_NEON__
#define __ARM_NEON

#include <iostream>

using namespace std;
using namespace cv;
```
- Include the SDSoC header.

```cpp
#include "sds_lib.h"
```
- Include the interface headers, which contain the CHaiDNN API prototypes.

```cpp
#include "../api/xi.hpp"
#include "../api/xi_readwrite_util.hpp"
#include "../checkers/checkers.hpp"
```
- Define start/end time macros for performance measurement.

```cpp
//# Performance check
long long int clock_start, clock_end, frequency;

#define TIME_STAMP_INIT  clock_start = sds_clock_counter();
#define TIME_STAMP       { \
    clock_end = sds_clock_counter(); \
    frequency = sds_clock_frequency(); \
}
```
- Write `main()`. The steps below go inside `main()`.

- Create a structure to hold info about the input/output layers.

```cpp
io_layer_info io_layer_info_ptr;
```
- Define variables to hold the network directory/file paths.

```cpp
char *dirpath    = "/mnt/models/MyNet";   /* Path to the network model directory */
char *prototxt   = "deploy.prototxt";     /* Prototxt file name residing in the network model directory */
char *caffemodel = "MyNet.caffemodel";    /* caffemodel file name residing in the network model directory */
```
- Define a variable to hold the input image path.

```cpp
char *img_path = "/mnt/models/MyNet/input/camel.jpg";
```
- Define the start and end layers of the network.

```cpp
//# start/end layer in the graph
string start_layer = "";
string end_layer = "";
```

`start_layer` represents the name of the first layer of the network. If it is set to an empty string, the name of the first layer in the prototxt is taken by default. `end_layer` represents the name of the last layer of the network. If it is set to an empty string, the name of the last layer in the prototxt is taken by default.
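For example, to run only the portion of the graph between two named layers (the layer names below are hypothetical and must match names in the deploy.prototxt):

```cpp
//# Example: run only the sub-graph between two named layers (illustrative layer names)
start_layer = "conv1";
end_layer   = "fc8";
```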
- Initialize the data using the `xiInit()` API. This API parses the network, initializes the job queue with memory, and stores the network parameters in buffers.

```cpp
void *chai_handle = xiInit(dirpath, prototxt, caffemodel, &io_layer_info_ptr,
                           numImg_to_process, is_first_layer, start_layer, end_layer);
```
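The call above also references `numImg_to_process` and `is_first_layer`, which are not defined elsewhere in this walkthrough. A minimal sketch of plausible definitions is shown below; the exact types and values are assumptions, so check the example applications shipped with CHaiDNN for the definitive usage.

```cpp
//# Assumed definitions (illustrative only)
int numImg_to_process = XBATCH_SIZE;   /* images processed per call; XBATCH_SIZE also appears in the latency calculation below */
int is_first_layer    = 1;             /* assumed flag: non-zero when execution starts from the network's first layer */
```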
- Read and pre-process the input image. This includes resizing the input image and subtracting the mean if the mean is pixel-wise. Two example utility functions are provided to make preprocessing easier.

```cpp
int status = inputNormalization(normalizeInput, resize_h, resize_w, img_path1, img_path2,
                                inp_mode, mean_path, numImg_to_process, io_layer_info_ptr);
```
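`inputNormalization()` likewise takes several arguments not declared earlier (`normalizeInput`, `resize_h`, `resize_w`, `img_path1`, `img_path2`, `inp_mode`, `mean_path`). The sketch below shows one plausible set of definitions; the values are assumptions for illustration, and `normalizeInput` is the destination buffer whose type is given by the prototype in `xi_readwrite_util.hpp`.

```cpp
//# Illustrative definitions for the pre-processing call (values are assumptions)
int resize_h = 224;                              /* network input height, per deploy.prototxt */
int resize_w = 224;                              /* network input width,  per deploy.prototxt */
char *img_path1 = img_path;                      /* first image of the batch */
char *img_path2 = img_path;                      /* second image of the batch (re-used here for illustration) */
int inp_mode = 0;                                /* input mode flag expected by the utility */
char *mean_path = "/mnt/models/MyNet/mean.txt";  /* mean file for the network */
```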
- Create the input buffers.

```cpp
int in_size = io_layer_info_ptr.inlayer_sizebytes;

//# Create input/output Buffers
vector<void *> input;
void *ptr;
for(int i = 0; i < io_layer_info_ptr.num_in_bufs; i++)
{
    if(io_layer_info_ptr.inlayer_exectype.compare("hardware") == 0)
        ptr = sds_alloc_non_cacheable(in_size);
    else
        ptr = malloc(in_size);
    input.push_back(ptr);
}
```
- Create the output buffers.

```cpp
int out_size = io_layer_info_ptr.outlayer_sizebytes;

vector<void *> output;
for(int i = 0; i < io_layer_info_ptr.num_out_bufs; i++)
{
    if(io_layer_info_ptr.outlayer_exectype.compare("hardware") == 0)
        ptr = sds_alloc_non_cacheable(out_size);
    else
        ptr = malloc(out_size);
    output.push_back(ptr);
}
```
- Pack the mean-subtracted input into the input buffers.

```cpp
xiInputRead(normalizeInput, input, numImg_to_process, io_layer_info_ptr);
```
- Call `xiExec()` to run inference.

```cpp
TIME_STAMP_INIT
xiExec(chai_handle, input, output);
TIME_STAMP
```

📌 NOTE: `TIME_STAMP_INIT` and `TIME_STAMP` store the start and end cycle counts, which can be used to check the performance of the network.

- Check the latency.

```cpp
//# Total time for the API in Images/Second
double tot_time = (((double)(clock_end-clock_start)/(double)frequency)*1000)*(double)XBATCH_SIZE;
fprintf(stderr, "\n[PERFM] Performance : %lf Images/second\n", (double)(1000)/tot_time);
fprintf(stderr, "\n\n");
```
- Unpack the output and write it to an output file (optional).

```cpp
int unpack_out_size = io_layer_info_ptr.outlayer_sizebytes;

//# Create memory for unpack output data
vector<void *> unpack_output;
for(int batch_id = 0; batch_id < numImg_to_process; batch_id++)
{
    void *ptr = malloc(unpack_out_size);
    unpack_output.push_back(ptr);
}

//# Loading required params for unpack function
kernel_type_e out_kerType = io_layer_info_ptr.out_kerType;
int out_layer_size = io_layer_info_ptr.out_size;

//# unpacks the output data
xiUnpackOutput(output, unpack_output, out_kerType, out_layer_size, numImg_to_process);

//# Write the output data to txt file
outputWrite(dirpath, img_path1, unpack_output, numImg_to_process, io_layer_info_ptr, 0);
```
- Release memory.

```cpp
xiRelease(chai_handle);   //# Release before exiting application
```
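`xiRelease()` frees the resources created by `xiInit()`, but the input/output/unpack buffers allocated earlier are owned by the application. A minimal cleanup sketch, mirroring the allocation logic above (hardware buffers were allocated with `sds_alloc_non_cacheable()`, the rest with `malloc()`):

```cpp
//# Free application-owned buffers, matching each allocator with its deallocator
for(size_t i = 0; i < input.size(); i++)
{
    if(io_layer_info_ptr.inlayer_exectype.compare("hardware") == 0)
        sds_free(input[i]);
    else
        free(input[i]);
}
for(size_t i = 0; i < output.size(); i++)
{
    if(io_layer_info_ptr.outlayer_exectype.compare("hardware") == 0)
        sds_free(output[i]);
    else
        free(output[i]);
}
for(size_t i = 0; i < unpack_output.size(); i++)
    free(unpack_output[i]);
```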
Next, write a Makefile to build the application.

📌 NOTE: The paths provided for libs/includes in the Makefile example below might change based on where the Makefile is located. Use relative/absolute paths to libs/includes based on your directory structure. These instructions assume that all the libraries are already built and kept in the `SD_Card` directory.
- Set the ARM compiler and the SDx install path.

```make
ARM_CXX = aarch64-linux-gnu-g++

# Provide Correct SDx Path
SDx_BUILD_PATH = /proj/xbuilds/2017.4_released/installs/lin64/SDx/2017.4
```
- Set the include path.

```make
IDIRS = -I$(SDx_BUILD_PATH)/target/aarch64-linux/include
```
- Set the OpenCV, Protobuf, and CBLAS paths.

```make
PB_ARM_DIR    = ../../SD_Card/protobuf_arm64
OPENCV_DIR    = ../../SD_Card/opencv_arm64
CBLAS_ARM_DIR = ../../SD_Card/cblas_arm64
```
- Set the required libraries.

```make
OPENCV_LIBS = -lopencv_core -llzma -ltiff -lpng16 -lz -ljpeg -lopencv_imgproc -lopencv_imgcodecs -ldl -lrt -lwebp
LDIRS = -L../../SD_Card/lib
LLIBS = -lprotobuf -lpthread -lxstack -lxlnxdnn -lparser_arm
```
- Set the compilation flags.

```make
CFLAGS_ARM = -std=c++11 -D__SDSOC=1 -Wno-write-strings

.PHONY: all
```
- Set the compilation commands using the variables above.

```make
MyNet.elf : ./MyNet_ex.cpp
	$(ARM_CXX) $(CFLAGS_ARM) \
		-L$(PB_ARM_DIR)/lib -I$(PB_ARM_DIR)/include \
		-L$(OPENCV_DIR)/lib -I$(OPENCV_DIR)/include \
		-L$(CBLAS_ARM_DIR)/lib -I$(CBLAS_ARM_DIR)/include \
		$(IDIRS) $(LDIRS) $(LLIBS) $(OPENCV_LIBS) $^ -o $@
```
- Save the Makefile and run make.

```sh
make MyNet.elf
```

This will generate an executable `MyNet.elf` to run the network inference.
Copyright © 2018 Xilinx