The code for the quantized object-detection model used for hardware acceleration is in the dnn/CoDeNet submodule. We also provide non-quantized models for studying the impact of the hardware-friendly deformable convolution modifications. These models are compatible with the Detectron2 library and are in dnn/CoDeNet_Detectron2.
Command to run Config a:
python ctdet --arch shufflenetv2 --exp_id pascal_shufflenetv2_256_new1_1 --dataset pascal --input_res 256 --resume --flip_test --gpu 0
Command to run Config b:
python ctdet --arch shufflenetv2 --exp_id pascal_shufflenetv2_256_new3_1 --dataset pascal --input_res 256 --resume --flip_test --gpu 0 --maxpool
Command to run Config c:
python ctdet --arch shufflenetv2 --exp_id pascal_shufflenetv2_512_new14_1_test --dataset pascal --input_res 512 --resume --flip_test --gpu 0
Command to run Config d:
python ctdet --arch shufflenetv2 --exp_id pascal_shufflenetv2_512_new17_1 --dataset pascal --input_res 512 --resume --flip_test --gpu 0 --w2
Command to run Config e:
python ctdet --arch shufflenetv2 --exp_id pascal_shufflenetv2_512_new15_1 --dataset pascal --input_res 512 --resume --flip_test --gpu 0 --w2 --maxpool
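The five commands differ only in the experiment ID, the input resolution, and the optional `--w2` and `--maxpool` flags. A minimal sketch that builds them from one table is shown here. The names `CONFIGS` and `build_command` are hypothetical and not part of the repository, and the entry point is copied exactly as it is written in the commands above, so adjust it if your checkout differs.

```python
import shlex
from typing import List

# (exp_id, input_res, extra flags) per config, copied from the commands above.
CONFIGS = {
    "a": ("pascal_shufflenetv2_256_new1_1", 256, []),
    "b": ("pascal_shufflenetv2_256_new3_1", 256, ["--maxpool"]),
    "c": ("pascal_shufflenetv2_512_new14_1_test", 512, []),
    "d": ("pascal_shufflenetv2_512_new17_1", 512, ["--w2"]),
    "e": ("pascal_shufflenetv2_512_new15_1", 512, ["--w2", "--maxpool"]),
}


def build_command(config: str) -> List[str]:
    """Return the argument list for one config (hypothetical helper)."""
    exp_id, input_res, extra = CONFIGS[config]
    return [
        # Entry point kept as written in this README.
        "python", "ctdet",
        "--arch", "shufflenetv2",
        "--exp_id", exp_id,
        "--dataset", "pascal",
        "--input_res", str(input_res),
        "--resume", "--flip_test",
        "--gpu", "0",
        *extra,
    ]


if __name__ == "__main__":
    # Print each command so it can be inspected or pasted into a shell.
    for name in CONFIGS:
        print(f"Config {name}: {shlex.join(build_command(name))}")
```

The argument lists can also be passed to `subprocess.run` from the directory where the commands above are normally run.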
Please follow the instructions at CoDeNet_Detectron2 Installation to set up the environment. We also provide a remote server to evaluate the trained model.
Command to reproduce the VOC result with modified deformable convolution (Table 1, last row):
python tools/ --num-gpus 1 --config-file configs/centernet/voc/V2_1.0x_voc_512_4gpus_1x_deform_conv_square_depthwise.yaml --eval-only MODEL.WEIGHTS output/centernet/voc/V2_1.0x_voc_512_4gpus_1x_deform_conv_square_depthwise/model_final.pth
# result: AP: 41.7 AP50: 64.5 AP75: 43.8
Command to reproduce the COCO result with modified deformable convolution (Table 1, last row):
python tools/ --num-gpus 1 --config-file configs/centernet/coco/V2_1.0x_coco_512_10gpus_1x_deform_conv_square_depthwise.yaml --eval-only MODEL.WEIGHTS output/centernet/coco/V2_1.0x_coco_512_10gpus_1x_deform_conv_square_depthwise/model_final.pth
# result: AP: 21.6 AP50: 37.4 AP75: 21.8 APs: 6.5 APm: 23.7 APl: 34.8
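To check a reproduced run against the numbers reported above, a small comparison helper can be useful. The sketch below is only an illustration: `parse_metrics`, `matches_reported`, and the tolerance of 0.1 are assumptions, not part of the repository, and the measured values are expected to be collected by hand from your own evaluation output.

```python
import re

# Reported results, copied from the "# result:" lines above.
REPORTED = {
    "voc": "AP: 41.7 AP50: 64.5 AP75: 43.8",
    "coco": "AP: 21.6 AP50: 37.4 AP75: 21.8 APs: 6.5 APm: 23.7 APl: 34.8",
}


def parse_metrics(line):
    """Turn 'AP: 41.7 AP50: 64.5 ...' into {'AP': 41.7, 'AP50': 64.5, ...}."""
    return {name: float(value)
            for name, value in re.findall(r"(AP\w*):\s*([0-9.]+)", line)}


def matches_reported(measured, dataset, tol=0.1):
    """True if every reported metric is present and within tol (assumed value)."""
    expected = parse_metrics(REPORTED[dataset])
    return all(
        # A missing metric becomes NaN, which never satisfies the comparison.
        abs(measured.get(name, float("nan")) - value) <= tol
        for name, value in expected.items()
    )


if __name__ == "__main__":
    my_run = {"AP": 41.7, "AP50": 64.5, "AP75": 43.8}  # placeholder values
    print(matches_reported(my_run, "voc"))
```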
We evaluate the latency of our network on the Ultra96 PYNQ platform.
Please refer to the C++ source files and the system files under ./hls. The precompiled FPGA image is under ./bitfile. The project file can be downloaded here. The HLS project can be downloaded here.
The source code for measuring the first-layer latency is under sw/tvm. Please follow the instructions under sw/tvm/ to run it. The source code for calling the accelerator is in codenet.ipynb.
Please connect to the Ultra96 board and browse to the IPython notebook page. Upload sw/codenet.ipynb and the sw/bitfile folder to the remote FPGA, then run the IPython notebook to see the latency results.
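If you want to time an additional call yourself, for example in an extra notebook cell, a generic wall-clock helper along these lines may help. This is a sketch under stated assumptions and not the measurement method used in codenet.ipynb: `measure_latency_ms` and its `warmup` and `repeats` defaults are hypothetical, and `run_once` stands for whatever callable triggers one inference.

```python
import statistics
import time


def measure_latency_ms(run_once, warmup=3, repeats=20):
    """Time a zero-argument callable and return latency statistics in ms."""
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.mean(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Placeholder workload; replace with the call that runs one inference.
    stats = measure_latency_ms(lambda: sum(range(10000)))
    print(stats)
```

Wall-clock timing from Python includes the software overhead of the call itself, so the numbers may differ from those reported by the notebook.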