diff --git a/docs/backends/tensorrt.md b/docs/backends/tensorrt.md
index 0cd79472c..4c7a08bee 100644
--- a/docs/backends/tensorrt.md
+++ b/docs/backends/tensorrt.md
@@ -8,7 +8,7 @@ Please install TensorRT 8 follow [install-guide](https://docs.nvidia.com/deeplea
 
 #### Build custom ops
 
-Some custom ops are created to support models in OpenMMLab, the custom ops can be build as follow:
+Some custom ops are created to support models in OpenMMLab, and the custom ops can be built as follows:
 
 ```bash
 cd ${MMDEPLOY_DIR}
@@ -18,7 +18,7 @@ cmake -DBUILD_TENSORRT_OPS=ON ..
 make -j$(nproc)
 ```
 
-If you haven't install TensorRT in default path, Please add `-DTENSORRT_DIR` flag in cmake.
+If you haven't installed TensorRT in the default path, please add the `-DTENSORRT_DIR` flag in CMake.
 
 ```bash
 cmake -DBUILD_TENSORRT_OPS=ON -DTENSORRT_DIR=${TENSORRT_DIR} ..
@@ -26,3 +26,88 @@ If you haven't install TensorRT in default path, Please add `-DTENSORRT_DIR` fla
 ```
 
 ### Convert model
+
+Please follow the tutorial in [How to convert model](../tutorials/how_to_convert_model.md). **Note** that the device must be a `cuda` device.
+
+#### Int8 Support
+
+Since TensorRT supports INT8 mode, a custom dataset config can be given to calibrate the model. The following is an example for MMDetection:
+
+```python
+# calibration_dataset.py
+
+# dataset settings, same format as the codebase in OpenMMLab
+dataset_type = 'CalibrationDataset'
+data_root = 'calibration/dataset/root/'  # trailing slash: annotation paths are appended below
+img_norm_cfg = dict(
+    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(
+        type='MultiScaleFlipAug',
+        img_scale=(1333, 800),
+        flip=False,
+        transforms=[
+            dict(type='Resize', keep_ratio=True),
+            dict(type='RandomFlip'),
+            dict(type='Normalize', **img_norm_cfg),
+            dict(type='Pad', size_divisor=32),
+            dict(type='ImageToTensor', keys=['img']),
+            dict(type='Collect', keys=['img']),
+        ])
+]
+data = dict(
+    samples_per_gpu=2,
+    workers_per_gpu=2,
+    train=dict(
+        type=dataset_type,
+        ann_file=data_root + 'train_annotations.json',
+        pipeline=test_pipeline),  # this example defines no separate train pipeline
+    val=dict(
+        type=dataset_type,
+        ann_file=data_root + 'val_annotations.json',
+        pipeline=test_pipeline),
+    test=dict(
+        type=dataset_type,
+        ann_file=data_root + 'test_annotations.json',
+        pipeline=test_pipeline))
+evaluation = dict(interval=1, metric='bbox')
+```
+
+Convert your model with this calibration dataset:
+
+```bash
+python tools/deploy.py \
+    ...
+    --calib-dataset-cfg calibration_dataset.py
+```
+
+If the calibration dataset is not given, the model will be calibrated with the dataset in the model config.
+
+### FAQs
+
+- Error `error: parameter check failed at: engine.cpp::setBindingDimensions::1046, condition: profileMinDims.d[i] <= dimensions.d[i]`
+
+  There is an input shape limit in the deployment config:
+
+  ```python
+  backend_config = dict(
+      # other configs
+      model_inputs=[
+          dict(
+              input_shapes=dict(
+                  input=dict(
+                      min_shape=[1, 3, 320, 320],
+                      opt_shape=[1, 3, 800, 1344],
+                      max_shape=[1, 3, 1344, 1344])))
+      ])
+      # other configs
+  ```
+
+  The shape of the tensor `input` must lie between `input_shapes["input"]["min_shape"]` and `input_shapes["input"]["max_shape"]`.
+
+- Error `error: [TensorRT] INTERNAL ERROR: Assertion failed: cublasStatus == CUBLAS_STATUS_SUCCESS`
+
+  TRT 7.2.1 switches to use cuBLASLt (previously it was cuBLAS). cuBLASLt is the default choice for SM version >= 7.0. However, you may need CUDA-10.2 Patch 1 (Released Aug 26, 2020) to resolve some cuBLASLt issues. Another option is to use the new TacticSource API and disable cuBLASLt tactics if you don't want to upgrade.
+
+  Read [this](https://forums.developer.nvidia.com/t/matrixmultiply-failed-on-tensorrt-7-2-1/158187/4) for details.
diff --git a/docs/ops/tensorrt.md b/docs/ops/tensorrt.md
index ce59c58f2..6a06db314 100644
--- a/docs/ops/tensorrt.md
+++ b/docs/ops/tensorrt.md
@@ -1 +1,318 @@
 ## TensorRT Ops
+
+
+
+- [TensorRT Ops](#tensorrt-ops)
+  - [TRTBatchedNMS](#trtbatchednms)
+    - [Description](#description)
+    - [Parameters](#parameters)
+    - [Inputs](#inputs)
+    - [Outputs](#outputs)
+    - [Type Constraints](#type-constraints)
+  - [grid_sampler](#grid_sampler)
+    - [Description](#description-1)
+    - [Parameters](#parameters-1)
+    - [Inputs](#inputs-1)
+    - [Outputs](#outputs-1)
+    - [Type Constraints](#type-constraints-1)
+  - [MMCVInstanceNormalization](#mmcvinstancenormalization)
+    - [Description](#description-2)
+    - [Parameters](#parameters-2)
+    - [Inputs](#inputs-2)
+    - [Outputs](#outputs-2)
+    - [Type Constraints](#type-constraints-2)
+  - [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d)
+    - [Description](#description-3)
+    - [Parameters](#parameters-3)
+    - [Inputs](#inputs-3)
+    - [Outputs](#outputs-3)
+    - [Type Constraints](#type-constraints-3)
+  - [MMCVMultiLevelRoiAlign](#mmcvmultilevelroialign)
+    - [Description](#description-4)
+    - [Parameters](#parameters-4)
+    - [Inputs](#inputs-4)
+    - [Outputs](#outputs-4)
+    - [Type Constraints](#type-constraints-4)
+  - [MMCVRoIAlign](#mmcvroialign)
+    - [Description](#description-5)
+    - [Parameters](#parameters-5)
+    - [Inputs](#inputs-5)
+    - [Outputs](#outputs-5)
+    - [Type Constraints](#type-constraints-5)
+  - [ScatterND](#scatternd)
+    - [Description](#description-6)
+    - [Parameters](#parameters-6)
+    - [Inputs](#inputs-6)
+    - [Outputs](#outputs-6)
+    - [Type Constraints](#type-constraints-6)
+
+
+
+### TRTBatchedNMS
+
+#### Description
+
+Batched NMS with a fixed number of output bounding boxes.
+
+#### Parameters
+
+| Type | Parameter | Description |
+| ------- | --------------------- | ----------- |
+| `int` | `background_label_id` | The label ID for the background class. If there is no background class, set it to `-1`. |
+| `int` | `num_classes` | The number of classes. |
+| `int` | `topK` | The number of bounding boxes to be fed into the NMS step. |
+| `int` | `keepTopK` | The number of total bounding boxes to be kept per-image after the NMS step. Should be less than or equal to the `topK` value. |
+| `float` | `scoreThreshold` | The scalar threshold for score (low scoring boxes are removed). |
+| `float` | `iouThreshold` | The scalar threshold for IoU (new boxes that have high IoU overlap with previously selected boxes are removed). |
+| `int` | `isNormalized` | Set to `false` if the box coordinates are not normalized, meaning they are not in the range `[0,1]`. Defaults to `true`. |
+| `int` | `clipBoxes` | Forcibly restrict bounding boxes to the normalized range `[0,1]`. Only applicable if `isNormalized` is also `true`. Defaults to `true`. |
+
+#### Inputs
+
+
+
inputs[0]: T
+
boxes; 4-D tensor of shape (N, num_boxes, num_classes, 4), where N is the batch size; `num_boxes` is the number of boxes; `num_classes` is the number of classes, which could be 1 if the boxes are shared between all classes.
+
inputs[1]: T
+
scores; 4-D tensor of shape (N, num_boxes, 1, num_classes).
+
+
+#### Outputs
+
+
outputs[0]: T
+
dets; 3-D tensor of shape (N, valid_num_boxes, 5), where `valid_num_boxes` is the number of boxes after NMS. Each row is `dets[i,j,:] = [x0, y0, x1, y1, score]`.
+
outputs[1]: tensor(int32, Linear)
+
labels; 2-D tensor of shape (N, valid_num_boxes).
+
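To make the input and output layouts above concrete, here is a minimal NumPy sketch of class-wise greedy NMS for a single image whose boxes are shared by all classes. It only illustrates the tensor shapes and is not the plugin's implementation. The helper names `iou` and `nms_single_image` are hypothetical, zero-padding the unused output rows is an assumption, and `topK`, the background label, normalization and clipping are left out.

```python
import numpy as np


def iou(box, others):
    """IoU between one box and an (M, 4) array of boxes in [x0, y0, x1, y1] form."""
    lt = np.maximum(box[:2], others[:, :2])
    rb = np.minimum(box[2:], others[:, 2:])
    inter = np.prod(np.clip(rb - lt, 0, None), axis=1)
    area = np.prod(box[2:] - box[:2])
    areas = np.prod(others[:, 2:] - others[:, :2], axis=1)
    return inter / (area + areas - inter)


def nms_single_image(boxes, scores, score_thr, iou_thr, keep_top_k):
    """boxes: (num_boxes, 4), shared by all classes; scores: (num_boxes, num_classes)."""
    candidates = []  # (score, box index, class index) of boxes that survive NMS
    for cls in range(scores.shape[1]):
        kept = []
        for i in np.argsort(-scores[:, cls]):
            if scores[i, cls] <= score_thr:
                break  # scores are sorted, so the rest are below the threshold too
            if not kept or iou(boxes[i], boxes[kept]).max() <= iou_thr:
                kept.append(i)
        candidates += [(scores[i, cls], i, cls) for i in kept]
    candidates.sort(key=lambda c: c[0], reverse=True)

    # Fixed-size outputs; unused rows stay zero (a padding choice assumed here).
    dets = np.zeros((keep_top_k, 5), dtype=np.float32)
    labels = np.zeros(keep_top_k, dtype=np.int32)
    for row, (score, i, cls) in enumerate(candidates[:keep_top_k]):
        dets[row, :4] = boxes[i]
        dets[row, 4] = score
        labels[row] = cls
    return dets, labels
```

Each row of `dets` follows the `[x0, y0, x1, y1, score]` layout, and `labels` holds the matching class index.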
+
+#### Type Constraints
+
+- T:tensor(float32, Linear)
+
+### grid_sampler
+
+#### Description
+
+Sample from `input` at the pixel locations given by `grid`.
+
+#### Parameters
+
+| Type | Parameter | Description |
+| ----- | -------------------- | ----------- |
+| `int` | `interpolation_mode` | Interpolation mode to calculate output values. (0: `bilinear`, 1: `nearest`) |
+| `int` | `padding_mode` | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`) |
+| `int` | `align_corners` | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |
+
+#### Inputs
+
+
+
inputs[0]: T
+
Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.
+
inputs[1]: T
+
Input grid of sampling locations; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of the grid and of the output.
+
+
+#### Outputs
+
+
+
outputs[0]: T
+
Output feature; 4-D tensor of shape (N, C, outH, outW).
+
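The effect of `align_corners` is easiest to see on a single axis. The sketch below maps a normalized grid coordinate in `[-1, 1]` to a fractional pixel index. It assumes the operator follows the same convention as PyTorch's `grid_sample`, which matches the parameter description above, and the function name `grid_to_pixel` is hypothetical.

```python
def grid_to_pixel(coord, size, align_corners):
    """Map a normalized coordinate in [-1, 1] to a pixel index on an axis of length `size`."""
    if align_corners:
        # -1 and 1 refer to the centers of the first and last pixels.
        return (coord + 1) / 2 * (size - 1)
    # -1 and 1 refer to the outer edges of the first and last pixels.
    return ((coord + 1) * size - 1) / 2


# On an axis of 4 pixels (indices 0..3):
print(grid_to_pixel(-1.0, 4, True), grid_to_pixel(1.0, 4, True))    # 0.0 3.0
print(grid_to_pixel(-1.0, 4, False), grid_to_pixel(1.0, 4, False))  # -0.5 3.5
```

With `align_corners=0` the extrema fall half a pixel outside the pixel centers, which is where `padding_mode` comes into play.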
+
+#### Type Constraints
+
+- T:tensor(float32, Linear)
+
+### MMCVInstanceNormalization
+
+#### Description
+
+Carry out instance normalization as described in the paper https://arxiv.org/abs/1607.08022.
+
+y = scale * (x - mean) / sqrt(variance + epsilon) + B, where mean and variance are computed per instance per channel.
+
+#### Parameters
+
+| Type | Parameter | Description |
+| ------- | --------- | ----------- |
+| `float` | `epsilon` | The epsilon value to use to avoid division by zero. Default is 1e-05. |
+
+#### Inputs
+
+
+
input: T
+
Input data tensor from the previous operator. For the image case, the dimensions are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For the non-image case, the dimensions are in the form of (N x C x D1 x D2 ... Dn), where N is the batch size.
+
scale: T
+
The input 1-dimensional scale tensor of size C.
+
B: T
+
The input 1-dimensional bias tensor of size C.
+
+
+#### Outputs
+
+
+
output: T
+
The output tensor of the same shape as input.
+
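The formula in the description can be written as a short NumPy reference. This is a sketch for checking shapes and values, not the plugin's kernel, and the function name `instance_norm` is hypothetical.

```python
import numpy as np


def instance_norm(x, scale, bias, epsilon=1e-5):
    """x: (N, C, D1, ..., Dn); scale and bias: 1-D tensors of size C."""
    # Reduce over all spatial axes, separately for each (instance, channel) pair.
    axes = tuple(range(2, x.ndim))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    # Reshape the per-channel parameters to (1, C, 1, ..., 1) for broadcasting.
    shape = (1, -1) + (1,) * (x.ndim - 2)
    return scale.reshape(shape) * (x - mean) / np.sqrt(var + epsilon) + bias.reshape(shape)
```

With `scale` set to ones and `bias` set to zeros, every `(instance, channel)` slice of the result has approximately zero mean and unit variance.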
+
+#### Type Constraints
+
+- T:tensor(float32, Linear)
+
+
+### MMCVModulatedDeformConv2d
+
+#### Description
+
+Perform Modulated Deformable Convolution on input feature. Read [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for details.
+
+#### Parameters
+
+| Type | Parameter | Description |
+| -------------- | ------------------ | ----------- |
+| `list of ints` | `stride` | The stride of the convolving kernel. (sH, sW) |
+| `list of ints` | `padding` | Paddings on both sides of the input. (padH, padW) |
+| `list of ints` | `dilation` | The spacing between kernel elements. (dH, dW) |
+| `int` | `deformable_group` | Groups of deformable offset. |
+| `int` | `group` | Split input into groups. `input_channel` should be divisible by the number of groups. |
+
+#### Inputs
+
+
+
inputs[0]: T
+
Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.
+
inputs[1]: T
+
Input offset; 4-D tensor of shape (N, deformable_group * 2 * kH * kW, outH, outW), where kH and kW are the height and width of weight, outH and outW are the height and width of offset and output.
+
inputs[2]: T
+
Input mask; 4-D tensor of shape (N, deformable_group * kH * kW, outH, outW), where kH and kW are the height and width of weight, outH and outW are the height and width of offset and output.
+
inputs[3]: T
+
Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).
+
inputs[4]: T, optional
+
Input bias; 1-D tensor of shape (output_channel).
+
+
+#### Outputs
+
+
+
outputs[0]: T
+
Output feature; 4-D tensor of shape (N, output_channel, outH, outW).
+
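The offset, mask and output shapes listed above all depend on `outH` and `outW`. The sketch below derives them from the parameters, assuming the standard convolution output-size formula applies to this operator (this document does not state it). The function names are hypothetical.

```python
def conv_output_size(in_size, kernel, stride, padding, dilation):
    """Standard convolution output size along one axis (assumed to apply here)."""
    return (in_size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def modulated_deform_conv_shapes(n, in_hw, weight_shape, stride, padding,
                                 dilation, deformable_group):
    """Return the expected offset, mask and output shapes for a given input."""
    out_channel, _, k_h, k_w = weight_shape
    out_h = conv_output_size(in_hw[0], k_h, stride[0], padding[0], dilation[0])
    out_w = conv_output_size(in_hw[1], k_w, stride[1], padding[1], dilation[1])
    return {
        "offset": (n, deformable_group * 2 * k_h * k_w, out_h, out_w),
        "mask": (n, deformable_group * k_h * k_w, out_h, out_w),
        "output": (n, out_channel, out_h, out_w),
    }


# A 3x3 kernel with stride 1 and padding 1 keeps the 32x32 spatial size.
print(modulated_deform_conv_shapes(1, (32, 32), (16, 8, 3, 3), (1, 1), (1, 1), (1, 1), 1))
```

A check like this can catch a mismatched offset or mask tensor before an engine is built.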
+
+#### Type Constraints
+
+- T:tensor(float32, Linear)
+
+### MMCVMultiLevelRoiAlign
+
+#### Description
+
+Perform RoIAlign on features from multiple levels. Used in the bbox_head of most two-stage detectors.
+
+#### Parameters
+
+| Type | Parameter | Description |
+| ---------------- | ------------------ | ----------- |
+| `int` | `output_height` | Height of the output RoI. |
+| `int` | `output_width` | Width of the output RoI. |
+| `list of floats` | `featmap_strides` | Feature map stride of each level. |
+| `int` | `sampling_ratio` | Number of input samples to take for each output sample. `0` means to take samples densely for current models. |
+| `float` | `roi_scale_factor` | RoIs will be scaled by this factor before RoI Align. |
+| `int` | `finest_scale` | Scale threshold of mapping to level 0. Default: 56. |
+| `int` | `aligned` | If `aligned=0`, use the legacy implementation in MMDetection. Otherwise, the results are aligned more precisely. |
+
+#### Inputs
+
+
inputs[0]: T
+
RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...].
+
inputs[1~]: T
+
Input feature map; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of channels, H and W are the height and width of the data.
+
+#### Outputs
+
+
+
outputs[0]: T
+
RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element outputs[0][r-1] is a pooled feature map corresponding to the r-th RoI inputs[0][r-1].
+
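The role of `finest_scale` can be illustrated with the level-assignment rule used by MMDetection's RoI extractor. Whether this plugin uses exactly the same rule is an assumption, and the function name `map_roi_to_level` is hypothetical.

```python
import math


def map_roi_to_level(roi, num_levels, finest_scale=56):
    """roi: [batch_index, x1, y1, x2, y2]; returns the index of the feature level to pool from."""
    _, x1, y1, x2, y2 = roi
    scale = math.sqrt((x2 - x1) * (y2 - y1))
    # Level 0 for scale < 2 * finest_scale, level 1 for scale < 4 * finest_scale, and so on.
    level = math.floor(math.log2(scale / finest_scale + 1e-6))
    return min(max(level, 0), num_levels - 1)


print(map_roi_to_level([0, 0, 0, 50, 50], num_levels=4))      # 0
print(map_roi_to_level([0, 0, 0, 112, 112], num_levels=4))    # 1
print(map_roi_to_level([0, 0, 0, 1000, 1000], num_levels=4))  # 3
```

Under this rule, small RoIs are pooled from the finest feature map and large RoIs from the coarser ones.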
+
+#### Type Constraints
+
+- T:tensor(float32, Linear)
+
+### MMCVRoIAlign
+
+#### Description
+
+Perform RoIAlign on a feature map. Used in the bbox_head of most two-stage detectors.
+
+#### Parameters
+
+| Type | Parameter | Description |
+| ------- | ---------------- | ----------- |
+| `int` | `output_height` | Height of the output RoI. |
+| `int` | `output_width` | Width of the output RoI. |
+| `float` | `spatial_scale` | Used to scale the input boxes. |
+| `int` | `sampling_ratio` | Number of input samples to take for each output sample. `0` means to take samples densely for current models. |
+| `str` | `mode` | Pooling mode in each bin. `avg` or `max`. |
+| `int` | `aligned` | If `aligned=0`, use the legacy implementation in MMDetection. Otherwise, the results are aligned more precisely. |
+
+#### Inputs
+
+
+
inputs[0]: T
+
Input feature map; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of channels, H and W are the height and width of the data.
+
inputs[1]: T
+
RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoI coordinates are expressed in the coordinate system of inputs[0].
+
+
+#### Outputs
+
+
+
outputs[0]: T
+
RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element outputs[0][r-1] is a pooled feature map corresponding to the r-th RoI inputs[1][r-1].
+
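To illustrate how `spatial_scale` and `aligned` interact, the sketch below converts one RoI into feature-map coordinates. It assumes the half-pixel shift commonly used by aligned RoIAlign implementations and a minimum RoI size of 1 in the legacy mode. Neither detail is confirmed by this document, and the function name `roi_to_feature_coords` is hypothetical.

```python
def roi_to_feature_coords(roi, spatial_scale, aligned):
    """roi: [batch_index, x1, y1, x2, y2]; returns (start_w, start_h, width, height)."""
    _, x1, y1, x2, y2 = roi
    # Assumed convention: the aligned mode shifts coordinates by half a pixel.
    offset = 0.5 if aligned else 0.0
    start_w = x1 * spatial_scale - offset
    start_h = y1 * spatial_scale - offset
    width = (x2 - x1) * spatial_scale
    height = (y2 - y1) * spatial_scale
    if not aligned:
        # Assumed legacy behaviour: force the RoI to cover at least one pixel.
        width = max(width, 1.0)
        height = max(height, 1.0)
    return start_w, start_h, width, height


# A feature map with stride 16 corresponds to spatial_scale = 1 / 16.
print(roi_to_feature_coords([0, 16, 16, 48, 80], 1 / 16, aligned=True))   # (0.5, 0.5, 2.0, 4.0)
print(roi_to_feature_coords([0, 16, 16, 48, 80], 1 / 16, aligned=False))  # (1.0, 1.0, 2.0, 4.0)
```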
+
+#### Type Constraints
+
+- T:tensor(float32, Linear)
+
+### ScatterND
+
+#### Description
+
+ScatterND takes three inputs: a `data` tensor of rank r >= 1, an `indices` tensor of rank q >= 1, and an `updates` tensor of rank q + r - indices.shape[-1] - 1. The output of the operation is produced by creating a copy of the input `data`, and then updating its values to those specified by `updates` at the index positions specified by `indices`. Its output shape is the same as the shape of `data`. Note that `indices` should not have duplicate entries. That is, two or more updates for the same index location are not supported.
+
+The `output` is calculated via the following equation:
+
+```python
+output = np.copy(data)
+update_indices = indices.shape[:-1]
+for idx in np.ndindex(update_indices):
+    # Convert the index vector to a tuple so that it addresses a single location.
+    output[tuple(indices[idx])] = updates[idx]
+```
+
+#### Parameters
+
+None
+
+#### Inputs
+
+
+
inputs[0]: T
+
Tensor of rank r>=1.
+ +
inputs[1]: tensor(int32, Linear)
+
Tensor of rank q>=1.
+ +
inputs[2]: T
+
Tensor of rank q + r - indices.shape[-1] - 1.
+
+
+#### Outputs
+
+
+
outputs[0]: T
+
Tensor of rank r >= 1.
+
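As a worked example of the update rule in the description, the following NumPy reference can be run directly. It is a sketch for checking expected values, not the plugin's implementation, and the function name `scatter_nd` is hypothetical.

```python
import numpy as np


def scatter_nd(data, indices, updates):
    """Copy `data`, then write `updates` at the locations listed in `indices`."""
    output = np.copy(data)
    for idx in np.ndindex(indices.shape[:-1]):
        # Each row of `indices` addresses one element or slice of `output`.
        output[tuple(indices[idx])] = updates[idx]
    return output


data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
indices = np.array([[4], [3], [1], [7]])  # rank q = 2, indices.shape[-1] = 1
updates = np.array([9, 10, 11, 12])       # rank q + r - indices.shape[-1] - 1 = 1
print(scatter_nd(data, indices, updates).tolist())  # [1, 11, 3, 10, 9, 6, 7, 12]
```

When `indices.shape[-1]` is smaller than the rank of `data`, each update replaces a whole slice instead of a single element.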
+
+#### Type Constraints
+
+- T:tensor(float32, Linear), tensor(int32, Linear)