Launch the service directly with Docker Compose:
```sh
docker compose up --build
```
Alternatively, build the Docker image from scratch:
```sh
# build from scratch; run from the parent directory of Dockerfile/Dockerfile.sensevoice
docker build . -f Dockerfile/Dockerfile.sensevoice -t soar97/triton-sensevoice:24.05
```
```sh
your_mount_dir=/mnt:/mnt
docker run -it --name "sensevoice-server" --gpus all --net host -v $your_mount_dir --shm-size=2g soar97/triton-sensevoice:24.05
```
Please follow the official FunASR guide to export the SenseVoice ONNX model. You also need to download the tokenizer file (`chn_jpn_yue_eng_ko_spectok.bpe.model`) yourself.
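For reference, a minimal export sketch is shown below. It assumes a recent FunASR release and the ModelScope model id `iic/SenseVoiceSmall` (both assumptions on our side); the exact `export()` signature has varied between FunASR versions, so defer to the official guide if it does not match your installation.

```python
# Minimal export sketch -- assumes a recent FunASR release; the exact
# export() signature varies between versions, so follow the official
# FunASR guide if this does not match your installation.
from funasr import AutoModel

# "iic/SenseVoiceSmall" is the ModelScope model id (an assumption;
# substitute whichever model id or local path you actually use).
model = AutoModel(model="iic/SenseVoiceSmall", device="cpu")

# Typically writes model.onnx into the downloaded model directory.
model.export(type="onnx", quantize=False)
```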
The resulting model repository should have the following directory tree:
```
model_repo_sense_voice_small
|-- encoder
|   |-- 1
|   |   `-- model.onnx -> /your/path/model.onnx
|   `-- config.pbtxt
|-- feature_extractor
|   |-- 1
|   |   `-- model.py
|   |-- am.mvn
|   |-- config.pbtxt
|   `-- config.yaml
|-- scoring
|   |-- 1
|   |   `-- model.py
|   |-- chn_jpn_yue_eng_ko_spectok.bpe.model -> /your/path/chn_jpn_yue_eng_ko_spectok.bpe.model
|   `-- config.pbtxt
`-- sensevoice
    |-- 1
    `-- config.pbtxt

8 directories, 10 files
```
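The `->` entries above are symlinks. A small sketch for creating them, where `/your/path` is the placeholder from the tree for wherever your exported ONNX model and downloaded tokenizer actually live:

```python
# Sketch only: /your/path is a placeholder; point the links at the real
# locations of your exported model.onnx and downloaded tokenizer.
from pathlib import Path

repo = Path("model_repo_sense_voice_small")
(repo / "encoder" / "1" / "model.onnx").symlink_to("/your/path/model.onnx")
(repo / "scoring" / "chn_jpn_yue_eng_ko_spectok.bpe.model").symlink_to(
    "/your/path/chn_jpn_yue_eng_ko_spectok.bpe.model"
)
```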
```sh
# launch the service
tritonserver --model-repository /workspace/model_repo_sense_voice_small \
             --pinned-memory-pool-byte-size=512000000 \
             --cuda-memory-pool-byte-size=0:1024000000
```
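Before benchmarking, you can probe readiness with the Triton Python client. This sketch assumes Triton's default HTTP port 8000 is reachable from where you run it (the benchmark client below talks to port 10086, which depends on how your deployment exposes its ports):

```python
# Readiness probe -- assumes Triton's default HTTP port 8000; adjust the
# URL if your deployment exposes a different port.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
print("server ready:", client.is_server_ready())
print("model ready:", client.is_model_ready("sensevoice"))
```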
```sh
git clone https://github.com/yuekaizhang/Triton-ASR-Client.git
cd Triton-ASR-Client
num_task=32
python3 client.py \
    --server-addr localhost \
    --server-port 10086 \
    --model-name sensevoice \
    --compute-cer \
    --num-tasks $num_task \
    --batch-size 16 \
    --manifest-dir ./datasets/aishell1_test
```
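If you want to send a single request without the benchmark client, a raw `tritonclient` sketch follows. The tensor names `WAV`, `WAV_LENS`, and `TRANSCRIPTS` are assumptions based on similar Triton ASR deployments; check `sensevoice/config.pbtxt` for the actual names and shapes before using it.

```python
# Single-request sketch -- the tensor names WAV / WAV_LENS / TRANSCRIPTS
# are assumptions; verify them against sensevoice/config.pbtxt.
import numpy as np
import soundfile as sf
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:10086")

wav, sr = sf.read("test.wav", dtype="float32")   # expects 16 kHz mono audio
samples = wav.reshape(1, -1)
lengths = np.array([[samples.shape[1]]], dtype=np.int32)

inputs = [
    grpcclient.InferInput("WAV", list(samples.shape), "FP32"),
    grpcclient.InferInput("WAV_LENS", list(lengths.shape), "INT32"),
]
inputs[0].set_data_from_numpy(samples)
inputs[1].set_data_from_numpy(lengths)

result = client.infer(
    "sensevoice", inputs,
    outputs=[grpcclient.InferRequestedOutput("TRANSCRIPTS")],
)
print(result.as_numpy("TRANSCRIPTS")[0].decode("utf-8"))
```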
The benchmark results below were obtained on the Aishell1 test set with a single V100 GPU; the total audio duration is 36108.919 seconds.
| Concurrent tasks | Batch size per task | Processing time (s) | RTF |
|---|---|---|---|
| 32 (ONNX FP32) | 16 | 67.09 | 0.0019 |
| 32 (ONNX FP32) | 1 | 82.04 | 0.0023 |
(Note: for the batch-size-per-task=1 case, tritonserver can still use dynamic batching on the server side to improve throughput.)
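RTF here is wall-clock processing time divided by total audio duration; for example, the first row of the table works out as:

```python
# Reproduces the RTF of the first row of the table above.
total_audio_s = 36108.919  # Aishell1 test set duration (seconds)
processing_s = 67.09       # 32 concurrent tasks, batch size 16
print(f"RTF = {processing_s / total_audio_s:.4f}")  # RTF = 0.0019
```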
This part originates from the NVIDIA CISI project. We also have TTS and NLP solutions deployed on the Triton Inference Server; if you are interested, please contact us.