A proof-of-concept mock server for the NVIDIA Triton Inference Server. It operates in two modes:
- Replay mode: the mock server replays the requests it has seen before.
- Record mode: the mock server records the requests it sees and saves them to disk.
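As a rough illustration of the two modes, here is a minimal sketch. It is not the code in this repository: `Mode`, `MockStore`, and `handle` are hypothetical names, recordings are kept in memory rather than on disk, and it assumes that a recording pairs each request with the response the real server returned.

```rust
use std::collections::HashMap;

/// Hypothetical mode switch; the real server selects the mode with `--record`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Record,
    Replay,
}

/// Hypothetical in-memory store keyed by raw request bytes.
/// Assumption: the real mock server persists recordings to disk instead.
struct MockStore {
    mode: Mode,
    recordings: HashMap<Vec<u8>, Vec<u8>>,
}

impl MockStore {
    fn new(mode: Mode) -> Self {
        MockStore {
            mode,
            recordings: HashMap::new(),
        }
    }

    /// In record mode, forward the request to `upstream` and remember the reply.
    /// In replay mode, return the remembered reply, or `None` if the request
    /// was never recorded. `upstream` is not called in replay mode.
    fn handle<F>(&mut self, request: &[u8], upstream: F) -> Option<Vec<u8>>
    where
        F: FnOnce(&[u8]) -> Vec<u8>,
    {
        match self.mode {
            Mode::Record => {
                let response = upstream(request);
                self.recordings.insert(request.to_vec(), response.clone());
                Some(response)
            }
            Mode::Replay => self.recordings.get(request).cloned(),
        }
    }
}
```

In this sketch, a request seen while recording can later be answered without any upstream server, which is the property that makes replay useful for tests.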
The mock server uses the gRPC definitions from the Triton Inference Server.
To run in record mode:
RUST_LOG=debug cargo run --release -- --remote-host 0.0.0.0 --record
This requires a real Triton Inference Server running on ports 8302-8307. Right now the mapping of model names to ports is hard-coded in src/main.rs.
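For orientation, a hard-coded mapping of this kind might look roughly like the sketch below. The model names (`model_a` through `model_f`) and the `port_for_model` helper are placeholders invented for illustration; the actual names and code are in src/main.rs.

```rust
/// Hypothetical model-to-port table covering ports 8302-8307.
/// The model names are placeholders, not the ones used in src/main.rs.
const MODEL_PORTS: [(&str, u16); 6] = [
    ("model_a", 8302),
    ("model_b", 8303),
    ("model_c", 8304),
    ("model_d", 8305),
    ("model_e", 8306),
    ("model_f", 8307),
];

/// Returns the upstream Triton port for a model name, or `None` if the
/// model is not in the table.
fn port_for_model(model_name: &str) -> Option<u16> {
    MODEL_PORTS
        .iter()
        .find(|(name, _)| *name == model_name)
        .map(|(_, port)| *port)
}
```

With a table like this, adding a model means editing the source and rebuilding, which is the limitation the note above refers to.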
These recordings can be replayed using:
RUST_LOG=debug cargo run --release -- --remote-host 0.0.0.0
See PUBLISHING.md