Fast inference engine for Transformer models
deep-neural-networks deep-learning cpp neon machine-translation openmp parallel-computing cuda inference avx intrinsics avx2 neural-machine-translation opennmt quantization gemm mkl thrust transformer-models onednn
Updated Dec 18, 2024 - C++