PyTorch implementation of a collection of scalable Video Transformer Benchmarks.
Updated May 4, 2022 - Python
Developed the ViViT model for medical video classification, enhancing 3D organ image analysis using transformer-based architectures.
The dataset used for the "A non-contact SpO2 estimation using video magnification and infrared data" publication
Video vision transformers for hierarchical anomaly detection in video scenes.
Python script to fine-tune an open-source Video Vision Transformer (ViViT) using the HuggingFace Trainer library
Incomplete work on 2D action recognition on the MM-Fit dataset using ViT, ViViT, and MLP-Mixer
Unofficial TensorFlow implementation of the ViViT model architecture
A comparative study of ViViT, CNN-RNN, and ResNet architectures for video action recognition using the UCF101 dataset