A curated list of foundation models for vision and language tasks
Updated Nov 1, 2024
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
Video Search with CLIP
Multimodal Bi-Transformers (MMBT) in Biomedical Text/Image Classification
Phi-3-Vision model test - running locally