A state-of-the-art open visual language model | multimodal pretrained model
🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/
Commanding robots using only language model prompts
Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"
Official repo for the paper "VCR: Visual Caption Restoration"; see arxiv.org/pdf/2406.06462 for details.
Build a simple, basic multimodal large model from scratch 🤖
Implementation of the paper "Learn 'No' to Say 'Yes' Better".
Code for the paper "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"
Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models
Universal Adversarial Perturbations for Vision-Language Pre-trained Models
[NAACL 2024] Official implementation of the paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"
Chain of Images for Intuitively Reasoning
A benchmark for evaluating hallucinations in large visual language models
A from-scratch implementation of PaliGemma, built by following a YouTube tutorial to learn and demonstrate modern development practices.
CLI for converting UForm models to CoreML.