A Survey of Large Vision-Language Models
LLaVA: Visual Instruction Tuning
- Date: 2023.04
- Core contribution: Proposes a simple framework for training VLMs (Vision-Language Models) that reaches strong visual understanding with relatively little data. In addition, 158K high-quality, structured visual instruction-tuning samples are built from existing open-source datasets (see the architecture sketch after this entry).
- Code: Github
- Weight: Huggingface
- Demo: Huggingface
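The LLaVA recipe wires a frozen vision encoder to an LLM through a lightweight projection layer: stage 1 trains only the projector on image-caption pairs, and stage 2 fine-tunes the projector and LLM on the instruction data. Below is a minimal PyTorch sketch of that wiring, assuming a ViT-style encoder that returns patch features and a decoder-only LLM that accepts input embeddings; the class names, dimensions, and dummy modules are illustrative, not the released implementation.

```python
import torch
import torch.nn as nn


class LlavaStyleVLM(nn.Module):
    """Frozen vision encoder -> trainable projector -> LLM (illustrative)."""

    def __init__(self, vision_encoder: nn.Module, llm: nn.Module,
                 vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder             # e.g. a CLIP ViT, kept frozen
        self.projector = nn.Linear(vision_dim, llm_dim)  # maps patch features into LLM space
        self.llm = llm                                   # decoder-only language model
        for p in self.vision_encoder.parameters():
            p.requires_grad = False                      # stage 1 updates only the projector

    def forward(self, pixel_values: torch.Tensor, text_embeds: torch.Tensor):
        with torch.no_grad():
            patch_feats = self.vision_encoder(pixel_values)   # (B, N, vision_dim)
        visual_tokens = self.projector(patch_feats)           # (B, N, llm_dim)
        # Visual tokens are prepended to the text embeddings and decoded jointly.
        return self.llm(torch.cat([visual_tokens, text_embeds], dim=1))


class DummyViT(nn.Module):
    """Stand-in encoder so the sketch runs without downloading real weights."""

    def __init__(self, dim: int = 1024):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=14, stride=14)

    def forward(self, x):
        return self.patchify(x).flatten(2).transpose(1, 2)    # (B, N_patches, dim)


model = LlavaStyleVLM(DummyViT(), llm=nn.Linear(4096, 32000))
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 16, 4096))
print(logits.shape)  # torch.Size([2, 272, 32000]): 256 visual + 16 text positions
```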
LLaVA 1.5: Improved Baselines with Visual Instruction Tuning
- Date: 2023.10
- Core contribution: Scales up the pre-training data (558K) and fine-tuning data (665K), raises the input resolution of the vision encoder, and uses a larger LLM, providing further evidence for which factors matter most for improving VLM capability (summarized in the config sketch after this entry).
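The changes above amount to a few scaling knobs. The hedged sketch below collects them in one place; the field names are invented for illustration, and the encoder and LLM identifiers reflect the commonly reported LLaVA-1.5 configuration (CLIP ViT-L/14 at 336px input, Vicuna-13B) rather than details stated in this summary.

```python
from dataclasses import dataclass


@dataclass
class Llava15ScalingConfig:
    """Illustrative record of the LLaVA-1.5 scaling recipe described above."""
    pretrain_samples: int = 558_000      # image-caption alignment data
    finetune_samples: int = 665_000      # visual instruction-tuning mixture
    vision_encoder: str = "CLIP ViT-L/14, 336px input"  # higher resolution
    llm: str = "Vicuna-13B"              # larger language backbone


print(Llava15ScalingConfig())
```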
LLaVA 1.6 (LLaVA-NeXT): Improved reasoning, OCR, and world knowledge
- Date: 2024.01
- Core contribution: Curates higher-quality data subsets, raising the fine-tuning data to 760K; increases effective input resolution through dynamic image tiling, which splits a high-resolution image into several encoder-sized crops plus a global view (sketched below); and adopts larger LLMs for further gains.
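The tiling step can be pictured as follows. This is a hedged sketch assuming a fixed tile size and a pre-chosen grid; the released model's grid-selection heuristic, tile size, and padding behavior may differ. Each returned view is encoded separately, and the resulting visual tokens are concatenated before being handed to the LLM.

```python
from PIL import Image


def dynamic_tiles(image: Image.Image, tile: int = 336, grid: tuple = (2, 2)) -> list:
    """Split an image into grid tiles plus a low-res global view (illustrative)."""
    cols, rows = grid
    resized = image.resize((cols * tile, rows * tile))
    crops = [
        resized.crop((c * tile, r * tile, (c + 1) * tile, (r + 1) * tile))
        for r in range(rows)
        for c in range(cols)
    ]
    # The downscaled global view keeps overall layout; the crops preserve detail.
    return [image.resize((tile, tile))] + crops


views = dynamic_tiles(Image.new("RGB", (1024, 768)), tile=336, grid=(3, 2))
print(len(views))  # 1 global view + 6 tiles = 7
```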
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
- Date: 2023.11
- Core contribution: Demonstrates the impact of high-quality image captions during VLM pre-training. 100K high-quality captions are first generated with GPT-4V; the ShareCaptioner model is then trained on this data to produce captions of similar quality, and is finally used to scale the released dataset to 1.2 million entries (see the caption-scaling sketch at the end of this entry).
- Code: Github
- Weight: [ShareGPT4V-7B] [ShareCaptioner]
- Demo: Huggingface
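The last step of the pipeline, running the trained captioner over a large image pool, is sketched below. `caption_fn` is a placeholder for whatever captioning model is plugged in, and the JSONL record layout is an assumption for illustration, not the schema of the released 1.2M-entry dataset.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def build_caption_dataset(image_paths: Iterable[str],
                          caption_fn: Callable[[str], str],
                          out_file: str = "captions.jsonl") -> int:
    """Run a captioner over an image pool and store image/caption records."""
    count = 0
    with open(out_file, "w", encoding="utf-8") as f:
        for path in image_paths:
            record = {"image": str(Path(path)), "caption": caption_fn(path)}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


# Dummy captioner so the sketch runs end to end; swap in a real model call.
written = build_caption_dataset(["imgs/cat.png"], lambda p: f"placeholder caption for {p}")
print(written, "captions written")
```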