VisionUnite

This repository is the official implementation of the paper "VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge" (arXiv). The model is fine-tuned on the MMFundus dataset.

(a) Previous vision models could only classify specific diseases as positive or negative; they could not provide clinical explanations or interact with patients. VisionUnite predicts a wide range of diseases, supports real-time conversations with patients that incorporate their feedback, and includes clear clinical explanations in its output.

(b) The label distribution of the proposed MMFundus dataset, which includes eight main categories excluding the "Others" class.

(c) VisionUnite is built with a transformer-based vision encoder and a specialized vision adapter designed to classify six signs: Vascular, Macular, FBC (Fundus Boundary Color), OCD (Optical Cup Disc), FHE (Fundus Hemorrhages Examination), and Other. It also includes a vision projector that aligns visual embeddings with text tokens.

(d) Illustration of image-text contrastive learning (CLIP Loss).

(e) Illustration of classification supervised learning (CLS Loss).

(f) Illustration of text-generation supervised learning (LLM Loss).
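To make the image-text contrastive objective in panel (d) more concrete, here is a minimal sketch of a generic CLIP-style symmetric contrastive loss in plain Python. It is not taken from this repository: the function names, the temperature value of 0.07, and the toy embeddings are all assumptions for illustration, and the actual training code may differ.

```python
import math


def _normalize(vec):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _log_softmax(row):
    # Numerically stable log-softmax over a list of logits.
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Generic symmetric contrastive loss (hypothetical helper).

    image_embs[k] and text_embs[k] are assumed to be a matching pair;
    every other combination in the batch is treated as a negative.
    """
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)

    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]

    # Image-to-text direction: each row should peak on its diagonal entry.
    loss_i2t = -sum(_log_softmax(logits[k])[k] for k in range(n)) / n

    # Text-to-image direction: each column should peak on its diagonal entry.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(_log_softmax(columns[k])[k] for k in range(n)) / n

    return 0.5 * (loss_i2t + loss_t2i)


# Toy example: correctly paired embeddings vs. deliberately swapped ones.
aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is small when each image embedding is closest to its own text embedding and grows when pairs are mismatched, which is the behaviour a contrastive objective of this kind rewards.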

Requirements

Use Python 3.8 and install the dependencies from requirements.txt:

pip install -r requirements.txt
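Before installing, it can help to confirm that the active interpreter matches the stated Python version. The following is a small sketch, not part of the repository; the helper name python_matches is hypothetical.

```python
import sys

# Version stated in the Requirements section of this README.
REQUIRED = (3, 8)


def python_matches(version_info=sys.version_info, required=REQUIRED):
    # Compare only the major and minor components, e.g. (3, 8).
    return tuple(version_info[:2]) == tuple(required)


if python_matches():
    print("Python version matches the README requirement.")
else:
    found = ".".join(str(part) for part in sys.version_info[:3])
    print("Found Python " + found + ", but this README specifies 3.8.")
```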

Usage

1. Training

To train your own model, run:

bash ./exps/train.sh

2. Evaluation

2.1 Test the Model

Prepare the test data and run the following command:

python demo.py

2.2 Pre-trained models

To obtain the pre-trained models for the MMFundus dataset, send a request to zhanli@uw.edu. We only process requests sent under your real name, and the domain of your email address must match your affiliation. The email should contain the following information:

Name/Homepage/Google Scholar: (Tell us who you are.)
Primary Affiliation: (The name of your institution or university, etc.)
Job Title: (E.g., Professor, Associate Professor, Ph.D., etc.)
Affiliation Email: (The password will be sent to this address. We only reply to addresses ending in "edu".)
How to use: (Academic research only; no commercial use or secondary development.)

Our code is adapted from LLaMA-Adapter and InternVL. We thank the authors for their valuable work.

