This repository is the official implementation of the paper "VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge" (arXiv). The model is fine-tuned on the MMFundus dataset.
(a) Previous vision models could only diagnose specific diseases as positive or negative; they could not provide clinical explanations or interact with patients. VisionUnite takes a different approach: it predicts a wide range of diseases, holds real-time conversations with patients that incorporate their feedback, and gives clear clinical explanations in its output, which makes its predictions easier to understand and use.
(b) Label distribution of the proposed MMFundus dataset, which has eight main categories, not counting the "Others" class.
(c) VisionUnite is built from a transformer-based vision encoder and a specialized vision adapter that classifies six signs: Vascular, Macular, FBC (Fundus Boundary Color), OCD (Optical Cup Disc), FHE (Fundus Hemorrhages Examination), and Other. A vision projector aligns the visual embeddings with text tokens.
(d) Illustration of image-text contrastive learning (CLIP Loss).
(e) Illustration of classification supervised learning (CLS Loss).
(f) Illustration of text-generation supervised learning (LLM Loss).
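To make panels (d) and (e) more concrete, here is a minimal sketch of the two objectives in plain Python. It is not the code from this repository. The temperature of 0.07, the symmetric form of the contrastive loss, and the treatment of the six signs as independent binary labels are all assumptions made for illustration, and the function names are hypothetical.

```python
import math

# Sign names come from the figure caption. Treating them as six independent
# binary labels is an assumption of this sketch, not a documented detail.
SIGNS = ["Vascular", "Macular", "FBC", "OCD", "FHE", "Other"]


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits, target):
    """Softmax cross-entropy for one row of logits and an integer target."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric image-text contrastive loss; pair i is (image i, text i).

    The temperature value is an assumed default, not taken from the paper.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # sim[i][j] is the temperature-scaled cosine similarity of image i, text j.
    sim = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
            for j in range(n)] for i in range(n)]
    image_to_text = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    text_to_image = sum(
        cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (image_to_text + text_to_image) / 2


def sign_cls_loss(logits, labels):
    """Mean binary cross-entropy over per-sign logits (assumed multi-label)."""
    total = 0.0
    for z, y in zip(logits, labels):
        # Numerically stable form of BCE with logits.
        total += max(z, 0.0) - z * y + math.log(1.0 + math.exp(-abs(z)))
    return total / len(logits)
```

Calling `clip_loss` on a batch whose image and text embeddings are correctly paired should give a lower value than the same batch with the texts shuffled, which is the behavior the contrastive objective rewards. How these terms are weighted against the text-generation loss in panel (f) is not specified here, so the sketch leaves them separate.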
Requires Python 3.8. Install the dependencies from requirements.txt:
pip install -r requirements.txt
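Since the requirement above pins Python 3.8, a quick interpreter check before installing can save a failed setup. The snippet below is an optional sketch; the helper name `is_supported_python` is hypothetical and not part of this repository.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True when the interpreter's major.minor version is 3.8."""
    return tuple(version_info[:2]) == (3, 8)


if __name__ == "__main__":
    if not is_supported_python():
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Warning: expected Python 3.8, found {found}")
```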
To train your own model, run:
bash ./exps/train.sh
Prepare the test data, then run:
python demo.py
To obtain the pre-trained models for the MMFundus dataset, email zhanli@uw.edu. We only handle requests sent under your real name, and your email domain must match your affiliation. The email should contain the following information:
Name/Homepage/Google Scholar: (Tell us who you are.)
Primary Affiliation: (The name of your institution or university, etc.)
Job Title: (E.g., Professor, Associate Professor, Ph.D., etc.)
Affiliation Email: (The password will be sent to this address. We only reply to addresses ending in "edu".)
How to use: (Only for academic research, not for commercial use or secondary development.)
Our code is adapted from LLaMA-Adapter and InternVL. We thank the authors for their valuable work.