Visual ChatGPT connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting.
See our paper: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
I implemented a Google Colab version that runs in the standard GPU environment. Because of limited GPU memory, only two models are loaded to process images:
- T2I
- ImageCaption

You can try my Colab notebook here.
Here we list the GPU memory usage of each visual foundation model. You can modify `self.tools` to load fewer visual foundation models and save GPU memory (see the sketch after the table):
| Foundation Model | Memory Usage (MB) |
|---|---|
| ImageEditing | 6667 |
| ImageCaption | 1755 |
| T2I | 6677 |
| canny2image | 5540 |
| line2image | 6679 |
| hed2image | 6679 |
| scribble2image | 6679 |
| pose2image | 6681 |
| BLIPVQA | 2709 |
| seg2image | 5540 |
| depth2image | 6677 |
| normal2image | 3974 |
| InstructPix2Pix | 2795 |
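Below is a minimal sketch of how the tool list could be pruned to fit a standard Colab GPU. The helper name `prune_tools`, the `KEEP` set, and the exact tool names are assumptions for illustration; adapt them to the actual names used in `self.tools` in `visual_chatgpt.py`.

```python
# Hypothetical helper: keep only the low-memory tools (T2I + ImageCaption)
# so the remaining models fit into a standard Colab GPU.
# The tool names below are assumed, not verified against the upstream repo.
KEEP = {
    "Generate Image From User Input Text",  # T2I
    "Get Photo Description",                # ImageCaption
}

def prune_tools(tools):
    """Return only the tools whose .name is in KEEP.

    `tools` is assumed to be the list of LangChain Tool objects built in
    ConversationBot.__init__ (i.e. self.tools).
    """
    return [tool for tool in tools if tool.name in KEEP]

# Usage inside ConversationBot.__init__ (hypothetical):
#   self.tools = prune_tools(self.tools)
```

Dropping a tool from `self.tools` only removes it from the agent; to actually free GPU memory, also skip instantiating the corresponding visual foundation model.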
We appreciate the open-source contributions of the following projects: Hugging Face, LangChain, Stable Diffusion, ControlNet, InstructPix2Pix, CLIPSeg, and BLIP.