This repo contains the source code of our proposed Multi-MELO, a unified multimodal model editing method that supports editing of different network architectures on both image-to-text and text-to-image tasks.
- 2023/12/14: Experiments on editing LDM for personalization during text-to-image generation. 🎨
- 2023/10/29: Experiments on editing BLIP-2 OPT for VQA and Image Captioning. 🎊
- 2023/09/02: Pulling the vector database out of the layer forward pass. ⭐
Model editing aims to correct hallucinations in a pre-trained model or to incorporate new knowledge into it. Most previous work edits models in the textual modality only, and editing for multimodal models is not well studied. Recent research investigates how to adapt language model editors to multimodal scenarios. However, these methods are limited to image-to-text tasks and to similar model architectures. The text-to-image editing task has not been explored, and it is challenging because the network architectures involved are complex and highly diverse. We propose a unified multimodal model editing framework based on dynamic LoRA (Multi-MELO), which enables effective editing of various multimodal models by dynamically activating the LoRA blocks that encode the related knowledge. We explore the framework for editing diverse multimodal models (i.e., BLIP-2 and a latent diffusion model) on three downstream tasks: image captioning, visual question answering, and text-to-image generation.
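The idea of dynamically activating a LoRA block can be illustrated with a small, dependency-free sketch. It is not the implementation in this repo: the class `DynamicLoRARouter`, the fixed `radius`, the nearest-key lookup, and the toy matrices are all assumptions made for illustration. A stored edit key that lies close enough to the query selects a LoRA block, and the layer output becomes `W x + B (A x)`; otherwise the base output `W x` is returned unchanged.

```python
import math


class DynamicLoRARouter:
    """Toy key store (hypothetical): maps a query embedding to a LoRA block id or None."""

    def __init__(self, radius):
        self.radius = radius   # assumed neighbourhood radius around each stored key
        self.keys = []         # embeddings of stored edits
        self.block_ids = []    # LoRA block index associated with each key

    def add_edit(self, key, block_id):
        self.keys.append(list(key))
        self.block_ids.append(block_id)

    def route(self, query):
        # Nearest stored key by Euclidean distance; no block if it is too far away.
        best_id, best_dist = None, float("inf")
        for key, block_id in zip(self.keys, self.block_ids):
            dist = math.dist(key, query)
            if dist < best_dist:
                best_id, best_dist = block_id, dist
        return best_id if best_dist <= self.radius else None


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(weight, lora_blocks, block_id, x):
    """Return W x + B (A x) if a block is active, else the base output W x."""
    y = matvec(weight, x)
    if block_id is None:
        return y
    a, b = lora_blocks[block_id]
    delta = matvec(b, matvec(a, x))
    return [yi + di for yi, di in zip(y, delta)]


# Toy setup: 2x2 identity base weight and one rank-1 LoRA block (A is 1x2, B is 2x1).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_blocks = {0: ([[1.0, 1.0]], [[0.5], [0.0]])}

router = DynamicLoRARouter(radius=0.5)
router.add_edit([1.0, 0.0], block_id=0)

x_near = [0.9, 0.1]   # close to the stored key, so block 0 is activated
x_far = [0.0, 1.0]    # outside the radius, so the base model is used

y_near = lora_forward(weight, lora_blocks, router.route(x_near), x_near)
y_far = lora_forward(weight, lora_blocks, router.route(x_far), x_far)
```

In this sketch the routing decision is made outside `lora_forward`, so the layer itself only receives a block id. Inputs unrelated to any stored edit fall through to the unmodified base weights.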
Main results of experiments based on BLIP-2 and Multi-MELO.
are recommended. -
Required CUDA environment and library dependencies are listed in:
Then you should install our modified PEFT:
cd peft_egg
pip install -e .
Detailed implementation of MELO is in
Datasets for both editing and evaluation are listed as follows:
Please refer to EasyEdit for the editing and evaluation dataset for VQA.
Please refer to EasyEdit for the editing and evaluation dataset for Caption.
Please refer to DreamBooth for the editing and evaluation dataset for ImGen.
Configurations of our project are governed by Hydra configs.
Following the Hydra format, configurations for editing BLIP-2 and LDM are provided in ./BLIP2_melo/config and ./DreamBooth_melo/config
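As a rough illustration of what Hydra-style `+group=option` overrides express, the toy parser below (plain Python, not Hydra itself) splits such tokens into a group-to-option mapping and lists the YAML files they would conventionally select. The helper names `parse_overrides` and `config_files`, and the `config/<group>/<option>.yaml` layout, are assumptions based on Hydra's usual config-group convention; the actual file layout in this repo is not confirmed here.

```python
def parse_overrides(argv):
    """Split Hydra-style `+group=option` tokens into a {group: option} dict."""
    selected = {}
    for token in argv:
        if not token.startswith("+") or "=" not in token:
            raise ValueError(f"unsupported override: {token!r}")
        group, option = token[1:].split("=", 1)
        selected[group] = option
    return selected


def config_files(selected, config_dir="config"):
    """List the YAML files these selections would point to (assumed layout)."""
    return [f"{config_dir}/{group}/{option}.yaml" for group, option in selected.items()]


# Overrides taken from the VQA command in this README.
overrides = ["+alg=lora_blip", "+experiment=vqa", "+model=blip2"]
selected = parse_overrides(overrides)
files = config_files(selected)
```

Each override names one config group and one option inside it, so switching the task or the editor means changing only the corresponding token on the command line.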
cd BLIP_melo
python +alg=lora_blip +experiment=vqa +model=blip2
cd DreamBooth_melo
python +alg=lora_blip +experiment=caption +model=blip2
cd DreamBooth_melo
python +algs=ft_diff +experiment=diffusion +model=dreambooth
python +algs=lora_diff +experiment=diffusion +model=dreambooth
python +algs=melo_diff +experiment=diffusion +model=dreambooth
We would like to thank the following individuals and organizations for their contributions to this project:
Huggingface: for their support of the PEFT community and their development of the PEFT framework (
MELO: for the development of the open-source library MELO which inspired our work (
EasyEdit: for the development of the open-source Dataset (
DreamBooth: for the development of the open-source Dataset (