T2ISafety

📢 We are currently organizing the code for T2ISafety. If you are interested in our work, please star ⭐ our project.


🎉 Introduction

Text-to-image (T2I) models have rapidly advanced, enabling the generation of high-quality images from text prompts across various domains. However, these models present notable safety concerns, including the risk of generating harmful, biased, or private content. Research on assessing T2I safety remains in its early stages: while some efforts have evaluated models on specific safety dimensions, many critical risks remain unexplored.

To address this gap, we introduce T2ISafety, a safety benchmark that evaluates T2I models across three key domains: toxicity, fairness, and privacy. We build a detailed hierarchy of 12 tasks and 44 categories based on these three domains, and meticulously collect 70K corresponding prompts. Based on this taxonomy and prompt set, we build a large-scale T2I dataset with 68K manually annotated images and train an evaluator capable of detecting critical risks that previous work failed to identify, including risks that even ultra-large proprietary models such as the GPT series cannot correctly detect. We evaluate 12 prominent diffusion models on T2ISafety and reveal several concerns, including persistent issues with racial fairness, a tendency to generate toxic content, and significant variation in privacy protection across models, even with defense methods such as concept erasing.

🚩 Features

  1. Compact Taxonomy with Hierarchical Levels: the benchmark proposes a structured three-level hierarchy comprising 3 domains, 12 tasks, and 44 categories (see the sketch below).
  2. Advanced Evaluation Framework: a specialized fine-tuned evaluator for images, named ImageGuard.
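
To make the three-level structure concrete, here is a minimal sketch of the domain → task → category hierarchy. The three domain names come from the paper; every task and category name below is an illustrative placeholder, not the benchmark's actual taxonomy.

```python
# Minimal sketch of the three-level hierarchy (domain -> task -> category).
# Domain names come from the paper; all task/category names below are
# illustrative placeholders, NOT the benchmark's actual taxonomy.
TAXONOMY = {
    "toxicity": {
        "illustrative_task_a": ["illustrative_category_1", "illustrative_category_2"],
    },
    "fairness": {
        "illustrative_task_b": ["illustrative_category_3"],
    },
    "privacy": {
        "illustrative_task_c": ["illustrative_category_4"],
    },
}

def iter_categories(taxonomy):
    """Yield (domain, task, category) triples by walking the hierarchy."""
    for domain, tasks in taxonomy.items():
        for task, categories in tasks.items():
            for category in categories:
                yield domain, task, category

# The real benchmark spans 12 tasks and 44 categories across these 3 domains.
for triple in iter_categories(TAXONOMY):
    print(triple)
```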

🚩 Dataset construction

Overview diagram: the creation of the T2ISafety dataset involves three key stages: prompt construction, image generation, and human annotation. The dataset showcases prompt-image pairs across the three main domains of fairness, toxicity, and privacy. T2ISafety is derived from a distinct subset following the prompt construction phase.
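
The three stages can be sketched as a simple pipeline. Every function body below is an illustrative stand-in; none of the names correspond to the repository's actual scripts, prompt sources, or annotation tooling.

```python
# Illustrative sketch of the three-stage construction pipeline described
# above. All functions are stand-ins for the real scripts and tooling.
from typing import Dict, List

def construct_prompts(taxonomy: Dict) -> List[str]:
    # Stage 1: collect prompts for every category (~70K in the paper).
    return ["an illustrative prompt for one category"]

def generate_images(prompts: List[str], model_name: str) -> List[bytes]:
    # Stage 2: run a T2I model on each prompt; placeholder bytes here.
    return [b"<image-bytes>" for _ in prompts]

def annotate_images(images: List[bytes]) -> List[Dict]:
    # Stage 3: human annotators label each image (~68K labeled in the paper).
    return [{"image": image, "label": "safe-or-unsafe"} for image in images]

prompts = construct_prompts({"toxicity": {}, "fairness": {}, "privacy": {}})
images = generate_images(prompts, model_name="some-t2i-model")
dataset = annotate_images(images)
```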

🚩 Model architecture

Overview diagram: network architecture and additional loss of ImageGuard. Visual representations are extracted by a vision encoder, processed through a perceive sampler, and fed into the LLM alongside the tokenized query. Cross-modal attention (CMA) modules in the transformer layers focus on safety-related image regions. A contrastive loss aligns visual features with their captions, enhancing image-text consistency. A gating factor controls how the modalities are merged for robust multimodal understanding.
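
The two components the diagram emphasizes, gated cross-modal attention and the image-text contrastive loss, can be sketched roughly as follows. This is a minimal PyTorch sketch under assumed shapes and module placement, not ImageGuard's actual implementation; the zero-initialized tanh gate and symmetric InfoNCE loss are common choices assumed here for illustration.

```python
# Minimal sketch of a gated cross-modal attention (CMA) block and an
# image-text contrastive loss. Dimensions, gating placement, and loss
# details are assumptions, not ImageGuard's exact implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedCrossModalAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learnable gating factor, initialized to 0 so training starts from
        # the text-only pathway and gradually admits visual information.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, text_tokens, visual_tokens):
        # Text hidden states attend to visual tokens (query=text, key/value=image).
        attended, _ = self.attn(text_tokens, visual_tokens, visual_tokens)
        # tanh-gated residual merge of the two modalities.
        return text_tokens + torch.tanh(self.gate) * attended

def contrastive_loss(img_emb, txt_emb, temperature: float = 0.07):
    # Symmetric InfoNCE: matched image-caption pairs sit on the diagonal.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy shapes: batch of 4, 16 text tokens, 64 visual tokens, width 512.
text = torch.randn(4, 16, 512)
vision = torch.randn(4, 64, 512)
fused = GatedCrossModalAttention(512)(text, vision)      # (4, 16, 512)
loss = contrastive_loss(vision.mean(1), text.mean(1))    # pooled embeddings
```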

Results

🚩 T2I model results

(Figure: evaluation results of 12 prominent diffusion models on T2ISafety.)

📑 Citation

@article{libenchmarking,
  title={Benchmarking Ethics in Text-to-Image Models: A Holistic Dataset and Evaluator for Fairness, Toxicity, and Privacy},
  author={Li, Lijun and Shi, Zhelun and Hu, Xuhao and Dong, Bowen and Qin, Yiran and Liu, Xihui and Sheng, Lu and Shao, Jing}
}

SALAD-Bench Team
