SD3

Stable Diffusion 3 Medium

Access

Note: SD3 is gated: go to https://huggingface.co/stabilityai/stable-diffusion-3-medium and fill in the form to request access

Load Model

Components

The SD3 model consists of:

MMDiT (multi-modal diffusion transformer)
Note: "Medium" refers primarily to the parameter count of the MMDiT component: 2B
StabilityAI may release smaller and/or larger variants; the full SD3 model has 8B parameters
VAE (variational autoencoder)
Multiple text encoders: CLIP-ViT/L, OpenCLIP-ViT/G, T5 Version 1.1
The third text encoder (TE3, T5) is optional and is used primarily for rendering text

Load using Reference Models

Select: Networks -> Models -> Reference -> StabilityAI Stable Diffusion 3 Medium

Tip

To allow the SD.Next server to access the models, get your Huggingface token from your Huggingface profile -> settings -> access tokens and enter it in SD.Next -> settings -> diffusers -> huggingface token

Tip

Alternatively, log in with the Huggingface CLI and use the token stored there:

source venv/bin/activate
venv/bin/huggingface-cli login
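On a headless server the interactive prompt can be avoided. The sketch below assumes the token has been exported as an environment variable named HF_TOKEN and uses the `--token` option of `huggingface-cli login`; the function name `hf_login_from_env` is hypothetical, not part of SD.Next.

```shell
#!/usr/bin/env bash
# Hypothetical helper: non-interactive Huggingface login for headless setups.
# Assumption: the access token is exported as HF_TOKEN before calling this.
hf_login_from_env() {
  if [ -z "${HF_TOKEN:-}" ]; then
    # Fail early instead of falling back to the interactive prompt
    echo "HF_TOKEN is not set; create a token under profile -> settings -> access tokens" >&2
    return 1
  fi
  # Stores the token in the local Huggingface cache so later downloads can reuse it
  venv/bin/huggingface-cli login --token "$HF_TOKEN"
}
```

Usage: `export HF_TOKEN=...` followed by `hf_login_from_env`, run from the SD.Next folder so the relative venv path resolves.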

Load using a Manually Provided Single File

Download SD3 models from Huggingface

Supported:

sd3_medium.safetensors: includes only the MMDiT and VAE weights; SD.Next will automatically load the CLIP models as needed
sd3_medium_incl_clips.safetensors: includes all necessary weights except for the T5 text encoder

Unsupported:

sd3_medium_incl_clips_t5xxlfp8.safetensors: contains all necessary weights plus the FP8 variant of T5; due to the nature of the FP8 quantization packaged in this file it is not supported yet, but support is planned for the near future
T5 can be loaded/unloaded separately
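The supported/unsupported split above can be encoded in a small download helper. This is a sketch, not an SD.Next tool: the function name `sd3_download` is hypothetical, it assumes the single-file checkpoints sit at the root of the stabilityai/stable-diffusion-3-medium repo, that you are already logged in, and that SD.Next reads checkpoints from `models/Stable-diffusion` (adjust the second argument if your models folder differs).

```shell
#!/usr/bin/env bash
# Hypothetical helper: download one SD3 single-file checkpoint, refusing unsupported variants.
# Assumption: default SD.Next checkpoint folder is models/Stable-diffusion.
sd3_download() {
  local file="$1" dest="${2:-models/Stable-diffusion}"
  case "$file" in
    sd3_medium.safetensors|sd3_medium_incl_clips.safetensors)
      ;;  # supported variants: continue to download
    sd3_medium_incl_clips_t5xxlfp8.safetensors)
      echo "$file is not supported yet; load T5 separately instead" >&2
      return 2 ;;
    *)
      echo "unknown SD3 file: $file" >&2
      return 1 ;;
  esac
  venv/bin/huggingface-cli download stabilityai/stable-diffusion-3-medium "$file" --local-dir "$dest"
}
```

Usage: `sd3_download sd3_medium_incl_clips.safetensors`. Distinct return codes (2 for a known but unsupported file, 1 for an unknown name) make the failure reason easy to check in scripts.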

Load Text Encoder

SD.Next allows changing the optional text encoder on the fly

Go to settings -> models -> text encoder and select the desired text encoder
The default is None; supported options are T5 FP8 and T5 FP16 (the latter is not recommended due to its size)
T5 enhances text rendering and some details, but it's otherwise used very lightly and is optional
Loading T5 greatly increases the model's resource usage and automatically enables sequential offloading
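To see why T5 is so expensive, a weights-only estimate (parameters x bytes per parameter) is enough. The 2B MMDiT figure comes from this page; the roughly 4.7B parameter count for the T5 v1.1 XXL encoder is an assumption, and the estimate ignores activations, the VAE and the CLIP encoders, so real usage is higher. The helper `weights_gb` is hypothetical.

```shell
#!/usr/bin/env bash
# Rough weights-only size in decimal GB: parameters (billions) x bytes per parameter.
weights_gb() {
  # $1 = parameters in billions, $2 = bytes per parameter (2 for FP16, 1 for FP8)
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b }'
}

# SD3 Medium MMDiT (2B parameters) in FP16
echo "MMDiT FP16: $(weights_gb 2 2) GB"
# Assumption: the T5 v1.1 XXL encoder has roughly 4.7B parameters
echo "T5 FP16:    $(weights_gb 4.7 2) GB"
echo "T5 FP8:     $(weights_gb 4.7 1) GB"
```

Under these assumptions T5 in FP16 is more than twice the size of the MMDiT weights, and even the FP8 variant is larger than the MMDiT itself.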

Tip

If you switch between text encoders frequently, you can add that setting to quicksettings

Parameters

Mandatory parameters:
Sampler: Default
Note: SD3 uses the custom FlowMatchEulerDiscreteScheduler sampler
You can experiment with other samplers, but results are not guaranteed
StabilityAI recommended parameters:
Resolution: 1024x1024, CFG scale: 7.0, Steps: 28
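When generating through the API instead of the UI, the recommended values map onto a txt2img request. This sketch assumes SD.Next exposes the A1111-compatible `/sdapi/v1/txt2img` endpoint at http://127.0.0.1:7860; the function name `sd3_payload` is hypothetical. `sampler_name` is deliberately left out so the server keeps the model's default sampler.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a txt2img JSON body with the StabilityAI-recommended SD3 settings.
sd3_payload() {
  local prompt
  # Escape backslashes and double quotes so the prompt is valid inside a JSON string
  prompt=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  # No sampler_name: the server's default sampler for SD3 is used
  printf '{"prompt": "%s", "width": 1024, "height": 1024, "cfg_scale": 7.0, "steps": 28}\n' "$prompt"
}
```

Usage: `curl -s -X POST http://127.0.0.1:7860/sdapi/v1/txt2img -H 'Content-Type: application/json' -d "$(sd3_payload 'a cat holding a sign that says hello')"`.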

ToDo

Add prompt attention parser
Add preview
Add inpainting
Fix SD3Transformer2DModel not being compatible with cross-attention

Other

Pipeline documentation
