Validation ‐ Explained by its author
During training, the model learns to reconstruct an image from noise. We want the model to generalize and learn the concept rather than memorize specific images pixel by pixel. The loss measures how badly the model misestimates the noise that has to be removed to reconstruct the image. The training loss reflects only this and nothing more: an overfitted model will show a minimal loss, but such a model is no longer useful to us.
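For context, this is roughly the quantity the training loss reports, written as a minimal PyTorch/diffusers-style sketch. The names (`training_loss`, `noise_scheduler`, `text_embeddings`) are illustrative assumptions, not OneTrainer's actual code:

```python
import torch
import torch.nn.functional as F

def training_loss(model, latents, timesteps, text_embeddings, noise_scheduler):
    """Per-batch noise-prediction loss; names are illustrative, not OneTrainer's API."""
    noise = torch.randn_like(latents)                                    # noise to be added, then predicted
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
    predicted_noise = model(noisy_latents, timesteps, text_embeddings)
    # The loss only measures the noise-prediction error; by itself it cannot
    # distinguish a model that generalizes from one that memorizes the pixels.
    return F.mse_loss(predicted_noise, noise)
```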
To catch the moment when the model stops generalizing and starts memorizing specific images, we use validation, which repeats the same steps as training (prediction and loss calculation) with a few key differences:
- It does not update the model weights, so it does not affect the training process or modify the model.
- It uses the validation concept.
- The batch size is fixed at 1 (why? I don't remember 😃). Since there is no aspect ratio bucketing, you can use any number of images.
- For each validation run, a prediction is made for every image in the validation concept, and the loss is averaged within the concept (see the sketch below).
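Put together, a validation run amounts to something like the following sketch. The helper names (`validate_concept`, `compute_loss`) are hypothetical, not the actual OneTrainer implementation:

```python
import torch

@torch.no_grad()  # gradients are never computed, so the weights are never updated
def validate_concept(model, validation_samples, compute_loss):
    """Average the per-image loss over one validation concept (batch size 1)."""
    model.eval()
    losses = []
    for sample in validation_samples:              # one image (and its caption) at a time
        losses.append(compute_loss(model, sample).item())
    model.train()
    return sum(losses) / len(losses)               # one averaged value per concept, per validation run
```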
So, the prediction and loss calculation are performed on images that are not present in the training dataset but share the same meaning or concept. If you use captions for your training images, use captions for your validation images too.
Charts commonly found on the internet show that the best checkpoint is the one where the validation loss bottoms out and starts increasing instead of decreasing. This marks the moment when the model begins to overfit.
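In code terms, picking that checkpoint is just finding the minimum of the validation-loss curve. A toy sketch (the loss values below are made up):

```python
def best_checkpoint(validation_losses):
    """Index of the validation run with the lowest averaged validation loss.

    The curve typically falls, bottoms out, then rises again once the model
    starts to overfit; the checkpoint saved around the minimum is the keeper.
    """
    return min(range(len(validation_losses)), key=lambda i: validation_losses[i])

# Example: the loss falls, then turns upward -> run 3 marks the best checkpoint.
print(best_checkpoint([0.21, 0.17, 0.15, 0.14, 0.16, 0.19]))  # -> 3
```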
The size of the validation dataset is not strictly defined; typically, 5-10% of the training dataset is used. If the training dataset is small, both the training loss and validation loss can become noisy, making it difficult to visually identify the moment of overfitting. Even smoothing in TensorBoard does not always help in such cases.
Some even say that the loss does not provide any useful information at all, which I disagree with. A noisy loss should be averaged over epochs, or you can use a bigger batch size or more gradient accumulation steps. An example is attached.
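A minimal way to do that averaging outside of TensorBoard, as a sketch (the helper and the numbers below are made up for illustration):

```python
def average_per_epoch(step_losses, steps_per_epoch):
    """Collapse noisy per-step losses into one averaged value per epoch."""
    epochs = [step_losses[i:i + steps_per_epoch]
              for i in range(0, len(step_losses), steps_per_epoch)]
    return [sum(chunk) / len(chunk) for chunk in epochs]

# Example: two epochs of four noisy steps each become two readable averages.
print(average_per_epoch([0.30, 0.10, 0.25, 0.15, 0.28, 0.08, 0.22, 0.14], steps_per_epoch=4))
# -> approximately [0.20, 0.18]
```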
Originally, the validation feature worked like this: you tick the checkbox in the general window and set the interval at which validation runs. Then you specify a validation concept in the concepts window, tick the enable checkbox to include the concept, and tick the validation checkbox to separate it from the training concepts. In TensorBoard, a separate loss graph should appear for each concept.
If you see no additional graphs, something went wrong. If the training loss graph and validation loss graph look identical, something went wrong.
example of loss averaging