Validation ‐ Explained by its author
During training, the model learns to reconstruct an image from noise. We want the model to generalize and learn the concept rather than memorize specific images pixel by pixel. The loss measures how badly the model misestimates the noise that has to be removed to reconstruct the image. The training loss reflects only this and nothing more: an overfitted model will show a minimal loss, but such a model is no longer useful to us.
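For context, this is roughly the quantity the training loss reports, written as a minimal PyTorch/diffusers-style sketch. The names (`training_loss`, `noise_scheduler`, `text_embeddings`) are illustrative assumptions, not OneTrainer's actual code:

```python
import torch
import torch.nn.functional as F

def training_loss(model, latents, timesteps, text_embeddings, noise_scheduler):
    """Per-batch noise-prediction loss; names are illustrative, not OneTrainer's API."""
    noise = torch.randn_like(latents)                                    # noise to be added, then predicted
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
    predicted_noise = model(noisy_latents, timesteps, text_embeddings)
    # The loss only measures the noise-prediction error; by itself it cannot
    # distinguish a model that generalizes from one that memorizes the pixels.
    return F.mse_loss(predicted_noise, noise)
```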
To catch the moment when the model stops generalizing and starts memorizing specific images, we use validation, which repeats the same steps as training (prediction and loss calculation) with a few key differences:
- It does not update the model weights, so it does not affect the training process or modify the model.
- It uses the validation concept.
- The batch size is fixed at 1 (why? I don't remember 😃). Since there is no aspect ratio bucketing, you can use any number of images.
- For each validation run, a prediction is made for every image in the validation concept, and the loss is averaged within the concept (see the sketch below).
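Put together, a validation run amounts to something like the following sketch. The helper names (`validate_concept`, `compute_loss`) are hypothetical, not the actual OneTrainer implementation:

```python
import torch

@torch.no_grad()  # gradients are never computed, so the weights are never updated
def validate_concept(model, validation_samples, compute_loss):
    """Average the per-image loss over one validation concept (batch size 1)."""
    model.eval()
    losses = []
    for sample in validation_samples:              # one image (and its caption) at a time
        losses.append(compute_loss(model, sample).item())
    model.train()
    return sum(losses) / len(losses)               # one averaged value per concept, per validation run
```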
So, the prediction and loss calculation are performed on images that are not present in the training dataset but share the same meaning or concept. If you use captions for your training images, use captions for your validation images too.
Charts commonly found on the internet show that the best checkpoint is the one where the validation loss bottoms out and starts increasing instead of decreasing. This marks the moment when the model begins to overfit.
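In code terms, picking that checkpoint is just finding the minimum of the validation-loss curve. A toy sketch (the loss values below are made up):

```python
def best_checkpoint(validation_losses):
    """Index of the validation run with the lowest averaged validation loss.

    The curve typically falls, bottoms out, then rises again once the model
    starts to overfit; the checkpoint saved around the minimum is the keeper.
    """
    return min(range(len(validation_losses)), key=lambda i: validation_losses[i])

# Example: the loss falls, then turns upward -> run 3 marks the best checkpoint.
print(best_checkpoint([0.21, 0.17, 0.15, 0.14, 0.16, 0.19]))  # -> 3
```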
The size of the validation dataset is not strictly defined; typically, 5-10% of the training dataset is used. If the training dataset is small, both the training loss and validation loss can become noisy, making it difficult to visually identify the moment of overfitting. Even smoothing in TensorBoard does not always help in such cases.
Some even say that the loss does not provide any useful information at all, which I disagree with. A noisy loss should be averaged over epochs, or you can use a bigger batch size or more gradient accumulation steps. An example is attached.
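A minimal way to do that averaging outside of TensorBoard, as a sketch (the helper and the numbers below are made up for illustration):

```python
def average_per_epoch(step_losses, steps_per_epoch):
    """Collapse noisy per-step losses into one averaged value per epoch."""
    epochs = [step_losses[i:i + steps_per_epoch]
              for i in range(0, len(step_losses), steps_per_epoch)]
    return [sum(chunk) / len(chunk) for chunk in epochs]

# Example: two epochs of four noisy steps each become two readable averages.
print(average_per_epoch([0.30, 0.10, 0.25, 0.15, 0.28, 0.08, 0.22, 0.14], steps_per_epoch=4))
# -> approximately [0.20, 0.18]
```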
Originally, the validation feature worked like this: you tick the checkbox in the general window and set the interval at which validation runs. Then you specify a validation concept in the concepts window, tick the enable checkbox to include the concept, and tick the validation checkbox to separate it from the training concepts. In TensorBoard, a separate loss graph should appear for each concept.
If you see no additional graphs, something went wrong. If the training loss graph and validation loss graph look identical, something went wrong.
example of loss averaging