# Experience in Pre-training

Since I finished this project with quite limited computational resources, I would like to share some of my experience. If you are also in a small group and plan to pre-train backbone models for fun, I hope this helps.

## Workflow

1. Design a model and its pre-training strategies.
2. Verify that the code is correct by overfitting a very small split of the aggregated data (typically 5,000 images); see the first sketch after this list.
3. Pre-train on all aggregated pre-training data for around 3 to 4 epochs. (At the very least, make sure every image is seen!)
4. Evaluate the pre-training on a small split of the fine-tuning tasks. I used 5,000 images by setting the --fast option (see the second sketch after this list). Note that the number of fine-tuning epochs should be increased from 4 to around 20-50.
5. If the accuracy (i.e., the results) on the fine-tuning tasks keeps growing, the pre-training is effective!
6. Once the 3-4-epoch pre-training runs finish, compare the results on the full fine-tuning data and select the best pre-training strategy.
7. Train on the full aggregated data and have a good one-week sleep ;).
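
A minimal sketch of the step-2 sanity check, assuming a PyTorch setup; the model, dataset, and loss function are passed in from your own code, and the hyperparameters here are illustrative, not the repo's actual values.

```python
import torch
from torch.utils.data import DataLoader, Subset

def overfit_sanity_check(model, full_dataset, loss_fn, num_images=5000, steps=2000, device="cuda"):
    """Overfit a tiny fixed split; if the loss does not drive toward zero, the code is likely buggy."""
    small_split = Subset(full_dataset, range(num_images))  # e.g. the first 5,000 images
    loader = DataLoader(small_split, batch_size=64, shuffle=True, num_workers=4)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    model.to(device).train()
    step = 0
    while step < steps:
        for images, targets in loader:
            images, targets = images.to(device), targets.to(device)
            loss = loss_fn(model(images), targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if step % 100 == 0:
                print(f"step {step}: loss {loss.item():.4f}")  # should keep dropping on this tiny split
            if step >= steps:
                break
```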
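
For step 4, a minimal sketch of how a --fast switch could be wired; the flag name comes from the text above, but this exact wiring is an assumption rather than the repo's actual code.

```python
import argparse
from torch.utils.data import Dataset, Subset

def parse_finetune_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--fast", action="store_true",
                        help="fine-tune on a 5,000-image split instead of the full data")
    parser.add_argument("--epochs", type=int, default=4)
    return parser.parse_args()

def maybe_shrink(dataset: Dataset, args: argparse.Namespace) -> Dataset:
    """With --fast, keep only 5,000 images and raise the epoch budget to ~20, as in step 4."""
    if args.fast:
        args.epochs = max(args.epochs, 20)
        return Subset(dataset, range(min(5000, len(dataset))))
    return dataset
```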

## Tips

- Do not validate pre-training strategies (pre-training tasks, pre-training models) on a small split of the data. The behavior of pre-training on a small split is significantly different from pre-training on the full dataset.
- Do not over-tune the pre-training hyperparameters. Keep in mind that a good idea will overshadow all of these cherry-picked hyperparameters. Besides, you will not have enough GPUs to do that anyway.
- Add one component at a time, and have a plan for each.
- Pipeline everything, so the GPUs never sit idle between runs (see the sketch after this list).
- You can rest, but the GPUs never should; GPUs sometimes break, but you never give up.
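
One way to "pipeline everything" is a simple queue that launches runs back-to-back so the GPUs stay busy overnight. The script names and flags below are hypothetical examples, not the repo's actual entry points.

```python
import subprocess

# Hypothetical experiment queue: each entry is a full command line for one run.
experiments = [
    ["python", "pretrain.py", "--strategy", "mask_a"],
    ["python", "pretrain.py", "--strategy", "mask_b"],
    ["python", "finetune.py", "--fast", "--epochs", "20"],
]

for cmd in experiments:
    print("launching:", " ".join(cmd))
    result = subprocess.run(cmd)
    if result.returncode != 0:
        # a failed run (or a broken GPU) should not silently stop the whole queue
        print("run failed, continuing with the next experiment")
```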