Pretraining dataset #73

Open
mactavish91 opened this issue Dec 28, 2023 · 1 comment

Comments

@mactavish91
Thank you for your excellent work. I'm currently training my own CLIP model and have a question. If I use LAION-2B, COYO-700M, and Datacomp datasets simultaneously for training, will it yield better results? Should I perform data deduplication?

@gabrielilharco
Contributor

Hi @mactavish91, we don't have those exact experiments, but there are some relevant ones in Table 18 of our paper (https://arxiv.org/pdf/2304.14108.pdf).
