Source code doesn't provide data preparation for BERT-like models #42

Open
pikaliov opened this issue Sep 18, 2020 · 1 comment

pikaliov commented Sep 18, 2020

The repository doesn't contain the data-preparation code for ALBERT/BERT: masking input tokens and producing the masked labels and masked positions.
Did you follow the Hugging Face or the tensorflow.official algorithm to create the dataset for train/eval?

@jarednielsen
Contributor

I'd recommend following the instructions provided by Nvidia: https://github.com/NVIDIA/DeepLearningExamples/tree/b7903f0f62b1cdc3356d27956b5c8dee3896f68d/TensorFlow/LanguageModeling/BERT#getting-the-data. Those TFRecords are the expected format for the training scripts.
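
For orientation, here is a minimal sketch of the masking step the question asks about: choosing positions, recording the original tokens as labels, and replacing the inputs. It is not this repository's code or the NVIDIA script. It assumes the commonly described BERT rule (about 15% of tokens selected; of those, 80% become `[MASK]`, 10% a random token, 10% left unchanged). The function name `mask_tokens`, its parameters, and the toy vocabulary are hypothetical, and writing the result to TFRecords is left out.

```python
import random


def mask_tokens(tokens, vocab, mask_prob=0.15, max_predictions=20, seed=0):
    """Return (masked_tokens, masked_positions, masked_labels).

    Hypothetical helper illustrating BERT-style masking on a token list.
    """
    rng = random.Random(seed)
    # Special tokens are never selected for masking.
    candidates = [i for i, t in enumerate(tokens) if t not in ("[CLS]", "[SEP]")]
    rng.shuffle(candidates)
    # Mask roughly mask_prob of the sequence, at least one token,
    # and never more than max_predictions.
    num_to_mask = min(max_predictions, max(1, round(len(tokens) * mask_prob)))
    positions = sorted(candidates[:num_to_mask])

    masked = list(tokens)
    labels = []
    for pos in positions:
        labels.append(tokens[pos])  # the original token is the prediction target
        r = rng.random()
        if r < 0.8:
            masked[pos] = "[MASK]"           # 80%: replace with the mask token
        elif r < 0.9:
            masked[pos] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token in the input
    return masked, positions, labels


if __name__ == "__main__":
    toy_vocab = ["the", "cat", "sat", "on", "mat", "dog"]  # toy vocabulary (assumption)
    example = ["[CLS]", "the", "cat", "sat", "on", "the", "mat", "[SEP]"]
    print(mask_tokens(example, toy_vocab))
```

The three return values correspond to the masked input tokens, masked positions, and masked labels mentioned in the question. A real pipeline would additionally convert tokens to ids and pad the positions and labels to a fixed length before serializing.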
