After downloading the OpenWebText corpus, I extracted it with `tar xvf openwebtext.tar.gz`. When I then run

`python -m lmtuners.utils.tokenize_and_cache_data data/ data_tokenized_128/ --tokenizer_path bert-base-uncased-vocab.txt --max_length=64`

I get an error for every file. Could you please help me overcome this issue? @shoarora
@makeshn I'm not able to try reproducing this right now, but I believe the OpenWebText corpus was compressed twice. I see you're running the script over `*.xz` files. I think you should be able to decompress each `*.xz` file into a bunch of `.txt` files.
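If that's the case, something like the following should flatten it (a minimal sketch, assuming each `*.xz` file left under `data/` is itself a tar archive of plain `.txt` files; the `data_txt/` output directory name is illustrative):

```bash
# Assumed layout: `tar xvf openwebtext.tar.gz` produced a data/ directory
# full of *.xz archives, each of which is itself a tar archive of .txt files.
mkdir -p data_txt
for f in data/*.xz; do
  # tar auto-detects the xz compression; -C extracts into data_txt/
  tar xf "$f" -C data_txt
done
```

After that, pointing `tokenize_and_cache_data` at `data_txt/` instead of `data/` should let it read plain text files.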