Cannot find the data for knowledge-enhanced pre-training #29

Open
nuoma opened this issue May 9, 2023 · 2 comments
Comments

nuoma commented May 9, 2023

Hello, I can't find the file: data_path=/wjn/nlp_task_datasets/kg-pre-trained-corpus/total_pretrain_kgicl_gpt. This part is a bit unclear to me; could you point me in the right direction? Thanks!

Contributor

wjn1996 commented May 9, 2023

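Hello, the paper this data belongs to is still under submission, so the data has not been open-sourced yet. The data format is essentially the same as a GPT training corpus.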

Author

nuoma commented May 13, 2023

Do you mean the corpora used in the pre-training stage (wudao, pile): a bunch of txt files, where each line in each file is one sentence?
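
To make the layout in this question concrete, here is a minimal sketch of reading such a corpus. It assumes a flat directory of UTF-8 `.txt` files with one sentence per line; the thread does not confirm this layout, and the function name `iter_corpus_lines` and the directory argument are hypothetical, not taken from the repository.

```python
from pathlib import Path
from typing import Iterator


def iter_corpus_lines(corpus_dir: str) -> Iterator[str]:
    """Yield one text sample per non-empty line from every .txt file.

    Assumption: the corpus is a flat directory of UTF-8 .txt files,
    one sentence per line. This layout is not confirmed in the thread.
    """
    # Sort the files so the iteration order is reproducible across runs.
    for txt_file in sorted(Path(corpus_dir).glob("*.txt")):
        with txt_file.open(encoding="utf-8") as handle:
            for line in handle:
                text = line.strip()
                if text:  # skip blank lines
                    yield text
```

If the real corpus uses a different layout (for example nested directories or JSON lines), only the file discovery and per-line parsing would need to change.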
