Was the WizardLM data collected from GPT-4? #60

Open
AceCHQ opened this issue Aug 29, 2023 · 3 comments
Comments

@AceCHQ

AceCHQ commented Aug 29, 2023

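Hello, and thanks for your work. How good is the translation quality of the WizardLM evolved instructions, and has the data been filtered? Also, were the responses collected from GPT-4 or from GPT-3.5? Thanks in advance for your reply~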

@LC1332
Owner

LC1332 commented Aug 29, 2023

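About 10k of the WizardLM samples were translated with the old, unimproved prompt; the remaining 50k-plus are good. I plan to use embeddings later to filter out the low-quality ones. The responses were collected from GPT-3.5; GPT-4 is a bit expensive~~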

@AceCHQ
Author

AceCHQ commented Sep 3, 2023

Thanks for the reply. How would the embedding-based filtering work? Is there a suitable model for it?

@LC1332
Owner

LC1332 commented Sep 3, 2023

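Good question. We recently trained https://huggingface.co/silk-road/luotuo-bert-en. I still have one experiment left to run: pairing it with luotuo-bert to go through these translated datasets and correct the mistranslations where instruction injection occurred. If you're interested, leave your WeChat ID on my Zhihu profile, https://www.zhihu.com/people/cheng-li-47, and I'll find the relevant teammates to push this forward QAQ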
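
For readers wondering what embedding-based filtering could look like in practice, here is a minimal sketch. It is not the maintainers' actual pipeline, which the thread does not describe. It assumes each English instruction and its Chinese translation have already been embedded into a shared vector space (for example by a pair of aligned encoders), and it drops pairs whose cosine similarity falls below a threshold. The function names, the toy vectors, and the 0.8 threshold are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pairs(
    pairs: List[Tuple[str, str]],
    src_vecs: List[Sequence[float]],
    tgt_vecs: List[Sequence[float]],
    threshold: float = 0.8,  # hypothetical cutoff; would need tuning on real data
) -> List[Tuple[str, str]]:
    """Keep (source, translation) pairs whose embeddings are similar enough."""
    kept = []
    for pair, src, tgt in zip(pairs, src_vecs, tgt_vecs):
        if cosine_similarity(src, tgt) >= threshold:
            kept.append(pair)
    return kept


# Toy data: the second "translation" answers the instruction instead of
# translating it, which is the instruction-injection failure mentioned above.
pairs = [
    ("Translate: hello", "翻译:你好"),
    ("Summarize the text", "好的,以下是摘要"),
]
# Hand-written stand-ins for real embeddings (assumption: a shared space).
src_vecs = [[1.0, 0.0, 0.2], [0.1, 1.0, 0.0]]
tgt_vecs = [[0.9, 0.1, 0.2], [1.0, 0.0, 0.3]]

kept = filter_pairs(pairs, src_vecs, tgt_vecs)
```

With real data, the vectors would come from the embedding models rather than being written by hand, and the threshold would be chosen by inspecting a sample of low-scoring pairs.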
