Do larger models need more blocks? #18

Open
PoseidomWong opened this issue Mar 13, 2024 · 1 comment

Comments

@PoseidomWong

If we want to apply llama-pro to larger models such as 34B or 72B, does the number of added blocks need to be scaled up proportionally? Have any experiments been done on this?

@hills-code
Collaborator

hills-code commented Mar 13, 2024

We are also exploring larger models, but such experiments require substantial resources. So far we have explored expansion on different architectures, such as Mistral, with promising results (see Mistral-Pro), and we will keep pursuing this idea. We also noticed that Yi recently applied depth expansion for math and code training in Yi-9B, adding 16 layers. I believe the placement of the copied layers and the number of copied blocks are still well worth studying, and we will investigate them step by step.
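For readers wondering what "copying blocks" refers to here, below is a minimal sketch of LLaMA-Pro-style block expansion, assuming a Hugging Face `LlamaForCausalLM` checkpoint. The helper name `expand_blocks` and the `num_groups` parameter are illustrative, not part of this repository; the general idea is to interleave copied decoder layers and zero-initialize their output projections so the expanded model initially behaves like the original.

```python
# Sketch only: interleave copied decoder blocks after each group of original
# layers and zero their output projections, so each new block is a no-op at
# initialization and the expanded model reproduces the base model's outputs.
import copy
import torch
from transformers import LlamaForCausalLM

def expand_blocks(model: LlamaForCausalLM, num_groups: int) -> LlamaForCausalLM:
    layers = model.model.layers
    group_size = len(layers) // num_groups
    new_layers = torch.nn.ModuleList()
    for i, layer in enumerate(layers):
        new_layers.append(layer)
        if (i + 1) % group_size == 0:
            new_block = copy.deepcopy(layer)
            # Zero-init the attention and MLP output projections of the copy.
            new_block.self_attn.o_proj.weight.data.zero_()
            new_block.mlp.down_proj.weight.data.zero_()
            new_layers.append(new_block)
    model.model.layers = new_layers
    model.config.num_hidden_layers = len(new_layers)
    return model

model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = expand_blocks(model, num_groups=8)  # e.g. 32 -> 40 layers
```

During continued pretraining, only the newly added blocks would be trained while the original layers stay frozen; how many blocks to add and where to place them for 34B/72B-scale models is exactly the open question raised in this issue.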
