mmap issue in bf16 of gpt-fast #165

Open
yanbing-j opened this issue Apr 28, 2024 · 1 comment

Comments

@yanbing-j

gpt-fast uses `torch.load` with `mmap=True` to load model checkpoints, which can speed up model load time. However, mmap ends up unused in bf16: at https://github.com/pytorch-labs/gpt-fast/blob/main/generate.py#L247, the model is converted from float16 to bfloat16 when running the bf16 model. `to` mallocs a new memory area, so the mapped file is no longer used.
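A minimal sketch of the behavior (the checkpoint path and tensor key are hypothetical):

```python
import torch

# mmap=True maps the checkpoint file into memory instead of reading it eagerly.
state_dict = torch.load("checkpoint.pth", mmap=True, weights_only=True)

w = state_dict["some.weight"]    # float16, still backed by the mapped file
w_bf16 = w.to(torch.bfloat16)    # dtype change: allocates fresh memory and copies

# The converted tensor no longer shares storage with the mapped file,
# so the mmap'd pages stop backing the model weights.
print(w.data_ptr() == w_bf16.data_ptr())  # False
```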

Meanwhile, for int8/int4, the logic at https://github.com/pytorch-labs/gpt-fast/blob/main/generate.py#L247 does not make sense either: an int8 model should not be converted to the bfloat16 data type. int8/int4 currently work only by chance, because `weight` is not registered as a parameter of the int8/int4 modules.
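As a minimal sketch (a hypothetical helper, not gpt-fast's actual code), a dtype guard could cast only floating-point weights and leave quantized integer weights untouched:

```python
import torch

def cast_float_params(model: torch.nn.Module,
                      dtype: torch.dtype = torch.bfloat16) -> torch.nn.Module:
    # Cast only floating-point parameters; integer (int8/int4 quantized)
    # weights keep their dtype instead of being blindly converted.
    for param in model.parameters():
        if param.is_floating_point():
            param.data = param.data.to(dtype)
    return model
```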

@yanboliang
Contributor

@yanbing-j The goal of gpt-fast is to demonstrate the set of optimizations we applied to accelerate inference; model loading is not the major bottleneck, so we didn't spend much effort optimizing it. But I do agree with your points, and we also welcome PRs for these optimizations.
