Convert Yi34B model fails #120
Yi requires support from a newer version of transformers; you can bypass this issue by upgrading to version 4.34.0. Although Yi's structure is consistent with Llama, xFT has not been adapted to it yet, and it is uncertain whether it is seamlessly compatible. Contributions of adaptation code for Yi are welcome. thx~
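For reference, a rough sketch of that conversion flow, assuming the LlamaConvert helper exposed by newer xfastertransformer releases (if your version only ships tools/llama_convert.py, invoke that script instead) and placeholder paths:

```python
# Sketch only: check the transformers version, then convert the HF checkpoint.
# Paths are placeholders; LlamaConvert is assumed to be available in the
# installed xfastertransformer release.
from packaging import version

import transformers
import xfastertransformer as xft

assert version.parse(transformers.__version__) >= version.parse("4.34.0"), (
    "Yi needs transformers >= 4.34.0; upgrade with: pip install -U 'transformers>=4.34.0'"
)

# Convert the Hugging Face Yi-34B checkpoint into xFT's weight layout.
xft.LlamaConvert().convert("/path/to/Yi-34B-hf", "/path/to/Yi-34B-xft")
```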
After upgrading transformers to 4.34, model conversion works now. However, it core dumps when running the sample code: it prints the prompt again and then reports an illegal instruction.
What's your CPU platform? Or could you provide your CPU info?
This is a 2nd gen Intel Xeon processor (CLX), so it only includes AVX512 instructions; please refer to: https://github.com/intel/xFasterTransformer/wiki/How-to-build#build-on-icx--clx
Hi ~
However, it silently quits when I call xfastertransformer.AutoModel.from_pretrained.
After switching to fp16 it seems to be working, but at a very slow speed: it took almost one minute to generate the first token, and then there was no response afterwards... So is there any benchmark data for different sizes of LLMs?
Did you run it with the BF16 dtype the first time? Our BF16 solution needs AMX hardware support (4th gen Xeon).
When run with bf16, it stops after printing the warning message below:
Yes, since BF16 requires AMX hardware support and you disabled the BF16 option during compilation, this is as expected. BF16 is currently not supported on the CLX platform.
What's your command to run FP16?
model = xfastertransformer.AutoModel.from_pretrained(MODEL_PATH, dtype="fp16")
So you just ran this with plain python? You can follow this to open the Python CLI:
If you want to run on 2 nodes, you need to save the Python code into script.py and then launch it across the nodes; a rough sketch of such a script follows below.
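As an illustration only (not the exact commands from this thread), a minimal single-process script.py could look roughly like this, assuming the AutoModel/generate API shown in xfastertransformer's examples and a standard transformers tokenizer; paths and generation parameters are placeholders:

```python
# script.py -- minimal single-process sketch. Model/tokenizer paths are
# placeholders, and the exact generate() arguments should be checked against
# the installed xfastertransformer version.
import xfastertransformer
from transformers import AutoTokenizer

MODEL_PATH = "/path/to/Yi-34B-xft"   # converted xFT weights
TOKEN_PATH = "/path/to/Yi-34B-hf"    # original HF directory with the tokenizer

tokenizer = AutoTokenizer.from_pretrained(TOKEN_PATH, trust_remote_code=True)
model = xfastertransformer.AutoModel.from_pretrained(MODEL_PATH, dtype="fp16")

input_ids = tokenizer("What is the meaning of life?", return_tensors="pt").input_ids
generated_ids = model.generate(input_ids, max_length=128)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```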
With this: numactl -C 0-23 python, the generation speed is much faster now. For fp16, is AMX being used?
AMX on SPR only supports BF16 and INT8 for now, but fp16 will still have much better performance on SPR since SPR supports the avx512_fp16 instructions. SPR performance is much better than CLX even without AMX, since it has more cores and higher memory bandwidth.
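As a small aside, a hedged helper like the one below can suggest a dtype from the CPU flags, based only on the facts above (bf16 needs AMX, fp16 runs elsewhere and is faster where avx512_fp16 is present); the flag names come from /proc/cpuinfo on Linux, and the fp16 fallback is an assumption, not a recommendation from the maintainers:

```python
# Illustrative only: pick a dtype from the CPU flags reported by Linux.
# bf16 needs AMX (amx_bf16); fp16 works on older Xeons such as CLX and is
# notably faster where avx512_fp16 is available (SPR).
def detect_dtype(cpuinfo_path: str = "/proc/cpuinfo") -> str:
    with open(cpuinfo_path) as f:
        flags = f.read()
    if "amx_bf16" in flags:
        return "bf16"   # 4th gen Xeon (SPR) with AMX
    return "fp16"       # assumed fallback for CPUs without AMX

print("suggested dtype:", detect_dtype())
```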
@leiwen83 Feel free to join our WeChat group for additional details and prompt assistance. https://github.com/intel/xFasterTransformer/wiki
Close as fixed.
Hi,
I tried to do the Yi34B model conversion with tools/llama_convert.py, but met an error...