-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Tecorigin sdaa accelerator #6903
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Tecorigin sdaa accelerator #6903
Conversation
@siqi654321, is this ready for review? |
@tjruwase Yes, it's ready for review. And Tecorigin SDAA is a AI processor that support AI frameworks like PyTorch, etc. It‘s possible to run Transformers/Accelerate/DeepSpeed on SDAA to train foundation model. Website: http://www.tecorigin.com/ |
@siqi654321, please address the DCO requirement |
614eea9
to
82ca0ea
Compare
Signed-off-by: siqi <siqi@tecorigin.com>
Signed-off-by: siqi <siqi@tecorigin.com>
82ca0ea
to
38e4d50
Compare
@tjruwase The DCO problem has been solved. |
@siqi654321, please consider doing the following as appropriate:
@loadams, did I miss anything? |
@siqi654321, the following can help fix the Formatting issue |
Signed-off-by: siqi <siqi@tecorigin.com>
@tjruwase Thx for help. It seems that the formatting error has been fixed. I would also like to ask whether the additional documentation you mentioned is necessary. I see that some accelerators, such as mlu, do not provide these documents. |
Signed-off-by: siqi <siqi@tecorigin.com>
@siqi654321, the additional documentation is completely optional. This PR will merge once CI completes. |
@tjruwase I found that the xpu-max1100 test has failed, but it doesn't seem to be related to my changes. Could you help take a look at this issue? |
@delock, this CI failure occurs sporadically but seems unrelated to the PRs. Can you help investigate? Thanks! https://github.com/deepspeedai/DeepSpeed/actions/runs/13260306651/job/37017257486?pr=6903 |
I think this covers everything, the README and the accelerator setup guide being the most important. |
@tjruwase we are updating firmware of this server to see whether the random failure goes away, at this time please ignore this error. Thanks! |
@tjruwase The tests "xpu-max1100" and "nv-torch-latest-v100" appear to be executed on the "xpu" and "cuda" accelerators respectively. However, I have confirmed that my modifications did not impact these two accelerators, which is quite puzzling. Is it possible to run the tests on a different CI machine? Or are there any other methods I can assist with to help troubleshoot the issue? |
@siqi654321 - nothing you need to do to troubleshoot the issue, we will check on the tests and ensure they pass or will mark them as not required if needed. I think you could just make the changes to the README as discussed above. |
@siqi654321 - did you want to update the README to indicate the SDAA is a supported accelerator? |
@loadams Is it possible to merge the code first in order to quickly support our ecosystem users? We will consider adding documentation related to this accelerator in subsequent pull requests. |
Description This PR includes Tecorigin SDAA accelerator support. With this PR, DeepSpeed supports SDAA as backend for training tasks. --------- Signed-off-by: siqi <siqi@tecorigin.com> Co-authored-by: siqi <siqi@tecorigin.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Description This PR includes Tecorigin SDAA accelerator support. With this PR, DeepSpeed supports SDAA as backend for training tasks. --------- Signed-off-by: siqi <siqi@tecorigin.com> Co-authored-by: siqi <siqi@tecorigin.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com> Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>
Description This PR includes Tecorigin SDAA accelerator support. With this PR, DeepSpeed supports SDAA as backend for training tasks. --------- Signed-off-by: siqi <siqi@tecorigin.com> Co-authored-by: siqi <siqi@tecorigin.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com> Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>
Description This PR includes Tecorigin SDAA accelerator support. With this PR, DeepSpeed supports SDAA as backend for training tasks. --------- Signed-off-by: siqi <siqi@tecorigin.com> Co-authored-by: siqi <siqi@tecorigin.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com> Signed-off-by: gyou2021 <ganmei.you@intel.com>
Description This PR includes Tecorigin SDAA accelerator support. With this PR, DeepSpeed supports SDAA as backend for training tasks. --------- Signed-off-by: siqi <siqi@tecorigin.com> Co-authored-by: siqi <siqi@tecorigin.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com> Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
Description This PR includes Tecorigin SDAA accelerator support. With this PR, DeepSpeed supports SDAA as backend for training tasks. --------- Signed-off-by: siqi <siqi@tecorigin.com> Co-authored-by: siqi <siqi@tecorigin.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com> Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
Description This PR includes Tecorigin SDAA accelerator support. With this PR, DeepSpeed supports SDAA as backend for training tasks. --------- Signed-off-by: siqi <siqi@tecorigin.com> Co-authored-by: siqi <siqi@tecorigin.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com> Signed-off-by: yisheng <yi.sheng@intel.com>
Description This PR includes Tecorigin SDAA accelerator support. With this PR, DeepSpeed supports SDAA as backend for training tasks. --------- Signed-off-by: siqi <siqi@tecorigin.com> Co-authored-by: siqi <siqi@tecorigin.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Description This PR includes Tecorigin SDAA accelerator support. With this PR, DeepSpeed supports SDAA as backend for training tasks. --------- Signed-off-by: siqi <siqi@tecorigin.com> Co-authored-by: siqi <siqi@tecorigin.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com> Signed-off-by: Logan Adams <loadams@microsoft.com>
Description
This PR includes Tecorigin SDAA accelerator support.
With this PR, DeepSpeed supports SDAA as backend for training tasks.