Process isAssertion failed in file ../../src/mpid/ch4/shm/posix/eager/include/intel_transport_recv.h at line 1160: cma_read_nbytes == size #107
Comments
Looks like something is wrong with oneCCL. @JunxiChhen Do we still encounter this issue?
Yes, but it now only occurs on SPR-HBM in SNC4 flat mode. I didn't see any issue in SPR Quad mode.
@shanzhou2186 Have you ever encountered this issue? We need an environment to debug it.
Have you checked the memory usage on each sub-NUMA node? Is it possible that one of the sub-NUMA nodes is running out of memory (OOM)?
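A minimal sketch of how per-node memory could be checked on Linux, assuming the kernel exposes per-node counters at `/sys/devices/system/node/node*/meminfo`. The helper names (`parse_node_meminfo`, `free_fraction`, `report`) are made up for illustration and are not part of oneCCL or this project.

```python
import glob
import re

# Lines in /sys/devices/system/node/nodeN/meminfo look like:
#   "Node 0 MemFree:        1234567 kB"
# Fields without a kB unit (e.g. HugePages_Total) are skipped by this pattern.
_LINE = re.compile(r"Node\s+(\d+)\s+(\w+):\s+(\d+)\s*kB")


def parse_node_meminfo(text):
    """Return {field_name: value_in_kB} for one node's meminfo text."""
    fields = {}
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m:
            fields[m.group(2)] = int(m.group(3))
    return fields


def free_fraction(fields):
    """Fraction of the node's memory that is free (0.0 if MemTotal is missing)."""
    total = fields.get("MemTotal", 0)
    return fields.get("MemFree", 0) / total if total else 0.0


def report(pattern="/sys/devices/system/node/node*/meminfo"):
    """Print the free memory of every NUMA node matched by the sysfs pattern."""
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            fields = parse_node_meminfo(f.read())
        print(f"{path}: {free_fraction(fields):.1%} free "
              f"({fields.get('MemFree', 0)} of {fields.get('MemTotal', 0)} kB)")


if __name__ == "__main__":
    report()
```

Running this while the benchmark is active and watching for a node whose free fraction drops toward zero would help confirm or rule out a per-node OOM. `numactl --hardware` reports similar per-node free figures.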
No. I haven't tried such large input/output before.
@JunxiChhen Hi, I also got this error when using Intel oneAPI (version 2023.2.0). Did you ever find out how to avoid this bug?
Issue occurred:

When running llama2-7b, input4096, output2048, BS16, Beam1, on SPR-HBM flat mode SNC4.

Benchmarking CMD: