ARM performance library comparisons #4744

Closed
jianshu93 opened this issue Jun 9, 2024 · 4 comments

Comments

@jianshu93

Dear OpenBLAS team,

I'm curious how OpenBLAS on ARM CPUs compares with Arm's official performance library, Arm Performance Libraries: https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries. That library is only for ARM CPUs, though.

Thanks,

Jianshu

@martin-frbg
Collaborator

I don't have data for that right now; it would be best to do the comparison yourself on the hardware and functions you want to use. I expect performance will be pretty similar for GEMM, but OpenBLAS does not yet have SVE kernels for every function where it would make sense (e.g. the complex dot product has fairly poor performance).
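
For anyone setting up such a comparison, here is a minimal timing sketch in plain Python. It assumes the conventional count of 2·m·n·k floating-point operations for a real GEMM. The helper names (`gemm_flops`, `best_time`, `gflops`) and the pure-Python `naive_gemm` stand-in are hypothetical and not part of OpenBLAS or Arm Performance Libraries; in a real test the stand-in would be replaced by a call into whichever library is being measured.

```python
import time


def gemm_flops(m, n, k):
    """Conventional FLOP count for real GEMM: one multiply and one add per term."""
    return 2 * m * n * k


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def gflops(flops, seconds):
    """Convert a FLOP count and a duration in seconds into GFLOP/s."""
    return flops / seconds / 1e9


def naive_gemm(a, b):
    """Hypothetical stand-in workload: C = A * B on lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


if __name__ == "__main__":
    n = 64
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    seconds = best_time(lambda: naive_gemm(a, b))
    print(f"{n}x{n} GEMM stand-in: {gflops(gemm_flops(n, n, n), seconds):.4f} GFLOP/s")
```

Taking the best of several repeats rather than the mean is one common way to reduce noise from other processes; running the same harness against each library with identical sizes and thread counts keeps the comparison fair.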

@jianshu93
Author

I will probably run some tests on an M1 chip, since both can be easily installed on my Mac. Which functions besides GEMM would you suggest I test?

Thanks,

Jianshu

@martin-frbg
Collaborator

I'd suggest DOT and AXPY, as a number of other BLAS functions can/will be implemented in terms of them. Maybe TRMM for completeness (its performance will not necessarily be close to that of GEMM). Of course the M1 is a bit special, as it does not provide SVE support. On the other hand, if you can include Apple's own Accelerate library in your comparison, you gain access to the (officially undocumented) matrix math coprocessor (AMX).
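
To illustrate why DOT and AXPY are useful proxies, here is a small sketch in plain Python showing a matrix-vector product (GEMV) written once in terms of DOT and once in terms of AXPY. This is only a textbook-style formulation under the assumption of dense, real data; it is not how OpenBLAS's optimized kernels are actually written, and the function names are hypothetical.

```python
def dot(x, y):
    """DOT: sum of elementwise products of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def axpy(alpha, x, y):
    """AXPY: y <- alpha * x + y, updating y in place."""
    for i, xi in enumerate(x):
        y[i] += alpha * xi


def gemv_by_dot(a_rows, x):
    """y = A x with A stored as rows: each output element is one DOT."""
    return [dot(row, x) for row in a_rows]


def gemv_by_axpy(a_cols, x):
    """y = A x with A stored as columns: accumulate one AXPY per column."""
    y = [0.0] * len(a_cols[0])
    for col, xj in zip(a_cols, x):
        axpy(xj, col, y)
    return y


if __name__ == "__main__":
    a_rows = [[1.0, 2.0], [3.0, 4.0]]
    a_cols = [list(c) for c in zip(*a_rows)]  # same matrix, column by column
    x = [1.0, 1.0]
    print(gemv_by_dot(a_rows, x))
    print(gemv_by_axpy(a_cols, x))
```

Both variants compute the same result; which one maps better onto a given storage layout (row-major or column-major) is what decides between them in practice.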

@martin-frbg
Collaborator

To be followed up at some point in the future in OpenMathLib/BLAS-Benchmarks#8.
