
AVX512 detection failed when cpu supports AVX512 #291

Open
swhzzh opened this issue Jul 11, 2024 · 3 comments

Comments

@swhzzh

swhzzh commented Jul 11, 2024

OS: debian 9
GCC: 6.3
NASM: 2.12.01
CPU: Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz
ISA-L: 2.31

I have confirmed that my CPU supports AVX512 through https://ark.intel.com/content/www/us/en/ark/products/215269/intel-xeon-silver-4314-processor-24m-cache-2-40-ghz.html. However, the AVX512 detection failed when building ISA-L. I ran the detection command myself and got the following output:
$ echo vinserti32x8 zmm0, ymm1, 1\; > tst.asm && nasm -f elf64 tst.asm && echo pass
tst.asm:1: error: invalid combination of opcode and operands.

I want to know why the detection failed and whether I need to change anything in the OS kernel to enable AVX512. Thanks for answering!

@pablodelara
Contributor

Hi @swhzzh. Your NASM version is way too old. You should install at least NASM 2.13.03.
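
The failing check above only asks NASM to assemble one AVX512 instruction, so it tests the assembler, not the CPU or the kernel. A minimal sketch of a version check follows, assuming GNU `sort -V` is available; the helper name `version_ge` is hypothetical and is not part of ISA-L or NASM.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is >= version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum NASM version quoted in this thread.
required="2.13.03"

# `nasm -v` prints a line such as "NASM version <x.y.z> compiled on ...";
# the third field is the version number.
if command -v nasm >/dev/null 2>&1; then
    installed="$(nasm -v | awk '{print $3}')"
    if version_ge "$installed" "$required"; then
        echo "nasm $installed: new enough"
    else
        echo "nasm $installed: older than $required"
    fi
else
    echo "nasm not found in PATH"
fi
```

With the versions mentioned in this thread, 2.12.01 would be reported as too old and 2.16 as new enough.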

@swhzzh
Author

swhzzh commented Jul 11, 2024

> Hi @swhzzh. Your NASM version is way too old. You should install NASM 2.13.03 at least.

I installed NASM 2.16 and the check then passed, thanks!
Also, I would like to know which xor_gen implementation is called at runtime. Is there any way to find that out?
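
One indirect way to approach this question is to look at which SIMD levels the CPU reports, since a runtime dispatcher can only pick an implementation the CPU supports. The sketch below is Linux-only and reads `/proc/cpuinfo`; the helper name `simd_level` is hypothetical, and the real dispatcher may test additional feature bits, so treat the result as a hint rather than proof. A sampling profiler such as `perf` would show the symbol that actually runs.

```shell
#!/bin/sh
# Hypothetical helper: prints the widest SIMD level found in a
# space-separated CPU flag list ($1), checking the widest level first.
# The flag names are the ones Linux reports in /proc/cpuinfo on x86.
simd_level() {
    for flag in avx512f avx2 avx sse2; do
        case " $1 " in
            *" $flag "*) echo "$flag"; return 0 ;;
        esac
    done
    echo "none"
}

# Linux only: take the flag list of the first logical CPU.
# Non-x86 kernels have no "flags" line, so tolerate an empty result.
if [ -r /proc/cpuinfo ]; then
    flags="$(grep -m 1 '^flags' /proc/cpuinfo | cut -d: -f2 || true)"
    echo "widest SIMD level reported by the CPU: $(simd_level "$flags")"
fi
```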

@swhzzh
Author

swhzzh commented Jul 12, 2024

@pablodelara Hi, I benchmarked xor_gen 10+1 performance at different sizes. I observed the following:

When the CPU L2 cache (1280K) can hold all data units (the test length is less than 128K per unit), xor_gen is much faster with AVX512 enabled than without it. For larger sizes, however, xor_gen is slower with AVX512 enabled than without it.

Can you tell me why? Thanks!
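
The 128K threshold in the comment above follows from simple arithmetic: 10 source units of 128K each add up to the 1280K L2 size. The sketch below only reproduces that arithmetic for a few unit lengths; the helper name `footprint_kib` and the chosen lengths are illustrative assumptions, and it does not explain the slowdown itself.

```shell
#!/bin/sh
# Hypothetical helper: total size in KiB of $1 buffers of $2 KiB each.
footprint_kib() {
    echo $(( $1 * $2 ))
}

l2_kib=1280   # L2 size quoted in the comment above
sources=10    # the "10" in the 10+1 layout

for len_kib in 32 64 128 256 1024; do
    data=$(footprint_kib "$sources" "$len_kib")
    total=$(footprint_kib $(( sources + 1 )) "$len_kib")
    # Equality counts as fitting here; in practice other data
    # also occupies the cache.
    if [ "$data" -le "$l2_kib" ]; then
        verdict="source data fits in L2"
    else
        verdict="source data exceeds L2"
    fi
    echo "len=${len_kib}K data=${data}K data+parity=${total}K: $verdict"
done
```

Note that once the parity buffer is counted as well, the working set at 128K per unit is 11 units rather than 10, which is already larger than the quoted L2 size.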
