Problem with GEMM benchmark results for NVIDIA Volta #104

Open
wdj opened this issue Jul 3, 2018 · 1 comment

Comments

@wdj

wdj commented Jul 3, 2018

Your NVIDIA gemm benchmark appears to have a problem. gemm_bench.cu uses uint16_t, an integer type, instead of __half to represent half-precision floating-point numbers. As a result, rand() in tensor.h fills the matrices A and B with random floating-point values between 0 and 1 that are then converted to integers, so most of the entries are zeros rather than random floating-point numbers. This produces unrepresentative benchmark timings on Volta GPUs that have power/frequency throttling enabled: computing on zeros draws much less power than computing on random numbers. I've confirmed this with nvidia-smi using your benchmark. For your gemm benchmark, I've measured reported performance up to ~15% higher when computing on zeros, an unrepresentative use case, than when computing on realistic, nonzero inputs.
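
The truncation described above can be reproduced with a small host-side C++ sketch. This is not the code from tensor.h: `fill_uniform01` and `count_zeros` are hypothetical helpers written only for illustration, and the sketch assumes the fill draws values in [0, 1) and stores them with an ordinary conversion to the element type. Under that assumption, a uint16_t buffer ends up entirely zero, while a floating-point buffer keeps the random values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the fill step (not the actual tensor.h code):
// draw values in [0, 1) and store them in a buffer of element type T.
template <typename T>
std::vector<T> fill_uniform01(std::size_t n, unsigned seed) {
    assert(n > 0);
    std::srand(seed);
    std::vector<T> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Dividing by RAND_MAX + 1 keeps the value strictly below 1.
        double x = std::rand() / (static_cast<double>(RAND_MAX) + 1.0);
        // For an integer T this conversion truncates toward zero.
        out[i] = static_cast<T>(x);
    }
    return out;
}

// Count how many entries of the buffer are exactly zero.
template <typename T>
std::size_t count_zeros(const std::vector<T>& v) {
    std::size_t zeros = 0;
    for (const T& x : v) {
        if (x == T(0)) ++zeros;
    }
    return zeros;
}
```

Calling `count_zeros(fill_uniform01<std::uint16_t>(1000, 1))` counts every element as zero, whereas the same call with `float` as the element type does not. In this sketch the range excludes 1, so all integer entries truncate to zero; a generator that can return exactly 1 would leave a few nonzero entries.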

The fix seems to be replacing uint16_t with __half in the code.

Thank you for your assistance.

@WilliamTambellini

Thank you @wdj. Fixed in my PR: #110
