Using Cupy + custom Numba kernels for GPU #42
In this PR I tried an alternative approach w.r.t. #38.
To sum up, we have different possibilities for the implementation of the GPU backend:
A comparison between the current branch and the main one is presented in the following table.
Concerning the dry run overhead of the main branch, please remember that we moved the compilation to import time; otherwise it would be ~ 0.9 s.
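For context, here is a minimal sketch (not the actual main-branch code) of what "moving the compilation to import" can look like with `cupy.RawModule`: the kernel source, the `scale` name, and the launch configuration are placeholders chosen only for illustration.

```python
import cupy as cp
import numpy as np

# Hypothetical kernel source; the real main-branch kernels are more involved.
_SOURCE = r"""
extern "C" __global__
void scale(double* x, double a, long long n) {
    long long i = (long long)blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
"""

_module = cp.RawModule(code=_SOURCE)
# Fetching the kernel here, at import time, triggers (and caches) the NVRTC
# compilation, so the first circuit execution does not pay for it.
_scale = _module.get_function("scale")

def scale(x, a):
    """Multiply a CuPy array in place by a scalar using the raw kernel."""
    n = x.size
    _scale(((n + 255) // 256,), (256,), (x, np.float64(a), np.int64(n)))
    return x
```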
The results are worse than the main branch. Please note that the implementation in this PR is fully working, apart from the multi-qubit gate kernels, which don't work well without the `__launch_bounds__` directive.

Given these results, it seems that our implementation in main is still the best option. That said, the implementation in this PR is very simple and doesn't even require the programmer to know C++ or CUDA, so it may be a good option for projects with different goals/constraints.