Using Cupy + custom Numba kernels for GPU #42
In this PR I tried an alternative approach w.r.t. #38.
To sum up, we have different possibilities for the implementation of the GPU backend:
A comparison between the current branch and the main one is presented in the following table.
Concerning the dry run overhead of the main branch, please remember that we moved the compilation to import time; otherwise it would be ~ 0.9 s.
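For context, here is a minimal sketch (not the actual main-branch code) of what "moving the compilation to import" can look like with `cupy.RawModule`: the kernel source, the `scale` name, and the launch configuration are placeholders chosen only for illustration.

```python
import cupy as cp
import numpy as np

# Hypothetical kernel source; the real main-branch kernels are more involved.
_SOURCE = r"""
extern "C" __global__
void scale(double* x, double a, long long n) {
    long long i = (long long)blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
"""

_module = cp.RawModule(code=_SOURCE)
# Fetching the kernel here, at import time, triggers (and caches) the NVRTC
# compilation, so the first circuit execution does not pay for it.
_scale = _module.get_function("scale")

def scale(x, a):
    """Multiply a CuPy array in place by a scalar using the raw kernel."""
    n = x.size
    _scale(((n + 255) // 256,), (256,), (x, np.float64(a), np.int64(n)))
    return x
```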
The results are worse than the main branch. Please note that the implementation in this PR is fully working, apart from the multi-qubit gate kernels, which don't work well without the `__launch_bounds__` directive.

Given these results, it seems that our implementation in main is still the best option. That said, the implementation in this PR is very simple and doesn't even require the programmer to know C++ or CUDA, so it may be a good option for projects with different goals/constraints.