🚀 perf: Convolution2D #118

Merged: 10 commits, Feb 28, 2024


jean-francoisreboud (Collaborator) commented Feb 27, 2024

Changes:

  • kernel optimizations for better data locality

Benchmark:

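20 iterations of 64 images of size (224, 224)

After some kernel optimizations:

|             | M3Max             | 6900XT            |
|-------------|-------------------|-------------------|
| Train VGG16 | 119s (74% faster) | 145s (66% faster) |
| Eval VGG16  | 38s (48% faster)  | 30s (39% faster)  |

Before kernel optimizations:

|             | M3Max | 6900XT |
|-------------|-------|--------|
| Train VGG16 | 455s  | 422s   |
| Eval VGG16  | 73s   | 49s    |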
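
The "% faster" figures appear to be the reduction in wall-clock time relative to the run before the optimizations, not a throughput ratio. The sketch below recomputes them from the timings reported above; the function names are illustrative and not part of the project.

```python
# Timings in seconds, copied from the benchmark above: (before, after).
TIMINGS = {
    ("Train VGG16", "M3Max"): (455, 119),
    ("Train VGG16", "6900XT"): (422, 145),
    ("Eval VGG16", "M3Max"): (73, 38),
    ("Eval VGG16", "6900XT"): (49, 30),
}


def time_reduction_percent(before_s: float, after_s: float) -> int:
    """Percentage of wall-clock time saved, rounded to the nearest integer."""
    return round(100 * (before_s - after_s) / before_s)


def speedup_factor(before_s: float, after_s: float) -> float:
    """How many times faster the optimized run is (before / after)."""
    return before_s / after_s


if __name__ == "__main__":
    for (task, device), (before, after) in TIMINGS.items():
        pct = time_reduction_percent(before, after)
        factor = speedup_factor(before, after)
        print(f"{task} on {device}: {pct}% less time, {factor:.2f}x speedup")
```

Read this way, "74% faster" for training on the M3Max corresponds to roughly a 3.8x speedup (455s down to 119s).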

jean-francoisreboud self-assigned this Feb 27, 2024
jean-francoisreboud merged commit 192f994 into release_5 Feb 28, 2024
3 checks passed
jean-francoisreboud deleted the jfr/gpu branch February 28, 2024 07:58