A simple Flash Attention implementation written with CuTe, with very little performance overhead. Before using it, replace `-I/mnt/d/cuda/cutlass/*` in bench.py with the path to your own CUTLASS directory. This code is for educational purposes only: edge cases are not handled, and it only supports head_dim=64.
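As a hedged sketch of how the CUTLASS include path might be wired into bench.py (the actual script may be structured differently), one common pattern is a JIT build via `torch.utils.cpp_extension.load`. The source file name and extension name below are assumptions, not the repo's actual names:

```python
# Sketch only: the real bench.py may differ.
from torch.utils.cpp_extension import load

CUTLASS_DIR = "/path/to/cutlass"  # point this at your own CUTLASS checkout

cute_flash = load(
    name="cute_flash",
    sources=["flash_attention.cu"],                  # assumed kernel source file
    extra_include_paths=[f"{CUTLASS_DIR}/include"],  # what the -I flag points at
    extra_cuda_cflags=["-O3", "--expt-relaxed-constexpr"],  # CUTLASS needs relaxed constexpr
)
```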
Benchmarked on an RTX 4070S:
| Implementation | Latency (1x1024x32x64) |
| --- | --- |
| cute-flash-attention | 182.21 µs |
| flashinfer | 194.4 µs |
| flash-attention | 184.48 µs |
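For reference, a minimal sketch of how per-kernel latencies like these could be measured with CUDA events. The shape is read here as batch x seq_len x heads x head_dim, and `cute_flash.forward` is a hypothetical entry point, not necessarily the repo's API:

```python
# Hedged timing harness; the actual bench.py may measure differently.
import torch

B, S, H, D = 1, 1024, 32, 64  # the shape from the table above
q = torch.randn(B, S, H, D, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

def bench(fn, iters=100, warmup=10):
    for _ in range(warmup):          # warm up caches and JIT state
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters * 1000  # ms per iter -> µs

# e.g. print(f"{bench(lambda: cute_flash.forward(q, k, v)):.2f} µs")
```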