Work in Progress!
General summary:
Work through a progressive series of kernels to learn Triton:
a - vector add
b - simple matmul (outer k loop and tiled)
c - fused softmax (in PyTorch, then in Triton; see the sketch after this list)
d - softmax with backward
e - flash attention 2
f - group gemm with backward (bf16, then fp8)
g - MoE permute / unpermute kernels
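To make item (c) concrete, the fused softmax lesson starts from a plain PyTorch version, roughly like the sketch below (illustrative, not the exact lesson code). In eager mode each of these ops is dispatched separately, which is the overhead the Triton version removes by fusing everything into a single pass over each row.

```python
import torch

def naive_softmax(x: torch.Tensor) -> torch.Tensor:
    # subtract the per-row max for numerical stability
    x_max = x.max(dim=-1, keepdim=True).values
    z = x - x_max
    # exponentiate, then normalize by the per-row sum
    numerator = torch.exp(z)
    denominator = numerator.sum(dim=-1, keepdim=True)
    return numerator / denominator
```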
Lessons are arranged in order, starting with lesson_1 etc.
Some of the above are also examples in the core Triton repo...what's the difference?
From an admittedly opinionated view, the examples in the core repo tend to put too much in too soon, obscuring the core lessons each example should teach.
We'll stick with simpler versions that let the core tenets of kernel programming shine through, bypassing autotuning, certain cache optimizations, and other refinements at first, then pulling them in during later lessons.
In addition, the goal is to have almost every line in each kernel annotated with clear, concise comments so it's easy to map what the kernel code is doing.
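As a taste of that commenting style, here is a minimal sketch of a lesson-1-style vector add (function names and block size are illustrative, not the exact lesson code):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def vector_add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # each program instance handles one BLOCK_SIZE-wide chunk of the vectors
    pid = tl.program_id(axis=0)
    # offsets of the elements this program is responsible for
    block_start = pid * BLOCK_SIZE
    offsets = block_start + tl.arange(0, BLOCK_SIZE)
    # mask out-of-bounds lanes when n_elements is not a multiple of BLOCK_SIZE
    mask = offsets < n_elements
    # load inputs from global memory, add, and store the result
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def vector_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n_elements = out.numel()
    # 1D launch grid: enough programs to cover every element
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    vector_add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out
```

The same pattern - program id, offsets, mask, then load/compute/store - carries through all of the later, more complex kernels.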
External contributions are very welcome - this is meant to be an evolving, iteratively improving tutorial series.