Bump to v2.7.0
tridao committed Nov 12, 2024
1 parent 6ffeb57 commit c555642
Showing 2 changed files with 5 additions and 1 deletion.
4 changes: 4 additions & 0 deletions README.md
@@ -373,6 +373,10 @@ Thanks to @beginlner for this contribution.
Support attention with softcapping, as used in Gemma-2 and Grok models.
Thanks to @Narsil and @lucidrains for this contribution.

### 2.7: Compatibility with torch compile

Thanks to @ani300 for this contribution.
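
The diff above only records the new changelog entry. As an illustration only (a minimal sketch assumed from the feature name, not code from this commit), "compatibility with torch compile" means a function that calls `flash_attn_func` can be wrapped with `torch.compile` directly:

```python
# Minimal sketch (assumed usage, not part of this commit): compiling a function
# that calls FlashAttention's flash_attn_func with torch.compile.
import torch
from flash_attn import flash_attn_func

@torch.compile
def attention(q, k, v):
    # q, k, v: (batch, seqlen, nheads, headdim) fp16/bf16 tensors on CUDA
    return flash_attn_func(q, k, v, causal=True)

q, k, v = [torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
           for _ in range(3)]
out = attention(q, k, v)  # first call triggers compilation; later calls reuse it
```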

## Performance

We present expected speedup (combined forward + backward pass) and memory savings from using FlashAttention against PyTorch standard attention, depending on sequence length, on different GPUs (speedup depends on memory bandwidth - we see more speedup on slower GPU memory).
2 changes: 1 addition & 1 deletion flash_attn/__init__.py
@@ -1,4 +1,4 @@
__version__ = "2.6.3"
__version__ = "2.7.0"

from flash_attn.flash_attn_interface import (
flash_attn_func,
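
As a quick sanity check after upgrading (a usage note, not part of this diff), the bumped version string can be read back from the installed package:

```python
import flash_attn

# The attribute updated by this commit reports the new release.
print(flash_attn.__version__)  # "2.7.0"
```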
