Why does vLLM store the fp8 KV cache as uint8_t rather than torch.float8_e4m3fn or torch.float8_e5m2? #10911
Unanswered
cyLi-Tiger asked this question in Q&A
Here, key_states and value_states are converted from bfloat16 (bf16) and stored as uint8_t. Will this conversion cause accuracy issues due to the difference in range between these types? Additionally, since uint8_t is unsigned, how are negative values handled during this conversion?
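To illustrate what the byte actually holds: an fp8 value occupies exactly 8 bits, so `uint8_t` can serve as an opaque byte container, with the kernels reinterpreting the bits rather than numerically casting. The sign bit is part of those 8 bits, so negative values are preserved. Below is a simplified, hypothetical e4m3fn encoder/decoder (not vLLM's kernel code; rounding and NaN handling are reduced to saturation) showing that a negative value round-trips through a single unsigned byte:

```python
import math

def encode_e4m3fn(x: float) -> int:
    """Pack a float into one byte using the e4m3fn layout:
    1 sign bit | 4 exponent bits (bias 7) | 3 mantissa bits.
    Simplified sketch: saturates at +-448 instead of producing NaN."""
    sign = 0x80 if x < 0 else 0x00
    x = abs(x)
    if x == 0.0:
        return sign
    if x >= 448.0:                     # e4m3fn max magnitude is 448
        return sign | 0x7E
    exp = max(math.floor(math.log2(x)), -6)
    mant = x / 2.0 ** exp
    if mant < 1.0:                     # subnormal: exponent field is 0
        bits_exp = 0
        frac = round(x / 2.0 ** -6 * 8)
    else:
        bits_exp = exp + 7
        frac = round((mant - 1.0) * 8)
    if frac == 8:                      # rounding carried into the exponent
        bits_exp += 1
        frac = 0
    return sign | (bits_exp << 3) | frac

def decode_e4m3fn(b: int) -> float:
    """Unpack one e4m3fn byte back into a float."""
    sign = -1.0 if b & 0x80 else 1.0
    e = (b >> 3) & 0xF
    m = b & 0x7
    if e == 0:                         # subnormal range
        return sign * (m / 8.0) * 2.0 ** -6
    return sign * (1.0 + m / 8.0) * 2.0 ** (e - 7)

# A negative value survives the round trip: the sign lives in the
# byte's top bit, so "unsigned" storage loses nothing.
b = encode_e4m3fn(-1.5)
assert 0 <= b <= 255 and decode_e4m3fn(b) == -1.5
```

The range concern is real but orthogonal to the storage type: bf16 values outside fp8's representable range must be scaled (vLLM exposes a KV-cache scale factor for this) before quantization, regardless of whether the byte is then declared as `uint8_t` or as a dedicated float8 dtype.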