From bb671447569763d0d3502f0ac155a638e1395e21 Mon Sep 17 00:00:00 2001
From: Zihao Ye
Date: Sun, 10 Nov 2024 19:08:57 -0800
Subject: [PATCH] doc: update readme (#604)

Add lorax and acknowledge aitemplate for kernel generation.
---
 README.md      | 5 +++--
 docs/index.rst | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 39150d1a..88ce6e0f 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,7 @@ Kernel Library for LLM Serving
 
 [![Documentation](https://github.com/flashinfer-ai/flashinfer/actions/workflows/build-doc.yml/badge.svg)](https://github.com/flashinfer-ai/flashinfer/actions/workflows/build-doc.yml)
 
-FlashInfer is a library for Large Language Models that provides high-performance implementation of LLM GPU kernels such as FlashAttention, SparseAttention, PageAttention, Sampling, and more. FlashInfer focuses on LLM serving and inference, and delivers state-of-the-art performance across diverse scenarios.
+FlashInfer is a library and kernel generator for Large Language Models that provides high-performance implementation of LLM GPU kernels such as FlashAttention, SparseAttention, PageAttention, Sampling, and more. FlashInfer focuses on LLM serving and inference, and delivers state-of-the-art performance across diverse scenarios.
 
 The unique features of FlashInfer include:
 1. **Comprehensive Attention Kernels**: Attention kernels that cover all the common use cases of LLM serving, including *single-request* and *batching* versions of *Prefill*, *Decode*, and *Append* kernels, on different formats of KV-Cache (Padded Tensor, Ragged Tensor, and Page Table).
@@ -125,7 +125,8 @@ Currently FlashInfer is adopted by the following projects:
 - [ScaleLLM](https://github.com/vectorch-ai/ScaleLLM)
 - [vLLM](https://github.com/vllm-project/vllm)
 - [TGI](https://github.com/huggingface/text-generation-inference)
+- [lorax](https://github.com/predibase/lorax)
 
 ## Acknowledgement
 
-FlashInfer is inspired by [FlashAttention 1&2](https://github.com/dao-AILab/flash-attention/), [vLLM](https://github.com/vllm-project/vllm), [stream-K](https://arxiv.org/abs/2301.03598) and [cutlass](https://github.com/nvidia/cutlass) projects.
+FlashInfer is inspired by [FlashAttention 1&2](https://github.com/dao-AILab/flash-attention/), [vLLM](https://github.com/vllm-project/vllm), [stream-K](https://arxiv.org/abs/2301.03598), [cutlass](https://github.com/nvidia/cutlass) and [AITemplate](https://github.com/facebookincubator/AITemplate) projects.
diff --git a/docs/index.rst b/docs/index.rst
index d8af2d44..ee4ecdb1 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -8,7 +8,7 @@ Welcome to FlashInfer's documentation!
 
 `Blog `_ | `Discussion Forum `_ | `GitHub `_
 
-FlashInfer is a library for Large Language Models that provides high-performance implementation of LLM GPU kernels such as FlashAttention, PageAttention and LoRA. FlashInfer focus on LLM serving and inference, and delivers state-of-the-art performance across diverse scenarios.
+FlashInfer is a library and kernel generator for Large Language Models that provides high-performance implementation of LLM GPU kernels such as FlashAttention, PageAttention and LoRA. FlashInfer focuses on LLM serving and inference, and delivers state-of-the-art performance across diverse scenarios.
 
 .. toctree::
    :maxdepth: 2