Zifan Zheng1*, Yezhaohui Wang1*, Yuxin Huang2*, Shichao Song1, Bo Tang1, Feiyu Xiong1, Zhiyu Li1†
1Institute for Advanced Algorithms Research (IAAR), Shanghai,
2Institute for AI Industry Research (AIR), Tsinghua University
*Equal contribution.
†Corresponding author: Zhiyu Li (lizy@iaar.ac.cn).
Important

- 🌟 Star Us! If you find our work helpful, please consider starring our GitHub repo to stay updated with the latest content in this Awesome list!
- About this repo. This repo tracks the latest research on the various kinds of attention heads in LLMs. We have also released a survey based on these works.
- If you want to cite our work, here is our bibtex entry: CITATION.bib.
- If you only want to see the related paper list, please jump directly to here.
- [2024/09/07] Our paper secured the 2nd place on Hugging Face's Daily Paper List.
- [2024/09/06] Our survey paper is available on the arXiv platform: https://arxiv.org/abs/2409.03752.
With the development of Large Language Models (LLMs), their underlying network architecture, the Transformer, is being studied extensively. Research on the Transformer deepens our understanding of this "black box" and improves model interpretability. Recently, a growing body of work has suggested that the model contains two distinct partitions: the attention mechanisms, used for behavior, inference, and analysis, and the Feed-Forward Networks (FFNs), used for knowledge storage. The former is crucial for revealing the functional capabilities of the model, and it has motivated a series of studies exploring the various functions within attention mechanisms, which we term Attention Head Mining.
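To make this division concrete, below is a minimal PyTorch sketch of a single Transformer block showing the two partitions side by side. The dimensions, the pre-norm residual layout, and all names are illustrative assumptions, not taken from any specific model in the papers listed here:

```python
# A minimal sketch of the two Transformer partitions discussed above:
# multi-head attention (behavior/inference) followed by an FFN (knowledge storage).
# All hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ffn: int = 2048):
        super().__init__()
        # Each of the n_heads attention heads attends over the sequence;
        # interpretability studies probe these heads for distinct functions.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # The FFN is often viewed as a key-value memory that stores knowledge.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ffn),
            nn.GELU(),
            nn.Linear(d_ffn, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-norm residual layout, common in modern LLMs (an assumed choice here).
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        x = x + self.ffn(self.norm2(x))
        return x

if __name__ == "__main__":
    block = TransformerBlock()
    x = torch.randn(2, 16, 512)   # [batch, seq, d_model]
    print(block(x).shape)         # torch.Size([2, 16, 512])
```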
In this survey, we delve into the potential mechanisms of how attention heads in LLMs contribute to the reasoning process.
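As a rough illustration of how per-head analysis is carried out in practice, here is a short sketch using the Hugging Face Transformers library to read out the attention map of a single head. The choice of GPT-2 and the layer/head indices are arbitrary assumptions for demonstration, not a method from the survey:

```python
# A hedged sketch: inspecting the attention pattern of one head in GPT-2.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_attentions=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one tensor per layer, shaped [batch, n_heads, seq, seq].
layer, head = 5, 3  # arbitrary example indices
head_pattern = outputs.attentions[layer][0, head]
print(head_pattern.shape)  # (seq_len, seq_len) attention map for this one head
```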
Highlights:
- We propose an innovative four-stage framework, inspired by human cognitive neuroscience, to analyze the reasoning process of LLMs (Knowledge Recalling, In-Context Identification, Latent Reasoning, Expression Preparation).
- We classify current research on the interpretability of LLM attention heads according to the four-stage framework and explore the collaborative mechanisms among them.
- We provide a comprehensive summary and classification of the experimental methodologies used in these studies.
- We summarize the limitations of current research in this field and propose directions for future research.
The papers below are ordered by publication date: