Unlike extractive systems, generative (abstractive) Question Answering systems generate answers to questions based on some source of knowledge. The answers produced by abstractive systems can aggregate information from multiple original passages and read naturally, like human-written text.
Today everyone knows and talks about ChatGPT 💬. If we look at it from the point of view of Question Answering, it is a closed-book system: it is based on internal knowledge. This knowledge is also known as "parametric memory": it is stored in the model weights and is accumulated during training.
When used in isolation, abstractive closed-book QA systems have some serious limitations:
❌ their knowledge is generic rather than tailored to a specific domain, and it is expensive and difficult to update over time
❌ they can produce "hallucinations"
To overcome the disadvantages of generative closed-book solutions, several systems have been developed that share a similar idea:
- use a Retriever 🔎 to collect passages of text relevant to the user's question
- use the non-parametric knowledge stored in text passages to influence Answer Generation
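The two-step idea above can be sketched as a minimal retrieve-then-generate loop. This is purely illustrative: the retriever is a toy term-overlap scorer (real systems use BM25 or dense retrieval), and the prompt would be fed to a seq2seq model rather than printed.

```python
from collections import Counter

def retrieve(question, passages, top_k=2):
    """Rank passages by simple term overlap with the question (toy sparse retrieval)."""
    q_terms = Counter(question.lower().split())
    scored = [(sum(q_terms[t] for t in p.lower().split()), p) for p in passages]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [p for score, p in scored[:top_k] if score > 0]

def build_prompt(question, passages):
    """Concatenate the retrieved passages with the question, so the generator
    can ground its answer in this non-parametric knowledge."""
    context = "\n".join(f"passage: {p}" for p in passages)
    return f"{context}\nquestion: {question}\nanswer:"

passages = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
    "Mount Everest is the highest mountain on Earth.",
]
top = retrieve("What is the capital of France?", passages)
prompt = build_prompt("What is the capital of France?", top)
print(prompt)
```

Swapping the toy scorer for a real retriever and the prompt for a generator call gives the basic shape shared by all the systems discussed below.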
In recent years, various systems have been proposed that combine the two components. Some examples: ORQA (Google), REALM (Google), RAG (Meta), FiD (Meta), RETRO (DeepMind). Probably the most popular is Retrieval-Augmented Generation (RAG), proposed by Patrick Lewis et al. in 2020, which has made a comeback in recent months...
Fusion-in-Decoder (FiD) is not as famous, but it is simple and effective. It was introduced by Gautier Izacard and Edouard Grave (Meta Research) in 2021.
- Retrieval: for retrieval from Wikipedia, the authors considered two methods: BM25 (sparse retrieval) and Dense Passage Retrieval. Since the retriever is not trained, FiD is potentially compatible with any retrieval system.
- Generation: the generative model is based on a sequence-to-sequence network pretrained on unsupervised data, such as T5 (a transformer with encoder-decoder architecture). Each retrieved passage and its title are concatenated with the question and processed independently from the other passages by the encoder. Then the decoder performs attention over the concatenation of the resulting representations of all the retrieved passages (Fusion-in-Decoder).
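A shape-level sketch of the Fusion-in-Decoder step, with toy numpy stand-ins for the real T5 encoder and decoder (the dimensions, the `encode` function, and the single-query attention are all illustrative assumptions, not the actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, seq_len = 8, 5  # toy dimensions

def encode(text):
    """Stand-in for the T5 encoder: maps one (question + passage) string
    to a (seq_len, d_model) matrix of token representations."""
    seed = abs(hash(text)) % (2**32)  # deterministic toy embedding per text
    return np.random.default_rng(seed).normal(size=(seq_len, d_model))

question = "who wrote hamlet?"
passages = ["title: Hamlet ...", "title: Shakespeare ...", "title: Globe Theatre ..."]

# 1) Each passage (with its title) is concatenated with the question and
#    encoded independently of the other passages.
encoded = [encode(f"question: {question} context: {p}") for p in passages]

# 2) Fusion-in-Decoder: the per-passage representations are concatenated
#    along the sequence axis, and the decoder cross-attends over all of
#    them jointly.
fused = np.concatenate(encoded, axis=0)  # (n_passages * seq_len, d_model)

# Toy cross-attention: one decoder query vector attends over every fused token.
query = rng.normal(size=(d_model,))
scores = fused @ query                   # one score per fused token
weights = np.exp(scores - scores.max())
weights /= weights.sum()                 # softmax over all passages' tokens
context_vector = weights @ fused         # (d_model,) summary the decoder uses

print(fused.shape)  # (15, 8)
```

The key point the shapes make visible: encoding cost grows linearly with the number of passages (each is encoded alone), while the decoder sees all of them at once through the fused sequence.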
Experiments and results:
- The FiD system has been trained and evaluated on 3 different QA datasets
- while conceptually simple, the trained models are competitive with or better than closed-book approaches, despite being much smaller
- major performance improvements are achieved by using the knowledge retrieved and scaling to a large number of jointly processed passages
- (Large) Language Models 🧠 have strong text comprehension/generation skills
- their knowledge is generic and is not easily updated over time
- When building NLP applications, we can combine LM with 🔎 Retrieval systems to provide new/specific knowledge and make them answer factually!
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks: paper introducing RAG by Patrick Lewis et al.
- Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering: paper introducing Fusion-in-Decoder
- fastRAG: recent framework for efficient Retrieval Augmentation and Generation by Intel Labs
- The Illustrated Retrieval Transformer: deep blogpost by Jay Alammar, who explains DeepMind's RETRO and Retrieval-based Answer Generation
- Build a Search Engine with GPT-3: blogpost by Tuana Çelik, who explains how to combine Large Language Models with a corpus of your choice to generate natural-sounding answers that are grounded in facts