FAVA is a hallucination detection and editing model. You can find a model demo here, model weights here and our datasets here. This repo includes information on synthetic data generation for training and evaluating FAVA.
- Installation
- Synthetic Data Generation
- Postprocess Data for Training
- Retrieval Guide
- FActScore Evaluations
- Fine Grained Sentence Detection Evaluations
conda create -n fava python=3.9
conda activate fava
pip install -r requirements.txt
python -m spacy download en_core_web_sm
Our synthetic data generation takes in wikipedia passages and a title, diversifies the passage to another genre of text and then inserts errors one by one using ChatGPT and GPT-4.
cd training
python generate_train_data.py \
--input_file {input_file_path} \
--output_file {output_file_path} \
--openai_key {your_openai_key}
Input file is jsonl
and includes:
intro
(ex: 'Lionel Messi is an Argentine soccer player.')title
(ex: 'Lionel Andrés Messi')
Output file includes:
evidence
(ex: 'Lionel Messi is an Argentine soccer player.')diversified_passage
(ex: 'The Argentine soccer player, Lionel Messi, is...')errored_passage
(ex: 'The <entity><delete>Argentine</delete><mark>American</mark></entity> soccer player, Lionel Messi, is...')subject
(ex: 'Lionel Andrés Messi')type
(ex: 'News Article')error_types
(ex: ['entity'])
cd training
python process_train_data.py \
--input_file {input_file_path} \
--output_file {output_file_path}
Input file is json
and includes:
evidence
(ex: 'Lionel Messi is an Argentine soccer player.')errored_passage
(ex: 'The <entity><delete>Argentine</delete><mark>American</mark></entity> soccer player, Lionel Messi, is...')ctxs
(ex: [{'id': 0, 'title': 'Lionel Messi', 'text': 'Lio Messi is known for...'},...])
Output file includes:
prompt
(ex: 'Read the following references:\nReference[1]:Lio Messi is...[Text] The American soccer player, Lionel Messi, is...')completion
(ex: 'The <entity><mark>Argentine</mark><delete>American</delete></entity> soccer player, Lionel Messi, is...')
We followed Open-Instruct's training script for training FAVA. We updated and ran this script updating the train_file to our processed training data from step 2 and used Llama-2-Chat 7B as our base model.
You can find our training data here.
We use Contriever to retrieve documents.
Download the preprocessed passage data and the generated passaged (Contriever-MSMARCO).
cd retrieval
wget https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz
wget https://dl.fbaipublicfiles.com/contriever/embeddings/contriever-msmarco/wikipedia_embeddings.tar
We retrieve the top 5 documents but you may adjust num_docs
as per your liking.
cd retrieval
python passage_retrieval.py \
--model_name_or_path facebook/contriever-msmarco --passages psgs_w100.tsv \
--passages_embeddings "wikipedia_embeddings/*" \
--data {input_file_path} \
--output_dir {output_file_path} \
--n_docs {num_docs}
Input file is either a json
or jsonl
and includes:
question
orinstruction
(ex: 'Who is Lionel Messi')
We provide two main evaluation set ups: FActScore and our own fine grained error detection task.
cd eval
python run_eval --model_name_or_path {model_name_or_path} --input_file {input_file_path} --output_file {output_file_path} --metric factscore --openai_key {your_openai_key}
Input file is json
and includes:
passage
(ex: 'The American soccer player, Lionel Messi, is...')evidence
(ex: 'Lionel Messi is an Argentine soccer player...')title
(ex: 'Lionel Messi')
FActScore dataset can be downloaded from here. We used the the Alpaca 7B, Alpaca 13B, and ChatGPT data from FActScore.
cd eval
python run_eval --model_name_or_path {model_name_or_path} --input_file {input_file_path} --output_file {output_file_path} --metric detection
Input file is json
and includes:
passage
(ex: 'The American soccer player, Lionel Messi, is...')evidence
(ex: 'Lionel Messi is an Argentine soccer player...')annotated
(ex: 'The <entity><mark>Argentine</mark><delete>American</delete></entity> soccer player, Lionel Messi, is...')
You can find our human annotation data here.
Optional flags:
--max_new_tokens
: max new tokens to generate--do_sample
: true or false, whether or not to use sampling--temperature
: temperature for sampling--top_p
: top_p value for sampling
@article{mishra2024finegrained,
title={ Fine-grained Hallucinations Detections },
author={ Mishra, Abhika and Asai, Akari and Balachandran, Vidhisha and Wang, Yizhong and Neubig, Graham and Tsvetkov, Yulia and Hajishirzi, Hannaneh },
journal={arXiv preprint},
year={ 2024 },
url={ https://arxiv.org/abs/2401.06855 }
}