Online Chatbot:
https://spinach.genie.stanford.edu
The SPINACH dataset: Current KBQA datasets lack real-world complexity. The SPINACH KBQA dataset, collected from Wikidata's Request a Query sites, is the first to cover both natural questions and complex SPARQLs
The SPINACH agent: The SPINACH agent is a new KBQA approach that mimics expert human SPARQL writing, achieving SOTA on many KBQA datasets. You can try it at https://spinach.genie.stanford.edu
datasets/
contains all prior dataset files. Predictions for the SPINACH agent used in the paper can be found at:
datasets/qald_7_task4/spinach_output_test.json
for QALD-7datasets/qald_9_plus/en/spinach_output_test.json
for QALD-9-plusdatasets/qald_10/en/spinach_output_test.json
for QALD-10 full set (the prediction for the ToG subset can be retrieved by uncommenting the portion usingget_tog_baseline_questions
inevaluate_file.py
)datasets/wikiwebquestions/spinach_output_dev.json
anddatasets/wikiwebquestions/spinach_output_test.json
for WikiWebQuestions
spinach_dataset/
contains the dev and test set of the SPINACH dataset. The SPINACH agent's outputs are also stored in this directory.
spinach_agent/
contains the implementation for the SPINACH agent.
notebooks/
stores various Jupyter notebooks used to crawl the initial conversations and compute dataset complexity metrics.
tasks/
stores the files declaring how to use the invoke
command.
tests/
contains all tests, which use pytest
. You can run all tests by running invoke tests
. test_eval.py
, which stores test cases for the row-major F1 implementation, can be run via python tests/test_eval.py
.
Run conda env create -f conda_env.yaml
.
Create a file called API_KEYS
and write various API keys inside. The format is one key per line, for example OPENAI_API_KEY=sk-...
inv evaluate-parser --parser-type part_to_whole --subsample=-1 --engine=gpt-4o --dataset=datasets/qald_10/en/test.json --output-file=datasets/qald_10/en/spinach_output_test.json --regex-use-select-distinct-and-id-not-label --llm-extract-prediction-if-null
The two flags at the end are for:
llm-extract-prediction-if-null
: If a reasoning chain ended without any predicted SPARQL, asks a LLM to return a SPARQL. This part is implemented insideextract_sparql.ainvoke
. This is helpful because for simple queries, LLMs could just use ``get_wikidata_entry'' to get results instead of ever writing a SPARQL. We enabled this flag for all datasets we evaluated on.--regex-use-select-distinct-and-id-not-label
: Attempts to use regex to force useSELECT DISTINCT
instead ofSELECT
, and try to always include the variable QID instead of the label (i.e., usex
instead ofxLabel
). We enabled this for all datasets except the new SPINACH dataset that we evaluated on (The SPINACH dataset involves more complex predicted SPARQLs. The regex is not sophisticated enough to handle these cases.)
The script will also write a .log
file with SPINACH's chain of reasonings and actions with the same file name as the .json
output.
You can re-evaluate the output simply from the .json
file:
python spinach_agent/evaluate_file.py --input datasets/qald_10/en/spinach_output_test.json
If you'd like to simply run the parser on a list of questions, use the following code from evaluate_parser.py
:
from spinach_agent.part_to_whole_parser import PartToWholeParser
semantic_parser_class = PartToWholeParser
semantic_parser_class.initialize(engine=args.engine) # e.g. "gpt-4o"
chain_output = semantic_parser_class.run_batch(
questions, # this should be a dict of {"question": "...", "conversation_history": [...]}, conversation_history can be empty list if running on single-turn questions
)
The code in this repo is released under Apache License, version 2.0. The SPINACH dataset, derived from the Wikidata Request a Query forum, is released under the CC BY-SA 4.0 license, the same license that covers the forum.
@misc{liu2024spinachsparqlbasedinformationnavigation,
title={SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions},
author={Shicheng Liu and Sina J. Semnani and Harold Triedman and Jialiang Xu and Isaac Dan Zhao and Monica S. Lam},
year={2024},
eprint={2407.11417},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2407.11417},
}