ORKG Synthesis Dataset

This work is accepted for publication at JCDL-2024 conference.

What is the ORKG Synthesis Dataset?

We develop a methodology to collect and process scientific papers into a format ready for synthesis using the Open Research Knowledge Graph, a multidisciplinary platform that facilitates the comparison of scientific contributions. Where later, we introduce new synthesis types —- paper-wise, methodological, and thematic —- that focus on different aspects of the extracted insights. Utilizing Mistral-7B and GPT4 , we generate a large-scale dataset of these syntheses. The established nine quality criteria for evaluating these syntheses, assessed by both an automated LLM evaluator (GPT-4) and a human-crowdsourced survey.

Directories

corpus: Contains ORKG Synthesis dataset for bot GPT-4 and Mistral-7B for three synthesis objectives (paper-wise, methodological, and thematic). Also Prolific Human Survey Results.
gpt-4 synthesis-evaluator: Contains Evaluation System Prompt and evaluator script.
orkg-comparison-data-gen-scripts: Synthesis generation scripts.
synthesis-generation-prompts: Synthesis generation prompts for paper-wise, methodological, and thematic objectives.

Prolific Survey

The Prolific Survey Participant Demographics available at Table 1 in the corpus/prolific directory.

Also the average human and automatic (LLM) evaluation available at Table 2 in the corpus/prolific directory, representing average human and LLM evaluation scores by characteristic comparisons. For each domain/characteristic, the human scores are an average of 18 judgements (6 syntheses (2 samples x 3 synthesis types) x 3 participants) while the auto scores are an average of 6 judgements (6 syntheses (2 samples x 3 synthesis types) x 1 LLM evaluation).

LLMs4Synthesis

The LLMs4Synthesis framework on top of this dataset is available at https://github.com/HamedBabaei/LLMs4Synthesis.

Citation

If you find this work useful, please consider citing our research papers listed below.

@inproceedings{evans-etal-2024-large,
    title = "Large Language Models as Evaluators for Scientific Synthesis",
    author = {Evans, Julia  and
      D{'}Souza, Jennifer  and
      Auer, S{\"o}ren},
    editor = "Luz de Araujo, Pedro Henrique  and
      Baumann, Andreas  and
      Gromann, Dagmar  and
      Krenn, Brigitte  and
      Roth, Benjamin  and
      Wiegand, Michael",
    booktitle = "Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024)",
    month = sep,
    year = "2024",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.konvens-main.1/",
    pages = "1--22"
}


@inbook{babaei-giglou-etal-2025-synthesis,
author = {Babaei Giglou, Hamed and D'Souza, Jennifer and Auer, S\"{o}ren},
title = {LLMs4Synthesis: Leveraging Large Language Models for Scientific Synthesis},
year = {2025},
isbn = {9798400710933},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3677389.3702565},
articleno = {31},
numpages = {12}
}

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
corpus		corpus
gpt-4 synthesis-evaluator		gpt-4 synthesis-evaluator
images		images
orkg-comparison-data-gen-scripts		orkg-comparison-data-gen-scripts
synthesis-generation-prompts		synthesis-generation-prompts
.flake8		.flake8
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ORKG Synthesis Dataset

What is the ORKG Synthesis Dataset?

Directories

Prolific Survey

LLMs4Synthesis

Citation

About

Releases

Packages

Contributors 2

Languages

License

jd-coderepos/scisynthesis

Folders and files

Latest commit

History

Repository files navigation

ORKG Synthesis Dataset

What is the ORKG Synthesis Dataset?

Directories

Prolific Survey

LLMs4Synthesis

Citation

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages