
Is the human-annotated dataset publicly available? #35

Open
lingoubb opened this issue Nov 28, 2024 · 0 comments

Comments

@lingoubb

Has the annotated dataset described in the following passage been made public?

> Dataset. We randomly sample a subset of 400 queries from the complete ALIGNBENCH dataset. To make sure each category consists of enough samples to produce reliable results, smaller categories are upsampled. To cover LLMs with a wider range of capability levels, we adopt answers from 8 LLMs, including GPT-4 (OpenAI, 2023), three versions of the ChatGLM series (Zeng et al., 2022; Du et al., 2022), Sparkdesk, Qwen-plus-v1-search (Bai et al., 2023a), InternLM-7B-Chat (Team, 2023) and Chinese-Llama2-7B-Chat, producing a total of 3200 question-answer pairings. Subsequent to the compilation of the evaluation set, the question-answer-reference triples are delivered to human annotators, tasked with assigning quality ratings to the answers according to the references. Given the inherent limitations bound to human cognition, annotators are instructed to employ a rating on a scale from 1 to 5. The scores are indicative of response quality, with higher scores epitomizing superior quality and profound satisfaction. In particular, a score of 1 marks irrelevant, incorrect, or potentially harmful responses.
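For concreteness, here is a minimal sketch (not the authors' released code) of how the evaluation set described above could be assembled: the 400 sampled queries crossed with answers from the 8 LLMs give 400 × 8 = 3200 question-answer pairs, each stored with its reference and a 1-5 human rating. The `AnnotatedSample` structure, field names, and the exact ChatGLM version labels are illustrative assumptions.

```python
# Hypothetical sketch of the annotated evaluation set's structure.
# Model names follow the quoted passage; ChatGLM version labels are assumed.
from dataclasses import dataclass
from typing import Optional

MODELS = [
    "GPT-4", "ChatGLM-v1", "ChatGLM-v2", "ChatGLM-v3",
    "Sparkdesk", "Qwen-plus-v1-search", "InternLM-7B-Chat",
    "Chinese-Llama2-7B-Chat",
]

@dataclass
class AnnotatedSample:
    query: str            # one of the 400 sampled ALIGNBENCH queries
    reference: str        # reference answer shown to annotators
    model: str            # which of the 8 LLMs produced the answer
    answer: str           # the model's response to the query
    human_score: Optional[int] = None  # 1-5; 1 = irrelevant/incorrect/harmful

def build_eval_set(queries_with_refs, answers_by_model):
    """Cross queries with the 8 models' answers: 400 * 8 = 3200 samples."""
    samples = []
    for query, reference in queries_with_refs:
        for model in MODELS:
            samples.append(AnnotatedSample(
                query=query,
                reference=reference,
                model=model,
                answer=answers_by_model[model][query],
            ))
    return samples
```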
