Update llm_judge_evaluating_ai_search_engines_with_judges_library.ipynb
Fixed dataset link.
julianeagu authored Feb 4, 2025
1 parent 52e7130 commit a892854
Showing 1 changed file with 1 addition and 1 deletion.
@@ -47,7 +47,7 @@
 "\n",
 "We use the [Natural Questions dataset](https://paperswithcode.com/dataset/natural-questions), an open-source collection of real Google queries and Wikipedia articles, to benchmark AI search engine quality.\n",
 "\n",
-"1. Start with a [**100-datapoint subset of Natural Questions**](https://huggingface.co/datasets/quotientai/natural-qa-random-100-with-AI-search-answers), which only includes human evaluated answers and their corresponding queries for correctness, clarity, and completeness. We'll use these as the ground truth answers to the queries.\n",
+"1. Start with a [**100-datapoint subset of Natural Questions**](https://huggingface.co/datasets/quotientai/labeled-natural-qa-random-100), which only includes human evaluated answers and their corresponding queries for correctness, clarity, and completeness. We'll use these as the ground truth answers to the queries.\n",
 "2. Use different **AI search engines** (Perplexity, Exa, and Gemini) to generate responses to the queries in the dataset.\n",
 "3. Use `judges` to evaluate the responses for **correctness** and **quality**.\n",
 "\n",
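For reference, a minimal sketch of step 1 using the corrected dataset link, loaded with the Hugging Face `datasets` library. The split name (`"train"`) and the column inspection below are assumptions made for illustration, not taken from the notebook; verify them against the dataset card before wiring the data into the evaluation loop.

```python
# Sketch: load the relabeled 100-datapoint Natural Questions subset referenced
# by the updated link. The split name ("train") is an assumption -- check the
# dataset card on the Hub if this raises an error.
from datasets import load_dataset

ds = load_dataset("quotientai/labeled-natural-qa-random-100", split="train")

print(ds)               # size and schema of the subset
print(ds.column_names)  # confirm which columns hold the query and the ground-truth answer
print(ds[0])            # inspect one datapoint before generating AI search responses
```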
