diff --git a/notebooks/en/llm_judge_evaluating_ai_search_engines_with_judges_library.ipynb b/notebooks/en/llm_judge_evaluating_ai_search_engines_with_judges_library.ipynb
index 6c83a3cb..72e7b18a 100644
--- a/notebooks/en/llm_judge_evaluating_ai_search_engines_with_judges_library.ipynb
+++ b/notebooks/en/llm_judge_evaluating_ai_search_engines_with_judges_library.ipynb
@@ -47,7 +47,7 @@
 "\n",
 "We use the [Natural Questions dataset](https://paperswithcode.com/dataset/natural-questions), an open-source collection of real Google queries and Wikipedia articles, to benchmark AI search engine quality.\n",
 "\n",
-"1. Start with a [**100-datapoint subset of Natural Questions**](https://huggingface.co/datasets/quotientai/natural-qa-random-100-with-AI-search-answers), which only includes human evaluated answers and their corresponding queries for correctness, clarity, and completeness. We'll use these as the ground truth answers to the queries.\n",
+"1. Start with a [**100-datapoint subset of Natural Questions**](https://huggingface.co/datasets/quotientai/labeled-natural-qa-random-100), which includes only queries and their corresponding answers, each evaluated by humans for correctness, clarity, and completeness. We'll use these as the ground truth answers to the queries.\n",
 "2. Use different **AI search engines** (Perplexity, Exa, and Gemini) to generate responses to the queries in the dataset.\n",
 "3. Use `judges` to evaluate the responses for **correctness** and **quality**.\n",
 "\n",
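
For reference, step 1 above amounts to pulling the relabeled subset straight from the Hugging Face Hub with the `datasets` library. The sketch below is not taken from the notebook: the `split="train"` argument and the printed fields are assumptions to verify against the dataset card.

```python
# Minimal sketch of step 1: load the 100-datapoint labeled Natural Questions subset.
# Assumption: the dataset exposes a "train" split; check the dataset card if not.
from datasets import load_dataset

dataset = load_dataset("quotientai/labeled-natural-qa-random-100", split="train")

print(dataset)     # feature names and row count (expected: 100 rows)
print(dataset[0])  # one query with its human-evaluated ground-truth answer
```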