diff --git a/notebooks/en/llm_judge_evaluating_ai_search_engines_with_judges_library.ipynb b/notebooks/en/llm_judge_evaluating_ai_search_engines_with_judges_library.ipynb
index 6c83a3cb..72e7b18a 100644
--- a/notebooks/en/llm_judge_evaluating_ai_search_engines_with_judges_library.ipynb
+++ b/notebooks/en/llm_judge_evaluating_ai_search_engines_with_judges_library.ipynb
@@ -47,7 +47,7 @@
 "\n",
 "We use the [Natural Questions dataset](https://paperswithcode.com/dataset/natural-questions), an open-source collection of real Google queries and Wikipedia articles, to benchmark AI search engine quality.\n",
 "\n",
-"1. Start with a [**100-datapoint subset of Natural Questions**](https://huggingface.co/datasets/quotientai/natural-qa-random-100-with-AI-search-answers), which only includes human evaluated answers and their corresponding queries for correctness, clarity, and completeness. We'll use these as the ground truth answers to the queries.\n",
+"1. Start with a [**100-datapoint subset of Natural Questions**](https://huggingface.co/datasets/quotientai/labeled-natural-qa-random-100), which includes only queries and their corresponding answers, each evaluated by humans for correctness, clarity, and completeness. We'll use these as the ground truth answers to the queries.\n",
 "2. Use different **AI search engines** (Perplexity, Exa, and Gemini) to generate responses to the queries in the dataset.\n",
 "3. Use `judges` to evaluate the responses for **correctness** and **quality**.\n",
 "\n",
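
For reference, step 1 above amounts to pulling the relabeled subset straight from the Hugging Face Hub with the `datasets` library. The sketch below is not taken from the notebook: the `split="train"` argument and the printed fields are assumptions to verify against the dataset card.

```python
# Minimal sketch of step 1: load the 100-datapoint labeled Natural Questions subset.
# Assumption: the dataset exposes a "train" split; check the dataset card if not.
from datasets import load_dataset

dataset = load_dataset("quotientai/labeled-natural-qa-random-100", split="train")

print(dataset)     # feature names and row count (expected: 100 rows)
print(dataset[0])  # one query with its human-evaluated ground-truth answer
```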