feat(llmobs): llm datasets and experiments #12918

jjxct · 2025-03-26T20:54:37Z

Checklist

PR author has checked that all the criteria below are met
The PR description includes an overview of the change
The PR description articulates the motivation for the change
The change includes tests OR the PR description describes a testing strategy
The PR description notes risks associated with the change, if any
Newly-added code is easy to change
The change follows the library release note guidelines
The change includes or references documentation updates if necessary
Backport labels are set (if applicable)

Reviewer Checklist

Reviewer has checked that all the criteria below are met
Title is accurate
All changes are related to the pull request's stated goal
Avoids breaking API changes
Testing strategy adequately addresses listed risks
Newly-added code is easy to change
Release note makes sense to a user of the library
If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
Backport labels are set in a manner that is consistent with the release branch maintenance policy

github-actions · 2025-03-26T21:17:01Z

CODEOWNERS have been resolved as:

ddtrace/llmobs/experimentation/__init__.py                              @DataDog/ml-observability
ddtrace/llmobs/experimentation/_config.py                               @DataDog/ml-observability
ddtrace/llmobs/experimentation/_dataset.py                              @DataDog/ml-observability
ddtrace/llmobs/experimentation/_decorators.py                           @DataDog/ml-observability
ddtrace/llmobs/experimentation/_experiment.py                           @DataDog/ml-observability
ddtrace/llmobs/experimentation/utils/_exceptions.py                     @DataDog/ml-observability
ddtrace/llmobs/experimentation/utils/_http.py                           @DataDog/ml-observability
ddtrace/llmobs/experimentation/utils/_ui.py                             @DataDog/ml-observability
tests/llmobs/experiments_cassettes/test_dataset_pull.yaml               @DataDog/ml-observability
tests/llmobs/experiments_cassettes/test_dataset_pull_dne.yaml           @DataDog/ml-observability
tests/llmobs/test_experimentation_config.py                             @DataDog/ml-observability
tests/llmobs/test_experimentation_dataset.py                            @DataDog/ml-observability
tests/llmobs/test_experimentation_decorators.py                         @DataDog/ml-observability
tests/llmobs/test_experimentation_experiment.py                         @DataDog/ml-observability
ddtrace/llmobs/_constants.py                                            @DataDog/ml-observability
ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
ddtrace/llmobs/_utils.py                                                @DataDog/ml-observability
ddtrace/llmobs/_writer.py                                               @DataDog/ml-observability
tests/llmobs/test_utils.py                                              @DataDog/ml-observability

ddtrace/llmobs/experimentation/_config.py

+# Derived values
+def get_api_base_url() -> str:
+    """Get the base URL for API requests."""
+    if get_site().endswith("datadoghq.com"):


To fix the problem, parse the site value as a URL and check the extracted hostname, rather than calling endswith on the raw string. The urlparse function from the urllib.parse module extracts the hostname, so the check applies to the host itself and not to an arbitrary part of the string.

To make this fix without changing existing functionality, modify the get_api_base_url and get_base_url functions to extract and validate the hostname with urlparse.
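
A minimal sketch of the difference, assuming the site value is a bare host such as us3.datadoghq.com. The names hostname_of and host_in_domain are hypothetical helpers, not part of ddtrace. A plain suffix test on the parsed hostname would still accept a look-alike host such as evildatadoghq.com, so this sketch also requires a dot boundary; that stricter check is an addition here, not something the suggestion above proposes.

```python
from typing import Optional
from urllib.parse import urlparse


def hostname_of(site: str) -> Optional[str]:
    """Return the lowercased hostname for a bare site value (hypothetical helper)."""
    return urlparse(f"https://{site}").hostname


def host_in_domain(hostname: Optional[str], domain: str) -> bool:
    """True if hostname is the domain itself or a subdomain of it (hypothetical helper)."""
    if not hostname:
        return False
    # The leading dot keeps look-alike hosts such as "evildatadoghq.com" out.
    return hostname == domain or hostname.endswith("." + domain)


# A crafted value whose query string ends with the allowed domain.
crafted = "evil.example/?x=datadoghq.com"

raw_check = crafted.endswith("datadoghq.com")                        # passes on the raw string
parsed_check = host_in_domain(hostname_of(crafted), "datadoghq.com")  # fails on the real host
```

With this split, the raw string check accepts the crafted value while the hostname check rejects it, because urlparse places everything after the first slash in the path and query rather than in the host.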

ddtrace/llmobs/experimentation/_config.py

+    """Get the base URL for API requests."""
+    if get_site().endswith("datadoghq.com"):
+        return f"https://api.{get_site()}"
+    elif get_site().endswith("datad0g.com"):


To fix the problem, we need to ensure that the URL is properly parsed and its hostname is checked in a secure manner. Instead of using a simple string comparison, we should use the urlparse function from the urllib.parse module to extract the hostname and then perform the check.

Parse the URL using urlparse to extract the hostname.

Check if the hostname ends with the allowed domain.

Update the get_api_base_url and get_base_url functions to use this approach.
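
The three steps above might look roughly like this. It is a sketch under assumptions, not the code in the PR: api_base_url takes the site as a parameter (the real get_api_base_url reads it from get_site()), _in_domain is a hypothetical helper, and raising ValueError for an unrecognized site is an assumed fallback because the original fallthrough branch is not shown.

```python
from typing import Optional
from urllib.parse import urlparse


def _in_domain(hostname: Optional[str], domain: str) -> bool:
    """Hypothetical helper: exact match or subdomain match on a parsed hostname."""
    return bool(hostname) and (hostname == domain or hostname.endswith("." + domain))


def api_base_url(site: str) -> str:
    """Sketch of get_api_base_url with the site passed in explicitly."""
    # Step 1: parse the value and extract the hostname.
    hostname = urlparse(f"https://{site}").hostname
    # Step 2: check the hostname against each allowed domain.
    if _in_domain(hostname, "datadoghq.com"):
        # Build the URL from the parsed hostname, not from the raw input.
        return f"https://api.{hostname}"
    if _in_domain(hostname, "datad0g.com"):
        return "https://dd.datad0g.com"
    # Assumed fallback: the original behaviour for other sites is not shown.
    raise ValueError(f"Unsupported site: {site!r}")
```

Building the returned URL from the parsed hostname, as the second Autofix diff does, keeps any path or query text in the input out of the result.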

ddtrace/llmobs/experimentation/_config.py

+    """Get the base URL for the LLM Observability UI."""
+    if get_site() == "datadoghq.com":
+        return "https://app.datadoghq.com"
+    elif get_site().endswith("datadoghq.com"):


To fix the problem, we need to ensure that the URL is properly parsed and the hostname is validated correctly. We will use the urlparse function from the urllib.parse module to parse the URL and then check if the hostname ends with the allowed domain. Checking the parsed hostname prevents a crafted value from passing the check through its path or query string.

We will modify the get_base_url function to use urlparse for parsing the URL and validating the hostname.

github-actions · 2025-03-26T21:35:25Z

Bootstrap import analysis

Comparison of import times between this PR and main.

Summary

The average import time in this PR is: 228 ± 2 ms.

The average import time in main is: 230 ± 2 ms.

The import time difference between this PR and main is: -2.17 ± 0.08 ms.

Import time breakdown

The following import paths have shrunk:

ddtrace.auto 1.962 ms (0.86%)

ddtrace.bootstrap.sitecustomize 1.296 ms (0.57%)

ddtrace.bootstrap.preload 1.296 ms (0.57%)

ddtrace.internal.products 1.296 ms (0.57%)

ddtrace.internal.remoteconfig.client 0.632 ms (0.28%)

ddtrace 0.666 ms (0.29%)

pr-commenter · 2025-03-26T21:55:28Z

Benchmarks

Benchmark execution time: 2025-03-27 19:10:23

Comparing candidate commit a6267c1 in PR branch llm-experiments with baseline commit 03d8cf9 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 498 metrics, 2 unstable metrics.

datadog-datadog-prod-us1 · 2025-03-27T17:51:03Z

ddtrace/llmobs/experimentation/_dataset.py

+        else:
+            record = self._data[index].copy()
+            record.pop("record_id", None)
+            return record


⚪ Code Quality Violation

else is not necessary since the if clause has a return

If the code in the if branch returns a value, do not have the else branch present.
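
As an illustration of the rule, here is a hedged sketch of a __getitem__ written without the trailing else. RecordStore and _strip_id are hypothetical names, and the slice branch is an assumption, since only the else branch of the original method is visible in the review comment.

```python
from typing import Any, Dict, List, Union

Record = Dict[str, Any]


class RecordStore:
    """Hypothetical stand-in for a dataset that hides internal record IDs."""

    def __init__(self, data: List[Record]) -> None:
        self._data = list(data)

    @staticmethod
    def _strip_id(record: Record) -> Record:
        # Copy first so the stored record keeps its "record_id".
        cleaned = record.copy()
        cleaned.pop("record_id", None)
        return cleaned

    def __getitem__(self, index: Union[int, slice]) -> Union[Record, List[Record]]:
        if isinstance(index, slice):
            return [self._strip_id(r) for r in self._data[index]]
        # No else needed: the branch above has already returned.
        return self._strip_id(self._data[index])


store = RecordStore([
    {"record_id": "a", "input": 1},
    {"record_id": "b", "input": 2},
])
first = store[0]
both = store[0:2]
```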

jjxct and others added 30 commits October 24, 2024 17:46

Add main classes for experiments sdk

b4d6082

Added more things but don't remember what

f9e9296

Add network calls for main methods

60f3ba5

Add docstring

d48942d

Format code

88f05d3

Add custom exception classes

e73a897

Move code to another directory

f8c9ef0

Change experiments module export

59577e1

Use f strings

402d402

Decouple running from evaluating

2c281c5

Change parametrize function to make it simpler

0e421da

Add test file, export the top level classes

173d2ae

fmt

044d696

Simplify http client, remove stdout printing

ac634fa

fmt

d29f081

more stdout cleanup, http status code checking

dc119d0

Add feedback from sync

f018298

Add error handling on tasks

351cd7a

fix import

2608ba5

docstring

5cbfd70

Custom Exception classes

bed1261

Merge remote-tracking branch 'origin/main' into jonathan.chavez/llm-experiments

0cbc487

handle duration errors

0928224

more stuff

cac1476

Merge branch 'jonathan.chavez/llm-experiments' of github.com:DataDog/dd-trace-py into jonathan.chavez/llm-experiments

436b1b6

support polymorphic i/o

9024e14

structure changes

a228c30

modifications to types

b29fa1d

remove unnecessary comments

738cc07

fix code quality violations

1059172

jjxct added 18 commits March 3, 2025 13:55

remove mistake

775f8a6

add line

19b3054

remove two tag methods

8639115

black formatted and type hints

3925af0

clean docstrings

ed884cf

clean up constants

86ee54b

improve file structure

6311541

stable, fixed file references

e10c71f

black formatting

75520b5

version prop mismatch - aritra

d99b761

clean up run functions and flush every 2

0207ba4

links are accurate

1eec6c5

add support for dataset mutations

1e7c1a9

pt stash

2f267a0

summary metrics and updated config

73520dc

add support for multiple dcs

4aafd9e

Remove .DS_Store file

483937c

remove comments

d383db7

jjxct requested a review from a team as a code owner March 26, 2025 20:54

Merge main into llm-experiments branch and resolve conflicts

c2c8c7c

github-advanced-security bot found potential problems Mar 26, 2025

improve push method

fdf7592

datadog-datadog-prod-us1 bot reviewed Mar 27, 2025

ddtrace/llmobs/experimentation/_dataset.py (outdated)

ddtrace/llmobs/experimentation/_dataset.py

jjxct added 2 commits March 27, 2025 13:26

refactor push functions

c9a3c05

slices access

a6267c1

datadog-datadog-prod-us1 bot reviewed Mar 27, 2025

increase wait times

9666209

@@ -28,5 +28,8 @@
                 """Get the base URL for API requests."""
-                if get_site().endswith("datadoghq.com"):
-                    return f"https://api.{get_site()}"
-                elif get_site().endswith("datad0g.com"):
+                from urllib.parse import urlparse
+                site = get_site()
+                hostname = urlparse(f"https://{site}").hostname
+                if hostname and hostname.endswith("datadoghq.com"):
+                    return f"https://api.{site}"
+                elif hostname and hostname.endswith("datad0g.com"):
                     return "https://dd.datad0g.com"
@@ -36,7 +39,9 @@
                 """Get the base URL for the LLM Observability UI."""
-                if get_site() == "datadoghq.com":
+                site = get_site()
+                hostname = urlparse(f"https://{site}").hostname
+                if hostname == "datadoghq.com":
                     return "https://app.datadoghq.com"
-                elif get_site().endswith("datadoghq.com"):
-                    return f"https://{get_site()}"
-                elif get_site() == "datad0g.com":
+                elif hostname and hostname.endswith("datadoghq.com"):
+                    return f"https://{site}"
+                elif hostname == "datad0g.com":
                     return "https://dd.datad0g.com"

@@ -26,7 +26,11 @@
             # Derived values
+            from urllib.parse import urlparse
             def get_api_base_url() -> str:
                 """Get the base URL for API requests."""
-                if get_site().endswith("datadoghq.com"):
-                    return f"https://api.{get_site()}"
-                elif get_site().endswith("datad0g.com"):
+                site = get_site()
+                hostname = urlparse(f"https://{site}").hostname
+                if hostname and hostname.endswith("datadoghq.com"):
+                    return f"https://api.{hostname}"
+                elif hostname and hostname.endswith("datad0g.com"):
                     return "https://dd.datad0g.com"
@@ -36,7 +40,9 @@
                 """Get the base URL for the LLM Observability UI."""
-                if get_site() == "datadoghq.com":
+                site = get_site()
+                hostname = urlparse(f"https://{site}").hostname
+                if hostname == "datadoghq.com":
                     return "https://app.datadoghq.com"
-                elif get_site().endswith("datadoghq.com"):
-                    return f"https://{get_site()}"
-                elif get_site() == "datad0g.com":
+                elif hostname and hostname.endswith("datadoghq.com"):
+                    return f"https://{hostname}"
+                elif hostname == "datad0g.com":
                     return "https://dd.datad0g.com"

@@ -1,3 +1,3 @@
             import os
+            from urllib.parse import urlparse
             from .._llmobs import LLMObs
@@ -36,7 +36,9 @@
                 """Get the base URL for the LLM Observability UI."""
-                if get_site() == "datadoghq.com":
+                site = get_site()
+                if site == "datadoghq.com":
                     return "https://app.datadoghq.com"
-                elif get_site().endswith("datadoghq.com"):
-                    return f"https://{get_site()}"
-                elif get_site() == "datad0g.com":
+                parsed_url = urlparse(f"https://{site}")
+                if parsed_url.hostname and parsed_url.hostname.endswith("datadoghq.com"):
+                    return f"https://{site}"
+                elif site == "datad0g.com":
                     return "https://dd.datad0g.com"
