added some flexibility to create your custom benchmark splits #307
Conversation
Here are my comments:
This is an API change for users; ideally it would be best to avoid that. Also, the method's behavior is quite different depending on whether task_splits is provided or not, which is a bit confusing IMO. Why not implement a new explicit method, like subset_from_custom_splits() or subset_from_task_list()? This would solve both problems.
Otherwise LGTM. I'll let you decide whether to keep it like this or not @optimass :)
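
For concreteness, here is a self-contained sketch of what such a separate explicit method could look like, using a toy Benchmark dataclass. The task_name column, the dataclass layout, and the subsetting logic are all assumptions for illustration, not the project's actual API:

```python
from dataclasses import dataclass, replace

import pandas as pd


@dataclass
class Benchmark:
    """Toy stand-in for the real benchmark class, only to illustrate the API shape."""

    name: str
    task_metadata: pd.DataFrame  # assumed to contain a "task_name" column

    def subset_from_task_list(
        self, task_names: list[str], benchmark_name_suffix: str = "custom"
    ) -> "Benchmark":
        """Return a new benchmark restricted to an explicit list of task names,
        leaving subset_from_split untouched."""
        unknown = set(task_names) - set(self.task_metadata["task_name"])
        if unknown:
            raise ValueError(f"Unknown task names: {sorted(unknown)}")
        subset = self.task_metadata[self.task_metadata["task_name"].isin(task_names)]
        return replace(
            self,
            name=f"{self.name}_{benchmark_name_suffix}",
            task_metadata=subset.reset_index(drop=True),
        )


# Example usage with made-up task names:
bench = Benchmark("toy_bench", pd.DataFrame({"task_name": ["task_a", "task_b", "task_c"]}))
custom = bench.subset_from_task_list(["task_a", "task_c"], benchmark_name_suffix="my_split")
```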
```python
if split_column not in self.task_metadata.columns:
    raise NotImplementedError(
        f"This benchmark does not provide default train/valid/test splits (missing a {repr(split_column)} column in task metadata)"
    )

def subset_from_split(
```
It seems like it would be a much better fit to have two separate functions instead of one function with an if and two completely separate code paths. I would keep the original subset_from_split function untouched and add a subset_from_task_list(task_names: list[str], benchmark_name_suffix: str) function.
But technically, you could achieve exactly this with subset_from_regex by passing a long regex. subset_from_task_list could be a wrapper that generates the regex; that might be more convenient than having to specify each task name exactly, roughly as sketched below.
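
A minimal sketch of that wrapper, assuming subset_from_regex takes a regex pattern and a benchmark name suffix (the real signature should be double-checked); re.escape keeps task names containing regex metacharacters from being misread:

```python
import re


def subset_from_task_list(self, task_names: list[str], benchmark_name_suffix: str = "custom"):
    # Anchored alternation so the pattern matches exactly the listed task names and nothing else.
    pattern = "^(?:" + "|".join(re.escape(name) for name in task_names) + ")$"
    # The subset_from_regex signature is assumed here; adapt it to the actual method.
    return self.subset_from_regex(pattern, benchmark_name_suffix)
```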
In your code, task_splits is only used as task_splits[split], so instead of creating a dict, just use a list directly.