Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

OPIK-71 Resolve dataset name in experiment endpoints #211

Merged
merged 2 commits into from
Sep 11, 2024

Conversation

andrescrz
Copy link
Collaborator

@andrescrz andrescrz commented Sep 10, 2024

Details

Applies to the two existing endpoints:

  • Get experiment by ID.
  • Find experiments.

Datasets are stored in MySQL whereas experiments are in ClickHouse, so there's going to be a performance penalty no matter the implementation.

Implemented the dataset name retrieval in a single DB call in both cases, so there's a single MySQL query per each ClickHouse query.

Nevertheless, it should be fine in production for both endpoints:

  • Get experiment by ID: No concern as the size of the response is very small, only one experiment and therefore one dataset is returned.
  • Find experiments: it's a paginated call and meant to be used by the UI. As long as the page size has a resonable size, it should be fine. That should always be the case for a UI usage.

Issues

OPIK-71

Testing

  • Updated and verified integration tests.

Documentation

N/A

@andrescrz andrescrz self-assigned this Sep 10, 2024
@andrescrz andrescrz added the enhancement New feature or request label Sep 10, 2024
@andrescrz andrescrz marked this pull request as ready for review September 10, 2024 16:58
@andrescrz andrescrz requested a review from a team as a code owner September 10, 2024 16:58

return dao.findById(id, workspaceId).orElseThrow(this::createNotFoundError);
@Override
public List<Dataset> findByIds(@NonNull Set<UUID> ids, @NonNull String workspaceId) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return only the name and ID? This probably would reduce the "overhead"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an option, but I'd rather have and generic and re-usable method over something where we don't know the performance impact yet and what would be the improvement by following this approach.

We can always change this in the future.

@andrescrz andrescrz merged commit 3163f2f into main Sep 11, 2024
2 checks passed
@andrescrz andrescrz deleted the andrescrz/OPIK-71-resolve-dataset-name-in-experiments branch September 11, 2024 08:16
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants