Skip to content

Subset arrays #411

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
wants to merge 17 commits into
base: dev
Choose a base branch
from
Open

Subset arrays #411

wants to merge 17 commits into from

Conversation

eodole
Copy link
Collaborator

@eodole eodole commented Apr 14, 2025

Addresses Issue #278

Implemented Take and subsample methods, however as discussed with Lars the requested squeeze functionality is essentially just the inverse of expand dims

@eodole eodole requested a review from LarsKue April 14, 2025 16:47
Copy link
Contributor

@LarsKue LarsKue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! The core of these changes already looks good, but there are some things I would like to see changed. Please also change the base branch of the PR to dev. See here on how to do that. EDIT: I could change it myself 🙂

@@ -39,3 +39,6 @@ docs/

# MacOS
.DS_Store

# Rproj
.Rproj.user
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unfamiliar with R. What is this directory used for, and should all other users have it ignored too? Otherwise, please put this in your local .git/info/exclude instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to @stefanradev93, this should be .Rproj

)
assert not np.all(
np.diff(output, axis=i) > 0
), f"is ordered along axis which is not meant to be ordered: {i}."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this is being reordered now, are you running ruff version 0.11.2?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apparently I was running ruff 0.8.1 but I will update it

def __init__(self):
super().__init__()

def forward(self, data: np.ndarray, sample_size: int, axis=-1):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the sample_size should be part of the constructor for this transform. It seems like a bit of a hassle to have to pass this argument in the forward call of the Adapter, unless you have a specific use case in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also allow it to be a float in [0, 1] specifying a proportion of the sample to subsample.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second this, but it raises the concern of whether we should floor or ceil the resulting value. I am thinking ceiling should be better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what case do you imagine for use of ceiling over floor? For most applications I assume users would want a slightly smaller rather than larger sample size.

@LarsKue LarsKue changed the base branch from main to dev April 15, 2025 15:43
@LarsKue LarsKue added feature New feature or request user interface Changes to the user interface and improvements in usability good first issue Good for first-time contributors labels Apr 15, 2025
@LarsKue LarsKue moved this from Future to In Progress in bayesflow development Apr 15, 2025
Copy link
Collaborator Author

@eodole eodole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm honestly not sure that this should be a map transform as written, one consequence of the way its written now is that all keys specified by this transform will have random subsamples of the same size from the same axis. I think that all datasets that a user wants to subsample should be specified individually, so that axis and sample size are specified individually. If this is the case, would it not be better to force the map transform to reject a sequence of keys but rather only take one key?

@eodole
Copy link
Collaborator Author

eodole commented Apr 22, 2025

You also asked me to rename subsample_array to random_subsample my question is should all associated files also be renamed?

@stefanradev93 stefanradev93 deleted the branch bayesflow-org:dev April 22, 2025 14:37
@github-project-automation github-project-automation bot moved this from In Progress to Done in bayesflow development Apr 22, 2025
@LarsKue
Copy link
Contributor

LarsKue commented Apr 22, 2025

This was accidentally closed. We will investigate how to restore the branch and reopen PRs.

@LarsKue LarsKue reopened this Apr 22, 2025
@LarsKue
Copy link
Contributor

LarsKue commented Apr 22, 2025

Yes, please rename all associated files so the structure of file_name.py::class_or_function_name is consistent.

to force the map transform to reject a sequence of keys

In this case, I think we would want to have a regular Transform. It would also be fine to implement it as an ElementwiseTransform and wrap it in a MapTransform internally, but then the dispatch method on the adapter, i.e., adapter.random_subsample should raise an error if a Sequence of keys is passed.

@LarsKue LarsKue self-requested a review May 2, 2025 17:19
Copy link
Contributor

@LarsKue LarsKue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing my previous comments. Before we can merge this into dev, it looks like you will need to resolve merge-conflicts by merging dev into this first. If you need help with that, let me know.





Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These empty lines would be removed by the formatter, which should automatically run on pre-commit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i made a new environment based on the contribution.md
and now the linter runs automatically. on my end the most recent commit it passed all checks so im not sure why it's failing on the pull request

Copy link

codecov bot commented May 6, 2025

Codecov Report

Attention: Patch coverage is 42.85714% with 28 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
bayesflow/adapters/transforms/random_subsample.py 36.36% 14 Missing ⚠️
bayesflow/adapters/adapter.py 25.00% 9 Missing ⚠️
bayesflow/adapters/transforms/take.py 61.53% 5 Missing ⚠️
Files with missing lines Coverage Δ
bayesflow/adapters/transforms/__init__.py 100.00% <100.00%> (ø)
bayesflow/adapters/transforms/take.py 61.53% <61.53%> (ø)
bayesflow/adapters/adapter.py 63.34% <25.00%> (-21.35%) ⬇️
bayesflow/adapters/transforms/random_subsample.py 36.36% <36.36%> (ø)

... and 157 files with indirect coverage changes

@eodole
Copy link
Collaborator Author

eodole commented May 6, 2025

I have no idea why the linter failed, i have it in the precommit hook and i merged dev into this branch. Im not really sure why it doesnt pass

@eodole eodole requested a review from LarsKue May 6, 2025 12:29
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
feature New feature or request good first issue Good for first-time contributors user interface Changes to the user interface and improvements in usability
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants