Allow 1 or 3+ columns as input in the array feature selection #134

Closed
riley-harper opened this issue May 30, 2024 · 0 comments · Fixed by #135
Labels
type: feature A new feature or enhancement to a feature

Comments

@riley-harper
Contributor

Right now the array feature selection only allows combining exactly two input columns into an output column. To make this more flexible, we could support passing any number of columns, with a minimum of 1. This should be a small change in hlink/linking/core/transforms.py, where we unpack feature_selection["input_columns"] with

col1, col2 = feature_selection["input_columns"]

The pyspark.sql.functions.array() function, which we're already using, accepts a variable number of arguments.
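For illustration, here is a minimal sketch of what the relaxed unpacking could look like. It is not the actual change merged in #135. The helper names validate_input_columns and build_array_column are hypothetical, and the "output_column" key is assumed to be where the feature selection names its result. The pyspark import is kept inside the second function so that the validation logic can be exercised without Spark installed.

```python
def validate_input_columns(feature_selection: dict) -> list[str]:
    """Return the configured input columns, requiring at least one.

    Hypothetical helper: replaces `col1, col2 = feature_selection["input_columns"]`,
    which raises unless there are exactly two entries.
    """
    input_columns = list(feature_selection["input_columns"])
    if len(input_columns) < 1:
        raise ValueError(
            "the array feature selection needs at least one input column"
        )
    return input_columns


def build_array_column(feature_selection: dict):
    """Combine all input columns into a single array column expression.

    Requires pyspark and an active SparkSession. The "output_column" key
    is an assumption made for this sketch.
    """
    from pyspark.sql.functions import array

    input_columns = validate_input_columns(feature_selection)
    # array() takes a variable number of column names, so 1, 2, or 3+ all work.
    return array(*input_columns).alias(feature_selection["output_column"])
```

With something like this, a config listing one column or three columns would go through the same code path as the current two-column case.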

riley-harper added a commit that referenced this issue May 30, 2024
This includes some failing tests which provide 1 or 3 input columns instead of
just 2. #134 should make these tests pass.
riley-harper added a commit that referenced this issue May 30, 2024
…ctionality

Now this feature selection transform handles any number of columns, not just 2.
riley-harper added the "type: feature" label Dec 4, 2024