Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Working to add checks for duplicated comparison features, feature selection, and column mappings #113

Merged
merged 8 commits into from
Oct 30, 2023

Conversation

jrbalch543
Copy link

This is meant to address one of the GitHub issues. This adds checks for duplicates in those three sections to hopefully prevent against errors later on.

Do these checks look like enough?

@jrbalch543 jrbalch543 marked this pull request as draft October 26, 2023 14:47
@jrbalch543
Copy link
Author

jrbalch543 commented Oct 26, 2023

For some reason, this isn't letting me make it a draft right now...but I am adding additional tests. Also, no clue why I made the name of the branch same_output.
So this is a draft, I haven't forgotten that.

@jrbalch543 jrbalch543 marked this pull request as ready for review October 26, 2023 15:34
@jrbalch543
Copy link
Author

Ok, now it's ready

@riley-harper
Copy link
Contributor

riley-harper commented Oct 26, 2023

This is for issue #50.

Copy link
Contributor

@riley-harper riley-harper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking mostly good! I left a few comments.

I think that pipeline_features could use a similar check. That wasn't mentioned in the issue, sorry about that.

In the error messages I would add some more information. I would include

  • the name of the section, like [[comparison_features]]
  • short instructions on the problem and how to fix it, maybe like "duplicate aliases are not allowed. Please edit your config file to use unique aliases for each comparison feature."
  • a comma-separated list of the offending aliases or column names, not in Python format

hlink/spark/session.py Outdated Show resolved Hide resolved
hlink/scripts/lib/conf_validations.py Outdated Show resolved Hide resolved
hlink/tests/conf_validations_test.py Show resolved Hide resolved
hlink/scripts/lib/conf_validations.py Outdated Show resolved Hide resolved
@jrbalch543
Copy link
Author

I'm realizing I didn't actually address the big top comment and only the little ones

hlink/scripts/lib/conf_validations.py Outdated Show resolved Hide resolved
hlink/scripts/lib/conf_validations.py Outdated Show resolved Hide resolved
balch027 added 2 commits October 26, 2023 17:28
@riley-harper
Copy link
Contributor

Doing some more testing of this this morning. I think we may need a few more changes. Not ready to merge yet.

Copy link
Contributor

@riley-harper riley-harper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I was wrong in one of my comments, sorry.

comparison_features are allowed to overwrite column_mappings. We do this often with the fetch_a comparison feature. So the comparison_features check should just be for duplicate aliases.

The column_mappings code still looks good. We can keep the stricter checking for feature_selections too, I think. I will double-check that.

@jrbalch543
Copy link
Author

Just updated! Is that back to what you're expecting?

Copy link
Contributor

@riley-harper riley-harper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, looking good now, thanks!

@riley-harper riley-harper merged commit c67a6ca into ipums:main Oct 30, 2023
@jrbalch543 jrbalch543 deleted the same_output branch October 30, 2023 14:38
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants