Implement data validation step with basic check for duplicate rows #1088
Benchmark Results
Benchmark Plots
A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1088      +/-   ##
==========================================
+ Coverage   97.58%   97.65%   +0.06%
==========================================
  Files          29       30       +1
  Lines         952      980      +28
==========================================
+ Hits          929      957      +28
  Misses         23       23
Create a data validation function to call various validation functions. Call this data validation after initialising empty tables and before creating the internal tables inside the create_internal_tables function. Starting point for #461
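The flow described in this commit message can be sketched roughly as follows. This is a minimal illustration in Python, not the code from this PR: the check-function interface (each check returns a list of message strings), the `tables` argument, and the placeholder steps inside `create_internal_tables` are all assumptions made for the example. Only the ordering (validation after initialising empty tables, before creating internal tables) and the exception name come from the conversation.

```python
class DataValidationException(Exception):
    """Raised once, carrying every issue reported by the checks."""

    def __init__(self, error_messages):
        self.error_messages = list(error_messages)
        details = "\n".join(f"- {msg}" for msg in self.error_messages)
        super().__init__(
            "The following issues were found in the data:\n" + details
        )


def validate_data(tables, checks):
    """Run every check and raise a single exception if any issue was found.

    `checks` is a hypothetical list of callables; each takes the tables and
    returns a list of human-readable messages (empty when nothing is wrong).
    """
    error_messages = []
    for check in checks:
        error_messages.extend(check(tables))
    if error_messages:
        raise DataValidationException(error_messages)


def create_internal_tables(tables, checks):
    """Placeholder pipeline that only demonstrates where validation sits."""
    steps = ["initialise empty tables"]
    validate_data(tables, checks)  # after initialisation, before creation
    steps.append("create internal tables")
    return steps
```

Collecting all messages before raising means a user sees every problem in one run instead of fixing them one at a time.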
@datejada, for the failing data you shared Monday, this is the result:
ERROR: DataValidationException: The following issues were found in the data:
- Table asset_commission has duplicate entries for (asset=DE_Onshore_Wind, commission_year=2050)
- Table asset_commission has duplicate entries for (asset=NL_Onshore_Wind, commission_year=2050)
- Table asset_commission has duplicate entries for (asset=DE_Offshore_Wind, commission_year=2050)
- Table asset_commission has duplicate entries for (asset=NL_Solar, commission_year=2050)
- Table asset_commission has duplicate entries for (asset=BE_Onshore_Wind, commission_year=2050)
- Table asset_commission has duplicate entries for (asset=BE_Offshore_Wind, commission_year=2050)
- Table asset_commission has duplicate entries for (asset=BE_Solar, commission_year=2050)
- Table asset_commission has duplicate entries for (asset=NL_Offshore_Wind, commission_year=2050)
- Table asset_commission has duplicate entries for (asset=DE_Solar, commission_year=2050)
- Table asset_milestone has duplicate entries for (asset=DE_Onshore_Wind, milestone_year=2050)
- Table asset_milestone has duplicate entries for (asset=NL_Onshore_Wind, milestone_year=2050)
- Table asset_milestone has duplicate entries for (asset=DE_Offshore_Wind, milestone_year=2050)
- Table asset_milestone has duplicate entries for (asset=NL_Solar, milestone_year=2050)
- Table asset_milestone has duplicate entries for (asset=BE_Onshore_Wind, milestone_year=2050)
- Table asset_milestone has duplicate entries for (asset=BE_Offshore_Wind, milestone_year=2050)
- Table asset_milestone has duplicate entries for (asset=BE_Solar, milestone_year=2050)
- Table asset_milestone has duplicate entries for (asset=NL_Offshore_Wind, milestone_year=2050)
- Table asset_milestone has duplicate entries for (asset=DE_Solar, milestone_year=2050)
- Table assets_rep_periods_partitions has duplicate entries for (asset=BE_Onshore_Wind, year=2050, rep_period=1)
- Table assets_rep_periods_partitions has duplicate entries for (asset=NL_Solar, year=2050, rep_period=1)
- Table assets_rep_periods_partitions has duplicate entries for (asset=DE_Onshore_Wind, year=2050, rep_period=1)
- Table assets_rep_periods_partitions has duplicate entries for (asset=NL_Onshore_Wind, year=2050, rep_period=1)
- Table assets_rep_periods_partitions has duplicate entries for (asset=BE_Offshore_Wind, year=2050, rep_period=1)
- Table assets_rep_periods_partitions has duplicate entries for (asset=DE_Offshore_Wind, year=2050, rep_period=1)
- Table assets_rep_periods_partitions has duplicate entries for (asset=NL_Offshore_Wind, year=2050, rep_period=1)
- Table assets_rep_periods_partitions has duplicate entries for (asset=BE_Solar, year=2050, rep_period=1)
- Table assets_rep_periods_partitions has duplicate entries for (asset=DE_Solar, year=2050, rep_period=1)
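Messages of the shape listed above could be produced by a grouped query over each table's key columns. The sketch below is a Python stand-in using the standard-library `sqlite3` module, not the implementation in this PR; the function name `find_duplicate_rows`, the choice of key columns, and the sample rows are invented for illustration.

```python
import sqlite3


def find_duplicate_rows(connection, table, key_columns):
    """Return one message per key combination that appears more than once.

    Assumes `table` and `key_columns` are trusted schema names, since they
    are interpolated directly into the SQL text.
    """
    columns = ", ".join(key_columns)
    query = (
        f"SELECT {columns} FROM {table} "
        f"GROUP BY {columns} HAVING COUNT(*) > 1"
    )
    messages = []
    for row in connection.execute(query):
        key = ", ".join(f"{col}={val}" for col, val in zip(key_columns, row))
        messages.append(f"Table {table} has duplicate entries for ({key})")
    return messages


# Sample rows made up for this example: one key repeated, one unique.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE asset_commission (asset TEXT, commission_year INTEGER)"
)
connection.executemany(
    "INSERT INTO asset_commission VALUES (?, ?)",
    [("DE_Solar", 2050), ("DE_Solar", 2050), ("NL_Solar", 2050)],
)
messages = find_duplicate_rows(
    connection, "asset_commission", ["asset", "commission_year"]
)
```

Grouping by the key columns reports each duplicated key once, however many times it is repeated in the table.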
Related issues
Starting point for #461
Checklist