Implement data validation step with basic check for duplicate rows #1088

Open
abelsiqueira wants to merge 2 commits into main from 461-create-validation-step

abelsiqueira
Member

@abelsiqueira abelsiqueira commented Mar 11, 2025

Create a data validation function to call various validation functions.
Call this data validation after initialising empty tables and before
creating the internal tables inside the create_internal_tables function.
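
As a rough illustration of the pattern described above (one entry point that calls several validation functions and reports everything at once), here is a minimal Python sketch. It is not the code in this PR: `validate_data`, the validator signature, and `example_validator` are hypothetical names chosen for the example, and only the exception name and message header are taken from the error output posted in this conversation.

```python
class DataValidationException(Exception):
    """Raised once, carrying every issue found by the validators."""

    def __init__(self, error_messages):
        self.error_messages = list(error_messages)
        lines = ["The following issues were found in the data:"]
        lines += [f"- {message}" for message in self.error_messages]
        super().__init__("\n".join(lines))


def validate_data(tables, validators):
    """Run each validator on the tables and raise if any issue is reported.

    Assumption: every validator takes the tables and returns a list of
    human-readable error messages (an empty list means no issues).
    """
    error_messages = []
    for validator in validators:
        error_messages.extend(validator(tables))
    if error_messages:
        raise DataValidationException(error_messages)


def example_validator(tables):
    """Hypothetical check used only to demonstrate the flow."""
    return [f"Table {name} failed the example check" for name in tables]


if __name__ == "__main__":
    try:
        validate_data({"asset_commission": []}, [example_validator])
    except DataValidationException as exc:
        print(exc)
```

Collecting all messages before raising means a user sees every problem in one run instead of fixing them one at a time.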

Related issues

Starting point for #461

Checklist

  • I am following the contributing guidelines
  • Tests are passing
  • Lint workflow is passing
  • Docs were updated and workflow is passing

@abelsiqueira abelsiqueira added the benchmark PR only - Run benchmark on PR label Mar 11, 2025
Contributor

github-actions bot commented Mar 11, 2025

Benchmark Results

| Benchmark (time) | a80eb21... | 57c68ab... | a80eb21... / 57c68ab... |
|---|---|---|---|
| energy_problem/create_model | 28.4 ± 2.4 s | 29.7 ± 3.6 s | 0.956 |
| energy_problem/input_and_constructor | 23.7 ± 0.092 s | 23.7 ± 0.18 s | 1 |
| time_to_load | 2.57 ± 0.0096 s | 2.55 ± 0.033 s | 1.01 |

| Benchmark (memory) | a80eb21... | 57c68ab... | a80eb21... / 57c68ab... |
|---|---|---|---|
| energy_problem/create_model | 0.196 G allocs: 11.2 GB | 0.196 G allocs: 11.2 GB | 1 |
| energy_problem/input_and_constructor | 26.7 M allocs: 0.943 GB | 26.7 M allocs: 0.943 GB | 1 |
| time_to_load | 0.159 k allocs: 11.2 kB | 0.159 k allocs: 11.2 kB | 1 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).


codecov bot commented Mar 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.65%. Comparing base (a80eb21) to head (57c68ab).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1088      +/-   ##
==========================================
+ Coverage   97.58%   97.65%   +0.06%     
==========================================
  Files          29       30       +1     
  Lines         952      980      +28     
==========================================
+ Hits          929      957      +28     
  Misses         23       23              


@abelsiqueira abelsiqueira force-pushed the 461-create-validation-step branch from 94bd338 to 06aee04 Compare March 11, 2025 19:44
@abelsiqueira abelsiqueira force-pushed the 461-create-validation-step branch from 06aee04 to 57c68ab Compare March 12, 2025 08:51
@abelsiqueira abelsiqueira marked this pull request as ready for review March 12, 2025 09:25
@abelsiqueira abelsiqueira requested a review from datejada March 12, 2025 09:25
@abelsiqueira
Member Author

@datejada, for the failing data you shared Monday, this is the result:

ERROR: DataValidationException: The following issues were found in the data:
- Table asset_commission has duplicate entries for (asset=DE_Onshore_Wind, commission_year=2050)
- Table asset_commission has duplicate entries for (asset=NL_Onshore_Wind, commission_year=2050)
- Table asset_commission has duplicate entries for (asset=DE_Offshore_Wind, commission_year=2050)
- Table asset_commission has duplicate entries for (asset=NL_Solar, commission_year=2050)
- Table asset_commission has duplicate entries for (asset=BE_Onshore_Wind, commission_year=2050)
- Table asset_commission has duplicate entries for (asset=BE_Offshore_Wind, commission_year=2050)
- Table asset_commission has duplicate entries for (asset=BE_Solar, commission_year=2050)
- Table asset_commission has duplicate entries for (asset=NL_Offshore_Wind, commission_year=2050)
- Table asset_commission has duplicate entries for (asset=DE_Solar, commission_year=2050)
- Table asset_milestone has duplicate entries for (asset=DE_Onshore_Wind, milestone_year=2050)
- Table asset_milestone has duplicate entries for (asset=NL_Onshore_Wind, milestone_year=2050)
- Table asset_milestone has duplicate entries for (asset=DE_Offshore_Wind, milestone_year=2050)
- Table asset_milestone has duplicate entries for (asset=NL_Solar, milestone_year=2050)
- Table asset_milestone has duplicate entries for (asset=BE_Onshore_Wind, milestone_year=2050)
- Table asset_milestone has duplicate entries for (asset=BE_Offshore_Wind, milestone_year=2050)
- Table asset_milestone has duplicate entries for (asset=BE_Solar, milestone_year=2050)
- Table asset_milestone has duplicate entries for (asset=NL_Offshore_Wind, milestone_year=2050)
- Table asset_milestone has duplicate entries for (asset=DE_Solar, milestone_year=2050)
- Table assets_rep_periods_partitions has duplicate entries for (asset=BE_Onshore_Wind, year=2050, rep_period=1)
- Table assets_rep_periods_partitions has duplicate entries for (asset=NL_Solar, year=2050, rep_period=1)
- Table assets_rep_periods_partitions has duplicate entries for (asset=DE_Onshore_Wind, year=2050, rep_period=1)
- Table assets_rep_periods_partitions has duplicate entries for (asset=NL_Onshore_Wind, year=2050, rep_period=1)
- Table assets_rep_periods_partitions has duplicate entries for (asset=BE_Offshore_Wind, year=2050, rep_period=1)
- Table assets_rep_periods_partitions has duplicate entries for (asset=DE_Offshore_Wind, year=2050, rep_period=1)
- Table assets_rep_periods_partitions has duplicate entries for (asset=NL_Offshore_Wind, year=2050, rep_period=1)
- Table assets_rep_periods_partitions has duplicate entries for (asset=BE_Solar, year=2050, rep_period=1)
- Table assets_rep_periods_partitions has duplicate entries for (asset=DE_Solar, year=2050, rep_period=1)
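
For readers who want to see how messages of this shape could be produced, here is a minimal Python sketch of a duplicate-row check over a set of key columns. It is an illustration under assumptions, not the implementation in this PR: the function name `find_duplicate_rows`, the list-of-dicts table representation, and the sample rows are all hypothetical. Only the message format is copied from the output above.

```python
from collections import Counter


def find_duplicate_rows(table_name, rows, key_columns):
    """Return one message per key combination that appears more than once.

    Assumption: `rows` is a list of dicts and `key_columns` lists the
    columns that should identify a row uniquely.
    """
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    messages = []
    for key, count in counts.items():
        if count > 1:
            description = ", ".join(
                f"{col}={value}" for col, value in zip(key_columns, key)
            )
            messages.append(
                f"Table {table_name} has duplicate entries for ({description})"
            )
    return messages


if __name__ == "__main__":
    # Made-up sample rows: the DE_Solar key appears twice.
    sample_rows = [
        {"asset": "DE_Solar", "commission_year": 2050},
        {"asset": "NL_Solar", "commission_year": 2050},
        {"asset": "DE_Solar", "commission_year": 2050},
    ]
    for message in find_duplicate_rows(
        "asset_commission", sample_rows, ["asset", "commission_year"]
    ):
        print(message)
```

Each duplicated key is reported once, however many times it repeats, which matches the one-line-per-key shape of the output above.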
