Improve hub_validations print method #116

Merged


annakrystalli

In this PR I've gone a step further on improving the hub_validations print method:

  • hub_validations class object combine() method now ensures that check names are made unique across all hub_validations objects being combined.
  • Additional improvements to hub_validations class object print() method.
    • Check results for each file validated are now split and printed under file name header.
    • The check name that can be used to access the check result from the hub_validations object is now included as the prefix to the check result message instead of the file name (Add name of tests to print out #76).
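The name de-duplication described in the first bullet can be sketched in base R. This is only an illustration, not the package's actual `combine()` method: `combine_checks()` is a hypothetical helper, the check results are plain strings rather than real check objects, and the `_1` suffix style is assumed from the output shown in this PR.

```r
# Hypothetical stand-in for combining check lists; NOT the package's combine() method.
combine_checks <- function(...) {
  checks <- c(...)  # concatenate the named lists, keeping duplicate names
  # make.unique() appends _1, _2, ... to repeated names (base R)
  names(checks) <- make.unique(names(checks), sep = "_")
  checks
}

# Two files validated with the same set of checks (placeholder results)
file_a <- list(file_exists = "ok", file_name = "ok")
file_b <- list(file_exists = "ok", file_name = "ok")

combined <- combine_checks(file_a, file_b)
names(combined)
```

With unique names, each result can be retrieved unambiguously, for example `combined[["file_exists_1"]]` for the second file.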

This makes the printout much more streamlined and informative. Here's example output:

── mod_del_hub ────

✔ [valid_config]: All hub config files are valid.

── 2022-10-08-hub-baseline.csv ────

ⓧ [model_output_mod]: Previously submitted model output files must not be modified.
  model-output/hub-baseline/2022-10-08-hub-baseline.csv modified.
✔ [file_exists]: File exists at path model-output/hub-baseline/2022-10-08-hub-baseline.csv.
✔ [file_name]: File name "2022-10-08-hub-baseline.csv" is valid.
✔ [file_location]: File directory name matches `model_id` metadata in file name.
✔ [round_id_valid]: `round_id` is valid.
✔ [file_format]: File is accepted hub format.
✔ [metadata_exists]: Metadata file exists at path model-metadata/hub-baseline.yml.
✔ [file_read]: File could be read successfully.
✔ [valid_round_id_col]: `round_id_col` name is valid.
✔ [unique_round_id]: `round_id` column "origin_date" contains a single, unique round ID
  value.
✔ [match_round_id]: All `round_id_col` "origin_date" values match submission `round_id`
  from file name.
✔ [colnames]: Column names are consistent with expected round task IDs and std column
  names.
✔ [col_types]: Column data types match hub schema.
✔ [valid_vals]: `tbl` contains valid values/value combinations.
✔ [rows_unique]: All combinations of task ID column/`output_type`/`output_type_id` values
  are unique.
✔ [req_vals]: Required task ID/output type/output type ID combinations all present.
✔ [value_col_valid]: Values in column `value` all valid with respect to modeling task
  config.
✔ [value_col_non_desc]: Values in `value` column are non-decreasing as output_type_ids
  increase for all unique task ID value/output type combinations of quantile or cdf output
  types.
ℹ [value_col_sum1]: No pmf output types to check for sum of 1. Check skipped.
ℹ [spl_compound_taskid_set]: No v3 samples found in model output data to check. Skipping
  `check_tbl_spl_compound_taskid_set` check.
ℹ [spl_compound_tid]: No v3 samples found in model output data to check. Skipping
  `check_tbl_spl_compound_tid` check.
ℹ [spl_non_compound_tid]: No v3 samples found in model output data to check. Skipping
  `check_tbl_spl_non_compound_tid` check.
ℹ [spl_n]: No v3 samples found in model output data to check. Skipping `check_tbl_spl_n`
  check.

── 2022-10-15-team1-goodmodel.csv ────

ⓧ [model_output_mod_1]: Previously submitted model output files must not be removed.
  model-output/team1-goodmodel/2022-10-15-team1-goodmodel.csv removed.

── team1-goodmodel.yaml ────

ⓧ [model_metadata_mod]: Previously submitted model metadata files must not be removed.
  model-metadata/team1-goodmodel.yaml removed.

── 2022-10-22-team1-goodmodel.csv ────

✔ [file_exists_1]: File exists at path
  model-output/team1-goodmodel/2022-10-22-team1-goodmodel.csv.
✔ [file_name_1]: File name "2022-10-22-team1-goodmodel.csv" is valid.
✔ [file_location_1]: File directory name matches `model_id` metadata in file name.
✔ [round_id_valid_1]: `round_id` is valid.
✔ [file_format_1]: File is accepted hub format.
ⓧ [metadata_exists_1]: Metadata file does not exist at path
  model-metadata/team1-goodmodel.yml or model-metadata/team1-goodmodel.yaml.
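
The layout above (one header per file, each message prefixed with its check name) can be approximated with a small base R sketch. It is an assumption-laden illustration rather than the real `print()` method: `format_by_file()` and the `file`, `ok` and `msg` fields are hypothetical names chosen for this example.

```r
# Hypothetical check results; the field names (file, ok, msg) are assumptions.
checks <- list(
  file_exists   = list(file = "a.csv", ok = TRUE,  msg = "File exists."),
  file_name     = list(file = "a.csv", ok = TRUE,  msg = "File name is valid."),
  file_exists_1 = list(file = "b.csv", ok = FALSE, msg = "File does not exist.")
)

# Group results by file and prefix each message with its check name.
format_by_file <- function(checks) {
  files <- vapply(checks, function(x) x$file, character(1))
  out <- character(0)
  for (f in unique(files)) {
    out <- c(out, sprintf("-- %s ----", f))  # file name header
    for (nm in names(checks)[files == f]) {
      mark <- if (checks[[nm]]$ok) "v" else "x"  # pass / fail marker
      out <- c(out, sprintf("%s [%s]: %s", mark, nm, checks[[nm]]$msg))
    }
  }
  out
}

lines <- format_by_file(checks)
cat(lines, sep = "\n")
```

Because the bracketed prefix is the list name itself, a reader can copy it straight from the printout to index into the results object.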

@annakrystalli annakrystalli linked an issue Sep 3, 2024 that may be closed by this pull request

@zkamvar zkamvar left a comment

I think this is a wonderful change! I think this definitely clarifies and de-clutters the output 🧹 and makes it clear where exactly these errors are coming from.

As always, I have non-blocking suggestions below along with this suggestion:

When I look at a failing check, my first instinct is to use ?check_[check name] to find details about that particular unit check. Because the model output checks are all prefixed with tbl_, I stumbled a bit. Could we add documentation aliases for the check_tbl_* functions to be check_* (e.g. #' @aliases check_colnames for check_tbl_colnames)?
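
For illustration, a roxygen2 alias of the kind suggested here might look like the sketch below. The function `check_tbl_colnames_demo()` is a hypothetical stand-in with a made-up body and signature, not the package's real `check_tbl_colnames()`; only the `@aliases` tag is the point of the example.

```r
#' Check column names (hypothetical stand-in, not the package's real function)
#'
#' @param tbl A data frame of model output.
#' @param expected Character vector of expected column names.
#' @aliases check_colnames
#' @return `TRUE` if the column names match `expected` in any order.
check_tbl_colnames_demo <- function(tbl, expected) {
  setequal(names(tbl), expected)
}

check_tbl_colnames_demo(data.frame(a = 1, b = 2), c("b", "a"))
```

Once the documentation is regenerated with roxygen2, the `@aliases` tag adds an extra `\alias{}` entry to the help file, so `?check_colnames` would open the same page.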

@annakrystalli

When I look at a failing check, my first instinct is to use ?check_[check name] to find details about that particular unit check. Because the model output checks are all prefixed with tbl_, I stumbled a bit. Could we add documentation aliases for the check_tbl_* functions to be check_* (e.g. #' @aliases check_colnames for check_tbl_colnames)?

There are reasons for the prefixes: they indicate which higher-level functions are running the tests and on what aspect (i.e. on the file? on the metadata? on the contents of the file? etc.). In terms of matching to function docs, I wouldn't want to add aliases that are a mishmash of function names and check names but don't actually match either correctly. So if we must add aliases, I would prefer to add the actual check name.

However, there are tables with details of each check at the bottom of the help file for each higher-level validation function (e.g. see the help file for validate_submission()), and it's pretty easy to tie the check name to the function too, so I don't think aliases are strictly necessary. But feel free to open an issue if you think it would really help.

@annakrystalli annakrystalli merged commit 4faebe9 into ak/change-check-fail-class-and-print/111 Sep 5, 2024
@annakrystalli annakrystalli deleted the ak/hubval-print branch September 5, 2024 07:58