Improve `hub_validations` print method #116

annakrystalli · 2024-09-03T12:45:24Z

In this PR I've gone a step further on improving the hub_validations print method:

- The `hub_validations` class object `combine()` method now ensures that check names are made unique across all `hub_validations` objects being combined.
- Additional improvements to the `hub_validations` class object `print()` method:
  - Check results for each file validated are now split and printed under a file name header.
  - The check name that can be used to access the check result from the `hub_validations` object is now included as the prefix to the check result message instead of the file name (Add name of tests to print out #76).
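
As a rough illustration of the de-duplication described above (a minimal sketch only, not the code in this PR): base R's `make.unique()` with `sep = "_"` yields the same kind of suffixed names that appear in the printed output, e.g. `file_exists_1`. The `check_names` vector and the `checks` list are hypothetical stand-ins for the names of a combined `hub_validations` object; whether `combine()` actually uses `make.unique()` internally is an assumption.

```r
# Hypothetical check names collected from two validated files.
check_names <- c(
  "file_exists", "file_name", "model_output_mod",
  "file_exists", "file_name", "model_output_mod"
)

# make.unique() keeps the first occurrence of each name as-is and appends
# a numeric suffix (joined with `sep`) to every later duplicate.
unique_names <- make.unique(check_names, sep = "_")
print(unique_names)

# A plain named list standing in for a combined hub_validations object
# (assumption: the real object can be indexed by check name in the same way).
checks <- setNames(as.list(seq_along(unique_names)), unique_names)

# Each check can now be retrieved unambiguously by its unique name.
checks[["file_exists_1"]]
```

With unique names, the prefix shown in square brackets in the printed output maps to exactly one element of the combined object.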

This makes the printout much more streamlined and informative. Here's an example of the output:

── mod_del_hub ────

✔ [valid_config]: All hub config files are valid.

── 2022-10-08-hub-baseline.csv ────

ⓧ [model_output_mod]: Previously submitted model output files must not be modified.
  model-output/hub-baseline/2022-10-08-hub-baseline.csv modified.
✔ [file_exists]: File exists at path model-output/hub-baseline/2022-10-08-hub-baseline.csv.
✔ [file_name]: File name "2022-10-08-hub-baseline.csv" is valid.
✔ [file_location]: File directory name matches `model_id` metadata in file name.
✔ [round_id_valid]: `round_id` is valid.
✔ [file_format]: File is accepted hub format.
✔ [metadata_exists]: Metadata file exists at path model-metadata/hub-baseline.yml.
✔ [file_read]: File could be read successfully.
✔ [valid_round_id_col]: `round_id_col` name is valid.
✔ [unique_round_id]: `round_id` column "origin_date" contains a single, unique round ID
  value.
✔ [match_round_id]: All `round_id_col` "origin_date" values match submission `round_id`
  from file name.
✔ [colnames]: Column names are consistent with expected round task IDs and std column
  names.
✔ [col_types]: Column data types match hub schema.
✔ [valid_vals]: `tbl` contains valid values/value combinations.
✔ [rows_unique]: All combinations of task ID column/`output_type`/`output_type_id` values
  are unique.
✔ [req_vals]: Required task ID/output type/output type ID combinations all present.
✔ [value_col_valid]: Values in column `value` all valid with respect to modeling task
  config.
✔ [value_col_non_desc]: Values in `value` column are non-decreasing as output_type_ids
  increase for all unique task ID value/output type combinations of quantile or cdf output
  types.
ℹ [value_col_sum1]: No pmf output types to check for sum of 1. Check skipped.
ℹ [spl_compound_taskid_set]: No v3 samples found in model output data to check. Skipping
  `check_tbl_spl_compound_taskid_set` check.
ℹ [spl_compound_tid]: No v3 samples found in model output data to check. Skipping
  `check_tbl_spl_compound_tid` check.
ℹ [spl_non_compound_tid]: No v3 samples found in model output data to check. Skipping
  `check_tbl_spl_non_compound_tid` check.
ℹ [spl_n]: No v3 samples found in model output data to check. Skipping `check_tbl_spl_n`
  check.

── 2022-10-15-team1-goodmodel.csv ────

ⓧ [model_output_mod_1]: Previously submitted model output files must not be removed.
  model-output/team1-goodmodel/2022-10-15-team1-goodmodel.csv removed.

── team1-goodmodel.yaml ────

ⓧ [model_metadata_mod]: Previously submitted model metadata files must not be removed.
  model-metadata/team1-goodmodel.yaml removed.

── 2022-10-22-team1-goodmodel.csv ────

✔ [file_exists_1]: File exists at path
  model-output/team1-goodmodel/2022-10-22-team1-goodmodel.csv.
✔ [file_name_1]: File name "2022-10-22-team1-goodmodel.csv" is valid.
✔ [file_location_1]: File directory name matches `model_id` metadata in file name.
✔ [round_id_valid_1]: `round_id` is valid.
✔ [file_format_1]: File is accepted hub format.
ⓧ [metadata_exists_1]: Metadata file does not exist at path
  model-metadata/team1-goodmodel.yml or model-metadata/team1-goodmodel.yaml.

zkamvar

I think this is a wonderful change! I think this definitely clarifies and de-clutters the output 🧹 and makes it clear where exactly these errors are coming from.

As always, I have non-blocking suggestions below along with this suggestion:

When I look at a failing check, my first instinct is to use `?check_[check name]` to find details about that particular unit check. Because the model output checks are all prefixed with `tbl_`, I stumbled a bit. Could we add documentation aliases for the `check_tbl_*` functions to be `check_*` (e.g. `#' @aliases check_colnames` for `check_tbl_colnames`)?

R/hub_validations_methods.R

annakrystalli · 2024-09-03T16:14:44Z

> When I look at a failing check, my first instinct is to use `?check_[check name]` to find details about that particular unit check. Because the model output checks are all prefixed with `tbl_`, I stumbled a bit. Could we add documentation aliases for the `check_tbl_*` functions to be `check_*` (e.g. `#' @aliases check_colnames` for `check_tbl_colnames`)?

There are reasons for the prefixes: they indicate which higher-level functions are running the tests and on what aspect (i.e. on the file? on metadata? on the contents of the file? etc.). In terms of matching to function docs, I wouldn't want to add aliases that are a mishmash of function names and check names but don't actually match either correctly. So if we must add aliases, I would prefer to add the actual check name.

However, there are tables with details of each check at the bottom of the help file for each higher-level validation function (e.g. here's the help file for `validate_submission()`), and it's pretty easy to tie the check name to the function too, so I don't think aliases are strictly necessary. But feel free to open an issue if you think it would really help.

annakrystalli added 5 commits September 3, 2024 10:24

Use filename as header and check name as msg prefix when printing hub_validations. Resolves #76

09b553c

Ensure check names in combined hub_validations are unique

45639ea

Update NEWS

d445ae9

new line

965b1f4

Fix issue number

2a11cd7

annakrystalli requested a review from zkamvar September 3, 2024 12:45

annakrystalli linked an issue Sep 3, 2024 that may be closed by this pull request

Add name of tests to print out #76

Closed

update vignette print method description

10b641c

zkamvar approved these changes Sep 3, 2024

annakrystalli added 2 commits September 3, 2024 17:18

Appease the linter

407a613

print msg when hub_validations empty

c20802c

annakrystalli merged commit 4faebe9 into ak/change-check-fail-class-and-print/111 Sep 5, 2024

annakrystalli deleted the ak/hubval-print branch September 5, 2024 07:58
