Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

filter: Confusing warning and error combination when query contains a missing column #1592

Closed
victorlin opened this issue Aug 21, 2024 · 2 comments · Fixed by #1604
Closed
Labels
bug Something isn't working

Comments

@victorlin
Copy link
Member

@huddlej noted in nextstrain/mpox#273 (comment) that this output comes across as a bug (that a warning is also an error):

WARNING: Column 'QC_rare_mutations' does not exist in the metadata file. Ignoring it.
ERROR: Query contains a column that does not exist in metadata.

This happens because the columns are effectively checked twice:

  1. read_metadata which does not error by design:

    augur/augur/io/metadata.py

    Lines 124 to 127 in d8faf01

    # Ignore missing columns. Don't error since augur filter's
    # --exclude-where allows invalid columns to be specified (they
    # are just ignored).
    print_err(f"WARNING: Column '{requested_column}' does not exist in the metadata file. Ignoring it.")

  2. Evaluation of the query:

    try:
    return set(metadata_copy.query(query).index.values)
    except Exception as e:
    if isinstance(e, PandasUndefinedVariableError):
    raise AugurError(f"Query contains a column that does not exist in metadata.") from e

Possible solution

Both checks are beneficial, so it doesn't make sense to remove either. Instead, the warning text could be updated to indicate potential issues:

WARNING: Column 'QC_rare_mutations' does not exist in the metadata file. This may cause subsequent errors.
ERROR: Query contains a column that does not exist in metadata.
@victorlin victorlin added the bug Something isn't working label Aug 21, 2024
@genehack
Copy link
Contributor

I would think if you're going to be making changes here, updating the second (error) message to include the name of the missing column would be a good improvement — it will make the combination of messages overall more clear because it will be apparent that both of them are about the same column.

@victorlin
Copy link
Member Author

updating the second (error) message to include the name of the missing column would be a good improvement

I agree, and looked into this briefly. Pandas's UndefinedVariableError does not expose the column name apart from being in the error message text name 'QC_rare_mutations' is not defined, and parsing it from there feels wrong. Maybe a good compromise would be to tack on the entire Pandas error message like this:

WARNING: Column 'QC_rare_mutations' does not exist in the metadata file. This may cause subsequent errors.
ERROR: Query contains a column that does not exist in metadata: name 'QC_rare_mutations' is not defined

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants