Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Include a location x variant count threshold for inclusion of GA #129

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

trvrb
Copy link
Member

@trvrb trvrb commented Feb 26, 2025

Previously, the growth advantage was included for each location and each variant regardless of the amount of data available for this particular location x variant combination. There would be situations where there is 0 sequences for a particular variant in a particular location and we'd still display the growth advantage (which was just the hierarchical prior).

This PR adds an option of --location-ga-inclusion-threshold to run-mlr-model.py. I've initially set this to 100, which seemed to work well at least heuristically.

Screenshot 2025-02-26 at 2 45 18 PM

Previously, the growth advantage was included for each location and each variant regardless of the amount of data available for this particular location x variant combination. There would be situations where there is 0 sequences for a particular variant in a particular location and we'd still display the growth advantage (which was just the hierarchical prior).

This commit adds an option of "--location-ga-inclusion-threshold" to "run-mlr-model.py". I've initially set this to 100, which seemed to work well at least heuristically.
@trvrb trvrb requested a review from joverlee521 February 26, 2025 14:44
Copy link
Contributor

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't comment on whether this threshold makes sense, but I do think we should surface this in the description within the viz app. Otherwise, just looking at the screenshot in the PR, I'd think there was a bug in the graph because of the missing plots.

int_value = int(value)
if int_value <= 0:
print(f"ERROR: {int_value} is not a positive integer.", file=sys.stderr)
sys.exit(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing sys import at the top.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or alternatively, make use of argparse's error API:

    int_value = int(value)
    if int_value <= 0:
        raise argparse.ArgumentTypeError(f"{int_value} is not a positive integer.")

@@ -403,15 +429,28 @@ def get_group_samples(samples, sites, group):
help="Whether to run the model as hierarchical. Overrides model.hierarchical in config. "
+ "Default is false if unspecified."
)

parser.add_argument(
"--location-ga-inclusion-threshold", type=positive_int, default=0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly weird that the default value of 0 could fail the positive_int check.
Would it make sense to change to default=1 or type=positive_int_or_zero?

Comment on lines +450 to +451
seq_data = pd.read_csv(args.seq_path, sep="\t")
variant_location_counts = seq_data.groupby(["location", "variant"])["sequences"].sum().to_dict()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking suggestion

The raw_seq loaded above should be the same Dataframe, so there should be no need to load the data again.

Suggested change
seq_data = pd.read_csv(args.seq_path, sep="\t")
variant_location_counts = seq_data.groupby(["location", "variant"])["sequences"].sum().to_dict()
variant_location_counts = raw_seq.groupby(["location", "variant"])["sequences"].sum().to_dict()

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants