-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Include a location x variant count threshold for inclusion of GA #129
base: main
Are you sure you want to change the base?
Conversation
Previously, the growth advantage was included for each location and each variant regardless of the amount of data available for this particular location x variant combination. There would be situations where there is 0 sequences for a particular variant in a particular location and we'd still display the growth advantage (which was just the hierarchical prior). This commit adds an option of "--location-ga-inclusion-threshold" to "run-mlr-model.py". I've initially set this to 100, which seemed to work well at least heuristically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't comment on whether this threshold makes sense, but I do think we should surface this in the description within the viz app. Otherwise, just looking at the screenshot in the PR, I'd think there was a bug in the graph because of the missing plots.
int_value = int(value) | ||
if int_value <= 0: | ||
print(f"ERROR: {int_value} is not a positive integer.", file=sys.stderr) | ||
sys.exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing sys
import at the top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or alternatively, make use of argparse's error API:
int_value = int(value)
if int_value <= 0:
raise argparse.ArgumentTypeError(f"{int_value} is not a positive integer.")
@@ -403,15 +429,28 @@ def get_group_samples(samples, sites, group): | |||
help="Whether to run the model as hierarchical. Overrides model.hierarchical in config. " | |||
+ "Default is false if unspecified." | |||
) | |||
|
|||
parser.add_argument( | |||
"--location-ga-inclusion-threshold", type=positive_int, default=0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly weird that the default value of 0 could fail the positive_int
check.
Would it make sense to change to default=1
or type=positive_int_or_zero
?
seq_data = pd.read_csv(args.seq_path, sep="\t") | ||
variant_location_counts = seq_data.groupby(["location", "variant"])["sequences"].sum().to_dict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking suggestion
The raw_seq
loaded above should be the same Dataframe, so there should be no need to load the data again.
seq_data = pd.read_csv(args.seq_path, sep="\t") | |
variant_location_counts = seq_data.groupby(["location", "variant"])["sequences"].sum().to_dict() | |
variant_location_counts = raw_seq.groupby(["location", "variant"])["sequences"].sum().to_dict() |
Previously, the growth advantage was included for each location and each variant regardless of the amount of data available for this particular location x variant combination. There would be situations where there is 0 sequences for a particular variant in a particular location and we'd still display the growth advantage (which was just the hierarchical prior).
This PR adds an option of
--location-ga-inclusion-threshold
torun-mlr-model.py
. I've initially set this to 100, which seemed to work well at least heuristically.