-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Bug fix: Disease-gene associations #158
base: develop
Are you sure you want to change the base?
Conversation
4a33ef1
to
53a25d2
Compare
53a25d2
to
4f40bee
Compare
- Added and updated comments related to gene disease associations
- Fixing a bug where causaility and other relationships are not being defined correctly. - Update: In this commit, undoing temporary print statements.
- Add: comments - Add; SPARQL query and goal to list relationships
840da88
to
ebfec86
Compare
- Delete: Code for adding Gene2Disease relationships - Bug fix: Code for adding Disease2Gene relationships. These should be added if and only if there is one gene associated with the disease, unless it is a digenic disease/phenotype. However before, we were erroneously checking the number of diseases associated with a given gene.
3296155
to
a61ddb2
Compare
omim2obo/main.py
Outdated
add_subclassof_restriction(graph, RO['0002525'], CHR[chr_id], OMIM[gene_mim]) | ||
|
||
# TODO: @Trish: I've removed addition of gene->disease relations, leaving only disease->gene. Is that what we want? | ||
# Removed because extra complexity. Want to get down disease->gene code first. And Mondo doesn't need them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed calculating the number of associations differently since it did not appear that if a gene was unique in the morbidmap file that it was sufficient to tag that gene as causal for a disease based on the check here, and instead that the total number of genes per phenotype mim number (ie disease) should be used to determine if a gene is causal (ie only 1 gene per disease).
- What's the reason/error that you saw to remove gene->disease?
- Is Mondo the only place where these omim release files are used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Gene removal
There are a few reasons. It's been a year since this was written, so I don't remember everything, but I keep being reminded about the disease-centric nature of things, and I wonder whether we meant to add them in the first place. And moreso, I want this code review to be easier at first; for us to get the disease-gene stuff handled first. If you look at the code, I had to add on a bit to what you wrote to complete it.
Thinking more about it today though, I did leave part of the gene restrictions; the "cyto location" bit. And for adding the other ones back, I think the best way to handle it is to add some extra lines to this "cyto location" block. It'll actually look very similar to what was there before; I just need to drop the "n associations" conditional for the genes as that only matters for the diseases.
TLDR: Let's make sure the Disease2Gene stuff is correct first and I can add back the Gene2Disease shortly.
- @joeflack4 add gene->disease rels back
- Yes, AFAIK only
mondo-ingest
consumes these outputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the gene disease relationships back.
I had to put some thought into it to make sure I felt it was correct, and to make the code flow as cleanly and understandably as I could make it.
Feel free to review here or during meeting.
- Delete: Logic for digenic genes. - Add: Back logic for restricting to mapping key 3. - Add: Back gene->disease rels - Add: Filtering of non-definitive relationships (presence of [, non-diseases, {, susceptibility, and ?, provisional). - Bug fix: Missing space in warning message
omim2obo/main.py
Outdated
# Only add if (1) only 1 association (2)‚mapping key = 3, (3) 'definitive' | ||
phenotype_genes2: Dict[str, Dict[str, str]] = {k: v[0] for k, v in phenotype_genes.items() if len(v) == 1} # 1assoc | ||
phenotype_genes2 = {k: v for k, v in phenotype_genes2.items() if p2g_is_definitive(v['phenotype_label'])} | ||
for p_mim, assoc in phenotype_genes2.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this will still include the handful of diseases where p_label contains the word 'digenic' since there are cases where these exist with 1 gene association and a mapping key of 3 and are not prefixed with an omim special character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's true. I was going to follow up with you about that.
Basically my rationale for not filtering these out is because of the OMIM team's response. It seems to me that they haven't really insured the data quality of digenic diseases. So I suppose that could mean one of the following or both:
- Non digenic, but being labeled as such
- Digenic but they only labeled one of the associations
I had the first case at the forefront of my mind. But now that I think about it, for these cases where it is labeled diegenic but it only has one association, we have no idea which of these two cases it is.
So therefore, I do agree it is a good idea to filter them out now.
However I wonder if we want to consider outputting them into a special table for curator review. Perhaps some of them are not actually digenic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I think outputting them to a file for curator review is a good idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
I added review.tsv
to the release files and documentation for it, including the types of cases found inside of it, in the README.md
.
- Add: Finished adding gene->disease rels back - Add: Additional filter for diseases marked digenic even though they only have 1 association. But will output these special cases to a file. - Add: Additional comments - Add: New output / release file 'review.tsv', with typed dictionary class and documentation in README.md: First and currently only case we are adding: Situations where found a phenotype where we would have added a causal subclass restriction were it not for 'digenic' being in the label (possibly erroneously).
f495c8a
to
fc9ef27
Compare
- Bug fix / updates: SPARQL query: Renamed some columns, reordered columns, updated sorting. Fixed "Direction" and renamed to "PredDirection". Added SubBiolink and ObjBiolink types columns.
- Update: Comments regrading biolink:category, etc, derived from MIM types
1838e7e
to
b1d57db
Compare
- Bug fix: Removed accidental "causal" G2D restrictions being added when the corresponding D2G was not found to be definitively causal.
b1d57db
to
ca03787
Compare
- Update: Docs: Note about the inputs of the repo and how they don't always match omim.org/entry pages - Fix: Docs: grammatical error
- Fix: A misstatment just added to the docs.
- Fix: Docs - Fix: Grammar: erroneous line break - Fix: Typo
fe6bc63
to
aa10e37
Compare
- Update: Remove filtering out of single-association cases where phenotype marked 'digenic'.
- Add: Related comment: 9606: NCBI Taxonomy ID for Homo Sapiens - Add: Warning if MIM type of Phenotype-Gene assoc found to not be that of "Phenotype". - Bug fix: Some Phenotype-Gene associations were not identified due to a REGEX parsing error requiring an unnecessary (but 99% of the time present, as apparently being part of the OMIM naming spec for this field) comma.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said on Slack, this should be merged into develop and of course then into main and a new release should be generated so the OMIM gene pipeline can be tested with these updates. This is needed ASAP.
resolves #156
Changes
Resources