Joined gene names: a possible pitfall causing incorrect results? #178

biocyberman · 2016-03-13T20:10:07Z

Is chanjo aware of these problematic gene names, which may cause various problems for queries based on gene names?

➤ gawk '{print $NF}' ccds.15.grch37p13.extended.bed|grep ','|head
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
➤ gawk '{print $NF}' ccds.15.grch37p13.extended.bed|grep ','|wc -l
66188

➤ gawk '{print $NF}' ccds.15.grch37p13.extended.bed|grep ','|sort|uniq|wc -l
9290
➤ gawk '{print $NF}' ccds.15.grch37p13.extended.bed|grep ','|sort|uniq >problematic.gene.names.txt


biocyberman · 2016-03-13T20:14:34Z

A test query on NOX1 returned a result, so I guess chanjo does take care of the problem. @robinandeer, could you explain how it does that? Pointing me to the relevant code section would be enough.

robinandeer · 2016-03-13T21:31:33Z

I'm not quite sure what you mean :/

The only problematic gene names I know of are the ones that exist on both the X and Y chromosomes and have to be given prefixes.

It looks like you are picking out exons that belong to multiple transcripts, all of which map to the same gene, so the input looks correct :)
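One way to check this reading is to count lines whose last column joins two or more *different* gene names, as opposed to the same name repeated (like `NOX1,NOX1,NOX1`). The function below is a hypothetical helper, not part of chanjo; it assumes, as the gawk commands above do, that the names are comma-separated in the last whitespace-delimited field.

```shell
# Hypothetical helper (not part of chanjo): count BED lines whose last column
# lists two or more different gene names. Assumes the names are
# comma-separated in the last whitespace-delimited field.
count_mixed_gene_lines() {
  awk '
    {
      n = split($NF, names, ",")   # split e.g. "NOX1,NOX1,NOX1" into names[1..n]
      split("", seen)              # portable way to empty the array for each line
      distinct = 0
      for (i = 1; i <= n; i++) {
        if (!(names[i] in seen)) {
          seen[names[i]] = 1
          distinct++
        }
      }
      if (distinct > 1) mixed++    # line joins different genes, not just repeats
    }
    END { print mixed + 0 }        # "+ 0" prints 0 instead of an empty string
  ' "$1"
}
```

Usage: `count_mixed_gene_lines ccds.15.grch37p13.extended.bed`. A result of 0 would mean every comma-joined entry only repeats a single gene, which is consistent with the joined names coming from several transcripts of one gene.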

Remember that these columns only matter in the loading step; for annotation, only the chrom, start, and end columns matter.
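To illustrate why the gene-name column cannot affect annotation, here is a minimal sketch that reduces a BED file to its interval columns. It is a hypothetical helper, not chanjo's code, and it assumes the file is tab-delimited (the usual BED convention). Two files that differ only in how gene names are joined yield the same output.

```shell
# Hypothetical helper (not part of chanjo): keep only chrom, start, end from a
# tab-delimited BED file, so the gene-name column cannot influence the result.
interval_columns() {
  # cut -f uses TAB as its default delimiter; sort orders by chrom, then
  # numerically by start and end, and -u keeps one copy of repeated intervals.
  cut -f1-3 "$1" | sort -u -k1,1 -k2,2n -k3,3n
}
```

Usage: `interval_columns ccds.15.grch37p13.extended.bed | head`.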
