wish: fusion gene for RNA-Seq #210
Comments
more tools: secondary analysis of fusion product: |
Tophat fusion (poor usability), http://tophat.cbcb.umd.edu/fusion_tutorial.html (looks like it's entirely free). Most of the fusion callers I've seen in use are half-baked, produce massive numbers of false positives, and have unclear licensing. Take deFuse, where the authors themselves are confused about which license it is under. Further, deFuse and probably other fusion callers depend on BLAT, which costs $$$$$$ for for-profits. A few review papers: |
Thanks! STAR can identify fusion transcripts as well. Oncofuse seems to be a well-trusted framework, but I cannot locate its license. |
Oncofuse seems to be under the Apache 2.0 open source license! I can confirm that we have seen positive results using tophat in fusion mode and feeding the output to Oncofuse, to find a well known fusion in prostate cancer. A simplified pipeline would be tophat (fusion mode) + Oncofuse or STAR + Oncofuse. This would be a completely open source solution and wouldn't depend on the extremely clumsy tophat-fusion-post (which also depends on BLAST and manually downloading data files). Not 100% certain but Oncofuse might be limited to human only. |
Great, is there any reason you like tophat over STAR? just curious |
Sorry, what I meant was that tophat-fusion-post (the post processing step for tophat run in fusion mode) is very clumsy. I haven't actually tried STAR yet. |
Hi Miika and Paul, Good discussion. I have never done any fusion gene detection, so I don't have any suggestions on that front. I can fix the Tophat support so it supports mapping in fusion mode, and I'll fix the STAR support so it actually works; looking at it, I checked in something half-done. Nice. Would that be enough to at least get you to the point where you can run the tools on the output? |
I added Tophat fusion support with dd4fb1a. If you add |
I re-enabled STAR support via b9630e2, but I haven't tested it out on anything real, just our unit tests. I have a nice set of real test data that I'll report on when it is finished running. |
Thanks Rory, I can draft the initial implementation of Oncofuse, and Miika can evaluate |
Hi Paul and Rory, Happy to evaluate all of this on our data. Oncofuse is simple to use on Tophat output at least as it's just a Java package. (The documentation of Oncofuse is a bit, well, funny as one of the outputs is a Bayesian probability interpreted as a p-value and corrected for multiple testing using Bonferroni correction..heh) Rory, regarding disambiguation and fusions, they are kind of mutually exclusive in the same run of bcbio. What I've done with explant fusion detection is to first run the disambiguation pipeline and then extract the disambiguated reads (and any unaligned mates of reads that survived disambiguation), drop them into fastq files and run fusion detection. |
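The extraction step described above (keep the reads that survived disambiguation together with their mates, aligned or not, and write them back to FASTQ) could be sketched roughly as follows. This is only an illustration, not the code actually used: it assumes the surviving read names have already been collected into a set, that the two FASTQ files are in the same order with four lines per record, and the function names `read_fastq` and `filter_pairs` are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or trailing blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        # Drop the leading '@', any comment after whitespace, and a /1 or /2 suffix.
        name = header[1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name, seq, qual


def filter_pairs(fq1, fq2, keep_names, out1, out2):
    """Write every pair whose read name is in keep_names; return pairs kept.

    The whole pair is written whenever the name matches, so a mate that did
    not align is kept alongside the read that survived disambiguation.
    """
    kept = 0
    for (n1, s1, q1), (n2, s2, q2) in zip(read_fastq(fq1), read_fastq(fq2)):
        if n1 != n2:
            raise ValueError(f"mate files out of sync: {n1} vs {n2}")
        if n1 in keep_names:
            out1.write(f"@{n1}/1\n{s1}\n+\n{q1}\n")
            out2.write(f"@{n2}/2\n{s2}\n+\n{q2}\n")
            kept += 1
    return kept


# Tiny in-memory demonstration with made-up reads (not real data).
demo_fq1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nGGGG\n+\nIIII\n")
demo_fq2 = io.StringIO("@r1/2\nTTTT\n+\nIIII\n@r2/2\nCCCC\n+\nIIII\n")
demo_out1, demo_out2 = io.StringIO(), io.StringIO()
demo_kept = filter_pairs(demo_fq1, demo_fq2, {"r2"}, demo_out1, demo_out2)
```

The resulting pair of FASTQ files would then be the input to whichever fusion caller is being evaluated.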
Rory, with the new installation script, what's the best way to test the new code without installing? (sorry, I am still not very comfortable with git) |
Hi Paul, Happy holidays, thanks for all of your contributions this past year. Brad describes his approach to testing here: https://bcbio-nextgen.readthedocs.org/en/latest/contents/code.html#development-infrastructure. I describe what I do here: #147 (comment) The idea is to not reinstall everything, but just have a separate installation of the bcbio-nextgen Python code that you can run in its own Python virtual environment. Then you can edit that code as much as you want and run it. This is useful if other people are using your bcbio-nextgen installation and you want to hack on one without breaking it for them. One gotcha is that when you invoke the development one you installed and you want to do a run, you need to explicitly point to the
Let me know if you run into any issues and thanks again for everything. |
Thanks for the guideline, that is super helpful. I submitted a pull request #237 to check if that's the right way to implement Oncofuse. It may not run properly at first, but I had some issues installing bcbio and will test it when that's fixed. |
I hope this pull request works fine; let me close this issue for now. |
Hei Miika, the situation with fusion finders is actually quite good: there are very good fusion finders out there with low false-positive rates! Here is a more up-to-date comparison of fusion finders:
|
Thanks Daniel, I think there is room for improvement in terms of including more fusion callers and Brad would probably be happy to accept pull requests. I'd be wary of incorporating anything that relies on BLAT though because of the $$$$$ license. |
FusionCatcher uses four aligners: Bowtie, Bowtie2, STAR, and BLAT. It is very easy to disable the BLAT aligner in FusionCatcher with the command line option "--skip-blat" (FusionCatcher then uses only 3 aligners instead of 4)! Therefore the BLAT license is not an issue! |
Hi @ndaniel, Thanks for the awesome comments; at HSPH we don't have very much experience at all with fusion genes. Do you have example data available with known fusion genes where Oncofuse and STAR are missing them? We'd love to improve the fusion gene calling and having a known set to work with would be really helpful. |
Hi roryk, here it is again: |
Thanks @ndaniel, Is there a way where we could skip a bunch of the aligning and what not and start out from using just STAR alignments? |
If you are referring to FusionCatcher, then the answer is no. FusionCatcher is a fully automatic pipeline by itself and it needs raw FASTQ files as input. There are no shortcuts. |
Hi @roryk, here is a very small test case that makes it quick to test a pipeline for missed fusions. These two small paired-end FASTQ files (less than 2 MB in size):
contain reads for 9 known spike-in fusion genes:
from open-access synthetic spike-in mRNA-seq data for cancer gene fusions (SRA). Here is more info about this small case. I estimate that it should take less than 5 minutes to analyze these FASTQ files for your test! If one runs FusionCatcher with exactly these two input FASTQ files, FusionCatcher detects all 9 fusions and the results are here. |
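Checking a pipeline against a known spike-in set like this comes down to comparing the expected fusion pairs with the pairs a caller reports. A minimal sketch, under stated assumptions: the gene symbols below are placeholders (the actual nine fusions are not reproduced here), the helper names are hypothetical, and partner order is ignored as a simplification even though 5'/3' orientation can matter in practice.

```python
def normalize(pair):
    """Treat (A, B) and (B, A) as the same fusion; compare case-insensitively."""
    gene_a, gene_b = pair
    return tuple(sorted((gene_a.upper(), gene_b.upper())))


def missed_fusions(expected, called):
    """Return the expected fusions that do not appear among the called ones."""
    called_set = {normalize(p) for p in called}
    return sorted(
        normalize(p) for p in expected if normalize(p) not in called_set
    )


# Placeholder symbols only; substitute the real spike-in pairs and caller output.
expected_pairs = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")]
called_pairs = [("gene_b", "GENE_A")]
missing = missed_fusions(expected_pairs, called_pairs)
```

An empty `missing` list would mean the caller recovered every expected fusion in the test set.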
I am exploring different options for fusion gene detection from RNA-Seq data. Does anyone have experience with different tools?
I found FusionQ at https://sites.google.com/site/fusionq1/home/ and FusionCatcher https://code.google.com/p/fusioncatcher/ with a bit of googling.