Drosophila sequence and assembly
Instrument: PacBio RS II
Chemistry: C3
Enzyme: P5
The data released here are associated with the data release announcement on the PacBio blog. In collaboration with Dr. Casey Bergman at the University of Manchester and Drs. Susan Celniker and Roger Hoskins of the Berkeley Drosophila Genome Project (BDGP) at Lawrence Berkeley National Laboratory, we have sequenced adult males from a subline of the ISO1 (y; cn, bw, sp) strain of D. melanogaster. This is the same stock used in the official BDGP reference assemblies since the first genome sequence release in 2000. The DNA was size-selected for >15 kb elution using BluePippin™ (Sage Science). In total, ~15 Gb (105.8X) of sequence was generated from a 20 kb library using P5-C3 sequencing chemistry on the PacBio® RS II:
Total number of bases: 15,208,567,933 bp
Total number of reads: 1,514,730
Average read length: 10,040 bp
Half of sequenced bases in reads greater than: 14,214 bp
PacBio RS II instrument time for sequencing: 6 days
Number of SMRT Cells: 42
Number of Instrument Runs: 6
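As a sanity check on the figures above, the following minimal Python sketch recomputes the average read length from the two totals and shows how a "half of sequenced bases in reads greater than" value can be derived from a list of read lengths. The function names and the toy read lengths are illustrative assumptions, not part of the release. The implied genome size is simply the total bases divided by the quoted 105.8X coverage, not a figure stated in the release.

```python
# Totals copied from the summary list above.
TOTAL_BASES = 15_208_567_933
TOTAL_READS = 1_514_730
REPORTED_COVERAGE = 105.8  # fold coverage quoted in the text


def mean_read_length(total_bases, total_reads):
    """Average read length in bp."""
    return total_bases / total_reads


def implied_genome_size(total_bases, coverage):
    """Genome size (bp) implied by a given fold coverage."""
    return total_bases / coverage


def read_n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


print(f"mean read length: {mean_read_length(TOTAL_BASES, TOTAL_READS):,.0f} bp")
print(f"implied genome size: {implied_genome_size(TOTAL_BASES, REPORTED_COVERAGE) / 1e6:.1f} Mb")
# Toy read lengths (hypothetical), only to show how the statistic is computed.
print("toy N50:", read_n50([10, 8, 6, 4, 2]))
```

Reproducing the 14,214 bp figure itself would require the per-read lengths from the raw data, which this sketch does not load.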
Preliminary analyses and step-by-step instructions for downloading, mapping, and visualizing the raw data are described on the Bergman lab blog. The raw data can be downloaded in 6 tarballs from the PacBio AWS site:
https://s3.amazonaws.com/datasets.pacb.com/2014/Drosophila/raw/Dro1_24NOV2013_398.tgz
https://s3.amazonaws.com/datasets.pacb.com/2014/Drosophila/raw/Dro2_25NOV2013_399.tgz
https://s3.amazonaws.com/datasets.pacb.com/2014/Drosophila/raw/Dro3_26NOV2013_400.tgz
https://s3.amazonaws.com/datasets.pacb.com/2014/Drosophila/raw/Dro4_28NOV2013_401.tgz
https://s3.amazonaws.com/datasets.pacb.com/2014/Drosophila/raw/Dro5_29NOV2013_402.tgz
https://s3.amazonaws.com/datasets.pacb.com/2014/Drosophila/raw/Dro6_1DEC2013_403.tgz
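The six tarballs above share a common URL prefix, so they can be fetched in a loop. The sketch below uses only the Python standard library. The `download` helper, the `raw` output directory, and the `.part` suffix for in-progress files are assumptions made for illustration, and the sketch does not verify that the URLs are still live or that a finished file is complete.

```python
import shutil
import urllib.request
from pathlib import Path

BASE_URL = "https://s3.amazonaws.com/datasets.pacb.com/2014/Drosophila/raw/"
TARBALLS = [
    "Dro1_24NOV2013_398.tgz",
    "Dro2_25NOV2013_399.tgz",
    "Dro3_26NOV2013_400.tgz",
    "Dro4_28NOV2013_401.tgz",
    "Dro5_29NOV2013_402.tgz",
    "Dro6_1DEC2013_403.tgz",
]
RAW_URLS = [BASE_URL + name for name in TARBALLS]


def download(url, dest_dir="raw"):
    """Stream one tarball into dest_dir, skipping files that already exist."""
    out_dir = Path(dest_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / url.rsplit("/", 1)[-1]
    if target.exists():
        return target
    # Write to a temporary name first so an interrupted transfer
    # is never mistaken for a finished file.
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as handle:
        shutil.copyfileobj(response, handle)
    partial.rename(target)
    return target
```

To fetch everything, call `download(url)` for each entry in `RAW_URLS`. The tarballs are large, so expect the transfer to take a while and to need substantial disk space.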
Alternatively, you can download the raw data from the NCBI Sequence Read Archive (SRA) under accession SRX499318.
Please cite the following publication if you use this raw dataset in your research:
Kim KE, Peluso P, Babayan P, Yeadon PJ, Yu C, Fisher WW, Chin CS, Rapicavoli NA, Rank DR, Li J, Catcheside DE, Celniker SE, Phillippy AM, Bergman CM, Landolin JM. Long-read, whole-genome shotgun sequence data for five model organisms. Sci Data. 2014 1:140045. http://www.nature.com/articles/sdata201445
You can download the preassembled reads as well as the final diploid assembly contigs file from the FALCON diploid assembler here:
https://s3.amazonaws.com/datasets.pacb.com/2014/Drosophila/reads/dmel_FALCON_diploid_assembly.tgz
Please cite the following publication if you use the FALCON datasets in your research:
Chin CS, Peluso P, Sedlazeck FJ, Nattestad M, Concepcion GT, Clum A, Dunn C, O'Malley R, Figueroa-Balderas R, Morales-Cruz A, Cramer GR, Delledonne M, Luo C, Ecker JR, Cantu D, Rank DR, Schatz MC. Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods. 2016 13:1050-1054. http://www.nature.com/nmeth/journal/v13/n12/full/nmeth.4035.html
Preassembled reads, plus unpolished and polished assemblies of the 25X longest preassembled reads generated using PBcR and Celera Assembler 8.1, can be downloaded from the University of Maryland Center for Bioinformatics and Computational Biology website or directly via the following URLs:
ftp://cbcb.umd.edu/pub/data/sergek/dros_corrected.fastq.bz2
http://cbcb.umd.edu/software/pbcr/dmel_cons_asm.tar.gz
Please cite the following publication if you use the PBcR-CA datasets in your research:
Koren S, Schatz MC, Walenz BP, Martin J, Howard JT, Ganapathy G, Wang Z, Rasko DA, McCombie WR, Jarvis ED, Phillippy AM. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotechnol. 2012 30:693-700. http://www.nature.com/nbt/journal/v30/n7/full/nbt.2280.html
Preassembled reads, unpolished assemblies, and polished assemblies of the complete 90X dataset generated using MHAP and Celera Assembler 8.2 can be downloaded from the University of Maryland Center for Bioinformatics and Computational Biology website or directly via the following URLs:
http://gembox.cbcb.umd.edu/mhap/data/dmel.polished.fastq.gz
http://gembox.cbcb.umd.edu/mhap/asm/dmel.ctg.fasta.gz
http://gembox.cbcb.umd.edu/mhap/asm/dmel.all.fasta.gz
http://gembox.cbcb.umd.edu/mhap/asm/dmel.quiver.ctg.fasta.gz
http://gembox.cbcb.umd.edu/mhap/asm/dmel.quiver.all.fasta.gz
The Quiver polished assembly of MHAP-CA contigs can also be directly downloaded from NCBI:
https://www.ncbi.nlm.nih.gov/nuccore/JSAE00000000.1/
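Once one of the gzip-compressed FASTA assemblies listed above has been downloaded, a quick way to inspect it is to count the contigs and sum their lengths. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and it assumes a plain multi-line FASTA file in which each record starts with a `>` header.

```python
import gzip


def fasta_lengths(path):
    """Yield (name, length) for each record in a gzip-compressed FASTA file."""
    name, length = None, 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, length
                # Keep only the first whitespace-delimited token of the header.
                parts = line[1:].split()
                name, length = (parts[0] if parts else ""), 0
            elif name is not None:
                length += len(line)
    if name is not None:
        yield name, length


def summarize(path):
    """Return contig count, total bases, and longest contig for one assembly."""
    lengths = [length for _, length in fasta_lengths(path)]
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest_bp": max(lengths, default=0),
    }
```

For example, `summarize("dmel.quiver.ctg.fasta.gz")` would report the totals for the polished contigs, assuming that file has been saved in the working directory.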
Please cite the following publication if you use the MHAP-CA datasets in your research:
Berlin K, Koren S, Chin CS, Drake JP, Landolin JM, Phillippy AM. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat Biotechnol. 2015 33:623-30. http://www.nature.com/nbt/journal/v33/n6/full/nbt.3238.html
Visit the PacBio Developer's Network Website for the most up-to-date links to downloads, documentation and more.