Human_Microbiome_Project_MockB_Shotgun
Instrument: PacBio RS II
Chemistry: C2 & C3
Enzyme: P4 & P5
Data were collected with both P4-C2 and P5-C3, as indicated below.
Just as the long reads of SMRT® Sequencing dramatically improve the de novo assembly of individual genomes, they should also benefit the assembly of metagenomes by making it easier to delineate the members of a community. As a proof of concept to test this hypothesis, Pacific Biosciences sequenced a mock community from the Human Microbiome Project and assembled the data with HGAP, the same algorithm used to assemble single microbial genomes.
Posted here is shotgun sequencing data from the Mock Community B sample from the Human Microbiome Project. These files contain the sequencing read data only, not an assembly. The data have been split into seven sets to keep the file sizes manageable. Some preliminary example results of the assembly are also shown at the bottom of the page.
The mock community was obtained through BEI Resources, NIAID, NIH as part of the Human Microbiome Project: Genomic DNA from Microbial Mock Community B (Even, High Concentration), v5.1H, for Whole Genome Shotgun Sequencing, HM-276D. http://www.beiresources.org/Catalog/otherProducts/HM-276D.aspx
- Set 1 (7 SMRT Cells using P4-C2) (43 Gb tar.gz)
- Set 2 (7 SMRT Cells using P4-C2) (30 Gb tar.gz)
- Set 3 (7 SMRT Cells using P4-C2) (28 Gb tar.gz)
- Set 4 (6 SMRT Cells using P4-C2) (25 Gb tar.gz)
- Set 5 (8 SMRT Cells using P5-C3) (35 Gb tar.gz)
- Set 6 (7 SMRT Cells using P5-C3) (29 Gb tar.gz)
- Set 7 (7 SMRT Cells using P5-C3) (20 Gb tar.gz)
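Because each archive is tens of gigabytes, it can help to inspect one before unpacking it. The following is a minimal sketch using Python's standard `tarfile` module; the file name `hmp_set1.tar.gz` is a hypothetical local name (the page does not state the actual file names), and nothing is assumed about what the archives contain.

```python
import tarfile
from pathlib import Path


def list_archive(path):
    """Return (name, size_in_bytes) for each regular file in a .tar.gz archive."""
    with tarfile.open(path, mode="r:gz") as archive:
        # Iterating a TarFile yields TarInfo records without extracting anything.
        return [(member.name, member.size) for member in archive if member.isfile()]


if __name__ == "__main__":
    # Hypothetical local file name; adjust to wherever the download was saved.
    archive_path = Path("hmp_set1.tar.gz")
    if archive_path.exists():
        for name, size in list_archive(archive_path):
            print(f"{size:>15,d}  {name}")
```

Listing still has to read through the compressed stream, so it is not instant on a large archive, but it avoids writing the unpacked data to disk.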
Data file | Sample Name | Binding Tube | Polymerase | Chemistry | Acq Time (min) | Post-Filter # Subreads | Post-Filter # Bases |
---|---|---|---|---|---|---|---|
hmp_set1 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 86148 | 379137670 |
hmp_set1 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 81868 | 360470933 |
hmp_set1 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 74202 | 325257630 |
hmp_set1 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 56766 | 239910920 |
hmp_set1 | BEI high even metagenomic_75 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 77952 | 334252160 |
hmp_set1 | BEI high even metagenomic_75 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 84656 | 403816210 |
hmp_set1 | BEI high even metagenomic_75 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 96587 | 478724129 |
hmp_set2 | BEI high even metagenomic_75 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 88561 | 426776736 |
hmp_set2 | BEI high even metagenomic_25 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 60997 | 277183852 |
hmp_set2 | BEI high even metagenomic_25 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 53994 | 236239138 |
hmp_set2 | BEI high even metagenomic_25 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 37144 | 153559945 |
hmp_set2 | BEI high even metagenomic_25 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 16972 | 71041660 |
hmp_set2 | BEI high even metagenomic_50nM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 82662 | 330381041 |
hmp_set2 | BEI high even metagenomic_50nM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 68918 | 252186962 |
hmp_set3 | BEI high even metagenomic_50nM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 72172 | 263299931 |
hmp_set3 | BEI high even metagenomic_50nM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 82525 | 326780667 |
hmp_set3 | BEI high even metagenomic_50nM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 71059 | 268035345 |
hmp_set3 | BEI high even metagenomic_50nM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 64942 | 261233340 |
hmp_set3 | BEI high even metagenomic_50nM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 45375 | 176070055 |
hmp_set3 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 57429 | 197140754 |
hmp_set3 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 36045 | 124101501 |
hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 54251 | 216380686 |
hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 79171 | 308755441 |
hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 60845 | 237953531 |
hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 82003 | 317202529 |
hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 79972 | 316034059 |
hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 72257 | 274021988 |
hmp_set5 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 71388 | 383313431 |
hmp_set5 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 71115 | 373244879 |
hmp_set5 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 70456 | 366785650 |
hmp_set5 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 69065 | 357261288 |
hmp_set5 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 66999 | 346936491 |
hmp_set5 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 70243 | 349011680 |
hmp_set5 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 68937 | 337872541 |
hmp_set5 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 65010 | 320675096 |
hmp_set6 | BEI high even metagenomic_50pM | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 80562 | 381691078 |
hmp_set6 | BEI high even metagenomic_50pM | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 66540 | 324761102 |
hmp_set6 | BEI high even metagenomic_50pM | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 78035 | 371404939 |
hmp_set6 | BEI high even metagenomic_50pM | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 73278 | 349280176 |
hmp_set6 | BEI high even metagenomic_50pM | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 61150 | 289192246 |
hmp_set6 | BEI high even metagenomic_50pM | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 66913 | 314699834 |
hmp_set6 | BEI high even metagenomic_50pM | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 63319 | 291515819 |
hmp_set7 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 58617 | 262376178 |
hmp_set7 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 59557 | 266423085 |
hmp_set7 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 43703 | 196536274 |
hmp_set7 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 49685 | 224514646 |
hmp_set7 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 44409 | 209207825 |
hmp_set7 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 40569 | 181415565 |
hmp_set7 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 35858 | 162613189 |
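Each row of the table above describes one SMRT Cell, so per-set yield and mean subread length follow by summing the last two columns. The sketch below is one way to do that, assuming the table has been saved as pipe-delimited text; the names `SetTotals` and `summarize` are illustrative, and the sample rows are the six `hmp_set4` rows copied from the table.

```python
from collections import defaultdict
from dataclasses import dataclass

# The six hmp_set4 rows, copied verbatim from the table above.
SAMPLE_ROWS = """
Data file | Sample Name | Binding Tube | Polymerase | Chemistry | Acq Time | Post-Filter # Subreads | Post-Filter # Bases |
---|---|---|---|---|---|---|---|
hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 54251 | 216380686 |
hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 79171 | 308755441 |
hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 60845 | 237953531 |
hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 82003 | 317202529 |
hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 79972 | 316034059 |
hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 72257 | 274021988 |
"""


@dataclass
class SetTotals:
    cells: int = 0
    subreads: int = 0
    bases: int = 0

    @property
    def mean_subread_length(self):
        """Total post-filter bases divided by total post-filter subreads."""
        return self.bases / self.subreads if self.subreads else 0.0


def summarize(table_text):
    """Aggregate SMRT Cell rows by the 'Data file' column."""
    totals = defaultdict(SetTotals)
    for line in table_text.strip().splitlines():
        fields = [f.strip() for f in line.strip().strip("|").split("|")]
        # Skip the header and separator rows, which have no numeric subread count.
        if len(fields) != 8 or not fields[6].isdigit():
            continue
        entry = totals[fields[0]]
        entry.cells += 1
        entry.subreads += int(fields[6])
        entry.bases += int(fields[7])
    return dict(totals)


if __name__ == "__main__":
    for name, t in summarize(SAMPLE_ROWS).items():
        print(name, t.cells, t.subreads, t.bases, round(t.mean_subread_length))
```

Passing the full table text to `summarize` gives the same totals for every set, which makes it easy to compare the P4-C2 and P5-C3 runs.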
The preliminary results of this analysis are promising: the assemblies generally show improved contiguity compared to publicly available short-read data sets and assemblies. Additionally, using epigenetic information to associate contigs may make it possible to further improve the shotgun metagenome assembly. This would be a novel validation method available only with PacBio sequencing, which can detect epigenetic modifications during single-molecule sequencing. We expect that associating contigs by methylation data will prove more reliable than other strategies, such as binning by GC content, because the methylation profile of each species in the community should be consistent across its genome.
Some example results of the shotgun metagenome assembly are shown in the figures below, comparing preliminary PacBio results to MetaVelvet assemblies of Illumina® data. (The Illumina benchmark is from Treangen and Koren et al., MetAMOS: a modular and open source metagenomic assembly and analysis pipeline. MetaVelvet was used for comparison because it produced the fewest contigs for the example genomes shown below; that is, the comparison uses the best short-read assemblies.)
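For reference, the GC-content binning named above as an alternative strategy can be sketched in a few lines. This is a toy illustration only, not the method used for the results on this page: the contig names and sequences are invented, and the bin width of 0.1 is an arbitrary assumption.

```python
from collections import defaultdict


def gc_fraction(sequence):
    """Fraction of called bases (A, C, G, T) that are G or C."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def bin_by_gc(contigs, bin_width=0.1):
    """Group contig names by which GC-fraction interval they fall into."""
    bins = defaultdict(list)
    for name, seq in contigs.items():
        bins[int(gc_fraction(seq) / bin_width)].append(name)
    return dict(bins)


if __name__ == "__main__":
    # Invented toy contigs; real contigs would come from an assembly FASTA.
    toy_contigs = {
        "contig_a": "AT" * 15 + "GC" * 5,   # GC fraction 0.25
        "contig_b": "AT" * 29 + "GC" * 11,  # GC fraction 0.275
        "contig_c": "AT" * 7 + "GC" * 13,   # GC fraction 0.65
    }
    print(bin_by_gc(toy_contigs))
```

In this toy data `contig_a` and `contig_b` land in the same bin whether or not they come from the same organism, which is the kind of ambiguity that a second signal such as methylation might help resolve.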
Visit the PacBio Developer's Network Website for the most up-to-date links to downloads, documentation and more.