
Analysis pipeline for running DNA circuits on the nanopore


uwmisl/dna-nanopore-computing


DNA Circuit Detection from Raw Nanopore Sensing Data

Analysis pipeline for extracting, filtering, classifying, and quantifying DNA circuit output on a nanopore sensing platform. Bulk raw data was collected from Oxford Nanopore Technologies' MinION using R9.4.1 flow cells and a custom MinKNOW run script.

Adapted from https://github.com/uwmisl/NanoporeTERs, which uses this pipeline for peptide detection.

System Requirements and Installation

This software runs on Linux. The classification algorithms also use a GPU (CUDA 10.0).
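Before running the classification step, it can help to confirm that PyTorch can see a CUDA device. The sketch below relies only on torch.cuda.is_available() and torch.version.cuda; the helper name gpu_status is hypothetical and is not part of this repository.

```python
def gpu_status():
    """Report whether PyTorch is installed and can see a CUDA device.

    Hypothetical helper, not part of this repository. Returns a dict so the
    result can be checked programmatically as well as printed.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is missing, so the GPU-based classification cannot run.
        return {"torch_installed": False, "cuda_available": False, "cuda_version": None}
    return {
        "torch_installed": True,
        # True only if a CUDA device and a working driver are visible to PyTorch.
        "cuda_available": bool(torch.cuda.is_available()),
        # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
        "cuda_version": torch.version.cuda,
    }


if __name__ == "__main__":
    print(gpu_status())
```

For this pipeline, one would hope to see cuda_available set to True and a cuda_version of "10.0".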

This repository consists primarily of IPython (Jupyter) notebooks that were developed and tested on a Jupyter server running Python 2.7. Install the following dependencies:

  • dask (1.2.2)
  • future (0.17.1)
  • h5py (2.9.0)
  • joblib (0.14.0)
  • matplotlib (2.2.4)
  • numpy (1.16.2)
  • pandas (0.24.2)
  • scikit-learn (0.20.4)
  • scipy (1.2.2)
  • pytorch (1.2.0) for CUDA 10.0
  • yaml (0.1.7)
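A quick way to compare an environment against the versions listed above is to import each module and read its __version__ attribute. This is a minimal sketch; the helper check_versions and the PINNED table are hypothetical. Keys are import names, which differ from package names for scikit-learn (sklearn) and pytorch (torch). yaml is left out because 0.1.7 appears to be the version of the libyaml C library rather than of the Python yaml module.

```python
import importlib

# Pinned versions from the dependency list, keyed by import name (assumption:
# the Python module names below correspond to the listed packages).
PINNED = {
    "dask": "1.2.2",
    "future": "0.17.1",
    "h5py": "2.9.0",
    "joblib": "0.14.0",
    "matplotlib": "2.2.4",
    "numpy": "1.16.2",
    "pandas": "0.24.2",
    "sklearn": "0.20.4",
    "scipy": "1.2.2",
    "torch": "1.2.0",
}


def check_versions(pinned):
    """Return {import_name: (installed_version, matches_pin)}.

    installed_version is None when the module cannot be imported or does not
    define __version__.
    """
    report = {}
    for name, wanted in sorted(pinned.items()):
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = (None, False)
            continue
        installed = getattr(module, "__version__", None)
        # Drop a local build label such as "+cu100" before comparing.
        base = str(installed).split("+")[0] if installed else None
        report[name] = (installed, base == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in sorted(check_versions(PINNED).items()):
        print("%-12s %-14s %s" % (name, installed, "OK" if ok else "MISMATCH"))
```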

Installing these dependencies should take only a few minutes, except for PyTorch, which can take several hours depending on download speed.

How to Use

The input for this analysis pipeline is the bulk raw fast5 file generated by MinKNOW after an experimental run. Details of the experimental run, including the times at which each analyte is introduced, should be recorded in a Google spreadsheet. An example of this spreadsheet can be found here.

Open nanopore_experiments/prep_experiment_notebook.ipynb. Set date in Cell 2 to match the experiment, set f5_base_dir to the directory containing the raw fast5 file, and set output_dir to the directory where output capture data should be written. Then run the entire notebook. This creates a new experiment notebook, nanopore_experiments/experiment_DATE_FLOWCELL.ipynb, and a config file, nanopore_experiments/configs/segment_DATE_FLOWCELL.yml.
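The naming convention for the generated files can be checked programmatically. The sketch below builds the two expected paths from a date and flow cell ID and, when run from the repository root, reports whether they exist. The function expected_outputs is hypothetical; the example values come from the demo experiment (20210118, FAP26604).

```python
import os


def expected_outputs(date, flowcell, repo_root="."):
    """Return (notebook_path, config_path) that the prep notebook is
    described as creating for the given experiment date and flow cell ID.

    Hypothetical helper; it only encodes the naming convention in this README.
    """
    notebook = os.path.join(repo_root, "nanopore_experiments",
                            "experiment_%s_%s.ipynb" % (date, flowcell))
    config = os.path.join(repo_root, "nanopore_experiments", "configs",
                          "segment_%s_%s.yml" % (date, flowcell))
    return notebook, config


if __name__ == "__main__":
    # Demo experiment from this README; run from the repository root.
    for path in expected_outputs("20210118", "FAP26604"):
        print("%s: %s" % (path, "exists" if os.path.exists(path) else "missing"))
```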

Open the newly generated experiment notebook. The notebook, along with the Methods section of the accompanying manuscript, describes the expected behavior and available parameters of each major step in the data processing pipeline. Run all cells in the notebook in sequential order.
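As an alternative to clicking through the notebook in Jupyter, the cells can be executed top to bottom headlessly. This is a sketch using nbformat and nbconvert's ExecutePreprocessor; neither is in the dependency list, although both are usually available wherever Jupyter is installed. The helper name run_notebook and the default timeout are assumptions.

```python
import os


def run_notebook(path, timeout=3600):
    """Execute every cell of a notebook in order and save the outputs in place.

    Hypothetical helper. timeout is in seconds and applies to each cell.
    Execution stops with an error at the first cell that fails.
    """
    # Imported here so this module can be loaded without nbconvert installed.
    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    nb = nbformat.read(path, as_version=4)
    # With no kernel_name given, the kernel recorded in the notebook's metadata is used.
    ep = ExecutePreprocessor(timeout=timeout)
    # Use the notebook's own directory as the working directory so relative
    # paths inside it resolve the same way they would in Jupyter.
    workdir = os.path.dirname(os.path.abspath(path))
    ep.preprocess(nb, {"metadata": {"path": workdir}})
    nbformat.write(nb, path)
```

For example, run_notebook("nanopore_experiments/experiment_20210118_FAP26604.ipynb") would re-run the demo notebook. Saving in place overwrites previous outputs, so copy the notebook first if those outputs should be kept.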

The output from this pipeline should include:

  • Split fast5 files for each analyte, saved to the same directory as the bulk raw fast5
  • Example nanopore traces for each analyte, saved to nanopore_experiments/plots
  • Map of good channels for each analyte, saved to nanopore_experiments/plots
  • Capture metadata for each analyte, saved to user-defined output_dir
  • Raw capture data for each analyte, saved to user-defined output_dir
  • Filtered and classified capture metadata for each analyte, saved to user-defined output_dir
  • Quantification of each analyte, saved to the concentration directory
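Splitting the run by analyte relies on the introduction times recorded in the run log. A minimal sketch of that idea, assuming each analyte occupies the flow cell from its introduction until the next one is added, maps a capture's start time to an analyte with binary search. The function name, example times, and analyte labels are hypothetical, and the notebook's actual splitting logic may differ; the sketch ignores anything in the log other than introduction times.

```python
import bisect


def assign_analyte(capture_start, intro_times, analytes):
    """Return the analyte present in the flow cell at capture_start.

    intro_times: introduction times in seconds from run start, sorted ascending.
    analytes: analytes[i] is the analyte introduced at intro_times[i].
    Captures that start before the first introduction return None.
    """
    # bisect_right places a capture that starts exactly at an introduction
    # time in the newly introduced analyte's segment.
    i = bisect.bisect_right(intro_times, capture_start) - 1
    return analytes[i] if i >= 0 else None


if __name__ == "__main__":
    # Hypothetical run log: seconds from run start at which each analyte was added.
    intro_times = [0, 600, 1500]
    analytes = ["buffer", "circuit_A", "circuit_B"]
    for t in [30, 600, 1499, 2000]:
        print("%5d s -> %s" % (t, assign_analyte(t, intro_times, analytes)))
```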

Demo

An example raw fast5 file is provided here (file size ~6 GB), corresponding to the experiment logged on the example spreadsheet.

The fully executed experiment notebook for this demo is provided at nanopore_experiments/experiment_20210118_FAP26604.ipynb. The expected runtime for this demo, from raw fast5 file to quantification results, is ~10 minutes. Expected results for both time-until-capture-based and frequency-based quantification are provided in the concentration directory.
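To illustrate what the two quantification modes measure, the sketch below computes both quantities for a single channel from capture start and end times. These are illustrative definitions assumed for this example, not the exact computations in the experiment notebook. Time until capture is taken as the open-pore wait before each capture, measured from the window start for the first capture and from the end of the previous capture afterwards. Capture frequency is captures per minute over the window. Function names and example numbers are hypothetical.

```python
def time_until_capture(captures, window_start):
    """Mean open-pore waiting time (seconds) before each capture on one channel.

    captures: (start, end) pairs in seconds, sorted by start, non-overlapping.
    Open time after the last capture is ignored, since it ends without a capture.
    Returns None if there are no captures.
    """
    if not captures:
        return None
    waits = []
    prev_end = window_start
    for start, end in captures:
        waits.append(start - prev_end)
        prev_end = end
    return sum(waits) / float(len(waits))


def capture_frequency(captures, window_start, window_end):
    """Captures per minute on one channel over [window_start, window_end)."""
    minutes = (window_end - window_start) / 60.0
    return len(captures) / minutes


if __name__ == "__main__":
    # Hypothetical channel: three captures in a one-minute window.
    caps = [(10, 12), (20, 21), (31, 33)]
    print(time_until_capture(caps, 0))      # waits of 10, 8 and 10 s
    print(capture_frequency(caps, 0, 60))   # 3 captures in 1 minute
```

Both functions operate on one channel; combining per-channel values across the good channels is a separate choice not shown here.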
