This repository contains code developed by the Frank Gu Lab for DISCO-NMR data processing. The code transforms integral tables describing individual DISCO-NMR runs into proton-level summaries of interactive behaviour.
To build intuition for the steps underlying DISCO data processing in this repository, we provide a tutorial Colab notebook, linked here.
We also have a graphical user interface version of the repository to simplify use, available at this link.
However, if you prefer to run the code locally to scale analysis beyond 7 inputs, and additionally output all interim data tables, the present repository provides a comprehensive "script mode" implementation.
Additionally, data table outputs from running this repository in script mode are directly compatible with the Disco Figures publication plotting template. If you are working on publishing DISCO-NMR results and wish to use the template, conducting your data processing in this repository enables compatibility.
Both the GUI and the present repository are built from identical underlying data processing scripts.
```
├── LICENSE
├── README.md                    <- The top-level README for this project
├── data
│   ├── input                    <- Place Excel books of integral table data here
│   └── output                   <- The code will auto-generate output folders for each input
│
├── docs/source/src_modules      <- Sphinx documentation files
├── src                          <- Source code
│   ├── discoprocess             <- Helper functions for data processing
│   ├── disco-data-processing.py <- Key global data processing executable
│   ├── standard-figures.py      <- Auto-generates buildup curve and fingerprint plots
│   ├── custom-figures.py        <- Scripts for customizing auto-generated figures
│   ├── dashboard.py             <- Obsolete, preliminary dashboard code
│   ├── requirements.txt         <- Pip version of requirements for the analysis environment
│   └── environment.yml          <- Conda version of requirements for the analysis environment
└── tests                        <- Unit test implementation in Pytest
```
Do one of the following:

- Clone this repository to a directory of your choice on your computer using the command line or GitHub Desktop.
- Download the ZIP archive of the repository, then move and extract it in the directory of your choice on your computer.
- Download and install Anaconda.
- Navigate to the project directory.
- Open Anaconda Prompt (or Terminal) in this directory.
- Run the following command from Anaconda Prompt (or Terminal) to automatically create an environment from the requirements.txt file (a filled-in example follows this list):

  $ conda create --name <env-name> --file requirements.txt

- Run the following command to activate the environment (where <env-name> is the name entered between brackets above):

  conda activate <env-name>
- You are now ready to open and run files in the repository in a code editor of your choice that runs your virtual environment (e.g., VSCode).
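For example, with a hypothetical environment name of disco (substitute any name you like), the create and activate steps above become:

$ conda create --name disco --file requirements.txt
$ conda activate disco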
For detailed information about creating, managing, and working with Conda environments, please see the corresponding help page.
If you prefer to manage your packages using pip, navigate in Terminal to the project directory and run the command below to install the prerequisite packages into your virtual environment:
$ pip install -r requirements.txt
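If you do not already have a virtual environment for the project, a minimal sketch of the pip route, assuming a hypothetical environment named disco-env created with Python's built-in venv module, is:

$ python -m venv disco-env
$ source disco-env/bin/activate  # on Windows: disco-env\Scripts\activate
$ pip install -r requirements.txt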
With either install option, you may need to create an additional Jupyter Notebook kernel containing your virtual environment, if it does not automatically appear. See this guide for more information.
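As one hedged example (this assumes the ipykernel package, which is not necessarily included in requirements.txt, and reuses the hypothetical environment name disco-env from above), you can register the activated environment as a Jupyter kernel with:

$ pip install ipykernel
$ python -m ipykernel install --user --name disco-env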
- Place Excel books of integral tables in the `data/input` folder (an example book, HPC 370kDa, is provided in the folder to quickstart your first run of the code).
- Navigate in Terminal to the `src` directory.
- Run the command `python disco-data-processing.py` to execute the script (see the example after this list).
- Outputs will be generated automatically in `data/output`. If it is your first time running the script, the output directory will be created automatically.
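Putting the steps together, a typical first run from the project root looks like:

$ cd src
$ python disco-data-processing.py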
- output from `disco-data-processing.py` executable
- output from `disco-data-processing.py` executable
- output from `standard-figures.py` executable
Currently, 100% of unit tests pass (on a Windows machine).

To run the unit tests, navigate to the `tests` directory and run the command `pytest` in Terminal.
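For example, from the project root:

$ cd tests
$ pytest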
Please format one experiment, and all its technical replicates, as one Excel book.

Within a book, each tab corresponds to the integral tables of one technical replicate. The example book for HPC 370kDa is provided in the input folder as a template for the format requirements.