
DISCO Data Processing Code 🕺⚙️

This repository contains the code by the Frank Gu Lab for DISCO-NMR data processing.

This code transforms integral tables describing individual DISCO-NMR runs into proton-level summaries of interactive behaviour.

To build intuition for the steps underlying DISCO Data Processing in this repository, we provide a teaching tutorial Colab notebook, linked here.

Script Mode Data Processing vs Graphical User Interface

We also have a graphical user interface version of the repository to simplify use, available at this link.

However, if you prefer to run the code locally to scale analysis beyond 7 inputs, and to additionally output all interim data tables, the present repository provides a comprehensive "script mode" implementation.

Additionally, data table outputs from running this repository in script mode are directly compatible with the Disco Figures publication plotting template. If you are working on publishing DISCO-NMR results and wish to use the template, conducting your data processing in this repository enables compatibility.

Both the GUI and the present repository are built from identical underlying data processing scripts.


Project Organization

├── LICENSE
├── README.md      <- The top-level README for this project.
├── data
│   ├── input      <- Place Excel books of integral table data here
│   └── output     <- The code will auto-generate output folders for each input
│
├── docs/source/src_modules  <- Sphinx documentation files
├── src  <- source code
│   ├── discoprocess <- contains helper functions for data processing
│   ├── disco-data-processing.py <- Key global data processing executable
│   ├── standard-figures.py <- Auto-generate buildup curve and fingerprint plots
│   ├── custom-figures.py   <- Auto-generated figure customization scripts
│   ├── dashboard.py        <- Obsolete, preliminary dashboard code
│   ├── requirements.txt  <- Pip version of requirements for the analysis environment
│   └── environment.yml   <- Conda version of requirements for the analysis environment
└── tests  <- Unit test implementation in Pytest

Running the code locally in script mode

1. Clone or download this GitHub repository:

Do one of the following:

  • Clone this repository to a directory of your choice on your computer using the command line or GitHub Desktop.

  • Download the ZIP archive of the repository, then move and extract it into the directory of your choice on your computer.

2. Install dependencies using Anaconda or Pip

Instructions for installing dependencies via Anaconda:

  1. Download and install Anaconda

  2. Navigate to the project directory

  3. Open Anaconda prompt in this directory (or Terminal)

  4. Run the following command from the Anaconda prompt (or Terminal) to automatically create an environment from the requirements.txt file: $ conda create --name <env-name> --file requirements.txt

  5. Run the following command to activate the environment: $ conda activate <env-name> (where <env-name> is the name you chose above)

  6. You are now ready to open and run files in the repository in a code editor of your choice that uses your virtual environment (e.g. VSCode). A consolidated example of these commands is shown below.

For detailed information about creating, managing, and working with Conda environments, please see the corresponding help page.
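For example, assuming you choose the environment name disco (any name works), the full sequence from the project directory would be:

$ conda create --name disco --file requirements.txt
$ conda activate disco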

Instructions for installing dependencies with pip

If you prefer to manage your packages using pip, navigate in Terminal to the project directory and run the command below to install the prerequisite packages into your virtual environment:

$ pip install -r requirements.txt

With either install option, you may need to register an additional Jupyter Notebook kernel for your virtual environment if it does not appear automatically. See this guide for more information.
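As a minimal sketch, assuming your environment is named disco and the ipykernel package is installed in it, the kernel can typically be registered with:

$ python -m ipykernel install --user --name disco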


Running the data processing code

  1. Place Excel books of integral tables in the data/input folder (an example HPC 370kDa book is provided in the folder as a quickstart for your first run)
  2. Navigate in Terminal to the src directory
  3. Run the command python disco-data-processing.py to execute the script (see the example below)
  4. Outputs will be written to data/output. If this is your first run of the script, the output directory will be generated automatically.
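For example, from the repository root:

$ cd src
$ python disco-data-processing.py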

Expected Output

Per Book Outputs

  • Output from the disco-data-processing.py executable

Merged Datasets: Proton Binding, Quality Check

  • Output from the disco-data-processing.py executable

Standard Figures

  • Output from the standard-figures.py executable

Unit Tests

Currently, 100% of unit tests pass (on a Windows machine).

To run unit tests, navigate to the tests directory, and run the command pytest in Terminal.
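For example, from the repository root:

$ cd tests
$ pytest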

Excel Input Data Formatting Requirements

Please format one experiment, and all its technical replicates, as one Excel Book.

Within a book, each tab corresponds to the integral tables of a technical replicate. The exemplary book for HPC 370kDa is provided in the input folder as a template for format requirements.
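As a minimal sketch of this layout, assuming pandas is available in your analysis environment and using a hypothetical book name, the tabs (technical replicates) of an input book can be inspected as follows:

import pandas as pd

# Hypothetical input book placed in data/input; the provided HPC 370kDa book follows the same layout
book_path = "data/input/my_experiment_book.xlsx"

# sheet_name=None loads every tab into a dict of DataFrames, one per technical replicate
replicates = pd.read_excel(book_path, sheet_name=None)

for tab_name, integral_table in replicates.items():
    print(tab_name, integral_table.shape)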

To Do: Add Guidance on MestreNova Pre-processing and Input Book Formatting
