pl-unirep_analysis

https://img.shields.io/docker/v/fnndsc/pl-unirep_analysis?sort=semver https://img.shields.io/github/license/fnndsc/pl-unirep_analysis

unirep_analysis is a ChRIS app that wraps the UniRep project (https://github.com/churchlab/UniRep).

This plugin is GPU-capable. The 64-unit model should run on any machine. The full-sized (1900-unit) model requires a machine with more than 8 GB of GPU RAM.

For full information about the underlying method, consult the UniRep publication:

Paper: https://www.nature.com/articles/s41592-019-0598-1

The source code of UniRep is available on GitHub: https://github.com/churchlab/UniRep.

unirep_analysis                                                     \
                            [--dimension <modelDimension>]          \
                            [--batch_size <batchSize>]              \
                            [--learning_rate <learningRate>]        \
                            [--inputFile <inputFileToProcess>]      \
                            [--inputGlob <inputGlobPattern>]        \
                            [--modelWeightPath <pathToWeights>]     \
                            [--outputFile <resultOutputFile>]       \
                            [--topModelTraining]                    \
                            [--jointModelTraining]                  \
                            [--json]                                \
                            <inputDir>                              \
                            <outputDir>

unirep_analysis is a ChRIS-based "plugin" application that can infer protein sequence representations and perform generative modelling (aka "babbling").

Simply pull the docker image,

docker pull fnndsc/pl-unirep_analysis

and go straight to the examples section.

[--dimension <modelDimension>]
By default, the <modelDimension> is 64. However, the value can be changed
to 1900 (full) or 256 and the corresponding weights files (present inside
the container) will be used.

[--batch_size <batchSize>]
This represents the batch size of the babbler. Default value is 12.

[--learning_rate <learningRate>]
The learning rate, which is needed to build the model. Default is 0.001.

[--inputFile <inputFileToProcess>]
The name of the input text file that contains your amino acid sequences.
The default file name is an empty string. The full path to the
<inputFileToProcess> is constructed by joining <inputDir> and the file name:

        <inputDir>/<inputFileToProcess>

[--inputGlob <inputGlobPattern>]
A glob pattern string, default '**/*txt', that identifies the file containing
an amino acid sequence. This parameter allows the input directory to be
searched dynamically for a sequence file; the first "hit" is used.

[--modelWeightPath <pathToWeights>]
A path to a directory containing model weight files to use for inference.

[--outputFile <resultOutputFile>]
The name of the formatted output 'txt' file. Default name is 'format.txt'.

[--topModelTraining]
If specified, run training that optimizes only the top model.

[--jointModelTraining]
If specified, jointly train the top model and the mLSTM.

[-h]
Display inline help.

[--json]
If specified, print a JSON representation of the app.
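
The documented defaults and the input-file lookup can be summarised in a short Python sketch. This is not the plugin's actual source: it uses plain argparse rather than the ChRIS plugin machinery, the argument types are assumptions, and the helper name resolve_input_file is hypothetical. Two further assumptions are that --inputFile takes precedence over --inputGlob when both are given, and that glob hits are sorted so the "first hit" is deterministic.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented options and defaults (types are assumed)."""
    parser = argparse.ArgumentParser(prog="unirep_analysis")
    parser.add_argument("--dimension", type=int, default=64,
                        choices=[64, 256, 1900])
    parser.add_argument("--batch_size", type=int, default=12)
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--inputFile", default="")
    parser.add_argument("--inputGlob", default="**/*txt")
    # The default for --modelWeightPath is not stated in this README.
    parser.add_argument("--modelWeightPath", default=None)
    parser.add_argument("--outputFile", default="format.txt")
    parser.add_argument("--topModelTraining", action="store_true")
    parser.add_argument("--jointModelTraining", action="store_true")
    parser.add_argument("--json", action="store_true")
    parser.add_argument("inputDir")
    parser.add_argument("outputDir")
    return parser


def resolve_input_file(input_dir, input_file="", input_glob="**/*txt"):
    """Return the sequence file to process (hypothetical helper).

    Assumption: an explicit file name wins; otherwise the glob pattern is
    searched under input_dir and the first (sorted) hit is returned.
    """
    base = Path(input_dir)
    if input_file:
        # <inputDir>/<inputFileToProcess>
        return base / input_file
    hits = sorted(p for p in base.glob(input_glob) if p.is_file())
    if not hits:
        raise FileNotFoundError(
            "no file matching %r under %s" % (input_glob, base))
    return hits[0]


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--inputFile", "sequence.txt", "/incoming", "/outgoing"])
    print(resolve_input_file(args.inputDir, args.inputFile, args.inputGlob))
```

With no --inputFile, the same call falls back to the glob search, so a lone text file anywhere under the input directory would be picked up.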

This plugin is executed via Docker.

To run using docker, be sure to assign an "input" directory to /incoming and an output directory to /outgoing. Make sure that the $(pwd)/out directory is world writable!

Now, prefix all calls with

docker run --rm -v $(pwd)/in:/incoming -v $(pwd)/out:/outgoing      \
        fnndsc/pl-unirep_analysis                                   \
        unirep_analysis                                             \

Thus, getting inline help is:

mkdir in out && chmod 777 out
docker run --rm -v $(pwd)/in:/incoming -v $(pwd)/out:/outgoing      \
        fnndsc/pl-unirep_analysis                                   \
        unirep_analysis                                             \
        -h                                                          \
        /incoming /outgoing

Assuming that the <inputDir> layout conforms to

<inputDir>
    │
    └──█ sequence.txt

to process this (by default on a GPU) do

docker run   --rm --gpus all                                             \
             -v $(pwd)/in:/incoming -v $(pwd)/out:/outgoing              \
             fnndsc/pl-unirep_analysis unirep_analysis                   \
             --inputFile sequence.txt --outputFile formatted.txt         \
             /incoming /outgoing

(note that the --gpus all flag is not strictly required), which will create in the <outputDir>:

<outputDir>
    │
    └──█ formatted.txt
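
For scripted runs, the invocation above can be assembled programmatically. The sketch below builds the same docker run argument list in Python and only prints it rather than running it. The helper name docker_command and its parameters are hypothetical and are not part of the plugin.

```python
import shlex
from pathlib import Path


def docker_command(in_dir, out_dir, input_file, output_file, use_gpu=True):
    """Build the docker argument list for one run (hypothetical helper)."""
    cmd = ["docker", "run", "--rm"]
    if use_gpu:
        # Optional, as noted above.
        cmd += ["--gpus", "all"]
    cmd += [
        # Host directories are mounted on /incoming and /outgoing.
        "-v", "%s:/incoming" % Path(in_dir).resolve(),
        "-v", "%s:/outgoing" % Path(out_dir).resolve(),
        "fnndsc/pl-unirep_analysis", "unirep_analysis",
        "--inputFile", input_file,
        "--outputFile", output_file,
        "/incoming", "/outgoing",
    ]
    return cmd


if __name__ == "__main__":
    # Print the command instead of executing it.
    print(shlex.join(
        docker_command("in", "out", "sequence.txt", "formatted.txt")))
```

To actually execute the run, the returned list could be passed to subprocess.run(cmd, check=True).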

To perform in-line debugging of the container, do

docker run --rm -it --userns=host  -u $(id -u):$(id -g)                                     \
    -v $PWD/unirep_analysis.py:/usr/local/lib/python3.5/dist-packages/unirep_analysis.py:ro \
    -v $PWD/src:/usr/local/lib/python3.5/dist-packages/src                                  \
       -v $PWD/in:/incoming:ro -v $PWD/out:/outgoing:rw -w /outgoing                        \
       local/pl-unirep_analysis2 unirep_analysis /incoming /outgoing

Note, if you want to use pudb for debugging, then omit the -u $(id -u):$(id -g):

docker run --rm -it --userns=host                                                           \
    -v $PWD/unirep_analysis.py:/usr/local/lib/python3.5/dist-packages/unirep_analysis.py:ro \
    -v $PWD/src:/usr/local/lib/python3.5/dist-packages/src                                  \
       -v $PWD/in:/incoming:ro -v $PWD/out:/outgoing:rw -w /outgoing                        \
       local/pl-unirep_analysis2 unirep_analysis /incoming /outgoing

Of course, in both cases above, use appropriate CLI args if required.


_-30-_