RAATK

RAATK: A Python-based reduced amino acid toolkit of machine learning for protein sequence-level inference.

Installation

It is recommended to install with pip from GitHub.

$ pip install git+https://github.com/huang-sh/raatk.git@master -U

or

$ pip install raatk

After installing RAATK, all commands from the paper can be tested by running demo.sh in the demo directory.

$ ./demo.sh

Function

view reduced amino acid alphabets
reduce amino acid sequence
extract sequence feature
evaluation
result visualization
ROC evaluation
feature selection
train model
prediction
split data
transfer format

Command

view

$ raatk view -t 9 -s 2 4 6 10 12 14 16 --visual

Output:

type9  2  IMVLFWY-GPCASTNHQEDRK                   BLOSUM50 matrix
type9  4  IMVLFWY-G-PCAST-NHQEDRK                 BLOSUM50 matrix
type9  6  IMVL-FWY-G-P-CAST-NHQEDRK               BLOSUM50 matrix
type9  10 IMV-L-FWY-G-P-C-A-STNH-QERK-D           BLOSUM50 matrix
type9  12 IMV-L-FWY-G-P-C-A-ST-N-HQRK-E-D         BLOSUM50 matrix
type9  14 IMV-L-F-WY-G-P-C-A-S-T-N-HQRK-E-D       BLOSUM50 matrix
type9  16 IMV-L-F-W-Y-G-P-C-A-S-T-N-H-QRK-E-D     BLOSUM50 matrix

reduce

Reduce sequences according to the built-in reduced alphabets. The output is stored in directories.

$ raatk reduce positive.txt negative.txt -t 1-8 -s 2-19 -o pos neg

Reduce sequences according to a specific amino acid clustering. The output is written to a single file.

$ raatk reduce positive.txt -c IMV-L-FWY-G-P-C-A-STNH-QERK-D -o reduce_positive.txt
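The idea behind the cluster string can be illustrated with a short standalone Python sketch. It does not use RAATK's own code; `build_mapping` and `reduce_sequence` are hypothetical helper names. It assumes each cluster is represented by its first letter, which is consistent with the type9 size-4 alphabet IMVLFWY-G-PCAST-NHQEDRK being named IGPN in the file names used in this document. Leaving unknown residues unchanged is also an assumption of the sketch.

```python
def build_mapping(clusters: str) -> dict:
    """Map every amino acid to the first letter of its cluster."""
    mapping = {}
    for group in clusters.split("-"):
        for aa in group:
            mapping[aa] = group[0]
    return mapping


def reduce_sequence(seq: str, clusters: str) -> str:
    """Rewrite a sequence in the reduced alphabet defined by `clusters`."""
    mapping = build_mapping(clusters)
    # Assumption: residues not covered by the clustering (e.g. X) are kept as-is.
    return "".join(mapping.get(aa, aa) for aa in seq.upper())


clusters = "IMVLFWY-G-PCAST-NHQEDRK"  # type9, size 4
print(reduce_sequence("MKTAYIAKQR", clusters))  # INPPIIPNNN
```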

extract

Extract sequence features from directories. The output is also stored in directories.

$ raatk extract pos neg -k 3 -d -o k3 -m

Extract sequence features from files. The output is written to a file.

$ raatk extract pos/type9/4-IGPN.txt neg/type9/4-IGPN.txt -k 1 -o t9s4-k1.csv -m -raa IGPN

Output:

label,I,G,P,N
0.000000,0.125000,0.062500,0.562500,0.250000
0.000000,0.291667,0.166667,0.416667,0.125000
0.000000,0.277778,0.083333,0.416667,0.222222
                  ......
1.000000,0.177778,0.133333,0.377778,0.311111
1.000000,0.166667,0.000000,0.583333,0.250000
1.000000,0.387097,0.161290,0.322581,0.129032

Extract a feature file without the label column, using raw counts as feature values.

$ raatk extract pos/type9/4-IGPN.txt -k 1 -o t9s4-k1p.csv -raa IGPN --count --label-f

Output:

I,G,P,N
2.000000,1.000000,9.000000,4.000000
7.000000,4.000000,10.000000,3.000000
10.000000,3.000000,15.000000,8.000000
                  ......
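The two output styles above (frequencies and raw counts) can be reproduced in principle with a small k-mer composition sketch. This is a standalone illustration, not RAATK's implementation: `kmer_features` is a hypothetical name, and normalizing by the number of counted windows is an assumption. The toy sequence is made up so that its composition matches the first row of each output shown above.

```python
from itertools import product


def kmer_features(seq, alphabet, k=1, count=False):
    """Return k-mer counts or frequencies over a reduced alphabet."""
    # Enumerate all k-mers in a fixed order, e.g. I, G, P, N for k=1.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing unknown letters
            counts[kmer] += 1
            total += 1
    if count:
        return [float(counts[m]) for m in kmers]
    # Assumption: frequencies are counts divided by the number of counted windows.
    return [counts[m] / total if total else 0.0 for m in kmers]


toy = "II" + "G" + "P" * 9 + "N" * 4  # hypothetical reduced sequence
print(kmer_features(toy, "IGPN", k=1, count=True))  # [2.0, 1.0, 9.0, 4.0]
print(kmer_features(toy, "IGPN", k=1))              # [0.125, 0.0625, 0.5625, 0.25]
```

With k=3 and a four-letter alphabet the same function yields 4^3 = 64 features per sequence.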

eval

Evaluate the performance of different reduced alphabets using a machine learning classifier. The output is a JSON file.

$ raatk eval k3 -d -o k3-eval -clf svm -c 2 -g 0.5 -p 3

Evaluate a single feature file.

$ raatk eval k3/type2/10-ARNCQHIFPW.csv -cv -1 -c 2 -g 0.5 -o k3-t2s10.txt

Output:

     0    1
0   38    7
1    7   36

      tp   fn   fp   tn   recall  precision  f1-score  
  0   38    7    7   36    0.84     0.84       0.84    
  1   36    7    7   38    0.84     0.84       0.84    
acc                                            0.84
mcc                                            0.68
-------------------------------------------------------
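The metrics in the report follow from the confusion matrix by the standard definitions. The sketch below recomputes the row for class 0 from the counts shown above; `binary_metrics` is a hypothetical helper written for this illustration and is not part of RAATK.

```python
from math import sqrt


def binary_metrics(tp, fn, fp, tn):
    """Compute recall, precision, F1, accuracy and MCC from confusion counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    acc = (tp + tn) / (tp + fn + fp + tn)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "precision": precision,
            "f1": f1, "acc": acc, "mcc": mcc}


# Counts for class 0 taken from the report above.
for name, value in binary_metrics(tp=38, fn=7, fp=7, tn=36).items():
    print(f"{name}: {value:.2f}")
# recall: 0.84, precision: 0.84, f1: 0.84, acc: 0.84, mcc: 0.68
```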

plot

Visualize the results stored in the JSON file.

$ raatk plot k3-eval.json -o k3p

roc

ROC evaluation

$ raatk roc k3/type2/10-ARNCQHIFPW.csv -clf svm -cv 5 -c 2 -g 0.5 -o roc

ifs

Incremental feature selection.

$ raatk ifs k3/type2/10-ARNCQHIFPW.csv -s 2 -clf svm -cv 5 -c 2 -g 0.5 -o ifs
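As a rough illustration of the general technique, incremental feature selection evaluates growing prefixes of a ranked feature list and keeps the best-scoring prefix. The sketch below is generic and is not RAATK's implementation: `incremental_feature_selection`, `score_fn`, and `step` are names introduced here, `step` is not claimed to correspond to the `-s` option, and the toy scoring function stands in for a cross-validated classifier score.

```python
def incremental_feature_selection(ranked_features, score_fn, step=1):
    """Score growing prefixes of a ranked feature list and keep the best one."""
    best_score, best_subset = float("-inf"), []
    history = []
    # Prefix sizes step, 2*step, ...; a final partial step is not evaluated.
    for n in range(step, len(ranked_features) + 1, step):
        subset = ranked_features[:n]
        score = score_fn(subset)
        history.append((n, score))
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score, history


# Toy stand-in for a cross-validated score: informative features help,
# every extra feature costs a small penalty.
informative = {"a", "b", "c"}


def toy_score(subset):
    return len(informative & set(subset)) - 0.1 * len(subset)


best, score, history = incremental_feature_selection(
    ["a", "b", "c", "d", "e"], toy_score, step=1)
print(best)  # ['a', 'b', 'c']
```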

train

Train a classifier for prediction.

$ raatk train ifs_best.csv -clf svm -c 2 -g 0.5 -o svm.model -prob

predict

Predict new data using a trained model. The new data must be a feature file without labels, and its features must be extracted with the same parameters as the training features.

$ raatk predict new_data.csv -m svm.model -o 'test-result.csv'

split

Split feature data into training and test subsets.

$ raatk split ifs_best.csv -ts 0.3 -o test_split.csv
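A test size of 0.3 means roughly 30% of the rows are held out for testing. The following standalone sketch shows a simple shuffled split. It is an illustration only: `train_test_split` here is a hypothetical local function, and whether RAATK shuffles or stratifies by label is not stated in this document.

```python
import random


def train_test_split(rows, test_size=0.3, seed=0):
    """Shuffle rows and hold out a fraction of them as the test subset."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


rows = [f"sample{i}" for i in range(10)]  # hypothetical feature rows
train, test = train_test_split(rows, test_size=0.3)
print(len(train), len(test))  # 7 3
```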

transfer

Convert a CSV file to ARFF for Weka.

$ raatk transfer ifs_best.csv -fmt arff
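For readers unfamiliar with ARFF, the sketch below shows what such a conversion involves: a relation name, one attribute declaration per column, and the data rows. It is a minimal standalone illustration, not RAATK's converter. `csv_to_arff`, the relation name, and the choice to write the label as a nominal attribute with integer values are all assumptions of the sketch.

```python
import csv
import io


def csv_to_arff(csv_text, relation="features", label_col="label"):
    """Convert CSV text with a header row into ARFF text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]

    label_idx = header.index(label_col) if label_col in header else None
    if label_idx is not None:
        # Assumption: write class labels as integers (0.000000 -> 0).
        for row in rows:
            row[label_idx] = str(int(float(row[label_idx])))

    lines = ["@RELATION " + relation, ""]
    for i, name in enumerate(header):
        if i == label_idx:
            classes = sorted({row[i] for row in rows})
            lines.append("@ATTRIBUTE " + name + " {" + ",".join(classes) + "}")
        else:
            lines.append("@ATTRIBUTE " + name + " NUMERIC")
    lines += ["", "@DATA"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


sample = (
    "label,I,G,P,N\n"
    "0.000000,0.125000,0.062500,0.562500,0.250000\n"
    "1.000000,0.177778,0.133333,0.377778,0.311111\n"
)
print(csv_to_arff(sample))
```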

Contact

If you have any problems, contact me at hsh-me@outlook.com.
