- Added a new argument `batches` to `parSeqSim()`. The new argument supports breaking down the pairwise similarity computation into smaller batches. This is useful when you have a large number of protein sequences and enough CPU cores, but not enough RAM to compute and hold all the pairwise similarities in a single batch. Also, use the other new argument `verbose` to track the computation progress.
- Added a new function `parSeqSimDisk()`. Compared to the in-memory version `parSeqSim()`, this new function caches the partial results of each batch to the hard drive and merges them at the end. This can further reduce memory usage for parallel similarity computations involving a large number of protein sequences.
- Fixed an issue in `parGOSim()` that created minor numerical inconsistencies in results due to argument matching.
- Updated `twoGOSim()` and `parGOSim()` to use the latest `GOSemSim` API for computing GO-based semantic similarity. Issues in the code examples are also fixed. We thank Denisa Duma for the feedback.
- Fixed the API endpoint issue (from HTTP to HTTPS) in `getUniProt()`.
- Added two new parameters `gap.opening` and `gap.extension` to `parSeqSim()`, allowing more flexible tuning of the sequence alignment for more types of amino acid sequence data. We thank Dr. Maisa Pinheiro for the feedback.
- Added floating TOC and new CSS style in the vignette to improve navigation and readability.
- Added a new function `removeGaps()` for removing or replacing gaps (`-`) or any irregular characters in protein sequences, making them suitable for feature extraction or sequence alignment based similarity computation. We thank Dr. Maisa Pinheiro for the feedback.
- Resolved a critical bug due to improper `ifelse` conditioning (3f6e106) for the distribution descriptor in CTD. We thank Jielu Yan from the University of Macau for kindly reporting this issue.
- General fixes and improvements for the package vignette.
- The function list is now organized into sections on the package website (https://nanx.me/protr/reference/).
- Use system font stack instead of Google Fonts in vignettes to avoid pandoc SSL issue.
- Converted table images to markdown tables in the vignette
- Updated the screenshot of protrweb in the vignette
- Migrated from Sweave-based PDF vignette to knitr-based HTML vignette
- Fix obsolete URLs
- Better R code formatting
- Better function documentation and vignette formatting
- New website: https://nanx.me/protr/
- Added Windows continuous integration support using AppVeyor.
- Better R file naming scheme
- Added continuous integration
- Code style improvements
- Fix URLs that cannot be accessed by `curl -I -L`:
  - Use http://protr.org
  - Remove all inaccessible URLs
- Bug fix in `extractCTDD()`
- Improvements for dealing with boundary cases in several functions (thanks for @koefoed's patches)
- Added citation information
- Minor improvements and fixes for documentation
- Added functions allowing users to specify their own classification of amino acids
- Documentation improvements
- Other minor improvements
- General documentation improvements
- Added profile-based descriptors derived by PSSM
- Added example workflow using protr in the vignette
- Added LICENSE file according to CRAN policies
- second release
- added Proteochemometric (PCM) Modeling descriptors, parallelized similarity computation derived by protein sequence alignment, and Gene Ontology (GO) semantic similarity measures between a list of protein sequences / GO terms / Entrez Gene IDs
- added misc tools and datasets
- initial version of Scales-Based Descriptors derived by Principal Components Analysis
- initial version of Scales-Based Descriptors derived by AA-Properties (AAindex)
- initial version of Scales-Based Descriptors derived by 20+ classes of 2D and 3D Molecular Descriptors
- initial version of Scales-Based Descriptors derived by Factor Analysis
- initial version of Scales-Based Descriptors derived by Multidimensional Scaling
- initial version of BLOSUM and PAM Matrix-Derived Descriptors
- initial version of parallelized pairwise similarity calculation with a list of protein sequences
- initial version of pairwise semantic similarity calculation with a list of GO terms / Entrez Gene IDs
- initial version of Auto Cross Covariance (ACC) for generating scales-based descriptors of the same length
- introducing ProtrWeb, the web service based on protr: http://protr.org
- initial version
- first version of Amino Acid Composition descriptor
- first version of Dipeptide Composition descriptor
- first version of Tripeptide Composition descriptor
- first version of Normalized Moreau-Broto Autocorrelation descriptor
- first version of Moran Autocorrelation descriptor
- first version of Geary Autocorrelation descriptor
- first version of CTD - Composition descriptor
- first version of CTD - Transition descriptor
- first version of CTD - Distribution descriptor
- first version of Conjoint Triad descriptor
- first version of Sequence Order Coupling Number descriptor
- first version of Quasi-Sequence-Order descriptor
- first version of Pseudo Amino Acid Composition descriptor
- first version of Amphiphilic Pseudo Amino Acid Composition descriptor
- first version of `readFASTA()`
- first version of `getUniProt()`
- first version of `protcheck()`
- first version of `protseg()`