ml-toolbox

This repo contains data science strategies and machine learning models for structured as well as unstructured data. It contains modules for feature preprocessing, feature engineering, machine learning models, Bayesian parameter tuning, etc. Some of these features are collected from existing libraries such as scikit-learn, keras, h2o, xgboost, lightgbm, catboost, etc. I have also added some techniques that I implemented by following research papers and the advice of data scientists on Kaggle. There are also many feature engineering strategies that I developed during ML contests and that helped me a lot in those contests.

I use this toolbox for my personal work and update it regularly to make it more generic.

Feature preprocessing
- cleaning
- handling null value
- normalization
- grouping unknown variables
- memory optimization
- text preprocessing
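
As an illustration of the normalization step listed above, here is a minimal, dependency-free sketch of z-score scaling. The function names `fit_standardizer` and `standardize` are hypothetical and are not taken from this repo's `src`. The sketch only shows the usual pattern of learning statistics on the training data and reusing them on new data.

```python
from statistics import mean, pstdev


def fit_standardizer(train_values):
    """Learn the mean and standard deviation from the training data only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)
    # Guard against constant columns, which would otherwise divide by zero.
    if sigma == 0:
        sigma = 1.0
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply z-score scaling with statistics learned on the training data."""
    return [(v - mu) / sigma for v in values]


train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

mu, sigma = fit_standardizer(train)
train_scaled = standardize(train, mu, sigma)
test_scaled = standardize(test, mu, sigma)  # reuses the train statistics
```

Reusing `mu` and `sigma` on the test split keeps test-set information out of the preprocessing step.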
Feature Engineering
- label-encoder/one-hot/binary/hashing
- binning/quantile-binning
- target-encoding
- bayesian-encoding
- feature-interaction
- date-time feature
- time-lag feature (in time series)
- rounding/decimal value
- relation feature(aggregation based)
- text feature using tf-idf, count-vect
- clustering based feature(linear/non-linear)
- polynomial feature
- statistical feature
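
To make the target-encoding entry above concrete, here is a small sketch of one common variant, smoothed mean target encoding. It is an assumption about what such an encoder typically does, not the implementation in this repo. The names `fit_target_encoding`, `apply_target_encoding`, and the `smoothing` parameter are hypothetical.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of the target.

    encoded = (category_sum + smoothing * global_mean) / (count + smoothing)
    Rare categories are pulled toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    mapping = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return mapping, global_mean


def apply_target_encoding(categories, mapping, global_mean):
    """Encode categories; unseen categories fall back to the global mean."""
    return [mapping.get(cat, global_mean) for cat in categories]


cities = ["a", "a", "a", "b"]
clicked = [1, 1, 0, 0]

mapping, prior = fit_target_encoding(cities, clicked, smoothing=1.0)
encoded = apply_target_encoding(["a", "b", "c"], mapping, prior)
```

In a contest setting the mapping is usually fitted on out-of-fold data, so that a row's own target does not leak into its encoded feature.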
EDA
- boxplot/kdeplot/countplot/pairplot
- heatmap
Machine Learning models
- Tree-based models
  - xgboost/lightgbm/catboost
  - sklearn: decision-tree/random-forest/extra-tree/GBM
- Linear/Non-Linear Model:
  - Logistic-Regression/Lasso/Ridge/Passive-Aggressive/SVM
- Regularized Greedy Forest (in progress)
- field-aware factorization machine
- online learning (Vowpal Wabbit/follow the regularized leader) (in progress)
- h2o models (gbm/rf/nn/auto-ml)
Deep learning models
- neural networks(keras/tensorflow)
- Attention mechanism for LSTM
- Data augmentation(for image)
- cyclic-learning-rate
- keras custom loss-function/metric/callbacks
- pretrained model
- word2vec usage with gensim
- entity embedding
- data on fly(efficient training)
- segmentation
- graph based neural network
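
The cyclic-learning-rate entry above can be illustrated with the triangular policy, in which the learning rate ramps linearly between a lower and an upper bound. This is a framework-independent sketch. The function name `triangular_lr` and its default values are assumptions for illustration, and wiring it into a Keras callback is left out.

```python
import math


def triangular_lr(iteration, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    """Triangular cyclic learning rate.

    The rate rises from base_lr to max_lr over step_size iterations,
    then falls back to base_lr over the next step_size iterations.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    # x is the distance from the peak of the current cycle, in [0, 1].
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# One full cycle with a short step size, for inspection.
schedule = [triangular_lr(i, base_lr=0.1, max_lr=0.5, step_size=4) for i in range(9)]
```

With `step_size=4`, the schedule starts at `base_lr`, peaks at iteration 4, and returns to `base_lr` at iteration 8.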
Advanced
- gridsearch / bayesian optimization
- Stacking/blending/rank-average
- Bert-Model(pretrained text model)
- parallel-processing for feature-engineering(in progress)
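
As a sketch of the rank-average blending listed above: each model's predictions are converted to ranks, scaled to [0, 1], and averaged, so that models with differently calibrated scores contribute equally. The helper names `rank_scores` and `rank_average` are hypothetical and are not this repo's API.

```python
def rank_scores(scores):
    """Convert scores to ranks scaled to [0, 1]; tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    denom = max(len(scores) - 1, 1)
    return [r / denom for r in ranks]


def rank_average(*model_scores):
    """Average the scaled ranks of several models, row by row."""
    ranked = [rank_scores(s) for s in model_scores]
    return [sum(col) / len(col) for col in zip(*ranked)]


model_a = [0.10, 0.40, 0.35, 0.80]
model_b = [0.20, 0.90, 0.30, 0.70]
blended = rank_average(model_a, model_b)
```

Rank averaging only preserves ordering, so it suits ranking metrics such as AUC rather than metrics that need calibrated probabilities.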

