SVM Specializer
Here we describe how to use the SVM specializer in your Python code and give examples. Please refer to the PhD dissertation for (a lot) more detail. The specializer uses the efficient GPU code from Catanzaro et al. It can train a two-class classifier using the Sequential Minimal Optimization (SMO) algorithm and classify new examples into one of the two classes. It supports the following kernel functions:
- Linear
- Gaussian
- Polynomial
- Sigmoid
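For reference, the four kernel families listed above are commonly written as shown in the following numpy sketch. This is only an illustration of the textbook formulas, not the specializer's GPU code; the function names and the parameters gamma, a, r and degree are assumptions chosen for this example and are not taken from the specializer's API.

```python
import numpy as np

def linear_kernel(x, y):
    # K(x, y) = x . y
    return np.dot(x, y)

def gaussian_kernel(x, y, gamma=1.0):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    diff = x - y
    return np.exp(-gamma * np.dot(diff, diff))

def polynomial_kernel(x, y, a=1.0, r=0.0, degree=3):
    # K(x, y) = (a * x . y + r)^degree
    return (a * np.dot(x, y) + r) ** degree

def sigmoid_kernel(x, y, a=1.0, r=0.0):
    # K(x, y) = tanh(a * x . y + r)
    return np.tanh(a * np.dot(x, y) + r)

# Two small example vectors (hypothetical data).
x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
```

Each function takes two vectors of equal length and returns a scalar similarity value.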
The SVM specializer comes with the PyCASP framework; see the PyCASP Manual.
After installing PyCASP, import the specializer in your Python script like so:
from svm_specializer.svm import *
Creating an SVM object is just like creating an object of any class in Python. The SVM constructor does not take any parameters:
svm = SVM()
The constructor allocates the needed data structures internally. The size-dependent structures (i.e. the support vectors) cannot be sized at this point: only when data is passed to the object for training does the specializer know how much it needs to allocate.
gmm = GMM(M, D, cvtype='diag', means=my_means, covars=my_covar, weights=my_weights)
Here means, covars and weights are numpy arrays. Note: training the GMM overwrites these parameters with the newly trained values. If you are using parameters from a different GMM, make a copy of the parameters first and pass the copy to the GMM constructor.
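The reason for copying can be seen with plain numpy, independent of the specializer. The sketch below uses hypothetical array names and simulates an in-place update of the kind training might perform; it shows that an alias shares its buffer with the original array, while a copy does not.

```python
import numpy as np

# Hypothetical parameters taken from a previously trained model.
M, D = 4, 3
old_means = np.zeros((M, D))

alias = old_means              # refers to the same underlying buffer
safe_copy = old_means.copy()   # independent buffer with the same values

# Simulate an in-place update (an assumption about how training overwrites data).
alias += 1.0

shared = np.shares_memory(alias, old_means)               # True: old_means changed too
independent = not np.shares_memory(safe_copy, old_means)  # True: the copy is untouched
```

Passing something like safe_copy to the constructor leaves the original model's parameters intact.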
To train the GMM object using the Expectation-Maximization (EM) algorithm on a set of observations, use the train() function:
lkld = gmm.train(data, min_em_iters=1, max_em_iters=3)
Here data is an N by D numpy array of observation vectors (N vectors, each of D dimensions), and min_em_iters and max_em_iters bound the number of EM iterations (both optional, default min = 1, max = 10). It returns the likelihood of the trained GMM fitting the data.
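To make the role of the two iteration bounds concrete, here is a minimal numpy sketch of EM for a diagonal-covariance GMM. It is not the specializer's implementation: the function names, the convergence tolerance tol, the variance floor, and the in-place parameter updates are all assumptions made for this illustration.

```python
import numpy as np

def log_gauss_diag(data, means, covars):
    """Log density of every observation under every diagonal Gaussian, shape (N, M)."""
    diff = data[:, None, :] - means[None, :, :]                 # (N, M, D)
    mahal = np.sum(diff ** 2 / covars[None, :, :], axis=2)      # (N, M)
    log_det = np.sum(np.log(2.0 * np.pi * covars), axis=1)      # (M,)
    return -0.5 * (mahal + log_det[None, :])

def em_train(data, means, covars, weights, min_iters=1, max_iters=10, tol=1e-4):
    """Run at least min_iters and at most max_iters EM iterations.

    Parameters are updated in place. Returns the total log-likelihood
    computed in the last E-step that was run.
    """
    prev = -np.inf
    total = prev
    for it in range(max_iters):
        # E-step: responsibilities of each component for each observation.
        log_joint = log_gauss_diag(data, means, covars) + np.log(weights)[None, :]
        log_norm = np.logaddexp.reduce(log_joint, axis=1)       # (N,)
        resp = np.exp(log_joint - log_norm[:, None])            # (N, M)
        total = log_norm.sum()
        # M-step: re-estimate weights, means and diagonal covariances.
        nk = resp.sum(axis=0) + 1e-12
        weights[:] = nk / nk.sum()
        means[:] = resp.T.dot(data) / nk[:, None]
        second_moment = resp.T.dot(data ** 2) / nk[:, None]
        covars[:] = np.maximum(second_moment - means ** 2, 1e-6)
        # Stop early only once the minimum number of iterations has been reached.
        if it + 1 >= min_iters and abs(total - prev) < tol * abs(total):
            break
        prev = total
    return total

# Hypothetical data: two well-separated 2-D clusters.
rng = np.random.RandomState(0)
data = np.vstack([rng.randn(100, 2) - 3.0, rng.randn(100, 2) + 3.0])
means = np.array([[-1.0, -1.0], [1.0, 1.0]])
covars = np.ones((2, 2))
weights = np.array([0.5, 0.5])
log_lkld = em_train(data, means, covars, weights, min_iters=1, max_iters=3)
```

With min_iters greater than max_iters the loop would stop at max_iters without ever checking convergence, which is why the minimum should not exceed the maximum.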
To compute the log-likelihood of the trained GMM on a new set of observations, use the score() function:
log_lklds = gmm.score(data)
Here data is an N by D numpy array. The function returns a numpy array of N log-likelihoods, one for each observation vector. To get cumulative statistics about the data, you can use numpy.average() or numpy.sum().
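As a small illustration of those two summaries, the sketch below uses a hand-made array of log-likelihoods (hypothetical values standing in for the output of score()). The sum is the joint log-likelihood of all observations if they are treated as independent, and the average is the per-observation log-likelihood.

```python
import numpy as np

# Hypothetical per-observation log-likelihoods for N = 3 observations.
log_lklds = np.log(np.array([0.5, 0.25, 0.125]))

total_log_lkld = np.sum(log_lklds)    # log of the product of the three likelihoods
avg_log_lkld = np.average(log_lklds)  # total divided by the number of observations
```

The average is usually the more convenient number when comparing datasets of different sizes.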
We emulate the functionality provided by sklearn.mixture.GMM by providing other functions to evaluate trained GMMs.
log_lklds, posteriors = gmm.eval(data)
returns N log-likelihoods, and N by M posterior probabilities (a probability of each component explaining each event).
log_lklds, indexes = gmm.decode(obs_data)
returns N log-likelihoods, and N indexes indicating which component most probably explained each event.
indexes = gmm.predict(obs_data)
returns N indexes indicating which component most probably explained each event.
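The relationship between the log-likelihoods, the posteriors and the most probable component indexes can be sketched in numpy as follows. The table of joint probabilities is hypothetical, and this shows the standard mixture-model computation rather than the specializer's own code.

```python
import numpy as np

# Hypothetical joint probabilities weight_m * p(x_n | component m) for N = 3, M = 2.
joint = np.array([[0.30, 0.10],
                  [0.05, 0.15],
                  [0.20, 0.20]])
log_joint = np.log(joint)

# Per-observation log-likelihood: log of the sum over components.
log_lklds = np.logaddexp.reduce(log_joint, axis=1)     # shape (N,)

# Posterior probability of each component for each observation.
posteriors = np.exp(log_joint - log_lklds[:, None])    # shape (N, M)

# Index of the most probable component (ties resolve to the lowest index).
indexes = np.argmax(posteriors, axis=1)                # shape (N,)
```

Each row of posteriors sums to one, and indexes picks the largest entry in each row.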
You can access the GMM mean, covariance and weight parameters like so:
means = gmm.components.means
covariance = gmm.components.covars
weights = gmm.components.weights
means is an M by D array (number of components by number of dimensions), covariance is an M by D by D array (number of components by number of dimensions by number of dimensions), and weights is an array of size M (number of components).
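A quick way to sanity-check arrays against these shapes is sketched below. The helper check_gmm_shapes is hypothetical and not part of the specializer; the example parameters are placeholder values built with numpy.

```python
import numpy as np

def check_gmm_shapes(means, covars, weights):
    """Return True if the arrays have shapes (M, D), (M, D, D) and (M,)."""
    M, D = means.shape
    return covars.shape == (M, D, D) and weights.shape == (M,)

# Placeholder parameters: M = 4 components in D = 3 dimensions.
M, D = 4, 3
means = np.zeros((M, D))
covars = np.tile(np.eye(D), (M, 1, 1))   # one identity covariance per component
weights = np.full(M, 1.0 / M)            # uniform mixture weights summing to 1

shapes_ok = check_gmm_shapes(means, covars, weights)
```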
This is a simple example that takes a training dataset training_data, creates a 32-component GMM and trains it on the data, and then computes the average log-likelihood of a testing dataset:
from gmm_specializer.gmm import *
import numpy as np
# get_training_data() and get_testing_data() stand in for your own data loaders
training_data = np.array(get_training_data()) # training_data.shape = (N1, D)
testing_data = np.array(get_testing_data()) # testing_data.shape = (N2, D)
M = 32
D = training_data.shape[1] # get the D dimension from the data
gmm = GMM(M, D, cvtype='diag') # create new GMM object
gmm.train(training_data, max_em_iters=5) # train the GMM on the training data
log_lklds = gmm.score(testing_data) # compute the log-likelihoods of the testing data observations
print("Average log likelihood for testing data = %f" % np.average(log_lklds))
To be continued...