SVM Specializer
Here we describe how to use the SVM specializer in your Python code and give examples. Please refer to the PhD dissertation for (a lot) more detail. The specializer uses the efficient GPU code from Catanzaro et al. It can train a two-class classifier using the SMO algorithm and classify new examples into one of two classes (for more detail, see the Catanzaro paper). It supports the following kernel functions (their conventional forms are sketched below):
- Linear
- Gaussian
- Polynomial
- Sigmoid
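For reference, these kernel names conventionally correspond to the functions below; the a, r, d and gamma parameters match those described under train(). This NumPy sketch only illustrates the standard definitions and is not part of the specializer's API:
import numpy as np

def linear_kernel(x, y):
    return np.dot(x, y)                           # x . y

def gaussian_kernel(x, y, gamma):
    return np.exp(-gamma * np.sum((x - y) ** 2))  # exp(-gamma * ||x - y||^2)

def polynomial_kernel(x, y, a, r, d):
    return (a * np.dot(x, y) + r) ** d            # (a * x . y + r)^d

def sigmoid_kernel(x, y, a, r):
    return np.tanh(a * np.dot(x, y) + r)          # tanh(a * x . y + r)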
The SVM specializer comes with the PyCASP framework; see the PyCASP Manual.
After installing PyCASP, you need to import the SVM specializer in your Python script like so:
from svm_specializer.svm import *
Creating an SVM object is just like creating an object of any other class in Python. The SVM constructor doesn't take any parameters:
svm = SVM()
The constructor allocates the needed data structures internally. The specializer only knows how large these structures (i.e. the support vectors) need to be once data is passed to the object for training.
To train the SVM object using the SMO algorithm on a set of observations, use the train() function (a minimal usage sketch follows the parameter list below):
svm.train(input_data, labels, kernel_type, paramA = None, paramB = None, paramC = None, heuristicMethod = None, tolerance = None, cost = None, epsilon = None)
where the parameters are:
- `input_data` = input data
- `labels` = input data labels
- `kernel_type` = can be `linear`, `gaussian`, `polynomial` or `sigmoid`
- `paramA` = parameter a for polynomial and sigmoid kernels (default = 1/nPoints) or gamma for the gaussian kernel (default = 1/nPoints), where nPoints = number of training points
- `paramB` = parameter r for polynomial and sigmoid kernels (default = 1)
- `paramC` = parameter d for the polynomial kernel (default = 3)
- `heuristicMethod` = one of the heuristic methods from the Catanzaro paper: first, second or adaptive (default = adaptive)
- `tolerance` = termination criterion tolerance (default = 0.001)
- `cost` = SVM training cost C (default = 10)
- `epsilon` = support vector threshold (default = 1e-5)
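For illustration, here is a minimal training sketch with a Gaussian kernel. It assumes the kernel is selected by passing its name as a string; get_training_data() and get_training_labels() are hypothetical helpers, and the gamma value is arbitrary:
from svm_specializer.svm import *
import numpy as np

input_data = np.array(get_training_data())    # hypothetical helper, shape = (nPoints, D)
labels = np.array(get_training_labels())      # hypothetical helper, one class label per point
svm = SVM()
svm.train(input_data, labels, "gaussian", paramA = 0.5)  # gamma = 0.5 chosen only for illustration
Parameters left unspecified (None) fall back to the defaults listed above.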
GMM Specializer
To compute the log-likelihood of a trained GMM on a new set of observations, use the score() function:
log_lklds = gmm.score(data)
where data is an N by D numpy array. The function returns a numpy array of N log-likelihoods, one for each observation vector. To get cumulative statistics about the data, you can use numpy.average() or numpy.sum().
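For example, with numpy imported as np and log_lklds computed as above:
avg_log_lkld = np.average(log_lklds)   # average log-likelihood over the N observations
sum_log_lkld = np.sum(log_lklds)       # total log-likelihood over the dataset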
You can access the GMM mean, covariance and weight parameters like so:
means = gmm.components.means
covariance = gmm.components.covars
weights = gmm.components.weights
means is an M by D array (number of components by number of dimensions), covariance is an M by D by D array (number of components by number of dimensions by number of dimensions) and weights is an array of size M (number of components).
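For example, a quick sanity check of these shapes for a trained model, using the variables from the snippet above:
print means.shape        # (M, D)
print covariance.shape   # (M, D, D)
print weights.shape      # (M,)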
This is a simple example that takes a training dataset training_data, creates a 32-component GMM and trains it on the data, and then computes the average log-likelihood of a testing dataset:
from gmm_specializer.gmm import *
import numpy as np
training_data = np.array(get_training_data()) # training_data.shape = (N1, D)
testing_data = np.array(get_testing_data()) # testing_data.shape = (N2, D)
M = 32
D = training_data.shape[1] # get the D dimension from the data
gmm = GMM(M, D, cvtype=1) # create new GMM object
gmm.train(training_data, max_em_iters=5) # train the GMM on the training data
log_lklds = gmm.score(testing_data) # compute the log likelihoods of the testing data observations
print "Average log likelihood for testing data = ", np.average(log_lklds)