egonina edited this page Dec 20, 2013 · 7 revisions

Here we describe how to use the SVM specializer in your Python code and give examples. Please refer to the PhD dissertation for (a lot) more detail. The specializer uses the efficient GPU code from Catanzaro et al. It can train a two-class classifier using the SMO algorithm and classify new examples into one of two classes (for more detail, see the Catanzaro paper). It supports the following kernel functions:

  1. Linear
  2. Gaussian
  3. Polynomial
  4. Sigmoid

The SVM specializer comes with the PyCASP framework; see the PyCASP Manual.

Importing the specializer

After installing PyCASP, you need to import it in your Python script like so:

from svm_specializer.svm import *

Creating the SVM object

Creating an SVM object is just like creating an object of any class in Python. The SVM constructor doesn't take any parameters:

svm = SVM()

The needed data structures are allocated internally. The specializer only knows how large they (i.e. the support vectors) must be once data is passed to the object for training.

SVM Training

To train the SVM object using the SMO algorithm on a set of observations, use the train() function:

svm.train(input_data, labels, kernel_type, paramA = None, paramB = None, paramC = None, heuristicMethod = None, tolerance = None, cost = None, epsilon = None)

where the parameters are:

  • input_data = input data
  • labels = input data labels
  • kernel_type = one of linear, gaussian, polynomial or sigmoid (passed as a string, e.g. "linear")
  • paramA = parameter a for polynomial and sigmoid kernels (default = 1/nPoints) or gamma for gaussian kernel (default = 1/nPoints), where nPoints = number of training points
  • paramB = parameter r for polynomial and sigmoid kernels (default = 1)
  • paramC = parameter d for polynomial kernel (default = 3)
  • heuristicMethod = one of the heuristic methods (first, second or adaptive; see the Catanzaro paper). Adaptive by default.
  • tolerance = termination criterion tolerance (default = 0.001)
  • cost = SVM training cost C (default = 10)
  • epsilon = support vector threshold (default = 1e-5)
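
As a rough illustration of how paramA, paramB and paramC enter the kernel functions, here is a minimal pure-Python sketch. It assumes the conventional LIBSVM-style kernel definitions; the function names are hypothetical and are not part of the specializer, whose GPU implementation may differ in detail.

```python
import math

def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    # K(x, y) = x . y
    return dot(x, y)

def gaussian_kernel(x, y, gamma):
    # K(x, y) = exp(-gamma * ||x - y||^2); gamma plays the role of paramA
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def polynomial_kernel(x, y, a, r=1.0, d=3):
    # K(x, y) = (a * x . y + r)^d; a, r, d play the roles of paramA, paramB, paramC
    return (a * dot(x, y) + r) ** d

def sigmoid_kernel(x, y, a, r=1.0):
    # K(x, y) = tanh(a * x . y + r); a, r play the roles of paramA, paramB
    return math.tanh(a * dot(x, y) + r)

# Illustrative values only: with 4 training points, the documented
# default for paramA would be 1/nPoints = 0.25.
n_points = 4
default_a = 1.0 / n_points
x, y = [1.0, 2.0], [3.0, 4.0]
poly_value = polynomial_kernel(x, y, default_a)
```

Under these assumptions, leaving paramA, paramB and paramC unset for a polynomial kernel corresponds to a = 1/nPoints, r = 1 and d = 3.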

The training function trains the SVM on the input data using the specified parameters and stores the resulting support vectors in the SVM object. The support vectors can be accessed using the svm.support_vectors() call.

Input data is an N by D numpy array of input points (N points of D dimensions). Labels are a one-dimensional numpy array of N integers (-1 or 1). Input data and labels can be parsed from data stored in LIBSVM format using the read_data() function in the following test example.
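
For readers unfamiliar with the LIBSVM text format (one example per line, written as a label followed by 1-based index:value pairs for the nonzero features), the following is a minimal sketch of a parser. The function parse_libsvm_lines is hypothetical and is not the read_data() function from the test suite; it only illustrates the shape of the data that train() expects.

```python
def parse_libsvm_lines(lines, num_features):
    """Parse LIBSVM-format lines into dense rows and integer labels.

    Hypothetical helper for illustration; feature indices are 1-based
    and features absent from a line are treated as 0.0.
    """
    data, labels = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        labels.append(int(float(fields[0])))  # handles "+1", "-1", "1.0"
        row = [0.0] * num_features
        for item in fields[1:]:
            index, value = item.split(":")
            row[int(index) - 1] = float(value)
        data.append(row)
    return data, labels

sample_lines = ["+1 1:0.5 3:2.0", "-1 2:1.5"]
sample_data, sample_labels = parse_libsvm_lines(sample_lines, num_features=3)
```

Wrapping the two returned lists with np.array(...) would give an N by D array and a length-N label array in the layout described above.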

For more detail on the algorithm and implementation, see the Catanzaro paper and software.

Classification using a trained SVM

To classify new examples using a trained SVM, use the classify function:

result = svm.classify(input_data, labels)

where

  • input_data = input data
  • labels = ground truth labels; the classify call will compute the accuracy based on these labels

The classification call classifies the input data and returns the most likely class for each example. If ground truth labels are passed to it, it also computes the classification accuracy.
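
To make the accuracy computation concrete, here is a minimal sketch that assumes accuracy means the fraction of examples whose predicted class equals the ground truth label. The accuracy function below is hypothetical and is not part of the specializer, which may report the figure differently.

```python
def accuracy(predicted, truth):
    """Fraction of predictions that match the ground truth labels.

    Hypothetical helper for illustration only.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if len(truth) == 0:
        return 0.0
    correct = sum(1 for p, t in zip(predicted, truth) if p == t)
    return float(correct) / len(truth)

# Three of the four illustrative predictions match their labels.
example_accuracy = accuracy([1, -1, 1, 1], [1, -1, -1, 1])
```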

Example: Simple Training and Classification

This is a simple example that takes a training dataset training_data, trains a linear SVM and classifies a test set.

      from svm_specializer.svm import *
      import numpy as np

      training_data = np.array(get_training_data()) # training_data.shape = (N, D)
      training_labels = np.array(get_training_labels()) # training_labels.shape = (N,)

      testing_data = np.array(get_testing_data()) # testing_data.shape = (N2, D)
      testing_labels = np.array(get_testing_labels()) # testing_labels.shape = (N2)

      svm = SVM()
      svm.train(training_data, training_labels, "linear")
      result = svm.classify(testing_data, testing_labels) # "class" is a reserved word in Python

For more examples, see the SVM test suite and the speaker verification example. Sample data is available here.