
Here we describe how to use the GMM specializer in your Python code and give examples. Please refer to our HotPar'11 and ASRU'11 papers for details on the specializer and the speaker diarization example, respectively. Our specializer uses numpy to store and manipulate arrays. Note that this code is still under development.

The GMM specializer comes with the PyCASP framework; see the PyCASP Manual.

Importing the specializer

After installing PyCASP, you need to import it in your Python script like so:

from gmm_specializer.gmm import *

Creating the GMM object

Creating a GMM object is just like creating an object of any other Python class. You can create an empty GMM object by specifying its size (M = number of components, D = dimension of the feature vectors) and whether it has a diagonal or full covariance matrix (cvtype='diag' for diagonal, the default, or cvtype='full' for full):

gmm = GMM(M, D, cvtype='diag')

The parameters will be initialized randomly from the data when the train() function is called (see below). A GMM can also be initialized with existing parameters, like so:

gmm = GMM(M, D, cvtype='diag', means=my_means, covars=my_covars, weights=my_weights)

Where my_means, my_covars and my_weights are numpy arrays. Note: when the GMM is trained, these parameters will be overwritten by the newly learned ones. If you are reusing parameters from a different GMM, make sure to copy them first and pass the copy to the GMM constructor, as in the sketch below.
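
For example, a minimal sketch of reusing parameters from an already-trained GMM (other_gmm here is a hypothetical, previously trained GMM object; the components accessors are described in "Accessing the GMM parameters" below):

      my_means = other_gmm.components.means.copy()     # copy so that training the new GMM
      my_covars = other_gmm.components.covars.copy()   # does not overwrite other_gmm's
      my_weights = other_gmm.components.weights.copy() # parameters
      gmm = GMM(M, D, cvtype='diag', means=my_means, covars=my_covars, weights=my_weights)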

GMM Training

To train the GMM object using the Expectation-Maximization (EM) algorithm on a set of observations, use the train() function:

lkld = gmm.train(data, min_em_iters=1, max_em_iters=3)

Where data is an N by D numpy array of observation vectors (N vectors, each of dimension D), and min_em_iters and max_em_iters bound the number of EM iterations (both optional; the defaults are min = 1 and max = 10). The function returns the likelihood of the trained GMM fitting the data.
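
As a minimal sketch, training a small GMM on synthetic data (the random data here is only a stand-in for real observations):

      from gmm_specializer.gmm import *
      import numpy as np

      N = 1000; D = 8; M = 4
      data = np.random.randn(N, D)  # N observation vectors, each of dimension D

      gmm = GMM(M, D, cvtype='diag')
      lkld = gmm.train(data, max_em_iters=10)
      print "Likelihood of the trained GMM fitting the data =", lkld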

Computing likelihood given the trained GMM

To compute the log-likelihood of the trained GMM on a new set of observations use the score() function:

log_lklds = gmm.score(data)

Where data is an N by D numpy array. The function returns a numpy array of N log-likelihoods, one for each observation vector. To get cumulative statistics over the data, you can use numpy.average() or numpy.sum(), as shown below.
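
For example, to summarize the per-observation log-likelihoods (assuming numpy is imported as np):

      avg_log_lkld = np.average(log_lklds)  # average log-likelihood per observation
      sum_log_lkld = np.sum(log_lklds)      # total log-likelihood of the dataset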

Other evaluation functions for trained GMMs

We emulate the functionality provided by sklearn.mixture.GMM with additional functions for evaluating trained GMMs.

log_lklds, posteriors = gmm.eval(data) returns N log-likelihoods and an N by M array of posterior probabilities (the probability of each component explaining each observation).

log_lklds, indexes = gmm.decode(obs_data) returns N log-likelihoods and N indexes indicating which component most probably explained each observation.

indexes = gmm.predict(obs_data) returns N indexes indicating which component most probably explained each observation.
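
A minimal sketch of these calls on a trained GMM, with gmm and data as in the sections above:

      log_lklds, posteriors = gmm.eval(data)  # posteriors.shape = (N, M)
      log_lklds, indexes = gmm.decode(data)   # most likely component per observation
      indexes = gmm.predict(data)             # same indexes, without the log-likelihoods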

Accessing the GMM parameters

You can access the GMM mean, covariance and weight parameters like so:

means = gmm.components.means

covariance = gmm.components.covars

weights = gmm.components.weights

means is an M by D array (number of components by number of dimensions), covariance is an M by D by D array (number of components by number of dimensions by number of dimensions) and weights is an array of size M (number of components).
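
For example, a small sketch that prints each component's learned weight and mean after training (assuming gmm has been trained as above):

      for m in range(M):
          print "Component", m, "weight =", gmm.components.weights[m]
          print "  mean vector =", gmm.components.means[m]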

Example: Simple Training and Evaluation

This is a simple example that takes a training dataset training_data, creates a 32-component GMM, trains it on the data, and then computes the average log-likelihood of a testing dataset:

      from gmm_specializer.gmm import *
      import numpy as np

      training_data = np.array(get_training_data()) # training_data.shape = (N1, D)
      testing_data = np.array(get_testing_data()) # testing_data.shape = (N2, D)

      M = 32
      D = training_data.shape[1] # get the D dimension from the data

      gmm = GMM(M, D, cvtype='diag') # create new GMM object

      gmm.train(training_data, max_em_iters=5) # train the GMM on the training data

      log_lklds = gmm.score(testing_data) # compute the log-likelihoods of the testing data observations

      print "Average log likelihood for testing data = ", np.average(log_lklds) 

Other Examples

To be continued...