[RFC] Update ICA interface and implementation #122

Closed · wants to merge 8 commits
2 changes: 1 addition & 1 deletion .travis.yml
@@ -4,7 +4,7 @@ os:
- linux
julia:
- 1.0
- 1.1
- 1.4
- nightly
notifications:
email: false
7 changes: 4 additions & 3 deletions Project.toml
@@ -9,15 +9,16 @@ version = "0.7.0"
[deps]
Arpack = "7d9fca2a-8960-54d3-9f78-7d1dccf2cb97"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"

[compat]
StatsBase = ">=0.29"

[extras]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["Test"]

[compat]
StatsBase = ">=0.29"
25 changes: 11 additions & 14 deletions docs/source/ica.rst
@@ -3,7 +3,7 @@ Independent Component Analysis

`Independent Component Analysis <http://en.wikipedia.org/wiki/Independent_component_analysis>`_ (ICA) is a computational technique for separating a multivariate signal into additive subcomponents, with the assumption that the subcomponents are non-Gaussian and independent from each other.

There are multiple algorithms for ICA. Currently, this package implements the Fast ICA algorithm.

ICA
~~~~
@@ -17,7 +17,7 @@ The package uses a type ``ICA``, defined below, to represent an ICA model:
W::Matrix{T} # component coefficient matrix, of size (m, k)
end

**Note:** Each column of ``W`` here corresponds to an independent component.

Several methods are provided to work with ``ICA``. Let ``M`` be an instance of ``ICA``:

@@ -27,11 +27,11 @@ Several methods are provided to work with ``ICA``. Let ``M`` be an instance of `

.. function:: outdim(M)

Get the output dimension, *i.e* the number of independent components.

.. function:: mean(M)

Get the mean vector.

**Note:** if ``M.mean`` is empty, this function returns a zero vector of length ``indim(M)``.

@@ -47,13 +47,13 @@ One can use ``fit`` to perform ICA over a given data set.

.. function:: fit(ICA, X, k; ...)

Perform ICA over the data set given in ``X``.

:param X: The data matrix, of size ``(m, n)``. Each row corresponds to a mixed signal, while each column corresponds to an observation (*e.g* all signal value at a particular time step).

:param k: The number of independent components to recover.

:return: The resultant ICA model, an instance of type ``ICA``.

**Note:** If ``do_whiten`` is ``true``, the return ``W`` satisfies :math:`\mathbf{W}^T \mathbf{C} \mathbf{W} = \mathbf{I}`, otherwise ``W`` is orthonormal, *i.e* :math:`\mathbf{W}^T \mathbf{W} = \mathbf{I}`

@@ -65,13 +65,11 @@ One can use ``fit`` to perform ICA over a given data set.
=========== ======================================================= ===================
alg The choice of algorithm (must be ``:fastica``) ``:fastica``
----------- ------------------------------------------------------- -------------------
fun The approx neg-entropy functor. It can be obtained ``icagfun(:tanh)``
using the function ``icagfun``. Now, it accepts
fun The approx neg-entropy functor. Now, it accepts
the following values:

- ``icagfun(:tanh)``
- ``icagfun(:tanh, a)``
- ``icagfun(:gaus)``
- ``Tanh(a)``
- ``Gaus()``
----------- ------------------------------------------------------- -------------------
do_whiten Whether to perform pre-whitening ``true``
----------- ------------------------------------------------------- -------------------
@@ -107,12 +105,11 @@ The package also exports functions of the core algorithms. Sometimes, it can be

:param W: The initial un-mixing matrix, of size ``(m, k)``. The function updates this matrix inplace.
:param X: The data matrix, of size ``(m, n)``. This matrix is input only, and won't be modified.
:param fun: The approximate neg-entropy functor, which can be obtained using ``icagfun`` (see above).
:param fun: The approximate neg-entropy functor (one of ``Tanh(a)`` or ``Gaus()``).
:param maxiter: Maximum number of iterations.
:param tol: Tolerable change of ``W`` at convergence.
:param verbose: Whether to display iteration information.

:return: The updated ``W``.

**Note:** The number of components is inferred from ``W`` as ``size(W, 2)``.
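
To tie the documented pieces together, here is a minimal usage sketch, assuming the functor types introduced in this PR and an algorithm instance as required by the updated `src/ica.jl` (the keyword table above still lists the old `:fastica` symbol). The data in `X` is a random placeholder, `maxiter = 200` is arbitrary, and `FastICA` is qualified because it is not added to the export list in this PR:

```julia
using MultivariateStats, Statistics

X = randn(5, 1000)      # placeholder: 5 mixed signals observed at 1000 time steps

# recover 3 independent components with the tanh-based contrast function
M = fit(ICA, X, 3; alg = MultivariateStats.FastICA(),
        fun = Tanh(1.0), do_whiten = true, maxiter = 200)

indim(M), outdim(M)     # (5, 3)
S = transform(M, X)     # 3×1000 matrix of estimated source signals

# with do_whiten = true, W should approximately satisfy Wᵀ C W = I
C = cov(X, dims = 2)
M.W' * C * M.W          # close to the 3×3 identity, up to estimation error
```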

3 changes: 2 additions & 1 deletion src/MultivariateStats.jl
@@ -104,7 +104,8 @@ module MultivariateStats
## ica
ICA, # Type: the Fast ICA model

icagfun, # a function to get a ICA approx neg-entropy functor
Tanh, # An ICADeriv type
Gaus, # An ICADeriv type
fastica!, # core algorithm function for the Fast ICA

## fa
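
Since `icagfun` is removed from the export list in favour of the `Tanh` and `Gaus` functor types, existing call sites would migrate roughly as in the sketch below (the slope value `0.5` is only illustrative):

```julia
using MultivariateStats

# before this PR, contrast functions were built via the icagfun helper:
#   fun = icagfun(:tanh)         # Tanh with the default slope a = 1.0
#   fun = icagfun(:tanh, 0.5)    # Tanh with a custom slope
#   fun = icagfun(:gaus)         # Gaussian-based contrast
#
# after this PR, the functor types are constructed directly:
fun_tanh = Tanh(0.5)             # tanh-based approximate neg-entropy, slope a = 0.5
fun_gaus = Gaus()                # Gaussian-based approximate neg-entropy
```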
163 changes: 125 additions & 38 deletions src/ica.jl
@@ -2,6 +2,10 @@

#### FastICA type

abstract type AbstractICAAlg end

struct FastICA <: AbstractICAAlg end

struct ICA{T<:Real}
mean::Vector{T} # mean vector, of length m (or empty to indicate zero mean)
W::Matrix{T} # component coefficient matrix, of size (m, k)
@@ -24,28 +28,47 @@ transform(M::ICA, x::AbstractVecOrMat) = transpose(M.W) * centralize(x, M.mean)
#
# It returns a function value v, and derivative d
#
abstract type ICAGDeriv{T<:Real} end
abstract type ICAGDeriv end

struct Tanh{T} <: ICAGDeriv{T}
struct Tanh{T} <: ICAGDeriv
a::T
end

evaluate(f::Tanh{T}, x::T) where T<:Real = (a = f.a; t = tanh(a * x); (t, a * (1 - t * t)))

struct Gaus{T} <: ICAGDeriv{T} end
evaluate(f::Gaus{T}, x::T) where T<:Real = (x2 = x * x; e = exp(-x2/2); (x * e, (1 - x2) * e))
function update_UE!(f::Tanh{T}, U::AbstractMatrix{T}, E1::AbstractVector{T}) where T
n,k = size(U)
_s = zero(T)
a = f.a
@inbounds for j = 1:k
@inbounds @fastmath for i = 1:n
t = tanh(a * U[i,j])
U[i,j] = t
_s += a * (1 - t^2)
end
E1[j] = _s / n
end
end

struct Gaus <: ICAGDeriv end

## a function to get a g-fun
evaluate(f::Gaus, x::T) where T<:Real = (x2 = x * x; e = exp(-x2/2); (x * e, (1 - x2) * e))

icagfun(fname::Symbol, ::Type{T} = Float64) where T<:Real=
fname == :tanh ? Tanh{T}(1.0) :
fname == :gaus ? Gaus{T}() :
error("Unknown gfun $(fname)")
function update_UE!(f::Gaus, U::AbstractMatrix{T}, E1::AbstractVector{T}) where T
n,k = size(U)
_s = zero(T)
@inbounds for j = 1:k
for i = 1:n
u = U[i,j]
u2 = u^2
e = exp(-u2/2)
U[i,j] = u * e
_s += (1 - u2) * e
end
E1[j] = _s / n
end
end

icagfun(fname::Symbol, a::T) where T<:Real =
fname == :tanh ? Tanh(a) :
fname == :gaus ? error("The gfun $(fname) has no parameters") :
error("Unknown gfun $(fname)")

# Fast ICA
#
Expand All @@ -55,9 +78,9 @@ icagfun(fname::Symbol, a::T) where T<:Real =
# Independent Component Analysis: Algorithms and Applications.
# Neural Network 13(4-5), 2000.
#
function fastica!(W::DenseMatrix{T}, # initialized component matrix, size (m, k)
function ica!(::FastICA, W::DenseMatrix{T}, # initialized component matrix, size (m, k)
X::DenseMatrix{T}, # (whitened) observation sample matrix, size(m, n)
fun::ICAGDeriv{T}, # approximate neg-entropy functor
fun::ICAGDeriv, # approximate neg-entropy functor
maxiter::Int, # maximum number of iterations
tol::Real) where T<:Real# convergence tolerance

@@ -79,49 +102,39 @@ function fastica!(W::DenseMatrix{T}, # initialized component matrix, size (
# normalize each column
for j = 1:k
w = view(W,:,j)
rmul!(w, 1.0 / sqrt(sum(abs2, w)))
rmul!(w, one(T) / sqrt(sum(abs2, w)))
end

# main loop
chg = NaN
chg = T(NaN)
t = 0
converged = false
while !converged && t < maxiter
@inbounds while !converged && t < maxiter
t += 1
copyto!(Wp, W)

# apply W of previous step
mul!(U, transpose(X), W) # u <- w'x

# compute g(w'x) --> U and E{g'(w'x)} --> E1
_s = 0.0
for j = 1:k
for i = 1:n
u, v = evaluate(fun, U[i,j])
U[i,j] = u
_s += v
end
E1[j] = _s / n
end
update_UE!(fun, U, E1)

# compute E{x g(w'x)} --> Y
rmul!(mul!(Y, X, U), 1.0 / n)
rmul!(mul!(Y, X, U), one(T) / n)

# update w: E{x g(w'x)} - E{g'(w'x)} w := y - e1 * w
for j = 1:k
w = view(W,:,j)
y = view(Y,:,j)
e1 = E1[j]
for i = 1:m
w[i] = y[i] - e1 * w[i]
end
@. w = y - e1 * w
end

# symmetric decorrelation: W <- W * (W'W)^{-1/2}
copyto!(W, W * _invsqrtm!(W'W))

# compare with Wp to evaluate a conversion change
chg = maximum(abs.(abs.(diag(W*Wp')) .- 1.0))
chg = maximum(abs.(abs.(diag(W*Wp')) .- one(T)))
converged = (chg < tol)

@debug "Iteration $t" change=chg tolerance=tol
@@ -132,10 +145,10 @@ end

#### interface function

function fit(::Type{ICA}, X::AbstractMatrix{T}, # sample matrix, size (m, n)
k::Int; # number of independent components
alg::Symbol=:fastica, # choice of algorithm
fun::ICAGDeriv=icagfun(:tanh, T), # approx neg-entropy functor
alg::AbstractICAAlg, # choice of algorithm
fun::ICAGDeriv=Tanh(one(T)), # approx neg-entropy functor
do_whiten::Bool=true, # whether to perform pre-whitening
maxiter::Integer=100, # maximum number of iterations
tol::Real=1.0e-6, # convergence tolerance
@@ -146,8 +159,6 @@ function fit(::Type{ICA}, X::AbstractMatrix{T}, # sample matrix,
m, n = size(X)
n > 1 || error("There must be more than one samples, i.e. n > 1.")
k <= min(m, n) || error("k must not exceed min(m, n).")

alg == :fastica || error("alg must be :fastica")
maxiter > 1 || error("maxiter must be greater than 1.")
tol > 0 || error("tol must be positive.")

@@ -169,11 +180,87 @@ function fit(::Type{ICA}, X::AbstractMatrix{T}, # sample matrix,
W = (isempty(winit) ? randn(T, size(Z,1), k) : copy(winit))

# invoke core algorithm
fastica!(W, Z, fun, maxiter, tol)
ica!(alg, W, Z, fun, maxiter, tol)

# construct model
if do_whiten
W = W0 * W
end
return ICA(mv, W)
end
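
Routing `fit` through `ica!(alg, W, Z, fun, maxiter, tol)` means additional algorithms can be plugged in by dispatch on `AbstractICAAlg`. A hypothetical sketch of such an extension follows; `InfomaxICA` and its no-op `ica!` method are illustrative placeholders, not part of this PR, and the import assumes the unexported names stay where this diff defines them:

```julia
using MultivariateStats
import MultivariateStats: AbstractICAAlg, ICAGDeriv, ica!

# hypothetical second algorithm hooking into the same ica! entry point
struct InfomaxICA <: AbstractICAAlg end

function ica!(::InfomaxICA, W::DenseMatrix{T}, X::DenseMatrix{T},
              fun::ICAGDeriv, maxiter::Int, tol::Real) where T<:Real
    # placeholder body: a real implementation would update W in place
    return W
end

# fit would then select the implementation purely through the alg keyword, e.g.
#   M = fit(ICA, X, k; alg = InfomaxICA(), fun = Gaus())
```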



# function fastica!(W::DenseMatrix{T}, # initialized component matrix, size (m, k)
# X::DenseMatrix{T}, # (whitened) observation sample matrix, size(m, n)
# fun::ICAGDeriv{T}, # approximate neg-entropy functor
# maxiter::Int, # maximum number of iterations
# tol::Real) where T<:Real# convergence tolerance
#
# # argument checking
# m = size(W, 1)
# k = size(W, 2)
# size(X, 1) == m || throw(DimensionMismatch("Sizes of W and X mismatch."))
# n = size(X, 2)
# k <= min(m, n) || throw(DimensionMismatch("k must not exceed min(m, n)."))
#
# @debug "FastICA Algorithm" m=m n=n k=k
#
# # pre-allocated storage
# Wp = Matrix{T}(undef, m, k) # to store previous version of W
# U = Matrix{T}(undef, n, k) # to store w'x & g(w'x)
# Y = Matrix{T}(undef, m, k) # to store E{x g(w'x)} for components
# E1 = Vector{T}(undef, k) # store E{g'(w'x)} for components
#
# # normalize each column
# for j = 1:k
# w = view(W,:,j)
# rmul!(w, one(T) / sqrt(sum(abs2, w)))
# end
#
# # main loop
# chg = NaN
# t = 0
# converged = false
# @inbounds while !converged && t < maxiter
# t += 1
# copyto!(Wp, W)
#
# # apply W of previous step
# mul!(U, transpose(X), W) # u <- w'x
#
# # compute g(w'x) --> U and E{g'(w'x)} --> E1
# _s = zero(T)
# for j = 1:k
# for i = 1:n
# u, v = evaluate(fun, U[i,j])
# U[i,j] = u
# _s += v
# end
# E1[j] = _s / n
# end
#
# # compute E{x g(w'x)} --> Y
# rmul!(mul!(Y, X, U), one(T) / n)
#
# # update w: E{x g(w'x)} - E{g'(w'x)} w := y - e1 * w
# for j = 1:k
# w = view(W,:,j)
# y = view(Y,:,j)
# e1 = E1[j]
# @. w = y - e1 * w
#
# end
#
# # symmetric decorrelation: W <- W * (W'W)^{-1/2}
# copyto!(W, W * _invsqrtm!(W'W))
#
# # compare with Wp to evaluate a conversion change
# chg = maximum(abs.(abs.(diag(W*Wp')) .- one(T)))
# converged = (chg < tol)
#
# @debug "Iteration $t" change=chg tolerance=tol
# end
# converged || throw(ConvergenceException(maxiter, chg, oftype(chg, tol)))
# return W
# end