Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Refactor PCA and docs #163

Merged
merged 6 commits into from
Oct 28, 2021
Merged

Refactor PCA and docs #163

merged 6 commits into from
Oct 28, 2021

Conversation

wildart
Copy link
Collaborator

@wildart wildart commented Oct 20, 2021

Towards #109

@wildart wildart merged commit c0f74de into master Oct 28, 2021
@wildart wildart deleted the wa/pca_rewrite branch October 28, 2021 20:43
@eliascarv
Copy link

Hi. My name is Elias Carvalho.
I think the PCA struct can be improved by adding type parameters to abstract types. Going from this:

struct PCA{T<:Real} <: LinearDimensionalityReduction
    mean::AbstractVector{T}
    proj::AbstractMatrix{T}
    prinvars::AbstractVector{T}
    tprinvar::T
    tvar::T
end

To that:

struct PCA{T<:Real, V<:AbstractVector{T}, M<:AbstractMatrix{T}} <: LinearDimensionalityReduction
    mean::V
    proj::M
    prinvars::V
    tprinvar::T
    tvar::T
end

The idea is that fields with Abstract types in structs arr not good for performance, as presented here in documentation: https://docs.julialang.org/en/v1/manual/performance-tips/#Avoid-fields-with-abstract-type

@wildart
Copy link
Collaborator Author

wildart commented Dec 11, 2021

Could you provide an example where the current type will affect performance? I guess, if you need an array type of a projection field, that could require some time checking.

@eliascarv
Copy link

eliascarv commented Dec 11, 2021

Yes I can, consider the following code:

image

Now notice the performance difference:

image

The performance problem happens because it is not efficient to access a property that has an abstract type.

@wildart
Copy link
Collaborator Author

wildart commented Dec 11, 2021

This is not a general usage of the PCA object, you would not iterate over a mean or a projection matrix. Usually, it's a matrix multiplication as in here:

reconstruct(M::PCA, y::AbstractVecOrMat{T}) where {T<:Real} = decentralize(M.proj * y, M.mean)

So, if we test MM performance as follows

struct S1{T}
    a::AbstractMatrix{T}
end
struct S2{T, M<:AbstractMatrix{T}}
    a::M
end
 
test(t::S1{T}, v::V) where {T, V<:AbstractVector{T}} = t.a*v
test(t::S2{T, M}, v::V) where {T,M<:AbstractMatrix{T},V<:AbstractVector{T}} = t.a*v 

v = collect(1.0:10.)
t1 = S1(v*v')
t2 = S2(v*v') 
@benchmark test(t1,v)
@benchmark test(t2,v)

The difference for small matrix 10x10 is about ~50ns

julia> @benchmark test(t1,v)
BenchmarkTools.Trial: 10000 samples with 413 evaluations.
 Range (min … max):  238.850 ns …  2.484 μs  ┊ GC (min … max): 0.00% … 85.37%
 Time  (median):     244.431 ns              ┊ GC (median):    0.00%
 Time  (mean 1 σ):   253.411 ns 1 79.018 ns  ┊ GC (mean 1 σ):  1.12% 1  3.29%

   ▁▇██▇▄               ▁         ▁             ▂▂▂▁▁     ▁    ▂
  ▇███████▅▁▁▃▁▃▁▁▁▄█▇██████▆▅▄▆█████▆▅▄▄▆▄▄▄▅▇███████▇▆▇▇██▇▇ █
  239 ns        Histogram: log(frequency) by time       317 ns <

 Memory estimate: 160 bytes, allocs estimate: 1.

julia> @benchmark test(t2,v)
BenchmarkTools.Trial: 10000 samples with 636 evaluations.
 Range (min … max):  193.121 ns …  1.787 μs  ┊ GC (min … max): 0.00% … 83.13%
 Time  (median):     196.686 ns              ┊ GC (median):    0.00%
 Time  (mean 1 σ):   205.346 ns 1 64.292 ns  ┊ GC (mean 1 σ):  1.37% 1  4.01%

  ▁▇█▆▁           ▁▁   ▁▁                 ▁▁▁                  ▂
  █████▅▁▄▁▄▄▆▇██████▇▇██▇▇▅▄▅▆▆▅▃▄▃▄▅▃▄▅██████████▆▅▄▄▅▅▃▄▃▄▄ █
  193 ns        Histogram: log(frequency) by time       283 ns <

 Memory estimate: 160 bytes, allocs estimate: 1.

And for the larger projection matrix 100x100 - almost no difference.

julia> @benchmark test(t1,v)
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.187 μs … 256.233 μs  ┊ GC (min … max): 0.00% … 97.03%
 Time  (median):     4.890 μs               ┊ GC (median):    0.00%
 Time  (mean 1 σ):   5.014 μs 1   3.968 μs  ┊ GC (mean 1 σ):  0.50% 1  0.97%

       ▃▆█▆▃         ▃▃▂▁      ▁▂▃▃▂▁                          
  ▁▁▂▃▇█████▇▄▂▁▂▂▃▆█████▇▅▅▄▅███████▇▅▄▃▃▂▂▂▂▃▂▂▃▂▂▂▂▁▂▁▁▁▂▁ ▃
  4.19 μs         Histogram: frequency by time        6.07 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

julia> @benchmark test(t2,v)
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.120 μs … 255.155 μs  ┊ GC (min … max): 0.00% … 97.01%
 Time  (median):     4.832 μs               ┊ GC (median):    0.00%
 Time  (mean 1 σ):   4.966 μs 1   3.712 μs  ┊ GC (mean 1 σ):  0.50% 1  0.97%

      ▃▇█▅        ▂▃▂       ▂▃▂▂                               
  ▁▁▂▆█████▄▂▁▁▂▃▆████▆▄▄▄▅██████▆▅▃▃▂▂▂▂▃▃▃▂▂▂▂▁▁▂▂▂▁▁▁▁▁▁▁▁ ▃
  4.12 μs         Histogram: frequency by time        6.32 μs <

 Memory estimate: 896 bytes, allocs estimate: 1.

I post this not to discourage, if you can contribute to the package with the above modification, your PR is always welcome.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants