Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Numerical instability in student t distribution #165

Open
FelixNoessler opened this issue Apr 19, 2024 · 3 comments · May be fixed by #166
Open

Numerical instability in student t distribution #165

FelixNoessler opened this issue Apr 19, 2024 · 3 comments · May be fixed by #166

Comments

@FelixNoessler
Copy link

Hello,

thanks for creating the Distributions.jl package. I used the Student t distribution in a likelihood function and I noticed some numerical instability for large values of the degrees of freedom. I am not sure if this is expected behaviour.

Best,
Felix

using Distributions
using Statistics
using CairoMakie

############# quantile function is numerically unstable for large degrees of freedom
ν = 10.0 .^ LinRange(0, 20, 200)
d = TDist.(ν)
y = quantile.(d, 0.75)
lines(ν, y; axis = (; ylabel = "0.75 quantile",
                      xlabel = "degrees of freedom ν",
                      xscale = log10))

############# likelihood/logpdf is numerically unstable for large degrees of freedom
data = rand.(TDist(100.0), 100)
νs = 10.0 .^ LinRange(0, 20, 200)

lls = Array{Float64}(undef, length(νs))
for (i,ν) in enumerate(νs)
    d = TDist(ν)
    lls[i] = sum(logpdf.(d, data))
end
lines(νs, lls; axis = (; ylabel = "log likelihood",
                      xlabel = "degrees of freedom ν",
                      xscale = log10))

student_t_quantile
student_t_logpdf

@andreasnoack andreasnoack transferred this issue from JuliaStats/Distributions.jl Apr 19, 2024
@andreasnoack
Copy link
Member

andreasnoack commented Apr 19, 2024

I've transfered this issue to StatsFuns since the issue is in

for f in ("invcdf", "invccdf", "invlogcdf", "invlogccdf")
ff = Symbol("fdist"*f)
bf = Symbol("beta"*f)
@eval function $ff(ν1::T, ν2::T, y::T) where {T<:Real}
x = $bf(ν1/2, ν2/2, y)
return x/(1 - x)*ν2/ν1
end
@eval $ff(ν1::Real, ν2::Real, y::Real) = $ff(promote(ν1, ν2, y)...)
end
where we make use of the relationship between the F and Beta distributions. However, this relationship becomes numerically unstable when the second degrees of freedom parameter grows very large. I think the solution should be to switch to the χ² for some threshold of the second degrees of freedom parameter. This is of course identical to switching to the normal distribution when considering the T-distribution but doing it for the F-distribution will be more general. @FelixNoessler would you be able to prepare a PR? I believe it should be something like

    ff = Symbol("fdist"*f) 
    bf = Symbol("beta"*f) 
    cf = Symbol("chisq"*f)
    @eval function $ff(ν1::T, ν2::T, y::T) where {T<:Real} 
        if ν2 > some_large_value
            return $cf(ν1, y)/ν1
        else
            x = $bf(ν1/2, ν2/2, y) 
            return x/(1 - x)*ν2/ν1 
        end
    end

@FelixNoessler
Copy link
Author

Thanks for your response! Your answer indeed fixes the issue for the quantile function.

However, there a remaining numerical instabilities for other functions of the student t distribution. For example here in line 7, we should switch with high value of the degrees of freedom (lower than infinity) to the normal distribution, this fixes the issue with the log pdf that I have also showed above:

function tdistlogpdf::T, x::T) where {T<:Real}
isinf(ν) && return normlogpdf(x)
νp12 =+ 1) / 2
return loggamma(νp12) - (logπ + log(ν)) / 2 - loggamma/ 2) - νp12 * log1p(x^2 / ν)
end

Should I address all this in one PR and use the same threshold value? Should I write a comment in the code why we use a threshold value to switch to the normal/chi^2 distribution?

I will test it a bit to get a good threshold value. I am happy to prepare a PR. I haven't done it often, but I will try and will be happy to get feedback.

@andreasnoack
Copy link
Member

Great. You can do it in one or two PRs. Either is fine. Adding a comment is almost always a good idea.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
2 participants