Skip to content

Direct and robust methods for outlier detection in linear regression

License

Notifications You must be signed in to change notification settings

tantei3/LinRegOutliers

 
 

Repository files navigation

Build Status

LinRegOutliers

A Julia package for outlier detection in linear regression.

Implemented Methods

  • Ordinary Least Squares, Weighted Least Squares, Basic diagnostics
  • Hadi & Simonoff (1993)
  • Kianifard & Swallow (1989)
  • Sebert & Montgomery & Rollier (1998)
  • Least Median of Squares
  • Least Trimmed Squares
  • Minimum Volume Ellipsoid (MVE)
  • MVE & LTS Plot
  • Billor & Chatterjee & Hadi (2006)
  • Pena & Yohai (1995)
  • Satman (2013)
  • Satman (2015)
  • Setan & Halim & Mohd (2000)
  • Least Absolute Deviations (LAD)
  • Least Trimmed Absolute Deviations (LTA)
  • Hadi (1992)
  • Marchette & Solka (2003) Data Images
  • Satman's GA based LTS estimation (2012)
  • Fischler & Bolles (1981) RANSAC Algorithm
  • Minimum Covariance Determinant Estimator
  • Imon (2005) Algorithm
  • Barratt & Angeris & Boyd (2020) CCF algorithm
  • Atkinson (1994) Forward Search Algorithm
  • Summary

Unimplemented Methods

  • BACON Algorithm (Billor & Hadi & Velleman (2000))
  • Hadi (1994) Algorithm
  • Depth based estimators (Regression depth, deepest regression, etc.)
  • Theil & Sen estimator for mutliple regression

Installation

julia> ]
(@v1.5) pkg> add LinRegOutliers

or

julia> using Pgk
julia> Pkg.add("LinRegOutliers")

then

julia> using LinRegOutliers

to make all the stuff be ready!

Examples

julia> using LinRegOutliers
julia> # Regression setting for Hawkins & Bradu & Kass data
julia> reg = createRegressionSetting(@formula(y ~ x1 + x2 + x3), hbk)
julia> smr98(reg)
14-element Array{Int64,1}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
julia> py95(reg)["outliers"]
14-element Array{Int64,1}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
julia> reg = createRegressionSetting(@formula(calls ~ year), phones);
julia> lms(reg)
Dict{Any,Any} with 6 entries:
  "stdres"    => [2.42593, 1.62705, 0.550525, 0.584612, 0.155943, -0.272726, -0.608843, -1.03751, -0.448118, -0.228929    93.4182, 96.9692, 112.552, 127.209, 147.419, 174.108
  "S"         => 1.08048
  "outliers"  => [14, 15, 16, 17, 18, 19, 20, 21]
  "objective" => 0.43276
  "coef"      => [-56.3796, 1.16317]
  "crit"      => 2.5 
julia> reg = createRegressionSetting(@formula(calls ~ year), phones);

julia> lts(reg)
Dict{Any,Any} with 6 entries:
  "betas"            => [-56.5219, 1.16488]
  "S"                => 1.10918
  "hsubset"          => [11, 10, 5, 6, 23, 12, 13, 9, 24, 7, 3, 4, 8]
  "outliers"         => [14, 15, 16, 17, 18, 19, 20, 21]
  "scaled.residuals" => [2.41447, 1.63472, 0.584504, 0.61617, 0.197052, -0.222066, -0.551027, -0.970146, -0.397538, -0.185558    91.0312, 94.4889, 109.667, 123.943, 143.629, 
  "objective"        => 3.43133

detectOutliersImage

julia> # Matrix of independent variables of Hawkins & Bradu & Kass data
julia> data = hcat(hbk.x1, hbk.x2, hbk.x3);
julia> dataimage(data)

detectOutliersImage

Want to have contributions?

You are probably the right contributor

  • If you have statistics background
  • If you like Julia

However, the second condition is more important because an outlier detection algorithm is just an algorithm. Reading the implemented methods is enough to implement new ones. Please follow the issues. Here is the a bunch of first shot introductions for new comers. Welcome and thank you in advance!

Contact & Communication

  • Please use issues for a new future request or bug reports
  • We are in #linregoutliers channel on Julia Slack for any discussion requires online chatting.

About

Direct and robust methods for outlier detection in linear regression

Resources

License

Code of conduct

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Julia 100.0%