% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/flux.R
\name{flux}
\alias{flux}
\title{Influx and outflux of multivariate missing data patterns}
\usage{
flux(data, local = names(data))
}
\arguments{
\item{data}{A data frame or a matrix containing the incomplete data. Missing
values are coded as \code{NA}.}
\item{local}{A vector of names of columns of \code{data}. The default is to
include all columns in the calculations.}
}
\value{
A data frame with \code{ncol(data)} rows and six columns:
\describe{
\item{\code{pobs}}{Proportion observed}
\item{\code{influx}}{Influx}
\item{\code{outflux}}{Outflux}
\item{\code{ainb}}{Average inbound statistic}
\item{\code{aout}}{Average outbound statistic}
\item{\code{fico}}{Fraction of incomplete cases among cases with \code{Yj} observed}
}
}
\description{
Influx and outflux are statistics of the missing data pattern. These
statistics are useful in selecting predictors that should go into the
imputation model.
}
\details{
Influx and outflux were proposed by Van Buuren (2018), chapter 4.
Influx is equal to the number of variable pairs \code{(Yj, Yk)} with
\code{Yj} missing and \code{Yk} observed, divided by the total number of
observed data cells. Influx depends on the proportion of missing data of the
variable. Influx of a completely observed variable is equal to 0, whereas for
completely missing variables we have influx = 1. For two variables with the
same proportion of missing data, the variable with higher influx is better
connected to the observed data, and might thus be easier to impute.
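
As a sketch of the definition above in formula form (the notation is
introduced here for illustration only): let \eqn{r_{ij}}{r_ij} be the response
indicator, equal to 1 if \code{Yj} is observed for case \eqn{i} and 0 if it is
missing, with \eqn{n} cases and \eqn{p} variables. The influx of \code{Yj}
can then be written as
\deqn{I_j = \frac{\sum_{k=1}^{p} \sum_{i=1}^{n} (1 - r_{ij}) r_{ik}}{\sum_{k=1}^{p} \sum_{i=1}^{n} r_{ik}}}{I_j = [sum_k sum_i (1 - r_ij) r_ik] / [sum_k sum_i r_ik]}
The terms with \eqn{k = j} vanish in the numerator, because a cell cannot be
both missing and observed.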

Outflux is equal to the number of variable pairs \code{(Yj, Yk)} with \code{Yj} observed and
\code{Yk} missing, divided by the total number of incomplete data cells.
Outflux is an indicator of the potential usefulness of \code{Yj} for imputing
other variables. Outflux depends on the proportion of missing data of the
variable. Outflux of a completely observed variable is equal to 1, whereas
outflux of a completely missing variable is equal to 0. For two variables
having the same proportion of missing data, the variable with higher outflux
is better connected to the missing data, and thus potentially more useful for
imputing other variables.
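
In formula form, again as an illustrative sketch: with the response indicator
\eqn{r_{ij}}{r_ij} equal to 1 if \code{Yj} is observed for case \eqn{i} and 0
if it is missing, and with \eqn{n} cases and \eqn{p} variables, the outflux of
\code{Yj} can be written as
\deqn{O_j = \frac{\sum_{k=1}^{p} \sum_{i=1}^{n} r_{ij} (1 - r_{ik})}{\sum_{k=1}^{p} \sum_{i=1}^{n} (1 - r_{ik})}}{O_j = [sum_k sum_i r_ij (1 - r_ik)] / [sum_k sum_i (1 - r_ik)]}
The denominator is the total number of missing cells, so outflux is undefined
for data without any missing values.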

FICO is an outbound statistic defined by the fraction of incomplete cases
among cases with \code{Yj} observed (White and Carlin, 2010).
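
The verbal definitions of influx and outflux can be checked by hand on a
small data set. The following is a minimal sketch in base R, not the
implementation used by \code{flux()}; the data frame \code{toy} and the
object names \code{r}, \code{n_obs}, \code{n_mis}, \code{influx} and
\code{outflux} are made up for illustration.
\preformatted{
# Hypothetical toy data: y1 complete, y2 and y3 partly missing
toy <- data.frame(
  y1 = c(1, 2, 3, 4),
  y2 = c(NA, 2, 3, 4),
  y3 = c(NA, NA, 3, 4)
)

r <- !is.na(toy)       # response indicator: TRUE = observed
n_obs <- rowSums(r)    # number of observed cells per case
n_mis <- rowSums(!r)   # number of missing cells per case

# Influx: pairs with Yj missing and Yk observed,
# divided by the total number of observed cells
influx <- colSums((!r) * n_obs) / sum(r)

# Outflux: pairs with Yj observed and Yk missing,
# divided by the total number of missing cells
outflux <- colSums(r * n_mis) / sum(!r)

print(rbind(influx, outflux))
}
The completely observed column \code{y1} should come out with influx 0 and
outflux 1, in line with the limiting cases described above. The result can be
compared with the \code{influx} and \code{outflux} columns of
\code{flux(toy)}.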
}
\references{
Van Buuren, S. (2018).
\href{https://stefvanbuuren.name/fimd/missing-data-pattern.html#sec:flux}{\emph{Flexible Imputation of Missing Data. Second Edition.}}
Chapman & Hall/CRC. Boca Raton, FL.
White, I.R., Carlin, J.B. (2010). Bias and efficiency of multiple imputation
compared with complete-case analysis for missing covariate values.
\emph{Statistics in Medicine}, \emph{29}, 2920-2931.
}
\seealso{
\code{\link{fluxplot}}, \code{\link{md.pattern}}, \code{\link{fico}}
}
\author{
Stef van Buuren, 2012
}
\keyword{misc}