-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathMarkov.Rmd
256 lines (176 loc) · 11.2 KB
/
Markov.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
---
title: A Markov Multi-State Model using the Framingham Data
output: html_document
---
Measuring stage duration and transition probability & intensity with the msm package.
The framingham_ms dataset contains 10,132 state observations, defined as:
- State 1: No Disease
- State 2: Hypertension
- State 3: Cardiovascular disease
- State 4: Death
- State 99: Censored
The original Framingham Data were reshaped using Python 3 in a notebook entitled FraminghamReshaping.ipynb
msm Package Citation:
Jackson CH (2011). “Multi-State Models for Panel Data: The msm Package for R.” Journal of Statistical Software, 38(8), 1–29. doi:10.18637/jss.v038.i08.
Kyle P. Rasku MS BSN RN
```{r}
library("msm")
frmghm_ms <- read.csv("Datasets/framingham_ms.csv", header = TRUE, sep = ",")
frmghm_ms
```
## Define states and transitions allowed for model
Provide initial values representing a guess that there is an equal probability of progression, recovery or death (qrr = - SUM where s ne r of qrs).
Or, supply the option "gen.inits=TRUE" in the msm function call. This sets the initial values for non-zero entries of the Q matrix (transition intensity matrix) to the maximum likelihood estimates under the assumption that transitions take place only at observation times.
For transparency, I show the state table and MLE estimated Q matrix, and then set that to the qmatrix arg. given to msm.
```{r}
twoway4.q <- rbind(c(0, 0.166, 0.166, 0.166), c(0, 0, 0.25, 0.25), c(0, 0, 0, 0.50), c(0,0,0,0))
statetable.msm(STATE, RANDID, data=frmghm_ms)
Q = crudeinits.msm(STATE ~ YEARS, RANDID, data=frmghm_ms, censor = 99, censor.states = c(1,2,3), qmatrix=twoway4.q)
Q
```
```{r}
# Each initial transition block must add up to -qrr (0.5) based on the assumption
# In state 1 there are 3 possible transitions - to hypertension (2), to cvd (3), or death (4)
# In state 2 there are 2 possible transitions - to cvd (3), or death (4)
# In state 3 there are 2 possible transition - to hypertension (2), or death (4)
# In state 4 there are no further transitions possible (obviously)
rownames(Q) <- colnames(Q) <- c("Well", "Hypertensive", "CVD", "Death")
```
```{r}
frm.msm <- msm(STATE ~ YEARS, subject = RANDID, data = frmghm_ms, qmatrix = Q, censor = 99,
censor.states = c(1,2,3), exacttimes=TRUE)
```
```{r}
frm.msm
```
## Interpreting the Transition Intensities
- Mean time in the Well state: 1/0.048348 = 20.7 years
- Mean time in the Hypertensive state: 1/0.021638 = 46.2 years
- Mean time in the CVD state: 1/0.150787 = 6.6 years
- From the Well state, the likelihood of transition to Hypertensive is: 0.026774 (2.7%)
- From the Well state, the likelihood of transition to CVD is: 0.009887 (1%)
- From the Well state, the likelihood of transition to Death is 0.011687 (1.2%)
- From the Hypertensive state, the likelihood of transition to CVD is: 0.011434 (1.1%)
- From the Hypertensive state, the likelihood of transition to Death is: 0.010204 (1%)
- From the CVD state, the likelihood of transition to Death is: 0.150787 **(15%)**
```{r}
# Display the Transition Probability Matrix, P(t) over an interval of t=1 (in this case, 1 year)
# ci = "normal" computes a confidence interval for P(t) by repeated sampling from the asymptotic
# normal distribution of the maximum likelihood estimates of the log(qrs)
# Based on a default 1000 samples, converged to within 2 significant figures
# NOTE:
# ci = "boot" would instead compute intervals using nonparametric bootstrap resampling, drawn with replacement
# the model is refitted repeatedly to estimate the sampling uncertainty surrounding the estimates
# more accurate, but slower
pmatrix.msm(frm.msm, t = 1, ci = "normal")
```
## Transition Probabilities
- The probability of being Well 1 year from now, given Well is 95%
- The probability of being Hypertensive 1 year from now, given Well is 2.6%
- The probability of being CVD 1 year from now, given Well is 1%
- The probability of being Dead 1 year from now, given Well is 1.2%
- The probability of being Hypertensive 1 year from now, given Hypertensive is 98%
- The probability of being CVD 1 year from now, given Hypertensive is 1%
- The probability of being Dead 1 year from now, given Hypertensive is 1%
- The probability of being CVD 1 year from now, given CVD is 86%
- The probability of being Dead 1 year from now, given CVD is 14%
```{r}
# A model with covariates: sex, age, diabetes (yes or no) and smoker (yes or no)
frm.cov.msm <- msm(STATE ~ YEARS, subject = RANDID, data = frmghm_ms, covariates = ~ AGE + SEX + DIABETES + CURSMOKE,
qmatrix = Q, method = "BFGS", exacttimes = TRUE, censor = 99,
censor.states = c(1,2,3), control = list(fnscale=30000, maxit=10000))
```
```{r}
# Display hazard ratios for each covariate on each transition with 95% confidence intervals
hazard.msm(frm.cov.msm)
```
## Interpreting the (significant) Hazard Ratios
- 1 year increase in age is associated with a 5% increased risk of CVD onset (Well -> CVD)
- 1 year increase in age is associated with a 8% increased risk of Death (on average) (Well -> Death)
- 1 year increase in age is associated with a 3.2% increased risk of CVD onset, if Hypertensive (HTN -> CVD)
- 1 year increase in age is associated with a 4.4% increased risk of Death, if Hypertensive (HTN -> Death)
- 1 year increase in age is associated with a 5.8% increased risk of Death, if CVD (CVD -> Death)
- Being Female increases the risk of Hypertension onset (Well -> HTN) by 39% on average.
- Being Female decreases the risk of CVD onset (Well -> CVD) by 51% on average.
- Being Female decreases the risk of Death (Well -> Death) by 19% on average.
- Being Female decreases the risk of CVD, given HTN (HTN -> CVD) by 45% on average.
- Being Female decreases the risk of Death, given HTN (HTN -> Death) by 33% on average.
- An initial diagnosis of Diabetes increases the risk of CVD onset (Well -> CVD) by an avg. of **170%**
- An initial diagnosis of Diabetes increases the risk of CVD onset, if Hypertensive (HTN -> CVD) by an avg. of **187%**
- An initial diagnosis of Diabetes increases the risk of Death for those who already have HTN by 103%.
- An initial diagnosis of Diabetes increases the risk of Death for those who already have CVD by 57%.
- Being a smoker is associated with a 27% lower risk of becoming Hypertensive if Well (these are younger people).
- Being a smoker is associated with a 26% increased risk of CVD, if Well.
- Being a smoker is associated with a 39% increased risk of Dying, if Well.
- Being a smoker is associated with a 29% increased risk of CVD, if Hypertensive.
- Being a smoker is associated with a 25% increased risk of Death, if CVD.
```{r}
# Calculating the Transition Intensity Matrix for specified covariates
# Age: 40, Primary Diagnosis: Diabetes & Smoker
# Note: Avg age is 50
qmatrix.msm(frm.cov.msm, covariates = list(AGE = 40, DIABETES = 1, CURSMOKE = 1))
```
#### Comparing this patient to the average patient
- Average patient's likelihood of transition from Well -> HTN 0.026774 (2.7%), for this patient it is **9.7%**
- Average patient's likelihood of transition from Well -> CVD 0.009887 (1%), for this patient it is **5.3%**
- Average patient's likelihood of transition from HTN -> CVD 0.011434 (1.1%), for this patient it is **5.9%**
- Average patient's likelihood of transition from CVD -> Death 0.150787 (15%), for this patient it is **6.6%**
- Average patient's mean time in Well state 20.7 yrs, this patient **13.4 yrs**
- Average patient's mean time in Hypertensive state 46.2 yrs, this patient **12.3 yrs**
- Average patient's mean time in CVD state 6.6 yrs, this patient **15.2 yrs**
```{r}
# Does the model with covariates fit significantly better than the one without?
# Compare the likelihood ratio statistic to Chi-square distribution with 24 degrees of freedom
lrtest.msm(frm.msm, frm.cov.msm)
```
The p-value is highly significant.
## When Q is piecewise-constant
Transition probabilities can be calculated in closed form by summing the likelihood over the unknown observed state at the times when the covariates change.
```{r}
# Fitting a model where all intensities change 12 years after the beginning of the study
# Divides data into 2 time periods: -Inf to 12 yrs, and 12 yrs to Inf
# The study lasted a little more than 24 years, so this is about the halfway point
frm.pci.msm <- msm(STATE ~ YEARS, subject = RANDID, data = frmghm_ms, qmatrix = Q,
pci = 12, method = "BFGS", exacttimes = TRUE,
censor = 99, censor.states = c(1, 2, 3),
control = list(fnscale=30000, maxit=10000))
# Is this data truly time-inhomogenous?
lrtest.msm(frm.msm, frm.pci.msm)
```
It is very likely that the data is time-inhomogenous.
```{r}
hazard.msm(frm.pci.msm)
```
## Diagnostic Plots
Comparing model predictions with Kaplan-Meier curves
```{r}
par(mfrow = c(2, 2))
plot.survfit.msm(frm.msm, main = "frm.msm: no covariates", mark.time = FALSE, legend.pos=c(0,0))
plot.survfit.msm(frm.cov.msm, main = "frm.cov.msm: covariates", mark.time = FALSE, legend.pos=c(0,0))
plot.survfit.msm(frm.pci.msm, mark.time = FALSE, legend.pos=c(0,0))
title("frm.pci.msm: time-inhomogeneous", line = 2)
title("(12 year change point)", line = 1)
frm.pci2.msm <- msm(STATE ~ YEARS, subject = RANDID, data = frmghm_ms, qmatrix = Q,
pci = c(6,12,18), method = "BFGS", exacttimes = TRUE,
censor = 99, censor.states=c(1, 2, 3),
control = list(fnscale=30000, maxit=10000))
plot.survfit.msm(frm.pci2.msm, mark.time = FALSE, legend.pos=c(0,0))
title("frm.pci2.msm: time-inhomogeneous", line = 2)
title("(6, 12 and 18 year change points)", line = 1)
```
The fit is much improved after adding censoring!
Adding additional time-change points doesn't appear to help the fit, but covariates likely do, although the differences between the first two graphs are not readily apparent, the differences in the model fits are.
## Comparing Observed and Expected Prevalence
Works best when individuals are actually observed at the computed times, otherwise assumptions are made such as individuals are only observed at these times, or midpoints are assumed.
The observed prevalence of a state is simply calculated as the number of individuals known to be in that state, divided by the number of individuals whose state is known at that time, which ignores the information from individuals censored at earlier times (root of Kaplan-Meier estimation :))
```{r}
# Need to look at how this is implemented by Gentleman et al. 1994 using prevalence.msm, and plot.prevalence.msm
```
## None of these models give an adequate fit
A more complex pattern of time-dependence or allowing transition intensities to depend on covariates would likely yield a better fit.
TO DO => Figure out how to allow transition intensities to depend on covariates!
## It is also possible to calculate the influence of each individual on the MLE
Using scoreresid.msm
## Extensions of msm and limitations
For continuously-observed processes: mstate (deWreede et al. 2010)
For Random Effects models (unexplained heterogeneity in transition intensities between individuals) - calculating likelihood often intractable with a few exceptions: tracking model - random effect acts on all intensities simultaneously (Satten 1999), or a discrete random effects distribution (Cook et al 2004)