-
Notifications
You must be signed in to change notification settings - Fork 8
/
Copy path3.3-cjs_with_mix.Rmd
335 lines (286 loc) · 14.9 KB
/
3.3-cjs_with_mix.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
# Capture-mark recapture model with a mixture structure to account for missing sex-variable for parts of the individuals {#cjs_with_mix}
## Introduction
In some species the identification of the sex is not possible for all individuals without sampling DNA. For example, morphological dimorphism is absent or so weak that parts of the individuals cannot be assigned to one of the sexes. Particularly in ornithological long-term capture recapture data sets that typically are obtained by voluntary bird ringers who do normaly not have the possibilities to analyse DNA, often the sex identification is missing in parts of the individuals. For estimating survival, it would nevertheless be valuable to include data of all individuals, use the information on sex-specific effects on survival wherever possible but account for the fact that of parts of the individuals the sex is not known. We here explain how a Cormack-Jolly-Seber model can be integrated with a mixture model in oder to allow for a combined analyses of individuals with and without sex identified.
An introduction to the Cormack-Jolly-Seber model we gave in Chapter 14.5 of the book @KornerNievergelt.2015. We here expand this model by a mixture structure that allows including individuals with a missing categorical predictor variable, such as sex.
## Data description
```{r datasim, echo=TRUE}
## simulate data
# true parameter values
theta <- 0.6 # proportion of males
nocc <- 15 # number of years in the data set
b0 <- matrix(NA, ncol=nocc-1, nrow=2)
b0[1,] <- rbeta((nocc-1), 3, 4) # capture probability of males
b0[2,] <- rbeta((nocc-1), 2, 4) # capture probability of females
a0 <- matrix(NA, ncol=2, nrow=2)
a1 <- matrix(NA, ncol=2, nrow=2)
a0[1,1]<- qlogis(0.7) # average annual survival for adult males
a0[1,2]<- qlogis(0.3) # average annual survival for juveniles
a0[2,1] <- qlogis(0.55) # average annual survival for adult females
a0[2,2] <- a0[1,2]
a1[1,1] <- 0
a1[1,2] <- -0.5
a1[2,1] <- -0.8
a1[2,2] <- a1[1,2]
nindi <- 1000 # number of individuals with identified sex
nindni <- 1500 # number of individuals with non-identified sex
nind <- nindi + nindni # total number of individuals
y <- matrix(ncol=nocc, nrow=nind)
z <- matrix(ncol=nocc, nrow=nind)
first <- sample(1:(nocc-1), nind, replace=TRUE)
sex <- sample(c(1,2), nind, prob=c(theta, 1-theta), replace=TRUE)
juvfirst <- sample(c(0,1), nind, prob=c(0.5, 0.5), replace=TRUE)
juv <- matrix(0, nrow=nind, ncol=nocc)
for(i in 1:nind) juv[i,first[i]] <- juv[i]
x <- runif(nocc-1, -2, 2) # a time dependent covariate covariate
p <- b0 # recapture probability
phi <- array(NA, dim=c(2, 2, nocc-1))
# for ad males
phi[1,1,] <- plogis(a0[1,1]+a1[1,1]*x)
# for ad females
phi[2,1,] <- plogis(a0[2,1]+a1[2,1]*x)
# for juvs
phi[1,2,] <- phi[2,2,] <- plogis(a0[2,2]+a1[2,2]*x)
for(i in 1:nind){
z[i,first[i]] <- 1
y[i, first[i]] <- 1
for(t in (first[i]+1):nocc){
z[i, t] <- rbinom(1, size=1, prob=z[i,t-1]*phi[sex[i],juv[i,t-1]+1, t-1])
y[i, t] <- rbinom(1, size=1, prob=z[i,t]*p[sex[i],t-1])
}
}
y[is.na(y)] <- 0
```
The mark-recapture data set consists of capture histories of `r round(nind)` individuals over `r round(nocc)` time periods. For each time period $t$ and individual $i$ the capture history matrix $y$ contains $y_{it}=1$ if the individual $i$ is captured during time period $t$, or $y_{it}=0$ if the individual $i$ is not captured during time period $t$. The marking time period varies between individuals from 1 to `r round(nocc-1)`. At the marking time period, the age of the individuals was classified either as juvenile or as adult. Juveniles turn into adults after one time period, thus age is known for all individuals during all time periods after marking. For `r round(nindi)` individuals of the `r round(nind)` individuals, the sex is identified, whereas for `r round(nindni)` individuals, the sex is unknown. The example data contain one covariate $x$ that takes on one value for each time period.
```{r}
# bundle the data for Stan
i <- 1:nindi
ni <- (nindi+1):nind
datax <- list(yi=y[i,], nindi=nindi, sex=sex[i], nocc=nocc,
yni=y[ni,], nindni=nindni, firsti=first[i], firstni=first[ni],
juvi=juv[i,]+1, juvni=juv[ni,]+1, year=1:nocc, x=x)
```
## Model description
The observations $y_{it}$, an indicator of whether individual i was recaptured during time period $t$ is modelled conditional on the latent true state of the individual birds $z_{it}$ (0 = dead or permanently emigrated, 1 = alive and at the study site) as a Bernoulli variable. The probability $P(y_{it} = 1)$ is the product of the probability that an alive individual is recaptured, $p_{it}$, and the state of the bird $z_{it}$ (alive = 1, dead = 0). Thus, a dead bird cannot be recaptured, whereas for a bird alive during time period $t$, the recapture probability equals $p_{it}$:
$$y_{it} \sim Bernoulli(z_{it}p_{it})$$
The latent state variable $z_{it}$ is a Markovian variable with the state at time $t$ being dependent on the state at time $t-1$ and the apparent survival probability $$\phi_{it}$$:
$$z_{it} \sim Bernoulli(z_{it-1}\phi_{it})$$
We use the term apparent survival in order to indicate that the parameter $\phi$ is a product of site fidelity and survival. Thus, individuals that permanently emigrated from the study area cannot be distinguished from dead individuals.
In both models, the parameters $\phi$ and $p$ were modelled as sex-specific. However, for parts of the individuals, sex could not be identified, i.e. sex was missing. Ignoring these missing values would most likely lead to a bias because they were not missing at random. The probability that sex can be identified is increasing with age and most likely differs between sexes. Therefore, we included a mixture model for the sex:
$$Sex_i \sim Categorical(q_i)$$
where $q_i$ is a vector of length 2, containing the probability of being a male and a female, respectively. In this way, the sex of the non-identified individuals was assumed to be male or female with probability $q[1]$ and $q[2]=1-q[1]$, respectively. This model corresponds to the finite mixture model introduced by @Pledger.2003 in order to account for unknown classes of birds (heterogeneity). However, in our case, for parts of the individuals the class (sex) was known.
In the example model, we constrain apparent survival to be linearly dependent on a covariate x with different slopes for males, females and juveniles using the logit link function.
$$logit(\phi_{it}) = a0_{sex-age-class[it]} + a1_{sex-age-class[it]}x_i$$
Annual recapture probability was modelled for each year and age and sex class independently:
$$p_{it} = b0_{t,sex-age-class[it]}$$
Uniform prior distributions were used for all parameters with a parameter space limited to values between 0 and 1 (probabilities) and a normal distribution with a mean of 0 and a standard deviation of 1.5 for the intercept $a0$, and a standard deviation of 5 was used for $a1$.
## The Stan code
The trick for coding the CMR-mixture model in Stan is to formulate the model 3 times:
1. For the individuals with identified sex
2. For the males that were not identified
3. For the females that were not identified
Then for the non-identified individuals a mixture model is formulated that assigns a probability of being a female or a male to each individual.
```{r engine='cat', engine.opts=list(file="stanmodels/cmr_mixture_model.stan",lang="stan")}
data {
int<lower=2> nocc; // number of capture events
int<lower=0> nindi; // number of individuals with identified sex
int<lower=0> nindni; // number of individuals with non-identified sex
int<lower=0,upper=2> yi[nindi,nocc]; // CH[i,k]: individual i captured at k
int<lower=0,upper=nocc-1> firsti[nindi]; // year of first capture
int<lower=0,upper=2> yni[nindni,nocc]; // CH[i,k]: individual i captured at k
int<lower=0,upper=nocc-1> firstni[nindni]; // year of first capture
int<lower=1, upper=2> sex[nindi];
int<lower=1, upper=2> juvi[nindi, nocc];
int<lower=1, upper=2> juvni[nindni, nocc];
int<lower=1> year[nocc];
real x[nocc-1]; // a covariate
}
transformed data {
int<lower=0,upper=nocc+1> lasti[nindi]; // last[i]: ind i last capture
int<lower=0,upper=nocc+1> lastni[nindni]; // last[i]: ind i last capture
lasti = rep_array(0,nindi);
lastni = rep_array(0,nindni);
for (i in 1:nindi) {
for (k in firsti[i]:nocc) {
if (yi[i,k] == 1) {
if (k > lasti[i]) lasti[i] = k;
}
}
}
for (ii in 1:nindni) {
for (kk in firstni[ii]:nocc) {
if (yni[ii,kk] == 1) {
if (kk > lastni[ii]) lastni[ii] = kk;
}
}
}
}
parameters {
real<lower=0, upper=1> theta[nindni]; // probability of being male for non-identified individuals
real<lower=0, upper=1> b0[2,nocc-1]; // intercept of p
real a0[2,2]; // intercept for phi
real a1[2,2]; // coefficient for phi
}
transformed parameters {
real<lower=0,upper=1>p_male[nindni,nocc]; // capture probability
real<lower=0,upper=1>p_female[nindni,nocc]; // capture probability
real<lower=0,upper=1>p[nindi,nocc]; // capture probability
real<lower=0,upper=1>phi_male[nindni,nocc-1]; // survival probability
real<lower=0,upper=1>chi_male[nindni,nocc+1]; // probability that an individual
// is never recaptured after its
// last capture
real<lower=0,upper=1>phi_female[nindni,nocc-1]; // survival probability
real<lower=0,upper=1>chi_female[nindni,nocc+1]; // probability that an individual
// is never recaptured after its
// last capture
real<lower=0,upper=1>phi[nindi,nocc-1]; // survival probability
real<lower=0,upper=1>chi[nindi,nocc+1]; // probability that an individual
// is never recaptured after its
// last capture
{
int k;
int kk;
for(ii in 1:nindi){
if (firsti[ii]>1) {
for (z in 1:(firsti[ii]-1)){
phi[ii,z] = 1;
}
}
for(tt in firsti[ii]:(nocc-1)) {
// linear predictor for phi:
phi[ii,tt] = inv_logit(a0[sex[ii], juvi[ii,tt]] + a1[sex[ii], juvi[ii,tt]]*x[tt]);
}
}
for(ii in 1:nindni){
if (firstni[ii]>1) {
for (z in 1:(firstni[ii]-1)){
phi_female[ii,z] = 1;
phi_male[ii,z] = 1;
}
}
for(tt in firstni[ii]:(nocc-1)) {
// linear predictor for phi:
phi_male[ii,tt] = inv_logit(a0[1, juvni[ii,tt]] + a1[1, juvni[ii,tt]]*x[tt]);
phi_female[ii,tt] = inv_logit(a0[2, juvni[ii,tt]]+ a1[2, juvni[ii,tt]]*x[tt]);
}
}
for(i in 1:nindi) {
// linear predictor for p for identified individuals
for(w in 1:firsti[i]){
p[i,w] = 1;
}
for(kkk in (firsti[i]+1):nocc)
p[i,kkk] = b0[sex[i],year[kkk-1]];
chi[i,nocc+1] = 1.0;
k = nocc;
while (k > firsti[i]) {
chi[i,k] = (1 - phi[i,k-1]) + phi[i,k-1] * (1 - p[i,k]) * chi[i,k+1];
k = k - 1;
}
if (firsti[i]>1) {
for (u in 1:(firsti[i]-1)){
chi[i,u] = 0;
}
}
chi[i,firsti[i]] = (1 - p[i,firsti[i]]) * chi[i,firsti[i]+1];
}// close definition of transformed parameters for identified individuals
for(i in 1:nindni) {
// linear predictor for p for non-identified individuals
for(w in 1:firstni[i]){
p_male[i,w] = 1;
p_female[i,w] = 1;
}
for(kkkk in (firstni[i]+1):nocc){
p_male[i,kkkk] = b0[1,year[kkkk-1]];
p_female[i,kkkk] = b0[2,year[kkkk-1]];
}
chi_male[i,nocc+1] = 1.0;
chi_female[i,nocc+1] = 1.0;
k = nocc;
while (k > firstni[i]) {
chi_male[i,k] = (1 - phi_male[i,k-1]) + phi_male[i,k-1] * (1 - p_male[i,k]) * chi_male[i,k+1];
chi_female[i,k] = (1 - phi_female[i,k-1]) + phi_female[i,k-1] * (1 - p_female[i,k]) * chi_female[i,k+1];
k = k - 1;
}
if (firstni[i]>1) {
for (u in 1:(firstni[i]-1)){
chi_male[i,u] = 0;
chi_female[i,u] = 0;
}
}
chi_male[i,firstni[i]] = (1 - p_male[i,firstni[i]]) * chi_male[i,firstni[i]+1];
chi_female[i,firstni[i]] = (1 - p_female[i,firstni[i]]) * chi_female[i,firstni[i]+1];
} // close definition of transformed parameters for non-identified individuals
} // close block of transformed parameters exclusive parameter declarations
} // close transformed parameters
model {
// priors
theta ~ beta(1, 1);
for (g in 1:(nocc-1)){
b0[1,g]~beta(1,1);
b0[2,g]~beta(1,1);
}
a0[1,1]~normal(0,1.5);
a0[1,2]~normal(0,1.5);
a1[1,1]~normal(0,3);
a1[1,2]~normal(0,3);
a0[2,1]~normal(0,1.5);
a0[2,2]~normal(a0[1,2],0.01); // for juveniles, we assume that the effect of the covariate is independet of sex
a1[2,1]~normal(0,3);
a1[2,2]~normal(a1[1,2],0.01);
// likelihood for identified individuals
for (i in 1:nindi) {
if (lasti[i]>0) {
for (k in firsti[i]:lasti[i]) {
if(k>1) target+= (log(phi[i, k-1]));
if (yi[i,k] == 1) target+=(log(p[i,k]));
else target+=(log1m(p[i,k]));
}
}
target+=(log(chi[i,lasti[i]+1]));
}
// likelihood for non-identified individuals
for (i in 1:nindni) {
real log_like_male = 0;
real log_like_female = 0;
if (lastni[i]>0) {
for (k in firstni[i]:lastni[i]) {
if(k>1){
log_like_male += (log(phi_male[i, k-1]));
log_like_female += (log(phi_female[i, k-1]));
}
if (yni[i,k] == 1){
log_like_male+=(log(p_male[i,k]));
log_like_female+=(log(p_female[i,k]));
}
else{
log_like_male+=(log1m(p_male[i,k]));
log_like_female+=(log1m(p_female[i,k]));
}
}
}
log_like_male += (log(chi_male[i,lastni[i]+1]));
log_like_female += (log(chi_female[i,lastni[i]+1]));
target += log_mix(theta[i], log_like_male, log_like_female);
}
}
```
## Call Stan from R, check convergence and look at results
```{r runstan, eval=FALSE}
# Run STAN
library(rstan)
fit <- stan(file = "stanmodels/cmr_mixture_model.stan", data=datax, verbose = FALSE)
# for above simulated data (25000 individuals x 15 time periods)
# computing time is around 48 hours on an intel corei7 laptop
# for larger data sets, we recommed moving the transformed parameters block
# to the model block in order to avoid monitoring of p_male, p_female,
# phi_male and phi_female producing memory problems
# launch_shinystan(fit) # diagnostic plots
summary(fit)
```
```{r savefit, eval=FALSE, echo=FALSE}
#save(sumfit2, file="stanmodels/summaryfit_cmrmix.rda")
```
```{r loadfit, echo=FALSE}
load("stanmodels/summaryfit_cmrmix.rda")
sumfit2
```