Commit

all references from _new only, all in same shape Author.YYYY, Author.YYYYb etc
Pius Korner authored and Pius Korner committed Dec 13, 2024
1 parent 4797ba3 commit e08fc43
Showing 17 changed files with 67 additions and 40 deletions.
2 changes: 1 addition & 1 deletion 1.1-prerequisites.Rmd
@@ -474,7 +474,7 @@ y <- c(47.5, 43, 43, 44, 48.5, 37.5, 41.5, 45.5)
n <- length(y)
```

-Because there are two parameters, we need to specify a two-dimensional prior distribution. We looked up in @Gelman2014 that the conjugate prior distribution in our case is a Normal-Inverse-Chisquare distribution:
+Because there are two parameters, we need to specify a two-dimensional prior distribution. We looked up in @Gelman.2014 that the conjugate prior distribution in our case is a Normal-Inverse-Chisquare distribution:

$p(\theta, \sigma^2) = \text{N-Inv-}\chi^2(\mu_0, \sigma_0^2/\kappa_0; \nu_0, \sigma_0^2)$
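
To make the conjugate update concrete, here is a minimal R sketch of the Normal-Inverse-Chisquare posterior for the wing-length data above, following the update formulas in @Gelman.2014. The prior values (`mu0`, `kappa0`, `nu0`, `sigma0_2`) are placeholders chosen for illustration only.

```r
y <- c(47.5, 43, 43, 44, 48.5, 37.5, 41.5, 45.5)
n <- length(y)
mu0 <- 40; kappa0 <- 1     # prior mean and prior "sample size" (assumed values)
nu0 <- 1; sigma0_2 <- 10   # prior degrees of freedom and prior scale (assumed values)

kappa_n  <- kappa0 + n
mu_n     <- (kappa0 * mu0 + n * mean(y)) / kappa_n
nu_n     <- nu0 + n
sigma_n2 <- (nu0 * sigma0_2 + (n - 1) * var(y) +
             kappa0 * n / kappa_n * (mean(y) - mu0)^2) / nu_n

# joint posterior draws: sigma^2 ~ scaled-Inv-chi^2(nu_n, sigma_n2),
# then theta | sigma^2 ~ N(mu_n, sigma^2 / kappa_n)
sigma2 <- nu_n * sigma_n2 / rchisq(2000, df = nu_n)
theta  <- rnorm(2000, mu_n, sqrt(sigma2 / kappa_n))
```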

4 changes: 2 additions & 2 deletions 1.2-analyses_steps.Rmd
@@ -44,7 +44,7 @@ d) Is it expected that a change of 1 at lower values for x has the same biologic

2. Centering: Centering ($x.c = x-mean(x)$) is a transformation that produces a variable with a mean of 0. Centering is optional. We have two reasons to center a predictor variable. First, it helps the model fitting algorithm to converge better because it reduces correlations among model parameters. Second, with centered predictors, the intercept and main effects in the linear model are easier to interpret (they are measured at the center of the data instead of at the covariate value of 0, which may be far off).

-3. Scaling: Scaling ($x.s = x/c$, where $c$ is a constant) is a transformation that changes the unit of the variable. Scaling is also optional. We have three reasons to scale a predictor variable. First, to make effect sizes easier to understand. For example, a population change from one year to the next may be very small and hard to interpret. When we give the change for a 10-year period, its ecological meaning is easier to grasp. Second, to make the estimated effect sizes comparable between variables, we may use $x.s = x/sd(x)$. The resulting variable has a unit of one standard deviation. A standard deviation may be comparable between variables that originally are measured in different units (meters, seconds, etc.). @Gelman2007 (p. 55 f) propose to scale the variables by two times the standard deviation ($x.s = x/(2*sd(x))$) to make effect sizes comparable between numeric and binary variables. Third, scaling can be important for model convergence, especially when polynomials are included. Also, consider the use of orthogonal polynomials, see Chapter 4.2.9 in @KornerNievergelt2015.
+3. Scaling: Scaling ($x.s = x/c$, where $c$ is a constant) is a transformation that changes the unit of the variable. Scaling is also optional. We have three reasons to scale a predictor variable. First, to make effect sizes easier to understand. For example, a population change from one year to the next may be very small and hard to interpret. When we give the change for a 10-year period, its ecological meaning is easier to grasp. Second, to make the estimated effect sizes comparable between variables, we may use $x.s = x/sd(x)$. The resulting variable has a unit of one standard deviation. A standard deviation may be comparable between variables that originally are measured in different units (meters, seconds, etc.). @Gelman.2007 (p. 55 f) propose to scale the variables by two times the standard deviation ($x.s = x/(2*sd(x))$) to make effect sizes comparable between numeric and binary variables. Third, scaling can be important for model convergence, especially when polynomials are included. Also, consider the use of orthogonal polynomials, see Chapter 4.2.9 in @KornerNievergelt.2015.

4. Collinearity: Look at the correlation among the explanatory variables (pairs plot or correlation matrix). If the explanatory variables are correlated, go back to step 2 and add this relationship. Further, read our thoughts about [collinearity](#collinearity).
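
A minimal R sketch of the transformations described in points 2 and 3 of this list (the predictor `x` is simulated here purely for illustration):

```r
x <- runif(50, 10, 30)              # some numeric predictor
x.c <- x - mean(x)                  # centered: mean of 0, original units
x.s <- x / sd(x)                    # scaled: unit = one standard deviation
x.z <- (x - mean(x)) / (2 * sd(x))  # centered and scaled by 2 SD (Gelman's proposal)
```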

@@ -70,7 +70,7 @@ Fit the model.
## Check Model {#step8}
We assess model fit by graphical [analyses of the residuals](#residualanalysis) and by [predictive model checking](#modelchecking).

-For non-Gaussian models it is often easier to assess model fit using [posterior predictive checks](#modelchecking) rather than residual analyses. Posterior predictive checks usually show clearly in which aspect the model failed, so we can go back to step 2 of the analysis. Recognizing in what aspect a model does not fit the data based on residual plots improves with experience. Therefore, we list in Chapter 16 of @KornerNievergelt2015 some patterns that can appear in residual plots together with what these patterns possibly indicate. We also indicate what could be done in the specific cases.
+For non-Gaussian models it is often easier to assess model fit using [posterior predictive checks](#modelchecking) rather than residual analyses. Posterior predictive checks usually show clearly in which aspect the model failed, so we can go back to step 2 of the analysis. Recognizing in what aspect a model does not fit the data based on residual plots improves with experience. Therefore, we list in Chapter 16 of @KornerNievergelt.2015 some patterns that can appear in residual plots together with what these patterns possibly indicate. We also indicate what could be done in the specific cases.
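
To illustrate what such a posterior predictive check can look like, here is a sketch for a Poisson model with a conjugate gamma prior; the data, the prior, and the test statistic (the proportion of zeros, a common failure point of Poisson models) are chosen for illustration only.

```r
y <- rpois(100, lambda = 4)        # observed counts (toy data)
# posterior draws for lambda under a vague conjugate gamma prior (assumed)
lambda_draws <- rgamma(1000, shape = sum(y) + 1, rate = length(y))
# simulate one replicated data set per posterior draw
yrep <- sapply(lambda_draws, function(l) rpois(length(y), l))
# compare the test statistic between observed and replicated data
prop0_obs <- mean(y == 0)
prop0_rep <- apply(yrep, 2, function(z) mean(z == 0))
mean(prop0_rep >= prop0_obs)       # posterior predictive p-value
```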

## Model Uncertainty {#step9}
If, while working through steps 1 to 8, possibly repeatedly, we came up with one or more models that fit the data reasonably well, [model comparison](#model_comparison) is needed or inference may be drawn from more than one model. If we have only one model, we proceed to \@ref(step10).
2 changes: 1 addition & 1 deletion 1.3-distributions.Rmd
@@ -91,7 +91,7 @@ Further, note that not all variables measured as an integer number are count data

### Negative-binomial distribution

-The negative-binomial distribution represents the number of zeros that occur in a sequence of Bernoulli trials before a target number of ones is reached. It is hard to see this situation in, e.g., the number of individuals counted on plots. Therefore, we were reluctant to introduce this distribution in our old book [@KornerNievergelt2015]. However, the negative-binomial distribution often fits count data much better than the Poisson model because it has two parameters and therefore allows for fitting both the mean and the variance to the data. Therefore, we started using the negative-binomial distribution as a data model more often.
+The negative-binomial distribution represents the number of zeros that occur in a sequence of Bernoulli trials before a target number of ones is reached. It is hard to see this situation in, e.g., the number of individuals counted on plots. Therefore, we were reluctant to introduce this distribution in our old book [@KornerNievergelt.2015]. However, the negative-binomial distribution often fits count data much better than the Poisson model because it has two parameters and therefore allows for fitting both the mean and the variance to the data. Therefore, we started using the negative-binomial distribution as a data model more often.
$x \sim \text{negative-binomial}(p, n)$
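
A short R sketch of the overdispersion that makes the negative-binomial attractive for count data; with the `(size, mu)` parameterization of `rnbinom`, the variance is $\mu + \mu^2/size$:

```r
y_pois <- rpois(1000, lambda = 5)
y_nb   <- rnbinom(1000, size = 2, mu = 5)  # expected variance: 5 + 5^2/2 = 17.5
c(mean(y_pois), var(y_pois))               # variance close to the mean
c(mean(y_nb), var(y_nb))                   # variance well above the mean
# a negative-binomial regression could be fitted with, e.g., MASS::glm.nb()
```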

Its probability function is rather complex:
6 changes: 3 additions & 3 deletions 2.0-PART-II.Rmd
@@ -9,8 +9,8 @@ knitr::include_graphics('images/part_II.jpg', dpi = 150)
------

## Further reading {-}
-A really good introductory book on Bayesian data analysis is [@McElreath2016]. This book starts with a thorough introduction to applying Bayes' theorem for drawing inference from data. In addition, it carefully discusses what can and what cannot be concluded from statistical results. We like this very much.
+A really good introductory book on Bayesian data analysis is [@McElreath.2020]. This book starts with a thorough introduction to applying Bayes' theorem for drawing inference from data. In addition, it carefully discusses what can and what cannot be concluded from statistical results. We like this very much.

-The developer of the `brms` package, Paul Bürkner, is writing a [book](http://paulbuerkner.com/software/brms-book/) that is already partly available online. It is a helpful cookbook with understandable explanations. We very much look forward to the finished book, which may bundle all the helpful vignettes and help files for the functions of the `brms`package.
+The developer of the `brms` package, Paul Bürkner, is writing a [book](http://paulbuerkner.com/software/brms-book/) that is already partly available online. It is a helpful cookbook with understandable explanations. We very much look forward to the finished book, which may bundle all the helpful vignettes and help files for the functions of the `brms` package.

-We like looking up statistical methods in papers and books written by Andrew Gelman [e.g. @Gelman2014] and Trevor Hastie [e.g. @Hastie2009; @Efron2016] because both explain complicated things in a concise and understandable way.
+We like looking up statistical methods in papers and books written by Andrew Gelman [e.g. @Gelman.2014] and Trevor Hastie [e.g. @Hastie.2009; @Efron.2016] because both explain complicated things in a concise and understandable way.
2 changes: 1 addition & 1 deletion 2.01-bayesian_paradigm.Rmd
@@ -156,4 +156,4 @@ Also, @Burnham.2002 give a thorough introduction to the likelihood and how it is

The difference between the ML- and LS-estimates of the variance is explained in @Royle.2008b.

-@Gelman2014 introduce the log predictive density and explain how it is computed (p. 167).
+@Gelman.2014 introduce the log predictive density and explain how it is computed (p. 167).
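
As a sketch of that computation: the log pointwise predictive density sums, over observations, the log of the average density across posterior draws. The posterior draws below are made up for illustration; in practice they come from a fitted model.

```r
y <- c(47.5, 43, 43, 44, 48.5, 37.5, 41.5, 45.5)
# hypothetical posterior draws for the mean and sd of a normal model
mu_draws    <- rnorm(2000, mean(y), sd(y) / sqrt(length(y)))
sigma_draws <- rep(sd(y), 2000)
# lppd = sum_i log( mean_s p(y_i | theta_s) )
lppd <- sum(sapply(y, function(yi) log(mean(dnorm(yi, mu_draws, sigma_draws)))))
```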
4 changes: 2 additions & 2 deletions 2.03-lm.Rmd
@@ -678,7 +678,7 @@ From these parameters we obtain the estimated differences in wing length between

### A linear model with a categorical and a numeric predictor (ANCOVA)

-An analysis of covariance, ANCOVA, is a normal linear model that contains at least one factor and one continuous variable as predictor variables. The continuous variable is also called a covariate, hence the name analysis of covariance. An ANCOVA can be used, for example, when we are interested in how the biomass of grass depends on the distance from the surface of the soil to the ground water in two different species (*Alopecurus pratensis*, *Dactylis glomerata*). The two species were grown by @Ellenberg1953 in tanks that showed a gradient in distance from the soil surface to the ground water. The distance from the soil surface to the ground water is used as a covariate (‘water’). We further assume that the species react differently to the water conditions. Therefore, we include an interaction between species and water. The model formula is then
+An analysis of covariance, ANCOVA, is a normal linear model that contains at least one factor and one continuous variable as predictor variables. The continuous variable is also called a covariate, hence the name analysis of covariance. An ANCOVA can be used, for example, when we are interested in how the biomass of grass depends on the distance from the surface of the soil to the ground water in two different species (*Alopecurus pratensis*, *Dactylis glomerata*). The two species were grown by @Ellenberg.1953 in tanks that showed a gradient in distance from the soil surface to the ground water. The distance from the soil surface to the ground water is used as a covariate (‘water’). We further assume that the species react differently to the water conditions. Therefore, we include an interaction between species and water. The model formula is then
$\hat{y_i} = \beta_0 + \beta_1I(species=Dg) + \beta_2water_i + \beta_3I(species=Dg)water_i$
$y_i \sim normal(\hat{y_i}, \sigma^2)$
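
In R, this model could be fitted along the following lines (the data frame `dat` and its variable names are assumed):

```r
# species * water expands to species + water + species:water,
# i.e. the four beta terms of the formula above
mod <- lm(biomass ~ species * water, data = dat)
```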

@@ -895,7 +895,7 @@ The correlations per se can be interesting. Further readings on how to visualize

- principal component analysis [@Manly.1994]
- path analyses, e.g. @Shipley.2009
-- structural equation models [@Hoyle2012]
+- structural equation models [@Hoyle.2012]

```{r fig.align='center', echo=FALSE, fig.link=''}
knitr::include_graphics('images/ruchen.jpg', dpi = 150)
4 changes: 2 additions & 2 deletions 2.05-lmer.Rmd
@@ -19,7 +19,7 @@ library(arm)

### Why Mixed Effects Models?

-Mixed effects models (or hierarchical models; see @Gelman2007 for a discussion of the terminology) are used to analyze nonindependent, grouped, or hierarchical data. For example, when we measure growth rates of nestlings in different nests by taking mass measurements of each nestling several times during the nestling phase, the measurements are grouped within nestlings (because there are repeated measurements of each) and the nestlings are grouped within nests. Measurements from the same individual are likely to be more similar than measurements from different individuals, and individuals from the same nest are likely to be more similar than nestlings from different nests. Measurements of the same group (here, the “groups” are individuals or nests) are not independent. If the grouping structure of the data is ignored in the model, the residuals do not fulfill the independence assumption.
+Mixed effects models (or hierarchical models; see @Gelman.2007 for a discussion of the terminology) are used to analyze nonindependent, grouped, or hierarchical data. For example, when we measure growth rates of nestlings in different nests by taking mass measurements of each nestling several times during the nestling phase, the measurements are grouped within nestlings (because there are repeated measurements of each) and the nestlings are grouped within nests. Measurements from the same individual are likely to be more similar than measurements from different individuals, and individuals from the same nest are likely to be more similar than nestlings from different nests. Measurements of the same group (here, the “groups” are individuals or nests) are not independent. If the grouping structure of the data is ignored in the model, the residuals do not fulfill the independence assumption.
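
Such a model could be specified with `lme4` along these lines (data frame and variable names assumed); the nesting of nestlings within nests is expressed in the random-effects term:

```r
library(lme4)
# repeated mass measurements, nestlings grouped within nests
mod <- lmer(mass ~ age + (1 | nest / nestling), data = d)
```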

Further, predictor variables can be measured on different hierarchical levels. For example, in each nest some nestlings were treated with a hormone implant whereas others received a placebo. Thus, the treatment is measured at the level of the individual, while clutch size is measured at the level of the nest. Clutch size was measured only once per nest but entered in the data file more than once (namely for each individual from the same nest). Repeated measures result in pseudoreplication if we do not account for the hierarchical data structure in the model. Mixed models allow modeling of the hierarchical structure of the data and, therefore, account for pseudoreplication.

@@ -39,7 +39,7 @@ Treating a factor as a random factor is equivalent to partial pooling of the dat

Second, group means may be estimated separately for each group. In this case, the data from all other groups are ignored when estimating a group mean. No pooling occurs in this case (right panel in Figure \@ref(fig:pooling)).

-Third, the data of the different groups can be partially pooled (i.e., treated as a random effect). Thereby, the group means are weighted averages of the population mean and the unpooled group means. The weights are proportional to sample size and the inverse of the variance (see @Gelman2007, p. 252). Further, the estimated mean of all groups equals the mean of the group-specific means; thus, every group is weighted equally when calculating the overall mean. In contrast, in the complete pooling case, the groups get weights proportional to their sample sizes.
+Third, the data of the different groups can be partially pooled (i.e., treated as a random effect). Thereby, the group means are weighted averages of the population mean and the unpooled group means. The weights are proportional to sample size and the inverse of the variance (see @Gelman.2007, p. 252). Further, the estimated mean of all groups equals the mean of the group-specific means; thus, every group is weighted equally when calculating the overall mean. In contrast, in the complete pooling case, the groups get weights proportional to their sample sizes.
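
A small sketch contrasting the three options on simulated data (the group structure is made up for illustration):

```r
library(lme4)
d <- data.frame(group = rep(letters[1:10], each = 5))
d$y <- rnorm(50, mean = rep(rnorm(10, 20, 3), each = 5), sd = 2)
m_no   <- lm(y ~ group - 1, data = d)          # no pooling: separate group means
m_full <- lm(y ~ 1, data = d)                  # complete pooling: one common mean
m_part <- lmer(y ~ 1 + (1 | group), data = d)  # partial pooling
coef(m_part)$group  # group means shrunk toward the overall mean
```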



2 changes: 1 addition & 1 deletion 2.06-glm.Rmd
@@ -443,7 +443,7 @@ says that we add a predictor but do not estimate its effect because it is fixed
$$y_i \sim Poisson(\lambda_i T_i)$$
$$ log(\boldsymbol \lambda \boldsymbol T) = log(\boldsymbol \lambda) + log(\boldsymbol T) = \boldsymbol X \boldsymbol \beta + log(\boldsymbol T)$$

-In R, we can use the argument “offset” within the function `glm` to specify an offset. We illustrate this using a breeding bird census on wildflower fields in Switzerland in 2007 conducted by @zollinger_optimal_2013. We focus on the common whitethroat *Sylvia communis*, a bird of field margins and fallow lands that has become rare in the intensively used agricultural landscape. Wildflower fields are an ecological compensation measure to provide food and nesting grounds for species such as the common whitethroat. Such fields are sown and then left unmanaged for several years except for the control of potentially problematic species (e.g., some thistle species, *Carduus spp.*). The plant composition and the vegetation structure in the field gradually change over the years, hence the interest in this study was to determine the optimal age of a wildflower field for use by the common whitethroat.
+In R, we can use the argument “offset” within the function `glm` to specify an offset. We illustrate this using a breeding bird census on wildflower fields in Switzerland in 2007 conducted by @Zollinger.2013. We focus on the common whitethroat *Sylvia communis*, a bird of field margins and fallow lands that has become rare in the intensively used agricultural landscape. Wildflower fields are an ecological compensation measure to provide food and nesting grounds for species such as the common whitethroat. Such fields are sown and then left unmanaged for several years except for the control of potentially problematic species (e.g., some thistle species, *Carduus spp.*). The plant composition and the vegetation structure in the field gradually change over the years, hence the interest in this study was to determine the optimal age of a wildflower field for use by the common whitethroat.

We use the number of breeding pairs (bp) as the outcome variable and field size as an offset, which means that we model breeding pair density. We include the age of the field (age) as a linear and quadratic term because we expect there to be an optimal age of the field (i.e., a curvilinear relationship between the breeding pair density and age). We also include field size as a covariate (in addition to using it as the offset) because the size of the field may have an effect on the density; for example, small fields may have a higher density if the whitethroat can also use surrounding areas but uses the field to breed. Size (in hectares) was z-transformed before the model fit.
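
The corresponding model fit might look like this (the data frame `d` and its variable names, including the z-transformed size `size.z`, are assumed):

```r
mod <- glm(bp ~ age + I(age^2) + size.z + offset(log(size)),
           data = d, family = poisson)
# equivalently, the offset can be passed via glm's offset argument:
# glm(bp ~ age + I(age^2) + size.z, offset = log(size), data = d, family = poisson)
```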
