revision basics (e.g. formula of mean)
all refs from .._new
Pius Korner authored and Pius Korner committed Dec 13, 2024
1 parent e08fc43 commit b18e69d
Showing 97 changed files with 2,047 additions and 3,368 deletions.
7 changes: 4 additions & 3 deletions docs/1.0-PART-I.md
@@ -1,13 +1,14 @@
# (PART) BASIC STATISTICS FOR ECOLOGISTS {-}

# Introduction to PART I {#PART-I}
<a href="" target="_blank"><img src="images/part_I.jpg" width="410" style="display: block; margin: auto;" /></a>
<a href="" target="_blank"><img src="images/part_I.jpg" style="display: block; margin: auto;" /></a>

------

During our courses we are sometimes asked to give an introduction to some R-related stuff covering data analysis, presentation of results or rather specialist topics in ecology. In this part we present collected these introduction and try to keep them updated. This is also a commented collection of R-code that we documented for our own work. We hope this might be useful olso for other readers.
In this first part, we present some fundamental material of statistics and data analysis, and R-specific issues that we regularly encounter during data analyses. This is also a commented collection of R-code that we documented for our own work. We hope this might be useful also for other readers.


## Further reading
- [R for Data Science by Garrett Grolemund and Hadley Wickham](http://r4ds.had.co.nz): Introduces the tidyverse framwork. It explains how to get data into R, get it into the most useful structure, transform it, visualise it and model it.
- [An Introduction to R](https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf) is an introduction and manual for basic R usage.
- [R for Data Science](https://r4ds.hadley.nz/spreadsheets): Introduces the tidyverse framework, explains how to get data into R, get it into the most useful structure, transform it, visualise it and model it.

228 changes: 132 additions & 96 deletions docs/1.1-prerequisites.md

Large diffs are not rendered by default.

Binary file modified docs/1.1-prerequisites_files/figure-html/CImean-1.png
Binary file modified docs/1.1-prerequisites_files/figure-html/boxplot-1.png
Binary file modified docs/1.1-prerequisites_files/figure-html/histboot-1.png
Binary file modified docs/1.1-prerequisites_files/figure-html/histogram-1.png
Binary file modified docs/1.1-prerequisites_files/figure-html/histtruesample-1.png
Binary file modified docs/1.1-prerequisites_files/figure-html/jointpdist-1.png
Binary file modified docs/1.1-prerequisites_files/figure-html/principal-1.png
Binary file modified docs/1.1-prerequisites_files/figure-html/ranhist-1.png
Binary file modified docs/1.1-prerequisites_files/figure-html/scatterplot-1.png
Binary file modified docs/1.1-prerequisites_files/figure-html/sesd-1.png
Binary file modified docs/1.1-prerequisites_files/figure-html/triplot-1.png
Binary file modified docs/1.1-prerequisites_files/figure-html/unnamed-chunk-8-1.png
4 changes: 2 additions & 2 deletions docs/1.2-analyses_steps.md
@@ -44,7 +44,7 @@ d) Is it expected that a change of 1 at lower values for x has the same biologic

2. Centering: Centering ($x.c = x-mean(x)$) is a transformation that produces a variable with a mean of 0. Centering is optional. We have two reasons to center a predictor variable. First, it helps the model fitting algorithm to better converge because it reduces correlations among model parameters. Second, with centered predictors, the intercept and main effects in the linear model are better interpretable (they are measured at the center of the data instead of at the covariate value of 0 which may be far off).

3. Scaling: Scaling ($x.s = x/c$, where $c$ is a constant) is a transformation that changes the unit of the variable. Scaling is also optional. We have three reasons to scale a predictor variable. First, to make the effect sizes easier to understand. For example, a population change from one year to the next may be very small and hard to interpret. When we give the change for a 10-year period, its ecological meaning is easier to understand. Second, to make the estimates of the effect sizes comparable between variables, we may use $x.s = x/sd(x)$. The resulting variable has a unit of one standard deviation. A standard deviation may be comparable between variables that originally are measured in different units (meters, seconds etc). @Gelman2007 (p. 55 f) propose to scale the variables by two times the standard deviation ($x.s = x/(2*sd(x))$) to make effect sizes comparable between numeric and binary variables. Third, scaling can be important for model convergence, especially when polynomials are included. Also, consider the use of orthogonal polynomials, see Chapter 4.2.9 in @KornerNievergelt2015.
3. Scaling: Scaling ($x.s = x/c$, where $c$ is a constant) is a transformation that changes the unit of the variable. Scaling is also optional. We have three reasons to scale a predictor variable. First, to make the effect sizes easier to understand. For example, a population change from one year to the next may be very small and hard to interpret. When we give the change for a 10-year period, its ecological meaning is easier to understand. Second, to make the estimates of the effect sizes comparable between variables, we may use $x.s = x/sd(x)$. The resulting variable has a unit of one standard deviation. A standard deviation may be comparable between variables that originally are measured in different units (meters, seconds etc). @Gelman.2007 (p. 55 f) propose to scale the variables by two times the standard deviation ($x.s = x/(2*sd(x))$) to make effect sizes comparable between numeric and binary variables. Third, scaling can be important for model convergence, especially when polynomials are included. Also, consider the use of orthogonal polynomials, see Chapter 4.2.9 in @KornerNievergelt.2015.

4. Collinearity: Look at the correlation among the explanatory variables (pairs plot or correlation matrix). If the explanatory variables are correlated, go back to step 2 and add this relationship. Further, read our thoughts about [collinearity](#collinearity).
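
The centering and scaling transformations described in the steps above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the predictor `x` is simulated, and the object names (`x.c`, `x.s`, `x.s2`) are chosen here to mirror the formulas in the text rather than taken from the book's code.

```r
set.seed(1)

# Hypothetical numeric predictor (simulated for illustration only)
x <- rnorm(100, mean = 20, sd = 4)

# Centering: the new variable has a mean of 0
x.c <- x - mean(x)

# Scaling by one standard deviation: the unit becomes 1 SD
x.s <- x / sd(x)

# Scaling by two standard deviations, as proposed for comparing
# effect sizes between numeric and binary predictors
x.s2 <- x / (2 * sd(x))

# Centering reduces the correlation between a predictor and its square,
# which is one reason why it can help model fitting when polynomials are used
cor.raw <- cor(x, x^2)
cor.centered <- cor(x.c, x.c^2)

# Orthogonal polynomials remove this correlation altogether
x.poly <- poly(x, degree = 2)
cor.poly <- cor(x.poly[, 1], x.poly[, 2])
```

Comparing `cor.raw`, `cor.centered` and `cor.poly` shows how strongly the linear and quadratic terms are correlated under each choice.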

@@ -70,7 +70,7 @@ Fit the model.
## Check Model {#step8}
We assess model fit by graphical [analyses of the residuals](#residualanalysis) and by [predictive model checking](#modelchecking).

For non-Gaussian models it is often easier to assess model fit using [posterior predictive checks](#modelchecking) rather than residual analyses. Posterior predictive checks usually show clearly in which aspect the model failed so we can go back to step 2 of the analysis. Recognizing in what aspect a model does not fit the data based on residual plots improves with experience. Therefore, we list in Chapter 16 of @KornerNievergelt2015 some patterns that can appear in residual plots together with what these patterns possibly indicate. We also indicate what could be done in the specific cases.
For non-Gaussian models it is often easier to assess model fit using [posterior predictive checks](#modelchecking) rather than residual analyses. Posterior predictive checks usually show clearly in which aspect the model failed so we can go back to step 2 of the analysis. Recognizing in what aspect a model does not fit the data based on residual plots improves with experience. Therefore, we list in Chapter 16 of @KornerNievergelt.2015 some patterns that can appear in residual plots together with what these patterns possibly indicate. We also indicate what could be done in the specific cases.
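
As a rough sketch of the idea behind predictive model checking, the following base R code compares an observed summary (the proportion of zeros) with the same summary in data simulated from a fitted model. The data are simulated and all object names are hypothetical. Note the simplifying assumption: `simulate()` uses the point estimates of the parameters, so this is a plug-in predictive check, whereas a full posterior predictive check would also draw the parameters from their posterior distribution.

```r
set.seed(123)

# Hypothetical overdispersed count data (simulated for illustration only)
n <- 200
x <- rnorm(n)
y <- rnbinom(n, size = 0.5, mu = exp(0.5 + 0.3 * x))

# Fit a Poisson model, which ignores the overdispersion
mod <- glm(y ~ x, family = poisson)

# Simulate replicated data sets from the fitted model
# (one column per replicate, one row per observation)
yrep <- simulate(mod, nsim = 1000)

# Test statistic: proportion of zeros, observed vs. replicated
prop.zero.obs <- mean(y == 0)
prop.zero.rep <- colMeans(yrep == 0)

# Proportion of replicates with at least as many zeros as observed;
# a value close to 0 or 1 indicates that the model fails in this aspect
p.zero <- mean(prop.zero.rep >= prop.zero.obs)
```

A histogram of `prop.zero.rep` with a vertical line at `prop.zero.obs` shows directly in which aspect the model does not reproduce the data.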

## Model Uncertainty {#step9}
If, while working through steps 1 to 8, possibly repeatedly, we came up with more than one model that fits the data reasonably well, [model comparison](#model_comparison) is needed, or inference may be drawn from more than one model. If we have only one model, we proceed to \@ref(step10).
2 changes: 1 addition & 1 deletion docs/1.3-distributions.md
@@ -87,7 +87,7 @@ Further, note that not all variables measured as an integer number are count dat

### Negative-binomial distribution

The negative-binomial distribution represents the number of zeros which occur in a sequence of Bernoulli trials before a target number of ones is reached. It is hard to see this situation in, e.g., the number of individuals counted on plots. Therefore, we were reluctant to introduce this distribution in our old book [@KornerNievergelt2015]. However, the negative-binomial distribution often fits much better to count data than the Poisson model because it has two parameters and therefore allows for fitting both the mean and the variance to the data. Therefore, we started using the negative-binomial distribution as a data model more often.
The negative-binomial distribution represents the number of zeros which occur in a sequence of Bernoulli trials before a target number of ones is reached. It is hard to see this situation in, e.g., the number of individuals counted on plots. Therefore, we were reluctant to introduce this distribution in our old book [@KornerNievergelt.2015]. However, the negative-binomial distribution often fits much better to count data than the Poisson model because it has two parameters and therefore allows for fitting both the mean and the variance to the data. Therefore, we started using the negative-binomial distribution as a data model more often.
$x \sim negative-binomial(p,n)$
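
To make the definition above concrete, here is a minimal sketch using `rnbinom()` from base R, where the argument `size` plays the role of the target number of ones ($n$) and `prob` the probability of a one ($p$). The parameter values and object names are chosen for illustration only. The comparison with Poisson draws of the same mean shows why the second parameter helps when the variance of count data exceeds the mean.

```r
set.seed(42)

n.target <- 3   # target number of ones (argument `size` in rnbinom)
p <- 0.2        # probability of a one in each Bernoulli trial (`prob`)

# Number of zeros observed before n.target ones are reached
x <- rnbinom(10000, size = n.target, prob = p)

# Theoretical mean and variance of this parameterisation
mu <- n.target * (1 - p) / p     # mean
v  <- n.target * (1 - p) / p^2   # variance, larger than the mean

# Poisson draws with the same mean: the variance is tied to the mean
y <- rpois(10000, lambda = mu)

c(mean.nb = mean(x), var.nb = var(x),
  mean.pois = mean(y), var.pois = var(y))
```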

Its probability function is rather complex:
Binary file modified docs/1.3-distributions_files/figure-html/unnamed-chunk-1-1.png