This repo includes a collection of resources that may be helpful for learning about methods of categorical data analysis, including the specialized case of species data. Many materials are linked below, and others are included in the folders above.
For getting a general sense of what to do with your data and how to work with it in R, I found the `vcdExtra` tutorial and the Penn State course very helpful. The Agresti book is also extremely useful and comes with an associated R manual (both are in the general-materials folder). Humble word of advice: start with categorical data analysis in general, and then move on to `caper` or `phylolm` for phylogenetic methods. The main difference is that you will use a phylogeny along with your data, but that can complicate things.
Another humble word of advice: it matters whether your response variable is binary. If it is, you will need a specialized case of `glm` (`family = "binomial"`) and, for phylogenetic models, a special function from the `rr2` package (`binaryPGLMM`).
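
As a rough illustration of the non-phylogenetic case, here is a minimal sketch of a binomial `glm` with categorical predictors. It uses R's built-in `Titanic` table rather than anything from this repo, and the names `titanic_df`, `fit`, and `odds_ratios` are just placeholders chosen for the example.

```r
# Built-in 4-way contingency table, flattened to one row per cell
# with columns Class, Sex, Age, Survived, and Freq
titanic_df <- as.data.frame(Titanic)

# Survived is a two-level factor ("No", "Yes"); glm() models the
# probability of the second level. Cell counts enter as frequency weights.
fit <- glm(Survived ~ Class + Sex + Age,
           data = titanic_df,
           family = "binomial",
           weights = Freq)

# Coefficients are on the log-odds scale; exponentiate for odds ratios
odds_ratios <- exp(coef(fit))
print(round(odds_ratios, 2))
```

An odds ratio above 1 means that level is associated with higher odds of survival relative to the reference level (the first level of each factor).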
For visualizing the output, I cannot speak highly enough about `vcdExtra` (for non-phylogenetic data).
- Analyzing categorical data with `vcdExtra` tutorial
- Penn State graduate course on Analysis of Discrete Data
- Categorical data analysis in R from Boston University
- Slide presentation on log-linear models for contingency tables
- Watch me personally test every kind of relationship under the sun for one binary response variable and 3 predictor variables
- Phylogenetic linear regression for non-binary response variable: R package `caper`
- Phylogenetic regression for binary response variable
- Calculating predicted R^2 values for `binaryPGLMM` results
- For a great primer on binary phylogenetic data, check out Chapter 9 in the book Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology (included in phylogenetic-materials)
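
To give a flavour of the log-linear approach covered in the materials above, here is a small sketch using only base R. It fits Poisson `glm`s to the built-in `HairEyeColor` table (not data from this repo) and compares a mutual-independence model with one that allows a Hair-by-Eye association. The object names (`hec`, `indep`, `assoc`, `lrt`) are made up for illustration.

```r
# Built-in 3-way table, flattened to columns Hair, Eye, Sex, Freq
hec <- as.data.frame(HairEyeColor)

# Mutual independence: main effects only
indep <- glm(Freq ~ Hair + Eye + Sex, data = hec, family = poisson)

# Allow hair colour and eye colour to be associated
assoc <- glm(Freq ~ Hair * Eye + Sex, data = hec, family = poisson)

# Likelihood-ratio test comparing the two nested models
lrt <- anova(indep, assoc, test = "Chisq")
print(lrt)
```

A large drop in residual deviance relative to the added degrees of freedom suggests the association term is worth keeping.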
These materials are designed specifically with nominal categorical variables in mind. Nominal data are non-numeric and have no inherent order (e.g., "married" or "divorced"). Discrete data are countable and numeric (like the number of times a coin landed on heads, or the number of customer complaints), but can be treated as categorical in some cases. Save yourself from my personal pitfalls and make sure you investigate what kind of data you have (discrete, continuous, ordinal, etc.) to ensure you are performing the proper tests.
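
One way to check what kind of data you have is to look at how R is storing it. The sketch below uses small made-up vectors (`marital`, `satisfaction`, and `complaints` are invented for the example) to show nominal, ordinal, and discrete variables side by side, plus one simple way to treat a discrete count as categorical.

```r
# Nominal: unordered categories, stored as a plain factor
marital <- factor(c("married", "divorced", "married",
                    "single", "divorced", "married"))

# Ordinal: categories with a meaningful order, stored as an ordered factor
satisfaction <- factor(c("low", "high", "medium", "low", "high", "medium"),
                       levels = c("low", "medium", "high"),
                       ordered = TRUE)

# Discrete: countable numeric values, stored as integers
complaints <- c(0L, 2L, 1L, 0L, 3L, 1L)

is.ordered(marital)       # FALSE
is.ordered(satisfaction)  # TRUE
is.numeric(complaints)    # TRUE

# Nominal data are summarised as counts per category
table(marital)

# A discrete variable can be binned into categories when that suits the question
complaints_cat <- cut(complaints,
                      breaks = c(-Inf, 0, Inf),
                      labels = c("none", "some"))
table(complaints_cat)
```

Running `str()` on your own data frame gives the same information for every column at once.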