Suggested Notation

Joram Soch edited this page Jan 17, 2025 · 5 revisions

By nature, the proofs and definitions in "The Book of Statistical Proofs" use mathematical notation. On this page, we list recommendations on how to denote certain statistical objects (e.g. probabilities, distributions and models), organized by StatProofBook chapter. Generally speaking, StatProofBook contributors are not bound to this notation (that is, submitting a contribution at all is regarded as more desirable than submitting it in perfect form), but contributors should try to adhere to the suggested notation as closely as possible.

For more information, see the guidelines on using LaTeX and MathJax.

Chapter I: General Theorems

Probability theory: foundations

  • E – single random event
  • A, B, C – multiple random events
  • A_1, \ldots, A_k – mutually exclusive random events
  • \bar{A}, \bar{B}, \bar{C} – complements of random events
  • X, Y, Z – scalar random variables, random vectors or random matrices
  • x, y, z – realizations or values of random variables (exception: random matrices)
  • \mathcal{X}, \mathcal{Y}, \mathcal{Z} – sets of possible values of random variables
  • x \in \mathcal{X}, y \in \mathcal{Y}, z \in \mathcal{Z} – indexing all possible values
  • P(A), P(B), P(C) – axiomatic definition of probability
  • p(x), q(x) – (marginal) probability densities or probability masses
  • \mathrm{Pr}(X=a), \mathrm{Pr}(X \in A) – specific statements about random variables
  • p(x,y) – joint probability
  • p(x|y) – conditional probability
  • f_X(x) – probability density function (PDF) or probability mass function (PMF)
  • F_X(x) – cumulative distribution function (CDF)
  • Q_X(p) – quantile function (QF), a.k.a. inverse CDF
  • M_X(t) – moment-generating function (MGF)
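The following minimal Python sketch makes the difference between joint, marginal and conditional probabilities concrete for two discrete random variables. The joint PMF values and the function names are hypothetical illustrations. A marginal is obtained by summing the joint probability p(x,y) over the other variable, and the conditional is p(x|y) = p(x,y)/p(y).

```python
# Hypothetical joint PMF p(x, y) of two binary random variables X and Y,
# stored as a dict mapping (x, y) -> probability.
p_xy = {
    (0, 0): 0.1, (0, 1): 0.3,
    (1, 0): 0.2, (1, 1): 0.4,
}

def marginal_x(p_xy):
    """Marginal p(x) = sum over y of p(x, y)."""
    p_x = {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
    return p_x

def marginal_y(p_xy):
    """Marginal p(y) = sum over x of p(x, y)."""
    p_y = {}
    for (x, y), p in p_xy.items():
        p_y[y] = p_y.get(y, 0.0) + p
    return p_y

def conditional_x_given_y(p_xy):
    """Conditional p(x|y) = p(x, y) / p(y), keyed by (x, y)."""
    p_y = marginal_y(p_xy)
    return {(x, y): p / p_y[y] for (x, y), p in p_xy.items()}

p_x = marginal_x(p_xy)                      # e.g. Pr(X = 1) = p_x[1]
p_x_given_y = conditional_x_given_y(p_xy)   # e.g. Pr(X = 1 | Y = 1)
```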

Probability theory: moments

  • \mathrm{E}(X) – expected value (mean)
  • \mathrm{Var}(X) – variance
  • \mathrm{Skew}(X) – skewness
  • \mathrm{Cov}(X,Y) – covariance
  • \mathrm{Corr}(X,Y) – correlation
  • \Sigma_{XX} – covariance matrix
  • C_{XX} – correlation matrix
  • x = \left\lbrace x_1, \ldots, x_n \right\rbrace – sample
  • \bar{x} – sample mean
  • s^2, s_x^2 – sample variance
  • \hat{s}, \hat{s}_x – sample skewness
  • s_{xy} – sample covariance
  • r_{xy} – sample correlation
  • S, S_{xy} – sample covariance matrix
  • R, R_{xy} – sample correlation matrix
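The sample statistics above can be computed as in the following minimal Python sketch. It assumes the unbiased 1/(n-1) normalization for s^2 and s_{xy}, and a moment-based definition of the sample skewness \hat{s}. Other conventions (e.g. 1/n) exist, so check the corresponding StatProofBook definition before relying on a specific normalization. The correlations r_{xy} and R do not depend on this choice. The function names are hypothetical.

```python
import math

def sample_mean(x):
    """bar{x} = (1/n) sum x_i."""
    return sum(x) / len(x)

def sample_var(x):
    """s_x^2 with the unbiased 1/(n-1) normalization (assumed convention)."""
    m = sample_mean(x)
    return sum((xi - m) ** 2 for xi in x) / (len(x) - 1)

def sample_skew(x):
    """hat{s}_x = [(1/n) sum d_i^3] / [(1/n) sum d_i^2]^(3/2), d_i = x_i - bar{x} (assumed convention)."""
    n, m = len(x), sample_mean(x)
    m2 = sum((xi - m) ** 2 for xi in x) / n
    m3 = sum((xi - m) ** 3 for xi in x) / n
    return m3 / m2 ** 1.5

def sample_cov(x, y):
    """s_xy with the unbiased 1/(n-1) normalization (assumed convention)."""
    mx, my = sample_mean(x), sample_mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

def sample_corr(x, y):
    """r_xy = s_xy / (s_x s_y)."""
    return sample_cov(x, y) / math.sqrt(sample_var(x) * sample_var(y))

def sample_cov_matrix(columns):
    """S: entry (i, j) is the sample covariance of variables i and j."""
    return [[sample_cov(a, b) for b in columns] for a in columns]

def sample_corr_matrix(columns):
    """R: covariance matrix rescaled by the standard deviations on its diagonal."""
    S = sample_cov_matrix(columns)
    k = len(S)
    return [[S[i][j] / math.sqrt(S[i][i] * S[j][j]) for j in range(k)] for i in range(k)]

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
S = sample_cov_matrix([x, y])
R = sample_corr_matrix([x, y])
```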

Probability theory: other

  • \mathrm{median}(X) – median
  • \mathrm{mode}(X) – mode
  • \sigma(X) – standard deviation
  • \mathrm{FWHM}(X) – full width at half maximum
  • \mathrm{min}, \mathrm{max} – minimum, maximum
  • \mu_n(c) – n-th moment about c
  • \mu_n' – n-th raw moment
  • \mu_n – n-th central moment
  • \mu_n^{*} – n-th standardized moment
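The moment notations above are related as follows:

- a raw moment \mu_n' is the moment about zero, \mu_n(0);
- a central moment \mu_n is the moment about the mean, \mu_n(\mathrm{E}(X));
- a standardized moment \mu_n^{*} is \mu_n / \sigma(X)^n, where \sigma(X) = \sqrt{\mu_2}.

Here is a minimal Python sketch, assuming a discrete distribution stored as a hypothetical dictionary from values to probabilities (here a fair die).

```python
import math

def moment_about(dist, n, c):
    """mu_n(c) = E[(X - c)^n] for a discrete distribution {value: probability}."""
    return sum(p * (x - c) ** n for x, p in dist.items())

def raw_moment(dist, n):
    """mu_n' = mu_n(0)."""
    return moment_about(dist, n, 0.0)

def central_moment(dist, n):
    """mu_n = mu_n(E(X)), with E(X) = mu_1'."""
    return moment_about(dist, n, raw_moment(dist, 1))

def standardized_moment(dist, n):
    """mu_n^* = mu_n / sigma(X)^n, with sigma(X) = sqrt(mu_2)."""
    sigma = math.sqrt(central_moment(dist, 2))
    return central_moment(dist, n) / sigma ** n

# Hypothetical example: a fair six-sided die
die = {k: 1 / 6 for k in range(1, 7)}
```

For the die, \mu_1' = 3.5 and \mu_2 = 35/12, and \mu_3^{*} vanishes up to rounding because the distribution is symmetric.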

Information theory

  • \mathrm{H}(X) – (Shannon) entropy
  • \mathrm{H}(X|Y) – conditional entropy
  • \mathrm{H}(X,Y) – joint entropy (of two random variables)
  • \mathrm{H}(P,Q) – cross-entropy (of two probability distributions)
  • \mathrm{h}(X) – differential entropy
  • \mathrm{h}(X|Y) – conditional differential entropy
  • \mathrm{h}(X,Y) – joint differential entropy (of two random variables)
  • \mathrm{h}(P,Q) – differential cross-entropy (of two probability distributions)
  • \mathrm{I}(X,Y) – mutual information
  • \mathrm{KL}[P||Q] – Kullback-Leibler divergence (between two probability distributions)
  • \mathrm{KL}[p(x)||q(x)] – Kullback-Leibler divergence (between two PMFs or PDFs)
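For discrete distributions, the quantities above can be sketched in Python as follows. The sketch makes these assumptions:

- natural logarithms, so entropies are in nats;
- the convention 0 log 0 = 0;
- q is positive wherever p is (otherwise the KL divergence is infinite).

The joint PMF passed to the mutual information is a hypothetical nested list. The code uses the identities H(P,Q) = H(P) + KL[P||Q], H(X|Y) = H(X,Y) - H(Y) and I(X,Y) = H(X) + H(Y) - H(X,Y).

```python
import math

def entropy(p):
    """H(X) = -sum p log p, in nats; terms with p = 0 contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(P,Q) = -sum p log q; assumes q > 0 wherever p > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL[P||Q] = sum p log(p/q); assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def _marginals_and_joint_entropy(p_xy):
    """Helper: marginal PMFs of X (rows) and Y (columns) and H(X,Y)."""
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(col) for col in zip(*p_xy)]
    h_xy = entropy([p for row in p_xy for p in row])
    return p_x, p_y, h_xy

def conditional_entropy(p_xy):
    """H(X|Y) = H(X,Y) - H(Y) for a joint PMF p_xy[i][j] = p(x_i, y_j)."""
    _, p_y, h_xy = _marginals_and_joint_entropy(p_xy)
    return h_xy - entropy(p_y)

def mutual_information(p_xy):
    """I(X,Y) = H(X) + H(Y) - H(X,Y)."""
    p_x, p_y, h_xy = _marginals_and_joint_entropy(p_xy)
    return entropy(p_x) + entropy(p_y) - h_xy

P = [0.5, 0.5]
Q = [0.9, 0.1]
```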

Chapter II: Probability Distributions

General notation

  • \lambda – hyper-parameters, parameters of a distribution
  • \mathcal{D}(\lambda) – parametrized probability distribution
  • X \sim \mathcal{D}(\lambda) – random variable following probability distribution
  • f_X(x) = \mathcal{D}(x; \lambda) – PDF or PMF of probability distribution
  • F_X(x) = \int_{-\infty}^x \mathcal{D}(z; \lambda) \, \mathrm{d}z – CDF of (continuous) probability distribution
  • Y = \sum_{i=1}^p a_i X_i – linear combination of random variables
  • Y = AX + b – linear transformation of random variable(s)
  • \mathrm{E}(X) – expected value of random variable
  • \mathrm{median}(X) – median of random variable
  • \mathrm{mode}(X) – mode of random variable
  • \mathrm{Var}(X) – variance of random variable
  • \mathrm{Cov}(X) – covariance of random vector
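The following sketch illustrates the relation between PDF, CDF and quantile function. It uses Python's standard-library statistics.NormalDist for X ~ N(\mu, \sigma^2) and checks F_X(x) = \int_{-\infty}^x f_X(z) dz by trapezoidal integration. Note that NormalDist is parametrized by the standard deviation \sigma rather than the variance \sigma^2. The parameter values, the truncation point of the integral and the step count are arbitrary illustrative choices. The last lines show the scalar case of Y = aX + b, which NormalDist supports through multiplication and addition by constants.

```python
from statistics import NormalDist

# X ~ N(mu, sigma^2); NormalDist expects the standard deviation, not the variance
mu, sigma2 = 1.0, 4.0
X = NormalDist(mu=mu, sigma=sigma2 ** 0.5)

def cdf_by_integration(pdf, x, lower=-50.0, steps=100_000):
    """Approximate F_X(x) = int_{-inf}^x f_X(z) dz with the trapezoidal rule,
    truncating the lower limit of integration at `lower`."""
    h = (x - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(x))
    for i in range(1, steps):
        total += pdf(lower + i * h)
    return total * h

x = 2.5
F_numeric = cdf_by_integration(X.pdf, x)   # f_X integrated up to x
F_exact = X.cdf(x)                         # F_X(x)
Q = X.inv_cdf(F_exact)                     # Q_X(F_X(x)) should recover x

# Scalar linear transformation Y = aX + b with a = 3, b = 2
Y = X * 3 + 2                              # N(3 mu + 2, (3 sigma)^2)
```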

Specific distributions: discrete

  • \mathcal{U}(a, b) – discrete uniform distribution
  • \mathrm{Bern}(p) – Bernoulli distribution
  • \mathrm{Bin}(n, p) – binomial distribution
  • \mathrm{BetBin}(n,\alpha,\beta) – beta-binomial distribution
  • \mathrm{Poiss}(\lambda) – Poisson distribution
  • \mathrm{Cat}([p_1,\ldots,p_k]) – categorical distribution
  • \mathrm{Mult}(n,[p_1,\ldots,p_k]) – multinomial distribution
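The probability mass functions of these distributions can be written down directly. The following is a minimal sketch with hypothetical function names. It assumes the standard parametrizations: success probability p, number of trials n, rate \lambda, and shape parameters \alpha and \beta. The beta function is evaluated on the log scale via math.lgamma for numerical stability.

```python
import math

def bern_pmf(x, p):
    """Bern(p): Pr(X = x) = p^x (1-p)^(1-x), x in {0, 1}."""
    return p ** x * (1 - p) ** (1 - x)

def bin_pmf(x, n, p):
    """Bin(n, p): Pr(X = x) = C(n, x) p^x (1-p)^(n-x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def _log_beta(a, b):
    """log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def betbin_pmf(x, n, alpha, beta):
    """BetBin(n, alpha, beta): C(n, x) B(x + alpha, n - x + beta) / B(alpha, beta)."""
    return math.comb(n, x) * math.exp(
        _log_beta(x + alpha, n - x + beta) - _log_beta(alpha, beta))

def poiss_pmf(x, lam):
    """Poiss(lambda): Pr(X = x) = lambda^x exp(-lambda) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def mult_pmf(xs, n, ps):
    """Mult(n, [p_1, ..., p_k]): n! / prod(x_i!) * prod(p_i^x_i), with sum(xs) == n."""
    if sum(xs) != n:
        return 0.0
    coef = math.factorial(n)
    for xi in xs:
        coef //= math.factorial(xi)
    prob = 1.0
    for xi, pi in zip(xs, ps):
        prob *= pi ** xi
    return coef * prob
```

As a sanity check, Mult(n, [p, 1-p]) reduces to Bin(n, p), and BetBin(n, 1, 1) is the discrete uniform distribution on {0, ..., n}.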

Specific distributions: univariate continuous

  • \mathcal{U}(a, b) – continuous uniform distribution
  • \mathcal{N}(\mu, \sigma^2) – univariate normal distribution
  • t(\nu) – univariate t-distribution
  • \mathrm{Gam}(a,b) – gamma distribution
  • \mathrm{Exp}(\lambda) – exponential distribution
  • \ln \mathcal{N}(\mu, \sigma^2) – log-normal distribution
  • \chi^2(k) – chi-squared distribution
  • F(d_1, d_2) – F-distribution
  • \mathrm{Bet}(\alpha, \beta) – beta distribution
  • \mathrm{Wald}(\gamma, \alpha) – Wald distribution
  • \mathrm{ex-Gaussian}(\mu, \sigma, \lambda) – ex-Gaussian distribution
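When simulating from these distributions, note that software parametrizations often differ from the notation above. The sketch below maps some of the symbols onto Python's standard-library random module. It assumes that Gam(a, b) uses shape a and rate b and that Exp(\lambda) uses rate \lambda. Under that convention it also relies on \chi^2(k) = Gam(k/2, 1/2). The wrapper function names are hypothetical. The t-, F-, Wald and ex-Gaussian distributions have no direct counterpart in random and are omitted.

```python
import math
import random

rng = random.Random(42)  # seeded generator for reproducible draws

def sample_normal(mu, sigma2):
    """N(mu, sigma^2): random.gauss expects the standard deviation."""
    return rng.gauss(mu, math.sqrt(sigma2))

def sample_gam(a, b):
    """Gam(a, b), assuming shape a and rate b: gammavariate expects shape and scale = 1/b."""
    return rng.gammavariate(a, 1.0 / b)

def sample_exp(lam):
    """Exp(lambda), assuming rate lambda: expovariate also takes the rate."""
    return rng.expovariate(lam)

def sample_lognormal(mu, sigma2):
    """ln N(mu, sigma^2): lognormvariate expects the standard deviation of ln X."""
    return rng.lognormvariate(mu, math.sqrt(sigma2))

def sample_chi2(k):
    """chi^2(k) = Gam(k/2, 1/2) in the shape-rate form."""
    return sample_gam(k / 2, 0.5)

def sample_bet(alpha, beta):
    """Bet(alpha, beta): betavariate uses the same two shape parameters."""
    return rng.betavariate(alpha, beta)

def sample_uniform(a, b):
    """Continuous U(a, b)."""
    return rng.uniform(a, b)

def mean_of(sampler, *args, n=50_000):
    """Monte Carlo estimate of E(X) from n draws of `sampler(*args)`."""
    return sum(sampler(*args) for _ in range(n)) / n
```

For example, mean_of(sample_gam, 2.0, 4.0) should be close to a/b = 0.5, whereas passing b directly as the scale of gammavariate would give a mean near a b = 8.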

Specific distributions: multivariate continuous

  • \mathcal{N}(\mu, \Sigma) – multivariate normal distribution
  • t(\mu, \Sigma, \nu) – multivariate t-distribution
  • \mathrm{NG}(\mu, \Lambda, a, b) – normal-gamma distribution
  • \mathrm{Dir}(\alpha) – Dirichlet distribution
  • \mathcal{MN}(M, U, V) – matrix-normal distribution
  • \mathcal{W}(V, n) – Wishart distribution
  • \mathrm{NW}(M, U, V, \nu) – normal-Wishart distribution

Chapter III: Statistical Models

General notation

  • y – measured data
  • m – generative model
  • \theta – model parameters
  • \lambda – model hyper-parameters
  • \mathcal{L}_m(\theta) – likelihood function
  • p(y|\theta,m) – likelihood function
  • \mathrm{LL}(\theta) – log-likelihood function
  • \hat{\theta} – estimated model parameters (maximum likelihood)
  • \hat{\theta}_\mathrm{MAP} – estimated model parameters (maximum-a-posteriori)
  • \hat{y} – fitted/predicted data
  • p(\theta|m) – prior distribution
  • p(\theta|y,m) – posterior distribution
  • p(y_\mathrm{new}|m) – prior predictive distribution
  • p(y_\mathrm{new}|y,m) – posterior predictive distribution
  • p(y|m) – marginal likelihood
  • \log p(y|m) – log model evidence
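These objects fit together via Bayes' theorem: the posterior is the likelihood times the prior, divided by the marginal likelihood. The following is a minimal grid-based sketch under these assumptions:

- a hypothetical model m in which binary data y_i ~ Bern(\theta) are i.i.d.;
- a prior \theta ~ U(0, 1);
- integrals over \theta approximated by the midpoint rule.

Because the prior is flat, \hat{\theta}_\mathrm{MAP} coincides with the maximum likelihood estimate \hat{\theta} in this model.

```python
import math

y = [1, 0, 1, 1, 0, 1, 1, 1]   # hypothetical binary data
n, k = len(y), sum(y)

N = 10_000
grid = [(i + 0.5) / N for i in range(N)]   # midpoints of [0, 1]
d_theta = 1.0 / N

def likelihood(theta):
    """p(y|theta, m) for i.i.d. Bernoulli observations."""
    return theta ** k * (1 - theta) ** (n - k)

def prior(theta):
    """p(theta|m): uniform prior on [0, 1] (assumed)."""
    return 1.0

# Marginal likelihood p(y|m) = int p(y|theta, m) p(theta|m) dtheta, and log model evidence
marg_lik = sum(likelihood(t) * prior(t) for t in grid) * d_theta
log_evidence = math.log(marg_lik)

# Posterior density p(theta|y, m) evaluated on the grid
posterior = [likelihood(t) * prior(t) / marg_lik for t in grid]

# Point estimates: maximum likelihood and maximum-a-posteriori (grid argmax)
theta_ml = max(grid, key=likelihood)
theta_map = max(grid, key=lambda t: likelihood(t) * prior(t))

# Posterior predictive Pr(y_new = 1 | y, m) = int theta p(theta|y, m) dtheta
pred_one = sum(t * p for t, p in zip(grid, posterior)) * d_theta
```

With a uniform prior, the exact marginal likelihood is B(k+1, n-k+1) and the posterior predictive probability of y_new = 1 is (k+1)/(n+2). The grid values should approximate both closely.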

Linear models

  • y, Y – univariate/multivariate measured data
  • x, X – single predictor/design matrix
  • \beta, B – univariate/multivariate regression coefficients
  • \varepsilon, E – univariate/multivariate noise
  • \sigma^2, \Sigma – noise variance/measurement covariance
  • I_n – noise covariance matrix (i.i.d.)
  • V – noise covariance matrix (not i.i.d.)
  • n – number of observations
  • v – number of measurements
  • p – number of regressors
  • y_i – i-th observation (univariate GLM)
  • y_{ij} – i-th observation of j-th measurement (multivariate GLM)
  • y_{ij} – j-th observation of i-th category (one-way ANOVA)
  • y_{ijk} – k-th observation of (i,j)-th cell (two-way ANOVA)
  • y = \left\lbrace y_1, \ldots, y_n \right\rbrace – data set consisting of n data points
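For the univariate linear model y = X\beta + \varepsilon, the ordinary least squares estimate is \hat{\beta} = (X^\mathrm{T} X)^{-1} X^\mathrm{T} y. This is also the maximum likelihood estimate when \varepsilon ~ N(0, \sigma^2 I_n). Below is a dependency-free Python sketch with one predictor plus an intercept, i.e. p = 2. The data and helper names are hypothetical, and a real analysis would use a numerical library instead of hand-written Gaussian elimination.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (A square, non-singular)."""
    m = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix [A | b]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for j in range(c, m + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):                 # back substitution
        z[r] = (M[r][m] - sum(M[r][j] * z[j] for j in range(r + 1, m))) / M[r][r]
    return z

def ols(X, y):
    """beta_hat = (X^T X)^{-1} X^T y, computed by solving the normal equations."""
    Xt = transpose(X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(matmul(Xt, X), Xty)

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
X = [[1.0, xi] for xi in x]        # design matrix: intercept and single predictor
n, p = len(X), len(X[0])           # number of observations, number of regressors

beta_hat = ols(X, y)
y_hat = [sum(xij * bj for xij, bj in zip(row, beta_hat)) for row in X]   # fitted data
residuals = [yi - yh for yi, yh in zip(y, y_hat)]                        # estimated noise
rss = sum(e ** 2 for e in residuals)
sigma2_ml = rss / n                # maximum likelihood estimate of sigma^2
sigma2_unbiased = rss / (n - p)    # unbiased estimate of sigma^2
```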

Chapter IV: Model Selection

General notation

  • y – measured data
  • m – generative model
  • f – generative model family
  • n – number of observations
  • k – number of free model parameters

Model selection criteria

  • \sigma^2 – noise variance
  • \hat{\sigma}^2 – residual variance
  • R^2 – coefficient of determination
  • R^2_\mathrm{adj} – adjusted coefficient of determination
  • \mathrm{SNR} – signal-to-noise ratio
  • \mathrm{MLL}(m) – maximum log-likelihood
  • \mathrm{IC}(m) – information criterion
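Several of these criteria can be computed from measured data y and fitted data \hat{y}. The sketch below assumes Gaussian i.i.d. noise and that \hat{y} are the maximum likelihood fitted values. Under these assumptions the maximum log-likelihood has the closed form -n/2 [log(2\pi\hat{\sigma}^2) + 1] with \hat{\sigma}^2 = RSS/n. As examples of an information criterion IC(m), it uses the common definitions AIC = -2 MLL(m) + 2k and BIC = -2 MLL(m) + k log n. Here p is the number of regressors and k the number of free model parameters. The data values are hypothetical.

```python
import math

def r_squared(y, y_hat):
    """R^2 = 1 - RSS/TSS."""
    y_bar = sum(y) / len(y)
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss

def adj_r_squared(y, y_hat, p):
    """R^2_adj = 1 - (RSS/(n-p)) / (TSS/(n-1)) = 1 - (1 - R^2)(n-1)/(n-p)."""
    n = len(y)
    return 1 - (1 - r_squared(y, y_hat)) * (n - 1) / (n - p)

def gaussian_mll(y, y_hat):
    """MLL for i.i.d. Gaussian noise, with sigma^2 replaced by its ML estimate RSS/n."""
    n = len(y)
    sigma2 = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

def aic(mll, k):
    """Akaike information criterion: -2 MLL + 2k."""
    return -2 * mll + 2 * k

def bic(mll, k, n):
    """Bayesian information criterion: -2 MLL + k log n."""
    return -2 * mll + k * math.log(n)

y_obs = [1.0, 2.0, 3.0, 4.0]     # hypothetical measured data
y_fit = [1.1, 1.9, 3.2, 3.8]     # hypothetical fitted data
mll = gaussian_mll(y_obs, y_fit)
```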

Bayesian model selection

  • p(y|m) – model evidence
  • \mathrm{LME}(m) – log model evidence
  • \mathrm{Acc}(m) – (Bayesian) model accuracy (term)
  • \mathrm{Com}(m) – (Bayesian) model complexity (penalty)
  • m \in f – indexing all models in a family
  • p(y|f) – family evidence
  • \mathrm{LFE}(f) – log family evidence
  • \mathrm{BF}_{12} – Bayes factor
  • \mathrm{LBF}_{12} – log Bayes factor
  • p(m|y) – posterior model probability
  • p(\theta|y) – marginal posterior distribution
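Given log model evidences, the Bayes factor, posterior model probabilities and log family evidence follow from simple arithmetic on the log scale. The following is a minimal sketch, assuming uniform prior model probabilities unless others are supplied. The LME values are hypothetical. The log-sum-exp trick avoids underflow when exponentiating large negative log evidences.

```python
import math

def log_sum_exp(values):
    """log(sum(exp(v))) computed stably by factoring out the maximum."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_bayes_factor(lme_1, lme_2):
    """LBF_12 = LME(m_1) - LME(m_2); the Bayes factor is BF_12 = exp(LBF_12)."""
    return lme_1 - lme_2

def posterior_model_probs(lmes, priors=None):
    """p(m_i|y) = p(y|m_i) p(m_i) / sum_j p(y|m_j) p(m_j); uniform p(m_i) by default."""
    if priors is None:
        priors = [1.0 / len(lmes)] * len(lmes)
    log_joint = [lme + math.log(pr) for lme, pr in zip(lmes, priors)]
    norm = log_sum_exp(log_joint)
    return [math.exp(lj - norm) for lj in log_joint]

def log_family_evidence(lmes, priors=None):
    """LFE(f) = log sum_{m in f} p(y|m) p(m|f); uniform p(m|f) by default."""
    if priors is None:
        priors = [1.0 / len(lmes)] * len(lmes)
    return log_sum_exp([lme + math.log(pr) for lme, pr in zip(lmes, priors)])

lmes = [-100.0, -102.0, -105.0]      # hypothetical LME(m) for three models in a family f
lbf_12 = log_bayes_factor(lmes[0], lmes[1])
bf_12 = math.exp(lbf_12)
post = posterior_model_probs(lmes)
lfe = log_family_evidence(lmes)
```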